Home - Confessions of a Data Guy

Big Data, Data, Data Engineering, Data Warehousing, Ramblings

Databricks vs Snowflake. The DataLake/Warehouse Battle.

As someone who worked around the classic Data Warehouses back in the day, before S3 took over, when SQL Server and Oracle ruled the day … I love sitting on the sidelines watching new … yet old battle lines being redrawn. I could probably scroll back in StackOverflow 12 years and find the same arguments and […]

June 28, 2021

Data, Data Engineering, Python, Ramblings, Scala

Python vs Scala – Concurrency.

One of the recurring complaints you always see being parroted by the smarter-than-anyone-else-on-the-internet Reddit lurkers is the slowness of Python. I mean I understand the complaint …. but I don’t understand the complaint. Python is what it is, and usually is the best at what it is, hence its ubiquitous nature. I’ve been dabbling with […]

June 28, 2021

Big Data, Data, Data Engineering, Python

Apache Airflow Integration with DataBricks.

The two coolest kids in class … I mean seriously … every other post in the Data Engineering world these days is about Apache Airflow or DataBricks. It’s hard to kick against the goad. Just jump on the bandwagon before you get left in the dust. I’ve used both DataBricks and Apache Airflow, they both […]

June 14, 2021

Big Data, Data, Data Engineering, Data Warehousing, SQL

Intro to Apache Druid … What is this Devilry

Apache Druid, kinda like that second cousin you know about … but don’t really know. When you see them for the first time in 10 years you kinda look at them out of the corner of your eye. That’s how I feel about Apache Druid, I’ve always known it has been there, lurking around in […]

June 7, 2021

Big Data, Data, Data Engineering, Data Warehousing, Python

Why Data Engineers should use AWS Lambda Functions.

When I used to think of lambda functions on AWS my eyes would glaze over, I would roll my eyes and say, “I work with big data, what in the world can a silly little AWS lambda function offer me?” I’ve had to eat my own words, those little suckers come in handy in my […]

June 2, 2021

Big Data, Data, Data Engineering, Data Warehousing

The Elusive Idempotent Data Load/ETL

This is a topic I’ve been musing about lately. The idempotent data load has been a source of much pain and suffering in the lives of many a data engineer and data warehouse developer. Apparently some things don’t change with the passage of time. My first job in tech was working on a data warehouse team […]

May 24, 2021

Data, Data Engineering, Data Warehousing

Data Modeling in DeltaLake (DataBricks)

Time to open a can of worms. I’ve recently been working with DataBricks, specifically DeltaLake (which I wrote about here). DeltaLake is an amazing tool that, when paired with Apache Spark, is like the juggernaut of Big Data. The old is new, the new is old. The rise of DataBricks and DeltaLake is proof of […]

May 10, 2021

Big Data, Data, Data Engineering, Python

Airflow vs Dagster

Dagster, the first few times I read the name, I just couldn’t take the tech stack seriously …. it’s still kinda hard. Today I want to compare Airflow vs Dagster, mostly explore what Dagster is and does. But I want to compare it to the popular Apache Airflow project so people have some context for […]

April 26, 2021

Data, Data Engineering, Ramblings

What the Marooned Ben Gunn Teaches us about Solo Data Engineers

I always envied Ben Gunn in Treasure Island a little bit. Alone all those years, digging up gold and treasure, hunting wild goat, and living in a nice little cave. Living off the land, king of his island, gone half mad, but somewhat still there. Happy to see other people, but always a little bit […]

April 25, 2021

Big Data, Data, Data Engineering, Python

Introduction to Apache Flink for Data Engineers

Not going to lie. I’ve been trying to figure out for awhile now where Apache Flink fits in the Data Engineering world. A year or two ago I didn’t see much content posted about it, but it seems to be picking up steam. I’ve mostly managed to avoid understanding what Flink is or […]

April 21, 2021
