Data Warehousing Archives - Confessions of a Data Guy

Big Data, Data, Data Engineering, Data Warehousing

Databricks Zerobus Streaming Ingestion for Delta Lake House

Yeah, so … I’ve heard rumblings and mumblings about Zerobus, here and there. But I had yet to try it out for myself. I trust nothing I can’t put my hands on. Something about being raised in the cornfields of the Midwest: always be skeptical of anything that seems like Black Magic.

May 23, 2026

Big Data, Data, Data Engineering, Data Warehousing

Is Data Modeling Dead?

Ok, not going to lie, I rarely find anything of value in the dregs of r/dataengineering, mostly, I fear, because it’s 90% freshers with little to no experience. These green-behind-the-ears know-it-all engineers who’ve never written a line of Perl, SSH’d into a server, and have no idea what a LAMP stack is. Weak. Sad.

We used to program our way to glory, uphill both ways in the snow. All you do is script-kiddie some Python code through Cursor.

A recent post on Data Modeling, specifically that data modeling is dead, caught my eye. A rare piece of gold mixed in the usual pile of crap. It’s some truth being spoken on the interwebs, hold onto your panties you bright-eyed data zealot. I agree 100% with this sentiment.

DATA MODELING IS DEAD.

September 8, 2025

Big Data, Data, Data Engineering, Data Warehousing, SQL

Duplicates in Data and SQL

You know, after literally multiple decades in the data space, writing code and SQL, at some point along that arduous journey, one might think this problem would be solved by me, or the tooling … yet alas, not to be.

Regardless of the industry or tools used, such as Pandas, Spark, or Postgres, duplicates are a common issue in pipelines, and duplicates in SQL remain the most classic and iconic version of the problem. Things just never change, and humans never learn their lessons, at least I don’t.

July 29, 2025

Big Data, Data Engineering, Data Warehousing

dbt on Databricks

Running dbt on Databricks has never been easier. The integration between dbt Core and Databricks could not be simpler to set up and run. Wondering how to approach running dbt models on Databricks with SparkSQL? Watch the tutorial below.

March 28, 2025

Big Data, Data, Data Engineering, Data Warehousing, Python

How I (Barely) Survived Setting Up Polaris as an Iceberg REST Catalog

There are things in life that are satisfying—like a clean DAG run, a freshly brewed cup of coffee, or finally deleting 400 lines of YAML. Then there are things that make you question your life choices. Enter: setting up Apache Polaris (incubating) as an Apache Iceberg REST catalog.

Let’s get one thing out of the way—I didn’t want to do this.

March 26, 2025

Big Data, Data, Data Engineering, Data Warehousing

Apache XTable. Delta vs Iceberg vs Hudi.

The blog post reviews an Apache incubating project called Apache XTable, which aims to provide cross-format interoperability among Delta Lake, Apache Hudi, and Apache Iceberg. Below is a concise breakdown from some time I spent playing around with this new tool, along with some technical observations:

March 4, 2025

Big Data, Data, Data Engineering, Data Warehousing, DuckDB, Python

AWS Lambda + DuckDB + Polars + Daft + Rust

When it comes to building modern Lake House architecture, we often get stuck in the past, doing the same old things time after time. We are human; we are lemmings; it’s just the trap we fall into. Usually, that pit we fall into is called Spark. Now, don’t get me wrong; I love Spark. We couldn’t have what we have today in terms of Data Platforms if it wasn’t for Apache Spark.

January 30, 2025

Big Data, Data, Data Engineering, Data Warehousing

The Death of the Data Warehouse, replaced by the Lake House. Or Has It?

This is an interesting one indeed, it’s one that teases and puzzles the brain to no end. Has the Data Warehouse finally died, has that unruly upstart the Lake House finally taken its place atop the seething mass of data we call home? Can we say that after all these decades the Data Warehouse Toolkit and Kimball have finally gone the way of the dinosaurs? Maybe. Probably. I don’t know.

October 7, 2024

Data, Data Engineering, Data Warehousing

Data Modeling in the Brave New Lakehouse World

It is a Brave New World out there these days. The new tools and features come out faster than your mom on Sunday morning getting you ready for church. The same goes for the content and advice being produced on a myriad of platforms, the ole’ Like and Subscribe, and all that bit. It does make you wonder after a while, what you can trust, who has your best interest in mind, and who is selling you a bottle of snake oil, doesn’t it?

Today we talk about Data Modeling. Specifically, Data Modeling in the new world we all live in, christened The Lakehouse by our benevolent Vendor Overlords.

September 19, 2024

Data Warehousing, Ramblings

Databricks Buys Tabular – 1 Billion Dollar Deal. Iceberg vs Delta Lake?

The battle for the Data Warehouse, Data Lake, Lake House, or whatever you want to call it, in the age of AI just got more interesting. In an unsurprising move, Databricks has announced plans to buy Tabular for 1 billion dollars, beating out Snowflake, which was reportedly trying to do the same thing.

June 4, 2024

