Data Engineering Archives - Confessions of a Data Guy

Delta Lake + DuckDB. Catalog Commits with Unity Catalog. Unlocking Concurrent Ingestion.

Why isn’t anyone talking about this?

Sometimes I am genuinely amazed by what captures the attention of the broader data community and what gets quietly pushed off to the side. I suppose I understand why it happens. New models arrive, vendors announce shiny features, and everyone rushes toward the next big thing. Still, as someone who has spent the last few years championing what I affectionately call the Single Node Rebellion, it is difficult not to wonder why certain developments receive so little attention. This is one of those moments.

June 11, 2026

AI, Big Data, Data, Data Engineering

From Failure to AWS: What Actually Makes a Great Engineer

June 10, 2026

Big Data, Data, Data Engineering

The Reality of Running Data at Scale | How Real Data Engineers Think

June 4, 2026

AI, Big Data, Data, Data Engineering

What is Apache Arrow Flight?

I don’t know what it is about Apache Arrow, that GOAT of data engineering that snuck in like a weasel through the backdoor. We all woke up one day and found out Arrow is the Atlas of the data world, holding up the systems we depend on and take for granted. One name you might or might not have heard rattling around is “Apache Arrow Flight” or “Arrow Flight.”

It pops up a little in blogs, talks, and the READMEs of some GitHub repos… I imagine all the vibe-coding Chads just nodding furiously on Zoom calls, like wise old owls who know exactly what’s being talked about. Admit it, you don’t know what Arrow Flight is.

Apache Arrow Flight is a high-performance data transport framework built on top of Apache Arrow and gRPC. It lets applications move large datasets across networks much faster than traditional options like JDBC, ODBC, CSV exports, or REST APIs.

June 3, 2026

Big Data, Data, Data Engineering

Databricks Zerobus: Event Streams + The Lakehouse

I had not been thinking much about Kafka lately, but depending on who you ask, Kafka is either sitting comfortably at the top of the streaming world or beginning a slow decline into abstraction. The truth is probably somewhere in the middle. I’ve seen a number of newish streaming tools, many Rust-based, come in like a splash and leave nothing but a few bubbles.

Kafka is entrenched in the streaming world, and this has created a problem for the Lake House architecture. The integration isn’t “as easy as pie,” … far from it.

June 1, 2026

Data, Data Engineering, DuckDB

The Truth About The Modern Data Stack | DuckDB Insider

May 27, 2026

Big Data, Data, Data Engineering, Data Warehousing

Databricks Zerobus Streaming Ingestion for Delta Lake House

Yeah, so … I’ve heard rumblings and mumblings about Zerobus here and there, but I had yet to try it out for myself. I trust nothing I can’t put my hands on. It must be something about being raised in the cornfields of the Midwest: always be skeptical of anything that seems like Black Magic.

May 23, 2026

Big Data, Data, Data Engineering, DuckDB, Python

Apache Arrow + DuckDB (the GOAT + the GOAT)

It’s hard to find the bright, shining stars amid the doom and gloom the tech world seems to be floundering in. When the going gets tough, I like to remind myself that there are lots of new and exciting tools released in the last few years, most of which, when combined, have not been part of the great LLM training material, leaving some fun left to explore.

Two of my newest favorite tools, DuckDB and Apache Arrow, have been around a while but are now becoming more integrated, starting to stand more firmly on their own and together.

May 19, 2026

Big Data, Data, Data Engineering

Apache Arrow as Data Interchange

Apache Arrow entered the data scene quietly; for years, it languished in obscurity, unheard of and uncared for by the data community. Back in the olden days of 2022, which feels like another world, I was happily using and writing about Arrow as a data processing tool. A lot has changed since then, and Arrow has catapulted its way into everyday data engineering conversations.

May 14, 2026

AI, Big Data, Data, Data Engineering

AI is Changing Data Engineering Fast!

April 30, 2026

