May 2026 - Confessions of a Data Guy

The Truth About The Modern Data Stack | DuckDB Insider

May 27, 2026

Big Data, Data, Data Engineering, Data Warehousing

Databricks Zerobus Streaming Ingestion for Delta Lake House

Yeah, so … I’ve heard rumblings and mumblings about Zerobus here and there. But I had yet to try it out for myself. I trust nothing I can’t put my hands on. It must be something about being raised in the cornfields of the Midwest: always be skeptical of anything that seems like Black Magic.

May 23, 2026

Uncategorized

Academic → CTO: What Actually Matters in Data (Matthew Housley)

May 19, 2026

Big Data, Data, Data Engineering, DuckDB, Python

Apache Arrow + DuckDB (the GOAT + the GOAT)

It’s hard to find the bright, shining stars amid the doom and gloom the tech world seems to be floundering in. When the going gets tough, I like to remind myself that lots of new and exciting tools have been released in the last few years, most of which, when combined, have not been part of the great LLM training material, leaving some fun to explore.

Two of my newest favorite tools, DuckDB and Apache Arrow, have been around a while but are now becoming more tightly integrated, standing more firmly both on their own and together.

May 19, 2026

Big Data, Data, Data Engineering

Apache Arrow as Data Interchange

Apache Arrow entered the data scene quietly; for years, it languished in obscurity, unheard of and uncared for by the data community. Back in the olden days of 2022, which feels like another world, I was happily using and writing about Arrow as a data processing tool. A lot has changed since then, and Arrow has catapulted its way into everyday data engineering conversations.

May 14, 2026

Uncategorized

Spark is Dead. Long Live DuckDB.

Ok, Spark isn’t dead. Before you leave, I’m sorry for lying to you. Sorta. Kinda. Not really.

Undoubtedly, Apache Spark has reached its zenith, shot like a rocket out of the Databricks barrel into the sky. The world is shifting though, even if ever so imperceptibly.

May 12, 2026