Why isn’t anyone talking about this?

Sometimes I am genuinely amazed by what captures the attention of the broader data community and what gets quietly pushed off to the side. I suppose I understand why it happens. New models arrive, vendors announce shiny features, and everyone rushes toward the next big thing. Still, as someone who has spent the last few years championing what I affectionately call the Single Node Rebellion, it is difficult not to wonder why certain developments receive so little attention. This is one of those moments.


I don’t know what it is about Apache Arrow, that GOAT of data engineering. It snuck in like a weasel through the back door, and we all woke up one day to find that Arrow is the Atlas of the data world, holding up the systems we depend on and take for granted. One name you might, or might not, have heard rattling around is “Apache Arrow Flight,” or simply “Arrow Flight.”

It pops up a little in blogs, talks, and the READMEs of some GitHub repos… I imagine all the vibe-coding Chads just nodding furiously on Zoom calls, like wise old owls who know exactly what’s being talked about. Admit it, you don’t know what Arrow Flight is.

  • Apache Arrow Flight is a high-performance data transport framework built on top of Apache Arrow and gRPC. It lets applications move large datasets across networks, often much faster than traditional approaches such as JDBC, ODBC, CSV exports, or REST APIs.


I had not been thinking much about Kafka lately, but depending on who you ask, Kafka is either sitting comfortably at the top of the streaming world or beginning a slow decline into abstraction. The truth is probably somewhere in the middle. I’ve seen a number of newish streaming tools, many Rust-based, arrive with a splash and leave nothing but a few bubbles.

Kafka is entrenched in the streaming world, and this has created a problem for the Lakehouse architecture. The integration isn’t “as easy as pie” … far from it.


Yeah, so … I’ve heard rumblings and mumblings about it here and there, but I had yet to try it out for myself. I trust nothing I can’t put my hands on. It must be something about being raised in the cornfields of the Midwest: always be skeptical of anything that seems like Black Magic.


It’s hard to find the bright, shining stars amid the doom and gloom the tech world seems to be floundering in. When the going gets tough, I like to remind myself that lots of new and exciting tools have been released in the last few years. Most of them, at least in combination, have not been part of the great LLM training material, which leaves some fun to explore.

Two of my newest favorite tools, DuckDB and Apache Arrow, have been around a while but are now becoming more integrated, starting to stand more firmly on their own and together.


Apache Arrow entered the data scene quietly; for years, it languished in obscurity, unheard of and uncared for by the data community. Back in the olden days of 2022, which feels like another world, I was happily using and writing about Arrow as a data processing tool. A lot has changed since then, and Arrow has catapulted its way into everyday data engineering conversations.
