I don’t know what it is about Apache Arrow, that GOAT of data engineering, that snuck in like a weasel through the back door. We all woke up one day and found out Arrow is the Atlas of the data world, holding up the systems we depend on and take for granted. One name you might, […]

I had not been thinking much about Kafka lately, but depending on who you ask, Kafka is either sitting comfortably at the top of the streaming world or beginning a slow decline into abstraction. The truth is probably somewhere in the middle. I’ve seen a number of newish streaming tools, many Rust-based, come in like […]

Yeah, so … I’ve heard rumblings and mumblings about it, here and there. But I had yet to try it out for myself. I trust nothing I can’t put my hands on. Something about being raised in the cornfields of the Midwest: always be skeptical of anything that seems like Black Magic.

It’s hard to find the bright, shining stars amid the doom and gloom the tech world seems to be floundering in. When the going gets tough, I like to remind myself that there are lots of new and exciting tools released in the last few years, most of which, when combined, have not been part […]

Apache Arrow entered the data scene quietly; for years, it languished in obscurity, unheard of and uncared for by the data community. Back in the olden days of 2022, which feels like another world, I was happily using and writing about Arrow as a data processing tool. A lot has changed since then, and Arrow […]

Ok, Spark isn’t dead. Before you leave, I’m sorry for lying to you. Sorta. Kinda. Not really. Undoubtedly, Apache Spark has reached its zenith, shot like a rocket out of the Databricks barrel into the sky. The world is shifting though, even if ever so imperceptibly.