Spark is Dead. Long Live DuckDB.

Ok, Spark isn’t dead. Before you leave, I’m sorry for lying to you. Sorta. Kinda. Not really.

Undoubtedly, Apache Spark has reached its zenith, shot like a rocket out of the Databricks barrel into the sky. The world is shifting though, even if ever so imperceptibly.

First, open source Spark has been taken over, corralled and tamed by the corporate workings and money trains. It's the fault of no one really; it's the world we live in. Very few open source projects truly stand the test of time, uncorrupted by outside forces.

The powers that be infiltrate the project maintainers and become one with them, by some means, good or bad.

Second, for Apache Spark, its main undoing is the abstraction and complexity reduction required by the tech age we live in. Heck, people don't even want to run Spark on EMR anymore. Databricks ruined us all.

The rise of the Vibe Programmer has bred a new and soft sort of person, but that's nothing new. The BASIC and Assembly engineers before us said the same thing.

No one wants to install and tune Spark clusters by hand. No one wants to learn how to calculate shuffle partitions. The collective data world has moved on to greener pastures. So, we turned to Spark-as-a-Service.
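For anyone who never had the pleasure, here is roughly what that shuffle partition chore looks like. This is a minimal sketch of a common rule of thumb, not an official Spark formula: the function name, the 128 MiB target partition size, and the rounding to a multiple of the core count are all my assumptions. The only real Spark name in here is the `spark.sql.shuffle.partitions` setting the result would feed.

```python
import math


def estimate_shuffle_partitions(
    shuffle_bytes: int,
    target_partition_bytes: int = 128 * 1024**2,  # assumed rule-of-thumb target
    total_cores: int = 1,
) -> int:
    """Rough guess for spark.sql.shuffle.partitions (hypothetical helper)."""
    if shuffle_bytes <= 0:
        # Nothing to shuffle: one partition per core is plenty.
        return total_cores

    # Enough partitions that each one holds about the target size.
    by_size = math.ceil(shuffle_bytes / target_partition_bytes)

    # Round up to a multiple of the core count so each wave of tasks
    # keeps every core busy.
    waves = math.ceil(by_size / total_cores)
    return waves * total_cores


if __name__ == "__main__":
    # A 100 GiB shuffle on a cluster with 48 cores in total.
    print(estimate_shuffle_partitions(100 * 1024**3, total_cores=48))
```

And that is before you measure the actual shuffle size, account for skew, or redo the math when the data doubles next quarter.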

Databricks rewrote many parts of core functionality as Photon, and the line between a DBR (Databricks Runtime) and OSS Spark has forever been blurred.

Heck, even the existence of DataFusion Comet, writing core functions in Rust, shows that the cracks in Spark's long, battle-worn armor are here to stay.

  • Needless to say, Spark as we have known it is dead. Done. Finished.

Databricks Spark is the new SQL Server, the tool of choice for the corporate monoliths, with the pricing to boot.

So, yeah, it will be here in one form or another for decades. They did the impossible and are now doomed to a lifetime of purgatory: that once bright bastion of freedom and innovation is now the slow-moving monolith of a CTO's delight.

A new (DuckDB) era has come.

With the death of something old, however slow and painful it might be, that space is never empty for long. We are too collectively good for that.

In an oh-so-classic move, we have whiplashed ourselves back in time. A subconscious desire for the Old Ways is buried in us all.

  • We want simplicity, freedom, true open source, ease, and SQL, that old friend. We want DuckDB.

We gasped in horror at those large cloud bills, compute and clusters running in an endless cycle of Notebooks spawned by those children who don't know any better, but should.

DuckDB arrived quietly on the scene, a simple savior to soothe and be the balm for a weary soul. Lightning-fast data processing, in-memory computations, spill-to-disk abilities, SQL as the interface, all installed with a few lines of code.

Airflow jobs, AWS Lambdas, Python scripts, CLIs: the ease of use is like something of old. Digging through the detritus of the Modern Data Stack, we have found our Arkenstone, a tool of great power and beauty.

Old ways die hard though.

How can we reteach a legion of programmers using Claude, which was trained on Spark, that 99% of the data pipelines in question have zero need for cluster compute?

We need them to use their minds, to care about what solutions they are putting in place. To care about cost and efficiency. To look for simplicity and speed.

Can it be done? I don’t know.

It's possible that rising interest rates and cost-conscious CTOs might eventually demand a serious reduction in that scary monthly bill.

It’s possible that Databricks will continue to fumble the ball, as it slowly has been, growing and expanding into an uncontrollable beast.

  • Yeah, the change will be slow and long, as arcs always are. I will probably be dead before it's complete.

I, for one, have been happily part of this change, sticking some DuckDB in a Lambda here and there. Hopefully that's just the beginning.

The Spark powers that be will die hard and slow. Everyone is already addicted to the Lakehouse, the greatest tech marketing scheme of this half century.

It appears even Iceberg is in danger. But that’s a story for another day.