I’ve been a Polars bro for most of the last few years. Why? It’s Rust-based, fast, and DataFrame-centric, just the way I like it. It has also had an excellent feature right from the start: lazy execution. A few years ago, maybe two, I actually put Polars into production, running on Airflow, working with S3, and reading Delta Lake tables.

I was in love.


It’s a fast-paced and ever-changing world we live in; nothing we can do about it. I grew up in the middle of the prairie, right as the internet became mainstream: the age of Doom, Myst, MSN Messenger, Yahoo Pool, and that irreplaceable GoldenEye. And let’s be honest, World of Warcraft on a PC was game-changing. I suppose you could chalk up half my feelings to nostalgia and old-person humdrum; I won’t deny it.

I see the current Agentic AI confusion in the software community as something similar to the old days when I split my time between being a river rat and playing Battlefield 1942 all night long, enraptured by new tech, yet drawn to the old ways.


Recently, I had to migrate a few hundred partitioned Delta Lake tables over to liquid clustering. It seems straightforward on the surface, but everything usually does, until it isn’t. There was no rocket science involved here, but I did want to write this up to help the many others who will probably want or need to move from partitioning to liquid clustering over the next few years.

Databricks recommends liquid clustering for all new Delta Lake tables. Based on past testing, liquid clustering indeed offers significant performance gains.

Again, in terms of DDL, the difference between a partitioned table and a liquid clustered table is small, as you can see.
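
To make that concrete, here is a minimal sketch that builds both `CREATE TABLE` statements side by side. The table name `events`, its columns, and the `create_table_ddl` helper are all hypothetical, made up purely for illustration; the only real difference being shown is the `PARTITIONED BY` versus `CLUSTER BY` clause in Delta Lake DDL.

```python
def create_table_ddl(table: str, columns: dict[str, str], layout: str, keys: list[str]) -> str:
    """Build a Delta Lake CREATE TABLE statement (hypothetical helper).

    layout must be either "PARTITIONED BY" or "CLUSTER BY".
    """
    if layout not in ("PARTITIONED BY", "CLUSTER BY"):
        raise ValueError(f"unsupported layout clause: {layout}")
    cols = ",\n  ".join(f"{name} {dtype}" for name, dtype in columns.items())
    return (
        f"CREATE TABLE {table} (\n  {cols}\n)\n"
        f"USING DELTA\n"
        f"{layout} ({', '.join(keys)})"
    )


# Hypothetical schema, used only to compare the two statements.
columns = {"event_id": "BIGINT", "event_date": "DATE", "payload": "STRING"}

partitioned_ddl = create_table_ddl("events", columns, "PARTITIONED BY", ["event_date"])
clustered_ddl = create_table_ddl("events", columns, "CLUSTER BY", ["event_date"])

print(partitioned_ddl)
print()
print(clustered_ddl)
```

Everything above the last line is identical between the two statements; only the final clause changes. That is the easy part. Changing the DDL says nothing about how the existing data in a partitioned table gets rewritten.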
