The future never shows up quietly. Just when you think you’ve tamed the latest “must-have” technology, a fresh acronym crashes the party. I’d barely finished wrapping my head around the Lakehouse paradigm when Databricks rolled out something new at the 2025 Data & AI Summit: Lakebase, a fully managed PostgreSQL engine built directly into the […]

I’d be lying if I said a small part of me didn’t groan when I first read about Databricks releasing SQL Scripting. Don’t get me wrong—I don’t fault Databricks for giving users what they want. After all, if you don’t feed the masses, they’ll turn on you. We data engineers are gluttons for […]

I’ve been thinking about this for a few days now, and I still don’t know whether to cheer or groan. Some moments, I see DuckLake as a smart, much-needed evolution; other times, it feels like just another unnecessary entry in the ever-growing Lakehouse jungle. Reality, as always, is probably somewhere in between. MotherDuck and […]

Let’s be honest: working with Apache Iceberg stops being fun the moment you step off your local laptop and into anything that resembles production. The catalog system—mandatory and rigid—has long been the Achilles’ heel of an otherwise promising open data format. For a long time, you had two options: over-engineered corporate-grade solutions that require infrastructure […]

Every so often, I have to convert some .txt or .csv file over to Excel format … just because that’s how the business wants to consume or share the data. It is what it is. This means I am often on the lookout for simple, easy-to-use one-liners that I can use to […]
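For what it’s worth, the kind of one-liner I usually reach for looks something like this pandas sketch. The filenames are hypothetical, and it assumes pandas plus an Excel writer engine like openpyxl are installed:

```python
# Minimal sketch: convert a CSV to .xlsx in one line with pandas.
# Assumes pandas and openpyxl are installed; filenames are placeholders.
from pathlib import Path

import pandas as pd

# Write a tiny sample CSV so the sketch is self-contained.
Path("report.csv").write_text("region,sales\neast,100\nwest,250\n")

# The actual one-liner: read the CSV, write it out as Excel.
pd.read_csv("report.csv").to_excel("report.xlsx", index=False)
```

The same pattern works for tab-delimited .txt files by swapping in `pd.read_csv("report.txt", sep="\t")`.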

Rethinking Object Storage: A First Look at Cloudflare R2 and Its Built‑In Apache Iceberg Catalog

Sometimes, we follow tradition because, well, it works—until something new comes along and makes us question the status quo. For many of us, Amazon S3 is that well‑trodden path: the backbone of our data platforms and pipelines, used countless times each day. If […]

Running dbt on Databricks has never been easier. The integration between dbt Core and Databricks could not be simpler to set up and run. Wondering how to approach running dbt models on Databricks with SparkSQL? Watch the tutorial below.
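As a rough sketch of what that setup involves, here is a hypothetical profiles.yml for the dbt-databricks adapter. The project name, catalog, schema, host, and warehouse path below are all placeholder assumptions, not values from the tutorial:

```yaml
# Hypothetical profiles.yml sketch for the dbt-databricks adapter.
# Every value is a placeholder; substitute your own workspace details.
my_dbt_project:
  target: dev
  outputs:
    dev:
      type: databricks
      catalog: main            # Unity Catalog name (assumption)
      schema: analytics        # target schema (assumption)
      host: dbc-xxxx.cloud.databricks.com
      http_path: /sql/1.0/warehouses/xxxx
      token: "{{ env_var('DATABRICKS_TOKEN') }}"
```

With a profile like this in place, `dbt run` compiles your SQL models and executes them on the Databricks SQL warehouse the profile points at.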

There are things in life that are satisfying—like a clean DAG run, a freshly brewed cup of coffee, or finally deleting 400 lines of YAML. Then there are things that make you question your life choices. Enter: setting up Apache Polaris (incubating) as an Apache Iceberg REST catalog. Let’s get one thing out of the […]

I make it my duty in life to never have to open an Excel file (xlsx); I feel like if I do, then I made a critical error in my career trajectory. But I recently had no choice but to open an Excel file on a Mac (or try to) to look at some sample data from […]

Context and Motivation

dbt (Data Build Tool): A popular open-source framework that organizes SQL transformations in a modular, version-controlled, and testable way.

Databricks: A platform that unifies data engineering and data science pipelines, typically with Spark (PySpark, Scala) or SparkSQL.

The post explores whether a Databricks environment—often used for Lakehouse architectures—benefits from dbt, especially if […]