The future never shows up quietly. Just when you think you’ve tamed the latest “must-have” technology, a fresh acronym crashes the party. I’d barely finished wrapping my head around the Lakehouse paradigm when Databricks rolled out something new at the 2025 Data & AI Summit: Lakebase, a fully managed PostgreSQL engine built directly into the […]

I’d be lying if I said a small part of me didn’t groan when I first read about Databricks releasing SQL Scripting. Don’t get me wrong—I don’t fault Databricks for giving users what they want. After all, if you don’t feed the masses, they’ll turn on you. We data engineers are gluttons for […]

I’ve been thinking about this for a few days now, and I still don’t know whether to cheer or groan. Some moments, I see DuckLake as a smart, much-needed evolution; other times, it feels like just another unnecessary entry in the ever-growing Lakehouse jungle. Reality, as always, is probably somewhere in between. MotherDuck and […]

Let’s be honest: working with Apache Iceberg stops being fun the moment you step off your local laptop and into anything that resembles production. The catalog system—mandatory and rigid—has long been the Achilles’ heel of an otherwise promising open data format. For a long time, you had two options: over-engineered corporate-grade solutions that require infrastructure […]

Every so often, I have to convert some .txt or .csv file over to Excel format … just because that’s how the business wants to consume or share the data. It is what it is. This means I am often on the lookout for simple, easy-to-use one-liners that I can use to […]
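For what it’s worth, the kind of one-liner I usually reach for looks something like this pandas sketch. The filenames are hypothetical, and it assumes pandas plus an Excel writer engine like openpyxl are installed:

```python
# Minimal sketch: convert a CSV to .xlsx in one line with pandas.
# Assumes pandas and openpyxl are installed; filenames are placeholders.
from pathlib import Path

import pandas as pd

# Write a tiny sample CSV so the sketch is self-contained.
Path("report.csv").write_text("region,sales\neast,100\nwest,250\n")

# The actual one-liner: read the CSV, write it out as Excel.
pd.read_csv("report.csv").to_excel("report.xlsx", index=False)
```

The same pattern works for tab-delimited .txt files by swapping in `pd.read_csv("report.txt", sep="\t")`.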

Rethinking Object Storage: A First Look at Cloudflare R2 and Its Built‑In Apache Iceberg Catalog

Sometimes, we follow tradition because, well, it works—until something new comes along and makes us question the status quo. For many of us, Amazon S3 is that well‑trodden path: the backbone of our data platforms and pipelines, used countless times each day. If […]

Running dbt on Databricks has never been easier. The integration between dbt Core and Databricks could not be simpler to set up and run. Wondering how to approach running dbt models on Databricks with SparkSQL? Watch the tutorial below.
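As a rough sketch of what that setup involves, here is a hypothetical profiles.yml for the dbt-databricks adapter. The project name, catalog, schema, host, and warehouse path below are all placeholder assumptions, not values from the tutorial:

```yaml
# Hypothetical profiles.yml sketch for the dbt-databricks adapter.
# Every value is a placeholder; substitute your own workspace details.
my_dbt_project:
  target: dev
  outputs:
    dev:
      type: databricks
      catalog: main            # Unity Catalog name (assumption)
      schema: analytics        # target schema (assumption)
      host: dbc-xxxx.cloud.databricks.com
      http_path: /sql/1.0/warehouses/xxxx
      token: "{{ env_var('DATABRICKS_TOKEN') }}"
```

With a profile like this in place, `dbt run` compiles your SQL models and executes them on the Databricks SQL warehouse the profile points at.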

There are things in life that are satisfying—like a clean DAG run, a freshly brewed cup of coffee, or finally deleting 400 lines of YAML. Then there are things that make you question your life choices. Enter: setting up Apache Polaris (incubating) as an Apache Iceberg REST catalog. Let’s get one thing out of the […]

I make it my duty in life to never have to open an Excel file (xlsx); I feel like if I do, then I made a critical error in my career trajectory. But I recently had no choice but to open an Excel file on a Mac (or try to) to look at some sample data from […]

Context and Motivation

dbt (Data Build Tool): A popular open-source framework that organizes SQL transformations in a modular, version-controlled, and testable way.

Databricks: A platform that unifies data engineering and data science pipelines, typically with Spark (PySpark, Scala) or SparkSQL.

The post explores whether a Databricks environment—often used for Lakehouse architectures—benefits from dbt, especially if […]