SQL Archives - Confessions of a Data Guy

Databricks Metric Views and the Reality of the Semantic Layer

I’ve written before about the elusive “Semantic Layer,” that mythical construct every data team eventually talks about building. It’s the idea of pulling all business logic, calculations, and definitions into a single place so everyone agrees on what the numbers actually mean. Anyone who has worked in data long enough knows the pain this is trying to solve. Logic gets scattered across pipelines, dashboards, notebooks, and random scripts, and before long, no one can explain why two reports show different answers for the same metric.

Despite decades of industry experience, we still struggle with this. Data teams continue to fight their way through repos, documentation, and tribal knowledge just to understand how a number is calculated. It’s not that we don’t know better; it’s that systems naturally drift toward complexity and inconsistency over time.

March 24, 2026

Data, Data Engineering, SQL

What is SQLMesh and how is it different from dbt?

SQLMesh is an open-source framework for managing, versioning, and orchestrating SQL-based data transformations.
It’s in the same “data transformation” space as dbt, but with some important design and workflow differences.

What SQLMesh Is

SQLMesh is a next-generation data transformation framework designed to ship data quickly, efficiently, and without error. Data teams can efficiently run and deploy data transformations written in SQL or Python with visibility and control at any size.

So … what you are telling me is that it’s dbt … but with Python? Interesting enough concept, I should say. One would have to surmise that most people using SQLMesh would be using … SQL! Look at how smart I am.

August 14, 2025

Big Data, Data, Data Engineering, Data Warehousing, SQL

Duplicates in Data and SQL

You know, after literally multiple decades in the data space, writing code and SQL, at some point along that arduous journey, one might think this problem would be solved by me, or the tooling … yet alas, not to be.

Regardless of the industry or the tool, whether Pandas, Spark, or Postgres, duplicates are a common issue in pipelines, and in SQL they remain the most classic and iconic problem of all. Things just never change, and humans never learn their lessons; at least I don’t.
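For the curious, here is a minimal sketch of one common way to knock out duplicates in SQL: rank the rows inside each key with `ROW_NUMBER()` and keep only the first. It is not necessarily the approach the full post takes. The `orders` table, its columns, and the `dedupe_latest` helper are all made up for illustration, and it assumes the SQLite bundled with your Python is 3.25 or newer (window function support).

```python
import sqlite3


def dedupe_latest(conn):
    """Return one row per order_id, keeping the most recently updated row."""
    query = """
        SELECT order_id, status, updated_at
        FROM (
            SELECT
                order_id,
                status,
                updated_at,
                -- Number the rows within each order_id, newest first.
                ROW_NUMBER() OVER (
                    PARTITION BY order_id
                    ORDER BY updated_at DESC
                ) AS rn
            FROM orders
        )
        WHERE rn = 1
        ORDER BY order_id
    """
    return conn.execute(query).fetchall()


# Hypothetical sample data: order 1 has a stale row, order 2 an exact duplicate.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, status TEXT, updated_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "pending", "2025-07-01"),
        (1, "shipped", "2025-07-03"),
        (2, "pending", "2025-07-02"),
        (2, "pending", "2025-07-02"),
    ],
)

rows = dedupe_latest(conn)
print(rows)
```

The same `ROW_NUMBER()` pattern carries over to Postgres and Spark SQL more or less unchanged. What you order by inside the window decides which duplicate survives, so pick that column on purpose.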

July 29, 2025

Big Data, Data, Data Engineering, Python, SQL

DuckDB … reading from s3 … with AWS Credentials and more.

In my never-ending quest to plumb the most boring depths of every single data tool on the market, I found myself annoyed when recently using DuckDB for a benchmark that was reading Parquet files from S3. What was not clear, or easy, was figuring out how DuckDB would LIKE to read default AWS credentials.

November 18, 2024

Data, Data Engineering, Python, SQL

Building Databricks Data Pipelines 101

Have you ever wondered, at a high level, what it’s like to build production-level data pipelines on Databricks? What does it look like? What tools do you use?

March 29, 2024

SQL

Data Modeling Is Easy

When you’ve been data modeling as long as I have, it gets to be the same old … same old.

People make data modeling harder than it has to be. There is a lot of jargon that gets thrown around … third-normal-form, OLAP, OLTP … I give you the 3-4 basics that are at the core of data modeling.

March 14, 2024

Data, Data Engineering, SQL, Uncategorized

DuckDB has MAJOR Problems! OOM Errors.

I recently did a challenge. The results were clear. DuckDB CANNOT handle larger-than-memory datasets. OOM Errors. See link below for more details.

… DuckDB vs Polars – Thunderdome. 16GB on 4GB machine Challenge.

March 8, 2024

Data, Data Engineering, SQL

SQL Bad, Reddit Mad

December 30, 2023

Data, Data Engineering, SQL

SparkSQL is Destroying your Pipelines

It’s true, even if you don’t want it to be. SparkSQL is destroying your data pipelines and possibly wreaking havoc on your entire data team, infrastructure, and life. In your heart of hearts, you’ve probably known it for years. With great power comes great responsibility. We all know that even we Data Engineers are human and fallible.

Once those tentacles of SparkSQL get their hold on you, the probability of survival is low. Sure, there are a few wizened old engineers with enough battle scars to make it through unscathed. The rest of us will be maimed.

December 24, 2023

Data, Data Engineering, SQL

Datafusion SQL CLI – Look Ma, I made a new ETL tool.

Sometimes I just need something new and interesting to work on, to keep me engaged. A few days ago I was lying by the river next to a fire, with the cold air blowing on my face and the eagles soaring above. Thinking about and contemplating life and data engineering … something flitted across my mind, just a little fragment of an idea someone had written about.

The little fragment had to do with Datafusion, a Rust-based query engine, and something about it having a SQL CLI interface.

What an interesting thing. I’ve used Datafusion a few times, here and there, and I love Rust because it’s fast. I’m a Data Engineer, so I’m eternally enslaved to SQL whether I like it or not. This whole thing just seemed like an interesting little tidbit to poke at.

It basically made me wonder if I could combine the Datafusion SQL CLI with bash into a new ETL tool. Simple, small, fast, and maybe fun? Just because I can?

December 21, 2023
