Home - Confessions of a Data Guy

DuckDB vs Polars. Wait. DuckDB and Polars.

So, the classic newbie question. DuckDB vs Polars, which one should you pick? This is an interesting question, and actually drives a lot of search traffic to this website on which you find yourself wasting time. I thank you for that. This is probably the most classic type of question that all developers eventually ask […]

September 25, 2025

Uncategorized

Full vs Incremental Data Loads Explained

September 18, 2025

Big Data, Data, Data Engineering, DuckDB

Apache Iceberg Writes with DuckDB (or not)

Well, all the bottom feeders (Iceberg and DuckDB users) are howling at the moon and dancing around a bonfire at midnight trying to cast their evil spells on the rest of us. Apache Iceberg writes with DuckDB? Better late than never I suppose. Your witchy ways won’t work on me. Not going to lie, Iceberg […]

September 18, 2025

Big Data, Data, Data Engineering

How to tune Spark Shuffle Partitions.

So, you’re just a regular old Data Engineer crawling along through the data muck, barely keeping your head above the bits and bytes threatening to drown you. At one point in time you were full of spit and vinegar and enjoyed understanding and playing with every nuance known to man. But now you are old and […]

September 12, 2025

Big Data, Data, Data Engineering, Data Warehousing

Is Data Modeling Dead?

Ok, not going to lie, I rarely find anything of value in the dregs of r/dataengineering, mostly, I fear, because it’s 90% freshers with little to no experience. These green-behind-the-ears know-it-all engineers who’ve never written a line of Perl, SSH’d into a server, and have no idea what a LAMP stack is. […]

September 8, 2025

Uncategorized

The Fastest Way to Insert Data to Postgres

I was recently working on a PySpark pipeline in which I was using the JDBC option to write about 22 million records from a Spark DataFrame into a Postgres RDS database. Hey, why not use the built-in method provided by Spark, how bad could it be? I mean it’s not like the creators and […]

August 30, 2025

AI, Big Data, Data, Data Engineering, Python

Polars on GPU: Blazing Fast DataFrames for Engineers

Did you know that Polars, that Rust-based DataFrame tool that is one of the fastest tools on the market today, just got faster?? There is now GPU execution available in Polars that makes it 70% faster than before!!

August 28, 2025

Uncategorized

The Medallion Architecture Farce.

I can no longer hold the boiling and frothing mess of righteous anger that starts to rumble up from within me when I hear the words “Medallion Architecture” in the context of Data Modeling, especially when it’s used by some young Engineer who doesn’t know any better. Poor saps who have been born into a […]

August 27, 2025

Uncategorized

DuckDB … Merge Mismatched CSV Schemas. (also testing Polars)

I recently encountered a problem loading a few hundred CSV files, which contained mismatched schemas due to a handful of “extra” columns. This turned out to be not an easy problem for Polars to solve, in all its Rust glory. That made me curious: how does DuckDB handle mismatched schemas of CSV files? Of course, […]

August 22, 2025

Uncategorized

polars.exceptions.ComputeError: schema lengths differ

So, you are happily using the new Rust GOAT dataframe tool Polars to munge messy data, maybe like me, messing with 40 GB of CSV data over multiple files. You are pretty much going to run into this error:

    polars.exceptions.ComputeError: schema lengths differ

    This error occurred with the following context stack:
        [1] 'csv scan'
        [2] 'select'

August 20, 2025
