Confessions of a Data Guy

Big Data, Data, Data Engineering, Python

The Dog Days of PySpark

PySpark. One of those things you both hate and love, well … it's kinda hard not to love. PySpark is the abstraction that lets a bazillion Data Engineers forget about that blight, Scala, and cuddle their wonderfully soft and ever-kind Python code, while choking down gobs of data like some Harkonnen glutton. But that comes with […]

April 15, 2023

Rust

QuickSort in Rust!

April 6, 2023

Data, Data Engineering

Polars vs Spark. Real Talk.

Real talk. Polars is all the rage. People love Spark. People use Spark for small data, the kind that is still too big for Pandas. Spark runs on a local machine. Polars runs on a local machine. What do I choose, Spark or Polars? Does it matter? I’ve written about Polars at different points, here and here […]

March 28, 2023

Data, Data Engineering

Introduction to Linked Lists.

March 26, 2023

AI, Data, Data Engineering

Future-Proof Yourself Against AI.

March 23, 2023

Uncategorized

AWS Lambdas. Useful for Data Engineering?

Are Lambdas one of those tools that everyone uses and no one talks about? I guess I’ve taken them for granted over the years, even though they are incredibly useful. For a lot of my Data Engineering career I didn’t really think about or use AWS Lambdas; I just saw them as annoying little flies […]

March 20, 2023