Never Put Databricks Notebooks in Production

Recently an Architect at Databricks recommended that people use Notebooks for Production workloads. That is a very bad, horrible idea. For most people it means very expensive compute (All-Purpose Clusters), and it leads to horrible development practices. It set off a firestorm on LinkedIn when I commented that people SHOULD NOT follow this advice.

The Best Piece of Software Engineering Advice

You probably think this is another internet clickbait title, huh? Just trying to get you to clickty clickty so I can sell you some Google Ads. Two problems: I don’t have Google Ads, and I know only a small percentage of people will actually listen to this advice. Whatever. There is a reason some developers struggle to move past the Senior role.

Instead of making you scroll to the bottom to get what you came for, I’m going to give it to you right off the top. How nice of me.

Why I Love Rust, but Deploy Python

I’m not sure if others have this same problem. Maybe they are lucky: they get to build in their favorite language, their tool of choice, 24/7. I feel like I have a great burden to bear, a heavy one. I love to write Rust … but I deploy Python. Even when I know I could write Rust … Python gets deployed.

New SQL Practice Problems

I’m trying something new. I get a lot of questions from folks about getting into the Data Engineering space, how to get better, grow, learn, etc.

So I came up with a solution: SQL Practice Problems.

The Abstraction Problem – A Great Evil

There is a great evil Spirit haunting the streets of code in the land of programmers. It’s a Spirit of obfuscation, of twisting things into what they are not. The Spirit wanders around on the loose looking for someone, and it finds ready victims among the ranks of new programmers and the innocent young minds in University. It also finds a few old, wizened souls who have been wandering lost for decades through the halls of some musty Fortune 500 company.

It’s the Abstraction Problem.

The Difficulties of a Senior Engineer … are not Engineering

Well, I hate to break the news to you, but I was the same when I first started writing code. I was a zealot. I was zealous for every new thing I learned: every new language, every new approach. I would find the preacher who was preaching the message I wanted to hear … OOP, functional, Kimball, this, that, the other thing.

You’re young and full of life. You think that your Software career revolves around … software. The pinnacle of your mountain seems to be becoming that “perfect programmer” who can write anything without any bugs.

Yet, when you get to the top of the mountain you find you’ve been deceived. Not all that glitters is gold.

Polars vs Spark

SQL Bad, Reddit Mad

SparkSQL is Destroying your Pipelines

It’s true, even if you don’t want it to be. SparkSQL is destroying your data pipelines and possibly wreaking havoc on your entire data team, infrastructure, and life. In your heart of hearts, you’ve probably known it for years. With great power comes great responsibility, and we all know that even we Data Engineers are human and fallible.

Once those tentacles of SparkSQL get their hold on you, the probability of survival is low. Sure, there are a few wizened old engineers with enough battle scars to make it through unscathed. The rest of us will be maimed.

Datafusion SQL CLI – Look Ma, I made a new ETL tool.

Sometimes I just need something new and interesting to work on, to keep me engaged. A few days ago I was lying by the river next to a fire, with the cold air blowing on my face and the eagles soaring above. As I was contemplating life and data engineering, something flitted across my mind: just a little fragment of an idea someone had written about.

The little fragment had to do with Datafusion, a Rust-based query engine, and something about it having a SQL CLI.

What an interesting thing. I’ve used Datafusion a few times here and there, and I love Rust because it’s fast. I’m a Data Engineer, so I’m eternally enslaved to SQL whether I like it or not. This whole thing just seemed like an interesting little tidbit to poke at.

It basically made me wonder if I could combine the Datafusion SQL CLI with bash into a new ETL tool. Simple, small, fast, and maybe fun? Just because I can?
