ORMs are the Cigarettes of the Data Engineering World.

Seriously, just don’t do it, they are bad for you. Listen to your mother, just say no. The dreaded ORMs (Object-Relational Mapping) do all the hard SQL work for you, but they come with many unintended consequences that are bad for your health and wellness in the long term. Many unsuspecting victims have been sucked into ORMs by the promise of an easier transition: a familiar object-oriented design pattern for manipulating the data in a relational database, say Postgres or MySQL.

Again I tell you, don’t fall for the siren songs, there are tears and sorrow down the long and lonely ORM road.

What are ORMs, who uses them, and why?

Not familiar with ORMs? This StackOverflow question gives a good overview. What really is an ORM?

“Escape using SQL to access and manipulate databases, in favor of your programming language of choice.”

– Me

It just sounds so nice, doesn’t it? I mean if you are writing Python or Scala all day long, why in the world would you want to worry about picking up SQL, when there is a tool that can do it for you? Especially a tool that fits the object-oriented model of code you are probably writing in.

But there is a dark side to this story. If something sounds too good to be true, it probably is, and the side effects sneak up on you. Especially if you are at the beginning of your data engineering career.

4 Reasons you SHOULDN’T use an ORM.

Data Engineers especially should not be using ORMs. It is tantamount to heresy, burn-you-at-the-stake heresy.

  • You don’t learn SQL, and SQL is everywhere and is going nowhere. Think Spark SQL, Snowflake, etc.
  • You don’t make as many mistakes and struggle, which is how you learn.
  • You are less likely to think about the consequences of a query you can’t see.
  • Poor query performance.

You don’t learn SQL … well, when you use an ORM.

One of my biggest complaints with ORMs in general is their amazing ability to obfuscate SQL and general database knowledge, especially in newish developers. When using an ORM you usually stay in the comfort of your programming language. I mean, that’s the whole point of the ORM really. When you don’t learn to write out SQL statements, you typically are not going to learn many important concepts with much clarity, and this has many unintended consequences.

Basic SQL fundamentals like SELECTing only the columns you need, what a GROUP BY is and does, the ever-famous subquery … these lessons lose their potency in an ORM.

Let’s take the example of SQLAlchemy, probably the most popular Python ORM. Let’s take a look at an example from a StackOverflow question about what a subquery looks like.

subquery = session.query(Apartments.id).filter(Apartments.postcode==2000).subquery()
query = session.query(Residents).filter(Residents.apartment_id.in_(subquery))

It’s pretty obvious what’s going on, but you lose something in the translation away from SQL, and this is a very basic example!
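For comparison, here is roughly the SQL that ORM snippet stands in for, run against an in-memory SQLite database with Python’s built-in sqlite3 module. This is only a sketch: the table names, column names, and sample rows are assumptions inferred from the model names above, not anything from the original StackOverflow question.

```python
import sqlite3

# Hypothetical schema and sample rows, inferred from the Apartments/Residents models.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE apartments (id INTEGER PRIMARY KEY, postcode INTEGER);
    CREATE TABLE residents (
        id INTEGER PRIMARY KEY,
        name TEXT,
        apartment_id INTEGER REFERENCES apartments (id)
    );
    INSERT INTO apartments (id, postcode) VALUES (1, 2000), (2, 2000), (3, 3000);
    INSERT INTO residents (id, name, apartment_id) VALUES
        (1, 'Ann', 1), (2, 'Bob', 2), (3, 'Cy', 3);
""")

# The subquery is right there on the page: the inner SELECT feeds the IN (...) filter.
sql = """
    SELECT id, name, apartment_id
    FROM residents
    WHERE apartment_id IN (
        SELECT id FROM apartments WHERE postcode = 2000
    )
    ORDER BY id
"""
residents_in_2000 = conn.execute(sql).fetchall()
print(residents_in_2000)
```

Written this way, the columns you pull back and the filter you apply are decisions you have to make yourself, which is the whole point.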

Not writing out SQL statements helps you NOT to think about indexes, joins, where clauses and how they all tie together to make a performant query.
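To make the index point concrete, here is a small sketch that asks SQLite for its query plan before and after adding an index. The residents table and the index name are hypothetical, and the exact wording of the plan text varies between SQLite versions, so treat the printed output as illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table, made up for this illustration.
conn.execute(
    "CREATE TABLE residents (id INTEGER PRIMARY KEY, name TEXT, apartment_id INTEGER)"
)

def plan(sql):
    """Return SQLite's query plan for a statement as one readable string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return " | ".join(row[-1] for row in rows)

query = "SELECT id, name FROM residents WHERE apartment_id = 42"

plan_before = plan(query)  # no index on apartment_id yet: expect a full table scan
conn.execute("CREATE INDEX idx_residents_apartment ON residents (apartment_id)")
plan_after = plan(query)   # the planner can now search via the new index

print("before:", plan_before)
print("after: ", plan_after)
```

Other databases have their own version of this (EXPLAIN in Postgres and MySQL), and it is a habit you tend to pick up only when you write the query by hand.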

You don’t make as many mistakes.

Now some folks probably think this is a good thing, but it really isn’t. Mistakes are how we learn lessons. When you aren’t allowed to make mistakes, it’s harder for learning to take place.

There is something about writing an intricately laid out, complex SQL statement full of multi-layered CTEs and sub-queries that bends your brain a certain way. Being able to write and build a query slowly in your favorite editor while running the results and iterating … it’s a rare thing that allows for a special kind of learning about what exactly makes a high-performance query tick.

Take this example from the SQLAlchemy documentation:

>>> inner_stmt = select(User).where(User.id < 7).order_by(User.id)
>>> subq = inner_stmt.subquery()
>>> aliased_user = aliased(User, subq)
>>> stmt = select(aliased_user)
>>> for user_obj in session.execute(stmt).scalars():
...     print(user_obj)

Most likely, if you’re using an IDE like PyCharm to do this work, your autocomplete is going to tell you whether you’re calling some method correctly or not. It looks a bit like SQL, yet writing it is nothing like writing plain SQL.
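Here is a rough hand-written counterpart to that documentation example, again using Python’s built-in sqlite3 module. The user_account table name, its columns, and the sample rows are assumptions made for illustration. I’ve written the inner query as a CTE and put the ORDER BY on the outer query, so this is a sketch of the same idea, not the exact SQL that SQLAlchemy emits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and rows standing in for the User model.
conn.execute("CREATE TABLE user_account (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO user_account (id, name) VALUES (?, ?)",
    [(i, f"user{i}") for i in range(1, 11)],
)

# Build it up in steps: run the inner SELECT on its own first,
# then wrap it in a CTE and query the result.
sql = """
    WITH low_ids AS (
        SELECT id, name
        FROM user_account
        WHERE id < 7
    )
    SELECT id, name
    FROM low_ids
    ORDER BY id
"""
low_id_users = conn.execute(sql).fetchall()
for row in low_id_users:
    print(row)
```

No autocomplete is going to hold your hand here. If you get the CTE wrong, the database tells you, and you learn something.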

No pain no gain.

Not thinking about the consequences of a query you can’t see.

I’ve seen firsthand the devastation that total dependence on an ORM can cause in a real-world production system. You want to know what it looks like?

  • database problems because of horrible queries produced by many developers.
  • developer frustration because of poor query performance that is blamed on the database, but can’t be fixed in the database because it’s a poor query.
  • developers trying to “manipulate” the ORM to try to “get it to write the query a different way.”
  • senior engineers resorting to bypassing the ORM and writing out the intricate SQL queries to solve the above issues.
  • senior engineers struggling to do the above because they’ve only ever used an ORM and don’t have the skills to tune a query.

I understand the benefits of an ORM and why many people use them. It’s probably fine for your small web app. It’s probably not fine for anything else in a high-volume production environment.

If you can’t actually see the SQL that is being generated by some ORM … chances are you are missing out on a lot.
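If you are stuck with an ORM, at least make the SQL visible. One option that works no matter what is generating the queries is to log statements at the connection level. Below is a minimal sketch using the set_trace_callback hook in Python’s built-in sqlite3 module; the apartments table is hypothetical. SQLAlchemy offers something similar: passing echo=True to create_engine logs the statements it emits.

```python
import sqlite3

seen_statements = []  # every SQL statement that reaches the database lands here

conn = sqlite3.connect(":memory:")
# The callback is invoked with the text of each statement the connection executes.
conn.set_trace_callback(seen_statements.append)

# Hypothetical table and queries; imagine these calls coming from an ORM instead.
conn.execute("CREATE TABLE apartments (id INTEGER PRIMARY KEY, postcode INTEGER)")
conn.execute("INSERT INTO apartments (postcode) VALUES (?)", (2000,))
conn.execute("SELECT id FROM apartments WHERE postcode = ?", (2000,)).fetchall()

for statement in seen_statements:
    print(statement)
```

Depending on your Python version you may also see implicit transaction statements such as BEGIN in the log, which is a nice reminder of how much goes on that you never typed.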

Musings

Of course I’m biased. I got my start writing the most complex SQL statements possible in Data Warehouse environments for years, and I’ve spent plenty of time identifying the most expensive database queries and rewriting them to take them from minutes down to milliseconds. I have a special place in my heart for SQL. It’s powerful, and when you master it you start to master the underlying relational database system and how to make it do what you want.

I’ve seen ORMs frustrate senior and staff level software engineers who are the smartest people on the planet, but their code comes to nothing because they can’t solve an intermediate level SQL problem. It shouldn’t be like this.

You should skip the ORM and learn how to write SQL yourself. Your future self will thank you.