, ,

Date and DateTime Manipulation in Polars

One thing all Data Engineers are doomed to do in purgatory will be to solve different date and datetime problems in an endless loop. I’m sure of it. I can’t imagine anything worse, so that must be it. Either way the constant need to manipulate dates and datetimes are just a way of life, something that never ends and never changes. Also, it appears Polars is here to stay from what I can tell. Not a fad like that Data Mesh. Since Polars is here to stay, (I’ve already got it running in production at my company, (don’t mind if I bow)), we should probably take a gander at how to manipulate date and datetime objects from both the Dataframe and (if I have time) SQL perspective. See if we can find anything to complain about. I like to complain.

Polars dates and datetimes

You know, before I dig into the documentation and start writing code, to be fair, I should just come up with a list of things that normally take place with date and datetime objects, and then just make sure Polars can do those things in a reasonable manner. Rather than starting backward and starting with the documentation. I don’t want to cloud my mind. I want just to have a set of things I normally do with dates, and then see how easy or not it is to do those things in Polars. That is the best and fairest test probably.

Well, here goes nothing. All code on GitHub.

  • convert a string to a date
  • convert a string to a datetime
  • convert a date or datetime back to a string
  • pull out year, month, and day from a date or datetime object.
  • pull the quarter out of a date or datetime object.
  • get the diff between two date or datetime objects.
  • Add an arbitrary number of days to a date or datetime object.

That’s basic enough and enough for us probably, at least to find out if Polars is coming up to snuff for basic manipulations of date or datetime objects.

Polars Python code for date and datetime manipulation.

Who knows, maybe this will be boring, but there is no time like the present, let’s just dive into each one of the items above. I will attempt to solve each problem with both the Dataframe API and (if I have time) SQLContext.

convert a string to a date

This isn’t too bad with the Dataframe API. Basically, if we have a Dataframe with a String column, we can just .str and then .to_datetime() on that. By default, it tries to figure out the format itself, which is straightforward if it’s normal text.

Otherwise, you can pass an format if need be and pass the normal chrono type formats like `%Y-%m-%d %H:%M:%S` that we are all used to.

I’m curious about the strange .str.to_datetime() requirements in Python, as someone who writes Rust, it feels very … well … Rusty to me. Not Pythonic at all. Makes no sense to me. It should be simply pl.col('column').to_datetime() and error if the data types are wrong or something.

I get it though, and understand why they do that.

Before I go any further, I’m curious what happens if I simply. just .cast('date') on the string column. Will that shortcircuit the extra call to .str.to_datetime()?

Yeah … no, that was too much wishful thing there. Although it did give an interesting message … consider using `strptime` well … don’t mind if I do there Sunny Jim. Looks like it works. I’ll be buggered if I know the difference between to_datetime() and strptime() … aka why you would use one over the other.

Well, we pounded that dead horse a little much. Let’s try a string to date with SQL in Polars and see what happens.

I did some hunting trying to find what kind of SQL is supported by Polars. Apparently not much.

“Polars does not support the full SQL language” – docs

I had heard rumblings on Linkedin some time ago that the new release of Polars included a DATE() method in SQL. So let’s give it a try. I could not find a reference to it in any of the documentation that I looked at. Buggers. By George, it worked! Three cheers for Polars SQL!

convert a string to a datetime

I was going to spend some time here, but I think I will not. I will show a simple example and move on. Since the string to date worked, we can assume based on one example that string to datetime probably works well. This is accomplished with .str.to_datetime().

Let’s move on.

convert a date or datetime back to a string

One of the other common things I find myself doing is the opposite of what we’ve just done, taking some date or datetime and give me a string back. Comes in handy. This can be done with .dt.strftime()

pull out year, month, and day from a date or datetime object.

Well, yet another simple but important date buggery we all do quite often is pull year, month, or day from a date or datetimeobject. Being able to do this is useful in all sorts of situations.

Very handy indeed. Yes, there is every support for things like quarter with dt.quarter(), other stuff as well, like minute() , weekday() , month_start() , month_end() , and other little tidbits. A good sign indeed for Polars.

get the diff between two date or datetime objects.

I did have trouble finding the ability to do this. I saw some StackOverflow posts about people using numpy inside Polars to accomplish this and were complaining about its slowness on large datasets. I’m interested in if Polars can do this by itself without any help.

Getting a date diff is a very common problem to solve.

I did find an interesting method called duration , docs here. This is something akin to the ability to add or subtract a temporal unit from an existing date or datetime object. Something like below you can add a day. I’m glad I found this, not what I was looking for, but very useful.

Then I had a thought. What if Polars is so wonderful I can simply just subtract one date from another. Would this work? Hammer of Thor it does!

So nice!!!

Conclusion.

That was a wonderful breath of fresh air for sure! I was able to pretty much easily figure out and implement all the basic date and datetime manipulations in Polars without much trouble at all. Nothing seemed overly strange and wonky to figure out and implement.

It appears Polars has first-class support for date and datetime manipulation. Polars truly is a production-ready package, ready to beat Pandas into the dust and far distance past. I’ve already implemented it in Production where I work. What are you waiting for? Get to it.

All code on GitHub.