, ,

Introduction to Postgres with Python

Python and Postgres, a match made in heaven.

If there was ever a match made in heaven, it’s using Python and Postgres together. They were made for each other. Both are fun and easy to use, addicting, both have so many surprises and hidden gems. Like Gandalf and Frodo, the two just go together. Today I want to go through the basics of interacting with Postgres using Python. In the beginning of my data career this was often a point of pain, even though it seems like it should be easy. Let’s hit on the basics and then a few of the not-so-obvious things I wish I would have known in the beginning.

Connecting to Postgresql with Python.

First, if you don’t have a Postgres instance to play around with, check out the official install instructions and get yourself a local copy. Next open up your Python and terminal. There are a few different pip install packages that will allow you to access Postgres from Python. Here are the major choices, in order of preference.

  • psycopg2
  • sqlalchemy
  • pyodbc

The Basics of a Python Connection to Postgresql

pip install psycopg2

Let’s start with the psycopg2 package, this would be the recommended way of connecting to Postgres with Python. The first step you will need to complete is getting the URI/URL for your Postgres connection to the database and server. This gives a lot of people trouble in the beginning, the first few times you have to put one together. Here are the options for parameters if you will.

  • host
  • database
  • port
  • user
  • password

Probably some of the confusion comes in because depending on the setup, only some of the parameters are needed, and of course there are different ways to indicate the URI and or parameters. You will probably see people connecting in two different ways.

Let’s give it a try.

import psycopg2

host = 'localhost'
database = 'my_database'
user = 'postgres'
pass = 'postgres'

#  connecting with params
try:
    conn = psycopg2.connect(host=host, database=database, user=user, password=pass)
except psycopg2.Error as e:
    print(f'Had problem connecting with error {e}.')

# or this way with URI (Postgres 9.2 and above)
uri = f'postgresql://{user}:{pass}@{host}:{port}/{database}'
try:
    conn = psycopg2.connect(uri)
except psycopg2.Error as e:
    print(f'Had problem connecting with error {e}.')

One thing to note, never make a database call or connection without catching and raising errors. You always need to know what went wrong, which will happen often when sending queries to the database and pulling records back.

Ok, so now that we’ve made a connection to our database we should talk about something. Never take a database connection for granted. Clean up after yourself, if you have a DBA he will thank you. Databases many times have connection limits, you don’t want to leave tons of orphaned connections to a database that the server has to clean up. If you make a connection object, in our case conn, call conn.close() when done.

There is something else that can be confusing about Python psycopg2 connections, you will use it to generate a database cursor, which in turn will execute SQL statements, but committing or rolling-back transactions is done with a connection object. Otherwise look into “auto-commit” if you don’t like being in control of your own destiny.

Always remember that you must COMMIT or ROLLBACK transactions on your CONNECTION object, not your CURSOR.

This is visually what a Python/Postgres database Connection and Cursor look like, how they relate, and where the important methods are called.

Python Postgres Database Cursors

Let’s make a cursor and create a database table. Remember a connection gives you that pipe to your database, a cursor is needed to execute a query against the database. The most important method of a database cursor is the .execute() method, or executemany(). We can get into that later.

import psycopg2

host = 'localhost'
database = 'my_database'
user = 'postgres'
pass = 'postgres'

try:
    conn = psycopg2.connect(host=host, database=database, user=user, password=pass)
except psycopg2.Error as e:
    print(f'Had problem connecting with error {e}.')

database_cursor = conn.cursor()
query = 'CREATE TABLE books (id INT NOT NULL, author varchar(150), title varchar(50));'
try:
    database_cursor.execute(query)
    conn.commit()
except psycopg2.Error as e:
    print(f'Problem executing query {query} with error {e}.')
    conn.rollback()

Notice of course how I wrapped the cursor execute() call in a try except block to catch errors. Also notice how I used the database connection to commit() the transaction if no errors are caught, as well as using the connection to rollback() the transaction if there is an error.

Executing a Python Postgres psycopg2 Query with Parameters.

This is probably the next area that gives people the most trouble. Once you’ve learned how to connect to your database and execute a simple query, most people are probably going to want to parameterize those queries at some point, passing in some values from the Python script. I’m not going to get into security, but I’m sure you see the implications here, so do your own research.

It’s really not that hard, all you have to do in your query string is put the characters %s where you would like the parameter to be pulled in. Then for psycopg2 you need to pass a Python tuple with those parameter(s). It would look something like this.

query = 'INSERT INTO books (id, author, title) VALUES (%s, %s, %s);'
data = (1, "St. Augustine", "Confessions")
try:
    database_cursor.execute(query, data)
except psycopg2.Error as e:
    print(f'Had problem with query {query} with error {e}.')

Remember that whole executemany() mentioned above? Just make a list of tuples.

query = 'INSERT INTO books (id, author, title) VALUES (%s, %s, %s);'
data = [(1, "St. Augustine", "Confessions"), (2, "Martin Luther", "95 Theses")]
try:
    database_cursor.executemany(query, data)
except psycopg2.Error as e:
    print(f'Had problem with query {query} with error {e}.')

Retrieving Database Results with Python Postgres psycopg2.

Once you have a few records in your database, the next thing to do would be to retrieve them. The concept very similar to the execution of inserting data. There are three methods you can call on a cursor that is executing a SQL SELECT statement to retrieve data.

  • fetchone()
  • fetchmany(number_of_records_to_fetch)
  • fetchall()

You should never call fetchone() in real life, this is the default behavior and a cursor object is iteratable in Python so you can just do the following.

query = 'SELECT * FROM books;'
try:
    database_cursor.execute(query)
except psycopg2.Error as e:
    print(f'Had problem with query {query} with error {e}.')

for record in database_cursor:
    print(record)

Also, don’t forgot the important step of cleaning up after yourself in the scripts!

database_cursor.close()
conn.close()

That’s it! As you can see it’s super easy to get started using Python and Postgres using the the gerat psycopg2 package. All you need to lean is Connections, Cursors and a few other minor details about which methods are available. Once you get past the initial basics of making a connection and learning about cursors it’s off to the races.

Postgres is an awesome tool that supports things like geospatial data, command line tools for loading data etc. Do some reading and you will be amazed at what it is capable of.