We started you off by having you write SQL queries in PostgreSQL's interactive shell (psql
). Which is cool, but you're probably wondering: "Hey how do I actually get information from a SQL database into my Python programs? That's where I can do some real damage."
In order to connect to a SQL database and make queries in Python, we need to use a database adapter library. Such libraries provide the functionality needed to connect to a database, send SQL statements, and receive responses, taking all necessary steps to ensure that (e.g.) SQL types are converted safely to Python types, etc.
Each RDBMS has several different adapter libraries. For this tutorial, we're going to use a library called psycopg2. There are several others (including pg8000), but the main benefits of psycopg2 is that it comes pre-installed with Anaconda. (And it's based on a C library, so it's pretty fast.)
This library assumes that you've completed the tutorial in the SQL introduction notes, and that you have a PostgreSQL server with a copy of the MONDIAL database running on your computer on the default port.
When using a SQL server from Python, you'll be working with several different kinds of objects:
To create a connection object, call psycopg2
's connect()
function:
import psycopg2
conn = psycopg2.connect(database="mondial")
print type(conn)
<type 'psycopg2.extensions.connection'>
The connect()
function takes a number of named parameters. The only one we're using here is database
, which specifies which database to connect to. If we were attempting to connect to a PostgreSQL server on someone else's machine, we might need to use the host
and port
parameters. Consult the documentation for more information.
Now that we have an open connection, let's attempt to query the database. To perform a query, we first need a cursor object, which we can create using the connection object's .cursor()
method:
cursor = conn.cursor()
print type(cursor)
<type 'psycopg2.extensions.cursor'>
The cursor object has several methods of interest to us. The first and most important is .execute()
, which takes a SQL statement (in a Python string) as a parameter:
cursor.execute("SELECT name, length FROM river WHERE length > 4000")
The .execute()
performs the query, but doesn't evaluate to anything. After calling .execute()
, you can call the cursor's .fetchone()
method to get the first row returned from the query:
cursor.fetchone()
('Irtysch', Decimal('4248'))
Subsequent requests to .fetchone()
will return subsequent rows:
cursor.fetchone()
('Jenissej', Decimal('4092'))
To retrieve all of the rows returned from a query, you can use the cursor object in a for
loop, like so:
cursor.execute("SELECT name, length FROM river WHERE length > 4000")
for row in cursor:
print row
('Irtysch', Decimal('4248')) ('Jenissej', Decimal('4092')) ('Lena', Decimal('4400')) ('Hwangho', Decimal('4845')) ('Jangtse', Decimal('6380')) ('Mekong', Decimal('4350')) ('Missouri', Decimal('4130')) ('Amazonas', Decimal('6448')) ('Niger', Decimal('4184')) ('Zaire', Decimal('4374'))
Calling the .fetchone()
method, or iterating over the cursor object, yields a series of n-tuples, with one element in the tuple per field requested in the query.
Let's say that we're starting with data from some other source, say, a list of cities whose population we're interested in.
cities_of_interest = ['Paris', 'Nairobi', 'Buenos Aires', 'Kyoto']
Now we want to perform queries on the MONDIAL database to get the population for each of these cities. Somehow, we need to build a series of SQL queries in Python that include the names in the list.
You might think that you could simply do something like this:
query = "SELECT population FROM city WHERE name = '" + cities_of_interest[0] + "'";
print query
SELECT population FROM city WHERE name = 'Paris'
This looks good, until you have a name with problematic punctuation:
problematic_city = "Martha's Vineyard"
query = "SELECT population FROM city WHERE name = '" + problematic_city + "'"
print query
SELECT population FROM city WHERE name = 'Martha's Vineyard'
See the trouble? The apostrophe in the name of the city made its way into our query string. This query would be a syntax error in SQL, since SQL will believe the string to have ended at the apostrophe in Martha's
. Troublesome!
To solve this problem, the cursor object's .execute()
method comes with a built-in means of interpolating values into queries. Simply put %s
in your query string wherever you want to insert a value, and then pass as a second parameter to .execute()
a list of values that you want to be included in the query:
cursor.execute("SELECT population FROM city WHERE name = %s",
["Martha's Vineyard"])
Psycopg2 will take care of the nasty business of quoting your string for you, and you'll be protected from SQL injection attacks.
Here's a complete example, iterating over a list of cities and getting the population for each one:
for city_name in cities_of_interest:
cursor.execute("SELECT population FROM city WHERE name = %s",
[city_name])
population = cursor.fetchone()[0] # fetchone() returns a tuple w/1 val
print city_name, population
Paris 2152423 Nairobi 1346000 Buenos Aires 2988006 Kyoto 1415000
Psycopg2 is very persnickety about errors. If you have a syntax error, like so:
cursor = conn.cursor()
cursor.execute("SMELLECT * FORM cheese WERE stink > 15 ODOR DESC")
--------------------------------------------------------------------------- ProgrammingError Traceback (most recent call last) <ipython-input-53-4f6c94103f37> in <module>() 1 cursor = conn.cursor() ----> 2 cursor.execute("SMELLECT * FORM cheese WERE stink > 15 ODOR DESC") ProgrammingError: syntax error at or near "SMELLECT" LINE 1: SMELLECT * FORM cheese WERE stink > 15 ODOR DESC ^
... you'll get a syntax error as expected. But then subsequent attempts to use the cursor will frustratingly fail:
cursor.execute("SELECT population FROM city WHERE name = 'Paris'")
--------------------------------------------------------------------------- InternalError Traceback (most recent call last) <ipython-input-56-83a95cc0b980> in <module>() ----> 1 cursor.execute("SELECT population FROM city WHERE name = 'Paris'") InternalError: current transaction is aborted, commands ignored until end of transaction block
The way to fix this problem is to close the connection and re-open it, or simply call the connection object's rollback
method:
conn.rollback()
Now your queries can proceed as planned:
cursor.execute("SELECT population FROM city WHERE name = 'Paris'")
cursor.fetchone()
(Decimal('2152423'),)
For more information, consult psychopg2's documentation.