#!/usr/bin/env python # coding: utf-8 # #Writing SQL Queries in Python: The Basics # We started you off by having you write SQL queries in PostgreSQL's interactive shell (`psql`). Which is cool, but you're probably wondering: "Hey how do I actually get information from a SQL database into my Python programs? That's where I can do some real damage." # ##Python PostgreSQL libraries # In order to connect to a SQL database and make queries in Python, we need to use a *database adapter library*. Such libraries provide the functionality needed to connect to a database, send SQL statements, and receive responses, taking all necessary steps to ensure that (e.g.) SQL types are converted safely to Python types, etc. # # Each RDBMS has several different adapter libraries. For this tutorial, we're going to use a library called [psycopg2](http://initd.org/psycopg/docs/index.html). There are several others (including [pg8000](https://github.com/mfenniak/pg8000)), but the main benefits of psycopg2 is that it comes pre-installed with Anaconda. (And it's based on a C library, so it's pretty fast.) # # This library assumes that you've completed the tutorial in the [SQL introduction](04_SQL_Part1.md) notes, and that you have a PostgreSQL server with a copy of the [MONDIAL database](http://www.dbis.informatik.uni-goettingen.de/Mondial/) running on your computer on the default port. # ##Connecting to a database with psycopg2 # When using a SQL server from Python, you'll be working with several different kinds of objects: # # * A *connection object*, which gives you access to the server; and # * *Cursor objects*, which you use to make SQL queries and retrieve data returned from those queries. # # To create a connection object, call `psycopg2`'s `connect()` function: # In[40]: import psycopg2 conn = psycopg2.connect(database="mondial") print type(conn) # The `connect()` function takes a number of named parameters. The only one we're using here is `database`, which specifies which database to connect to. If we were attempting to connect to a PostgreSQL server on someone else's machine, we might need to use the `host` and `port` parameters. [Consult the documentation](http://initd.org/psycopg/docs/module.html#psycopg2.connect) for more information. # ##Making a query # Now that we have an open connection, let's attempt to query the database. To perform a query, we first need a cursor object, which we can create using the connection object's `.cursor()` method: # In[41]: cursor = conn.cursor() print type(cursor) # The cursor object has several methods of interest to us. The first and most important is `.execute()`, which takes a SQL statement (in a Python string) as a parameter: # In[42]: cursor.execute("SELECT name, length FROM river WHERE length > 4000") # The `.execute()` performs the query, but doesn't evaluate to anything. After calling `.execute()`, you can call the cursor's `.fetchone()` method to get the first row returned from the query: # In[43]: cursor.fetchone() # Subsequent requests to `.fetchone()` will return subsequent rows: # In[44]: cursor.fetchone() # To retrieve all of the rows returned from a query, you can use the cursor object in a `for` loop, like so: # In[45]: cursor.execute("SELECT name, length FROM river WHERE length > 4000") for row in cursor: print row # Calling the `.fetchone()` method, or iterating over the cursor object, yields a series of n-tuples, with one element in the tuple per field requested in the query. # ##Interpolating values in a query # Let's say that we're starting with data from some other source, say, a list of cities whose population we're interested in. # In[46]: cities_of_interest = ['Paris', 'Nairobi', 'Buenos Aires', 'Kyoto'] # Now we want to perform queries on the MONDIAL database to get the population for each of these cities. Somehow, we need to *build* a series of SQL queries in Python that include the names in the list. # # You might think that you could simply do something like this: # In[47]: query = "SELECT population FROM city WHERE name = '" + cities_of_interest[0] + "'"; print query # This looks good, until you have a name with problematic punctuation: # In[48]: problematic_city = "Martha's Vineyard" query = "SELECT population FROM city WHERE name = '" + problematic_city + "'" print query # See the trouble? The apostrophe in the name of the city made its way into our query string. This query would be a syntax error in SQL, since SQL will believe the string to have ended at the apostrophe in `Martha's`. Troublesome! # # To solve this problem, the cursor object's `.execute()` method comes with a built-in means of interpolating values into queries. Simply put `%s` in your query string wherever you want to insert a value, and then pass as a second parameter to `.execute()` a list of values that you want to be included in the query: # In[49]: cursor.execute("SELECT population FROM city WHERE name = %s", ["Martha's Vineyard"]) # Psycopg2 will take care of the nasty business of quoting your string for you, and you'll be protected from [SQL injection attacks](https://en.wikipedia.org/wiki/SQL_injection). # # Here's a complete example, iterating over a list of cities and getting the population for each one: # In[51]: for city_name in cities_of_interest: cursor.execute("SELECT population FROM city WHERE name = %s", [city_name]) population = cursor.fetchone()[0] # fetchone() returns a tuple w/1 val print city_name, population # ##Coping with errors # # Psycopg2 is very persnickety about errors. If you have a syntax error, like so: # In[53]: cursor = conn.cursor() cursor.execute("SMELLECT * FORM cheese WERE stink > 15 ODOR DESC") # ... you'll get a syntax error as expected. But then subsequent attempts to use the cursor will frustratingly fail: # In[56]: cursor.execute("SELECT population FROM city WHERE name = 'Paris'") # The way to fix this problem is to close the connection and re-open it, or simply call the connection object's `rollback` method: # In[58]: conn.rollback() # Now your queries can proceed as planned: # In[59]: cursor.execute("SELECT population FROM city WHERE name = 'Paris'") cursor.fetchone() # ##More information # # For more information, consult [psychopg2's documentation](http://initd.org/psycopg/docs/index.html).