As mentioned in earlier lessons, Python is very flexible and has a wide range of libraries and third-party modules to support many operations. SQL (Structured Query Language) can be executed from within Python using the sqlite3 module, which lets us connect to a SQLite database and execute SQL queries against it. SQLite is a lightweight, embedded database engine, so it does not offer the complete feature set of a typical client-server SQL engine. Other modules, like MySQLdb (also known as mysql-python), connect to a full database server and offer a more extensive range of functions and query processing abilities.
We will be discussing the sqlite3 module, as it is widely used. Though it is lightweight, it supports almost all basic SQL operations and can handle a database of up to about 140 terabytes in size.
The first step in executing SQL through Python is connecting to a database file. The 'connect' function in the sqlite3 module creates a connection to a database. It accepts the name of the database file as its argument. We can also create an in-memory database by passing ":memory:" as the argument; however, care needs to be taken, as this consumes RAM and the data is lost when the connection is closed.
The connection is stored as a connection object. Methods like cursor, commit, close, rollback, execute, create_function, etc., can be called on the connection object. To learn the full range of methods and their descriptions, please refer to the sqlite3 module documentation (Link: https://docs.python.org/2/library/sqlite3.html#module-sqlite3).
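As a minimal sketch of a few of these connection methods, the snippet below uses a throwaway in-memory database. The table name `demo` and the SQL function name `double_it` are hypothetical and chosen only for illustration.

```python
import sqlite3

# Throwaway in-memory database; nothing is written to disk.
con = sqlite3.connect(":memory:")

# Register a Python callable so SQL can call it: (SQL name, number of args, callable).
con.create_function("double_it", 1, lambda x: x * 2)

# Connection.execute is a shortcut: it creates a cursor, runs the query,
# and returns that cursor.
con.execute("CREATE TABLE demo (n INTEGER)")
con.execute("INSERT INTO demo VALUES (21)")
con.commit()

result = con.execute("SELECT double_it(n) FROM demo").fetchone()
print(result)

con.close()
```

Each row comes back as a tuple, even when the query selects a single column.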
Data History: As part of this exercise, we will work with the 'Murders' data. This dataset consists of the number of murders committed in each listed metropolitan city. Data for two years, 2014 and 2015, is given, along with a column that shows the change in the number of murders.
Use the connect method to establish a connection with a database file created within memory (Hint: Use :memory: to create an in-memory database).
import sqlite3
murcon = sqlite3.connect(':memory:')
ref_tmp_var = False
try:
    if str(type(murcon)) == "<class 'sqlite3.Connection'>":
        ref_tmp_var = True
    else:
        print('Please follow the instructions given and use the same variables provided in the instructions.')
except Exception:
    print('Please follow the instructions given and use the same variables provided in the instructions.')
assert ref_tmp_var
New tables can be created in a database and data can be inserted through queries, or data can be integrated from an existing source table. The 'Murders' data set can be loaded into the murdersdb database using the to_sql method of a pandas data frame. There are two steps to do this:
1) Load the data from the source file into a pandas data frame
2) Use the to_sql method to copy the data from the data frame into a new/existing table
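The two steps above can be sketched on a tiny, made-up data frame. The table name `demo_table`, the column names and the values are all hypothetical; the frame simply stands in for data loaded from a source file.

```python
import sqlite3
import pandas as pd

con = sqlite3.connect(":memory:")

# Step 1: a small hypothetical data frame standing in for a loaded CSV file.
df = pd.DataFrame({"city": ["Springfield", "Shelbyville"], "count_2015": [3, 5]})

# Step 2: copy the frame into a table. index=False leaves out the pandas row
# index, and if_exists='replace' overwrites a table of the same name.
df.to_sql(name="demo_table", con=con, if_exists="replace", index=False)

rows = con.execute(
    "SELECT city, count_2015 FROM demo_table ORDER BY city"
).fetchall()
print(rows)

con.close()
```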
Load the murders data from source file into a table called 'murderstable' in the murdersdb database
import csv
import pandas as pd
murdersdf = pd.read_csv('https://raw.githubusercontent.com/colaberry/538data/master/murder_2016/murder_2015_final.csv')
murdersdf.head(5)
# renaming columns because unquoted SQL column names cannot start with a number
murdersdf.columns = ['city','state','murders_2014','murders_2015','change']
murdersdf.to_sql(name='murderstable',con=murcon,if_exists='replace',index=False)
ref_tmp_var = False
try:
    if len(murdersdf.columns) == len(['city','state','murders_2014','murders_2015','change']):
        ref_tmp_var = True
    else:
        print('Please follow the instructions given and use the same variables provided in the instructions.')
except Exception:
    print('Please follow the instructions given and use the same variables provided in the instructions.')
assert ref_tmp_var
The cursor method can be called on the connection object to create a cursor, which is the object through which queries are sent to the database and results are read back. Once the cursor object is created, it can be used to carry out querying operations using methods like execute, executemany, fetchone, fetchmany and fetchall, and attributes like rowcount.
1) execute: an SQL query is passed as an argument to this method for execution
2) fetchone/fetchmany/fetchall: these methods return the output of the SQL query, one/many/all rows at a time
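To make the difference between the fetch methods concrete, here is a small sketch on a hypothetical table `demo` holding the numbers 1 to 5. Each fetch call continues from where the previous one stopped.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Hypothetical table with five rows: 1, 2, 3, 4, 5.
cur.execute("CREATE TABLE demo (n INTEGER)")
cur.executemany("INSERT INTO demo VALUES (?)", [(i,) for i in range(1, 6)])

cur.execute("SELECT n FROM demo ORDER BY n")
first = cur.fetchone()       # the next single row, as a tuple
next_two = cur.fetchmany(2)  # a list with up to 2 further rows
rest = cur.fetchall()        # a list with all remaining rows
print(first, next_two, rest)

con.close()
```

Note that none of these methods print anything themselves; they return rows, which we then print or store.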
Create a cursor object on the connection created previously. Use the cursor object to execute a 'Select' query to show (use fetchall method to store the query result in a variable 'queryone') the first 5 rows of the table created in the previous exercise. Print 'queryone' variable to see the result.
murcur = murcon.cursor()
# You can use .execute method on a cursor object to execute a SELECT query. Use LIMIT to print first 5 rows.
murcur.execute("SELECT * FROM murderstable LIMIT 5")
queryone = murcur.fetchall()
print(queryone)
ref_tmp_var = False
try:
    test = [('Baltimore', 'Maryland', 211, 344, 133),
            ('Chicago', 'Illinois', 411, 478, 67),
            ('Houston', 'Texas', 242, 303, 61),
            ('Cleveland', 'Ohio', 63, 120, 57),
            ('Washington', 'D.C.', 105, 162, 57)]
    if test == queryone:
        ref_assert_var = True
        ref_tmp_var = True
    else:
        ref_assert_var = False
        print('Please follow the instructions given and use the same variables provided in the instructions.')
except Exception:
    print('Please follow the instructions given and use the same variables provided in the instructions.')
assert ref_tmp_var
There is an alternate way to execute a query: the read_sql_query() function from pandas. The function is called on the pandas module itself (for example, pd.read_sql_query()) and takes two main arguments: the SQL query to be executed and the connection object for the database. It returns the query result as a data frame.
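As a short sketch of this approach, the snippet below queries a hypothetical table `demo_table` and also uses the optional `params` argument of `pd.read_sql_query` to bind a value to a '?' placeholder. The table and its contents are made up for illustration.

```python
import sqlite3
import pandas as pd

con = sqlite3.connect(":memory:")

# Hypothetical table with two rows.
con.execute("CREATE TABLE demo_table (city TEXT, total INTEGER)")
con.executemany(
    "INSERT INTO demo_table VALUES (?, ?)",
    [("Springfield", 3), ("Shelbyville", 5)],
)

# The result is a DataFrame whose columns follow the SELECT list.
frame = pd.read_sql_query(
    "SELECT city, total FROM demo_table WHERE total > ?", con, params=(4,)
)
print(frame)

con.close()
```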
Use the read_sql_query() function from pandas, with the connection created earlier, to print the same output as Exercise (a)
#
# The read_sql_query function can be called as 'pd.read_sql_query()'.
# It has two main arguments, the SQL query to be executed and the database connection object.
print(pd.read_sql_query("SELECT * FROM murderstable LIMIT 5",murcon))
ref_tmp_var = False
try:
    test = [('Baltimore', 'Maryland', 211, 344, 133),
            ('Chicago', 'Illinois', 411, 478, 67),
            ('Houston', 'Texas', 242, 303, 61),
            ('Cleveland', 'Ohio', 63, 120, 57),
            ('Washington', 'D.C.', 105, 162, 57)]
    testdf = pd.DataFrame(test, columns=['city','state','murders_2014','murders_2015','change'])
    if testdf.equals(pd.read_sql_query("SELECT * FROM murderstable LIMIT 5",murcon)):
        ref_assert_var = True
        ref_tmp_var = True
    else:
        ref_assert_var = False
        print('Please follow the instructions given and use the same variables provided in the instructions.')
except Exception:
    print('Please follow the instructions given and use the same variables provided in the instructions.')
assert ref_tmp_var
Create, read, update and delete operations performed on databases are often referred to as CRUD operations. For details on framing SQL queries for these operations, please refer to any online SQL tutorial.
The aim of the following set of exercises is to familiarize us with CRUD operations performed through Python. We will use the murders data set when needed.
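Before the exercises, here is a compact sketch of all four CRUD operations on a hypothetical table `demo`. The table and its values are made up and are separate from the murders data.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Create: define a table and add two rows.
cur.execute("CREATE TABLE demo (name TEXT, score INTEGER)")
cur.executemany("INSERT INTO demo VALUES (?, ?)", [("a", 1), ("b", 2)])

# Read: select the rows back.
cur.execute("SELECT * FROM demo ORDER BY name")
before = cur.fetchall()

# Update: change the score of one row.
cur.execute("UPDATE demo SET score = ? WHERE name = ?", (10, "a"))

# Delete: remove the other row.
cur.execute("DELETE FROM demo WHERE name = ?", ("b",))

cur.execute("SELECT * FROM demo ORDER BY name")
after = cur.fetchall()
print(before, after)

con.close()
```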
Create a table with table name as 'murderstabletwo' and the same columns as the murderstable
Remember: Use triple quotes to enclose string arguments that span multiple lines
try:
    murcur.execute("""CREATE TABLE murderstabletwo (
        city TEXT,
        state TEXT,
        murders_2014 INTEGER,
        murders_2015 INTEGER,
        change INTEGER)""")
except Exception as e:
    print(e)
ref_tmp_var = False
ref_tmp_var = True
assert ref_tmp_var
We have previously executed a SELECT query to read some data from the newly created table. That is an example of a read query.
Execute a read query, which reads the contents of the murderstable and sorts the output rows in a descending order of 2015 murders. Store the first five rows of this output in a variable ('topfive') and print it out.
#
# use "select" query to read and fetchall to print.
murcur.execute("SELECT * FROM murderstable ORDER BY murders_2015 DESC LIMIT 5")
topfive = murcur.fetchall()
print(topfive)
ref_tmp_var = False
try:
    test = [('Chicago', 'Illinois', 411, 478, 67), ('New York', 'New York', 333, 352, 19), ('Baltimore', 'Maryland', 211, 344, 133), ('Houston', 'Texas', 242, 303, 61), ('Detroit', 'Michigan', 298, 295, -3)]
    if test == topfive:
        ref_assert_var = True
        ref_tmp_var = True
    else:
        ref_assert_var = False
        print('Please follow the instructions given and use the same variables provided in the instructions.')
except Exception:
    print('Please follow the instructions given and use the same variables provided in the instructions.')
assert ref_tmp_var
Insert the top five rows of data in the 'topfive' list, into the second table ('murderstabletwo') created previously. Use the 'executemany' function in order to insert multiple records using a single query. Retrieve the contents of the 'murderstabletwo' using a SELECT query and store the output into a variable 'querytwo'. Print out 'querytwo'.
#
# In this case, the executemany() method takes two arguments: the SQL query and a list of tuples, one tuple of values per row.
# Use '?' as placeholder for values within the query. This is a more secure way of passing arguments as is explained below.
murcur.executemany("INSERT INTO murderstabletwo VALUES (?,?,?,?,?)",topfive)
murcur.execute("SELECT * FROM murderstabletwo")
querytwo = murcur.fetchall()
print(querytwo)
ref_tmp_var = False
try:
    ref_assert_var = False
    test = [('Chicago', 'Illinois', 411, 478, 67), ('New York', 'New York', 333, 352, 19), ('Baltimore', 'Maryland', 211, 344, 133), ('Houston', 'Texas', 242, 303, 61), ('Detroit', 'Michigan', 298, 295, -3)]
    if querytwo:
        ref_assert_var = True
        for i in test:
            if i not in querytwo:
                ref_assert_var = False
                break
        if ref_assert_var:
            ref_tmp_var = True
        else:
            print('Please follow the instructions given and use the same variables provided in the instructions.')
    else:
        print('querytwo is empty, please follow instructions.')
except Exception:
    print('Please follow the instructions given and use the same variables provided in the instructions.')
assert ref_tmp_var
We used '?' as a placeholder in the above query. This is because building the query with string formatting (for example, %s) is insecure: the input is inserted into the SQL text exactly as entered, so it may contain SQL keywords that perform unintended actions. This is known as a SQL injection attack. With '?' placeholders, the values are passed to the database separately from the query text and are treated only as data, never as SQL.
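The difference can be seen in a small sketch. The table `demo` and the malicious-looking `user_input` string are hypothetical; the point is only to contrast string formatting with a '?' placeholder.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE demo (city TEXT)")
con.executemany("INSERT INTO demo VALUES (?)", [("Chicago",), ("Houston",)])

# Hypothetical input that tries to change the meaning of the query.
user_input = "nowhere' OR '1'='1"

# Unsafe: the input is spliced into the SQL text, so the WHERE clause
# becomes  city = 'nowhere' OR '1'='1'  and matches every row.
unsafe_sql = "SELECT city FROM demo WHERE city = '%s'" % user_input
unsafe_rows = con.execute(unsafe_sql).fetchall()

# Safe: the input is bound as a single value and compared literally,
# so no city matches it.
safe_rows = con.execute(
    "SELECT city FROM demo WHERE city = ?", (user_input,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))

con.close()
```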
Make 'Chicago' the safest city. Update the 2015 murders value for Chicago to zero. Retrieve the updated contents of 'murderstabletwo' and store them in a variable 'querythree'. Print 'querythree'.
#
murcur.execute("UPDATE murderstabletwo SET murders_2015=0 WHERE city='Chicago'")
murcur.execute("SELECT * FROM murderstabletwo")
querythree = murcur.fetchall()
print(querythree)
ref_tmp_var = False
try:
    test = [('Chicago', 'Illinois', 411, 0, 67),
            ('New York', 'New York', 333, 352, 19),
            ('Baltimore', 'Maryland', 211, 344, 133),
            ('Houston', 'Texas', 242, 303, 61),
            ('Detroit', 'Michigan', 298, 295, -3)]
    if querythree:
        ref_assert_var = True
        for i in test:
            if i not in querythree:
                ref_assert_var = False
                break
        if ref_assert_var:
            ref_tmp_var = True
        else:
            print('Please follow the instructions given and use the same variables provided in the instructions.')
    else:
        print('querythree is empty, please follow instructions.')
except Exception:
    print('Please follow the instructions given and use the same variables provided in the instructions.')
assert ref_tmp_var
try:
    murcur.execute("DROP TABLE murderstable")
    murcur.execute("DROP TABLE murderstabletwo")
except Exception as e:
    print(e)
ref_tmp_var = False
ref_tmp_var = True
assert ref_tmp_var
The operations performed on the database can be saved by calling the commit method on the connection object. The rollback method undoes all changes made to the database since the last commit. The close method, called on the connection object, closes the connection to the database; it does not commit automatically, so any uncommitted changes are lost.
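A minimal sketch of how commit and rollback interact, using a hypothetical table `demo`: the first insert is committed, the second is rolled back, so only the first row survives.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE demo (n INTEGER)")

con.execute("INSERT INTO demo VALUES (1)")
con.commit()      # row 1 is now saved

con.execute("INSERT INTO demo VALUES (2)")
con.rollback()    # discards row 2, the only change since the last commit

rows = con.execute("SELECT n FROM demo").fetchall()
print(rows)

con.close()
```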
Roll back the changes made to the database since the last commit. Then commit and close the connection.
try:
    murcon.rollback()
    murcon.commit()
    murcon.close()
except Exception as e:
    print(e)
ref_tmp_var = False
ref_tmp_var = True
assert ref_tmp_var