The SQL Murder Mystery is a self-directed lesson to learn SQL concepts and commands and a fun game for experienced SQL users to solve an intriguing crime published by the Northwestern University Knight Lab and based on Noah Veltman's Command Line Murder Mystery.
To solve the mystery, you need to download a small database which contains all the clues you need, if you can work out what SQL commands to use to find them.
Using a notebook makes sense for this sort of investigation because it can help you keep track of, and notes on, the current state of your investigation.
This original database is distributed as a SQLite database, but this notebook shows how to load it into, and access it from, the PostgreSQL database distributed as part of the TM351 VM.
Instructions for downloading a Virtualbox image of the TM351 VM, or building the VM on a remote host, can be found here.
We can download the original SQLite3 database directly from the original repository:
!wget https://github.com/NUKnightLab/sql-mysteries/raw/master/sql-murder-mystery.db
!ls *.db
--2019-10-08 11:33:31-- https://github.com/NUKnightLab/sql-mysteries/raw/master/sql-murder-mystery.db Resolving github.com (github.com)... 140.82.118.4 Connecting to github.com (github.com)|140.82.118.4|:443... connected. HTTP request sent, awaiting response... 302 Found Location: https://raw.githubusercontent.com/NUKnightLab/sql-mysteries/master/sql-murder-mystery.db [following] --2019-10-08 11:33:31-- https://raw.githubusercontent.com/NUKnightLab/sql-mysteries/master/sql-murder-mystery.db Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ... Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected. HTTP request sent, awaiting response... 200 OK Length: 3002368 (2.9M) [application/octet-stream] Saving to: ‘sql-murder-mystery.db’ sql-murder-mystery. 100%[===================>] 2.86M 3.91MB/s in 0.7s 2019-10-08 11:33:33 (3.91 MB/s) - ‘sql-murder-mystery.db’ saved [3002368/3002368] sql-murder-mystery.db
To export the data tables from the SQLite database into Postgres, we need to do a couple of things:
pgloader
utility which provides a wide range of tools for loading variously formatted datasets into Postgres;pgloader
utility¶The pgloader
utility is a Linux package that we can install using apt-get
.
%%capture
!sudo apt-get update && sudo apt-get install -y pgloader && sudo apt-get clean
To keep things tidy, we can create a specific database to import the data into, along with a specific Postgres user with access permissions on that database.
We'll also import some tools and extensions to make working with the database easier.
from tm351_utils.db import *
%load_ext sql
%load_ext schemadisplay_magic
The sql extension is already loaded. To reload it, use: %reload_ext sql
Let's make sure we don't contaminate any of the evidence by starting with a clean slate:
#clearConnections("mysterydb")
!echo "DROP DATABASE IF EXISTS mysterydb" | sudo -u postgres psql
NOTICE: database "mysterydb" does not exist, skipping DROP DATABASE
Create an empty database to import the data into:
! createdb mysterydb
Let's also create a user account for our investigating officer with permissions over the database:
# Create a new user...
!echo "CREATE USER detective WITH PASSWORD 'clouseau';" | sudo -u postgres psql
# ...with permissions over the new database
!echo "GRANT ALL PRIVILEGES ON DATABASE mysterydb TO detective;" | sudo -u postgres psql
CREATE ROLE GRANT
Create a connection string for connecting to the database, set from identifiable parameter values.
DB_ENGINE='postgresql' # tells ipython-sql that we will use postgreSQL as our database engine
DB_USER='detective' # id of the user who is logging in
DB_PWD='clouseau' # the user's password
DB_ADDR='localhost:5432' # the host and port on which the database engine is listening
DB_NAME='mysterydb' # the name of the database
DB_CONNECTION = '{engine}://{user}:{pwd}@{addr}/{name}'.format(engine=DB_ENGINE,
user=DB_USER,
pwd=DB_PWD,
addr=DB_ADDR,
name=DB_NAME)
DB_CONNECTION
'postgresql://detective:clouseau@localhost:5432/mysterydb'
We can now use this connection string as the target location for importing the data from the SQLite database into the Postgres database:
!pgloader ./sql-murder-mystery.db $DB_CONNECTION
2019-10-08T11:33:45.030000Z LOG Main logs in '/tmp/pgloader/pgloader.log' 2019-10-08T11:33:45.035000Z LOG Data errors in '/tmp/pgloader/' 2019-10-08T11:33:45.237000Z WARNING PostgreSQL warning: table "crime_scene_report" does not exist, skipping 2019-10-08T11:33:45.237000Z WARNING PostgreSQL warning: table "drivers_license" does not exist, skipping 2019-10-08T11:33:45.237000Z WARNING PostgreSQL warning: table "person" does not exist, skipping 2019-10-08T11:33:45.237000Z WARNING PostgreSQL warning: table "facebook_event_checkin" does not exist, skipping 2019-10-08T11:33:45.237000Z WARNING PostgreSQL warning: table "interview" does not exist, skipping 2019-10-08T11:33:45.237000Z WARNING PostgreSQL warning: table "get_fit_now_member" does not exist, skipping 2019-10-08T11:33:45.237000Z WARNING PostgreSQL warning: table "get_fit_now_check_in" does not exist, skipping 2019-10-08T11:33:45.237000Z WARNING PostgreSQL warning: table "income" does not exist, skipping 2019-10-08T11:33:45.237000Z WARNING PostgreSQL warning: table "solution" does not exist, skipping table name read imported errors time ---------------------- --------- --------- --------- -------------- fetch 0 0 0 0.000s fetch meta data 9 9 0 0.026s create, truncate 0 0 0 0.033s ---------------------- --------- --------- --------- -------------- crime_scene_report 1228 1228 0 0.080s drivers_license 10007 10007 0 0.293s person 10011 10011 0 0.243s facebook_event_checkin 20011 20011 0 0.440s interview 4991 4991 0 0.107s get_fit_now_member 184 184 0 0.053s get_fit_now_check_in 2703 2703 0 0.067s income 7514 7514 0 0.074s solution 0 0 0 0.007s index build completion 0 0 0 0.000s ---------------------- --------- --------- --------- -------------- Create Indexes 0 0 0 0.000s Reset Sequences 0 0 0 0.010s ---------------------- --------- --------- --------- -------------- Total streaming time 56649 56649 0 1.433s
We can use the ipython-sql
magic, which is preinstalled in the TM351 VM, to connect to the database, which will simplify running queries over it:
%sql $DB_CONNECTION
'Connected: detective@mysterydb'
The magic returns results as pandas
dataframes. The following setting ensures we can read the full content of any result.
import pandas as pd
pd.set_option('display.max_colwidth', -1)
To orientate ourselves to the contents of the database, we can display the table schema:
%schema --connection_string $DB_CONNECTION
A helper function also lets us view the tables in the database:
showTables(DB_NAME)
tablename | tableowner | |
---|---|---|
0 | crime_scene_report | detective |
1 | drivers_license | detective |
2 | person | detective |
3 | facebook_event_checkin | detective |
4 | interview | detective |
5 | get_fit_now_member | detective |
6 | get_fit_now_check_in | detective |
7 | income | detective |
8 | solution | detective |
We can also review the columns contained within a table (omit the table name to see columns listed for all tables in the specified database).
showColumns(DB_NAME, 'crime_scene_report', user=DB_USER, password=DB_PWD)
column_name | table_name | |
---|---|---|
0 | city | crime_scene_report |
1 | date | crime_scene_report |
2 | description | crime_scene_report |
3 | type | crime_scene_report |
Check out the original guidance for the mystery here; additional hints are also available within the original repository.
In brief, you're going to be investigating a murder that took place on January 15th, 2018 in SQL City.
To get you started, here's the original crime report:
%%sql
SELECT * FROM crime_scene_report WHERE date=20180115 AND city='SQL City' AND type='murder';
* postgresql://detective:***@localhost:5432/mysterydb 1 rows affected.
date | type | description | city | |
---|---|---|---|---|
0 | 20180115 | murder | Security footage shows that there were 2 witnesses. The first witness lives at the last house on "Northwestern Dr". The second witness, named Annabel, lives somewhere on "Franklin Ave". | SQL City |
Now get to it...