The SQL Murder Mystery is a self-directed lesson to learn SQL concepts and commands and a fun game for experienced SQL users to solve an intriguing crime published by the Northwestern University Knight Lab and based on Noah Veltman's Command Line Murder Mystery.
To solve the mystery, you need to download a small database which contains all the clues you need, if you can work out what SQL commands to use to find them.
Using a notebook makes sense for this sort of investigation because it can help you keep track of, and notes on, the current state of your investigation.
This original database is distributed as a SQLite database, but this notebook shows how to load it into, and access it from, the PostgreSQL database distributed as part of the TM351 VM.
Instructions for downloading a Virtualbox image of the TM351 VM, or building the VM on a remote host, can be found here.
We can download the original SQLite3 database directly from the original repository:
!wget https://github.com/NUKnightLab/sql-mysteries/raw/master/sql-murder-mystery.db
!ls *.db
To export the data tables from the SQLite database into Postgres, we need to do a couple of things:
pgloader
utility which provides a wide range of tools for loading variously formatted datasets into Postgres;pgloader
utility¶The pgloader
utility is a Linux package that we can install using apt-get
.
%%capture
!sudo apt-get update && sudo apt-get install -y pgloader && sudo apt-get clean
To keep things tidy, we can create a specific database to import the data into, along with a specific Postgres user with access permissions on that database.
We'll also import some tools and extensions to make working with the database easier.
from tm351_utils.db import *
%load_ext sql
%load_ext schemadisplay_magic
Let's make sure we don't contaminate any of the evidence by starting with a clean slate:
#clearConnections("mysterydb")
!echo "DROP DATABASE IF EXISTS mysterydb" | sudo -u postgres psql
Create an empty database to import the data into:
! createdb mysterydb
Let's also create a user account for our investigating officer with permissions over the database:
# Create a new user...
!echo "CREATE USER detective WITH PASSWORD 'clouseau';" | sudo -u postgres psql
# ...with permissions over the new database
!echo "GRANT ALL PRIVILEGES ON DATABASE mysterydb TO detective;" | sudo -u postgres psql
Create a connection string for connecting to the database, set from identifiable parameter values.
DB_ENGINE='postgresql' # tells ipython-sql that we will use postgreSQL as our database engine
DB_USER='detective' # id of the user who is logging in
DB_PWD='clouseau' # the user's password
DB_ADDR='localhost:5432' # the host and port on which the database engine is listening
DB_NAME='mysterydb' # the name of the database
DB_CONNECTION = '{engine}://{user}:{pwd}@{addr}/{name}'.format(engine=DB_ENGINE,
user=DB_USER,
pwd=DB_PWD,
addr=DB_ADDR,
name=DB_NAME)
DB_CONNECTION
We can now use this connection string as the target location for importing the data from the SQLite database into the Postgres database:
!pgloader ./sql-murder-mystery.db $DB_CONNECTION
We can use the ipython-sql
magic, which is preinstalled in the TM351 VM, to connect to the database, which will simplify running queries over it:
%sql $DB_CONNECTION
The magic returns results as pandas
dataframes. The following setting ensures we can read the full content of any result.
import pandas as pd
pd.set_option('display.max_colwidth', -1)
To orientate ourselves to the contents of the database, we can display the table schema:
%schema --connection_string $DB_CONNECTION
A helper function also lets us view the tables in the database:
showTables(DB_NAME)
We can also review the columns contained within a table (omit the table name to see columns listed for all tables in the specified database).
showColumns(DB_NAME, 'crime_scene_report', user=DB_USER, password=DB_PWD)
Check out the original guidance for the mystery here; additional hints are also available within the original repository.
In brief, you're going to be investigating a murder that took place on January 15th, 2018 in SQL City.
To get you started, here's the original crime report:
%%sql
SELECT * FROM crime_scene_report WHERE date=20180115 AND city='SQL City' AND type='murder';
Now get to it...