This notebook introduces some of the basic commands for querying and modifying a database using the Structured Query Language, SQL.
Part One retrieves data in the form of some csv files, from elsewhere in the internet. Then we're going to use a particular handy tool to turn this data into a database.
In Part two, we'll walk through some basic SQL commands for exploring and transforming this data. The amphitheatre data is courtesy Sebastian Heath, https://doi.org/10.5281/zenodo.596149. The aqueduct data is from Pelagios Commons.
We've already obtained the amphitheatre data from Sebastian Heath using the curl
command, like so:
!curl https://raw.githubusercontent.com/sfsheath/roman-amphitheaters/master/roman-amphitheaters.csv > amphi.csv
We also downloaded information about roman aqueducts from the peripleo api as json, and converted to csv using this tool.
Finally, we want to turn these two csv files into a single database containing two tables. We will use the command line tool 'sqlitebiter' by Tsuyoshi Hombashi to do this, which we have already installed.
!sqlitebiter -o roman.db file "amphi.csv" "aqua.csv"
[INFO] sqlitebiter file: convert 'amphi.csv' to 'amphi' table [INFO] sqlitebiter file: convert 'aqua.csv' to 'aqua' table [INFO] sqlitebiter file: number of created tables: 2 [INFO] sqlitebiter file: database path: /home/jovyan/roman.db
!sqlite3 roman.db .schema .exit
CREATE TABLE IF NOT EXISTS '_source_info_' (source_id INTEGER PRIMARY KEY AUTOINCREMENT, dir_name TEXT, base_name TEXT NOT NULL, format TEXT NOT NULL, size INTEGER, mtime INTEGER); CREATE TABLE sqlite_sequence(name,seq); CREATE TABLE IF NOT EXISTS 'amphi' (id TEXT, title TEXT, label TEXT, latintoponym TEXT, pleiades TEXT, welchid INTEGER, golvinid INTEGER, buildingtype TEXT, chronogroup TEXT, secondcentury INTEGER, capacity INTEGER, modcountry TEXT, romanregion TEXT, arenamajor REAL, arenaminor REAL, extmajor REAL, extminor REAL, exteriorheight REAL, longitude REAL, latitude REAL, elevation INTEGER); CREATE TABLE IF NOT EXISTS 'aqua' (identifier TEXT, title TEXT, "object_type" TEXT, "dataset_path" TEXT, "geo_bounds" TEXT, names TEXT, "temporal_bounds" TEXT, depictions TEXT, matches TEXT);
!sqlite3 roman.db .tables .exit
_source_info_ amphi aqua
Now that we have a database, we'll bring it into python so that we can query it. Once a database is in python, we can do a wide variety of data science type visualizations or explorations, although these are beyond the remit of the current notebook.
The first thing we're going to do is create a function that opens a connection to the database, and allows us to build queries. After we create the function, we can create query objects, and then run_query
. Students might also want to consult this tutorial.
# create a function for querying the database
import sqlite3
import pandas as pd
db = sqlite3.connect('roman.db')
def run_query(query):
return pd.read_sql_query(query,db)
Let's give it a try. We're going to build a query that asks, 'show us every column in the amphi table, but only for the first five rows.'
query = 'SELECT * FROM amphi LIMIT 5;'
run_query(query)
id | title | label | latintoponym | pleiades | welchid | golvinid | buildingtype | chronogroup | secondcentury | ... | modcountry | romanregion | arenamajor | arenaminor | extmajor | extminor | exteriorheight | longitude | latitude | elevation | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | duraEuroposAmphitheater | Amphitheater at Dura Europos | Dura | Dura Europus | https://pleiades.stoa.org/places/893989 | 129 | amphitheater | severan | 0 | ... | Syria | syria | 31 | 25 | 50 | 44 | 40.728926 | 34.749855 | 223 | ||
1 | arlesAmphitheater | Amphitheater at Arles | Arles | Arelate | https://pleiades.stoa.org/places/148217 | 154 | amphitheater | flavian | 1 | ... | France | narbonensis | 47 | 32 | 136 | 107 | 4.631111 | 43.677778 | 21 | ||
2 | lyonAmphitheater | Amphitheater at Lyon | Lyon | Lugdunum | https://pleiades.stoa.org/places/167717 | amphitheater | second-century | 1 | ... | France | lugdunensis | 67.6 | 42 | 105 | 4.830556 | 45.770556 | 206 | ||||
3 | ludusMagnusArena | Ludus Magnus Arena | Ludus Magnus | Ludus Magnus | https://pleiades.stoa.org/places/423025 | practice-arena | imperial | 0 | ... | Italy | regio-i | 12.494913 | 41.889950 | 22 | |||||||
4 | romeFlavianAmphitheater | Flavian Amphitheater at Rome | Colosseum | https://pleiades.stoa.org/places/423025 | 152 | amphitheater | flavian | 1 | ... | Italy | regio-i | 83 | 48 | 189 | 156 | 52 | 12.492269 | 41.890169 | 22 |
5 rows × 21 columns
# just check that the aquaduct table is in there too
query = 'SELECT * FROM aqua LIMIT 5;'
run_query(query)
identifier | title | object_type | dataset_path | geo_bounds | names | temporal_bounds | depictions | matches | |
---|---|---|---|---|---|---|---|---|---|
0 | http://dare.ht.lu.se/places/43530 | Roman aqueduct, Mitiline, Lesbos | Place | [{"title":"dare-20160328","id":"dare-20160328"}] | {"min_lon":26.514763,"max_lon":26.514763,"min_... | ["Roman aqueduct, Mitiline, Lesbos"] | |||
1 | http://pleiades.stoa.org/places/738817254 | Ansignan aqueduct | Place | [{"title":"pleiades","id":"pleiades"}] | {"min_lon":2.5140266,"max_lon":2.5140266,"min_... | ||||
2 | http://pleiades.stoa.org/places/403927 | Caldaccoli aqueduct | Place | [{"title":"pleiades","id":"pleiades"}] | {"min_lon":10.4367702,"max_lon":10.4367702,"mi... | ||||
3 | http://topostext.org/place/352254BTei | Lyttos aqueduct (Lasithi) | Place | [{"title":"ToposText Places","id":"ToposText P... | {"min_lon":25.3818,"max_lon":25.3818,"min_lat"... | {"start":-750,"end":640} | |||
4 | http://pleiades.stoa.org/places/246891 | Albarracín-Cella Roman aqueduct | Place | [{"title":"pleiades","id":"pleiades"}] | {"min_lon":-1.5,"max_lon":-1.5,"min_lat":40.5,... |
SELECT, LIMIT, ORDER BY : using these, we can ask, 'Which amphitheatre is at the highest elevation?'
Use SELECT to retrieve the id and elevation columns FROM the amphi table
Use ORDER BY to sort the elevation column and use the DESC keyword to specify that you want to sort in descending order
Use LIMIT to restrict the output to 1 row
query = '''
SELECT id, elevation
FROM amphi
ORDER BY elevation DESC
LIMIT 1;
'''
run_query(query)
id | elevation | |
---|---|---|
0 | lambaesisAmphitheater | 1170 |
Let's get the top 10 now
query = '''
SELECT id, elevation
FROM amphi
ORDER BY elevation DESC
LIMIT 10;
'''
run_query(query)
id | elevation | |
---|---|---|
0 | lambaesisAmphitheater | 1170 |
1 | albaFucensAmphitheater | 977 |
2 | mactarisAmphitheater | 911 |
3 | tebessaAmphitheater | 877 |
4 | leonAmphitheater | 846 |
5 | bostraAmphitheater | 845 |
6 | segobrigaAmphitheater | 817 |
7 | siccaVeneriaAmphitheater | 748 |
8 | sanBenedettoDeiMarsiAmphitheater | 683 |
9 | sanVittorinoAmphitheater | 672 |
Following this pattern, can you create a query that also provides the geographic coordinates? In the block below see if you can construct and run that query.
Now let's create a query that creates a subset of data using a logical operator. We need the 'WHERE' command.
query = '''
SELECT *
FROM amphi
WHERE elevation > 500;
'''
run_query(query)
id | title | label | latintoponym | pleiades | welchid | golvinid | buildingtype | chronogroup | secondcentury | ... | modcountry | romanregion | arenamajor | arenaminor | extmajor | extminor | exteriorheight | longitude | latitude | elevation | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | segobrigaAmphitheater | Amphitheater at Segobriga | Segobriga | Segobriga | https://pleiades.stoa.org/places/266036 | 88 | amphitheater | flavian | 1 | ... | Spain | tarraconensis | 47 | 31 | 75 | 64 | -2.813944 | 39.886018 | 817 | ||
1 | cyreneAmphitheater | Amphitheater at Cyrene | Cyrene | https://pleiades.stoa.org/places/373778 | amphitheater | imperial | 1 | ... | Libya | crete-et-cyrenaica | 32.7 | 28.8 | 21.850808 | 32.824526 | 550 | ||||||
2 | grumentoAmphitheater | Amphitheater at Grumento | Grumento | Grumentum | https://pleiades.stoa.org/places/442603 | 92 | amphitheater | imperial | 1 | ... | Italy | regio-iii | 60 | 40 | 90 | 70 | 15.911579 | 40.287640 | 579 | ||
3 | albaFucensAmphitheater | Amphitheater at Alba Fucens | Alba Fucens | Alba Fucens | https://pleiades.stoa.org/places/413005 | amphitheater | julio-claudian | 1 | ... | Italy | regio-iv | 64 | 37 | 103 | 76 | 13.412289 | 42.077200 | 977 | |||
4 | lambaesisAmphitheater | Amphitheater at Lambaesis | Lambaesis | https://pleiades.stoa.org/places/334570 | amphitheater | second-century | 1 | ... | Algeria | mauretania | 68 | 55 | 104.6 | 94 | 6.259935 | 35.489247 | 1170 | ||||
5 | mactarisAmphitheater | Amphitheater at Mactaris | Mactaris | https://pleiades.stoa.org/places/324774 | 119 | amphitheater | second-century | 1 | ... | Tunisia | proconsularis | 38.4 | 24.8 | 63.2 | 49.6 | 9.206673 | 35.855628 | 911 | |||
6 | sbeitlaAmphitheater | Amphitheater at Sbeitla | Sbeitla | Sufetula | https://pleiades.stoa.org/places/324816 | 113 | amphitheater | second-century | 1 | ... | Tunisia | proconsularis | 47 | 37 | 72 | 60 | 9.114582 | 35.243617 | 556 | ||
7 | tebessaAmphitheater | Amphitheater at Theveste | Tebessa | Theveste | https://pleiades.stoa.org/places/324831 | amphitheater | flavian | 1 | ... | Algeria | proconsularis | 52.8 | 39.5 | 83 | 70 | 8.123809 | 35.401171 | 877 | |||
8 | susaAmphitheater | Amphitheater at Susa | Susa | Segusio | https://pleiades.stoa.org/places/167919 | 15 | amphitheater | imperial | 1 | ... | Italy | alpes-cottiae | 44 | 36 | 60 | 52 | 7.044970 | 45.133185 | 546 | ||
9 | sophiaAmphitheater | Amphitheater at Sophia | Sophia | Ulpia Serdica | https://pleiades.stoa.org/places/207439 | amphitheater | post-severan | 0 | ... | Bulgaria | thracia | 60.5 | 43 | 23.328650 | 42.697178 | 554 | |||||
10 | rodezAmphitheater | Amphitheater at Rodez | Rodez | Segodunum | https://pleiades.stoa.org/places/138579 | 22 | amphitheater | imperial | 1 | ... | France | aquitania | 2.571162 | 44.353698 | 611 | ||||||
11 | sanBenedettoDeiMarsiAmphitheater | Amphitheater at San Benedetto dei Marsi | San Benedetto | Marruvium | https://pleiades.stoa.org/places/432927 | amphitheater | imperial | 1 | ... | Italy | regio-iv | 101 | 13.627505 | 42.006504 | 683 | ||||||
12 | sanVittorinoAmphitheater | Amphitheater at San Vittorino | San Vittorino | Amiternum | https://pleiades.stoa.org/places/413013 | amphitheater | first-century | 1 | ... | Italy | regio-iv | 64 | 42 | 90 | 68 | 13.306027 | 42.400531 | 672 | |||
13 | sisapoAmphitheater | Amphitheater at Sisapo | Sisapo | https://pleiades.stoa.org/places/266043 | amphitheater | imperial | 1 | ... | Spain | baetica | -4.516667 | 38.645833 | 659 | ||||||||
14 | leonAmphitheater | Amphitheater at León | León | Legio | https://pleiades.stoa.org/places/236512 | amphitheater | imperial | 1 | ... | Spain | tarraconensis | -5.566944 | 42.598889 | 846 | |||||||
15 | bernAmphitheater | Amphitheater at Bern | Bern | Bremodorum | https://pleiades.stoa.org/places/177471 | 42 | amphitheater | imperial | 1 | ... | Switzerland | germania-superior | 27.6 | 23.5 | 42.6 | 38.5 | 7.451179 | 46.975715 | 547 | ||
16 | bostraAmphitheater | Amphitheater at Bosra | Bosra | Nova Trajana Bostra | https://pleiades.stoa.org/places/678073 | amphitheater | imperial | 1 | ... | Syria | syria | 36.479844 | 32.517923 | 845 | |||||||
17 | siccaVeneriaAmphitheater | Amphitheater at Sicca Veneria | Sicca Veneria | Sicca Veneria | https://pleiades.stoa.org/places/315152 | amphitheater | imperial | 1 | ... | Tunisia | proconsularis | 70 | 50 | 100 | 80 | 8.715302 | 36.185361 | 748 | |||
18 | aostaAmphitheater | Amphitheater at Aosta | Aosta | Augusta Praetoria | https://pleiades.stoa.org/places/383579 | amphitheater | julio-claudian | 1 | ... | Italy | regio-xi | 150 | 115 | 7.323481 | 45.740914 | 595 | |||||
19 | aphrodisiasAmphitheater | Late Roman Amphitheater at Aphrodisas | Aphrodisias | Aphrodisias | https://pleiades.stoa.org/places/638753 | arena-in-stadium | fourth-century | 0 | ... | Turkey | asia | 59 | 30 | 28.723061 | 37.712553 | 526 | |||||
20 | tusculumAmphitheater | Tusculum Amphitheater | Tusculum | https://pleiades.stoa.org/places/423108 | amphitheater | second-century | 1 | ... | Italy | regio-i | 49 | 30 | 73 | 54 | 12.702103 | 41.798372 | 582 | ||||
21 | compsaAmphitheater | Amphitheater at Compsa | Compsa | Compsa | https://pleiades.stoa.org/places/442550 | amphitheater | republican | 1 | ... | Italy | regio-ii | 15.330556 | 40.870278 | 572 |
22 rows × 21 columns
Our condition can be string data too; in which case we put the string in quotation marks:
query = '''
SELECT *
FROM amphi
WHERE chronogroup = "flavian";
'''
run_query(query)
id | title | label | latintoponym | pleiades | welchid | golvinid | buildingtype | chronogroup | secondcentury | ... | modcountry | romanregion | arenamajor | arenaminor | extmajor | extminor | exteriorheight | longitude | latitude | elevation | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | arlesAmphitheater | Amphitheater at Arles | Arles | Arelate | https://pleiades.stoa.org/places/148217 | 154 | amphitheater | flavian | 1 | ... | France | narbonensis | 47 | 32 | 136 | 107 | 4.631111 | 43.677778 | 21 | ||
1 | romeFlavianAmphitheater | Flavian Amphitheater at Rome | Colosseum | https://pleiades.stoa.org/places/423025 | 152 | amphitheater | flavian | 1 | ... | Italy | regio-i | 83 | 48 | 189 | 156 | 52 | 12.492269 | 41.890169 | 22 | ||
2 | newsteadAmphitheater | Amphitheater at Newstead | Newstead | Trimontium | https://pleiades.stoa.org/places/89304 | amphitheater | flavian | 1 | ... | United Kingdom | britannia | 37 | 23 | 70 | 60 | -2.691900 | 55.602600 | 83 | |||
3 | pozzuoliFlavianAmphitheater | Flavian Amphitheater at Pozzuoli | Pozzuoli (Flavian) | Puteoli | https://pleiades.stoa.org/places/432815 | 153 | amphitheater | flavian | 1 | ... | Italy | regio-i | 74.8 | 42 | 149 | 116 | 14.125315 | 40.825929 | 41 | ||
4 | segobrigaAmphitheater | Amphitheater at Segobriga | Segobriga | Segobriga | https://pleiades.stoa.org/places/266036 | 88 | amphitheater | flavian | 1 | ... | Spain | tarraconensis | 47 | 31 | 75 | 64 | -2.813944 | 39.886018 | 817 | ||
5 | parisAmphitheater | Amphitheater at Paris | Paris | Lutetia Parisiorum | https://pleiades.stoa.org/places/109126 | gallo-roman-amphitheater | flavian | 1 | ... | France | lugdunensis | 2.352855 | 48.845097 | 46 | |||||||
6 | nimesAmphitheater | Amphitheater at Nimes | Nimes | Nemausus | https://pleiades.stoa.org/places/148142 | amphitheater | flavian | 1 | ... | France | narbonensis | 69.1 | 38.4 | 133.4 | 101.4 | 4.359599 | 43.834876 | 50 | |||
7 | chesterAmphitheater | Amphitheater at Chester | Chester | Deva Victrix | https://pleiades.stoa.org/places/79420 | amphitheater | flavian | 1 | ... | United Kingdom | britannia | 56.7 | 48.3 | 95.5 | 86.6 | -2.886928 | 53.189079 | 23 | |||
8 | arezzoAmphitheater | Amphitheater at Arezzo | Arezzo | Arretium | https://pleiades.stoa.org/places/413032 | amphitheater | flavian | 1 | ... | Italy | regio-vii | 71.9 | 42.7 | 109.4 | 80 | 11.880370 | 43.460491 | 263 | |||
9 | londonAmphitheater | Amphitheater at London | London | Londinium | https://pleiades.stoa.org/places/79574 | amphitheater | flavian | 1 | ... | United Kingdom | britannia | 60 | 40 | 100 | 85 | -0.092136 | 51.515522 | 29 | |||
10 | urbisagliaAmphitheater | Amphitheater at Urbs Salvia | Urbs Salvia | Urbs Saglia | https://pleiades.stoa.org/places/413364 | amphitheater | flavian | 1 | ... | Italy | regio-v | 59 | 35 | 98 | 71 | 13.387224 | 43.200877 | 219 | |||
11 | vindonissaAmphitheater | Amphitheater at Vindonissa | Vindonissa | https://pleiades.stoa.org/places/177661 | 17 | amphitheater | flavian | 1 | ... | Switzerland | germania-superior | 64 | 51 | 112 | 98 | 8.213605 | 47.476290 | 361 | |||
12 | frejusAmphitheater | Amphitheater Fréjus | Fréjus | Forum Julii | https://pleiades.stoa.org/places/157836 | amphitheater | flavian | 1 | ... | France | narbonensis | 67.7 | 39.7 | 113.7 | 85.7 | 6.728710 | 43.434460 | 12 | |||
13 | tebessaAmphitheater | Amphitheater at Theveste | Tebessa | Theveste | https://pleiades.stoa.org/places/324831 | amphitheater | flavian | 1 | ... | Algeria | proconsularis | 52.8 | 39.5 | 83 | 70 | 8.123809 | 35.401171 | 877 | |||
14 | aquileiaAmphitheater | Amphitheater at Aquileia | Aquileia | Aquileia | https://pleiades.stoa.org/places/187290 | amphitheater | flavian | 1 | ... | Italy | regio-x | 72 | 46 | 142 | 118 | 13.370713 | 45.768239 | 2 | |||
15 | terminiImereseAmphitheater | Amphitheater at Termini Imerese | Termini | Thermae Himerae | https://pleiades.stoa.org/places/462513 | 164 | amphitheater | flavian | 1 | ... | Italy | sicilia | 53 | 30 | 98 | 75 | 13.695186 | 37.983696 | 76 | ||
16 | bolsenaAmphitheater | Amphitheater at Bolsena | Bolsena | Volsinii Novi | https://pleiades.stoa.org/places/413389 | amphitheater | flavian | 1 | ... | Italy | regio-vii | 11.990158 | 42.650243 | 417 | |||||||
17 | narbonneAmphitheater | Amphitheater at Narbonne | Narbonne | Narbo | https://pleiades.stoa.org/places/246347 | amphitheater | flavian | 1 | ... | France | narbonensis | 75 | 46.6 | 121.6 | 93.2 | 3.010181 | 43.184873 | 14 | |||
18 | capuaImperialAmphitheater | Imperial Amphitheater at Capua | Capua II | Capua | https://pleiades.stoa.org/places/432754 | amphitheater | flavian | 1 | ... | Italy | regio-i | 76.12 | 45.83 | 165 | 135 | 14.250089 | 41.085935 | 34 | |||
19 | dorchesterAmphitheater | Amphitheater at Dorchester / Maumbury Henge | Maumbury | Durnovaria | https://pleiades.stoa.org/places/79431 | amphitheater | flavian | 1 | ... | United Kingdom | britannia | 58 | 47 | 80 | 77 | -2.440246 | 50.708044 | 78 | |||
20 | autunAmphitheater | Amphitheater at Autun | Autun | Augustodunum | https://pleiades.stoa.org/places/177460 | 157 | amphitheater | flavian | 1 | ... | France | lugdunensis | 74 | 49 | 154 | 130 | 4.310856 | 46.953003 | 317 | ||
21 | bobadelaAmphitheater | Amphitheater at Bobadela | Bobadela | Elbocoris | https://pleiades.stoa.org/places/236458 | amphitheater | flavian | 1 | ... | Portugal | lusitania | 50 | 40 | -7.893572 | 40.361088 | 383 | |||||
22 | chichesterAmphitheater | Amphitheater at Chichester | Chichester | Noviomagus Reginorum | https://pleiades.stoa.org/places/79622 | amphitheater | flavian | 1 | ... | United Kingdom | britannia | 70 | 60 | -0.771186 | 50.835394 | 14 |
23 rows × 21 columns
Can you write a query that pulls only the Flavian amphitheatres in France? Hint: you'll need the AND command.
How many such amphitheatres are there? This is where you'd use the COUNT command. Let's count up the number of amphitheatres from the second century.
query = '''
SELECT COUNT(*)
FROM amphi
WHERE chronogroup = "second-century";
'''
run_query(query)
COUNT(*) | |
---|---|
0 | 50 |
We can rename that result like so:
query = '''
SELECT COUNT(*) AS "Total Count of Second Century Amphitheatres in the DB"
FROM amphi
WHERE chronogroup = "second-century";
'''
run_query(query)
Total Count of Second Century Amphitheatres in the DB | |
---|---|
0 | 50 |
SUM, AVG, MIN and MAX
What was the average capacity?
query = '''
SELECT AVG(capacity) AS "Average Capacity"
FROM amphi;
'''
run_query(query)
Average Capacity | |
---|---|
0 | 5791.476923 |
query = '''
SELECT AVG(arenamajor) AS "Average Length"
FROM amphi;
'''
run_query(query)
Average Length | |
---|---|
0 | 32.982759 |
We can group rows by one value versus another to see how they compare. Is there a difference in average length of the long axis in Julio-Claudian versus Flavian amphitheatres?
query = '''
SELECT chronogroup, AVG(arenamajor) AS "Average Length"
FROM amphi
GROUP BY chronogroup
ORDER BY "Average Length" DESC;
'''
run_query(query)
chronogroup | Average Length | |
---|---|---|
0 | post-severan | 62.750000 |
1 | fourth-century | 59.000000 |
2 | neronian | 57.000000 |
3 | augustan | 56.125000 |
4 | flavian | 54.266087 |
5 | late-second-century | 46.333333 |
6 | hadrianic | 45.626667 |
7 | third-century | 43.500000 |
8 | julio-claudian | 43.273571 |
9 | second-century | 42.856800 |
10 | severan | 42.450000 |
11 | first-century | 42.227778 |
12 | imperial | 22.010092 |
13 | caesarean | 19.333333 |
14 | republican | 10.250000 |
15 | late-1st-early-2nd | 0.000000 |
Now let's tell the database how the two tables are joined together. The label
field in the amphi
table contains the modern day description of the location of amphitheatres, and the title
field in the aqua
table contains a description of the modern day location of the aqueducts. Normally, when we join two tables, we want to perform the join on columns that are keyed together. In a sales database for instance there might a table of orders
and another for shipping address
, and each one contains a customer_id
column. In such a case, we use =
to say
FROM orders
INNER JOIN shipping_address
ON orders.customer_id = shipping_addres.customer_id
But archaeological data is rarely so straightforward. In our two tables here, we have to pattern match in order to make the two fields join up - there is no 'primary key' to help us know that a row in one table is talking about the same thing in another table. Instead of =
we're going to use the LIKE command. LIKE uses two different kinds of wildcards, %
and _
.
If we said, LIKE 'Arl%'
we would find matches on Arles, Arlate and so on. Placing the %
on either side would find strings that contain Arl. In our case, we want to find instances in the aqua
table's title
column that contain strings from the amphi
table's label
column.
To join to our first table all matching rows from our second, we do an 'inner join'. The syntax generally is:
SELECT relevant-columns # these will be the columns displayed in your result
FROM tableA # the table to join
INNER JOIN tableB # with this table
ON tableA.title = tableB.label # by these criteria
The query below displays the result of joining the aqua
table to the amphi
table using the labels
column data as the middle piece in a wildcard: %string%
, but uses the || characters to indicate we want the string values, not the literal characters amphi.label.
query = '''
SELECT amphi.id, amphi.label, aqua.identifier
FROM aqua
INNER JOIN amphi
ON aqua.title LIKE '%' || amphi.label || '%';
'''
run_query(query)
id | label | identifier | |
---|---|---|---|
0 | mitilineAmphitheater | Mitiline | http://dare.ht.lu.se/places/43530 |
We've also created a small notebook that shows how to import a database into R, and to build queries for it. Once you've done that, you can pass the results as a dataframe and use the full power of R to analyze. The notebook [is here](SQLite Database and R.ipynb).
This is also possible in python, of course, and we have an example notebook here