Download NTSB Aviation Accidents Database¶

A machine-readable copy of the database is published online in Microsoft Access format. Let's download it.

In [5]:

import os
import pypyodbc

Configure the data directories.

In [6]:

%store -r

Remove the zip file if it already exists.

In [8]:

zip_path = os.path.join(input_dir, "avall.zip")

In [10]:

!rm -f $zip_path

Download the file.

In [4]:

!wget -O $input_dir/avall.zip -nc https://app.ntsb.gov/avdata/Access/avall.zip

--2018-10-17 18:43:43--  https://app.ntsb.gov/avdata/Access/avall.zip
Resolving app.ntsb.gov (app.ntsb.gov)... 199.173.155.8
Connecting to app.ntsb.gov (app.ntsb.gov)|199.173.155.8|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 242648102 (231M) [application/x-zip-compressed]
Saving to: ‘/home/palewire/Code/helicopter-accident-analysis/src/../input/avall.zip’

/home/palewire/Code 100%[===================>] 231.41M  1.07MB/s    in 2m 43s  

2018-10-17 18:46:28 (1.42 MB/s) - ‘/home/palewire/Code/helicopter-accident-analysis/src/../input/avall.zip’ saved [242648102/242648102]

Unzip the file.

In [5]:

!unzip -n $input_dir/\*.zip -d $input_dir/

Archive:  /home/palewire/Code/helicopter-accident-analysis/src/../input/avall.zip

Use the mdbtools command-line tool to get a list of all the tables in the database

In [13]:

mdb_path = os.path.join(input_dir, 'avall.mdb')

In [14]:

tables = !mdb-tables $mdb_path

Split them into a Python list

In [15]:

tables = tables[0].split()

Extract the tables from the Microsoft Access database and write them out as CSVs.

In [16]:

def export_table(name):
    print(f"Exporting {name}")
    export_path = os.path.join(input_dir, f"{name.lower()}.csv")
    !mdb-export $mdb_path $name > $export_path

In [17]:

for t in tables: export_table(t)

Exporting aircraft
Exporting Country
Exporting ct_seqevt
Exporting dt_aircraft
Exporting dt_events
Exporting dt_Flight_Crew
Exporting eADMSPUB_DataDictionary
Exporting engines
Exporting events
Exporting Events_Sequence
Exporting Findings
Exporting Flight_Crew
Exporting flight_time
Exporting injury
Exporting narratives
Exporting NTSB_Admin
Exporting Occurrences
Exporting seq_of_events
Exporting states
Exporting ct_iaids