Download NTSB Aviation Accidents Database

A machine-readable copy of the database is published online in Microsoft Access format. Let's download it.

In [5]:
import os
import pypyodbc

Configure the data directories.

In [6]:
%store -r

Remove the zip file if it already exists.

In [8]:
zip_path = os.path.join(input_dir, "avall.zip")
In [10]:
!rm -f $zip_path

Download the file.

In [4]:
!wget -O $input_dir/avall.zip -nc https://app.ntsb.gov/avdata/Access/avall.zip
--2018-10-17 18:43:43--  https://app.ntsb.gov/avdata/Access/avall.zip
Resolving app.ntsb.gov (app.ntsb.gov)... 199.173.155.8
Connecting to app.ntsb.gov (app.ntsb.gov)|199.173.155.8|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 242648102 (231M) [application/x-zip-compressed]
Saving to: ‘/home/palewire/Code/helicopter-accident-analysis/src/../input/avall.zip’

/home/palewire/Code 100%[===================>] 231.41M  1.07MB/s    in 2m 43s  

2018-10-17 18:46:28 (1.42 MB/s) - ‘/home/palewire/Code/helicopter-accident-analysis/src/../input/avall.zip’ saved [242648102/242648102]

Unzip the file.

In [5]:
!unzip -n $input_dir/\*.zip -d $input_dir/
Archive:  /home/palewire/Code/helicopter-accident-analysis/src/../input/avall.zip

Use the mdbtools command-line tool to get a list of all the tables in the database

In [13]:
mdb_path = os.path.join(input_dir, 'avall.mdb')
In [14]:
tables = !mdb-tables $mdb_path

Split them into a Python list

In [15]:
tables = tables[0].split()

Extract the tables from the Microsoft Access database and write them out as CSVs.

In [16]:
def export_table(name):
    print(f"Exporting {name}")
    export_path = os.path.join(input_dir, f"{name.lower()}.csv")
    !mdb-export $mdb_path $name > $export_path
In [17]:
for t in tables: export_table(t)
Exporting aircraft
Exporting Country
Exporting ct_seqevt
Exporting dt_aircraft
Exporting dt_events
Exporting dt_Flight_Crew
Exporting eADMSPUB_DataDictionary
Exporting engines
Exporting events
Exporting Events_Sequence
Exporting Findings
Exporting Flight_Crew
Exporting flight_time
Exporting injury
Exporting narratives
Exporting NTSB_Admin
Exporting Occurrences
Exporting seq_of_events
Exporting states
Exporting ct_iaids