A machine-readable copy of the database is published online in Microsoft Access format. Let's download it.
import os
import pypyodbc
Configure the data directories.
%store -r
Remove the zip file if it already exists.
zip_path = os.path.join(input_dir, "avall.zip")
!rm -f $zip_path
Download the file.
!wget -O $input_dir/avall.zip -nc https://app.ntsb.gov/avdata/Access/avall.zip
Unzip the file.
!unzip -n $input_dir/\*.zip -d $input_dir/
Use the mdbtools command-line tool to get a list of all the tables in the database
mdb_path = os.path.join(input_dir, 'avall.mdb')
tables = !mdb-tables $mdb_path
Split them into a Python list
tables = tables[0].split()
Extract the tables from the Microsoft Access database and write them out as CSVs.
def export_table(name):
print(f"Exporting {name}")
export_path = os.path.join(input_dir, f"{name.lower()}.csv")
!mdb-export $mdb_path $name > $export_path
for t in tables: export_table(t)