Tidy FAA survey

In response to a public records request made by The Times, the FAA provided a decade of its annual estimates about the activity of America's most common helicopter models. This data is not published online. The records provided to The Times were manually input into a spreadsheet.

This notebook filters the released data to population counts and flight-hour estimates for models that have appeared in all ten years of survey data.

In [24]:
import os
import pandas as pd
In [25]:
%store -r

Read in the raw survey data.

In [26]:
survey = pd.read_csv(os.path.join(input_dir, "faa_general_aviation_survey.csv"))

Identify all the models that have been surveyed by the FAA

In [27]:
model_counts = survey.groupby([
    'latimes_make',
    'latimes_model'
]).size().rename("survey_count").reset_index()
In [28]:
model_counts
Out[28]:
latimes_make latimes_model survey_count
0 AGUSTA 109 10
1 AGUSTA 119 2
2 AGUSTA 139 4
3 AIRBUS 120 3
4 AIRBUS 130 10
5 AIRBUS 135 10
6 AIRBUS 350 10
7 BELL 206 10
8 BELL 212 5
9 BELL 222 1
10 BELL 407 10
11 BELL 412 4
12 BELL 429 1
13 BELL 47 7
14 BELL OH-58 11
15 BELL TH-13 1
16 BELL UH-1H 9
17 ENSTROM 280 1
18 ENSTROM F28 4
19 HILLER UH-12 2
20 HUGHES 269 3
21 HUGHES 369 10
22 HUGHES OH 2
23 HUGHES TH 1
24 HUGHES YO 2
25 MBB 105 6
26 MBB 117 9
27 MCDONNELL DOUGLAS 369 10
28 MITSUBISHI MU2B 1
29 ROBINSON R22 10
30 ROBINSON R44 10
31 ROBINSON R66 1
32 ROCKWELL 690 1
33 ROTARY RAF2000 1
34 SCHLEICHER ASW27 1
35 SCHWEIZER 269 10
36 SIKORSKY 61 2
37 SIKORSKY 76 10
38 SIKORSKY 92 2
39 SOCATA TBM700 1
40 SWEARINGEN SA227 1
41 TAYLORCRAFT BC12 1
42 VANS RV10 1
43 VANS RV4 1
44 VANS RV6 1
45 VANS RV7 1
46 VANS RV8 1
47 VANS RV9 1
48 YAKOVLEV YAK52 1
49 ZENAIR CH601 1
50 ZENAIR CH701 1

Filter down to models that appear in all ten years of survey data

In [29]:
qualified_models = model_counts[model_counts.survey_count == 10][
    ['latimes_make', 'latimes_model']
]

List them

In [30]:
qualified_models
Out[30]:
latimes_make latimes_model
0 AGUSTA 109
4 AIRBUS 130
5 AIRBUS 135
6 AIRBUS 350
7 BELL 206
10 BELL 407
21 HUGHES 369
27 MCDONNELL DOUGLAS 369
29 ROBINSON R22
30 ROBINSON R44
35 SCHWEIZER 269
37 SIKORSKY 76

Filter down the survey data to only include those models that have qualified for analysis.

In [31]:
qualified_survey = survey.merge(
    qualified_models,
    on=["latimes_make", "latimes_model"],
    how="inner"
)

Trim down the columns.

In [32]:
trimmed_qualified_survey  = qualified_survey[[
    'year',
    'latimes_make',
    'latimes_model',
    'population_count',
    'active_count',
    'total_hours'
]]
In [33]:
cleaned_qualified_survey  = trimmed_qualified_survey.sort_values([
    'latimes_make',
    'latimes_model',
    'year'
])

Create a combined make and model column.

In [34]:
cleaned_qualified_survey['latimes_make_and_model'] = cleaned_qualified_survey.latimes_make + " " + cleaned_qualified_survey.latimes_model

Output the result.

In [35]:
cleaned_qualified_survey.to_csv(os.path.join(output_dir, "tidy-survey.csv"), index=False)