In response to a public records request made by The Times, the FAA provided a decade of its annual estimates about the activity of America's most common helicopter models. This data is not published online. The records provided to The Times were manually input into a spreadsheet.
This notebook filters the released data to population counts and flight-hour estimates for models that have appeared in all ten years of survey data.
import os
import pandas as pd
%store -r
Read in the raw survey data.
survey = pd.read_csv(os.path.join(input_dir, "faa_general_aviation_survey.csv"))
Identify all the models that have been surveyed by the FAA
model_counts = survey.groupby([
'latimes_make',
'latimes_model'
]).size().rename("survey_count").reset_index()
model_counts
latimes_make | latimes_model | survey_count | |
---|---|---|---|
0 | AGUSTA | 109 | 10 |
1 | AGUSTA | 119 | 2 |
2 | AGUSTA | 139 | 4 |
3 | AIRBUS | 120 | 3 |
4 | AIRBUS | 130 | 10 |
5 | AIRBUS | 135 | 10 |
6 | AIRBUS | 350 | 10 |
7 | BELL | 206 | 10 |
8 | BELL | 212 | 5 |
9 | BELL | 222 | 1 |
10 | BELL | 407 | 10 |
11 | BELL | 412 | 4 |
12 | BELL | 429 | 1 |
13 | BELL | 47 | 7 |
14 | BELL | OH-58 | 11 |
15 | BELL | TH-13 | 1 |
16 | BELL | UH-1H | 9 |
17 | ENSTROM | 280 | 1 |
18 | ENSTROM | F28 | 4 |
19 | HILLER | UH-12 | 2 |
20 | HUGHES | 269 | 3 |
21 | HUGHES | 369 | 10 |
22 | HUGHES | OH | 2 |
23 | HUGHES | TH | 1 |
24 | HUGHES | YO | 2 |
25 | MBB | 105 | 6 |
26 | MBB | 117 | 9 |
27 | MCDONNELL DOUGLAS | 369 | 10 |
28 | MITSUBISHI | MU2B | 1 |
29 | ROBINSON | R22 | 10 |
30 | ROBINSON | R44 | 10 |
31 | ROBINSON | R66 | 1 |
32 | ROCKWELL | 690 | 1 |
33 | ROTARY | RAF2000 | 1 |
34 | SCHLEICHER | ASW27 | 1 |
35 | SCHWEIZER | 269 | 10 |
36 | SIKORSKY | 61 | 2 |
37 | SIKORSKY | 76 | 10 |
38 | SIKORSKY | 92 | 2 |
39 | SOCATA | TBM700 | 1 |
40 | SWEARINGEN | SA227 | 1 |
41 | TAYLORCRAFT | BC12 | 1 |
42 | VANS | RV10 | 1 |
43 | VANS | RV4 | 1 |
44 | VANS | RV6 | 1 |
45 | VANS | RV7 | 1 |
46 | VANS | RV8 | 1 |
47 | VANS | RV9 | 1 |
48 | YAKOVLEV | YAK52 | 1 |
49 | ZENAIR | CH601 | 1 |
50 | ZENAIR | CH701 | 1 |
Filter down to models that appear in all ten years of survey data
qualified_models = model_counts[model_counts.survey_count == 10][
['latimes_make', 'latimes_model']
]
List them
qualified_models
latimes_make | latimes_model | |
---|---|---|
0 | AGUSTA | 109 |
4 | AIRBUS | 130 |
5 | AIRBUS | 135 |
6 | AIRBUS | 350 |
7 | BELL | 206 |
10 | BELL | 407 |
21 | HUGHES | 369 |
27 | MCDONNELL DOUGLAS | 369 |
29 | ROBINSON | R22 |
30 | ROBINSON | R44 |
35 | SCHWEIZER | 269 |
37 | SIKORSKY | 76 |
Filter down the survey data to only include those models that have qualified for analysis.
qualified_survey = survey.merge(
qualified_models,
on=["latimes_make", "latimes_model"],
how="inner"
)
Trim down the columns.
trimmed_qualified_survey = qualified_survey[[
'year',
'latimes_make',
'latimes_model',
'population_count',
'active_count',
'total_hours'
]]
cleaned_qualified_survey = trimmed_qualified_survey.sort_values([
'latimes_make',
'latimes_model',
'year'
])
Create a combined make and model column.
cleaned_qualified_survey['latimes_make_and_model'] = cleaned_qualified_survey.latimes_make + " " + cleaned_qualified_survey.latimes_model
Output the result.
cleaned_qualified_survey.to_csv(os.path.join(output_dir, "tidy-survey.csv"), index=False)