Tidy FAA survey¶

In response to a public records request made by The Times, the FAA provided a decade of its annual estimates about the activity of America's most common helicopter models. This data is not published online. The records provided to The Times were manually input into a spreadsheet.

This notebook filters the released data to population counts and flight-hour estimates for models that have appeared in all ten years of survey data.

In [24]:

import os
import pandas as pd

In [25]:

%store -r

Read in the raw survey data.

In [26]:

survey = pd.read_csv(os.path.join(input_dir, "faa_general_aviation_survey.csv"))

Identify all the models that have been surveyed by the FAA

In [27]:

model_counts = survey.groupby([
    'latimes_make',
    'latimes_model'
]).size().rename("survey_count").reset_index()

In [28]:

model_counts

Out[28]:

	latimes_make	latimes_model	survey_count
0	AGUSTA	109	10
1	AGUSTA	119	2
2	AGUSTA	139	4
3	AIRBUS	120	3
4	AIRBUS	130	10
5	AIRBUS	135	10
6	AIRBUS	350	10
7	BELL	206	10
8	BELL	212	5
9	BELL	222	1
10	BELL	407	10
11	BELL	412	4
12	BELL	429	1
13	BELL	47	7
14	BELL	OH-58	11
15	BELL	TH-13	1
16	BELL	UH-1H	9
17	ENSTROM	280	1
18	ENSTROM	F28	4
19	HILLER	UH-12	2
20	HUGHES	269	3
21	HUGHES	369	10
22	HUGHES	OH	2
23	HUGHES	TH	1
24	HUGHES	YO	2
25	MBB	105	6
26	MBB	117	9
27	MCDONNELL DOUGLAS	369	10
28	MITSUBISHI	MU2B	1
29	ROBINSON	R22	10
30	ROBINSON	R44	10
31	ROBINSON	R66	1
32	ROCKWELL	690	1
33	ROTARY	RAF2000	1
34	SCHLEICHER	ASW27	1
35	SCHWEIZER	269	10
36	SIKORSKY	61	2
37	SIKORSKY	76	10
38	SIKORSKY	92	2
39	SOCATA	TBM700	1
40	SWEARINGEN	SA227	1
41	TAYLORCRAFT	BC12	1
42	VANS	RV10	1
43	VANS	RV4	1
44	VANS	RV6	1
45	VANS	RV7	1
46	VANS	RV8	1
47	VANS	RV9	1
48	YAKOVLEV	YAK52	1
49	ZENAIR	CH601	1
50	ZENAIR	CH701	1

Filter down to models that appear in all ten years of survey data

In [29]:

qualified_models = model_counts[model_counts.survey_count == 10][
    ['latimes_make', 'latimes_model']
]

List them

In [30]:

qualified_models

Out[30]:

	latimes_make	latimes_model
0	AGUSTA	109
4	AIRBUS	130
5	AIRBUS	135
6	AIRBUS	350
7	BELL	206
10	BELL	407
21	HUGHES	369
27	MCDONNELL DOUGLAS	369
29	ROBINSON	R22
30	ROBINSON	R44
35	SCHWEIZER	269
37	SIKORSKY	76

Filter down the survey data to only include those models that have qualified for analysis.

In [31]:

qualified_survey = survey.merge(
    qualified_models,
    on=["latimes_make", "latimes_model"],
    how="inner"
)

Trim down the columns.

In [32]:

trimmed_qualified_survey  = qualified_survey[[
    'year',
    'latimes_make',
    'latimes_model',
    'population_count',
    'active_count',
    'total_hours'
]]

In [33]:

cleaned_qualified_survey  = trimmed_qualified_survey.sort_values([
    'latimes_make',
    'latimes_model',
    'year'
])

Create a combined make and model column.

In [34]:

cleaned_qualified_survey['latimes_make_and_model'] = cleaned_qualified_survey.latimes_make + " " + cleaned_qualified_survey.latimes_model

Output the result.

In [35]:

cleaned_qualified_survey.to_csv(os.path.join(output_dir, "tidy-survey.csv"), index=False)