"Alternative-fuel" vehicles in Delaware: Data analysis

Peter Attia, petermattia.com

Last updated November 9, 2017

This notebook investigates alternative-fuel vehicle purchases in Delaware. Alternative-fuel vehicles include:

  • Battery electric vehicles (BEVs)
  • Plug-in hybrid electric vehicles (PHEVs)
  • Propane or natural gas vehicles

Rebates data downloaded on October 22, 2017 (data last updated October 2, 2017) from this link

More information about the program:

This notebook contains the data analysis process for this dataset. Another notebook shows the data cleaning.

Imports and defaults

In [1]:
# IMPORTS
import numpy as np
import pandas as pd
from collections import Counter
import matplotlib.pyplot as plt
import seaborn as sns
import folium
import json

# DEFAULTS
%matplotlib inline
pd.set_option("display.max_rows",10)
pd.set_option("display.max_columns",20)
sns.set(font_scale=2)
sns.set_style("dark")

Load the cleaned rebates dataset

In [2]:
rebates = pd.read_csv('State_Rebates_for_Alternative-Fuel_Vehicles_cleaned.csv')
rebates["Date_of_Purchase"] = pd.to_datetime(rebates["Date_of_Purchase"]) # must adjust in this notebook
rebates
Out[2]:
Award_Number Rebate_Amount City State County Zip Age Gender Date_of_Purchase Dealership Vehicle_Type Make Model Year Lease? Gasoline_Emissions Alt-Fuel_Emissions
0 CVR071501 $2200.00 Hockessin DE New Castle 19707 81.0 M 2015-07-20 Sheridan Ford Plug-in Hybrid Ford Fusion Energi 2016 No 14815.0 6575.0
1 CVR071502 $2200.00 Wilmington DE New Castle 19809 47.0 M 2015-07-20 Porter Ford Plug-in Hybrid Ford C-Max Energi 2015 Yes 14815.0 6575.0
2 CVR071503 $2200.00 Wilmington DE New Castle 19810 47.0 M 2015-07-22 Darcars of Lanham Severn. Inc. Electric Ford Focus Electric 2015 Yes 14815.0 5539.0
3 CVR071504 $2200.00 Wilmington DE New Castle 19808 66.0 M 2015-07-20 Sheridan Ford Plug-in Hybrid Ford C-Max Energi 2015 Yes 14815.0 6575.0
4 CVR071505 $2200.00 New Castle DE New Castle 19720 51.0 M 2015-07-24 Sheridan Ford Plug-in Hybrid Ford Fusion Energi 2015 Yes 14815.0 6575.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
510 CVR0616147 $1100.00 Dover DE Kent 19901 NaN NaN 2016-04-20 Coach and Equipment/ Coach Bus Sales Propane Ford E-450 Phoenix 2016 No NaN NaN
511 CVR0616148 $1100.00 Dover DE Kent 19901 NaN NaN 2016-04-20 Coach and Equipment/ Coach Bus Sales Propane Ford E-450 Phoenix 2016 No NaN NaN
512 CVR0616149 $1100.00 Dover DE Kent 19901 NaN NaN 2016-04-20 Coach and Equipment/ Coach Bus Sales Propane Ford E-450 Phoenix 2016 No NaN NaN
513 CVR0616150 $1100.00 Dover DE Kent 19901 NaN NaN 2016-04-20 Coach and Equipment/ Coach Bus Sales Propane Ford E-450 Phoenix 2016 No NaN NaN
514 CVR0616151 $1100.00 Dover DE Kent 19901 NaN NaN 2016-04-20 Coach and Equipment/ Coach Bus Sales Propane Ford E-450 Phoenix 2016 No NaN NaN

515 rows × 17 columns

Preliminary data exploration

Ages

All alternative fuel vehicles

In [3]:
fig = plt.figure(figsize=(10, 4)) 
plt.subplot(1,2,1)
rebates["Age"].dropna().hist(bins=np.arange(20,100,1))
plt.xlim(20,100)
plt.xlabel('Age'),plt.ylabel('Frequency')

plt.subplot(1,2,2)
rebates["Age"].dropna().hist(bins=np.arange(20,110,10))
plt.xlabel('Age (decade of life)'),plt.ylabel('Frequency')
plt.xlim(20,100)

plt.tight_layout()
plt.savefig('age.svg',bbox_inches='tight')
In [4]:
Counter(rebates["Age"]).most_common(5)
Out[4]:
[(50.0, 17), (47.0, 16), (31.0, 16), (65.0, 16), (59.0, 15)]

Just Teslas

In [5]:
fig = plt.figure(figsize=(10, 4)) 
plt.subplot(1,2,1)
rebates[rebates['Make'] == 'Tesla']["Age"].dropna().hist(bins=np.arange(20,100,1))
plt.xlabel('Age'),plt.ylabel('Frequency')
plt.xlim(20,100)

plt.subplot(1,2,2)
rebates[rebates['Make'] == 'Tesla']["Age"].dropna().hist(bins=np.arange(20,100,10))
plt.xlabel('Age (decade of life)'),plt.ylabel('Frequency')
plt.xlim(20,100)

plt.tight_layout()
plt.savefig('age_tesla.svg',bbox_inches='tight')

County

All alternative fuel vehicles

In [6]:
rebates['County'].value_counts().plot(kind='barh')
plt.xlabel('Number of alternative fuel vehicles')
Out[6]:
<matplotlib.text.Text at 0x11b14ee80>

Just Teslas

In [7]:
rebates[rebates['Make']=='Tesla']['County'].value_counts().plot(kind='barh')
plt.xlabel('Number of Teslas')
Out[7]:
<matplotlib.text.Text at 0x11a491a90>

Fraction of purchases that are Teslas

In [8]:
(100 * rebates[rebates['Make']=='Tesla']['County'].value_counts() / rebates['County'].value_counts()).plot(kind='barh')
plt.xlabel('Fraction of Teslas of total alternative fuel vehicles (%)')
Out[8]:
<matplotlib.text.Text at 0x11b29ea90>

Combined plot

In [9]:
fig = plt.figure(figsize=(15, 4)) 

plt.subplot(1,3,1)
rebates['County'].value_counts().plot(kind='barh')
plt.title('Number of \nalternative fuel vehicles')
plt.xlabel('Count')

plt.subplot(1,3,2)
rebates[rebates['Make']=='Tesla']['County'].value_counts().plot(kind='barh')
plt.title('Number of Teslas')
plt.xlabel('Count')
plt.gca().set_yticklabels(['','',''])

plt.subplot(1,3,3)
(100 * rebates[rebates['Make']=='Tesla']['County'].value_counts() / rebates['County'].value_counts()).plot(kind='barh')
plt.title('Fraction of Teslas of total\nalternative fuel vehicles')
plt.xlabel('Percent (%)')
plt.gca().set_yticklabels(['','',''])

plt.tight_layout()
plt.savefig('counties.svg',bbox_inches='tight')
In [10]:
fig = plt.figure(figsize=(15, 4)) 

# Population values taken from googling "__ county delaware" (google returned 2015 values)
county_pop_dict = {"New Castle": 556779, "Kent": 173533, "Sussex": 215622 }

plt.subplot(1,3,1)
county_afv_norm = dict(rebates['County'].value_counts())
county_afv_norm.update({n: 10000*county_afv_norm[n]/county_pop_dict[n] for n in county_afv_norm.keys()})
plt.barh(range(len(county_afv_norm)), county_afv_norm.values(), align='center')
plt.yticks(range(len(county_afv_norm)), list(county_afv_norm.keys()))
plt.title('Number of \nalternative fuel vehicles')
plt.xlabel('Count per 10,000 residents')

plt.subplot(1,3,2)
county_tesla_norm = dict(rebates[rebates['Make']=='Tesla']['County'].value_counts())
county_tesla_norm.update({n: 10000*county_tesla_norm[n]/county_pop_dict[n] for n in county_tesla_norm.keys()})
plt.barh(range(len(county_tesla_norm)), county_tesla_norm.values(), align='center')
plt.yticks(range(len(county_tesla_norm)), list(county_tesla_norm.keys()))
plt.title('Number of Teslas')
plt.xlabel('Count per 10,000 residents')
plt.gca().set_yticklabels(['','',''])

plt.subplot(1,3,3)
(100 * rebates[rebates['Make']=='Tesla']['County'].value_counts() / rebates['County'].value_counts()).plot(kind='barh')
plt.title('Fraction of Teslas of total\nalternative fuel vehicles')
plt.xlabel('Percent (%)')
plt.gca().set_yticklabels(['','',''])

plt.tight_layout()
plt.savefig('counties_norm.svg',bbox_inches='tight')

Makes

In [11]:
rebates['Make'].value_counts(ascending=True).plot(kind='barh',figsize=(8,6))
plt.xlabel('Vehicle count')
plt.savefig('make.svg',bbox_inches='tight')

Models

In [12]:
Counter(rebates["Model"])
rebates['Model'].value_counts(ascending=True).plot(kind='barh',figsize=(14,10))
plt.xlabel('Count')
plt.savefig('models.svg',bbox_inches='tight')

Vehicle type

In [13]:
rebates["Vehicle_Type"].value_counts(ascending=True).plot(kind='barh')
plt.savefig('vehicle_types.svg',bbox_inches='tight')

Propane vehicles are coach buses

In [14]:
rebates[rebates['Dealership'] != 'Coach and Equipment/ Coach Bus Sales']['Vehicle_Type'].value_counts(ascending=True).plot(kind='barh')
Out[14]:
<matplotlib.axes._subplots.AxesSubplot at 0x11b68b128>
In [15]:
rebates[rebates["Vehicle_Type"] == 'Electric']['Model'].value_counts(ascending=True).plot(kind='barh')
plt.xlabel('Count')
plt.savefig('electric_models.svg',bbox_inches='tight')

Zipcodes

In [16]:
zip_counter = Counter(rebates["Zip"])
zip_df = pd.DataFrame.from_dict(zip_counter, orient='index').reset_index()
zip_df = zip_df.rename(columns={'index':'Zip', 0:'Count'})
zip_df['Zip'] = zip_df['Zip'].astype(str)
zip_df
Out[16]:
Zip Count
0 19707 14
1 19809 10
2 19810 18
3 19808 27
4 19720 15
... ... ...
43 19960 2
44 19731 1
45 19955 1
46 19943 3
47 19902 1

48 rows × 2 columns

In [17]:
zip_pop_df = pd.read_csv('2010+Census+Population+By+Zipcode+(ZCTA).csv')
zip_pop_df = zip_pop_df.rename(columns={'Zip Code ZCTA':'Zip', '2010 Census Population':'population'})
zip_pop_df['Zip'] = zip_pop_df['Zip'].astype(str)
zip_pop_df
Out[17]:
Zip population
0 1001 16769
1 1002 29049
2 1003 10372
3 1005 5079
4 1007 14649
... ... ...
33087 99923 87
33088 99925 819
33089 99926 1460
33090 99927 94
33091 99929 2338

33092 rows × 2 columns

Normalize count by population to get count per capita

In [18]:
zip_df_merged = pd.merge(zip_df, zip_pop_df, on='Zip')
zip_df_merged['count_norm'] = zip_df_merged.Count / zip_df_merged.population * 1e3
zip_df_merged
Out[18]:
Zip Count population count_norm
0 19707 14 16483 0.849360
1 19809 10 14049 0.711794
2 19810 18 25011 0.719683
3 19808 27 38442 0.702357
4 19720 15 59250 0.253165
... ... ... ... ...
43 19960 2 6295 0.317712
44 19731 1 252 3.968254
45 19955 1 72 13.888889
46 19943 3 11425 0.262582
47 19902 1 283 3.533569

48 rows × 4 columns

Just Teslas

In [19]:
tesla_zip_counter = Counter(rebates[rebates['Make']=='Tesla']["Zip"])
tesla_zip_df = pd.DataFrame.from_dict(tesla_zip_counter, orient='index').reset_index()
tesla_zip_df = tesla_zip_df.rename(columns={'index':'Zip', 0:'Count'})
tesla_zip_df['Zip'] = tesla_zip_df['Zip'].astype(str)
tesla_zip_df_merged = pd.merge(tesla_zip_df, zip_pop_df, on='Zip')
tesla_zip_df_merged['count_norm'] = tesla_zip_df_merged.Count / tesla_zip_df_merged.population * 1e3
tesla_zip_df_merged
Out[19]:
Zip Count population count_norm
0 19713 2 30408 0.065772
1 19953 1 4386 0.227998
2 19901 4 35055 0.114106
3 19803 6 21364 0.280846
4 19732 1 25 40.000000
... ... ... ... ...
18 19720 1 59250 0.016878
19 19977 1 22984 0.043509
20 19934 1 12805 0.078094
21 19975 1 8253 0.121168
22 19904 1 34132 0.029298

23 rows × 4 columns

In [20]:
geo_json_data = json.load(open('de_zipcode_boundaries.min.json'))
geo_json_data['features'][0]['properties'] # zip code data in ZCTA5CE10
Out[20]:
{'ALAND10': 37591415,
 'AWATER10': 17805,
 'CLASSFP10': 'B5',
 'FUNCSTAT10': 'S',
 'GEOID10': '1019962',
 'INTPTLAT10': '+39.0688617',
 'INTPTLON10': '-075.4881219',
 'MTFCC10': 'G6350',
 'PARTFLG10': 'N',
 'STATEFP10': '10',
 'ZCTA5CE10': '19962'}

Make screenshot function

In [21]:
def save_map(m,filename):
    import os
    import time
    from selenium import webdriver
 
    delay=5
 
    #Save the map as an HTML file
    fn= filename + '.html'
    cwd = os.getcwd()
    tmpurl='file://' + cwd + '/' + fn
    tmpurl = tmpurl.format(path=os.getcwd(),mapfile=fn)
    m.save(fn)
 
    #Open a browser window...
    browser = webdriver.Safari()
    #..that displays the map...
    browser.get(tmpurl)
    #Give the map tiles some time to load
    time.sleep(delay)
    #Grab the screenshot
    browser.save_screenshot(filename + '.png')
    #Close the browser
    browser.quit()
In [22]:
# Test GeoJSON data
m = folium.Map(location=[39.2108325,-75.5276699], zoom_start=9, width=500, height=800)
folium.GeoJson(geo_json_data).add_to(m)
# For these maps to load, make sure you launch jupyter as:
#    jupyter notebook --NotebookApp.iopub_data_rate_limit=10000000000
# See https://github.com/jupyter/notebook/issues/2287 for more details.
save_map(m,'zip_codes_map')
m
Out[22]: