Eleanor Tutt | tutte@carnegielibrary.org | Code for PGH Work Night | June 13, 2018
This is the same Jupyter notebook I will talk through. You can also use it as a starting point to explore digital equity data during the second half of the work night.
I have been working with Jupyter notebooks on and off for several years, but I still consider myself a beginner and this talk is designed for beginners!
I'll leave lots of time for you to explore Jupyter notebooks, Python, and digital equity data using this notebook as a starting point - you can slow down or speed up then, ideally in small groups with new friends. :)
# No code for this part - just talk!
# if you want to use any python packages, add them at the beginning of your notebook
# packages allow you to take "shortcuts" instead of coding everything yourself
# like if you are making a pizza and you buy dough instead of making it totally from scratch
# pandas is a python package popular with data scientists
# it is common to shorten pandas to pd when you import it
import pandas as pd
# folium is a python package that helps create interactive maps using leaflet
import folium
# later we are going to read in some geojson
import simplejson
# and this will help us with some colors for a choropleth map
from branca.colormap import linear
# pandas has a function called read_csv that can import csv data and store it as a dataframe
# a dataframe is a common pandas object that stores information in rows and columns
# you can think of it like a spreadsheet, especially if you are using .csv data
# here we are creating a dataframe named libraries from a previously downloaded csv file
# original data from https://data.wprdc.org/dataset/libraries
# you could also link directly to the wprdc download url in your code:
# https://data.wprdc.org/datastore/dump/14babf3f-4932-4828-8b49-3c9a03bae6d0
libraries = pd.read_csv("librarylocations.csv")
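(Not part of the original notebook: read_csv also accepts a URL or any file-like object, not just a local path. Here is a minimal sketch using a tiny in-memory CSV - the two rows are borrowed from the library locations data - so it runs without downloading anything.)

```python
import io
import pandas as pd

# a tiny in-memory CSV standing in for librarylocations.csv
csv_text = """Name,Lat,Lon
ALLEGHENY LIBRARY,40.456392,-80.006613
BEECHVIEW LIBRARY,40.407026,-80.027741
"""

# read_csv accepts a path, a URL, or any file-like object
demo = pd.read_csv(io.StringIO(csv_text))
print(demo.shape)  # (2, 3)
```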
# here is library wifi use data - brand new on the WPRDC!
# original data from https://data.wprdc.org/dataset/clp-public-wifi
wifi = pd.read_csv("clp-public-wifi.csv")
# here we are creating a dataframe named fcc from previously downloaded data
# original data from
# we added the dtype parameter to make sure tract numbers are imported as text and not numbers
fcc = pd.read_csv("tract_map_dec_2016.csv", dtype={'tractcode': str})
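(A quick aside, not in the original notebook: the reason dtype matters is that tract codes can start with a zero, and if pandas reads them as integers that leading zero disappears. A minimal sketch with made-up rows:)

```python
import io
import pandas as pd

# without dtype=str, "01001020100" would be read as the integer 1001020100
csv_text = "tractcode,pcat_all\n01001020100,3\n42003468900,4\n"
fcc_demo = pd.read_csv(io.StringIO(csv_text), dtype={"tractcode": str})
print(fcc_demo["tractcode"][0])  # "01001020100" - leading zero preserved
```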
# adding .info() to the name of a dataframe gives us some information about the dataframe -
# how many rows of data we have, what the columns are named, and what the data types are
fcc.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 73767 entries, 0 to 73766
Data columns (total 3 columns):
tractcode    73767 non-null object
pcat_all     73767 non-null int64
pcat_10x1    73767 non-null int64
dtypes: int64(2), object(1)
memory usage: 1.7+ MB
# you also might want to look at the first few rows of your dataframe
libraries.head(5)
|   | _id | CLPID | Name | Address | City | State | Zip4 | County | Phone | SqFt | ... | ThOpen | ThClose | FrOpen | FrClose | SaOpen | SaClose | SuOpen | SuClose | Lat | Lon |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | CLP01 | ALLEGHENY LIBRARY | 1230 FEDERAL ST | PITTSBURGH | PA | 15212-4704 | Allegheny | (412) 237-1890 | 15005.0 | ... | 10:00:00 | 20:00:00 | 10:00:00 | 17:00:00 | 10:00:00 | 17:00:00 | 12:00:00 | 17:00:00 | 40.456392 | -80.006613 |
| 1 | 2 | CLP02 | BEECHVIEW LIBRARY | 1910 BROADWAY AVE | PITTSBURGH | PA | 15216-3130 | Allegheny | (412) 563-2900 | 8000.0 | ... | 10:00:00 | 20:00:00 | 10:00:00 | 17:00:00 | 10:00:00 | 17:00:00 | NaN | NaN | 40.407026 | -80.027741 |
| 2 | 3 | CLP03 | BROOKLINE LIBRARY | 708 BROOKLINE BLVD | PITTSBURGH | PA | 15226-2102 | Allegheny | (412) 561-1003 | 12651.0 | ... | 10:00:00 | 20:00:00 | 10:00:00 | 17:00:00 | 10:00:00 | 17:00:00 | NaN | NaN | 40.394399 | -80.021427 |
| 3 | 4 | CLP04 | CARRICK LIBRARY | 1811 BROWNSVILLE RD | PITTSBURGH | PA | 15210-3907 | Allegheny | (412) 882-3897 | 4000.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 40.397019 | -79.987547 |
| 4 | 5 | CLP05 | DOWNTOWN & BUSINESS LIBRARY | 612 SMITHFIELD ST | PITTSBURGH | PA | 15222-2506 | Allegheny | (412) 281-7141 | 12709.0 | ... | 08:30:00 | 18:00:00 | 08:30:00 | 17:00:00 | 10:00:00 | 17:00:00 | NaN | NaN | 40.441749 | -79.997112 |

5 rows × 26 columns
# or the last few rows
wifi.tail(5)
|   | CLPID | Name | Year | Month | WifiSessions | WifiMinutes |
|---|---|---|---|---|---|---|
| 527 | CLP19 | WOODS RUN LIBRARY | 2017 | 12 | 630 | 79279 |
| 528 | CLP19 | WOODS RUN LIBRARY | 2018 | 1 | 716 | 99671 |
| 529 | CLP19 | WOODS RUN LIBRARY | 2018 | 2 | 778 | 108100 |
| 530 | CLP19 | WOODS RUN LIBRARY | 2018 | 3 | 816 | 104073 |
| 531 | CLP19 | WOODS RUN LIBRARY | 2018 | 4 | 903 | 107865 |
# we can group by one or more columns in our data
# and then summarize by another column
# (try "mean" instead of "sum")
wifi.groupby(by=['Name', 'Year'])['WifiSessions'].sum()
Name                                            Year
ALLEGHENY LIBRARY                               2016     13224
                                                2017     19914
                                                2018      6872
BEECHVIEW LIBRARY                               2016      5924
                                                2017      8509
                                                2018      2886
BROOKLINE LIBRARY                               2016      6840
                                                2017      9076
                                                2018      2350
CARRICK LIBRARY                                 2016      4498
                                                2017      1932
                                                2018         0
DOWNTOWN & BUSINESS LIBRARY                     2016     36168
                                                2017     47660
                                                2018     16974
EAST LIBERTY LIBRARY                            2016     21521
                                                2017     29329
                                                2018      9712
HAZELWOOD LIBRARY                               2016      7968
                                                2017      8198
                                                2018      2638
HILL DISTRICT LIBRARY                           2016      7693
                                                2017     12886
                                                2018      3943
HOMEWOOD LIBRARY                                2016     11363
                                                2017     14144
                                                2018      4111
KNOXVILLE LIBRARY                               2016      4953
                                                2017     16859
                                                2018      4362
LAWRENCEVILLE LIBRARY                           2016      3877
                                                2017      5909
                                                2018      1788
LIBRARY FOR THE BLIND & PHYSICALLY HANDICAPPED  2016      2407
                                                2017      3712
                                                2018       739
MAIN (OAKLAND) LIBRARY                          2016    108945
                                                2017    134235
                                                2018     53078
MOUNT WASHINGTON LIBRARY                        2016      2959
                                                2017      4076
                                                2018       992
SHERADEN LIBRARY                                2016      8805
                                                2017     11926
                                                2018      3026
SOUTH SIDE LIBRARY                              2016      7946
                                                2017      9630
                                                2018      3264
SQUIRREL HILL LIBRARY                           2016     44102
                                                2017     57127
                                                2018     20003
WEST END LIBRARY                                2016      3999
                                                2017      4228
                                                2018      1147
WOODS RUN LIBRARY                               2016      8495
                                                2017     11395
                                                2018      3213
Name: WifiSessions, dtype: int64
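(A small add-on to the "try mean instead of sum" suggestion above, not in the original notebook: here is the same groupby pattern on a tiny made-up dataframe, so you can see how sum and mean differ without loading the full wifi data. The library names and numbers are invented for illustration.)

```python
import pandas as pd

# a tiny stand-in for the wifi dataframe (values are made up)
demo = pd.DataFrame({
    "Name": ["A LIBRARY", "A LIBRARY", "B LIBRARY", "B LIBRARY"],
    "Year": [2016, 2017, 2016, 2017],
    "WifiSessions": [100, 200, 50, 150],
})

# sum adds up sessions within each (Name, Year) group
totals = demo.groupby(by=["Name", "Year"])["WifiSessions"].sum()
# mean averages sessions within each Name group
averages = demo.groupby(by=["Name"])["WifiSessions"].mean()

print(totals[("A LIBRARY", 2017)])  # 200
print(averages["B LIBRARY"])        # 100.0
```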
m = folium.Map(location=[40.442, -79.997],
zoom_start=13,
tiles="Stamen Toner")
library_layer = folium.FeatureGroup(name='CLP Libraries')
for index, row in libraries.iterrows():
library_layer.add_child(folium.CircleMarker(location=(row["Lat"],
row["Lon"]),
radius=150,
popup=row["Name"],
color=False,
fill_opacity=1,
fill_color="#d4237d"))
m.add_child(library_layer)
folium.LayerControl().add_to(m)
m
# side note: folium has a "choropleth" shortcut I generally use
# but I just learned today they are thinking of deprecating it
# so I made this map in a slightly more complicated way
# our fcc dataframe has data for the entire US
# but all those census tracts would make for a large file
# we will load a previously downloaded geojson file
# original data from https://data.wprdc.org/dataset/allegheny-county-census-tracts-2016
# a good place to explore geojson as a data structure is http://geojson.io/
with open("Allegheny_County_Census_Tracts_2016.geojson") as f:
    allegheny_tracts = simplejson.load(f)
# create a colormap
colormap = linear.YlGn_09.scale(
fcc.pcat_all.min(),
fcc.pcat_all.max())
print(colormap(5.0))
colormap
#004529
# and a way to connect tractcode/geoids to the relevant data
fcc_dict = fcc.set_index('tractcode')['pcat_all']
fcc_dict['42003468900']
4
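(One caution, not in the original notebook: square-bracket lookup like `fcc_dict['42003468900']` raises a KeyError if a tract code is missing, which can crash the style_function below if a geojson feature has no matching FCC row. A pandas Series also has a .get method that returns a default instead. A minimal sketch with a stand-in Series - the second tract's value is made up:)

```python
import pandas as pd

# stand-in for fcc_dict: a Series indexed by tract code
lookup = pd.Series({"42003468900": 4, "42003010300": 5})

# .get returns a default for missing keys instead of raising KeyError
print(lookup.get("42003468900", 0))  # 4
print(lookup.get("99999999999", 0))  # 0
```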
m2 = folium.Map(location=[40.442, -79.997],
zoom_start=13,
tiles="Stamen Toner")
fcc_layer = folium.GeoJson(
allegheny_tracts,
name='Residential Internet',
style_function=lambda feature: {
'fillColor': colormap(fcc_dict[feature['properties']['GEOID']]),
'color': 'black',
'weight': 1,
'fillOpacity': 0.8,
}
)
m2.add_child(fcc_layer)
m2.add_child(library_layer)
folium.LayerControl().add_to(m2)
m2