I'm following along with this FlowingData tutorial, with some slight tweaks. My choropleth will visualize the percentage of public employees who are female in each Iowa county. I previously calculated these percentages with pandas using data I scraped from the Des Moines Register's Iowa public salary database.
If you just want to see the code, I put it on GitHub: https://github.com/austinlyons/iowa_female_public_employees
To get an SVG of Iowa and its counties, I took the US county SVG from Wikimedia, opened it in a text editor, and deleted everything but the Iowa counties. I edited the resulting map a bit with Inkscape, making sure that each county path retained the county name and its FIPS code. To easily look up a county's FIPS code, I created a .csv of Iowa counties and their FIPS codes using data from the EPA.
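The hand-editing step could probably be scripted as well. Here is a minimal sketch, assuming each county path carries its five-digit FIPS code as its `id` (the coloring loop later relies on this) and that every Iowa code starts with the state prefix 19, as the codes in the lookup table do. The `keep_state_paths` helper and the toy SVG string are made up for illustration; this is not how I actually produced the map.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # keep the default SVG namespace on output

# Toy stand-in for the US county SVG (made-up shapes, FIPS-style ids)
US_SVG = (
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<path id="19001" d="M0 0 L1 0 L1 1 Z"/>'
    '<path id="19171" d="M1 0 L2 0 L2 1 Z"/>'
    '<path id="27053" d="M0 1 L1 1 L1 2 Z"/>'
    '<path id="State_Lines" d="M0 0 L2 2"/>'
    '</svg>'
)

def keep_state_paths(svg_text, state_prefix):
    """Hypothetical helper: drop county paths whose id lacks the state prefix."""
    root = ET.fromstring(svg_text)
    path_tag = "{%s}path" % SVG_NS
    # Snapshot the tree first so removing children does not disturb iteration
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag != path_tag:
                continue
            path_id = child.get("id", "")
            # Only numeric ids are counties; State_Lines and similar are kept
            if path_id.isdigit() and not path_id.startswith(state_prefix):
                parent.remove(child)
    return root

iowa_root = keep_state_paths(US_SVG, "19")
kept_ids = [p.get("id") for p in iowa_root.iter("{%s}path" % SVG_NS)]
```

With the toy input, `kept_ids` holds the two Iowa ids plus `State_Lines`; the out-of-state path is gone. The real file would still need the Inkscape cleanup.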
# imports
import pandas as pd
import numpy as np
from BeautifulSoup import BeautifulSoup
# implement county->fips lookup table with a dataframe with the county as the index
fips = pd.read_csv('iowa_county_fips.csv', index_col=0)
# are there 99 counties in this dataframe? sanity check
print fips
<class 'pandas.core.frame.DataFrame'>
Index: 99 entries, ADAIR to WRIGHT
Data columns (total 1 columns):
code    99 non-null values
dtypes: int64(1)
# does the data look like we expect?
print fips[0:10]
             code
county
ADAIR       19001
ADAMS       19003
ALLAMAKEE   19005
APPANOOSE   19007
AUDUBON     19009
BENTON      19011
BLACK HAWK  19013
BOONE       19015
BREMER      19017
BUCHANAN    19019
# Iowa public employees dataframe: % female, % male, total # employees. County is set as the index
iowa_by_sex = pd.read_csv('iowa_public_employees_female_male_ratio.csv', index_col=0)
# Let's see a summary of this data
print iowa_by_sex.describe()
                F           M  Employee Count
count  100.000000  100.000000      100.000000
mean     0.514018    0.485982      576.960000
std      0.107958    0.107958     1882.730061
min      0.274510    0.000000        1.000000
25%      0.457567    0.431299       62.750000
50%      0.507613    0.492387      125.500000
75%      0.568701    0.542433      345.250000
max      1.000000    0.725490    15229.000000
print iowa_by_sex.loc['TAMA']
F                   0.664311
M                   0.335689
Employee Count    283.000000
Name: TAMA, dtype: float64
# add code column to our iowa dataframe
iowa_by_sex['code'] = fips['code'].astype(int)
# pandas magic! We should see the code as a new column in this row.
# adding a column is so simple since the two dataframes
# each use county as their index
print iowa_by_sex.loc['TAMA']
F                     0.664311
M                     0.335689
Employee Count      283.000000
code              19171.000000
Name: TAMA, dtype: float64
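The "magic" is index alignment: assigning a Series as a new column matches rows by index label, not by position. Here is a small sketch with toy data (the shares and the `UNKNOWN` label are made up; the two FIPS codes come from the tables above). It also shows what happens to a row with no match in the lookup table, which may be relevant here since `describe()` counted 100 rows against 99 counties.

```python
import pandas as pd

# Toy data: made-up shares, deliberately in a different order than the codes
shares = pd.DataFrame({'F': [0.66, 0.51, 1.0]},
                      index=['TAMA', 'ADAIR', 'UNKNOWN'])
codes = pd.DataFrame({'code': [19001, 19171]},
                     index=['ADAIR', 'TAMA'])

# Assignment aligns on index labels, so row order does not matter
shares['code'] = codes['code']

tama_code = shares.loc['TAMA', 'code']
# Labels missing from the lookup table get NaN, which makes the column float
unmatched = shares['code'].isnull().sum()
```

Here `TAMA` still picks up 19171 even though it is listed first in one frame and second in the other, and the one unmatched row is left as NaN rather than raising an error.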
# load the blank Iowa SVG
svg = open('iowa_counties.svg', 'r').read()
# parse SVG, defining selfClosingTags as shown here: https://josephhall.org/nqb2/index.php/flwdchrplth
soup = BeautifulSoup(svg, selfClosingTags=['defs','sodipodi:namedview'])
# find counties using findAll
paths = soup.findAll('path')
# The map colors in order from lightest to darkest
colors = ['#F1EEF6', '#D0D1E6', '#A6BDDB', '#74A9CF', '#2B8CBE', '#045A8D']
# county base style. We'll add the fill color (at the end of this string) for each county path
path_style="font-size:12px;fill-rule:nonzero;stroke:#000000;stroke-width:0.1;stroke-linecap:butt;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-opacity:1;stroke-dasharray:none;marker-start:none;fill:"
# Color the counties based on percent of public employees that are female
for p in paths:
    if p['id'] not in ["State_Lines", "separator"]:  # only color the counties
        try:
            # select the row whose FIPS code matches this path's id,
            # then pull out the female share as a single number
            rate = iowa_by_sex[iowa_by_sex['code'] == int(p['id'])]['F'].iloc[0]
        except (ValueError, IndexError):
            # skip paths with a non-numeric id or with no matching county
            continue
        if rate > 0.833:
            color_class = 5
        elif rate > 0.666:
            color_class = 4
        elif rate > 0.5:
            color_class = 3
        elif rate > 0.333:
            color_class = 2
        elif rate > 0.166:
            color_class = 1
        else:
            color_class = 0
        color = colors[color_class]
        p['style'] = path_style + color
# Save result
with open("iowa_counties_colored.svg", "wb") as fo:
    fo.write(soup.prettify())
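The if/elif chain in the coloring loop can also be written as a lookup against a sorted list of cut points. This is a minimal sketch of that alternative, not what the loop above does; `thresholds` and `color_for` are names introduced here for illustration, and the cut points and colors are copied from the code above.

```python
from bisect import bisect_left

# The map colors in order from lightest to darkest
colors = ['#F1EEF6', '#D0D1E6', '#A6BDDB', '#74A9CF', '#2B8CBE', '#045A8D']
# Same cut points as the if/elif chain, in ascending order
thresholds = [0.166, 0.333, 0.5, 0.666, 0.833]

def color_for(rate):
    """Hypothetical helper: map a share between 0 and 1 to one of six colors."""
    # bisect_left counts the thresholds strictly below rate,
    # which matches the `rate > threshold` tests in the chain
    return colors[bisect_left(thresholds, rate)]

tama_color = color_for(0.664311)  # Tama County's share from the output above
```

Inside the loop, the chain would then reduce to `p['style'] = path_style + color_for(rate)`. Changing the number of classes only means editing the two lists.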