Making an Iowa choropleth map using Python

Preface

I'm following along with this FlowingData tutorial, with some slight tweaks. My choropleth will visualize the percentage of public employees in each county who are female. I previously calculated these percentages with pandas, using data I scraped from the Des Moines Register's Iowa public salary database.

Code

If you just want to see the code, I put it on GitHub: https://github.com/austinlyons/iowa_female_public_employees

SVG

To get an SVG with Iowa and its counties, I took the US county SVG from Wikimedia, opened the SVG in a text editor, and deleted everything but the Iowa counties. I edited the resulting map a bit with Inkscape, making sure that each county path retained the county name and its FIPS code. To easily look up a county's FIPS code, I created a .csv with Iowa counties and their respective FIPS codes using data from the EPA.
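
The trimming step was done by hand, but it could in principle be scripted. The following is only a sketch of that idea under stated assumptions: each county `<path>` is assumed to carry its five-digit FIPS code as its `id` (which is how the coloring loop further down looks counties up), Iowa's codes all start with `19`, and the tiny inline SVG and the helper name `keep_state_paths` are made up for illustration.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Tiny stand-in for the US county map (hypothetical data, not the real file)
US_SVG = """<svg xmlns="http://www.w3.org/2000/svg">
  <path id="19001" d="M0 0"/>
  <path id="19171" d="M1 1"/>
  <path id="27001" d="M2 2"/>
  <path id="State_Lines" d="M3 3"/>
</svg>"""


def keep_state_paths(svg_text, state_prefix):
    """Drop every <path> whose id is a 5-digit FIPS code from another state."""
    ET.register_namespace("", SVG_NS)
    root = ET.fromstring(svg_text)
    # Visit each element as a parent so its children can be removed from it
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag != "{%s}path" % SVG_NS:
                continue
            path_id = child.get("id", "")
            is_county = path_id.isdigit() and len(path_id) == 5
            if is_county and not path_id.startswith(state_prefix):
                parent.remove(child)
    return root


iowa_root = keep_state_paths(US_SVG, "19")
iowa_ids = [p.get("id") for p in iowa_root.iter("{%s}path" % SVG_NS)]
```

Paths whose `id` is not a five-digit number (such as `State_Lines`) are left alone, so any cleanup of those would still be a manual step.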

In [1]:
# imports
import pandas as pd
import numpy as np
from BeautifulSoup import BeautifulSoup
In [2]:
# implement county->fips lookup table with a dataframe with the county as the index
fips = pd.read_csv('iowa_county_fips.csv', index_col=0)
In [3]:
# are there 99 counties in this dataframe? sanity check
print fips
<class 'pandas.core.frame.DataFrame'>
Index: 99 entries, ADAIR to WRIGHT
Data columns (total 1 columns):
code    99  non-null values
dtypes: int64(1)
In [4]:
# does the data look like we expect?
print fips[0:10]
             code
county           
ADAIR       19001
ADAMS       19003
ALLAMAKEE   19005
APPANOOSE   19007
AUDUBON     19009
BENTON      19011
BLACK HAWK  19013
BOONE       19015
BREMER      19017
BUCHANAN    19019
In [5]:
# Iowa public employees dataframe: % female, % male, total # employees. County is set as the index
iowa_by_sex = pd.read_csv('iowa_public_employees_female_male_ratio.csv', index_col=0)
In [6]:
# Let's see a summary of this data
print iowa_by_sex.describe()
                F           M  Employee Count
count  100.000000  100.000000      100.000000
mean     0.514018    0.485982      576.960000
std      0.107958    0.107958     1882.730061
min      0.274510    0.000000        1.000000
25%      0.457567    0.431299       62.750000
50%      0.507613    0.492387      125.500000
75%      0.568701    0.542433      345.250000
max      1.000000    0.725490    15229.000000
In [7]:
print iowa_by_sex.loc['TAMA']
F                   0.664311
M                   0.335689
Employee Count    283.000000
Name: TAMA, dtype: float64
In [8]:
# add code column to our iowa dataframe
iowa_by_sex['code'] = fips['code'].astype(int)
In [9]:
# pandas magic! We should see the code as a new column in this row.
# adding a column is so simple since the two dataframes 
# each use county as their index
print iowa_by_sex.loc['TAMA']
F                     0.664311
M                     0.335689
Employee Count      283.000000
code              19171.000000
Name: TAMA, dtype: float64
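
The assignment in cell 8 works because pandas matches rows by index label rather than by position. Here is a small sketch of that behavior with made-up rows (the `NOT A COUNTY` label and the `F` values are hypothetical): the rows are in a different order in the two tables, and a label with no match in the lookup table ends up as `NaN`, which forces the new column to a float dtype.

```python
import pandas as pd

# Hypothetical miniature versions of the two tables, both indexed by county
lookup = pd.DataFrame({"code": [19001, 19171]}, index=["ADAIR", "TAMA"])
stats = pd.DataFrame(
    {"F": [0.66, 0.50, 1.00]},
    index=["TAMA", "ADAIR", "NOT A COUNTY"],  # different order, one extra row
)

# Assignment aligns on index labels, not on row position
stats["code"] = lookup["code"]

tama_code = stats.loc["TAMA", "code"]
unmatched = stats.loc["NOT A COUNTY", "code"]
```

Checking for `NaN` in the new column (for example with `stats["code"].isnull()`) is a quick way to spot rows that did not find a FIPS code.
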
In [10]:
# load the blank Iowa SVG
with open('iowa_counties.svg', 'r') as f:
    svg = f.read()

In [11]:
# parse SVG, defining selfClosingTags as shown here: https://josephhall.org/nqb2/index.php/flwdchrplth
soup = BeautifulSoup(svg, selfClosingTags=['defs','sodipodi:namedview'])
In [12]:
# find counties using findAll
paths = soup.findAll('path')
In [13]:
# The map colors in order from lightest to darkest
colors = ['#F1EEF6', '#D0D1E6', '#A6BDDB', '#74A9CF', '#2B8CBE', '#045A8D']
In [14]:
# county base style. We'll add the fill color (at the end of this string) for each county path
path_style="font-size:12px;fill-rule:nonzero;stroke:#000000;stroke-width:0.1;stroke-linecap:butt;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-opacity:1;stroke-dasharray:none;marker-start:none;fill:"
In [15]:
# Color the counties based on percent of public employees that are female
for p in paths:
 
    if p['id'] not in ["State_Lines", "separator"]: # only color the counties
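        try:
            # .iloc[0] extracts the scalar rate; a non-numeric id or a
            # FIPS code with no matching row raises and is skipped
            rate = iowa_by_sex[iowa_by_sex['code'] == int(p['id'])]['F'].iloc[0]
        except (ValueError, IndexError):
            continue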
 
        if rate > 0.833:
            color_class = 5
        elif rate > 0.666:
            color_class = 4
        elif rate > 0.5:
            color_class = 3
        elif rate > 0.333:
            color_class = 2
        elif rate > 0.166:
            color_class = 1
        else:
            color_class = 0
 
        color = colors[color_class]
        p['style'] = path_style + color
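
The if/elif ladder above is a threshold lookup, and the same mapping can be written with the standard library's `bisect` module. This is only a sketch of an alternative: `color_for_rate` and `THRESHOLDS` are names introduced here, while the colors and cut points are the ones used in the cells above.

```python
from bisect import bisect_left

# Same palette and cut points as in the cells above
colors = ['#F1EEF6', '#D0D1E6', '#A6BDDB', '#74A9CF', '#2B8CBE', '#045A8D']
THRESHOLDS = [0.166, 0.333, 0.5, 0.666, 0.833]


def color_for_rate(rate):
    """Return the fill color for a female-employee share between 0 and 1."""
    # bisect_left counts the thresholds strictly below rate, which matches
    # the chain of `rate > threshold` comparisons in the loop
    return colors[bisect_left(THRESHOLDS, rate)]


tama_color = color_for_rate(0.664311)  # TAMA's share from cell 7
```

With a helper like this, the body of the loop would reduce to `p['style'] = path_style + color_for_rate(rate)`.
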
In [16]:
# Save result
with open("iowa_counties_colored.svg", "wb") as fo:
    fo.write(soup.prettify())

Output: an SVG of Iowa counties colored by the percentage of public employees who are female.
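
The cells above use Python 2 and BeautifulSoup 3 (`from BeautifulSoup import BeautifulSoup`), which does not run on Python 3. As a rough sketch of how the coloring step might look on Python 3 using only the standard library, the example below uses `xml.etree.ElementTree` instead. The inline two-county SVG, the shortened style string, and the `fill_by_code` dictionary are stand-ins invented for illustration, not the real map or data.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # keep the output free of ns0: prefixes

# Hypothetical two-county map and fill colors, standing in for the real files
BLANK_SVG = """<svg xmlns="http://www.w3.org/2000/svg">
  <path id="19001" d="M0 0"/>
  <path id="19171" d="M1 1"/>
  <path id="State_Lines" d="M2 2"/>
</svg>"""
fill_by_code = {19001: "#A6BDDB", 19171: "#74A9CF"}

root = ET.fromstring(BLANK_SVG)
for path in root.iter("{%s}path" % SVG_NS):
    path_id = path.get("id", "")
    if not path_id.isdigit():
        continue  # skip State_Lines, separator, and anything else non-county
    fill = fill_by_code.get(int(path_id))
    if fill is not None:
        path.set("style", "stroke:#000000;stroke-width:0.1;fill:" + fill)

# Serialize back to text; this string could then be written to a .svg file
colored_svg = ET.tostring(root, encoding="unicode")
```

An Inkscape-edited file also carries `sodipodi` and `inkscape` namespaces, so the real map may need those registered as well to round-trip cleanly.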