A choropleth is a map that shades each region according to the value of some variable, such as a statistic that varies by country. We'll be making choropleths with Folium and GeoPandas.
We're going to start with a dataset from the World Bank on Internet usage in 2016.
import pandas
import matplotlib.pyplot as plt
internet = pandas.read_csv('data/world_internet_usage_2016.csv', skiprows=18, names=["name", "internet_use"])
internet.head()
| | name | internet_use |
---|---|---|
17 | Afghanistan | 10.6 |
18 | Albania | 66.4 |
19 | Algeria | 42.9 |
20 | American Samoa | NaN |
21 | Andorra | 97.9 |
internet[internet.internet_use.isnull()]
| | name | internet_use |
---|---|---|
20 | American Samoa | NaN |
44 | British Virgin Islands | NaN |
56 | Channel Islands | NaN |
110 | Isle of Man | NaN |
119 | Korea, Dem. People’s Rep. | NaN |
121 | Kosovo | NaN |
154 | Nauru | NaN |
162 | Northern Mariana Islands | NaN |
166 | Palau | NaN |
180 | San Marino | NaN |
188 | Sint Maarten (Dutch part) | NaN |
199 | St. Martin (French part) | NaN |
217 | Turks and Caicos Islands | NaN |
internet.dropna(inplace=True)
internet.head()
| | name | internet_use |
---|---|---|
17 | Afghanistan | 10.6 |
18 | Albania | 66.4 |
19 | Algeria | 42.9 |
21 | Andorra | 97.9 |
22 | Angola | 13.0 |
import geopandas
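# Note: geopandas.datasets was deprecated in GeoPandas 0.14 and removed in 1.0.
# On newer versions, download the Natural Earth file and pass its path to read_file instead.
world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))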
world.head()
| | pop_est | continent | name | iso_a3 | gdp_md_est | geometry |
---|---|---|---|---|---|---|
0 | 28400000.0 | Asia | Afghanistan | AFG | 22270.0 | POLYGON ((61.21081709172574 35.65007233330923,... |
1 | 12799293.0 | Africa | Angola | AGO | 110300.0 | (POLYGON ((16.32652835456705 -5.87747039146621... |
2 | 3639453.0 | Europe | Albania | ALB | 21810.0 | POLYGON ((20.59024743010491 41.85540416113361,... |
3 | 4798491.0 | Asia | United Arab Emirates | ARE | 184300.0 | POLYGON ((51.57951867046327 24.24549713795111,... |
4 | 40913584.0 | South America | Argentina | ARG | 573900.0 | (POLYGON ((-65.50000000000003 -55.199999999999... |
len(set(world.name) & set(internet.name))
145
len(set(world.name) - set(internet.name))
32
len(set(internet.name) - set(world.name))
59
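The counts above suggest that some countries are spelled differently in the two sources, so a merge on `name` will leave them without data. Here is a minimal sketch of reconciling names before merging. The sample names and the `ALIASES` table are illustrative assumptions, not the actual mismatches in this dataset; in the notebook the sets would come from `world.name` and `internet.name`.

```python
# Illustrative names only; the real lists come from world.name and internet.name.
map_names = {"Russia", "Egypt", "Albania"}
data_names = {"Russian Federation", "Egypt, Arab Rep.", "Albania", "Andorra"}

# Hypothetical alias table: dataset spelling -> map spelling.
ALIASES = {
    "Russian Federation": "Russia",
    "Egypt, Arab Rep.": "Egypt",
}


def normalize(name):
    """Return the map spelling for a dataset name, or the name unchanged."""
    return ALIASES.get(name, name)


normalized = {normalize(n) for n in data_names}

matched = map_names & normalized            # names that would merge cleanly
missing_from_data = map_names - normalized  # map regions left without a value
not_on_map = normalized - map_names         # data rows with no region to draw

print(sorted(matched))
print(sorted(missing_from_data))
print(sorted(not_on_map))
```

With a mapping like this in hand, something like `internet["name"] = internet["name"].replace(ALIASES)` before the merge should raise the number of matches. Names that remain in `not_on_map` may simply have no polygon in the low-resolution map.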
world = world.merge(internet, how='left', on='name')
world.head()
| | pop_est | continent | name | iso_a3 | gdp_md_est | geometry | internet_use |
---|---|---|---|---|---|---|---|
0 | 28400000.0 | Asia | Afghanistan | AFG | 22270.0 | POLYGON ((61.21081709172574 35.65007233330923,... | 10.6 |
1 | 12799293.0 | Africa | Angola | AGO | 110300.0 | (POLYGON ((16.32652835456705 -5.87747039146621... | 13.0 |
2 | 3639453.0 | Europe | Albania | ALB | 21810.0 | POLYGON ((20.59024743010491 41.85540416113361,... | 66.4 |
3 | 4798491.0 | Asia | United Arab Emirates | ARE | 184300.0 | POLYGON ((51.57951867046327 24.24549713795111,... | 90.6 |
4 | 40913584.0 | South America | Argentina | ARG | 573900.0 | (POLYGON ((-65.50000000000003 -55.199999999999... | 70.2 |
I'm going to copy over some Proj4 strings that we can use to tell the underlying projection libraries, PyProj and Proj.4, how to project our map.
mercator = '+proj=merc'
robinson = '+proj=robin'
wgs84 = '+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs'
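To give a feel for what a string like `+proj=merc` asks the projection library to do, here is a rough sketch of the spherical Mercator formula. It is an approximation for illustration only: the real library works on an ellipsoid and handles many more parameters, and the function name `spherical_mercator` is made up for this example.

```python
import math

# WGS84 semi-major axis in metres, used here as the radius of a sphere.
EARTH_RADIUS_M = 6378137.0


def spherical_mercator(lon_deg, lat_deg, radius=EARTH_RADIUS_M):
    """Project longitude/latitude in degrees to spherical Mercator x/y in metres."""
    if not -90.0 < lat_deg < 90.0:
        raise ValueError("Mercator is undefined at the poles")
    lon = math.radians(lon_deg)
    lat = math.radians(lat_deg)
    x = radius * lon
    y = radius * math.log(math.tan(math.pi / 4 + lat / 2))
    return x, y


# y grows without bound as latitude approaches the poles.
for lat in (0, 45, 80, 89):
    _, y = spherical_mercator(0.0, lat)
    print(f"lat={lat:>2}  y={y / 1000:,.0f} km")
```

The rapid growth of `y` at high latitudes is one reason polar regions are awkward to draw in this projection.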
Next, we'll clean up the map a bit by removing Antarctica (it makes projecting a bit more difficult and is irrelevant to what we're making anyway) and filling any missing data about Internet connectivity with a zero. Keep in mind that zero-filled countries will be shaded as if they had the lowest usage.
world = world[(world.name!="Antarctica")].fillna(0)
We'll make a copy of the GeoDataFrame when we do the projections (`to_crs` returns a new, reprojected object), so we don't get any weird geometric artifacts.
# mercator projection
worldMercator = world.to_crs(mercator)
worldMercator.plot(figsize=(20,20), column='internet_use', cmap="OrRd", scheme="QUANTILES", k=5, legend=True)
plt.axis('off')
plt.title("Internet usage by country")
plt.savefig('data/internet_choropleth_geopandas.png')
plt.show()
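The `scheme="QUANTILES", k=5` arguments in the plot call above ask for the values to be split into five classes holding roughly equal numbers of countries, with one color per class. Here is a small sketch of that idea using only the standard library. It is not the classifier GeoPandas actually uses, and the exact break points may differ; the sample values are the `internet_use` figures visible in the tables earlier.

```python
import bisect
import statistics

# Internet-use percentages taken from the table rows shown earlier.
values = [10.6, 13.0, 66.4, 90.6, 70.2, 42.9, 97.9]
K = 5  # number of classes, matching k=5 in the plot call

# K classes need K - 1 interior break points.
breaks = statistics.quantiles(values, n=K, method="inclusive")


def classify(value, breaks):
    """Return the 0-based class index; a value equal to a break falls in the lower class."""
    return bisect.bisect_left(breaks, value)


for v in sorted(values):
    print(f"{v:5.1f} -> class {classify(v, breaks)}")
```

Because the breaks follow the data rather than fixed intervals, a quantile scheme spreads the colors evenly across countries even when the values cluster.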
...
...
...
Folium choropleth: http://python-visualization.github.io/folium/quickstart.html#Choropleth-maps
World bank data: https://data.worldbank.org/data-catalog/country-profiles