Fragments associated with grabbing boundaries data.
Note that a full demo using all local authority ditricts kills MyBinder demo.
Get the boundaries for Local Authority areas:
http://geoportal.statistics.gov.uk/datasets/7ff28788e1e640de8150fb8f35703f6e_2
import geopandas
#From the downloads area of the page, grab the link for the shapefile download
url='https://opendata.arcgis.com/datasets/7ff28788e1e640de8150fb8f35703f6e_2.zip?outSR=%7B%22wkid%22%3A27700%2C%22latestWkid%22%3A27700%7D'
gdf = geopandas.read_file(url)
#The .head() method limits the display of the table to the top few rows;
# THings like .head(20) to preview first 20 rows also works
gdf.head()
objectid | lad16cd | lad16nm | lad16nmw | bng_e | bng_n | long | lat | st_areasha | st_lengths | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | E06000001 | Hartlepool | None | 447157 | 531476 | -1.27023 | 54.676159 | 9.359786e+07 | 69382.685924 | (POLYGON ((447097.0010000002 537152.0011, 4472... |
1 | 2 | E06000002 | Middlesbrough | None | 451141 | 516887 | -1.21099 | 54.544670 | 5.387900e+07 | 42085.584812 | (POLYGON ((449861.8997999998 521260.6998999994... |
2 | 3 | E06000003 | Redcar and Cleveland | None | 464359 | 519597 | -1.00611 | 54.567520 | 2.448388e+08 | 96189.660709 | (POLYGON ((455776.7005000003 528322.4985000007... |
3 | 4 | E06000004 | Stockton-on-Tees | None | 444937 | 518183 | -1.30669 | 54.556911 | 2.049366e+08 | 115439.477112 | (POLYGON ((444126.0993999997 528005.7992000002... |
4 | 10 | E06000010 | Kingston upon Hull, City of | None | 511894 | 431716 | -0.30380 | 53.769791 | 7.145577e+07 | 63330.240277 | (POLYGON ((510966.6001000004 436533.0030000005... |
We can check the projection:
gdf.crs
{'proj': 'tmerc', 'lat_0': 49, 'lon_0': -2, 'k': 0.9996012717, 'x_0': 400000, 'y_0': -100000, 'datum': 'OSGB36', 'units': 'm', 'no_defs': True}
And preview the shapefile:
gdf.plot();
We can easily add a title to the plot:
#Turn the plot into soemthing we can reference
ax = gdf.plot()
#Add a title
ax.set_title('Local Authorities in the UK')
#Turn off axes
ax.axis('off');
Preview shapes from the first few rows using the folium
package:
import folium
m = folium.Map(max_zoom=6, location=[53.9, 0.0])
folium.GeoJson(gdf.head()).add_to(m)
m
#You can download a copy of the actual boundary data file if required
# !wget https://opendata.arcgis.com/datasets/52182cdda64d4b15984f6446ca7ee7fd_1.zip?outSR=%7B%22wkid%22%3A27700%2C%22latestWkid%22%3A27700%7D -O wards_fullextent.zip
# !unzip wards_fullextent.zip
Let's get some data to use as the basis of a choropleth map.
We can use something from deprivation indices (we really should check we grabbed boundary files for the correct period...).
#https://www.gov.uk/government/statistics/english-indices-of-deprivation-2015
#File 10: local authority district summaries
data_url = 'https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/464464/File_10_ID2015_Local_Authority_District_Summaries.xlsx'
Download the data into a pandas
dataframe. The orginal file is an Excel spreadsheet with mutliple sheets, so let's load them all and preview the sheetnames:
import pandas as pd
df = pd.read_excel(data_url, sheet_name=None)
for k in df.keys():
print(k)
Notes IMD Income Employment Education Health Crime Barriers Living IDACI IDAOPI
We an the preview the data in a single sheet:
df['Education'].head()
Local Authority District code (2013) | Local Authority District name (2013) | Education, Skills and Training - Average rank | Education, Skills and Training - Rank of average rank | Education, Skills and Training - Average score | Education, Skills and Training - Rank of average score | Education, Skills and Training - Proportion of LSOAs in most deprived 10% nationally | Education, Skills and Training - Rank of proportion of LSOAs in most deprived 10% nationally | |
---|---|---|---|---|---|---|---|---|
0 | E06000001 | Hartlepool | 20101.48 | 72 | 30.510 | 47 | 0.2069 | 37 |
1 | E06000002 | Middlesbrough | 22728.01 | 24 | 40.640 | 3 | 0.4419 | 1 |
2 | E06000003 | Redcar and Cleveland | 19185.28 | 95 | 27.875 | 71 | 0.1818 | 54 |
3 | E06000004 | Stockton-on-Tees | 16660.09 | 150 | 24.637 | 110 | 0.1750 | 59 |
4 | E06000005 | Darlington | 16385.06 | 155 | 22.569 | 129 | 0.1385 | 75 |
It's easy enough to combine data from a pandas
data frame with shape data in a geopandas
dataframe.
The geopandas
dataframe is used to create a geojson file that the folium
package can render. Each column name in the geopandas
dataframe is mapped onto a corresponding feature.properties.COLUMN_NAME
in the created geojson file. (I'm not sure offhand how column names that include space or punctuation characters are handled: simple column names are easiest.)
The data file is also passed in and the key and data columns identified as columns=[KEYCOL, DATACOL]
.
The rendered choropleth map is then coloured accordingly.
import folium
m = folium.Map(max_zoom=9, location=[54.5, -0.8])
folium.Choropleth(gdf.head(), key_on='feature.properties.lad16cd',
data=df['Education'],
columns=['Local Authority District code (2013)',
'Education, Skills and Training - Rank of average rank'],
fill_color='YlOrBr').add_to(m)
m
If we try to render the whole of the UK in a MyBinder session, things crash. (I think geopandas
is quite heavy on resources.)
We can exploit the notebook environment further by reating a simple application to explore the data more generally.
For example, within the Education
data sheet, we can explore choropleth maps generated from other columns.
We can create tidied up names for the data selection that then refer back to the original column name:
#We can create a drop down list with values in the list that map onto column names
#A python dict is the data structure that lets us do this
#We use a technique called a dict comprehension to create the dict from a list of column names
#The split separates the column names on '-' elements into two parts
#The parts are refenced by a numercial index value, starting at 0
#Index value 1 is the second item in the split list
#The .strip() command gets rid of leading/trailing whitespace in the string
datacols = {c.split('-')[1].strip():c for c in df['Education'].columns if c.startswith('Education')}
datacols
{'Average rank': 'Education, Skills and Training - Average rank', 'Rank of average rank': 'Education, Skills and Training - Rank of average rank', 'Average score': 'Education, Skills and Training - Average score', 'Rank of average score': 'Education, Skills and Training - Rank of average score', 'Proportion of LSOAs in most deprived 10% nationally': 'Education, Skills and Training - Proportion of LSOAs in most deprived 10% nationally', 'Rank of proportion of LSOAs in most deprived 10% nationally': 'Education, Skills and Training - Rank of proportion of LSOAs in most deprived 10% nationally'}
from ipywidgets import interact
@interact(Indicator=datacols)
def plotEducationChoropleth(Indicator='Education, Skills and Training - Rank of average rank'):
m = folium.Map(max_zoom=9, location=[54.5, -0.8])
folium.Choropleth(gdf.head(), key_on='feature.properties.lad16cd',
data=df['Education'],
columns=['Local Authority District code (2013)',
Indicator],
fill_color='YlOrBr').add_to(m)
return m
interactive(children=(Dropdown(description='Indicator', index=1, options={'Average rank': 'Education, Skills a…
We can merge data into the geodataframe using common columns and then directly plot choropleth maps by selecting the data column we want to plot against.
#Merge in data
gdf = pd.merge(gdf, df['Education'],
how='inner', #The type of join (what happens if data is in one dataset and not the other)
left_on='lad16cd', #Column we're merging on in left dataframe
right_on='Local Authority District code (2013)'#Column we're merging on in right dataframe
)
gdf.head()
objectid | lad16cd | lad16nm | lad16nmw | bng_e | bng_n | long | lat | st_areasha | st_lengths | geometry | Local Authority District code (2013) | Local Authority District name (2013) | Education, Skills and Training - Average rank | Education, Skills and Training - Rank of average rank | Education, Skills and Training - Average score | Education, Skills and Training - Rank of average score | Education, Skills and Training - Proportion of LSOAs in most deprived 10% nationally | Education, Skills and Training - Rank of proportion of LSOAs in most deprived 10% nationally | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | E06000001 | Hartlepool | None | 447157 | 531476 | -1.27023 | 54.676159 | 9.359786e+07 | 69382.685924 | (POLYGON ((447097.0010000002 537152.0011, 4472... | E06000001 | Hartlepool | 20101.48 | 72 | 30.510 | 47 | 0.2069 | 37 |
1 | 2 | E06000002 | Middlesbrough | None | 451141 | 516887 | -1.21099 | 54.544670 | 5.387900e+07 | 42085.584812 | (POLYGON ((449861.8997999998 521260.6998999994... | E06000002 | Middlesbrough | 22728.01 | 24 | 40.640 | 3 | 0.4419 | 1 |
2 | 3 | E06000003 | Redcar and Cleveland | None | 464359 | 519597 | -1.00611 | 54.567520 | 2.448388e+08 | 96189.660709 | (POLYGON ((455776.7005000003 528322.4985000007... | E06000003 | Redcar and Cleveland | 19185.28 | 95 | 27.875 | 71 | 0.1818 | 54 |
3 | 4 | E06000004 | Stockton-on-Tees | None | 444937 | 518183 | -1.30669 | 54.556911 | 2.049366e+08 | 115439.477112 | (POLYGON ((444126.0993999997 528005.7992000002... | E06000004 | Stockton-on-Tees | 16660.09 | 150 | 24.637 | 110 | 0.1750 | 59 |
4 | 10 | E06000010 | Kingston upon Hull, City of | None | 511894 | 431716 | -0.30380 | 53.769791 | 7.145577e+07 | 63330.240277 | (POLYGON ((510966.6001000004 436533.0030000005... | E06000010 | Kingston upon Hull, City of | 25257.92 | 3 | 44.134 | 1 | 0.4217 | 2 |
We can now create a choropleth directly from the dataframe with an inline (non-interactive) plot:
ax = gdf.plot(column='Education, Skills and Training - Average rank')
ax.axis('off');
Once again, we can create a simple widget driven app to allow us to explore the data more dynamically. Note also that alternative colour map palettes are available:
@interact(Indicator=datacols)
def plotEducationChoropleth2(Indicator='Education, Skills and Training - Rank of average rank'):
ax = gdf.plot(column=Indicator, cmap='OrRd')
ax.axis('off');
interactive(children=(Dropdown(description='Indicator', index=1, options={'Average rank': 'Education, Skills a…
We can reason about shapes in a limited number of ways. For example, we can tell if one boundary touches another. This means that we can plot a region and the areas that border on it.
#Via https://gis.stackexchange.com/a/300262/119781
def plotNeighbours(gdf, region='Milton Keynes',
indicator='Education, Skills and Training - Rank of average rank',
cmap='OrRd'):
''' Plot choropleth for an indicator relative to a specified region and its neighbours. '''
targetBoundary = gdf[gdf['lad16nm']==region]['geometry'].values[0]
neighbours = gdf.apply(lambda row: row['geometry'].touches(targetBoundary) or row['geometry']==targetBoundary ,
axis=1)
#Show the data for the selected area and its neighbours
display(gdf[neighbours][['lad16nm',indicator]].set_index('lad16nm'))
#Generate choropleth
ax = gdf[neighbours].plot(column=indicator, cmap=cmap)
ax.axis('off');
plotNeighbours(gdf)
Education, Skills and Training - Rank of average rank | |
---|---|
lad16nm | |
Milton Keynes | 197 |
Bedford | 160 |
Central Bedfordshire | 186 |
Aylesbury Vale | 254 |
Wellingborough | 67 |
South Northamptonshire | 293 |
plotNeighbours(gdf, 'Portsmouth')
Education, Skills and Training - Rank of average rank | |
---|---|
lad16nm | |
Portsmouth | 38 |
Fareham | 278 |
Havant | 73 |
Winchester | 312 |