Emilio Mayorga, 2017-2-6
We covered the basics of GeoPandas in the previous episode and notebook. Here, we'll extend that introduction to illustrate additional aspects of GeoPandas and its interactions with other Python libraries, covering fancier mapping, analysis (unitary and binary spatial operators), raster zonal stats and geopandas.
Here are the main sections in this episode / notebook:
choropleth
map based on calculated watershed areassjoin
, of polygons on pointsWe'll use these throughout the rest of the tutorial.
%matplotlib inline
import os
import json
import matplotlib.pyplot as plt
# The two statemens below are used mainly to set up a plotting
# default style that's better than the default from matplotlib
import seaborn as sns
plt.style.use('bmh')
from shapely.geometry import Point
import pandas as pd
import geopandas as gpd
from geopandas import GeoSeries, GeoDataFrame
data_pth = "../data"
hydrobas = gpd.read_file(os.path.join(data_pth, "hybas_na_lev00_v1c.shp"))
len(hydrobas)
14866
We'll apply a filter to the dataset to select only the Pfastetter level-4 watershed with code 7831: hydrobas['pfaf_4'] == 7831
. This is most of Western Washington. Watershed polygons will still be read at their original level 12 resolution.
hydrobas_ww = hydrobas[hydrobas['pfaf_4'] == 7831]
413 polygon features returned. Let's examine the attributes for the first feature.
len(hydrobas_ww)
413
hydrobas_ww.iloc[0]
coast 0 dist_main 0 dist_sink 0 endo 0 geometry POLYGON ((-123.35 46.36666666666669, -123.3493... hybas_id 2147483647 main_bas 2147483647 next_down 0 next_sink 2147483647 order 1 pfaf_1 7 pfaf_10 2147483647 pfaf_11 2147483647 pfaf_12 2147483647 pfaf_2 78 pfaf_3 783 pfaf_4 7831 pfaf_5 78310 pfaf_6 783101 pfaf_7 7831010 pfaf_8 78310101 pfaf_9 783101010 sort 19945 sub_area 135.4 up_area 135.4 Name: 3908, dtype: object
Plot a categorical map, with coloring based on the aggregating column pfaf_7
. Watershed boundaries are at the high-resolution Pfastetter level 12.
hydrobas_ww.plot(column='pfaf_7', cmap='Paired', categorical=True, figsize=(14,6));
Apply GeoDataFrame dissolve aggregation method (implemented from lower level shapely
operators) on level-7 Pfastetter codes (pfaf_7
) shown in the plot above. Aggregate attributes, retaining only pfaf_7
and pfaf_6
(plus geometry
, of course). Note that this operation results in only 17 polygons, from the original 413.
cols = ['pfaf_6', 'pfaf_7', 'geometry']
hydrobas_ww_p7 = hydrobas_ww[cols].dissolve(by='pfaf_7', aggfunc='first', as_index=False)
len(hydrobas_ww_p7)
17
hydrobas_ww_p7.crs
The CRS was not retained during the dissolve operation. Apply it by copying it from hydrobas_ww
.
hydrobas_ww_p7.crs = hydrobas_ww.crs
hydrobas_ww_p7.crs
{'init': u'epsg:4326'}
Let's examine some of the features.
hydrobas_ww_p7.head()
pfaf_7 | geometry | pfaf_6 | |
---|---|---|---|
0 | 7831010 | (POLYGON ((-123.5722222222222 46.2458333333333... | 783101 |
1 | 7831020 | POLYGON ((-123.1791666666666 46.33333333333336... | 783102 |
2 | 7831031 | (POLYGON ((-123.9597222222222 46.9666666666667... | 783103 |
3 | 7831032 | POLYGON ((-123.8583333333333 47.39583333333336... | 783103 |
4 | 7831033 | POLYGON ((-124.3 47.34583333333336, -124.30221... | 783103 |
Plot the results. Looks like the previous plot, except the polygon boundaries are now the pfaf_7 watersheds (same as the background colors).
hydrobas_ww_p7.plot(column='pfaf_7', cmap='Paired', categorical=True, figsize=(14,6));
NOTE/WATCH:
Beware that dissolve
may fail if there are "invalid" geometries.
See this code from the previous, intro notebook. The 6 geometries/points reported are invalid (and are reported by the is_valid()
method). This dissolve statement does work, though.
seas_grp = seas[['oceans', 'geometry']]
seas_oceans_diss = seas_grp[seas_grp.geometry.is_valid].dissolve(by='oceans')
Ring Self-intersection at or near point 10.407218181818182 54.821390909090908
Self-intersection at or near point -79.365827272727287 76.296645454545455
Ring Self-intersection at or near point 10.979445510225332 54.380555030408686
Ring Self-intersection at or near point 133.61550925464189 -4.3005540903175188
Ring Self-intersection at or near point 121.91067196634913 -5.0593090510592447
Ring Self-intersection at or near point 115.29553592754269 -7.0082630551828515
Partly so we can calculate polygon areas in linear units, not geodetic degrees. But also because that's the projection used by most state and local governments in Washington.
No need to go to a web site to learn more about what epsg:2927
is. Use pyepsg
, which issues queries to http://epsg.io web services.
import pyepsg
Extract the epsg code from the string returned by crs['init']
.
hydrobas_ww_p7_epsg_str = hydrobas_ww_p7.crs['init'].split(':')[1]
pyepsg.get(hydrobas_ww_p7_epsg_str)
<GeodeticCRS: 4326, WGS 84>
Query epsg 2927
.
pyepsg.get('2927')
<ProjectedCRS: 2927, NAD83(HARN) / Washington South (ftUS)>
Apply the crs transformation (reprojection) using to_crs
method.
hydrobas_ww_p7_wasp = hydrobas_ww_p7.to_crs(epsg=2927)
Note that, being in a planar project (not geodetic), the shape looks different compared to the previous map. More "normal". And the axes are now in feet
relative to some origin.
hydrobas_ww_p7_wasp.plot(column='pfaf_7', cmap='Paired', categorical=True, figsize=(14,6));
choropleth
map based on calculated watershed areas¶As the projection is in feet
, auto-calculated polygon areas will be in feet2. So let's convert to miles2 first (why not!). We'll add a new column to the GeoDataFrame.
hydrobas_ww_p7_wasp['area_mi2'] = hydrobas_ww_p7_wasp.geometry.area / 27878400
hydrobas_ww_p7_wasp.head(3)
pfaf_7 | geometry | pfaf_6 | area_mi2 | |
---|---|---|---|---|
0 | 7831010 | (POLYGON ((863343.6925921033 347890.2242189167... | 783101 | 1375.137396 |
1 | 7831020 | POLYGON ((963803.8027083561 376154.9965688475,... | 783102 | 2107.945774 |
2 | 7831031 | (POLYGON ((776917.1877152125 614568.383332147,... | 783103 | 528.472846 |
NOTE/FUN:
Now you could get the area of a pfaf_6 watershed via simple Pandas DataFrame groupby
aggregation (sum).
Plot the choloropleth, using area_mi2
.
"fisher_jenks" value segmentation scheme
(using 7 segments, k=7) used is one of the available pysal.esda.mapclassify.Map_Classifier
classifiers from the powerful PySAL package (Python Spatial Analysis Library); GeoPandas can use these classifiers if PySAL is installed. To get the list of classifiers, use:
import pysal
print(pysal.esda.mapclassify.Map_Classifier.__doc__)
f, ax = plt.subplots(1, figsize=(6, 4))
hydrobas_ww_p7_wasp.plot(column='area_mi2', scheme='fisher_jenks', k=7,
alpha=0.9, cmap=plt.cm.Blues, legend=True, ax=ax)
plt.axis('equal')
ax.set_title('Watersheds by area ($mi^2$)');
import folium
Folium likes GeoJSON for geometries. GeoPandas makes conversion easy.
geo_str = hydrobas_ww_p7.to_json()
m.choropleth
will do an attribute join between geo_str
and hydrobas_ww_p7_wasp
based on the key pfaf_7
.
m = folium.Map(location=[47.8, -122.5], zoom_start=7,
tiles="cartodbpositron")
m.choropleth(geo_str=geo_str,
data=hydrobas_ww_p7_wasp, columns=['pfaf_7', 'area_mi2'],
key_on='feature.properties.pfaf_7',
fill_color='YlGn', fill_opacity=0.4,
legend_name='area')
m
/home/mayorga/miniconda/envs/geopandasenv/lib/python2.7/site-packages/ipykernel/__main__.py:8: FutureWarning: 'threshold_scale' default behavior has changed. Now you get a linear scale between the 'min' and the 'max' of your data. To get former behavior, use folium.utilities.split_six.
This map is interactive, so play with it (zoom and pan). There is a lot more to explore in Folium! This is just a teaser.
sjoin
, of polygons on points¶We'll use an old, local snapshot of NANOOS coastal and marine monitoring stations in the Pacific NW, from the NANOOS Visualization System (NVS) Data Explorer. While many stations are moorings on marine waters, some are inshore or in tidal shores and will overlap the watershed boundaries.
The point file is in the GeoPackage format, an OGC format implemented in SQLite.
nanoosstations_gdf = gpd.read_file(os.path.join(data_pth, "nanoos_nvs.gpkg"))
len(nanoosstations_gdf)
194
nanoosstations_gdf.iloc[-1]
data_source Decagon data_source_url http://www.decagon.com geometry POINT (-122.6966 47.3612) lat 47.3612 lon -122.697 name Henderson Bay site, W shore nvs_station_url http://nvs.nanoos.org/Explorer?action=oiw:fixe... platform_label WADOH_HendrsnBay1 platform_type Fixed Shore Platform provider WADOH provider_type State provider_url http://www.doh.wa.gov region Puget Sound short_name WADOH Henderson Bay state Washington status online status_date 2016-06-24T18:00:00Z url http://www.doh.wa.gov/CommunityandEnvironment/... Name: 193, dtype: object
Points in the coasts of the Pacific NW (BC, WA, OR) and out in the open ocean.
nanoosstations_gdf.plot(markersize=4);
Apply "inner" spatial join with sjoin
operator from the gpd.tools
package. An inner join will retain only overlapping features. Then plot as a map overlay on top of hydrobas_ww_p7
, categorizing (coloring) points by the pfaf_6
watershed they're in.
nanoossta_hydrobas_ww_gdf = gpd.tools.sjoin(nanoosstations_gdf, hydrobas_ww_p7, how="inner")
len(nanoossta_hydrobas_ww_gdf)
63
f, ax = plt.subplots(1, figsize=(6, 6))
plt.axis('equal')
hydrobas_ww_p7.plot(ax=ax)
nanoossta_hydrobas_ww_gdf.plot(column='pfaf_6', markersize=6,
categorical=True, legend=True, ax=ax);
We'll end by mixing features from a GeoDataFrame with a raster, applying zonal statistics using the cool and light weight rasterstats package.
Monthly Juy long-term climatology precipitation. The original monthly time series data are from the PRISM Climate Group; the monthly climatology and Pacific NW clip were created by your truly and Don Setiawan for the BiGCZ project.
ppt_july_tif_pth = os.path.join(data_pth, 'prism_precipitation_july_climatology.tif')
Rasterstas uses rasterio to read rasters (and fiona
to read vector datasets), so we'll first do a quick exploration of rasterio.
import rasterio
import rasterio.plot as rioplot
ppt_july = rasterio.open(ppt_july_tif_pth)
ppt_july
<open DatasetReader name='../data/prism_precipitation_july_climatology.tif' mode='r'>
Examine the metadata read from the raster file (we can confirm CRS is epsg:4326), then plot the raster.
ppt_july.meta
{'count': 1, 'crs': CRS({'init': u'epsg:4326'}), 'driver': u'GTiff', 'dtype': 'float32', 'height': 177, 'nodata': -9999.0, 'transform': Affine(0.04, 0.0, -125.12921197411002, 0.0, -0.04, 49.045740291264174), 'width': 353}
NOTE/WATCH:
with_bounds=True
is supposed to display the X & Y coordinate labels (lat & lon) from the metadata. We don't why this isn't working.
rioplot.show(ppt_july, with_bounds=True, cmap=plt.cm.Blues);
zonal_stats
¶Apply zonal_stats
from rasterstats
package. Can pass a GeoDataFrame
directly (instead of the file path to a GIS file) because it implements our old friend, the __geo_interface__
method. For the raster, we pass its file path.
zonal_stats
returns a geojson with the original properties plus the zonal statistics.
import rasterstats as rs
zonal_ppt_gjson = rs.zonal_stats(hydrobas_ww_p7, ppt_july_tif_pth,
prefix='pptjuly_',
geojson_out=True)
len(zonal_ppt_gjson)
17
zonal_ppt_gdf = GeoDataFrame.from_features(zonal_ppt_gjson)
zonal_ppt_gdf.head(2)
geometry | pfaf_6 | pfaf_7 | pptjuly_count | pptjuly_max | pptjuly_mean | pptjuly_min | |
---|---|---|---|---|---|---|---|
0 | (POLYGON ((-123.5722222222222 46.2458333333333... | 783101 | 7831010 | 259 | 49.824776 | 33.674382 | 23.438837 |
1 | POLYGON ((-123.1791666666666 46.33333333333336... | 783102 | 7831020 | 404 | 96.738289 | 29.443886 | 15.356521 |
f, ax = plt.subplots(1, figsize=(6, 5))
zonal_ppt_gdf.plot(column='pptjuly_mean', scheme='Equal_Interval', k=5,
alpha=1, cmap=plt.cm.Blues, legend=True, ax=ax)
plt.axis('equal')
ax.set_title('Mean July precipitation ($mm/month$) by watershed');