%matplotlib inline
import pandas as pd
import geopandas
from shapely.geometry import Point, Polygon
pd.options.display.max_rows = 10
GeoPandas can read and write many different GIS file formats by relying on the fiona
library (which is an interface to GDAL/OGR).
For example, let's start by reading a GeoJSON file about the administrative districts in Paris (https://opendata.paris.fr/explore/dataset/quartier_paris/) and some real-time information about the public bicycle sharing system in Paris (vélib, https://opendata.paris.fr/explore/dataset/stations-velib-disponibilites-en-temps-reel/information/):
stations = geopandas.read_file("stations-velib-disponibilites-en-temps-reel.geojson")
stations[['status', 'name', 'bike_stands', 'available_bikes', 'geometry']].head()
status | name | bike_stands | available_bikes | geometry | |
---|---|---|---|---|---|
0 | OPEN | 14002 - RASPAIL QUINET | 44 | 4 | POINT (2.32955509721 48.8391991672) |
1 | OPEN | 20503 - COURS DE VINCENNES PYRÉNÉES | 21 | 3 | POINT (2.40590731702 48.847724969) |
2 | OPEN | 20011 - PYRÉNÉES-DAGORNO | 21 | 0 | POINT (2.40516852064 48.855501354) |
3 | CLOSED | 31008 - VINCENNES (MONTREUIL) | 56 | 0 | POINT (2.43736866129 48.857702134) |
4 | OPEN | 43006 - MINIMES (VINCENNES) | 28 | 27 | POINT (2.43079345113 48.8414868563) |
quartiers = geopandas.read_file("quartier_paris.geojson")
quartiers.head()
n_sq_qu | perimetre | objectid | longueur | c_qu | surface | n_sq_ar | c_quinsee | l_qu | c_ar | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 750000047 | 6155.005036 | 29 | 6154.591387 | 47 | 1.902932e+06 | 750000012 | 7511203 | Bercy | 12 | POLYGON ((2.391141037839471 48.82611264577471,... |
1 | 750000001 | 5057.549475 | 41 | 5057.332772 | 1 | 8.690007e+05 | 750000001 | 7510101 | St-Germain-l'Auxerrois | 1 | POLYGON ((2.344593389828428 48.85404991486192,... |
2 | 750000076 | 4435.273252 | 79 | 4435.143641 | 76 | 1.294988e+06 | 750000019 | 7511904 | Combat | 19 | POLYGON ((2.388343313526396 48.88056667377272,... |
3 | 750000065 | 5264.597082 | 68 | 5264.463406 | 65 | 1.465071e+06 | 750000017 | 7511701 | Ternes | 17 | POLYGON ((2.295039618663717 48.87377869547587,... |
4 | 750000010 | 2139.625388 | 50 | 2139.535591 | 10 | 2.717503e+05 | 750000003 | 7510302 | Enfants-Rouges | 3 | POLYGON ((2.367101341254551 48.86162755885409,... |
GeoDataFrame
is a DataFrame which has a 'geometry' column. The .geometry
attribute returns a GeoSeries (the column name itself is not necessarily 'geometry')GeoSeries
is a Series that holds (shapely) geometry objects (Points, LineStrings, Polygons, ...)type(stations)
geopandas.geodataframe.GeoDataFrame
stations.geometry
0 POINT (2.32955509721 48.8391991672) 1 POINT (2.40590731702 48.847724969) 2 POINT (2.40516852064 48.855501354) 3 POINT (2.43736866129 48.857702134) 4 POINT (2.43079345113 48.8414868563) ... 1221 POINT (2.38257893505 48.8448673196) 1222 POINT (2.35634514417 48.821039287) 1223 POINT (2.33141820471 48.8589429862) 1224 POINT (2.27854960082 48.8274705017) 1225 POINT (2.32524904508 48.8662852212) Name: geometry, Length: 1226, dtype: object
type(stations.geometry)
geopandas.geoseries.GeoSeries
type(stations.geometry[0])
shapely.geometry.point.Point
We still have our pandas functionality that we can use on the geospatial dataset (e.g. boolean filtering).
stations = stations[stations['status'] == 'OPEN'].copy()
A histogram showing the distribution of the number of bike stands in the stations:
stations['bike_stands'].hist()
<matplotlib.axes._subplots.AxesSubplot at 0x7fc62fb68860>
For example, we can access the area of the different quartiers:
quartiers.geometry.area
0 0.000233 1 0.000107 2 0.000159 3 0.000180 4 0.000033 ... 75 0.000035 76 0.000132 77 0.000169 78 0.000098 79 0.000093 Length: 80, dtype: float64
Or calculate the distance from a certain point to all the bike stations.
I used geopy
to localize the Notre Dame:
>>> from geopy.geocoders import Nominatim
>>> geolocator = Nominatim()
>>> location = geolocator.geocode("Notre Dame Paris")
>>> notre_dame = Point(location.longitude, location.latitude)
>>> print(notre_dame)
POINT (2.35005149954546 48.85293695)
notre_dame = Point(2.35005149954546, 48.85293695)
stations.geometry.distance(notre_dame)
0 0.024674 1 0.056098 2 0.055177 4 0.081550 5 0.070769 ... 1221 0.033513 1222 0.032513 1223 0.019577 1224 0.075902 1225 0.028166 Length: 1186, dtype: float64
Or check in which quartier the Notre Dame is located:
quartiers.contains(notre_dame)
0 False 1 False 2 False 3 False 4 False ... 75 False 76 False 77 False 78 False 79 False Length: 80, dtype: bool
quartiers[quartiers.contains(notre_dame)]
n_sq_qu | perimetre | objectid | longueur | c_qu | surface | n_sq_ar | c_quinsee | l_qu | c_ar | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|
5 | 750000016 | 3283.163371 | 56 | 3282.999717 | 16 | 378252.153674 | 750000004 | 7510404 | Notre-Dame | 4 | POLYGON ((2.361313701339139 48.84858030437791,... |
Many spatial predicates and operations:
GeoPandas allows you to quickly visualize your geometries:
quartiers.plot(figsize=(10, 10))
<matplotlib.axes._subplots.AxesSubplot at 0x7fc62fb33b38>
ax = quartiers.plot(figsize=(10, 10), cmap='tab10', alpha=0.5)
ax.set_axis_off()
ax = quartiers.plot(figsize=(10, 10), edgecolor='k', facecolor='r', alpha=0.5, linewidth=2)
ax.set_axis_off()
stations.plot(markersize=5, figsize=(10, 10))
<matplotlib.axes._subplots.AxesSubplot at 0x7fc62fa3cd68>
It would be nice to add the street network to the figure above. Therefore, I downloaded, using the osmnx
package, the openstreetmap network for Paris, and saved it as a shapefile:
import osmnx as ox
G = ox.graph_from_place('Paris, France', network_type='drive')
ox.save_graph_shapefile(G, 'openstreetmap_paris', folder='.')
streets = geopandas.read_file("openstreetmap_paris/edges/")
streets.head()
access | bridge | from | highway | key | lanes | length | maxspeed | name | oneway | osmid | ref | service | to | tunnel | width | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | None | None | 1245546489 | primary | 0 | None | 10.441171286969725 | 50 | Boulevard MacDonald | True | 191694881 | None | None | 442371 | None | None | LINESTRING (2.3711204 48.8987287, 2.3710461 48... |
1 | None | None | 442371 | primary | 0 | 4 | 218.52200148977164 | 50 | Boulevard MacDonald | False | 13859914 | None | None | 209141276 | None | None | LINESTRING (2.3711204 48.8987287, 2.3713436 48... |
2 | None | None | 442371 | primary | 0 | None | 60.98775515463154 | 50 | Boulevard MacDonald | True | 178237238 | None | None | 1886061102 | None | None | LINESTRING (2.3711204 48.8987287, 2.3709746 48... |
3 | None | None | 442371 | primary | 0 | None | 111.87478765505494 | None | Avenue de la Porte d'Aubervilliers | True | 19923111 | None | None | 3805263959 | None | None | LINESTRING (2.3711204 48.8987287, 2.3711267 48... |
4 | None | None | 1760247812 | residential | 0 | None | 44.238688986210406 | None | Villa des Lyanes | False | 164380899 | None | None | 1760247816 | None | None | LINESTRING (2.4051359 48.863424, 2.4054913 48.... |
ax = streets.plot(linewidth=0.5, color='grey', figsize=(15, 10))
stations.plot(ax=ax, markersize=20)
ax.set(xlim=(2.25, 2.42), ylim=(48.815, 48.905));
ax.set_axis_off()
By specifying the column
keyword, we can color the geometries based on the values of that column:
ax = streets.plot(linewidth=0.5, color='grey', figsize=(15, 7))
stations.plot(ax=ax, column='available_bikes', markersize=50)#, legend=True)
ax.set(xlim=(2.25, 2.42), ylim=(48.815, 48.905));
ax.set_axis_off()
Or based on categorical values:
ax = quartiers.plot(column='n_sq_ar', categorical=True, cmap='tab20', figsize=(15, 8))
ax.set_axis_off()
GeoPandas includes functionality to perform spatial joins and overlays (sjoin
and overlay
).
Here, let's do a spatial join to determine the quartier in which each bike station is located.
stations = geopandas.sjoin(stations, quartiers[['l_qu', 'geometry']].copy(), op='within')
stations.head()
status | contract_name | name | bonus | bike_stands | number | last_update | available_bike_stands | banking | available_bikes | address | geometry | index_right | l_qu | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | OPEN | Paris | 14002 - RASPAIL QUINET | True | 44 | 14002 | 2017-08-26T13:04:34 | 39 | True | 4 | FACE 4 BD EDGAR QUINET - 75014 PARIS | POINT (2.32955509721 48.8391991672) | 56 | Montparnasse |
143 | OPEN | Paris | 14112 - FAUBOURG SAINT JACQUES CASSINI | False | 16 | 14112 | 2017-08-26T17:35:50 | 16 | True | 0 | 24 RUE MECHAIN - 75014 PARIS | POINT (2.33798034198 48.835867721) | 56 | Montparnasse |
293 | OPEN | Paris | 14033 - DAGUERRE GASSENDI | True | 38 | 14033 | 2017-08-26T17:42:31 | 37 | True | 1 | 31 RUE FROIDEVAUX - 75014 PARIS | POINT (2.32829082205 48.8356804442) | 56 | Montparnasse |
346 | OPEN | Paris | 14006 - SAINT JACQUES TOMBE ISSOIRE | False | 22 | 14006 | 2017-08-26T17:42:11 | 22 | True | 0 | 46 BOULEVARD SAINT JACQUES - 75014 PARIS | POINT (2.33693339125 48.8331903188) | 56 | Montparnasse |
429 | OPEN | Paris | 14111 - DENFERT-ROCHEREAU CASSINI | False | 24 | 14111 | 2017-08-26T17:41:31 | 16 | True | 8 | 18 RUE CASSINI - 75014 PARIS | POINT (2.33598303047 48.8375492922) | 56 | Montparnasse |
counts = stations.groupby('l_qu').size()
quartiers = quartiers.merge(counts.reset_index(name='number_bike_stations'))
quartiers.head()
n_sq_qu | perimetre | objectid | longueur | c_qu | surface | n_sq_ar | c_quinsee | l_qu | c_ar | geometry | number_bike_stations | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 750000047 | 6155.005036 | 29 | 6154.591387 | 47 | 1.902932e+06 | 750000012 | 7511203 | Bercy | 12 | POLYGON ((2.391141037839471 48.82611264577471,... | 9 |
1 | 750000001 | 5057.549475 | 41 | 5057.332772 | 1 | 8.690007e+05 | 750000001 | 7510101 | St-Germain-l'Auxerrois | 1 | POLYGON ((2.344593389828428 48.85404991486192,... | 4 |
2 | 750000076 | 4435.273252 | 79 | 4435.143641 | 76 | 1.294988e+06 | 750000019 | 7511904 | Combat | 19 | POLYGON ((2.388343313526396 48.88056667377272,... | 13 |
3 | 750000065 | 5264.597082 | 68 | 5264.463406 | 65 | 1.465071e+06 | 750000017 | 7511701 | Ternes | 17 | POLYGON ((2.295039618663717 48.87377869547587,... | 18 |
4 | 750000010 | 2139.625388 | 50 | 2139.535591 | 10 | 2.717503e+05 | 750000003 | 7510302 | Enfants-Rouges | 3 | POLYGON ((2.367101341254551 48.86162755885409,... | 3 |
ax = quartiers.plot(column='number_bike_stations', figsize=(15, 6), legend=True)
ax.set_axis_off()
quartiers['number_bike_stations_relative'] = quartiers['number_bike_stations'] / quartiers.geometry.area
ax = quartiers.plot(column='number_bike_stations_relative', figsize=(15, 6))
ax.set_axis_off()
A GeoDataFrame or GeoSeries has a .crs
attribute which holds (optionally) a description of the coordinate reference system of the geometries:
stations.crs
{'init': 'epsg:4326'}
stations.geometry.head(3)
0 POINT (2.32955509721 48.8391991672) 143 POINT (2.33798034198 48.835867721) 293 POINT (2.32829082205 48.8356804442) Name: geometry, dtype: object
We can convert this to another reference system using the to_crs
function.
For example, let's convert it to the UTM 31U zone reference system (http://epsg.io/32631) which unit is metres.
stations2 = stations.to_crs(epsg=32631)
# or
# stations.to_crs("+proj=utm +zone=31 +ellps=WGS84 +datum=WGS84 +units=m +no_defs ")
stations2.geometry.head(3)
0 POINT (450804.448740735 5409797.268203795) 143 POINT (451419.446715647 5409421.528587255) 293 POINT (450708.2275807534 5409406.941172979) Name: geometry, dtype: object
quartiers2 = quartiers.to_crs(epsg=32631)
(quartiers2.geometry.area / 1000**2)
0 1.901972 1 0.868538 2 1.294254 3 1.464259 4 0.271603 ... 75 0.282132 76 1.073169 77 1.381170 78 0.796153 79 0.760312 Length: 80, dtype: float64
folium
¶import folium
m = folium.Map([48.8566, 2.3429], zoom_start=12, tiles="OpenStreetMap")
m.choropleth(geo_str=quartiers.to_json(), data=quartiers, columns=['objectid', 'number_bike_stations_relative'],
key_on='feature.properties.objectid', fill_color='BuGn', highlight=True)
m
Urban Atlas is providing pan-European comparable land use and land cover data for Large Urban Zones (http://www.eea.europa.eu/data-and-maps/data/urban-atlas)
The data file for Paris is rather big, so I cutted a subset for the center of Paris and saved that again as a shapefile:
urban_atlas_full = geopandas.read_file('/fr001l_paris', vfs='zip:///.../fr001l_paris.zip')
x, y = 3760000, 2890000
us_subset = urban_atlas_full.cx[x-8000 : x+8000, y-8000 : y+8000].copy()
area = Polygon([(x-8000, y-8000), (x+8000, y-8000), (x+8000, y+8000), (x-8000, y+8000)])
us_subset['geometry'] = us_subset.geometry.intersection(area)
us_subset.to_file("fr001l_paris.shp")
us = geopandas.read_file("fr001l_paris.shp")
us.head()
CITIES | LUZ_OR_CIT | CODE | ITEM | PROD_DATE | SHAPE_LEN | SHAPE_AREA | geometry | |
---|---|---|---|---|---|---|---|---|
0 | Paris | FR001L | 11100 | Continuous Urban Fabric (S.L. > 80%) | 2011 | 852.179314 | 15717.297723 | POLYGON ((3763476.757668196 2882372.206915549,... |
1 | Paris | FR001L | 11100 | Continuous Urban Fabric (S.L. > 80%) | 2011 | 779.619910 | 21761.331806 | POLYGON ((3755750.795373401 2890571.441747018,... |
2 | Paris | FR001L | 11100 | Continuous Urban Fabric (S.L. > 80%) | 2011 | 1107.783935 | 37965.185195 | POLYGON ((3759045.014574341 2894223.898634031,... |
3 | Paris | FR001L | 11100 | Continuous Urban Fabric (S.L. > 80%) | 2011 | 239.287807 | 3104.207018 | POLYGON ((3763468.876496253 2889310.004122872,... |
4 | Paris | FR001L | 11100 | Continuous Urban Fabric (S.L. > 80%) | 2011 | 322.472230 | 5353.457105 | POLYGON ((3756953.575115261 2891043.84315913, ... |
us.crs
{'ellps': 'GRS80', 'lat_0': 52, 'lon_0': 10, 'no_defs': True, 'proj': 'laea', 'units': 'm', 'x_0': 4321000, 'y_0': 3210000}
ax = us.plot(figsize=(20,20))
ax.set_axis_off()
ax = us.plot(column='ITEM', figsize=(20,20), legend=True, cmap='tab20')
ax.set_axis_off()
pd.options.display.max_rows = 60
us.ITEM.value_counts()
Continuous Urban Fabric (S.L. > 80%) 10228 Industrial, commercial, public, military and private units 2678 Discontinuous Dense Urban Fabric (S.L. : 50% - 80%) 1414 Green urban areas 731 Sports and leisure facilities 322 Water bodies 125 Railways and associated land 123 Other roads and associated land 94 Land without current use 72 Discontinuous Medium Density Urban Fabric (S.L. : 30% - 50%) 61 Construction sites 52 Fast transit roads and associated land 28 Mineral extraction and dump sites 3 Discontinuous Low Density Urban Fabric (S.L. : 10% - 30%) 2 Agricultural + Semi-natural areas + Wetlands 1 Airports 1 Name: ITEM, dtype: int64
us.groupby('ITEM')['geometry'].agg(lambda x: x.area.sum()).sort_values(ascending=False)
ITEM Continuous Urban Fabric (S.L. > 80%) 9.977743e+07 Industrial, commercial, public, military and private units 5.566733e+07 Other roads and associated land 3.229025e+07 Green urban areas 2.225923e+07 Discontinuous Dense Urban Fabric (S.L. : 50% - 80%) 1.805155e+07 Sports and leisure facilities 1.016356e+07 Railways and associated land 7.431319e+06 Water bodies 5.779798e+06 Fast transit roads and associated land 1.785006e+06 Construction sites 9.373305e+05 Land without current use 8.906937e+05 Discontinuous Medium Density Urban Fabric (S.L. : 30% - 50%) 7.013594e+05 Mineral extraction and dump sites 1.020311e+05 Airports 8.685288e+04 Agricultural + Semi-natural areas + Wetlands 3.965250e+04 Discontinuous Low Density Urban Fabric (S.L. : 10% - 30%) 3.660228e+04 Name: geometry, dtype: float64
us[us.ITEM == 'Green urban areas'].geometry.area.hist(bins=np.arange(0, 1000000, 50000))
<matplotlib.axes._subplots.AxesSubplot at 0x7fc62a386400>