Spatial operations and overlays: creating new geometries

DS Python for GIS and Geoscience
October, 2020

© 2020, Joris Van den Bossche and Stijn Van Hoey (mailto:jorisvandenbossche@gmail.com, mailto:stijnvanhoey@gmail.com). Licensed under CC BY 4.0 Creative Commons

In the previous notebook we have seen how to identify and use the spatial relationships between geometries. In this notebook, we will see how to create new geometries based on those relationships.

In [1]:

import pandas as pd
import geopandas
import matplotlib.pyplot as plt

In [2]:

countries = geopandas.read_file("zip://./data/ne_110m_admin_0_countries.zip")
cities = geopandas.read_file("zip://./data/ne_110m_populated_places.zip")
rivers = geopandas.read_file("zip://./data/ne_50m_rivers_lake_centerlines.zip")

In [3]:

# defining the same example geometries as in the previous notebook
belgium = countries.loc[countries['name'] == 'Belgium', 'geometry'].item()
brussels = cities.loc[cities['name'] == 'Brussels', 'geometry'].item()

Spatial operations¶

Next to the spatial predicates that return boolean values, Shapely and GeoPandas also provide operations that return new geometric objects.

Binary operations:

Buffer:

See https://shapely.readthedocs.io/en/stable/manual.html#spatial-analysis-methods for more details.

For example, using the toy data from above, let's construct a buffer around Brussels (which returns a Polygon):

In [4]:

geopandas.GeoSeries([belgium, brussels.buffer(1)]).plot(alpha=0.5, cmap='tab10')

Out[4]:

<AxesSubplot:>

and now take the intersection, union or difference of those two polygons:

In [5]:

brussels.buffer(1).intersection(belgium)

Out[5]:

In [6]:

brussels.buffer(1).union(belgium)

Out[6]:

In [7]:

brussels.buffer(1).difference(belgium)

Out[7]:

Spatial operations with GeoPandas¶

Above we showed how to create a new geometry based on two individual shapely geometries. The same operations can be extended to GeoPandas. Given a GeoDataFrame, we can calculate the intersection, union or difference of each of the geometries with another geometry.

Let's look at an example with a subset of the countries. We have a GeoDataFrame with the country polygons of Africa, and now consider a rectangular polygon, representing an area around the equator:

In [8]:

africa = countries[countries.continent == 'Africa']

In [9]:

from shapely.geometry import LineString
box = LineString([(-10, 0), (50, 0)]).buffer(10, cap_style=3)

In [10]:

fig, ax = plt.subplots(figsize=(6, 6))
africa.plot(ax=ax, facecolor='none', edgecolor='k')
geopandas.GeoSeries([box]).plot(ax=ax, facecolor='C0', edgecolor='k', alpha=0.5)

Out[10]:

<AxesSubplot:>

The intersection method of the GeoDataFrame will now calculate the intersection with the rectangle for each of the geometries of the africa GeoDataFrame element-wise. Note that for many of the countries, those that do not overlap with the rectangle, this will be an empty geometry:

In [11]:

africa_intersection = africa.intersection(box)
africa_intersection.head()

Out[11]:

1     MULTIPOLYGON (((13.22332 -10.00000, 13.12099 -...
11    POLYGON ((29.34000 -4.49998, 29.27638 -3.29391...
13    POLYGON ((2.69170 6.25882, 1.86524 6.14216, 1....
14    POLYGON ((-2.89227 10.00000, -2.82750 9.64246,...
25                                        POLYGON EMPTY
dtype: geometry

What is returned is a new GeoSeries of the same length as the original dataframe, containing one row per country, but now containing only the intersection. In this example, the last element shown is an empty polygon, as that country was not overlapping with the box.

In [12]:

# remove the empty polygons before plotting
africa_intersection = africa_intersection[~africa_intersection.is_empty]
# plot the intersection
africa_intersection.plot()

Out[12]:

<AxesSubplot:>

Unary union and dissolve¶

Another useful method is the unary_union attribute, which converts the set of geometry objects in a GeoDataFrame into a single geometry object by taking the union of all those geometries.

For example, we can construct a single Shapely geometry object for the Africa continent:

In [13]:

africa_countries = countries[countries['continent'] == 'Africa']

In [14]:

africa = africa_countries.unary_union

In [15]:

africa

Out[15]:

In [16]:

print(str(africa)[:1000])

MULTIPOLYGON (((32.83012047702888 -26.7421916643362, 32.58026492689768 -27.47015756603182, 32.46213260267845 -28.30101124442056, 32.20338870619304 -28.75240488049007, 31.52100141777888 -29.25738697684626, 31.325561150851 -29.40197763439891, 30.90176272962535 -29.90995696382804, 30.62281334811382 -30.42377573010613, 30.05571618014278 -31.14026946383296, 28.92555260591954 -32.1720411109725, 28.2197558936771 -32.77195281344886, 27.46460818859597 -33.2269637997788, 26.41945234549283 -33.61495045342619, 25.90966434093349 -33.6670402971764, 25.7806282895007 -33.94464609144834, 25.17286176931597 -33.79685149509358, 24.67785322439212 -33.98717579522455, 23.59404340993464 -33.79447437920815, 22.98818891774474 -33.91643075941698, 22.57415734222224 -33.86408253350531, 21.54279910654103 -34.25883879978294, 20.689052768647 -34.41717538832523, 20.07126102059763 -34.79513681410799, 19.61640506356457 -34.81916635512371, 19.19327843595872 -34.46259897230979, 18.85531456876987 -34.44430551527847, 18.424

Alternatively, you might want to take the unary union of a set of geometries but grouped by one of the attributes of the GeoDataFrame (so basically doing "groupby" + "unary_union"). For this operation, GeoPandas provides the dissolve() method:

In [17]:

continents = countries.dissolve(by="continent")   # , aggfunc="sum"

In [18]:

continents

Out[18]:

	geometry	iso_a3	name	pop_est	gdp_md_est
continent
Africa	MULTIPOLYGON (((32.830 -26.742, 32.580 -27.470...	AGO	Angola	29310273.0	189000.0
Antarctica	MULTIPOLYGON (((-159.208 -79.497, -161.128 -79...	ATA	Antarctica	4050.0	810.0
Asia	MULTIPOLYGON (((120.716 -10.240, 120.295 -10.2...	AFG	Afghanistan	34124811.0	64080.0
Europe	MULTIPOLYGON (((-51.658 4.156, -52.249 3.241, ...	ALB	Albania	3047987.0	33900.0
North America	MULTIPOLYGON (((-61.680 10.760, -61.105 10.890...	BHS	Bahamas	329988.0	9066.0
Oceania	MULTIPOLYGON (((173.020 -40.919, 173.247 -41.3...	AUS	Australia	23232413.0	1189000.0
Seven seas (open ocean)	POLYGON ((68.935 -48.625, 69.580 -48.940, 70.5...	ATF	Fr. S. Antarctic Lands	140.0	16.0
South America	MULTIPOLYGON (((-66.960 -54.897, -67.291 -55.3...	ARG	Argentina	44293293.0	879400.0

REMEMBER:

GeoPandas (and Shapely for the individual objects) provide a whole lot of basic methods to analyze the geospatial data (distance, length, centroid, boundary, convex_hull, simplify, transform, ....), much more than what we can touch in this tutorial.

An overview of all methods provided by GeoPandas can be found here: https://geopandas.readthedocs.io/en/latest/docs/reference.html

Let's practice!¶

EXERCISE: What are the districts close to the Seine?

Below, the coordinates for the Seine river in the neighborhood of Paris are provided as a GeoJSON-like feature dictionary (created at http://geojson.io).

Based on this seine object, we want to know which districts are located close (maximum 150 m) to the Seine.

Create a buffer of 150 m around the Seine.
Check which districts intersect with this buffered object.
Make a visualization of the districts indicating which districts are located close to the Seine.

In [24]:

districts = geopandas.read_file("data/paris_districts.geojson").to_crs(epsg=2154)

In [25]:

# created a line with http://geojson.io
s_seine = geopandas.GeoDataFrame.from_features({"type":"FeatureCollection","features":[{"type":"Feature","properties":{},"geometry":{"type":"LineString","coordinates":[[2.408924102783203,48.805619828930226],[2.4092674255371094,48.81703747481909],[2.3927879333496094,48.82325391133874],[2.360687255859375,48.84912860497674],[2.338714599609375,48.85827758964043],[2.318115234375,48.8641501307046],[2.298717498779297,48.863246707697],[2.2913360595703125,48.859519915404825],[2.2594070434570312,48.8311646245967],[2.2436141967773438,48.82325391133874],[2.236919403076172,48.82347994904826],[2.227306365966797,48.828339513221444],[2.2224998474121094,48.83862215329593],[2.2254180908203125,48.84856379804802],[2.2240447998046875,48.85409863123821],[2.230224609375,48.867989496547864],[2.260265350341797,48.89192242750887],[2.300262451171875,48.910203080780285]]}}]},
                                               crs='EPSG:4326')

In [26]:

# convert to local UTM zone
s_seine_utm = s_seine.to_crs(epsg=2154)

In [27]:

import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(20, 10))
districts.plot(ax=ax, color='grey', alpha=0.4, edgecolor='k')
s_seine_utm.plot(ax=ax)

Out[27]:

<AxesSubplot:>

In [30]:

# access the single geometry object
seine = s_seine_utm.geometry.item()
seine

Out[30]:

In [31]:

# Take a buffer
seine_buffer = seine.buffer(150)
seine_buffer

Out[31]:

In [34]:

# Use the intersection
districts_seine = districts[districts.intersects(seine_buffer)]

In [35]:

# Make a plot
fig, ax = plt.subplots(figsize=(20, 10))
districts.plot(ax=ax, color='grey', alpha=0.4, edgecolor='k')
districts_seine.plot(ax=ax, color='blue', alpha=0.4, edgecolor='k')
s_seine_utm.plot(ax=ax)

Out[35]:

<AxesSubplot:>

EXERCISE: Exploring a Land Use dataset

For the following exercises, we first introduce a new dataset: a dataset about the land use of Paris (a simplified version based on the open European Urban Atlas). The land use indicates for what kind of activity a certain area is used, such as residential area or for recreation. It is a polygon dataset, with a label representing the land use class for different areas in Paris.

In this exercise, we will read the data, explore it visually, and calculate the total area of the different classes of land use in the area of Paris.

Read in the 'paris_land_use.shp' file and assign the result to a variable land_use.
Make a plot of land_use, using the 'class' column to color the polygons. Add a legend with legend=True, and make the figure size a bit larger.
Add a new column 'area' to the dataframe with the area of each polygon.
Calculate the total area in km² for each 'class' using the groupby() method, and print the result.

Hints

Reading a file can be done with the geopandas.read_file() function.
To use a column to color the geometries, use the column keyword to indicate the column name.
The area of each geometry can be accessed with the area attribute of the geometry of the GeoDataFrame.
The groupby() method takes the column name on which you want to group as the first argument.

In [36]:

# Import the land use dataset
land_use = geopandas.read_file("zip://./data/paris_land_use.zip")
land_use.head()

Out[36]:

	class	geometry
0	Water bodies	POLYGON ((3751386.281 2890064.323, 3751395.345...
1	Roads and associated land	POLYGON ((3751390.345 2886000.000, 3751390.345...
2	Roads and associated land	POLYGON ((3751390.345 2886898.192, 3751390.370...
3	Roads and associated land	POLYGON ((3751390.345 2887500.000, 3751390.345...
4	Roads and associated land	POLYGON ((3751390.345 2888647.357, 3751390.370...

In [37]:

# Make a plot of the land use with 'class' as the color
land_use.plot(column='class', legend=True, figsize=(15, 10))

Out[37]:

<AxesSubplot:>

In [38]:

# Add the area as a new column
land_use['area'] = land_use.geometry.area

In [39]:

# Calculate the total area for each land use class
total_area = land_use.groupby('class')['area'].sum() / 1000**2
total_area

Out[39]:

class
Continuous Urban Fabric             45.943090
Discontinuous Dense Urban Fabric     3.657343
Green urban areas                    9.858438
Industrial, commercial, public      13.295042
Railways and associated land         1.935793
Roads and associated land            7.401574
Sports and leisure facilities        3.578509
Water bodies                         3.189706
Name: area, dtype: float64

EXERCISE: Intersection of two polygons

For this exercise, we are going to use 2 individual polygons: the district of Muette extracted from the districts dataset, and the green urban area of Boulogne, a large public park in the west of Paris, extracted from the land_use dataset. The two polygons have already been assigned to the muette and park_boulogne variables.

We first visualize the two polygons. You will see that they overlap, but the park is not fully located in the district of Muette. Let's determine the overlapping part:

Plot the two polygons in a single map to examine visually the degree of overlap
Calculate the intersection of the park_boulogne and muette polygons.
Plot the intersection.
Print the proportion of the area of the district that is occupied by the park.

Hints

To plot single Shapely objects, you can put those in a GeoSeries([..]) to use the GeoPandas plot() method.
The intersection of to scalar polygons can be calculated with the intersection() method of one of the polygons, and passing the other polygon as the argument to that method.

In [40]:

land_use = geopandas.read_file("zip://./data/paris_land_use.zip")
districts = geopandas.read_file("data/paris_districts.geojson").to_crs(land_use.crs)

In [41]:

# extract polygons
land_use['area'] = land_use.geometry.area
park_boulogne = land_use[land_use['class'] == "Green urban areas"].sort_values('area').geometry.iloc[-1]
muette = districts[districts.district_name == 'Muette'].geometry.item()

In [42]:

# Plot the two polygons
geopandas.GeoSeries([park_boulogne, muette]).plot(alpha=0.5, color=['green', 'blue'])

Out[42]:

<AxesSubplot:>

In [43]:

# Calculate the intersection of both polygons
intersection = park_boulogne.intersection(muette)

In [44]:

# Plot the intersection
geopandas.GeoSeries([intersection]).plot()

Out[44]:

<AxesSubplot:>

In [45]:

# Print proportion of district area that occupied park
print(intersection.area / muette.area)

0.4352082235641065

EXERCISE: Intersecting a GeoDataFrame with a Polygon

Combining the land use dataset and the districts dataset, we can now investigate what the land use is in a certain district.

For that, we first need to determine the intersection of the land use dataset with a given district. Let's take again the Muette district as example case.

Calculate the intersection of the land_use polygons with the single muette polygon. Call the result land_use_muette.
Remove the empty geometries from land_use_muette.
Make a quick plot of this intersection, and pass edgecolor='black' to more clearly see the boundaries of the different polygons.
Print the first five rows of land_use_muette.

Hints

The intersection of each geometry of a GeoSeries with another single geometry can be performed with the intersection() method of a GeoSeries.
The intersection() method takes as argument the geometry for which to calculate the intersection.
We can check which geometries are empty with the is_empty attribute of a GeoSeries.

In [46]:

land_use = geopandas.read_file("zip://./data/paris_land_use.zip")
districts = geopandas.read_file("data/paris_districts.geojson").to_crs(land_use.crs)
muette = districts[districts.district_name == 'Muette'].geometry.item()

In [47]:

# Calculate the intersection of the land use polygons with Muette
land_use_muette = land_use.geometry.intersection(muette)

In [48]:

# Print the first five rows of the intersection
land_use_muette.head()

/home/stijnvh/miniconda3/envs/DS-geospatial/lib/python3.8/site-packages/geopandas/array.py:689: RuntimeWarning: All-NaN slice encountered
  np.nanmin(b[:, 0]),  # minx
/home/stijnvh/miniconda3/envs/DS-geospatial/lib/python3.8/site-packages/geopandas/array.py:690: RuntimeWarning: All-NaN slice encountered
  np.nanmin(b[:, 1]),  # miny
/home/stijnvh/miniconda3/envs/DS-geospatial/lib/python3.8/site-packages/geopandas/array.py:691: RuntimeWarning: All-NaN slice encountered
  np.nanmax(b[:, 2]),  # maxx
/home/stijnvh/miniconda3/envs/DS-geospatial/lib/python3.8/site-packages/geopandas/array.py:692: RuntimeWarning: All-NaN slice encountered
  np.nanmax(b[:, 3]),  # maxy

Out[48]:

0    POLYGON EMPTY
1    POLYGON EMPTY
2    POLYGON EMPTY
3    POLYGON EMPTY
4    POLYGON EMPTY
dtype: geometry

In [49]:

# Remove the empty geometries
land_use_muette = land_use_muette[~land_use_muette.is_empty]

In [50]:

# Print the first five rows of the intersection
land_use_muette.head()

Out[50]:

135    POLYGON ((3752020.694 2891519.323, 3752025.310...
229    POLYGON ((3752443.986 2891171.823, 3752446.430...
239    POLYGON ((3752110.034 2891774.197, 3752112.444...
278    POLYGON ((3752000.000 2891530.298, 3752000.000...
279    POLYGON ((3751954.462 2891571.778, 3752000.000...
dtype: geometry

In [51]:

# Plot the intersection
land_use_muette.plot(edgecolor='black')

Out[51]:

<AxesSubplot:>

You can see in the plot that we now only have a subset of the full land use dataset. The original land_use_muette (before removing the empty geometries) still has the same number of rows as the original land_use, though. But many of the rows, as you could see by printing the first rows, consist now of empty polygons when it did not intersect with the Muette district.

The intersection() method also returned only geometries. If we want to combine those intersections with the attributes of the original land use, we can take a copy of this and replace the geometries with the intersections (you can uncomment and run to see the code):

In [52]:

land_use_muette = land_use.copy()
land_use_muette['geometry'] = land_use.geometry.intersection(muette)
land_use_muette = land_use_muette[~land_use_muette.is_empty]
land_use_muette.head()

Out[52]:

	class	geometry
135	Green urban areas	POLYGON ((3752020.694 2891519.323, 3752025.310...
229	Sports and leisure facilities	POLYGON ((3752443.986 2891171.823, 3752446.430...
239	Water bodies	POLYGON ((3752110.034 2891774.197, 3752112.444...
278	Roads and associated land	POLYGON ((3752000.000 2891530.298, 3752000.000...
279	Roads and associated land	POLYGON ((3751954.462 2891571.778, 3752000.000...

In [53]:

land_use_muette.plot(column="class") #edgecolor="black")

Out[53]:

<AxesSubplot:>

In [54]:

land_use_muette.dissolve(by='class')

Out[54]:

	geometry
class
Continuous Urban Fabric	MULTIPOLYGON (((3755334.091 2889457.833, 37553...
Discontinuous Dense Urban Fabric	MULTIPOLYGON (((3755585.963 2889793.822, 37556...
Green urban areas	MULTIPOLYGON (((3755772.518 2889995.038, 37558...
Industrial, commercial, public	MULTIPOLYGON (((3755275.182 2889527.443, 37552...
Railways and associated land	POLYGON ((3755654.921 2889540.054, 3755560.618...
Roads and associated land	MULTIPOLYGON (((3754820.067 2889843.877, 37548...
Sports and leisure facilities	MULTIPOLYGON (((3753932.354 2891267.190, 37539...
Water bodies	MULTIPOLYGON (((3755507.459 2889412.262, 37555...

EXERCISE: The land use of the Muette district

Based on the land_use_muette dataframe with the land use for the Muette districts as calculated above, we can now determine the total area of the different land use classes in the Muette district.

Calculate the total area per land use class.
Calculate the fraction (in percentage) for the different land use classes.

Hints

The intersection of each geometry of a GeoSeries with another single geometry can be performed with the intersection() method of a GeoSeries.
The intersection() method takes as argument the geometry for which to calculate the intersection.
We can check which geometries are empty with the is_empty attribute of a GeoSeries.

In [55]:

land_use_muette['area'] = land_use_muette.geometry.area
# Total land use per class
land_use_muette.groupby("class")["area"].sum()

Out[55]:

class
Continuous Urban Fabric             1.275297e+06
Discontinuous Dense Urban Fabric    8.828861e+04
Green urban areas                   2.624229e+06
Industrial, commercial, public      3.629901e+05
Railways and associated land        5.424307e+03
Roads and associated land           2.262708e+05
Sports and leisure facilities       6.039885e+05
Water bodies                        2.921943e+05
Name: area, dtype: float64

In [56]:

# Relative percentage of land use classes
land_use_muette.groupby("class")["area"].sum() / land_use_muette.geometry.area.sum() * 100

Out[56]:

class
Continuous Urban Fabric             23.277436
Discontinuous Dense Urban Fabric     1.611493
Green urban areas                   47.898906
Industrial, commercial, public       6.625501
Railways and associated land         0.099008
Roads and associated land            4.130023
Sports and leisure facilities       11.024339
Water bodies                         5.333295
Name: area, dtype: float64

The above was only for a single district. If we want to do this more easily for all districts, we can do this with the overlay operation.

The overlay operation¶

In a spatial join operation, we are not changing the geometries itself. We are not joining geometries, but joining attributes based on a spatial relationship between the geometries. This also means that the geometries need to at least overlap partially.

If you want to create new geometries based on joining (combining) geometries of different dataframes into one new dataframe (eg by taking the intersection of the geometries), you want an overlay operation.

How does it differ compared to the intersection method?¶

With the intersection() method introduced in the previous section, we could for example determine the intersection of a set of countries with another polygon, a circle in the example below:

However, this method (countries.intersection(circle)) also has some limitations.

Mostly useful when intersecting a GeoSeries with a single polygon.
Does not preserve attribute information of the intersecting polygons.

For cases where we require a bit more complexity, it is preferable to use the "overlay" operation, instead of the intersection method.

Consider the following simplified example. On the left we see again the 3 countries. On the right we have the plot of a GeoDataFrame with some simplified geologic regions for the same area:

By simply plotting them on top of each other, as shown below, you can see that the polygons of both layers intersect.

But now, by "overlaying" the two layers, we can create a third layer that contains the result of intersecting both layers: all the intersections of each country with each geologic region. It keeps only those areas that were included in both layers.

This operation is called an intersection overlay, and in GeoPandas we can perform this operation with the geopandas.overlay() function.

Another code example:

In [57]:

africa = countries[countries['continent'] == 'Africa']

In [58]:

africa.plot()

Out[58]:

<AxesSubplot:>

In [59]:

cities['geometry'] = cities.buffer(2)

<ipython-input-59-c1250b33ac0f>:1: UserWarning: Geometry is in a geographic CRS. Results from 'buffer' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.

  cities['geometry'] = cities.buffer(2)

In [60]:

intersection = geopandas.overlay(africa, cities, how='intersection')
intersection.plot()

Out[60]:

<AxesSubplot:>

In [61]:

intersection.head()

Out[61]:

	iso_a3	name_1	continent	pop_est	gdp_md_est	name_2	geometry
0	AGO	Angola	Africa	29310273.0	189000.0	Brazzaville	POLYGON ((14.10056 -5.86749, 16.32653 -5.87747...
1	COD	Dem. Rep. Congo	Africa	83301151.0	66010.0	Brazzaville	POLYGON ((16.34948 -5.94680, 16.32653 -5.87747...
2	COG	Congo	Africa	4954674.0	30270.0	Brazzaville	POLYGON ((16.09773 -2.43295, 15.97280 -2.71239...
3	AGO	Angola	Africa	29310273.0	189000.0	Luanda	POLYGON ((13.69097 -10.78080, 13.68638 -10.731...
4	AGO	Angola	Africa	29310273.0	189000.0	Kinshasa	POLYGON ((14.03697 -5.86721, 16.32653 -5.87747...

With the overlay method, we pass the full GeoDataFrame with all regions to intersect the countries with. The result contains all non-empty intersections of all combinations of countries and city regions.

Note that the result of the overlay function also keeps the attribute information of both the countries as the city regions. That can be very useful for further analysis.

In [62]:

geopandas.overlay(africa, cities, how='intersection').plot()  # how="difference"/"union"/"symmetric_difference"

Out[62]:

<AxesSubplot:>

REMEMBER

Spatial join: transfer attributes from one dataframe to another based on the spatial relationship
Spatial overlay: construct new geometries based on spatial operation between both dataframes (and combining attributes of both dataframes)

Let's practice!¶

EXERCISE: Overlaying spatial datasets I

We will now combine both datasets in an overlay operation. Create a new GeoDataFrame consisting of the intersection of the land use polygons which each of the districts, but make sure to bring the attribute data from both source layers.

Create a new GeoDataFrame from the intersections of land_use and districts. Assign the result to a variable combined.
Print the first rows the resulting GeoDataFrame (combined).

Hints

The intersection of two GeoDataFrames can be calculated with the geopandas.overlay() function.
The overlay() functions takes first the two GeoDataFrames to combine, and a third how keyword indicating how to combine the two layers.
For making an overlay based on the intersection, you can pass how='intersection'.

In [63]:

land_use = geopandas.read_file("zip://./data/paris_land_use.zip")
districts = geopandas.read_file("data/paris_districts.geojson").to_crs(land_use.crs)

In [64]:

# Overlay both datasets based on the intersection
combined = geopandas.overlay(land_use, districts, how='intersection')

In [65]:

# Print the first five rows of the result
combined.head()

Out[65]:

	class	id	district_name	population	geometry
0	Water bodies	61	Auteuil	67967	POLYGON ((3751395.345 2890118.001, 3751395.345...
1	Continuous Urban Fabric	61	Auteuil	67967	MULTIPOLYGON (((3753253.104 2888254.529, 37532...
2	Roads and associated land	61	Auteuil	67967	POLYGON ((3751519.830 2890061.509, 3751522.057...
3	Green urban areas	61	Auteuil	67967	MULTIPOLYGON (((3754314.717 2890283.121, 37543...
4	Roads and associated land	61	Auteuil	67967	POLYGON ((3751619.113 2890500.000, 3751626.627...

EXERCISE: Overlaying spatial datasets II

Now that we created the overlay of the land use and districts datasets, we can more easily inspect the land use for the different districts. Let's get back to the example district of Muette, and inspect the land use of that district.

Add a new column 'area' with the area of each polygon to the combined GeoDataFrame.
Create a subset called land_use_muette where the 'district_name' is equal to "Muette".
Make a plot of land_use_muette, using the 'class' column to color the polygons.
Calculate the total area for each 'class' of land_use_muette using the groupby() method, and print the result.

Hints

The area of each geometry can be accessed with the area attribute of the geometry of the GeoDataFrame.
To use a column to color the geometries, pass its name to the column keyword.
The groupby() method takes the column name on which you want to group as the first argument.
The total area for each class can be calculated by taking the sum() of the area.

In [66]:

# Add the area as a column
combined['area'] = combined.geometry.area

In [67]:

# Take a subset for the Muette district
land_use_muette = combined[combined['district_name'] == 'Muette']

In [68]:

# Visualize the land use of the Muette district
land_use_muette.plot(column='class')

Out[68]:

<AxesSubplot:>

In [69]:

# Calculate the total area for each land use class
print(land_use_muette.groupby('class')['area'].sum() / 1000**2)

class
Continuous Urban Fabric             1.275297
Discontinuous Dense Urban Fabric    0.088289
Green urban areas                   2.624229
Industrial, commercial, public      0.362990
Railways and associated land        0.005424
Roads and associated land           0.226271
Sports and leisure facilities       0.603989
Water bodies                        0.292194
Name: area, dtype: float64

EXERCISE: Overlaying spatial datasets III

Thanks to the result of the overlay operation, we can now more easily perform a similar analysis for all districts. Let's investigate the fraction of green urban area in each of the districts.

Based on the combined dataset, calculate the total area per district using groupby().
Select the subset of "Green urban areas" from combined and call this urban_green.
Now calculate the total area per district for this urban_green subset, and call this urban_green_area.
Determine the fraction of urban green area in each district.

In [70]:

districts_area = combined.groupby("district_name")["area"].sum()
districts_area.head()

Out[70]:

district_name
Archives           3.678060e+05
Arsenal            4.871228e+05
Arts-et-Metiers    3.181515e+05
Auteuil            6.386649e+06
Batignolles        1.350675e+06
Name: area, dtype: float64

In [71]:

urban_green = combined[combined["class"] == "Green urban areas"]

In [72]:

urban_green_area = urban_green.groupby("district_name")["area"].sum()
urban_green_area.head()

Out[72]:

district_name
Archives           4.409114e+03
Arsenal            2.801951e+04
Arts-et-Metiers    7.154903e+03
Auteuil            1.577429e+06
Batignolles        8.471799e+04
Name: area, dtype: float64

In [73]:

urban_green_fraction = urban_green_area / districts_area * 100

In [74]:

urban_green_fraction.nlargest()

Out[74]:

district_name
Muette                    47.898906
Porte-Dauphine            42.266546
Odeon                     33.044112
St-Germain-l'Auxerrois    31.534557
Auteuil                   24.698847
Name: area, dtype: float64

In [75]:

urban_green_fraction.nsmallest()

Out[75]:

district_name
Chaussée-d'Antin         0.005752
Saint-Vincent-de-Paul    0.332773
Notre-Dame-des-Champs    0.593873
Salpêtrière              0.605310
Saint-Georges            0.654253
Name: area, dtype: float64

An alternative to calculate the area per land use class in each district:

In [76]:

combined.groupby(["district_name", "class"])["area"].sum().reset_index()

Out[76]:

	district_name	class	area
0	Archives	Continuous Urban Fabric	323124.623717
1	Archives	Green urban areas	4409.113826
2	Archives	Industrial, commercial, public	33752.922651
3	Archives	Roads and associated land	6519.347840
4	Arsenal	Continuous Urban Fabric	274791.191887
...	...	...	...
352	Villette	Roads and associated land	33164.775004
353	Villette	Water bodies	3384.185687
354	Vivienne	Continuous Urban Fabric	190071.213662
355	Vivienne	Industrial, commercial, public	33180.126871
356	Vivienne	Roads and associated land	20339.790825

357 rows × 3 columns

In [ ]: