Spatial operations and overlays: creating new geometries
DS Python for GIS and Geoscience
October, 2020© 2020, Joris Van den Bossche and Stijn Van Hoey (mailto:jorisvandenbossche@gmail.com, mailto:stijnvanhoey@gmail.com). Licensed under CC BY 4.0 Creative Commons
In the previous notebook we have seen how to identify and use the spatial relationships between geometries. In this notebook, we will see how to create new geometries based on those relationships.
import pandas as pd
import geopandas
import matplotlib.pyplot as plt
countries = geopandas.read_file("zip://./data/ne_110m_admin_0_countries.zip")
cities = geopandas.read_file("zip://./data/ne_110m_populated_places.zip")
rivers = geopandas.read_file("zip://./data/ne_50m_rivers_lake_centerlines.zip")
# defining the same example geometries as in the previous notebook
belgium = countries.loc[countries['name'] == 'Belgium', 'geometry'].item()
brussels = cities.loc[cities['name'] == 'Brussels', 'geometry'].item()
Next to the spatial predicates that return boolean values, Shapely and GeoPandas also provide operations that return new geometric objects.
Binary operations:
Buffer:
See https://shapely.readthedocs.io/en/stable/manual.html#spatial-analysis-methods for more details.
For example, using the toy data from above, let's construct a buffer around Brussels (which returns a Polygon):
geopandas.GeoSeries([belgium, brussels.buffer(1)]).plot(alpha=0.5, cmap='tab10')
<AxesSubplot:>
and now take the intersection, union or difference of those two polygons:
brussels.buffer(1).intersection(belgium)
brussels.buffer(1).union(belgium)
brussels.buffer(1).difference(belgium)
Above we showed how to create a new geometry based on two individual shapely geometries. The same operations can be extended to GeoPandas. Given a GeoDataFrame, we can calculate the intersection, union or difference of each of the geometries with another geometry.
Let's look at an example with a subset of the countries. We have a GeoDataFrame with the country polygons of Africa, and now consider a rectangular polygon, representing an area around the equator:
africa = countries[countries.continent == 'Africa']
from shapely.geometry import LineString
box = LineString([(-10, 0), (50, 0)]).buffer(10, cap_style=3)
fig, ax = plt.subplots(figsize=(6, 6))
africa.plot(ax=ax, facecolor='none', edgecolor='k')
geopandas.GeoSeries([box]).plot(ax=ax, facecolor='C0', edgecolor='k', alpha=0.5)
<AxesSubplot:>
The intersection method of the GeoDataFrame will now calculate the intersection with the rectangle for each of the geometries of the africa GeoDataFrame element-wise. Note that for many of the countries, those that do not overlap with the rectangle, this will be an empty geometry:
africa_intersection = africa.intersection(box)
africa_intersection.head()
1 MULTIPOLYGON (((13.22332 -10.00000, 13.12099 -... 11 POLYGON ((29.34000 -4.49998, 29.27638 -3.29391... 13 POLYGON ((2.69170 6.25882, 1.86524 6.14216, 1.... 14 POLYGON ((-2.89227 10.00000, -2.82750 9.64246,... 25 POLYGON EMPTY dtype: geometry
What is returned is a new GeoSeries of the same length as the original dataframe, containing one row per country, but now containing only the intersection. In this example, the last element shown is an empty polygon, as that country was not overlapping with the box.
# remove the empty polygons before plotting
africa_intersection = africa_intersection[~africa_intersection.is_empty]
# plot the intersection
africa_intersection.plot()
<AxesSubplot:>
Another useful method is the unary_union
attribute, which converts the set of geometry objects in a GeoDataFrame into a single geometry object by taking the union of all those geometries.
For example, we can construct a single Shapely geometry object for the Africa continent:
africa_countries = countries[countries['continent'] == 'Africa']
africa = africa_countries.unary_union
africa
print(str(africa)[:1000])
MULTIPOLYGON (((32.83012047702888 -26.7421916643362, 32.58026492689768 -27.47015756603182, 32.46213260267845 -28.30101124442056, 32.20338870619304 -28.75240488049007, 31.52100141777888 -29.25738697684626, 31.325561150851 -29.40197763439891, 30.90176272962535 -29.90995696382804, 30.62281334811382 -30.42377573010613, 30.05571618014278 -31.14026946383296, 28.92555260591954 -32.1720411109725, 28.2197558936771 -32.77195281344886, 27.46460818859597 -33.2269637997788, 26.41945234549283 -33.61495045342619, 25.90966434093349 -33.6670402971764, 25.7806282895007 -33.94464609144834, 25.17286176931597 -33.79685149509358, 24.67785322439212 -33.98717579522455, 23.59404340993464 -33.79447437920815, 22.98818891774474 -33.91643075941698, 22.57415734222224 -33.86408253350531, 21.54279910654103 -34.25883879978294, 20.689052768647 -34.41717538832523, 20.07126102059763 -34.79513681410799, 19.61640506356457 -34.81916635512371, 19.19327843595872 -34.46259897230979, 18.85531456876987 -34.44430551527847, 18.424
Alternatively, you might want to take the unary union of a set of geometries but grouped by one of the attributes of the GeoDataFrame (so basically doing "groupby" + "unary_union"). For this operation, GeoPandas provides the dissolve()
method:
continents = countries.dissolve(by="continent") # , aggfunc="sum"
continents
geometry | iso_a3 | name | pop_est | gdp_md_est | |
---|---|---|---|---|---|
continent | |||||
Africa | MULTIPOLYGON (((32.830 -26.742, 32.580 -27.470... | AGO | Angola | 29310273.0 | 189000.0 |
Antarctica | MULTIPOLYGON (((-159.208 -79.497, -161.128 -79... | ATA | Antarctica | 4050.0 | 810.0 |
Asia | MULTIPOLYGON (((120.716 -10.240, 120.295 -10.2... | AFG | Afghanistan | 34124811.0 | 64080.0 |
Europe | MULTIPOLYGON (((-51.658 4.156, -52.249 3.241, ... | ALB | Albania | 3047987.0 | 33900.0 |
North America | MULTIPOLYGON (((-61.680 10.760, -61.105 10.890... | BHS | Bahamas | 329988.0 | 9066.0 |
Oceania | MULTIPOLYGON (((173.020 -40.919, 173.247 -41.3... | AUS | Australia | 23232413.0 | 1189000.0 |
Seven seas (open ocean) | POLYGON ((68.935 -48.625, 69.580 -48.940, 70.5... | ATF | Fr. S. Antarctic Lands | 140.0 | 16.0 |
South America | MULTIPOLYGON (((-66.960 -54.897, -67.291 -55.3... | ARG | Argentina | 44293293.0 | 879400.0 |
REMEMBER:
GeoPandas (and Shapely for the individual objects) provide a whole lot of basic methods to analyze the geospatial data (distance, length, centroid, boundary, convex_hull, simplify, transform, ....), much more than what we can touch in this tutorial.
EXERCISE: What are the districts close to the Seine?
Below, the coordinates for the Seine river in the neighborhood of Paris are provided as a GeoJSON-like feature dictionary (created at http://geojson.io).
Based on this seine
object, we want to know which districts are located close (maximum 150 m) to the Seine.
districts = geopandas.read_file("data/paris_districts.geojson").to_crs(epsg=2154)
# created a line with http://geojson.io
s_seine = geopandas.GeoDataFrame.from_features({"type":"FeatureCollection","features":[{"type":"Feature","properties":{},"geometry":{"type":"LineString","coordinates":[[2.408924102783203,48.805619828930226],[2.4092674255371094,48.81703747481909],[2.3927879333496094,48.82325391133874],[2.360687255859375,48.84912860497674],[2.338714599609375,48.85827758964043],[2.318115234375,48.8641501307046],[2.298717498779297,48.863246707697],[2.2913360595703125,48.859519915404825],[2.2594070434570312,48.8311646245967],[2.2436141967773438,48.82325391133874],[2.236919403076172,48.82347994904826],[2.227306365966797,48.828339513221444],[2.2224998474121094,48.83862215329593],[2.2254180908203125,48.84856379804802],[2.2240447998046875,48.85409863123821],[2.230224609375,48.867989496547864],[2.260265350341797,48.89192242750887],[2.300262451171875,48.910203080780285]]}}]},
crs='EPSG:4326')
# convert to local UTM zone
s_seine_utm = s_seine.to_crs(epsg=2154)
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(20, 10))
districts.plot(ax=ax, color='grey', alpha=0.4, edgecolor='k')
s_seine_utm.plot(ax=ax)
<AxesSubplot:>
# access the single geometry object
seine = s_seine_utm.geometry.item()
seine
# Take a buffer
seine_buffer = seine.buffer(150)
seine_buffer
# Use the intersection
districts_seine = districts[districts.intersects(seine_buffer)]
# Make a plot
fig, ax = plt.subplots(figsize=(20, 10))
districts.plot(ax=ax, color='grey', alpha=0.4, edgecolor='k')
districts_seine.plot(ax=ax, color='blue', alpha=0.4, edgecolor='k')
s_seine_utm.plot(ax=ax)
<AxesSubplot:>
EXERCISE: Exploring a Land Use dataset
For the following exercises, we first introduce a new dataset: a dataset about the land use of Paris (a simplified version based on the open European Urban Atlas). The land use indicates for what kind of activity a certain area is used, such as residential area or for recreation. It is a polygon dataset, with a label representing the land use class for different areas in Paris.
In this exercise, we will read the data, explore it visually, and calculate the total area of the different classes of land use in the area of Paris.
'paris_land_use.shp'
file and assign the result to a variable land_use
.land_use
, using the 'class'
column to color the polygons. Add a legend with legend=True
, and make the figure size a bit larger.'area'
to the dataframe with the area of each polygon.'class'
using the groupby()
method, and print the result.geopandas.read_file()
function.column
keyword to indicate the column name.area
attribute of the geometry
of the GeoDataFrame.groupby()
method takes the column name on which you want to group as the first argument.# Import the land use dataset
land_use = geopandas.read_file("zip://./data/paris_land_use.zip")
land_use.head()
class | geometry | |
---|---|---|
0 | Water bodies | POLYGON ((3751386.281 2890064.323, 3751395.345... |
1 | Roads and associated land | POLYGON ((3751390.345 2886000.000, 3751390.345... |
2 | Roads and associated land | POLYGON ((3751390.345 2886898.192, 3751390.370... |
3 | Roads and associated land | POLYGON ((3751390.345 2887500.000, 3751390.345... |
4 | Roads and associated land | POLYGON ((3751390.345 2888647.357, 3751390.370... |
# Make a plot of the land use with 'class' as the color
land_use.plot(column='class', legend=True, figsize=(15, 10))
<AxesSubplot:>
# Add the area as a new column
land_use['area'] = land_use.geometry.area
# Calculate the total area for each land use class
total_area = land_use.groupby('class')['area'].sum() / 1000**2
total_area
class Continuous Urban Fabric 45.943090 Discontinuous Dense Urban Fabric 3.657343 Green urban areas 9.858438 Industrial, commercial, public 13.295042 Railways and associated land 1.935793 Roads and associated land 7.401574 Sports and leisure facilities 3.578509 Water bodies 3.189706 Name: area, dtype: float64
EXERCISE: Intersection of two polygons
For this exercise, we are going to use 2 individual polygons: the district of Muette extracted from the districts
dataset, and the green urban area of Boulogne, a large public park in the west of Paris, extracted from the land_use
dataset. The two polygons have already been assigned to the muette
and park_boulogne
variables.
We first visualize the two polygons. You will see that they overlap, but the park is not fully located in the district of Muette. Let's determine the overlapping part:
park_boulogne
and muette
polygons.GeoSeries([..])
to use the GeoPandas plot()
method.intersection()
method of one of the polygons, and passing the other polygon as the argument to that method.land_use = geopandas.read_file("zip://./data/paris_land_use.zip")
districts = geopandas.read_file("data/paris_districts.geojson").to_crs(land_use.crs)
# extract polygons
land_use['area'] = land_use.geometry.area
park_boulogne = land_use[land_use['class'] == "Green urban areas"].sort_values('area').geometry.iloc[-1]
muette = districts[districts.district_name == 'Muette'].geometry.item()
# Plot the two polygons
geopandas.GeoSeries([park_boulogne, muette]).plot(alpha=0.5, color=['green', 'blue'])
<AxesSubplot:>
# Calculate the intersection of both polygons
intersection = park_boulogne.intersection(muette)
# Plot the intersection
geopandas.GeoSeries([intersection]).plot()
<AxesSubplot:>
# Print proportion of district area that occupied park
print(intersection.area / muette.area)
0.4352082235641065
EXERCISE: Intersecting a GeoDataFrame with a Polygon
Combining the land use dataset and the districts dataset, we can now investigate what the land use is in a certain district.
For that, we first need to determine the intersection of the land use dataset with a given district. Let's take again the Muette district as example case.
land_use
polygons with the single muette
polygon. Call the result land_use_muette
.land_use_muette
.edgecolor='black'
to more clearly see the boundaries of the different polygons.land_use_muette
.intersection()
method of a GeoSeries.intersection()
method takes as argument the geometry for which to calculate the intersection.is_empty
attribute of a GeoSeries.land_use = geopandas.read_file("zip://./data/paris_land_use.zip")
districts = geopandas.read_file("data/paris_districts.geojson").to_crs(land_use.crs)
muette = districts[districts.district_name == 'Muette'].geometry.item()
# Calculate the intersection of the land use polygons with Muette
land_use_muette = land_use.geometry.intersection(muette)
# Print the first five rows of the intersection
land_use_muette.head()
/home/stijnvh/miniconda3/envs/DS-geospatial/lib/python3.8/site-packages/geopandas/array.py:689: RuntimeWarning: All-NaN slice encountered np.nanmin(b[:, 0]), # minx /home/stijnvh/miniconda3/envs/DS-geospatial/lib/python3.8/site-packages/geopandas/array.py:690: RuntimeWarning: All-NaN slice encountered np.nanmin(b[:, 1]), # miny /home/stijnvh/miniconda3/envs/DS-geospatial/lib/python3.8/site-packages/geopandas/array.py:691: RuntimeWarning: All-NaN slice encountered np.nanmax(b[:, 2]), # maxx /home/stijnvh/miniconda3/envs/DS-geospatial/lib/python3.8/site-packages/geopandas/array.py:692: RuntimeWarning: All-NaN slice encountered np.nanmax(b[:, 3]), # maxy
0 POLYGON EMPTY 1 POLYGON EMPTY 2 POLYGON EMPTY 3 POLYGON EMPTY 4 POLYGON EMPTY dtype: geometry
# Remove the empty geometries
land_use_muette = land_use_muette[~land_use_muette.is_empty]
# Print the first five rows of the intersection
land_use_muette.head()
135 POLYGON ((3752020.694 2891519.323, 3752025.310... 229 POLYGON ((3752443.986 2891171.823, 3752446.430... 239 POLYGON ((3752110.034 2891774.197, 3752112.444... 278 POLYGON ((3752000.000 2891530.298, 3752000.000... 279 POLYGON ((3751954.462 2891571.778, 3752000.000... dtype: geometry
# Plot the intersection
land_use_muette.plot(edgecolor='black')
<AxesSubplot:>
You can see in the plot that we now only have a subset of the full land use dataset. The original land_use_muette
(before removing the empty geometries) still has the same number of rows as the original land_use
, though. But many of the rows, as you could see by printing the first rows, consist now of empty polygons when it did not intersect with the Muette district.
The intersection()
method also returned only geometries. If we want to combine those intersections with the attributes of the original land use, we can take a copy of this and replace the geometries with the intersections (you can uncomment and run to see the code):
land_use_muette = land_use.copy()
land_use_muette['geometry'] = land_use.geometry.intersection(muette)
land_use_muette = land_use_muette[~land_use_muette.is_empty]
land_use_muette.head()
class | geometry | |
---|---|---|
135 | Green urban areas | POLYGON ((3752020.694 2891519.323, 3752025.310... |
229 | Sports and leisure facilities | POLYGON ((3752443.986 2891171.823, 3752446.430... |
239 | Water bodies | POLYGON ((3752110.034 2891774.197, 3752112.444... |
278 | Roads and associated land | POLYGON ((3752000.000 2891530.298, 3752000.000... |
279 | Roads and associated land | POLYGON ((3751954.462 2891571.778, 3752000.000... |
land_use_muette.plot(column="class") #edgecolor="black")
<AxesSubplot:>
land_use_muette.dissolve(by='class')
geometry | |
---|---|
class | |
Continuous Urban Fabric | MULTIPOLYGON (((3755334.091 2889457.833, 37553... |
Discontinuous Dense Urban Fabric | MULTIPOLYGON (((3755585.963 2889793.822, 37556... |
Green urban areas | MULTIPOLYGON (((3755772.518 2889995.038, 37558... |
Industrial, commercial, public | MULTIPOLYGON (((3755275.182 2889527.443, 37552... |
Railways and associated land | POLYGON ((3755654.921 2889540.054, 3755560.618... |
Roads and associated land | MULTIPOLYGON (((3754820.067 2889843.877, 37548... |
Sports and leisure facilities | MULTIPOLYGON (((3753932.354 2891267.190, 37539... |
Water bodies | MULTIPOLYGON (((3755507.459 2889412.262, 37555... |
EXERCISE: The land use of the Muette district
Based on the land_use_muette
dataframe with the land use for the Muette districts as calculated above, we can now determine the total area of the different land use classes in the Muette district.
intersection()
method of a GeoSeries.intersection()
method takes as argument the geometry for which to calculate the intersection.is_empty
attribute of a GeoSeries.land_use_muette['area'] = land_use_muette.geometry.area
# Total land use per class
land_use_muette.groupby("class")["area"].sum()
class Continuous Urban Fabric 1.275297e+06 Discontinuous Dense Urban Fabric 8.828861e+04 Green urban areas 2.624229e+06 Industrial, commercial, public 3.629901e+05 Railways and associated land 5.424307e+03 Roads and associated land 2.262708e+05 Sports and leisure facilities 6.039885e+05 Water bodies 2.921943e+05 Name: area, dtype: float64
# Relative percentage of land use classes
land_use_muette.groupby("class")["area"].sum() / land_use_muette.geometry.area.sum() * 100
class Continuous Urban Fabric 23.277436 Discontinuous Dense Urban Fabric 1.611493 Green urban areas 47.898906 Industrial, commercial, public 6.625501 Railways and associated land 0.099008 Roads and associated land 4.130023 Sports and leisure facilities 11.024339 Water bodies 5.333295 Name: area, dtype: float64
The above was only for a single district. If we want to do this more easily for all districts, we can do this with the overlay operation.
In a spatial join operation, we are not changing the geometries itself. We are not joining geometries, but joining attributes based on a spatial relationship between the geometries. This also means that the geometries need to at least overlap partially.
If you want to create new geometries based on joining (combining) geometries of different dataframes into one new dataframe (eg by taking the intersection of the geometries), you want an overlay operation.
With the intersection()
method introduced in the previous section, we could for example determine the intersection of a set of countries with another polygon, a circle in the example below:
However, this method (countries.intersection(circle)
) also has some limitations.
For cases where we require a bit more complexity, it is preferable to use the "overlay" operation, instead of the intersection method.
Consider the following simplified example. On the left we see again the 3 countries. On the right we have the plot of a GeoDataFrame with some simplified geologic regions for the same area:
By simply plotting them on top of each other, as shown below, you can see that the polygons of both layers intersect.
But now, by "overlaying" the two layers, we can create a third layer that contains the result of intersecting both layers: all the intersections of each country with each geologic region. It keeps only those areas that were included in both layers.
This operation is called an intersection overlay, and in GeoPandas we can perform this operation with the geopandas.overlay()
function.
Another code example:
africa = countries[countries['continent'] == 'Africa']
africa.plot()
<AxesSubplot:>
cities['geometry'] = cities.buffer(2)
<ipython-input-59-c1250b33ac0f>:1: UserWarning: Geometry is in a geographic CRS. Results from 'buffer' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation. cities['geometry'] = cities.buffer(2)
intersection = geopandas.overlay(africa, cities, how='intersection')
intersection.plot()
<AxesSubplot:>
intersection.head()
iso_a3 | name_1 | continent | pop_est | gdp_md_est | name_2 | geometry | |
---|---|---|---|---|---|---|---|
0 | AGO | Angola | Africa | 29310273.0 | 189000.0 | Brazzaville | POLYGON ((14.10056 -5.86749, 16.32653 -5.87747... |
1 | COD | Dem. Rep. Congo | Africa | 83301151.0 | 66010.0 | Brazzaville | POLYGON ((16.34948 -5.94680, 16.32653 -5.87747... |
2 | COG | Congo | Africa | 4954674.0 | 30270.0 | Brazzaville | POLYGON ((16.09773 -2.43295, 15.97280 -2.71239... |
3 | AGO | Angola | Africa | 29310273.0 | 189000.0 | Luanda | POLYGON ((13.69097 -10.78080, 13.68638 -10.731... |
4 | AGO | Angola | Africa | 29310273.0 | 189000.0 | Kinshasa | POLYGON ((14.03697 -5.86721, 16.32653 -5.87747... |
With the overlay method, we pass the full GeoDataFrame with all regions to intersect the countries with. The result contains all non-empty intersections of all combinations of countries and city regions.
Note that the result of the overlay function also keeps the attribute information of both the countries as the city regions. That can be very useful for further analysis.
geopandas.overlay(africa, cities, how='intersection').plot() # how="difference"/"union"/"symmetric_difference"
<AxesSubplot:>
EXERCISE: Overlaying spatial datasets I
We will now combine both datasets in an overlay operation. Create a new GeoDataFrame
consisting of the intersection of the land use polygons which each of the districts, but make sure to bring the attribute data from both source layers.
land_use
and districts
. Assign the result to a variable combined
.combined
).geopandas.overlay()
function.overlay()
functions takes first the two GeoDataFrames to combine, and a third how
keyword indicating how to combine the two layers.how='intersection'
.land_use = geopandas.read_file("zip://./data/paris_land_use.zip")
districts = geopandas.read_file("data/paris_districts.geojson").to_crs(land_use.crs)
# Overlay both datasets based on the intersection
combined = geopandas.overlay(land_use, districts, how='intersection')
# Print the first five rows of the result
combined.head()
class | id | district_name | population | geometry | |
---|---|---|---|---|---|
0 | Water bodies | 61 | Auteuil | 67967 | POLYGON ((3751395.345 2890118.001, 3751395.345... |
1 | Continuous Urban Fabric | 61 | Auteuil | 67967 | MULTIPOLYGON (((3753253.104 2888254.529, 37532... |
2 | Roads and associated land | 61 | Auteuil | 67967 | POLYGON ((3751519.830 2890061.509, 3751522.057... |
3 | Green urban areas | 61 | Auteuil | 67967 | MULTIPOLYGON (((3754314.717 2890283.121, 37543... |
4 | Roads and associated land | 61 | Auteuil | 67967 | POLYGON ((3751619.113 2890500.000, 3751626.627... |
EXERCISE: Overlaying spatial datasets II
Now that we created the overlay of the land use and districts datasets, we can more easily inspect the land use for the different districts. Let's get back to the example district of Muette, and inspect the land use of that district.
'area'
with the area of each polygon to the combined
GeoDataFrame.land_use_muette
where the 'district_name'
is equal to "Muette".land_use_muette
, using the 'class'
column to color the polygons.'class'
of land_use_muette
using the groupby()
method, and print the result.area
attribute of the geometry
of the GeoDataFrame.column
keyword.groupby()
method takes the column name on which you want to group as the first argument.sum()
of the area.# Add the area as a column
combined['area'] = combined.geometry.area
# Take a subset for the Muette district
land_use_muette = combined[combined['district_name'] == 'Muette']
# Visualize the land use of the Muette district
land_use_muette.plot(column='class')
<AxesSubplot:>
# Calculate the total area for each land use class
print(land_use_muette.groupby('class')['area'].sum() / 1000**2)
class Continuous Urban Fabric 1.275297 Discontinuous Dense Urban Fabric 0.088289 Green urban areas 2.624229 Industrial, commercial, public 0.362990 Railways and associated land 0.005424 Roads and associated land 0.226271 Sports and leisure facilities 0.603989 Water bodies 0.292194 Name: area, dtype: float64
EXERCISE: Overlaying spatial datasets III
Thanks to the result of the overlay operation, we can now more easily perform a similar analysis for all districts. Let's investigate the fraction of green urban area in each of the districts.
combined
dataset, calculate the total area per district using groupby()
.combined
and call this urban_green
.urban_green
subset, and call this urban_green_area
.districts_area = combined.groupby("district_name")["area"].sum()
districts_area.head()
district_name Archives 3.678060e+05 Arsenal 4.871228e+05 Arts-et-Metiers 3.181515e+05 Auteuil 6.386649e+06 Batignolles 1.350675e+06 Name: area, dtype: float64
urban_green = combined[combined["class"] == "Green urban areas"]
urban_green_area = urban_green.groupby("district_name")["area"].sum()
urban_green_area.head()
district_name Archives 4.409114e+03 Arsenal 2.801951e+04 Arts-et-Metiers 7.154903e+03 Auteuil 1.577429e+06 Batignolles 8.471799e+04 Name: area, dtype: float64
urban_green_fraction = urban_green_area / districts_area * 100
urban_green_fraction.nlargest()
district_name Muette 47.898906 Porte-Dauphine 42.266546 Odeon 33.044112 St-Germain-l'Auxerrois 31.534557 Auteuil 24.698847 Name: area, dtype: float64
urban_green_fraction.nsmallest()
district_name Chaussée-d'Antin 0.005752 Saint-Vincent-de-Paul 0.332773 Notre-Dame-des-Champs 0.593873 Salpêtrière 0.605310 Saint-Georges 0.654253 Name: area, dtype: float64
An alternative to calculate the area per land use class in each district:
combined.groupby(["district_name", "class"])["area"].sum().reset_index()
district_name | class | area | |
---|---|---|---|
0 | Archives | Continuous Urban Fabric | 323124.623717 |
1 | Archives | Green urban areas | 4409.113826 |
2 | Archives | Industrial, commercial, public | 33752.922651 |
3 | Archives | Roads and associated land | 6519.347840 |
4 | Arsenal | Continuous Urban Fabric | 274791.191887 |
... | ... | ... | ... |
352 | Villette | Roads and associated land | 33164.775004 |
353 | Villette | Water bodies | 3384.185687 |
354 | Vivienne | Continuous Urban Fabric | 190071.213662 |
355 | Vivienne | Industrial, commercial, public | 33180.126871 |
356 | Vivienne | Roads and associated land | 20339.790825 |
357 rows × 3 columns