In [1]:
%matplotlib inline

import pandas as pd
import geopandas
import matplotlib.pyplot as plt

Case study - Conflict mapping: mining sites in eastern DR Congo

In this case study, we will explore a dataset on artisanal mining sites located in eastern DR Congo.

Note: this tutorial is meant as a hands-on session, and most code examples are provided as exercises to be filled in. I highly recommend actually trying to do this yourself, but if you want to follow the solved tutorial, you can find this in the _solved directory.


Background

IPIS, the International Peace Information Service, manages a database on mining site visits in eastern DR Congo: http://ipisresearch.be/home/conflict-mapping/maps/open-data/

Since 2009, IPIS has visited artisanal mining sites in the region during various data collection campaigns. As part of these campaigns, surveyor teams visit mining sites in the field, meet with miners and complete predefined questionnaires. These contain questions about the mining site, the minerals mined at the site and the armed groups possibly present at the site.

Some additional links:

1. Importing and exploring the data

The mining site visit data

IPIS provides a WFS server to access the data. We can send a query to this server to download the data, and load the result into a geopandas GeoDataFrame:

In [2]:
import requests
import json

wfs_url = "http://geo.ipisresearch.be/geoserver/public/ows"
params = dict(service='WFS', version='1.0.0', request='GetFeature',
              typeName='public:cod_mines_curated_all_opendata_p_ipis', outputFormat='json')

r = requests.get(wfs_url, params=params)
data_features = json.loads(r.content.decode('UTF-8'))
data_visits = geopandas.GeoDataFrame.from_features(data_features, crs={'init': 'epsg:4326'})

However, the data is also provided in the tutorial materials as a GeoJSON file, so it is certainly available during the tutorial.

EXERCISE:
  • Read the GeoJSON file `data/cod_mines_curated_all_opendata_p_ipis.geojson` using geopandas, and call the result `data_visits`.
  • Inspect the first 5 rows, and check the number of observations
In [3]:
data_visits = geopandas.read_file("data/cod_mines_curated_all_opendata_p_ipis.geojson")
In [4]:
data_visits.head()
Out[4]:
id vid source project pcode name visit_date visit_onsite visit_onsite_novisitreason longitude ... digging_armed_group2 forced_labour_armed_group2 pillaging_armed_group2 state_service1 state_service2 state_service3 state_service4 itsci qualification geometry
0 cod_mines_curated_all_opendata_p_ipis.fid-11f0... 1 IPIS - Ministère des Mines IPIS - 2009 codmine00191 Eohe 2009-01-01Z 1 None 28.712580 ... NaN NaN NaN None None None None None None POINT (28.71258 0.33188)
1 cod_mines_curated_all_opendata_p_ipis.fid-11f0... 2 IPIS - Ministère des Mines IPIS - 2009 codmine00192 Eita 2009-01-01Z 1 None 28.699160 ... NaN NaN NaN None None None None None None POINT (28.69916 0.32153)
2 cod_mines_curated_all_opendata_p_ipis.fid-11f0... 3 IPIS - Ministère des Mines IPIS - 2009 codmine00242 Mungu Iko 2009-01-01Z 1 None 28.185142 ... NaN NaN NaN None None None None None None POINT (28.1851423 0.54499175)
3 cod_mines_curated_all_opendata_p_ipis.fid-11f0... 4 IPIS - Ministère des Mines IPIS - 2009 codmine00260 Kiviri/Tayna 2009-01-01Z 1 None 28.884528 ... NaN NaN NaN None None None None None None POINT (28.884528 -0.352529)
4 cod_mines_curated_all_opendata_p_ipis.fid-11f0... 5 IPIS - Ministère des Mines IPIS - 2009 codmine00272 Makanga 2009-01-01Z 1 None 28.903945 ... NaN NaN NaN None None None None None None POINT (28.903945 -0.036707)

5 rows × 62 columns

In [5]:
len(data_visits)
Out[5]:
3687

The provided dataset contains a lot of information, much more than we are going to use in this tutorial. Therefore, we will select a subset of the column:

In [6]:
data_visits = data_visits[['vid', 'project', 'visit_date', 'name', 'pcode', 'workers_numb', 'interference', 'armed_group1', 'mineral1', 'geometry']]
In [7]:
data_visits.head()
Out[7]:
vid project visit_date name pcode workers_numb interference armed_group1 mineral1 geometry
0 1 IPIS - 2009 2009-01-01Z Eohe codmine00191 300.0 NaN None Or POINT (28.71258 0.33188)
1 2 IPIS - 2009 2009-01-01Z Eita codmine00192 110.0 NaN None Or POINT (28.69916 0.32153)
2 3 IPIS - 2009 2009-01-01Z Mungu Iko codmine00242 NaN NaN FARDC Or POINT (28.1851423 0.54499175)
3 4 IPIS - 2009 2009-01-01Z Kiviri/Tayna codmine00260 NaN NaN FDLR Or POINT (28.884528 -0.352529)
4 5 IPIS - 2009 2009-01-01Z Makanga codmine00272 NaN NaN FDLR Or POINT (28.903945 -0.036707)

Before starting the actual geospatial tutorial, we will use some more advanced pandas queries to construct a subset of the data that we will use further on:

In [8]:
# Take only the data of visits by IPIS
data_ipis = data_visits[data_visits['project'].str.contains('IPIS') & (data_visits['workers_numb'] > 0)]
In [9]:
# For those mining sites that were visited multiple times, take only the last visit
data_ipis_lastvisit = data_ipis.sort_values('visit_date').groupby('pcode', as_index=False).last()
data = geopandas.GeoDataFrame(data_ipis_lastvisit, crs=data_visits.crs)

Data on protected areas in the same region

Next to the mining site data, we are also going to use a dataset on protected areas (national parks) in Congo. This dataset was downloaded from http://www.wri.org/our-work/project/congo-basin-forests/democratic-republic-congo#project-tabs and included in the tutorial repository: data/cod_conservation.zip.

EXERCISE:
  • Extract the `data/cod_conservation.zip` archive, and read the shapefile contained in it. Assign the resulting GeoDataFrame to a variable named `protected_areas`.
  • Quickly plot the GeoDataFrame.
In [10]:
protected_areas = geopandas.read_file("data/Conservation/RDC_aire_protegee_2013.shp")
# or to read it directly from the zip file:
# protected_areas = geopandas.read_file("/Conservation", vfs="zip://./data/cod_conservation.zip")
In [11]:
protected_areas.plot()
Out[11]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fd7294d8128>

Conversion to a common Coordinate Reference System

We will see that both datasets use a different Coordinate Reference System (CRS). For many operations, however, it is important that we use a consistent CRS, and therefore we will convert both to a commong CRS.

But first, we explore problems we can encounter related to CRSs.


Goma is the capital city of North Kivu province of Congo, close to the border with Rwanda. It's coordinates are 1.66°S 29.22°E.

EXERCISE:
  • Create a single Point object representing the location of Goma. Call this `goma` (watch out for the order to pass the latitude and longitude!)
  • Calculate the distances of all mines to Goma, and show the 5 smallest distances (mines closest to Goma).
In [12]:
from shapely.geometry import Point
In [13]:
goma = Point(29.22, -1.66)
In [14]:
dist_goma = data.distance(goma)
In [15]:
dist_goma.nsmallest(5)
Out[15]:
301     0.226353
1868    0.235082
1865    0.252101
1067    0.263424
1538    0.264856
dtype: float64

The distances we see here in degrees, which is not helpful for interpreting those distances. That is a reason we will convert the data to another coordinate reference system (CRS) for the remainder of this tutorial.

EXERCISE:
  • Make a visualization of the national parks and the mining sites on a single plot.

Check the first section of the [04-more-on-visualization.ipynb](04-more-on-visualization.ipynb) notebook for tips and tricks to plot with GeoPandas.

In [16]:
ax = protected_areas.plot()
data.plot(ax=ax, color='C1')
Out[16]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fd729500c18>

You will notice that the protected areas and mining sites do not map to the same area on the plot. This is because the Coordinate Reference Systems (CRS) differ for both datasets. Another reason we will need to convert the CRS!

Let's check the Coordinate Reference System (CRS) for both datasets.

The mining sites data uses the WGS 84 lat/lon (EPSG 4326) CRS:

In [17]:
data.crs
Out[17]:
{'init': 'epsg:4326'}

The protected areas dataset, on the other hand, uses a WGS 84 / World Mercator (EPSG 3395) projection (with meters as unit):

In [18]:
protected_areas.crs
Out[18]:
{'datum': 'WGS84',
 'lat_ts': 5,
 'lon_0': 0,
 'no_defs': True,
 'proj': 'merc',
 'units': 'm',
 'x_0': 0,
 'y_0': 0}

We will convert both datasets to a local UTM zone, so we can plot them together and that distance-based calculations give sensible results.

To find the appropriate UTM zone, you can check http://www.dmap.co.uk/utmworld.htm or https://www.latlong.net/lat-long-utm.html, and in this case we will use UTM zone 35, which gives use EPSG 32735: https://epsg.io/32735

EXERCISE:
  • Convert both datasets (`data` and `protected_areas`) to EPSG 32735. Name the results `data_utm` and `protected_areas_utm`.
  • Try again to visualize both datasets on a single map.
In [19]:
data_utm = data.to_crs(epsg=32735)
protected_areas_utm = protected_areas.to_crs(epsg=32735)
In [20]:
ax = protected_areas_utm.plot()
data_utm.plot(ax=ax, color='C1')
Out[20]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fd7292c4898>

More advanced visualizations

For the following exercises, check the first section of the [04-more-on-visualization.ipynb](04-more-on-visualization.ipynb) notebook for tips and tricks to plot with GeoPandas.

EXERCISE:
  • Make a visualization of the national parks and the mining sites on a single plot.
  • Pay attention to the following details:
    • Make the figure a bit bigger.
    • The protected areas should be plotted in green
    • For plotting the mining sites, adjust the markersize and use an `alpha=0.5`.
    • Remove the figure border and x and y labels (coordinates)
In [21]:
ax = protected_areas_utm.plot(figsize=(10, 10), color='green')
data_utm.plot(ax=ax, markersize=5, alpha=0.5)
ax.set_axis_off()
In [22]:
# alternative with constructing the matplotlib figure first
fig, ax = plt.subplots(figsize=(10, 10), subplot_kw=dict(aspect='equal'))
protected_areas_utm.plot(ax=ax, color='green')
data_utm.plot(ax=ax, markersize=5, alpha=0.5)
ax.set_axis_off()
EXERCISE: In addition to the previous figure:
  • Give the mining sites a distinct color based on the `'interference'` column, indicating whether an armed group is present at the mining site or not.
In [23]:
ax = protected_areas_utm.plot(figsize=(10, 10), color='green')
data_utm.plot(ax=ax, markersize=5, alpha=0.5, column='interference')
ax.set_axis_off()
EXERCISE: In addition to the previous figure:
  • Give the mining sites a distinct color based on the `'mineral1'` column, indicating which mineral is the primary mined mineral.
In [24]:
ax = protected_areas_utm.plot(figsize=(10, 10), color='green')
data_utm.plot(ax=ax, markersize=5, alpha=0.5, column='mineral1', legend=True)
ax.set_axis_off()

2. Spatial operations

EXERCISE:
  • Access the geometry of the "Kahuzi-Biega National park".
  • Filter the mining sites to select those that are located in this national park.
In [25]:
kahuzi = protected_areas_utm[protected_areas_utm['NAME_AP'] == "Kahuzi-Biega National park"].geometry.squeeze()
In [26]:
mines_kahuzi = data_utm[data_utm.within(kahuzi)]
mines_kahuzi
Out[26]:
pcode vid project visit_date name workers_numb interference armed_group1 mineral1 geometry
661 codmine00680 1032 IPIS - PROMINES MoFA 2013-2014 2013-08-28Z Ibozia/Kalumé 80.0 1.0 Raïa Mutomboki Cassitérite POINT (567832.7086093378 9759143.339360647)
662 codmine00681 1025 IPIS - PROMINES MoFA 2013-2014 2013-08-26Z Matamba 150.0 1.0 Raïa Mutomboki Cassitérite POINT (598323.5389475008 9758688.142411157)
663 codmine00682 1031 IPIS - PROMINES MoFA 2013-2014 2013-08-27Z Mutete/Mukina 170.0 1.0 Raïa Mutomboki Cassitérite POINT (570733.4369126211 9761871.114227083)
664 codmine00683 1033 IPIS - PROMINES MoFA 2013-2014 2013-08-28Z Mutete 100.0 1.0 Raïa Mutomboki Cassitérite POINT (569881.0930415759 9762219.110778008)
760 codmine00779 1603 IPIS - PROMINES MoFA 2013-2014 2014-02-25Z Mazankala 120.0 1.0 Raïa Mutomboki Cassitérite POINT (613075.5326777868 9722956.979837928)
813 codmine00833 2439 IPIS - IOM PROMINES 2015 2015-07-28Z Kitendebwa 50.0 0.0 FARDC Or POINT (693078.9282059025 9770107.517721133)
871 codmine00893 1226 IPIS - PROMINES MoFA 2013-2014 2013-09-28Z Sebwa-Lukoma 130.0 1.0 Raïa Mutomboki Cassitérite POINT (660406.3452248175 9715261.717041001)
872 codmine00894 1305 IPIS - PROMINES MoFA 2013-2014 2013-10-30Z Rwamakaza 160.0 1.0 Raïa Mutomboki Cassitérite POINT (661266.834456568 9716072.198784607)
1486 codmine01764 180 IPIS - 2009 2009-01-01Z Mugaba I 50.0 NaN NaN Or POINT (685167.3714990132 9744069.967416598)
1487 codmine01765 181 IPIS - 2009 2009-01-01Z Mugaba Ouest 46.0 NaN NaN Or POINT (683156.6865782175 9746324.416321497)
1681 codmine01997 2476 IPIS - IOM PROMINES 2015 2015-08-02Z Nguba(Nkuba) kamisoke 122.0 1.0 Raïa Mutomboki Cassitérite POINT (622151.3489110788 9808363.111073116)
In [27]:
len(mines_kahuzi)
Out[27]:
11

Applying custom spatial functions

Shapely and GeoPandas provide a lot functionality out of the box, because sometimes they will not exactly fit your analysis needs. However, we can still easily use them as building blocks for a custom spatial operation, and apply this to all our geometries.

EXERCISE: Determine for each mining site the "closest" protected area:
  • PART 1 - do this for a single mining site:
    • Get a single mining site, e.g. the first of the dataset.
    • Calculate the distance (in km's) to all protected areas for this mining site
    • Get the index of the minimum distance (tip: `idxmin()`) and get the name of the protected are corresponding to this index.
  • PART 2 - apply this procedure on each geometry:
    • Write the above procedure as a function that gets a single site and the protected areas dataframe as input and returns the name of the closest protected area as output.
    • Apply this function to all sites using the `.apply()` method on `data_utm.geometry`.
In [28]:
single_mine = data_utm.geometry[0]
In [29]:
dist = protected_areas_utm.distance(single_mine)
In [30]:
idx = dist.idxmin()
closest_area = protected_areas_utm.loc[idx, 'NAME_AP']
closest_area
Out[30]:
'Virunga National park'
In [31]:
def closest_protected_area(mine, protected_areas):
    dist = protected_areas.distance(mine)
    idx = dist.idxmin()
    closest_area = protected_areas.loc[idx, 'NAME_AP']
    return closest_area
In [32]:
result = data_utm.geometry.apply(lambda site: closest_protected_area(site, protected_areas_utm))

3. Using spatial join to determine mining sites in the protected areas

Based on the analysis and visualizations above, we can already see that there are mining sites inside the protected areas. Let's now do an actual spatial join to determine which sites are within the protected areas.

Mining sites in protected areas

EXERCISE:
  • Add information about the protected areas to the mining sites dataset, using a spatial join:
    • Call the result `data_within_protected`
    • If the result is empty, this is an indication that the coordinate reference system is not matching. Make sure to re-project the data (see above).
  • How many mining sites are located within a national park?
  • Count the number of mining sites per national park (pandas tip: check `value_counts()`)
In [33]:
data_within_protected = geopandas.sjoin(data_utm, protected_areas_utm[['NAME_AP', 'geometry']],
                                        op='within', how='inner')
In [34]:
len(data_within_protected)
Out[34]:
64
In [35]:
data_within_protected['NAME_AP'].value_counts()
# or data_within_protected.groupby('NAME_AP').size()
Out[35]:
Itombwe Nature Reserve          21
Luama-Katanga Hunting Domain    14
Kahuzi-Biega National park      11
Luama-Kivu Hunting Domain        9
Okapi Faunal Reserve             5
Maiko National park              3
Tayna Nature Reserve             1
Name: NAME_AP, dtype: int64
In [36]:
data_within_protected.groupby('NAME_AP')['workers_numb'].sum()
Out[36]:
NAME_AP
Itombwe Nature Reserve          2987.0
Kahuzi-Biega National park      1178.0
Luama-Katanga Hunting Domain     930.0
Luama-Kivu Hunting Domain       1057.0
Maiko National park              493.0
Okapi Faunal Reserve             997.0
Tayna Nature Reserve             244.0
Name: workers_numb, dtype: float64

Mining sites in the borders of protected areas

And what about the borders of the protected areas? (just outside the park)

EXERCISE:
  • Create a new dataset, `protected_areas_borders`, that contains the border area (10 km wide) of each protected area:
    • Tip: one way of doing this is with the `buffer` and `difference` function.
    • Plot the resulting borders as a visual check of correctness.
  • Count the number of mining sites per national park that are located within its borders
In [37]:
protected_areas_border = protected_areas_utm[['NAME_AP', 'geometry']].copy()
In [38]:
protected_areas_border['geometry'] = protected_areas_border.buffer(10000).difference(protected_areas_utm.unary_union)
In [39]:
protected_areas_border.plot()
Out[39]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fd728f47518>
In [40]:
data_within_border = geopandas.sjoin(data_utm, protected_areas_border,
                                     op='within', how='inner')
In [41]:
data_within_border['NAME_AP'].value_counts()
Out[41]:
Kahuzi-Biega National park      99
Okapi Faunal Reserve            50
Maiko National park             33
Itombwe Nature Reserve          32
Luama-Kivu Hunting Domain       23
Kisimba Ikobo Nature Reserve    21
Tayna Nature Reserve            11
Luama-Katanga Hunting Domain     9
Upemba National park             4
Virunga National park            3
Mulumbu Hunting Domain           1
Name: NAME_AP, dtype: int64

4. Extracting information from raster layer

This section shows an example how to extract information from a raster layer for each of the vector features. In this case, we will extract the value of a vegetation map for each mining site. The data can be downloaded from http://www.wri.org/our-work/project/congo-basin-forests/democratic-republic-congo#project-tabs (scroll a bit down, and in the right sidebar you can download the "Vegetation" data).

We use the rasterstats package to do this:

In [42]:
from rasterstats import point_query
In [43]:
vegetation_raster = "data/Vegetation/Carte_vgt_ucl_10/carte_foraf/central_africa_vegetation_map_foraf.tif"
In [44]:
data['vegetation'] = point_query(data.geometry, vegetation_raster, interpolate='nearest')
/home/joris/miniconda3/lib/python3.5/site-packages/rasterstats/point.py:167: FutureWarning: The value of this property will change in version 1.0. Please see https://github.com/mapbox/rasterio/issues/86 for details.
  with Raster(raster, nodata=nodata, affine=affine, band=band) as rast:
/home/joris/miniconda3/lib/python3.5/site-packages/rasterstats/io.py:242: FutureWarning: GDAL-style transforms are deprecated and will not be supported in Rasterio 1.0.
  self.affine = guard_transform(self.src.transform)
/home/joris/miniconda3/lib/python3.5/site-packages/rasterstats/io.py:294: UserWarning: Setting nodata to -999; specify nodata explicitly
  warnings.warn("Setting nodata to -999; specify nodata explicitly")

The legend available as an excel file:

In [45]:
legend = pd.read_excel("data/Vegetation/Carte_vgt_ucl_10/carte_foraf/legend_central_africa_vegetation_map_foraf.xls")
In [46]:
data['vegetation'] = data['vegetation'].replace(legend['Final Label (sur le tif)'])
In [47]:
data['vegetation'].tail(10)
Out[47]:
2141    Mosaic cultivated areas / vegetation (herbaceo...
2142    Mosaic cultivated areas / vegetation (herbaceo...
2143            Rural complex and Young secondary forest 
2144                      Savanna woodland - Tree savanna
2145            Rural complex and Young secondary forest 
2146            Rural complex and Young secondary forest 
2147                                    Submontane forest
2148            Rural complex and Young secondary forest 
2149            Rural complex and Young secondary forest 
2150            Rural complex and Young secondary forest 
Name: vegetation, dtype: object
In [48]:
data.plot(column='vegetation', figsize=(15, 15), legend=True)
Out[48]:
<matplotlib.axes._subplots.AxesSubplot at 0x7faebd571320>