Inspired by Mike Bostock's Let's Make a Map, we want to make a map too using matta. We will display the communes of Santiago, Chile. To do that we will perform the following steps:
from __future__ import print_function, unicode_literals
import matta
import json
import unicodedata
# we do this to load the required libraries when viewing on NBViewer
matta.init_javascript(path='https://rawgit.com/carnby/matta/master/matta/libs')
/home/egraells/Dropbox/phd/apps/matta/matta/visualizations/cartography/template.css /home/egraells/Dropbox/phd/apps/matta/matta/visualizations/parsets/template.css
Note We delete data to start from 0
!rm -fr data
!mkdir data
!wget http://siit2.bcn.cl/obtienearchivo?id=repositorio/10221/10396/1/division_comunal.zip -O data/division_comunal.zip
--2015-12-20 22:18:18-- http://siit2.bcn.cl/obtienearchivo?id=repositorio/10221/10396/1/division_comunal.zip Resolviendo siit2.bcn.cl (siit2.bcn.cl)... 200.0.66.71 Conectando con siit2.bcn.cl (siit2.bcn.cl)[200.0.66.71]:80... conectado. Petición HTTP enviada, esperando respuesta... 200 OK Longitud: 29000232 (28M) [application/zip] Grabando a: “data/division_comunal.zip” 100%[======================================>] 29.000.232 1,25MB/s en 22s 2015-12-20 22:18:47 (1,23 MB/s) - “data/division_comunal.zip” guardado [29000232/29000232]
!unzip data/division_comunal.zip -d data/
Archive: data/division_comunal.zip inflating: data/Disclaimer.txt inflating: data/division_comunal.dbf inflating: data/division_comunal.prj inflating: data/division_comunal.sbn inflating: data/division_comunal.sbx inflating: data/division_comunal.shp inflating: data/division_comunal.shp.xml inflating: data/division_comunal.shx
You can use ogrinfo
to see the structure of the source shapefile.
!ogrinfo data/division_comunal.shp 'division_comunal' | head -n 30
INFO: Open of `data/division_comunal.shp' using driver `ESRI Shapefile' successful. Layer name: division_comunal Geometry: Polygon Feature Count: 346 Extent: (-3701712.293900, 3794823.357600) - (704690.560200, 8065196.816300) Layer SRS WKT: PROJCS["WGS_1984_UTM_Zone_19S", GEOGCS["GCS_WGS_1984", DATUM["WGS_1984", SPHEROID["WGS_84",6378137.0,298.257223563]], PRIMEM["Greenwich",0.0], UNIT["Degree",0.0174532925199433]], PROJECTION["Transverse_Mercator"], PARAMETER["False_Easting",500000.0], PARAMETER["False_Northing",10000000.0], PARAMETER["Central_Meridian",-69.0], PARAMETER["Scale_Factor",0.9996], PARAMETER["Latitude_Of_Origin",0.0], UNIT["Meter",1.0]] NOM_REG: String (50.0) NOM_PROV: String (20.0) NOM_COM: String (30.0) SHAPE_LENG: Real (19.11) DIS_ELEC: Integer (4.0) CIR_SENA: Integer (4.0) COD_COMUNA: Integer (4.0) SHAPE_Le_1: Real (19.11) SHAPE_Area: Real (19.11)
Now we use ogr2ogr to convert the shapefile into GeoJSON. We will build a file for Santiago (its capital) only.
Notes:
santiago-comunas-geo.json
in case it exists (that is, when we re-run the notebook :) ).-clipdst
option to specify a bounding box obtained in this site. We need it because some populated locations are much bigger in administrative terms than in population terms.-t_srs EPSG:4326
option to convert the data coordinates to (longitude,latitude) pairs.!ogr2ogr -where "NOM_PROV IN ('Santiago', 'Maipo', 'Cordillera', 'Chacabuco', 'Talagante')" -f GeoJSON \
-clipdst -70.828155 -33.635036 -70.452573 -33.302953 -t_srs EPSG:4326 \
data/santiago-comunas-geo.json data/division_comunal.shp
!topojson -p --id-property NOM_COM -s 0 -o data/santiago-comunas-topo.json data/santiago-comunas-geo.json
bounds: -70.828155 -33.635036 -70.452573 -33.302953 (spherical) pre-quantization: 0.0418m (3.76e-7°) 0.0369m (3.32e-7°) topology: 274 arcs, 5195 points post-quantization: 4.18m (0.0000376°) 3.69m (0.0000332°) prune: retained 274 / 274 arcs (100%)
!ls -lh data/*.json
-rw-rw-r-- 1 egraells egraells 449K dic 20 22:19 data/santiago-comunas-geo.json -rw-rw-r-- 1 egraells egraells 55K dic 20 22:19 data/santiago-comunas-topo.json
def strip_accents(s):
return ''.join(c for c in unicodedata.normalize('NFD', s) if unicodedata.category(c) != 'Mn')
with open('data/santiago-comunas-topo.json', 'r') as f:
stgo = json.load(f)
for g in stgo['objects']['santiago-comunas-geo']['geometries']:
g['id'] = strip_accents(g['id'].title())
g['properties']['id'] = g['id']
Here is an example feature:
stgo['objects']['santiago-comunas-geo']['geometries'][0]
{u'arcs': [[0, 1, 2, 3, 4, 5, 6, 7]], u'id': u'Independencia', u'properties': {u'CIR_SENA': 7, u'COD_COMUNA': 1310, u'DIS_ELEC': 19, u'NOM_COM': u'Independencia', u'NOM_PROV': u'Santiago', u'NOM_REG': u'Regi\xf3n Metropolitana de Santiago', u'SHAPE_Area': 7514745.51089, u'SHAPE_LENG': 11488.6957475, u'SHAPE_Le_1': 11718.6870863, u'id': u'Independencia'}, u'type': u'Polygon'}
matta.cartography(geometry=stgo, leaflet=False, label='NOM_COM',
fill_color={'value': 'id', 'palette': str('husl'), 'scale': 'ordinal', 'n_colors': 40})
Now that we can load a map and giving some colors to areas, let's try to visualize some data. The easiest way to do so would be to get a Wikipedia page with data for Santiago. From it, we will get the Human Development Index (IDH in Spanish) score for each municipality of the city.
import pandas as pd
df = pd.read_html('https://es.wikipedia.org/wiki/Anexo:Comunas_de_Santiago_de_Chile',
attrs={'class': 'sortable'}, header=0)[0]
df.head()
Comuna | Ubicación? | Población (2002)? | Viviendas (2002)? | Densidad poblacional? | Crecimiento demográfico (1992-2002)? | IDH (2003)? | Pobreza (2006)? | |
---|---|---|---|---|---|---|---|---|
0 | Cerrillos | surponiente | 71.906 | 19.811 | 4.32908 | -10 | 0,743 (54) | 83 |
1 | Cerro Navia | norponiente | 148.312 | 35.277 | 13.48291 | -48 | 0,683 (165) | 175 |
2 | Conchalí | norte | 133.256 | 32.609 | 12.07029 | -129 | 0,707 (118) | 80 |
3 | El Bosque | sur | 175.594 | 42.808 | 12.27072 | 16 | 0,711 (106) | 158 |
4 | Estación Central | surponiente | 130.394 | 32.357 | 9.03631 | -75 | 0,735 (60) | 73 |
Data is not clean due to locale settings. Fortunately, we just want the HDI column, which should be easy to convert to a meaningful float. Let's fix the HDI and Population columns.
df['Comuna'] = [strip_accents(c).replace('?', '').title() for c in df['Comuna']]
df['HDI'] = [float(c.split()[0].replace(',', '.')) for c in df['IDH (2003)?']]
del df['IDH (2003)?']
df.head()
Comuna | Ubicación? | Población (2002)? | Viviendas (2002)? | Densidad poblacional? | Crecimiento demográfico (1992-2002)? | Pobreza (2006)? | HDI | |
---|---|---|---|---|---|---|---|---|
0 | Cerrillos | surponiente | 71.906 | 19.811 | 4.32908 | -10 | 83 | 0.743 |
1 | Cerro Navia | norponiente | 148.312 | 35.277 | 13.48291 | -48 | 175 | 0.683 |
2 | Conchali | norte | 133.256 | 32.609 | 12.07029 | -129 | 80 | 0.707 |
3 | El Bosque | sur | 175.594 | 42.808 | 12.27072 | 16 | 158 | 0.711 |
4 | Estacion Central | surponiente | 130.394 | 32.357 | 9.03631 | -75 | 73 | 0.735 |
df['Population'] = df['Población (2002)?'] * 1000
df = df.loc[:,('Comuna', 'HDI', 'Population')]
df.head()
Comuna | HDI | Population | |
---|---|---|---|
0 | Cerrillos | 0.743 | 71906 |
1 | Cerro Navia | 0.683 | 148312 |
2 | Conchali | 0.707 | 133256 |
3 | El Bosque | 0.711 | 175594 |
4 | Estacion Central | 0.735 | 130394 |
Much better!
df.describe()
HDI | Population | |
---|---|---|
count | 37.000000 | 37.000000 |
mean | 0.762865 | 146718.648649 |
std | 0.076155 | 106077.982722 |
min | 0.657000 | 2477.000000 |
25% | 0.709000 | 85118.000000 |
50% | 0.737000 | 120874.000000 |
75% | 0.804000 | 175594.000000 |
max | 0.949000 | 492603.000000 |
Now let's visualize HDI for each municipality.
matta.cartography(geometry=stgo, label='NOM_COM', width=700,
area_dataframe=df, area_feature_name='Comuna',
area_color={'value': 'HDI', 'scale': 'threshold', 'palette': 'coolwarm', 'n_colors': 7, 'legend': True},
leaflet=False)
What happened there? We bound a DataFrame (area_dataframe
) to the geometry by using the area_feature_name
column in the DataFrame. Using this, our code was able to match a feature from the TopoJSON structure with a row from the DataFrame. Note that some areas are not drawn - they were not available in the DataFrame.
We can also visualize this information using a cartogram. According to Wikipedia:
A cartogram is a map in which some thematic mapping variable – such as travel time, population, or Gross National Product – is substituted for land area or distance. The geometry or space of the map is distorted in order to convey the information of this alternate variable.
For instance, we observe that the areas that have a high HDI are, indeed, big. But how much people inhabit them?
matta.cartogram(geometry=stgo, width=700,
area_dataframe=df, area_feature_name='Comuna', area_opacity=1.0,
area_color={'value': 'HDI', 'scale': 'threshold', 'palette': 'coolwarm', 'n_colors': 7},
area_value='Population', na_value=df['Population'].min() * 0.75)
As we can see, the shape of each area is different now! The differences are not very dramatic, but this is just an example.