This notebook illustrates the blogpost (in French) https://www.smalsresearch.be/la-jointure-spatiale-la-cle-de-lanalytique-geographique .

In this blogpost, we explain the concept of spatial join, and use it to highlight incoherences between several open dataset about Belgian geography

In [1]:
import os

import geopandas as gpd
import pandas as pd
import numpy as np
import shapely
import matplotlib as mpl
import matplotlib.pyplot as plt
import geopy
import urllib
from shapely.geometry import shape
import contextily as ctx


with_plotly = False # Preferably to True, but in case plotly cannot be installer, this allows the notebook to work
if with_plotly:
    import plotly.express as px
    import plotly.graph_objects as go
In [2]:
crs =     'epsg:3857'
osm_crs=  'epsg:4326'

Functions

In [3]:
def download_if_nexist(url, filename):
    """
    If the (local) file <filename> does not exists, download it from <url> 

    Parameters
    ----------
    url: str
       url to fecth
    filename: str
       local file to save

    Returns
    -------

    None
    """
    if not os.path.isfile(filename):
        with urllib.request.urlopen(url) as response:
            with open(filename, "wb") as f:
                f.write(response.read())
In [4]:
def get_osm_polyg(name):
    """
    Request Nominatim (OpenStreetMap) for "name", and returns the polygon (or point) given in the (first) result
    
    Parameters
    ----------
    name: str
        
    """
    url = f"http://nominatim.openstreetmap.org/search.php?q={name}&format=json&limit=1&polygon_geojson=1"
    with urllib.request.urlopen(url) as response:
        json_res = json.loads(response.read())
        if len(json_res)==0:
            return None
        return shape(json_res[0]["geojson"])

Load data

Zip code list

Get the official list of Belgian zipcodes, provided by BPost. Some of them correspond to real (sub)-localities, other are "special codes", corresponding to some institutions (Parliaments, ...) or companies (RTBF, 3 Suisses, ...)

This source does not contain any geographical informations

If several entities share the same zipcode, this zipcode appears in several records

In [5]:
download_if_nexist("https://www.bpost2.be/zipcodes/files/zipcodes_alpha_fr_new.xls",
                  "data/zipcodes_alpha_fr_new.xls")

zipcodes = pd.read_excel("data/zipcodes_alpha_fr_new.xls")
zipcodes = zipcodes.rename({"Code postal":"zipcode", "Commune principale": "main_commune", "Localité":"locality"}, axis=1)
zipcodes["zipcode"] = zipcodes["zipcode"].astype(str)
zipcodes["is_special"] = zipcodes["Sous-commune"].isnull()
print(zipcodes.zipcode.nunique(), "unique zipcodes")
zipcodes
1190 unique zipcodes
Out[5]:
zipcode locality Sous-commune main_commune Province is_special
0 2970 'S Gravenwezel Oui SCHILDE ANVERS False
1 3700 'S Herenelderen Oui TONGEREN LIMBOURG False
2 7510 3 Suisses NaN 3 Suisses NaN True
3 9420 Aaigem Oui ERPE-MERE FLANDRE-ORIENTALE False
4 8511 Aalbeke Oui KORTRIJK FLANDRE-OCCIDENTALE False
... ... ... ... ... ... ...
2760 3690 Zutendaal Non ZUTENDAAL LIMBOURG False
2761 8550 Zwevegem Non ZWEVEGEM FLANDRE-OCCIDENTALE False
2762 8750 Zwevezele Oui WINGENE FLANDRE-OCCIDENTALE False
2763 9052 Zwijnaarde Oui GENT FLANDRE-ORIENTALE False
2764 2070 Zwijndrecht Non ZWIJNDRECHT ANVERS False

2765 rows × 6 columns

In [6]:
zipcodes_normal = zipcodes[~zipcodes.is_special].drop("is_special", axis=1)

Some stats

In [7]:
print("Number of zipcodes:", zipcodes.zipcode.nunique())
print("Number of (not special) zipcodes :", zipcodes_normal.zipcode.nunique())
print("Number of (main) communes:", (zipcodes_normal.main_commune+"_"+zipcodes_normal.Province).nunique())
Number of zipcodes: 1190
Number of (not special) zipcodes : 1146
Number of (main) communes: 581
In [8]:
commune_stats =zipcodes_normal.groupby("main_commune").agg({"zipcode":"nunique", 
                                                            "locality":"count", 
                                                            "Province": max})
commune_stats
Out[8]:
zipcode locality Province
main_commune
AALST 4 9 FLANDRE-ORIENTALE
AALTER 3 6 FLANDRE-ORIENTALE
AARSCHOT 3 4 BRABANT FLAMAND
AARTSELAAR 1 1 ANVERS
AFFLIGEM 1 4 BRABANT FLAMAND
... ... ... ...
ZULTE 1 3 FLANDRE-ORIENTALE
ZUTENDAAL 1 1 LIMBOURG
ZWALM 2 12 FLANDRE-ORIENTALE
ZWEVEGEM 5 5 FLANDRE-OCCIDENTALE
ZWIJNDRECHT 1 2 ANVERS

581 rows × 3 columns

In [9]:
division_counts = pd.DataFrame(columns = ["#zipcodes", "#localities", "desc", "query"],
             data=[
 ["=1", "=1",   "Only one locality within the commune",         "(zipcode==1) & (locality == 1)"],
 ["=1", ">1",   "Zipcode share by all the localities",          "(zipcode==1) & (locality>1)"],
 [">1", "=zip", "A distinct zipcode for each locality",         "(zipcode >1) & (zipcode==locality)"],
 [">1", ">zip", "Some localities share zipcodes",               "(zipcode >1) & (zipcode<locality)"],
# [">1", "<zip", "(more zip than localities, does not happen)",  "(zipcode >1) & (zipcode>locality)"] 
])
In [10]:
division_counts["BEL"] = division_counts["query"].apply(lambda q: commune_stats.query(q).locality.count())
division_counts.drop("query", axis=1)
Out[10]:
#zipcodes #localities desc BEL
0 =1 =1 Only one locality within the commune 98
1 =1 >1 Zipcode share by all the localities 261
2 >1 =zip A distinct zipcode for each locality 53
3 >1 >zip Some localities share zipcodes 169
In [11]:
# Stats per province
prov_counts = division_counts["query"].apply(lambda q: commune_stats.query(q).groupby("Province").locality.count()).fillna(0)
division_counts_prov = pd.concat([division_counts.drop("query", axis=1), prov_counts], axis=1)
division_counts_prov
Out[11]:
#zipcodes #localities desc BEL ANVERS BRABANT FLAMAND BRABANT WALLON BRUXELLES FLANDRE-OCCIDENTALE FLANDRE-ORIENTALE HAINAUT LIEGE LIMBOURG LUXEMBOURG NAMUR
0 =1 =1 Only one locality within the commune 98 33.0 11.0 3.0 18.0 11.0 7.0 2.0 4.0 8.0 1.0 0.0
1 =1 >1 Zipcode share by all the localities 261 24.0 21.0 17.0 0.0 37.0 30.0 27.0 42.0 18.0 22.0 23.0
2 >1 =zip A distinct zipcode for each locality 53 6.0 11.0 3.0 1.0 4.0 5.0 5.0 11.0 5.0 2.0 0.0
3 >1 >zip Some localities share zipcodes 169 6.0 22.0 4.0 0.0 12.0 18.0 35.0 27.0 11.0 19.0 15.0
In [12]:
division_counts_prov.set_index("desc").drop(["BEL", "#zipcodes", "#localities"], axis=1).T.plot.bar(figsize=(15,6))
Out[12]:
<AxesSubplot:>
In [13]:
print("Communes with a locality having the same name")
display(zipcodes_normal[(zipcodes_normal.locality.str.upper() == zipcodes_normal.main_commune)])
print("Number of (unique) communes:", zipcodes_normal[(zipcodes_normal.locality.str.upper() == zipcodes_normal.main_commune)].main_commune.nunique())
Communes with a locality having the same name
zipcode locality Sous-commune main_commune Province
6 9300 Aalst Non AALST FLANDRE-ORIENTALE
7 9880 Aalter Non AALTER FLANDRE-ORIENTALE
8 3200 Aarschot Non AARSCHOT BRABANT FLAMAND
11 2630 Aartselaar Non AARTSELAAR ANVERS
22 1790 Affligem Non AFFLIGEM BRABANT FLAMAND
... ... ... ... ... ...
2757 8377 Zuienkerke Non ZUIENKERKE FLANDRE-OCCIDENTALE
2758 9870 Zulte Non ZULTE FLANDRE-ORIENTALE
2760 3690 Zutendaal Non ZUTENDAAL LIMBOURG
2761 8550 Zwevegem Non ZWEVEGEM FLANDRE-OCCIDENTALE
2764 2070 Zwijndrecht Non ZWIJNDRECHT ANVERS

528 rows × 5 columns

Number of (unique) communes: 521
In [14]:
zipcodes_normal[zipcodes_normal["Sous-commune"] == "Non"]
Out[14]:
zipcode locality Sous-commune main_commune Province
6 9300 Aalst Non AALST FLANDRE-ORIENTALE
7 9880 Aalter Non AALTER FLANDRE-ORIENTALE
8 3200 Aarschot Non AARSCHOT BRABANT FLAMAND
11 2630 Aartselaar Non AARTSELAAR ANVERS
22 1790 Affligem Non AFFLIGEM BRABANT FLAMAND
... ... ... ... ... ...
2757 8377 Zuienkerke Non ZUIENKERKE FLANDRE-OCCIDENTALE
2758 9870 Zulte Non ZULTE FLANDRE-ORIENTALE
2760 3690 Zutendaal Non ZUTENDAAL LIMBOURG
2761 8550 Zwevegem Non ZWEVEGEM FLANDRE-OCCIDENTALE
2764 2070 Zwijndrecht Non ZWIJNDRECHT ANVERS

528 rows × 5 columns

In [15]:
# Several localities with the same name within the same commune
zipcodes[zipcodes[["locality", "main_commune"]].duplicated(keep=False)].sort_values("locality")
Out[15]:
zipcode locality Sous-commune main_commune Province is_special
60 2000 Antwerpen Non ANTWERPEN ANVERS False
61 2018 Antwerpen Non ANTWERPEN ANVERS False
62 2020 Antwerpen Non ANTWERPEN ANVERS False
63 2030 Antwerpen Non ANTWERPEN ANVERS False
64 2040 Antwerpen Non ANTWERPEN ANVERS False
65 2050 Antwerpen Non ANTWERPEN ANVERS False
66 2060 Antwerpen Non ANTWERPEN ANVERS False
1181 9120 Kallo Oui BEVEREN-WAAS FLANDRE-ORIENTALE False
1182 9130 Kallo Oui BEVEREN-WAAS FLANDRE-ORIENTALE False
1359 4000 Liège Non LIÈGE LIEGE False
1360 4020 Liège Non LIÈGE LIEGE False

Zip code centers

Get a list giving for each zipcode, its locality name, and geographical center.

Note : it does not contain any "special zicode"

In [16]:
# https://data.gov.be/fr/dataset/328ba4f140ba0e870dfc9c70635fe7c1840980b1
download_if_nexist("https://www.odwb.be/api/v2/catalog/datasets/code-postaux-belge/exports/geojson",
                  "data/zipcode_centers.geojson")

zipcodes_centers = gpd.read_file("data/zipcode_centers.geojson")
zipcodes_centers = zipcodes_centers.rename({"column_1": "zipcode", "column_2": "locality"}, axis=1)[["locality","zipcode", "geometry"]]

zipcodes_centers = zipcodes_centers.to_crs(crs)
print(zipcodes_centers.zipcode.nunique(), "unique zipcode")
zipcodes_centers
1145 unique zipcode
Out[16]:
locality zipcode geometry
0 Ixelles 1050 POINT (487754.219 6589918.068)
1 Koekelberg 1081 POINT (481535.656 6596965.764)
2 Haren 1130 POINT (491205.143 6602205.979)
3 Woluwe-Saint-Pierre 1150 POINT (494625.526 6591144.170)
4 Incourt 1315 POINT (534216.002 6566956.791)
... ... ... ...
2752 Ertvelde 9940 POINT (417134.047 6653080.277)
2753 Boekhoute 9961 POINT (412545.747 6665801.135)
2754 Sint-Jan-In-Eremo 9982 POINT (398534.241 6668876.259)
2755 Waterland-Oudeman 9988 POINT (399698.031 6672595.592)
2756 Maldegem 9990 POINT (380206.578 6661423.982)

2757 rows × 3 columns

In [17]:
# Missing zipcode
zipcodes_normal[~zipcodes_normal.zipcode.isin(zipcodes_centers.zipcode)]
Out[17]:
zipcode locality Sous-commune main_commune Province
1895 5352 Perwez-Haillot Oui OHEY NAMUR

Zip code boundaries

Get the boundaries for all zipcodes. Provided by BPost.

In the original files, multi-polygons as given as multiple (single)-polygons records. We merge them in ony one record.

Note that in the "center" file, if several entities share the same zip, the center of each entity is given, while here, we only have one (multi-)polygon per zipcode.

In [18]:
zipcode_boundaries_filename = "data/zipcode_boundaries_shapefile_3812.zip"
In [19]:
# https://www.geo.be/catalog/details/9738c7c0-5255-11ea-8895-34e12d0f0423?l=fr
download_if_nexist("https://bgu.bpost.be/assets/9738c7c0-5255-11ea-8895-34e12d0f0423_x-shapefile_3812.zip",
                  zipcode_boundaries_filename)
In [20]:
zipcodes_boundaries = gpd.read_file(f"zip://{zipcode_boundaries_filename}/3812")
zipcodes_boundaries["is_special"] = zipcodes_boundaries.CP_speciau ==1
zipcodes_boundaries = zipcodes_boundaries.rename({"nouveau_PO":"zipcode"}, axis=1)[["zipcode", "is_special", "geometry"]]

zipcodes_boundaries = zipcodes_boundaries.dissolve(["zipcode", "is_special"]).reset_index()

zipcodes_boundaries = zipcodes_boundaries.to_crs(crs)
zipcodes_boundaries
Out[20]:
zipcode is_special geometry
0 1000 False MULTIPOLYGON Z (((487951.058 6600061.994 0.000...
1 1005 True POLYGON Z ((484391.808 6593846.983 0.000, 4844...
2 1006 True POLYGON Z ((484413.037 6593943.815 0.000, 4843...
3 1007 True POLYGON Z ((486232.320 6594255.851 0.000, 4862...
4 1008 True POLYGON Z ((486000.332 6594258.665 0.000, 4858...
... ... ... ...
1182 9982 False POLYGON Z ((398126.149 6670079.581 0.000, 3981...
1183 9988 False POLYGON Z ((402165.046 6674165.098 0.000, 4021...
1184 9990 False POLYGON Z ((383123.261 6664423.574 0.000, 3831...
1185 9991 False POLYGON Z ((389466.378 6660430.021 0.000, 3895...
1186 9992 False POLYGON Z ((376698.000 6669876.805 0.000, 3767...

1187 rows × 3 columns

In [21]:
# Missing zipcode
zipcodes_normal[~zipcodes_normal.zipcode.isin(zipcodes_boundaries.zipcode)]
Out[21]:
zipcode locality Sous-commune main_commune Province
1175 6642 Juseret Oui VAUX-SUR-SÛRE LUXEMBOURG
In [22]:
# Extra zipcode
zipcodes_boundaries[~zipcodes_boundaries.zipcode.isin(zipcodes.zipcode)]
Out[22]:
zipcode is_special geometry
1071 9 False POLYGON Z ((551743.087 6617609.407 0.000, 5517...

NIS code boundaries

Get boundaries for all statistical sectors, from StatBel. Statistical sectors can be grouped by NIS code, providing NIS code boundaries.

In [23]:
# https://statbel.fgov.be/fr/open-data/secteurs-statistiques-2020

download_if_nexist("https://statbel.fgov.be/sites/default/files/files/opendata/Statistische%20sectoren/sh_statbel_statistical_sectors_31370_20200101.shp.zip",
                   "data/stat_sectors_2020.zip")
statistical_sectors = gpd.read_file("zip://data/stat_sectors_2020.zip/sh_statbel_statistical_sectors_20200101.shp")
statistical_sectors["CNIS5_2020"] = statistical_sectors["CNIS5_2020"].astype(str)
In [24]:
# Group (with "dissolve") sectors per NIS code
nis_boundaries = statistical_sectors[["CNIS5_2020", "T_MUN_FR", "T_MUN_NL", "geometry"]].dissolve(by="CNIS5_2020").reset_index()
nis_boundaries = nis_boundaries.rename({"CNIS5_2020": "niscode"}, axis=1)
nis_boundaries = nis_boundaries.to_crs(crs)

Spatial join : polygon vs point

As a first example of spatial join, we will join polygons (from zipcodes_boundaries) and points (from zipcodes_centers).

In the resulting table sjoin_zip_center, we'll get all combination of zipcodes_boundaries and zipcodes_centers, where the point of zipcodes_centers falls into the polygon of zipcodes_boundaries.

sjoin_mismatches will contain all combinations where the zipcode provided by zipcodes_centers does not match the zipcode provided by zipcodes_boundaries

In [25]:
sjoin_zip_center = gpd.sjoin(zipcodes_boundaries, zipcodes_centers)
sjoin_zip_center
Out[25]:
zipcode_left is_special geometry index_right locality zipcode_right
0 1000 False MULTIPOLYGON Z (((487951.058 6600061.994 0.000... 1678 Bruxelles 1000
8 1020 False POLYGON Z ((484519.836 6603848.988 0.000, 4845... 1139 Laeken 1020
9 1030 False POLYGON Z ((488633.964 6600056.547 0.000, 4887... 1140 Schaerbeek 1030
13 1040 False MULTIPOLYGON Z (((486480.015 6594068.947 0.000... 2214 Etterbeek 1040
21 1050 False MULTIPOLYGON Z (((486898.793 6593112.249 0.000... 0 Ixelles 1050
... ... ... ... ... ... ...
1183 9988 False POLYGON Z ((402165.046 6674165.098 0.000, 4021... 547 Watervliet 9988
1183 9988 False POLYGON Z ((402165.046 6674165.098 0.000, 4021... 2755 Waterland-Oudeman 9988
1184 9990 False POLYGON Z ((383123.261 6664423.574 0.000, 3831... 2756 Maldegem 9990
1185 9991 False POLYGON Z ((389466.378 6660430.021 0.000, 3895... 2212 Adegem 9991
1186 9992 False POLYGON Z ((376698.000 6669876.805 0.000, 3767... 2213 Middelburg 9992

2757 rows × 6 columns

In [26]:
sjoin_mismatches = sjoin_zip_center[sjoin_zip_center.zipcode_left != sjoin_zip_center.zipcode_right].reset_index(drop=True)
sjoin_mismatches
Out[26]:
zipcode_left is_special geometry index_right locality zipcode_right
0 1130 False POLYGON Z ((492165.130 6604187.117 0.000, 4921... 1264 Haren 3700
1 1541 False POLYGON Z ((442698.747 6572867.831 0.000, 4427... 1047 Driekapellen 8600
2 2000 False POLYGON Z ((491447.469 6663216.307 0.000, 4914... 1198 Antwerpen 2040
3 2000 False POLYGON Z ((491447.469 6663216.307 0.000, 4914... 2256 Antwerpen 2030
4 2000 False POLYGON Z ((491447.469 6663216.307 0.000, 4914... 2255 Antwerpen 2018
5 2000 False POLYGON Z ((491447.469 6663216.307 0.000, 4914... 593 Antwerpen 2050
6 2000 False POLYGON Z ((491447.469 6663216.307 0.000, 4914... 1736 Antwerpen 2020
7 2000 False POLYGON Z ((491447.469 6663216.307 0.000, 4914... 2259 Antwerpen 2060
8 2180 False POLYGON Z ((496588.712 6668893.730 0.000, 4964... 1245 Laar 3400
9 2180 False POLYGON Z ((496588.712 6668893.730 0.000, 4964... 2262 Schriek 2223
10 2490 False POLYGON Z ((579405.019 6646044.757 0.000, 5793... 1263 Berg 3700
11 2490 False POLYGON Z ((579405.019 6646044.757 0.000, 5793... 2252 Berg 1910
12 2560 False POLYGON Z ((522581.783 6652932.359 0.000, 5226... 2350 Voort 3840
13 3202 False POLYGON Z ((545421.333 6620399.304 0.000, 5454... 1798 Donk 3540
14 3300 False POLYGON Z ((546638.037 6597317.597 0.000, 5466... 106 Sint-Margriete-Houtem 3470
15 3700 False POLYGON Z ((613768.565 6589710.582 0.000, 6137... 683 Kolmont 3840
16 3770 False POLYGON Z ((625035.587 6592809.132 0.000, 6250... 1657 Elst 9660
17 4000 False MULTIPOLYGON Z (((622294.900 6548667.801 0.000... 151 Liège 4020
18 4120 False POLYGON Z ((604356.408 6544578.503 0.000, 6043... 181 Ehein 4480
19 4122 False POLYGON Z ((617503.511 6544062.555 0.000, 6175... 701 Neupré 4120
20 4141 False POLYGON Z ((640688.017 6541527.140 0.000, 6406... 2429 Louveigné 4920
21 4400 False POLYGON Z ((609617.951 6554549.539 0.000, 6096... 2564 Noirfontaine 6831
22 4852 False POLYGON Z ((664231.609 6575203.101 0.000, 6642... 778 Plombières 4850
23 4910 False POLYGON Z ((654464.107 6535116.816 0.000, 6544... 2538 Mont 6661
24 4910 False POLYGON Z ((654464.107 6535116.816 0.000, 6544... 1943 Mont 5530
25 4910 False POLYGON Z ((654464.107 6535116.816 0.000, 6544... 2423 Polleur 4800
26 4960 False POLYGON Z ((680570.480 6512355.732 0.000, 6805... 340 Bellevaux 6834
27 4987 False POLYGON Z ((651334.479 6514275.104 0.000, 6512... 855 Neuville 5600
28 4987 False POLYGON Z ((651334.479 6514275.104 0.000, 6512... 1366 La Bruyère 5080
29 4987 False POLYGON Z ((651334.479 6514275.104 0.000, 6512... 1342 Andrimont 4821
30 5541 False POLYGON Z ((538168.187 6483865.512 0.000, 5382... 835 Hastière 5540
31 5576 False POLYGON Z ((558829.518 6455080.381 0.000, 5589... 263 Honnay 5570
32 6640 False POLYGON Z ((627657.956 6447744.344 0.000, 6276... 314 Juseret 6642
33 6666 False POLYGON Z ((636191.395 6480967.171 0.000, 6362... 2585 Mormont 6997
34 6681 False POLYGON Z ((616954.726 6458024.935 0.000, 6169... 2017 Sainte-Ode 6680
35 6724 False POLYGON Z ((620859.901 6407429.477 0.000, 6213... 2548 Habay 6720
36 6820 False MULTIPOLYGON Z (((593091.087 6396179.853 0.000... 774 Lambermont 4800
37 7141 False POLYGON Z ((476514.652 6524706.111 0.000, 4764... 376 Morlanwelz 7140
38 7624 False POLYGON Z ((374657.316 6537126.140 0.000, 3747... 2627 Brunehaut 7620
39 7811 False POLYGON Z ((426644.077 6556386.940 0.000, 4269... 2080 Maffle 7810
40 8301 False POLYGON Z ((364600.110 6682813.553 0.000, 3646... 2657 Knokke-Heist 8300
41 8620 False POLYGON Z ((314007.990 6639410.105 0.000, 3139... 1268 Sluizen 3700
42 8755 False POLYGON Z ((370845.213 6638802.585 0.000, 3708... 1643 Bambrugge 9420
43 8952 False POLYGON Z ((317308.967 6581658.756 0.000, 3172... 2694 Heuvelland 8950
44 9120 False POLYGON Z ((475777.507 6667578.396 0.000, 4757... 492 Kallo 9130
45 9130 False POLYGON Z ((478761.897 6667836.361 0.000, 4779... 1628 Kallo 9120
46 9180 False POLYGON Z ((442800.468 6661195.878 0.000, 4428... 2723 Moerbeke 9500
47 9180 False POLYGON Z ((442800.468 6661195.878 0.000, 4428... 2703 Sinaai-Waas 9112
48 9220 False POLYGON Z ((464845.765 6639208.960 0.000, 4647... 39 Hamme 1785
49 9620 False POLYGON Z ((426237.365 6605895.998 0.000, 4262... 523 Sint-Maria-Oudenhove 9660
50 9620 False POLYGON Z ((426237.365 6605895.998 0.000, 4262... 2176 Oombergen 9520
51 9790 False POLYGON Z ((395001.891 6600120.574 0.000, 3950... 529 Ooike 9700

We can now explore all those mismatch. As a way to make decisions easier, we provide several data:

  • The mismatch record (with two different zipcodes zip1, zip2)
  • Information we have from BPost for the two zipcodes
  • What OpenStreetMap knows at the point given by zipcodes_centers
  • What OpenStreetMap knows for the given locality (in zipcodes_centers)
  • A map with :
    • the boundaries corresponding to zip1 and zip2 in zipcodes_boundaries et
    • all points corresponding to zip1 and zip2 in zipcodes_centers
In [27]:
def plot_mismatch_zip(sjoin_mismatches, i):
    print(f"Mismatch record {i}:")
    display(sjoin_mismatches.iloc[[i]])

    zip1 = sjoin_mismatches.iloc[i].zipcode_left
    zip2 = sjoin_mismatches.iloc[i].zipcode_right

    print("From BPost:")
    display( zipcodes[ zipcodes.zipcode.isin([zip1, zip2])])

    try: 
        print("Know by OSM at the given point:")
        pt = zipcodes_centers.to_crs(osm_crs).loc[sjoin_mismatches.iloc[i].index_right].geometry
        geolocator = geopy.geocoders.Nominatim(user_agent="smalsresearch")
        print(geolocator.reverse(f"{pt.y}, {pt.x}").address)

        print("Know by OSM for the given locality:")
        loc = zipcodes_centers.loc[sjoin_mismatches.iloc[i].index_right].locality
        print(geolocator.geocode(f"{loc}, Belgium").address)
    except Exception as e:
        print("Cannot get OSM data:", e)

    query = f"zipcode in ('{zip1}', '{zip2}')"


    df_bnd = zipcodes_boundaries.assign(label=zipcodes_boundaries.zipcode).query(query).to_crs(osm_crs)
    df_pnt = zipcodes_centers.assign(label=zipcodes_centers.zipcode+" "+zipcodes_centers.locality).query(query).to_crs(osm_crs)

    df_pnt["mismatch"] = False
    df_pnt.loc[sjoin_mismatches.iloc[i].index_right, "mismatch"] = True

    if with_plotly:
        fig = px.choropleth_mapbox(df_bnd, 
                                   geojson=df_bnd.geometry, 
                                   locations=df_bnd.index, 
                                   hover_name="label",
                                   color=df_bnd['label'],
                                   center={"lat": df_bnd.bounds.mean()[["miny", "maxy"]].mean(), 
                                           "lon": df_bnd.bounds.mean()[["minx", "maxx"]].mean()},
                                   mapbox_style="open-street-map",
                                   #zoom=10,
                                   opacity=0.5,

                           )

        for i, bloc in df_pnt.groupby("mismatch"): 
            fig.add_trace(go.Scattermapbox(lat=bloc.geometry.y,
                                           lon=bloc.geometry.x,
                                           mode='markers+text', # "markers+text",
                                           text=bloc.label, 
                                           name="Zipcode centers" + (" (mismatch)" if i else ""),
                                           textposition="bottom center"
                                          )
                                    )
        fig.show()
    else: 
        df=pd.concat([df_bnd, df_pnt]).to_crs(crs)
        df["label"] = df["label"] + np.where(df.mismatch==True, "*", "")
        ax = df.plot("label", figsize=(10,15), legend=True)
        try:
            ctx.add_basemap(ax)
        except Exception as  e:
            print("Cannot add basemap: ", e)
        plt.show()
        display(df[["zipcode", "geometry","label"]])
In [28]:
# for i in range(sjoin_mismatches.shape[0]):
for i in [8, 19, 25, 30, 43]:
    plot_mismatch_zip(sjoin_mismatches, i)
    print("--------------")
Mismatch record 8:
zipcode_left is_special geometry index_right locality zipcode_right
8 2180 False POLYGON Z ((496588.712 6668893.730 0.000, 4964... 1245 Laar 3400
From BPost:
zipcode locality Sous-commune main_commune Province is_special
603 2180 Ekeren Oui ANTWERPEN ANVERS False
609 3400 Eliksem Oui LANDEN BRABANT FLAMAND False
683 3400 Ezemaal Oui LANDEN BRABANT FLAMAND False
1266 3400 Laar Oui LANDEN BRABANT FLAMAND False
1285 3400 Landen Non LANDEN BRABANT FLAMAND False
1692 3400 Neerwinden Oui LANDEN BRABANT FLAMAND False
1867 3400 Overwinden Oui LANDEN BRABANT FLAMAND False
2079 3400 Rumsdorp Oui LANDEN BRABANT FLAMAND False
2581 3400 Wange Oui LANDEN BRABANT FLAMAND False
Know by OSM at the given point:
11, Laar, Donk, Ekeren, Antwerpen, Vlaanderen, 2180, België / Belgique / Belgien
Know by OSM for the given locality:
Laar, Wommelgem, Antwerpen, Vlaanderen, 2160, België / Belgique / Belgien