Whether it’s expansion, consolidation or performance monitoring, understanding revenue drivers is essential for Site Planning in many sectors such as Retail or Restaurant and Food Services.
This notebook walks you through all the data collection and preparation steps required for building a revenue prediction model. The main steps followed are:
3.1 Calculate isochrones
3.2 Enrich isochrones
We'll use CARTOframes throughout the analysis.
Note this use case leverages premium datasets from CARTO's Data Observatory.
In order to show all the steps and functionality, we'll work with simulated sales data of Carrefour Express, a chain of small-sized supermarkets.
Carreforu Express (CE) wants to reorganize (open/close) their stores in the city of Madrid (Spain). In order to define an optimal plan of openings and closures, they first need to understand why some stores are performing better (in terms of annual revenue) than others, and identify areas where they could have a high performance.
They have provided us with the stores they have in the city of Madrid, together with the average annual sales of the last three years.
Note the annual sales are not Carrefour Express' actual data.
Import the packages we'll use.
import geopandas as gpd
import ipywidgets as widgets
import numpy as np
import pandas as pd
import pyproj
from cartoframes.auth import set_default_credentials
from cartoframes.data.observatory import *
from cartoframes.data.services import Geocoding, Isolines
from cartoframes.viz import *
from IPython.display import clear_output, display
from scipy.spatial.distance import cdist
from shapely import wkt
from shapely.geometry import Point
pd.set_option('display.max_columns', 100)
In order to be able to use the Data Observatory via CARTOframes, you need to set your CARTO account credentials first.
Please, visit the Authentication guide for further detail.
set_default_credentials('creds.json')
We'll start loading the CSV file containing all stores in Madrid with their address and annual revenue. They 57 stores in total.
stores = pd.read_csv('https://docs.google.com/spreadsheets/d/1RlOsWN3OBTS0Zhq2lbYvBrwXxSJjpCdWrOWHSqg2JVE/export?gid=0&format=csv')
stores.head()
name | mainaddressline | postcode | areaname2 | areaname1 | annual_sales | |
---|---|---|---|---|---|---|
0 | CARREFOUR EXPRESS SAN BERNARDO 76 | CALLE DEL DIVINO PASTOR 76 | 28004 | MADRID | COMUNIDAD DE MADRID | 22.46 |
1 | CARREFOUR EXPRESS ARAVACA | AVENIDA DE LA OSA MAYOR 62 | 28023 | MADRID | COMUNIDAD DE MADRID | 17.21 |
2 | CARREFOUR EXPRESS RAMIREZ DEL PRADO | CALLE DE JUAN DE MARIANA 8 | 28045 | MADRID | COMUNIDAD DE MADRID | 19.27 |
3 | CARREFOUR EXPRESS PASTORA IMPERIO. 1 | CALLE PASTORA IMPERIO 1 | 28036 | MADRID | COMUNIDAD DE MADRID | 17.53 |
4 | CARREFOUR EXPRESS BÉJAR | CALLE DE BÉJAR 1 | 28028 | MADRID | COMUNIDAD DE MADRID | 25.91 |
stores.shape
(57, 6)
We have the address of each store, but we need their coordinates in order to perform spatial analysis. We'll use CARTOframes geocoding functionality for this.
stores['complete_address'] = stores.apply(lambda row : f"{row['mainaddressline']}, {row['postcode']}",axis=1)
gc = Geocoding()
gdf, metadata = gc.geocode(stores, street='complete_address', city='areaname2',
state='areaname1', country={'value': 'Spain'}, )
Success! Data geocoded correctly
gdf.head(2)
the_geom | name | mainaddressline | postcode | areaname2 | areaname1 | annual_sales | complete_address | gc_status_rel | carto_geocode_hash | |
---|---|---|---|---|---|---|---|---|---|---|
0 | POINT (-3.70394 40.42763) | CARREFOUR EXPRESS SAN BERNARDO 76 | CALLE DEL DIVINO PASTOR 76 | 28004 | MADRID | COMUNIDAD DE MADRID | 22.46 | CALLE DEL DIVINO PASTOR 76, 28004 | 0.99 | 3979e14d3cd5995e7e5809430b429da4 |
1 | POINT (-3.78261 40.45914) | CARREFOUR EXPRESS ARAVACA | AVENIDA DE LA OSA MAYOR 62 | 28023 | MADRID | COMUNIDAD DE MADRID | 17.21 | AVENIDA DE LA OSA MAYOR 62, 28023 | 1.00 | 91271c1c6ff4bc5f03c148b1d567b4a7 |
stores = gdf[stores.columns.tolist() + ['the_geom']].rename(columns={'the_geom':'geometry'})
stores.head(2)
name | mainaddressline | postcode | areaname2 | areaname1 | annual_sales | complete_address | geometry | |
---|---|---|---|---|---|---|---|---|
0 | CARREFOUR EXPRESS SAN BERNARDO 76 | CALLE DEL DIVINO PASTOR 76 | 28004 | MADRID | COMUNIDAD DE MADRID | 22.46 | CALLE DEL DIVINO PASTOR 76, 28004 | POINT (-3.70394 40.42763) |
1 | CARREFOUR EXPRESS ARAVACA | AVENIDA DE LA OSA MAYOR 62 | 28023 | MADRID | COMUNIDAD DE MADRID | 17.21 | AVENIDA DE LA OSA MAYOR 62, 28023 | POINT (-3.78261 40.45914) |
Map(Layer(stores,
popup_hover=popup_element('name'),
geom_col='geometry'))
Once we have the stores geocoded, we'll analyze the spatial distribution of annual sdales.
stores['name'] = stores['name'].str[18:]
Map(Layer(stores,
style=size_continuous_style('annual_sales'),
popup_hover=[popup_element('name'), popup_element('annual_sales')],
legends=size_continuous_legend('Annual Sales', 'Annual sales in million euros'),
geom_col='geometry'))
Sales in a grocery store is mainly influenced by:
In order to enrich our initial dataset with this information, we first need to define the area of influence (catchment area) of the different stores. Once we have the catchment area, we'll bring all the data related to that area from CARTO's Data Observatory.
Because of the characteristics of Carrefour Express' customers, we can define their catchment areas by acknowledging that:
Therefore, we'll consider 5-minute-walking isochrones as their area of influence.
Explore our Guides to learn more about isochornes.
iso_service = Isolines()
isochrones_gdf, _ = iso_service.isochrones(stores, [300, 600], mode='walk', exclusive=False)
Success! Isolines created correctly
isochrones_gdf.head()
source_id | data_range | the_geom | |
---|---|---|---|
0 | 0 | 300 | MULTIPOLYGON (((-3.70660 40.42780, -3.70651 40... |
1 | 0 | 600 | MULTIPOLYGON (((-3.70986 40.42900, -3.70926 40... |
2 | 1 | 300 | MULTIPOLYGON (((-3.78539 40.45990, -3.78496 40... |
3 | 1 | 600 | MULTIPOLYGON (((-3.78900 40.45938, -3.78874 40... |
4 | 2 | 300 | MULTIPOLYGON (((-3.68875 40.39965, -3.68849 40... |
Map(Layer(isochrones_gdf, geom_col='the_geom', style=basic_style(opacity=0.3)))
stores['iso_5walk'] = isochrones_gdf.loc[isochrones_gdf['data_range'] == 300, 'the_geom'].values
stores['iso_10walk'] = isochrones_gdf.loc[isochrones_gdf['data_range'] == 600, 'the_geom'].values
Once we have the area of influence of every store, we can enrich our initial data.
enrichment = Enrichment()
We'll start by enriching with POI data. We'll calculate for every store:
We will use Pitney Bowes' Points Of Interest premium dataset.
Take a look at this template for more details on how to access and download a premium dataset.
For more details on how to discover a dataset, please check this notebook or take a look at our Guides.
dataset = Dataset.get('pb_points_of_i_94bda91b')
dataset.variables.to_dataframe().head()
slug | name | description | db_type | agg_method | column_name | variable_group_id | dataset_id | id | |
---|---|---|---|---|---|---|---|---|---|
0 | geoid_17c119ef | geoid | Unique numeric identifier, concatenated with l... | STRING | None | geoid | None | carto-do.pitney_bowes.pointsofinterest_pointso... | carto-do.pitney_bowes.pointsofinterest_pointso... |
1 | do_label_ecfa983b | do_label | Primary / Registered name of the business | STRING | None | do_label | None | carto-do.pitney_bowes.pointsofinterest_pointso... | carto-do.pitney_bowes.pointsofinterest_pointso... |
2 | do_date_4f2cf0b3 | do_date | First day of the month for the delivery | DATE | None | do_date | None | carto-do.pitney_bowes.pointsofinterest_pointso... | carto-do.pitney_bowes.pointsofinterest_pointso... |
3 | NAME_ed9f897 | NAME | Primary / Registered name of the business | STRING | None | NAME | None | carto-do.pitney_bowes.pointsofinterest_pointso... | carto-do.pitney_bowes.pointsofinterest_pointso... |
4 | BRANDNAME_a7ebfc28 | Brand | PB standardized Brand Name used by the business | STRING | None | BRANDNAME | None | carto-do.pitney_bowes.pointsofinterest_pointso... | carto-do.pitney_bowes.pointsofinterest_pointso... |
We'll calculate the number of POIs within each store's catchment area. This gives us an idea of how busy the area is.
enriched_dataset_gdf = enrichment.enrich_polygons(
stores,
variables=['CLASS_517d6003'],
aggregation='COUNT',
geom_col='iso_5walk'
)
enriched_dataset_gdf.head()
name | mainaddressline | postcode | areaname2 | areaname1 | annual_sales | complete_address | geometry | iso_5walk | iso_10walk | CLASS | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | SAN BERNARDO 76 | CALLE DEL DIVINO PASTOR 76 | 28004 | MADRID | COMUNIDAD DE MADRID | 22.46 | CALLE DEL DIVINO PASTOR 76, 28004 | POINT (-3.70394 40.42763) | MULTIPOLYGON (((-3.70660 40.42780, -3.70651 40... | MULTIPOLYGON (((-3.70986 40.42900, -3.70926 40... | 21752 |
1 | ARAVACA | AVENIDA DE LA OSA MAYOR 62 | 28023 | MADRID | COMUNIDAD DE MADRID | 17.21 | AVENIDA DE LA OSA MAYOR 62, 28023 | POINT (-3.78261 40.45914) | MULTIPOLYGON (((-3.78539 40.45990, -3.78496 40... | MULTIPOLYGON (((-3.78900 40.45938, -3.78874 40... | 6450 |
2 | RAMIREZ DEL PRADO | CALLE DE JUAN DE MARIANA 8 | 28045 | MADRID | COMUNIDAD DE MADRID | 19.27 | CALLE DE JUAN DE MARIANA 8, 28045 | POINT (-3.68693 40.39849) | MULTIPOLYGON (((-3.68875 40.39965, -3.68849 40... | MULTIPOLYGON (((-3.69149 40.39690, -3.69141 40... | 2622 |
3 | PASTORA IMPERIO. 1 | CALLE PASTORA IMPERIO 1 | 28036 | MADRID | COMUNIDAD DE MADRID | 17.53 | CALLE PASTORA IMPERIO 1, 28036 | POINT (-3.67468 40.48099) | MULTIPOLYGON (((-3.67553 40.48325, -3.67493 40... | MULTIPOLYGON (((-3.67553 40.48462, -3.67510 40... | 2777 |
4 | BÉJAR | CALLE DE BÉJAR 1 | 28028 | MADRID | COMUNIDAD DE MADRID | 25.91 | CALLE DE BÉJAR 1, 28028 | POINT (-3.67456 40.43516) | MULTIPOLYGON (((-3.67621 40.43724, -3.67561 40... | MULTIPOLYGON (((-3.68033 40.43655, -3.67990 40... | 16508 |
stores['n_pois'] = enriched_dataset_gdf['CLASS'].values
Map(Layer(stores, geom_col='iso_5walk', style=color_bins_style('n_pois')))
Next, we'll count the number of competitors within the 10 minute isochrone. This represents all the supermarkets that customers within our stores can find within a 5-minute-walk distance.
Note we apply a filter to only count grocery stores (competitors).
enriched_dataset_gdf = enrichment.enrich_polygons(
stores,
variables=['CLASS_517d6003'],
aggregation='COUNT',
geom_col='iso_10walk',
filters={'carto-do.pitney_bowes.pointsofinterest_pointsofinterest_esp_latlon_v1_monthly_v1.CLASS':
"= 'GROCERY STORES'"}
)
stores['n_competitors'] = enriched_dataset_gdf['CLASS'].values
stores.head(3)
name | mainaddressline | postcode | areaname2 | areaname1 | annual_sales | complete_address | geometry | iso_5walk | iso_10walk | n_pois | n_competitors | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | SAN BERNARDO 76 | CALLE DEL DIVINO PASTOR 76 | 28004 | MADRID | COMUNIDAD DE MADRID | 22.46 | CALLE DEL DIVINO PASTOR 76, 28004 | POINT (-3.70394 40.42763) | MULTIPOLYGON (((-3.70660 40.42780, -3.70651 40... | MULTIPOLYGON (((-3.70986 40.42900, -3.70926 40... | 21752 | 1241 |
1 | ARAVACA | AVENIDA DE LA OSA MAYOR 62 | 28023 | MADRID | COMUNIDAD DE MADRID | 17.21 | AVENIDA DE LA OSA MAYOR 62, 28023 | POINT (-3.78261 40.45914) | MULTIPOLYGON (((-3.78539 40.45990, -3.78496 40... | MULTIPOLYGON (((-3.78900 40.45938, -3.78874 40... | 6450 | 199 |
2 | RAMIREZ DEL PRADO | CALLE DE JUAN DE MARIANA 8 | 28045 | MADRID | COMUNIDAD DE MADRID | 19.27 | CALLE DE JUAN DE MARIANA 8, 28045 | POINT (-3.68693 40.39849) | MULTIPOLYGON (((-3.68875 40.39965, -3.68849 40... | MULTIPOLYGON (((-3.69149 40.39690, -3.69141 40... | 2622 | 108 |
Now we'll enrich our dataframe with Unica360 Sociodemographics premium dataset.
For more details on how to discover a dataset, please check this notebook or take a look at our Guides.
dataset = Dataset.get('u360_sociodemogr_28e93b81')
dataset.head()
HOG | P_T | id_x | id_y | geoid | do_date | EURO_RIC | EXTR_NAC | INM_POBR | n_fincas | UE_28_NOR | n_viv_inm | renta_ind | P_ED_00_04_H | P_ED_00_04_M | P_ED_00_14_H | P_ED_00_14_M | P_ED_15_24_H | P_ED_15_24_M | P_ED_25_44_H | P_ED_25_44_M | P_ED_45_64_H | P_ED_45_64_M | P_ED_65_79_H | P_ED_65_79_M | P_ED_80_MAS_H | P_ED_80_MAS_M | P_EST_RESTO_M | renta_hab_disp | renta_hog_neta | renta_tit_disp | renta_tit_bruta | P_EST_4_2_GRADO_M | P_EST_5_3_GRADO_M | C02_09_GASTO_M_ocio_M | C02_06_GASTO_M_salud_M | C02_12_GASTO_M_otros_M | C02_03_GASTO_M_textil_M | C02_10_GASTO_M_ensena_M | C02_11_GASTO_M_horeca_M | C02_02_GASTO_M_bebidas_M | C02_05_GASTO_M_muebles_M | C02_04_GASTO_M_vivienda_M | C02_07_GASTO_M_transporte_M | C02_01_GASTO_M_alimentacion_M | C02_08_GASTO_M_comunicaciones_M | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 24.296127 | 64.370032 | -19778 | 32977 | -19778#33352 | 2019-01-01 00:00:00+00:00 | 1.029272 | 6.085252 | 4.971238 | 29 | 2.654434 | 29 | 0.717099 | 1.190936 | 1.177480 | 3.299853 | 2.940335 | 3.855768 | 3.180014 | 8.720623 | 7.662982 | 9.115237 | 10.109776 | 4.341853 | 6.231276 | 2.015262 | 2.897054 | 0.389230 | 12202.757585 | 20100.857565 | 14462.460915 | 17616.981223 | 0.499176 | 0.111595 | 2221.969272 | 800.421305 | 2229.070279 | 1787.518852 | 221.568951 | 2746.878966 | 627.479286 | 1390.049665 | 6942.066165 | 4156.128864 | 4465.605857 | 990.745515 |
1 | 58.447519 | 149.641189 | -18173 | 33096 | -18173#33103 | 2019-01-01 00:00:00+00:00 | 4.516065 | 29.098229 | 22.820101 | 32 | 2.607527 | 50 | 0.952874 | 2.414108 | 2.250192 | 6.608373 | 5.342883 | 7.730453 | 9.152621 | 23.431270 | 22.464662 | 21.908981 | 22.224751 | 7.118215 | 10.802637 | 4.817359 | 8.038985 | 0.988062 | 11932.566934 | 26709.812769 | 19217.569299 | 23409.263436 | 0.006491 | 0.005447 | 2156.230515 | 771.555900 | 2779.220510 | 1691.092649 | 193.623252 | 2609.000000 | 560.790967 | 1424.372807 | 7148.106660 | 4006.054925 | 4364.704902 | 934.180016 |
2 | 26.604155 | 73.959551 | -18116 | 32687 | -18116#33085 | 2019-01-01 00:00:00+00:00 | 0.458427 | 2.292135 | 1.642697 | 40 | 4.487540 | 34 | 0.813667 | 1.451685 | 1.031461 | 5.310112 | 4.164045 | 5.424719 | 4.164045 | 9.359551 | 9.970787 | 12.224719 | 12.377528 | 3.705618 | 3.476404 | 1.757303 | 2.024719 | 0.291117 | 7583.815766 | 22807.725689 | 16410.038243 | 19989.359853 | 0.597157 | 0.111726 | 2559.120567 | 680.158167 | 1884.000000 | 1836.091586 | 423.418838 | 3389.951794 | 546.312933 | 1423.716749 | 6341.069319 | 3706.069530 | 4154.231413 | 1016.996320 |
3 | 33.045914 | 92.881935 | -17218 | 32661 | -17218#32297 | 2019-01-01 00:00:00+00:00 | 0.256298 | 2.653818 | 2.482952 | 26 | 1.291380 | 43 | 0.794065 | 1.536414 | 1.664563 | 5.542221 | 6.221605 | 6.281957 | 6.142287 | 12.034348 | 13.008005 | 15.317389 | 14.553237 | 4.441079 | 4.668884 | 1.802194 | 2.868729 | 0.253941 | 10387.578016 | 22258.276526 | 16014.712470 | 19507.806488 | 0.610700 | 0.135358 | 2426.496024 | 654.880189 | 2875.504119 | 1770.786627 | 280.563635 | 2609.000000 | 487.381885 | 1363.928508 | 8477.911559 | 4339.116312 | 4546.185871 | 981.843483 |
4 | 42.791565 | 120.525826 | -17166 | 32306 | -17166#32485 | 2019-01-01 00:00:00+00:00 | 3.510677 | 42.344986 | 38.866111 | 34 | 0.507045 | 43 | 0.750259 | 2.709732 | 2.255995 | 8.161522 | 7.454414 | 7.064280 | 6.442501 | 17.228426 | 18.946401 | 19.495545 | 17.419762 | 6.494981 | 6.937595 | 1.960744 | 2.919654 | 0.410322 | 9230.864235 | 21030.354642 | 15131.229156 | 18431.619728 | 0.515851 | 0.073828 | 2341.058219 | 646.455593 | 1946.375596 | 1817.761132 | 216.990783 | 2713.951569 | 531.613987 | 1439.418479 | 6568.585994 | 3874.252098 | 4230.469594 | 999.165857 |
5 | 37.239490 | 106.338527 | -17163 | 32387 | -17163#32613 | 2019-01-01 00:00:00+00:00 | 0.748508 | 5.395525 | 4.489339 | 41 | 5.261968 | 41 | 0.934545 | 2.327311 | 1.837204 | 8.243516 | 7.299152 | 6.434559 | 6.316766 | 16.636797 | 15.924450 | 17.229334 | 18.293044 | 3.807255 | 3.702808 | 0.942347 | 1.508500 | 0.301092 | 15206.522731 | 26196.021803 | 18847.899411 | 22958.961961 | 0.630520 | 0.068388 | 2455.607492 | 661.780195 | 2471.410572 | 1978.238066 | 464.362718 | 3271.408753 | 570.580945 | 1538.325215 | 7026.862516 | 3642.686800 | 4136.990977 | 1097.559424 |
6 | 68.027299 | 186.194490 | -17162 | 32685 | -17162#32483 | 2019-01-01 00:00:00+00:00 | 3.108189 | 14.195233 | 10.235228 | 38 | 1.479630 | 76 | 0.792400 | 2.274446 | 2.523710 | 8.599478 | 8.481789 | 9.011832 | 8.980095 | 27.891938 | 22.830855 | 31.271871 | 29.658821 | 12.527746 | 12.387136 | 6.130205 | 8.422724 | 0.416778 | 9131.446387 | 22211.580725 | 15981.115088 | 19466.880918 | 0.484402 | 0.098820 | 2244.234898 | 740.458090 | 2297.939237 | 1711.685763 | 141.415802 | 1862.537500 | 614.929750 | 1266.804557 | 6851.581654 | 3714.899753 | 4079.128157 | 965.067165 |
7 | 30.687296 | 80.303223 | -17144 | 32482 | -17144#32307 | 2019-01-01 00:00:00+00:00 | 0.651256 | 8.216549 | 7.610024 | 27 | 3.388013 | 40 | 0.949662 | 1.412346 | 1.276225 | 5.128429 | 4.937896 | 4.390172 | 3.825537 | 10.481719 | 10.696106 | 12.814069 | 13.407670 | 4.136237 | 6.010481 | 1.613199 | 2.861709 | 0.581030 | 10767.001377 | 26619.780114 | 19152.791279 | 23330.356176 | 0.295456 | 0.123514 | 2363.386639 | 678.923939 | 2879.000000 | 1743.381029 | 255.110422 | 2144.928980 | 525.404536 | 1320.476423 | 6952.603962 | 3723.978057 | 3545.675798 | 965.785980 |
8 | 33.390700 | 85.328830 | -17132 | 32486 | -17132#32482 | 2019-01-01 00:00:00+00:00 | 1.357646 | 12.403156 | 10.880929 | 34 | 3.930721 | 44 | 1.046394 | 1.714538 | 1.990038 | 5.842336 | 5.915253 | 4.295373 | 3.103507 | 11.445285 | 13.669828 | 11.059466 | 12.955444 | 5.129749 | 7.033604 | 1.932183 | 2.946802 | 0.242906 | 8990.674991 | 29331.237669 | 21103.670678 | 25706.757117 | 0.528009 | 0.229085 | 2409.842055 | 715.770809 | 1994.852864 | 1793.226386 | 336.881211 | 2609.000000 | 592.512128 | 1375.257446 | 6410.553981 | 4015.768641 | 4377.518211 | 985.091840 |
9 | 78.866892 | 233.102278 | -15086 | 32758 | -15086#33717 | 2019-01-01 00:00:00+00:00 | 4.779089 | 53.609586 | 46.411882 | 24 | 2.951889 | 72 | 0.759931 | 7.575629 | 6.267848 | 21.264743 | 16.362203 | 15.129148 | 12.273443 | 43.326684 | 35.644802 | 30.179679 | 30.755570 | 8.879519 | 11.757333 | 2.910979 | 4.618177 | 0.256679 | 9363.256499 | 21301.462819 | 15326.290058 | 18669.226887 | 0.644514 | 0.098808 | 2486.019486 | 641.856765 | 2879.000000 | 1816.883883 | 420.358591 | 3391.000000 | 540.780005 | 1323.198738 | 7017.370248 | 4413.282917 | 4442.802609 | 977.844271 |
We can get a detailed description of every variable.
Variable.get('C02_01_GASTO_M__7ad08d93').to_dict()
{'slug': 'C02_01_GASTO_M__7ad08d93', 'name': 'C02_01_GASTO_M_alimentacion_M', 'description': 'Average household spend by the 12 COICOP categories', 'db_type': 'FLOAT', 'agg_method': 'AVG', 'column_name': 'C02_01_GASTO_M_alimentacion_M', 'variable_group_id': 'carto-do.unica360.demographics_sociodemographics_esp_grid100x100m_2019_yearly_2019.average_household_spend_by_product_category', 'dataset_id': 'carto-do.unica360.demographics_sociodemographics_esp_grid100x100m_2019_yearly_2019', 'id': 'carto-do.unica360.demographics_sociodemographics_esp_grid100x100m_2019_yearly_2019.C02_01_GASTO_M_alimentacion_M'}
We decide to enrich our dataframe with the following variables.
vars_enrichment = ['P_T_9be2c6a7',
'P_ED_00_14_M_b66ee9e9', 'P_ED_00_14_H_c6041d66', 'P_ED_15_24_M_5261dc00', 'P_ED_15_24_H_220b288f',
'P_ED_25_44_M_46e29941', 'P_ED_25_44_H_36886dce', 'P_ED_45_64_M_8f3b64f0', 'P_ED_45_64_H_ff51907f',
'P_ED_65_79_M_a8c081ef', 'P_ED_65_79_H_d8aa7560', 'P_ED_80_MAS_M_c1c729f7', 'P_ED_80_MAS_H_b1addd78',
'renta_hab_disp_e4a8896c', 'C02_01_GASTO_M__7ad08d93']
enriched_dataset_gdf = enrichment.enrich_polygons(
stores,
variables=vars_enrichment,
geom_col='iso_5walk'
)
stores = enriched_dataset_gdf
stores.crs = 'epsg:4326'
stores.columns = map(str.lower, stores.columns)
stores.head()
name | mainaddressline | postcode | areaname2 | areaname1 | annual_sales | complete_address | geometry | iso_5walk | iso_10walk | n_pois | n_competitors | p_ed_00_14_h | p_ed_15_24_h | p_ed_25_44_h | p_ed_45_64_h | p_ed_65_79_h | c02_01_gasto_m_alimentacion_m | renta_hab_disp | p_t | p_ed_80_mas_h | p_ed_00_14_m | p_ed_15_24_m | p_ed_25_44_m | p_ed_45_64_m | p_ed_65_79_m | p_ed_80_mas_m | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | SAN BERNARDO 76 | CALLE DEL DIVINO PASTOR 76 | 28004 | MADRID | COMUNIDAD DE MADRID | 22.46 | CALLE DEL DIVINO PASTOR 76, 28004 | POINT (-3.70394 40.42763) | MULTIPOLYGON (((-3.70660 40.42780, -3.70651 40... | MULTIPOLYGON (((-3.70986 40.42900, -3.70926 40... | 21752 | 1241 | 323.344538 | 216.393397 | 1759.796046 | 1065.119112 | 350.138129 | 3664.562616 | 19147.518516 | 8098.325095 | 128.170807 | 329.781022 | 291.551634 | 1657.004454 | 1096.365846 | 513.270507 | 367.389603 |
1 | ARAVACA | AVENIDA DE LA OSA MAYOR 62 | 28023 | MADRID | COMUNIDAD DE MADRID | 17.21 | AVENIDA DE LA OSA MAYOR 62, 28023 | POINT (-3.78261 40.45914) | MULTIPOLYGON (((-3.78539 40.45990, -3.78496 40... | MULTIPOLYGON (((-3.78900 40.45938, -3.78874 40... | 6450 | 199 | 210.641714 | 149.459618 | 335.942818 | 358.687113 | 135.850954 | 4390.010598 | 19620.953784 | 2805.708506 | 74.379705 | 213.150064 | 152.993159 | 411.966220 | 436.088810 | 196.180844 | 130.367487 |
2 | RAMIREZ DEL PRADO | CALLE DE JUAN DE MARIANA 8 | 28045 | MADRID | COMUNIDAD DE MADRID | 19.27 | CALLE DE JUAN DE MARIANA 8, 28045 | POINT (-3.68693 40.39849) | MULTIPOLYGON (((-3.68875 40.39965, -3.68849 40... | MULTIPOLYGON (((-3.69149 40.39690, -3.69141 40... | 2622 | 108 | 169.482196 | 108.534454 | 281.058539 | 286.753594 | 50.599275 | 4048.951252 | 19705.219263 | 1830.630552 | 10.075373 | 139.002134 | 110.586285 | 275.670275 | 312.001228 | 62.222286 | 24.644911 |
3 | PASTORA IMPERIO. 1 | CALLE PASTORA IMPERIO 1 | 28036 | MADRID | COMUNIDAD DE MADRID | 17.53 | CALLE PASTORA IMPERIO 1, 28036 | POINT (-3.67468 40.48099) | MULTIPOLYGON (((-3.67553 40.48325, -3.67493 40... | MULTIPOLYGON (((-3.67553 40.48462, -3.67510 40... | 2777 | 93 | 156.911911 | 131.700549 | 285.061523 | 289.686083 | 154.947590 | 4608.221456 | 23047.014256 | 2194.433129 | 33.851479 | 142.951123 | 129.396225 | 286.993301 | 358.034212 | 177.399811 | 47.499323 |
4 | BÉJAR | CALLE DE BÉJAR 1 | 28028 | MADRID | COMUNIDAD DE MADRID | 25.91 | CALLE DE BÉJAR 1, 28028 | POINT (-3.67456 40.43516) | MULTIPOLYGON (((-3.67621 40.43724, -3.67561 40... | MULTIPOLYGON (((-3.68033 40.43655, -3.67990 40... | 16508 | 1034 | 358.786956 | 289.856363 | 1243.738389 | 840.314021 | 444.093999 | 3843.137973 | 20587.524114 | 7697.106579 | 190.240250 | 365.406908 | 302.177966 | 1410.380031 | 1100.172199 | 691.146655 | 460.792842 |
In this section, we'll calculate the following spatial lag variables:
In the city of Madrid, all touristic places are close to the Puerta del Sol site. This variable measures how close the store is to touristic places.
Other interesting spatial lag variables would be the average distance to the 3 closest competitors or the average revenue of the 2 closest Carrefour Express stores, just to mention some extra examples.
madrid_city_center = Point(-3.703367, 40.416892)
proj_in = pyproj.Proj('epsg:4326')
proj_out = pyproj.Proj('epsg:25830')
project = pyproj.Transformer.from_proj(proj_in, proj_out).transform
stores['dist_cc'] = stores.set_geometry('geometry').to_crs('epsg:25830').distance(
Point(project(madrid_city_center.y, madrid_city_center.x))).values
stores.head(2)
name | mainaddressline | postcode | areaname2 | areaname1 | annual_sales | complete_address | geometry | iso_5walk | iso_10walk | n_pois | n_competitors | p_ed_00_14_h | p_ed_15_24_h | p_ed_25_44_h | p_ed_45_64_h | p_ed_65_79_h | c02_01_gasto_m_alimentacion_m | renta_hab_disp | p_t | p_ed_80_mas_h | p_ed_00_14_m | p_ed_15_24_m | p_ed_25_44_m | p_ed_45_64_m | p_ed_65_79_m | p_ed_80_mas_m | dist_cc | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | SAN BERNARDO 76 | CALLE DEL DIVINO PASTOR 76 | 28004 | MADRID | COMUNIDAD DE MADRID | 22.46 | CALLE DEL DIVINO PASTOR 76, 28004 | POINT (-3.70394 40.42763) | MULTIPOLYGON (((-3.70660 40.42780, -3.70651 40... | MULTIPOLYGON (((-3.70986 40.42900, -3.70926 40... | 21752 | 1241 | 323.344538 | 216.393397 | 1759.796046 | 1065.119112 | 350.138129 | 3664.562616 | 19147.518516 | 8098.325095 | 128.170807 | 329.781022 | 291.551634 | 1657.004454 | 1096.365846 | 513.270507 | 367.389603 | 1192.943298 |
1 | ARAVACA | AVENIDA DE LA OSA MAYOR 62 | 28023 | MADRID | COMUNIDAD DE MADRID | 17.21 | AVENIDA DE LA OSA MAYOR 62, 28023 | POINT (-3.78261 40.45914) | MULTIPOLYGON (((-3.78539 40.45990, -3.78496 40... | MULTIPOLYGON (((-3.78900 40.45938, -3.78874 40... | 6450 | 199 | 210.641714 | 149.459618 | 335.942818 | 358.687113 | 135.850954 | 4390.010598 | 19620.953784 | 2805.708506 | 74.379705 | 213.150064 | 152.993159 | 411.966220 | 436.088810 | 196.180844 | 130.367487 | 8195.490184 |
dist_array = cdist(stores.set_geometry('geometry').to_crs('epsg:25830').geometry.apply(lambda point:[point.x, point.y]).tolist(),
stores.set_geometry('geometry').to_crs('epsg:25830').geometry.apply(lambda point:[point.x, point.y]).tolist())
stores['distance_closest_ce'] = list(map(lambda dist_a:np.max(np.partition(dist_a, 2)[:2]), dist_array))
stores.head()
name | mainaddressline | postcode | areaname2 | areaname1 | annual_sales | complete_address | geometry | iso_5walk | iso_10walk | n_pois | n_competitors | p_ed_00_14_h | p_ed_15_24_h | p_ed_25_44_h | p_ed_45_64_h | p_ed_65_79_h | c02_01_gasto_m_alimentacion_m | renta_hab_disp | p_t | p_ed_80_mas_h | p_ed_00_14_m | p_ed_15_24_m | p_ed_25_44_m | p_ed_45_64_m | p_ed_65_79_m | p_ed_80_mas_m | dist_cc | distance_closest_ce | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | SAN BERNARDO 76 | CALLE DEL DIVINO PASTOR 76 | 28004 | MADRID | COMUNIDAD DE MADRID | 22.46 | CALLE DEL DIVINO PASTOR 76, 28004 | POINT (-3.70394 40.42763) | MULTIPOLYGON (((-3.70660 40.42780, -3.70651 40... | MULTIPOLYGON (((-3.70986 40.42900, -3.70926 40... | 21752 | 1241 | 323.344538 | 216.393397 | 1759.796046 | 1065.119112 | 350.138129 | 3664.562616 | 19147.518516 | 8098.325095 | 128.170807 | 329.781022 | 291.551634 | 1657.004454 | 1096.365846 | 513.270507 | 367.389603 | 1192.943298 | 107.360057 |
1 | ARAVACA | AVENIDA DE LA OSA MAYOR 62 | 28023 | MADRID | COMUNIDAD DE MADRID | 17.21 | AVENIDA DE LA OSA MAYOR 62, 28023 | POINT (-3.78261 40.45914) | MULTIPOLYGON (((-3.78539 40.45990, -3.78496 40... | MULTIPOLYGON (((-3.78900 40.45938, -3.78874 40... | 6450 | 199 | 210.641714 | 149.459618 | 335.942818 | 358.687113 | 135.850954 | 4390.010598 | 19620.953784 | 2805.708506 | 74.379705 | 213.150064 | 152.993159 | 411.966220 | 436.088810 | 196.180844 | 130.367487 | 8195.490184 | 1570.310251 |
2 | RAMIREZ DEL PRADO | CALLE DE JUAN DE MARIANA 8 | 28045 | MADRID | COMUNIDAD DE MADRID | 19.27 | CALLE DE JUAN DE MARIANA 8, 28045 | POINT (-3.68693 40.39849) | MULTIPOLYGON (((-3.68875 40.39965, -3.68849 40... | MULTIPOLYGON (((-3.69149 40.39690, -3.69141 40... | 2622 | 108 | 169.482196 | 108.534454 | 281.058539 | 286.753594 | 50.599275 | 4048.951252 | 19705.219263 | 1830.630552 | 10.075373 | 139.002134 | 110.586285 | 275.670275 | 312.001228 | 62.222286 | 24.644911 | 2473.420416 | 1228.590695 |
3 | PASTORA IMPERIO. 1 | CALLE PASTORA IMPERIO 1 | 28036 | MADRID | COMUNIDAD DE MADRID | 17.53 | CALLE PASTORA IMPERIO 1, 28036 | POINT (-3.67468 40.48099) | MULTIPOLYGON (((-3.67553 40.48325, -3.67493 40... | MULTIPOLYGON (((-3.67553 40.48462, -3.67510 40... | 2777 | 93 | 156.911911 | 131.700549 | 285.061523 | 289.686083 | 154.947590 | 4608.221456 | 23047.014256 | 2194.433129 | 33.851479 | 142.951123 | 129.396225 | 286.993301 | 358.034212 | 177.399811 | 47.499323 | 7519.491317 | 692.176192 |
4 | BÉJAR | CALLE DE BÉJAR 1 | 28028 | MADRID | COMUNIDAD DE MADRID | 25.91 | CALLE DE BÉJAR 1, 28028 | POINT (-3.67456 40.43516) | MULTIPOLYGON (((-3.67621 40.43724, -3.67561 40... | MULTIPOLYGON (((-3.68033 40.43655, -3.67990 40... | 16508 | 1034 | 358.786956 | 289.856363 | 1243.738389 | 840.314021 | 444.093999 | 3843.137973 | 20587.524114 | 7697.106579 | 190.240250 | 365.406908 | 302.177966 | 1410.380031 | 1100.172199 | 691.146655 | 460.792842 | 3175.485661 | 426.405602 |
Map(Layer(stores,
geom_col='iso_5walk',
style=color_bins_style('n_competitors'),
legends=color_bins_legend('# Competitors', 'competitos within 10-minute driving isochrone'),
popup_hover=[popup_element('name', 'Name'),
popup_element('n_pois', 'Number of POIs'),
popup_element('n_competitors', 'Number of competitors'),
popup_element('p_t', 'Population coverage'),
popup_element('c02_01_gasto_m_alimentacion_m', 'Groceries spending'),
popup_element('renta_hab_disp', 'income'),
popup_element('distance_closest_ce', 'Distance to closest CE')],
widgets=[histogram_widget('n_pois', 'Number of POIs', description='Select a range of values to filter', buckets=10),
histogram_widget('n_competitors', 'Number of competitors', description='Select a range of values to filter', buckets=10),
histogram_widget('dist_cc', 'Distance to city center', description='Select a range of values to filter', buckets=10),
histogram_widget('distance_closest_ce', 'Distance to closest CE store', description='Select a range of values to filter', buckets=10)]))
Once we have calculated our store's cathcment areas and enriched them, the next steps required to build a predictive model would be:
We suggest using Regression-kriging as prediction technique that combines a regression of the dependent variable on auxiliary variables with kriging of the regression residuals.
If you are interested in learning more about Revenue Prediction for Site Selection and how Regression-kriging is applied, take a look at this blog post where we describe an end-to-end revenue prediction use case.