This notebook shows how to use CARTOframes to enrich the area of influence of different POIs with data from CARTO's Data Observatory. Please, visit CARTOframes Guides to learn more about the enrichment functionality.
We will show CARTOframes enrichment functionality with an example in which we will quantify the number of eating places within a 5-minute isochrone for all sports POI's in Madrid downtown.
The notebook is organized as follows:
Note for this notebook we are using the premium dataset of Pitney Bowes POI's in Spain.
import geopandas as gpd
import matplotlib.pyplot as plt
import seaborn as sns
from cartoframes.auth import set_default_credentials
from cartoframes.data.observatory import *
from cartoframes.data.services import Isolines
from cartoframes.viz import *
sns.set_style('whitegrid')
%matplotlib inline
In order to be able to use the Data Observatory via CARTOframes, you need to set your CARTO account credentials first.
Please, visit the Authentication guide for further detail.
from cartoframes.auth import set_default_credentials
set_default_credentials('creds.json')
Note about credentials
For security reasons, we recommend storing your credentials in an external file to prevent publishing them by accident when sharing your notebooks. You can get more information in the section Setting your credentials of the Authentication guide.
We need to start with the initial DataFrame that we would like to enrich. Normally, this initial DataFrame contains your own data that you later enrich with data from the Data Observatory. In this case, we will download all sports POI's and use it as our initial DataFrame.
We first check that we are subscribed to PB POIs dataset in Spain and download the sports POI's within a bounding box covering Madrid downtown. You can calculate your bounding box of interest using bboxfinder.
For a step by step description on how to discover and download premium datasets, take a look at templates: Data Discovery and Access Premium Data.
Catalog().subscriptions().datasets.to_dataframe()
slug | name | description | category_id | country_id | data_source_id | provider_id | geography_name | geography_description | temporal_aggregation | time_coverage | update_frequency | is_public_data | lang | version | category_name | provider_name | geography_id | id | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | ags_sociodemogr_a7e14220 | Sociodemographics - United States of America (... | Census and ACS sociodemographic data estimated... | demographics | usa | sociodemographics | ags | Census Block Group - United States of America | None | yearly | None | yearly | False | eng | 2020 | Demographics | Applied Geographic Solutions | carto-do.ags.geography_usa_blockgroup_2015 | carto-do.ags.demographics_sociodemographics_us... |
1 | ags_retailpoten_aaf25a8c | Retail Potential - United States of America (C... | The retail potential database consists of aver... | demographics | usa | retailpotential | ags | Census Block Group - United States of America ... | Shoreline clipped TIGER/Line boundaries. More ... | yearly | [2018-01-01, 2019-01-01) | yearly | False | eng | 2019 | Demographics | Applied Geographic Solutions | carto-do-public-data.carto.geography_usa_block... | carto-do.ags.demographics_retailpotential_usa_... |
2 | pb_consumer_po_62cddc04 | Points Of Interest - Consumer - United States ... | Consumer Point of interest database per catego... | points_of_interest | usa | consumer_points_of_interest | pitney_bowes | Latitude/Longitude - United States of America | Location of Points of Interest | monthly | None | monthly | False | eng | v1 | Points of Interest | Pitney Bowes | carto-do.pitney_bowes.geography_usa_latlon_v1 | carto-do.pitney_bowes.pointsofinterest_consume... |
3 | ags_sociodemogr_f510a947 | Sociodemographics - United States of America (... | Census and ACS sociodemographic data estimated... | demographics | usa | sociodemographics | ags | Census Block Group - United States of America ... | Shoreline clipped TIGER/Line boundaries. More ... | yearly | [2019-01-01, 2020-01-01) | yearly | False | eng | 2019 | Demographics | Applied Geographic Solutions | carto-do-public-data.carto.geography_usa_block... | carto-do.ags.demographics_sociodemographics_us... |
4 | ags_consumer_sp_dbabddfb | Consumer Spending - United States of America (... | The Consumer Expenditure database consists of ... | demographics | usa | consumer_spending | ags | Census Block Group - United States of America | None | yearly | None | yearly | False | eng | 2020 | Demographics | Applied Geographic Solutions | carto-do.ags.geography_usa_blockgroup_2015 | carto-do.ags.demographics_consumerspending_usa... |
5 | spa_geosocial_s_d5dc42ae | Geosocial Segments - United States of America ... | By analysing feeds from Twitter, Instagram, Me... | behavioral | usa | geosocial_segments | spatial_ai | Census Block Group - United States of America ... | Shoreline clipped TIGER/Line boundaries. More ... | quarterly | [2020-01-01, 2020-04-01) | quarterly | False | eng | v1 | Behavioral | Spatial.ai | carto-do-public-data.carto.geography_usa_block... | carto-do.spatial_ai.behavioral_geosocialsegmen... |
6 | mc_geographic__7980c5c3 | Geographic Insights - United States of America... | Geographic Insights validate, evaluate and ben... | financial | usa | geographic_insights | mastercard | Census Block Group - United States of America ... | Shoreline clipped TIGER/Line boundaries. More ... | monthly | [2019-01-01, 2020-01-01) | monthly | False | eng | v1 | Financial | Mastercard | carto-do-public-data.carto.geography_usa_block... | carto-do.mastercard.financial_geographicinsigh... |
7 | pb_points_of_i_94bda91b | Points Of Interest - Spain (Latitude/Longitude) | Point of interest database per categories | points_of_interest | esp | points_of_interest | pitney_bowes | Latitude/Longitude - Spain | Location of Points of Interest | monthly | None | monthly | False | eng | v1 | Points of Interest | Pitney Bowes | carto-do.pitney_bowes.geography_esp_latlon_v1 | carto-do.pitney_bowes.pointsofinterest_pointso... |
pois_ds = Dataset.get('pb_points_of_i_94bda91b')
pois_ds.head()
HTTP | ISO3 | NAME | SIC1 | SIC2 | SIC8 | CLASS | GROUP | PB_ID | ... | GLOBAL_ULTIMATE_AREANAME1 | GLOBAL_ULTIMATE_AREANAME3 | GLOBAL_ULTIMATE_INDICATOR | DOMESTIC_ULTIMATE_POSTCODE | DOMESTIC_ULTIMATE_AREANAME1 | DOMESTIC_ULTIMATE_AREANAME3 | GLOBAL_ULTIMATE_BUSINESS_NAME | GLOBAL_ULTIMATE_STREET_ADDRESS | DOMESTIC_ULTIMATE_BUSINESS_NAME | DOMESTIC_ULTIMATE_STREET_ADDRESS | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | None | ESP | EL SANTO | None | 6552 | 50440000 | DRINKING PLACES | None | SHOPPING | 1173764019 | ... | None | None | None | None | None | None | None | None | None | CARRETERA DE CADIZ 206 |
1 | None | ESP | LA FUENTE | None | 6552 | 50440000 | DRINKING PLACES | None | SHOPPING | 1432760662 | ... | None | None | None | None | None | None | None | None | None | PASEO DE MIRAMON 185 |
2 | None | ESP | SANT FRANCESC XAVIER | None | 6541 | 50440000 | DRINKING PLACES | None | SHOPPING | 1170864171 | ... | None | None | None | None | None | None | None | None | None | PASEO CASTELLANA, 120 - IZ BJ |
3 | None | ESP | RUFINO BLANCO | None | 6552 | 50440000 | DRINKING PLACES | None | SHOPPING | 1171185920 | ... | None | None | None | None | None | None | None | None | None | AVENIDA DEL CARMEN (ED EL FARO), BL 3 LOC |
4 | None | ESP | CASA CONVALESCÈNCIA | None | None | 50440000 | DRINKING PLACES | None | SHOPPING | 1173842007 | ... | None | None | None | None | None | None | None | None | None | CARRETERA PALAU (KM 1) |
5 | None | ESP | ROSALÍA DE CASTRO | None | None | 50440000 | DRINKING PLACES | None | PERSONAL SERVICES | 1505524737 | ... | None | None | None | None | None | None | None | None | None | CALLE MIGUEL VAZQUEZ DELGADO 71 |
6 | None | ESP | CENTRO DE FORMACIÓN Y EMPLEO | None | None | 50440000 | DRINKING PLACES | None | PERSONAL SERVICES | 1173898606 | ... | None | None | None | None | None | None | None | None | None | CALLE ANTIC CAMI DE XIMELIS 19 |
7 | None | ESP | EFA EL SOTO | None | None | 50440000 | DRINKING PLACES | None | PERSONAL SERVICES | 1293842742 | ... | None | None | None | None | None | None | None | None | None | AVENIDA GENERAL PERON (ED MASTER'S I), 38 - PI... |
8 | None | ESP | O CASTIÑEIRO | None | None | 50440000 | DRINKING PLACES | None | PERSONAL SERVICES | 1172241073 | ... | None | None | None | None | None | None | None | None | None | CALLE BRUC DEL MIG 8 |
9 | None | ESP | CPEB DE CABAÑAQUINTA | None | None | 50440000 | DRINKING PLACES | None | PERSONAL SERVICES | 1171203786 | ... | None | None | None | None | None | None | None | None | None | CALLE MAYOR, 32 - 1 A |
10 rows × 74 columns
sql_query = """
SELECT * except(do_label) FROM $dataset$
WHERE TRADE_DIVISION = 'DIVISION M. - SPORTS'
AND ST_IntersectsBox(geom, -3.716398,40.407437,-3.690477,40.425277)
"""
pois_df = pois_ds.to_dataframe(sql_query=sql_query)
# To keep only the latest version of POI's
pois_df = pois_df.sort_values(['NAME', 'do_date']).groupby('NAME').first().reset_index()
pois_df.head()
NAME | geoid | do_date | BRANDNAME | PB_ID | TRADE_NAME | FRANCHISE_NAME | ISO3 | AREANAME4 | AREANAME3 | ... | GLOBAL_ULTIMATE_STREET_ADDRESS | GLOBAL_ULTIMATE_AREANAME3 | GLOBAL_ULTIMATE_AREANAME1 | GLOBAL_ULTIMATE_COUNTRY | GLOBAL_ULTIMATE_POSTCODE | FAMILY_MEMBERS | HIERARCHY_CODE | TICKER_SYMBOL | EXCHANGE_NAME | geom | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | ACQUAPLAYA SPA | 2204263540#-3.7095628#40.4226394 | 2020-04-01 | NaN | 2204263540 | NaN | NaN | ESP | NaN | MADRID | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | POINT (-3.70956 40.42264) |
1 | ALAMBIQUE TIENDA Y ESCUELA DE COCINA | 2157202351#-3.7109121#40.4198204 | 2019-12-01 | NaN | 2157202351 | NaN | NaN | ESP | NaN | MADRID | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | POINT (-3.71091 40.41982) |
2 | ALCÁZAR NIGHT | 2137823204#-3.69905#40.4178 | 2019-12-01 | NaN | 2137823204 | NaN | NaN | ESP | NaN | MADRID | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | POINT (-3.69905 40.41780) |
3 | ALICIA PRODUCE | 2181768913#-3.7112#40.42404 | 2020-02-01 | NaN | 2181768913 | NaN | NaN | ESP | NaN | MADRID | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | POINT (-3.71120 40.42404) |
4 | ALMA PILATES | 2197072938#-3.7033635#40.4141477 | 2020-04-01 | NaN | 2197072938 | NaN | NaN | ESP | NaN | MADRID | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | POINT (-3.70336 40.41415) |
5 rows × 74 columns
For this analysis, we are interested in knowing the number of eating places reachable within 5 minutes for every sport POI. We'll now proceed to calculate 5-minute isochrones for every POI, which represent the area reachable within 5 minutes.
You can read more regarding isochrones on CARTOframes Guides.
iso_service = Isolines()
isochrones_gdf, isochrones_metadata = iso_service.isochrones(pois_df, [300], mode='walk', geom_col='geom')
Success! Isolines created correctly
isochrones_gdf.head()
source_id | data_range | the_geom | |
---|---|---|---|
0 | 0 | 300 | MULTIPOLYGON (((-3.71192 40.42488, -3.71149 40... |
1 | 1 | 300 | MULTIPOLYGON (((-3.71346 40.42093, -3.71321 40... |
2 | 2 | 300 | MULTIPOLYGON (((-3.70248 40.41750, -3.70222 40... |
3 | 3 | 300 | MULTIPOLYGON (((-3.71346 40.42437, -3.71338 40... |
4 | 4 | 300 | MULTIPOLYGON (((-3.70660 40.41544, -3.70634 40... |
pois_df['isochrone'] = isochrones_gdf.sort_values('source_id')['the_geom'].values
pois_df.head()
NAME | geoid | do_date | BRANDNAME | PB_ID | TRADE_NAME | FRANCHISE_NAME | ISO3 | AREANAME4 | AREANAME3 | ... | GLOBAL_ULTIMATE_AREANAME3 | GLOBAL_ULTIMATE_AREANAME1 | GLOBAL_ULTIMATE_COUNTRY | GLOBAL_ULTIMATE_POSTCODE | FAMILY_MEMBERS | HIERARCHY_CODE | TICKER_SYMBOL | EXCHANGE_NAME | geom | isochrone | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | ACQUAPLAYA SPA | 2204263540#-3.7095628#40.4226394 | 2020-04-01 | NaN | 2204263540 | NaN | NaN | ESP | NaN | MADRID | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | POINT (-3.70956 40.42264) | MULTIPOLYGON (((-3.71192 40.42488, -3.71149 40... |
1 | ALAMBIQUE TIENDA Y ESCUELA DE COCINA | 2157202351#-3.7109121#40.4198204 | 2019-12-01 | NaN | 2157202351 | NaN | NaN | ESP | NaN | MADRID | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | POINT (-3.71091 40.41982) | MULTIPOLYGON (((-3.71346 40.42093, -3.71321 40... |
2 | ALCÁZAR NIGHT | 2137823204#-3.69905#40.4178 | 2019-12-01 | NaN | 2137823204 | NaN | NaN | ESP | NaN | MADRID | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | POINT (-3.69905 40.41780) | MULTIPOLYGON (((-3.70248 40.41750, -3.70222 40... |
3 | ALICIA PRODUCE | 2181768913#-3.7112#40.42404 | 2020-02-01 | NaN | 2181768913 | NaN | NaN | ESP | NaN | MADRID | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | POINT (-3.71120 40.42404) | MULTIPOLYGON (((-3.71346 40.42437, -3.71338 40... |
4 | ALMA PILATES | 2197072938#-3.7033635#40.4141477 | 2020-04-01 | NaN | 2197072938 | NaN | NaN | ESP | NaN | MADRID | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | POINT (-3.70336 40.41415) | MULTIPOLYGON (((-3.70660 40.41544, -3.70634 40... |
5 rows × 75 columns
Map([Layer(pois_df, geom_col='geom'),
Layer(pois_df, geom_col='isochrone', style=basic_style(opacity=0.1))])
We will now proceed to enrich our DataFrame.
For enriching datasets, we use the Enrichment class. Please, visit CARTOframes Guides to learn more.
enrichment = Enrichment()
We will start by simply counting the number of POI's within each isochrone. This will allow us to measure how busy the area around each sport POI is.
In order to do this, we will use the Enrichment function enrich_polygons()
for which we can select any variable, because we are only interested in counting POIs. That is why we selected the variable CLASS_517d6003
that we will use later. Remember you can access the dataset variables doing pois_ds.variables.to_dataframe()
.
Note that we need to specify the name of the geometry column (geom_col
) because we are working with a DataFrame instead of a GeoDataFrame.
# Here we can use any variable because we're only interested in counts
pois_df = enrichment.enrich_polygons(
pois_df,
variables=['CLASS_517d6003'],
aggregation='COUNT',
geom_col='isochrone'
)
# We rename the column name to give it a more descriptive name
pois_df.rename(columns={'CLASS_y':'n_pois'}, inplace=True)
pois_df.head()
NAME | geoid | do_date | BRANDNAME | PB_ID | TRADE_NAME | FRANCHISE_NAME | ISO3 | AREANAME4 | AREANAME3 | ... | GLOBAL_ULTIMATE_AREANAME1 | GLOBAL_ULTIMATE_COUNTRY | GLOBAL_ULTIMATE_POSTCODE | FAMILY_MEMBERS | HIERARCHY_CODE | TICKER_SYMBOL | EXCHANGE_NAME | geom | isochrone | n_pois | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | ACQUAPLAYA SPA | 2204263540#-3.7095628#40.4226394 | 2020-04-01 | NaN | 2204263540 | NaN | NaN | ESP | NaN | MADRID | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | POINT (-3.70956 40.42264) | MULTIPOLYGON (((-3.71192 40.42488, -3.71149 40... | 31977 |
1 | ALAMBIQUE TIENDA Y ESCUELA DE COCINA | 2157202351#-3.7109121#40.4198204 | 2019-12-01 | NaN | 2157202351 | NaN | NaN | ESP | NaN | MADRID | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | POINT (-3.71091 40.41982) | MULTIPOLYGON (((-3.71346 40.42093, -3.71321 40... | 10131 |
2 | ALCÁZAR NIGHT | 2137823204#-3.69905#40.4178 | 2019-12-01 | NaN | 2137823204 | NaN | NaN | ESP | NaN | MADRID | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | POINT (-3.69905 40.41780) | MULTIPOLYGON (((-3.70248 40.41750, -3.70222 40... | 19947 |
3 | ALICIA PRODUCE | 2181768913#-3.7112#40.42404 | 2020-02-01 | NaN | 2181768913 | NaN | NaN | ESP | NaN | MADRID | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | POINT (-3.71120 40.42404) | MULTIPOLYGON (((-3.71346 40.42437, -3.71338 40... | 24480 |
4 | ALMA PILATES | 2197072938#-3.7033635#40.4141477 | 2020-04-01 | NaN | 2197072938 | NaN | NaN | ESP | NaN | MADRID | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | POINT (-3.70336 40.41415) | MULTIPOLYGON (((-3.70660 40.41544, -3.70634 40... | 30348 |
5 rows × 76 columns
Map(Layer(pois_df, geom_col='geom',
style=size_continuous_style('n_pois'),
legends=size_continuous_legend('# POIs'),
popup_hover=[popup_element('NAME', 'Name'),
popup_element('n_pois', 'Number of POIs')]))
Now, we are interested in getting the number of eating places within a 5-minute isochrone for every sport POI. This requires using a filter to indicate that only eating places should be counted. Filters are added in a dictionary-like format, where the key is the filtering variable and the value is the filtering value.
If you are interested in knowing how to identify the variable to use as filter, check out this notebook on how to access and download premium data.
pois_df = enrichment.enrich_polygons(
pois_df,
variables=['CLASS_517d6003'],
aggregation='COUNT',
geom_col='iso_10walk',
filters={Variable.get('CLASS_517d6003').id:"= 'EATING PLACES/RESTAURANTS'"}
)
# We rename the column name to give it a more descriptive name
pois_df.rename(columns={'CLASS':'n_pois_eating'}, inplace=True)
pois_df.head()
NAME | geoid | do_date | BRANDNAME | PB_ID | TRADE_NAME | FRANCHISE_NAME | ISO3 | AREANAME4 | AREANAME3 | ... | GLOBAL_ULTIMATE_COUNTRY | GLOBAL_ULTIMATE_POSTCODE | FAMILY_MEMBERS | HIERARCHY_CODE | TICKER_SYMBOL | EXCHANGE_NAME | geom | isochrone | n_pois | n_pois_eating | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | ACQUAPLAYA SPA | 2204263540#-3.7095628#40.4226394 | 2020-04-01 | NaN | 2204263540 | NaN | NaN | ESP | NaN | MADRID | ... | NaN | NaN | NaN | NaN | NaN | NaN | POINT (-3.70956 40.42264) | MULTIPOLYGON (((-3.71192 40.42488, -3.71149 40... | 31977 | 2052 |
1 | ALAMBIQUE TIENDA Y ESCUELA DE COCINA | 2157202351#-3.7109121#40.4198204 | 2019-12-01 | NaN | 2157202351 | NaN | NaN | ESP | NaN | MADRID | ... | NaN | NaN | NaN | NaN | NaN | NaN | POINT (-3.71091 40.41982) | MULTIPOLYGON (((-3.71346 40.42093, -3.71321 40... | 10131 | 1009 |
2 | ALCÁZAR NIGHT | 2137823204#-3.69905#40.4178 | 2019-12-01 | NaN | 2137823204 | NaN | NaN | ESP | NaN | MADRID | ... | NaN | NaN | NaN | NaN | NaN | NaN | POINT (-3.69905 40.41780) | MULTIPOLYGON (((-3.70248 40.41750, -3.70222 40... | 19947 | 1534 |
3 | ALICIA PRODUCE | 2181768913#-3.7112#40.42404 | 2020-02-01 | NaN | 2181768913 | NaN | NaN | ESP | NaN | MADRID | ... | NaN | NaN | NaN | NaN | NaN | NaN | POINT (-3.71120 40.42404) | MULTIPOLYGON (((-3.71346 40.42437, -3.71338 40... | 24480 | 2125 |
4 | ALMA PILATES | 2197072938#-3.7033635#40.4141477 | 2020-04-01 | NaN | 2197072938 | NaN | NaN | ESP | NaN | MADRID | ... | NaN | NaN | NaN | NaN | NaN | NaN | POINT (-3.70336 40.41415) | MULTIPOLYGON (((-3.70660 40.41544, -3.70634 40... | 30348 | 2744 |
5 rows × 77 columns
Map(Layer(pois_df, geom_col='geom',
style=size_continuous_style('n_pois_eating'),
legends=size_continuous_legend('# Eating POIs'),
popup_hover=[popup_element('NAME', 'Name'),
popup_element('n_pois_eating', 'Number of eating places')]))
Let's now take a look at how the total number of POI's and eating places around sport POI's correlate.
Layout([Map(Layer(pois_df, geom_col='geom',
style=size_continuous_style('n_pois'),
legends=size_continuous_legend('# POIs'),
popup_hover=[popup_element('NAME', 'Name'),
popup_element('n_pois', 'Number of POIs')])),
Map(Layer(pois_df, geom_col='geom',
style=size_continuous_style('n_pois_eating'),
legends=size_continuous_legend('# Eating POIs'),
popup_hover=[popup_element('NAME', 'Name'),
popup_element('n_pois_eating', 'Number of eating places')]))],
map_height=550)
plt.figure(figsize=(12,5))
sns.regplot(pois_df['n_pois'], pois_df['n_pois_eating'],
scatter_kws={'color':'blue', 'alpha':0.5}, line_kws={'color':'red'})
<matplotlib.axes._subplots.AxesSubplot at 0x7f6970b46cf8>