#!/usr/bin/env python
# coding: utf-8

# ## Data Services
#
# You can connect to the [CARTO Data Services API](https://carto.com/developers/data-services-api/) directly from CARTOframes. This API consists of a set of location-based functions that can be applied to your data to perform geospatial analyses without leaving the context of your notebook. For instance, you can **geocode** a pandas DataFrame with addresses on the fly, and then perform a trade area analysis by computing **isodistances** or **isochrones** programmatically.
#
# Using Data Services requires authentication. For more information about how to authenticate, please read the [Authentication guide](/developers/cartoframes/guides/Authentication/). For further learning you can also check out the [Data Services examples](/developers/cartoframes/examples/#example-data-services).

# In[1]:


from cartoframes.auth import set_default_credentials

set_default_credentials('creds.json')


# > Depending on your CARTO account plan, some of these data services are subject to different [quota limitations](https://carto.com/developers/data-services-api/support/quota-information/).

# ### Geocoding
#
# To get started, let's read in and explore the Starbucks location data. With the store data in a DataFrame, we can see that there are two columns that can be used in the **geocoding** service: `name` and `address`. There's also a third column that reflects the annual revenue of the store.

# In[2]:


import pandas as pd

df = pd.read_csv('http://libs.cartocdn.com/cartoframes/samples/starbucks_brooklyn.csv')
df.head()


# #### Quota consumption
#
# Each time you run Data Services, quota is consumed. For this reason, you can check in advance the **amount of credits** an operation will consume by passing the `dry_run` parameter to the service function.
#
# You can also check your available quota by calling the `available_quota` function.

# In[3]:


from cartoframes.data.services import Geocoding

geo_service = Geocoding()

city_ny = {'value': 'New York'}
country_usa = {'value': 'USA'}

# Dry run: only the metadata (including the required quota) is of interest here
_, geo_dry_metadata = geo_service.geocode(df, street='address', city=city_ny, country=country_usa, dry_run=True)


# In[4]:


geo_dry_metadata


# In[5]:


geo_service.available_quota()


# In[6]:


geo_gdf, geo_metadata = geo_service.geocode(df, street='address', city=city_ny, country=country_usa)


# Let's compare `geo_dry_metadata` and `geo_metadata` to see the differences between the information returned with and without the `dry_run` option. The metadata from the real run shows that all the locations have been geocoded successfully and that the operation has consumed 10 credits of quota.

# In[7]:


geo_metadata


# In[8]:


geo_service.available_quota()


# If the input data file ever changes, cached results will only be applied to unmodified
# records, and new geocoding will be performed only on _new or changed records_. In order to use cached results, we have to save the results to a CARTO table using the `table_name` and `cached=True` parameters.

# The resulting data is a `GeoDataFrame` that contains three new columns:
#
# * `geometry`: The resulting geometry
# * `gc_status_rel`: The percentage of accuracy of each location
# * `carto_geocode_hash`: Geocode information

# In[9]:


geo_gdf.head()


# In addition, to avoid geocoding records that have been **previously geocoded**, and thus spending quota **unnecessarily**, you should always preserve the `the_geom` and `carto_geocode_hash` columns generated by the geocoding process.
#
# This will happen **automatically** in these cases:
#
# 1. Your input is a **table** from CARTO processed in place (without a `table_name` parameter).
# 2. You save your results to a CARTO table using the `table_name` parameter, and only use the resulting table for any further geocoding.
#
# If you try to geocode this DataFrame now that it contains both `the_geom` and `carto_geocode_hash`, you will see that the required quota is 0 because it has already been geocoded.

# In[10]:


_, geo_metadata = geo_service.geocode(geo_gdf, street='address', city=city_ny, country=country_usa, dry_run=True)


# In[11]:


geo_metadata.get('required_quota')


# #### Precision
#
# The `address` column is more complete than the `name` column, so geocoding by `address` yields more accurate coordinates. We can check this by geocoding with the `name` column instead: the resulting accuracy values are lower than the ones we get by using the `address` column.

# In[12]:


geo_name_gdf, geo_name_metadata = geo_service.geocode(df, street='name', city=city_ny, country=country_usa)


# In[13]:


# Accuracy values obtained when geocoding by `name`
geo_name_gdf.gc_status_rel.unique()


# In[14]:


# Accuracy values obtained when geocoding by `address`
geo_gdf.gc_status_rel.unique()


# #### Visualize the results
#
# Finally, we can visualize the precision of the geocoded results using a CARTOframes [visualization layer](/developers/cartoframes/examples/#example-color-bins-layer).

# In[15]:


from cartoframes.viz import Layer, color_bins_style, popup_element

Layer(
    geo_gdf,
    # One color bin per distinct precision value
    color_bins_style('gc_status_rel', method='equal', bins=geo_gdf.gc_status_rel.unique().size),
    popup_hover=[popup_element('address', 'Address'), popup_element('gc_status_rel', 'Precision')],
    title='Geocoding Precision'
)


# ### Isolines
#
# There are two **Isoline** functions: **isochrones** and **isodistances**. In this guide we will use the **isochrones** function to calculate walking areas _by time_ for each Starbucks store and the **isodistances** function to calculate the walking area _by distance_.
#
# By definition, isolines are concentric polygons that display equally calculated levels over a given surface area, and they are calculated as the intersection areas from the origin point, measured by:
#
# * **Time** in the case of **isochrones**
# * **Distance** in the case of **isodistances**

# #### Isochrones
#
# For isochrones, let's calculate the time ranges of 5, 15 and 30 minutes. These ranges are input in seconds, so they will be **300**, **900**, and **1800** respectively.

# In[16]:


from cartoframes.data.services import Isolines

iso_service = Isolines()

_, isochrones_dry_metadata = iso_service.isochrones(geo_gdf, [300, 900, 1800], mode='walk', dry_run=True)


# Remember to always **check the quota** using the `dry_run` parameter and the `available_quota` method before running the service!

# In[17]:


print('available {0}, required {1}'.format(
    iso_service.available_quota(),
    isochrones_dry_metadata.get('required_quota'))
)


# In[18]:


isochrones_gdf, isochrones_metadata = iso_service.isochrones(geo_gdf, [300, 900, 1800], mode='walk')


# In[19]:


isochrones_gdf.head()


# In[20]:


from cartoframes.viz import Layer, basic_style, basic_legend

Layer(isochrones_gdf, basic_style(opacity=0.5), basic_legend('Isochrones'))


# #### Isodistances
#
# For isodistances, let's calculate the distance ranges of 100, 500 and 1000 meters. These ranges are input in meters, so they will be **100**, **500**, and **1000** respectively.

# In[21]:


_, isodistances_dry_metadata = iso_service.isodistances(geo_gdf, [100, 500, 1000], mode='walk', dry_run=True)


# In[22]:


print('available {0}, required {1}'.format(
    iso_service.available_quota(),
    isodistances_dry_metadata.get('required_quota'))
)


# In[23]:


isodistances_gdf, isodistances_metadata = iso_service.isodistances(geo_gdf, [100, 500, 1000], mode='walk')


# In[24]:


isodistances_gdf.head()


# In[25]:


from cartoframes.viz import Layer, basic_style, basic_legend

Layer(isodistances_gdf, basic_style(opacity=0.5), basic_legend('Isodistances'))
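

# The dry-run-then-run pattern repeated throughout this guide could be wrapped in a small helper. The following is only a minimal sketch and not part of CARTOframes: `run_with_quota_check` is a hypothetical name, and it assumes nothing beyond what the cells in this guide show, namely that a service method accepts `dry_run=True` and returns a `(result, metadata)` pair whose metadata exposes `required_quota`.

```python
def run_with_quota_check(service_call, available_quota):
    """Dry-run a Data Services call and execute it only if the quota suffices.

    Hypothetical helper, not a CARTOframes API. `service_call` is any callable
    that accepts a `dry_run` keyword and returns a `(result, metadata)` tuple.
    """
    # First pass: ask how much quota the operation would require.
    _, dry_metadata = service_call(dry_run=True)
    required = dry_metadata.get('required_quota', 0)

    if required > available_quota:
        raise RuntimeError(
            'required quota {0} exceeds available quota {1}'.format(required, available_quota)
        )

    # Second pass: run the real operation with the default (non dry-run) behavior.
    return service_call()


# Possible usage with the objects defined in this guide (needs valid credentials):
#
#     from functools import partial
#
#     isochrones_gdf, isochrones_metadata = run_with_quota_check(
#         partial(iso_service.isochrones, geo_gdf, [300, 900, 1800], mode='walk'),
#         iso_service.available_quota()
#     )
```

# Taking the call as a callable keeps the helper independent of any particular service, so the same check would work for `geocode`, `isochrones`, and `isodistances`.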