#!/usr/bin/env python
# coding: utf-8

# ## Quickstart
#
# Hi! Glad to see you made it to the Quickstart guide! This guide introduces how data scientists can use CARTOframes in spatial analysis workflows. Using simulated Starbucks revenue data, it walks through some common steps a data scientist takes to answer the following question: which stores are performing better than others?
#
# Before you get started, we encourage you to install CARTOframes and Python 3 so you can get a feel for the library by using it:

# In[1]:


# pip install cartoframes


# For other ways to install CARTOframes, check out the [Installation guide](/developers/cartoframes/guides/Installation).

# #### Spatial analysis scenario
#
# Let's say you are a data scientist working for Starbucks and you want to better understand why some stores in Brooklyn, New York, perform better than others.
#
# To begin, let's outline a workflow:
#
# - Get and explore your own data
# - Create areas of influence for your stores
# - Enrich your data
# - And finally, share the results of your analysis with your team
#
# Let's get started!

# ### Get and explore your own data
#
# We are going to use a dataset that contains the location of each Starbucks store and its annual revenue.
#
# As a first exploratory step, read it into a Jupyter Notebook using [pandas](https://pandas.pydata.org/).

# In[2]:


import pandas as pd

# Load the store addresses and annual revenue from the sample CSV
stores_df = pd.read_csv('http://libs.cartocdn.com/cartoframes/files/starbucks_brooklyn.csv')
stores_df.head()


# To display your stores as points on a map, you first have to convert the `address` column into geometries. This process is called [geocoding](https://carto.com/help/working-with-data/geocoding-types/) and CARTO provides a straightforward way to do it (you can learn more about it in the [Data Services guide](/developers/cartoframes/guides/Data-Services)).
#
# In order to geocode, you have to set your CARTO credentials.
# If you aren't sure about your API key, check the [Authentication guide](/developers/cartoframes/guides/Authentication/) to learn how to get it. If you want to see the geocoded result without being logged in, [you can get it here](http://libs.cartocdn.com/cartoframes/files/starbucks_brooklyn_geocoded.csv).
#
# > Note: If you don't have an account yet, you can get a free trial account by [signing up here](https://carto.com/signup/).

# In[3]:


from cartoframes.auth import set_default_credentials

# Read the CARTO credentials from a local JSON file
set_default_credentials('creds.json')


# Now that your credentials are set, we are ready to geocode the dataframe. The resulting data will be a [GeoDataFrame](http://geopandas.org/data_structures.html#geodataframe).

# In[4]:


from cartoframes.data.services import Geocoding

# The second return value is not needed here, so it is discarded
stores_gdf, _ = Geocoding().geocode(stores_df, street='address')
stores_gdf.head()


# Done! Now that the stores are geocoded, you will notice that a new column named `geometry` has been added. This column stores the geographic location of each store and is used to plot each location on the map.
#
# You can quickly visualize your geocoded dataframe using the Map and Layer classes. Check out the [Data Visualization guide](/developers/cartoframes/guides/Data-Visualization) to learn more about the visualization capabilities of CARTOframes.

# In[5]:


from cartoframes.viz import Map, Layer

Map(Layer(stores_gdf))


# Great! You have a map!
#
# With the stores plotted on the map, you now have a better sense of where each one is. To continue your exploration, you want to know which stores earn the most yearly revenue. To do this, you can use the [`size_continuous_style`](/developers/cartoframes/examples/#example-size-continuous-style) visualization layer:

# In[6]:


from cartoframes.viz import Map, Layer, size_continuous_style

# Scale each point by the store's revenue
Map(Layer(stores_gdf, size_continuous_style('revenue', size_range=[10,40]), title='Annual Revenue ($)'))


# Good job!
# By using the [`size continuous visualization style`](/developers/cartoframes/examples/#example-size-continuous-style) you can see right away where the stores with higher revenue are. By default, visualization styles also provide a popup with the mapped value and an appropriate legend.

# ### Create your areas of influence
#
# Similar to geocoding, there is a straightforward method for creating isochrones to define areas of influence around each store. An isochrone is a polygon that encloses the area reachable from a point within a given travel time.
#
# For this analysis, let's create isochrones for each store that cover the area within a 15-minute walk.
#
# To do this you will use the Isolines data service:

# In[7]:


from cartoframes.data.services import Isolines

# 15*60 matches the 15-minute walk described above
isochrones_gdf, _ = Isolines().isochrones(stores_gdf, [15*60], mode='walk')
isochrones_gdf.head()


# In[8]:


stores_map = Map([
    Layer(isochrones_gdf),
    Layer(stores_gdf, size_continuous_style('revenue', size_range=[10,40]), title='Annual Revenue ($)')
])

stores_map


# There they are! To learn more about creating isochrones and isodistances, check out the [Data Services guide](/developers/cartoframes/guides/Data-Services).
#
# > Note: You will see how to publish a map in the last section. If you want to publish this map right away, you can do it by calling `stores_map.publish('starbucks_isochrones', password=None)`.

# ### Enrich your data
#
# Now that you have the area of influence calculated for each store, let's take a look at how to augment the result with population information to better understand a store's average revenue per person.
#
# > Note: To be able to use the Enrichment functions you need an Enterprise CARTO account with Data Observatory 2.0 enabled. Please contact your Customer Success Manager or contact us at [sales@carto.com](mailto:sales@carto.com) for more information.
#
# First, let's find the demographic variable we need.
# We will use the `Catalog` class, which can be filtered by country and category. In our case, we have to look for USA demographics datasets. Let's see which public ones are available.

# In[9]:


from cartoframes.data.observatory import Catalog

datasets_df = Catalog().country('usa').category('demographics').datasets.to_dataframe()
# Keep only the datasets flagged as public
datasets_df[datasets_df['is_public_data'] == True]


# Nice! Let's take the first one (`acs_sociodemogr_b758e778`), which has aggregated data from 2013 to 2018, and check which of its variables have data about the total population.

# In[10]:


from cartoframes.data.observatory import Dataset

dataset = Dataset.get('acs_sociodemogr_b758e778')
variables_df = dataset.variables.to_dataframe()
# Case-insensitive search in the variable descriptions
variables_df[variables_df['description'].str.contains('total population', case=False, na=False)]


# We can see that the variable containing the total population is the one with the slug `total_pop_3cf008b3`. Now we are ready to enrich our areas of influence with that variable.

# In[11]:


from cartoframes.data.observatory import Variable
from cartoframes.data.observatory import Enrichment

variable = Variable.get('total_pop_3cf008b3')

isochrones_gdf = Enrichment().enrich_polygons(isochrones_gdf, [variable])
isochrones_gdf.head()


# Great! Let's see the result on a map:

# In[12]:


from cartoframes.viz import color_continuous_style

Map([
    Layer(isochrones_gdf, color_continuous_style('total_pop'), title='Total Population'),
    Layer(stores_gdf, size_continuous_style('revenue', size_range=[10,40]), title='Annual Revenue ($)')
])


# At this stage, we could say that the store on the right performs better than the others: its area of influence has the lowest population, yet its revenue is not the lowest. This insight will help us decide where to focus in further analyses.
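#
# The comparison above can also be put into numbers by dividing each store's revenue by the population of its area of influence. The following is only a minimal, library-free sketch: the `store_id` field, the `revenue_per_person` helper and all figures are hypothetical and are not taken from the dataset. In the guide's workflow, the inputs would be the `revenue` column of `stores_gdf` and the `total_pop` column of the enriched `isochrones_gdf`.

```python
# Hypothetical sample values, one row per store and its area of influence.
stores = [
    {'store_id': 'A', 'revenue': 1_200_000, 'total_pop': 40_000},
    {'store_id': 'B', 'revenue': 900_000, 'total_pop': 15_000},
    {'store_id': 'C', 'revenue': 1_500_000, 'total_pop': 60_000},
]


def revenue_per_person(rows):
    """Return the rows ranked by revenue per inhabitant, highest first."""
    ranked = []
    for row in rows:
        # Skip areas without population to avoid dividing by zero.
        if row['total_pop'] > 0:
            ratio = row['revenue'] / row['total_pop']
            ranked.append({**row, 'revenue_per_person': ratio})
    return sorted(ranked, key=lambda r: r['revenue_per_person'], reverse=True)


ranked = revenue_per_person(stores)
for row in ranked:
    print(row['store_id'], round(row['revenue_per_person'], 2))
```

# With these made-up numbers, store B ranks first even though its total revenue is the lowest, which is the same kind of pattern the map suggests.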
#
# To learn more about discovering and enriching your data with thousands of public and premium datasets, check out the [Data Observatory guide](/developers/cartoframes/guides/Data-Observatory).

# ### Publish and share your results
#
# The final step of this workflow is to share this interactive map with your colleagues so they can explore the information on their own. Let's do it!
#
# First, let's add widgets to it so people can see some graphs of the information and filter it. To do this, we only have to add `default_widget=True` to the layers.

# In[13]:


result_map = Map([
    Layer(
        isochrones_gdf,
        color_continuous_style('total_pop', stroke_width=0, opacity=0.7),
        title='Total Population',
        default_widget=True
    ),
    Layer(
        stores_gdf,
        size_continuous_style('revenue', size_range=[10,40], stroke_color='white'),
        title='Annual Revenue ($)',
        default_widget=True
    )
])

result_map


# Cool! Now that you have a small dashboard to play with, let's publish it on CARTO so you can share it with anyone. To do this, you just need to call the publish method from the Map class:

# In[14]:


result_map.publish('starbucks_analysis', password=None, if_exists='replace')


# To improve the performance and reduce the size of your map, we recommend uploading the data to CARTO and using the table names in the layers instead. To upload your data, you just need to call `to_carto` with your GeoDataFrame:

# In[15]:


from cartoframes import to_carto

to_carto(stores_gdf, 'starbucks_stores', if_exists='replace')
to_carto(isochrones_gdf, 'starbucks_isochrones', if_exists='replace')


# ### Congratulations!
#
# You have finished this guide and have a sense of how CARTOframes can speed up your workflow. To continue learning, you can check out the other [Guides](/developers/cartoframes/guides), consult the [Reference](/developers/cartoframes/reference) to learn everything about a class or a method, or play with the [Examples](/developers/cartoframes/examples) to see CARTOframes in action.