Data science workflows that leverage CARTO
Many data scientists do their analysis in the de facto standard environment: pandas in a Jupyter notebook. We want to support that by creating a Python module that allows these users to develop analyses while seamlessly interacting with CARTO. The workflows below show what we aim to support.
You'll need the following for this: your CARTO username, your API key, and the name of a table in your account. Paste these values between the quotes ('') below.
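If you'd rather not hard-code credentials in the notebook, one option is to read them from environment variables instead; the variable names below are illustrative, not a cartoframes convention:

```python
import os

# Read CARTO credentials from environment variables rather than
# pasting them inline (falls back to empty strings if unset)
username = os.environ.get('CARTO_USERNAME', '')
api_key = os.environ.get('CARTO_API_KEY', '')
base_url = 'https://{}.carto.com/'.format(username)
```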
import pandas as pd
import cartoframes
username = '' # <-- insert your username here
api_key = '' # <-- insert your API key here
tablename = '' # <-- insert your tablename here
cc = cartoframes.CartoContext('https://{}.carto.com/'.format(username),
                              api_key)
df = cc.read(tablename)
df.head()
df['favorite_cookie'] = 'pecan'
# use .loc for the boolean-mask assignment to avoid
# pandas' chained-assignment pitfall
df.loc[df.index % 2 == 0, 'favorite_cookie'] = 'oatmeal'
cc.write(df, tablename, overwrite=True)
from cartoframes import Layer
cc.map(layers=Layer(tablename, color='favorite_cookie'))
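The alternating-assignment step above can be sketched on a toy DataFrame to see the `.loc` pattern in isolation (the data here is made up):

```python
import pandas as pd

# Toy DataFrame standing in for the CARTO table (hypothetical data)
df = pd.DataFrame({'favorite_cookie': ['pecan'] * 4})

# Assign 'oatmeal' to every even-indexed row; .loc with a boolean
# mask modifies the frame in place without chained-assignment issues
df.loc[df.index % 2 == 0, 'favorite_cookie'] = 'oatmeal'

print(df['favorite_cookie'].tolist())
# → ['oatmeal', 'pecan', 'oatmeal', 'pecan']
```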
Query your CARTO account and create a table from the query. Finally, pull that new table into a pandas DataFrame.
df_buffer = cc.query(query='''
SELECT ST_Buffer(the_geom::geography, 10000)::geometry as the_geom,
cartodb_id, mag, depth, place
FROM all_month_3
LIMIT 100
''',
tablename='buffered_earthquakes')
df_buffer.head()
print(df_buffer.get_carto_datapage())
Let's recreate the workflow from https://jakevdp.github.io/blog/2015/08/14/out-of-core-dataframes-in-python/, where the author explores dask for splitting computations across multiple cores in a machine to complete tasks more quickly.
from dask import dataframe as dd
import pandas as pd
columns = ["name", "amenity", "Longitude", "Latitude"]
data = dd.read_csv('POIWorld.csv', usecols=columns)
with_name = data[data.name.notnull()]
with_amenity = data[data.amenity.notnull()]
is_starbucks = with_name.name.str.contains('[Ss]tarbucks')
is_dunkin = with_name.name.str.contains('[Dd]unkin')
starbucks = with_name[is_starbucks].compute()
dunkin = with_name[is_dunkin].compute()
starbucks['type'] = 'starbucks'
dunkin['type'] = 'dunkin'
coffee_places = pd.concat([starbucks, dunkin])
coffee_places.head(20)
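The same filter, label, and concat pattern can be exercised in plain pandas on a few invented rows (no dask needed at this scale); the sample data below is made up:

```python
import pandas as pd

# Hypothetical mini POI table mirroring the columns used above
data = pd.DataFrame({
    'name': ['Starbucks #12', 'Dunkin Donuts', 'starbucks reserve', None],
    'amenity': ['cafe', 'fast_food', 'cafe', 'fast_food'],
})

# Drop rows without a name, then match each chain by regex
with_name = data[data.name.notnull()]
is_starbucks = with_name.name.str.contains('[Ss]tarbucks')
is_dunkin = with_name.name.str.contains('[Dd]unkin')

# Label each subset and stack them into one frame
starbucks = with_name[is_starbucks].copy()
dunkin = with_name[is_dunkin].copy()
starbucks['type'] = 'starbucks'
dunkin['type'] = 'dunkin'
coffee_places = pd.concat([starbucks, dunkin])

print(coffee_places['type'].value_counts().to_dict())
# → {'starbucks': 2, 'dunkin': 1}
```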
import pandas as pd
import cartoframes

username = 'eschbacher'
api_key = 'abcdefghijklmnopqrstuvwxyz'
cc = cartoframes.CartoContext('https://{}.carto.com/'.format(username),
                              api_key)

# specify columns for lng/lat so CARTO will create a geometry
# (CARTO lowercases column names on import, hence 'longitude'/'latitude')
cc.write(coffee_places,
         tablename='coffee_places',
         lnglat=('longitude', 'latitude'))
Make a category map of Dunkin' Donuts vs. Starbucks locations (i.e., color by 'type')
from cartoframes import Layer
cc.map(layers=Layer('coffee_places', color='type', size=5),
zoom=9, lng=-71.0637, lat=36.4275,
interactive=False)
is_fastfood = with_amenity.amenity.str.contains('fast_food')
fastfood = with_amenity[is_fastfood]
fastfood.name.value_counts().head(12)
ff = fastfood.compute()
cc.write(ff,
         tablename='fastfood_dask',
         lnglat=('longitude', 'latitude'))
len(ff)
cc.map(layers=Layer('fastfood_dask', size=2, color='#FFF'))
This method relies on your having the do_augment_table
function that John had you load into your account. It might be slow given the number of rows in the table.
# Data Observatory (DO) measures:
#  - total population
#  - children under 18 years of age
#  - median income
data_obs_measures = [{'numer_id': 'us.census.acs.B01003001'},
{'numer_id': 'us.census.acs.B17001001'},
{'numer_id': 'us.census.acs.B19013001'}]
cc.data_augment('coffee_places', data_obs_measures)