cartoframes lets you use CARTO in a Python environment so that you can do all of your analysis and mapping in, for example, a Jupyter notebook. cartoframes gives you access to CARTO's functionality for data analysis, storage, location services like routing and geocoding, and visualization.
This notebook is best viewed on nbviewer: https://nbviewer.jupyter.org/github/CartoDB/cartoframes/blob/master/examples/Basic%20Usage.ipynb
We recommend downloading this notebook and running it on your own computer instead, so you can more easily explore the functionality of cartoframes.
To get started, let's load the required packages and set our credentials.
%matplotlib inline
import cartoframes
from cartoframes import Credentials
import pandas as pd
USERNAME = 'eschbacher'  # <-- replace with your username
APIKEY = 'abcdefg'  # <-- your CARTO API key
# bundle the account credentials, then open a context for talking to CARTO
creds = Credentials(username=USERNAME,
                    key=APIKEY)
cc = cartoframes.CartoContext(creds=creds)
cc.read
CartoContext has several methods for interacting with CARTO in a Python environment. CartoContext.read allows you to pull a dataset stored on CARTO into a pandas DataFrame. In the cell below, we use cc.read to get the table brooklyn_poverty from a CARTO account. You can get a CSV of the table here for uploading to your CARTO account:
# Get a CARTO table as a pandas DataFrame
df = cc.read('brooklyn_poverty')
df.head()
cartodb_id | commuters_16_over_2011_2015 | geoid | pop_determined_poverty_status_2011_2015 | poverty_count | poverty_per_pop | the_geom | the_geom_webmercator | total_pop_2011_2015 | total_population | walked_to_work_2011_2015
---|---|---|---|---|---|---|---|---|---|---
2 | 4074.192637 | 360470050003 | 3304.439797 | 23.112583 | 0.031191 | 0103000020E6100000010000000B0000006D3A02B85982... | 0103000020110F0000010000000B000000D240DAA89070... | 9624.365242 | 741 | 0.005207
585 | 5434.149852 | 360470218001 | 27809.352304 | 770.733564 | 0.250000 | 0103000020E6100000010000000B000000ACE3F8A1D27F... | 0103000020110F0000010000000B0000000354CD84456C... | 16072.338976 | 756 | 0.042990
15 | 32412.498980 | 360470514002 | 39958.419065 | 574.101597 | 0.325824 | 0103000020E610000001000000070000003DB5FAEAAA7D... | 0103000020110F00000100000007000000ADF228609C68... | 61660.046010 | 1762 | 0.008740
16 | 5135.760974 | 360470534003 | 23191.290336 | 235.858921 | 0.391142 | 0103000020E61000000100000008000000EBABAB02B57D... | 0103000020110F000001000000080000008ECED184AD68... | 14912.553653 | 603 | 0.016081
146 | 486.050087 | 360470013002 | 8739.299360 | NaN | NaN | 0103000020E6100000010000001500000005854199467F... | 0103000020110F000001000000150000003D6926A8576B... | 40739.834591 | 939 | 0.037871
Notice that:
- the index of the DataFrame is the same as the index of the CARTO table (cartodb_id)
- the column the_geom stores the geometry. This can be decoded if we set the decode_geom=True flag in cc.read, which requires the library shapely.
- null values are represented as np.nan
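The hex strings in the_geom and the_geom_webmercator are PostGIS extended WKB (EWKB). As a rough illustration, the sketch below reads only the header of such a string with the standard library, to show which geometry type and spatial reference (SRID) each column holds. parse_ewkb_header is a hypothetical inspection helper, not how cartoframes decodes geometries; decode_geom=True relies on shapely for the full decoding. The two inputs are the first 9 bytes of the values in the first row of the table above.

```python
import struct

# Base geometry type codes from the OGC WKB specification.
WKB_TYPES = {1: 'Point', 2: 'LineString', 3: 'Polygon',
             4: 'MultiPoint', 5: 'MultiLineString', 6: 'MultiPolygon',
             7: 'GeometryCollection'}
# PostGIS EWKB flag: when set, a 4-byte SRID follows the type code.
EWKB_SRID_FLAG = 0x20000000


def parse_ewkb_header(hex_str):
    """Hypothetical helper: return (geometry type, SRID) of an EWKB hex string."""
    raw = bytes.fromhex(hex_str)
    # First byte is the byte order: 1 = little-endian, 0 = big-endian.
    endian = '<' if raw[0] == 1 else '>'
    (type_code,) = struct.unpack(endian + 'I', raw[1:5])
    srid = None
    if type_code & EWKB_SRID_FLAG:
        (srid,) = struct.unpack(endian + 'I', raw[5:9])
    # Drop the PostGIS Z/M/SRID flag bits to get the base type code.
    base_type = type_code & 0x0FFFFFFF
    return WKB_TYPES.get(base_type, base_type), srid


print(parse_ewkb_header('0103000020E6100000'))  # prefix of a the_geom value
print(parse_ewkb_header('0103000020110F0000'))  # prefix of a the_geom_webmercator value
```

Both values are polygons. SRID 4326 is longitude/latitude on WGS 84, and SRID 3857 is Web Mercator, the projection used by web map tiles.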
Other things to notice:
df.dtypes
commuters_16_over_2011_2015             float64
geoid                                    object
pop_determined_poverty_status_2011_2015 float64
poverty_count                           float64
poverty_per_pop                         float64
the_geom                                 object
the_geom_webmercator                     object
total_pop_2011_2015                     float64
total_population                          int64
walked_to_work_2011_2015                float64
dtype: object
The dtype of each column is mapped from that column's type on CARTO. For example, numeric maps to float64, text maps to object (the pandas representation of strings), timestamp maps to datetime64[ns], and so on. The reverse mapping is applied when a DataFrame is sent to CARTO.
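Here is a minimal sketch of that idea. It assumes only the three type pairs named above. The CARTO_TO_PANDAS dictionary and the coerce_to_carto_dtypes helper are hypothetical illustrations and not cartoframes' internal mapping table, which covers more PostgreSQL types.

```python
import pandas as pd

# Illustrative subset of the CARTO -> pandas dtype mapping described above.
# cartoframes' own mapping covers more PostgreSQL types than this.
CARTO_TO_PANDAS = {
    'numeric': 'float64',
    'text': 'object',
    'timestamp': 'datetime64[ns]',
}


def coerce_to_carto_dtypes(df, carto_types):
    """Hypothetical helper: cast each column to the dtype its CARTO type maps to."""
    out = df.copy()
    for col, carto_type in carto_types.items():
        target = CARTO_TO_PANDAS[carto_type]
        if target.startswith('datetime64'):
            # parse the strings first, then pin the resolution to nanoseconds
            out[col] = pd.to_datetime(out[col]).astype(target)
        else:
            out[col] = out[col].astype(target)
    return out


raw = pd.DataFrame({
    'poverty_per_pop': ['0.25', '0.325824'],
    'geoid': ['360470218001', '360470514002'],
    'tpep_pickup_datetime': ['2016-05-01 14:52:11', '2016-05-01 08:34:08'],
})
typed = coerce_to_carto_dtypes(raw, {'poverty_per_pop': 'numeric',
                                     'geoid': 'text',
                                     'tpep_pickup_datetime': 'timestamp'})
print(typed.dtypes)
```

geoid stays a string (object), just as it does in df.dtypes above. This keeps leading digits and long identifiers intact instead of turning them into floats.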
cc.map
Now that we can inspect the data, we can map it to see how the values change over the geography. We can use the cc.map method for this purpose.
cc.map takes a layers argument, which specifies the data layers to be visualized. Layers can be imported from cartoframes as shown below.
There are different types of layers:
- Layer for visualizing CARTO tables
- QueryLayer for visualizing arbitrary queries from tables in the user's CARTO account
- BaseMap for specifying the base map to be used

Each of the layers has different styling options. Layer and QueryLayer take the same styling arguments, and BaseMap can be set to a light or dark style, with options for label placement.
Maps can be interactive or static. Choose between them with the interactive argument, passing True or False. A static (non-interactive) map is embedded in the notebook as either a matplotlib axis or an IPython.Image. Either way, the image is saved with the notebook. Interactive maps are embedded as maps you can zoom and pan.
from cartoframes import Layer, styling
# color each polygon by poverty_per_pop using a 7-bin sunset color scheme
l = Layer('brooklyn_poverty',
          color={'column': 'poverty_per_pop',
                 'scheme': styling.sunset(7)})
cc.map(layers=l,
       interactive=False)
Let's explore a typical cartoframes workflow using data on NYC taxis.
To get the data into CARTO, we can:
1. use pandas to grab the data from the cartoframes example account
2. send it to CARTO with cc.write, specifying the lng/lat columns you want to use for visualization
3. set overwrite=True to replace an existing dataset if it exists
4. refresh df with the CARTO-fied version using cc.read

# read in a CSV of NYC taxi data from cartoframes example datasets
df = pd.read_csv('https://cartoframes.carto.com/api/v2/sql?q=SELECT+*+FROM+taxi_50k&format=csv')
# set the index of the dataframe to be the cartodb_id (database index)
df.set_index('cartodb_id', inplace=True)
# show first five rows to see what we've got
df.head()
cartodb_id | the_geom | the_geom_webmercator | vendorid | tpep_pickup_datetime | tpep_dropoff_datetime | passenger_count | trip_distance | pickup_longitude | pickup_latitude | ratecodeid | ... | dropoff_longitude | dropoff_latitude | payment_type | fare_amount | extra | mta_tax | tip_amount | tolls_amount | improvement_surcharge | total_amount
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
1 | NaN | NaN | 2 | 2016-05-01 14:52:11+00 | 2016-05-01 15:00:36+00 | 2 | 2.08 | -74.006706 | 40.730461 | 1 | ... | -74.012383 | 40.706779 | 1 | 8.5 | 0.0 | 0.5 | 1.00 | 0.0 | 0.3 | 10.30
2 | NaN | NaN | 1 | 2016-05-01 08:34:08+00 | 2016-05-01 08:49:02+00 | 1 | 3.00 | -73.924957 | 40.744125 | 1 | ... | -73.973824 | 40.762779 | 1 | 13.5 | 0.0 | 0.5 | 2.00 | 0.0 | 0.3 | 16.30
3 | NaN | NaN | 1 | 2016-05-04 09:44:40+00 | 2016-05-04 10:07:09+00 | 1 | 2.10 | -73.973488 | 40.748501 | 1 | ... | -73.998955 | 40.740833 | 2 | 14.5 | 0.0 | 0.5 | 0.00 | 0.0 | 0.3 | 15.30
4 | NaN | NaN | 2 | 2016-05-01 20:50:11+00 | 2016-05-01 21:05:24+00 | 1 | 4.41 | -73.999786 | 40.743267 | 1 | ... | -73.966362 | 40.792370 | 2 | 15.0 | 0.5 | 0.5 | 0.00 | 0.0 | 0.3 | 16.30
5 | NaN | NaN | 2 | 2016-05-02 07:26:56+00 | 2016-05-02 07:53:53+00 | 2 | 4.01 | -73.963631 | 40.803360 | 1 | ... | -73.956963 | 40.784939 | 1 | 19.5 | 0.0 | 0.5 | 4.06 | 0.0 | 0.3 | 24.36
5 rows × 21 columns
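The CSV above comes from CARTO's SQL API. The query and the output format are passed as the q and format URL parameters. As a sketch, the same URL can be built with the standard library instead of encoding the query by hand. sql_api_url is a hypothetical helper, and the https://{username}.carto.com/api/v2/sql pattern is taken from the URL used in the cell above.

```python
from urllib.parse import urlencode


def sql_api_url(username, query, fmt='csv'):
    """Hypothetical helper: build a CARTO SQL API URL for a query."""
    # keep '*' literal so the result matches the hand-written URL above;
    # spaces become '+' because urlencode quotes values with quote_plus
    params = urlencode({'q': query, 'format': fmt}, safe='*')
    return 'https://{}.carto.com/api/v2/sql?{}'.format(username, params)


url = sql_api_url('cartoframes', 'SELECT * FROM taxi_50k')
print(url)
# https://cartoframes.carto.com/api/v2/sql?q=SELECT+*+FROM+taxi_50k&format=csv
```

Letting urlencode do the quoting avoids mistakes when a query contains characters such as quotes, commas, or =.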
# send it to carto so we can map it
# specify the columns we want to have as a point (pickup location)
cc.write(df, 'taxi_50k',
         lnglat=('pickup_longitude', 'pickup_latitude'),
         overwrite=True)
# read the fresh carto-fied version
df = cc.read('taxi_50k')
Creating geometry out of columns `pickup_longitude`/`pickup_latitude`
Table successfully written to CARTO: https://eschbacher.carto.com/dataset/taxi_50k
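The log shows cc.write building point geometries from the two columns passed in lnglat. Note the order: longitude (x) comes first, then latitude (y). The hypothetical helper below only illustrates that convention. It writes each pickup location as a WKT point string in pandas. It is not necessarily how cartoframes builds the geometry.

```python
import pandas as pd


def lnglat_to_wkt(df, lng_col, lat_col):
    """Hypothetical helper: format each row's coordinates as a WKT point."""
    # WKT points are written POINT(x y): longitude first, then latitude
    points = ['POINT({} {})'.format(lng, lat)
              for lng, lat in zip(df[lng_col].tolist(), df[lat_col].tolist())]
    return pd.Series(points, index=df.index)


# the first two pickups from the table above
sample = pd.DataFrame({'pickup_longitude': [-74.006706, -73.924957],
                       'pickup_latitude': [40.730461, 40.744125]})
wkt = lnglat_to_wkt(sample, 'pickup_longitude', 'pickup_latitude')
print(wkt.tolist())
```

If the columns were swapped, NYC pickups would end up near latitude -74, off the coast of Antarctica. That is why lnglat names the longitude column first.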
Take a look at the data on a map.
from cartoframes import Layer
cc.map(layers=Layer('taxi_50k'),
interactive=False)
Oops, some rows have zero-valued longitudes and latitudes, so those points land on null island (0, 0). Let's remove them.
# select only the rows which are not at (0,0)
df = df[(df['pickup_longitude'] != 0) | (df['pickup_latitude'] != 0)]
# send the cleaned data back up to CARTO as taxi_sample, the table mapped below
cc.write(df, 'taxi_sample', overwrite=True,
         lnglat=('pickup_longitude', 'pickup_latitude'))
Creating geometry out of columns `pickup_longitude`/`pickup_latitude`
Table successfully written to CARTO: https://eschbacher.carto.com/dataset/taxi_sample
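To see why the filter uses | (or) rather than & (and), here is the same expression on a tiny DataFrame whose values are made up for illustration. A row is dropped only when both coordinates are zero. By De Morgan's law, this is the same as keeping the rows where not (lng == 0 and lat == 0).

```python
import pandas as pd

# made-up pickups: a valid point, one at null island, one with only lat == 0
trips = pd.DataFrame({'pickup_longitude': [-73.97, 0.0, -73.95],
                      'pickup_latitude': [40.75, 0.0, 0.0]})

# same filter as above: keep a row unless BOTH coordinates are zero
kept = trips[(trips['pickup_longitude'] != 0) | (trips['pickup_latitude'] != 0)]

# equivalent formulation via De Morgan's law
also_kept = trips[~((trips['pickup_longitude'] == 0) & (trips['pickup_latitude'] == 0))]

print(kept.index.tolist())  # [0, 2]
```

Note that row 2, which has only one zero coordinate, survives this filter. A stricter cleanup would drop a row if either coordinate is zero by using & in place of |.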
# Let's take a look at what's going on, styled by the fare amount
cc.map(layers=Layer('taxi_sample',
                    size=4,
                    color={'column': 'fare_amount',
                           'scheme': styling.sunset(7)}),
       interactive=True)
We can use the zoom=..., lng=..., lat=... information in the embedded interactive map to help us get static snapshots of the regions we're interested in. For example, JFK airport is around zoom=12, lng=-73.7880, lat=40.6629. We can paste that information as arguments in cc.map to generate a static snapshot of the data there.
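To get a feel for how much ground a zoom level covers, standard Web Mercator tile math gives the ground resolution at a given latitude. With 256-pixel tiles it is about 156543.03 m per pixel at the equator at zoom 0. It halves with each zoom level and shrinks with the cosine of the latitude. The sketch below applies this to the JFK view. The 800-pixel width is an assumed map width for illustration only and not necessarily cc.map's default size.

```python
import math

# Web Mercator ground resolution at zoom 0 on the equator, for 256-px tiles:
# Earth's equatorial circumference (2 * pi * 6378137 m) spread over 256 pixels.
EQUATOR_M_PER_PX_Z0 = 2 * math.pi * 6378137 / 256


def meters_per_pixel(lat, zoom):
    """Approximate ground distance covered by one pixel at a latitude and zoom."""
    return EQUATOR_M_PER_PX_Z0 * math.cos(math.radians(lat)) / 2 ** zoom


res = meters_per_pixel(40.6629, 12)  # the JFK view from above
width_px = 800  # assumed map width in pixels (illustrative only)
print(round(res, 1), 'm/px;', round(res * width_px / 1000, 1), 'km across')
```

Each step up in zoom halves the area's width, so zoom=13 would show roughly half as much of JFK in the same snapshot.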
# Let's take a look at what's going on at JFK airport, styled by the fare amount, and STATIC
cc.map(layers=Layer('taxi_sample',
                    size=4,
                    color={'column': 'fare_amount',
                           'scheme': styling.sunset(7)}),
       zoom=12, lng=-73.7880, lat=40.6629,
       interactive=False)