Data Management

In this guide you will learn how to load different data files into DataFrames and how to interact with the CARTO platform to upload DataFrames into tables and download tables or SQL queries into DataFrames.

CARTOframes is built on top of Pandas and GeoPandas. Therefore, it's compatible with all the data formats supported in those projects like GeoJSON, Shapefile, CSV, etc.

There are two main concepts we should know before continuing with the guide:

  • A DataFrame is a two-dimensional data structure for generic data. It can be thought of as a table with rows and columns. It's composed of Series objects, which are one-dimensional data structures.
  • A GeoDataFrame is a DataFrame with an extra geometry column. This geometry column is a GeoSeries object.

Every time we manage geographic data, a GeoDataFrame should be used. In case a DataFrame with an encoded geometry column is used (WKB, WKT, etc.), every method contains a geom_col param to provide the name of that column and decode the geometry internally.

For further learning you can checkout the Data Management examples.

Read a GeoJSON file

This is how to load geographic data from a GeoJSON file using GeoPandas. To read pure JSON files check this example.

In [1]:
from geopandas import read_file

gdf = read_file('https://libs.cartocdn.com/cartoframes/samples/starbucks_brooklyn_geocoded.geojson')
gdf.head()
Out[1]:
cartodb_id field_1 name address revenue geometry
0 1 0 Franklin Ave & Eastern Pkwy 341 Eastern Pkwy,Brooklyn, NY 11238 1321040.772 POINT (-73.95901 40.67109)
1 2 1 607 Brighton Beach Ave 607 Brighton Beach Avenue,Brooklyn, NY 11235 1268080.418 POINT (-73.96122 40.57796)
2 3 2 65th St & 18th Ave 6423 18th Avenue,Brooklyn, NY 11204 1248133.699 POINT (-73.98976 40.61912)
3 4 3 Bay Ridge Pkwy & 3rd Ave 7419 3rd Avenue,Brooklyn, NY 11209 1185702.676 POINT (-74.02744 40.63152)
4 5 4 Caesar's Bay Shopping Center 8973 Bay Parkway,Brooklyn, NY 11214 1148427.411 POINT (-74.00098 40.59321)

Read a Shapefile

Shapefile is a complex format, compared to CSV or GeoJSON. To learn more about this format check GeoPandas documentation.

In [2]:
from geopandas import read_file

gdf = read_file('https://libs.cartocdn.com/cartoframes/samples/starbucks_brooklyn_geocoded.zip')
gdf.head()
Out[2]:
cartodb_id field_1 name address revenue geometry
0 1 0 Franklin Ave & Eastern Pkwy 341 Eastern Pkwy,Brooklyn, NY 11238 1321040.772 POINT (-73.95901 40.67109)
1 2 1 607 Brighton Beach Ave 607 Brighton Beach Avenue,Brooklyn, NY 11235 1268080.418 POINT (-73.96122 40.57796)
2 3 2 65th St & 18th Ave 6423 18th Avenue,Brooklyn, NY 11204 1248133.699 POINT (-73.98976 40.61912)
3 4 3 Bay Ridge Pkwy & 3rd Ave 7419 3rd Avenue,Brooklyn, NY 11209 1185702.676 POINT (-74.02744 40.63152)
4 5 4 Caesar's Bay Shopping Center 8973 Bay Parkway,Brooklyn, NY 11214 1148427.411 POINT (-74.00098 40.59321)

Read a CSV file

Compute geometry from longitude and latitude

In [3]:
from pandas import read_csv
from geopandas import GeoDataFrame, points_from_xy

df = read_csv('https://libs.cartocdn.com/cartoframes/samples/sf_incidents.csv')

gdf = GeoDataFrame(df, geometry=points_from_xy(df['longitude'], df['latitude']))
gdf.head()
Out[3]:
incident_datetime incident_date incident_time incident_year incident_day_of_week report_datetime row_id incident_id incident_number cad_number ... :@computed_region_qgnn_b9vv :@computed_region_26cr_cadq :@computed_region_ajp5_b2md :@computed_region_nqbw_i6c3 :@computed_region_2dwj_jsy4 :@computed_region_h4ep_8xdi :@computed_region_y6ts_4iup :@computed_region_jg9y_a9du :@computed_region_6pnf_4xz7 geometry
0 2020-08-16T03:13:00.000 2020-08-16T00:00:00.000 03:13 2020 Sunday 2020-08-16T03:14:00.000 95319604083 953196 200491669 202290313.0 ... 2.0 9 26.0 NaN NaN NaN NaN NaN 2.0 POINT (-122.39773 37.75483)
1 2020-08-16T03:38:00.000 2020-08-16T00:00:00.000 03:38 2020 Sunday 2020-08-16T04:56:00.000 95326228100 953262 200491738 202290404.0 ... 3.0 2 20.0 3.0 NaN NaN NaN NaN 2.0 POINT (-122.42204 37.76654)
2 2020-08-16T13:40:00.000 2020-08-16T00:00:00.000 13:40 2020 Sunday 2020-08-16T13:56:00.000 95336264020 953362 200492463 202291631.0 ... 1.0 10 8.0 NaN NaN NaN NaN NaN 1.0 POINT (-122.40371 37.78404)
3 2020-08-16T16:18:00.000 2020-08-16T00:00:00.000 16:18 2020 Sunday 2020-08-16T16:18:00.000 95335012010 953350 200492792 202292091.0 ... 10.0 7 35.0 NaN NaN NaN NaN NaN 1.0 POINT (-122.50742 37.75100)
4 2020-08-12T22:00:00.000 2020-08-12T00:00:00.000 22:00 2020 Wednesday 2020-08-15T08:30:00.000 95300674000 953006 200489880 202280827.0 ... 4.0 11 39.0 NaN NaN NaN NaN NaN 2.0 POINT (-122.43214 37.78050)

5 rows × 37 columns

Compute geometry from WKT/WKB

In [4]:
from pandas import read_csv
from geopandas import GeoDataFrame
from cartoframes.utils import decode_geometry

df = read_csv('https://libs.cartocdn.com/cartoframes/samples/starbucks_brooklyn_geocoded.csv')

gdf = GeoDataFrame(df, geometry=decode_geometry(df['the_geom']))
gdf.head()
Out[4]:
the_geom cartodb_id field_1 name address revenue id_store geometry
0 0101000020E61000005EA27A6B607D52C01956F146E655... 1 0 Franklin Ave & Eastern Pkwy 341 Eastern Pkwy,Brooklyn, NY 11238 1321040.772 A POINT (-73.95901 40.67109)
1 0101000020E6100000B610E4A0847D52C0B532E197FA49... 2 1 607 Brighton Beach Ave 607 Brighton Beach Avenue,Brooklyn, NY 11235 1268080.418 B POINT (-73.96122 40.57796)
2 0101000020E6100000E5B8533A587F52C05726FC523F4F... 3 2 65th St & 18th Ave 6423 18th Avenue,Brooklyn, NY 11204 1248133.699 C POINT (-73.98976 40.61912)
3 0101000020E61000008BA6B393C18152C08D62B9A5D550... 4 3 Bay Ridge Pkwy & 3rd Ave 7419 3rd Avenue,Brooklyn, NY 11209 1185702.676 D POINT (-74.02744 40.63152)
4 0101000020E6100000CEFC6A0E108052C080D4264EEE4B... 5 4 Caesar's Bay Shopping Center 8973 Bay Parkway,Brooklyn, NY 11214 1148427.411 E POINT (-74.00098 40.59321)

Read data from a CARTO table

Note: You'll need your CARTO Account credentials to perform this action.

In [5]:
from cartoframes.auth import set_default_credentials

set_default_credentials('cartoframes')
In [6]:
from cartoframes import read_carto

gdf = read_carto('starbucks_brooklyn')
gdf.head()
Out[6]:
cartodb_id the_geom field_1 name address revenue
0 1 POINT (-73.95901 40.67109) 0 Franklin Ave & Eastern Pkwy 341 Eastern Pkwy,Brooklyn, NY 11238 1321040.772
1 2 POINT (-73.96122 40.57796) 1 607 Brighton Beach Ave 607 Brighton Beach Avenue,Brooklyn, NY 11235 1268080.418
2 3 POINT (-73.98976 40.61912) 2 65th St & 18th Ave 6423 18th Avenue,Brooklyn, NY 11204 1248133.699
3 4 POINT (-74.02744 40.63152) 3 Bay Ridge Pkwy & 3rd Ave 7419 3rd Avenue,Brooklyn, NY 11209 1185702.676
4 5 POINT (-74.00098 40.59321) 4 Caesar's Bay Shopping Center 8973 Bay Parkway,Brooklyn, NY 11214 1148427.411

Read data from a CARTO SQL Query

Note: You'll need your CARTO Account credentials to perform this action.

In [7]:
from cartoframes.auth import set_default_credentials

set_default_credentials('cartoframes')
In [8]:
from cartoframes import read_carto

gdf = read_carto("SELECT * FROM starbucks_brooklyn WHERE revenue > 1200000")
gdf.head()
Out[8]:
cartodb_id the_geom field_1 name address revenue
0 1 POINT (-73.95901 40.67109) 0 Franklin Ave & Eastern Pkwy 341 Eastern Pkwy,Brooklyn, NY 11238 1321040.772
1 2 POINT (-73.96122 40.57796) 1 607 Brighton Beach Ave 607 Brighton Beach Avenue,Brooklyn, NY 11235 1268080.418
2 3 POINT (-73.98976 40.61912) 2 65th St & 18th Ave 6423 18th Avenue,Brooklyn, NY 11204 1248133.699

Upload data to CARTO

Note: You'll need your CARTO Account credentials to perform this action.

In [9]:
from cartoframes.auth import set_default_credentials

set_default_credentials('creds.json')
In [10]:
from cartoframes import to_carto

to_carto(gdf, 'starbucks_brooklyn_filtered', if_exists='replace')
Success! Data uploaded to table "starbucks_brooklyn_filtered" correctly
Out[10]:
'starbucks_brooklyn_filtered'