Data Observatory in cartoframes¶

The Data Observatory can be accessed through CARTOframes. This is a basic demonstration how one would pull down new measures for building a feature set for training a model.

In [1]:

%matplotlib inline
import cartoframes
from cartoframes import QueryLayer, Layer, styling
import pandas as pd

# Enter your username and api key below
cc = cartoframes.CartoContext(base_url='https://{username}.carto.com/'.format(username=''),
                              api_key='')

Getting Mexico City Metro station coordinates¶

Use pandas to download an Excel spreadsheet into a dataframe.

In [2]:

# Metro stations from here:
#  https://github.com/josecarlosgonz/mexicoCityMetro/blob/master/coordsMetro.xlsx
df = pd.read_excel('https://github.com/josecarlosgonz/mexicoCityMetro/blob/master/coordsMetro.xlsx?raw=true')
df.head()

Out[2]:

	Name	latitude	longitude	Unnamed: 3	linea	estacion	afluencia	latitude.1	longitude.1
0	Pantitlán	19.4163	-99.0747	NaN	1	Pantitlán	4513549.0	19.4163	-99.0747
1	Zaragoza	19.4117	-99.0821	NaN	1	Zaragoza	5144223.0	19.4117	-99.0821
2	Gómez Farías	19.4165	-99.0904	NaN	1	Gómez Farías	3665025.0	19.4165	-99.0904
3	Boulevard Puerto Aéreo	19.4196	-99.0963	NaN	1	Boulevard Puerto Aéreo	3611591.0	19.4196	-99.0963
4	Balbuena	19.4231	-99.1021	NaN	1	Balbuena	1822229.0	19.4231	-99.1021

Send to CARTO, being sure to specify the to-be-normalized column names latitude.1 -> latitude_1, etc.

In [3]:

orig_table = 'coordsmetro_demo'
cc.write(df, orig_table, lnglat=('longitude_1', 'latitude_1'), overwrite=True)

The following columns were changed in the CARTO copy of this dataframe:
Name -> name
Unnamed: 3 -> unnamed_3
latitude.1 -> latitude_1
longitude.1 -> longitude_1
Table successfully written to CARTO: https://cartoframes.carto.com/dataset/coordsmetro_demo

See the data by `linea`¶

Note: notice the basemap labels are default on the bottom.

In [4]:

cc.map(layers=Layer(orig_table,
                    color={'column': 'linea',
                           'scheme': styling.bold(10)}))

Out[4]:

See a static version of the map above

In [5]:

cc.map(layers=Layer(orig_table,
                    color={'column': 'linea',
                           'scheme': styling.bold(10)}),
       interactive=False)

Out[5]:

<matplotlib.axes._subplots.AxesSubplot at 0x10d065358>

Data Observatory measures in the Mexico City area¶

Let's get education-related Data Observatory measures around the metro stops.

In [6]:

meta = cc.data_discovery(region=orig_table, keywords='education')
meta.head()

Out[6]:

	denom_aggregate	denom_colname	denom_description	denom_geomref_colname	denom_id	denom_name	denom_reltype	denom_t_description	denom_tablename	denom_type	...	numer_timespan	numer_type	score	score_rank	score_rownum	suggested_name	target_area	target_geoms	timespan_rank	timespan_rownum
0	sum	employed_primary_education	None	cvegeo	mx.inegi_columns.ECO10	Employed population with primary education	denominator	None	obs_50197262168407a1409111b87164348a0b01e9e4	Numeric	...	2010	Numeric	36.599093	1	1	female_employed_primary_education_rate_2010	None	None	1	1
1	sum	female_employed	None	cvegeo	mx.inegi_columns.ECO5	Employed female	denominator	None	obs_50197262168407a1409111b87164348a0b01e9e4	Numeric	...	2010	Numeric	36.599093	1	2	female_employed_primary_education_rate_2010	None	None	1	2
2	sum	employed_incomplete_secondary_education	None	cvegeo	mx.inegi_columns.ECO13	Employed population with incomplete secondary ...	denominator	None	obs_50197262168407a1409111b87164348a0b01e9e4	Numeric	...	2010	Numeric	36.599093	1	1	female_employed_incomplete_secondary_education...	None	None	1	1
3	sum	female_employed	None	cvegeo	mx.inegi_columns.ECO5	Employed female	denominator	None	obs_50197262168407a1409111b87164348a0b01e9e4	Numeric	...	2010	Numeric	36.599093	1	2	female_employed_incomplete_secondary_education...	None	None	1	2
4	sum	employed_incomplete_secondary_education	None	cvegeo	mx.inegi_columns.ECO13	Employed population with incomplete secondary ...	denominator	None	obs_50197262168407a1409111b87164348a0b01e9e4	Numeric	...	2010	Numeric	36.599093	1	1	male_employed_incomplete_secondary_education_r...	None	None	1	1

5 rows × 42 columns

In [7]:

# See how many measures are possible
meta.shape

Out[7]:

(28, 42)

In [8]:

# Look at the geometry levels available
meta.groupby('geom_id')['geom_id'].count()

Out[8]:

geom_id
mx.inegi.ageb          8
mx.inegi.municipio    20
Name: geom_id, dtype: int64

Narrow down the problem to only have municipio-level measures.

In [9]:

# select only the municipio level data
meta = meta[meta['geom_id'] == 'mx.inegi.municipio']

This takes it down to only 20 measures.

In [10]:

meta.shape

Out[10]:

(20, 42)

Take a look at the measures we have

In [11]:

meta['numer_name'].values

Out[11]:

array(['Employed female population with primary education',
       'Employed female population with primary education',
       'Employed female population with incomplete secondary education',
       'Employed female population with incomplete secondary education',
       'Employed male population with incomplete secondary education',
       'Employed male population with incomplete secondary education',
       'Employed female population who completed basic education',
       'Employed female population who completed basic education',
       'Employed male population who completed basic education',
       'Employed male population who completed basic education',
       'Female population 15 or more years old who did not complete basic education',
       'Female population 15 or more years old who did not complete basic education',
       'Male population 15 or more years old who completed basic education',
       'Male population 15 or more years old who completed basic education',
       'Employed male population with primary education',
       'Employed male population with primary education',
       'Male population 15 or more years old who did not complete basic education',
       'Male population 15 or more years old who did not complete basic education',
       'Female population 15 or more years old who completed basic education',
       'Female population 15 or more years old who completed basic education'], dtype=object)

In [13]:

# Get some data
data = cc.data(orig_table, meta.loc[0:4])
data.head()

Out[13]:

	afluencia	estacion	female_employed_incomplete_secondary_education_rate_2010	female_employed_primary_education_rate_2010	latitude	latitude_1	linea	longitude	longitude_1	male_employed_incomplete_secondary_education_rate_2010	name	the_geom	unnamed_3
cartodb_id
1	4513549	Pantitlán	0.029624	0.135172	19.4163	19.4163	1	-99.0747	-99.0747	0.668229	Pantitlán	0101000020E6100000B84082E2C7C458C0265305A3926A...	None
2	5144223	Zaragoza	0.029624	0.135172	19.4117	19.4117	1	-99.0821	-99.0821	0.668229	Zaragoza	0101000020E61000001AC05B2041C558C061C3D32B6569...	None
3	3665025	Gómez Farías	0.029624	0.135172	19.4165	19.4165	1	-99.0904	-99.0904	0.668229	Gómez Farías	0101000020E6100000BDE3141DC9C558C0B4C876BE9F6A...	None
4	3611591	Boulevard Puerto Aéreo	0.029624	0.135172	19.4196	19.4196	1	-99.0963	-99.0963	0.668229	Boulevard Puerto Aéreo	0101000020E6100000B5A679C729C658C0CF66D5E76A6B...	None
5	1822229	Balbuena	0.029624	0.135172	19.4231	19.4231	1	-99.1021	-99.1021	0.668229	Balbuena	0101000020E6100000FB3A70CE88C658C007F01648506C...	None

In [15]:

new_table = 'mexico_metro_augmented'
cc.write(data, new_table, overwrite=True)

Table successfully written to CARTO: https://cartoframes.carto.com/dataset/mexico_metro_augmented

In [19]:

from cartoframes import Layer, BaseMap, styling
cc.map(layers=[BaseMap('dark'), Layer(new_table,
                    color={'column': 'female_employed_incomplete_secondary_education_rate_2010',
                           'scheme': styling.sunset(7)})])

Out[19]:

Data Observatory in cartoframes¶

Getting Mexico City Metro station coordinates¶

See the data by linea¶

Data Observatory measures in the Mexico City area¶

See the data by `linea`¶