The Data Observatory can be accessed through CARTOframes. This is a basic demonstration how one would pull down new measures for building a feature set for training a model.
%matplotlib inline
import cartoframes
from cartoframes import QueryLayer, Layer, styling
import pandas as pd
# Enter your username and api key below
cc = cartoframes.CartoContext(base_url='https://{username}.carto.com/'.format(username=''),
api_key='')
Use pandas to download an Excel spreadsheet into a dataframe.
# Metro stations from here:
# https://github.com/josecarlosgonz/mexicoCityMetro/blob/master/coordsMetro.xlsx
df = pd.read_excel('https://github.com/josecarlosgonz/mexicoCityMetro/blob/master/coordsMetro.xlsx?raw=true')
df.head()
Name | latitude | longitude | Unnamed: 3 | linea | estacion | afluencia | latitude.1 | longitude.1 | |
---|---|---|---|---|---|---|---|---|---|
0 | Pantitlán | 19.4163 | -99.0747 | NaN | 1 | Pantitlán | 4513549.0 | 19.4163 | -99.0747 |
1 | Zaragoza | 19.4117 | -99.0821 | NaN | 1 | Zaragoza | 5144223.0 | 19.4117 | -99.0821 |
2 | Gómez Farías | 19.4165 | -99.0904 | NaN | 1 | Gómez Farías | 3665025.0 | 19.4165 | -99.0904 |
3 | Boulevard Puerto Aéreo | 19.4196 | -99.0963 | NaN | 1 | Boulevard Puerto Aéreo | 3611591.0 | 19.4196 | -99.0963 |
4 | Balbuena | 19.4231 | -99.1021 | NaN | 1 | Balbuena | 1822229.0 | 19.4231 | -99.1021 |
Send to CARTO, being sure to specify the to-be-normalized column names latitude.1
-> latitude_1
, etc.
orig_table = 'coordsmetro_demo'
cc.write(df, orig_table, lnglat=('longitude_1', 'latitude_1'), overwrite=True)
The following columns were changed in the CARTO copy of this dataframe: Name -> name Unnamed: 3 -> unnamed_3 latitude.1 -> latitude_1 longitude.1 -> longitude_1 Table successfully written to CARTO: https://cartoframes.carto.com/dataset/coordsmetro_demo
linea
¶Note: notice the basemap labels are default on the bottom.
cc.map(layers=Layer(orig_table,
color={'column': 'linea',
'scheme': styling.bold(10)}))
See a static version of the map above
cc.map(layers=Layer(orig_table,
color={'column': 'linea',
'scheme': styling.bold(10)}),
interactive=False)
<matplotlib.axes._subplots.AxesSubplot at 0x10d065358>
Let's get education-related Data Observatory measures around the metro stops.
meta = cc.data_discovery(region=orig_table, keywords='education')
meta.head()
denom_aggregate | denom_colname | denom_description | denom_geomref_colname | denom_id | denom_name | denom_reltype | denom_t_description | denom_tablename | denom_type | ... | numer_timespan | numer_type | score | score_rank | score_rownum | suggested_name | target_area | target_geoms | timespan_rank | timespan_rownum | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | sum | employed_primary_education | None | cvegeo | mx.inegi_columns.ECO10 | Employed population with primary education | denominator | None | obs_50197262168407a1409111b87164348a0b01e9e4 | Numeric | ... | 2010 | Numeric | 36.599093 | 1 | 1 | female_employed_primary_education_rate_2010 | None | None | 1 | 1 |
1 | sum | female_employed | None | cvegeo | mx.inegi_columns.ECO5 | Employed female | denominator | None | obs_50197262168407a1409111b87164348a0b01e9e4 | Numeric | ... | 2010 | Numeric | 36.599093 | 1 | 2 | female_employed_primary_education_rate_2010 | None | None | 1 | 2 |
2 | sum | employed_incomplete_secondary_education | None | cvegeo | mx.inegi_columns.ECO13 | Employed population with incomplete secondary ... | denominator | None | obs_50197262168407a1409111b87164348a0b01e9e4 | Numeric | ... | 2010 | Numeric | 36.599093 | 1 | 1 | female_employed_incomplete_secondary_education... | None | None | 1 | 1 |
3 | sum | female_employed | None | cvegeo | mx.inegi_columns.ECO5 | Employed female | denominator | None | obs_50197262168407a1409111b87164348a0b01e9e4 | Numeric | ... | 2010 | Numeric | 36.599093 | 1 | 2 | female_employed_incomplete_secondary_education... | None | None | 1 | 2 |
4 | sum | employed_incomplete_secondary_education | None | cvegeo | mx.inegi_columns.ECO13 | Employed population with incomplete secondary ... | denominator | None | obs_50197262168407a1409111b87164348a0b01e9e4 | Numeric | ... | 2010 | Numeric | 36.599093 | 1 | 1 | male_employed_incomplete_secondary_education_r... | None | None | 1 | 1 |
5 rows × 42 columns
# See how many measures are possible
meta.shape
(28, 42)
# Look at the geometry levels available
meta.groupby('geom_id')['geom_id'].count()
geom_id mx.inegi.ageb 8 mx.inegi.municipio 20 Name: geom_id, dtype: int64
Narrow down the problem to only have municipio
-level measures.
# select only the municipio level data
meta = meta[meta['geom_id'] == 'mx.inegi.municipio']
This takes it down to only 20 measures.
meta.shape
(20, 42)
Take a look at the measures we have
meta['numer_name'].values
array(['Employed female population with primary education', 'Employed female population with primary education', 'Employed female population with incomplete secondary education', 'Employed female population with incomplete secondary education', 'Employed male population with incomplete secondary education', 'Employed male population with incomplete secondary education', 'Employed female population who completed basic education', 'Employed female population who completed basic education', 'Employed male population who completed basic education', 'Employed male population who completed basic education', 'Female population 15 or more years old who did not complete basic education', 'Female population 15 or more years old who did not complete basic education', 'Male population 15 or more years old who completed basic education', 'Male population 15 or more years old who completed basic education', 'Employed male population with primary education', 'Employed male population with primary education', 'Male population 15 or more years old who did not complete basic education', 'Male population 15 or more years old who did not complete basic education', 'Female population 15 or more years old who completed basic education', 'Female population 15 or more years old who completed basic education'], dtype=object)
# Get some data
data = cc.data(orig_table, meta.loc[0:4])
data.head()
afluencia | estacion | female_employed_incomplete_secondary_education_rate_2010 | female_employed_primary_education_rate_2010 | latitude | latitude_1 | linea | longitude | longitude_1 | male_employed_incomplete_secondary_education_rate_2010 | name | the_geom | unnamed_3 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
cartodb_id | |||||||||||||
1 | 4513549 | Pantitlán | 0.029624 | 0.135172 | 19.4163 | 19.4163 | 1 | -99.0747 | -99.0747 | 0.668229 | Pantitlán | 0101000020E6100000B84082E2C7C458C0265305A3926A... | None |
2 | 5144223 | Zaragoza | 0.029624 | 0.135172 | 19.4117 | 19.4117 | 1 | -99.0821 | -99.0821 | 0.668229 | Zaragoza | 0101000020E61000001AC05B2041C558C061C3D32B6569... | None |
3 | 3665025 | Gómez Farías | 0.029624 | 0.135172 | 19.4165 | 19.4165 | 1 | -99.0904 | -99.0904 | 0.668229 | Gómez Farías | 0101000020E6100000BDE3141DC9C558C0B4C876BE9F6A... | None |
4 | 3611591 | Boulevard Puerto Aéreo | 0.029624 | 0.135172 | 19.4196 | 19.4196 | 1 | -99.0963 | -99.0963 | 0.668229 | Boulevard Puerto Aéreo | 0101000020E6100000B5A679C729C658C0CF66D5E76A6B... | None |
5 | 1822229 | Balbuena | 0.029624 | 0.135172 | 19.4231 | 19.4231 | 1 | -99.1021 | -99.1021 | 0.668229 | Balbuena | 0101000020E6100000FB3A70CE88C658C007F01648506C... | None |
new_table = 'mexico_metro_augmented'
cc.write(data, new_table, overwrite=True)
Table successfully written to CARTO: https://cartoframes.carto.com/dataset/mexico_metro_augmented
from cartoframes import Layer, BaseMap, styling
cc.map(layers=[BaseMap('dark'), Layer(new_table,
color={'column': 'female_employed_incomplete_secondary_education_rate_2010',
'scheme': styling.sunset(7)})])