What does the Citibike station system look like right now? Citibike publishes an open feed of station statuses. Let's use cartoframes to process this data, send it to your CARTO account, and create some maps.
cartoframes lets you use CARTO in a Python environment so that you can do all of your analysis and mapping in, for example, a Jupyter notebook. It gives you access to CARTO's functionality for data analysis, storage, location services like routing and geocoding, and visualization. cartoframes is built around working with data in a Pandas dataframe. Pandas is a handy Python library for data analysis (https://pandas.pydata.org/).
Read the cartoframes docs here: http://cartoframes.readthedocs.io/en/latest/
You can view this notebook best on nbviewer here: https://nbviewer.jupyter.org/github/CartoDB/cartoframes/blob/master/examples/Citibike%20Example.ipynb. However, it is recommended to download this notebook, install cartoframes and its dependencies, and run it on your own computer so you can more easily explore the functionality of cartoframes.
To get started, let's load the required packages, and set credentials.
import cartoframes
# For convenience, import Credentials, Layer, BaseMap, and styling directly
from cartoframes import Credentials
from cartoframes import Layer, BaseMap, styling
import pandas as pd
%matplotlib inline
USERNAME = 'michellemho' # <-- replace with your username
APIKEY = 'abcdefg' # <-- your CARTO API key
creds = Credentials(username=USERNAME,
                    key=APIKEY)
cc = cartoframes.CartoContext(creds=creds)
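If you would rather not type your API key into the notebook, one option is to read it from environment variables. The sketch below is only an illustration: the variable names CARTO_USERNAME and CARTO_API_KEY and the helper load_carto_credentials are made up for this example and are not part of cartoframes.

```python
import os


def load_carto_credentials(env=os.environ):
    """Return (username, api_key) read from a mapping of environment variables.

    CARTO_USERNAME and CARTO_API_KEY are hypothetical names chosen for this
    sketch; cartoframes itself does not look for them.
    """
    try:
        return env['CARTO_USERNAME'], env['CARTO_API_KEY']
    except KeyError as missing:
        raise RuntimeError('Set {} before running the notebook'.format(missing))


# Demonstrate with a stand-in mapping instead of the real environment
USERNAME, APIKEY = load_carto_credentials(
    {'CARTO_USERNAME': 'your_username', 'CARTO_API_KEY': 'abcdefg'})
print(USERNAME)
```

The resulting USERNAME and APIKEY can then be passed to Credentials exactly as in the cell above.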
Citibike system data can be found here: https://www.citibikenyc.com/system-data. We're going to use the real-time data, which comes in General Bikeshare Feed Specification (GBFS) format as a series of JSON files.
# Use Pandas to read a JSON of Citibike stations and their statuses
stations_data = pd.read_json('https://gbfs.citibikenyc.com/gbfs/en/station_information.json')
# .iloc[0] selects the first row by position (the row index is a label, not 0)
stations = pd.DataFrame(stations_data.data.iloc[0])
status_data = pd.read_json('https://gbfs.citibikenyc.com/gbfs/en/station_status.json')
status = pd.DataFrame(status_data.data.iloc[0])
# Grab the last updated timestamps
timestamp_stations = stations_data.last_updated.iloc[0]
timestamp_status = status_data.last_updated.iloc[0]
# Join the station and statuses together by 'station_id'
station_status = pd.merge(stations, status, how='left', on='station_id')
# Preview the dataframe
station_status.head()
capacity | eightd_has_key_dispenser | eightd_station_services | lat | lon | name | region_id | rental_methods | rental_url | short_name | ... | eightd_has_available_keys | is_installed | is_renting | is_returning | last_reported | num_bikes_available | num_bikes_disabled | num_docks_available | num_docks_disabled | num_ebikes_available | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 39 | False | NaN | 40.767272 | -73.993929 | W 52 St & 11 Ave | 71.0 | [KEY, CREDITCARD] | http://app.citibikenyc.com/S6Lr/IBV092JufD?sta... | 6926.01 | ... | False | 1 | 1 | 1 | 1523628435 | 2 | 0 | 37 | 0 | 0 |
1 | 33 | False | NaN | 40.719116 | -74.006667 | Franklin St & W Broadway | 71.0 | [KEY, CREDITCARD] | http://app.citibikenyc.com/S6Lr/IBV092JufD?sta... | 5430.08 | ... | False | 1 | 1 | 1 | 1523628157 | 22 | 2 | 9 | 0 | 0 |
2 | 27 | False | NaN | 40.711174 | -74.000165 | St James Pl & Pearl St | 71.0 | [KEY, CREDITCARD] | http://app.citibikenyc.com/S6Lr/IBV092JufD?sta... | 5167.06 | ... | False | 1 | 1 | 1 | 1523626231 | 17 | 1 | 9 | 0 | 0 |
3 | 62 | False | NaN | 40.683826 | -73.976323 | Atlantic Ave & Fort Greene Pl | 71.0 | [KEY, CREDITCARD] | http://app.citibikenyc.com/S6Lr/IBV092JufD?sta... | 4354.07 | ... | False | 1 | 1 | 1 | 1523627612 | 42 | 1 | 19 | 0 | 0 |
4 | 19 | False | NaN | 40.696089 | -73.978034 | Park Ave & St Edwards St | 71.0 | [KEY, CREDITCARD] | http://app.citibikenyc.com/S6Lr/IBV092JufD?sta... | 4700.06 | ... | False | 1 | 1 | 1 | 1523627405 | 6 | 0 | 13 | 0 | 0 |
5 rows × 22 columns
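To see what the how='left' join above does without calling the live feed, here is a small offline sketch. The rows are made up for illustration: the names and capacities are borrowed from the preview, while the station_id values and the third station are hypothetical.

```python
import pandas as pd

# Made-up records shaped like the two GBFS feeds
toy_stations = pd.DataFrame([
    {'station_id': '72', 'name': 'W 52 St & 11 Ave', 'capacity': 39},
    {'station_id': '79', 'name': 'Franklin St & W Broadway', 'capacity': 33},
    {'station_id': '999', 'name': 'Hypothetical station with no status', 'capacity': 10},
])
toy_status = pd.DataFrame([
    {'station_id': '72', 'num_bikes_available': 2, 'num_docks_available': 37},
    {'station_id': '79', 'num_bikes_available': 22, 'num_docks_available': 9},
])

# how='left' keeps every row of toy_stations; a station without a matching
# status row gets NaN in the status columns
toy_station_status = pd.merge(toy_stations, toy_status, how='left', on='station_id')
print(toy_station_status)
```

A left join is a reasonable choice here because the station list is the reference table: a station that is missing from the status feed still shows up, just without status values.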
# Write station status data to CARTO, using string formatting to name the dataset with the timestamp
cc.write(station_status, 'cb_stations_status_{}'.format(timestamp_stations), lnglat=('lon','lat'), overwrite=True)
/Users/mho/anaconda3/lib/python3.5/site-packages/carto/resources.py:90: FutureWarning: This is part of a non-public CARTO API and may change in the future. Take this into account if you are using this in a production environment warnings.warn('This is part of a non-public CARTO API and may change in the future. Take this into account if you are using this in a production environment', FutureWarning)
Table successfully written to CARTO: https://michellemho-carto.carto.com/dataset/cb_stations_status_1523628491
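The number at the end of the table name is the feed's last_updated value, which GBFS reports as a POSIX timestamp (seconds since 1970-01-01 UTC). As a quick sketch, this is how the table name is built and how the timestamp can be turned into a readable date with pandas. The value is hardcoded here to match the output above; in the notebook it comes from timestamp_stations.

```python
import pandas as pd

timestamp_stations = 1523628491  # value embedded in the table name above

# Same string formatting as in the cc.write call
table_name = 'cb_stations_status_{}'.format(timestamp_stations)

# Interpret the integer as seconds since the epoch, in UTC
last_updated = pd.to_datetime(timestamp_stations, unit='s', utc=True)

print(table_name)
print(last_updated)
```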
cc.map
Now that we can inspect the data, we can map it to see how the values change over the geography. We can use the cc.map method for this purpose.
cc.map takes a layers argument which specifies the data layers that are to be visualized. The layer classes can be imported from cartoframes, as in the import cell at the top of this notebook.
There are different types of layers:
- Layer for visualizing CARTO tables
- QueryLayer for visualizing arbitrary queries from tables in the user's CARTO account
- BaseMap for specifying the base map to be used
Each of the layers has different styling options. Layer and QueryLayer take the same styling arguments, and BaseMap can be set to light or dark and takes options for label placement.
Maps can be interactive or static. Set the interactive argument to True or False to choose. If the map is static (not interactive), it will be embedded in the notebook as either a matplotlib axis or an IPython.Image. Either way, the image will be transported with the notebook. Interactive maps are embedded as maps that you can zoom and pan.
# Bring the data back as a map. Style by number of bikes available at each station
# Replace the name of the table with the correct timestamp!
cc.map(layers=[Layer('cb_stations_status_1523628491',
                     color={'column': 'num_bikes_available',
                            'scheme': styling.geyser(7, bin_method='quantiles')},
                     size=6),
               BaseMap(source='dark')],
       interactive=True)
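The color scheme above asks for 7 bins with bin_method='quantiles'. To get a feel for what quantile binning means, here is a local sketch with pandas.qcut on made-up bike counts. It is only an approximation for illustration: CARTO computes its own bins for the map, and they may not match qcut exactly.

```python
import pandas as pd

# Made-up bike counts for 70 hypothetical stations (0, 1, ..., 69)
num_bikes_available = pd.Series(range(70), name='num_bikes_available')

# Split into 7 quantile bins; labels=False returns the bin number (0-6)
bin_index = pd.qcut(num_bikes_available, q=7, labels=False)

# Quantile bins hold roughly the same number of stations each
print(bin_index.value_counts().sort_index())
```

With quantiles, each color class covers about the same number of stations, which helps when the counts are skewed toward a few very full or very empty stations.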
cc.query
CartoContext has several methods for retrieving data from your CARTO account into a Pandas dataframe. In this example, we'll use cc.query to pass in a SQL query and return the results.
# Set up a SQL query to find all the empty Citibike stations
# Note: this query does not use it, but cdb_isochrone is a function available through CARTO data services
# https://carto.com/docs/carto-engine/dataservices-api/isoline-functions/
empty_query = '''
SELECT *
FROM cb_stations_status_1523628491
WHERE num_bikes_available = 0
'''
# Run the query with cc.query, persist the result as a new table called empty_stations, and return it as a dataframe
new_df = cc.query(empty_query, table_name="empty_stations")
new_df.head()
cartodb_id | capacity | eightd_active_station_services | eightd_has_available_keys | eightd_has_key_dispenser | eightd_station_services | is_installed | is_renting | is_returning | last_reported | lat | ... | num_bikes_disabled | num_docks_available | num_docks_disabled | num_ebikes_available | region_id | rental_methods | rental_url | short_name | station_id | the_geom |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
6 | 19 | | False | False | | 1 | 1 | 1 | 1523623566 | 40.686768 | ... | 0 | 19 | 0 | 0 | 71.0 | ['KEY', 'CREDITCARD'] | http://app.citibikenyc.com/S6Lr/IBV092JufD?sta... | 4452.03 | 120 | 0101000020E610000020D0FCDE647D52C054A5F302E857... |
12 | 29 | | False | False | | 1 | 1 | 1 | 1523627512 | 40.720874 | ... | 0 | 29 | 0 | 0 | 71.0 | ['KEY', 'CREDITCARD'] | http://app.citibikenyc.com/S6Lr/IBV092JufD?sta... | 5476.03 | 150 | 0101000020E610000062516C60C67E52C05F460C96455C... |
14 | 29 | | False | False | | 1 | 0 | 0 | 1523366145 | 40.714740 | ... | 0 | 29 | 0 | 0 | 71.0 | ['KEY', 'CREDITCARD'] | http://app.citibikenyc.com/S6Lr/IBV092JufD?sta... | 5288.09 | 152 | 0101000020E6100000ABF57632958052C0673F18997C5B... |
21 | 30 | | False | False | | 1 | 1 | 1 | 1523628272 | 40.738177 | ... | 1 | 29 | 0 | 0 | 71.0 | ['KEY', 'CREDITCARD'] | http://app.citibikenyc.com/S6Lr/IBV092JufD?sta... | 6004.07 | 174 | 0101000020E6100000AC1C9C808D7E52C07F164B917C5E... |
31 | 31 | | False | False | | 1 | 0 | 0 | 1523543799 | 40.736197 | ... | 0 | 31 | 0 | 0 | 71.0 | ['KEY', 'CREDITCARD'] | http://app.citibikenyc.com/S6Lr/IBV092JufD?sta... | 5964.01 | 238 | 0101000020E6100000EBE9C0C58C8052C029F686B13B5E... |
5 rows × 23 columns
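For comparison, the same filter can be expressed locally in pandas. The sketch below uses a few made-up rows (station_id 120 and 150 and their capacities echo the preview above; station 72 is hypothetical) rather than the real table, so it runs without a CARTO account.

```python
import pandas as pd

# Made-up rows with the columns the query relies on
toy_status = pd.DataFrame({
    'station_id': ['120', '150', '72'],
    'capacity': [19, 29, 39],
    'num_bikes_available': [0, 0, 2],
})

# Same condition as: WHERE num_bikes_available = 0
toy_empty = toy_status[toy_status['num_bikes_available'] == 0]
print(toy_empty)
```

The difference is where the work happens: cc.query runs the SQL in your CARTO account and can persist the result as a table there, while the pandas filter only changes the dataframe in memory.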
# map the empty stations, style by capacity
cc.map(layers=[Layer('empty_stations',
                     color='capacity'),
               BaseMap(source='dark')],
       interactive=True)