We would like to more easily interact with the datasets available at data.lacity.org and geohub.lacity.org.
These data portals publish catalogs at their data.json endpoints, which follow the DCAT (https://project-open-data.cio.gov/v1.1/schema) catalog format.
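To see what one of these data.json catalogs contains, the snippet below parses a DCAT document with only the standard library. It lists each dataset's identifier, title, and the media types of its distributions. SAMPLE_DATA_JSON is a hypothetical, hand-written two-entry catalog that follows the Project Open Data v1.1 field names (dataset, identifier, title, keyword, distribution, mediaType, downloadURL). It is not real portal output, and summarize_catalog is an illustrative helper, not part of intake.

```python
import json

# Hypothetical, hand-written catalog in the Project Open Data v1.1 (DCAT-US)
# layout; a real data.json from an LA portal is far longer.
SAMPLE_DATA_JSON = """
{
  "conformsTo": "https://project-open-data.cio.gov/v1.1/schema",
  "dataset": [
    {
      "identifier": "https://example.org/datasets/city-boundary",
      "title": "City Boundary",
      "keyword": ["boundary", "city"],
      "distribution": [
        {"mediaType": "application/vnd.geo+json",
         "downloadURL": "https://example.org/city-boundary.geojson"}
      ]
    },
    {
      "identifier": "https://example.org/datasets/bike-racks",
      "title": "Bike Racks",
      "keyword": ["bike", "transportation"],
      "distribution": [
        {"mediaType": "text/csv",
         "downloadURL": "https://example.org/bike-racks.csv"}
      ]
    }
  ]
}
"""


def summarize_catalog(text):
    """Return (identifier, title, [mediaType, ...]) for each dataset entry."""
    catalog = json.loads(text)
    summary = []
    for dataset in catalog.get("dataset", []):
        media_types = [d.get("mediaType") for d in dataset.get("distribution", [])]
        summary.append((dataset["identifier"], dataset["title"], media_types))
    return summary


for row in summarize_catalog(SAMPLE_DATA_JSON):
    print(row)
```

To run this against a live portal, you could pass in the decoded body of urllib.request.urlopen(...).read(), pointed at the portal's data.json URL. This assumes the endpoint sits at the site root, e.g. https://data.lacity.org/data.json.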
We have been working on an intake catalog source that adapts a DCAT catalog to an intake catalog and specifies how to load each dataset.
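Part of "specifying how to load datasets" is choosing which of a DCAT entry's distributions to read. The sketch below shows one way to do that. It is not the adapter's actual code. The preference order and the loader names ("geojson", "csv") are hypothetical. The mediaType strings are assumptions about what a portal might advertise.

```python
# Hypothetical preference order: media types we would rather load, paired with
# a (hypothetical) name for the loader that would handle each one.
PREFERRED_LOADERS = [
    ("application/geo+json", "geojson"),
    ("application/vnd.geo+json", "geojson"),
    ("text/csv", "csv"),
]


def choose_distribution(dataset):
    """Pick the first distribution whose mediaType we know how to load.

    Returns (loader_name, download_url), or None if nothing matches.
    """
    by_type = {}
    for dist in dataset.get("distribution", []):
        media_type = dist.get("mediaType")
        url = dist.get("downloadURL")
        if media_type and url:
            # Keep the first URL seen for each media type.
            by_type.setdefault(media_type, url)
    for media_type, loader in PREFERRED_LOADERS:
        if media_type in by_type:
            return loader, by_type[media_type]
    return None


dataset = {
    "identifier": "https://example.org/datasets/bikeways",
    "title": "Bikeways",
    "distribution": [
        {"mediaType": "text/csv", "downloadURL": "https://example.org/bikeways.csv"},
        {"mediaType": "application/vnd.geo+json",
         "downloadURL": "https://example.org/bikeways.geojson"},
    ],
}
print(choose_distribution(dataset))
# → ('geojson', 'https://example.org/bikeways.geojson')
```

Preferring a geometry-aware format over CSV matters for this notebook, because the later plots rely on the datasets carrying geometries.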
Begin by importing intake:
%matplotlib inline
import intake
The DCAT catalogs for the City of Los Angeles open data portals are specified in catalog.yml.
We can optionally give an item argument, which filters the catalog to include only the selected items.
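The item filtering described above can be pictured as a dictionary operation. It keeps only the requested DCAT identifiers and re-keys them under friendlier names. The sketch assumes the selection is given as a mapping from friendly name to dataset identifier. That shape, the select_items helper, and the example.org identifiers are all hypothetical illustrations, not the catalog source's real implementation.

```python
def select_items(entries, items):
    """Keep only the entries named in `items`, re-keyed by friendly name.

    entries: dict mapping dataset identifier -> entry (any object)
    items:   dict mapping friendly name -> dataset identifier (hypothetical shape)
    Raises KeyError if a requested identifier is not in the catalog.
    """
    return {name: entries[identifier] for name, identifier in items.items()}


entries = {
    "https://example.org/datasets/bikeways": "Bikeways entry",
    "https://example.org/datasets/bike-racks": "Bike Racks entry",
    "https://example.org/datasets/street-trees": "Street Trees entry",
}
items = {
    "bikeways": "https://example.org/datasets/bikeways",
    "bike_racks": "https://example.org/datasets/bike-racks",
}
subset = select_items(entries, items)
print(sorted(subset))  # → ['bike_racks', 'bikeways']
```

Raising on a missing identifier, rather than silently skipping it, makes a typo in the selection fail loudly when the catalog is opened.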
We use intake to load the catalog into memory:
catalog = intake.open_catalog('catalogs/demo.yml')
print(list(catalog))
Let's load some data relating to Los Angeles' bike infrastructure. The GeoHub catalog has already been filtered down to a few selected entries:
geohub = catalog.la_geohub
open_data = catalog.la_open_data
for key, entry in geohub.items():
    display(entry)
The open data catalog, on the other hand, is pretty long:
len(list(open_data))
We can make this shorter by filtering it for the "boundary" keyword:
open_data_boundary = open_data.search('boundary')
len(list(open_data_boundary))
for entry_id, entry in open_data_boundary.items():
    display(entry)
Much more manageable.
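As a rough picture of what a keyword filter does, here is a naive case-insensitive search over DCAT-style records. It matches on title, description, and keyword fields. It is an illustrative stand-in and does not claim to reproduce the matching rules of intake's own search method. The search_datasets helper and the sample records are hypothetical.

```python
def search_datasets(datasets, text):
    """Return datasets whose title, description, or keywords contain `text`.

    A simple case-insensitive substring match, for illustration only.
    """
    needle = text.lower()
    matches = []
    for dataset in datasets:
        fields = [dataset.get("title", ""), dataset.get("description", "")]
        fields.extend(dataset.get("keyword", []))
        if any(needle in field.lower() for field in fields):
            matches.append(dataset)
    return matches


datasets = [
    {"title": "City Boundary", "keyword": ["boundary"]},
    {"title": "Council Districts", "description": "Boundary polygons for council districts"},
    {"title": "Bike Racks", "keyword": ["bike"]},
]
print([d["title"] for d in search_datasets(datasets, "boundary")])
# → ['City Boundary', 'Council Districts']
```

With plain substring matching, a record mentioning only "boundaries" would not match "boundary". A shorter stem such as "boundar" would catch both.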
We can now read some datasets into memory:
# The subsetted geohub catalog was able to rename the selected datasets
# so that they are nicer to read.
bike_racks = geohub.bike_racks.read()
bikeways = geohub.bikeways.read()
# The auto-generated open data catalog used the default identifier,
# which is a bit less nice to read.
city_boundary = open_data_boundary['https://data.lacity.org/api/views/ppge-zfr4'].read()
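The auto-generated keys above are full dataset URLs. One possible way to derive readable keys automatically is to slugify each dataset's title. The friendly_name helper below is hypothetical and is not part of intake or the DCAT catalog source. Note that two datasets with similar titles could still collide, so a real implementation would need to handle duplicates.

```python
import re


def friendly_name(title):
    """Turn a dataset title into a lowercase, underscore-separated key."""
    # Collapse every run of non-alphanumeric characters into one underscore.
    slug = re.sub(r"[^0-9a-zA-Z]+", "_", title).strip("_").lower()
    # Python identifiers cannot start with a digit, so add a prefix if needed.
    if not slug or slug[0].isdigit():
        slug = "dataset_" + slug
    return slug


print(friendly_name("City Boundary"))      # → city_boundary
print(friendly_name("Bike Racks (2019)"))  # → bike_racks_2019
```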
Let's inspect these new dataframes:
bikeways.head()
bike_racks.head()
These can all be plotted to show a view of Los Angeles bike infrastructure:
import matplotlib.pyplot as plt
fig, ax = plt.subplots(1, figsize=(16,16))
city_boundary.plot(ax=ax, color='darkgreen', alpha=0.2, linewidth=1, edgecolor='black')
bikeways.plot(ax=ax, color='navy', alpha=0.5, linewidth=1)
bike_racks.plot(ax=ax, color='maroon', alpha=0.5, markersize=1)