This tutorial covers:
- Finding data on ArcGIS Hub
- Loading and visualizing it using the ArcGIS API for Python (API)
- Extracting it as a Python DataFrame for basic exploratory analysis
- Querying the data for specific results using the API
- Customizing map results using the API
Let's start with the basics.
For those new to Python, here is a list of online resources available to learn and familiarize yourself with Python.
Anaconda, an open source distribution of R and Python, is used to install Python version 3.6. All the packages used in the next code cell (except geopandas) come with the Anaconda installation of Python.
To install geopandas, execute the following command after Anaconda has been installed:
conda install -c conda-forge geopandas
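Once the installation finishes, a quick check like the following can confirm that the packages are importable before running the rest of the notebook. This is a minimal sketch: the helper name `check_packages` is hypothetical, and it only tests whether each package can be found, not which version is installed.

```python
import importlib.util


def check_packages(names):
    """Return a dict mapping each package name to True if it can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Packages imported in the next code cell
required = ['pandas', 'geopandas', 'numpy', 'matplotlib', 'seaborn']
for name, found in check_packages(required).items():
    print(name, 'found' if found else 'MISSING')
```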
We start by importing the necessary packages:
import pandas as pd
import geopandas as gpd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(color_codes=True)
%matplotlib inline
A simple breakdown of these packages:
Pandas : Loads data as a table and enables working with tabular data.
Geopandas : An extension of Pandas that supports working with geospatial data.
Numpy : Supports array processing for numbers, strings, records and objects.
Matplotlib : Package for plotting and visualization in Python.
Seaborn : Package for making attractive and informative statistical graphics in Python.
The last two lines of code in the cell above adopt the color styles of seaborn and display plots within this notebook, as opposed to displaying them in another browser window.
Look for the dataset in the search bar as shown on the ArcGIS Hub Open Data page.
The dataset used throughout this tutorial is the DC Bicycle Lanes dataset.
The Bicycle Lanes dataset for DC contains information such as the length of the bike lane, year installed, additional notes, segment id for the street it falls on, etc., for each bike lane. Additional information about each field in the dataset can be found here.
When you click on the dataset in the search results, it takes you to the details of the dataset. Click on the 'API Explorer' tab as shown below.
Once under the 'API Explorer' tab, copy only the selected portion of the 'Query URL'. This ensures that the data is not returned in the JSON format.
The copied URL is stored in the variable lyr_url and, using the arcgis package, is loaded as a feature/map layer in Python.
The arcgis Python package allows automating tasks that integrate big data analysis with one's web GIS.
It can be installed using conda as well.
from arcgis.features import FeatureLayer
lyr_url = 'https://maps2.dcgis.dc.gov/dcgis/rest/services/DCGIS_DATA/Transportation_WebMercator/MapServer/2'
bike_layer = FeatureLayer(lyr_url)
bike_layer
<FeatureLayer url:"https://maps2.dcgis.dc.gov/dcgis/rest/services/DCGIS_DATA/Transportation_WebMercator/MapServer/2">
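The trimming of the Query URL described earlier can also be done in code. The sketch below assumes the full Query URL ends in `/query` followed by a query string; the example query string and the helper name `service_url_from_query_url` are illustrative, not copied from the Hub page.

```python
from urllib.parse import urlsplit, urlunsplit


def service_url_from_query_url(query_url):
    """Drop the trailing '/query' and the query string from a Query URL."""
    parts = urlsplit(query_url)
    path = parts.path
    if path.endswith('/query'):
        path = path[:-len('/query')]
    # Rebuild the URL without query string or fragment
    return urlunsplit((parts.scheme, parts.netloc, path, '', ''))


# Hypothetical full Query URL; only the part before '/query' comes from the tutorial
query_url = ('https://maps2.dcgis.dc.gov/dcgis/rest/services/DCGIS_DATA/'
             'Transportation_WebMercator/MapServer/2/query?where=1%3D1&outFields=*&f=json')
print(service_url_from_query_url(query_url))
```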
Using the following lines of code, we output the fields/attributes of the dataset:
for f in bike_layer.properties.fields:
    print(f['name'])
OBJECTID FACILITYID STREETSEGID SOURCEID BIKELANELENGTH FACILITY PROPOSEDCYCLETRACK Shape Shape_Length TRAVELDIRECTION NOTES BIKELANE_YEAR PLANSREADY GAP GAP_NOTES NEED_SYMBOL NEED_SYM_1 REPAINT_LINE YEAR_INSTALLED
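Each field definition in an ArcGIS layer also carries a `type` alongside its `name`, which helps when deciding which columns can be used in numerical filters or renderers. The sketch below works on a small hand-written sample rather than the live layer; the sample types are assumptions for illustration, and `numeric_field_names` is a hypothetical helper.

```python
# Esri field type names that hold numbers
NUMERIC_TYPES = {
    'esriFieldTypeSmallInteger',
    'esriFieldTypeInteger',
    'esriFieldTypeSingle',
    'esriFieldTypeDouble',
}

# Hand-written sample; the types shown are assumed, not read from the service
sample_fields = [
    {'name': 'FACILITY', 'type': 'esriFieldTypeString'},
    {'name': 'BIKELANELENGTH', 'type': 'esriFieldTypeDouble'},
    {'name': 'YEAR_INSTALLED', 'type': 'esriFieldTypeInteger'},
]


def numeric_field_names(fields):
    """Return the names of fields whose type is numeric."""
    return [f['name'] for f in fields if f['type'] in NUMERIC_TYPES]


print(numeric_field_names(sample_fields))
```

With the live layer, the equivalent call would be `numeric_field_names(bike_layer.properties.fields)`.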
We now retrieve the entire dataset in all_features and get a count of all the rows in the dataset:
all_features = bike_layer.query()
print('Total number of rows in the dataset: ')
print(len(all_features.features))
Total number of rows in the dataset: 1370
from arcgis.gis import GIS
gis = GIS()
#Here we select a zoom level of 13, purely based on judgement of location and data to be displayed
map1 = gis.map('Washington, DC', 13)
In the following step, we add the bike lanes as a layer on top of the basemap of Washington, DC initialized above.
The "url" attribute is provided with the Service URL
map1.add_layer({"type":"FeatureLayer",
"url":"https://maps2.dcgis.dc.gov/dcgis/rest/services/DCGIS_DATA/Transportation_WebMercator/MapServer/2",
})
map1
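Because the layer URL is already stored in `lyr_url`, the dictionary passed to `add_layer` can be built from it instead of repeating the address. A minimal sketch, where `feature_layer_item` is a hypothetical helper and only keys that appear in this tutorial are used:

```python
lyr_url = ('https://maps2.dcgis.dc.gov/dcgis/rest/services/DCGIS_DATA/'
           'Transportation_WebMercator/MapServer/2')


def feature_layer_item(url, **options):
    """Build the dictionary describing a feature layer for add_layer()."""
    item = {'type': 'FeatureLayer', 'url': url}
    item.update(options)  # optional extra keys, passed through unchanged
    return item


layer_item = feature_layer_item(lyr_url)
print(layer_item)
```

The result could then be passed as `map1.add_layer(layer_item)`.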
The variable all_features is of datatype arcgis FeatureSet.
In order to use this data as a tabular dataset, we need to store it as a Pandas DataFrame, as shown below.
#store as dataframe
data = all_features.df
#View first 5 rows
data.head()
| | BIKELANELENGTH | BIKELANE_YEAR | FACILITY | FACILITYID | GAP | GAP_NOTES | NEED_SYMBOL | NEED_SYM_1 | NOTES | OBJECTID | PLANSREADY | PROPOSEDCYCLETRACK | REPAINT_LINE | SOURCEID | STREETSEGID | Shape_Length | TRAVELDIRECTION | YEAR_INSTALLED | SHAPE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.009455 | 0 | Existing Bike Lane | SEGID- 9854 | | | None | None | | 1 | | | None | 15135550 | 9854 | 15.216846 | | 2008 | {'paths': [[[-8573273.977615476, 4715455.45970... |
| 1 | 0.076754 | 0 | Existing Bike Lane | SEGID- 10198 | | | None | None | COG 95, 1975 plan, public input | 2 | | | None | 15135555 | 10198 | 123.524976 | | 2008 | {'paths': [[[-8573264.662602307, 4715472.71048... |
| 2 | 0.058610 | 0 | Existing Bike Lane | SEGID- 9167 | | | None | None | COG 95, 1975 plan, public input | 3 | | | None | 15135560 | 9167 | 94.322895 | | 2008 | {'paths': [[[-8573183.827335583, 4715609.70743... |
| 3 | 0.026891 | 0 | Existing Bike Lane | SEGID- 12337 | | | None | None | COG 95, 1975 plan, public input | 4 | | | None | 15135565 | 12337 | 43.276471 | | 2008 | {'paths': [[[-8573122.896706875, 4715714.84043... |
| 4 | 0.172385 | 0 | Existing Bike Lane | SEGID- 10611 | | | None | None | COG 95, 1975 plan, public input | 5 | | | None | 15135580 | 10611 | 277.425702 | | 2008 | {'paths': [[[-8572973.134739578, 4715973.04009... |
Since this is a spatial dataset (it contains geometry that can be plotted on a map), we will convert it to a GeoDataFrame to view its details.
data_gdf = gpd.GeoDataFrame(data)
data_gdf.info()
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 1370 entries, 0 to 1369
Data columns (total 19 columns):
BIKELANELENGTH        1370 non-null float64
BIKELANE_YEAR         1370 non-null int64
FACILITY              1370 non-null object
FACILITYID            1370 non-null object
GAP                   1370 non-null object
GAP_NOTES             1370 non-null object
NEED_SYMBOL           0 non-null object
NEED_SYM_1            0 non-null object
NOTES                 1370 non-null object
OBJECTID              1370 non-null int64
PLANSREADY            1370 non-null object
PROPOSEDCYCLETRACK    1370 non-null object
REPAINT_LINE          0 non-null object
SOURCEID              1370 non-null object
STREETSEGID           1370 non-null int64
Shape_Length          1370 non-null float64
TRAVELDIRECTION       1370 non-null object
YEAR_INSTALLED        1370 non-null int64
SHAPE                 1370 non-null object
dtypes: float64(2), int64(4), object(13)
memory usage: 203.4+ KB
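The summary shows that NEED_SYMBOL, NEED_SYM_1 and REPAINT_LINE contain no non-null values at all. If such columns get in the way during exploration, they could be dropped as sketched below. The small DataFrame here is a made-up stand-in for `data`, not the real table.

```python
import pandas as pd

# Made-up stand-in for the real table, with two entirely empty columns
sample = pd.DataFrame({
    'FACILITY': ['Existing Bike Lane', 'Shared Lane', 'Cycle Track'],
    'YEAR_INSTALLED': [2008, 2012, 2014],
    'NEED_SYMBOL': [None, None, None],
    'REPAINT_LINE': [None, None, None],
})

# how='all' drops a column only when every value in it is missing
trimmed = sample.dropna(axis=1, how='all')
print(list(trimmed.columns))
```

On the real table, the same call would be `data.dropna(axis=1, how='all')`.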
Let's start by computing the total distance of bike lanes in DC:
np.sum(data['BIKELANELENGTH'])
96.47902169999992
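To see what NumPy reductions of this kind return on a small input, the sketch below applies them to the five BIKELANELENGTH values shown in the preview table above, not to the full column. The variable names are illustrative.

```python
import numpy as np

# First five BIKELANELENGTH values from the data.head() preview
lengths = np.array([0.009455, 0.076754, 0.058610, 0.026891, 0.172385])

total = np.sum(lengths)       # sum of all values
average = np.mean(lengths)    # arithmetic mean
middle = np.median(lengths)   # middle value after sorting

print(total, average, middle)
```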
Similarly, use np.mean(array_name) and np.median(array_name) to compute the mean and median.
We will now find the types of bike lanes stored in the column 'FACILITY' and visualize the number of bike lanes of each type.
#Gives unique values in a dataframe column
data['FACILITY'].unique()
array(['Existing Bike Lane', 'Shared Lane', 'Climbing Lane', 'Cycle Track', 'Contraflow Bike Lane', 'Bus/Bike Lane', ' '], dtype=object)
Let's now find the number of bike lanes of each unique type (frequency) and store it in the variable counts:
counts = data['FACILITY'].value_counts()
print(counts)
Existing Bike Lane      908
Shared Lane             258
Cycle Track              88
Climbing Lane            62
Contraflow Bike Lane     45
                          5
Bus/Bike Lane             4
Name: FACILITY, dtype: int64
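One of the categories in the output above is a blank string with 5 rows. A possible way to make that category explicit before plotting is sketched below. The Series is rebuilt from the counts printed above rather than from the live data, and the label 'Unspecified' is an arbitrary choice.

```python
import pandas as pd

# Counts as printed above; the blank key stands for the unlabeled category
printed_counts = {
    'Existing Bike Lane': 908,
    'Shared Lane': 258,
    'Cycle Track': 88,
    'Climbing Lane': 62,
    'Contraflow Bike Lane': 45,
    ' ': 5,
    'Bus/Bike Lane': 4,
}
# Expand the counts back into one value per row
facility = pd.Series(
    [name for name, n in printed_counts.items() for _ in range(n)],
    name='FACILITY',
)

# Strip whitespace, then relabel empty strings so they show up clearly
cleaned = facility.str.strip().replace('', 'Unspecified')
counts_clean = cleaned.value_counts()
print(counts_clean)
```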
We will now visualize counts as a bar chart:
#Generates bar graph
ax = counts.plot(kind='bar', figsize=(10, 10), legend=True, fontsize=12)
#X axis text and display style of categories
ax.set_xlabel("Facility", fontsize=12)
plt.xticks(rotation=45)
#Y axis text
ax.set_ylabel("Count", fontsize=12)
#Title
ax.set_title("Types of Bike Lane", fontsize=20)
<matplotlib.text.Text at 0x2519a77c6a0>
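We now query the dataset based on certain filters using the where clause within the query.
In this example, we check for bike lanes installed in the last 5 years.
NOTE: The comparison used here applies to a NUMERICAL data field. Text fields can also be filtered, but their values must be quoted in the where clause, for example FACILITY = 'Cycle Track'.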
query_result = bike_layer.query(where="YEAR_INSTALLED>2011")
print('Number of bike lanes installed in the last 5 years (2012-2016): ')
len(query_result.features)
Number of bike lanes installed in the last 5 years (2012-2016):
566
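Rather than hard-coding the cutoff year, the where clause can be derived from the period of interest. A minimal sketch, assuming 2016 is the most recent year of interest (as the printed label suggests); `where_last_n_years` is a hypothetical helper.

```python
def where_last_n_years(field, last_year, n_years):
    """Build a where clause selecting values later than last_year - n_years."""
    cutoff = last_year - n_years
    return f"{field}>{cutoff}"


where = where_last_n_years('YEAR_INSTALLED', last_year=2016, n_years=5)
print(where)
```

The resulting string could be passed as `bike_layer.query(where=where)`.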
These results can also be stored as a Pandas DataFrame for analysis.
#store as dataframe
data_5yrs = query_result.df
To get the number of bike lanes installed each year, we call value_counts() on the YEAR_INSTALLED field of this DataFrame:
data_5yrs['YEAR_INSTALLED'].value_counts()
2012    162
2014    160
2015     86
2013     82
2016     76
Name: YEAR_INSTALLED, dtype: int64
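`value_counts()` orders the result by frequency, which is why the years appear out of sequence. To read the counts chronologically, the result can be sorted by its index. The sketch below rebuilds the Series from the counts printed above instead of querying the layer.

```python
import pandas as pd

# Counts per year as printed above, in the same frequency order
per_year = pd.Series(
    {2012: 162, 2014: 160, 2015: 86, 2013: 82, 2016: 76},
    name='YEAR_INSTALLED',
)

# Sort by the index (the year) instead of by count
by_year = per_year.sort_index()
print(by_year)
```

On the live data this would correspond to `data_5yrs['YEAR_INSTALLED'].value_counts().sort_index()`.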
Let's start by finding the various unique values in the column 'YEAR_INSTALLED'
data['YEAR_INSTALLED'].unique()
array([2008, 2005, 1980, 2007, 2010, 2013, 2014, 2002, 2011, 2009, 2004, 2012, 2006, 2003, 2015, 2016, 0, 2001], dtype=int64)
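The list includes a 0, which presumably marks rows where the installation year was not recorded; that interpretation is an assumption, since the dataset description is not quoted here. Under that assumption, the placeholder can be left out before looking at the range of years:

```python
# Unique values as printed above
years = [2008, 2005, 1980, 2007, 2010, 2013, 2014, 2002, 2011,
         2009, 2004, 2012, 2006, 2003, 2015, 2016, 0, 2001]

# Keep only real years, assuming 0 is a placeholder for "unknown"
known_years = sorted(y for y in years if y > 0)
print(known_years[0], known_years[-1])
```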
For the following map, we have used the same location and zoom level.
The only change is the choice of basemap used. The other available basemaps are:
map1.basemaps
['streets', 'satellite', 'hybrid', 'topo', 'gray', 'dark-gray', 'oceans', 'national-geographic', 'terrain', 'osm']
map2 = gis.map('Washington, DC', 13)
#setting a basemap to the map
map2.basemap = 'gray'
Here we use the ClassedColorRenderer in order to render different colors on the map, based on the values in the field_name.
NOTE: The column provided as "field_name" can only be of NUMERICAL data type.
map2.add_layer({"type":"FeatureLayer",
"url":"https://maps2.dcgis.dc.gov/dcgis/rest/services/DCGIS_DATA/Transportation_WebMercator/MapServer/2",
"renderer":"ClassedColorRenderer", "field_name":"YEAR_INSTALLED"
})
map2
As per the map above, the darker the bike lane, the newer its construction. Lighter bike lanes signify an older year of installation.
Examples of other features that could be added to this map are:
- The street layer for DC, to estimate what proportion of streets in DC have bike lanes.
- Bike trails of DC, to follow routes connected by these bike lanes.
- Demographics data, to see if the bike lanes are appropriately connecting the most populous regions.
To see other analysis examples using the ArcGIS API for Python, follow this link to the API's GitHub.