The flow adopted in this notebook is as follows:
- Read in the datasets using ArcGIS API for Python
- Merge datasets
- Construct model that predicts service type
- How does my neighborhood fare?
- Next Steps
The datasets used in this notebook are:
- City Service Requests in 2018
- Neighborhood Clusters
Both datasets can be found on opendata.dc.gov.
We start by importing the ArcGIS API for Python so we can load the data from a service URL.
from arcgis.features import FeatureLayer
import arcgis
The service URL for the City Service Requests in 2018 dataset:
lyr_url = 'https://maps2.dcgis.dc.gov/dcgis/rest/services/DCGIS_DATA/ServiceRequests/MapServer/9'
req_layer = FeatureLayer(lyr_url)
req_layer
<FeatureLayer url:"https://maps2.dcgis.dc.gov/dcgis/rest/services/DCGIS_DATA/ServiceRequests/MapServer/9">
#Extract all the data and display number of rows
all_features = req_layer.query()
print('Total number of rows in the dataset: ')
print(len(all_features.features))
Total number of rows in the dataset: 16780
#store as dataframe
requests = all_features.df
#View first 5 rows
requests.head()
ADDDATE | CITY | DETAILS | INSPECTIONDATE | INSPECTIONFLAG | INSPECTORNAME | LATITUDE | LONGITUDE | MARADDRESSREPOSITORYID | OBJECTID | ... | SERVICEREQUESTID | SERVICETYPECODEDESCRIPTION | STATE | STATUS_CODE | STREETADDRESS | WARD | XCOORD | YCOORD | ZIPCODE | SHAPE | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1515054684000 | WASHINGTON | Per T. Duckett 01-09-18. closed by A. Hedgeman... | 1.515497e+12 | Y | None | 38.860941 | -76.989057 | 65434.0 | 309014 | ... | 18-00005582 | SNOW | DC | CLOSED | 1375 MORRIS ROAD SE | 8 | 400949.750000 | 132569.42000 | 20020 | {'spatialReference': {'wkid': 26985, 'latestWk... |
1 | 1514890494000 | WASHINGTON | completed per Supervisor P. Redman-Smith. Clo... | 1.515049e+12 | Y | None | 38.872655 | -76.972942 | 48733.0 | 309015 | ... | 18-00001141 | SNOW | DC | CLOSED | 2310 MINNESOTA AVENUE SE | 8 | 402348.030000 | 133870.07000 | 20020 | {'spatialReference': {'wkid': 26985, 'latestWk... |
2 | 1515014214000 | None | No information. closed by sg 1/4/18 | 1.515049e+12 | Y | None | 38.846461 | -76.971636 | NaN | 309016 | ... | 18-00005386 | SNOW | None | CLOSED | None | 8 | 402462.228864 | 130962.50761 | 20020 | {'spatialReference': {'wkid': 26985, 'latestWk... |
3 | 1515041355000 | WASHINGTON | DPW Officer issued one ticket for fire hydrant... | NaN | N | None | 38.920416 | -77.013792 | 228207.0 | 309017 | ... | 18-00005455 | PEMA- Parking Enforcement Management Administr... | DC | CLOSED | 149 ADAMS STREET NW | 5 | 398803.970000 | 139171.70000 | 20001 | {'spatialReference': {'wkid': 26985, 'latestWk... |
4 | 1515006248000 | WASHINGTON | None | NaN | N | None | 38.952663 | -77.069486 | 223159.0 | 309018 | ... | 18-00005276 | PEMA- Parking Enforcement Management Administr... | DC | CLOSED | 4817 36TH STREET NW | 3 | 393976.960000 | 142753.59000 | 20008 | {'spatialReference': {'wkid': 26985, 'latestWk... |
5 rows × 29 columns
#Import other necessary packages
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point
We now convert this dataframe to a GeoDataFrame
geometry = [Point(xy) for xy in zip(requests['LONGITUDE'], requests['LATITUDE'])]
requests = requests.drop(['LONGITUDE', 'LATITUDE'], axis=1)
crs = 'EPSG:4326'  #the dict form {'init': 'epsg:4326'} is deprecated in newer geopandas
requests_gdf = gpd.GeoDataFrame(requests, crs=crs, geometry=geometry)
requests_gdf.head()
ADDDATE | CITY | DETAILS | INSPECTIONDATE | INSPECTIONFLAG | INSPECTORNAME | MARADDRESSREPOSITORYID | OBJECTID | ORGANIZATIONACRONYM | PRIORITY | ... | SERVICETYPECODEDESCRIPTION | STATE | STATUS_CODE | STREETADDRESS | WARD | XCOORD | YCOORD | ZIPCODE | SHAPE | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1515054684000 | WASHINGTON | Per T. Duckett 01-09-18. closed by A. Hedgeman... | 1.515497e+12 | Y | None | 65434.0 | 309014 | DPW | STANDARD | ... | SNOW | DC | CLOSED | 1375 MORRIS ROAD SE | 8 | 400949.750000 | 132569.42000 | 20020 | {'x': 400949.75, 'y': 132569.4200000018} | POINT (-76.98905709 38.86094051) |
1 | 1514890494000 | WASHINGTON | completed per Supervisor P. Redman-Smith. Clo... | 1.515049e+12 | Y | None | 48733.0 | 309015 | DPW | STANDARD | ... | SNOW | DC | CLOSED | 2310 MINNESOTA AVENUE SE | 8 | 402348.030000 | 133870.07000 | 20020 | {'x': 402348.0300000012, 'y': 133870.0700000003} | POINT (-76.97294183 38.87265468) |
2 | 1515014214000 | None | No information. closed by sg 1/4/18 | 1.515049e+12 | Y | None | NaN | 309016 | DPW | STANDARD | ... | SNOW | None | CLOSED | None | 8 | 402462.228864 | 130962.50761 | 20020 | {'x': 402462.22890000045, 'y': 130962.50759999... | POINT (-76.971636238 38.84646148780001) |
3 | 1515041355000 | WASHINGTON | DPW Officer issued one ticket for fire hydrant... | NaN | N | None | 228207.0 | 309017 | DPW | STANDARD | ... | PEMA- Parking Enforcement Management Administr... | DC | CLOSED | 149 ADAMS STREET NW | 5 | 398803.970000 | 139171.70000 | 20001 | {'x': 398803.9699999988, 'y': 139171.69999999925} | POINT (-77.01379201 38.92041601) |
4 | 1515006248000 | WASHINGTON | None | NaN | N | None | 223159.0 | 309018 | DPW | STANDARD | ... | PEMA- Parking Enforcement Management Administr... | DC | CLOSED | 4817 36TH STREET NW | 3 | 393976.960000 | 142753.59000 | 20008 | {'x': 393976.9600000009, 'y': 142753.58999999985} | POINT (-77.06948607 38.95266291) |
5 rows × 28 columns
Next we read in the Neighborhood Clusters dataset, downloaded as a shapefile:
neighborhood = gpd.read_file(r'D:\Data\DC_Neighborhood\Neighborhood_Clusters\Neighborhood_Clusters.shp')
neighborhood.info()
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 46 entries, 0 to 45
Data columns (total 8 columns):
OBJECTID      46 non-null int64
WEB_URL       39 non-null object
NAME          46 non-null object
NBH_NAMES     46 non-null object
Shape_Leng    46 non-null float64
Shape_Area    46 non-null float64
TYPE          46 non-null object
geometry      46 non-null object
dtypes: float64(2), int64(1), object(5)
memory usage: 3.0+ KB
neighborhood.head()
OBJECTID | WEB_URL | NAME | NBH_NAMES | Shape_Leng | Shape_Area | TYPE | geometry | |
---|---|---|---|---|---|---|---|---|
0 | 1 | http://planning.dc.gov/ | Cluster 39 | Congress Heights, Bellevue, Washington Highlands | 10711.668010 | 4.886463e+06 | Original | POLYGON ((-76.99401890037231 38.84519662346873... |
1 | 2 | http://planning.dc.gov/ | Cluster 38 | Douglas, Shipley Terrace | 8229.486324 | 2.367958e+06 | Original | POLYGON ((-76.97471813575507 38.8528706360112,... |
2 | 3 | http://planning.dc.gov/ | Cluster 36 | Woodland/Fort Stanton, Garfield Heights, Knox ... | 4746.344457 | 1.119573e+06 | Original | POLYGON ((-76.9687730019474 38.86067206227963,... |
3 | 4 | http://planning.dc.gov/ | Cluster 27 | Near Southeast, Navy Yard | 7286.968902 | 1.619167e+06 | Original | POLYGON ((-76.9872595922274 38.87711832849107,... |
4 | 5 | http://planning.dc.gov/ | Cluster 32 | River Terrace, Benning, Greenway, Dupont Park | 11251.012821 | 4.286254e+06 | Original | POLYGON ((-76.93760147029893 38.88995958845385... |
The SHAPE column needs to be renamed to geometry for use with geopandas:
neighborhood.rename(columns={'SHAPE': 'geometry'}, inplace=True)
We now merge the two datasets with a spatial join: each request point is matched to the neighborhood cluster polygon that contains it.
merged = gpd.sjoin(requests_gdf, neighborhood, how="inner", op='intersects')
merged.head()
ADDDATE | CITY | DETAILS | INSPECTIONDATE | INSPECTIONFLAG | INSPECTORNAME | MARADDRESSREPOSITORYID | OBJECTID_left | ORGANIZATIONACRONYM | PRIORITY | ... | SHAPE | geometry | index_right | OBJECTID_right | WEB_URL | NAME | NBH_NAMES | Shape_Leng | Shape_Area | TYPE | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1515054684000 | WASHINGTON | Per T. Duckett 01-09-18. closed by A. Hedgeman... | 1.515497e+12 | Y | None | 65434.0 | 309014 | DPW | STANDARD | ... | {'x': 400949.75, 'y': 132569.4200000018} | POINT (-76.98905709 38.86094051) | 28 | 29 | http://planning.dc.gov/ | Cluster 37 | Sheridan, Barry Farm, Buena Vista | 7600.043391 | 2.052485e+06 | Original |
9 | 1514881461000 | WASHINGTON | completed per Supervisor P. Redman-Smith. Clo... | 1.515049e+12 | Y | None | 803072.0 | 309023 | DPW | STANDARD | ... | {'x': 400200.6319999993, 'y': 132307.12110000104} | POINT (-76.99768841789999 38.8585777698) | 28 | 29 | http://planning.dc.gov/ | Cluster 37 | Sheridan, Barry Farm, Buena Vista | 7600.043391 | 2.052485e+06 | Original |
184 | 1514996387000 | WASHINGTON | We are unable to complete verification until w... | 1.515071e+12 | Y | None | 67141.0 | 309198 | DPW | STANDARD | ... | {'x': 400741.8599999994, 'y': 132536.73999999836} | POINT (-76.99145240999999 38.86064632) | 28 | 29 | http://planning.dc.gov/ | Cluster 37 | Sheridan, Barry Farm, Buena Vista | 7600.043391 | 2.052485e+06 | Original |
316 | 1515232328000 | None | Closed per T. Duckett on 1/9/2018. Closed by ... | 1.515491e+12 | Y | None | 903772.0 | 310604 | DPW | STANDARD | ... | {'x': 400322.6000000015, 'y': 132462.3200000003} | POINT (-76.9962830905 38.8599761644) | 28 | 29 | http://planning.dc.gov/ | Cluster 37 | Sheridan, Barry Farm, Buena Vista | 7600.043391 | 2.052485e+06 | Original |
361 | 1515244430000 | WASHINGTON | Collect on 1-7-18 by the big truck | NaN | N | None | 62170.0 | 310649 | DPW | STANDARD | ... | {'x': 401239.3900000006, 'y': 132151.4200000018} | POINT (-76.98572065 38.85717463) | 28 | 29 | http://planning.dc.gov/ | Cluster 37 | Sheridan, Barry Farm, Buena Vista | 7600.043391 | 2.052485e+06 | Original |
5 rows × 36 columns
The variables used to build the model are:
- City Quadrant
- Neighborhood cluster
- Organization acronym
- Status Code
quads = ['NE', 'NW', 'SE', 'SW']

def generateQuadrant(x):
    '''Extract the quadrant (NE/NW/SE/SW) from a street address'''
    try:
        temp = x[-2:]
        if temp in quads:
            return temp
        else:
            return 'NaN'
    except TypeError:  #e.g. missing (None) street address
        return 'NaN'
merged['QUADRANT'] = merged['STREETADDRESS'].apply(generateQuadrant)
merged['QUADRANT'].head()
0       SE
9       SE
184     SE
316    NaN
361     SE
Name: QUADRANT, dtype: object
merged['QUADRANT'].unique()
array(['SE', 'NaN', 'NE', 'NW', 'SW'], dtype=object)
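Restating the parser in a self-contained form, we can sanity-check it on a few hand-made addresses (the `None` case mimics rows with a missing STREETADDRESS):

```python
quads = ['NE', 'NW', 'SE', 'SW']

def generateQuadrant(x):
    '''Extract the quadrant (NE/NW/SE/SW) from a street address'''
    try:
        temp = x[-2:]
        return temp if temp in quads else 'NaN'
    except TypeError:  #e.g. missing (None) street address
        return 'NaN'

samples = ['1375 MORRIS ROAD SE', '149 ADAMS STREET NW', None, 'PENNSYLVANIA AVENUE']
print([generateQuadrant(s) for s in samples])  # ['SE', 'NW', 'NaN', 'NaN']
```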
merged['CLUSTER'] = merged['NAME'].apply(lambda x: x[8:])
merged['CLUSTER'].head()
0      37
9      37
184    37
316    37
361    37
Name: CLUSTER, dtype: object
merged['CLUSTER'] = merged['CLUSTER'].astype(int)
merged['ORGANIZATIONACRONYM'].unique()
array(['DPW', 'DDOT', 'DMV', 'DOEE', 'FEMS', 'DOH', 'OUC', 'ORM', 'DC-ICH'], dtype=object)
merged['STATUS_CODE'].unique()
array(['CLOSED', 'OPEN'], dtype=object)
Let's check the number of possible outcomes, i.e. the number of unique values of the target variable:
len(merged['SERVICETYPECODEDESCRIPTION'].unique())
22
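With 22 possible classes, any model worth keeping should at least beat the majority-class baseline, i.e. always predicting the most frequent service type. A minimal sketch of that check on a toy series (in the notebook, the real computation would use merged['SERVICETYPECODEDESCRIPTION']):

```python
import pandas as pd

#Toy stand-in for the SERVICETYPECODEDESCRIPTION column
labels = pd.Series(['SNOW', 'SNOW', 'SNOW', 'PEMA', 'SWMA'])

#Baseline accuracy = share of the most frequent class
baseline = labels.value_counts(normalize=True).iloc[0]
print(baseline)  # 0.6
```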
#Import necessary packages
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
#Converting categorical (text) fields to numbers
number = LabelEncoder()
merged['SERVICETYPE_NUMBER'] = number.fit_transform(merged['SERVICETYPECODEDESCRIPTION'].astype('str'))
merged['STATUS_CODE_NUMBER'] = number.fit_transform(merged['STATUS_CODE'].astype('str'))
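Reusing the same LabelEncoder instance is safe here because fit_transform refits the encoder on each call, and the mapping can always be reversed with inverse_transform. A small self-contained illustration:

```python
from sklearn.preprocessing import LabelEncoder

number = LabelEncoder()
#Classes are sorted alphabetically: CLOSED -> 0, OPEN -> 1
codes = number.fit_transform(['OPEN', 'CLOSED', 'OPEN'])
print(list(codes))                            # [1, 0, 1]
print(list(number.classes_))                  # ['CLOSED', 'OPEN']
print(list(number.inverse_transform(codes)))  # ['OPEN', 'CLOSED', 'OPEN']
```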
#Extracting desired fields
data = merged[['SERVICETYPECODEDESCRIPTION', 'SERVICETYPE_NUMBER', 'QUADRANT', 'CLUSTER', 'ORGANIZATIONACRONYM', 'STATUS_CODE', 'STATUS_CODE_NUMBER']].copy()
data.reset_index(inplace=True)
data.head()
index | SERVICETYPECODEDESCRIPTION | SERVICETYPE_NUMBER | QUADRANT | CLUSTER | ORGANIZATIONACRONYM | STATUS_CODE | STATUS_CODE_NUMBER | |
---|---|---|---|---|---|---|---|---|
0 | 0 | SNOW | 13 | SE | 37 | DPW | CLOSED | 0 |
1 | 9 | SNOW | 13 | SE | 37 | DPW | CLOSED | 0 |
2 | 184 | SNOW | 13 | SE | 37 | DPW | CLOSED | 0 |
3 | 316 | SNOW | 13 | NaN | 37 | DPW | CLOSED | 0 |
4 | 361 | SWMA- Solid Waste Management Admistration | 14 | SE | 37 | DPW | CLOSED | 0 |
Let's binarize (one-hot encode) the values in the QUADRANT (4 quadrants plus a 'NaN' placeholder) and ORGANIZATIONACRONYM (9 agencies) fields. Wondering why we don't do the same for CLUSTER? Adjacent clusters carry adjacent numbers, so keeping the field numeric preserves some spatial structure.
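Because generateQuadrant returns the string 'NaN' rather than a true missing value, get_dummies gives missing quadrants an indicator column of their own. A toy illustration (not the real column):

```python
import pandas as pd

df = pd.DataFrame({'QUADRANT': ['SE', 'NW', 'NaN', 'SE']})
dummies = pd.get_dummies(df, columns=['QUADRANT'])
#The 'NaN' placeholder string becomes its own QUADRANT_NaN column
print(sorted(dummies.columns))
```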
data = pd.get_dummies(data=data, columns=['QUADRANT', 'ORGANIZATIONACRONYM'])
data.head()
index | SERVICETYPECODEDESCRIPTION | SERVICETYPE_NUMBER | CLUSTER | STATUS_CODE | STATUS_CODE_NUMBER | QUADRANT_NE | QUADRANT_NW | QUADRANT_NaN | QUADRANT_SE | QUADRANT_SW | ORGANIZATIONACRONYM_DC-ICH | ORGANIZATIONACRONYM_DDOT | ORGANIZATIONACRONYM_DMV | ORGANIZATIONACRONYM_DOEE | ORGANIZATIONACRONYM_DOH | ORGANIZATIONACRONYM_DPW | ORGANIZATIONACRONYM_FEMS | ORGANIZATIONACRONYM_ORM | ORGANIZATIONACRONYM_OUC | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | SNOW | 13 | 37 | CLOSED | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
1 | 9 | SNOW | 13 | 37 | CLOSED | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
2 | 184 | SNOW | 13 | 37 | CLOSED | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
3 | 316 | SNOW | 13 | 37 | CLOSED | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
4 | 361 | SWMA- Solid Waste Management Admistration | 14 | 37 | CLOSED | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
#Extracting input dataframe
model_data = data.drop(['SERVICETYPECODEDESCRIPTION', 'SERVICETYPE_NUMBER', 'STATUS_CODE'], axis=1)
model_data.head()
index | CLUSTER | STATUS_CODE_NUMBER | QUADRANT_NE | QUADRANT_NW | QUADRANT_NaN | QUADRANT_SE | QUADRANT_SW | ORGANIZATIONACRONYM_DC-ICH | ORGANIZATIONACRONYM_DDOT | ORGANIZATIONACRONYM_DMV | ORGANIZATIONACRONYM_DOEE | ORGANIZATIONACRONYM_DOH | ORGANIZATIONACRONYM_DPW | ORGANIZATIONACRONYM_FEMS | ORGANIZATIONACRONYM_ORM | ORGANIZATIONACRONYM_OUC | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 37 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
1 | 9 | 37 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
2 | 184 | 37 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
3 | 316 | 37 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
4 | 361 | 37 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
#Defining independent and dependent variables
y = data['SERVICETYPE_NUMBER'].values
X = model_data.values
#Splitting the data into 70% training and 30% test samples
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size = .3, random_state=522, stratify=y)
#n_estimators = number of trees in the forest
#min_samples_leaf = minimum number of samples required to be at a leaf node for the tree
rf = RandomForestClassifier(n_estimators=1500, min_samples_leaf=20, random_state=522)
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)
print(y_pred)
[14 18 14 ..., 14 14 14]
print('Accuracy: ', accuracy_score(y_test, y_pred))
Accuracy: 0.705286168521
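To see which predictors drive this accuracy, the fitted forest's feature_importances_ attribute can be inspected. A minimal sketch on synthetic data (in the notebook, you would zip rf.feature_importances_ with model_data.columns instead):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(522)
X_demo = rng.integers(0, 2, size=(200, 3))
y_demo = X_demo[:, 0]  #only the first feature matters by construction

#max_features=None so each split considers all features
clf = RandomForestClassifier(n_estimators=50, max_features=None,
                             random_state=522).fit(X_demo, y_demo)
for name, imp in zip(['f0', 'f1', 'f2'], clf.feature_importances_):
    print(name, round(imp, 3))
```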
data = merged[['SERVICETYPECODEDESCRIPTION', 'SERVICETYPE_NUMBER', 'QUADRANT', 'CLUSTER', 'ORGANIZATIONACRONYM', 'STATUS_CODE', 'STATUS_CODE_NUMBER']].copy()
data.reset_index(inplace=True)
data.head()
index | SERVICETYPECODEDESCRIPTION | SERVICETYPE_NUMBER | QUADRANT | CLUSTER | ORGANIZATIONACRONYM | STATUS_CODE | STATUS_CODE_NUMBER | |
---|---|---|---|---|---|---|---|---|
0 | 0 | SNOW | 13 | SE | 37 | DPW | CLOSED | 0 |
1 | 9 | SNOW | 13 | SE | 37 | DPW | CLOSED | 0 |
2 | 184 | SNOW | 13 | SE | 37 | DPW | CLOSED | 0 |
3 | 316 | SNOW | 13 | NaN | 37 | DPW | CLOSED | 0 |
4 | 361 | SWMA- Solid Waste Management Admistration | 14 | SE | 37 | DPW | CLOSED | 0 |
data_test = pd.get_dummies(data=data,columns=['QUADRANT'])
data_test.head()
index | SERVICETYPECODEDESCRIPTION | SERVICETYPE_NUMBER | CLUSTER | ORGANIZATIONACRONYM | STATUS_CODE | STATUS_CODE_NUMBER | QUADRANT_NE | QUADRANT_NW | QUADRANT_NaN | QUADRANT_SE | QUADRANT_SW | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | SNOW | 13 | 37 | DPW | CLOSED | 0 | 0 | 0 | 0 | 1 | 0 |
1 | 9 | SNOW | 13 | 37 | DPW | CLOSED | 0 | 0 | 0 | 0 | 1 | 0 |
2 | 184 | SNOW | 13 | 37 | DPW | CLOSED | 0 | 0 | 0 | 0 | 1 | 0 |
3 | 316 | SNOW | 13 | 37 | DPW | CLOSED | 0 | 0 | 0 | 1 | 0 | 0 |
4 | 361 | SWMA- Solid Waste Management Admistration | 14 | 37 | DPW | CLOSED | 0 | 0 | 0 | 0 | 1 | 0 |
model_test_data = data_test.drop(['SERVICETYPECODEDESCRIPTION', 'SERVICETYPE_NUMBER', 'STATUS_CODE', 'ORGANIZATIONACRONYM'], axis=1)
model_test_data.head()
index | CLUSTER | STATUS_CODE_NUMBER | QUADRANT_NE | QUADRANT_NW | QUADRANT_NaN | QUADRANT_SE | QUADRANT_SW | |
---|---|---|---|---|---|---|---|---|
0 | 0 | 37 | 0 | 0 | 0 | 0 | 1 | 0 |
1 | 9 | 37 | 0 | 0 | 0 | 0 | 1 | 0 |
2 | 184 | 37 | 0 | 0 | 0 | 0 | 1 | 0 |
3 | 316 | 37 | 0 | 0 | 0 | 1 | 0 | 0 |
4 | 361 | 37 | 0 | 0 | 0 | 0 | 1 | 0 |
y = data['SERVICETYPE_NUMBER'].values
X = model_test_data.values
#Splitting the data into 70% training and 30% test samples
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size = .3, random_state=522, stratify=y)
#n_estimators = number of trees in the forest
#min_samples_leaf = minimum number of samples required to be at a leaf node for the tree
rf = RandomForestClassifier(n_estimators=1500, min_samples_leaf=20, random_state=522)
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)
print(y_pred)
[14 14 18 ..., 14 14 14]
print('Accuracy: ', accuracy_score(y_test, y_pred))
Accuracy: 0.52146263911
A drop in accuracy from 70.53% to 52.15% after removing the ORGANIZATIONACRONYM dummies demonstrates the importance of choosing the right predictors.
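A single 70/30 split can be noisy, so cross-validation gives a sturdier comparison of the two feature sets. A hedged sketch using cross_val_score on synthetic data (in the notebook, X and y would be the arrays built from model_data or model_test_data):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(522)
X_demo = rng.integers(0, 2, size=(300, 4))
y_demo = X_demo[:, 0]  #perfectly predictable from the first feature

rf_demo = RandomForestClassifier(n_estimators=50, max_features=None, random_state=522)
#Five stratified folds; compare means (and spreads) across feature sets
scores = cross_val_score(rf_demo, X_demo, y_demo, cv=5)
print(scores.mean(), scores.std())
```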
#Count of service requests per cluster
cluster_count = merged.groupby('NAME').size().reset_index(name='counts')
cluster_count.head()
NAME | counts | |
---|---|---|
0 | Cluster 1 | 349 |
1 | Cluster 10 | 353 |
2 | Cluster 11 | 506 |
3 | Cluster 12 | 140 |
4 | Cluster 13 | 369 |
#merge with original file
neighborhood = pd.merge(neighborhood, cluster_count, on='NAME')
neighborhood.head()
OBJECTID | WEB_URL | NAME | NBH_NAMES | Shape_Leng | Shape_Area | TYPE | geometry | counts | |
---|---|---|---|---|---|---|---|---|---|
0 | 1 | http://planning.dc.gov/ | Cluster 39 | Congress Heights, Bellevue, Washington Highlands | 10711.668010 | 4.886463e+06 | Original | POLYGON ((-76.99401890037231 38.84519662346873... | 461 |
1 | 2 | http://planning.dc.gov/ | Cluster 38 | Douglas, Shipley Terrace | 8229.486324 | 2.367958e+06 | Original | POLYGON ((-76.97471813575507 38.8528706360112,... | 144 |
2 | 3 | http://planning.dc.gov/ | Cluster 36 | Woodland/Fort Stanton, Garfield Heights, Knox ... | 4746.344457 | 1.119573e+06 | Original | POLYGON ((-76.9687730019474 38.86067206227963,... | 87 |
3 | 4 | http://planning.dc.gov/ | Cluster 27 | Near Southeast, Navy Yard | 7286.968902 | 1.619167e+06 | Original | POLYGON ((-76.9872595922274 38.87711832849107,... | 84 |
4 | 5 | http://planning.dc.gov/ | Cluster 32 | River Terrace, Benning, Greenway, Dupont Park | 11251.012821 | 4.286254e+06 | Original | POLYGON ((-76.93760147029893 38.88995958845385... | 241 |
temp = neighborhood.sort_values(['counts'], ascending=[False])
temp[['NAME', 'NBH_NAMES', 'counts']]
NAME | NBH_NAMES | counts | |
---|---|---|---|
33 | Cluster 2 | Columbia Heights, Mt. Pleasant, Pleasant Plain... | 1192 |
20 | Cluster 18 | Brightwood Park, Crestwood, Petworth | 1141 |
30 | Cluster 25 | Union Station, Stanton Park, Kingman Park | 1018 |
13 | Cluster 6 | Dupont Circle, Connecticut Avenue/K Street | 879 |
38 | Cluster 26 | Capitol Hill, Lincoln Park | 759 |
5 | Cluster 8 | Downtown, Chinatown, Penn Quarters, Mount Vern... | 757 |
32 | Cluster 21 | Edgewood, Bloomingdale, Truxton Circle, Eckington | 744 |
23 | Cluster 17 | Takoma, Brightwood, Manor Park | 543 |
8 | Cluster 31 | Deanwood, Burrville, Grant Park, Lincoln Heigh... | 537 |
29 | Cluster 34 | Twining, Fairlawn, Randle Highlands, Penn Bran... | 514 |
14 | Cluster 3 | Howard University, Le Droit Park, Cardozo/Shaw | 512 |
21 | Cluster 11 | Friendship Heights, American University Park, ... | 506 |
9 | Cluster 7 | Shaw, Logan Circle | 503 |
16 | Cluster 33 | Capitol View, Marshall Heights, Benning Heights | 492 |
15 | Cluster 9 | Southwest Employment Area, Southwest/Waterfron... | 476 |
6 | Cluster 5 | West End, Foggy Bottom, GWU | 467 |
0 | Cluster 39 | Congress Heights, Bellevue, Washington Highlands | 461 |
34 | Cluster 22 | Brookland, Brentwood, Langdon | 460 |
11 | Cluster 23 | Ivy City, Arboretum, Trinidad, Carver Langston | 421 |
22 | Cluster 19 | Lamont Riggs, Queens Chapel, Fort Totten, Plea... | 373 |
17 | Cluster 13 | Spring Valley, Palisades, Wesley Heights, Foxh... | 369 |
12 | Cluster 4 | Georgetown, Burleith/Hillandale | 365 |
25 | Cluster 10 | Hawthorne, Barnaby Woods, Chevy Chase | 353 |
31 | Cluster 1 | Kalorama Heights, Adams Morgan, Lanier Heights | 349 |
37 | Cluster 15 | Cleveland Park, Woodley Park, Massachusetts Av... | 332 |
18 | Cluster 20 | North Michigan Park, Michigan Park, University... | 270 |
4 | Cluster 32 | River Terrace, Benning, Greenway, Dupont Park | 241 |
36 | Cluster 14 | Cathedral Heights, McLean Gardens, Glover Park | 195 |
35 | Cluster 24 | Woodridge, Fort Lincoln, Gateway | 181 |
7 | Cluster 30 | Mayfair, Hillbrook, Mahaning Heights | 164 |
43 | Cluster 45 | National Mall, Potomac River | 163 |
1 | Cluster 38 | Douglas, Shipley Terrace | 144 |
19 | Cluster 12 | North Cleveland Park, Forest Hills, Van Ness | 140 |
24 | Cluster 16 | Colonial Village, Shepherd Park, North Portal ... | 138 |
26 | Cluster 28 | Historic Anacostia | 134 |
27 | Cluster 35 | Fairfax Village, Naylor Gardens, Hillcrest, Su... | 129 |
28 | Cluster 37 | Sheridan, Barry Farm, Buena Vista | 98 |
2 | Cluster 36 | Woodland/Fort Stanton, Garfield Heights, Knox ... | 87 |
3 | Cluster 27 | Near Southeast, Navy Yard | 84 |
10 | Cluster 29 | Eastland Gardens, Kenilworth | 46 |
42 | Cluster 46 | Arboretum, Anacostia River | 17 |
41 | Cluster 44 | Joint Base Anacostia-Bolling | 8 |
40 | Cluster 43 | Saint Elizabeths | 5 |
39 | Cluster 41 | Rock Creek Park | 4 |
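Raw counts favor geographically large clusters, so dividing counts by Shape_Area gives a request density that is often a fairer ranking. A minimal sketch on a toy frame mirroring the neighborhood columns (assuming Shape_Area is in square metres, as the projected shapefile suggests):

```python
import pandas as pd

#Toy stand-in for the merged neighborhood frame
toy = pd.DataFrame({
    'NAME': ['Cluster 39', 'Cluster 38'],
    'counts': [461, 144],
    'Shape_Area': [4.886463e+06, 2.367958e+06],
})
#Requests per square kilometre
toy['density'] = toy['counts'] / (toy['Shape_Area'] / 1e6)
print(toy.sort_values('density', ascending=False)[['NAME', 'density']])
```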
#Viewing the map
from arcgis.gis import GIS
gis = GIS("http://dcdev.maps.arcgis.com/", "username", "password")
search_result = gis.content.search("Neighborhood_Service_Requests")
search_result[0]
The next steps for this analysis are:
- Use historical data to train a predictive model that includes the month of the request, the duration of service, etc.
- Binarize Neighborhood Clusters for increased accuracy
- Test for spatial/temporal autocorrelation within each neighborhood cluster