Data2go Spatial Analysis

This code calculates the spatial autocorrelation of the different measures in the data2go.nyc dataset.

In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import shapefile
from scipy import spatial
import pysal
from shapely.geometry import Polygon as Shp_poly
from bokeh.plotting import figure, output_notebook, show

output_notebook()
BokehJS successfully loaded.

Load Data

Each column is a different measure in the data set. Each row is a New York City community district. The first column, 'GEO_ID', is a number that identifies the community district and matches the identifier used in the community district shapefile. One column, 'hiv_test_cd', is removed because it contains bad values.

In [2]:
d = pd.read_csv("cd_data_reduced.csv", low_memory=False)
# Drop the column with bad values (keyword form; the positional axis argument is deprecated)
d = d.drop(columns='hiv_test_cd')
d
Out[2]:
GEO_ID GEO LABEL air_qual_cd diversion_cd lead_complaints_cd lead_kids_cd waste_cd water_per_1000_cd noise_per_1000_cd citu_land_cd ... homicides_cd robberies_cd major_felonies_cd burglaries_per_1000_cd felony_assaults_per_1000_cd grand_larcenies_per_1000_cd motor_vehicle_thefts_per_1000_cd homicides_per_1000_cd robberies_per_1000_cd major_felonies_per_1000_cd
0 201 Bronx CD 001 10.0 4.934534 844 2 0.261307 12.794864 10.673994 37.804800 ... 7 381 1515 2.038568 4.957139 5.147068 1.120579 0.069641 3.982172 17.315168
1 202 Bronx CD 002 9.8 8.703899 761 5 0.261307 12.794864 10.673994 51.770195 ... 4 248 1220 2.038568 4.957139 5.147068 1.120579 0.069641 3.982172 17.315168
2 203 Bronx CD 003 9.4 6.220565 786 2 0.282632 13.295392 11.816146 14.582070 ... 5 287 1240 2.566403 5.281324 4.461501 1.003986 0.089111 3.879308 17.453915
3 204 Bronx CD 004 10.0 7.639643 2906 5 0.359948 21.782562 19.094777 17.247520 ... 15 406 2129 1.731969 4.836577 4.541927 1.164228 0.107799 2.917757 15.300256
4 205 Bronx CD 005 10.1 9.409869 2460 11 0.308467 20.595051 10.543414 10.927182 ... 13 309 1508 1.363565 3.628723 3.062433 0.782373 0.096865 2.302413 11.236374
5 206 Bronx CD 006 9.5 10.004536 1530 4 0.282632 13.295392 11.816146 15.426946 ... 10 366 1669 2.566403 5.281324 4.461501 1.003986 0.089111 3.879308 17.453915
6 207 Bronx CD 007 9.4 10.955847 2452 13 0.350677 20.186511 21.770401 12.216572 ... 7 397 1969 2.796050 4.202156 4.800155 0.848512 0.056567 3.208185 15.911625
7 208 Bronx CD 008 8.9 16.743962 460 2 0.266786 11.652385 9.631148 5.829461 ... 1 119 855 1.434426 1.024590 3.744411 0.642697 0.009314 1.108420 7.963860
8 209 Bronx CD 009 8.9 7.547203 1258 5 0.243922 9.254031 9.727218 10.881311 ... 9 493 2173 1.613137 3.199389 3.269292 0.903357 0.048394 2.650922 11.684492
9 210 Bronx CD 010 8.7 15.416899 169 3 0.240634 7.779763 6.951437 10.141592 ... 0 135 906 1.112555 1.169401 3.037193 0.942017 0.000000 1.096313 7.357479
10 211 Bronx CD 011 8.7 13.443915 561 7 0.284266 11.025200 7.946735 12.121248 ... 6 150 970 1.463663 1.288660 2.736413 0.986382 0.047728 1.193204 7.716049
11 212 Bronx CD 012 8.7 14.294287 891 9 0.319219 15.651635 6.684699 10.297619 ... 16 398 1892 2.514687 3.662849 2.909147 1.324261 0.112703 2.803488 13.327135
12 301 Brooklyn CD 001 10.1 13.654087 307 8 0.393413 11.196945 57.486997 42.001059 ... 3 333 2590 3.765699 2.116536 7.391185 1.776021 0.020030 2.223365 17.292836
13 302 Brooklyn CD 002 9.5 17.576300 104 3 0.227319 7.829714 35.214038 29.093225 ... 3 317 2014 2.148253 2.148253 8.089393 0.944287 0.023607 2.494492 15.848285
14 303 Brooklyn CD 003 8.8 8.881484 1264 12 0.345026 17.104300 36.107467 7.524312 ... 18 614 3120 4.812396 5.080557 6.377875 1.761163 0.130457 4.450017 22.612464
15 304 Brooklyn CD 004 8.8 10.553904 986 9 0.259103 14.328291 21.702324 9.876285 ... 9 348 1801 3.491122 2.245792 3.393174 0.972477 0.062966 2.434690 12.600221
16 305 Brooklyn CD 005 8.7 9.397541 1164 11 0.338727 17.402564 10.261556 13.802139 ... 19 735 3229 3.165868 5.470672 5.561310 1.825715 0.123009 4.758514 20.905089
17 306 Brooklyn CD 006 9.4 12.975605 102 2 0.235693 10.430305 30.317775 33.429237 ... 1 138 1246 1.840122 1.388938 5.582292 0.981988 0.008847 1.220850 11.023037
18 307 Brooklyn CD 007 9.2 9.911233 372 6 0.255153 7.275520 14.863405 32.410091 ... 1 177 1036 1.158356 1.464215 2.388297 0.572671 0.006508 1.151849 6.741895
19 308 Brooklyn CD 008 8.8 7.975347 1078 2 0.256631 12.789961 25.884826 8.392331 ... 7 268 1553 2.350978 2.960788 3.939693 1.002977 0.056167 2.150382 12.460984
20 309 Brooklyn CD 009 8.6 12.485847 1168 7 0.331695 14.010778 17.588190 5.960514 ... 5 282 1462 1.612100 3.378164 4.890640 0.760766 0.045284 2.554001 13.240955
21 310 Brooklyn CD 010 8.4 24.365336 163 3 0.298262 7.259862 10.414991 4.481255 ... 1 90 919 1.447377 0.957260 3.109181 0.827073 0.007658 0.689227 7.037777
22 311 Brooklyn CD 011 8.2 17.116505 356 10 0.309493 8.095016 6.174528 11.506743 ... 1 165 1274 1.598618 1.078263 2.735890 0.531085 0.005364 0.885141 6.834361
23 312 Brooklyn CD 012 8.5 20.219170 354 13 0.438510 8.158301 8.047721 8.760843 ... 1 131 1100 1.707837 0.774056 2.911924 0.552897 0.006143 0.804772 6.757628
24 313 Brooklyn CD 013 8.0 17.556384 186 8 0.246567 9.151262 7.692085 6.019713 ... 8 237 1212 1.807060 2.618788 4.348540 0.570142 0.077307 2.290231 11.712069
25 314 Brooklyn CD 014 8.6 14.418601 1472 9 0.372077 15.149177 13.605967 7.585208 ... 3 321 1678 1.884002 2.076903 4.140947 0.604424 0.019290 2.064043 10.789609
26 315 Brooklyn CD 015 8.1 12.494749 175 8 0.349920 11.727974 8.347831 8.777353 ... 2 214 1626 2.282093 1.131124 5.106597 0.807001 0.013230 1.415559 10.755604
27 316 Brooklyn CD 016 8.8 13.286349 795 4 0.183807 10.300626 3.507438 13.713551 ... 18 412 1785 2.177304 4.592132 3.246162 0.712572 0.142514 3.261997 14.132681
28 317 Brooklyn CD 017 8.7 17.791859 1390 5 0.382220 19.655031 9.678501 9.944899 ... 12 469 2208 2.443840 4.276720 4.962188 1.184666 0.089409 3.494393 16.451216
29 318 Brooklyn CD 018 8.2 14.443406 189 4 0.316284 9.572800 7.185765 8.449893 ... 6 370 2128 1.879050 1.918505 3.886330 0.956787 0.029591 1.824800 10.495063
30 101 Manhattan CD 001 11.1 29.276651 15 1 0.264751 9.765702 85.724822 41.669408 ... 0 48 905 2.462827 1.692371 15.066707 0.553148 0.019755 1.211658 21.006467
31 102 Manhattan CD 002 10.9 26.498015 35 0 0.264751 9.765702 85.724822 35.579306 ... 3 136 2285 2.462827 1.692371 15.066707 0.553148 0.019755 1.211658 21.006467
32 103 Manhattan CD 003 9.9 13.098490 144 1 0.214429 9.432771 40.815222 9.467428 ... 3 215 1893 1.130249 1.947876 6.595124 0.396790 0.018036 1.292572 11.380647
33 104 Manhattan CD 004 11.4 23.122846 78 0 0.298540 18.970322 80.849704 45.851750 ... 1 215 2602 4.975366 3.592548 37.933694 0.764372 0.013898 3.043590 50.323468
34 105 Manhattan CD 005 14.3 23.787010 13 1 0.298540 18.970322 80.849704 70.293424 ... 1 223 4640 4.975366 3.592548 37.933694 0.764372 0.013898 3.043590 50.323468
35 106 Manhattan CD 006 12.3 25.414852 18 0 0.236906 6.840375 36.402277 20.669567 ... 0 117 1448 1.325707 0.826859 6.587534 0.355344 0.000000 0.799524 9.894968
36 107 Manhattan CD 007 10.3 24.381686 190 2 0.321260 6.341512 29.879281 7.025011 ... 1 148 1511 1.049954 0.793995 4.941573 0.329090 0.005224 0.773100 7.892936
37 108 Manhattan CD 008 11.1 24.478170 98 2 0.295811 6.049860 25.821003 6.803935 ... 0 97 1853 1.042762 0.482335 6.169295 0.372087 0.000000 0.445586 8.512065
38 109 Manhattan CD 009 9.8 13.032074 797 6 0.223021 14.893890 27.035813 8.021353 ... 8 238 1155 1.050889 2.011053 3.462641 0.347776 0.060483 1.799363 8.732205
39 110 Manhattan CD 010 9.6 9.136804 643 0 0.244959 11.855855 30.199843 6.464277 ... 4 368 1436 0.981275 2.724137 3.785965 0.300241 0.029292 2.694845 10.515755
40 111 Manhattan CD 011 9.7 9.060845 534 6 0.218958 9.605578 30.672641 11.453541 ... 9 452 1830 1.646909 4.397332 4.999248 0.401277 0.075240 3.778696 15.298701
41 112 Manhattan CD 012 9.5 12.949045 1914 5 0.280565 17.285812 48.328661 6.981096 ... 3 348 1832 1.305034 1.705496 3.481661 0.485265 0.014134 1.639537 8.631128
42 401 Queens CD 001 8.9 17.143394 253 6 0.286796 10.237873 28.072528 28.761721 ... 2 256 2189 2.425373 3.917910 3.842118 1.072761 0.011660 1.492537 12.762360
43 402 Queens CD 002 10.1 15.357682 139 5 0.250958 7.958600 19.552245 39.896079 ... 2 140 1128 1.576913 1.140116 3.635044 0.947629 0.014807 1.036469 8.350978
44 403 Queens CD 003 8.4 15.992565 264 16 0.305240 6.845518 9.062777 12.178550 ... 6 318 1634 1.636036 1.630654 3.035277 0.748056 0.032290 1.711380 8.793693
45 404 Queens CD 004 8.9 19.370088 212 12 0.349231 7.739302 6.638663 11.851459 ... 8 257 1296 2.159481 1.832076 2.549581 0.640878 0.055729 1.790280 9.028024
46 405 Queens CD 005 8.8 11.953827 291 5 0.344706 10.461702 11.498827 14.786872 ... 3 176 1437 1.724523 1.272288 3.316390 1.272288 0.018089 1.061245 8.664822
47 406 Queens CD 006 8.7 14.879700 85 1 0.290291 9.809293 9.287522 11.579859 ... 0 46 641 0.930491 0.469594 2.956702 0.817441 0.000000 0.400024 5.574252
48 407 Queens CD 007 8.4 10.856874 104 6 0.310764 8.192808 6.495669 10.276585 ... 6 169 1897 1.765345 1.023098 3.281938 0.838539 0.024073 0.678053 7.611046
49 408 Queens CD 008 8.2 18.312693 214 2 0.284319 12.405785 15.263151 3.470133 ... 1 165 877 0.930759 0.520705 2.369206 0.807092 0.006509 1.073953 5.708223
50 409 Queens CD 009 8.4 18.614805 115 14 0.310691 8.260813 11.131316 11.353905 ... 3 217 1356 1.733991 1.435251 2.740616 1.467723 0.019483 1.409274 8.806338
51 410 Queens CD 010 8.0 13.210223 38 9 0.346150 8.653787 7.916641 4.881492 ... 2 259 1472 1.842865 1.835186 3.862338 1.758400 0.015357 1.988759 11.302906
52 411 Queens CD 011 8.1 12.247413 7 1 0.298687 10.123203 5.654402 3.567738 ... 0 38 783 1.683055 0.489164 3.308074 0.696437 0.000000 0.315055 6.491784
53 412 Queens CD 012 8.3 19.336376 307 8 0.330853 12.062967 7.217413 10.920976 ... 17 600 3139 2.460964 3.555669 3.398676 1.285642 0.072132 2.545825 13.318907
54 413 Queens CD 013 7.9 19.086820 73 9 0.318724 10.349646 4.614364 5.967289 ... 5 188 1508 1.387325 1.588387 2.543429 1.090759 0.025133 0.944990 7.580023
55 414 Queens CD 014 7.6 15.632987 473 1 0.321496 12.663291 7.012468 3.891188 ... 6 173 1214 2.119059 3.242415 2.629675 0.816987 0.051062 1.472278 10.331475
56 501 Richmond CD 001 8.1 15.191572 280 6 0.347799 15.418140 11.258590 15.843469 ... 11 292 1619 1.645661 2.372021 2.837346 0.612867 0.062422 1.657010 9.187327
57 502 Richmond CD 002 8.1 17.048487 17 0 0.385287 14.251147 5.228728 15.683740 ... 2 57 634 1.061012 0.824383 2.144924 0.358759 0.015266 0.435091 4.839436
58 503 Richmond CD 003 7.8 17.966814 1 0 0.414923 12.705622 5.747056 10.507627 ... 2 30 563 0.803614 0.407895 1.759427 0.261783 0.012176 0.182640 3.427535

59 rows × 112 columns
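The notebook drops 'hiv_test_cd' by name. As a rough sketch of how such a column could be found programmatically, the helper below flags every column that has missing or non-numeric entries. The function name `columns_with_bad_values` and the toy table are assumptions made for illustration; the values are not taken from the real data set.

```python
import pandas as pd


def columns_with_bad_values(frame, skip=()):
    """Return the names of columns holding missing or non-numeric entries."""
    bad = []
    for name in frame.columns:
        if name in skip:
            continue
        # Coerce to numbers; anything that cannot be parsed becomes NaN.
        numeric = pd.to_numeric(frame[name], errors="coerce")
        if numeric.isna().any():
            bad.append(name)
    return bad


# Toy stand-in for the real table (made-up values, hypothetical bad entries).
toy = pd.DataFrame({
    "GEO_ID": [201, 202, 203],
    "GEO LABEL": ["Bronx CD 001", "Bronx CD 002", "Bronx CD 003"],
    "air_qual_cd": [10.0, 9.8, 9.4],
    "hiv_test_cd": ["61.2", "n/a", None],
})

bad_columns = columns_with_bad_values(toy, skip=("GEO_ID", "GEO LABEL"))
cleaned = toy.drop(columns=bad_columns)
print(bad_columns)
```

The identifier and label columns are skipped because they are not measures.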

Edit Shapefile

Get the NYC community district shapefile from here: http://www.nyc.gov/html/dcp/html/bytes/districts_download_metadata.shtml

Some of the community districts have no associated data in the data2go dataset, most likely because they are parks or natural areas. (For example, Central Park is one of the community districts.) This block of code removes those community districts from the shapefile and saves the result as a new shapefile.

In [3]:
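sf = shapefile.Reader("./nycd_15d/nycd.shp")
iD = []        # district IDs, in shapefile order
area = []      # second field of each record
measures = []  # first measure column, for districts that have data
index = 0      # position of the current record in the original shapefile
del_cnt = 0    # number of shapes deleted so far
# Note: shapefile.Editor is only available in pyshp 1.x.
e = shapefile.Editor(shapefile="./nycd_15d/nycd.shp")
for rec in sf.iterRecords():
    iD.append(rec[0])
    area.append(rec[1])
    row = d.loc[d['GEO_ID'] == rec[0]]
    if row.empty:
        # Earlier deletions shift the remaining shapes down, so correct the index.
        e.delete(index - del_cnt)
        del_cnt += 1
    else:
        measures.append(row.iloc[:, 2].values[0])

    index += 1

e.save("./nycd_15d/nycd_reduced")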
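The `index - del_cnt` correction above is easy to get wrong, so here is a small, library-free sketch of the same bookkeeping on a plain list. The function names and the district IDs are hypothetical and only illustrate why each deletion position has to be shifted by the number of earlier deletions.

```python
def positions_to_delete(record_ids, valid_ids):
    """Positions to pass to successive delete calls, one per record without data."""
    valid = set(valid_ids)
    positions = []
    deleted = 0
    for index, rec_id in enumerate(record_ids):
        if rec_id not in valid:
            # Every earlier deletion moved this record one slot to the left.
            positions.append(index - deleted)
            deleted += 1
    return positions


def apply_deletes(records, positions):
    """Delete the given positions one after another, like repeated e.delete calls."""
    remaining = list(records)
    for pos in positions:
        del remaining[pos]
    return remaining


# Hypothetical IDs: three districts with data, three without.
shape_ids = [101, 164, 102, 226, 227, 201]
ids_with_data = [101, 102, 201]

positions = positions_to_delete(shape_ids, ids_with_data)
kept = apply_deletes(shape_ids, positions)
print(positions)  # [1, 2, 2]
print(kept)       # [101, 102, 201]
```

Deleting positions 1, 3 and 4 without the correction would remove the wrong records, because the list shrinks after each deletion.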

Spatial Analysis

This section computes two spatial autocorrelation statistics, the Gamma index and Moran's I, for each measure in 'd'.
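As a reminder of what Moran's I measures, the sketch below computes it directly with NumPy as I = (n / S0) * (z' W z) / (z' z), where z holds the deviations from the mean and S0 is the sum of all weights. The function `morans_i` and the four-area chain are assumptions for illustration only. PySAL applies its own weight transformation, so its values need not match this untransformed version exactly.

```python
import numpy as np


def morans_i(values, weights):
    """Moran's I for a 1-D array of values and a square weights matrix."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()   # deviations from the mean
    s0 = w.sum()       # sum of all weights
    return (len(x) / s0) * (z @ w @ z) / (z @ z)


# Hypothetical layout: four areas in a row, each touching its direct neighbours.
chain = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

clustered = morans_i([1, 1, 5, 5], chain)     # similar values side by side
alternating = morans_i([1, 5, 1, 5], chain)   # neighbours always differ
print(clustered, alternating)                 # about 0.33 and -1.0
```

Positive values indicate that neighbouring areas tend to have similar values, negative values that they tend to differ. With no spatial autocorrelation the expected value is -1/(n-1), which is what the 'MoranI EI' column holds.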

In [4]:
num_measures = len(d.columns[2:])
fill = {'Gamma mag' : np.zeros(num_measures)}
measure_correlation = pd.DataFrame(fill, index = d.columns[2:] )
# One column per statistic, initialised to NaN (the np.NAN alias was removed in NumPy 2.0)
measure_correlation['Gamma mag'] = np.nan
measure_correlation['Gamma stdmag'] = np.nan
measure_correlation['Gamma pval'] = np.nan
measure_correlation['MoranI I'] = np.nan
measure_correlation['MoranI EI'] = np.nan
measure_correlation['MoranI p_norm'] = np.nan
measure_correlation['MoranI p_rand'] = np.nan

# Records are read from the original shapefile; districts without data are
# skipped below so the values line up with the reduced shapefile used for w.
sf = shapefile.Reader("./nycd_15d/nycd.shp")
w = pysal.rook_from_shapefile("./nycd_15d/nycd_reduced.shp")

for metric in measure_correlation.index:
    measures = []

    for rec in sf.iterRecords():
        row = d.loc[d['GEO_ID'] == rec[0]]
        if row.empty:
            continue
        else:
            measures.append(row[metric].values[0])

    measures = np.array(measures)

    # .loc replaces the .ix indexer, which was removed in pandas 1.0
    g = pysal.Gamma(measures, w)
    measure_correlation.loc[metric, 'Gamma mag'] = g.g
    measure_correlation.loc[metric, 'Gamma stdmag'] = g.g_z
    measure_correlation.loc[metric, 'Gamma pval'] = g.p_sim_g

    mi = pysal.Moran(measures, w, permutations = 999)
    measure_correlation.loc[metric, 'MoranI I'] = mi.I
    measure_correlation.loc[metric, 'MoranI EI'] = mi.EI
    measure_correlation.loc[metric, 'MoranI p_norm'] = mi.p_norm
    measure_correlation.loc[metric, 'MoranI p_rand'] = mi.p_rand
WARNING: there is one disconnected observation (no neighbors)
Island id:  [36]
WARNING:  36  is an island (no neighbors)
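The warning above says that observation 36 shares no boundary edge with any other district under rook contiguity, so it has no neighbors and contributes nothing to the weighted sums. A minimal sketch of how such islands can be found from an adjacency mapping follows. The `neighbors` dictionary here is made up; the PySAL weights object exposes a similar mapping as `w.neighbors` and the list of islands as `w.islands`.

```python
def find_islands(neighbors):
    """Return the ids of observations that have no neighbors."""
    return sorted(key for key, linked in neighbors.items() if len(linked) == 0)


# Hypothetical adjacency: area 3 touches nothing.
neighbors = {
    0: [1],
    1: [0, 2],
    2: [1],
    3: [],
}

islands = find_islands(neighbors)
print(islands)  # [3]
```

Whether to keep an island, drop it, or attach it to its nearest district by hand is a modelling choice; this notebook keeps it.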

Plot some of the calculated autocorrelation values. 'Gamma stdmag' is the standardized Gamma index and 'Gamma pval' is its pseudo p-value, obtained from random permutations of the data (smaller values mean stronger evidence of spatial autocorrelation). For more details: https://pysal.readthedocs.org/en/latest/users/tutorials/autocorrelation.html#gamma-index-of-spatial-autocorrelation

Not surprisingly, low p-values are only observed for large values of the standardized Gamma index.

In [5]:
measure_correlation.plot(kind='scatter',x='Gamma stdmag',y='MoranI I')
Out[5]:
<matplotlib.axes._subplots.AxesSubplot at 0x10e6785d0>
In [6]:
measure_correlation.plot(kind='scatter',x='Gamma stdmag',y='Gamma pval')
Out[6]:
<matplotlib.axes._subplots.AxesSubplot at 0x10ec58450>
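To make the relationship between 'Gamma stdmag' and 'Gamma pval' more concrete, here is a rough NumPy sketch of a cross-product Gamma index with a permutation test. It is not PySAL's implementation: the function names, the one-sided p-value, the eight-area chain and the fixed random seed are all assumptions chosen for illustration.

```python
import numpy as np


def gamma_index(values, weights):
    """Cross-product Gamma index: sum over i, j of w_ij * x_i * x_j."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    return float(x @ w @ x)


def permutation_test(values, weights, permutations=999, seed=0):
    """Return (observed index, standardized index, one-sided pseudo p-value)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(values, dtype=float)
    observed = gamma_index(x, weights)
    # Reference distribution: the same values shuffled across locations.
    sims = np.array([
        gamma_index(rng.permutation(x), weights) for _ in range(permutations)
    ])
    standardized = (observed - sims.mean()) / sims.std()
    # Share of shuffles at least as large as the observed value (plus one).
    p_value = (np.sum(sims >= observed) + 1) / (permutations + 1)
    return observed, standardized, p_value


# Hypothetical layout: eight areas in a row, low values next to high values.
n = 8
chain = np.zeros((n, n))
for i in range(n - 1):
    chain[i, i + 1] = chain[i + 1, i] = 1.0

observed, standardized, p_value = permutation_test(
    [1, 1, 1, 1, 9, 9, 9, 9], chain
)
print(observed, standardized, p_value)
```

Because the p-value counts how many shuffled maps reach the observed index, it can only become small when the observed index sits far out in the upper tail of the shuffled distribution, which is also when the standardized index is large.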

Save

Save the data for later use.

In [7]:
measure_correlation.to_csv('spatial_correlations.csv')