by Chris Prince [chrprince@gmail.com] - 12 May 2018
Similar analysis for NYC as for the plots shown in this tweet: https://twitter.com/geographyjim/status/994949659461341184
import pandas as pd
import geopandas as gpd
from shapely.wkt import loads
import pylab as pl
import seaborn as sns
%matplotlib inline
buildings = pd.read_csv('/home/cmp/data/building.csv')
#buildings = pd.read_csv('/home/cmp/data/chicago_buildings.csv')
NYC: https://data.cityofnewyork.us/Housing-Development/Building-Footprints/nqwf-w8eh/data
Similar data exists for Chicago, but appears incomplete: https://data.cityofchicago.org/Buildings/Building-Footprints-deprecated-August-2015-/qv97-3bvb (and the current version may be broken) as of 12 May 2018.
buildings.head()
the_geom | NAME | CNSTRCT_YR | BIN | LSTMODDATE | LSTSTATYPE | DOITT_ID | HEIGHTROOF | FEAT_CODE | GROUNDELEV | SHAPE_AREA | SHAPE_LEN | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | MULTIPOLYGON (((-73.81023637428498 40.72623326... | NaN | 1993 | 4441987 | 12/19/2017 12:00:00 AM +0000 | Constructed | 1283366 | 21.540000 | 2100 | 70 | 1089.812313 | 166.785929 |
1 | MULTIPOLYGON (((-73.86002815218995 40.57354222... | NaN | 1920 | 4518072 | 08/17/2017 12:00:00 AM +0000 | Constructed | 964744 | 16.381832 | 5110 | 6 | 329.898503 | 77.673856 |
2 | MULTIPOLYGON (((-73.76711333552652 40.61130961... | NaN | 1940 | 4299860 | 08/22/2017 12:00:00 AM +0000 | Constructed | 547717 | 26.795523 | 2100 | 8 | 1234.856322 | 146.929814 |
3 | MULTIPOLYGON (((-73.74704802666373 40.60410892... | NaN | 1930 | 4516837 | 08/17/2017 12:00:00 AM +0000 | Constructed | 861127 | 11.358426 | 5110 | 29 | 275.411758 | 68.476327 |
4 | MULTIPOLYGON (((-73.77058283711517 40.59512166... | NaN | 1931 | 4301765 | 08/22/2017 12:00:00 AM +0000 | Constructed | 288652 | 26.632714 | 2100 | 6 | 1420.221189 | 208.893121 |
Create the geometries from the wkt column:
geometry = [loads(x) for x in buildings.the_geom]
b = gpd.geodataframe.GeoDataFrame(buildings, geometry=geometry)
Convert the dataframe CRS from lat/long to xy-coordinates (epsg 2263)
b.crs = {'init' :'epsg:4326'}
b = b.to_crs({'init' : 'epsg:2263'})
Calculate centroids of the geometries
b['centroid'] = b.geometry.centroid
from shapely.geometry import Point
The coordinates for Times Square (per Google):
ts = Point(-73.9851,40.7589)
Some gymnastics to convert the lat/long of TS to our CRS
ts = gpd.geodataframe.GeoDataFrame([ts])
ts.geometry = ts[0]
ts.crs = ({'init' : 'epsg:4326'})
ts = ts.to_crs({'init' : 'epsg:2263'})
tsft = ts.geometry
Distance calculations are done from the geometry attribute of the GeoDataFrame, so set the geometry to the centroid column:
b.geometry = b.centroid
b['dist'] = b.distance(tsft[0])
ftPerMeter = 3.28084
b['dist_m'] = b['dist']/ftPerMeter
b['HEIGHTROOF_m'] = b['HEIGHTROOF']/ftPerMeter
Use seaborn defaults to emulate look of the source plots:
sns.set()
Drop zero-height entries and cutoff after 25 km:
b = b[b.HEIGHTROOF>0]
b_25k = b[b.dist_m<25000]
A helper function for common labels:
def formatplot():
pl.xticks(fontsize=14)
pl.yticks(fontsize=14)
ax.set_title("Building heights in NYC by distance to Times Square (2018)", size=24)
ax.set_ylabel("roof height (m)", size=18)
ax.set_xlabel("distance to Times Square (m)", size=18)
I don't bother clipping at 50m, since there is some structure up there that might be interesting. Here's the plot similar to those above for London and Paris:
fig, ax=pl.subplots(figsize=(10,10))
pl.plot(b_25k.dist_m, b_25k.HEIGHTROOF_m, 'o', color = 'r', alpha=0.01)
ax.set_xlim((-1000,26000))
ax.set_ylim((-2,55))
formatplot()
pl.savefig('nyc_bheights.png', bbox_inches='tight')
Looking up to 200 meters, there is a smaller peak of very tall buildings at 6 km, which looks to be downtown Manhattan.
fig, ax=pl.subplots(figsize=(10,10))
pl.plot(b_25k.dist_m, b_25k.HEIGHTROOF_m, 'o', color = 'r', alpha=0.01)
ax.set_xlim((-1000,26000))
ax.set_ylim((-2,200))
formatplot()
Finally, including the population of buildings beyond 25km and clipping height at 30 m, and reducing the opacity to discern more structure. Note the sudden reduction in buildings below ~7 meters past 25 km: the only part of NYC further than 25 km from Times Square is the southern half of Staten Island.
Some thoughts on why this might be:
fig, ax=pl.subplots(figsize=(10,10))
pl.plot(b.dist_m, b.HEIGHTROOF_m, 'o', color = 'r', alpha=0.0025)
ax.set_xlim((-1000,40000))
ax.set_ylim((-1,31))
formatplot()
Just to make sure this isn't actually a visual artifact, I'll visually compare the relative numbers of ~8m and ~3m buildings with a 2d histogram in the 20-25 km range to those in the 25+ km range. We'll inspect only buildings below 10 m. First note that there is a visual artifact in the absolute intensity; after 25 km the drop off is significant across all heights:
fig, ax=pl.subplots(figsize=(10,6))
b_10m = b[b.HEIGHTROOF_m<10]
pl.hist2d(b_10m.dist_m, b_10m.HEIGHTROOF_m, cmap='Reds', bins=[50,30]);
formatplot()
pl.colorbar()
<matplotlib.colorbar.Colorbar at 0x7f7c7823e6d0>
Here's the comparison of the 25+ km buildings:
bb=b_10m[b.dist_m>25000]
/home/cmp/anaconda2/lib/python2.7/site-packages/geopandas/geodataframe.py:376: UserWarning: Boolean Series key will be reindexed to match DataFrame index. result = super(GeoDataFrame, self).__getitem__(key)
fig, ax=pl.subplots(figsize=(18,5))
pl.hist2d(bb.dist_m, bb.HEIGHTROOF_m, cmap='Reds', bins=[100,30]);
formatplot()
pl.colorbar()
<matplotlib.colorbar.Colorbar at 0x7f7c947c6850>
And here's the same for 20-25 km:
bbb=b_10m[(b.dist_m<25000) & (b.dist_m>20000)]
fig, ax=pl.subplots(figsize=(9,5))
pl.hist2d(bbb.dist_m, bbb.HEIGHTROOF_m, cmap='Reds', bins=[50,30]);
formatplot()
pl.colorbar()
<matplotlib.colorbar.Colorbar at 0x7f7c94639610>
So it doesn't seem to be an artifact. Not the relative amounts, anyway.