By Ben Welsh
The Los Angeles Times conducted an analysis of California buildings within fire hazard zones for the Dec. 18, 2018, story "A million California buildings face wildfire risk. ‘Extraordinary steps’ are needed to protect them."
It found that at least 1.1 million structures, roughly 1 in 10 in the state, are within the highest risk zones. Here's how we did it.
Set shared variables that will be used by all the notebooks in this repository.
import os
import pandas as pd
import altair as alt
base_dir = os.getcwd()
input_dir = os.path.join(base_dir, 'input')
output_dir = os.path.join(base_dir, 'output')
%%capture
%store input_dir
%store output_dir
Retrieve California building footprints mapped by Microsoft by running download-buildings.ipynb. Microsoft's list was compiled by a computer program that scours aerial and satellite photos. While it is among the most complete lists available, it is not comprehensive. According to a Microsoft expert who helped create the list, the database is believed to be a slight undercount of the state's buildings.
%%capture
%run src/download-buildings.ipynb
Convert the Microsoft building footprints into centroid points by running tidy-buildings.ipynb.
%%capture
%run src/tidy-buildings.ipynb
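The contents of tidy-buildings.ipynb are not shown here, and it most likely relies on a geospatial library. As a rough illustration of what reducing a footprint to a centroid means, the following pure-Python sketch applies the standard area-weighted centroid formula to a single polygon ring. The function name and the sample footprint are hypothetical.

```python
def polygon_centroid(ring):
    """Return the (x, y) area-weighted centroid of a simple polygon ring.

    `ring` is a list of (x, y) vertex tuples. The first vertex does not
    need to be repeated at the end.
    """
    area2 = 0.0  # twice the signed area of the polygon
    cx = 0.0
    cy = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]  # wrap around to close the ring
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon with zero area")
    # Dividing by 6 * area is the same as dividing by 3 * area2
    return cx / (3 * area2), cy / (3 * area2)


# A hypothetical 10 x 20 rectangular footprint
footprint = [(0, 0), (10, 0), (10, 20), (0, 20)]
centroid = polygon_centroid(footprint)
print(centroid)
```

Working with one point per building, rather than a full outline, keeps the later joins simple: each building is either inside a zone or not.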
Break the building points into batches for processing by running split-buildings.ipynb.
%%capture
%run src/split-buildings.ipynb
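How split-buildings.ipynb divides the data is not shown here. A minimal sketch of the general idea, assuming fixed-size batches and using a hypothetical helper name, might look like this:

```python
def split_into_batches(rows, batch_size):
    """Yield consecutive slices of `rows`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


# Ten hypothetical building ids split into batches of four
building_ids = list(range(10))
batches = list(split_into_batches(building_ids, 4))
print(batches)
```

The last batch is simply shorter when the total does not divide evenly, so no building is dropped.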
Join the buildings to "very severe" fire zones by running merge-fire-zones.ipynb. The maps were drawn by scientists at the California Department of Forestry and Fire Protection in 2007 using a computerized model that considers terrain, vegetation and the location of past fires. Our methodology was vetted by an expert there.
%%capture
%run src/merge-fire-zones.ipynb
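The join itself lives in merge-fire-zones.ipynb, which is not reproduced here and presumably uses a spatial library. At its core, joining a building point to a zone is a point-in-polygon test. The sketch below shows one common way to do that, ray casting, in pure Python. The function name, the square zone and the sample points are all hypothetical.

```python
def point_in_polygon(point, ring):
    """Ray-casting test: True if `point` falls inside the polygon `ring`."""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y0 > y) != (y1 > y):
            # x coordinate where the edge crosses that line
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            # Each crossing to the right of the point flips the state
            if x < x_cross:
                inside = not inside
    return inside


# A hypothetical square hazard zone and two building centroids
zone = [(0, 0), (4, 0), (4, 4), (0, 4)]
buildings = {"a": (1, 1), "b": (5, 2)}
in_zone = {name: point_in_polygon(pt, zone) for name, pt in buildings.items()}
print(in_zone)
```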
Join the buildings to neighborhoods defined by The Times' Mapping L.A. project by running merge-neighborhoods.ipynb.
%%capture
%run src/merge-neighborhoods.ipynb
Join all of the merges conducted above into a single file for analysis by running combine-merges.ipynb.
%%capture
%run src/combine-merges.ipynb
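The details of combine-merges.ipynb are not shown here. As a sketch of what combining separate join results can look like in pandas, assuming each result carries a shared building identifier (the `building_id` column and the sample rows below are hypothetical):

```python
import pandas as pd

# Hypothetical outputs of two separate join steps, keyed on a shared id
fire_zones = pd.DataFrame({
    "building_id": [1, 2, 3],
    "fire_zone_type": ["LRA", None, "SRA"],
})
neighborhoods = pd.DataFrame({
    "building_id": [1, 2, 3],
    "neighborhood": ["Brentwood", "Encino", None],
})

# A left merge keeps every building, even those with no match on the right.
# validate="one_to_one" raises an error if an id is duplicated on either side.
combined = fire_zones.merge(
    neighborhoods,
    on="building_id",
    how="left",
    validate="one_to_one",
)
print(combined)
```

Buildings outside every zone keep a null `fire_zone_type`, which is the property the analysis below relies on when it flags buildings with `pd.isnull`.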
First, some configuration.
intcomma = lambda x: print(f"{x:,.0f}")
percent = lambda numerator, denominator: print(f"{(numerator/denominator)*100:.1f}%")
Read in the prepared data.
points = pd.read_csv(os.path.join(output_dir, "buildings-points-with-analysis.csv"))
How many buildings are in the state?
intcomma(len(points))
10,988,525
How many buildings are in a "very severe" hazard zone?
points['in_firezone'] = ~pd.isnull(points.fire_zone_type)
in_firezone = points[points.in_firezone]
intcomma(len(in_firezone))
1,151,181
What percentage is that?
percent(len(in_firezone), len(points))
10.5%
Which type of fire zone has more buildings: local responsibility areas (LRA) or state responsibility areas (SRA)?
zonetype_counts = in_firezone.fire_zone_type.value_counts().reset_index()
zonetype_counts.columns = ["zonetype", "building_count"]
zonetype_counts['percent'] = zonetype_counts.building_count / zonetype_counts.building_count.sum()
zonetype_counts
index | zonetype | building_count | percent |
---|---|---|---|
0 | LRA | 669311 | 0.581412 |
1 | SRA | 481870 | 0.418588 |
alt.Chart(zonetype_counts).mark_bar().encode(
    x="building_count:Q",
    y="zonetype:N",
    color="zonetype:N"
)
Which Census-defined places have the most buildings in fire zones?
def crosstab_boolean(df, by_field, bool_field):
    """
    Generate a crosstab that analyzes a boolean field.
    """
    # Group on the index field and count frequencies of the boolean field.
    counts = df.groupby([
        by_field,
        bool_field
    ]).size().rename("building_count").reset_index()
    # Flip the result into a crosstab
    pivot = counts.pivot(
        index=by_field,
        columns=bool_field,
        values="building_count"
    ).fillna(0).reset_index()
    # Calculate total
    pivot['total'] = pivot[True] + pivot[False]
    # Calculate percentages
    pivot[f'{bool_field}_percent'] = round((pivot[True] / pivot['total'])*100, 2)
    # Clean up the column names
    cleaned = pivot.rename(columns={
        False: f"not_{bool_field}",
        True: bool_field
    })
    # Return the result
    return cleaned.set_index(by_field)
place_pivot = crosstab_boolean(points, "place_name", "in_firezone")
top_places = place_pivot.sort_values("in_firezone", ascending=False).head(20)
top_places
place_name | not_in_firezone | in_firezone | total | in_firezone_percent |
---|---|---|---|---|
Los Angeles city | 666442.0 | 114355.0 | 780797.0 | 14.65 |
San Diego city | 224153.0 | 88724.0 | 312877.0 | 28.36 |
Santa Clarita city | 33712.0 | 18206.0 | 51918.0 | 35.07 |
Thousand Oaks city | 20899.0 | 17062.0 | 37961.0 | 44.95 |
Rancho Palos Verdes city | 356.0 | 13519.0 | 13875.0 | 97.43 |
Oakland city | 74663.0 | 12186.0 | 86849.0 | 14.03 |
Glendale city | 27580.0 | 11870.0 | 39450.0 | 30.09 |
Paradise town | 149.0 | 11703.0 | 11852.0 | 98.74 |
Big Bear City CDP | 1229.0 | 10908.0 | 12137.0 | 89.87 |
Simi Valley city | 26399.0 | 10432.0 | 36831.0 | 28.32 |
Lake Arrowhead CDP | 129.0 | 9875.0 | 10004.0 | 98.71 |
Truckee town | 3083.0 | 8555.0 | 11638.0 | 73.51 |
Murrieta city | 22975.0 | 7946.0 | 30921.0 | 25.70 |
Redding city | 27432.0 | 7676.0 | 35108.0 | 21.86 |
Laguna Beach city | 3047.0 | 7314.0 | 10361.0 | 70.59 |
Big Bear Lake city | 796.0 | 7273.0 | 8069.0 | 90.14 |
La Cañada Flintridge city | 0.0 | 7179.0 | 7179.0 | 100.00 |
South Lake Tahoe city | 890.0 | 7057.0 | 7947.0 | 88.80 |
Calabasas city | 2.0 | 6964.0 | 6966.0 | 99.97 |
Lake Elsinore city | 9887.0 | 6921.0 | 16808.0 | 41.18 |
alt.Chart(top_places.reset_index()).mark_bar().encode(
    x="in_firezone:Q",
    y=alt.Y(
        "place_name:N",
        sort=alt.EncodingSortField(field="in_firezone", op="sum", order="descending")
    )
)
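One caveat about the crosstab_boolean function above: it reads the `True` and `False` columns of the pivot directly, so it would raise a `KeyError` on a dataset where one of those values never occurs. That does not affect the statewide data used here. For smaller subsets, a variant built on `pd.crosstab` with an explicit `reindex` is one way to guard against it. The function name and the toy data below are hypothetical, and this is a sketch rather than the method used in the analysis.

```python
import pandas as pd


def crosstab_boolean_safe(df, by_field, bool_field):
    """Count True/False values of `bool_field` within each `by_field` group."""
    table = pd.crosstab(df[by_field], df[bool_field])
    # Guarantee both columns exist, even if one value never occurs
    table = table.reindex(columns=[False, True], fill_value=0)
    table.columns = [f"not_{bool_field}", bool_field]
    # At this point the table holds only the two count columns
    table["total"] = table.sum(axis=1)
    table[f"{bool_field}_percent"] = (
        table[bool_field] / table["total"] * 100
    ).round(2)
    return table


# Toy data: every building is inside a zone, so there are no False values
toy = pd.DataFrame({
    "place_name": ["A", "A", "B"],
    "in_firezone": [True, True, True],
})
result = crosstab_boolean_safe(toy, "place_name", "in_firezone")
print(result)
```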
How many places have at least 90% of their buildings in the "very severe" zones?
very_high_places = place_pivot[place_pivot.in_firezone_percent >= 90]
intcomma(len(very_high_places))
174
Are Paradise, Malibu and Topanga in this group?
very_high_places.reset_index()[
    very_high_places.reset_index().place_name.isin([
        'Paradise town',
        'Malibu city',
        'Topanga CDP'
    ])
]
index | place_name | not_in_firezone | in_firezone | total | in_firezone_percent |
---|---|---|---|---|---|
102 | Malibu city | 4.0 | 5963.0 | 5967.0 | 99.93 |
118 | Paradise town | 149.0 | 11703.0 | 11852.0 | 98.74 |
160 | Topanga CDP | 0.0 | 3574.0 | 3574.0 | 100.00 |
Which Los Angeles neighborhoods have the most buildings in "very severe" zones?
hood_pivot = crosstab_boolean(points, "neighborhood", "in_firezone")
top_hoods = hood_pivot.sort_values("in_firezone", ascending=False).head(20)
top_hoods
neighborhood | not_in_firezone | in_firezone | total | in_firezone_percent |
---|---|---|---|---|
Pacific Palisades | 0.0 | 9303.0 | 9303.0 | 100.00 |
Hollywood Hills | 226.0 | 6096.0 | 6322.0 | 96.43 |
Hollywood Hills West | 65.0 | 5826.0 | 5891.0 | 98.90 |
Silver Lake | 2794.0 | 5114.0 | 7908.0 | 64.67 |
Shadow Hills | 915.0 | 5049.0 | 5964.0 | 84.66 |
Beverly Crest | 6.0 | 4915.0 | 4921.0 | 99.88 |
Eagle Rock | 5183.0 | 4836.0 | 10019.0 | 48.27 |
Brentwood | 3583.0 | 4813.0 | 8396.0 | 57.32 |
Sherman Oaks | 11855.0 | 4497.0 | 16352.0 | 27.50 |
Highland Park | 7153.0 | 4343.0 | 11496.0 | 37.78 |
Mount Washington | 0.0 | 4342.0 | 4342.0 | 100.00 |
Studio City | 5617.0 | 4337.0 | 9954.0 | 43.57 |
Encino | 8424.0 | 4277.0 | 12701.0 | 33.67 |
El Sereno | 6046.0 | 4172.0 | 10218.0 | 40.83 |
Los Feliz | 3030.0 | 4015.0 | 7045.0 | 56.99 |
Montecito Heights | 15.0 | 3612.0 | 3627.0 | 99.59 |
Tujunga | 4397.0 | 3489.0 | 7886.0 | 44.24 |
Chatsworth | 9164.0 | 3399.0 | 12563.0 | 27.06 |
Bel-Air | 0.0 | 3364.0 | 3364.0 | 100.00 |
Porter Ranch | 5144.0 | 3286.0 | 8430.0 | 38.98 |
alt.Chart(top_hoods.reset_index()).mark_bar().encode(
    x="in_firezone:Q",
    y=alt.Y(
        "neighborhood:N",
        sort=alt.EncodingSortField(field="in_firezone", op="sum", order="descending")
    )
)
How many neighborhoods have 1,000 or more buildings in those zones?
len(hood_pivot[hood_pivot.in_firezone >= 1000])
30
Group our building points into grids that segment the state by running merge-grids.ipynb.
%%capture
%run src/merge-grids.ipynb
Calculate the number of at-risk buildings in each grid segment by running analyze-grids.ipynb.
%%capture
%run src/analyze-grids.ipynb
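The grid notebooks are not reproduced here, so the shape and size of the grid they use are unknown. As a minimal sketch of the two steps described above, assuming a square grid and projected x/y coordinates, points can be binned with floor division and then counted per cell. The column names, cell size and sample points are hypothetical.

```python
import pandas as pd

# Hypothetical building centroids with a flag for hazard-zone membership
toy_points = pd.DataFrame({
    "x": [0.2, 0.7, 1.4, 1.6, 1.9],
    "y": [0.1, 0.9, 0.3, 0.4, 1.2],
    "in_firezone": [True, False, True, True, False],
})

cell_size = 1.0  # assumed grid spacing, in the same units as x and y

# Floor division assigns each point to the column and row of its grid cell
toy_points["grid_col"] = (toy_points.x // cell_size).astype(int)
toy_points["grid_row"] = (toy_points.y // cell_size).astype(int)

# Summing a boolean column counts the True values in each cell
at_risk = toy_points.groupby(["grid_col", "grid_row"]).in_firezone.sum()
print(at_risk)
```

Cells that contain buildings but none at risk still appear in the result with a count of zero, which keeps them distinguishable from cells with no buildings at all.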