Heat maps are useful ways of aggregating points on a 2D plane (including, but not limited to, maps) that highlight the density. In this notebook we'll explore simple ways of making heatmaps, skipping over for now the more complicated ways like using matplotlib.
Note: if you're viewing this on GitHub, charting libraries that output Javascript (Bokeh, Holoviews, Altair) won't display. Try viewing the notebook on nbviewer instead.
First, we're going to need geographic data. Open Street Map is a fantastic repository of it, maintained by a community of volunteers and free to use. We'll use a python wrapper to the Overpass API.
import overpass
api = overpass.API()
We'll also be making a couple of plots with matplotlib, and for convenience we can tell it to always display those in the notebook.
%matplotlib inline
The Overpass API is the framework that Open Street Maps has developed to access their data. It's the way to select certain parts of the very large amount of geographic data, letting you download a dataset of, say, everything that's labeled as an embassy.
First, let's start out with a small amount of data to make sure we understand what we're doing. In order to satiate my addiction to caffeine, I'm going to request all the coffeeshops in the city of Seattle (my home city). The Overpass module for Python is just a simple wrapper around the Overpass API. We're basically just going to pass a string that's written in the query language. To make things easy we'll hardcode the latitude and longitude into the text of the query; these can easily be found by clicking on a point of Google Maps, or if you're using Open Street Maps, by right-clicking and selecting Show Address.
response = api.Get('node["amenity"="cafe"](47.5, -122.436, 47.734, -122.235);')
The response returns a lot of data we don't care about, so let's create a utility function to return just the name, latitude, and longitude in a pandas DataFrame. There's actually nothing that says it has to be a cafe, so we can generalize the function when we name it to be any of this type of data fetched from Open Street Map.
from pandas import DataFrame
def parse_fetched_OSM_data(response):
reduced = []
for feature in response['features']:
if 'name' in feature['properties']:
reduced.append((feature['properties']['name'], *feature['geometry']['coordinates']))
return DataFrame.from_records(sorted(reduced), columns=["name", "lon", "lat"])
seattle_cafes = parse_fetched_OSM_data(response)
Now we have a list of tuples, each of which contain the name of the cafe and its latitude and longitude. To make things easier, let's load the data into a pandas DataFrame.
seattle_cafes.sample(20)
name | lon | lat | |
---|---|---|---|
66 | Caffe Ladro | -122.356521 | 47.625335 |
462 | Starbucks | -122.235249 | 47.586423 |
81 | Caffè Appassionato | -122.342497 | 47.661536 |
356 | Samir's Mediterranean Grill | -122.313507 | 47.659910 |
74 | Caffe Vita | -122.355110 | 47.682730 |
179 | Flame Catering Cafe | -122.331085 | 47.602123 |
393 | Starbucks | -122.351920 | 47.614049 |
416 | Starbucks | -122.335989 | 47.661729 |
491 | The Crumpet Shop | -122.340390 | 47.608968 |
284 | Moondrop Coffee & Tea | -122.381368 | 47.589167 |
302 | Old School Frozen Custard | -122.314702 | 47.614232 |
83 | Caffè Fiore | -122.360822 | 47.632460 |
502 | The Stage Door | -122.355960 | 47.690720 |
192 | Gio's | -122.406647 | 47.674368 |
567 | Zeitgeist Coffee and Art | -122.331856 | 47.599091 |
562 | Yellow Dot Cafe | -122.349403 | 47.649224 |
327 | Pioneer Square Deli | -122.331364 | 47.599335 |
379 | Square Knot | -122.316720 | 47.548900 |
370 | Sound Soups | -122.334730 | 47.605203 |
100 | Chatterbox Cafe | -122.316581 | 47.611725 |
How many coffeeshops did we get?
len(seattle_cafes)
576
So many coffeeshops, so little time.
Anyway, DataFrames have some nice built-in plotting functions that output matplotlib charts, but abstract away the annoying details. This is what it looks like when we scatter plot by geographic coordinates.
import matplotlib.pyplot as plt
seattle_cafes.plot.scatter(x='lon', y='lat')
<matplotlib.axes._subplots.AxesSubplot at 0x1bf9755dd30>
So here we've got a quick-and-dirty point-plotting that actually lets us see some geography, specifically the waterfront around downtown Seattle in the center, which is kind of neat.
To make the actual heat map, we're going to use Folium, a Python wrapper to Leaflet. Though it doesn't have some cartography features like map projections, it's great for simple interactive maps.
import folium
from folium.plugins import HeatMap
Folium doesn't natively work with pandas, so we'll have to do a little work first, and convert the DataFrame into a list of [latitude, longitude] vectors.
coords = [[item[1].lat, item[1].lon] for item in seattle_cafes.iterrows()]
coords[:2]
[[47.5171783, -122.3549598], [47.6120244, -122.3399314]]
The rest is just instantiating the map with the right parameters, which do take some tweaking to get right.
hmap = folium.Map(location=[47.6, -122.35], zoom_start=12)
hm_wide = HeatMap( coords,
max_zoom=1,
min_opacity=0.2,
radius=10, blur=15,
)
hmap.add_child(hm_wide)
hmap
Now let's try a heatmap for the whole country. I've saved the query as a file, since it's ~8mb.
import json
lines = ""
with open('cafes_us_OSM.geojson', encoding='utf-8') as f:
result = json.loads(f.read())
Sure is convenient we wrote that helper function above, so we don't have to duplicate all that code.
us_cafes = parse_fetched_OSM_data(result)
us_cafes.sample(20)
name | lon | lat | |
---|---|---|---|
15018 | Starbucks | -81.508187 | 41.148689 |
2469 | Caffe mille gusti | -73.605203 | 45.539060 |
12158 | Spencer's Coffee | -86.442094 | 36.993715 |
4369 | Cutter Point Coffee | -122.807083 | 46.996474 |
15492 | Starbucks | -77.033170 | 38.904215 |
16682 | Starbucks Coffee | -77.096055 | 38.917237 |
18742 | Tim Hortons | -76.804788 | 44.279883 |
14859 | Starbucks | -83.687939 | 42.255973 |
13973 | Starbucks | -105.762477 | 39.885281 |
7839 | Jittery Joe's Roaster | -83.386533 | 33.967248 |
15666 | Starbucks | -75.166717 | 39.952529 |
2094 | Cafe Happy | -122.206364 | 47.675663 |
16339 | Starbucks Coffee | -97.375888 | 32.672069 |
3616 | Coffee Cup Cafe | -114.154683 | 46.242606 |
6837 | Granada Coffee Company | -96.180442 | 38.407781 |
3951 | Cookie Jar | -96.726585 | 43.544929 |
16043 | Starbucks | -71.085667 | 42.363032 |
6908 | GreenStreet | -80.243994 | 25.726735 |
6106 | Es Café | -71.300074 | 46.719430 |
1403 | Bluebird Coffee Shop & Tea House | -90.971862 | 36.263205 |
len(us_cafes)
20172
us_cafes.plot.scatter(x='lon', y='lat')
<matplotlib.axes._subplots.AxesSubplot at 0x1bf9a209160>
We can see the coasts pretty well just from coffeeshops, which isn't surprising.
hmap = folium.Map(location=[35, -90], zoom_start=4)
coords = [[item[1].lat, item[1].lon] for item in us_cafes.iterrows()]
print("coffeeshops:", len(coords))
hm_wide = HeatMap( coords,
max_zoom=1,
min_opacity=0.1,
radius=4, blur=7,
)
hmap.add_child(hm_wide)
hmap
coffeeshops: 20172
I wonder how many of those are Starbucks? Let's get a count of the most common cafes.
us_cafes['count'] = 1
us_cafes_counted = us_cafes.groupby("name")['count'].count().sort_values(ascending=False)
us_cafes_counted[:20]
name Starbucks 3768 Tim Hortons 811 Dunkin' Donuts 730 Starbucks Coffee 657 Dunkin Donuts 204 Second Cup 162 Caribou Coffee 132 Peet's Coffee & Tea 81 Panera Bread 80 Coffee Time 74 Jamba Juice 65 Starbuck's 64 Subway 62 Peet's Coffee 49 Country Style 48 Au Bon Pain 33 Biggby Coffee 31 Le Pain Quotidien 31 club house 30 Tropical Smoothie Cafe 28 Name: count, dtype: int64
Looks like the data are somewhat messy, since there are a couple of different names for Starbucks. It would be interesting to look at a heatmap of only non-chain coffeeshops. Let's take out the most ones. First we'll get a list of the names of the 50 most common cafes.
chains = us_cafes_counted.index[:50]
chains
Index(['Starbucks', 'Tim Hortons', 'Dunkin' Donuts', 'Starbucks Coffee', 'Dunkin Donuts', 'Second Cup', 'Caribou Coffee', 'Peet's Coffee & Tea', 'Panera Bread', 'Coffee Time', 'Jamba Juice', 'Starbuck's', 'Subway', 'Peet's Coffee', 'Country Style', 'Au Bon Pain', 'Biggby Coffee', 'Le Pain Quotidien', 'club house', 'Tropical Smoothie Cafe', 'The Coffee Bean & Tea Leaf', 'Panera', 'Cafeteria', 'Cold Stone Creamery', 'Blenz Coffee', 'Coffee Culture', 'Baskin-Robbins', 'Bruegger's Bagels', 'Tully's Coffee', 'Coffee Bean & Tea Leaf', 'Einstein Bros Bagels', 'Bridgehead', 'Peet's', 'Cosi', 'Einstein Bros. Bagels', 'Timothy's World Coffee', 'Peet's Coffee and Tea', 'Van Houtte', 'Dutch Bros', 'Presse Café', 'Corner Bakery Cafe', 'Philz Coffee', 'Tim Horton's', 'Tim Horton', 'Timothy's', 'Corner Bakery', 'Aroma Espresso Bar', 'Café Starbucks Coffee', 'Top Pot Doughnuts', 'Club House'], dtype='object', name='name')
How many individual cafes is this? We can create a selection of cafes with one of these names by using the isin function, which outputs a True/False for each row. We use that to filter out the rest of the cafes, and finally get a tally.
len(us_cafes[us_cafes.name.isin(chains)])
7662
That's the number of cafes we'll remove.
len(us_cafes[~us_cafes.name.isin(chains)])
12510
And the number still in.
Looks like we don't have nearly complete data, as the number of Starbucks in the US is closer to 12,000 (see References section). Still, should be good enough for our purposes.
hmap = folium.Map(location=[35, -90], zoom_start=4)
coords = [[item[1].lat, item[1].lon] for item in us_cafes[~us_cafes.name.isin(chains)].iterrows()]
print("coffeeshops", len(coords))
hm_wide = HeatMap( coords,
max_zoom=0.1,
min_opacity=10,
radius=4, blur=7,
)
hmap.add_child(hm_wide)
hmap
coffeeshops 12510
The two heatmaps look almost exactly the same, although they have subtle differences. There are some hot in the Midwest that shrink noticeably when you remove the chains, like Dallas. But all in all, a good showing for independent coffeeshops.
Thanks for reading!
Heatmaps with Folium in Python: https://alcidanalytics.com/p/geographic-heatmap-in-python
Number of Starbucks in the US: https://www.statista.com/statistics/218360/number-of-starbucks-stores-in-the-us/
Getting cafe data: http://overpass-turbo.eu/, https://wiki.openstreetmap.org/wiki/Overpass_API