Let's make a heat map¶

Heat maps are useful ways of aggregating points on a 2D plane (including, but not limited to, maps) that highlight the density. In this notebook we'll explore simple ways of making heatmaps, skipping over for now the more complicated ways like using matplotlib.

Note: if you're viewing this on GitHub, charting libraries that output Javascript (Bokeh, Holoviews, Altair) won't display. Try viewing the notebook on nbviewer instead.

First, we're going to need geographic data. Open Street Map is a fantastic repository of it, maintained by a community of volunteers and free to use. We'll use a python wrapper to the Overpass API.

In [1]:

import overpass
api = overpass.API()

We'll also be making a couple of plots with matplotlib, and for convenience we can tell it to always display those in the notebook.

In [2]:

%matplotlib inline

The Overpass API is the framework that Open Street Maps has developed to access their data. It's the way to select certain parts of the very large amount of geographic data, letting you download a dataset of, say, everything that's labeled as an embassy.

First, let's start out with a small amount of data to make sure we understand what we're doing. In order to satiate my addiction to caffeine, I'm going to request all the coffeeshops in the city of Seattle (my home city). The Overpass module for Python is just a simple wrapper around the Overpass API. We're basically just going to pass a string that's written in the query language. To make things easy we'll hardcode the latitude and longitude into the text of the query; these can easily be found by clicking on a point of Google Maps, or if you're using Open Street Maps, by right-clicking and selecting Show Address.

In [3]:

response = api.Get('node["amenity"="cafe"](47.5, -122.436, 47.734, -122.235);')

The response returns a lot of data we don't care about, so let's create a utility function to return just the name, latitude, and longitude in a pandas DataFrame. There's actually nothing that says it has to be a cafe, so we can generalize the function when we name it to be any of this type of data fetched from Open Street Map.

In [4]:

from pandas import DataFrame

def parse_fetched_OSM_data(response):
    reduced = []
    for feature in response['features']:
        if 'name' in feature['properties']:
            reduced.append((feature['properties']['name'], *feature['geometry']['coordinates']))
    return DataFrame.from_records(sorted(reduced), columns=["name", "lon", "lat"])
            
seattle_cafes = parse_fetched_OSM_data(response)

Now we have a list of tuples, each of which contain the name of the cafe and its latitude and longitude. To make things easier, let's load the data into a pandas DataFrame.

In [5]:

seattle_cafes.sample(20)

Out[5]:

	name	lon	lat
66	Caffe Ladro	-122.356521	47.625335
462	Starbucks	-122.235249	47.586423
81	Caffè Appassionato	-122.342497	47.661536
356	Samir's Mediterranean Grill	-122.313507	47.659910
74	Caffe Vita	-122.355110	47.682730
179	Flame Catering Cafe	-122.331085	47.602123
393	Starbucks	-122.351920	47.614049
416	Starbucks	-122.335989	47.661729
491	The Crumpet Shop	-122.340390	47.608968
284	Moondrop Coffee & Tea	-122.381368	47.589167
302	Old School Frozen Custard	-122.314702	47.614232
83	Caffè Fiore	-122.360822	47.632460
502	The Stage Door	-122.355960	47.690720
192	Gio's	-122.406647	47.674368
567	Zeitgeist Coffee and Art	-122.331856	47.599091
562	Yellow Dot Cafe	-122.349403	47.649224
327	Pioneer Square Deli	-122.331364	47.599335
379	Square Knot	-122.316720	47.548900
370	Sound Soups	-122.334730	47.605203
100	Chatterbox Cafe	-122.316581	47.611725

How many coffeeshops did we get?

In [6]:

len(seattle_cafes)

Out[6]:

So many coffeeshops, so little time.

Anyway, DataFrames have some nice built-in plotting functions that output matplotlib charts, but abstract away the annoying details. This is what it looks like when we scatter plot by geographic coordinates.

In [7]:

import matplotlib.pyplot as plt
seattle_cafes.plot.scatter(x='lon', y='lat')

Out[7]:

<matplotlib.axes._subplots.AxesSubplot at 0x1bf9755dd30>

So here we've got a quick-and-dirty point-plotting that actually lets us see some geography, specifically the waterfront around downtown Seattle in the center, which is kind of neat.

To make the actual heat map, we're going to use Folium, a Python wrapper to Leaflet. Though it doesn't have some cartography features like map projections, it's great for simple interactive maps.

In [8]:

import folium
from folium.plugins import HeatMap

Folium doesn't natively work with pandas, so we'll have to do a little work first, and convert the DataFrame into a list of [latitude, longitude] vectors.

In [9]:

coords = [[item[1].lat, item[1].lon] for item in seattle_cafes.iterrows()]
coords[:2]

Out[9]:

[[47.5171783, -122.3549598], [47.6120244, -122.3399314]]

The rest is just instantiating the map with the right parameters, which do take some tweaking to get right.

In [10]:

hmap = folium.Map(location=[47.6, -122.35], zoom_start=12)
hm_wide = HeatMap( coords, 
                   max_zoom=1,
                   min_opacity=0.2,
                   radius=10, blur=15, 
                 )
hmap.add_child(hm_wide)
hmap

Out[10]:

Now let's try a heatmap for the whole country. I've saved the query as a file, since it's ~8mb.

In [11]:

import json
lines = ""
with open('cafes_us_OSM.geojson', encoding='utf-8') as f:
    result = json.loads(f.read())

Sure is convenient we wrote that helper function above, so we don't have to duplicate all that code.

In [12]:

us_cafes = parse_fetched_OSM_data(result)
us_cafes.sample(20)

Out[12]:

	name	lon	lat
15018	Starbucks	-81.508187	41.148689
2469	Caffe mille gusti	-73.605203	45.539060
12158	Spencer's Coffee	-86.442094	36.993715
4369	Cutter Point Coffee	-122.807083	46.996474
15492	Starbucks	-77.033170	38.904215
16682	Starbucks Coffee	-77.096055	38.917237
18742	Tim Hortons	-76.804788	44.279883
14859	Starbucks	-83.687939	42.255973
13973	Starbucks	-105.762477	39.885281
7839	Jittery Joe's Roaster	-83.386533	33.967248
15666	Starbucks	-75.166717	39.952529
2094	Cafe Happy	-122.206364	47.675663
16339	Starbucks Coffee	-97.375888	32.672069
3616	Coffee Cup Cafe	-114.154683	46.242606
6837	Granada Coffee Company	-96.180442	38.407781
3951	Cookie Jar	-96.726585	43.544929
16043	Starbucks	-71.085667	42.363032
6908	GreenStreet	-80.243994	25.726735
6106	Es Café	-71.300074	46.719430
1403	Bluebird Coffee Shop & Tea House	-90.971862	36.263205

In [13]:

len(us_cafes)

Out[13]:

In [14]:

us_cafes.plot.scatter(x='lon', y='lat')

Out[14]:

<matplotlib.axes._subplots.AxesSubplot at 0x1bf9a209160>

We can see the coasts pretty well just from coffeeshops, which isn't surprising.

In [15]:

hmap = folium.Map(location=[35, -90], zoom_start=4)
coords = [[item[1].lat, item[1].lon] for item in us_cafes.iterrows()]
print("coffeeshops:", len(coords))
hm_wide = HeatMap( coords, 
                   max_zoom=1,
                   min_opacity=0.1,
                   radius=4, blur=7,
                 )
hmap.add_child(hm_wide)
hmap

coffeeshops: 20172

Out[15]:

I wonder how many of those are Starbucks? Let's get a count of the most common cafes.

In [16]:

us_cafes['count'] = 1
us_cafes_counted = us_cafes.groupby("name")['count'].count().sort_values(ascending=False)
us_cafes_counted[:20]

Out[16]:

name
Starbucks                 3768
Tim Hortons                811
Dunkin' Donuts             730
Starbucks Coffee           657
Dunkin Donuts              204
Second Cup                 162
Caribou Coffee             132
Peet's Coffee & Tea         81
Panera Bread                80
Coffee Time                 74
Jamba Juice                 65
Starbuck's                  64
Subway                      62
Peet's Coffee               49
Country Style               48
Au Bon Pain                 33
Biggby Coffee               31
Le Pain Quotidien           31
club house                  30
Tropical Smoothie Cafe      28
Name: count, dtype: int64

Looks like the data are somewhat messy, since there are a couple of different names for Starbucks. It would be interesting to look at a heatmap of only non-chain coffeeshops. Let's take out the most ones. First we'll get a list of the names of the 50 most common cafes.

In [17]:

chains = us_cafes_counted.index[:50]
chains

Out[17]:

Index(['Starbucks', 'Tim Hortons', 'Dunkin' Donuts', 'Starbucks Coffee',
       'Dunkin Donuts', 'Second Cup', 'Caribou Coffee', 'Peet's Coffee & Tea',
       'Panera Bread', 'Coffee Time', 'Jamba Juice', 'Starbuck's', 'Subway',
       'Peet's Coffee', 'Country Style', 'Au Bon Pain', 'Biggby Coffee',
       'Le Pain Quotidien', 'club house', 'Tropical Smoothie Cafe',
       'The Coffee Bean & Tea Leaf', 'Panera', 'Cafeteria',
       'Cold Stone Creamery', 'Blenz Coffee', 'Coffee Culture',
       'Baskin-Robbins', 'Bruegger's Bagels', 'Tully's Coffee',
       'Coffee Bean & Tea Leaf', 'Einstein Bros Bagels', 'Bridgehead',
       'Peet's', 'Cosi', 'Einstein Bros. Bagels', 'Timothy's World Coffee',
       'Peet's Coffee and Tea', 'Van Houtte', 'Dutch Bros', 'Presse Café',
       'Corner Bakery Cafe', 'Philz Coffee', 'Tim Horton's', 'Tim Horton',
       'Timothy's', 'Corner Bakery', 'Aroma Espresso Bar',
       'Café Starbucks Coffee', 'Top Pot Doughnuts', 'Club House'],
      dtype='object', name='name')

How many individual cafes is this? We can create a selection of cafes with one of these names by using the isin function, which outputs a True/False for each row. We use that to filter out the rest of the cafes, and finally get a tally.

In [18]:

len(us_cafes[us_cafes.name.isin(chains)])

Out[18]:

That's the number of cafes we'll remove.

In [19]:

len(us_cafes[~us_cafes.name.isin(chains)])

Out[19]:

And the number still in.

Looks like we don't have nearly complete data, as the number of Starbucks in the US is closer to 12,000 (see References section). Still, should be good enough for our purposes.

In [20]:

hmap = folium.Map(location=[35, -90], zoom_start=4)
coords = [[item[1].lat, item[1].lon] for item in us_cafes[~us_cafes.name.isin(chains)].iterrows()]
print("coffeeshops", len(coords))
hm_wide = HeatMap( coords, 
                   max_zoom=0.1,
                   min_opacity=10,
                   radius=4, blur=7, 
                 )
hmap.add_child(hm_wide)
hmap

coffeeshops 12510

Out[20]:

The two heatmaps look almost exactly the same, although they have subtle differences. There are some hot in the Midwest that shrink noticeably when you remove the chains, like Dallas. But all in all, a good showing for independent coffeeshops.

Thanks for reading!

References¶

Heatmaps with Folium in Python: https://alcidanalytics.com/p/geographic-heatmap-in-python

Number of Starbucks in the US: https://www.statista.com/statistics/218360/number-of-starbucks-stores-in-the-us/

Getting cafe data: http://overpass-turbo.eu/, https://wiki.openstreetmap.org/wiki/Overpass_API