Mapping Te Papa’s collections

This notebook creates some simple maps using the production.spatial facet of the Te Papa API to identify places where collection objects were created.

If you haven't used one of these notebooks before, they're basically web pages in which you can write, edit, and run live code. They're meant to encourage experimentation, so don't feel nervous. Just try running a few cells and see what happens!

Some tips:

  • Code cells have boxes around them.
  • To run a code cell, click on the cell and then hit Shift+Enter. The Shift+Enter combo will also move you to the next cell, so it's a quick way to work through the notebook.
  • While a cell is running, a * appears in the square brackets next to the cell. Once the cell has finished running, the asterisk will be replaced with a number.
  • In most cases you'll want to start from the top of the notebook and work your way down, running each cell in turn. Later cells might depend on the results of earlier ones.
  • To edit a code cell, just click on it and type stuff. Remember to run the cell once you've finished editing.

In [2]:
import requests
import pandas as pd
import altair as alt
import re
import folium
from tqdm.notebook import tnrange, tqdm
from folium.plugins import MarkerCluster
from IPython.display import display, HTML
alt.renderers.enable('default')
Out[2]:
RendererRegistry.enable('default')

Get an API key

Sign up here for your very own API key.

In [3]:
# Insert your API key between the quotes
api_key = ''
# If you don't have an API key yet, you can leave the above blank and we'll pick up a guest token below
print('Your API key is: {}'.format(api_key))
Your API key is: 

Set some parameters

In [4]:
search_endpoint = 'https://data.tepapa.govt.nz/collection/search'

headers = {
    'x-api-key': api_key,
    'Accept': 'application/json'
}

# No API key? Request the search endpoint without credentials to obtain a guest token
if not api_key:
    response = requests.get(search_endpoint)
    data = response.json()
    guest_token = data['guestToken']
    headers['Authorization'] = 'Bearer {}'.format(guest_token)
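
The header logic above can be wrapped in a small function, which makes it easier to see which credential ends up being sent. This is only a sketch: `build_headers` is a hypothetical helper (not part of the notebook or the Te Papa API), and it assumes the guest token has already been fetched as in the cell above.

```python
def build_headers(api_key, guest_token=None):
    """Return request headers, adding a guest token only when no API key is set."""
    headers = {
        'x-api-key': api_key,
        'Accept': 'application/json'
    }
    # Mirror the notebook's logic: Authorization is only added when api_key is empty
    if not api_key and guest_token:
        headers['Authorization'] = 'Bearer {}'.format(guest_token)
    return headers

# 'abc' is a made-up placeholder, not a real token
print(build_headers('', guest_token='abc'))
```

Keeping this in one function means you can change your API key later and rebuild the headers without re-running the token request by hand.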

Below we set the search parameters. Currently it will return information about all objects in the collection. You can change the query value to limit the result set: try replacing the asterisk with some keywords.

The size parameter sets the number of places to return — so in this case we're getting the 100 places that have the most objects associated with them.

The production.spatial.href facet gives us the API url of the place itself, so we can use it to get more information about the place.

In [5]:
post_data = {
    'query': '*',
    'filters': [{
        'field': 'type',
        'keyword': 'Object'
    }],
    'facets': [
        {'field': 'production.spatial.href',
        'size': 100}
    ]
}
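
If you want to try several queries or facet sizes, one option is to build the parameters with a function instead of editing the dictionary by hand. The sketch below assumes the same request structure as the cell above; `build_post_data` is a hypothetical helper name, and `'kiwi'` is just an example keyword.

```python
def build_post_data(query='*', facet_size=100):
    """Build search parameters for objects, faceted by place of production."""
    return {
        'query': query,
        # Only return records whose type is 'Object'
        'filters': [{
            'field': 'type',
            'keyword': 'Object'
        }],
        # Ask for the places with the most associated objects
        'facets': [
            {'field': 'production.spatial.href',
             'size': facet_size}
        ]
    }

# Example: limit the search to a keyword and ask for the top 20 places
example_post_data = build_post_data(query='kiwi', facet_size=20)
print(example_post_data)
```

You could then assign the result to `post_data` before running the request cell.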

Get some data

In [6]:
# Make the API request
response = requests.post(search_endpoint, json=post_data, headers=headers)
data = response.json()
In [7]:
# Convert the facets data to a dataframe and do some cleaning up
# We end up with two columns -- one with the place url, and the other with the number of objects associated with that place
places_df = pd.DataFrame(list(data['facets']['production.spatial.href'].items()))
places_df.columns = ['place_id', 'count']
places_df.head()
Out[7]:
place_id count
0 https://data.tepapa.govt.nz/collection/place/2... 516
1 https://data.tepapa.govt.nz/collection/place/2... 9128
2 https://data.tepapa.govt.nz/collection/place/2... 430
3 https://data.tepapa.govt.nz/collection/place/2... 442
4 https://data.tepapa.govt.nz/collection/place/2... 380
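
The facet results arrive as a dictionary that maps each place url to a count. To see what the conversion is doing without pandas, here is a rough sketch in plain Python. The `facets_to_rows` helper and the `example.org` urls are made up for illustration; only the nesting under `facets` and `production.spatial.href` comes from the cell above.

```python
def facets_to_rows(data, field='production.spatial.href'):
    """Turn a facet dictionary into a list of rows, largest count first."""
    # Fall back to an empty dict if the response has no facets (for example after an error)
    facet = data.get('facets', {}).get(field, {})
    rows = [{'place_id': place_id, 'count': count} for place_id, count in facet.items()]
    return sorted(rows, key=lambda row: row['count'], reverse=True)

# Made-up sample shaped like the API response used above
sample_data = {
    'facets': {
        'production.spatial.href': {
            'https://example.org/place/a': 516,
            'https://example.org/place/b': 9128
        }
    }
}

rows = facets_to_rows(sample_data)
print(rows)
```

Passing `rows` to `pd.DataFrame` would give the same two columns as the cell above, sorted by count.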

Add more information about each place

Using the place url we'll get the full record for each place. We'll then save the name of the place, its geospatial coordinates (if any), and its ISO country code (if any) to the dataframe.

In [8]:
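def find_country_code(place):
    '''
    Look through a place's alternative terms for one starting with 'ISO'.
    Returns the rest of that term, or None if no such term is found.
    '''
    code = None
    if 'alternativeTerms' in place:
        for term in place['alternativeTerms']:
            try:
                if term[:3] == 'ISO':
                    code = term[3:]
            except TypeError:
                # Skip any terms that aren't strings
                pass
    return code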

for i in tnrange(len(places_df)):
    href = places_df.loc[i]['place_id']
    response = requests.get(href, headers=headers)
    place_data = response.json()
    places_df.at[i, 'title'] = place_data['title']
    code = find_country_code(place_data)
    if code:
        places_df.at[i, 'isocode'] = code
    if 'geoLocation' in place_data:
        places_df.at[i, 'lat'] = place_data['geoLocation']['lat']
        places_df.at[i, 'lon'] = place_data['geoLocation']['lon']

places_df.head()

Out[8]:
place_id count title lat lon isocode
0 https://data.tepapa.govt.nz/collection/place/2... 516 Surrey (United Kingdom) 51.200 -0.050 NaN
1 https://data.tepapa.govt.nz/collection/place/2... 9128 Auckland (New Zealand) -36.917 174.783 NaN
2 https://data.tepapa.govt.nz/collection/place/2... 430 Kanto (Nihon) 36.250 139.500 NaN
3 https://data.tepapa.govt.nz/collection/place/2... 442 Solomon Islands NaN NaN NaN
4 https://data.tepapa.govt.nz/collection/place/2... 380 Napier (New Zealand) -39.483 176.967 NaN
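
Because some place records have no `geoLocation`, it can help to pull out the fields in a way that never raises an error. The following is a minimal sketch, not the notebook's method: `extract_place_fields` is a hypothetical helper, and the sample records are cut-down versions of two rows from the table above rather than full API responses.

```python
def extract_place_fields(place_data):
    """Return the title and coordinates of a place, using None for anything missing."""
    # Treat a missing or empty geoLocation as an empty dictionary
    geo = place_data.get('geoLocation') or {}
    return {
        'title': place_data.get('title'),
        'lat': geo.get('lat'),
        'lon': geo.get('lon')
    }

# Cut-down sample records based on the table above
with_coords = {'title': 'Napier (New Zealand)', 'geoLocation': {'lat': -39.483, 'lon': 176.967}}
without_coords = {'title': 'Solomon Islands'}

print(extract_place_fields(with_coords))
print(extract_place_fields(without_coords))
```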

Make a map

In [9]:
import html
m = folium.Map(
    location=[10, 10],
    zoom_start=1.5
)
# We'll cluster the markers for better readability
marker_cluster = MarkerCluster().add_to(m)

for index, row in places_df.dropna(subset=['lat', 'lon']).iterrows():
    # We can easily change the API url to a web url and use it to link the map to the Te Papa collection web site
    web_url = row['place_id'].replace('/collection/', '/').replace('data', 'collections')
    popup = '<b><a target="_blank" href="{}">{}</a></b><br>{} objects'.format(web_url, html.escape(row['title']), row['count'])
    folium.Marker([row['lat'], row['lon']], popup=popup).add_to(marker_cluster)
    
m
Out[9]:
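
The url conversion used in the popups is easier to check on its own. The sketch below applies the same two replacements as the cell above, but limits each to the first match so that a later occurrence of `data` in a url would be left alone. `api_url_to_web_url` is a hypothetical helper name, and the sketch does not check whether the resulting page actually exists.

```python
def api_url_to_web_url(api_url):
    """Convert a Te Papa API place url to a collections web site url."""
    # Drop the '/collection/' path segment, then swap the 'data' subdomain for 'collections'
    return api_url.replace('/collection/', '/', 1).replace('data', 'collections', 1)

web_url = api_url_to_web_url('https://data.tepapa.govt.nz/collection/place/314')
print(web_url)
# → https://collections.tepapa.govt.nz/place/314
```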

Make another map

Let's try and make the number of objects created in each place more obvious.

In [10]:
import html
m = folium.Map(
    location=[10, 10],
    zoom_start=1.5
)

for index, row in places_df.dropna(subset=['lat', 'lon']).iterrows():
    popup = '<b>{}</b><br>{} objects'.format(html.escape(row['title']), row['count'])
    folium.Circle([row['lat'], row['lon']], radius=row['count']*5, popup=popup, color='#de2d26', fill=True).add_to(m)
    
m
Out[10]:
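
In the cell above the radius of each circle (which folium measures in metres) grows in direct proportion to the count, so the area grows with the square of the count and the biggest places can swamp the rest. One alternative, offered here only as a sketch, is to scale the radius by the square root of the count so that the area is proportional to the number of objects. The `circle_radius` helper and the `scale` value of 500 are assumptions to experiment with, not settings from the notebook.

```python
import math

def circle_radius(count, scale=500):
    """Return a radius in metres whose circle area is proportional to count."""
    return math.sqrt(count) * scale

# Compare the two approaches using counts from the table above
for count in [380, 516, 9128]:
    print(count, count * 5, round(circle_radius(count)))
```

To try it, replace `radius=row['count']*5` with `radius=circle_radius(row['count'])` in the cell above.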

What's missing?

Remember that we're not seeing all the places where objects were created. First of all, the facet size parameter limited our results to the top 100 places. Try changing it to see what happens.

Even amongst the top 100, not every place had geospatial coordinates attached to it. So not everything is on the map. Let's create a list of places without coordinates.

In [11]:
places_df.loc[places_df['lat'].isnull()]  
Out[11]:
place_id count title lat lon isocode
3 https://data.tepapa.govt.nz/collection/place/2... 442 Solomon Islands NaN NaN NaN
17 https://data.tepapa.govt.nz/collection/place/2... 591 Upolu (Samoa) NaN NaN NaN
31 https://data.tepapa.govt.nz/collection/place/314 821 Opononi NaN NaN NaN
34 https://data.tepapa.govt.nz/collection/place/2... 667 Chatham Islands (New Zealand) NaN NaN NaN
46 https://data.tepapa.govt.nz/collection/place/2... 435 Jawa (Indonesia) NaN NaN NaN
47 https://data.tepapa.govt.nz/collection/place/2... 5195 North Island (New Zealand) NaN NaN NaN
59 https://data.tepapa.govt.nz/collection/place/2... 774 South Island (New Zealand) NaN NaN NaN
65 https://data.tepapa.govt.nz/collection/place/2... 417 Africa NaN NaN NaN
70 https://data.tepapa.govt.nz/collection/place/2... 1123 Stewart Island (New Zealand) NaN NaN NaN
71 https://data.tepapa.govt.nz/collection/place/2... 287 Pacific Islands NaN NaN NaN
78 https://data.tepapa.govt.nz/collection/place/2... 475 Czechoslovakia NaN NaN NaN
94 https://data.tepapa.govt.nz/collection/place/2... 303 Admiralty Islands (Papua New Guinea) NaN NaN NaN
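
To get a sense of how much is missing from the maps, we can add up the counts for the places listed above. This sketch copies the titles and counts from the table by hand so it runs on its own; in the notebook you could get the same total from the filtered dataframe.

```python
# Titles and counts copied from the table of places without coordinates
missing_places = [
    ('Solomon Islands', 442),
    ('Upolu (Samoa)', 591),
    ('Opononi', 821),
    ('Chatham Islands (New Zealand)', 667),
    ('Jawa (Indonesia)', 435),
    ('North Island (New Zealand)', 5195),
    ('South Island (New Zealand)', 774),
    ('Africa', 417),
    ('Stewart Island (New Zealand)', 1123),
    ('Pacific Islands', 287),
    ('Czechoslovakia', 475),
    ('Admiralty Islands (Papua New Guinea)', 303),
]

# Total of the counts attached to places that can't be mapped
missing_total = sum(count for title, count in missing_places)
# The unmapped place with the largest count
largest = max(missing_places, key=lambda place: place[1])

print(len(missing_places), missing_total, largest)
# → 12 11530 ('North Island (New Zealand)', 5195)
```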

Created by Tim Sherratt (@wragge) as part of the GLAM Workbench.

If you think this project is worthwhile you can support it on Patreon.
