This notebook creates some simple maps using the production.spatial facet of the Te Papa API to identify places where collection objects were created.
If you haven't used one of these notebooks before, they're basically web pages in which you can write, edit, and run live code. They're meant to encourage experimentation, so don't feel nervous. Just try running a few cells and see what happens!
Some tips:
import requests
import pandas as pd
import altair as alt
import re
import folium
from tqdm import tnrange
from folium.plugins import MarkerCluster
from IPython.display import display, HTML
alt.renderers.enable('notebook')
Sign up here for your very own API key.
# Insert your API key between the quotes
api_key = ''
# If you don't have an API key yet, you can leave the above blank and we'll pick up a guest token below
print('Your API key is: {}'.format(api_key))
Your API key is:
search_endpoint = 'https://data.tepapa.govt.nz/collection/search'
headers = {
    'x-api-key': api_key,
    'Accept': 'application/json'
}
if not api_key:
    response = requests.get('https://data.tepapa.govt.nz/collection/search')
    data = response.json()
    guest_token = data['guestToken']
    headers['Authorization'] = 'Bearer {}'.format(guest_token)
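The header logic above can also be wrapped in a small function so that it is easy to reuse and check. This is only a sketch: `build_headers` is a hypothetical helper that is not part of the original notebook, and leaving out the `x-api-key` header when no key is supplied is an assumption about what the API accepts, so treat it as a starting point rather than the required approach.

```python
def build_headers(api_key='', guest_token=None):
    """Return request headers for the API (hypothetical helper).

    Assumption: the x-api-key header can be omitted when no key is given.
    """
    headers = {'Accept': 'application/json'}
    if api_key:
        # A registered key takes priority over a guest token
        headers['x-api-key'] = api_key
    elif guest_token:
        # Fall back to the guest token, sent as a bearer token
        headers['Authorization'] = 'Bearer {}'.format(guest_token)
    return headers


print(build_headers(api_key='my-key'))
print(build_headers(guest_token='abc123'))
```

Both `'my-key'` and `'abc123'` are placeholder values used only to show the two branches.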
Below we set the search parameters. As written, the search returns information about all objects in the collection. You can change the query value to limit the result set: try replacing the asterisk with some keywords.
The size parameter sets the number of places to return, so in this case we're getting the 100 places that have the most objects associated with them.
The production.spatial.href facet gives us the API url of the place itself, so we can use it to get more information about the place.
post_data = {
    'query': '*',
    'filters': [{
        'field': 'type',
        'keyword': 'Object'
    }],
    'facets': [
        {'field': 'production.spatial.href',
         'size': 100}
    ]
}
# Make the API request
response = requests.post(search_endpoint, json=post_data, headers=headers)
data = response.json()
# Convert the facets data to a dataframe and do some cleaning up
# We end up with two columns -- one with the place url, and the other with the number of objects associated with that place
places_df = pd.DataFrame(list(data['facets']['production.spatial.href'].items()))
places_df.columns = ['place_id', 'count']
places_df.head()
| | place_id | count |
|---|---|---|
| 0 | https://data.tepapa.govt.nz/collection/place/2... | 395 |
| 1 | https://data.tepapa.govt.nz/collection/place/2... | 8846 |
| 2 | https://data.tepapa.govt.nz/collection/place/2... | 430 |
| 3 | https://data.tepapa.govt.nz/collection/place/2... | 426 |
| 4 | https://data.tepapa.govt.nz/collection/place/2... | 374 |
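If you want to experiment with different keywords or facet sizes, the request body for the facet search can be generated by a function instead of edited by hand. The sketch below assumes the same body structure as the search above; `build_search_body` and the example keyword are hypothetical and only for illustration.

```python
def build_search_body(query='*', facet_size=100):
    """Build a search request body (hypothetical helper).

    query: keywords to search for, or '*' to match everything.
    facet_size: number of places to return in the facet.
    """
    return {
        'query': query,
        # Only look at records of type Object
        'filters': [{'field': 'type', 'keyword': 'Object'}],
        # Ask for the places most often linked to the matching objects
        'facets': [{'field': 'production.spatial.href', 'size': facet_size}],
    }


# Example: the top 20 places for objects matching an illustrative keyword
body = build_search_body(query='pottery', facet_size=20)
print(body)
```

The resulting dictionary could then be passed as the `json` argument of `requests.post`, in the same way as `post_data`.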
Using the place url we'll get the full record for each place. We'll then save the name of the place, its geospatial coordinates (if any), and its ISO country code (if any) to the dataframe.
def find_country_code(place):
    code = None
    if 'alternativeTerms' in place:
        for term in place['alternativeTerms']:
            try:
                if term[:3] == 'ISO':
                    code = term[3:]
            except TypeError:
                pass
    return code

for i in tnrange(len(places_df)):
    href = places_df.loc[i]['place_id']
    response = requests.get(href, headers=headers)
    place_data = response.json()
    places_df.at[i, 'title'] = place_data['title']
    code = find_country_code(place_data)
    if code:
        places_df.at[i, 'isocode'] = code
    if 'geoLocation' in place_data:
        places_df.at[i, 'lat'] = place_data['geoLocation']['lat']
        places_df.at[i, 'lon'] = place_data['geoLocation']['lon']
places_df.head()
| | place_id | count | title | lat | lon | isocode |
|---|---|---|---|---|---|---|
| 0 | https://data.tepapa.govt.nz/collection/place/2... | 395 | Surrey (United Kingdom) | 51.200 | -0.050 | NaN |
| 1 | https://data.tepapa.govt.nz/collection/place/2... | 8845 | Auckland (New Zealand) | -36.917 | 174.783 | NaN |
| 2 | https://data.tepapa.govt.nz/collection/place/2... | 430 | Kanto (Nihon) | 36.250 | 139.500 | NaN |
| 3 | https://data.tepapa.govt.nz/collection/place/2... | 426 | Solomon Islands | NaN | NaN | NaN |
| 4 | https://data.tepapa.govt.nz/collection/place/2... | 374 | Napier (New Zealand) | -39.483 | 176.967 | NaN |
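As the table shows, some place records have no coordinates. The sketch below pulls the title and coordinates out of a single place record without making a network request, returning `None` for anything that is missing. `summarise_place` is a hypothetical helper, and the two sample records are simplified stand-ins that contain only the fields read above (`title` and `geoLocation`), with values copied from the table.

```python
def summarise_place(place):
    """Extract title and coordinates from a place record (hypothetical helper)."""
    # Use an empty dict when the record has no geoLocation
    geo = place.get('geoLocation') or {}
    return {
        'title': place.get('title'),
        'lat': geo.get('lat'),
        'lon': geo.get('lon'),
    }


# Simplified sample records, not full API responses
with_coords = {
    'title': 'Napier (New Zealand)',
    'geoLocation': {'lat': -39.483, 'lon': 176.967},
}
without_coords = {'title': 'Solomon Islands'}

print(summarise_place(with_coords))
print(summarise_place(without_coords))
```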
import html
m = folium.Map(
    location=[10, 10],
    zoom_start=1.5
)
# We'll cluster the markers for better readability
marker_cluster = MarkerCluster().add_to(m)
for index, row in places_df.dropna(subset=['lat', 'lon']).iterrows():
    # We can easily change the API url to a web url and use it to link the map to the Te Papa collection web site
    web_url = row['place_id'].replace('/collection/', '/').replace('data', 'collections')
    popup = '<b><a target="_blank" href="{}">{}</a></b><br>{} objects'.format(web_url, html.escape(row['title']), row['count'])
    folium.Marker([row['lat'], row['lon']], popup=popup).add_to(marker_cluster)
m
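The chained `replace` calls used for the marker links work for the urls in this dataset, but they would also rewrite the word "data" anywhere else it happened to appear in a url. As a more cautious alternative, the sketch below changes only the host name and the leading `/collection` path segment. `api_to_web_url` is a hypothetical helper, and it simply reproduces the same transformation as the map code above.

```python
from urllib.parse import urlsplit, urlunsplit


def api_to_web_url(api_url):
    """Convert an API url to a collection web site url (hypothetical helper)."""
    parts = urlsplit(api_url)
    netloc = parts.netloc
    # Swap the API host for the web site host
    if netloc == 'data.tepapa.govt.nz':
        netloc = 'collections.tepapa.govt.nz'
    path = parts.path
    # Drop the leading /collection segment, keeping the rest of the path
    if path.startswith('/collection/'):
        path = path[len('/collection'):]
    return urlunsplit((parts.scheme, netloc, path, parts.query, parts.fragment))


print(api_to_web_url('https://data.tepapa.govt.nz/collection/place/314'))
```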
Let's try and make the number of objects created in each place more obvious.
import html
m = folium.Map(
    location=[10, 10],
    zoom_start=1.5
)
for index, row in places_df.dropna(subset=['lat', 'lon']).iterrows():
    popup = '<b>{}</b><br>{} objects'.format(html.escape(row['title']), row['count'])
    folium.Circle([row['lat'], row['lon']], radius=row['count']*5, popup=popup, color='#de2d26', fill=True).add_to(m)
m
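In the map above the radius grows in direct proportion to the count, so the area of each circle grows with the square of the count and large places can look even bigger than they are. One common alternative, shown here as a sketch rather than as the notebook's method, is to scale the radius by the square root of the count so that the area is proportional to the count. `scaled_radius` is a hypothetical helper and the `factor` of 500 is an arbitrary value you would need to tune by eye.

```python
import math


def scaled_radius(count, factor=500):
    """Return a radius whose circle area is proportional to count.

    factor is an arbitrary multiplier (an assumption, tune to taste).
    """
    return math.sqrt(count) * factor


# Four times as many objects gives twice the radius, so four times the area
print(scaled_radius(400))
print(scaled_radius(1600))
```

To try it, you could pass `scaled_radius(row['count'])` as the `radius` argument of `folium.Circle` in place of `row['count']*5`.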
Remember that we're not seeing all the places where objects were created. First of all, the facet size parameter limited our results to the top 100 places. Try changing it to see what happens.
Even amongst the top 100, not every place had geospatial coordinates attached to it. So not everything is on the map. Let's create a list of places without coordinates.
places_df.loc[places_df['lat'].isnull()]
| | place_id | count | title | lat | lon | isocode |
|---|---|---|---|---|---|---|
| 3 | https://data.tepapa.govt.nz/collection/place/2... | 426 | Solomon Islands | NaN | NaN | NaN |
| 17 | https://data.tepapa.govt.nz/collection/place/2... | 592 | Upolu (Samoa) | NaN | NaN | NaN |
| 31 | https://data.tepapa.govt.nz/collection/place/314 | 795 | Opononi | NaN | NaN | NaN |
| 33 | https://data.tepapa.govt.nz/collection/place/2... | 668 | Chatham Islands (New Zealand) | NaN | NaN | NaN |
| 46 | https://data.tepapa.govt.nz/collection/place/2... | 435 | Jawa (Indonesia) | NaN | NaN | NaN |
| 47 | https://data.tepapa.govt.nz/collection/place/2... | 5171 | North Island (New Zealand) | NaN | NaN | NaN |
| 59 | https://data.tepapa.govt.nz/collection/place/2... | 613 | South Island (New Zealand) | NaN | NaN | NaN |
| 65 | https://data.tepapa.govt.nz/collection/place/2... | 431 | Africa | NaN | NaN | NaN |
| 70 | https://data.tepapa.govt.nz/collection/place/2... | 1117 | Stewart Island (New Zealand) | NaN | NaN | NaN |
| 71 | https://data.tepapa.govt.nz/collection/place/2... | 287 | Pacific Islands | NaN | NaN | NaN |
| 78 | https://data.tepapa.govt.nz/collection/place/2... | 475 | Czechoslovakia | NaN | NaN | NaN |
| 94 | https://data.tepapa.govt.nz/collection/place/2... | 303 | Admiralty Islands (Papua New Guinea) | NaN | NaN | NaN |
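It can also be useful to know how many objects are linked to the places that are missing from the map. The sketch below counts them from a list of row dictionaries, the shape you would get from `places_df.to_dict('records')`. The helper names are hypothetical, and the three sample rows are copied from the tables above rather than fetched from the API.

```python
import math


def has_coords(row):
    """True when both lat and lon are present and not NaN."""
    values = (row.get('lat'), row.get('lon'))
    return all(v is not None and not math.isnan(v) for v in values)


def missing_summary(rows):
    """Return (number of places, number of objects) lacking coordinates."""
    missing = [r for r in rows if not has_coords(r)]
    return len(missing), sum(r['count'] for r in missing)


nan = float('nan')
rows = [
    {'title': 'Auckland (New Zealand)', 'count': 8845, 'lat': -36.917, 'lon': 174.783},
    {'title': 'Solomon Islands', 'count': 426, 'lat': nan, 'lon': nan},
    {'title': 'North Island (New Zealand)', 'count': 5171, 'lat': nan, 'lon': nan},
]

n_places, n_objects = missing_summary(rows)
print('{} places without coordinates, covering {} objects'.format(n_places, n_objects))
```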