Working with zones

New to Jupyter notebooks? Try Using Jupyter notebooks for a quick introduction.

Trove's zones are important in constructing API requests and interpreting the results. So let's explore them a bit.

You're probably already familiar with Trove's zones from the web site.

There are 10 zones in Trove (11 if you regard the newspapers and gazettes as separate):

  • Digitised newspapers and gazettes
  • Journals, articles and data sets
  • Books
  • Pictures, photos and objects
  • Music, sound and video
  • Maps
  • Diaries, letters and archives
  • People and organisations
  • Archived websites
  • Lists

However, data from the 'People and organisations' and 'Archived websites' zones is not available through the API. Well, sort of not...

Let's see what the API itself can tell us about the zones.

Setting things up

We'll start by importing the modules we're going to need later on.

In [1]:
# Let's import the modules we need 
import requests
import os

# Altair helps us make pretty charts
import altair as alt

# Pandas helps us analyse tabular data
import pandas as pd

os.makedirs('data', exist_ok=True)

As usual we're going to need a Trove API key.

In [ ]:
# This creates a variable called 'api_key', paste your key between the quotes
api_key = ''

# This displays a message with your key
print('Your API key is: {}'.format(api_key))
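If you'd rather not paste your key directly into the notebook, another option is to read it from an environment variable. This is just a sketch, and assumes you've set a variable called TROVE_API_KEY yourself (the name is our own choice, not something Trove requires):

```python
import os

# Read the key from an environment variable instead of hard-coding it.
# 'TROVE_API_KEY' is an assumed name; use whatever variable you've set.
api_key = os.getenv('TROVE_API_KEY', '')

# This displays a message with your key
print('Your API key is: {}'.format(api_key))
```

This keeps your key out of the notebook file itself, which is handy if you're sharing your notebooks.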

We'll also set the base url for our API requests.

In [3]:
# Create a variable called 'api_search_url' and give it a value
api_search_url = 'https://api.trove.nla.gov.au/v2/result'

Give us everything!

This time we're going to ask for everything from all the zones. (Don't worry, you won't break anything. We're limiting the number of results, so Trove will only give us one record from each zone.)

To do this, we'll set the q parameter to a single space (which matches everything), the zone parameter to 'all', and the n parameter to 1.

In [19]:
# This creates a dictionary called 'params' and sets values for the API's parameters
params = {
    'q': ' ', # A space to search for everything
    'zone': 'all', # All zones thanks!
    'key': api_key,
    'encoding': 'json',
    'n': 1
}

We can now send our request off to the Trove API. Because we're not applying any limits to our query, the API can take a little longer than normal to respond. Just wait for the asterisk in the square brackets to turn into a number, and then move on.

In [20]:
# This sends our request to the Trove API and stores the result in a variable called 'response'
response = requests.get(api_search_url, params=params)

# This shows us the url that's sent to the API
print(response.url)

# This checks the status code of the response to make sure there were no errors
if response.status_code == requests.codes.ok:
    print('All ok')
elif response.status_code == 403:
    print('There was an authentication error. Did you paste your API key above?')
else:
    print('There was a problem. Error code: {}'.format(response.status_code))
    print('Try running this cell again.')
https://api.trove.nla.gov.au/v2/result?q=+&zone=all&key=YOUR_API_KEY&encoding=json&n=1
All ok

As before we'll get the JSON results data from the API response.

In [21]:
# Get the Trove API's JSON results and make them available as a Python variable called 'data'
data = response.json()

If you'd like to have a look at the raw data, run the next cell.

In [22]:
# Let's prettify the raw JSON data and then display it.

# We're using the Pygments library to add some colour to the output, so we need to import it
import json
from pygments import highlight, lexers, formatters

# This uses Python's JSON module to output the results as nicely indented text
formatted_data = json.dumps(data, indent=2)

# This colours the text
highlighted_data = highlight(formatted_data, lexers.JsonLexer(), formatters.TerminalFormatter())

# And now display the results
print(highlighted_data)
{
  "response": {
    "query": "",
    "zone": [
      {
        "name": "url",
        "records": {
          "s": "*",
          "n": "0",
          "total": "0"
        }
      },
      {
        "name": "gazette",
        "records": {
          "s": "*",
          "n": "1",
          "total": "3535033",
          "next": "/result?q=+&encoding=json&n=1&zone=gazette&s=AoIIP4AAACkyMTk4NzY1MzI%3D",
          "nextStart": "AoIIP4AAACkyMTk4NzY1MzI=",
          "article": [
            {
              "id": "219876532",
              "url": "/newspaper/219876532",
              "heading": "TRADE-MARK.",
              "category": "Government Gazette Notices",
              "title": {
                "id": "525",
                "value": "New South Wales Government Gazette (Sydney, NSW : 1832 - 1900)"
              },
              "date": "1885-09-01",
              "page": 28,
              "pageSequence": 28,
              "relevance": {
                "score": "1.0",
                "value": "likely to be relevant"
              },
              "troveUrl": "https://trove.nla.gov.au/ndp/del/article/219876532"
            }
          ]
        }
      },
      {
        "name": "people",
        "records": {
          "s": "*",
          "n": "1",
          "total": "1275246",
          "next": "/result?q=+&encoding=json&n=1&zone=people&s=AoIIQ3AAAD8NaWRlbnRpdHk2ZTVmMmRhNy03MDUyLTQ3ZjMtYWM1Yy1jOTIwOGIyODhjMzM%3D",
          "nextStart": "AoIIQ3AAAD8NaWRlbnRpdHk2ZTVmMmRhNy03MDUyLTQ3ZjMtYWM1Yy1jOTIwOGIyODhjMzM=",
          "people": [
            {
              "id": "949358",
              "url": "/people/949358",
              "troveUrl": "https://trove.nla.gov.au/people/949358"
            }
          ]
        }
      },
      {
        "name": "list",
        "records": {
          "s": "*",
          "n": "1",
          "total": "92588",
          "next": "/result?q=+&encoding=json&n=1&zone=list&s=AoIIQ6lD%2FClsaXN0OTUzMTI%3D",
          "nextStart": "AoIIQ6lD/ClsaXN0OTUzMTI=",
          "list": [
            {
              "id": "95312",
              "url": "/list/95312",
              "troveUrl": "https://trove.nla.gov.au/list?id=95312",
              "title": ".",
              "creator": "public:polivar",
              "listItemCount": 1,
              "relevance": {
                "score": "338.53113",
                "value": "very relevant"
              }
            }
          ]
        }
      },
      {
        "name": "picture",
        "records": {
          "s": "*",
          "n": "1",
          "total": "4515404",
          "next": "/result?q=+&encoding=json&n=1&zone=picture&s=AoIIQRXCkCpzdTEwMDAwOTA1",
          "nextStart": "AoIIQRXCkCpzdTEwMDAwOTA1",
          "work": [
            {
              "id": "10000905",
              "url": "/work/10000905",
              "troveUrl": "https://trove.nla.gov.au/work/10000905",
              "title": "South-west 'Milk bar'",
              "contributor": [
                "HRRC"
              ],
              "issued": 1937,
              "type": [
                "Photograph"
              ],
              "holdingsCount": 1,
              "versionCount": 1,
              "relevance": {
                "score": "9.360001",
                "value": "very relevant"
              },
              "identifier": [
                {
                  "type": "url",
                  "linktype": "fulltext",
                  "linktext": "274058PD: Milk bar, 1937",
                  "value": "http://www.slwa.wa.gov.au/images/pd274/274058PD.jpg"
                },
                {
                  "type": "url",
                  "linktype": "thumbnail",
                  "value": "http://www.slwa.wa.gov.au/images/pd274/274058PD.png"
                }
              ]
            }
          ]
        }
      },
      {
        "name": "map",
        "records": {
          "s": "*",
          "n": "1",
          "total": "319374",
          "next": "/result?q=+&encoding=json&n=1&zone=map&s=AoIIQTmKJipzdTEwMjc4NzM4",
          "nextStart": "AoIIQTmKJipzdTEwMjc4NzM4",
          "work": [
            {
              "id": "10278738",
              "url": "/work/10278738",
              "troveUrl": "https://trove.nla.gov.au/work/10278738",
              "title": "The new phytologist",
              "contributor": [
                "Tansley, Sir Arthur George, 1871-1955"
              ],
              "issued": "1900-2020",
              "type": [
                "Periodical",
                "Periodical/Journal, magazine, other",
                "Map",
                "Map/Atlas",
                "Book"
              ],
              "holdingsCount": 32,
              "versionCount": 9,
              "relevance": {
                "score": "11.596228",
                "value": "very relevant"
              },
              "identifier": [
                {
                  "type": "url",
                  "linktype": "restricted",
                  "linktext": "Address for accessing the journal from an authorized IP address through OCLC FirstSearch Electronic Collections Online. Subscription to online journal required for access to abstracts and full text",
                  "value": "http://firstsearch.oclc.org/journal=0028-646x;screen=info;ECOIP"
                }
              ]
            }
          ]
        }
      },
      {
        "name": "collection",
        "records": {
          "s": "*",
          "n": "1",
          "total": "728776",
          "next": "/result?q=+&encoding=json&n=1&zone=collection&s=AoIIQRXCkCpzdTEwMTA1Mzc2",
          "nextStart": "AoIIQRXCkCpzdTEwMTA1Mzc2",
          "work": [
            {
              "id": "10105376",
              "url": "/work/10105376",
              "troveUrl": "https://trove.nla.gov.au/work/10105376",
              "title": "Hunter Water Corporation",
              "contributor": [
                "Hunter Water Corporation (N.S.W.)"
              ],
              "issued": 2000,
              "type": [
                "Published",
                "Government publication"
              ],
              "holdingsCount": 3,
              "versionCount": 1,
              "relevance": {
                "score": "9.360001",
                "value": "very relevant"
              },
              "identifier": [
                {
                  "type": "url",
                  "linktype": "restricted",
                  "linktext": "Archived site",
                  "value": "http://nla.gov.au/nla.arc-39584"
                }
              ]
            }
          ]
        }
      },
      {
        "name": "music",
        "records": {
          "s": "*",
          "n": "1",
          "total": "2483162",
          "next": "/result?q=+&encoding=json&n=1&zone=music&s=AoIIQXSbpSpzdTEwMDUxODUx",
          "nextStart": "AoIIQXSbpSpzdTEwMDUxODUx",
          "work": [
            {
              "id": "10051851",
              "url": "/work/10051851",
              "troveUrl": "https://trove.nla.gov.au/work/10051851",
              "title": "Journal of marriage and the family",
              "contributor": [
                "National Council on Family Relations (U.S.)"
              ],
              "issued": "1939-2020",
              "type": [
                "Periodical",
                "Periodical/Journal, magazine, other",
                "Microform",
                "Printed music"
              ],
              "holdingsCount": 21,
              "versionCount": 7,
              "relevance": {
                "score": "15.287999",
                "value": "very relevant"
              },
              "identifier": [
                {
                  "type": "url",
                  "linktype": "restricted",
                  "linktext": "Online access from 01/01/2014-31/12/2016",
                  "value": "http://ejournals.ebsco.com/direct.asp?JournalID=111076"
                }
              ]
            }
          ]
        }
      },
      {
        "name": "article",
        "records": {
          "s": "*",
          "n": "1",
          "total": "11030302",
          "next": "/result?q=+&encoding=json&n=1&zone=article&s=AoIIQe%2BHOilzdTY1MDE4OTQ%3D",
          "nextStart": "AoIIQe+HOilzdTY1MDE4OTQ=",
          "work": [
            {
              "id": "6501894",
              "url": "/work/6501894",
              "troveUrl": "https://trove.nla.gov.au/work/6501894",
              "title": "Annotated Trade Practices Act / by Russell V. Miller",
              "contributor": [
                "Australia"
              ],
              "issued": "1900-2020",
              "type": [
                "Book",
                "Government publication",
                "Book/Illustrated",
                "Periodical",
                "Periodical/Journal, magazine, other"
              ],
              "holdingsCount": 138,
              "versionCount": 136,
              "relevance": {
                "score": "29.941029",
                "value": "very relevant"
              },
              "identifier": [
                {
                  "type": "url",
                  "linktype": "restricted",
                  "linktext": "Access full text (AUSTLII)",
                  "value": "http://www.austlii.edu.au/au/legis/cth/consol_act/tpa1974149"
                }
              ]
            }
          ]
        }
      },
      {
        "name": "newspaper",
        "records": {
          "s": "*",
          "n": "1",
          "total": "227655459",
          "next": "/result?q=+&encoding=json&n=1&zone=newspaper&s=AoIIP4AAACcxMDAwMDAw",
          "nextStart": "AoIIP4AAACcxMDAwMDAw",
          "article": [
            {
              "id": "1000000",
              "url": "/newspaper/1000000",
              "heading": "GOULBURN",
              "category": "Article",
              "title": {
                "id": "11",
                "value": "The Canberra Times (ACT : 1926 - 1995)"
              },
              "date": "1929-03-22",
              "page": 5,
              "pageSequence": 5,
              "relevance": {
                "score": "1.0",
                "value": "likely to be relevant"
              },
              "snippet": "A proposal by Ald. F. W. Yates, that a system of centralised slaughter was desirable for the provision of a wholesale supply of fresh meat to Goulburn,",
              "troveUrl": "https://trove.nla.gov.au/ndp/del/article/1000000"
            }
          ]
        }
      },
      {
        "name": "book",
        "records": {
          "s": "*",
          "n": "1",
          "total": "16432073",
          "next": "/result?q=+&encoding=json&n=1&zone=book&s=AoIIQ6f7gitzdTIyNzcwMTA0NQ%3D%3D",
          "nextStart": "AoIIQ6f7gitzdTIyNzcwMTA0NQ==",
          "work": [
            {
              "id": "227701045",
              "url": "/work/227701045",
              "troveUrl": "https://trove.nla.gov.au/work/227701045",
              "title": "\u200f/\u200f / \u200f",
              "contributor": [
                "Nu\u0304ri\u0304, Sha\u0304kir"
              ],
              "issued": "1907-2017",
              "type": [
                "Book",
                "Government publication"
              ],
              "holdingsCount": 0,
              "versionCount": 3,
              "relevance": {
                "score": "335.9649",
                "value": "very relevant"
              },
              "identifier": [
                {
                  "type": "url",
                  "linktype": "restricted",
                  "linktext": "Ebook Library",
                  "value": "http://public.eblib.com/choice/publicfullrecord.aspx?p=1669541"
                }
              ]
            }
          ]
        }
      }
    ]
  }
}

Looking into the zones

Now we've got data from all of Trove's zones, let's see what it looks like!

The next code cell loops through each zone in the results and extracts the total number of results. Because we didn't apply any limits to our query, this will tell us how many items are currently in each zone.

In [23]:
# Create empty lists to store 'zones' and 'totals'
zones = []
totals = []

# Loop through the zones in the API results
for zone in data['response']['zone']:

    # Add the name and total values to the relevant list
    zones.append(zone['name'])
    totals.append(int(zone['records']['total']))
        
# Save the results as a dictionary
zone_totals = {'zones': zones, 'totals': totals}

We're now going to convert the results into a Pandas dataframe. Pandas has lots of useful options for working with and displaying tabular data.

In [24]:
# Create a Pandas dataframe to work with the results
df = pd.DataFrame(zone_totals)

# Sort by zone name
df = df.sort_values(by='zones')

# Display as a table (formatting the numbers with comma separators for readability)
df[['zones', 'totals']].style.format({'totals': '{:,}'})
Out[24]:
zones totals
8 article 11,030,302
10 book 16,432,073
6 collection 728,776
1 gazette 3,535,033
3 list 92,588
5 map 319,374
7 music 2,483,162
9 newspaper 227,655,459
2 people 1,275,246
4 picture 4,515,404
0 url 0
In [25]:
# We can even use Pandas to display the results table with a simple bar chart
df[['zones', 'totals']].style.format({'totals': '{:,}'}).bar(subset=['totals'], color='#d65f5f')
Out[25]:
zones totals
8 article 11,030,302
10 book 16,432,073
6 collection 728,776
1 gazette 3,535,033
3 list 92,588
5 map 319,374
7 music 2,483,162
9 newspaper 227,655,459
2 people 1,275,246
4 picture 4,515,404
0 url 0

That's pretty cool, but let's take things one step further and use Altair to create a pretty interactive bar chart. As you can see from the cell below, Altair is very easy to use.

In [26]:
alt.Chart(df).mark_bar().encode(
    x='zones:N',
    y='totals:Q'
)
Out[26]:

Ok, let's stop making pictures and look at what the results tell us. It's probably no surprise that there are more digitised newspaper articles than anything else.

Most of the zone names are familiar, though it might not be immediately obvious that the 'article' zone corresponds to the 'Journals, articles and data sets' zone in the web interface.

However, you might be wondering about 'collection'. It's not 'Lists' as there's already a 'list' zone in the results. It turns out that the 'collection' zone corresponds to the 'Diaries, letters and archives' zone in the web interface. I suppose it sort of makes sense.

You might also have noticed that although I said that the API didn't include results for the 'People and organisations' zone, there is a result in the data above for 'people'. What's going on?

Basically, full support for the 'People and organisations' zone was never completed. Don't believe me? Let's have a look at the results.

First we'll extract the results for the 'people' zone from the API data.

In [12]:
# Create an empty list to store the results
people_results = []

# Loop through the zone results
for zone in data['response']['zone']:
    
    # When we find the people zone save the records data to 'people_results'
    if zone['name'] == 'people':
        people_results = zone['records']['people']        

Once again, we'll convert the results into a dataframe and display the first five rows as a table.

In [13]:
# Create a dataframe from 'people_results'
people_df = pd.DataFrame(people_results)

# Display the first 5 results as a table
people_df[:5]
Out[13]:
id url troveUrl
0 949358 /people/949358 https://trove.nla.gov.au/people/949358
1 897386 /people/897386 https://trove.nla.gov.au/people/897386
2 542626 /people/542626 https://trove.nla.gov.au/people/542626
3 535418 /people/535418 https://trove.nla.gov.au/people/535418
4 461145 /people/461145 https://trove.nla.gov.au/people/461145

You'll notice that there's not a lot of useful data – just identifiers and URLs for the Trove web interface. If you try to use the identifier to get more information from the API you'll be out of luck – it returns a '404: Not Found' error.

As I said, full support for the 'People and organisations' zone was never completed. Hopefully it will be added in a future release.

More zone peculiarities

There's another couple of peculiarities that you need to be aware of. The first is really more of an annoyance than a peculiarity. As you might have noticed above, to find the results for the 'people' zone I had to loop through all the zones until I found the one with the name 'people'. We can't just say, 'give me the people results!'. Of course, this is only an issue if you've asked for results from more than one zone. If you set the 'zone' parameter to a single zone – like 'newspaper' – the newspaper data would be the first (and only) set of results. You could find them at data['response']['zone'][0].
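If you find yourself writing that loop a lot, you could wrap it in a little helper function. This is just a sketch (get_zone_records is our own invention, not part of the API), demonstrated here with a minimal made-up response:

```python
# A small helper that finds a zone's results by name,
# so we don't have to write the loop every time
def get_zone_records(data, zone_name):
    for zone in data['response']['zone']:
        if zone['name'] == zone_name:
            return zone['records']
    # Return None if the zone isn't in the results
    return None

# A minimal, made-up response structure for demonstration
sample = {'response': {'zone': [
    {'name': 'people', 'records': {'total': '2'}},
    {'name': 'list', 'records': {'total': '5'}}
]}}

print(get_zone_records(sample, 'list')['total'])
```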

You might also have noticed that the individual records from the 'people' zone were found at zone['records']['people']. What's wrong with that? Well, it means that different zones use different keys to identify their records. So you have to know in advance what the key is in order to get the records data. Again, if you're only working with one zone it's not too hard. But if you're working across multiple zones, it's a bit of a pain.

At least we can use the data we've already gathered to create a mapping of zones to keys.

In [27]:
# Create an empty list to store the results
zone_keys = []

# Loop through the zones
for zone in data['response']['zone']:
    
    # Get the name of the zone
    zone_name = zone['name']
    
    # Loop through the keys
    for key in zone['records'].keys():
        
        # Check the key against the keys that are always there
        if key not in ['s', 'n', 'total', 'next', 'nextStart']:
            # If it's not one of the standard keys save it
            records_key = key
            # Append the zone name and records key to our list
            zone_keys.append({'zone_name': zone_name, 'records_key': records_key})
    
# Convert the results to a dataframe
keys_df = pd.DataFrame(zone_keys)

# Sort and display the results
keys_df = keys_df.sort_values(by='records_key')
keys_df[['zone_name', 'records_key']]
            
Out[27]:
zone_name records_key
0 gazette article
8 newspaper article
2 list list
1 people people
3 picture work
4 map work
5 collection work
6 music work
7 article work
9 book work

As you can see, the 'newspaper' and 'gazette' zones both use 'article', while 'list' and 'people' have their own keys. Every other zone uses 'work'.
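We can use the same trick to sidestep the key problem altogether. Here's a small helper (again, our own sketch, not part of the Trove API) that grabs a zone's records whatever key they're stored under, by skipping the metadata keys that every zone shares:

```python
# Get a zone's records without knowing its records key in advance
def get_records(zone):
    for key, value in zone['records'].items():
        # Skip the metadata keys that are always there;
        # whatever's left holds the actual records
        if key not in ['s', 'n', 'total', 'next', 'nextStart']:
            return value
    # Return an empty list if there are no records
    return []

# A minimal, made-up zone result for demonstration
sample_zone = {
    'name': 'book',
    'records': {'s': '*', 'n': '1', 'total': '10', 'work': [{'id': '123'}]}
}

print(get_records(sample_zone))
```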

Finally...

If you want to save the zones data, just run the cell below to create a CSV-formatted file.

In [17]:
# Save the zones data to a CSV file you can download
df.to_csv('data/trove_zones.csv', index=False)

Once you've created it, you can download this file from the workbench's data directory.


Created by Tim Sherratt (@wragge) as part of the GLAM workbench.

If you think this project is worthwhile you can support it on Patreon.