Exploring facets

New to Jupyter notebooks? Try Using Jupyter notebooks for a quick introduction.

Facets aggregate collection data in interesting and useful ways, allowing us to build pictures of the collection. This notebook shows you how to get facet data from Trove.

In [1]:
import requests
import altair as alt
import pandas as pd
import os

# Make sure data directory exists
os.makedirs('data', exist_ok=True)

Insert your API key between the quotes.

In [ ]:
api_key = ''
print('Your API key is: {}'.format(api_key))
In [6]:
api_search_url = 'https://api.trove.nla.gov.au/v2/result'

Set up our query parameters. We want everything, so we set the q parameter to be a single space.

In [7]:
params = {
    'q': ' ', # A space to search for everything
    'facet': 'format',
    'zone': 'book', 
    'key': api_key,
    'encoding': 'json',
    'n': 1
}
In [8]:
response = requests.get(api_search_url, params=params)
print(response.url) # This shows us the url that's sent to the API
data = response.json()
# print(data)
https://api.trove.nla.gov.au/v2/result?q=+&facet=format&zone=book&key=YOUR_API_KEY&encoding=json&n=1
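If you want to preview the encoded query string without sending a request, the standard library's urlencode builds the same string that requests does (note how the space in q becomes a +). A minimal sketch, with a placeholder standing in for your real API key:

```python
from urllib.parse import urlencode

# Placeholder values standing in for the real query; 'YOUR_API_KEY' is not a real key
params = {
    'q': ' ',
    'facet': 'format',
    'zone': 'book',
    'key': 'YOUR_API_KEY',
    'encoding': 'json',
    'n': 1
}

# urlencode uses quote_plus, so the space in 'q' is encoded as '+'
url = 'https://api.trove.nla.gov.au/v2/result?' + urlencode(params)
print(url)
```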
In [9]:
from operator import itemgetter

def facet_totals():
    '''
    Loop through facets saving terms and counts.
    Returns a DataFrame of facet names and totals.
    '''
    facets = []
    # Sort alphabetically by facet name
    facet_list = sorted(data['response']['zone'][0]['facets']['facet']['term'], key=itemgetter('search'))
    for term in facet_list:
        term_count = int(term['count'])
        if 'term' in term:
            # There be sub-terms!
            for subterm in sorted(term['term'], key=itemgetter('search')):
                facets.append({'facet': subterm['search'], 'total': int(subterm['count'])})
                # Subtract the subterm count from the term count
                term_count = term_count - int(subterm['count'])
                # print('{:<50} {:,}'.format(subterm['search'], int(subterm['count'])))
        # print('{:<50} {:,}'.format(term['search'], term_count))
        facets.append({'facet': term['search'], 'total': term_count})
    return pd.DataFrame(facets)

facet_totals = facet_totals()
facet_totals
Out[9]:
                    facet    total
0        Archived website    24038
1              Audio book   180720
2            Book/Braille    36227
3        Book/Illustrated  7111902
4        Book/Large print   102783
5                    Book  7920356
6  Conference Proceedings   461062
7               Microform   867100
8                  Thesis   597625

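The subtraction step in the function above avoids double counting: in the Trove response, a parent term's count includes all of its sub-terms, so each sub-term total is removed from the parent before it is saved. A minimal sketch of that logic with made-up numbers:

```python
# Hypothetical facet term mimicking the nested structure Trove returns
term = {
    'search': 'Book',
    'count': '100',
    'term': [
        {'search': 'Book/Illustrated', 'count': '30'},
        {'search': 'Book/Braille', 'count': '10'},
    ],
}

facets = []
term_count = int(term['count'])
for subterm in term['term']:
    # Save the sub-term with its own count
    facets.append({'facet': subterm['search'], 'total': int(subterm['count'])})
    # The parent count includes its sub-terms, so subtract to avoid double counting
    term_count -= int(subterm['count'])
# What's left is the count for the parent term alone
facets.append({'facet': term['search'], 'total': term_count})
print(facets)
```

With these made-up numbers, 'Book' ends up with 100 − 30 − 10 = 60, alongside the two sub-terms.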
Now we can create a bar chart using Altair. The x values will be the facet totals, and the y values will be the facet names.

In [10]:
# Uncomment this line to sort by total (highest to lowest) and keep the top twenty
#top_facets = facet_totals.sort_values(by="total", ascending=False)[:20]
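The commented-out sort works because pandas returns a new DataFrame ordered by the column you name, and slicing then keeps the first rows. A small sketch with hypothetical totals:

```python
import pandas as pd

# Hypothetical facet totals standing in for the real data
facet_totals = pd.DataFrame([
    {'facet': 'Book', 'total': 7920356},
    {'facet': 'Audio book', 'total': 180720},
    {'facet': 'Thesis', 'total': 597625},
])

# Sort highest to lowest, then slice to keep the top two rows
top_facets = facet_totals.sort_values(by='total', ascending=False)[:2]
print(top_facets['facet'].tolist())
```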
In [11]:
# Create a bar chart
alt.Chart(facet_totals).mark_bar().encode(
    x='total:Q',
    y='facet:N'
)
Out[11]:
In [10]:
facet_totals.to_csv('data/facet-{}.csv'.format(params['facet']), index=False)

Once you've saved this file, you can download it from the workbench data directory.
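As a quick sanity check, you can read a saved CSV back into pandas and confirm it matches what you wrote. A sketch using a small hypothetical DataFrame and an in-memory buffer in place of the data directory:

```python
import io
import pandas as pd

# Hypothetical facet totals standing in for the real data
facet_totals = pd.DataFrame([
    {'facet': 'Book', 'total': 7920356},
    {'facet': 'Thesis', 'total': 597625},
])

# Write to an in-memory buffer the same way we wrote the CSV file above
buffer = io.StringIO()
facet_totals.to_csv(buffer, index=False)
buffer.seek(0)

# Reading it back should reproduce the original DataFrame
reloaded = pd.read_csv(buffer)
print(reloaded.equals(facet_totals))
```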

Going further

For an in-depth exploration of facets in the newspaper zone and how they can help us visualise change over time, see Visualise Trove newspaper searches over time.


Created by Tim Sherratt (@wragge) as part of the GLAM workbench.

If you think this project is worthwhile you can support it on Patreon.