Create a list of Trove's digitised journals

Everyone knows about Trove's newspapers, but there is also a growing collection of digitised journals available in the journals zone. They're not easy to find, however, which is why I created the Trove Titles web app.

This notebook uses the Trove API to harvest metadata relating to digitised journals – or more accurately, journals that are freely available online in a digital form. This includes some born digital publications that are available to view in formats like PDF and MOBI, but excludes some digital journals that have access restrictions.

The search strategy to find digitised (and digital) journals takes advantage of the fact that Trove's digitised resources (excluding the newspapers) all have an identifier that includes the string nla.obj. So we start by searching in the journals zone for records that include nla.obj and have the format 'Periodical'. By specifying 'Periodical' we exclude individual articles from digitised journals.
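
To make the search strategy concrete, here is a minimal sketch of the request it translates to. The parameter values are the ones used in get_titles() further down (where the journals zone is passed as 'article'); the helper name build_search_url is hypothetical and only there for illustration. It builds the URL without calling the API.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

API_URL = 'https://api.trove.nla.gov.au/v2/result'


def build_search_url(api_key, start='*'):
    '''
    Hypothetical helper: return the URL for one page of the journal search.
    '''
    params = {
        # Records that mention an nla.obj identifier, minus unwanted formats
        'q': '"nla.obj-" NOT format:"Government publication" NOT format:Article',
        'zone': 'article',         # zone value used in get_titles()
        'l-format': 'Periodical',  # whole journals, not individual articles
        'include': 'links',
        'bulkHarvest': 'true',
        'n': 100,
        'encoding': 'json',
        's': start,
        'key': api_key,
    }
    return API_URL + '?' + urlencode(params)


example_url = build_search_url('YOUR API KEY')
print(example_url)

# Decode the query string again to confirm the parameters survive encoding
decoded = parse_qs(urlsplit(example_url).query)
print(decoded['l-format'], decoded['zone'])
```

Pasting the printed URL into a browser (with a real key) is a quick way to inspect the raw JSON before running the full harvest.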

Then it's just a matter of looping through all the results and checking to see if a record includes a fulltext link to a digital copy. If it does, the record is saved.
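
The looping relies on the nextStart value that comes back with each page of results. The sketch below shows that pattern on its own, assuming the response shape used by get_titles() below. The canned pages and the fetch_page function are hypothetical stand-ins for real API calls, so it runs without a key.

```python
# Canned pages standing in for API responses (hypothetical data)
FAKE_PAGES = {
    '*': {'work': [{'title': 'A'}, {'title': 'B'}], 'nextStart': 'token-1'},
    'token-1': {'work': [{'title': 'C'}]},  # no nextStart, so this is the last page
}


def fetch_page(start):
    '''
    Hypothetical stand-in for one request to the API.
    '''
    return {'response': {'zone': [{'records': FAKE_PAGES[start]}]}}


def harvest(fetch):
    '''
    Keep requesting pages until a response has no nextStart value.
    '''
    works = []
    start = '*'
    while start:
        records = fetch(start)['response']['zone'][0]['records']
        works.extend(records.get('work', []))
        # None when the key is missing, which ends the loop
        start = records.get('nextStart')
    return works


harvested_titles = [work['title'] for work in harvest(fetch_page)]
print(harvested_titles)
```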

You can see the results in this CSV file. Obviously you could extract additional metadata from each record if you wanted to.

The default fields are:

  • fulltext_url – the url of the landing page of the digital version of the journal
  • nla_digitised – True if the fulltext link is labelled 'National Library of Australia digitised item', otherwise False
  • title – the title of the journal
  • trove_id – the 'nla.obj' part of the fulltext_url, a unique identifier for the digital journal
  • trove_url – url of the journal's metadata record in Trove

I've used this list to harvest all the OCRd text from digitised journals.

In [8]:
# Let's import the libraries we need.
import requests
import pandas as pd
from bs4 import BeautifulSoup
import time
import json
import os
import re
from tqdm.auto import tqdm
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
from slugify import slugify
from IPython.display import display, FileLink

s = requests.Session()
retries = Retry(total=5, backoff_factor=1, status_forcelist=[ 502, 503, 504 ])
s.mount('https://', HTTPAdapter(max_retries=retries))
s.mount('http://', HTTPAdapter(max_retries=retries))

Add your Trove API key

You can get a Trove API key by following these instructions.

In [9]:
# Add your Trove API key between the quotes
api_key = 'YOUR API KEY'

Define some functions to do the work

In [35]:
def get_total_results(params):
    '''
    Get the total number of results for a search.
    '''
    these_params = params.copy()
    these_params['n'] = 0
    response = s.get('https://api.trove.nla.gov.au/v2/result', params=these_params)
    data = response.json()
    return int(data['response']['zone'][0]['records']['total'])


def get_fulltext_url(links):
    '''
    Loop through the identifiers to find a link to the digital version of the journal.
    '''
    nla_digitised = False
    for link in links:
        if link['linktype'] == 'fulltext' and 'nla.obj' in link['value']:
            url = link['value']
            if link['linktext'] == 'National Library of Australia digitised item':
                nla_digitised = True
            return url, nla_digitised


def get_titles():
    '''
    Harvest metadata about digitised journals.
    With a little adaptation, this basic pattern could be used to harvest
    other types of works from Trove.
    '''
    url = 'https://api.trove.nla.gov.au/v2/result'
    titles = []
    params = {
        # We can 'NOT' the format facet in the query
        'q': '"nla.obj-" NOT format:"Government publication" NOT format:Article',
        'zone': 'article',
        'l-format': 'Periodical', # Journals only
        'include': 'links',
        'bulkHarvest': 'true', # Needed to maintain a consistent order across requests
        'key': api_key,
        'n': 100,
        'encoding': 'json'
    }
    start = '*'
    total = get_total_results(params)
    with tqdm(total=total) as pbar:
        while start:
            params['s'] = start
            response = s.get(url, params=params)
            data = response.json()
            # If there's a nextStart value then we use it to request the next page of results
            try:
                start = data['response']['zone'][0]['records']['nextStart']
            except KeyError:
                start = None
            for work in data['response']['zone'][0]['records']['work']:
                # Check to see if there's a link to a digital version
                try:
                    fulltext_url, nla_digitised = get_fulltext_url(work['identifier'])
                except (KeyError, TypeError):
                    pass
                else:
                    if fulltext_url:
                        trove_id = re.search(r'(nla\.obj\-\d+)', fulltext_url).group(1)
                        # Get basic metadata
                        # You could add more work data here
                        # Check the Trove API docs for work record structure
                        title = {
                            'title': work['title'],
                            'fulltext_url': fulltext_url, 
                            'trove_url': work['troveUrl'],
                            'trove_id': trove_id,
                            'nla_digitised': nla_digitised
                        }
                        titles.append(title)
            time.sleep(0.2)
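            # Update by the number of records actually returned (the last page may have fewer than 100)
            pbar.update(len(data['response']['zone'][0]['records']['work']))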
    return titles
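
To see what the link check and the identifier extraction do, here is a minimal sketch run against a hand-made list of links. The function pick_fulltext_link is a hypothetical variant of get_fulltext_url() that returns (None, False) explicitly when nothing matches. The sample list is an assumption about the shape of a record's identifiers, based only on the keys the code above reads.

```python
import re


def pick_fulltext_link(links):
    '''
    Hypothetical variant of get_fulltext_url(): return the first fulltext
    link containing nla.obj, plus a flag for NLA digitised items.
    '''
    for link in links:
        if link.get('linktype') == 'fulltext' and 'nla.obj' in link.get('value', ''):
            is_nla = link.get('linktext') == 'National Library of Australia digitised item'
            return link['value'], is_nla
    return None, False


# Hand-made sample, not a real API response
sample_links = [
    {'linktype': 'thumbnail', 'value': 'https://example.org/thumb.jpg'},
    {
        'linktype': 'fulltext',
        'value': 'http://nla.gov.au/nla.obj-389050007',
        'linktext': 'National Library of Australia digitised item',
    },
]

sample_url, sample_flag = pick_fulltext_link(sample_links)
# Same pattern as get_titles(): pull the nla.obj identifier out of the url
sample_id = re.search(r'(nla\.obj\-\d+)', sample_url).group(1)
print(sample_url, sample_flag, sample_id)
```

Returning an explicit (None, False) means the caller can test the url directly rather than catching a TypeError when no link is found.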

Run the harvest

In [ ]:
titles = get_titles()

Convert to a dataframe and save as a CSV file

Let's convert the Python list to a Pandas DataFrame, have a peek inside, then save in CSV format.

In [37]:
df = pd.DataFrame(titles)
df.head()
Out[37]:
   title                                              fulltext_url                               trove_url                               trove_id            nla_digitised
0  The Silver stream songster                         https://nla.gov.au/nla.obj-614066685       https://trove.nla.gov.au/work/10087062  nla.obj-614066685   False
1  Stonequarry journal (Online)                       https://nla.gov.au/nla.obj-862209995       https://trove.nla.gov.au/work/10106079  nla.obj-862209995   False
2  Philament (Sydney, N.S.W. : Online)                https://nla.gov.au/nla.obj-749489295       https://trove.nla.gov.au/work/10287808  nla.obj-749489295   False
3  Journal (Queensland Law Society)                   https://www.nla.gov.au/nla.obj-2735787548  https://trove.nla.gov.au/work/10321820  nla.obj-2735787548  False
4  The Order of service for the annual festival t...  http://nla.gov.au/nla.obj-657473276        https://trove.nla.gov.au/work/10388163  nla.obj-657473276   True
In [38]:
# How many journals are there?
df.shape
Out[38]:
(2730, 5)

For some reason there are a number of duplicates in the list, where multiple Trove work records point to the same digitised journal. We can display the duplicates like this.

In [41]:
# Show all the rows
pd.set_option('display.max_rows', None)
# Show dupes
df.loc[df.duplicated(subset=['trove_id'], keep=False)].sort_values(by=['trove_id', 'nla_digitised'])
Out[41]:
title fulltext_url trove_url trove_id nla_digitised
1594 Wings (Sydney, N.S.W. : Online) https://nla.gov.au/nla.obj-1226109179 https://trove.nla.gov.au/work/236307958 nla.obj-1226109179 False
2379 Wings (Sydney, N.S.W.) https://nla.gov.au/nla.obj-1226109179 https://trove.nla.gov.au/work/30060307 nla.obj-1226109179 False
672 [Event programme] / Australian Festival of Cha... https://nla.gov.au/nla.obj-1252107366 https://trove.nla.gov.au/work/205602387 nla.obj-1252107366 False
1942 [Event programme] / Australian Festival of Cha... https://nla.gov.au/nla.obj-1252107366 https://trove.nla.gov.au/work/237613201 nla.obj-1252107366 False
632 The Brisbane Bushwalker https://nla.gov.au/nla.obj-1252263267 https://trove.nla.gov.au/work/200641459 nla.obj-1252263267 False
2314 The Brisbane bushwalker : monthly magazine of ... https://nla.gov.au/nla.obj-1252263267 https://trove.nla.gov.au/work/24167477 nla.obj-1252263267 False
1840 The Shadowland newsletter https://nla.gov.au/nla.obj-1771610885 https://trove.nla.gov.au/work/237293619 nla.obj-1771610885 False
2692 The Atlas of the solar system / Patrick Moore ... https://nla.gov.au/nla.obj-1771610885 https://trove.nla.gov.au/work/7113005 nla.obj-1771610885 False
1996 Photographic review of reviews (Online) http://nla.gov.au/nla.obj-389050007 https://trove.nla.gov.au/work/238058947 nla.obj-389050007 False
2471 Photographic review of reviews http://nla.gov.au/nla.obj-389050007 https://trove.nla.gov.au/work/33565755 nla.obj-389050007 True
1963 The Australian woman's mirror (Online) http://nla.gov.au/nla.obj-389050376 https://trove.nla.gov.au/work/237942979 nla.obj-389050376 False
2482 The Australian woman's mirror http://nla.gov.au/nla.obj-389050376 https://trove.nla.gov.au/work/33657268 nla.obj-389050376 True
352 U3A Sunshine e-Voice https://nla.gov.au/nla.obj-483060448 https://trove.nla.gov.au/work/185170546 nla.obj-483060448 False
1127 U3A Sunshine e-Voice (Online) https://nla.gov.au/nla.obj-483060448 https://trove.nla.gov.au/work/227406722 nla.obj-483060448 False
243 Newsletter (Genealogical Society of Queensland... https://nla.gov.au/nla.obj-485357469 https://trove.nla.gov.au/work/17504125 nla.obj-485357469 False
1131 Bremer echoes (Online) https://nla.gov.au/nla.obj-485357469 https://trove.nla.gov.au/work/227470178 nla.obj-485357469 False
1965 The New South Wales Post Office directory (Onl... http://nla.gov.au/nla.obj-518308191 https://trove.nla.gov.au/work/237945398 nla.obj-518308191 False
1469 The New South Wales Post Office directory http://nla.gov.au/nla.obj-518308191 https://trove.nla.gov.au/work/23458620 nla.obj-518308191 True
1964 Everyones (Sydney, N.S.W. : Online) http://nla.gov.au/nla.obj-522690001 https://trove.nla.gov.au/work/237943749 nla.obj-522690001 False
2635 Everyones http://nla.gov.au/nla.obj-522690001 https://trove.nla.gov.au/work/5550156 nla.obj-522690001 True
1971 Month (Sydney, N.S.W. : Online) http://nla.gov.au/nla.obj-597762006 https://trove.nla.gov.au/work/238004028 nla.obj-597762006 False
2612 Month (Sydney, N.S.W.) http://nla.gov.au/nla.obj-597762006 https://trove.nla.gov.au/work/5525268 nla.obj-597762006 True
1967 South-Asian register (Online) http://nla.gov.au/nla.obj-597769314 https://trove.nla.gov.au/work/237956276 nla.obj-597769314 False
2613 The South-Asian register http://nla.gov.au/nla.obj-597769314 https://trove.nla.gov.au/work/5525462 nla.obj-597769314 True
1969 Tegg's monthly magazine (Online) http://nla.gov.au/nla.obj-598267619 https://trove.nla.gov.au/work/237995271 nla.obj-598267619 False
805 Tegg's monthly magazine http://nla.gov.au/nla.obj-598267619 https://trove.nla.gov.au/work/21508050 nla.obj-598267619 True
1962 Rugby League news (Sydney, N.S.W. : Online) http://nla.gov.au/nla.obj-598579045 https://trove.nla.gov.au/work/237942681 nla.obj-598579045 False
2653 Rugby League news (Sydney, N.S.W.) http://nla.gov.au/nla.obj-598579045 https://trove.nla.gov.au/work/6193573 nla.obj-598579045 True
1972 Bookfellow (Sydney, N.S.W. : 1899 : Online) http://nla.gov.au/nla.obj-636005630 https://trove.nla.gov.au/work/238004103 nla.obj-636005630 False
823 Bookfellow (Sydney, N.S.W. : 1899) http://nla.gov.au/nla.obj-636005630 https://trove.nla.gov.au/work/21701137 nla.obj-636005630 True
1974 Australian magazine (Sydney, N.S.W. : 1899 : O... http://nla.gov.au/nla.obj-636091247 https://trove.nla.gov.au/work/238008662 nla.obj-636091247 False
386 Australian magazine (Sydney, N.S.W. : 1899) http://nla.gov.au/nla.obj-636091247 https://trove.nla.gov.au/work/18685300 nla.obj-636091247 True
324 Bulletin (Sydney, N.S.W. : 1880) https://nla.gov.au/nla.obj-68375465 https://trove.nla.gov.au/work/181758222 nla.obj-68375465 False
39 The bulletin https://nla.gov.au/nla.obj-68375465 https://trove.nla.gov.au/work/11500235 nla.obj-68375465 True
1966 Dun's gazette for New South Wales (Online) http://nla.gov.au/nla.obj-724008889 https://trove.nla.gov.au/work/237946453 nla.obj-724008889 False
393 Dun's gazette for New South Wales http://nla.gov.au/nla.obj-724008889 https://trove.nla.gov.au/work/18731562 nla.obj-724008889 True
1995 Weldon's matrimonial gazette (Online) http://nla.gov.au/nla.obj-744869630 https://trove.nla.gov.au/work/238056012 nla.obj-744869630 False
549 Weldon's matrimonial gazette http://nla.gov.au/nla.obj-744869630 https://trove.nla.gov.au/work/19323982 nla.obj-744869630 True
1986 New South Wales school magazine of literature ... http://nla.gov.au/nla.obj-748141557 https://trove.nla.gov.au/work/238053149 nla.obj-748141557 False
9 The New South Wales school magazine of literat... http://nla.gov.au/nla.obj-748141557 https://trove.nla.gov.au/work/10753694 nla.obj-748141557 True
1979 Australian magazine (Sydney, N.S.W. : 1838 : O... http://nla.gov.au/nla.obj-752101760 https://trove.nla.gov.au/work/238034966 nla.obj-752101760 False
343 Australian magazine (Sydney, N.S.W. : 1838) http://nla.gov.au/nla.obj-752101760 https://trove.nla.gov.au/work/18429775 nla.obj-752101760 True
1984 New South Wales magazine (1833 : Online) http://nla.gov.au/nla.obj-753076802 https://trove.nla.gov.au/work/238053011 nla.obj-753076802 False
388 New South Wales magazine (1833) http://nla.gov.au/nla.obj-753076802 https://trove.nla.gov.au/work/18691103 nla.obj-753076802 True
1994 Sydney coronal (Online) http://nla.gov.au/nla.obj-753479079 https://trove.nla.gov.au/work/238055834 nla.obj-753479079 False
35 The Sydney coronal / by Charles M'Donald http://nla.gov.au/nla.obj-753479079 https://trove.nla.gov.au/work/11415630 nla.obj-753479079 True
1978 Tegg's New South Wales pocket almanac and reme... http://nla.gov.au/nla.obj-754081281 https://trove.nla.gov.au/work/238033317 nla.obj-754081281 False
2615 Tegg's New South Wales pocket almanac and reme... http://nla.gov.au/nla.obj-754081281 https://trove.nla.gov.au/work/5527029 nla.obj-754081281 True
1977 Liberty (Sydney, N.S.W. : Online) http://nla.gov.au/nla.obj-760289107 https://trove.nla.gov.au/work/238033248 nla.obj-760289107 False
2665 Liberty (Sydney, N.S.W.) http://nla.gov.au/nla.obj-760289107 https://trove.nla.gov.au/work/6399562 nla.obj-760289107 True
1973 Sydney once a week magazine (Online) http://nla.gov.au/nla.obj-760335335 https://trove.nla.gov.au/work/238006103 nla.obj-760335335 False
115 The Sydney once a week magazine http://nla.gov.au/nla.obj-760335335 https://trove.nla.gov.au/work/13572556 nla.obj-760335335 True
1970 Literary news (Online) http://nla.gov.au/nla.obj-765536757 https://trove.nla.gov.au/work/238002346 nla.obj-765536757 False
384 The Literary news : a review and magazine of f... http://nla.gov.au/nla.obj-765536757 https://trove.nla.gov.au/work/18674223 nla.obj-765536757 True
1982 Australia and the bookfellow (Online) http://nla.gov.au/nla.obj-768289925 https://trove.nla.gov.au/work/238042370 nla.obj-768289925 False
935 Australia and the bookfellow http://nla.gov.au/nla.obj-768289925 https://trove.nla.gov.au/work/22318978 nla.obj-768289925 True
1981 Australia (Sydney, N.S.W. : 1907 : Online) http://nla.gov.au/nla.obj-768329137 https://trove.nla.gov.au/work/238042365 nla.obj-768329137 False
936 Australia (Sydney, N.S.W. : 1907) http://nla.gov.au/nla.obj-768329137 https://trove.nla.gov.au/work/22319382 nla.obj-768329137 True
1983 Bookfellow (Sydney, N.S.W. : 1911 : Online) http://nla.gov.au/nla.obj-768936943 https://trove.nla.gov.au/work/238051629 nla.obj-768936943 False
937 Bookfellow (Sydney, N.S.W. : 1911) http://nla.gov.au/nla.obj-768936943 https://trove.nla.gov.au/work/22319620 nla.obj-768936943 True
1985 New Triad (Online) http://nla.gov.au/nla.obj-788254980 https://trove.nla.gov.au/work/238053099 nla.obj-788254980 False
2639 The New Triad http://nla.gov.au/nla.obj-788254980 https://trove.nla.gov.au/work/5552221 nla.obj-788254980 True
1975 Triad (Sydney, N.S.W. : Online) https://nla.gov.au/nla.obj-875780662 https://trove.nla.gov.au/work/238009872 nla.obj-875780662 False
2357 Triad (Sydney, N.S.W.) https://nla.gov.au/nla.obj-875780662 https://trove.nla.gov.au/work/27592184 nla.obj-875780662 True
In [43]:
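# How many unique journals are there once duplicates are dropped?
# Sorting puts nla_digitised=True last within each trove_id, so keep='last' prefers that record
df.sort_values(by=['trove_id', 'nla_digitised']).drop_duplicates(subset='trove_id', keep='last').shape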
Out[43]:
(2698, 5)
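
The sort-then-drop step is easier to follow on a handful of rows. This is a minimal sketch using a small hand-built DataFrame with values copied from the tables above (only three of the five columns, to keep it short). The names sample and deduped are just for illustration.

```python
import pandas as pd

sample = pd.DataFrame([
    {'title': 'The Silver stream songster', 'trove_id': 'nla.obj-614066685', 'nla_digitised': False},
    {'title': 'Photographic review of reviews (Online)', 'trove_id': 'nla.obj-389050007', 'nla_digitised': False},
    {'title': 'Photographic review of reviews', 'trove_id': 'nla.obj-389050007', 'nla_digitised': True},
    {'title': 'Wings (Sydney, N.S.W. : Online)', 'trove_id': 'nla.obj-1226109179', 'nla_digitised': False},
    {'title': 'Wings (Sydney, N.S.W.)', 'trove_id': 'nla.obj-1226109179', 'nla_digitised': False},
])

# False sorts before True, so within each trove_id any NLA digitised row comes last
deduped = (
    sample.sort_values(by=['trove_id', 'nla_digitised'])
    .drop_duplicates(subset='trove_id', keep='last')
)
print(deduped)
```

Where neither record in a pair is flagged as NLA digitised, as with Wings, the row that survives just depends on the sort order, so it's worth checking those pairs by hand if the choice of title matters.
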
In [40]:
# Save as CSV and display a download link
df.to_csv('digital-journals.csv', index=False)
display(FileLink('digital-journals.csv'))

Created by Tim Sherratt for the GLAM Workbench.

Work on this notebook was supported by the Humanities, Arts and Social Sciences (HASS) Data Enhanced Virtual Lab.