The Trove API includes the contributor endpoint for retrieving information about organisations whose metadata is aggregated into Trove. If you include the reclevel=full parameter, you can get details of all contributors with a single API request like this:
https://api.trove.nla.gov.au/v2/contributor?encoding=json&reclevel=full&key=[YOUR API KEY]
However, the data can be difficult to use because of its nested structure, with some organisations having several levels of subsidiaries. There's also some inconsistency in the way nested records are named. This notebook aims to work around these problems by converting the nested data into a single flat list of organisations.
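To make the flattening idea concrete, here is a minimal, self-contained sketch that runs on a made-up record rather than live API data. The ids, names, and counts in `sample` are invented for illustration, and `flatten` is a hypothetical helper, not the function this notebook defines; the field names (`children`, `contributor`, `totalholdings`) follow the ones used by the notebook's own code.

```python
# Hypothetical sample mimicking the nested shape of the contributor data.
# The ids, names and counts are invented for illustration only.
sample = {
    "id": "ORG",
    "name": "Example University.",
    "totalholdings": "10",
    "children": {
        "contributor": [
            {
                "id": "ORG:LIB",
                "name": "Library.",
                "totalholdings": "7",
                "children": {
                    "contributor": [
                        {
                            "id": "ORG:LIB:MAPS",
                            "name": "Map Collection.",
                            "totalholdings": "3",
                        }
                    ]
                },
            }
        ]
    },
}


def flatten(record, parent_id=None, parent_name=None):
    """Return a flat list of dicts for a record and all its descendants."""
    name = record["name"]
    # Prefix the parent's name unless the child's name already starts with it
    if parent_name and not name.startswith(parent_name):
        name = f"{parent_name} {name}"
    rows = [
        {
            "id": record["id"],
            "name": name,
            "parent": parent_id,
            "total_items": int(record["totalholdings"]),
        }
    ]
    # Recurse into any children, passing the combined name forward
    for child in record.get("children", {}).get("contributor", []):
        rows += flatten(child, record["id"], name)
    return rows


flat = flatten(sample)
for row in flat:
    print(row["id"], "|", row["name"], "|", row["parent"])
```

Each nested organisation becomes one row, with its ancestors' names prepended and its immediate parent's id recorded, so the hierarchy can still be recovered from the flat list.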
This code is used to make weekly harvests of the contributor data, which are saved in this repository.
import datetime
import json
import os
from pathlib import Path
import pandas as pd
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
# Create a session that will automatically retry on server errors
s = requests.Session()
retries = Retry(total=5, backoff_factor=1, status_forcelist=[502, 503, 504])
s.mount("http://", HTTPAdapter(max_retries=retries))
s.mount("https://", HTTPAdapter(max_retries=retries))
%%capture
# Load variables from the .env file if it exists
# Use %%capture to suppress messages
%load_ext dotenv
%dotenv
# Insert your Trove API key
API_KEY = "YOUR API KEY"
if os.getenv("TROVE_API_KEY"):
    API_KEY = os.getenv("TROVE_API_KEY")
def get_contrib_details(record, parent=None):
    """
    Get the details of a contributor, recursing through children if present.
    """
    records = []
    # Get the basic details
    details = {
        "id": record["id"],
        "name": record["name"],
        "total_items": int(record["totalholdings"]),
        "parent": None,
    }
    # Add nuc if present
    if "nuc" in record:
        details["nuc"] = record["nuc"][0]
    else:
        details["nuc"] = None
    # If this is a child record, combine parent and child names
    if parent:
        if not record["name"].startswith(parent["name"]):
            details["name"] = f"{parent['name']} {record['name']}"
        # Add parent id
        details["parent"] = parent["id"]
    records = [details]
    if "children" in record:
        # Pass forward combined names for deeply nested orgs
        record["name"] = details["name"]
        records += get_children(record)
    return records
def get_children(parent):
    """
    Process child records.
    """
    children = []
    for child in parent["children"]["contributor"]:
        children += get_contrib_details(child, parent)
    return children
def get_contributors(save_json=True):
    """
    Get all Trove contributors, flattening the nested structure and optionally saving the original JSON.
    """
    contributors = []
    params = {"encoding": "json", "reclevel": "full", "key": API_KEY}
    response = s.get("https://api.trove.nla.gov.au/v2/contributor", params=params)
    data = response.json()
    # Save the original nested JSON response
    if save_json:
        Path(
            f"trove-contributors-{datetime.datetime.now().strftime('%Y%m%d')}.json"
        ).write_text(json.dumps(data))
    # Get details of each contributor
    for contrib in data["response"]["contributor"]:
        contributors += get_contrib_details(contrib)
    return contributors
contributors = get_contributors()
Convert the data to a dataframe.
df = pd.DataFrame(contributors)
df.head()
| | id | name | total_items | parent | nuc |
|---|---|---|---|---|---|
| 0 | VPWLH | 4th/19th Prince of Wales' Light Horse Regiment Unit. History Room. | 1570 | None | VPWLH |
| 1 | NBAL | Abbotsleigh. Betty Archdale Library. | 0 | None | NBAL |
| 2 | ADFA | Academy Library, UNSW Canberra. | 289474 | None | ADFA |
| 3 | ACT | ACT Legislative Assembly Library. | 11290 | None | ACT |
| 4 | SACC | Adelaide City Libraries. Adelaide City Libraries - City Library. | 79056 | None | SACC |
How many contributors are listed?
df.shape[0]
2693
How many of the contributor records include NUCs?
df.loc[df["nuc"].notnull()].shape[0]
1765
Save the data to a CSV file.
df[["id", "nuc", "name", "parent", "total_items"]].to_csv(
    f"trove-contributors-{datetime.datetime.now().strftime('%Y%m%d')}.csv", index=False
)
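Because each row in the CSV keeps its parent's id, the hierarchy can be rebuilt from the flat file when needed. The sketch below shows one possible way to do that using only the standard library; the rows in `sample_csv` are invented for illustration and simply follow the column order written above, on the assumption that records without a parent are saved with an empty `parent` cell.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows in the same column layout as the saved CSV.
# A missing parent is assumed to appear as an empty cell.
sample_csv = """id,nuc,name,parent,total_items
ORG,ORG,Example University.,,10
ORG:LIB,,Example University. Library.,ORG,7
ORG:LIB:MAPS,,Example University. Library. Map Collection.,ORG:LIB,3
OTHER,OTH,Another Library.,,5
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))

# Group child ids under their parent id
children = defaultdict(list)
for row in rows:
    if row["parent"]:
        children[row["parent"]].append(row["id"])

# Rows with an empty parent are top-level organisations
top_level = [row["id"] for row in rows if not row["parent"]]

print(top_level)
print(dict(children))
```

To try this against a real harvest, replace `io.StringIO(sample_csv)` with an open file handle for one of the saved CSV files.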
Created by Tim Sherratt for the GLAM Workbench. Support this project by becoming a GitHub sponsor.