This notebook converts Trove lists into a series of files that can be uploaded to a CollectionBuilder-GH repository to create an instant exhibition. See the CollectionBuilder site for more information on how CollectionBuilder works and what it can do.
Demo: this exhibition was generated from this Trove list.
GitHub builds your exhibition from the files in the repository using GitHub Pages. You need to enable this after you create your repository:
GitHub will now build your exhibition. Once it's ready you'll see a link on the 'Pages' page. The URL will have the form https://[your GH user name].github.io/[your repository name]. At the moment the exhibition will contain dummy data; the next step is to generate your own exhibition data!
For example, the list at https://trove.nla.gov.au/list/83774 has an id of 83774.
The metadata describing the items in your exhibition is contained in the _data/[list id]-items.csv file. If the items in your exhibition relate to specific places, you may want to add some extra metadata so that CollectionBuilder can display them on a map.
Information about places is contained in three columns: location, latitude, and longitude. In the location field you can include a list of place names separated by semicolons, e.g. 'Melbourne; Sydney; Hobart'. These place names will be used to build a word cloud when you click on the Location tab in your exhibition.
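CollectionBuilder builds the word cloud itself. The sketch below only approximates how a semicolon-separated location field breaks into individual terms. It can help you spot inconsistent spellings (e.g. 'Sydney' vs 'Sydney NSW') before you upload. `count_places` is a hypothetical helper, not part of this notebook or of CollectionBuilder.

```python
from collections import Counter


def count_places(location_values):
    """Count place names across items, where each item's location field holds
    semicolon-separated names such as 'Melbourne; Sydney; Hobart'."""
    counts = Counter()
    for value in location_values:
        for name in value.split(";"):
            name = name.strip()
            if name:  # skip blanks from empty fields or trailing semicolons
                counts[name] += 1
    return counts


print(count_places(["Melbourne; Sydney; Hobart", "Sydney", ""]))
```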
To add an item to CollectionBuilder's map view, you need to supply values for latitude and longitude.
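If you'd rather script these edits than type them into a spreadsheet, here is a minimal sketch using Python's standard csv module. It assumes the CSV has the objectid, location, latitude and longitude columns this notebook writes. `set_place` and the objectid `work-12345` are hypothetical.

```python
import csv
from pathlib import Path


def set_place(csv_path, objectid, location, latitude, longitude):
    """Fill in the location, latitude and longitude columns for one item."""
    path = Path(csv_path)
    with path.open(newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames  # keep the original column order
        rows = list(reader)
    found = False
    for row in rows:
        if row["objectid"] == objectid:
            row["location"] = location
            row["latitude"] = str(latitude)
            row["longitude"] = str(longitude)
            found = True
    if not found:
        raise KeyError(f"No item with objectid {objectid!r}")
    # Rewrite the whole file with the updated rows
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

For example, `set_place("_data/83777-items.csv", "work-12345", "Hobart", -42.8821, 147.3272)` would pin a hypothetical item to Hobart.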
You might also want to edit the subject and description fields.
Note that GitHub has its own built-in file editor. So if you don't have a way of editing the CSV file on your own computer, just skip down to the 'Upload your files...' section below and add the files to your GitHub repository. To edit a file, view it in GitHub and click on the pencil icon. Once you've finished editing, make sure you click the Commit button to save your changes.
Trove work records often only include links to tiny thumbnail versions of images. These don't look great in an exhibition, so you might want to replace them. Different collections use different image viewers, so there's no easy, automated way to do this. You'll have to download higher-resolution versions manually and use them to replace the thumbnails.
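To work out which images are worth replacing, you could list the work images below a minimum pixel width. This is a sketch assuming Pillow (already used by this notebook) and the `work-[id].jpg` filenames the notebook writes to objects. The helper name `find_small_images` and the 500-pixel threshold are arbitrary assumptions. Article images are skipped because the notebook already crops them to 800×800.

```python
from pathlib import Path

from PIL import Image


def find_small_images(objects_dir, min_width=500):
    """Return (filename, width, height) for work images narrower than min_width pixels."""
    small = []
    for path in sorted(Path(objects_dir).glob("work-*.jpg")):
        with Image.open(path) as img:  # only the header is read to get the size
            width, height = img.size
        if width < min_width:
            small.append((path.name, width, height))
    return small
```

Running `find_small_images("cb-exhibitions/83777/objects")` after harvesting would give you a to-do list of thumbnails.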
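Find the thumbnail you want to replace in the objects directory.
Replace that file in the objects directory with the new downloaded version, giving it the same filename.
You're now ready to add your exhibition files to the exhibition repository!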
Upload the _config.yml file in the exhibition files you downloaded from this notebook.
Open the _data directory in your GitHub repository.
Upload the _data/[list id]-items.csv file in your exhibition files.
Open the objects directory in your GitHub repository.
Upload the files in the objects directory of your exhibition files.
Once you've uploaded the files, GitHub will rebuild the exhibition using your data. It might take a little while to generate, but once it's ready you'll see it at https://[your GH user name].github.io/[your repository name].
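If you work with a local clone of your repository instead of GitHub's web interface, the same files can be copied across before you commit and push. This is a minimal sketch that assumes the folder layout this notebook creates (_config.yml, _data/[list id]-items.csv and objects/). `copy_into_repo` is a hypothetical helper. Existing files in the clone's objects directory are left in place.

```python
import shutil
from pathlib import Path


def copy_into_repo(exhibition_dir, repo_dir, list_id):
    """Copy generated exhibition files into a local clone of the repository."""
    exhibition_dir, repo_dir = Path(exhibition_dir), Path(repo_dir)
    # _config.yml replaces the repository's existing config at the top level
    shutil.copy2(exhibition_dir / "_config.yml", repo_dir / "_config.yml")
    # The items CSV goes into the repository's _data directory
    (repo_dir / "_data").mkdir(exist_ok=True)
    csv_name = f"{list_id}-items.csv"
    shutil.copy2(exhibition_dir / "_data" / csv_name, repo_dir / "_data" / csv_name)
    # Images go into objects/; dirs_exist_ok (Python 3.8+) merges into an existing folder
    shutil.copytree(exhibition_dir / "objects", repo_dir / "objects", dirs_exist_ok=True)
```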
If you're not happy with the metadata and how it displays, you can either edit the exhibition files on your own computer and re-upload them to GitHub, or use GitHub's built-in file editor to make changes. To edit a file, view it in GitHub and click on the pencil icon. Once you've finished editing, make sure you click the Commit button to save your changes.
Every time you make a change to your repository, GitHub will automatically rebuild your exhibition.
You can further customise the look and feel of your exhibition by editing the _data/theme.yml file. For example, you can:
Set the featured-image to display in the header of your exhibition.
Change the latitude and longitude values to set the centre of the map view.
See the CollectionBuilder documentation for more options.
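Round-tripping theme.yml through a YAML library would strip the explanatory comments in the file. This sketch instead rewrites a single top-level `key: value` line in place and leaves everything else untouched. It assumes the keys sit unindented at the start of a line (as featured-image, latitude and longitude appear to in the CollectionBuilder template). `set_theme_value` is a hypothetical helper, and `work-12345` is a made-up objectid.

```python
import re
from pathlib import Path


def set_theme_value(theme_path, key, value):
    """Replace the value of a top-level key in theme.yml, keeping comments intact."""
    path = Path(theme_path)
    text = path.read_text(encoding="utf-8")
    # Match the whole line that starts with "key:" (MULTILINE makes ^/$ work per line)
    pattern = re.compile(rf"^{re.escape(key)}:.*$", re.MULTILINE)
    # A function replacement avoids backslashes in value being treated as escapes
    new_text, count = pattern.subn(lambda m: f"{key}: {value}", text, count=1)
    if count == 0:
        raise KeyError(f"{key!r} not found in {path}")
    path.write_text(new_text, encoding="utf-8")
```

For example, `set_theme_value("_data/theme.yml", "latitude", -37.8136)` followed by the same call for `"longitude"` would recentre the map on Melbourne.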
You can add your own annotations to Trove list items and these will automatically be included in your exhibition. To add a descriptive note:
Your note will be added to the description field of the item when you generate your exhibition files. In addition, any tags added to items in your list will be added to the subject field.
Note that if you make changes to your list, you'll need to regenerate the exhibition files using this notebook and upload them to your GitHub repository before the changes are visible in your exhibition.
import os
import shutil
from pathlib import Path
import pandas as pd
import requests
import yaml
from IPython.display import HTML
from PIL import Image
from PIL.ImageOps import fit
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
from tqdm.auto import tqdm
from trove_newspaper_images.articles import download_images
s = requests.Session()
retries = Retry(total=5, backoff_factor=1, status_forcelist=[500, 502, 503, 504])
s.mount("http://", HTTPAdapter(max_retries=retries))
s.mount("https://", HTTPAdapter(max_retries=retries))
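The session above retries failed requests with an increasing pause between attempts. The sketch below approximates those pauses using the formula given in urllib3's `Retry` documentation: no sleep before the first retry, then `backoff_factor * 2 ** (retries so far)`, capped at a default maximum of 120 seconds. It ignores any `Retry-After` header the server sends, which urllib3 may honour instead. `backoff_delays` is a hypothetical illustration, not part of urllib3.

```python
def backoff_delays(total=5, backoff_factor=1, backoff_max=120):
    """Approximate sleep (seconds) before each retry under urllib3-style backoff."""
    delays = []
    for consecutive_errors in range(1, total + 1):
        if consecutive_errors <= 1:
            delays.append(0.0)  # the first retry happens immediately
        else:
            delays.append(float(min(backoff_max, backoff_factor * 2 ** (consecutive_errors - 1))))
    return delays


print(backoff_delays())  # [0.0, 2.0, 4.0, 8.0, 16.0]
```

With `total=5` and `backoff_factor=1`, a request that keeps failing with a 5xx status would wait roughly 30 seconds in total before giving up.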
%%capture
# Load variables from the .env file if it exists
# Use %%capture to suppress messages
%load_ext dotenv
%dotenv
This is the only section that you'll need to edit. Paste your API key and list id in the cells below as indicated. Once you've finished, select Run all cells from the Run menu to generate your exhibition files.
# Insert your Trove API key between the quotes
API_KEY = "YOUR API KEY"
# Use api key value from environment variables if it is available
if os.getenv("TROVE_API_KEY"):
    API_KEY = os.getenv("TROVE_API_KEY")
# Paste your list id between the quotes
list_id = "83777"
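If you've copied the list's URL rather than its id, a small helper could extract the id for you. This sketch handles URLs of the form https://trove.nla.gov.au/list/83774 shown earlier. The `?id=` form is an assumption about older Trove URLs. `list_id_from_url` is hypothetical and not used elsewhere in this notebook.

```python
import re


def list_id_from_url(url_or_id):
    """Return the numeric list id from a Trove list URL, or pass a bare id straight through."""
    value = str(url_or_id).strip()
    if value.isdigit():
        return value
    # Accept .../list/83774 and (assumed) .../list?id=83774
    match = re.search(r"/list(?:/|\?id=)(\d+)", value)
    if not match:
        raise ValueError(f"Couldn't find a list id in {value!r}")
    return match.group(1)
```

For example, `list_id = list_id_from_url("https://trove.nla.gov.au/list/83774")` sets `list_id` to `"83774"`.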
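def listify(value):
    """
    Sometimes values can be lists and sometimes not.
    Turn them all into lists to make life easier.
    """
    if isinstance(value, (str, int)):
        # Wrap single values, converting numbers to strings
        value = [str(value)]
    elif isinstance(value, dict):
        # A single structured value (e.g. one isPartOf entry) also becomes a list
        value = [value]
    return value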
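def get_url(identifiers, linktype):
    """
    Loop through the identifiers to find the url with the given linktype.
    """
    # Identifiers may be missing (""), a single dict, or a list of dicts
    if isinstance(identifiers, dict):
        identifiers = [identifiers]
    for identifier in identifiers or []:
        if isinstance(identifier, dict) and identifier.get("linktype") == linktype:
            return identifier.get("value", "")
    return ""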
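def save_as_csv(list_dir, data, data_type):
    df = pd.DataFrame(data)
    # Only newspaper articles have page numbers, so the column may be missing or
    # partly empty; a nullable integer type keeps blanks instead of floats like 3.0
    if "pages" in df.columns:
        df["pages"] = pd.to_numeric(df["pages"], errors="coerce").astype("Int64")
    df.to_csv(Path(list_dir, "_data", f"{list_id}-{data_type}.csv"), index=False)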
def make_filename(article):
    """
    Create a filename for a text file or PDF.
    For easy sorting/aggregation the filename has the format:
    PUBLICATIONDATE-NEWSPAPERID-ARTICLEID
    """
    date = article["date"].replace("-", "")
    newspaper_id = article["newspaper_id"]
    article_id = article["id"]
    return f"{date}-{newspaper_id}-{article_id}"
def get_list(list_id):
    list_url = f"https://api.trove.nla.gov.au/v2/list/{list_id}?encoding=json&reclevel=full&include=listItems&key={API_KEY}"
    response = s.get(list_url)
    return response.json()


def get_article(id):
    article_api_url = f"https://api.trove.nla.gov.au/v2/newspaper/{id}/?encoding=json&reclevel=full&key={API_KEY}&include=tags"
    response = s.get(article_api_url)
    return response.json()


def get_work(id):
    work_api_url = f"https://api.trove.nla.gov.au/v2/work/{id}/?encoding=json&reclevel=full&key={API_KEY}&include=tags,links"
    response = s.get(work_api_url)
    return response.json()
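The request URLs above are assembled with f-strings. An alternative is to let `urllib.parse.urlencode` build the query string, so any special characters in the parameters are escaped for you. Passing a dict as `params=` to `requests` does the same job. This is a sketch: `trove_url`, `TROVE_API_BASE` and the work id 12345 are hypothetical, and the v2 base URL is simply the one used in the functions above.

```python
from urllib.parse import urlencode

TROVE_API_BASE = "https://api.trove.nla.gov.au/v2"


def trove_url(path, api_key, **params):
    """Build a Trove API v2 request URL, letting urlencode escape the parameters."""
    query = {"encoding": "json", "reclevel": "full", "key": api_key, **params}
    return f"{TROVE_API_BASE}/{path.strip('/')}?{urlencode(query)}"


print(trove_url("work/12345", "MY_KEY", include="tags,links"))
# https://api.trove.nla.gov.au/v2/work/12345?encoding=json&reclevel=full&key=MY_KEY&include=tags%2Clinks
```

The comma in `tags,links` is percent-encoded as `%2C`, which servers decode back to a comma.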
def make_dirs(list_id):
    list_dir = Path("cb-exhibitions", list_id)
    list_dir.mkdir(parents=True, exist_ok=True)
    Path(list_dir, "objects").mkdir(exist_ok=True)
    Path(list_dir, "temp").mkdir(exist_ok=True)
    Path(list_dir, "_data").mkdir(exist_ok=True)
    return list_dir
def get_subjects(work):
    # Start with any subjects in the work record, then add the list's tags
    subjects = listify(work["subject"]) if "subject" in work else []
    if "tag" in work:
        for tag in work["tag"]:
            subjects.append(tag["value"])
    return subjects
def get_work_image_url(record):
    # Prefer a full-size "viewcopy" link, falling back to the thumbnail
    image_url = get_url(record.get("identifier", ""), "viewcopy")
    if not image_url:
        image_url = get_url(record.get("identifier", ""), "thumbnail")
    return image_url
def save_work_image(list_dir, record):
    image_url = get_work_image_url(record)
    if image_url:
        response = s.get(image_url)
        if response.status_code == 200:
            filename = Path(list_dir, "objects", f"work-{record.get('id', '')}.jpg")
            filename.write_bytes(response.content)
            return filename
    return None
def get_article_tags(record):
    subjects = []
    article = get_article(record["id"])["article"]
    if "tag" in article:
        for tag in article["tag"]:
            subjects.append(tag["value"])
    return subjects
def get_parent(record):
    parent = ""
    parents = listify(record.get("isPartOf", []))
    if parents:
        if isinstance(parents[0], dict) and "value" in parents[0]:
            parent = parents[0]["value"]
        else:
            parent = parents[0]
    return parent
def update_config(list_data, list_dir):
    # Start from the template config and fill in details from the list
    with Path("cb-config", "_config.yml").open("r") as config_in:
        config = yaml.safe_load(config_in)
    config["title"] = list_data["list"][0]["title"]
    config["author"] = list_data["list"][0]["creator"].replace("public:", "")
    config["metadata"] = f'{list_data["list"][0]["id"]}-items'
    with Path(list_dir, "_config.yml").open("w") as config_out:
        config_out.write(yaml.dump(config))
def harvest_list(list_id):
    list_dir = make_dirs(list_id)
    data = get_list(list_id)
    update_config(data, list_dir)
    items = []
    for item in tqdm(data["list"][0]["listItem"]):
        for zone, record in item.items():
            if zone == "work":
                # Some fields aren't included in the list data, so get the full work record
                work_data = get_work(record["id"])["work"]
                work = {
                    "objectid": f"work-{record.get('id', '')}",
                    "title": record.get("title", ""),
                    "type": ";".join(listify(record.get("type", ""))),
                    # Default to "" so works without an issued date don't raise an IndexError
                    "date": listify(record.get("issued", ""))[0],
                    "creator": "; ".join(listify(record.get("contributor", ""))),
                    "is_part_of": get_parent(record),
                    "trove_url": record.get("troveUrl", ""),
                    "source_url": get_url(record.get("identifier", ""), "fulltext"),
                    "description": item.get("note", ""),
                    "subject": "; ".join(get_subjects(work_data)),
                    "location": "",
                    "latitude": "",
                    "longitude": "",
                }
                image_filename = save_work_image(list_dir, work_data)
                if image_filename:
                    work["filename"] = image_filename.name
                    work["format"] = "image/jpeg"
                items.append(work)
            elif zone == "article":
                newspaper_id = record.get("title", {}).get("id")
                newspaper_title = record.get("title", {}).get("value")
                newspaper_link = f'<a href="http://nla.gov.au/nla.news-title{newspaper_id}">{newspaper_title}</a>'
                article = {
                    "objectid": f"article-{record.get('id', '')}",
                    "title": record.get("heading", ""),
                    "date": record.get("date", ""),
                    "is_part_of": newspaper_link,
                    "pages": record.get("pageSequence", ""),
                    "trove_url": f'http://nla.gov.au/nla.news-article{record.get("id")}',
                    "type": "Newspaper article",
                    "format": "image/jpeg",
                    "description": item.get("note", ""),
                    "subject": "; ".join(get_article_tags(record)),
                    "location": "",
                    "latitude": "",
                    "longitude": "",
                }
                images = download_images(record["id"], Path(list_dir, "temp"))
                if images:
                    # Crop the first image to an 800x800 square anchored at the top,
                    # so the headline stays visible; convert to RGB so it saves as JPEG
                    img = Image.open(Path(list_dir, "temp", images[0]))
                    cropped = fit(
                        img.convert("RGB"),
                        (800, 800),
                        method=Image.Resampling.LANCZOS,
                        centering=(0.5, 0),
                    )
                    cropped.save(Path(list_dir, "objects", images[0]), "JPEG")
                    article["filename"] = images[0]
                items.append(article)
    shutil.rmtree(Path(list_dir, "temp"))
    if items:
        save_as_csv(list_dir, items, "items")
    return items
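Article images are cropped with Pillow's `ImageOps.fit` using `centering=(0.5, 0)`, which keeps the top of the article. The sketch below approximates the crop box `fit` chooses before resizing to 800×800, ignoring its `bleed` option and exact rounding. It shows why a tall article keeps its headline and loses its lower columns. `top_anchored_crop_box` is a hypothetical illustration, not Pillow's code.

```python
def top_anchored_crop_box(width, height, target=(800, 800)):
    """Approximate the (left, upper, right, lower) box that fills target's aspect
    ratio, centred horizontally and anchored to the top, like centering=(0.5, 0)."""
    target_ratio = target[0] / target[1]
    if width / height > target_ratio:
        # Too wide: keep the full height and trim the sides equally
        crop_w, crop_h = round(height * target_ratio), height
    else:
        # Too tall (the usual case for articles): keep the full width, trim the bottom
        crop_w, crop_h = width, round(width / target_ratio)
    left = round((width - crop_w) * 0.5)
    upper = 0  # vertical centering of 0 means the crop starts at the top edge
    return (left, upper, left + crop_w, upper + crop_h)
```

For a 1000×3000 article image this keeps the box (0, 0, 1000, 1000), i.e. the top third, which is then scaled down to 800×800.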
Run the cell below to start the exhibition building process.
items = harvest_list(list_id)
Run the cell below to zip up all the harvested files and create a download link.
list_dir = Path("cb-exhibitions", list_id)
shutil.make_archive(list_dir, "zip", list_dir)
HTML(f'<a download="{list_id}.zip" href="{list_dir}.zip">Download your files</a>')
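Before uploading, you might want to confirm that the zip contains everything the upload steps need. This sketch uses the standard zipfile module and assumes the layout this notebook creates: `_config.yml` at the top level, `_data/[list id]-items.csv`, and at least one image under `objects/`. `check_exhibition_zip` is a hypothetical helper.

```python
import zipfile


def check_exhibition_zip(zip_path, list_id):
    """Return a list of expected exhibition files missing from the zip (empty if all present)."""
    with zipfile.ZipFile(zip_path) as zf:
        names = set(zf.namelist())
    expected = ("_config.yml", f"_data/{list_id}-items.csv")
    missing = [name for name in expected if name not in names]
    # Directory entries end with "/", so look for at least one actual file in objects/
    if not any(name.startswith("objects/") and not name.endswith("/") for name in names):
        missing.append("objects/ (no images)")
    return missing
```

For example, `check_exhibition_zip(f"{list_dir}.zip", list_id)` should return an empty list if the harvest worked.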
Created by Tim Sherratt for the GLAM Workbench.