Beyond the copyright cliff of death¶

Most of the newspaper articles on Trove were published before 1955, but some are more recent. Let's find out how many there are, and which newspapers they were published in.

In [1]:

import os

import pandas as pd
import requests
from IPython.display import FileLink, display

In [2]:

%%capture
# Load variables from the .env file if it exists
# Use %%capture to suppress messages
%load_ext dotenv
%dotenv

In [3]:

# Insert your Trove API key
API_KEY = "YOUR API KEY"

# Use api key value from environment variables if it is available
if os.getenv("TROVE_API_KEY"):
    API_KEY = os.getenv("TROVE_API_KEY")

Search for articles published after 1954¶

First we're going to run a date query to find all the articles published after 1954. But instead of looking at the articles themselves, we're going to get the title facet – this will tell us the number of articles for each newspaper.

In [4]:

params = {
    "q": "date:[1955 TO *]",  # date range query
    "zone": "newspaper",
    "facet": "title",  # get the newspaper facets
    "encoding": "json",
    "n": 0,  # no articles thanks
    "key": API_KEY,
}
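The same query can be pointed at a different cut-off year. The following is a minimal sketch only: `build_facet_params` is a hypothetical helper, not part of the notebook or the Trove API, and it simply reproduces the parameter dictionary above with the date range made adjustable.

```python
def build_facet_params(start_year, api_key, end_year=None):
    """Build query parameters for a title-facet count over a date range.

    Hypothetical helper: it mirrors the params dict used in this notebook,
    but lets the start (and optionally end) year vary.
    """
    # An open-ended range uses "*" as the upper bound, as in the notebook
    end = "*" if end_year is None else str(end_year)
    return {
        "q": f"date:[{start_year} TO {end}]",  # date range query
        "zone": "newspaper",
        "facet": "title",  # get the newspaper facets
        "encoding": "json",
        "n": 0,  # no articles, just the facet counts
        "key": api_key,
    }


params_1955 = build_facet_params(1955, "YOUR API KEY")
print(params_1955["q"])  # date:[1955 TO *]
```

Because the range is inclusive, starting at 1955 is what selects articles published after 1954.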

In [5]:

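# Make our API request
response = requests.get("https://api.trove.nla.gov.au/v2/result", params=params)
# Stop here with a clear error if the request failed (e.g. an invalid API key)
response.raise_for_status()
data = response.json()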

In [6]:

# Get the facet data
facets = data["response"]["zone"][0]["facets"]["facet"]["term"]
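The chained lookup above raises a `KeyError` if the response has no facets, for example when a query returns no results. A more defensive version might look like the sketch below. The function name `extract_title_counts` and the sample response are illustrative assumptions; the sample only imitates the nesting used in the cell above, and the handling of a list of facets is an assumption about how the API responds when more than one facet is requested.

```python
def extract_title_counts(data):
    """Return {title_id: article_count} from a facet response.

    Hypothetical helper: assumes the nesting used in this notebook,
    response -> zone[0] -> facets -> facet -> term.
    """
    zones = data.get("response", {}).get("zone", [])
    if not zones:
        return {}
    facet = zones[0].get("facets", {}).get("facet", {})
    # Assumption: if several facets were requested, 'facet' may be a list
    # of facet objects rather than a single dict, so pick the title facet.
    if isinstance(facet, list):
        facet = next((f for f in facet if f.get("name") == "title"), {})
    terms = facet.get("term", [])
    # 'display' holds the newspaper id and 'count' the number of articles
    return {str(term["display"]): int(term["count"]) for term in terms}


# A cut-down, hand-written response for illustration only
sample_response = {
    "response": {
        "zone": [
            {
                "name": "newspaper",
                "facets": {
                    "facet": {
                        "name": "title",
                        "term": [
                            {"display": "11", "count": "2567488"},
                            {"display": "1685", "count": "573658"},
                        ],
                    }
                },
            }
        ]
    }
}

counts = extract_title_counts(sample_response)
print(counts)
```

An empty or unexpected response yields an empty dictionary instead of an exception, which makes it easier to tell "no articles found" apart from a coding error.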

In [7]:

# Convert to a dataframe, keeping only the columns we need
# (.copy() avoids a pandas SettingWithCopyWarning when we change a column below)
df_articles = pd.DataFrame(facets)[["count", "display"]].copy()
# Rename columns
df_articles.columns = ["number_of_articles", "id"]
# Change id to string, so we can merge on it later
df_articles["id"] = df_articles["id"].astype("str")
# Preview results
df_articles.head()

Out[7]:

	number_of_articles	id
0	2567488	11
1	573658	1685
2	417472	1376
3	263618	1694
4	225466	112

Match the facets with newspapers¶

As you can see from the data above, the title facet only gives us the identifier for a newspaper, not its title or date range. To get more information about each newspaper, we're going to get a list of newspapers from the Trove API and then merge the two datasets.

In [8]:

# Get ALL the newspapers
response = requests.get(
    "https://api.trove.nla.gov.au/v2/newspaper/titles",
    params={"encoding": "json", "key": API_KEY},
)
# Stop here with a clear error if the request failed
response.raise_for_status()
newspapers_data = response.json()

In [9]:

newspapers = newspapers_data["response"]["records"]["newspaper"]
# Convert to a dataframe
df_newspapers = pd.DataFrame(newspapers)

In [10]:

# Merge the two dataframes by doing a left join on the 'id' column
df_newspapers_post54 = pd.merge(df_articles, df_newspapers, how="left", on="id")
df_newspapers_post54.head()

Out[10]:

	number_of_articles	id	title	state	issn	troveUrl	startDate	endDate
0	2567488	11	The Canberra Times (ACT : 1926 - 1995)	ACT	01576925	https://trove.nla.gov.au/ndp/del/title/11	1926-09-03	1995-12-31
1	573658	1685	The Australian Jewish News (Melbourne, Vic. : ...	Victoria	NDP00187	https://trove.nla.gov.au/ndp/del/title/1685	1935-05-24	1999-12-24
2	417472	1376	Papua New Guinea Post-Courier (Port Moresby : ...	International	22087427	https://trove.nla.gov.au/ndp/del/title/1376	1969-06-30	1981-06-30
3	263618	1694	The Australian Jewish Times (Sydney, NSW : 195...	New South Wales	NDP00196	https://trove.nla.gov.au/ndp/del/title/1694	1953-10-16	1990-04-06
4	225466	112	The Australian Women's Weekly (1933 - 1982)	National	00050458	https://trove.nla.gov.au/ndp/del/title/112	1933-06-10	1982-12-15
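A left join keeps every row from the facet data, so an identifier with no match in the newspaper list ends up with empty title details rather than disappearing. One way to check for such rows is pandas' `indicator=True` option, sketched here on toy data. The first two rows come from the preview above; the `9999` row is invented purely to show an unmatched identifier.

```python
import pandas as pd

# Toy facet counts: the last row is a made-up id with no matching newspaper
articles = pd.DataFrame(
    {
        "number_of_articles": [2567488, 573658, 100],
        "id": ["11", "1685", "9999"],
    }
)

# Toy newspaper list, with ids stored as strings so the merge keys match
titles = pd.DataFrame(
    {
        "id": ["11", "1685"],
        "state": ["ACT", "Victoria"],
    }
)

# indicator=True adds a '_merge' column: 'both' or 'left_only' for a left join
merged = pd.merge(articles, titles, how="left", on="id", indicator=True)

# List the ids that appear in the facets but not in the newspaper list
unmatched = merged.loc[merged["_merge"] == "left_only", "id"].tolist()
print(unmatched)  # ['9999']
```

Running the same check on `df_articles` and `df_newspapers` would show whether any facet identifiers are missing from the list of titles.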

Results¶

In [11]:

# How many newspapers?
df_newspapers_post54.shape[0]

Out[11]:

In [12]:

# Reorder columns and save as CSV
df_newspapers_post54[
    [
        "title",
        "state",
        "id",
        "startDate",
        "endDate",
        "issn",
        "number_of_articles",
        "troveUrl",
    ]
].to_csv("newspapers_post_54.csv", index=False)
# Display a link for easy download
display(FileLink("newspapers_post_54.csv"))

newspapers_post_54.csv
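The merged data can also be summarised, for instance by totalling articles for each state. The sketch below uses only the five rows shown in the preview above, so the totals are illustrative and not results for the full dataset; `sample` and `by_state` are names chosen for this example.

```python
import pandas as pd

# The five rows from the merged preview (state and article count only)
sample = pd.DataFrame(
    {
        "state": ["ACT", "Victoria", "International", "New South Wales", "National"],
        "number_of_articles": [2567488, 573658, 417472, 263618, 225466],
    }
)

# Total the articles for each state, largest first
by_state = (
    sample.groupby("state")["number_of_articles"].sum().sort_values(ascending=False)
)
print(by_state)
```

Swapping `sample` for `df_newspapers_post54` would give the totals across every newspaper in the results.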

Created by Tim Sherratt for the GLAM Workbench.
