How many fact sheets survived the NAA website migration in 2019?

In [2]:
import requests
from bs4 import BeautifulSoup

Get the most recent version of the fact sheet index from the Internet Archive

First we'll load the page.

In [3]:
# Note the 'id_' in the url to get the original page without the IA navigation.
response = requests.get('https://web.archive.org/web/20190716210347id_/http://www.naa.gov.au/collection/fact-sheets/by-number/index.aspx')
In [4]:
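# Name the parser explicitly so results don't depend on which parsers are installed.
soup = BeautifulSoup(response.content, 'html.parser')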
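
The `id_` flag in the request above sits directly after the 14-digit capture timestamp in the Wayback Machine URL. As a minimal sketch, the archived address can be assembled from its parts. The helper name `wayback_raw_url` is invented here for illustration and isn't part of any library.

```python
def wayback_raw_url(timestamp, original_url):
    """Build a Wayback Machine URL that returns the capture as archived.

    Hypothetical helper for illustration. Appending 'id_' to the timestamp
    asks for the original page without the Internet Archive's navigation.
    """
    return f'https://web.archive.org/web/{timestamp}id_/{original_url}'


index_url = wayback_raw_url(
    '20190716210347',
    'http://www.naa.gov.au/collection/fact-sheets/by-number/index.aspx',
)
print(index_url)
```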

Then we'll extract the rows from the index table.

In [5]:
fs_list = soup.find('table', title='Numerical list of fact sheets').find_all('tr')[1:]
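
To see what that selector does without hitting the network, here is a small sketch run against a made-up table. The HTML, titles and hrefs below are invented stand-ins, not real NAA markup; only the `title` attribute matches the one used above.

```python
from bs4 import BeautifulSoup

# A made-up, two-row stand-in for the archived index page.
sample_html = """
<table title="Numerical list of fact sheets">
  <tr><th>Number</th><th>Title</th></tr>
  <tr><td>1</td><td><a href="/example/sheet-one.html">First sheet</a></td></tr>
  <tr><td>2</td><td><a href="/example/sheet-two.html">Second sheet</a></td></tr>
</table>
"""

sample_soup = BeautifulSoup(sample_html, 'html.parser')
# Keyword arguments such as title= are matched against tag attributes.
table = sample_soup.find('table', title='Numerical list of fact sheets')
rows = table.find_all('tr')[1:]  # [1:] drops the header row

for row in rows:
    link = row.find('a')
    # row.td is the first cell in the row, which holds the number.
    print(row.td.text, link.text, link['href'])
```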

Look for the fact sheets

Let's loop through all the rows in the fact sheet index, extracting the fact sheet number, title and URL. Then we'll try loading each URL, saving the details and the HTTP status code for further exploration.
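
Building each address by gluing the link's `href` onto the domain works as long as every `href` starts with `/`. As a hedged alternative, the standard library's `urljoin` copes with other forms as well. The `hrefs` below are made-up examples, and `BASE_URL` is just an illustrative name.

```python
from urllib.parse import urljoin

BASE_URL = 'http://naa.gov.au'

# Made-up href values covering the common cases.
hrefs = [
    '/collection/fact-sheets/example.aspx',   # root-relative
    'example.aspx',                           # relative to the base
    'http://www.example.com/elsewhere.aspx',  # already absolute
]

for href in hrefs:
    print(urljoin(BASE_URL, href))
```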

In [ ]:
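fact_sheets = []
for row in fs_list:
    # The first cell in each row holds the fact sheet number
    num = row.td.text
    # The link supplies both the title and the path to the fact sheet
    fs = row.find('a')
    title = fs.text
    url = f'http://naa.gov.au{fs["href"]}'
    # Try loading the fact sheet from the live site and note the status code
    response = requests.get(url)
    status = response.status_code
    print(f'{title}: {status}')
    fact_sheets.append({'number': num, 'title': title, 'url': url, 'status': status})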
Reading room addresses and hours of opening: 200
Using our collection: 404
Addresses of Australian archival institutions: 404
Reading room rules: 404
What are archives?: 404
Archival terms: 200
The Commonwealth Record Series (CRS) system: 200
Citing archival records: 200
Copyright: 200
Searching for records: 404
Access to records under the Archives Act: 200
Viewing records in the reading room: 404
What to do if we refuse you access: 200
RecordSearch: an overview: 404
Keyword searching in RecordSearch Advanced search screens: 404
Release of records containing personal information: 200
Service guidelines for the National Reference Service: 404
NameSearch: 200
PhotoSearch: 404
Parliamentary Papers: 404
Commonwealth of Australia Gazettes: 404
Customs House, Sydney: 404
Coastal fortifications in New South Wales: 404
Commonwealth Film Unit: 404
The wine industry in South Australia: 404
Tasmanian railways: 404
Australia First Movement: 404
Commonwealth banking policy: 404
Navy service records: 404
Navy crew and ships records: 404
RAAF service records: 404
Security intelligence records held in Canberra: 404
Cabinet records: 404
Administration of the Australian Capital Territory: 404
Military records held in Hobart: 404
Maritime records held in Hobart: 404
Passenger records held in Canberra: 404
Civilian service in World War II: 404
Research agents – Canberra: 404
Research agents – Sydney: 404
Research agents – Brisbane: 404
Research agents – Adelaide and Darwin: 404
Research agents – Melbourne and Hobart: 404
Research agents – Perth: 404
Why we refuse access: 200
Australian Overseas Information Service photographs: 404
Papua New Guinea patrol reports: 404
D Notices: 404
Post Office records: 404
Copying charges: 200
Exempt information in ASIO records: 404
Personal information in ASIO records: 404
Veterans' case files: 404
Fremantle Harbour: 404
Passenger records held in Perth: 404
Melbourne Olympics, 1956: 404
World War I internee, alien and POW records held in Canberra: 404
World War II internee, alien and POW records held in Canberra: 404
Design and development of the national capital: 404
World War II war crimes: 404
Indonesian independence: 404
War service information: 404
Passenger records held in Sydney: 404
Customs shipping records held in Sydney: 404
Migrant selection documents held in Canberra: 404
Boer War records: 404
Naturalisation records held in Canberra: 404
ASIO files on writers and literary groups: 404
Prime ministers of Australia: 404
Prime Minister Joseph Cook: 404
Prime Minister William Morris Hughes: 404
Prime Minister Stanley Melbourne Bruce: 404
Prime Minister James Henry Scullin: 404
Prime Minister Joseph Aloysius Lyons: 404
Prime Minister Earle Christmas Grafton Page: 404
Prime Minister Robert Gordon Menzies: 404
Prime Minister Arthur William Fadden: 404
Prime Minister John Joseph Ambrose Curtin: 404
Prime Minister Francis Michael Forde: 404
Prime Minister Joseph Benedict Chifley: 404
Prime Minister Harold Edward Holt: 404
Prime Minister John McEwen: 404
Prime Minister John Grey Gorton: 404
Family history sources held in Canberra: 404
Family history sources held in Adelaide: 404
Australia and the United Nations: 404
Births, deaths and marriages: 404
Cyclones and the Northern Territory: 404
Coastal fortifications in South Australia: 404
Customs houses in South Australia: 404
Customs House, Port Adelaide, South Australia: 404
Excise control of distilled products in South Australia: 404
Walter Burley Griffin and the design of Canberra: 404
J T Lang and Lang Labor: 404
Regulation of beer and brewing in South Australia: 404
Sir Frederick Shedden and the Shedden collection: 404
Records relating to Italian migration held in Sydney: 404
World War II internee, alien and POW records held in Sydney: 404
The Australian flag: 404
The Cocos (Keeling) Islands: 404
Commonwealth electoral rolls held in Perth: 404
Copyright records: 404
World War I internee, alien and POW records held in Adelaide: 404
World War II internee, alien and POW records held in Adelaide: 404
The Pastoral industry in the Northern Territory: 404
Building the provisional Parliament House: 404
When to use the Freedom of Information, Archives and Privacy Acts: 200
The sinking of HMAS Sydney, November 1941: 404
Royal Commission into Aboriginal Deaths in Custody: 404
Aboriginal and Torres Strait Islander people: 404
Memorandum of Understanding with Northern Territory Aboriginal people: 404
Introducing television to Australia, 1956: 404
Guides to the collection: 404
Australia's involvement in the Vietnam War: 404
Computer resources in reading rooms: 404
Commonwealth electoral rolls held in Brisbane: 404
Bankruptcy records held in Sydney: 404
General Sir John Monash: 404
Lighthouse records held in Hobart: 404
Records of British migrants held in Canberra: 404
Child migration to Australia: 404
Radar research in Australia during World War II: 404
Radar production and use during World War II: 404
War Cabinet records: 404
Cabinet notebooks: 404
British nuclear tests at Maralinga: 404
The Royal Commission on Espionage, 1954–55: 404
Posters: 404
World War ll Army pay files held in Adelaide: 404
Defence and service records held in Melbourne: 404
Colonial defence personnel records held in Melbourne: 404
Army administrative records held in Melbourne: 404
Army service records: 404
Navy administrative records held in Melbourne: 404
Navy service records held in Melbourne: 404
Royalty and Australian society: 404
Cockatoo Island Dockyard: 404
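
A plain `requests.get()` call has no timeout and follows redirects, so a request could hang, and a 200 could in principle come from a page the old address was redirected to. The following is a sketch of a more defensive check, not the method used above. `check_url`, its `getter` parameter and the keys it returns are all names invented here.

```python
import requests


def check_url(url, getter=requests, timeout=30):
    """Fetch a URL and summarise what happened (hypothetical helper).

    `getter` is anything with a requests-style .get() method, such as the
    requests module itself or a requests.Session.
    """
    try:
        response = getter.get(url, timeout=timeout)
    except requests.RequestException as err:
        # Connection errors, timeouts and the like have no status code.
        return {'url': url, 'status': None, 'final_url': None,
                'redirected': False, 'error': type(err).__name__}
    return {
        'url': url,
        'status': response.status_code,
        'final_url': response.url,             # address after any redirects
        'redirected': bool(response.history),  # non-empty if redirects were followed
        'error': None,
    }
```

Comparing `final_url` with `url` for the rows that return 200 would show whether a "surviving" fact sheet is really still at its old address.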

Examine the results

In [21]:
import pandas as pd
In [28]:
df = pd.DataFrame(fact_sheets)

Let's break down the results by HTTP status code.

In [29]:
df['status'].value_counts()
Out[29]:
404    251
200     15
Name: status, dtype: int64
In [42]:
print(f'{251 / (251+15):.2%} of fact sheets are kaput!')
94.36% of fact sheets are kaput!
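
The counts in the cell above are typed in by hand. As a sketch, the same share could be derived from the status column itself. Here the column is rebuilt from the counts reported above so the example runs on its own, and `statuses` and `kaput_share` are illustrative names. In the notebook, `df['status']` would take the place of `statuses`.

```python
import pandas as pd

# Rebuild the status column from the counts shown above (251 x 404, 15 x 200).
statuses = pd.Series([404] * 251 + [200] * 15, name='status')

# The mean of a boolean Series is the proportion of True values.
kaput_share = (statuses != 200).mean()
print(f'{kaput_share:.2%} of fact sheets are kaput!')
```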

Which fact sheets have survived?

In [30]:
df.loc[df['status'] == 200]
Out[30]:
     number  title                                              url                                                status
0    1       Reading room addresses and hours of opening        http://naa.gov.au/collection/fact-sheets/fs01....  200
5    5       Archival terms                                     http://naa.gov.au/collection/fact-sheets/fs05....  200
6    6       The Commonwealth Record Series (CRS) system        http://naa.gov.au/collection/fact-sheets/fs06....  200
7    7       Citing archival records                            http://naa.gov.au/collection/fact-sheets/fs07....  200
8    8       Copyright                                          http://naa.gov.au/collection/fact-sheets/fs08....  200
10   10      Access to records under the Archives Act           http://naa.gov.au/collection/fact-sheets/fs10....  200
12   12      What to do if we refuse you access                 http://naa.gov.au/collection/fact-sheets/fs12....  200
15   15      Release of records containing personal informa...  http://naa.gov.au/collection/fact-sheets/fs15....  200
17   18      NameSearch                                         http://naa.gov.au/collection/fact-sheets/fs18....  200
44   46      Why we refuse access                               http://naa.gov.au/collection/fact-sheets/fs46....  200
49   51      Copying charges                                    http://naa.gov.au/collection/fact-sheets/fs51....  200
106  110     When to use the Freedom of Information, Archiv...  http://naa.gov.au/collection/fact-sheets/fs110...  200
170  175     Bringing Them Home name index                      http://naa.gov.au/collection/fact-sheets/fs175...  200
190  195     The bombing of Darwin                              http://naa.gov.au/collection/fact-sheets/fs195...  200
214  220     Passenger arrivals index                           http://naa.gov.au/collection/fact-sheets/fs220...  200

Save the results as a CSV

In [31]:
df.to_csv('data/fact_sheets.csv', index=False)
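
`to_csv()` doesn't create missing folders, so the call above assumes `data/` already exists. A minimal sketch of one way to guard against that follows. `save_results` is a hypothetical helper, and the one-row frame and file name are made up for the example.

```python
from pathlib import Path

import pandas as pd


def save_results(df, path):
    """Write a DataFrame to CSV, creating the parent folder if needed (hypothetical helper)."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(path, index=False)
    return path


# Tiny made-up frame standing in for the real results.
example = pd.DataFrame([{'number': '1', 'title': 'Example sheet', 'status': 200}])
saved = save_results(example, 'data/example_fact_sheets.csv')
print(saved)
```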