Exploring object records¶

In this notebook we'll have a preliminary poke around in the object data harvested from the NMA Collection API. I'll focus here on the basic shape and statistics of the data; other notebooks explore the object data over time and space.

If you haven't already, you'll either need to harvest the object data, or unzip a pre-harvested dataset.

The shape of the data
Nested data
The additionalType field
The extent field
How big is the collection?
The biggest object?

If you haven't used one of these notebooks before, they're basically web pages in which you can write, edit, and run live code. They're meant to encourage experimentation, so don't feel nervous. Just try running a few cells and see what happens!

Some tips:

Code cells have boxes around them.
To run a code cell click on the cell and then hit Shift+Enter. The Shift+Enter combo will also move you to the next cell, so it's a quick way to work through the notebook.
While a cell is running a * appears in the square brackets next to the cell. Once the cell has finished running the asterisk will be replaced with a number.
In most cases you'll want to start from the top of the notebook and work your way down, running each cell in turn. Later cells might depend on the results of earlier ones.
To edit a code cell, just click on it and type stuff. Remember to run the cell once you've finished editing.

Is this thing on? If you can't edit or run any of the code cells, you might be viewing a static (read only) version of this notebook. Click here to load a live version running on Binder.

Import what we need¶

In [23]:

import pandas as pd
import math
from IPython.display import display, HTML, FileLink
from tinydb import TinyDB, Query
from pandas import json_normalize

Load the harvested data¶

In [2]:

# Load the harvested data from the json db
db = TinyDB('nma_object_db.json')
records = db.all()
Object = Query()

In [3]:

# Convert to a dataframe
df = pd.DataFrame(records)
df.head()

Out[3]:

	id	type	title	_meta	additionalType	collection	identifier	medium	extent	physicalDescription	...	isPartOf	seeAlso	description	hasVersion	temporal	relation	hasPart	educationalSignificance	location	acknowledgement
0	145400	object	Wahlo and Tribal law by Kevin Gilbert, reprint...	{'modified': '2018-07-09', 'issued': '2011-10-...	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
1	251390	object	Pair of woven shoes made from feathers and hair	{'modified': '2019-01-17', 'issued': '2018-04-...	[Shoes]	{'id': '5244', 'type': 'Collection', 'title': ...	2000.0014.0495	[{'type': 'Material', 'title': 'Feather'}, {'t...	{'type': 'Measurement', 'length': 260, 'width'...	Shoes, the soles of which are made from woven ...	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
2	124081	object	Pair of ceremonial shoes	{'modified': '2018-12-04', 'issued': '2006-10-...	NaN	{'id': '1892', 'type': 'Collection', 'title': ...	1992.0089.0165	[{'type': 'Material', 'title': 'Feather'}]	{'type': 'Measurement', 'length': 246, 'width'...	A pair of ceremonial shoes made with several m...	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
3	21507	object	Grinding stone	{'modified': '2018-06-19', 'issued': '2014-12-...	[Grinding stones]	{'id': '2229', 'type': 'Collection', 'title': ...	1985.0288.0109	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
4	142308	object	'time CHange' [sic]	{'modified': '2019-04-15', 'issued': '2012-06-...	[Compact discs]	{'id': '3893', 'type': 'Collection', 'title': ...	AR00213.012	NaN	NaN	A compact disc, housed within a clear and blac...	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN

5 rows × 25 columns

The shape of the data¶

How many objects are there?

In [4]:

print('There are {:,} objects in the collection'.format(df.shape[0]))

There are 86,717 objects in the collection

Obviously not every record has a value for every field. Let's create a quick count of the number of values in each field.

In [5]:

df.count()

Out[5]:

id                         86717
type                       86717
title                      86558
_meta                      86717
additionalType             86690
collection                 84289
identifier                 86692
medium                     73952
extent                     64199
physicalDescription        86397
significanceStatement      32468
creator                    25119
spatial                    46773
contributor                40760
isAggregatedBy              4353
isPartOf                   10769
seeAlso                      467
description                 9128
hasVersion                 20159
temporal                   29597
relation                    3096
hasPart                     2350
educationalSignificance      201
location                    1069
acknowledgement              789
dtype: int64

Let's express those counts as a percentage of the total number of records, and display them as a bar chart using Pandas.

In [6]:

# Get field counts and convert to dataframe
field_counts = df.count().to_frame().reset_index()

# Change column headings
field_counts.columns = ['field', 'count']

# Calculate proportion of the total
field_counts['proportion'] = field_counts['count'] / df.shape[0]

# Style the results as a barchart
field_counts.style.bar(subset=['proportion'], color='#d65f5f').format({'proportion': '{:.2%}'.format})

Out[6]:

	field	count	proportion
0	id	86717	100.00%
1	type	86717	100.00%
2	title	86558	99.82%
3	_meta	86717	100.00%
4	additionalType	86690	99.97%
5	collection	84289	97.20%
6	identifier	86692	99.97%
7	medium	73952	85.28%
8	extent	64199	74.03%
9	physicalDescription	86397	99.63%
10	significanceStatement	32468	37.44%
11	creator	25119	28.97%
12	spatial	46773	53.94%
13	contributor	40760	47.00%
14	isAggregatedBy	4353	5.02%
15	isPartOf	10769	12.42%
16	seeAlso	467	0.54%
17	description	9128	10.53%
18	hasVersion	20159	23.25%
19	temporal	29597	34.13%
20	relation	3096	3.57%
21	hasPart	2350	2.71%
22	educationalSignificance	201	0.23%
23	location	1069	1.23%
24	acknowledgement	789	0.91%
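
As an aside, the same proportions can be had in one step, because the mean of a True/False mask is the share of True values. Here's a minimal sketch on a tiny made-up dataframe (`toy` and its values are invented for illustration, not taken from the harvest):

```python
import pandas as pd

# Hypothetical stand-in for the harvested records
toy = pd.DataFrame({
    'id': ['1', '2', '3', '4'],
    'title': ['A', 'B', None, 'D'],
    'extent': [{'length': 10}, None, None, {'length': 5}],
})

# notna() marks each cell True/False; the mean of booleans is the proportion that are True
proportions = toy.notna().mean()

# Format as percentages for display
print(proportions.map('{:.2%}'.format))
```

This skips the intermediate count column, so it's handy when you only care about how complete each field is.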

Nested data¶

One thing you might note is that some of the fields contain nested JSON arrays or objects. For example additionalType contains a list of object types, while extent is a dictionary with keys and values. Let's unpack these columns for the second row (index of 1).

In [7]:

df['additionalType'][1][0]

Out[7]:

'Shoes'

In [8]:

df['extent'][1]

Out[8]:

{'type': 'Measurement',
 'length': 260,
 'width': 120,
 'depth': 40,
 'unitText': 'mm'}

In [9]:

df['extent'][1]['length']

Out[9]:

260
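
If you'd rather not eyeball the table to find the nested columns, one possible approach is to test each column for list or dict values. This is only a sketch on invented data; `toy`, `is_nested`, and `nested_columns` are hypothetical names.

```python
import pandas as pd

# Hypothetical records shaped like the harvested data
toy = pd.DataFrame({
    'id': ['251390', '21507'],
    'title': ['Pair of woven shoes', 'Grinding stone'],
    'additionalType': [['Shoes'], ['Grinding stones']],
    'extent': [{'type': 'Measurement', 'length': 260, 'unitText': 'mm'}, None],
})


def is_nested(value):
    '''True if a cell holds a JSON array (list) or JSON object (dict).'''
    return isinstance(value, (list, dict))


# Keep the columns where at least one cell is nested
nested_columns = [col for col in toy.columns if toy[col].map(is_nested).any()]
print(nested_columns)
```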

The `additionalType` field¶

How many objects have values in the additionalType column?

In [10]:

df.loc[df['additionalType'].notnull()].shape

Out[10]:

(86690, 25)

In [11]:

print('{:%} of objects have an additionalType value'.format(df.loc[df['additionalType'].notnull()].shape[0] / df.shape[0]))

99.968864% of objects have an additionalType value

So which ones don't have an additionalType?

In [12]:

# Just show the first 5 rows
df.loc[df['additionalType'].isnull()].head()

Out[12]:

	id	type	title	_meta	additionalType	collection	identifier	medium	extent	physicalDescription	...	isPartOf	seeAlso	description	hasVersion	temporal	relation	hasPart	educationalSignificance	location	acknowledgement
0	145400	object	Wahlo and Tribal law by Kevin Gilbert, reprint...	{'modified': '2018-07-09', 'issued': '2011-10-...	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
2	124081	object	Pair of ceremonial shoes	{'modified': '2018-12-04', 'issued': '2006-10-...	NaN	{'id': '1892', 'type': 'Collection', 'title': ...	1992.0089.0165	[{'type': 'Material', 'title': 'Feather'}]	{'type': 'Measurement', 'length': 246, 'width'...	A pair of ceremonial shoes made with several m...	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
1728	180161	object	Awelye- panel 1 by Lily Kngwarreye	{'copyright': '', 'licence': ''}	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
1939	224632	object	Glass plate negative of family and horse stand...	{'copyright': '', 'licence': ''}	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
3416	180165	object	Awelye- panel 3 by Lily Kngwarreye	{'copyright': '', 'licence': ''}	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN

5 rows × 25 columns

How many rows have more than one additionalType?

In [13]:

df.loc[df['additionalType'].str.len() > 1].shape[0]

Out[13]:

Let's have a look at a sample.

In [14]:

df.loc[df['additionalType'].str.len() > 1].head()

Out[14]:

	id	type	title	_meta	additionalType	collection	identifier	medium	extent	physicalDescription	...	isPartOf	seeAlso	description	hasVersion	temporal	relation	hasPart	educationalSignificance	location	acknowledgement
45	202601	object	Album of Newspaper clippings	{'modified': '2019-04-22', 'issued': '2010-11-...	[Albums, Newspaper clippings]	{'id': '4760', 'type': 'Collection', 'title': ...	1989.0009.0108	[{'type': 'Material', 'title': 'Cardboard'}, {...	{'type': 'Measurement', 'height': 345, 'width'...	A brown textured hardback album with gold colo...	...	NaN	NaN	NaN	NaN	[{'type': 'Event', 'title': '1935', 'startDate...	NaN	NaN	NaN	NaN	NaN
113	256766	object	Handmade wolf figurine in yellow dress likely ...	{'modified': '2018-12-13', 'issued': '2018-10-...	[Novelty toys, Toys]	{'id': '6773', 'type': 'Collection', 'title': ...	2013.0038.0556.005	[{'type': 'Material', 'title': 'Cotton thread'...	{'type': 'Measurement', 'height': 88, 'width':...	A handmade wolf figurine robed in a yellow dre...	...	NaN	NaN	NaN	NaN	[{'type': 'Event', 'title': '1925 - 1935', 'st...	NaN	NaN	NaN	NaN	NaN
133	223557	object	Receipt issued to Tirranna Race Club, 1878	{'modified': '2019-04-23', 'issued': '2017-11-...	[Invoices, Receipts]	{'id': '6139', 'type': 'Collection', 'title': ...	2012.0019.0170	[{'type': 'Material', 'title': 'Ink'}, {'type'...	{'type': 'Measurement', 'height': 114, 'width'...	A receipt handwritten on a piece of grey paper...	...	NaN	NaN	NaN	NaN	[{'type': 'Event', 'title': '1878', 'startDate...	NaN	NaN	NaN	NaN	NaN
219	231018	object	Cycling jersey worn by Harry Clarke	{'modified': '2019-04-12', 'issued': '2017-03-...	[Gee clamps, Sports clothing]	{'id': '7017', 'type': 'Collection', 'title': ...	2013.0033.0002	[{'type': 'Material', 'title': 'Polyester clot...	{'type': 'Measurement', 'height': 610, 'width'...	A short sleeved, striped brown, black and tan ...	...	NaN	NaN	Brown and yellow cycling jersey worn by Harry ...	[{'id': '131401', 'type': 'StillImage', 'ident...	[{'type': 'Event', 'title': '1988', 'startDate...	NaN	NaN	NaN	NaN	NaN
301	255447	object	Pair of orange leather dolls shoes with pom pom	{'modified': '2019-04-24', 'issued': '2018-06-...	[Dolls clothing, Shoes]	{'id': '6773', 'type': 'Collection', 'title': ...	2013.0038.0315	[{'type': 'Material', 'title': 'Cotton thread'...	{'type': 'Measurement', 'height': 25, 'width':...	A pair of orange leather dolls shoes with one ...	...	NaN	NaN	NaN	NaN	[{'type': 'Event', 'title': '1925 - 1935', 'st...	NaN	NaN	NaN	NaN	NaN

5 rows × 25 columns
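
It might seem odd to use `.str.len()` on a column of lists, but the `.str` accessor will report the length of any sized value, and gives NaN where the value is missing. A small sketch with made-up values (the `types` series is hypothetical):

```python
import pandas as pd

# Hypothetical values shaped like the additionalType column
types = pd.Series([
    ['Shoes'],
    ['Albums', 'Newspaper clippings'],
    None,
    ['Toys', 'Vegetables'],
])

# Length of each list; missing values come back as NaN
lengths = types.str.len()

# NaN > 1 evaluates to False, so rows without a value drop out of the filter
multi = types[lengths > 1]

print(lengths.tolist())
print(multi.index.tolist())
```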

The additionalType field contains a nested list of values. Using json_normalize() or explode() we can explode these lists, creating a row for each separate value.

In [15]:

# Use json_normalize to expand 'additionalType' into separate rows, adding the id and title from the parent record
# df_types = json_normalize(df.loc[df['additionalType'].notnull()].to_dict('records'), record_path='additionalType', meta=['id', 'title'], errors='ignore').rename({0: 'additionalType'}, axis=1)

# In pandas v.0.25 and above you can just use explode -- this produces the same result as above
df_types = df.loc[df['additionalType'].notnull()][['id', 'title', 'additionalType']].explode('additionalType')

df_types.head()

Out[15]:

	id	title	additionalType
1	251390	Pair of woven shoes made from feathers and hair	Shoes
3	21507	Grinding stone	Grinding stones
4	142308	'time CHange' [sic]	Compact discs
5	20174	Ten Days To Live - A supposed sorcery painting.	Bark paintings
6	144359	'The Dance of Life (1898-1902)' by Diana Boyer...	Booklets
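
To see what `explode()` is doing, here's a minimal sketch on two invented records (`toy` is hypothetical). Note that the original row label is repeated for every value in the list, so you can always trace an exploded row back to its parent record.

```python
import pandas as pd

# Two hypothetical records, one with two types and one with a single type
toy = pd.DataFrame({
    'id': ['10', '11'],
    'title': ['Album of clippings', 'Grinding stone'],
    'additionalType': [['Albums', 'Newspaper clippings'], ['Grinding stones']],
})

# One row per list item; id and title are copied onto each new row
exploded = toy.explode('additionalType')
print(exploded)

# The index label of the parent row is repeated once per list item
print(exploded.index.tolist())
```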

Now that we've exploded the type values, we can aggregate them in different ways. Let's look at the 25 most common object types!

In [16]:

df_types['additionalType'].value_counts()[:25]

Out[16]:

Mineral samples                   6000
Photographs                       4742
Stone artefacts                   4364
Photographic postcards            4250
Drawings                          3755
Postcards                         3697
Zoological specimens              2168
Bark paintings                    2107
Geological specimens              1993
Engravings                        1498
Cartoons                          1384
Negatives                         1124
Boomerangs                        1025
Spears                            1012
Percussion and abrading stones     982
Paintings                          840
Clubs                              747
Mounts                             745
Cards                              709
Armbands                           649
Shells                             563
Letters                            543
Documents                          519
Geophysical survey equipment       509
Posters                            497
Name: additionalType, dtype: int64

How many object types only appear once?

In [17]:

# rename_axis() and reset_index(name=...) give the same column names whichever pandas version you're using
type_counts = df_types['additionalType'].value_counts().rename_axis('type').reset_index(name='count')
unique_types = type_counts.loc[type_counts['count'] == 1]
unique_types.shape[0]

Out[17]:

In [18]:

unique_types.head()

Out[18]:

	type	count
1854	Medications	1
1855	Hollow bits	1
1856	Television cameras	1
1857	Art drawings	1
1858	Electric indicators	1

Let's save the complete list of types as a CSV file.

In [19]:

type_counts.to_csv('nma_object_type_counts.csv', index=False)
display(FileLink('nma_object_type_counts.csv'))

nma_object_type_counts.csv

Browsing the CSV I noticed that there was one item with the type Vegetables. Let's find out more about it.

In [20]:

# Find in the complete data set
mask = df.loc[df['additionalType'].notnull()]['additionalType'].apply(lambda x: 'Vegetables' in x)
veggie = df.loc[df['additionalType'].notnull()][mask]
veggie

Out[20]:

	id	type	title	_meta	additionalType	collection	identifier	medium	extent	physicalDescription	...	isPartOf	seeAlso	description	hasVersion	temporal	relation	hasPart	educationalSignificance	location	acknowledgement
21559	256742	object	Wooden toy toad stalk	{'modified': '2019-04-24', 'issued': '2018-10-...	[Toys, Vegetables]	{'id': '6773', 'type': 'Collection', 'title': ...	2013.0038.0540	[{'type': 'Material', 'title': 'Paint - non sp...	{'type': 'Measurement', 'height': 65, 'diamete...	A painted wooden toy toad stalk with a red cap...	...	NaN	NaN	NaN	NaN	[{'type': 'Event', 'title': '1925 - 1935', 'st...	NaN	NaN	NaN	NaN	NaN

1 rows × 25 columns

We can create a link into the NMA Collections Explorer using the object id.

In [21]:

display(HTML('<a href="http://collectionsearch.nma.gov.au/?object={}">{}</a>'.format(veggie.iloc[0]['id'], veggie.iloc[0]['title'])))

Wooden toy toad stalk

Does a toadstool count as a vegetable?

The `extent` field¶

The extent field is a nested object, so once again we'll use json_normalize() to expand it out into separate columns.

In [24]:

# Without reset_index() the rows are misaligned
df_extent = df.loc[df['extent'].notnull()].reset_index().join(json_normalize(df.loc[df['extent'].notnull()]['extent'].tolist()).add_prefix("extent_"))
df_extent.head()

Out[24]:

	index	id	type	title	_meta	additionalType	collection	identifier	medium	extent	...	acknowledgement	extent_type	extent_length	extent_width	extent_depth	extent_unitText	extent_height	extent_diameter	extent_weight	extent_unitTextWeight
0	1	251390	object	Pair of woven shoes made from feathers and hair	{'modified': '2019-01-17', 'issued': '2018-04-...	[Shoes]	{'id': '5244', 'type': 'Collection', 'title': ...	2000.0014.0495	[{'type': 'Material', 'title': 'Feather'}, {'t...	{'type': 'Measurement', 'length': 260, 'width'...	...	NaN	Measurement	260.0	120.0	40.0	mm	NaN	NaN	NaN	NaN
1	2	124081	object	Pair of ceremonial shoes	{'modified': '2018-12-04', 'issued': '2006-10-...	NaN	{'id': '1892', 'type': 'Collection', 'title': ...	1992.0089.0165	[{'type': 'Material', 'title': 'Feather'}]	{'type': 'Measurement', 'length': 246, 'width'...	...	NaN	Measurement	246.0	190.0	45.0	mm	NaN	NaN	NaN	NaN
2	5	20174	object	Ten Days To Live - A supposed sorcery painting.	{'modified': '2019-04-21', 'issued': '2013-06-...	[Bark paintings]	{'id': '2202', 'type': 'Collection', 'title': ...	1985.0246.0077	[{'type': 'Material', 'title': 'Bark'}, {'type...	{'type': 'Measurement', 'length': 574, 'width'...	...	NaN	Measurement	574.0	185.0	NaN	mm	NaN	NaN	NaN	NaN
3	6	144359	object	'The Dance of Life (1898-1902)' by Diana Boyer...	{'modified': '2018-06-18', 'issued': '2012-06-...	[Booklets]	{'id': '3893', 'type': 'Collection', 'title': ...	2008.0043.0022.001	[{'type': 'Material', 'title': 'Paper'}, {'typ...	{'type': 'Measurement', 'height': 214, 'width'...	...	NaN	Measurement	NaN	150.0	5.0	mm	214.0	NaN	NaN	NaN
4	8	42084	object	Child's drawing by Lester Moran, Cabbage Tree ...	{'modified': '2019-10-14', 'issued': '2016-10-...	[Drawings]	{'id': '2261', 'type': 'Collection', 'title': ...	1991.0024.0027	[{'type': 'Material', 'title': 'Paint - non sp...	{'type': 'Measurement', 'length': 560, 'width'...	...	NaN	Measurement	560.0	380.0	0.5	mm	NaN	NaN	NaN	NaN

5 rows × 35 columns
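
Why does the index need resetting? `json_normalize()` returns a dataframe numbered from 0, while the filtered rows keep their original labels, and `join()` matches rows by label. The sketch below uses three invented records to show the difference (`toy` is hypothetical, and I've used `drop=True` here just to keep the output small).

```python
import pandas as pd

# Hypothetical records; the first has no extent
toy = pd.DataFrame({
    'id': ['a', 'b', 'c'],
    'extent': [
        None,
        {'length': 260, 'unitText': 'mm'},
        {'length': 246, 'unitText': 'mm'},
    ],
})

# Filtering keeps the original labels 1 and 2
with_extent = toy.loc[toy['extent'].notnull()]

# The expanded columns are labelled 0 and 1
expanded = pd.json_normalize(with_extent['extent'].tolist()).add_prefix('extent_')

# Joining on mismatched labels shifts the measurements onto the wrong rows
misaligned = with_extent.join(expanded)

# Resetting the index first lines the two frames up again
aligned = with_extent.reset_index(drop=True).join(expanded)

print(misaligned[['id', 'extent_length']])
print(aligned[['id', 'extent_length']])
```

In the misaligned version record 'b' picks up the length that belongs to 'c', and 'c' gets nothing at all.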

Let's check to see what types of things are in the extent field.

In [25]:

df_extent['extent_type'].value_counts()

Out[25]:

Measurement    64199
Name: extent_type, dtype: int64

So they're all measurements. Let's have a look at the units being used.

In [26]:

df_extent['extent_unitText'].value_counts()

Out[26]:

mm    63504
MM       10
cm        9
m         5
Name: extent_unitText, dtype: int64

In [27]:

df_extent['extent_unitTextWeight'].value_counts()

Out[27]:

g        1713
kg        212
lb          5
oz          4
tonne       1
Name: extent_unitTextWeight, dtype: int64

Hmmm, are those measurements really in metres, or might they be meant to be 'mm'? Let's have a look at them.

In [28]:

df_extent.loc[df_extent['extent_unitText'] == 'm'][['id', 'title', 'extent_length', 'extent_width', 'extent_unitText']]

Out[28]:

	id	title	extent_length	extent_width	extent_unitText
8968	202783	The Percival Project, Gull Twelve, in a manill...	NaN	230.0	m
13210	257184	Fishing line inside envelope	137.0000	110.0	m
23356	171768	Fair Breeze	NaN	138.0	m
31845	123962	Gunter's chain	20.1168	NaN	m
63827	214193	Extension tube	55.0000	NaN	m

Other than 'Gunter's chain' it looks like the unit should indeed be 'mm'. We'll need to take that into account in calculations.

Now let's convert all the measurements into a single unit – millimetre for lengths, and gram for weights.

In [29]:

def conversion_factor(unit):
    '''
    Get the factor required to convert the current unit to either mm or g.
    '''
    factors = {
        'mm': 1,
        'cm': 10,
        'm': 1, # Most should in fact be mm (see above)
        'g': 1,
        'kg': 1000,
        'tonne': 1000000,
        'oz': 28.35,
        'lb': 453.592
    }
    try:
        factor = factors[unit.lower()]
    except KeyError:
        factor = 0 
    return factor

def normalise_measurements(row):
    '''
    Convert measurements to standard units.
    '''
    l_factor = conversion_factor(str(row['extent_unitText']))
    length = row['extent_length'] * l_factor
    width = row['extent_width'] * l_factor
    depth = row['extent_depth'] * l_factor
    height = row['extent_height'] * l_factor
    diameter = row['extent_diameter'] * l_factor
    w_factor = conversion_factor(str(row['extent_unitTextWeight']))
    weight = row['extent_weight'] * w_factor
    return pd.Series([length, width, depth, height, diameter, weight])

# Add normalised measurements to the dataframe
df_extent[['length_mm', 'width_mm', 'depth_mm', 'height_mm', 'diameter_mm', 'weight_g']] = df_extent.apply(normalise_measurements, axis=1)
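
One thing to be aware of is that treating every 'm' as 'mm' also shrinks Gunter's chain (id 123962), the one record where metres look correct. If that matters for your analysis, a possible workaround is a small set of exceptions keyed by object id. This is just a sketch; `TRUE_METRES`, `LENGTH_FACTORS`, and `length_factor` are hypothetical names that aren't used elsewhere in this notebook.

```python
# Ids whose 'm' unit really does seem to mean metres (Gunter's chain, from the table above)
TRUE_METRES = {'123962'}

# 'm' defaults to 1 because most of those records look like mislabelled mm
LENGTH_FACTORS = {'mm': 1, 'cm': 10, 'm': 1}


def length_factor(unit, object_id):
    '''
    Get the factor required to convert a length in the given unit to mm,
    allowing for records where 'm' is a genuine measurement in metres.
    '''
    unit = str(unit).lower()
    if unit == 'm' and str(object_id) in TRUE_METRES:
        return 1000
    # Unknown or missing units get a factor of 0, as in conversion_factor()
    return LENGTH_FACTORS.get(unit, 0)


# Gunter's chain is converted from metres
print(20.1168 * length_factor('m', '123962'))

# The fishing line is left as is, on the assumption that it's really in mm
print(137 * length_factor('m', '257184'))
```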

In [30]:

df_extent.head()

Out[30]:

	index	id	type	title	_meta	additionalType	collection	identifier	medium	extent	...	extent_height	extent_diameter	extent_weight	extent_unitTextWeight	length_mm	width_mm	depth_mm	height_mm	diameter_mm	weight_g
0	1	251390	object	Pair of woven shoes made from feathers and hair	{'modified': '2019-01-17', 'issued': '2018-04-...	[Shoes]	{'id': '5244', 'type': 'Collection', 'title': ...	2000.0014.0495	[{'type': 'Material', 'title': 'Feather'}, {'t...	{'type': 'Measurement', 'length': 260, 'width'...	...	NaN	NaN	NaN	NaN	260.0	120.0	40.0	NaN	NaN	NaN
1	2	124081	object	Pair of ceremonial shoes	{'modified': '2018-12-04', 'issued': '2006-10-...	NaN	{'id': '1892', 'type': 'Collection', 'title': ...	1992.0089.0165	[{'type': 'Material', 'title': 'Feather'}]	{'type': 'Measurement', 'length': 246, 'width'...	...	NaN	NaN	NaN	NaN	246.0	190.0	45.0	NaN	NaN	NaN
2	5	20174	object	Ten Days To Live - A supposed sorcery painting.	{'modified': '2019-04-21', 'issued': '2013-06-...	[Bark paintings]	{'id': '2202', 'type': 'Collection', 'title': ...	1985.0246.0077	[{'type': 'Material', 'title': 'Bark'}, {'type...	{'type': 'Measurement', 'length': 574, 'width'...	...	NaN	NaN	NaN	NaN	574.0	185.0	NaN	NaN	NaN	NaN
3	6	144359	object	'The Dance of Life (1898-1902)' by Diana Boyer...	{'modified': '2018-06-18', 'issued': '2012-06-...	[Booklets]	{'id': '3893', 'type': 'Collection', 'title': ...	2008.0043.0022.001	[{'type': 'Material', 'title': 'Paper'}, {'typ...	{'type': 'Measurement', 'height': 214, 'width'...	...	214.0	NaN	NaN	NaN	NaN	150.0	5.0	214.0	NaN	NaN
4	8	42084	object	Child's drawing by Lester Moran, Cabbage Tree ...	{'modified': '2019-10-14', 'issued': '2016-10-...	[Drawings]	{'id': '2261', 'type': 'Collection', 'title': ...	1991.0024.0027	[{'type': 'Material', 'title': 'Paint - non sp...	{'type': 'Measurement', 'length': 560, 'width'...	...	NaN	NaN	NaN	NaN	560.0	380.0	0.5	NaN	NaN	NaN

5 rows × 41 columns

How big is the collection?¶

In [31]:

def calculate_volume(row):
    '''
    Look for 3 linear dimensions and multiply them to get a volume.
    '''
    # Create a list of valid linear measurements from the available fields
    dimensions = [d for d in [row['length_mm'], row['width_mm'], row['depth_mm'], row['height_mm'], row['diameter_mm']] if not math.isnan(d)]
    
    # If there's only 2 dimensions...
    if len(dimensions) == 2:
        # Set a default height of 1 for items with only 2 dimensions
        dimensions.append(1)
        
    # If there are 3 or more dimensions, multiply the first 3 together
    if len(dimensions) >= 3:
        volume = dimensions[0] * dimensions[1] * dimensions[2]
    else:
        volume = 0
    return volume

df_extent['volume'] = df_extent.apply(calculate_volume, axis=1)
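
To make the logic of the volume calculation concrete, here's a worked example using the measurements of two objects shown earlier: the woven shoes (260 x 120 x 40 mm) and the bark painting (574 x 185 mm, with no depth). The helper `box_volume_mm3` is a hypothetical, condensed restatement of `calculate_volume()` that takes a plain list instead of a dataframe row.

```python
import math


def box_volume_mm3(dims):
    '''
    Multiply the first 3 non-missing dimensions together.
    Items with only 2 dimensions get a default third dimension of 1.
    '''
    present = [d for d in dims if not math.isnan(d)]
    if len(present) == 2:
        present.append(1)
    if len(present) >= 3:
        return present[0] * present[1] * present[2]
    return 0


nan = float('nan')

# Order matches calculate_volume(): length, width, depth, height, diameter
shoes = box_volume_mm3([260, 120, 40, nan, nan])
bark = box_volume_mm3([574, 185, nan, nan, nan])

# Volume in mm³, then converted to m³
print(shoes, shoes / 1000000000)
print(bark, bark / 1000000000)
```

Keep in mind that this is a bounding box estimate. Flat items contribute their area multiplied by 1 mm, and anything with fewer than two measurements contributes nothing.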

In [32]:

print('Total length of objects is {:.2f} km'.format(df_extent['length_mm'].sum() / 1000 / 1000))

Total length of objects is 15.38 km

In [33]:

print('Total weight of objects is {:.2f} tonnes'.format(df_extent['weight_g'].sum() / 1000000))

Total weight of objects is 197.16 tonnes

In [34]:

print('Total volume of objects is {:.2f} m\N{SUPERSCRIPT THREE}'.format(df_extent['volume'].sum() / 1000000000))

Total volume of objects is 2911.19 m³

The biggest object?¶

What's the biggest thing?

In [35]:

# Get the object with the largest volume
biggest = df_extent.loc[df_extent['volume'].idxmax()]

# Create a link to Collection Explorer
display(HTML('<a href="http://collectionsearch.nma.gov.au/?object={}">{}</a>'.format(biggest['id'], biggest['title'])))

Percival Proctor Mk 1 monoplane VH-FEP
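
Using `idxmax()` only gives us the single largest row. If you'd like to see the runners-up as well, one option is `nlargest()`. Here's a sketch on made-up data (the titles and volumes in `toy` are invented, not real collection values):

```python
import pandas as pd

# Hypothetical objects with volumes in mm³
toy = pd.DataFrame({
    'id': ['1', '2', '3', '4'],
    'title': ['Plane', 'Shoe', 'Coach', 'Stamp'],
    'volume': [9.0e10, 1.2e6, 4.5e10, 0.0],
})

# The 3 rows with the largest volume, biggest first
top = toy.nlargest(3, 'volume')[['id', 'title', 'volume']]
print(top)
```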

Created by Tim Sherratt for the GLAM Workbench.

Work on this notebook was supported by the Humanities, Arts and Social Sciences (HASS) Data Enhanced Virtual Lab.
