# Download an image using the IIIF server and a Handle url

IIIF (the International Image Interoperability Framework) has defined a set of standards for publishing and using image collections. The State Library of Victoria makes many of its images available from an IIIF-compliant server. This means you can access and manipulate the images in standard ways set out by the IIIF Image API.
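Under the IIIF Image API, every image request is a url made of the image's base @id followed by /{region}/{size}/{rotation}/{quality}.{format}. As a rough sketch, the function below splits such a url back into those named parts. The example url is hypothetical, not a real State Library address.

```python
from urllib.parse import urlsplit

def parse_iiif_image_url(url):
    """Split an IIIF Image API request url into its base @id and parameters.

    Assumes the path ends in /{region}/{size}/{rotation}/{quality}.{format}.
    """
    parts = urlsplit(url)
    segments = parts.path.split('/')
    # segments[0] is '' (leading slash), so we need an identifier plus 4 parts
    if len(segments) < 6:
        raise ValueError(f'Not an IIIF image request: {url}')
    region, size, rotation, filename = segments[-4:]
    quality, dot, fmt = filename.rpartition('.')
    if not dot:
        raise ValueError(f'Missing quality.format segment in: {url}')
    base_path = '/'.join(segments[:-4])
    return {
        'id': f'{parts.scheme}://{parts.netloc}{base_path}',
        'region': region,
        'size': size,
        'rotation': rotation,
        'quality': quality,
        'format': fmt,
    }

# Hypothetical request: full region, fit within 500x500, no rotation, default quality, jpg
example = 'https://iiif.example.org/iiif/2/IE123456/full/!500,500/0/default.jpg'
print(parse_iiif_image_url(example))
```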

Many images in the State Library's collection also have a permanent url, created using the Handle system. These are displayed in Trove. But there's no obvious way of getting an IIIF image from a Handle. Of course, if you're a human you can load the image page in your browser and click on the download button. But what if you want to build a processing pipeline, or create a dataset of images? Wouldn't it be good if you could just supply a Handle url and get back an image in whatever format or size you wanted? That's what this notebook does.
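The route from a Handle to an IIIF image runs through two identifiers: the Handle redirects to a system url containing an IE pid, and that pid slots into the IIIF manifest url. Here is a sketch of the offline half of that chain. The redirected url is hypothetical; all the notebook relies on is that the real one contains entity=IE followed by digits.

```python
import re

# Manifest url pattern used later in this notebook
MANIFEST_TEMPLATE = 'https://rosetta.slv.vic.gov.au/delivery/iiif/presentation/2.1/{pid}/manifest'

def pid_from_redirect(redirected_url):
    """Pull the IE pid out of the url a Handle redirects to."""
    match = re.search(r'entity=(IE\d+)', redirected_url)
    if not match:
        raise ValueError(f'No pid in {redirected_url}')
    return match.group(1)

def manifest_url_for(redirected_url):
    """Build the IIIF Presentation manifest url for a redirected Handle url."""
    return MANIFEST_TEMPLATE.format(pid=pid_from_redirect(redirected_url))

# Hypothetical redirect target; only the 'entity=IE...' part matters here
redirected = 'https://viewer.example.org/?entity=IE1234567&mode=browse'
print(manifest_url_for(redirected))
```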

One odd thing I discovered when developing this notebook is that if you try to download an image from the IIIF server using the @id in the image manifest, you get a 403 ('Forbidden') error. This seems to be because the server expects a session cookie in the request headers. To get this cookie, you first have to request the image manifest itself, then send the saved cookie along with the image request (fortunately requests.Session() makes this easy).
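A minimal offline sketch of why the session matters. A cookie is placed in the session's jar by hand, standing in for the one the manifest response sets (its name and value here are hypothetical), and prepare_request() shows which outgoing requests would carry it. It assumes the image server shares the manifest's host, because the jar only attaches cookies to matching domains; the image url path is also hypothetical.

```python
import requests

# One Session keeps one cookie jar across requests. The cookie is set by hand to
# simulate the manifest response; name and value are hypothetical.
session = requests.Session()
session.cookies.set('session_id', 'abc123', domain='rosetta.slv.vic.gov.au', path='/')

# Prepare (but don't send) an image request through the same session
image_url = 'https://rosetta.slv.vic.gov.au/iiif/2/IE123456/full/max/0/default.jpg'
prepared = session.prepare_request(requests.Request('GET', image_url))
print(prepared.headers.get('Cookie'))

# A standalone request (like a bare requests.get) starts with an empty jar
bare = requests.Request('GET', image_url).prepare()
print(bare.headers.get('Cookie'))

# Cookies are scoped by domain: a different host gets nothing from this jar
other = session.prepare_request(requests.Request('GET', 'https://images.example.org/x.jpg'))
print(other.headers.get('Cookie'))
```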

If you haven't used one of these notebooks before, they're basically web pages in which you can write, edit, and run live code. They're meant to encourage experimentation, so don't feel nervous. Just try running a few cells and see what happens!

Some tips:

• Code cells have boxes around them.
• To run a code cell click on the cell and then hit Shift+Enter. The Shift+Enter combo will also move you to the next cell, so it's a quick way to work through the notebook.
• While a cell is running a * appears in the square brackets next to the cell. Once the cell has finished running the asterisk will be replaced with a number.
• In most cases you'll want to start from the top of the notebook and work your way down running each cell in turn. Later cells might depend on the results of earlier ones.
• To edit a code cell, just click on it and type stuff. Remember to run the cell once you've finished editing.

Is this thing on? If you can't edit or run any of the code cells, you might be viewing a static (read only) version of this notebook. Click here to load a live version running on Binder.

## Setting things up

```python
import requests
import re
from pathlib import Path
from IPython.display import display, HTML
```

```python
def get_pid(handle_url):
    '''
    Extract a pid (image identifier) from the url that a Handle redirects to.
    '''
    # The Handle url will get redirected to a system url that includes a pid
    response = requests.get(handle_url)

    # Get the pid from the redirected url
    match = re.search(r'entity=(IE\d+)', response.url)
    if match:
        return match.group(1)
    # Fail loudly rather than building a manifest url that contains 'None'
    raise ValueError(f'Could not find a pid in {response.url}')

def get_image_ids(manifest):
    '''
    Extract a list of image @ids from an IIIF manifest.
    '''
    image_ids = []
    # There can be multiple images in a record,
    # so we loop through the canvases to get each one.
    for canvas in manifest['sequences'][0]['canvases']:
        image_ids.append(canvas['images'][0]['resource']['service']['@id'])
    return image_ids

def construct_image_url(image_id, image_type, max_width, max_height):
    '''
    Construct a url to download the image according to the IIIF Image API.
    '''
    # Create a string with the size information -- either '!w,h' (fit within w x h),
    # 'w,', ',h', or 'max'
    if max_width and max_height:
        size = f'!{max_width},{max_height}'
    elif max_width:
        size = f'{max_width},'
    elif max_height:
        size = f',{max_height}'
    else:
        size = 'max'
    # Construct the url: {id}/{region}/{size}/{rotation}/{quality}.{format}
    return f'{image_id}/full/{size}/0/default.{image_type}'

def download_image(handle_url, image_type='jpg', max_width=None, max_height=None):
    '''
    Downloads a derivative image from the IIIF server using its Handle, and saves
    it to the 'images' folder.
    Params:
        handle_url: the Handle link for this image (required)
        image_type: one of 'jpg', 'tif', 'png'
        max_width: maximum width in pixels
        max_height: maximum height in pixels
    '''
    pid = get_pid(handle_url)
    manifest_url = f'https://rosetta.slv.vic.gov.au/delivery/iiif/presentation/2.1/{pid}/manifest'

    # We need to use a session to save cookies
    s = requests.Session()

    # Get the IIIF manifest
    # Requesting the manifest also sets a cookie
    response = s.get(manifest_url)
    response.raise_for_status()
    manifest = response.json()

    # Extract the image ids from the manifest
    image_ids = get_image_ids(manifest)

    # Make sure there's somewhere to save the images
    Path('images').mkdir(parents=True, exist_ok=True)

    for index, image_id in enumerate(image_ids):

        # Construct a filename using the image pid and a numeric index
        filename = Path(f'images/slv-{pid}-{index}.{image_type}')

        # Construct an IIIF compliant url
        image_url = construct_image_url(image_id, image_type, max_width, max_height)

        # The session sends the cookie along; without it the server returns 403
        response = s.get(image_url)
        # Stop rather than saving an error page as an image
        response.raise_for_status()
        filename.write_bytes(response.content)

        # Display the image if possible
        if image_type in ['jpg', 'png']:
            display(HTML(f'<a href="{filename}"><img width=500 src="{filename}"></a>'))
```
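get_image_ids() relies on the shape of an IIIF Presentation 2.1 manifest: sequences hold canvases, each canvas holds images, and each image's resource carries an Image API service whose @id is the base for image requests. A self-contained sketch over a trimmed, hypothetical manifest (real ones carry many more fields) follows that same path and builds one request url per canvas.

```python
# A trimmed, hypothetical Presentation 2.1 manifest with two canvases
manifest = {
    '@context': 'http://iiif.io/api/presentation/2/context.json',
    'sequences': [{
        'canvases': [
            {'images': [{'resource': {'service': {'@id': 'https://iiif.example.org/iiif/2/IE123456_FL1'}}}]},
            {'images': [{'resource': {'service': {'@id': 'https://iiif.example.org/iiif/2/IE123456_FL2'}}}]},
        ]
    }]
}

# Same path the notebook follows: sequences -> canvases -> images -> resource -> service
image_ids = [
    canvas['images'][0]['resource']['service']['@id']
    for canvas in manifest['sequences'][0]['canvases']
]

# One IIIF Image API request per canvas, here asking for a 500px-wide jpg
image_urls = [f'{image_id}/full/500,/0/default.jpg' for image_id in image_ids]
for url in image_urls:
    print(url)
```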


To download an image (or images) from a Handle url, just copy and paste the url into the cell below. By default, this will download the largest available version of the image in JPEG format. You can modify this behaviour by supplying any of the following parameters:

• image_type: one of 'jpg', 'tif', 'png'
• max_width: maximum width of the image in pixels
• max_height: maximum height of the image in pixels
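How max_width and max_height combine follows the IIIF size syntax: both together become !w,h (scale to fit inside the box, keeping the aspect ratio), one alone becomes w, or ,h, and neither becomes max. A rough sketch estimating the output dimensions for a hypothetical 4000x3000 source image. The IIIF spec leaves exact scaling to the server, so treat these numbers as approximate.

```python
def iiif_size(max_width=None, max_height=None):
    """Build the IIIF size parameter the same way the notebook does."""
    if max_width and max_height:
        return f'!{max_width},{max_height}'
    if max_width:
        return f'{max_width},'
    if max_height:
        return f',{max_height}'
    return 'max'

def expected_dimensions(width, height, max_width=None, max_height=None):
    """Estimate output size for a width x height source image.

    Rounds to the nearest pixel; real servers may round differently.
    Ignores any server-side limits that could cap 'max'.
    """
    if max_width and max_height:
        # Best fit: the tighter of the two constraints wins
        scale = min(max_width / width, max_height / height)
    elif max_width:
        scale = max_width / width
    elif max_height:
        scale = max_height / height
    else:
        scale = 1
    return round(width * scale), round(height * scale)

# Hypothetical 4000x3000 source image
for params in [{}, {'max_width': 200}, {'max_height': 300}, {'max_width': 800, 'max_height': 800}]:
    print(iiif_size(**params), expected_dimensions(4000, 3000, **params))
```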

For example, to get a full-size TIFF version:

```python
download_image('http://handle.slv.vic.gov.au/10381/282282', image_type='tif')
```

To get a PNG file that's 200 pixels wide:

```python
download_image('http://handle.slv.vic.gov.au/10381/282282', image_type='png', max_width=200)
```

You'll find the downloaded image(s) in the images directory.
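Files are named slv-{pid}-{index}.{image_type}, so a record with ten or more images will sort out of order if you list the directory alphabetically (-10 lands before -2). A sketch that gathers one record's files and sorts them by their numeric index. So that it runs without downloading anything, it writes empty placeholder files into a temporary directory, and the pids are hypothetical; in practice you would call record_files('images', pid).

```python
import re
import tempfile
from pathlib import Path

def record_files(images_dir, pid):
    """Return a record's downloaded files, ordered by their numeric index."""
    pattern = re.compile(rf'^slv-{re.escape(pid)}-(\d+)\.\w+$')
    matches = []
    for path in Path(images_dir).iterdir():
        match = pattern.match(path.name)
        if match:
            matches.append((int(match.group(1)), path))
    # Sort on the integer index, not the filename string
    return [path for _, path in sorted(matches)]

# Demo with empty placeholder files in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    for index in [0, 1, 2, 10]:
        (Path(tmp) / f'slv-IE123456-{index}.jpg').touch()
    (Path(tmp) / 'slv-IE999999-0.jpg').touch()  # a different record, ignored
    names = [p.name for p in record_files(tmp, 'IE123456')]
    print(names)
```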

In [17]:
# Paste a Handle url between the quotes