Download an image using the IIIF server and a Handle url

IIIF (the International Image Interoperability Framework) has defined a set of standards for publishing and using image collections. The State Library of Victoria makes many of its images available from an IIIF-compliant server. This means you can access and manipulate the images in standard ways set out by the IIIF Image API.
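Every IIIF Image API request is simply a url that follows a fixed pattern: `{image_id}/{region}/{size}/{rotation}/{quality}.{format}`, where `image_id` is the image service's @id. The minimal sketch below assembles such a url from its parts. The `iiif_image_url` helper and the example service @id are hypothetical, for illustration only. The defaults (`full`, `max`, `0`, `default`, `jpg`) match the ones this notebook's code uses. Not every server supports every option, such as the `gray` quality.

```python
def iiif_image_url(image_id, region='full', size='max', rotation=0,
                   quality='default', fmt='jpg'):
    '''
    Build an IIIF Image API url (sketch).

    image_id is the image service's @id, i.e. everything up to and including
    the image identifier. The remaining path segments follow the IIIF Image API
    order: region / size / rotation / quality.format
    '''
    return f'{image_id.rstrip("/")}/{region}/{size}/{rotation}/{quality}.{fmt}'


# Hypothetical service @id, for illustration only
service_id = 'https://iiif.example.org/iiif/2/example-image'

# The whole image at the largest available size
print(iiif_image_url(service_id))
# A 500 x 500 pixel crop from the top-left corner, scaled to 250 pixels wide, in greyscale
print(iiif_image_url(service_id, region='0,0,500,500', size='250,', quality='gray'))
```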

Many images in the State Library's collection also have a permanent url, created using the Handle system. These are displayed in Trove. But there's no obvious way of getting an IIIF image from a Handle. Of course, if you're a human you can load the image page in your browser and click on the download button. But what if you want to build a processing pipeline, or create a dataset of images? Wouldn't it be good if you could just supply a Handle url and get back an image in whatever format or size you wanted? That's what this notebook does.

One odd thing I discovered when developing this notebook is that if you try to download an image from the IIIF server using the @id in the image manifest, you get a 403 ('Forbidden') error. This seems to be because the server expects a session cookie in the request headers. To get this cookie, you first have to request the image manifest itself, then send the saved cookie along with the image request (fortunately requests.Session() makes this easy).
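This notebook relies on requests.Session() to carry the cookie from the manifest request to the image request. If you'd rather stick to Python's standard library, a roughly equivalent approach is a `urllib` opener with a `CookieJar`. This swaps in a different technique and is not what the notebook itself does. Any cookie set by a response fetched through the opener is stored in the jar and sent back automatically with later requests made through the same opener. The helper names are hypothetical, and the urls are whatever manifest and image urls you pass in.

```python
import http.cookiejar
import urllib.request


def make_cookie_aware_opener():
    '''
    Return an (opener, jar) pair. Cookies set by any response fetched through
    the opener are stored in jar and sent with later requests, much like
    requests.Session().
    '''
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def fetch_manifest_then_image(opener, manifest_url, image_url):
    '''
    Hypothetical helper: request the manifest first (which should set the
    session cookie), then request the image through the same opener so the
    stored cookie is sent along with it.
    '''
    with opener.open(manifest_url) as response:
        manifest_bytes = response.read()
    with opener.open(image_url) as response:
        image_bytes = response.read()
    return manifest_bytes, image_bytes
```

Note that, unlike requests, `urllib` raises an `HTTPError` for a 403 response, so a missing cookie fails loudly instead of returning an error page.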

If you haven't used one of these notebooks before, they're basically web pages in which you can write, edit, and run live code. They're meant to encourage experimentation, so don't feel nervous. Just try running a few cells and see what happens!

Some tips:

  • Code cells have boxes around them.
  • To run a code cell, click on the cell and then hit Shift+Enter. The Shift+Enter combo will also move you to the next cell, so it's a quick way to work through the notebook.
  • While a cell is running, a * appears in the square brackets next to the cell. Once the cell has finished running, the asterisk will be replaced with a number.
  • In most cases you'll want to start from the top of the notebook and work your way down, running each cell in turn. Later cells might depend on the results of earlier ones.
  • To edit a code cell, just click on it and type stuff. Remember to run the cell once you've finished editing.

Is this thing on? If you can't edit or run any of the code cells, you might be viewing a static (read only) version of this notebook. Click here to load a live version running on Binder.

Setting things up

import requests
import re
from pathlib import Path
from IPython.display import display, HTML
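def get_pid(handle_url):
    '''
    Follow a Handle url's redirect and extract the pid (image identifier)
    from the resulting image viewer url.
    '''
    # The Handle url will get redirected to a system url that includes a pid
    response = requests.get(handle_url)
    
    # Get the pid from the redirected url
    match = re.search(r'entity=(IE\d+)', response.url)
    if match:
        return match.group(1)
    # Without a pid we can't build a manifest url, so fail clearly here
    raise ValueError(f'Could not find a pid in {response.url}')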
    
def get_image_ids(manifest):
    '''
    Extract a list of image @ids from an IIIF manifest
    '''
    image_ids = []
    # There can be multiple images in a record
    # So we loop through the canvases to get each one.
    for canvas in manifest['sequences'][0]['canvases']:
        image_ids.append(canvas['images'][0]['resource']['service']['@id'])
    return image_ids

def construct_image_url(image_id, image_type, max_width, max_height):
    '''
    Construct a url to download the image according to the IIIF standard.
    '''
    # Create a string with the size information -- either '!w,h' (fit within w x h), 'w,', ',h', or 'max'
    if max_width and max_height:
        size = f'!{max_width},{max_height}'
    elif max_width:
        size = f'{max_width},'
    elif max_height:
        size = f',{max_height}'
    else:
        size = 'max'
    # Construct the url
    return f'{image_id}/full/{size}/0/default.{image_type}'
    
def download_image(handle_url, image_type='jpg', max_width=None, max_height=None):
    '''
    Downloads a derivative image from the IIIF server using its Handle, and saves
    it to the 'images' folder.
    Params:
        handle_url: the Handle link for this image (required)
        image_type: one of 'jpg', 'tif', 'png'
        max_width: maximum width in pixels
        max_height: maximum height in pixels
    '''
    pid = get_pid(handle_url)
    manifest_url = f'https://rosetta.slv.vic.gov.au/delivery/iiif/presentation/2.1/{pid}/manifest'
    
    # We need to use a session to save cookies
    s = requests.Session()
    
    # Get the IIIF manifest
    # Requesting the manifest also sets the session cookie needed for the images
    response = s.get(manifest_url)
    response.raise_for_status()
    manifest = response.json()
    
    # Extract the image ids from the manifest
    image_ids = get_image_ids(manifest)
    
    # Make sure there's somewhere to save the images
    Path('images').mkdir(parents=True, exist_ok=True)
    
    # Loop through the image ids, downloading each image
    for index, image_id in enumerate(image_ids):
        
        # Construct a filename using the image pid and a numeric index
        filename = Path(f'images/slv-{pid}-{index}.{image_type}')
        
        # Construct an IIIF compliant url
        image_url = construct_image_url(image_id, image_type, max_width, max_height)
        
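        # Download and save the image, stopping if the server returns an error
        # (such as a 403 if the session cookie is missing)
        response = s.get(image_url)
        response.raise_for_status()
        filename.write_bytes(response.content)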
        
        # Display the image if possible
        if image_type in ['jpg', 'png']:
            display(HTML(f'<a href="{filename}"><img width=500 src="{filename}"></a>'))
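The two parsing steps, pulling the pid out of the redirected url and walking the manifest to find each image's service @id, don't need the network, so you can check them offline. This sketch takes a slightly different route from the functions above. It uses `urllib.parse` instead of the regular expression in `get_pid`. It also skips canvases that lack an image service instead of raising a `KeyError`. The helper names, the example url, and the manifest are all hand-built and hypothetical. The manifest reproduces only the fields `get_image_ids` reads (`sequences` > `canvases` > `images` > `resource` > `service` > `@id`), following the IIIF Presentation API 2.1 layout.

```python
import re
from urllib.parse import urlparse, parse_qs


def pid_from_url(url):
    '''
    Return the IE pid from the 'entity' query parameter of a url, or None.
    '''
    values = parse_qs(urlparse(url).query).get('entity', [])
    for value in values:
        if re.fullmatch(r'IE\d+', value):
            return value
    return None


def service_ids(manifest):
    '''
    Collect the image service @id for every canvas in the first sequence,
    skipping any canvas that has no image service.
    '''
    ids = []
    for sequence in manifest.get('sequences', [])[:1]:
        for canvas in sequence.get('canvases', []):
            for image in canvas.get('images', [])[:1]:
                service = image.get('resource', {}).get('service', {})
                if isinstance(service, dict) and '@id' in service:
                    ids.append(service['@id'])
    return ids


# Hypothetical redirect target and manifest, for illustration only
redirected = 'https://viewer.example.org/delivery/viewer?entity=IE1234567&mode=browse'
manifest = {
    'sequences': [{
        'canvases': [
            {'images': [{'resource': {'service': {'@id': 'https://iiif.example.org/iiif/IE1234567-0'}}}]},
            {'images': [{'resource': {'service': {'@id': 'https://iiif.example.org/iiif/IE1234567-1'}}}]},
            {'images': []},  # a canvas without an image is skipped
        ]
    }]
}

print(pid_from_url(redirected))  # IE1234567
print(service_ids(manifest))
```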

Downloading an image

To download an image (or images) from a Handle url, just copy and paste the url into the cell below. By default, this will download the largest available version of the image in JPEG format. You can modify this behaviour by supplying any of the following parameters:

  • image_type: one of 'jpg', 'tif', 'png'
  • max_width: maximum width of the image in pixels
  • max_height: maximum height of the image in pixels
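The following sketch shows how `max_width` and `max_height` map onto IIIF size values, and roughly what dimensions you'll get back. `expected_size` is a hypothetical helper, not part of the notebook. It assumes you already know the full-size dimensions of the image (an IIIF server reports them as `width` and `height` in the image's `info.json`). The server's own rounding may differ from this sketch by a pixel.

```python
def expected_size(full_width, full_height, max_width=None, max_height=None):
    '''
    Return (iiif_size, (width, height)): the IIIF size value that
    construct_image_url would build, plus the approximate output dimensions.
    Aspect ratio is preserved in every case.
    '''
    if max_width and max_height:
        # '!w,h' scales the image to fit inside a w x h box
        size = f'!{max_width},{max_height}'
        if max_width * full_height <= max_height * full_width:
            # Width is the limiting side
            dims = (max_width, round(full_height * max_width / full_width))
        else:
            # Height is the limiting side
            dims = (round(full_width * max_height / full_height), max_height)
    elif max_width:
        # 'w,' sets the width; the height follows from the aspect ratio
        size = f'{max_width},'
        dims = (max_width, round(full_height * max_width / full_width))
    elif max_height:
        # ',h' sets the height; the width follows from the aspect ratio
        size = f',{max_height}'
        dims = (round(full_width * max_height / full_height), max_height)
    else:
        # 'max' asks for the largest size the server allows (assumed here to be full size)
        size = 'max'
        dims = (full_width, full_height)
    return size, dims


# A hypothetical 4000 x 3000 pixel original
print(expected_size(4000, 3000, max_width=200))                  # ('200,', (200, 150))
print(expected_size(4000, 3000, max_height=300))                 # (',300', (400, 300))
print(expected_size(4000, 3000, max_width=200, max_height=200))  # ('!200,200', (200, 150))
```

Note that 'w,' and ',h' are targets rather than caps. If you ask for a width larger than the original, the server may upscale the image or reject the request, depending on what it supports.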

For example, to get a full-size TIFF version:

download_image('http://handle.slv.vic.gov.au/10381/282282', image_type='tif')

To get a PNG file that's 200 pixels wide:

download_image('http://handle.slv.vic.gov.au/10381/282282', image_type='png', max_width=200)

You'll find the downloaded image(s) in the images directory.
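The downloaded files are named `slv-{pid}-{index}.{image_type}`. If you're building a dataset, this means you can recover each image's pid and position from its filename. Here is a minimal sketch, assuming the files were saved by `download_image` into the `images` folder. `list_downloads` is a hypothetical helper, not part of the notebook.

```python
import re
from pathlib import Path

# Matches the names download_image gives its files, e.g. slv-IE1234567-0.jpg
FILENAME_PATTERN = re.compile(r'slv-(IE\d+)-(\d+)\.(jpg|tif|png)')


def list_downloads(folder='images'):
    '''
    Return a list of (pid, index, path) tuples for every downloaded image,
    sorted by pid and then by numeric index.
    '''
    downloads = []
    for path in Path(folder).glob('slv-*'):
        match = FILENAME_PATTERN.fullmatch(path.name)
        if match:
            pid, index, _ = match.groups()
            downloads.append((pid, int(index), path))
    return sorted(downloads)
```

For example, `for pid, index, path in list_downloads(): print(pid, index, path)` lists everything you've downloaded so far. Converting the index to an integer keeps `-10` after `-9` when sorting.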

# Paste a Handle url between the quotes
download_image('http://handle.slv.vic.gov.au/10381/25896')

Of course, there are lots more fun things you can do with IIIF. We'll explore them in another notebook...


Created by Tim Sherratt for the GLAM Workbench.

If you find this useful, please consider supporting my work on Patreon.