Accessing Europeana IIIF APIs

The Europeana IIIF APIs allow us to download, share, and reuse the images and text of Europeana newspapers.

This notebook shows how to explore the repository, run a search, read a record, obtain the full text, and create a CSV dataset.

The Europeana IIIF APIs require an API key to access the endpoints. Please register at https://pro.europeana.eu/page/get-api to get a key.

Setting up things

In [ ]:
import csv
import json

import pandas as pd
import requests

Global configuration

In this section, we set our API key and the text that we want to search for.

In [ ]:
api_key = 'add_your_api'  # replace with your own Europeana API key
query = 'paris'
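
Hard-coding a key in a notebook makes it easy to publish it by accident. As one possible alternative, the sketch below reads the key from an environment variable. The variable name EUROPEANA_API_KEY and the helper load_api_key are assumptions made for illustration; they are not part of the Europeana API.

```python
import os


def load_api_key(env_var='EUROPEANA_API_KEY', default='add_your_api'):
    """Return the API key stored in an environment variable.

    Both the variable name and the fallback placeholder are illustrative.
    """
    return os.environ.get(env_var, default)


api_key = load_api_key()
```

If the variable is not set, the placeholder is returned and the API calls will be rejected until a real key is supplied.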

Performing a search using the API

The API allows us to search the text and retrieve the highlighted hits, as traditional search systems do (e.g. Lucene and Solr).

In [ ]:
url = 'https://newspapers.eanadev.org/api/v2/search.json'
r = requests.get(url, params = {'query': query, 'profile': 'hits', 'wskey': api_key })
print(r.url)
response = r.text
#print(response)
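
To see what the printed URL is made of, the following sketch assembles it by hand with the standard library instead of sending a request. The helper name build_search_url is hypothetical; the endpoint and the parameters query, profile, and wskey are the ones used in the cell above.

```python
from urllib.parse import urlencode

SEARCH_URL = 'https://newspapers.eanadev.org/api/v2/search.json'


def build_search_url(query, api_key, profile='hits'):
    """Append the search parameters to the endpoint as a query string."""
    params = {'query': query, 'profile': profile, 'wskey': api_key}
    # urlencode escapes special characters, e.g. a space becomes '+'
    return SEARCH_URL + '?' + urlencode(params)


print(build_search_url('paris', 'YOUR_KEY'))
```

Building the URL this way is only useful for inspection; passing params to requests.get, as the notebook does, takes care of the encoding for us.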

Displaying the mentions in the transcribed text where the search keyword was found

In [ ]:
results = json.loads(response)

for hit in results['hits']:
    print('id:' + hit['scope'])
    for s in hit['selectors']:
        # a selector may lack a prefix or suffix, hence the '' defaults
        print(s.get('prefix', '') + s.get('exact', '') + s.get('suffix', ''))
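
The loop above relies on three fields of each hit: scope, selectors, and the prefix/exact/suffix parts of every selector. The sketch below collects the snippets into a dictionary instead of printing them. The sample response is made up and only mirrors those fields; a real response may contain more.

```python
# Hypothetical response fragment, shaped like the fields the loop above reads.
sample_results = {
    'hits': [
        {
            'scope': '/0000000/example_record',
            'selectors': [
                {'prefix': 'arrived in ', 'exact': 'Paris', 'suffix': ' on Monday'},
                {'exact': 'Paris'},  # prefix and suffix may be absent
            ],
        }
    ]
}


def snippets(results):
    """Map each record id to the list of text snippets where the keyword matched."""
    found = {}
    for hit in results.get('hits', []):
        texts = []
        for selector in hit.get('selectors', []):
            texts.append(selector.get('prefix', '')
                         + selector.get('exact', '')
                         + selector.get('suffix', ''))
        found[hit['scope']] = texts
    return found


print(snippets(sample_results))
```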

Creating a CSV file

In [ ]:
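# buffering=1 (line buffering) flushes each row to disk as it is written, so the file can be read back later in the notebook
csv_out = csv.writer(open('eu_records.csv', 'w', newline='', encoding='utf-8', buffering=1), delimiter = ',', quotechar = '"', quoting = csv.QUOTE_MINIMAL)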
csv_out.writerow(['title', 'thumbnail', 'date', 'license', 'typem', 'language', 'fulltextUrl', 'manifestUrl', 'fulltext'])
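
When all rows are known before writing, a with block is a common way to make sure the file is closed and flushed. The sketch below is a small, self-contained illustration with made-up rows and a temporary file; it is not the notebook's own output file.

```python
import csv
import os
import tempfile

header = ['title', 'date']
rows = [['Example title', '1900-01-01'], ['Title, with comma', '1900-01-02']]  # made-up rows

path = os.path.join(tempfile.mkdtemp(), 'example.csv')  # temporary file for the demo

# newline='' lets the csv module control line endings; the with block
# closes (and therefore flushes) the file when it ends.
with open(path, 'w', newline='', encoding='utf-8') as handle:
    writer = csv.writer(handle, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
    writer.writerow(header)
    writer.writerows(rows)

# Read the file back to confirm the quoting round-trips the comma correctly.
with open(path, newline='', encoding='utf-8') as handle:
    read_back = list(csv.reader(handle))
```

With QUOTE_MINIMAL, only fields that contain the delimiter, the quote character, or a line break are quoted, which matters here because the full text of a newspaper page usually contains all three.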

Retrieving the manifests

A manifest describes the information needed for a viewer to present a digital object to the user, such as the title and the sequence of views/images. We can also retrieve the manifest of each item. According to the Europeana documentation, the request follows the pattern https://iiif.europeana.eu/presentation/[RECORD_ID]/manifest

The manifest includes the metadata; some of the attributes are multivalued.

The full text of a record is available at a URL such as https://www.europeana.eu/api/fulltext/9200303/BibliographicResource_3000059898023/472ef0641de5cce2ba8eb26d67110ed6#char=0,10o
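
As a rough illustration of the URL pattern and of multivalued attributes, the sketch below builds a manifest URL and collects every value stored under a metadata label. The helper names and the sample manifest are hypothetical; the sample only mirrors the label/value/@value shape that this notebook reads. Stripping slashes from the record id is an assumption made so that the URL does not end up with a double slash.

```python
def manifest_url(record_id):
    """Build the manifest URL for a record id such as '9200303/BibliographicResource_3000059898023'."""
    return 'https://iiif.europeana.eu/presentation/' + record_id.strip('/') + '/manifest'


def metadata_values(manifest, label):
    """Return every value stored under `label`, since an attribute may hold several."""
    values = []
    for entry in manifest.get('metadata', []):
        if entry.get('label') == label:
            values.extend(v.get('@value', '') for v in entry.get('value', []))
    return values


# Hypothetical manifest fragment; the values are made up.
sample_manifest = {
    'metadata': [
        {'label': 'type', 'value': [{'@value': 'Newspaper'}, {'@value': 'Text'}]},
        {'label': 'language', 'value': [{'@value': 'fr'}]},
    ]
}

print(manifest_url('/9200303/BibliographicResource_3000059898023'))
print(metadata_values(sample_manifest, 'type'))
```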

In [ ]:
results = json.loads(response)

for r in results['hits']:
    
    title = thumbnail = date = license = typem = language = fulltextUrl = manifestUrl = fulltext =''
    
    manifestUrl = 'https://iiif.europeana.eu/presentation/' + r['scope'] + '/manifest'
    responseManifest = requests.get(manifestUrl, params = {'wskey': api_key })
    print(responseManifest.url)
    
    # parsing the manifest
    m = json.loads(responseManifest.text)
    
    # retrieving the single-valued metadata
    title = m['label'][0]['@value']
    thumbnail = m['thumbnail']['@id']
    date = m['navDate']
    license = m['license']

    # 'type' and 'language' may hold several values; only the first is kept
    for i in m['metadata']:
        if i['label'] == 'type':
            typem = i['value'][0]['@value']
        elif i['label'] == 'language':
            language = i['value'][0]['@value']
        
    ## getting the full text
    annopageUrl = 'https://iiif.europeana.eu/presentation/' + r['scope'] + '/annopage/1'
    responseAnnopage = requests.get(annopageUrl, params = {'wskey': api_key })
    print(responseAnnopage.url)
    
    a = json.loads(responseAnnopage.text)
    fulltextUrl = a['resources'][0]['resource']['@id']
    print(fulltextUrl)
    
    responseFulltext = requests.get(fulltextUrl, params = {'wskey': api_key })
   
    # parsing the full-text response
    f = json.loads(responseFulltext.text)
    # TODO check encoding
    fulltext = f['value']
   
    print('-------')
    
    csv_out.writerow([title, thumbnail, date, license, typem, language, fulltextUrl, manifestUrl, fulltext])
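
The direct lookups in the loop above raise an error as soon as a record lacks one of the expected fields. One way to make the loop more tolerant is a small lookup helper that returns a default instead. The helper dig below is hypothetical and not part of any library; the sample dictionary is made up.

```python
def dig(data, *path, default=''):
    """Follow a sequence of keys and indexes into nested dicts and lists."""
    current = data
    for step in path:
        try:
            current = current[step]
        except (KeyError, IndexError, TypeError):
            # the field is missing or has an unexpected shape
            return default
    return current


sample = {'label': [{'@value': 'Example title'}]}  # hypothetical manifest fragment
title = dig(sample, 'label', 0, '@value')
thumbnail = dig(sample, 'thumbnail', '@id')  # missing, so the default '' is returned
```

With such a helper, a record without a thumbnail would produce an empty CSV cell rather than stopping the whole run.
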
In [ ]:
# Load the CSV file written above.
# This puts the data in a pandas DataFrame
df = pd.read_csv('eu_records.csv')

Have a peek

In [ ]:
df

Once we have queried the repository and we have the metadata as a CSV file, let's show the results as a thumbnail gallery.

In [ ]:
from IPython.display import HTML, Image

def _src_from_data(data):
    """Base64 encodes image bytes for inclusion in an HTML img element"""
    img_obj = Image(data=data)
    for bundle in img_obj._repr_mimebundle_():
        for mimetype, b64value in bundle.items():
            if mimetype.startswith('image/'):
                return f'data:{mimetype};base64,{b64value}'

def gallery(images, row_height='auto'):
    """Shows a set of images in a gallery that flexes with the width of the notebook.
    
    Parameters
    ----------
    images: list of str or bytes
        URLs or bytes of images to display

    row_height: str
        CSS height value to assign to all images. Set to 'auto' by default to show images
        with their native dimensions. Set to a value like '250px' to make all rows
        in the gallery equal height.
    """
    figures = []
    for image in images:
        if isinstance(image, bytes):
            src = _src_from_data(image)
            caption = ''
        else:
            src = image
            caption = f'<figcaption style="font-size: 0.6em">{image}</figcaption>'
        figures.append(f'''
            <figure style="margin: 5px !important;">
              <img src="{src}" style="height: {row_height}">
              
            </figure>
        ''')
    return HTML(data=f'''
        <div style="display: flex; flex-flow: row wrap; text-align: center;">
        {''.join(figures)}
        </div>
    ''')
In [ ]:
#gallery(urls, row_height='150px')
gallery(df['thumbnail'], row_height='150px')