
LOC Colors - Production

Calculate color swatches for historic postcards

Code

In [1]:

from PIL import Image
from sys import exit
from io import BytesIO
from colorsys import rgb_to_hsv, hsv_to_rgb
from scipy.cluster.vq import kmeans
from numpy import array
import requests  # used by colorz() to download the image

In [2]:

DEFAULT_NUM_COLORS = 6
# default minimum and maximum values are used to clamp the color values to a specific range
# originally this was set to 170 and 200, but I'm running with 0 and 256 in order to 
# not clamp the values. This can also be set as a parameter. 
DEFAULT_MINV = 0
DEFAULT_MAXV = 256

THUMB_SIZE = (200, 200)
SCALE = 256.0

def down_scale(x):
    return x / SCALE

def up_scale(x):
    return int(x * SCALE)

The original code by Laura Wrubel uses the RGB (red, green, blue) color space for most color computations.

We're using the HSV (hue, saturation, value) color space for clustering in the hope of getting prettier and more colorful results for our historic postcards.
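As a rough illustration of what the HSV representation exposes, the sketch below converts two RGB colors with the standard library's colorsys module. The helper name rgb255_to_hsv and the sample colors are made up for this example, and it divides by 255 for simplicity, whereas the notebook's down_scale divides by SCALE.

```python
from colorsys import rgb_to_hsv

def rgb255_to_hsv(rgb):
    """Convert an (r, g, b) triple in 0-255 to (h, s, v) floats in 0-1."""
    r, g, b = (c / 255.0 for c in rgb)
    return rgb_to_hsv(r, g, b)

# Hypothetical sample colors (not taken from a postcard)
dark_grey = (40, 40, 40)
vivid_red = (200, 30, 30)

for name, rgb in [("dark grey", dark_grey), ("vivid red", vivid_red)]:
    h, s, v = rgb255_to_hsv(rgb)
    print("{0}: h={1:.2f} s={2:.2f} v={3:.2f}".format(name, h, s, v))
```

The grey has zero saturation and a low value, while the red has high saturation. In HSV, "colorless" and "dark" each map to a single channel, which makes them easy to threshold or to weight during clustering.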

That necessitates modifying some utility functions:

In [3]:

def clamp_hsv(color, min_v, max_v):
    """
    Clamps an HSV color such that the value (brightness) is between min_v and max_v.
    """
    # use down_scale to convert each channel to a value between 0 and 1
    h, s, v = [down_scale(c) for c in color]
    # also convert min_v and max_v to values between 0 and 1
    min_v, max_v = map(down_scale, (min_v, max_v))
    # raise v to min_v if it is lower, then lower it to max_v if it is higher
    v = min(max(min_v, v), max_v)
    # up_scale h, s and the clamped v back to 0-255 and return them as a tuple
    return tuple(map(up_scale, (h, s, v)))


def order_by_hue_hsv(colors):
    """
    Orders HSV colors by hue and returns them as RGB triples.
    """
    hsvs = [list(map(down_scale, color)) for color in colors]
    hsvs.sort(key=lambda t: t[0])
    return [tuple(map(up_scale, hsv_to_rgb(*hsv))) for hsv in hsvs]

All postcards are scanned in front of a black background, and many contain a lot of very dark colors. This function lets us experiment with removing all colors below a certain saturation threshold (colorless, grey-ish colors) or below a certain value threshold (dark colors).

In [4]:

def clip_hsv(colors_hsv, min_s, min_v):
    """
    Drops all colors whose saturation is below min_s or whose value is below min_v.
    """
    min_s = down_scale(min_s)
    min_v = down_scale(min_v)
    hsvs = [tuple(map(down_scale, color)) for color in colors_hsv]
    hsvs = filter(lambda hsv: (hsv[1] >= min_s) and (hsv[2] >= min_v), hsvs)
    return [tuple(map(up_scale, hsv)) for hsv in hsvs]
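The filtering idea can be tried out in isolation. The following is a simplified sketch, not the notebook's function: keep_colorful is a hypothetical helper that compares the 0-255 channels directly instead of scaling them down first, and the pixel values are invented.

```python
def keep_colorful(hsv_colors, min_s, min_v):
    """Keep only HSV triples (0-255 scale) with at least min_s saturation and min_v value."""
    return [c for c in hsv_colors if c[1] >= min_s and c[2] >= min_v]

# Hypothetical HSV pixels on a 0-255 scale
pixels = [
    (0, 0, 5),        # near-black background: fails both thresholds
    (30, 10, 200),    # bright but almost colorless: fails the saturation threshold
    (140, 120, 180),  # saturated and bright: kept
]

print(keep_colorful(pixels, min_s=20, min_v=20))
```

With both thresholds at 20, only the last pixel survives. Setting both thresholds to 0 keeps every pixel.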

If a certain color appears more than once in the picture (when count > 1), we add it to the dataset once per occurrence. This way, large areas of a single color weigh heavily in the resulting clusters:

In [5]:

def get_colors(img, colorspace='HSV'):
    """
    Returns a list of all the image's colors, one entry per pixel.
    """
    w, h = img.size
    # convert(colorspace) converts the image's pixels to the requested color space
    # getcolors() returns an unsorted list of (count, color) tuples
    # passing w * h as maxcolors ensures a result even if every pixel is a unique color
    # unweighted variant, one entry per distinct color:
    # return [color for count, color in img.convert(colorspace).getcolors(w * h)]
    # weighted variant: repeat each color count times so frequent colors weigh more
    return [single_color for count, color in img.convert(colorspace).getcolors(w * h) for single_color in [color] * count]
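The weighting by count can be sketched without an image. The example below assumes input in the (count, color) format that PIL's Image.getcolors() returns; expand_counts and the sample data are hypothetical.

```python
def expand_counts(counted_colors):
    """Repeat each color as many times as its count says."""
    return [color for count, color in counted_colors for _ in range(count)]

# Hypothetical (count, color) pairs in the format returned by Image.getcolors()
counted = [(3, (0, 0, 10)), (1, (20, 200, 220))]
expanded = expand_counts(counted)
print(expanded)
```

The dark color now appears three times in the dataset, so it pulls a cluster center toward itself three times as strongly as the single bright pixel.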

In [6]:

def hexify(rgb):
    return "#{0:02x}{1:02x}{2:02x}".format(*rgb)

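For experimentation, colorz allows two tweaks. The first is scaling the axes of the color space via coefficients: colors that differ mainly along a scaled-down axis are more likely to end up in the same cluster, and colors that differ along a scaled-up axis are less likely to. The second is clipping pixels with low saturation and/or low value.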

The scaling is inverted after the clustering algorithm is executed.
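To see why scaling an axis changes the clustering, it helps to look at Euclidean distance, which is what k-means works with. This is a minimal sketch using only the standard library; scale, unscale and the two sample colors are invented for illustration.

```python
from math import dist

def scale(color, coefficients):
    """Multiply each channel by its coefficient."""
    return tuple(c * k for c, k in zip(color, coefficients))

def unscale(color, coefficients):
    """Invert scale() by dividing each channel by its coefficient."""
    return tuple(c / k for c, k in zip(color, coefficients))

# Two hypothetical HSV colors that differ only in value
a = (100.0, 50.0, 200.0)
b = (100.0, 50.0, 100.0)
coefficients = (1.0, 2.0, 0.6)

print(dist(a, b))                                            # distance before scaling
print(dist(scale(a, coefficients), scale(b, coefficients)))  # distance after scaling
print(unscale(scale(a, coefficients), coefficients))         # approximately a again
```

Scaling the value axis by 0.6 shrinks the distance between the two colors from 100 to roughly 60, so they are more likely to share a cluster. Dividing by the same coefficients afterwards restores the original color space.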

In [7]:

def colorz(image_url, n=DEFAULT_NUM_COLORS, min_v=DEFAULT_MINV, max_v=DEFAULT_MAXV,
           order_colors=True, coefficients=[1.0, 1.0, 1.0], clip_colors=False, min_clip_s=20, min_clip_v=20):
    """
    Get the n most dominant colors of an image.
    Clamps value to between min_v and max_v.

    Total number of colors returned is n, optionally ordered by hue.
    Returns a list of hex color strings (e.g. '#beaea3').

    """
    try:
        r = requests.get(image_url)
    except ValueError:
        # requests raises ValueError subclasses for malformed URLs
        print("{0} was not a valid URL.".format(image_url))
        exit(1)
    img = Image.open(BytesIO(r.content))
    img.thumbnail(THUMB_SIZE) # replace with a thumbnail with same aspect ratio, no larger than THUMB_SIZE

    obs = get_colors(img, 'HSV') # gets a list of HSV triples (0-255 per channel), one entry per pixel
    # adjust the value of each color, if you've chosen to change minimum and maximum values
    clamped = [clamp_hsv(color, min_v, max_v) for color in obs]
    clipped = clip_hsv(clamped, min_clip_s, min_clip_v) if clip_colors else clamped
    # turns the list of colors into a numpy array of floats, then applies scipy's k-means function
    clusters, _ = kmeans(array(clipped).astype(float) * coefficients, n)
    normalized_clusters = clusters / coefficients
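    if order_colors:
        # sorts by hue and converts the HSV cluster centers to RGB
        colors = order_by_hue_hsv(normalized_clusters)
    else:
        # convert the HSV cluster centers to RGB without reordering them
        colors = [tuple(map(up_scale, hsv_to_rgb(*map(down_scale, c)))) for c in normalized_clusters]

    hex_colors = list(map(hexify, colors)) # turn RGB into hex colors for web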
    return hex_colors

In [8]:

def draw_row_with_links(link_and_colors):
    html = ""
    url, colors = link_and_colors
    for count, color in enumerate(colors):
        square = '<rect x="{0}" y="{1}" width="30" height="30" fill="{2}" />'.format(((count * 30)), 0, color)
        html += square
    # wrap the swatch row in a link to the image and close the anchor tag
    full_html = '<a href="{0}" target="_blank"><svg height="30" width="{2}">{1}</svg></a>'.format(url, html, len(colors) * 30)
    return full_html

Test

Let's take a look at how different parameters affect how the swatches look.

We'll need an image link. Grab a link to a IIIF manifest from https://labs.onb.ac.at/en/dataset/akon/ or use the one provided below.

In [9]:

import requests

In [10]:

r = requests.get('https://iiif.onb.ac.at/presentation/AKON/AK115_479/manifest/')
r.json()

Out[10]:

{'@context': 'https://iiif.io/api/presentation/2/context.json',
 '@id': 'https://iiif.onb.ac.at/presentation/AKON/AK115_479/manifest',
 '@type': 'sc:Manifest',
 'label': 'Dresden',
 'metadata': [{'label': [{'@value': 'Id', '@language': 'en'},
    {'@value': 'Id', '@language': 'ger'}],
   'value': 'AK115_479'},
  {'label': [{'@value': 'Title', '@language': 'en'},
    {'@value': 'Titel', '@language': 'ger'}],
   'value': 'Dresden'},
  {'label': [{'@value': 'Place', '@language': 'en'},
    {'@value': 'Ort', '@language': 'ger'}],
   'value': "<a href='https://sws.geonames.org/2935022'>Dresden</a>"},
  {'label': [{'@value': 'Year', '@language': 'en'},
    {'@value': 'Jahr', '@language': 'ger'}],
   'value': '1906'},
  {'label': [{'@value': 'Disseminator', '@language': 'en'},
    {'@value': 'Anbieter', '@language': 'ger'}],
   'value': "<a href='https://akon.onb.ac.at/'>Ansichtskarten Online</a>"},
  {'label': [{'@value': 'Physical Location', '@language': 'en'},
    {'@value': 'Standort', '@language': 'ger'}],
   'value': 'Niederösterreichische Landesbibliothek 1672 - ÖNB'}],
 'description': 'Ministerium, Dampferlandeplatz',
 'viewingDirection': 'left-to-right',
 'viewingHint': 'paged',
 'license': 'http://creativecommons.org/publicdomain/mark/1.0/',
 'attribution': [{'@value': 'Austrian National Library', '@language': 'en'},
  {'@value': 'Österreichische Nationalbibliothek', '@language': 'ger'}],
 'logo': 'https://iiif.onb.ac.at/logo/',
 'seeAlso': [{'@id': 'http://data.onb.ac.at/AKON/AK115_479',
   'format': 'text/html'},
  {'@id': 'http://data.onb.ac.at/AKON/AK115_479.rdf',
   'format': 'application/rdf+xml'}],
 'sequences': [{'@context': 'https://iiif.io/api/presentation/2/context.json',
   '@id': 'https://iiif.onb.ac.at/presentation/AKON/AK115_479/sequence/normal',
   '@type': 'sc:Sequence',
   'startCanvas': 'https://iiif.onb.ac.at/presentation/AKON/AK115_479/canvas/479',
   'canvases': [{'@context': 'https://iiif.io/api/presentation/2/context.json',
     '@id': 'https://iiif.onb.ac.at/presentation/AKON/AK115_479/canvas/479',
     '@type': 'sc:Canvas',
     'label': 'Dresden',
     'height': 1462,
     'width': 2200,
     'images': [{'@context': 'https://iiif.io/api/presentation/2/context.json',
       '@id': 'https://iiif.onb.ac.at/presentation/AKON/AK115_479/annotation/479',
       '@type': 'oa:Annotation',
       'motivation': 'sc:painting',
       'resource': {'@id': 'https://iiif.onb.ac.at/images/AKON/AK115_479/479/full/full/0/native.jpg',
        '@type': 'dctypes:Image',
        'height': 1462,
        'width': 2200,
        'format': 'image/jpeg',
        'service': {'@context': 'https://iiif.io/api/image/2/context.json',
         '@id': 'https://iiif.onb.ac.at/images/AKON/AK115_479/479',
         'profile': 'https://iiif.io/api/image/2/level2.json'}},
       'on': 'https://iiif.onb.ac.at/presentation/AKON/AK115_479/canvas/479'}]}]}]}

The image link can be found under sequences[*].canvases[*].images[*].resource.@id
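The path above can be walked programmatically. This is a minimal sketch, assuming a manifest shaped like the one shown in Out[10]; image_urls is a hypothetical helper and sample_manifest is a trimmed-down stand-in so the example runs without a network request.

```python
def image_urls(manifest):
    """Collect every resource @id under sequences -> canvases -> images."""
    return [
        image["resource"]["@id"]
        for sequence in manifest.get("sequences", [])
        for canvas in sequence.get("canvases", [])
        for image in canvas.get("images", [])
    ]

# Trimmed-down stand-in for the manifest shown above
sample_manifest = {
    "sequences": [{"canvases": [{"images": [{"resource": {
        "@id": "https://iiif.onb.ac.at/images/AKON/AK115_479/479/full/full/0/native.jpg"
    }}]}]}]
}

print(image_urls(sample_manifest))
```

In the notebook, the same function could be called on r.json() instead of the stand-in dictionary.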

In [11]:

image_link = 'https://iiif.onb.ac.at/images/AKON/AK115_479/479/full/!200,200/0/native.jpg'

Let's look at it. For our calculation, we'll use a much smaller variant of the image. The IIIF Image API lets us request an image of a certain size: in the link from the manifest, we replace the second full parameter (the size) with !200,200, meaning the resulting image should fit inside a 200x200 square.

In [12]:

import IPython.display as ipd

im_r = requests.get(image_link)
ipd.display(ipd.Image(im_r.content))
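The size substitution used for image_link can also be done in code. The sketch below assumes the URL ends in the IIIF Image API segments region/size/rotation/quality.format, as the link from the manifest does; fit_within is a hypothetical helper.

```python
def fit_within(image_url, width, height):
    """Swap the size segment of a IIIF Image API URL for a best-fit '!w,h' size."""
    parts = image_url.split("/")
    # the last four segments are region, size, rotation and quality.format,
    # so the size segment sits third from the end
    parts[-3] = "!{0},{1}".format(width, height)
    return "/".join(parts)

full_url = "https://iiif.onb.ac.at/images/AKON/AK115_479/479/full/full/0/native.jpg"
print(fit_within(full_url, 200, 200))
```

For the manifest's image link, this produces the same URL that is assigned to image_link in cell 11.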

Let's create the color swatches...

In [13]:

cols1 = colorz(image_link)
cols1

Out[13]:

['#beaea3', '#494440', '#211d11', '#313c3f', '#8e9da2', '#534a4e']

...and display them as well:

In [14]:

def display_colors(color_array, link):
    html = draw_row_with_links((link, color_array))
    ipd.display(ipd.HTML(html))

In [15]:

display_colors(cols1, image_link)

In [16]:

cols2 = colorz(image_link, coefficients=[1.0, 2.0, 0.6])
cols2

Out[16]:

['#c4aa9e', '#5b5651', '#242411', '#8a9597', '#435158', '#5d5559']

In [17]:

display_colors(cols2, image_link)

In [18]:

cols3 = colorz(image_link, coefficients=[1.0, 2.0, 0.6], clip_colors=True, min_clip_v=30)
cols3

Out[18]:

['#c8aea2', '#4b4540', '#3d3422', '#4b5e62', '#829396', '#594f52']

In [19]:

display_colors(cols3, image_link)

In [20]:

cols5 = colorz(image_link, clip_colors=True, min_clip_s=30, min_clip_v=30)
cols5

Out[20]:

['#c9ac9f', '#49413b', '#332c1c', '#8ba9b0', '#2f393d', '#5d6b72']

In [21]:

display_colors(cols5, image_link)

This is getting tedious.

Let's define a function that computes swatches and then displays the original image and the swatches side by side:

In [22]:

def colorize_and_display(image_link=image_link, **kwargs):
    cols = colorz(image_link, **kwargs)
    display_colors(cols, image_link)
    ipd.display(ipd.Image(requests.get(image_link).content))

In [23]:

colorize_and_display(clip_colors=True, min_clip_s=50, min_clip_v=0)

In [24]:

colorize_and_display(clip_colors=True, min_clip_s=50, min_clip_v=30)

In [25]:

colorize_and_display(clip_colors=True, min_clip_s=20, min_clip_v=30)

In [26]:

colorize_and_display(clip_colors=True, min_clip_s=20, min_clip_v=30, coefficients=[1.0, 2.0, 0.6])

In [27]:

colorize_and_display(clip_colors=True, coefficients=[1.0, 2.0, 0.6], min_clip_s=20, min_clip_v=20)

This looks like a winner to me. We'll use create_swatches.py to create swatches for all available images in batch.
