To participate, you'll need to git clone (or download the .zip from GitHub):

You can do that in git using:

git clone --depth=1
If you have already cloned the material, please issue `git pull` now and reload the notebook to ensure that you have the latest updates.
In [ ]:
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

Images are NumPy arrays

Images are represented in scikit-image using standard NumPy arrays. This allows maximum interoperability with other libraries in the scientific Python ecosystem, such as Matplotlib and SciPy.

Let's see how to build a grayscale image as a 2D array:

In [ ]:
import numpy as np
from matplotlib import pyplot as plt

random_image = np.random.random([500, 500])

plt.imshow(random_image, cmap='gray')
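
Because a grayscale image is just a 2D array, it can also be built by hand rather than from random noise. The following is a minimal sketch, not part of the original material: it builds a horizontal gradient (the names `ramp` and `gradient` are illustrative) and inspects the same properties one would check on any image.

```python
import numpy as np

# One row of 256 values rising from 0.0 (black) to 1.0 (white) ...
ramp = np.linspace(0, 1, 256)

# ... repeated over 100 rows to give a 2D (row, column) image.
gradient = np.tile(ramp, (100, 1))

print("shape:", gradient.shape)   # (rows, columns)
print("dtype:", gradient.dtype)
print("min/max:", gradient.min(), gradient.max())
```

Passing `gradient` to `plt.imshow(gradient, cmap='gray')` should display a ramp from black on the left to white on the right.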

The same holds for "real-world" images:

In [ ]:
from skimage import data

coins = data.coins()

print('Type:', type(coins))
print('dtype:', coins.dtype)
print('shape:', coins.shape)

plt.imshow(coins, cmap='gray');

A color image is a 3D array, where the last dimension has size 3 and represents the red, green, and blue channels:

In [ ]:
cat = data.chelsea()
print("Shape:", cat.shape)
print("Values min/max:", cat.min(), cat.max())


These are just NumPy arrays. E.g., we can make a red square by using standard array slicing and manipulation:

In [ ]:
cat[10:110, 10:110, :] = [255, 0, 0]  # [red, green, blue]
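
The assignment above relies on NumPy broadcasting: the three-element color is copied to every pixel in the selected block. As a small sketch on a synthetic array (the name `canvas` and its size are assumptions chosen so the result is easy to inspect), the same operation looks like this:

```python
import numpy as np

# A small black RGB image: 6 rows, 8 columns, 3 channels.
canvas = np.zeros((6, 8, 3), dtype=np.uint8)

# The length-3 color is broadcast to every pixel in the selected block.
canvas[1:4, 2:5, :] = [255, 0, 0]  # [red, green, blue]

print("inside the square: ", canvas[2, 3])
print("outside the square:", canvas[0, 0])
print("red pixels:", int((canvas[..., 0] == 255).sum()))
```

The slice `1:4, 2:5` selects 3 rows and 3 columns, so exactly 9 pixels change.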

Images can also include transparent regions by adding a 4th channel, called an alpha layer. The array stays 3D; only the size of the last axis grows from 3 to 4.
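
As a rough illustration of the alpha layer, the sketch below appends a fourth channel to a synthetic RGB array. The array sizes and the names `rgb`, `alpha`, and `rgba` are assumptions made for the example; 255 is taken to mean fully opaque and 0 fully transparent, as is usual for 8-bit RGBA data.

```python
import numpy as np

# Opaque mid-gray RGB image: 4 rows, 4 columns, 3 channels.
rgb = np.full((4, 4, 3), 128, dtype=np.uint8)

# Alpha channel: 255 is fully opaque, 0 is fully transparent.
alpha = np.full((4, 4), 255, dtype=np.uint8)
alpha[:, :2] = 0  # make the left half transparent

# Append alpha as a fourth channel along the last axis.
rgba = np.dstack([rgb, alpha])

print("RGB shape: ", rgb.shape)
print("RGBA shape:", rgba.shape)
```

The result still has three axes (row, column, channel); only the channel axis has grown.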

Other shapes, and their meanings

  Image type                      Coordinates
  2D grayscale                    (row, column)
  2D multichannel                 (row, column, channel)
  3D grayscale (or volumetric)    (plane, row, column)
  3D multichannel                 (plane, row, column, channel)
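
To see how these coordinate conventions map onto array shapes, here is a short sketch using empty arrays. The sizes (5 planes, 40 rows, 60 columns, 3 channels) and variable names are hypothetical.

```python
import numpy as np

# Hypothetical sizes: 5 planes, 40 rows, 60 columns, 3 channels.
gray_2d = np.zeros((40, 60))
color_2d = np.zeros((40, 60, 3))
gray_3d = np.zeros((5, 40, 60))
color_3d = np.zeros((5, 40, 60, 3))

for name, arr in [("2D grayscale", gray_2d), ("2D multichannel", color_2d),
                  ("3D grayscale", gray_3d), ("3D multichannel", color_3d)]:
    print(f"{name:16s} ndim={arr.ndim} shape={arr.shape}")

# Indexing follows the same order as the shape.
plane = gray_3d[2]         # one (row, column) slice of the volume
pixel = color_2d[10, 20]   # the 3 channel values at row 10, column 20
```

Note that the shape alone can be ambiguous: a `(40, 60, 3)` array could in principle be a 2D color image or a very thin volume, so the interpretation is a convention the caller has to know.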

Displaying images using matplotlib

In [ ]:
from skimage import data

img0 = data.chelsea()
img1 = data.rocket()
In [ ]:
import matplotlib.pyplot as plt

f, (ax0, ax1) = plt.subplots(1, 2, figsize=(20, 10))

ax0.imshow(img0)
ax0.set_title('Cat', fontsize=18)
ax0.axis('off')

ax1.imshow(img1)
ax1.set_title('Rocket', fontsize=18)
ax1.set_xlabel(r'Launching position $\alpha=320$')

ax1.vlines([202, 300], 0, img1.shape[0], colors='magenta', linewidth=3, label='Side tower position')
ax1.plot([168, 190, 200], [400, 200, 300], color='white', linestyle='--', label='Side angle')

ax1.legend();

For more on plotting, see the Matplotlib documentation and pyplot API.

Data types and image values

In the literature, one finds different conventions for representing image values:

  0 - 255   where  0 is black, 255 is white
  0 - 1     where  0 is black, 1 is white

scikit-image supports both conventions; the choice is determined by the data type of the array.

For example, here we generate two valid images:

In [ ]:
linear0 = np.linspace(0, 1, 2500).reshape((50, 50))
linear1 = np.linspace(0, 255, 2500).reshape((50, 50)).astype(np.uint8)

print("Linear0:", linear0.dtype, linear0.min(), linear0.max())
print("Linear1:", linear1.dtype, linear1.min(), linear1.max())

fig, (ax0, ax1) = plt.subplots(1, 2, figsize=(15, 15))
ax0.imshow(linear0, cmap='gray')
ax1.imshow(linear1, cmap='gray');

The library is designed in such a way that any data-type is allowed as input, as long as the range is correct (0-1 for floating point images, 0-255 for unsigned bytes, 0-65535 for unsigned 16-bit integers).
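
The integer ranges quoted above are not arbitrary; they are the limits of the corresponding dtypes and can be read from NumPy directly. The following sketch collects them in a dictionary (the name `ranges` is illustrative); the floating point entry is written by hand because 0-1 is a convention rather than a limit of the dtype.

```python
import numpy as np

# Integer image ranges follow directly from the dtype.
ranges = {}
for dtype in (np.uint8, np.uint16):
    info = np.iinfo(dtype)
    ranges[np.dtype(dtype).name] = (info.min, info.max)

# Floating point images follow the 0-1 convention instead; the dtype itself
# could hold far larger values, so this range is a convention to respect.
ranges["float64"] = (0.0, 1.0)

for name, (low, high) in ranges.items():
    print(f"{name}: {low} to {high}")
```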

You can convert images between different representations by using img_as_float, img_as_ubyte, etc.:

In [ ]:
from skimage import img_as_float, img_as_ubyte

image = data.chelsea()

image_ubyte = img_as_ubyte(image)
image_float = img_as_float(image)

print("type, min, max:", image_ubyte.dtype, image_ubyte.min(), image_ubyte.max())
print("type, min, max:", image_float.dtype, image_float.min(), image_float.max())
print("231/255 =", 231/255.)

Your code would then typically look like this:

def my_function(any_image):
   float_image = img_as_float(any_image)
   # Proceed, knowing image is in [0, 1]

We recommend using the floating point representation, given that scikit-image mostly uses that format internally.
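
To make the conversion concrete, here is a NumPy-only sketch of what rescaling an 8-bit image to floating point amounts to. The helper name `ubyte_to_float` is hypothetical and covers only the `uint8` case; `img_as_float` itself handles many more input dtypes, so treat this as an illustration rather than a replacement.

```python
import numpy as np

def ubyte_to_float(image):
    """Rescale a uint8 image from 0-255 to 0-1 (illustrative helper only)."""
    if image.dtype != np.uint8:
        raise TypeError("this sketch only handles uint8 input")
    return image.astype(np.float64) / 255

pixels = np.array([[0, 231, 255]], dtype=np.uint8)
as_float = ubyte_to_float(pixels)

print(as_float.dtype, as_float.min(), as_float.max())
print("middle value:", as_float[0, 1], "vs 231/255 =", 231 / 255)
```

Converting once at the top of a function, as in `my_function` above, means the rest of the code can assume values in [0, 1] regardless of how the image was stored.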

Image I/O

Most of the time, we won't be working with images from the scikit-image example data sets. Our own images are typically stored in JPEG or PNG format. Since scikit-image operates on NumPy arrays, any image reader library that returns arrays will do. Options include imageio, matplotlib, pillow, etc.

scikit-image conveniently wraps many of these in the io submodule, and will use whichever of the libraries mentioned above are installed:

In [ ]:
from skimage import io

image = io.imread('../images/balloon.jpg')

print(image.min(), image.max())


We also have the ability to load multiple images, or multi-layer TIFF images:

In [ ]:
ic = io.ImageCollection('../images/*.png:../images/*.jpg')

print('Type:', type(ic))

In [ ]:
import os

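f, axes = plt.subplots(nrows=3, ncols=len(ic) // 3 + 1, figsize=(20, 5))

# subplots returns the figure and a 2D array of axes;
# `axes.ravel()` flattens it into a 1D array we can index directly
axes = axes.ravel()

# hide the ticks and frames, including on any unused axes
for ax in axes:
    ax.axis('off')

for i, image in enumerate(ic):
    axes[i].imshow(image, cmap='gray')
    axes[i].set_title(os.path.basename(ic.files[i]))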

Aside: enumerate

enumerate gives us each element in a container, along with its position.

In [ ]:
animals = ['cat', 'dog', 'leopard']
In [ ]:
for i, animal in enumerate(animals):
    print('The animal in position {} is {}'.format(i, animal))
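
Two further details of `enumerate` are sometimes handy. This short sketch reuses the `animals` list from above and shows the optional `start` argument, and the fact that each item produced is an (index, element) tuple.

```python
animals = ['cat', 'dog', 'leopard']

# `start` shifts the counter without changing the items.
for position, animal in enumerate(animals, start=1):
    print(f"{position}. {animal}")

# Each item is an (index, element) tuple, so the pairs can be collected too.
pairs = list(enumerate(animals))
print(pairs)
```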

Exercise: draw the letter H

Define a function that takes as input an RGB image and a pair of coordinates (row, column), and returns a copy with a green letter H overlaid at those coordinates. The coordinates point to the top-left corner of the H.

The arms and strut of the H should have a width of 3 pixels, and the H itself should have a height of 24 pixels and width of 20 pixels.

Start with the following template:

In [ ]:
def draw_H(image, coords, color=(0, 255, 0)):
    out = image.copy()
    return out 

Test your function like so:

In [ ]:
cat = data.chelsea()
cat_H = draw_H(cat, (50, -50))

Exercise: visualizing RGB channels

Display the different color channels of the image, each as a grayscale image. Start with the following template:

In [ ]:
# --- read in the image ---

image = plt.imread('../images/Bells-Beach.jpg')

# --- assign each color channel to a different variable ---

r = ...
g = ...
b = ...

# --- display the image and r, g, b channels ---

f, axes = plt.subplots(1, 4, figsize=(16, 5))

for ax in axes:
    ax.axis('off')

(ax_r, ax_g, ax_b, ax_color) = axes
ax_r.imshow(r, cmap='gray')
ax_r.set_title('red channel')

ax_g.imshow(g, cmap='gray')
ax_g.set_title('green channel')

ax_b.imshow(b, cmap='gray')
ax_b.set_title('blue channel')

# --- Here, we stack the R, G, and B layers again
#     to form a color image ---
ax_color.imshow(np.stack([r, g, b], axis=2))
ax_color.set_title('all channels');

Now, take a look at the following R, G, and B channels. How would their combination look? (Write some code to confirm your intuition.)

In [ ]:
from skimage import draw

red = np.zeros((300, 300))
green = np.zeros((300, 300))
blue = np.zeros((300, 300))

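# NOTE: the center rows below (100, 100, 200) are assumed; the source cell
# is truncated and preserves only the column and radius of each disk.
r, c = draw.disk((100, 100), 100, shape=red.shape)
red[r, c] = 1

r, c = draw.disk((100, 200), 100, shape=green.shape)
green[r, c] = 1

r, c = draw.disk((200, 150), 100, shape=blue.shape)
blue[r, c] = 1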

f, axes = plt.subplots(1, 3)
for (ax, channel) in zip(axes, [red, green, blue]):
    ax.imshow(channel, cmap='gray')

Exercise: Convert to grayscale ("black and white")

The relative luminance of an image is the intensity of light coming from each point. Different colors contribute differently to the luminance: it's very hard to have a bright, pure blue, for example. So, starting from an RGB image, the luminance is given by:

$$ Y = 0.2126R + 0.7152G + 0.0722B $$
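
As a quick worked example of the formula, the sketch below evaluates it for a few single pixels with channel values in [0, 1]. It uses plain arithmetic on one pixel at a time; the names `WEIGHTS` and `luminance` are illustrative only.

```python
# Weights from the formula above.
WEIGHTS = {"R": 0.2126, "G": 0.7152, "B": 0.0722}

def luminance(r, g, b):
    """Relative luminance of one pixel with channel values in [0, 1]."""
    return WEIGHTS["R"] * r + WEIGHTS["G"] * g + WEIGHTS["B"] * b

for name, rgb in [("red", (1, 0, 0)), ("green", (0, 1, 0)),
                  ("blue", (0, 0, 1)), ("white", (1, 1, 1))]:
    print(f"{name:6s} Y = {luminance(*rgb):.4f}")
```

The three weights sum to 1, so a white pixel keeps full luminance, while pure blue contributes only about 7% of it.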

Use Python's matrix multiplication operator, @ (available since Python 3.5), to convert an RGB image to a grayscale luminance image according to the formula above.

Compare your result to that obtained with skimage.color.rgb2gray.

Change the coefficients to 1/3 (i.e., take the mean of the red, green, and blue channels) to see how that approach compares with rgb2gray.

In [ ]:
from skimage import color, img_as_float

image = img_as_float(io.imread('../images/balloon.jpg'))

gray = color.rgb2gray(image)
my_gray = ...

# --- display the results ---

f, (ax0, ax1) = plt.subplots(1, 2, figsize=(10, 6))

ax0.imshow(gray, cmap='gray')
ax0.set_title('skimage.color.rgb2gray')

ax1.imshow(my_gray, cmap='gray')
ax1.set_title('my rgb2gray')