Images are simply n-dimensional regular arrays of pixels where n >= 2. Each pixel has k channels of information and, once instantiated, k >= 1 for all images. All pixels and all channels are of the same data type, but there is no restriction in general on what that data type is.
There are two main subclasses of Image: BooleanImage and MaskedImage. The vast majority of the functionality is provided by Image and hence available on all three Image types. The subclasses are simple specializations which will be explained in this notebook.
We use the syntax
(i, j, ...., l, [k])
to declare an image shape. The i is the size of the first image dimension, j the second. This image is of n dimensions, the final spatial dimension being of size l. The [] symbolizes that this is the channel axis; this image has k channels. A few examples for clarity:
(320, 240, [1])
(1700, 1650, [3])
(1024, 1024, 1024, [1])
Commonly the channel has an explicit meaning; these are symbolized by a <>. For example:
(320, 240, <I>)
(1700, 1650, <R, G, B>)
(1024, 768, <Z>)
We now use this notation to explain all of the image classes. As a final note, some classes are fixed to have only one channel. The constructors for these images don't expect you to pass in a numpy array with a redundant trailing axis. To signify this, the channel signature includes exclamation marks to show that it is implicitly generated for you, for example
(1024, 768, !<Z>!)
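As a rough illustration of the notation above, the sketch below splits a raw array shape into its spatial part and its channel count. `split_shape` is a hypothetical helper written for this example, not part of menpo, and it assumes the channel axis is always the last one.

```python
def split_shape(pixels_shape):
    """Split (i, j, ..., l, k) into the spatial shape (i, j, ..., l) and k.

    Hypothetical helper for illustration only; assumes the channel axis is last.
    """
    if len(pixels_shape) < 3:
        raise ValueError("expected at least two spatial axes plus a channel axis")
    *spatial, n_channels = pixels_shape
    return tuple(spatial), n_channels


# (1700, 1650, [3]) in the notation used here
spatial, k = split_shape((1700, 1650, 3))
print(spatial, k)
```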
To aid with the explanations, let's import the good old Takeo and Lenna images. import_builtin_asset(asset_name) allows us to quickly grab a few builtin images.
%matplotlib inline
import numpy as np
import menpo.io as mio
lenna = mio.import_builtin_asset('lenna.png')
takeo_rgb = mio.import_builtin_asset('takeo.ppm')
# Takeo is RGB with repeated channels - convert to greyscale
takeo = takeo_rgb.as_greyscale(mode='average')
print('Lenna is a {}'.format(type(lenna)))
print('Takeo is a {}'.format(type(takeo)))
Lenna is a <class 'menpo.image.masked.MaskedImage'>
Takeo is a <class 'menpo.image.masked.MaskedImage'>
(i, j, ...., n, [k])
All images are Image instances, and the bulk of the functionality can be explored in this one class.
from menpo.image import Image
print("Lenna 'isa' Image: {}".format(isinstance(lenna, Image)))
print("Takeo 'isa' Image: {}".format(isinstance(takeo, Image)))
Lenna 'isa' Image: True
Takeo 'isa' Image: True
pixels
The pixel data is stored on the self.pixels property. The final axis, self.pixels[..., -1], is referred to as the channel axis. It is always present on an instantiated subclass of Image (even if, for instance, we know the number of channels to always be 1).
# takeo was converted to greyscale above, so it has a single channel
print('Takeo shape:{}'.format(takeo.pixels.shape))
print('The number of channels in Takeo is {}'.format(takeo.pixels.shape[-1]))
print("But the right way to find out is with the 'n_channels' property: {}".format(takeo.n_channels))
print('n_channels for Lenna is {}'.format(lenna.n_channels))
Takeo shape:(225L, 150L, 1L)
The number of channels in Takeo is 1
But the right way to find out is with the 'n_channels' property: 1
n_channels for Lenna is 3
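To make the always-present channel axis concrete, here is a small NumPy-only sketch of a channel average that keeps the trailing axis, in the spirit of the greyscale conversion used for Takeo. It is an illustration under the assumption of a plain ndarray laid out as (height, width, channels); it is not menpo's implementation of as_greyscale.

```python
import numpy as np

# Stand-in for an RGB image: (height, width, channels).
rgb = np.empty((225, 150, 3))
rgb[..., 0] = 0.2  # first channel
rgb[..., 1] = 0.4  # second channel
rgb[..., 2] = 0.6  # third channel

# Average over the channel axis; keepdims=True keeps a trailing axis of size 1,
# so the result still has the (i, j, [k]) layout with k == 1.
grey = rgb.mean(axis=-1, keepdims=True)

print(grey.shape)      # spatial shape plus a single channel
print(grey.shape[-1])  # number of channels
```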
shape
The self.shape of the image is the spatial shape of the array, that is (i, j, ..., n), without the channel axis.
print('Takeo has a shape of {}'.format(takeo.shape))
print('Lenna has a shape of {}'.format(lenna.shape))
Takeo has a shape of (225L, 150L)
Lenna has a shape of (512L, 512L)
width and height
The 0th axis of pixels is the 'height' or 'y' axis; it starts at the top of the image and runs down. The 1st axis is the 'width' or 'x' axis; it starts from the left of the image and runs across.
Most of the time worrying about this will lead you into hot water. It's a lot better not to get bogged down in the terminology and just consider the image as an array, like all our other data. As a result, all our algorithms, such as gradient, are ordered by axis 0, 1, ..., n, not x, y, z (as this would be axis 1, 0, 2). The self.shape we printed above was the shape of the underlying array, and so was semantically (height, width). You can use the self.width and self.height properties to check this for yourself if you ever get confused.
print("Takeo's arrangement in memory (for maths) is {}".format(takeo.shape))
print('Semantically, Takeo has W:{} H:{}'.format(takeo.width, takeo.height))
print(takeo) # shows the common semantic labels
Takeo's arrangement in memory (for maths) is (225L, 150L)
Semantically, Takeo has W:150 H:225
150W x 225H 2D MaskedImage with 1 channels. Attached mask 100.0% true
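The axis-ordered convention described above can be seen with plain NumPy. This sketch uses np.gradient on a small ramp purely as an illustration of "axis 0 first, then axis 1"; it says nothing about how menpo computes gradients internally.

```python
import numpy as np

# A 3 (height) x 4 (width) ramp whose values increase from left to right.
ramp = np.tile(np.arange(4.0), (3, 1))

# np.gradient returns one array per axis, in axis order: axis 0 (y) first,
# then axis 1 (x).
d_axis0, d_axis1 = np.gradient(ramp)

# The array shape is (height, width), not (width, height).
height, width = ramp.shape

print(height, width)
print(d_axis0)  # change going down the image
print(d_axis1)  # change going across the image
```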
centre
# note that this is (axis0, axis1), which is (height, width) or (Y, X)!
print('The centre of Takeo is {}'.format(takeo.centre))
The centre of Takeo is [ 112.5 75. ]
counts
n_pixels is channel independent. To find the total size of the array (including channels) use n_elements.
print('Lenna n_dims : {}'.format(lenna.n_dims))
print('Lenna n_channels : {}'.format(lenna.n_channels))
print('Lenna n_pixels : {}'.format(lenna.n_pixels))
print('Lenna n_elements: {}'.format(lenna.n_elements))
Lenna n_dims : 2
Lenna n_channels : 3
Lenna n_pixels : 262144
Lenna n_elements: 786432
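The relationship between these counts is simple arithmetic on the shape. The sketch below reproduces the Lenna numbers from a spatial shape and channel count; the variable names mirror the properties above but are ordinary local variables, not menpo calls.

```python
from math import prod

shape = (512, 512)  # spatial shape, no channel axis
n_channels = 3

n_dims = len(shape)                 # number of spatial dimensions
n_pixels = prod(shape)              # channel independent
n_elements = n_pixels * n_channels  # total size of the underlying array

print(n_dims, n_pixels, n_elements)
```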
view
takeo.view()
<menpo.visualize.viewmatplotlib.MatplotlibImageViewer2d at 0x42e5be0>
lenna.view()
<menpo.visualize.viewmatplotlib.MatplotlibImageViewer2d at 0x13d1cda0>
You can pass channels=x to inspect a single channel of the image.
# viewing Lenna's red channel (channel 0 of <R, G, B>)
lenna.view(channels=0)
<menpo.visualize.viewmatplotlib.MatplotlibImageViewer2d at 0x1d42da0>
crop
Two methods are available: crop(), which is in place, and cropped_copy(), which returns the cropped image without modifying the instance it is called on. Both execute identical code paths. To crop we provide the minimum values per dimension where we want the crop to start, and the maximum values where we want the crop to end. For example, to crop Takeo from the centre down to the bottom corner, we could do
takeo_cropped = takeo.cropped_copy(takeo.centre, np.array(takeo.shape))
takeo_cropped.view()
<menpo.visualize.viewmatplotlib.MatplotlibImageViewer2d at 0x13def2e8>
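A minimal sketch of min/max cropping on a bare ndarray follows. `crop_array` is a hypothetical helper, and the choice to floor the minimum, ceil the maximum and clamp both to the array bounds is an assumption made for this example; menpo's own rounding may differ.

```python
import numpy as np


def crop_array(pixels, min_indices, max_indices):
    """Return a copy of `pixels` cropped to [min, max) along each spatial axis.

    Hypothetical helper: the last axis is treated as the channel axis and is
    never cropped. Fractional bounds are floored (min) and ceiled (max).
    """
    spatial_shape = pixels.shape[:-1]
    lo = np.clip(np.floor(min_indices).astype(int), 0, spatial_shape)
    hi = np.clip(np.ceil(max_indices).astype(int), 0, spatial_shape)
    slices = tuple(slice(a, b) for a, b in zip(lo, hi))
    return pixels[slices].copy()


pixels = np.zeros((225, 150, 1))      # same layout as Takeo
centre = np.array(pixels.shape[:-1]) / 2.0
cropped = crop_array(pixels, centre, pixels.shape[:-1])
print(cropped.shape)
```

Returning a copy mirrors the non-destructive behaviour described for cropped_copy(): the original array keeps its shape.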
rescale
lenna_double = lenna.rescale(2.0)
print(lenna_double)
1024W x 1024H 2D MaskedImage with 3 channels. Attached mask 100.0% true
landmark support
breakingbad = mio.import_builtin_asset('breakingbad.jpg')
breakingbad.landmarks['PTS'].view()
<menpo.visualize.viewmatplotlib.MatplotlibLandmarkViewer2dImage at 0x1479cc18>
print(breakingbad.landmarks['PTS'])
LandmarkGroup: label: PTS, n_labels: 1, n_points: 68
It can sometimes be useful to constrain an image to be bounded around its landmarks. A convenience method exists to do just this.
breakingbad.crop_to_landmarks(boundary=20)
# note that this method is smart enough to not stray outside the boundary of the image
breakingbad.landmarks['PTS'].view()
<menpo.visualize.viewmatplotlib.MatplotlibLandmarkViewer2dImage at 0x174aa710>
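The idea of padding the landmark bounding box without straying outside the image can be sketched with NumPy alone. `landmark_bounds` and the sample points are hypothetical and only illustrate the clamping; they are not taken from menpo.

```python
import numpy as np


def landmark_bounds(points, image_shape, boundary=0):
    """Bounding box of `points` grown by `boundary`, clamped to the image.

    Hypothetical helper; points are (axis0, axis1) coordinates.
    """
    points = np.asarray(points, dtype=float)
    lo = np.floor(points.min(axis=0)) - boundary
    hi = np.ceil(points.max(axis=0)) + boundary
    lo = np.clip(lo, 0, image_shape).astype(int)
    hi = np.clip(hi, 0, image_shape).astype(int)
    return lo, hi


points = [(10.0, 30.0), (50.0, 90.0), (25.0, 60.0)]
lo, hi = landmark_bounds(points, image_shape=(100, 100), boundary=20)
print(lo, hi)
```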
(i, j, ...., n, !<B>!)
The first concrete Image subclass we will look at is BooleanImage. This is an n-dimensional image with a single channel per pixel. The datatype of this image is np.bool. First, remember that BooleanImage is a subclass of Image and so all of the above features apply again.
from menpo.image import BooleanImage
random_seed = np.random.random(lenna.shape) # shape doesn't include channel - and that's what we want
random_mask = BooleanImage(random_seed > 0.5)
print("the mask's shape is as expected: {}".format(random_mask.shape))
print("the channel has been added to the mask's pixel's shape for us: {}".format(random_mask.pixels.shape))
random_mask.view()
the mask's shape is as expected: (512L, 512L)
the channel has been added to the mask's pixel's shape for us: (512L, 512L, 1L)
<menpo.visualize.viewmatplotlib.MatplotlibImageViewer2d at 0x174c2160>
Note that the constructor for BooleanImage doesn't require you to pass in the redundant channel axis. It's created for you.
blank()
A blank image of a given shape can be created with the blank() method. You can rely on this existing on every concrete Image class.
all_true_mask = BooleanImage.blank((120, 240))
all_false_mask = BooleanImage.blank((120, 240), fill=False)
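In plain NumPy terms, the implicit channel axis and a blank mask might look like the sketch below. This only mimics the resulting array layout under the assumption that the channel axis is appended last; it is not how BooleanImage is implemented.

```python
import numpy as np

rng = np.random.default_rng(0)
mask_2d = rng.random((512, 512)) > 0.5  # spatial shape only, no channel axis

# Append a trailing axis of size 1 so the layout is (i, j, [1]).
mask_pixels = mask_2d[..., None]

# Blank masks: every entry filled with one value.
all_true = np.ones((120, 240, 1), dtype=bool)
all_false = np.zeros((120, 240, 1), dtype=bool)

print(mask_2d.shape, mask_pixels.shape)
```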
metrics
print('n_pixels on random_mask: {}'.format(random_mask.n_pixels))
print('n_true pixels on random_mask: {}'.format(random_mask.n_true))
print('n_false pixels on random_mask: {}'.format(random_mask.n_false))
print('proportion_true on random_mask: {:.3}'.format(random_mask.proportion_true))
print('proportion_false on random_mask: {:.3}'.format(random_mask.proportion_false))
n_pixels on random_mask: 262144
n_true pixels on random_mask: 131157
n_false pixels on random_mask: 130987
proportion_true on random_mask: 0.5
proportion_false on random_mask: 0.5
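These metrics correspond to simple reductions over a boolean array. The sketch below computes the same quantities with NumPy on a freshly generated random mask, so the exact counts will differ from the output above.

```python
import numpy as np

rng = np.random.default_rng(0)
mask = rng.random((512, 512)) > 0.5

n_pixels = mask.size
n_true = int(mask.sum())  # True counts as 1
n_false = n_pixels - n_true
proportion_true = n_true / n_pixels
proportion_false = n_false / n_pixels

print(n_pixels, n_true, n_false)
print('{:.3} {:.3}'.format(proportion_true, proportion_false))
```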
true_indices/false_indices
BooleanImage has functionality that aids in the use of the class as a mask to another image. The indices properties give you access to the coordinates of the True and False values as if the mask had been flattened.
from copy import deepcopy
small_amount_true = deepcopy(all_false_mask)
small_amount_true.pixels[4, 8] = True
small_amount_true.pixels[15, 56] = True
small_amount_true.pixels[0, 4] = True
print(small_amount_true.true_indices) # note the ordering is incremental C ordered
print('The shape of true indices: {}'.format(small_amount_true.true_indices.shape))
print('The shape of false indices: {}'.format(small_amount_true.false_indices.shape))
[[ 0 4]
 [ 4 8]
 [15 56]]
The shape of true indices: (3L, 2L)
The shape of false indices: (28797L, 2L)
all_indices
print('The shape of all indices: {}'.format(small_amount_true.all_indices.shape))
# note that all_indices = true_indices + false_indices
The shape of all indices: (28800L, 2L)
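The same index arrays can be reproduced on a bare boolean ndarray, which shows where the C ordering comes from. This is a NumPy sketch of the idea, not menpo's code.

```python
import numpy as np

mask = np.zeros((120, 240), dtype=bool)
mask[4, 8] = True
mask[15, 56] = True
mask[0, 4] = True

# argwhere scans the array in C (row-major) order, so rows come out sorted
# by axis 0 first, then axis 1, regardless of the order they were set in.
true_indices = np.argwhere(mask)
false_indices = np.argwhere(~mask)

# Every coordinate in the array, also in C order.
all_indices = np.indices(mask.shape).reshape(mask.ndim, -1).T

print(true_indices)
print(false_indices.shape, all_indices.shape)
```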
mask
The underlying boolean array is exposed through the mask property. This is used heavily in MaskedImage.
lenna_masked_pixels_flatted = lenna.pixels[random_mask.mask]
lenna_masked_pixels_flatted.shape
# note we can only do this as random_mask is the shape of lenna
print('Are Lenna and the random mask the same shape? {}'.format(lenna.shape == random_mask.shape))
Are Lenna and the random mask the same shape? True
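Indexing pixels with a mask of the same spatial shape is ordinary NumPy boolean indexing. A small sketch, assuming a (height, width, channels) array and a (height, width) boolean mask:

```python
import numpy as np

pixels = np.arange(4 * 5 * 3).reshape(4, 5, 3)  # (height, width, channels)
mask = np.zeros((4, 5), dtype=bool)             # same spatial shape
mask[1, 2] = True
mask[3, 0] = True

# One row per True pixel (in C order), one column per channel.
masked = pixels[mask]

print(masked.shape)
print(masked)
```

If the mask's shape does not match the spatial shape of the pixels, NumPy raises an IndexError, which is why the shape check above matters.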
print(random_mask)
print(lenna)
print(takeo)
512W x 512H 2D mask, 50.0% of which is True
512W x 512H 2D MaskedImage with 3 channels. Attached mask 100.0% true
150W x 225H 2D MaskedImage with 1 channels. Attached mask 100.0% true
(i, j, ...., n, [k])
Notice in the above print statements that lenna and takeo have attached masks. This is because all images imported through menpo.io are instances of MaskedImage, so this functionality is available to all images we deal with directly. Just as you would expect, MaskedImages have a mask attached to them which augments their usual behavior.
mask
MaskedImages have a BooleanImage of appropriate size attached to them at the mask property. On construction, a mask can be specified with the mask kwarg (either a boolean ndarray or a BooleanImage instance). If nothing is provided, the mask is set to all true. A MaskedImage with an all true mask behaves exactly as an Image, albeit with a performance penalty.
# imported images have all-true masks on them to start with
print(takeo.mask)
takeo.mask.view()
print(breakingbad.mask)
150W x 225H 2D mask, 100.0% of which is True
403W x 411H 2D mask, 100.0% of which is True
constrain_mask_to_landmarks
This method allows us to update the mask to equal the convex hull around some landmarks on the image. You can choose a particular group of landmarks (e.g. PTS) and then a specific label (e.g. perimeter). By default, if neither is provided (and if there is only one landmark group), all the landmarks are used to form a convex hull.
breakingbad.constrain_mask_to_landmarks()
breakingbad.mask.view()
<menpo.visualize.viewmatplotlib.MatplotlibImageViewer2d at 0x178f1860>
breakingbad.landmarks.view()
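For intuition about what a convex-hull mask is, here is a self-contained 2D sketch. It builds the hull with Andrew's monotone chain and marks every pixel on the inner side of all hull edges. The helper names and the sample landmarks are hypothetical, it assumes at least three non-collinear points, and menpo's actual implementation may work quite differently.

```python
import numpy as np


def cross(o, a, b):
    """Z-component of (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; hull vertices in a consistent winding order."""
    pts = sorted(set(map(tuple, points)))
    if len(pts) <= 2:
        return pts

    def half(seq):
        chain = []
        for p in seq:
            # Drop points that would make a non-left turn.
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain[:-1]

    return half(pts) + half(reversed(pts))


def hull_mask(points, shape):
    """Boolean mask that is True inside, or on the edge of, the convex hull."""
    hull = convex_hull(points)
    ii, jj = np.indices(shape)  # axis-0 and axis-1 coordinates of every pixel
    mask = np.ones(shape, dtype=bool)
    for a, b in zip(hull, hull[1:] + hull[:1]):
        # Same expression as cross(a, b, pixel), evaluated for all pixels.
        side = (b[0] - a[0]) * (jj - a[1]) - (b[1] - a[1]) * (ii - a[0])
        mask &= side >= 0
    return mask


landmarks = [(1, 1), (1, 5), (5, 1), (5, 5), (3, 3)]  # (axis0, axis1) points
mask = hull_mask(landmarks, (7, 7))
print(mask.astype(int))
```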
view behavior
breakingbad.view()
<menpo.visualize.viewmatplotlib.MatplotlibImageViewer2d at 0x17aeb128>
Use masked=False to see everything.
breakingbad.view(masked=False)
<menpo.visualize.viewmatplotlib.MatplotlibImageViewer2d at 0x174c2748>
Thanks to kwarg chaining, this can even be used when viewing landmarks
breakingbad.landmarks.view(channels=2, masked=False)
as_vector() / from_vector() behavior
as_vector() and from_vector() on MaskedImages only operate on the pixels where the mask is True, flattened.
print('breakingbad has {} pixels, but only {} are '
'masked.'.format(breakingbad.n_pixels, breakingbad.n_true_pixels))
print('breakingbad has {} elements (3 x n_pixels)'.format(breakingbad.n_elements))
vectorized_bad = breakingbad.as_vector()
print('vector of breaking bad is of shape {}'.format(vectorized_bad.shape))
breakingbad has 165633 pixels, but only 97498 are masked.
breakingbad has 496899 elements (3 x n_pixels)
vector of breaking bad is of shape (292494L,)
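The vector length above is n_true_pixels x n_channels (97498 x 3 = 292494). A NumPy sketch of that round trip is given below. The two functions are hypothetical stand-ins that only borrow the method names, and filling the unmasked pixels with zeros on the way back is an assumption made for this example.

```python
import numpy as np


def as_vector(pixels, mask):
    """Flatten only the pixels where `mask` is True (hypothetical stand-in)."""
    return pixels[mask].ravel()


def from_vector(vector, mask, n_channels):
    """Rebuild a pixel array from a vector; unmasked pixels are set to zero."""
    pixels = np.zeros(mask.shape + (n_channels,), dtype=vector.dtype)
    pixels[mask] = vector.reshape(-1, n_channels)
    return pixels


pixels = np.arange(4 * 5 * 3, dtype=float).reshape(4, 5, 3)
mask = np.zeros((4, 5), dtype=bool)
mask[1:3, 1:4] = True  # 6 True pixels

vector = as_vector(pixels, mask)
rebuilt = from_vector(vector, mask, n_channels=3)

print(vector.shape)  # n_true * n_channels
```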
gradient()
gradient() can be called on a MaskedImage. Note that landmarks from the original image are persisted to the gradient. Also, the nullify_values_at_mask_boundaries kwarg is useful for calculating gradients with masked data that has empty regions, such as in AAMs.
grads = breakingbad.gradient(nullify_values_at_mask_boundaries=True)
grads.view(channels=3, masked=False)
<menpo.visualize.viewmatplotlib.MatplotlibImageViewer2d at 0x178e05f8>