You are faced with the following problem: Given an image with a considerable amount unimportant space. You want to resize the image in such a way that the contents of the image are preserved while only the unimportant pieces are removed. There are several situations in which this may be a relevant issue to consider. For instance, if you are making a web page you will most likely want the content on the page to have roughly the same size as the window of the browser is reduced. Reducing the window size should (at least up to some critical point) only reduce empty space and not distort the content within it. Another situation where this is relevant is in image sharing and storing. If you have limited storing space or the medium you are to post the image on only accepts images of certain dimensions, it might be relevant to have a method for resizing the image without distorting the proportions of its contents.
Seam carving is an algorithm that does exactly this; it is a content-aware image-resizer. Although the problem itself seems neither correlated to physics nor programming, there are in fact beautiful analogues to draw from this algorithm to physical systems and it involves the use of dynamic programming.
(Much of the implementation is based on what is presented in this lecture on Seam Carving given in a course on Computational Thinking at MIT)
Here we will outline the fundamental algorithm we are to implement for resizing the image. The terms in italic will be defined more precisely in the following sections.
Intuitively we seek in each iteration a path connecting the top and bottom of the image such that as little information as possible is contained in the pixels constituting the path.
In principle, the main task of the problem is to find out which pixels of an image can be removed with the smallest amount of loss possible. It is therefore convenient to associate an energy function that measures the information content of each pixel.
As a starter, let's introduce some notation and definitions to be able to make our ideas more precise. Denote by $\mathbf{I}$ an $n\times m$ image. Intuitively, we think of the image as essentially a $n\times m$ array of scalar values representing the brightness of each pixel of the image. However, in our discussion we are going to benefit from defining the image as a mapping \begin{align} \mathbf{I} : [1,\dots, n] \times [1,\dots,m] &\to \mathbb{R} \label{eq:img}\\ (i,j) &\mapsto \mathrm{Brightness}(i,j), \nonumber \end{align} assigning to each of the pixels of the image a level of brightness. We define the brightness of a pixel as the mean value of its red, green and blue-content, with $1.0$ being the highest value.
# importing useful packages
using Images, LinearAlgebra, ImageFiltering, Statistics, ImageView, Plots
# loading an image
img_path = "./wave.jpg"
image = load(img_path)
function brightness(img_element::AbstractRGB)
"""
Function for getting the brightness of a pixel
"""
return mean(img_element.r + img_element.g + img_element.b)
end
brightness (generic function with 1 method)
If we for a moment think of the image as a mapping defined on a rectangle $R \subset \mathbb{R}^2$ rather than the finite domain in \eqref{eq:img}, we have an obvious candidate for the energy of the image. Consider the function
$$ E(\mathbf{I}; \mathbf{x}) = \sqrt{\left( \frac{\partial \mathbf{I}}{\partial x} \right)^2 + \left( \frac{\partial \mathbf{I} }{\partial y} \right)^2} \equiv |\boldsymbol{\nabla} \mathbf{I} (\mathbf{x})|, $$with $\mathbf{x}\in R \subset \mathbb{R}^2$. What we seek is essentially a measure of how much the brightness of the pixel changes when moving in either direction from it, i.e. a gradient. A discrete analogue of the gradient is the so-called Sobel operator. This is an operator which estimates the gradient of a function at a point. It does so by multiplying the matrix corresponding to the function evaluated at the point in question and its $8$ nearest neighbours by the following matrices:
$$ G_x = \frac{1}{8} \begin{pmatrix} -1 & 0 & 1\\ -2 & 0 & 2\\ -1 & 0 & 1 \end{pmatrix} $$and $$ G_y = \frac{1}{8}\begin{pmatrix} -1 & -2 & -1\\ 0 & 0 & 0\\ 1 & 2 & 1 \end{pmatrix}. $$
When multiplying the matrix of neighbours with $G_x$, the norm of the resulting matrix will be large if there is a vertical edge to the right or left of the centre point. In this respect $G_x$ detects vertical edges. Similarly, $G_y$ detects horizontal edges. The gradient of the image at the pixel $(i,j)$ is approximated by the function \begin{equation}\label{eq:conv} g(i,j) \equiv \sum_{k =-1}^{+1} \sum_{l = -1}^{+1} \mathbf{G}(k,l) \mathbf{I}(i-k,j-l) \equiv (\mathbf{G} \star \mathbf{I}) (i,j), \end{equation} where we define the operator $\mathbf{G}(k,l)$ as $$ \mathbf{G}(k,l) \mathbf{I}(i-k,j-l) \equiv \sqrt{ |G_x(i,j)\mathbf{I}(i-k,j-l)|^2 + |G_y(i,j)\mathbf{I}(i-k,j-l)|^2 }. $$ The expression in equation \ref{eq:conv} takes the reminiscent form of a convolution, justifying the notation $\mathbf{G}\star\mathbf{I}$. We will use the convolution at each point as the measure of the energy of the image, that is
\begin{equation}\label{eq:energy} E(\mathbf{I};i,j) = (\mathbf{G} \star \mathbf{I}) (i,j). \end{equation}A slight complication remains however with equation \eqref{eq:energy}. Since the image is of finite extent, the function in equation \eqref{eq:conv} is undefined for $i=1,n$ and $j=1,m$. This edge case is often handled by expanding the image with a padding of zeros on the boundaries.
In fact, realising that this gradient approximation is a convolution is helpful for us, as there exist methods for applying such convolutions to images built into julia (and also in python, for that matter). Accordingly, we leave it to the built-in filter to handle the calculation and the edge case mentioned above. The default is the zero padding mentioned earlier. For a discussion of more advanced padding strategies we encourage you to consult the julia reference here.
In julia these convolutions are applied to a picture by using the function imfilter
whose arguments are the image and the kernel with which you want to convolve the image. The Sobel-kernels are retrieved from Kernel.sobel()
. We reflect the kernels since imfilter
in julia by default is not a convolution, but rather a correlation operation, which is very similar. For more details, check out the julia reference here.
G_y = reflect(Kernel.sobel()[1])
G_x = reflect(Kernel.sobel()[2])
function E(img::Array{RGB{N0f8},2})
"""
Function for calculating the energy of an image
Parameters
----------
img : Array{RGB{N0f8},2}
Image
Returns
-------
energy : Float64
Energy of image
"""
gray_img = brightness.(img)
∇x = imfilter(gray_img, G_x)
∇y = imfilter(gray_img, G_y)
return sqrt.(∇x.^2 + ∇y.^2)
end
E (generic function with 1 method)
In the cell below we test our function to see the energy landscape of the test-image we have imported below. As expected, regions in the image we think of as having a high information content also has high energy in this measure.
# simple function used to plot an image
function plot_image(image_array,title)
gr()
plt = heatmap(image_array, title=title,yflip = true)
return plt
end
plot_image (generic function with 1 method)
e = E(image)
plot(plot_image(e,"Energy landscape"))
Next, we need to define a seam of the image. We will in this notebook only consider shrinking images in the horizontal direction, so we will only need vertical seams. Informally, a vertical seam is some connected path from the top of the image to the bottom. More formally, a vertical seam of a $n\times m$-dimensional image is a set
$$ s(\mathbf{I}) = \lbrace (x_s(i),i) \rbrace_{i=1}^{n} \subset \mathbb{R}^2 $$of points in the plane $\mathbb{R}^2$ whose consecutive $x$-coordinates cannot differ by more than $1$. Hence, the function $x : [1,\dots,n] \to [1,\dots,m]$ is any map associating to any of the $n$ vertical indices one of the $m$ horizontal indices, subject to the constraint that $|x(i-1) - x(i)| \leq 1$ for all $i = 2,\dots,n$.
Caution: It is possible to get lost and confused when translating the indexing notation used in programming to mathematical notation. $0\leq i\leq n$ is the index referring to the $y$-direction of the image, while $0\leq j \leq m$ is the index referring to the $x$-direction. Points in $\mathbb{R}^2$ are written on the form $(x(j),y(i))$ while the function $\mathbf{I}$ takes arguments on the form $(i,j)$, effectively acting the same way as a matrix.
The energy of a seam is simply the energy of all pixels that constitute the seam: $$ E(s(\mathbf{I})) =\sum_{i = 1}^{n} E \left(\mathbf{I}; i, x_s(i)\right). $$
To find the optimal seam, we seek the seam $s^*$ minimising the energy: $$ s^* = \underset{s}{\mathrm{argmin}} \left\lbrace E(s(\mathbf{I}))\right\rbrace = \underset{\mathbf{s}}{\mathrm{argmin}} \left\lbrace \sum_{i = 1}^{n} E\left(\mathbf{I}; i, x_s(i)\right) \right\rbrace. $$
This is performed using dynamic programming. The procedure is as follows [2] :
The function $M(i,j)$ is defined recursively, with a base case of $M(n,j) = E(n,j)$.
We will refer to $M$ as the energy map of the image. The algorithm is implemented in the function below. Note that in the original paper [2], they traverse the image downwards as opposed to us traversing it upwards.
function energy_map(energy::Array{Float64,2})
"""
Function for finding the energy map of the image. That is, providing
information about which way to go in each point of the image.
Parameters
----------
energy : Array{Float64,2}
Energy of image
Returns
-------
energy_map : Array{Float64,2}
Energy map
next_elements : Array{Int64,2}
Map of the directions to choose at each point of the image.
"""
energy_map = copy(energy) # the map contains the energy at each point
next_elements = zeros(Int64, size(energy))
n,m = size(energy)
for i ∈ n-1:-1:1, j ∈ 1:m
# using min and max to avoid stepping out of the image
j_minus = max(j-1,1) # stepping to the left
j_plus = min(j+1,m) # stepping to the right
pixel_energy, next_element = findmin(energy_map[i+1, j_minus:j_plus])
energy_map[i,j] += pixel_energy
# next_element is either 1,2,3 so map it to -1,0,1 by subtracting 2.
# If x_minus == 1 we can only go to 0,1 so we add 1
next_elements[i,j] = next_element - 2 + (j_minus==1)
end
return energy_map, next_elements
end
energy_map (generic function with 1 method)
Before looking at how the energy map of our chosen picture looks like, let's look at some constructed examples to gain some insight into what the energy map is. Lets consider an energy landscape which consists of a disc at high energy and its surrondings at low energy. Clearly, we should try to avoid the disc when choose our path, and therefore we'd expect the energy map to lead the paths away from the centre. This is exactly what we observe below.
# Generating a disc shaped energy landscape
D = zeros((500,500))
for i = 1:500
for j = 1:500
if (i-250)^2 + (j-250)^2 <= 100^2
D[i,j] = 1
end
end
end
map_sphere, _ = energy_map(D)
plt1 = plot_image(D,"Disc energy landscape")
plt2 = plot_image(map_sphere,"Energy map of disc landscape")
plot(plt1,plt2,layout = (1,2),size = (1300,500))
As another example, consider an energy landscape consisting of a box at high energy with low-energy surroundings.
# Generating a box shaped energy landscape
box = zeros((500,500))
box[100:400,100:400] .= 1
map_box, _ = energy_map(box)
plt1 = plot_image(box,"Box energy landscape")
plt2 = plot_image(map_box,"Energy map of box landscape")
plot(plt1,plt2,layout = (1,2),size = (1300,500))
The same "tree"-like structures are also observed in the energy map of our wave image. These structures originate from essentially the same reason as for the simple examples above. The only difference is that we have a lot more regions of high energy, so these structures overlap to form something reminiscent of a forest. The optimal path avoids all of the trees.
map, _ = energy_map(E(image))
plot_image(map,"Energy-path landscape")
Suppose that the energy landscape that we are considering is the energy landscape of a chemical reaction. Furthermore, suppose that the $y$-coordinate represents the degree of completion of the reaction and the $x$-coordinate represents some physical parameter of the reaction. Now, the energy of the reaction will depend on the path from the top to the bottom in our landscape. That is, in going from $0$ to $100\%$ completion the energy will depend on the initial value of the parameter $x$, and its value at each time step of the reaction towards completion. What the seam carving algorithm does in this case is to find the most energetically favourable way of completing the reaction when constrained to vary $x$ continuously. Taking analogue even further, one can say that nature decides how the reaction should go about by carving out the least significant seam of the energy landscape. In fact, one could argue that there are obvious connections between this algorithm and most physical systems, as all are constrained by some form of Hamilton's principle: $$ \frac{\delta }{\delta \mathbf{q}} \int_{t_1}^{t_2} \mathrm{d}t L (\mathbf{q},\dot{\mathbf{q}},t) = 0. $$
The energy landscape would in this case of course not look anything like that of our image, and it is major simplification to assume it only depends on one parameter. In some higher dimensional space with one coordinate representing the degree of completion and the others the parameters of the reaction, one could imagine that the landscape will look very complicated, but the idea is essentially the same. Another thing to mention is that the "cost-function" of a reaction is not necessary the energy per se, but rather some appropriate thermodynamic potential.
To finish the algorithm, we need
These functions are written in the cell below. One could think that we'd only need to remove the seam from the energy and use the result as the energy in the next calculation. However, since the energy at a point depends on the brightness of the neighbours, this will not be correct. We would at least have to calculate the energy of all the affected pixels. To be safe, one could calculate the energy of the carved image in each step, but that requires doing many calculations we have already performed in previous steps. We choose to solve this problem by re-calculating the energy of the rectangular strip of the image which fully contains the seam and its adjacent seams.
Note that we have two functions below named remove_seam
whose parameters are different. This is similar to overloading of functions in e.g. $\texttt{c++}$, but not quite the same. Usually, overloading involves deciding based on compile-time types of the arguments what version of a function is to be called, while in julia it is decided based on the run-time types.
function get_seam(energy::Array{Float64,2})
"""
Function for getting the optimal seam given the energy map of the image.
Parameters
----------
energy : Array{Float64,2}
energy of image
Returns
-------
seam : Array{Tuple{Int64,Int64},1}
Array of tuples corresponding to a seam. Note that the indexing is reversed with respect to
the mathematical notation, since i <-> y, and j <-> x.
"""
map, next_elements = energy_map(energy) # calculates the energy map of the image
min_element = argmin(map[1,:]) # finds the element of the first row in the map minimising the energy
n = size(next_elements)[1]
x = zeros(Int64,n)
x[1] = min_element
# The first element is already entered into the seam
for i = 2:n
x[i] = x[i-1] + next_elements[i, x[i-1]]
end
seam = tuple.(1:n,x)
return seam
end
function remove_seam(img::Array{RGB{N0f8},2},seam::Array{Tuple{Int64,Int64},1})
"""
Function for removing the seam from the image.
Parameters
----------
img : Array{RGB{N0f8}},2}
Image we are working with
seam : Array{Tuple{Int64,Int64},1}
Seam to remove.
Returns
-------
new_img : Array{RGB{Normed{UInt8,8}},2}
The image with the seam removed.
"""
new_img = img[:,1:end-1]
for (i,j) ∈ seam
new_img[i, 1:j-1] .= img[i,1:j-1]
new_img[i, j:end] .= img[i,j+1:end]
end
return new_img
end
function remove_seam(energy::Array{Float64,2},img::Array{RGB{N0f8},2},seam::Array{Tuple{Int64,Int64},1})
"""
Function for removing the seam from the energy.
Parameters
----------
energy : Array{Float64,2}
Energy of image
img : Array{RGB{N0f8},2}
Image with carved out seam
seam : Array{Tuple{Int64,Int64},1}
Seam to remove.
Returns
-------
new_energy : Array{Float64,2}
Energy of carved image
"""
n,m = size(img) # size of _new_ image
new_energy = zeros(size(energy[:,1:end-1]))
j_min, j_max = extrema(last.(seam))
new_energy[:,1:j_min-1] = energy[:,1:j_min-1]
new_energy[:,j_max+1:end] = energy[:,j_max+2:end]
# Re-calculate the energy of the (possibly) affected pixels
# Set to boundary if they happen to be mapped outside the image!
j_min = j_min - 1 + (j_min == 1)
j_max = j_max + 1 - 2*(j_max == m + 1) - (j_max == m)
new_energy[:,j_min:j_max] = E(img[:,j_min:j_max])
return new_energy
end
function seam_carving(img::Array{RGB{N0f8},2},res::Int64)
"""
Function for carving out a given number of seams of an image.
Parameters
----------
img : Array{RGB{Normed{UInt8,8}},2}
Image.
res : Int64
Number of pixels in the horisontal direction in the carved image.
Returns
-------
img : Array{RGB{Normed{UInt8,8}},2}
The carved image.
"""
@assert(res >= 0 || res <= size(img)[2])
energy = E(img)
for i = (1:size(img)[2] - res)
seam = get_seam(energy)
img = remove_seam(img, seam)
energy = remove_seam(energy,img,seam)
end
return img
end
seam_carving (generic function with 1 method)
# Check the size of the image
print(size(image))
(649, 965)
new_image = seam_carving(image, 700);
display(new_image)
display(image)
bad_carv = seam_carving(image,400);
display(bad_carv)
If you actually tried running the cell above you would notice that the calculations here require quite a lot of time. Since julia usually is pretty fast one might suspect that there are some bad choices made in the implementation. That might of course always be the case, but consider this rough calculation:
For each pixel of the image, we calculate the energy by multiplying two $3\times 3$ matrices with the neighbours. This is
2 * 3 * 3 * prod(size(image))
number of calculations. For the image we have chosen, we have to perform 11 273 130 calculations only for finding the energy of the image in the first iteration! A considerable amount of time is also spent in the function for getting the energy map, but we will not consider the complexity of this function here. When taking this into account one should not be so surprised that these calculations take some time, although they seem quite innocent at first sight. Try writing the algorithm in python and see if you can get it running as fast as in julia!
print("Number of calculations in calculating the energy of the image: ", 2 * 3 * 3 * prod(size(image)))
Number of calculations in calculating the energy of the image: 11273130
@time seam_carving(image,700);
16.226366 seconds (143.43 M allocations: 22.274 GiB, 10.63% gc time)
[1] Lecture 3 from MIT-course 18.S191 https://computationalthinking.mit.edu/Fall20/lecture3/
[2] Shai Avidan and Ariel Shamir. 2007. Seam carving for content-aware image resizing. In ACM SIGGRAPH 2007 papers (SIGGRAPH '07). Association for Computing Machinery, New York, NY, USA, 10–es. DOI:https://doi.org/10.1145/1275808.1276390