In the last notebook, we introduced the Ising model and a Variational Inference way to approximate a solution. All you need to know is that the log probability of an image $y$ is given by $$ \log \tilde{p}(y) = - \sum_{s \tilde t } y_s w_{st} y_t.$$

This time, we'll approach the same problem from a Gibbs sampling perspective.

In [1]:

from PIL import Image
import numpy as np
from scipy.signal import convolve2d
from scipy import stats

In [4]:

img = imread('lettera.bmp')
arr = np.asarray(img, int)
arr = arr[:0:-1,:0:-1]  # Flip it
mn = np.mean(arr)
arr[arr < mn] = -1
arr[arr >= mn] = 1
original_mean = np.mean(arr)
print original_mean

-0.469030938062

In [5]:

sigma = 2
noisy_arr = arr + sigma*np.random.normal(size=arr.shape)

In [6]:

imshow(arr)

Out[6]:

<matplotlib.image.AxesImage at 0xa38ac2c>

In [8]:

imshow(noisy_arr)

Out[8]:

<matplotlib.image.AxesImage at 0xa5db38c>

In [9]:

noisy_mean = np.mean(noisy_arr)
print noisy_mean
imshow(noisy_arr > noisy_mean)

-0.439872637268

Out[9]:

<matplotlib.image.AxesImage at 0xa826ccc>

Gibbs Sampling for Image Denoising¶

Gibbs sampling is a very simple technique to sample from a distribution when sampling from a marginal distribution is easier than sampling the joint distribution. For instance, it is trivial to sample a simple pixel conditioned on all the other pixels from the image distribution described above. It would be much more difficult to sample from the joint distribution. If $p(\mathbf{x} | \theta)$ is a probability distribution and $\mathbf{x}_0 \sim p$, then we can generate a new sample $\mathbf{x}_0$ by following a simple algorithm:

for x_i in x: 
   x_i = sample from p(x_i | x_except_i, theta)

We can iterate through samples in this manner to arrive at a set of samples with the correct distribution.

To see this in the Ising model, let's just code it up:

In [104]:

noisy_image_copy = noisy_arr.copy()
noisy_image_iterations = []

burn_in = 10
iterations = 5
for i in range(burn_in + iterations):
    for x in range(noisy_image_copy.shape[0]):
        low_range_x = max(x - 1, 0)
        high_range_x = min(x + 1, noisy_image_copy.shape[0])
        for y in range(noisy_image_copy.shape[1]):
            low_range_y = max(y - 1, 0)
            high_range_y = min(y + 1, noisy_image_copy.shape[1])
            eta = sum(noisy_image_copy[low_range_x:high_range_x+1, low_range_y:high_range_y+1]) - noisy_image_copy[x, y]
            local_evidence = log(stats.norm.pdf(noisy_arr[x, y], 1, 2) / stats.norm.pdf(noisy_arr[x, y], -1, 2))
            threshold = 2 * eta - local_evidence
            sigmoid_threshold = exp(threshold) / (exp(threshold) + 1)
            filter_matrix = np.random.random() < sigmoid_threshold
            noisy_image_copy[x, y] = 1 if filter_matrix else -1
    if i > burn_in:
        noisy_image_iterations.append(noisy_image_copy.copy())

In [105]:

noisy_image_copy[low_range_x:high_range_x+1, low_range_y:high_range_y+1]

Out[105]:

array([[-1., -1.],
       [-1., -1.]])

In [106]:

imshow(noisy_image_copy)

Out[106]:

<matplotlib.image.AxesImage at 0xb3e888c>

In [109]:

sum(abs(arr - noisy_image_copy)) / arr.size

Out[109]:

0.094364188728377457

In [107]:

sum(abs(arr - np.array(noisy_image_iterations).mean(axis=0) > 0)) / float(arr.size)

Out[107]:

0.026164052328104655

So it works, but it's terribly slow. You could speed it up in a language that supports loops well (or use cython or numba). I coded up a vectorized approximation to Gibbs sampling below. Here, we sample from the entire image one time step in the past. We aren't iterating through the pixels updating one at a time, which is the normal definition of Gibbs sampling. However, the difference is generally pretty small and the results are as good.

In [115]:

# convolution image
w_conv = np.ones((3, 3)) 
w_conv[1, 1] = 0

noisy_image_copy = noisy_arr.copy()
noisy_image_iterations = []
burn_in = 10
iterations = 5
for i in range(burn_in + iterations):
    eta_arr = convolve2d(noisy_image_copy, w_conv, mode='same')
    local_evidence = log(stats.norm.pdf(noisy_arr, 1, 2) / stats.norm.pdf(noisy_arr, -1, 2))
    threshold = 2 * eta_arr - local_evidence
    sigmoid_threshold = exp(threshold) / (exp(threshold) + 1)
    filter_matrix = np.random.random(noisy_image_copy.shape) < sigmoid_threshold
    noisy_image_copy[filter_matrix] = 1
    noisy_image_copy[~filter_matrix] = -1
    if i > burn_in:
        noisy_image_iterations.append(noisy_image_copy.copy())
    

In [116]:

imshow(np.array(noisy_image_iterations).mean(axis=0) > 0)

Out[116]:

<matplotlib.image.AxesImage at 0xabb21eec>

In [117]:

imshow(noisy_image_copy)

Out[117]:

<matplotlib.image.AxesImage at 0xacb5e14c>

In [118]:

sum(abs(arr - np.array(noisy_image_iterations).mean(axis=0) > 0)) / float(arr.size)

Out[118]:

0.012586025172050343

In [119]:

sum(abs(arr - noisy_image_copy)) / arr.size

Out[119]:

0.047988095976191955

A (quick) derivation of the Gibbs sample¶

Our algorithm combines the Ising prior with a "damper" term describing local information. We start by deriving the Ising prior. by definition,

$$ p(x_t|x_{-t}, \theta) \propto \prod_{s \in nbhd(t)} exp(x_sx_t). $$

This is just the Ising model, and we can confirm that "like pixels" add weight and opposite pixels reduce weight. Fix $x_t = 1$ to get:

$$ p(x_t = 1 | x_{-t}, \theta) = \frac{\prod_{s \in nbhd(t)} exp(x_s)}{\prod_{s \in nbhd(t)} exp(x_s) + \prod_{s \in nbhd(t)} exp(-x_s)} $$

This simplifies (extract exp) to

$$ p(x_t = 1 | x_{-t}, \theta) = sigmoid (2 \sum_{s \in nbhd(t)} x_s) .$$

In the code, we use $\eta_t$ to equal the neighborhood sum. We can also push a local evidence term $\gamma$ into each location in the model (see Murphy 19.4.1 for a discussion of local evidence in Ising models). This changes the model to encode the current value of the pixel using a Gaussian prior:

$$ p(x_t = 1 | x_{-t}, \theta) = sigmoid (2 \sum_{s \in nbhd(t)} x_s - \log \frac{\gamma_t(1)}{\gamma_t(-1)}) $$

That's the local_evidence variable in our python code.

In [ ]: