In the last notebook, we introduced the Ising model and a Variational Inference way to approximate a solution. All you need to know is that the log probability of an image $y$ is given by $$ \log \tilde{p}(y) = - \sum_{s \tilde t } y_s w_{st} y_t.$$

This time, we'll approach the same problem from a Gibbs sampling perspective.

In [1]:
from PIL import Image
import numpy as np
from scipy.signal import convolve2d
from scipy import stats
In [4]:
img = imread('lettera.bmp')
arr = np.asarray(img, int)
arr = arr[:0:-1,:0:-1]  # Flip it
mn = np.mean(arr)
arr[arr < mn] = -1
arr[arr >= mn] = 1
original_mean = np.mean(arr)
print original_mean
-0.469030938062
In [5]:
sigma = 2
noisy_arr = arr + sigma*np.random.normal(size=arr.shape)
In [6]:
imshow(arr)
Out[6]:
<matplotlib.image.AxesImage at 0xa38ac2c>
In [8]:
imshow(noisy_arr)
Out[8]:
<matplotlib.image.AxesImage at 0xa5db38c>
In [9]:
noisy_mean = np.mean(noisy_arr)
print noisy_mean
imshow(noisy_arr > noisy_mean)
-0.439872637268
Out[9]:
<matplotlib.image.AxesImage at 0xa826ccc>

Gibbs Sampling for Image Denoising

Gibbs sampling is a very simple technique to sample from a distribution when sampling from a marginal distribution is easier than sampling the joint distribution. For instance, it is trivial to sample a simple pixel conditioned on all the other pixels from the image distribution described above. It would be much more difficult to sample from the joint distribution. If $p(\mathbf{x} | \theta)$ is a probability distribution and $\mathbf{x}_0 \sim p$, then we can generate a new sample $\mathbf{x}_0$ by following a simple algorithm:

for x_i in x: 
   x_i = sample from p(x_i | x_except_i, theta)

We can iterate through samples in this manner to arrive at a set of samples with the correct distribution.

To see this in the Ising model, let's just code it up:

In [104]:
noisy_image_copy = noisy_arr.copy()
noisy_image_iterations = []

burn_in = 10
iterations = 5
for i in range(burn_in + iterations):
    for x in range(noisy_image_copy.shape[0]):
        low_range_x = max(x - 1, 0)
        high_range_x = min(x + 1, noisy_image_copy.shape[0])
        for y in range(noisy_image_copy.shape[1]):
            low_range_y = max(y - 1, 0)
            high_range_y = min(y + 1, noisy_image_copy.shape[1])
            eta = sum(noisy_image_copy[low_range_x:high_range_x+1, low_range_y:high_range_y+1]) - noisy_image_copy[x, y]
            local_evidence = log(stats.norm.pdf(noisy_arr[x, y], 1, 2) / stats.norm.pdf(noisy_arr[x, y], -1, 2))
            threshold = 2 * eta - local_evidence
            sigmoid_threshold = exp(threshold) / (exp(threshold) + 1)
            filter_matrix = np.random.random() < sigmoid_threshold
            noisy_image_copy[x, y] = 1 if filter_matrix else -1
    if i > burn_in:
        noisy_image_iterations.append(noisy_image_copy.copy())
In [105]:
noisy_image_copy[low_range_x:high_range_x+1, low_range_y:high_range_y+1]
Out[105]:
array([[-1., -1.],
       [-1., -1.]])
In [106]:
imshow(noisy_image_copy)
Out[106]:
<matplotlib.image.AxesImage at 0xb3e888c>
In [109]:
sum(abs(arr - noisy_image_copy)) / arr.size
Out[109]:
0.094364188728377457
In [107]:
sum(abs(arr - np.array(noisy_image_iterations).mean(axis=0) > 0)) / float(arr.size)
Out[107]:
0.026164052328104655

So it works, but it's terribly slow. You could speed it up in a language that supports loops well (or use cython or numba). I coded up a vectorized approximation to Gibbs sampling below. Here, we sample from the entire image one time step in the past. We aren't iterating through the pixels updating one at a time, which is the normal definition of Gibbs sampling. However, the difference is generally pretty small and the results are as good.

In [115]:
# convolution image
w_conv = np.ones((3, 3)) 
w_conv[1, 1] = 0

noisy_image_copy = noisy_arr.copy()
noisy_image_iterations = []
burn_in = 10
iterations = 5
for i in range(burn_in + iterations):
    eta_arr = convolve2d(noisy_image_copy, w_conv, mode='same')
    local_evidence = log(stats.norm.pdf(noisy_arr, 1, 2) / stats.norm.pdf(noisy_arr, -1, 2))
    threshold = 2 * eta_arr - local_evidence
    sigmoid_threshold = exp(threshold) / (exp(threshold) + 1)
    filter_matrix = np.random.random(noisy_image_copy.shape) < sigmoid_threshold
    noisy_image_copy[filter_matrix] = 1
    noisy_image_copy[~filter_matrix] = -1
    if i > burn_in:
        noisy_image_iterations.append(noisy_image_copy.copy())
    
In [116]:
imshow(np.array(noisy_image_iterations).mean(axis=0) > 0)
Out[116]:
<matplotlib.image.AxesImage at 0xabb21eec>
In [117]:
imshow(noisy_image_copy)
Out[117]:
<matplotlib.image.AxesImage at 0xacb5e14c>
In [118]:
sum(abs(arr - np.array(noisy_image_iterations).mean(axis=0) > 0)) / float(arr.size)
Out[118]:
0.012586025172050343
In [119]:
sum(abs(arr - noisy_image_copy)) / arr.size
Out[119]:
0.047988095976191955

A (quick) derivation of the Gibbs sample

Our algorithm combines the Ising prior with a "damper" term describing local information. We start by deriving the Ising prior. by definition,

$$ p(x_t|x_{-t}, \theta) \propto \prod_{s \in nbhd(t)} exp(x_sx_t). $$

This is just the Ising model, and we can confirm that "like pixels" add weight and opposite pixels reduce weight. Fix $x_t = 1$ to get:

$$ p(x_t = 1 | x_{-t}, \theta) = \frac{\prod_{s \in nbhd(t)} exp(x_s)}{\prod_{s \in nbhd(t)} exp(x_s) + \prod_{s \in nbhd(t)} exp(-x_s)} $$

This simplifies (extract exp) to

$$ p(x_t = 1 | x_{-t}, \theta) = sigmoid (2 \sum_{s \in nbhd(t)} x_s) .$$

In the code, we use $\eta_t$ to equal the neighborhood sum. We can also push a local evidence term $\gamma$ into each location in the model (see Murphy 19.4.1 for a discussion of local evidence in Ising models). This changes the model to encode the current value of the pixel using a Gaussian prior:

$$ p(x_t = 1 | x_{-t}, \theta) = sigmoid (2 \sum_{s \in nbhd(t)} x_s - \log \frac{\gamma_t(1)}{\gamma_t(-1)}) $$

That's the local_evidence variable in our python code.