In the last notebook, we introduced the Ising model and a Variational Inference way to approximate a solution. All you need to know is that the log probability of an image $y$ is given by $$ \log \tilde{p}(y) = - \sum_{s \tilde t } y_s w_{st} y_t.$$
This time, we'll approach the same problem from a Gibbs sampling perspective.
from PIL import Image
import numpy as np
from scipy.signal import convolve2d
from scipy import stats
img = imread('lettera.bmp')
arr = np.asarray(img, int)
arr = arr[:0:-1,:0:-1] # Flip it
mn = np.mean(arr)
arr[arr < mn] = -1
arr[arr >= mn] = 1
original_mean = np.mean(arr)
print original_mean
-0.469030938062
sigma = 2
noisy_arr = arr + sigma*np.random.normal(size=arr.shape)
imshow(arr)
<matplotlib.image.AxesImage at 0xa38ac2c>
imshow(noisy_arr)
<matplotlib.image.AxesImage at 0xa5db38c>
noisy_mean = np.mean(noisy_arr)
print noisy_mean
imshow(noisy_arr > noisy_mean)
-0.439872637268
<matplotlib.image.AxesImage at 0xa826ccc>
Gibbs sampling is a very simple technique to sample from a distribution when sampling from a marginal distribution is easier than sampling the joint distribution. For instance, it is trivial to sample a simple pixel conditioned on all the other pixels from the image distribution described above. It would be much more difficult to sample from the joint distribution. If $p(\mathbf{x} | \theta)$ is a probability distribution and $\mathbf{x}_0 \sim p$, then we can generate a new sample $\mathbf{x}_0$ by following a simple algorithm:
for x_i in x:
x_i = sample from p(x_i | x_except_i, theta)
We can iterate through samples in this manner to arrive at a set of samples with the correct distribution.
To see this in the Ising model, let's just code it up:
noisy_image_copy = noisy_arr.copy()
noisy_image_iterations = []
burn_in = 10
iterations = 5
for i in range(burn_in + iterations):
for x in range(noisy_image_copy.shape[0]):
low_range_x = max(x - 1, 0)
high_range_x = min(x + 1, noisy_image_copy.shape[0])
for y in range(noisy_image_copy.shape[1]):
low_range_y = max(y - 1, 0)
high_range_y = min(y + 1, noisy_image_copy.shape[1])
eta = sum(noisy_image_copy[low_range_x:high_range_x+1, low_range_y:high_range_y+1]) - noisy_image_copy[x, y]
local_evidence = log(stats.norm.pdf(noisy_arr[x, y], 1, 2) / stats.norm.pdf(noisy_arr[x, y], -1, 2))
threshold = 2 * eta - local_evidence
sigmoid_threshold = exp(threshold) / (exp(threshold) + 1)
filter_matrix = np.random.random() < sigmoid_threshold
noisy_image_copy[x, y] = 1 if filter_matrix else -1
if i > burn_in:
noisy_image_iterations.append(noisy_image_copy.copy())
noisy_image_copy[low_range_x:high_range_x+1, low_range_y:high_range_y+1]
array([[-1., -1.], [-1., -1.]])
imshow(noisy_image_copy)
<matplotlib.image.AxesImage at 0xb3e888c>
sum(abs(arr - noisy_image_copy)) / arr.size
0.094364188728377457
sum(abs(arr - np.array(noisy_image_iterations).mean(axis=0) > 0)) / float(arr.size)
0.026164052328104655
So it works, but it's terribly slow. You could speed it up in a language that supports loops well (or use cython or numba). I coded up a vectorized approximation to Gibbs sampling below. Here, we sample from the entire image one time step in the past. We aren't iterating through the pixels updating one at a time, which is the normal definition of Gibbs sampling. However, the difference is generally pretty small and the results are as good.
# convolution image
w_conv = np.ones((3, 3))
w_conv[1, 1] = 0
noisy_image_copy = noisy_arr.copy()
noisy_image_iterations = []
burn_in = 10
iterations = 5
for i in range(burn_in + iterations):
eta_arr = convolve2d(noisy_image_copy, w_conv, mode='same')
local_evidence = log(stats.norm.pdf(noisy_arr, 1, 2) / stats.norm.pdf(noisy_arr, -1, 2))
threshold = 2 * eta_arr - local_evidence
sigmoid_threshold = exp(threshold) / (exp(threshold) + 1)
filter_matrix = np.random.random(noisy_image_copy.shape) < sigmoid_threshold
noisy_image_copy[filter_matrix] = 1
noisy_image_copy[~filter_matrix] = -1
if i > burn_in:
noisy_image_iterations.append(noisy_image_copy.copy())
imshow(np.array(noisy_image_iterations).mean(axis=0) > 0)
<matplotlib.image.AxesImage at 0xabb21eec>
imshow(noisy_image_copy)
<matplotlib.image.AxesImage at 0xacb5e14c>
sum(abs(arr - np.array(noisy_image_iterations).mean(axis=0) > 0)) / float(arr.size)
0.012586025172050343
sum(abs(arr - noisy_image_copy)) / arr.size
0.047988095976191955
Our algorithm combines the Ising prior with a "damper" term describing local information. We start by deriving the Ising prior. by definition,
$$ p(x_t|x_{-t}, \theta) \propto \prod_{s \in nbhd(t)} exp(x_sx_t). $$This is just the Ising model, and we can confirm that "like pixels" add weight and opposite pixels reduce weight. Fix $x_t = 1$ to get:
$$ p(x_t = 1 | x_{-t}, \theta) = \frac{\prod_{s \in nbhd(t)} exp(x_s)}{\prod_{s \in nbhd(t)} exp(x_s) + \prod_{s \in nbhd(t)} exp(-x_s)} $$This simplifies (extract exp) to
$$ p(x_t = 1 | x_{-t}, \theta) = sigmoid (2 \sum_{s \in nbhd(t)} x_s) .$$In the code, we use $\eta_t$ to equal the neighborhood sum. We can also push a local evidence term $\gamma$ into each location in the model (see Murphy 19.4.1 for a discussion of local evidence in Ising models). This changes the model to encode the current value of the pixel using a Gaussian prior:
$$ p(x_t = 1 | x_{-t}, \theta) = sigmoid (2 \sum_{s \in nbhd(t)} x_s - \log \frac{\gamma_t(1)}{\gamma_t(-1)}) $$That's the local_evidence variable in our python code.