In the last notebook, we introduced the Ising model and a Variational Inference approach to approximating a solution. All you need to know here is that the unnormalized log probability of an image $y$ is given by $$ \log \tilde{p}(y) = \sum_{s \sim t} y_s w_{st} y_t, $$ where the sum runs over pairs of neighboring pixels $s \sim t$.

This time, we'll approach the same problem from a Gibbs sampling perspective.

In [1]:

```
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import convolve2d
from scipy import stats
```

In [4]:

```
img = Image.open('lettera.bmp')
arr = np.asarray(img, int)
arr = arr[::-1, ::-1]  # flip the image
mn = np.mean(arr)
arr = np.where(arr >= mn, 1, -1)  # binarize to +/-1
original_mean = np.mean(arr)
print(original_mean)
```

In [5]:

```
sigma = 2
noisy_arr = arr + sigma*np.random.normal(size=arr.shape)
```

In [6]:

```
plt.imshow(arr)
```

Out[6]:

In [8]:

```
plt.imshow(noisy_arr)
```

Out[8]:

In [9]:

```
noisy_mean = np.mean(noisy_arr)
print(noisy_mean)
plt.imshow(noisy_arr > noisy_mean)
```

Out[9]:

Gibbs sampling is a very simple technique for sampling from a joint distribution *when the conditional distributions are easier to sample from than the joint*. For instance, under the image distribution described above it is trivial to sample a single pixel conditioned on all the other pixels, while sampling the whole image jointly would be much harder. If $p(\mathbf{x} \mid \theta)$ is a probability distribution and $\mathbf{x}_0 \sim p$, then we can generate a new sample $\mathbf{x}_1$ by following a simple algorithm:

```
for x_i in x:
    x_i = sample from p(x_i | x_except_i, theta)
```

We can iterate through samples in this manner to arrive at a set of samples with the correct distribution.
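To make the idea concrete before tackling the image model, here is a minimal sketch (not from the original notebook) of Gibbs sampling for a standard bivariate Gaussian with correlation `rho`, where each full conditional is a one-dimensional Gaussian:

```
import numpy as np

# Gibbs sampling for a standard bivariate Gaussian with correlation rho:
# each full conditional is x_i | x_j ~ N(rho * x_j, 1 - rho**2).
rng = np.random.default_rng(0)
rho = 0.8
x1, x2 = 0.0, 0.0
samples = []
for i in range(20000):
    x1 = rng.normal(rho * x2, np.sqrt(1 - rho**2))
    x2 = rng.normal(rho * x1, np.sqrt(1 - rho**2))
    if i >= 1000:  # discard burn-in
        samples.append((x1, x2))
samples = np.array(samples)
print(np.corrcoef(samples.T)[0, 1])  # close to rho
```

After burn-in, the empirical correlation of the collected samples approaches `rho`, which is a quick check that the chain is targeting the joint distribution even though we only ever sampled one coordinate at a time.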

To see this in the Ising model, let's just code it up:

In [104]:

```
noisy_image_copy = noisy_arr.copy()
noisy_image_iterations = []
burn_in = 10
iterations = 5
for i in range(burn_in + iterations):
    for x in range(noisy_image_copy.shape[0]):
        low_range_x = max(x - 1, 0)
        high_range_x = min(x + 1, noisy_image_copy.shape[0] - 1)
        for y in range(noisy_image_copy.shape[1]):
            low_range_y = max(y - 1, 0)
            high_range_y = min(y + 1, noisy_image_copy.shape[1] - 1)
            # neighbourhood sum, excluding the centre pixel
            eta = (noisy_image_copy[low_range_x:high_range_x + 1,
                                    low_range_y:high_range_y + 1].sum()
                   - noisy_image_copy[x, y])
            # log ratio of the Gaussian likelihoods for x_t = +1 vs x_t = -1
            local_evidence = np.log(stats.norm.pdf(noisy_arr[x, y], 1, sigma)
                                    / stats.norm.pdf(noisy_arr[x, y], -1, sigma))
            threshold = 2 * eta + local_evidence
            sigmoid_threshold = 1.0 / (1.0 + np.exp(-threshold))
            noisy_image_copy[x, y] = 1 if np.random.random() < sigmoid_threshold else -1
    if i >= burn_in:
        noisy_image_iterations.append(noisy_image_copy.copy())
```

In [105]:

```
noisy_image_copy[low_range_x:high_range_x+1, low_range_y:high_range_y+1]
```

Out[105]:

In [106]:

```
plt.imshow(noisy_image_copy)
```

Out[106]:

In [109]:

```
np.mean(arr != noisy_image_copy)  # fraction of pixels that differ from the original
```

Out[109]:

In [107]:

```
avg_sample = np.array(noisy_image_iterations).mean(axis=0)
np.mean(arr != np.where(avg_sample > 0, 1, -1))  # error rate of the averaged samples
```

Out[107]:

So it works, but it's terribly slow. You could speed it up in a language with fast loops (or use Cython or Numba). Below I coded up a vectorized approximation to Gibbs sampling: instead of updating one pixel at a time, conditioning on the latest values of all the others (the usual definition of Gibbs sampling), we resample every pixel simultaneously, conditioning on the entire image from the previous iteration. The difference is generally small and the results are just as good.

In [115]:

```
# convolution kernel: the 8 neighbours, centre excluded
w_conv = np.ones((3, 3))
w_conv[1, 1] = 0
noisy_image_copy = noisy_arr.copy()
noisy_image_iterations = []
burn_in = 10
iterations = 5
# the local evidence depends only on the fixed noisy image, so compute it once
local_evidence = np.log(stats.norm.pdf(noisy_arr, 1, sigma)
                        / stats.norm.pdf(noisy_arr, -1, sigma))
for i in range(burn_in + iterations):
    eta_arr = convolve2d(noisy_image_copy, w_conv, mode='same')
    threshold = 2 * eta_arr + local_evidence
    sigmoid_threshold = 1.0 / (1.0 + np.exp(-threshold))
    filter_matrix = np.random.random(noisy_image_copy.shape) < sigmoid_threshold
    noisy_image_copy[filter_matrix] = 1
    noisy_image_copy[~filter_matrix] = -1
    if i >= burn_in:
        noisy_image_iterations.append(noisy_image_copy.copy())
```
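The simultaneous update above is only an approximation. If exactness matters, a common vectorized alternative is checkerboard (red-black) updating: under a 4-neighbour Ising model, pixels of one color have no neighbors of the same color, so each half of the image can be resampled at once while remaining a valid Gibbs sweep. A sketch under that 4-neighbour assumption (`checkerboard_sweep` is my own helper, not from this notebook):

```
import numpy as np
from scipy.signal import convolve2d

def checkerboard_sweep(x, y_obs, sigma, rng):
    """One Gibbs sweep over a 4-neighbour Ising model, done in two half-updates."""
    w4 = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])  # 4-neighbourhood kernel
    evidence = 2.0 * y_obs / sigma**2  # log N(y|+1,sigma) - log N(y|-1,sigma)
    ii, jj = np.indices(x.shape)
    for parity in (0, 1):
        # neighbours of same-parity pixels all have the other parity,
        # so updating one colour class together is still exact Gibbs
        mask = (ii + jj) % 2 == parity
        eta = convolve2d(x, w4, mode='same')
        p_plus = 1.0 / (1.0 + np.exp(-(2.0 * eta + evidence)))
        flips = rng.random(x.shape) < p_plus
        x[mask] = np.where(flips, 1.0, -1.0)[mask]
    return x

# tiny demo on a synthetic +/-1 square with Gaussian noise
rng = np.random.default_rng(0)
truth = np.ones((20, 20))
truth[5:15, 5:15] = -1
y_obs = truth + 2.0 * rng.normal(size=truth.shape)
x = np.where(y_obs > 0, 1.0, -1.0)
for _ in range(10):
    x = checkerboard_sweep(x, y_obs, 2.0, rng)
```

The price relative to the synchronous version is two convolutions per sweep instead of one; the gain is that the chain has the correct stationary distribution.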

In [116]:

```
plt.imshow(np.array(noisy_image_iterations).mean(axis=0) > 0)
```

Out[116]:

In [117]:

```
plt.imshow(noisy_image_copy)
```

Out[117]:

In [118]:

```
avg_sample = np.array(noisy_image_iterations).mean(axis=0)
np.mean(arr != np.where(avg_sample > 0, 1, -1))  # error rate of the averaged samples
```

Out[118]:

In [119]:

```
np.mean(arr != noisy_image_copy)  # fraction of pixels that differ from the original
```

Out[119]:

Our algorithm combines the Ising prior with a "damper" term describing local evidence from the observed pixel. We start by deriving the Ising prior. By definition,

$$ p(x_t \mid x_{-t}, \theta) \propto \prod_{s \in \mathrm{nbhd}(t)} \exp(x_s x_t). $$

This is just the Ising model, and we can confirm that "like pixels" add weight and opposite pixels reduce weight. Fix $x_t = 1$ to get:

$$ p(x_t = 1 \mid x_{-t}, \theta) = \frac{\prod_{s \in \mathrm{nbhd}(t)} \exp(x_s)}{\prod_{s \in \mathrm{nbhd}(t)} \exp(x_s) + \prod_{s \in \mathrm{nbhd}(t)} \exp(-x_s)} $$

Since $\prod_{s} \exp(x_s) = \exp\left(\sum_{s} x_s\right)$, this simplifies to

$$ p(x_t = 1 \mid x_{-t}, \theta) = \operatorname{sigmoid}\left(2 \sum_{s \in \mathrm{nbhd}(t)} x_s\right). $$
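We can sanity-check this identity numerically (a quick verification, not part of the original notebook): with $a = \sum_s x_s$, the ratio $e^{a}/(e^{a}+e^{-a})$ matches $\operatorname{sigmoid}(2a)$:

```
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

neighbours = np.array([1, 1, -1, 1, 1, -1, 1, 1])  # an arbitrary 8-pixel neighbourhood
a = neighbours.sum()  # the neighbourhood sum, eta in the code above
direct = np.exp(a) / (np.exp(a) + np.exp(-a))
assert np.allclose(direct, sigmoid(2 * a))
```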

In the code, $\eta_t$ denotes this neighborhood sum (the `eta` variable). We can also push a local evidence term $\gamma_t$ into each location in the model (see Murphy 19.4.1 for a discussion of local evidence in Ising models). This ties each hidden pixel to its observed value through a Gaussian noise model:

$$ p(x_t = 1 \mid x_{-t}, \theta) = \operatorname{sigmoid}\left(2 \sum_{s \in \mathrm{nbhd}(t)} x_s + \log \frac{\gamma_t(1)}{\gamma_t(-1)}\right) $$

The log-ratio term is the `local_evidence` variable in our Python code.
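For Gaussian noise with means $\pm 1$ and standard deviation $\sigma$, the log-ratio also has a closed form, $\log \frac{\mathcal{N}(y \mid 1, \sigma)}{\mathcal{N}(y \mid -1, \sigma)} = \frac{2y}{\sigma^2}$, which is cheaper than calling `stats.norm.pdf` twice. A short check of the identity (not in the original notebook):

```
import numpy as np
from scipy import stats

sigma = 2.0
y = np.linspace(-5, 5, 11)  # a range of observed pixel values
# expanding the two Gaussian log densities: (-(y-1)**2 + (y+1)**2) / (2*sigma**2) = 2*y/sigma**2
ratio = np.log(stats.norm.pdf(y, 1, sigma) / stats.norm.pdf(y, -1, sigma))
assert np.allclose(ratio, 2 * y / sigma**2)
```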