# Bag of Visual Words Image Classification¶

This notebook walks through a simple bag of words based image classifier. In this technique, small patches are extracted from an image and the number of each patch "type" is used as a feature to a classifier such as an SVM. The patch types are automatically discovered via clustering of all patches found in the training set.

Take a look here for a more thorough exposition

This notebook requires sklearn. You can install something like Enthought Canopy to get all of the packages without any hassle.

This approach is fairly naive (and extremely slow) as is, but it gives an idea of how to implement your own classifiers for use in a pipeline.

In [1]:
%pylab inline

Populating the interactive namespace from numpy and matplotlib

In [2]:
# Display given faces in a grid
def show_patches(ims, grid_size=(16, 6)):
fig = plt.figure(figsize=grid_size)
for i in range(min(len(ims), 30)):
ax = fig.add_subplot(3, 10, i + 1, xticks=[], yticks=[])
ax.imshow(ims[i].reshape((8, 8)), cmap=plt.cm.bone, interpolation='nearest')

In [3]:
import sklearn
from sklearn.pipeline import Pipeline
from sklearn import svm


We are using the CIFAR 10 dataset - a dataset of 32x32 images from 10 classes

In [4]:
# Load data
import cPickle

with open(file, 'rb') as f:
return data


In [5]:
images = data['data']
# Reshape to go from length 3072 vector to 32x32 rgb images
# order='F' deals with specifics of how the data is laid out
images = images.reshape((-1, 32, 32, 3), order='F')
labels = np.array(data['labels'])

def get_classes(classes=[0, 1, 2], per_class=100):
# Array of indices i where labels[i] is in classes
indices = np.concatenate([np.where(labels == c)[0][:per_class] for c in classes])
return images[indices], labels[indices]

# For speed, let's consider only 2 classes, 100 images per class for now
classes = [0,1,2,3]
X, Y = get_classes(classes, 100)

In [6]:
# That's cars, cats, and birds in this case
for c in classes:
plt.imshow(images[labels == c][0], interpolation='nearest')
plt.show()

In [7]:
from sklearn.cross_validation import KFold
from sklearn.metrics import accuracy_score

def score(clf, X, Y, folds=2, verbose=False):
predictions = np.zeros(len(Y))
for i, (train, test) in enumerate(KFold(len(X), n_folds=folds, shuffle=True)):
clf.fit(X[train], Y[train])
predictions[test] = clf.predict(X[test])
if verbose:
print("Fold {}: {}".format(i + 1, accuracy_score(Y[test], predictions[test])))
return accuracy_score(Y, predictions)


### The pipeline begins with the patch extractor. This extracts small (8x8) patches from the image.¶

In [8]:
from sklearn.feature_extraction import image
def rgb2gray(rgb):
return np.dot(rgb[...,:3], [0.299, 0.587, 0.144])

# Used to whiten patches - reduces variance, helps with classification
from sklearn.decomposition import RandomizedPCA

class PatchExtractor(sklearn.base.BaseEstimator):
def __init__(self, patch_size=(8, 8)):
self.patch_size = patch_size

def _extract_patches(self, x):
""" Extracts patches from given H x W image """
# Extract patches in a grid, reshape to proper size
# Details not really important - written this way for speed
return image.extract_patches(x, self.patch_size,
self.patch_size[0]).reshape((-1,
self.patch_size[0] * self.patch_size[1]))

def fit(self, X, Y=None):
patches = np.concatenate([self._extract_patches(rgb2gray(x)) for x in X])
print(patches.shape)
#self.pca = RandomizedPCA(whiten=True)
#self.pca.fit(patches)
return self

def transform(self, X, Y=None):
return np.array([self._extract_patches(rgb2gray(x)) for x in X])
#return np.array([self.pca.transform(self._extract_patches(rgb2gray(x))) for x in X])


### The next step is taking a collection of patches and creating a histogram of patch types.¶

Patch types are automatically discovered during training via KMeans clustering. For feature transformation of a single image, each patch is assigned to its nearest cluster and the feature vector of an image is the number of patches from each type (a histogram).

In [9]:
from sklearn.cluster import KMeans
class Codebook(sklearn.base.BaseEstimator):
def __init__(self, size=10):
self.size = size
self.clusterer = KMeans(n_clusters=size)

def _get_histogram(self, x):
""" Returns histogram of codewords for given features """
# Alternative method: return distance of each patch to cluster centers
# return self.clusterer.transform(x).ravel()

# Assign each patch to a cluster
clusters = self.clusterer.predict(x)

# Get the number of each patch type
return np.bincount(clusters, minlength=self.size)

def fit(self, X, Y=None):
# print("Fitting clusterer")
self.clusterer.fit(np.concatenate(X))
return self

def transform(self, X, Y=None):
return np.array([self._get_histogram(x) for x in X])


## Let's put those together in a Pipeline¶

In [10]:
X, Y = get_classes([1, 3, 5, 7, 9], 100)
patcher = PatchExtractor(patch_size=(8, 8))
codebook = Codebook(size=50)
clf = svm.SVC(kernel='linear')
pipeline = Pipeline([("Patch_extractor", patcher), ("Codebook", codebook), ("svm", clf)])
score(pipeline, X, Y, 2, verbose=True)

(4000, 64)
Fold 1: 0.304
(4000, 64)
Fold 2: 0.328

Out[10]:
0.316

## Heads up: This takes a really long time to run¶

We end up getting about a 30% accuracy across 10 classes, which is significantly better than chance but can definitely be improved.

In [14]:
X, Y = get_classes([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 900)
patcher = PatchExtractor(patch_size=(8, 8))
codebook = Codebook(size=50)
clf = svm.SVC(kernel='linear')
pipeline = Pipeline([("Patch_extractor", patcher), ("Codebook", codebook), ("svm", clf)])
score(pipeline, X, Y, 2, verbose=True)

(72000, 64)
Fold 1: 0.316444444444
(72000, 64)
Fold 2: 0.313777777778

Out[14]:
0.31511111111111112
In [15]:
# You end up with edge like features
show_patches(codebook.clusterer.cluster_centers_)


If you wanted to try different codebook sizes, GridSearchCV is a good approach:

X, Y = get_classes([5, 7], 100)
from sklearn.grid_search import GridSearchCV
param_grid = [
{'Codebook__size': [3, 10]},
]
grid_search = GridSearchCV(pipeline, param_grid, scoring=score)
grid_search.fit(X, Y)

## Now let's compare this to a simple RGB color based approach¶

While this may be useful on some datasets with characteristic colors per category, that doesn't appear to be the case with CIFAR. We end up doing worse than chance.

In [16]:
class RGBFeature(sklearn.base.BaseEstimator):
""" Maps an image to its RGB color averages """
def fit(self, X, Y=None):
return self
def transform(self, X, Y=None):
# Get average of each color channel
return np.array([[np.average(X[:, :, i]) for i in range(3)] for x in X])

In [24]:
X, Y = get_classes([1, 3, 5, 7, 9], 500)
rgb = RGBFeature()
clf = svm.SVC(kernel='linear')
pipeline = Pipeline([("RGB Average", rgb), ("svm", clf)])
score(pipeline, X, Y, 5, verbose=True)

Fold 1: 0.164
Fold 2: 0.168
Fold 3: 0.18
Fold 4: 0.16
Fold 5: 0.178

Out[24]:
0.17000000000000001