import cv2
%matplotlib notebook
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
sns.set_style("dark")
/usr/local/lib/python2.7/dist-packages/matplotlib/font_manager.py:273: UserWarning: Matplotlib is building the font cache using fc-list. This may take a moment. warnings.warn('Matplotlib is building the font cache using fc-list. This may take a moment.')
The last example in this tutorial combines different packages to solve a structure from motion problem on 2 images. It starts using OpenCV to detect SIFT features and their descriptors. Scikit-learn’s implementation of PCA is employed to reduce the dimensionality of the descriptors and the KD-Trees in the SciPy library are then used to efficiently find features matches. From the found matches, the fundamental matrix between the pair of images is computed using OpenCV, and linear algebra procedures from SciPy are employed to compute the essential matrix and retrieve valid projection matrices. Finally, SVD decomposition, implemented in scipy.linalg, is used to perform triangulations and estimate a 3D point for each pair of matching features. The code and comments are available online 11 as an IPython notebook.
T1 = cv2.imread('data/templeRing/templeR0034.png', cv2.IMREAD_GRAYSCALE).T
sift = cv2.xfeatures2d.SIFT_create(nfeatures=5000)
kpts1, D_i = sift.detectAndCompute(T1, mask=None)
K1 = np.array([[k.pt[0], k.pt[1]] for k in kpts1])
T2 = cv2.imread('data/templeRing/templeR0036.png', cv2.IMREAD_GRAYSCALE).T
sift = cv2.xfeatures2d.SIFT_create(nfeatures=5000)
kpts2, D_j = sift.detectAndCompute(T2, mask=None)
K2 = np.array([[k.pt[0], k.pt[1]] for k in kpts2])
plt.subplot(1,2,1)
plt.plot(K1[:,0], K1[:,1], 'rx')
plt.imshow(T1, cmap=plt.cm.gray)
plt.title('Temple 34')
plt.subplot(1,2,2)
plt.plot(K2[:,0], K2[:,1], 'rx')
plt.imshow(T2, cmap=plt.cm.gray)
plt.title('Temple 36')
<matplotlib.text.Text at 0x7fe733ce2050>
Consider the two sets of descriptors, $\mathbf{D}_i$ and $\mathbf{D}_j$, computed using a method as SIFT or SURF, for two images $I_i$ and $I_j$. Matching can be performed using nearest neighbors, just selecting for each $\mathbf{d}_i \in \mathbf{D}_i$ the closest vector $\mathbf{d}_j \in \mathbf{D}_j$. Nearest neighbor queries can be efficiently done representing $\mathbf{D}_i$ in a KD-Tree.
KD-Tree performance is close to brute force for vectors presenting large dimensions. SIFT vectors are 128-d and SURF ones are 64-d. Before matching using KD-Trees, a dimensionality reduction procedure, as PCA, is recommended. Sklearn and OpenCV provide PCA implementations.
from sklearn.decomposition import PCA
pca = PCA(n_components=10)
pca.fit(D_i)
D_i = pca.transform(D_i)
D_j = pca.transform(D_j)
Lowe observes that many features will not have any correct match so just picking the closest neighbor would produce many incorrect matches. The recommended method is to compare the distance of the nearest neighbor to that of the second-closest one. If the ratio between the two distances is below a threshold, the matching is accepted. The rationale behind this procedure is that features with no proper matching would present similar distances between to their closest neighbors.
A KD-Tree is created from the $N_j$ vectors in the array $\mathbf{D}_j$. The query
procedure takes all vectors in $\mathbf{D}_j$ and the number of closest neighbors to be retrieved, just two in this case ($k=2$). The function return two $N_i \times 2$ arrays, d
and nn
, keeping the distances and the neighbors indexes respectively. That means d[n][0]
keeps the distance between the $n$-th vector in $\mathbf{D}_i$ and its closest neighbor, that is the element nn[n][0]
in $\mathbf{D}_j$. Fancy indexing is used to select the matches that obey the ratio test proposed by Lowe. Finally, a $N_i \times 2$ array is created, keeping in each row the indexes $n_i$ and $n_j$ such that $n_i$-th descriptor in $\mathbf{D}_i$ matches the $n_j$-th descriptor in $\mathbf{D}_j$. This array m
is produced using the vstack
, a function that "stack" arrays vertically, in this case a row of indexes in
$\mathbf{D}_i$ followed by the corresponding matching indexes in $\mathbf{D}_j$.
Other procedure to reject bad matches is to ensure that the same feature $\mathbf{d}_j \in \mathbf{D}_j$ is the proper match of a single feature $\mathbf{d}_i \in \mathbf{D}_i$.
In the code below, h
is a Python dictionary acting as an histogram, counting the number of times the $n_j$-th descriptor of $\mathbf{D}_j$ was matched to a descriptor in $\mathbf{D}_i$. The last line just keeps the matches that are unique.
from scipy.spatial import cKDTree
kdtree_j = cKDTree(D_j)
N_i = D_i.shape[0]
d, nn = kdtree_j.query(D_i, k=2)
ratio_mask = d[:,0]/d[:,1] < 0.6
m = np.vstack((np.arange(N_i), nn[:,0])).T
m = m[ratio_mask]
# Filtering: If more than one feature in I matches the same feature in J,
# we remove all of these matches
h = {nj:0 for nj in m[:,1]}
for nj in m[:,1]:
h[nj] += 1
m = np.array([(ni, nj) for ni, nj in m if h[nj] == 1])
from random import random
def rcolor():
return (random(), random(), random())
def show_matches(matches):
n_rows, n_cols = T1.shape
display = np.zeros( (n_rows, 2 * n_cols), dtype=np.uint8 )
display[:,0:n_cols] = T1
display[:,n_cols:] = T2
for pi, pj in matches:
plt.plot([K1[pi][0], K2[pj][0] + n_cols],
[K1[pi][1], K2[pj][1]],
marker='o', linestyle='-', color=rcolor())
plt.imshow(display, cmap=plt.cm.gray)
show_matches(m)
Find the fundamental matrix $\mathrm{\tt F}$
xi = K1[m[:,0],:]
xj = K2[m[:,1],:]
F, status = cv2.findFundamentalMat(xi, xj, cv2.FM_RANSAC, 0.5, 0.9)
from scipy import linalg as la
assert(la.det(F) < 1.e-7)
is_inlier = np.array(status == 1).reshape(-1)
inlier_i = xi[is_inlier]
inlier_j = xj[is_inlier]
hg = lambda x : np.array([x[0], x[1], 1])
K = np.array([[1520.4, 0., 302.32],
[0, 1525.9, 246.87],
[0, 0, 1]])
K
array([[ 1.52040000e+03, 0.00000000e+00, 3.02320000e+02], [ 0.00000000e+00, 1.52590000e+03, 2.46870000e+02], [ 0.00000000e+00, 0.00000000e+00, 1.00000000e+00]])
E = np.dot(K.T, np.dot(F, K))
U, s, VT = la.svd(E)
if la.det(np.dot(U, VT)) < 0:
VT = -VT
E = np.dot(U, np.dot(np.diag([1,1,0]), VT))
V = VT.T
# Let's check Nistér (2004) Theorem 3 constraint:
assert(la.det(U) > 0)
assert(la.det(V) > 0)
# Nistér (2004) Theorem 2 ("Essential Condition")
assert sum(np.dot(E, np.dot(E.T, E)) - 0.5 * np.trace(np.dot(E, E.T)) * E)[0] < 1.0e-10
Given two corresponding homogeneous points $\mathbf{x}_i$ and $\mathbf{x}_j$, observed in images $I_i$ and $I_j$ respectively, and the projection matrices $\mathrm{\tt P}_i$ and $\mathrm{\tt P}_j$, estimate the 3D points $\mathbf{X}$ in the scene associated to the pair $\mathbf{x}_i \leftrightarrow \mathbf{x}_j$.
The point $\mathbf{X}$ could be estimated by rays back-projection in the 3D space. However, in general $\mathbf{x}_i$ and $\mathbf{x}_j$ are imperfect measures and the back-projected rays will not perfectly intersect in $\mathbf{X}$. Such a problem can be formalized as a linear least-square problem in the system of equations $\mathrm{\tt A}\mathbf{X}$, where $\mathrm{\tt A}$ is defined over $\mathbf{x}_i$, $\mathbf{x}_j$, $\mathrm{\tt P}_i$ and $\mathrm{\tt P}_j$ (see Section~12.2 of Hartley and Zisserman's book). The following code uses the SVD implementation in SciPy to perform the minimization, finding the best estimation of $\mathbf{X}$. Note the last column of $\mathrm{\tt V}$ can be indexed by $-1$.
def dlt_triangulation(ui, Pi, uj, Pj):
"""Hartley & Zisserman, 12.2"""
ui /= ui[2]
xi, yi = ui[0], ui[1]
uj /= uj[2]
xj, yj = uj[0], uj[1]
a0 = xi * Pi[2,:] - Pi[0,:]
a1 = yi * Pi[2,:] - Pi[1,:]
a2 = xj * Pj[2,:] - Pj[0,:]
a3 = yj * Pj[2,:] - Pj[1,:]
A = np.vstack((a0, a1, a2, a3))
U, s, VT = la.svd(A)
V = VT.T
X3d = V[:,-1]
return X3d/X3d[3]
Result 6.1 in Hartley and Zisserman's book) says:
Let $\mathbf{X} = (X, Y, Z, T)^\intercal$ be a 3D point and $\mathrm{\tt P} = [\mathrm{\tt M} | \mathbf{p}_4]$ be a camera matrix for a finite camera. Suppose $\mathrm{\tt P}(X, Y, Z, T)^\intercal = w(x,y, 1)^\intercal$. Then
$\mathrm{depth}(\mathbf{X}, \mathrm{\tt P}) = \frac{\mathrm{sign}(\det \mathrm{\tt M})w}{T\|\mathbf{m}^3\|}$
is the depth of the point $\mathbf{X}$ in front of the principal plane of the camera.
In the result above, $\mathrm{\tt M}$ corresponds to a view on the three first columns of $\mathrm{\tt P}$ and $\mathbf{m}^3$ is the last row of $\mathrm{\tt M}$. The function below takes $\mathbf{X}$ and $\mathrm{\tt P}$ and applies this result to compute the depth of $\mathbf{X}$.
def depth(X, P):
T = X[3]
M = P[:,0:3]
p4 = P[:,3]
m3 = M[2,:]
x = np.dot(P, X)
w = x[2]
X = X/w
return (np.sign(la.det(M)) * w) / (T*la.norm(m3))
The code below implements the method described by Nistér (see An efficient solution to the five-point relative pose problem, section 3.1).
def get_proj_matrices(E, K, xi, xj):
hg = lambda x : np.array([x[0], x[1], 1])
W = np.array([[0., -1., 0.],
[1., 0., 0.],
[0., 0., 1.]])
Pi = np.dot(K, np.hstack( (np.identity(3), np.zeros((3,1))) ))
U, s, VT = la.svd(E)
u3 = U[:,2].reshape(3,1)
# Candidates
Pa = np.dot(K, np.hstack((np.dot(U, np.dot(W ,VT)), u3)))
Pb = np.dot(K, np.hstack((np.dot(U, np.dot(W ,VT)), -u3)))
Pc = np.dot(K, np.hstack((np.dot(U, np.dot(W.T ,VT)), u3)))
Pd = np.dot(K, np.hstack((np.dot(U, np.dot(W.T ,VT)), -u3)))
# Find the camera for which the 3D points are *in front*
xxi, xxj = hg(xi[0]), hg(xj[0])
Pj = None
for Pk in [Pa, Pb, Pc, Pd]:
Q = dlt_triangulation(xxi, Pi, xxj, Pk)
if depth(Q, Pi) > 0 and depth(Q, Pk) > 0:
Pj = Pk
break
assert(Pj is not None)
return Pi, Pj
P1, P2 = get_proj_matrices(E, K, inlier_i, inlier_j)
print P1
[[ 1.52040000e+03 0.00000000e+00 3.02320000e+02 0.00000000e+00] [ 0.00000000e+00 1.52590000e+03 2.46870000e+02 0.00000000e+00] [ 0.00000000e+00 0.00000000e+00 1.00000000e+00 0.00000000e+00]]
print P2
[[ -1.42505476e+03 4.70752545e+01 -6.08289727e+02 1.52545806e+03] [ 7.78012848e+00 -1.52389286e+03 -2.58854432e+02 1.99731458e+02] [ 2.05743507e-01 5.27826740e-03 -9.78591717e-01 3.50422051e-01]]
X = []
for xxi, xxj in zip(inlier_i, inlier_j):
X_k = dlt_triangulation(hg(xxi), P1, hg(xxj), P2)
X.append(X_k)
X = np.array(X)
num_pix = X.shape[0]
pix_color = [rcolor() for k in range(num_pix)]
pix = np.dot(P2, X.T).T
pix = np.divide(pix, pix[:,2].reshape(num_pix, -1))
The 3D plot below is static because of the inline plotting. Other Matplotlib's backends allow us to rotate the 3D plot, letting us to inspect the results.
from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure()
plt.subplot(1,2,1)
for k in range(num_pix):
plt.plot(pix[k,0], pix[k,1], color=pix_color[k], marker='o')
plt.imshow(T1, cmap=plt.cm.gray)
ax = fig.add_subplot(1, 2, 2, projection='3d')
ax.scatter(X[:,0], X[:,1], X[:,2], zdir='z', c=pix_color)
<mpl_toolkits.mplot3d.art3d.Path3DCollection at 0x7fe6ffad5450>