Important: Please read the installation page for details about how to install the toolboxes. $\newcommand{\dotp}[2]{\langle #1, #2 \rangle}$ $\newcommand{\enscond}[2]{\lbrace #1, #2 \rbrace}$ $\newcommand{\pd}[2]{ \frac{ \partial #1}{\partial #2} }$ $\newcommand{\umin}[1]{\underset{#1}{\min}\;}$ $\newcommand{\umax}[1]{\underset{#1}{\max}\;}$ $\newcommand{\umin}[1]{\underset{#1}{\min}\;}$ $\newcommand{\uargmin}[1]{\underset{#1}{argmin}\;}$ $\newcommand{\norm}[1]{\|#1\|}$ $\newcommand{\abs}[1]{\left|#1\right|}$ $\newcommand{\choice}[1]{ \left\{ \begin{array}{l} #1 \end{array} \right. }$ $\newcommand{\pa}[1]{\left(#1\right)}$ $\newcommand{\diag}[1]{{diag}\left( #1 \right)}$ $\newcommand{\qandq}{\quad\text{and}\quad}$ $\newcommand{\qwhereq}{\quad\text{where}\quad}$ $\newcommand{\qifq}{ \quad \text{if} \quad }$ $\newcommand{\qarrq}{ \quad \Longrightarrow \quad }$ $\newcommand{\ZZ}{\mathbb{Z}}$ $\newcommand{\CC}{\mathbb{C}}$ $\newcommand{\RR}{\mathbb{R}}$ $\newcommand{\EE}{\mathbb{E}}$ $\newcommand{\Zz}{\mathcal{Z}}$ $\newcommand{\Ww}{\mathcal{W}}$ $\newcommand{\Vv}{\mathcal{V}}$ $\newcommand{\Nn}{\mathcal{N}}$ $\newcommand{\NN}{\mathcal{N}}$ $\newcommand{\Hh}{\mathcal{H}}$ $\newcommand{\Bb}{\mathcal{B}}$ $\newcommand{\Ee}{\mathcal{E}}$ $\newcommand{\Cc}{\mathcal{C}}$ $\newcommand{\Gg}{\mathcal{G}}$ $\newcommand{\Ss}{\mathcal{S}}$ $\newcommand{\Pp}{\mathcal{P}}$ $\newcommand{\Ff}{\mathcal{F}}$ $\newcommand{\Xx}{\mathcal{X}}$ $\newcommand{\Mm}{\mathcal{M}}$ $\newcommand{\Ii}{\mathcal{I}}$ $\newcommand{\Dd}{\mathcal{D}}$ $\newcommand{\Ll}{\mathcal{L}}$ $\newcommand{\Tt}{\mathcal{T}}$ $\newcommand{\si}{\sigma}$ $\newcommand{\al}{\alpha}$ $\newcommand{\la}{\lambda}$ $\newcommand{\ga}{\gamma}$ $\newcommand{\Ga}{\Gamma}$ $\newcommand{\La}{\Lambda}$ $\newcommand{\si}{\sigma}$ $\newcommand{\Si}{\Sigma}$ $\newcommand{\be}{\beta}$ $\newcommand{\de}{\delta}$ $\newcommand{\De}{\Delta}$ $\newcommand{\phi}{\varphi}$ $\newcommand{\th}{\theta}$ $\newcommand{\om}{\omega}$ $\newcommand{\Om}{\Omega}$ $\newcommand{\eqdef}{\equiv}$
This numerical tour studies semi-discrete optimal transport, i.e. when one of the two measure is discrete. It is not inteded to show efficient algorithm but only conveys the main underlying idea (c-transform, Laguerre cells, connexion to optimal quantization). In the Euclidean case, there exists efficient algorithm to compute Laguerre cells leveraging computational geometry algorithm for convex hulls.
The primal Kantorovitch OT problem reads $$ W_c(\al,\be) = \umin{\pi} \enscond{\int_{\Xx \times \Yy} c(x,y) \text{d}\pi(x,y)}{ \pi_1=\al,\pi_2=\be }. $$ It dual is $$ W_c(\al,\be) = \umax{f,g} \enscond{ \int_\Xx f \text{d} \al + \int_\Yy g \text{d} \be }{ f(x)+g(y) \leq c(x,y) }. $$
We consider the case where $\al=\sum_i a_i \de_{x_i}$ is a discrete measure, so that the function $f(x)$ can be replaced by a vector $(f_i)_{i=1}^n \in \RR^n$. The optimal $g(y)$ function can the be replaced by the $c$-transform of $f$ $$ f^c(y) \eqdef \umin{i} c(x_i,y) - f_i. $$
The function to maximize is then $$ W_c(\al,\be) = \umax{f \in \RR^n} \Ee(f) \eqdef \sum_i f_i a_i + \int f^c(y) \text{d}\be(y). $$
We now implement a gradient ascent scheme for the maximization of $\Ee$. The evaluation of $\Ee$) can be computed via the introduction of the partition of the domain in Laguerre cells $$ \Yy = \bigcup_{i} L_i(f) \qwhereq L_i(f) \eqdef \enscond{y}{ \forall j, c(x_i,y) - f_i \leq c(x_j,y) - f_j }. $$ Where $f=0$, this corrsponds to the partition in Voronoi cells.
One has that $\forall y \in L_i(f)$, $f^c(y) = c(x_i,y) - f_i$, i.e. $f^c$ is piecewise smooth according to this partition.
The grid for evaluation of the "continuous measure".
p = 500; % size of the image for sampling, m=p*p
t = linspace(0,1,p);
[V,U] = meshgrid(t,t);
Y = [U(:)';V(:)'];
First measure, sums of Dirac masses $\al = \sum_{i=1}^n a_i \de_{x_i}$.
n = 30;
X = .5+.5i + exp(1i*pi/4) * 1*( .1*(rand(1,n)-.5)+1i*(rand(1,n)-.5) );
X = [real(X);imag(X)];
a = ones(n,1)/n;
Second measure $\be$, potentially a continuous one (i.e. with a density), mixture of Gaussians. Here we discretize $\beta = \sum_{j=1}^m b_j \de_{y_j}$ on a very fine grid.
Z = {[.6 .9] [.4 .1]}; % mean
S = [.07 .09]; % variance
W = [.5 .5];
b = zeros(p); % ones(p)*.01;
for i=1:length(Z)
z = Z{i};
s = S(i);
b = b + W(i) * exp( (-(U-z(1)).^2-(V-z(2)).^2)/(2*s^2) );
end
b = b/sum(b(:));
Display the two measures.
Col = rand(n,3);
clf; hold on;
imagesc(t,t,-b);
s = 60*ones(n,1); % size
scatter( X(2,:), X(1,:), s, .8*Col, 'filled' );
axis equal; axis([0 1 0 1]); axis off;
colormap gray(256);
Initial potentials.
f = zeros(n,1);
compute Laguerre cells and c-transform
D = sum(Y.^2)' + sum(X.^2) - 2*Y'*X - f(:)';
[fC,I] = min(D,[], 2);
I = reshape(I, [p p]);
Dual value of the OT $\dot{f}{a}+\dotp{f^c}{\be}$.
OT = sum(f.*a) + sum(fC.*b(:));
Display the Laguerre call partition (here this is equal to the Vornoi diagram since $f=0$).
clf;
hold on;
imagesc(t,t,I); axis image; axis off;
contour(t,t,I, -.5:n+.5, 'k', 'LineWidth', 2);
colormap(Col);
scatter( X(2,:), X(1,:), (1+(f-min(f))*10)*100, Col*.8, 'filled' );
Where $\be$ has a density with respect to Lebesgue measure, then $\Ee$ is smooth, and its gradient reads $$ \nabla \Ee(f)_i = a_i - \int_{L_i(f)} \text{d}\be(x). $$
sum area captured
r = sum(sum( ( I==reshape(1:n,[1 1 n]) ) .* b, 1 ),2); r = r(:);
nablaE = a-r(:);
Exercise 1
Implement a gradient ascent $$ f \leftarrow f + \tau \nabla \Ee(f). $$ Experiment on the impact of $\tau$, display the evolution of the OT value $\Ee$ and of the Laguerre cells.
exo1()
%% Insert your code here.
Display the evolution of the estimated OT distance.
clf;
plot(1:niter, E, 'LineWidth', 2);
axis tight;
The function $\Ee$ to minimize can be written as an expectation over a random variable $Y \sim \be$ $$ \Ee(f)=\EE(E(f,Y)) \qwhereq E(f,y) = \dotp{f}{a} + f^c(y). $$
One can thus use a stochastic gradient ascent scheme to minimize this function, at iteration $\ell$ $$ f \leftarrow f + \tau_\ell \nabla E(f,y_\ell) $$ where $y_\ell \sim Y$ is a sample drawn according to $\be$ and the step size $\tau_\ell \sim 1/\ell$ should decay at a carefully chosen rate.
The gradient of the integrated functional reads $$ \nabla E(f,y)_i = a - 1_{L_i(f)}(y), $$ where $1_A$ is the binary indicator function of a set $A$.
Initialize the algorithm.
f = 0;
Draw the sample.
i = (rand<W(1))+1; % select one of the two Gaussian
y = [S(i) * randn + Z{i}(1);S(i) * randn + Z{i}(2)];
Compute the randomized gradient
detect Laguerre cell where y is
[~,i] = min( sum(y.^2)' + sum(X.^2) - 2*y'*X - f(:)' );
gradient
nablaEy = a; nablaEy(i) = nablaEy(i) - 1;
Exercise 2
Implement the stochastic gradient descent. Test various step size selection rule.
exo2()
%% Insert your code here.
Display the evolution of the estimated OT distance (warning: recording this takes lot of time).
clf;
plot(1:niter, E, 'LineWidth', 2);
axis tight;
We consider the following optimal quantization problem $$ \umin{ (a_i)_i,(x_i)_i } W_c\pa{ \sum_i a_i \de_{x_i},\be }. $$ This minimization is convex in $a$, and writing down the optimality condition, one has that the associated dual potential should be $f=0$, which means that the associated optimal Laguerre cells should be Voronoi cells $L_i(0)=V_i(x)$ associated to the sampling locations $$ V_i(x) = \enscond{y}{ \forall j, c(x_i,y) \leq c(x_j,y) }. $$
The minimization is non-convex with respect to the positions $x=(x_i)_i$ and one needs to solve $$ \umin{x} \Ff(x) \eqdef \sum_{i=1}^n \int_{V_i(x)} c(x_i,y) \text{d} \be(y). $$ For the sake of simplicity, we consider the case where $c(x,y)=\frac{1}{2}\norm{x-y}^2$.
The gradient reads $$ \nabla \Ff(x)_i = x_i \int_{V_i(x)} \text{d}\be - \int_{V_i(x)} y \text{d}\be(y). $$ The usual algorithm to compute stationary point of this energy is Lloyd's algorithm, which iterate the fixed point $$ x_i \leftarrow \frac{ \int_{V_i(x)} y \text{d}\be(y) }{ \int_{V_i(x)} \text{d}\be }, $$ i.e. one replaces the centroids by the barycenter of the cells.
Intialize the centroids positions.
X1 = X;
Compute the Voronoi cells $V_i(x)$.
D = sum(Y.^2)' + sum(X1.^2) - 2*Y'*X1;
[fC,I] = min(D,[], 2);
I = reshape(I, [p p]);
Update the centroids to the barycenters.
A = ( I==reshape(1:n,[1 1 n]) ) .* b;
B = ( I==reshape(1:n,[1 1 n]) ) .* b .* ( U+1i*V );
X1 = sum(sum(B,1),2) ./ sum(sum(A,1),2);
X1 = [real(X1(:))';imag(X1(:))'];
Exercise 3
Implement Lloyd algortihm.
exo3()
%% Insert your code here.
Display the evolution of the estimated OT distance.
clf;
plot(1:niter, E, 'LineWidth', 2);
axis tight;