K-Means clustering is one of the most widely used clustering problems in unsupervised learning. It is typically solved with a heuristic algorithm, which, however, does not guarantee a global optimum. In this notebook, we show how K-Means can be expressed as a Generalized Disjunctive Program (GDP) with a Rotated Quadratic Cone, and we demonstrate how to implement it in MOSEK using Disjunctive Constraints (DJC). Such a problem was studied, for example, by Papageorgiou and Trespalacios, and by Kronqvist, Misener and Tsay. We further show a modification called Euclidean clustering, obtained by changing only a few lines of code.
We assume a set of points $\textbf{p}_1, \ldots, \textbf{p}_n \in \mathbb{R}^\mathcal{D}$ and a natural number $\mathcal{K} \in \{1, 2, ..., n\}$ which specifies the number of centroids. We want to find positions of the centroids such that the overall squared Euclidean distance from each point to its closest centroid is minimized. The formulation using disjunctions can look as follows:
$$\begin{array}{rll} \text{minimize} & \sum_{i=1}^n d_i & \\ \text{subject to} & \bigvee_{j \in \{1, .., \mathcal{K}\}} \Bigl[ d_i \geq || \textbf{c}_j - \textbf{p}_i ||_2^2 \wedge Y_i = j \Bigr], & \forall i \in \{1, ..., n\},\\ & c_{j-1, 1} \leq c_{j, 1}, & \forall j \in \{2, .., \mathcal{K}\},\\ \\ & d_1, ..., d_n \in \mathbb{R}, & \\ & \textbf{c}_1, ..., \textbf{c}_{\mathcal{K}} \in \mathbb{R}^\mathcal{D}, \\ & Y_1, ..., Y_n \in \{1, .., \mathcal{K}\}, & \end{array}$$where $\textbf{c}_1, ..., \textbf{c}_{\mathcal{K}}$ are the positions of the centroids, $d_1, ..., d_n$ are auxiliary variables representing the shortest squared distance to the nearest centroid for each point, and $Y_1, ..., Y_n$ are classification labels for each point, indicating the index of the nearest centroid. The first constraint is a disjunctive constraint representing the choice of a centroid for each point. Exactly one clause in each of the $n$ disjunctions is "active" and determines the index of the nearest centroid and the distance to it. The second constraint is a symmetry-breaking constraint, ordering the centroids by their first coordinate.
MOSEK only supports Disjunctive Normal Form (DNF) of affine constraints. Formally, this means that each Disjunctive Constraint (DJC) is of the form $ \bigvee_i \bigwedge_j T_{i, j}$, where $T_{i,j}$ is an affine constraint. Such a constraint is satisfied if and only if there exists at least one term $i$ such that all affine constraints $T_{i,j}$ are satisfied. We therefore need to move the non-linearity out of the disjunction. This can be done by introducing new auxiliary variables $dAux_{i, j}$ and constraining them outside of the disjunctions. The program then takes the following form:
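To make the DNF semantics concrete, here is a small pure-NumPy sketch, independent of MOSEK, that evaluates a disjunction of conjunctions of affine constraints $a^\top x \leq b$ at a given point (the helper name `dnf_satisfied` is ours, purely for illustration):

```python
import numpy as np

def dnf_satisfied(terms, x):
    """Check whether x satisfies a disjunction of conjunctions of
    affine constraints. Each term is a list of (a, b) pairs meaning
    a @ x <= b; the DNF holds if at least one term has all pairs satisfied."""
    return any(all(a @ x <= b for a, b in term) for term in terms)

# toy example: (x1 <= 0 AND x2 <= 0) OR (x1 >= 1, written as -x1 <= -1)
terms = [
    [(np.array([1.0, 0.0]), 0.0), (np.array([0.0, 1.0]), 0.0)],
    [(np.array([-1.0, 0.0]), -1.0)],
]
print(dnf_satisfied(terms, np.array([-1.0, -0.5])))  # first term holds -> True
print(dnf_satisfied(terms, np.array([0.5, 0.5])))    # neither term holds -> False
```

This mirrors what the solver enforces: at least one full conjunction of affine constraints must hold at the solution.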
$$\begin{array}{rll} \text{minimize} & \sum_{i=1}^n d_i & \\ \text{subject to} & \bigvee_{j \in \{1, .., \mathcal{K}\}} \Bigl[ d_i \geq dAux_{i, j} \wedge \hspace{0.2cm} Y_i = j \Bigr], & \forall i \in \{1, ..., n\},\\ & dAux_{i, j} \geq || \textbf{c}_j - \textbf{p}_i ||_2^2, & \forall j \in \{1, .., \mathcal{K}\}, \forall i \in \{1, ..., n\} \\ & c_{j-1, 1} \leq c_{j, 1}, & \forall j \in \{2, .., \mathcal{K}\}, \\ \\ & dAux \in \mathbb{R}^{n \times \mathcal{K}}, & \\ & d_1, ..., d_n \in \mathbb{R}, & \\ & \textbf{c}_1, ..., \textbf{c}_{\mathcal{K}} \in \mathbb{R}^\mathcal{D}, & \\ & Y_1, ..., Y_n \in \{1, .., \mathcal{K}\}. & \end{array}$$To prepare the synthetic data, we generate 3 clusters with the same number of points. The clusters are drawn from multivariate normal distributions, each with an appropriate mean vector and covariance matrix.
import numpy as np
from mosek.fusion import *
import matplotlib.pyplot as plt
import sys
# make the randomness deterministic
np.random.seed(0)
# specify the number of points
nData = 21
# generate data
numberOfClusters = 3
size = nData // numberOfClusters
pointsClass1 = np.random.multivariate_normal(np.array([1, 1])*2, np.array([[1, 0], [0, 1]])*0.7, size=size)
pointsClass2 = np.random.multivariate_normal(np.array([-1, -1])*2, np.array([[1, 0], [0, 1]])*0.7, size=size)
pointsClass3 = np.random.multivariate_normal(np.array([-1, 3])*2, np.array([[1, 0], [0, 1]])*0.7, size=size)
points = np.vstack((pointsClass1, pointsClass2, pointsClass3))
# plot the generated data
labels = np.zeros(3*size)
labels[size:2*size] = 1
labels[2*size:] = 2
plt.title("Generated Data")
plt.scatter(pointsClass1[:, 0], pointsClass1[:, 1])
plt.scatter(pointsClass2[:, 0], pointsClass2[:, 1])
plt.scatter(pointsClass3[:, 0], pointsClass3[:, 1])
plt.show()
In the following block, we show the K-Means model in MOSEK Fusion for Python in a vectorized fashion.
# get shape
n, d = points.shape
xs = np.repeat( points, numberOfClusters, axis=0)
# create model
with Model() as M:
    ########## create variables ##########
    centroids = M.variable( [numberOfClusters, d], Domain.unbounded() )
    distances = M.variable( n, Domain.greaterThan(0) )
    distanceAux = M.variable( [n, numberOfClusters], Domain.greaterThan(0) )
    # create classification labels
    Y = M.variable( n, Domain.integral( Domain.inRange(0, numberOfClusters-1) ) )
    ########## create constraints ##########
    # rotated quadratic cone constraints
    a1 = Expr.flatten( distanceAux )
    a2 = Expr.constTerm( [n*numberOfClusters, 1], 0.5 )
    a3 = Expr.sub( Expr.repeat(centroids, n, 0), xs )
    hstack = Expr.hstack( [a1, a2, a3] )  # ( dAux_{ij}, 1/2, c_j - p_i ) in Qr
    M.constraint( hstack, Domain.inRotatedQCone() )
    # create disjunctive constraints
    for pointInd in range(n):
        label = Y.index(pointInd)
        di = distances.index(pointInd)
        ANDs = {}
        # create AND terms
        for clusterInd in range(numberOfClusters):
            dijAux = distanceAux.index([pointInd, clusterInd])
            ANDs[clusterInd] = DJC.AND( DJC.term( Expr.sub(di, dijAux), Domain.greaterThan(0) ),  # d_i >= dAux_{ij}
                                        DJC.term( label, Domain.equalsTo(clusterInd) ) )          # Y_i = j
        M.disjunction( [ ANDs[i] for i in range(numberOfClusters) ] )
    # symmetry-breaking constraints
    M.constraint( Expr.sub( centroids.slice([0, 0], [numberOfClusters-1, 1]),
                            centroids.slice([1, 0], [numberOfClusters, 1]) ), Domain.lessThan(0) )
    ########## solve ##########
    M.objective( ObjectiveSense.Minimize, Expr.sum(distances) )
    M.setLogHandler(sys.stdout)  # enable log output
    # solve
    M.solve()
    # get centroids and cluster labels
    labels_KMEANS = Y.level()
    centroids_KMEANS = np.array(centroids.level()).reshape(numberOfClusters, d)
Problem
  Objective sense        : minimize
  Type                   : CONIC (conic optimization problem)
  Constraints            : 2
  Affine conic cons.     : 63
  Disjunctive cons.      : 21
  Scalar variables       : 112
  Integer variables      : 21

Mixed integer optimizer started. Threads used: 64
Presolved problem: 420 variables, 290 constraints, 664 non-zeros
Presolved problem: 21 general integer, 63 binary, 336 continuous
Presolved problem: 63 cones, 63 disjunctions
[branch-and-bound progress log trimmed]
An optimal solution satisfying the relative gap tolerance of 1.00e-02(%) has been located.

Objective of best integer solution : 2.990062050092e+01
Best objective bound               : 2.990062050092e+01
Number of branches                 : 14937
Number of relaxations solved       : 8936
Number of interior point iterations: 125523
Mixed integer optimizer terminated. Time: 0.88
Optimizer terminated. Time: 0.96

Integer solution summary
  Problem status  : PRIMAL_FEASIBLE
  Solution status : INTEGER_OPTIMAL
  Primal.  obj: 2.9900620501e+01    Viol.  con: 0e+00  var: 0e+00  acc: 0e+00  djc: 0e+00  itg: 0e+00
def plotTheResults(centroids, points, labels, kmeans=True):
    # scatter centroids and data
    plt.scatter(centroids[:, 0], centroids[:, 1], marker='x', color='r')
    plt.scatter(points[:, 0], points[:, 1], marker='o')
    # draw lines from the points to their centroids
    for i in range(numberOfClusters):
        centroid = centroids[i, :]
        cluster = points[labels == i, :]
        nPointsInCluster = cluster.shape[0]
        centroidCopied = np.tile(centroid, (nPointsInCluster, 1))
        xx = np.vstack([ centroidCopied[:, 0], cluster[:, 0] ])
        yy = np.vstack([ centroidCopied[:, 1], cluster[:, 1] ])
        # plot the lines
        plt.plot(xx, yy, '--', color='green', alpha=0.4)
    # show the plot
    if kmeans:
        plt.title("Solution to the K-Means clustering problem")
    else:
        plt.title("Solution to the Euclidean clustering problem")
    plt.legend(["centroids", "data"])
    plt.show()

# call the function
plotTheResults(centroids_KMEANS, points, labels_KMEANS, kmeans=True)
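For reference, the heuristic mentioned in the introduction is typically Lloyd's algorithm, which can be sketched in a few lines of NumPy; unlike the exact disjunctive model above, it only guarantees a local optimum. This is an illustrative sketch on its own small data set; in the notebook one could call it with `points` and `numberOfClusters` instead:

```python
import numpy as np

def lloyd_kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd heuristic: alternate nearest-centroid assignment and
    centroid update. Returns (centroids, labels); may get stuck in a
    local optimum depending on the random initialization."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # assign each point to the nearest centroid (squared Euclidean distance)
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # move each non-empty centroid to the mean of its assigned points
        new = np.array([points[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, labels

# small demo on two well-separated blobs
rng = np.random.default_rng(0)
demo = np.vstack([rng.normal(0, 0.1, (10, 2)), rng.normal(5, 0.1, (10, 2))])
c, lab = lloyd_kmeans(demo, 2)
```

Comparing its objective against the solver's `2.99e+01` on the notebook's data would show whether the heuristic happened to find the global optimum on this instance.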
To change the proposed model to Euclidean clustering, we only need to replace the squared norm in the distance constraints with the standard Euclidean norm. This approach is more robust to outliers than K-Means. The Euclidean clustering problem has the following form:
$$\begin{array}{rll} \text{minimize} & \sum_{i=1}^n d_i & \\ \text{subject to} & \bigvee_{j \in \{1, .., \mathcal{K}\}} \Bigl[ \hspace{0.2cm} d_i \geq dAux_{i, j} \wedge Y_i = j \Bigr], & \forall i \in \{1, ..., n\},\\ & dAux_{i, j} \geq || \textbf{c}_j - \textbf{p}_i ||_2, & \forall j \in \{1, .., \mathcal{K}\}, \forall i \in \{1, ..., n\}, \\ & c_{j-1, 1} \leq c_{j, 1}, & \forall j \in \{2, .., \mathcal{K}\}, \\ \\ & dAux \in \mathbb{R}^{n \times \mathcal{K}}_+, & \\ & d_1, ..., d_n \in \mathbb{R}_{+}, & \\ & \textbf{c}_1, ..., \textbf{c}_{\mathcal{K}} \in \mathbb{R}^\mathcal{D}, & \\ & Y_1, ..., Y_n \in \{1, .., \mathcal{K}\}. & \end{array}$$Heuristic algorithms usually solve such a problem by finding a geometric median. In the Fusion API model, we only need to swap the Rotated Quadratic Cone for a Quadratic Cone.
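For a single cluster, the optimal centroid of this objective is exactly the geometric median, which heuristics commonly approximate with Weiszfeld's fixed-point iteration. A minimal NumPy sketch (our own illustrative helper, not part of the Fusion model):

```python
import numpy as np

def geometric_median(points, iters=200, eps=1e-9):
    """Weiszfeld's iteration for the geometric median: the point
    minimizing the sum of (unsquared) Euclidean distances."""
    y = points.mean(axis=0)  # start from the mean (the K-Means centroid)
    for _ in range(iters):
        dist = np.linalg.norm(points - y, axis=1)
        dist = np.maximum(dist, eps)  # avoid division by zero at data points
        w = 1.0 / dist
        y_new = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(y_new - y) < eps:
            return y_new
        y = y_new
    return y

# for three collinear points the geometric median is the middle point,
# while the mean (K-Means centroid) is pulled toward the outlier at 10
pts = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]])
print(geometric_median(pts))  # close to [1, 0]
print(pts.mean(axis=0))       # [3.667..., 0] -- dragged by the outlier
```

This small example also illustrates the robustness claim above: the unsquared objective ignores how far the outlier is, while the squared one does not.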
# use the same data
plt.title("Generated Data")
plt.scatter(pointsClass1[:, 0], pointsClass1[:, 1])
plt.scatter(pointsClass2[:, 0], pointsClass2[:, 1])
plt.scatter(pointsClass3[:, 0], pointsClass3[:, 1])
plt.show()
# get shape
n, d = points.shape
xs = np.repeat( points, numberOfClusters, axis=0)
# create model
with Model() as M:
    ########## create variables ##########
    centroids = M.variable( [numberOfClusters, d], Domain.unbounded() )
    distances = M.variable( n, Domain.greaterThan(0) )
    distanceAux = M.variable( [n, numberOfClusters], Domain.greaterThan(0) )
    # create classification labels
    Y = M.variable( n, Domain.integral( Domain.inRange(0, numberOfClusters-1) ) )
    ########## create constraints ##########
    # quadratic cone constraints - THE ONLY DIFFERENCE FROM THE K-MEANS MODEL
    a1 = Expr.flatten( distanceAux )
    a2 = Expr.sub( Expr.repeat(centroids, n, 0), xs )
    hstack = Expr.hstack( [a1, a2] )  # ( dAux_{ij}, c_j - p_i ) in Q
    M.constraint( hstack, Domain.inQCone() )  # quadratic instead of rotated quadratic cone
    # create disjunctive constraints
    for pointInd in range(n):
        label = Y.index(pointInd)
        di = distances.index(pointInd)
        ANDs = {}
        # create AND terms
        for clusterInd in range(numberOfClusters):
            dijAux = distanceAux.index([pointInd, clusterInd])
            ANDs[clusterInd] = DJC.AND( DJC.term( Expr.sub(di, dijAux), Domain.greaterThan(0) ),  # d_i >= dAux_{ij}
                                        DJC.term( label, Domain.equalsTo(clusterInd) ) )          # Y_i = j
        M.disjunction( [ ANDs[i] for i in range(numberOfClusters) ] )
    # symmetry-breaking constraints
    M.constraint( Expr.sub( centroids.slice([0, 0], [numberOfClusters-1, 1]),
                            centroids.slice([1, 0], [numberOfClusters, 1]) ), Domain.lessThan(0) )
    ########## solve ##########
    M.objective( ObjectiveSense.Minimize, Expr.sum(distances) )
    M.setLogHandler(sys.stdout)  # enable log output
    # solve
    M.solve()
    # get centroids and cluster labels
    labels_EUCL = Y.level()
    centroids_EUCL = np.array(centroids.level()).reshape(numberOfClusters, d)
Problem
  Objective sense        : minimize
  Type                   : CONIC (conic optimization problem)
  Constraints            : 2
  Affine conic cons.     : 63
  Disjunctive cons.      : 21
  Scalar variables       : 112
  Integer variables      : 21

Mixed integer optimizer started. Threads used: 64
Presolved problem: 357 variables, 290 constraints, 664 non-zeros
Presolved problem: 21 general integer, 63 binary, 273 continuous
Presolved problem: 63 cones, 63 disjunctions
[branch-and-bound progress log trimmed]
An optimal solution satisfying the relative gap tolerance of 1.00e-02(%) has been located.

Objective of best integer solution : 2.085285733044e+01
Best objective bound               : 2.085285733044e+01
Number of branches                 : 56487
Number of relaxations solved       : 49997
Number of interior point iterations: 655050
Mixed integer optimizer terminated. Time: 2.47
Optimizer terminated. Time: 2.51

Integer solution summary
  Problem status  : PRIMAL_FEASIBLE
  Solution status : INTEGER_OPTIMAL
  Primal.  obj: 2.0852857330e+01    Viol.  con: 0e+00  var: 0e+00  acc: 0e+00  djc: 0e+00  itg: 0e+00
# call the plotting functions
plotTheResults(centroids_KMEANS, points, labels_KMEANS, kmeans=True)
plotTheResults(centroids_EUCL, points, labels_EUCL, kmeans=False)
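As a sanity check, either objective can be recomputed directly from the returned centroids and labels with a small helper (our own, not part of the original notebook); on a tiny hand-made example the values are easy to verify:

```python
import numpy as np

def clusteringObjective(points, centroids, labels, squared=True):
    """Recompute the clustering objective: the sum over all points of the
    (squared or plain) Euclidean distance to the assigned centroid."""
    diffs = points - centroids[labels.astype(int)]
    dists = np.linalg.norm(diffs, axis=1)
    return float(np.sum(dists**2 if squared else dists))

# tiny demo: points sitting exactly on their centroids give objective 0
demoPoints = np.array([[0.0, 0.0], [2.0, 2.0]])
demoCentroids = np.array([[0.0, 0.0], [2.0, 2.0]])
demoLabels = np.array([0.0, 1.0])
print(clusteringObjective(demoPoints, demoCentroids, demoLabels))  # 0.0

# in the notebook one would call, e.g.:
# clusteringObjective(points, centroids_KMEANS, labels_KMEANS, squared=True)
# clusteringObjective(points, centroids_EUCL, labels_EUCL, squared=False)
```

These values should match the solver's reported optimal objectives (`2.99e+01` and `2.09e+01` above) up to the mixed-integer gap tolerance.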
This work is licensed under a Creative Commons Attribution 4.0 International License. The MOSEK logo and name are trademarks of Mosek ApS. The code is provided as-is. Compatibility with future releases of MOSEK or the Fusion API is not guaranteed. For more information contact our support.