Any convex function can be approximated to some extent by a convex piecewise linear function. In this notebook we show how to model the fitting of such a function as a least squares problem. This model was introduced and studied by Kuosmanen and by Seijo and Sen, among others.
We are given a set of points $X_1,\ldots,X_n\in \mathbb{R}^d$ and $Y_1,\ldots,Y_n\in\mathbb{R}$, which we presume were generated as $Y_i = f(X_i) + \varepsilon_i$, where $f$ is a convex function and $\varepsilon_i$ is noise. Our least squares problem looks as follows:
$$\begin{array}{rll} \text{minimize} & \sum_{i=1}^n (t_i - Y_i)^2 & \\ \text{subject to} & t_i \geq t_j + s_j^T (X_i - X_j) &\text{for all}\ 1\leq i,j\leq n, \\ & t_1, \ldots, t_n \in \mathbb{R}, & \\ & s_1, \ldots, s_n \in \mathbb{R}^d. & \end{array}$$

Then the piecewise linear function approximating the data will be defined as the maximum $\hat{\Phi}(x)=\max\{\Phi_i(x),\ i=1,\ldots,n\}$ of $n$ linear functions $\Phi_i:\mathbb{R}^d\to \mathbb{R}$ given by

$$\Phi_i(x) = t_i + s_i^T(x-X_i)$$

for $i=1,\ldots,n$.
Note that $\Phi_i(X_i)=t_i$ and $\nabla \Phi_i(X_i)=s_i$. Therefore the main constraint appearing in the model corresponds precisely to the convexity (subgradient) requirement for the function $\hat{\Phi}$:

$$\hat{\Phi}(X_i) - \hat{\Phi}(X_j) \geq \langle\nabla \hat{\Phi}(X_j),X_i-X_j\rangle.$$
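As a quick sanity check, the pointwise maximum of affine functions is always convex. A small NumPy sketch (using made-up values of $t_i$ and $s_i$, not solver output) verifies the midpoint convexity inequality numerically:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 5, 2
# hypothetical anchor points, values and subgradients (placeholders only)
X = rng.uniform(-1, 1, (n, d))
t = rng.normal(size=n)
s = rng.normal(size=(n, d))

def phi_hat(x):
    # hat{Phi}(x) = max_i { t_i + <s_i, x - X_i> }
    return np.max(t + (s * (x - X)).sum(axis=1))

# midpoint convexity: phi((a+b)/2) <= (phi(a) + phi(b)) / 2
a, b = rng.uniform(-1, 1, d), rng.uniform(-1, 1, d)
assert phi_hat((a + b) / 2) <= (phi_hat(a) + phi_hat(b)) / 2 + 1e-12
```

The inequality holds for any choice of $t_i$, $s_i$, since $\hat{\Phi}\big(\tfrac{a+b}{2}\big) = \max_i \tfrac{\Phi_i(a)+\Phi_i(b)}{2} \leq \tfrac{\hat{\Phi}(a)+\hat{\Phi}(b)}{2}$.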
In the example we will show an approximation of a simple quadratic function $f(x) = x^Tx.$
Let us first define our data. We generate $n$ points of dimension $d$ uniformly from $[-1, 1]^d$. The corresponding response is given as $Y_i = f(X_i) + \varepsilon_i$. The error $\varepsilon$ follows the normal distribution $N(0, \sigma^2 I_n)$, where $\sigma^2 = \mathrm{Var}(f(\mathbf{X})) / \mathrm{SNR}$ and $\mathrm{SNR} = 3$. Lastly, we mean-center the responses $Y$ and the data $X$ and standardize them to unit $l_2$ norm; we do this to get a more predictive model.
import numpy as np
from mosek.fusion import *
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = [10, 10]
from matplotlib import cm
import sys
np.random.seed(0)
# specify the dimension
n = 200
d = 2
# specify the function
fun = lambda x: np.sum(np.multiply(x, x), 1) # x^T x
# generate data and get corresponding responses
x = np.random.uniform(-1, 1, n*d).reshape(n, d)
response = fun(x)
# compute response vector with error
varResponse = np.var(response, ddof=1) # unbiased sample variance
SNR = 3
stdError = np.sqrt(varResponse / SNR)
error = np.random.normal(0, stdError, n)
responseWithError = response + error
# standardize and mean-center the response
meanY = np.sum(responseWithError) / n
standY = responseWithError - meanY
norm = np.sqrt( np.sum( np.square( standY )))
Y = standY / norm
# standardize the x (by the dimension) and mean-center
meanX = np.sum(x, axis=0) / n
standX = x - meanX
norm = np.sqrt(np.sum (np.square( standX ), axis=0))
X = np.divide(standX, norm)
# Plot generated data
fig = plt.figure()
ax = fig.add_subplot(projection='3d')
# set title
ax.set_title("Generated Data")
# scatter old points
scatter = ax.scatter(X[:, 0], X[:, 1], Y)
The following block shows how to implement this model with MOSEK Fusion for Python.
# solve QP
with Model('ConvexApproxLeastSquares') as M:
    ############## create variables ##############
    m = M.variable( 1, Domain.unbounded() )        # bound on the sum of squared residuals
    t = M.variable( n, Domain.unbounded() )
    ss = M.variable( [n, d], Domain.unbounded() )  # variables s stored in an n x d matrix
    ############## create constraints ##############
    # create the rotated quadratic cone constraint
    # auxDiff = t - Y
    auxDiff = M.variable( n, Domain.unbounded() )
    M.constraint( Expr.sub(t, auxDiff), Domain.equalsTo(Y) )
    # (0.5, m, t - Y) in Q_r
    auxHalf = M.variable( 1, Domain.equalsTo(0.5) )
    z1 = Var.vstack( auxHalf, m, auxDiff )
    M.constraint( z1, Domain.inRotatedQCone() )
    # create the convexity constraints on the piecewise linear function,
    # i.e.
    #   0 >= t_j - t_i + <s_j, x_i - x_j>
    # compute t_j - t_i
    t_is = Expr.flatten( Expr.repeat(t, n, 1) )
    t_js = Expr.repeat( t, n, 0 )
    Diff = Expr.sub(t_js, t_is)
    # compute the dot product <s_j, x_i - x_j>
    x_is = np.repeat( X, repeats=n, axis=0 )
    x_js = np.tile( X, (n, 1) )
    ss_stack = Expr.repeat( ss, n, 0 )
    dotProd = Expr.sum( Expr.mulElm(ss_stack, x_is - x_js), 1 )
    # add the constraint
    expr = Expr.add(Diff, dotProd)
    M.constraint( expr, Domain.lessThan(0.0) )
    ############## solve problem ##############
    # minimize the least squares objective
    M.objective( "obj", ObjectiveSense.Minimize, m )
    # set the log handler to stdout
    M.setLogHandler(sys.stdout)
    M.solve()
    # get the optimal values
    solt = t.level()
    sols = ss.level().reshape((n, d))
Problem
  Name                   : ConvexApproxLeastSquares
  Objective sense        : minimize
  Type                   : CONIC (conic optimization problem)
  Constraints            : 40200
  Affine conic cons.     : 1
  Disjunctive cons.      : 0
  Cones                  : 0
  Scalar variables       : 803
  Matrix variables       : 0
  Integer variables      : 0

Optimizer started.
Presolve started.
Linear dependency checker started.
Linear dependency checker terminated.
Eliminator started.
Freed constraints in eliminator : 200
Eliminator terminated.
Eliminator started.
Freed constraints in eliminator : 0
Eliminator terminated.
Eliminator - tries                  : 2                 time                   : 0.00
Lin. dep.  - tries                  : 1                 time                   : 0.00
Lin. dep.  - number                 : 0
Presolve terminated. Time: 0.03
Problem
  Name                   : ConvexApproxLeastSquares
  Objective sense        : minimize
  Type                   : CONIC (conic optimization problem)
  Constraints            : 40200
  Affine conic cons.     : 1
  Disjunctive cons.      : 0
  Cones                  : 0
  Scalar variables       : 803
  Matrix variables       : 0
  Integer variables      : 0

Optimizer  - threads                : 64
Optimizer  - solved problem         : the dual
Optimizer  - Constraints            : 592
Optimizer  - Cones                  : 1
Optimizer  - Scalar variables       : 39206             conic                  : 202
Optimizer  - Semi-definite variables: 0                 scalarized             : 0
Factor     - setup time             : 0.02              dense det. time        : 0.00
Factor     - ML order time          : 0.00              GP order time          : 0.00
Factor     - nonzeros before factor : 9.91e+04          after factor           : 9.91e+04
Factor     - dense dim.             : 0                 flops                  : 1.94e+07
ITE PFEAS    DFEAS    GFEAS    PRSTATUS   POBJ              DOBJ              MU       TIME
0   2.1e-01  1.0e+00  2.1e+00  0.00e+00   7.071067812e-01   -3.535533906e-01  1.0e+00  0.06
1   1.2e-01  5.7e-01  1.8e-01  1.23e+00   2.938859452e-01   -2.209341240e-01  5.7e-01  0.11
2   3.4e-02  1.7e-01  3.7e-02  2.36e+00   3.283410900e-01   2.541459120e-01   1.7e-01  0.13
3   5.1e-03  2.4e-02  2.0e-03  1.57e+00   2.984161649e-01   2.900558823e-01   2.4e-02  0.14
4   1.2e-03  5.9e-03  2.3e-04  1.07e+00   2.833396729e-01   2.813883669e-01   5.9e-03  0.15
5   5.2e-04  2.5e-03  6.1e-05  1.01e+00   2.697746770e-01   2.689233337e-01   2.5e-03  0.16
6   1.9e-04  9.3e-04  1.3e-05  1.00e+00   2.540998423e-01   2.537768700e-01   9.3e-04  0.17
7   1.1e-04  5.2e-04  5.5e-06  9.90e-01   2.405247711e-01   2.403411101e-01   5.2e-04  0.18
8   7.6e-05  3.7e-04  3.2e-06  9.87e-01   2.395813200e-01   2.394502320e-01   3.7e-04  0.19
9   3.5e-05  1.7e-04  9.8e-07  9.44e-01   2.282203169e-01   2.281592375e-01   1.7e-04  0.20
10  1.8e-05  8.7e-05  3.7e-07  9.48e-01   2.228880403e-01   2.228554922e-01   8.7e-05  0.21
11  1.2e-05  5.6e-05  1.9e-07  9.47e-01   2.206282771e-01   2.206070722e-01   5.6e-05  0.22
12  8.1e-06  3.9e-05  1.1e-07  9.45e-01   2.192059131e-01   2.191906637e-01   3.9e-05  0.22
13  5.6e-06  2.7e-05  6.6e-08  9.08e-01   2.175484624e-01   2.175376218e-01   2.7e-05  0.23
14  5.3e-06  2.6e-05  6.1e-08  8.31e-01   2.172616367e-01   2.172513140e-01   2.6e-05  0.24
15  4.1e-06  2.0e-05  4.2e-08  8.35e-01   2.161293821e-01   2.161211400e-01   2.0e-05  0.25
16  3.1e-06  1.5e-05  2.8e-08  8.61e-01   2.152832407e-01   2.152768442e-01   1.5e-05  0.26
17  2.5e-06  1.2e-05  2.1e-08  8.75e-01   2.148823263e-01   2.148769841e-01   1.2e-05  0.27
18  1.7e-06  8.2e-06  1.2e-08  8.85e-01   2.143013544e-01   2.142976924e-01   8.2e-06  0.28
19  1.6e-06  7.7e-06  1.1e-08  9.03e-01   2.142786291e-01   2.142751783e-01   7.7e-06  0.29
20  9.9e-07  4.8e-06  5.4e-09  9.09e-01   2.137893790e-01   2.137871826e-01   4.8e-06  0.30
21  8.5e-07  4.1e-06  4.4e-09  8.60e-01   2.136317544e-01   2.136298245e-01   4.1e-06  0.30
22  8.1e-07  3.9e-06  4.1e-09  8.33e-01   2.135692363e-01   2.135673840e-01   3.9e-06  0.31
23  7.9e-07  3.8e-06  3.9e-09  8.35e-01   2.135530013e-01   2.135511868e-01   3.8e-06  0.32
24  6.9e-07  3.3e-06  3.2e-09  8.33e-01   2.134027805e-01   2.134011731e-01   3.3e-06  0.33
25  3.5e-07  1.7e-06  1.2e-09  8.23e-01   2.129778108e-01   2.129769367e-01   1.7e-06  0.34
26  3.3e-07  1.6e-06  1.1e-09  8.43e-01   2.129379387e-01   2.129371256e-01   1.6e-06  0.35
27  2.2e-07  1.1e-06  6.5e-10  8.48e-01   2.127934005e-01   2.127928223e-01   1.1e-06  0.36
28  1.2e-07  5.8e-07  2.7e-10  8.72e-01   2.126512890e-01   2.126509681e-01   5.8e-07  0.37
29  1.2e-07  5.7e-07  2.6e-10  9.18e-01   2.126472296e-01   2.126469133e-01   5.7e-07  0.37
30  4.6e-08  2.2e-07  6.5e-11  9.22e-01   2.125597556e-01   2.125596301e-01   2.2e-07  0.38
31  1.7e-08  8.3e-08  1.5e-11  9.61e-01   2.125239401e-01   2.125238932e-01   8.3e-08  0.39
32  3.7e-09  1.9e-08  1.7e-12  9.85e-01   2.125108858e-01   2.125108750e-01   1.9e-08  0.40
Optimizer terminated. Time: 0.42

Interior-point solution summary
  Problem status  : PRIMAL_AND_DUAL_FEASIBLE
  Solution status : OPTIMAL
  Primal.  obj: 2.1251088582e-01    nrm: 5e+01    Viol.  con: 4e-14    var: 0e+00    acc: 0e+00
  Dual.    obj: 2.1251087505e-01    nrm: 1e+00    Viol.  con: 6e-09    var: 1e-12    acc: 0e+00
Let us evaluate our fit. This can be done as follows:
$\hat \Phi (\mathbf{x}) = \max \limits _{1 \leq j \leq n} \{ \hat{t}_j + \hat{s}_j^T (\mathbf{x} - X_j) \}$
We can now see an obvious drawback of this method: we need to keep all $n$ points to evaluate the model. Please note that the code below is not vectorized for educational purposes, but it should be if one is aiming for efficiency.
# create points for evaluation
n = 100
xs = np.linspace(-.1, .1, n)
ys = np.linspace(-.1, .1, n)
xx, yy = np.meshgrid(xs, ys)
x = np.c_[xx.reshape(xx.size, 1), yy.reshape(yy.size, 1)]
x = x.astype(np.double)
# get the number of linear functions
numberOfLinearFuns = solt.size
# evaluate the data
phi = np.zeros(n**2)
for i in range(0, n**2):
    possibleValues = np.zeros(numberOfLinearFuns)
    for j in range(0, numberOfLinearFuns):
        # max_j { t_j + <s_j , x_i - X_j> }
        val = solt[j] + np.dot(sols[j, :], x[i, :] - X[j, :])
        possibleValues[j] = val
    phi[i] = np.max(possibleValues)
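For reference, the double loop above can be collapsed into a single broadcasted expression by rewriting $\hat{t}_j + \hat{s}_j^T(x_i - X_j)$ as $\hat{s}_j^T x_i + (\hat{t}_j - \hat{s}_j^T X_j)$. The sketch below uses synthetic stand-ins for `solt`, `sols`, `X` and the evaluation grid `x` (random placeholders, not solver output) so it is self-contained:

```python
import numpy as np

# stand-in data with the same shapes as in the notebook (hypothetical values)
rng = np.random.default_rng(0)
n_pts, n_planes, d = 100, 200, 2
x = rng.uniform(-0.1, 0.1, (n_pts, d))   # evaluation points
X = rng.uniform(-1, 1, (n_planes, d))    # fitted anchor points X_j
solt = rng.normal(size=n_planes)         # t_j
sols = rng.normal(size=(n_planes, d))    # s_j

# t_j + <s_j, x_i - X_j> = <s_j, x_i> + (t_j - <s_j, X_j>)
offsets = solt - np.sum(sols * X, axis=1)        # shape (n_planes,)
phi_vec = np.max(x @ sols.T + offsets, axis=1)   # shape (n_pts,)

# agrees with the double-loop evaluation
phi_loop = np.array([np.max([solt[j] + sols[j] @ (xi - X[j])
                             for j in range(n_planes)])
                     for xi in x])
assert np.allclose(phi_vec, phi_loop)
```

This reduces the evaluation to one matrix product of size `n_pts x n_planes` plus a row-wise maximum.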
Finally, we can plot our solution.
# create plot
fig = plt.figure()
ax = fig.add_subplot(projection='3d')
# set title
ax.set_title("Piecewise linear approximation")
# scatter old points
ax.scatter(X[:, 0], X[:, 1], Y)
# plot the model
xx = x[:, 0].reshape(n, n)
yy = x[:, 1].reshape(n, n)
zz = phi.reshape(n, n)
axs = ax.contour3D(xx, yy, zz, 50, cmap=cm.cool)
# set the legend
leg = ax.legend(["Data"])
This work is licensed under a Creative Commons Attribution 4.0 International License. The MOSEK logo and name are trademarks of Mosek ApS. The code is provided as-is. Compatibility with future releases of MOSEK or the Fusion API is not guaranteed. For more information contact our support.