Use the method of least squares to find a 'best bezier fit' for a hand-drawn arc.

This notebook investigates the cubic bezier function and shows how to perform a least-squares fit to find the optimal control points for a cubic bezier curve that best matches a hand-drawn curve.

For more background information on bezier curves see : http://en.wikipedia.org/wiki/B%C3%A9zier_curve

For an excellent tutorial on least-squares fitting see : http://www.embedded.com/electrical-engineer-community/industry-blog/4027019/1/Why-all-the-math-

Author : Luke Whittlesey

email : [email protected]

First, set up this IPython notebook for symbolic math using sympy and inline plotting using matplotlib.

In [1]:
# we need some ipython magic commands to 
# 1. allow sympy to pretty print math equations inline
# 2. use pylab for inline graphing
%pylab inline
%load_ext sympy.interactive.ipythonprinting
Welcome to pylab, a matplotlib-based Python environment [backend: module://IPython.zmq.pylab.backend_inline].
For more information, type 'help(pylab)'.

To make sure we understand what a cubic bezier curve is, let's create one numerically and then plot it.

In [2]:
def numerical_cubic_bezier_fn(control_pts, t):
    """ 
    Numerically calculate an x,y point for the bezier curve at position t. 

    Parameters: control_pts -- list of 4 (x,y) control points
                t -- 'time' value should be between 0 and 1

    Return: (xpt, ypt) -- tuple of x,y data on the curve
    """
    assert 0 <= t <= 1
    # define the actual cubic bezier equation here
    fn = lambda c,t : c[0]*(1 - t)**3 + c[1]*3*t*(1 - t)**2 + c[2]*3*t**2*(1 - t) + c[3]*t**3
    xs = [x for x,y in control_pts]
    ys = [y for x,y in control_pts]
    # now calculate the x,y position from the bezier equation
    xpt = fn(xs, t)
    ypt = fn(ys, t)
    return xpt, ypt

# just to make sure that I understand, let's create a list of points for a specific bezier curve
# arbitrarily define my 4 control points here
p0 = (0.7,0.2); p1 = (2.3,1.0); p2 = (7.2,0.8); p3 = (10.0,0.0)
# create a list of 1000 x,y points that traces over the length of this curve
points = [numerical_cubic_bezier_fn(control_pts=[p0,p1,p2,p3], t=t) for t in arange(0.0,1.0,1./1000)]
# separate the x's and y's so matplotlib can plot them
xs = [x for x,y in points]
ys = [y for x,y in points]

# plot the resulting curve and the control points
plot(xs, ys, 'b-')
hold(True)
for pt in [p0,p1,p2,p3]:
    plot( pt[0], pt[1], 'ro')
hold(False)
t=title('example cubic bezier curve')
#legend(['curve','control pts'])

Using sympy symbols, we will now go through the math for how to set up the least squares problem of finding the best cubic bezier fit for our hand-drawn data points.

First redefine the cubic bezier equation using sympy.

t must be defined in the range 0 to 1: \begin{aligned} t \in \left[ 0, 1 \right] \end{aligned}

In [3]:
from sympy.printing.mathml import mathml
from sympy import Symbol, Matrix, summation, diff, expand, collect, preview, Sum, latex, Eq, factor

t = Symbol('t_i')
c0, c1, c2, c3 = [Symbol(name) for name in ['c0','c1','c2','c3']]

# the equation for a cubic bezier path b(t) over t in [0,1]
# Note: the c control points are (x,y) tuples, but we're going to compute
# the resulting x and y values independently. x and y are independent of
# each other, but both are dependent on t, which goes from 0 to 1
symbolic_cubic_bezier_fn = c0*(1 - t)**3 + c1*3*t*(1 - t)**2 + c2*3*t**2*(1 - t) + c3*t**3
print 'The cubic bezier function B(t) is:'
B = Symbol('B(t_i)')
Eq(B, symbolic_cubic_bezier_fn)
The cubic bezier function B(t) is:
Out[3]:
$$B(t_{i}) = c_{0} \left(- t_{i} + 1\right)^{3} + 3 c_{1} t_{i} \left(- t_{i} + 1\right)^{2} + 3 c_{2} t_{i}^{2} \left(- t_{i} + 1\right) + c_{3} t_{i}^{3}$$

Now we define the error for each data point as the difference between what the best fit bezier function gives us and what the actual data point is.

\begin{aligned} e_i = B(t_i) - d_i \end{aligned}

In [4]:
d_i = Symbol('d_i')
# the error between what the 'best fit' bezier curve produces and the hand drawn point is defined as
error = symbolic_cubic_bezier_fn - d_i

e_i = Symbol('e_i')
# where symbolic_cubic_bezier_fn is the bezier B(t_i) and d_i is the i'th hand drawn point
Eq(e_i, error)
Out[4]:
$$e_{i} = c_{0} \left(- t_{i} + 1\right)^{3} + 3 c_{1} t_{i} \left(- t_{i} + 1\right)^{2} + 3 c_{2} t_{i}^{2} \left(- t_{i} + 1\right) + c_{3} t_{i}^{3} - d_{i}$$

The penalty function will be the sum of all of the squared errors, i.e. least squares.

\begin{aligned} M = \sum_{i=0}^{n} e_i^2 \end{aligned}
This penalty function, M, is what we will want to minimize to call the solution a best fit.

In [5]:
# the least squares penalty function is the sum of all of the squared errors
penalty_fn = error**2

print 'The penalty function is :'
i = Symbol('i')
n = Symbol('n')
M = Symbol('M')
Eq(M, Sum( penalty_fn, (i,0,n) ))
The penalty function is :
Out[5]:
$$M = \sum_{i=0}^{n} \left(c_{0} \left(- t_{i} + 1\right)^{3} + 3 c_{1} t_{i} \left(- t_{i} + 1\right)^{2} + 3 c_{2} t_{i}^{2} \left(- t_{i} + 1\right) + c_{3} t_{i}^{3} - d_{i}\right)^{2}$$

We want to minimize the penalty function with respect to each coefficient c0, c1, c2, c3, so we take the partial derivative with respect to each and find where it equals zero to locate the minimum (or maximum).

\begin{aligned} \frac{\partial M}{\partial c_0} = 0, \frac{\partial M}{\partial c_1} = 0, \frac{\partial M}{\partial c_2} = 0, \frac{\partial M}{\partial c_3} = 0 \end{aligned}
We know these stationary points can only be a minimum, not a maximum: M is a sum of squares, so it is bounded below by zero but grows without bound, leaving no finite maximum for the derivatives to locate.
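
Expanding any one of these with the chain rule makes the structure explicit; each equation weights the errors by how sensitive the curve is to that control point:

\begin{aligned} \frac{\partial M}{\partial c_k} = \sum_{i=0}^{n} 2 \, e_i \, \frac{\partial B(t_i)}{\partial c_k} = 0 \end{aligned}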

In [6]:
df_dc0 = collect( expand(penalty_fn.diff(c0), deep=True), [c0,c1,c2,c3])
print 'Partial derivative wrt c0 is:'
Eq(0, Sum(df_dc0, (i,0,n)) )
Partial derivative wrt c0 is:
Out[6]:
$$0 = \sum_{i=0}^{n} \left(c_{0} \left(2 t_{i}^{6} - 12 t_{i}^{5} + 30 t_{i}^{4} - 40 t_{i}^{3} + 30 t_{i}^{2} - 12 t_{i} + 2\right) + c_{1} \left(- 6 t_{i}^{6} + 30 t_{i}^{5} - 60 t_{i}^{4} + 60 t_{i}^{3} - 30 t_{i}^{2} + 6 t_{i}\right) + c_{2} \left(6 t_{i}^{6} - 24 t_{i}^{5} + 36 t_{i}^{4} - 24 t_{i}^{3} + 6 t_{i}^{2}\right) + c_{3} \left(- 2 t_{i}^{6} + 6 t_{i}^{5} - 6 t_{i}^{4} + 2 t_{i}^{3}\right) + 2 d_{i} t_{i}^{3} - 6 d_{i} t_{i}^{2} + 6 d_{i} t_{i} - 2 d_{i}\right)$$
In [7]:
df_dc1 = collect( expand(penalty_fn.diff(c1), deep=True), [c0,c1,c2,c3])
print 'Partial derivative wrt c1 is:'
Eq(0, Sum(df_dc1, (i,0,n)))
Partial derivative wrt c1 is:
Out[7]:
$$0 = \sum_{i=0}^{n} \left(c_{0} \left(- 6 t_{i}^{6} + 30 t_{i}^{5} - 60 t_{i}^{4} + 60 t_{i}^{3} - 30 t_{i}^{2} + 6 t_{i}\right) + c_{1} \left(18 t_{i}^{6} - 72 t_{i}^{5} + 108 t_{i}^{4} - 72 t_{i}^{3} + 18 t_{i}^{2}\right) + c_{2} \left(- 18 t_{i}^{6} + 54 t_{i}^{5} - 54 t_{i}^{4} + 18 t_{i}^{3}\right) + c_{3} \left(6 t_{i}^{6} - 12 t_{i}^{5} + 6 t_{i}^{4}\right) - 6 d_{i} t_{i}^{3} + 12 d_{i} t_{i}^{2} - 6 d_{i} t_{i}\right)$$
In [8]:
df_dc2 = collect( expand(penalty_fn.diff(c2), deep=True), [c0,c1,c2,c3])
print 'Partial derivative wrt c2 is:'
Eq(0, Sum(df_dc2, (i,0,n)) )
Partial derivative wrt c2 is:
Out[8]:
$$0 = \sum_{i=0}^{n} \left(c_{0} \left(6 t_{i}^{6} - 24 t_{i}^{5} + 36 t_{i}^{4} - 24 t_{i}^{3} + 6 t_{i}^{2}\right) + c_{1} \left(- 18 t_{i}^{6} + 54 t_{i}^{5} - 54 t_{i}^{4} + 18 t_{i}^{3}\right) + c_{2} \left(18 t_{i}^{6} - 36 t_{i}^{5} + 18 t_{i}^{4}\right) + c_{3} \left(- 6 t_{i}^{6} + 6 t_{i}^{5}\right) + 6 d_{i} t_{i}^{3} - 6 d_{i} t_{i}^{2}\right)$$
In [9]:
df_dc3 = collect( expand(penalty_fn.diff(c3), deep=True), [c0,c1,c2,c3])
print 'Partial derivative wrt c3 is:'
Eq(0, Sum(df_dc3, (i,0,n)) )
Partial derivative wrt c3 is:
Out[9]:
$$0 = \sum_{i=0}^{n} \left(c_{0} \left(- 2 t_{i}^{6} + 6 t_{i}^{5} - 6 t_{i}^{4} + 2 t_{i}^{3}\right) + c_{1} \left(6 t_{i}^{6} - 12 t_{i}^{5} + 6 t_{i}^{4}\right) + c_{2} \left(- 6 t_{i}^{6} + 6 t_{i}^{5}\right) + 2 c_{3} t_{i}^{6} - 2 d_{i} t_{i}^{3}\right)$$

Notice that each partial derivative contains c0, c1, c2, c3 only to the first power. This means that these equations are linear with respect to the c's. To minimize the error we set each partial derivative to zero, locating the minimum, and then solve the 4 simultaneous equations.

Now we have 4 equations and 4 unknowns. We can use linear algebra to solve this for us, but first we need to reshuffle the equations to prepare them for matrix form.

We will also factor to reduce some of the terms.
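
Before diving into the code, note the single pattern behind all four equations. Writing the bezier in terms of the Bernstein basis polynomials $b_0(t) = (1-t)^3$, $b_1(t) = 3t(1-t)^2$, $b_2(t) = 3t^2(1-t)$, $b_3(t) = t^3$, so that $B(t_i) = \sum_k c_k b_k(t_i)$, the j'th equation (after dividing out a common factor of 2) is just the classic normal equation of least squares:

\begin{aligned} \sum_{k=0}^{3} c_k \sum_{i=0}^{n} b_j(t_i) \, b_k(t_i) = \sum_{i=0}^{n} d_i \, b_j(t_i) \end{aligned}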

In [10]:
def pull_arguments_by_var(eqn, variables):
    """
    For an equation such as: a*2 + b*3*x + 5*x + 5
    this function will return a dictionary of the arguments that are multiplied by each variable.
    So if variables=[a,b], the dict returned would be:
        a:[2],
        b:[3*x],
        'other':[5*x, 5]

    Parameters: eqn -- a sympy equation that has already been term collected
                variables -- a list of variables to search for arguments that are multiplied by each variable

    Returns: cdict -- a dictionary containing the arguments for each variable and 'other'
    """
    # try to find what is multiplied by the variables in variables and other
    cdict = {}
    for var in variables:
        for arg in eqn.args:
            if var in arg.atoms(Symbol):
                cdict.setdefault(var, []).append(arg/var)
                #print '%s : %s' %(var, arg)
    for arg in eqn.args:
        # test for non-cx arguments
        if not any([(cc in arg) for cc in variables]):
            cdict.setdefault('other', []).append(arg)
            #print 'other : %s' %(arg)
    return cdict   
In [11]:
# from the 1st partial derivative, reshuffle the equations so the c's are on the left and the others are on the right
# also redistribute the sums
cvars = [c0,c1,c2,c3]
arg_dict = pull_arguments_by_var(eqn=df_dc0, variables=cvars)
leftside = c0*Sum(factor(arg_dict[c0][0]), (i,0,n)) + \
           c1*Sum(factor(arg_dict[c1][0]), (i,0,n)) + \
           c2*Sum(factor(arg_dict[c2][0]), (i,0,n)) + \
           c3*Sum(factor(arg_dict[c3][0]), (i,0,n))
rightside = arg_dict['other'][0]
for arg in arg_dict['other'][1:]:
    rightside = rightside + arg
print '1st Equation:'
Eq(leftside, -1*Sum(factor(rightside), (i,0,n)))
1st Equation:
Out[11]:
$$c_{0} \sum_{i=0}^{n} 2 \left(t_{i} -1\right)^{6} + c_{1} \sum_{i=0}^{n} - 6 t_{i} \left(t_{i} -1\right)^{5} + c_{2} \sum_{i=0}^{n} 6 t_{i}^{2} \left(t_{i} -1\right)^{4} + c_{3} \sum_{i=0}^{n} - 2 t_{i}^{3} \left(t_{i} -1\right)^{3} = - \sum_{i=0}^{n} 2 d_{i} \left(t_{i} -1\right)^{3}$$
In [12]:
# from the 2nd partial derivative, reshuffle the equations so the c's are on the left and the others are on the right
# also redistribute the sums
cvars = [c0,c1,c2,c3]
arg_dict = pull_arguments_by_var(eqn=df_dc1, variables=cvars)
leftside = c0*Sum(factor(arg_dict[c0][0]), (i,0,n)) + \
           c1*Sum(factor(arg_dict[c1][0]), (i,0,n)) + \
           c2*Sum(factor(arg_dict[c2][0]), (i,0,n)) + \
           c3*Sum(factor(arg_dict[c3][0]), (i,0,n))
rightside = arg_dict['other'][0]
for arg in arg_dict['other'][1:]:
    rightside = rightside + arg
print '2nd Equation:'
Eq(leftside, -1*Sum(factor(rightside), (i,0,n)))
2nd Equation:
Out[12]:
$$c_{0} \sum_{i=0}^{n} - 6 t_{i} \left(t_{i} -1\right)^{5} + c_{1} \sum_{i=0}^{n} 18 t_{i}^{2} \left(t_{i} -1\right)^{4} + c_{2} \sum_{i=0}^{n} - 18 t_{i}^{3} \left(t_{i} -1\right)^{3} + c_{3} \sum_{i=0}^{n} 6 t_{i}^{4} \left(t_{i} -1\right)^{2} = - \sum_{i=0}^{n} - 6 d_{i} t_{i} \left(t_{i} -1\right)^{2}$$
In [13]:
# from the 3rd partial derivative, reshuffle the equations so the c's are on the left and the others are on the right
# also redistribute the sums
cvars = [c0,c1,c2,c3]
arg_dict = pull_arguments_by_var(eqn=df_dc2, variables=cvars)
leftside = c0*Sum(factor(arg_dict[c0][0]), (i,0,n)) + \
           c1*Sum(factor(arg_dict[c1][0]), (i,0,n)) + \
           c2*Sum(factor(arg_dict[c2][0]), (i,0,n)) + \
           c3*Sum(factor(arg_dict[c3][0]), (i,0,n))
rightside = arg_dict['other'][0]
for arg in arg_dict['other'][1:]:
    rightside = rightside + arg
print '3rd Equation:'
Eq(leftside, -1*Sum(factor(rightside), (i,0,n)))
3rd Equation:
Out[13]:
$$c_{0} \sum_{i=0}^{n} 6 t_{i}^{2} \left(t_{i} -1\right)^{4} + c_{1} \sum_{i=0}^{n} - 18 t_{i}^{3} \left(t_{i} -1\right)^{3} + c_{2} \sum_{i=0}^{n} 18 t_{i}^{4} \left(t_{i} -1\right)^{2} + c_{3} \sum_{i=0}^{n} - 6 t_{i}^{5} \left(t_{i} -1\right) = - \sum_{i=0}^{n} 6 d_{i} t_{i}^{2} \left(t_{i} -1\right)$$
In [14]:
# from the 4th partial derivative, reshuffle the equations so the c's are on the left and the others are on the right
# also redistribute the sums
cvars = [c0,c1,c2,c3]
arg_dict = pull_arguments_by_var(eqn=df_dc3, variables=cvars)
leftside = c0*Sum(factor(arg_dict[c0][0]), (i,0,n)) + \
           c1*Sum(factor(arg_dict[c1][0]), (i,0,n)) + \
           c2*Sum(factor(arg_dict[c2][0]), (i,0,n)) + \
           c3*Sum(factor(arg_dict[c3][0]), (i,0,n))
rightside = arg_dict['other'][0]
for arg in arg_dict['other'][1:]:
    rightside = rightside + arg
print '4th Equation:'
Eq(leftside, -1*Sum(factor(rightside), (i,0,n)))
4th Equation:
Out[14]:
$$c_{0} \sum_{i=0}^{n} - 2 t_{i}^{3} \left(t_{i} -1\right)^{3} + c_{1} \sum_{i=0}^{n} 6 t_{i}^{4} \left(t_{i} -1\right)^{2} + c_{2} \sum_{i=0}^{n} - 6 t_{i}^{5} \left(t_{i} -1\right) + c_{3} \sum_{i=0}^{n} 2 t_{i}^{6} = - \sum_{i=0}^{n} - 2 d_{i} t_{i}^{3}$$

From the 4 equations above we can start to see the structure of the matrix form.

Set the problem up in the familiar pattern of multiple linear equations with multiple unknowns: \begin{aligned} \mathbf{A x = b} \end{aligned} where:

$\mathbf{A}$ is a 4x4 matrix of the summations listed on the left side of the equations

$\mathbf{x}$ is a 4x1 matrix containing the unknowns c0,c1,c2,c3

$\mathbf{b}$ is a 4x1 matrix of the summations listed on the right side of the equations

To solve, simply compute : $\mathbf{x = A^{-1} b}$. (For a small, well-conditioned 4x4 system the explicit inverse is fine; for larger systems you would solve the linear system directly instead.)

In [15]:
# build the actual matrices this time in a loop
cvars = [c0,c1,c2,c3]
a_vals = []
x_vals = cvars
b_vals = []
for eqn in [df_dc0, df_dc1, df_dc2, df_dc3]:
    arg_dict = pull_arguments_by_var(eqn=eqn, variables=cvars)
    for var in cvars:
        a_vals.append( Sum(factor(arg_dict[var][0]), (i,0,n)) )
    rightside = arg_dict['other'][0]
    for arg in arg_dict['other'][1:]:
        rightside = rightside + arg
    b_vals.append( -1*Sum(factor(rightside), (i,0,n)) )
    
    
A = Matrix(4,4,a_vals)
x = Matrix(4,1,x_vals)
b = Matrix(4,1,b_vals)
In [16]:
print 'A = '
A
A = 
Out[16]:
$$\left[\begin{smallmatrix}\sum_{i=0}^{n} 2 \left(t_{i} -1\right)^{6} & \sum_{i=0}^{n} - 6 t_{i} \left(t_{i} -1\right)^{5} & \sum_{i=0}^{n} 6 t_{i}^{2} \left(t_{i} -1\right)^{4} & \sum_{i=0}^{n} - 2 t_{i}^{3} \left(t_{i} -1\right)^{3}\\\sum_{i=0}^{n} - 6 t_{i} \left(t_{i} -1\right)^{5} & \sum_{i=0}^{n} 18 t_{i}^{2} \left(t_{i} -1\right)^{4} & \sum_{i=0}^{n} - 18 t_{i}^{3} \left(t_{i} -1\right)^{3} & \sum_{i=0}^{n} 6 t_{i}^{4} \left(t_{i} -1\right)^{2}\\\sum_{i=0}^{n} 6 t_{i}^{2} \left(t_{i} -1\right)^{4} & \sum_{i=0}^{n} - 18 t_{i}^{3} \left(t_{i} -1\right)^{3} & \sum_{i=0}^{n} 18 t_{i}^{4} \left(t_{i} -1\right)^{2} & \sum_{i=0}^{n} - 6 t_{i}^{5} \left(t_{i} -1\right)\\\sum_{i=0}^{n} - 2 t_{i}^{3} \left(t_{i} -1\right)^{3} & \sum_{i=0}^{n} 6 t_{i}^{4} \left(t_{i} -1\right)^{2} & \sum_{i=0}^{n} - 6 t_{i}^{5} \left(t_{i} -1\right) & \sum_{i=0}^{n} 2 t_{i}^{6}\end{smallmatrix}\right]$$
In [17]:
print 'x = '
x
x = 
Out[17]:
$$\left[\begin{smallmatrix}c_{0}\\c_{1}\\c_{2}\\c_{3}\end{smallmatrix}\right]$$
In [18]:
print 'b = '
b
b = 
Out[18]:
$$\left[\begin{smallmatrix}- \sum_{i=0}^{n} 2 d_{i} \left(t_{i} -1\right)^{3}\\- \sum_{i=0}^{n} - 6 d_{i} t_{i} \left(t_{i} -1\right)^{2}\\- \sum_{i=0}^{n} 6 d_{i} t_{i}^{2} \left(t_{i} -1\right)\\- \sum_{i=0}^{n} - 2 d_{i} t_{i}^{3}\end{smallmatrix}\right]$$

Now we're done with symbolic math.

The A and b matrices still look formidable with those summations in there. Realize, however, that the summations are computed from the data prior to solving for c0, c1, c2, and c3. As far as the c values are concerned, the summations involving the d_i's and t_i's just come out to be numbers computed from your hand-drawn data. This means that the matrix math is actually quite simple: you use your d_i and t_i values in the summations to place numbers into the 4x4 A matrix and the 4x1 b matrix, and from there we can do our matrix math numerically.
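
As an aside, here is a minimal numpy sketch of the same idea (it is not used in the rest of this notebook): the Bernstein basis matrix T turns every summation into a dot product, and because the symbolic A and b above are just 2*T^T*T and 2*T^T*d, the common factor of 2 cancels and solving T^T*T c = T^T*d gives the same answer.

import numpy as np

def bezier_fit_1d(ts, ds):
    """Least squares cubic bezier fit for one dimension (x or y)."""
    ts = np.asarray(ts, dtype=float)
    ds = np.asarray(ds, dtype=float)
    # Bernstein basis matrix: one row per data point, one column per control value
    T = np.column_stack([(1 - ts)**3,
                         3*ts*(1 - ts)**2,
                         3*ts**2*(1 - ts),
                         ts**3])
    # T.T.dot(T) is the 4x4 A matrix (over 2) and T.T.dot(ds) is the 4x1 b vector (over 2)
    return np.linalg.solve(T.T.dot(T), T.T.dot(ds))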

Now it's time to see if we can actually compute a bezier curve from hand drawn data.

This is not a GUI program for drawing an arc, so instead we'll add some noise to our original curve to simulate a poorly drawn line.

Remember that near the top of this worksheet we computed a bezier curve numerically. We're going to do the same thing here, but add noise to the output curve data to simulate a hand drawing.

In [19]:
from pylab import randn

# define the control points of a bezier curve we wish to generate
p0 = (0.7,0.2); p1 = (2.3,1.0); p2 = (7.2,0.8); p3 = (10.0,0.0)
# create a list of 1000 x,y points that traces over the length of this curve
points = [numerical_cubic_bezier_fn(control_pts=[p0,p1,p2,p3], t=t) for t in arange(0.0,1.0,1./1000)]
# separate the x's and y's so matplotlib can plot them; also add some noise
xs = [x+0.15*randn() for x,y in points]
ys = [y+0.15*randn() for x,y in points]

# plot the resulting 'hand drawn' curve
plot(xs, ys, 'b-')
t=title('noisy (hand drawn) curve')

Now it's time for the numerical code for all of those summations that we computed symbolically above.

First we have to define what $t_i$ and $d_i$ are...

We used $d_i$ to represent each of our drawing data points in sequence, but you might wonder how we are going to use (x,y) points for each $d_i$. The answer is that we will run the computation once using only the x values and then repeat it using only the y values. This gives us the x portion of the control points and then the y portion. Note that x and y are orthogonal and therefore independent of each other, which is why we can get away with calculating them independently.

For $t_i$, just imagine that someone draws the curve in one continuous line, starting at one position and ending at another. For this case we will assume it was drawn from left to right. $t_i$ is simply the time at which each data point was drawn. Remember that $t_i$ needs to go from 0 to 1 for the bezier equation to work, so we assume that it took 1 second to draw the entire curve and that every data point was drawn at the same relative speed. Of course time actually has nothing to do with anything; it's just a mental model for tying a $t_i$ to a $d_i$.
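
If you did have real capture timestamps from a drawing program, normalizing them to the required range is a one-liner; the `times` array below is a hypothetical example:

# hypothetical raw pen timestamps in seconds, one per drawn point
times = array([0.00, 0.02, 0.05, 0.09, 0.14])
# rescale so the first sample maps to t=0 and the last maps to t=1
ts = (times - times[0]) / (times[-1] - times[0])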

In [20]:
# we already have our d_i's in xs and ys from above
# make a t_i list
ts = arange(0.0, 1.0, 1.0/len(points))

# make the summation functions for A (16 of them)
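# note: the A summations ignore d_i entirely (A depends only on the t_i values),
# and A is symmetric, so e.g. A_fns[1], A_fns[4] and A_fns[2], A_fns[8] are duplicates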
A_fns = [None for i in range(16)]
A_fns[0] = lambda time_list, data_list : sum([2*(t_i - 1)**6 for t_i, d_i in zip(time_list, data_list)])
A_fns[1] = lambda time_list, data_list : sum([-6*t_i*(t_i - 1)**5 for t_i, d_i in zip(time_list, data_list)])
A_fns[2] = lambda time_list, data_list : sum([6*t_i**2*(t_i - 1)**4 for t_i, d_i in zip(time_list, data_list)])
A_fns[3] = lambda time_list, data_list : sum([-2*t_i**3*(t_i - 1)**3 for t_i, d_i in zip(time_list, data_list)])
A_fns[4] = lambda time_list, data_list : sum([-6*t_i*(t_i - 1)**5 for t_i, d_i in zip(time_list, data_list)])
A_fns[5] = lambda time_list, data_list : sum([18*t_i**2*(t_i - 1)**4 for t_i, d_i in zip(time_list, data_list)])
A_fns[6] = lambda time_list, data_list : sum([-18*t_i**3*(t_i - 1)**3 for t_i, d_i in zip(time_list, data_list)])
A_fns[7] = lambda time_list, data_list : sum([6*t_i**4*(t_i - 1)**2 for t_i, d_i in zip(time_list, data_list)])
A_fns[8] = lambda time_list, data_list : sum([6*t_i**2*(t_i - 1)**4 for t_i, d_i in zip(time_list, data_list)])
A_fns[9] = lambda time_list, data_list : sum([-18*t_i**3*(t_i - 1)**3 for t_i, d_i in zip(time_list, data_list)])
A_fns[10] = lambda time_list, data_list : sum([18*t_i**4*(t_i - 1)**2 for t_i, d_i in zip(time_list, data_list)])
A_fns[11] = lambda time_list, data_list : sum([-6*t_i**5*(t_i - 1) for t_i, d_i in zip(time_list, data_list)])
A_fns[12] = lambda time_list, data_list : sum([-2*t_i**3*(t_i - 1)**3 for t_i, d_i in zip(time_list, data_list)])
A_fns[13] = lambda time_list, data_list : sum([6*t_i**4*(t_i - 1)**2 for t_i, d_i in zip(time_list, data_list)])
A_fns[14] = lambda time_list, data_list : sum([-6*t_i**5*(t_i - 1) for t_i, d_i in zip(time_list, data_list)])
A_fns[15] = lambda time_list, data_list : sum([2*t_i**6 for t_i, d_i in zip(time_list, data_list)])
    
# make the summation functions for b (4 of them)
b_fns = [None for i in range(4)]
b_fns[0] = lambda time_list, data_list : -1.0 * sum([2*d_i*(t_i - 1)**3 for t_i, d_i in zip(time_list, data_list)])
b_fns[1] = lambda time_list, data_list : -1.0 * sum([-6*d_i*t_i*(t_i - 1)**2 for t_i, d_i in zip(time_list, data_list)])
b_fns[2] = lambda time_list, data_list : -1.0 * sum([6*d_i*t_i**2*(t_i - 1) for t_i, d_i in zip(time_list, data_list)])
b_fns[3] = lambda time_list, data_list : -1.0 * sum([-2*d_i*t_i**3 for t_i, d_i in zip(time_list, data_list)])

From the generated 'hand drawn' data, create a function to fill the A and b matrices using the summation functions we just defined, and then solve for the unknowns.

In [21]:
def solve_for_cs(time_series, data_series):
    """
    Take an input series of t_i values and the corresponding d_i values,
    compute the summation values that should go into the matrices and
    solve for the 4 unknown variables.

    Parameters: time_series -- t_i in increasing values
                data_series -- d_i corresponding to each t_i

    Returns: solution -- matrix containing the 4 solutions from solving the linear equations
    """
    # compute the data we will put into matrix A
    A_values = []
    for fn in A_fns:
        A_values.append(fn(time_series, data_series))
    # fill the A matrix with data
    A_numerical = Matrix(4,4, A_values)

    # compute the data we will put into the b vector
    b_values = []
    for fn in b_fns:
        b_values.append(fn(time_series, data_series))
    # fill the b vector with data
    b_numerical = Matrix(4,1, b_values)

    # solve for the unknowns in vector x
    x_numerical = A_numerical.inv() * b_numerical
    #print 'A = \n', A_numerical
    #print 'b = \n', b_numerical
    #print 'x = \n', x_numerical
    return x_numerical
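
As a side note, inverting A is fine for a 4x4 system, but sympy matrices also provide `LUsolve`, which solves the system without forming an explicit inverse:

# equivalent to A_numerical.inv() * b_numerical, without the explicit inverse
x_numerical = A_numerical.LUsolve(b_numerical)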

Now solve for the 'best fit' control values. We will start with the x data from the (x,y) data points and then repeat with the y data.

In [22]:
# solve for the best fit in the x dimension
x_solutions = solve_for_cs(time_series=ts, data_series=xs)
# solve for the best fit in the y dimension
y_solutions = solve_for_cs(time_series=ts, data_series=ys)

# place the solutions into control points
best_fit_control_pts = [(x,y) for x,y in zip(x_solutions, y_solutions)]

Now for the test... did it all work?

Use the best fit control points to create the best fit bezier curve and plot it over top of the 'hand drawn' curve.

In [23]:
# create a list of 1000 x,y points that traces over the length of our best fit curve
best_fit_points = [numerical_cubic_bezier_fn(control_pts=best_fit_control_pts, t=t) for t in arange(0.0,1.0,1./1000)]
# separate the x's and y's so matplotlib can plot them
best_fit_xs = [x for x,y in best_fit_points]
best_fit_ys = [y for x,y in best_fit_points]

# plot the original noisy curve
plot(xs, ys, 'b-')
hold(True)
# plot the best fit curve over top
plot(best_fit_xs, best_fit_ys, 'r-')
hold(False)
t=title('best fit cubic bezier curve')

# just for comparison's sake, print the 'clean' control points and then print the best
# fit control points taken from the noisy data
print 'Clean Control Points = ',
print [p0, p1, p2, p3]
print 'Best Fit Control Points = '
print best_fit_control_pts
Clean Control Points =  [(0.7, 0.2), (2.3, 1.0), (7.2, 0.8), (10.0, 0.0)]
Best Fit Control Points = 
[(0.680235809341125, 0.199169722342596), (2.30586285000266, 1.03398096289025), (7.22739062250292, 0.753866347077646), (9.99219893606198, 0.00672263149211805)]
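
Each best fit control point lands within a few hundredths of the clean control point that generated the noisy data, so the least squares fit recovered the original bezier curve quite well.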