Author: Seth Nielsen - sethmnielsen@gmail.com
Forgetting Factor Section: Brian Nelson - brnnelson4@gmail.com
Least squares approximation is a method for approximating the solution to an overdetermined system of equations. Linear least squares can be applied to a set of data to find a model that minimizes the sum of the squared errors between the data and their corresponding modeled values, thus producing a model of best fit. The method can further be applied to create a least-squares filter: a filter that, given a sequence of input data, uses least squares to find the coefficients that minimize the error between the filter's output and a desired sequence.
The traditional form of a least squares filter is performed in batch form, where the minimizing solution is computed on an entire block of data after the data has been collected. This method of calculating a solution can be effective for applications suited to offline computation, but is not particularly useful for systems that require real-time or online parameter estimation for parameters that are unknown in advance or are changing. In this case, an adaptive filter is required. The recursive least squares filter modifies the least squares filter to be adaptive by continuously updating the estimated parameters (coefficients) as new data arrive. Thanks to the matrix inversion lemma (see the following derivation), the recursive least squares filter does not require the calculation of any matrix inverse, which can greatly lower computation requirements and precision errors when filtering large amounts of data. The computation can also have a "forgetting factor" added that allows the filter to follow a system with changing parameters.
We begin with the batch least squares filter with output $y_t$, minimizing coefficients $h_i$, and input signal $f_t$:

$$y_t = \sum_{i=0}^{m-1} h_i\, f_{t-i}$$

The coefficients minimizing the least-squares error between the filter output $y_t$ and a desired signal $d_t$ satisfy the normal equation (with data matrix $A$)
$$Rh = A^H d, \quad \text{where} \quad R = A^H A = \sum_{i=1}^{N} q_i\, q_i^H \quad \text{with} \quad q_i = \begin{bmatrix} f_i & f_{i-1} & \cdots & f_{i-m+1} \end{bmatrix}^T.$$

Let
$$p = A^H d = \sum_{i=1}^{N} q_i\, d_i$$

and thus the coefficients can be computed as
$$h = R^{-1} A^H d = R^{-1} p$$
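As a concrete illustration, the batch solution can be computed directly with NumPy. The following is a minimal sketch, where the 4-tap true filter, the data lengths, and all variable names are hypothetical choices of ours:

import numpy as np

# Hypothetical data, for illustration only: a known 4-tap filter
# generates the desired signal d from a random input f.
m, N = 4, 200
f = np.random.randn(N)
d = np.convolve(f, [1.0, -0.5, 0.25, 0.1])[:N]

# Data matrix A whose i-th row is q_i = [f_i, f_{i-1}, ..., f_{i-m+1}]
A = np.zeros((N, m))
for i in range(N):
    for j in range(m):
        if i - j >= 0:
            A[i, j] = f[i - j]

R = A.T @ A                # R = A^H A (real-valued data, so H is just T)
p = A.T @ d                # p = A^H d
h = np.linalg.solve(R, p)  # batch solution h = R^{-1} p
print(h)                   # recovers [1.0, -0.5, 0.25, 0.1]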
The algorithm can now be made adaptive. From this point forward, all terms with subscript $t$ are computed using only the data up to time $t$. Now define the Grammian matrix $R_t$ as

$$R_t = \sum_{i=1}^{t} q_i\, q_i^H$$

and
$$p_t = \sum_{i=1}^{t} q_i\, d_i = p_{t-1} + q_t\, d_t, \qquad \hat{h}_t = R_t^{-1} p_t$$

where $\hat{h}_t$ is the vector of estimated filter coefficients at time $t$.
The algorithm is now adaptive, but it still needs to be made recursive ($R_t^{-1}$ is currently recomputed from scratch at each time step). We can break $R_t$ up as

$$R_t = \sum_{i=1}^{t-1} q_i\, q_i^H + q_t\, q_t^H = R_{t-1} + q_t\, q_t^H$$

By the matrix inversion lemma (eq. 4.32),
$$R_t^{-1} = R_{t-1}^{-1} - \frac{R_{t-1}^{-1}\, q_t\, q_t^H\, R_{t-1}^{-1}}{1 + q_t^H\, R_{t-1}^{-1}\, q_t}$$

Let
$$P_t = R_t^{-1}$$

and the Kalman gain

$$k_t = \frac{R_{t-1}^{-1}\, q_t}{1 + q_t^H\, R_{t-1}^{-1}\, q_t} = \frac{P_{t-1}\, q_t}{1 + q_t^H\, P_{t-1}\, q_t}.$$

Then we arrive at

$$P_t = P_{t-1} - k_t\, q_t^H\, P_{t-1}. \tag{4.38}$$
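The rank-one update can be sanity-checked numerically. The following sketch (an illustration of ours, with an arbitrary positive-definite $R_{t-1}$ and a random $q_t$) confirms that (eq. 4.38) matches a direct matrix inversion:

import numpy as np

m = 4
B = np.random.randn(m, m)
R_prev = B @ B.T + np.eye(m)           # an arbitrary positive-definite R_{t-1}
q = np.random.randn(m)                 # new data vector q_t

P_prev = np.linalg.inv(R_prev)         # P_{t-1} = R_{t-1}^{-1}
k = P_prev @ q / (1 + q @ P_prev @ q)  # Kalman gain k_t
P = P_prev - np.outer(k, q) @ P_prev   # P_t from (eq. 4.38)

# Compare against inverting R_t = R_{t-1} + q_t q_t^H directly
P_direct = np.linalg.inv(R_prev + np.outer(q, q))
print(np.allclose(P, P_direct))        # True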
The coefficients for the filter, or the estimated parameters, are computed by

$$\begin{aligned}
\hat{h}_t &= P_t\, p_t = P_t\,( p_{t-1} + q_t\, d_t ) = P_t\, p_{t-1} + P_t\, q_t\, d_t \\
&= P_{t-1}\, p_{t-1} - k_t\, q_t^H\, P_{t-1}\, p_{t-1} + P_t\, q_t\, d_t && \Leftarrow \text{using (eq. 4.38)} \\
&= \hat{h}_{t-1} - k_t\, q_t^H\, \hat{h}_{t-1} + k_t\, d_t && \Leftarrow k_t = P_t\, q_t \\
&= \hat{h}_{t-1} + k_t\,( d_t - q_t^H\, \hat{h}_{t-1} ).
\end{aligned}$$

The quantity multiplied by $k_t$ represents the filter error
$$\varepsilon_t = d_t - q_t^H\, \hat{h}_{t-1},$$

which allows us to write the update of the filter coefficients in the simple form
$$\hat{h}_t = \hat{h}_{t-1} + k_t\, \varepsilon_t.$$

To initialize the algorithm, the initial condition $P_0 = R_0^{-1}$ is needed. A slight perturbation of the matrix $R_t$ by some small $\delta$ gives the following:

$$R_t = \sum_{i=1}^{t} q_i\, q_i^H + \delta I, \qquad R_0 = \delta I, \qquad P_0 = \delta^{-1} I$$
Computing $P_0$ in this way frees the recursive algorithm from calculating any matrix inverses.
The initial filter coefficients can be assumed to be zero, and thus $\hat{h}_0 = 0$.
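Putting the pieces together, one full update of the algorithm can be organized as in the following minimal sketch; the function name rls_update and the example values for m and delta are our own:

import numpy as np

def rls_update(hhat, P, q, d):
    '''One recursive least squares step for a new data vector q
    and desired output d.'''
    k = P @ q / (1 + q @ P @ q)  # Kalman gain k_t
    e = d - q @ hhat             # a priori error eps_t = d_t - q_t^H hhat_{t-1}
    hhat = hhat + k * e          # coefficient update
    P = P - np.outer(k, q) @ P   # P update (eq. 4.38)
    return hhat, P

# Initialization: hhat_0 = 0 and P_0 = (1/delta) * I
m, delta = 4, 1e-4
hhat, P = np.zeros(m), np.eye(m) / delta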
The derivation above allows us to approximate a system with static coefficients: as the algorithm runs and more data are processed, it converges to a set of coefficients that minimize the error between the output it generates and the actual outputs. While it is doing this, each input/output pair is given the same weight; the first input and output are weighted the same as the most recently obtained ones. This works well if the coefficients of the system are known to be static, but if the system is changing, then we need a way for the filter to place greater weight on new inputs and "forget" the old ones.
To add a forgetting factor, we can look again at how $R_t$ is calculated. As shown above,

$$R_t = \sum_{i=1}^{t-1} q_i\, q_i^H + q_t\, q_t^H = R_{t-1} + q_t\, q_t^H$$

If we pick a number $\lambda$, where $0 < \lambda < 1$, and weight the previously computed values by it, the formula will place less weight on older inputs:
$$R_t = \lambda \sum_{i=1}^{t-1} q_i\, q_i^H + q_t\, q_t^H = \lambda R_{t-1} + q_t\, q_t^H$$

Because this weighting is reapplied at each step, the older an input gets, the less weight it carries in the current $R_t$ matrix; unrolling the recursion shows that the pair from time $i$ is weighted by $\lambda^{t-i}$:

$$R_t = \sum_{i=1}^{t} \lambda^{t-i}\, q_i\, q_i^H$$

This exponentially decreasing emphasis on old inputs is exactly what we are looking for: as the system slowly changes with time, we still want to use the old inputs in the current computation, but with more weight on recent input/output pairs and less on those from far in the past.
If the derivation shown in the previous section for the formulas for $P_t$, $k_t$, and $\hat{h}_t$ is repeated using the formula for $R_t$ with a forgetting factor, the update equations become the following.
$$k_t = \frac{P_{t-1}\, q_t}{\lambda + q_t^H\, P_{t-1}\, q_t}, \qquad P_t = \lambda^{-1}\left( P_{t-1} - k_t\, q_t^H\, P_{t-1} \right),$$

with the coefficient update $\hat{h}_t = \hat{h}_{t-1} + k_t\, \varepsilon_t$ unchanged. $\lambda$ is generally chosen between 0.98 and 1; the higher the value of $\lambda$, the more the old input/output pairs affect the current estimates of the system coefficients.
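In code, the change relative to the earlier sketch is small: only the gain and $P_t$ updates pick up $\lambda$. Again, this is a minimal sketch with names of our own choosing:

import numpy as np

def rls_update_ff(hhat, P, q, d, lam=0.99):
    '''One RLS step with forgetting factor lam, 0 < lam <= 1.'''
    k = P @ q / (lam + q @ P @ q)       # gain: lam replaces 1 in the denominator
    e = d - q @ hhat                    # a priori error
    hhat = hhat + k * e                 # coefficient update (unchanged)
    P = (P - np.outer(k, q) @ P) / lam  # P_t = (1/lam)(P_{t-1} - k_t q_t^H P_{t-1})
    return hhat, P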
We can start by estimating the parameters of a simple system using a recursive least squares filter. Let's choose a system with impulse response $h = [\,5, 6, 7, 8, 9\,]$. The following filter written in Python feeds normally-distributed random values $f_t$ in as inputs and attempts to match the system's true output $d_t$ by computing the error $\varepsilon_t = d_t - y_t$, where $y_t$ is the filtered output evaluated as $y_t = q_t^H\, \hat{h}_t$, and $\hat{h}_t$ is the vector of $m$ estimated parameters that is updated by computing $\hat{h}_t = \hat{h}_{t-1} + k_t\, \varepsilon_t$ at each time step. The vector $q_t$ is simply the $m$ most recent input values taken from the signal $f_t$.
The variable m can be set at the beginning of the script to choose how many filter coefficients will be used, thus defining the size of $\hat{h}$ and $q_t$.
import numpy as np

np.set_printoptions(precision=4)

m = 5                               # number of parameters
n = 10000                           # iterations
h = np.array([5, 6, 7, 8, 9])       # impulse response
hhat = np.zeros(m)                  # initial estimated parameters
f = np.random.randn(n)              # normally-distributed random input
fn = np.hstack((np.zeros(m-1), f))  # convenience array used to shift through input (f)
q = np.zeros(m)                     # input data for one time step
delta = .0001
P = 1/delta * np.eye(m)             # initialize P[0] without calculating inverse of R[0]

for i in range(n):
    d = fn[i:i+m] @ h            # true output
    q = fn[i:i+m]                # q update
    k = P @ q / (1 + q @ P @ q)  # Kalman gain vector
    y = q @ hhat                 # filter output
    e = d - y                    # error
    hhat = hhat + k * e          # update of estimated parameters
    P = P - np.outer(k, q) @ P   # P update (eq. 4.38)

print('hhat:', hhat)
hhat: [5. 6. 7. 8. 9.]
In this next example, the following Python code attempts to find the optimal filter coefficients to match the output of a mass-spring-damper system given a horizontal force $F$ as input. The force varies sinusoidally over time. The output of the system is the position of the mass, and it is calculated using the Runge-Kutta (RK4) algorithm to integrate the second-order ODE

$$m\ddot{z} + b\dot{z} + kz = F$$

and propagating the dynamics at each time step $T_s$.
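Concretely, the second-order equation is rewritten as the first-order state-space system that the RK4 routine integrates:

$$\frac{d}{dt}\begin{bmatrix} z \\ \dot{z} \end{bmatrix} = \begin{bmatrix} \dot{z} \\ \left(F - b\dot{z} - kz\right)/m \end{bmatrix}$$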
Gaussian noise has been added to the output, along with some uncertainty in the system's physical parameters (mass $m$, spring constant $k$, and damping coefficient $b$).
class msdParam:
    def __init__(self):
        # Physical parameters
        self.m = 5.0        # mass, kg
        self.k = 3.0        # spring constant, N/m
        self.b = 0.5        # damping coefficient, N*s/m
        # Simulation parameters
        self.t_start = 0.0  # start time of simulation
        self.Ts = 0.1       # sample time for simulation
        # Initial conditions
        self.z0 = 0.0
        self.zd0 = 0.0
import random

P_ = msdParam()

class msdDynamics:
    '''
    Model the physical system
    '''
    def __init__(self):
        # Initial state conditions
        self.state = np.array([[P_.z0],    # initial position
                               [P_.zd0]])  # initial velocity
        alpha = 0.02  # uncertainty parameter
        self.m = P_.m * (1 + 2*alpha*np.random.rand() - alpha)  # mass of the msd, kg
        self.k = P_.k * (1 + 2*alpha*np.random.rand() - alpha)  # spring constant, N/m
        self.b = P_.b * (1 + 2*alpha*np.random.rand() - alpha)  # damping coefficient, N*s/m
        self.Ts = P_.Ts  # sample rate at which the dynamics are propagated

    def propagateDynamics(self, u):
        '''
        Integrate the differential equations defining dynamics.
        P.Ts is the time step between function calls.
        u contains the system input(s).
        '''
        # Integrate ODE using Runge-Kutta RK4 algorithm
        k1 = self.derivatives(self.state, u)
        k2 = self.derivatives(self.state + self.Ts/2*k1, u)
        k3 = self.derivatives(self.state + self.Ts/2*k2, u)
        k4 = self.derivatives(self.state + self.Ts*k3, u)
        self.state += self.Ts/6 * (k1 + 2*k2 + 2*k3 + k4)

    def derivatives(self, state, u):
        '''
        Return xdot = f(x,u), the derivatives of the continuous states
        '''
        # re-label states and inputs for readability
        z = state.item(0)
        zd = state.item(1)
        F = u[0]
        # Equations of motion
        zdd = (F - self.b*zd - self.k*z)/self.m
        # build xdot and return
        xdot = np.array([[zd], [zdd]])
        return xdot

    def outputs(self):
        '''
        Return the measured output [z] with added Gaussian noise
        '''
        # re-label states for readability
        z = self.state.item(0)
        # add Gaussian noise to the output
        z_m = z + random.gauss(0, 0.001)
        # return measured outputs
        return [z_m]

    def states(self):
        '''
        Return all current states as a list
        '''
        return self.state.T.tolist()
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style("white")

msd = msdDynamics()
param = msdParam()

# Input signal (sine wave)
amplitude = 5
frequency = 0.1

m = 3                    # number of filter coefficients
time = 50                # secs
n = int(time//param.Ts)  # iterations
hhat = np.zeros(m)       # initial estimated parameters
q = np.zeros(m)
delta = .00001
P = 1/delta * np.eye(m)  # initial P

# Arrays for plotting
d_arr = np.zeros(n)
y_arr = np.zeros(n)
t_arr = np.zeros(n)

t = param.t_start  # time starts at t_start
for i in range(n):
    u = [amplitude*np.sin(2*np.pi*frequency*t)]  # input (force on mass)
    msd.propagateDynamics(u)
    d = msd.outputs()[0]           # system output (position of mass)
    q = np.hstack((u[0], q[:-1]))  # q update (newest input first)
    k = P @ q / (1 + q @ P @ q)    # Kalman gain vector
    y = q @ hhat                   # filter output
    e = d - y                      # error
    hhat = hhat + k * e            # update of estimated parameters
    P = P - np.outer(k, q) @ P     # P update (eq. 4.38)
    # Save values for plotting
    d_arr[i] = d
    y_arr[i] = y
    t_arr[i] = t
    t = t + param.Ts               # advance time by Ts

fig = plt.figure(dpi=150)
plt.plot(t_arr, d_arr, label='d')
plt.plot(t_arr, y_arr, label='y')
plt.xlabel('t')
plt.ylabel('output')
plt.legend(loc='upper left')
plt.show()

print('\nhhat:', hhat)
hhat: [-2.8695 -0.4514 3.3125]
The filter's output $y$ very quickly converges to match the system's output $d$, with only minor deviations thereafter. Note that the filter coefficients $\hat{h}$ differ from the system's physical parameters, yet the filter output is still very close to that of the system; the filter has fit an $m$-tap FIR approximation of the force-to-position response, not the parameters $m$, $k$, and $b$ themselves.
Recursive least squares is a powerful tool that can be used to model an unknown system based solely on that system's inputs and outputs. One application is filtering audio. In this problem, the input is an audio clip of all of the notes of the song "Mary Had a Little Lamb" being played on the piano simultaneously for the length of the song, and the output is an audio clip of the whole song "Mary Had a Little Lamb". Given this input and output data, build a system that filters the input clip to sound like the output clip.
There is some starter code shown below. The input audio clip is called mary_had_a_little_lamb_in.wav, and the output is mary_had_a_little_lamb_out.wav. The starter code given will read in the .wav files, set up a few helper arrays and variables to make things a little easier, and will write the output to a .wav file called generated_out.wav.
HINT: The filtering will need to be adaptive, because the audio output changes over time while the input to the system stays constant. This means the recursive least squares algorithm will need a forgetting factor to place greater weight on newer input/output pairs.
import numpy as np
import scipy.io.wavfile as wav

# get the input and output files
rateIn, inWav = wav.read("mary_had_a_little_lamb_in.wav")
rateOut, expectedOutWav = wav.read("mary_had_a_little_lamb_out.wav")

# number of taps in the filter
numberOfParameters = 5
n = expectedOutWav.size  # length of the expected output

# convenience array used to shift through the input. The beginning is padded
# with zeros, and the end is also padded with zeros.
fn = np.hstack((np.zeros(numberOfParameters - 1),
                inWav, np.zeros(numberOfParameters - 1)))

# impulse response of the system we are generating
h = np.zeros(numberOfParameters)

# preallocate the output that we will be generating
generated_output = np.zeros(n)

'''
WRITE YOUR CODE HERE.
- fn is an array of inputs that has already been zero padded
- h is the array for the system coefficients that your code will be using to
  approximate the system
- generated_output is an array that you should write each output of your system to.
  This will be saved to a .wav file that you can listen to once the system is done
  processing.
You will need to:
1. set up P
2. loop through the inputs and outputs to compute P, k, h, and the generated_output
   at each time step. Save your guessed outputs to the generated_output array for
   it to be written to a wave file at the end of the processing
'''

# write the output to a wav file
generated_output = generated_output.astype(np.int16)
wav.write("generated_out.wav", rateOut, generated_output)