# Residual analysis to determine the optimal cutoff frequency¶

Marcos Duarte

A common problem is signal processing is to automatically determine the optimal cutoff frequency that should be employed in a low-pass filter to attenuate as much as possible the noise without compromising the signal content of the data.

Before we continue, see this notebook for an overview about data filtering if needed.

Unfortunately, there is no definite solution for this problem, but there are some techniques, with different degrees of success, to try to determine the optimal cutoff frequency.

David Winter, in his classic book Biomechanics and motor control of human movement, proposed a method to find the optimal cutoff frequency based on residual analysis of the difference between filtered and unfiltered signals over a range of cutoff frequencies. The optimal cutoff frequency is the one where the residual starts to change very little because it is considered that from this point, it's being filtered mostly noise and minimally signal, ideally. This concept is straightforward to implement.

The function residual_analysis.py (its code is shown at the end of this text) is an implmmentation of this method and it is divided in three parts (after the help section): first, the residuals over a range of cutoff frequencies are calculated; second, an algorithm tries to find the noisy region (supposedly linear) of the residuals versus cutoff frequency plot and finds the optimal cutoff frequency; and third, the results are plotted. The code is lengthy relatively to the simplicity of the idea because of the long help section, the implementation of the automatic search and a rich plot.

Let's test this code with benchmark data.

In 1977, Pezzack, Norman and Winter published a paper where they investigated the effects of differentiation and filtering processes on experimental data (the angle of a bar manipulated in space). Since then, these data have became a benchmark to test new algorithms. Let's work with these data (available at http://isbweb.org/data/pezzack/index.html). The data have the angular displacement measured by video and the angular acceleration directly measured by an accelerometer, which we will consider as the true acceleration.
Part of these data are showing next:

In [1]:
# Import the necessary libraries
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import sys
sys.path.insert(1, r'./../functions') # add dir to pythonpath

In [2]:
# load data file
time, disp, disp2, aacc = np.loadtxt('./../data/Pezzack.txt', skiprows=6, unpack=True)
dt = np.mean(np.diff(time))
# plot data
fig, (ax1,ax2) = plt.subplots(1, 2, sharex = True, figsize=(11, 4))
plt.suptitle("Pezzack's benchmark data", fontsize=20)
ax1.plot(time, disp, 'b')
ax2.plot(time, aacc, 'g')
ax2.set_xlabel('Time [s]'); ax2.set_ylabel('Angular acceleration [rad/s$^2$]')


And using the residual analsysis code:

In [3]:
from residual_analysis import residual_analysis

freq = np.mean(1/np.diff(time))
fc_opt = residual_analysis(disp, freq=freq, show=True)


The optimal cutoff frequency found is 5.6 Hz. Note that the filtering process is relevant only for the derivative of the data; we cannot distinguish the unfiltered and unfiltered displacements (see that the RMSE residual is very small).
Let's employ this filter, differentiate the data twice and compare with the true acceleration as we did before:

In [4]:
from scipy.signal import butter, filtfilt
# Butterworth filter
# Correct the cutoff frequency for the number of passes in the filter
C = 0.802 # for dual pass; C = (2**(1/npasses) - 1)**0.25
b, a = butter(2, (fc_opt/C)/(freq/2))
dispf = filtfilt(b, a, disp)
aaccBW =  np.diff(dispf, 2)*freq*freq
# RMSE:
rmseBW = np.sqrt(np.mean((aaccBW-aacc[1:-1])**2))
# plot data
fig, ax1 = plt.subplots(1, 1, figsize=(11, 4))
plt.suptitle("Pezzack's benchmark data", fontsize=20)
ax1.plot(time[1:-1], aacc[1:-1], 'g', label='Analog acceleration:     (True value)')
ax1.plot(time[1:-1], aaccBW, 'r',
label='Butterworth %.3g Hz:     RMSE = %0.2f' %(fc_opt,rmseBW))
ax1.set_xlabel('Time [s]');
ax1.set_ylabel('Angular acceleration [rad/s$^2$]');
plt.legend(frameon=False, fontsize=12, loc='upper left');


The peeformance seems satisfactory (see this notebook for a comparison using other filters), but it is known that this residual analysis algorithm results in oversmoothing the kinematic data (see http://www.clinicalgaitanalysis.com/faq/cutoff.html).

## Function residual_analysis.py¶

In [6]:
%loadpy ./../functions/residual_analysis

In [ ]:
#!/usr/bin/env python

"""Automatic search of filter cutoff frequency based on residual analysis."""

from __future__ import division, print_function
import numpy as np
from scipy.signal import butter, filtfilt

__author__ = 'Marcos Duarte, https://github.com/demotu/BMC'
__version__ = 'residual_analysis.py v.3 2014/06/13'

def residual_analysis(y, freq=1, fclim=[], show=False, ax=None):
""" Automatic search of filter cutoff frequency based on residual analysis.

This method was proposed by Winter in his book [1]_.
The 'optimal' cutoff frequency (in the sense that a filter with such cutoff
frequency removes as much noise as possible without considerably affecting
the signal) is found by performing a residual analysis of the difference
between filtered and unfiltered signals over a range of cutoff frequencies.
The optimal cutoff frequency is the one where the residual starts to change
very little because it is considered that from this point, it's being
filtered mostly noise and minimally signal, ideally.

Parameters
----------
y : 1D array_like
Data
freq : float, optional (default = 1)
sampling frequency of the signal y
fclim : list with 2 numbers, optional (default = [])
limit frequencies of the noisy part or the residuals curve
show : bool, optional (default = False)
True (1) plots data in a matplotlib figure
False (0) to not plot
ax : a matplotlib.axes.Axes instance, optional (default = None).

Returns
-------
fc_opt : float

Notes
-----
A second-order zero-phase digital Butterworth low-pass filter is used.
# The cutoff frequency is correctyed for the number of passes:
# C = 0.802 # for dual pass; C = (2**(1/npasses) - 1)**0.25

The matplotlib figure with the results will show a plot of the residual
analysis with the optimal cutoff frequency, a plot with the unfiltered and
filtered signals at this optimal cutoff frequency (with the RMSE of the
difference between these two signals), and a plot with the respective
second derivatives of these signals which should be useful to evaluate
the quality of the optimal cutoff frequency found.

Winter should not be blamed for the automatic search algorithm used here.
The algorithm implemented is just to follow as close as possible Winter's
suggestion of fitting a regression line to the noisy part of the residuals.

This function performs well with data where the signal has frequencies
considerably bellow the Niquist frequency and the noise is predominantly
white in the higher frequency region.

If the automatic search fails, the lower and upper frequencies of the noisy
part of the residuals curve cam be inputed as a parameter (fclim).
These frequencies can be chosen by viewing the plot of the residuals (enter
show=True as input parameter when calling this function).

It is known that this residual analysis algorithm results in oversmoothing
kinematic data [2]_. Use it with moderation.
This code is described elsewhere [3]_.

References
----------
.. [1] Winter DA (2009) Biomechanics and motor control of human movement.
.. [2] http://www.clinicalgaitanalysis.com/faq/cutoff.html
.. [3] http://nbviewer.ipython.org/github/demotu/BMC/blob/master/notebooks/ResidualAnalysis.ipynb

Examples
--------
>>> import numpy as np
>>> from residual_analysis import residual_analysis
>>> y = np.cumsum(np.random.randn(1000))
>>> # optimal cutoff frequency based on residual analysis and plot:
>>> fc_opt = residual_analysis(y, freq=1000, show=True)
>>> # sane analysis but specifying the frequency limits and plot:
>>> residual_analysis(y, freq=1000, fclim=[200,400], show=True)
>>> # Not always it's possible to find an optimal cutoff frequency
>>> # or the one found can be wrong (run this example many times):
>>> y = np.random.randn(100)
>>> residual_analysis(y, freq=100, show=True)

"""

from scipy.interpolate import UnivariateSpline

# Correct the cutoff frequency for the number of passes in the filter
C = 0.802  # for dual pass; C = (2**(1/npasses) - 1)**0.25

# signal filtering
freqs = np.linspace((freq/2) / 101, (freq/2)*C, 101)
res = []
for fc in freqs:
b, a = butter(2, (fc/C) / (freq / 2))
yf = filtfilt(b, a, y)
# residual between filtered and unfiltered signals
res = np.hstack((res, np.sqrt(np.mean((yf - y) ** 2))))

# find the optimal cutoff frequency by fitting an exponential curve
# y = A*exp(B*x)+C to the residual data and consider that the tail part
# of the exponential (which should be the noisy part of the residuals)
# decay starts after 3 lifetimes (exp(-3), 95% drop)
if not len(fclim) or np.any(fclim < 0) or np.any(fclim > freq / 2):
fc1 = 0
fc2 = 0.95*(len(freqs)-1)
# log of exponential turns the problem to first order polynomial fit
# make the data always greater than zero before taking the logarithm
reslog = np.log(np.abs(res[fc1:fc2 + 1] - res[fc2]) +
10 * np.finfo(np.float).eps)
Blog, Alog = np.polyfit(freqs[fc1:fc2 + 1], reslog, 1)
fcini = np.nonzero(freqs >= -3 / Blog)  # 3 lifetimes
fclim = [fcini[0][0], fc2] if np.size(fcini) else []
else:
fclim = [np.nonzero(freqs >= fclim[0])[0][0],
np.nonzero(freqs >= fclim[1])[0][0]]

# find fc_opt with linear fit y=A+Bx of the noisy part of the residuals
if len(fclim) and fclim[0] < fclim[1]:
B, A = np.polyfit(freqs[fclim[0]:fclim[1]], res[fclim[0]:fclim[1]], 1)
# optimal cutoff frequency is the frequency where y[fc_opt] = A
roots = UnivariateSpline(freqs, res - A, s=0).roots()
fc_opt = roots[0] if len(roots) else None
else:
fc_opt = None

if show:
_plot(y, freq, freqs, res, fclim, fc_opt, B, A, ax)

return fc_opt

def _plot(y, freq, freqs, res, fclim, fc_opt, B, A, ax):
"""Plot results of the residual_analysis function, see its help."""
try:
import matplotlib.pyplot as plt
except ImportError:
print('matplotlib is not available.')
else:
if ax is None:
plt.figure(num=None, figsize=(10, 5))
ax = np.array([plt.subplot(121),
plt.subplot(222),
plt.subplot(224)])

plt.rc('axes', labelsize=12,  titlesize=12)
plt.rc('xtick', labelsize=12)
plt.rc('ytick', labelsize=12)
ax[0].plot(freqs, res, 'b.', markersize=9)
time = np.linspace(0, len(y) / freq, len(y))
ax[1].plot(time, y, 'g', linewidth=1, label='Unfiltered')
ydd = np.diff(y, n=2) * freq ** 2
ax[2].plot(time[:-2], ydd, 'g', linewidth=1, label='Unfiltered')
if fc_opt:
ylin = np.poly1d([B, A])(freqs)
ax[0].plot(freqs, ylin, 'r--', linewidth=2)
ax[0].plot(freqs[fclim[0]], res[fclim[0]], 'r>',
freqs[fclim[1]], res[fclim[1]], 'r<', ms=9)
ax[0].set_ylim(ymin=0, ymax=4 * A)
ax[0].plot([0, freqs[-1]], [A, A], 'r-', linewidth=2)
ax[0].plot([fc_opt, fc_opt], [0, A], 'r-', linewidth=2)
ax[0].plot(fc_opt, 0, 'ro', markersize=7, clip_on=False,
zorder=9, label='$Fc_{opt}$ = %.1f Hz' % fc_opt)
ax[0].legend(fontsize=12, loc='best', numpoints=1, framealpha=.5)
# Correct the cutoff frequency for the number of passes
C = 0.802  # for dual pass; C = (2**(1/npasses) - 1)**0.25
b, a = butter(2, (fc_opt/C) / (freq / 2))
yf = filtfilt(b, a, y)
ax[1].plot(time, yf, color=[1, 0, 0, .5],
linewidth=2, label='Opt. filtered')
ax[1].legend(fontsize=12, loc='best', framealpha=.5)
ax[1].set_title('Signals (RMSE = %.3g)' % A)
yfdd = np.diff(yf, n=2) * freq ** 2
ax[2].plot(time[:-2], yfdd, color=[1, 0, 0, .5],
linewidth=2, label='Opt. filtered')
ax[2].legend(fontsize=12, loc='best', framealpha=.5)
resdd = np.sqrt(np.mean((yfdd - ydd) ** 2))
ax[2].set_title('Second derivatives (RMSE = %.3g)' % resdd)
else:
ax[0].text(.5, .5, 'Unable to find optimal cutoff frequency',
horizontalalignment='center', color='r', zorder=9,
transform=ax[0].transAxes, fontsize=12)
ax[1].set_title('Signal')
ax[2].set_title('Second derivative')

ax[0].set_xlabel('Cutoff frequency [Hz]')
ax[0].set_ylabel('Residual RMSE')
ax[0].set_title('Residual analysis')
ax[0].grid()
# ax2.set_xlabel('Time [s]')
ax[1].set_xlim(0, time[-1])
ax[1].grid()
ax[2].set_xlabel('Time [s]')
ax[2].set_xlim(0, time[-1])
ax[2].grid()
plt.tight_layout()
plt.show()