A brief introduction to sensor noise¶

PyData Berlin 2014

In [1]:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from IPython.html.widgets import interact
import matplotlib.mlab as mlab
import seaborn as sb
#sb.set()  # Reset Style
sb.set(style="white", palette="muted")
sb.set_context("talk")

In [2]:

%matplotlib inline

All right, let's look at some real sensor data¶

Read some measurements in, which are collected with Tinkerforge IMU and GPS sensors, which layed flat on the ground, without moving in any direction (static measurement).

In [3]:

df = pd.read_csv('2014-06-27-001-Data.csv')

In [4]:

# Convert Epoch Millis to Timestamp
# http://pandas.pydata.org/pandas-docs/dev/timeseries.html#epoch-timestamps
df.index = pd.to_datetime(df['millis'], unit='ms')

Acceleration¶

In [5]:

accelerations = df[['ax','ay']].dropna()

In [6]:

accelerations.plot();

Let's try to fit a distribution¶

In [7]:

import scipy.stats as stats
from IPython.html.widgets import interact
dists = ['Normal', 'Rayleigh', 'Weibull']

Thanks Rob Story for the code in your PyData Silicon Valley 2014 talk!

In [24]:

@interact
def plot_sb_dist(column=accelerations.columns.tolist(), dist=dists):
    plt.figure(figsize=(10, 4))
    dist_map = {
        'Rayleigh': stats.rayleigh,
        'Weibull': stats.exponweib,
        'Normal': stats.norm,
    }
    sb.distplot(accelerations[column], fit=dist_map[dist], label='# of Sensor Readings')
    sb.plt.legend()
    plt.savefig("ax_dist.png", dpi=72, bbox_inches='tight')

The Normal Distribution fits it perfectly!¶

In [9]:

accelerations.describe()

Out[9]:

	ax	ay
count	1000.000000	1000.000000
mean	-0.300999	-1.135121
std	0.046919	0.042984
min	-0.470600	-1.264900
25%	-0.333400	-1.166900
50%	-0.303900	-1.137500
75%	-0.274500	-1.108000
max	-0.176500	-0.990400

One can use mean and std to describe, what's going on.

The variance is $\mathrm{std}^2$

GPS Measurement¶

GPS measurements are in decimal degrees DD.dddddd

In [10]:

gps = pd.DataFrame()

Convert decimal degress approximately to meters

In [11]:

arcLon = 2*np.pi*6378.0/360.0
arcLat = arcLon * np.cos(df.longitude*np.pi/180.0)

gps['LonM'] = df.longitude * arcLon * 1000.0
gps['LatM'] = df.latitude * arcLat * 1000.0

In [30]:

sb.jointplot(gps.LonM, gps.LatM, kind='kde', \
             xlim=(gps.LonM.min()-1, gps.LonM.max()+1), \
             ylim=(gps.LatM.min()-1, gps.LatM.max()+1), \
             size=10, space=1)
plt.savefig('gps_dist.png', dpi=150)

In [13]:

@interact
def plot_sb_dist(column=gps.columns.tolist(), dist=dists):
    plt.figure(figsize=(10, 6))
    dist_map = {
        'Rayleigh': stats.rayleigh,
        'Weibull': stats.exponweib,
        'Normal': stats.norm,
    }
    sb.distplot(gps[column], fit=dist_map[dist])

Even if it's not perfect, it matches the sensor readings pretty well.¶

In [14]:

gps.describe()

Out[14]:

	LonM	LatM
count	1000.000000	1000.000000
mean	1535402.773968	5517728.990404
std	0.273083	1.430401
min	1535402.321909	5517726.748935
25%	1535402.544544	5517727.816466
50%	1535402.655861	5517728.968462
75%	1535402.989812	5517730.181281
max	1535403.435080	5517731.309635

One can use mean and std to describe, what's going on.

The Normal Distribution / Gauss-Distribution¶

$f(x, \mu, \sigma) = \large \frac{1}{\sigma\sqrt{2\pi}} \ e^{\Large -\frac{(x-\mu)^2}{2\sigma^2} } \\$ with $\mu$ as mean and $\sigma^2$ as variance ($\sigma$ is the standard deviation).

In [15]:

def pltnormpdf(mean, variance):
    plt.figure(figsize=(8,5))
    plt.plot(x,mlab.normpdf(x, mean, variance))
    plt.ylim(0,1)

In [16]:

x = np.linspace(-10,10,500)
interact(pltnormpdf, mean=(-5,5,0.5), variance=(0.1,10,0.1));

Conclusion¶

With the help of statistical parameters, it is possible to describe a sensor reading and it's noise. A lot of sensors can be described be the assumption:

[real value] + [white, gaussian noise]

so one can use the Kalman filter! Even it is not a perfect normal distribution, the Kalman Filter is the best estimator of the real value. The Kalman Filter does not work, if the sensor reading is multimodal (two or more peaks).