$$\newcommand{\xv}{\mathbf{x}} \newcommand{\Xv}{\mathbf{X}} \newcommand{\yv}{\mathbf{y}} \newcommand{\Yv}{\mathbf{Y}} \newcommand{\zv}{\mathbf{z}} \newcommand{\Zv}{\mathbf{Z}} \newcommand{\av}{\mathbf{a}} \newcommand{\Wv}{\mathbf{W}} \newcommand{\wv}{\mathbf{w}} \newcommand{\betav}{\mathbf{\beta}} \newcommand{\gv}{\mathbf{g}} \newcommand{\Hv}{\mathbf{H}} \newcommand{\dv}{\mathbf{d}} \newcommand{\Vv}{\mathbf{V}} \newcommand{\vv}{\mathbf{v}} \newcommand{\tv}{\mathbf{t}} \newcommand{\Tv}{\mathbf{T}} \newcommand{\Sv}{\mathbf{S}} \newcommand{\Gv}{\mathbf{G}} \newcommand{\Norm}{\mathcal{N}} \newcommand{\muv}{\boldsymbol{\mu}} \newcommand{\sigmav}{\boldsymbol{\sigma}} \newcommand{\phiv}{\boldsymbol{\phi}} \newcommand{\Phiv}{\boldsymbol{\Phi}} \newcommand{\Sigmav}{\boldsymbol{\Sigma}} \newcommand{\Lambdav}{\boldsymbol{\Lambda}} \newcommand{\half}{\frac{1}{2}} \newcommand{\argmax}[1]{\underset{#1}{\operatorname{argmax}}} \newcommand{\argmin}[1]{\underset{#1}{\operatorname{argmin}}} \newcommand{\dimensionbar}[1]{\underset{#1}{\operatorname{|}}} \newcommand{\grad}{\mathbf{\nabla}} \newcommand{\ebx}[1]{e^{\betav_{#1}^T \xv_n}} \newcommand{\eby}[1]{e^{y_{n,#1}}} \newcommand{\Tiv}{\mathbf{Ti}} \newcommand{\Fv}{\mathbf{F}} \newcommand{\ones}[1]{\mathbf{1}_{#1}}$$

# Classification with Nonlinear Logistic Regression Using Neural Networks

## Motivation and Setup

Recall that a linear function approximator can be viewed as a neural network with no hidden layer.

What must we add to do logistic regression?

Just add the softmax calculation to the output layer.
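As a sketch of that idea in plain NumPy (the weights `W` and inputs here are made up purely for illustration), the only change to the linear network's output layer is to pass its linear outputs through the softmax function:

```python
import numpy as np

def softmax(Y):
    # subtract the row-wise max so np.exp cannot overflow
    fs = np.exp(Y - Y.max(axis=1, keepdims=True))
    return fs / fs.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X1 = np.hstack((np.ones((5, 1)), rng.normal(size=(5, 2))))  # bias column + 2 inputs
W = rng.normal(size=(3, 3))                                 # 3 outputs, one per class
Y = X1 @ W          # linear layer, exactly as before
G = softmax(Y)      # new: rows are class probabilities
print(G.sum(axis=1))   # each row sums to 1
```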

Any thoughts on how to do nonlinear logistic regression?

## Derivation

We will maximize the log likelihood of the training data. \begin{align*} L(\Wv) & = \left ( \prod_{n=1}^N \prod_{k=1}^K P(C=k|\xv_n)^{t_{n,k}} \right ) \\ \log L(\Wv) & = LL(\Wv) = \sum_{n=1}^N \sum_{k=1}^K t_{n,k} \log P(C=k|\xv_n)\\ & = \sum_{n=1}^N \sum_{k=1}^K t_{n,k} \log g_{n,k} \end{align*}
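To make this concrete, here is a tiny numeric check with hypothetical indicator targets `T` and predicted probabilities `G`: the log likelihood is just the sum of the log probabilities assigned to the correct classes.

```python
import numpy as np

T = np.array([[1, 0, 0],
              [0, 1, 0]])          # indicator targets t_{n,k}
G = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1]])    # predicted P(C=k | x_n)

# t_{n,k} is 0 except at the correct class, so only those terms survive
LL = np.sum(T * np.log(G))
print(LL)   # equals log(0.7) + log(0.8)
```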

## Gradient of the Log Likelihood

\begin{align*} LL(\Wv) & = \sum_{n=1}^N \sum_{k=1}^K t_{n,k} \log g_{n,k}\;\;\;\;\; \text{ where } g_{n,k} = \frac{\eby{k}}{\sum_{m=1}^K \eby{m}}; \\ \frac{\partial LL(\Wv)}{\partial \Wv_{d,j}} & = \sum_{n=1}^N \sum_{k=1}^K \frac{t_{n,k}}{g_{n,k}} \frac{\partial g_{n,k}}{\partial \Wv_{d,j}}\\ \end{align*}

## Linear Version

\begin{align*} \frac{\partial LL(\Wv)}{\partial \Wv_{d,j}} & = \sum_{n=1}^N \sum_{k=1}^{K} (t_{n,k} - g_{n,k}) \frac{\partial y_{n,k}}{\partial \Wv_{d,j}} \end{align*}

For linear logistic regression, $y_{n,j} = \xv_n^T \Wv_{*,j}$, so $\frac{\partial y_{n,k}}{\partial \Wv_{d,j}}$ is nonzero only when $j=k$, giving

\begin{align*} \frac{\partial LL(\Wv)}{\partial \Wv_{d,j}} & = \sum_{n=1}^N (t_{n,j} - g_{n,j}) \frac{\partial y_{n,j}}{\partial \Wv_{d,j}}\\ & = \sum_{n=1}^N \left ( t_{n,j} - g_{n,j} \right ) x_{n,d} \end{align*}
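Stacked over all samples and classes, this gradient is a single matrix product. A small sketch with hypothetical data (indicator targets `T`, softmax outputs `G`; `X1` includes the bias column):

```python
import numpy as np

rng = np.random.default_rng(1)
N, D, K = 6, 3, 3
X1 = np.hstack((np.ones((N, 1)), rng.normal(size=(N, D - 1))))  # bias column + inputs
T = np.eye(K)[rng.integers(0, K, N)]                            # indicator targets
W = np.zeros((D, K))

Y = X1 @ W                                       # linear outputs y_{n,k}
fs = np.exp(Y - Y.max(axis=1, keepdims=True))
G = fs / fs.sum(axis=1, keepdims=True)           # softmax, g_{n,k}

gradLL = X1.T @ (T - G)      # one entry per weight W_{d,j}
print(gradLL.shape)          # (3, 3)
```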

## Nonlinear Version

First the general form again.

\begin{align*} \frac{\partial LL(\Wv)}{\partial \Wv_{d,j}} & = \sum_{n=1}^N \sum_{k=1}^{K} (t_{n,k} - g_{n,k}) \frac{\partial y_{n,k}}{\partial \Wv_{d,j}} \end{align*}

Now $y_{n,j}$ depends on $\Vv$ and $\Wv$, so

\begin{align*} \frac{\partial LL(\Vv,\Wv)}{\partial \Vv_{d,m}} & = \sum_{n=1}^N \sum_{k=1}^{K} \left ( t_{n,k} - g_{n,k} \right ) \frac{\partial y_{n,k}}{\partial \Vv_{d,m}}\\ \frac{\partial LL(\Vv,\Wv)}{\partial \Wv_{m,j}} & = \sum_{n=1}^N \sum_{k=1}^{K} \left ( t_{n,k} - g_{n,k} \right ) \frac{\partial y_{n,k}}{\partial \Wv_{m,j}}\\ \frac{\partial LL(\Vv,\Wv)}{\partial \Wv_{m,j}} & = \sum_{n=1}^N \left ( t_{n,j} - g_{n,j} \right ) \frac{\partial y_{n,j}}{\partial \Wv_{m,j}} \end{align*}

But, thank goodness, we have already calculated $\frac{\partial y_{n,k}}{\partial \Vv_{d,m}}$ and $\frac{\partial y_{n,k}}{\partial \Wv_{m,k}}$ in our previous neural network lectures. This becomes clearer when we compare the expressions above with the derivatives of mean squared error with respect to the weights of a neural network used for regression.

\begin{align*} E &= \frac{1}{NK} \frac{1}{2} \sum_{n=1}^N \sum_{k=1}^K (t_{n,k} - y_{n,k})^2\\ \frac{\partial E}{\partial \Vv_{d,m}} & = - \frac{1}{NK} \sum_{n=1}^N \sum_{k=1}^K (t_{n,k} - y_{n,k}) \frac{\partial y_{n,k}}{\partial \Vv_{d,m}}\\ \frac{\partial E}{\partial \Wv_{m,j}} & = - \frac{1}{NK} \sum_{n=1}^N \sum_{k=1}^K (t_{n,k} - y_{n,k}) \frac{\partial y_{n,k}}{\partial \Wv_{m,j}}\\ \frac{\partial E}{\partial \Wv_{m,j}} & = - \frac{1}{NK} \sum_{n=1}^N (t_{n,j} - y_{n,j}) \frac{\partial y_{n,j}}{\partial \Wv_{m,j}} \end{align*}

\begin{align*} LL(\Vv,\Wv) & = \sum_{n=1}^N \sum_{k=1}^K t_{n,k} \log g_{n,k} \text{ where } g_{n,k} = \frac{\eby{k}}{\sum_{m=1}^{K} \eby{m}}\\ \frac{\partial LL(\Vv,\Wv)}{\partial \Vv_{d,m}} & = \sum_{n=1}^N \sum_{k=1}^{K} \left ( t_{n,k} - g_{n,k} \right ) \frac{\partial y_{n,k}}{\partial \Vv_{d,m}}\\ \frac{\partial LL(\Vv,\Wv)}{\partial \Wv_{m,j}} & = \sum_{n=1}^N \sum_{k=1}^{K} \left ( t_{n,k} - g_{n,k} \right ) \frac{\partial y_{n,k}}{\partial \Wv_{m,j}}\\ \frac{\partial LL(\Vv,\Wv)}{\partial \Wv_{m,j}} & = \sum_{n=1}^N \left ( t_{n,j} - g_{n,j} \right ) \frac{\partial y_{n,j}}{\partial \Wv_{m,j}} \end{align*}

So, our previously derived matrix expressions for neural networks can be used if we modify the output calculation. Here are the expressions we used for minimizing mean squared error:

\begin{align*} \Zv &= h(\tilde{\Xv} \Vv)\\ \Yv &= \tilde{\Zv} \Wv\\ E &= \frac{1}{NK} \frac{1}{2} \sum (\Tv - \Yv)^2\\ \grad_\Vv E &= - \frac{1}{NK} \tilde{\Xv}^T \left ( (\Tv - \Yv) \hat{\Wv}^T \cdot (1-\Zv^2) \right )\\ \grad_\Wv E &= - \frac{1}{NK} \tilde{\Zv}^T (\Tv - \Yv) \end{align*}

Here are the changes needed for nonlinear logistic regression, where $\Tiv$ is the matrix of indicator variables formed from $\Tv$.

\begin{align*} \Zv &= h(\tilde{\Xv} \Vv)\\ \Yv &= \tilde{\Zv} \Wv\\ \Fv &= e^{\Yv}\\ \Sv &= \Fv \ones{K}\;\;\;\;\;\;\;\;\;\;\;\;\; \text{ sum across columns}\\ \Gv &= \Fv / \left [ \Sv, \Sv,\ldots,\Sv \right ] \;\;\; \Sv \text{ are column vectors }\\ LL &= \sum \Tiv \log \Gv\\ \grad_\Vv LL &= \tilde{\Xv}^T \left ( (\Tiv - \Gv) \hat{\Wv}^T \cdot (1-\Zv^2) \right )\\ \grad_\Wv LL &= \tilde{\Zv}^T (\Tiv - \Gv) \end{align*}
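These matrix expressions translate almost line-for-line into NumPy. A minimal sketch with one hidden layer, where the data, weights `V` and `W`, and indicator targets `Ti` are all made up for illustration (`X1` and `Z1` play the roles of $\tilde{\Xv}$ and $\tilde{\Zv}$, and the exponentials are shifted by the row maximum for numerical safety):

```python
import numpy as np

rng = np.random.default_rng(2)
N, D, M, K = 8, 2, 5, 3
X = rng.normal(size=(N, D))
Ti = np.eye(K)[rng.integers(0, K, N)]            # indicator variables for T
V = rng.uniform(-0.1, 0.1, size=(D + 1, M))
W = rng.uniform(-0.1, 0.1, size=(M + 1, K))

X1 = np.hstack((np.ones((N, 1)), X))             # X tilde
Z = np.tanh(X1 @ V)                              # hidden-layer outputs
Z1 = np.hstack((np.ones((N, 1)), Z))             # Z tilde
Y = Z1 @ W
F = np.exp(Y - Y.max(axis=1, keepdims=True))     # shifted to avoid overflow
G = F / F.sum(axis=1, keepdims=True)             # softmax rows

LL = np.sum(Ti * np.log(G))
gradV = X1.T @ (((Ti - G) @ W[1:, :].T) * (1 - Z**2))   # W without bias row
gradW = Z1.T @ (Ti - G)
print(LL, gradV.shape, gradW.shape)
```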

## Two-Dimensional Data

In [1]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
n = 500
x1 = np.linspace(5,20,n) + np.random.uniform(-2,2,n)
y1 = ((20-12.5)**2-(x1-12.5)**2) / (20-12.5)**2 * 10 + 14 + np.random.uniform(-2,2,n)
x2 = np.linspace(10,25,n) + np.random.uniform(-2,2,n)
y2 = ((x2-17.5)**2) / (25-17.5)**2 * 10 + 5.5 + np.random.uniform(-2,2,n)
angles = np.linspace(0,2*np.pi,n)
x3 = np.cos(angles) * 15 + 15 + np.random.uniform(-2,2,n)
y3 = np.sin(angles) * 15 + 15 + np.random.uniform(-2,2,n)
X =  np.vstack((np.hstack((x1,x2,x3)), np.hstack((y1,y2,y3)))).T
T = np.repeat(range(1,4),n).reshape((-1,1))
colors = ['blue','red','green']
plt.figure(figsize=(6,6))
for c in range(1,4):
    # plot each class in its own color
    mask = (T == c).flatten()
    plt.plot(X[mask, 0], X[mask, 1], 'o', alpha=0.5, color=colors[c-1])


Let's try to classify this data with a neural network having 5 hidden units, trained by nonlinear logistic regression. In Python, it is easy to do this by making a subclass NeuralNetworkClassifier of NeuralNetwork with the required changes. The changes will be in the objectiveF and gradF functions local to the train method, and in the use method.

In [3]:
!wget http://www.cs.colostate.edu/~anderson/cs445/notebooks/nn2.tar
!tar xvf nn2.tar

--2019-03-25 07:50:29--  http://www.cs.colostate.edu/~anderson/cs445/notebooks/nn2.tar
Resolving www.cs.colostate.edu (www.cs.colostate.edu)... 129.82.45.114
Connecting to www.cs.colostate.edu (www.cs.colostate.edu)|129.82.45.114|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 40960 (40K) [application/x-tar]
Saving to: 'nn2.tar'

nn2.tar             100%[===================>]  40.00K  --.-KB/s    in 0.1s

2019-03-25 07:50:29 (389 KB/s) - 'nn2.tar' saved [40960/40960]

neuralnetworks.py
mlutilities.py

In [4]:
cat neuralnetworks.py

import numpy as np
import mlutilities as ml
from copy import copy
import sys  # for sys.float_info.epsilon
import pdb

######################################################################
### class NeuralNetwork
######################################################################

class NeuralNetwork:

    def __init__(self, ni, nhs, no):
        if nhs == 0 or nhs == [0] or nhs is None or nhs == [None]:
            nhs = None
        else:
            try:
                nihs = [ni] + list(nhs)
            except:
                nihs = [ni] + [nhs]
                nhs = [nhs]
        if nhs is not None:
            self.Vs = [(np.random.uniform(-1, 1, size=(1 + nihs[i], nihs[i+1])) / np.sqrt(nihs[i]))
                       for i in range(len(nihs) - 1)]
            self.W = np.zeros((1 + nhs[-1], no))
            # self.W = (np.random.uniform(-1,1,size=(1+nhs[-1],no)) / np.sqrt(nhs[-1]))
        else:
            self.Vs = None
            self.W = np.zeros((1 + ni, no))
            # self.W = 0*np.random.uniform(-1,1,size=(1+ni,no)) / np.sqrt(ni)
        self.ni, self.nhs, self.no = ni, nhs, no
        self.Xmeans = None
        self.Xstds = None
        self.Tmeans = None
        self.Tstds = None
        self.trained = False
        self.reason = None
        self.errorTrace = None
        self.numberOfIterations = None
    def train(self, X, T, nIterations=100, verbose=False):

        if self.Xmeans is None:
            self.Xmeans = X.mean(axis=0)
            self.Xstds = X.std(axis=0)
            self.Xconstant = self.Xstds == 0
            self.XstdsFixed = copy(self.Xstds)
            self.XstdsFixed[self.Xconstant] = 1
        X = self._standardizeX(X)

        if T.ndim == 1:
            T = T.reshape((-1, 1))

        if self.Tmeans is None:
            self.Tmeans = T.mean(axis=0)
            self.Tstds = T.std(axis=0)
            self.Tconstant = self.Tstds == 0
            self.TstdsFixed = copy(self.Tstds)
            self.TstdsFixed[self.Tconstant] = 1
        T = self._standardizeT(T)

        # Local functions used by scg()

        def objectiveF(w):
            self._unpack(w)
            Y, _ = self._forward_pass(X)
            return 0.5 * np.mean((Y - T)**2)

        def gradF(w):
            self._unpack(w)
            Y, Z = self._forward_pass(X)
            delta = (Y - T) / (X.shape[0] * T.shape[1])
            dVs, dW = self._backward_pass(delta, Z)
            return self._pack(dVs, dW)

        scgresult = ml.scg(self._pack(self.Vs, self.W), objectiveF, gradF,
                           nIterations=nIterations,
                           verbose=verbose,
                           ftracep=True)

        self._unpack(scgresult['x'])
        self.reason = scgresult['reason']
        self.errorTrace = np.sqrt(scgresult['ftrace'])  # * self.Tstds # to unstandardize the MSEs
        self.numberOfIterations = len(self.errorTrace)
        self.trained = True
        return self

    def use(self, X, allOutputs=False):
        Xst = self._standardizeX(X)
        Y, Z = self._forward_pass(Xst)
        Y = self._unstandardizeT(Y)
        if Z is None:
            return (Y, None) if allOutputs else Y
        else:
            return (Y, Z[1:]) if allOutputs else Y

    def getNumberOfIterations(self):
        return self.numberOfIterations

    def getErrorTrace(self):
        return self.errorTrace

    def draw(self, inputNames=None, outputNames=None):
        ml.draw(self.Vs, self.W, inputNames, outputNames)

    def _forward_pass(self, X):
        if self.nhs is None:
            # no hidden units, just linear output layer
            Y = np.dot(X, self.W[1:,:]) + self.W[0:1,:]
            Zs = [X]
        else:
            Zprev = X
            Zs = [Zprev]
            for i in range(len(self.nhs)):
                V = self.Vs[i]
                Zprev = np.tanh(np.dot(Zprev, V[1:,:]) + V[0:1,:])
                Zs.append(Zprev)
            Y = np.dot(Zprev, self.W[1:,:]) + self.W[0:1,:]
        return Y, Zs

    def _backward_pass(self, delta, Z):
        if self.nhs is None:
            # no hidden units, just linear output layer
            dW = np.vstack((np.dot(np.ones((1, delta.shape[0])), delta), np.dot(Z[0].T, delta)))
            dVs = None
        else:
            dW = np.vstack((np.dot(np.ones((1, delta.shape[0])), delta), np.dot(Z[-1].T, delta)))
            dVs = []
            delta = (1 - Z[-1]**2) * np.dot(delta, self.W[1:,:].T)
            for Zi in range(len(self.nhs), 0, -1):
                Vi = Zi - 1  # because X is first element of Z
                dV = np.vstack((np.dot(np.ones((1, delta.shape[0])), delta),
                                np.dot(Z[Zi-1].T, delta)))
                dVs.insert(0, dV)
                delta = np.dot(delta, self.Vs[Vi][1:,:].T) * (1 - Z[Zi-1]**2)
        return dVs, dW

    def _standardizeX(self, X):
        result = (X - self.Xmeans) / self.XstdsFixed
        result[:, self.Xconstant] = 0.0
        return result

    def _unstandardizeX(self, Xs):
        return self.Xstds * Xs + self.Xmeans

    def _standardizeT(self, T):
        result = (T - self.Tmeans) / self.TstdsFixed
        result[:, self.Tconstant] = 0.0
        return result

    def _unstandardizeT(self, Ts):
        return self.Tstds * Ts + self.Tmeans

    def _pack(self, Vs, W):
        if Vs is None:
            return np.array(W.flat)
        else:
            return np.hstack([V.flat for V in Vs] + [W.flat])

    def _unpack(self, w):
        if self.nhs is None:
            self.W[:] = w.reshape((self.ni + 1, self.no))
        else:
            first = 0
            numInThisLayer = self.ni
            for i in range(len(self.Vs)):
                self.Vs[i][:] = w[first:first + (numInThisLayer+1) * self.nhs[i]].reshape((numInThisLayer+1, self.nhs[i]))
                first += (numInThisLayer+1) * self.nhs[i]
                numInThisLayer = self.nhs[i]
            self.W[:] = w[first:].reshape((numInThisLayer+1, self.no))

    def __repr__(self):
        str = 'NeuralNetwork({}, {}, {})'.format(self.ni, self.nhs, self.no)
        # str += '  Standardization parameters' + (' not' if self.Xmeans == None else '') + ' calculated.'
        if self.trained:
            str += '\n   Network was trained for {} iterations. Final error is {}.'.format(self.numberOfIterations,
                                                                                           self.errorTrace[-1])
        else:
            str += '  Network is not trained.'
        return str

######################################################################
### class NeuralNetworkClassifier
######################################################################

def makeIndicatorVars(T):
    """ Assumes argument is N x 1, N samples each being integer class label """
    return (T == np.unique(T)).astype(int)

class NeuralNetworkClassifier(NeuralNetwork):

    def __init__(self, ni, nhs, no):
        # super(NeuralNetworkClassifier,self).__init__(ni,nhs,no)
        NeuralNetwork.__init__(self, ni, nhs, no)

    def _multinomialize(self, Y):
        # subtract the maximum to avoid overflow in exp
        mx = max(0, np.max(Y))
        expY = np.exp(Y - mx)
        denom = np.sum(expY, axis=1).reshape((-1, 1)) + sys.float_info.epsilon
        Y = expY / denom
        return Y

    def train(self, X, T, nIterations=100, verbose=False):
        if self.Xmeans is None:
            self.Xmeans = X.mean(axis=0)
            self.Xstds = X.std(axis=0)
            self.Xconstant = self.Xstds == 0
            self.XstdsFixed = copy(self.Xstds)
            self.XstdsFixed[self.Xconstant] = 1
        X = self._standardizeX(X)

        self.classes, counts = np.unique(T, return_counts=True)
        self.mostCommonClass = self.classes[np.argmax(counts)]  # to break ties

        if self.no != len(self.classes):
            raise ValueError(" In NeuralNetworkClassifier, the number of outputs must equal\n the number of classes in the training data. The given number of outputs\n is %d and number of classes is %d. Try changing the number of outputs in the\n call to NeuralNetworkClassifier()." % (self.no, len(self.classes)))
        T = makeIndicatorVars(T)

        # Local functions used by ml.scg()

        def objectiveF(w):
            self._unpack(w)
            Y, _ = self._forward_pass(X)
            Y = self._multinomialize(Y)
            Y[Y == 0] = sys.float_info.epsilon
            return -np.mean(T * np.log(Y))

        def gradF(w):
            self._unpack(w)
            Y, Z = self._forward_pass(X)
            Y = self._multinomialize(Y)
            delta = (Y - T) / (X.shape[0] * T.shape[1])
            dVs, dW = self._backward_pass(delta, Z)
            return self._pack(dVs, dW)

        scgresult = ml.scg(self._pack(self.Vs, self.W), objectiveF, gradF,
                           nIterations=nIterations,
                           ftracep=True,
                           verbose=verbose)

        self._unpack(scgresult['x'])
        self.reason = scgresult['reason']
        self.errorTrace = scgresult['ftrace']
        self.numberOfIterations = len(self.errorTrace) - 1
        self.trained = True
        return self

    def use(self, X, allOutputs=False):
        Xst = self._standardizeX(X)
        Y, Z = self._forward_pass(Xst)
        Y = self._multinomialize(Y)
        classes = self.classes[np.argmax(Y, axis=1)].reshape((-1, 1))
        # If any row has all equal values, then all classes have the same probability.
        # Let's return the most common class in those cases.
        classProbsEqual = (Y == Y[:, 0:1]).all(axis=1)
        if sum(classProbsEqual) > 0:
            classes[classProbsEqual] = self.mostCommonClass
        if Z is None:
            return (classes, Y, None) if allOutputs else classes
        else:
            return (classes, Y, Z[1:]) if allOutputs else classes
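Two helpers in this listing deserve a quick check. `makeIndicatorVars` broadcasts `T == np.unique(T)` into one column per class, and `_multinomialize` subtracts the maximum before exponentiating so `np.exp` cannot overflow. A standalone illustration of both tricks:

```python
import numpy as np

T = np.array([[1], [3], [2], [3]])
indicators = (T == np.unique(T)).astype(int)   # same as makeIndicatorVars(T)
print(indicators)
# [[1 0 0]
#  [0 0 1]
#  [0 1 0]
#  [0 0 1]]

Y = np.array([[1000.0, 1001.0, 1002.0]])       # naive np.exp(Y) would overflow
expY = np.exp(Y - max(0, np.max(Y)))           # the shift leaves the ratios unchanged
G = expY / expY.sum(axis=1, keepdims=True)
print(G)   # finite, and each row sums to 1
```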

In [5]:
import sys
sys.path = ['.'] + sys.path

In [6]:
import neuralnetworks as nn
import mpl_toolkits.mplot3d as plt3
from matplotlib import cm

## if you edit neuralnetworks.py, force ipython to reload it by doing
##   import importlib; importlib.reload(nn)

nHidden = 5
nnet = nn.NeuralNetworkClassifier(2, nHidden, 3)  # 2 inputs, 3 classes, so a 3-unit output layer
nnet.train(X,T,  nIterations=1000, verbose=True)

xs = np.linspace(0,30,40)
x,y = np.meshgrid(xs,xs)
Xtest = np.vstack((x.flat,y.flat)).T
Ytest = nnet.use(Xtest)
predTest,probs,_ = nnet.use(Xtest,allOutputs=True)  #discard hidden unit outputs

plt.figure(figsize=(10,10))

plt.subplot(2,2,1)
plt.plot(np.exp(-nnet.getErrorTrace()))
plt.xlabel("Epochs")
plt.ylabel("Likelihood")

plt.subplot(2,2,3)
nnet.draw()

colors = ['red','green','blue']
plt.subplot(2,2,2)

for c in range(1,4):
    # plot each class in its own color
    mask = (T == c).flatten()
    plt.plot(X[mask, 0], X[mask, 1], 'o', alpha=0.5, color=colors[c-1])

plt.subplot(2,2,4)
plt.contourf(Xtest[:,0].reshape((40,40)),Xtest[:,1].reshape((40,40)), predTest.reshape((40,40)),
levels = [0.5,1.99,2.01,3.5], #    levels=(0.5,1.5,2.5,3.5),
colors=('red','green','blue'));

SCG: Iteration 100 fValue 0.03775978800038138 Scale 1e-15
SCG: Iteration 200 fValue 0.020741938982621968 Scale 1e-15
SCG: Iteration 300 fValue 0.015552770614653178 Scale 1e-15
SCG: Iteration 400 fValue 0.013600204254684578 Scale 1e-15
SCG: Iteration 500 fValue 0.012868513291316002 Scale 1e-15
SCG: Iteration 600 fValue 0.011968508806622594 Scale 1e-15
SCG: Iteration 700 fValue 0.011549682470761087 Scale 1e-15
SCG: Iteration 800 fValue 0.011191824490301792 Scale 1e-15
SCG: Iteration 900 fValue 0.010798480012414048 Scale 1e-15
SCG: Iteration 1000 fValue 0.010341145640767479 Scale 1e-15

In [7]:
from matplotlib.colors import LightSource

In [8]:
fig = plt.figure(figsize=(10,30))
ls = LightSource(azdeg=30, altdeg=60)
white = np.ones((x.shape[0], x.shape[1], 3))
red = white * np.array([1,0,0])
green = white * np.array([0,1,0])
blue = white * np.array([0,0,1])
colors = [red, green, blue]
#colors = ['red','green','blue']

for c in range(3):
    ax = fig.add_subplot(3, 1, c + 1, projection='3d')
    ax.view_init(azim=180 + 40, elev=60)
    Z = probs[:, c].reshape(x.shape)
    ax.plot_surface(x, y, Z,
                    rstride=1, cstride=1, linewidth=0, antialiased=False,
                    facecolors=ls.shade_rgb(colors[c], Z))
    ax.set_zlabel(r"$p(C=" + str(c + 1) + "|x)$")


How would you plot the outputs of the hidden units?
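One hedged answer to that question: each hidden unit applies tanh to a linear function of the inputs, so its output over the input grid can be drawn as its own surface. This standalone sketch uses hypothetical first-layer weights `V` in place of the trained network's; with the trained `nnet` you would instead take the third value returned by `nnet.use(Xtest, allOutputs=True)`.

```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the 3d projection

# Hypothetical weights standing in for nnet's first layer (bias row first).
rng = np.random.default_rng(3)
V = rng.uniform(-1, 1, size=(3, 5))       # 2 inputs + bias -> 5 hidden units

xs = np.linspace(0, 30, 40)
x, y = np.meshgrid(xs, xs)
Xtest = np.vstack((x.flat, y.flat)).T
X1 = np.hstack((np.ones((Xtest.shape[0], 1)), Xtest))
Z = np.tanh(X1 @ V)                       # hidden-unit outputs, one column each

fig = plt.figure(figsize=(10, 15))
for h in range(Z.shape[1]):
    ax = fig.add_subplot(3, 2, h + 1, projection='3d')
    ax.plot_surface(x, y, Z[:, h].reshape(x.shape))
    ax.set_zlabel('$z_{' + str(h + 1) + '}$')
```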

Let's repeat the experiment of classifying human activity data (accelerometer data), but now use our NeuralNetworkClassifier class to do nonlinear logistic regression. This time we will retrieve and load accelerometers.npy, a file containing a numpy array stored in NumPy's binary format.
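As a quick reminder of that binary format (illustrated with a throwaway array and an in-memory buffer rather than a real file), `np.save` writes the `.npy` format and `np.load` reads it back unchanged:

```python
import io
import numpy as np

a = np.arange(6, dtype=float).reshape(3, 2)
buf = io.BytesIO()
np.save(buf, a)        # same binary .npy format that accelerometers.npy uses
buf.seek(0)
b = np.load(buf)
print(np.array_equal(a, b))   # True
```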

In [9]:
! curl -O http://www.cs.colostate.edu/~anderson/cs445/notebooks/acceldata.tar
! tar xvf acceldata.tar

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
100 7040k  100 7040k    0     0  2830k      0  0:00:02  0:00:02 --:--:-- 2829k
accelerometers.npy

In [10]:
data = np.load('accelerometers.npy')
data.shape

Out[10]:
(225006, 4)
In [11]:
data[0,:]

Out[11]:
array([ 1.        , -0.87313405, -0.08552787, -0.29504612])
In [38]:
X = data[:,1:]
T = data[:,0:1]
X.shape, T.shape

Out[38]:
((225006, 3), (225006, 1))
In [39]:
import mlutilities as ml  # for ml.partition

In [40]:
X.shape

Out[40]:
(225006, 3)
In [41]:
train_fraction = 0.8
Xtrain, Ttrain, Xtest, Ttest = ml.partition(X, T, train_fraction, classification=True) #stratified partitioning (by class)

In [42]:
Xtrain.shape,Ttrain.shape,Xtest.shape,Ttest.shape

Out[42]:
((180006, 3), (180006, 1), (45000, 3), (45000, 1))
In [43]:
np.unique(Ttrain, return_counts=True)

Out[43]:
(array([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.]),
array([18002, 18000, 18000, 18000, 18000, 18002, 18000, 18000, 18002,
18000]))
In [44]:
%precision 5
values,counts = np.unique(Ttrain, return_counts=True)
counts / Ttrain.shape[0]

Out[44]:
array([0.10001, 0.1    , 0.1    , 0.1    , 0.1    , 0.10001, 0.1    ,
0.1    , 0.10001, 0.1    ])
In [45]:
values,counts = np.unique(Ttest, return_counts=True)
counts / Ttest.shape[0]

Out[45]:
array([0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1])
In [46]:
n_classes = len(np.unique(T))
nnet = nn.NeuralNetworkClassifier(3, 10, n_classes)
nnet.train(Xtrain,Ttrain, verbose=True)

plt.rcParams['figure.figsize'] = (6,6)
plt.plot(np.exp(-nnet.getErrorTrace()))
plt.xlabel('Iteration')
plt.ylabel('Data Likelihood');

SCG: Iteration 10 fValue 0.16188814244708796 Scale 3.90625e-09
SCG: Iteration 20 fValue 0.14704801907462223 Scale 3.814697265625e-12
SCG: Iteration 30 fValue 0.13908391343967322 Scale 3.725290298461914e-15
SCG: Iteration 40 fValue 0.13515204025420435 Scale 1e-15
SCG: Iteration 50 fValue 0.13309067684300996 Scale 1e-15
SCG: Iteration 60 fValue 0.13153904759497034 Scale 1e-15
SCG: Iteration 70 fValue 0.13063932634565126 Scale 1e-15
SCG: Iteration 80 fValue 0.13026014784075352 Scale 1e-15
SCG: Iteration 90 fValue 0.12998739696400266 Scale 1e-15
SCG: Iteration 100 fValue 0.12973129439100037 Scale 1e-15

In [47]:
Ptrain,Prtrain,_ = nnet.use(Xtrain,allOutputs=True)
Ptest,Prtest,_ = nnet.use(Xtest,allOutputs=True)
plt.figure(figsize=(10, 10))
plt.subplot(2,1,1)
plt.plot(np.hstack((Ttrain,Ptrain)), '.')
plt.legend(('Actual','Predicted'))
plt.subplot(2,1,2)
plt.plot(np.hstack((Ttest,Ptest)), '.')
plt.legend(('Actual','Predicted'));

In [48]:
cm = ml.confusionMatrix(Ttest,Ptest,np.unique(Ttest).astype(int))
print('1-Rest 2-Coloring 3-Legos 4-Wii Tennis 5-Wii Boxing 6-0.75m/s 7-1.25m/s 8-1.75m/s, 9-2.25m/s 10-Stairs')

       1    2    3    4    5    6    7    8    9   10
------------------------------------------------------------
1 | 86.1 13.9  0    0    0    0    0    0    0    0
2 | 37.8 27.0  3.5  0.2 27.5  0    0    0.1  4.0  0
3 |  0.8  4.7 65.4  2.9  5.1  9.8  3.6  1.1  3.6  3.1
4 |  0.7  5.8 14.7  9.0 20.1 16.6 10.3  2.7  5.9 14.1
5 |  3.2  1.6 13.6  1.5 50.6  0.6  1.0  6.0 19.5  2.5
6 |  0.2  0.2  0.6  0.1  0.5 68.4 28.5  0.4  0.3  0.7
7 |  0.1  0.4  1.4  0.3  0.2 37.8 53.1  0.6  0.9  5.4
8 |  0.2  1.3  8.1  2.6 12.4  1.9  3.1 15.4 50.4  4.7
9 |  0.3  0.8  6.7  1.6 14.2  1.8  2.0 13.9 55.1  3.7
10 |  0.3  4.6  6.8  3.2 12.5  9.8 17.6 14.0 12.6 18.7
1-Rest 2-Coloring 3-Legos 4-Wii Tennis 5-Wii Boxing 6-0.75m/s 7-1.25m/s 8-1.75m/s, 9-2.25m/s 10-Stairs


Try training for more iterations.

In [49]:
nnet = nn.NeuralNetworkClassifier(3, 20, 10)
nnet.train(Xtrain, Ttrain, nIterations=500, verbose=True)
print('Trained for', nnet.getNumberOfIterations(), 'iterations')

SCG: Iteration 50 fValue 0.12810128826542772 Scale 1e-15
SCG: Iteration 100 fValue 0.12123691708103801 Scale 1e-15
SCG: Iteration 150 fValue 0.11980220947167443 Scale 1e-15
SCG: Iteration 200 fValue 0.11916892273110097 Scale 1e-15
SCG: Iteration 250 fValue 0.11877816769617629 Scale 1e-15
SCG: Iteration 300 fValue 0.118428957495202 Scale 1e-15
SCG: Iteration 350 fValue 0.11826140885636018 Scale 1e-15
SCG: Iteration 400 fValue 0.11801238572879526 Scale 1e-15
SCG: Iteration 450 fValue 0.11784035361994037 Scale 1e-15
SCG: Iteration 500 fValue 0.11770224579302253 Scale 1e-15
Trained for 500 iterations

In [50]:
plt.plot(np.exp(-nnet.getErrorTrace()))
plt.xlabel('Iteration')
plt.ylabel('Data Likelihood');

In [51]:
classes = np.unique(Ttest).astype(int)
Ptrain, Prtrain, _ = nnet.use(Xtrain,allOutputs=True)
Ptest, Prtest, _ = nnet.use(Xtest,allOutputs=True)
print('Percent Correct: Training', 100*np.sum(Ptrain==Ttrain)/len(Ttrain),
'Testing',100*np.sum(Ptest==Ttest)/len(Ttest))
print()
ml.confusionMatrix(Ttest,Ptest,classes)
print('1-Rest 2-Coloring 3-Legos 4-Wii Tennis 5-Wii Boxing 6-0.75m/s 7-1.25m/s 8-1.75m/s, 9-2.25m/s 10-Stairs')
plt.plot(np.exp(-nnet.getErrorTrace()));

Percent Correct: Training 55.955357043654104 Testing 51.824444444444445

1    2    3    4    5    6    7    8    9   10
------------------------------------------------------------
1 | 95.6  4.4  0    0    0    0    0    0    0    0
2 | 23.6 55.9  3.6  0.7 12.2  0    0    0.0  4.0  0
3 |  0.5  1.2 67.6  7.5  2.6 10.1  4.2  3.2  0.6  2.5
4 |  0.8  3.0 17.2 18.5  9.3 17.3  8.1  5.9  4.0 15.8
5 |  1.2  2.5 12.7  2.9 52.8  0.2  1.2  9.3 14.9  2.2
6 |  0.0  0.2  0.4  0.3  0.8 78.6 16.7  0.5  0.2  2.3
7 |  0.0  0.3  1.1  0.1  0.4 37.1 54.1  1.2  1.3  4.4
8 |  0.0  1.9  5.8  3.2 11.6  1.6  3.3 28.6 39.6  4.3
9 |  0.2  2.0  5.0  3.0 12.3  1.3  2.0 21.5 48.7  3.9
10 |  0.2  1.6  6.8  4.3 10.2  9.6 16.0 25.1  8.5 17.9
1-Rest 2-Coloring 3-Legos 4-Wii Tennis 5-Wii Boxing 6-0.75m/s 7-1.25m/s 8-1.75m/s, 9-2.25m/s 10-Stairs

In [52]:
nnet.draw()

In [53]:
import scipy.signal as sig

def cwt(eeg, Fs, freqs, width, channelNames=None, graphics=False):
    if freqs.min() == 0:
        print('cwt: Frequencies must be greater than 0.')
        return None, None
    nChannels, nSamples = eeg.shape
    if not channelNames and graphics:
        channelNames = ['Channel {:2d}'.format(i) for i in range(nChannels)]

    nFreqs = len(freqs)
    tfrep = np.zeros((nChannels, nFreqs, nSamples))
    tfrepPhase = np.zeros((nChannels, nFreqs, nSamples))

    for ch in range(nChannels):
        print('channel', ch, ' freq ', end='')
        for freqi in range(nFreqs):
            print(freqs[freqi], ' ', end='')
            mag, phase = energyvec(freqs[freqi], eeg[ch,:], Fs, width)
            tfrepPhase[ch, freqi, :] = phase
            tfrep[ch, freqi, :] = mag
        print()

    return tfrep, tfrepPhase

def morletLength(Fs, f, width):
    ''' len = morletLength(Fs,f,width) '''
    dt = 1.0/Fs
    sf = f/width
    st = 1.0/(2*np.pi*sf)
    return int((3.5*st - -3.5*st)/dt)

def energyvec(f, s, Fs, width):
    '''
    y, phase = energyvec(f,s,Fs,width)

    Return a vector containing the energy as a
    function of time for frequency f. The energy
    is calculated using Morlet's wavelets.
    s : signal
    Fs : sampling frequency
    width : width of Morlet wavelet (>= 5 suggested).
    '''

    dt = 1.0/Fs
    sf = f/float(width)
    st = 1.0/(2*np.pi*sf)

    t = np.arange(-3.5*st, 3.5*st, step=dt)
    m = morlet(f, t, width)
    # yconv = np.convolve(s,m,mode="same")
    yconv = sig.fftconvolve(s, m, mode='same')

    y = np.abs(yconv)**2
    phase = np.angle(yconv, deg=True)
    return y, phase

######################################################################

def morlet(f, t, width):
    '''
    y = morlet(f,t,width)

    Morlet's wavelet for frequency f and time t.
    The wavelet will be normalized so the total energy is 1.
    width defines the width of the wavelet.
    A value >= 5 is suggested.

    Ref: Tallon-Baudry et al., J. Neurosci. 15, 722-734 (1997), page 724

    Ole Jensen, August 1998
    '''
    sf = f/float(width)
    st = 1.0/(2*np.pi*sf)
    A = 1.0/np.sqrt(st*np.sqrt(2*np.pi))
    y = A*np.exp(-t**2/(2*st**2)) * np.exp(1j*2*np.pi*f*t)
    return y

In [54]:
import time
width = 75
maxFreq = 20
freqs = np.arange(0.5,maxFreq,0.5) # makes same freqs used in stft above
start = time.time()
tfrep,tfrepPhase = cwt(data[:,1:].T, 75, freqs, width)
print('CWT time: {} seconds'.format(time.time() - start))

channel 0  freq 0.5  1.0  1.5  2.0  2.5  3.0  3.5  4.0  4.5  5.0  5.5  6.0  6.5  7.0  7.5  8.0  8.5  9.0  9.5  10.0  10.5  11.0  11.5  12.0  12.5  13.0  13.5  14.0  14.5  15.0  15.5  16.0  16.5  17.0  17.5  18.0  18.5  19.0  19.5
channel 1  freq 0.5  1.0  1.5  2.0  2.5  3.0  3.5  4.0  4.5  5.0  5.5  6.0  6.5  7.0  7.5  8.0  8.5  9.0  9.5  10.0  10.5  11.0  11.5  12.0  12.5  13.0  13.5  14.0  14.5  15.0  15.5  16.0  16.5  17.0  17.5  18.0  18.5  19.0  19.5
channel 2  freq 0.5  1.0  1.5  2.0  2.5  3.0  3.5  4.0  4.5  5.0  5.5  6.0  6.5  7.0  7.5  8.0  8.5  9.0  9.5  10.0  10.5  11.0  11.5  12.0  12.5  13.0  13.5  14.0  14.5  15.0  15.5  16.0  16.5  17.0  17.5  18.0  18.5  19.0  19.5
CWT time: 4.35584020614624 seconds

In [55]:
plt.figure(figsize=(15,15))
plt.subplot(5,1,1)
plt.plot(data[:,1:])
plt.axis('tight')

plt.subplot(5,1,2)
plt.plot(data[:,0])
plt.text(5000,8,'1-Rest, 2-Coloring, 3-Legos, 4-Wii Tennis, 5-Boxing, 6-0.75, 7-1.25 m/s, 8-1.75, 9-2.25 m/s, 10-stairs')
plt.axis('tight')

nSensors = data.shape[1] - 1
for i in range(nSensors):
    plt.subplot(5, 1, i + 3)
    plt.imshow(np.log(tfrep[i,:,:]),
               interpolation='nearest', origin='lower',
               cmap=plt.cm.jet)  # plt.cm.Reds
    plt.xlabel('Seconds')
    plt.ylabel('Frequency in ' + ('$x$','$y$','$z$')[i])
    tickstep = round(len(freqs) / 5)
    plt.yticks(np.arange(len(freqs))[::tickstep],
               [str(f) for f in freqs[::tickstep]])
    plt.axis('auto')
    plt.axis('tight')

In [56]:
tfrep.shape

Out[56]:
(3, 39, 225006)
In [57]:
X = tfrep.reshape((3*39,-1)).T
X.shape, T.shape, len(np.unique(T))

Out[57]:
((225006, 117), (225006, 1), 10)
In [58]:
Xtrain, Ttrain, Xtest, Ttest = ml.partition(X, T, 0.8, classification=True)

In [59]:
print(Xtrain.shape)
nnet = nn.NeuralNetworkClassifier(X.shape[1], 20, 10)  #10 classes
nnet.train(Xtrain,Ttrain, nIterations = 500, verbose=True)

(180006, 117)
SCG: Iteration 50 fValue 0.02536230729996015 Scale 1e-15
SCG: Iteration 100 fValue 0.007619789855092344 Scale 1e-15
SCG: Iteration 150 fValue 0.001294782770381377 Scale 1e-15
SCG: Iteration 200 fValue 0.00044073818686692193 Scale 1e-15
SCG: Iteration 250 fValue 0.0002525632424925583 Scale 1e-15
SCG: Iteration 300 fValue 0.00013891321767210582 Scale 1e-15
SCG: Iteration 350 fValue 7.487201391883371e-05 Scale 1e-15
SCG: Iteration 400 fValue 4.98034449410777e-05 Scale 8.589934592e-06
SCG: Iteration 450 fValue 3.247916940876592e-05 Scale 4e-15
SCG: Iteration 500 fValue 2.9229093159987138e-05 Scale 4611.686018427388

Out[59]:
NeuralNetwork(117, [20], 10)
Network was trained for 500 iterations. Final error is 2.9229093159987138e-05.
In [60]:
Ptrain, Prtrain, _ = nnet.use(Xtrain,allOutputs=True)
Ptest, Prtest, _ = nnet.use(Xtest,allOutputs=True)
print('Percent Correct: Training', 100*np.sum(Ptrain==Ttrain)/len(Ttrain),
'Testing', 100*np.sum(Ptest==Ttest)/len(Ttest))
print()
classes = np.unique(Ttest).astype(int)
ml.confusionMatrix(Ttest, Ptest, classes)
print('1-Rest 2-Coloring 3-Legos 4-Wii Tennis 5-Wii Boxing 6-0.75m/s 7-1.25m/s 8-1.75m/s, 9-2.25m/s 10-Stairs')
plt.plot(np.exp(-nnet.getErrorTrace()));

Percent Correct: Training 100.0 Testing 35.446666666666665

1    2    3    4    5    6    7    8    9   10
------------------------------------------------------------
1 | 35.7 64.3  0    0    0    0    0    0    0    0
2 |  0   10.5  2.6  0    0   63.3  0    0    0   23.6
3 | 16.0  0    3.1 18.6  0    0   25.8  0    0   36.5
4 |  0    0   67.7 25.6  6.7  0    0    0    0    0
5 |  0.7  0    1.0  0   67.6  6.4  0    1.4 15.8  7.0
6 |  0    0    0    0    0   78.0 15.8  0    0    6.2
7 |  0    0    3.0  0    0.4 19.9 65.4  1.4  0    9.9
8 |  0    0   10.0  2.2  6.6  0    0   41.3 40.0  0
9 |  0    0    0   31.8  0    1.5  0   51.6 14.6  0.5
10 |  3.0  0    0    0    5.7 12.6 40.8 23.9  1.4 12.7
1-Rest 2-Coloring 3-Legos 4-Wii Tennis 5-Wii Boxing 6-0.75m/s 7-1.25m/s 8-1.75m/s, 9-2.25m/s 10-Stairs

In [61]:
plt.figure(figsize=(15,15))
nnet.draw()