$$\newcommand{\xv}{\mathbf{x}} \newcommand{\Xv}{\mathbf{X}} \newcommand{\yv}{\mathbf{y}} \newcommand{\Yv}{\mathbf{Y}} \newcommand{\zv}{\mathbf{z}} \newcommand{\av}{\mathbf{a}} \newcommand{\Wv}{\mathbf{W}} \newcommand{\wv}{\mathbf{w}} \newcommand{\betav}{\mathbf{\beta}} \newcommand{\gv}{\mathbf{g}} \newcommand{\Hv}{\mathbf{H}} \newcommand{\dv}{\mathbf{d}} \newcommand{\Vv}{\mathbf{V}} \newcommand{\vv}{\mathbf{v}} \newcommand{\tv}{\mathbf{t}} \newcommand{\Tv}{\mathbf{T}} \newcommand{\Sv}{\mathbf{S}} \newcommand{\Gv}{\mathbf{G}} \newcommand{\Zv}{\mathbf{Z}} \newcommand{\Norm}{\mathcal{N}} \newcommand{\muv}{\boldsymbol{\mu}} \newcommand{\sigmav}{\boldsymbol{\sigma}} \newcommand{\phiv}{\boldsymbol{\phi}} \newcommand{\Phiv}{\boldsymbol{\Phi}} \newcommand{\Sigmav}{\boldsymbol{\Sigma}} \newcommand{\Lambdav}{\boldsymbol{\Lambda}} \newcommand{\half}{\frac{1}{2}} \newcommand{\argmax}[1]{\underset{#1}{\operatorname{argmax}}} \newcommand{\argmin}[1]{\underset{#1}{\operatorname{argmin}}} \newcommand{\dimensionbar}[1]{\underset{#1}{\operatorname{|}}} \newcommand{\grad}{\mathbf{\nabla}} \newcommand{\ebx}[1]{e^{\betav_{#1}^T \xv_n}} \newcommand{\eby}[1]{e^{y_{n,#1}}} \newcommand{\Tiv}{\mathbf{Ti}} \newcommand{\Fv}{\mathbf{F}} \newcommand{\ones}[1]{\mathbf{1}_{#1}} $$

Classification with Nonlinear Logistic Regression Using Neural Networks

Motivation and Setup

Linear function approximator as a neural network.

What must we add to do logistic regression?

Just add the softmax calculation to the output layer.
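
Here is a minimal sketch of that softmax calculation in numpy, assuming only that Y is an $N \times K$ matrix of output-layer values (the function name softmax is illustrative). Subtracting each row's maximum before exponentiating is a common way to avoid overflow.

import numpy as np

def softmax(Y):
    # Y is N x K, one row of output-layer values per sample.
    # Subtract the row-wise maximum before exponentiating to avoid overflow.
    expY = np.exp(Y - Y.max(axis=1, keepdims=True))
    return expY / expY.sum(axis=1, keepdims=True)

softmax(np.array([[1.0, 2.0, 0.5]]))   # each row sums to 1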

Any thoughts on how to do nonlinear logistic regression?

Derivation

We will maximize the log likelihood of the training data. $$ \begin{align*} L(\Wv) & = \left ( \prod_{n=1}^N \prod_{k=1}^K P(C=k|\xv_n)^{t_{n,k}} \right ) \\ \log L(\Wv) & = LL(\Wv) = \sum_{n=1}^N \sum_{k=1}^K t_{n,k} \log P(C=k|\xv_n)\\ & = \sum_{n=1}^N \sum_{k=1}^K t_{n,k} \log g_{n,k} \end{align*} $$
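
For a concrete feel, here is a tiny numerical sketch of this log likelihood, with made-up values for the indicator targets Ti and the softmax outputs G:

import numpy as np

Ti = np.array([[1, 0, 0], [0, 0, 1]])              # indicator targets t_{n,k}, N x K
G = np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]])   # softmax outputs g_{n,k}, N x K
LL = np.sum(Ti * np.log(G))                        # log(0.7) + log(0.6), about -0.87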

Gradient of the Log Likelihood

$$ \begin{align*} LL(\Wv) & = \sum_{n=1}^N \sum_{k=1}^K t_{n,k} \log g_{n,k}\;\;\;\;\; \text{ where } g_{n,k} = \frac{\eby{k}}{\sum_{m=1}^K \eby{m}}; \\ \frac{\partial LL(\Wv)}{\partial \Wv_{d,j}} & = \sum_{n=1}^N \sum_{k=1}^K \frac{t_{n,k}}{g_{n,k}} \frac{\partial g_{n,k}}{\partial \Wv_{d,j}}\\ \end{align*} $$

Linear Version

General gradient

Applying the chain rule through the softmax, with $\frac{\partial g_{n,k}}{\partial y_{n,m}} = g_{n,k} (\delta_{k,m} - g_{n,m})$ and $\sum_{k=1}^K t_{n,k} = 1$, the gradient simplifies to

$$ \begin{align*} \frac{\partial LL(\Wv)}{\partial \Wv_{d,j}} & = \sum_{n=1}^N \sum_{k=1}^{K} (t_{n,k} - g_{n,k}) \frac{\partial y_{n,k}}{\partial \Wv_{d,j}} \end{align*} $$

For linear logistic regression, $y_{n,j} = \xv_n^T \Wv_{*,j}$, so $\frac{\partial y_{n,k}}{\partial \Wv_{d,j}}$ is nonzero only when $k=j$, giving

$$ \begin{align*} \frac{\partial LL(\Wv)}{\partial \Wv_{d,j}} & = \sum_{n=1}^N (t_{n,j} - g_{n,j}) \frac{\partial y_{n,j}}{\partial \Wv_{d,j}}\\ & = \sum_{n=1}^N \left ( t_{n,j} - g_{n,j} \right ) x_{n,d} \end{align*} $$
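
In matrix form this gradient is just $\tilde{\Xv}^T (\Tiv - \Gv)$, where $\tilde{\Xv}$ is $\Xv$ with a column of 1's prepended, $\Tiv$ is the $N \times K$ matrix of indicator variables $t_{n,k}$, and $\Gv$ holds the $g_{n,k}$. A minimal numpy sketch under those assumptions:

import numpy as np

def linear_logreg_gradient(X, Ti, W):
    # X: N x D inputs, Ti: N x K indicator targets, W: (D+1) x K weights
    X1 = np.insert(X, 0, 1, axis=1)                  # X tilde, bias column prepended
    Y = X1 @ W                                       # linear outputs, N x K
    expY = np.exp(Y - Y.max(axis=1, keepdims=True))
    G = expY / expY.sum(axis=1, keepdims=True)       # softmax, N x K
    return X1.T @ (Ti - G)                           # gradient of LL with respect to W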

Nonlinear Version

First the general form again.

$$ \begin{align*} \frac{\partial LL(\Wv)}{\partial \Wv_{d,j}} & = \sum_{n=1}^N \sum_{k=1}^{K} (t_{n,k} - g_{n,k}) \frac{\partial y_{n,k}}{\partial \Wv_{d,j}} \end{align*} $$

Now $y_{n,j}$ depends on $\Vv$ and $\Wv$, so

$$ \begin{align*} \frac{\partial LL(\Vv,\Wv)}{\partial \Vv_{d,m}} & = \sum_{n=1}^N \sum_{k=1}^{K} \left ( t_{n,k} - g_{n,k} \right ) \frac{\partial y_{n,k}}{\partial \Vv_{d,m}}\\ \frac{\partial LL(\Vv,\Wv)}{\partial \Wv_{m,j}} & = \sum_{n=1}^N \sum_{k=1}^{K} \left ( t_{n,k} - g_{n,k} \right ) \frac{\partial y_{n,k}}{\partial \Wv_{m,j}}\\ \frac{\partial LL(\Vv,\Wv)}{\partial \Wv_{m,j}} & = \sum_{n=1}^N \left ( t_{n,j} - g_{n,j} \right ) \frac{\partial y_{n,j}}{\partial \Wv_{m,j}} \end{align*} $$

But, thank goodness, we have already calculated $\frac{\partial y_{n,k}}{\partial \Vv_{d,m}}$ and $\frac{\partial y_{n,k}}{\partial \Wv_{m,j}}$ in our previous neural network lectures. This becomes clearer when we compare the expressions above with the derivatives of the mean squared error with respect to the weights of a neural network for regression problems.

$$ \begin{align*} E &= \frac{1}{NK} \frac{1}{2} \sum_{n=1}^N \sum_{k=1}^K (t_{n,k} - y_{n,k})^2\\ \frac{\partial E}{\partial \Vv_{d,m}} & = - \frac{1}{NK} \sum_{n=1}^N \sum_{k=1}^K (t_{n,k} - y_{n,k}) \frac{\partial y_{n,k}}{\partial \Vv_{d,m}}\\ \frac{\partial E}{\partial \Wv_{m,j}} & = - \frac{1}{NK} \sum_{n=1}^N \sum_{k=1}^K (t_{n,k} - y_{n,k}) \frac{\partial y_{n,k}}{\partial \Wv_{m,j}}\\ \frac{\partial E}{\partial \Wv_{m,j}} & = - \frac{1}{NK} \sum_{n=1}^N (t_{n,j} - y_{n,j}) \frac{\partial y_{n,j}}{\partial \Wv_{m,j}} \end{align*} $$

Compare to gradients for likelihood

$$ \begin{align*} LL(\Vv,\Wv) & = \sum_{n=1}^N \sum_{k=1}^K t_{n,k} \log g_{n,k} \text{ where } g_{n,k} = \frac{\eby{k}}{\sum_{m=1}^{K} \eby{m}}\\ \frac{\partial LL(\Vv,\Wv)}{\partial \Vv_{d,m}} & = \sum_{n=1}^N \sum_{k=1}^{K} \left ( t_{n,k} - g_{n,k} \right ) \frac{\partial y_{n,k}}{\partial \Vv_{d,m}}\\ \frac{\partial LL(\Vv,\Wv)}{\partial \Wv_{m,j}} & = \sum_{n=1}^N \sum_{k=1}^{K} \left ( t_{n,k} - g_{n,k} \right ) \frac{\partial y_{n,k}}{\partial \Wv_{m,j}}\\ \frac{\partial LL(\Vv,\Wv)}{\partial \Wv_{m,j}} & = \sum_{n=1}^N \left ( t_{n,j} - g_{n,j} \right ) \frac{\partial y_{n,j}}{\partial \Wv_{m,j}} \end{align*} $$

So, our previously derived matrix expressions for neural networks can be used if we modify the output calculation. Here are the expressions we used for minimizing mean squared error:

$$ \begin{align*} \Zv &= h(\tilde{\Xv} \Vv)\\ \Yv &= \tilde{\Zv} \Wv\\ E &= \frac{1}{NK} \frac{1}{2} \sum (\Tv - \Yv)^2\\ \grad_\Vv E &= - \frac{1}{NK} \tilde{\Xv}^T \left ( (\Tv - \Yv) \hat{\Wv}^T \cdot (1-\Zv^2) \right )\\ \grad_\Wv E &= - \frac{1}{NK} \tilde{\Zv}^T (\Tv - \Yv) \end{align*} $$

Here are the changes needed for nonlinear logistic regression. $\Tiv$ is the matrix of indicator variables derived from $\Tv$.

$$ \begin{align*} \Zv &= h(\tilde{\Xv} \Vv)\\ \Yv &= \tilde{\Zv} \Wv\\ \Fv &= e^{\Yv}\\ \Sv &= \Fv \ones{K}\;\;\;\;\;\;\;\;\;\;\;\;\; \text{ sum across columns}\\ \Gv &= \Fv / \left [ \Sv, \Sv,\ldots,\Sv \right ] \;\;\; \Sv \text{ are column vectors }\\ LL &= \sum \Tiv \log \Gv\\ \grad_\Vv LL &= \tilde{\Xv}^T \left ( (\Tiv - \Gv) \hat{\Wv}^T \cdot (1-\Zv^2) \right )\\ \grad_\Wv LL &= \tilde{\Zv}^T (\Tiv - \Gv) \end{align*} $$
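
Here is a minimal numpy sketch of these matrix expressions for a single hidden layer of tanh units. The names are illustrative: X1 and Z1 stand for $\tilde{\Xv}$ and $\tilde{\Zv}$, and W[1:, :] plays the role of $\hat{\Wv}$, the output weights without the bias row.

import numpy as np

def ll_and_gradients(X, Ti, V, W):
    # V: (D+1) x M hidden-layer weights, W: (M+1) x K output weights,
    # Ti: N x K indicator variables for the target classes.
    X1 = np.insert(X, 0, 1, axis=1)                  # X tilde
    Z = np.tanh(X1 @ V)                              # hidden unit outputs
    Z1 = np.insert(Z, 0, 1, axis=1)                  # Z tilde
    Y = Z1 @ W
    expY = np.exp(Y - Y.max(axis=1, keepdims=True))
    G = expY / expY.sum(axis=1, keepdims=True)       # softmax outputs
    LL = np.sum(Ti * np.log(G))
    gradW = Z1.T @ (Ti - G)
    gradV = X1.T @ (((Ti - G) @ W[1:, :].T) * (1 - Z**2))
    return LL, gradV, gradW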

Two-Dimensional Data

In [1]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
In [2]:
n = 500
x1 = np.linspace(5,20,n) + np.random.uniform(-2,2,n)
y1 = ((20-12.5)**2-(x1-12.5)**2) / (20-12.5)**2 * 10 + 14 + np.random.uniform(-2,2,n)
x2 = np.linspace(10,25,n) + np.random.uniform(-2,2,n)
y2 = ((x2-17.5)**2) / (25-17.5)**2 * 10 + 5.5 + np.random.uniform(-2,2,n)
angles = np.linspace(0,2*np.pi,n)
x3 = np.cos(angles) * 15 + 15 + np.random.uniform(-2,2,n)
y3 = np.sin(angles) * 15 + 15 + np.random.uniform(-2,2,n)
X =  np.vstack((np.hstack((x1,x2,x3)), np.hstack((y1,y2,y3)))).T
T = np.repeat(range(1,4),n).reshape((-1,1))
colors = ['blue','red','green']
plt.figure(figsize=(6,6))
for c in range(1,4):
    mask = (T == c).flatten()
    plt.plot(X[mask,0],X[mask,1],'o',markersize=6, alpha=0.5, color=colors[c-1])

Let's try to classify this data with a neural network of 5 hidden units trained by nonlinear logistic regression. In Python it is easy to create a new class for using a neural network as a classifier: make a subclass NeuralNetworkClassifier of NeuralNetwork and make the required changes. The changes will be in the objectiveF and gradF functions local to the train method, and in the use method.

In [3]:
!wget http://www.cs.colostate.edu/~anderson/cs445/notebooks/nn2.tar
!tar xvf nn2.tar
--2019-03-25 07:50:29--  http://www.cs.colostate.edu/~anderson/cs445/notebooks/nn2.tar
Resolving www.cs.colostate.edu (www.cs.colostate.edu)... 129.82.45.114
Connecting to www.cs.colostate.edu (www.cs.colostate.edu)|129.82.45.114|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 40960 (40K) [application/x-tar]
Saving to: ‘nn2.tar’

nn2.tar             100%[===================>]  40.00K  --.-KB/s    in 0.1s    

2019-03-25 07:50:29 (389 KB/s) - ‘nn2.tar’ saved [40960/40960]

neuralnetworks.py
mlutilities.py
In [4]:
cat neuralnetworks.py
import numpy as np
import mlutilities as ml
from copy import copy
import sys  # for sys.float_info.epsilon
import pdb

######################################################################
### class NeuralNetwork
######################################################################

class NeuralNetwork:

    def __init__(self, ni,nhs,no):        
        if nhs == 0 or nhs == [0] or nhs is None or nhs == [None]:
            nhs = None
        else:
            try:
                nihs = [ni] + list(nhs)
            except:
                nihs = [ni] + [nhs]
                nhs = [nhs]
        if nhs is not None:
            self.Vs = [(np.random.uniform(-1,1,size=(1+nihs[i],nihs[i+1])) / np.sqrt(nihs[i]))  for i in range(len(nihs)-1)]
            self.W = np.zeros((1+nhs[-1],no))
            # self.W = (np.random.uniform(-1,1,size=(1+nhs[-1],no)) / np.sqrt(nhs[-1]))
        else:
            self.Vs = None
            self.W = np.zeros((1+ni,no))
            # self.W = 0*np.random.uniform(-1,1,size=(1+ni,no)) / np.sqrt(ni)
        self.ni,self.nhs,self.no = ni,nhs,no
        self.Xmeans = None
        self.Xstds = None
        self.Tmeans = None
        self.Tstds = None
        self.trained = False
        self.reason = None
        self.errorTrace = None
        self.numberOfIterations = None

    def train(self,X,T,nIterations=100,verbose=False):

        if self.Xmeans is None:
            self.Xmeans = X.mean(axis=0)
            self.Xstds = X.std(axis=0)
            self.Xconstant = self.Xstds == 0
            self.XstdsFixed = copy(self.Xstds)
            self.XstdsFixed[self.Xconstant] = 1
        X = self._standardizeX(X)

        if T.ndim == 1:
            T = T.reshape((-1,1))

        if self.Tmeans is None:
            self.Tmeans = T.mean(axis=0)
            self.Tstds = T.std(axis=0)
            self.Tconstant = self.Tstds == 0
            self.TstdsFixed = copy(self.Tstds)
            self.TstdsFixed[self.Tconstant] = 1
        T = self._standardizeT(T)

        # Local functions used by scg()

        def objectiveF(w):
            self._unpack(w)
            Y,_ = self._forward_pass(X)
            return 0.5 * np.mean((Y - T)**2)

        def gradF(w):
            self._unpack(w)
            Y,Z = self._forward_pass(X)
            delta = (Y - T) / (X.shape[0] * T.shape[1])
            dVs,dW = self._backward_pass(delta,Z)
            return self._pack(dVs,dW)

        scgresult = ml.scg(self._pack(self.Vs,self.W), objectiveF, gradF,
                            nIterations = nIterations,
                            verbose=verbose,
                            ftracep=True)

        self._unpack(scgresult['x'])
        self.reason = scgresult['reason']
        self.errorTrace = np.sqrt(scgresult['ftrace']) # * self.Tstds # to unstandardize the MSEs
        self.numberOfIterations = len(self.errorTrace)
        self.trained = True
        return self

    def use(self,X,allOutputs=False):
        Xst = self._standardizeX(X)
        Y,Z = self._forward_pass(Xst)
        Y = self._unstandardizeT(Y)
        if Z is None:
            return (Y,None) if allOutputs else Y
        else:
            return (Y,Z[1:]) if allOutputs else Y

    def getNumberOfIterations(self):
        return self.numberOfIterations
    
    def getErrorTrace(self):
        return self.errorTrace
        
    def draw(self,inputNames = None, outputNames = None):
        ml.draw(self.Vs, self.W, inputNames, outputNames)

    def _forward_pass(self,X):
        if self.nhs is None:
            # no hidden units, just linear output layer
            Y = np.dot(X, self.W[1:,:]) + self.W[0:1,:]
            Zs = [X]
        else:
            Zprev = X
            Zs = [Zprev]
            for i in range(len(self.nhs)):
                V = self.Vs[i]
                Zprev = np.tanh(np.dot(Zprev,V[1:,:]) + V[0:1,:])
                Zs.append(Zprev)
            Y = np.dot(Zprev, self.W[1:,:]) + self.W[0:1,:]
        return Y, Zs

    def _backward_pass(self,delta,Z):
        if self.nhs is None:
            # no hidden units, just linear output layer
            dW = np.vstack((np.dot(np.ones((1,delta.shape[0])),delta),  np.dot( Z[0].T, delta)))
            dVs = None
        else:
            dW = np.vstack((np.dot(np.ones((1,delta.shape[0])),delta),  np.dot( Z[-1].T, delta)))
            dVs = []
            delta = (1-Z[-1]**2) * np.dot( delta, self.W[1:,:].T)
            for Zi in range(len(self.nhs),0,-1):
                Vi = Zi - 1 # because X is first element of Z
                dV = np.vstack(( np.dot(np.ones((1,delta.shape[0])), delta),
                                 np.dot( Z[Zi-1].T, delta)))
                dVs.insert(0,dV)
                delta = np.dot( delta, self.Vs[Vi][1:,:].T) * (1-Z[Zi-1]**2)
        return dVs,dW

    def _standardizeX(self,X):
        result = (X - self.Xmeans) / self.XstdsFixed
        result[:,self.Xconstant] = 0.0
        return result
    def _unstandardizeX(self,Xs):
        return self.Xstds * Xs + self.Xmeans
    def _standardizeT(self,T):
        result = (T - self.Tmeans) / self.TstdsFixed
        result[:,self.Tconstant] = 0.0
        return result
    def _unstandardizeT(self,Ts):
        return self.Tstds * Ts + self.Tmeans
   
    def _pack(self,Vs,W):
        if Vs is None:
            return np.array(W.flat)
        else:
            return np.hstack([V.flat for V in Vs] + [W.flat])

    def _unpack(self,w):
        if self.nhs is None:
            self.W[:] = w.reshape((self.ni+1, self.no))
        else:
            first = 0
            numInThisLayer = self.ni
            for i in range(len(self.Vs)):
                self.Vs[i][:] = w[first:first+(numInThisLayer+1)*self.nhs[i]].reshape((numInThisLayer+1,self.nhs[i]))
                first += (numInThisLayer+1) * self.nhs[i]
                numInThisLayer = self.nhs[i]
            self.W[:] = w[first:].reshape((numInThisLayer+1,self.no))

    def __repr__(self):
        str = 'NeuralNetwork({}, {}, {})'.format(self.ni,self.nhs,self.no)
        # str += '  Standardization parameters' + (' not' if self.Xmeans == None else '') + ' calculated.'
        if self.trained:
            str += '\n   Network was trained for {} iterations. Final error is {}.'.format(self.numberOfIterations,
                                                                                           self.errorTrace[-1])
        else:
            str += '  Network is not trained.'
        return str
            

######################################################################
### class NeuralNetworkClassifier
######################################################################

def makeIndicatorVars(T):
    """ Assumes argument is N x 1, N samples each being integer class label """
    return (T == np.unique(T)).astype(int)

class NeuralNetworkClassifier(NeuralNetwork):

    def __init__(self,ni,nhs,no):
        #super(NeuralNetworkClassifier,self).__init__(ni,nh,no)
        NeuralNetwork.__init__(self,ni,nhs,no)

    def _multinomialize(self,Y):
        # fix to avoid overflow
        mx = max(0,np.max(Y))
        expY = np.exp(Y-mx)
        # print('mx',mx)
        denom = np.sum(expY,axis=1).reshape((-1,1)) + sys.float_info.epsilon
        Y = expY / denom
        return Y

    def train(self, X, T, nIterations=100, verbose=False):
        if self.Xmeans is None:
            self.Xmeans = X.mean(axis=0)
            self.Xstds = X.std(axis=0)
            self.Xconstant = self.Xstds == 0
            self.XstdsFixed = copy(self.Xstds)
            self.XstdsFixed[self.Xconstant] = 1
        X = self._standardizeX(X)

        self.classes, counts = np.unique(T,return_counts=True)
        self.mostCommonClass = self.classes[np.argmax(counts)]  # to break ties

        if self.no != len(self.classes):
            raise ValueError(" In NeuralNetworkClassifier, the number of outputs must equal\n the number of classes in the training data. The given number of outputs\n is %d and number of classes is %d. Try changing the number of outputs in the\n call to NeuralNetworkClassifier()." % (self.no, len(self.classes)))
        T = makeIndicatorVars(T)

        # Local functions used by gradientDescent.scg()
        def objectiveF(w):
            self._unpack(w)
            Y,_ = self._forward_pass(X)
            Y = self._multinomialize(Y)
            Y[Y==0] = sys.float_info.epsilon
            return -np.mean(T * np.log(Y))

        def gradF(w):
            self._unpack(w)
            Y,Z = self._forward_pass(X)
            Y = self._multinomialize(Y)
            delta = (Y - T) / (X.shape[0] * (T.shape[1]))
            dVs,dW = self._backward_pass(delta,Z)
            return self._pack(dVs,dW)

        scgresult = ml.scg(self._pack(self.Vs,self.W), objectiveF, gradF,
                            nIterations = nIterations,
                            ftracep=True,
                            verbose=verbose)

        self._unpack(scgresult['x'])
        self.reason = scgresult['reason']
        self.errorTrace = scgresult['ftrace']
        self.numberOfIterations = len(self.errorTrace) - 1
        self.trained = True
        return self
    
    def use(self,X,allOutputs=False):
        Xst = self._standardizeX(X)
        Y,Z = self._forward_pass(Xst)
        Y = self._multinomialize(Y)
        classes = self.classes[np.argmax(Y,axis=1)].reshape((-1,1))
        # If any row has all equal values, then all classes have same probability.
        # Let's return the most common class in these cases
        classProbsEqual = (Y == Y[:,0:1]).all(axis=1)
        if sum(classProbsEqual) > 0:
            classes[classProbsEqual] = self.mostCommonClass
        if Z is None:
            return (classes,Y,None) if allOutputs else classes
        else:
            return (classes,Y,Z[1:]) if allOutputs else classes
In [5]:
import sys
sys.path = ['.'] + sys.path
In [6]:
import neuralnetworks as nn
import mpl_toolkits.mplot3d as plt3
from matplotlib import cm

## if you edit neuralnetworks.py, force ipython to reload it by doing this.
# from imp import reload
# reload(nn)

nHidden = 5
nnet = nn.NeuralNetworkClassifier(2,nHidden,3) # 2 inputs, 5 hidden units, 3 outputs, one per class
nnet.train(X,T,  nIterations=1000, verbose=True)

xs = np.linspace(0,30,40)
x,y = np.meshgrid(xs,xs)
Xtest = np.vstack((x.flat,y.flat)).T
Ytest = nnet.use(Xtest)
predTest,probs,_ = nnet.use(Xtest,allOutputs=True)  #discard hidden unit outputs

plt.figure(figsize=(10,10))

plt.subplot(2,2,1)
plt.plot(np.exp(-nnet.getErrorTrace()))
plt.xlabel("Epochs")
plt.ylabel("Likelihood")

plt.subplot(2,2,3)
nnet.draw()

colors = ['red','green','blue']
plt.subplot(2,2,2)

for c in range(1,4):
    mask = (T == c).flatten()
    plt.plot(X[mask,0],X[mask,1],'o',markersize=6, alpha=0.5, color=colors[c-1])

plt.subplot(2,2,4)
plt.contourf(Xtest[:,0].reshape((40,40)),Xtest[:,1].reshape((40,40)), predTest.reshape((40,40)),
             levels = [0.5,1.99,2.01,3.5], #    levels=(0.5,1.5,2.5,3.5),
             colors=('red','green','blue'));
SCG: Iteration 100 fValue 0.03775978800038138 Scale 1e-15
SCG: Iteration 200 fValue 0.020741938982621968 Scale 1e-15
SCG: Iteration 300 fValue 0.015552770614653178 Scale 1e-15
SCG: Iteration 400 fValue 0.013600204254684578 Scale 1e-15
SCG: Iteration 500 fValue 0.012868513291316002 Scale 1e-15
SCG: Iteration 600 fValue 0.011968508806622594 Scale 1e-15
SCG: Iteration 700 fValue 0.011549682470761087 Scale 1e-15
SCG: Iteration 800 fValue 0.011191824490301792 Scale 1e-15
SCG: Iteration 900 fValue 0.010798480012414048 Scale 1e-15
SCG: Iteration 1000 fValue 0.010341145640767479 Scale 1e-15
In [7]:
from matplotlib.colors import LightSource
In [8]:
fig = plt.figure(figsize=(10,30))
ls = LightSource(azdeg=30, altdeg=60)
white = np.ones((x.shape[0], x.shape[1], 3))
red = white * np.array([1,0,0])
green = white * np.array([0,1,0])
blue = white * np.array([0,0,1])
colors = [red, green, blue]
#colors = ['red','green','blue']

for c in range(3):
    ax = fig.add_subplot(3,1,c+1,projection='3d')
    ax.view_init(azim = 180+40,elev = 60)
    Z = probs[:, c].reshape(x.shape)
    #ax.plot_surface(x,y,Z,
    #rstride=1,cstride=1,linewidth=0,antialiased=False,
    #                color=colors[c],alpha=1)
    rgb = ls.shade_rgb(colors[c], Z, vert_exag=0.1)
    #rgb = ls.shade(Z, plt.cm.gray)
    ax.plot_surface(x,y,Z,
                    rstride=1,cstride=1,linewidth=0, antialiased=False,
                    shade=False, facecolors=rgb)
    ax.set_zlabel(r"$p(C="+str(c+1)+"|x)$")

How would you plot the outputs of the hidden units?
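
One way to do it, sketched here using the hidden unit outputs returned by use(X, allOutputs=True) and the grid variables x, y, Xtest, and nHidden defined above: plot each hidden unit's output over the grid as a surface, just as the class probabilities were plotted.

_, _, Zs = nnet.use(Xtest, allOutputs=True)    # Zs is a list with one array per hidden layer
Zh = Zs[0]                                     # outputs of the single hidden layer, N x nHidden
fig = plt.figure(figsize=(10, 12))
for h in range(nHidden):
    ax = fig.add_subplot(3, 2, h + 1, projection='3d')
    ax.plot_surface(x, y, Zh[:, h].reshape(x.shape), cmap=cm.coolwarm, linewidth=0)
    ax.set_title('Hidden unit {}'.format(h + 1))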

Let's repeat the experiment of classifying human activity data (accelerometer data), but now use our NeuralNetworkClassifier class to do nonlinear logistic regression. This time we will retrieve and load accelerometers.npy, a file containing a numpy array stored in numpy's binary format.

In [9]:
! curl -O http://www.cs.colostate.edu/~anderson/cs445/notebooks/acceldata.tar
! tar xvf acceldata.tar
data = np.load('accelerometers.npy')
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 7040k  100 7040k    0     0  2830k      0  0:00:02  0:00:02 --:--:-- 2829k
accelerometers.npy
In [10]:
data.shape
Out[10]:
(225006, 4)
In [11]:
data[0,:]
Out[11]:
array([ 1.        , -0.87313405, -0.08552787, -0.29504612])
In [38]:
X = data[:,1:]
T = data[:,0:1]
X.shape, T.shape
Out[38]:
((225006, 3), (225006, 1))
In [39]:
import mlutilities as ml # for ml.partition
In [40]:
X.shape
Out[40]:
(225006, 3)
In [41]:
train_fraction = 0.8
Xtrain, Ttrain, Xtest, Ttest = ml.partition(X, T, train_fraction, classification=True) #stratified partitioning (by class)
In [42]:
Xtrain.shape,Ttrain.shape,Xtest.shape,Ttest.shape
Out[42]:
((180006, 3), (180006, 1), (45000, 3), (45000, 1))
In [43]:
np.unique(Ttrain, return_counts=True)
Out[43]:
(array([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.]),
 array([18002, 18000, 18000, 18000, 18000, 18002, 18000, 18000, 18002,
        18000]))
In [44]:
%precision 5
values,counts = np.unique(Ttrain, return_counts=True)
counts / Ttrain.shape[0]
Out[44]:
array([0.10001, 0.1    , 0.1    , 0.1    , 0.1    , 0.10001, 0.1    ,
       0.1    , 0.10001, 0.1    ])
In [45]:
values,counts = np.unique(Ttest, return_counts=True)
counts / Ttest.shape[0]
Out[45]:
array([0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1])
In [46]:
n_classes = len(np.unique(T))
nnet = nn.NeuralNetworkClassifier(3, 10, n_classes) 
nnet.train(Xtrain,Ttrain, verbose=True)

plt.rcParams['figure.figsize'] = (6,6)
plt.plot(np.exp(-nnet.getErrorTrace()))
plt.xlabel('Iteration')
plt.ylabel('Data Likelihood');
SCG: Iteration 10 fValue 0.16188814244708796 Scale 3.90625e-09
SCG: Iteration 20 fValue 0.14704801907462223 Scale 3.814697265625e-12
SCG: Iteration 30 fValue 0.13908391343967322 Scale 3.725290298461914e-15
SCG: Iteration 40 fValue 0.13515204025420435 Scale 1e-15
SCG: Iteration 50 fValue 0.13309067684300996 Scale 1e-15
SCG: Iteration 60 fValue 0.13153904759497034 Scale 1e-15
SCG: Iteration 70 fValue 0.13063932634565126 Scale 1e-15
SCG: Iteration 80 fValue 0.13026014784075352 Scale 1e-15
SCG: Iteration 90 fValue 0.12998739696400266 Scale 1e-15
SCG: Iteration 100 fValue 0.12973129439100037 Scale 1e-15
In [47]:
Ptrain,Prtrain,_ = nnet.use(Xtrain,allOutputs=True)
Ptest,Prtest,_ = nnet.use(Xtest,allOutputs=True)
plt.figure(figsize=(10, 10))
plt.subplot(2,1,1)
plt.plot(np.hstack((Ttrain,Ptrain)), '.')
plt.legend(('Actual','Predicted'))
plt.subplot(2,1,2)
plt.plot(np.hstack((Ttest,Ptest)), '.')
plt.legend(('Actual','Predicted'));
In [48]:
cm = ml.confusionMatrix(Ttest,Ptest,np.unique(Ttest).astype(int))
print('1-Rest 2-Coloring 3-Legos 4-Wii Tennis 5-Wii Boxing 6-0.75m/s 7-1.25m/s 8-1.75m/s, 9-2.25m/s 10-Stairs')
       1    2    3    4    5    6    7    8    9   10
    ------------------------------------------------------------
 1 | 86.1 13.9  0    0    0    0    0    0    0    0  
 2 | 37.8 27.0  3.5  0.2 27.5  0    0    0.1  4.0  0  
 3 |  0.8  4.7 65.4  2.9  5.1  9.8  3.6  1.1  3.6  3.1
 4 |  0.7  5.8 14.7  9.0 20.1 16.6 10.3  2.7  5.9 14.1
 5 |  3.2  1.6 13.6  1.5 50.6  0.6  1.0  6.0 19.5  2.5
 6 |  0.2  0.2  0.6  0.1  0.5 68.4 28.5  0.4  0.3  0.7
 7 |  0.1  0.4  1.4  0.3  0.2 37.8 53.1  0.6  0.9  5.4
 8 |  0.2  1.3  8.1  2.6 12.4  1.9  3.1 15.4 50.4  4.7
 9 |  0.3  0.8  6.7  1.6 14.2  1.8  2.0 13.9 55.1  3.7
10 |  0.3  4.6  6.8  3.2 12.5  9.8 17.6 14.0 12.6 18.7
1-Rest 2-Coloring 3-Legos 4-Wii Tennis 5-Wii Boxing 6-0.75m/s 7-1.25m/s 8-1.75m/s, 9-2.25m/s 10-Stairs

Try a larger network, with 20 hidden units, trained for more iterations.

In [49]:
nnet = nn.NeuralNetworkClassifier(3, 20, 10)
nnet.train(Xtrain, Ttrain, nIterations=500, verbose=True)
print('Trained for', nnet.getNumberOfIterations(), 'iterations')
SCG: Iteration 50 fValue 0.12810128826542772 Scale 1e-15
SCG: Iteration 100 fValue 0.12123691708103801 Scale 1e-15
SCG: Iteration 150 fValue 0.11980220947167443 Scale 1e-15
SCG: Iteration 200 fValue 0.11916892273110097 Scale 1e-15
SCG: Iteration 250 fValue 0.11877816769617629 Scale 1e-15
SCG: Iteration 300 fValue 0.118428957495202 Scale 1e-15
SCG: Iteration 350 fValue 0.11826140885636018 Scale 1e-15
SCG: Iteration 400 fValue 0.11801238572879526 Scale 1e-15
SCG: Iteration 450 fValue 0.11784035361994037 Scale 1e-15
SCG: Iteration 500 fValue 0.11770224579302253 Scale 1e-15
Trained for 500 iterations
In [50]:
plt.plot(np.exp(-nnet.getErrorTrace()))
plt.xlabel('Iteration')
plt.ylabel('Data Likelihood');
In [51]:
classes = np.unique(Ttest).astype(int)
Ptrain, Prtrain, _ = nnet.use(Xtrain,allOutputs=True)
Ptest, Prtest, _ = nnet.use(Xtest,allOutputs=True)
print('Percent Correct: Training', 100*np.sum(Ptrain==Ttrain)/len(Ttrain),
      'Testing',100*np.sum(Ptest==Ttest)/len(Ttest))
print()
ml.confusionMatrix(Ttest,Ptest,classes)
print('1-Rest 2-Coloring 3-Legos 4-Wii Tennis 5-Wii Boxing 6-0.75m/s 7-1.25m/s 8-1.75m/s, 9-2.25m/s 10-Stairs')
plt.plot(np.exp(-nnet.getErrorTrace()));
Percent Correct: Training 55.955357043654104 Testing 51.824444444444445

       1    2    3    4    5    6    7    8    9   10
    ------------------------------------------------------------
 1 | 95.6  4.4  0    0    0    0    0    0    0    0  
 2 | 23.6 55.9  3.6  0.7 12.2  0    0    0.0  4.0  0  
 3 |  0.5  1.2 67.6  7.5  2.6 10.1  4.2  3.2  0.6  2.5
 4 |  0.8  3.0 17.2 18.5  9.3 17.3  8.1  5.9  4.0 15.8
 5 |  1.2  2.5 12.7  2.9 52.8  0.2  1.2  9.3 14.9  2.2
 6 |  0.0  0.2  0.4  0.3  0.8 78.6 16.7  0.5  0.2  2.3
 7 |  0.0  0.3  1.1  0.1  0.4 37.1 54.1  1.2  1.3  4.4
 8 |  0.0  1.9  5.8  3.2 11.6  1.6  3.3 28.6 39.6  4.3
 9 |  0.2  2.0  5.0  3.0 12.3  1.3  2.0 21.5 48.7  3.9
10 |  0.2  1.6  6.8  4.3 10.2  9.6 16.0 25.1  8.5 17.9
1-Rest 2-Coloring 3-Legos 4-Wii Tennis 5-Wii Boxing 6-0.75m/s 7-1.25m/s 8-1.75m/s, 9-2.25m/s 10-Stairs
In [52]:
nnet.draw()
In [53]:
import scipy.signal as sig

def cwt(eeg,Fs,freqs,width,channelNames=None,graphics=False):
    if freqs.min() == 0:
        print('cwt: Frequencies must be greater than 0.')
        return None,None
    nChannels,nSamples = eeg.shape
    if not channelNames and graphics:
        channelNames = ['Channel {:2d}'.format(i) for i in range(nChannels)]

    nFreqs = len(freqs)
    tfrep = np.zeros((nChannels, nFreqs, nSamples))
    tfrepPhase = np.zeros((nChannels, nFreqs, nSamples))

    for ch in range(nChannels):
        print('channel',ch,' freq ',end='')
        for freqi in range(nFreqs):
            print(freqs[freqi],' ',end='')
            mag,phase = energyvec(freqs[freqi],eeg[ch,:],Fs,width)
            tfrepPhase[ch,freqi,:] = phase
            tfrep[ch,freqi,:] = mag
        print()

    return tfrep, tfrepPhase

def morletLength(Fs,f,width):
  ''' len = morletLength(Fs,f,width) '''
  dt = 1.0/Fs
  sf = f/width
  st = 1.0/(2*np.pi*sf)
  return int((3.5*st - -3.5*st)/dt)

def energyvec(f,s,Fs,width):
  '''
  function [y,phase] = energyvec(f,s,Fs,width)
  function y = energyvec(f,s,Fs,width)

  Return a vector containing the energy as a
  function of time for frequency f. The energy
  is calculated using Morlet's wavelets.
  s : signal
  Fs: sampling frequency
  width : width of Morlet wavelet (>= 5 suggested).
  '''

  dt = 1.0/Fs
  sf = f/float(width)
  st = 1.0/(2*np.pi*sf)

  t = np.arange(-3.5*st,3.5*st,step=dt)
  m = morlet(f,t,width)
  # yconv = np.convolve(s,m,mode="same")
  yconv = sig.fftconvolve(s,m,mode='same')

  lengthMorlet = len(m)
  firsthalf = int(lengthMorlet/2.0 + 0.5)
  secondhalf = lengthMorlet - firsthalf

  padtotal = len(s) - len(yconv)
  padfront = int(padtotal/2.0 + 0.5)
  padback = padtotal - padfront
  yconvNoBoundary = yconv
  y = np.abs(yconvNoBoundary)**2
  phase = np.angle(yconvNoBoundary,deg=True)
  return y,phase

######################################################################
      
def morlet(f,t,width):
    '''
    function y = morlet(f,t,width)
    Morlet's wavelet for frequency f and time t.
    The wavelet will be normalized so the total energy is 1.
    width defines the width of the wavelet.
    A value >= 5 is suggested.

    Ref: Tallon-Baudry et al., J. Neurosci. 15, 722-734 (1997), page 724

    Ole Jensen, August 1998
    '''
    sf = f/float(width)
    st = 1.0/(2*np.pi*sf)
    A = 1.0/np.sqrt(st*np.sqrt(2*np.pi))
    y = A*np.exp(-t**2/(2*st**2)) * np.exp(1j*2*np.pi*f*t)
    return y
In [54]:
import time
width = 75 
maxFreq = 20
freqs = np.arange(0.5,maxFreq,0.5) # frequencies from 0.5 Hz up to maxFreq in 0.5 Hz steps
start = time.time()
tfrep,tfrepPhase = cwt(data[:,1:].T, 75, freqs, width)
print('CWT time: {} seconds'.format(time.time() - start))
channel 0  freq 0.5  1.0  1.5  2.0  2.5  3.0  3.5  4.0  4.5  5.0  5.5  6.0  6.5  7.0  7.5  8.0  8.5  9.0  9.5  10.0  10.5  11.0  11.5  12.0  12.5  13.0  13.5  14.0  14.5  15.0  15.5  16.0  16.5  17.0  17.5  18.0  18.5  19.0  19.5  
channel 1  freq 0.5  1.0  1.5  2.0  2.5  3.0  3.5  4.0  4.5  5.0  5.5  6.0  6.5  7.0  7.5  8.0  8.5  9.0  9.5  10.0  10.5  11.0  11.5  12.0  12.5  13.0  13.5  14.0  14.5  15.0  15.5  16.0  16.5  17.0  17.5  18.0  18.5  19.0  19.5  
channel 2  freq 0.5  1.0  1.5  2.0  2.5  3.0  3.5  4.0  4.5  5.0  5.5  6.0  6.5  7.0  7.5  8.0  8.5  9.0  9.5  10.0  10.5  11.0  11.5  12.0  12.5  13.0  13.5  14.0  14.5  15.0  15.5  16.0  16.5  17.0  17.5  18.0  18.5  19.0  19.5  
CWT time: 4.35584020614624 seconds
In [55]:
plt.figure(figsize=(15,15))
plt.subplot(5,1,1)
plt.plot(data[:,1:])
plt.axis('tight')

plt.subplot(5,1,2)
plt.plot(data[:,0])
plt.text(5000,8,'1-Rest, 2-Coloring, 3-Legos, 4-Wii Tennis, 5-Boxing, 6-0.75, 7-1.25 m/s, 8-1.75, 9-2.25 m/s, 10-stairs')
plt.axis('tight')

nSensors = data.shape[1] - 1
for i in range(nSensors):
    plt.subplot(5,1,i+3)
    plt.imshow(np.log(tfrep[i,:,:]), 
               interpolation='nearest',origin='lower',
               cmap=plt.cm.jet) #plt.cm.Reds)
    plt.xlabel('Seconds')
    plt.ylabel('Frequency in ' + ('$x$','$y$','$z$')[i])
    tickstep = round(len(freqs) / 5)
    plt.yticks(np.arange(len(freqs))[::tickstep],
                   [str(i) for i in freqs[::tickstep]])
    plt.axis('auto')
    plt.axis('tight')
In [56]:
tfrep.shape
Out[56]:
(3, 39, 225006)
In [57]:
X = tfrep.reshape((3*39,-1)).T
X.shape, T.shape, len(np.unique(T))
Out[57]:
((225006, 117), (225006, 1), 10)
In [58]:
Xtrain, Ttrain, Xtest, Ttest = ml.partition(X, T, 0.8, classification=True)
In [59]:
print(Xtrain.shape)
nnet = nn.NeuralNetworkClassifier(X.shape[1], 20, 10)  #10 classes 
nnet.train(Xtrain,Ttrain, nIterations = 500, verbose=True)
(180006, 117)
SCG: Iteration 50 fValue 0.02536230729996015 Scale 1e-15
SCG: Iteration 100 fValue 0.007619789855092344 Scale 1e-15
SCG: Iteration 150 fValue 0.001294782770381377 Scale 1e-15
SCG: Iteration 200 fValue 0.00044073818686692193 Scale 1e-15
SCG: Iteration 250 fValue 0.0002525632424925583 Scale 1e-15
SCG: Iteration 300 fValue 0.00013891321767210582 Scale 1e-15
SCG: Iteration 350 fValue 7.487201391883371e-05 Scale 1e-15
SCG: Iteration 400 fValue 4.98034449410777e-05 Scale 8.589934592e-06
SCG: Iteration 450 fValue 3.247916940876592e-05 Scale 4e-15
SCG: Iteration 500 fValue 2.9229093159987138e-05 Scale 4611.686018427388
Out[59]:
NeuralNetwork(117, [20], 10)
   Network was trained for 500 iterations. Final error is 2.9229093159987138e-05.
In [60]:
Ptrain, Prtrain, _ = nnet.use(Xtrain,allOutputs=True)
Ptest, Prtest, _ = nnet.use(Xtest,allOutputs=True)
print('Percent Correct: Training', 100*np.sum(Ptrain==Ttrain)/len(Ttrain),
      'Testing', 100*np.sum(Ptest==Ttest)/len(Ttest))
print()
classes = np.unique(Ttest).astype(int)
ml.confusionMatrix(Ttest, Ptest, classes)
print('1-Rest 2-Coloring 3-Legos 4-Wii Tennis 5-Wii Boxing 6-0.75m/s 7-1.25m/s 8-1.75m/s, 9-2.25m/s 10-Stairs')
plt.plot(np.exp(-nnet.getErrorTrace()));
Percent Correct: Training 100.0 Testing 35.446666666666665

       1    2    3    4    5    6    7    8    9   10
    ------------------------------------------------------------
 1 | 35.7 64.3  0    0    0    0    0    0    0    0  
 2 |  0   10.5  2.6  0    0   63.3  0    0    0   23.6
 3 | 16.0  0    3.1 18.6  0    0   25.8  0    0   36.5
 4 |  0    0   67.7 25.6  6.7  0    0    0    0    0  
 5 |  0.7  0    1.0  0   67.6  6.4  0    1.4 15.8  7.0
 6 |  0    0    0    0    0   78.0 15.8  0    0    6.2
 7 |  0    0    3.0  0    0.4 19.9 65.4  1.4  0    9.9
 8 |  0    0   10.0  2.2  6.6  0    0   41.3 40.0  0  
 9 |  0    0    0   31.8  0    1.5  0   51.6 14.6  0.5
10 |  3.0  0    0    0    5.7 12.6 40.8 23.9  1.4 12.7
1-Rest 2-Coloring 3-Legos 4-Wii Tennis 5-Wii Boxing 6-0.75m/s 7-1.25m/s 8-1.75m/s, 9-2.25m/s 10-Stairs
In [61]:
plt.figure(figsize=(15,15))
nnet.draw()