SYDE 556/750: Simulating Neurobiological Systems

Lecture 10: Learning

Learning

  • What do we mean by learning?

    • When we use an integrator to keep track of location, is that learning?
    • What about the learning used to complete a pattern in the Raven's Progressive Matrices task?
    • Neither of these requires any connection weights to change in the model
    • But both allow future performance to be affected by past performance
    • I suggest the term 'adaptation' to capture all such future-affected-by-past phenomena
  • So, we'll stick with a simple definition of learning

    • Changing connection weights between groups of neurons
  • Why might we want to change connection weights?
  • This is what traditional neural network approaches do
    • Change connection weights until the network performs the desired task
    • Once it's doing the task, stop changing the weights
  • But we have a method for just solving for the optimal connection weights
    • So why bother learning?

Why learning might be useful

  • We might not know the function at the beginning of the task
    • Example: a creature explores its environment and learns that eating red objects is bad, but eating green objects is good
      • what are the inputs and outputs here?
  • The desired function might change
    • Example: an ensemble whose input is a desired hand position, but the output is the muscle tension (or joint angles) needed to get there
      • why would this change?
  • The optimal weights we solve for might not be optimal
    • How could they not be optimal?
    • What assumptions are we making?

The simplest approach

  • What's the easiest way to deal with this, given what we know?
  • If we need new decoders
    • Let's solve for them while the model's running
    • Gather data to build up our $\Gamma$ and $\Upsilon$ matrices
  • Example: eating red but not green objects
    • Decoder from state to $Q$ value (utility of action) for eating
    • State is some high-dimensional vector that includes the colour of what we're looking at
      • And probably some other things, like whether it's small enough to be eaten
    • Initially doesn't use colour to get output
    • But we might experience a few bad outcomes after red, and good after green
    • These become new $x$ samples, with corresponding $f(x)$ outputs
    • Gather a few, recompute decoder
      • Could even do this after every timestep
  • Example: converting hand position to muscle commands
    • Send random signals to muscles
    • Observe hand position
    • Use that to train decoders
  • Example: going from optimal to even more optimal
    • As the model runs, we gather $x$ values
    • Recompute decoder for those $x$ values

What's wrong with this approach

  • Feels like cheating
  • Why?
  • Two kinds of problems:
    • Not biologically realistic
      • How are neurons supposed to do all this?
      • store data
      • solve decoders
      • timing
    • Computationally expensive
      • Even if we're not worried about realism

Traditional neural networks

  • Traditionally, learning is the main method of constructing a model network
  • Usually incremental learning (gradient descent)
    • As you get examples, shift the connection weights slightly based on that example
    • Don't have to consider all the data when making an update
  • Example: Perceptron learning (1957)
    • $\Delta w_i = \alpha(y_d - y)x_i$

  • Problems with perceptron
    • Can't do all possible functions
    • Effectively just linear functions of $x$ (with a threshold; i.e. a linear classifier)
    • Is that a problem (X)OR not?
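
A minimal sketch of the perceptron rule on two Boolean tasks, assuming a threshold unit with a bias term (the bias is treated as a weight on a constant input of 1, which the rule above does not show explicitly). AND is linearly separable, so the rule finds a solution; XOR is not, so no setting of the weights can classify all four cases.

```python
def predict(w, b, x):
    # Threshold unit: output 1 if the weighted sum exceeds 0
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

def train_perceptron(samples, epochs=50, alpha=0.5):
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for x, y_d in samples:
            y = predict(w, b, x)
            # Perceptron rule: delta w_i = alpha * (y_d - y) * x_i
            for i in range(2):
                w[i] += alpha * (y_d - y) * x[i]
            b += alpha * (y_d - y)  # bias: weight on a constant input of 1
    return w, b

def accuracy(w, b, samples):
    hits = sum(predict(w, b, x) == y_d for x, y_d in samples)
    return hits / len(samples)

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

w_and, b_and = train_perceptron(AND)
w_xor, b_xor = train_perceptron(XOR)
acc_and = accuracy(w_and, b_and, AND)  # reaches 1.0
acc_xor = accuracy(w_xor, b_xor, XOR)  # stays below 1.0
```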

Backprop and the NEF

  • How are nonlinear functions included?
    • Multiple layers

  • But now a new rule is needed
    • Standard answer: backprop
    • Same as perceptron for first (output) layer
    • Backprop adds: Estimate correct "hidden layer" input, and repeat
  • What would this be in NEF terms?
  • Remember that we're already fine with linear decoding
    • Encoders (and $\alpha$ and $J^{bias}$) are input layer of weights, decoders are output layer
    • Note that in the NEF, we combine many of these layers together
  • We can just use the standard perceptron rule for decoders
    • As long as there are lots of neurons, and we've initialized them well with the desired intercepts, maximum rates, and encoders, we should be able to decode lots of functions
    • So, what might backprop add to that?
      • Think about encoders
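
To make the "fixed input layer, learned output layer" picture concrete, here is a rough sketch that applies the delta rule to decoders only. The assumptions are loud ones: rectified-linear units stand in for neurons, the gains, biases, and encoders are drawn at random and never change, and the learning rate `kappa` is hand-picked for this toy problem. The target $x^2$ is nonlinear in $x$ but linear in the activities, which is why a perceptron-style rule is enough.

```python
import math
import random

random.seed(0)
N = 50
enc = [random.choice([-1.0, 1.0]) for _ in range(N)]   # fixed encoders
gain = [random.uniform(0.5, 2.0) for _ in range(N)]    # fixed gains (alpha)
bias = [random.uniform(-1.0, 1.0) for _ in range(N)]   # fixed bias currents

def activities(x):
    # The fixed "input layer": a_i(x) for a scalar x
    return [max(0.0, g * e * x + b) for g, e, b in zip(gain, enc, bias)]

def decode(d, a):
    return sum(di * ai for di, ai in zip(d, a))

def target(x):
    return x * x

def rmse(d):
    xs = [i / 50.0 - 1.0 for i in range(101)]
    errs = [(target(x) - decode(d, activities(x))) ** 2 for x in xs]
    return math.sqrt(sum(errs) / len(errs))

d = [0.0] * N   # decoders start at zero
kappa = 0.002   # learning rate, chosen by hand for this sketch
rmse_before = rmse(d)
for _ in range(5000):
    x = random.uniform(-1.0, 1.0)
    a = activities(x)
    err = target(x) - decode(d, a)                      # y_d - y
    d = [di + kappa * err * ai for di, ai in zip(d, a)] # delta rule on decoders
rmse_after = rmse(d)
```

Nothing here touches the encoders, which is the gap backprop-style methods would fill.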

Biologically realistic perceptron learning

  • (MacNeil & Eliasmith, 2011) derive a simple, plausible learning rule starting with a delta rule
  • $E = 1/2 \int (x-\hat{x})^2 dx$
  • $\partial E/\partial d_i = -(x-\hat{x})a_i$ (as usual for finding decoders)
  • So, to move down the gradient:
    • $\Delta d_i = \kappa (x - \hat{x})a_i$ (NEF notation)
    • $\Delta d_i = \kappa (y_d - y)a_i$ (the standard perceptron/delta rule)
  • How do we make it realistic?
  • Decoders don't exist in the brain
    • Need weights
  • The NEF tells us:
    • $\omega_{ij} = \alpha_j d_i \cdot e_j$
    • $\Delta \omega_{ij} = \alpha_j \kappa (y_d - y)a_i \cdot e_j$
  • Let's write $(y_d - y)$ as $E$ (for error)
    • $\Delta \omega_{ij} = \alpha_j \kappa a_i E \cdot e_j$
    • $\Delta \omega_{ij} = \kappa a_i (\alpha_j E \cdot e_j)$
  • What's $\alpha_j E \cdot e_j$?
    • That's the current that this neuron would get if it had $E$ as an input
    • But we don't want this current to drive the neuron
    • Rather, we want it to change the weight
    • It's a modulatory input
  • This is the "Prescribed Error Sensitivity" (PES) rule
    • Any model in the NEF could use this instead of computing decoders
    • Requires some other neural group computing the error $E$
    • Used in Spaun for Q-value learning (reinforcement task)
    • Can even be used to learn circular convolution
  • Is this realistic?
    • Local information only
    • Need an error signal
      • Does anything like this happen in the brain?
      • Yes
        • Retinal slip error is computed in oculomotor system
        • Dopamine seems to act as prediction error
    • Weight changes proportional to pre-synaptic activity and post-synaptic activity (Hebbian rule)
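
As a quick numerical check of the derivation, the sketch below compares two routes to the new weights: updating the decoders with the delta rule and rebuilding $\omega_{ij}$, versus applying the weight-space update directly. It assumes a one-dimensional represented value (so the dot product with $e_j$ is a plain multiplication) and uses arbitrary random values for activities, decoders, gains, and encoders.

```python
import random

random.seed(0)
n_pre, n_post = 4, 3
kappa = 0.1
E = 0.3  # scalar error (y_d - y); the represented value is 1-D here

a = [random.uniform(0.0, 1.0) for _ in range(n_pre)]       # presynaptic activities a_i
d = [random.uniform(-1.0, 1.0) for _ in range(n_pre)]      # decoders d_i
alpha = [random.uniform(0.5, 2.0) for _ in range(n_post)]  # postsynaptic gains alpha_j
e = [random.choice([-1.0, 1.0]) for _ in range(n_post)]    # postsynaptic encoders e_j

def weights(dec):
    # omega_ij = alpha_j * d_i * e_j
    return [[alpha[j] * dec[i] * e[j] for j in range(n_post)]
            for i in range(n_pre)]

# Route 1: delta rule on the decoders, then rebuild the weights
d_new = [d[i] + kappa * E * a[i] for i in range(n_pre)]
w_from_decoders = weights(d_new)

# Route 2: update the weights directly,
# delta omega_ij = kappa * a_i * (alpha_j * E * e_j)
w_old = weights(d)
w_from_pes = [[w_old[i][j] + kappa * a[i] * (alpha[j] * E * e[j])
               for j in range(n_post)]
              for i in range(n_pre)]

max_diff = max(abs(w_from_decoders[i][j] - w_from_pes[i][j])
               for i in range(n_pre) for j in range(n_post))
```

The two routes agree up to floating-point rounding, so the weight rule is the same learning step expressed without explicit decoders.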
In [25]:
#From the learning examples in Nengo - a Communication Channel
%pylab inline
import nengo
from nengo.processes import WhiteSignal

model = nengo.Network('Learn a Communication Channel')
with model:
    stim = nengo.Node(output=WhiteSignal(10, high=5, rms=0.5), size_out=2)
    
    pre = nengo.Ensemble(60, dimensions=2)
    post = nengo.Ensemble(60, dimensions=2)
    
    nengo.Connection(stim, pre)
    conn = nengo.Connection(pre, post, function=lambda x: np.random.random(2))
    
    inp_p = nengo.Probe(stim)
    pre_p = nengo.Probe(pre, synapse=0.01)
    post_p = nengo.Probe(post, synapse=0.01)

sim = nengo.Simulator(model)
sim.run(10.0)    
In [26]:
from nengo_gui.ipython import IPythonViz
IPythonViz(model,'configs/pre_learn.py.cfg')
In [12]:
t=sim.trange()

figure(figsize=(12, 8))
subplot(2, 1, 1)
plot(t, sim.data[inp_p].T[0], c='k', label='Input')
plot(t, sim.data[pre_p].T[0], c='b', label='Pre')
plot(t, sim.data[post_p].T[0], c='r', label='Post')
ylabel("Dimension 1")
legend(loc='best')
title('Random function computation')
    
subplot(2, 1, 2)
plot(t, sim.data[inp_p].T[1], c='k', label='Input')
plot(t, sim.data[pre_p].T[1], c='b', label='Pre')
plot(t, sim.data[post_p].T[1], c='r', label='Post')
ylabel("Dimension 2")
legend(loc='best');
In [27]:
#Now learn
with model:
    error = nengo.Ensemble(60, dimensions=2)
    error_p = nengo.Probe(error, synapse=0.03)
    
    # Error = actual - target = post - pre
    # (Nengo's PES subtracts this error, so it is the negative of E = y_d - y above)
    nengo.Connection(post, error)
    nengo.Connection(pre, error, transform=-1)
    
    # Add the learning rule to the connection
    conn.learning_rule_type = nengo.PES()
    
    # Connect the error into the learning rule
    learn_conn = nengo.Connection(error, conn.learning_rule)

sim = nengo.Simulator(model)
sim.run(10.0)     
In [28]:
from nengo_gui.ipython import IPythonViz
IPythonViz(model,'configs/simple_learn.py.cfg')
In [6]:
t=sim.trange()

figure(figsize=(12, 8))
subplot(3, 1, 1)
plot(t, sim.data[inp_p].T[0], c='k', label='Input')
plot(t, sim.data[pre_p].T[0], c='b', label='Pre')
plot(t, sim.data[post_p].T[0], c='r', label='Post')
ylabel("Dimension 1")
legend(loc='best')
title('Learn a communication channel')
    
subplot(3, 1, 2)
plot(t, sim.data[inp_p].T[1], c='k', label='Input')
plot(t, sim.data[pre_p].T[1], c='b', label='Pre')
plot(t, sim.data[post_p].T[1], c='r', label='Post')
ylabel("Dimension 2")
legend(loc='best');

subplot(3, 1, 3)
plot(sim.trange(), sim.data[error_p], c='b')
ylim(-1, 1)
legend(("Error[0]", "Error[1]"), loc='best');
title('Error')
In [33]:
#Turning learning on and off to test generalization
def inhibit(t):
    return 2.0 if t > 10.0 else 0.0

with model:
    inhib = nengo.Node(inhibit)
    inhib_conn = nengo.Connection(inhib, error.neurons, transform=[[-1]] * error.n_neurons)
    
sim = nengo.Simulator(model)
sim.run(16.0)
In [30]:
from nengo_gui.ipython import IPythonViz
IPythonViz(model,'configs/control_learn.py.cfg')
In [9]:
t=sim.trange()

figure(figsize=(12, 8))
subplot(3, 1, 1)
plot(t, sim.data[inp_p].T[0], c='k', label='Input')
plot(t, sim.data[pre_p].T[0], c='b', label='Pre')
plot(t, sim.data[post_p].T[0], c='r', label='Post')
ylabel("Dimension 1")
legend(loc='best')
title('Learn a communication channel')
    
subplot(3, 1, 2)
plot(t, sim.data[inp_p].T[1], c='k', label='Input')
plot(t, sim.data[pre_p].T[1], c='b', label='Pre')
plot(t, sim.data[post_p].T[1], c='r', label='Post')
ylabel("Dimension 2")
legend(loc='best');

subplot(3, 1, 3)
plot(sim.trange(), sim.data[error_p], c='b')
ylim(-1, 1)
legend(("Error[0]", "Error[1]"), loc='best');
title('Error')
In [34]:
#Compute a nonlinear function
#model.connections.remove(err_fcn) #uncomment to try other fcns
#del err_fcn
model.connections.remove(inhib_conn)
del inhib_conn
model.nodes.remove(inhib)
model.connections.remove(learn_conn)
del learn_conn

def nonlinear(x):
    return x[0]*x[0], x[1]*x[1]

with model:
    err_fcn = nengo.Connection(pre, error, function=nonlinear, transform=-1)
    
    conn.learning_rule_type = nengo.PES(learning_rate=1e-4)
    # Connect the error into the learning rule
    learn_conn = nengo.Connection(error, conn.learning_rule)    
    
sim = nengo.Simulator(model)
sim.run(26.0)
In [35]:
from nengo_gui.ipython import IPythonViz
IPythonViz(model,'configs/square_learn.py.cfg')
In [28]:
t=sim.trange()

figure(figsize=(12, 8))
subplot(3, 1, 1)
plot(t, sim.data[inp_p].T[0], c='k', label='Input')
plot(t, sim.data[pre_p].T[0], c='b', label='Pre')
plot(t, sim.data[post_p].T[0], c='r', label='Post')
ylabel("Dimension 1")
legend(loc='best')
title('Learn a nonlinear function')
    
subplot(3, 1, 2)
plot(t, sim.data[inp_p].T[1], c='k', label='Input')
plot(t, sim.data[pre_p].T[1], c='b', label='Pre')
plot(t, sim.data[post_p].T[1], c='r', label='Post')
ylabel("Dimension 2")
legend(loc='best');

subplot(3, 1, 3)
plot(sim.trange(), sim.data[error_p], c='b')
ylim(-1, 1)
legend(("Error[0]", "Error[1]"), loc='best');
title('Error')
  • This rule can be used to learn any nonlinear vector function
  • It does as well as, or better than, the NEF decoder optimization
  • It's a 'spike-based' rule... meaning it works in a spiking network
  • It has been used for 'constant supervision' as well as 'reinforcement learning' (occasional supervision) tasks (Spaun uses it for the RL task)
  • It moves the focus of learning research from weight changes or 'learning rules' to error signals
  • Backprop is one way of propagating error signals (unfortunately not bio-plausible)
  • Pretty much ignores encoders (which should perhaps capture all the incoming information, so that any function over that information can be computed, though they can also be optimized for a given function)

Applications of PES

Classical conditioning

  • Classical or Pavlovian conditioning uses an unconditioned stimulus (US) (meat for a dog) that elicits an unconditioned response (UR) (salivating) to cause a conditioned response (CR) (salivating after learning) to be elicited by a conditioned stimulus (CS) (ringing a bell).

  • The best known model of this is the Rescorla-Wagner model that states:

$\Delta V_x = \alpha (\lambda - \sum_x V)$

where $V_x$ is the value of conditioned stimulus $x$, $\alpha$ is a learning rate and salience parameter, $\lambda$ is the max value (usually 1).

  • In the model below there is only 1 element in $\sum V$ (or you can assume there is little association between other stimuli and the US). The difference in brackets is like a reward prediction error.
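
For the single-stimulus case described above, the Rescorla-Wagner update can be iterated directly. This is a sketch with arbitrary values for $\alpha$ and the number of trials; with one CS the sum reduces to $V$ itself, so the value approaches $\lambda$ geometrically.

```python
def rescorla_wagner(n_trials, alpha=0.2, lam=1.0):
    # One CS paired with the US on every trial; V starts at 0
    V = 0.0
    history = []
    for _ in range(n_trials):
        prediction_error = lam - V   # the term in brackets
        V += alpha * prediction_error
        history.append(V)
    return history

history = rescorla_wagner(20)
# Closed form for comparison: V_n = lam * (1 - (1 - alpha)**n)
closed_form = 1.0 * (1.0 - (1.0 - 0.2) ** 20)
```

The prediction error shrinks on every trial, which is the sense in which it behaves like a reward prediction error.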

In this model:

  • There are three different US that are provided to the model, one after the other.
  • Each has a different hardwired UR.
  • There is also a CS provided (a different one for each US)
  • The model attempts to learn to trigger the correct CR in response to the CS.
  • After learning, the CR should start to respond before the corresponding UR.
In [12]:
import nengo
import numpy as np

D = 3
N = D*50

def us_stim(t):
    # cycle through the three US
    t = t % 3
    if 0.9 < t < 1: return [1, 0, 0]
    if 1.9 < t < 2: return [0, 1, 0]
    if 2.9 < t < 3: return [0, 0, 1]
    return [0, 0, 0]

def cs_stim(t):
    # cycle through the three CS
    t = t % 3
    if 0.7 < t < 1: return [0.7, 0, 0.5]
    if 1.7 < t < 2: return [0.6, 0.7, 0.8]
    if 2.7 < t < 3: return [0, 1, 0]
    return [0, 0, 0]

model = nengo.Network(label="Classical Conditioning")
with model:
    us_stim = nengo.Node(us_stim)
    cs_stim = nengo.Node(cs_stim)

    us = nengo.Ensemble(N, D)
    cs = nengo.Ensemble(N*2, D*2)

    nengo.Connection(us_stim, us[:D])
    nengo.Connection(cs_stim, cs[:D])
    nengo.Connection(cs[:D], cs[D:], synapse=0.2)

    ur = nengo.Ensemble(N, D)
    nengo.Connection(us, ur)
    
    cr = nengo.Ensemble(N, D)
    learn_conn = nengo.Connection(cs, cr, function=lambda x: [0]*D)
    learn_conn.learning_rule_type = nengo.PES(learning_rate=3e-4)

    error = nengo.Ensemble(N, D)
    nengo.Connection(error, learn_conn.learning_rule)
    nengo.Connection(ur, error, transform=-1)
    nengo.Connection(cr, error, transform=1, synapse=0.1)

    stop_learn = nengo.Node([0])
    nengo.Connection(stop_learn, error.neurons, transform=-10*np.ones((N, 1)))
In [13]:
from nengo_gui.ipython import IPythonViz
IPythonViz(model,'configs/learning2-conditioning.py.cfg')

Cortical Consolidation

  • There is evidence that when you first learn a skill, it takes a lot of effort and you tend to perform fairly slowly.
  • We would think of this as requiring a lot of intervention from the basal ganglia in selecting actions.
  • As you get better at the skill you become much faster, and BG is used less because cortex 'takes over' central aspects of that skill, consolidating it into cortico-cortical connections.
  • The next model shows a toy version of this kind of behaviour.

In this model:

  • there is a slow mapping from pre->wm->target (because of long synaptic time constants)
  • there is a fast, direct connection from pre->post
  • the fast connection is trained using the error signal from the slow system
  • the fast system learns to produce the correct output before the slow system
  • if you change the 'context' the fast system will learn the output
  • there is a more complete model that uses this kind of thing (but with BG) in Aubin et al., 2016
In [14]:
import nengo
import numpy as np

tau_slow = 0.2

model = nengo.Network("Cortical Consolidation")
with model:
    pre_value = nengo.Node(lambda t: np.sin(t))
    
    pre = nengo.Ensemble(100, 1)
    post = nengo.Ensemble(100, 1)
    target = nengo.Ensemble(100, 1)
    nengo.Connection(pre_value, pre)

    conn = nengo.Connection(pre, post, function=lambda x: np.random.random(),
                learning_rule_type=nengo.PES())
    
    wm = nengo.Ensemble(300, 2, radius=1.4)
    context = nengo.Node(1)
    nengo.Connection(context, wm[1])
    nengo.Connection(pre, wm[0], synapse=tau_slow)
    
    nengo.Connection(wm, target, synapse=tau_slow, 
                     function=lambda x: x[0]*x[1])
                     
    error = nengo.Ensemble(n_neurons=100, dimensions=1)
    nengo.Connection(post, error, synapse=tau_slow*2, transform=1) #Delay the fast connection so they line up
    nengo.Connection(target, error, transform=-1)
    
    nengo.Connection(error, conn.learning_rule)

    stop_learn = nengo.Node([0])
    nengo.Connection(stop_learn, error.neurons, transform=-10*np.ones((100,1)))
    
    both = nengo.Node(None, size_in=2) #For plotting
    nengo.Connection(post, both[0], synapse=None)
    nengo.Connection(target, both[1], synapse=None)
In [15]:
from nengo_gui.ipython import IPythonViz
IPythonViz(model,'configs/learning3-consolidation.py.cfg')

Reinforcement Learning

  • As mentioned in the last lecture, RL is a useful way to think about action selection.
  • You have a set of actions and a set of states, and you figure out the value of each action in each state, letting you construct a big table $Q(s,a)$ which you can use to pick good actions.
  • RL figures out what those values are through trial and error.

$\Delta Q(s,a) = \alpha (R + \gamma Q_{predicted} - Q_{old})$ where $R$ is reward, $\alpha$ is a learning rate and $\gamma$ is a discount factor.

In the model:

  • the agent has three actions (go forward, turn left, and turn right)
  • its only senses are five range finders (radar)
  • initially it should always go forward
  • it gets reward proportional to its forward speed, but a large negative reward for hitting walls
  • the error signal is simply the difference between the computed utility and the instantaneous reward
  • $\Delta Q(s,a) = \alpha (R - Q_{current})$
  • this error will only be applied to whatever action is currently being chosen, which means it cannot learn to do actions that will lead to future rewards
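
Stripped of neurons, the simplified rule $\Delta Q(s,a) = \alpha (R - Q_{current})$ looks like the sketch below. The reward values and action names are made up for illustration, there is a single implicit state, and actions are picked uniformly at random rather than by a basal ganglia model. Only the chosen action's value is updated on each step.

```python
import random

random.seed(0)
# Hypothetical immediate rewards for each action in one fixed state
rewards = {"forward": 1.0, "left": 0.2, "right": -0.5}
Q = {action: 0.0 for action in rewards}
alpha = 0.1

for _ in range(500):
    action = random.choice(list(rewards))   # stand-in for action selection
    error = rewards[action] - Q[action]     # R - Q_current
    Q[action] += alpha * error              # update the chosen action only
```

Each $Q$ value settles on the immediate reward of its action. Since nothing in the update refers to the next state, rewards that arrive later cannot influence it.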
In [28]:
# requires CCMSuite https://github.com/tcstewar/ccmsuite/
import ccm.lib.grid
import ccm.lib.continuous
import ccm.ui.nengo

mymap="""
#########
#       #
#       #
#   ##  #
#   ##  #
#       #
#########

"""

class Cell(ccm.lib.grid.Cell):
    def color(self):
        return 'black' if self.wall else None
    def load(self, char):
        if char == '#':
            self.wall = True

world = ccm.lib.grid.World(Cell, map=mymap, directions=4)

body = ccm.lib.continuous.Body()
world.add(body, x=1, y=3, dir=2)

import nengo
import numpy as np    

def move(t, x):
    speed, rotation = x
    dt = 0.001
    max_speed = 20.0
    max_rotate = 10.0
    body.turn(rotation * dt * max_rotate)
    success = body.go_forward(speed * dt * max_speed)
    if not success: #Hit a wall
        return -1
    else:
        return speed

model = nengo.Network("Simple RL", seed=2)
with model:
    env = ccm.ui.nengo.GridNode(world, dt=0.005)
    
    #set up node to project movement commands to
    movement_node = nengo.Node(move, size_in=2, label='reward')
    movement = nengo.Ensemble(n_neurons=100, dimensions=2, radius=1.4)    
    nengo.Connection(movement, movement_node)

    def detect(t):
        #put 5 sensors spanning -45 to +45 degrees relative to facing direction
        angles = (np.linspace(-0.5, 0.5, 5) + body.dir ) % world.directions
        return [body.detect(d, max_distance=4)[0] for d in angles]
    stim_radar = nengo.Node(detect)

    #set up low fidelity sensors; noise might help exploration
    radar = nengo.Ensemble(n_neurons=50, dimensions=5, radius=4)
    nengo.Connection(stim_radar, radar)
    
    #set up BG to allow 3 actions (left/fwd/right)
    bg = nengo.networks.actionselection.BasalGanglia(3)
    thal = nengo.networks.actionselection.Thalamus(3)
    nengo.Connection(bg.output, thal.input)
    
    #start with a kind of random selection process, but like going fwd most
    def u_fwd(x):
        return 0.8
    def u_left(x):
        return 0.6
    def u_right(x):
        return 0.7

    conn_fwd = nengo.Connection(radar, bg.input[0], function=u_fwd, learning_rule_type=nengo.PES())
    conn_left = nengo.Connection(radar, bg.input[1], function=u_left, learning_rule_type=nengo.PES())
    conn_right = nengo.Connection(radar, bg.input[2], function=u_right, learning_rule_type=nengo.PES())
        
    nengo.Connection(thal.output[0], movement, transform=[[1],[0]])
    nengo.Connection(thal.output[1], movement, transform=[[0],[1]])
    nengo.Connection(thal.output[2], movement, transform=[[0],[-1]])
    
    errors = nengo.networks.EnsembleArray(n_neurons=50, n_ensembles=3)
    nengo.Connection(movement_node, errors.input, transform=-np.ones((3,1)))
    #inhibit learning for actions not currently chosen (recall BG is high for non-chosen actions)
    nengo.Connection(bg.output[0], errors.ensembles[0].neurons, transform=np.ones((50,1))*4)    
    nengo.Connection(bg.output[1], errors.ensembles[1].neurons, transform=np.ones((50,1))*4)    
    nengo.Connection(bg.output[2], errors.ensembles[2].neurons, transform=np.ones((50,1))*4)    
    nengo.Connection(bg.input, errors.input, transform=1)
    
    nengo.Connection(errors.ensembles[0], conn_fwd.learning_rule)
    nengo.Connection(errors.ensembles[1], conn_left.learning_rule)
    nengo.Connection(errors.ensembles[2], conn_right.learning_rule)
In [29]:
from nengo_gui.ipython import IPythonViz
IPythonViz(model,'configs/learning5-utility.py.cfg')

Better RL

  • To improve our RL it would be good to predict future rewards more accurately.
    • It would be good to learn the function $Q(s,a)$.
  • Let's assume that your policy is fixed, so future actions are fixed.
  • As well, future rewards are 90% as good as current rewards (i.e. they are discounted).
  • Consequently, we have:

$Q(s,t) = R(s,t) + 0.9 R(s+1, t+1) + 0.9^2 R(s+2, t+2) + ...$.

  • So also,

$Q(s+1,t+1) = R(s+1,t+1) + 0.9 R(s+2, t+2) + 0.9^2 R(s+3, t+3) + ...$.

$0.9 Q(s+1,t+1) = 0.9 R(s+1,t+1) + 0.9^2 R(s+2, t+2) + 0.9^3 R(s+3, t+3) + ...$.

  • Substituting this last equation into the first gives

$Q(s,t) = R(s,t) + 0.9 Q(s+1, t+1)$.

  • This suggests an error rule:

$Error(t) = Q(s-1) - (R(s-1) + 0.9 Q(s))$
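
A tabular sketch of this error rule, assuming a hypothetical five-state chain that is always traversed left to right (the fixed policy), with a reward of 1 only in the last state and a value of 0 after the chain ends. The learned values can be checked against the discounted sums from the derivation above.

```python
gamma = 0.9
alpha = 0.1
# Hypothetical chain: reward received in each state, visited in order
rewards = [0.0, 0.0, 0.0, 0.0, 1.0]
n_states = len(rewards)
Q = [0.0] * n_states

for sweep in range(2000):
    for s in range(1, n_states + 1):
        # Value of the state we just arrived in (0 once the chain has ended)
        q_next = Q[s] if s < n_states else 0.0
        # Error(t) = Q(s-1) - (R(s-1) + 0.9 Q(s))
        error = Q[s - 1] - (rewards[s - 1] + gamma * q_next)
        Q[s - 1] -= alpha * error

# Discounted return from each state: 0.9^k for a reward k steps ahead
expected = [gamma ** (n_states - 1 - s) for s in range(n_states)]
```

States far from the reward still end up with non-zero value, which is exactly what the simpler $R - Q$ rule could not provide.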

In this model:

  • the agent always moves randomly; it's not using what it learns to change its movement (it is just trying to anticipate future rewards)
  • the agent is given a reward whenever it is in the green square, and a punishment (negative reward) whenever it is in the red square
  • it learns to anticipate the reward/punishment as shown in the value graph
  • we convert the error rule into the continuous domain by using a long time constant for s-1 and a short time constant for s (assuming we switch states at each time step).
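
The time-constant trick in the last point can be illustrated with a first-order low-pass filter. This sketch uses a simple Euler discretization, which is an assumption and not necessarily how Nengo discretizes its synapses; the step input and the 50 ms window are arbitrary. The slowly filtered signal lags behind, so it behaves roughly like a copy of the value from a little while ago (the $s-1$ term), while the quickly filtered one tracks the present (the $s$ term).

```python
def lowpass(signal, tau, dt=0.001):
    # Euler-discretized first-order low-pass filter with time constant tau
    y = 0.0
    out = []
    for u in signal:
        y += (dt / tau) * (u - y)
        out.append(y)
    return out

step = [1.0] * 50                # 50 ms of a unit step input
fast = lowpass(step, tau=0.01)   # short time constant: close to the current value
slow = lowpass(step, tau=0.1)    # long time constant: still reflects the past

lag = fast[-1] - slow[-1]        # how far the slow copy trails after 50 ms
```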
In [26]:
# requires CCMSuite https://github.com/tcstewar/ccmsuite/
# install with: python setup.py develop
import ccm.lib.grid
import ccm.lib.continuous
import ccm.ui.nengo

mymap="""
#######
#     #
# # # #
# # # #
#G   R#
#######
"""

class Cell(ccm.lib.grid.Cell):
    def color(self):
        if self.wall:
            return 'black'
        elif self.reward > 0:
            return 'green'
        elif self.reward < 0:
            return 'red'
        return None
    def load(self, char):
        self.reward = 0
        if char == '#':
            self.wall = True
        if char == 'G':
            self.reward = 10
        elif char == 'R':
            self.reward = -10

world = ccm.lib.grid.World(Cell, map=mymap, directions=4)

body = ccm.lib.continuous.Body()
world.add(body, x=1, y=2, dir=2)

import nengo
import numpy as np 

tau=0.1

def move(t, x):
    speed, rotation = x
    dt = 0.001
    max_speed = 20.0
    max_rotate = 10.0
    body.turn(rotation * dt * max_rotate)
    body.go_forward(speed * dt * max_speed)
    
    if int(body.x) == 1:
        world.grid[4][4].wall = True
        world.grid[4][2].wall = False
    if int(body.x) == 4:
        world.grid[4][2].wall = True
        world.grid[4][4].wall = False

model = nengo.Network("Predict Value", seed=2)
with model:
    env = ccm.ui.nengo.GridNode(world, dt=0.005)

    movement = nengo.Node(move, size_in=2)
    
    def detect(t):
        angles = (np.linspace(-0.5, 0.5, 3) + body.dir) % world.directions
        return [body.detect(d, max_distance=4)[0] for d in angles]
    stim_radar = nengo.Node(detect)
    radar = nengo.Ensemble(n_neurons=50, dimensions=3, radius=4, seed=2,
                noise=nengo.processes.WhiteSignal(10, 0.1, rms=1))
    nengo.Connection(stim_radar, radar)

    def braiten(x):
        turn = x[2] - x[0]
        spd = x[1] - 0.5
        return spd, turn
    nengo.Connection(radar, movement, function=braiten)  
    
    def position_func(t):
        return body.x / world.width * 2 - 1, 1 - body.y/world.height * 2, body.dir / world.directions
    position = nengo.Node(position_func)
    state = nengo.Ensemble(100, 3)
    nengo.Connection(position, state, synapse=None)
    
    reward = nengo.Node(lambda t: body.cell.reward)
        
    value = nengo.Ensemble(n_neurons=50, dimensions=1)

    learn_conn = nengo.Connection(state, value, function=lambda x: 0,
                                  learning_rule_type=nengo.PES(learning_rate=1e-4,
                                                               pre_tau=tau))
    nengo.Connection(reward, learn_conn.learning_rule, 
                     transform=-1, synapse=tau)
    nengo.Connection(value, learn_conn.learning_rule, 
                     transform=-0.9, synapse=0.01)
    nengo.Connection(value, learn_conn.learning_rule, 
                     transform=1, synapse=tau)
In [27]:
from nengo_gui.ipython import IPythonViz
IPythonViz(model,'configs/learning6-value.py.cfg')

Unsupervised learning

  • Hebbian learning
    • Neurons that fire together, wire together
    • $\Delta \omega_{ij} = \kappa a_i a_j$
    • Just that would be unstable
      • Why?
  • BCM rule (Bienenstock, Cooper, & Munro, 1982)
    • $\Delta \omega_{ij} = \kappa a_i a_j (a_j-\theta)$
    • $\theta$ is an activity threshold
      • If post-synaptic neuron is more active than this threshold, increase strength
      • Otherwise decrease it
    • Other than that, it's a standard Hebbian rule
    • Where would we get $\theta$?
      • Need to store something about the overall recent activity of neuron $j$ so it can be compared to its current activity
      • Just have $\theta$ be a pstc-filtered spiking of $a_j$
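
To see why plain Hebbian learning is unstable and how the BCM threshold helps, here is a one-synapse sketch. Assumptions: a constant presynaptic rate, a rectified-linear postsynaptic unit, and $\theta$ implemented as a low-pass filtered copy of $a_j$; the values of `kappa` and `tau` are arbitrary, and BCM can itself become unstable if the learning rate is large relative to how fast $\theta$ adapts.

```python
def simulate(rule, steps=1000, kappa=0.01, tau=10.0):
    w = 1.0        # single connection weight
    theta = 0.0    # sliding threshold (used by BCM only)
    a_pre = 1.0    # constant presynaptic activity a_i
    for _ in range(steps):
        a_post = max(0.0, w * a_pre)   # postsynaptic activity a_j
        if rule == "hebb":
            # delta w = kappa * a_i * a_j: positive feedback, w keeps growing
            w += kappa * a_pre * a_post
        else:
            # delta w = kappa * a_i * a_j * (a_j - theta)
            w += kappa * a_pre * a_post * (a_post - theta)
            # theta tracks recent postsynaptic activity
            theta += (a_post - theta) / tau
    return w

hebb_w = simulate("hebb")
bcm_w = simulate("bcm")
```

With the Hebbian rule a larger weight produces more postsynaptic activity, which produces a larger weight change, so the weight grows exponentially. With BCM the change shrinks as $\theta$ catches up with $a_j$, and the weight levels off.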
In [90]:
import nengo
import numpy as np

model = nengo.Network()
with model:
    sin = nengo.Node(lambda t: np.sin(t*4))
    
    pre = nengo.Ensemble(100, dimensions=1)
    post = nengo.Ensemble(100, dimensions=1)

    nengo.Connection(sin, pre)
    conn = nengo.Connection(pre, post, solver=nengo.solvers.LstsqL2(weights=True))

    pre_p = nengo.Probe(pre, synapse=0.01)
    post_p = nengo.Probe(post, synapse=0.01)

sim = nengo.Simulator(model)
sim.run(2.0)
In [91]:
plot(sim.trange(), sim.data[pre_p], label="Pre")
plot(sim.trange(), sim.data[post_p], label="Post")
ylabel("Decoded value")
legend(loc="best");
In [93]:
conn.learning_rule_type = nengo.BCM(learning_rate=5e-10)

with model:
    trans_p = nengo.Probe(conn, 'weights', synapse=0.01, sample_every=0.01)

sim = nengo.Simulator(model)
sim.run(20.0)
In [94]:
figure(figsize=(12, 8))
subplot(2, 1, 1)
plot(sim.trange(), sim.data[pre_p], label="Pre")
plot(sim.trange(), sim.data[post_p], label="Post")
ylabel("Decoded value")
ylim(-1.6, 1.6)
legend(loc="lower left")

subplot(2, 1, 2)
# Find weight row with max variance
neuron = np.argmax(np.mean(np.var(sim.data[trans_p], axis=0), axis=1))
plot(sim.trange(dt=0.01), sim.data[trans_p][..., neuron])
ylabel("Connection weight");
In [95]:
def sparsity_measure(vector):
    # Max sparsity = 1 (single 1 in the vector)
    v = np.sort(np.abs(vector))
    n = v.shape[0]
    k = np.arange(n) + 1
    l1norm = np.sum(v)
    summation = np.sum((v / l1norm) * ((n - k + 0.5) / n))
    return 1 - 2 * summation

print("Starting sparsity: {0}".format(sparsity_measure(sim.data[trans_p][0])))
print("Ending sparsity: {0}".format(sparsity_measure(sim.data[trans_p][-1])))
Starting sparsity: 0.223842561067
Ending sparsity: 0.475518530635
  • Result: only a few neurons will fire
    • Sparsification
  • What would this do in NEF terms?
    • Still represent $x$, but with very sparse encoders (assuming the function doesn't change)
  • This is still a rule on the weight matrix, but functionally seems to be more about encoders than decoders
    • What could we do, given that?

The homeostatic Prescribed Error Sensitivity (hPES) rule

$\Delta \omega_{ij} = \kappa a_i (\alpha_j S e_j \cdot E + (1-S) a_j (a_j-\theta))$

  • Works as well as (or better than) PES
    • Seems to be a bit more stable, but analysis is ongoing
  • Biological evidence?
    • Spike-Timing Dependent Plasticity

  • Still work to do for comparison, but seems promising
  • Error-driven for improving decoders
  • Hebbian sparsification to improve encoders
    • Perhaps to sparsify connections (energy savings in the brain, but not necessarily in simulation)
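
The hPES equation can be written as a small function to show how $S$ blends the two terms. This is a sketch for a single synapse with a scalar represented value; the function name `hpes_delta` and the numbers are made up for illustration and do not correspond to a Nengo API.

```python
def hpes_delta(a_i, a_j, alpha_j, e_j, E, theta, kappa, S):
    # Supervised (PES) term: the current that error E would inject into neuron j
    supervised = alpha_j * e_j * E
    # Unsupervised (BCM) term: Hebbian with a sliding threshold
    unsupervised = a_j * (a_j - theta)
    return kappa * a_i * (S * supervised + (1.0 - S) * unsupervised)

params = dict(a_i=2.0, a_j=3.0, alpha_j=1.5, e_j=-1.0,
              E=0.2, theta=1.0, kappa=0.01)

pure_pes = hpes_delta(S=1.0, **params)   # reduces to kappa * a_i * alpha_j * e_j * E
pure_bcm = hpes_delta(S=0.0, **params)   # reduces to kappa * a_i * a_j * (a_j - theta)
mixed = hpes_delta(S=0.5, **params)      # halfway between the two
```

Setting $S=1$ recovers PES and $S=0$ recovers BCM, so $S$ sets the balance between error-driven and Hebbian change.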