Learning Strategies

git: https://github.com/JuliaML/LearningStrategies.jl

Index

  1. Summary of package and functionalities
  2. How a _learning strategy_ works
  3. Combining _learning strategies_
  4. How data is handled in the default structure

  5. Examples:
    1. Building a simple maximum finder (model with no data)
    2. Building a K-means algorithm (model with data)
    3. Making a custom meta-strategy
    4. List and examples of built-in functions

Summary

Through the JuliaML packages Learn.jl and LearningStrategies.jl, a framework was built that makes it simple to implement new algorithms and learning strategies, and extremely simple to combine multiple strategies together.

A strategy here is simply defined as a way for the user to specify how a model (stored as a struct) is set up, trained and updated. This idea stems from the fact that most learning algorithms follow very similar pseudocode:

# Learning a model involves a model, a strategy and (optionally) some data
function learn!(model, strat::LearningStrategy, data)
    # Set up the model: prepare arrays for training, do some preprocessing, ...
    setup!(strat, model[, data])

    # Wrapping the data with Base.Iterators.repeated lets you replicate it infinitely
    for (i, item) in enumerate(data)

        # Update the model according to the strategy(-ies) and the data
        update!(model, strat[, i], item)

        # Record information or generally "hook into" the process. Should NOT modify the model in place
        hook(strat, model[, data], i)

        # Check whether the finishing condition has been reached
        finished(strat, model[, data], i) && break
    end

    # Clean up after yourself
    cleanup!(strat, model)

    # Return the trained model
    model
end

This can be summarised as: set up, iterate over the data until some end condition is met, and clean up. In addition, if your model is predictive, a conventional extra function which can (and should) be defined is predict(), imported from StatsBase. The pseudocode is as follows:

function predict(model, data)
    # Make predictions
end

LearningStrategy

A LearningStrategy is simply a struct that subtypes the abstract type LearningStrategy. A strategy need not contain any fields; all that is required is that at least one of the following methods is defined:

  • setup!(strat, model, data)
  • cleanup!(strat, model)
  • hook(strat, model, i)
  • finished(strat, model, data, i)
  • update!(model, strat, item)

Keep in mind, of course, that if only the hook function is defined for a generic strategy StrategyA, and this is the only strategy given for learning, the model will... not learn anything!

This architecture makes it extremely simple for a user writing their own learning algorithm to add methods as they go, without any further planning. For instance, suppose one starts off by implementing just an SGD optimizer; then all that is required is to define the structure

struct SGD <: LearningStrategy end

and then implement the actual optimization by defining a method update!(model, strategy::SGD, item). Later on, if the same user or another one wants to implement an AdaProx optimizer, they can simply add a new method update!(model, strategy::AdaProx, item).

It is not even required to explicitly write a learn! function if all you need is the update!; simply use the call

learn!(model, SGD(), data)

and Julia will handle the "missing" functions (by simply skipping them) and dispatch to the update! you defined.
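For instance, here is a minimal runnable sketch of this pattern. The LinearModel struct, its fields and the squared-error gradient are hypothetical, purely to make the example self-contained; only the SGD struct and the update! method follow the scheme described above.

using LearningStrategies
import LearningStrategies: update!

# Hypothetical model: linear-regression weights trained by SGD
mutable struct LinearModel
    w::Vector{Float64}
    lr::Float64     # learning rate
end

struct SGD <: LearningStrategy end

# Each item is assumed to be an (x, y) observation pair
function update!(model::LinearModel, s::SGD, item)
    x, y = item
    grad = (model.w' * x - y) .* x      # gradient of the squared error
    model.w .-= model.lr .* grad
end

# One pass over the observations; no learn! definition needed
data = [(randn(2), 1.0) for _ in 1:100]
model = learn!(LinearModel(zeros(2), 0.1), SGD(), data)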

Note: the data (and i) parameters are not actually required; all the base functions are defined both with and without the data argument.

Meta-Strategy

One extremely powerful feature this framework offers is the combination of strategies. While only one strategy tells the model how to train and update at each iteration, extra strategies providing side functionality can very easily be added. For instance, one of the built-in strategies is MaxIter(n=100) which, when given in conjunction with a learning strategy for any model, overloads the finished function and runs its own check as to whether the algorithm is over. A simple use would be:

learn!(MyModel(), strategy( MyLearningStrategy(), MaxIter(20) ), MyData )

In this way, the learning process will end after 20 iterations, even if the finished() condition given by MyLearningStrategy is never met (only one finished condition needs to be met for the learning to stop).

This means that if you develop your own learning model, you do not need to also implement all these side functionalities! How great is that!
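Concretely, reusing the hypothetical LinearModel/SGD sketch from above (cycle is from Base.Iterators and yields the observations cyclically, forever), several built-in strategies can be stacked; whichever stopping condition triggers first ends the learning:

using Base.Iterators: cycle

data = [(randn(2), 1.0) for _ in 1:100]

# Cycle over the observations forever; stop after 500 iterations
# or 5 seconds, whichever comes first
learn!(LinearModel(zeros(2), 0.1),
       strategy(SGD(), MaxIter(500), TimeLimit(5.0)),
       cycle(data))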

Passing data

Different algorithms consume data differently: some iterate over each element, while others take batches, and so on.

The base code uses this loop: for (i, item) in enumerate(data). This implies the data supplied for training must be iterable. This is often not quite what you want; a matrix, for instance, although technically iterable, will be enumerated element by element rather than by row or column, which is usually the more natural way.

Two solutions are therefore proposed:

The first is to pass a repeated matrix using Base.Iterators.repeated, which makes every "item" of the loop the full initial matrix. This is used simply as repeated(matrix).

The other is to do some preprocessing, possibly in the setup! function or in some other way. It would be possible, for example, to iterate over the rows of the matrix, in the case where observations are stored as rows. Both options are sketched below.
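A small sketch of both options (eachrow is from Base, available from Julia 1.1, and yields the rows of a matrix as views):

using Base.Iterators: repeated

M = [1.0 2.0; 3.0 4.0; 5.0 6.0]   # 3 observations stored as rows

# Option 1: each "item" seen by the learn! loop is the whole matrix
for (i, item) in enumerate(repeated(M))
    @assert item === M            # the same matrix every time, no copies
    i == 2 && break               # repeated(M) is infinite, so break manually
end

# Option 2: iterate observation by observation
for (i, row) in enumerate(eachrow(M))
    println("observation $i: ", row)
end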


Sample Codes

Making a simple maximum finder (No data)

Below I will show and explain the code for a simple (and not very good...) maximum finder! The idea is this: the model simply contains a starting point (in 2D space) and a heuristic function, which is then maximised by a (bad) strategy.

First things first: since we're going to be adding methods to the framework's functions, we need to import them from the base package.

In [1]:
using Plots
using LearningStrategies
import LearningStrategies: setup!, learn!, update!, finished, hook

import StatsBase: predict
import Base.Iterators: repeated

Now we define the model. It contains 4 attributes, one of which is a generic function. x and y are the starting coordinates, and z keeps track of the current "height" (i.e. score) in the solution space. heuristic is the function to be optimised.

In [2]:
mutable struct HillClimber
    x::Float64
    y::Float64
    z::Float64
    heuristic::Function
end

Then, we define our learning strategy structure. Since this strategy doesn't require any attributes, its definition is trivial.

In [3]:
struct SillyOptimizer <: LearningStrategy end

Finally we start the actual strategy building.

In [4]:
function update!(model::HillClimber, s::SillyOptimizer)
    # Generate 4 random direction vectors
    choices = rand(4, 2) .- 0.5

    # Normalize so the components of each step sum to ±1 in magnitude
    choices ./= abs.(sum(choices, dims=2))

    # Evaluate the heuristic at each candidate location
    heights = [model.heuristic(choices[i,:] + [model.x, model.y]) for i in 1:size(choices,1)]

    # Keep the best
    (model.z, best) = findmax(heights)

    # Move to the chosen coordinates
    model.x += choices[best,1]
    model.y += choices[best,2]
end

# Stop after first iteration
finished(s::SillyOptimizer, model) = true
Out[4]:
finished (generic function with 13 methods)

You may have noticed that learn! did not actually need to be defined, since there is no need to override the default learning structure. The finished function simply returns true, hence the learning completes after one iteration. Let's give it a try:

In [5]:
# Gaussian with max at (0,1)
function heuristic(x)
    arg = (x[1])^2 /5 + (x[2]-1)^2 /12
    h = exp(-arg)
    h
end

# Create the model, giving the attributes in order of definition
model = HillClimber(1.5, -2, 0, heuristic)

# Learn!
learn!(model, SillyOptimizer() )
Out[5]:
HillClimber(-1.042851726399331, -0.4571482736006689, 0.6740519892971322, heuristic)

So, given that it's a pretty bad optimiser and that we only allowed one iteration, it didn't perform very well... Now, it's quite hard for such an optimizer to decide when to stop, and hence to build a finished function. Therefore, what I will do instead is use meta-learning to decide when to finish.

To do this, I first need to redefine finished so that it doesn't return true, and then use a convergence strategy, such as MaxIter or TimeLimit.

In [6]:
# Redefine to never finish (careful here! #infiniteloop)
finished(s::SillyOptimizer, model) = false

# Create the model, giving the attributes in order of definition
model = HillClimber(1.5, -2, 0, heuristic)

# Use MaxIter to decide when to end learning
learn!(model, strategy( SillyOptimizer(), MaxIter(20) ) )
Out[6]:
HillClimber(0.3349619985744343, -0.8349619985744344, 0.73857666586363, heuristic)

Building a K-means learner

K-means is an unsupervised-learning algorithm which attempts to partition the data into k clusters, assigning each point to the cluster with the nearest mean.

The model itself doesn't require anything more than the number k of clusters the user expects, and an array of means, which can be initialised in the setup. Here we write the algorithm for the case of 2D points.

In [7]:
mutable struct KMeans
    k::Int32
    means::Array{Float64,2}

    # Constructor requires the number k of clusters to calculate
    KMeans(k) = new( k, zeros((0,2)) )
end

The strategy requires a few more variables: the number of observations, an array of distances of each point to the means, and an array holding the mean assigned to each point in the current iteration.

Note that these would not strictly need to be structure fields, since they could be reallocated at every call of the update function.

In [8]:
mutable struct Lloyds <:LearningStrategy 
    n_obs::Int32
    distances::Array{Float64}
    choices::Array{Int64}
    Lloyds() = new(0,[],[])
end

Now for the actual learning, two functions are defined. setup! is used, as expected, to set up and allocate the model and strategy variables.

The means are randomly initialised with coordinates close to the origin, and the strategy variables are allocated.

The update! simply follows Lloyd's algorithm, which is the basic K-means algorithm.

In [9]:
function setup!(s::Lloyds, model::KMeans, repeated_points::Base.Iterators.Repeated{Array{Float64,2}})
    model.means = randn(model.k,2)*0.1
    s.n_obs = size(repeated_points.x,1)
    s.distances = zeros(model.k)
    s.choices = zeros(Int64, s.n_obs)
end

function update!(model::KMeans, s::Lloyds, points::Array{Float64,2})
    # Assign step
    for i in 1:s.n_obs    
        # Get distance from means
        for j in 1:model.k
            s.distances[j] = sum((points[i,:]-model.means[j,:]).^2)
        end
        # Assign to closest mean
        s.choices[i] = findmin(s.distances)[2]

    end
    
    # Update the k means
    for i in 1:model.k
        # Get the points assigned to the i-th mean
        in_cluster = (s.choices .== i)

        # Calculate the new mean (the cluster average)
        new_mean = sum(points[in_cluster,:], dims=1) / sum(in_cluster)

        model.means[i,:] = new_mean
    end
end
Out[9]:
update! (generic function with 6 methods)

predict is imported from StatsBase. Notice that the "assign" step in the update is really just a prediction, and could therefore simply have been replaced by a call to the predict function. This was not done for the sake of clarity of the example.

In [10]:
# Imported from StatsBase
function predict(model::KMeans, points::Array{Float64})
    n_obs = size(points,1)
    distances = zeros(model.k)
    predictions = zeros(Int64, n_obs)

    for i in 1:n_obs  
        # Get distance from means
        for j in 1:model.k
            distances[j] = sum((points[i,:]-model.means[j,:]).^2)
        end
        
        # Assign to closest mean
        predictions[i] = findmin(distances)[2]
    end
    predictions
end
Out[10]:
predict (generic function with 2 methods)

Then we simply train the model. We set k=3, with points generated from normals with different variances and means.

Note that the data is given as a repeated iterator, which simply yields the same data over and over. This structure is preferred by the framework since it fits the basic learn! loop better, although one could just overload the latter and pass a plain array, or similarly transform the data in setup!.

In [ ]:
# Instantiate a KMeans model with k=3
model = KMeans(3)
# Points are generated from three normals with different stds
points = vcat( randn(100,2)*0.2 .+ [1.0 1.0], randn(100,2)*0.5 .+ [-1.0 0.0], randn(100,2)*0.35 .+ [2.0 -1.0] )
learn!(model, strategy(Lloyds(), MaxIter(10)), Base.Iterators.repeated(points))

# How prediction would work
predict(model, points);

Making a helper strategy (using hook function)

Now, it's quite hard to tell whether it's actually working well, given that the optimizer is quite bad and keeps moving even once it reaches a maximum; therefore I'd like to be able to visualise the path.

Why not build a path tracing strategy!

For this, one could write an update! function which updates the strategy, but because we're just recording a trace, it makes more sense to use the hook function, which is another function allowed by the framework, used for printing or logging; one can think of it as a function meant to observe the process without interfering with it.

In [12]:
# One array for each coordinate to keep track of positions
mutable struct PathTracer <: LearningStrategy
    x::Array{Float64}
    y::Array{Float64}
    z::Array{Float64}
end

# The zero-argument constructor simply initialises empty arrays
PathTracer() = PathTracer([],[],[])

# On setup, save the initial position
function setup!(s::PathTracer, model::HillClimber)
    hook(s, model)
end

# Save current position in helper strategy
function hook(s::PathTracer, model::HillClimber)
    push!(s.x, model.x)
    push!(s.y, model.y)
    push!(s.z, model.z)
end
Out[12]:
hook (generic function with 7 methods)
In [13]:
# Run the model again, this time adding the path tracer strategy
model = HillClimber(2.0, -4.8, 0.0, heuristic)
tracer = PathTracer()
learn!(model, strategy(SillyOptimizer(), MaxIter(30), tracer) )
Out[13]:
HillClimber(0.0948636718479543, 1.1051363281520459, 0.997282737919177, heuristic)
In [14]:
# Plot
using Plots
pyplot()
xs = range(-3, 3, length=30)
ys = range(-5, 5, length=30)
z = [heuristic([x, y]) for x=xs, y=ys]'

anim = @animate for max_length = 2:length(tracer.x)-1
    # Background: level curves of the heuristic
    contour(xs, ys, z)
    for i in 1:max_length
        plot!([tracer.x[i], tracer.x[i+1]], [tracer.y[i], tracer.y[i+1]], arrow = true, color="black")
    end
    plot!(legend=false, cb=false, title="SillyOptimizer Trace")
end
Out[14]:
Plots.Animation("/tmp/tmpxkTZbH", String[])
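To actually render the recorded frames, the animation can be written to a file with Plots' gif function (the file name here is arbitrary):

# Render the animation to disk at 5 frames per second
gif(anim, "silly_optimizer_trace.gif", fps = 5)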

(Animated figure: trace of the SillyOptimizer over the heuristic's level curves.)

Built-in functions

We've already looked at MaxIter; here's a list of the other built-in strategies from LearningStrategies that are discussed below:

  • TimeLimit
  • Converged
  • ConvergedTo
  • IterFunction
  • Tracer
  • Breaker

TimeLimit

Allows setting a maximal time (in seconds) for the learning process to run.

In [15]:
model = HillClimber(2.0, -4.8, 0.0, heuristic)
t = @elapsed learn!(model, strategy(SillyOptimizer(), TimeLimit(1.0)) )
println("Time elapsed: $t")
Time elapsed: 1.053900964

Converged

Allows checking whether the model has converged, by looking at whether it is still changing by more than a given margin. From the source code:

    Converged(f; tol = 1e-6, every = 1)
    Stop learning when `norm(f(model) - lastf) ≦ tol`. `f` must return a `Vector{Float64}`.

f should be a function that takes the model as argument. In the case of our optimizer, this is not a great strategy, since we know the optimizer takes pretty big steps and moves a lot even near a maximum. Nevertheless, we can test a sort of convergence by checking whether the heuristic has changed by less than a tolerance $\epsilon$.

In [103]:
model = HillClimber(2.0, -4.8, 0.0, heuristic)

θ(m::HillClimber) = [m.heuristic([m.x m.y])]
ϵ=0.01

learn!(model, strategy( SillyOptimizer(), Converged(θ, tol=ϵ) ) )
Out[103]:
HillClimber(0.38183295191069544, -0.18183295191069537, 0.8645437442655263, heuristic)

Similarly, on the KMeans example (which is better suited to this convergence strategy):

In [102]:
model = KMeans(3)
points = vcat( randn(100,2)*0.2 .+ [1.0 1.0], randn(100,2)*0.5 .+ [-1.0 0.0], randn(100,2)*0.35 .+ [2.0 -1.0] )

ϵ=0.01
θ(m::KMeans) = m.means[:]

learn!(model, strategy(Lloyds(), Converged(θ,tol=ϵ)), Base.Iterators.repeated(points))
Out[102]:
KMeans(3, [2.03591 -1.01121; 0.992619 1.02625; -1.08733 -0.0182533])

ConvergedTo

Allows checking whether the model has converged to a goal value.

ConvergedTo(f, goal; tol=1e-6, every=1)
    Stop learning when `‖f(model) - goal‖ ≦ tol`.

The user must provide a function taking the model as argument and returning a value; the latter should be some sort of score of the model, to be checked for convergence to the goal.

In our case, this simply checks whether we're close to the actual maximum (which we know, given that the heuristic is a Gaussian).

In [134]:
model = HillClimber(2.0, -4.8, 0.0, heuristic)

θ(m::HillClimber) = m.heuristic( [m.x m.y] )
ϵ=0.05
goal = model.heuristic( [0.0 1.0] )

learn!(model, strategy(SillyOptimizer(), ConvergedTo(θ, goal, tol=ϵ) ) )
INFO: Converged after 3 iterations: 0.9627098842147999
Out[134]:
HillClimber(-0.27074663336744964, 0.4707466333674494, 0.9627098842147999, heuristic)

Tracer

Earlier, I demonstrated a helper strategy, PathTracer, which allowed recording the movement. This functionality actually already exists as the built-in Tracer, which allows for many more operations.

    Tracer{T}(::Type{T}, f, b=1)
    Store `f(model, i)` every `b` iterations.

Let us record the means of the KMeans example. The means are stored as rows of a matrix, so ::Type{T} will be Array{Float64,2}. Additionally, the function is required to accept two arguments, the model and the iteration number i. In this case we just ignore the latter.

In [107]:
model = KMeans(3)
points = vcat( randn(100,2)*0.2 .+ [1.0 1.0], randn(100,2)*0.5 .+ [-1.0 0.0], randn(100,2)*0.35 .+ [2.0 -1.0] )

# Copy the means so each record is a snapshot, not a reference to the mutated matrix
θ(m::KMeans, i) = copy(m.means)
t = Tracer(Array{Float64,2}, θ)

learn!(model, strategy(Lloyds(), t, MaxIter(5)), Base.Iterators.repeated(points))

# Records are stored in
t.storage

Breaker

Allows adding a condition to break out of the learning, by defining a boolean function acting on the model and the iteration number:

    Breaker(f)
    Stop learning when `f(model, i)` returns true.

As a meaningless example, we will simply break when the y value of the hill climbing model is larger than 0.

In [117]:
model = HillClimber(2.0, -4.8, 0.0, heuristic)

θ(m::HillClimber, i) = m.y > 0.0

learn!(model, strategy(SillyOptimizer(), Breaker(θ) ) )
Out[117]:
HillClimber(-1.3979701578860708, 1.5979701578860714, 0.6566123457082915, heuristic)

IterFunction

    IterFunction(f)
    IterFunction(f, b)
    IterFunction(b, f)

    Call `f(model, i)` every `b` iterations at `hook` call.
    Default value of `b` is 1.
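As a minimal sketch, reusing the HillClimber model and SillyOptimizer from above, this can print the climber's current height every 5 iterations:

model = HillClimber(2.0, -4.8, 0.0, heuristic)

# Print the current height every 5 iterations
logger = IterFunction((m, i) -> println("iteration $i: z = $(m.z)"), 5)

learn!(model, strategy(SillyOptimizer(), logger, MaxIter(20)))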