*git:* https://github.com/JuliaML/LearningStrategies.jl

- Summary of package and functionalities
- How a _learning strategy_ works
- Combining _learning strategies_
- How data is handled in the default structure

Through the JuliaML *Learn.jl* and *LearningStrategies.jl*, a framework was built allowing the simple implementation of new algorithms and learning strategies, and making it extremely simple to combine multiple strategies together.

A strategy here is simply defined as a way for the user to tell how a model (stored as a *struct*) is setup, trained and updated. This idea stems from the very similar pseudocode most learning algorithms operate with, which is:

```
# Learning a model involves a model, a strategy and (not necessary) some data
function learn!(model, strat::LearningStrategy, data)
# We set up the model, can be preparing arrays for training, doing some preprocessing
setup!(strat, model[, data])
# Using the "repeated" function on your data allows it to replicate it infinitely
for (i, item) in enumerate(data)
# Update the model according to the strategy(-ies) and the data
update!(model, strat[, i], item)
# Allows to record information or generally "hook in" the process. Should NOT modify in place
hook(strat, model[, data], i)
# Checks if the finishing condition has been reached
finished(strat, model[, data], i) && break
end
# Clean up after yourself
cleanup!(strat, model)
# Return the trained model
model
end
```

Which can be translated as setup, iterate over data until some end condition is met, and clean up.
In addition, a conventional extra function which can (and should) be defined if your model is predictive, is *predict()*, which should be imported from StatsBase. The pseudocode is as follows

```
function predict(model, data)
# Make predictions
end
```

A LearningStrategy is simply a struct which extends *LearningStrategy*. A strategy needs not contain any variable, all that is required is that at least one of the following functions is defined:

`setup!(strat, model, data)`

`cleanup!(strat, model)`

`hook(strat, model, i)`

`finished(strat, model, data, i)`

`update!(model, strat, item)`

Keep in mind of course that if only the `hook`

function is defined for a generic strategy *StrategyA*, and this is the only strategy given for learning, the model will.. not learn anything!

This architecture makes it extremely simple for user wanting to write his own learning algorithm to add methods as they come without need further planning. For instance, suppose one strats off by just implementing and `SGD()`

optimizer, then all that would be required is to define the structure

```
struct SGD() <:LearningStrategy end
```

and then implement the actual optimization by defining a function `update!(model, strategy::SGD)`

. Later on, if the same user or another wants to implement *AdaProx* optimizer, they can simply add a new function `update!(model, strategy::AdaProx)`

.

It is not even required to explicitely wirte a `learn!`

function if all you need is the `update!`

, simply use the call

```
learn!(model, SGD(), data)
```

and Julia will handle the "missing" functions (by simply skipping them) and will make the call to the update you defined.

**Note:** the *data* (and *i*)parameter is not actually required, all the base functions are defined with both data and no data argument

One extremely powerful feature that this framework offers is the combination of strategies. For instance, while only one strategy will tell the model how to train and update for each iteration, extra strategies which only give side indications can be very easily used. For instance, one of the built-in strategies is `MaxIter(n=100)`

which, if given in conjuction to a laerning strategy for any model, will overload the *finished* function, and will do its own check as to whether the algorithm is indeed over. A simple use would be:

```
learn!(MyModel(), strategies( MyLearningStrategy(), MaxIter(20) ), MyData )
```

In this way, the learning process will end after 20 iterations, even if the conditions for *finished()* given by the *MyLearningStrategy* are not met (only one *finished* condition needs to be met for the learning to stop)

This means that if you develop your own learning model, you do not need to also add all this "side functionalities"! How great is that!

Different algorithms use data differently. Some iterate over each element, while other take batches and so on.

The base code uses this loop: `for (i, item) in enumerate(data)`

. This implies the data supplied for training is an iterable. This is often not really the case, for instance a matrix, although technically iterable, will be enumerated by element, instead of by row or column which is the more natural way.

Two solutions are therefore proposed;

The first is to pass a *repeated* matrix using Base.Iterators.repeated, which will basically make every "item" a copy of the initial matrix. This is simply used as `repeated(matrix)`

.

The other is to do some preprocessing, possibly using the `setup!`

function or in some other way. It would be, for example, possible to make a row iterator, in the case of a row-majored matrix.

Below I will show a and explain a code to make a simple (and not very good..) maximum finder!
The idea is this: the model will simply include a starting point (2D space), and a *heuristic function* that is being maximised by a (bad) strategy.

First thing first, since we're going to be implementing `learn!, update!`

and `finished`

, we need to export them from the base package

In [1]:

```
using Plots
using LearningStrategies
import LearningStrategies: setup!, learn!, update!, finished, hook
import StatsBase: predict
import Base.Iterators: repeated
```

Now we define the model. It contains 4 attributes, one of which is a generic function. x & y are the starting points, and z will just keep track of the current "height" (i.e. score) in the solution space. The *heuristic* is the function to be optimised

In [2]:

```
mutable struct HillClimber
x::Float64
y::Float64
z::Float64
heuristic::Function
end
```

Then, we define our learning strategy structure. Since this strategy doesn't require any attributes, its definition is trivial.

In [3]:

```
struct SillyOptimizer <: LearningStrategy end
```

Finally we start the actual strategy building.

In [4]:

```
function update!(model::HillClimber, s::SillyOptimizer)
# Generate 4 random direction vectors
choices = rand((4,2))-0.5
# Normalizes to unit step size
choices ./= abs.(sum(choices,2))
# Test the locations
heights = [heuristic(choices[i,:]+[model.x,model.y]) for i in 1:size(choices,1)]
# Keep the best
(model.z, best) = findmax(heights)
# Change current coordinates
model.x += choices[best,1]
model.y += choices[best,2]
end
# Stop after first iteration
finished(s::SillyOptimizer, model) = true
```

Out[4]:

You may have noticed that the `learn!`

did not actually need to be defined since there is no need to override the default learning structure. The *finished* function simply returns true, hence the learning will complete after one iteration. Let's give it a try:

In [5]:

```
# Gaussian with max at (0,1)
function heuristic(x)
arg = (x[1])^2 /5 + (x[2]-1)^2 /12
h = exp(-arg)
h
end
# Create the model, giving the attributes in order of definition
model = HillClimber(1.5, -2, 0, heuristic)
# Learn!
learn!(model, SillyOptimizer() )
```

Out[5]:

So given that it's a pretty bad optimiser and that we only allowed for one iteration, it didn't perform very well.. Now it's quite hard for such an optimizer to decide when to stop and building a finished function. Therefore, what I will do is intead use meta-learning to decide when to finish.

To do this, I first need to redefine `finished`

so that it doesn't return true, and then use a convergence strategy, such as MaxIter or TimeLimit.

In [6]:

```
# Redefine to never finish (careful here! #infiniteloop)
finished(s::SillyOptimizer, model) = false
# Create the model, giving the attributes in order of definition
model = HillClimber(1.5, -2, 0, heuristic)
# Use MaxIter to decide when to end learning
learn!(model, strategy( SillyOptimizer(), MaxIter(20) ) )
```

Out[6]:

*TODO: rewrite this terrible description*

K-means is an unsupervised-learning algorithm which attempts to find the set of *k* clusters which separates the data the best.

The model itself doesn't require anything more than the number *K* of clusters that the user expects, and an array of means which can be initialised in the setup. Here we are writing an algorithm for the case of 2D points.

In [7]:

```
mutable struct KMeans
k::Int32
means::Array{Float64,2}
# Constructor requires the number k of clusters to calculate
KMeans(k) = new( k, zeros((0,2)) )
end
```

The strategy requires a few more variables, an array of distances of each point to the means, and an array of the assigned mean for that iteration.

Note that these would not necessarily be required as structure variables, since they could be reallocated continuously in the update function.

In [8]:

```
mutable struct Lloyds <:LearningStrategy
n_obs::Int32
distances::Array{Float64}
choices::Array{Int64}
Lloyds() = new(0,[],[])
end
```

Now for the actual learning, two functions are defined. *setup!* is used to, as expected, setup and allocate the model and strategy variables.

The means are randomly initialised with coordinates close to the origin, and the strategy variables are allocated.

The *update!* simply follows the *Lloyds* algorithm, which is the basic K-means algorithm.

In [9]:

```
function setup!(s::Lloyds, model::KMeans, repeated_points::Base.Iterators.Repeated{Array{Float64,2}})
model.means = randn(model.k,2)*0.1
s.n_obs = size(repeated_points.x,1)
s.distances = zeros(model.k)
s.choices = zeros(Int64, s.n_obs)
end
function update!(model::KMeans, s::Lloyds, points::Array{Float64,2})
# Assign step
for i in 1:s.n_obs
# Get distance from means
for j in 1:model.k
s.distances[j] = sum((points[i,:]-model.means[j,:]).^2)
end
# Assign to closest mean
s.choices[i] = findmin(s.distances)[2]
end
# Update k-means
for i in 1:model.k
# Get list of points assigned to i-th mean
in_cluster = (s.choices .== i)
# Calculate new mean (average)
mean = sum(points[in_cluster,:],1)
mean /= sum(in_cluster.==true)
model.means[i,:] = mean
end
end
```

Out[9]:

*predict* is imported from StatsBase. We notice that the "assigning" step in the update is really just a prediction, and therefore it could have simply be replaced with a call to the predict function. This was not done out of clarity of the example.

In [10]:

```
# Imported from StatsBase
function predict(model::KMeans, points::Array{Float64})
n_obs = size(points,1)
distances = zeros(model.k)
predictions = zeros(Int64, n_obs)
for i in 1:n_obs
# Get distance from means
for j in 1:model.k
distances[j] = sum((points[i,:]-model.means[j,:]).^2)
end
# Assign to closest mean
predictions[i] = findmin(distances)[2]
end
predictions
end
```

Out[10]:

Then we simply train the model. We set K=3, points generated from normals withs differents variances and means.

Note that the data is given as a *repeated* iterator, which is simply an iterator of the same data. This structure is the preferred for the framework since it fits better with the basic *learn!*, although one could jsut overload the latter and have a normal array be sent, or similarly transform it in the *setup!*.

In [ ]:

```
# Instantiate a KMeans model with k=3
model = KMeans(3)
# Points are generated from three normals with different stds
points = vcat( randn(100,2)*0.2 .+ [1.0 1.0], randn(100,2)*0.5 .+ [-1.0 0.0], randn(100,2)*0.35 .+ [2.0 -1.0] )
learn!(model, strategy(Lloyds(), MaxIter(10)), Base.Iterators.repeated(points))
# How prediction would work
predict(model, points);
```

Now it's quite hard to tell whether it's actually working any good given the optimizer is quite bad and will keep moving even when it reaches a maximum, therefore I'd like to be able to visualise..

Why not build a path tracing strategy!

For this, one could either write and `update!`

function which would update the strategy, but because we're just recording a trace, if makes more sense to use the `hook`

function, which is another function allowed in the framework, used for printing or logging; one can think of it as a function meant to *observe* the process, without interfering with it.

In [12]:

```
# One array for each coordinate to keep track of positions
mutable struct PathTracer <: LearningStrategy
x::Array{Float64}
y::Array{Float64}
z::Array{Float64}
end
# The constructor (for any argument) is simply to initialise the arrays
PathTracer() = PathTracer([],[],[])
# On setup, save initial position
function setup!(s::PathTracer, model::HillClimber)
update!(model, s)
end
# Save current position in helper strategy
function hook(s::PathTracer, model::HillClimber)
push!(s.x, model.x)
push!(s.y, model.y)
push!(s.z, model.z)
end
```

Out[12]:

In [13]:

```
# Run the model again, this time adding the path tracer strategy
model = HillClimber(2.0, -4.8, 0.0, heuristic)
tracer = PathTracer()
learn!(model, strategy(SillyOptimizer(), MaxIter(30), tracer) )
```

Out[13]:

In [14]:

```
# Plot
using Plots
pyplot()
xs = linspace(-3,3,30)
ys = linspace(-5,5,30)
z = [heuristic([x, y]) for x=xs, y=ys]'
anim = @animate for max_length = 2:length(tracer.x)-1
plot(xs, ys, z)
for i in 1:max_length
plot!([tracer.x[i], tracer.x[i+1]], [tracer.y[i], tracer.y[i+1]], arrow = true, color="black")
end
plot!(legend=false, cb=false, title="SillyOptimizer Trace")
end
```

Out[14]:

In [15]:

```
model = HillClimber(2.0, -4.8, 0.0, heuristic)
tic()
learn!(model, strategy(SillyOptimizer(), TimeLimit(1.0)) )
t = toq()
println("Time elapsed: $t")
```

Allows to check if the model has converged by looking at whether it is changing by enough of a margin. From the source code

```
Converged(f; tol = 1e-6, every = 1)
Stop learning when `norm(f(model) - lastf) ≦ tol`. `f` must return a `Vector{Float64}`.
```

f should be a function that takes as argument the model. In the case of our optimizer, this is not a great strategy since we know the optimizer takes pretty big steps and moves a lot even when near a maximum. Nevertheless, we can sort of test convergence by checking whether the heuristic has changed by less than a tolerance $\epsilon$.

In [103]:

```
model = HillClimber(2.0, -4.8, 0.0, heuristic)
θ(m::HillClimber) = [m.heuristic([m.x m.y])]
ϵ=0.01
learn!(model, strategy( SillyOptimizer(), Converged(θ, tol=ϵ) ) )
```

Out[103]:

Similarly, on the KMeans example (which is more adapted to this convergence strategy)

In [102]:

```
model = KMeans(3)
points = vcat( randn(100,2)*0.2 .+ [1.0 1.0], randn(100,2)*0.5 .+ [-1.0 0.0], randn(100,2)*0.35 .+ [2.0 -1.0] )
ϵ=0.01
θ(m::KMeans) = m.means[:]
learn!(model, strategy(Lloyds(), Converged(θ,tol=ϵ)), Base.Iterators.repeated(points))
```

Out[102]:

Allows to check whether the model has converged to a goal value.

```
ConvergedTo(f, goal; tol=1e-6, every=1)
Stop learning when `‖f(model) - goal‖ ≦ tol`.
```

The user must provide a function taking the model as argument and returning a float. The latter should be some sort of score of the model to check for convergence to the goal

In our case, this will simply check if we're close to the actual maximum (which we know given the heuristic is a gaussian)

In [134]:

```
model = HillClimber(2.0, -4.8, 0.0, heuristic)
θ(m::HillClimber) = m.heuristic( [m.x m.y] )
ϵ=0.05
goal = model.heuristic( [0.0 1.0] )
learn!(model, strategy(SillyOptimizer(), ConvergedTo(θ,ϵ, goal, 1) ) )
```

Out[134]:

Earlier, I demonstrated a helper strategy, `PathTracer`

, which allowed to record the movement. This actually already exists as a built-in function, `Tracer`

which allows for much more operations.

```
Tracer{T}(::Type{T}, f, b=1)
Store `f(model, i)` every `b` iterations.
```

Let us record our means for the KMeans example. We're recording the means, stored as rows in a matrix, therefore the `::Type{T}`

will be `Array{Float64}`

. Additionally, a requirement on the function is that it accepts two arguments, `model`

and `i`

, the latter being the iteration number. In this case we just ignore it.

In [107]:

```
model = KMeans(3)
points = vcat( randn(100,2)*0.2 .+ [1.0 1.0], randn(100,2)*0.5 .+ [-1.0 0.0], randn(100,2)*0.35 .+ [2.0 -1.0] )
θ(m::KMeans, i) = m.means
t = Tracer(Array{Float64,2}, θ)
learn!(model, strategy(SillyOptimizer(), t, MaxIter(5) ) )
# Records are stored in
t.storage
```

Out[107]:

Allows to add a condition to break out of the learning, by defining a boolean function acting on the model and the iteration number:

```
Breaker(f)
Stop learning when `f(model, i)` returns true.
```

As a meaningless example, we will simply break when the `y`

value of the hill climbing model is larger than 0.

In [117]:

```
model = HillClimber(2.0, -4.8, 0.0, heuristic)
θ(m::HillClimber, ..) = m.y > 0.0
learn!(model, strategy(SillyOptimizer(), Breaker(θ) ) )
```

Out[117]:

```
IterFunction(f)
IterFunction(f, b)
IterFunction(b, f)
Call `f(model, i)` every `b` iterations at `hook` call.
Default value of `b` is 1.
```