The initial code was checked into the GitHub repository.
There are several update steps that are expressed as functions. One takes a matrix of linear-predictor values and converts it to a matrix of probabilities.
# computes predicted probabilities for multinomial regression, given X*beta
function multProbPrecompute(Xbeta)
    numerator = exp.(Xbeta)
    numerator ./ sum(numerator, dims = 2)  # normalize each row to sum to 1
end
multProbPrecompute (generic function with 1 method)
This function copies the matrix Xbeta in the broadcast call exp.(Xbeta), and creates another copy when the normalized probabilities are computed. This is how one would approach the task in R. In Julia the computation can be done in place.
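As a sketch of the in-place approach (the name multProb! is mine, not from the repository), the input matrix can simply be overwritten with the probabilities:

```julia
# Hypothetical in-place variant: overwrite Xbeta with the probabilities
# instead of allocating new arrays for the numerator and the result.
function multProb!(Xbeta)
    map!(exp, Xbeta, Xbeta)         # exponentiate each element in place
    Xbeta ./= sum(Xbeta, dims = 2)  # normalize each row in place
end
```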
It is a good practice to write tests as you go along so that you can verify the results even for very simple functions.
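For example, a minimal check with the Test standard library (these particular tests are mine, for illustration) can verify that each row of the result is a valid probability vector:

```julia
using Test

M = randn(5, 3)                        # arbitrary linear-predictor values
P = exp.(M) ./ sum(exp.(M), dims = 2)  # same computation as multProbPrecompute
@test all(x -> isapprox(x, 1.0), sum(P, dims = 2))  # rows sum to 1
@test all(p -> 0 ≤ p ≤ 1, P)                        # entries are probabilities
```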
const n = 100
const num = 200
using Random
Random.seed!(1234321) # set the random number seed for reproducibility
const X = rand(n, num)
X[:, 1] .= 1
X
100×200 Array{Float64,2}:
 1.0  0.942218    0.24867     0.508796   …  0.833132   0.243615     0.79745
 1.0  0.042852    0.798958    0.946023      0.271972   0.676288     0.473405
 1.0  0.658443    0.910798    0.99185       0.309513   0.738769     0.466282
 1.0  0.933942    0.215572    0.711079      0.0693223  0.966414     0.996814
 1.0  0.493509    0.0107833   0.322528      0.470281   0.389887     0.853022
 1.0  0.216062    0.731008    0.543219   …  0.555514   0.842862     0.820857
 1.0  0.55655     0.626748    0.847555      0.0846821  0.104867     0.393553
 1.0  0.698472    0.550688    0.539199      0.124176   0.120487     0.820305
 1.0  0.477957    0.746092    0.675665      0.5284     0.913444     0.876063
 1.0  0.288074    0.0495433   0.89312       0.324252   0.305764     0.961244
 1.0  0.762155    0.00279019  0.30776    …  0.612043   0.000933508  0.0783043
 1.0  0.231349    0.578132    0.088052      0.417872   0.0834675    0.950657
 1.0  0.358739    0.995562    0.0488842     0.904      0.648354     0.640511
 ⋮                                       ⋱
 1.0  0.871992    0.544112    0.180798      0.907896   0.909636     0.701654
 1.0  0.826786    0.955972    0.0744761     0.804368   0.0112623    0.920605
 1.0  0.703566    0.137559    0.882054   …  0.969421   0.299211     0.301055
 1.0  0.50145     0.303211    0.30445       0.243777   0.411264     0.949961
 1.0  0.810933    0.899724    0.512623      0.23256    0.972064     0.891446
 1.0  0.620525    0.55652     0.782564      0.956294   0.598255     0.220518
 1.0  0.440561    0.874214    0.267746      0.312244   0.820928     0.370913
 1.0  0.715693    0.372606    0.752476   …  0.349817   0.966559     0.659257
 1.0  0.00561552  0.597088    0.413342      0.862901   0.760024     0.115013
 1.0  0.502888    0.352371    0.286198      0.159112   0.30687      0.970337
 1.0  0.0869479   0.918245    0.353192      0.912009   0.104156     0.430432
 1.0  0.0272969   0.971082    0.198065      0.113312   0.909728     0.082192
using Distributions
const k = 4
const β = zeros(num, k)
rand!(Uniform(-1.5, 1.5), view(β, 1:10, :))
β[1:11, :]
11×4 Array{Float64,2}:
 -0.842878  -1.48142   -0.569145   -0.882919
  0.229552   0.809904  -0.432924   -0.388652
  1.11555    0.495039  -0.0618396  -1.08376
 -0.978813  -0.12077   -1.38466    -0.753775
  0.845695   1.30246    0.0835514  -1.35018
  0.734366  -0.987098   0.831515   -1.12607
  1.48704   -0.466602   0.535014    0.454538
  0.790596   0.981425  -0.117606    0.585541
  0.421596   1.39621   -0.176295   -0.378822
  1.1885    -0.775636  -0.718342    1.12486
  0.0        0.0        0.0         0.0
Notice that the const declaration doesn't mean that the contents of the array must be constant; it means only that the type, size and memory location of the array cannot be changed.
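A small illustration of this point (the name a is mine):

```julia
const a = [1.0, 2.0]
a[1] = 10.0   # fine: mutating the contents of a const-bound array
a .= 0.0      # also fine: in-place broadcast assignment
# a = [3.0]   # rebinding the const name itself triggers a warning/error
```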
const Xβ = X * β # linear predictor
probs = multProbPrecompute(Xβ)
size(probs)
(100, 4)
probs[1:10, :]
10×4 Array{Float64,2}:
 0.694888  0.231298    0.0409205   0.0328935
 0.972676  0.00503632  0.0155342   0.00675307
 0.876706  0.0848717   0.0360854   0.00233681
 0.687598  0.199428    0.0697715   0.0432018
 0.805965  0.038707    0.130989    0.0243393
 0.806296  0.118814    0.0720161   0.00287435
 0.838491  0.126393    0.0323506   0.0027648
 0.730423  0.235906    0.0192987   0.0143724
 0.957658  0.0283855   0.00825927  0.005697
 0.771638  0.194524    0.016277    0.0175611
The array of probabilities is oriented so that each row adds to 1.
extrema(sum(probs, dims=2))
(0.9999999999999998, 1.0000000000000002)
In languages like R and Julia, where matrices are stored in column-major order, it is advantageous to work with the transpose of this matrix.
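A sketch of why the orientation matters (the function names are mine): with column-major storage, an inner loop that runs down a column touches memory contiguously, while one that runs across a row strides through it.

```julia
A = rand(1000, 1000)

function colmajor_sum(A)   # inner loop runs down a column: contiguous access
    s = 0.0
    for j in axes(A, 2), i in axes(A, 1)
        s += A[i, j]
    end
    s
end

function rowmajor_sum(A)   # inner loop runs across a row: strided access
    s = 0.0
    for i in axes(A, 1), j in axes(A, 2)
        s += A[i, j]
    end
    s
end
```

Both return the same sum, but on large matrices the column-major traversal is typically noticeably faster.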
Exponentiating and normalizing a vector can be done in two loops: the first exponentiates the elements and accumulates their sum, and the second divides by that sum to normalize. In Julia it is fine to use loops when that is the natural way to express an operation.
function expnorm!(v)
    ss = sum(map!(exp, v, v))  # exponentiate v in place and accumulate the sum
    v ./= ss                   # normalize in place; returns v
end
expnorm! (generic function with 1 method)
vv = vec(Xβ[1, :])
4-element Array{Float64,1}:
  1.4125396324688992
  0.31249547628056473
 -1.419581047798883
 -1.6379353964967804
pr = expnorm!(vv)
4-element Array{Float64,1}:
 0.6948880834566447
 0.2312979359283825
 0.0409204574983493
 0.03289352311662352
sum(pr)
0.9999999999999999
pr ≈ vec(probs[1,:])
true
The arrays X, β and probs are associated with each other and must have compatible dimensions. We define a single structure to hold them, along with the response z.
struct BCD{T<:AbstractFloat}
    X::Matrix{T}
    β::Matrix{T}
    probs::Matrix{T}
    z::Matrix{T}
end
The operation of updating the probabilities is done in-place.
using LinearAlgebra
function updateprobs!(bcd::BCD)
    probs = bcd.probs                      # stored transposed: k × n
    LinearAlgebra.mul!(probs, bcd.β', bcd.X')  # probs = β'X', i.e. (Xβ)'
    for j in 1:size(probs, 2)
        expnorm!(view(probs, :, j))        # normalize each column in place
    end
    bcd
end
updateprobs! (generic function with 1 method)
The constructor for the type requires only X and z; the sizes of probs and β can be inferred.
function BCD(X::Matrix{T}, z::Matrix{T}) where {T<:AbstractFloat} # constructor
    n, num = size(X)
    k, r = size(z)   # z is also stored transposed (k × n)
    if r ≠ n
        throw(DimensionMismatch())
    end
    res = BCD(X, zeros(T, (num, k)), similar(z), z)
    updateprobs!(res)
end
BCD
To create such an object we need to simulate the data. Recall that the array probs should be transposed to fit our new layout.
pr = probs'
4×100 Adjoint{Float64,Array{Float64,2}}:
 0.694888   0.972676    0.876706    …  0.822341   0.80093     0.947382
 0.231298   0.00503632  0.0848717      0.0750398  0.106154    0.016788
 0.0409205  0.0155342   0.0360854      0.0610365  0.088035    0.0303635
 0.0328935  0.00675307  0.00233681     0.0415823  0.00488122  0.00546646
z = similar(pr)
4×100 Array{Float64,2}:
 6.90739e-310  6.90739e-310  6.90739e-310  …  9.73907e-317  9.73907e-317
 6.90739e-310  6.9074e-310   6.90739e-310     0.0           0.0
 6.90739e-310  6.90739e-310  6.90739e-310     9.73907e-317  9.73907e-317
 6.90739e-310  6.90739e-310  6.90739e-310     0.0           0.0
for j in 1:size(z, 2)
    rand!(Multinomial(1, view(pr, :, j)), view(z, :, j))
end
MethodError: no method matching Multinomial(::Int64, ::SubArray{Float64,1,Adjoint{Float64,Array{Float64,2}},Tuple{Base.Slice{Base.OneTo{Int64}},Int64},false})
Closest candidates are:
  Multinomial(::Integer, !Matched::Array{T<:Real,1}) where T<:Real at /home/bates/.julia/packages/Distributions/fMt8c/src/multivariate/multinomial.jl:37
  Multinomial(::Integer, !Matched::Integer) at /home/bates/.julia/packages/Distributions/fMt8c/src/multivariate/multinomial.jl:38
Stacktrace:
 [1] top-level scope at ./In[17]:2
Unfortunately, this doesn't work because the vector of probabilities for the Multinomial must be a Vector; a SubArray won't do.
for j in 1:size(z, 2)
    rand!(Multinomial(1, vec(pr[:, j])), view(z, :, j))
end
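If this loop were a bottleneck, the fresh Vector allocated by pr[:, j] on each iteration could be avoided by reusing a single buffer (a sketch; buf is my name, and pr and z are the session arrays from above):

```julia
using Distributions, Random

buf = Vector{Float64}(undef, size(pr, 1))  # one reusable probability buffer
for j in 1:size(z, 2)
    # copyto! overwrites buf with column j of pr and returns buf (a Vector)
    rand!(Multinomial(1, copyto!(buf, view(pr, :, j))), view(z, :, j))
end
```

This is safe because each Multinomial is constructed and sampled before buf is overwritten again.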
z[:,1:4] # the probabilities for the first category are large
4×4 Array{Float64,2}:
 0.0  1.0  0.0  1.0
 1.0  0.0  1.0  0.0
 0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0
sum(z, dims=2)
4×1 Array{Float64,2}:
 82.0
 13.0
  4.0
  1.0
bcd = BCD(X, z);
bcd.probs
4×100 Array{Float64,2}:
 0.25  0.25  0.25  0.25  0.25  0.25  …  0.25  0.25  0.25  0.25  0.25  0.25
 0.25  0.25  0.25  0.25  0.25  0.25     0.25  0.25  0.25  0.25  0.25  0.25
 0.25  0.25  0.25  0.25  0.25  0.25     0.25  0.25  0.25  0.25  0.25  0.25
 0.25  0.25  0.25  0.25  0.25  0.25     0.25  0.25  0.25  0.25  0.25  0.25
bcd.z
4×100 Array{Float64,2}:
 0.0  1.0  0.0  1.0  0.0  1.0  1.0  1.0  …  1.0  1.0  1.0  1.0  1.0  1.0  1.0
 1.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
Currently the function to evaluate the log-likelihood is
#computes multinomial log likelihood given X, beta, z
function loglikelihood(X, beta, z)
    p = size(X)[2]
    k = size(z)[2]
    beta = reshape(beta, p, k)
    probs = multProb(X, beta, k)
    val = 0
    for i = 1:(size(X)[1])
        val = val + log(probs[i, find(z[i, :] .== 1)])
    end
    beta = vec(beta)
    return(val)
end
The find(z[i, :] .== 1) call determines which element of each row is non-zero every time the log-likelihood is evaluated. But that never changes, so we can evaluate it once and be done. The best option here would be to change the BCD type and its constructor but, for illustration, I will simply create a vector in the global workspace. (To change the type I would need to restart the session.)
(In Julia 1.x, findfirst applied to a numeric array requires a predicate such as isone; passing the array alone works only for Bool arrays.)
const y = Int[]
for j in 1:size(z, 2)
    push!(y, findfirst(isone, view(z, :, j)))
end
length(y)
100
using StatsBase
counts(y, 1:4)
4-element Array{Int64,1}:
 82
 13
  4
  1
There is already a loglikelihood
function in the StatsBase
package so we will add a method to it rather than overwriting the name.
function StatsBase.loglikelihood(bcd::BCD{T}) where {T}
    ss = zero(T)
    probs = bcd.probs  # in a production version we would store y as bcd.y
    for j in 1:length(y)
        ss += log(probs[y[j], j])
    end
    ss
end
loglikelihood(bcd)
-138.62943611198907
100 * log(0.25)
-138.62943611198907
Note that, because updateprobs! returns its argument, updating the probabilities and evaluating the log-likelihood can be composed, as is done in the existing function.
updateprobs!(bcd) |> loglikelihood
-138.62943611198907
which is equivalent to
loglikelihood(updateprobs!(bcd))
-138.62943611198907
@time loglikelihood(updateprobs!(bcd));
0.000044 seconds (105 allocations: 4.859 KiB)
For comparison, we time the original functions from the initial code.
function multProb(X, beta, k)
    p = size(X)[2]
    n = size(X)[1]
    beta = reshape(beta, p, k)
    denominator = zeros(n)
    numerator = exp(X*beta)
    denominator = sum(numerator, 2)
    prob = numerator ./ denominator
    beta = vec(beta)
    return(prob)
end
function ll(X, beta, z)
    p = size(X)[2]
    k = size(z)[2]
    beta = reshape(beta, p, k)
    probs = multProb(X, beta, k)
    val = 0
    for i = 1:(size(X)[1])
        val = val + log(probs[i, find(z[i, :] .== 1)])
    end
    beta = vec(beta)
    return(val)
end
ll (generic function with 1 method)
size(X), size(β)
((100, 200), (200, 4))
const zz = z'
100×4 Adjoint{Float64,Array{Float64,2}}:
 0.0  1.0  0.0  0.0
 1.0  0.0  0.0  0.0
 0.0  1.0  0.0  0.0
 1.0  0.0  0.0  0.0
 0.0  0.0  1.0  0.0
 1.0  0.0  0.0  0.0
 1.0  0.0  0.0  0.0
 1.0  0.0  0.0  0.0
 1.0  0.0  0.0  0.0
 1.0  0.0  0.0  0.0
 1.0  0.0  0.0  0.0
 1.0  0.0  0.0  0.0
 1.0  0.0  0.0  0.0
 ⋮
 0.0  1.0  0.0  0.0
 1.0  0.0  0.0  0.0
 1.0  0.0  0.0  0.0
 1.0  0.0  0.0  0.0
 0.0  0.0  1.0  0.0
 1.0  0.0  0.0  0.0
 1.0  0.0  0.0  0.0
 1.0  0.0  0.0  0.0
 1.0  0.0  0.0  0.0
 1.0  0.0  0.0  0.0
 1.0  0.0  0.0  0.0
 1.0  0.0  0.0  0.0
fill!(β, 0.25);
ll(X, β, zz)
DimensionMismatch("matrix is not square: dimensions are (100, 4)")
Stacktrace:
 [1] checksquare at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/LinearAlgebra/src/LinearAlgebra.jl:214 [inlined]
 [2] exp!(::Array{Float64,2}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/LinearAlgebra/src/dense.jl:513
 [3] exp at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/LinearAlgebra/src/dense.jl:481 [inlined]
 [4] multProb(::Array{Float64,2}, ::Array{Float64,2}, ::Int64) at ./In[34]:6
 [5] ll(::Array{Float64,2}, ::Array{Float64,2}, ::Adjoint{Float64,Array{Float64,2}}) at ./In[34]:17
 [6] top-level scope at In[38]:1
@time ll(X, β, zz)
DimensionMismatch("matrix is not square: dimensions are (100, 4)")
Stacktrace:
 [1] checksquare at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/LinearAlgebra/src/LinearAlgebra.jl:214 [inlined]
 [2] exp!(::Array{Float64,2}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/LinearAlgebra/src/dense.jl:513
 [3] exp at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/LinearAlgebra/src/dense.jl:481 [inlined]
 [4] multProb(::Array{Float64,2}, ::Array{Float64,2}, ::Int64) at ./In[34]:6
 [5] ll(::Array{Float64,2}, ::Array{Float64,2}, ::Adjoint{Float64,Array{Float64,2}}) at ./In[34]:17
 [6] top-level scope at util.jl:156
 [7] top-level scope at In[39]:1
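The failure comes from exp(X*beta): in Julia 1.x, exp applied to a matrix means the matrix exponential, which errors on a non-square argument, and elementwise exponentiation requires the broadcast form exp.(). Likewise sum(numerator, 2) is now sum(numerator, dims=2), and find was replaced by findall/findfirst. A sketch of the legacy functions updated for Julia 1.x (the Fixed names are mine):

```julia
function multProbFixed(X, beta, k)
    numerator = exp.(X * reshape(beta, size(X, 2), k))  # elementwise exp
    numerator ./ sum(numerator, dims = 2)               # rows sum to 1
end

function llFixed(X, beta, z)
    probs = multProbFixed(X, beta, size(z, 2))
    val = 0.0
    for i in 1:size(X, 1)
        # index of the nonzero entry in row i of the indicator matrix z
        val += log(probs[i, findfirst(isone, view(z, i, :))])
    end
    val
end
```

With these changes llFixed(X, β, zz) runs, and since β was filled with a constant, every fitted probability is 1/4 and the value should again be 100 * log(0.25).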