The initial code was checked into the GitHub repository.
There are several update steps that are expressed as functions. One takes a matrix of linear-predictor values and converts it to a matrix of probabilities.
# computes predicted probabilities for multinomial regression, given X*beta
function multProbPrecompute(Xbeta)
    numerator = exp.(Xbeta)
    numerator ./ sum(numerator, dims = 2)  # normalize each row to sum to 1
end
multProbPrecompute (generic function with 1 method)
This function copies the matrix Xbeta in the broadcast call exp.(Xbeta), and creates another copy when the normalized probabilities are computed. This is how one would approach the task in R. In Julia the computation can be done in place.
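As a sketch of the in-place approach (the name multProb! is mine, not from the repository), the input matrix can simply be overwritten with the probabilities:

```julia
# Hypothetical in-place variant: overwrite Xbeta with the probabilities
# instead of allocating new arrays for the numerator and the result.
function multProb!(Xbeta)
    map!(exp, Xbeta, Xbeta)         # exponentiate each element in place
    Xbeta ./= sum(Xbeta, dims = 2)  # normalize each row in place
end
```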
It is a good practice to write tests as you go along so that you can verify the results even for very simple functions.
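For example, a minimal check with the Test standard library (these particular tests are mine, for illustration) can verify that each row of the result is a valid probability vector:

```julia
using Test

M = randn(5, 3)                        # arbitrary linear-predictor values
P = exp.(M) ./ sum(exp.(M), dims = 2)  # same computation as multProbPrecompute
@test all(x -> isapprox(x, 1.0), sum(P, dims = 2))  # rows sum to 1
@test all(p -> 0 ≤ p ≤ 1, P)                        # entries are probabilities
```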
const n = 100
const num = 200
using Random
Random.seed!(1234321) # set the random number seed for reproducibility
const X = rand(n, num)
X[:, 1] .= 1
X
100×200 Array{Float64,2}:
 1.0  0.942218    0.24867     0.508796   …  0.833132   0.243615     0.79745
 1.0  0.042852    0.798958    0.946023      0.271972   0.676288     0.473405
 1.0  0.658443    0.910798    0.99185       0.309513   0.738769     0.466282
 1.0  0.933942    0.215572    0.711079      0.0693223  0.966414     0.996814
 1.0  0.493509    0.0107833   0.322528      0.470281   0.389887     0.853022
 1.0  0.216062    0.731008    0.543219   …  0.555514   0.842862     0.820857
 1.0  0.55655     0.626748    0.847555      0.0846821  0.104867     0.393553
 1.0  0.698472    0.550688    0.539199      0.124176   0.120487     0.820305
 1.0  0.477957    0.746092    0.675665      0.5284     0.913444     0.876063
 1.0  0.288074    0.0495433   0.89312       0.324252   0.305764     0.961244
 1.0  0.762155    0.00279019  0.30776    …  0.612043   0.000933508  0.0783043
 1.0  0.231349    0.578132    0.088052      0.417872   0.0834675    0.950657
 1.0  0.358739    0.995562    0.0488842     0.904      0.648354     0.640511
 ⋮                                       ⋱
 1.0  0.871992    0.544112    0.180798      0.907896   0.909636     0.701654
 1.0  0.826786    0.955972    0.0744761     0.804368   0.0112623    0.920605
 1.0  0.703566    0.137559    0.882054   …  0.969421   0.299211     0.301055
 1.0  0.50145     0.303211    0.30445       0.243777   0.411264     0.949961
 1.0  0.810933    0.899724    0.512623      0.23256    0.972064     0.891446
 1.0  0.620525    0.55652     0.782564      0.956294   0.598255     0.220518
 1.0  0.440561    0.874214    0.267746      0.312244   0.820928     0.370913
 1.0  0.715693    0.372606    0.752476   …  0.349817   0.966559     0.659257
 1.0  0.00561552  0.597088    0.413342      0.862901   0.760024     0.115013
 1.0  0.502888    0.352371    0.286198      0.159112   0.30687      0.970337
 1.0  0.0869479   0.918245    0.353192      0.912009   0.104156     0.430432
 1.0  0.0272969   0.971082    0.198065      0.113312   0.909728     0.082192
using Distributions
const k = 4
const β = zeros(num, k)
rand!(Uniform(-1.5, 1.5), view(β, 1:10, :))
β[1:11, :]
11×4 Array{Float64,2}:
 -0.842878  -1.48142   -0.569145   -0.882919
  0.229552   0.809904  -0.432924   -0.388652
  1.11555    0.495039  -0.0618396  -1.08376
 -0.978813  -0.12077   -1.38466    -0.753775
  0.845695   1.30246    0.0835514  -1.35018
  0.734366  -0.987098   0.831515   -1.12607
  1.48704   -0.466602   0.535014    0.454538
  0.790596   0.981425  -0.117606    0.585541
  0.421596   1.39621   -0.176295   -0.378822
  1.1885    -0.775636  -0.718342    1.12486
  0.0        0.0        0.0         0.0
Notice that the const declaration doesn't mean that the contents of the array must be constant; it means only that the type, size and memory location of the array cannot be changed.
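A small illustration of this point (the name a is mine):

```julia
const a = [1.0, 2.0]
a[1] = 10.0   # fine: mutating the contents of a const-bound array
a .= 0.0      # also fine: in-place broadcast assignment
# a = [3.0]   # rebinding the const name itself triggers a warning/error
```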
const Xβ = X * β # linear predictor
probs = multProbPrecompute(Xβ)
size(probs)
(100, 4)
probs[1:10, :]
10×4 Array{Float64,2}:
 0.694888  0.231298    0.0409205   0.0328935
 0.972676  0.00503632  0.0155342   0.00675307
 0.876706  0.0848717   0.0360854   0.00233681
 0.687598  0.199428    0.0697715   0.0432018
 0.805965  0.038707    0.130989    0.0243393
 0.806296  0.118814    0.0720161   0.00287435
 0.838491  0.126393    0.0323506   0.0027648
 0.730423  0.235906    0.0192987   0.0143724
 0.957658  0.0283855   0.00825927  0.005697
 0.771638  0.194524    0.016277    0.0175611
The array of probabilities is oriented so that each row adds to 1.
extrema(sum(probs, dims=2))
(0.9999999999999998, 1.0000000000000002)
In languages like R and Julia, where matrices are stored in column-major order, it is advantageous to work with the transpose of this matrix.
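A sketch of why the orientation matters (the function names are mine): with column-major storage, an inner loop that runs down a column touches memory contiguously, while one that runs across a row strides through it.

```julia
A = rand(1000, 1000)

function colmajor_sum(A)   # inner loop runs down a column: contiguous access
    s = 0.0
    for j in axes(A, 2), i in axes(A, 1)
        s += A[i, j]
    end
    s
end

function rowmajor_sum(A)   # inner loop runs across a row: strided access
    s = 0.0
    for i in axes(A, 1), j in axes(A, 2)
        s += A[i, j]
    end
    s
end
```

Both return the same sum, but on large matrices the column-major traversal is typically noticeably faster.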
Exponentiating and normalizing a vector can be done in two loops: the first exponentiates the elements and accumulates their sum, and the second divides by that sum to normalize. In Julia it is fine to use loops when that is the natural way to express an operation.
function expnorm!(v)
    ss = sum(map!(exp, v, v))  # exponentiate v in place and accumulate the sum
    v ./= ss                   # normalize in place; returns v
end
expnorm! (generic function with 1 method)
vv = vec(Xβ[1, :])
4-element Array{Float64,1}:
  1.4125396324688992
  0.31249547628056473
 -1.419581047798883
 -1.6379353964967804
pr = expnorm!(vv)
4-element Array{Float64,1}:
 0.6948880834566447
 0.2312979359283825
 0.0409204574983493
 0.03289352311662352
sum(pr)
0.9999999999999999
pr ≈ vec(probs[1,:])
true
The arrays X, β and probs are associated with each other and must have compatible dimensions. We define a single structure to hold them, along with the response z.
struct BCD{T<:AbstractFloat}
    X::Matrix{T}
    β::Matrix{T}
    probs::Matrix{T}
    z::Matrix{T}
end
The operation of updating the probabilities is done in-place.
using LinearAlgebra
function updateprobs!(bcd::BCD)
    probs = bcd.probs                      # stored transposed: k × n
    LinearAlgebra.mul!(probs, bcd.β', bcd.X')  # probs = β'X', i.e. (Xβ)'
    for j in 1:size(probs, 2)
        expnorm!(view(probs, :, j))        # normalize each column in place
    end
    bcd
end
updateprobs! (generic function with 1 method)
The constructor for the type requires only X and z; the sizes of probs and β can be inferred.
function BCD(X::Matrix{T}, z::Matrix{T}) where {T<:AbstractFloat} # constructor
    n, num = size(X)
    k, r = size(z)   # z is also stored transposed (k × n)
    if r ≠ n
        throw(DimensionMismatch())
    end
    res = BCD(X, zeros(T, (num, k)), similar(z), z)
    updateprobs!(res)
end
BCD
To create such an object we need to simulate the data. Recall that the array probs should be transposed to fit our new layout.
pr = probs'
4×100 Adjoint{Float64,Array{Float64,2}}:
 0.694888   0.972676    0.876706    …  0.822341   0.80093     0.947382
 0.231298   0.00503632  0.0848717      0.0750398  0.106154    0.016788
 0.0409205  0.0155342   0.0360854      0.0610365  0.088035    0.0303635
 0.0328935  0.00675307  0.00233681     0.0415823  0.00488122  0.00546646
z = similar(pr)
4×100 Array{Float64,2}:
 6.90739e-310  6.90739e-310  6.90739e-310  …  9.73907e-317  9.73907e-317
 6.90739e-310  6.9074e-310   6.90739e-310     0.0           0.0
 6.90739e-310  6.90739e-310  6.90739e-310     9.73907e-317  9.73907e-317
 6.90739e-310  6.90739e-310  6.90739e-310     0.0           0.0
for j in 1:size(z, 2)
    rand!(Multinomial(1, view(pr, :, j)), view(z, :, j))
end
MethodError: no method matching Multinomial(::Int64, ::SubArray{Float64,1,Adjoint{Float64,Array{Float64,2}},Tuple{Base.Slice{Base.OneTo{Int64}},Int64},false})
Closest candidates are:
  Multinomial(::Integer, !Matched::Array{T<:Real,1}) where T<:Real at /home/bates/.julia/packages/Distributions/fMt8c/src/multivariate/multinomial.jl:37
  Multinomial(::Integer, !Matched::Integer) at /home/bates/.julia/packages/Distributions/fMt8c/src/multivariate/multinomial.jl:38
Stacktrace:
 [1] top-level scope at ./In[17]:2
Unfortunately, this doesn't work because the vector of probabilities for the Multinomial must be a Vector; a SubArray won't do.
for j in 1:size(z, 2)
    rand!(Multinomial(1, vec(pr[:, j])), view(z, :, j))
end
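If this loop were a bottleneck, the fresh Vector allocated by pr[:, j] on each iteration could be avoided by reusing a single buffer (a sketch; buf is my name, and pr and z are the session arrays from above):

```julia
using Distributions, Random

buf = Vector{Float64}(undef, size(pr, 1))  # one reusable probability buffer
for j in 1:size(z, 2)
    # copyto! overwrites buf with column j of pr and returns buf (a Vector)
    rand!(Multinomial(1, copyto!(buf, view(pr, :, j))), view(z, :, j))
end
```

This is safe because each Multinomial is constructed and sampled before buf is overwritten again.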
z[:,1:4] # the probabilities for the first category are large
4×4 Array{Float64,2}:
 0.0  1.0  0.0  1.0
 1.0  0.0  1.0  0.0
 0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0
sum(z, dims=2)
4×1 Array{Float64,2}:
 82.0
 13.0
  4.0
  1.0
bcd = BCD(X, z);
bcd.probs
4×100 Array{Float64,2}:
 0.25  0.25  0.25  0.25  0.25  0.25  …  0.25  0.25  0.25  0.25  0.25  0.25
 0.25  0.25  0.25  0.25  0.25  0.25     0.25  0.25  0.25  0.25  0.25  0.25
 0.25  0.25  0.25  0.25  0.25  0.25     0.25  0.25  0.25  0.25  0.25  0.25
 0.25  0.25  0.25  0.25  0.25  0.25     0.25  0.25  0.25  0.25  0.25  0.25
bcd.z
4×100 Array{Float64,2}:
 0.0  1.0  0.0  1.0  0.0  1.0  1.0  1.0  …  1.0  1.0  1.0  1.0  1.0  1.0  1.0
 1.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
Currently the function to evaluate the log-likelihood is
#computes multinomial log likelihood given X, beta, z
function loglikelihood(X, beta, z)
    p = size(X)[2]
    k = size(z)[2]
    beta = reshape(beta, p, k)
    probs = multProb(X, beta, k)
    val = 0
    for i = 1:(size(X)[1])
        val = val + log(probs[i, find(z[i, :] .== 1)])
    end
    beta = vec(beta)
    return(val)
end
The find(z[i, :] .== 1) call determines which element of each row is non-zero every time the log-likelihood is evaluated. But that never changes, so we can evaluate it once and be done. The best option here would be to change the BCD type and its constructor but, for illustration, I will simply create a vector in the global workspace. (To change the type I would need to restart the session.)
(In Julia 1.x, findfirst applied to a numeric array requires a predicate such as isone; passing the array alone works only for Bool arrays.)
const y = Int[]
for j in 1:size(z, 2)
    push!(y, findfirst(isone, view(z, :, j)))
end
length(y)
100
using StatsBase
counts(y, 1:4)
4-element Array{Int64,1}:
 82
 13
  4
  1
There is already a loglikelihood
function in the StatsBase
package so we will add a method to it rather than overwriting the name.
function StatsBase.loglikelihood(bcd::BCD{T}) where {T}
    ss = zero(T)
    probs = bcd.probs  # in a production version we would store y as bcd.y
    for j in 1:length(y)
        ss += log(probs[y[j], j])
    end
    ss
end
loglikelihood(bcd)
-138.62943611198907
100 * log(0.25)
-138.62943611198907
Note that, because updateprobs! returns its argument, updating the probabilities and evaluating the log-likelihood can be composed, as is done in the existing function.
updateprobs!(bcd) |> loglikelihood
-138.62943611198907
which is equivalent to
loglikelihood(updateprobs!(bcd))
-138.62943611198907
@time loglikelihood(updateprobs!(bcd));
0.000044 seconds (105 allocations: 4.859 KiB)
For comparison, we time the original functions from the initial code.
function multProb(X, beta, k)
    p = size(X)[2]
    n = size(X)[1]
    beta = reshape(beta, p, k)
    denominator = zeros(n)
    numerator = exp(X*beta)
    denominator = sum(numerator, 2)
    prob = numerator ./ denominator
    beta = vec(beta)
    return(prob)
end
function ll(X, beta, z)
    p = size(X)[2]
    k = size(z)[2]
    beta = reshape(beta, p, k)
    probs = multProb(X, beta, k)
    val = 0
    for i = 1:(size(X)[1])
        val = val + log(probs[i, find(z[i, :] .== 1)])
    end
    beta = vec(beta)
    return(val)
end
ll (generic function with 1 method)
size(X), size(β)
((100, 200), (200, 4))
const zz = z'
100×4 Adjoint{Float64,Array{Float64,2}}:
 0.0  1.0  0.0  0.0
 1.0  0.0  0.0  0.0
 0.0  1.0  0.0  0.0
 1.0  0.0  0.0  0.0
 0.0  0.0  1.0  0.0
 1.0  0.0  0.0  0.0
 1.0  0.0  0.0  0.0
 1.0  0.0  0.0  0.0
 1.0  0.0  0.0  0.0
 1.0  0.0  0.0  0.0
 1.0  0.0  0.0  0.0
 1.0  0.0  0.0  0.0
 1.0  0.0  0.0  0.0
 ⋮
 0.0  1.0  0.0  0.0
 1.0  0.0  0.0  0.0
 1.0  0.0  0.0  0.0
 1.0  0.0  0.0  0.0
 0.0  0.0  1.0  0.0
 1.0  0.0  0.0  0.0
 1.0  0.0  0.0  0.0
 1.0  0.0  0.0  0.0
 1.0  0.0  0.0  0.0
 1.0  0.0  0.0  0.0
 1.0  0.0  0.0  0.0
 1.0  0.0  0.0  0.0
fill!(β, 0.25);
ll(X, β, zz)
DimensionMismatch("matrix is not square: dimensions are (100, 4)")
Stacktrace:
 [1] checksquare at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/LinearAlgebra/src/LinearAlgebra.jl:214 [inlined]
 [2] exp!(::Array{Float64,2}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/LinearAlgebra/src/dense.jl:513
 [3] exp at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/LinearAlgebra/src/dense.jl:481 [inlined]
 [4] multProb(::Array{Float64,2}, ::Array{Float64,2}, ::Int64) at ./In[34]:6
 [5] ll(::Array{Float64,2}, ::Array{Float64,2}, ::Adjoint{Float64,Array{Float64,2}}) at ./In[34]:17
 [6] top-level scope at In[38]:1
@time ll(X, β, zz)
DimensionMismatch("matrix is not square: dimensions are (100, 4)")
Stacktrace:
 [1] checksquare at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/LinearAlgebra/src/LinearAlgebra.jl:214 [inlined]
 [2] exp!(::Array{Float64,2}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/LinearAlgebra/src/dense.jl:513
 [3] exp at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/LinearAlgebra/src/dense.jl:481 [inlined]
 [4] multProb(::Array{Float64,2}, ::Array{Float64,2}, ::Int64) at ./In[34]:6
 [5] ll(::Array{Float64,2}, ::Array{Float64,2}, ::Adjoint{Float64,Array{Float64,2}}) at ./In[34]:17
 [6] top-level scope at util.jl:156
 [7] top-level scope at In[39]:1
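The failure comes from exp(X*beta): in Julia 1.x, exp applied to a matrix means the matrix exponential, which errors on a non-square argument, and elementwise exponentiation requires the broadcast form exp.(). Likewise sum(numerator, 2) is now sum(numerator, dims=2), and find was replaced by findall/findfirst. A sketch of the legacy functions updated for Julia 1.x (the Fixed names are mine):

```julia
function multProbFixed(X, beta, k)
    numerator = exp.(X * reshape(beta, size(X, 2), k))  # elementwise exp
    numerator ./ sum(numerator, dims = 2)               # rows sum to 1
end

function llFixed(X, beta, z)
    probs = multProbFixed(X, beta, size(z, 2))
    val = 0.0
    for i in 1:size(X, 1)
        # index of the nonzero entry in row i of the indicator matrix z
        val += log(probs[i, findfirst(isone, view(z, i, :))])
    end
    val
end
```

With these changes llFixed(X, β, zz) runs, and since β was filled with a constant, every fitted probability is 1/4 and the value should again be 100 * log(0.25).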