Generalized Linear Mixed‐effects Model Playground

or: the many ways to fit a GLMM in Python

A comparison among:
StatsModels
Theano
PyMC3 (based on Theano)
TensorFlow
Stan and PyStan
Keras with a TensorFlow backend

In [1]:
import numpy as np
#import pandas as pd
import seaborn as sns
from theano import tensor as tt
%pylab inline
%config InlineBackend.figure_format = 'retina'
%qtconsole --colors=linux

%load_ext watermark
Populating the interactive namespace from numpy and matplotlib
In [2]:
%watermark -v -m -p numpy,pandas,seaborn,statsmodels,theano,pymc3,tensorflow,keras,matplotlib
Using TensorFlow backend.
CPython 3.5.2
IPython 6.0.0

numpy 1.12.1
pandas 0.19.2
seaborn 0.7.1
statsmodels 0.8.0.dev0+abb5996
theano 0.9.0.dev-d704a600dfb029eed39957730a42f50262df004f
pymc3 3.1rc3
tensorflow 1.1.0
keras 2.0.1
matplotlib 2.0.1

compiler   : GCC 5.4.0 20160609
system     : Linux
release    : 4.4.0-77-generic
machine    : x86_64
processor  : x86_64
CPU cores  : 8
interpreter: 64bit

Prepare data and quick visualization

In [3]:
# load data
import pandas as pd
Tbl_beh  = pd.read_csv('./behavioral_data.txt', delimiter='\t')
Tbl_beh["subj"]  = Tbl_beh["subj"].astype('category')
#%% visualize the data
# Draw a nested violinplot and split the violins for easier comparison
sns.set(style="ticks", palette="muted", color_codes=True)
plt.figure()
Tbl_beh['cbcond']  = pd.Series(Tbl_beh['chimera'] + '-' + Tbl_beh['identity'], index=Tbl_beh.index)
sns.violinplot(y="cbcond", x="rt", hue="group", data=Tbl_beh, split=True,
               inner="quart", palette={"cp": "b", "cg": "y"})
sns.despine(left=True)

Tbl_beh is the raw data table.

In [4]:
Tbl_beh.head(5)
Out[4]:
   subj  trial chimera identity orientation      rt  acc group     cbcond
0     1      1     L_R    other     upright   754.6    0    cp  L_R-other
1     1      2     L_L    other     upright  1036.1    1    cp  L_L-other
2     1      3     R_R     self     upright   433.0    1    cp   R_R-self
3     1      4     L_R     self     upright   463.4    1    cp   L_R-self
4     1      5     L_L     self     upright   483.3    1    cp   L_L-self
In [5]:
#%% Compute the condition means per subject
condi_sel = ['subj', 'chimera', 'identity', 'orientation']
tblmean = pd.DataFrame({'Nt' : Tbl_beh.groupby( condi_sel ).size()}).reset_index()
tblmean["subj"] = tblmean["subj"].astype('category')
tblmean['rt']  = np.asarray(Tbl_beh.groupby(condi_sel)['rt'].mean())
tblmean['ACC'] = np.asarray(Tbl_beh.groupby(condi_sel)['acc'].mean())
tblmean['group']= np.asarray(Tbl_beh.groupby(condi_sel)['group'].all()) # group label (constant within subject)

tblmean is the summary data table.

In [6]:
tblmean['cbcond']  = pd.Series(tblmean['chimera'] + '-' + tblmean['identity'], 
                index=tblmean.index)

## boxplot + scatter plot to show accuracy
#ax = sns.boxplot(y="cbcond", x="acc", data=tbltest,
#                 whis=np.inf, color="c")
## Add in points to show each observation
#sns.stripplot(y="cbcond", x="acc", data=tbltest,
#              jitter=True, size=3, color=".3", linewidth=0)
plt.figure()
g1 = sns.violinplot(y="cbcond", x="rt", hue="group", data=tblmean, split=True,
               inner="quart", palette={"cp": "b", "cg": "y"})
g1.set(xlim=(0, 3000))
# Trim the spines to the extent of the data
sns.despine(trim=True)
In [7]:
tblmean.head(5)
Out[7]:
  subj chimera identity orientation  Nt      rt  ACC group     cbcond
0    1     L_L    other    inverted  10  713.37  0.8    cp  L_L-other
1    1     L_L    other     upright  10  749.55  0.8    cp  L_L-other
2    1     L_L     self    inverted  10  538.64  0.6    cp   L_L-self
3    1     L_L     self     upright  10  629.01  0.9    cp   L_L-self
4    1     L_R    other    inverted  10  784.70  1.0    cp  L_R-other

Here I use the table of condition means (20 subjects × 16 cells = 320 rows) as the data; it is much smaller than the trial-level table and therefore faster to fit.

In [8]:
tbltest = tblmean
tbltest.shape
Out[8]:
(320, 9)

Using StatsModels to fit a linear mixed model of reaction time

In [9]:
import statsmodels.formula.api as smf
from patsy import dmatrices
formula = "rt ~ group*orientation*identity"
#formula = "rt ~ -1 + cbcond"
md  = smf.mixedlm(formula, tbltest, groups=tbltest["subj"])
mdf = md.fit()
print(mdf.summary())

fe_params = pd.DataFrame(mdf.fe_params,columns=['LMM'])
random_effects = pd.DataFrame(mdf.random_effects)
random_effects = random_effects.transpose()
random_effects = random_effects.rename(index=str, columns={'groups': 'LMM'})

#%% Generate Design Matrix for later use
Y, X   = dmatrices(formula, data=tbltest, return_type='matrix')
Terms  = X.design_info.column_names
_, Z   = dmatrices('rt ~ -1+subj', data=tbltest, return_type='matrix')
X      = np.asarray(X) # fixed-effects design matrix
Z      = np.asarray(Z) # random-effects design matrix (one column per subject)
Y      = np.asarray(Y).flatten()
nfixed = np.shape(X)
nrandm = np.shape(Z)
                               Mixed Linear Model Regression Results
===================================================================================================
Model:                           MixedLM               Dependent Variable:               rt        
No. Observations:                320                   Method:                           REML      
No. Groups:                      20                    Scale:                            4502.8886 
Min. group size:                 16                    Likelihood:                       -1796.2120
Max. group size:                 16                    Converged:                        Yes       
Mean group size:                 16.0                                                              
---------------------------------------------------------------------------------------------------
                                                     Coef.   Std.Err.   z    P>|z|  [0.025   0.975]
---------------------------------------------------------------------------------------------------
Intercept                                            689.538   21.918 31.459 0.000  646.579 732.497
group[T.cp]                                          151.113   37.049  4.079 0.000   78.498 223.727
orientation[T.upright]                               -64.602   13.160 -4.909 0.000  -90.395 -38.808
identity[T.self]                                     -17.581   13.160 -1.336 0.182  -43.374   8.212
group[T.cp]:orientation[T.upright]                   -87.712   22.245 -3.943 0.000 -131.311 -44.113
group[T.cp]:identity[T.self]                         -67.411   22.245 -3.030 0.002 -111.010 -23.812
orientation[T.upright]:identity[T.self]              -20.020   18.611 -1.076 0.282  -56.498  16.457
group[T.cp]:orientation[T.upright]:identity[T.self]   36.768   31.459  1.169 0.242  -24.890  98.426
groups RE                                           5119.673   27.637                              
===================================================================================================
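For reference, the model fitted here (and re-implemented with PyMC3, Theano, and TensorFlow below) is a linear mixed model with a random intercept per subject:

    rt = X*beta + Z*b + eps,    b ~ Normal(0, sigma_b^2),    eps ~ Normal(0, sigma^2)

where X is the 320 × 8 fixed-effects design matrix built from rt ~ group*orientation*identity, Z is the 320 × 20 matrix of subject indicators, beta holds the eight coefficients in the summary above, and b holds one intercept per subject.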

A helper function for plotting the fitted values

In [10]:
#%% plotting function
def plotfitted(fe_params=fe_params,random_effects=random_effects,X=X,Z=Z,Y=Y):
    plt.figure(figsize=(18,9))
    ax1 = plt.subplot2grid((2,2), (0, 0))
    ax2 = plt.subplot2grid((2,2), (0, 1))
    ax3 = plt.subplot2grid((2,2), (1, 0), colspan=2)
    
    fe_params.plot(ax=ax1)
    random_effects.plot(ax=ax2)
    
    ax3.plot(Y.flatten(),'o',color='k',label = 'Observed', alpha=.25)
    for iname in fe_params.columns.get_values():
        fitted = np.dot(X,fe_params[iname])+np.dot(Z,random_effects[iname]).flatten()
        print("The MSE of "+iname+ " is " + str(np.mean(np.square(Y.flatten()-fitted))))
        ax3.plot(fitted,lw=1,label = iname, alpha=.5)
    ax3.legend(loc=0)
    #plt.ylim([0,5])
    plt.show()

plotfitted(fe_params=fe_params,random_effects=random_effects,X=X,Z=Z,Y=Y)
The MSE of LMM is 4150.21209997

Using PyMC3

In [11]:
beta0     = np.linalg.lstsq(X,Y) # ordinary least-squares solution, for reference (not used below)

fixedpred = np.argmax(X,axis=1)
randmpred = np.argmax(Z,axis=1)

# integer condition/subject indices for the (commented-out) indexing version of mu_est below
con  = tt.constant(fixedpred)
sbj  = tt.constant(randmpred)
import pymc3 as pm
with pm.Model() as glmm1:
    # Fixed effect
    beta = pm.Normal('beta', mu = 0, sd = 100, shape=(nfixed[1]))
    # random effect
    s    = pm.HalfCauchy('s',50,shape=(nrandm[1]))
    b    = pm.Normal('b', mu = 0, sd = s, shape=(nrandm[1]))
    eps  = pm.HalfCauchy('eps', 5)
    
    #mu_est = pm.Deterministic('mu_est',beta[con] + b[sbj])
    mu_est = pm.Deterministic('mu_est',tt.dot(X,beta)+tt.dot(Z,b))
    RT = pm.Normal('RT', mu_est, eps, observed = Y)
    
    # start = pm.find_MAP()
    # h = find_hessian(start)

with glmm1:
    # means, sds, elbos = pm.variational.advi(n=100000)
    trace = pm.sample(3000)
    
pm.traceplot(trace,varnames=['beta','b','s']) # 
plt.show()

burnin1 = 2000
fixed_pymc = pm.df_summary(trace[burnin1:],varnames=['beta'])
randm_pymc = pm.df_summary(trace[burnin1:],varnames=['b'])
fe_params['PyMC'] = pd.Series(np.asarray(fixed_pymc['mean']), index=fe_params.index)
random_effects['PyMC'] = pd.Series(np.asarray(randm_pymc['mean']),index=random_effects.index)
print(fe_params)
Auto-assigning NUTS sampler...
Initializing NUTS using ADVI...
Average Loss = 2,332.3: 100%|██████████| 200000/200000 [00:18<00:00, 10991.19it/s] 
Finished [100%]: Average Loss = 2,332.3
100%|██████████| 3000/3000 [00:19<00:00, 157.89it/s]
                                                           LMM        PyMC
Intercept                                           689.537735  652.578745
group[T.cp]                                         151.112622  168.266454
orientation[T.upright]                              -64.601774  -62.159064
identity[T.self]                                    -17.581122  -15.239280
group[T.cp]:orientation[T.upright]                  -87.712115  -83.001840
group[T.cp]:identity[T.self]                        -67.411021  -62.446566
orientation[T.upright]:identity[T.self]             -20.020417  -21.808688
group[T.cp]:orientation[T.upright]:identity[T.s...   36.768234   28.513699
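Written out, the PyMC3 model above places the following priors (sd parameterization):

    beta_j ~ Normal(0, sd=100)
    s_k    ~ HalfCauchy(beta=50)          (one scale per subject)
    b_k    ~ Normal(0, sd=s_k)
    eps    ~ HalfCauchy(beta=5)
    RT_i   ~ Normal(X_i*beta + Z_i*b, sd=eps)

Note that, unlike the REML fit above (which estimates a single random-intercept variance), each subject's intercept gets its own scale s_k here, so the two fits are close but not exactly equivalent.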

Using Theano

This is a non-probabilistic model; nonetheless, uncertainty about the parameter estimates could be obtained using dropout (http://mlg.eng.cam.ac.uk/yarin/blog_3d801aa532c1ce.html).
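A rough sketch of that idea (a toy Keras illustration, not part of the analysis below): keep a Dropout layer switched on at prediction time and average many stochastic forward passes; the spread across passes serves as an uncertainty estimate. The network, layer sizes, and data here are arbitrary.

# A minimal MC-dropout sketch (hypothetical toy network and data, not the GLMM above)
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras import backend as K

mcd_model = Sequential([
    Dense(16, activation='relu', input_dim=8),
    Dropout(0.5),
    Dense(1),
])
mcd_model.compile(optimizer='sgd', loss='mse')
# ... fit mcd_model on training data here ...

# a prediction function that keeps dropout switched on (learning_phase = 1)
stochastic_predict = K.function([mcd_model.input, K.learning_phase()],
                                [mcd_model.output])

Xnew  = np.random.randn(5, 8).astype('float32')
draws = np.stack([stochastic_predict([Xnew, 1])[0] for _ in range(100)])
pred_mean, pred_sd = draws.mean(axis=0), draws.std(axis=0)  # predictive mean and spread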

In [12]:
Y, X   = dmatrices(formula, data=tbltest, return_type='matrix')
Terms  = X.design_info.column_names
_, Z   = dmatrices('rt ~ -1+subj', data=tbltest, return_type='matrix')
X      = np.asarray(X) # fixed-effects design matrix
Z      = np.asarray(Z) # random-effects design matrix (one column per subject)
Y      = np.asarray(Y).flatten()
nfixed = np.shape(X)
nrandm = np.shape(Z)

X = X.astype(np.float32)
Z = Z.astype(np.float32)
Y = Y.astype(np.float32)

import theano
theano.config.compute_test_value = 'ignore'

def floatX(X):
    return np.asarray(X, dtype=theano.config.floatX)

def init_weights(shape):
    return theano.shared(floatX(np.random.randn(*shape) * 0.01))

def model(X, beta, Z, b):
    return tt.sum(tt.dot(X, beta) + tt.dot(Z, b),axis=1)

def sgd(cost, params, lr=0.001):
    grads = tt.grad(cost=cost, wrt=params)
    updates = []
    for p, g in zip(params, grads):
        updates.append([p, p - g * lr])
    return updates
    
Xtf   = tt.fmatrix()
Ztf   = tt.fmatrix()
y     = tt.vector()
tbeta = init_weights([nfixed[1], 1])
tb    = init_weights([nrandm[1], 1])
eps   = init_weights([0]) # not used below
y_    = model(Xtf, tbeta, Ztf,tb)

cost    = tt.mean(tt.sqr(y - y_))
params  = [tbeta, tb]
updates = sgd(cost, params)

train = theano.function(inputs=[Xtf, Ztf, y], outputs=cost, updates=updates, allow_input_downcast=True)

for i in range(100000):
    sel = np.random.randint(0,nfixed[0],size=int(nfixed[0]/2))
    batch_xs, batch_zs, batch_ys = X[sel,:],Z[sel,:],Y[sel]
    train(batch_xs, batch_zs, batch_ys)
        
outdf = pd.DataFrame(data=tbeta.get_value(),columns=['beta'],index=Terms)
fe_params['theano'] = pd.Series(tbeta.get_value().flatten(), index=fe_params.index)
random_effects['theano'] = pd.Series(tb.get_value().flatten() ,index=random_effects.index)
print(fe_params)
                                                           LMM        PyMC  \
Intercept                                           689.537735  652.578745   
group[T.cp]                                         151.112622  168.266454   
orientation[T.upright]                              -64.601774  -62.159064   
identity[T.self]                                    -17.581122  -15.239280   
group[T.cp]:orientation[T.upright]                  -87.712115  -83.001840   
group[T.cp]:identity[T.self]                        -67.411021  -62.446566   
orientation[T.upright]:identity[T.self]             -20.020417  -21.808688   
group[T.cp]:orientation[T.upright]:identity[T.s...   36.768234   28.513699   

                                                        theano  
Intercept                                           653.084289  
group[T.cp]                                         162.572210  
orientation[T.upright]                              -66.649262  
identity[T.self]                                    -19.750673  
group[T.cp]:orientation[T.upright]                  -83.064418  
group[T.cp]:identity[T.self]                        -62.530412  
orientation[T.upright]:identity[T.self]             -16.959136  
group[T.cp]:orientation[T.upright]:identity[T.s...   29.054494  

Using TensorFlow

In [13]:
Y, X   = dmatrices(formula, data=tbltest, return_type='matrix')
Terms  = X.design_info.column_names
_, Z   = dmatrices('rt ~ -1+subj', data=tbltest, return_type='matrix')
X      = np.asarray(X) # fixed-effects design matrix
Z      = np.asarray(Z) # random-effects design matrix (one column per subject)
Y      = np.asarray(Y)
nfixed = np.shape(X)
nrandm = np.shape(Z)

X = X.astype(np.float32)
Z = Z.astype(np.float32)
Y = Y.astype(np.float32)

import tensorflow as tf
def init_weights(shape):
    return tf.Variable(tf.random_normal(shape, stddev=0.01))

def model(X, beta, Z, b):
    y_pred = tf.matmul(X, beta) + tf.matmul(Z, b)
    #randcoef = tf.matmul(Z, b)
    #Xnew     = tf.transpose(X) * tf.transpose(randcoef)
    #y_pred   = tf.matmul(tf.transpose(Xnew), beta)
    return y_pred # the same linear predictor as above: fixed effects X*beta plus random effects Z*b

Xtf   = tf.placeholder("float32", [None, nfixed[1]]) # create symbolic variables
Ztf   = tf.placeholder("float32", [None, nrandm[1]])
y     = tf.placeholder("float32", [None, 1])
beta  = init_weights([nfixed[1], 1])
b     = init_weights([nrandm[1], 1])
eps   = init_weights([0]) # not used below
y_    = model(Xtf, beta, Ztf, b)
# y_    = tf.nn.softmax(model(Xtf, beta) + model(Ztf, b) + eps)

# cost  = tf.reduce_mean(-tf.reduce_sum(y * tf.log(y_), reduction_indices=[1]))
cost  = tf.square(y - y_) # element-wise squared error; minimize() sums the gradients over the batch

train_step    = tf.train.GradientDescentOptimizer(0.0001).minimize(cost)

init          = tf.global_variables_initializer()

# Add ops to save and restore all the variables.