*Key ideas:* Poisson GEE, exchangeable and autoregressive covariance structures, longitudinal count data

In this notebook, we fit a Poisson GEE to the epilepsy data from the MASS library in R. The data come from a study in which the number of seizures was recorded on each of 236 subjects over a 16 week period, with a treatment introduced at the end of week 8. The seizure counts were binned as follows: weeks 1-8, 9-10, 11-12, 13-14, and 15-16. This is a longitudinal data set with five measurements per subject, and there is likely to be within-subject dependence.

In the analysis below, the total number of seizures during the first 8 weeks is treated as a baseline measure, and the effectiveness of the treatment is based on the seizure counts for weeks 9-16 (binned into 4 intervals that are each 2 weeks in duration).

More information about the data can be found here:

http://vincentarelbundock.github.io/Rdatasets/doc/MASS/epil.html

Here are the standard import statements:

In [1]:

```
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
import pandas as pd
from statsmodels.genmod.generalized_estimating_equations import GEE
from statsmodels.genmod.cov_struct import (Exchangeable,
Independence,Autoregressive)
from statsmodels.genmod.families import Poisson
```

The data are in a clean csv file and are easy to read into Python:

In [2]:

```
data = pd.read_csv("epil.csv")
print(data.head())
```

In [3]:

```
for period in range(1,5):
ii = data['period'] == period
x = np.sqrt(data['base'].loc[ii])
y = np.sqrt(data['y'].loc[ii])
plt.plot(x, y, 'o', color='rgbc'[period-1], label=str(period), alpha=0.5)
plt.xlabel("Baseline", size=17)
plt.ylabel("Follow-up", size=17)
plt.ylim(ymin=-0.5)
ha, lb = plt.gca().get_legend_handles_labels()
leg = plt.figlegend(ha, lb, "center right", numpoints=1, handletextpad=0.0001)
leg.draw_frame(False)
```

In [4]:

```
fam = Poisson()
ind = Independence()
model1 = GEE.from_formula("y ~ age + trt + base", "subject", data, cov_struct=ind, family=fam)
result1 = model1.fit()
print(result1.summary())
```

The scale parameter is estimated to be around 5:

In [5]:

```
result1.scale
```

Out[5]:

In [6]:

```
yg = model1.cluster_list(np.asarray(data["y"]))
ymn = [x.mean() for x in yg]
yva = [x.var() for x in yg]
plt.grid(True)
plt.plot(np.log(ymn), np.log(yva), 'o')
plt.plot([-2, 6], [-2, 6], '-', color='grey')
plt.xlabel("Log variance", size=17)
plt.ylabel("Log mean", size=17)
```

Out[6]:

In [7]:

```
fam = Poisson()
ex = Exchangeable()
model2 = GEE.from_formula("y ~ age + trt + base", "subject", data, cov_struct=ex, family=fam)
result2 = model2.fit()
print(result2.summary())
print(ex.summary())
```

In [8]:

```
time = np.kron(np.ones(data.shape[0]/4), (0,1,2,3))
fam = Poisson()
ar = Autoregressive()
ar.dep_params = 0.4
model3 = GEE.from_formula("y ~ age + trt + base", "subject", data, time=time, cov_struct=ar, family=fam)
result3 = model3.fit(start_params=result2.params)
print(result3.summary())
print(ar.summary())
print("scale=%.2f" % (result2.scale))
```

Comparing the autoregressive and exchangeable models, we see that the regression coefficients are quite similar. Nevertheless, we can probe the dependence structure a bit more by making a plot of the empirical correlation between residuals as a function of the time spacing. Here the time point values are 0, 1, 2, 3, which are the default integer time values for the four post-treatment measurements. The plot below suggests that the dependence decreases with time, so the autoregressive structure seems to be more suitable.

In [9]:

```
fig = result1.plot_isotropic_dependence()
plt.grid(True)
```

In [10]:

```
rslts = result3.params_sensitivity(0., 0.9, 5)
dp = [x.cov_struct.dep_params for x in rslts]
age_params = np.asarray([x.params[2] for x in rslts])
age_se = np.asarray([x.bse[2] for x in rslts])
plt.grid(True)
plt.plot(dp, age_params, '-', color='orange', lw=4)
plt.plot(dp, age_params + age_se, '-', color='grey')
plt.plot(dp, age_params - age_se, '-', color='grey')
plt.axvline(result3.cov_struct.dep_params, ls='--', color='black')
plt.xlabel("Autocorrelation parameter", size=14)
plt.ylabel("Age effect", size=14)
```

Out[10]: