In [24]:

```
import numpy as np
import pandas as pd
import patsy
import seaborn as sns
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from matplotlib import cm
import itertools
%matplotlib inline
sns.set_style("white")
```

In this lecture we will cover:

- interaction effects and what they mean
- link functions

In next week's lecture we will look at ANOVA and repeated measures designs.

An *interaction* between two variables is when the effect of one variable depends on (is modified by) another variable.

For example,

- the relationship between drug dosage and disease outcome *depends* on weight
- the effect of attentional load on reaction time *depends* on whether the observer is depressed or not

Interactions are ubiquitous in experimental designs in neuroscience, but they are not always tested appropriately.

First let's give a simple example of an interaction between two continuous variables.

A model with no interaction is a plane in a multidimensional space. Adding an interaction term lets the model prediction bend – that is, the influence of one variable on the response is no longer constant, but depends on the level of the other variable.

We can consider numerous types of interactions that make sense in the real world. The sign of the interaction's coefficient will determine the shape of the bend in the model surface.
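To see the bend numerically: with an interaction term, the slope with respect to one variable is itself a linear function of the other variable. A minimal sketch with illustrative coefficients (the same signs as the balloon example below):

```python
import numpy as np

# illustrative coefficients: positive main effects, negative interaction
b1, b2, b3 = 1.0, 1.0, -2.0

def z(x, y):
    # linear predictor with an interaction term
    return b1 * x + b2 * y + b3 * x * y

# the slope of z with respect to x is (b1 + b3 * y): it depends on y
slope_at = lambda y: z(1.0, y) - z(0.0, y)
print(slope_at(0.0))  # 1.0: at y = 0, increasing x increases z
print(slope_at(1.0))  # -1.0: at y = 1, the same change in x *decreases* z
```

With no interaction (`b3 = 0`) the slope would be the same constant `b1` at every level of `y`, and the surface would be a flat plane.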

One example of a strong interaction (from John Kruschke's book, *Doing Bayesian Data Analysis*) occurs for the *lift* (how fast an object rises) of lighter-than-air craft (i.e. balloons). For hot air balloons, lift increases as a function of fire.

For hydrogen balloons, lift increases as a function of hydrogen. But if we *combine* fire and hydrogen, we get a massive decrease in lift.

That is, the two *interact*. We can visualise this as a curved 3D surface:

In [25]:

```
betas = np.array([1, 1, -2.])
x = np.linspace(0, 1) # vector for x
y = np.linspace(0, 1) # vector for y
X, Y = np.meshgrid(x, y) # grid of X and Y
# the height Z is the sum of the weighted X and Ys (weighted by our betas):
Z = X*betas[0] + \
    Y*betas[1] + \
    X*Y*betas[2]
# create a matplotlib figure window, add 3D axes
# http://matplotlib.org/examples/mplot3d/subplot3d_demo.html
fig = plt.figure(figsize=(12, 4))
# fig = plt.figure(figsize=plt.figaspect(0.3))
# first view:
ax = fig.add_subplot(1, 3, 1, projection='3d')
ax.plot_surface(X, Y, Z, alpha=1, cmap=cm.coolwarm)
ax.view_init(elev=15, azim=20)
ax.set_xlabel('Fire')
ax.set_xticks([0, .5, 1])
ax.set_ylabel('Hydrogen')
ax.set_yticks([0, .5, 1])
ax.set_zlabel('Lift')
ax.set_zticks([])
# second view:
ax = fig.add_subplot(1, 3, 2, projection='3d')
ax.plot_surface(X, Y, Z, alpha=1, cmap=cm.coolwarm)
ax.view_init(elev=20, azim=55)
ax.set_xlabel('Fire')
ax.set_xticks([0, .5, 1])
ax.set_ylabel('Hydrogen')
ax.set_yticks([0, .5, 1])
ax.set_zlabel('Lift')
ax.set_zticks([])
# third view:
ax = fig.add_subplot(1, 3, 3, projection='3d')
ax.plot_surface(X, Y, Z, alpha=1, cmap=cm.coolwarm)
ax.view_init(elev=10, azim=80)
ax.set_xlabel('Fire')
ax.set_xticks([0, .5, 1])
ax.set_ylabel('Hydrogen')
ax.set_yticks([0, .5, 1])
ax.set_zlabel('Lift')
ax.set_zticks([])
plt.show()
```

While 3D plots can be nifty, usually it's nicer to visualise this in 2D as a coloured plane (equivalent to looking down on the surface above). Seaborn can fit interaction models between two continuous covariates in a single step, and plot:

In [26]:

```
# make a z vector
z = x*betas[0] + \
    y*betas[1] + \
    x*y*betas[2]
# put all the vectors into a new dataframe:
df = pd.DataFrame({'fire': x,
                   'hydrogen': y,
                   'lift': z})
# fit regression with interaction, plot:
# (note: interactplot was removed in seaborn 0.9, so this needs an older seaborn version)
sns.interactplot('fire', 'hydrogen', 'lift', df,
                 cmap="coolwarm", filled=True, levels=25);
```

How are interaction terms implemented in a design matrix? It's very simple: they are the *product* of single-variable design matrix columns. The interaction columns are then weighted by a coefficient, as for any other variable. Let's look at how this works, and what it means.

In [27]:

```
# import the dataset csv file into a Pandas dataframe object:
sleep = pd.read_csv('../Datasets/sleep.txt', sep='\t')
sleep = sleep.dropna(axis=0) # drop on axis 0 (i.e. rows)
# variable creation:
sleep['log_BodyWt'] = np.log(sleep['BodyWt'])
sleep['danger_cat'] = 'low' # set as a string, "low" for all.
sleep.loc[sleep['Danger'] > 3, 'danger_cat'] = 'high' # set values of 4 and 5 to "high"
sleep['life_cat'] = pd.qcut(sleep['LifeSpan'], [0, .33, .66, 1])
sleep['life_cat'] = sleep['life_cat'].cat.rename_categories(['short', 'med', 'long']) # rename categories
sleep['life_cat'] = sleep['life_cat'].astype('object') # make it an object, rather than a category
sleep.info()
```

Let's imagine that we want to know if the relationship between bodyweight and total sleep changes depending on the length of the animal's gestation.

In patsy formulae, we can include an interaction term using the "`:`" character. So `BodyWt:Gestation` is an interaction between those two variables. A shorthand for including both main effects (the variables alone) and their interaction is the "`*`" character. So writing `BodyWt * Gestation` in the formula is the same as writing `BodyWt + Gestation + BodyWt:Gestation`.
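We can check this expansion directly by asking patsy for the column names it generates. A toy dataframe suffices here, since the expansion depends only on the formula:

```python
import pandas as pd
import patsy

# toy stand-in data: any two numeric columns will do
toy = pd.DataFrame({'BodyWt': [1.0, 2.0, 3.0],
                    'Gestation': [10.0, 20.0, 30.0]})

# 'a * b' expands to both main effects plus the product term:
X = patsy.dmatrix('BodyWt * Gestation', toy)
print(X.design_info.column_names)
# ['Intercept', 'BodyWt', 'Gestation', 'BodyWt:Gestation']
```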

In [28]:

```
X = patsy.dmatrix('log_BodyWt * Gestation', sleep)
X
```

Out[28]:

So you can see we get a new column that's the product of the other two. Let's think about what this column means.
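We can also verify numerically that the interaction column patsy builds is exactly the elementwise product of the two main-effect columns (a toy dataframe again, since this holds for any data):

```python
import numpy as np
import pandas as pd
import patsy

# hypothetical toy data standing in for the sleep dataframe
toy = pd.DataFrame({'x1': [0.5, 1.0, 2.0, 3.0],
                    'x2': [10.0, 20.0, 5.0, 1.0]})
X = np.asarray(patsy.dmatrix('x1 * x2', toy))

# column 3 (the interaction) equals column 1 times column 2, row by row:
print(np.allclose(X[:, 3], X[:, 1] * X[:, 2]))  # True
```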

First, let's look at the relationship between these variables by using a `pairplot` (also called a scatterplot matrix).

In [29]:

```
# convert to pandas df so seaborn is happy:
tmp = pd.DataFrame(X, columns = ['intercept', 'log_bodyweight', 'gestation', 'interaction'] )
# plot pairplot:
sns.pairplot(tmp, vars=['log_bodyweight', 'gestation', 'interaction']);
```

Incidentally, note how `gestation` is also skewed (and consequently the interaction term, too). This would be a good candidate to log-scale as well, but for simplicity we'll leave it be.

The interaction term means that we're adding some value ($\beta_4 \times \mathrm{bodyweight} \times \mathrm{gestation}$) into the linear predictor. When both bodyweight and gestation are large, this value is large.

What happens when we fit this model?

In [30]:

```
fit_2 = smf.glm('TotalSleep ~ log_BodyWt * Gestation', sleep).fit()
print(fit_2.summary())
```

The interaction term is not significant (its confidence interval includes zero), so we could probably exclude it from the model. For demonstration, though, let's look at the model surface.
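One common way to decide whether the interaction earns its keep is to compare the fit of nested models, with and without the product term. A sketch using plain least squares on synthetic data (a stand-in for the Gaussian GLM fits above; all data and names here are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
# synthetic response with *no* true interaction:
y = 2.0 + 1.5 * x1 - 0.5 * x2 + rng.normal(scale=1.0, size=n)

def rss(X, y):
    """Residual sum of squares of a least-squares fit."""
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

ones = np.ones(n)
X_add = np.column_stack([ones, x1, x2])            # additive model
X_int = np.column_stack([ones, x1, x2, x1 * x2])   # + interaction term

rss_add, rss_int = rss(X_add, y), rss(X_int, y)
# the richer model always fits at least as well in-sample;
# an F-statistic asks whether the improvement is larger than chance:
F = (rss_add - rss_int) / (rss_int / (n - 4))
print(rss_int <= rss_add, F >= 0.0)
```

When the interaction buys almost nothing (as here, by construction), the F-statistic is small and the simpler additive model is preferred.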

In [31]:

```
x = np.linspace(sleep['log_BodyWt'].min(), sleep['log_BodyWt'].max()) # vector for x (log bodyweight)
y = np.linspace(sleep['Gestation'].min(), sleep['Gestation'].max()) # vector for y (gestation)
X, Y = np.meshgrid(x, y) # grid of X and Y
# the height Z is the sum of the weighted X and Ys (weighted by our betas):
Z = fit_2.params[0] + \
    X*fit_2.params[1] + \
    Y*fit_2.params[2] + \
    X*Y*fit_2.params[3] # note the addition of the interaction term here!
# create a matplotlib figure window, add 3D axes
# http://matplotlib.org/examples/mplot3d/subplot3d_demo.html
fig = plt.figure(figsize=(12, 4))
# first view:
ax = fig.add_subplot(1, 3, 1, projection='3d')
ax.plot_surface(X, Y, Z, cmap=cm.coolwarm)
ax.view_init(elev=10, azim=20)
ax.set_xlabel('log bodyweight (kg)')
ax.set_ylabel('gestation (days)')
ax.set_zlabel('Total sleep (hours)')
# second view:
ax = fig.add_subplot(1, 3, 2, projection='3d')
ax.plot_surface(X, Y, Z, cmap=cm.coolwarm)
ax.view_init(elev=10, azim=45)
ax.set_xlabel('log bodyweight (kg)')
ax.set_ylabel('gestation (days)')
ax.set_zlabel('Total sleep (hours)')
# third view:
ax = fig.add_subplot(1, 3, 3, projection='3d')
ax.plot_surface(X, Y, Z, cmap=cm.coolwarm)
ax.view_init(elev=10, azim=70)
ax.set_xlabel('log bodyweight (kg)')
ax.set_ylabel('gestation (days)')
ax.set_zlabel('Total sleep (hours)')
plt.show()
```

In [32]:

```
sns.interactplot("log_BodyWt", "Gestation", "TotalSleep", sleep,
                 cmap="coolwarm", filled=True, levels=25);
```

We saw in the last lecture how categorical variables (e.g. gender, religion) can be included into design matrices, using *dummy coding*. How are interaction terms calculated for dummy coded variables?

It's exactly the same: the product of the relevant design matrix columns. This allows the effect of one variable to depend on another. Let's look at an example in the context of a $2 \times 2$ design.

How far does a paper plane fly, depending on its design and the weight of the paper used?

Some undergraduate statistics students from Australia wanted to find out. So they ran a study in which they threw paper planes down a hallway in a university building.

We're going to use this dataset to demonstrate an interaction in a $2 \times 2$ factorial experiment. Note that for the purposes of this demo, we will ignore some of the variables in the dataset.

In [33]:

```
# read in the data, rename some values:
planes = pd.read_csv('../Datasets/planes.txt', sep='\t')
# column names to lowercase:
planes.columns = [c.lower() for c in planes.columns]
# turn "paper" variable into an object, not int64:
planes['paper'] = planes['paper'].astype(object)
planes.replace(to_replace={'paper': {1: '80',
                                     2: '50'},
                           'angle': {1: 'horz',
                                     2: '45deg'},
                           'plane': {1: 'sleek',
                                     2: 'simple'}},
               inplace=True)
# rename "plane" to "design":
planes = planes.rename(columns={'plane': 'design'})
print(planes)
```

In [34]:

```
fit = smf.glm('distance ~ design * paper', planes).fit()
print(fit.summary())
```

In this model, there are significant effects of plane design and paper weight, and also a significant interaction term. What do these mean?

First, let's look at the design matrix.

In [35]:

```
X = patsy.dmatrix('~ design * paper', planes)
X
```

Out[35]:

We have our intercept column of ones as before. The second and third columns are the "switches" for whether the plane is "sleek", and whether a paper weight of 80 grams was used.

So the intercept weight $\beta_1$ corresponds to the distance travelled when `design == simple` and `paper == 50`. $\beta_2$ is the *difference* in distance travelled between sleek and simple planes, when the paper weight is 50. $\beta_3$ is the difference in distance travelled between 50 gram and 80 gram paper, for simple planes.

What about the interaction column? We can see that again, this is just the product of the two simple effect columns. When both `design[T.sleek]` and `paper[T.80]` are 1, the interaction is 1. Otherwise, it's zero.

Let's think about what this means.

If we throw a sleek-designed plane, made of 80 gram paper, then we add the interaction coefficient $\beta_4$ to the prediction. Otherwise, no coefficient is added.
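To spell the arithmetic out, here are the four cell predictions assembled from the dummy switches, using hypothetical coefficient values (not the fitted ones); only the sleek/80-gram cell picks up $\beta_4$:

```python
# hypothetical coefficient values, purely for illustration:
b1, b2, b3, b4 = 5.0, 2.0, 1.0, -3.0  # intercept, sleek, 80g, interaction

def predict(sleek, heavy):
    """sleek and heavy are 0/1 dummy switches; the interaction is their product."""
    return b1 + b2 * sleek + b3 * heavy + b4 * sleek * heavy

print(predict(0, 0))  # simple, 50g: b1 alone           -> 5.0
print(predict(1, 0))  # sleek, 50g:  b1 + b2            -> 7.0
print(predict(0, 1))  # simple, 80g: b1 + b3            -> 6.0
print(predict(1, 1))  # sleek, 80g:  b1 + b2 + b3 + b4  -> 5.0
```

Note how, with a negative $\beta_4$, the sleek/heavy cell ends up *below* what the two main effects alone would predict: that is the interaction.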

Let's visualise the data and the model predictions using Seaborn. To do this, we need to merge some data frames together so that we can plot the predictions alongside the data: Seaborn plotting commands accept one *and only one* data frame. There are probably other ways to make this plot; this approach comes from the answer to my question here.

In [36]:

```
# make a data frame containing the predictions, using the expand grid function.
# define expand grid function:
def expand_grid(data_dict):
    """ A port of R's expand.grid function for use with Pandas dataframes.
    from http://pandas.pydata.org/pandas-docs/stable/cookbook.html?highlight=expand%20grid
    """
    rows = itertools.product(*data_dict.values())
    return pd.DataFrame.from_records(rows, columns=data_dict.keys())

# build a new matrix with expand grid:
preds = expand_grid({'paper': ['80', '50'],
                     'design': ['sleek', 'simple']})
preds_2 = preds.copy()
preds['distance'] = fit.predict(preds)
# also fit a model with no interaction for comparison:
fit_2 = smf.glm('distance ~ paper + design', planes).fit()
preds_2['distance'] = fit_2.predict(preds_2)
# to plot model predictions against data, we should merge them together:
# see answer (https://stackoverflow.com/questions/28239359/showing-data-and-model-predictions-in-one-plot-using-seaborn-and-statsmodels)
merged = pd.concat(dict(data=planes,
                        additive=preds_2,
                        interaction=preds),
                   names=['type']).reset_index()
print(merged)
```

In [37]:

```
# plot with seaborn:
g = sns.factorplot('design', 'distance', 'paper',
                   data=merged,
                   col='type',
                   kind='point',
                   size=3.5)
g.fig.subplots_adjust(wspace=0.3)
sns.despine(offset=10);
```

In [38]:

```
print(fit_2.summary())
```

The three panels above show the data (with bootstrapped 95% confidence intervals around the means), the predictions of the additive model and the predictions of the interaction model.

By comparing the above plots (predictions with and without an interaction), it becomes clear that the effect of paper depends strongly on the plane design. That is, the relationship between design and distance depends on the paper weight.

This is also a good example of when it's not very meaningful to talk about a significant main effect if an interaction involving that factor is significant.

For example, in the interaction model, there is a significant simple effect of design such that the sleek design travels further on average than the simple design (for a paper weight of 50). However, the interaction means that the sleek design with heavy paper actually travels a shorter distance than the simple design (whether the simple plane is made of heavy or light paper). So writing something like "the sleek design travels further than the simple design, since we had a significant effect of design" is misleading.

What happens when we include an interaction term between a categorical and a continuous variable? In the design matrix, it's exactly the same operation: we take the products of the relevant columns.

To demonstrate what this means, let's return to the sleep dataset, and consider an interaction between bodyweight and our categorical danger variable.

In [39]:

```
X = patsy.dmatrix('~ log_BodyWt * danger_cat', sleep)
X
```

Out[39]:

This model can be written as:

$\hat y = \beta_1 + \beta_2 \mathrm{danger} + \beta_3 \mathrm{bodyweight} + \beta_4 \mathrm{interaction}$

So when `danger` is "high", our model reduces to the intercept plus the bodyweight term:

$\hat y = \beta_1 + \beta_3 \mathrm{bodyweight}$

Whereas when `danger` is "low", we

- add $\beta_2$ to the predictions, irrespective of bodyweight (i.e. a change in intercept)
- change the slope of bodyweight by $\beta_4$

So what our interaction term effectively does is allow the `danger = low` condition to have a different intercept *and* a different slope than the `danger = high` condition.
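With hypothetical numbers in place of the $\beta$s (not the fitted values), the two groups' lines fall out directly: the high-danger line has intercept $\beta_1$ and slope $\beta_3$, while the low-danger line has intercept $\beta_1 + \beta_2$ and slope $\beta_3 + \beta_4$:

```python
# hypothetical coefficients, purely for illustration:
b1, b2, b3, b4 = 10.0, 4.0, -0.5, -0.3  # intercept, low-dummy, bodyweight slope, interaction

def yhat(bodyweight, low):
    # low is the 0/1 dummy for danger_cat == 'low'
    return b1 + b2 * low + b3 * bodyweight + b4 * low * bodyweight

# intercept and slope of each group's line:
print(yhat(0.0, 0), yhat(1.0, 0) - yhat(0.0, 0))  # high danger: b1, b3
print(yhat(0.0, 1), yhat(1.0, 1) - yhat(0.0, 1))  # low danger: b1 + b2, b3 + b4
```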

What happens when we fit this model?

In [40]:

```
fit_2 = smf.glm('TotalSleep ~ log_BodyWt * danger_cat', sleep).fit() # interaction model
print(fit_2.summary())
```

We can see that the z-test for the coefficient of the interaction term in the table is not significant. This means that from these data, we can't conclude that the relationship between bodyweight and sleep depends on danger. That is, the slopes of the lines for the low and high danger groups do not significantly differ.

In [41]:

```
# Plot the two models against each other.
# we do this slightly differently to the categorical examples above,
# due to the need for some data re-arranging to plot the data as points
# and the model predictions as lines.
preds = expand_grid(
    {'log_BodyWt': np.linspace(-6, 9, 2),
     'life_cat': ['short', 'med', 'long'],
     'danger_cat': ['low', 'high']})
fit = smf.glm('TotalSleep ~ log_BodyWt + danger_cat', sleep).fit() # additive model
fit_2 = smf.glm('TotalSleep ~ log_BodyWt * danger_cat', sleep).fit() # interaction model
# add model predictions to data frames:
preds['yhat'] = fit.predict(preds)
preds['yhat_2'] = fit_2.predict(preds)
merged = sleep.append(preds)  # note: DataFrame.append was removed in pandas 2.0; use pd.concat([sleep, preds]) there
```

In [42]:

```
# Plot additive model:
g = sns.FacetGrid(merged, hue='danger_cat', size=4)
# use the `map` method to add stuff to the facetgrid axes:
g.map(plt.scatter, "log_BodyWt", "TotalSleep")
g.map(plt.plot, "log_BodyWt", "yhat")
sns.despine(offset=10)
g.set_axis_labels('Bodyweight [log kg]', 'Sleep [hours]')
g.add_legend(title='Danger')
```

In [43]:

```
# plot interaction model:
g = sns.FacetGrid(merged, hue='danger_cat', size=4)
# use the `map` method to add stuff to the facetgrid axes:
g.map(plt.scatter, "log_BodyWt", "TotalSleep")
g.map(plt.plot, "log_BodyWt", "yhat_2")
sns.despine(offset=10)
g.set_axis_labels('Bodyweight [log kg]', 'Sleep [hours]')
g.add_legend(title='Danger')
```

Interactions let you model how the value of one variable changes the relationship between other variables. That is, we ask "does the relationship between X and Y *depend on* Z?"

In terms of the design matrix, interaction terms are simply the products of the relevant columns.
