Design of Experiments

Unit 18, Lecture 1

Numerical Methods and Statistics

Prof. Andrew White, April 30, 2019


  1. Know the vocabulary (treatment condition, factor, level, response, ANOVA, coding, factorial design, interaction, confound, grand mean, nuisance factor, blocking)
  2. Know that design of experiments and its analysis is for seeing what factors affect a response, not necessarily getting good regression models
  3. Recognize that design of experiments analysis is based on linear regression and hypothesis tests
  4. Be able to read and interpret an ANOVA table
  5. Be able to read and create a table of experiments following factorial or other designs
  6. Understand how to treat unknown nuisance factors (randomize experiment order) and known nuisance factors (blocking)

Design of experiments: (Wikipedia)

The design of experiments is the design of any task that aims to describe or explain the variation of information under conditions that are hypothesized to reflect the variation.

Table of Experiments

Let's see an example of a design of experiments table:

TC $X_1$ $X_2$ $Y$
1 +1 +1 $y_1$
2 +1 -1 $y_2$
3 -1 +1 $y_3$
4 -1 -1 $y_4$
  • TC: the treatment condition
  • $X$: a factor
  • +1: the factor level
  • $Y$: the response
  • The use of +1, -1 is called the coding

This table shows a 2 factor, 2 level design of experiments that has 4 treatment conditions.
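A table like this can be generated rather than typed by hand. The following is a minimal sketch using `itertools.product`; the names `levels` and `design` are illustrative and are not part of the lecture's own code.

```python
import itertools

levels = [+1, -1]   # coded levels for each factor
n_factors = 2       # X_1 and X_2

# every combination of levels, one row per treatment condition
design = list(itertools.product(levels, repeat=n_factors))

print('TC  X_1  X_2')
for tc, (x1, x2) in enumerate(design, start=1):
    print(f'{tc:<3} {x1:+d}   {x2:+d}')
```

The rows come out in the same order as the table above: (+1, +1), (+1, -1), (-1, +1), (-1, -1).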

Factor Levels

What is the meaning of the +1, -1? We do design of experiments to see if factors affect something. For example, our response might be the concentration of a chemical species and the factors could be temperature and pressure. Because there are many temperatures to test, we might consider only two temperatures: hot and cold. These can be coded as levels: -1, +1. This is often done because there are standard analysis equations that work with integer levels, especially with two levels.
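As a sketch of how the coding works, a real factor value can be mapped to a coded level by subtracting the midpoint of the two settings and dividing by half their range. The temperatures below are made-up values used only for illustration.

```python
import numpy as np

# hypothetical settings (assumed for illustration): cold = 300 K, hot = 350 K
temperature = np.array([350.0, 350.0, 300.0, 300.0])

low, high = temperature.min(), temperature.max()
midpoint = (high + low) / 2
half_range = (high - low) / 2

# coded level: -1 at the low setting, +1 at the high setting
coded = (temperature - midpoint) / half_range
print(coded)
```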

If we regress against these integer levels, the regression coefficients aren't really meaningful. Instead, we care about p-values. That is, we care about discovering if certain factors affect our response. This will allow us to say "temperature affects the concentration" or "pressure does not affect concentration".


Note that our experimental design doesn't include replicates. The design of experiments is meant to be as efficient as possible. Note that here we're trying to see what matters, and not trying to get an accurate regression model. If you want to do regression for accuracy, then you should include replicates and work with actual factor values instead of levels.

Connecting to Categorical Regression

We saw in unit 12, lecture 3 how to treat discrete data like this. Let's try regressing it! The data is 2 dimensional, so we will use 2 dimensional least squares. Should we include an intercept? Yes! One way to include it is to compute the grand mean of all responses and subtract it from each response so that the responses are centered at 0. Then the intercept will be 0. You should know this is commonly done, but we won't do this for our analysis. We'll just use a regular intercept as we saw in our regression unit.
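To see what centering on the grand mean does, here is a small sketch that fits the example data used in this section twice, once with the raw responses and once with centered responses. The names `y_centered`, `beta_raw` and `beta_centered` are illustrative.

```python
import numpy as np

x1 = np.array([1, 1, -1, -1])
x2 = np.array([1, -1, 1, -1])
y = np.array([1.2, 3.2, 4.1, 3.6])

# center the responses by subtracting the grand mean
grand_mean = np.mean(y)
y_centered = y - grand_mean

# fit both versions with an intercept column to compare the coefficients
x_mat = np.column_stack((np.ones(4), x1, x2))
beta_raw, *_ = np.linalg.lstsq(x_mat, y, rcond=None)
beta_centered, *_ = np.linalg.lstsq(x_mat, y_centered, rcond=None)

print('grand mean:', grand_mean)
print('uncentered fit:', np.round(beta_raw, 3))
print('centered fit:', np.round(beta_centered, 3))
```

For this balanced design the uncentered intercept equals the grand mean, and centering moves it to 0 while leaving the two factor coefficients unchanged.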

In [26]:
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
import scipy.stats as ss
import numpy.linalg as linalg

x1 = [1, 1, -1, -1]
x2 = [1, -1, 1, -1]
y = [1.2, 3.2, 4.1, 3.6]

We'll use multidimensional ordinary least squares with an intercept:

In [36]:
x_mat = np.column_stack((np.ones(4), x1, x2))
x_mat
array([[ 1.,  1.,  1.],
       [ 1.,  1., -1.],
       [ 1., -1.,  1.],
       [ 1., -1., -1.]])

We'll compute our coefficients and their standard errors:

In [44]:
beta, *_ = linalg.lstsq(x_mat, y, rcond=None)
y_hat = x_mat @ beta
resids = (y - y_hat)
SSR = np.sum(resids**2)
# residual variance: SSR divided by (number of observations - number of coefficients)
se2_epsilon = SSR / (len(y) - len(beta))
# covariance matrix of the coefficients
se2_beta = se2_epsilon * linalg.inv(x_mat.transpose() @ x_mat)
print(np.sqrt(se2_beta), np.sqrt(se2_epsilon))
[[0.625 0.    0.   ]
 [0.    0.625 0.   ]
 [0.    0.    0.625]] 1.25

Now we can compute p-values and confidence intervals:

In [47]:
df = len(y) - len(beta)
print('df = ', df)
# T-value for the 95% confidence interval; it is the same for every coefficient
T = ss.t.ppf(0.975, df)
for i in range(len(beta)):
    # width of the confidence interval from the previously computed standard error
    cwidth = T * np.sqrt(se2_beta[i,i])
    # T statistic for the null hypothesis that this coefficient is 0
    hypothesis_T = -abs(beta[i]) / np.sqrt(se2_beta[i,i])
    # two-sided p-value using the residual degrees of freedom
    p = 2 * ss.t.cdf(hypothesis_T, df)
    print(f'beta_{i} is {beta[i]:.2f} +/- {cwidth:.2f} with 95% confidence. p-value: {p:.2f} (T = {hypothesis_T:.2f})')
df =  1
beta_0 is 3.02 +/- 7.94 with 95% confidence. p-value: 0.13 (T = -4.84)
beta_1 is -0.83 +/- 7.94 with 95% confidence. p-value: 0.41 (T = -1.32)
beta_2 is -0.38 +/- 7.94 with 95% confidence. p-value: 0.66 (T = -0.60)

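So we found that none of the coefficients is significant at the 0.05 level: the intercept has p = 0.13, and the two factors have p = 0.41 and p = 0.66. With only one residual degree of freedom, the confidence intervals are very wide. Judging from the p-values, factor 1 appears more important than factor 2.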

Using Statsmodels for Regression

We're going to be using a new library to do regression in this unit because of its ability to do ANOVA. We'll learn about ANOVA below, but let's first repeat the above regression with this tool. Creating a statsmodels model requires two ingredients: data and a formula. The formula is a string that matches your regression model. In this case we use y ~ x1 + x2. The ~ here means "is modeled by": the response goes on the left and the terms of the regression go on the right, and an intercept is included automatically. The data should be a dictionary whose keys match the variables you used in your formula. Thus doing data['y'] should give the y vector. The statsmodels regression is created by calling ols, and then we must call fit() to do the regression and summary() to get a report on the results.

In [42]:
from statsmodels.formula.api import ols
x1 = [1, 1, -1, -1]
x2 = [1, -1, 1, -1]
y = [1.2, 3.2, 4.1, 3.6]
data = {'x1': x1, 'x2': x2, 'y': y}

model = ols('y ~ x1 + x2', data=data).fit()
model.summary()
/opt/conda/lib/python3.7/site-packages/statsmodels/stats/ ValueWarning: omni_normtest is not valid with less than 8 observations; 4 samples were given.
  "samples were given." % int(n), ValueWarning)
OLS Regression Results
Dep. Variable:     y                 R-squared:           0.678
Model:             OLS               Adj. R-squared:      0.033
Method:            Least Squares     F-statistic:         1.051
Date:              Thu, 02 May 2019  Prob (F-statistic):  0.568
Time:              08:52:10          Log-Likelihood:      -3.7957
No. Observations:  4                 AIC:                 13.59
Df Residuals:      1                 BIC:                 11.75
Df Model:          2
Covariance Type:   nonrobust

              coef  std err       t  P>|t|  [0.025  0.975]
Intercept   3.0250    0.625   4.840  0.130  -4.916  10.966
x1         -0.8250    0.625  -1.320  0.413  -8.766   7.116
x2         -0.3750    0.625  -0.600  0.656  -8.316   7.566

Omnibus:        nan    Durbin-Watson:     2.000
Prob(Omnibus):  nan    Jarque-Bera (JB):  0.667
Skew:           0.000  Prob(JB):          0.717
Kurtosis:       1.000  Cond. No.          1.00

[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Interpreting Statsmodels

This regression summary has a huge amount of information. The top table includes information about the goodness of fit and the regression model, like the degrees of freedom and what the dependent variable is. The middle table contains information about the regression coefficients, including confidence intervals and p-values. The final table contains information about the residuals. The Jarque-Bera test is a normality test, like the Shapiro-Wilk test we learned previously. The coefficient p-values and confidence intervals match the values printed above (p = 0.13, 0.41 and 0.66); statsmodels uses the residual degrees of freedom (1 here) for these tests.


One of the most common analysis techniques for a design of experiments is the Analysis of Variance (ANOVA). An ANOVA breaks up the response variance into factor variances. It explains where the variance in the response comes from. We aren't going to go deeply into the theory of ANOVA, but it's important that you know how it's used and how to interpret it. An ANOVA is based on a linear regression like above, but it's a different way of computing p-values. The p-values are the most relevant output of an ANOVA.

Here's an ANOVA of the above example:

In [51]:
sm.stats.anova_lm(model)
           df  sum_sq  mean_sq       F    PR(>F)
x1        1.0  2.7225   2.7225  1.7424  0.412741
x2        1.0  0.5625   0.5625  0.3600  0.655958
Residual  1.0  1.5625   1.5625     NaN       NaN

The ANOVA table gives information about each factor. The df is the degrees of freedom used to model each factor. The sum_sq is the sum, over all observations, of the squared difference between the mean response at that observation's factor level and the grand mean response. The mean_sq is the sum_sq divided by the degrees of freedom. The F is an F-test statistic (like a T statistic from a t-test), computed as the factor's mean_sq divided by the residual mean_sq. The final column contains the p-value for the null hypothesis that the factor has no effect on the response.
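The columns of this table can be reproduced by hand. The sketch below is an illustration rather than how statsmodels computes them internally; it relies on the design being balanced, so that the factor sums of squares and the residual sum of squares add up to the total. The helper name `factor_sum_sq` is made up for this example.

```python
import numpy as np
import scipy.stats as ss

x1 = np.array([1, 1, -1, -1])
x2 = np.array([1, -1, 1, -1])
y = np.array([1.2, 3.2, 4.1, 3.6])
grand_mean = np.mean(y)

def factor_sum_sq(x, y):
    '''Sum over observations of (mean response at that level - grand mean)^2.'''
    total = 0.0
    for level in np.unique(x):
        group = y[x == level]
        total += len(group) * (np.mean(group) - np.mean(y))**2
    return total

ss_x1 = factor_sum_sq(x1, y)
ss_x2 = factor_sum_sq(x2, y)
# what is left of the total variation is the residual sum of squares
ss_total = np.sum((y - grand_mean)**2)
ss_resid = ss_total - ss_x1 - ss_x2
df_resid = len(y) - 3   # 4 observations, 3 fitted coefficients

for name, ss_factor in [('x1', ss_x1), ('x2', ss_x2)]:
    mean_sq = ss_factor / 1                # each two-level factor uses 1 degree of freedom
    F = mean_sq / (ss_resid / df_resid)
    p = ss.f.sf(F, 1, df_resid)            # probability of an F at least this large
    print(f'{name}: sum_sq = {ss_factor:.4f}, F = {F:.4f}, p = {p:.4f}')
```

The printed sums of squares, F values and p-values should agree with the x1 and x2 rows of the table.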


The F-test is an alternative to the t-tests we do to check whether regression coefficients are non-zero, and it works a little differently. One important idea of an F-test is that when we consider regression coefficients, we imagine our null model as being nested within the model we're considering. That means that the null hypothesis, the regression coefficient is zero, is a special case of the model we're considering, where the regression coefficient can be non-zero. An example of models that are not nested would be comparing $\beta \sin x$ with $\beta x^2$. There is no obvious way to nest one of these models in the other to create a null hypothesis. Notice that if you imagine the F-test exactly the same as the t-test (null is regression coefficient being 0), then you'll always have nested models.
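The nested-model idea can be sketched directly: fit the full model and a null model with the x1 coefficient removed, then compare their sums of squared residuals. This is an illustrative calculation on the example data; the helper `fit_ssr` and the variable names are made up for this sketch.

```python
import numpy as np
import scipy.stats as ss

x1 = np.array([1, 1, -1, -1])
x2 = np.array([1, -1, 1, -1])
y = np.array([1.2, 3.2, 4.1, 3.6])

def fit_ssr(x_mat, y):
    '''Least-squares fit; returns the sum of squared residuals.'''
    beta, *_ = np.linalg.lstsq(x_mat, y, rcond=None)
    return np.sum((y - x_mat @ beta)**2)

ones = np.ones(len(y))
full = np.column_stack((ones, x1, x2))   # model we are considering
null = np.column_stack((ones, x2))       # same model with the x1 coefficient fixed at 0

ssr_full = fit_ssr(full, y)
ssr_null = fit_ssr(null, y)

df_extra = full.shape[1] - null.shape[1]   # coefficients dropped in the null model
df_resid = len(y) - full.shape[1]

# how much worse the null model fits, relative to the residual variance of the full model
F = ((ssr_null - ssr_full) / df_extra) / (ssr_full / df_resid)
p = ss.f.sf(F, df_extra, df_resid)
print(f'F = {F:.4f}, p = {p:.4f}')
```

With a single coefficient dropped, F is the square of the T statistic for that coefficient (here $1.32^2 = 1.7424$), which is why the two tests give the same p-value.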

Designing Experiments

One at a time (bad example)

One common choice for designing experiments might be one at a time. In this approach you start from a baseline and vary each factor once, on its own. Let's see an example. Say you want to know how water, sun, and playing music affect plant growth. A one at a time design would look like this:

TC Water Sun Music Plant Growth
1 0 0 0 $y_1$
2 1 0 0 $y_2$
3 0 1 0 $y_3$
4 0 0 1 $y_4$

Notice that we have switched our level coding to $0$ and $1$. The choice is arbitrary, but it demonstrates better the idea of one at a time design. What is wrong with this design?

Water and sun will never be active at the same time, meaning that none of our experiments will actually have plant growth. As we discussed in unit 12, lecture 3, this means we're missing interactions. Look at the model equation we assume with one at a time:

$$ y = \beta_w x_w + \beta_s x_s + \beta_m x_m + \ldots + \epsilon $$

This is missing those interactions, like how the system changes when both water and sun are given to the plant. The correct model equation is:

$$ y = \beta_w x_w + \beta_s x_s + \beta_m x_m + \beta_{ws} x_{ws} + \beta_{wm} x_{wm} + \beta_{sm} x_{sm} + \beta_{wsm} x_{wsm} + \epsilon $$

To solve for all these regression coefficients, we need at least as many experiments as coefficients. This leads to...

Factorial Design

With a factorial design, we have one treatment condition for every combination of the factor levels. For our plant growth example, the experiments would look like:

TC Water Sun Music Plant Growth
1 0 0 0 $y_1$
2 1 0 0 $y_2$
3 0 1 0 $y_3$
4 0 0 1 $y_4$
5 1 1 0 $y_5$
6 0 1 1 $y_6$
7 1 0 1 $y_7$
8 1 1 1 $y_8$

The factorial design will have $L^K$ treatment conditions, where $L$ is the number of levels and $K$ is the number of factors. $2^3 = 8$ in this case. For comparison, one at a time has $1 + K(L - 1)$ treatment conditions, which is $1 + K = 4$ here.
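One way to see why the factorial design is needed is to compare the rank of the design matrix for the two designs. The sketch below builds the interaction columns as products of the factor columns and adds an intercept column (an assumption here, matching what statsmodels does by default); `design_matrix` is a helper name made up for this example.

```python
import itertools
import numpy as np

def design_matrix(runs):
    '''Columns: intercept, w, s, m, w*s, w*m, s*m, w*s*m.'''
    runs = np.array(runs, dtype=float)
    w, s, m = runs[:, 0], runs[:, 1], runs[:, 2]
    return np.column_stack((np.ones(len(runs)), w, s, m, w*s, w*m, s*m, w*s*m))

# one at a time: baseline plus each factor switched on alone
one_at_a_time = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
# full factorial: every combination of the 2 levels for the 3 factors
factorial = list(itertools.product([0, 1], repeat=3))

print('factorial treatment conditions:', len(factorial))
print('rank, one at a time:', np.linalg.matrix_rank(design_matrix(one_at_a_time)))
print('rank, factorial:', np.linalg.matrix_rank(design_matrix(factorial)))
```

In the one at a time design every interaction column is all zeros, so only 4 of the 8 coefficients can be determined. The factorial design gives a full-rank matrix, so every coefficient can be solved for.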

Factorial Analysis Example

Let's consider the following example data. The plant growth is in grams. We have two replicates at each condition in this example.

In [57]:
xw = [0, 1, 0, 0, 1, 0, 1, 1]
xs = [0, 0, 1, 0, 1, 1, 0, 1]
xm = [0, 0, 0, 1, 0, 1, 1, 1]
y = [0.4, 0.3, 0.3, 0.2, 4.6, 0.3, 0.2, 5.2, 0.3, 0.2, 0.4, 0.3, 5.0, 0.3, 0.3, 5.0]

# we do xw +  xw because we have 2 replicates at each condition
data = {'xw': xw + xw, 'xs': xs + xs, 'xm': xm + xm, 'y': y}

model = ols('y~xw + xs + xm + xw * xs + xw * xm + xs * xm + xw * xm * xs', data=data).fit()
sm.stats.anova_lm(model, typ=2)
             sum_sq   df        F        PR(>F)
xw        20.930625  1.0  1339.56  3.404548e-10
xs        22.325625  1.0  1428.84  2.633625e-10
xm         0.005625  1.0     0.36  5.651101e-01
xw:xs     21.855625  1.0  1398.76  2.866344e-10
xw:xm      0.050625  1.0     3.24  1.095530e-01
xs:xm      0.030625  1.0     1.96  1.990794e-01
xw:xm:xs   0.015625  1.0     1.00  3.465935e-01
Residual   0.125000  8.0      NaN           NaN

Fractional Factorial

Because a factorial design requires so many treatment conditions, people will often neglect some of the interaction effects and leave them confounded. For example, if you have 3 levels and 5 factors, there will be 11 treatment conditions for measuring the treatment effects without interactions and 232 more for measuring interactions. If we study only a fraction of these, we can greatly reduce the number of experiments. The choice of how to reduce the number of experiments is a complex topic, but essentially any design between one at a time and factorial is fractional factorial.
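The counts above follow from the formulas for factorial and one at a time designs. The sketch below checks them and then shows one small illustration of confounding. The half fraction used here (setting the third factor equal to the product of the first two, with -1, +1 coding) is an assumption chosen for illustration; the lecture does not prescribe how to pick the fraction.

```python
import itertools
import numpy as np

# counting treatment conditions for L = 3 levels and K = 5 factors
L, K = 3, 5
n_factorial = L**K                   # every combination of levels
n_one_at_a_time = 1 + K * (L - 1)    # baseline plus each other level of each factor
print(n_factorial, n_one_at_a_time, n_factorial - n_one_at_a_time)

# an illustrative half fraction of a 2 level, 3 factor design:
# choose x1 and x2 freely, then set x3 = x1 * x2
half = np.array([(x1, x2, x1 * x2)
                 for x1, x2 in itertools.product([1, -1], repeat=2)])
print(half)

# the x3 column equals the x1:x2 interaction column, so the two are confounded
interaction_12 = half[:, 0] * half[:, 1]
print(np.array_equal(half[:, 2], interaction_12))
```

This half fraction uses 4 treatment conditions instead of 8, at the cost of not being able to tell the effect of the third factor apart from the interaction of the first two.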

Nuisance Factors

Often we have factors that are known but not interesting. For example, we might be conducting experiments on Monday and Wednesday. That is a factor, but not one we're interested in. A common example is gender in drug trials. We know gender may affect response to a drug, but we don't want to study it. There are a variety of ways to deal with these.

Unquantifiable Nuisance Factors

To remove unknown nuisance factors, or nuisance factors that are hard to quantify, you can randomize your order of experiments. This gives some robustness to the possibility of unknown nuisance factors affecting your experiments. For example, if you get better at an experiment so that you conduct it more accurately and precisely as time goes on, this is a hard to quantify nuisance factor. If you randomize your order though, your improving accuracy will not indirectly affect your conclusions about the treatment conditions.
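Randomizing the run order takes only a few lines. This is a minimal sketch using NumPy's random generator on the 2 factor, 2 level table from the start of the lecture; the seed is arbitrary and only there so the example is reproducible.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # arbitrary seed, only for reproducibility

# the 2 factor, 2 level table of experiments: (TC, X_1, X_2)
table = [(1, +1, +1), (2, +1, -1), (3, -1, +1), (4, -1, -1)]

# shuffle the order in which the treatment conditions are actually run
run_order = rng.permutation(len(table))
for position, idx in enumerate(run_order, start=1):
    tc, x1, x2 = table[idx]
    print(f'run {position}: TC {tc} (X_1 = {x1:+d}, X_2 = {x2:+d})')
```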

Quantifiable Nuisance Factors

For nuisance factors which you can measure and sort into levels, you can use blocking to remove their effect. Blocking means arranging your experiments so that each block has the same make-up of nuisance factor levels.

For example, if I want to do a drug trial I need two blocks: the control and the block given the drug. If I want to remove gender, I will make each block have equal numbers of the two genders. Let's say you have 12 participants and 8 are women. Block I would be 4 women and 2 men. Block II would be 4 women and 2 men. My experiment would look like:

TC Block $X$ $Y$
1 I 0 $y_1$
2 II 1 $y_2$

This blocking means that the nuisance factor of gender does not vary when other factors vary.
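The assignment described above can be sketched as follows. The participant labels are hypothetical, and shuffling within each gender before splitting is an added assumption so that who ends up in which block is random.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # arbitrary seed, only for reproducibility

# 12 hypothetical participants: 8 women and 4 men
women = [f'W{i}' for i in range(1, 9)]
men = [f'M{i}' for i in range(1, 5)]

# shuffle within each gender, then split each gender evenly between the blocks
women = rng.permutation(women).tolist()
men = rng.permutation(men).tolist()
block_I = women[:4] + men[:2]     # control, X = 0
block_II = women[4:] + men[2:]    # given the drug, X = 1

print('Block I :', block_I)
print('Block II:', block_II)
```

Each block ends up with 4 women and 2 men, so gender is balanced between the control and the drug group.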