We generally assume that any quantity measured has a “true” value, which is the result that we would get if we had a perfect measuring apparatus. However, devices may be poorly made, out of adjustment, subject to noise or other random effects, or difficult to read accurately. In addition, all devices give measurements to only a finite number of digits. These problems mean that uncertainty is an unavoidable part of the measurement process. Of course, you will try to reduce measurement uncertainty whenever possible, but you will never eliminate it. Your task is to estimate the size of the uncertainty carefully and to communicate the result clearly.

We will quantify uncertainty by specifying the 95% confidence interval, which is the range within which we are 95% confident that the “true value” would be found if we could measure it. This means that we expect that there is only one chance in 20 that the true value does not lie within the specified range. The conventional way of specifying this range is to state the measurement value plus or minus the uncertainty. For example, we might report that the length of an object is 25.2 ± 0.2 cm, which means that the measured value is 25.2 cm and the uncertainty is ±0.2 cm. This statement means that we are 95% confident that the measurement’s true value lies within the range 25.0 cm to 25.4 cm.

It is essential to know the uncertainty of a measured value in order to correctly interpret its meaning. For example, suppose that you measure the period of a pendulum to be 1.23 sec. Imagine that a theory predicts that the period should be 1.189275 sec. Is your result consistent with the theory or not? The answer to this question depends entirely on the uncertainty of your result. If your result has an uncertainty of ±0.05 s, then the true value of the measured period could quite easily be the same as the theoretical value. In this case, we say that the measurement is consistent with the theory. On the other hand, if the uncertainty in your result is ±0.01 s, then it is not very likely (less than a one-in-twenty chance) that the true value being measured is the same as the predicted value. In this case, we say that the measurement is not consistent with the theory. If both values being compared have uncertainties, we'll say that they are consistent if the ranges overlap (for example, 1.43 ± 0.05 s and 1.50 ± 0.04 s) or that they are not consistent if the ranges don't overlap (for example, 1.13 ± 0.05 s and 1.30 ± 0.06 s). In previous classes, you may have calculated the percent difference between measurement and theory, but that comparison isn't appropriate when you know the uncertainties.
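The overlap rule described above is easy to automate. Here is a minimal sketch in Python (the function name `consistent` and its argument layout are our own choices for illustration, not part of the text):

```python
def consistent(value1, unc1, value2, unc2):
    """Return True if the ranges value1 ± unc1 and value2 ± unc2 overlap."""
    return (value1 - unc1) <= (value2 + unc2) and (value2 - unc2) <= (value1 + unc1)

# The two examples from the text:
print(consistent(1.43, 0.05, 1.50, 0.04))  # ranges 1.38-1.48 and 1.46-1.54 overlap: True
print(consistent(1.13, 0.05, 1.30, 0.06))  # ranges 1.08-1.18 and 1.24-1.36 do not: False
```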

Systematic errors occur when a piece of equipment is improperly constructed, calibrated, or used, or when some physical process that you haven’t taken into account is going on in the experiment. As a somewhat contrived example of problematic equipment, suppose that you measured lengths with a meter stick that you failed to notice had been cut off at the 5 cm mark. All of your measured values would then be 5 cm too long. Some systematic errors resulting from equipment problems are relatively easy to identify once you have some reason to suspect they exist. You can compare your equipment to two similar pieces of equipment and see if they all give the same result for the measurement. If your device doesn’t agree with the other two, it probably has a problem. As an example of a systematic error due to a neglected physical process, suppose you were trying to measure the acceleration of gravity by timing the motion of a falling object. If you dropped a wadded-up ball of paper but neglected air resistance, you would find a much smaller value for the acceleration of gravity (*g*) than the accepted value.

You don’t normally include systematic errors in the uncertainty of a measurement. If you know that a systematic problem exists, you should fix the problem. In the meter-stick example above, you would use a complete meter stick or subtract 5 cm from all of your measurements. Systematic errors arising from unanticipated physics are harder to find, although they’re the source of many Nobel Prizes in physics. Unfortunately, no well-defined procedures exist for finding systematic errors.

This chapter is mainly focused on the analysis of random effects. It is commonly the case that repeated measurements of the same quantity do not yield the same values, but rather a spread of values. For example, you might determine the speed of sound by standing at a fairly large distance (like the length of the Quad) away from an object that simultaneously emits a flash of light and a loud sound when someone flips a switch. After measuring the distance to the source, you would measure the time between seeing the flash and hearing the sound. If you measure this interval five times, you are unlikely to get five identical results. For example, you might measure 0.54 s, 0.53 s, 0.55 s, 0.49 s, and 0.53 s. Why are these results different? In this case, the problem is that it is difficult for you to start and stop the stopwatch at exactly the right instant. No matter how hard you try, sometimes you will press the stopwatch button a bit too early and sometimes a bit too late. These unavoidable and essentially random measurement errors cause the results of successive measurements of the same quantity to vary.

Random errors are a feature of almost all measurement processes. Sometimes a measuring device is too crude to register such effects. For example, a stopwatch accurate to only one decimal place might read 0.5 s for each of the measurements in the case described above. But laboratory instruments are often chosen to be just sufficiently sensitive to register random effects. You might want as precise an instrument as possible, but there is no point in buying an instrument much more precise than the limit imposed by unavoidable random effects. For example, a hand-held timer like a stopwatch that reads to a hundredth of a second is better than one that registers to only a tenth, because it’s possible to reduce your random experimental uncertainty down to a few hundredths of a second by making enough measurements. But there would be no scientific point in making a stopwatch that reads to a thousandth of a second, because the added precision of the watch would be swamped by the scatter in the measurements resulting from its operation by a human being.

The point is that random effects will be an important factor in many of the measurements that you make in any scientific experiment. It should be clear that such effects increase the uncertainty in a measurement. In the stopwatch case, for example, the fact that different trials lead to results differing by several hundredths of a second implies that the uncertainty in any given measurement value is larger than the basic ± 0.01 s uncertainty imposed by the digital readout.

Often, you will make multiple measurements of the same quantity. In that case, you can use statistics to quantify the random error.

Suppose that a quantity $x$ is measured $N$ times giving values of $x_1, x_2, x_3, \ldots, x_N$. The best estimate of the true value of $x$ is the mean (or average) of the measurements, which is

\begin{equation} \bar{x} = \frac{x_1 + x_2 + x_3 + \ldots + x_N}{N} = \frac{1}{N}\sum_{i=1}^N x_i. \tag{2.1} \end{equation}

The standard deviation quantifies the precision of a set of data, or how spread out the data are. It is defined as

\begin{equation} \sigma = \sqrt{\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2 + \ldots + (x_N - \bar{x})^2}{N-1}} = \sqrt{\frac{1}{N-1}\sum_{i=1}^N (x_i - \bar{x})^2}. \tag{2.2} \end{equation}

Ignoring the fact that we divide by $N-1$ instead of $N$, the standard deviation is approximately the square root of the average of the squared deviations. The reason for dividing by $N-1$ can be understood by considering a single measurement ($N=1$). In that case, the numerator is zero because the single measurement is the same as the mean. If we divided by $N$, the standard deviation would be zero, which would wrongly suggest that the value was measured perfectly. With $N-1$ in the denominator, the standard deviation is instead undefined (zero divided by zero), which makes more sense: a single measurement tells us nothing about the spread.
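Equations 2.1 and 2.2 correspond directly to functions in Python's standard `statistics` module. As a sketch, here they are applied to the five stopwatch readings from the speed-of-sound example above:

```python
import statistics

times = [0.54, 0.53, 0.55, 0.49, 0.53]  # the five stopwatch readings, in seconds

x_bar = statistics.mean(times)    # equation 2.1
sigma = statistics.stdev(times)   # equation 2.2 (stdev divides by N - 1)

print(f"mean = {x_bar:.3f} s")                # 0.528 s
print(f"standard deviation = {sigma:.3f} s")  # 0.023 s
```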

The standard deviation is a measure of how far a typical measurement is from the mean. A small standard deviation indicates that the measurements are clustered closely around the mean. Another way of thinking about the standard deviation is as the range in which an additional measurement is likely to fall. For a large number of measurements, there is a 68% probability that another measurement would fall in the range between $\bar{x}-\sigma$ and $\bar{x}+\sigma$.

As described in the previous section, the standard deviation for a large data set is associated with a 68% probability of an additional measurement being within a standard deviation of the mean. It is conventional to define the uncertainty so that one measurement has a 95% probability of being within the uncertainty of the mean. This defines what is called a *95% confidence interval*. Clearly, if the uncertainty is defined this way, it must be larger than the standard deviation.

Mathematicians have shown that the uncertainty $U_1$ of one measurement is

\begin{equation} U_1 = t\sigma, \tag{2.3} \end{equation}

where $t$ is called the *Student t-value*. A list of $t$-values is given in the table below. Note that $t$ depends on the number of measurements $N$, approaching approximately 2 for large $N$. The table also shows that two measurements are not enough to produce a useful uncertainty, but five is a good compromise between reducing the uncertainty and the time spent making measurements.

$N$ | $t$ | $N$ | $t$
---|---|---|---
2 | 12.7 | 10 | 2.26
3 | 4.3 | 12 | 2.2
4 | 3.2 | 15 | 2.15
5 | 2.8 | 20 | 2.09
6 | 2.6 | 30 | 2.05
7 | 2.5 | 50 | 2.01
8 | 2.4 | 100 | 1.98
9 | 2.3 | $\infty$ | 1.96
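Equation 2.3 combined with the table can be sketched as a small lookup. The dictionary below simply transcribes the table; values of $N$ not listed would need interpolation, which we don't attempt here:

```python
import statistics

# Student t-values for a 95% confidence interval, transcribed from the table
T_VALUES = {2: 12.7, 3: 4.3, 4: 3.2, 5: 2.8, 6: 2.6, 7: 2.5, 8: 2.4,
            9: 2.3, 10: 2.26, 12: 2.2, 15: 2.15, 20: 2.09, 30: 2.05,
            50: 2.01, 100: 1.98}

def single_measurement_uncertainty(data):
    """U1 = t * sigma (equation 2.3)."""
    return T_VALUES[len(data)] * statistics.stdev(data)

times = [0.54, 0.53, 0.55, 0.49, 0.53]  # N = 5, so t = 2.8
print(f"U1 = {single_measurement_uncertainty(times):.3f} s")  # 0.064 s
```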

If random effects push a measured value of a quantity $x$ away from its “true value”, they are equally likely to make any given measured value higher than the true value as to make it lower. We can think of each measurement as being equal to the true value plus some random error. When calculating the mean using equation 2.1, the sum of $N$ measurements equals $N$ times the true value plus the sum of all the random errors. Since the random errors are as likely to be negative as positive, they tend to cancel each other out, so the sum of the $N$ measurements is likely to be very close to $N$ times the true value. Dividing the result by $N$ to get the mean thus yields a number that is likely to be close to the true value. This is why we use the mean as the best estimate of the true value.
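This cancellation can be demonstrated with a quick simulation. The "true" value of 1.2 s and the error size of 0.05 s below are made-up numbers for illustration:

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

true_value = 1.2  # hypothetical "true" period, in seconds
N = 10000

# Each measurement = true value + a random error that is as likely
# to be negative as positive
measurements = [true_value + random.gauss(0, 0.05) for _ in range(N)]

mean = sum(measurements) / N
print(f"mean of {N} simulated measurements = {mean:.4f} s")  # very close to 1.2 s
```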

We usually assume that the measurements are normally distributed, which means that the probability of a value being measured falls on a Gaussian (bell-shaped) curve centered on the mean. The more measurements $N$ that are taken, the better the location of the peak of the probability distribution (the mean) is known. According to mathematicians, the *uncertainty of the mean* is

\begin{equation} U_{\bar{x}} = \frac{t\sigma}{\sqrt{N}}. \tag{2.4} \end{equation}

Note that the uncertainty of the mean is smaller than the uncertainty $U_1$ of one measurement by a factor of $1/\sqrt{N}$.
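Putting the pieces together, here is a sketch of the full procedure applied to the stopwatch data, using the t-value for $N = 5$ from the table above:

```python
import math
import statistics

times = [0.54, 0.53, 0.55, 0.49, 0.53]  # seconds
N = len(times)
t = 2.8                                  # Student t-value for N = 5

sigma = statistics.stdev(times)
U1 = t * sigma               # uncertainty of one measurement, U1 = t * sigma
U_mean = U1 / math.sqrt(N)   # uncertainty of the mean = U1 / sqrt(N)

print(f"result: {statistics.mean(times):.3f} ± {U_mean:.3f} s")  # 0.528 ± 0.029 s
```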

When the measurement of a quantity is repeatable, you should take a set of measurements of the quantity, compute the mean of the set to get a best estimate for the measurement value, and compute the uncertainty of that mean. However, there are three cases where measurements are not repeatable. First, there are intrinsically unrepeatable measurements of one-time events. For example, imagine that you are timing the duration of a foot-race with a single stopwatch. When the first runner crosses the finish line, you stop the watch and look at the value registered. There is no way to repeat this measurement of this particular race duration. Second, some measurements are influenced by subjective human judgment, and this judgment can be influenced by knowledge of past results. In such cases, you could repeat the measurement process, but your knowledge of previous results would influence your judgment to such an extent that you couldn't be sure that those measurements would be randomly distributed. For example, imagine measuring the size of an object using a ruler. The first measurement that you take may be subject to random effects (for example, the way that you line up the ruler on the object, the orientation of your eye with respect to the ruler, and so on), but the subsequent measurements will be strongly influenced by your knowledge of the first value that you obtained. It is very difficult to be completely unbiased about such measurements after the first, particularly when it comes to reading linear scales: after reading a scale once, it is hard to avoid looking at it the same way subsequently. Finally, there can be an unvarying measurement whose value always comes out exactly the same when we do try to repeat it. This is a particular problem with digital readouts, and is an indication that any random errors that might influence the measurement are smaller than the precision of the instrument. In this case, we have to estimate the uncertainty from our knowledge of the instrument’s precision.

If the measurement is not repeatable, estimate the uncertainty of your single measurement based on previous experience, your common sense, or these two principles: (1) the uncertainty of a linear scale or dial is at least ± one tenth of the smallest division, and (2) the uncertainty of a digital readout is ± 1 in the final digit.

**2.1** Compute the mean and the standard deviation for the data set below.

0.56 s, 0.52 s, 0.59 s, 0.48 s, 0.51 s

**2.2** Imagine that you are one of ten different people who measure the time of flight of a thrown baseball. The ten measured times are listed below. Compute the standard deviation of the data set and estimate the uncertainty of your particular measurement.

2.53 s, 2.58 s, 2.67 s, 2.63 s, 2.59 s, 2.60 s, 2.62 s, 2.56 s, 2.66 s, 2.61 s

**2.3** For the measurements given in exercise 2.2, what are the mean and its uncertainty?