Let $X_1, X_2, \dots$ be i.i.d. random variables, each with mean $\mu$ and variance $\sigma^2$.
Let the sample mean be denoted by $\bar{X}_n = \frac{1}{n}\sum_{j=1}^{n} X_j$, where $n$ is the sample size.
What can we say about $\bar{X}_n$ when $n$ gets large?
Another way to think about the Law of Large Numbers is to see that
$$\lim_{n\to\infty} \bar{X}_n - \mu = 0$$
However, this is only thinking pointwise; it tells us nothing about the distribution of $\bar{X}_n$.
One way to study the distribution of $\bar{X}_n$ is to multiply $(\bar{X}_n - \mu)$ by some quantity that itself goes to $\infty$.
Consider:
$$n^c\,(\bar{X}_n - \mu)$$
We could learn more about the distribution of $\bar{X}_n$ by selecting some power $c > 0$ and thinking about what happens.
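A quick variance computation shows which power is the interesting one:
$$\operatorname{Var}\!\left(n^c(\bar{X}_n - \mu)\right) = n^{2c}\operatorname{Var}(\bar{X}_n) = n^{2c}\cdot\frac{\sigma^2}{n} = n^{2c-1}\sigma^2.$$
This has a finite, nonzero limit only when $2c - 1 = 0$, i.e. $c = \frac{1}{2}$: any smaller power collapses back to the pointwise limit $0$, and any larger power blows up.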
Here is the standardized version:
$$\frac{\sum_{j=1}^{n} X_j - n\mu}{\sqrt{n}\,\sigma} \to N(0,1)$$
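Note that this is the same quantity as before with $c = \frac{1}{2}$; dividing the numerator and denominator by $n$ gives
$$\frac{\sum_{j=1}^{n} X_j - n\mu}{\sqrt{n}\,\sigma} = \frac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sigma}.$$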
Let $S_n = \sum_{j=1}^{n} X_j$; we want to show $M\!\left[\frac{S_n}{\sqrt{n}}\right] \to M[N(0,1)]$, where we may assume without loss of generality that $\mu = 0$ and $\sigma = 1$ (otherwise, standardize each $X_j$ first).
Here are some quick facts about Moment Generating Functions to keep in mind as we work through the proof (a sketch of the standard ones, assuming the MGF $M(t) = E[e^{tX_1}]$ exists in a neighborhood of $0$):
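- For independent random variables, the MGF of a sum is the product of the MGFs, so $M_{S_n}(t) = (M(t))^n$.
- Scaling: $M_{X/c}(t) = M_X(t/c)$, so $M_{S_n/\sqrt{n}}(t) = \left(M\!\left(\frac{t}{\sqrt{n}}\right)\right)^n$.
- With $\mu = 0$ and $\sigma = 1$: $M(0) = 1$, $M'(0) = E[X_1] = 0$, and $M''(0) = E[X_1^2] = 1$.
- If MGFs converge pointwise near $0$, the corresponding distributions converge.

Taking logs and substituting $y = \frac{1}{\sqrt{n}}$, two rounds of L'Hôpital's rule (using $M(yt) \to 1$) give
$$\lim_{n\to\infty} n\log M\!\left(\frac{t}{\sqrt{n}}\right) = \lim_{y\to 0}\frac{\log M(yt)}{y^2} = \lim_{y\to 0}\frac{t\,M'(yt)}{2y\,M(yt)} = \lim_{y\to 0}\frac{t^2\,M''(yt)}{2} = \frac{t^2}{2}.$$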
But $\frac{t^2}{2}$ is the log of $e^{t^2/2}$, and that is the MGF of $N(0,1)$. QED.
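As a quick numerical sanity check (a minimal simulation sketch, assuming NumPy is available; the Bernoulli distribution and all parameter values below are arbitrary choices), we can draw many copies of the standardized sum and confirm they behave like $N(0,1)$:

```python
import numpy as np

# Sanity check of the CLT: S_n = X_1 + ... + X_n with X_j ~ Bern(p),
# so S_n ~ Bin(n, p); standardize S_n and compare against N(0, 1).
rng = np.random.default_rng(seed=0)
n, trials, p = 1_000, 100_000, 0.3
mu, sigma = p, np.sqrt(p * (1 - p))        # mean and sd of a single Bern(p) term

s_n = rng.binomial(n, p, size=trials)      # one draw of S_n per trial
z = (s_n - n * mu) / (np.sqrt(n) * sigma)  # standardized sums

print(z.mean(), z.std())                   # should be close to 0 and 1
print(np.mean(np.abs(z) <= 1))             # N(0,1) puts about 0.6827 within 1 sd
```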
Let $X \sim \text{Bin}(n,p)$, and think of $X = \sum_{j=1}^{n} X_j$, where the $X_j \sim \text{Bern}(p)$ are i.i.d.
By the Central Limit Theorem, we can approximate X with a Normal distribution if n is large enough, and if we standardize X first.
$$P(a \le X \le b) = P\!\left(\frac{a-np}{\sqrt{npq}} \le \frac{X-np}{\sqrt{npq}} \le \frac{b-np}{\sqrt{npq}}\right) \approx \Phi\!\left(\frac{b-np}{\sqrt{npq}}\right) - \Phi\!\left(\frac{a-np}{\sqrt{npq}}\right)$$
where $q = 1 - p$. Contrast the above Normal approximation of a Binomial with a Poisson approximation. With a Poisson approximation of a Binomial, we assumed that $n$ is large, $p$ is small, and $\lambda = np$ is moderate.
But in the case of a Normal approximation, while we do wish $n$ to be large, it is best if $p$ is close to $\frac{1}{2}$. Why?
Remember that the limiting Normal distribution in the CLT is symmetric about $\mu = 0$. If $p$ is too far from $\frac{1}{2}$, then the Binomial distribution gets very skewed, and a symmetric approximation fits it poorly. If $n$ is really, really, really large, then the CLT would still work no matter what $p$ might be, but you will need to be careful when $n$ is not that large.
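For instance (an illustrative computation with arbitrary values), take $X \sim \text{Bin}(100, \frac{1}{2})$, so $np = 50$ and $\sqrt{npq} = \sqrt{25} = 5$:
$$P(45 \le X \le 55) \approx \Phi\!\left(\frac{55-50}{5}\right) - \Phi\!\left(\frac{45-50}{5}\right) = \Phi(1) - \Phi(-1) \approx 0.6827.$$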
Now in the above example, we are approximating a discrete distribution using a continuous one.
What would we do if we instead started with something like P(X=a), where a is some integer?
$$P(X = a) = P(a - \epsilon \le X \le a + \epsilon)$$
…where $\epsilon$ is a small value that lets us look at a range centered at $a$ instead of the single value $a$; the standard choice is $\epsilon = \frac{1}{2}$ (the continuity correction), since consecutive integers are one unit apart. Now we can continue using the Normal approximation.
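Written out with $\epsilon = \frac{1}{2}$, this gives
$$P(X = a) \approx \Phi\!\left(\frac{a + \frac{1}{2} - np}{\sqrt{npq}}\right) - \Phi\!\left(\frac{a - \frac{1}{2} - np}{\sqrt{npq}}\right).$$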