sampling
Randomly choosing a value such that the probability of picking any particular value is given by a probability distribution. This is known as sampling from the distribution. For example, here are 10 samples from a Bernoulli(0.7) distribution: false, true, false, false, true, true, true, false, true and true. If we took a very large number of samples from a Bernoulli(0.7) distribution then the percentage of the samples equal to true would be very close to 70%.
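
A minimal sketch of this in R, assuming base R's `rbinom` with `size = 1` as the Bernoulli sampler. The seed and the variable names are illustrative choices, not part of the original definition.

```r
set.seed(42)  # arbitrary seed, only so the run is reproducible

# A Bernoulli(0.7) draw is a binomial draw with a single trial (size = 1).
ten.samples <- rbinom(10, size = 1, prob = 0.7) == 1
print(ten.samples)

# With many samples, the fraction equal to TRUE settles near 0.7.
many.samples <- rbinom(100000, size = 1, prob = 0.7) == 1
prop.true <- mean(many.samples)
print(prop.true)
```

Taking the mean of a logical vector works because R treats TRUE as 1 and FALSE as 0.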

Sampling Distribution¶

A sampling distribution is the probability distribution of a statistic (such as the mean, sum, or variance) computed from a large number of samples of the same size drawn from a specific population.
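
As a rough sketch, a sampling distribution can be approximated by simulation: draw many samples, compute the statistic for each, and look at the resulting collection. The population, the sample size of 25, the 2,000 replications, and the choice of the median as the statistic are all assumptions made for illustration.

```r
set.seed(1)  # arbitrary seed for reproducibility

# Hypothetical population: 10,000 values from an exponential distribution.
population <- rexp(10000, rate = 1)

# Draw 2,000 samples of size 25 and compute the statistic (here, the median) for each.
sample.medians <- replicate(2000, median(sample(population, size = 25)))

# The 2,000 medians approximate the sampling distribution of the median.
hist(sample.medians, main = "Sampling Distribution\nof the Median")
summary(sample.medians)
```

Swapping `median` for another function such as `mean` or `var` gives the sampling distribution of that statistic instead.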

Central Limit Theorem¶

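The Central Limit Theorem (CLT) says that if you take many repeated samples of the same size from a population and calculate the average (or sum) of each one, the collection of those averages will be approximately normally distributed once the sample size is large enough. This holds regardless of the shape of the source distribution, as long as it has a finite variance.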

As a rule of thumb, the CLT approximation is reasonable for sample sizes of about n ≥ 30; strongly skewed populations may need larger samples.
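
For reference, the classical form of the theorem can be written as follows. The symbols are introduced here and are not from the original notes: X_1, ..., X_n are assumed to be independent draws from the same population, with mean \mu and finite variance \sigma^2.

```latex
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i,
\qquad
\frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \;\xrightarrow{d}\; \mathcal{N}(0, 1)
\quad \text{as } n \to \infty .
```

In words: for large n, the sample mean is approximately normal with mean \mu and standard deviation \sigma / \sqrt{n}.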

Central Limit Theorem applet
http://bioinformatics.cruk.cam.ac.uk/apps/stats/central-limit-theorem/

In [16]:

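y = runif(1000)            # population: 1000 draws from Uniform(0, 1)
par(mfrow = c(2, 1))       # stack the two histograms vertically
hist(y)                    # the source distribution is roughly flat
# Draw 1000 samples of size 3 (without replacement) and store each sample mean
out = numeric(1000)
for (i in 1:1000) {
  out[i] = mean(sample(y, size = 3))
}
hist(out)                  # distribution of the 1000 sample means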

Sampling Distributions Applet:
http://onlinestatbook.com/stat_sim/sampling_dist/index.html

In [17]:

sdm.sim <- function(n, src.dist = NULL, param1 = NULL, param2 = NULL) {
   r <- 10000  # Number of replications/samples - DO NOT ADJUST
   # This produces a matrix of observations with
   # n columns and r rows. Each row is one sample:
   my.samples <- switch(src.dist,
      "E" = matrix(rexp(n*r, param1), r),            # exponential (rate)
      "N" = matrix(rnorm(n*r, param1, param2), r),   # normal (mean, sd)
      "U" = matrix(runif(n*r, param1, param2), r),   # uniform (min, max)
      "P" = matrix(rpois(n*r, param1), r),           # Poisson (lambda)
      "B" = matrix(rbinom(n*r, param1, param2), r),  # binomial (size, prob)
      "G" = matrix(rgamma(n*r, param1, param2), r),  # gamma (shape, rate)
      "X" = matrix(rchisq(n*r, param1), r),          # chi-squared (df)
      "T" = matrix(rt(n*r, param1), r),              # Student's t (df)
      stop("src.dist must be one of E, N, U, P, B, G, X, T"))
   # Compute one statistic per row, i.e. per sample:
   all.sample.sums <- apply(my.samples, 1, sum)
   all.sample.means <- apply(my.samples, 1, mean)
   all.sample.vars <- apply(my.samples, 1, var)
   # Plot one raw sample next to the three sampling distributions:
   par(mfrow = c(2, 2))
   hist(my.samples[1,], col = "gray", main = "Distribution of One Sample")
   hist(all.sample.sums, col = "gray", main = "Sampling Distribution\nof the Sum")
   hist(all.sample.means, col = "gray", main = "Sampling Distribution\nof the Mean")
   hist(all.sample.vars, col = "gray", main = "Sampling Distribution\nof the Variance")
}

In [18]:

sdm.sim(50,src.dist="E",param1=1)
#source: https://qualityandinnovation.com/2015/03/30/sampling-distributions-and-central-limit-theorem-in-r/

z Score and t Values¶
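
A minimal sketch of how the two quantities are computed for a sample mean, using the standard textbook definitions (an assumption about what this section covers). The data values, the hypothesized mean `mu0`, and the known standard deviation `sigma` are made up for illustration.

```r
# Hypothetical sample of 8 measurements; the values are made up for illustration.
x <- c(5.1, 4.9, 5.6, 5.8, 5.0, 5.3, 5.7, 5.4)
n <- length(x)
mu0 <- 5.0      # assumed hypothesized population mean
sigma <- 0.4    # assumed known population standard deviation (needed for z)

# z score of the sample mean: uses the known population standard deviation.
z <- (mean(x) - mu0) / (sigma / sqrt(n))

# t value: same form, but the standard deviation is estimated from the sample.
t.value <- (mean(x) - mu0) / (sd(x) / sqrt(n))

# Cross-check the hand-computed t value against base R's one-sample t test.
t.check <- unname(t.test(x, mu = mu0)$statistic)
print(c(z = z, t = t.value, t.test = t.check))
```

The z score is compared against the standard normal distribution, while the t value is compared against a t distribution with n - 1 degrees of freedom.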

References
http://www.mbmlbook.com/MurderMystery.html
