Population: The population is the broader group of people to whom you intend to generalize the results of your study.
Sample: The sample is the group of individuals who participate in your study; it is always a subset of the population.
Statistic: A statistic is a characteristic of a sample, such as the sample mean.
Parameter: A parameter is a characteristic of a population, such as the population mean.
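To make the four terms concrete, here is a minimal R sketch. The population of heights, its mean of 170 and standard deviation of 10, and the sample size of 200 are all made-up values for illustration, not data from any study.

```r
set.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population: heights (cm) of 100,000 people (assumed values).
population <- rnorm(100000, mean = 170, sd = 10)

# The sample is a subset of the population, drawn without replacement.
my_sample <- sample(population, size = 200)

parameter <- mean(population)  # characteristic of the population
statistic <- mean(my_sample)   # characteristic of the sample

cat("Parameter (population mean):", parameter, "\n")
cat("Statistic (sample mean):    ", statistic, "\n")
```

In practice the parameter is usually unknown, and the statistic is what you compute to estimate it.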
Sampling
Randomly choosing a value such that the probability of picking any particular value is given by a probability distribution. This is known as sampling from the distribution. For example, here are 10 samples from a Bernoulli(0.7) distribution: false, true, false, false, true, true, true, false, true and true. If we took a very large number of samples from a Bernoulli(0.7) distribution then the percentage of the samples equal to true would be very close to 70%.
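The Bernoulli(0.7) example can be reproduced in R along these lines. This is a sketch that assumes a Bernoulli draw is obtained from rbinom() with size = 1; the seed and the number of draws are arbitrary choices.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

# rbinom() with size = 1 returns 0/1 values; comparing with 1 gives TRUE/FALSE.
ten_draws <- rbinom(10, size = 1, prob = 0.7) == 1
print(ten_draws)

# With a very large number of draws, the proportion of TRUE approaches 0.7.
many_draws <- rbinom(100000, size = 1, prob = 0.7) == 1
prop_true <- mean(many_draws)
cat("Proportion of TRUE values:", prop_true, "\n")
```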
A sampling distribution is the probability distribution of a statistic (for example, the sample mean) computed over a large number of samples of the same size drawn from a specific population.
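The central limit theorem (CLT) says that if you take many repeated samples of the same size from a population and calculate the average or sum of each one, the collection of those averages (or sums) will be approximately normally distributed, provided the sample size is reasonably large and the population variance is finite. It doesn’t matter what the shape of the source distribution is!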
Central Limit Theorem applet
http://bioinformatics.cruk.cam.ac.uk/apps/stats/central-limit-theorem/
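# Draw 1000 values from Uniform(0, 1); these play the role of the population.
y <- runif(1000)
par(mfrow = c(2, 1))  # two plots stacked vertically
hist(y, main = "Source distribution (uniform)")
# Preallocate the result instead of growing a vector inside the loop.
out <- numeric(1000)
for (i in 1:1000) {
  # Mean of a random sample of size 3 taken from y.
  out[i] <- mean(sample(y, size = 3))
}
hist(out, main = "Sampling distribution of the mean (n = 3)")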
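The sample size of 3 used above is small. The following sketch compares n = 3 with n = 30 numerically. The helper name sample_means is hypothetical, and unlike the loop above it draws fresh values from runif() rather than resampling a fixed vector. For Uniform(0, 1) the population standard deviation is sqrt(1/12), so the standard deviation of the sample means should be close to sqrt(1/12) / sqrt(n).

```r
set.seed(123)  # arbitrary seed, only for reproducibility

# Hypothetical helper: r sample means, each computed from n Uniform(0, 1) draws.
sample_means <- function(n, r = 10000) {
  replicate(r, mean(runif(n)))
}

means_n3  <- sample_means(3)
means_n30 <- sample_means(30)

sigma <- sqrt(1 / 12)  # standard deviation of Uniform(0, 1)

cat("n = 3:  observed sd", sd(means_n3),  " theory", sigma / sqrt(3),  "\n")
cat("n = 30: observed sd", sd(means_n30), " theory", sigma / sqrt(30), "\n")
```

The spread of the sample means shrinks as n grows, which is why the histogram of means narrows around the population mean for larger samples.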
Sampling Distributions Applet:
http://onlinestatbook.com/stat_sim/sampling_dist/index.html
sdm.sim <- function(n, src.dist = NULL, param1 = NULL, param2 = NULL) {
  # n: size of each sample; src.dist: one-letter code of the source distribution.
  # param1/param2: E rate | N mean, sd | U min, max | P lambda |
  #                B size, prob | G shape, rate | X df | T df
  r <- 10000 # Number of replications/samples - DO NOT ADJUST
  # This produces a matrix of observations with
  # n columns and r rows. Each row is one sample:
  my.samples <- switch(src.dist,
    "E" = matrix(rexp(n * r, param1), nrow = r),
    "N" = matrix(rnorm(n * r, param1, param2), nrow = r),
    "U" = matrix(runif(n * r, param1, param2), nrow = r),
    "P" = matrix(rpois(n * r, param1), nrow = r),
    "B" = matrix(rbinom(n * r, param1, param2), nrow = r),
    "G" = matrix(rgamma(n * r, param1, param2), nrow = r),
    "X" = matrix(rchisq(n * r, param1), nrow = r),
    "T" = matrix(rt(n * r, param1), nrow = r),
    stop("src.dist must be one of E, N, U, P, B, G, X, T"))
  # One sum, mean, and variance per sample (per row):
  all.sample.sums <- apply(my.samples, 1, sum)
  all.sample.means <- apply(my.samples, 1, mean)
  all.sample.vars <- apply(my.samples, 1, var)
  par(mfrow = c(2, 2))  # 2 x 2 grid of plots
  hist(my.samples[1, ], col = "gray", main = "Distribution of One Sample")
  hist(all.sample.sums, col = "gray", main = "Sampling Distribution\nof the Sum")
  hist(all.sample.means, col = "gray", main = "Sampling Distribution\nof the Mean")
  hist(all.sample.vars, col = "gray", main = "Sampling Distribution\nof the Variance")
}
sdm.sim(50,src.dist="E",param1=1)
#source: https://qualityandinnovation.com/2015/03/30/sampling-distributions-and-central-limit-theorem-in-r/
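As a rough numerical companion to the histograms, the sketch below repeats the Exp(1), n = 50 setting without plotting and checks how well a normal approximation fits the sample means. It does not call sdm.sim; the variable names and the seed are assumptions made for this illustration. For Exp(1) the population mean and standard deviation are both 1, so the sample means should have mean near 1 and standard deviation near 1 / sqrt(50).

```r
set.seed(2024)  # arbitrary seed, only for reproducibility

n <- 50      # sample size, matching the sdm.sim(50, ...) call
r <- 10000   # number of replications
rate <- 1    # Exp(1): population mean 1, population sd 1

# Each row of the matrix is one sample of size n.
samples <- matrix(rexp(n * r, rate), nrow = r)
sample_means <- rowMeans(samples)

theory_mean <- 1 / rate
theory_sd   <- (1 / rate) / sqrt(n)

# Share of sample means within 1.96 theoretical sd of the theoretical mean.
# If the normal approximation is good, this should be near 0.95.
coverage <- mean(abs(sample_means - theory_mean) <= 1.96 * theory_sd)

cat("Mean of sample means:", mean(sample_means), " theory", theory_mean, "\n")
cat("SD of sample means:  ", sd(sample_means),   " theory", theory_sd, "\n")
cat("Share within 1.96 sd:", coverage, "\n")
```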
References
http://www.mbmlbook.com/MurderMystery.html