Statistics and Sampling Distributions

What is a sampling distribution and why is it called that?
It’s the distribution of a sample statistic, like x̄ or p̂. It’s called a ‘sampling’
distribution because every observation within the distribution was
calculated from a sample.
Why do we use statistics, x̄ and p̂, instead of the original data, X?
Statistics usually have much less variability than the original data. If we
take a random sample of size n, the standard deviation of our statistic is
1/√n of the original standard deviation.
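
The 1/√n relationship can be checked by simulation. The sketch below is only an illustration: the population values (mean 52, sd 4), the sample size of 25, and the number of repeated samples are assumptions chosen for the demo, and the helper name sample_mean is hypothetical.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

MU = 52.0             # population mean (assumed for illustration)
SIGMA = 4.0           # population standard deviation (assumed for illustration)
N = 25                # size of each sample
NUM_SAMPLES = 20_000  # how many samples we draw to build the distribution


def sample_mean(n):
    """Draw one random sample of size n and return its mean."""
    return statistics.fmean([random.gauss(MU, SIGMA) for _ in range(n)])


# Each entry is one observation from the sampling distribution of the mean.
means = [sample_mean(N) for _ in range(NUM_SAMPLES)]

observed_sd = statistics.stdev(means)  # spread of the simulated sample means
predicted_sd = SIGMA / N ** 0.5        # original sd divided by sqrt(n)

print(f"observed sd of sample means: {observed_sd:.3f}")
print(f"sigma / sqrt(n):             {predicted_sd:.3f}")
```

The two printed values should agree closely, and both are much smaller than the original standard deviation of 4.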
What does this distribution depend on?
1. the original (or parent) population: the mean and sd
2. the sampling method (e.g., simple random sample)
3. the size of the sample, n
What good do they do us (how do we use them)?
Knowing the sampling distribution of a statistic allows us to make
probability statements about our particular statistic value. For example,
“the likelihood of seeing a proportion of red M&M’s® this small in a
bag of 50 is only 2%”. We can also find percentiles.
Can we always get (find) a sampling distribution?
We could take many, many samples to create a sampling distribution OR
we can use the Central Limit Theorem, which says that the distribution of
the sample mean, x̄, will be at least approximately normal if we take a
large enough sample.
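
Both routes can be compared side by side. The sketch below takes many samples from a clearly non-normal population and compares a simulated tail probability with the normal approximation from the Central Limit Theorem. The exponential population (mean 1, sd 1), the sample size of 40, and the cutoff of 1.2 are all assumptions made for illustration.

```python
import random
from statistics import NormalDist, fmean

random.seed(1)  # fixed seed so the run is repeatable

N = 40                # sample size (assumed "large enough" for this sketch)
NUM_SAMPLES = 10_000  # number of repeated samples
CUTOFF = 1.2          # we estimate P(sample mean > CUTOFF)


def sample_mean(n):
    """Mean of one sample of size n from an exponential population (mean 1, sd 1)."""
    return fmean([random.expovariate(1.0) for _ in range(n)])


# Route 1: build the sampling distribution empirically from many samples.
means = [sample_mean(N) for _ in range(NUM_SAMPLES)]
simulated = sum(m > CUTOFF for m in means) / NUM_SAMPLES

# Route 2: Central Limit Theorem, sample mean approx. N(1, (1/sqrt(N))^2).
clt = NormalDist(mu=1.0, sigma=1.0 / N ** 0.5)
approx = 1 - clt.cdf(CUTOFF)

print(f"simulated P(mean > {CUTOFF}): {simulated:.3f}")
print(f"CLT approximation:           {approx:.3f}")
```

The two numbers will not match exactly, since the normal curve is only an approximation for a skewed population, but they should be close at this sample size.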
X ~ N(52, 4²); what is the sampling distribution of X̄₂₅, the sample mean
from samples of size n = 25?
X̄₂₅ ~ N(52, (4/√25)²) = N(52, 0.8²)
How likely are we to see an X over 55?
P(X > 55) = P(Z > (55 − 52)/4) = P(Z > 0.75) = 1 − P(Z < 0.75) = 1 − 0.7734 = 0.2266
How likely are we to see an X̄₂₅ over 55?
P(X̄₂₅ > 55) = P(Z > (55 − 52)/0.8) = P(Z > 3.75) = 1 − P(Z < 3.75) ≈ 1 − 0.9999 = 0.0001
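
These two tail probabilities can be reproduced without a Z table. The sketch below uses NormalDist from Python's standard library; the variable names are illustrative only.

```python
from statistics import NormalDist

n = 25

# Distribution of a single observation: X ~ N(52, 4^2)
population = NormalDist(mu=52, sigma=4)

# Distribution of the sample mean: same mean, sd shrunk by sqrt(n)
xbar_dist = NormalDist(mu=52, sigma=4 / n ** 0.5)

p_single = 1 - population.cdf(55)  # P(X > 55)
p_mean = 1 - xbar_dist.cdf(55)     # P(sample mean of 25 > 55)

print(f"P(X > 55)     = {p_single:.4f}")
print(f"P(X-bar > 55) = {p_mean:.6f}")
```

A single observation over 55 is fairly common, while a mean of 25 observations over 55 is extremely rare, because the sample mean varies so much less.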
If X is NOT normal, what is the distribution of X̄ₙ?
IF the sample size is large enough, X̄ₙ will be approximately normal (Central Limit Theorem).
What about categorical data?
If nπ ≥ 10 AND n(1 − π) ≥ 10, then p̂ ~ N(π, (√(π(1 − π)/n))²)
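
The rule for proportions can be sketched in code as follows. The function name phat_distribution is hypothetical, and the numbers in the usage lines (a true proportion of 0.25, a bag of 50, and a cutoff of 0.12) are assumed values for illustration, not figures from the M&M example.

```python
from statistics import NormalDist


def phat_distribution(pi, n):
    """Approximate normal distribution of p-hat for true proportion pi and sample size n.

    Raises ValueError when the rule of thumb (n*pi >= 10 and n*(1-pi) >= 10) fails.
    """
    if n * pi < 10 or n * (1 - pi) < 10:
        raise ValueError("need n*pi >= 10 and n*(1 - pi) >= 10")
    return NormalDist(mu=pi, sigma=(pi * (1 - pi) / n) ** 0.5)


# Usage with assumed values: true proportion 0.25, bag of 50 candies.
dist = phat_distribution(0.25, 50)
p_low = dist.cdf(0.12)  # chance of a sample proportion of 0.12 or less

print(f"sd of p-hat:      {dist.stdev:.4f}")
print(f"P(p-hat <= 0.12): {p_low:.4f}")
```

Raising an error when the conditions fail is one design choice; with small n or extreme π the normal approximation is poor, and exact binomial probabilities are the usual alternative.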