STATISTICS 141 Introduction to Statistics

Chapter 18
Sampling Distribution Models
Bin Zou (bzou@ualberta.ca)
STAT 141 University of Alberta
Winter 2015
Population vs. Sample
Example 18.1
Suppose a hospital has a total of 10,000 patients, and 7,000 of them like
to play basketball. A sample of 200 patients is selected from this
hospital, and 128 of them like to play basketball. Find the proportion of
patients who like to play basketball in the population and in the sample.
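The arithmetic for Example 18.1 can be sketched in a few lines of Python. The variable names are illustrative and not part of the original slides; the counts are the ones given in the example.

```python
# Counts taken from Example 18.1; variable names are illustrative.
population_size = 10_000
population_likes = 7_000
sample_size = 200
sample_likes = 128

# Population parameter p: proportion in the whole population.
p = population_likes / population_size

# Sample statistic p-hat: proportion in the selected sample.
p_hat = sample_likes / sample_size

print(f"population proportion p = {p:.2f}")
print(f"sample proportion p-hat = {p_hat:.2f}")
```

The two values differ (0.70 versus 0.64) because the sample is only one of many possible samples of 200 patients.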
A population parameter is a numerical measure (e.g., mean, median,
or variance) of a given population.
A sample statistic is a summary measure calculated from a sample data
set.
Remark: a sample statistic is a random variable, since its value varies
from sample to sample.
The distribution of a sample statistic is called its sampling distribution.
Sample Proportion
Population parameter: proportion p.
e.g., the proportion of students in a class who pass the final.
Sample statistic: proportion p̂.
e.g., instead of considering all students, we select a sample to
investigate. Then p̂ depends on the chosen sample.
Assume we select a sample of n students, and k of them passed the
exam. Hence, p̂ = k/n. (Different k and/or n give different p̂.)
Question
Assuming the true population parameter p is known, what is the distribution
of p̂? (This is called the sampling distribution of p̂.)
Central Limit Theorem for Sample Proportion
CLT
Given p (population proportion) and n (sample size), the sampling
distribution of p̂ is approximately
N(p, √(pq/n)), where q = 1 − p,
that is, p̂ approximately follows a normal distribution with mean p and
standard deviation √(pq/n).
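A small simulation can make this statement concrete. The sketch below is not from the slides: it assumes illustrative values p = 0.35 and n = 50, draws many samples, and compares the simulated mean and standard deviation of p̂ with p and √(pq/n). The function name `simulate_sample_proportions` is hypothetical.

```python
import math
import random
import statistics

def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Draw num_samples samples of size n and return each sample's p-hat."""
    rng = random.Random(seed)
    p_hats = []
    for _ in range(num_samples):
        # Each individual is a "success" with probability p.
        successes = sum(rng.random() < p for _ in range(n))
        p_hats.append(successes / n)
    return p_hats

p, n = 0.35, 50  # illustrative values (assumption)
p_hats = simulate_sample_proportions(p, n, num_samples=10_000)

sim_mean = statistics.mean(p_hats)
sim_sd = statistics.stdev(p_hats)
theory_sd = math.sqrt(p * (1 - p) / n)  # sqrt(pq/n)

print(f"simulated mean of p-hat: {sim_mean:.4f} (theory: {p})")
print(f"simulated SD of p-hat:   {sim_sd:.4f} (theory: {theory_sd:.4f})")
```

With enough simulated samples, the two pairs of numbers should agree closely, and a histogram of `p_hats` should look roughly bell-shaped.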
Assumptions and Conditions for CLT
1. Independence Assumption: The sampled values must be
   independent of each other.
2. Sample Size Assumption: The sample size, n, must be large
   enough.
3. Randomization Condition: The sample should be a simple
   random sample of the population.
4. 10% Condition: The sample size, n, must be no larger than 10%
   of the population.
5. Success/Failure Condition: n · p ≥ 10 and n · q ≥ 10.
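The two numerical conditions in this list (10% and success/failure) can be checked mechanically. The following is a minimal sketch; the helper `check_proportion_conditions` is hypothetical and not part of the course material. Independence and randomization depend on how the data were collected, so they cannot be verified from the numbers alone.

```python
def check_proportion_conditions(p, n, population_size=None):
    """Check the numerical CLT conditions for a sample proportion.

    Returns a dict mapping condition name to True/False. The 10% condition
    is only checked when the population size is supplied.
    """
    q = 1 - p
    checks = {"success_failure": n * p >= 10 and n * q >= 10}
    if population_size is not None:
        checks["ten_percent"] = n <= 0.10 * population_size
    return checks

# Illustrative calls (the input values are assumptions for demonstration).
print(check_proportion_conditions(p=0.35, n=50))
print(check_proportion_conditions(p=0.70, n=200, population_size=10_000))
print(check_proportion_conditions(p=0.05, n=50))
```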
Example 18.2
A study shows that the proportion of people aged 20 to 34 with an IQ
over 120 is about 0.35. We randomly choose a sample of 50 people
aged between 20 and 34. What is the probability that more than
30 of them have an IQ over 120?
Check assumptions: independence and large sample;
and conditions: randomization; 10%; and success/failure
(n · p = 17.5 ≥ 10 and n · q = 32.5 ≥ 10).
Identify: p = 0.35 and n = 50. (q = 1 − p = 0.65)
Sampling distribution: p̂ ∼ N(0.35, √(0.35 × 0.65/50) = 0.06745).
“more than 30 people with IQ over 120” = “p̂ > 30/50 = 0.6”.
P(p̂ > 0.6) = P(Z > (0.6 − 0.35)/0.06745) = P(Z > 3.71) ≈ 0.0001,
where Z = (p̂ − 0.35)/0.06745.
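The numbers in Example 18.2 can be reproduced with Python's standard library. This is a sketch using the normal approximation from the slides, with `statistics.NormalDist` standing in for a Z-table; the variable names are illustrative.

```python
import math
from statistics import NormalDist

p, n = 0.35, 50          # values from Example 18.2
q = 1 - p

se = math.sqrt(p * q / n)        # standard deviation of p-hat
cutoff = 30 / n                  # "more than 30 out of 50" as a proportion
z = (cutoff - p) / se            # standardized cutoff

# Upper-tail probability under the standard normal distribution.
prob = 1 - NormalDist().cdf(z)

print(f"SD of p-hat = {se:.5f}")
print(f"z = {z:.2f}")
print(f"P(p-hat > {cutoff}) is approximately {prob:.4f}")
```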
Sample Mean
Sampling distribution for Mean
When a simple random sample of size n is drawn from a population with mean
µ and standard deviation σ, its sample mean ȳ has a sampling
distribution with the same mean µ but with standard deviation
σ/√n.
If the population is normally distributed, the sampling distribution of ȳ
is exactly normal. If not, the sampling distribution is only
approximately normal, and the larger the sample size, the
closer the approximation.
Remark: when we estimate the standard deviation of a sampling
distribution, we call it the standard error.
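To see the σ/√n result with a non-normal population, one can simulate sample means. The sketch below is an illustration, not part of the slides: it assumes an exponential population with rate 1 (so µ = 1 and σ = 1) and a sample size of n = 30, and the function name `simulate_sample_means` is hypothetical.

```python
import math
import random
import statistics

def simulate_sample_means(n, num_samples, seed=0):
    """Return the means of num_samples samples of size n drawn from an
    exponential population with rate 1 (mean 1, standard deviation 1)."""
    rng = random.Random(seed)
    means = []
    for _ in range(num_samples):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        means.append(sum(sample) / n)
    return means

mu, sigma, n = 1.0, 1.0, 30  # assumed population parameters and sample size
sample_means = simulate_sample_means(n, num_samples=5_000)

sim_mean = statistics.mean(sample_means)
sim_sd = statistics.stdev(sample_means)
theory_sd = sigma / math.sqrt(n)  # sigma / sqrt(n)

print(f"mean of sample means: {sim_mean:.4f} (theory: {mu})")
print(f"SD of sample means:   {sim_sd:.4f} (theory: {theory_sd:.4f})")
```

Although the exponential population is strongly skewed, the simulated sample means should cluster around µ with a spread close to σ/√n.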
Assumptions and Conditions for CLT
1. Independence Assumption: The sampled values must be
   independent of each other.
2. Sample Size Assumption: The sample size, n, must be large
   enough.
3. Randomization Condition: The sample should be a simple
   random sample of the population.
4. 10% Condition: The sample size, n, must be no larger than 10%
   of the population.
5. Large Enough Sample Condition: n ≥ 30.
Example 18.3
The scores of students on the ACT college entrance exam have a
normal distribution with µ = 18.6 and σ = 5.9.
(a) What is the probability that one randomly chosen student scores 21
or higher?
(b) Now take a simple random sample of 50 students who took the
test. What are the mean and standard deviation of ȳ? Describe the
shape of its sampling distribution.
(c) What is the probability that the sample mean is 21 or higher?
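One possible way to work through Example 18.3 numerically is sketched below. It applies the sampling-distribution result for the mean stated earlier and uses `statistics.NormalDist` in place of a Z-table; the variable names are illustrative and the slides do not give a worked solution.

```python
import math
from statistics import NormalDist

mu, sigma, n = 18.6, 5.9, 50   # values from Example 18.3

# (a) One student: scores follow N(mu, sigma).
p_single = 1 - NormalDist(mu, sigma).cdf(21)

# (b) Sample mean of n students: mean mu, standard deviation sigma / sqrt(n).
# The population is normal, so the sampling distribution is exactly normal.
mean_ybar = mu
sd_ybar = sigma / math.sqrt(n)

# (c) Probability that the sample mean is 21 or higher.
p_mean = 1 - NormalDist(mean_ybar, sd_ybar).cdf(21)

print(f"(a) P(score >= 21) = {p_single:.4f}")
print(f"(b) mean of y-bar = {mean_ybar}, SD of y-bar = {sd_ybar:.4f}")
print(f"(c) P(y-bar >= 21) = {p_mean:.4f}")
```

The contrast between (a) and (c) shows how much less variable an average of 50 scores is than a single score.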
Example 18.4
The duration of a disease from the onset of symptoms until death
ranges from 3 to 20 years. The mean is 8 years and the standard
deviation is 4 years. Looking at the average duration for 30 randomly
selected patients, calculate the mean and standard deviation of ȳ and
describe the shape of its sampling distribution. What is the probability
that the average duration of those 30 patients is less than 7 years?
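Example 18.4 can be approached the same way. The sketch below is one possible calculation, not a solution given in the slides. The population here is not stated to be normal, so the normal shape of the sampling distribution is only an approximation that relies on the sample size of 30.

```python
import math
from statistics import NormalDist

mu, sigma, n = 8, 4, 30   # values from Example 18.4

# Sampling distribution of y-bar: mean mu, standard deviation sigma / sqrt(n),
# approximately normal by the CLT (n = 30 meets the n >= 30 condition).
mean_ybar = mu
sd_ybar = sigma / math.sqrt(n)

# Probability that the average duration is less than 7 years.
z = (7 - mean_ybar) / sd_ybar
prob = NormalDist().cdf(z)

print(f"mean of y-bar = {mean_ybar}, SD of y-bar = {sd_ybar:.4f}")
print(f"z = {z:.2f}")
print(f"P(y-bar < 7) is approximately {prob:.4f}")
```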