Chapter 18 * Sampling Distribution Models

How accurate is our sample?
 Sometimes different polls show different results for
the same question.
 Since each poll samples a different group of people,
we should expect some variation in the results.
 We could try drawing lots of samples and looking at
the variation amongst those samples.
Experiment: Simulating a sample
 A recent US Census Bureau study (source) reports
that about 30% of Americans 25 or older have a
Bachelor’s degree.
 Open up a blank Minitab worksheet and let’s
generate some random data:




Calc > Random Data > Bernoulli
Generate 200 rows
Store in columns C1-C20
Event Probability: .3
Proportion estimates for samples of size 5
 We can treat each row as a sample and calculate each
sample's proportion of successes as the row mean.
 Samples of size 5:

Calc > Row Statistics > Mean
Input Variables: C1 – C5
 Store result in: C21

 Look at these sample proportions. Are they close to the
population proportion of 30%?
 Draw a histogram of the sample proportions in C21
Proportion estimates for samples of size 10
 Samples of size 10:
 Calc > Row Statistics > Mean
Input Variables: C1 – C10
 Store result in: C22

 Look at these sample proportions. Are they close to
the population proportion of 30%?
 Draw a histogram of the sample proportions in C22
Proportion estimates for samples of size 20
 Samples of size 20:
 Calc > Row Statistics > Mean
Input Variables: C1 – C20
 Store result in: C23

 Look at these sample proportions. Are they close to
the population proportion of 30%?
 Draw a histogram of the sample proportions in C23
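The Minitab steps above can also be sketched in Python. This is a minimal sketch using only the standard library, not a replacement for the worksheet: the function name `simulate_sample_proportions` and the seed are assumptions made for illustration. It generates 200 rows of 20 Bernoulli values with event probability .3, then takes the row mean over the first 5, 10, and 20 columns.

```python
import random
import statistics

def simulate_sample_proportions(n_samples=200, max_size=20, p=0.3, seed=1):
    """Simulate rows of 0/1 values (like columns C1-C20) and return the
    sample proportions for samples of size 5, 10, and 20.

    The name and the seed are illustrative assumptions, not part of the slides.
    """
    rng = random.Random(seed)
    # Each row is one simulated sample of max_size Bernoulli(p) outcomes.
    rows = [
        [1 if rng.random() < p else 0 for _ in range(max_size)]
        for _ in range(n_samples)
    ]
    proportions = {}
    for size in (5, 10, 20):
        # Row mean over the first `size` columns = sample proportion.
        proportions[size] = [sum(row[:size]) / size for row in rows]
    return proportions

props = simulate_sample_proportions()
for size, values in props.items():
    print(
        f"n={size}: mean of proportions={statistics.mean(values):.3f}, "
        f"SD of proportions={statistics.stdev(values):.3f}"
    )
```

The spread of the sample proportions should shrink as the sample size grows, which is the pattern the three histograms are meant to show.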
Sampling Distribution Model for a Proportion
 Our histogram of the sample proportions started to
look like a Normal model
 The larger our sample size gets, the better the
Normal model works
 Assumptions:
 Independence: sampled values must be independent of each
other
 Sample Size: n must be large enough
Conditions to check for assumptions
 Randomization Condition:
 Experiments should have treatments randomly assigned
 Survey samples should be a simple random sample or, failing
that, a representative, unbiased sample
 10% Condition:
 Sample size n must be no more than 10% of population
 Success/Failure Condition:
 Sample size needs to be large enough to expect at least 10
successes and 10 failures
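The two numeric conditions can be checked mechanically. The sketch below is one possible way to do so; the function name `check_proportion_conditions` and the population size of 1,000,000 are hypothetical, chosen only for illustration. The Randomization Condition depends on how the data were collected, so it is not something code can verify.

```python
def check_proportion_conditions(n, p, population_size):
    """Check the 10% Condition and the Success/Failure Condition.

    n: sample size, p: population proportion,
    population_size: size of the population sampled from (an assumed input).
    """
    expected_successes = n * p
    expected_failures = n * (1 - p)
    return {
        # Sample must be no more than 10% of the population.
        "ten_percent": n <= 0.10 * population_size,
        # Expect at least 10 successes and at least 10 failures.
        "success_failure": expected_successes >= 10 and expected_failures >= 10,
    }

# Samples of size 20 with p = .3 expect only 6 successes.
print(check_proportion_conditions(20, 0.3, 1_000_000))
```

With p = .3, a sample of size 20 expects only 6 successes, so the Success/Failure Condition is not met for the small samples in the simulation.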
Sampling Distribution Model for a Proportion
 If the sampled values are independent and the
sample size is large enough,
the sampling distribution of $\hat{p}$ is modeled by
a Normal model with:
$\mu(\hat{p}) = p$
$SD(\hat{p}) = \sqrt{\frac{pq}{n}}$, where $q = 1 - p$
Example: Proportion of Vegetarians
 7% of the US population is estimated to be
vegetarian. If a random sample of 200 people
resulted in 20 people reporting themselves as
vegetarians, is this an unusually high proportion?
 Conditions:
 Randomization
 10% condition
 Success/Failure
Vegetarians Example continued
 Since our conditions were met, it’s ok to use a Normal
model.
 $\hat{p} = 20/200 = .10$
 $E(\hat{p}) = p = .07$
 $SD(\hat{p}) = \sqrt{\frac{pq}{n}} = \sqrt{\frac{(.07)(.93)}{200}} \approx .018$
 $z = \frac{.10 - .07}{.018} \approx 1.67$
 This result is within 2 SDs of the mean, so it is not unusual
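The same calculation can be reproduced in a few lines of Python. This is a sketch; the helper name `proportion_z_score` is an assumption made for illustration. It uses the unrounded standard deviation, so the z-score comes out slightly below the 1.67 obtained with SD rounded to .018.

```python
import math

def proportion_z_score(successes, n, p):
    """Return (p_hat, SD of p_hat, z) under the Normal model for a proportion."""
    p_hat = successes / n
    sd = math.sqrt(p * (1 - p) / n)  # SD(p_hat) = sqrt(pq/n)
    z = (p_hat - p) / sd
    return p_hat, sd, z

# Vegetarians example: 20 of 200 sampled, population proportion .07.
p_hat, sd, z = proportion_z_score(20, 200, 0.07)
print(f"p_hat={p_hat:.2f}, SD={sd:.3f}, z={z:.2f}")
```

Either way the z-score is under 2, so the conclusion is unchanged.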
68-95-99.7 Rule with Vegetarians
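[Figure: Normal model centered at p, with axis marks at -3σ, -2σ, -1σ, p, 1σ, 2σ, 3σ; the central bands are labeled 68%, 95%, and 99.7%]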
Sampling Distribution of a Mean
 Rolling dice simulation
 10,000 individual rolls recorded
 Rolling 2 dice 10,000 times and averaging the dice
 Rolling 3 dice 10,000 times and averaging the dice
 Rolling 5 dice 10,000 times and averaging the dice
 Rolling 20 dice 10,000 times and averaging the dice
 Once again, as the sample size increases, a Normal model
appears
Figures from DeVeaux, Intro to Stats
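The dice experiment is easy to rerun. The following is a rough sketch with the standard library, not the code behind the textbook figures; the function name `dice_averages` and the seed are assumptions. It averages 1, 2, 3, 5, and 20 fair dice over 10,000 trials each and reports the center and spread of the averages.

```python
import random
import statistics

def dice_averages(n_dice, n_trials=10_000, seed=0):
    """Return n_trials averages, each taken over n_dice fair six-sided dice."""
    rng = random.Random(seed)
    return [
        sum(rng.randint(1, 6) for _ in range(n_dice)) / n_dice
        for _ in range(n_trials)
    ]

for n_dice in (1, 2, 3, 5, 20):
    averages = dice_averages(n_dice)
    print(
        f"{n_dice} dice: mean={statistics.mean(averages):.2f}, "
        f"SD={statistics.stdev(averages):.2f}"
    )
```

The mean of the averages should stay near 3.5 while the SD shrinks as more dice are averaged. Plotting a histogram of each list would show the shape moving toward a Normal model.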
Central Limit Theorem
 The sampling distribution of any mean becomes
more nearly Normal as the sample size grows.
 The larger the sample, the better the approximation will be
 Observations need to be independent and
collected with randomization.
CLT Assumptions
 Assumptions:
 Independence: sampled values must be independent
 Sample Size: sample size must be large enough
 Conditions:
 Randomization
 10% Condition
 Large enough sample
Which Normal Model to use?
 The Normal Model depends on a mean and sd
 Sampling Distribution Model for a Mean
When a random sample is drawn from any
population with mean µ and standard deviation σ, its
sample mean $\bar{y}$ has a sampling distribution with:
Mean: µ
Standard Deviation: $SD(\bar{y}) = \frac{\sigma}{\sqrt{n}}$
Example: CEO compensation
 800 CEOs
 Mean (in thousands) = 10,307.31
 SD (in thousands) = 17,964.62
 Samples of size 50 were drawn with:
 Mean = 10,343.93
 SD = 2,483.84
 Samples of size 100 were drawn with:
 Mean = 10,329.94
 SD = 1,779.18
 According to the CLT, what should the theoretical mean and SD
be?
Example from DeVeaux, Intro to Stats
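One way to answer the question is to apply the sampling distribution model for a mean directly: the mean stays at µ and the standard deviation is σ/√n. The short sketch below does that arithmetic; the helper name `sampling_sd_of_mean` is an assumption made for illustration.

```python
import math

def sampling_sd_of_mean(sigma, n):
    """Theoretical SD of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

population_mean = 10_307.31  # in thousands, from the example
population_sd = 17_964.62    # in thousands, from the example

for n in (50, 100):
    sd = sampling_sd_of_mean(population_sd, n)
    print(f"n={n}: theoretical mean={population_mean:,.2f}, theoretical SD={sd:,.2f}")
```

The theoretical SDs come out to roughly 2,540.58 for n = 50 and 1,796.46 for n = 100, close to the simulated values of 2,483.84 and 1,779.18.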