Chapter 18
Sampling Distribution Models
Review Notation
μ : population mean (real or assumed average)
x̄ : sample mean (estimated average)
p : population proportion (real or assumed proportion of success)
p̂ : sample proportion (estimated proportion of success)
• Why do we use a sample?
- to draw general conclusions about the whole population
- a census is impractical
• Which has a larger standard deviation, individual
grades on a test or average grades over several
tests?
- Individual grades have more variation!
- Average grades cluster more tightly around the center (their distribution looks more like the Normal model and has a smaller SD).
Review drawing the 68-95-99.7 rule: for this
distribution of test grades, N(70, 10)
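As a quick check before drawing the curve, the three intervals of the 68-95-99.7 rule for N(70, 10) can be computed directly. This is a minimal sketch; the helper name `rule_intervals` is made up for illustration.

```python
# Mean and SD of the test-grade model N(70, 10) from the slide.
mean, sd = 70, 10

def rule_intervals(mean, sd):
    """Return {coverage %: (low, high)} for 1, 2, and 3 SDs around the mean."""
    return {
        pct: (mean - k * sd, mean + k * sd)
        for k, pct in zip((1, 2, 3), (68, 95, 99.7))
    }

intervals = rule_intervals(mean, sd)
for pct, (low, high) in intervals.items():
    print(f"about {pct}% of grades fall between {low} and {high}")
```

These are the tick marks to label on the horizontal axis of the Normal curve: 40, 50, 60, 70, 80, 90, 100.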
Dice Experiment
http://www.math.uah.edu/stat/apps/DiceExperiment.html
Means – Averaging More Dice
• Looking at the average of two dice after a simulation of 10,000 tosses:
• The average of three dice after a simulation of 10,000 tosses looks like:
Means – Averaging Still More Dice
• The average of 5 dice after a simulation of 10,000 tosses looks like:
• The average of 20 dice after a simulation of 10,000 tosses looks like:
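The dice simulations can be reproduced without the applet. The sketch below uses only the Python standard library and repeats the 10,000-toss experiment for 1, 2, 3, 5, and 20 dice. The function names and the fixed seed are assumptions made for illustration; the "predicted SD" column uses the standard result that the SD of an average of n independent values is the single-value SD divided by √n.

```python
import math
import random

def simulate_dice_means(num_dice, num_tosses=10_000, seed=0):
    """Toss `num_dice` fair dice `num_tosses` times; return the average of each toss."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    return [
        sum(rng.randint(1, 6) for _ in range(num_dice)) / num_dice
        for _ in range(num_tosses)
    ]

def mean_and_sd(values):
    """Return the mean and population standard deviation of a list of numbers."""
    m = sum(values) / len(values)
    variance = sum((v - m) ** 2 for v in values) / len(values)
    return m, math.sqrt(variance)

SINGLE_DIE_SD = math.sqrt(35 / 12)  # SD of one fair six-sided die

results = {}
for num_dice in (1, 2, 3, 5, 20):
    m, sd = mean_and_sd(simulate_dice_means(num_dice))
    results[num_dice] = (m, sd)
    predicted = SINGLE_DIE_SD / math.sqrt(num_dice)
    print(f"{num_dice:>2} dice: mean={m:.3f}  SD={sd:.3f}  predicted SD={predicted:.3f}")
```

The center stays near 3.5 in every case, while the spread of the averages shrinks as more dice are averaged.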
The Fundamental
Theorem of Statistics
• The sampling distribution of a sample mean or a sample proportion becomes more nearly Normal in shape, and narrower in spread, as the sample size grows.
• This is called the Central Limit Theorem (CLT).
Sampling Distribution for p̂:
% of Americans who believe in ghosts
• Distribution Model for p̂
– Mean: $E(\hat{p}) = p$
– Standard deviation: $SD(\hat{p}) = \sqrt{\dfrac{pq}{n}}$, where $q = 1 - p$
– Model: $N\left(p, \sqrt{\dfrac{pq}{n}}\right)$
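To see the proportion model with numbers, here is a small sketch that computes its mean and standard deviation and the matching 68-95-99.7 intervals. The values p = 0.40 and n = 100 are hypothetical and chosen only for illustration; they are not the survey figures from the slide.

```python
import math

def proportion_model(p, n):
    """Return (mean, sd) of the Normal model for the sample proportion p-hat."""
    q = 1 - p
    return p, math.sqrt(p * q / n)

# Hypothetical values (not taken from the slides).
p, n = 0.40, 100
mean, sd = proportion_model(p, n)
print(f"model for p-hat: N({mean}, {sd:.4f})")

# 68-95-99.7 rule applied to the sampling distribution of p-hat.
for k, pct in ((1, 68), (2, 95), (3, 99.7)):
    low, high = mean - k * sd, mean + k * sd
    print(f"about {pct}% of samples give p-hat between {low:.3f} and {high:.3f}")
```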
A picture of what we just
discussed is as follows:
Assumptions for proportions
1. Randomization Condition: The sample
should be a simple random sample.
2. 10% Condition: the sample size, n, must be
no larger than 10% of the population.
3. Success/Failure Condition: both np (expected number of successes) and nq (expected number of failures) are at least 10.
4. Independence: the sampled values must be independent of each other.
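The two numeric conditions in this list can be checked mechanically; randomization and independence cannot, because they depend on how the data were gathered. A minimal sketch follows, with a hypothetical function name and made-up example numbers.

```python
def check_proportion_conditions(n, p, population_size):
    """Check the 10% and Success/Failure conditions for a sample proportion.

    Randomization and independence are NOT checked here: they depend on
    how the sample was collected, not on the numbers themselves.
    """
    q = 1 - p
    return {
        "ten_percent": n <= 0.10 * population_size,
        "success_failure": n * p >= 10 and n * q >= 10,
    }

# Hypothetical examples (numbers are illustrative only).
ok = check_proportion_conditions(n=100, p=0.4, population_size=10_000)
too_few_successes = check_proportion_conditions(n=50, p=0.1, population_size=10_000)
sample_too_big = check_proportion_conditions(n=2_000, p=0.4, population_size=10_000)
print(ok)
print(too_few_successes)
print(sample_too_big)
```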
Sampling Distribution for x̄:
SAT scores for all HS students in US
• Distribution Model for x̄
– Mean: $E(\bar{x}) = \mu$
– Standard deviation: $SD(\bar{x}) = \dfrac{\sigma}{\sqrt{n}}$
– Model: $N\left(\mu, \dfrac{\sigma}{\sqrt{n}}\right)$
Assumptions for means
1. Randomization Condition: The data values
must be sampled randomly.
2. 10% Condition: the sample size, n, should be no
more than 10% of the population.
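3. Large Enough Sample Condition: the sample must be large enough for the Normal model to be a reasonable fit. The CLT doesn't say how large that is, so use judgment; the more skewed the population, the larger the sample needs to be.
4. Independence: the sampled values must be independent of each other.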
The Process Going Into the
Sampling Distribution Model
Why use these models?
What Can Go Wrong?
• Don't confuse the sampling distribution with the distribution of the sample.
– When you take a sample, you look at the distribution of the values, usually with a histogram, and you may calculate summary statistics.
– The sampling distribution is an imaginary collection of the values that a statistic might have taken for all random samples: the one you got and the ones you didn't get.
• Beware of observations that are not independent.
– The CLT depends crucially on the assumption of independence.
– You can't check this with your data; you have to think about how the data were gathered.
• Watch out for small samples from skewed populations.
– The more skewed the distribution, the larger the sample size we need for the CLT to work.
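The warning about skewed populations can be illustrated with a simulation. The sketch below draws samples from an exponential population (an assumed stand-in for "a skewed population", not something from the slides) and measures how lopsided the sampling distribution of the mean still is for several sample sizes. A skewness near 0 means the distribution is roughly symmetric, as a Normal model would be.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed (an assumption) so runs are repeatable

def skewness(values):
    """Average cubed z-score: about 0 for symmetric data, positive for a right skew."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

skew_by_n = {}
for n in (2, 10, 50):
    # 10,000 sample means, each from n draws of a right-skewed population.
    means = [
        sum(rng.expovariate(1.0) for _ in range(n)) / n
        for _ in range(10_000)
    ]
    skew_by_n[n] = skewness(means)
    print(f"n={n:>2}: skewness of the sample means = {skew_by_n[n]:.2f}")
```

With small n the sample means are still clearly right-skewed, so the Normal model fits poorly; the skewness fades only as n grows.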
Chapter 18 Assignment
Pg. 432: #1, 3, 11, 15, 17, 23,
25, 31, 33, 37, 43 (omit b), 45, 47
Show work!