Sampling Distributions

A review by Hieu Nguyen
(03/27/06)
Parameter vs Statistic
A parameter is a description of the entire population.
• Example: A parameter for the US population is the proportion of all people who support President Bush’s nomination of Samuel Alito to the Supreme Court.
p=.74
Parameter vs Statistic
A statistic is a description of a sample taken from the population. It is only an estimate of the population parameter.
• Example: In a poll of 1001 Americans, 73% of those surveyed supported Alito’s nomination.
p-hat=.73
Bias
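The bias of a statistic is the difference between the average value of the statistic over all possible samples and the population parameter.
• A statistic is unbiased if its values, averaged over all possible samples, equal the population parameter. Any single sample can still miss the parameter because of sampling variability.
• Example: The poll result p-hat=.73 differs from p=.74, but that alone does not show bias. The sample proportion is unbiased if the p-hats from many such polls average out to .74.
mean of p-hat=.74=p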
Sampling Variability
Samples naturally have varying results. The mean or sample proportion of one sample may be different from that of another.
• In the poll mentioned before, p-hat=.73.
• A repetition of the same poll may have p-hat=.75.

Central Limit Theorem (CLT)
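Populations that are wildly skewed may cause samples to vary a great deal.
• However, the CLT states that when the sample size is large enough, the sample proportions (or means) of random samples are approximately normally distributed and centered at the population parameter, whatever the shape of the population.
• The CLT is related to the law of large numbers, which says that the statistic from a larger and larger sample settles closer to the population parameter.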
CLT Example
Imagine that many polls of 1001 Americans are done to find the proportion of those who supported Alito’s nomination.
• Although the poll results vary, more samples have a sample proportion that is close to the population parameter p=.74.

CLT Example

Plot the sample proportions of all the samples to see the effects of the CLT. Notice how there are more sample proportions near the population parameter p=.74.

This histogram is actually a sampling distribution.
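The repeated-poll idea can be tried out with a short simulation. This is only a sketch: the function name `simulate_polls`, the number of polls (1000), and the seed are illustrative assumptions, while p=.74 and n=1001 come from the poll example.

```python
import random
from statistics import mean, stdev

def simulate_polls(p=0.74, n=1001, num_polls=1000, seed=0):
    """Return the sample proportion (p-hat) from each simulated poll."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    p_hats = []
    for _ in range(num_polls):
        # Each respondent supports the nomination with probability p.
        successes = sum(rng.random() < p for _ in range(n))
        p_hats.append(successes / n)
    return p_hats

p_hats = simulate_polls()
print(f"mean of p-hats: {mean(p_hats):.4f}")   # expected to land near p = .74
print(f"SD of p-hats:   {stdev(p_hats):.4f}")  # expected to land near .0139

# Crude text histogram: one row per hundredth, one '#' per 10 polls.
bins = {}
for ph in p_hats:
    key = round(ph, 2)
    bins[key] = bins.get(key, 0) + 1
for key in sorted(bins):
    print(f"{key:.2f} {'#' * (bins[key] // 10)}")
```

The rows nearest .74 should be the longest, which is the bell shape the slide describes.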
Sampling Distributions:
Definition
Textbook definition: A sampling distribution is the distribution of values taken by the statistic in all possible samples of the same size from the same population.
• In other words, a sampling distribution is a histogram of the statistics from samples of the same size from a population.

Two Most Common Types of
Sampling Distributions

Sample Proportion Distribution
• Distribution of the sample proportions of samples from a population

Sample Mean Distribution
• Distribution of the sample means of samples from a population

For both types, the ideal shape is a normal distribution.
Sampling Distributions:
Conditions

Before assuming that a sampling distribution is normal, check the following conditions:
• Plausible independence
• Randomness
• Each sample is less than 10% of the population
Sampling Distributions As
Normal Distributions
When all conditions are met, the sampling distribution can be considered a normal distribution with a center and a spread.
• Note: With sample proportion distributions, another condition must be met:
• Success-failure condition – there must be at least 10 successes and 10 failures according to the population parameter and sample size (np ≥ 10 and nq ≥ 10, where q = 1 − p)
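The two numeric conditions (10% and success-failure) can be checked mechanically. A minimal sketch follows; the function name `check_proportion_conditions` and the population size of 300 million are illustrative assumptions, not values from the slides. Independence and randomness depend on how the sample was collected and cannot be checked this way.

```python
def check_proportion_conditions(p, n, population_size):
    """Check the 10% condition and the success-failure condition."""
    q = 1 - p
    return {
        # The sample must be less than 10% of the population.
        "ten_percent": n < 0.10 * population_size,
        # Expected successes (np) and failures (nq) must both be at least 10.
        "success_failure": n * p >= 10 and n * q >= 10,
    }

# Alito poll: p = .74, n = 1001; the population size is an assumed round figure.
print(check_proportion_conditions(0.74, 1001, 300_000_000))

# A small sample with a rare outcome: np = 2, so success-failure fails.
print(check_proportion_conditions(0.02, 100, 300_000_000))
```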
Sampling Distributions As
Normal Distributions: Equations

Sample Proportion Distribution
p = population proportion (given)
q = 1 − p
$SD(\hat{p}) = \sqrt{\dfrac{pq}{n}}$
$N\left(p,\ SD(\hat{p})\right)$

Sample Mean Distribution
μ = population mean (given)
σ = population standard deviation (given)
$SD(\bar{y}) = \dfrac{\sigma}{\sqrt{n}}$
$N\left(\mu,\ SD(\bar{y})\right)$
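The two spread formulas, SD(p-hat) = sqrt(pq/n) and SD(y-bar) = σ/sqrt(n), translate directly into code. A small sketch; the function names are illustrative, and the inputs are taken from this review's own examples (the poll with p=.74, n=1001, and a population with σ=3 sampled with n=25).

```python
from math import sqrt

def sd_sample_proportion(p, n):
    """Standard deviation of the sample proportion: sqrt(p*q/n)."""
    q = 1 - p
    return sqrt(p * q / n)

def sd_sample_mean(sigma, n):
    """Standard deviation of the sample mean: sigma/sqrt(n)."""
    return sigma / sqrt(n)

print(round(sd_sample_proportion(0.74, 1001), 4))  # 0.0139
print(round(sd_sample_mean(3, 25), 4))             # 0.6
```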
Sampling Distributions As
Normal Distributions: Note

Note:
If a parameter is unknown, use the corresponding statistic from a sample to approximate it.
Using Sampling Distributions

Sampling distributions can estimate the probability of getting a certain statistic in a random sample.
• Use z-scores or the NormalCDF function on the TI-83/84.
Using Sampling Distributions:
Z-Scores w/ Example

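Use the z-score table to find appropriate probabilities.
Example:
Find the probability that a poll of 1001 Americans will return a sample proportion of .72 or less for support of Alito’s nomination.
$z = \dfrac{\hat{p}_0 - p}{SD(\hat{p})}$, where $\hat{p}_0$ is the observed sample proportion; then look up $P(\hat{p} < \hat{p}_0)$ OR $P(\hat{p} > \hat{p}_0)$
$p = .74$
$SD(\hat{p}) = \sqrt{\dfrac{pq}{n}} = \sqrt{\dfrac{.74 \times .26}{1001}} = .0139$
$z = \dfrac{\hat{p}_0 - p}{SD(\hat{p})} = \dfrac{.72 - .74}{.0139} = -1.443$
$\therefore P(\hat{p} < .72) \approx .0749$ (table value at $z = -1.44$)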
Using Sampling Distributions:
NormalCDF Function w/ Example

The syntax for the NormalCDF function is:
• NormalCDF(lower limit, upper limit, μ, σ)
Example:
Find the probability that a sample of size 25 will have a mean of 5 or less, given that the population has a mean of 7 and a standard deviation of 3.
$\mu = 7$
$\sigma = 3$
$SD(\bar{y}) = \dfrac{\sigma}{\sqrt{n}} = \dfrac{3}{\sqrt{25}} = .6$
NormalCDF(0, 5, 7, .6) = .000429
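For readers without a TI-83/84, a function with the same argument order can be sketched in Python. The name `normalcdf` is chosen here to mirror the calculator function; it is a stand-in built on the standard library's `statistics.NormalDist`, not the calculator's own implementation.

```python
from math import sqrt
from statistics import NormalDist

def normalcdf(lower, upper, mu=0.0, sigma=1.0):
    """Area under the N(mu, sigma) curve between lower and upper."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)

# Sample of size 25 from a population with mean 7 and SD 3.
sd_ybar = 3 / sqrt(25)                  # SD(y-bar) = sigma/sqrt(n) = .6
result = normalcdf(0, 5, 7, sd_ybar)
print(f"{result:.6f}")                  # about .000429
```

Open-ended limits can be passed as `float("-inf")` or `float("inf")`.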

Sampling Distribution for
Two Populations

Use a difference sampling distribution if the question presents 2 different populations.
$\mu_{x-y} = \mu_x - \mu_y$
$\sigma_{x-y} = \sqrt{\sigma_x^2 + \sigma_y^2}$
Sampling Distribution for
Two Populations: Example
(adapted from AP Statistics – Chapter 9 – Sampling Distribution Multiple Choice Questions)
Medium oranges have a mean weight of 14oz and a standard deviation of 2oz. Large oranges have a mean weight of 18oz and a standard deviation of 3oz. Find the probability of finding a medium orange that weighs more than a large orange.
$\mu_x = 14$
$\sigma_x = 2$
$\mu_y = 18$
$\sigma_y = 3$
$\mu_{y-x} = \mu_y - \mu_x = 18 - 14 = 4$
$\sigma_{y-x} = \sqrt{\sigma_y^2 + \sigma_x^2} = \sqrt{3^2 + 2^2} = 3.606$
NormalCDF(−∞, 0, 4, 3.606) = .134
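The orange example can be checked in a few lines. This sketch assumes, as the example implicitly does, that both weights are normally distributed and independent; the variable names are illustrative. `NormalDist` subtraction is used only as a cross-check on the hand-built difference model.

```python
from math import sqrt
from statistics import NormalDist

mu_x, sigma_x = 14, 2   # medium oranges
mu_y, sigma_y = 18, 3   # large oranges

# Model for (large - medium): means subtract, variances add.
mu_diff = mu_y - mu_x
sigma_diff = sqrt(sigma_y**2 + sigma_x**2)

# A medium orange outweighs a large one when the difference is below 0.
prob = NormalDist(mu_diff, sigma_diff).cdf(0)
print(f"mean = {mu_diff}, SD = {sigma_diff:.3f}, P = {prob:.3f}")

# Cross-check: subtracting two independent NormalDist objects
# yields the same difference model.
diff_model = NormalDist(mu_y, sigma_y) - NormalDist(mu_x, sigma_x)
```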
Example Problem
(adapted from De Veaux, Sampling Distribution Models, Exercise #42)
Ayrshire cows average 47 pounds of milk a day, with a standard deviation of 6 pounds. For Jersey cows, the mean daily production is 43 pounds, with a standard deviation of 5 pounds. Assume that Normal models describe milk production for these breeds.

A) We select an Ayrshire at random. What’s the probability that she averages more than 50 pounds of milk a day?
B) What’s the probability that a randomly selected Ayrshire gives more milk than a randomly selected Jersey?
C) A farmer has 20 Jerseys. What’s the probability that the average production for this small herd exceeds 45 pounds of milk a day?
D) A neighboring farmer has 10 Ayrshires. What’s the probability that his herd average is at least 5 pounds higher than the average for the Jersey herd?
Example Problem Solution
First, check the assumptions:
• Independent samples
• Randomness
• Sample represents less than 10% of population
Example Problem Solution
A) Use the normal model to estimate the appropriate probability.
$\mu = 47$
$\sigma = 6$
$z = \dfrac{x - \mu}{\sigma} = \dfrac{50 - 47}{6} = .5 \Rightarrow P(x > 50) = .309$
NormalCDF(50, ∞, 47, 6) = .309
Example Problem Solution
B) Create a normal model for the difference between Ayrshires and Jerseys. Use the model to estimate the appropriate probability.
$\mu_a = 47$
$\sigma_a = 6$
$\mu_j = 43$
$\sigma_j = 5$
$\mu_{a-j} = \mu_a - \mu_j = 47 - 43 = 4$
$\sigma_{a-j} = \sqrt{\sigma_a^2 + \sigma_j^2} = \sqrt{6^2 + 5^2} = 7.810$
$z = \dfrac{x - \mu_{a-j}}{\sigma_{a-j}} = \dfrac{0 - 4}{7.810} = -.512 \Rightarrow P(x > 0) = .696$
NormalCDF(0, ∞, 4, 7.810) = .696
Example Problem Solution
C) Create a sampling distribution model for n=20 Jerseys. Use the model to estimate the appropriate probability.
$\mu = 43$
$\sigma = 5$
$n = 20$
$SD(\bar{y}) = \dfrac{\sigma}{\sqrt{n}} = \dfrac{5}{\sqrt{20}} = 1.118$
$z = \dfrac{\bar{y} - \mu}{SD(\bar{y})} = \dfrac{45 - 43}{1.118} = 1.789 \Rightarrow P(\bar{y} > 45) = .0367$
NormalCDF(45, ∞, 43, 1.118) = .0367

Example Problem Solution
D) First create a sampling distribution model for 10 random Ayrshires and 20 random Jerseys. Then create a normal model for the difference between the 10 Ayrshires and 20 Jerseys.
$\mu_a = 47$
$\sigma_a = 6$
$n_a = 10$
$SD(\bar{y}_a) = \dfrac{\sigma_a}{\sqrt{n_a}} = \dfrac{6}{\sqrt{10}} = 1.897$
$\mu_j = 43$
$\sigma_j = 5$
$n_j = 20$
$SD(\bar{y}_j) = \dfrac{\sigma_j}{\sqrt{n_j}} = \dfrac{5}{\sqrt{20}} = 1.118$
$\mu_{a-j} = \mu_a - \mu_j = 47 - 43 = 4$
$\sigma_{a-j} = \sqrt{SD(\bar{y}_a)^2 + SD(\bar{y}_j)^2} = \sqrt{1.897^2 + 1.118^2} = 2.202$
$z = \dfrac{x - \mu_{a-j}}{\sigma_{a-j}} = \dfrac{5 - 4}{2.202} = .454 \Rightarrow P(x \geq 5) = .325$
NormalCDF(5, ∞, 4, 2.202) = .325
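Part D can be checked two ways: with the normal model for the difference of herd means, and with a simulation that draws herds directly. This is a sketch under the problem's stated Normal-model assumption; the function name `simulate_part_d`, the trial count, and the seed are illustrative choices.

```python
import random
from math import sqrt
from statistics import NormalDist

# Analytic route: model the difference of the two herd means.
sd_a = 6 / sqrt(10)     # SD of the mean of 10 Ayrshires
sd_j = 5 / sqrt(20)     # SD of the mean of 20 Jerseys
diff = NormalDist(47 - 43, sqrt(sd_a**2 + sd_j**2))
analytic = 1 - diff.cdf(5)          # P(difference >= 5)

def simulate_part_d(trials=20_000, seed=1):
    """Estimate the same probability by drawing random herds."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        ayr = sum(rng.gauss(47, 6) for _ in range(10)) / 10
        jer = sum(rng.gauss(43, 5) for _ in range(20)) / 20
        if ayr - jer >= 5:
            hits += 1
    return hits / trials

simulated = simulate_part_d()
print(f"analytic:  {analytic:.3f}")   # about .325
print(f"simulated: {simulated:.3f}")  # expected to land close to the analytic value
```

Agreement between the two numbers is a quick sanity check that the means were subtracted and the variances added correctly.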