Chapter 8

Sampling Variability and Sampling Distributions
JellyBlubbers:Episode 4
A New Curve
Well, JellyBlubbers have taken control of my backyard, but a
solution to the JellyBlubber invasion is in the works! I have
determined that Jelly aggression is the result of the skewed
nature of the distribution of their lengths. Jellies require balance
in all things and as long as they continue to focus on such a
skewed distribution it upsets them.
So, let's see if we can help the Jellies out!
1. Use a random number generator (a table or
your calculator) to generate a random
sample of 20 JellyBlubbers.
2. Find the Sample Mean length of the set.
3. Graph the sample mean length on the dotplot at the front of the room.
4. Repeat the process with a new random
sample.
If we wanted to do this for every possible
sample of size 20, how many sample mean
lengths would we have to find?
Does it appear that this new distribution
the Jellies will focus on, the sampling
distribution of the sample means, will
allow us to achieve the peace and
balance in my backyard?
Why?
BASIC TERMS
•Statistic:
•Any quantity computed from values in a
sample.
•Sampling variability:
•The observed value of a statistic depends on
the particular sample selected from the
population; typically, it varies from sample
to sample.
Sample This!
Q: How many possible different samples
of 5 M&Ms are in a bag of 500
M&Ms?
A: Too many…Not really. There are 500C5
or 255,244,687,600 possible samples
of 5 M&Ms. This is called the
Population of Samples.
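The count above can be checked directly. This is a minimal sketch using Python's standard library; the variable name `n_samples` is only illustrative.

```python
import math

# Number of distinct unordered samples of 5 M&Ms from a bag of 500 (500C5)
n_samples = math.comb(500, 5)
print(f"{n_samples:,}")
```

The same call with other arguments answers "how many possible samples" questions of this kind, as long as order does not matter and items are not replaced.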
BASIC TERMS, cont.
•Sampling Distribution:
The distribution of a statistic.
If the population of samples is relatively
small, a sampling distribution can be
displayed in a table just like any other
probability distribution!
Sampling Distribution of Sample
Means: An Example…
MHS has the following senior football starters and
their weights in pounds:
Aaron-220, Brad-200, Chris-170, Derek-180, Eric-190,
Frank-210, and George-160. Suppose this is the
population we are interested in. The mean and
standard deviation of this population are:
$\mu = \frac{220 + 200 + 170 + 180 + 190 + 210 + 160}{7} = 190$ lbs
$\sigma = 20$ lbs
To create a sampling distribution of sample
means from every combination of two players,
let’s create the population of samples.
Sample  Sample Mean   Sample  Sample Mean   Sample  Sample Mean
AB      210           BD      190           CG      165
AC      195           BE      195           DE      185
AD      200           BF      205           DF      195
AE      205           BG      180           DG      170
AF      215           CD      175           EF      200
AG      190           CE      180           EG      175
BC      185           CF      190           FG      185
Create the sampling distribution as a probability
distribution then create the dotplot for the original data
and for the sample means. Compare them.
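The enumeration asked for here can be sketched in a few lines of Python. The dictionary of weights comes from the slide; the variable names and the choice of the standard `statistics` module are assumptions made for illustration, and `pstdev` divides by the number of samples because the 21 pairs are the whole population of samples.

```python
from collections import Counter
from itertools import combinations
from statistics import mean, pstdev

# Player initials and weights (lbs) from the slide
weights = {"A": 220, "B": 200, "C": 170, "D": 180, "E": 190, "F": 210, "G": 160}

# Population of samples: every unordered pair of players and its mean weight
sample_means = {
    "".join(pair): mean(weights[p] for p in pair)
    for pair in combinations(weights, 2)
}

# Sampling distribution as a probability distribution: P(x-bar = value)
counts = Counter(sample_means.values())
distribution = {m: c / len(sample_means) for m, c in sorted(counts.items())}

mu_xbar = mean(sample_means.values())        # mean of the sample means
sigma_xbar = pstdev(sample_means.values())   # SD of the sample means

for value, prob in distribution.items():
    print(value, round(prob, 4))
print("mean:", mu_xbar, "sd:", round(sigma_xbar, 2))
```

Each key of `distribution` is one dot position on the dotplot of sample means, and its count in `counts` is the number of dots stacked there.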
Results!!!
• Original data: $\mu = 190$, $\sigma = 20$ [Graph]
• Sample means: $\mu_{\bar{x}} = 190$, $\sigma_{\bar{x}} = 12.91$ [Graph]
Now create the population OF SAMPLES for
samples of size 3! Create the sampling
distribution and the dotplot.
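Sample  Sample Mean   Sample  Sample Mean   Sample  Sample Mean
ABC     ____          AFG     ____          CEF     ____
ABD     ____          BCD     ____          CEG     ____
ABE     ____          BCE     ____          CFG     ____
ABF     ____          BCF     ____          DEF     ____
ABG     ____          BCG     ____          DEG     ____
ACD     ____          BDE     ____          DFG     ____
ACE     ____          BDF     ____          EFG     ____
ACF     ____          BDG     ____
ACG     ____          BEF     ____
ADE     ____          BEG     ____
ADF     ____          BFG     ____
ADG     ____          CDE     ____
AEF     ____          CDF     ____
AEG     ____          CDG     ____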
What Do You Notice About the
Sampling Distributions of Sample
Means As the Sample Size Increases
From the Parent Population?
• What type of distribution is the parent
population?
• What is the mean (center) of the parent
population?
• What is the standard deviation of the
parent population?
• What shape (type of distribution) is the
sampling distribution of sample means of
sample size 2?
• What shape (type of distribution) is the
sampling distribution of sample means of
sample size 3?
• Remind me. What was the center (mean)
value of the sampling distribution of
sample means of sample size 2?
• What appears to be the center (mean) value of the
sampling distribution of sample means of sample
size 3?
• The standard deviation of the sampling
distribution of sample means of sample size 2 is
12.91 and for sample size 3 is about 9.43. How
does this compare to the parent population
standard deviation of 20?
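The comparison can be checked by enumerating every sample of each size. This is a sketch under two assumptions: the standard deviation is computed with the population formula (`pstdev`, dividing by the number of samples), and the finite-population formula $\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$ is used as a cross-check; that formula is a standard result for sampling without replacement, not something stated on the slides. A calculator's sample standard deviation, which divides by one less than the number of samples, gives slightly larger values (about 13.23 for size 2).

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pstdev

weights = [220, 200, 170, 180, 190, 210, 160]  # the seven players, in lbs
N = len(weights)
sigma = pstdev(weights)  # parent population standard deviation

def sd_of_sample_means(n):
    """SD of the means of all samples of size n drawn without replacement."""
    return pstdev([mean(s) for s in combinations(weights, n)])

for n in (2, 3):
    enumerated = sd_of_sample_means(n)
    # Finite-population cross-check (assumed standard formula)
    formula = sigma / sqrt(n) * sqrt((N - n) / (N - 1))
    print(n, round(enumerated, 4), round(formula, 4))
```

Both columns shrink as n grows, which is the pattern the question is pointing at: the sample means are less spread out than the individual weights.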
Example 2…
•Consider a very large population that consists of the
numbers 1, 2, 3, 4 and 5 generated in a manner that the
probability of each of those values is 0.2 no matter what
the previous selections were. This population could be
described as the outcome associated with a spinner such
as given below with the distribution next to it.
x     1    2    3    4    5
p(x)  0.2  0.2  0.2  0.2  0.2
Example 2
•If the sampling distribution for the means of
samples of size two is analyzed, it looks like…..
Population of Samples
Sample  Sample mean   Sample  Sample mean
1, 1    1             3, 4    3.5
1, 2    1.5           3, 5    4
1, 3    2             4, 1    2.5
1, 4    2.5           4, 2    3
1, 5    3             4, 3    3.5
2, 1    1.5           4, 4    4
2, 2    2             4, 5    4.5
2, 3    2.5           5, 1    3
2, 4    3             5, 2    3.5
2, 5    3.5           5, 3    4
3, 1    2             5, 4    4.5
3, 2    2.5           5, 5    5
3, 3    3
Sampling Distribution
x̄      frequency  p(x̄)
1      1          0.04
1.5    2          0.08
2      3          0.12
2.5    4          0.16
3      5          0.20
3.5    4          0.16
4      3          0.12
4.5    2          0.08
5      1          0.04
Total  25
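The table above can be reproduced by listing all 25 ordered samples. This is a minimal sketch; the names `samples` and `counts` are illustrative, and the samples are treated as ordered draws with replacement, matching the spinner description.

```python
from collections import Counter
from itertools import product

values = [1, 2, 3, 4, 5]

# All 25 ordered samples of size 2, each equally likely (0.2 * 0.2 = 0.04)
samples = list(product(values, repeat=2))

# Frequency of each possible sample mean
counts = Counter((a + b) / 2 for a, b in samples)

for xbar in sorted(counts):
    print(xbar, counts[xbar], counts[xbar] / len(samples))
```

Changing `repeat=2` to 3 or 4 gives the frequencies behind the n = 3 and n = 4 sampling distributions (with the mean computed over all the drawn values rather than just two).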
Example 2
• The original distribution and the sampling
distribution of means of samples with n=2 are
given below.
[Figure: Original distribution (x = 1 to 5) beside the Sampling distribution for n = 2]
Example 2
• Sampling distributions for n=3 and n=4 were
calculated and are illustrated below.
[Figure: four panels on the same 1-to-5 axis: Original distribution; Sampling distribution, n = 2; Sampling distribution, n = 3; Sampling distribution, n = 4]
Simulations
To illustrate the general behavior
of samples of fixed size n,
10000 samples each of size 30,
60 and 120 were generated from
this uniform distribution and the
means calculated. Probability
histograms were created for
each of these (simulated)
sampling distributions.
Notice all three of these look to
be essentially normally
distributed. Also the mean of
each is 3 and the variability
decreases as the sample size
increases.
[Figure: probability histograms of Means (n=30), Means (n=60) and Means (n=120)]
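A simulation along these lines can be sketched with Python's standard library. The seed, the helper name `simulate_means` and the `results` dictionary are assumptions for illustration; the slides do not say what software produced their histograms.

```python
import random
from statistics import mean, pstdev

random.seed(8)  # arbitrary seed so a run can be repeated

def simulate_means(n, reps=10_000):
    """Means of `reps` random samples of size n from the values 1..5."""
    return [sum(random.choices([1, 2, 3, 4, 5], k=n)) / n for _ in range(reps)]

# For each sample size, store (mean of the sample means, SD of the sample means)
results = {}
for n in (30, 60, 120):
    means = simulate_means(n)
    results[n] = (mean(means), pstdev(means))
    print(n, round(results[n][0], 3), round(results[n][1], 3))
```

The first number printed for each n should sit close to 3, and the second should shrink as n grows, roughly in line with $\sigma/\sqrt{n}$ where $\sigma = \sqrt{2}$ for this uniform population.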
Simulations
To further illustrate the general behavior of samples of fixed size n,
10000 samples each of size 4, 16 and 30 were generated from the
positively skewed distribution pictured below.
Skewed distribution
Notice that these sampling distributions are all skewed, but as n
increased the sampling distributions became more symmetric and
eventually appeared to be almost normally distributed.
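The same effect can be seen numerically. The slides do not specify which skewed population was used, so this sketch assumes an exponential population with rate 1 as a stand-in, and it measures skewness with the usual third standardized moment; both choices are illustrative only.

```python
import random
from statistics import mean, pstdev

random.seed(8)  # arbitrary seed so a run can be repeated

def skewness(data):
    """Third standardized moment: 0 for a symmetric distribution."""
    m, s = mean(data), pstdev(data)
    return mean(((x - m) / s) ** 3 for x in data)

def simulate_means(n, reps=10_000):
    """Means of `reps` samples of size n from an assumed exponential(1) population."""
    return [sum(random.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]

skews = {n: skewness(simulate_means(n)) for n in (4, 16, 30)}
for n, value in skews.items():
    print(n, round(value, 2))
```

The printed skewness stays positive but moves toward 0 as n increases, which is the numerical counterpart of the histograms becoming more symmetric.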
Terminology
The mean of the distribution of
sample means: $\mu_{\bar{x}}$
The standard deviation of
the distribution of sample means: $\sigma_{\bar{x}}$
Properties of the Sampling
Distribution of the Sample Mean.
Rule 1: $\mu_{\bar{x}} = \mu$
Rule 2: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$. This rule is approximately
correct as long as no more than 5%* of
the population is included in the
sample.
Rule 3: When the population distribution is
normal, the sampling distribution of $\bar{x}$ is
also normal for any sample size n.
*10% depending on your text of choice. It's just like
the Binomial Distribution rule for sampling without
replacement.
Central Limit Theorem.
Rule 4:
When n is sufficiently large,
the sampling distribution of $\bar{x}$
is approximately normally
distributed, even when the
population distribution is not
itself normal.
Illustrations of Sampling
Distributions
[Figure: a symmetric normal-like population and the sampling distributions of the sample mean for n = 4, n = 9 and n = 25]
Illustrations of Sampling
Distributions
[Figure: a skewed population and the sampling distributions of the sample mean for n = 4, n = 10 and n = 30]
More About the Central
Limit Theorem.
The Central Limit Theorem can safely be
applied when n exceeds 30.
If n is large or the population distribution is normal, the
standardized variable, z, has (approximately) a standard
normal (z) distribution.
$z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
Examples
Example 1: Non-CLT problem
The average number of detention
hours assigned per offender at MHS is
5 hours with a standard deviation of
1.5 hours.
If an offender is selected at random,
what is the probability he/she served
no more than 7 hours of detention?
$\mu = 5$, $\sigma = 1.5$
$P(x \le 7) = P\left(z \le \frac{7 - 5}{1.5}\right) = .908$
CLT
Same Setup:
What is the probability that a random
sample of 30 offenders will have served
an average of at most 7 hours?
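Both detention questions can be checked numerically. This sketch assumes that individual detention hours are roughly normal (the slides use a z-table for the individual case without saying so) and uses `statistics.NormalDist` from the Python standard library; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 5, 1.5
z = NormalDist()  # standard normal distribution

# Individual offender: P(x <= 7), assuming hours are roughly normal
p_individual = z.cdf((7 - mu) / sigma)

# Mean of a random sample of n = 30 offenders: P(x-bar <= 7)
n = 30
sigma_xbar = sigma / sqrt(n)          # standard deviation of the sample mean
p_sample_mean = z.cdf((7 - mu) / sigma_xbar)

print(round(p_individual, 3), round(sigma_xbar, 3), round(p_sample_mean, 6))
```

The individual probability comes out just under 0.91, while the sample-mean probability is essentially 1, because 7 hours is more than seven standard deviations of $\bar{x}$ above the mean.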
Notice now we are talking about an
average from a sample of items, not an
individual item. This is CLT.
$\mu_{\bar{x}} = \mu = 5$, $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{1.5}{\sqrt{30}} = .274$
$P(\bar{x} \le 7) = P\left(z \le \frac{7 - 5}{.274}\right) \approx 1$
Example
A food company sells “18 ounce” boxes of cereal. Let
x denote the actual amount of cereal in a box of
cereal. Suppose that x is normally distributed with
$\mu = 18.03$ ounces and $\sigma = 0.05$.
a) What proportion of the boxes will contain less
than 18 ounces?
$P(x < 18) = P\left(z < \frac{18 - 18.03}{0.05}\right) = P(z < -0.60) = 0.2743$
Example - Continued
b) A case consists of 24 boxes of cereal.
What is the probability that the mean
amount of cereal (per box in a case) is
less than 18 ounces?
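Both parts of the cereal example can be checked with a short sketch. It uses `statistics.NormalDist` with the mean and standard deviation given in the problem; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 18.03, 0.05   # ounces, from the problem statement

# (a) A single box: P(x < 18)
p_box = NormalDist(mu, sigma).cdf(18)

# (b) Mean of the 24 boxes in a case: the SD shrinks to sigma / sqrt(24)
p_case_mean = NormalDist(mu, sigma / sqrt(24)).cdf(18)

print(round(p_box, 4), round(p_case_mean, 4))
```

A single box falls below 18 ounces a little over a quarter of the time, but the mean of a whole case almost never does, because averaging 24 boxes cuts the standard deviation by a factor of about 4.9.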
Because x is normally
distributed, the distribution
of $\bar{x}$ is also normal
(Rule 3), so…
$P(\bar{x} < 18) = P\left(z < \frac{18 - 18.03}{0.05 / \sqrt{24}}\right) = P(z < -2.94) = 0.0016$
Some Proportion Distributions
Where p = 0.2
Let p̂ be the proportion of successes in a
random sample of size n from a population
whose proportion of S’s (successes) is p.*
[Figure: sampling distributions of p̂ for n = 10, n = 20, n = 50 and n = 100, each centered at 0.2, with the annotation np ≥ 10, n(1 − p) ≥ 10]
* Or π, depending on your textbook of choice.
Properties of the Sampling
Distribution of p̂
Let p̂ be the proportion of successes in
a random sample of size n from a
population whose proportion of S’s
(successes) is p. Denote the mean of
p̂ by $\mu_{\hat{p}}$ and the standard deviation by
$\sigma_{\hat{p}}$. Then the following rules hold:
Properties of the Sampling
Distribution of p̂
Rule 1: $\mu_{\hat{p}} = p$
Rule 2: $\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}$
Rule 3: When n is large and p is not too near 0
or 1, the sampling distribution of p̂
is approximately normal. (CLT for
proportions.)
Rule of thumb: If np ≥ 10 and n(1 − p) ≥ 10,
the distribution is approximately normal.
Condition for Use
The further the value of p is from 0.5, the
larger n must be for a normal
approximation to the sampling distribution
of p̂ to be accurate.
Rule of Thumb: Remember!!!!!!
If both np ≥ 10 and n(1 − p) ≥ 10, then it is
safe to use a normal approximation.
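The rule of thumb is easy to turn into a check. This sketch applies it to the sample sizes 10, 20, 50 and 100 used for p = 0.2 earlier in the slides; the function name `normal_approx_ok` and its `threshold` parameter are hypothetical, chosen only for illustration.

```python
def normal_approx_ok(n, p, threshold=10):
    """True when both np and n(1 - p) reach the rule-of-thumb threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold

for n in (10, 20, 50, 100):
    print(n, normal_approx_ok(n, 0.2))
```

With p = 0.2 the condition first holds at n = 50, where np = 10. A value of p further from 0.5 would need a larger n before the check passes.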
Example
If the true proportion of defectives
produced by a certain manufacturing
process is 0.08 and a sample of 400
is chosen, what is the probability that
the proportion of defectives in the
sample is greater than 0.10?
Since np = 400(0.08) = 32 > 10 and
n(1-p) = 400(0.92) = 368 > 10, it’s
reasonable to use the normal
approximation.
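The probability asked for can be sketched numerically as follows, using `statistics.NormalDist` for the standard normal curve. The variable names are illustrative. Because the code keeps z unrounded, its answer can differ in the third or fourth decimal place from a table lookup that rounds z to two decimals.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.08, 400

# Standard deviation of the sample proportion: sqrt(p(1 - p) / n)
sigma_phat = sqrt(p * (1 - p) / n)

# Standardize 0.10 and take the upper-tail area
z = (0.10 - p) / sigma_phat
prob = 1 - NormalDist().cdf(z)

print(round(sigma_phat, 6), round(z, 2), round(prob, 4))
```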
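Example (Continued)
$\mu_{\hat{p}} = p = 0.08$
$\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{0.08(1 - 0.08)}{400}} = 0.013565$
$z = \frac{\hat{p} - \mu_{\hat{p}}}{\sigma_{\hat{p}}} = \frac{0.10 - 0.08}{0.013565} = 1.47$
$P(\hat{p} > 0.1) = P(z > 1.47) = 1 - 0.9292 = 0.0708$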
Example
Suppose 3% of the people contacted by
phone are receptive to a certain sales
pitch and buy your product. If your sales
staff contacts 2000 people, what is the
probability that more than 5% of the
people contacted will purchase your
product?
Clearly p = 0.03 and
p̂ = 100/2000 = 0.05, so
$P(\hat{p} > 0.05) = P\left(z > \frac{0.05 - 0.03}{\sqrt{\frac{(0.03)(0.97)}{2000}}}\right) = P\left(z > \frac{0.05 - 0.03}{0.0038145}\right) = P(z > 5.24) \approx 0$
Example - Continued
If your sales staff contacts 2000 people, what is
the probability that less than 2.5% of the people
contacted will purchase your product?
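Both sales-pitch questions (more than 5% and less than 2.5%) can be checked with one normal model for p̂. This is a sketch using `statistics.NormalDist`; the variable names are illustrative. Since np = 60 and n(1 − p) = 1940, the rule of thumb for the normal approximation is met.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.03, 2000

# Approximate sampling distribution of p-hat: normal with mean p
# and standard deviation sqrt(p(1 - p) / n)
phat_dist = NormalDist(mu=p, sigma=sqrt(p * (1 - p) / n))

p_more_than_5pct = 1 - phat_dist.cdf(0.05)     # P(p-hat > 0.05)
p_less_than_2_5pct = phat_dist.cdf(0.025)      # P(p-hat < 0.025)

print(round(phat_dist.stdev, 7), p_more_than_5pct, round(p_less_than_2_5pct, 4))
```

The first probability is effectively 0, and the second is close to 0.095; small differences from table values come from rounding z to two decimals.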
Now p = 0.03 and p̂ = 50/2000 = 0.025, so
$P(\hat{p} < 0.025) = P\left(z < \frac{0.025 - 0.03}{\sqrt{\frac{(0.03)(0.97)}{2000}}}\right) = P\left(z < \frac{0.025 - 0.03}{0.0038145}\right) = P(z < -1.31) = 0.0951$
Review
• Central limit theorem for the sample means
$\mu_{\bar{x}} = \mu$, $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
• If the sample size is sufficiently large, then the
sampling distribution of the sample means is
approximately normal regardless of the shape of
the parent distribution.
• Rule of thumb: it is generally safe to apply the
CLT for means when n ≥ 30.
• “Central limit theorem” for the sample proportions
$\mu_{\hat{p}} = p$, $\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}$
• The sampling distribution of sample proportions is
approximately normal when np ≥ 10 and n(1 − p) ≥ 10.