
CHAPTER 5 Sampling Distributions
notes continued from Chapt5-2_overheads.doc
Sampling Distribution of the Sample Mean $\bar{X}$ (continued)
Summing Up
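$$E(\bar{X}) = \mu_{\bar{X}} = \mu \quad \text{for any sample size and any population distribution}$$
$$\text{StdErr}(\bar{X}) = \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \quad \text{when the sample size } n \text{ is small relative to } N$$
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} \quad \text{when } n \text{ is large relative to } N$$
$$\bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right) \quad \text{when } X \text{ is normally distributed: } X \sim N(\mu, \sigma^2)$$
$$\bar{X} \overset{a}{\sim} N\!\left(\mu, \frac{\sigma^2}{n}\right) \quad \text{(approximately) when } X \text{ is not normal but } n \ge 30$$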
What if X is not normal and n < 30? Then the sampling distribution of
$\bar{X}$ depends on the specific distribution of X, and it is not possible
to generalize.
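The standard-error rules in the summary can be sketched as a small Python function. The function name `std_err_mean` and the population size N = 200 in the second call are illustrative assumptions, not part of the notes. The function does not check the n ≥ 30 condition, which concerns the shape of the sampling distribution rather than its spread.

```python
import math
from typing import Optional


def std_err_mean(sigma: float, n: int, N: Optional[int] = None) -> float:
    """Standard error of the sample mean (hypothetical helper).

    Without N (population large relative to n): sigma / sqrt(n).
    With N: multiply by the finite population correction sqrt((N - n) / (N - 1)).
    """
    se = sigma / math.sqrt(n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return se


# Large population: sigma = 20, n = 64 gives 20 / 8 = 2.5
print(std_err_mean(20, 64))
# Hypothetical small population of N = 200: the correction shrinks the standard error
print(std_err_mean(20, 64, N=200))
```

When n = N the correction factor is zero: sampling the entire population leaves no sampling error in the mean.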
Using the Sampling Distribution of the Sample Mean
A sample of 64 is taken from a large population. The population has a
mean of $200 and a standard deviation of $20.
a. What is the mean of the sampling distribution of sample
means? Answer: $200. The expected value of the average of the 64
values in a randomly selected sample is $200.
b. What is the standard error of the sample mean?
$$\text{StdErr}(\bar{X}) = \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{20}{\sqrt{64}} = \frac{20}{8} = 2.5$$
c. What is the probability that the sample mean will fall within $5 of the
true population mean?
Rephrase: What is the probability that the sample mean will be in the
range $195 to $205?
$$P(195 \le \bar{X} \le 205) = P\!\left(\frac{195 - 200}{2.5} \le \frac{\bar{X} - \mu_{\bar{X}}}{\sigma/\sqrt{n}} \le \frac{205 - 200}{2.5}\right) = P(-2 \le z \le 2) = 0.9544$$
A manufacturer of spark plugs claims that the life of the plugs is
normally distributed with a mean of 36,000 miles and a standard
deviation of 4,000 miles. A random sample of 16 is tested and the
sample mean life of these 16 plugs is 34,500. If the manufacturer’s
claim is correct, what is the probability that the sample mean is less
than 34,500?
Since the population is normal, so is the sampling distribution of the sample
mean. Under the manufacturer's claim,
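$$\bar{X} \sim N\!\left(36{,}000, \frac{4000^2}{16}\right), \quad \text{so } \sigma_{\bar{X}} = \frac{4000}{\sqrt{16}} = 1000.$$
Therefore,
$$P(\bar{X} < 34{,}500) = P\!\left(\frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} < \frac{34500 - 36000}{1000}\right) = P(z < -1.5) = 0.0668$$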
The claim could still be true, but if it were, a sample mean as low as
34,500 miles would be unlikely (probability of about 0.067).
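The spark-plug calculation can be reproduced the same way. This is a sketch using Python's `statistics.NormalDist`, with the manufacturer's claimed mean and standard deviation taken as the population values.

```python
from statistics import NormalDist

# Manufacturer's claim: plug life ~ N(36,000, 4,000^2); sample of 16 plugs
mu, sigma, n = 36_000, 4_000, 16
se = sigma / n ** 0.5                      # 4000 / 4 = 1000
p_below = NormalDist(mu, se).cdf(34_500)   # P(X-bar < 34,500)
print(se, round(p_below, 4))               # 1000.0 0.0668
```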
Suppose that the annual incomes of Canadian adults are normally
distributed with a mean of $40,000 and a standard deviation of
$10,000.
What is the probability that the income of a randomly selected
individual is in the range $38,000 to $42,000?
This is a sampling distribution with n = 1, so it is the same as the
population distribution.
$$P(38000 \le X \le 42000) = P\!\left(\frac{38000 - 40000}{10000} \le \frac{X - \mu}{\sigma} \le \frac{42000 - 40000}{10000}\right) = P(-0.2 \le z \le 0.2) = 0.1586$$
Now suppose a sample of 16 is drawn from the population. What is
the probability that the sample mean is in the range $38,000 to
$42,000?
$\bar{X}$ is normally distributed with mean 40000 and standard error
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{10000}{4} = 2500.$$
Then
$$P(38000 \le \bar{X} \le 42000) = P\!\left(\frac{38000 - 40000}{2500} \le \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} \le \frac{42000 - 40000}{2500}\right) = P(-0.8 \le z \le 0.8) = 0.5762$$
If the sample size is increased to 100, the probability that the sample mean
falls within the range 38000 to 42000 becomes 0.9544
(students should prove this).
Note how small a sample is required to get fairly accurate estimates
of the mean income of Canadian adults.
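The income example, including the n = 100 case, can be reproduced with a short loop. This is a sketch using Python's `statistics.NormalDist`; the dictionary `results` is only a convenient container for the three probabilities.

```python
from statistics import NormalDist

mu, sigma = 40_000, 10_000
results = {}
for n in (1, 16, 100):
    se = sigma / n ** 0.5                  # standard error shrinks like 1/sqrt(n)
    xbar = NormalDist(mu, se)
    results[n] = xbar.cdf(42_000) - xbar.cdf(38_000)
    print(f"n = {n:3d}: standard error = {se:7.1f}, P = {results[n]:.4f}")
```

The loop gives about 0.1585, 0.5763 and 0.9545; the small differences from 0.1586, 0.5762 and 0.9544 come from rounding in the z table.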
Sampling Distribution of the Sample Proportion
We are often interested in the proportion of the population with a given
characteristic: the proportion unemployed, the proportion of items defective,
the proportion of fish in a lake that are tagged, etc.
We randomly select a sample and use the proportion in the sample as an
estimate of the true population value.
Population proportion: denoted p (in this text; most texts use π)
Sample proportion: denoted $p_s$
$p_s$ is a random variable: it varies from sample to sample. We want
to know its sampling distribution.
It can be derived formally from the binomial distribution if the sample is
drawn with replacement.
E.g., suppose we want the proportion of Canadians who are unemployed. Draw a
sample of 100. The sample proportion is the number in the sample who are
unemployed divided by the sample size. The number in the sample who are
unemployed is a binomial random variable with n = 100 and P(success) on any
trial equal to the true unemployment rate.
Generally, the sample proportion is given by:
$$p_s = \frac{x}{n} = \frac{\text{number of successes}}{\text{sample size}}$$
To develop its sampling distribution, we need to know its expected value
and standard error.
$$E(p_s) = E\!\left(\frac{x}{n}\right) = \frac{1}{n}E(x) = \frac{1}{n}np = p$$
Therefore, the expected value (or mean) of $p_s$ equals p, the true population
proportion.
E.g., if the unemployment rate is 7.8%, we would expect the
unemployment rate in our sample to be 0.078 on average.
Standard error of $p_s$. It can be shown that
$$\sigma_{p_s} = \sqrt{\frac{p(1-p)}{n}}$$
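A quick simulation illustrates both results, $E(p_s) = p$ and $\sigma_{p_s} = \sqrt{p(1-p)/n}$. This sketch assumes the 7.8% unemployment rate from the example, samples of n = 100 drawn with replacement (each person independently unemployed with probability p), and an arbitrary seed and number of replications.

```python
import math
import random
import statistics

random.seed(1)                     # arbitrary seed, for reproducibility
p, n, reps = 0.078, 100, 20_000    # assumed rate, sample size, replications

# Each replication: count "successes" (unemployed) among n draws, divide by n
props = [sum(random.random() < p for _ in range(n)) / n for _ in range(reps)]

theory_se = math.sqrt(p * (1 - p) / n)
print("mean of p_s:", round(statistics.fmean(props), 4), "theory:", p)
print("std of p_s: ", round(statistics.pstdev(props), 4), "theory:", round(theory_se, 4))
```

The simulated mean and standard deviation of the sample proportions should land close to the theoretical values; the agreement improves as `reps` grows.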
Now we need to know the shape of the sampling distribution.
The exact sampling distribution of $p_s$ is obtained from the binomial distribution of the number of successes.
Ex. Suppose n=5 and the true population proportion is 0.25.
Possible values for proportion are 0, 1/5, 2/5, 3/5, 4/5, and 5/5.
P(ps = 0) = P(X=0) when n=5 and p=.25 (where X is binomial r.v.)
P(ps = .2) = P(X=1) when n=5 and p=.25
etc.
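The full exact distribution for this example can be tabulated directly from the binomial formula. This is a sketch using `math.comb` (Python 3.8 or later); the dictionary `dist` maps each possible value of $p_s$ to its probability.

```python
from math import comb

n, p = 5, 0.25
# p_s = k / n exactly when the binomial count X equals k
dist = {k / n: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}
for value, prob in dist.items():
    print(f"P(p_s = {value:.1f}) = {prob:.4f}")
```

For instance, P(p_s = 0) = 0.75^5 ≈ 0.2373 and P(p_s = 0.2) ≈ 0.3955.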
In most applications, however, we use the following result:
If n is sufficiently large, ps is approximately normal.
How large must n be? If np>5 and n(1-p)>5, the approximation is
acceptable.
Therefore, if n is large, assume
$$p_s \overset{a}{\sim} N\!\left(p, \frac{p(1-p)}{n}\right)$$
where $\overset{a}{\sim}$ means "is approximately distributed as".
Ex.
Suppose 20% of all fish in a lake are tagged. A sample of 100 is
taken. What is the probability that fewer than 15% of the fish in the sample are
tagged?
np = 100(.20) = 20, n(1-p) = 100(.80) = 80. Use normal approx.
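$$p_s \overset{a}{\sim} N\!\left(p, \frac{p(1-p)}{n}\right) \text{ or } N\!\left(.20, \frac{(.2)(.8)}{100}\right) \text{ or } N(.20, .04^2)$$
$$P(p_s < .15) = P\!\left(\frac{p_s - p}{\sigma_{p_s}} < \frac{.15 - .20}{.04}\right) = P(z < -1.25) = 0.1056$$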
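The rule-of-thumb check and the tagged-fish probability can be combined in one sketch. The helper name `normal_approx_ok` is hypothetical; it simply encodes the np > 5 and n(1 − p) > 5 condition stated above, and `statistics.NormalDist` stands in for the z table.

```python
import math
from statistics import NormalDist


def normal_approx_ok(n: int, p: float) -> bool:
    """Rule of thumb from the notes: np > 5 and n(1 - p) > 5."""
    return n * p > 5 and n * (1 - p) > 5


p, n = 0.20, 100
print(normal_approx_ok(n, p))             # True: np = 20, n(1 - p) = 80
se = math.sqrt(p * (1 - p) / n)           # sqrt(.16 / 100) = .04
prob = NormalDist(p, se).cdf(0.15)        # P(p_s < .15)
print(round(se, 4), round(prob, 4))       # 0.04 0.1056
```

By the same rule, the earlier n = 5, p = 0.25 example (np = 1.25) is too small for the normal approximation, which is why its exact binomial distribution was used.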
Ex. Suppose 47% of voters voted for Candidate A in an
election. Suppose also that a sample of 1000 voters is selected and
asked how they voted. What is the probability that the sample
proportion is greater than 50%, i.e., that we will falsely infer that
Candidate A won a majority of the vote?
$$p_s \overset{a}{\sim} N\!\left(.47, \frac{.47(.53)}{1000}\right) \text{ or } N(.47, .0158^2)$$
so
$$P(p_s > .50) = P\!\left(z > \frac{.50 - .47}{.0158}\right) = P(z > 1.90) = .0287$$
Therefore, there is a 2.87% probability that more than 50% of the voters in
the sample voted for Candidate A when only 47% of the population
did.
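The election example follows the same pattern, with the upper tail taken as one minus the cumulative probability. This is a sketch using Python's `statistics.NormalDist`; it uses the unrounded standard error, so z is about 1.9008 rather than 1.90.

```python
import math
from statistics import NormalDist

p, n = 0.47, 1000
se = math.sqrt(p * (1 - p) / n)           # about .0158
prob = 1 - NormalDist(p, se).cdf(0.50)    # P(p_s > .50)
print(round(se, 4), round(prob, 4))
```

Both printed values agree with the hand calculation to four decimals (about 0.0158 and 0.0287).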
This ends the discussion of the sampling distribution of the sample proportion.
Other sample statistics could be investigated. What is the sampling
distribution of the sample standard deviation (useful in statistical
quality control, when we want to know how much variability there is in
a population of manufactured components), of the sample median,
etc.?
We leave these aside for now and proceed to estimating population
means and proportions.