STAT355 - Probability & Statistics Chapter 7: Statistical Intervals

STAT355 - Probability & Statistics
Chapter 7: Statistical Intervals Based on a Single Sample
Fall 2011
Chapter 7 - Statistical Intervals Based on a Single Sample
7.1 Basic Properties of Confidence Intervals
7.2 Large-Sample Confidence Intervals for a Population Mean and Proportion
7.3 Intervals Based on a Normal Population Distribution
7.4 Confidence Intervals for the Variance and Standard Deviation of a Normal Population
Basic Properties of Confidence Intervals
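Consider a random sample $X_1, \ldots, X_n$ from $N(\mu, \sigma^2)$, and let $x_1, \ldots, x_n$ be the actual observations of the random sample.
The sample mean satisfies $\bar X \sim N(\mu, \sigma^2/n)$, so
$$Z = \frac{\bar X - \mu}{\sigma/\sqrt{n}} \sim N(0, 1), \qquad P\left(-1.96 \le \frac{\bar X - \mu}{\sigma/\sqrt{n}} \le 1.96\right) = 0.95.$$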
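Rearranging the inequalities inside this probability (multiply through by $\sigma/\sqrt{n}$, subtract $\bar X$, and multiply by $-1$), the statement is equivalent to
$$P\left(\bar X - 1.96\frac{\sigma}{\sqrt{n}} \le \mu \le \bar X + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95.$$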
Thus the random interval
$$\left(\bar X - 1.96\frac{\sigma}{\sqrt{n}},\; \bar X + 1.96\frac{\sigma}{\sqrt{n}}\right) \qquad (1)$$
includes or covers the true value of $\mu$ with probability 0.95.
Definition
If, after observing $X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n$, we compute the observed sample mean $\bar x$ and then substitute $\bar x$ into (1) in place of $\bar X$, the resulting fixed interval
$$\left(\bar x - 1.96\frac{\sigma}{\sqrt{n}},\; \bar x + 1.96\frac{\sigma}{\sqrt{n}}\right)$$
is called a 95% confidence interval for $\mu$.
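A quick simulation illustrates what the 95% refers to: it is the long-run fraction of realized intervals that contain $\mu$, not a probability statement about one fixed interval. The sketch below uses Python's standard library; the values of `mu`, `sigma`, `n`, `reps` and `seed` are hypothetical choices for illustration only.

```python
import random
from statistics import fmean

def simulate_coverage(mu=10.0, sigma=2.0, n=25, reps=10_000, z=1.96, seed=1):
    """Fraction of realized intervals xbar +/- z*sigma/sqrt(n) that contain mu.

    All default settings are hypothetical illustration values.
    """
    rng = random.Random(seed)
    half_width = z * sigma / n ** 0.5      # fixed, because sigma is known
    covered = 0
    for _ in range(reps):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - half_width <= mu <= xbar + half_width:
            covered += 1
    return covered / reps

print(simulate_coverage())   # close to 0.95, but not exactly 0.95
```

Each individual interval either contains $\mu$ or it does not; only the proportion over many repetitions settles near 0.95.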
Definition
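A $100(1-\alpha)\%$ confidence interval for the mean $\mu$ of a normal population when the value of $\sigma^2$ is known is given by
$$\left(\bar x - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\; \bar x + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$$
or, equivalently, by
$$\bar x \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}.$$
For example:
$\alpha = 0.1$: $z_{\alpha/2} = z_{0.05} = 1.645$
$\alpha = 0.05$: $z_{\alpha/2} = z_{0.025} = 1.96$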
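The critical values $z_{\alpha/2}$ can also be computed instead of read from a table. A minimal sketch using Python's `statistics.NormalDist` (standard library, Python 3.8+); `z_crit` is a hypothetical helper name:

```python
from statistics import NormalDist

def z_crit(alpha):
    """Return z_{alpha/2}: the point with upper-tail area alpha/2 under N(0, 1)."""
    return NormalDist().inv_cdf(1 - alpha / 2)

for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}: z_(alpha/2) = {z_crit(alpha):.3f}")
```

For $\alpha = 0.10$ and $\alpha = 0.05$ this reproduces 1.645 and 1.960.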
Example
Exercise 1: Consider a normal population with the value of $\sigma$ known.
1. What is the confidence level for the interval $\bar x \pm 2.81\sigma/\sqrt{n}$?
2. What is the confidence level for the interval $\bar x \pm 1.44\sigma/\sqrt{n}$?
3. What value of $z_{\alpha/2}$ will result in a confidence level of 99.7%?
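One way to check answers to Exercise 1 (a sketch; the helper names are hypothetical): the interval $\bar x \pm z\sigma/\sqrt{n}$ has confidence level $P(-z \le Z \le z) = 2\Phi(z) - 1$, and conversely $z_{\alpha/2} = \Phi^{-1}(1 - \alpha/2)$.

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal distribution

def confidence_level(z):
    """Confidence level of xbar +/- z*sigma/sqrt(n), i.e. P(-z <= Z <= z)."""
    return 2 * Z.cdf(z) - 1

def z_for_level(level):
    """z_{alpha/2} giving the requested confidence level (alpha = 1 - level)."""
    alpha = 1 - level
    return Z.inv_cdf(1 - alpha / 2)

print(f"{confidence_level(2.81):.4f}")   # part 1: about 0.995
print(f"{confidence_level(1.44):.4f}")   # part 2: about 0.85
print(f"{z_for_level(0.997):.2f}")       # part 3: about 2.97
```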
Large-Sample Confidence Intervals for a Population Mean
Consider a random sample $X_1, \ldots, X_n$ from a population with mean $\mu$ and variance $\sigma^2$. Often $\sigma^2$ is unknown. Let $S$ be the sample standard deviation.
Proposition
If $n$ is sufficiently large, the standardized variable
$$Z = \frac{\bar X - \mu}{S/\sqrt{n}}$$
has approximately a standard normal distribution. This implies that
$$\bar x \pm z_{\alpha/2}\frac{s}{\sqrt{n}}$$
is a large-sample confidence interval for $\mu$ with confidence level approximately $100(1-\alpha)\%$. This formula is valid regardless of the shape of the population distribution.
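A minimal sketch of the large-sample interval in Python, assuming the data arrive as a list of numbers; `large_sample_ci` is a hypothetical helper. `statistics.stdev` divides by $n - 1$, so it matches $s$.

```python
import math
from statistics import NormalDist, fmean, stdev

def large_sample_ci(data, conf=0.95):
    """Approximate 100*conf% CI for mu: xbar +/- z_{alpha/2} * s / sqrt(n).

    Only trustworthy when n is large; no normality assumption is made.
    """
    n = len(data)
    xbar, s = fmean(data), stdev(data)          # s uses the n - 1 divisor
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half = z * s / math.sqrt(n)
    return xbar - half, xbar + half
```

For instance, `large_sample_ci(values, conf=0.99)` would use $z_{0.005} \approx 2.576$ in place of 1.96.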
A Confidence Interval for a Population Proportion
Let p denote the proportion of “successes” in a population.
A random sample of n individuals is to be selected, and X is the number
of successes in the sample.
Provided that $n$ is small compared to the population size, $X$ can be regarded as a binomial rv with
$$E(X) = np \quad\text{and}\quad \sigma_X = \sqrt{np(1-p)}.$$
Furthermore, if both $np \ge 10$ and $n(1-p) \ge 10$, then $X$ has approximately a normal distribution.
The natural estimator of p is p̂ = X /n, the sample fraction of successes.
Since p̂ is just X multiplied by the constant 1/n, p̂ also has approximately
a normal distribution.
We also know that $E(\hat p) = p$ (unbiasedness) and $\sigma_{\hat p} = \sqrt{p(1-p)/n}$.
The standard deviation $\sigma_{\hat p}$ involves the unknown parameter $p$.
Standardizing $\hat p$ by subtracting $p$ and dividing by $\sigma_{\hat p}$ then implies that
$$P\left(-z_{\alpha/2} \le \frac{\hat p - p}{\sqrt{p(1-p)/n}} \le z_{\alpha/2}\right) \approx 1 - \alpha.$$
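The step from this probability statement to the interval in the Proposition below is not spelled out on the slide. One way to see it (a sketch, writing $z$ for $z_{\alpha/2}$): treat the event as an inequality in $p$ and solve the resulting quadratic.

```latex
% Write z for z_{\alpha/2}. The event inside the probability is
\left|\frac{\hat p - p}{\sqrt{p(1-p)/n}}\right| \le z
\iff (\hat p - p)^2 \le \frac{z^2}{n}\,p(1-p)
\iff \Bigl(1 + \frac{z^2}{n}\Bigr)p^2 - \Bigl(2\hat p + \frac{z^2}{n}\Bigr)p + \hat p^2 \le 0 .
% The quadratic in p opens upward, so the inequality holds between its two roots:
p = \frac{\hat p + \dfrac{z^2}{2n} \pm z\sqrt{\dfrac{\hat p(1-\hat p)}{n} + \dfrac{z^2}{4n^2}}}{1 + \dfrac{z^2}{n}}
```

The two roots are exactly $\tilde p$ plus or minus the margin in the Proposition.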
Proposition
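Let
$$\tilde p = \frac{\hat p + z_{\alpha/2}^2/(2n)}{1 + z_{\alpha/2}^2/n}.$$
Then a confidence interval for a population proportion $p$ with confidence level approximately $100(1-\alpha)\%$ is
$$\tilde p \pm z_{\alpha/2}\,\frac{\sqrt{\hat p(1-\hat p)/n + z_{\alpha/2}^2/(4n^2)}}{1 + z_{\alpha/2}^2/n}.$$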
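The Proposition translates directly into code. A sketch with a hypothetical helper `score_ci`, computing $z_{\alpha/2}$ with `statistics.NormalDist`:

```python
import math
from statistics import NormalDist

def score_ci(x, n, conf=0.95):
    """Approximate 100*conf% CI for p from the Proposition (x successes in n trials)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)      # z_{alpha/2}
    p_hat = x / n
    denom = 1 + z**2 / n
    p_tilde = (p_hat + z**2 / (2 * n)) / denom        # center of the interval
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return p_tilde - half, p_tilde + half
```

The center $\tilde p$ is a weighted average of $\hat p$ and 1/2 (weight $1/(1 + z_{\alpha/2}^2/n)$ on $\hat p$), so it is pulled slightly toward 1/2; for large $n$ the pull is negligible.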
Exercise (7.2) 21
In a sample of 1000 randomly selected consumers who had opportunities
to send in a rebate claim form after purchasing a product, 250 of these
people said they never did so. Calculate an upper confidence bound at the
95% confidence level for the true proportion of such consumers who never
apply for a rebate.
Based on this bound, is there compelling evidence that the true proportion
of such consumers is smaller than 1/3?
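A sketch for Exercise 21. The slides do not state a one-sided formula; this assumes the usual one-sided analogue of the Proposition, which replaces $z_{\alpha/2}$ by $z_\alpha$ (so $z_{.05} \approx 1.645$) and keeps only the upper endpoint. `score_upper_bound` is a hypothetical helper.

```python
import math
from statistics import NormalDist

def score_upper_bound(x, n, conf=0.95):
    """Upper confidence bound for p (assumed form: z_alpha instead of z_{alpha/2}, '+' only)."""
    z = NormalDist().inv_cdf(conf)                    # z_alpha, about 1.645 for 95%
    p_hat = x / n
    denom = 1 + z**2 / n
    p_tilde = (p_hat + z**2 / (2 * n)) / denom
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return p_tilde + margin

ub = score_upper_bound(250, 1000)
print(f"95% upper bound: {ub:.3f}")    # about 0.273
print("bound below 1/3:", ub < 1 / 3)
```

With $\hat p = 0.25$ the bound is about 0.273, well below 1/3, so under this approach the data give fairly convincing evidence that the true proportion is smaller than 1/3.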
Intervals Based on a Normal Population Distribution
The CI for µ presented earlier is valid provided that n is large.
The resulting interval can be used whatever the nature of the population
distribution.
The CLT cannot be invoked, however, when n is small.
In this case, one way to proceed is to make a specific assumption about
the form of the population distribution and then derive a CI tailored to
that assumption.
Assumption
The population of interest is normal, so that X1 , ..., Xn constitutes a
random sample from a normal distribution with both µ and σ 2 unknown.
The key result underlying the interval in the earlier section was that, for large $n$, the rv $Z = (\bar X - \mu)/(S/\sqrt{n})$ has approximately a standard normal distribution.
When $n$ is small, $S$ is no longer likely to be close to $\sigma$, so the variability in the distribution of $Z$ arises from randomness in both the numerator and the denominator.
This implies that the probability distribution of $(\bar X - \mu)/(S/\sqrt{n})$ will be more spread out than the standard normal distribution.
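The extra spread has a practical consequence that a small simulation can show: if $\bar x \pm 1.96\,s/\sqrt{n}$ is used with a small normal sample, it covers $\mu$ less than 95% of the time. A sketch using Python's standard library; the sample sizes, number of repetitions, seed, and the choice $N(0, 1)$ are hypothetical.

```python
import math
import random

def coverage_with_s(n, reps=10_000, seed=2):
    """Fraction of intervals xbar +/- 1.96*s/sqrt(n) that cover mu = 0
    when sampling from N(0, 1); all settings are hypothetical."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xbar = sum(sample) / n
        s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
        if abs(xbar) <= 1.96 * s / math.sqrt(n):   # true mean is 0
            hits += 1
    return hits / reps

for n in (5, 10, 30):
    print(n, coverage_with_s(n))   # below 0.95, and furthest below for small n
```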
The result on which inferences are based introduces a new family of
probability distributions called t distributions.
Theorem
When $\bar X$ is the mean of a random sample of size $n$ from a normal distribution with mean $\mu$, the rv
$$T = \frac{\bar X - \mu}{S/\sqrt{n}}$$
has a probability distribution called a t distribution with $n - 1$ degrees of freedom (df).
Properties of t Distributions
Although the variable of interest is still $(\bar X - \mu)/(S/\sqrt{n})$, we now denote it by $T$ to emphasize that it does not have a standard normal distribution when $n$ is small.
We know that a normal distribution is governed by two parameters; each different choice of $\mu$ in combination with $\sigma^2$ gives a particular normal distribution.
Any particular t distribution results from specifying the value of a single
parameter, called the number of degrees of freedom, abbreviated df.
We'll denote this parameter by the Greek letter $\nu$. Possible values of $\nu$ are the positive integers 1, 2, 3, ..., so there is a t distribution with 1 df, another with 2 df, yet another with 3 df, and so on.
For any fixed value of ν, the density function that specifies the associated
t curve is even more complicated than the normal density function.
Fortunately, we need concern ourselves only with several of the more
important features of these curves.
Let $t_\nu$ denote the t distribution with $\nu$ df.
1. Each $t_\nu$ curve is bell-shaped and centered at 0.
2. Each $t_\nu$ curve is more spread out than the standard normal (z) curve.
3. As $\nu$ increases, the spread of the corresponding $t_\nu$ curve decreases.
4. As $\nu \to \infty$, the sequence of $t_\nu$ curves approaches the standard normal curve (so the z curve is often called the t curve with df $= \infty$).
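The slides do not give the t density, but properties 2 to 4 can be checked numerically with its standard form $f(x) = \frac{\Gamma((\nu+1)/2)}{\sqrt{\nu\pi}\,\Gamma(\nu/2)}\,(1 + x^2/\nu)^{-(\nu+1)/2}$. A sketch in Python; `t_pdf` and `z_pdf` are hypothetical helper names.

```python
import math

def t_pdf(x, nu):
    """Density of the t distribution with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c) * (1 + x * x / nu) ** (-(nu + 1) / 2)

def z_pdf(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# A lower peak and heavier tails mean more spread; both gaps shrink as nu grows.
for nu in (1, 5, 30, 1000):
    print(f"nu={nu:>4}: f(0)={t_pdf(0, nu):.4f}  f(3)={t_pdf(3, nu):.5f}")
print(f"z curve : f(0)={z_pdf(0):.4f}  f(3)={z_pdf(3):.5f}")
```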
$$T = \frac{\bar X - \mu}{S/\sqrt{n}}$$
The number of df for $T$ is $n - 1$ because, although $S$ is based on the $n$ deviations $X_1 - \bar X, \ldots, X_n - \bar X$, the fact that $\sum (X_i - \bar X) = 0$ implies that only $n - 1$ of these are “freely determined.”
The number of df for a t variable is the number of freely determined deviations on which the estimated standard deviation in the denominator of $T$ is based.
The use of t distributions in making inferences requires notation for capturing t-curve tail areas, analogous to $z_\alpha$ for the z curve.
Notation: Let $t_{\alpha,\nu}$ be the number on the measurement axis for which the area under the t curve with $\nu$ df to the right of $t_{\alpha,\nu}$ is $\alpha$; $t_{\alpha,\nu}$ is called a t critical value.
For example, $t_{.05,6}$ is the t critical value that captures an upper-tail area of 0.05 under the t curve with 6 df.
Because t curves are symmetric about zero, $-t_{\alpha,\nu}$ captures lower-tail area $\alpha$.
Appendix Table A.5 gives $t_{\alpha,\nu}$ for selected values of $\alpha$ and $\nu$.
The columns of the table correspond to different values of $\alpha$. To obtain $t_{0.05,15}$, go to the $\alpha = 0.05$ column, look down to the $\nu = 15$ row, and read $t_{0.05,15} = 1.753$.
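Table A.5 lists only selected $\alpha$ and $\nu$. As a sketch (a simple numerical approach chosen for transparency, not efficiency), a t critical value can be approximated by integrating the t density with Simpson's rule and solving for the cutoff by bisection. The function names and the number of Simpson steps are assumptions, not part of the course material.

```python
import math

def t_pdf(x, nu):
    """Density of the t distribution with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c) * (1 + x * x / nu) ** (-(nu + 1) / 2)

def t_upper_area(t, nu, steps=2000):
    """Area to the right of t >= 0: 0.5 minus the integral of the density over [0, t]."""
    h = t / steps
    total = t_pdf(0, nu) + t_pdf(t, nu)
    for i in range(1, steps):                  # Simpson weights 4, 2, 4, ..., 4
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 - total * h / 3

def t_crit(alpha, nu, tol=1e-8):
    """t_{alpha,nu} for 0 < alpha < 0.5, found by bisection."""
    lo, hi = 0.0, 1.0
    while t_upper_area(hi, nu) > alpha:        # widen the bracket until area <= alpha
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if t_upper_area(mid, nu) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(f"t_.05,15 = {t_crit(0.05, 15):.3f}")    # table value 1.753
print(f"t_.025,9 = {t_crit(0.025, 9):.3f}")    # table value 2.262
```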
The One-Sample t Confidence Interval
Proposition
Let $\bar x$ and $s$ be the sample mean and sample standard deviation computed from the results of a random sample from a normal population with mean $\mu$. Then a $100(1-\alpha)\%$ confidence interval for $\mu$ is
$$\left(\bar x - t_{\alpha/2,n-1}\frac{s}{\sqrt{n}},\; \bar x + t_{\alpha/2,n-1}\frac{s}{\sqrt{n}}\right)$$
or, more compactly,
$$\bar x \pm t_{\alpha/2,n-1}\frac{s}{\sqrt{n}}.$$
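A minimal sketch of the one-sample t interval in Python. To keep it short, the critical value $t_{\alpha/2,n-1}$ is passed in (for example, read from Table A.5) rather than computed; `t_interval` is a hypothetical helper.

```python
import math
from statistics import fmean, stdev

def t_interval(data, t_crit):
    """xbar -/+ t_crit * s / sqrt(n), where t_crit = t_{alpha/2, n-1} from a t table."""
    n = len(data)
    xbar, s = fmean(data), stdev(data)     # stdev uses the n - 1 divisor
    half = t_crit * s / math.sqrt(n)
    return xbar - half, xbar + half
```

For a 95% interval with $n = 10$, for instance, one would pass `t_crit=2.262` ($t_{.025,9}$).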
Example (11):
Even as traditional markets for sweetgum lumber have declined, large
section solid timbers traditionally used for construction bridges and mats
have become increasingly scarce.
The article “Development of Novel Industrial Laminated Planks from
Sweetgum Lumber” (J. of Bridge Engr., 2008: 64-66) described the
manufacturing and testing of composite beams designed to add value to
low-grade sweetgum lumber.
Here are the data on the modulus of rupture (n = 30):
6807.99 7437.88 7659.50 7422.69 7637.06 6872.39
7378.61 7886.87 6663.28 7663.18 7295.54 6316.67
6165.03 6032.28 6702.76 7713.65 6991.41 6906.04
7440.17 7503.33 6992.23 6981.46 7569.75 6617.17
6984.12 7093.71 8053.26 8284.75 7347.95 7674.99
Use R software.
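In R, `t.test(x)` reports a 95% confidence interval for the mean by default. For readers without R, here is an equivalent computation sketched in Python's standard library; the choice of Python and the hard-coded table value $t_{.025,29} = 2.045$ are assumptions of this sketch.

```python
import math
from statistics import fmean, stdev

# Modulus of rupture values from Example (11), in the order listed above
mor = [
    6807.99, 7437.88, 7659.50, 7422.69, 7637.06, 6872.39,
    7378.61, 7886.87, 6663.28, 7663.18, 7295.54, 6316.67,
    6165.03, 6032.28, 6702.76, 7713.65, 6991.41, 6906.04,
    7440.17, 7503.33, 6992.23, 6981.46, 7569.75, 6617.17,
    6984.12, 7093.71, 8053.26, 8284.75, 7347.95, 7674.99,
]
n = len(mor)                        # 30 observations
xbar, s = fmean(mor), stdev(mor)    # stdev uses the n - 1 divisor
t_crit = 2.045                      # t_{.025,29}, read from a t table
half = t_crit * s / math.sqrt(n)
print(f"n = {n}, xbar = {xbar:.2f}, s = {s:.2f}")
print(f"95% CI for mu: ({xbar - half:.1f}, {xbar + half:.1f})")
```

This gives $\bar x \approx 7203.19$, $s \approx 543.54$, and an interval running from roughly 7000 to 7406.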
Example (12)
Consider the following sample of fat content (in percentage) of n = 10
randomly selected hot dogs (“Sensory and Mechanical Assessment of the
Quality of Frankfurters,” J. of Texture Studies, 1990: 395-409):
25.2 21.3 22.8 17.0 29.8 21.0 25.5 16.0 20.9 19.5
Assuming that these were selected from a normal population distribution,
find a 95% CI for (interval estimate of) the population mean fat content.
Use your calculator to obtain x̄ and s.
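To check the calculator work, a short Python sketch (standard library only; the table value $t_{.025,9} = 2.262$ is hard-coded as an assumption):

```python
import math
from statistics import fmean, stdev

fat = [25.2, 21.3, 22.8, 17.0, 29.8, 21.0, 25.5, 16.0, 20.9, 19.5]
n = len(fat)
xbar, s = fmean(fat), stdev(fat)   # xbar = 21.90, s = 4.1341
t_crit = 2.262                     # t_{.025,9}, read from a t table
half = t_crit * s / math.sqrt(n)
print(f"xbar = {xbar:.2f}, s = {s:.4f}")
print(f"95% CI: ({xbar - half:.2f}, {xbar + half:.2f})")   # (18.94, 24.86)
```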
The Chi-Squared ($\chi^2$) Distribution
Definition
Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal distribution with parameters $\mu$ and $\sigma^2$. Then the rv
$$\frac{(n-1)S^2}{\sigma^2} = \frac{\sum (X_i - \bar X)^2}{\sigma^2}$$
has a chi-squared ($\chi^2$) probability distribution with $\nu = n - 1$ df.
Notation: Let $\chi^2_{\alpha,\nu}$, called a chi-squared critical value, denote the number on the horizontal axis such that $\alpha$ of the area under the chi-squared curve with $\nu$ df lies to the right of $\chi^2_{\alpha,\nu}$.
Remark: The chi-squared distribution is not symmetric.
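Because the curve is not symmetric, the upper and lower critical values $\chi^2_{\alpha/2,\nu}$ and $\chi^2_{1-\alpha/2,\nu}$ must each be found separately. A sketch that computes them in Python, assuming the standard fact that the chi-squared cdf equals the regularized lower incomplete gamma function $P(\nu/2, x/2)$, evaluated here by its power series; the function names are hypothetical.

```python
import math

def chi2_cdf(x, df):
    """P(chi-squared with df degrees of freedom <= x), via the incomplete-gamma series."""
    if x <= 0:
        return 0.0
    a, y = df / 2, x / 2
    term = total = 1 / a
    k = 1
    while term > 1e-15 * total:        # terms eventually shrink geometrically
        term *= y / (a + k)
        total += term
        k += 1
    return total * math.exp(a * math.log(y) - y - math.lgamma(a))

def chi2_crit(alpha, df, tol=1e-10):
    """chi^2_{alpha,df}: the point with right-tail area alpha, found by bisection."""
    lo, hi = 0.0, float(df)
    while 1 - chi2_cdf(hi, df) > alpha:   # widen the bracket
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if 1 - chi2_cdf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(f"{chi2_crit(0.025, 9):.3f}, {chi2_crit(0.975, 9):.3f}")   # table: 19.023, 2.700
```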
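Confidence Interval for $\sigma^2$
From the result above,
$$P\left(\chi^2_{1-\alpha/2,n-1} \le \frac{(n-1)S^2}{\sigma^2} \le \chi^2_{\alpha/2,n-1}\right) = 1 - \alpha.$$
Inverting the inequalities (taking reciprocals and multiplying by $(n-1)S^2$), we get
$$\frac{(n-1)S^2}{\chi^2_{\alpha/2,n-1}} \le \sigma^2 \le \frac{(n-1)S^2}{\chi^2_{1-\alpha/2,n-1}}.$$
A $100(1-\alpha)\%$ confidence interval for the variance $\sigma^2$ of a normal population is
$$\left(\frac{(n-1)s^2}{\chi^2_{\alpha/2,n-1}},\; \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,n-1}}\right).$$
Taking square roots of both endpoints gives a $100(1-\alpha)\%$ confidence interval for the standard deviation $\sigma$.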
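A minimal sketch of the variance interval in Python. The critical values are passed in from a chi-squared table, and the hot dog fat data of Example (12) are reused purely as an illustration (this assumes normality, as that example does); `variance_ci` is a hypothetical helper.

```python
import math
from statistics import variance

def variance_ci(data, chi2_upper, chi2_lower):
    """CI for sigma^2: ((n-1)s^2 / chi2_upper, (n-1)s^2 / chi2_lower).

    chi2_upper = chi^2_{alpha/2, n-1} and chi2_lower = chi^2_{1-alpha/2, n-1}.
    """
    ss = (len(data) - 1) * variance(data)   # (n - 1) s^2
    return ss / chi2_upper, ss / chi2_lower

fat = [25.2, 21.3, 22.8, 17.0, 29.8, 21.0, 25.5, 16.0, 20.9, 19.5]
lo, hi = variance_ci(fat, chi2_upper=19.023, chi2_lower=2.700)   # 95%, 9 df
print(f"sigma^2: ({lo:.2f}, {hi:.2f})")
print(f"sigma:   ({math.sqrt(lo):.2f}, {math.sqrt(hi):.2f})")
```

Note that the larger critical value produces the lower endpoint, and the interval is not centered at $s^2$.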
(Suppl) 51
An April 2009 survey of 2253 American adults conducted by the Pew
Research Center’s Internet & American Life Project revealed that 1262 of
the respondents had at some point used wireless means for online access.
1. Calculate and interpret a 95% CI for the proportion of all American adults who at the time of the survey had used wireless means for online access.
2. What sample size is required if the desired width of the 95% CI is to be at most 0.04, irrespective of the sample results?
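A sketch for both parts, reusing the score interval from the Proposition in Section 7.2. For part 2, "irrespective of the sample results" is read as the worst case $\hat p = 0.5$, where the width of that interval is largest; the search simply increases $n$ until the width is at most 0.04. The helper names are hypothetical.

```python
import math
from statistics import NormalDist

Z975 = NormalDist().inv_cdf(0.975)   # z_{.025}, about 1.96

def score_ci(x, n, z=Z975):
    """Score CI for p from the Proposition in Section 7.2."""
    p_hat = x / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

def n_for_width(w, z=Z975):
    """Smallest n whose interval width is at most w for every p_hat (worst case 0.5)."""
    n = 1
    while True:
        lo, hi = score_ci(0.5 * n, n, z)
        if hi - lo <= w:
            return n
        n += 1

lo, hi = score_ci(1262, 2253)
print(f"(1) 95% CI: ({lo:.3f}, {hi:.3f})")
print(f"(2) n = {n_for_width(0.04)}")
```

Part 1 gives roughly (0.540, 0.581): with 95% confidence, between about 54% and 58% of American adults had used wireless online access at the time of the survey. Part 2 gives $n = 2398$ under this worst-case reading.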