Statistics 305 CHAPTER 6 – INTRODUCTION TO FORMAL STATISTICAL INFERENCE

Large Sample Confidence Intervals for a Mean
The situation is that we have a sample $x_1, x_2, \ldots, x_n$ from a population whose mean $\mu$ is
unknown. We wish to derive an interval, based on our sample, which is likely to contain
$\mu$, and we also want some idea of how likely it is that $\mu$ is in the interval. Why? Because
simply looking at the sample mean value $\bar{x}$ doesn't give information about how close $\bar{x}$
might be to $\mu$.
To begin a derivation, let $X_1, X_2, \ldots, X_n$ be the random variables that
mathematically model the sample mentioned above. Suppose $n$ is large, so that the
distribution of the sample mean $\bar{X}$ is close to normal. Then, using the Central Limit
Theorem, $\bar{X}$ is approximately distributed as
$$\bar{X} \approx N(\mu, \sigma^2/n).$$
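The approximation can be checked empirically. The sketch below is an illustration only: the Uniform(0, 1) population, the sample size n = 30, the number of replications, and the helper name `simulate_sample_means` are all assumptions chosen for the demonstration and do not come from the notes. It draws many samples and compares the observed mean and variance of the sample means with $\mu$ and $\sigma^2/n$.

```python
import random
import statistics

def simulate_sample_means(n, reps, seed=0):
    """Draw `reps` samples of size `n` from Uniform(0, 1); return their means."""
    rng = random.Random(seed)  # local generator so the run is reproducible
    return [
        statistics.fmean([rng.random() for _ in range(n)])
        for _ in range(reps)
    ]

n = 30
means = simulate_sample_means(n, reps=5000)

# Uniform(0, 1) has mean 1/2 and variance 1/12.
theoretical_var = (1 / 12) / n
observed_var = statistics.variance(means)

print(f"mean of sample means:     {statistics.fmean(means):.4f} (mu = 0.5)")
print(f"variance of sample means: {observed_var:.5f} (sigma^2/n = {theoretical_var:.5f})")
```

The two printed pairs should be close to each other; a histogram of `means` would also look roughly bell-shaped even though the population itself is flat.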
For some given small number $\alpha$, $0 < \alpha < 1$ (e.g. $\alpha = 0.05$ or $0.01$), consider, using the
normal quantile $z_{\alpha/2}$, the statement
$$P\left( -z_{\alpha/2} \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le z_{\alpha/2} \right) = 1 - \alpha .$$
Here $z_{\alpha/2}$ is the $1 - \alpha/2$ quantile of the $N(0, 1)$ distribution, i.e. the value that
makes the statement correct under the approximation of the Central Limit Theorem.
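As an alternative to a printed normal table, the quantile $z_{\alpha/2}$ can be computed directly. The following is a minimal sketch using Python's standard library; the function name `z_quantile` is our own choice for illustration.

```python
from statistics import NormalDist

def z_quantile(alpha):
    """Return z_{alpha/2}, the 1 - alpha/2 quantile of N(0, 1)."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return NormalDist().inv_cdf(1 - alpha / 2)

for alpha in (0.2, 0.1, 0.05, 0.02):
    z = z_quantile(alpha)
    # The central probability P(-z <= Z <= z) should recover 1 - alpha.
    central = NormalDist().cdf(z) - NormalDist().cdf(-z)
    print(f"alpha = {alpha:<5} z = {z:.3f}  P(-z <= Z <= z) = {central:.3f}")
```

To three decimals the quantiles are 1.282, 1.645, 1.960 and 2.326, which agree with the usual table values 1.28, 1.645, 1.96 and 2.33 up to rounding.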
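Doing some algebra (multiply through by $\sigma/\sqrt{n}$, then subtract $\bar{X}$ from each part) gives:
$$P\left( -z_{\alpha/2} \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le z_{\alpha/2} \right) = P\left( -\bar{X} - \frac{\sigma z_{\alpha/2}}{\sqrt{n}} \le -\mu \le -\bar{X} + \frac{\sigma z_{\alpha/2}}{\sqrt{n}} \right)$$
Multiplying each part by $-1$ reverses the inequalities, so
$$\Rightarrow \quad P\left( \bar{X} - \frac{\sigma z_{\alpha/2}}{\sqrt{n}} \le \mu \le \bar{X} + \frac{\sigma z_{\alpha/2}}{\sqrt{n}} \right) = 1 - \alpha$$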
The interval
$$\left( \bar{X} - \frac{\sigma z_{\alpha/2}}{\sqrt{n}},\ \bar{X} + \frac{\sigma z_{\alpha/2}}{\sqrt{n}} \right)$$
is random because it involves $\bar{X}$. The probability statement says that the probability is
$1 - \alpha$ that the random interval will cover $\mu$. Even given the realization of $\bar{X}$ we still
can't compute the interval endpoints because we don't know $\sigma$. However, we can compute the
sample standard deviation $s$ from $X_1, X_2, \ldots, X_n$, and for large $n$ it turns out that we can
use $s$ to approximate $\sigma$. Now our interval becomes
$$\left( \bar{X} - \frac{s z_{\alpha/2}}{\sqrt{n}},\ \bar{X} + \frac{s z_{\alpha/2}}{\sqrt{n}} \right).$$
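The claim that the random interval covers $\mu$ with probability about $1 - \alpha$, even with $s$ in place of $\sigma$, can be explored by simulation. This is a sketch under stated assumptions: the Exponential population with mean 1, the sample size n = 100, the number of trials, and the function name `coverage` are all chosen here for illustration and are not part of the notes.

```python
import math
import random
import statistics
from statistics import NormalDist

def coverage(n, trials, alpha=0.05, seed=1):
    """Fraction of simulated intervals x_bar +/- z*s/sqrt(n) that contain mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    mu = 1.0  # true mean of the Exponential(rate = 1) population assumed here
    hits = 0
    for _ in range(trials):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        x_bar = statistics.fmean(sample)
        s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
        half_width = z * s / math.sqrt(n)
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1
    return hits / trials

rate = coverage(n=100, trials=2000)
print(f"observed coverage: {rate:.3f} (nominal 0.95)")
```

The observed fraction should land near the nominal 0.95. For a skewed population like this one it tends to fall slightly short, and the gap shrinks as n grows.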
Now we have a theoretical development of a useful result. How do we use it to get what we
call a $100(1 - \alpha)\%$ confidence interval for $\mu$?
1. Select a confidence level $1 - \alpha$ and use Table B.3 to find $z_{\alpha/2}$ so that
$P(-z_{\alpha/2} \le Z \le z_{\alpha/2}) = 1 - \alpha$. The following table gives example values.

    $\alpha$    $z_{\alpha/2}$
    0.2         1.28
    0.1         1.645
    0.05        1.96
    0.02        2.33

2. Use the sample values $x_1, x_2, \ldots, x_n$ and compute
$$\bar{x} = \frac{1}{n} \sum_i x_i , \qquad s = \sqrt{ \frac{1}{n-1} \sum_i (x_i - \bar{x})^2 }$$
3. Find the endpoints of the interval
$$\left( \bar{x} - \frac{s z_{\alpha/2}}{\sqrt{n}},\ \bar{x} + \frac{s z_{\alpha/2}}{\sqrt{n}} \right)$$
This is called a $100(1-\alpha)\%$ confidence interval for the population mean $\mu$. We
interpret the interval by saying that we are $100(1-\alpha)\%$ confident that $\mu$ lies in
the interval. We don't say that the probability that $\mu$ is in the interval is
$1 - \alpha$, because there is nothing random about the interval now.
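The three steps above can be sketched as a single function. This is an illustrative sketch, not a reference implementation: the name `large_sample_ci` is our own, the quantile comes from the standard library rather than Table B.3, and the function does not check that n is large enough for the approximation to be reasonable. The data set (the integers 1 to 40) is hypothetical and serves only to exercise the code.

```python
import math
import statistics
from statistics import NormalDist

def large_sample_ci(data, conf=0.95):
    """Large-sample C.I. for a mean: x_bar +/- z_{alpha/2} * s / sqrt(n)."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations to compute s")
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)  # step 1: quantile z_{alpha/2}
    x_bar = statistics.fmean(data)           # step 2: sample mean
    s = statistics.stdev(data)               # step 2: sample std. dev. (n - 1 divisor)
    half_width = z * s / math.sqrt(n)        # step 3: distance from x_bar to each endpoint
    return x_bar - half_width, x_bar + half_width

# Hypothetical data, purely to exercise the function.
data = list(range(1, 41))
lo, hi = large_sample_ci(data, conf=0.95)
print(f"95% C.I.: ({lo:.2f}, {hi:.2f})")
```

The interval is symmetric about the sample mean, and raising `conf` widens it because $z_{\alpha/2}$ grows.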
Example: Exercise 6.1.2
$n = 26$, $\bar{x} = 142.7$, $s = 98.2$ (LARGE SAMPLE SIZE)
a) Find a 90% C.I. for $\mu$. Here $\alpha = 0.10$ so that $1 - \alpha = 0.90$, and
$P(-z_{\alpha/2} \le Z \le z_{\alpha/2}) = 0.90$ gives $z_{\alpha/2} = 1.645$, so the C.I. for $\mu$ is
$$\bar{x} \pm \frac{s z_{\alpha/2}}{\sqrt{n}} = 142.7 \pm \frac{(98.2)(1.645)}{\sqrt{26}} \ \Rightarrow\ [111.02,\ 174.38] .$$
We are 90% confident that this interval contains $\mu$. Certainly either $\mu$ is in there
or it isn't, so no probability statement is applicable.
b) Find a 95% C.I. for $\mu$. Here $\alpha = 0.05$ and $z_{\alpha/2} = 1.96$, so
$$\bar{x} \pm \frac{s z_{\alpha/2}}{\sqrt{n}} = 142.7 \pm \frac{(98.2)(1.96)}{\sqrt{26}} \ \Rightarrow\ [104.95,\ 180.45]$$
The 95% C.I. is wider, but we are more confident that it contains $\mu$ than we
were with the 90% C.I.
* * * * * *
Confidence bounds are also used sometimes. Instead of a finite interval, an upper or
lower bound for $\mu$, with confidence coefficient $1 - \alpha$, is given.
Consider the statement
$$P\left( \frac{\bar{X} - \mu}{S/\sqrt{n}} \le z_{\alpha} \right) = 1 - \alpha$$
or equivalently
$$P\left( \bar{X} - \frac{S z_{\alpha}}{\sqrt{n}} \le \mu \right) = 1 - \alpha .$$
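This yields the $100(1-\alpha)\%$ lower confidence bound $\bar{x} - \frac{s z_{\alpha}}{\sqrt{n}}$ for $\mu$, and we say that we
are $100(1-\alpha)\%$ confident that $\mu$ is in the interval $\left[ \bar{x} - \frac{s z_{\alpha}}{\sqrt{n}},\ \infty \right)$.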
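Similarly, beginning with the statement
$$P\left( \frac{\bar{X} - \mu}{S/\sqrt{n}} \ge -z_{\alpha} \right) = 1 - \alpha$$
yields
$$P\left( \bar{X} + \frac{S z_{\alpha}}{\sqrt{n}} \ge \mu \right) = 1 - \alpha ,$$
so the interval $\left( -\infty,\ \bar{x} + \frac{s z_{\alpha}}{\sqrt{n}} \right]$ gives the $100(1-\alpha)\%$ upper confidence bound.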
To find a $100(1-\alpha)\%$ confidence bound (either upper or lower) the easy way to proceed
is:
1. Compute the endpoints of a $100(1-2\alpha)\%$ confidence interval, then
2. Take the (upper or lower) endpoint of the interval. This is the $100(1-\alpha)\%$
confidence bound.
This works because the $100(1-2\alpha)\%$ interval uses the quantile $z_{2\alpha/2} = z_{\alpha}$.
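The two-step shortcut can be compared with the direct formula $\bar{x} \pm s z_{\alpha}/\sqrt{n}$. The sketch below assumes hypothetical summary statistics (a sample mean of 50, s = 12, n = 64) that are not from the notes, and the function names are our own.

```python
import math
from statistics import NormalDist

def two_sided_ci(x_bar, s, n, conf):
    """100*conf% interval x_bar +/- z_{alpha/2} * s / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half_width = z * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

def confidence_bound(x_bar, s, n, conf, side="upper"):
    """100*conf% one-sided bound x_bar +/- z_alpha * s / sqrt(n)."""
    if side not in ("upper", "lower"):
        raise ValueError("side must be 'upper' or 'lower'")
    z = NormalDist().inv_cdf(conf)  # z_alpha, where alpha = 1 - conf
    offset = z * s / math.sqrt(n)
    return x_bar + offset if side == "upper" else x_bar - offset

# Hypothetical summary statistics, used only to compare the two routes.
x_bar, s, n, alpha = 50.0, 12.0, 64, 0.05

direct_upper = confidence_bound(x_bar, s, n, conf=1 - alpha, side="upper")
direct_lower = confidence_bound(x_bar, s, n, conf=1 - alpha, side="lower")
lo, hi = two_sided_ci(x_bar, s, n, conf=1 - 2 * alpha)

print(f"upper bound: direct {direct_upper:.4f}, via interval {hi:.4f}")
print(f"lower bound: direct {direct_lower:.4f}, via interval {lo:.4f}")
```

Each pair of printed values should agree, since both routes use the same quantile $z_{\alpha}$.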
Example: Exercise 6.1.2 (continued)
c) For a 90% upper confidence bound for $\mu$, compute an 80% confidence interval
and take the upper endpoint. Here $\alpha = 0.20$ and $z_{\alpha/2} = z_{0.10} = 1.28$, so the 80%
C.I. has endpoints
$$\left[ 142.7 - \frac{(98.2)(1.28)}{\sqrt{26}},\ 142.7 + \frac{(98.2)(1.28)}{\sqrt{26}} \right]$$
The required 90% upper confidence bound is
$$142.7 + \frac{(98.2)(1.28)}{\sqrt{26}} = 167.4$$
d) For a 95% upper confidence bound for $\mu$, compute a 90% confidence interval
and take the upper endpoint. This was done in part (a) and the upper endpoint is
174.4.
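A short numerical check of parts (c) and (d), using the exercise's summary statistics and the table quantiles 1.28 and 1.645. The variable names are our own.

```python
import math

n, x_bar, s = 26, 142.7, 98.2  # values given in Exercise 6.1.2
std_err = s / math.sqrt(n)     # estimated standard error of the sample mean

upper_90 = x_bar + 1.28 * std_err   # part (c): z_{0.10} = 1.28
upper_95 = x_bar + 1.645 * std_err  # part (d): z_{0.05} = 1.645

print(f"90% upper bound: {upper_90:.1f}")
print(f"95% upper bound: {upper_95:.1f}")
```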