Theory of Hypothesis Testing
Inference is divided into two broad categories:
• Estimation
• Testing
Chapter 7 was devoted to point estimation. We will
discuss interval estimation in Chapter 9.
Testing is the subject of Chapter 8.
Definition 14 A hypothesis is a statement
about population parameters.
There are usually two hypotheses, called the
null and alternative hypotheses.
The null hypothesis is denoted H0, and the
alternative is denoted H1.
Usually, H1 is the negation, or complement, of
H0. Often the two hypotheses have the form
$$H_0 : \theta \in \Theta_0 \qquad H_1 : \theta \in \Theta_0^c,$$
where $\Theta_0 \subset \Theta$ and $\Theta_0^c$ is the complement of $\Theta_0$ in $\Theta$.
A test of the hypotheses is carried out as follows:
We are to observe a value of random vector X
having distribution f ( · |θ ). Before observing
the data, we formulate a decision rule of the
form:
Reject H0 if the observed value x of X is
in some set R.
Don’t reject H0 if $x \in R^c$, the complement of R.
R is called the rejection region or critical region.
We could also phrase the decision rule in terms
of a sufficient statistic.
Any statistic whose values are used to define
the rejection region is called a test statistic.
Suppose that Θ0 contains only a single element. Then H0 : θ ∈ Θ0 is called a simple
hypothesis. Otherwise it is called a composite
hypothesis.
We have two hypotheses H0 and H1. After
observing the data we either take action a0 or
a1.
a0 = “accept, or fail to reject, H0”
a1 = “accept H1”
Consequences of actions

True state of nature           Action a0        Action a1
$\theta \in \Theta_0$          Correct          Type I error
$\theta \in \Theta_0^c$        Type II error    Correct
Example 29 Suppose X1, . . . , Xn is a random
sample from N (θ, 1). We’re interested in testing the hypotheses
$$H_0 : \theta \le 10 \quad\text{vs.}\quad H_1 : \theta > 10,$$
with $\Theta = (-\infty, \infty)$, $\Theta_0 = (-\infty, 10]$, and $\Theta_0^c = (10, \infty)$.
For example:
• Xi might be a measure of product quality
when a new process is used.
• The average quality measure using the old
process is 10.
• H0 says that the new process is no better
than the old.
• H1 says the new process is better than the
old.
A sufficient statistic in this model is
$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i.$$
Also, we know that X̄ is both the MLE and
the UMVUE of θ. A sensible test would have
the following form:
Take action a1 if $\bar{x} \ge c_n$, and
take action a0 if $\bar{x} < c_n$,
where $\bar{x}$ is the observed value of $\bar{X}$ and $c_n$ is
some constant larger than 10.
Type I error: Conclude new process is better
when it isn’t.
Type II error: Conclude new process is no
better than the old when in fact it is better.
A general approach to testing
When θ is the true parameter value, let
$$\beta(\theta) = P_\theta(\text{rejecting } H_0) = P_\theta(X \in R),$$
where R is the rejection region of the test. We
have
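$$\beta(\theta) = \begin{cases} P_\theta(\text{Type I error}), & \text{if } \theta \in \Theta_0, \\ 1 - P_\theta(\text{Type II error}), & \text{if } \theta \in \Theta_0^c. \end{cases}$$
Note that for $\theta \in \Theta_0^c$,
$$\beta(\theta) = P_\theta(\text{making correct decision}).$$
The function β restricted to $\Theta_0^c$ is called the
power function of the test, and β(θ) is called
the power at θ.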
Define
$$\alpha = \sup_{\theta \in \Theta_0} \beta(\theta).$$
α is called the size of the test. Casella and
Berger say that the test is of level η if α ≤ η.
Notation A test can be characterized by a test
function φ.
$$\phi(x) = \begin{cases} 1, & \text{if } x \in R, \\ 0, & \text{if } x \in R^c. \end{cases}$$
Note that β(θ) = Eθ [φ(X )].
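The identity β(θ) = Eθ[φ(X)] suggests a simple numerical check: average φ over many simulated samples. The sketch below assumes the N(θ, 1) model of Example 29 and a test that rejects when the sample mean is at least some cutoff. The cutoff 10.5, the sample size 16, and the function names are illustrative assumptions, not part of the notes.

```python
import random
from statistics import NormalDist, fmean


def phi(sample, cutoff):
    """Test function: 1 if the sample mean lies in the rejection region, else 0."""
    return 1 if fmean(sample) >= cutoff else 0


def estimate_beta(theta, n, cutoff, reps=10000, seed=0):
    """Monte Carlo estimate of beta(theta) = E_theta[phi(X)], X_1..X_n iid N(theta, 1)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(reps):
        sample = [rng.gauss(theta, 1.0) for _ in range(n)]
        total += phi(sample, cutoff)
    return total / reps


def exact_beta(theta, n, cutoff):
    """Exact rejection probability, using the fact that X-bar ~ N(theta, 1/n)."""
    return 1.0 - NormalDist().cdf((cutoff - theta) * n ** 0.5)


if __name__ == "__main__":
    n, cutoff = 16, 10.5  # hypothetical values chosen only for illustration
    for theta in (10.0, 10.5, 11.0):
        print(theta, estimate_beta(theta, n, cutoff), round(exact_beta(theta, n, cutoff), 4))
```

The simulated averages should sit close to the exact probabilities, with the gap shrinking as `reps` grows.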
Example 29 (continued) Suppose we use a
test function φ as follows:
$$\phi(x) = \begin{cases} 1, & \text{if } \bar{x} \ge 10 + 1.645/\sqrt{n}, \\ 0, & \text{otherwise.} \end{cases}$$
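Then
$$\begin{aligned} \beta(\theta) &= P_\theta\left(\bar{X} \ge 10 + \frac{1.645}{\sqrt{n}}\right) \\ &= P_\theta\left(\frac{\bar{X} - \theta}{1/\sqrt{n}} \ge \sqrt{n}(10 - \theta) + 1.645\right) \\ &= P\left(Z \ge \sqrt{n}(10 - \theta) + 1.645\right), \end{aligned}$$
where $Z \sim N(0, 1)$.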
Remarks
• β(10) = 0.05 for each n.
• β(θ) increases monotonically in θ, from 0 as θ → −∞
to 1 as θ → ∞.
• So the supremum of β(θ) over θ ≤ 10 is attained at
θ = 10, and the size of the test is 0.05, no matter
the value of n.
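These remarks can be checked numerically. The following is a minimal sketch that evaluates β(θ) = P(Z ≥ √n(10 − θ) + 1.645) with the standard library's normal distribution. The function name `power`, the sample sizes, and the grid over θ ≤ 10 are illustrative choices.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def power(theta, n, theta0=10.0, z_crit=1.645):
    """beta(theta) = P(Z >= sqrt(n) * (theta0 - theta) + z_crit) for the Example 29 test."""
    return 1.0 - Z.cdf(n ** 0.5 * (theta0 - theta) + z_crit)


if __name__ == "__main__":
    # beta(10) is the same for every n because the sqrt(n) term vanishes at theta = 10.
    for n in (5, 20, 80):
        print(n, round(power(10.0, n), 4))
    # beta is increasing in theta, so on a grid inside Theta_0 the maximum sits at theta = 10.
    grid = [10.0 - 0.1 * k for k in range(50)]
    print(max(grid, key=lambda t: power(t, 20)))
```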
[Figure: Power curves for Example 29. Horizontal axis: θ (8 to 12); vertical axis: β(θ) (0.0 to 1.0); one curve for each of n = 5, 10, 20, 40.]
The numbers beside the curves indicate sample
size, n.
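The power curves for Example 29 can be tabulated directly from β(θ) = P(Z ≥ √n(10 − θ) + 1.645). The sketch below assumes the sample sizes 5, 10, 20 and 40 and a grid of θ values from 8 to 12. The helper names are hypothetical.

```python
from statistics import NormalDist


def power(theta, n):
    """Power function of the Example 29 test: P(Z >= sqrt(n) * (10 - theta) + 1.645)."""
    return 1.0 - NormalDist().cdf(n ** 0.5 * (10.0 - theta) + 1.645)


def power_table(thetas, sample_sizes):
    """Return {n: [beta(theta) for theta in thetas]}, one row per sample size."""
    return {n: [power(t, n) for t in thetas] for n in sample_sizes}


if __name__ == "__main__":
    thetas = [8.0 + 0.5 * k for k in range(9)]  # 8.0, 8.5, ..., 12.0
    table = power_table(thetas, (5, 10, 20, 40))
    print("theta " + " ".join(f"{t:6.1f}" for t in thetas))
    for n, row in table.items():
        print(f"n={n:<3d} " + " ".join(f"{b:6.3f}" for b in row))
```

Every row passes through roughly 0.05 at θ = 10. Larger n gives a steeper curve: lower rejection probability below 10 and higher power above it.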
Goal in constructing a test
Make α as small as possible while making the
power as large as possible.
Clearly we can construct a test with α = 0 by
taking R = ∅, the empty set! Also, we can make the
power equal to 1 for all $\theta \in \Theta_0^c$ by using a test
with R equal to the whole sample space.
However, the first test has power 0 everywhere and the
second has α = 1, so neither attains the goal
of making α small and power large.
Our main approach will be to consider tests
with level of significance set at some desired
value (such as 0.05), and to try to select
from these tests one that maximizes power.
By taking α to be fairly small, this approach
implicitly says that a Type I error is more serious than a Type II error.
Choosing α to be small makes a Type I error
unlikely, but may mean that our test has low
power, and hence a high probability of Type II
error.
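The trade-off can be made concrete in the setting of Example 29. The sketch below makes one assumption: the cutoff 10 + 1.645/√n is generalized to 10 + z/√n, where z is the upper-α point of the standard normal, so that the test has size α. The alternative θ = 10.5, the sample size n = 10, and the function name are illustrative choices.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def type2_error(alpha, theta, n, theta0=10.0):
    """P_theta(fail to reject H0) for the test rejecting when x-bar >= theta0 + z/sqrt(n)."""
    z_crit = Z.inv_cdf(1.0 - alpha)  # upper-alpha point of N(0, 1)
    power = 1.0 - Z.cdf(n ** 0.5 * (theta0 - theta) + z_crit)
    return 1.0 - power


if __name__ == "__main__":
    # Hypothetical alternative theta = 10.5 and sample size n = 10.
    for alpha in (0.10, 0.05, 0.01):
        print(alpha, round(type2_error(alpha, theta=10.5, n=10), 3))
```

As α shrinks, the Type II error probability at a fixed alternative grows, which is the cost described above.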
Example 29 (continued) The way we set up
the hypotheses, a Type I error means concluding that the new process is better when it really
isn’t.
Taking α small may reflect our reluctance to
change to the new process unless there’s very
convincing evidence that it’s really better.
Type II error: missing out on a better process.
Type I error: make a (perhaps costly) switch
to a new process that is no better than the
old one.
If the latter error is more serious, it would be
advisable to choose α small and to live with
the resulting Type II error probability.
Testing a simple null against a simple alternative
Definition 15 A test φ is said to be a most
powerful test of size α for testing
H0 : θ = θ0
H1 : θ = θ1
if Eθ0[φ(X)] = α and, for any other test φ∗ of
size α, Eθ1[φ(X)] ≥ Eθ1[φ∗(X)].
We observe a value of X , whose distribution is
either f (x|θ0) or f (x|θ1).
Suppose we observe X to be x. Consider the
likelihood ratio
$$L(x \mid \theta_0, \theta_1) = \frac{f(x \mid \theta_1)}{f(x \mid \theta_0)}.$$
It would be sensible to reject H0 in favor of H1
only when L(x|θ0, θ1) is sufficiently large.
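As an illustration, the sketch below computes the likelihood ratio for a sample assumed to be i.i.d. N(θ, 1), the model borrowed from Example 29. The data values, the choice θ0 = 10 and θ1 = 11, and the function names are hypothetical. In this model the ratio has the closed form exp{n(θ1 − θ0)(x̄ − (θ0 + θ1)/2)}, so when θ1 > θ0 it is large exactly when x̄ is large.

```python
import math


def normal_logpdf(x, theta):
    """Log density of N(theta, 1) evaluated at x."""
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * (x - theta) ** 2


def likelihood_ratio(sample, theta0, theta1):
    """L(x | theta0, theta1) = f(x | theta1) / f(x | theta0) for an iid N(theta, 1) sample."""
    log_ratio = sum(normal_logpdf(x, theta1) - normal_logpdf(x, theta0) for x in sample)
    return math.exp(log_ratio)


if __name__ == "__main__":
    sample = [10.8, 11.2, 9.9, 10.6]  # made-up data for illustration
    print(likelihood_ratio(sample, theta0=10.0, theta1=11.0))
```

Working on the log scale before exponentiating avoids underflow when the sample is large.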
Neyman-Pearson Lemma Let X be a random
vector with pdf or pmf f (x|θ), where θ ∈ Θ =
{θ0, θ1}. Define the hypotheses
H0 : θ = θ0
H1 : θ = θ1.
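a) Any test φ of the form
$$\phi(x) = \begin{cases} 1, & \text{if } f(x \mid \theta_1) > k f(x \mid \theta_0), \\ 0, & \text{if } f(x \mid \theta_1) < k f(x \mid \theta_0), \end{cases}$$
with $E_{\theta_0}[\phi(X)] = \alpha$ is a most powerful level
α test of H0 vs. H1.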
b) If there exists a test of the form in a) with
k > 0, then any other most powerful level
α test, call it φ∗, has size α and is such
that φ(x) = φ∗(x) for almost all x in
$\{x : f(x \mid \theta_1) \ne k f(x \mid \theta_0)\}$.
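A small numerical comparison can illustrate part a). It assumes an i.i.d. N(θ, 1) sample with θ1 > θ0. In that model the condition f(x|θ1) > k f(x|θ0) is equivalent to x̄ exceeding a cutoff, so a test of the form in a) rejects for large x̄. The sketch compares its power with that of a hypothetical competitor of the same size that uses only the first observation. The parameter values and function names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def np_test_power(theta0, theta1, n, alpha):
    """Power at theta1 of the size-alpha test rejecting when x-bar >= theta0 + z/sqrt(n)."""
    z_crit = Z.inv_cdf(1.0 - alpha)
    return 1.0 - Z.cdf(z_crit - n ** 0.5 * (theta1 - theta0))


def first_obs_test_power(theta0, theta1, alpha):
    """Power at theta1 of a competing size-alpha test rejecting when x_1 >= theta0 + z."""
    z_crit = Z.inv_cdf(1.0 - alpha)
    return 1.0 - Z.cdf(z_crit - (theta1 - theta0))


if __name__ == "__main__":
    theta0, theta1, alpha = 10.0, 10.5, 0.05  # hypothetical values
    for n in (1, 5, 20):
        print(n, round(np_test_power(theta0, theta1, n, alpha), 3),
              round(first_obs_test_power(theta0, theta1, alpha), 3))
```

Both tests have size α, yet the likelihood-ratio test is never less powerful, and the two coincide when n = 1.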