Document 10639941

advertisement
Likelihood Ratio Test of a General Linear
Hypothesis
c
Copyright 2012
Dan Nettleton (Iowa State University)
Statistics 611
1 / 42
Consider the Likelihood Ratio Test of
H0 : Cβ = d
vs HA : Cβ 6= d.
Suppose
y ∼ N(Xβ, σ 2 I).
The likelihood function is
L(β, σ 2 |y) = (2πσ 2 )−n/2 e
−
1
(y−Xβ)0 (y−Xβ)
2σ 2
c
Copyright 2012
Dan Nettleton (Iowa State University)
for β ∈ Rp and σ 2 > 0.
Statistics 611
2 / 42
Note that
L(β, σ 2 |y) = (2πσ 2 )−n/2 e
−
1
Q(β)
2σ 2
,
where
Q(β) = (y − Xβ)0 (y − Xβ)
= ky − Xβk2 .
c
Copyright 2012
Dan Nettleton (Iowa State University)
Statistics 611
3 / 42
The parameter space under the null hypothesis H0 : Cβ = d is
Ω0 = {(β, σ 2 ) : Cβ = d, σ 2 > 0}.
The parameter space corresponding to the union of the null and
alternative parameter spaces is
Ω = {(β, σ 2 ) : β ∈ Rp , σ 2 > 0}.
c
Copyright 2012
Dan Nettleton (Iowa State University)
Statistics 611
4 / 42
The likelihood ratio test rejects H0 iff
Λ(y) =
supΩ0 L(β, σ 2 |y)
supΩ L(β, σ 2 |y)
is sufficiently small.
c
Copyright 2012
Dan Nettleton (Iowa State University)
Statistics 611
5 / 42
To conduct a significance level α likelihood ratio test, we reject H0 iff
Λ(y) ≤ cα ,
where cα satisfies
sup{P(Λ(y) ≤ cα |β, σ 2 ) : (β, σ 2 ) ∈ Ω0 } ≤ α.
c
Copyright 2012
Dan Nettleton (Iowa State University)
Statistics 611
6 / 42
To find Λ(y), we must maximize the likelihood over Ω0 and Ω.
For any fixed β ∈ Rp , we can find the value of σ 2 > 0 that maximizes
L(β, σ 2 |y) as follows.
c
Copyright 2012
Dan Nettleton (Iowa State University)
Statistics 611
7 / 42
Because log is a strictly increasing function, the value of σ 2 that
maximizes L(β, σ 2 |y) is the same as the value of σ 2 that maximizes
l(β, σ 2 |y) ≡ log L(β, σ 2 |y).
c
Copyright 2012
Dan Nettleton (Iowa State University)
Statistics 611
8 / 42
1
n
l(β, σ 2 |y) = − log(2πσ 2 ) − 2 Q(β)
2
2σ
∂l(β, σ 2 |y)
n
Q(β)
=− 2 +
.
∂σ 2
2σ
2σ 4
Equating to 0 and solving for σ 2 yields
σ̂ 2 (β) =
c
Copyright 2012
Dan Nettleton (Iowa State University)
Q(β)
.
n
Statistics 611
9 / 42
∂l(β,σ 2 |y)
as a function of σ 2
∂σ 2
σ 2 is increasing to the left of
Furthermore, examination
shows that
l(β, σ 2 |y)
Q(β)/n and
as a function of
decreasing to the right of Q(β)/n.
Thus, for any fixed β ∈ Rp , the likelihood is maximized over σ 2 > 0 at
Q(β)/n.
c
Copyright 2012
Dan Nettleton (Iowa State University)
Statistics 611
10 / 42
Now note that
n
− 2Q(β)
Q(β)
L(β, σ̂ 2 (β)|y) = (2πQ(β)/n)−n/2 e
= (2πeQ(β)/n)−n/2 ,
which is clearly maximized over β by minimizing Q(β) over β.
c
Copyright 2012
Dan Nettleton (Iowa State University)
Statistics 611
11 / 42
Let β̂ be any minimizer of Q(β) over β ∈ Rp .
We know from previous results that β̂ minimizes Q(β) over β ∈ Rp iff β̂
is a solution to NE. (X0 X)− X0 y is one solution.
c
Copyright 2012
Dan Nettleton (Iowa State University)
Statistics 611
12 / 42
Thus, the maximum likelihood estimator (MLE) of σ 2 is
2
σ̂MLE
= Q(β̂)/n = ky − Xβ̂k2 /n,
where β̂ is any solution to the NE.
Recall that Xβ̂ = PX y is the same for all solution to NE.
2
Thus, σ̂MLE
is the same for any β̂ that solves the NE.
c
Copyright 2012
Dan Nettleton (Iowa State University)
Statistics 611
13 / 42
Note that the MLE of σ 2
2
σ̂MLE
=
Q(β̂)
Q(β̂)
n − rank(X)
=
n
n − rank(X)
n
0
y (I − PX )y n − rank(X)
=
n − rank(X)
n
n
−
rank(X)
= σ̂ 2
.
n
Recall
E(σ̂ 2 ) = σ 2 .
Thus,
2
E(σ̂MLE
)=
c
Copyright 2012
Dan Nettleton (Iowa State University)
n − rank(X) 2
σ < σ2.
n
Statistics 611
14 / 42
If X is of full-column rank, then
β̂ = (X0 X)−1 X0 y
is the maximum likelihood estimator (MLE) of β.
c
Copyright 2012
Dan Nettleton (Iowa State University)
Statistics 611
15 / 42
We have
2
sup L(β, σ 2 |y) = L(β̂, σ̂MLE
|y)
Ω
= (2πeQ(β̂)/n)−n/2 .
c
Copyright 2012
Dan Nettleton (Iowa State University)
Statistics 611
16 / 42
If we let β̃ denote the minimizer of Q(β) over β ∈ Rp satisfying Cβ = d,
then
sup L(β, σ 2 |y)
Ω0
= L(β̃, Q(β̃)/n|y)
= (2πeQ(β̃)/n)−n/2 .
c
Copyright 2012
Dan Nettleton (Iowa State University)
Statistics 611
17 / 42
Thus
Λ(y) =
(2πeQ(β̃)/n)−n/2
(2πeQ(β̂)/n)−n/2
"
#−n/2
Q(β̃)
=
.
Q(β̂)
c
Copyright 2012
Dan Nettleton (Iowa State University)
Statistics 611
18 / 42
Now note that
Λ(y) ≤ cα ⇐⇒
⇐⇒
⇐⇒
⇐⇒
c
Copyright 2012
Dan Nettleton (Iowa State University)
Q(β̃)
−n/2
Q(β̂)
Q(β̃)
Q(β̂)
Q(β̃)
Q(β̂)
≤ cα
≥ c−2/n
α
−1
− 1 ≥ c−2/n
α
Q(β̃) − Q(β̂)
Q(β̂)
≥ c−2/n
− 1.
α
Statistics 611
19 / 42
⇐⇒
[Q(β̃) − Q(β̂)]/q
Q(β̂)/(n − r)
≥
n − r −2/n
(cα − 1),
q
where
q = rank(C)
c
Copyright 2012
Dan Nettleton (Iowa State University)
and n − r = n − rank(X).
Statistics 611
20 / 42
Example:
Suppose
y = Xβ + ε,
where
ε ∼ N(0, σ 2 I).
Furthermore, suppose rank( n×p
X ) = p. Partition
"
X = [X1 , X2 ] and
β=
β1
β2
#
,
where X1 is n × p1 and β 1 is p1 × 1.
Suppose we wish to test H0 : β 2 = 0.
c
Copyright 2012
Dan Nettleton (Iowa State University)
Statistics 611
21 / 42
H0 : β 2 = 0 is a GLH H0 : Cβ = d
C = [ 0 , q×q
I]
q×p1
with
and d = 0 ,
q×1
where
q = p − p1 .
This GLH is testable becauseq×p
C has rank q and Cβ is estimable due to
full-column rank of X.
c
Copyright 2012
Dan Nettleton (Iowa State University)
Statistics 611
22 / 42
Q(β̃) = min{Q(β) : Cβ = d}
= min{ky − Xβk2 : β ∈ Rp 3 β 2 = 0}
= min{ky − X1 β 1 k2 : β 1 ∈ Rp1 }
= ky − X1 β̂ 1 k2 = ky − PX1 yk2
= k(I − PX1 )yk2 = y0 (I − PX1 )y
= SSEReduced .
c
Copyright 2012
Dan Nettleton (Iowa State University)
Statistics 611
23 / 42
Q(β̂) = ky − Xβ̂k2
= ky − PX yk2
= k(I − PX )yk2
= y0 (I − PX )y
= SSEFull .
c
Copyright 2012
Dan Nettleton (Iowa State University)
Statistics 611
24 / 42
The DF for SSEFull is
DFF = rank(I − PX ) = n − p.
The DF for SSEReduced is
DFR = rank(I − PX1 ) = n − p1 .
DFR − DFF = p − p1 = q.
c
Copyright 2012
Dan Nettleton (Iowa State University)
Statistics 611
25 / 42
We have shown that, in the general case, the likelihood ratio test (LRT)
of
H0 : Cβ = d
rejects for sufficiently large
[Q(β̃) − Q(β̂)]/q
Q(β̂)/(n − r)
c
Copyright 2012
Dan Nettleton (Iowa State University)
.
Statistics 611
26 / 42
In this example,
[Q(β̃) − Q(β̂)]/q
Q(β̂)/(n − r)
=
[SSEReduced − SSEFull ]/(DFR − DFF )
,
SSEFull /DFF
which should look familiar.
c
Copyright 2012
Dan Nettleton (Iowa State University)
Statistics 611
27 / 42
We now show that for a testable GLH
H0 : Cβ = d,
the GLT statistic
F =
=
(Cβ̂ − d)0 (C(X0 X)− C0 )−1 (Cβ̂ − d)/q
σ̂ 2
[Q(β̃) − Q(β̂)]/q
.
Q(β̂)/(n − r)
Thus, the GLT is equivalent to the LRT.
c
Copyright 2012
Dan Nettleton (Iowa State University)
Statistics 611
28 / 42
Note that
0
Q(β̂)/(n − r) = y
I − PX
n−r
y
= σ̂ 2 .
Thus, it remains to show
Q(β̃) − Q(β̂) = (Cβ̂ − d)0 (C(X0 X)− C0 )−1 (Cβ̂ − d).
From 3.10, we know that β̃ is leading subvector of solution to RNE.
c
Copyright 2012
Dan Nettleton (Iowa State University)
Statistics 611
29 / 42
Theorem 6.1:
If Cβ = d is testable and β̃ is the leading subvecctor of a solution to
the RNE
"
X 0 X C0
C
0
#" #
b
λ
=
" #
X0 y
d
,
then
Q(β̃) − Q(β̂) = (β̂ − β̃)0 X0 X(β̂ − β̃)
= (Cβ̂ − d)0 (C(X0 X)− C0 )−1 (Cβ̂ − d).
c
Copyright 2012
Dan Nettleton (Iowa State University)
Statistics 611
30 / 42
Proof of Result 6.1:
First show that
Q(β̃) − Q(β̂) = (β̂ − β̃)0 X0 X(β̂ − β̃).
c
Copyright 2012
Dan Nettleton (Iowa State University)
Statistics 611
31 / 42
Now let λ̃ denote the trailing subvector of the solution to RNE whose
leading subvector is β̃; i.e., suppose
" #
β̃
λ̃
is solution to RNE.
Show X0 X(β̂ − β̃) = C0 λ̃.
c
Copyright 2012
Dan Nettleton (Iowa State University)
Statistics 611
34 / 42
Now show
λ̃ = (C(X0 X)− C0 )−1 (Cβ̂ − d).
c
Copyright 2012
Dan Nettleton (Iowa State University)
Statistics 611
36 / 42
We have established
(1) Q(β̂) − Q(β̃) = (β̂ − β̃)0 X0 X(β̂ − β̃)
(2) X0 X(β̂ − β̃) = C0 λ̃
(3) λ̃ = (C(X0 X)− C0 )−1 (Cβ̂ − d).
Use these results to finish the proof.
c
Copyright 2012
Dan Nettleton (Iowa State University)
Statistics 611
38 / 42
Corollary 6.4:
Suppose Cβ = d is testable, and suppose β̂ is a solution to the NE.
Then the leading subvector of a solution to the RNE with constraint
Cβ = d
can be found by solving for b in the equations
X0 Xb = X0 y − C0 (C(X0 X)− C0 )−1 (C0 β̂ − d).
c
Copyright 2012
Dan Nettleton (Iowa State University)
Statistics 611
41 / 42
Download