Likelihood Ratio Test of a General Linear Hypothesis
Copyright © 2012 Dan Nettleton (Iowa State University), Statistics 611

Consider the likelihood ratio test of

\[ H_0: C\beta = d \quad \text{vs.} \quad H_A: C\beta \neq d. \]

Suppose $y \sim N(X\beta, \sigma^2 I)$. The likelihood function is

\[ L(\beta, \sigma^2 \mid y) = (2\pi\sigma^2)^{-n/2} \exp\!\left( -\frac{(y - X\beta)'(y - X\beta)}{2\sigma^2} \right) \]

for $\beta \in \mathbb{R}^p$ and $\sigma^2 > 0$.

Note that

\[ L(\beta, \sigma^2 \mid y) = (2\pi\sigma^2)^{-n/2} \exp\!\left( -\frac{Q(\beta)}{2\sigma^2} \right), \]

where $Q(\beta) = (y - X\beta)'(y - X\beta) = \|y - X\beta\|^2$.

The parameter space under the null hypothesis $H_0: C\beta = d$ is

\[ \Omega_0 = \{(\beta, \sigma^2) : C\beta = d,\ \sigma^2 > 0\}. \]

The parameter space corresponding to the union of the null and alternative parameter spaces is

\[ \Omega = \{(\beta, \sigma^2) : \beta \in \mathbb{R}^p,\ \sigma^2 > 0\}. \]

The likelihood ratio test rejects $H_0$ iff

\[ \Lambda(y) = \frac{\sup_{\Omega_0} L(\beta, \sigma^2 \mid y)}{\sup_{\Omega} L(\beta, \sigma^2 \mid y)} \]

is sufficiently small.

To conduct a significance level $\alpha$ likelihood ratio test, we reject $H_0$ iff $\Lambda(y) \leq c_\alpha$, where $c_\alpha$ satisfies

\[ \sup\{P(\Lambda(y) \leq c_\alpha \mid \beta, \sigma^2) : (\beta, \sigma^2) \in \Omega_0\} \leq \alpha. \]

To find $\Lambda(y)$, we must maximize the likelihood over $\Omega_0$ and over $\Omega$. For any fixed $\beta \in \mathbb{R}^p$, we can find the value of $\sigma^2 > 0$ that maximizes $L(\beta, \sigma^2 \mid y)$ as follows.

Because $\log$ is a strictly increasing function, the value of $\sigma^2$ that maximizes $L(\beta, \sigma^2 \mid y)$ is the same as the value of $\sigma^2$ that maximizes $\ell(\beta, \sigma^2 \mid y) \equiv \log L(\beta, \sigma^2 \mid y)$. We have

\[ \ell(\beta, \sigma^2 \mid y) = -\frac{n}{2} \log(2\pi\sigma^2) - \frac{Q(\beta)}{2\sigma^2}, \qquad \frac{\partial \ell(\beta, \sigma^2 \mid y)}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{Q(\beta)}{2\sigma^4}. \]

Equating the derivative to 0 and solving for $\sigma^2$ yields

\[ \hat\sigma^2(\beta) = \frac{Q(\beta)}{n}. \]
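The profile maximization over $\sigma^2$ is easy to check numerically: for any fixed $\beta$, the log-likelihood as a function of $\sigma^2$ should peak at $Q(\beta)/n$. A minimal sketch in Python; the design matrix, response, and $\beta$ below are arbitrary simulated values, not from the notes.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 3
X = rng.normal(size=(n, p))          # arbitrary design matrix
y = rng.normal(size=n)               # arbitrary response
beta = np.array([1.0, -0.5, 2.0])    # a fixed (not estimated) beta

Q = np.sum((y - X @ beta) ** 2)      # Q(beta) = ||y - X beta||^2

def loglik_sigma2(s2):
    # l(beta, sigma^2 | y) = -(n/2) log(2 pi sigma^2) - Q(beta)/(2 sigma^2)
    return -0.5 * n * np.log(2 * np.pi * s2) - Q / (2 * s2)

s2_hat = Q / n                       # claimed maximizer Q(beta)/n
grid = np.linspace(0.05 * s2_hat, 20 * s2_hat, 2000)
assert np.all(loglik_sigma2(s2_hat) >= loglik_sigma2(grid))
```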
Furthermore, examination of $\partial \ell(\beta, \sigma^2 \mid y)/\partial \sigma^2$ as a function of $\sigma^2$ shows that $\ell(\beta, \sigma^2 \mid y)$, as a function of $\sigma^2$, is increasing to the left of $Q(\beta)/n$ and decreasing to the right of $Q(\beta)/n$. Thus, for any fixed $\beta \in \mathbb{R}^p$, the likelihood is maximized over $\sigma^2 > 0$ at $Q(\beta)/n$.

Now note that

\[ L(\beta, \hat\sigma^2(\beta) \mid y) = (2\pi Q(\beta)/n)^{-n/2}\, e^{-\frac{Q(\beta)}{2Q(\beta)/n}} = (2\pi e\, Q(\beta)/n)^{-n/2}, \]

which is clearly maximized over $\beta$ by minimizing $Q(\beta)$ over $\beta$.

Let $\hat\beta$ be any minimizer of $Q(\beta)$ over $\beta \in \mathbb{R}^p$. We know from previous results that $\hat\beta$ minimizes $Q(\beta)$ over $\beta \in \mathbb{R}^p$ iff $\hat\beta$ is a solution to the normal equations (NE); $(X'X)^- X'y$ is one solution.

Thus, the maximum likelihood estimator (MLE) of $\sigma^2$ is

\[ \hat\sigma^2_{\text{MLE}} = Q(\hat\beta)/n = \|y - X\hat\beta\|^2 / n, \]

where $\hat\beta$ is any solution to the NE. Recall that $X\hat\beta = P_X y$ is the same for all solutions to the NE. Thus, $\hat\sigma^2_{\text{MLE}}$ is the same for any $\hat\beta$ that solves the NE.

Note that the MLE of $\sigma^2$ satisfies

\[ \hat\sigma^2_{\text{MLE}} = \frac{Q(\hat\beta)}{n} = \frac{n - \text{rank}(X)}{n} \cdot \frac{Q(\hat\beta)}{n - \text{rank}(X)} = \frac{n - \text{rank}(X)}{n} \cdot \frac{y'(I - P_X)y}{n - \text{rank}(X)} = \frac{n - \text{rank}(X)}{n}\, \hat\sigma^2. \]

Recall that $E(\hat\sigma^2) = \sigma^2$. Thus,

\[ E(\hat\sigma^2_{\text{MLE}}) = \frac{n - \text{rank}(X)}{n}\, \sigma^2 < \sigma^2. \]

If $X$ is of full column rank, then $\hat\beta = (X'X)^{-1} X'y$ is the maximum likelihood estimator (MLE) of $\beta$.

We have

\[ \sup_{\Omega} L(\beta, \sigma^2 \mid y) = L(\hat\beta, \hat\sigma^2_{\text{MLE}} \mid y) = (2\pi e\, Q(\hat\beta)/n)^{-n/2}. \]

If we let $\tilde\beta$ denote the minimizer of $Q(\beta)$ over $\beta \in \mathbb{R}^p$ satisfying $C\beta = d$, then

\[ \sup_{\Omega_0} L(\beta, \sigma^2 \mid y) = L(\tilde\beta, Q(\tilde\beta)/n \mid y) = (2\pi e\, Q(\tilde\beta)/n)^{-n/2}. \]
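The relationship between the MLE of $\sigma^2$ and the usual unbiased estimator $\hat\sigma^2$ can be verified on a small simulated example (all quantities below are simulated illustrations, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 30, 4
X = rng.normal(size=(n, p))                       # full column rank (w.p. 1)
y = X @ rng.normal(size=p) + rng.normal(size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # one solution to the NE
r = np.linalg.matrix_rank(X)
Q_hat = np.sum((y - X @ beta_hat) ** 2)           # Q(beta-hat) = y'(I - P_X)y

s2_mle = Q_hat / n                  # MLE of sigma^2
s2_hat = Q_hat / (n - r)            # unbiased estimator sigma^2-hat
assert np.isclose(s2_mle, (n - r) / n * s2_hat)   # the identity above
```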
Thus

\[ \Lambda(y) = \frac{(2\pi e\, Q(\tilde\beta)/n)^{-n/2}}{(2\pi e\, Q(\hat\beta)/n)^{-n/2}} = \left[ \frac{Q(\tilde\beta)}{Q(\hat\beta)} \right]^{-n/2}. \]

Now note that

\begin{align*}
\Lambda(y) \leq c_\alpha
&\iff \left[ \frac{Q(\tilde\beta)}{Q(\hat\beta)} \right]^{-n/2} \leq c_\alpha \\
&\iff \frac{Q(\tilde\beta)}{Q(\hat\beta)} \geq c_\alpha^{-2/n} \\
&\iff \frac{Q(\tilde\beta)}{Q(\hat\beta)} - 1 \geq c_\alpha^{-2/n} - 1 \\
&\iff \frac{Q(\tilde\beta) - Q(\hat\beta)}{Q(\hat\beta)} \geq c_\alpha^{-2/n} - 1 \\
&\iff \frac{[Q(\tilde\beta) - Q(\hat\beta)]/q}{Q(\hat\beta)/(n - r)} \geq \frac{n - r}{q}\,(c_\alpha^{-2/n} - 1),
\end{align*}

where $q = \text{rank}(C)$ and $n - r = n - \text{rank}(X)$.

Example: Suppose $y = X\beta + \varepsilon$, where $\varepsilon \sim N(0, \sigma^2 I)$. Furthermore, suppose $\text{rank}(\underset{n \times p}{X}) = p$. Partition

\[ X = [X_1, X_2] \quad \text{and} \quad \beta = \begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix}, \]

where $X_1$ is $n \times p_1$ and $\beta_1$ is $p_1 \times 1$. Suppose we wish to test $H_0: \beta_2 = 0$.

$H_0: \beta_2 = 0$ is a GLH $H_0: C\beta = d$ with

\[ C = [\underset{q \times p_1}{0},\ \underset{q \times q}{I}] \quad \text{and} \quad d = \underset{q \times 1}{0}, \]

where $q = p - p_1$. This GLH is testable because $\underset{q \times p}{C}$ has rank $q$ and $C\beta$ is estimable due to the full column rank of $X$.

\begin{align*}
Q(\tilde\beta) &= \min\{Q(\beta) : C\beta = d\} \\
&= \min\{\|y - X\beta\|^2 : \beta \in \mathbb{R}^p \ni \beta_2 = 0\} \\
&= \min\{\|y - X_1\beta_1\|^2 : \beta_1 \in \mathbb{R}^{p_1}\} \\
&= \|y - X_1\hat\beta_1\|^2 = \|y - P_{X_1} y\|^2 = \|(I - P_{X_1})y\|^2 = y'(I - P_{X_1})y = \text{SSE}_{\text{Reduced}}.
\end{align*}

\[ Q(\hat\beta) = \|y - X\hat\beta\|^2 = \|y - P_X y\|^2 = \|(I - P_X)y\|^2 = y'(I - P_X)y = \text{SSE}_{\text{Full}}. \]

The degrees of freedom for $\text{SSE}_{\text{Full}}$ are $\text{DF}_F = \text{rank}(I - P_X) = n - p$. The degrees of freedom for $\text{SSE}_{\text{Reduced}}$ are $\text{DF}_R = \text{rank}(I - P_{X_1}) = n - p_1$. Hence $\text{DF}_R - \text{DF}_F = p - p_1 = q$.
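The reduced-versus-full comparison in this example can be sketched directly with projection matrices. The data below are simulated for illustration; `proj` is a hypothetical helper, not part of the notes.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p1, q = 25, 2, 3
p = p1 + q
X = rng.normal(size=(n, p))
X1 = X[:, :p1]                                   # reduced-model design
y = X @ rng.normal(size=p) + rng.normal(size=n)

def proj(A):
    # orthogonal projection P_A onto the column space of A
    return A @ np.linalg.pinv(A.T @ A) @ A.T

I = np.eye(n)
sse_full = y @ (I - proj(X)) @ y    # Q(beta-hat)   = SSE_Full
sse_red = y @ (I - proj(X1)) @ y    # Q(beta-tilde) = SSE_Reduced
df_f, df_r = n - p, n - p1

# the LRT statistic in reduced-vs-full form
F = ((sse_red - sse_full) / (df_r - df_f)) / (sse_full / df_f)
assert df_r - df_f == q and sse_red >= sse_full
```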
We have shown that, in the general case, the likelihood ratio test (LRT) of $H_0: C\beta = d$ rejects for sufficiently large values of

\[ \frac{[Q(\tilde\beta) - Q(\hat\beta)]/q}{Q(\hat\beta)/(n - r)}. \]

In this example,

\[ \frac{[Q(\tilde\beta) - Q(\hat\beta)]/q}{Q(\hat\beta)/(n - r)} = \frac{[\text{SSE}_{\text{Reduced}} - \text{SSE}_{\text{Full}}]/(\text{DF}_R - \text{DF}_F)}{\text{SSE}_{\text{Full}}/\text{DF}_F}, \]

which should look familiar.

We now show that for a testable GLH $H_0: C\beta = d$, the general linear test (GLT) statistic satisfies

\[ F = \frac{(C\hat\beta - d)'(C(X'X)^- C')^{-1}(C\hat\beta - d)/q}{\hat\sigma^2} = \frac{[Q(\tilde\beta) - Q(\hat\beta)]/q}{Q(\hat\beta)/(n - r)}. \]

Thus, the GLT is equivalent to the LRT.

Note that

\[ Q(\hat\beta)/(n - r) = \frac{y'(I - P_X)y}{n - r} = \hat\sigma^2. \]

Thus, it remains to show that

\[ Q(\tilde\beta) - Q(\hat\beta) = (C\hat\beta - d)'(C(X'X)^- C')^{-1}(C\hat\beta - d). \]

From 3.10, we know that $\tilde\beta$ is the leading subvector of a solution to the restricted normal equations (RNE).

Theorem 6.1: If $C\beta = d$ is testable and $\tilde\beta$ is the leading subvector of a solution to the RNE

\[ \begin{bmatrix} X'X & C' \\ C & 0 \end{bmatrix} \begin{bmatrix} b \\ \lambda \end{bmatrix} = \begin{bmatrix} X'y \\ d \end{bmatrix}, \]

then

\[ Q(\tilde\beta) - Q(\hat\beta) = (\hat\beta - \tilde\beta)' X'X (\hat\beta - \tilde\beta) = (C\hat\beta - d)'(C(X'X)^- C')^{-1}(C\hat\beta - d). \]

Proof of Theorem 6.1: First show that $Q(\tilde\beta) - Q(\hat\beta) = (\hat\beta - \tilde\beta)' X'X (\hat\beta - \tilde\beta)$:

\begin{align*}
Q(\tilde\beta) - Q(\hat\beta) &= (y - X\tilde\beta)'(y - X\tilde\beta) - (y - X\hat\beta)'(y - X\hat\beta) \\
&= y'y - 2\tilde\beta' X'y + \tilde\beta' X'X \tilde\beta - y'y + 2\hat\beta' X'y - \hat\beta' X'X \hat\beta \\
&= 2(\hat\beta - \tilde\beta)' X'y + \tilde\beta' X'X \tilde\beta - \hat\beta' X'X \hat\beta \\
&= 2(\hat\beta - \tilde\beta)' X'X \hat\beta + \tilde\beta' X'X \tilde\beta - \hat\beta' X'X \hat\beta \qquad (\text{because } X'X\hat\beta = X'y \text{ by the NE}) \\
&= \hat\beta' X'X \hat\beta - 2\tilde\beta' X'X \hat\beta + \tilde\beta' X'X \tilde\beta \\
&= (\hat\beta - \tilde\beta)' X'X (\hat\beta - \tilde\beta).
\end{align*}
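The first identity in Theorem 6.1 can be checked numerically in the $\beta_2 = 0$ example, where the constrained minimizer has the closed form $\tilde\beta = (\hat\beta_1', 0')'$ with $\hat\beta_1$ fit from $X_1$ alone. Simulated data, for illustration only:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p1, q = 25, 2, 3
p = p1 + q
X = rng.normal(size=(n, p))
X1 = X[:, :p1]
y = X @ rng.normal(size=p) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)      # full-rank X assumed
b1 = np.linalg.solve(X1.T @ X1, X1.T @ y)         # reduced-model fit
beta_tilde = np.concatenate([b1, np.zeros(q)])    # constrained minimizer

Q_hat = np.sum((y - X @ beta_hat) ** 2)
Q_tilde = np.sum((y - X @ beta_tilde) ** 2)
diff = beta_hat - beta_tilde
# Q(beta-tilde) - Q(beta-hat) = (beta-hat - beta-tilde)' X'X (beta-hat - beta-tilde)
assert np.isclose(Q_tilde - Q_hat, diff @ (X.T @ X) @ diff)
```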
Now let $\tilde\lambda$ denote the trailing subvector of the solution to the RNE whose leading subvector is $\tilde\beta$; i.e., suppose

\[ \begin{bmatrix} \tilde\beta \\ \tilde\lambda \end{bmatrix} \]

is a solution to the RNE. Next show that $X'X(\hat\beta - \tilde\beta) = C'\tilde\lambda$. Using the NE and the first block of the RNE,

\[ X'X(\hat\beta - \tilde\beta) = X'X\hat\beta - X'X\tilde\beta = X'y - (X'y - C'\tilde\lambda) = C'\tilde\lambda. \]

Now show that $\tilde\lambda = (C(X'X)^- C')^{-1}(C\hat\beta - d)$. Because $C\beta$ is estimable, $C = AX$ for some matrix $A$, so

\begin{align*}
X'X(\hat\beta - \tilde\beta) = C'\tilde\lambda
&\implies C(X'X)^- X'X(\hat\beta - \tilde\beta) = C(X'X)^- C'\tilde\lambda \\
&\implies AX(X'X)^- X'X(\hat\beta - \tilde\beta) = C(X'X)^- C'\tilde\lambda \\
&\implies AX(\hat\beta - \tilde\beta) = C(X'X)^- C'\tilde\lambda \\
&\implies C(\hat\beta - \tilde\beta) = C(X'X)^- C'\tilde\lambda \\
&\implies C\hat\beta - C\tilde\beta = C(X'X)^- C'\tilde\lambda \\
&\implies C\hat\beta - d = C(X'X)^- C'\tilde\lambda \\
&\implies (C(X'X)^- C')^{-1}(C\hat\beta - d) = \tilde\lambda.
\end{align*}

We have established

(1) $Q(\tilde\beta) - Q(\hat\beta) = (\hat\beta - \tilde\beta)' X'X (\hat\beta - \tilde\beta)$,

(2) $X'X(\hat\beta - \tilde\beta) = C'\tilde\lambda$,

(3) $\tilde\lambda = (C(X'X)^- C')^{-1}(C\hat\beta - d)$.

Use these results to finish the proof. Combining (2) and (3) gives

\[ X'X(\hat\beta - \tilde\beta) = C'(C(X'X)^- C')^{-1}(C\hat\beta - d). \]

Multiplying on the left by $(\hat\beta - \tilde\beta)'$ gives

\begin{align*}
(\hat\beta - \tilde\beta)' X'X (\hat\beta - \tilde\beta)
&= (\hat\beta - \tilde\beta)' C'(C(X'X)^- C')^{-1}(C\hat\beta - d) \\
&= (C\hat\beta - C\tilde\beta)'(C(X'X)^- C')^{-1}(C\hat\beta - d) \\
&= (C\hat\beta - d)'(C(X'X)^- C')^{-1}(C\hat\beta - d).
\end{align*}

Combining this with (1) gives the desired result.

Corollary 6.4: Suppose $C\beta = d$ is testable, and suppose $\hat\beta$ is a solution to the NE. Then the leading subvector of a solution to the RNE with constraint $C\beta = d$ can be found by solving for $b$ in the equations

\[ X'Xb = X'y - C'(C(X'X)^- C')^{-1}(C\hat\beta - d). \]
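Both conclusions of Theorem 6.1 can be checked by solving the block RNE system directly. A sketch assuming $X$ has full column rank (so $(X'X)^- = (X'X)^{-1}$), with $C$ and $d$ simulated for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, q = 25, 5, 2
X = rng.normal(size=(n, p))                      # full column rank (w.p. 1)
y = X @ rng.normal(size=p) + rng.normal(size=n)
C = rng.normal(size=(q, p))                      # full row rank (w.p. 1)
d = rng.normal(size=q)

XtX, Xty = X.T @ X, X.T @ y
# block RNE system: [[X'X, C'], [C, 0]] [b; lam] = [X'y; d]
K = np.block([[XtX, C.T], [C, np.zeros((q, q))]])
sol = np.linalg.solve(K, np.concatenate([Xty, d]))
beta_tilde, lam_tilde = sol[:p], sol[p:]

beta_hat = np.linalg.solve(XtX, Xty)             # unrestricted solution to the NE
# closed form from the proof: lam = (C (X'X)^{-1} C')^{-1} (C beta_hat - d)
lam_closed = np.linalg.solve(C @ np.linalg.inv(XtX) @ C.T, C @ beta_hat - d)

Q_hat = np.sum((y - X @ beta_hat) ** 2)
Q_tilde = np.sum((y - X @ beta_tilde) ** 2)
assert np.allclose(lam_tilde, lam_closed)
# (C beta_hat - d)' lam_closed = (C beta_hat - d)'(C(X'X)^{-1}C')^{-1}(C beta_hat - d)
assert np.isclose(Q_tilde - Q_hat, (C @ beta_hat - d) @ lam_closed)
```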
Proof of Corollary 6.4: In the proof of Theorem 6.1, we showed that

\[ \begin{bmatrix} X'X & C' \\ C & 0 \end{bmatrix} \begin{bmatrix} \tilde\beta \\ \tilde\lambda \end{bmatrix} = \begin{bmatrix} X'y \\ d \end{bmatrix}, \]

where $\tilde\lambda = (C(X'X)^- C')^{-1}(C\hat\beta - d)$. Thus

\[ X'X\tilde\beta + C'\tilde\lambda = X'y \iff X'X\tilde\beta = X'y - C'(C(X'X)^- C')^{-1}(C\hat\beta - d). \]
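Corollary 6.4 can likewise be checked numerically: solving its single system of equations should reproduce the leading subvector of the RNE solution, which in particular satisfies the constraint $Cb = d$. A sketch with $X$ of full column rank and simulated $C$, $d$:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, q = 25, 4, 2
X = rng.normal(size=(n, p))                      # full column rank (w.p. 1)
y = X @ rng.normal(size=p) + rng.normal(size=n)
C = rng.normal(size=(q, p))
d = rng.normal(size=q)

XtX, Xty = X.T @ X, X.T @ y
XtX_inv = np.linalg.inv(XtX)       # X full rank, so (X'X)^- = (X'X)^{-1}
beta_hat = XtX_inv @ Xty           # solution to the NE

# Corollary 6.4: solve X'X b = X'y - C'(C(X'X)^- C')^{-1}(C beta_hat - d)
lam = np.linalg.solve(C @ XtX_inv @ C.T, C @ beta_hat - d)
b = np.linalg.solve(XtX, Xty - C.T @ lam)

# b should match the leading subvector of the RNE solution and obey C b = d
K = np.block([[XtX, C.T], [C, np.zeros((q, q))]])
sol = np.linalg.solve(K, np.concatenate([Xty, d]))
assert np.allclose(C @ b, d) and np.allclose(b, sol[:p])
```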