Cochran's Theorem and Analysis of Variance
© Copyright 2012 Dan Nettleton (Iowa State University), Statistics 611

Theorem 5.1 (Cochran's Theorem):
Suppose Y ∼ N(µ, σ²I) and A_1, …, A_k are symmetric and idempotent n × n matrices with rank(A_i) = s_i for i = 1, …, k. Then

    A_1 + ⋯ + A_k = I (n × n)

implies that the quadratic forms Y′A_iY/σ² (i = 1, …, k) are independently distributed as χ²_{s_i}(φ_i), with

    φ_i = µ′A_iµ/(2σ²)  and  s_1 + ⋯ + s_k = n.

Proof of Theorem 5.1:
By Result 5.15,

    Y′A_iY/σ² ∼ χ²_{s_i}(φ_i)  with  φ_i = µ′A_iµ/(2σ²)  ∀ i = 1, …, k,

because (1/σ²)A_i(σ²I) = A_i is idempotent and has rank s_i.

Also,

    ∑_{i=1}^k s_i = ∑_{i=1}^k rank(A_i) = ∑_{i=1}^k trace(A_i) = trace(∑_{i=1}^k A_i) = trace(I) = n.

By Lemma 5.1, for each i = 1, …, k there exists an n × s_i matrix G_i such that G_iG_i′ = A_i and G_i′G_i = I (s_i × s_i).

Now let G = [G_1, …, G_k]. Because G_i is n × s_i ∀ i = 1, …, k and ∑_{i=1}^k s_i = n, it follows that G is n × n.

Moreover,

    GG′ = [G_1, …, G_k][G_1, …, G_k]′ = ∑_{i=1}^k G_iG_i′ = ∑_{i=1}^k A_i = I.

Thus, the n × n matrix G has G′ as its inverse; i.e., G⁻¹ = G′, and therefore G′G = I as well.

Now we have

    I = G′G = [ G_1′G_1  G_1′G_2  ⋯  G_1′G_k
                G_2′G_1  G_2′G_2  ⋯  G_2′G_k
                   ⋮        ⋮     ⋱     ⋮
                G_k′G_1  G_k′G_2  ⋯  G_k′G_k ].

∴ G_i′G_j = 0 ∀ i ≠ j
∴ G_iG_i′G_jG_j′ = 0 ∀ i ≠ j
∴ A_iA_j = 0 ∀ i ≠ j
∴ σ²A_iA_j = 0 ∀ i ≠ j
∴ A_i(σ²I)A_j = 0 ∀ i ≠ j
∴ by Corollary 5.4, each pair of quadratic forms Y′A_iY/σ² and Y′A_jY/σ² is independent.

However, we can prove more than pairwise independence.
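The algebra above can be checked numerically for a small concrete decomposition. The sketch below (assuming NumPy is available; the k = 2 choice of A_1, A_2 is just one convenient illustration) verifies symmetry, idempotence, the rank/trace identity used in the proof, and the product condition A_iA_j = 0 that drives the independence argument:

```python
import numpy as np

n = 5
one = np.ones((n, 1))

# An illustrative k = 2 decomposition: A1 projects onto span{1},
# and A2 = I - A1 is the centering matrix.
A1 = one @ one.T / n
A2 = np.eye(n) - A1

# Both are symmetric and idempotent, and they sum to I.
assert np.allclose(A1, A1.T) and np.allclose(A1 @ A1, A1)
assert np.allclose(A2, A2.T) and np.allclose(A2 @ A2, A2)
assert np.allclose(A1 + A2, np.eye(n))

# rank(A_i) = trace(A_i) for idempotent matrices, and the ranks sum to n.
s1 = np.linalg.matrix_rank(A1)   # 1
s2 = np.linalg.matrix_rank(A2)   # n - 1
assert s1 + s2 == n
assert np.isclose(np.trace(A1), s1) and np.isclose(np.trace(A2), s2)

# The key consequence used for independence: A_i A_j = 0 for i != j.
assert np.allclose(A1 @ A2, np.zeros((n, n)))
```

This particular pair (P_1, I − P_1) is the decomposition used in the i.i.d. example that follows.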
Note that

    G′Y = [ G_1′Y
              ⋮
            G_k′Y ] ∼ N(G′µ, G′(σ²I)G) = N(G′µ, σ²I),

since G′(σ²I)G = σ²G′G = σ²I. By Result 5.4, G_1′Y, …, G_k′Y are mutually independent.

G_1′Y, …, G_k′Y mutually independent
    ⟹ ‖G_1′Y‖², …, ‖G_k′Y‖² mutually independent
    ⟹ Y′G_1G_1′Y, …, Y′G_kG_k′Y mutually independent
    ⟹ Y′A_1Y/σ², …, Y′A_kY/σ² mutually independent.

Example:
Suppose Y_1, …, Y_n i.i.d. ∼ N(µ, σ²). Find the joint distribution of nȲ·² and ∑_{i=1}^n (Y_i − Ȳ·)².

Let A_1 = P_1 = 1(1′1)⁻1′ = (1/n)11′ and A_2 = I − P_1 = I − (1/n)11′. Then rank(A_1) = 1 and rank(A_2) = n − 1. Also, A_1 and A_2 are each symmetric and idempotent, with A_1 + A_2 = P_1 + I − P_1 = I. Let Y = (Y_1, …, Y_n)′, so that E(Y) = µ1 and Var(Y) = σ²I.

Cochran's Theorem implies that the Y′A_iY/σ² are independently distributed as χ²_{s_i}(φ_i), where s_i = rank(A_i) and

    φ_i = µ′A_iµ/(2σ²) = (µ²/2σ²) 1′A_i1.

For i = 1, we have

    Y′11′Y/(nσ²) = nȲ·²/σ²,  s_1 = 1,  and
    φ_1 = (µ²/2σ²)(1/n) 1′11′1 = nµ²/(2σ²).

For i = 2, we have

    Y′A_2Y/σ² = Y′A_2′A_2Y/σ² = ‖A_2Y‖²/σ² = ‖(I − (1/n)11′)Y‖²/σ²
              = ‖Y − 1Ȳ·‖²/σ² = (1/σ²) ∑_{i=1}^n (Y_i − Ȳ·)².

Also, s_2 = n − 1 and

    φ_2 = (µ²/2σ²) 1′(I − (1/n)11′)1 = (µ²/2σ²)(1′1 − (1/n)1′11′1) = (µ²/2σ²)(n − n²/n) = 0.

Thus,

    nȲ·² ∼ σ²χ²_1(nµ²/(2σ²))  independent of  ∑_{i=1}^n (Y_i − Ȳ·)² ∼ σ²χ²_{n−1}.
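A quick numerical sketch of this example (NumPy assumed; the sample size and parameter values below are arbitrary) confirms that the two quantities are exactly the quadratic forms Y′A_1Y and Y′A_2Y, and that together they partition the total sum of squares Y′Y:

```python
import numpy as np

rng = np.random.default_rng(0)
n, mu, sigma = 8, 2.0, 1.5
Y = rng.normal(mu, sigma, size=n)

ybar = Y.mean()
q1 = n * ybar**2            # Y'A1Y with A1 = (1/n)11'
q2 = np.sum((Y - ybar)**2)  # Y'A2Y with A2 = I - (1/n)11'

# The two quadratic forms decompose the total sum of squares Y'Y.
assert np.isclose(q1 + q2, np.sum(Y**2))

# Same values computed directly from the matrices A1 and A2.
one = np.ones((n, 1))
A1 = one @ one.T / n
A2 = np.eye(n) - A1
assert np.isclose(Y @ A1 @ Y, q1)
assert np.isclose(Y @ A2 @ Y, q2)
```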
It follows that

    (nȲ·²/σ²) / ( [∑_{i=1}^n (Y_i − Ȳ·)²/(n−1)] / σ² ) = nȲ·² / [∑_{i=1}^n (Y_i − Ȳ·)²/(n−1)] ∼ F_{1,n−1}(nµ²/(2σ²)).

If µ = 0,

    nȲ·² / [∑_{i=1}^n (Y_i − Ȳ·)²/(n−1)] ∼ F_{1,n−1}.

Thus, we can test H_0 : µ = 0 by comparing nȲ·²/[∑_{i=1}^n (Y_i − Ȳ·)²/(n−1)] to the F_{1,n−1} distribution and rejecting H_0 for large values.

Note that

    nȲ·² / [∑_{i=1}^n (Y_i − Ȳ·)²/(n−1)] = t²,  where  t = Ȳ· / √(s²/n),

the usual t-statistic for testing H_0 : µ = 0.

Example (ANalysis Of VAriance, ANOVA):
Consider Y = Xβ + ε, where

    X = [X_1, X_2, …, X_m],  β = (β_1′, β_2′, …, β_m′)′,  ε ∼ N(0, σ²I).

Let P_j = P_{[X_1,…,X_j]} for j = 1, …, m. Let

    A_1 = P_1
    A_2 = P_2 − P_1
    A_3 = P_3 − P_2
      ⋮
    A_m = P_m − P_{m−1}
    A_{m+1} = I − P_m.

Note that ∑_{j=1}^{m+1} A_j = I.

Then A_j is symmetric and idempotent ∀ j = 1, …, m+1, and

    s_1 = rank(A_1) = rank(P_1)
    s_2 = rank(A_2) = rank(P_2) − rank(P_1)
      ⋮
    s_m = rank(A_m) = rank(P_m) − rank(P_{m−1})
    s_{m+1} = rank(A_{m+1}) = rank(I) − rank(P_m) = n − rank(X).

It follows that

    Y′A_jY/σ² ∼ χ²_{s_j}(φ_j)  independently, ∀ j = 1, …, m+1,  where  φ_j = β′X′A_jXβ/(2σ²).

Note

    φ_{m+1} = β′X′(I − P_X)Xβ/(2σ²) = β′X′(X − P_XX)β/(2σ²) = β′X′(X − X)β/(2σ²) = 0.

Thus,

    Y′A_{m+1}Y/σ² ∼ χ²_{n−rank(X)}

and

    F_j = [Y′A_jY/s_j] / [Y′A_{m+1}Y/(n − rank(X))] ∼ F_{s_j, n−rank(X)}(β′X′A_jXβ/(2σ²))  ∀ j = 1, …, m.
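The F = t² identity, and the agreement of the corresponding p-values, can be verified numerically. This is a sketch assuming SciPy is available; the data values are arbitrary:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 12
Y = rng.normal(0.0, 1.0, size=n)

ybar = Y.mean()
s2 = np.sum((Y - ybar)**2) / (n - 1)  # usual sample variance

F = n * ybar**2 / s2                  # the F statistic above
t = ybar / np.sqrt(s2 / n)            # the usual one-sample t statistic
assert np.isclose(F, t**2)

# The F(1, n-1) upper-tail p-value equals the two-sided t-test p-value,
# since F with 1 numerator df is the square of a t variate.
p_F = stats.f.sf(F, 1, n - 1)
p_t = 2 * stats.t.sf(abs(t), n - 1)
assert np.isclose(p_F, p_t)
```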
We can assemble the ANOVA table as follows:

    Source   | Sum of Squares | DF       | Mean Square         | Expected Mean Square | F
    ---------+----------------+----------+---------------------+----------------------+-----
    A_1      | Y′A_1Y         | s_1      | Y′A_1Y/s_1          | σ² + β′X′A_1Xβ/s_1   | F_1
      ⋮      |   ⋮            |   ⋮      |   ⋮                 |   ⋮                  |  ⋮
    A_m      | Y′A_mY         | s_m      | Y′A_mY/s_m          | σ² + β′X′A_mXβ/s_m   | F_m
    A_{m+1}  | Y′A_{m+1}Y     | s_{m+1}  | Y′A_{m+1}Y/s_{m+1}  | σ²                   |
    I        | Y′IY           | n        |                     |                      |

This ANOVA table contains sequential (a.k.a. Type I) sums of squares. A_{m+1} corresponds to "error."

We can use F_j to test

    H_{0j} : β′X′A_jXβ/(2σ²) = 0  ⟺  H_{0j} : β′X′A_j′A_jXβ = 0  ⟺  H_{0j} : A_jXβ = 0.

Now

    A_jXβ = A_j[X_1, …, X_m](β_1′, …, β_m′)′ = A_j ∑_{i=1}^m X_iβ_i = ∑_{i=1}^m A_jX_iβ_i
          = ∑_{i=1}^m (P_j − P_{j−1})X_iβ_i  ∀ j = 1, …, m  (where P_0 = 0).

Recall P_j = P_{[X_1,…,X_j]}. Thus, P_jX_i = X_i ∀ i ≤ j. It follows that (P_j − P_{j−1})X_i = 0 whenever i ≤ j − 1. Therefore

    A_jXβ = ∑_{i=1}^m (P_j − P_{j−1})X_iβ_i = ∑_{i=j}^m (P_j − P_{j−1})X_iβ_i.

For j = m, this simplifies to

    A_mXβ = (P_m − P_{m−1})X_mβ_m = (I − P_{m−1})P_mX_mβ_m = (I − P_{m−1})X_mβ_m.

Now

    (I − P_{m−1})X_mβ_m = 0 ⟺ X_mβ_m ∈ N(I − P_{m−1}) ⟺ X_mβ_m ∈ C(P_{m−1}) ⟺ X_mβ_m ∈ C([X_1, …, X_{m−1}]).

Furthermore,

    X_mβ_m ∈ C([X_1, …, X_{m−1}]) ⟺ E(Y) = Xβ = ∑_{i=1}^m X_iβ_i ∈ C([X_1, …, X_{m−1}])
    ⟺ the explanatory variables in X_m are irrelevant in the presence of X_1, …, X_{m−1}.

For the special case where X has full column rank,

    X_mβ_m ∈ C([X_1, …, X_{m−1}]) ⟺ X_mβ_m = 0 ⟺ β_m = 0.

Thus, in this full-column-rank case, F_m tests H_{0m} : β_m = 0.
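The sequential decomposition can be illustrated with a small simulated design. In this sketch (NumPy assumed; the two-block design with an intercept and one covariate, and the `proj` helper, are hypothetical choices for illustration), the sequential sums of squares Y′A_jY partition Y′Y and the degrees of freedom sum to n:

```python
import numpy as np

def proj(X):
    """Orthogonal projection matrix onto C(X), via the pseudoinverse."""
    return X @ np.linalg.pinv(X)

rng = np.random.default_rng(2)
n = 10
# Hypothetical design split into m = 2 blocks: intercept, then one covariate.
X1 = np.ones((n, 1))
X2 = rng.normal(size=(n, 1))
X = np.hstack([X1, X2])
Y = rng.normal(size=n)

P1, P2 = proj(X1), proj(X)
A1, A2, A3 = P1, P2 - P1, np.eye(n) - P2  # A3 = A_{m+1} is "error"

ss = [Y @ A @ Y for A in (A1, A2, A3)]    # sequential (Type I) sums of squares
df = [np.linalg.matrix_rank(A) for A in (A1, A2, A3)]

# The sequential SS partition Y'Y, and the degrees of freedom sum to n.
assert np.isclose(sum(ss), Y @ Y)
assert sum(df) == n

F2 = (ss[1] / df[1]) / (ss[2] / df[2])    # F statistic for the X2 block
```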
Exercise: Explain what F_j tests for the special case where X_k′X_{k*} = 0 ∀ k ≠ k*.

Now suppose X_k′X_{k*} = 0 ∀ k ≠ k*. Then

    P_jX_i = X_i for i ≤ j  and  P_jX_i = 0 for i > j,

because

    [X_1, …, X_j]′X_i = [ X_1′X_i        [ 0
                             ⋮       =      ⋮    for i > j.
                          X_j′X_i ]        0 ]

It follows that

    A_jXβ = ∑_{i=1}^m (P_j − P_{j−1})X_iβ_i = P_jX_jβ_j = X_jβ_j.

Thus, for the orthogonal case, F_j can be used to test

    H_{0j} : X_jβ_j = 0  ∀ j = 1, …, m.

If X has full column rank in addition to the orthogonality condition, F_j tests

    H_{0j} : β_j = 0  ∀ j = 1, …, m.
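A small numerical illustration of the orthogonal case (NumPy assumed; the cell-means-style design, the `proj` helper, and the β values below are hypothetical) verifies that P_jX_i = 0 for i > j and that A_jXβ collapses to X_jβ_j:

```python
import numpy as np

def proj(X):
    """Orthogonal projection matrix onto C(X), via the pseudoinverse."""
    return X @ np.linalg.pinv(X)

n = 6
# Hypothetical mutually orthogonal blocks (indicator columns of a
# one-way cell-means design).
X1 = np.array([1, 1, 0, 0, 0, 0], float).reshape(-1, 1)
X2 = np.array([0, 0, 1, 1, 0, 0], float).reshape(-1, 1)
X3 = np.array([0, 0, 0, 0, 1, 1], float).reshape(-1, 1)
assert np.allclose(X1.T @ X2, 0) and np.allclose(X1.T @ X3, 0)
assert np.allclose(X2.T @ X3, 0)

P1 = proj(X1)
P2 = proj(np.hstack([X1, X2]))

# With orthogonal blocks, P_j X_i = 0 for i > j.
assert np.allclose(P1 @ X2, 0) and np.allclose(P2 @ X3, 0)

# Hence A_j X beta reduces to X_j beta_j (shown here for j = 2).
beta = np.array([1.0, -2.0, 0.5])
Xb = (X1 * beta[0] + X2 * beta[1] + X3 * beta[2]).ravel()
A2 = P2 - P1
assert np.allclose(A2 @ Xb, X2.ravel() * beta[1])
```

So in this design the F statistic for block j depends on β only through X_jβ_j, matching the conclusion above.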