The Response Depends on β Only through Xβ

In the Gauss-Markov or Normal Theory Gauss-Markov Linear Model, the distribution of y depends on β only through Xβ, i.e., y ∼ (Xβ, σ²I) or y ∼ N(Xβ, σ²I).

If X is not of full column rank, there are infinitely many vectors in the set {b : Xb = Xβ} for any fixed value of β. Thus, no matter what the value of E(y), there will be infinitely many vectors b such that Xb = E(y) when X is not of full column rank.

The response vector y can help us learn about E(y) = Xβ, but when X is not of full column rank, there is no hope of learning about β alone unless additional information about β is available.

© Copyright 2010 Dan Nettleton (Iowa State University), Statistics 511

Treatment Effects Model

Researchers randomly assigned a total of six experimental units to two treatments and measured a response of interest. The model is

\[
y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \quad i = 1, 2; \; j = 1, 2, 3,
\]

which in matrix form is

\[
\begin{bmatrix} y_{11}\\ y_{12}\\ y_{13}\\ y_{21}\\ y_{22}\\ y_{23} \end{bmatrix}
=
\begin{bmatrix} \mu+\tau_1\\ \mu+\tau_1\\ \mu+\tau_1\\ \mu+\tau_2\\ \mu+\tau_2\\ \mu+\tau_2 \end{bmatrix}
+
\begin{bmatrix} \varepsilon_{11}\\ \varepsilon_{12}\\ \varepsilon_{13}\\ \varepsilon_{21}\\ \varepsilon_{22}\\ \varepsilon_{23} \end{bmatrix}
=
\begin{bmatrix} 1&1&0\\ 1&1&0\\ 1&1&0\\ 1&0&1\\ 1&0&1\\ 1&0&1 \end{bmatrix}
\begin{bmatrix} \mu\\ \tau_1\\ \tau_2 \end{bmatrix}
+
\begin{bmatrix} \varepsilon_{11}\\ \varepsilon_{12}\\ \varepsilon_{13}\\ \varepsilon_{21}\\ \varepsilon_{22}\\ \varepsilon_{23} \end{bmatrix}.
\]

Treatment Effects Model (continued)

In this case, it makes no sense to estimate β = [μ, τ1, τ2]′ because there are multiple (infinitely many, in fact) choices of β that define the same mean for y. For example,

\[
\begin{bmatrix} \mu\\ \tau_1\\ \tau_2 \end{bmatrix}
=
\begin{bmatrix} 5\\ -1\\ 1 \end{bmatrix}, \quad
\begin{bmatrix} 0\\ 4\\ 6 \end{bmatrix}, \quad \text{or} \quad
\begin{bmatrix} 999\\ -995\\ -993 \end{bmatrix}
\]

all yield the same Xβ = E(y). When multiple values of β define the same E(y), we say that β is non-estimable.
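As a quick numeric check (a sketch, not part of the original slides), we can multiply the design matrix X by each of the three β vectors above and confirm that every one produces the identical mean vector:

```python
# Design matrix for y_ij = mu + tau_i + eps_ij, i = 1, 2; j = 1, 2, 3
X = [[1, 1, 0],
     [1, 1, 0],
     [1, 1, 0],
     [1, 0, 1],
     [1, 0, 1],
     [1, 0, 1]]

def matvec(A, b):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, b)) for row in A]

# The three choices of beta = (mu, tau1, tau2)' from the slide
betas = [[5, -1, 1], [0, 4, 6], [999, -995, -993]]
means = [matvec(X, b) for b in betas]

print(means[0])                          # [4, 4, 4, 6, 6, 6]
print(means[0] == means[1] == means[2])  # True: same E(y) for all three
```

Because X has column rank 2 (the first column is the sum of the other two), no amount of data can distinguish among these β values.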
Estimable Functions of β

A linear function of β, Cβ, is said to be estimable if there is a linear function of y, Ay, that is an unbiased estimator of Cβ. Otherwise, Cβ is said to be non-estimable.

Note that Ay is an unbiased estimator of Cβ if and only if

\[
E(Ay) = C\beta \;\; \forall\, \beta \in \mathbb{R}^p
\iff AX\beta = C\beta \;\; \forall\, \beta \in \mathbb{R}^p
\iff AX = C.
\]

This says that we can estimate Cβ as long as Cβ = AXβ = AE(y) for some A, i.e., as long as Cβ is a linear function of E(y). The bottom line is that we can always estimate E(y) and all linear functions of E(y); all other linear functions of β are non-estimable.

Treatment Effects Model (continued)

\[
X\beta =
\begin{bmatrix} 1&1&0\\ 1&1&0\\ 1&1&0\\ 1&0&1\\ 1&0&1\\ 1&0&1 \end{bmatrix}
\begin{bmatrix} \mu\\ \tau_1\\ \tau_2 \end{bmatrix}
=
\begin{bmatrix} \mu+\tau_1\\ \mu+\tau_1\\ \mu+\tau_1\\ \mu+\tau_2\\ \mu+\tau_2\\ \mu+\tau_2 \end{bmatrix}
\implies
\]

[1, 0, 0, 0, 0, 0]Xβ = [1, 1, 0]β = μ + τ1,
[0, 0, 0, 1, 0, 0]Xβ = [1, 0, 1]β = μ + τ2, and
[1, 0, 0, −1, 0, 0]Xβ = [0, 1, −1]β = τ1 − τ2

are estimable functions of β.

Estimating Estimable Functions of β

If Cβ is estimable, then there exists a matrix A such that C = AX and Cβ = AXβ = AE(y) for any β ∈ ℝᵖ. It makes sense to estimate Cβ by

\[
A\hat{E}(y) = A\hat{y} = AP_X y = AX(X'X)^- X'y = AX(X'X)^- X'X\hat{\beta} = AP_X X\hat{\beta} = AX\hat{\beta} = C\hat{\beta}.
\]

Cβ̂ is called an Ordinary Least Squares (OLS) estimator of Cβ. Note that although the "hat" is on β, it is Cβ that we are estimating.

Invariance of Cβ̂ to the Choice of β̂

Although there are infinitely many solutions to the normal equations when X is not of full column rank, Cβ̂ is the same for all normal-equation solutions β̂ whenever Cβ is estimable. To see this, suppose β̂₁ and β̂₂ are any two solutions to the normal equations. Then

\[
C\hat{\beta}_1 = AX\hat{\beta}_1 = AP_X X\hat{\beta}_1 = AX(X'X)^- X'X\hat{\beta}_1 = AX(X'X)^- X'y
= AX(X'X)^- X'X\hat{\beta}_2 = AP_X X\hat{\beta}_2 = AX\hat{\beta}_2 = C\hat{\beta}_2.
\]
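The estimability condition AX = C is easy to check directly for the treatment effects model. The sketch below (not from the slides; the helper name is my own) computes the row vector a′X for each of the three choices of a′ above:

```python
# 6 x 3 design matrix of the treatment effects model
X = [[1, 1, 0]] * 3 + [[1, 0, 1]] * 3  # rows are read-only, so aliasing is fine

def row_times_matrix(a, M):
    """Return the 1 x p product a'M for a row vector a and matrix M."""
    return [sum(a[i] * M[i][j] for i in range(len(M))) for j in range(len(M[0]))]

print(row_times_matrix([1, 0, 0, 0, 0, 0], X))   # [1, 1, 0]  -> mu + tau1
print(row_times_matrix([0, 0, 0, 1, 0, 0], X))   # [1, 0, 1]  -> mu + tau2
print(row_times_matrix([1, 0, 0, -1, 0, 0], X))  # [0, 1, -1] -> tau1 - tau2
```

Each C here lies in the row space of X, which is exactly what AX = C requires; a vector like [0, 1, 0] (τ1 alone) does not, so τ1 by itself is non-estimable.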
Treatment Effects Model (continued)

Suppose our aim is to estimate τ1 − τ2. As noted before,

[1, 0, 0, −1, 0, 0]Xβ = [0, 1, −1]β = τ1 − τ2.

Thus, we can compute the OLS estimator of τ1 − τ2 as

[1, 0, 0, −1, 0, 0]ŷ = [0, 1, −1]β̂,

where ŷ = X(X′X)⁻X′y and β̂ is any solution to the normal equations.

Treatment Effects Model (continued)

The normal equations X′Xb = X′y in this case are

\[
\begin{bmatrix} 1&1&1&1&1&1\\ 1&1&1&0&0&0\\ 0&0&0&1&1&1 \end{bmatrix}
\begin{bmatrix} 1&1&0\\ 1&1&0\\ 1&1&0\\ 1&0&1\\ 1&0&1\\ 1&0&1 \end{bmatrix}
\begin{bmatrix} b_1\\ b_2\\ b_3 \end{bmatrix}
=
\begin{bmatrix} 1&1&1&1&1&1\\ 1&1&1&0&0&0\\ 0&0&0&1&1&1 \end{bmatrix}
\begin{bmatrix} y_{11}\\ y_{12}\\ y_{13}\\ y_{21}\\ y_{22}\\ y_{23} \end{bmatrix}
\iff
\begin{bmatrix} 6&3&3\\ 3&3&0\\ 3&0&3 \end{bmatrix}
\begin{bmatrix} b_1\\ b_2\\ b_3 \end{bmatrix}
=
\begin{bmatrix} y_{\cdot\cdot}\\ y_{1\cdot}\\ y_{2\cdot} \end{bmatrix}.
\]

Treatment Effects Model (continued)

\[
\hat{\beta}_1 \equiv
\begin{bmatrix} \bar{y}_{\cdot\cdot}\\ \bar{y}_{1\cdot}-\bar{y}_{\cdot\cdot}\\ \bar{y}_{2\cdot}-\bar{y}_{\cdot\cdot} \end{bmatrix}
\quad \text{and} \quad
\hat{\beta}_2 \equiv
\begin{bmatrix} 0\\ \bar{y}_{1\cdot}\\ \bar{y}_{2\cdot} \end{bmatrix}
\]

are each solutions to the normal equations because

\[
\begin{bmatrix} 6&3&3\\ 3&3&0\\ 3&0&3 \end{bmatrix}
\begin{bmatrix} \bar{y}_{\cdot\cdot}\\ \bar{y}_{1\cdot}-\bar{y}_{\cdot\cdot}\\ \bar{y}_{2\cdot}-\bar{y}_{\cdot\cdot} \end{bmatrix}
=
\begin{bmatrix} y_{\cdot\cdot}\\ y_{1\cdot}\\ y_{2\cdot} \end{bmatrix}
=
\begin{bmatrix} 6&3&3\\ 3&3&0\\ 3&0&3 \end{bmatrix}
\begin{bmatrix} 0\\ \bar{y}_{1\cdot}\\ \bar{y}_{2\cdot} \end{bmatrix}.
\]

Thus, the OLS estimator of Cβ = [0, 1, −1]β = τ1 − τ2 is

\[
C\hat{\beta}_1 = [0, 1, -1]
\begin{bmatrix} \bar{y}_{\cdot\cdot}\\ \bar{y}_{1\cdot}-\bar{y}_{\cdot\cdot}\\ \bar{y}_{2\cdot}-\bar{y}_{\cdot\cdot} \end{bmatrix}
= \bar{y}_{1\cdot} - \bar{y}_{2\cdot}
= [0, 1, -1]
\begin{bmatrix} 0\\ \bar{y}_{1\cdot}\\ \bar{y}_{2\cdot} \end{bmatrix}
= C\hat{\beta}_2.
\]

Treatment Effects Model (continued)

Let

\[
(X'X)^-_1 =
\begin{bmatrix} 1/6&0&0\\ 0&1/6&-1/6\\ 0&-1/6&1/6 \end{bmatrix}
\quad \text{and} \quad
(X'X)^-_2 =
\begin{bmatrix} 0&0&0\\ 0&1/3&0\\ 0&0&1/3 \end{bmatrix}.
\]

It is straightforward to verify that (X′X)₁⁻ and (X′X)₂⁻ are each generalized inverses of X′X.
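These claims can be verified with exact rational arithmetic. The sketch below (illustrative, with made-up responses y = (4, 5, 3, 7, 6, 8), so X′y = (33, 12, 21)′) checks the generalized-inverse property (X′X)G(X′X) = X′X for both matrices, computes both solutions β̂ᵢ = Gᵢ X′y, and confirms that Cβ̂ with C = [0, 1, −1] is the same for each:

```python
from fractions import Fraction as F

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

XtX = [[F(6), F(3), F(3)],
       [F(3), F(3), F(0)],
       [F(3), F(0), F(3)]]
G1 = [[F(1, 6), F(0), F(0)],
      [F(0), F(1, 6), F(-1, 6)],
      [F(0), F(-1, 6), F(1, 6)]]
G2 = [[F(0), F(0), F(0)],
      [F(0), F(1, 3), F(0)],
      [F(0), F(0), F(1, 3)]]

# Generalized-inverse check: (X'X) G (X'X) = X'X
assert matmul(matmul(XtX, G1), XtX) == XtX
assert matmul(matmul(XtX, G2), XtX) == XtX

# Illustrative data: y-bar1. = 4, y-bar2. = 7, so X'y = (33, 12, 21)'
Xty = [[F(33)], [F(12)], [F(21)]]
b1 = matmul(G1, Xty)  # (y-bar.., y-bar1. - y-bar.., y-bar2. - y-bar..)'
b2 = matmul(G2, Xty)  # (0, y-bar1., y-bar2.)'

# C beta-hat with C = [0, 1, -1] is invariant to the solution used:
print(b1[1][0] - b1[2][0], b2[1][0] - b2[2][0])  # -3 -3
```

Both solutions differ as vectors, yet they agree on every estimable function, which is exactly the invariance result proved above.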
It is also easy to show that β̂₁ = (X′X)₁⁻X′y and β̂₂ = (X′X)₂⁻X′y.

Treatment Effects Model (continued)

\[
P_X = X(X'X)^- X' =
\begin{bmatrix} 1&1&0\\ 1&1&0\\ 1&1&0\\ 1&0&1\\ 1&0&1\\ 1&0&1 \end{bmatrix}
\begin{bmatrix} 0&0&0\\ 0&1/3&0\\ 0&0&1/3 \end{bmatrix}
\begin{bmatrix} 1&1&1&1&1&1\\ 1&1&1&0&0&0\\ 0&0&0&1&1&1 \end{bmatrix}
=
\begin{bmatrix}
1/3&1/3&1/3&0&0&0\\
1/3&1/3&1/3&0&0&0\\
1/3&1/3&1/3&0&0&0\\
0&0&0&1/3&1/3&1/3\\
0&0&0&1/3&1/3&1/3\\
0&0&0&1/3&1/3&1/3
\end{bmatrix}.
\]

Treatment Effects Model (continued)

Thus

\[
\hat{E}(y) = \hat{y} = P_X y =
\begin{bmatrix}
1/3&1/3&1/3&0&0&0\\
1/3&1/3&1/3&0&0&0\\
1/3&1/3&1/3&0&0&0\\
0&0&0&1/3&1/3&1/3\\
0&0&0&1/3&1/3&1/3\\
0&0&0&1/3&1/3&1/3
\end{bmatrix}
\begin{bmatrix} y_{11}\\ y_{12}\\ y_{13}\\ y_{21}\\ y_{22}\\ y_{23} \end{bmatrix}
=
\begin{bmatrix} \bar{y}_{1\cdot}\\ \bar{y}_{1\cdot}\\ \bar{y}_{1\cdot}\\ \bar{y}_{2\cdot}\\ \bar{y}_{2\cdot}\\ \bar{y}_{2\cdot} \end{bmatrix}
\]

is our OLS estimator of

\[
E(y) = X\beta =
\begin{bmatrix} 1&1&0\\ 1&1&0\\ 1&1&0\\ 1&0&1\\ 1&0&1\\ 1&0&1 \end{bmatrix}
\begin{bmatrix} \mu\\ \tau_1\\ \tau_2 \end{bmatrix}
=
\begin{bmatrix} \mu+\tau_1\\ \mu+\tau_1\\ \mu+\tau_1\\ \mu+\tau_2\\ \mu+\tau_2\\ \mu+\tau_2 \end{bmatrix}.
\]
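The projection P_X can also be built and applied numerically. This sketch (again with made-up responses for illustration) forms P_X = X(X′X)⁻X′ using the second generalized inverse and checks that ŷ = P_X y replaces each observation by its treatment mean:

```python
from fractions import Fraction as F

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

X = [[F(1), F(1), F(0)]] * 3 + [[F(1), F(0), F(1)]] * 3
Xt = [[X[i][j] for i in range(6)] for j in range(3)]  # transpose of X
G2 = [[F(0), F(0), F(0)],
      [F(0), F(1, 3), F(0)],
      [F(0), F(0), F(1, 3)]]

PX = matmul(matmul(X, G2), Xt)  # block diagonal: two (1/3)J_3 blocks

# Made-up responses: treatment-1 mean is 4, treatment-2 mean is 7
y = [[F(4)], [F(5)], [F(3)], [F(7)], [F(6)], [F(8)]]
yhat = matmul(PX, y)
print([int(v[0]) for v in yhat])  # [4, 4, 4, 7, 7, 7]
```

The block-diagonal structure of P_X makes the result transparent: projecting onto the column space of X simply averages within each treatment group.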
Treatment Effects Model (continued)

Also, we can see that the OLS estimator of

\[
\tau_1 - \tau_2 = [0, 1, -1]
\begin{bmatrix} \mu\\ \tau_1\\ \tau_2 \end{bmatrix}
= [1, 0, 0, -1, 0, 0]
\begin{bmatrix} \mu+\tau_1\\ \mu+\tau_1\\ \mu+\tau_1\\ \mu+\tau_2\\ \mu+\tau_2\\ \mu+\tau_2 \end{bmatrix}
= [1, 0, 0, -1, 0, 0] X\beta
= [1, 0, 0, -1, 0, 0] E(y)
\]

is

\[
[1, 0, 0, -1, 0, 0]\hat{y}
= [1, 0, 0, -1, 0, 0]
\begin{bmatrix} \bar{y}_{1\cdot}\\ \bar{y}_{1\cdot}\\ \bar{y}_{1\cdot}\\ \bar{y}_{2\cdot}\\ \bar{y}_{2\cdot}\\ \bar{y}_{2\cdot} \end{bmatrix}
= \bar{y}_{1\cdot} - \bar{y}_{2\cdot}.
\]

The Gauss-Markov Theorem

Under the Gauss-Markov Linear Model, the OLS estimator c′β̂ of an estimable linear function c′β is the unique Best Linear Unbiased Estimator (BLUE) in the sense that Var(c′β̂) is strictly less than the variance of any other linear unbiased estimator of c′β, for all β ∈ ℝᵖ and all σ² ∈ ℝ⁺.

The Gauss-Markov Theorem says that if we want to estimate an estimable linear function c′β using a linear estimator that is unbiased, we should always use the OLS estimator.

In our simple example of the treatment effects model, we could have used y11 − y21 to estimate τ1 − τ2. It is easy to see that y11 − y21 is a linear estimator that is unbiased for τ1 − τ2, but its variance is clearly larger than the variance of the OLS estimator ȳ1· − ȳ2· (as guaranteed by the Gauss-Markov Theorem).
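The variance comparison can be made concrete: under Var(y) = σ²I, any linear estimator a′y has Var(a′y) = σ² a′a, so comparing the two estimators of τ1 − τ2 reduces to comparing squared coefficient norms. A brief sketch:

```python
from fractions import Fraction as F

def sum_sq(a):
    """Return a'a, the squared norm of coefficient vector a."""
    return sum(x * x for x in a)

naive = [1, 0, 0, -1, 0, 0]           # coefficients of y11 - y21
ols = [F(1, 3)] * 3 + [F(-1, 3)] * 3  # coefficients of y-bar1. - y-bar2.

print(sum_sq(naive))  # 2    -> Var = 2 sigma^2
print(sum_sq(ols))    # 2/3  -> Var = (2/3) sigma^2
```

The OLS coefficients spread the weight evenly over each group, giving one third the variance of the single-observation contrast, consistent with the Gauss-Markov Theorem.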