Stat 505 Gauss Markov Theorem September 12, 2014

The Gauss-Markov Theorem

$y = X\beta + \epsilon$ with $\epsilon \sim (0, \sigma^2 V)$. Normality is not required. For any estimable $c^T\beta$, the best (minimum variance) linear unbiased estimator of $c^T\beta$ is $c^T\hat\beta$, where $\hat\beta$ solves the normal equations (NE): $(X^T V^{-1} X)\beta = X^T V^{-1} y$.

Note: smaller variance does not necessarily mean that the SE of each coefficient is smaller. With multiple coefficients to consider, minimum variance means that the difference between the variance-covariance matrix of any other estimator and the variance-covariance matrix of the BLUE is non-negative definite.

Proof:

1. Because $c^T\beta$ is estimable, $c \in C(X^T)$, and $c$ can be written as $c = X^T a$.

2. We need to show that $X^T V^{-1} X (X^T V^{-1} X)^g c = c$. Because $P_V = X (X^T V^{-1} X)^g X^T V^{-1}$ projects onto $C(X)$, it does not change $X$, and $P_V X = X$. Then

$$X^T V^{-1} X (X^T V^{-1} X)^g c = X^T V^{-1} X (X^T V^{-1} X)^g X^T a = X^T P_V^T a = (P_V X)^T a = X^T a = c.$$

3. Take any other linear unbiased estimator of $c^T\beta$, calling it $d_0 + d^T y$. Unbiased means that for any $\beta$, $E(d_0 + d^T y) = c^T\beta$, and that includes $\beta = 0$, which means $d_0 = 0$. Also, $E(d^T y) = d^T X\beta$ for all $\beta$, so $c^T = d^T X$. Be careful with this argument. Having $c^T\beta = d^T X\beta$ for just a few $\beta$'s would not be enough to say $c^T = d^T X$, but if it is true for all $\beta$, then plug in each column of the identity matrix in turn and we get the equality we need.

4. Now how big is the variance of this generic linear unbiased estimator? We'll use the "add zero" trick.

$$\mathrm{Var}(d^T y) = \mathrm{Var}(c^T\hat\beta + d^T y - c^T\hat\beta) = \mathrm{Var}(c^T\hat\beta) + \mathrm{Var}(d^T y - c^T\hat\beta) + 2\,\mathrm{Cov}(c^T\hat\beta,\ d^T y - c^T\hat\beta)$$

If we can show that the covariance term is 0, then we'll be done, because $\mathrm{Var}(d^T y - c^T\hat\beta)$ is a variance-covariance matrix, and must be non-negative definite. In general, $\mathrm{Cov}(a^T y, b^T y) = a^T \mathrm{Var}(y)\, b$.

$$
\begin{aligned}
\mathrm{Cov}(c^T\hat\beta,\ d^T y - c^T\hat\beta)
&= c^T (X^T V^{-1} X)^g X^T V^{-1} [\sigma^2 V] [d - V^{-1} X (X^T V^{-1} X)^g c] \\
&= \sigma^2 c^T (X^T V^{-1} X)^g X^T [d - V^{-1} X (X^T V^{-1} X)^g c] \\
&= \sigma^2 c^T (X^T V^{-1} X)^g [X^T d - X^T V^{-1} X (X^T V^{-1} X)^g X^T a] \\
&= \sigma^2 c^T (X^T V^{-1} X)^g [c - X^T P_V^T a] && \text{from 3 above} \\
&= \sigma^2 c^T (X^T V^{-1} X)^g [c - c] && \text{from 2 above} \\
&= 0
\end{aligned}
$$

And we proved Gauss-Markov: $\mathrm{Var}(d^T y) \ge \mathrm{Var}(c^T\hat\beta)$ in the sense that the difference between the two matrices is a non-negative definite matrix. In fact, the only time they are equal is if $d = V^{-1} X (X^T V^{-1} X)^g c$ for some generalized inverse and $d^T y = c^T\hat\beta$.
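
The steps of the proof can be checked numerically. The following is a minimal sketch in Python with NumPy, not part of the original notes. The design matrix, V, σ², the vector a, and the perturbation w are made-up illustrative values. The Moore-Penrose inverse is used as one convenient choice of generalized inverse, and the competing estimator is built as the BLUE weights plus a vector annihilated by X^T, which is just one way to produce another linear unbiased estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8

# Rank-deficient design (column 3 = column 1 + column 2), so a g-inverse is needed.
X = rng.normal(size=(n, 2))
X = np.column_stack([X, X[:, 0] + X[:, 1]])

# An arbitrary positive definite V and an arbitrary sigma^2 (illustrative values).
A = rng.normal(size=(n, n))
V = A @ A.T + n * np.eye(n)
Vinv = np.linalg.inv(V)
sigma2 = 2.0

# Moore-Penrose inverse as one choice of generalized inverse of X^T V^{-1} X.
G = np.linalg.pinv(X.T @ Vinv @ X, rcond=1e-10)
P_V = X @ G @ X.T @ Vinv

# Step 1: an estimable c has the form c = X^T a.
a = rng.normal(size=n)
c = X.T @ a

# Step 2: P_V X = X, and X^T V^{-1} X G c = c.
print("P_V X = X:", np.allclose(P_V @ X, X))
print("X^T V^-1 X G c = c:", np.allclose(X.T @ Vinv @ X @ G @ c, c))

# Weights of the BLUE: c^T beta_hat = blue_w^T y.
blue_w = Vinv @ X @ G @ c

# Step 3: another linear unbiased estimator d^T y must satisfy X^T d = c.
# Adding (I - P_V^T) w keeps X^T d = c because X^T (I - P_V^T) = 0.
w = rng.normal(size=n)  # hypothetical perturbation
d = blue_w + (np.eye(n) - P_V.T) @ w
print("X^T d = c:", np.allclose(X.T @ d, c))

# Step 4: Cov(a^T y, b^T y) = a^T Var(y) b with Var(y) = sigma^2 V.
cov = sigma2 * blue_w @ V @ (d - blue_w)
var_blue = sigma2 * blue_w @ V @ blue_w
var_other = sigma2 * d @ V @ d
print("covariance term:", cov)
print("Var(d^T y) - Var(c^T beta_hat):", var_other - var_blue)
```

The covariance term should be zero up to rounding error, and the variance difference should equal σ² (d − blue_w)^T V (d − blue_w), which is non-negative because V is positive definite.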