A review of linear models

Maximum likelihood estimator (MLE)

Assume Z_1, \dots, Z_n are independent and identically distributed (iid) random variables with probability mass function or density function f(z; \theta). The likelihood function of \theta is

    L(\theta) = \prod_{i=1}^n f(Z_i; \theta).

The maximum likelihood estimator of \theta is given by

    \hat\theta = \arg\max_\theta L(\theta).

Linear models with normality assumption

Assume the following Gauss-Markov model,

    Y = X\beta + \varepsilon,

where \varepsilon is multivariate normally distributed with mean 0 and covariance \sigma^2 I_n. In this model, Y_1, \dots, Y_n are independent random variables with mean X_i^T \beta and variance \sigma^2. The probability density function of Y_i is

    f(y_i; \beta, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{(y_i - X_i^T \beta)^2}{2\sigma^2} \right\}.

Likelihood function for X\beta and \sigma^2

The likelihood function for X\beta and \sigma^2 is

    L(X\beta, \sigma^2) = (2\pi)^{-n/2} \sigma^{-n} \exp\left\{ -\frac{1}{2\sigma^2} \sum_{i=1}^n (Y_i - X_i^T \beta)^2 \right\}
                        = (2\pi)^{-n/2} \sigma^{-n} \exp\left\{ -\frac{1}{2\sigma^2} (Y - X\beta)^T (Y - X\beta) \right\}.

The corresponding log-likelihood function is

    \ell(X\beta, \sigma^2) = -(n/2)\log(2\pi) - (n/2)\log(\sigma^2) - \frac{1}{2\sigma^2} (Y - X\beta)^T (Y - X\beta).

MLE for X\beta and c^T \beta

- For any \sigma^2,

      \widehat{X\beta} = \arg\min_{X\beta} (Y - X\beta)^T (Y - X\beta).

  Thus the MLE of X\beta is the same as the ordinary least squares estimator, and it follows that

      \widehat{X\beta} = X (X^T X)^- X^T Y.

- For any estimable function c^T \beta, the MLE of c^T \beta is

      \widehat{c^T \beta} = c^T (X^T X)^- X^T Y.

MLE for \sigma^2

- Denote SSE = (Y - \widehat{X\beta})^T (Y - \widehat{X\beta}). The log-likelihood function for \sigma^2 is

      \ell(\widehat{X\beta}, \sigma^2) = -(n/2)\log(2\pi) - (n/2)\log(\sigma^2) - \frac{1}{2\sigma^2} SSE.

  Setting the corresponding score function to zero,

      \frac{\partial \ell(\widehat{X\beta}, \sigma^2)}{\partial \sigma^2} = -\frac{n}{2} \frac{1}{\sigma^2} + \frac{1}{2(\sigma^2)^2} SSE = 0,

  yields \hat\sigma^2 = SSE/n, so the joint MLE of (X\beta, \sigma^2) is (\widehat{X\beta}, SSE/n).

- Note that the MLE of \sigma^2 is different from the unbiased estimator of \sigma^2, which is SSE/(n - rank(X)).

Asymptotic inference based on likelihood ratio

Assume C is an m \times p matrix with rank(C) = m < p. Consider testing

    H_0: C\beta = 0  vs.  H_1: C\beta \neq 0.

For convenience, assume \beta is estimable in the following. Define the likelihood ratio statistic as

    LR = \frac{\sup_{C\beta = 0, \sigma^2} L(\beta, \sigma^2)}{\sup_{\beta, \sigma^2} L(\beta, \sigma^2)},

where L(\beta, \sigma^2) is the likelihood function for (\beta, \sigma^2). A likelihood ratio test of H_0: C\beta = 0 rejects the null hypothesis if LR < r_\alpha, where r_\alpha is a cutoff point that will be specified later.

Likelihood under the alternative

When there is no constraint, the maximized likelihood under the alternative is

    \sup_{\beta, \sigma^2} L(\beta, \sigma^2) = L(\hat\beta, \hat\sigma^2)
        = (2\pi)^{-n/2} \hat\sigma^{-n} \exp\left\{ -\frac{n \hat\sigma^2}{2 \hat\sigma^2} \right\}
        = (2\pi e/n)^{-n/2} SSE^{-n/2},

where \hat\sigma^2 = SSE/n.

Likelihood under the null hypothesis

Under H_0, the maximum likelihood estimator of \beta is

    \hat\beta_c = \hat\beta - (X^T X)^- C^T \left\{ C (X^T X)^- C^T \right\}^{-1} C \hat\beta,

and the MLE of \sigma^2 under H_0 is \hat\sigma_c^2 = (Y - X \hat\beta_c)^T (Y - X \hat\beta_c)/n. Under the null hypothesis, the maximized likelihood is

    \sup_{C\beta = 0, \sigma^2} L(\beta, \sigma^2) = L(\hat\beta_c, \hat\sigma_c^2)
        = (2\pi)^{-n/2} \hat\sigma_c^{-n} \exp\left\{ -\frac{n \hat\sigma_c^2}{2 \hat\sigma_c^2} \right\}
        = (2\pi e/n)^{-n/2} SS_c^{-n/2},

where SS_c = (Y - X \hat\beta_c)^T (Y - X \hat\beta_c).

Likelihood ratio test statistic and F-statistic

- The likelihood ratio test statistic is

      LR = \frac{\sup_{C\beta = 0, \sigma^2} L(\beta, \sigma^2)}{\sup_{\beta, \sigma^2} L(\beta, \sigma^2)}
         = \frac{SS_c^{-n/2}}{SSE^{-n/2}}
         = \left( \frac{SS_c}{SSE} \right)^{-n/2}
         = \left( 1 + \frac{SS_c - SSE}{SSE} \right)^{-n/2}.

- The likelihood ratio statistic LR is a monotone function of G = (SS_c - SSE)/SSE, so the likelihood ratio test is equivalent to the F-test with

      F = \frac{SSH_0/m}{SSE/(n - r)} = \frac{(SS_c - SSE)/m}{SSE/(n - r)} = \frac{n - r}{m} G,

  where r = rank(X) and we used the fact that SS_c - SSE = SSH_0.

Asymptotic inference: likelihood ratio test

- Let Z_1, \dots, Z_n be iid random variables generated from a pdf or pmf f(z; \theta), and let \hat\theta be the MLE of \theta \in \mathbb{R}^p. The log-likelihood function for \theta is \ell(\theta) = \sum_{i=1}^n \log f(z_i; \theta).
- Assume \Theta_0 is a subspace of \Theta. The log-likelihood ratio statistic is defined by

      \log LR = \sup_{\theta \in \Theta_0} \ell(\theta) - \sup_{\theta \in \Theta} \ell(\theta).

  Then, under some regularity conditions,

      -2 \log LR \to_d \chi^2_{df},

  where df is the difference between the number of free parameters in \theta \in \Theta and the number of free parameters specified by \theta \in \Theta_0.
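Before specializing the chi-square result to the linear model, the following short R check ties the preceding slides together. It is a minimal sketch on simulated data (not the beverage data; all object names below are illustrative): it computes the MLE of X\beta through a generalized inverse, contrasts the MLE and unbiased estimators of \sigma^2, and verifies that the F statistic and -2 log LR are the monotone transformations of G = (SS_c - SSE)/SSE derived above.

    ## Minimal numerical check on simulated data (illustrative names only)
    library(MASS)                       # for ginv(), the Moore-Penrose inverse

    set.seed(1)
    n <- 40
    X <- cbind(1, rnorm(n), rnorm(n))   # a simple full-rank design
    y <- X %*% c(1, 0.5, 0) + rnorm(n)

    XtXg <- ginv(t(X) %*% X)            # (X'X)^-
    bhat <- XtXg %*% t(X) %*% y         # a solution of the normal equations
    SSE  <- sum((y - X %*% bhat)^2)
    r    <- qr(X)$rank                  # rank(X)
    SSE / n                             # MLE of sigma^2
    SSE / (n - r)                       # unbiased estimator of sigma^2

    ## Test H0: beta_3 = 0, i.e. Cm = (0, 0, 1) and m = 1
    Cm    <- matrix(c(0, 0, 1), nrow = 1)
    bhatc <- bhat - XtXg %*% t(Cm) %*% solve(Cm %*% XtXg %*% t(Cm)) %*% Cm %*% bhat
    SSc   <- sum((y - X %*% bhatc)^2)
    G     <- (SSc - SSE) / SSE
    Fstat <- (n - r) / 1 * G            # equivalent F statistic (m = 1)
    n * log(1 + G)                      # -2 log LR; compare with qchisq(0.95, 1)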
Likelihood ratio test for H_0: C\beta = 0 vs. H_1: C\beta \neq 0

Under the null hypothesis, \Theta_0 = \{\beta : C\beta = 0\} and the number of free parameters is p - m; under the alternative hypothesis, the number of free parameters is p. Therefore, the degrees of freedom for the limiting chi-square distribution is m, and we have

    -2 \log LR = n \log(SS_c/SSE) = n \log(1 + SSH_0/SSE) \to_d \chi^2_m.

Example: Beverage study data set

Testing H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 vs. H_1: \beta_k \neq \beta_l for some k \neq l:

    library(MASS)                       # for ginv()
    Cmat <- rbind(c(0, 1, -1, 0, 0),
                  c(0, 0, 1, -1, 0),
                  c(0, 0, 0, 1, -1))
    Cmatbeta    <- Cmat %*% ginv(t(X) %*% X) %*% t(X) %*% y   # C betahat
    CovCmatbeta <- Cmat %*% ginv(t(X) %*% X) %*% t(Cmat)      # C (X'X)^- C'
    SSH0   <- t(Cmatbeta) %*% solve(CovCmatbeta) %*% Cmatbeta
    SSE    <- sum(epsilonhat^2)
    logLR  <- n * log(1 + SSH0/SSE)     # this is -2 log LR
    pvalLR <- 1 - pchisq(logLR, 3)      # df = m = 3
    pvalLR
    ## 0.7707697
    ## pvalF: 0.8142451

Asymptotic inference: Wald-type inference

As n \to \infty, we have

    I_n(\theta)^{1/2} (\hat\theta - \theta) \to_d N(0, I_p),

where I_n(\theta) is the information matrix defined by

    I_n(\theta) = E\left\{ \frac{\partial \ell(\theta)}{\partial \theta} \frac{\partial \ell(\theta)}{\partial \theta^T} \right\} = -E\left\{ \frac{\partial^2 \ell(\theta)}{\partial \theta \, \partial \theta^T} \right\}.

Example: Gauss-Markov model

The information matrix for \beta under the Gauss-Markov model is

    I_n(\beta) = \frac{1}{\sigma^2} X^T X.

Therefore, we have

    \left\{ \sigma^{-2} X^T X \right\}^{1/2} (\hat\beta - \beta) \to_d N(0, I_p).

For any linear combination of \beta, say C\beta, the asymptotic distribution is

    \widehat{C\beta} - C\beta \sim N(0, \sigma^2 C (X^T X)^- C^T).

Example: Beverage study data set

A 95% confidence interval for \beta_3 - \beta_4:

    cvec <- c(0, 0, 0, 1, -1)
    cbeta    <- cvec %*% ginv(t(X) %*% X) %*% t(X) %*% y      # c' betahat
    varcbeta <- t(cvec) %*% ginv(t(X) %*% X) %*% cvec         # c' (X'X)^- c
    Waldlow95cbeta <- cbeta - qnorm(0.975) * sigmahat * sqrt(varcbeta)
    Waldupp95cbeta <- cbeta + qnorm(0.975) * sigmahat * sqrt(varcbeta)
    WaldCI95cbeta  <- c(Waldlow95cbeta, Waldupp95cbeta)
    ## WaldCI95cbeta: (-0.2181812, 0.1135572)
    ## CI95cbeta:     (-0.2301103, 0.1254863)
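The beverage-study snippets above use objects (X, y, n, epsilonhat, sigmahat) that are assumed to be in the workspace already. A minimal sketch of one way they might be constructed; the slides do not show these definitions, and taking sigmahat as the square root of SSE/(n - rank(X)) is an assumption consistent with the unbiased estimator from the "MLE for \sigma^2" slide:

    library(MASS)                                    # for ginv()
    betahat    <- ginv(t(X) %*% X) %*% t(X) %*% y    # one solution of the normal equations
    epsilonhat <- y - X %*% betahat                  # residuals; X betahat is invariant to
                                                     # the choice of generalized inverse
    n          <- length(y)
    r          <- qr(X)$rank                         # rank(X)
    sigmahat   <- sqrt(sum(epsilonhat^2) / (n - r))  # assumed: sqrt of SSE/(n - rank(X))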