A review of linear models
Maximum likelihood estimator (MLE)
Assume Z_1, · · · , Z_n are independent and identically distributed (iid) random variables with probability mass function or density function f(z; θ). The likelihood function of θ is

L(θ) = ∏_{i=1}^n f(Z_i; θ).

The maximum likelihood estimator of θ is given by

θ̂ = arg max_θ L(θ).
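As a minimal sketch (not from the slides), the MLE can also be computed numerically in R by minimizing the negative log-likelihood; the data and model below are illustrative assumptions (iid N(μ, σ²) samples), where the closed-form MLEs are the sample mean and the average squared deviation.

# Numerical MLE sketch for assumed iid N(mu, sigma^2) data.
# optim() minimizes, so we pass the negative log-likelihood;
# the standard deviation is parameterized on the log scale to keep it positive.
set.seed(1)
z <- rnorm(100, mean = 2, sd = 1.5)
negloglik <- function(par) {
  -sum(dnorm(z, mean = par[1], sd = exp(par[2]), log = TRUE))
}
fit <- optim(c(0, 0), negloglik)
c(muhat = fit$par[1], sigmahat = exp(fit$par[2]))  # near mean(z) and the /n-version of sd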
Linear models with normality assumption
Assume the following Gauss-Markov model,

Y = Xβ + ε,

where ε is multivariate normally distributed with mean 0 and covariance σ² I_n. In this model, Y_1, · · · , Y_n are independent random variables with mean X_i^T β and variance σ². The probability density function of Y_i is

f(y_i; β, σ²) = (1/(√(2π) σ)) exp{ −(y_i − X_i^T β)² / (2σ²) }.
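A short R sketch, under assumed values of X, β, and σ (not the beverage data), that simulates from this model and evaluates the density of each response:

# Simulate Y = X beta + eps with eps ~ N(0, sigma^2 I_n).
# X, beta, and sigma are illustrative assumptions.
set.seed(1)
n <- 20
X <- cbind(1, rnorm(n))               # n x 2 design matrix
beta <- c(1, 2)
sigma <- 0.5
y <- X %*% beta + rnorm(n, sd = sigma)
fy <- dnorm(y, mean = X %*% beta, sd = sigma)   # f(y_i; beta, sigma^2)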
Likelihood function for Xβ and σ²

The likelihood function for Xβ and σ² is

L(Xβ, σ²) = (2π)^{−n/2} σ^{−n} exp{ −(1/(2σ²)) ∑_{i=1}^n (Y_i − X_i^T β)² }
          = (2π)^{−n/2} σ^{−n} exp{ −(1/(2σ²)) (Y − Xβ)^T (Y − Xβ) }.

The corresponding log-likelihood function is

ℓ(Xβ, σ²) = −(n/2) log(2π) − (n/2) log(σ²) − (1/(2σ²)) (Y − Xβ)^T (Y − Xβ).
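The log-likelihood translates directly into R; a sketch continuing the simulated X and y above (which are assumptions for illustration):

# Log-likelihood ell(X beta, sigma^2) under normal errors.
loglik <- function(beta, sigma2, X, y) {
  n <- length(y)
  res <- y - X %*% beta                     # residual vector Y - X beta
  -(n/2)*log(2*pi) - (n/2)*log(sigma2) - sum(res^2)/(2*sigma2)
}
loglik(c(1, 2), 0.25, X, y)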
MLE for Xβ and c^T β

- For any σ²,

  X β̂ = arg min_{Xβ} (Y − Xβ)^T (Y − Xβ).

  Thus, the MLE of Xβ is the same as the ordinary least squares estimator. It follows that

  X β̂ = X(X^T X)^− X^T Y.

- For any estimable function c^T β, the MLE of c^T β is

  c^T β̂ = c^T (X^T X)^− X^T Y.
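A sketch of these estimators in R, using ginv() from the MASS package as the generalized inverse (X and y continue the simulated example above; cvec is an assumed estimable function):

library(MASS)                               # for ginv()
XtXinv <- ginv(t(X) %*% X)                  # (X^T X)^-
Xbetahat <- X %*% XtXinv %*% t(X) %*% y     # MLE of X beta (= OLS fit)
cvec <- c(0, 1)                             # an assumed estimable c
ctbetahat <- t(cvec) %*% XtXinv %*% t(X) %*% y   # MLE of c^T beta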
MLE for σ²

- Denote SSE = (Y − X β̂)^T (Y − X β̂). The log-likelihood function for σ² is

  ℓ(X β̂, σ²) = −(n/2) log(2π) − (n/2) log(σ²) − SSE/(2σ²).

  The corresponding score function is

  ∂ℓ(X β̂, σ²)/∂σ² = −(n/2)(1/σ²) + SSE/(2(σ²)²) = 0.

  Then the joint MLE of (Xβ, σ²) is (X β̂, SSE/n).

- Note that this estimator of σ² is different from the unbiased estimator of σ², which is SSE/(n − rank(X)).
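A quick R comparison of the two variance estimators, continuing the simulated example:

SSE <- sum((y - Xbetahat)^2)      # residual sum of squares
r <- qr(X)$rank                   # rank(X)
sigma2_mle <- SSE / n             # MLE of sigma^2 (biased)
sigma2_unb <- SSE / (n - r)       # unbiased estimator
c(sigma2_mle, sigma2_unb)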
Asymptotic inference based on likelihood ratio
Assume C is an m × p matrix with rank(C) = m < p. Consider testing H0: Cβ = 0 vs. H1: Cβ ≠ 0. For convenience, assume β is estimable in the following.

Define the likelihood ratio statistic as

LR = sup_{Cβ=0, σ²} L(β, σ²) / sup_{β, σ²} L(β, σ²),

where L(β, σ²) is the likelihood function for (β, σ²).

A likelihood ratio test for H0: Cβ = 0 rejects the null hypothesis if

LR < r_α,

where r_α is a cutoff point that will be specified later.
Likelihood under alternative
When there is no constraint, the maximized likelihood under the alternative is

sup_{β, σ²} L(β, σ²) = L(β̂, σ̂²)
                     = (2π)^{−n/2} σ̂^{−n} exp{ −n σ̂²/(2σ̂²) }
                     = (2πe/n)^{−n/2} SSE^{−n/2},

where σ̂² = SSE/n.
Likelihood under null hypothesis
Under H0, the maximum likelihood estimator of β is

β̂_c = β̂ − (X^T X)^− C^T (C(X^T X)^− C^T)^{−1} C β̂,

and the MLE of σ² under H0 is σ̂_c² = (Y − X β̂_c)^T (Y − X β̂_c)/n.

Under the null hypothesis, the maximized likelihood is

sup_{Cβ=0, σ²} L(β, σ²) = L(β̂_c, σ̂_c²)
                        = (2π)^{−n/2} σ̂_c^{−n} exp{ −n σ̂_c²/(2σ̂_c²) }
                        = (2πe/n)^{−n/2} SSc^{−n/2},

where SSc = (Y − X β̂_c)^T (Y − X β̂_c).
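The constrained MLE formula can be coded directly; a sketch in R where Cmat0 is an assumed single constraint for the simulated two-parameter example:

# Constrained MLE under H0: C beta = 0, following the formula above.
betahat <- XtXinv %*% t(X) %*% y
Cmat0 <- matrix(c(0, 1), nrow = 1)          # assumed H0: beta_2 = 0
betahat_c <- betahat - XtXinv %*% t(Cmat0) %*%
  solve(Cmat0 %*% XtXinv %*% t(Cmat0)) %*% Cmat0 %*% betahat
SSc <- sum((y - X %*% betahat_c)^2)         # restricted sum of squares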
Likelihood ratio test statistic and F-statistic
- The likelihood ratio test statistic is

  LR = sup_{Cβ=0, σ²} L(β, σ²) / sup_{β, σ²} L(β, σ²)
     = SSc^{−n/2} / SSE^{−n/2}
     = (SSc/SSE)^{−n/2}
     = (1 + (SSc − SSE)/SSE)^{−n/2}.

- The likelihood ratio statistic LR is a monotone function of G = (SSc − SSE)/SSE, which makes it equivalent to the F-test, where

  F_n = (SSH0/m) / (SSE/(n − r)) = ((n − r)/m) G = ((SSc − SSE)/m) / (SSE/(n − r)),

  with r = rank(X), using the fact that SSc − SSE = SSH0.
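A short R check of this algebra on the simulated example: the F-statistic built from SSc − SSE, and the LR statistic as its monotone transform.

m <- nrow(Cmat0)                            # number of constraints
Fstat <- ((SSc - SSE) / m) / (SSE / (n - r))
LR <- (SSc / SSE)^(-n / 2)                  # decreasing in Fstat
pvalF <- 1 - pf(Fstat, m, n - r)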
Asymptotic inference: likelihood ratio test
- Let Z_1, · · · , Z_n be iid random variables generated from a pdf or pmf f(z; θ). Assume θ̂ is the MLE of θ ∈ R^p. The log-likelihood function for θ is ℓ(θ) = ∑_{i=1}^n log f(z_i; θ).

- Assume Θ_0 is a subspace of Θ. The log-likelihood ratio statistic is defined by log LR = sup_{θ∈Θ_0} ℓ(θ) − sup_{θ∈Θ} ℓ(θ). Then, under some regularity conditions,

  −2 log LR →d χ²_df,

  where df is the difference between the number of free parameters under θ ∈ Θ and the number of free parameters under θ ∈ Θ_0.
Likelihood ratio test for H0: Cβ = 0 vs H1: Cβ ≠ 0

Under the null hypothesis, Θ_0 = {β : Cβ = 0}, the number of free parameters is p − m. Under the alternative hypothesis, the number of free parameters is p. The degrees of freedom for the limiting chi-square distribution is therefore m, and we have

−2 log LR = n log(SSc/SSE) = n log(1 + SSH0/SSE) →d χ²_m.
Example: Beverage study data set
Hypothesis testing for H0: β1 = β2 = β3 = β4 vs H1: βk ≠ βl for some k ≠ l:

library(MASS)                   # for ginv()
Cmat<-rbind(c(0,1,-1,0,0),c(0,0,1,-1,0),
            c(0,0,0,1,-1))
Cmatbeta<-Cmat%*%ginv(t(X)%*%X)%*%t(X)%*%y
CovCmatbeta<-Cmat%*%ginv(t(X)%*%X)%*%t(Cmat)
SSH0<-t(Cmatbeta)%*%solve(CovCmatbeta)%*%Cmatbeta
SSE<-sum(epsilonhat^2)          # epsilonhat: residuals y - X %*% betahat
logLR<-n*log(1+SSH0/SSE)        # the -2 log LR statistic
pvalLR<-1-pchisq(logLR,3)
pvalLR
0.7707697
## pvalF: 0.8142451
Asymptotic inference: Wald type inference
As n → ∞, we have

I_n(θ)^{1/2} (θ̂ − θ) →d N(0, I_p),

where I_n(θ) is the information matrix defined by

I_n(θ) = E{ (∂ℓ(θ)/∂θ)(∂ℓ(θ)/∂θ^T) } = −E{ ∂²ℓ(θ)/(∂θ ∂θ^T) }.
Example: Gauss-Markov model
The information matrix for β under the Gauss-Markov model is

I_n(β) = (1/σ²) X^T X.

Therefore, we have

(σ^{−2} X^T X)^{1/2} (β̂ − β) →d N(0, I_p).

For any linear combination of β, say Cβ, the asymptotic distribution is

(C β̂ − Cβ) ∼ N(0, σ² C(X^T X)^− C^T).
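This asymptotic distribution also gives a Wald chi-square test of H0: Cβ = 0; a sketch on the simulated example (betahat, XtXinv, SSE, Cmat0, r as defined in the earlier snippets):

# Wald test of H0: C beta = 0 from the asymptotic normality of C betahat.
sigma2hat <- SSE / (n - r)                  # plug-in estimate of sigma^2
Cbetahat <- Cmat0 %*% betahat
CovCbeta <- sigma2hat * (Cmat0 %*% XtXinv %*% t(Cmat0))
W <- t(Cbetahat) %*% solve(CovCbeta) %*% Cbetahat
pvalWald <- 1 - pchisq(W, nrow(Cmat0))      # compare with chi^2_m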
Example: Beverage study data set
A 95% confidence interval for β3 − β4 is
cvec<-c(0,0,0,1,-1)
cbeta<-cvec%*%ginv(t(X)%*%X)%*%t(X)%*%y
varcbeta<-t(cvec)%*%ginv(t(X)%*%X)%*%cvec
# sigmahat: residual SD estimate, e.g. sqrt(SSE/(n - rank(X)))
Waldlow95cbeta<-cbeta-qnorm(0.975)*sigmahat*sqrt(varcbeta)
Waldupp95cbeta<-cbeta+qnorm(0.975)*sigmahat*sqrt(varcbeta)
WaldCI95cbeta<-c(Waldlow95cbeta,Waldupp95cbeta)
WaldCI95cbeta: (-0.2181812, 0.1135572)
## CI95cbeta (-0.2301103, 0.1254863)