Population-Averaged Model

ST 762
Nonlinear Statistical Models for Univariate and Multivariate Response
Recall the general formulation of a PA (marginal) model:
\[ E(Y_i \mid x_i) = f_i(x_i, \beta) \qquad (n_i \times 1), \]
\[ \operatorname{var}(Y_i \mid x_i) = V_i(\beta, \xi, x_i) \qquad (n_i \times n_i). \]
Below, $x_{i,j}$ contains the among-individual covariates for the $i$th subject, $a_i$, and the within-individual covariates for the $j$th observation on that subject, $z_{i,j}$:
\[ x_{i,j} = \begin{pmatrix} z_{i,j} \\ a_i \end{pmatrix}. \]
We assume that
\[ V_i(\beta, \xi, x_i) = T_i(\beta, \theta, x_i)^{1/2} \, \Gamma_i(\alpha, x_i) \, T_i(\beta, \theta, x_i)^{1/2} \]
where $T_i(\cdot)$ is the diagonal matrix of variances, and $\Gamma_i(\cdot)$ is the correlation matrix.
The variances are specified as in the univariate case:
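\[ T_i(\beta, \theta, x_i) = \operatorname{diag}\{\operatorname{var}(Y_{i,1} \mid x_i), \operatorname{var}(Y_{i,2} \mid x_i), \ldots, \operatorname{var}(Y_{i,n_i} \mid x_i)\} \]
and
\[ \operatorname{var}(Y_{i,j} \mid x_i) = \sigma^2 g(\beta, \theta, x_{i,j})^2 . \]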
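The decomposition of the variance matrix into a diagonal variance part and a correlation part can be checked numerically. The following is a minimal sketch in base R for one subject with three observations. All numbers, the power-of-the-mean form chosen for g, and the AR(1) pattern chosen for the correlation matrix are illustrative assumptions, not part of the slides.

```r
# Hypothetical values for one subject with n_i = 3 observations
mu     <- c(2, 4, 8)     # assumed mean values f(x_ij, beta)
sigma2 <- 1.5            # assumed scale parameter sigma^2
theta  <- 0.5            # assumed variance parameter
alpha  <- 0.3            # assumed correlation parameter

# Assumed variance function: power of the mean, g = mu^theta
g <- function(mu, theta) mu^theta

# T_i: diagonal matrix of variances sigma^2 * g^2
Tmat <- diag(sigma2 * g(mu, theta)^2)

# Gamma_i: AR(1) correlation matrix, entry (j, k) = alpha^|j - k|
lag   <- abs(outer(seq_along(mu), seq_along(mu), "-"))
Gamma <- alpha^lag

# For a diagonal matrix, the elementwise square root is the matrix square root
Thalf <- sqrt(Tmat)
V <- Thalf %*% Gamma %*% Thalf
V
```

The diagonal of V reproduces the variances in T_i, and cov2cor(V) recovers the correlation matrix, which is what the factorization is meant to guarantee.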
The correlations are specified using one of the standard correlation models, with parameter $\alpha$.
The correlation structure is usually viewed as a working model, and inferences are based on variance estimators that remain valid when the working model is wrong (sandwich estimators).
The overall variance parameter vector is
\[ \xi = \begin{pmatrix} \theta \\ \alpha \end{pmatrix}. \]
Generalized Estimating Equations (GEE)
The basic principles of GEE are the same as those for univariate
response, even though the equations are more complicated.
Things to come:
Linear estimating equations for β
Quadratic estimating equations for ξ
Quadratic estimating equations for β
The folklore theorem and robust covariance matrix
Implementations in R
Strategy: initially, assume the given mean specification, known variance matrices $V_i$, and a Gaussian distribution for $Y_i$.
Linear Estimating Equations for β
Normal log-likelihood
\[ \log L = -\frac{1}{2} \sum_{i=1}^{m} \left[ \log |2\pi V_i| + \{Y_i - f_i(x_i, \beta)\}^T V_i^{-1} \{Y_i - f_i(x_i, \beta)\} \right] \]
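Differentiate with respect to $\beta$, treating the $V_i$ as known, and set the result to zero:
\[ \sum_{i=1}^{m} X_i(\beta)^T V_i^{-1} \{Y_i - f_i(x_i, \beta)\} = 0. \]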
As always, $X_i(\beta)$ is a gradient matrix:
\[ X_i(\beta) = \begin{pmatrix} f_\beta(x_{i,1}, \beta)^T \\ f_\beta(x_{i,2}, \beta)^T \\ \vdots \\ f_\beta(x_{i,n_i}, \beta)^T \end{pmatrix} \qquad (n_i \times p). \]
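Now use the variance specification
\[ V_i = \operatorname{var}(Y_i \mid x_i) = V_i(\beta, \xi, x_i) . \]
The estimating equation for $\beta$ becomes
\[ \sum_{i=1}^{m} X_i(\beta)^T V_i(\beta, \xi, x_i)^{-1} \{Y_i - f_i(x_i, \beta)\} = 0. \]
In terms of the stacked $X$ ($N \times p$), $Y$ ($N \times 1$), and $f$ ($N \times 1$):
\[ X(\beta)^T V^{-1} \{Y - f(\beta)\} = 0, \]
where $V$ ($N \times N$) is block-diagonal:
\[ V = \operatorname{diag}(V_1, V_2, \ldots, V_m) . \]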
Estimating equations: linear in the residuals $Y_i - f_i(x_i, \beta)$, and unbiased when the mean is correctly specified.
GLS iteration:
Initial estimator $\hat\beta^{(0)}$, e.g., OLS.
For each $k \ge 0$, some estimator $\hat\xi^{(k)}$ gives estimated weight matrices
\[ \hat{V}_i\bigl(\hat\beta^{(k)}, \hat\xi^{(k)}, x_i\bigr)^{-1} . \]
Re-estimate $\beta$ by solving
\[ \sum_{i=1}^{m} X_i(\beta)^T \hat{V}_i\bigl(\hat\beta^{(k)}, \hat\xi^{(k)}, x_i\bigr)^{-1} \{Y_i - f_i(x_i, \beta)\} = 0 \]
for $\hat\beta^{(k+1)}$, and iterate $C$ times.
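The GLS iteration can be sketched in base R. This is only an illustration under assumptions that are not in the slides: the data are simulated, the mean is linear (so each solve for β has a closed form), the variance is constant, the working correlation is exchangeable, σ² and α are estimated by simple moment estimators, and C = 5.

```r
set.seed(1)
m <- 200; n <- 4                       # m subjects, n observations each
id <- rep(seq_len(m), each = n)
x  <- rnorm(m * n)
# A shared subject effect induces exchangeable correlation (true alpha = 0.5)
y  <- 1 + 2 * x + rep(rnorm(m), each = n) + rnorm(m * n)
X  <- cbind(1, x)

# beta^(0): ordinary least squares
beta <- solve(crossprod(X), crossprod(X, y))

for (k in 1:5) {                       # C = 5 iterations
  # Residuals at the current beta; column i of R holds subject i's residuals
  r <- as.vector(y - X %*% beta)
  R <- matrix(r, nrow = n)

  # Moment estimators of xi = (sigma^2, alpha)
  sigma2 <- mean(r^2)
  S      <- tcrossprod(R) / m          # average of r_i r_i^T
  alpha  <- mean(S[upper.tri(S)]) / sigma2

  # Estimated weight matrix (the same for every subject in this balanced design)
  V    <- sigma2 * ((1 - alpha) * diag(n) + alpha * matrix(1, n, n))
  Vinv <- solve(V)

  # Solve sum_i X_i^T Vinv (Y_i - X_i beta) = 0 for beta^(k+1)
  A <- matrix(0, 2, 2); cvec <- matrix(0, 2, 1)
  for (i in seq_len(m)) {
    idx  <- which(id == i)
    Xi   <- X[idx, , drop = FALSE]
    A    <- A + t(Xi) %*% Vinv %*% Xi
    cvec <- cvec + t(Xi) %*% Vinv %*% y[idx]
  }
  beta <- solve(A, cvec)
}
beta; sigma2; alpha
```

With a nonlinear mean, the closed-form solve inside the loop would be replaced by a Gauss-Newton step using the gradient matrix X_i(β).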
Estimating Variance and Correlation Parameters
Estimation of $\xi = (\theta^T, \alpha^T)^T$, a vector of length $q + s$.
Hold $\beta$ fixed in the normal log-likelihood and maximize with respect to $\xi$.
Both involve complicated estimating equations that are quadratic in $\{Y_i - f_i(x_i, \beta)\}$.
General quadratic equation approach:
univariate case: requires variances of residuals
\[ \epsilon_i = \{Y_i - f(x_i, \beta)\} . \]
multivariate case: requires variances and covariances of
\[ u_{i,j,k} = \{Y_{i,j} - f(x_{i,j}, \beta)\} \times \{Y_{i,k} - f(x_{i,k}, \beta)\} . \]
Two popular working assumptions:
Independence: assume $\operatorname{cov}(u_{i,j,k}, u_{i,j',k'})$ is the same as if $Y_{i,1}, Y_{i,2}, \ldots, Y_{i,n_i}$ were independent.
Gaussian: assume $\operatorname{cov}(u_{i,j,k}, u_{i,j',k'})$ is the same as if $Y_{i,1}, Y_{i,2}, \ldots, Y_{i,n_i}$ were normal with the specified mean and variance structure.
Not surprisingly, the Gaussian working assumption leads to the same estimating equation as pseudo-likelihood (PL).
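For reference, the Gaussian working assumption gives these covariances in closed form. The display below is a standard fourth-moment identity for zero-mean jointly normal variables (Isserlis' theorem), stated here as a supplement rather than taken from the slides. The notation $v_{i,jk}$ for the $(j,k)$ element of $V_i$ is introduced only for this illustration, and $\beta$ is taken at its true value so that the residuals have mean zero.

```latex
\[
\operatorname{cov}(u_{i,j,k}, u_{i,j',k'})
  = v_{i,jj'} \, v_{i,kk'} + v_{i,jk'} \, v_{i,kj'} .
\]
% Special case j = k = j' = k': the variance of a squared residual,
\[
\operatorname{var}\bigl[\{Y_{i,j} - f(x_{i,j}, \beta)\}^2\bigr] = 2 \, v_{i,jj}^2 ,
\]
% which is the familiar normal-theory result used in the univariate case.
```

Under this assumption, only the elements of $V_i$ are needed, so no separate model for third and fourth moments has to be specified.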
Quadratic Estimating Equations for β
As in the univariate case, maximizing the full normal likelihood with respect to $\beta$ leads to estimating equations that are quadratic in $Y_i - f_i(x_i, \beta)$ when $g(\cdot)$ depends on $\beta$.
As when estimating variance parameters, general quadratic estimating equations may be used, based on the covariances $\operatorname{cov}(u_{i,j,k}, u_{i,j',k'})$.
Working assumptions (independence or Gaussian) would usually be necessary.
If the working assumptions about the covariance are correct, the quadratic estimator $\hat\beta_Q$ is more efficient than GLS.
Otherwise, $\hat\beta_Q$ may be less efficient than GLS and, if the variance is misspecified, inconsistent.
GEE-1: linear equations for β and quadratic equations for ξ;
GEE-2: quadratic equations for β and quadratic equations for ξ.
The folklore theorem
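Large sample properties are usually derived with $n_i$ fixed and $m \to \infty$.
With fixed weight matrices $U_i^{-1}$ and true variance matrices $V_i$,
\[ \hat\beta \mathrel{\dot\sim} N\left( \beta_0, \; \left( \sum_{i=1}^{m} X_i^T U_i^{-1} X_i \right)^{-1} \left( \sum_{i=1}^{m} X_i^T U_i^{-1} V_i U_i^{-1} X_i \right) \left( \sum_{i=1}^{m} X_i^T U_i^{-1} X_i \right)^{-1} \right) . \]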
Folklore theorem: estimated weight matrices generally lead to the
same asymptotic distribution as if the weights were known.
Misspecified weight matrices lead to inefficiency.
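A short worked case makes the role of the weights concrete. Setting the weight matrices equal to the true variance matrices, $U_i = V_i$, collapses the three-factor variance to a single factor:

```latex
\[
\left( \sum_{i=1}^{m} X_i^T V_i^{-1} X_i \right)^{-1}
\left( \sum_{i=1}^{m} X_i^T V_i^{-1} V_i V_i^{-1} X_i \right)
\left( \sum_{i=1}^{m} X_i^T V_i^{-1} X_i \right)^{-1}
=
\left( \sum_{i=1}^{m} X_i^T V_i^{-1} X_i \right)^{-1} .
\]
```

This is the usual model-based (GLS) variance. For any other choice of $U_i$ the three factors do not cancel, which is why model-based standard errors computed from the single-factor formula are not reliable when the weights are misspecified.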
Sandwich Estimator
With misspecified weight matrices, model-based standard errors are generally incorrect, while sandwich estimators of the standard errors remain valid.
The asymptotic variance is estimated by replacing its middle factor, $\sum_{i=1}^{m} X_i^T U_i^{-1} V_i U_i^{-1} X_i$, with
\[ \sum_{i=1}^{m} X_i^T U_i^{-1} R_i(\hat\beta) U_i^{-1} X_i \]
where
\[ R_i(b) = \{Y_i - f_i(x_i, b)\} \{Y_i - f_i(x_i, b)\}^T . \]
Note that
\[ E\{R_i(\beta)\} = V_i , \]
regardless of the correlation structure of $Y_i$.
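The sandwich calculation can be illustrated in base R. The sketch below makes several assumptions that are not in the slides: the data are simulated with a shared subject effect, the mean is linear, and the working weights are independence (U_i equal to the identity), which is deliberately wrong for these data.

```r
set.seed(2)
m <- 200; n <- 4                       # m subjects, n observations each
id <- rep(seq_len(m), each = n)
x  <- rnorm(m * n)
# The shared subject effect makes observations within a subject correlated
y  <- 1 + 2 * x + rep(rnorm(m), each = n) + rnorm(m * n)
X  <- cbind(1, x)

# Working independence: U_i = I, so beta_hat is OLS
beta_hat <- solve(crossprod(X), crossprod(X, y))
res <- as.vector(y - X %*% beta_hat)

# Outer factor: sum_i X_i^T U_i^{-1} X_i
bread <- crossprod(X)

# Middle factor: sum_i X_i^T U_i^{-1} R_i(beta_hat) U_i^{-1} X_i
meat <- matrix(0, 2, 2)
for (i in seq_len(m)) {
  idx  <- id == i
  s_i  <- crossprod(X[idx, , drop = FALSE], res[idx])  # X_i^T r_i
  meat <- meat + tcrossprod(s_i)                       # X_i^T r_i r_i^T X_i
}

V_robust <- solve(bread) %*% meat %*% solve(bread)
# Model-based variance, valid only if the independence assumption were true
V_naive  <- sum(res^2) / (m * n - 2) * solve(bread)

se <- cbind(naive = sqrt(diag(V_naive)), robust = sqrt(diag(V_robust)))
se
```

In this simulated setting the model-based standard error for the intercept is too small because it ignores the within-subject correlation, while the sandwich version accounts for it.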
Implementation in R
Recall the general formulation of a PA (marginal) model:
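\[ E(Y_i \mid x_i) = f_i(x_i, \beta) , \]
\[ \operatorname{var}(Y_i \mid x_i) = V_i(\beta, \xi, x_i) \]
with
\[ V_i(\beta, \xi, x_i) = T_i(\beta, \theta, x_i)^{1/2} \, \Gamma_i(\alpha, x_i) \, T_i(\beta, \theta, x_i)^{1/2} \]
where $T_i(\cdot)$ is the diagonal matrix of variances, and $\Gamma_i(\cdot)$ is the correlation matrix.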
Package software implements only a special case, based on generalized linear models:
mean is a nonlinear function of a linear predictor:
\[ E(Y_{i,j} \mid x_i) = f\bigl(x_{i,j}^T \beta\bigr) ; \]
variance is a function of the mean:
\[ \operatorname{var}(Y_{i,j} \mid x_i) = \sigma^2 g\bigl\{ f\bigl(x_{i,j}^T \beta\bigr) \bigr\} ; \]
correlation structure is one of the standard patterns.
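In R, the mean function f and the variance function g of this special case are carried by a family object. The sketch below uses the Poisson family with log link as one instance. The linear predictor values are hypothetical, and linkinv and variance are the standard components of a base R family object.

```r
# Poisson family with log link: f(eta) = exp(eta), g(mu) = mu
fam <- poisson(link = "log")

eta <- c(-1, 0, 1.5)       # hypothetical linear predictors x_ij^T beta
mu  <- fam$linkinv(eta)    # mean: f(x_ij^T beta)
v   <- fam$variance(mu)    # variance function evaluated at the mean: g(mu)

cbind(eta = eta, mean = mu, variance = v)
```

The scale parameter σ² is not part of the family object. The GEE fitting functions estimate it separately unless it is fixed by the user.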
Note
It is possible that no joint distribution of $Y_i$ exists with marginal distributions for $Y_{i,j}$ in the associated scaled exponential family.
That is, the specification should be viewed as defining Generalized Estimating Equations (GEE) rather than a full probability model.
Also, the implementations are derived from the Generalized Linear Model point of view, and are based on linear estimating equations for $\beta$ (GEE-1).
SAS proc genmod has a repeated statement to specify the correlation structure.
In R, use one of the functions:
gee() (in the gee package);
geese() or geeglm() (in the geepack package).
In each, the argument corstr specifies the working correlation structure.
The gee package
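# Read the seizure data; assumes seize.dat is whitespace-delimited with no header row
seize <- read.table("seize.dat",
                    col.names = c("id", "y", "time", "trt",
                                  "base", "age"))
library(gee)
# Poisson GEE with unstructured working correlation.
# gee() expects each subject's rows to be contiguous in the data.
seizeGee.un <- gee(y ~ log(age) + trt * log(base / 4) + I(time == 4),
                   id = id, data = seize, corstr = "unstructured",
                   family = poisson)
# summary() reports both naive (model-based) and robust (sandwich) standard errors
summary(seizeGee.un)
# Refit with exchangeable working correlation
seizeGee.ex <- update(seizeGee.un, corstr = "exchangeable")
summary(seizeGee.ex)
# Refit with AR(1) working correlation (AR-M with order Mv = 1)
seizeGee.ar1 <- update(seizeGee.un, corstr = "AR-M", Mv = 1)
summary(seizeGee.ar1)
# Compare working correlation structures by QIC (smaller is preferred)
library(MuMIn)
QIC(seizeGee.un)
QIC(seizeGee.ex)
QIC(seizeGee.ar1)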
The geepack package
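library(geepack)
# Uses the seize data frame read in for the gee example.
# Poisson GEE with unstructured working correlation;
# geeglm() also expects each subject's rows to be contiguous.
seizeGeeglm.un <- geeglm(y ~ log(age) + trt * log(base / 4)
                           + I(time == 4),
                         id = id, data = seize,
                         corstr = "unstructured", family = poisson)
# summary() reports robust (sandwich) standard errors by default
summary(seizeGeeglm.un)
# Refit with exchangeable working correlation
seizeGeeglm.ex <- update(seizeGeeglm.un, corstr = "exchangeable")
summary(seizeGeeglm.ex)
# Refit with AR(1) working correlation
seizeGeeglm.ar1 <- update(seizeGeeglm.un, corstr = "ar1")
summary(seizeGeeglm.ar1)
# Compare working correlation structures by QIC (smaller is preferred)
library(MuMIn)
QIC(seizeGeeglm.un)
QIC(seizeGeeglm.ex)
QIC(seizeGeeglm.ar1)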