ST 762 Nonlinear Statistical Models for Univariate and Multivariate Response

Population-Averaged Model

Recall the general formulation of a PA (marginal) model:

$$E(Y_i \mid x_i) = f_i(x_i, \beta) \quad (n_i \times 1), \qquad \operatorname{var}(Y_i \mid x_i) = V_i(\beta, \xi, x_i) \quad (n_i \times n_i).$$

Below, $x_{i,j}$ contains the among-individual covariates for the $i$th subject, $a_i$, and the within-individual covariates for the $j$th observation on that subject, $z_{i,j}$:

$$x_{i,j} = \begin{pmatrix} z_{i,j} \\ a_i \end{pmatrix}.$$

We assume that

$$V_i(\beta, \xi, x_i) = T_i(\beta, \theta, x_i)^{1/2} \, \Gamma_i(\alpha, x_i) \, T_i(\beta, \theta, x_i)^{1/2},$$

where $T_i(\cdot)$ is the diagonal matrix of variances, and $\Gamma_i(\cdot)$ is the correlation matrix.

The variances are specified as in the univariate case:

$$T_i(\beta, \theta, x_i) = \operatorname{diag}\{\operatorname{var}(Y_{i,1} \mid x_i), \operatorname{var}(Y_{i,2} \mid x_i), \ldots, \operatorname{var}(Y_{i,n_i} \mid x_i)\}$$

and

$$\operatorname{var}(Y_{i,j} \mid x_i) = \sigma^2 g(\beta, \theta, x_{i,j})^2.$$

The correlations are specified using one of the standard correlation models, with parameter $\alpha$.

The correlation structure is usually viewed as a working model, and inferences are based on variance estimators that remain valid when the working model is wrong (sandwich estimators).

The overall variance parameter vector is

$$\xi = \begin{pmatrix} \theta \\ \alpha \end{pmatrix}.$$

Generalized Estimating Equations (GEE)

The basic principles of GEE are the same as those for univariate response, even though the equations are more complicated.

Things to come:
- Linear estimating equations for $\beta$
- Quadratic estimating equations for $\xi$
- Quadratic estimating equations for $\beta$
- The folklore theorem and robust covariance matrix
- Implementations in R

Strategy: initially, assume the given mean specification, known variance matrices $V_i$, and the Gaussian distribution.
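As a concrete illustration of the variance specification $V_i = T_i^{1/2} \Gamma_i T_i^{1/2}$ from the model recap, the following minimal sketch assembles $V_i$ for one subject in base R. The helper name `make_V`, the power-of-the-mean variance function $g = \mu^\theta$, and the exchangeable correlation are all assumptions chosen for illustration; they are not part of the course material.

```r
# Hypothetical helper: assemble V_i = T_i^{1/2} Gamma_i T_i^{1/2} for one
# subject. The variance function g = mu^theta and the exchangeable
# correlation are illustrative assumptions only.
make_V <- function(mu, sigma2, theta, alpha) {
  n <- length(mu)
  # Standard deviations sigma * g(.), i.e. the diagonal of T_i^{1/2}
  sd_i <- sqrt(sigma2) * mu^theta
  # Exchangeable working correlation matrix Gamma_i(alpha)
  Gamma <- matrix(alpha, n, n)
  diag(Gamma) <- 1
  # T_i^{1/2} Gamma_i T_i^{1/2}
  diag(sd_i, n) %*% Gamma %*% diag(sd_i, n)
}

mu <- c(2, 4, 8)                       # made-up subject-level means
V  <- make_V(mu, sigma2 = 0.25, theta = 1, alpha = 0.5)
V                                      # variances on the diagonal
cov2cor(V)                             # recovers Gamma_i
```

Writing `diag(sd_i, n)` rather than `diag(sd_i)` keeps the helper correct when a subject has a single observation.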
Linear Estimating Equations for β

Normal log-likelihood:

$$\log L = -\frac{1}{2} \sum_{i=1}^{m} \left[ \log |2\pi V_i| + \{Y_i - f_i(x_i, \beta)\}^T V_i^{-1} \{Y_i - f_i(x_i, \beta)\} \right].$$

Differentiate with respect to $\beta$, treating the $V_i$ as known:

$$\sum_{i=1}^{m} X_i(\beta)^T V_i^{-1} \{Y_i - f_i(x_i, \beta)\} = 0.$$

As always, $X_i(\beta)$ is a gradient matrix, here of dimension $n_i \times p$:

$$X_i(\beta) = \begin{pmatrix} f_\beta(x_{i,1}, \beta)^T \\ f_\beta(x_{i,2}, \beta)^T \\ \vdots \\ f_\beta(x_{i,n_i}, \beta)^T \end{pmatrix}.$$

Now use the variance specification

$$V_i = \operatorname{var}(Y_i \mid x_i) = V_i(\beta, \xi, x_i).$$

The estimating equation for $\beta$ becomes

$$\sum_{i=1}^{m} X_i(\beta)^T V_i(\beta, \xi, x_i)^{-1} \{Y_i - f_i(x_i, \beta)\} = 0.$$

In terms of the stacked $X$ ($N \times p$), $Y$ ($N \times 1$), and $f$ ($N \times 1$):

$$X(\beta)^T V^{-1} \{Y - f(\beta)\} = 0,$$

where $V$ ($N \times N$) is block-diagonal:

$$V = \operatorname{diag}(V_1, V_2, \ldots, V_m).$$

The estimating equations are linear in $Y_i$ and unbiased.

GLS iteration:
- Initial estimator $\hat\beta^{(0)}$, e.g., OLS.
- For each $k \ge 0$, some estimator $\hat\xi^{(k)}$ gives estimated weight matrices $\hat V_i(\hat\beta^{(k)}, \hat\xi^{(k)}, x_i)^{-1}$.
- Re-estimate $\beta$ by solving

$$\sum_{i=1}^{m} X_i(\beta)^T \hat V_i(\hat\beta^{(k)}, \hat\xi^{(k)}, x_i)^{-1} \{Y_i - f_i(x_i, \beta)\} = 0$$

  for $\hat\beta^{(k+1)}$, and iterate $C$ times.

Estimating Variance and Correlation Parameters

Estimation of $\xi = (\theta^T, \alpha^T)^T$, of dimension $q + s$.

Hold $\beta$ fixed in the normal log-likelihood and maximize with respect to $\xi$ (the pseudolikelihood, PL, approach).

The resulting estimating equations are complicated and quadratic in $\{Y_i - f_i(x_i, \beta)\}$.
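Returning to the linear estimating equation for $\beta$: the sketch below solves $\sum_i X_i(\beta)^T V_i^{-1}\{Y_i - f_i(x_i,\beta)\} = 0$ by Gauss-Newton with the weight matrices held fixed, which corresponds to one "re-estimate $\beta$" step of the GLS iteration. The exponential-decay mean model, the known exchangeable covariance, the simulated data, and the names `f`, `Xmat`, and `ee` are all assumptions made for illustration.

```r
# Sketch: solve sum_i X_i(b)^T V^{-1} {Y_i - f(b)} = 0 by Gauss-Newton,
# treating the weight matrix V as known. The mean model
# f = b1 * exp(-b2 * t) and the data are simulated assumptions.
set.seed(1)
m  <- 50
tt <- c(0.5, 1, 2, 4)                       # common observation times
beta0 <- c(10, 0.5)                         # "true" value used to simulate

f    <- function(b) b[1] * exp(-b[2] * tt)  # n_i x 1 mean vector
Xmat <- function(b) cbind(exp(-b[2] * tt),               # d f / d b1
                          -b[1] * tt * exp(-b[2] * tt))  # d f / d b2

# Known exchangeable covariance, identical for every subject
Gamma <- matrix(0.6, length(tt), length(tt)); diag(Gamma) <- 1
V     <- 0.25 * Gamma
Vinv  <- solve(V)
R     <- chol(V)                            # V = t(R) %*% R
Y <- lapply(seq_len(m), function(i)
  f(beta0) + drop(t(R) %*% rnorm(length(tt))))

# Left-hand side of the linear estimating equation at b
ee <- function(b) {
  X <- Xmat(b)
  Reduce(`+`, lapply(Y, function(y) t(X) %*% Vinv %*% (y - f(b))))
}

b <- c(8, 0.3)                              # crude starting value
for (k in 1:25) {
  X <- Xmat(b)
  A <- m * t(X) %*% Vinv %*% X              # sum_i X_i^T V^{-1} X_i (same X_i for all i)
  b <- b + drop(solve(A, ee(b)))            # Gauss-Newton update
}
b                                           # estimate of beta
ee(b)                                       # should be numerically zero
```

In a full GLS iteration, `V` would be rebuilt from the current $\hat\beta^{(k)}$ and $\hat\xi^{(k)}$ before each such solve instead of being held at a known value.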
General quadratic equation approach:
- univariate case: requires variances of the squared residuals $\varepsilon_i^2$, where $\varepsilon_i = Y_i - f(x_i, \beta)$;
- multivariate case: requires variances and covariances of the products

$$u_{i,j,k} = \{Y_{i,j} - f(x_{i,j}, \beta)\} \times \{Y_{i,k} - f(x_{i,k}, \beta)\}.$$

Two popular working assumptions:
- Independence: assume $\operatorname{cov}(u_{i,j,k}, u_{i,j',k'})$ is the same as if $Y_{i,1}, Y_{i,2}, \ldots, Y_{i,n_i}$ were independent.
- Gaussian: assume $\operatorname{cov}(u_{i,j,k}, u_{i,j',k'})$ is the same as if $Y_{i,1}, Y_{i,2}, \ldots, Y_{i,n_i}$ were normal with the specified mean and variance structure.

Not surprisingly, the Gaussian working assumption leads to the same estimating equation as pseudolikelihood (PL).

Quadratic Estimating Equations for β

As in the univariate case, maximizing the full normal likelihood with respect to $\beta$ leads to estimating equations that are quadratic in $Y_i - f_i(x_i, \beta)$ when $g(\cdot)$ depends on $\beta$.

As when estimating variance parameters, general quadratic estimating equations may be used, based on the covariances $\operatorname{cov}(u_{i,j,k}, u_{i,j',k'})$.

Working assumptions (independence or Gaussian) would usually be necessary.

If the working assumptions about the covariance are correct, the quadratic estimator $\hat\beta_Q$ is more efficient than GLS.

Otherwise, $\hat\beta_Q$ may be less efficient than GLS and, if the variance is misspecified, inconsistent.

- GEE-1: linear equations for $\beta$ and quadratic equations for $\xi$;
- GEE-2: quadratic equations for $\beta$ and quadratic equations for $\xi$.
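To make the Gaussian working assumption concrete: for zero-mean jointly normal residuals with covariance matrix $V_i$, Isserlis' theorem gives $\operatorname{cov}(u_{i,j,k}, u_{i,j',k'}) = V_{i,jj'} V_{i,kk'} + V_{i,jk'} V_{i,kj'}$. The sketch below tabulates these working covariances for one subject; the function name `gauss_cov_u` and the example matrix are hypothetical.

```r
# Working covariance of the products u_{j,k} = e_j * e_k under the Gaussian
# assumption, for one subject with covariance matrix V (Isserlis' theorem):
#   cov(u_{j,k}, u_{j',k'}) = V[j,j'] * V[k,k'] + V[j,k'] * V[k,j'].
# gauss_cov_u is a hypothetical helper name.
gauss_cov_u <- function(V) {
  # All distinct pairs (j, k) with j <= k, in column-major order
  idx <- which(upper.tri(V, diag = TRUE), arr.ind = TRUE)
  P <- nrow(idx)
  C <- matrix(0, P, P)
  for (a in seq_len(P)) {
    for (b in seq_len(P)) {
      j  <- idx[a, 1]; k  <- idx[a, 2]
      jp <- idx[b, 1]; kp <- idx[b, 2]
      C[a, b] <- V[j, jp] * V[k, kp] + V[j, kp] * V[k, jp]
    }
  }
  labs <- paste0("u", idx[, 1], idx[, 2])
  dimnames(C) <- list(labs, labs)
  C
}

V <- matrix(c(1, 0.5, 0.5, 2), 2, 2)   # made-up 2 x 2 covariance matrix
C <- gauss_cov_u(V)
C                                       # rows/columns: u11, u12, u22
```

For example, the entry for $(u_{1,1}, u_{1,1})$ is $2 V_{11}^2$, the familiar variance of a squared normal residual.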
The folklore theorem

Large sample properties are usually derived with $n_i$ fixed and $m \to \infty$.

With fixed weight matrices $U_i^{-1}$ and true variance matrices $V_i$,

$$\hat\beta \mathrel{\dot\sim} N\left( \beta_0, \; \left( \sum_{i=1}^{m} X_i^T U_i^{-1} X_i \right)^{-1} \left( \sum_{i=1}^{m} X_i^T U_i^{-1} V_i U_i^{-1} X_i \right) \left( \sum_{i=1}^{m} X_i^T U_i^{-1} X_i \right)^{-1} \right).$$

Folklore theorem: estimated weight matrices generally lead to the same asymptotic distribution as if the weights were known.

Misspecified weight matrices lead to inefficiency.

Sandwich Estimator

With misspecified weight matrices, model-based standard errors are generally incorrect, while sandwich estimators of the standard errors remain valid.

The middle factor of the asymptotic variance is estimated by

$$\sum_{i=1}^{m} X_i^T U_i^{-1} R_i(\hat\beta) \, U_i^{-1} X_i,$$

where

$$R_i(b) = \{Y_i - f_i(x_i, b)\} \{Y_i - f_i(x_i, b)\}^T.$$

Note that $E\{R_i(\beta)\} = V_i$, regardless of the correlation structure of $Y_i$.

Implementation in R

Recall the general formulation of a PA (marginal) model:

$$E(Y_i \mid x_i) = f_i(x_i, \beta), \qquad \operatorname{var}(Y_i \mid x_i) = V_i(\beta, \xi, x_i),$$

with

$$V_i(\beta, \xi, x_i) = T_i(\beta, \theta, x_i)^{1/2} \, \Gamma_i(\alpha, x_i) \, T_i(\beta, \theta, x_i)^{1/2},$$

where $T_i(\cdot)$ is the diagonal matrix of variances, and $\Gamma_i(\cdot)$ is the correlation matrix.

Package software implements only a special case, based on generalized linear models:
- the mean is a nonlinear function of a linear predictor: $E(Y_{i,j} \mid x_i) = f(x_{i,j}^T \beta)$;
- the variance is a function of the mean: $\operatorname{var}(Y_{i,j} \mid x_i) = \sigma^2 g\{f(x_{i,j}^T \beta)\}$;
- the correlation structure is one of the standard patterns.
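The sandwich formula from the folklore-theorem discussion can be sketched in base R. This is a minimal illustration under stated assumptions, not what the packages do internally: the mean is linear ($f_i = X_i \beta$), the working weights are $U_i = I$ (independence), and the data are simulated with a shared random intercept so that the working model is deliberately wrong. The names `sumlist`, `bread`, and `meat` are illustrative.

```r
# Sketch of the robust (sandwich) covariance for a linear mean model
# f_i = X_i beta with independence working weights U_i = I.
# All data are simulated; the true errors share a random intercept,
# so the working independence model is misspecified on purpose.
set.seed(2)
m <- 200; n <- 3
X <- lapply(seq_len(m), function(i) cbind(1, rnorm(n)))
Y <- lapply(X, function(Xi) drop(Xi %*% c(1, 2)) + rnorm(1) + rnorm(n))
Uinv <- diag(n)

sumlist <- function(l) Reduce(`+`, l)

# "Bread": inverse of sum_i X_i^T U_i^{-1} X_i
bread <- solve(sumlist(lapply(X, function(Xi) t(Xi) %*% Uinv %*% Xi)))
beta_hat <- bread %*%
  sumlist(Map(function(Xi, Yi) t(Xi) %*% Uinv %*% Yi, X, Y))

# "Meat": sum_i X_i^T U_i^{-1} R_i(beta_hat) U_i^{-1} X_i
meat <- sumlist(Map(function(Xi, Yi) {
  r <- Yi - drop(Xi %*% beta_hat)           # residual vector for subject i
  t(Xi) %*% Uinv %*% tcrossprod(r) %*% Uinv %*% Xi
}, X, Y))

V_robust <- bread %*% meat %*% bread

# Model-based covariance that takes the independence model at face value
rss <- sum(unlist(Map(function(Xi, Yi) sum((Yi - Xi %*% beta_hat)^2), X, Y)))
V_model <- rss / (m * n - 2) * bread

sqrt(diag(V_robust))                         # robust standard errors
sqrt(diag(V_model))                          # model-based standard errors
```

Because the simulated within-subject correlation is positive, the robust standard error of the intercept should come out noticeably larger than the model-based one.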
Note

It is possible that no joint distribution of $Y_i$ exists with marginal distributions for $Y_{i,j}$ in the associated scaled exponential family.

That is, the specification should be viewed as defining Generalized Estimating Equations (GEE); also, the implementations are derived from the Generalized Linear Model point of view, and are based on linear estimating equations for $\beta$ (GEE-1).

SAS proc genmod has a repeated statement to specify correlation structure.

In R, use one of the functions:
- gee() (in the gee package);
- geese() or geeglm() (in the geepack package).

The argument corstr specifies the correlation structure.

The gee package

    # Read the seizure data; rows for the same subject must be contiguous
    seize <- read.table("seize.dat",
                        col.names = c("id", "y", "time", "trt", "base", "age"))
    library(gee)

    # Poisson GEE with unstructured working correlation
    seizeGee.un <- gee(y ~ log(age) + trt * log(base / 4) + (time == 4),
                       id = id, data = seize,
                       corstr = "unstructured", family = poisson)
    summary(seizeGee.un)

    # Same mean model, exchangeable working correlation
    seizeGee.ex <- update(seizeGee.un, corstr = "exchangeable")
    summary(seizeGee.ex)

    # Same mean model, AR(1) working correlation
    seizeGee.ar1 <- update(seizeGee.un, corstr = "AR-M", Mv = 1)
    summary(seizeGee.ar1)

    # Compare working correlation structures by QIC
    library(MuMIn)
    QIC(seizeGee.un)
    QIC(seizeGee.ex)
    QIC(seizeGee.ar1)

The geepack package

    library(geepack)

    # Poisson GEE with unstructured working correlation
    seizeGeeglm.un <- geeglm(y ~ log(age) + trt * log(base / 4) + (time == 4),
                             id = id, data = seize,
                             corstr = "unstructured", family = poisson)
    summary(seizeGeeglm.un)

    # Same mean model, exchangeable working correlation
    seizeGeeglm.ex <- update(seizeGeeglm.un, corstr = "exchangeable")
    summary(seizeGeeglm.ex)

    # Same mean model, AR(1) working correlation
    seizeGeeglm.ar1 <- update(seizeGeeglm.un, corstr = "ar1")
    summary(seizeGeeglm.ar1)

    # Compare working correlation structures by QIC
    library(MuMIn)
    QIC(seizeGeeglm.un)
    QIC(seizeGeeglm.ex)
    QIC(seizeGeeglm.ar1)