Generalized Estimating Equations

Generalized Estimating Equations
(GEE): A Modern Love Story
April 18, 2011
DαSAL
Brandi Stupica
Data for today on the H: drive in the DaSAL folder
GEE Talk Data_041811.sav
What are generalized estimating equations?
Applications
Why you should love GEEs
PART I.
What are Generalized Estimating
Equations (GEE)?
• Extension of the Generalized Linear Model (GZLM),
which is an extension of the General Linear Model
(GLM)
– GLM analyzes models with normally distributed DVs that
are linearly linked to predictors
– GZLM extends GLM to analyze non-normally distributed DVs
that may be non-linearly linked to predictors
• Easily handles interactions between discrete and continuous
IVs
• Cannot analyze correlated, non-independent, clustered,
nested, repeated measures, within-subjects data
– GEE extends GZLM and analyzes correlated data with
• Normal and non-normal DVs
• DVs that are linearly or non-linearly linked to IVs
• Full factorial models with any combo of discrete and
continuous IVs
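The GLM, GZLM, GEE chain above can be written compactly. This is a sketch in standard textbook notation; none of the symbols come from the slides. Subject i contributes a vector of repeated responses y_i with mean mu_i, g is the link function, v is the variance function, phi is a scale parameter, and R(alpha) is the working correlation matrix.

```latex
\begin{aligned}
\text{GLM:}\quad  & y_i = x_i^{\top}\beta + \varepsilon_i,
                    \qquad \varepsilon_i \sim N(0,\sigma^{2}) \\[4pt]
\text{GZLM:}\quad & g(\mu_i) = x_i^{\top}\beta,
                    \qquad \mu_i = E(y_i),\quad \operatorname{Var}(y_i) = \phi\, v(\mu_i) \\[4pt]
\text{GEE:}\quad  & g(\mu_{ij}) = x_{ij}^{\top}\beta,
                    \qquad \sum_{i=1}^{N} D_i^{\top} V_i^{-1}\,(y_i - \mu_i) = 0, \\
                  & D_i = \frac{\partial \mu_i}{\partial \beta^{\top}},
                    \qquad V_i = \phi\, A_i^{1/2}\, R(\alpha)\, A_i^{1/2},
                    \qquad A_i = \operatorname{diag}\{v(\mu_{ij})\}
\end{aligned}
```

With R(alpha) set to the identity matrix and one observation per subject, the GEE line reduces to the GZLM score equations. In that sense GEE extends GZLM.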
Application of GEE
• Nested data
– Dyadic relationships
– Family studies
– School and organizational studies
• Repeated measures
– Longitudinal data analysis
• Within subjects designs
– Pre/post designs
Why You Should Love GEEs for
Correlated Data
• Compared to rANOVA
– Doesn’t assume DV is normal or that it is linearly linked to
predictors
• Can model DVs that are binomial, multinomial, Poisson, negative binomial,
and more!
– Can model interactions between factors and covariates with
ease
• Compared to Linear Mixed Models
– Doesn’t require that repeated responses have multivariate
normal distribution
• Unlikely to meet this assumption when DV is binary or count data
• Rather than combining multiple assessments, analyze with
improved power by including within Ss factor
• Uses all available data as default rather than complete cases
only
• Extraordinary flexibility can streamline results sections
Conducting a GEE
PART II.
Conducting a GEE: First Step
• Arrange your data in “long form”
A. How data usually look
B. How data need to look for GEE
Getting from A to B:
Restructuring Your Data
Restructured Data
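The talk restructures the data in SPSS. For readers who want to see what the A-to-B step does, here is a minimal Python sketch of the same reshaping. The function name, the column names (`score_t1` and so on) and the two example records are made-up placeholders, not the contents of the talk's data file.

```python
def wide_to_long(rows, time_columns, value_name="score", time_name="time"):
    """Turn one-row-per-subject records into one row per subject per occasion."""
    long_rows = []
    for row in rows:
        # Variables that do not vary across occasions are copied onto every row.
        fixed = {k: v for k, v in row.items() if k not in time_columns}
        for occasion, column in enumerate(time_columns, start=1):
            value = row.get(column)
            if value is None:
                continue  # skip only the missing occasion, keep the others
            long_rows.append({**fixed, time_name: occasion, value_name: value})
    return long_rows


# Placeholder "wide" data: one row per subject, one column per occasion.
wide = [
    {"id": 1, "group": "a", "score_t1": 3, "score_t2": 5, "score_t3": 4},
    {"id": 2, "group": "b", "score_t1": 2, "score_t2": None, "score_t3": 6},
]

long_form = wide_to_long(wide, ["score_t1", "score_t2", "score_t3"])
for record in long_form:
    print(record)
```

In long form, a subject with one missing occasion loses only that row. This is one way to picture how an analysis can use all available data rather than complete cases only.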
Conducting a GEE Analysis
Selecting the Model Type
• Dozens of model
combinations with
GEE
– DV can be discrete,
any of several
distributions, and
nonlinearly linked to
IVs
• Must select
distribution of DV
and link function
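To make "distribution plus link function" concrete, the sketch below shows three commonly used links and how each maps a linear predictor back to the scale of the DV. The pairings in the comments (normal with identity, binomial with logit, Poisson with log) are the usual textbook defaults, not a list taken from the slides. The names `LINKS` and `predicted_mean` are illustrative only.

```python
import math

# Each entry holds (link, inverse link).
# link:         mean of the DV   -> linear predictor
# inverse link: linear predictor -> mean of the DV
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),            # typical for normal DVs
    "logit": (lambda mu: math.log(mu / (1.0 - mu)),          # typical for binary DVs
              lambda eta: 1.0 / (1.0 + math.exp(-eta))),
    "log": (lambda mu: math.log(mu), lambda eta: math.exp(eta)),  # typical for counts
}


def predicted_mean(link_name, intercept, slope, x):
    """Predicted mean of the DV for a single predictor value x."""
    _, inverse_link = LINKS[link_name]
    return inverse_link(intercept + slope * x)


# The same linear predictor (0 + 1 * 0 = 0) gives a different mean under each link.
for name in LINKS:
    print(name, predicted_mean(name, 0.0, 1.0, 0.0))
```

The link is what allows the DV to be non-linearly related to the IVs while the model itself stays linear in its coefficients.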
Response Variable
• Also known as
outcome variable,
DV
• Category order applies
to multinomial DVs
• For binary
outcomes, can
specify reference
category
Predictors
• Options for factors
allow specification
of reference
category and how
to handle missing
data
Model
• Full factorial is a few
clicks away
Estimation and Statistics
EM Means
• Several options for
controlling for
family-wise error
• Several options for
contrasts, including
– Simple
– Pairwise
– Deviation
– Difference
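As a rough illustration of what a family-wise error correction does to pairwise comparisons, the sketch below lists every pair of factor levels and applies two common adjustments, Bonferroni and Sidak. The slides do not say which adjustments were used, so these two are an assumption chosen for illustration. The function names are hypothetical, and the p-values must come from your own output.

```python
from itertools import combinations


def pairwise_contrasts(levels):
    """Every unordered pair of factor levels, i.e. the set of pairwise contrasts."""
    return list(combinations(levels, 2))


def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def sidak(p_values):
    """Sidak adjustment: 1 - (1 - p)^m, slightly less strict than Bonferroni."""
    m = len(p_values)
    return [1.0 - (1.0 - p) ** m for p in p_values]


# Three time points give three pairwise comparisons.
pairs = pairwise_contrasts(["time 1", "time 2", "time 3"])
print(pairs)
```

The number of pairwise comparisons grows quickly with the number of levels (k levels give k(k-1)/2 pairs), which is why an adjustment matters more for factors with many levels.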
Save, Export, and Cross Your
Fingers
Results: Descriptive Information
Detour to Explain the Relevance
of Goodness of Fit
Working Correlation Matrix
• What is a working correlation matrix?
– Correlated data could be correlated many ways
– Specify in the beginning the assumptions that should
be made about how correlated data are correlated
– “Working” comes from the structure being reestimated at each iteration
• GEE robust to misspecification
• Then, why bother picking the best one?
– Small gain in efficiency by selecting correct
underlying structure
• In the “Repeated” tab I picked Unstructured
correlation matrix
– Why?
Working Correlation Matrix
Options
• Unstructured
– No assumption about the relative magnitude of the correlation between any two pairs of observations
– Must estimate many parameters
– Most efficient and conservative but can lead to poor estimates with small samples
• Independent
– Assumes measurements for the repeated measure are uncorrelated
– Default in SPSS
– 1's on the diagonal and 0's off the diagonal: each measurement is correlated with itself but not with measurements at other times
– Illogical assumption and often wrong, given that the data are correlated and non-independent by nature!
– Thus, I always start with something other than independent, and choose unstructured because it is the most conservative and efficient option and makes no assumptions
• AR(1): Auto-regressive, order 1
– Correlation diminishes exponentially over time
– Assumes equal time intervals
– 1's on the diagonal; alpha for observations one apart; alpha-squared for two apart; alpha-cubed for three apart, and so on
• Exchangeable
– Correlation does not change with time
– Correlations for within-subjects variables are homogeneous
– 1's on the diagonal and equal correlation for all off-diagonal elements
• M-dependent
– Correlation depends only on how many time points separate two observations, and drops to zero beyond M
– 1's on the diagonal, a common correlation for each separation from 1 to M time points, and 0 for observations more than M time points apart
– Researcher specifies M
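The patterns described above are easier to compare when printed side by side. The sketch below builds each structured option as a plain list of lists. The function names and the example values of alpha are placeholders, not estimates from the talk's data. For M-dependent, the function takes one correlation per lag, which also covers the case where they are all equal; the length of that list sets where the correlation drops to zero.

```python
def independent(n):
    """1's on the diagonal, 0's everywhere else."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def exchangeable(n, alpha):
    """1's on the diagonal, the same correlation alpha everywhere else."""
    return [[1.0 if i == j else alpha for j in range(n)] for i in range(n)]


def ar1(n, alpha):
    """Correlation alpha ** lag, so it decays as observations get further apart."""
    return [[alpha ** abs(i - j) for j in range(n)] for i in range(n)]


def m_dependent(n, lag_correlations):
    """One correlation per lag 1..M (M = len(lag_correlations)), 0 beyond that."""
    m = len(lag_correlations)

    def entry(i, j):
        lag = abs(i - j)
        if lag == 0:
            return 1.0
        return lag_correlations[lag - 1] if lag <= m else 0.0

    return [[entry(i, j) for j in range(n)] for i in range(n)]


def n_unstructured_params(n):
    """Unstructured has no pattern: one free correlation per pair of occasions."""
    return n * (n - 1) // 2


for row in ar1(4, 0.5):
    print(row)
print("free correlations, unstructured, 4 occasions:", n_unstructured_params(4))
```

Counting parameters shows the trade-off: exchangeable and AR(1) each estimate a single alpha, while unstructured estimates one correlation for every pair of occasions.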
Choosing the Best Fitting
Working Correlation Matrix
• Run model for
different working
correlation structure
assumptions,
choose the one
assumption with the
lowest QIC value
• But, wait…What is
the QICC?
Bonus! Choosing the Best Subset
of Predictors
• QICC used for
choosing best subset of
predictors
• Penalizes for model
complexity
• Run a model and a
nested model dropping
one of the predictors,
then compare QICC
values
• Lower QICC indicates
better fit
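For reference, the two criteria are usually defined as follows. This is a sketch in the notation commonly used for the quasi-likelihood under the independence model criterion; it is not taken from the slides. Here Q is the quasi-likelihood computed as if observations were independent, evaluated at the coefficients estimated under working correlation structure R. The number of parameters is p, the model-based covariance under independence is inverted to give Omega_I, and V_R is the robust covariance estimate.

```latex
\begin{aligned}
\mathrm{QIC}(R) &= -2\,Q\!\left(\hat{\beta}_R;\, I\right)
                   + 2\,\operatorname{tr}\!\left(\hat{\Omega}_I\, \hat{V}_R\right) \\[4pt]
\mathrm{QICC}   &= -2\,Q\!\left(\hat{\beta}_R;\, I\right) + 2\,p
\end{aligned}
```

The penalty term in QIC depends on the working correlation structure, which is why QIC is the one used to compare structures. The QICC penalty depends only on the number of parameters, which is why it suits comparing subsets of predictors.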
Test of Model Effects
Parameter Estimates
Estimated Marginal Means and
Pairwise Comparisons
Continuous Normal DV
• Example if time permits
More Information on GEEs
• Hardin, J. W., & Hilbe, J. M. (2003).
Generalized estimating equations. Boca
Raton, FL: Chapman and Hall/CRC Press.
• Norusis, M. (2011). IBM SPSS Statistics 19
Advanced Statistical Procedures
Companion. Upper Saddle River, NJ:
Pearson.
• http://faculty.chass.ncsu.edu/garson/PA765/gzlm_gee.htm
Estimated Marginal Means
EM Means