Generalized Estimating Equations (GEE): A Modern Love Story
April 18, 2011
DαSAL
Brandi Stupica

Data for today on the H: drive in the DaSAL folder
GEE Talk Data_041811.sav

• What are generalized estimating equations?
• Applications
• Why you should love GEEs

PART I. What are Generalized Estimating Equations (GEE)?
• Extension of the Generalized Linear Model (GZLM), which is an extension of the General Linear Model (GLM)
  – GLM analyzes models with normally distributed DVs that are linearly linked to predictors
  – GZLM extends GLM to analyze non-normally distributed DVs that may be non-linearly linked to predictors
    • Easily handles interactions between discrete and continuous IVs
    • Cannot analyze correlated, non-independent, clustered, nested, repeated measures, within-subjects data
  – GEE extends GZLM and analyzes correlated data with
    • Normal and non-normal DVs
    • DVs that are linearly or non-linearly linked to IVs
    • Full factorial models with any combo of discrete and continuous IVs

Application of GEE
• Nested data
  – Dyadic relationships
  – Family studies
  – School and organizational studies
• Repeated measures
  – Longitudinal data analysis
• Within-subjects designs
  – Pre/post designs

Why You Should Love GEEs for Correlated Data
• Compared to rANOVA
  – Doesn’t assume DV is normal or that it is linearly linked to predictors
    • Can model DVs that are binomial, multinomial, Poisson, negative binomial, and more!
  – Can model interactions between factors and covariates with ease
• Compared to Linear Mixed Models
  – Doesn’t require that repeated responses have a multivariate normal distribution
    • Unlikely to meet this assumption when DV is binary or count data
• Rather than combining multiple assessments, analyze with improved power by including a within-subjects factor
• Uses all available data as default rather than complete cases only
• Extraordinary flexibility can streamline results sections

PART II. Conducting a GEE

Conducting a GEE: First Step
• Arrange your data in “long form”
  A. How data usually look
  B. How data need to look for GEE

Getting from A to B: Restructuring Your Data

Restructured Data

Conducting a GEE Analysis

Selecting the Model Type
• Dozens of model combinations with GEE
  – DV can be discrete, follow any of several distributions, and be nonlinearly linked to IVs
• Must select distribution of DV and link function

Response Variable
• Also known as outcome variable, DV
• Category order is for multinomial DVs
• For binary outcomes, can specify reference category

Predictors
• Options for factors allow specification of reference category and how to handle missing data

Model
• Full factorial is a few clicks away

Estimation and Statistics

EM Means
• Several options for controlling for family-wise error
• Several options for contrasts, including
  – Simple
  – Pairwise
  – Deviation
  – Difference

Save, Export, and Cross Your Fingers

Results: Descriptive Information

Detour to Explain the Relevance of Goodness of Fit

Working Correlation Matrix
• What is a working correlation matrix?
  – Correlated data could be correlated many ways
  – Specify in the beginning the assumptions that should be made about how correlated data are correlated
  – “Working” comes from the structure being reestimated at each iteration
• GEE robust to misspecification
• Then, why bother picking the best one?
  – Small gain in efficiency by selecting correct underlying structure
• In the “Repeated” tab I picked Unstructured correlation matrix
  – Why?

Working Correlation Matrix Options
• Unstructured
  – No assumption about relative magnitude of the correlation between any two pairs of observations
  – Must estimate many parameters
  – Most efficient and conservative but can lead to poor estimates with small samples
• Independent
  – Assumes measurements for the repeated measure are uncorrelated
  – Default in SPSS
  – 1’s on the diagonal and 0’s off the diagonal
    – Signifies variables correlated with themselves at any given time but not correlated with measurements at other times
  – Illogical assumption and often wrong given that data are correlated and non-independent by nature!!!
  – Thus, I always start with something other than independent, and choose unstructured because it is the most conservative and efficient and makes no assumptions
• AR(1): Auto-regressive, order 1
  – Correlation diminishes exponentially over time
  – Assumes equal time intervals
  – 1’s on the diagonal; alpha for observations one apart; alpha-squared for two apart; alpha-cubed for three apart, and so on
• Exchangeable
  – Correlation does not change with time
  – Correlations for within-subjects variables homogeneous
  – 1’s on the diagonal and equal correlation for all off-diagonal elements
• M-dependent
  – Correlation depends only on how far apart two observations are, and drops to zero beyond M time points
  – 1’s on the diagonal, a common correlation for all responses the same number of time points apart (up to M), and 0 for observations separated by more than M time points
  – Researcher specifies M

Choosing the Best Fitting Working Correlation Matrix
• Run the model under different working correlation structure assumptions and choose the one with the lowest QIC value
• But, wait… What is the QICC?

Bonus! Choosing the Best Subset of Predictors
• QICC used for choosing best subset of predictors
• Penalizes for model complexity
• Run a model and a nested model dropping one of the predictors, then compare QICC values
• Lower QICC indicates better fit

Test of Model Effects

Parameter Estimates

Estimated Marginal Means and Pairwise Comparisons

Continuous Normal DV
• Example if time

More Information on GEEs
• Hardin, J. W., & Hilbe, J. M. (2003). Generalized estimating equations. Boca Raton, FL: Chapman and Hall/CRC Press.
• Norusis, M. (2011). IBM SPSS Statistics 19 Advanced Statistical Procedures Companion. Upper Saddle River, NJ: Pearson.
• http://faculty.chass.ncsu.edu/garson/PA765/gzlm_gee.htm
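
For readers working outside the SPSS menus, the "long form" restructuring step from Part II can be sketched in plain Python. This is only an illustration under assumptions: the column names (`id`, `group`, `score_t1` to `score_t3`), the values, and the helper `wide_to_long` are hypothetical and are not taken from the talk's data file.

```python
# Hypothetical wide-format data: one row per participant,
# one column per assessment (this is layout "A").
wide_rows = [
    {"id": 1, "group": "treatment", "score_t1": 10, "score_t2": 12, "score_t3": 15},
    {"id": 2, "group": "control", "score_t1": 9, "score_t2": None, "score_t3": 11},
]


def wide_to_long(rows, id_vars, value_columns, time_name="time", value_name="score"):
    """Return one record per participant per time point (layout "B").

    value_columns maps each wide column name to the time index it represents.
    """
    long_rows = []
    for row in rows:
        for column, time_index in value_columns.items():
            # Copy the identifying variables onto every long-form record.
            record = {key: row[key] for key in id_vars}
            record[time_name] = time_index
            record[value_name] = row[column]
            long_rows.append(record)
    return long_rows


long_rows = wide_to_long(
    wide_rows,
    id_vars=["id", "group"],
    value_columns={"score_t1": 1, "score_t2": 2, "score_t3": 3},
)

for record in long_rows:
    print(record)
```

In this sketch a missed assessment simply stays as its own row with an empty score, so the participant's other time points remain available to the analysis instead of the whole case being dropped.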
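
The working correlation structures described above are easier to compare when written out as matrices. The following is a minimal sketch in plain Python, assuming four equally spaced time points and an illustrative correlation of alpha = 0.5; the function names are made up for this example and do not correspond to SPSS options.

```python
def independent(n):
    """1's on the diagonal, 0's everywhere else."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def exchangeable(n, alpha):
    """1's on the diagonal, the same correlation alpha everywhere else."""
    return [[1.0 if i == j else alpha for j in range(n)] for i in range(n)]


def ar1(n, alpha):
    """Correlation alpha ** lag: alpha one apart, alpha squared two apart, and so on."""
    return [[alpha ** abs(i - j) for j in range(n)] for i in range(n)]


def unstructured_parameter_count(n):
    """Number of distinct off-diagonal correlations an unstructured matrix estimates."""
    return n * (n - 1) // 2


n_times = 4   # illustrative number of repeated measurements
alpha = 0.5   # illustrative correlation, not an estimate from real data

for name, matrix in [
    ("Independent", independent(n_times)),
    ("Exchangeable", exchangeable(n_times, alpha)),
    ("AR(1)", ar1(n_times, alpha)),
]:
    print(name)
    for row in matrix:
        print("  ", row)

print("Unstructured parameters to estimate:", unstructured_parameter_count(n_times))
```

Exchangeable and AR(1) each estimate a single correlation, while the unstructured matrix estimates one per pair of time points, which is why it can give poor estimates in small samples.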