Applied Structural Equation Modeling for Dummies, by Dummies
February 22, 2013
Indiana University, Bloomington
Joseph J. Sudano, Jr., PhD
Center for Health Care Research and Policy, Case Western Reserve University at The MetroHealth System
Adam T. Perzynski, PhD
Center for Health Care Research and Policy, Case Western Reserve University at The MetroHealth System

Acknowledgements
Thanks Joe.
Thanks to Bill Pridemore and all of you here at IU.
Thanks to Doug Gunzler.
Thanks to Kyle Kercher.

Rejected Titles for this Talk

Structural Equation Modeling for Fashion Week
We have lots of Models!

Structural Equation Modelin’ fer Pirates
SEM be a statistical technique for testin' and estimatin' causal relations usin' a combination o' statistical data and qualitative causal assumptions*
*From Wikipedia

Assumptions
I do not actually assume you are dummies
Feel free to assume what you want about me
I do not assume you will be experts in SEM after this presentation
I assume you know something about means and regression (hopefully)

Outline
Important SEM Resources
Measurement (and measurement error)
Examples
◦ Measurement Invariance
◦ Latent Class Analysis
◦ Latent Growth Mixture Modeling
Model Specification

SEM Resources
SEM Resources: Statmodel.com

Measurement Models
A special type of causal model
Survey items are assumed to have measurement error
◦ Each question has its own amount of error
Your answer to a survey question is causally related to a latent, unobserved variable.

Perfect Measurement
[Diagram: latent health → self-rated health, path labeled "1.0?"]

Causality and the Latent Concept of Health
In general, how would you describe your health?
We assume that every individual varies along an infinite continuum from best possible health to worst possible health.
When any given individual answers this question, they are approximating their position on this latent continuum.

Imperfect Measurement
[Diagram: latent health → self-rated health, with error term e4; labels on the slide: "1.0", "Variance > 0", "< 1.0"]

Measurement Models using Multiple Indicators
Single items are unreliable
Single cases prevent generalizability
Use multiple indicators and large samples to estimate the values of the latent, unobserved variables or factors
The SF36 uses multiple indicators describing multiple factors in order to measure health more reliably.

Acknowledgement: This study was funded by Grant number R01-AG022459 from the NIH National Institute on Aging.

Measuring Disparities: Bias in Self-reported Health Among Spanish-speaking Patients
J.J. Sudano (1,2), A.T. Perzynski (1,2), T.E. Love (2), S.A. Lewis (1), B. Ruo (3), D.W. Baker (3)
(1) The MetroHealth System, Cleveland, OH; (2) Case Western Reserve University School of Medicine, Cleveland, OH; (3) Northwestern University Feinberg School of Medicine

Measurement Model of the SF36
[Figure: measurement model of the SF36]

Objective & Significance
Do observed differences in SRH reflect true differences in health?
◦ Cultural and language differences may create measurement bias
◦ If outcomes aren’t measuring the same thing in different groups, we have a problem

Measurement Equivalence & Factorial Invariance
It is only possible to properly interpret group differences after measurement equivalence has been established (Horn & McArdle, 1992; Steenkamp & Baumgartner, 1998).
“It may be the case that the groups differ … but it also may be the case that extraneous influences are giving rise to the observed difference.” Meredith & Teresi (2006, p. S69)
The external validity of any conclusion regarding group differences rests securely on whether the measurement equivalence of the scale has been established (Borsboom, 2006).

Cross-sectional Study
N = 1281
Medical patients categorized into four groups: White, Black, English-speaking Hispanic, and Spanish-speaking Hispanic.
Multigroup Confirmatory Factor Analysis (MGCFA)

Two Types of Invariance
Metric (Weak) Invariance
◦ Are the item factor loadings equivalent across groups?
◦ Is a one unit change in the item equal to a one unit change in the factor score for all groups?
Scalar (Strong) Invariance
◦ Are the item intercepts equivalent across groups?
◦ Unequal intercepts result in unequal scaling of factor scores

Weak invariance
[Diagram: latent health → self-rated health]
What happens to the model fit when we constrain all of these paths (loadings) to be equal across groups?
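To make the role of item intercepts concrete, here is a minimal Python sketch. It assumes the usual linear measurement model for a single item (item score = intercept + loading × latent health + error). Every number in it (intercepts, loading, error SD, sample size) is invented for illustration and is not taken from the SF36 study. Both simulated groups have the same distribution of latent health and the same loading; only the item intercept differs.

```python
import random
import statistics


def simulate_item(n, intercept, loading, rng, latent_mean=0.0, error_sd=0.6):
    """Simulate n item scores under: score = intercept + loading * eta + error."""
    scores = []
    for _ in range(n):
        eta = rng.gauss(latent_mean, 1.0)      # latent health (unobserved)
        error = rng.gauss(0.0, error_sd)       # item-specific measurement error
        scores.append(intercept + loading * eta + error)
    return scores


rng = random.Random(2013)  # fixed seed so the sketch is reproducible

# Hypothetical groups: identical latent health and loading, different intercepts.
group_a = simulate_item(20000, intercept=3.0, loading=0.8, rng=rng)
group_b = simulate_item(20000, intercept=2.5, loading=0.8, rng=rng)

observed_gap = statistics.mean(group_a) - statistics.mean(group_b)
print(f"Observed mean difference: {observed_gap:.2f}")
```

The observed gap lands near the 0.5 intercept difference even though latent health is identical in the two groups. That is the kind of bias a test of scalar invariance is meant to detect.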
Table 1: Goodness of Fit for SF36 Multigroup Factorial Invariance Testing (N = 1281)

| Model | Description | RMSEA (95% CI) | CFI | B-S χ2* | df | Ref | ∆RMSEA | ∆CFI | B-S ∆χ2 | ∆df |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Unconstrained Model | 0.028 (.017 - .030) | 0.936 | 3001 | 2172 | | | | | |
| 2 | Metric Invariance (Factor Weights) | 0.029 (.028 - .030) | 0.931 | 3110 | 2253 | 1 | 0.001 | -0.005 | 109 | 81 |
| 3 | Scalar Invariance (Intercepts) | 0.033 (.032 - .034) | 0.907 | 3215 | 2358 | 2 | 0.004 | -0.024 | 105 | 105 |
| 4 | Partial Scalar Invariance (B=W=HS not HE) | 0.033 (.032 - .034) | 0.909 | 3179 | 2323 | 2 | 0.004 | -0.022 | 69 | 70 |
| 5 | Partial Scalar Invariance (B=W=HE not HS) | 0.030 (.029 - .032) | 0.921 | 3180 | 2323 | 2 | 0.001 | -0.010 | 70 | 70 |
| 6 | 2nd Order Structural Invariance** | 0.030 (.029 - .032) | 0.921 | 3187 | 2333 | 2 | 0.001 | -0.010 | 77 | 80 |
| 7 | 2nd & 3rd Order Structural Invariance** | 0.030 (.029 - .032) | 0.921 | 3196 | 2339 | 2 | 0.001 | -0.010 | 86 | 86 |

* The bootstrapped Bollen-Stine χ2 value is reported because of significant (p<.01) multivariate non-normality.
** Structural factor weights are constrained equal for Blacks, Whites and Hispanic English (Hispanic Spanish are unconstrained).

Reading Table 1, step by step:
The Unconstrained Model fits the data well (Model 1).
The model with factor loadings constrained still fits the data well (Model 2).
Metric (Weak) Invariance was Confirmed

I forget what an intercept is
Scalar (Strong) Invariance
◦ Are the item intercepts equivalent across groups?
Intercept: the intercept in a multiple regression model is the mean of the response when all of the explanatory variables take on the value 0. Could be called the “starting point.”

Constraining the intercepts results in a worsening of model fit (Model 3).
The fit is still poor if you allow intercepts for English-speaking Hispanics to vary (Model 4).
The fit is acceptable if you allow intercepts for Spanish-speaking Hispanics to vary (Model 5).
Scalar (Strong) Invariance is NOT Confirmed

Measurement equivalence of the SF36 does not exist for Spanish-speaking Hispanics
Intercepts are lower for Spanish-speaking Hispanics on nearly all items

Use of English Rating Categories on Twitter
[Figure: use of English rating categories on Twitter]

Use of Spanish Rating Categories on Twitter
[Figure: use of Spanish rating categories on Twitter]

Everywhere and Nowhere: Latent Class Analysis of Knowledge of the Spread of Hepatitis C
Adam T. Perzynski, PhD
E-mail: Adam.Perzynski@case.edu

Introduction
Hepatitis C is a widespread and serious disease that affects the liver.
170 million people worldwide are infected.
3.9 million Americans are infected with HCV (AHRQ 2003).
More Americans die every year from chronic HCV infection than from HIV.

HCV Transmission
Blood
◦ Injection Drug Use
◦ Blood Transfusions
◦ Needle Sticks
◦ Shared Household Items (Razor or Toothbrush)
Sexual transmission of HCV is recognized but is infrequent.
HCV is not transmitted by Coughing, Kissing, Sneezing, Touching, Bathrooms, Fecal Matter, or Contaminated Food

Sample
Behavioral Risk Factor Surveillance System (BRFSS), 2001, Arizona
Conducted by the Centers for Disease Control and Prevention (CDC)
The world’s largest telephone survey
Nearly 200,000 people participated in 2001

Measure
Do you think hepatitis C can be spread through:
◦ Sneezing or Coughing
◦ Kissing
◦ Unprotected Sex
◦ Food or Water
◦ Sharing Needles to Inject Street Drugs
◦ Using the Same Bathroom
◦ Contact with the Blood of an Infected Person

Methods of Analysis
Analyzed with Mplus
Analysis proceeded in several stages
1. Exploratory Factor Analysis
2. Confirmatory Factor Analysis
3. Cluster Analysis (Not reported)
4. Latent Class Analysis
5. Mixture Modeling
Robust estimation for binary indicators
Missing values handled with Full Information Maximum Likelihood (FIML) estimation

Distribution of Outcome Variables
[Figure: distribution of outcome variables]

Means and Standard Deviations
[Table not recovered from the slide]

Exploratory Factor Analysis
Scree plot, eigenvalues, and root mean square residuals more or less supported a two-factor solution

Figure 1: Measurement Model #1, Confirmatory Factor Analysis with Two Latent Continuous Variables. N = 3902
[Path diagram: two latent factors, "HCV is Transmitted" and "HCV is not Transmitted"; indicators Unprotected Sex, Sharing Needles, Blood Contact, Sneezing/Coughing, Food or Water, Bathrooms, Kissing; path labels shown on the slide: -.79, .86, .98, .83, .79, .83, .92, .96]

What is different about LCA?
Instead of assuming that the latent variable is continuous (infinitely poor to infinitely good), we assume the latent variable is categorical.
Membership in “hidden” empirical forms determines answers rather than a single latent continuum.

Figure 2: Measurement Model #2, Latent Class Analysis with a Categorical Latent Variable. N = 3902
[Diagram: categorical latent variable "HCV Transmission Awareness" with indicators Kissing, Sneezing/Coughing, Unprotected Sex, Food or Water, Sharing Needles, Bathrooms, Blood Contact; additional label on the slide: "Don’t Know"]
Estimation: Maximum Likelihood Robust
Fit Measures
◦ Likelihood Ratio Chi-Square = 269.556, DF = 104, P-Value = 0.000
◦ AIC = 17338.312
◦ BIC = 17477.153
◦ Adjusted BIC = 17404.073
◦ Entropy = .888

Three Latent Classes
The Two Category and Four Category models do not fit the data as well as the Three Category model.
HCV is Nowhere
◦ N = 1683 (The largest class!)
Full Awareness of how HCV is Spread
◦ N = 930
HCV is Everywhere
◦ N = 479

Figure 3: Estimated Probabilities of Knowing How HCV is Spread by Class Membership
[Figure: estimated probabilities by class]

Additional Analyses
What predicts membership in each latent class?
Do the relationships between variables vary inside of a particular class?
Mixture Modeling
◦ Simultaneously test continuous and categorical predictors of class membership.

Figure 4: Causal Model with a Latent Categorical Variable of HCV Transmission Awareness*
[Diagram: predictor boxes and the latent class variable]
Socio-demographics: Age, Race/Ethnicity, Gender, Education, Income, Employment Status
HCV Module: Know Someone w/ HCV, HCV Test, HCV Diagnosis, Received Blood Transfusion, IDU, Self-Perceived HCV Risk
Health and Health Care: Self-rated Health, Health Care Access
Latent variable: HCV Transmission Awareness (indicators: Kissing, Sneezing/Coughing, Unprotected Sex, Food or Water, Sharing Needles, Bathrooms, Blood Contact)
* Bivariate correlations are calculated for all variables inside of boxed conceptual categories

Longitudinal Patterns of Depressive Symptoms in the Health and Retirement Study
Adam T. Perzynski, PhD & Joseph J. Sudano, Jr., PhD
Center for Health Care Research and Policy, Case Western Reserve University and MetroHealth
Presentation at the Annual Meeting of the Gerontological Society of America on November 22, 2010

Introduction
This is another measurement study
Explore the use of Latent Class Growth Analysis to model changes in depressive symptoms over time in the Health and Retirement Study.
Most studies compare the change in mean scores between two waves.
A small number of studies have modeled change as a single growth trajectory

Change in Means Between Waves
Often we simply calculate the mean depressive symptoms at Wave 1 (baseline) and subtract it from the mean at Wave 2 (follow-up).

What is a trajectory?
Regrettably, the term “trajectory” has taken on multiple meanings across disciplines and research studies.
A broad, inclusive definition of trajectory modeling is the analysis of patterns of change or stability.
Confusion is possible between aggregate trajectories, which summarize an overall average pattern of change for a population, and disaggregated trajectories, which examine multiple potential trajectories of different shapes (George 2006).

Example of a single growth trajectory
Continuous Latent Growth Curve Analysis (LGA / LGCA)
Studies in older adults (e.g., George and Lynch 2003) typically find that the slope of the latent growth curve for depressive symptoms is small and positive, and that the slope of the curve is steepest in the oldest cohorts.

Example of an LGA finding
[Figure: example from George and Lynch (2003)]

LGA estimates a single Aggregate trajectory
Assumes that the average population starting point (intercept for the growth curve) and average amount of change (slope) are a sufficient depiction of variation over time in depressive symptoms.
If discrete subtypes of depressive symptom trajectories exist but are ignored (as in single latent growth curve and autoregressive models), the magnitude of associations could be grossly misestimated.

What is Latent Class Growth Analysis?
Latent Class Growth Analysis (LCGA), also referred to as growth mixture modeling, belongs to a family of statistical techniques referred to as general latent variable modeling or GLVM.

Why would we ever think we should use LCGA?
Studying the mean change or using a single trajectory for everyone assumes uniform heterogeneity in the population.
Researchers use familiar methods and typically assume that the underlying (latent or real) distribution of variables is continuous.
We have theoretical reasons to suspect that underlying distributions could be categorical.
Life course theorists (Dannefer) specifically caution that intracohort differentiation is unlikely to be homogeneous.
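A toy Python sketch of why a single aggregate trajectory can hide discrete subtypes. The three classes below, with their shares, intercepts and slopes, are entirely hypothetical and chosen only to make the arithmetic easy to follow; they are not estimates from any study. The aggregate line is just the share-weighted average of the class lines.

```python
# Hypothetical trajectory classes (invented numbers, for illustration only).
classes = {
    "stable_low": {"share": 0.75, "intercept": 0.5, "slope": 0.0},
    "increasing": {"share": 0.15, "intercept": 1.0, "slope": 0.6},
    "decreasing": {"share": 0.10, "intercept": 4.0, "slope": -0.5},
}


def class_mean(cls, wave):
    """Expected symptom count for one class at a given wave (linear trajectory)."""
    return cls["intercept"] + cls["slope"] * wave


def aggregate_mean(wave):
    """Population mean at a wave: share-weighted average over the classes."""
    return sum(c["share"] * class_mean(c, wave) for c in classes.values())


# The slope of the aggregate line is the share-weighted average of class slopes.
aggregate_slope = sum(c["share"] * c["slope"] for c in classes.values())

for wave in range(6):
    per_class = ", ".join(
        f"{name}={class_mean(c, wave):.1f}" for name, c in classes.items()
    )
    print(f"wave {wave}: aggregate={aggregate_mean(wave):.2f} ({per_class})")

print(f"aggregate slope per wave: {aggregate_slope:.2f}")
```

In this made-up population the aggregate slope is small and positive, while one class rises steeply and another falls. A single growth curve would report only the first fact.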
Why would we use LCGA?
We think individuals and cohorts diverge over time
Cumulative change differentiates individuals and cohorts.

Prior LCGA Models of Depression or Depressive Symptoms
LCGA models and the closely related Longitudinal Latent Class Analysis (LLCA) have been used to estimate models of depressive symptoms in prior studies of
◦ maternity (Campbell et al 2009; Mora et al 2009)
◦ childhood and adolescence (Meadows et al 2006)
◦ adolescence through young adulthood (Olino et al 2009)
◦ response to antidepressants among adults (Muthen et al, 2007; Hunter et al 2009)
◦ patients who have had a cardiovascular event (Kaptein et al 2006).

Methods
5,195 age-eligible respondents from the 1992 Health and Retirement Study cohort who completed interviews in all seven waves through 2004.
Depressive symptoms in HRS are measured using a dichotomous, 8-item version of the CES-D.
Analysis begins with Wave 2 data due to a change in response categories from Wave 1.
Using Mplus, we compared the fit of LCGA models of two to eight classes while also accounting for the HRS complex sampling design.
We then tested the effect of a small number of covariates. This is very similar to a multinomial logistic regression.

Demographic characteristics
Gender
◦ 60.3% female
Race/ethnicity
◦ 76.4% non-Hispanic White
◦ 14.4% Black
◦ 7.4% Hispanic
◦ 1.8% other racial/ethnic groups
Age
◦ Median = 55
Education
◦ Mean = 12.4 years (SD = 3.0)

Rule for Determining the Number of Latent Classes
“How many trajectories are there?”
Measures of model fit including:
◦ Lo-Mendell-Rubin Test (LMR test)
◦ log-likelihood (LL)
◦ Bayesian Information Criterion (BIC)
(Vuong, 1989; Muthen, 2004; Muthen & Muthen, 2005; Nylund et al, 2007).

Here we will use the LMR Test
Where k is the number of latent classes, this test gives a p-value for the k-1 versus the k-class model when running the k-class model (Vuong, 1989; Muthen, B. 2005).
The first time p > .05, k-1 is the preferred number of classes.
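The LMR decision rule can be written as a few lines of Python. This is only a sketch of the rule as stated on the slide, not of how Mplus computes the test; the function name is made up for illustration. The p-values are copied from the LMR p column of Table 1 in the Results.

```python
def preferred_number_of_classes(lmr_p_values, alpha=0.05):
    """Apply the slide's rule: at the first k with p > alpha, prefer k - 1 classes.

    lmr_p_values maps k (classes in the fitted model) to the LMR p-value
    for the k-1 versus k-class comparison. Returns None if every p <= alpha.
    """
    for k in sorted(lmr_p_values):
        if lmr_p_values[k] > alpha:
            return k - 1
    return None


# LMR p-values as reported in Table 1 (Depressive Symptoms LCGA Model Fit).
lmr_p = {2: 0.000, 3: 0.000, 4: 0.015, 5: 0.149, 6: 0.354, 7: 0.392, 8: 0.314}

print(preferred_number_of_classes(lmr_p))
```

With these values the first non-significant comparison is at k = 5, so the rule selects four classes, which matches the four-class solution shown in the Results.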
Results
How many classes are there?
What do the classes look like?
How is this different from looking at means or a single trajectory?
Are any demographic variables associated with being in a particular class?

How many Classes are there?
Table 1. Depressive Symptoms LCGA Model Fit Comparison, N = 5,195

| K | LL | BIC | Adjusted BIC | LMR Test | LMR p | Entropy |
|---|---|---|---|---|---|---|
| 2 | -56367.49 | 112983.09 | 112890.94 | 10525.69 | 0.000 | 0.955 |
| 3 | -55146.65 | 110618.41 | 110497.66 | 2410.38 | 0.000 | 0.922 |
| 4 | -54652.99 | 109708.09 | 109558.74 | 974.66 | 0.015 | 0.925 |
| 5 | -54357.08 | 109193.27 | 109015.32 | 519.88 | 0.149 | 0.901 |
| 6 | -54090.08 | 108736.27 | 108529.72 | 397.39 | 0.354 | 0.912 |
| 7 | -54079.98 | 108793.06 | 108557.91 | 97.55 | 0.392 | 0.920 |
| 8 | -53895.87 | 108501.85 | 108238.10 | 307.84 | 0.314 | 0.732 |

What do the classes look like?
Figure 1: Four Latent Classes of Depressive Symptoms over 12 Years of the HRS (N = 5195)
[Line chart: mean number of depressive symptoms (0 to 6) by HRS study wave (1994 to 2004), one line per class]
◦ Many Persistent Symptoms = 5.4%
◦ Decreasing Symptoms = 9.6%
◦ Increasing Symptoms = 11.5%
◦ Almost No Symptoms = 73.5%

How is this different from looking at Means or a Single Trajectory?
Online at: http://spreadsheets.google.com/pub?key=0ApRkae54BRnudEYyUGdXZWlES3Z4VzZ6akNaOFFiekE&gid=5

Does anything influence the chances of being in a particular class?
Figure 2. Relationship between Years of Education and Depressive Symptoms Trajectory/Latent Class Membership (N = 5195)
[Line chart: latent class probability (0.0 to 1.0) by years of education (1 to 18), one line each for Many Symptoms, Decreasing Symptoms, Increasing Symptoms, and Almost No Symptoms]

Does anything influence the chances of being in a particular class?
Table. Effects of Demographics on the Likelihood of a Depressive Symptoms Trajectory, vs. Almost No Symptoms (reference category), N = 5195

| Predictor | Many Symptoms OR | b | p | Decreasing OR | b | p | Increasing OR | b | p |
|---|---|---|---|---|---|---|---|---|---|
| Age | 0.94 | -0.057 | 0.010 | 0.96 | -0.043 | 0.014 | 1.02 | 0.016 | 0.422 |
| Female | 2.19 | 0.785 | 0.000 | 1.53 | 0.428 | 0.001 | 1.41 | 0.346 | 0.002 |
| Black | 1.89 | 0.635 | 0.000 | 1.90 | 0.641 | 0.000 | 1.54 | 0.429 | 0.001 |
| Hispanic | 1.12 | 0.113 | 0.655 | 1.59 | 0.461 | 0.018 | 1.19 | 0.178 | 0.461 |
| Low Education | 1.32 | 0.274 | 0.000 | 0.84 | -0.173 | 0.000 | 0.90 | -0.105 | 0.000 |

• Females, African Americans and those with fewer years of education have a higher probability of being in the Many Symptoms trajectory.

Model Specification
Choosing the model that best represents the data structure and addresses the research questions of interest can be a daunting task.
Brief overview of model specification tests and procedures.

Model Specification
“First, your return to shore was not part of our negotiations nor our agreement so I must do nothing. And secondly, you must be a pirate for the pirate's code to apply and you're not. And thirdly, the code is more what you'd call ‘guidelines’ than actual rules.”
Captain Barbossa, from Pirates of the Caribbean: The Curse of the Black Pearl (2003)

Model Specification
In model specification a researcher can use:
◦ logic, theory and prior empirical evidence to choose the initial model
◦ model comparison testing to compare the initial model to competing models
◦ a combination of theory, prior evidence, and the results of the model comparison testing to decide upon which model or models are appropriate for a given study

Nested or Not Nested?

Chi Square Test
The chi-square statistic is computed and used to test whether the model fits the data well.
It is the basis for most other fit tests.
Along with other fit tests, we use it to evaluate whether to include or exclude model paths relating measures to each other for a given study.

Chi Square Test
Also called the discrepancy function
If not significant, the model is regarded as acceptable.*

Chi Square Test*
Some limitations are:
◦ Complex models with many parameters
◦ With large samples, models will most often be rejected, sometimes unfairly
◦ Where multivariate non-normality is present, the chi-square fit index is inaccurate. Modified tests (the Satorra-Bentler scaled chi-square) are available.

Modification Indices
Modification indices can be calculated individually for every path that is fixed to zero, by estimating a chi-square test statistic with one df.
The higher the value of the modification index for a causal path, the greater the predicted improvement in overall model fit if that path were added to the model.
Jöreskog suggested that a modification index should be at least five before the researcher considers adding the causal path and modifying the hypothesized model.

R-squared
In linear regression analysis, we interpret the r2 value as the amount of variation in the response that can be explained by the regressors in the model.
In SEM, it is pretty much the same*
*Not exactly, but that is beyond the “for dummies” version of this talk

AIC, BIC (and BCC)
Bayesian Information Criterion (BIC)
Akaike Information Criterion (AIC)
Based on the chi-squared test statistic
The models under comparison can be nested or non-nested. In both these tests, as with all tests in this section, a truly direct comparison requires that the same observed measures are used in the models being compared.
Both BIC and AIC feature the goodness-of-fit term for our model, $\chi^2_M$, derived directly from the discrepancy function when applicable, along with a penalty term.

AIC, BIC (and BCC)
Cannot identify if a model has good fit, only if one model fits better than another.
The lower the value of BIC, AIC and BCC, the better the fit.
BCC penalizes for model complexity more than AIC but less than BIC.
BIC penalizes for model complexity more than AIC.

Specification Search
Allows researchers to choose a model from among a number of candidates.
Exploratory
Should be guided by theory

Specification Search
Given the model in Figure 1, with 7 unknown paths, the number of models is equivalent to 2^7 = 128 possible specifications of the model.
128 different possible models!

Specification Search
The unconstrained model (the one with all seven ambiguous paths in the model) demonstrates satisfactory overall model fit
◦ CFI = .95
◦ TLI = .92
◦ RMSEA = .07
◦ chisq = 476.33, DF = 63

Table. Results of Specification Search in AMOS

| Model | Name | Params | df | C | C - df | AIC | BCC | BIC | C / df | p | R2 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| A | Unconstrained | 56 | 63 | 476.336 | 413.336 | 588.336 | 589.664 | 877.039 | 7.561 | 0 | .11 |
| B | No DIF path to PF1 | 55 | 64 | 476.357 | 412.357 | 586.357 | 587.661 | 869.904 | 7.443 | 0 | .11 |
| C | No DIF path to PF1, PF2 | 54 | 65 | 476.396 | 411.396 | 584.396 | 585.677 | 862.787 | 7.329 | 0 | .11 |
| D | No DIF paths to PF1, PF2, or PF3 | 53 | 66 | 476.857 | 410.857 | 582.857 | 584.114 | 856.093 | 7.225 | 0 | .11 |
| E | No DIF paths, No educ to SS | 52 | 67 | 478.536 | 411.536 | 582.536 | 583.769 | 850.616 | 7.142 | 0 | .11 |
| F | No DIF paths, No educ to SS, SS to PP | 51 | 68 | 482.272 | 414.272 | 584.272 | 585.481 | 847.197 | 7.092 | 0 | .10 |
| G | No DIF paths, No educ to SS, SS to PP/PF | 50 | 69 | 495.104 | 426.104 | 595.104 | 596.289 | 852.873 | 7.175 | 0 | .09 |
| H | Fully Constrained (No DIF or SS paths) | 49 | 70 | 540.892 | 470.892 | 638.892 | 640.054 | 891.506 | 7.727 | 0 | .09 |

Notes: Reported R2 values are for the equation in each model with the endogenous PF latent variable, with the interpretation of total explained variance in physical functioning given all other paths in the model. C is the chi-squared test statistic and df are the associated degrees of freedom.

Specification Search
When a number of models are plausible, specification tests can be used as evidence for verification of or improvement over an initial model.

‘Guidelines’
A researcher is ultimately left to decide if the results of the specification tests are unjustly in favor of a certain model due to complexity or sample size, rather than the meaning behind the causal paths.
Thus the specification tests act more like guidelines than strict codes dictating the “best” fitting model.

Selected Strengths & Limitations of SEM
Strengths
◦ Very flexible
◦ Estimate and correct for measurement error
Limitations
◦ Large sample sizes
◦ Challenging to learn
◦ Need lots of hands-on experience to learn
◦ Need a strong theoretical basis
It’s easy to mis-specify a model if you have no idea what you are doing.

Applied Structural Equation Modeling for Everyone!
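Backup sketch: checking the information criteria in the specification search table (models A through H). The numbers below are copied from that table. The AIC column turns out to be consistent with C + 2 × Params for every row. That is an observation about the reported values, not a claim about how AMOS computes AIC internally, and the helper name is made up for illustration.

```python
# (Params, C, AIC, BIC) for each model, copied from the specification search table.
models = {
    "A": (56, 476.336, 588.336, 877.039),
    "B": (55, 476.357, 586.357, 869.904),
    "C": (54, 476.396, 584.396, 862.787),
    "D": (53, 476.857, 582.857, 856.093),
    "E": (52, 478.536, 582.536, 850.616),
    "F": (51, 482.272, 584.272, 847.197),
    "G": (50, 495.104, 595.104, 852.873),
    "H": (49, 540.892, 638.892, 891.506),
}


def aic_from_chi_square(chi_square, n_params):
    """AIC written as the chi-square statistic plus 2 per estimated parameter."""
    return chi_square + 2 * n_params


# Every reported AIC should match C + 2 * Params up to rounding error.
for name, (params, c, aic, bic) in models.items():
    assert abs(aic_from_chi_square(c, params) - aic) < 1e-6, name

# Lower is better for both criteria; they need not pick the same model.
best_by_aic = min(models, key=lambda m: models[m][2])
best_by_bic = min(models, key=lambda m: models[m][3])
print("Lowest AIC:", best_by_aic)
print("Lowest BIC:", best_by_bic)
```

On these numbers AIC is lowest for model E and BIC is lowest for the more constrained model F. That fits the earlier point that BIC penalizes complexity more heavily than AIC, and that these criteria are guidelines rather than a verdict.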