This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this site. Copyright 2007, The Johns Hopkins University and Qian-Li Xue. All rights reserved. Use of these materials permitted only in accordance with license rights granted. Materials provided “AS IS”; no representations or warranties provided. User assumes all responsibility for use, and all liability related thereto, and must independently review all materials for accuracy and efficacy. May contain materials owned by others. User is responsible for obtaining permissions for use from third parties as needed. Advanced Structural Equations Models I Statistics for Psychosocial Research II: Structural Models Qian-Li Xue Test of causal hypotheses? No Yes Ordinary Regression SEM (Origin: Path Models) Yes Continuous endogenous var. and Continuous LV? Yes Classic SEM Yes Categorical indicators and Categorical LV? Latent Class Reg. Longitudinal Data? No Multilevel Data ? Adv. SEM II: Multilevel Models No Latent Trait Latent Profile No Yes Adv. SEM I: latent growth curves) No Classic SEM Outline 1. Estimating means of observed and latent variables 2. Modeling repeated measures of outcome over time The Simplex-Growth Over Time 3. Non-Recursive Models 4. Modeling repeated measures of outcome and covariate over time Cross-Lag Panel Analysis Latent Growth Curve Models (Next Lecture) 1. Estimating Means of Observed and Latent Variables Estimating Means of Observed and Latent Variables So far, we have largely ignored intercept terms in our analyses What has happened to the alpha coefficient? Estimating Means of Observed and Latent Variables Up to now, information on means and intercepts has not been of interest It is possible to estimate levels of association without information on these parameters If of interest, these parameters can be estimated using a “mean model.” In addition to covariances, these models also require information on mean of variables These parameters are of key interest in group comparisons and growth curve models Estimating Means of Observed and Latent Variables Does the mean score on the latent variable ξ (e.g. depression) differ between men and women? Man Women 1 e d ξ11 0.6 0.8 X11 .64 4.0 a b c 0.7 X12 .36 5.0 X13 .51 6.0 a b ξ12 c 0.6 0.8 X21 Y22 .64 4.3 .36 5.4 0.7 X23 .51 6.35 Resid. Var. Means (Loehlin p.139) Estimating Means of Observed and Latent Variables Man Women 1 e d ξ11 0.6 0.8 X11 .64 4.0 a b c 0.7 X12 .36 5.0 X13 .51 6.0 a b d=0 (reference) a=4.0,b=5.0,c=6.0 (baseline values, same across groups) e – difference between the means of the latent variable e*0.6+a=4.3 ⇒ e=0.5 ξ12 c 0.6 0.8 X21 Y22 .64 4.3 .36 5.4 0.7 X23 .51 6.35 Resid. Var. Means (Loehlin p.139) Example: Stress, Resources, and Depression (Holahan & Moos, 1991) “How do the high-stressor and the low-stressor groups compare on the two latent variables: depression (D) and resources (R) r 1 g f D High-Stressor R i h l a b DM DF m n j c d k SC o e EG FS p q Example: Stress, Resources, and Depression (Holahan & Moos, 1991) High-stressor group: above diagonal (underlined) Low-stressor group: below diagonal DM DF SC EG FS SD M Depressed Mood 1 .84 -.36 -.45 -.51 5.97 8.82 Depressive Features .71 1 -.32 -.41 -.50 7.98 13.87 Self-confidence -.35 -.16 1 .26 .47 3.97 15.24 Easygoingness -.35 -.21 .11 1 .34 2.27 7.92 Family support -.38 -.26 .30 .28 1 4.91 19.03 Standard Deviation 4.84 6.33 3.84 2.14 4.43 N 128 Mean 6.15 9.96 15.14 8.80 20.43 126 Example: Stress, Resources, and Depression (Holahan & Moos, 1991) MPLUS code Low-Stressor r 1 g f D R i h l a b DM DF m n j c d k SC o e EG FS p q TITLE: Stress, resources, and depression (Loehlin, p.142) DATA: FILE is c:/teaching/140.658.2007/depression.dat; TYPE IS CORRELATION MEANS STDEVIATIONS; NOBSERVATIONS ARE 126 128; NGROUPS=2; VARIABLE: NAMES ARE DM DF SC EG FS; USEVARIABLES ARE DM-FS; MODEL: D BY DM* DF; R BY SC* EG FS; DM (1); Equate the measurement models DF (2); SC (3); across the groups EG (4); FS (5); MODEL g1: [D@0 R@0]; Set reference group (i.e. lowD@1 R@1; stressor) OUTPUT: TECH1; Example: Stress, Resources, and Depression (Holahan & Moos, 1991) Measurement Model Latent Variables LowStressor HighStressor Path Coeff. Residual Var. Baseline means Depression: Mean f [0]* a 4.42 m 2.91 h 6.09 Resources: Mean g [0] b 5.22 n 16.04 i 10.27 Depression: SD [1] c 1.56 o 11.76 j 15.59 Resources: SD [1] d 1.01 p 3.61 k 8.61 e 2.67 q 12.25 l 20.40 correlation r -0.72 Depression: Mean f 0.63 Resources: Mean g -0.50 Depression: SD 1.30 Resources: SD 1.29 correlation r -0.78 * Numbers in [ ] are prefixed in order to make the model identified Same as above Example: Stress, Resources, and Depression (Holahan & Moos, 1991) TESTS OF MODEL FIT Chi-Square Test of Model Fit Value Degrees of Freedom P-Value CFI/TLI CFI TLI 27.245 19 0.0991 0.979 0.978 RMSEA (Root Mean Square Error Of Approximation) Estimate 0.058 90 Percent C.I. 0.000 SRMR (Standardized Root Mean Square Residual) Value 0.055 The model fits reasonably well to the data! 0.104 2. Modeling Repeated Measures of Outcome Over Time The Simplex-Growth Over Time Modeling growth over (e.g. height) Measurements taken repeatedly over time In general, measurements made closer together in time would be more highly correlated (called “simplex” by Guttman, 1954) Smaller E.g. Correlation 1 2 3 4 1 1 0.73 0.72 0.68 1 0.79 0.76 1 0.84 2 3 4 1 The Simplex-Growth Over Time Example: Scores on standardized tests of academic achievement at grades 1-7 (Bracht & Hopkins, 1972) Test score (Y) is a measure of the latent academic achievement (η) Achievement at grade t is a function of achievement at t-1 via β, and other factors ζ ζ2 η1 1 β21 ζ3 η2 1 β32 ζ4 η3 β43 ζ5 η4 1 β54 ζ6 η5 1 1 β65 ζ7 η6 β76 η7 1 1 Y1 Y2 Y3 Y4 Y5 Y6 Y7 ε1 ε2 ε3 ε4 ε5 ε6 ε7 Loehlin p.125 The Simplex-Growth Over Time ζ2 η1 1 β21 ζ3 η2 1 β32 ζ4 η3 β43 ζ5 η4 1 β54 ζ6 η5 1 1 β65 ζ7 η6 1 β76 η7 1 Y1 Y2 Y3 Y4 Y5 Y6 Y7 ε1 ε2 ε3 ε4 ε5 ε6 ε7 Yi = ηi + ε i ηi = β iηi −1 + ς i εi are uncorrelated, εi⊥ηi, and ζi⊥ηi-1 The Simplex-Growth Over Time Var(η1), Var(ζ7), Var(ε1), Var(ε2), β21 are unidentified To achieve identification, set Var(ε1)=Var(ε2) AND Var(ε6)=Var(ε7), reasonable if Ys are on the same scale # free parameters = 3p-3, where p=# of Ys For testing a simplex model, p>3 !!! ζ2 η1 1 β21 ζ3 η2 1 β32 ζ4 η3 β43 ζ5 η4 1 β54 ζ6 η5 1 1 β65 ζ7 η6 1 β76 η7 1 Y1 Y2 Y3 Y4 Y5 Y6 Y7 ε1 ε2 ε3 ε4 ε5 ε6 ε7 3. Non-Recursive Models Non-Recursive Models So far, there has been little discussion of models with feedback loops Non-recursive models deal with reciprocal causal relationships Can not be analyzed by ordinary regression analysis due to correlated errors Non-recursive models may not be identified even if the T-rule is met Non-Recursive Models Time 1 Time 2 A A A B B B Reciprocal Lagged What do you mean by “reciprocal causation”? Alternative: Lagged model Assumption: the principal of “finite causal lag” Roles of the variables in the bidirectional relationship change over time (e.g. A is a cause at Time 1, but effect at Time 2) The reciprocal causation model becomes the only choice if only cross-sectional data are available Non-Recursive Models: Model Identification Recall: recursive path models without measurement error are always identified Not true for non-recursive models Definition: Instrumental variable – a predictor is an instrument for an endogenous variable if it has a direct path to other endogenous variables but not the endogenous variable of interest X1 Y1 X3 is an instrument for Y1 X2 X3 Y2 Maruyama, 1998; p.106 Non-Recursive Models: Model Identification Order condition (necessary but not sufficient) – For any system of N endogenous variables, a particular equation is identified only if at least N-1 variables are left out of that equation Rank condition (necessary AND sufficient) – is met for a particular equation if there is at least one non-zero determinant of rank N-1 from the coefficients of the variables omitted from that equation X1 Y1 X2 X3 Y2 Maruyama, 1998; p.106 4. Modeling Repeated Measures of Outcome and Covariate Over Time Cross-Lagged Panel Analysis: Terminology Time 1 Time 2 eX1 X1 eX2 Synchronous correlations: Corr(X1,Y1) and Corr(X2,Y2) Autocorrelations (i.e. stability): Corr(X1,X2) and Corr(Y1,Y2) X2 Cross-lagged: Corr(X1,Y2) and Corr(Y1,X2) Y1 eY1 Y2 eY2 Residual correlations (due to measure-specific variance): Corr(ex1,eX2) and Corr(eY1,eY2) Here Corr. denotes total correlation! Cross-Lagged Panel Analysis: Identification Time 1 Time 2 eX1 X1 eX2 X2 Is this model identified? # equations = 4*5/2=10 # unknowns = 11 Not identified! What is the problem? The repeated assessment of the same measure leads to two sources of common variance Y1 Y2 eY1 eY2 construct variance Measure-specific variance Model would be identified if delete residual correlations or Build multiple-indicator models Cross-Lagged Panel Analysis: Key Issues (Maruyama, pp.112-120) Time 1 Time 2 1. Stability of a variable For example, if Y is perfectly stable, Y2 is perfectly determined by Y1 If data is only available at Time 2, then Y1 is not available Any variable correlated with Y or caused by Y could be included as predictors, leading to a misspecified model! Low stability over time may result from poor reliability (if so, we’re in trouble!) or Real change in the measure eY2 Y1 Y2 X2 eX2 Cross-Lagged Panel Analysis: Key Issues Time 1 Time 2 eY2 eX1 X1 Y1 eY1 2. Temporal Lags How long is the causal lag? It the sampling interval > causal lag ⇒ attenuated effect If the sampling interval < causal lag ⇒ no effect or underestimated effect What if the causal lag from X1 to Y2 is different from Y1 to X2? Solution: three-wave data with different intervals X2 Y2 eY2 Cross-Lagged Panel Analysis: Key Issues 3. Growth Across Time When to use covariance vs. correlation data in SEM Covariance allows for “growth” by focusing on raw scores Correlation focuses on standardized relationships If no change in variability of any of the variables over time, the results are identical Using covariance is highly recommended! Cross-Lagged Panel Analysis: Key Issues 3. Stability of Causal Process Causal dynamics between variables remain stable across time intervals of the same length If not true, the relationships would differ depending on the particular interval sampled On the other hand, modeling unstable processes may be warranted when studying Developmental processes Time-varying interventions Cross-Lagged Panel Analysis with Latent Variables: Example 0.39 0.52 0.53 Nervous or upset Nervous or upset Often get scared 0.63 Often get scared 0.72 0.73 Grade 7 Anxiety 0.54 0.51 Grade 8 Anxiety 0.69 Nervous or upset 0.48 0.63 Grade 9 Anxiety 0.73 0.64 … 0.73 Often get scared 0.53 (Ma & Xu, Journal of Adolescence 27 (2): 165-179 APR 2004 ) Cross-Lagged Panel Analysis with Latent Variables: Example 0.77 0.31 Basic skills 0.88 0.46 Algebra 0.56 0.64 Geometry 0.68 Grade 7 Achieve Literacy 0.80 0.98 0.89 Basic skills 0.79 0.88 0.77 0.90 0.85 Algebra … 0.92 Grade 8 Achieve Geometry 0.72 Literacy 0.81 (Ma & Xu, Journal of Adolescence 27 (2): 165-179 APR 2004 ) Anxiety Grade 0.39 0.39 77 0.55 0.55 88 0.57 0.57 99 10 10 -0.05 -0.05 -0.01 -0.20 -0.12 -0.14 77 88 0.98 0.98 99 0.91 0.91 0.59 0.59 0.57 0.57 11 11 -0.02 -0.02 -0.15 10 10 0.95 0.95 12 12 -0.11 11 11 0.97 0.97 12 12 0.97 0.97 Achievement Grade Example of cross-lagged panel analysis with latent variables. Structural equation model estimating the causal relationship between mathematics anxiety & mathematics achievement across Grades 7–12. Large ovals represent latent factors & unidirectional arrows represent casual links. All parameter estimates for unidirectional paths are standardized. Pink boxes indicated P < 0.001). Adapted from Ma & Xu, Journal of Adolescence 2004;27:165-179