Structural Equation Modeling
Jiyoon An
Kiran Pedada

Agenda
Part 1 (Presented by Jiyoon An)
- SEM and latent variable
- Find a model from dataset
Part 2 (Presented by Kiran Pedada)
- SEM Structural model and measurement models
- How to use Lavaan
- Addressing missing values
- Path Diagrams

Part 1 (Presented by Jiyoon An)

Structural equation modeling (SEM)
SEM tests and estimates the (causal) relationships between observable measures and non-observable theoretical (latent) variables, and describes the relationships among the latent variables themselves with directed arrows.
Source: http://davidakenny.net/

Why latent variable?
- A latent variable is a random variable, which differs from a fixed process parameter
- Example: measuring a person's characteristics (e.g. dominance)
- Everyone has a different level of dominance. Some are less dominant and some are more dominant
- We cannot measure dominance directly, so we need a latent variable
Source: Borsboom, D., Mellenbergh, G. J., & Van Heerden, J. (2003), The theoretical status of latent variables, Psychological Review, 110(2), 203.

Measuring 'dominance' by using a latent variable
Latent variable: Dominance (Xi)
Manifest variables:
- X1: "I would like a job where I have power over others"
- X2: "I would make a good military leader"
- X3: "I try to control others"
Source: Borsboom, D., Mellenbergh, G. J., & Van Heerden, J. (2003), The theoretical status of latent variables, Psychological Review, 110(2), 203.

When do you have a latent variable?
- A latent variable is defined as a random variable whose realizations cannot be observed directly
- Recall the example of "ROA"
- Assess the true measure against measurement error (e.g. age)
Source: Borsboom, D. (2008), Latent variable theory, Measurement 6, 25-53; Howell, R. D. (2014), course materials from MKT 6355 Theory Testing

SEM case in point: Student evaluation
Infer from data structure to variable structure
- How to conceptualize latent variables?
- What are their causal relationships?
Source: Borsboom, D. (2008), Latent variable theory, Measurement 6, 25-53; Howell, R. D. (2014), course materials from MKT 6355 Theory Testing

How to conceptualize latent variables?
- Perceived instructor competence (R1, R3, R7, R8, R9, R10)
- Perceived instructor interaction (R6, R4, R5)
- Perceived course quality (R11, R12, R13, R14, R15, R16)
- R2 is removed

Factor analysis and SEM
EFA
- Find a latent variable which affects observed variables
- Without prior assumption, all loadings are free to vary
CFA
- Some loadings are forced to be zero by the researcher
- Factors are allowed to correlate
- No direct arrows between factors (measurement model)
SEM
- Test and estimate the (causal) relationships

Where is latent variable?
[Figure: data matrix of Student 1 ... Student n by responses R1 to R16, grouped under factors F1 (Competence), F2 (Interaction), and F3 (Course); path diagram in which Comp. is measured by R1, R3, R7, R8, R9, R10, Inter. by R6, R4, R5, and Course by R11 to R16, each indicator with its own error term e]

What are their causal relationships?
Criteria for classifying an explanation as causal
- Temporal sequentiality, nonspurious correlation, and common sense logic
- Example of a spurious correlation: number of drownings and ice cream consumption
Source: Hunt, S. D. (2010), Foundations of marketing theory: Toward a general theory of marketing, ME Sharpe

Applying criteria for choosing a model
- Latent variables: Perceived course quality, perceived instructor competence, and perceived instructor interaction
- Discussion: What are our DV(s) and IV(s)?

A model that does not make sense
A student forms an opinion about interaction, which influences his/her opinion about competence, which in turn influences his/her opinion about course quality.
Remember the criteria of causality.
[Figure: path diagram with nodes Comp., Inter., and Course]

A model that makes more sense
- A student forms his/her opinion on interaction and competence simultaneously, which influences perceived course quality
- Opinions on interaction and competence are correlated because they come from the same student
- How the instructor offers and what the instructor offers influence perceived quality of course
[Figure: path diagram with nodes Comp., Inter., and Course]
Source: Grönroos, C. (1984), A service quality model and its marketing implications, European Journal of Marketing, 18(4), 36-44.

Part 2 (Presented by Kiran Pedada)

SEM Structural Model
SEM model for the case: Z = BzU + ez
Here:
- Z is the endogenous latent variable,
- U is a (2 x 1) matrix of exogenous latent variables,
- Bz is a (1 x 2) matrix of coefficients of the exogenous variables,
- ez is the error associated with the endogenous variable.
[Figure: path diagram with Perceived Competence, Perceived Interaction, and Perceived Course Quality]
Source: "Factor Analysis, Path Analysis, and Structural Equations Modeling", Book extract, Jones and Bartlett publishers. http://www.jblearning.com/samples/0763755486/55485_CH14_Walker.pdf
Note: The equation is taken from the above-mentioned source. However, the symbols are changed for ease and convenience.

Exogenous Measurement Model
Exogenous measurement model: X = BxU + ex
Here:
- X is a (9 x 1) matrix of exogenous indicators,
- Bx is a (9 x 2) matrix of coefficients from the exogenous variables to the exogenous indicators,
- U is a (2 x 1) matrix of exogenous latent variables,
- ex is a (9 x 1) matrix of errors associated with the exogenous indicators.
Source: "Factor Analysis, Path Analysis, and Structural Equations Modeling", Book extract, Jones and Bartlett publishers. http://www.jblearning.com/samples/0763755486/55485_CH14_Walker.pdf
Note: The equation is taken from the above-mentioned source. However, the symbols are changed for ease and convenience.
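
As a worked illustration of the dimensions above (not taken from the cited chapter), the exogenous measurement model for this case can be written out in full. This is a sketch under stated assumptions: the indicators follow the order given earlier in the slides (competence items R1, R3, R7, R8, R9, R10, then interaction items R6, R4, R5), the first loading of each factor is assumed fixed to 1 to set the scale, and the symbols \lambda_i and e_i are introduced here only for illustration.

```latex
% Requires amsmath (pmatrix, \text).
% X = B_x U + e_x written out for the 9 exogenous indicators.
% Assumption: marker-variable scaling (first loading per factor fixed to 1).
\underbrace{\begin{pmatrix}
R_1 \\ R_3 \\ R_7 \\ R_8 \\ R_9 \\ R_{10} \\ R_6 \\ R_4 \\ R_5
\end{pmatrix}}_{X\;(9 \times 1)}
=
\underbrace{\begin{pmatrix}
1            & 0 \\
\lambda_3    & 0 \\
\lambda_7    & 0 \\
\lambda_8    & 0 \\
\lambda_9    & 0 \\
\lambda_{10} & 0 \\
0 & 1 \\
0 & \lambda_4 \\
0 & \lambda_5
\end{pmatrix}}_{B_x\;(9 \times 2)}
\underbrace{\begin{pmatrix}
\text{Competence} \\ \text{Interaction}
\end{pmatrix}}_{U\;(2 \times 1)}
+
\underbrace{\begin{pmatrix}
e_1 \\ e_3 \\ e_7 \\ e_8 \\ e_9 \\ e_{10} \\ e_6 \\ e_4 \\ e_5
\end{pmatrix}}_{e_x\;(9 \times 1)}
```

The zeros in B_x express the CFA restriction mentioned in Part 1: each indicator loads on exactly one latent variable, and the remaining loadings are forced to zero by the researcher.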
Exogenous Measurement Model
X = BxU + ex

Endogenous Measurement Model
Endogenous measurement model: Y = ByZ + ey
Here:
- Y is a (6 x 1) matrix of endogenous indicators,
- By is a (6 x 1) matrix of coefficients from the endogenous variable to the endogenous indicators,
- Z is a (1 x 1) matrix of the endogenous latent variable,
- ey is a (6 x 1) matrix of errors associated with the endogenous indicators.
Source: "Factor Analysis, Path Analysis, and Structural Equations Modeling", Book extract, Jones and Bartlett publishers. http://www.jblearning.com/samples/0763755486/55485_CH14_Walker.pdf

Endogenous Measurement Model
Y = ByZ + ey

SEM and Analysis of Covariance
- SEM is based on the analysis of covariances
- Analysis of covariances allows for estimation of both standardized and unstandardized parameters
Source: www.structuralequations.com/resources/SEM+Essentials.pps

Example of Analysis of Covariance Structure
- S denotes the observed covariances (typically the unstandardized covariances)
- Σ denotes the model-implied covariances
- Model fit is assessed by comparing S with Σ
Source: www.structuralequations.com/resources/SEM+Essentials.pps

R Packages for SEM (Non-commercial)
sem
- Developer: John Fox (since 2001)
- For a long time, the only option in R
- Will not do multiple groups
OpenMx
- Developer: Steven Boker (available at http://openmx.psyc.Virginia.edu/)
- Very powerful
- All parts of OpenMx are open-source, except for the NPSOL optimizer, which is closed-source
- Somewhat idiosyncratic syntax
lavaan
- Developer: Yves Rosseel (http://lavaan.ugent.be/)
- First public release: May 2010. Version 0.5-17 was released on CRAN on 1 October 2014
- Uses a more compact notation than sem
- Will work on multiple groups
Source: Rosseel, Yves. "lavaan: An R package for structural equation modeling." Journal of Statistical Software 48.2 (2012): 1-36.
Source 2: https://personality-project.org/revelle/syllabi/454/wk6.lavaan.pdf

Why lavaan?
- A free, open-source package for latent variable modeling
- Easy and intuitive to use
- Results are typically very close to the results of Mplus
- Powerful, easy-to-use text-based syntax describing the model
- Fairly complete
Source: Rosseel, Yves. "lavaan: An R package for structural equation modeling." Journal of Statistical Software 48.2 (2012): 1-36.

Data
    # Read the data (opens a file chooser dialog)
    Data <- read.csv(file.choose(), header = TRUE)
    # Responses 1 to 16, selected by column name
    evals <- Data[, paste0("RESP_", 1:16)]

Formulae and Operators
    Formula type            Operator   Mnemonic
    Latent variable         =~         is manifested by
    Regression              ~          is regressed on
    Covariance              ~~         is correlated with
    Defined parameter       :=         is defined as
    Equality constraint     ==         is equal to
    Inequality constraint   <          is smaller than
    Inequality constraint   >          is larger than
Source: Rosseel, Yves. "lavaan: An R package for structural equation modeling." Journal of Statistical Software 48.2 (2012): 1-36.

Specifying the Model
    model <- '
      # Defining the latent variables
      Competence  =~ RESP_1 + RESP_3 + RESP_7 + RESP_8 + RESP_9 + RESP_10
      Course      =~ RESP_11 + RESP_12 + RESP_13 + RESP_14 + RESP_15 + RESP_16
      Interaction =~ RESP_6 + RESP_4 + RESP_5
      # Regression
      Course ~ Interaction + Competence
      # Covariance of latent variables
      Interaction ~~ Competence
    '

Install Packages
    install.packages("lavaan")
    install.packages("semPlot")

Running the Model
    library("lavaan")
    # Fitting the model to the data
    fit <- sem(model, data = evals, missing = "fiml")

Dealing with Missing Values in lavaan
- "listwise": cases with missing data are removed listwise (before the analysis)
- "fiml": the package offers estimation using all available data. This is also called "case-wise" (or full information) maximum likelihood estimation.
Source: http://cran.r-project.org/web/packages/lavaan/lavaan.pdf

Examining the Results
    # Examining the results
    summary(fit, fit.measures = TRUE, standardized = TRUE)

Examining the Results
  Number of observations            Used 7828    Total 7830
  Number of missing patterns        92
  Estimator                         ML
  Minimum Function Test Statistic   6068.046
  Degrees of freedom                87
  P-value (Chi-square)              0.000

Parameter estimates:
  Information                       Observed
  Standard Errors                   Standard

Examining the Results
                   Estimate  Std.err  Z-value  P(>|z|)   Std.lv  Std.all
Latent variables:
  Competence =~
    RESP_1            1.000                               0.778    0.902
    RESP_3            1.038    0.009  121.814    0.000    0.807    0.889
    RESP_7            1.072    0.009  114.296    0.000    0.834    0.867
    RESP_8            0.957    0.008  114.973    0.000    0.745    0.871
    RESP_9            1.026    0.009  110.423    0.000    0.798    0.855
    RESP_10           0.695    0.007   94.256    0.000    0.541    0.792
  Course =~
    RESP_11           1.000                               0.853    0.869
    RESP_12           0.971    0.009  110.946    0.000    0.829    0.891
    RESP_13           0.947    0.009  107.388    0.000    0.808    0.879
    RESP_14           0.766    0.008   90.252    0.000    0.654    0.805
    RESP_15           0.829    0.009   90.857    0.000    0.707    0.808
    RESP_16           0.890    0.010   88.775    0.000    0.760    0.795
  Interaction =~
    RESP_6            1.000                               0.612    0.822
    RESP_4            1.151    0.012   97.686    0.000    0.704    0.910
    RESP_5            1.196    0.012  100.429    0.000    0.731    0.922

Regressions:
  Course ~
    Interaction       0.075    0.019    4.059    0.000    0.054    0.054
    Competence        0.929    0.016   56.843    0.000    0.847    0.847

Covariances:
  Competence ~~
    Interaction       0.394    0.008   48.130    0.000    0.828    0.828

Plotting the SEM Path Diagram
    # SEM path diagram
    library("semPlot")
    # Plot input path diagram
    semPaths(fit, title = FALSE, curvePivot = TRUE, exoVar = FALSE, exoCov = FALSE)
    # Plot output path diagram with standardized parameters
    semPaths(fit, "std", edge.label.cex = 1.0, curvePivot = TRUE)

[Figure: Input Path Diagram]

[Figure: Output Path Diagram]

Relating to the Results
Latent variables: same estimates as listed under Examining the Results.

Relating to the Results
                   Estimate  Std.err  Z-value  P(>|z|)   Std.lv  Std.all
Intercepts:
    RESP_1            4.380    0.010  448.881    0.000    4.380    5.077
    RESP_3            4.366    0.010  425.167    0.000    4.366    4.810
    RESP_7            4.306    0.011  395.835    0.000    4.306    4.479
    RESP_8            4.435    0.010  458.797    0.000    4.435    5.191
    RESP_9            4.361    0.011  413.101    0.000    4.361    4.674
    RESP_10           4.637    0.008  600.331    0.000    4.637    6.792
    RESP_11           4.295    0.011  386.301    0.000    4.295    4.372
    RESP_12           4.301    0.011  408.596    0.000    4.301    4.624
    RESP_13           4.313    0.010  414.576    0.000    4.313    4.694
    RESP_14           4.472    0.009  486.091    0.000    4.472    5.506
    RESP_15           4.408    0.010  444.632    0.000    4.408    5.036
    RESP_16           4.345    0.011  401.296    0.000    4.345    4.546
    RESP_6            4.578    0.008  543.730    0.000    4.578    6.155
    RESP_4            4.548    0.009  519.595    0.000    4.548    5.879
    RESP_5            4.558    0.009  507.973    0.000    4.558    5.747
    Competence        0.000                               0.000    0.000
    Course            0.000                               0.000    0.000
    Interaction       0.000                               0.000    0.000

Relating to the Results
                   Estimate  Std.err  Z-value  P(>|z|)   Std.lv  Std.all
Variances:
    RESP_1            0.139    0.003                      0.139    0.187
    RESP_3            0.172    0.003                      0.172    0.209
    RESP_7            0.230    0.004                      0.230    0.248
    RESP_8            0.176    0.003                      0.176    0.241
    RESP_9            0.234    0.004                      0.234    0.269
    RESP_10           0.174    0.003                      0.174    0.373
    RESP_11           0.237    0.005                      0.237    0.245
    RESP_12           0.178    0.004                      0.178    0.205
    RESP_13           0.191    0.004                      0.191    0.227
    RESP_14           0.232    0.004                      0.232    0.352
    RESP_15           0.266    0.005                      0.266    0.348
    RESP_16           0.336    0.006                      0.336    0.368
    RESP_6            0.179    0.003                      0.179    0.324
    RESP_4            0.103    0.003                      0.103    0.172
    RESP_5            0.094    0.003                      0.094    0.150
    Competence        0.605    0.012                      1.000    1.000
    Course            0.148    0.004                      0.204    0.204
    Interaction       0.374    0.009                      1.000    1.000

References
Borsboom, D., Mellenbergh, G.
J., & Van Heerden, J. (2003), The theoretical status of latent variables, Psychological Review, 110(2), 203.
Borsboom, D. (2008), Latent variable theory, Measurement 6, 25-53.
Grönroos, C. (1984), A service quality model and its marketing implications, European Journal of Marketing, 18(4), 36-44.
Howell, R. D. (2014), course materials from MKT 6355 Theory Testing.
Hunt, S. D. (2010), Foundations of marketing theory: Toward a general theory of marketing, ME Sharpe.
Rosseel, Y. (2012), lavaan: An R package for structural equation modeling, Journal of Statistical Software, 48(2), 1-36.

Thank You