PLS-SEM: Introduction (Part 1)
Joe F. Hair, Jr.
Founder & Senior Scholar, DBA Program

SEM Model: Predicting the Birth Weight of Guinea Pigs
X & Y = different outcomes
B, C & D = common causes
A & E = independent causes
Source: Sewall Wright, Correlation and Causation, Journal of Agricultural Research, Vol. XX, No. 7, 1921.

The greatest interest in any factor solution centers on the correlations between the original variables and the factors. The matrix of such test-factor correlations is called the factor structure, and it is the primary interpretative device in principal components analysis. In the factor structure the element $r_{jk}$ gives the correlation of the jth test with the kth factor. Assuming that the content of the observation variables is well known, the correlations in the kth column of the structure help in interpreting, and perhaps naming, the kth factor. Also, the coefficients in the jth row give the best view of the factor composition of the jth test. The derivation of the factor structure S is as follows:

$$S = \frac{1}{N}\sum_{i=1}^{N}(z_i - m_z)(f_i - m_f)' = \frac{1}{N}\sum_{i=1}^{N} z_i f_i' = \frac{1}{N}\sum_{i=1}^{N} z_i \left(L^{-1/2} V' z_i\right)' = \frac{1}{N}\sum_{i=1}^{N}\left(z_i z_i'\right) V L^{-1/2} = R V L^{-1/2}$$

and since $RV = VL$,

$$S = V L L^{-1/2} = V L^{1/2}$$

Another set of coefficients of interest in factor analysis is the weights that compound predicted observations z from factor scores f. These regression coefficients for the multiple regression of each element of the observation vector z on the factor f are called factor loadings and the matrix A that contains them as its rows is . . . . .
Source: Cooley, William W., and Paul R. Lohnes, Multivariate Data Analysis, John Wiley & Sons, Inc., New York, 1971, page 106.

Structural Equations Modeling
What comes to mind?
CB-SEM (Covariance-based SEM) – objective is to reproduce the theoretical covariance matrix, without focusing on explained variance.
PLS-SEM (Partial Least Squares SEM) – objective is to maximize the explained variance of the endogenous latent constructs (dependent variables).
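As a numerical check on the factor structure result quoted earlier from Cooley and Lohnes (S = R V L^{-1/2} = V L^{1/2}), the sketch below builds two correlated variables, standardizes them, and compares the directly computed variable-factor correlations with V L^{1/2}. The data, the seed, and all variable names are hypothetical and chosen only for illustration; the two-variable case is used because its eigenvalues (1 + r and 1 - r) and eigenvectors are known in closed form.

```python
import math
import random

random.seed(1)
n = 200

# Hypothetical data: two positively correlated variables.
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [0.6 * a + 0.8 * random.gauss(0, 1) for a in x1]


def standardize(x):
    """Return z-scores using the population standard deviation (divide by N)."""
    m = sum(x) / len(x)
    sd = math.sqrt(sum((v - m) ** 2 for v in x) / len(x))
    return [(v - m) / sd for v in x]


z1, z2 = standardize(x1), standardize(x2)
Z = [z1, z2]

# Correlation between the two variables, so R = [[1, r], [r, 1]].
r = sum(a * b for a, b in zip(z1, z2)) / n

# Closed-form eigen-decomposition of a 2x2 correlation matrix.
L = [1 + r, 1 - r]            # eigenvalues
c = 1 / math.sqrt(2)
V = [[c, c], [c, -c]]         # columns are the eigenvectors

# Factor scores f_i = L^{-1/2} V' z_i for every observation i.
f = [
    [(V[0][k] * a + V[1][k] * b) / math.sqrt(L[k]) for k in range(2)]
    for a, b in zip(z1, z2)
]

# Direct computation: S[j][k] = (1/N) * sum_i z_ij * f_ik.
S_direct = [
    [sum(Z[j][i] * f[i][k] for i in range(n)) / n for k in range(2)]
    for j in range(2)
]

# Formula from the derivation: S = V L^{1/2}.
S_formula = [[V[j][k] * math.sqrt(L[k]) for k in range(2)] for j in range(2)]

max_gap = max(
    abs(S_direct[j][k] - S_formula[j][k]) for j in range(2) for k in range(2)
)
print("largest difference between the two computations:", max_gap)
```

The two matrices agree up to floating-point error, which is the content of the identity: the factor structure can be read off the eigenvectors and eigenvalues of R without computing factor scores.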
CB-SEM Model
HBAT, MDA database
Covariance Matrix = HBAT 3-Construct model

CB-SEM – evaluation focuses on goodness of fit = minimization of the difference between the observed covariance matrix and the estimated covariance matrix.
Research objective: testing and confirmation where prior theory is strong.
• Assumes normality of data distribution, homoscedasticity, large sample size, etc.
• Only reliable and valid variance is useful for testing causal relationships.
• A “full information approach,” which means small changes in model specification can result in substantial changes in model fit.

PLS-SEM – objective is to maximize the explained variance of the endogenous latent constructs (dependent variables).
Research objective: theory development and prediction.
• Normality of data distribution not assumed.
• Can be used with fewer indicator variables (1 or 2) per construct.
• Models can include a larger number of indicator variables (CB-SEM difficult with 50+ items).
• Preferred alternative with formative constructs.
• Assumes all measured variance (including error) is useful for explanation/prediction of causal relationships.

PLS Path Model
[Figure: PLS path model showing latent variables (constructs) Y1, Y2, Y3; indicator variables X1 to X7; indicator weights W1 to W7; and structural paths P1 and P2.]

Multivariate Methods

Should SEM Be Used?
Considerations:
1. The Variate
2. Multivariate Measurement
3. Measurement Scales
4. Coding
5. Data Distribution

Variate = a linear combination of several variables, often referred to as the fundamental building block of multivariate analysis.
Variate value = x1w1 + x2w2 + . . . + xkwk

Data Matrix
Multiple Regression Model
[Figure: predictors x1, x2, …, xk; outcome Y1; error term e1.]
Variate = x1 + x2 + … + xk + e

Multivariate Measurement
Measurement = the process of assigning numbers to a variable/construct according to a set of rules, so that the numbers accurately represent the variable.
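The variate formula given earlier (variate value = x1w1 + x2w2 + . . . + xkwk) can be illustrated with a minimal sketch. The observations and weights below are hypothetical; in practice the weights are estimated by the multivariate technique rather than chosen by hand, and the function name `variate_value` is introduced here only for illustration.

```python
# Hypothetical observations: each row holds one respondent's scores on k = 3 variables.
observations = [
    [4.0, 5.0, 5.0],
    [6.0, 7.0, 6.0],
]

# Hypothetical weights w1..wk (assumed values, not estimates from any model).
weights = [0.5, 0.3, 0.2]


def variate_value(x, w):
    """Return x1*w1 + x2*w2 + ... + xk*wk for one observation."""
    if len(x) != len(w):
        raise ValueError("need exactly one weight per variable")
    return sum(xi * wi for xi, wi in zip(x, w))


# One variate value per observation.
variate_values = [variate_value(row, weights) for row in observations]
print(variate_values)
```

Each respondent's several scores collapse into a single number, which is why the variate is described as the building block of multivariate analysis.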
When variables are difficult to measure, one approach is to measure them indirectly with proxy variables. If the concept is restaurant satisfaction, for example, the proxy variables used to measure it might be:
1. The taste of the food was excellent.
2. The speed of service met my expectations.
3. The wait staff was very knowledgeable about the menu items.
4. The background music in the restaurant was pleasant.
5. The meal was a good value compared to the price.
Multivariate measurement involves using several variables to indirectly measure a concept, as in the restaurant satisfaction example above. It also enables researchers to account for the error in data.

Data Characteristics – PLS-SEM
Sample Size:
• No identification issues with small sample sizes (35-50).
• Generally achieves high levels of statistical power with small sample sizes (35-50).
• Larger sample sizes (250+) increase the precision (i.e., consistency) of PLS-SEM estimations.
Data Distribution:
• No distributional assumptions (PLS-SEM is a non-parametric method; works well with extremely non-normal data).
Missing Values:
• Highly robust as long as missing values are below a reasonable level (e.g., up to 15% randomly missing data points).
• Use mean replacement (sub-groups) and nearest neighbor.
Measurement Scales:
• Works with metric, quasi-metric (ordinal) scaled data, and binary coded variables (largely limited to exogenous variables).
• Limitations when using categorical data to measure endogenous latent variables.
• Suggest using binary variables for multi-group comparisons.

Model Characteristics – PLS-SEM
Number of Items in each Construct Measurement Model:
• Handles constructs measured with single and multi-item measures.
• Easily handles 50+ items (CB-SEM does not).
• Single item scales OK.
Relationships between Latent Constructs and their Indicators:
• Easily incorporates reflective and formative measurement models.
Model Complexity:
• Handles complex models with many structural model relationships.
• Larger numbers of indicators are helpful in reducing the bias associated with “consistency at large”.
Model Set-up:
• Causal loops not allowed in the structural model (only recursive models).

Algorithm Properties – PLS-SEM
Objective:
• Minimizes the amount of unexplained variance (i.e., maximizes the R² values).
Efficiency:
• Converges after a few iterations (even in situations with complex models and/or large sets of data) to the global optimum solution; efficient algorithm.
Latent Construct Scores:
• Estimated as linear combinations of their indicators.
• Used for predictive purposes.
• Can be used as input for subsequent analyses.
• Not affected by data inadequacies.
Parameter Estimates:
• Structural model relationships underestimated (PLS-SEM bias).
• Measurement model relationships overestimated (PLS-SEM bias).
• Consistency at large (minimal impact with N = 250+).
• High levels of statistical power with smaller sample sizes (35-50).

Model Evaluation Issues – PLS-SEM
Evaluation of Overall Model:
• No global goodness-of-fit criterion.
Evaluation of Measurement Models:
• Reflective measurement models: reliability and validity assessments by multiple criteria.
• Formative measurement models: validity assessment, significance of path coefficients, multicollinearity.
Evaluation of Structural Model:
• Significance of path coefficients, coefficient of determination (R²), pseudo F-test (f² effect size), predictive relevance (Q² and q² effect size).
Additional Analyses:
• Mediating effects
• Impact-performance matrix analysis
• Higher-order constructs
• Multi-group analysis
• Measurement model invariance
• Moderating effects
• Uncovering unobserved heterogeneity: FIMIX-PLS

Rules of Thumb: PLS-SEM or CB-SEM?
Use PLS-SEM when:
• The goal is predicting key target constructs or identifying key “driver” constructs.
• Formative constructs are easy to use in the structural model.
Note that formative measures can also be used with CB-SEM, but doing so requires construct specification modifications (e.g., the construct must include both formative and reflective indicators to meet identification requirements).
• The structural model is complex (many constructs and many indicators).
• The sample size is small and/or the data are not normally distributed or exhibit heteroskedasticity.
• The plan is to use latent variable scores in subsequent analyses.

Rules of Thumb: PLS-SEM or CB-SEM?
Use CB-SEM when:
• The goal is theory testing, theory confirmation, or the comparison of alternative theories.
• Error terms require additional specification, such as the covariation.
• Structural model has non-recursive relationships.
• Research requires a global goodness of fit criterion.

Systematic Process for applying PLS-SEM
Stage 1: Specifying the Structural Model
Stage 2: Specifying the Measurement Models
Stage 3: Data Collection and Examination
Stage 4: PLS-SEM Model Estimation
Stage 5a: Assessing PLS-SEM Results for Reflective Measurement Models
Stage 5b: Assessing PLS-SEM Results for Formative Measurement Models
Stage 6: Assessing PLS-SEM Results for the Structural Model
Stage 7: Interpretation of Results and Drawing Conclusions

Should You Use SEM?
Journal reviewers rate SEM papers more favorably on key manuscript attributes . . .

Mean Score
Attributes           SEM    No SEM    p-value
Topic Relevance      4.2    3.8       .182
Research Methods     3.5    2.7       .006
Data Analysis        3.5    2.8       .025
Conceptualization    3.1    2.5       .018
Writing Quality      3.9    3.0       .006
Contribution         3.1    2.8       .328
Note: scores based on 5-point scale, with 5 = more favorable
Source: Babin, Hair & Boles, Publishing Research in Marketing Journals Using Structural Equation Modeling, Journal of Marketing Theory and Practice, Vol. 16, No. 4, 2008, pp. 281-288.

PLS-SEM Stages 1, 2 & 3: Design Issues
1. Scale Measures
• Scale selection/design
• Reflective vs. Formative
2. Common Methods Variance
• Harman Single Factor Test
• Common Latent Factor
• Marker Construct
3. Missing Data, outliers, etc.

Scale Design
1. Revise/Update
• Established scales – how old?
• Double barreled; negatively worded
2. Number of Scale Points
• More scale points = greater variability
3. Single Item Scales

Single Item Scales? Theoretical Aspects
Reliability:
• Single-item measures: no adjustment of random error; assessing reliability is problematic.
• Multi-item measures: allows for random error adjustment; determination of reliability by means of internal consistency.
Validity:
• Single-item measures: lower construct validity – does not account for all facets of a construct; decreased criterion validity; assessing validity is more problematic.
• Multi-item measures: higher construct validity – different facets of a construct can be captured; increased criterion validity; validity measures based on item-to-item correlations.
Partitioning:
• Single-item measures: partitioning solely based on the single variable.
• Multi-item measures: more precise partition possible.
Missing Values:
• Single-item measures: very difficult to resolve.
• Multi-item measures: imputation methods based on correlations between indicators of the same construct.
Use in Academic Research:
• Single-item measures: very uncommon (publication problematic).
• Multi-item measures: generally accepted.

Single Item Scales? Practical Aspects
Costs:
• Single-item measures: lower costs associated with scale development, questioning, and data analysis.
• Multi-item measures: higher costs associated with scale development, questioning, and data analysis.
Nonresponse:
• Single-item measures: increased survey response rate; lower item nonresponse.
• Multi-item measures: lower survey response rate; higher item nonresponse.
Burden of Questioning:
• Single-item measures: little burden: simple, fast, and comprehensible.
• Multi-item measures: increased burden: longer, likely more boring and tiring.

Reflective (Scale) Versus Formative (Index) Operationalization of Constructs
A central research question in social science research, particularly marketing and MIS, focuses on the operationalization of complex constructs: Are indicators causing or being caused by the latent variable/construct measured by them?
[Figure: Construct with Indicator 1, Indicator 2, Indicator 3.] Changes in the latent variable directly cause changes in the assigned indicators.
[Figure: Indicator 1, Indicator 2, Indicator 3 with Construct.] Changes in one or more of the indicators cause changes in the latent variable.

Example: Reflective vs. Formative World View
Drunkenness, measured by:
• Can’t walk a straight line
• Smells of alcohol
• Slurred speech

Example: Reflective vs. Formative World View
Drunkenness, measured by:
• Consumption of beer
• Consumption of wine
• Consumption of hard liquor

Basic Difference Between Reflective and Formative Measurement Approaches
“Whereas reflective indicators are essentially interchangeable (and therefore the removal of an item does not change the essential nature of the underlying construct), with formative indicators ‘omitting an indicator is omitting a part of the construct’.” (Diamantopoulos/Winklhofer, 2001, p. 271)
[Figure: two construct-domain diagrams.]
The formative measurement approach generally minimizes the overlap between complementary indicators.
The reflective measurement approach focuses on maximizing the overlap between interchangeable indicators.

Exercise: Satisfaction in Hotels as Formative and Reflective Operationalized Construct
Construct: Satisfaction with Hotels
Candidate indicators:
• The rooms’ furnishings are good
• The hotel’s recreation offerings are good
• Taking everything into account, I am satisfied with this hotel
• The hotel’s personnel are friendly
• I appreciate this hotel
• The hotel is low-priced
• I am looking forward to staying overnight in this hotel
• The rooms are quiet
• I am comfortable with this hotel
• The rooms are clean
• The hotel’s service is good
• The hotel’s cuisine is good

Formative Constructs – Two Types
1. Composite (formative) constructs – indicators completely determine the “latent” construct. They share similarities because they define a composite variable but may or may not have conceptual unity.
In assessing validity, indicators are not interchangeable and should not be eliminated, because removing an indicator will likely change the nature of the latent construct.
2. Causal constructs – indicators have conceptual unity in that all variables should correspond to the definition of the concept. In assessing validity, some of the indicators may be interchangeable and can also be eliminated.
Source: Bollen, K.A. (2011), Evaluating Effect, Composite, and Causal Indicators in Structural Equations Models, MIS Quarterly, Vol. 35, No. 2, pp. 359-372.

PLS-SEM Example
[Figure: structural model with the constructs COMP, LIKE, CUSA, and CUSL.]

Types of Measurement Models PLS-SEM Example
• COMP: reflective measurement model (comp_1, comp_2, comp_3)
• LIKE: reflective measurement model (like_1, like_2, like_3)
• CUSA: single-item construct (cusa)
• CUSL: reflective measurement model (cusl_1, cusl_2, cusl_3)

Indicators for SEM Model Constructs
Competence (COMP)
comp_1: [company] is a top competitor in its market.
comp_2: As far as I know, [company] is recognized world-wide.
comp_3: I believe that [company] performs at a premium level.
Likeability (LIKE)
like_1: [company] is a company that I can better identify with than other companies.
like_2: [company] is a company that I would regret more not having if it no longer existed than I would other companies.
like_3: I regard [company] as a likeable company.
Customer Loyalty (CUSL)
cusl_1: I would recommend [company] to friends and relatives.
cusl_2: If I had to choose again, I would choose [company] as my mobile phone services provider.
cusl_3: I will remain a customer of [company] in the future.
Satisfaction (CUSA)
cusa: If you consider your experiences with [company], how satisfied are you with [company]?

Data Matrix for Indicator Variables
Column Number and Variable Name
Column:  1       2       3       4       5       6       7       8       9       10
Case     comp_1  comp_2  comp_3  like_1  like_2  like_3  cusl_1  cusl_2  cusl_3  cusa
1        4       5       5       3       1       2       5       3       3       5
2        6       7       6       6       6       6       7       7       7       7
...
344      6       5       6       6       7       5       7       7       7       7

Getting Started with the SmartPLS Software
The next slide shows the graphical interface for the SmartPLS software, with the simple model already drawn. The following slides describe how to set up this model using the SmartPLS software program. Before you draw your model, you need data to serve as the basis for running it. The data for our example PLS model can be downloaded as either a comma separated values (.csv) or a text (.txt) data file at the following URL: http://www.smartpls.de/cr/. On the website, scroll down to the Corporate Reputation Example where it says “Click on the following links to download files.” SmartPLS can use both data file formats (i.e., .csv or .txt). Follow the onscreen instructions to save one of these two files on your hard drive: click on Save Target As… to save the data to a folder on your hard drive, and then Close.
Now go to the folder where you previously downloaded and saved the SmartPLS software on your computer. Click on the file that runs SmartPLS [icon] and then on the Run tab to start the software. You are now ready to create a new SmartPLS project.

SmartPLS Graphical Interface
Example with Names and Data Assigned

Brief Instructions: Using SmartPLS
1. Load SmartPLS software – click on [icon].
2. Create your new project – assign name and data.
3. Double-click to get Menu Bar.
4. Draw model – see options below:
• Insertion mode = [icon]
• Selection mode = [icon]
• Connection mode = [icon]
5. Save model.
6. Click on calculate icon and select PLS algorithm on the Pull-Down menu. Now accept the default options by clicking Finish.

To create a new project, click on File → New → Create New Project. The screen below will appear. Type a name in the window. Click Next. You now need to assign a data file to the project, in our case, data.csv (or whatever name you gave to the data you downloaded).
To do so, click on the dots tab (…) at the right side of the window, find and highlight your data folder, and click Open to select your data. Once you have specified the data file, click on Finish.

SmartPLS Software Options
Find your new project in the window, expand the list of projects to get project details (see below), and click on the .splsm file for your project.
Double-click on your new model to get the menu bar to appear at the top of the screen.
Toolbar options:
• Selection mode
• Draw constructs
• Draw structural paths

Initial Structural Model – No Indicator Variables
Structural Model with Names and Paths
Name Constructs, Align Indicators, Etc. . . .
Menu options:
• Start calculation
• Rename Construct
• Hide used indicators
• Show measurement model
• Change reflective to formative

How to Run SmartPLS Software
Default Settings for Example – Click Finish to run
Trade-off in missing value treatment: case wise replacement (i.e., dropping cases with missing values) can greatly reduce the number of cases, but sample mean imputation reduces the variables’ variance. The preferred approach to missing data is a combination of sub-group mean replacement and nearest neighbor, or EM imputation using SPSS.
Always use the path weighting scheme.

PLS Results for Example
SmartPLS Calculation Reports – Overview
Quality Criteria Report – SmartPLS
The composite reliability is excellent – almost .90 for all three constructs. The AVEs for all three constructs are well above .50.

Summary of PLS-SEM Findings
1. The direct path from COMP to CUSA is 0.162 and the direct path from COMP to CUSL is 0.009.
2. The direct path from LIKE to CUSA is 0.424 and the direct path from LIKE to CUSL is 0.342.
3. The direct path from CUSA to CUSL is 0.504.
4. Overall, the model explains 29.5% of the variance in CUSA and 56.2% of the variance in CUSL.
5. Reliability of constructs is excellent.
6. Constructs achieve convergent validity (AVE > 0.50).
To determine significance levels, you must run the Bootstrapping option. Look for it under the calculate option.
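The direct paths in the summary of findings can be combined into indirect and total effects on CUSL, since both COMP and LIKE also reach CUSL through CUSA. The sketch below is plain arithmetic on the reported coefficients: an indirect effect is the product of the two paths along the route, and the total effect is direct plus indirect. The dictionary layout and the function name `effects_on_cusl` are illustrative assumptions, not SmartPLS output, and nothing here says whether any effect is statistically significant.

```python
# Direct path coefficients as reported in the summary of findings.
paths = {
    ("COMP", "CUSA"): 0.162,
    ("COMP", "CUSL"): 0.009,
    ("LIKE", "CUSA"): 0.424,
    ("LIKE", "CUSL"): 0.342,
    ("CUSA", "CUSL"): 0.504,
}


def effects_on_cusl(source):
    """Return direct, indirect (via CUSA), and total effects of `source` on CUSL."""
    direct = paths[(source, "CUSL")]
    indirect = paths[(source, "CUSA")] * paths[("CUSA", "CUSL")]
    return {"direct": direct, "indirect": indirect, "total": direct + indirect}


effects = {source: effects_on_cusl(source) for source in ("COMP", "LIKE")}

for source, parts in effects.items():
    rounded = {name: round(value, 3) for name, value in parts.items()}
    print(source, "->", "CUSL", rounded)
```

On these numbers the total effect of COMP on CUSL is about 0.091 (0.009 direct plus 0.162 × 0.504 indirect), and the total effect of LIKE on CUSL is about 0.556 (0.342 direct plus 0.424 × 0.504 indirect). Judging whether these effects are significant still requires the bootstrapping step.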