PLS Path Modeling
Michel Tenenhaus (tenenhaus@hec.fr)

PLS Methods
Initiated by Herman Wold, Svante Wold, Harald Martens and Jan-Bernd Lohmöller
1. NIPALS (Nonlinear Iterative Partial Least Squares)
2. PLS Regression (Partial Least Squares Regression)
3. PLS Discriminant Analysis
4. SIMCA (Soft Independent Modeling by Class Analogy)
5. PLS Approach to Structural Equation Modeling
6. N-way PLS
7. PLS Logistic Regression
8. PLS Generalized Linear Model

PLS Path Modeling: PLS Approach to Structural Equation Modeling

ECSI path model for a "mobile phone provider"
[Figure: path diagram linking Image, Customer expectation, Perceived quality, Perceived value, Customer satisfaction, Complaint and Loyalty. Path coefficients are shown with p-values in parentheses. R2 values: Customer expectation .243, Perceived quality .297, Perceived value .335, Customer satisfaction .672, Complaint .292, Loyalty .432.]

Structural Equation Modeling: the PLS approach of Herman Wold
• Study of a system of linear relationships between latent variables.
• Each latent variable is described by a set of manifest variables, or summarizes them.
• Variables can be numerical, ordinal or nominal (no need for normality assumptions).
• The number of observations can be small compared to the number of variables.
Economic inequality and political instability
Data from Russett (1964), in GIFI

Economic inequality
  Agricultural inequality
    GINI : Inequality of land distribution
    FARM : % of farmers that own half of the land (> 50)
    RENT : % of farmers that rent all their land
  Industrial development
    GNPR : Gross national product per capita ($ 1955)
    LABO : % of labor force employed in agriculture
Political instability
  INST : Instability of executive (45-61)
  ECKS : Nb of violent internal war incidents (46-61)
  DEAT : Nb of people killed as a result of civic group violence (50-62)
  D-STAB : Stable democracy
  D-UNST : Unstable democracy
  DICT : Dictatorship

Data (47 countries; first and last rows shown)

Country      Gini  Farm  Rent  Gnpr  Labo  Inst  Ecks  Deat  Demo
Argentina    86.3  98.2  32.9   374    25  13.6    57   217     2
Australia    92.9  99.6     *  1215    14  11.3     0     0     1
Austria      74.0  97.4  10.7   532    32  12.8     4     0     2
...
France       58.3  86.1  26.0  1046    26  16.3    46     1     2
Yugoslavia   43.7  79.8   0.0   297    67   0.0     9     0     3

Demo: 1 = Stable democracy, 2 = Unstable democracy, 3 = Dictatorship

[Figure: path diagram with expected signs. Agricultural inequality ($\xi_1$, block X1: GINI, FARM, RENT) and Industrial development ($\xi_2$, block X2: GNPR, LABO) explain Political instability ($\xi_3$, block X3: INST, ECKS, DEAT, D-STB, D-INS, DICT).]

Modeling
• Reflective model (the block is assumed to be unidimensional). Each manifest variable $X_{hj}$ is written as
  $X_{hj} = \lambda_{hj}\,\xi_h + \varepsilon_{hj}$
• Formative model (the block can be multidimensional). The latent variable $\xi_h$ is a function of the manifest variables of its block $X_h$:
  $\xi_h = \sum_j \omega_j X_{hj} + \delta_h$
• There exists a linear structural relationship between the latent variables:
  Political instability ($\xi_3$) = $\beta_1$ Agricultural inequality ($\xi_1$) + $\beta_2$ Industrial development ($\xi_2$) + residual

Estimation of latent variables using the PLS approach
(1) External (outer) estimation $Y_h$ of $\xi_h$:
    $Y_h = X_h w_h$
(2) Internal (inner) estimation $Z_h$ of $\xi_h$:
    $Z_h \propto \sum_{j \neq h,\ \xi_j \text{ related with } \xi_h} \mathrm{sign}(\mathrm{cor}(\xi_j, \xi_h))\, Y_j$
(3) Calculation of the weights $w_h$:
    $w_{hj} = \mathrm{cor}(X_{hj}, Z_h)$

Step (1), external estimation, on the Russett data:
    $Y_1 = X_1 w_1 = w_{11}\,\mathrm{Gini} + w_{12}\,\mathrm{Farm} + w_{13}\,\mathrm{Rent}$
    $Y_2 = X_2 w_2$
    $Y_3 = X_3 w_3$

Step (2), internal estimation (centroid scheme):
    $Z_1 = \mathrm{sign}(\mathrm{cor}(\xi_1, \xi_3))\, Y_3 = (+1)\, Y_3$
    $Z_2 = \mathrm{sign}(\mathrm{cor}(\xi_2, \xi_3))\, Y_3 = (-1)\, Y_3$
    $Z_3 = \mathrm{sign}(\mathrm{cor}(\xi_3, \xi_1))\, Y_1 + \mathrm{sign}(\mathrm{cor}(\xi_3, \xi_2))\, Y_2 = (+1)\, Y_1 + (-1)\, Y_2$

Step (3), calculation of the weights:
    $w_{11} = \mathrm{cor}(\mathrm{Gini}, Z_1)$, $w_{12} = \mathrm{cor}(\mathrm{Farm}, Z_1)$, $w_{13} = \mathrm{cor}(\mathrm{Rent}, Z_1)$
and likewise for the other $w_{hj}$.

Weight initialization in PLS-Graph
Option "1" : all weights are equal to 1.
Option "-1" : all weights are equal to 1, except the last one, which is set to -1:
    $w_{11,\text{initial}} = 1$, $w_{12,\text{initial}} = 1$, $w_{13,\text{initial}} = -1$
This choice allows some sign control: if the variable with the largest weight is placed in the last position, its weight has a good chance of being negative.

Economic inequality and political instability: estimation of the latent variables with the PLS approach
(1) External estimation: $Y_1 = X_1 w_1$, $Y_2 = X_2 w_2$, $Y_3 = X_3 w_3$
(2) Internal estimation: $Z_1 = Y_3$, $Z_2 = -Y_3$, $Z_3 = Y_1 - Y_2$
(3) Calculation of $w_h$: $w_{1j} = \mathrm{cor}(X_{1j}, Z_1)$, $w_{2j} = \mathrm{cor}(X_{2j}, Z_2)$, $w_{3j} = \mathrm{cor}(X_{3j}, Z_3)$

Algorithm
• Begin with arbitrary weights $w_1$, $w_2$, $w_3$.
• Get new weights $w_h$ by using (1) to (3).
• Iterate until convergence.
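The three steps and the iteration above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not PLS-Graph's implementation: the data are synthetic (three blocks generated from made-up latent factors, not the Russett data), the function names are hypothetical, sign(cor(ξj, ξh)) is estimated by the sign of cor(Yj, Yh), and the outer estimates are standardized at each pass.

```python
import math
import random

def standardize(col):
    """Center a column and scale it to unit variance."""
    n = len(col)
    mean = sum(col) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
    return [(v - mean) / sd for v in col]

def cor(a, b):
    """Pearson correlation of two columns."""
    za, zb = standardize(a), standardize(b)
    return sum(x * y for x, y in zip(za, zb)) / len(za)

def outer_estimate(block, w):
    """Step (1): Y_h = X_h w_h, standardized."""
    n = len(block[0])
    y = [sum(wj * col[i] for wj, col in zip(w, block)) for i in range(n)]
    return standardize(y)

def pls_mode_a_centroid(blocks, links, max_iter=200, tol=1e-8):
    """blocks: list of blocks, each a list of columns.
    links: set of index pairs (h, j) of latent variables that are related."""
    blocks = [[standardize(col) for col in block] for block in blocks]
    weights = [[1.0] * len(block) for block in blocks]  # arbitrary start
    for _ in range(max_iter):
        ys = [outer_estimate(b, w) for b, w in zip(blocks, weights)]
        new_weights = []
        for h, block in enumerate(blocks):
            # Step (2): centroid scheme, signed sum of the related Y_j
            z = [0.0] * len(ys[h])
            for j in range(len(blocks)):
                if (h, j) in links or (j, h) in links:
                    sign = 1.0 if cor(ys[j], ys[h]) >= 0 else -1.0
                    z = [zi + sign * yj for zi, yj in zip(z, ys[j])]
            # Step (3): Mode A weights, w_hj = cor(X_hj, Z_h)
            new_weights.append([cor(col, z) for col in block])
        change = max(abs(old - new)
                     for w_old, w_new in zip(weights, new_weights)
                     for old, new in zip(w_old, w_new))
        weights = new_weights
        if change < tol:
            break
    scores = [outer_estimate(b, w) for b, w in zip(blocks, weights)]
    return weights, scores

# Synthetic example (assumption): f3 depends positively on f1, negatively on f2.
random.seed(0)
n = 200
f1 = [random.gauss(0, 1) for _ in range(n)]
f2 = [random.gauss(0, 1) for _ in range(n)]
f3 = [0.4 * u - 0.7 * v + 0.5 * random.gauss(0, 1) for u, v in zip(f1, f2)]

def noisy(f, sign=1.0):
    """A manifest variable: signed factor plus noise."""
    return [sign * v + 0.5 * random.gauss(0, 1) for v in f]

blocks = [
    [noisy(f1), noisy(f1), noisy(f1)],   # block 1, three indicators
    [noisy(f2), noisy(f2, -1.0)],        # block 2, second indicator reversed
    [noisy(f3), noisy(f3), noisy(f3)],   # block 3
]
links = {(0, 2), (1, 2)}                 # LV1 -> LV3 and LV2 -> LV3
weights, scores = pls_mode_a_centroid(blocks, links)
```

The overall sign of each latent variable score remains arbitrary in this sketch, which is why a sign-control convention such as the weight initialization described above is useful in practice.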
Use of PLS-Graph (Wynne Chin)

Results: Outer Model

Block       Variable   Weight    Loading (correlation)
Ineg_agri   gini        0.4567    0.9745
            farm        0.5125    0.9857
            rent        0.1018    0.5156
Dev_ind     gnpr        0.5113    0.9501
            labo       -0.5384   -0.9551
Inst_pol    inst        0.1187    0.3676
            ecks        0.2855    0.8241
            death       0.2977    0.7910
            demostab   -0.3271   -0.8635
            demoinst    0.0370    0.1037
            dictatur    0.2758    0.7227

Loading = regression coefficient of $X_{hj}$ on $Y_h$; it equals $\mathrm{cor}(X_{hj}, Y_h)$ when the X variables are standardized.

Results: latent variables (Eta)

        ineg_agr   dev_indu   inst_pol
arg       .964       .238       .755
aus      1.204      1.371     -1.617
aut       .397       .253      -.480
bel      -.812      1.530      -.846
bol      1.115     -1.584      1.505
bré       .778      -.654       .302
...
tai      -.009      -.898      -.068
ru        .134      2.059     -1.046
eu        .193      2.016      -.942
uru       .699       .179     -1.298
ven      1.149       .252      1.135
rfa      -.212      1.104      -.494
you     -2.189      -.654       .125

PLS results: latent variable estimation

Country        Y1      Y2      Y3
Argentina     0.96    0.24    0.75
Australia     1.20    1.37   -1.62
Austria       0.39    0.25   -0.48
...
France       -0.88    0.80    0.56
Yugoslavia   -2.19   -0.65    0.13

Multiple regression of Y3 on Y1 and Y2 (R2 = 0.618):
Political instability = 0.217 Agricultural inequality - 0.692 Industrial development
Student t from the multiple regression: 2.24 and -7.22, respectively.

[Figure: path diagram annotated with the loadings (GINI .974, FARM .986, RENT .516; GNPR .950, LABO -.955; INST .368, ECKS .824, DEAT .791, D-STB -.864, D-UNS .104, DICT .723), the path coefficients .217 (Agricultural inequality to Political instability) and -.692 (Industrial development to Political instability), and R2 = 0.618.]

Map of countries: Y1 = agricultural inequality, Y2 = industrial development
[Figure: scatter plot of the 47 countries, Y1 on the horizontal axis (-2.5 to 1.5) and Y2 on the vertical axis (-2.0 to 2.0). Each country is labeled with its regime type.]
Regime types shown on the map:
(1) Stable democracy: United Kingdom, United States, Canada, Switzerland, Belgium, Sweden, Australia, New Zealand, Netherlands, Luxembourg, Denmark, Norway, Ireland, Uruguay, India.
(2) Unstable democracy: West Germany, France, Finland, Austria, Italy, Argentina, Chile, Japan, Colombia, Greece, Costa Rica, Brazil.
(3) Dictatorship: Venezuela, Cuba, Poland, Panama, Yugoslavia, Nicaragua, Spain, El Salvador, Ecuador, Philippines, Dominican Republic, Taiwan, Guatemala, Peru, Iraq, South Vietnam, Honduras, Egypt, Libya, Bolivia.

Results: Inner Model

Block      Mult.RSq
Inégalit    0.0000
Développ    0.0000
Instabil    0.6180

Path coefficients

           Ineg_agr   Dev_indu   Inst_pol
Ineg_agr    0.000      0.000      0.000
Dev_indu    0.000      0.000      0.000
Inst_pol    0.217     -0.692      0.000

Correlations of latent variables

           Ineg_agr   Dev_indu   Inst_pol
Ineg_agr    1.000
Dev_indu   -0.309      1.000
Inst_pol    0.431     -0.759      1.000

Results: Outer Model

Block       Variable   Weight    Loading   Communality   Redundancy
Ineg_agri   gini        0.4567    0.9745     0.9496        0.0000
            farm        0.5125    0.9857     0.9716        0.0000
            rent        0.1018    0.5156     0.2659        0.0000
Dev_indu    gnpr        0.5113    0.9501     0.9027        0.0000
            labo       -0.5384   -0.9551     0.9123        0.0000
Inst_pol    inst        0.1187    0.3676     0.1352        0.0835
            ecks        0.2855    0.8241     0.6792        0.4197
            death       0.2977    0.7910     0.6257        0.3867
            demostab   -0.3271   -0.8635     0.7457        0.4608
            demoinst    0.0370    0.1037     0.0107        0.0066
            dictatur    0.2758    0.7227     0.5223        0.3228

Average redundancy of the Inst_pol block = 0.28
Communality = $\mathrm{Cor}^2(X_{hj}, Y_h)$ = Loading$^2$
For an endogenous LV: Redundancy = $\mathrm{Cor}^2(X_{hj}, Y_h) \times R^2(Y_h, \text{LVs explaining } Y_h)$

Results: Inner Model

Block       Mult.RSq   AvCommun   AvRedund   Goodness of Fit
Ineg_agri    0.0000     0.7290     0.0000
Dev_indu     0.0000     0.9075     0.0000
Inst_pol     0.6180     0.4531     0.2800
Average      0.6180     0.6110     0.2800        .614

The average R2 (.618) measures the value of the internal model, the average communality (.611) the value of the external model, and
    $\mathrm{GoF} = \sqrt{.618 \times .611} = .614$

    $(\text{Average communality})_h = \frac{1}{p_h} \sum_{j=1}^{p_h} \mathrm{Cor}^2(X_{hj}, Y_h)$
    = average variance of $X_h$ explained by $Y_h$ = $\mathrm{AVE}_h$

A latent variable must explain at least 50% of its block variance.
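The communality, redundancy, AVE and GoF definitions above can be checked against the reported tables with a short Python sketch. The loadings and the R2 of Inst_pol are copied from the outer and inner model results; the variable names are illustrative only, and the average communality is taken over all 11 manifest variables.

```python
import math

# Loadings from the outer-model table (block -> variable -> loading).
loadings = {
    "Ineg_agri": {"gini": 0.9745, "farm": 0.9857, "rent": 0.5156},
    "Dev_indu": {"gnpr": 0.9501, "labo": -0.9551},
    "Inst_pol": {"inst": 0.3676, "ecks": 0.8241, "death": 0.7910,
                 "demostab": -0.8635, "demoinst": 0.1037, "dictatur": 0.7227},
}
# R2 of each latent variable; only Inst_pol is endogenous.
r2 = {"Ineg_agri": 0.0, "Dev_indu": 0.0, "Inst_pol": 0.618}

# Communality = squared loading.
communality = {b: {v: l ** 2 for v, l in vs.items()} for b, vs in loadings.items()}

# Redundancy = communality * R2 of the block's latent variable.
redundancy = {b: {v: c * r2[b] for v, c in cs.items()} for b, cs in communality.items()}

# AVE_h = average communality within block h.
ave = {b: sum(cs.values()) / len(cs) for b, cs in communality.items()}

# Average communality over all manifest variables (block sizes 3, 2 and 6).
n_mv = sum(len(cs) for cs in communality.values())
avg_communality = sum(sum(cs.values()) for cs in communality.values()) / n_mv

# Average R2 over the endogenous latent variables only.
endogenous = [b for b, value in r2.items() if value > 0]
avg_r2 = sum(r2[b] for b in endogenous) / len(endogenous)

# GoF = geometric mean of inner-model and outer-model quality.
gof = math.sqrt(avg_r2 * avg_communality)

for block, value in ave.items():
    flag = "ok" if value >= 0.5 else "below 50%"
    print(f"{block}: AVE = {value:.4f} ({flag})")
print(f"Average communality = {avg_communality:.4f}, GoF = {gof:.3f}")
```

With these inputs only Inst_pol falls below the 50% threshold, which is consistent with the AvCommun column of the inner model table.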
Average Communality = (3*AvCommun1 + 2*AvCommun2 + 6*AvCommun3)/11

A global index of model fit: PLS Goodness of Fit

Inner Model
Block       Mult.RSq   AvCommun   AvRedund   GoF
Ineg_agri    0.0000     0.7290     0.0000
Dev_indu     0.0000     0.9075     0.0000
Inst_pol     0.6180     0.4531     0.2800
Average      0.618      0.6110     0.2800    .614

    $\mathrm{GoF} = \sqrt{\left[\frac{1}{\text{Nb of endogenous LVs}} \sum_{h\ \text{endogenous}} R^2(Y_h, \text{other LVs explaining } Y_h)\right] \times \left[\frac{1}{\text{Nb of MVs}} \sum_h \sum_{j=1}^{p_h} \mathrm{Cor}^2(X_{hj}, Y_h)\right]}$
The first factor measures the inner model, the second the outer model.

Discriminant validity
A LV should explain its own MVs better than it explains the other LVs.

AVE (diagonal) and squared correlations (off-diagonal)
           Ineg_agr   Dev_indu   Inst_pol
Ineg_agr    0.729
Dev_indu    0.095      0.907
Inst_pol    0.186      0.576      0.453 ????

$\mathrm{AVE}(Y_j)$ must be larger than $\mathrm{cor}^2(Y_j, Y_h)$ for all h. The criterion is not met for Inst_pol (0.453 < 0.576).

Using PLS-Graph
[Figure: PLS-Graph path diagram with bootstrap t-statistics t = 1.705 (Agricultural inequality) and t = -7.685 (Industrial development). The t values come from bootstrap re-sampling.]

Bootstrap validation in PLS-Graph
Sign control: Individual sign changes / Construct level changes (*)

Outer Model Loadings
                            Entire    Mean of      Standard   T-Statistic
                            sample    subsamples   error
                            estimate
Agricultural inequality:
  gini                      0.9745     0.9584      0.0336      28.9616
  farm                      0.9857     0.9689      0.0329      29.9339
  rent                      0.5156     0.4204      0.2462       2.0946
Industrial development:
  gnpr                      0.9501     0.9489      0.0121      78.3692
  labo                     -0.9551    -0.9536      0.0107     -89.1493
Political instability:
  inst                      0.3676     0.3347      0.1756       2.0932
  ecks                      0.8241     0.8138      0.0699      11.7920
  demostab                 -0.8635    -0.8520      0.0667     -12.9419
  demoinst                  0.1037     0.0955      0.1611       0.6438
  dictatur                  0.7227     0.7195      0.0841       8.5915
  death                     0.7910     0.7977      0.0528      14.9773
(*) option used here

Sign control options
Individual sign changes: the sign of each bootstrapped weight is automatically set equal to the sign of the corresponding full-sample weight.
Construct level changes (default): for each LV (construct), the signs of all weights are reversed together if the loadings after reversal are closer to the full-sample loadings than the bootstrapped loadings before reversal.

PLS-Graph: bootstrap validation of the path coefficients
                            Agr. ineq. -> Pol. inst.   Ind. dev. -> Pol. inst.
Entire sample estimate            0.2170                    -0.6920
Mean of subsamples                0.2328                    -0.6743
Standard error                    0.1272                     0.0900
T-statistic                       1.7054                    -7.6855
All other entries of the 3 x 3 path coefficient tables are 0.0000.

SPECIAL CASES OF PLS PATH MODELLING
• Principal component analysis
• Multiple factor analysis
• Canonical correlation analysis
• Redundancy analysis
• PLS Regression
• Generalized canonical correlation analysis (Horst)
• Generalized canonical correlation analysis (Carroll)

Options of the PLS algorithm
External estimation: $Y_j = X_j w_j$
  Mode A (for reflective): $w_{jh} = \mathrm{cor}(X_{jh}, Z_j)$
  Mode B (for formative): $w_j = (X_j' X_j)^{-1} X_j' Z_j$
Internal estimation: $Z_j = \sum_i e_{ji} Y_i$
  Centroid scheme: $e_{ji}$ = sign of $\mathrm{cor}(Y_i, Y_j)$
  Factorial scheme: $e_{ji} = \mathrm{cor}(Y_i, Y_j)$
  Path weighting scheme: $e_{ji}$ = regression coefficient in the regression of $Y_j$ on the $Y_i$'s

The general PLS algorithm
Initial step: choose $w_j$.
Outer estimation (standardized): $Y_j = X_j w_j$
Inner estimation: $Z_j = \sum_m e_{jm} Y_{jm}$, where $Y_{j1}, \dots, Y_{jm}$ are the outer estimates of the LVs connected to LV j. Choice of weights e: centroid, factorial or path weighting scheme.
Weight update:
  Mode A: $w_j = \frac{1}{n} X_j' Z_j$, so that $w \propto \mathrm{cor}(X, Z)$
  Mode B: $w_j = \left(\frac{1}{n} X_j' X_j\right)^{-1} \left(\frac{1}{n} X_j' Z_j\right)$
Look at the loadings, not at the weights w.

Some modified multi-block methods for SEM
$c_{jk} = 1$ if blocks are linked, 0 otherwise.

Method                              Criterion                                                                                          PLS equivalent
SUMCOR (Horst, 1961)                $\max \sum_{j,k} c_{jk}\,\mathrm{Cor}(F_j, F_k)$                                                   Mode B, Horst scheme (new)
Mathes (1993), Hanafi (2004)        $\max \sum_{j,k} c_{jk}\,\mathrm{Cor}^2(F_j, F_k)$                                                 Mode B, factorial scheme
Mathes (1993), Hanafi (2004)        $\max \sum_{j,k} c_{jk}\,|\mathrm{Cor}(F_j, F_k)|$                                                 Mode B, centroid scheme
MAXBET (Van de Geer, 1984)          $\max_{\text{all } \|w_j\| = 1} [\sum_j \mathrm{Var}(X_j w_j) + \sum_{j \neq k} c_{jk}\,\mathrm{Cov}(X_j w_j, X_k w_k)]$
MAXDIFF (Van de Geer, 1984)         $\max_{\text{all } \|w_j\| = 1} \sum_{j \neq k} c_{jk}\,\mathrm{Cov}(X_j w_j, X_k w_k)$             Mode A, Horst scheme (new approach)
MAXDIFF B (Hanafi & Kiers, 2006)    $\max_{\text{all } \|w_i\| = 1} \sum_{i \neq j} c_{ij}\,\mathrm{Cov}^2(X_i w_i, X_j w_j)$          Mode A, factorial scheme (new approach)

PLS approach: 2 blocks
[Figure: two blocks X1 and X2 with latent variables $\xi_1$ and $\xi_2$.]

Mode for weight calculation     Method                                               Deflation (*)
Y1 = X1w1      Y2 = X2w2
A              A                PLS regression of X2 on X1                           On X1 only
B              A                Redundancy analysis of X2 with respect to X1         On X1 only
A              A                Tucker inter-battery factor analysis                 On X1 and X2
B              B                Canonical correlation analysis                       On X1 and X2

(*) Deflation: working on the residuals of the regression of X on the previous LVs in order to obtain orthogonal LVs.

PLS regression (2 components)
- Mode A for X
- Mode A for Y
- Deflate only X
Criterion for dimension 1 (dimension 2 is obtained after deflation):
    $\max_{\|a\| = \|b\| = 1} \mathrm{Cov}(Xa, Yb) = \max_{\|a\| = \|b\| = 1} \mathrm{Cor}(Xa, Yb)\,\sqrt{\mathrm{Var}(Xa)}\,\sqrt{\mathrm{Var}(Yb)}$

PLS Regression in SIMCA-P: PLS scores
[Figure: scatter plot of the countries on the first two PLS components, t[1] (horizontal, -4 to 4) and t[2] (vertical, -4 to 3).]

Correlation loadings
[Figure: correlation loadings plot, pc(corr)[Comp. 1] against pc(corr)[Comp. 2], both from -1.00 to 1.00, for RENT, GINI, FARM, GNPR, LABO, INST, ECKS, DEAT, DEMOSTB, DEMOINST and DICTATURE.]

Redundancy analysis of X on Y (2 components)
- Mode A for X
- Mode B for Y
- Deflate only X
Criterion for dimension 1:
    $\max_{\|a\| = 1,\ \mathrm{Var}(Yb) = 1} \mathrm{Cov}(Xa, Yb) = \max_{\|a\| = 1,\ \mathrm{Var}(Yb) = 1} \mathrm{Cor}(Xa, Yb)\,\sqrt{\mathrm{Var}(Xa)}$
which is equivalent to
    $\max_{\mathrm{Var}(Yb) = 1} \sum_j \mathrm{Cor}^2(x_j, Yb)$

Inter-battery factor analysis (2 components)
- Mode A for X
- Mode A for Y
- Deflate both X and Y
Criterion for dimension 1:
    $\max_{\|a\| = \|b\| = 1} \mathrm{Cov}(Xa, Yb) = \max_{\|a\| = \|b\| = 1} \mathrm{Cor}(Xa, Yb)\,\sqrt{\mathrm{Var}(Xa)}\,\sqrt{\mathrm{Var}(Yb)}$

Canonical correlation analysis (2 components)
- Mode B for X
- Mode B for Y
- Deflate both X and Y
Criterion for dimension 1:
    $\max_{\mathrm{Var}(Xa) = \mathrm{Var}(Yb) = 1} \mathrm{Cov}(Xa, Yb)$

PLS approach: K blocks
[Figure: blocks X1, ..., XK with latent variables $\xi_1, \dots, \xi_K$, each connected to the super-block X = [X1, ..., XK] with latent variable $\xi$.]

Mode for weight    Scheme for internal estimation calculation
calculation        Centroid                      Factorial                       Structural
A                  NEW !                         NEW !                           - PCA of X
                                                                                 - Multiple Factor Analysis of the Xj's
                                                                                 - ACOM (Chessel & Hanafi)
B                  Generalized Canonical         Generalized Canonical           NEW !
                   Correlation Analysis          Correlation Analysis
                   (Horst)                       (Carroll)
Deflation: on the super-block only.
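For the two-block case with Mode A on both blocks (the first component of PLS regression or of inter-battery factor analysis), the criterion Max Cov(Xa, Yb) under unit-norm a and b can be sketched with alternating Mode A updates. This is a minimal illustration on synthetic data, assuming centered blocks stored as lists of columns; the function names are hypothetical and deflation for further components is not shown.

```python
import math
import random

def center(col):
    """Subtract the mean from a column."""
    m = sum(col) / len(col)
    return [v - m for v in col]

def cov(u, v):
    """Covariance of two centered columns (divisor n)."""
    return sum(x * y for x, y in zip(u, v)) / len(u)

def unit(vec):
    """Scale a vector to Euclidean norm 1."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]

def combine(block, w):
    """Component X w for a block stored as a list of columns."""
    n = len(block[0])
    return [sum(wj * col[i] for wj, col in zip(w, block)) for i in range(n)]

def first_mode_a_pair(X, Y, n_iter=200):
    """Alternate a ∝ X'(Yb) and b ∝ Y'(Xa), renormalizing at each step."""
    a = unit([1.0] * len(X))
    b = unit([1.0] * len(Y))
    for _ in range(n_iter):
        yb = combine(Y, b)
        a = unit([cov(x, yb) for x in X])   # Mode A update for block X
        xa = combine(X, a)
        b = unit([cov(y, xa) for y in Y])   # Mode A update for block Y
    return a, b

# Synthetic blocks (assumption): both driven by one common factor f.
random.seed(0)
n = 100
f = [random.gauss(0, 1) for _ in range(n)]
X = [center([v + 0.5 * random.gauss(0, 1) for v in f]) for _ in range(3)]
Y = [center([v + 0.5 * random.gauss(0, 1) for v in f]),
     center([-v + 0.5 * random.gauss(0, 1) for v in f])]

a, b = first_mode_a_pair(X, Y)
xa, yb = combine(X, a), combine(Y, b)
best = cov(xa, yb)

# Cov(Xa, Yb) = Cor(Xa, Yb) * sqrt(Var(Xa)) * sqrt(Var(Yb))
sd_xa, sd_yb = math.sqrt(cov(xa, xa)), math.sqrt(cov(yb, yb))
decomposed = (best / (sd_xa * sd_yb)) * sd_xa * sd_yb
```

The decomposition in the last lines shows why Mode A favors components that are both well correlated across blocks and of large variance within their own block, whereas the Mode B constraint Var = 1 removes the variance terms.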
A new PLS algorithm
Arthur & Michel Tenenhaus

$c_{ij} = 1$ if blocks are linked, 0 otherwise.
Horst scheme: Maximize $\sum_{i,j} c_{ij}\,\mathrm{Cov}(X_i a_i, X_j a_j)$
Centroid scheme: Maximize $\sum_{i,j} c_{ij}\,|\mathrm{Cov}(X_i a_i, X_j a_j)|$
Factorial scheme: Maximize $\sum_{i,j} c_{ij}\,\mathrm{Cov}^2(X_i a_i, X_j a_j)$
subject to the following constraints:
    $\tau_i \|a_i\|^2 + (1 - \tau_i)\,\mathrm{Var}(X_i a_i) = 1$
For $\tau_i = 1$, Mode A: $\|a_i\|^2 = 1$
For $\tau_i = 0$, Mode B: $\mathrm{Var}(X_i a_i) = 1$

CONCLUSION
• PLS IS TO COVARIANCE-BASED SEM AS PRINCIPAL COMPONENT ANALYSIS IS TO FACTOR ANALYSIS.
• WHEN INDIVIDUAL DATA ARE AVAILABLE, SIGNIFICANCE TESTS CAN BE CARRIED OUT WITH PLS BY CROSS-VALIDATION METHODS.

European Customer Satisfaction Index: PLS Path Modelling versus LISREL
Michel Tenenhaus

The European Customer Satisfaction Index (ECSI)
• ECSI is an economic indicator that measures customer satisfaction.
• It is an adaptation of the Swedish Customer Satisfaction Barometer and the American Customer Satisfaction Index (ACSI) proposed by Claes Fornell.
• Fornell's methodology is presented.

Path model describing causes and consequences of customer satisfaction
[Figure: path diagram linking Image, Customer expectation, Perceived quality, Perceived value, Customer satisfaction, Complaints and Loyalty. Full model in red and blue, reduced model in red.]

Content of the presentation
• Use of Fornell's methodology on the full ECSI model
• Use of Fornell's methodology on the reduced model
• Use of SEM-ML on the reduced model (SEM-ML did not work on the full model)
• Comparison between PLS and SEM-ML results on the reduced model

Measurement instrument for the mobile phone industry: examples of latent and manifest variables

Customer expectation
a) Expectations for the overall quality of "your mobile phone provider" at the moment you became a customer of this provider.
b) Expectations for "your mobile phone provider" to provide products and services to meet your personal needs.
c) How often did you expect that things could go wrong at "your mobile phone provider"?

Customer satisfaction
a) Overall satisfaction
b) Fulfilment of expectations
c) How well do you think "your mobile phone provider" compares with your ideal mobile phone provider?

Customer loyalty
a) If you needed to choose a new mobile phone provider, how likely is it that you would choose "your provider" again?
b) Let us now suppose that other mobile phone providers decide to lower fees and prices, but "your mobile phone provider" stays at the same level as today. At which level of difference (in %) would you choose another phone provider?
c) If a friend or colleague asks you for advice, how likely is it that you would recommend "your mobile phone provider"?

And so on for the other latent variables.

I. Study of the complete model using Fornell's approach
• Manifest variables V are transformed from a "1-10" scale to a "0-100" scale:
    $x = (V - 1) \times \frac{100}{9}$
• Each latent variable is estimated as a weighted average of its manifest variables.
• PLS path modeling is used to estimate the weights, with the Mode A and centroid scheme options.
• Path coefficients are computed by multiple regression on the estimated latent variables, and t-statistics by cross-validation (bootstrap).
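The first two steps of Fornell's approach (rescaling and weighted averaging) can be sketched as follows. The three weights are the PLS outer weights reported for CUSA1 to CUSA3 in the results that follow; the respondent's answers are hypothetical values chosen only for illustration, and the function names are not from any package.

```python
def rescale(v):
    """Map an answer on the 1-10 scale to the 0-100 scale: x = (V - 1) * 100 / 9."""
    return (v - 1) * 100 / 9

def weighted_average(values, weights):
    """Latent variable score as a weighted average of its manifest variables."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)

# Outer weights of CUSA1, CUSA2, CUSA3 as reported in the PLS-Graph results.
cusa_weights = [0.0158, 0.0231, 0.0264]

# Hypothetical answers of one respondent on the 1-10 scale (assumption).
answers = [8, 7, 9]

rescaled = [rescale(v) for v in answers]
csi = weighted_average(rescaled, cusa_weights)
print(f"Customer Satisfaction Index for this respondent: {csi:.1f}")
```

Because the weights are divided by their sum, only their relative sizes matter: multiplying all three weights by a constant leaves the index unchanged, and the score always stays within the 0-100 range of the rescaled items.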
Use of PLS-Graph (Wynne Chin)

Results: the weights (Outer Model)

Block      Variable   Weight        Block      Variable   Weight
Image      IMAG1      0.0147        Per_Valu   PERV1      0.0239
           IMAG2      0.0127                   PERV2      0.0247
           IMAG3      0.0137        Satisfac   CUSA1      0.0158
           IMAG4      0.0177                   CUSA2      0.0231
           IMAG5      0.0143                   CUSA3      0.0264
Expectat   CUEX1      0.0232        Complain   CUSCO      0.0397
           CUEX2      0.0224        Loyalty    CUSL1      0.0185
           CUEX3      0.0252                   CUSL2      0.0061
Per_Qual   PERQ1      0.0098                   CUSL3      0.0225
           PERQ2      0.0085
           PERQ3      0.0118
           PERQ4      0.0094
           PERQ5      0.0084
           PERQ6      0.0095
           PERQ7      0.0129

Fornell's computation of the latent variables
Example: Customer Satisfaction Index
    $\mathrm{CSI} = \frac{0.0158\,\mathrm{CUSA1} + 0.0231\,\mathrm{CUSA2} + 0.0264\,\mathrm{CUSA3}}{0.0158 + 0.0231 + 0.0264}$

Mean and standard deviation of the latent variables

Latent variable           N     Minimum   Maximum   Mean      Std. Deviation
IMAGE                     250    26.49    100.00    72.6878   13.7660
CUSTOMER EXPECTATION      250    25.85    100.00    72.3198   14.1259
PERCEIVED QUALITY         250    23.95    100.00    74.5765   14.2573
PERCEIVED VALUE           250      .00    100.00    61.5887   20.5987
CUSTOMER SATISFACTION     250    23.68    100.00    71.2876   15.3417
COMPLAINT                 250      .00    100.00    67.4704   25.2684
LOYALTY                   250     1.29    100.00    69.1757   21.2668

Correlations between manifest variables and latent variables
[Table: correlations between the 24 manifest variables (Image1-5, C_exp1-3, P_qual1-7, P_val1-2, C_sat1-3, Complaint, Loyalty1-3) and the 7 latent variables (Image, Customer expectation, Perceived quality, Perceived value, Customer satisfaction, Complaint, Loyalty). Correlations below 0.5 in absolute value are not shown. The cell layout is not recoverable from the extraction.]

ECSI path model for a "mobile phone provider"
Regression on standardized variables; t-statistics provided by PLS-Graph bootstrap, construct level change option.
[Figure: path diagram with path coefficients and t-statistics in parentheses. R2 values: Customer expectation .242, Perceived quality .296, Perceived value .335, Customer satisfaction .672, Complaint .292, Loyalty .432.]

II. Study of the reduced model using Fornell's approach
[Figure: path diagram of the reduced model (Customer expectation, Perceived quality, Perceived value, Customer satisfaction, Loyalty) with path coefficients and t-statistics in parentheses. R2 values: Perceived quality .297, Perceived value .335, Customer satisfaction .660, Loyalty .402.]

The new PLS weights

Variable   Weight    Relative weight      Variable   Weight    Relative weight
CE1         .0237     .336                PV1        -.0239     .492
CE2         .0206     .292                PV2        -.0247     .508
CE3         .0262     .372                CS1         .0157     .241
PQ1         .0098     .139                CS2         .0240     .368
PQ2         .0085     .121                CS3         .0256     .392
PQ3         .0118     .168                CL1        -.0188     .405
PQ4         .0094     .134                CL2        -.0050     .108
PQ5         .0084     .119                CL3        -.0226     .487
PQ6         .0095     .135
PQ7         .0129     .183

For each latent variable the relative weights sum to 1.

III.
Study of the reduced model using AMOS: Model 1 (standardized results)
[Figure: AMOS path diagram of the reduced model with latent variables CUS_EXP, PER_QUAL, PER_VAL, CSI and CUS_LOY, manifest variables CE1-CE3, PQ1-PQ7, PV1-PV2, CSI1, CSI2, CS3 and CL1-CL3, error terms e1-e18 and disturbances d1-d4. Chi-Square = 271, DF = 128, Chi-Square/DF = 2.12.]

Reduced model 2 (standardized results)
[Figure: AMOS path diagram, same variables. Chi-Square = 271, DF = 130, Chi-Square/DF = 2.08, RMSEA = 0.066, H0: RMSEA ≤ 0.05: p-value = 0.01.]

Reduced model 2 (unstandardized results)
[Figure: AMOS path diagram, same variables. Chi-Square = 271.118, df = 130, Chi-Square/df = 2.086, rmsea = .066, p-value (rmsea ≤ .05) = .010.]

Specific estimation of the latent variables
• Each latent variable is estimated as a weighted average of its own manifest variables, using the loadings $\lambda_{hj}$.
• For example
    $Y_4 = \frac{\lambda_{41}\,\mathrm{CUSA1} + \lambda_{42}\,\mathrm{CUSA2} + \lambda_{43}\,\mathrm{CUSA3}}{\lambda_{41} + \lambda_{42} + \lambda_{43}}$
  is the Customer Satisfaction Index score.
• Each coefficient $\lambda_{4j}$ is the regression coefficient of $\xi_4$ in the regression relating the manifest variable $X_{4j}$ to its latent variable $\xi_4$ (similar to PLS weight estimation when Mode A is used).

Loadings and LISREL weights

Variable   Loading   Weight       Variable   Loading   Weight
CE1         1.000     .343        PV1         1.000     .483
CE2         0.913     .313        PV2         1.069     .517
CE3         1.004     .344        CS1         1.000     .239
PQ1         1.000     .133        CS2         1.549     .370
PQ2         0.988     .132        CS3         1.634     .391
PQ3         1.241     .165        CL1         1.000     .424
PQ4         1.061     .141        CL2         0.202     .086
PQ5         0.911     .121        CL3         1.155     .490
PQ6         1.045     .139
PQ7         1.265     .168

Comparison between the PLS and LISREL weights
[Figure: scatter plot of the LISREL weight (vertical axis, 0.0 to .6) against the PLS relative weight (horizontal axis, .1 to .6).]

Correlations between the PLS latent variables and the specific LISREL latent variables
[Figure: five scatter plots of the PLS scores against the specific LISREL scores, for CUS_EXP, PER_QUAL, PER_VAL, CSI and CUS_LOY.]
All the correlations are above .998.

First conclusions
• If COV-BASED SEM works, the PLS results can be derived from the COV-BASED SEM results.
• If COV-BASED SEM does not work, PLS is still an alternative.
• If COV-BASED SEM is not adequate (small number of observations and/or large number of variables), PLS can be used for exploratory purposes.

Usual estimation of latent variables in LISREL

    proc calis covariance modification data=ecsi outstat=a;
      lineqs
        CUEX1 = 1 f1 + e1,
        CUEX2 = Lambda12 f1 + e2,
        CUEX3 = Lambda13 f1 + e3,
        . . .
        CUSL1 = 1 f5 + e16,
        CUSL2 = Lambda52 f5 + e17,
        CUSL3 = Lambda53 f5 + e18,
        f2 = beta21 f1 + d2,
        f3 = beta31 f1 + beta32 f2 + d3,
        f4 = beta41 f1 + beta42 f2 + beta43 f3 + d4,
        f5 = beta54 f4 + d5;
      std
        e1-e18 = vare1-vare18,
        d2-d5 = vard2-vard5,
        f1 = varf1;
      var CUEX1 CUEX2 CUEX3
          PERQ1 PERQ2 PERQ3 PERQ4 PERQ5 PERQ6 PERQ7
          PERV1 PERV2 CUSA1 CUSA2 CUSA3
          CUSL1 CUSL2 CUSL3;
    run;

    proc print data=a (where = (_type_="SCORE"));
    run;

Variable weights for the usual estimation of the latent variables in LISREL
[Table: score weights of the 18 manifest variables (CUEX1-3, PERQ1-7, PERV1-2, CUSA1-3, CUSL1-3) for the latent variables f1 to f5. The extraction preserves each row of five weights but not the row labels, so the rows are listed as extracted.]

   f1          f2          f3          f4          f5
   0.046274    0.081633    0.007765    0.028778    0.019737
   0.027752    0.042369    0.024356    0.083861    0.057516
   0.11102     0.03242    -0.00083     0.01321     0.00906
   0.074334    0.021705   -0.000558    0.008842    0.006064
   0.055776    0.016287   -0.000418    0.006634    0.004550
   0.07362     0.12987     0.01235     0.04578     0.03140
   0.024507    0.043233    0.004112    0.015241    0.010453
   0.050785    0.089590    0.008522    0.031583    0.021661
   0.049023    0.086483    0.008227    0.030488    0.020910
   0.046702    0.082388    0.007837    0.029044    0.019920
   0.051524    0.090894    0.008646    0.032043    0.021977
   0.03606     0.05506     0.03165     0.10898     0.07474
   0.00387     0.00591     0.00340     0.01170     0.10838
   0.000423    0.000645    0.000371    0.001277    0.011830
   0.01533     0.02340     0.01345     0.04632     0.42895
  -0.00071     0.00466     0.10833     0.00992     0.00680
  -0.00440     0.02873     0.66864     0.06122     0.04199
   0.030850    0.047098    0.027074    0.093221    0.063936

Correlations between the PLS latent variables and the usual estimated LISREL latent variables
[Figure: five scatter plots of the PLS scores against the usual LISREL scores, for CUS_EXP, PER_QUAL, PER_VAL, CSI (F4) and CUS_LOY (F5). Rsq values shown on the panels: 0.6227, 0.8981, 0.9599, 0.9010 and 0.8592; the assignment of values to panels is not recoverable from the extraction.]

Final conclusions
• COV-BASED SEM did not work on
the full model.
• COV-BASED SEM gives better results for the inner model (relating the latent variables to one another) because the latent variables are "space-free", that is, not constrained to lie in the space of their manifest variables.
• PLS gives better results for the outer model (relating the manifest variables to their latent variables) because each latent variable is constrained to lie in the space of its own manifest variables.
• If each COV-BASED SEM latent variable is estimated as a weighted average of its own manifest variables, then COV-BASED SEM and PLS give almost identical latent variable estimates (at least on the examples we have studied).

PLS Path Modeling and Multiple Table Analysis
Application to the study of the cosmetic habits of women in Ile-de-France
Christiane Guinot (CERIES/CHANEL) & Michel Tenenhaus (HEC)

Objective of the analysis
We applied the PLS approach to a study of the cosmetic habits of women living in Ile-de-France.
The aim of the project was to obtain a global score describing the propensity to use cosmetic products in this sample.
We then used behavioural and skin characteristic variables, which are known to account for variation in the use of cosmetic products, to check the relevance of this score.

Data
The cosmetic products were divided into four blocks corresponding to different cosmetic practices:
Body care: soap, liquid soap, moisturising body cream, hand creams
Face care: make-up and eye make-up removers, tonic lotions, day creams, night creams, exfoliation products
Make-up: blushers, mascaras, eye shadows, eye pencils, lipsticks, lip shiners and nail polish
Sun care: sun protection products for face and for body, after-sun products for face and for body

Construction of a global score
[Figure: path diagram. The manifest variables of the four cosmetic practices (body care, face care, make-up, sun care) define the partial scores $\xi_1$ to $\xi_4$ (latent variables, B), which are connected to the global score (latent variable, A). Inner model: centroid scheme.]

Results
[Figure: path diagram of the propensity to use cosmetic products, showing the correlations and the regression coefficients between each of the 20 manifest variables (soap body, liquid soap body, body cream, hand cream; make-up remover, tonic lotion, eye make-up remover, day cream, night cream, exfoliation products; blusher, mascara, eye shadow, eye pencil, lipstick, nail polish; sun protection face, after sun face, sun protection body, after sun body), the four partial scores and the global score. The assignment of values to arrows is not recoverable from the extraction.]

Result: global score
Global score = -3.40
  - .11 * soaps and toilet soaps for body care
  + .20 * liquid soaps for body care
  + .38 * moisturising body creams and milks
  + .25 * hand creams and milks
  + .21 * make-up removers
  + .26 * tonic lotions
  + .30 * eye make-up removers
  + .39 * moisturising day creams
  + .30 * moisturising night creams
  + .30 * exfoliation products
  + .26 * blushers
  + .41 * mascara
  + .26 * eye shadows
  + .20 * eye pencils
  + .33 * lipsticks and lip shiners
  + .20 * nail polish
  + .36 * sun protection products for the face
  + .31 * moisturising after sun products for the face
  + .38 * sun protection products for the body
  + .34 * moisturising after sun products for the body

Results: partial scores
[Figure: four panels, one per partial score: body-care, facial-care, make-up and sun-care.]

Result: global score
Correlations between the partial scores and the global score

                 S_body-care   S_facial-care   S_make-up   S_sun-care
S_facial-care      0.24001
S_make-up          0.13462       0.35035
S_sun-care         0.16500       0.19075        0.14273
S_global           0.50263       0.71846        0.67347     0.62071

Relevance of the global score: factors influencing the use of cosmetic products
To identify the behavioural and skin characteristic variables which best account for the variation in the use of cosmetic products, we can relate the global score to the following variables:
Professional activity & socio-professional category
Children Sun exposure habits Practice of sport Importance of physical appearance Type of facial skin & type of body skin Age 84 Relevance of the global score E(Score global) = -1.02 +.21 +.07 +.00 +.27 * * * * professional activity housewife or student retired CSP A (craftsmen, trades people, business managers, managerial staff,academics and professionals) +.09 * CSP B (farmers and intermediary professions) +.05 * CSP C (employees and working class people) +.00 * CSP D (retired and non working people) -.21 * without child +.00 * with child +.40 * habits of deliberate exposure to sunlight +.09 * previous habits of deliberate exposure to sunlight +.00 * no habits of deliberate exposure to sunlight -.17 * no sport practised +.00 * sport practised +1.04 * physical appearance is of extreme importance +.89 * physical appearance is of high importance +.50 * physical appearance is of some importance +.00 * physical appearance is of little importance -.06 * oily facial skin +.16 * combination facial skin -.20 * normal facial skin +.00 * dry facial skin -.32 * oily body skin -.57 * combination body skin -.32 * normal body skin +.00 * dry body skin -.00 * age 85 A good profile E(Global score)= -1.02 +.21 +.07 +.00 +.27 1.06 * * * * professional activity housewife or student retired CSP A (craftsmen, trades people, business managers, managerial staff, academics and professionals) +.09 * CSP B (farmers and intermediary professions) +.05 * CSP C (employees and working class people) +.00 * CSP D (retired and non working people) - .21 * without child +.00 * with child +.40 * habits of deliberate exposure to sunlight +.09 * previous habits of deliberate exposure to sunlight +.00 * no habits of deliberate exposure to sunlight - .17 * no sport practised +.00 * sport practised +1.04 * physical appearance is of extreme importance +.89 * physical appearance is of high importance +.50 * physical appearance is of some importance +.00 * physical appearance is of little importance - .06 * 
oily facial skin +.16 * combination facial skin - .20 * normal facial skin +.00 * dry facial skin - .32 * oily body skin - .57 * combination body skin - .32 * normal body skin +.00 * dry body skin - .00 * age 86 A bad profile E(Global score)= -1.02 +.21 * professional activity +.07 * housewife or student +.00 * retired +.27 * CSP A (craftsmen, trades people, business managers, managerial staff, academics and professionals) +.09 * CSP B (farmers and intermediary professions) +.05 * CSP C (employees and working class people) +.00 * CSP D (retired and non working people) - .21 * without child +.00 * with child +.40 * habits of deliberate exposure to sunlight +.09 * previous habits of deliberate exposure to sunlight +.00 * no habits of deliberate exposure to sunlight - .17 * no sport practised +.00 * sport practised +1.04 * physical appearance is of extreme importance +.89 * physical appearance is of high importance +.50 * physical appearance is of some importance +.00 * physical appearance is of little importance - .06 * oily facial skin +.16 * combination facial skin - .20 * normal facial skin +.00 * dry facial skin - .32 * oily body skin - .57 * combination body skin - .32 * normal body skin +.00 * dry body skin - .00 * age -2.00 87 Conclusion Using PLS approach, we obtain a score presenting the propensity to use cosmetic products by balancing the different types of cosmetic products better than using principal component analysis. 88 Final conclusion « All the proofs of a pudding are in the eating, not in the cooking ». William Camden (1623) 89 Some references on PLS Path Modeling • CHIN W.W. (2001) : PLS-Graph User’s Guide, C.T. Bauer College of Business, University of Houston, Houston. • CHIN W.W. (1998) : “The partial least squares approach for structural equation modeling”, in: G.A. Marcoulides (Ed.) Modern Methods for Business Research, Lawrence Erlbaum Associates, pp. 295-336. • FORNELL C. 
(1992) : “A National Customer Satisfaction Barometer: The Swedish Experience”, Journal of Marketing, Vol. 56, 6-21. • FORNELL C. & CHA J. (1994) : “Partial Least Squares”, in Advanced Methods of Marketing Research, R.P. Bagozzi (Ed.), Basil Blackwell, Cambridge, MA., pp. 52-78. • GUINOT, C., LATREILLE, J. & TENENHAUS M.: “PLS Path Modeling and Analysis of Multiple Tables”, Chemometrics and Intelligent Laboratory Systems, Special issue on PLS methods, 58, 2001 (with C. Guinot and J. Latreille). • LOHMÖLLER J.-B. (1987) : LVPLS Program Manual, Version 1.8, Zentralarchiv für Empirische Sozialforschung, Köln. 90 • LOHMÖLLER J.-B. (1989) : Latent Variables Path Modeling with Partial Least Squares, Physica-Verlag, Heildelberg. • PAGÈS J. & TENENHAUS M. (2001) : "Multiple Factor Analysis and PLS Path Modeling", Chemometrics and Intelligent Laboratory Systems, 58, 261-273. • TENENHAUS M. (1998) : La Régression PLS. Éditions Technip, Paris • TENENHAUS M. (1999) : “L’approche PLS”, Revue de Statistique Appliquée, vol. 47, n°2, pp. 5-40. • TENENHAUS M., ESPOSITO VINZI V., CHATELIN Y.-M., LAURO, C. (2005): "PLS Path Modeling", Computational Statistics and Data Analysis. • WOLD H. (1985) : “Partial Least Squares”, in Encyclopedia of Statistical Sciences, vol. 6, Kotz, S & Johnson, N.L. (Eds), John Wiley & Sons, New York, pp. 581-591. 91
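Worked check of the "A good profile" slide
The regression equation for E(Global score) can be evaluated by adding to the intercept the coefficient of one level per factor. The sketch below is only an illustration: the coefficients are those printed on the "Relevance of the global score" slide, but the grouping into factors, the short level names and the function names are assumptions introduced here, and the age term is left out because its coefficient is printed as -.00.

```python
# Coefficients copied from the "Relevance of the global score" slide.
# Factor grouping and short level names are illustrative assumptions.
INTERCEPT = -1.02
COEFFS = {
    "status": {"professional activity": 0.21, "housewife or student": 0.07, "retired": 0.00},
    "csp": {"A": 0.27, "B": 0.09, "C": 0.05, "D": 0.00},
    "child": {"without": -0.21, "with": 0.00},
    "sun": {"habits": 0.40, "previous habits": 0.09, "no habits": 0.00},
    "sport": {"none": -0.17, "practised": 0.00},
    "appearance": {"extreme": 1.04, "high": 0.89, "some": 0.50, "little": 0.00},
    "facial_skin": {"oily": -0.06, "combination": 0.16, "normal": -0.20, "dry": 0.00},
    "body_skin": {"oily": -0.32, "combination": -0.57, "normal": -0.32, "dry": 0.00},
}


def expected_global_score(profile):
    """Intercept plus the coefficient of the chosen level of each factor.

    `profile` maps a factor name to one of its levels. Age is ignored
    (assumption: its printed coefficient of -.00 is treated as zero).
    """
    total = INTERCEPT + sum(COEFFS[factor][level] for factor, level in profile.items())
    return round(total, 2)


def best_profile():
    """Pick, for every factor, the level with the largest coefficient."""
    return {factor: max(levels, key=levels.get) for factor, levels in COEFFS.items()}


good = best_profile()
print(good)
print(expected_global_score(good))
```

Under these assumptions, choosing the highest-coefficient level of every factor (professional activity, CSP A, with child, habits of deliberate sun exposure, sport practised, physical appearance of extreme importance, combination facial skin, dry body skin) reproduces the value 1.06 shown on the slide. The levels behind the -2.00 of the "A bad profile" slide are not visible in the text, so that value is not reproduced here.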