SmartPLS Bootstrapping

PLS-SEM: Introduction Continued
(Part 2)
Joe F. Hair, Jr.
Founder & Senior Scholar, DBA Program
Systematic Process for applying PLS-SEM
Stage 1: Specifying the Structural Model
Stage 2: Specifying the Measurement Models
Stage 3: Data Collection and Examination
Stage 4: PLS-SEM Model Estimation
Stage 5a: Assessing PLS-SEM Results for Reflective Measurement Models
Stage 5b: Assessing PLS-SEM Results for Formative Measurement Models
Stage 6: Assessing PLS-SEM Results for the Structural Model
Stage 7: Interpretation of Results and Drawing Conclusions
Significance of PLS-SEM Parameters = Bootstrapping
PLS-SEM does not assume the data is normally distributed, which
implies that parametric significance tests used in regression analyses
cannot be applied to test whether coefficients such as outer weights
and loadings are significant. Instead, PLS-SEM relies on a
nonparametric bootstrap procedure to test coefficients for their
significance.
In bootstrapping, a large number of subsamples (i.e., bootstrap
samples) is drawn from the original sample with replacement.
Replacement means that each time an observation is drawn at random
from the sampling population, it is returned to the sampling
population before the next observation is drawn (i.e., the population
from which the observations are drawn always contains all the same
elements). Therefore, an observation for a certain subsample can be
selected more than once, or may not be selected at all for another
subsample. The number of bootstrap samples should be large and
must at least equal the number of valid observations in the
dataset. The recommended number of bootstrap samples is 5,000.
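The resampling idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy data and the function name `bootstrap_standard_error` are hypothetical, and the sample mean stands in for a PLS-SEM coefficient (in a real analysis the full path model would be re-estimated on every bootstrap sample).

```python
import random
import statistics


def bootstrap_standard_error(data, statistic, n_samples=5000, seed=0):
    """Return (mean, standard error) of `statistic` across bootstrap samples."""
    rng = random.Random(seed)
    n_cases = len(data)  # cases per bootstrap sample = valid observations
    estimates = []
    for _ in range(n_samples):
        # Draw n_cases observations WITH replacement: an observation may
        # appear several times in one sample, or not at all.
        resample = rng.choices(data, k=n_cases)
        estimates.append(statistic(resample))
    return statistics.fmean(estimates), statistics.stdev(estimates)


# Hypothetical toy data (not from the slides).
observations = [2.1, 3.4, 1.9, 4.2, 3.3, 2.8, 3.9, 2.5, 3.1, 3.6]

boot_mean, boot_se = bootstrap_standard_error(observations, statistics.fmean)

# Empirical t-value: original estimate divided by its bootstrap standard error.
t_value = statistics.fmean(observations) / boot_se
print(f"bootstrap mean={boot_mean:.3f}, SE={boot_se:.3f}, t={t_value:.2f}")
```

The standard deviation of the bootstrap estimates serves as the standard error used to test the original estimate.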
SmartPLS Bootstrapping
• Bootstrapping estimates a PLS path model for each subsample:
  - Samples: Number of random samples drawn from the original sample (at minimum should equal the number of observations in the original sample, but 5,000 is recommended).
  - Cases: Number of cases drawn in each sample run (should be at least as large as the number of valid observations in the original sample).
• Bootstrapping provides mean values and standard errors for:
  - inner model path coefficients.
  - weights and loadings in the measurement models.
Use bootstrapping
SmartPLS Bootstrapping
If you have missing data, do not use mean replacement, because
bootstrapping draws samples with replacement. Use Casewise
Replacement.
Use individual (sign) changes option
• Make sure the number of cases is equal to the number of valid observations in your dataset.
• Set cases = sample size (or higher).
Caution! It is a common mistake to set samples equal to the overall number of observations.
SmartPLS Bootstrapping
• Make sure the number of cases is equal to (or more than) the number of valid observations in your dataset. Set cases = sample size (or higher). Note that the number is now 344.
• We have also set the number of samples to 5,000.
Bootstrapping HTML Report – Table of Contents
Click on [icon] to access the HTML report.
Bootstrapping Option (Total Effects tables) –
Significance of Structural Path Coefficients
Results based on Cases = 344 and Samples = 5,000
Critical t-values (all two-tailed)
• 1.65 for the 10% significance level
• 1.96 for the 5% significance level
• 2.58 for the 1% significance level
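These cut-offs can be reproduced from the standard normal distribution, which the t distribution approaches when the number of bootstrap samples is large. The sketch below assumes that normal approximation; the helper names `critical_value` and `significance_level` are hypothetical. Note that the exact 10% value is about 1.645, which the slide rounds to 1.65.

```python
from statistics import NormalDist


def critical_value(alpha):
    """Two-tailed critical value of the standard normal for level alpha."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def significance_level(t_value, alphas=(0.01, 0.05, 0.10)):
    """Return the strictest alpha at which |t| is significant, else None."""
    for alpha in sorted(alphas):  # check 1% first, then 5%, then 10%
        if abs(t_value) > critical_value(alpha):
            return alpha
    return None


for alpha in (0.10, 0.05, 0.01):
    print(f"alpha={alpha:.2f}: critical value {critical_value(alpha):.3f}")

# Hypothetical empirical t-value (estimate / bootstrap standard error).
print(significance_level(2.10))
```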
Bootstrapping Option – Significance of
Indicator Loadings
Results based on Cases = 344 and Samples = 5,000
SmartPLS Predictive Relevance – Blindfolding
Q² is a criterion to evaluate how well the model predicts the data of
omitted cases. It is referred to as predictive relevance.
The process involves omitting (removing) or “blindfolding” part of the
data, every D-th data point, and re-estimating the model parameters
based on the remaining data. The omitted values are then predicted on
the basis of the newly estimated parameters.
Procedure:
• Set an omission distance D. Note: The number of cases in your
data must not be an integer multiple of the omission distance
(otherwise the blindfolding procedure yields erroneous results).
Experience has shown that D values between 5 and 10 typically
work well.
• Interpret the cross-validated redundancy, because it uses the
PLS-SEM estimates of both the structural model and the
measurement models for data prediction. Also, in most instances
the focus is on predicting the data of the target endogenous
constructs.
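The omission-distance rule is easy to check before running the procedure. The following is a minimal sketch; the function name `valid_omission_distances` is hypothetical, and the sample size of 344 is the one used elsewhere in these slides.

```python
def valid_omission_distances(n_cases, candidates=range(5, 11)):
    """Return the omission distances D for which n_cases / D is not an integer."""
    return [d for d in candidates if n_cases % d != 0]


# With n = 344 cases, D = 8 must be avoided because 344 = 8 * 43.
valid = valid_omission_distances(344)
print(valid)
```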
SmartPLS Predictive Relevance – Blindfolding
Redundancy vs. Communality?
Cross-validated redundancy
Step 1: The scores of the endogenous LV(s) are estimated using the scores of the exogenous LVs.
Step 2: Newly estimated LV scores are used to estimate the missing MV data.
Cross-validated communality
Only step 2.
[Diagram: latent variables LV1, LV2, LV3 with manifest variables MV 1, MV 2, MV 3]
SmartPLS Results – Blindfolding
Use blindfolding
SmartPLS Results – Blindfolding
Make sure that n / Omission distance is not an integer (here: n = 344).
Check all boxes
SmartPLS Results – Blindfolding
Click on [icon] to access the HTML report.
SmartPLS Results – Blindfolding
Click on Construct Crossvalidated Redundancy
Q² > 0: model has predictive relevance.
Q² ≈ 0 or Q² < 0: model is lacking predictive relevance.
Predictive relevance is demonstrated for both
endogenous constructs.
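The decision rule rests on the Stone-Geisser statistic, Q² = 1 − SSE/SSO, where SSE is the sum of squared prediction errors for the omitted data and SSO is the sum of squared errors of a trivial mean prediction. The sketch below is a simplification under stated assumptions: it uses the overall mean as the trivial prediction and hypothetical numbers, whereas SmartPLS accumulates both sums over the blindfolding rounds.

```python
def q_squared(observed, predicted):
    """Q² = 1 - SSE / SSO for one set of omitted data points."""
    mean_obs = sum(observed) / len(observed)
    # SSE: squared errors of the model-based predictions.
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # SSO: squared errors when the mean is used as the prediction.
    sso = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - sse / sso


# Hypothetical omitted values and their model-based predictions.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.5, 2.0, 2.5, 4.0, 4.5]

q2 = q_squared(observed, predicted)
print(f"Q² = {q2:.3f}")  # Q² > 0 indicates predictive relevance
```

A model that predicts no better than the mean yields Q² = 0, and one that predicts worse yields a negative value.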

PLS-SEM and Research in Marketing
• Top 30 marketing journals* – 204 articles / 311 models
• 80% of articles published since 2000, 35% in JM, IMM & EJM
[Figure: Number of PLS articles, 1980–2010, shown for individual years and as totals for 5-year periods; chart annotation: 2010 = 25%]
An Assessment of the Use of Partial Least Squares Structural Equation
Modeling in Marketing Research, JAMS, Vol. 40 (3), May 2012.
* Ranking based on Hult et al. (2009)
PLS-SEM and Research in Marketing
• Reasons for using PLS – non-normal data (50%), small sample size (46%), formative measures (33%), prediction = research objective (28%), complex models (13%), categorical variables (13%).
• Average PLS sample size is 211 compared to 246 for CB-SEM. But 25% had fewer than 100 observations, and 9% did not meet recommended sample size criteria.
• No studies report skewness or kurtosis.
• 42% reflective only; 6% formative only; 40% mixed; 12% no indication.
PLS-SEM and Research in Marketing
Outer Model Evaluation
• Reliability
  - 70% reported (56% composite reliability)
  - 46% included formative constructs, but 23% used reflective criteria on formative constructs
• Convergent Validity (AVE) – 57%
• Discriminant Validity
  - Fornell-Larcker – 44%
  - Cross-loadings – 5%
  - Both – 12%
• Significance of formative indicator weights – 17%
Inner Model Evaluation
• R² – 88%
• Predictive relevance (Q²) – 16% (for endogenous variables)
• Path coefficient sizes – 96%
• Path coefficient significance – 92%
Observations and Conclusions
• PLS-SEM = rapidly emerging tool in marketing literature because . . .
  - Flexible data distribution and scaling requirements.
  - Achieves high levels of statistical power with smaller sample sizes and complex models.
  - With complex models produces superior results to CB-SEM.
  - Easily handles both reflective and formative measured constructs.
• PLS-SEM’s methodological properties are widely misunderstood (CB-SEM bias).
• Marketing scholars need to become familiar with advantages and limitations.
Special Issue, PLS in Marketing, March 2011
• Hair, Joseph F., Christian M. Ringle, and Marko Sarstedt. PLS-SEM: Indeed a Silver Bullet.
• Haenlein, Michael and Andreas M. Kaplan. The Influence of Observed Heterogeneity on Path Coefficient Significance: Technology Acceptance within the Marketing Discipline.
• Eggert, Andreas and Murat Serdaroglu. Exploring the Impact of Sales Technology on Salesperson Performance: A Task-Based Approach.
• Navarro, Antonio, Francisco J. Acedo, Fernando Losada, and Emilio Ruzo. Integrated Model of Export Activity: Analysis of Heterogeneity in Managers’ Orientations and Perceptions on Strategic Marketing Management in Foreign Markets.
• Wiedmann, Klaus-Peter, Nadine Hennigs, Steffen Schmidt, and Thomas Wuestefeld. Drivers and Outcomes of Brand Heritage: Consumers’ Perception of Heritage Brands in the Automotive Industry.
• Anderson, Rolph, and Srinivasan Swaminathan. Customer Satisfaction and Loyalty in e-Markets: A PLS Path Modeling Approach.
• Hoffmann, Stefan, Robert Mai, and Maria Smirnova. Development and Validation of a Cross-Nationally Stable Scale of Consumer Animosity.
Other Sources:
• An Assessment of the Use of Partial Least Squares Structural Equation Modeling in Marketing Research, JAMS, Vol. 40 (3), May 2012, 414-433.
• Special Issue, LRP, forthcoming 2013, PLS in Long Range Planning.
• Book: A Primer on Partial Least Squares, Sage, forthcoming 2013.
Summary Comparison: PLS-SEM vs. CB-SEM
Criteria | Variance-Based Modeling (e.g. SmartPLS, PLS Graph) | Covariance-Based Modeling (e.g. LISREL, AMOS, Mplus)
Objective | Prediction oriented | Parameter oriented
Distribution Assumptions | Non-parametric | Normal distribution (parametric)
Required sample size | Small (min. 30 – 100) | High (min. 100 – 800)
Model complexity | Large models OK | Large models problematic (50+ indicator variables)
Parameter Estimates | Potential Bias | Stable, if assumptions met
Indicators per construct | One – two OK; Large number OK | Typically 3 – 4 minimum to meet identification requirements
Statistical tests for parameter estimates | Inference requires Jackknifing or Bootstrapping | Assumptions must be met
Measurement Model | Formative and Reflective indicators OK | Typically only Reflective indicators
Goodness-of-fit measures | None | Many
Sample Size Determination – PLS-SEM
Sample size should be equal to the larger of:
• ten times the largest number of formative
indicators used to measure a single construct,
or
• ten times the largest number of structural paths
directed at a particular latent construct in the
structural model.
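This rule of thumb reduces to a one-line calculation. The sketch below is illustrative only; the function name `minimum_sample_size` and the example counts are hypothetical.

```python
def minimum_sample_size(max_formative_indicators, max_inbound_paths):
    """Ten-times rule: 10 x the larger of the two counts.

    max_formative_indicators: largest number of formative indicators
        used to measure a single construct.
    max_inbound_paths: largest number of structural paths directed at
        a single latent construct.
    """
    return 10 * max(max_formative_indicators, max_inbound_paths)


# Hypothetical model: at most 4 formative indicators on one construct,
# and at most 2 structural paths pointing at one construct.
print(minimum_sample_size(4, 2))
```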
Sample Size Guidelines – PLS-SEM
The overall complexity of a structural model has little
influence on the sample size requirements for PLS-SEM.
The reason is that the algorithm does not compute all
relationships in the structural model at the same time.
Instead, it uses OLS regressions to estimate the model’s
partial regression relationships. Two early studies
systematically evaluated the performance of PLS-SEM
with small sample sizes and concluded it performed well
(Chin & Newsted, 1999; Hui & Wold, 1982). More
recently, a simulation study by Reinartz et al. (2009)
indicated that PLS-SEM is a good choice when the
sample size is small. Moreover, compared to its
covariance-based counterpart, PLS-SEM has higher
levels of statistical power in situations with complex
model structures or smaller sample sizes.
Path Model and Data for PLS-SEM Hypothetical Example
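[Figure: path model]
Measurement Models (indicators x, latent variables Y, and relationships (i.e., w or l) between indicators and latent variables):
• Y1: indicators x1, x2 with weights w11, w12
• Y2: indicators x3, x4 with weights w21, w22
• Y3: indicators x5, x6, x7 with loadings l31, l32, l33
Structural Model (latent variables Y and relationships between latent variables p):
• Y1 → Y3 with path coefficient p13
• Y2 → Y3 with path coefficient p23
[Table: data matrix with indicator columns x1 to x7, and coefficient matrices listing w11, w12, w21, w22, l31, l32, l33 for the measurement models and p13, p23 for the structural model]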
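How the pieces of this path model fit together can be sketched in Python. Everything numeric here is hypothetical: the weights, loadings, path coefficients and the example case are invented for illustration and are not taken from the slides, and the indicator data are assumed to be standardized.

```python
# Hypothetical coefficients, named after the symbols in the path model.
weights = {
    "Y1": {"x1": 0.6, "x2": 0.5},  # w11, w12
    "Y2": {"x3": 0.7, "x4": 0.4},  # w21, w22
}
loadings = {"x5": 0.8, "x6": 0.7, "x7": 0.9}  # l31, l32, l33
paths = {"Y1": 0.3, "Y2": 0.5}                # p13, p23


def composite_score(case, construct_weights):
    """Latent variable score as a weighted sum of its indicators."""
    return sum(w * case[name] for name, w in construct_weights.items())


def predict_case(case):
    """Predict Y3 and its indicators from the indicators of Y1 and Y2."""
    y1 = composite_score(case, weights["Y1"])
    y2 = composite_score(case, weights["Y2"])
    # Structural model: Y3 is explained by Y1 and Y2.
    y3_hat = paths["Y1"] * y1 + paths["Y2"] * y2
    # Reflective measurement model: each indicator is loading * Y3.
    indicators_hat = {name: l * y3_hat for name, l in loadings.items()}
    return y1, y2, y3_hat, indicators_hat


# One hypothetical standardized case.
case = {"x1": 1.0, "x2": 2.0, "x3": 0.0, "x4": 1.0}
y1, y2, y3_hat, indicators_hat = predict_case(case)
print(y1, y2, y3_hat, indicators_hat)
```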