PLS-SEM: Introduction (Part 1)
Joe F. Hair, Jr.
Founder & Senior Scholar, DBA Program
SEM Model: Predicting the Birth Weight of Guinea Pigs
[Path diagram: X and Y = different outcomes; B, C, and D = common causes; A and E = independent causes.]
Source: Sewall Wright, Correlation and Causation, Journal of Agricultural Research, Vol. XX, No. 7, 1921.
The greatest interest in any factor solution centers on the correlations between the original
variables and the factors. The matrix of such test-factor correlations is called the factor structure,
and it is the primary interpretative device in principal components analysis. In the factor
structure the element r_jk gives the correlation of the jth test with the kth factor. Assuming that the
content of the observation variables is well known, the correlations in the kth column of the
structure help in interpreting, and perhaps naming, the kth factor. Also, the coefficients in the jth
row give the best view of the factor composition of the jth test.
The derivation of the factor structure S is as follows:

S = (1/N) \sum_{i=1}^{N} (z_i - m_z)(f_i - m_f)'
  = (1/N) \sum_{i=1}^{N} z_i f_i'
  = (1/N) \sum_{i=1}^{N} z_i (L^{-1/2} V' z_i)'
  = [ (1/N) \sum_{i=1}^{N} z_i z_i' ] V L^{-1/2}
  = R V L^{-1/2}

and since RV = VL,

S = V L L^{-1/2} = V L^{1/2}
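The derivation can be checked numerically. The sketch below is an illustration, not from the source: the 3×3 correlation matrix is invented. It computes the eigendecomposition RV = VL with numpy and verifies that the factor structure S = R V L^(-1/2) equals V L^(1/2):

```python
import numpy as np

# A small, hypothetical correlation matrix R (symmetric, unit diagonal)
R = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.5],
              [0.3, 0.5, 1.0]])

# Eigendecomposition: R V = V L, with V the orthonormal eigenvectors
# and L the diagonal matrix of eigenvalues
eigvals, V = np.linalg.eigh(R)
L = np.diag(eigvals)

# Factor structure computed two ways, following the derivation
S_via_R = R @ V @ np.diag(eigvals ** -0.5)   # S = R V L^(-1/2)
S_direct = V @ np.diag(eigvals ** 0.5)       # S = V L^(1/2)

print(np.allclose(S_via_R, S_direct))  # the two expressions agree
```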
Another set of coefficients of interest in factor analysis is the weights that compound predicted
observations z from factor scores f. These regression coefficients for the multiple regression of
each element of the observation vector z on the factor f are called factor loadings and the matrix A
that contains them as its rows is . . . . .
Source: Cooley, William W., and Paul R. Lohnes, Multivariate Data Analysis, John Wiley & Sons,
Inc., New York, 1971, page 106.
Structural Equation Modeling
What comes to mind?
CB-SEM (Covariance-based SEM) –
objective is to reproduce the theoretical
covariance matrix, without focusing on
explained variance.
PLS-SEM (Partial Least Squares SEM)
– objective is to maximize the explained
variance of the endogenous latent
constructs (dependent variables).
CB-SEM Model
HBAT, MDA database
Covariance Matrix = HBAT 3-Construct model
CB-SEM – evaluation focuses on goodness of
fit = minimization of the difference between
the observed covariance matrix and the
estimated covariance matrix.
Research objective: testing and confirmation where
prior theory is strong.
• Assumes normality of data distribution,
homoscedasticity, large sample size, etc.
• Only reliable and valid variance is useful for testing
causal relationships.
• A “full information approach” which means small
changes in model specification can result in
substantial changes in model fit.
PLS-SEM – objective is to maximize the
explained variance of the endogenous
latent constructs (dependent variables).
Research objective: theory development and prediction.
• Normality of data distribution not assumed.
• Can be used with fewer indicator variables (1 or 2) per
construct.
• Models can include a larger number of indicator
variables (CB-SEM difficult with 50+ items).
• Preferred alternative with formative constructs.
• Assumes all measured variance (including error) is
useful for explanation/prediction of causal
relationships.
PLS Path Model
[Path diagram: latent constructs Y1, Y2, and Y3 connected by structural paths P1 and P2; indicator variables X1–X7 linked to the constructs by outer weights W1–W7.]
Multivariate Methods
Should SEM Be Used?
Considerations:
1. The Variate
2. Multivariate Measurement
3. Measurement Scales
4. Coding
5. Data Distribution
Variate = a linear combination of several variables,
often referred to as the fundamental building block
of multivariate analysis.
Variate value = x1w1 + x2w2 + . . . + xkwk
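As a quick illustration of the variate formula (the values and weights below are invented), the variate value is simply the weighted sum of the variables:

```python
import numpy as np

# Hypothetical observed values x1..x4 and weights w1..w4
x = np.array([3.0, 5.0, 2.0, 4.0])
w = np.array([0.4, 0.3, 0.2, 0.1])

# Variate value = x1*w1 + x2*w2 + ... + xk*wk
variate = float(np.dot(x, w))
print(variate)  # 3.5
```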
Data Matrix
[Multiple regression model: outcome Y1 predicted from indicator variables x1, x2, …, xk, with error term e1.]
Variate = x1 + x2 + … + xk + e
Multivariate Measurement
Measurement = the process of assigning numbers to a
variable/construct according to a set of rules, such that the
numbers accurately represent the variable.
When variables are difficult to measure, one approach is to
measure them indirectly with proxy variables. If the concept is
restaurant satisfaction, for example, then the several proxy
variables that could be used to measure this might be:
1. The taste of the food was excellent.
2. The speed of service met my expectations.
3. The wait staff was very knowledgeable about the menu items.
4. The background music in the restaurant was pleasant.
5. The meal was a good value compared to the price.
Multivariate measurement involves using several variables
to indirectly measure a concept, as in the restaurant satisfaction
example above. It also enables researchers to account for the error
in data.
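A minimal sketch of multivariate measurement (the ratings below are invented): several proxy items, here on a 5-point scale, are combined into a single composite satisfaction score, e.g., an unweighted mean:

```python
import numpy as np

# Hypothetical 5-point ratings on the five restaurant items, three respondents
items = np.array([
    [5, 4, 4, 3, 5],   # respondent 1
    [2, 3, 2, 4, 2],   # respondent 2
    [4, 4, 5, 4, 4],   # respondent 3
])

# Composite satisfaction score per respondent: mean of the item ratings
satisfaction = items.mean(axis=1)
print(satisfaction)  # [4.2 2.6 4.2]
```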
Data Characteristics – PLS-SEM

Sample Size
• No identification issues with small sample sizes (35-50).
• Generally achieves high levels of statistical power with small sample sizes (35-50).
• Larger sample sizes (250+) increase the precision (i.e., consistency) of PLS-SEM estimations.

Data Distribution
• No distributional assumptions (PLS-SEM is a non-parametric method; works well with extremely non-normal data).

Missing Values
• Highly robust as long as missing values are below a reasonable level (e.g., up to 15% randomly missing data points).
• Use mean replacement (sub-groups) and nearest neighbor.

Measurement Scales
• Works with metric, quasi-metric (ordinal) scaled data, and binary coded variables (only as exogenous variables).
• Limitations when using categorical data to measure endogenous latent variables.
• Suggest using binary variables for multi-group comparisons.
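A sketch of the sub-group mean replacement mentioned above, using pandas (the data and grouping variable are invented): each missing value is filled with the mean of its own sub-group rather than the overall mean:

```python
import pandas as pd

# Hypothetical data with a grouping variable and missing indicator values
df = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b"],
    "x":     [1.0, None, 3.0, 10.0, 12.0, None],
})

# Sub-group mean replacement: fill each NaN with its own group's mean
df["x"] = df.groupby("group")["x"].transform(lambda s: s.fillna(s.mean()))
print(df["x"].tolist())  # [1.0, 2.0, 3.0, 10.0, 12.0, 11.0]
```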
Model Characteristics – PLS-SEM

Number of Items in each Construct
• Handles constructs measured with single- and multi-item measures.
• Easily handles 50+ items (CB-SEM does not).
• Single-item scales OK.

Relationships between Latent Constructs and their Indicators
• Easily incorporates reflective and formative measurement models.

Model Complexity
• Handles complex models with many structural model relationships.
• Larger numbers of indicators are helpful in reducing "consistency at large."

Model Set-up
• Causal loops not allowed in the structural model (only recursive models).
Algorithm Properties – PLS-SEM

Objective
• Minimizes the amount of unexplained variance (i.e., maximizes the R² values).

Efficiency
• Converges after a few iterations (even in situations with complex models and/or large sets of data) to the global optimum solution; efficient algorithm.

Latent Construct Scores
• Estimated as linear combinations of their indicators.
• Used for predictive purposes.
• Can be used as input for subsequent analyses.
• Not affected by data inadequacies.

Parameter Estimates
• Structural model relationships underestimated (PLS-SEM bias).
• Measurement model relationships overestimated (PLS-SEM bias).
• Consistency at large (minimal impact with N = 250+).
• High levels of statistical power with smaller sample sizes (35-50).
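To make the algorithm concrete, here is a deliberately simplified sketch of the iterative PLS estimation for a two-construct model on simulated data. Everything here is invented for illustration; real implementations such as SmartPLS add full weighting schemes, sign corrections, and convergence details. Outer weights and construct scores are updated until they stabilize:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Simulate two related latent variables, each with three noisy indicators
eta1 = rng.normal(size=n)
eta2 = 0.6 * eta1 + 0.8 * rng.normal(size=n)
X1 = np.column_stack([eta1 + 0.5 * rng.normal(size=n) for _ in range(3)])
X2 = np.column_stack([eta2 + 0.5 * rng.normal(size=n) for _ in range(3)])

def z(a):
    """Standardize to mean 0, standard deviation 1."""
    return (a - a.mean(axis=0)) / a.std(axis=0)

X1, X2 = z(X1), z(X2)
w1, w2 = np.ones(3), np.ones(3)

for _ in range(300):
    # 1. Outer approximation: construct scores as weighted sums of indicators
    Y1, Y2 = z(X1 @ w1), z(X2 @ w2)
    # 2. Inner approximation: proxy for each construct from its neighbor
    r = np.corrcoef(Y1, Y2)[0, 1]
    Z1, Z2 = z(r * Y2), z(r * Y1)
    # 3. Outer weight update (Mode A: indicator-proxy correlations)
    w1_new, w2_new = X1.T @ Z1 / n, X2.T @ Z2 / n
    if np.allclose(w1_new, w1, atol=1e-10) and np.allclose(w2_new, w2, atol=1e-10):
        break
    w1, w2 = w1_new, w2_new

# Path coefficient for the single structural relationship
path = np.corrcoef(z(X1 @ w1), z(X2 @ w2))[0, 1]
print(round(path, 3))
```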
Model Evaluation Issues – PLS-SEM

Evaluation of Overall Model
• No global goodness-of-fit criterion.

Evaluation of Measurement Models
• Reflective measurement models: reliability and validity assessments by multiple criteria.
• Formative measurement models: validity assessment, significance of path coefficients, multicollinearity.

Evaluation of Structural Model
• Significance of path coefficients, coefficient of determination (R²), pseudo F-test (f² effect size), predictive relevance (Q² and q² effect size).

Additional Analyses
• Mediating effects
• Importance-performance matrix analysis
• Higher-order constructs
• Multi-group analysis
• Measurement invariance
• Moderating effects
• Uncovering unobserved heterogeneity: FIMIX-PLS
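For the multicollinearity check on formative measurement models, a common diagnostic is the variance inflation factor (VIF). A small sketch with simulated indicators (the data and the rough cutoff of 5 are illustrative conventions, not from the source):

```python
import numpy as np

def vif(X, j):
    """VIF of column j: regress it on the other columns; VIF = 1 / (1 - R^2)."""
    y = X[:, j]
    Z = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    r2 = 1.0 - resid.var() / y.var()
    return 1.0 / (1.0 - r2)

rng = np.random.default_rng(42)
ok = rng.normal(size=(500, 3))                       # independent indicators
bad = ok.copy()
bad[:, 2] = bad[:, 0] + 0.1 * rng.normal(size=500)   # near-duplicate indicator

print(round(vif(ok, 2), 2))   # close to 1: no collinearity
print(round(vif(bad, 2), 2))  # very large: problematic collinearity
```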
Rules of Thumb: PLS-SEM or CB-SEM?
Use PLS-SEM when:
• The goal is predicting key target constructs or identifying
key “driver” constructs.
• Formative constructs are easy to use in the structural
model. Note that formative measures can also be used
with CB-SEM, but doing so requires construct
specification modifications (e.g., the construct must
include both formative and reflective indicators to meet
identification requirements).
• The structural model is complex (many constructs and
many indicators).
• The sample size is small and/or the data are non-normally
distributed or exhibit heteroskedasticity.
• The plan is to use latent variable scores in subsequent
analyses.
Rules of Thumb: PLS-SEM or CB-SEM?
Use CB-SEM when:
• The goal is theory testing, theory
confirmation, or the comparison of
alternative theories.
• Error terms require additional specification, such as
covariation among them.
• Structural model has non-recursive
relationships.
• Research requires a global goodness of fit
criterion.
Systematic Process for Applying PLS-SEM
Stage 1: Specifying the Structural Model
Stage 2: Specifying the Measurement Models
Stage 3: Data Collection and Examination
Stage 4: PLS-SEM Model Estimation
Stage 5a: Assessing PLS-SEM Results for Reflective Measurement Models
Stage 5b: Assessing PLS-SEM Results for Formative Measurement Models
Stage 6: Assessing PLS-SEM Results for the Structural Model
Stage 7: Interpretation of Results and Drawing Conclusions
Should You Use SEM?
Journal reviewers rate SEM papers more favorably on key manuscript attributes . . .

Attribute           SEM   No SEM   p-value
Topic Relevance     4.2    3.8      .182
Research Methods    3.5    2.7      .006
Data Analysis       3.5    2.8      .025
Conceptualization   3.1    2.5      .018
Writing Quality     3.9    3.0      .006
Contribution        3.1    2.8      .328

Note: scores based on 5-point scale, with 5 = more favorable.
Source: Babin, Hair & Boles, Publishing Research in Marketing Journals Using Structural Equation Modeling, Journal of Marketing Theory and Practice, Vol. 16, No. 4, 2008, pp. 281-288.
PLS-SEM Stages 1, 2 & 3: Design Issues
1. Scale Measures
   • Scale selection/design
   • Reflective vs. Formative
2. Common Methods Variance
   • Harman Single Factor Test
   • Common Latent Factor
   • Marker Construct
3. Missing Data, Outliers, Etc.
Scale Design
1. Revise/Update
   • Established scales – how old?
   • Double-barreled; negatively worded
2. Number of Scale Points
   • More scale points = greater variability
3. Single-Item Scales
Single-Item Scales?

Theoretical Aspects

Reliability
• Single-item measures: no adjustment for random error; assessing reliability is problematic.
• Multi-item measures: allow for random error adjustment; reliability can be determined by means of internal consistency.

Validity
• Single-item measures: lower construct validity (do not account for all facets of a construct); decreased criterion validity; assessing validity is more problematic.
• Multi-item measures: higher construct validity (different facets of a construct can be captured); increased criterion validity; validity measures based on item-to-item correlations.

Partitioning
• Single-item measures: partitioning solely based on the single variable.
• Multi-item measures: more precise partitioning possible.

Missing Values
• Single-item measures: very difficult to resolve.
• Multi-item measures: imputation methods based on correlations between indicators of the same construct.

Use in Academic Research
• Single-item measures: very uncommon (publication problematic).
• Multi-item measures: generally accepted.
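The internal-consistency reliability available for multi-item measures can be illustrated with Cronbach's alpha (the ratings below are invented):

```python
import numpy as np

def cronbach_alpha(X):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of total)."""
    X = np.asarray(X, dtype=float)
    k = X.shape[1]
    item_vars = X.var(axis=0, ddof=1).sum()
    total_var = X.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_vars / total_var)

# Hypothetical 5-point ratings: three items that mostly move together
X = np.array([
    [5, 4, 5],
    [2, 2, 3],
    [4, 4, 4],
    [1, 2, 1],
    [5, 5, 4],
])
print(round(cronbach_alpha(X), 2))  # 0.95
```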
Single-Item Scales? (continued)

Practical Aspects

Costs
• Single-item measures: lower costs associated with scale development, questioning, and data analysis.
• Multi-item measures: higher costs associated with scale development, questioning, and data analysis.

Nonresponse
• Single-item measures: increased survey response rate; lower item nonresponse.
• Multi-item measures: lower survey response rate; higher item nonresponse.

Burden of Questioning
• Single-item measures: little burden – simple, fast, and comprehensible.
• Multi-item measures: increased burden – longer, likely more boring and tiring.
Reflective (Scale) Versus Formative (Index)
Operationalization of Constructs
A central research question in social science research, particularly marketing
and MIS, focuses on the operationalization of complex constructs:
Are indicators causing or being caused by
the latent variable/construct measured by them?
[Reflective measurement: arrows point from the Construct to Indicators 1-3 — changes in the latent variable directly cause changes in the assigned indicators.]
[Formative measurement: arrows point from Indicators 1-3 to the Construct — changes in one or more of the indicators cause changes in the latent variable.]
Example: Reflective vs. Formative World View
[Reflective: Drunkenness → can't walk a straight line, smells of alcohol, slurred speech.]
[Formative: consumption of beer, consumption of wine, consumption of hard liquor → Drunkenness.]
Basic Difference Between Reflective and
Formative Measurement Approaches
“Whereas reflective indicators are essentially interchangeable (and
therefore the removal of an item does not change the essential
nature of the underlying construct), with formative indicators
‘omitting an indicator is omitting a part of the construct’.”
(DIAMANTOPOULOS/WINKLHOFER, 2001, p. 271)
The formative measurement approach generally minimizes the overlap between complementary indicators, whereas the reflective measurement approach focuses on maximizing the overlap between interchangeable indicators.
[Diagram: the construct domain covered by non-overlapping formative indicators vs. by overlapping reflective indicators.]
Exercise: Satisfaction in Hotels as a Formatively and Reflectively Operationalized Construct
[Diagram: "Satisfaction with Hotels" in the center, with formative indicators on one side and reflective indicators on the other.]
Formative indicators:
• The rooms' furnishings are good
• The hotel's recreation offerings are good
• The hotel's personnel are friendly
• The hotel is low-priced
• The rooms are quiet
• The rooms are clean
• The hotel's service is good
• The hotel's cuisine is good
Reflective indicators:
• Taking everything into account, I am satisfied with this hotel
• I appreciate this hotel
• I am looking forward to staying overnight in this hotel
• I am comfortable with this hotel
Formative Constructs – Two Types
1. Composite (formative) constructs – indicators completely
determine the “latent” construct. They share similarities because
they define a composite variable but may or may not have
conceptual unity. In assessing validity, indicators are not
interchangeable and should not be eliminated, because removing
an indicator will likely change the nature of the latent construct.
2. Causal constructs – indicators have conceptual unity in that
all variables should correspond to the definition of the concept. In
assessing validity some of the indicators may be interchangeable,
and also can be eliminated.
Bollen, K.A. (2011), Evaluating Effect, Composite, and Causal Indicators in
Structural Equations Models, MIS Quarterly, Vol. 35, No. 2, pp. 359-372.
PLS-SEM Example
[Path model with four constructs: COMP, LIKE, CUSA, and CUSL.]
Types of Measurement Models – PLS-SEM Example
[COMP: reflective measurement model with indicators comp_1, comp_2, comp_3.
LIKE: reflective measurement model with indicators like_1, like_2, like_3.
CUSA: single-item construct with indicator cusa.
CUSL: reflective measurement model with indicators cusl_1, cusl_2, cusl_3.]
Indicators for SEM Model Constructs

Competence (COMP)
• comp_1: [company] is a top competitor in its market.
• comp_2: As far as I know, [company] is recognized world-wide.
• comp_3: I believe that [company] performs at a premium level.

Likeability (LIKE)
• like_1: [company] is a company that I can better identify with than other companies.
• like_2: [company] is a company that I would regret more not having if it no longer existed than I would other companies.
• like_3: I regard [company] as a likeable company.

Customer Loyalty (CUSL)
• cusl_1: I would recommend [company] to friends and relatives.
• cusl_2: If I had to choose again, I would choose [company] as my mobile phone services provider.
• cusl_3: I will remain a customer of [company] in the future.

Satisfaction (CUSA)
• cusa: If you consider your experiences with [company], how satisfied are you with [company]?
Data Matrix for Indicator Variables
(columns 1-10 = comp_1, comp_2, comp_3, like_1, like_2, like_3, cusl_1, cusl_2, cusl_3, cusa)

Case  comp_1 comp_2 comp_3 like_1 like_2 like_3 cusl_1 cusl_2 cusl_3 cusa
1       4      5      5      3      1      2      5      3      3     5
2       6      7      6      6      6      6      7      7      7     7
3       6      5      6      6      7      5      7      7      7     7
...
344
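In code, the same data matrix can be held as a pandas DataFrame (only the first two cases shown above are reproduced here):

```python
import pandas as pd

cols = ["comp_1", "comp_2", "comp_3", "like_1", "like_2", "like_3",
        "cusl_1", "cusl_2", "cusl_3", "cusa"]
rows = [
    [4, 5, 5, 3, 1, 2, 5, 3, 3, 5],   # case 1
    [6, 7, 6, 6, 6, 6, 7, 7, 7, 7],   # case 2
]
df = pd.DataFrame(rows, columns=cols, index=[1, 2])
df.index.name = "case"
print(df.shape)  # (2, 10)
```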
Getting Started with the SmartPLS Software
The next slide shows the graphical interface for the SmartPLS
software, with the simple model already drawn. We describe in
the following slides how to set up this model using the SmartPLS
software program. Before you draw your model, you need to
have data that serves as the basis for running the model. The
data we will use to run our example PLS model can be
downloaded either as comma separated values (.csv) or text (.txt)
data files at the following URL: http://www.smartpls.de/cr/. When
you get to the website scroll down to the Corporate Reputation
Example where it says Click on the following links to download
files.
SmartPLS can use both data file formats (i.e., .csv or .txt).
Follow the onscreen instructions to save one of these two files on
your hard drive. Click on Save Target As… to save the data to a
folder on your hard drive, and then Close. Now go to the folder
where you previously downloaded and saved the SmartPLS
software on your computer. Click on the file that runs SmartPLS, and then on the Run tab to start the software. You are now ready to create a new SmartPLS project.
SmartPLS Graphical Interface
Example with Names and Data Assigned
Brief Instructions: Using SmartPLS
1. Load SmartPLS software.
2. Create your new project – assign name and data.
3. Double-click to get Menu Bar.
4. Draw model – see options below:
   • Insertion mode
   • Selection mode
   • Connection mode
5. Save model.
6. Click on the calculate icon and select PLS algorithm on the Pull-Down menu. Now accept the default options by clicking Finish.
To create a new project, click on → File → New → Create New Project.
The screen below will appear. Type a name in the window. Click Next.
You now need to assign a data file to the project, in our case, data.csv (or
whatever name you gave to the data you downloaded). To do so, click on
the dots tab (…) at the right side of the window, find and highlight your data
folder, and click Open to select your data. Once you have specified the data
file, click on Finish.
SmartPLS Software Options
Find your new project in the window, expand the list of projects to get project details, and click on the .splsm file for your project.
Double-click on your new model to make the menu bar appear at the top of the screen.
[Toolbar icons: Selection mode · Draw constructs · Draw structural paths]
Initial Structural Model – No Indicator Variables
Structural Model with Names and Paths
Name Constructs, Align Indicators, Etc. . . .
[Toolbar icons: Start calculation · Rename construct · Hide used indicators · Show measurement model · Change reflective to formative]
How to Run SmartPLS Software
Default Settings for Example – Click Finish to run
Trade-off in missing value treatment: casewise replacement can greatly reduce the number of cases, but sample mean imputation reduces variables' variance.
The preferred approach to deal with missing data is a combination of sub-group and nearest neighbor imputation, or EM imputation using SPSS.
Always use the path weighting scheme.
PLS Results for Example
SmartPLS Calculation Reports – Overview
Quality Criteria Report – SmartPLS
The composite reliability is excellent – almost .90 for all three constructs. The AVEs for all three constructs are well above .50.
Summary of PLS-SEM Findings
1. The direct path from COMP to CUSA is 0.162 and the direct
path from COMP to CUSL is 0.009.
2. The direct path from LIKE to CUSA is 0.424 and the direct path
from LIKE to CUSL is 0.342.
3. The direct path from CUSA to CUSL is 0.504.
4. Overall, the model predicts 29.5% of the variance in CUSA, and
56.2% of the variance in CUSL.
5. Reliability of constructs is excellent.
6. Constructs achieve convergent validity (AVE > 0.50)
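The reliability and convergent validity criteria in points 5 and 6 can be computed directly from standardized outer loadings. A sketch with hypothetical loadings (not the example model's actual estimates):

```python
import numpy as np

def composite_reliability(loadings):
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    l = np.asarray(loadings, dtype=float)
    return l.sum() ** 2 / (l.sum() ** 2 + (1.0 - l ** 2).sum())

def ave(loadings):
    """AVE = mean of the squared standardized loadings."""
    l = np.asarray(loadings, dtype=float)
    return (l ** 2).mean()

loadings = [0.8, 0.8, 0.8]   # hypothetical standardized loadings
print(round(composite_reliability(loadings), 3))  # 0.842
print(round(ave(loadings), 3))                    # 0.64
```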
To determine significance levels, you must run the Bootstrapping option. Look for it under the calculate option.
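The bootstrapping idea can be sketched outside SmartPLS as well (simulated data; SmartPLS applies the same logic to the full path model): resample cases with replacement, re-estimate the coefficient each time, and divide the original estimate by the bootstrap standard error to get an empirical t-value:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Simulated scores for two constructs with a true underlying relationship
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)
estimate = np.corrcoef(x, y)[0, 1]

# Bootstrap: resample cases with replacement, re-estimate each time
boot = []
for _ in range(2000):
    idx = rng.integers(0, n, n)
    boot.append(np.corrcoef(x[idx], y[idx])[0, 1])

se = np.std(boot, ddof=1)
t_value = estimate / se  # |t| > 1.96 ~ significant at the 5% level
print(round(t_value, 1))
```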