Journal of Operations Management 24 (2006) 148–169
Use of structural equation modeling in operations
management research: Looking back and forward
Rachna Shah, Susan Meyer Goldstein
Operations and Management Science Department, Carlson School of Management,
321, 19th Avenue South, University of Minnesota, Minneapolis, MN 55455, USA
Received 10 October 2003; received in revised form 28 March 2005; accepted 3 May 2005
Available online 5 July 2005
Abstract
This paper reviews applications of structural equation modeling (SEM) in four major Operations Management journals
(Management Science, Journal of Operations Management, Decision Sciences, and Journal of Production and Operations
Management Society) and provides guidelines for improving the use of SEM in operations management (OM) research. We
review 93 articles from the earliest application of SEM in these journals in 1984 through August 2003. We document and assess
these published applications and identify methodological issues gleaned from the SEM literature. The implications of
overlooking fundamental assumptions of SEM and ignoring serious methodological issues are presented along with guidelines
for improving future applications of SEM in OM research. We find that while SEM is a valuable tool for testing and advancing
OM theory, OM researchers need to pay greater attention to these highlighted issues to take full advantage of its potential.
© 2005 Elsevier B.V. All rights reserved.
Keywords: Empirical research methods; Structural equation modeling; Operations management
1. Introduction
Structural equation modeling as a method for
measuring relationships among latent variables has
been around since early in the 20th century originating
in Sewall Wright’s 1916 work (Bollen, 1989). Despite
a slow but steady increase in its use, it was not until the
monograph by Bagozzi in 1980 that the technique was
brought to the attention of a much wider audience of
marketing and consumer behavior researchers. While
Operations Management (OM) researchers were slow
to use this new statistical approach, structural equation
modeling (SEM) has more recently become one of the
preferred data analysis methods among empirical OM
researchers, and articles that employ SEM as the
primary data analytic tool now routinely appear in
major OM journals.
Despite its regular and frequent application in the
OM literature, there are few guidelines for the
application of SEM and even fewer standards that
researchers adhere to in conducting analyses and
presenting and interpreting results, resulting in a large
variance across articles that use SEM. To the best of
our knowledge, there are no reviews of the applications of SEM in the OM literature, while there are
regular reviews in other research areas that use this
technique. For instance, focused reviews have
appeared periodically in psychology (Hershberger,
2003), marketing (Baumgartner and Homburg, 1996),
MIS (Chin and Todd, 1995; Gefen et al., 2000),
strategic management (Shook et al., 2004), logistics
(Garver and Mentzer, 1999), and organizational
research (Medsker et al., 1994). These reviews have
revealed vast discrepancies and serious flaws in the use
of SEM. Steiger (2001) notes that even SEM textbooks
ignore many important issues, suggesting that
researchers may not have sufficient guidance to use
SEM appropriately.
Due to the complexities involved in using SEM and
problems uncovered in its use in other fields, a review
specific to OM literature seems timely and warranted.
Our objectives in conducting this review are threefold. First, we characterize published OM research in
terms of relevant criteria such as software used,
sample size, parameters estimated, purpose for using
SEM (e.g. measurement model development, structural model evaluation), and fit measures used. In
using SEM, researchers have to make subjective
choices on complex elements that are highly interdependent in order to align research objectives with
analytical requirements. Therefore, our second objective is to highlight these interdependencies, identify
problem areas, and discuss their implications. Third,
we provide guidelines to improve analysis and
reporting of SEM applications. Our goal is to promote
improved usage of SEM, standardize terminology, and
help prevent some common pitfalls in future OM
research.
2. Overview of structural equation modeling
To provide a basis for subsequent discussion, we
present a brief overview of structural equation
modeling along with two special cases frequently
used in the OM literature. The overview is intended to
be a brief synopsis rather than a comprehensive
detailing of mathematical model specification. There
are a number of books (Maruyama, 1998; Bollen,
1989) and articles dealing with mathematical specification (Anderson and Gerbing, 1988), key assumptions underlying model specification (Bagozzi and Yi,
1988; Fornell, 1983), and other methodological issues
of evaluation and fit (MacCallum, 1986; MacCallum
et al., 1992).
At the outset, we point to a distinction in the use of
two terms that are often used interchangeably in OM:
covariance structure modeling (CSM) and structural
equation modeling (SEM). CSM represents a general
class of models that include ARMA (autoregressive
and moving average) time series models, multiplicative models for multi-faceted data, circumplex
models, as well as all SEM models (Long, 1983).
Thus, SEM models are a subset of CSM models. We
restrict the current review to SEM models because
other types of CSM models are rarely used in OM
research.
Structural equation modeling is a technique to
specify, estimate, and evaluate models of linear
relationships among a set of observed variables in
terms of a generally smaller number of unobserved
variables (see Appendix A for detail). SEM models
consist of observed variables (also called manifest or
measured, MV for short) and unobserved variables
(also called underlying or latent, LV for short) that can
be independent (exogenous) or dependent (endogenous) in nature. LVs are hypothetical constructs that
cannot be directly measured, and in SEM are typically
represented by multiple MVs that serve as indicators
of the underlying constructs. The SEM model is an a
priori hypothesis about a pattern of linear relationships
among a set of observed and unobserved variables.
The objective in using SEM is to determine whether
the a priori model is valid, rather than to ‘find’ a
suitable model (Gefen et al., 2000).
Path analysis and confirmatory factor analysis are
two special cases of SEM that are regularly used in
OM. Path analysis (PA) models specify patterns of
directional and non-directional relationships among
MVs. The only LVs in such models are error terms
(Hair et al., 1998). Thus, PA provides for the testing of
structural relationships among MVs when the MVs are
of primary interest or when multiple indicators for LVs
are not available. Confirmatory factor analysis (CFA)
requires that LVs and their associated MVs be
specified before analyzing the data. This is accomplished by restricting the MVs to load on specific LVs
and by designating which LVs are allowed to correlate.
Fig. 1. Illustrations of PA, CFA, and SEM models.
A CFA model allows for directional influences
between LVs and their MVs and (only) nondirectional (correlational) relationships between
LVs. Long (1983) provides a detailed (mathematical)
treatment of each of these techniques. Fig. 1 shows
graphical illustrations of SEM, PA and CFA models.
Throughout this paper, we use the term SEM to refer to
all three model types (SEM, PA, CFA) and note any
exceptions to this.
3. Review of published SEM research
Our review focuses on empirical applications of
SEM which include: (1) CFA models alone, such as
in measurement or validation research; (2) PA
models (provided they are estimated using software
which allows latent variable modeling); and (3) SEM
models that combine both measurement and structural components. We exclude theoretical papers,
papers using simulation, conventional exploratory
factor analysis (EFA), structural models estimated by
regression models (e.g. models estimated by two
stage least squares), and partial least squares (PLS)
models. EFA models are not included because the
measurement model is not specified a priori (MVs
are not restricted to load on a specific LV and a MV
can load on multiple LVs),1 whereas in SEM, the
model is explicitly defined a priori. The main
objective of regression and PLS models is prediction (i.e. explaining variance in the dependent variables), whereas the objective of SEM is theory development and testing in the form of structural relationships (i.e. parameter estimation). This philosophical distinction
between these approaches is critical in deciding
whether to use PLS or SEM (Anderson and Gerbing,
1988). In addition, because assumptions underlying
PLS and regression are less constraining than
SEM, the problems and concerns in conducting
these analyses are significantly different. Therefore,
we do not include regression and PLS models in our
review.
3.1. Journal selection
We considered all OM journals that are recognized
as publishing high quality and relevant empirical OM
research. Recently, Barman et al. (2001) ranked
Management Science (MS), Operations Research
(OR), Journal of Operations Management (JOM),
Decision Sciences (DS), and Journal of Production
and Operations Management Society (POMS) as the
top OM journals in terms of quality. In the past decade,
several additional reviews have examined the quality
and/or relevance of OM journals and have consistently
ranked these journals in the top tier (Vokurka, 1996;
Goh et al., 1997; Soteriou et al., 1998; Malhotra and
Grover, 1998). We do not include OR in our review as
its mission does not include publishing empirical
research. We selected MS, JOM, DS, and POMS as the
journals most representative of high quality and
relevant empirical research in OM. In our review, we
include articles from these four journals that meet our
methodology criteria and do not exclude articles due
to topic of research.
1 Target rotation, rarely used in OM research, is an instance of EFA in which the model is specified a priori.
3.2. Time horizon and article selection
Rather than use specific search terms for selecting
articles, we manually checked each article of the
reviewed journals. Although more time consuming,
the manual search gave us more control and better
coverage than a ‘‘keyword’’ based search because
there is no widely accepted terminology for research
methods in OM to conduct such a search. In selecting
an appropriate time horizon, we started with the most
recent issue of each journal available until August
2003 and moved backwards in time. Using this
approach, we reviewed all published issues of JOM
from 1982 (Volume 1, Number 1) to 2003 (Volume 21,
Number 4) and POM from 1992 (Volume 1, Number
1) to 2003 (Volume 12, Number 1). For MS and DS,
we moved backward in time until we no longer found
applications of SEM. The earliest application of SEM
in DS was found in 1984 (Volume 15, Number 2) and
the most recent issue reviewed is Volume 34, Number
1 (2003). The incidence of SEM in MS began in 1987
(Volume 34, Number 6) and we reviewed all issues
through Volume 49, Number 8 (2003). The earliest
publication in these two journals corresponds with our knowledge of the field and has face validity because it coincides with the general timeframe when SEM was beginning to gain the attention of a wider audience in other literature streams.
In total, we found 93 research articles that satisfied
our selection criteria. Fig. 2 shows the number of
articles stacked by journal for the years we reviewed.
This figure is very informative: overall, it is clear that
the number of SEM articles has increased significantly
over the past 20 years in the four journals individually and cumulatively.
Fig. 2. Number of articles by journal and year.
To assess the growth trend in the use
of SEM, we regress the number of articles on an index
of year of publication (beginning with 1984). We use
both linear and quadratic effects of time in the
regression model.
The regression model is significant (F(2,17) = 39.93, p = 0.000) and indicates that 82% of the variance in the number of SEM publications is explained by the linear and quadratic effects of time. Further, the linear trend is not significant (t = 0.850, p = 0.41), whereas the quadratic effect is significant (t = 2.94, p = 0.009).
So the use of SEM has not grown linearly as a function
of time; rather, it has accelerated over time. In contrast,
the use of SEM in marketing and psychology grew
steadily over time and there is no indication of its
accelerated use in more recent years (Baumgartner
and Homburg, 1996; Hershberger, 2003).
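As an illustration of this type of trend analysis, the following sketch fits the same linear-plus-quadratic regression with statsmodels; the yearly counts shown are illustrative placeholders rather than our actual data.

    import numpy as np
    import statsmodels.api as sm

    # Illustrative article counts for 20 years (1984-2003); placeholders, not the review data.
    counts = np.array([1, 0, 1, 2, 1, 2, 3, 2, 4, 3, 5, 4, 6, 7, 8, 9, 11, 13, 14, 16])
    t = np.arange(1, len(counts) + 1)          # year index: 1 = 1984, 2 = 1985, ...

    # Regress counts on linear and quadratic effects of time.
    X = sm.add_constant(np.column_stack([t, t ** 2]))
    fit = sm.OLS(counts, X).fit()

    print(fit.summary())                       # overall F-test, R-squared, and t-tests for t and t^2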
There are several software programs available for
conducting SEM analysis, and each has idiosyncrasies and fundamental requirements for conducting
analysis. In our database, 19.6% of the articles did
not report the software used. LISREL accounted for 48.3%,
followed by EQS (18.9%), SAS (9.1%), AMOS
(2.8%), RAMONA (0.7%) and SPSS (0.7%).
LISREL was the first software developed to solve
structural equation models and seems to have
capitalized on its first mover advantage not only in
psychology (MacCallum and Austin, 2000) and
marketing (Baumgartner and Homburg, 1996) but
also in OM.
3.3. Unit of analysis
In our review we found that multiple models were
sometimes presented in one article. Therefore, the unit
of analysis from this point forward (unless specified
otherwise) is the actual applications (one or more
models for each article). A single model is included in
our data set in the following situations: (1) when a
single model is proposed and evaluated using a single
sample; (2) when multiple alternative or nested
models are evaluated using a single sample, only
the final model is included in our analysis; (3) when a
single model is evaluated with either multiple samples
or by splitting a sample, only the model tested with the
verification sample is included in our analysis. Thus,
in these three cases, each article contributed only one
model to the analysis. When more than one model is
evaluated (using single, multiple, or split samples)
each distinct model is included in our analysis. In this
situation, each article contributed more than one
model to the analysis. A total of 143 models were
drawn from the 93 research articles, thus the overall
sample size for the remainder of the paper is 143. Of
the 143 models, we could not determine the method
used for four models. Of the remaining 139 models, 26
are PAs, 38 are CFAs, and 75 are SEMs. There are a
small number of articles that reported models that
never achieved adequate fit (by the authors’ descriptions), and while we include these articles in our
review, the fit measures are omitted from our analysis
to avoid inclusion of data related to models with
inadequate fit.
4. Critical issues in the application of SEM
There are many important issues to consider when
using SEM, whether for evaluating a measurement
model or examining the fit of structural relationships,
separately or simultaneously. Our discussion of issues
is organized into three groups: (1) issues to consider or
address prior to analysis are categorized under the
‘‘pre-analysis’’ stage; (2) issues and concerns to
address during analysis; and (3) issues related to the
post-analysis stage, which includes issues related to
evaluation, interpretation and presentation of results.
Decisions made at each stage are highly interdependent and significantly impact the quality of results, and
we cross-reference and discuss these interdependencies whenever possible.
4.1. Issues related to pre-analysis stage
Issues related to the pre-analysis stage need to be
considered prior to conducting SEM analysis and
include conceptual issues, sample size issues, measurement model specification, latent model specification, and degrees of freedom issues. A summary of
pre-analysis data from the reviewed OM studies is
presented in Table 1.
4.1.1. Conceptual issues
An underlying assumption of SEM analysis is that
the items or indicators used to measure a LV are reflective (i.e. caused by the same underlying LV) in nature.
Table 1
Issues related to pre-analysis stage. Values are given for path analysis (PA) models, confirmatory factor analysis (CFA) models, structural equation models (SEM), and all models^a.
Number of models reviewed^a: PA 26; CFA 38; SEM 75; all models 143.
Sample size: PA median 125.0, mean 251.2, range (18, 2338); CFA median 141.0, mean 245.4, range (63, 902); SEM median 202.0, mean 246.4, range (52, 840); all models median 176.0, mean 243.3, range (16, 2338).
Number of parameters estimated: PA median 10.0, mean 11.3, range (2, 34); CFA median 31.0, mean 38.3, range (8, 98); SEM median 34.0, mean 37.5, range (11, 101); all models median 26.0, mean 31.9, range (2, 101).
Sample size/parameters estimated: PA median 9.6, mean 33.5, range (2.9, 389.7); CFA median 6.2, mean 8.8, range (2.3, 36.1); SEM median 5.6, mean 7.4, range (1.6, 25.4); all models median 6.4, mean 13.2, range (1.6, 389.7).
Number of manifest variables: PA median 6.0, mean 6.3, range (3, 10); CFA median 12.5, mean 13.5, range (4, 32); SEM median 12.0, mean 16.3, range (5, 80); all models median 11.0, mean 14.0, range (3, 80).
Number of latent variables: PA not relevant; CFA median 3.0, mean 3.66, range (1, 10); SEM median 4.0, mean 4.7, range (1, 12); all models median 4.0, mean 4.4, range (1, 12).
Manifest variables/latent variable: PA not relevant; CFA median 4.0, mean 5.2, range (1.3, 16.0); SEM median 3.3, mean 4.1, range (1.3, 9.0); all models median 3.6, mean 4.5, range (1.3, 16.0).
Number of single indicator latent variables^b: PA not relevant; CFA reported for 1 model; SEM reported for 25 models; all models reported for 28 models.
Correlated measurement errors (CMEs): PA 1 model unknown^c; CFA 11 models (28.9%); SEM 8 models (10.7%), 4 models unknown^c; all models 19 models (13.3%), 6 models unknown^c.
Theoretical justification for CMEs: PA not relevant; CFA 0 (0% of CFA models with CMEs); SEM 4 (50% of SEM models with CMEs); all models 4 (21% of all models with CMEs).
Recursiveness (all models): 127 (88.8%) recursive; 13 (9.1%) non-recursive; not reported or could not be determined from model description for 3 (2.1%) models.
Evidence of model identification: PA reported by 3.8%; CFA reported by 26.3%; SEM reported by 5.3%; all models reported by 10.5%.
Degrees of freedom (d.f.): PA median 4.5, mean 4.6, range (1, 11); CFA median 62.0, mean 90.1, range (5, 367); SEM median 52.5, mean 124.5, range (4, 690); all models median 48.0, mean 99.7, range (1, 690).
Proportion reporting d.f.: PA 53.8%; CFA 52.6%; SEM 88.0%; all models 71.3%.
a: The type of analysis performed could not be determined for 4 of 143 models published in 93 articles.
b: The number of latent variables modeled using a single measured variable (i.e. single indicator).
c: Presence of CMEs could not be determined due to inadequate model description.
Yet researchers frequently apply SEM to
formative indicators. Formative (also called causal)
indicators are measures that form or cause the creation
of a LV (MacCallum and Browne, 1993; Bollen,
1989). An example of formative measures is the
amount of beer, wine and hard liquor consumed to
indicate level of mental inebriation (Chin, 1998). It
can hardly be argued that mental inebriation causes the
amount of beer, wine and hard liquor consumption. On
the contrary, the amount of each type of alcoholic
beverage affects the level of mental inebriation.
Formative indicators do not need to be highly
correlated or have high internal consistency (Bollen,
1989). In this example, an increase in beer consumption does not imply an increase in wine or hard liquor
consumption. Measurement of formative indicators
requires an index (as opposed to developing a scale
when using reflective indicators), and can be modeled
using SEM, but requires additional constraints
(Bollen, 1989; MacCallum and Browne, 1993). Using
SEM without additional constraints makes the
resulting estimates invalid (Fornell et al., 1991) and
the model statistically unidentified (Bollen and
Lennox, 1991).
Another underlying assumption for SEM is that the
theoretical relationships hypothesized in the models
being tested represent actual relationships in the
studied population. SEM assesses how closely the
observed data correspond to the expected patterns
and requires that relationships represented by the
model are well established and amenable to accurate
measurement in the population. SEM is not recommended for exploratory research when the measurement structure is not well defined or when the theory
that underlies patterns of relationships among LVs is
not well established (Brannick, 1995; Hurley et al.,
1997).
Thus, researchers need to carefully consider: (1)
type of items, (2) state of underlying theory, and (3)
stage of development of measurement instrument,
prior to using SEM. For formative measurement items,
researchers should consider alternative techniques
such as SEM using formative indicators (MacCallum
and Browne, 1993) and components-based approaches
such as partial least squares (Cohen et al., 1990).
When the underlying theory or the measurement
structure is not well developed, simpler data analytic
techniques such as EFA and regression analysis may
be more appropriate (Hurley et al., 1997).
4.1.2. Sample size issues
Adequacy of sample size has a significant impact
on the reliability of parameter estimates, model fit,
and statistical power. Using a simulation experiment
to examine the effect of varying sample size to
parameter estimate ratios, Jackson (2003) reports that
smaller sample sizes are generally characterized by
parameter estimates with low reliability, greater
bias in χ2 and RMSEA fit statistics, and greater
uncertainty in future replication. How large a sample
should be for SEM is deceptively difficult to
determine because it is dependent upon several
characteristics such as number of MVs per LV
(MacCallum et al., 1996), degree of multivariate
normality (West et al., 1995), and estimation method
(Tanaka, 1987). Suggested approaches for determining sample size include establishing a minimum (e.g.,
200), having a certain number of observations per
MV, having a certain number of observations per
parameter estimated (Bentler and Chou, 1987;
Bollen, 1989; Marsh et al., 1988), and through
conducting power analysis (MacCallum et al., 1996).
While the first two approaches are simply rules
of thumb, the latter two have been studied
extensively.
Table 1 reports the results of analysis of SEM
applications in the OM literature related to sample size
and number of parameters estimated. The smallest
sample sizes for PA (n = 18), CFA (n = 63), and SEM
(n = 52) are significantly smaller than established
guidelines for models with even minimal complexity
(MacCallum et al., 1996; Marsh et al., 1988).
Additionally, 67.9% of all models have ratios of
sample size to parameters estimated of less than 10:1
and 35.7% of models have ratios of less than 5:1. The
lower end of both sample size and sample size to
parameter estimate ratios are significantly smaller in
the reviewed OM research than those studied by
Jackson (2003), indicating that the OM literature may
be highly susceptible to the negative outcomes
reported in his study.
Statistical power (i.e. the ability to detect and
reject a poor model) is critical to SEM analysis
because, in contrast to traditional hypothesis testing,
the goal in SEM analysis is to produce a non-significant difference between the sample data and the
implied covariance matrix derived from model
parameter estimates. Yet, a non-significant result
may also be due to a lack of ability (i.e. power) to
detect model misspecification. Few studies in our
review mentioned power and none estimated power
explicitly. Therefore, we employed the approach of MacCallum et al. (1996), who define the minimum sample size needed for adequate power (0.80) to detect close model fit as a function of degrees of freedom, to assess the power of the models in our sample. (We were
not able to assess power for 41 of 143 models due to
insufficient information.) Our analysis indicates that
37% of the models have adequate power and 63% do
not. These proportions are consistent with similar
analyses in psychology (MacCallum and Austin,
2000), MIS (Chin and Todd, 1995), and strategy
(Shook et al., 2004), and have not changed since
1960 (Sedlmeier and Gigerenzer, 1989). We
recommend that future researchers use MacCallum
et al. (1996) to calculate the minimum sample size
needed to ensure adequate statistical power.
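As a worked illustration of this recommendation, the sketch below implements the MacCallum et al. (1996) test of close fit under the conventional assumptions of RMSEA = 0.05 under the null and RMSEA = 0.08 under the alternative; the 50 degrees of freedom in the example are hypothetical.

    from scipy.stats import ncx2

    def power_close_fit(n, df, rmsea0=0.05, rmsea_a=0.08, alpha=0.05):
        """Power of the MacCallum et al. (1996) test of close fit (H0: RMSEA <= 0.05)."""
        nc0 = (n - 1) * df * rmsea0 ** 2          # noncentrality under the null (close fit)
        nc_a = (n - 1) * df * rmsea_a ** 2        # noncentrality under the alternative
        crit = ncx2.ppf(1 - alpha, df, nc0)       # critical chi-square value
        return ncx2.sf(crit, df, nc_a)            # probability of exceeding it under the alternative

    def min_sample_size(df, target=0.80):
        """Smallest n giving at least the target power to detect lack of close fit."""
        n = 10
        while power_close_fit(n, df) < target:
            n += 1
        return n

    print(min_sample_size(df=50))   # minimum n for a hypothetical model with 50 degrees of freedom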
4.1.3. Degrees of freedom and model
identification
Degrees of freedom are calculated as follows: d.f. = (1/2){p(p + 1)} − q, where p is the number of MVs, (1/2){p(p + 1)} is the number of equations (or alternately, the number of distinct elements in the input matrix "S"), and q is the effective number of free (unknown) parameters to be estimated minus the
number of implied variances. As the formula
indicates, degrees of freedom is a function of model
specification in terms of the number of equations and
the effective number of free parameters that need to be
estimated.
When the effective number of free parameters is
exactly equal to the number of equations (that is, the
degrees of freedom are zero), the model is said to be
‘‘just-identified’’ or ‘‘saturated’’. Just-identified models provide an exact solution for parameters (i.e. point
estimates with no confidence intervals). When the
effective number of free parameters is greater than the
number of equations (degrees of freedom are less than
zero), the model is ‘‘under-identified’’ and sufficient
information is not available to uniquely estimate the
parameters. Under-identified models may not converge during model estimation, and when they do, the
parameter estimates they provide are not reliable and
overall fit statistics cannot be interpreted (Rigdon,
1995). For models in which there are fewer unknowns
than equations (degrees of freedom are one or greater),
the model is ‘‘over-identified’’. An over-identified
model is highly desirable because more than one
equation is used to estimate at least some of the
parameters, significantly enhancing reliability of the
estimate (Bollen, 1989).
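The arithmetic is straightforward to automate; the following sketch computes degrees of freedom from hypothetical values of p and q and labels the resulting identification status, recalling that non-negative degrees of freedom are necessary but not sufficient for identification.

    def model_degrees_of_freedom(p, q):
        """d.f. = p(p + 1)/2 - q, where p = number of MVs and q = effective free parameters."""
        n_equations = p * (p + 1) // 2            # distinct elements of the input matrix S
        return n_equations - q

    def identification_status(df):
        if df < 0:
            return "under-identified"
        if df == 0:
            return "just-identified (saturated)"
        return "over-identified"

    # Hypothetical model: 12 manifest variables and 27 free parameters to estimate.
    df = model_degrees_of_freedom(p=12, q=27)
    print(df, identification_status(df))          # 51 over-identified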
Model identification is a complex issue and while
non-negative degrees of freedom is a necessary
condition, additional conditions such as establishing
a scale for each LV are frequently required (for a
detailed discourse on sufficiency conditions, see Long,
1983; Bollen, 1989). In our review, degrees of freedom
were not reported for 41 (28.7%) models (see Table 1).
We recalculated the degrees of freedom independently
for each reviewed model to assess discrepancies
between the reported and our calculated degrees of
freedom. We were not able to reproduce the degrees of
freedom for 18 applications based on authors’
descriptions of their models. This lack of reproducibility may be due in part to poor model description or
to correlated errors in the measurement or latent
variable models that are not stated in the text. We also
examined whether the issue of identification was
explicitly addressed for each model. One author
reported that the estimated model was not identified
and only 10.5% mentioned anything about model
identification. Perhaps the issue of identification was
considered implicitly because many software programs provide a warning message if a model is not
identified.
Model identification has a significant impact on
parameter estimates: in an unidentified model, more
than one set of parameter estimates could generate the
observed data and a researcher has no way to choose
among the various solutions because each is equally
valid (or invalid, if you wish). Degrees of freedom are
critically linked to the minimum sample size required
for adequate statistical power; the greater the degrees of freedom, the smaller the sample size needed for a given level of power (MacCallum et al., 1996).
Calculating and reporting the degrees of freedom are
fundamental to understanding the specified model, its
identification, and its fit. Thus, we recommend that
degrees of freedom and model identification should be
reported for every tested model.
4.1.4. Measurement model specification
4.1.4.1. Number of items (MVs) per LV. It is generally
accepted that multiple MVs should measure each LV
but the number of MVs that should be used is less
clear. A ratio of fewer than three MVs per LV is of
concern because the model is statistically unidentified
in the absence of additional constraints (Long, 1983).
A large number of MVs per LV is advantageous as it
helps to compensate for a small sample (Marsh et al.,
1988) but disadvantageous as it means more parameters to estimate, requiring a larger sample size for
adequate power. A large number of MVs per LV also
makes it difficult to parsimoniously represent the
measurement structure constituting the set of MVs
(Anderson and Gerbing, 1984). In cases where a large
number of MVs are needed to represent a LV, Bagozzi
and Heatherton (1994) suggest four methods to reduce
the number of MVs per LV. In our review, 24% of CFA
models (9 of 38) and 39% of SEM models (29 of 75)
had a MV:LV ratio of less than 3. Generally, these
applications did not explicitly discuss identification
issues or additional constraints. The number of MVs
per LV characteristic is not applicable to PA models.
4.1.4.2. Single indicator constructs. We identified
LVs represented by a single indicator in 2.6% of CFA
models and 33.3% of SEM models in our sample (not
applicable to PA models). The low occurrence of
single indicator variables for CFA is not surprising
because the central objective of CFA is construct
measurement. However, the relatively high occurrence
of single indicator constructs in SEM models is
troublesome because single indicators ignore measurement reliability, one of the challenges SEM is
designed to circumvent (Bentler and Chou, 1987). The
single indicator issue is also tied to model identification as discussed above. Single indicators are only
sufficient when one measure perfectly represents a
concept, a rare situation, or when measurement
reliability is not an issue. Generally, single MVs
should be modeled as MVs rather than LVs.
4.1.4.3. Correlated measurement errors. Measurement errors should sometimes be modeled as
correlated, for instance, in a longitudinal study when
the same item is measured at two points in time
(Bollen, 1989 p. 232). The statistical effect of
correlated error terms is the same as double loading,
but the substantive meaning is significantly different.
Double loading implies that each MV is affected by
two underlying LVs. Fundamental to LV unidimensionality is that each MV load on one LV with loadings
on all other LVs restricted to zero. Because adding
correlated measurement errors to SEM models nearly
always improves model fit, they are often used post
hoc without improving the substantive interpretation
of the model (Fornell, 1983; Gerbing and Anderson,
1984) and making reliability estimates ambiguous
(Bollen, 1989 p. 222).
To the best of our knowledge, our sample contains
no instances of double loading MVs but we found a
number of models with correlated measurement
errors: 3.8% of PA, 28.9% of CFA, and 10.7% of
SEM models. We read the text of each article carefully
to determine whether the authors provided any
theoretical justification for using correlated errors or
whether they were introduced simply to improve
model fit. In more than half of the applications, no
justification was provided. Correlated measurement
errors should be used only when warranted on
theoretical or methodological grounds (Fornell,
1983) and their statistical and substantive impact
should be explicitly discussed.
4.1.5. Latent model specification
4.1.5.1. Recursive/non-recursive models. Models are
non-recursive when they contain reciprocal causation,
feedback loops, or correlated error terms (Bollen,
1989, p. 83). In such models, the matrix of coefficients relating the latent endogenous variables (B; see Appendix A for more
detail) has non-zero elements both above and below the
diagonal. If B is lower triangular and the errors in
equations are uncorrelated, then the model is called
recursive (Hair et al., 1998). Non-recursive models
require additional restrictions for the model to be
identified, for the stability of estimated reciprocal
effects, and for the interpretation of measures of
variation accounted for in the endogenous variables (for
a more detailed treatment of non-recursive models, see
Long, 1983; Teel et al., 1986). In our review, we
examined each application for recursive and nonrecursive models due to either simultaneous effects or
correlated errors in equations. While we did not observe
any instances of simultaneous effects, we found that in
9.1% of the models, either the authors defined their
model as non-recursive or a careful reading of the article
led to such a conclusion. However, even when authors
explicitly stated that they were testing a non-recursive
model, we saw little if any explanation of issues such as
model identification in the text. We recommend that if
non-recursive models are specified, additional restrictions and implications for model identification are
explicitly stated in the paper.
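A simple check of this definition can be scripted; the sketch below, using a hypothetical three-equation latent model, tests whether B is lower triangular (with the LVs ordered so that hypothesized causes precede effects) and whether the structural error covariance matrix is diagonal.

    import numpy as np

    def is_recursive(B, psi):
        """Recursive if B is lower triangular (given a causal ordering of the LVs)
        and the structural error covariance matrix (psi) is diagonal."""
        lower_triangular = np.allclose(B, np.tril(B))
        uncorrelated_errors = np.allclose(psi, np.diag(np.diag(psi)))
        return lower_triangular and uncorrelated_errors

    # Hypothetical latent model with a feedback loop between the second and third LVs.
    B = np.array([[0.0, 0.0, 0.0],
                  [0.4, 0.0, 0.3],
                  [0.2, 0.5, 0.0]])
    psi = np.diag([1.0, 0.8, 0.7])
    print(is_recursive(B, psi))    # False: B has entries both above and below the diagonal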
4.2. Issues related to data analysis
Data analysis issues comprise examining sample
data for distributional characteristics and generating
an input matrix. Distributional characteristics of the
data impact researchers’ choices of estimation
method, and the type of input matrix impacts the
selection of software used for analysis.
4.2.1. Data screening
Data screening is critical to prepare data for SEM
analysis (Hair et al., 1998). Screening through
exploratory data analysis includes investigating for
missing data, influential outliers, and distributional
characteristics. Significant missing data result in
convergence failures, biased parameter estimates,
and inflated fit indices (Brown, 1994; Muthen et al.,
1987). Influential outliers are linked to normality and
skewness issues with MVs. Assessing data normality
(along with skewness and kurtosis) is important
because many model estimation methods are based on
an assumption of normality. Non-normal data may
result in inflated goodness of fit statistics and
underestimated standard errors (MacCallum et al.,
1992), although these effects are lessened with larger
sample sizes (Lei and Lomax, 2005).
In our review, only a handful of applications
discussed missing data. In the psychology literature,
listwise deletion, pairwise deletion, data imputation
and full information maximum likelihood (FIML)
methods are commonly used to manage missing data
(Marsh, 1998). Results from Monte Carlo simulation
examining the performance of these four methods
indicate the superiority of FIML, leading to the lowest
rate of convergence failures, least bias in parameter
estimates, and lowest inflation in goodness of fit
statistics (Enders and Bandalos, 2001; Brown, 1994).
FIML method is currently available in LISREL
(version 8.50 and above), SYSTAT (RAMONA) and
AMOS.
157
We found that for 26.6% of applications, normality
was discussed qualitatively in the text of the reviewed
articles. Estimation methods such as maximum likelihood and generalized least squares assume normality, although some non-normality can be accommodated (Hu and Bentler, 1998; Lei and Lomax, 2005). Weighted least squares, ordinary least squares, and asymptotically distribution free estimation methods do not require normality. Additionally, "ML, Robust" in EQS software adjusts model fit and
parameter estimates for non-normality. Finally,
researchers can transform non-normal data, although
serious problems have been noted with data transformation (cf. Satorra, 2001). We suggest that some
discussion of data screening methods be included
generally, and normality be discussed specifically in
relation to the choice of estimation method.
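A minimal univariate screening pass of this kind might look like the following sketch, which uses simulated data in place of actual survey responses and reports missing-value counts, skewness, and excess kurtosis for each MV.

    import numpy as np
    from scipy.stats import skew, kurtosis

    def screen(data, names):
        """Univariate screening: missing counts, skewness, and excess kurtosis per MV."""
        for j, name in enumerate(names):
            col = data[:, j]
            missing = int(np.isnan(col).sum())
            valid = col[~np.isnan(col)]
            print(f"{name}: missing={missing}, skew={skew(valid):.2f}, "
                  f"kurtosis={kurtosis(valid):.2f}")

    # Hypothetical survey responses (200 cases, 3 items), with a few missing values.
    rng = np.random.default_rng(0)
    x = rng.normal(size=(200, 3))
    x[rng.integers(0, 200, 5), 0] = np.nan
    screen(x, ["item1", "item2", "item3"])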
4.2.2. Type of input matrix
While raw data can be used as input for SEM
analysis, a covariance (S) or correlation (R) matrix is
generally used. In our review of the OM literature, no
papers report using raw data, 30.8% report using S,
and 25.2% report using R (44.1% of applications did
not report the type of matrix used to conduct analysis).
Seven of 44 applications using S and 25 of 36
applications using R provide the input matrix in the
paper. Not providing the input matrix makes it
impossible to replicate the results reported by the
author(s).
While conventional estimation methods in SEM are
based on statistical distribution theory that is
appropriate for S but not for R, there are interpretational advantages to using R: if MVs are standardized
and the model is fit to R, then parameter estimates can
be interpreted in terms of standardized variables.
However, it is not correct to fit a model to R while
treating R as if it were a covariance matrix. Cudeck
(1989) conducted exhaustive analysis on the implications of treating R as if it were S and concludes that the
consequences depend on the properties of the model
being fitted: standard errors, confidence intervals and
test statistics for the parameter estimates are incorrect
in all cases. In some cases, parameter estimates and
values of fit indices are also incorrect.
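The relationship between the two matrices is a simple rescaling, R = D^(-1/2) S D^(-1/2), where D holds the diagonal of S, as the sketch below illustrates with a hypothetical covariance matrix; the point of Cudeck's (1989) analysis is that fitting a model to R as if it were S does not simply apply this rescaling to the resulting standard errors and test statistics.

    import numpy as np

    # Hypothetical covariance matrix S for three items.
    S = np.array([[1.44, 0.72, 0.60],
                  [0.72, 1.21, 0.55],
                  [0.60, 0.55, 1.00]])

    # Rescale S to the correlation matrix R = D^(-1/2) S D^(-1/2).
    d = np.sqrt(np.diag(S))
    R = S / np.outer(d, d)
    print(np.round(R, 3))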
Software programs commonly used to conduct
SEM deal with this issue in different ways. Correct
estimation of a correlation matrix can be done in
LISREL (Jöreskog and Sörbom, 1996) but requires the
user to introduce specific parameter constraints.
Although not widely used in OM, RAMONA (Browne
and Mels, 1998), EQS (Bentler, 1989) and SEPATH
(Steiger, 1999) automatically provide correct estimation with a correlation matrix. Currently, AMOS
cannot analyze correlation matrices. In our review, we
found 24 instances where authors reported using a
correlation matrix with LISREL (out of 69 models run
with LISREL) but most did not mention the necessary
additional constraints. We found one instance of using
AMOS with a correlation matrix.
Given the lack of awareness among users about the
treatment of R versus S by various software programs,
we direct readers’ attention to a test devised by
MacCallum and Austin (2000) to help users determine
whether a particular SEM program provides correct
estimation of a model fit to a correlation matrix.
Otherwise, it is preferable to fit models to covariance
matrices, thus ensuring correct results.
4.2.3. Estimation methods
A variety of estimation methods such as maximum likelihood (ML), generalized least squares (GLS), weighted and unweighted least squares (WLS and ULS), asymptotically distribution free (ADF), and ordinary least squares (OLS) are available. Their use depends upon the distributional properties of the MVs, and each has computational advantages and disadvantages relative to the others. For instance, ML assumes data are univariate and multivariate normal and requires that the input data matrix be positive definite, but it is relatively unbiased under moderate violations of normality (Bollen, 1989). GLS assumes normality but does not impose the restriction of a positive definite input matrix. ADF has few distributional assumptions but requires very large sample sizes for accurate estimates. OLS, the simplest method, has no distributional assumptions and is computationally the most robust, but it is scale dependent and does not provide fit indices or standard errors for estimates.
Forty-eight percent of applications in our review did not report the estimation method used. Of the applications that reported the estimation method, a majority (68.9%) used ML. Estimation method, data normality, sample size, and model specification are inextricably linked and must be considered simultaneously by the researcher. We suggest that authors explicitly state the estimation method used and link it to the properties of the observed variables.
4.3. Issues related to post-analysis
Post-analysis issues include evaluating the solution achieved from model estimation, model fit, and respecification of the model. Reports of these data from the studied sample are summarized in Tables 2a and 2b.
4.3.1. Evaluation of solution
We have organized our discussion of evaluation of solutions into overall model fit, measurement model fit, and structural model fit. To focus solely on the overall fit of the model while overlooking important information about parameters is a common error that we encountered in our review. A model with good overall fit but yielding nonsensical parameter estimates is not a useful model.
Table 2a
Issues related to data analysis for structural model. For each fit index: number and proportion of models reporting (n = 143), followed by the mean, median, and range of reported values.
χ2: 107 models (74.8%); mean 204.0, median 64.2; range (0.0, 1270.0).
χ2 p-value: 76 models (53.1%); mean 0.21, median 0.13; range (0.0, 0.94).
GFI: 84 models (58.7%); mean 0.93, median 0.94; range (0.75, 0.99).
AGFI: 59 models (41.3%); mean 0.89, median 0.90; range (0.63, 0.97).
RMR (or RMSR): 51 models (35.7%); mean 0.052, median 0.050; range (0.01, 0.14)^a.
RMSEA: 51 models (35.7%); mean 0.058, median 0.060; range (0.00, 0.13).
NFI: 49 models (34.3%); mean 0.91, median 0.92; range (0.72, 0.99).
NNFI (or TLI): 62 models (43.4%); mean 0.95, median 0.95; range (0.73, 1.07).
CFI: 73 models (51.0%); mean 0.96, median 0.96; range (0.88, 1.00).
IFI (or BL89): 16 models (11.2%); mean 0.94, median 0.95; range (0.88, 0.98).
Normed χ2 (χ2/d.f.) reported: 52 models (36.4%); mean 1.82, median 1.59; range (0.02, 4.80).
Normed χ2 calculated: 98 models (68.5%)^b; mean 2.17, median 1.62; range (0.01, 21.71).
a: One model reported RMR = 145.4; this data point was omitted as an outlier relative to other reported RMRs.
b: Data were not available to calculate the others.
Table 2b
Issues related to data analysis for measurement model: number and proportion of models reporting (n = 143).
Reliability assessment: 123 models (86.0%).
Unidimensionality assessment: 94 models (65.7%).
Discriminant validity addressed: 99 models (69.2%).
Validity issues addressed (R2; variance explained): 76 models (53.1%).
Path coefficients: 138 models (96.5%); confidence intervals: 3 models (2.1%).
Path t-statistics: 90 models (62.9%); standard errors: 21 models (14.7%).
Residual information/analysis provided: 19 models (13.3%).
Specification search conducted for model respecification: 20 models (14.0%).
Modification indices used for model respecification: 21 models (14.7%).
Alternative models compared: 29 models (20.3%).
Inconsistency between described and tested models: 31 models (21.7%).
Cross-validation sample used: 22 models (15.4%).
Split sample approach used: 27 models (18.9%).
4.3.1.1. Overall model fit. Assessing a model’s fit is
one of the more complicated aspects of SEM because,
unlike traditional statistical methods, it relies on non-significance. Historically, the most popular index used to assess overall goodness of fit has been the χ2-statistic, although its conclusions regarding model significance are generally ignored. The χ2-statistic is inherently biased against the model when the sample size is large, yet it depends on distributional assumptions associated with large samples. Additionally, a χ2-test offers only a dichotomous decision strategy (accept/reject) for assessing the adequacy of fit implied by a statistical decision rule (Bollen, 1989). In light of these issues,
numerous alternative fit indices have been developed
to quantify the degree of fit along a continuum (see
Jöreskog, 1993; Tanaka, 1993; Bollen, 1989, pp. 256–
289; Mulaik et al., 1989 for comprehensive reviews).
Fit indices are commonly distinguished as either
absolute or incremental (Bollen, 1989). In general,
absolute fit indices indicate the degree to which the
hypothesized model reproduces the sample data, and
incremental fit indices measure the proportional
improvement in fit when the hypothesized model is
compared with a restricted, nested baseline model (Hu
and Bentler, 1998).
Absolute measures of fit: The most basic measure of
absolute fit is the χ2-statistic. Other commonly used
measures include root mean square error of approximation (RMSEA), root mean square residual (RMR or
SRMR), goodness-of-fit index (GFI) and adjusted
goodness of fit (AGFI). GFI and AGFI increase as
goodness of fit increases and are bounded above by
1.00, while RMSEA and RMR decrease as goodness of
fit increases and are bounded below by zero (Browne
and Cudeck, 1989). Ninety-four percent of the
applications we reviewed report at least one of these
measures (Table 2a). Although the frequency of use and
the magnitude of each of these measures are similar to
those reported in marketing by Baumgartner and
Homburg (1996), the ranges in our sample are much
wider indicating greater variability in empirical OM
research. The variability may be an indication of more
complex models and/or a less established theory base.
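For reference, the RMSEA point estimate can be recovered directly from the reported χ2, degrees of freedom, and sample size; the sketch below uses the common (n − 1) scaling, although some programs use n, and the example values are hypothetical.

    import math

    def rmsea(chi2, df, n):
        """Point estimate of RMSEA from the model chi-square, using (n - 1) scaling."""
        return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

    # Hypothetical model: chi-square = 112.3 on 48 d.f. with n = 250.
    print(round(rmsea(112.3, 48, 250), 3))   # approximately 0.073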
Incremental fit measures: Incremental fit measures
compare the model under study to two reference
models: (1) a worst case or null model, and (2) an ideal
model that perfectly represents the modeled phenomena in the studied population. While there are many
incremental fit indices, some of the most popular are
normed fit index (NFI), non-normed fit index (NNFI or
TLI), comparative fit index (CFI) and incremental fit
index (IFI or BL89). Sixty-nine percent of the
reviewed studies report at least one of the four
measures (Table 2a). An additional fit index that is
frequently used is the normed χ2, which is reported for 36.4% of models. Because the χ2-statistic by itself is beset with problems, the ratio of χ2 to degrees of freedom (χ2/d.f.) is informative because it corrects for model size. Additionally, we calculated the normed χ2 for all models that reported χ2 and either reported degrees of freedom or enough model specification information to allow us to ascertain the degrees of freedom (68.5% of all applications) and found a median of 1.62 (range 0.01, 21.71). Small values of normed χ2 (<1.0) can indicate an over-fitted model and higher values (>3.0–5.0) can indicate an under-parameterized model (Jöreskog, 1969).
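For readers who wish to recover these indices from reported statistics, the sketch below computes NFI, CFI, and the normed χ2 from the χ2 values and degrees of freedom of a hypothetical target model and its null (baseline) model.

    def incremental_fit(chi2_m, df_m, chi2_null, df_null):
        """NFI, CFI, and normed chi-square from the target and null (baseline) models."""
        nfi = (chi2_null - chi2_m) / chi2_null
        d_m = max(chi2_m - df_m, 0.0)            # noncentrality estimate, target model
        d_null = max(chi2_null - df_null, 0.0)   # noncentrality estimate, null model
        cfi = 1.0 if max(d_m, d_null) == 0 else 1.0 - d_m / max(d_m, d_null)
        return {"NFI": round(nfi, 3), "CFI": round(cfi, 3),
                "normed chi2": round(chi2_m / df_m, 2)}

    # Hypothetical values: target model chi2 = 112.3 (48 d.f.), null model chi2 = 890.4 (66 d.f.).
    print(incremental_fit(112.3, 48, 890.4, 66))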
A brief summary of the effects on fit indices of
small samples, normality violations, model misspecification, and estimation method are reported in
Table 3. An ongoing debate about superiority or even
appropriateness of one index over another makes the
issue of selecting which to use in assessing fit very
complex. For instance, Hu and Bentler (1998) advise
against using GFI and AGFI because they are
significantly influenced by sample size and are
insufficiently sensitive to model misspecification.
Most fit indices are influenced by sample size and
should not be interpreted independently of sample
size (Hu and Bentler, 1998; Marsh et al., 1988).
Therefore, no consistent criteria (i.e. cut-offs) can be
defined to apply in all (or most) instances (Marsh
et al., 1988).
Until definitive fit indices are developed, researchers should report multiple measures of fit so reviewers
and readers have the opportunity to evaluate the
underlying fit of the data to the model from multiple
perspectives. χ2 should be reported with its corresponding degrees of freedom in order to be insightful.
RMR and RMSEA, two measures that reflect the
residual differences between the input and implied
(reproduced) matrices, indicate how well matrix
covariance terms are predicted by the tested model.
RMR in particular performs well under many
conditions (Hu and Bentler, 1998; Marsh et al.,
1988). Researchers might also report a summary of
standardized (correlation) residuals because when
most or all are ‘‘quite small’’ relative to correlations in
the tested sample (Browne et al., 2002, p. 418), they
indicate good model fit (Bollen, 1989, p. 258).
4.3.1.2. Measurement model fit. Measurement model
fit can be evaluated in two ways: first, by assessing
constructs' reliability and convergent and discriminant validity, and second, by examining the individual path (parameter) estimates (Bollen, 1989).
Table 3
Influence of sample and estimation characteristics on model fit indices.
Small sample (n) bias^a:
Absolute indices — χ2: bias established^f; GFI: poor for small n^e, can be used^f; AGFI: poor for small n^e,f; RMR (or SRMR): ML preferred for small n^e; RMSEA: tends to over-reject model^e.
Incremental indices — NFI: poor for small n^e; NNFI (or TLI): best index for small n^f, but tends to over-reject model^e; CFI: ML preferred for small n^e; IFI (or BL89): ML preferred for small n^e; normed χ2: bias established^f.
Violations of normality^b: problematic when ADF estimation is used (noted for two indices)^e.
Model misspecification^c: misspecifications are not identified by ADF for some indices^e; for the other indices noted, misspecifications (or some misspecifications) are identified, generally with ML estimation preferred.
Estimation method effect^d: ML preferred for most indices; no preference for the others.
General comments: use of GFI, AGFI, and NFI is not recommended^e; RMR (or SRMR) is recommended for all analyses^e; use of RMSEA with ADF is not recommended^e.
a: While all fit indexes listed suffer small sample bias (approximately n < 250), we consolidate findings by leading researchers.
b: Most normality violations have insignificant effects on fit indexes, except those noted.
c: Identifying model misspecification is a positive characteristic; fit indexes that do not identify misspecification are considered poor choices.
d: The following estimation methods were investigated: maximum likelihood (ML), generalized least squares (GLS), asymptotically distribution free (ADF)^e,f.
e: Hu and Bentler (1998).
f: Marsh et al. (1988).
Various indices of reliability can be computed to
summarize how well LVs are measured by their MVs
individually or jointly (individual item reliability,
composite reliability, and average variance extracted;
cf. Bagozzi and Yi, 1988; Fornell and Larcker, 1981).
Our initial attempt to report reliability measures used
by the authors proved difficult due to the diversity of
methods used. Therefore, we limit our review to
whether authors report at least one of the various
measures. Overall, 86.0% of the applications describe
some form of reliability assessment. We recommend
that authors report at least one measure of construct
reliability based on estimated model parameters (e.g.
composite reliability or average variance extracted)
(Bollen, 1989).
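Both measures are easily computed from a standardized solution; the sketch below applies the usual Fornell and Larcker (1981) formulas to a hypothetical set of standardized loadings for a four-item construct, treating 1 − λ² as each item's error variance.

    def composite_reliability(loadings):
        """Fornell and Larcker (1981) composite reliability from standardized loadings."""
        total = sum(loadings)
        error = sum(1 - x ** 2 for x in loadings)   # standardized error variances
        return total ** 2 / (total ** 2 + error)

    def average_variance_extracted(loadings):
        """Average variance extracted (AVE) from standardized loadings."""
        return sum(x ** 2 for x in loadings) / len(loadings)

    # Hypothetical standardized loadings for a four-item construct.
    loadings = [0.78, 0.71, 0.83, 0.69]
    print(round(composite_reliability(loadings), 3),
          round(average_variance_extracted(loadings), 3))   # about 0.840 and 0.569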
Cronbach alpha is an inferior measure of reliability
because in most cases it is only a lower bound on
reliability (Bollen, 1989). In our review we found that
Cronbach alpha was frequently presented as proof to
establish unidimensionality. It is not sufficient for this
purpose because a scale may not be unidimensional
even if it has high reliability (Gerbing and Anderson,
1984). Our review also examined how published
research dealt with the issue of discriminant validity.
We found that 69.2% of all applications included
evidence of discriminant validity. Our review indicates
that despite a lack of standardization in the reported
measures, most published research in OM includes
some measure of reliability, unidimensionality and
validity.
Another way to assess measurement model fit is to
evaluate path estimates. In evaluating path estimates,
sign (positive or negative), strength, and significance
should be aligned with theory. The magnitude of
standard errors associated with path estimates should
be small; a large standard error indicates an unstable
parameter estimate that is subject to sampling error.
Although recommended, the 90% confidence interval (CI) around each path estimate is rarely used in practice but is very useful (Browne and Cudeck, 1993).
The CI provides an explicit indication of the degree of
parameter estimate precision. Additionally, the statistical significance of path estimates can be inferred
from the 90% CI: if the 90% CI includes zero, then the
path estimate is not significantly different from zero
(at α = 0.05). Overall, confidence intervals are very
informative and we recommend their use in future
studies. In our review, we found that 96.5% of the
applications report path coefficients, 62.9% provide t
statistics, 14.7% provide standard errors, and 2.1%
report confidence intervals.
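Constructing such an interval from reported output is straightforward; the sketch below forms a normal-theory 90% CI from a hypothetical path estimate and standard error and checks whether it includes zero.

    from scipy.stats import norm

    def path_ci(estimate, se, level=0.90):
        """Normal-theory confidence interval for a path estimate."""
        z = norm.ppf(0.5 + level / 2)         # 1.645 for a 90% interval
        return estimate - z * se, estimate + z * se

    # Hypothetical path estimate and standard error.
    lower, upper = path_ci(0.32, 0.11)
    print(f"90% CI = ({lower:.3f}, {upper:.3f}); excludes zero: {not (lower <= 0 <= upper)}")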
4.3.1.3. Structural model fit. In SEM models, the
latent variable model represents the structural relationships and, generally, the hypotheses of interest. In PA
models that do not have LVs, the hypotheses of interest
are generally represented by the paths between MVs.
Like measurement model fit, the sign, magnitude and
statistical significance of the structural path coefficients are examined in testing the hypotheses.
Researchers should recognize the important distinction between variance fit (explained variance in
endogenous variables as measured by R2 for each
structural equation) and covariance fit (overall goodness of fit, such as that tested by a χ2-test). Authors
emphasize covariance fit a great deal more than
variance fit; in our review, 53.1% of the models
presented evidence of the variance fit compared to 96%
that presented at least one index of overall fit. It is
important to distinguish between these two types of fit
because a model might fit well but not explain a
significant amount of variation in endogenous variables
or conversely, fit poorly and explain a large amount of
variation in endogenous variables (Fornell, 1983).
In summary, we suggest that fit indices should
not be regarded as measures of usefulness of a model.
They each contain some information about model fit but
none about model plausibility (Browne and Cudeck,
1993). Rather than establishing that fit indices meet
arbitrarily established cut-offs, future research should
report a variety of absolute and incremental fit indices
for measurement, structural, and overall models and
include a discussion of interpretation of fit indices
relative to the study design. We found many instances in
which authors conclude that a particular model had
better fit than alternative models based on comparing fit
indices. While some fit indices can be useful for such
comparisons, most commonly employed fit indices
cannot be compared across models in this manner (e.g. a
model with a lower RMSEA does not indicate better fit
than a model with a higher RMSEA). For nested
alternate models, the χ2 difference test or Target Coefficient can be used (Marsh and Hocevar, 1985). For
alternate models that are not nested, parsimony fit
measures such as Parsimonious NFI, Parsimonious
GFI, Akaike information criterion (AIC) and normed
χ2 can be used (Hair et al., 1998).
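For nested comparisons, the χ2 difference test can be computed directly from reported fit statistics, as in the sketch below, which uses hypothetical values for a model and a restricted version of it with two structural paths fixed to zero.

    from scipy.stats import chi2

    def chi2_difference(chi2_full, df_full, chi2_restricted, df_restricted):
        """Chi-square difference test for nested models (restricted model has more d.f.)."""
        delta = chi2_restricted - chi2_full
        ddf = df_restricted - df_full
        return delta, ddf, chi2.sf(delta, ddf)    # difference, its d.f., and p-value

    # Hypothetical nested comparison.
    print(chi2_difference(chi2_full=112.3, df_full=48, chi2_restricted=121.9, df_restricted=50))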
4.3.2. Model respecification
Although no model fits the real world exactly, a
desirable outcome in SEM analysis is to show that a
hypothesized model provides a good approximation of
real world phenomena, as represented by an observed
set of data. When an initial model of interest does not
satisfy this objective, researchers often alter the model
to improve its fit to the data. Modification of a
hypothesized model to improve its parsimony and/or
fit to the data is termed a ‘‘specification search’’
(Leamer, 1978; Long, 1983). A specification search is
designed to identify and eliminate errors from the
original specification of the hypothesized model.
Jöreskog and Sörbom (1996) describe three
strategies in model specification (and evaluation):
(1) strictly confirmatory, where a single a priori model
is studied; (2) model generation, where an initial
model is fit to data and then modified (frequently with
the use of modification indices) until it fits adequately;
and (3) alternative models, where multiple a priori
models are studied. Although not improper, the
‘‘strictly confirmatory’’ approach is highly restrictive
and does not leave the researcher any latitude if the
model does not work. The model generation approach
is troublesome because of the potential for abuse,
results that lack validity (MacCallum, 1986), and high
susceptibility to capitalization on chance (MacCallum
et al., 1992). Simulation work by MacCallum (1990)
and Homburg and Dobartz (1992) indicates that only
half of specification searches (even with correct
restrictions and large samples) are successful in
recovering the correct underlying model.
In our review, 28.7% (41 of 143) of the applications
reported making post hoc changes to respecify the
model. We also examined the published articles for
inconsistency between the model that was tested
versus the model described in the text. In 31 out of 143
cases we found such inconsistency, where we could
not match the described model with the tested model.
We suspect that in many cases, authors made post hoc
changes (perhaps to improve model fit), but those
changes were not well described. We found that only 20.3% of the models were tested against alternate models. We recommend that researchers compare
alternate a priori models (either nested or unnested) to
uncover the model that the observed data support best
rather than use specification searches (Browne and
Cudeck, 1989). Such practices may have a lower
probability of identifying models with great fit, but
they increase the alignment of modeling results with
our existing knowledge and theories. Leading journals
must show a willingness to publish poor fitting models
for such advancement of knowledge and theory.
5. Presentation and interpretation of results
We encountered many difficulties related to
presentation and interpretation of models, methods,
analysis, and results in our review. In a majority of
articles, we had difficulty determining either the
complete model (e.g. correlated measurement errors)
or the complete set of MVs. Whether the model was fit
to a correlation or covariance matrix could not be
ascertained for nearly half of the models, and reporting
of fit results was incomplete in a majority of models.
In addition, issues of causation in cross-sectional
designs, generalizability, and confirmation bias also
raise concerns and are discussed in detail below.
5.1. Causality
Each of the applications we reviewed used a cross-sectional research design. The debate over whether
concurrent measurement of variables can be used to
infer causality is vibrant but unresolved (Gollob and
Reichardt, 1991; Gollob and Reichardt, 1987;
MacCallum and Austin, 2000). One point of agreement is that causal interpretation must be based on the
theoretical grounding of and empirical support for a
model (Pearl, 2000). In light of this ongoing debate,
we suggest that OM researchers describe the theory
they are testing and its expected manifested results as
clearly as possible prior to conducting analysis.
5.2. Generalizability
‘‘Generalizability of findings’’ refers to the applicability of findings from one study with a finite, often
small sample to a population (or other populations).
Findings from single studies are subject to limitations
due to sample or selection effects and their impact on
the conclusions that can be drawn. In our review, such
limitations were seldom acknowledged and results were
usually interpreted and discussed as if they were
expansively generalizable. Sample and selection effects
are controlled (but not eliminated) by identifying a
specific population and from it selecting a sample that is
appropriate to the objectives of the study. Rather than
identifying a specific population, the articles we
reviewed focused predominantly on describing their
samples. However, a structural equation model is a
hypothesis about the structure of relationships among
MVs and LVs in a specific population, and this
population should be explicitly identified.
Another aspect of generalizability involves replicating the results of a study in a different sample from
the same population. We found that 15.4% of the
reviewed applications used cross-validation and
18.9% used a split sample approach. Given the
difficulty in obtaining responses from multiple
samples from a given population, the expected
cross-validation index (ECVI), an index computed
from a single sample, can indicate how well a solution
obtained in one sample is likely to fit an independent
sample from the same population (Browne and
Cudeck, 1989; Cudeck and Browne, 1983).
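For reference, a commonly used single-sample form of this index is ECVI = (χ2 + 2q)/(N − 1), where q is the number of free parameters and N the sample size; smaller values indicate a solution that can be expected to cross-validate better in an independent sample from the same population. (This is a restatement of the Browne and Cudeck formulation added here for convenience; the exact expression should be checked against the original sources.)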
Selecting the most appropriate set of measurement
items to represent the domain of underlying LVs is
critical when using SEM. However, there are few
standardized instruments for LVs, making progress in
empirical OM research slow and difficult. Appropriate
operationalization of LVs is as critical as their repeated
use: repetition helps to establish validity and
reliability. (For a detailed discussion and guidelines
on the selection effects related to good indicators, see
Little et al., 1999; for OM measurement scales, see Roth
and Schroeder, in press.) A challenging issue arises when
researchers are unable to validate previously used scales.
In such situations, we suggest a two-pronged strategy.
First, the researcher must examine, a priori, the assumptions employed in developing the previous scales and state their impact on replication. Second, when the scales cannot be replicated with validity, the researcher must use exploratory methods to develop modified scales to be validated by future researchers. However,
this respecified model should not be given the status
of a hypothesized model and would need to be validated
in the future with another sample from the same
population.
5.3. Confirmation bias
Confirmation bias is defined as a prejudice in favor
of the evaluated model (Greenwald et al., 1986). Our
review suggests that OM researchers (not unlike
researchers in other fields) are highly susceptible to
confirmation bias. Researchers evaluate a single
model, give an overly positive evaluation of model
fit, and are reluctant to consider alternative explanations of data. An associated problem in this context is
the existence of equivalent models, alternative models
that are indistinguishable from the original model in terms of goodness of fit to the data but differ in substantive meaning in terms of the underlying theory
(MacCallum et al., 1993). In a study of 53 published
applications in psychology, MacCallum et al. (1993)
showed that equivalent models exist routinely in large
numbers and are universally ignored by researchers. In
order to mitigate problems related to confirmation
bias, we recommend that OM researchers generate
multiple alternate, equivalent models a priori and if
one or more of these models cannot be eliminated due
to theoretical reasons or poor fit, to explicitly discuss
the alternate explanation(s) underlying the data rather
than confirming and presenting results from one
definitive model (MacCallum et al., 1993).
6. Discussion and conclusion
SEM has rapidly become an important and widely
used research tool in the OM literature. Its attractiveness to OM researchers can be attributed to two
factors. From CFA, SEM draws upon the notion of
unobserved or latent variables, and from PA, SEM
adopts the notion of modeling direct and indirect
relationships. These advantages, combined with the
availability of ever more user-friendly software, make
it likely that SEM will enjoy widespread use in the
future. We have provided both a review of the OM
literature employing SEM as well as discussion and
guidelines for improving its future use. Table 4
contains a summary of some of the most important
issues discussed here, their implications, and recommendations for resolving these challenges. Below, we
briefly discuss these issues.
As researchers, we should ensure that SEM is the
correct method for examining the research question at
Table 4
Implications and recommendations for select SEM issues (columns: issue, implications, recommendations)

Formative (causal) indicators
Implications (Bollen, 1989; MacCallum and Browne, 1993): without additional constraints, the model is generally unidentified.
Recommendations: model as causal indicators (MacCallum and Browne, 1993); report appropriate conditions and modeling issues.

Poorly developed or weak relationships
Implications (Hurley et al., 1997): more likely to result in a poor fitting model requiring specification searches and post hoc model respecification.
Recommendations: use alternative methods that demand less rigorous model specification, such as EFA and regression analysis (Hurley et al., 1997).

Violating multivariate normality
Implications (MacCallum et al., 1992): inflated goodness of fit statistics; underestimated standard errors.
Recommendations: use estimation methods that adjust for the violation, such as ‘‘ML, Robust’’ available in EQS; use estimation methods that do not assume multivariate normality, such as GLS and ADF.

Correlation matrix as input data
Implications: LISREL is inappropriate without additional constraints (Cudeck, 1989); standard errors, confidence intervals and test statistics for parameter estimates are incorrect in all cases; parameter estimates and fit indices are incorrect in some cases.
Recommendations: type of input matrix and software must be reported; RAMONA in SYSTAT (Browne and Mels, 1998), EQS (Bentler, 1989) and SEPATH (Steiger, 1999) can be used; LISREL can be used with additional constraints (LISREL 8.50); AMOS cannot be used.

Small sample size
Implications (MacCallum et al., 1996; Marsh et al., 1988; Hu and Bentler, 1998): associated with lower power, ceteris paribus; parameter estimates have lower reliability; fit indices are overestimated.
Recommendations: conduct and report statistical power; simpler models (fewer parameters estimated, higher degrees of freedom) are associated with higher power (MacCallum et al., 1996); use fit indices that are less biased by small sample size, such as NNFI, and avoid fit indices that are more biased, such as χ2, GFI and NFI (Hu and Bentler, 1998).

Few degrees of freedom (d.f.)
Implications (MacCallum et al., 1996): associated with lower power, ceteris paribus; parameter estimates have lower reliability; fit indices are overestimated.
Recommendations: report degrees of freedom; conduct and report statistical power; simpler models (fewer parameters estimated, higher degrees of freedom) are associated with higher power (MacCallum et al., 1996).

Model identification
Implications: if d.f. = 0, results are not generalizable; if d.f. < 0, the model cannot be estimated unless some parameters are fixed or held constant.
Recommendations: the desirable condition is d.f. > 0; assess and report model identification; explicitly discuss the implication of unidentified models for the generalizability of results.

Number of MVs per LV
Implications: to provide adequate representation of the content domain, a sufficient number of MVs per LV is needed.
Recommendations: have at least three MVs per LV for CFA/SEM (Rigdon, 1995).

One MV per LV
Implications: may not provide adequate representation of the content domain; poor reliability and validity because error variance cannot be estimated (Maruyama, 1998); the model is generally unidentified.
Recommendations: model as an MV (not an LV); a single MV can be modeled as an LV only when the MV is a perfect representation of the LV, and specific conditions must be imposed for identification purposes (LISREL 8.50).

Correlated measurement errors
Implications (Gerbing and Anderson, 1984): alters measurement and structural parameter estimates; almost always improves model fit; changes the substantive meaning of the model.
Recommendations: report correlated errors; justify their theoretical validity a priori; discuss the impact on measurement and structural parameter estimates and model fit.

Non-recursive models
Implications: without additional constraints the model is unidentified.
Recommendations: explicitly report that the model is non-recursive and why; add constraints and report their impact (Long, 1983).
hand. When theory development is at a nascent stage
and patterns of relationships among LVs are relatively
weak, SEM should be used with caution so that model
confirmation and theory testing do not degenerate into
extensive model respecification. Likewise, it is
important that we use appropriate measurement
methods and understand the distinction between
formative and reflective variables.
Determining minimum sample size is, in part,
dependent upon the number of parameter estimates in
the hypothesized model. But emerging research in this
area indicates that the relationship between sample
size and number of parameter estimates is complex
and dependent upon MV characteristics (MacCallum
et al., 2001). Likewise, guidelines on degrees of
freedom and model identification are not simple or
straightforward. Researchers must be cognizant of
these issues and we recommend that all studies discuss
them explicitly. As the powerful capabilities of SEM
derive partly from its highly restrictive simplifying
assumptions, it is important that assumptions such as
normality and skewness are carefully assessed prior to
generating an input matrix and conducting analysis.
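As a minimal sketch (in Python, with simulated data standing in for a real survey sample) of the kind of screening that might precede analysis, the following computes univariate skewness and kurtosis for each MV together with Mardia's multivariate kurtosis, whose expected value under multivariate normality is p(p + 2) for p variables:

import numpy as np
from scipy.stats import skew, kurtosis

def screen_normality(data):
    # data: an n x p matrix of manifest variables (rows are observations).
    n, p = data.shape
    univariate = {"skewness": skew(data, axis=0),
                  "excess_kurtosis": kurtosis(data, axis=0)}
    # Mardia's multivariate kurtosis: mean of the squared Mahalanobis distances.
    centered = data - data.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(data, rowvar=False))
    d2 = np.einsum("ij,jk,ik->i", centered, cov_inv, centered)
    return univariate, np.mean(d2 ** 2), p * (p + 2)

# Example with simulated multivariate-normal data for six MVs and 300 respondents;
# real survey data should be screened the same way before building the input matrix.
rng = np.random.default_rng(0)
univariate, mardia, expected = screen_normality(rng.normal(size=(300, 6)))
print(univariate, mardia, expected)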
With regard to model estimation, researchers
should recognize that parameter estimates are not
fixed values, but rather depend upon the estimation
method. For instance, parameter estimates obtained using maximum likelihood are different from those obtained using ordinary least squares (Browne
and Arminger, 1995). Further, in evaluating model fit,
the correspondence between the hypothesized model
and the observed data should be assessed using a
variety of absolute and incremental fit indices for
measurement, structural, and overall models. In
addition to path coefficients, confidence intervals
and standard errors should be assessed.
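To make the dependence on the estimation method concrete, maximum likelihood and unweighted (ordinary) least squares minimize different discrepancy functions, F_ML = ln|Σ(θ)| + tr(S Σ(θ)⁻¹) − ln|S| − p and F_ULS = (1/2) tr[(S − Σ(θ))²], where S is the sample covariance matrix, Σ(θ) the model-implied covariance matrix, and p the number of MVs; these are the standard textbook forms (e.g., Bollen, 1989) rather than expressions specific to any reviewed study.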
Rather than hypothesizing a single model, multiple
alternate models should be evaluated when possible,
and research results should be cross validated using split
or multiple samples. Given the very real possibility of
alternate, equivalent models, researchers should be
cautious in over-interpreting results. Because no model
represents the real world exactly, we must be more
forthright about the ‘‘imperfection’’ inherent in any
model and acknowledge the literal implausibility of the
model more explicitly (MacCallum, 2003).
One of the most poignant observations in conducting this study was the inconsistency in the published
reporting of results and, in numerous instances, our
inability to reconstruct the tested model based on the
description in the text and the reported degrees of
freedom. These issues can be resolved by attention to
published guidelines for presenting results of SEM (e.g.
Hoyle and Panter, 1995). To assist both during the
review process and in building a cumulative tradition in
the OM field, sufficient information needs to be
provided to understand (1) the population from which
the data sample was obtained, (2) the distribution of the
data, (3) the hypothesized measurement and structural
models, and (4) statistical results to corroborate the
subsequent interpretation and conclusions.
We recommend that every published application of
SEM provide a clear and complete specification of the
model(s) and variables, preferably in the form of a
graphical figure, including the measurement model
linking LVs to MVs, the structural model connecting
LVs, and specification of which parameters are being
estimated and which are fixed. It is helpful to identify
specific research hypotheses on the graphical figure,
both to clarify the model and to reduce the text needed
to describe them. In addition to including a statement
about the type of input data matrix, software and
estimation method used, we recommend the input
matrix be included in the paper for future replications and
meta-analytical research studies, but we recognize this
is an editorial decision subject to space constraints. In
terms of statistical results, we suggest researchers
include multiple measures of fit and criteria for
evaluating fit along with parameter estimates, and
associated confidence intervals and standard errors.
Finally, interpretation of results should be guided by
an understanding that models are imperfect and cannot
be made to be exactly correct.
We can enrich our knowledge by reviewing the use
of SEM in more mature research fields such as
psychology and marketing, including methodological
advances. Some advances worthy of mention are
validation studies using the multi-trait multi-method
(MTMM) matrix method (cf. Cudeck, 1988; Widaman, 1985), measurement invariance (Widaman and
Reise, 1997), and using categorical (Muthen, 1983) or
experimental data (Russell et al., 1998).
Our review of published SEM applications in the
OM literature suggests that while reporting has
improved over time, we need to pay attention to
methodological issues in using SEM. Like any
statistical technique or tool, it is important that SEM
be used prudently if researchers want to take full
advantage of its potential. SEM is a useful tool to
represent multidimensional unobservable constructs
and simultaneously examine structural relationships
that are not well captured by traditional research
methods (Gefen et al., 2000, p. 6). In the future,
utilizing the guidelines presented here will improve
the use of SEM in OM research, and thus, improve our
collective understanding of OM theory and practice.
Acknowledgements

We thank Michael Browne and Sriram Thirumalai for helpful comments on this paper. We also thank Carlos Rodriguez for assistance with article screening and data coding.

Appendix A. Mathematical specification of structural equation modeling

A structural equation model can be defined as a hypothesis of a specific pattern of relations among a set of measured variables (MVs) and latent variables (LVs). The three equations presented below are fundamental to SEM. Eq. (1) represents the directional influences of the exogenous LVs (ξ) on their indicators (x). Eq. (2) represents the directional influences of the endogenous LVs (η) on their indicators (y). Thus, Eqs. (1) and (2) link the observed (manifest) variables to unobserved (latent) variables through a factor analytic model and constitute the ‘‘measurement’’ portion of the model. Eq. (3) represents the endogenous LVs (η) as linear functions of the exogenous LVs (ξ) and of other endogenous LVs, plus residual terms (ζ). Thus, Eq. (3) specifies relationships among LVs through a structural equation model and constitutes the ‘‘structural’’ portion of the model.

x = Λx ξ + δ    (1)
y = Λy η + ε    (2)
η = Bη + Γξ + ζ    (3)

where x is the measures of exogenous manifest variables, Λx the effect of exogenous LVs on their MVs (matrix), δ the error of measurement in exogenous manifest variables, y the measures of endogenous manifest variables, Λy the effect of endogenous LVs on their MVs (matrix), ε the error of measurement in endogenous manifest variables, ξ the latent exogenous constructs, η the latent endogenous constructs, Γ the effect of exogenous constructs on endogenous constructs (matrix), B the effect of endogenous constructs on each of the other endogenous constructs (matrix), and ζ the errors in equations or residuals.

It is also necessary to define the following covariance matrices:

(a) Φ = E(ξξ′) is a covariance matrix for the exogenous LVs.
(b) Θδ = E(δδ′) is a covariance matrix for the measurement errors in the exogenous MVs.
(c) Θε = E(εε′) is a covariance matrix for the measurement errors in the endogenous MVs.
(d) Ψ = E(ζζ′) is a covariance matrix for the errors in equations for the endogenous LVs.

Given this mathematical representation, it can be shown that the population covariance matrix for the MVs is a function of eight parameter matrices: Λx, Λy, Γ, B, Φ, Θδ, Θε and Ψ. Thus, given a hypothesized model in terms of fixed and free parameters of the eight parameter matrices, and given a sample covariance matrix for the MVs, one can solve for estimates of the free parameters of the model. The most common approach for fitting the model to data is to obtain maximum likelihood estimates of the parameters, together with an accompanying likelihood ratio χ2 test of the null hypothesis that the model holds in the population. The notation above uses SEM as developed by Jöreskog (1974) and represented in LISREL (Jöreskog and Sörbom, 1996).
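As a point of reference, the implied covariance structure has a standard closed form in this notation (a textbook LISREL result, not spelled out above): with A = (I − B)⁻¹, the covariance matrix of the y variables is Λy A(ΓΦΓ′ + Ψ)A′Λy′ + Θε, the covariance matrix between the y and x variables is Λy AΓΦΛx′, and the covariance matrix of the x variables is ΛxΦΛx′ + Θδ. Fitting the model amounts to choosing the free parameters so that this implied matrix reproduces the sample covariance matrix as closely as possible.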
References
Anderson, J.C., Gerbing, D.W., 1988. Structural equation modeling
in practice: a review and recommended two step approach.
Psychological Bulletin 103 (3), 411–423.
Anderson, J.C., Gerbing, D.W., 1984. The effects of sampling error
on convergence, improper solutions, and goodness-of-fit indices
for maximum likelihood confirmatory factor analysis. Psychometrika 49, 155–173.
Bagozzi, R.P., Heatherton, T.F., 1994. A general approach to
representing multifaceted personality constructs: application
to state self-esteem. Structural Equation Modeling 1 (1), 35–67.
Bagozzi, R.P., Yi, Y., 1988. On the evaluation of structural equation
models. Journal of the Academy of Marketing Science 16 (1),
74–94.
Barman, S., Hanna, M.D., LaForge, R.L., 2001. Perceived relevance
and quality of POM journals: a decade later. Journal of Operations Management 19 (3), 367–385.
Baumgartner, H., Homburg, C., 1996. Applications of structural
equation modeling in marketing and consumer research: a review.
International Journal of Research in Marketing 13 (2), 139–161.
Bentler, P.M., 1989. EQS: Structural Equations Program Manual.
BMDP Statistical Software, Los Angeles, CA.
Bentler, P.M., Chou, C.P., 1987. Practical issues in structural
modeling. Sociological Methods and Research 16 (1), 78–117.
Bollen, K.A., 1989. Structural Equations with Latent Variables.
Wiley, New York.
Bollen, K.A., Lennox, R., 1991. Conventional wisdom on measurement: a structural equation perspective. Psychological Bulletin
110, 305–314.
Brannick, M.T., 1995. Critical comments on applying covariance
structure modeling. Journal of Organizational Behavior 16 (3),
201–213.
Brown, R.L., 1994. Efficacy of the indirect approach for estimating
structural equation models with missing data: a comparison of
five methods. Structural Equation Modeling 1, 287–316.
Browne, M.W., Arminger, G., 1995. Specification and estimation of
mean and covariance structure models. In: Arminger, G., Clogg,
C.C., Sobel, M.E. (Eds.), Handbook of Statistical Modeling for
the Social and Behavioral Sciences. Plenum, New York, pp.
185–249.
Browne, M.W., Cudeck, R., 1989. Single sample cross-validation
indices for covariance structures. Multivariate Behavioral
Research 24 (4), 445–455.
Browne, M.W., Cudeck, R., 1993. Alternative ways of assessing
model fit. In: Bollen, K.A., Long, J.S. (Eds.), Testing Structural
Equation Models. Sage, Newbury Park, CA, pp. 136–161.
Browne, M.W., Mels, G., 1998. Path analysis: RAMONA. In:
SYSTAT for Windows: Advanced Applications (Version 8),
SYSTAT, Evanston, IL.
Browne, M.W., MacCallum, R.C., Kim, C., Anderson, B.L., Glaser,
R., 2002. When fit indices and residuals are incompatible.
Psychological Methods 7 (4), 403–421.
Chin, W.W., 1998. Issues and opinion on structural equation modeling. MIS Quarterly 22 (1), vii–xvi.
Chin, W.W., Todd, P.A., 1995. On the use, usefulness, and ease of
use of structural equation modeling in MIS research: a note of
caution. MIS Quarterly 19 (2), 237–246.
Cohen, P., Cohen, J., Teresi, J., Marchi, M., Velez, C.N., 1990.
Problems in the measurement of latent variables in structural
equations causal models. Applied Psychological Measurement
14 (2), 183–196.
Cudeck, R., 1988. Multiplicative models and MTMM matrices.
Journal of Educational Statistics 13, 131–147.
Cudeck, R., 1989. Analysis of correlation matrices using covariance
structure models. Psychological Bulletin 105, 317–327.
Cudeck, R., Browne, M.W., 1983. Cross-validation of covariance
structures. Multivariate Behavioral Research 18 (2), 147–167.
Enders, C.K., Bandalos, D.L., 2001. The relative performance of full
information maximum likelihood estimation for missing data in
structural equation models. Structural Equation Modeling 8 (3),
430–457.
Fornell, C., 1983. Issues in the application of covariance structure
analysis. Journal of Consumer Research 9 (4), 443–448.
Fornell, C., Larcker, D.F., 1981. Evaluating structural equation
models with unobservable variables and measurement errors.
Journal of Marketing Research 18 (1), 39–50.
Fornell, C., Rhee, B., Yi, Y., 1991. Direct regression, reverse
regression, and covariance structural analysis. Marketing Letters
2 (3), 309–320.
Garver, M.S., Mentzer, J.T., 1999. Logistics research methods:
employing structural equation modeling to test for construct
validity. Journal of Business Logistics 20 (1), 33–57.
Gefen, D., Straub, D.W., Boudreau, M., 2000. Structural equation
modeling and regression: guidelines for research practice. Communications of the AIS 1 (7), 1–78.
Gerbing, D.W., Anderson, J.C., 1984. On the meaning of withinfactor correlated measurement errors. Journal of Consumer
Research 11, 572–580.
Goh, C., Holsapple, C.W., Johnson, L.E., Tanner, J.R., 1997.
Evaluating and classifying POM journals. Journal of Operations
Management 15 (2), 123–138.
Gollob, H.F., Reichardt, C.S., 1987. Taking account of time lags in
causal models. Child Development 58 (1), 80–92.
Gollob, H.F., Reichardt, C.S., 1991. Interpreting and estimating
indirect effects assuming time lags really matter. In: Collins,
L.M., Horn, J.L. (Eds.), Best Methods for the Analysis of
Change. American Psychological Association, Washington,
DC, pp. 243–259.
Greenwald, A.G., Pratkanis, A.R., Leippe, M.R., Baumgartner,
M.H., 1986. Under what conditions does theory obstruct
research progress? Psychological Review 93 (2), 216–229.
Hair Jr., J.H., Anderson, R.E., Tatham, R.L., Black, W.C., 1998.
Multivariate Data Analysis. Prentice-Hall, New Jersey.
Hershberger, S.L., 2003. The growth of structural equation modeling: 1994–2001. Structural Equation Modeling 10 (1), 35–46.
Homburg, C., Dobartz, A., 1992. Covariance structure analysis via
specification searches. Statistical Papers 33 (1), 119–142.
Hoyle, R.H., Panter, A.T., 1995. Writing about structural equation
modeling. In: Hoyle, R.H. (Ed.), Structural Equation Modeling:
Concepts, Issues, and Applications. Sage, Thousand Oaks, CA,
pp. 158–176.
Hu, L., Bentler, P.M., 1998. Fit indices in covariance structure
modeling: sensitivity to under-parameterized model misspecification. Psychological Methods 3 (4), 424–453.
Hurley, A.E., Scandura, T.A., Schriesheim, C.A., Brannick, M.T.,
Seers, A., Vandenberg, R.J., Williams, L.J., 1997. Exploratory
and confirmatory factor analysis: guidelines, issues, and
alternatives. Journal of Organizational Behavior 18 (6), 667–
683.
Jackson, D.L., 2003. Revisiting the sample size and number of
parameter estimates: some support for the N:q hypothesis.
Structural Equation Modeling 10 (1), 128–141.
Jöreskog, K.G., 1969. A general approach to confirmatory maximum likelihood factor analysis. Psychometrika 34 (2 Part 1),
183–202.
Jöreskog, K.G., 1974. Analyzing psychological data by structural
analysis of covariance matrices. In: Atkinson, R.C., Krantz,
D.H., Luce, R.D., Suppes, P. (Eds.), Contemporary developments in mathematical psychology, vol. II. W.H. Freeman, San
Francisco, pp. 1–56.
Jöreskog, K.G., 1993. Testing structural equation models. In: Bollen, K.A., Long, J.S. (Eds.), Testing Structural Equation Models.
Sage, Newbury Park, CA, pp. 294–316.
Jöreskog, K.G., Sörbom, D., 1996. LISREL 8: User’s Reference
Guide. Scientific Software International Inc., Chicago, IL.
Leamer, E.E., 1978. Specification Searches: Ad-hoc Inference with
Non-experimental Data. Wiley, New York.
Lei, M., Lomax, R.G., 2005. The effect of varying degrees of
nonnormality in structural equation modeling. Structural Equation Modeling 12 (1), 1–27.
Little, T.D., Lindenberger, U., Nesselroade, J.R., 1999. On selecting indicators for multivariate measurement and modeling
with latent variables: when ’good’ indicators are bad and
’bad’ indicators are good. Psychological Methods 4 (2),
192–211.
Long, J.S., 1983. Covariance Structure Models: An Introduction to
LISREL. Sage, Beverly Hills, CA.
MacCallum, R.C., 2003. Working with imperfect models. Multivariate Behavioral Research 38 (1), 113–139.
MacCallum, R.C., 1990. The need for alternative measures of fit in
covariance structure modeling. Multivariate Behavioral
Research 25 (2), 157–162.
MacCallum, R.C., 1986. Specification searches in covariance
structure modeling. Psychological Bulletin 100 (1), 107–
120.
MacCallum, R.C., Austin, J.T., 2000. Applications of structural
equation modeling in psychological research. Annual Review
of Psychology 51 (1), 201–226.
MacCallum, R.C., Browne, M.W., 1993. The use of causal indicators
in covariance structure models: some practical issues. Psychological Bulletin 114 (3), 533–541.
MacCallum, R.C., Browne, M.W., Sugawara, H.M., 1996. Power
analysis and determination of sample size for covariance structure modeling. Psychological Methods 1 (1), 130–149.
MacCallum, R.C., Roznowski, M., Necowitz, L.B., 1992. Model
modifications in covariance structure analysis: the problem of
capitalization on chance. Psychological Bulletin 111 (3), 490–
504.
MacCallum, R.C., Wegener, D.T., Uchino, B.N., Fabrigar, L.R.,
1993. The problem of equivalent models in applications of
covariance structure analysis. Psychological Bulletin 114 (1),
185–199.
MacCallum, R.C., Widaman, K.F., Preacher, K.J., Hong, S., 2001.
Sample size in factor analysis: the role of model error. Multivariate Behavioral Research 36 (4), 611–637.
Malhotra, M.K., Grover, V., 1998. An assessment of survey research
in POM: from constructs to theory. Journal of Operations
Management 16 (4), 407–425.
Marsh, H.W., 1998. Pairwise deletion for missing data in structural
equation models: nonpositive definite matrices, parameter estimates, goodness of fit, and adjusted sample sizes. Structural
Equation Modeling 5, 22–36.
Marsh, H.W., Balla, J.R., McDonald, R.P., 1988. Goodness-of-fit
indexes in confirmatory factor analysis: the effect of sample size.
Psychological Bulletin 103 (3), 391–410.
Marsh, H.W., Hocevar, D., 1985. Applications of confirmatory
factor analysis to the study of self concept: first and higher
order factor models and their invariance across groups. Psychological Bulletin 97, 562–582.
Maruyama, G., 1998. Basics of Structural Equation Modeling. Sage,
Thousand Oaks, CA.
Medsker, G.J., Williams, L.J., Holahan, P., 1994. A review of current
practices for evaluating causal models in organizational behavior and human resources management research. Journal of
Management 20 (2), 439–464.
Mulaik, S.S., James, L.R., Van Alstine, J., Bennett, N., Lind, S.,
Stillwell, C.D., 1989. An evaluation of goodness of fit indices for
structural equation models. Psychological Bulletin 105 (3), 430–
445.
Muthen, B., 1983. Latent variable structural equation modeling
with categorical data. Journal of Econometrics 22 (1/2),
43–66.
Muthen, B., Kaplan, D., Hollis, M., 1987. On structural equation
modeling with data that are not missing completely at random.
Psychometrika 52, 431–462.
Pearl, J., 2000. Causality: Models, Reasoning, and Inference. Cambridge University Press, Cambridge, UK.
Rigdon, E.E., 1995. A necessary and sufficient identification rule for
structural models estimated in practice. Multivariate Behavioral
Research 30 (3), 359–383.
Roth, A., Schroeder, R., in press. Handbook of Multi-item Scales for
Research in Operations Management. Sage.
Russell, D.W., Kahn, J.H., Spoth, R., Altmaier, E.M., 1998. Analyzing data from experimental studies: a latent variable structural
equation modeling approach. Journal of Counseling Psychology
45, 18–29.
Satorra, A., 2001. Goodness of fit testing of structural equations
models with multiple group data and nonnormality. In: Cudeck,
R.C., du Toit, S., Sörbom, D. (Eds.), Structural Equation Modeling: Present and Future. Scientific Software International, Lincolnwood, IL, pp. 231–256.
Sedlmeier, P., Gigerenzer, G., 1989. Do studies of statistical power
have an effect on the power of the studies? Psychological
Bulletin 105 (2), 309–316.
Shook, C.L., Ketchen, D.J., Hult, G.T.M., Kacmar, K.M., 2004. An
assessment of the use of structural equation modeling in strategic
management research. Strategic Management Journal 25 (4),
397–404.
Soteriou, A.C., Hadijinicola, G.C., Patsia, K., 1998. Assessing
production and operations management related journals: the
European perspective. Journal of Operations Management 17
(2), 225–238.
Steiger, J.H., 1999. Structural equation modeling (SEPATH). Statistica for Windows, vol. III. StatSoft, Tulsa, OK.
Steiger, J., 2001. Driving fast in reverse. Journal of American
Statistical Association 96, 331–338.
Tanaka, J.S., 1987. How big is big enough? Sample size and
goodness of fit in structural equation models with latent
variables. Child Development 58, 134–146.
Tanaka, J.S., 1993. Multifaceted conceptions of fit in structural
equation models. In: Bollen, K.A., Long, J.S. (Eds.), Testing
Structural Equation Models. Sage, Newbury Park, CA, pp. 10–39.
Teel, J.E., Bearden, W.O., Sharma, S., 1986. Interpreting LISREL
estimates of explained variance in non-recursive structural equation models. Journal of Marketing Research 23 (2), 164–168.
Vokurka, R.J., 1996. The relative importance of journals used in
operations management research: a citation analysis. Journal of
Operations Management 14 (3), 345–355.
West, S.G., Finch, J.F., Curran, P.J., 1995. Structural equation
models with nonnormal variables: problems and remedies. In:
Hoyle, R.H. (Ed.), Structural Equation Modeling: Issues,
Concepts, and Applications. Sage, Newbury Park, CA, pp.
56–75.
Widaman, K.F., 1985. Hierarchically nested covariance structure
models for multitrait-multimethod data. Applied Psychological
Measurement 9, 1–26.
Widaman, K.F., Reise, S., 1997. Exploring the measurement invariance of psychological instruments: applications in the substance
use domain. In: Bryant, K.J., Windle, M., West, S.G. (Eds.), The
Science of Prevention: Methodological Advances from Alcohol
and Substance Abuse. American Psychological Association,
Washington, DC, pp. 281–324.