Bergen 2011 - Roy Howell Homepage

Taking Time into Account in
Structural Equation Models
Roy D. Howell, Texas Tech University
Einar Breivik, NHH
Bergen
August, 2011
Every structural path in a SEM implies the
passing of at least some period of time. A
fundamental notion of causation:
The cause must precede the effect in time.
[Diagram: X(t1) → Y(t2), with path coefficient b]
Our Question: Can we recover the true value of
b (the data generating process) from a model
estimated with cross-sectional data?
Why does it matter?
While we may carefully avoid using the word
‘cause’, since we have all been taught that
“correlation does not imply causation”, it is
almost impossible to avoid a causal
interpretation when we get to the
“managerial implications” section of a
manuscript.
• Cross-sectional data allow the researcher to
examine how outcomes differ among entities
that possess different levels of an independent
variable (i.e., how profitability differs among
firms with high versus low levels of market
share; how information usage differs among
firms with high versus low levels of trust).
• The conceptual argument is based on a
between-subject interpretation, consistent
with cross-sectional data (i.e., covariation or
correlation).
• Other research focuses on how outcomes are
influenced by changes in a predictor (e.g., how
an increase in market share affects profitability;
how an increase in trust affects information
usage). The nature of the conceptual argument
has a within-subject interpretation.
• As noted by Rindfleisch, et al. (JMR 2008), “Of
these two types of arguments, longitudinal data
collection appears to be most valid for the [within
subject] interpretation. Within-subject
comparisons are typically obtained through
multiple observations over time” (p. 276).
• While the preceding statement is accurate,
Rindfleisch et al. go on to suggest that,
“However, most survey-based marketing
studies appear to focus on between-subject
arguments. Thus, for these types of studies,
longitudinal data collection may not be
necessary” (p. 276).
• On this point we disagree.
• First, it is difficult to avoid the within-subject
interpretation when reaching any ‘managerial
implications’. Any statement that begins
“Managers should…” must have a within-subject
interpretation.
• If the results of a study have no implications
for actions that could be taken to affect the
dependent variable of interest, then why was
the study undertaken in the first place?
• Even the co-authors of the paper claiming that
most marketing studies are between-subject
focused, in the managerial implications section,
use essentially cross-sectional data (as explained
later) to conclude, “Therefore, a key managerial
priority should be to develop and nurture
relationships with potential knowledge
suppliers…” and “…a firm must develop strong
relationships with key knowledge providers to
gain access to knowledge” (Ganesan, Malter &
Rindfleisch 2005, p. 256).
• Second, even in the rare event that the author of a
study is very careful to express the results in between-subject
terms (e.g., firms with higher levels of
relationship trust exhibit more information sharing than
comparable firms with lower levels of trust; we find on
average that a firm with 1% higher market share than a
comparable firm will have a .6% greater ROI), how
many managers will not draw the conclusion that if I
increase X by one unit, I should see Y increase by b
units (if I increase market share by 1% I will see ROI go
up by .6% in the future); that is, draw a within-subject
conclusion?
• We conclude that any study designed to inform action
with respect to a dependent variable must have a
within-subject interpretation.
• What’s wrong with interpreting an effect
found from cross-sectional data in a within-subject
sense? That is, when will cross-sectional
estimates of longitudinal effects be
unbiased?
• To address this question, we need to look at
some properties of longitudinal models
(remember, causation implies the passage of
time).
Some Properties of Longitudinal Models:
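[Figure 1: path diagram of X and Y measured at waves 1 to 5. The same path coefficients (.4, .3 and .5) are repeated between every pair of adjacent waves.]
Within-wave X,Y correlations (waves 1 to 5 and after): 0, .32, .38, .40, .42, …, .42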
Figure 1 Example of how stationarity produces equilibrium over time. Subscripts denote the time or wave at
which a given measure was obtained.
Stationarity
Stationarity “refers to an unchanging causal
structure” (Kenny 1979, p. 232). Stationarity
implies the degree to which one set of
variables produces changes in another set
remains the same over time. The model
depicted in Figure 1 is stationary in that both
the autoregressive parameters (linking a
variable to itself from time period to time
period) and the causal parameters are the
same from wave to wave.
Equilibrium
Equilibrium refers to a causal structure that displays
temporal stability (or constancy) of patterns of
covariance and variance.
• A system that exhibits stationarity is not necessarily at
equilibrium. The model in Figure 1 shows stability from
the outset – the autocorrelations and causal paths are
identical at every lag. However, the system is not at
equilibrium until the fifth wave and after. That is, the
correlation between X and Y starts at zero, and
progresses steadily to an equilibrium value of .42.
• We know that the true effect of X on Y is .3 in
all waves. If we just had the cross-sectional
correlation between X and Y at a given time,
(as we would have with cross-sectional data),
we would over-estimate the causal effect of X
on Y (true value = .3) in every wave after the
first. Even if we were to observe the system in
equilibrium (waves five and after), cross-sectional
data would not provide an unbiased
estimate (.42 instead of .3).
• Although many researchers have attempted to
justify the use of cross-sectional models using
the equilibrium assumption (e.g., James et al.
1985), we see that this won’t work.
• So, why are cross-sectional estimates of causal
effects consistently wrong?
• The key here is the autoregressive effect of
Yt-1 on Yt, or the effect that a variable has on
itself.
• Consider the Model
[Diagram: two-wave model with X1, X2, Y1 and Y2. Coefficients shown: .46, .95, .23, .48, .62.]
The true causal impact of X1 on Y2 is .23. A model
for cross-sectional data gathered only at time 1
would yield an estimate of .46, while a cross-section
at time two would estimate the causal
effect to be .48 (Gollob & Reichardt 1987).
• In this simple example, we have specification
error or omitted variable bias in the equation for
Y2 (omitting the effect of Y1). Also, if there are
other factors not modeled that affect Y, some of
their effect is captured in the lagged value.
• It seems clear that when we investigate
relationships in an ongoing system, that is, one
where the variables already exist at some level
for the subjects, the level of a variable at any time
ti depends at least in part on its level at ti-1.
A Small Example
• In arguing against the necessity of longitudinal data,
Rindfleisch et al. (2008) present results from two
surveys conducted in two waves each. They compare
cross-sectional correlations (between Xt1 and Yt1) with
the Xt1,Yt2 correlations for 12 pairs of variables and
conclude, “…these results indicate that the longitudinal
data in each study provide largely similar results as
their cross-sectional counterparts” (p. 270). But what
have their results shown?
• That their systems are in equilibrium (as might be
expected, since the average duration of the systems
they examine is over five years).
• The relationships they examine are between Xt1
and Yt1 as compared to Xt1 and Yt2. What have
they not considered?
• The probability that Yt2 depends not only on Xt1,
but also on Yt1; that is, the autoregressive effects.
• While they do not present the Y1Y2 correlations,
we can effectively “bracket” the magnitude of the
true effect by assuming plausible values for the
Y1Y2 correlations.
• For example, they show that the cross-sectional
correlation between Product Knowledge (PK) at time t1
with Product Creativity (PC) at t1 is .37, and the
correlation between PKt1 and PCt2 is .38 (in their
alliance data).
• Let us assume that a plausible range for the
autocorrelation of PC is from .5 to .8. At .5, the PK1 to
PC2 path would be .23 (instead of .38 assuming zero
autocorrelation), while at .8 the path would be .09.
• This approach is consistent with the “latent
longitudinal analysis” suggested by Gollob & Reichardt
(1987).
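The bracketing above can be sketched in a few lines. This is only an illustration: it assumes the path is the standardized coefficient of PK1 when PC2 is regressed on PK1 and PC1, computed with the standard two-predictor formula. The function name `lagged_path` is hypothetical, and the PC autocorrelations are assumed values, not reported ones. Under these assumptions the result is about .23 at an autocorrelation of .5 and just under .10 at .8, close to the figures quoted above.

```python
def lagged_path(r_x1y1, r_x1y2, r_y1y2):
    """Standardized coefficient of X1 when Y2 is regressed on X1 and Y1.

    Standard two-predictor formula; r_y1y2 is the (assumed) autocorrelation of Y.
    """
    return (r_x1y2 - r_y1y2 * r_x1y1) / (1.0 - r_x1y1 ** 2)


# Correlations quoted in the text: PK1 with PC1 = .37, PK1 with PC2 = .38.
R_PK1_PC1 = 0.37
R_PK1_PC2 = 0.38

# Assumed (not reported) autocorrelations of PC used to bracket the effect.
brackets = {rho: lagged_path(R_PK1_PC1, R_PK1_PC2, rho) for rho in (0.5, 0.8)}

for rho, path in brackets.items():
    print(f"assumed PC autocorrelation {rho:.1f}: PK1 -> PC2 path = {path:.3f}")
```

The higher the assumed autocorrelation of PC, the smaller the share of the PK1,PC2 correlation that is left for the lagged effect of PK.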
Does it Matter?
• Both the within- versus between-subject
interpretations of our data and the need to
account for autocorrelation can have profound
effects on practice.
• Buzzell et al. (1975)
– Cross-sectional PIMS data
– A 10 percentage point difference in market share is associated with
a 5% difference in ROI.
• Should be interpreted as between-subject: “firms
with higher market share are more profitable
than low share firms”.
• Strategic Planning Institute: “Market share
Boosts productivity”. Based on this within-subject
interpretation (and underlying
experience curve rationale), the “wars for
market share” began.
• Managers and consultants interpreted the
findings as, “If I increase my market share I
will increase my profitability,” that is, in a
causal, within-subject sense.
• Jacobson & Aaker (1985)
– Longitudinal PIMS data, include lagged ROI in profit
equation (capturing unmeasured influences on ROI).
– Market share effect dropped from .5 in cross-section
to .22 in longitudinal analysis.
– Additionally including lagged values for market share
dropped the estimate to .18 (and controlling for other
variables dropped it to .03).
• Ten years of business preoccupation with market
share with no positive results!
What to Do?
• Longitudinal Data – Difficult and costly
• If stability and equilibrium can be assumed,
two (appropriately spaced) waves may be
enough
• Web-based surveys – lower marginal cost of
data collection?
In cases where only cross-sectional data are possible, Gollob and
Reichardt (1987) recommend the use of “latent” longitudinal
models. The idea is to use unobserved variables in place of time 1
variables:
[Diagram: unobserved LX1 and LY1 are correlated (ø); LX1 → X2 (γ1), LX1 → Y2 (β), LY1 → Y2 (γ2).]
Ovals are unobserved, rectangles are observed.
We have four parameters to estimate, but only the X2Y2 correlation is observed.
γ1 can be considered a test-retest correlation for X, and plausible values are not too difficult to find.
We can perhaps also get plausible values for a test-retest correlation on Y (ρyy). Then the
relationship γ2 = ρyy − βø can be used as a constraint to identify the system.
Further, if the system is at equilibrium, the observed X2Y2 correlation should be
close to the time 1 value of ø.
Choosing a range of plausible values for the test-retest correlations, one
can estimate a range of plausible values for β, the causal parameter of interest.
The result is almost always closer to the true value than the cross-sectional correlation, and
the assumptions are explicit!
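A minimal sketch of this bracketing, under stated assumptions: all variables are standardized, the residual of X2 is unrelated to Y2, and the system is at equilibrium so that ø can be set to the observed X2Y2 correlation. With the constraint γ2 = ρyy − βø, the implied X2Y2 correlation is γ1(β + γ2ø), which can be solved for β. The function names and the plausible-value grids are hypothetical.

```python
def implied_r_x2y2(beta, gamma1, rho_yy, phi):
    """X2,Y2 correlation implied by the latent longitudinal model."""
    gamma2 = rho_yy - beta * phi          # constraint stated in the text
    return gamma1 * (beta + gamma2 * phi)


def latent_beta(r_x2y2, gamma1, rho_yy, phi=None):
    """Solve the implied-correlation equation for beta.

    If phi is not supplied, the equilibrium assumption phi = r_x2y2 is used.
    """
    if phi is None:
        phi = r_x2y2
    return (r_x2y2 / gamma1 - rho_yy * phi) / (1.0 - phi ** 2)


# Hypothetical observed correlation and plausible test-retest values.
r_observed = 0.42
for gamma1 in (0.8, 0.9):
    for rho_yy in (0.5, 0.7):
        beta = latent_beta(r_observed, gamma1, rho_yy)
        print(f"gamma1={gamma1:.1f}, rho_yy={rho_yy:.1f}: beta = {beta:.3f}")
```

The printed grid is the range of β values implied by the chosen test-retest assumptions, which keeps those assumptions visible instead of hiding them in a single cross-sectional estimate.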
A note on equilibrium
• How long does it take? Simulations over a wide
variety of autoregressive and cross-lagged
parameters suggest that stationary systems often
approach equilibrium after four or five waves.
• Systems may not be stationary, however, and
may not reach equilibrium in any finite
number of waves. (CSR year to year)
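The approach to equilibrium can be illustrated with a small recursion. This is a sketch under assumptions, not a reproduction of Figure 1: it takes a simple lagged model in which X depends on its own past (a_x) and Y depends on its own past (a_y) and on lagged X (b), with all variables standardized at every wave. The within-wave correlation then follows r(t+1) = a_x (b + a_y r(t)). The coefficient values are hypothetical.

```python
def within_wave_correlations(a_x, a_y, b, waves, r_start=0.0):
    """Within-wave X,Y correlation at each wave of a stationary lagged model."""
    r = [r_start]
    for _ in range(waves - 1):
        # cov(X[t+1], Y[t+1]) = cov(a_x*X[t], b*X[t] + a_y*Y[t]) with unit variances
        r.append(a_x * (b + a_y * r[-1]))
    return r


def equilibrium_correlation(a_x, a_y, b):
    """Fixed point of the recursion above."""
    return a_x * b / (1.0 - a_x * a_y)


# Hypothetical coefficients: stable X, moderately stable Y, true effect b = .3.
a_x, a_y, b = 0.9, 0.5, 0.3
for wave, r in enumerate(within_wave_correlations(a_x, a_y, b, waves=6), start=1):
    print(f"wave {wave}: r = {r:.3f}")
print(f"equilibrium: r = {equilibrium_correlation(a_x, a_y, b):.3f} (true b = {b})")
```

With these values the correlation is within about .02 of its equilibrium value by the fifth wave, and the equilibrium value itself differs from the true effect b.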
Mediation Models?
• Just as direct causal effects need time to manifest
themselves, the effects of X on a mediator M
need time to appear, and the effects of M on Y
also require some time interval.
• As in the case of a two-variable system, the
estimation of the true causal effect of X on M
requires that the autocorrelation effect of M on
itself be included (as well as the Y
autocorrelation).
• A longitudinal mediation model is:
[Diagram: X, M and Y measured at waves 1, 2, 3, …, t, with lagged paths a (X to M), b (M to Y) and c (X to Y).]
When can a cross-sectional model of mediation (below) be used to approximate the
longitudinal model?
[Diagram: Xt → Mt (a’), Mt → Yt (b’), Xt → Yt (c’), with residuals eM and eY.]
• Will a’ in the cross-sectional model provide an unbiased
estimate of a in the longitudinal model? No. As in the two
variable case, omitting the autocorrelation of M on itself
results in omitted variable bias – M1 is a cause of M2, and
is correlated with X.
• Will b’ = b? No, for the same reasons we have been
discussing: Y(t-1) is missing as a predictor, and is correlated
with M.
• Even if there is complete mediation [c=0], the cross-sectional
analysis will almost never reflect the longitudinal
[ab] effect accurately. That is, a’b’ almost never equals ab
(Maxwell & Cole 2007).
• Will c’ = c (the direct effect of X on Y, not mediated by
M)?
• Assume that in the true model, c=0.
• c’ = 0 iff ρxx = ρmm (Equal Stability)
• c’ > 0 iff ρxx > ρmm
• c’ < 0 iff ρxx < ρmm
• You get the appropriate direct effect in cross-sectional
models ONLY when X and M are equally stable!
• Maybe this is why cross-sectional models almost never
demonstrate complete mediation.
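The equal-stability condition can be checked numerically. The sketch below assumes a lag-1 chain at equilibrium in which X depends only on its own past, M on its own past and lagged X (path a), and Y on its own past and lagged M (path b), with true c = 0. Stability (ρxx, ρmm) is taken to be the lag-1 autocorrelation of each variable. The function name, the residual variance `q_m` and the parameter values are hypothetical, and c’ is left unstandardized because only its sign matters here.

```python
def cross_sectional_direct_effect(x, m, y, a, b, q_m=0.5):
    """Equilibrium c' for a lag-1 chain X -> M -> Y with true c = 0.

    x, m, y : autoregressive paths of X, M and Y
    a, b    : lagged paths X -> M and M -> Y
    q_m     : residual variance of M (hypothetical); Var(X) is fixed at 1
    Returns (c_prime, rho_xx, rho_mm).
    """
    v_x = 1.0
    s_xm = x * a * v_x / (1.0 - x * m)            # cov(X_t, M_t)
    v_m = (a * a * v_x + 2.0 * m * a * s_xm + q_m) / (1.0 - m * m)
    s_xy = x * b * s_xm / (1.0 - x * y)           # cov(X_t, Y_t)
    s_my = (m * b * v_m + a * y * s_xy + a * b * s_xm) / (1.0 - m * y)
    # Coefficient of X_t when Y_t is regressed on X_t and M_t.
    c_prime = (s_xy * v_m - s_my * s_xm) / (v_x * v_m - s_xm ** 2)
    rho_xx = x                                    # lag-1 autocorrelation of X
    rho_mm = m + a * s_xm / v_m                   # lag-1 autocorrelation of M
    return c_prime, rho_xx, rho_mm


for x, m in ((0.8, 0.3), (0.3, 0.8)):
    c_prime, rho_xx, rho_mm = cross_sectional_direct_effect(x, m, y=0.5, a=0.3, b=0.4)
    print(f"rho_xx={rho_xx:.2f}, rho_mm={rho_mm:.2f}, c'={c_prime:+.3f}")
```

In both runs the true direct effect is zero, yet the cross-sectional c’ is positive when X is more stable than M and negative when M is more stable than X.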
Does this mean I have to have three waves of data
to estimate a mediation model?
• Not necessarily. Measure X, M, and Y at two
waves.
• Estimate the a parameter (X1 to M2) with M1 in
the model.
• Estimate path b (M1 to Y2) with Y1 in the model.
• Use the stationarity assumption (b doesn’t
change from wave to wave). Then, the product
(ab) can provide an estimate of the mediational
effect of X on Y through M.
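The two-wave procedure above can be sketched from correlations alone. This is an illustration under assumptions: variables are standardized, each path is the standardized coefficient from a regression on the lagged predictor and the lagged outcome, and b is taken to be the same from wave to wave. All correlation values and names below are hypothetical.

```python
def lagged_path(r_pred_out1, r_pred_out2, r_out1_out2):
    """Standardized coefficient of a time-1 predictor on a time-2 outcome,
    controlling for the time-1 outcome (two-predictor formula)."""
    return (r_pred_out2 - r_out1_out2 * r_pred_out1) / (1.0 - r_pred_out1 ** 2)


# Hypothetical two-wave correlations.
r_x1_m1, r_x1_m2, r_m1_m2 = 0.40, 0.45, 0.60   # for path a: X1 -> M2 given M1
r_m1_y1, r_m1_y2, r_y1_y2 = 0.50, 0.55, 0.70   # for path b: M1 -> Y2 given Y1

a = lagged_path(r_x1_m1, r_x1_m2, r_m1_m2)
b = lagged_path(r_m1_y1, r_m1_y2, r_y1_y2)
indirect = a * b  # relies on b being stationary across waves

print(f"a = {a:.3f}, b = {b:.3f}, ab = {indirect:.3f}")
```

Both paths are estimated with the lagged outcome in the model, so the product ab is not inflated by the autoregressive effects of M and Y.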
Other Considerations
• Timing of the waves is important, and deserves a
level of consideration beyond the scope of this
presentation.
• Too long an interval, effects decay.
• Especially difficult because the appropriate
interval for the X to M effect may not be the best
for the M to Y effect.
• Using retrospection instead of a first wave?
– Measurement error and response effects may be
difficult to overcome.
• With multiple observations on each subject
(true panel or multiple (four or more) waves),
latent growth curve modeling or random
effects (two-level) modeling may be more
appropriate.
Key References
• Cole, D. A. & Maxwell, S. E. (2003). “Testing mediation
models with longitudinal data: Questions and tips in the
use of structural equation modeling”, Journal of Abnormal
Psychology, 112 (4), 558-577.
• Gollob, H. F. & Reichardt, C. S. (1987). “Taking account of
time lags in causal models”, Child Development, 58, 80-92.
• Maxwell, S. E. & Cole, D. A. (2007). “Bias in cross-sectional
analyses of longitudinal mediation”, Psychological Methods,
12 (1), 23-44.
• Rindfleisch, A., Malter, A., Ganesan, S. & Moorman, C.
(2008). “Cross-sectional versus longitudinal survey
research: Concepts, findings, and guidelines”, Journal of
Marketing Research, 45 (June), 261-279.