Estimating Interaction Effects Using
Multiple Regression
Herman Aguinis, Ph.D.
Mehalchin Term Professor of Management
The Business School
University of Colorado at Denver
www.cudenver.edu/~haguinis
Overview
• What is an Interaction Effect?
• The “So What” Question: Importance of
Interaction Effects for Theory and Practice
• Estimating Interaction Effects Using Moderated
Multiple Regression (MMR)
• Problems with MMR
• Aguinis, Beaty, Boik, & Pierce (2005, J. of
Applied Psychology)
• The “Now What” Question: Addressing problems
with MMR
• Some Conclusions
What is an Interaction Effect?
• The relationship between X and Y depends on Z
(i.e., a moderator)
[Path diagrams: Z moderating the relationship between X and Y]
• Other terms used:
– Population control variable (Gaylord & Carroll, 1948);
Subgrouping variable (Frederiksen & Melville, 1954);
Predictability variable (Ghiselli, 1956); Referent
variable (Toops, 1959); Modifier variable (Grooms &
Endler, 1960); Homologizer variable (Johnson, 1966)
Importance of Interaction
Effects: Theory
• Going beyond main effects
• We typically say “it depends”
• More complex models
• “If we want to know how well we are doing in the
biological, psychological, and social sciences, an
index that will serve us well is how far we have
advanced in our understanding of the moderator
variables of our field” (Hall & Rosenthal, 1991, p.
447)
Importance of Interaction
Effects: Practice
For example, personnel selection:
• Test bias: The relationship between a test and a
criterion depends on gender or ethnicity
• “No bias exists if the regression equations
relating the test and the criterion are
indistinguishable for the groups in question”
(Standards, 1999, p. 79)
• In other words, the X-Y relationship differs
depending on the value of Z (e.g., 1 = Female, 0
= Male)
Illustration of Gender as a Moderator
in Personnel Selection
[Figure: separate regression lines for women (Ŷwomen), men (Ŷmen), and a
common line (Ŷcommon), plotting Job Performance (Y) against Test Scores (X)]
Importance of Interaction
Effects: Practice
• Management in General
– Does an intervention work similarly well for,
for example, Cantonese and American
employees working in Hong Kong?
(categorical moderator)
• Example: Performance management system for
teaching at a university in Hong Kong. Would the
same evaluation methods lead to the same level of
employee (i.e., faculty) satisfaction regardless of
the national origin of faculty members?
Estimating Interaction Effects
• Moderated Multiple Regression (MMR)
• Ŷ = a + b1 X + b2 Z + b3 X·Z,
where Y = criterion (continuous variable)
X = predictor (typically continuous)
Z = moderator (continuous or
categorical)
X·Z = product term carrying information
about the moderating effect (i.e., interaction
between X and Z)
Statistical Significance Test
• Step 1: Ŷ = a + b1 X + b2 Z
• Step 2: Ŷ = a + b1 X + b2 Z + b3 X·Z
• F = [(R2² − R1²) / (k2 − k1)] / [(1 − R2²) / (N − k2 − 1)];
Ho: ρ1² = ρ2²
• Ho: β3 = 0 (using a t-statistic)
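As a sketch, this two-step hierarchical test can be run with nothing beyond numpy and scipy; the simulated data, coefficient values, and seed below are illustrative, not taken from the studies discussed:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
N = 200
X = rng.normal(size=N)                        # continuous predictor
Z = rng.integers(0, 2, size=N).astype(float)  # binary moderator (dummy coded)
# Population model with a moderating effect (all coefficients illustrative)
Y = 1.0 + 0.5 * X + 0.3 * Z + 0.4 * X * Z + rng.normal(size=N)

def r_squared(y, design):
    """R^2 from an OLS fit of y on the given design matrix."""
    beta = np.linalg.lstsq(design, y, rcond=None)[0]
    resid = y - design @ beta
    return 1 - resid.var() / y.var()

ones = np.ones(N)
R2_1 = r_squared(Y, np.column_stack([ones, X, Z]))         # step 1: main effects
R2_2 = r_squared(Y, np.column_stack([ones, X, Z, X * Z]))  # step 2: + product term

k1, k2 = 2, 3
F = ((R2_2 - R2_1) / (k2 - k1)) / ((1 - R2_2) / (N - k2 - 1))
p = stats.f.sf(F, k2 - k1, N - k2 - 1)
f2 = (R2_2 - R2_1) / (1 - R2_2)  # Cohen's f^2 for the moderating effect
print(F, p, f2)
```

The same F (with 1 numerator df) is equivalent to the t-test of β3 = 0.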
Estimating Interaction Effects
Using Moderated Multiple
Regression (MMR)
Ŷ = a + b1 X + b2 Z + b3 X·Z
• For example:
– Personnel selection: Y = measure of performance,
X = test score, Z = gender
– Additional research areas: training, turnover,
performance appraisal, return on investment,
mentoring, self-efficacy, job satisfaction,
organizational commitment, and career
development, among others
Interpreting Interactions
(Z is continuous)
• Ŷ = a + b1 X + b2 Z + b3 X·Z,
• b3 = 2 means that a one-unit change in X
(Z) increases the slope of Y on Z (Y on X)
by 2 points
Interpreting Interactions
(Z is binary, dummy coded)
• Ŷ = a + b1 X + b2 Z + b3 X·Z,
• b3 = estimated difference between the slopes of Y on X for
the group coded as 1 and the group coded as 0.
• b2 = estimated difference in intercepts (predicted Y when X
= 0) between the group coded as 1 and the group coded
as 0.
• b1 = estimated slope of Y on X for the group coded as 0.
• a = estimated Y score (intercept) for members of the group
coded as 0 when X = 0.
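A quick way to verify these interpretations is to simulate data with known subgroup equations and recover the coefficients (all values below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000  # large n so estimates sit close to the population values
X = rng.normal(5, 2, size=2 * n)
Z = np.repeat([0.0, 1.0], n)  # dummy-coded moderator
# Group 0: Y = 2 + 1.0 X ; Group 1: Y = 3 + 1.5 X  (plus noise)
Y = np.where(Z == 0, 2 + 1.0 * X, 3 + 1.5 * X) + rng.normal(size=2 * n)

design = np.column_stack([np.ones(2 * n), X, Z, X * Z])
a, b1, b2, b3 = np.linalg.lstsq(design, Y, rcond=None)[0]

print(round(a, 2))   # ~2.0  intercept for the group coded 0
print(round(b1, 2))  # ~1.0  slope of Y on X for the group coded 0
print(round(b2, 2))  # ~1.0  intercept difference (3 - 2)
print(round(b3, 2))  # ~0.5  slope difference (1.5 - 1.0)
```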
Pervasive Use of MMR in the
Organizational Sciences
• Recent review: MMR was used in over
600 attempts to detect moderating effects
of categorical variables in AMJ, JAP, and
PP between 1977-1998 (Aguinis, Beaty,
Boik, & Pierce, 2005, JAP)
Selected Research on MMR
• Aguinis (2004, Regression Analysis for Categorical
Moderators, Guilford Press)
• Aguinis, Beaty, Boik, and Pierce (2005, J. of Applied
Psychology)
• Aguinis, Boik, and Pierce (2001, Organizational Research
Methods)
• Aguinis, Petersen, and Pierce (1999, Organizational
Research Methods)
• Aguinis and Pierce (1998, Organizational Research Methods)
• Aguinis and Pierce (1998, Ed. & Psychological
Measurement)
• Aguinis and Stone-Romero (1997, J. of Applied Psychology)
• Aguinis, Bommer, and Pierce (1996, Ed. & Psychological
Measurement)
• Aguinis (1995, J. of Management)
Methodology: Monte Carlo
Simulations
• Research question: Does MMR do a good
job at estimating moderating effects?
• Difficulty: We don’t know the population
• Solution: Monte Carlo methodology
– Create a population
– Generate random samples
– Perform MMR analyses on samples
– Compare population versus samples
– Assess % of hits and misses
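The five steps above can be sketched as a small simulation; the population slope difference, subgroup sizes, and replication count are illustrative choices, not the ones used in the published studies:

```python
import numpy as np
from scipy import stats

def mmr_power(n_per_group=50, slope_diff=0.4, reps=2000, alpha=0.05, seed=0):
    """Monte Carlo estimate of MMR's power to detect a known slope difference."""
    rng = np.random.default_rng(seed)
    N = 2 * n_per_group
    Z = np.repeat([0.0, 1.0], n_per_group)
    hits = 0
    for _ in range(reps):
        X = rng.normal(size=N)
        # Create a population that truly contains a moderating effect
        Y = X + slope_diff * X * Z + rng.normal(size=N)
        D1 = np.column_stack([np.ones(N), X, Z])         # main effects only
        D2 = np.column_stack([np.ones(N), X, Z, X * Z])  # plus product term

        def r2(D):
            b = np.linalg.lstsq(D, Y, rcond=None)[0]
            e = Y - D @ b
            return 1 - e.var() / Y.var()

        R2_1, R2_2 = r2(D1), r2(D2)
        F = (R2_2 - R2_1) / ((1 - R2_2) / (N - 4))  # k2 - k1 = 1
        if stats.f.sf(F, 1, N - 4) < alpha:
            hits += 1
    return hits / reps  # proportion of "hits" (significant interaction tests)

power = mmr_power()
print(round(power, 2))
```

Rerunning with a null `slope_diff=0.0` should return roughly `alpha`, i.e., the Type I error rate.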
Problems with MMR
1. We don’t find moderators
2. If we find them, they are small
Why should we care?
 Theory: Failure to find support for correct
hypotheses (derailment of theory advancement
process; model misspecification)
 Practice: Erroneous decision making (e.g.,
over and under prediction of performance,
implementation of ineffective interventions)
– Ethical implications
– Legal implications
Some Culprits for Erroneous
Estimation of Moderating Effects
• Small total sample size
• Unequal sample size across moderator-based groups
• Range restriction (i.e., truncation) in predictor
variable X
• Scale coarseness
• Violation of homogeneity of error variance
assumption
• Unreliability of measurement
• Artificial dichotomization/polychotomization of
continuous variables
• Interactive effects
Unequal Sample Size Across
Moderator-based Subgroups
• Applies to categorical moderators (e.g.,
gender, national origin)
• In many research situations, n1 ≠ n2
• Two studies examined this issue (Aguinis &
Stone-Romero, 1997; Stone, Alliger, and
Aguinis, 1994) (see also Aguinis, 1995)
• Effective sample size: N′ = 2 n1 n2 / (n1 + n2)
• Conclusion: n1 needs to be at least .3 × n2 to
detect medium moderating effects
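This N′ is the harmonic mean of the two subgroup sizes, which makes the cost of imbalance easy to quantify (the sample sizes below are illustrative):

```python
def effective_n(n1: int, n2: int) -> float:
    """Harmonic-mean effective sample size for two moderator-based subgroups."""
    return 2 * n1 * n2 / (n1 + n2)

print(effective_n(100, 100))  # 100.0 -- balanced groups lose nothing
print(effective_n(180, 20))   # 36.0  -- same total N = 200, but heavy imbalance
```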
Truncation in Predictor X
• Non-random sampling
• Pervasive in field settings (systematic in
personnel selection/test validation research,
[X,Y] | X > x)
• Aguinis and Stone-Romero (1997) (categorical
moderator); McClelland and Judd (1993)
(continuous moderator)
• Truncation has a dramatic impact on power
– N = 300, medium moderating effect, power = .81
– Same conditions, truncation = .80, power = .51
• Conclusion: Even mild levels of truncation
can have a substantial detrimental effect on
power
Violation of Homogeneity
of Error Variance Assumption
• Applies to categorical moderators
• Assumption: the variance in Y that remains after
predicting Y from X (error variance) is equal
across subgroups (e.g., women, men)
• σ²e(i) = σ²Y(i) (1 − ρ²XY(i)), for each subgroup i
• Distinct from homoscedasticity assumption
Regression of Homoscedastic Data
[Scatterplot: Criterion (Y) on Predictor (X), total sample: women & men]
Regression for Subgroups
[Scatterplots: Criterion (Y) on Predictor (X), plotted separately for women
and men]
Artificial polychotomization of
continuous variables
• Median split and other common methods for
“simplifying the data” before conducting ANOVAs
• Cohen (1983) showed this practice is inappropriate
• In the context of MMR, some have used a median
split procedure on continuous predictor Z and
compared correlations across groups
• MMR always performs better than comparing
artificially-created subgroups (Stone-Romero &
Anderson, 1994)
• Conclusion: Do not polychotomize truly continuous
predictors
Interactions Among Artifacts
• Concurrent manipulation of truncation, N, n1 and
n2, and moderating effect magnitude (Aguinis &
Stone-Romero, JAP, 1997).
• Results: Methodological artifacts have interactive
effects on power.
• Even if conditions conducive to high power are
favorable regarding one factor (e.g., N),
conditions unfavorable regarding other factors
(e.g., truncation) will lead to low power.
• Conclusion: Relying on a single strategy (e.g.,
increase N) to improve power will not be
successful if other methodological and statistical
artifacts are present
Aguinis, Beaty, Boik, & Pierce
(2005, JAP)
• Q1: What is the size of observed moderating
effects of categorical variables in published
research?
• Q2: What would the size of moderating effects of
categorical variables be in published research
under conditions of perfect reliability?
• Q3: What is the a priori power of MMR to detect
moderating effects of categorical variables in
published research?
• Q4: Do MMR tests reported in published
research have sufficient statistical power to
detect moderating effects conventionally defined
as small, medium, and large?
Method
• Review of all articles published from 1969
to 1998 in Academy of Management
Journal (AMJ), Journal of Applied
Psychology (JAP), and Personnel
Psychology (PP)
• Criteria for study inclusion:
– At least one MMR analysis
– The MMR analysis included a continuous
criterion Y, a continuous predictor X, and a
categorical moderator Z
Effect Size and Power
Computation
• Total of 636 MMR analyses
• Moderator sample sizes for 507 (79.72%)
• Moderator group sample sizes and predictor-criterion rs for 261 (41.04%)
• Effect sizes and power computation based on
261 MMR analyses for which ns and rs were
available. We used SD information when
available, and assumed homogeneity of error
variance when this information was not available
Results (I)
Frequency of MMR Use over Time:
[Bar chart: number of MMR analyses per publication year, 1977–1997]
Q1: Size of Observed Effects (I)
• Effect size metric: f² = (R2² − R1²) / (1 − R2²)
• Median f² = .002
• Mean (SD) = .009 (.025)
– 95% CI = .0089 to .0091
– 25th percentile = .0004
– 75th percentile = .0053
• Effect size values over time: r(261) = .15, p <
.05
Q1: Size of Observed Effects (II)

Journal         Mean (SD)     Median
AMJ (k = 6)     .040 (.047)   .025
JAP (k = 236)   .007 (.024)   .002
PP (k = 19)     .017 (.025)   .006

• F(2, 258) = 4.97, p = .008, η2 = .04
• Tukey HSD tests: AMJ > JAP and PP >
JAP
Q1: Size of Observed Effects (III)

Moderator           Mean (SD)     Median
Gender (k = 63)     .005 (.011)   .002
Ethnicity (k = 45)  .002 (.002)   .001
Other (k = 153)     .013 (.031)   .002

• F(2, 258) = 8.71, p < .001, η2 = .06
• Tukey HSD tests: Other > Ethnicity
Q1: Size of Observed Effects (IV)

Domain                        Mean (SD)     Median
Personnel Selection (k = 20)  .010 (.023)   .001
Other (k = 241)               .009 (.025)   .002

• t(259) = −0.226, p = ns

Domain                        Mean (SD)     Median
Work Attitudes (k = 96)       .005 (.015)   .002
Other (k = 165)               .011 (.029)   .002

• t(259) = −0.95, p = ns
Q2: Construct-level Effects (I)
• Median f 2 = .003
– Increase of .001 over median observed effect size
• Mean (SD) = .017
– Increase of .008 over mean observed effect size
Q3: Statistical Power (I)
[Figure: statistical power (0.00–1.00) as a function of effect size f², from
.00 to .11]
Q3: Statistical Power (II)
[Figure: statistical power (0.00–1.00) as a function of effect size f², from
.000 to .023]
Q4: Power to Detect Small,
Medium, and Large Effects
• Small f 2 (.02); mean power = .84; 72% of
tests would have a power of .80 or higher
• Medium f 2 (.15); mean power = .98
• Large f 2 (.35); mean power = 1.0
Some Conclusions
• We expected effect size to be small, but not
so small (i.e., median of .002)
• Computation of construct-level effect sizes
did not improve things by much (i.e., median
of .003)
• More encouraging results:
– None of the 95% CIs around the mean effect size
for the various comparisons included zero
– Effect sizes have increased over time
– Given the observed sample sizes, mean power is
sufficient to detect effects ≥ .02
– 72% of studies had sufficient power to detect an
effect ≥ .02
Some Implications
• Are theories in dozens of research domains
incorrect in hypothesizing moderators?
• Are hundreds of researchers in dozens of
disparate domains wrong and population
moderating effects so small?
• Could be, but… more likely, methodological
artifacts decrease the observed effect sizes
substantially vis-à-vis their population
counterparts
• More attention needs to be paid to design
and analysis issues that decrease observed
effect sizes
• Conventional definitions of effect size (f 2) for
moderators should probably be revised
The “Now What” Question
• Before data are collected
– Larger sample size *
– More reliable measures *
– Avoid truncated samples *
– Use non-coarse scales (e.g., program by Aguinis,
Bommer, & Pierce, 1996, Ed. & Psych. Measurement)
– Equalize sample size across moderator-based
subgroups
– Use computer programs in the public domain to
estimate sample size needed for desired power level
– Gather information on research design trade-offs
* Easier said than done!
Tools to Improve Moderating
Effect Estimation
(Aguinis, 2004)
• Scale coarseness
– Aguinis, Bommer, and Pierce (1996, Educational &
Psychological Measurement)
• Homogeneity of error variance
– Aguinis, Petersen, and Pierce (1999, Organizational
Research Methods)
• Power estimation and research design trade-offs
– Aguinis, Pierce, and Stone-Romero (1994, Educational &
Psychological Measurement)
– Aguinis and Pierce (1998, Educational & Psychological
Measurement)
– Aguinis, Boik, and Pierce (2001, Organizational Research
Methods)
Assessment of Assumption
Compliance
• DeShon and Alexander’s (1996) 1.5 rule
of thumb
• Bartlett’s homogeneity test:

M = [(Σi vi) loge(Σi vi si² / Σi vi) − Σi vi loge si²] /
[1 + (1 / (3(k − 1))) (Σi 1/vi − 1 / Σi vi)]

• k = number of sub-groups
• nk = number of observations in each sub-group
• s2 = sub-group variance on the criterion
• v = degrees of freedom from which s2 is based
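Bartlett's statistic can be computed directly from that expression; a minimal sketch (the comparison against a chi-square with k − 1 df is implied, and the input values are illustrative):

```python
import math

def bartlett_M(variances, dfs):
    """Bartlett's test statistic for homogeneity of variances across k subgroups.

    variances: subgroup variances s_i^2 (here, variances on the criterion)
    dfs: degrees of freedom v_i on which each variance is based
    """
    V = sum(dfs)
    pooled = sum(v * s2 for v, s2 in zip(dfs, variances)) / V
    M = V * math.log(pooled) - sum(v * math.log(s2) for v, s2 in zip(dfs, variances))
    k = len(variances)
    C = 1 + (sum(1 / v for v in dfs) - 1 / V) / (3 * (k - 1))
    return M / C  # approximately chi-square with k - 1 df under homogeneity

print(bartlett_M([1.0, 1.0, 1.0], [28, 28, 28]))  # equal variances -> 0.0
```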
Homogeneity is not Met...
Now What?
• Use alternatives to MMR
– Alexander and colleagues’ normalized-t
approximation:

zi = c + (c³ + 3c)/b − (4c⁷ + 33c⁵ + 240c³ + 855c) /
(10b² + 8bc⁴ + 1000b),

where a = vi − .5; b = 48a²; c = √(a loge(1 + ti²/vi)); and vi = nk − 2
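The normalized-t transformation can be coded directly from that expression; this sketch maps a subgroup t statistic to an approximately standard-normal z (the inputs in the example are illustrative):

```python
import math

def normalized_t(t: float, v: float) -> float:
    """Alexander's normalized-t approximation: converts a t statistic
    with v degrees of freedom to an approximately standard-normal z."""
    a = v - 0.5
    b = 48 * a ** 2
    c = math.sqrt(a * math.log(1 + t ** 2 / v))
    z = (c
         + (c ** 3 + 3 * c) / b
         - (4 * c ** 7 + 33 * c ** 5 + 240 * c ** 3 + 855 * c)
           / (10 * b ** 2 + 8 * b * c ** 4 + 1000 * b))
    return math.copysign(z, t)  # restore the sign of the original t

print(round(normalized_t(2.0, 30), 3))  # close to the normal quantile for p(t=2, df=30)
```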
– OR James’s second-order approximation: the test
statistic is compared to a critical value h(α), a lengthy
series expansion in χ² quantiles and weighted functions
of the subgroup variances (full expression in James,
1951, and Aguinis, 2004)
Program ALTMMR
• Calculates
– Error variance ratio (highest if more than 2
subgroups)
– Bartlett’s M
– James’s J
– Alexander’s A
• Uses sample descriptive data
– nk , sx , sy , rxy
– User sets p = .05 or .01 (for all but James’s
statistic)
Program ALTMMR
 Described in detail in Aguinis (2004)
 Available at www.cudenver.edu/~haguinis/
(click on MMR icon on left side of page)
 Executable on-line or locally
Power Estimation
• Program POWER
– Aguinis, Pierce, and Stone-Romero (1994,
Ed. & Psych. Measurement)
• Program MMRPWR
– Aguinis and Pierce (1998, Ed. & Psych.
Measurement)
• Program MMRPOWER
– Aguinis, Boik, and Pierce (2001,
Organizational Research Methods)
Program MMRPOWER
• Problems/Challenges regarding
POWER and MMRPWR programs:
– Based on extrapolation from simulations:
Range of values is limited
– Absence of factors known to affect power
of MMR (e.g., unreliability)
• Theoretical approximation to power:

Power = Pr[ ((k − 1) / (N − 2k)) F(1 − α; k − 1, N − 2k) ≤
Σj λj Hj + Σj δj Gj ],

a comparison of the critical F value against a weighted
sum of chi-square-type variables whose weights depend
on subgroup sizes, slopes, and error variances (full
derivation in Aguinis, Boik, & Pierce, 2001)
Program MMRPOWER
 Described in detail in Aguinis (2004)
 Available at www.cudenver.edu/~haguinis/
(click on MMR icon on left side of page)
 Executable on-line or locally
Some Conclusions
• Observed moderating effects are very
small
• MMR is a low-power test for detecting
effect sizes as typically observed
• Researchers are not aware of problems
with MMR
• Implications for theory and practice
• User-friendly programs are available and
allow researchers to improve moderating
effect estimation
• Using these tools will allow researchers to
make more informed decisions regarding
the operation of moderating effects