20131025_Graphical_Exploration_of_Interactions_Jackson

Graphical Exploration of
Statistical Interactions
Nick Jackson
University of Southern California
Department of Psychology
10/25/2013
Overview


What is an Interaction?
2-Way Interactions
◦ Categorical X Categorical
◦ Continuous X Categorical
◦ Continuous X Continuous

3-Way Interactions
◦ Categorical X Continuous X Continuous
◦ Continuous X Continuous X Continuous
◦ Time in a Three-Way Interaction

4-Way and beyond
What is an Interaction?

Equivalent Statements:
◦ When the relationship between X and Y depends on
the levels of a third variable Z.
◦ Z modifies the effect of X on Y.
◦ The relationship between X and Y is different at differing levels of Z.

Also Called Moderation or Effect Modification.
Moderation is a stupid term.
◦ Moderation (n): The avoidance of excess or extremes.
◦ Moderate (v): To make or become less extreme or
intense
Those are kinda the opposite of what we mean when we
say moderation in a statistical sense.
What is an Interaction?
As SEM diagrams:
[Figure: two path diagrams. The first contains Z, X, and Y; the second contains Z, X*Z, X, and Y.]
What is an Interaction?
[Figure: two plots of Y against X. One is titled "Effect of X on Y if we ignore Z"; the other is titled "Z Modifies the effect of X on Y" and has separate lines for Z=0 and Z=1.]
Types of Interaction
[Figure: two panels, "Quantitative Interaction Only" and "Qualitative Interaction", each plotting Y for X=0 and X=1 at Z=0 and Z=1, annotated "X*Z, p<0.05".]
Quantitative Interaction:
Difference between X(0) and X(1) is significantly different between Z(0) and Z(1), though these differences are not qualitatively different (visually these things look to be about the same). This occurs as a result of substantial power.
Qualitative Interaction:
Difference between X(0) and X(1) may or may not be significantly different between Z(0) and Z(1); however, these differences are qualitatively different (i.e., it really does look like an interaction).
Graphing the Interaction

Why Graph?
◦ Interpreting the interaction coefficient(s) is not
always intuitive

Two ways to graph:
◦ 1) Look at observed means/values
 Represents your actual data
 Very easy to do in any package
 Does not represent the statistical model being used
◦ 2) Look at marginal (predicted) means/values from
regression equation
 A direct representation of the statistical model you are using
 For interactions with continuous variables, it allows you to see
where the interaction is occurring.
Graphing the Interaction
More about marginal (predicted) means/values from
regression equation
 The General Idea:
◦ Take the regression equation and predict values for the different
levels of your variables X and Z
◦ For any covariates, use their mean levels
◦ An Example:
Blood_Press = β0 + β1*Diabetes + β2*Gender + β3*(Diabetes*Gender)
Blood_Press = 75 + 20.5*Diabetes + 15*Gender + 10.5*(Diabetes*Gender)
Find the predicted means:
Diabetes=1, Gender=1: 75 + 20.5(1) + 15(1) + 10.5(1*1)=121
Diabetes=0, Gender=1: 75 + 20.5(0) + 15(1) + 10.5(0*1)=90
Diabetes=1, Gender=0: 75 + 20.5(1) + 15(0) + 10.5(1*0)=95.5
Diabetes=0, Gender=0: 75 + 20.5(0) + 15(0) + 10.5(0*0)=75
Standard errors of the predictions can also be obtained, though that is a bit more difficult.
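The predicted means above can be reproduced with a few lines of code. This is a minimal sketch using the coefficients from the example equation; the function and variable names are illustrative and not from the original slides. It also shows that the difference between the two diabetes effects is the interaction coefficient.

```python
# Coefficients taken from the example equation on the slide.
B0, B_DIABETES, B_GENDER, B_INTERACTION = 75.0, 20.5, 15.0, 10.5


def predicted_bp(diabetes, gender):
    """Predicted blood pressure for one combination of the 0/1 predictors."""
    return (B0
            + B_DIABETES * diabetes
            + B_GENDER * gender
            + B_INTERACTION * diabetes * gender)


# Predicted mean for every cell of the 2 x 2 design.
cells = {(d, g): predicted_bp(d, g) for d in (1, 0) for g in (1, 0)}
for (d, g), bp in cells.items():
    print(f"Diabetes={d}, Gender={g}: {bp}")

# Effect of diabetes within each gender group.
effect_gender1 = cells[(1, 1)] - cells[(0, 1)]
effect_gender0 = cells[(1, 0)] - cells[(0, 0)]

# The difference between those two effects is the interaction coefficient.
difference_in_effects = effect_gender1 - effect_gender0
print(difference_in_effects)
```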
Graphing the Interaction (Marginal
Estimates)

Available in most Software Packages:
◦ margins and marginsplot commands in Stata
◦ lsmeans and effects packages in R; predict and
predict.lm functions in R.
 Some good ways to look at interactions in R.
http://www.ats.ucla.edu/stat/r/faq/concon.htm
◦ Least-Squares Means (LSMEANS), Slicing,
Contrasts, Estimate in SAS.
◦ SPSS GLM (emmeans), estimated marginal
means
Two-Way Interactions

Categorical X Categorical Interaction
◦ Use Bar Graphs
◦ 2 X 2: Below are equivalent representations of the
same interaction…so which is it?
[Figure: two bar graphs of Blood Pressure from the same 2 X 2 data (Asian/White by Male/Female), one grouping the bars by race and the other by gender.]
Among Whites, Females have a higher blood pressure than Males.
Among Asians, Females have a lower blood pressure than Males.
Among males, Asians have a higher blood pressure than whites.
Among females, Asians have a lower blood pressure than whites.
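One way to see why the two graphs are equivalent is to compute the simple effects both ways. The sketch below uses hypothetical cell means (the slide gives no numbers) chosen only to match the pattern described above; all names are illustrative.

```python
# Hypothetical cell means (mmHg); only the pattern matches the slide.
bp = {
    ("White", "Male"): 120.0,
    ("White", "Female"): 130.0,
    ("Asian", "Male"): 135.0,
    ("Asian", "Female"): 125.0,
}

# Reading 1: gender difference (Female - Male) within each race.
gender_gap = {race: bp[(race, "Female")] - bp[(race, "Male")]
              for race in ("White", "Asian")}

# Reading 2: race difference (Asian - White) within each gender.
race_gap = {sex: bp[("Asian", sex)] - bp[("White", sex)]
            for sex in ("Male", "Female")}

# Either reading leads to the same single interaction contrast.
interaction_by_race = gender_gap["Asian"] - gender_gap["White"]
interaction_by_gender = race_gap["Female"] - race_gap["Male"]

print(gender_gap, race_gap)
print(interaction_by_race, interaction_by_gender)
```

Both contrasts come out identical, so the choice of which variable goes on the axis is a matter of which comparison you want the reader to make.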
Two-Way Interactions

Continuous X Categorical Interaction
◦ Could make the continuous variable categorical and use a bar
graph.
◦ Better idea: use scatter plots/linear predictions for each
category
[Figure: "Adjusted Predictions of gender with 95% CIs". Predicted Blood Press (y-axis, 30 to 80) against body mass index (k/m-sq) (x-axis, 20 to 60), with separate lines for male and female.]
We can see that as BMI increases, blood pressure increases more sharply in Men than in Women.
By looking at the Confidence Intervals we can start to get an idea about when the genders diverge (statistically) in their effects.
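The lines in a plot like this are just the regression equation evaluated for each category over a grid of the continuous variable. A minimal sketch, assuming made-up coefficients (the slides do not report the fitted model) and illustrative names:

```python
# Hypothetical coefficients for:
#   bp = B0 + B_BMI*bmi + B_FEMALE*female + B_INT*bmi*female
B0, B_BMI, B_FEMALE, B_INT = 10.0, 1.0, 10.0, -0.5


def predict_bp(bmi, female):
    """Linear prediction for one gender (female = 0 or 1) at a given BMI."""
    return B0 + B_BMI * bmi + B_FEMALE * female + B_INT * bmi * female


# One predicted line per category, evaluated over the same BMI grid.
bmi_grid = range(20, 61, 10)
lines = {
    "male": [predict_bp(bmi, 0) for bmi in bmi_grid],
    "female": [predict_bp(bmi, 1) for bmi in bmi_grid],
}
for label, values in lines.items():
    print(label, values)

# The interaction coefficient is the difference between the two slopes.
slope_male = B_BMI
slope_female = B_BMI + B_INT
```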
Two-Way Interactions

Continuous X Categorical Interaction
◦ Look at how the Slope of Gender (the difference between Men and
Women) changes across varying levels of BMI.
◦ We can use the 95% CI to see when these differences become
significant.
[Figure: "Conditional Marginal Effects of 2.gender with 95% CIs". Difference in Blood Press (y-axis, -40 to 20) against body mass index (k/m-sq) (x-axis, 20 to 60).]
The differences in mean blood pressure between men and women become more pronounced at higher BMIs, such that women have a lower BP than men as BMI increases. These differences are statistically significant (95% CI of difference does not include 0) past a BMI of around 35.
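The difference curve and its confidence band come from the gender coefficient, the interaction coefficient, and their sampling variances and covariance. The sketch below uses hypothetical estimates (none are reported on the slides) and a normal-approximation 95% interval; it is not the output of Stata's margins command.

```python
import math

# Hypothetical estimates for the gender terms of the model.
B_FEMALE, B_INT = 10.0, -0.5        # main effect and BMI*gender interaction
VAR_FEMALE, VAR_INT = 64.0, 0.04    # sampling variances of the two coefficients
COV_FEMALE_INT = -1.5               # their sampling covariance


def gender_difference(bmi):
    """Female-minus-male difference at a given BMI, with an approximate 95% CI."""
    diff = B_FEMALE + B_INT * bmi
    # Var(b1 + bmi*b3) = Var(b1) + bmi^2 * Var(b3) + 2*bmi*Cov(b1, b3)
    variance = VAR_FEMALE + bmi ** 2 * VAR_INT + 2 * bmi * COV_FEMALE_INT
    half_width = 1.96 * math.sqrt(variance)
    return diff, diff - half_width, diff + half_width


first_significant = None
for bmi in range(20, 61, 5):
    diff, lower, upper = gender_difference(bmi)
    excludes_zero = lower > 0 or upper < 0
    print(f"BMI {bmi}: diff={diff:.1f}, 95% CI=({lower:.1f}, {upper:.1f}), "
          f"excludes 0: {excludes_zero}")
    if excludes_zero and first_significant is None:
        first_significant = bmi
```

With these made-up numbers the interval first excludes 0 on this grid at a BMI of 35; real estimates would give a different crossover point.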
Two-Way Interactions

Continuous X Categorical Interaction
◦ With a categorical variable that has more than two groups
 Same as before, just plotting the differences relative to the reference group
 Works the same with non-linear continuous variables.
[Figure: "95% Confidence Intervals of the Difference in BMI between Sleep Duration Groups (Referenced to 7-8 Hours) across Age". Three panels (<=4 Hours of Sleep, 5-6 Hours of Sleep, >=9 Hours of Sleep), each plotting Difference in BMI (y-axis, -4 to 9) against Age (x-axis, 16 to 80).]
Two-Way Interactions

Continuous X Continuous Interaction
◦ Traditional Methods
 Discretize one of the continuous variables making it
categorical and do the usual procedures for categorical X
continuous interactions.
 Usually +1 and -1 SD (this method sucks): it can miss where
the interaction occurs
◦ Newer Method: Predict values at percentiles of the
continuous variables
 Generally avoid the extremes of the percentiles (<5 or >95) as
the variability is greater at the extremes
◦ Newer Method: Use 3-D Graphing (Surface/Mesh
Plots)
 Same ideas as predicting values at the percentiles, but
using 3D plotting software
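The percentile approach can be sketched as follows: pick percentiles of the moderator, then evaluate the regression equation along the other variable at each of them. The data and coefficients here are hypothetical stand-ins (blood pressure, BMI, and cholesterol are used only to mirror the examples that follow), and the names are illustrative.

```python
from statistics import quantiles

# Hypothetical coefficients for:
#   bp = B0 + B_BMI*bmi + B_CHOL*chol + B_INT*bmi*chol
B0, B_BMI, B_CHOL, B_INT = 130.0, -0.6, -4.0, 0.2

# Hypothetical cholesterol values standing in for the observed moderator.
cholesterol = [2.0 + 0.1 * i for i in range(81)]

# quantiles(..., n=100) returns 99 cut points; cut_points[p - 1] is the p-th percentile.
cut_points = quantiles(cholesterol, n=100, method="inclusive")
percentiles = (5, 10, 25, 50, 75, 90, 95)
chol_at = {p: cut_points[p - 1] for p in percentiles}


def predict_bp(bmi, chol):
    """Predicted blood pressure from the hypothetical equation."""
    return B0 + B_BMI * bmi + B_CHOL * chol + B_INT * bmi * chol


def bmi_slope(chol):
    """Slope of BMI on blood pressure when cholesterol is held at `chol`."""
    return B_BMI + B_INT * chol


for p, chol in chol_at.items():
    line = [round(predict_bp(bmi, chol), 1) for bmi in (20, 40, 60)]
    print(f"{p}th percentile (chol={chol:.1f}): "
          f"slope={bmi_slope(chol):.2f}, predictions={line}")
```

Each percentile yields one line in the plot, so the reader sees the slope change gradually across the moderator instead of at just two arbitrary points.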
Two-Way Interactions
Continuous X Continuous Interaction:
Predicted values at percentiles
[Figure: "Effect Modification of bp_sys1_bl vs bmi by cholest_bl" (bmi*cholest_bl Interaction p=0.0431). Predicted bp_sys1_bl (y-axis, 120 to 160) against bmi (x-axis, 20 to 60), with one line per percentile of cholest_bl: 1%, then 5, 10, 25, 50, 75, 90, 95%, then 99%.]
Two-Way Interactions
Continuous X Continuous Interaction:
Which way we graph it is fairly arbitrary
[Figure: "Effect Modification of bp_sys1_bl vs bmi by cholest_bl" (bmi*cholest_bl Interaction p=0.0431). Predicted bp_sys1_bl (y-axis, 120 to 160) against bmi (x-axis, 20 to 60), with one line per percentile of cholest_bl: 1%, then 5, 10, 25, 50, 75, 90, 95%, then 99%.]
We can see that the nature of the relationship changes at around a BMI of 30.
We could say that BMI has a positive association with Blood Pressure, and that this relationship is the strongest among those with high cholesterol. Those with low cholesterol do not see a relationship of BMI with Blood Pressure.
[Figure: "Effect Modification of bp_sys1_bl vs cholest_bl by bmi" (cholest_bl*bmi Interaction p=0.0431). Predicted bp_sys1_bl (y-axis, 120 to 160) against cholest_bl (x-axis, 2 to 12), with one line per percentile of bmi: 1%, then 5, 10, 25, 50, 75, 90, 95%, then 99%.]
We can see that the nature of the relationship changes at around a cholesterol value of 3.5.
We could say that Cholesterol has a positive association with Blood Pressure, and that this relationship is the strongest among those with high BMI. Those with low BMI have a negative or no relationship of Cholesterol with Blood Pressure.
Two-Way Interactions
Continuous X Continuous Interaction:
Another way to interpret: 4-Corners Method
[Figure: "Effect Modification of bp_sys1_bl vs bmi by cholest_bl" (bmi*cholest_bl Interaction p=0.0431). Predicted bp_sys1_bl (y-axis, 120 to 160) against bmi (x-axis, 20 to 60), one line per percentile of cholest_bl, with the four corner predictions annotated:]
◦ Low Chol, Low BMI=133
◦ Low Chol, High BMI=125
◦ High Chol, Low BMI=130
◦ High Chol, High BMI=155
The combination of being Obese (BMI >30) and having high cholesterol results in high BP.
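The four corner values can be turned into simple contrasts that summarize the interaction. This sketch uses the four predictions shown on the slide; the contrast names are illustrative.

```python
# Corner predictions from the slide, keyed by (cholesterol, BMI).
corners = {
    ("low", "low"): 133,
    ("low", "high"): 125,
    ("high", "low"): 130,
    ("high", "high"): 155,
}

# Change in predicted BP from low to high BMI, at each cholesterol extreme.
bmi_effect_low_chol = corners[("low", "high")] - corners[("low", "low")]
bmi_effect_high_chol = corners[("high", "high")] - corners[("high", "low")]

# Change in predicted BP from low to high cholesterol, at each BMI extreme.
chol_effect_low_bmi = corners[("high", "low")] - corners[("low", "low")]
chol_effect_high_bmi = corners[("high", "high")] - corners[("low", "high")]

# Both routes give the same difference-in-differences.
print(bmi_effect_high_chol - bmi_effect_low_chol)
print(chol_effect_high_bmi - chol_effect_low_bmi)
```

Going from low to high BMI changes predicted BP by -8 at low cholesterol but by +25 at high cholesterol, which is the pattern the slide describes in words.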
Two-Way Interactions
Continuous X Continuous Interaction:
3D Mesh Plots (Matlab, Sigma Plot, R)
Same data as before, same interpretation. Use 4-Corners
[Figure: two 3D mesh plots of Blood Pressure over BMI and Cholesterol, one titled "Observed Data" and one titled "Marginal Estimates Data".]
Why we generally don’t use observed data: not smooth
Two-Way Interactions
Continuous X Continuous Interaction:
Useful for Non-linear continuous interactions (Response Surface Model)
Three-Way Interactions

Now things get complicated.
◦ Variables W*X*Z used to predict Y.
◦ The Interaction of X*Z is different at differing
levels of W
◦ Or X*W is different at differing levels of Z
◦ Or Z*W is different at differing levels of X
◦ Or relationship of X and Y is different according
to the levels of W and Z etc.
◦ Substantially easier when one of X, W, or Z is
categorical
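The equivalent statements above can be read off the regression equation. The following is a sketch assuming a linear model with all lower-order terms included; the coefficient numbering is illustrative, and for a categorical variable the derivatives should be read as differences between levels.

```latex
\begin{align}
Y &= \beta_0 + \beta_1 X + \beta_2 Z + \beta_3 W
   + \beta_4 XZ + \beta_5 XW + \beta_6 ZW + \beta_7 XZW \\
% The slope of X on Y depends on the levels of both Z and W
\frac{\partial Y}{\partial X} &= \beta_1 + \beta_4 Z + \beta_5 W + \beta_7 ZW \\
% The X*Z interaction itself differs across levels of W
\frac{\partial^2 Y}{\partial X \,\partial Z} &= \beta_4 + \beta_7 W
\end{align}
```

By symmetry, the X*W interaction is \beta_5 + \beta_7 Z and the Z*W interaction is \beta_6 + \beta_7 X, so a nonzero \beta_7 is what makes each two-way interaction vary with the third variable.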
Three-Way Interactions

Substantially easier when one of X, W, or Z is categorical...

so we pick a small set of values of one of the variables to predict over, treating it as semi-discrete (Quartiles?)
Often Time is the third variable

Interested in whether the interaction of X*Z changes over Time (W)
Three-Way Interactions
Categorical X Continuous X Continuous Interaction:
Sleep Medication (Y/N) * BMI * Pulse: Stratify on categorical var Sleep Meds
[Figure: "Predictive Margins". Two panels (med_sleep=0 and med_sleep=1), each plotting predicted Apnea Index (y-axis, 20 to 100) against body mass index (k/m-sq) (x-axis, 20 to 60), with one line per Pulse value: 55, 60, 65, 70, 75, 80, 85, 90.]
The interaction of BMI and Pulse exists for those on Sleep Medications only.
Three-Way Interactions
Another way to look at this is how the difference in Apnea between those
on Sleep Medications and those not on them changes depending upon
pulse and BMI
[Figure: "Conditional Marginal Effects of 1.med_sleep". Difference in Apnea Index (y-axis, -40 to 40) against body mass index (k/m-sq) (x-axis, 20 to 60), with one line per Pulse value: 55, 60, 65, 70, 75, 80, 85, 90.]
Three-Way Interactions
Continuous X Continuous X Continuous Interaction:
Glucose Level* BMI * Pulse: Stratify on Glucose
[Figure: "Adjusted Predictions". Four panels (glucose_bl=5, 6, 7, 8), each plotting predicted Apnea Index (y-axis, 20 to 80) against body mass index (k/m-sq) (x-axis, 20 to 60), with one line per Pulse value: 55, 60, 65, 70, 75, 80, 85, 90.]
Asks the question: How does the interaction of Pulse and BMI change across levels of glucose?
Three-Way Interactions
Continuous X Continuous X Continuous Interaction:
Glucose Level* BMI * Pulse: Look at how the slopes of Glucose on Apnea change.
[Figure: "Average Marginal Effects of glucose_bl". Effect on Apnea Index (y-axis, -10 to 10) against body mass index (k/m-sq) (x-axis, 20 to 60), with one line per Pulse value: 55, 60, 65, 70, 75, 80, 85, 90.]
Asks the question: How does the relationship of Glucose to Apnea change across levels of BMI and pulse?
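In a linear model with a three-way interaction, the slope of glucose at any BMI and pulse is a combination of the four coefficients that involve glucose. The sketch below evaluates that slope over the same BMI and Pulse values as the plot; the coefficients are hypothetical, since the slides do not report the fitted model.

```python
# Hypothetical coefficients on the terms that involve glucose:
# glucose, glucose*bmi, glucose*pulse, glucose*bmi*pulse.
B_GLU, B_GLU_BMI, B_GLU_PULSE, B_GLU_BMI_PULSE = 2.0, -0.25, -0.05, 0.005


def glucose_effect(bmi, pulse):
    """Slope of glucose on the outcome at the given BMI and pulse."""
    return (B_GLU
            + B_GLU_BMI * bmi
            + B_GLU_PULSE * pulse
            + B_GLU_BMI_PULSE * bmi * pulse)


bmi_grid = range(20, 61, 10)
pulse_grid = range(55, 91, 5)

# One row of slopes per pulse value, matching one line per pulse in the plot.
effects = {pulse: [glucose_effect(bmi, pulse) for bmi in bmi_grid]
           for pulse in pulse_grid}
for pulse, row in effects.items():
    print(pulse, [round(e, 2) for e in row])
```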
Three-Way Interactions
What if we have time as our third variable?
 Same techniques, but perhaps in the future we
won’t be limited to just static graphs.

[Figure: Interaction of BMI and Pulse on Apnea Score across Time]
Presenting Data in Motion

Even better, let's do some of this:
◦ http://www.ted.com/talks/hans_rosling_reveals_new_insights_on_poverty.html
Four-Way Interactions and Beyond


Understanding anything much more complex than a 3-way interaction is difficult without a good way to break
down variables into categories
Classification Techniques/Machine Learning/Exploratory
Data Mining
◦ Can take high-dimensional data and find homogeneous groups
based upon relationships of continuous/categorical variables.
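To give a feel for how tree-based methods find homogeneous groups, the sketch below implements only the single-split search at the core of a regression tree: try every cut point on every predictor and keep the one that leaves the two groups most homogeneous (smallest within-group sum of squares). It is not a full CART implementation, and the data and variable names are hypothetical.

```python
from statistics import mean


def sse(values):
    """Within-group sum of squared deviations from the group mean."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(rows, outcome, predictors):
    """Return (predictor, cut point, total SSE) for the most homogeneous two-way split."""
    best = None
    for name in predictors:
        levels = sorted({row[name] for row in rows})
        # Candidate cut points are midpoints between adjacent observed values.
        for lo, hi in zip(levels, levels[1:]):
            cut = (lo + hi) / 2
            left = [r[outcome] for r in rows if r[name] <= cut]
            right = [r[outcome] for r in rows if r[name] > cut]
            total = sse(left) + sse(right)
            if best is None or total < best[2]:
                best = (name, cut, total)
    return best


# Hypothetical standardized measurements and apnea severity scores.
rows = [
    {"lateral_walls": -1.2, "soft_palate": 0.3, "apnea": 18},
    {"lateral_walls": -0.8, "soft_palate": -0.5, "apnea": 22},
    {"lateral_walls": -0.1, "soft_palate": 1.1, "apnea": 20},
    {"lateral_walls": 0.4, "soft_palate": -1.0, "apnea": 25},
    {"lateral_walls": 0.9, "soft_palate": 0.2, "apnea": 48},
    {"lateral_walls": 1.5, "soft_palate": -0.7, "apnea": 55},
]

split = best_split(rows, "apnea", ["lateral_walls", "soft_palate"])
print(split)
```

A full tree repeats this search within each resulting group, which is how splits on several variables in sequence end up describing a higher-order interaction.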
Four-Way Interactions and Beyond
CART Model: 4-Way Interaction of continuous variables on Apnea Severity
[Figure: regression tree with branches labeled Smaller Structure and Larger Structure. Split labels: Lateral Walls 0.644, Soft Palate -1.845, Genioglossus -1.123, Mandibular Width -0.250. Node values: 50.9 ± 21.4, 19.0 ± 12.3, 42.2 ± 17.9, 41.2 ± 19.1, 27.8 ± 13.8.]
Take Home Points

Test for interactions at the beginning of model building
◦ Because they are interesting
◦ Because they obscure your main effects

Interactions give us clues about underlying etiology
(David Schwartz). It is not enough to detect them, we
have to understand why the interaction exists.
◦ We must search for the variable(s) that make interactions go
away (mediated moderation)


Modern classification/Data Mining Methods are great at
detecting high-dimensional (numerous variables) non-linear interactions
Stata versions 12 and 13 are amazing at doing these
types of plots (marginsplot). Also, check out
“Interpreting and Visualizing Regression Models Using
Stata” by Michael Mitchell