Graphical Exploration of Statistical Interactions
Nick Jackson
University of Southern California, Department of Psychology
10/25/2013

Overview
What is Interaction?
2-Way Interactions
◦ Categorical X Categorical
◦ Continuous X Categorical
◦ Continuous X Continuous
3-Way Interactions
◦ Categorical X Continuous X Continuous
◦ Continuous X Continuous X Continuous
◦ Time in a Three-Way Interaction
4-Way and beyond

What is an Interaction?
Equivalent Statements:
◦ The relationship between X and Y depends on the level of a third variable Z.
◦ Z modifies the effect of X on Y.
◦ The relationship between X and Y is different at differing levels of Z.
Also called Moderation or Effect Modification.
Moderation is a stupid term.
◦ Moderation (n): The avoidance of excess or extremes.
◦ Moderate (v): To make or become less extreme or intense.
Those are kinda the opposite of what we mean when we say moderation in a statistical sense.

What is an Interaction?
As SEM diagrams:
[Figure: two path diagrams, one drawn with Z, X, and Y, and one drawn with Z, X*Z, X, and Y.]

What is an Interaction?
[Figure: two panels plotting Y against X. Panel "Z Modifies the effect of X on Y" has lines labeled Z=1 and Z=0; the other panel is "Effect of X on Y if we ignore Z".]

Types of Interaction
[Figure: two panels showing Y for X=0 and X=1 at Z=0 and Z=1, titled "Quantitative Interaction Only" and "Qualitative Interaction", annotated "X*Z, p<0.05".]
Quantitative Interaction: The difference between X(0) and X(1) is significantly different between Z(0) and Z(1), though these differences are not qualitatively different (visually they look about the same). This occurs as a result of substantial power.
Qualitative Interaction: The difference between X(0) and X(1) may or may not be significantly different between Z(0) and Z(1); however, these differences are qualitatively different (i.e., it really does look like an interaction).

Graphing the Interaction
Why Graph?
◦ Interpreting the interaction coefficient(s) is not always intuitive
Two ways to graph:
◦ 1) Look at observed means/values
  – Represents your actual data
  – Very easy to do in any package
  – Does not represent the statistical model being used
◦ 2) Look at marginal (predicted) means/values from the regression equation
  – A direct representation of the statistical model you are using
  – For interactions with continuous variables, it allows you to see where the interaction is occurring

Graphing the Interaction
More about marginal (predicted) means/values from the regression equation
The General Idea:
◦ Take the regression equation and predict values for the different levels of your variables X and Z
◦ For any covariates, use their mean levels
◦ An Example:
$$\text{Blood\_Press} = \beta_0 + \beta_1\,\text{Diabetes} + \beta_2\,\text{Gender} + \beta_3\,(\text{Diabetes} \times \text{Gender})$$
$$\text{Blood\_Press} = 75 + 20.5\,\text{Diabetes} + 15\,\text{Gender} + 10.5\,(\text{Diabetes} \times \text{Gender})$$
Find the predicted means:
Diabetes=1, Gender=1: 75 + 20.5(1) + 15(1) + 10.5(1*1) = 121
Diabetes=0, Gender=1: 75 + 20.5(0) + 15(1) + 10.5(0*1) = 90
Diabetes=1, Gender=0: 75 + 20.5(1) + 15(0) + 10.5(1*0) = 95.5
Diabetes=0, Gender=0: 75 + 20.5(0) + 15(0) + 10.5(0*0) = 75
Standard errors of the predictions can also be obtained, though that is a bit more difficult.

Graphing the Interaction (Marginal Estimates)
Available in most Software Packages:
◦ margins/marginsplot commands in Stata
◦ lsmeans and effects packages in R; predict and predict.lm commands in R.
  – Some good ways to look at interactions in R: http://www.ats.ucla.edu/stat/r/faq/concon.htm
◦ Least-Squares Means (LSMEANS), Slicing, Contrasts, Estimate in SAS.
◦ SPSS GLM (emmeans), estimated marginal means

Two-Way Interactions
Categorical X Categorical Interaction
◦ Use Bar Graphs
◦ 2 X 2: Below are equivalent representations of the same interaction…so which is it?
[Figure: two bar graphs of Blood Pressure for Male and Female, Asian and White, showing the same means grouped two different ways.]
Among Whites, Females have a higher blood pressure than Males. Among Asians, Females have a lower blood pressure than Males.
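The predicted means in the diabetes-by-gender blood pressure example can be reproduced with a few lines of code. This is a minimal Python sketch that uses the coefficients from that example; the function and variable names are made up for illustration.

```python
# Coefficients taken from the blood pressure example in the slides.
B0 = 75.0          # intercept
B_DIABETES = 20.5  # main effect of diabetes
B_GENDER = 15.0    # main effect of gender
B_INTERACT = 10.5  # diabetes x gender interaction


def predicted_bp(diabetes: int, gender: int) -> float:
    """Predicted blood pressure for one diabetes/gender combination."""
    return (B0
            + B_DIABETES * diabetes
            + B_GENDER * gender
            + B_INTERACT * diabetes * gender)


# Predicted mean for every cell of the 2 x 2 design.
cell_means = {(d, g): predicted_bp(d, g) for d in (1, 0) for g in (1, 0)}

for (d, g), bp in cell_means.items():
    print(f"Diabetes={d}, Gender={g}: {bp}")

# The interaction coefficient is the difference between the two diabetes effects.
effect_gender1 = cell_means[(1, 1)] - cell_means[(0, 1)]
effect_gender0 = cell_means[(1, 0)] - cell_means[(0, 0)]
print("Diabetes effect when Gender=1:", effect_gender1)
print("Diabetes effect when Gender=0:", effect_gender0)
print("Difference of the two effects:", effect_gender1 - effect_gender0)
```

The diabetes effect is 31 when Gender=1 and 20.5 when Gender=0, and the difference between those two effects, 10.5, is exactly the interaction coefficient.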
Among males, Asians have a higher blood pressure than Whites. Among females, Asians have a lower blood pressure than Whites.

Two-Way Interactions
Continuous X Categorical Interaction
◦ Could make the continuous variable categorical and use a bar graph.
◦ Better idea: use scatter plots/linear predictions for each category
[Figure: "Adjusted Predictions of gender with 95% CIs". Blood Press (y-axis) against body mass index (k/m-sq) (x-axis), with separate lines for male and female.]
We can see that as BMI increases, blood pressure increases more sharply in men than in women.
By looking at the confidence intervals we can start to get an idea about when the genders diverge (statistically) in their effects.

Two-Way Interactions
Continuous X Categorical Interaction
◦ Look at how the slope of Gender (the difference between men and women) changes across varying levels of BMI.
◦ We can use the 95% CI to see when these differences become significant.
[Figure: "Conditional Marginal Effects of 2.gender with 95% CIs". Difference in Blood Press (y-axis) against body mass index (k/m-sq) (x-axis).]
The differences in mean blood pressure between men and women become more pronounced at higher BMIs, such that women have a lower BP than men as BMI increases. These differences are statistically significant (95% CI of the difference does not include 0) past a BMI of around 35.

Two-Way Interactions
Continuous X Categorical Interaction
◦ With a categorical variable that has more than two groups
  – Same as before, just plotting the differences relative to the reference group
  – Works the same with non-linear continuous variables.
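The "difference between groups with a 95% CI" plot from the gender-by-BMI slides can be sketched by hand. The Python sketch below assumes a linear model with a Female*BMI product term, and every number in it is hypothetical (invented for illustration, not estimated from the data behind the slides). It also uses a normal critical value of 1.96, whereas regression software would normally use a t critical value.

```python
import math

# HYPOTHETICAL estimates, invented for illustration only (not from the slides).
# Model: BP = b0 + b_bmi*BMI + b_female*Female + b_fx*(Female*BMI)
B_FEMALE = 14.0      # female-minus-male difference at BMI = 0
B_FEMALE_BMI = -0.6  # change in that difference per unit of BMI

# Hypothetical sampling variances and covariance of the two estimates.
VAR_FEMALE = 36.0
VAR_FEMALE_BMI = 0.04
COV_FEMALE_FBMI = -1.1

Z_95 = 1.96  # normal critical value for a 95% interval (an approximation)


def gender_difference(bmi: float) -> tuple[float, float, float]:
    """Female-minus-male difference in predicted BP at a BMI, with 95% CI."""
    diff = B_FEMALE + B_FEMALE_BMI * bmi
    # Variance of a + x*b is Var(a) + x^2 * Var(b) + 2 * x * Cov(a, b).
    var = VAR_FEMALE + bmi ** 2 * VAR_FEMALE_BMI + 2 * bmi * COV_FEMALE_FBMI
    half_width = Z_95 * math.sqrt(var)
    return diff, diff - half_width, diff + half_width


for bmi in range(20, 61, 5):
    diff, lo, hi = gender_difference(bmi)
    flag = "significant" if lo > 0 or hi < 0 else "not significant"
    print(f"BMI {bmi}: diff = {diff:6.2f}, 95% CI ({lo:6.2f}, {hi:6.2f})  {flag}")
```

With these made-up numbers the interval first excludes 0 at a BMI of 35 on this grid. The covariance term matters: leaving it out would give intervals that are too wide at every BMI shown here.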
[Figure: "95% Confidence Intervals of the Difference in BMI between Sleep Duration Groups (Referenced to 7-8 Hours) across Age". Three panels (>=9 Hours of Sleep, 5-6 Hours of Sleep, <=4 Hours of Sleep), each plotting Difference in BMI (y-axis, -4 to 9) against Age (x-axis, 16 to 80).]

Two-Way Interactions
Continuous X Continuous Interaction
◦ Traditional Methods
  – Discretize one of the continuous variables, making it categorical, and do the usual procedures for categorical X continuous interactions.
  – Usually +1 and -1 SD (This method sucks)
  – Can miss where the interaction occurs
◦ Newer Method: Predict values at percentiles of the continuous variables
  – Generally avoid the extremes of the percentiles (<5 or >95), as the variability is greater at the extremes
◦ Newer Method: Use 3-D Graphing (Surface/Mesh Plots)
  – Same idea as predicting values at the percentiles, but using 3D modeling software

Two-Way Interactions
Continuous X Continuous Interaction: Predicted values at percentiles
[Figure: "Effect Modification of bp_sys1_bl vs bmi by cholest_bl" (bmi*cholest_bl Interaction p=0.0431). bp_sys1_bl (y-axis) against bmi (x-axis), with one line per percentile of cholest_bl: 1%, then 5, 10, 25, 50, 75, 90, 95%, then 99%.]

Two-Way Interactions
Continuous X Continuous Interaction: Which way we graph it is fairly arbitrary
[Figure: two panels. "Effect Modification of bp_sys1_bl vs bmi by cholest_bl" (bmi*cholest_bl Interaction p=0.0431): bp_sys1_bl against bmi, one line per percentile of cholest_bl. "Effect Modification of bp_sys1_bl vs cholest_bl by bmi" (cholest_bl*bmi Interaction p=0.0431): bp_sys1_bl against cholest_bl, one line per percentile of bmi. Percentiles in both panels: 1%, then 5, 10, 25, 50, 75, 90, 95%, then 99%.]
We can see that the nature of the relationship changes at
around a BMI of 30.
We can see that the nature of the relationship changes at around a cholesterol value of 3.5.
We could say that BMI has a positive association with Blood Pressure, and that this relationship is strongest among those with high cholesterol. Those with low cholesterol do not show a relationship of BMI with Blood Pressure.
We could say that Cholesterol has a positive association with Blood Pressure, and that this relationship is strongest among those with high BMI. Those with low BMI have a negative or no relationship of Cholesterol with Blood Pressure.

Two-Way Interactions
Continuous X Continuous Interaction: Another way to interpret: 4-Corners Method
[Figure: "Effect Modification of bp_sys1_bl vs bmi by cholest_bl" (bmi*cholest_bl Interaction p=0.0431), same percentile plot as before, with the four corners labeled:]
◦ Low Chol, Low BMI = 133
◦ Low Chol, High BMI = 125
◦ High Chol, Low BMI = 130
◦ High Chol, High BMI = 155
The combination of being Obese (BMI >30) and having high cholesterol results in high BP.

Two-Way Interactions
Continuous X Continuous Interaction: 3D Mesh Plots (Matlab, Sigma Plot, R)
Same data as before, same interpretation. Use 4-Corners.
[Figure: two 3D mesh plots of Blood Pressure over BMI and Cholesterol, one labeled "Observed Data" and one labeled "Marginal Estimates Data".]
Why we generally don’t use observed data…not smooth

Two-Way Interactions
Continuous X Continuous Interaction: Useful for Non-linear continuous interactions (Response Surface Model)

Three-Way Interactions
Now things get complicated.
◦ Variables W*X*Z used to predict Y.
◦ The interaction of X*Z is different at differing levels of W
◦ Or X*W is different at differing levels of Z
◦ Or Z*W is different at differing levels of X
◦ Or the relationship of X and Y is different according to the levels of W and Z, etc.
◦ Substantially easier when one of X, W, or Z is categorical

Three-Way Interactions
Substantially easier when one of X, W, or Z is categorical…so we pick a small range of values to predict one of the variables over…treating it as semi-discrete (Quartiles?)
Often Time is the third variable
◦ Interested in whether the interaction of X*Z changes over Time (W)

Three-Way Interactions
Categorical X Continuous X Continuous Interaction:
Sleep Medication (Y/N) * BMI * Pulse: Stratify on the categorical variable, Sleep Meds
[Figure: "Predictive Margins". Panels for med_sleep=0 and med_sleep=1, each plotting Apnea Index (y-axis) against body mass index (k/m-sq) (x-axis), with one line per Pulse value (55, 60, 65, 70, 75, 80, 85, 90).]
The interaction of BMI and Pulse exists only for those on Sleep Medications.

Three-Way Interactions
Another way to look at this is how the difference in Apnea between those on Sleep Medications and those not on them changes depending upon the relationship of pulse and BMI
[Figure: "Conditional Marginal Effects of 1.med_sleep". Apnea Index (y-axis) against body mass index (k/m-sq) (x-axis), with one line per Pulse value (55, 60, 65, 70, 75, 80, 85, 90).]

Three-Way Interactions
Continuous X Continuous X Continuous Interaction:
Glucose Level * BMI * Pulse: Stratify on Glucose
[Figure: "Adjusted Predictions". Panels for glucose_bl=5, 6, 7, and 8, each plotting Apnea Index (y-axis) against body mass index (k/m-sq) (x-axis), with one line per Pulse value (55, 60, 65, 70, 75, 80, 85, 90).]
Asks the question: How does the interaction of Pulse and BMI change across levels of glucose?

Three-Way Interactions
Continuous X Continuous X Continuous Interaction:
Glucose Level * BMI * Pulse: Look at how the slopes of Glucose on Apnea change.
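The slope of Glucose can be written out explicitly. As a sketch, assume a linear model containing every two-way and three-way product term, with G, B, and P standing for Glucose, BMI, and Pulse; the coefficient numbering is an illustrative assumption, not taken from the slides.

```latex
\begin{aligned}
\text{Apnea} &= \beta_0 + \beta_1 G + \beta_2 B + \beta_3 P
  + \beta_4 GB + \beta_5 GP + \beta_6 BP + \beta_7 GBP \\
\frac{\partial\,\text{Apnea}}{\partial G} &= \beta_1 + \beta_4 B + \beta_5 P + \beta_7 BP
\end{aligned}
```

Plotted against BMI, the Glucose slope is a straight line whose own slope is β4 + β7·P. If β7 were 0, the lines for different Pulse values would be parallel; a nonzero three-way coefficient is what makes them fan out.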
[Figure: "Average Marginal Effects of glucose_bl". Apnea Index (y-axis) against body mass index (k/m-sq) (x-axis), with one line per Pulse value (55, 60, 65, 70, 75, 80, 85, 90).]
Asks the question: How does the relationship of Glucose to Apnea change across levels of BMI and pulse?

Three-Way Interactions
What if we have time as our third variable?
Same techniques, but perhaps in the future we won’t be limited to just static graphs.
[Figure: Interaction of BMI and Pulse on Apnea Score across Time]

Presenting Data in Motion
Even better, let’s do some of this:
◦ http://www.ted.com/talks/hans_rosling_reveals_new_insights_on_poverty.html

Four-Way Interactions and Beyond
Understanding anything much more complex than a 3-way interaction is difficult without a good way to break down variables into categories
Classification Techniques/Machine Learning/Exploratory Data Mining
◦ Can take high-dimensional data and find homogeneous groups based upon relationships of continuous/categorical variables.

Four-Way Interactions and Beyond
CART Model: 4-Way Interaction of continuous variables on Apnea Severity
[Figure: CART tree with branches labeled "Smaller Structure" and "Larger Structure". Split variables and cut points: Lateral Walls 0.644, Soft Palate -1.845, Genioglossus -1.123, Mandibular Width -0.250. Node values: 50.9 ± 21.4, 19.0 ± 12.3, 42.2 ± 17.9, 41.2 ± 19.1, 27.8 ± 13.8.]

Take Home Points
Test for interactions at the beginning of model building
◦ Because they are interesting
◦ Because they obscure your main effects
Interactions give us clues about underlying etiology (David Schwartz). It is not enough to detect them; we have to understand why the interaction exists.
◦ We must search for the variable(s) that make interactions go away (mediated moderation)
Modern classification/data mining methods are great at detecting high-dimensional (numerous variables), nonlinear interactions
Stata versions 12 and 13 are amazing at doing these types of plots (margins plots). Also, check out “Interpreting and Visualizing Regression Models Using Stata” by Michael Mitchell