Analysis of Interaction Effects
James Jaccard, New York University

Overview
This talk covers the basics of interaction analysis, highlighting strategies based on multiple regression. It then discusses advanced issues and complications in interaction analysis; that treatment is somewhat superficial but, I hope, informative.

Conceptual Foundations of Interaction Analysis

Causal Theories
Most (but not all) theories rely heavily on the concept of causality: we seek to identify the determinants of a behavior or mental state, the consequences of a behavior or of an environmental or mental state, or both. I am going to ground interaction analysis in a causal framework.
Causal theories can be complicated, but at their core they are built from five types of causal relationships.

Direct Causal Relationships
A direct causal relationship is when a variable, X, has a direct causal influence on another variable, Y: X → Y.
Examples: Frustration → (+) Aggression; Quality of Relationship with Mother → (-) Adolescent Drug Use.

Indirect Causal Relationships
An indirect causal relationship is when a variable, X, has a causal influence on another variable, Y, through an intermediary variable, M: X → M → Y.
Example: Quality of Relationship with Mother → Adolescent School Work Ethic → Adolescent Drug Use.

Spurious Relationships
A spurious relationship is one where two variables that are not causally related share a common cause, C: X ← C → Y.

Bidirectional Causal Relationships
A bidirectional causal relationship is when a variable, X, has a causal influence on another variable, Y, and Y has a “simultaneous” impact on X: X ⇄ Y.
Example: Quality of Relationship with Mother ⇄ Adolescent Drug Use.

Moderated Causal Relationships
A moderated causal relationship is when the impact
of a variable, X, on another variable, Y, differs depending on the value of a third variable, Z (Z moderates X → Y).
Examples: Gender moderates the effect of Treatment vs. No Treatment on Depression; Quality of Parent-Adolescent Relationship moderates the effect of Exposure to Negative Peers on Drug Use.
The variable that “moderates” the relationship (Z) is called a moderator variable.

Causal Theories (continued)
We put all these ideas together to build complex theories of phenomena. One example:
[Path diagram linking Quality of Relationship with Mother, Gender, Time Mother Spends with Child, Adolescent School Work Ethic, and Adolescent Drug Use]

Interaction Analysis
Interactions, when translated into causal analysis, focus on moderated relationships. When I encounter an interaction effect, I think: Z moderates X → Y.
A key step in interaction analysis is to identify the focal independent variable and the moderator variable. Sometimes this is obvious, as with an analysis of the effect of a treatment for depression on depression, moderated by gender (focal variable: Treatment vs. Control; moderator: Gender; outcome: Depression).
Sometimes it is not obvious, as with an analysis of the effects of gender and ethnicity on the amount of time an adolescent spends with his or her mother (Gender, Ethnicity → Time Spent).
Statistically, it does not matter which variable takes which role. Conceptually, it does.
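Why the roles are statistically interchangeable can be seen by rearranging the product-term model used throughout this talk, written here with a generic X and Z. The same coefficient $b_3$ appears in both rearrangements: it changes the slope of X by $b_3$ per unit of Z and, equally, the slope of Z by $b_3$ per unit of X. A short derivation:

```latex
\begin{aligned}
Y &= a + b_1 X + b_2 Z + b_3 XZ + e \\
  &= (a + b_2 Z) + (b_1 + b_3 Z)\,X + e \qquad \text{(slope of } X \text{ depends on } Z\text{)} \\
  &= (a + b_1 X) + (b_2 + b_3 X)\,Z + e \qquad \text{(slope of } Z \text{ depends on } X\text{)}
\end{aligned}
```

Which variable is treated as focal is therefore a conceptual choice about how to read $b_3$, not something the estimates decide.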
The Statistical Analysis of Interactions

Some Common Practices
Omnibus tests: I do not use these.
Hierarchical regression: I use it sparingly.
Focus on unstandardized coefficients: we tend to stay away from standardized coefficients in interaction analysis because they can be misleading and they do not have “clean” mathematical properties.

A “Trick” We Will Use: Linear Transformations
$Y = a + b_1 X + e$
$\text{Satisfaction} = a + b_1\,\text{Grade} + e$
$\text{Satisfaction} = 12 - 0.50\,\text{Grade} + e$
$\text{Satisfaction} = 9 - 0.50\,(\text{Grade} - 6) + e$
Subtracting a constant from a predictor leaves its slope unchanged and moves the intercept to the predicted value of Y when the predictor equals that constant: 12 - 0.50 × 6 = 9, the predicted satisfaction for sixth graders. “Mean centering” is the special case in which the constant subtracted is the mean.

Interaction Analysis
We will focus on four cases (IV = focal independent variable; MV = moderator variable):
Categorical IV and categorical MV
Continuous IV and categorical MV
Categorical IV and continuous MV
Continuous IV and continuous MV
I assume you know the basics of multiple regression and of dummy variables in multiple regression.

Categorical IV and Categorical MV
Y = Relationship satisfaction (0 to 10)
X = Gender (female = 1, male = 0)
Z = Grade (6th = 1, 7th = 0)

Mean satisfaction:
            6th    7th
  Female    8.0    7.0
  Male      7.0    4.0

Three questions: Is there a gender difference for 6th graders? Is there a gender difference for 7th graders? Are these gender effects different?
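Because this design is saturated (four cells, four coefficients), the dummy-coded product-term model reproduces the cell means exactly, so its coefficients can be read straight off the table. A minimal sketch, assuming the coding on this slide (female = 1, male = 0) and letting the caller choose which grade is scored 1; `dummy_coefficients` and `predict` are hypothetical helpers written for illustration, and standard errors would still come from fitting the regression to the raw data:

```python
# Cell means from the 2 x 2 table (relationship satisfaction, 0 to 10).
MEANS = {
    ("female", "6th"): 8.0, ("female", "7th"): 7.0,
    ("male", "6th"): 7.0, ("male", "7th"): 4.0,
}


def dummy_coefficients(means, grade_scored_1):
    """Return (a, b1, b2, b3) for Y = a + b1*Gender + b2*Grade + b3*Gender*Grade.

    Gender is coded female = 1, male = 0; `grade_scored_1` names the grade
    coded 1, so the other grade is the reference group (coded 0).
    """
    ref_grade = "7th" if grade_scored_1 == "6th" else "6th"
    a = means[("male", ref_grade)]               # both dummies equal 0
    b1 = means[("female", ref_grade)] - a        # gender effect in the reference grade
    b2 = means[("male", grade_scored_1)] - a     # grade effect for males
    other_gender_effect = means[("female", grade_scored_1)] - means[("male", grade_scored_1)]
    b3 = other_gender_effect - b1                # interaction contrast
    return a, b1, b2, b3


def predict(coefs, gender, grade):
    """Model prediction for 0/1 codes of gender and grade."""
    a, b1, b2, b3 = coefs
    return a + b1 * gender + b2 * grade + b3 * gender * grade


original = dummy_coefficients(MEANS, grade_scored_1="6th")  # 6th = 1, 7th = 0
flipped = dummy_coefficients(MEANS, grade_scored_1="7th")   # 7th = 1, 6th = 0
print("6th = 1:", original)  # 6th = 1: (4.0, 3.0, 3.0, -2.0)
print("7th = 1:", flipped)   # 7th = 1: (7.0, 1.0, -3.0, 2.0)

# The saturated model reproduces every cell mean under either coding.
for (sex, grade), mean in MEANS.items():
    g = 1 if sex == "female" else 0
    assert predict(original, g, 1 if grade == "6th" else 0) == mean
    assert predict(flipped, g, 1 if grade == "7th" else 0) == mean
```

Flipping which grade is scored 1 does not change the fit; it only changes which simple effect the Gender coefficient reports (3.0 for 7th graders versus 1.0 for 6th graders), and the product coefficient changes sign.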
Categorical IV and Categorical MV (continued)
Gender effect for 6th grade: 8 - 7 = 1
Gender effect for 7th grade: 7 - 4 = 3
Interaction contrast: (8 - 7) - (7 - 4) = -2
$Y = a + b_1\,\text{Gender} + b_2\,\text{Grade} + b_3\,(\text{Gender})(\text{Grade})$
$Y = 4.0 + 3.0\,\text{Gender} + b_2\,\text{Grade} - 2.0\,(\text{Gender})(\text{Grade})$
With Grade coded 6th = 1, 7th = 0, the intercept is the mean for 7th-grade males, the Gender coefficient is the gender effect for 7th graders, and the product coefficient is the interaction contrast.
Flipped (Grade recoded 7th = 1, 6th = 0): $Y = 7.0 + 1.0\,\text{Gender} + b_2\,\text{Grade} + 2.0\,(\text{Gender})(\text{Grade})$; the Gender coefficient is now the gender effect for 6th graders.
Further topics: extending to more than two groups (e.g., adding 8th grade), including covariates, and generating means and tables.

Continuous IV and Categorical MV
Y = Relationship satisfaction (0 to 10)
X = Time spent together (in hours)
Z = Gender (female = 1, male = 0)
Three questions: What is the effect of time for females? (b = 0.33) What is it for males? (b = 0.20) Are the effects different? (0.33 - 0.20 = 0.13)
$Y = a + b_1\,\text{Gender} + 0.20\,\text{Time} + 0.13\,(\text{Gender})(\text{Time})$
Flipped (Gender recoded male = 1, female = 0; a and $b_1$ take different values): $Y = a + b_1\,\text{Gender} + 0.33\,\text{Time} - 0.13\,(\text{Gender})(\text{Time})$
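The flipped-reference-group strategy can be checked by fitting the same product-term model twice, once per coding of gender. A minimal sketch in plain Python, assuming hypothetical noise-free data built from the slopes on this slide (0.33 for females, 0.20 for males) and invented intercepts of 4.5 and 5.0; `fit_ols` and `design` are illustrative helpers, not library functions. With real data you would use regression software, which also supplies the standard errors and the test of the product term:

```python
# Hypothetical group parameters: slopes from the slide, intercepts invented.
SLOPE = {"female": 0.33, "male": 0.20}
INTERCEPT = {"female": 4.5, "male": 5.0}
HOURS = (2, 4, 6, 8, 10)


def fit_ols(rows, y):
    """Ordinary least squares via the normal equations (X'X)b = X'y,
    solved with Gauss-Jordan elimination and partial pivoting."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    aug = [xtx[i] + [xty[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(aug[row][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for row in range(k):
            if row != col:
                factor = aug[row][col]
                aug[row] = [v - factor * p for v, p in zip(aug[row], aug[col])]
    return [aug[i][k] for i in range(k)]


def design(reference):
    """Rows [1, Gender, Time, Gender*Time], with the `reference` group coded 0."""
    rows, y = [], []
    for sex in ("female", "male"):
        g = 0 if sex == reference else 1
        for t in HOURS:
            rows.append([1.0, g, t, g * t])
            y.append(INTERCEPT[sex] + SLOPE[sex] * t)
    return rows, y


standard = fit_ols(*design(reference="male"))   # female = 1, male = 0
flipped = fit_ols(*design(reference="female"))  # male = 1, female = 0
# Time coefficient = slope in the reference group; product term = slope difference.
print([round(b, 4) for b in standard])  # [5.0, -0.5, 0.2, 0.13]
print([round(b, 4) for b in flipped])   # [4.5, 0.5, 0.33, -0.13]
```

Both fits come from one pooled model, so each group's slope is obtained as a main-effect coefficient without splitting the sample.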
Continuous IV and Categorical MV (continued)
Do not estimate the slopes separately in each group; use the flipped reference group strategy.
Extending to more than two groups (using grade as an example).

Categorical IV and Continuous MV
Study conducted in Miami with bilingual Latinos.
Ad language: half were shown the ad in Spanish (0) and half in English (1).
Latino identity: 1 = not at all, 7 = strong identity.
Outcome: attitude toward the product (1 = unfavorable, 7 = favorable).
Hypothesized moderated relationship: Latino identity moderates the effect of ad language on attitude toward the product.

Common Analysis Form: Median Split
Many researchers are not sure how to analyze this design, so they use a median split on the continuous moderator variable and conduct an ANOVA. Why this is bad practice: dichotomizing a continuous moderator discards information about differences within each half, lowers statistical power, and can produce misleading or even spurious effects.
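A small sketch of the information a median split throws away, using the hypothesized English-minus-Spanish differences in the table that follows (identity 1 to 7). It assumes, hypothetically, an equal number of respondents at each identity level and places scores equal to the median in the low group; both are illustrative choices, not features of the study:

```python
import statistics

# Hypothesized mean difference (English - Spanish) at each Latino identity level.
HYPOTHESIZED_DIFF = {1: 1.50, 2: 1.00, 3: 0.50, 4: 0.00, 5: -0.50, 6: -1.00, 7: -1.50}
N_PER_LEVEL = 10  # hypothetical: equal group sizes at every level


def median_split(diff_by_level, n_per_level):
    """Median cut point plus the average difference in the low and high halves.

    With equal n per level, averaging over levels equals averaging over people.
    """
    scores = [lvl for lvl in diff_by_level for _ in range(n_per_level)]
    cut = statistics.median(scores)
    low = [d for lvl, d in diff_by_level.items() if lvl <= cut]
    high = [d for lvl, d in diff_by_level.items() if lvl > cut]
    return cut, statistics.mean(low), statistics.mean(high), low, high


cut, low_mean, high_mean, low, high = median_split(HYPOTHESIZED_DIFF, N_PER_LEVEL)
print(cut)                  # 4.0
print(low_mean, high_mean)  # 0.75 -1.0
# The "low" half lumps identity 1 (difference 1.50) with identity 4 (0.00).
print(max(low) - min(low))  # 1.5
```

The two-group ANOVA would summarize a difference that runs from 1.50 down to 0.00 within the low half by a single value of 0.75.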
Categorical IV and Continuous MV (continued)
Hypothesized pattern:
  Identity                         1      2      3      4      5      6      7
  Mean English - Mean Spanish   1.50   1.00   0.50   0.00  -0.50  -1.00  -1.50

$Y = a + b_1\,\text{Ad language} + b_2\,\text{Identity} + b_3\,(\text{Ad language})(\text{Identity})$
To make the intercept meaningful, 1 was subtracted from the Latino identity measure, so it ranges from 0 to 6.
Mean attitude for the Spanish ad at Latino identity = 1 is 3.215 (the intercept).
Mean difference (English - Spanish) at Latino identity = 1 is 1.707 (p < 0.05) (the ad language coefficient).
Mean attitude for the English ad at Latino identity = 1 is 4.922 (3.215 + 1.707).
Applying the same linear-transformation trick at each identity value fills in the table (* p < 0.05):
  Identity           1        2        3        4        5        6        7
  Mean English   4.922    4.915    4.908    4.901    4.895    4.888    4.882
  Mean Spanish   3.215    3.662    4.108    4.555    5.002    5.449    5.896
  Difference    1.707*   1.253*   0.800*   0.346*  -0.107   -0.561*  -1.014*
(Common practice instead reports effects at the mean of the moderator and one SD above and below it; here, Mean = 3, SD = 1.2. An R program was shown at this point.)

Continuous IV and Continuous MV
Y: Child anxiety (0 to 20)
X: Parent anxiety (0 to 20)
Z: Parenting behavior: Control (0 to 20)
  Control              7     8     9    10    11    12    13
  b for Y onto X     .10   .20   .30   .40   .50   .60   .70
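Both examples rest on the same fact: in a product-term model, the coefficient of the focal variable is its effect where the (possibly recentered) moderator equals 0, so its effect at any moderator value c is b_focal + b_product × (c - zero point). A minimal sketch, assuming the identity model's product coefficient, which the slides do not report, can be recovered from the table as (-1.014 - 1.707) / 6 = -0.4535, and assuming Mean = 3 and SD = 1.2 are on the original 1 to 7 identity scale; `simple_effect` is a hypothetical helper that gives point estimates only, not the standard errors behind the asterisks:

```python
def simple_effect(b_focal, b_product, moderator_value, zero_point=0.0):
    """Effect of the focal variable when the moderator equals `moderator_value`.

    Assumes Y = a + b_focal*X + b_mod*(Z - zero_point) + b_product*X*(Z - zero_point),
    so b_focal is the focal effect at Z = zero_point.
    """
    return b_focal + b_product * (moderator_value - zero_point)


# Categorical IV (ad language) x continuous MV (Latino identity, recoded so 1 -> 0).
B_AD = 1.707                     # English - Spanish difference at identity = 1
B_AD_X_ID = (-1.014 - B_AD) / 6  # assumed: recovered from the table's end points
table = [1.707, 1.253, 0.800, 0.346, -0.107, -0.561, -1.014]
for identity, reported in zip(range(1, 8), table):
    est = simple_effect(B_AD, B_AD_X_ID, identity, zero_point=1)
    # Agrees with the reported difference to within rounding (0.001).
    print(identity, f"{est:.4f}", reported)

# "Common practice": effects at the mean and +/- 1 SD of identity (Mean = 3, SD = 1.2).
pick_a_point = {v: round(simple_effect(B_AD, B_AD_X_ID, v, zero_point=1), 4)
                for v in (1.8, 3.0, 4.2)}
print(pick_a_point)  # {1.8: 1.3442, 3.0: 0.8, 4.2: 0.2558}

# Continuous IV (parent anxiety) x continuous MV (control): slope .10 at Control = 7,
# rising .10 per unit of control.
for control in range(7, 14):
    print(control, round(simple_effect(0.10, 0.10, control, zero_point=7), 2))
# Parent-anxiety coefficient if Control were left uncentered (slope at Control = 0):
print(round(simple_effect(0.10, 0.10, 0, zero_point=7), 2))  # -0.6
```

The last line is the linear-transformation trick run in reverse: moving the zero point of the moderator changes the focal coefficient, not the model.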
Continuous IV and Continuous MV (continued)
With Control recentered so that 0 corresponds to a score of 7 (Control - 7), and PA = parent anxiety:
$Y = a + b_1\,(\text{Control} - 7) + 0.10\,\text{PA} + 0.10\,(\text{Control} - 7)(\text{PA})$
The PA coefficient (0.10) is the slope at Control = 7, and each one-unit increase in Control raises that slope by 0.10. With Control left uncentered, the PA coefficient would instead be 0.10 - 0.10 × 7 = -0.60.
Topics: common practice (effects at selected values of the moderator) versus regions of significance (the Johnson-Neyman technique); why we include the component parts (the lower-order terms) in the model.

Advanced Topics

Three-Way Interactions
Identify the focal independent variable, the first-order moderator variable, and the second-order moderator variable. Here the focal independent variable is Gender, the first-order moderator is Grade, the second-order moderator is Ethnicity, and the outcome is Satisfaction.

European American (Ethnic = 1)
                 Grade 7 (1)   Grade 8 (0)
  Female (1)         6.0           6.0
  Male (0)           5.0           4.0
Interaction contrast: IC = (6 - 5) - (6 - 4) = -1

Latinos (Ethnic = 0)
                 Grade 7 (1)   Grade 8 (0)
  Female (1)         6.0           6.0
  Male (0)           6.0           6.0
Interaction contrast: IC = (6 - 6) - (6 - 6) = 0

Three-way contrast: TW = [(6 - 5) - (6 - 4)] - [(6 - 6) - (6 - 6)] = -1
$Y = 6.0 + 0\,\text{Gender} + b_2\,\text{Grade} + b_3\,\text{Ethnic} + 0\,(\text{Gender})(\text{Grade}) + b_5\,(\text{Gender})(\text{Ethnic}) + b_6\,(\text{Grade})(\text{Ethnic}) - 1\,(\text{Gender})(\text{Grade})(\text{Ethnic})$

Modeling Non-Linear Interactions
Start with $Y = \alpha + \beta_1 X + \beta_2 Z + \varepsilon$ and let the effect of X be a quadratic function of Z: $\beta_1 = \alpha' + \beta_3 Z + \beta_4 Z^2$.
Substitute the right-hand side for $\beta_1$: $Y = \alpha + (\alpha' + \beta_3 Z + \beta_4 Z^2)X + \beta_2 Z + \varepsilon$
Expand: $Y = \alpha + \alpha' X + \beta_3 XZ + \beta_4 XZ^2 + \beta_2 Z + \varepsilon$
Rearrange terms: $Y = \alpha + \alpha' X + \beta_2 Z + \beta_3 XZ + \beta_4 XZ^2 + \varepsilon$
Relabel ($\alpha'$ as $\beta_1$) and you have your model: $Y = \alpha + \beta_1 X + \beta_2 Z + \beta_3 XZ + \beta_4 XZ^2 + \varepsilon$
Use the centering strategy to isolate the effect of X on Y ($\beta_1$) at any given value of Z; also consider modeling the intercept as a function of Z.

Exploratory Interaction Analysis
Uses a program in R.
Y = Tenured or not (using MLPM)
X = Number of articles published
Z = Number of years since hired
$Y = \alpha + \beta_1 X + \varepsilon$
X coefficient at each M (moderator) value:
  M value     N     X slope
     1       478     .000
     2       475     .002
     3       457     .007
     4       408     .007
     5       330     .009
     6       246     .008
     7       166     .005
     8       115     .009
     9        74     .011
    10        48     .001

Regression Mixture Modeling
$BI = \alpha + \beta_1\,\text{Aact} + \beta_2\,\text{PN} + \beta_3\,\text{PBC} + \varepsilon$
(BI = behavioral intention; Aact = attitude toward the act; PN = perceived norms; PBC = perceived behavioral control.)
When we regress Y onto a set of predictors, we assume that people are drawn from a single population with common linear coefficients. In reality, we are probably mixing heterogeneous population segments whose coefficients differ. With such “mixed” populations, the overall regression characterizes none of the segments very well, which can lead to suboptimal inferences and intervention strategies.
[Figure: another example of aggregation bias]
[Path diagram: a latent class variable moderates the effects of Aact, SN, and PBC on Intention]

Mixture Model for Heavy Episodic Drinking
A four-class model fits the data best (entries are linear coefficients; SN = subjective norms, DN = descriptive norms):
                      Aact     SN     DN    PBC
  Segment 1 (42%)      .33    .02    .01   -.01
  Segment 2 (17%)      .10    .29    .30    .01
  Segment 3 (21%)      .30    .29    .05    .04
  Segment 4 (20%)      .48    .09    .25   -.03

Interaction Analysis and Establishing Generalizability

Generalizability
It is common for people to conclude that an effect “generalizes” in the absence of a statistically significant
interaction effect (example: an RCT of an obesity treatment, with gender as the potential moderator). The problem is that we can never accept the null hypothesis of a zero interaction contrast.
Solution: adopt the framework of equivalence testing.
Step 1: Specify a threshold value that will be used to define functional equivalence.
Step 2: Specify the range of functional equivalence.
Step 3: Calculate the 95% CI for the interaction contrast.
Step 4: Determine whether the CI lies completely within the range of functional equivalence.

Measurement Error
It is well known that measurement error can bias parameter estimates in multiple regression. This is especially true for interaction analysis.
One general approach to dealing with measurement error is latent variable modeling.
[Path diagrams: a single indicator D1 (error e1) of Depression, versus three indicators D1 to D3 (errors e1 to e3) of a latent Depression factor]

Latent Variable Regression
[Path diagram: latent X (indicators X1 to X3), latent Z (indicators Z1 to Z3), and a latent product term (product indicators X1Z1, X2Z2, X3Z3) predicting latent Y (indicators Y1, Y2), with disturbance d3]
There are about half a dozen approaches to modeling latent variable interactions (e.g., quasi-maximum likelihood; Bayesian). I recommend the approach developed by Herbert Marsh as a good balance between utility and complexity, coupled with Huber-White sandwich estimators for robustness.
Latent variable regression using multiple group analysis.

Multi-Group Modeling in SEM
[Path diagram: latent X (indicators X1 to X3) and latent Z (indicators Z1 to Z3) predicting latent Y (indicators Y1, Y2), with disturbance d3]

Assumption Violations
If the assumptions of normality or variance homogeneity are suspect, use approaches with robust standard errors: bootstrapping or Huber-White sandwich estimators.
Be careful with outlier-resistant robust methods.
See Rand Wilcox's work with smoothers.

Thank God It Has Ended!