Creating variables and specifying models to test for interactions involving continuous independent variables
Jane E. Miller, PhD
The Chicago Guide to Writing about Multivariate Analysis, 2nd edition.

Overview
• Creating variables for an interaction between one categorical and one continuous variable.
• Aside on missing values.
• Specifying a model to test an interaction between one categorical and one continuous variable.
• Creating variables for an interaction between two continuous independent variables.
• Specifying a model to test an interaction between two continuous independent variables.

Interaction between a continuous and a categorical independent variable (IV)
• Example: Race and income-to-poverty ratio (IPR).
  – Race is a 2-category IV classified as
    • non-Hispanic black (NHB),
    • non-Hispanic white (NHW).
  – IPR is a continuous variable calculated as annual family income (in $) divided by the Federal Poverty Level for a family of that size and age composition.
    • IPR ranges from 0 to more than 10 in this sample.
    • The Federal Poverty Level for a family of 2 adults and 2 children in 2010 was about $22,000.

Coding of variables
• The NHB main effect variable is defined as in the previous example (of a categorical-by-categorical interaction).
  – 1 = non-Hispanic black.
  – 0 = all others, the reference category; in this example, non-Hispanic white.
• However, for a continuous variable like income that takes on many possible numeric values, it doesn’t make sense to create a lot of dummy variables.
• Instead, use the income-to-poverty ratio in its continuous form.
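The two main effects variables can be sketched in a few lines of Python. This is only an illustration: the record layout, the function names, the made-up incomes, and the use of a single flat $22,000 threshold are assumptions, and in practice the poverty threshold depends on family size and age composition.

```python
# Assumed threshold: roughly $22,000 for a family of 2 adults and
# 2 children in 2010. Real thresholds vary by family size and composition.
FPL_2A2C_2010 = 22_000.0


def nhb_dummy(race: str) -> int:
    """Return 1 for non-Hispanic black, 0 for the reference category (NHW)."""
    return 1 if race == "NHB" else 0


def income_to_poverty_ratio(family_income: float, poverty_level: float) -> float:
    """Annual family income divided by the applicable Federal Poverty Level."""
    return family_income / poverty_level


# Illustrative cases (hypothetical incomes chosen to give round IPR values).
cases = [
    {"race": "NHW", "income": 11_000.0},
    {"race": "NHB", "income": 44_000.0},
]

for case in cases:
    case["NHB"] = nhb_dummy(case["race"])
    case["IPR"] = income_to_poverty_ratio(case["income"], FPL_2A2C_2010)
    print(case["race"], case["NHB"], case["IPR"])
```

Race becomes a single 0/1 dummy, while IPR stays continuous, so no additional dummy variables are needed for income.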
Calculating an interaction term from a dummy and a continuous main effects term
• The value of the interaction term variable is defined as the product of the two component main effects variables: X1_X2 = X1 × X2
  – The result is one continuous interaction term variable.
• Thus NHB_IPR is the product of NHB and IPR.
  – If NHB = 1 and IPR = 2.3, then the interaction term NHB_IPR = 2.3.
  – If NHB = 0 and IPR = 2.3, then NHB_IPR = 0.

Coding of main effects and interaction term variables: race and IPR

Case characteristics         Main effects terms    Interaction term
(selected values)            NHB      IPR          NHB_IPR
Non-H white & IPR = 0.5      0        0.5          0
Non-H white & IPR = 1.0      0        1.0          0
Non-H white & IPR = 2.0      0        2.0          0
Non-H white & IPR = 5.0      0        5.0          0
Non-H black & IPR = 0.5      1        0.5          0.5
Non-H black & IPR = 1.0      1        1.0          1.0
Non-H black & IPR = 2.0      1        2.0          2.0
Non-H black & IPR = 5.0      1        5.0          5.0

E.g., IPR = 0.5 means income is half the Federal Poverty Level (FPL); IPR = 2.0 means income is twice the FPL.
Coding shown is for a two-category race variable (non-Hispanic white = reference category).

Coding of race and IPR variables: Non-Hispanic white infants

Case characteristics         Main effects terms    Interaction term
                             NHB      IPR          NHB_IPR
Non-H white & IPR = 0.5      0        0.5          0
Non-H white & IPR = 1.0      0        1.0          0
Non-H white & IPR = 2.0      0        2.0          0
Non-H white & IPR = 5.0      0        5.0          0
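The product rule above can be checked against the selected values in the coding tables. The following is a small Python sketch; the function name `interaction_term` and the tuple layout are illustrative choices, not part of the original slides.

```python
def interaction_term(x1: float, x2: float) -> float:
    """Interaction term variable: product of the two main effects variables."""
    return x1 * x2


ipr_values = [0.5, 1.0, 2.0, 5.0]

# Build (NHB, IPR, NHB_IPR) rows, first for non-Hispanic white cases
# (NHB = 0), then for non-Hispanic black cases (NHB = 1).
rows = [
    (nhb, ipr, interaction_term(nhb, ipr))
    for nhb in (0, 1)
    for ipr in ipr_values
]

for nhb, ipr, nhb_ipr in rows:
    print(f"NHB={nhb}  IPR={ipr:.1f}  NHB_IPR={nhb_ipr:.1f}")
```

For the reference category the interaction term is always 0, whereas for non-Hispanic black cases it equals IPR.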
Coding of race and IPR variables: Non-Hispanic black infants

Case characteristics         Main effects terms    Interaction term
                             NHB      IPR          NHB_IPR
Non-H black & IPR = 0.5      1        0.5          0.5
Non-H black & IPR = 1.0      1        1.0          1.0
Non-H black & IPR = 2.0      1        2.0          2.0
Non-H black & IPR = 5.0      1        5.0          5.0

E.g., IPR = 0.5 means income is half the Federal Poverty Level (FPL); IPR = 2.0 means income is twice the FPL.
Coding shown is for a two-category race variable (non-Hispanic white = reference category).

Aside: Missing values
• For each new variable created, the new variable should take on a missing value if the original source variable was missing for a given case.
• This must be specified as an extra step for IF/THEN logic such as that used in creating the dummies.
  – E.g., IF RACE = . THEN NHB = .;
  – In the statistical package SAS, “.” is the code for missing.
• For variables created using arithmetic, if any component source variable is missing, the result of the calculation will also be missing.
  – E.g., if IPR = ., then NHB_IPR will also be missing.
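The same two rules can be sketched in Python, here using NaN as a stand-in for the SAS missing code. This is an assumption-laden illustration, not the SAS behavior itself: the names `MISSING`, `is_missing`, `nhb_dummy`, and `nhb_ipr` are hypothetical, and the sketch relies on the fact that floating-point NaN propagates through multiplication (a Python `None` would raise an error instead).

```python
import math

MISSING = math.nan  # plays the role of SAS's "." in this sketch


def is_missing(value) -> bool:
    """True if value is None or a floating-point NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def nhb_dummy(race):
    """Dummy coding with an explicit missing-value step."""
    if is_missing(race):
        return MISSING  # analogous to: IF RACE = . THEN NHB = .;
    return 1.0 if race == "NHB" else 0.0


def nhb_ipr(nhb: float, ipr: float) -> float:
    """Product term; NaN in either component propagates to the result."""
    return nhb * ipr


print(nhb_dummy(None))                  # missing race -> missing dummy
print(nhb_ipr(1.0, MISSING))            # missing IPR -> missing interaction
print(nhb_ipr(nhb_dummy("NHB"), 2.3))   # complete case
```

Without the explicit check in `nhb_dummy`, a case with missing race would silently be coded 0 and treated as part of the reference category, which is the error the extra IF/THEN step guards against.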
Model specification to test an interaction between one continuous and one categorical independent variable
• For a model with an interaction between two independent variables, include ALL of the main effects and interaction term variables related to those two independent variables.
• E.g., for a model of birth weight by race and IPR, include the main effect and interaction terms related to race and family income-to-poverty ratio:
  – BW = f (NHB, IPR, NHB_IPR)

Independent variables: continuous by continuous interaction
• Mother’s age at time of child’s birth, in years
  – One continuous variable for the main effect: age
• Family income-to-poverty ratio, in multiples of the Federal Poverty Level
  – One continuous variable for the main effect: IPR
• Interaction: Mother’s age and IPR
  – Age_IPR = age × IPR
  – The resulting interaction term variable will also be continuous.

Coding of main effects and interaction term variables: age and IPR

Case characteristics         Main effects terms    Interaction term
(selected values)            Age      IPR          Age_IPR
Age = 15 & IPR = 0.5         15       0.5           7.5
Age = 15 & IPR = 1.0         15       1.0          15.0
Age = 15 & IPR = 2.0         15       2.0          30.0
Age = 25 & IPR = 0.5         25       0.5          12.5
Age = 25 & IPR = 1.0         25       1.0          25.0
Age = 25 & IPR = 2.0         25       2.0          50.0
Age = 30 & IPR = 0.5         30       0.5          15.0
Age = 30 & IPR = 1.0         30       1.0          30.0
Age = 30 & IPR = 2.0         30       2.0          60.0

E.g., IPR = 0.5 means income is half the Federal Poverty Level (FPL); IPR = 2.0 means income is twice the FPL.

Model specification to test an interaction between two continuous variables
• For a model with an interaction between two independent variables, include ALL of the main effects and interaction term variables related to those two independent variables.
• E.g., for a model of birth weight with an interaction between age and IPR, include the main effect and interaction terms related to family income-to-poverty ratio and mother’s age:
  – BW = f (IPR, mother’s age, age_IPR)

Centered continuous independent variables
• The example shown here used uncentered continuous variables:
  – Age in years
  – Income-to-poverty ratio (IPR)
• Could use centered versions in the interactions.
  – Create a version of the variable centered at its mean.
    • E.g., CENTAGE = AGE – mean age
  – Where mean age is the sample mean.
  – Then use CENTAGE as the main effect term
  – and CENTAGE_IPR as the interaction term.

Be parsimonious in deciding which interactions to test
• As shown here, the number of variables in the regression model proliferates rapidly with each additional interaction.
• Specify interactions only between key independent variables.
• Communicating results becomes unwieldy:
  – Considerable behind-the-scenes calculations.
  – Extra tables or charts to convey the shape of the interaction.

Summary
• A model specification to test for interactions includes both main effects and interaction terms.
  – The combination of those terms in the model uniquely identifies each possible combination of values of the component variables.
• The number and type of interaction terms needed depend on
  – Type(s) of variables in the interaction.
  – Number of categories, for categorical variables in the interaction.
• For most situations, test interactions among key variables only.
For criteria to help you decide which interactions to test for your topic and data, see the podcast on visualizing shapes of interaction patterns.

Suggested resources
• Miller, J. E., 2013.
The Chicago Guide to Writing about Multivariate Analysis, 2nd Edition.
  – Chapter 16, on interactions
  – Chapter 9, on defining dummy variables
  – Chapter 8, on choice of reference category
• Chapters 8 and 9 of Cohen et al. 2003. Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences, 3rd Edition. Florence, KY: Routledge.

Suggested online resources
• Podcasts on
  – Introduction to interactions
  – Visualizing shapes of interaction patterns between categorical by continuous variables
  – Choosing a reference category

Suggested practice exercises
• Study guide to The Chicago Guide to Writing about Multivariate Analysis, 2nd Edition.
  – Suggested course extensions for Chapter 16
    • “Reviewing” exercises #2, 3, and 4.
    • “Applying statistics and writing” exercises #1, 2, and 3.
    • “Revising” exercises #1 and 3.

Contact information
Jane E. Miller, PhD
jmiller@ifh.rutgers.edu
Online materials available at http://press.uchicago.edu/books/miller/multivariate/index.html