Using alternative reference categories to test statistical significance of an interaction
Jane E. Miller, PhD
The Chicago Guide to Writing about Multivariate Analysis, 2nd Edition.

Overview
• Reasons for estimating models with alternate reference categories
• Review: confidence intervals from simple slopes calculations for compound coefficients
• Contrasts that can be assessed for statistical significance
  – With “base” specification
  – With a different reference category
• Suggestions for presenting results of alternate specifications

Substantive question behind the interaction model: “Does race modify the association between education and birth weight?”
[Chart: Predicted birth weight (grams) by mother’s education (<HS, =HS, >HS), with clustered bars for Non-Hispanic white, Mexican-American, and Non-Hispanic black]
• Two questions involved in the interaction:
  – Are differences across racial/ethnic groups within each education level statistically significant?
    • Within cluster, across bar colors
  – Are differences across education levels within each racial/ethnic group statistically significant?
    • Within bar color, across clusters

Why change the reference category?
• Standard errors calculated using the simple slopes technique allow formal inferential statistical testing of each of the other groups
  – Only against the reference category
  – Not against one another
• Comparisons among the other groups are also important for characterizing how the two independent variables involved in an interaction relate to the dependent variable
  – E.g., how race and educational attainment together relate to birth weight

Model Specification A with interactions: race and education
• BW = f (race, education, race_education)
  – Birth weight is a function of race, education, and the race-by-education interaction
  – Also includes controls for other demographic factors
• To specify the model, need ALL of the main effects and interaction term variables related to race and mother’s education
  BW = f (NHB, <HS, =HS, NHB_<HS, NHB_=HS)
• In this specification, the omitted (reference) category is non-Hispanic white infants born to mothers with > HS

Which contrasts can be tested with Specification A?
Reference category = Non-Hispanic white & > HS
“Is birth weight for the group shown statistically significantly lower than for NHW & > HS?”
[Chart: Predicted birth weight (grams) by mother’s education (<HS, =HS, >HS), with clustered bars for Non-Hispanic white, Mexican-American, and Non-Hispanic black]

Which contrasts are possible with Non-Hispanic white, > HS as the reference category?
Predicted birth weight (grams), by mother’s education and race/ethnicity, United States, 1988–1994 NHANES III

                      Mother’s educational attainment
Race/ethnicity        < HS     = HS     > HS
Non-Hispanic white    3,044    3,044    3,112
Mexican American      3,027    3,031    3,003
Non-Hispanic black    2,820    2,883    2,937

• The circled cell is the reference category (Non-Hispanic white, mother’s education > HS)
• Yellow-shaded cells can be compared to the reference category based on the standard errors of the associated main effects terms alone
• Green-shaded cells can be compared to the reference category using standard errors calculated using the simple slope for a compound coefficient
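
The green-shaded contrasts rest on a compound coefficient, so their standard error has to combine the variances and covariances of every term in the sum. The sketch below illustrates that calculation for non-Hispanic black & <HS versus the reference category under Specification A. It is a minimal sketch under stated assumptions: the coefficients are simply the differences implied by the predicted values in the table above, and every variance and covariance is a hypothetical number invented for illustration, not an estimate from the NHANES III model. The names `coefs`, `vcov`, `cov`, and `compound_coefficient` are likewise illustrative.

```python
import math

# Coefficients for Specification A (reference: NHW & >HS), in grams.
# Illustrative only: implied by differences between the table's predicted values.
coefs = {"NHB": -175.0, "<HS": -68.0, "NHB_<HS": -49.0}

# HYPOTHETICAL variances (diagonal) and covariances (off-diagonal),
# stored once per pair; real values come from the fitted model.
vcov = {
    ("NHB", "NHB"): 400.0,
    ("<HS", "<HS"): 225.0,
    ("NHB_<HS", "NHB_<HS"): 900.0,
    ("NHB", "<HS"): 100.0,
    ("NHB", "NHB_<HS"): -300.0,
    ("<HS", "NHB_<HS"): -200.0,
}

def cov(a, b):
    """Look up a covariance regardless of the order of the two terms."""
    return vcov[(a, b)] if (a, b) in vcov else vcov[(b, a)]

def compound_coefficient(terms):
    """Return (estimate, standard error) for the sum of the named terms."""
    estimate = sum(coefs[t] for t in terms)
    # The double sum counts each variance once and each covariance twice.
    variance = sum(cov(a, b) for a in terms for b in terms)
    return estimate, math.sqrt(variance)

# NHB & <HS versus NHW & >HS = main effect + main effect + interaction.
estimate, se = compound_coefficient(["NHB", "<HS", "NHB_<HS"])
ci_low, ci_high = estimate - 1.96 * se, estimate + 1.96 * se
print(f"NHB & <HS vs. NHW & >HS: {estimate:.0f} g "
      f"(95% CI {ci_low:.0f} to {ci_high:.0f})")
```

A yellow-shaded cell needs only one term, for example `compound_coefficient(["NHB"])`, which returns that coefficient and its own standard error.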

Confidence intervals around estimates from Specification A (from simple slopes)
Reference category = Non-Hispanic white & > HS
[Chart: Difference in birth weight (grams) from the reference category, by mother’s education (<HS, =HS, >HS) and race/ethnicity (Non-Hispanic white, Mexican-American, Non-Hispanic black); 95% confidence intervals shown in pink]

Model Specification B
• Still testing
  – The same substantive question about birth weight as a function of race, education
  – Including interactions between race and education
• Change the reference category to NHB & < HS
  Had been: BW = f (NHB, <HS, =HS, NHB_<HS, NHB_=HS)
  Specification B: BW = f (NHW, =HS, >HS, NHW_=HS, NHW_>HS)
• Requires a different set of dummy variables
  – Drop main effect and interaction terms related to NHB and to <HS from the original specification
  – Add main effects and interaction terms related to NHW and >HS in their place
  – Retain the same set of controls for other demographic factors

Which contrasts can be formally tested with Specification B?
Reference category = Non-Hispanic black & < HS
“Is birth weight for the group shown statistically significantly different from that for NHB & < HS?”
[Chart: Predicted birth weight (grams) by mother’s education (<HS, =HS, >HS), with clustered bars for Non-Hispanic white, Mexican-American, and Non-Hispanic black]

Which contrasts are possible with Non-Hispanic black, < HS as the reference category?
Predicted birth weight (grams), by mother’s education and race/ethnicity, United States, 1988–1994 NHANES III

                      Mother’s educational attainment
Race/ethnicity        < HS     = HS     > HS
Non-Hispanic white    3,044    3,044    3,112
Mexican American      3,027    3,031    3,003
Non-Hispanic black    2,820    2,883    2,937

• The circled cell is the reference category (Non-Hispanic black, mother’s education < HS)
• Yellow-shaded cells can be compared to the reference category based on the standard errors of the associated main effects terms alone
• Green-shaded cells can be compared to the reference category using standard errors calculated using the simple slope for a compound coefficient

Model Specification C
• Again, testing
  – The same substantive question about birth weight as a function of race, education
  – Including interactions between race and education
• Change the reference category to NHB & = HS
  Specification C: BW = f (NHW, <HS, >HS, NHW_<HS, NHW_>HS)
• Requires yet another set of dummy variables, different from Specification B
  – Replace the main effect and interaction terms related to =HS in Specification B with the corresponding terms related to <HS
  – Retain the same set of controls for other demographic factors

Which contrasts can be formally tested with Specification C?
Reference category = Non-Hispanic black & = HS
“Is birth weight for the group shown statistically significantly different from that for NHB & = HS?”
[Chart: Predicted birth weight (grams) by mother’s education (<HS, =HS, >HS), with clustered bars for Non-Hispanic white, Mexican-American, and Non-Hispanic black]

Which contrasts are possible with Non-Hispanic black, = HS as the reference category?
Predicted birth weight (grams), by mother’s education and race/ethnicity, United States, 1988–1994 NHANES III

                      Mother’s educational attainment
Race/ethnicity        < HS     = HS     > HS
Non-Hispanic white    3,044    3,044    3,112
Mexican American      3,027    3,031    3,003
Non-Hispanic black    2,820    2,883    2,937

• The circled cell is the reference category (Non-Hispanic black, mother’s education = HS)
• Yellow-shaded cells can be compared to the reference category based on the standard errors of the associated main effects terms alone
• Green-shaded cells can be compared to the reference category using standard errors calculated using the simple slope for a compound coefficient

Similarities across models with different reference categories
• Alternate specifications with different reference categories produce identical results in terms of
  – The overall direction and magnitude of differences in the DV across groups
  – Overall model fit
    • F-statistic
    • R² statistic
[Chart: Predicted birth weight (grams) by mother’s education (<HS, =HS, >HS), with clustered bars for Non-Hispanic white, Mexican-American, and Non-Hispanic black]

Differences across models with different reference categories
• Alternate specifications with different reference categories differ in terms of
  – Which contrasts can be formally evaluated for statistical significance based on the estimated standard errors for each βi and the associated variance-covariance matrix
  – The estimated constant, which depends on the choice of omitted (reference) category
    • The constant term will equal the predicted value of the DV for that category, with all other IVs in the model = 0
    • This is true whether the model includes
      – Only main effects dummies
      – Both main effects and interaction terms
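
The two slides above can be illustrated with a small sketch: re-referencing changes the constant and the individual coefficients, but not the predicted value for any group. To keep it self-contained, the sketch makes a simplifying assumption that is not in the source: it treats the table’s predicted values for non-Hispanic white and non-Hispanic black infants as cell means of a saturated race-by-education model with no other controls, so the dummy-variable coefficients can be derived by subtraction rather than estimated. The function names `fit_saturated` and `predict` are hypothetical.

```python
# Predicted birth weights (grams) from the table, NHW and NHB rows only.
# ASSUMPTION: treated here as cell means of a saturated model without controls.
cells = {
    ("NHW", "<HS"): 3044, ("NHW", "=HS"): 3044, ("NHW", ">HS"): 3112,
    ("NHB", "<HS"): 2820, ("NHB", "=HS"): 2883, ("NHB", ">HS"): 2937,
}

def fit_saturated(cells, ref_race, ref_educ):
    """Derive dummy-coded coefficients for the chosen reference category."""
    races = sorted({race for race, _ in cells})
    educs = sorted({educ for _, educ in cells})
    const = cells[(ref_race, ref_educ)]  # constant = reference-cell value
    coefs = {"const": const}
    for race in races:  # race main effects, evaluated at the reference education
        if race != ref_race:
            coefs[race] = cells[(race, ref_educ)] - const
    for educ in educs:  # education main effects, evaluated at the reference race
        if educ != ref_educ:
            coefs[educ] = cells[(ref_race, educ)] - const
    for race in races:  # interaction terms: what the main effects leave over
        for educ in educs:
            if race != ref_race and educ != ref_educ:
                coefs[f"{race}_{educ}"] = (
                    cells[(race, educ)] - const - coefs[race] - coefs[educ]
                )
    return coefs

def predict(coefs, race, educ):
    """Sum the applicable terms; omitted (reference) dummies contribute 0."""
    return (coefs["const"] + coefs.get(race, 0) + coefs.get(educ, 0)
            + coefs.get(f"{race}_{educ}", 0))

spec_a = fit_saturated(cells, "NHW", ">HS")  # reference as in Specification A
spec_b = fit_saturated(cells, "NHB", "<HS")  # reference as in Specification B

for (race, educ), value in cells.items():
    # Both specifications reproduce the same predicted value for every group.
    print(race, educ, value, predict(spec_a, race, educ), predict(spec_b, race, educ))
print("constants:", spec_a["const"], spec_b["const"])
```

In a model estimated from data, the same invariance holds for the fitted values and for the F and R² statistics; only the set of coefficients, and hence the set of directly testable contrasts, changes.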

Presenting results of statistical tests from alternative specifications
• No one set of βs and associated standard errors (from any one specification) will allow readers to conduct inferential tests for all possible contrasts
• Estimate the models and conduct post-hoc tests for your substantive hypotheses behind the scenes
  – Do NOT present the βs, standard errors, and simple slopes calculations for each of the alternative specifications!
• Create a table of detailed results from your “base” specification
• Supplement it with a table or chart reporting the predicted values of the dependent variable for pertinent values of the IVs in the interaction, calculated from the βs
  – Use symbols to denote which values are statistically significantly different from one another

Chart of results of post-hoc testing with alternative reference categories
[Chart: Predicted birth weight (grams) by mother’s education (<HS, =HS, >HS), with clustered bars for Non-Hispanic white, Mexican-American, and Non-Hispanic black; symbols above the bars mark statistically significant contrasts]
* denotes statistically significantly different at p < 0.05 from non-Hispanic white > HS
† denotes statistically significantly different at p < 0.05 from non-Hispanic black < HS
£ denotes statistically significantly different at p < 0.05 from non-Hispanic black = HS
¥ denotes statistically significantly different at p < 0.05 from Mexican American = HS
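
One way to assemble the symbols for such a chart is to collect, for each group, its contrast against each reference category (one per alternative specification) and keep the symbols whose contrasts pass the significance test. The sketch below is illustrative only. It assumes a large-sample two-sided test at p < 0.05 (|t| > 1.96). The differences for non-Hispanic black > HS are taken from the predicted values in the tables, but the standard errors are hypothetical, so the resulting marks do not reproduce the chart. The names `SYMBOLS` and `significance_marks` are illustrative.

```python
# Symbols from the chart legend, one per reference category.
SYMBOLS = {
    "NHW >HS": "*",
    "NHB <HS": "†",
    "NHB =HS": "£",
    "MexAm =HS": "¥",
}

def significance_marks(contrasts, critical_t=1.96):
    """Return the legend symbols for contrasts with |estimate / SE| > critical_t.

    `contrasts` maps a reference-category label to (difference, standard error),
    each pair coming from the specification that omits that category.
    """
    marks = ""
    for reference, symbol in SYMBOLS.items():
        if reference in contrasts:
            difference, se = contrasts[reference]
            if abs(difference / se) > critical_t:
                marks += symbol
    return marks

# Contrasts for NHB & >HS. Differences come from the table of predicted values;
# the standard errors are HYPOTHETICAL placeholders.
nhb_gt_hs = {
    "NHW >HS": (-175.0, 40.0),   # from the specification with NHW & >HS omitted
    "NHB <HS": (117.0, 45.0),    # from the specification with NHB & <HS omitted
    "NHB =HS": (54.0, 42.0),     # from the specification with NHB & =HS omitted
}
print("NHB & >HS:", significance_marks(nhb_gt_hs))
```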

Summary
• By estimating a series of models with different reference categories, you can test the statistical significance of different substantively important contrasts among groups defined by the interaction
  – Use the simple slope technique to calculate standard errors for compound coefficients involved in the interaction
• Overall specification must be otherwise identical
  – Same other control variables
  – Same amount of detail in which main effects and interactions are conceptually tested
  – Same analytic sample
• Conduct the post-hoc tests behind the scenes and communicate the results with a chart of the overall pattern, with symbols for statistical significance

Suggested resources
• Cohen, Jacob, Patricia Cohen, Stephen G. West, and Leona S. Aiken. 2003. Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences, 3rd Edition. Florence, KY: Routledge, chapters 8 and 9.
• Figueiras, Adolfo, Jose Maria Domenech-Massons, and Carmen Cadarso. 1998. Regression Models: Calculating the Confidence Interval of Effects in the Presence of Interactions. Statistics in Medicine 17: 2099–2105.
• Miller, J. E. 2013. The Chicago Guide to Writing about Multivariate Analysis, 2nd Edition. University of Chicago Press, chapters 5 (tables), 6 (charts), and 16.

Suggested online resources
• Podcasts on
  – Introduction to interactions
  – Choosing a reference category
  – Specifying a model to test for interactions
  – Calculating the overall shape of an interaction from regression coefficients
  – Introduction to testing statistical significance of interactions
  – Approaches to testing statistical significance of interactions
  – Conducting post-hoc tests of compound coefficients using simple slopes

Suggested exercise
• Estimate an OLS model with an interaction between a three-category independent variable (IV1) and a two-category independent variable (IV2)
• Create a grid with a row for each category of IV1 and a column for each category of IV2
  – Shade the cells to show which contrasts can be formally tested for statistical significance based on this specification
    • Standard errors for main effects terms
    • Standard errors calculated for compound coefficients using the simple slopes technique
  – Fill the values of the compound coefficients and standard errors into each cell
• Create a chart with 95% confidence intervals around each compound coefficient

Suggested exercise, cont.
• Respecify the model, changing the reference category for the three-category IV involved in the interaction
• Create a grid to display compound coefficients and standard errors from Specification 2
• Create a chart with 95% confidence intervals around each compound coefficient based on the results of Specification 2
• Comment on which tests of statistical significance are possible with each of the two model specifications
• Describe which differences are statistically significant

Suggested exercise, cont.
• Calculate the predicted value of the dependent variable (including the intercept term) separately for
  – Specification 1
  – Specification 2
• Compare Specifications 1 and 2 in terms of
  – Their GOF statistics
  – The overall shape of the interaction pattern
• Create a chart to summarize the findings of your post-hoc testing
• Write a 1–2 paragraph description of the direction, magnitude, and statistical significance of the interaction between IV1 and IV2

Contact information
Jane E. Miller, PhD
jmiller@ifh.rutgers.edu
Online materials available at http://press.uchicago.edu/books/miller/multivariate/index.html