SS16.10

Introduction to testing statistical
significance of interactions
Jane E. Miller, PhD
The Chicago Guide to Writing about Multivariate Analysis, 2nd Edition.
Overview
• Testing statistical significance of individual coefficients
• Testing effect of interaction terms on overall model fit
• Approaches to testing statistical significance of
interactions
– Alternative model specification
– The “TEST” statement
– Simple slopes calculations for compound coefficients
– Changing the reference category
Statistical significance of an interaction
• To evaluate statistical significance of an interaction,
use a set of approaches
• t-tests for individual coefficients
• F-tests for the collective contribution of a set of terms to
the overall fit of an OLS model
• The corresponding statistics for a logistic model are
• z-statistics for individual coefficients
• –2 log likelihood statistic for overall model fit
• Methods for testing differences among values of
variables involved in the interaction
– Contrasts within the overall shape of the pattern
Estimated coefficients from an OLS model of birth weight in grams

                                  Model A: Without         Model B: With
                                  interactions             interactions
                                  β          t-statistic   β          t-statistic
Main effects terms
Race (ref. = non-Hisp. white)
  Non-Hispanic Black (NHB)        –172.6**   –9.86         –168.1**   –5.66
  Mexican American (MA)           –23.1      –1.02         –104.2**   –2.16
Mother’s ed. (ref. = > HS)
  Less than high school (<HS)     –55.5**    –2.88         –54.2**    –2.35
  High school graduate (=HS)      –53.9**    –3.64         –62.0**    –3.77
Interactions: race & education
  NHB_<HS                                                  –38.5      –0.88
  MA_<HS                                                   99.4       1.72
  NHB_=HS                                                  18.4       0.47
  MA_=HS                                                   93.7       1.49
F-statistic                       94.08                    65.59
Degrees of freedom (df)           9                        13
Statistical significance of βs on
individual interaction terms
• Statistical significance of coefficients on each of the
interaction terms is assessed as for any other
independent variable in a multivariate regression
model
• In the example from the previous slide, none of the
βs on the individual interaction terms between
race/ethnicity and mother’s education achieve
statistical significance as assessed by their t-statistics
– E.g., βNHB_<HS = –38.5, with a t-statistic of –0.88
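As a rough check on the t-statistics for the four interaction terms, the sketch below converts each one to a two-sided p-value. It assumes the sample is large enough that the t distribution can be approximated by the standard normal; the function name and dictionary are illustrative, not part of the original analysis.

```python
import math

def two_sided_p_from_t(t_stat):
    """Two-sided p-value, using the standard normal approximation
    to the t distribution (an assumption: large sample)."""
    return math.erfc(abs(t_stat) / math.sqrt(2.0))

# t-statistics for the interaction terms in Model B (from the table).
interaction_t = {
    "NHB_<HS": -0.88,
    "MA_<HS": 1.72,
    "NHB_=HS": 0.47,
    "MA_=HS": 1.49,
}

p_values = {name: two_sided_p_from_t(t) for name, t in interaction_t.items()}

for name, p in p_values.items():
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{name}: p = {p:.3f} ({verdict} at 0.05)")
```

Under that approximation, none of the four p-values falls below 0.05, which matches the conclusion on the slide.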
Overall shape of an interaction
• But recall that βs on main effect and interaction terms cannot
be interpreted in isolation from one another
• E.g., in a model of birth weight with an interaction between
race and education, the difference in birth weight for non-Hispanic
black infants born to mothers with < HS compared to
the reference category involves βs on three variables
= βNHB + β<HS + βNHB_<HS
• More than one β is involved in this calculation, so looking only
at the statistical significance of each of those three βs does
not tell us the statistical significance of differences between
groups defined by combinations of the two IVs in the
interaction
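To make the summing concrete, the following sketch adds up the relevant Model B coefficients for every race/education combination. The coefficient values come from the table; the dictionary layout and function name are illustrative choices.

```python
# Model B coefficients (grams), from the table of estimated coefficients.
MAIN_RACE = {"NHW": 0.0, "NHB": -168.1, "MA": -104.2}   # ref. = NHW
MAIN_EDUC = {">HS": 0.0, "<HS": -54.2, "=HS": -62.0}    # ref. = >HS
INTERACTION = {
    ("NHB", "<HS"): -38.5,
    ("MA", "<HS"): 99.4,
    ("NHB", "=HS"): 18.4,
    ("MA", "=HS"): 93.7,
}

def difference_from_reference(race, educ):
    """Sum the main-effect and interaction coefficients for one group.

    The result is the predicted difference in birth weight relative to
    the reference category (non-Hispanic white, >HS).
    """
    return MAIN_RACE[race] + MAIN_EDUC[educ] + INTERACTION.get((race, educ), 0.0)

differences = {
    (race, educ): round(difference_from_reference(race, educ), 1)
    for race in MAIN_RACE
    for educ in MAIN_EDUC
}

for (race, educ), diff in differences.items():
    print(f"{race:>3} {educ:>3}: {diff:8.1f} g")
```

Each of the four groups covered by an interaction term draws on three coefficients, so no single t-statistic in the table speaks to its difference from the reference category.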
What do inferential statistics for
individual terms in a model tell us?
• If the coefficient on an interaction term is statistically
significant in a model that includes the corresponding
main effects terms
– We know only that that combination of characteristics has a
joint effect on the DV over and above the main effects
• E.g., if NHB_<HS is statistically significant in a model of
birth weight that also includes the main effects of
education and race
– We know only that that combination of race and education
has a different effect on birth weight than would be implied
by the βs on the main effects of NHB and <HS alone
Assessing effects of interactions on
overall model goodness-of-fit (GOF)
• To assess whether the interaction terms collectively improve
overall model fit, calculate the difference in F-statistics for
models with and without those terms
– Model A: Main effects only
– Model B: Main effects and interactions
• Compare against critical value of the F-statistic for the number
of degrees of freedom (df) for the model.
– df for the numerator is based on the difference in number of
covariates in models with and without interaction terms
– df for the denominator depends on the sample size
• If the difference in F > the critical value, the interaction terms
statistically significantly improve the overall fit of the model
Example difference in model GOF
• The difference in F-statistics between models A and B
= Fmodel A – Fmodel B = 94.1 – 65.6 = 28.5
• The difference in the number of degrees of freedom between
models A and B = 13 df – 9 df = 4 df
• For an F distribution with
• 4 degrees of freedom for the numerator
• ∞ degrees of freedom for the denominator (based on the sample size
used to estimate the model)
• The critical value for p = 0.001 is 4.62
• The difference in F exceeds the critical value (28.5 > 4.62)
– Thus we conclude that inclusion of interaction terms improves the
overall fit of the model at p < 0.001
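The arithmetic in this example can be retraced with the sketch below, which follows the procedure described on the slide. The critical value relies on two facts: F with an infinite denominator df equals a chi-square variable divided by the numerator df, and the chi-square upper tail has a closed form when its df is even. The helper names are illustrative, and the closed form would not apply to an odd numerator df.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable (even df only)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    terms = sum(half ** j / math.factorial(j) for j in range(df // 2))
    return math.exp(-half) * terms

def f_critical_inf_denominator(alpha, df_num):
    """Critical value of F(df_num, infinity), found by bisection."""
    lo, hi = 0.0, 1000.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_sf_even_df(mid, df_num) > alpha:
            lo = mid      # tail still too heavy: move right
        else:
            hi = mid
    return hi / df_num    # chi-square critical value / numerator df

# F-statistics and df reported for Models A and B in the table.
f_model_a, df_model_a = 94.08, 9
f_model_b, df_model_b = 65.59, 13

f_difference = f_model_a - f_model_b
df_difference = df_model_b - df_model_a
critical_value = f_critical_inf_denominator(0.001, df_difference)
exceeds_critical = f_difference > critical_value

print(f"Difference in F: {f_difference:.2f} on {df_difference} df")
print(f"Critical value at p = 0.001: {critical_value:.2f}")
print(f"Exceeds critical value: {exceeds_critical}")
```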
How can overall fit improve if individual
terms aren’t statistically significant?
• In models that include several main effect and
interaction terms, one or more of those terms may
not be needed to capture relevant variation in the DV
– Could be collapsed into the reference category or
combined with other subgroups based on empirical testing
– Might yield statistical significance for some interaction
terms
• Models that include many interaction terms may be
affected by multicollinearity
– Can explain why the t-statistics show a lack of significance
even if the F-statistic indicates statistical significance
What do inferential statistics for
individual terms in a model tell us?
• If NHB_<HS is statistically significant in a model of birth
weight that also includes the main effects of
education and race
– We know only that that combination of race and education
has a different effect on birth weight than would be
implied by the βs on the main effects of NHB and <HS
alone
What don’t inferential statistics for
individual terms in a model tell us?
• The separate test statistics for the individual main effect and
interaction βs cannot, on their own, tell us the statistical
significance of differences in predicted birth weight
– For example, for non-Hispanic blacks born to mothers with < HS
compared to non-Hispanic whites born to mothers with > HS
(the reference category)
– Across racial/ethnic groups within the < HS group
– Across education levels among non-Hispanic blacks
• Remember: each of these comparisons involves
comparing values calculated from more than one β, e.g.,
– For non-Hispanic black < HS: β<HS + βNHB + βNHB_<HS
Calculating overall effect for non-Hispanic blacks with < HS education
βNHB = –168*
β<HS = –54*
βNHB_<HS = –39
* p < 0.05 based on t-tests for individual coefficients
Overall effect = βNHB + β<HS + βNHB_<HS = (–168) + (–54) + (–39) = –261
We want to know whether that sum is statistically significantly different from 0;
i.e., whether birth weight differs from that of infants born to non-Hispanic
white women with more than a high school education (the reference category)
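Testing whether a sum of coefficients differs from 0 requires the standard error of the sum, which depends on the covariances among the estimates as well as their variances. The sketch below shows the calculation under stated assumptions: the standard errors are backed out from the table as β divided by t, but the correlations among the three estimates are HYPOTHETICAL placeholders, because the source does not report the variance-covariance matrix. The resulting t-statistic is therefore illustrative only.

```python
import math

def linear_combination_test(betas, covariance, weights):
    """Return (estimate, standard error, t-statistic) for
    the linear combination sum(weights[i] * betas[i])."""
    n = len(betas)
    estimate = sum(w * b for w, b in zip(weights, betas))
    variance = sum(
        weights[i] * weights[j] * covariance[i][j]
        for i in range(n)
        for j in range(n)
    )
    std_error = math.sqrt(variance)
    return estimate, std_error, estimate / std_error

# Model B estimates from the table, in the order NHB, <HS, NHB_<HS.
betas = [-168.1, -54.2, -38.5]
t_stats = [-5.66, -2.35, -0.88]
std_errors = [b / t for b, t in zip(betas, t_stats)]  # SE = beta / t

# HYPOTHETICAL correlations among the three estimates (NOT from the source).
# Real values come from the fitted model's variance-covariance matrix.
hypothetical_corr = [
    [1.0, 0.3, -0.6],
    [0.3, 1.0, -0.5],
    [-0.6, -0.5, 1.0],
]
covariance = [
    [hypothetical_corr[i][j] * std_errors[i] * std_errors[j] for j in range(3)]
    for i in range(3)
]

estimate, std_error, t_sum = linear_combination_test(betas, covariance, [1, 1, 1])
print(f"Sum of coefficients: {estimate:.1f} g")
print(f"Illustrative SE and t (hypothetical covariances): {std_error:.1f}, {t_sum:.2f}")
```

With the actual variance-covariance matrix from the estimation output substituted for the placeholder, the same function gives the test statistic for the sum.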
Substantive question behind the interaction model:
“Does race modify the association between education
and birth weight?”
• The bar for each
race/education combination
involves the sum of the
intercept and one to three
other coefficients
• t-tests for individual βs
won’t tell us about
statistical significance of
differences in those sums
[Figure: bar chart of predicted birth weight (grams), y-axis from 2,950 to 3,450, by mother’s education (<HS, =HS, >HS) for non-Hispanic white, Mexican-American, and non-Hispanic black infants]
Tests of differences across groups
other than the reference category
• Conduct formal inferential tests of whether the
predicted value of the dependent variable is
statistically significantly different across categories
• Possible approaches
– Use “TEST” statement to contrast coefficients
– Revise the model specification
• Estimate a model with dummies for all interaction combinations
• Reestimate the model with different reference categories
– See separate podcast on that topic
– Conduct post-hoc tests of differences between βs from
one model
• See separate podcast on simple slopes
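One of the revised specifications listed above, a model with dummies for all interaction combinations, can be set up as in the sketch below. It builds one indicator for each race/education combination except the reference group. The category codes and function name are illustrative assumptions, not the variable names used in the original analysis.

```python
RACES = ["NHW", "NHB", "MA"]
EDUCS = [">HS", "<HS", "=HS"]
REFERENCE = ("NHW", ">HS")  # non-Hispanic white, more than high school

def combination_dummies(race, educ, reference=REFERENCE):
    """Return eight 0/1 indicators, one per race/education combination
    other than the reference group."""
    return {
        f"{r}_{e}": int((r, e) == (race, educ))
        for r in RACES
        for e in EDUCS
        if (r, e) != reference
    }

example = combination_dummies("NHB", "<HS")
reference_case = combination_dummies("NHW", ">HS")

print(example)
print(reference_case)
```

In a model estimated with these eight dummies in place of the main effects and interaction terms, each coefficient is that group's difference from the reference category, so its own t-statistic tests that difference directly.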
Summary
• Inferential tests for individual coefficients in a
regression model test whether each β is statistically
significantly different from 0
• In models using main effects and interaction terms,
calculating the overall shape of an interaction
requires summing several βs
– Tests of the individual component βs don’t address
statistical significance of differences in the overall
interaction pattern
Suggested resources
• Miller, J. E. 2013. The Chicago Guide to Writing about
Multivariate Analysis, 2nd Edition. University of
Chicago Press, chapters 11, 15, and 16.
• Cohen, Jacob, Patricia Cohen, Stephen G. West, and
Leona S. Aiken. 2003. Applied Multiple
Regression/Correlation Analysis for the Behavioral
Sciences, 3rd Edition. Florence, KY: Routledge,
chapters 7 and 9.
Suggested online resources
• Podcasts on
– Specifying models to test for interactions
– Calculating the overall shape of an interaction pattern
from regression coefficients
– Comparing overall goodness-of-fit across models
– Approaches to testing statistical significance of interactions
– Conducting post-hoc tests of compound coefficients using
the simple slopes technique
– Using alternative reference categories to test statistical
significance of interactions
Contact information
Jane E. Miller, PhD
jmiller@ifh.rutgers.edu
Online materials available at
http://press.uchicago.edu/books/miller/multivariate/index.html