SS16.13

Using alternative reference
categories to test statistical
significance of an interaction
Jane E. Miller, PhD
The Chicago Guide to Writing about Multivariate Analysis, 2nd Edition.
Overview
• Reasons for estimating models with alternate
reference categories
• Review: confidence intervals from simple slopes
calculations for compound coefficients
• Contrasts that can be assessed for statistical
significance
– With “base” specification
– With a different reference category
• Suggestions for presenting results of alternate
specifications
Substantive question behind the interaction model:
“Does race modify the association between education
and birth weight?”
• Two questions involved in the interaction:
– Are differences across racial/ethnic groups within
each education level statistically significant?
• Within cluster, across bar colors
– Are differences across education levels within each
racial/ethnic group statistically significant?
• Within bar color, across clusters
[Figure: clustered bar chart of predicted birth weight (grams) by mother’s education (<HS, =HS, >HS), one bar color per racial/ethnic group (Non-Hispanic white, Mexican-American, Non-Hispanic black)]
Why change the reference category?
• Standard errors calculated using the simple slopes
technique allow formal inferential statistical testing
of each of the other groups
– Only against the reference category
– Not against one another
• Such comparisons are important for characterizing
how the two independent variables involved in an
interaction relate to the dependent variable
– E.g., how race and educational attainment together relate
to birth weight
Model Specification A with
interactions: race and education
• BW = f (race, education, race_education)
– Birth weight is a function of race, education, and the race-by-education interaction
– Also includes controls for other demographic factors
• To specify the model, need ALL of the main effects and
interaction term variables related to race and mother’s
education
BW = f (NHB, <HS, =HS, NHB_<HS, NHB_=HS)
• In this specification, the omitted (reference) category is
non-Hispanic white infants born to mothers with > HS
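As a rough sketch of what Specification A's right-hand side looks like in practice, the snippet below builds the dummy variables and their products with NumPy. The example records and the function name `spec_a_design` are hypothetical, and the sketch follows the slide's two-group notation (NHW vs. NHB), leaving out the other demographic controls.

```python
import numpy as np

# Hypothetical example records (NOT taken from the NHANES III data).
race = np.array(["NHW", "NHB", "NHB", "NHW"])
educ = np.array(["<HS", "<HS", ">HS", "=HS"])

def spec_a_design(race, educ):
    """Return (column names, design matrix) for Specification A.

    Reference category: NHW infants born to mothers with >HS,
    so no dummy is created for NHW or for >HS.
    """
    nhb = (race == "NHB").astype(float)
    lt_hs = (educ == "<HS").astype(float)
    eq_hs = (educ == "=HS").astype(float)
    names = ["const", "NHB", "<HS", "=HS", "NHB_<HS", "NHB_=HS"]
    X = np.column_stack([
        np.ones(len(race)),  # constant term
        nhb, lt_hs, eq_hs,   # main effects
        nhb * lt_hs,         # interaction: NHB and <HS
        nhb * eq_hs,         # interaction: NHB and =HS
    ])
    return names, X

names, X = spec_a_design(race, educ)
```

A record in the reference category would have zeros in every column except the constant, which is why all other cells are expressed relative to it.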
Which contrasts can be tested
with Specification A?
“Is birth weight for the group shown statistically
significantly lower than for NHW & > HS?”
Reference category = Non-Hispanic white & > HS
[Figure: bar chart of predicted birth weight (grams) by mother’s education (<HS, =HS, >HS) and race/ethnicity (Non-Hispanic white, Mexican-American, Non-Hispanic black), with the reference category marked]
Which contrasts are possible with
Non-Hispanic white, >HS as the reference category?
Predicted birth weight (grams), by mother’s education and race/ethnicity,
United States, 1988–1994 NHANES III

                        Mother’s educational attainment
Race/ethnicity          < HS     = HS     > HS
Non-Hispanic white      3,044    3,044    3,112
Mexican American        3,027    3,031    3,003
Non-Hispanic black      2,820    2,883    2,937

• The circled cell is the reference
category (Non-Hispanic white,
mother’s education > HS)
• Yellow-shaded cells can be
compared to the reference
category based on the standard
errors of the associated main
effects terms alone
• Green-shaded cells can be
compared to the reference
category using standard errors
calculated using the simple slope
for a compound coefficient
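The compound-coefficient calculation for a green-shaded cell can be sketched as a contrast vector applied to the coefficient vector and its variance-covariance matrix. All numbers below are HYPOTHETICAL placeholders, not estimates from the NHANES III model, and `compound_coefficient` is an illustrative helper name rather than a function from any statistics package.

```python
import numpy as np

# HYPOTHETICAL coefficients and variance-covariance matrix, for illustration only.
# Order of terms: NHB, <HS, NHB_<HS
b = np.array([-170.0, -60.0, -50.0])
V = np.array([
    [ 400.0,  50.0, -100.0],
    [  50.0, 300.0,  -80.0],
    [-100.0, -80.0,  900.0],
])

def compound_coefficient(b, V, c):
    """Return the estimate and standard error of the linear combination c'b."""
    estimate = float(c @ b)
    variance = float(c @ V @ c)  # picks up every variance and covariance term
    return estimate, float(np.sqrt(variance))

# NHB & <HS versus the reference (NHW & >HS): sum all three terms.
c = np.array([1.0, 1.0, 1.0])
est, se = compound_coefficient(b, V, c)
print(f"difference = {est:.0f} g, SE = {se:.1f}, t = {est / se:.2f}")
```

In real use, `b` and `V` would come from the fitted model's coefficient vector and estimated variance-covariance matrix.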
Confidence intervals around estimates from
Specification A – from simple slopes
Reference category = Non-Hispanic white & > HS
[Figure: difference in birth weight (grams), by mother’s education (<HS, =HS, >HS) and race/ethnicity (Non-Hispanic white, Mexican-American, Non-Hispanic black); 95% confidence intervals shown in pink]
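For reference, a confidence interval for a compound coefficient made up of a race main effect, an education main effect, and their interaction can be written out as follows. The symbols b_1, b_2, and b_3 are labels assumed here for those three terms, and 1.96 is the usual large-sample normal critical value; this is the standard variance-of-a-sum result, offered as a sketch rather than as the exact calculation behind the chart.

```latex
\[
\operatorname{Var}(b_1 + b_2 + b_3)
  = \operatorname{Var}(b_1) + \operatorname{Var}(b_2) + \operatorname{Var}(b_3)
  + 2\operatorname{Cov}(b_1, b_2) + 2\operatorname{Cov}(b_1, b_3) + 2\operatorname{Cov}(b_2, b_3)
\]
\[
95\%\ \text{CI} = (b_1 + b_2 + b_3) \pm 1.96\,\sqrt{\operatorname{Var}(b_1 + b_2 + b_3)}
\]
```

An interval that excludes zero indicates a difference from the reference category that is statistically significant at roughly the 0.05 level.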
Model Specification B
• Still testing
– The same substantive question about birth weight as a function of race
and education
– Including interactions between race and education
• Change the reference category to NHB & < HS
Specification A (had been): BW = f (NHB, <HS, =HS, NHB_<HS, NHB_=HS)
Specification B: BW = f (NHW, =HS, >HS, NHW_=HS, NHW_>HS)
• Requires a different set of dummy variables
– Drop main effect and interaction terms related to NHB and to <HS from
the original specification
– Add main effects and interaction terms related to NHW and >HS in their
place
– Retain the same set of controls for other demographic factors
Which contrasts can be formally tested
with Specification B?
“Is birth weight for the group shown statistically
significantly different from that for NHB & < HS?”
Reference category = Non-Hispanic black & < HS
[Figure: bar chart of predicted birth weight (grams) by mother’s education (<HS, =HS, >HS) and race/ethnicity (Non-Hispanic white, Mexican-American, Non-Hispanic black), with the reference category marked]
Which contrasts are possible with Non-Hispanic
black, < HS as the reference category?
Predicted birth weight (grams), by mother’s education and race/ethnicity,
United States, 1988–1994 NHANES III

                        Mother’s educational attainment
Race/ethnicity          < HS     = HS     > HS
Non-Hispanic white      3,044    3,044    3,112
Mexican American        3,027    3,031    3,003
Non-Hispanic black      2,820    2,883    2,937

• The circled cell is the reference
category (Non-Hispanic black,
mother’s education < HS)
• Yellow-shaded cells can be
compared to the reference
category based on the standard
errors of the associated main
effects terms alone
• Green-shaded cells can be
compared to the reference
category using standard errors
calculated using the simple slope
for a compound coefficient
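Because the cell shading did not survive in this text version, the sketch below reconstructs which cells fall into each group for a given reference category. It assumes that cells sharing a row or a column with the reference cell differ from it by a single main effect term (the yellow cells), while cells that differ on both race and education require a compound coefficient (the green cells). The function name `classify_cells` is illustrative.

```python
RACES = ["Non-Hispanic white", "Mexican American", "Non-Hispanic black"]
EDUCS = ["<HS", "=HS", ">HS"]

def classify_cells(ref_race, ref_educ, races=RACES, educs=EDUCS):
    """Label each race-by-education cell relative to the reference cell."""
    grid = {}
    for r in races:
        for e in educs:
            if r == ref_race and e == ref_educ:
                grid[(r, e)] = "reference"
            elif r == ref_race or e == ref_educ:
                # Differs on one variable only: a main effect term suffices.
                grid[(r, e)] = "main effect"
            else:
                # Differs on both: main effects plus the interaction term.
                grid[(r, e)] = "compound"
    return grid

# Specification B: reference category is Non-Hispanic black, <HS.
for cell, kind in classify_cells("Non-Hispanic black", "<HS").items():
    print(cell, "->", kind)
```

In a 3-by-3 grid this yields one reference cell, four main-effect cells, and four compound cells, whichever cell serves as the reference.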
Model Specification C
• Again, testing
– The same substantive question about birth weight as a
function of race and education
– Including interactions between race and education
• Change the reference category to NHB & = HS
Specification C: BW = f (NHW, <HS, >HS, NHW_<HS, NHW_>HS)
• Requires yet another set of dummy variables, different from
Specification B
– Replace the main effect and interaction terms related to =HS in
Specification B with main effect and interaction terms
related to <HS
– Retain the same set of controls for other demographic factors
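The bookkeeping of which dummies to drop and add for each reference category can be sketched with a small helper. `spec_terms` is a hypothetical function name; it follows the slides' two-group notation (NHW and NHB) and generates the race and education terms only, not the other controls.

```python
def spec_terms(ref_race, ref_educ,
               races=("NHW", "NHB"), educs=("<HS", "=HS", ">HS")):
    """List main effect and interaction terms, omitting the reference levels."""
    race_terms = [r for r in races if r != ref_race]
    educ_terms = [e for e in educs if e != ref_educ]
    interactions = [f"{r}_{e}" for r in race_terms for e in educ_terms]
    return race_terms + educ_terms + interactions

spec_a = spec_terms("NHW", ">HS")  # ['NHB', '<HS', '=HS', 'NHB_<HS', 'NHB_=HS']
spec_b = spec_terms("NHB", "<HS")  # ['NHW', '=HS', '>HS', 'NHW_=HS', 'NHW_>HS']
spec_c = spec_terms("NHB", "=HS")  # ['NHW', '<HS', '>HS', 'NHW_<HS', 'NHW_>HS']
```

Each specification has the same number of terms; only the omitted levels change.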
Which contrasts can be formally tested
with Specification C?
“Is birth weight for the group shown statistically
significantly different from that for NHB & = HS?”
Reference category = Non-Hispanic black & = HS
[Figure: bar chart of predicted birth weight (grams) by mother’s education (<HS, =HS, >HS) and race/ethnicity (Non-Hispanic white, Mexican-American, Non-Hispanic black), with the reference category marked]
Which contrasts are possible with
Non-Hispanic black, = HS as the reference category?
Predicted birth weight (grams), by mother’s education and race/ethnicity,
United States, 1988–1994 NHANES III

                        Mother’s educational attainment
Race/ethnicity          < HS     = HS     > HS
Non-Hispanic white      3,044    3,044    3,112
Mexican American        3,027    3,031    3,003
Non-Hispanic black      2,820    2,883    2,937

• The circled cell is the reference
category (Non-Hispanic black,
mother’s education = HS)
• Yellow-shaded cells can be
compared to the reference
category based on the standard
errors of the associated main
effects terms alone
• Green-shaded cells can be
compared to the reference
category using standard errors
calculated using the simple slope
for a compound coefficient
Similarities across models with
different reference categories
• Alternate specifications with different reference
categories
– Produce identical results in terms of
• The overall direction and magnitude of differences in
the DV across groups
• Overall model fit
  ▪ F-statistic
  ▪ R² statistic
[Figure: bar chart of predicted birth weight (grams) by mother’s education (<HS, =HS, >HS) and race/ethnicity (Non-Hispanic white, Mexican-American, Non-Hispanic black)]
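The claim that alternate reference categories leave predicted values and overall fit unchanged can be checked on simulated data. Everything below is SYNTHETIC (randomly generated, not NHANES III), uses two racial/ethnic groups for brevity, and fits the model by ordinary least squares with `numpy.linalg.lstsq`; `fit` is an illustrative helper name.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 600
race = rng.integers(0, 2, n)  # synthetic codes: 0 = NHW, 1 = NHB
educ = rng.integers(0, 3, n)  # synthetic codes: 0 = <HS, 1 = =HS, 2 = >HS
bw = (3100 - 150 * race + 40 * educ + 20 * race * (educ == 0)
      + rng.normal(0, 400, n))  # made-up birth weights in grams

def fit(ref_race, ref_educ):
    """OLS with main effects and interactions, omitting the reference levels."""
    r_levels = [r for r in (0, 1) if r != ref_race]
    e_levels = [e for e in (0, 1, 2) if e != ref_educ]
    cols = [np.ones(n)]
    cols += [(race == r).astype(float) for r in r_levels]
    cols += [(educ == e).astype(float) for e in e_levels]
    cols += [((race == r) & (educ == e)).astype(float)
             for r in r_levels for e in e_levels]
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, bw, rcond=None)
    fitted = X @ beta
    r2 = 1 - np.sum((bw - fitted) ** 2) / np.sum((bw - bw.mean()) ** 2)
    return beta, fitted, r2

beta_a, fitted_a, r2_a = fit(ref_race=0, ref_educ=2)  # reference: NHW & >HS
beta_b, fitted_b, r2_b = fit(ref_race=1, ref_educ=0)  # reference: NHB & <HS

print("same fitted values:", np.allclose(fitted_a, fitted_b))
print("same R-squared:", np.isclose(r2_a, r2_b))
```

The individual coefficients differ between the two fits because each set expresses the same predictions relative to a different baseline.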
Differences across models with
different reference categories
• Alternate specifications with different reference
categories differ in terms of
– Which contrasts can be formally evaluated for statistical
significance based on the estimated standard errors for
each βi and the associated variance-covariance matrix
– The estimated constant, which depends on the choice of
omitted (reference) category
• The constant term will equal the predicted value of the DV for that
category, with all other IVs in the model = 0
• This is true whether the model includes
– Only main effects dummies
– Both main effects and interaction terms
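A small arithmetic illustration, using the predicted values from the table in these slides: changing the reference category changes the baseline against which every cell is expressed, but not the gaps between cells. The key "MexAm" is an abbreviation introduced here, and `differences_from` is an illustrative helper. Note that the table's predicted values presumably reflect specific values of the control variables, so they are not themselves the model constants.

```python
# Predicted birth weight (grams) from the table in the slides.
predicted = {
    ("NHW", "<HS"): 3044, ("NHW", "=HS"): 3044, ("NHW", ">HS"): 3112,
    ("MexAm", "<HS"): 3027, ("MexAm", "=HS"): 3031, ("MexAm", ">HS"): 3003,
    ("NHB", "<HS"): 2820, ("NHB", "=HS"): 2883, ("NHB", ">HS"): 2937,
}

def differences_from(ref):
    """Express every cell as a difference from the chosen reference cell."""
    base = predicted[ref]
    return {cell: value - base for cell, value in predicted.items()}

diff_a = differences_from(("NHW", ">HS"))  # Specification A baseline
diff_b = differences_from(("NHB", "<HS"))  # Specification B baseline

print(diff_a[("NHB", "<HS")])  # -292: NHB & <HS relative to NHW & >HS
print(diff_b[("NHW", ">HS")])  # 292: same gap, opposite sign
```

The gap between any two cells (for example, NHB = HS versus NHB < HS, 63 grams) is the same under either baseline; what differs across specifications is which of those gaps comes with a directly estimated standard error.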
Presenting results of statistical tests
from alternative specifications
• No one set of βs and associated standard errors (from any one
specification) will allow readers to conduct inferential tests for
all possible contrasts
• Estimate the models and conduct post-hoc tests for your
substantive hypotheses behind the scenes
– Do NOT present the βs, standard errors, and simple slopes calculations
for each of the alternative specifications!
• Create a table of detailed results from your “base” specification
• Supplement it with a table or chart reporting the predicted
values of the dependent variable for pertinent values of the IVs
in the interaction, calculated from the βs
– Use symbols to denote which values are statistically significantly
different from one another
Chart of results of post-hoc testing
with alternative reference categories
[Figure: bar chart of predicted birth weight (grams) by mother’s education (<HS, =HS, >HS) and race/ethnicity (Non-Hispanic white, Mexican-American, Non-Hispanic black), with symbols marking statistically significant differences]
* denotes statistically significantly different at p < 0.05
from non-Hispanic white > HS
† denotes statistically significantly different at p < 0.05
from non-Hispanic black < HS
£ denotes statistically significantly different at p < 0.05
from non-Hispanic black = HS
¥ denotes statistically significantly different at p < 0.05
from Mexican American = HS
Summary
• By estimating a series of models with different reference
categories, you can test the statistical significance of different
substantively important contrasts among groups defined by
the interaction
– Use the simple slope technique to calculate standard errors for
compound coefficients involved in the interaction
• Overall specification must be otherwise identical
– Same other control variables
– Same amount of detail in which main effects and interactions are
conceptually tested
– Same analytic sample
• Conduct the post-hoc tests behind the scenes and
communicate the results with a chart of the overall pattern,
with symbols for statistical significance
Suggested resources
• Cohen, Jacob, Patricia Cohen, Stephen G. West, and Leona S.
Aiken. 2003. Applied Multiple Regression/Correlation Analysis
for the Behavioral Sciences, 3rd Edition. Florence, KY:
Routledge, chapters 8 and 9.
• Figueiras, Adolfo, Jose Maria Domenech-Massons, and
Carmen Cadarso. 1998. Regression Models: Calculating the
Confidence Interval of Effects in the Presence of Interactions.
Statistics in Medicine 17: 2099–2105.
• Miller, J. E. 2013. The Chicago Guide to Writing about
Multivariate Analysis, 2nd Edition. University of Chicago Press,
chapters 5 (tables), 6 (charts), and 16.
Suggested online resources
• Podcasts on
– Introduction to interactions
– Choosing a reference category
– Specifying a model to test for interactions
– Calculating the overall shape of an interaction from
regression coefficients
– Introduction to testing statistical significance of
interactions
– Approaches to testing statistical significance of interactions
– Conducting post-hoc tests of compound coefficients using
simple slopes
Suggested exercise
• Estimate an OLS model with an interaction between a three-category
independent variable (IV1) and a two-category
independent variable (IV2)
• Create a grid with a row for each category of IV1 and a column
for each category of IV2
– Shade the cells to show which contrasts can be formally tested for
statistical significance based on this specification
• Standard errors for main effects terms
• Standard errors calculated for compound coefficients using the simple
slopes technique
– Fill the values of the compound coefficients and standard errors into
each cell
• Create a chart with 95% confidence intervals around each
compound coefficient
Suggested exercise, cont.
• Respecify the model, changing the reference category
for the three-category IV involved in the interaction
• Create a grid to display compound coefficients and
standard errors from Specification 2
• Create a chart with 95% confidence intervals around
each compound coefficient based on the results of
Specification 2
• Comment on which tests of statistical significance are
possible with each of the two model specifications
• Describe which differences are statistically significant
Suggested exercise, cont.
• Calculate the predicted value of the dependent variable
(including the intercept term) separately for
– Specification 1
– Specification 2
• Compare Specifications 1 and 2 in terms of
– Their GOF statistics
– The overall shape of the interaction pattern
• Create a chart to summarize the findings of your post-hoc
testing
• Write a 1–2 paragraph description of the direction,
magnitude, and statistical significance of the interaction
between IV1 and IV2
Contact information
Jane E. Miller, PhD
jmiller@ifh.rutgers.edu
Online materials available at
http://press.uchicago.edu/books/miller/multivariate/index.html