Specification errors for interaction models: Implications for the shape of the overall pattern

Jane E. Miller, PhD
Overview
• Review: Model specification with main effects and
interaction terms
• Implications of leaving the main effects terms out of
a model intended to test for interactions
• Repercussions for
– An interaction between two categorical independent variables
– An interaction between one categorical and one
continuous independent variable
List of variables used in examples
• Dependent variable = birth weight in grams (BW)
• Independent variables:
– Main effects terms:
• Race
– Two nominal categories (non-Hispanic black; non-Hispanic white
is the reference category)
– One main effect dummy variable: NHB
» Coded 1 = non-Hispanic black, 0 = non-Hispanic white
• Mother’s education
– Three ordinal categories (< HS; = HS; > HS is the reference category)
– Two main effects dummies: <HS, =HS
» Each coded 1 = named category, 0 = all other values
List of variables, continued
• Interaction between race and mother’s education
– Two interaction term dummies: NHB_<HS; NHB_=HS
• Each named using the “_” convention to link the names
of the component variables.
• Each coded 1 = named category, 0 = all other values
– E.g., NHB_<HS
= 1 for those who are both NHB and < HS,
= 0 for all other combinations of race and education
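The coding rules above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `code_terms` and the string labels used for the categories are assumptions made for the example.

```python
def code_terms(race, education):
    """Return the main effects and interaction dummies for one record.

    race: "NHB" or "NHW" (NHW is the reference category)
    education: "<HS", "=HS", or ">HS" (>HS is the reference category)
    The function name and string labels are hypothetical.
    """
    nhb = 1 if race == "NHB" else 0
    lt_hs = 1 if education == "<HS" else 0
    eq_hs = 1 if education == "=HS" else 0
    return {
        "NHB": nhb,
        "<HS": lt_hs,
        "=HS": eq_hs,
        # Each interaction dummy is the product of its component dummies,
        # so it equals 1 only when both components equal 1.
        "NHB_<HS": nhb * lt_hs,
        "NHB_=HS": nhb * eq_hs,
    }


print(code_terms("NHB", "<HS"))
print(code_terms("NHW", "<HS"))
```

Building each interaction dummy as a product makes it 0 for every combination other than the named one, which matches the coding described above.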
Model specification with interactions:
race and education
• BW = f (race, education, race_education)
– Birth weight is a function of race, education, and the race-by-education interaction
• To specify a model that does not impose assumptions
about the shape of the association, need ALL of the
main effects and interaction term variables related to
race and mother’s education
• BW = f (NHB, <HS, =HS, NHB_<HS, NHB_=HS)
– Main effects terms: NHB, <HS, =HS
– Interaction terms: NHB_<HS, NHB_=HS
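Written out as a linear regression equation, the full specification BW = f (NHB, <HS, =HS, NHB_<HS, NHB_=HS) could take the form below. The β subscripts follow the variable names used in these slides; the error term ε is an assumption added here for completeness.

```latex
\begin{aligned}
BW = {} & \beta_0 + \beta_{NHB}\,NHB + \beta_{<HS}\,({<}HS) + \beta_{=HS}\,({=}HS) \\
        & + \beta_{NHB\_<HS}\,(NHB\_{<}HS) + \beta_{NHB\_=HS}\,(NHB\_{=}HS) + \varepsilon
\end{aligned}
```

With all five terms present, each of the six race-by-education groups gets its own predicted mean, so no shape is imposed on the pattern.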
Some possible patterns of race, education, and birth weight
[Figure panels, each showing the education categories < HS, = HS, > HS: Main effect: education; Main effect: race; Main effects: race & education; Interaction: magnitude; Interaction: direction & magnitude]
What happens if the specification omits
the main effects terms?
• If we omit the main effects terms for the two
independent variables involved in the interaction,
the implied model is specified as
BW = f (NHB_<HS, NHB_=HS)
• Then the estimated βs for those two variables
compare those groups against everyone else
– In this case all whites (regardless of mother’s educational
attainment) plus blacks whose mothers have > HS
– This implicitly assumes that those four groups all have
equal mean birth weight, rather than testing for
differences across those groups
Repercussions of misspecification
• Any differences among
– NHB & > HS
– NHW & < HS
– NHW & = HS
– and NHW & > HS
will be overlooked because there are no terms in the model
to test for such differences.
– Note: “&” is used to denote a group with that combination of
characteristics, not an interaction term
• β0 (the constant or intercept term) will be a weighted
average of birth weight for those four groups
• βNHB_<HS and βNHB_=HS will estimate the difference in
mean birth weight for those groups compared to that
combined reference category
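The following sketch illustrates these repercussions numerically. The cell means and counts are invented for illustration only and are not NHANES III estimates; the function name `interaction_only_fit` is likewise hypothetical. It relies on the fact that a least-squares model containing only mutually exclusive dummies reproduces group means.

```python
# Hypothetical (mean birth weight in grams, number of infants) per group.
# These values are made up for illustration; they are NOT NHANES III results.
cells = {
    ("NHB", "<HS"): (3000.0, 200),
    ("NHB", "=HS"): (3100.0, 300),
    ("NHB", ">HS"): (3200.0, 100),
    ("NHW", "<HS"): (3250.0, 150),
    ("NHW", "=HS"): (3400.0, 400),
    ("NHW", ">HS"): (3500.0, 350),
}


def interaction_only_fit(cells):
    """Estimates implied by BW = f(NHB_<HS, NHB_=HS).

    The intercept is the mean for everyone coded 0 on both dummies
    (the four pooled groups); each coefficient is the difference
    between a named group's mean and that pooled mean.
    """
    named = {("NHB", "<HS"), ("NHB", "=HS")}
    pooled = {k: v for k, v in cells.items() if k not in named}
    n_pooled = sum(n for _, n in pooled.values())
    b0 = sum(mean * n for mean, n in pooled.values()) / n_pooled
    return {
        "b0": b0,
        "NHB_<HS": cells[("NHB", "<HS")][0] - b0,
        "NHB_=HS": cells[("NHB", "=HS")][0] - b0,
    }


fit = interaction_only_fit(cells)
print(fit)
```

With these invented numbers the intercept is 3392.5 g, a weighted average that hides a 300 g spread (3200 g to 3500 g) among the four pooled groups.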
Implied pattern if main effects of race and education are omitted
BW = f (NHB_<HS, NHB_=HS)
[Figure: birth weight by mother’s education (< HS, = HS, > HS) for non-Hispanic black and non-Hispanic white infants, marking the implied reference category for this specification]
Observed pattern based on model of NHANES III data with main effects and interaction terms
BW = f (NHB, <HS, =HS, NHB_<HS, NHB_=HS)
• Model estimates separate levels (intercepts) for each combination of race and education
[Figure: birth weight by mother’s education (< HS, = HS, > HS) for non-Hispanic black and non-Hispanic white infants. Annotations: βNHB + β<HS + βNHB_<HS; βNHB + β=HS + βNHB_=HS]
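To see how the full specification yields a separate level for each combination of race and education, the sketch below sums the relevant coefficients for every group. The coefficient values are hypothetical, chosen only to make the arithmetic easy to follow; they are not the NHANES III estimates, and `predicted_bw` is an illustrative name.

```python
# Hypothetical coefficients in grams, invented for illustration;
# these are NOT the NHANES III estimates.
coef = {
    "b0": 3500.0,      # NHW, > HS (the reference group)
    "NHB": -300.0,
    "<HS": -250.0,
    "=HS": -100.0,
    "NHB_<HS": 50.0,
    "NHB_=HS": 20.0,
}


def predicted_bw(coef, race, education):
    """Predicted birth weight for one race/education combination."""
    nhb = 1 if race == "NHB" else 0
    lt_hs = 1 if education == "<HS" else 0
    eq_hs = 1 if education == "=HS" else 0
    return (
        coef["b0"]
        + coef["NHB"] * nhb
        + coef["<HS"] * lt_hs
        + coef["=HS"] * eq_hs
        + coef["NHB_<HS"] * nhb * lt_hs
        + coef["NHB_=HS"] * nhb * eq_hs
    )


for race in ("NHB", "NHW"):
    for education in ("<HS", "=HS", ">HS"):
        print(race, education, predicted_bw(coef, race, education))
```

For example, the NHB & < HS group differs from the reference group by βNHB + β<HS + βNHB_<HS, which is −500 g with these made-up values.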
Interaction between a continuous and
a categorical independent variable (IV)
• Example: Race and income-to-poverty ratio (IPR)
– Race is a two-category IV, specified with a dummy variable
NHB, coded
• 1 = non-Hispanic black
• 0 = non-Hispanic white (the reference category)
– IPR is a continuous variable calculated as annual family
income (in dollars) divided by the Federal Poverty Level for a
family of that size and age composition
– The interaction between race and IPR is a continuous variable
calculated as the product of the NHB dummy and IPR
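A minimal sketch of how these two variables might be constructed is shown below. The function names and the dollar amounts are hypothetical; in practice the poverty level would be looked up for each family’s size and age composition.

```python
def income_to_poverty_ratio(family_income, poverty_level):
    """Annual family income divided by the applicable poverty level."""
    return family_income / poverty_level


def nhb_ipr(nhb, ipr):
    """Interaction term: product of the NHB dummy (0 or 1) and IPR."""
    return nhb * ipr


# Hypothetical family: $30,000 income, $20,000 poverty level for its composition.
ipr = income_to_poverty_ratio(30000, 20000)
print(ipr, nhb_ipr(1, ipr), nhb_ipr(0, ipr))
```

Because the interaction term is a product with a 0/1 dummy, it equals IPR for non-Hispanic black infants and 0 for all non-Hispanic white infants, whatever their family income.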
Model specification to test an interaction
between continuous and categorical IVs
• For a model with an interaction between two
independent variables, need ALL of the main
effects and interaction term variables related to those
two independent variables
• E.g., for a model of birth weight by race and IPR, include
the main effect and interaction terms related to race
and family IPR:
BW = f (NHB, IPR, NHB_IPR)
What happens if the specification
omits the main effects terms?
• If we omit the main effects terms for the two
independent variables involved in the interaction,
the implied model is specified as
BW = f (NHB_IPR)
• Then the coefficient βNHB_IPR estimates the slope of
the IPR/birth weight curve for blacks, but does not
– Allow for a different intercept for blacks than for whites
– Test for a difference in slopes of the IPR/birth weight
curves for blacks and for whites
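Writing out both specifications as linear equations makes the restrictions explicit. The notation below is a sketch: the β subscripts follow the variable names in these slides, the tildes (marking coefficients of the interaction-only model) and the error term ε are assumptions added here, and the group-specific lines show expected birth weight.

```latex
\begin{aligned}
\text{Full model:}\quad & BW = \beta_0 + \beta_{NHB}\,NHB + \beta_{IPR}\,IPR + \beta_{NHB\_IPR}\,(NHB \times IPR) + \varepsilon \\
\text{NHW } (NHB = 0):\quad & BW = \beta_0 + \beta_{IPR}\,IPR \\
\text{NHB } (NHB = 1):\quad & BW = (\beta_0 + \beta_{NHB}) + (\beta_{IPR} + \beta_{NHB\_IPR})\,IPR \\[6pt]
\text{Interaction term only:}\quad & BW = \tilde\beta_0 + \tilde\beta_{NHB\_IPR}\,(NHB \times IPR) + \varepsilon \\
\text{NHW } (NHB = 0):\quad & BW = \tilde\beta_0 \\
\text{NHB } (NHB = 1):\quad & BW = \tilde\beta_0 + \tilde\beta_{NHB\_IPR}\,IPR
\end{aligned}
```

In the interaction-only model both groups share one intercept and the line for whites has no IPR term, so there is nothing against which to compare the slope for blacks.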
Some possible patterns among income, race, and birth weight
[Figure panels: Income main effect; Income & race main effects; Income & race main effects, and interaction: converging; Income & race main effects, and interaction: diverging from same intercept; Income & race main effects, and interaction: diverging from different intercepts; Income & race main effects, and interaction: disordinal]
Implied pattern based on NHANES III data
if main effects of race and IPR are omitted
BW = f (NHB_IPR) specification forces
• The intercept to be the same for black and white infants
• The slope of IPR/birth weight curve for white infants to be
zero (flat)
• The estimated slope of IPR/birth weight curve for black
infants to be negative
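A toy example can show how a negative slope might arise under the interaction-only specification even when birth weight rises with IPR within each group. The eight records below are made up for illustration (they are not NHANES III data), and `simple_ols` is a hand-written one-predictor least-squares helper rather than a library routine.

```python
def simple_ols(x, y):
    """Return (intercept, slope) from a one-predictor least-squares fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Made-up records of (NHB, IPR, BW in grams). In both groups birth weight
# rises 50 g per unit of IPR; the intercept for black infants is 300 g lower.
records = [(0, ipr, 3300 + 50 * ipr) for ipr in (1, 2, 3, 4)]
records += [(1, ipr, 3000 + 50 * ipr) for ipr in (1, 2, 3, 4)]

# Interaction-only specification: the sole predictor is NHB * IPR,
# which is 0 for every white infant.
x = [nhb * ipr for nhb, ipr, _ in records]
y = [bw for _, _, bw in records]
b0, b_nhb_ipr = simple_ols(x, y)

# For comparison: the slope among black infants alone.
black_ipr = [ipr for nhb, ipr, _ in records if nhb == 1]
black_bw = [bw for nhb, _, bw in records if nhb == 1]
_, slope_black = simple_ols(black_ipr, black_bw)

print(round(b0, 1), round(b_nhb_ipr, 1), slope_black)
```

In this toy data the interaction-only slope is about −71 g per unit of IPR although the within-group slope is +50 g. Because the shared intercept is pulled toward the higher mean for whites, the fitted line must slope downward to reach the lower birth weights of black infants.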
Observed pattern based on model of NHANES III
data with main effects and interaction terms
• BW = f (NHB, IPR, NHB_IPR) specification estimates
– Different intercepts for blacks and for whites
– Different slopes for blacks and for whites
• Slopes for both racial/ethnic groups are positive
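[Figure annotations: slope for non-Hispanic white infants = βIPR; intercept for non-Hispanic black infants = β0 + βNHB]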
Summary
• Models intended to test for interactions should
initially include all main effects and interaction terms
for the independent variables involved
• Such a specification
– Does not impose a priori assumptions about the shape of
the association among the IVs and DV
– Allows the data to reveal the shape and size of that pattern
• Empirical criteria can be used to simplify the
specification if βs for some term(s) are not
statistically significantly different from one another
Suggested resources
• Miller, J. E. 2013. The Chicago Guide to Writing
about Multivariate Analysis, 2nd Edition.
University of Chicago Press, chapter 16.
Suggested online resources
• Podcasts on
– Visualizing shapes of interaction patterns
– Creating variables and specifying models to test for interactions
– Calculating the shape of an interaction pattern from
regression coefficients
• Two categorical independent variables
• One categorical and one continuous independent variable
– Testing whether a multivariate specification can be simplified
Suggested practice exercises
• Using your own data, estimate the following models for an
interaction between two categorical independent variables
– Main effects only
– Main effects and interactions
– Interaction terms only (omit the associated main effects terms)
• Using a spreadsheet, calculate and graph the implied overall
pattern of the association between the two IVs involved in the
interaction and your DV for EACH of the three specifications
– See spreadsheet template
• Repeat the exercise for an interaction between one
categorical and one continuous independent variable
