SS16.3a

Creating variables and specifying models to test for interactions
between two categorical independent variables
Jane E. Miller, PhD
The Chicago Guide to Writing about Multivariate Analysis, 2nd edition.
Overview
• Creating variables for an interaction between
two categorical variables
– Review: dummy variables
– Review: reference categories
• Aside on missing values
• Specifying a model with an interaction between
two categorical variables
List of variables used in examples
• Dependent variable = birth weight in grams (BW).
• Independent variables:
– Main effects terms:
• Race
– Two nominal categories (non-Hispanic black; non-Hispanic white
is the reference category)
– One main effect dummy variable: NHB
» Coded 1 = non-Hispanic black, 0 = non-Hispanic white
• Mother’s education
– Three ordinal categories (<HS; =HS; >HS is the reference category)
– Two main effects dummies: <HS, =HS
» Each coded 1 = named category, 0 = all other values
List of variables, continued
• Interaction between race and mother’s
education
– Two interaction term dummies: NHB_<HS; NHB_=HS
• Each named using the “_” convention to link the names of
the component variables.
• Each coded 1 = named category, 0 = all other values
– E.g., NHB_<HS
= 1 for those who are both NHB and <HS,
= 0 for all other combinations of race and education
Interaction between two
categorical independent variables
• Example: Race and education
– Race is a 2-category independent variable classified as
• Non-Hispanic black (NHB)
• Non-Hispanic white (NHW) = reference category
– Mother’s educational attainment is a 3-category
independent variable classified as
• Less than complete high school (<HS)
• High school diploma, no higher (=HS)
• More than high school (>HS) = reference category
Coding of variables
• Each of the dummy (also known as “binary”)
variables will be coded
• 1 for each case that has the trait after which the
variable is named.
• 0 for all other cases.
• E.g., the dummy variable “NHB” will be coded
• 1 for all non-Hispanic black infants.
• 0 for all others (in this example, all non-Hispanic
white infants).
Reference category for an interaction
• Need a set of independent variables to uniquely
identify each possible combination of race and
mother’s educational attainment.
– With one 2-category variable and one 3-category
variable, there are six such combinations.
• Choose one category to be the basis of
comparison.
– The reference category.
• Define dummy variables to differentiate among
the other five categories.
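The counting above can be sketched in a few lines of Python. This is only an illustration: the string labels and the names RACE_CATEGORIES, EDUC_CATEGORIES, and REFERENCE are assumptions made for this sketch, not part of the original materials.

```python
from itertools import product

# Category labels follow the slides; the reference categories are
# non-Hispanic white (NHW) and more than high school (>HS).
RACE_CATEGORIES = ["NHW", "NHB"]
EDUC_CATEGORIES = ["<HS", "=HS", ">HS"]
REFERENCE = ("NHW", ">HS")

# Every possible combination of race and mother's education.
combinations = list(product(RACE_CATEGORIES, EDUC_CATEGORIES))

# The combinations that must be distinguished from the reference category.
non_reference = [combo for combo in combinations if combo != REFERENCE]

print(len(combinations))   # 6
print(len(non_reference))  # 5
```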
Possible combinations of race and mother’s educational attainment
                      Mother’s educational attainment
Race                  <HS      =HS      >HS
Non-Hispanic black
Non-Hispanic white                      Reference category
Source variables used to create
main effects and interaction terms
• Three source variables:
– A two-category race variable RACE coded 1 = non-Hispanic
white; 2 = non-Hispanic black
– A three-category education variable MOMED coded 1 = <HS;
2 = “=HS”; 3 = >HS
– A continuous income variable IPR, annual family income (in $)
divided by the Federal Poverty Level for a family of that size
and age composition
• On the next few slides,
– PINK = original (“source”) variable
– YELLOW = main effect term
– GREEN = interaction term
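A minimal Python sketch of how the main effects dummies could be derived from the source variables RACE and MOMED, using the codes listed above. The function name make_main_effects and the Python-safe names LT_HS and EQ_HS (standing in for <HS and =HS) are assumptions made for this illustration.

```python
# Source codes as given on the slide:
#   RACE:  1 = non-Hispanic white, 2 = non-Hispanic black
#   MOMED: 1 = <HS, 2 = =HS, 3 = >HS
def make_main_effects(race, momed):
    """Return the three main effects dummies for one case.

    Assumes race and momed are non-missing; missing values need
    separate handling.
    """
    return {
        "NHB": 1 if race == 2 else 0,     # reference: non-Hispanic white
        "LT_HS": 1 if momed == 1 else 0,  # stands in for <HS
        "EQ_HS": 1 if momed == 2 else 0,  # stands in for =HS; reference: >HS
    }

print(make_main_effects(race=2, momed=1))
# {'NHB': 1, 'LT_HS': 1, 'EQ_HS': 0}
print(make_main_effects(race=1, momed=3))
# {'NHB': 0, 'LT_HS': 0, 'EQ_HS': 0}
```

A case in both reference categories (non-Hispanic white, >HS) gets 0 on all three dummies.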
Coding of main effects and interaction terms: race/ethnicity and education
                        Main effects terms      Interaction terms
                        Race   Education        Race & educ
Case characteristics    NHB    <HS    =HS       NHB_<HS   NHB_=HS
Non-H white & <HS       0      1      0         0         0
Non-H white & =HS       0      0      1         0         0
Non-H white & >HS       0      0      0         0         0
Non-H black & <HS       1      1      0         1         0
Non-H black & =HS       1      0      1         0         1
Non-H black & >HS       1      0      0         0         0
Note: For a two-category race variable (non-Hispanic white = reference category)
and a three-category educational attainment variable (>HS = reference category).
Coding of main effects and interaction variables: non-Hispanic white infants
                        Main effects terms      Interaction terms
                        Race   Education        Race & educ
Case characteristics    NHB    <HS    =HS       NHB_<HS   NHB_=HS
Non-H white & <HS       0      1      0         0         0
Non-H white & =HS       0      0      1         0         0
Non-H white & >HS       0      0      0         0         0
Note: For a two-category race variable (non-Hispanic white = reference category)
and a three-category educational attainment variable (>HS = reference category).
Calculating an interaction term from
two dummy main effects terms
• By convention, the interaction term is named with an “_”
connecting the names of the two component variables.
– The interaction term between NHB and <HS is calculated as
NHB × <HS.
• Since both component main effects terms are coded 1
for the named group and 0 for all others, only when
both NHB and <HS = 1 is NHB_<HS = 1.
– A value of 1 for that interaction term identifies infants with
BOTH of those traits.
– E.g., for an infant who is NHW and <HS we have 0 × 1 = 0.
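The multiplication rule can be checked for all six combinations of race and education with a short Python sketch. The dictionary cases and the Python-safe names (lt_hs for <HS, eq_hs for =HS) are assumptions made for this illustration; the 0/1 values follow the coding described above.

```python
# Main effects dummies (NHB, <HS, =HS) for each of the six combinations.
cases = {
    "Non-H white & <HS": (0, 1, 0),
    "Non-H white & =HS": (0, 0, 1),
    "Non-H white & >HS": (0, 0, 0),
    "Non-H black & <HS": (1, 1, 0),
    "Non-H black & =HS": (1, 0, 1),
    "Non-H black & >HS": (1, 0, 0),
}

rows = {}
for label, (nhb, lt_hs, eq_hs) in cases.items():
    nhb_lt_hs = nhb * lt_hs  # NHB_<HS = NHB x <HS
    nhb_eq_hs = nhb * eq_hs  # NHB_=HS = NHB x =HS
    rows[label] = (nhb, lt_hs, eq_hs, nhb_lt_hs, nhb_eq_hs)
    print(f"{label}: NHB_<HS={nhb_lt_hs}, NHB_=HS={nhb_eq_hs}")

# Each combination has a distinct pattern across the five variables.
print(len(set(rows.values())) == len(rows))  # True
```

Only the two non-Hispanic black rows with <HS or =HS get a 1 on an interaction term; every other product includes a 0.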
Coding of main effects and interaction variables: non-Hispanic black infants
                        Main effects terms      Interaction terms
                        Race   Education        Race & educ
Case characteristics    NHB    <HS    =HS       NHB_<HS   NHB_=HS
Non-H black & <HS       1      1      0         1         0
Non-H black & =HS       1      0      1         0         1
Non-H black & >HS       1      0      0         0         0
Note: For a two-category race variable (non-Hispanic white = reference category)
and a three-category educational attainment variable (>HS = reference category).
Coding of main effects and interaction variables: race and educational attainment
                        Main effects terms      Interaction terms
                        Race   Education        Race & education
Case characteristics    NHB    <HS    =HS       NHB_<HS   NHB_=HS
Non-H white & <HS       0      1      0         0         0
Non-H white & =HS       0      0      1         0         0
Non-H white & >HS       0      0      0         0         0
Non-H black & <HS       1      1      0         1         0
Non-H black & =HS       1      0      1         0         1
Non-H black & >HS       1      0      0         0         0
Note: For a two-category race variable (non-Hispanic white = reference category)
and a three-category educational attainment variable (>HS = reference category).
Aside: Missing values
• For each new variable created, the new variable should
take on a missing value if the original source variable
was missing for a given case.
• This must be specified as an extra step when using IF/THEN
logic such as that used to create the dummies.
– E.g., IF RACE = . THEN NHB = .;
– In the statistical package SAS, “.” is the code for missing.
• For variables created using arithmetic, if any
component source variable is missing, the result of the
calculation will also be missing.
– E.g., if IPR = ., then IPR_NHB will also be missing.
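Both points can be illustrated with a small Python sketch. It is not SAS: NaN is used here as a stand-in for the “.” missing code, the function names make_nhb and make_nhb_naive are hypothetical, and IPR_NHB is assumed to be the product IPR × NHB.

```python
import math

MISSING = float("nan")  # stands in for SAS's "." in this sketch

def make_nhb(race):
    """Dummy for non-Hispanic black; RACE coded 1 = NHW, 2 = NHB."""
    if math.isnan(race):  # the extra step: carry missing through
        return MISSING
    return 1 if race == 2 else 0

def make_nhb_naive(race):
    """Same logic without the missing check, to show the pitfall."""
    return 1 if race == 2 else 0

print(make_nhb(2))              # 1
print(make_nhb(MISSING))        # nan
print(make_nhb_naive(MISSING))  # 0, i.e., wrongly coded as non-Hispanic white

# Arithmetic propagates missing values without any extra step.
ipr = MISSING
ipr_nhb = ipr * make_nhb(2)     # assumed definition: IPR_NHB = IPR x NHB
print(math.isnan(ipr_nhb))      # True
```

Without the explicit check, a case with missing race falls through to the "all others" branch and is silently coded 0.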
Be parsimonious in deciding which
interactions to test
• As shown here, the number of variables in the
regression model proliferates rapidly with each
additional interaction.
• Specify interactions only between key
independent variables.
• Communicating results becomes unwieldy:
– Considerable behind-the-scenes calculations.
– Extra tables or charts to convey the shape of the
interaction.
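To see how quickly the variable count grows, here is a rough Python sketch for an interaction between two categorical variables with k1 and k2 categories, each with one reference category. The function name count_terms is an assumption made for this illustration.

```python
def count_terms(k1, k2):
    """Count dummies needed for two categorical variables.

    k1, k2: number of categories of each variable, reference included.
    Returns (number of main effects dummies, number of interaction dummies).
    """
    main_effects = (k1 - 1) + (k2 - 1)
    interactions = (k1 - 1) * (k2 - 1)
    return main_effects, interactions

print(count_terms(2, 3))  # (3, 2): the race-by-education example
print(count_terms(4, 5))  # (7, 12)
```

Main effects grow additively with the number of categories, while interaction terms grow multiplicatively.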
Criteria for identifying pertinent
interactions to test
• Theoretical reasons why the association
between X1 and Y might differ by X2 for the
particular variables you are studying.
• Empirical evidence that the association between
X1 and Y varies by X2 in your data.
– Three-way association among X1, X2, and Y.
– See Babbie’s elaboration paradigm.
Model specification with
interactions: race and education
• BW = f (race, education, race_education)
– Birth weight is a function of race, education, and the
race-by-education interaction.
• To specify the model, you need ALL of the main
effects and interaction term variables related to
race and mother’s education:
• BW = f (NHB, <HS, =HS, NHB_<HS, NHB_=HS)
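Written out as an equation, and assuming a linear (ordinary least squares) form for f, which the slide does not state explicitly, the specification might look like the following. The coefficient symbols are introduced here for illustration only.

```latex
\[
\mathrm{BW} = \beta_0
  + \beta_1\,\mathrm{NHB}
  + \beta_2\,({<}\mathrm{HS})
  + \beta_3\,({=}\mathrm{HS})
  + \beta_4\,(\mathrm{NHB}\_{<}\mathrm{HS})
  + \beta_5\,(\mathrm{NHB}\_{=}\mathrm{HS})
  + \varepsilon
\]
```

Under that assumption, the predicted birth weight for the reference category (non-Hispanic white, >HS) is \beta_0, and for a non-Hispanic black infant whose mother has <HS it is \beta_0 + \beta_1 + \beta_2 + \beta_4, because NHB, <HS, and NHB_<HS all equal 1 for that case.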
Parsimonious specification
• Most interaction specifications should initially
include
– Main effects terms for all variables involved in the
interaction
– Interaction terms
• Might be able to omit some main effect or
interaction terms based on
– Theoretical criteria
– Empirical statistical significance tests for combining
groups
Summary
• A model specification to test for interactions includes both
main effects and interaction terms.
– Combination of those terms in the model uniquely identifies
each possible combination of values of the component
variables.
• The number and type of interaction terms needed depend on
– Type(s) of variables in the interaction.
– Number of categories, for categorical variables in the interaction.
• For most situations, test interactions among key variables
only. For criteria to help you decide which interactions to
test for your topic and data, see the podcast on visualizing
shapes of interaction patterns.
Suggested resources
• Miller, J. E., 2013. The Chicago Guide to Writing
about Multivariate Analysis, 2nd Edition.
– Chapter 16, on interactions
– Chapter 9, on defining dummy variables
– Chapter 8, on choice of reference category
• Chapters 8 and 9 of Cohen et al. 2003. Applied
Multiple Regression/Correlation Analysis for the
Behavioral Sciences, 3rd Edition. Florence, KY:
Routledge.
Suggested online resources
• Podcasts on
– Introduction to interactions
– Visualizing shapes of interaction patterns
– Choosing a reference category
Suggested practice exercises
• Study guide to The Chicago Guide to Writing
about Multivariate Analysis, 2nd Edition.
– Suggested course extensions for Chapter 16
• “Reviewing” exercises #2, 3 and 4.
• “Applying statistics and writing” exercises #1, 2, and 3.
• “Revising” exercises #1 and 3.
Contact information
Jane E. Miller, PhD
jmiller@ifh.rutgers.edu
Online materials available at
http://press.uchicago.edu/books/miller/multivariate/index.html