SS16.3b

Creating variables and specifying models to
test for interactions involving
continuous independent variables
Jane E. Miller, PhD
The Chicago Guide to Writing about Multivariate Analysis, 2nd edition.
Overview
• Creating variables for an interaction between one
categorical and one continuous variable.
• Aside on missing values.
• Specifying a model to test an interaction between one
categorical and one continuous variable.
• Creating variables for an interaction between two
continuous independent variables.
• Specifying a model to test an interaction between two
continuous independent variables.
Interaction between a continuous and
a categorical independent variable (IV)
• Example: Race and income-to-poverty ratio.
– Race is a 2-category IV classified as
• non-Hispanic black (NHB),
• non-Hispanic white (NHW).
– IPR is a continuous variable calculated as annual
family income (in $) divided by the Federal Poverty
Level for a family of that size and age composition.
• IPR ranges from 0 to more than 10 in this sample.
• Federal Poverty Level for a family of 2 adults and 2
children in 2010 was about $22,000
Coding of variables
• The NHB main effect variable is defined as in the
previous example (of categorical by categorical
interaction).
• 1 = non-Hispanic black.
• 0 = all others (the reference category); in this example, non-Hispanic white.
• However, for a continuous variable like income that
takes on many possible numeric values, it doesn’t make
sense to create a lot of dummy variables.
• Instead, use income-poverty ratio in its continuous form.
Calculating an interaction term from a dummy
and a continuous main effects term
• The value of the interaction term variable is
defined as the product of the two component
main effects variables:
X1_X2 = X1 × X2
– Result will be one continuous interaction term
variable.
• Thus NHB_IPR is the product of NHB and IPR.
– If NHB = 1 and IPR = 2.3 then the interaction term
NHB_IPR = 2.3
– If NHB = 0 and IPR = 2.3, then NHB_IPR = 0
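The product rule above can be sketched in Python as follows. This is a minimal illustration rather than the author's own code: the function name `make_interaction` is hypothetical, and the only values used are the two cases from the slide (NHB = 1 or 0, with IPR = 2.3).

```python
def make_interaction(nhb, ipr):
    """Return the interaction term NHB_IPR as the product of its two main effects."""
    return nhb * ipr

# The two cases from the slide: same IPR, different value of the race dummy.
nhb_ipr_black = make_interaction(1, 2.3)  # NHB = 1
nhb_ipr_white = make_interaction(0, 2.3)  # NHB = 0 (reference category)

print(nhb_ipr_black)
print(nhb_ipr_white)
```

Because the dummy is 0 for the reference category, the interaction term is 0 for every non-Hispanic white case, whatever the value of IPR.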
Coding of main effects and interaction
term variables: race and IPR
Case characteristics         Main effects terms    Interaction term
(selected values)            NHB       IPR         NHB_IPR
Non-H white & IPR = 0.5      0         0.5         0
Non-H white & IPR = 1.0      0         1.0         0
Non-H white & IPR = 2.0      0         2.0         0
Non-H white & IPR = 5.0      0         5.0         0
Non-H black & IPR = 0.5      1         0.5         0.5
Non-H black & IPR = 1.0      1         1.0         1.0
Non-H black & IPR = 2.0      1         2.0         2.0
Non-H black & IPR = 5.0      1         5.0         5.0
E.g., IPR = 0.5 means income is half the Federal Poverty Level (FPL); IPR =
2.0 means income is twice the FPL.
For a two-category race variable (non-Hispanic white = reference category).
Coding of race and IPR variables:
Non-Hispanic white infants
Case characteristics         Main effects terms    Interaction term
                             NHB       IPR         NHB_IPR
Non-H white & IPR = 0.5      0         0.5         0
Non-H white & IPR = 1.0      0         1.0         0
Non-H white & IPR = 2.0      0         2.0         0
Non-H white & IPR = 5.0      0         5.0         0
E.g., IPR = 0.5 means income is half the Federal Poverty Level (FPL); IPR =
2.0 means income is twice the FPL.
For a two-category race variable (non-Hispanic white = reference category).
Coding of race and IPR variables:
Non-Hispanic black infants
Case characteristics         Main effects terms    Interaction term
                             NHB       IPR         NHB_IPR
Non-H black & IPR = 0.5      1         0.5         0.5
Non-H black & IPR = 1.0      1         1.0         1.0
Non-H black & IPR = 2.0      1         2.0         2.0
Non-H black & IPR = 5.0      1         5.0         5.0
E.g., IPR = 0.5 means income is half the Federal Poverty Level (FPL); IPR =
2.0 means income is twice the FPL.
For a two-category race variable (non-Hispanic white = reference category).
Aside: Missing values
• For each new variable created, the new variable should
take on a missing value if the original source variable
was missing for a given case.
• Need to specify this as an extra step for IF/THEN logic
such as that used in creating the dummies.
– E.g., IF RACE = . THEN NHB =.;
– In the statistical package SAS, “.” is the code for missing.
• For variables created using arithmetic, if any
component source variable is missing, the result of the
calculation will also be missing.
– E.g., if IPR =., then NHB_IPR will also be missing.
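The two situations in this aside can be sketched in Python. This is an illustration under stated assumptions, not SAS behavior reproduced exactly: `float("nan")` stands in for the SAS missing code ".", a missing source value is represented by `None`, and the race codes "NHB" and "NHW" and both function names are hypothetical.

```python
import math

MISSING = float("nan")  # assumed stand-in for the SAS missing code "."

def make_nhb(race):
    """Dummy for non-Hispanic black, built with IF/THEN logic."""
    if race is None:
        # Extra step: a missing source value must be carried over explicitly,
        # otherwise the case would fall into the 0 (reference) category.
        return MISSING
    return 1 if race == "NHB" else 0

def make_nhb_ipr(nhb, ipr):
    """Interaction term built with arithmetic; NaN propagates on its own."""
    return nhb * ipr

print(make_nhb(None))           # missing race gives a missing dummy
print(make_nhb_ipr(1, MISSING)) # missing IPR gives a missing interaction term
```

Without the explicit check in `make_nhb`, a case with unknown race would be silently coded 0 and counted in the reference category.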
Model specification to test an interaction
between one continuous and one
categorical independent variable
• For a model with an interaction between two
independent variables, include ALL of the
main effects and interaction term variables
related to those two independent variables.
• E.g., for a model of birth weight by race and IPR,
include the main effect and interaction terms
related to race and family income-to-poverty ratio:
– BW = f (NHB, IPR, NHB_IPR)
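A sketch of how the three terms work together, assuming f is a linear regression equation (the slide does not spell out the functional form). The coefficient values are arbitrary placeholders in arbitrary units, not estimates from any data, and the names `predicted_bw` and `ipr_slope` are hypothetical.

```python
# Arbitrary placeholder coefficients, NOT estimates from any data.
B0 = 10.0        # intercept
B_NHB = -2.0     # main effect of the race dummy
B_IPR = 1.0      # main effect of IPR
B_NHB_IPR = 0.5  # interaction term

def predicted_bw(nhb, ipr):
    """Predicted outcome from main effects plus interaction (assumed linear form)."""
    nhb_ipr = nhb * ipr
    return B0 + B_NHB * nhb + B_IPR * ipr + B_NHB_IPR * nhb_ipr

def ipr_slope(nhb):
    """Change in the predicted outcome per one-unit increase in IPR, by group."""
    return B_IPR + B_NHB_IPR * nhb

print(ipr_slope(0))  # slope for the reference category: main effect only
print(ipr_slope(1))  # slope for NHB = 1: main effect plus interaction coefficient
```

Dropping either main effect would force the corresponding intercept or slope for the reference category to zero, which is why all three variables belong in the specification.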
Independent variables:
continuous by continuous interaction
• Mother’s age at time of child’s birth, years
– One continuous variable for the main effect: age
• Family income to poverty ratio, in multiples of
the Federal Poverty Level
– One continuous variable for the main effect: IPR
• Interaction: Mother’s age and IPR
– Age_IPR = age × IPR
– Resulting interaction term variable will also be
continuous
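The product Age_IPR = age × IPR can be computed for a grid of selected values. The sketch below is illustrative only; the function name `make_age_ipr` is hypothetical, and the selected ages (15, 25, 30) and IPR values (0.5, 1.0, 2.0) are the ones used in this slide set.

```python
from itertools import product

def make_age_ipr(age, ipr):
    """Interaction term for two continuous variables: their product."""
    return age * ipr

ages = [15, 25, 30]
iprs = [0.5, 1.0, 2.0]

# One row per (age, IPR) combination: main effects plus the interaction term.
rows = [(age, ipr, make_age_ipr(age, ipr)) for age, ipr in product(ages, iprs)]

for age, ipr, age_ipr in rows:
    print(age, ipr, age_ipr)
```

Unlike the dummy-by-continuous case, the interaction term here is nonzero for every case in which both age and IPR are nonzero.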
Coding of main effects and interaction
term variables: age and IPR
Case characteristics         Main effects terms    Interaction term
(selected values)            Age       IPR         Age_IPR
Age = 15 & IPR = 0.5         15        0.5         7.5
Age = 15 & IPR = 1.0         15        1.0         15.0
Age = 15 & IPR = 2.0         15        2.0         30.0
Age = 25 & IPR = 0.5         25        0.5         12.5
Age = 25 & IPR = 1.0         25        1.0         25.0
Age = 25 & IPR = 2.0         25        2.0         50.0
Age = 30 & IPR = 0.5         30        0.5         15.0
Age = 30 & IPR = 1.0         30        1.0         30.0
Age = 30 & IPR = 2.0         30        2.0         60.0
E.g., IPR = 0.5 means income is half the Federal Poverty Level (FPL); IPR =
2.0 means income is twice the FPL.
Model specification to test an interaction
between two continuous variables
• For a model with an interaction between two
independent variables, include ALL of the
main effects and interaction term variables
related to those two independent variables.
• E.g., for a model of birth weight with an
interaction between age and IPR, include the
main effect and interaction terms related to
family income-to-poverty ratio and mother’s age:
– BW = f (IPR, age, Age_IPR)
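With two continuous variables, the association of one with the outcome differs at every value of the other. The sketch below assumes a linear regression form for f and uses arbitrary placeholder coefficients that are not estimates from any data; the name `ipr_slope_at_age` is hypothetical.

```python
# Arbitrary placeholder coefficients, NOT estimates from any data.
B_IPR = 1.0       # main effect of IPR
B_AGE_IPR = 0.25  # coefficient on the Age_IPR interaction term

def ipr_slope_at_age(age):
    """Change in the predicted outcome per one-unit increase in IPR at a given age."""
    return B_IPR + B_AGE_IPR * age

for age in (15, 25, 30):
    print(age, ipr_slope_at_age(age))
```

In an uncentered specification like this one, the IPR main effect coefficient alone is the IPR slope at age 0, a value outside the range of mothers' ages.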
Centered continuous
independent variables
• The example shown here used uncentered
continuous variables:
– Age in years
– Income-to-poverty ratio (IPR)
• Could use centered versions in the interactions
– Create a version of the variable centered at its mean
• E.g., CENTAGE = AGE - mean age
– Where mean age is the sample mean
– Then use CENTAGE as the main effect term
– and CENTAGE_IPR (CENTAGE × IPR) as the interaction term
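Centering can be sketched as follows. The four-case sample is a made-up toy data set chosen so the mean is a round number; it is not drawn from the study data, and the lowercase variable names simply mirror CENTAGE and CENTAGE_IPR.

```python
from statistics import mean

# Toy sample for illustration only (hypothetical cases).
ages = [15, 25, 30, 30]
iprs = [0.5, 1.0, 2.0, 1.0]

mean_age = mean(ages)  # sample mean of age

# Centered main effect: each mother's age minus the sample mean.
centage = [age - mean_age for age in ages]

# Interaction term built from the centered version of age.
centage_ipr = [c * ipr for c, ipr in zip(centage, iprs)]

print(mean_age)
print(centage)
print(centage_ipr)
```

A centered variable has a mean of zero, so the IPR main effect then refers to a mother of average age rather than a mother aged 0.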
Be parsimonious in deciding which
interactions to test
• As shown here, the number of variables in the
regression model proliferates rapidly with each
additional interaction.
• Specify interactions only between key
independent variables.
• Communicating results becomes unwieldy:
– Considerable behind-the-scenes calculations.
– Extra tables or charts to convey the shape of the
interaction.
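To give a rough sense of the proliferation, the sketch below counts how many two-way product terms are possible among k independent variables. It assumes each independent variable is represented by a single variable (continuous, or one dummy); categorical variables with more categories need more terms. The function name `n_two_way_terms` is hypothetical.

```python
from math import comb

def n_two_way_terms(k):
    """Number of distinct pairs among k single-variable IVs, one product term each."""
    return comb(k, 2)

for k in (2, 3, 5, 10):
    terms = n_two_way_terms(k)
    # Total variables in the model = main effects + all two-way interaction terms.
    print(k, terms, k + terms)
```

Testing every possible pair is rarely warranted, which is the motivation for restricting interactions to key independent variables.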
Summary
• A model specification to test for interactions includes both
main effects and interaction terms.
– Combination of those terms in the model uniquely identifies
each possible combination of values of the component
variables.
• Number and type of interaction terms needed depends on
– Type(s) of variables in the interaction.
– Number of categories, for categorical variables in the interaction.
• For most situations, test interactions among key variables
only. For criteria to help you decide which interactions to
test for your topic and data, see the podcast on visualizing
shapes of interaction patterns.
Suggested resources
• Miller, J. E., 2013. The Chicago Guide to Writing
about Multivariate Analysis, 2nd Edition.
– Chapter 16, on interactions
– Chapter 9, on defining dummy variables
– Chapter 8, on choice of reference category
• Chapters 8 and 9 of Cohen et al. 2003. Applied
Multiple Regression/Correlation Analysis for the
Behavioral Sciences, 3rd Edition. Florence, KY:
Routledge.
Suggested online resources
• Podcasts on
– Introduction to interactions
– Visualizing shapes of interaction patterns between
categorical by continuous variables
– Choosing a reference category
Suggested practice exercises
• Study guide to The Chicago Guide to Writing
about Multivariate Analysis, 2nd Edition.
– Suggested course extensions for Chapter 16
• “Reviewing” exercises #2, 3 and 4.
• “Applying statistics and writing” exercises #1, 2, and 3.
• “Revising” exercises #1 and 3.
Contact information
Jane E. Miller, PhD
jmiller@ifh.rutgers.edu
Online materials available at
http://press.uchicago.edu/books/miller/multivariate/index.html