SS9.2

Criteria for choosing
a reference category
Jane E. Miller, PhD
The Chicago Guide to Writing about Multivariate Analysis, 2nd edition.
Overview
• What is a reference category?
– For independent variables (IVs)
– For the dependent variable (DV)
• Choosing reference categories based on:
– Theoretical criteria
– Previous literature on the topic
– Writing patterns
– Sample size
– Joint distribution of variables
What is a reference category?
• For each nominal or ordinal variable, the
reference category is the one against which all
other categories of that variable will be
compared.
• A multivariate model specification will not
include a dummy variable for that category.
– Sometimes called the “omitted” category.
• Choice of a reference category for each
categorical variable in your model should NOT
be arbitrary.
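To make the "omitted" category concrete, here is a minimal sketch of dummy coding in plain Python. The helper name `dummy_code` and the treatment-arm data are hypothetical, chosen only for illustration. Note that no indicator is created for the reference category.

```python
def dummy_code(values, reference):
    """Return one 0/1 indicator column per non-reference category."""
    categories = sorted(set(values))
    if reference not in categories:
        raise ValueError(f"reference {reference!r} not found in data")
    return {
        f"is_{cat}": [1 if v == cat else 0 for v in values]
        for cat in categories
        if cat != reference  # the reference category gets no dummy
    }


# Hypothetical data: treatment arm for six study participants.
arm = ["placebo", "drug_a", "drug_b", "placebo", "drug_a", "placebo"]
dummies = dummy_code(arm, reference="placebo")
print(dummies)
```

Cases in the reference category are the ones coded 0 on every dummy, which is why each coefficient is read as a contrast with that group.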
Multivariate coefficients
and the reference category
• OLS coefficients will estimate the difference in
the mean of the DV for each of the other
categories, compared to the reference category.
• Logit models will estimate log-odds coefficients;
exponentiating them gives odds ratios of the
outcome for each of the other categories of the
IV compared to the reference category.
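As an illustrative sketch of the OLS case: in a model with a single categorical IV and no other covariates, the intercept equals the reference-group mean and each dummy coefficient equals that group's mean minus the reference mean. The function name and the outcome values below are hypothetical; with additional covariates the coefficients are adjusted differences and would need a real regression routine.

```python
from statistics import mean


def ols_dummy_coefficients(groups, outcomes, reference):
    """Intercept and dummy coefficients for a one-IV OLS model.

    Valid only for a single categorical predictor with no other covariates,
    where the OLS fit reduces to group means.
    """
    by_group = {}
    for g, y in zip(groups, outcomes):
        by_group.setdefault(g, []).append(y)
    intercept = mean(by_group[reference])
    coefs = {
        g: mean(ys) - intercept for g, ys in by_group.items() if g != reference
    }
    return intercept, coefs


# Hypothetical outcome values for two treatment arms.
groups = ["placebo", "placebo", "drug", "drug"]
outcomes = [10.0, 12.0, 15.0, 17.0]

# Same data, two choices of reference category.
print(ols_dummy_coefficients(groups, outcomes, reference="placebo"))
print(ols_dummy_coefficients(groups, outcomes, reference="drug"))
```

Switching the reference category changes the intercept and flips the sign of the contrast, but the fitted group means are the same.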
Choosing a reference category
based on theoretical criteria
• Your specific research question will often
determine choice of reference category. E.g.,
– If you are analyzing effects of a drug compared to
placebo, the placebo condition is the logical
reference category.
– If you are comparing other states to your home
state, your home state should be the reference
category.
Choosing a reference category
based on prior literature
• If previous studies of your topic follow a
standard convention for the reference category,
you will often use the same one.
– Doing so facilitates comparison of results.
• BUT, it is important to think through whether
their choice fits your study.
– Identify the reasons why others have chosen that
reference category.
– Check those reasons against your own.
Choosing a different reference
category than the prior literature
• If you have strong reasons to use a different
reference category than a major study of your
topic:
– In your methods section, explain the theoretical or
empirical basis for choosing a different reference
category.
– In the discussion section, translate your results to
compare against the same reference category as
other leading studies.
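One way to do that translation, sketched under stated assumptions: for coefficients on an additive scale (OLS coefficients or logit log-odds), the contrast against a new reference category is the old coefficient minus the old coefficient of the new reference. The function name, region labels, and coefficient values are hypothetical. Standard errors cannot be converted this way; they require the coefficient covariance matrix or re-estimating the model.

```python
def rebase_coefficients(coefs, old_reference, new_reference):
    """Re-express additive-scale coefficients against a new reference.

    coefs maps each non-reference category to its coefficient relative
    to old_reference. Odds ratios would be divided rather than subtracted.
    """
    shift = coefs[new_reference]
    rebased = {
        cat: b - shift for cat, b in coefs.items() if cat != new_reference
    }
    rebased[old_reference] = -shift  # old reference now gets a coefficient
    return rebased


# Hypothetical coefficients estimated with "northeast" as the reference.
coefs = {"south": 2.0, "west": -1.0, "midwest": 0.5}
print(rebase_coefficients(coefs, "northeast", "south"))
```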
Choosing a reference category
based on writing patterns
• If your sentences tend to read “compared to
group X,” then group X should be your reference
category.
• Doing so will ensure that your statistical
calculations are consistent with how you will
write about the results.
• But also consider:
– Empirical criteria for sample size
– Precedent in the literature
Choosing a reference category
based on sample size
• Lacking some other basis for selecting a reference
category, choose the largest (modal) group.
– Doing so generally gives the most precise estimates
(smallest standard errors) of the contrasts with the
reference category.
• Sometimes this will mesh with theoretical criteria, as
when the majority racial/ethnic group is chosen as the
reference category.
• Sometimes, your “natural” reference category includes
very few cases.
– Might need to pick a different group to provide stable
statistical estimates.
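The sample-size rule can be sketched as a small helper: keep the preferred ("natural") category if it has enough cases, otherwise fall back to the modal group. The function name and the `min_cases=30` threshold are assumptions for illustration, not a cutoff taken from the slides.

```python
from collections import Counter


def pick_reference(values, preferred=None, min_cases=30):
    """Choose a reference category, falling back to the modal group.

    min_cases is a hypothetical threshold; set it to suit your data.
    """
    counts = Counter(values)
    if preferred is not None and counts[preferred] >= min_cases:
        return preferred
    return counts.most_common(1)[0][0]  # largest (modal) category


# Hypothetical sample: group "b" has too few cases to serve as reference.
values = ["a"] * 50 + ["b"] * 5 + ["c"] * 40
print(pick_reference(values))                 # no preference: modal group
print(pick_reference(values, preferred="b"))  # too small: falls back
print(pick_reference(values, preferred="c"))  # large enough: kept
```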
Choosing reference categories based
on joint distribution of variables
• The overall reference category for a multivariate
regression model is the combination of
reference categories for each of your categorical
variables.
• Be sure that that combination isn’t too rare.
– E.g., teenagers with at least a college degree will be
pretty unusual (if not definitionally impossible!), so
don’t pick teenagers as the reference category for
age and college+ as the reference category for
education.
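A quick check along these lines is to count how many cases fall in the reference category of every variable at once. The sketch below uses a hypothetical six-person sample and a hypothetical helper name; in practice you would run the same count on your own data before fitting the model.

```python
def count_joint_reference(rows, references):
    """Count cases that fall in the reference category of every variable."""
    return sum(
        all(row[var] == ref for var, ref in references.items())
        for row in rows
    )


# Hypothetical sample: two teenagers (neither with college+) and four adults.
sample = [
    {"age": "teen", "education": "<HS"},
    {"age": "teen", "education": "HS"},
    {"age": "25-44", "education": "HS"},
    {"age": "25-44", "education": "HS"},
    {"age": "25-44", "education": "college+"},
    {"age": "45+", "education": "HS"},
]

rare = count_joint_reference(sample, {"age": "teen", "education": "college+"})
common = count_joint_reference(sample, {"age": "25-44", "education": "HS"})
print(rare, common)
```

A count at or near zero signals that the combined reference group is too sparse to anchor the comparisons.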
Reference category
for dependent variables
• If you are analyzing a categorical dependent
variable, you also need to decide which category
to model, and which category is omitted.
• If the DV is dichotomous (2-category),
– You will model one category.
– The other will be the omitted category of the DV.
• E.g., if you model having health insurance, then
being uninsured is the reference category.
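The choice of which DV category to model determines the direction of the results. As a sketch with hypothetical counts, the odds ratio for being insured is the reciprocal of the odds ratio for being uninsured, computed from the same two-by-two table.

```python
def odds_ratio(events_group, nonevents_group, events_ref, nonevents_ref):
    """Odds of the modeled outcome in a group relative to the reference group."""
    return (events_group / nonevents_group) / (events_ref / nonevents_ref)


# Hypothetical counts: group A has 80 insured and 20 uninsured;
# reference group B has 60 insured and 40 uninsured.
or_insured = odds_ratio(80, 20, 60, 40)    # modeling "insured"
or_uninsured = odds_ratio(20, 80, 40, 60)  # modeling "uninsured"
print(or_insured, or_uninsured, or_insured * or_uninsured)
```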
Reference category for a
multichotomous dependent variable
• If the DV is multichotomous (N-category),
– You will separately model (N – 1) categories.
– The other category will be the omitted category, for which no
model is estimated.
• E.g., if type of health insurance is a 4-category variable,
– You will estimate separate models for 3 (= 4 – 1) of those
categories.
• For instance, you might model having public insurance, self-pay,
and uninsured.
– The other category (in this case private insurance) is the
reference category.
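To illustrate the (N – 1) structure, consider the simplest case of a multinomial logit with no IVs: each modeled category then has one intercept, the log of its count divided by the reference category's count. The counts and function name below are hypothetical, and a model with IVs would need a proper estimation routine.

```python
import math


def baseline_logits(counts, reference):
    """Intercept-only baseline-category logits: log(n_k / n_reference)."""
    return {
        cat: math.log(n / counts[reference])
        for cat, n in counts.items()
        if cat != reference  # no equation for the omitted category
    }


# Hypothetical counts for a 4-category insurance variable.
counts = {"private": 500, "public": 250, "self_pay": 50, "uninsured": 200}
logits = baseline_logits(counts, reference="private")
print(len(logits), logits)
```

Four categories yield three equations, each a contrast with private insurance.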
Summary
• Choice of a reference category for each categorical
variable in your model should NOT be arbitrary.
• Consider the following criteria when selecting a
reference category for each of your variables:
– Theoretical criteria
– Previous literature
– Writing patterns
– Sample size
– Joint distribution of variables in your data
• Use the same criteria for choosing a reference category
for the DV as for IVs.
Suggested resources
• Miller, J. E. 2013. The Chicago Guide to Writing
about Multivariate Analysis, 2nd Edition.
– Chapter 8, section on choosing a reference category
– Chapter 9, section on interpreting coefficients on
categorical variables
Suggested practice exercises
• Study guide to The Chicago Guide to Writing
about Multivariate Analysis, 2nd Edition.
– Questions #3e and 8e in the problem set for chapter
9
– Suggested course extensions for
• Chapter 8
– “Applying statistics” exercise #2
• Chapter 9
– “Reviewing” exercise #1
Contact information
Jane E. Miller, PhD
jmiller@ifh.rutgers.edu
Online materials available at
http://press.uchicago.edu/books/miller/multivariate/index.html