Criteria for choosing a reference category
Jane E. Miller, PhD
The Chicago Guide to Writing about Multivariate Analysis, 2nd edition.

Overview
• What is a reference category?
  – For independent variables (IVs)
  – For the dependent variable (DV)
• Choosing reference categories based on:
  – Theoretical criteria
  – Previous literature on the topic
  – Writing patterns
  – Sample size
  – Joint distribution of variables

What is a reference category?
• For each nominal or ordinal variable, the reference category is the one against which all other categories of that variable will be compared.
• A multivariate model specification will not include a dummy variable for that category.
  – Hence it is sometimes called the “omitted” category.
• Choice of a reference category for each categorical variable in your model should NOT be arbitrary.

Multivariate coefficients and the reference category
• OLS coefficients will estimate the difference in the DV for each of the other categories, compared to the reference category.
• Logit models will estimate odds ratios (the exponentiated logit coefficients) of the outcome for each of the other categories of the IV, compared to the reference category.

Choosing a reference category based on theoretical criteria
• Your specific research question will often determine your choice of reference category. E.g.,
  – If you are analyzing effects of a drug compared to placebo, the placebo condition is the logical reference category.
  – If you are comparing other states to your home state, your home state should be the reference category.
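To make the slides on dummy variables and on interpreting coefficients concrete, here is a minimal Python sketch. It assumes a model with an intercept and a single categorical IV. In that special case the OLS and logit estimates can be read directly off group means, so no model-fitting library is needed: the OLS intercept is the reference-group mean, each dummy coefficient is that group's mean minus the reference mean, and each odds ratio is that group's observed odds divided by the reference group's odds. The function names and the placebo/drug data are hypothetical illustrations, not from the book.

```python
from collections import defaultdict


def dummy_code(values, reference):
    """Build 0/1 indicators for every category except the reference.

    The reference category gets no dummy: it is the "omitted" category.
    """
    categories = sorted(set(values))
    if reference not in categories:
        raise ValueError(f"reference category {reference!r} not in data")
    return {
        cat: [1 if v == cat else 0 for v in values]
        for cat in categories
        if cat != reference
    }


def group_means(values, y):
    """Mean of y within each category of a single categorical IV."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, outcome in zip(values, y):
        sums[v] += outcome
        counts[v] += 1
    return {v: sums[v] / counts[v] for v in sums}


def ols_contrasts(values, y, reference):
    """Intercept and dummy coefficients of an OLS model whose only IV is
    this categorical variable: intercept = reference-group mean, and each
    coefficient = that group's mean minus the reference-group mean."""
    means = group_means(values, y)
    base = means[reference]
    return base, {c: m - base for c, m in means.items() if c != reference}


def odds_ratios(values, y01, reference):
    """Odds ratios from a logit whose only IV is this categorical variable:
    each group's observed odds divided by the reference group's odds.
    Assumes no group has an outcome proportion of exactly 0 or 1."""
    p = group_means(values, y01)
    odds = {c: pc / (1 - pc) for c, pc in p.items()}
    return {c: odds[c] / odds[reference] for c in odds if c != reference}


# Hypothetical toy data: placebo is the reference category.
arm = ["placebo"] * 4 + ["drug"] * 4
score = [10, 12, 11, 13, 15, 17, 16, 14]
improved = [0, 0, 1, 1, 1, 1, 1, 0]

print(dummy_code(arm, "placebo"))             # {'drug': [0, 0, 0, 0, 1, 1, 1, 1]}
print(ols_contrasts(arm, score, "placebo"))   # (11.5, {'drug': 4.0})
print(odds_ratios(arm, improved, "placebo"))  # {'drug': 3.0}
```

Passing "drug" as the reference instead flips every contrast: the placebo coefficient becomes -4.0 and its odds ratio becomes 1/3. This is why the choice of reference category shapes how the results read.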
Choosing a reference category based on prior literature
• If previous studies of your topic follow a standard convention for the reference category, you will often use the same one.
  – Doing so facilitates comparison of results.
• BUT, it is important to think through whether their choice fits your study.
  – Identify the reasons why others have chosen that reference category.
  – Check those reasons against your own.

Choosing a different reference category than the prior literature
• If you have strong reasons to use a different reference category than a major study of your topic:
  – In your methods section, explain the theoretical or empirical basis for choosing a different reference category.
  – In the discussion section, translate your results to compare against the same reference category as other leading studies.

Choosing a reference category based on writing patterns
• If your sentences tend to read “compared to group X,” then group X should be your reference category.
• Doing so will ensure that your statistical calculations are consistent with how you will write about the results.
• But see also the criteria based on:
  – Sample size (empirical criteria)
  – Precedent in the literature

Choosing a reference category based on sample size
• Lacking some other basis for selecting a reference category, choose the largest (modal) group.
  – Doing so maximizes statistical power for estimating coefficients.
• Sometimes this will mesh with theoretical criteria, as when the majority racial/ethnic group is chosen as the reference category.
• Sometimes your “natural” reference category includes very few cases.
  – You might need to pick a different group to provide stable statistical estimates.
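The sample-size rule and the advice to translate results to another study's reference category can be sketched in Python. This is a minimal illustration under stated assumptions. The names choose_reference, rebase_ols, and rebase_odds_ratios are hypothetical helpers, and the min_n=30 cutoff is an arbitrary placeholder, not a threshold from the text. The rebasing uses the fact that, with dummy coding, the contrast between categories j and k is b_j − b_k for OLS coefficients and OR_j / OR_k for odds ratios. Only point estimates are converted this way.

```python
from collections import Counter


def choose_reference(values, preferred=None, min_n=30):
    """Pick a reference category for one categorical variable.

    Keep the theoretically `preferred` category if it has at least `min_n`
    cases; otherwise fall back to the largest (modal) group. The cutoff of
    30 is an arbitrary placeholder, not a rule from the text.
    """
    counts = Counter(values)
    if preferred is not None and counts[preferred] >= min_n:
        return preferred
    modal, _ = counts.most_common(1)[0]
    return modal


def rebase_ols(coefs, old_reference, new_reference):
    """Re-express OLS dummy coefficients against a different reference.

    `coefs` maps each non-reference category to its coefficient (the old
    reference is implicitly 0). Only point estimates are converted;
    standard errors would also need the coefficient covariance matrix.
    """
    full = {old_reference: 0.0, **coefs}
    shift = full[new_reference]
    return {c: b - shift for c, b in full.items() if c != new_reference}


def rebase_odds_ratios(ors, old_reference, new_reference):
    """Same idea for logit odds ratios: divide instead of subtract."""
    full = {old_reference: 1.0, **ors}
    base = full[new_reference]
    return {c: r / base for c, r in full.items() if c != new_reference}


# Hypothetical example variable with a small "natural" reference group.
region = ["urban"] * 50 + ["suburban"] * 20 + ["rural"] * 5

print(choose_reference(region))                     # urban
print(choose_reference(region, preferred="rural"))  # urban (rural has only 5 cases)
print(rebase_ols({"b": 2.0, "c": 5.0}, "a", "c"))   # {'a': -5.0, 'b': -3.0}
```

Falling back to the modal group mirrors the slide's default. Making the threshold a parameter leaves the judgment about what counts as "very few cases" to the analyst.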
Choosing reference categories based on joint distribution of variables
• The overall reference category for a multivariate regression model is the combination of reference categories for each of your categorical variables.
• Be sure that combination isn’t too rare.
  – E.g., teenagers with at least a college degree will be pretty unusual (if not definitionally impossible!), so don’t pick teenagers as the reference category for age and college+ as the reference category for education.

Reference category for dependent variables
• If you are analyzing a categorical dependent variable, you also need to decide which category to model and which category to omit.
• If the DV is dichotomous (2-category),
  – You will model one category.
  – The other will be the omitted category of the DV.
• E.g., if you model having health insurance, then being uninsured is the reference category.

Reference category for a multichotomous dependent variable
• If the DV is multichotomous (N-category),
  – You will separately model (N – 1) categories.
  – The remaining category will be the omitted category, for which no model is estimated.
• E.g., if type of health insurance is a 4-category variable,
  – You will estimate separate models for 3 (= 4 – 1) of those categories.
    • For instance, you might model having public insurance, self-pay, and uninsured.
  – The remaining category (in this case private insurance) is the reference category.

Summary
• Choice of a reference category for each categorical variable in your model should NOT be arbitrary.
• Consider the following criteria when selecting a reference category for each of your variables:
  – Theoretical
  – Previous literature
  – Writing patterns
  – Sample size
  – Joint distribution of variables in your data
• Use the same criteria for choosing a reference category for the DV as for IVs.

Suggested resources
• Miller, J. E. 2013. The Chicago Guide to Writing about Multivariate Analysis, 2nd Edition.
  – Chapter 8, section on choosing a reference category
  – Chapter 9, section on interpreting coefficients on categorical variables

Suggested practice exercises
• Study guide to The Chicago Guide to Writing about Multivariate Analysis, 2nd Edition.
  – Questions #3e and #8e in the problem set for chapter 9
  – Suggested course extensions for
    • Chapter 8: “Applying statistics” exercise #2
    • Chapter 9: “Reviewing” exercise #1

Contact information
Jane E. Miller, PhD
jmiller@ifh.rutgers.edu
Online materials available at http://press.uchicago.edu/books/miller/multivariate/index.html
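The joint-distribution check and the N – 1 rule for a categorical DV, described in the slides above, can also be sketched in Python. This is a minimal illustration under assumptions. The helper names, the one-dictionary-per-respondent data layout, the min_n=30 cutoff, and the toy rows are all hypothetical. They were chosen only to mirror the teenager/college+ and health-insurance examples.

```python
def joint_reference_count(rows, references):
    """Count cases that sit in the reference category of EVERY listed
    variable at once, i.e. the model's overall reference group."""
    return sum(
        all(row[var] == ref for var, ref in references.items())
        for row in rows
    )


def check_joint_reference(rows, references, min_n=30):
    """Return (count, ok); ok is False when the combination is too rare.
    The min_n cutoff is an arbitrary placeholder."""
    n = joint_reference_count(rows, references)
    return n, n >= min_n


def modeled_categories(categories, reference):
    """For an N-category DV, list the N - 1 categories that are modeled,
    each contrasted with the omitted reference category."""
    if reference not in categories:
        raise ValueError(f"reference category {reference!r} not in DV")
    return [c for c in categories if c != reference]


# Hypothetical rows: age group and education for each respondent.
rows = (
    [{"age": "teen", "educ": "<HS"}] * 40
    + [{"age": "30-44", "educ": "college+"}] * 35
    + [{"age": "30-44", "educ": "<HS"}] * 10
)
print(check_joint_reference(rows, {"age": "teen", "educ": "college+"}))   # (0, False)
print(check_joint_reference(rows, {"age": "30-44", "educ": "college+"}))  # (35, True)

insurance = ["private", "public", "self-pay", "uninsured"]
print(modeled_categories(insurance, "private"))  # ['public', 'self-pay', 'uninsured']
```

The same helper covers a dichotomous DV. With ["insured", "uninsured"] and "uninsured" as the reference, only "insured" is modeled.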