Resolving the Goldilocks problem: Variables and measurement
Jane E. Miller, PhD
The Chicago Guide to Writing about Multivariate Analysis, 2nd edition.

Overview
• Identifying criteria for choosing fitting contrasts for each variable
• Understanding conceptual and contextual aspects of your variables
• Becoming familiar with the distributions of your variables
• Transforming variables
• Describing your variables in the methods section

Criteria for choosing suitably sized contrasts for each of your variables
• Theoretical criteria
• Empirical criteria
• Measurement issues

Theoretical criteria for choosing fitting contrasts
• Theoretical criteria relate to how that concept is measured and compared in the literature or real-world context.
• Examples:
  – Multiples of the poverty level that correspond with program eligibility criteria for that place and time.
  – Multiples of standard deviations of weight-for-height, based on international child growth standards.

Identifying theoretical criteria for your topic
• Start by reading the literature to identify which criteria pertain to each of your
  – Independent variables (IVs)
  – Dependent variable (DV)
• Also identify real-world factors pertaining to your variables, e.g.,
  – Physical properties (e.g., freezing point of water)
  – Clinically meaningful contrasts
  – Socially relevant contrasts

Empirical criteria for choosing fitting contrasts
• Based on the observed distribution of values in your data.
• Examples:
  – Multiples of standard deviations
    • Comparing values at the mean and ±1 standard deviation in the IV.
  – Interquartile range
    • Comparing values at the 25th and 75th percentiles of the IV.
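
The two empirical criteria above can be sketched in a few lines of Python. This is only an illustration: the data values and the function name `empirical_contrasts` are made up for the example, and the percentile method is the default of Python's `statistics.quantiles`, which may give slightly different quartiles than other software.

```python
import statistics


def empirical_contrasts(values):
    """Return contrasts based on the observed distribution of one IV."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    # Quartile cut points: 25th, 50th, and 75th percentiles.
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "mean_minus_1sd": mean - sd,
        "mean": mean,
        "mean_plus_1sd": mean + sd,
        "p25": q1,
        "p75": q3,
        "iqr": q3 - q1,
    }


# Hypothetical IV values (say, income in $1,000s); not from any real data set.
iv_values = [12, 18, 21, 25, 30, 34, 41, 47, 55, 72]
contrasts = empirical_contrasts(iv_values)

for name, value in contrasts.items():
    print(f"{name}: {value:.2f}")
```

A contrast of one standard deviation or one interquartile range can then be used in place of a one-unit contrast when interpreting the coefficient on that IV.
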

When to use empirical criteria
• Best used if theoretical criteria are not available for your topic.
• Or possibly to compare with other studies that have used the same criteria.

Measurement issues and choice of contrast size
• For some variables, a one-unit contrast is too small to be measured accurately.
• Examples:
  – It is difficult for most individuals to accurately recall their annual income to the nearest dollar.
  – It is difficult to measure blood pressure to the nearest 1 mm Hg (millimeter of mercury).
• In such situations, use a larger contrast.

Getting to know your variables

Understanding the context
• Become familiar with the range of values that make sense for each of your variables:
  – When, where, and to whom the data pertain.
• E.g., pertinent values for family income will be different:
  – Now versus 200 years ago.
  – In the US versus in a developing country today.
  – For a low-income sample of the US versus the entire US population.

Understanding conceptual attributes of your measures
• Become familiar with the ranges of values that make sense for each of your variables.
  – A birth weight of 9,999 grams is too high
    • ≈22 lb., which is the size of an average 12-month-old!
  – In this case, problems arose due to ignoring
    • System of measurement (metric, not British)
    • Units
    • Real-world meaning of the number.

Identifying the valid theoretical range of values
• Different types of measures have different valid ranges:
  – Proportions must fall between 0.0 and 1.0.
  – Temperature in degrees Fahrenheit can be either positive or negative, but temperature in kelvins cannot be negative.
  – Number of children in a family has a narrower theoretical range than does annual family income.
• Identify the pertinent limits for each of your variables.

Examining the range of observed values
• Examine the distributions of the variables in your data set to become familiar with the
  – Units
  – Range
  – Distribution of values
  – Categories
    • Of nominal variables
    • Of ordinal versions of continuous variables

Identifying variables for which a 1-unit contrast is not suitable
• Based on your theoretical, contextual, and empirical investigations of each variable in your model, identify those for which
  – A one-unit contrast is too big
    • E.g., those with low values or a very narrow range
  – A one-unit contrast is too small
    • E.g., those with very high values or a wide range
  – A one-unit contrast is just right
• See podcast on defining the Goldilocks problem

Defining variables to address the Goldilocks problem
• Many Goldilocks issues can be addressed by modifying one or more variables before specifying the multivariate model:
  – Rescaling
  – Using a different level of aggregation
  – Creating a categorical version of a continuous variable.

Transforming your variables
• These transformations can:
  – Make a one-unit increase in Xi align better with the research question.
  – Shift the scale of the βs to be more consistent across the set of variables in the model.
• For any of these approaches, retain the original variable and create a new variable with the transformed version.
  – Never overwrite the original data!
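
As a rough sketch of "retain the original, create a new variable," the Python below adds a rescaled and a categorical version of two variables without touching the source values. The records, field names (`income`, `income_10k`, `age`, `age_group`), and helper functions are hypothetical. The $10,000 scale and the age ranges (0–17, 18–64, 65+) are the ones used as examples in these slides.

```python
# Hypothetical records; field names and values are illustrative only.
records = [
    {"id": 1, "income": 23500, "age": 9},
    {"id": 2, "income": 61000, "age": 42},
    {"id": 3, "income": 38750, "age": 71},
]


def age_group(age):
    """Categorical version of age using the ranges 0-17, 18-64, 65+."""
    if age < 18:
        return "0-17"
    if age < 65:
        return "18-64"
    return "65+"


def add_transformed(record):
    """Return a copy of the record with new variables added.

    The original 'income' and 'age' fields are kept unchanged.
    """
    new_record = dict(record)  # copy, so the input record is never modified
    new_record["income_10k"] = record["income"] / 10_000  # income in $10,000s
    new_record["age_group"] = age_group(record["age"])
    return new_record


transformed = [add_transformed(r) for r in records]

for row in transformed:
    print(row)
```

Keeping both versions side by side also makes it easier to report units and cutoffs accurately in the methods section.
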

Rescaling your variables
• For some research questions, a simple change of scale can help make a one-unit contrast in the independent variable align better with the research question.
• For example, working with
  – annual income in $10,000s instead of $1s.
  – ozone concentration in parts per thousand instead of parts per million.

Rescaling and the decimal system
• Rescaling variables involves dividing or multiplying the original variable by some value.
• Often a multiple of ten, e.g.,
  – Multiply by 1,000
  – Divide by 100
• Although changing the scale of a variable by an order of magnitude or two is mathematically convenient, it is also arbitrary and in many cases unrelated to the topic or data under study.
  – E.g., increments of 10 or 100 days don’t correspond to common usage as well as increments of 7 or 30 or 365 days.

Changing the level of aggregation
• An alternative way to make the scale of variables fit better with a one-unit increase is to change the level of aggregation.
  – If a one-unit change in the original variable is too small, shift to a lower level of aggregation, e.g.,
    • weekly income instead of annual income;
    • population at the county instead of state level.
  – If a one-unit change is too large, shift to a higher level of aggregation, e.g.,
    • cost per dozen instead of per piece.

Creating a categorical version of a continuous variable
• For topics for which standard ranges or cutoffs are commonly used, consider creating a categorical version of a continuous variable.
  E.g.,
  – Age ranges that relate to developmental, economic, social, or health phenomena
    • 0–17 years (children), 18–64 years, 65+ years
  – Clinically meaningful ranges of blood pressure
    • <120 mm Hg; 120–139 mm Hg; 140+ mm Hg

Describing exploratory work in your methods section
• In the methods section, describe the behind-the-scenes work you did to address Goldilocks issues.
• Explain the reasons for those transformations given your research question and data.
  – Exploratory analysis of distributions of your variables in your data set.
  – Background reading on commonly used cutoffs or calculations for the variables you are using.

Defining newly created variables in your methods section
• If you transformed variables or created categorical versions of continuous variables,
  – Report units and levels of aggregation for all transformed variables. E.g.,
    • Income in $10,000s.
    • Logged(income in $1s).
  – Specify cutoffs used to define categories. E.g.,
    • Ranges of BMI used to define overweight or obesity.
    • Poverty thresholds (multiples of the Federal Poverty Level) for different years or household compositions.

Summary
• Transforming one or more of your variables before specifying your multivariate model can
  – Make a one-unit increase in each independent variable align better with the research question.
  – Shift the scale of the βs to be more consistent across independent variables in the model.
• In your methods section, describe
  – Exploratory data analysis to become familiar with observed values and distributions of each variable in your model.
  – The calculations and criteria used to create new variables.
  – Citations for those criteria and calculations.

Suggested resources
• Miller, J. E.
  2013. The Chicago Guide to Writing about Multivariate Analysis, 2nd Edition.
  – Chapter 10, on the Goldilocks problem
  – Chapter 4, on types of variables, units and distribution
  – Chapter 7, on choosing effective examples
  – Chapter 13, on the data and methods section

Suggested online resources
• Podcasts on
  – Defining the Goldilocks problem
  – Resolving the Goldilocks problem using
    • Model specification
    • Effective ways of presenting results

Suggested practice problems
• Study guide to The Chicago Guide to Writing about Multivariate Analysis, 2nd Edition.
  – Problem sets for
    • chapter 7, question #6
    • chapter 10, questions #1 through 5.

Suggested extensions
• Study guide to The Chicago Guide to Writing about Multivariate Analysis, 2nd Edition.
  – Suggested course extensions for
    • chapter 4
      – “Reviewing” questions #1 and 3.
    • chapter 10
      – “Reviewing” exercises #1 and 2.
      – “Applying statistics and writing” questions #1, 2, 3, and 5.
      – “Revising” questions #1, 2, 3, and 9.
    • chapter 13, “writing” exercises #3 and 4.
  – “Getting to know your variables” assignment

Contact information
Jane E. Miller, PhD
jmiller@ifh.rutgers.edu
Online materials available at http://press.uchicago.edu/books/miller/multivariate/index.html