SS10.2 - University of Chicago Press

Planning how to create
the variables you need
from the variables you have
Jane E. Miller, PhD
The Chicago Guide to Writing about Numbers, 2nd edition.
Overview
• Why researchers sometimes need to create new
variables to conduct their analysis
• Why it is important to plan ahead for how to create
those new variables
• What information is required to identify the new
variables needed for the research question
• How to write clear instructions on how to get from
the variables you have to the variables you need
Why create new variables?
• For many statistical analyses, variables available on
the original data set are not yet in the form needed
to address the research question of interest.
• Examples:
– You want to study total family income, but the data set has
separate variables measuring income components such as
earned income, government benefits, and alimony.
– You want to compare outcomes for age groups (children,
working age adults, and the elderly), but the data set
reports respondent’s age in single years.
Conceptualizing the new variable
should precede programming it
• Important to separate
– Researching and planning how those variables
should be defined
– Programming the new variable in an electronic
database
• Each of those tasks
– Has its own challenging aspects
– Uses different
• Skills
• Resources
Some common patterns of creating
new from existing variables
• A categorical version of a continuous variable
• A simplified (collapsed) categorical variable
• A binary indicator from a continuous variable
• A new continuous variable that combines 2+
continuous variables
• A mathematical transformation of a continuous
variable
A categorical version
of a continuous variable
• Original variable
– Age in years (continuous)
• Needed variable
– Age group (categorical)
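A minimal sketch of this pattern in Python. The slides name the age groups (children, working-age adults, the elderly) but not the boundaries, so the cutoffs of 18 and 65 and the function name `age_group` are assumptions made for illustration.

```python
from typing import Optional

# Assumed cutoffs (not given in the slides): under 18, 18 to 64, 65 and older.
CHILD_UPPER = 18
WORKING_AGE_UPPER = 65


def age_group(age_years: Optional[float]) -> Optional[str]:
    """Classify age in single years into one of three age groups."""
    if age_years is None:
        return None  # a missing age stays missing in the new variable
    if age_years < 0:
        raise ValueError("age cannot be negative")
    if age_years < CHILD_UPPER:
        return "child"
    if age_years < WORKING_AGE_UPPER:
        return "working-age adult"
    return "elderly"


ages = [4, 17, 18, 64, 65, None]
print([age_group(a) for a in ages])
```

Writing each boundary as "less than the next cutoff" means every non-negative age falls into exactly one group, with no gaps between categories.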
A simplified (collapsed)
categorical variable
• Original variable
– Ten-category ethnicity variable
• Needed variable
– Three-category ethnicity variable
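Collapsing categories can be sketched as a lookup table. The slides do not list the ten original categories or say how they should be grouped, so the numeric codes and the grouping in `COLLAPSE` below are purely hypothetical placeholders; a real mapping should come from the codebook and the literature.

```python
from typing import Optional

# Hypothetical recode map: original codes 1-10 -> collapsed codes 1-3.
COLLAPSE = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2, 7: 3, 8: 3, 9: 3, 10: 3}


def collapse_category(original_code: Optional[int]) -> Optional[int]:
    """Map a ten-category code onto a three-category code."""
    if original_code is None:
        return None  # missing on the original stays missing on the new variable
    if original_code not in COLLAPSE:
        # Fail loudly rather than silently dropping an unplanned value.
        raise ValueError(f"unmapped original code: {original_code!r}")
    return COLLAPSE[original_code]


# Every original code should map to exactly one collapsed code.
print(sorted(set(COLLAPSE.values())))
```

Listing every original code in the table, and raising an error for anything else, is one way to check that no original value is left unclassified.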
A binary indicator
from a continuous variable
• Original variable
– Birth weight in grams (continuous)
• Needed variable
– Indicator of low birth weight status (yes or no)
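As a sketch, a binary indicator needs only a threshold and a comparison. The slides do not state the cutoff; the code below assumes the conventional definition of low birth weight as less than 2,500 grams, and the 1/0 coding and the name `low_birth_weight` are illustrative choices.

```python
from typing import Optional

# Assumed threshold: the conventional cutoff, not stated in the slides.
LBW_THRESHOLD_GRAMS = 2500


def low_birth_weight(weight_grams: Optional[float]) -> Optional[int]:
    """Return 1 (yes) if birth weight is below the threshold, else 0 (no)."""
    if weight_grams is None:
        return None  # unknown birth weight gives an unknown indicator
    return 1 if weight_grams < LBW_THRESHOLD_GRAMS else 0


print([low_birth_weight(w) for w in (1800, 2499, 2500, 3400, None)])
```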
A new continuous variable that
aggregates 2+ continuous variables
• Original variable(s)
– Separate measures of income for each family member
• New variable
– Total family income
• Original variable(s)
– Multiple attitudinal items
• New variable
– A composite attitudinal scale
A new continuous variable calculated
from 2+ continuous variables
• Original variable(s)
– Separate measures of county-level population and poverty rate
• New variable
– Number of poor persons in the county = population × (% poor / 100)
• Original variable(s)
– Separate measures of weight (kg) and height (meters)
• New variable
– Body Mass Index = weight / height^2
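Both calculations can be sketched as small functions. The function names are illustrative, and the first one assumes the poverty rate is stored as a percentage (0 to 100), so it is divided by 100 before multiplying; if the data set stores a proportion instead, that division should be dropped.

```python
def number_poor(population: float, percent_poor: float) -> float:
    """Number of poor persons = population x (% poor / 100)."""
    if not 0 <= percent_poor <= 100:
        raise ValueError("percent_poor should be a percentage from 0 to 100")
    return population * percent_poor / 100


def body_mass_index(weight_kg: float, height_m: float) -> float:
    """BMI = weight in kilograms divided by the square of height in meters."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


print(number_poor(200_000, 12.5))            # 25000.0
print(round(body_mass_index(70, 1.75), 1))   # 22.9
```

Stating the units in the docstrings matters here: entering height in centimeters rather than meters would shrink the BMI by a factor of 10,000.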
A mathematical transformation
of a continuous variable
• Original variable(s)
– Income in dollars
• New variable
– Logged income
• Original variable(s)
– Income in dollars
• New variable
– Income in thousands of dollars
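A sketch of the two transformations. The slides say only "logged income", so the use of the natural logarithm is an assumption, as is the choice to treat zero or negative incomes as missing (the logarithm is undefined for them). The function names are illustrative.

```python
import math
from typing import Optional


def logged_income(income_dollars: Optional[float]) -> Optional[float]:
    """Natural log of income in dollars (assumed base; the slides do not say)."""
    if income_dollars is None or income_dollars <= 0:
        # Assumption: the log is undefined at or below zero, so treat as missing.
        return None
    return math.log(income_dollars)


def income_thousands(income_dollars: Optional[float]) -> Optional[float]:
    """Rescale income from dollars to thousands of dollars."""
    if income_dollars is None:
        return None
    return income_dollars / 1000


print(income_thousands(52_500))            # 52.5
print(round(logged_income(52_500), 2))
```

The rescaling changes only the units, which is why the label for the new variable should say "thousands of dollars".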
Planning steps for creating new variables
• Finding relevant variables in the original data set
• Becoming acquainted with the units and categories
for available variables
• Consulting the published literature on the topic to
see how those concepts have been measured or
classified by other researchers
• Identifying pertinent formulas and thresholds
• Writing out the logic or math needed to create the
new variables from existing variables
Steps toward creating a new variable
1. Identify the name(s) of the original variable(s) in the
data set that contain the data needed to create the
new variable.
2. For the new variable, devise
– A name (acronym) to convey
• Content (meaning) of the new variable
• The dates or survey rounds when the data were
collected, if pertinent
– A label (short descriptive phrase) for the new variable
• Mention units, if pertinent
For new continuous variables
• Write the formula to calculate the value of the
new variable from the original variables.
• Specify the units of the original variable(s) and
the new variable.
Example: Calculating course grades
from component test scores
• For a hypothetical college course, the overall course
grade is based on three exam scores
– Two mid-term exams (EXAM1 and EXAM2)
• Each scored from 0 to 25 points
– A final exam (FINAL)
• Scored from 0 to 50 points
• For each student, the instructor wants to calculate
– The percentage of questions s/he got correct on exam 1
– Total numeric course grade
– Course letter grade, based on standard grade cutoffs
Calculating percentage of exam questions
correct from number of questions correct
• Logic: From the information in the data set, how does one
calculate the percentage of questions correct?
• Concepts: Percentage of questions correct is number of
questions correct divided by the total number of questions on
the exam, multiplied by 100.
• Formula: Replace concepts with names of variables:
– STEP 1: Identify existing variables, already in data set, from which
new variable will be calculated: EXAM1
– STEP 2: Name for new variable, not yet in data set: PCCOREX1
– STEP 3: Write the mathematical formula:
PCCOREX1 = (EXAM1/25) * 100
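The formula translates directly into code. This sketch keeps the slide's variable names; the range check and the handling of a missing score as `None` are additions for illustration, not part of the slide.

```python
from typing import Optional

EXAM1_MAX_POINTS = 25  # from the example: each mid-term is scored 0 to 25


def pccorex1(exam1: Optional[float]) -> Optional[float]:
    """PCCOREX1 = (EXAM1 / 25) * 100, the percentage correct on exam 1."""
    if exam1 is None:
        return None  # no exam score, so no percentage
    if not 0 <= exam1 <= EXAM1_MAX_POINTS:
        raise ValueError(f"EXAM1 should be between 0 and {EXAM1_MAX_POINTS}")
    return (exam1 / EXAM1_MAX_POINTS) * 100


print(pccorex1(25))  # 100.0
```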
Creating a variable for total numeric course
grade from exam scores
• Logic: From the information in the data set, how does
one calculate total numeric course grade?
• Concepts: Overall numeric course grade is the sum of
the three exam scores.
• Formula: Replace concepts with names of variables:
– STEP 1: Identify existing variables, already in data set, from which
new variable will be calculated: EXAM1, EXAM2, FINAL
– STEP 2: Name for new variable, not yet in data set: TOTGRADE
– STEP 3: Write the mathematical formula:
TOTGRADE = EXAM1 + EXAM2 + FINAL
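A sketch of the same formula in Python, using the score ranges from the course example (two mid-terms of 0 to 25 points, a final of 0 to 50 points). The range checks are an illustrative addition; they catch data-entry errors before they reach the new variable.

```python
def totgrade(exam1: float, exam2: float, final: float) -> float:
    """TOTGRADE = EXAM1 + EXAM2 + FINAL, in points (0 to 100)."""
    limits = (("EXAM1", exam1, 25), ("EXAM2", exam2, 25), ("FINAL", final, 50))
    for name, score, max_points in limits:
        if not 0 <= score <= max_points:
            raise ValueError(f"{name} should be between 0 and {max_points}")
    return exam1 + exam2 + final


print(totgrade(20, 22, 41))  # 83
```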
For new categorical variables
• Write the logical steps to classify the values of the
original variable into the values of the new variable.
• Show how every possible value of the original
variable maps into a value of the new variable.
• List the
– Value label (descriptive phrase) for each value (category)
of the new variable;
– Code (numeric value) that the new variable will take on for
each value or set of values of the original variable.
Classifying numeric course grades
into letter grade ranges
STEP 1: Identify existing variables from which new variable will be created.
– TOTGRADE (variable label: Numeric course grade)
STEP 2: Name for new variable, not yet in data set.
– LETTRGRD (variable label: Final letter grade)

Values of original variable    Values (codes) of new variable    Value labels
<60                            1                                 F
60 TO 69                       2                                 D
70 TO 79                       3                                 C
80 TO 89                       4                                 B
90 OR HIGHER                   5                                 A

STEP 3: Write the logic for classifying the numeric scores into
letter grade ranges, based on the university’s standard grade
cutoffs. E.g., scores below 60 are classified as an “F.”
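The classification table can be sketched as a list of bands checked from the top down. The codes, labels, and cutoffs come from the table; the function name `lettrgrd` and the decision to compare against lower bounds only (so that a fractional score such as 69.5 still lands in a category) are assumptions made for illustration.

```python
from typing import Optional, Tuple

# (lower bound, code, value label), highest band first.
GRADE_BANDS = [
    (90, 5, "A"),
    (80, 4, "B"),
    (70, 3, "C"),
    (60, 2, "D"),
    (0, 1, "F"),
]


def lettrgrd(totgrade: Optional[float]) -> Optional[Tuple[int, str]]:
    """Classify a numeric course grade (0 to 100) into (code, letter grade)."""
    if totgrade is None:
        return None  # missing numeric grade gives a missing letter grade
    if not 0 <= totgrade <= 100:
        raise ValueError("TOTGRADE should be between 0 and 100")
    for lower_bound, code, label in GRADE_BANDS:
        if totgrade >= lower_bound:
            return code, label
    return None  # unreachable for valid input; kept for completeness


print([lettrgrd(score) for score in (59, 60, 79, 89, 90, 100)])
```

Because the bands cover the whole 0 to 100 range without gaps, every possible value of the original variable maps into exactly one value of the new variable.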
Missing values for the new variable
• Provide instructions to ensure that cases that
have missing values on the original variables
will also have missing values for new variables
that are based on them.
• Needed whether the new variable was
created using
– A formula
– Classification instructions
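One way to sketch this rule is a small helper that applies a formula only when none of its inputs is missing. The helper name `derive`, the use of `None` to stand for a missing value, and the sample scores are illustrative assumptions; statistical packages each have their own missing-value conventions.

```python
def derive(formula, *inputs):
    """Apply formula to inputs, or return None if any input is missing."""
    if any(value is None for value in inputs):
        return None
    return formula(*inputs)


students = [
    {"EXAM1": 20, "EXAM2": 22, "FINAL": 41},
    {"EXAM1": 18, "EXAM2": None, "FINAL": 37},  # second mid-term is missing
]

for student in students:
    student["TOTGRADE"] = derive(
        lambda exam1, exam2, final: exam1 + exam2 + final,
        student["EXAM1"], student["EXAM2"], student["FINAL"],
    )

print([student["TOTGRADE"] for student in students])  # [83, None]
```

Without a check like this, a missing score could be silently treated as zero, giving the second student a total of 55 instead of a missing value.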
Summary
• It is often necessary to create new variables to
answer one’s research question.
• Planning steps for creating new variables include
– Identifying source variables available in a data set
– Finding references about how such variables are
conventionally analyzed
– Becoming familiar with units or categories of the variables
– Writing formulas or classification instructions to create the
new variables from the original variables
– Providing instructions about missing values for the original
and new variables
Summary, cont.
• With the formulas and classification
instructions for creating the new variables,
one can then use a spreadsheet or statistical
software to create those variables within an
electronic data set.
• Separate
– The researching and planning steps
– The programming steps
Suggested resources
• Miller, J. E. 2015. The Chicago Guide to Writing about
Numbers, 2nd Edition. University of Chicago Press,
chapter 10.
Suggested practice exercises
Instructions and a planning template can be
downloaded from the supplemental online materials
at http://press.uchicago.edu/books/miller/numbers/index.htm
NAME of original variable ______________________ → NAME of new variable _______________________
LABEL for original variable ______________________ → LABEL for new variable _______________________

Values of original variable    Values (codes) of new variable    Value labels of new variable
Suggested online appendixes
• How to Create the Variables You Need from the
Variables You Have
– Exercise includes
• Step-by-step instructions
• A template planning grid for a new categorical variable
– Paper for instructors on how to teach the concepts and skills
• Getting to Know Your Variables
– Exercise to familiarize researchers with the concepts, units,
categories of variables in their data set
– Paper for instructors on how to teach the concepts and skills
Contact information
Jane E. Miller, PhD
jmiller@ifh.rutgers.edu
Online materials available at
http://press.uchicago.edu/books/miller/numbers/index.html