Survey Methodology Survey data entry/cleaning

Survey Methodology
Survey data entry/cleaning
EPID 626
Lecture 10
To do or not to do:
Contracting the work
• During study planning, you should decide
whether to do the data entry,
management, and analysis yourself, or
whether to contract with someone else to
do it
• What are the advantages and
disadvantages?
• When might you want to? When might you
not want to?
Contracting
• Advantages
– Specialized expertise
– Potential access to a national network
of personnel
– Reduction of load on study personnel
– Third party (without financial or
professional stake in results) increases
legitimacy of the results
Contracting
• Disadvantages
– Generally more expensive
• Is this true? Discuss profits vs. expertise and
efficiency
– Lose direct control over quality of data and
study conduct
– May be more difficult to interpret data
without having done the analysis
DIY: Now what?
• Data analysis plan
• Data entry
• Data diagnostics
• Data cleaning
• Data setup
Data analysis plan (DAP)
• Design from the protocol and the survey
instrument
– Note: they may be discrepant
• Aim:
– Resolve discrepancies before you start
working with the data
– Establish a clear plan for data
management and analysis
DAP elements
• Summarize methods
• For each survey objective, identify and
describe the relevant variables
• Identify the analysis methods
– Software
– Statistical methods, tests, significance
levels, definitions
DAP elements (2)
• Describe plan for handling:
– missing values
– out-of-range values
– zeros if doing log transformations
– data collapsing
• Describe subgroup or by-group
analyses
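A minimal sketch of how the handling rules above might be written down as code once the DAP fixes them. The missing-value codes, the age range, the log offset, and the education cut points are all assumptions made for illustration; the real values come from the protocol and the survey instrument.

```python
import math

# Hypothetical rules a DAP might record; the codes and limits are assumptions.
MISSING_CODES = {-9, 999}
AGE_RANGE = (18, 99)


def clean_age(value):
    """Return (age, flag). Missing codes become None; out-of-range
    values are flagged for review, not silently changed."""
    if value is None or value in MISSING_CODES:
        return None, "missing"
    if not AGE_RANGE[0] <= value <= AGE_RANGE[1]:
        return value, "out_of_range"
    return value, "ok"


def log_with_offset(value, offset=1.0):
    """log(x + offset): one common convention for zeros before a log
    transformation. The offset is an assumption to state in the DAP."""
    return math.log(value + offset)


def collapse_education(years):
    """Collapse years of schooling into three groups (illustrative cut points)."""
    if years < 12:
        return "less than high school"
    if years == 12:
        return "high school"
    return "more than high school"
```

Writing the rules as small functions keeps each decision in one place, so a change to the DAP means a change to one function.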
DAP elements (3)
• Set up dummy tables and graphs
• Review this DAP carefully and pass it
around
Data entry
• Design a database that resembles the
survey instrument in layout and format
• Pretest it extensively
• Designer should be present at the
beginning of data entry to fix bugs
• Double data entry?
• Avoid necessity of interpretation by
entry personnel
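If double data entry is used, the two passes have to be compared field by field. The sketch below is one possible way to do that, assuming each pass is held as a mapping from respondent ID to field values; the function name and the example records are made up.

```python
def compare_entries(first_pass, second_pass):
    """Compare two independent keyings of the same forms.

    Each argument maps a respondent ID to a dict of field -> value.
    Returns a list of (respondent_id, field, first_value, second_value)
    for every field where the two passes disagree.
    """
    discrepancies = []
    for rid in sorted(set(first_pass) | set(second_pass)):
        a = first_pass.get(rid, {})
        b = second_pass.get(rid, {})
        for field in sorted(set(a) | set(b)):
            if a.get(field) != b.get(field):
                discrepancies.append((rid, field, a.get(field), b.get(field)))
    return discrepancies


# Made-up example: the age for respondent 101 was keyed differently.
entry_1 = {101: {"age": 34, "sex": "F"}, 102: {"age": 51, "sex": "M"}}
entry_2 = {101: {"age": 43, "sex": "F"}, 102: {"age": 51, "sex": "M"}}
mismatches = compare_entries(entry_1, entry_2)
```

Each mismatch is resolved by going back to the paper form, not by guessing which pass is right.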
You and Your Data
Your first eight
hours together
First things first
• Virus-check the files
• Write-protect original data
• Back up files and CRFs
– On-site: hard drives, diskettes, safes
– Off-site: safe deposit box
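A minimal sketch of the write-protect and backup steps for a single data file, using only the Python standard library. The function names are hypothetical, and the checksum comparison is an added safeguard that the slides do not prescribe.

```python
import hashlib
import os
import shutil
import stat


def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def protect_and_back_up(original, backup_dir):
    """Copy the file to backup_dir, verify the copy, then mark the
    original read-only. Returns the path of the backup."""
    os.makedirs(backup_dir, exist_ok=True)
    backup = os.path.join(backup_dir, os.path.basename(original))
    shutil.copy2(original, backup)
    if sha256_of(original) != sha256_of(backup):
        raise OSError("backup does not match original")
    # Remove all write permission bits from the original file.
    os.chmod(original, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return backup
```

Recording the checksum alongside the backup makes it possible to confirm later that neither copy has changed.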
First things first (2)
• Import data
– Error-prone; be very careful here
• Validate and verify the data
Validating and verifying data
• Run frequencies for categorical
variables
• Run univariate statistics for continuous
variables
• Examine key variables (those used in
the evaluation of primary objectives)
• Look at variables by group (sex, age,
etc)
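The first two checks above can be sketched with the standard library alone. The records below are invented to show the kind of problem that frequencies and univariate statistics tend to expose; any statistical package would do the same job.

```python
from collections import Counter
import statistics

# Made-up records; None marks a missing value.
records = [
    {"id": 1, "sex": "F", "age": 34},
    {"id": 2, "sex": "M", "age": 51},
    {"id": 3, "sex": "F", "age": 29},
    {"id": 4, "sex": "f", "age": 430},  # likely keying errors
    {"id": 5, "sex": "M", "age": None},
]


def frequencies(rows, var):
    """Count each distinct value of a categorical variable."""
    return Counter(row[var] for row in rows)


def univariate(rows, var):
    """Basic summary of a continuous variable, ignoring missing values."""
    values = [row[var] for row in rows if row[var] is not None]
    return {
        "n": len(values),
        "missing": len(rows) - len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


sex_counts = frequencies(records, "sex")   # exposes the stray "f" category
age_summary = univariate(records, "age")   # exposes the maximum of 430
```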
Validating and verifying data (2)
• Recode missing values
• Calculate checks for error-prone
variables
– Ex. Check dates against time-to variables
– Check anything that the interviewer had to
calculate, such as a total score
• Derive any key variables that need to be
calculated from other variables, and
verify them too
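The checks above might look like the following sketch. The missing-value code of 99 and the function names are assumptions; the date check assumes the time-to variable is recorded in whole days.

```python
from datetime import date

MISSING_CODE = 99  # hypothetical code used on the form for "missing"


def recode_missing(value, missing_code=MISSING_CODE):
    """Replace the form's missing-value code with None."""
    return None if value == missing_code else value


def total_matches(items, recorded_total):
    """True when the interviewer's total equals the sum of the item scores."""
    return sum(items) == recorded_total


def days_match(start, end, recorded_days):
    """True when a recorded time-to value (in days) equals the
    difference between the two dates it was derived from."""
    return (end - start).days == recorded_days
```

For example, `days_match(date(2020, 1, 1), date(2020, 1, 31), 30)` is true, while a recorded value of 31 for the same dates would be flagged for review.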
Validating and verifying data (3)
• Rearrange, combine, or separate
datasets as needed for analysis
– Ex. Split demographic data, primary
outcome, secondary outcome data
• Annotate a survey instrument with
variable names
• Create a data dictionary
– Include variable name, type, length, and
description or label
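A data dictionary can be drafted from the data and then completed by hand. The sketch below assumes the data are a list of records with identical keys and that labels are supplied separately; the function name, variables, and labels are invented for illustration.

```python
def build_data_dictionary(rows, labels):
    """Return one entry per variable: name, type, length, and label.

    Type and length are inferred from the non-missing values; length is
    the longest value when written as text.
    """
    dictionary = []
    for name in rows[0]:
        values = [row[name] for row in rows if row[name] is not None]
        types = sorted({type(v).__name__ for v in values})
        dictionary.append({
            "name": name,
            "type": "/".join(types) if types else "unknown",
            "length": max((len(str(v)) for v in values), default=0),
            "label": labels.get(name, ""),
        })
    return dictionary


# Made-up example data and labels.
rows = [{"id": 1, "sex": "F", "age": 34}, {"id": 2, "sex": "M", "age": None}]
labels = {"id": "Respondent ID", "sex": "Sex (F/M)", "age": "Age in years"}
data_dictionary = build_data_dictionary(rows, labels)
```

A variable whose inferred type shows more than one name, such as "int/str", is worth a closer look before analysis.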
Validating and verifying data (4)
• Look for obvious errors
– Ex. Spelling of medication or medical
condition
– Be very careful about correcting them
– Document any changes
– Think about a query system
– May need interviewer to resolve errors
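One way to document changes is to route every correction through a single function that writes an audit entry. This is a sketch under assumed names and record layout, not a full query system.

```python
def apply_correction(rows, log, rid, field, new_value, reason):
    """Change one value and append an audit entry with the old value,
    the new value, and the reason. Raises KeyError if the ID is absent."""
    for row in rows:
        if row["id"] == rid:
            log.append({
                "id": rid,
                "field": field,
                "old": row[field],
                "new": new_value,
                "reason": reason,
            })
            row[field] = new_value
            return
    raise KeyError(rid)


# Made-up example: a misspelled medication name in a working copy of the data.
working = [{"id": 1, "medication": "metformn"}]
change_log = []
apply_correction(working, change_log, 1, "medication", "metformin",
                 "spelling confirmed with interviewer")
```

The corrections are applied to a working copy; the write-protected original stays as it was received.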
Validating and verifying data (5)
• Run rough crosstabs for reference
– Ex. Number by sex, group, and age
– Use to track observations
• Create data listings
– Very useful for reference and to identify
problems in the data
• Check data coming from different
sources
– Be very careful with merging
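A careful merge reports what did not match instead of dropping it silently. The sketch below assumes a one-to-one merge on a respondent ID held under the key "id"; the function name and example records are hypothetical.

```python
def merge_checked(left, right, key="id"):
    """Merge two lists of records one-to-one on `key`.

    Returns (merged, left_only, right_only), where the last two are the
    IDs found in only one source. Raises ValueError on duplicate IDs.
    """
    def index(rows):
        out = {}
        for row in rows:
            if row[key] in out:
                raise ValueError(f"duplicate {key}: {row[key]}")
            out[row[key]] = row
        return out

    l, r = index(left), index(right)
    # If both sources share a non-key field, the right-hand value wins.
    merged = [{**l[k], **r[k]} for k in sorted(l.keys() & r.keys())]
    left_only = sorted(l.keys() - r.keys())
    right_only = sorted(r.keys() - l.keys())
    return merged, left_only, right_only


# Made-up example: one respondent is missing from each source.
demographics = [{"id": 1, "sex": "F"}, {"id": 2, "sex": "M"}]
outcomes = [{"id": 2, "score": 7}, {"id": 3, "score": 4}]
merged, demo_only, outcome_only = merge_checked(demographics, outcomes)
```

The two lists of unmatched IDs can be compared against the rough crosstabs to track down where observations went.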
Validating and verifying data (6)
• Aside: Variable naming
– Should be meaningful and descriptive
– But be careful about overly descriptive
names
• Long variable names are difficult to manipulate
• If meaning appears obvious, people won’t look
it up
• Back all of this up in the same way you
backed up the original data