Survey Methodology
Survey data entry/cleaning
EPID 626 Lecture 10

To do or not to do: Contracting the work
• During study planning, you should decide whether to do the data entry, management, and analysis yourself, or whether to contract with someone else to do it
• What are the advantages and disadvantages?
• When might you want to contract? When might you not want to?

Contracting
• Advantages
  – Specialized expertise
  – Potential access to a national network of personnel
  – Reduced workload for study personnel
  – A third party (without a financial or professional stake in the results) increases the legitimacy of the results

Contracting (2)
• Disadvantages
  – Generally more expensive
    • Is this true? Discuss profits vs. expertise and efficiency
  – Loss of direct control over data quality and study conduct
  – Data may be more difficult to interpret when you have not done the analysis yourself

DIY: Now what?
• Data analysis plan
• Data entry
• Data diagnostics
• Data cleaning
• Data setup

Data analysis plan (DAP)
• Design the DAP from the protocol and the survey instrument
  – Note: the two may be discrepant
• Aims:
  – Resolve discrepancies before you start working with the data
  – Establish a clear plan for data management and analysis

DAP elements
• Summarize methods
• For each survey objective, identify and describe the relevant variables
• Identify the analysis methods
  – Software
  – Statistical methods, tests, significance levels, definitions

DAP elements (2)
• Describe the plan for handling:
  – Missing values
  – Out-of-range values
  – Zeros, if doing log transformations
  – Data collapsing
• Describe subgroup or by-group analyses

DAP elements (3)
• Set up dummy tables and graphs
• Review the DAP carefully and circulate it for comments

Data entry
• Design a database that resembles the survey instrument in layout and format
• Pretest it extensively
• The database designer should be present at the beginning of data entry to fix bugs
• Double data entry?
• Avoid any need for interpretation by data entry personnel

You and Your Data
Your first eight hours together

First things first
• Virus-check the files
• Write-protect the original data
• Back up the files and the case report forms (CRFs)
  – On-site: hard drives, diskettes, safes
  – Off-site: safe deposit box

First things first (2)
• Import the data
  – Error-prone; be very careful here
• Validate and verify the data

Validating and verifying data
• Run frequencies for categorical variables
• Run univariate statistics for continuous variables
• Examine key variables (those used in evaluating the primary objectives)
• Look at variables by group (sex, age, etc.)

Validating and verifying data (2)
• Recode missing values
• Calculate checks for error-prone variables
  – Ex. Check dates against time-to variables
  – Check anything the interviewer had to calculate, such as a total score
• Derive any key variables that must be calculated from other variables, and verify them too

Validating and verifying data (3)
• Rearrange, combine, or separate datasets as needed for analysis
  – Ex. Split into demographic, primary outcome, and secondary outcome datasets
• Annotate a copy of the survey instrument with variable names
• Create a data dictionary
  – Include variable name, type, length, and description or label

Validating and verifying data (4)
• Look for obvious errors
  – Ex. Spelling of a medication or medical condition
  – Be very careful about correcting them
  – Document any changes
  – Consider a query system
  – You may need the interviewer to resolve errors

Validating and verifying data (5)
• Run rough crosstabs for reference
  – Ex. Number by sex, group, and age
  – Use them to track observations
• Create data listings
  – Very useful for reference and for identifying problems in the data
• Check data coming from different sources
  – Be very careful when merging

Validating and verifying data (6)
• Aside: variable naming
  – Names should be meaningful and descriptive
  – But be careful about overly descriptive names
    • Long variable names are difficult to manipulate
    • If the meaning appears obvious, people won't look it up
• Back all of this up in the same way you backed up the original data
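On the "Double data entry?" question from the Data entry slide: if you choose double entry, two people key the same forms independently, and every field where the two passes disagree is checked against the paper form. The sketch below is a minimal Python illustration, assuming each pass has already been read into a list of records (dicts) and that the record ID is unique within each pass; the field names (`id`, `age`, `sex`) and the function name are hypothetical.

```python
from typing import Any

Record = dict[str, Any]


def compare_entries(first: list[Record], second: list[Record], key: str = "id") -> list[tuple]:
    """Compare two independent entry passes of the same forms.

    Returns (record_id, field, first_value, second_value) for every field that
    differs. A record keyed in only one pass is reported with field "<record>"
    and values showing whether it is present in each pass.
    Assumes record IDs are unique within each pass.
    """
    a = {r[key]: r for r in first}
    b = {r[key]: r for r in second}
    discrepancies = []
    # Records that appear in only one pass (symmetric difference of the IDs).
    for rid in sorted(a.keys() ^ b.keys()):
        discrepancies.append((rid, "<record>", rid in a, rid in b))
    # Field-by-field comparison for records keyed in both passes.
    for rid in sorted(a.keys() & b.keys()):
        for field in sorted(a[rid].keys() | b[rid].keys()):
            va, vb = a[rid].get(field), b[rid].get(field)
            if va != vb:
                discrepancies.append((rid, field, va, vb))
    return discrepancies


# Hypothetical example: the passes disagree on one age, and record 3 was keyed only once.
pass1 = [{"id": 1, "age": 34, "sex": "F"}, {"id": 2, "age": 51, "sex": "M"}]
pass2 = [{"id": 1, "age": 43, "sex": "F"}, {"id": 2, "age": 51, "sex": "M"},
         {"id": 3, "age": 27, "sex": "F"}]
diffs = compare_entries(pass1, pass2)
for d in diffs:
    print(d)
```

Each reported discrepancy is resolved against the original form, not by choosing whichever pass looks more plausible.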
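The "First things first" steps (write-protect the original data, back it up) can be scripted so that each backup is verified rather than assumed. A minimal Python sketch using only the standard library; the file names and the `protect_and_back_up` helper are hypothetical, and comparing SHA-256 checksums is one common way to confirm that a copy is byte-for-byte identical, not a procedure the slides specify.

```python
import hashlib
import os
import shutil
import stat
import tempfile
from pathlib import Path


def sha256_of(path):
    """SHA-256 hex digest of a file, read in 64 KiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def protect_and_back_up(original, backup_dir):
    """Write-protect the original file, copy it to backup_dir, and verify the copy."""
    original = Path(original)
    # Clear the user, group, and other write bits so the original cannot be edited.
    mode = stat.S_IMODE(original.stat().st_mode)
    os.chmod(original, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    # copy2 also preserves timestamps and permission bits.
    copy = Path(shutil.copy2(original, backup_dir / original.name))
    if sha256_of(copy) != sha256_of(original):
        raise OSError(f"backup of {original} does not match the original")
    return copy


# Demo on a throwaway file; in practice, point this at the real data file and
# at a backup location on separate media.
demo_dir = Path(tempfile.mkdtemp())
data_file = demo_dir / "survey_raw.csv"   # hypothetical file name
data_file.write_text("id,age\n1,34\n2,51\n")
backup = protect_and_back_up(data_file, demo_dir / "backup")
print(backup, sha256_of(backup))
```

Because `shutil.copy2` copies the permission bits, the backup is read-only as well; writing each backup to a new, dated directory avoids trying to overwrite an earlier read-only copy.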
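The first checks on the "Validating and verifying data" slide (frequencies for categorical variables, univariate statistics for continuous ones, and the same statistics by group) need nothing beyond the Python standard library. A minimal sketch, assuming the data are a list of dicts with missing values already stored as `None`; the variable names and helper functions are hypothetical.

```python
import statistics
from collections import Counter


def frequencies(records, var):
    """Frequency table for a categorical variable; None counts as missing."""
    return Counter(r.get(var) for r in records)


def univariate(records, var):
    """n, missing, min, max, mean, median, and SD for a continuous variable."""
    values = [r[var] for r in records if r.get(var) is not None]
    summary = {"n": len(values), "missing": len(records) - len(values)}
    if values:
        summary.update(
            min=min(values),
            max=max(values),
            mean=statistics.mean(values),
            median=statistics.median(values),
            # The sample SD needs at least two values.
            sd=statistics.stdev(values) if len(values) > 1 else None,
        )
    return summary


def univariate_by_group(records, var, group):
    """Run univariate() separately within each level of a grouping variable."""
    groups = {}
    for r in records:
        groups.setdefault(r.get(group), []).append(r)
    return {level: univariate(rows, var) for level, rows in groups.items()}


# Hypothetical data; age is missing for respondent 3.
records = [{"id": 1, "sex": "F", "age": 34}, {"id": 2, "sex": "M", "age": 51},
           {"id": 3, "sex": "F", "age": None}, {"id": 4, "sex": "F", "age": 29}]
print(frequencies(records, "sex"))
print(univariate(records, "age"))
print(univariate_by_group(records, "age", "sex"))
```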
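Recoding missing values and flagging out-of-range values are two of the items the DAP should specify in advance. A minimal Python sketch, assuming the DAP defines numeric sentinel codes for missing data and a valid range for each variable; the codes and ranges in `MISSING_CODES` and `VALID_RANGES`, and the helper names, are illustrative and not taken from any real instrument.

```python
# Illustrative codebook values (assumptions, not from a real instrument):
# sentinel codes that mean "missing" and the plausible range for each variable.
MISSING_CODES = {"age": {999}, "visits": {-9, 99}}
VALID_RANGES = {"age": (18, 100), "visits": (0, 50)}


def recode_missing(record, missing_codes=MISSING_CODES):
    """Return a copy of the record with sentinel missing codes replaced by None."""
    clean = dict(record)
    for var, codes in missing_codes.items():
        if clean.get(var) in codes:
            clean[var] = None
    return clean


def out_of_range(records, valid_ranges=VALID_RANGES, key="id"):
    """List (id, variable, value) for non-missing values outside the valid range."""
    flags = []
    for r in records:
        for var, (lo, hi) in valid_ranges.items():
            value = r.get(var)
            if value is not None and not lo <= value <= hi:
                flags.append((r[key], var, value))
    return flags


raw = [{"id": 1, "age": 34, "visits": 3},
       {"id": 2, "age": 999, "visits": -9},
       {"id": 3, "age": 12, "visits": 60}]
cleaned = [recode_missing(r) for r in raw]
print(out_of_range(cleaned))   # only record 3 is flagged
print(out_of_range(raw))       # without recoding, the 999 and -9 codes are flagged too
```

Order matters: recode the missing-value codes first, otherwise a sentinel such as 999 is reported as an implausible age instead of a missing one. Flagged values go to the query process rather than being corrected in place.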
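The checks on the "Validating and verifying data (2)" slide amount to recomputing a value independently and comparing it with what was recorded: a time-to variable against the two dates it was derived from, and an interviewer-calculated total against its items. A minimal Python sketch, assuming complete records with missing values already handled; the field names (`diagnosis_date`, `interview_date`, `days_since_dx`, `item1` to `item3`, `total_score`) and the example values are hypothetical.

```python
from datetime import date

ITEMS = ["item1", "item2", "item3"]  # hypothetical items summed into total_score


def check_record(r):
    """Return descriptions of inconsistencies in one (complete) record."""
    problems = []
    # Dates: the interview cannot precede the diagnosis.
    if r["interview_date"] < r["diagnosis_date"]:
        problems.append("interview before diagnosis")
    # Time-to variable: recompute it from the dates and compare.
    days = (r["interview_date"] - r["diagnosis_date"]).days
    if days != r["days_since_dx"]:
        problems.append(f"days_since_dx is {r['days_since_dx']}, dates give {days}")
    # Interviewer-calculated total: recompute it from the items.
    total = sum(r[item] for item in ITEMS)
    if total != r["total_score"]:
        problems.append(f"total_score is {r['total_score']}, items sum to {total}")
    return problems


# Hypothetical records: r1 is consistent, r2 has date and total-score errors.
r1 = {"id": 1, "diagnosis_date": date(2023, 3, 1), "interview_date": date(2023, 3, 31),
      "days_since_dx": 30, "item1": 2, "item2": 3, "item3": 1, "total_score": 6}
r2 = {"id": 2, "diagnosis_date": date(2023, 5, 10), "interview_date": date(2023, 5, 1),
      "days_since_dx": 9, "item1": 4, "item2": 4, "item3": 4, "total_score": 11}
for r in (r1, r2):
    for problem in check_record(r):
        print(r["id"], problem)
```

The same recompute-and-compare pattern applies to any derived key variable: derive it in code, then compare it with any value that was recorded by hand.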
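A first draft of the data dictionary (variable name, type, length, and label) can be generated from the data themselves, which also exposes variables stored with inconsistent types. A minimal Python sketch; the labels still have to be written by hand from the annotated instrument, and the helper name, variable names, and labels shown are hypothetical.

```python
def data_dictionary(records, labels):
    """Draft a data dictionary: one entry per variable with name, type, length, label."""
    # Keep variables in the order they first appear, as on the instrument.
    names = []
    for r in records:
        for name in r:
            if name not in names:
                names.append(name)
    entries = []
    for name in names:
        values = [r[name] for r in records if r.get(name) is not None]
        # More than one type (e.g. "int/str") usually signals an import problem.
        types = sorted({type(v).__name__ for v in values}) or ["unknown"]
        # Length = longest value when written out as text.
        length = max((len(str(v)) for v in values), default=0)
        entries.append({"name": name, "type": "/".join(types),
                        "length": length, "label": labels.get(name, "")})
    return entries


# Hypothetical data: "51" was imported as text, so age shows up as int/str.
records = [{"id": 1, "sex": "F", "age": 34}, {"id": 2, "sex": "M", "age": "51"}]
labels = {"id": "Respondent ID", "sex": "Sex of respondent", "age": "Age in years"}
dictionary = data_dictionary(records, labels)
for e in dictionary:
    print(f"{e['name']:<6}{e['type']:<10}{e['length']:>3}  {e['label']}")
```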
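Before merging data from different sources (the merging caution on the "Validating and verifying data (5)" slide), it helps to check the join keys: duplicated IDs multiply records, unmatched IDs drop or orphan them, and non-key variables present in both sources can silently overwrite each other. A minimal Python sketch of a one-to-one inner merge that refuses to proceed when those checks fail; the function names, the `id` key, and the example variables are hypothetical.

```python
from collections import Counter


def check_merge_keys(left, right, key="id"):
    """Report problems to resolve before merging two lists of records on key."""
    lc = Counter(r[key] for r in left)
    rc = Counter(r[key] for r in right)
    left_fields = {f for r in left for f in r} - {key}
    right_fields = {f for r in right for f in r} - {key}
    return {
        "dup_left": sorted(k for k, n in lc.items() if n > 1),    # repeated IDs
        "dup_right": sorted(k for k, n in rc.items() if n > 1),
        "only_left": sorted(lc.keys() - rc.keys()),                # unmatched IDs
        "only_right": sorted(rc.keys() - lc.keys()),
        "shared_fields": sorted(left_fields & right_fields),      # would overwrite
    }


def merge_one_to_one(left, right, key="id"):
    """Inner-join two sources on key, refusing duplicated keys or clashing fields."""
    report = check_merge_keys(left, right, key)
    if report["dup_left"] or report["dup_right"] or report["shared_fields"]:
        raise ValueError(f"resolve before merging: {report}")
    right_by_key = {r[key]: r for r in right}
    return [{**r, **right_by_key[r[key]]} for r in left if r[key] in right_by_key]


# Hypothetical sources: demographics and lab results, keyed by respondent ID.
demog = [{"id": 1, "sex": "F"}, {"id": 2, "sex": "M"}, {"id": 3, "sex": "F"}]
lab = [{"id": 1, "hb": 12.1}, {"id": 2, "hb": 13.4}, {"id": 4, "hb": 11.0}]
report = check_merge_keys(demog, lab)
print(report)
merged = merge_one_to_one(demog, lab)
print(merged)
```

Unmatched IDs (here 3 and 4) are reported rather than silently dropped, so they can be traced back to the source files before the analysis dataset is built.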