Data Quality Control
by Naila Baig Ansari
Research Fellow, Dept of Community Health Sciences
The Aga Khan University, Karachi, Pakistan

Who am I?
Education:
– MSc (Epidemiology), The Aga Khan University, 2001. Thesis: Care and feeding practices and their association with stunting among young children residing in Karachi's squatter settlements
– BBA (Management), The College of William and Mary, Williamsburg, VA, USA, 1989
Research interests: nutritional and behavioral epidemiology; methodological issues in dietary assessment methods; household food security and gender-related issues; care and feeding practices; data management and questionnaire design

Learning Objectives
– To know the steps necessary for ensuring quality assurance and quality control of data at various stages of a study
– To understand the difference between pilot testing and pretesting
– To understand the importance of carefully designing data collection instruments
– To understand how data can be managed using an audit trail, and the techniques that can be used to inspect a dataset after it has been entered

Performance Objectives
– Know the difference between quality assurance and quality control, and ways to ensure each
– Know the objectives of a pilot test and a pretest
– Understand how data collection instruments should be designed and coded
– Be able to manage data using an audit trail
– Be able to inspect datasets for errors and rectify them

Data Quality Control
• Quality Assurance – activities to ensure the quality of data before data collection begins
• Quality Control – monitoring and maintaining the quality of data during the conduct of the study
• Data Management – handling and processing of data throughout the study

Steps in Quality Assurance
1. Specify the study hypothesis
2. Specify the general design to test the study hypothesis
   – Develop an overall study protocol
3. Choose or prepare specific instruments
4. Develop procedures for data collection and processing
   – Develop operation manuals
5. Train staff
   – Certify staff
6. Using certified staff, pretest and pilot-study the data collection and processing instruments and procedures

Quality Assurance: Standardization of Procedures
Why is standardization important?
– To achieve the highest possible level of uniformity in data collection procedures across the entire study population
Preparation of a written manual of operations
– Detailed descriptions of exactly how the procedures specific to each data collection instrument are to be carried out (e.g. blood pressure measurement)
– Question-by-question (Q by Q) instructions for interviews

Quality Assurance: Training of Staff
– Aim: to make each staff member thoroughly familiar with the procedures under his/her responsibility
– Training is followed by certification of each staff member to perform a specific procedure

Quality Assurance: Pretesting and Pilot Testing
Pretesting
– Assesses specific procedures on a sample in order to detect major flaws
Pilot testing
– A formal rehearsal of study procedures
– Attempts to reproduce the whole flow of operations in a sample as similar as possible to the study participants

Pretesting and Pilot Testing: Results
Pretesting of a questionnaire is used to assess:
– flow of questions
– presence of sensitive questions
– appropriateness of the categorization of variables
– clarity of the Q by Q instructions to the interviewer
Pilot testing assesses all of the above and, in addition, the flow of the whole process

Quality Assurance: Data Management
Designing data collection instruments
– Layout, questions to ask, sequence of questions, phrasing of questions, response categories, skip patterns
– Collect and record "raw", not processed, information (e.g. age)
– Codebook: the link between the questionnaire and the data entered in the computer

Codebook Example
Variable  Q No  Meaning             Codes              Format
Q1Id      Q1    Questionnaire no.   1-750              C3
Q2Sex     Q2    Respondent's sex    1 male             N 1.0
                                    2 female
Q3Child   Q3    No. of children     99 no response     N 2.0
Q4Wt      Q4    Weight in kg        999 not recorded   N 3.1
Q5roof    Q5    Roof type           1 RCC              N 2.0
                                    2 Cement sheet
                                    3 Tin sheet
                                    4 Thatched
                                    Other (specify)

Quality Assurance: Use of a Codebook
Variable names
– Up to 8 characters (a-z and 0-9), must start with a letter
– A combination of question number and description (e.g. q3age)
Meaning
– A short text description of the meaning of the variable
– SPSS can incorporate this information as variable labels and display it in the output

Quality Assurance: Use of a Codebook (continued)
Codes
– Use numerical codes where possible
– Decide in advance on codes for non-response and missing values, distinguishing between:
  – question could not be asked or was not applicable (e.g. pregnancy outcome)
  – question was asked but the respondent did not reply (e.g. salary)
  – respondent replied "don't know"

Quality Control
Observation of procedures and of the performance of staff members to identify obvious protocol deviations. Strategies include:
– Over-the-shoulder observation of staff
– Taping all interviews and reviewing a random sample
– Ongoing field supervision
– Field editing by the interviewer as well as by the field supervisor
– Office editing, which includes coding
– Log book maintenance
– Statistical assessment of trends over time in the performance of each observer/interviewer/technician

Data Management: Audit Trail
The researcher should be able to trace each piece of information back to the original document:
– The ID is included both in the original documents and in the dataset
– All corrections must be documented and explained
– All modifications to the dataset must be documented by command files
– Each analysis must be documented by a command file
The purpose of the audit trail is to:
– protect yourself against mistakes, errors, waste of time and loss of information
– enable an external audit (revision)

Data Management: Handling of Data
Entering data
– Use a professional data entry program such as EpiData
Preparations
– Complete the codebook
– Examine questionnaires for obvious inconsistencies and skip-pattern errors

Data Management: Handling of Data (continued)
Error prevention
– Set up a data entry form resembling your questionnaire
– Define valid values before entering data
– Double data entry by two different operators:
  – compare the contents of the two files to get a list of discrepancies (e.g. with EpiInfo)
  – correct errors in both files and run a new comparison

First Inspection of Data: Error Finding
Add variable and value labels to your data using a syntax command
Searching for errors
– Make printouts of the codebook generated from the data, an overview of variables, and simple frequency tables of appropriate variables
– Compare the codebook generated from the data with the original codebook, and check that the label information is correct
– Inspect the generated summary/frequency tables for illegal or improbable minimum and maximum values and for inconsistencies (e.g. an age of 250 years; a pregnant male; a 23-year-old woman with a 19-year-old son)
Calculate the error rate
– Randomly select 10% (or at least 40) of your questionnaires, re-enter them into a new file, and compare that file with the original entries

Correction of Errors: Documentation
If errors are discovered
– Make corrections in a command file (SPSS syntax file); this provides full documentation of the changes made to the dataset
If errors are discovered when comparing files after double data entry
– You can correct the entered data directly, provided you end this step with a new comparison of the two corrected files

Correction of Errors: Documentation (continued)
– Split the process into distinct, well-defined steps, and make sure your documentation is consistent from one step to the next
– Archive: once you have a "clean", documented version of your primary data, save one copy in a safe place and do your work on another copy

Analysis
– Make sure you use the right dataset: create command files for analysis that start with the command reading the dataset
– Late discovery of errors and inconsistencies

Backing Up vs Archiving
Backing up
– An everyday activity
– Purpose: to enable you to restore your data and documents in case of destruction or loss of data
– Covers not only datasets but also the command files that modify your data, and written documents such as the protocol, log book and other documenting information
Archiving
– Takes place once or a few times during the life of the project
– Purpose: to preserve your data and documents for the more distant future, possibly even allowing other researchers access to the information
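The codebook slides set a variable-naming rule and call for valid values and missing-value codes to be fixed before data entry. The short Python sketch below shows how such a codebook can drive automatic checks. It rests on several assumptions:
- The naming rule is matched case-insensitively.
- The entries for Q2Sex, Q3Child and Q4Wt are modelled on the example table.
- The plausible ranges (0 to 20 children, 2 to 150 kg) are hypothetical and were chosen only for illustration.

```python
import re

# Naming rule from the codebook slide: up to 8 characters (a-z, 0-9),
# starting with a letter. Case-insensitive matching is an assumption,
# so names such as Q1Id from the example table pass.
NAME_RULE = re.compile(r"[a-z][a-z0-9]{0,7}", re.IGNORECASE)

# Hypothetical codebook entries modelled on the example table.
# The ranges for Q3Child and Q4Wt are illustrative assumptions;
# 99 and 999 are the missing-value codes shown in the table.
CODEBOOK = {
    "Q2Sex": {"valid": {1, 2}, "missing": set()},
    "Q3Child": {"range": (0, 20), "missing": {99}},
    "Q4Wt": {"range": (2.0, 150.0), "missing": {999}},
}


def bad_names(names):
    """Return the variable names that break the naming rule."""
    return [name for name in names if not NAME_RULE.fullmatch(name)]


def check_value(var, value):
    """Classify a value as 'ok', 'missing' or 'invalid' using the codebook."""
    entry = CODEBOOK[var]
    if value in entry["missing"]:
        return "missing"
    if "valid" in entry:
        return "ok" if value in entry["valid"] else "invalid"
    low, high = entry["range"]
    return "ok" if low <= value <= high else "invalid"


print(bad_names(["q3age", "Q1Id", "Q5roof", "3age", "q3agegroup"]))  # ['3age', 'q3agegroup']
print(check_value("Q2Sex", 3), check_value("Q3Child", 99), check_value("Q4Wt", 65.5))  # invalid missing ok
```

Keeping the missing codes separate from the valid values lets a declared code such as 999 be reported as "missing" rather than as an impossible weight.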
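Double data entry, as described on the error-prevention slide, comes down to comparing two independently keyed files field by field. EpiInfo and EpiData provide this comparison built in. The Python function below is only a sketch of the same idea, not their implementation. It assumes each file has been loaded as a dictionary mapping questionnaire ID to a dictionary of variable values.

```python
def compare_entries(first, second):
    """List discrepancies between two independent entries of the same forms.

    Each argument maps questionnaire ID -> {variable: value}. A form keyed
    by only one operator shows up with None for every value on the other side.
    Returns (ID, variable, value in first file, value in second file) tuples.
    """
    discrepancies = []
    for qid in sorted(set(first) | set(second)):
        rec1 = first.get(qid, {})
        rec2 = second.get(qid, {})
        for var in sorted(set(rec1) | set(rec2)):
            v1, v2 = rec1.get(var), rec2.get(var)
            if v1 != v2:
                discrepancies.append((qid, var, v1, v2))
    return discrepancies


operator_a = {1: {"Q2Sex": 1, "Q3Child": 3}, 2: {"Q2Sex": 2, "Q3Child": 1}}
operator_b = {1: {"Q2Sex": 1, "Q3Child": 8}, 2: {"Q2Sex": 2, "Q3Child": 1}}
print(compare_entries(operator_a, operator_b))  # [(1, 'Q3Child', 3, 8)]
```

Each reported discrepancy is then checked against the paper questionnaire and corrected in both files. The comparison is rerun until it returns an empty list.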
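The first-inspection slide lists two kinds of checks: minimum and maximum values, and logical inconsistencies such as a pregnant male or a 23-year-old woman with a 19-year-old son. Both can be written as explicit rules. This Python sketch assumes hypothetical variables `id`, `age`, `sex` (1 male, 2 female, as in the codebook example), `pregnant` (1 yes, 0 no) and `child_age`. The cut-offs of 110 years and a 12-year mother-child gap are illustrative assumptions, not values from the slides.

```python
# Hypothetical variables: id, age (years), sex (1 male, 2 female),
# pregnant (1 yes, 0 no), child_age (age of son in years, None if no son).
# The cut-offs below are illustrative assumptions.


def mother_too_young(r):
    """Flag women whose age gap to their child is implausibly small."""
    return r["sex"] == 2 and r["child_age"] is not None and r["age"] - r["child_age"] < 12


# Each rule is (description, predicate returning True when the record is suspect).
RULES = [
    ("improbable age", lambda r: not 0 <= r["age"] <= 110),
    ("pregnant male", lambda r: r["sex"] == 1 and r["pregnant"] == 1),
    ("mother too young for child", mother_too_young),
]


def min_max(records, var):
    """Smallest and largest non-missing value of a variable."""
    values = [r[var] for r in records if r[var] is not None]
    return min(values), max(values)


def flag_records(records):
    """Return (id, problem) pairs for every rule a record violates."""
    return [
        (r["id"], description)
        for r in records
        for description, is_suspect in RULES
        if is_suspect(r)
    ]


records = [
    {"id": 1, "age": 250, "sex": 2, "pregnant": 0, "child_age": None},
    {"id": 2, "age": 30, "sex": 1, "pregnant": 1, "child_age": None},
    {"id": 3, "age": 23, "sex": 2, "pregnant": 0, "child_age": 19},
    {"id": 4, "age": 35, "sex": 2, "pregnant": 1, "child_age": 10},
]
print(min_max(records, "age"))  # (23, 250)
# [(1, 'improbable age'), (2, 'pregnant male'), (3, 'mother too young for child')]
print(flag_records(records))
```

Because each rule is listed together with a description, the same list also serves as documentation of which checks were run on the dataset.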
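The error-rate step asks for 10% (or at least 40) of the questionnaires to be re-entered. The Python sketch below makes two interpretive assumptions:
- "10% or at least 40" is read as the larger of 10% and 40, capped at the number of questionnaires.
- The error rate is computed per field, as the share of re-entered fields that differ from the original entry. Per-questionnaire error rates are an equally valid alternative.

```python
import random


def sample_for_reentry(ids, seed=None):
    """Randomly pick 10% of questionnaire IDs, but at least 40.

    Reads "10% or at least 40" as max(10%, 40), capped at the number of
    questionnaires available; this reading is an assumption.
    """
    ids = list(ids)
    ten_percent = (len(ids) + 9) // 10  # 10%, rounded up
    size = min(len(ids), max(40, ten_percent))
    return sorted(random.Random(seed).sample(ids, size))


def field_error_rate(original, reentered):
    """Proportion of re-entered fields that differ from the original entry.

    Both arguments map questionnaire ID -> {variable: value}; only the
    questionnaires present in `reentered` are compared.
    """
    fields = errors = 0
    for qid, record in reentered.items():
        for var, value in record.items():
            fields += 1
            if original[qid].get(var) != value:
                errors += 1
    return errors / fields if fields else 0.0


print(len(sample_for_reentry(range(750), seed=1)))  # 75
print(len(sample_for_reentry(range(200), seed=1)))  # 40
print(field_error_rate({1: {"Q2Sex": 1, "Q3Child": 3}}, {1: {"Q2Sex": 1, "Q3Child": 8}}))  # 0.5
```

Passing a fixed `seed` makes the random selection reproducible. The sample can then be recorded as part of the audit trail.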
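The slides recommend making corrections in a command file (an SPSS syntax file) and working on a copy of the primary data. The Python sketch below imitates that discipline; it is not SPSS syntax.

Each correction is logged with its ID, variable, value as entered, corrected value and reason. The corrections are applied to a copy, so the raw data stay untouched. The log entries and the `raw` dataset are hypothetical.

```python
import copy

# Hypothetical correction log standing in for an SPSS syntax file: every
# change records the ID, variable, value as entered, corrected value and reason.
CORRECTIONS = [
    (17, "Q2Sex", 3, 2, "keying error; paper form says female"),
    (42, "Q4Wt", 6.5, 65.0, "decimal point misplaced; checked against form"),
]


def apply_corrections(raw, corrections):
    """Return a corrected copy of `raw`; the primary data are never modified.

    Raises ValueError when a logged 'as entered' value does not match the
    data, so the log and the dataset cannot silently drift apart.
    """
    data = copy.deepcopy(raw)
    for qid, var, old, new, _reason in corrections:
        if data[qid][var] != old:
            raise ValueError(f"ID {qid}, {var}: log says {old!r}, data has {data[qid][var]!r}")
        data[qid][var] = new
    return data


raw = {17: {"Q2Sex": 3, "Q4Wt": 55.0}, 42: {"Q2Sex": 1, "Q4Wt": 6.5}}
clean = apply_corrections(raw, CORRECTIONS)
print(clean[17]["Q2Sex"], clean[42]["Q4Wt"], raw[17]["Q2Sex"])  # 2 65.0 3
```

Storing the old value alongside the new one lets anyone re-run the log against the archived primary data and obtain the same clean dataset. This is the traceability the audit trail slide asks for.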