Data Processing
John Robert Carabeo Medina
Assistant Professor
Department of Epidemiology and Biostatistics
COLLEGE OF PUBLIC HEALTH
University of the Philippines Manila
SEAMEO TROPMED Philippines Regional Centre for Public Health, Hospital Administration, Environmental and Occupational Health

Learning Objectives
At the end of the session, the students should be able to:
1. Explain data processing and its four major phases
2. Discuss data coding and its importance
3. Discuss the principles of setting up categories and the different types of codes
4. Explain the stages of coding and the coding manual
5. Define data encoding, data checking, and data editing
6. Enumerate reasons for data editing

Data Processing
▪ Systematic procedure to ensure that the information and/or data gathered are complete, consistent, and suitable for analysis
▪ Has four (4) major phases
  ▪ Data Coding
  ▪ Data Encoding
  ▪ Data Checking
  ▪ Data Editing
▪ Some researchers include data analysis; others do not

Data Coding: Definition
▪ Process of (1) grouping the responses to a question into categories and (2) assigning codes to these categories
▪ Codes: numbers, characters, and/or other symbols
▪ Facilitates counting and tabulation
▪ Symbols or numbers used must allow for flexible and rapid storage, retrieval, and analysis
▪ Final choice of code depends on the objectives of the study and the type of storage facility for the data

Data Coding: Importance
1.
To group the responses into a limited number of meaningful categories, so as to bring out their essential pattern.
2. To make recording of information easy.
3. To preserve the confidentiality of the data for personnel, industrial, or security reasons.
4. To make it easier to enter the data into the computer, minimize storage space, and speed up processing during encoding.
5. To facilitate management of files (e.g., sorting, merging, transforming, listing) using the coded values of one or more key variables.
6. To be able to analyze data using statistical software like Epi Info™, SAS, SPSS, etc., which require numeric inputs.

Data Coding: Code Construction
Five Principles in Setting Up Code Categories
1. Appropriateness
2. Single Concept
3. Exhaustiveness
4. Exclusiveness
5. Adequacy

1. Appropriateness
▪ Categories should be appropriate to the research problem; construct them in such a way that they provide the information needed to satisfy the study objectives
2. Single Concept
▪ Categories should be defined in terms of a single concept or classification principle relevant to the objectives of the study
3.
Exhaustiveness
▪ There should be a category corresponding to each response
▪ No response should fail to belong to one of the constructed categories
▪ Unfortunately, we are not always able to anticipate all the responses, so a catch-all category (e.g., “Others”) is usually included
▪ A high proportion of responses falling into this catch-all category indicates that the categories were not properly constructed
4. Exclusiveness
▪ Categories should be mutually exclusive
▪ Each response should fall into one and only one category
5. Adequacy
▪ Categories should adequately summarize the responses without losing the important details of the information

Data Coding: Types of Codes
1. Field code
2. Bracket code
3. Listing code
4. Pattern code
5. Scale code

1. Field Code
▪ Numbers or responses are recorded as they are given by the respondent (“code as is”)
▪ The actual value or quantity of the variable is recorded in the form and later entered in the computer
▪ Ex. Age as of last birthday, height in centimeters, weight in kilograms, number of households with a toilet, and number of facilities inspected
Example 1. What is your age in years? ________
2.
Bracket Code
▪ Each category of this type of code refers to a range of numbers or a class interval
▪ Frequently used for numerical responses when the respondent may not be able to provide an exact answer
▪ Ex. Income, age
Example 2. How much is your monthly salary?
❑ Less than 10,000
❑ 10,000 – 19,999
❑ 20,000 – 39,999
❑ 40,000 – 49,999
❑ 50,000 or greater
3. Listing Code
▪ Pre-specified choices that can be selected as answers to the question are listed or given in the form
▪ Two types
  ▪ Single response – only one answer to the question
  ▪ Multiple response – respondents may give several answers simultaneously
Example 3.1 (Single Response). What is your civil status?
1 - Less than 10,000
2 - 10,000 – 19,999
3 - 20,000 – 39,999
4 - 40,000 – 49,999
5 - 50,000 or greater
Example 3.2 (Multiple Response). What were the signs and symptoms experienced by the patient? Check all that apply.
__ Fever
__ Nausea
__ Diarrhea
__ Malaise
4.
Pattern Code
▪ A single code is allocated to each type of response, including multiple or combined responses (bidimensional code)
▪ First code the response to the primary question, then the responses to the secondary questions
Example 4. Do you find the project helpful? Are you planning to go on with the project or do you plan to discontinue it?
Answer: “Well, I find it sort of helpful but I simply do not have the time for it. So, I guess I just have to quit from the project.”
Possible answers                          Code
Project helpful – plans to go on          (1)
Project helpful – plans to discontinue    (2)
Project helpful – plans undecided         (3)
So we code this answer as “2”.
5. Scale Code
▪ Grouping of responses which vary in degree or intensity along a continuum
▪ Used for coding responses to attitudinal questions
▪ Divide the continuum into segments, then code
Example 5. Five-point scale (this scale grades the coded responses by degree of feeling)
1  Very Satisfied
2  Satisfied
3  Neutral
4  Dissatisfied
5  Very Dissatisfied
Q: “How satisfied are you with your present job?”
A: “Neither satisfied nor dissatisfied.” Code (3)

Data Coding: Stages
Two Stages of Coding
1. Pre-coding
2. Post-coding

1.
Pre-coding
▪ Process of establishing the categories and codes early in the questionnaire design stage, especially for fixed-alternative questions
▪ Coding for closed questions (categories are fixed or predetermined)
▪ Codes may be devised even prior to data collection
▪ Respondents need only encircle or check the appropriate code, or the data collector need only write down the proper code
2. Post-coding
▪ Process of finalizing the categories and codes after the questionnaires have come in from the field
▪ Necessary for open-ended questions, which are difficult, if not impossible, to pre-code

Data Coding: Coding Manual
▪ A document containing all the instructions regarding the codes to be used for the replies given by the respondents
▪ By referring to this document, any person must be able to determine the meaning of each of the symbols used in encoding the data
▪ Keep the manual as detailed and complete as possible

Data Coding: Coding Manual Content
1. Question number
▪ Provides easy reference in case there is a need to look for the question within the questionnaire itself
2.
Variable name (to be used in the computer)
▪ Has to be as short as possible, since some computer software allows only 8 characters at most
3. Variable description
▪ Facilitates data analysis and report writing when the whole variable name cannot be accommodated by the software used, or when the coding manual contains only the question number without any variable names
4. Actual codes
▪ Actual coding instructions must appear under this column
▪ Should be as detailed and complete as possible to enable easy decoding of the information processed

Sample coding manual:
Question No.  Variable Name  Variable Description            Codes & Instructions
1             Id             Identification number           Unique number from 1 to 50
2             Initials       Initials of the respondent      Code as is
3             Sex            Assigned sex at birth           0 – Male; 1 – Female
4             Educ           Highest educational attainment  0 – None; 1 – Elementary; 2 – High school; 3 – College

Data Encoding: Definition
▪ Process of transferring the data written in the questionnaire or form into electronic form
▪ Available computer programs for data encoding:
  ▪ Microsoft Excel
  ▪ Microsoft Access
  ▪ Epi Info™

Data Checking: Definition
▪ Inspection of the data collected
▪ Data are scrutinized to identify erroneous information that can affect the validity of the results
▪ Check data for the following:
1. Completeness: Incomplete answers should be completed when possible.
2. Consistency: Reject the question, or even the whole questionnaire, where inconsistent answers are seen.
3. Accuracy: Locate and discard answers which are extremely doubtful and inaccurate.
4. Uniformity: Standardize all answers involving units of measurement.
5. Comprehensibility: Detect any response which is incomprehensible to all but the field interviewer and make the necessary clarification.
6. Legibility: Correct any unintelligible handwriting or use of abbreviations and symbols that may not be understood by the coder later on.
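
Some of the checks listed above can be partly automated once the data are encoded. The following is a minimal sketch in Python, assuming each encoded record is a dictionary keyed by the variable names of the sample coding manual (Id, Initials, Sex, Educ). The names CODING_MANUAL, check_record, and check_dataset are hypothetical and only illustrate the completeness and accuracy checks; they are not part of the lecture or of any particular software.

```python
# Coding manual taken from the sample table in the lecture. The record layout
# and the function names below are illustrative assumptions.
CODING_MANUAL = {
    "Id": {"description": "Identification number", "valid": set(range(1, 51))},
    "Initials": {"description": "Initials of the respondent", "valid": None},  # code as is
    "Sex": {"description": "Assigned sex at birth", "valid": {0, 1}},
    "Educ": {"description": "Highest educational attainment", "valid": {0, 1, 2, 3}},
}


def check_record(record):
    """Return a list of problems found in one encoded record."""
    problems = []
    for name, rule in CODING_MANUAL.items():
        value = record.get(name)
        # Completeness: every variable in the manual should have an answer.
        if value is None or value == "":
            problems.append(f"{name}: missing")
            continue
        # Accuracy: a coded value must be one of the codes the manual allows.
        if rule["valid"] is not None and value not in rule["valid"]:
            problems.append(f"{name}: invalid code {value!r}")
    return problems


def check_dataset(records):
    """Check every record and flag identification numbers used more than once."""
    report = {}
    seen_ids = set()
    for position, record in enumerate(records):
        problems = check_record(record)
        record_id = record.get("Id")
        if record_id is not None and record_id in seen_ids:
            problems.append(f"Id: duplicate value {record_id!r}")
        seen_ids.add(record_id)
        if problems:
            report[position] = problems
    return report


if __name__ == "__main__":
    sample = [
        {"Id": 1, "Initials": "JM", "Sex": 0, "Educ": 3},
        {"Id": 2, "Initials": "", "Sex": 1, "Educ": 5},
        {"Id": 2, "Initials": "AB", "Sex": 2, "Educ": 1},
    ]
    for position, problems in check_dataset(sample).items():
        print(position, problems)
```

A report like this only locates suspect entries. Deciding whether to complete, correct, or strike out an answer remains the editor's task, with the original questionnaire at hand.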

Data Editing: Definition
▪ Process of inspecting the raw data from a questionnaire and correcting any errors to ensure accuracy and reliability
▪ Necessary process before analyzing and interpreting the data
▪ Also referred to as “data screening”
▪ Done after detecting erroneous information on the forms and after detecting errors in the encoded data

Data Editing: Reasons
1. To detect and correct gross errors
2. To complete, as much as possible, the responses to the questions
3. To eliminate inconsistent, incorrect, or imprecise responses
4. To clarify the responses to some specific questions
5. To make the data entries clear, legible, comprehensible, and consistently uniform
6. To reduce non-response or incomplete response
7. To prepare the data for coding, encoding, and processing
8. To impose some minimum standards on the quality of the raw data
Primary Objective: Ensure accuracy and reliability of data

Data Editing: When to do?
1.
Field Editing
▪ Done in the field as soon as possible after the questionnaire has been administered
▪ During data collection, the interviewer cannot always write out the responses completely and legibly because of time constraints
▪ Ad hoc abbreviations and the writing style of the interviewer can be difficult for others to decipher
▪ As soon as possible, the interviewer should review the responses, complete what was abbreviated, translate any personal shorthand, and rewrite all illegible writing
2. Central Editing
▪ Done in the central office after the completed returns have been received
▪ What to do when replies are clearly inappropriate or missing?
  ▪ The editor can sometimes determine the proper answer by reviewing the other information in the questionnaire
  ▪ The editor strikes out the answer if it is clearly inappropriate and there is no reasonable basis for determining the correct response

Data Editing: Who should edit?
1. Interviewer
▪ Qualifies as a field editor to perform data editing as soon as possible
2. Editor
▪ Responsible for data editing in the central office
▪ Editing is sometimes assigned to other persons under the supervision of the editor
▪ Data editing must never be assigned to an inexperienced or temporary clerk

Data Editing: General Rules
1. Scrutinize all pages of the questionnaires to determine that no page was omitted.
2.
Don’t erase or destroy the original responses on the questionnaire.
3. Use colored pencil or pen marks when adjustments, corrections, and changes are made, to distinguish edits from changes made by a respondent or interviewer.
4. Although it is advisable to have a single person specialize in all editing for a given study, when more than one editor is needed, each should work through a complete questionnaire for efficiency.
5. Editing instructions should be written explicitly and completely so that no misunderstanding arises as to how the task is to be handled.
6. Classify general answers, especially in opinion or interpretative types of questions.

References
• Lebanan-Dalida, M.O. Data Processing. National Sanitarian Training Center, Department of Biostatistics and Epidemiology, College of Public Health, University of the Philippines Manila, July 29, 2009.
• Asaad, A.S. Processing of Data. Department of Biostatistics and Epidemiology, College of Public Health, University of the Philippines Manila.
• Parel, C.P., et al. Data Analysis and Interpretation. Philippine Social Science Council, Inc. Quezon City: NEDA-APO Production Unit, 1979.