Uploaded by judeonanad07

Data Processing

Data Processing
John Robert Carabeo Medina
Assistant Professor
Department of Epidemiology and Biostatistics
COLLEGE OF PUBLIC HEALTH
University of the Philippines Manila
SEAMEO TROPMED Philippines
Regional Centre for Public Health, Hospital Administration,
Environmental and Occupational Health
Learning Objectives
At the end of the session, the students should be able to:
1. Explain data processing and its four major phases
2. Discuss data coding and its importance
3. Discuss the principles of setting up categories and
different types of code
4. Explain the stages of coding and coding manual
5. Define data encoding, data checking, and data
editing
6. Enumerate reasons for data editing
Data Processing
▪ Systematic procedure to ensure that the information
and/or data gathered are complete, consistent, and
suitable for analysis
▪ Has four (4) major phases
▪ Data Coding
▪ Data Encoding
▪ Data Checking
▪ Data Editing
▪ Some researchers include data analysis; others do not
Data Coding: Definition
▪ Process of (1) grouping the responses to a
question into categories and (2) assigning codes to
these categories
▪ Codes - numbers, characters and/or other
symbols
▪ Facilitates counting and tabulation
▪ Symbols or numbers used must allow for flexible
and rapid storage, retrieval and analysis
▪ Final choice of code depends on the objectives of
the study and the type of storage facility for the
data
Data Coding: Importance
1. To group the responses into a limited
number of meaningful categories, so as to
bring out their essential pattern.
2. To make recording of information easy.
3. To preserve the confidentiality of the data
for personnel, industrial, or security reasons.
4. To make it easier to enter the data into the
computer, to minimize storage space, and to speed
up processing during encoding.
5. To facilitate management of files (e.g., sorting,
merging, transforming, listing, etc.) using the
coded values of one or more key variables.
6. To be able to analyze data using statistical
software such as Epi Info™, SAS, and SPSS, which
generally require numeric inputs.
Data Coding: Code Construction
Five Principles in Setting up Code Categories in
Code Construction
1. Appropriateness
2. Single Concept
3. Exhaustiveness
4. Exclusiveness
5. Adequacy
1. Appropriateness
▪ Categories should be appropriate to the
research problem; construct them in such a way
that they provide the information needed to
satisfy our objectives
2. Single Concept
▪ Categories should be defined in terms of a single
concept or classification principle relevant to the
objectives of the study
3. Exhaustiveness
▪ There should be a category corresponding to each
response
▪ No response should fail to belong to one of
the constructed categories.
▪ Unfortunately, we are not always able to
anticipate all responses, so a catch-all category
(e.g., “Others”) is often included
▪ A high proportion of responses falling into this
catch-all category suggests that the categories were
not properly constructed
4. Exclusiveness
▪ Categories should be mutually exclusive
▪ Each response should fall into one and only one
category.
5. Adequacy
▪ Categories should adequately summarize the
responses without losing the important details of the
information
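For numeric categories, exhaustiveness and exclusiveness can be checked mechanically. The following is a minimal sketch, not part of the original lecture: the function name, the `(code, lower, upper)` tuple layout, and the age groups are all hypothetical, and it assumes whole-number values with inclusive bounds.

```python
def find_category_problems(brackets, lowest=0):
    """Return gaps and overlaps between consecutive integer brackets.

    Each bracket is (code, lower, upper) with inclusive bounds;
    upper=None means the bracket has no upper limit.
    """
    problems = []
    ordered = sorted(brackets, key=lambda b: b[1])
    # Values below the first bracket would have no category.
    if ordered[0][1] > lowest:
        problems.append(("gap", lowest, ordered[0][1] - 1))
    for (_, _, prev_upper), (_, lower, _) in zip(ordered, ordered[1:]):
        if prev_upper is None or lower <= prev_upper:
            # Some value falls into two categories: not exclusive.
            problems.append(("overlap", lower, prev_upper))
        elif lower > prev_upper + 1:
            # Some value falls into no category: not exhaustive.
            problems.append(("gap", prev_upper + 1, lower - 1))
    # A closed last bracket leaves larger values uncategorized.
    if ordered[-1][2] is not None:
        problems.append(("gap", ordered[-1][2] + 1, None))
    return problems


# Hypothetical age groups (in years) used only for illustration.
age_groups = [(1, 0, 17), (2, 18, 39), (3, 40, 59), (4, 60, None)]
print(find_category_problems(age_groups))  # []
```

An empty list means every value from `lowest` upward falls into exactly one bracket.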
Data Coding: Types of Codes
1. Field code
2. Bracket code
3. Listing code
4. Pattern code
5. Scale code
1. Field Code
▪ Numbers or responses are recorded as they are given
by the respondent (“Code as is”)
▪ Actual value or quantity of the variable is the
information that will be recorded in the form and later
on entered in the computer
▪ Ex. Age as of last birthday, height in centimeters,
weight in kilograms, number of households with toilet,
and number of facilities inspected
Example 1. What is your age in years? ________
2. Bracket Code
▪ Each category of this type of code refers to a range of
numbers or class intervals
▪ Frequently used for numerical responses when the
respondent may not be able to provide an exact answer
▪ Ex. Income, age
Example 2. How much is your monthly salary?
❑ Less than 10,000
❑ 10,000 – 19,999
❑ 20,000 – 39,999
❑ 40,000 – 49,999
❑ 50,000 or greater
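Assigning a bracket code from an exact figure can be sketched as a lookup on the lower bounds of the brackets. This is an illustration only: the function name and the numeric codes 1 to 5 are assumptions (the form itself uses check boxes), and the top bracket is assumed to start at 50,000 so that it does not overlap the 40,000 – 49,999 bracket.

```python
from bisect import bisect_right

# Lower bound of each salary bracket, in ascending order.
# The last bound (50,000) is an assumption chosen so that the
# brackets are mutually exclusive.
LOWER_BOUNDS = [0, 10_000, 20_000, 40_000, 50_000]


def salary_bracket_code(monthly_salary):
    """Return the bracket code (1 to 5) for a non-negative monthly salary."""
    if monthly_salary < 0:
        raise ValueError("salary cannot be negative")
    # bisect_right counts how many lower bounds are <= the salary,
    # which is exactly the position of the bracket it falls into.
    return bisect_right(LOWER_BOUNDS, monthly_salary)


print(salary_bracket_code(9_999))   # 1
print(salary_bracket_code(10_000))  # 2
print(salary_bracket_code(49_999))  # 4
print(salary_bracket_code(50_000))  # 5
```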
3. Listing Code
▪ Pre-specified choices that can be selected as
answers to the question are listed or given in the form
▪ Two types
▪ Single response - only one answer to the question
▪ Multiple response – respondents may give
several answers simultaneously
Single Response
Example 3.1. What is your civil status?
1 - Less than 10,000
2 - 10,000 – 19,999
3 - 20,000 – 39,999
4 - 40,000 – 49,999
5 - 50,000 or greater
Multiple Response
Example 3.2. What were the signs and symptoms
experienced by the patient? Check all that apply.
__ Fever
__ Nausea
__ Diarrhea
__ Malaise
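A multiple-response question is often stored as one 0/1 variable per choice, so that each symptom can be tabulated separately. The lecture does not prescribe this layout; the sketch below assumes it, and the function and variable names are hypothetical.

```python
SYMPTOMS = ["Fever", "Nausea", "Diarrhea", "Malaise"]


def code_multiple_response(checked, choices=SYMPTOMS):
    """Return one 0/1 indicator per choice (1 = checked, 0 = not checked)."""
    unknown = set(checked) - set(choices)
    if unknown:
        # A response outside the listed choices cannot be coded.
        raise ValueError(f"not in the list of choices: {sorted(unknown)}")
    return {choice.lower(): int(choice in checked) for choice in choices}


print(code_multiple_response(["Fever", "Malaise"]))
# {'fever': 1, 'nausea': 0, 'diarrhea': 0, 'malaise': 1}
```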
4. Pattern Code
▪ Single code allocated
to each type of
response, including
multiple or combined
responses
(bidimensional code)
▪ First code the
response to the
primary question; then
to the secondary
questions.
Example 4. Do you find the project helpful? Are you
planning to go on with the project or do you plan to
discontinue your project?
Answer: “Well, I find it sort of helpful but I simply
do not have the time for it. So, I guess I just have to
quit from the project.” So we code this answer as ‘2’.
Possible answers Codes
Project helpful – plans to go on (1)
Project helpful – plans to discontinue (2)
Project helpful – plans undecided (3)
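The pattern code above can be represented as a lookup keyed by the pair (primary answer, secondary answer). This is a minimal sketch with hypothetical names; only the three combinations listed in the example are defined, and any other combination is rejected rather than guessed.

```python
# Keys are (answer to primary question, answer to secondary question).
PATTERN_CODES = {
    ("helpful", "go on"): 1,
    ("helpful", "discontinue"): 2,
    ("helpful", "undecided"): 3,
}


def pattern_code(primary, secondary):
    """Return the single code for a combined (primary, secondary) response."""
    try:
        return PATTERN_CODES[(primary, secondary)]
    except KeyError:
        raise ValueError(
            f"no code defined for {(primary, secondary)!r}"
        ) from None


# The sample answer: finds the project helpful but plans to quit.
print(pattern_code("helpful", "discontinue"))  # 2
```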
5. Scale Code
▪ Grouping of responses
which vary in degree
or intensity along a
continuum
▪ For coding responses
to attitudinal questions
▪ Divide the continuum
into segments; then
code
Example 5. Five-point scale (This scale codes
responses according to their degree of feeling.)
Q: “How satisfied are you with your present job?”
A: “Neither satisfied nor dissatisfied.” Code (3)
1 - Very Satisfied
2 - Satisfied
3 - Neutral
4 - Dissatisfied
5 - Very Dissatisfied
Data Coding: Stages
Two Stages of Coding
1. Pre-coding
2. Post-coding
1. Pre-coding
▪ Process of establishing the categories and codes
early in the questionnaire design stage especially for
fixed-alternative questions
▪ Coding for closed questions (categories are fixed or
predetermined)
▪ Codes may be devised even prior to data
collection
▪ Respondents need only to encircle or check the
appropriate code, or the data collector needs only
to write down the proper code
2. Post-coding
▪ Process of finalizing the categories and codes after
the questionnaires have come in from the field
▪ Necessary for open-ended questions, which may be
difficult, if not impossible, to pre-code
Data Coding: Coding Manual
▪ A document containing all the instructions regarding
the codes to be used for the replies given by the
respondents
▪ By referring to this document, any person must
be able to determine the meaning for each of the
symbols used in encoding the data
▪ Keep the manual as detailed and complete as
possible
Content
1. Question number
▪ Provides easy reference in case there is a need to
look for the question within the questionnaire itself
2. Variable name (to be used in the computer)
▪ Should be as short as possible, since some computer
software allows only 8 characters at the most
3. Variable Description
▪ Facilitates data analysis and report writing when the
whole variable name cannot be accommodated by
the software used, or when the coding manual
contains only the question number without any variable
names
4. Actual Codes
▪ Actual coding instructions must appear under this
column
▪ Should be as detailed and complete as possible to
enable easy decoding of the information processed
Question No.   Variable Name   Variable Description             Codes & Instruction
1              Id              Identification number            Unique number from 1 to 50
2              Initials        Initials of the respondent       Code as is
3              Sex             Assigned sex at birth            0 – Male; 1 – Female
4              Educ            Highest educational attainment   0 – None; 1 – Elementary; 2 – High school; 3 – College
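When the data are stored electronically, the same coding manual can be kept in machine-readable form so that anyone can decode a record. The sketch below mirrors the four variables in the sample manual; the dictionary layout, the function name, and the sample record are hypothetical and used only for illustration.

```python
# Machine-readable version of the sample coding manual.
# "codes": None marks a field code (value is recorded as is).
CODING_MANUAL = {
    "Id": {"question": 1, "description": "Identification number",
           "codes": None},  # unique number from 1 to 50
    "Initials": {"question": 2, "description": "Initials of the respondent",
                 "codes": None},  # code as is
    "Sex": {"question": 3, "description": "Assigned sex at birth",
            "codes": {0: "Male", 1: "Female"}},
    "Educ": {"question": 4, "description": "Highest educational attainment",
             "codes": {0: "None", 1: "Elementary",
                       2: "High school", 3: "College"}},
}


def decode_record(record, manual=CODING_MANUAL):
    """Translate coded values back to labels; field codes are kept as is."""
    decoded = {}
    for name, value in record.items():
        codes = manual[name]["codes"]
        decoded[name] = value if codes is None else codes[value]
    return decoded


# Hypothetical encoded record.
print(decode_record({"Id": 7, "Initials": "JD", "Sex": 1, "Educ": 2}))
# {'Id': 7, 'Initials': 'JD', 'Sex': 'Female', 'Educ': 'High school'}
```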
Data Encoding: Definition
▪ Process of transforming the data written in the
questionnaire or form into electronic form
▪ Available computer programs for data encoding:
▪ Microsoft Excel
▪ Microsoft Access
▪ Epi Info™
Data Checking: Definition
▪ Inspection of the data collected
▪ Data are scrutinized to identify erroneous
information that can affect the validity of the results
• Check data for the following:
1. Completeness: Incomplete answers should be
filled up when possible.
2. Consistency: Reject the question or even the
questionnaire where inconsistent answers are
seen.
3. Accuracy: Locate and discard answers which are
extremely doubtful and inaccurate
4. Uniformity: Standardize all answers involving
units of measurement.
5. Comprehensiveness: Detect any response which
is incomprehensible to all but the field interviewer and
make the necessary clarification.
6. Legibility: Correct any unintelligible handwriting or
use of abbreviations and symbols that may not be
understood by the coder later on.
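Once the data are encoded, the completeness and accuracy checks can be partly automated by comparing each value against the codes allowed in the coding manual. The following is a minimal sketch under that assumption; the function name and the `VALID_VALUES` table are hypothetical, and the allowed values follow the sample coding manual (Id from 1 to 50, Sex 0 or 1, Educ 0 to 3).

```python
# Allowed values per variable, taken from the sample coding manual.
VALID_VALUES = {
    "Id": range(1, 51),
    "Sex": {0, 1},
    "Educ": {0, 1, 2, 3},
}


def check_record(record, valid=VALID_VALUES):
    """Return a list of (variable, problem) pairs found in one record."""
    problems = []
    for name, allowed in valid.items():
        value = record.get(name)
        if value is None or value == "":
            # Completeness: the answer was left blank.
            problems.append((name, "missing"))
        elif value not in allowed:
            # Accuracy: the value is not a code defined in the manual.
            problems.append((name, f"invalid value: {value!r}"))
    return problems


# Hypothetical encoded record with two problems.
print(check_record({"Id": 51, "Sex": 1, "Educ": None}))
# [('Id', 'invalid value: 51'), ('Educ', 'missing')]
```

A check like this only flags records for review; deciding how to correct them remains a data editing task.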
Data Editing: Definition
▪ Process of inspecting the raw data from a
questionnaire and correcting for any error to
ensure its accuracy and reliability
▪ Necessary process before analyzing and
interpreting the data
▪ Referred to also as “data screening”
▪ Done after detecting erroneous information on the
forms and after detecting errors in the encoded
data
Data Editing: Reasons
1. To detect and correct gross errors
2. To complete, as much as possible, the
responses to the questions
3. To eliminate inconsistent, incorrect, or
imprecise responses.
4. To clarify the response to some specific
questions.
5. To make the data entries clear, legible,
comprehensible, and consistently uniform
6. To reduce non-response or incomplete
response.
7. To prepare the data for coding, encoding and
processing
8. To impose some minimum standards on the
quality of the raw data
Primary Objective: Ensure accuracy
and reliability of data
Data Editing: When to do?
1. Field Editing
▪ Done in the field as soon as possible after the
questionnaire has been administered
▪ During data collection, the interviewer cannot
always write out the responses completely and
legibly because of time constraints.
▪ Ad hoc abbreviations and the writing style of
the interviewer can be difficult for others to
decipher
▪ As soon as possible, the interviewer
should review the responses, complete what was
abbreviated, translate any personal shorthand, and
rewrite all illegible writing
2. Central Editing
▪ Done in the central office after the completed
returns have been received
▪ What to do when replies are clearly inappropriate
or missing?
▪ The editor can sometimes determine the proper
answer by reviewing the other information in
the questionnaire.
▪ The editor should strike out the answer if it is
clearly inappropriate and there is no reasonable
basis for determining the correct response.
Data Editing: Who should edit?
1. Interviewer
▪ qualifies as a field editor to perform data editing
as soon as possible
2. Editor
▪ responsible for data editing in the central office
▪ sometimes assigned to other persons under
the supervision of the editor
▪ data editing must never be assigned to an
inexperienced or temporary clerk
Data Editing: General Rules
1. Scrutinize all pages of the questionnaires to
determine that no page was omitted.
2. Don’t erase or destroy the original responses on
the questionnaire.
3. Use colored pencil or pen marks when
adjustments, corrections and changes are made
to distinguish edits from changes by a
respondent or interviewer.
4. Although it is advisable to have a single
person specialize in all editing for a given
study, when more than one editor is needed,
each should work through a complete
questionnaire for efficiency.
5. Editing instructions should be written explicitly
and completely so that no misunderstanding
as to the manner in which to handle the
task may arise.
6. Classify general answers especially in opinion or
interpretative types of questions.
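Rules 2 and 3 ask that original responses never be destroyed and that edits remain distinguishable from them. For encoded data, one possible equivalent is to leave the raw record untouched and log every change. This is only a sketch of that idea, not a procedure from the lecture; the function name, the log fields, and the sample record are hypothetical.

```python
def apply_edit(record, edit_log, variable, new_value, editor, reason):
    """Return an edited copy of the record and log the change.

    The original record is not modified, so the raw response is preserved,
    and the log entry plays the role of the colored pencil mark.
    """
    edit_log.append({
        "variable": variable,
        "original": record.get(variable),
        "edited": new_value,
        "editor": editor,
        "reason": reason,
    })
    edited = dict(record)  # shallow copy; the raw record stays as is
    edited[variable] = new_value
    return edited


# Hypothetical record in which Sex was encoded with an undefined code.
raw = {"Id": 3, "Sex": 9}
log = []
clean = apply_edit(raw, log, "Sex", 1, "central editor",
                   "verified against the paper questionnaire")
print(raw["Sex"], clean["Sex"], log[0]["original"])  # 9 1 9
```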
References
• Lebanan-Dalida, M.O. Data Processing: National
Sanitarian Training Center: Department of Biostatistics
and Epidemiology, College of Public Health, University
of the Philippines Manila, July 29, 2009.
• Asaad, A.S. Processing of Data: Department of
Biostatistics and Epidemiology, College of Public
Health, University of the Philippines Manila
• Parel, C.P., et al. Data Analysis and Interpretation.
Philippine Social Science Council, Inc. Quezon City:
NEDA-APO Production Unit. 1979