Data entry

Data entry – principles
and practices
Module 2 Session 2
1
Overview

This session is concerned with the principles and practices of data entry so that participants can:
i. advise others on how to do effective data entry
ii. explain principles of good data entry through practice with a small set of data
2
Data management cycle
Conception
-> Design survey
-> Design questionnaire
-> Enumerators collect data in the field
-> Manual checking, editing etc.
-> Data entered onto computer
-> Computer data management
-> Data analysis
-> Reporting of results
Now we start looking at entering data.
3
Contents
- Review different types of questions that can be found on questionnaires
- Review different data types
- Enter a small dataset onto the computer
- Summarise steps in data entry, and principles of good data entry
The Epi Info software is used for data entry.
4
Learning Objectives
At the end of this session participants should be able to:
- enter questionnaire data onto the computer
- summarise the steps in the data entry process
- produce a checklist of data entry principles
- describe double data entry

5
Questions and data types

A preliminary review of the questions on a questionnaire gives the data entry person an idea of:
- the types of data to be entered
- the complexity of the data to be entered
- the quality of the data on the questionnaires.

It is also essential for designing the computer data entry screens.
Here we look at some example questions.
6
Types of questions
These are examples of numeric data.

What is (NAME'S) age in completed years?
For how long has (NAME) stayed in the household during the last 12 months? (In months)

The first one takes values of 1, 2, 3, etc.
Units are in years.
What is the maximum value?
The second one can take values of 0, 1, 2, 3, etc.
Units are in months.
Duration cannot be more than 12 months.
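The limits noted for the two numeric questions above can be turned into simple range checks at entry time. The following is only a minimal sketch: the field names are hypothetical, and the age limits are assumptions, since the slide leaves the maximum value as an open question.

```python
# Hypothetical field names; the age limits are assumed for illustration only.
RANGES = {
    "age_years": (0, 120),            # assumed limits, not taken from the slide
    "months_in_household": (0, 12),   # duration cannot exceed 12 months
}

def check_range(field, value):
    """Return True if value is a whole number inside the allowed range."""
    low, high = RANGES[field]
    return isinstance(value, int) and low <= value <= high

def find_range_errors(record):
    """List the fields in one record whose values are out of range."""
    return [f for f in RANGES if f in record and not check_range(f, record[f])]

# A record with an impossible duration is flagged for checking.
errors = find_range_errors({"age_years": 34, "months_in_household": 13})
```

A check like this only flags a value for review against the paper questionnaire; it does not decide what the correct value is.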
7
Types of questions
Sex
Male .................... 1
Female .................. 2

This is an example of categorical data.
It has two possible values: male and female.
These are coded as 1 and 2.
The codes are entered onto the computer.

Other similar examples are Yes/No types of response.
Coding is often Yes = 1, No = 0; or Yes = 1, No = 2.
[Coding should be consistent throughout.]
8
Types of questions
What is the relationship of (NAME) to the head of household?

Head ................................... 1
Spouse ................................. 2
Son/daughter ........................... 3
Grand child ............................ 4
Step child ............................. 5
Parent of head or spouse ............... 6
Sister/Brother of head or spouse ....... 7
Nephew/Niece ........................... 8
Other relatives ........................ 9
Servant ................................ 10
Non Relative ........................... 11
Others ................................. 12

This is also categorical data.
There are 12 possible values, coded 1 to 12.
The computer system needs sufficient space to enter numbers of up to 2 digits.
9
Types of questions
What is [NAME'S] current schooling status?

Never attended ......................... 01
Left school ............................ 02
Currently attending:
Nursery ................................ 03
Primary ................................ 04
Post primary ........................... 05
Secondary .............................. 06
Post secondary** ....................... 07
A diploma course ....................... 08
University ............................. 09
Apprenticeship ......................... 10

This is also a categorical variable.
Are the categories in any particular order?
Are the categories mutually exclusive?

10
Multiple response questions
Multiple response questions can be in the form of:
- Multiple dichotomy
- Responses listed but not ordered
- Ranked, e.g. list 1st, 2nd, 3rd.
How should these be entered?
11
Example: Multiple dichotomy
Question from UNHS, S5b10.
Does this household own any of the following?
(Yes = 1, No = 2)

Motor vehicle .......... 1
Motor cycle ............ 2
Bicycle ................ 1
Boat/canoe ............. 2
Donkey ................. 2
12
Example: Listed but not ordered
multiple responses
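UNHS S3a3.
What sort of sickness/injury did [X] suffer? (column 3)

Malaria ........................ 01
Respiratory .................... 02
Measles ........................ 03
Diarrhoea ...................... 04
Aids ........................... 05
Pregnancy related problems ..... 06
Dental ......................... 07
Accident ....................... 08
Intestinal worms ............... 09
Skin infections ................ 10
Others ......................... 11

If code 01 (malaria) in column (3):
5. What type of drug did [X] take?

None ........................... 1
Chloroquine .................... 2
Fansidar ....................... 3
Camaquine ...................... 4
Quinine ........................ 5
Panadol ........................ 6
Aspirin ........................ 7
Herbs .......................... 8
Others ......................... 9

Example responses: (5a) 2, (5b) 5, (5c) 3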
13
Example: Ranked multiple
responses
UNHS S3bq3: What are the main channels of communication from which you receive AIDS/HIV information and education?
(Note that the three most important channels should be ranked in order of importance.)
(Use codes at the bottom of page.)

1st (3): 08
2nd (4): 01
3rd (5): 07

Channels of communication (codes for col. (3), (4) and (5)):
Radio ............... 01
TV .................. 02
Film ................ 03
Drama ............... 04
Posters ............. 05
Billboards .......... 06
Family .............. 07
Friends ............. 08
Teachers ............ 09
Political leaders ... 10
Trad. leaders ....... 11
Religious leaders ... 12
14
Computerisation

The dichotomous multiple response questions require one column for each Yes/No (or 1/0) response,
each one indicating whether the respondent ticked or did not tick the item in the list.

In the ordered or ranked multiple responses, there can be as many columns as there are alternatives in the question; the first column records the most important response, the second the next most important, and so on.
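The two column layouts described above can be sketched in a few lines. This is an illustration only, using the asset and AIDS/HIV channel examples from the earlier slides; the column names and helper functions are hypothetical and are not part of Epi Info.

```python
# Multiple dichotomy: one column per listed item (codes as on the form: 1 = Yes, 2 = No).
ASSETS = ["motor_vehicle", "motor_cycle", "bicycle", "boat_canoe", "donkey"]

def dichotomy_row(answers):
    """Map the answer codes, in questionnaire order, to one column per item."""
    if len(answers) != len(ASSETS):
        raise ValueError("expected one answer per listed item")
    return {f"owns_{item}": code for item, code in zip(ASSETS, answers)}

def ranked_row(ranks, prefix="aids_info_rank"):
    """Store ranked codes in order: column 1 holds the most important."""
    return {f"{prefix}{i}": code for i, code in enumerate(ranks, start=1)}

asset_row = dichotomy_row([1, 2, 1, 2, 2])   # answers from the asset example
rank_row = ranked_row(["08", "01", "07"])    # 1st, 2nd, 3rd channel codes
```

The ranked codes are kept as text here so that leading zeros such as "08" are stored exactly as written on the questionnaire; storing them as numbers would be an equally reasonable choice.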
15
More complex questions
Did [NAME] fall sick or get injured during the last 30 days? (col. 2)

Yes ............................ 1
No ............................. 2
Don't know ..................... 3
(If no or don't know, skip to col. (11).)

If code 1 in col. (2): What sort of sickness/injury did [NAME] suffer? (col. 3)

Malaria ........................ 01
Respiratory .................... 02
Measles ........................ 03
Diarrhoea ...................... 04
AIDS ........................... 05
Pregnancy related problems ..... 06
Dental ......................... 07
Accident ....................... 08
Intestinal infections .......... 09
Skin infections ................ 10
Hyper-tension .................. 11
Ulcers ......................... 12
Mental illness ................. 13
Other fever .................... 14
Others ......................... 15

How many days were lost (suffered) by [NAME] due to the illness/injury?

If code 01 (Malaria) in col. (3): What type of drugs did [NAME] take?

None ........................... 1
Chloroquine .................... 2
Fansidar ....................... 3
Camaquine ...................... 4
Quinine ........................ 5
Panadol ........................ 6
Aspirin ........................ 7
Herbs .......................... 8
Other (specify) ................ 9

How should these data be entered?
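One way to think about entering questions like these is to check the skip pattern for each record: later questions should only hold data when the earlier answers allow it. The sketch below is an assumption-laden illustration, with hypothetical field names standing in for the questionnaire columns.

```python
def check_skip(record):
    """Return a list of skip-pattern problems for one person's record.

    Field names are hypothetical: 'sick' is col (2), 'illness' is col (3),
    'drugs' holds the drug codes asked only when illness is 01 (malaria).
    """
    problems = []
    sick = record.get("sick")
    illness = record.get("illness")
    drugs = record.get("drugs", [])
    if sick in (2, 3) and illness is not None:
        problems.append("illness recorded although person was not reported sick")
    if sick == 1 and illness is None:
        problems.append("illness missing for a sick person")
    if drugs and illness != "01":
        problems.append("drugs recorded although illness is not malaria")
    return problems

ok = check_skip({"sick": 1, "illness": "01", "drugs": [2, 5, 3]})
bad = check_skip({"sick": 1, "illness": "02", "drugs": [6]})
```

In a data entry screen the same rules would normally be applied by skipping the fields automatically, so that the data entry person never reaches a question that does not apply.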
16
Missing values


Surveys will always have missing data.
Data can be missing for a variety of reasons:
- respondent did not know the answer;
- respondent refused to answer;
- question was not applicable;
- question was missed by the fieldworker;
- response was not recorded clearly;
- etc.
17
Coding missing data




- Assigning codes to missing data avoids blanks in the data.
- The code must not be a possible value.
- For numeric data (e.g. age) a negative value is often used (e.g. -99).
- For categorical data use a code higher than any valid code for the question (e.g. 99).
18
Missing value codes

Different codes could be used for different types of missing data:
- 99 or -99 = question missed by fieldworker
- 88 or -88 = question not applicable
- 77 or -77 = don't know or refused to answer
Codes should be consistent throughout.
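Once missing values have their own codes, they can be separated from valid answers before any summary is made. A minimal sketch, assuming the categorical codes listed above and a made-up list of entered values:

```python
# Missing-value codes for a categorical field, as listed on the slide.
MISSING_CODES = {
    99: "missed by fieldworker",
    88: "not applicable",
    77: "don't know / refused",
}

def split_missing(values, missing_codes=MISSING_CODES):
    """Separate valid values from missing ones and count each missing reason."""
    valid, reasons = [], {}
    for v in values:
        if v in missing_codes:
            reason = missing_codes[v]
            reasons[reason] = reasons.get(reason, 0) + 1
        else:
            valid.append(v)
    return valid, reasons

# Made-up entries for a question with valid codes 1 to 12.
entered = [1, 2, 3, 99, 3, 88, 12]
valid, reasons = split_missing(entered)
```

For a numeric field such as age, the same function could be called with the negative codes (-99, -88, -77) passed in as `missing_codes`.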
19
Unique Identifier




- Each set of data should have a unique identifier.
- This is often referred to as a Primary Key.
- In household surveys, for example, you often have a Household ID.
- This is unique for each household and enables you to find the data for that household easily.
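Because the identifier must be unique, a quick check for repeated IDs is worth running after data entry. The sketch below assumes the records are held as a list of dictionaries; the field name `household_id` and the ID values are made up for illustration.

```python
from collections import Counter

def duplicate_ids(records, key="household_id"):
    """Return identifiers that appear more than once, in first-seen order."""
    counts = Counter(r[key] for r in records)
    return [value for value, n in counts.items() if n > 1]

# Made-up records: the third one repeats an ID that was already entered.
households = [
    {"household_id": "H001", "size": 4},
    {"household_id": "H002", "size": 6},
    {"household_id": "H001", "size": 5},
]
repeated = duplicate_ids(households)
```

Database software usually enforces this automatically when a field is declared as the primary key, so a check like this matters most for data held in plain files or spreadsheets.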
20
Activity 2




In pairs.
Look at questionnaires.
Identify types of questions, and types of
data.
Class discussion.
21
Brief introduction to Epi Info…


Epi Info is a series of freely distributable programs for Microsoft Windows for managing databases (especially public health ones).
With Epi Info you can customize the data entry process (layout similar to the questionnaire), enter data and analyse data.
22
Brief introduction to Epi Info…
Epi Info contains:
- Projects (a file, .mdb), which have
- Data Tables, which store the data
- Views: information about the screen appearance, or how the survey looks, and how data is entered into the data table. A View has fields (variables) which are created to hold data.
23
Data entry in Epi Info

Points to note:
- A View can span several "pages"
- Space is assigned for "Other, specify" text
- Questions can be skipped if not relevant
- Demonstrate data entry using the Household Survey data
24
Activity 4 & 5

Entering a small dataset into Epi Info.

Record some principles of good data
entry.
Record the steps in the data entry
process.

25
Double data entry



- Data entry needs to be checked.
- If the dataset is small, you can print it out and check it manually.
- If the dataset is large, this can be resource-intensive and time-consuming.
  - How many records do you need to check?
- Double data entry = the dataset is entered twice (by different people) and the two datasets are compared. Discrepancies are checked and corrected.
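The comparison step of double data entry can be sketched as follows. This is not how any particular software does it; it is a minimal illustration that assumes both entries are lists of dictionaries sharing a unique identifier, with made-up field names and values.

```python
def compare_entries(first, second, key="household_id"):
    """Compare two entries of the same data, matched on the unique identifier.

    Returns (id, field, value_in_first, value_in_second) for every mismatch;
    a record missing from one entry is reported with field "<record>".
    """
    a = {r[key]: r for r in first}
    b = {r[key]: r for r in second}
    differences = []
    for rec_id in sorted(a.keys() | b.keys()):
        if rec_id not in a or rec_id not in b:
            differences.append((rec_id, "<record>", rec_id in a, rec_id in b))
            continue
        for field in sorted(a[rec_id].keys() | b[rec_id].keys()):
            if a[rec_id].get(field) != b[rec_id].get(field):
                differences.append(
                    (rec_id, field, a[rec_id].get(field), b[rec_id].get(field))
                )
    return differences

# Made-up example: the two people typed different ages for household H001.
entry1 = [{"household_id": "H001", "age": 34, "sex": 1},
          {"household_id": "H002", "age": 27, "sex": 2}]
entry2 = [{"household_id": "H001", "age": 43, "sex": 1},
          {"household_id": "H002", "age": 27, "sex": 2}]
diffs = compare_entries(entry1, entry2)
```

Each reported difference still has to be resolved by going back to the paper questionnaire, since the comparison cannot tell which of the two entries is correct.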
26
Data Compare Utility








Utilities -> Data Compare
File -> New Script
Step 1: Epi Info View – select the files to compare
Step 2: Checks that structure of the files is the same
Step 3: Select the unique identifier
Step 4: Select the fields to compare (all)
View -> Read-Only
Demonstration of Data Compare using data1 and
data2
27
Activity 7

Use the Data Compare utility to compare
data entered in Activity 4 with data entered by
another group
28