Data Capture Overview
United Nations Statistics Division
UNSD Regional Workshop on Census Data Processing for the English speaking African Countries:
Contemporary technologies for data capture, methodology and practice of data editing
Dar es Salaam, Tanzania, 9-13 June 2008
Overview of Presentation
- Definition of data capture
- Methods of data capture:
  - Different methods
  - Advantages and disadvantages
- Issues to consider
What’s Data Capture?
“Data capture is the system used to convert
the information obtained in the census to a
format that can be interpreted by a
computer.”
Source: United Nations Principles and Recommendations for
Population and Housing Censuses, Rev. 2, p.68.
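To make the definition concrete, here is one possible computer-readable format: a person's coded responses written out as a fixed-width text record. This is a minimal Python illustration; the record layout, field names and widths are hypothetical and not taken from any actual census.

```python
# Hypothetical record layout: (field name, width in characters).
RECORD_LAYOUT = [
    ("household_id", 8),
    ("person_no", 2),
    ("sex", 1),
    ("age", 3),
]


def to_fixed_width(responses: dict) -> str:
    """Write one person's coded responses as a single fixed-width text record."""
    parts = []
    for field, width in RECORD_LAYOUT:
        value = str(responses[field])
        if len(value) > width:
            raise ValueError(f"{field}={value!r} does not fit in {width} characters")
        # Left-pad each numeric code with zeros to its field width.
        parts.append(value.zfill(width))
    return "".join(parts)


record = to_fixed_width({"household_id": 1042, "person_no": 1, "sex": 2, "age": 34})
print(record)  # 00001042012034
```

Because every field sits at a known position, any program that knows the layout can read the record back without ambiguity.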
Data Capture Methods
1) Keyboard data entry
2) Optical mark recognition/reading (OMR)
3) Optical character recognition/intelligent character recognition (OCR/ICR)
4) Personal digital assistant (PDA)
5) Internet

- Advantages, disadvantages, costs and impacts of each method at both the data capture stage and later stages
- Combination of more than one of the above methods
Keyboard Data Entry
- Response codes from the census form are manually entered into computers
- A more sophisticated version is computer-assisted key entry, in which the operator selects a response from options displayed on the screen
- Use of the method depends on time and cost considerations, and on the feasibility of implementing more sophisticated technology
- The method is also used to code textual responses into classification categories
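The computer-assisted variant of key entry can be sketched as follows, assuming a hypothetical marital-status code list: the operator is shown the permitted options, and the program rejects any keyed value that is not one of them. This illustrates the idea only; it does not describe any particular key-entry package.

```python
# Hypothetical code list for a marital-status question.
MARITAL_STATUS_CODES = {
    "1": "Never married",
    "2": "Married",
    "3": "Widowed",
    "4": "Divorced/separated",
}


def show_options(code_list: dict) -> str:
    """Build the menu of permitted responses shown on the operator's screen."""
    return "\n".join(f"{code}. {label}" for code, label in code_list.items())


def accept_keyed_code(keyed: str, code_list: dict) -> str:
    """Accept the operator's entry only if it matches one of the displayed options."""
    keyed = keyed.strip()
    if keyed not in code_list:
        raise ValueError(f"invalid code {keyed!r}; expected one of {sorted(code_list)}")
    return keyed


print(show_options(MARITAL_STATUS_CODES))
print(accept_keyed_code(" 2 ", MARITAL_STATUS_CODES))  # 2
```

Restricting input to the displayed options removes out-of-range codes at the point of entry, although it cannot catch a valid but wrong choice.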
Advantages and Disadvantages of Keyboard Data Entry
Advantages
- Method requires simple software systems and low-end computing hardware
- Less costly (depending on the costs of manpower)
- A large number of PCs will be available for other uses after the census
Disadvantages
- Requires more staff
- Takes much longer to complete than automated data entry
- Potential for errors during data entry
- Standardization of operations is difficult, as performance may depend on the individual operator
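A common way to reduce keying errors, swapped in here as an illustration rather than something this presentation prescribes, is double-entry verification: two operators key the same form independently, and any field on which they disagree is flagged for checking against the paper form. A minimal sketch with hypothetical field names:

```python
def compare_keyings(first: dict, second: dict) -> list:
    """List the fields on which two independent keyings of the same form disagree."""
    fields = sorted(set(first) | set(second))
    # A field keyed by only one operator also counts as a disagreement.
    return [field for field in fields if first.get(field) != second.get(field)]


operator_a = {"household_id": "00001042", "sex": "2", "age": "034"}
operator_b = {"household_id": "00001042", "sex": "2", "age": "043"}
print(compare_keyings(operator_a, operator_b))  # ['age']
```

A mismatch shows that at least one operator made an error but not which one, so the paper form remains the reference. The check also roughly doubles keying effort, which adds to the staffing disadvantage above.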
Data Capture Technologies
- Imaging and intelligent character recognition offer great potential and benefits for data capture
- Technology should be used to make data capture more effective and efficient, not for technology's sake
- Be aware of the long lead times and the technology infrastructure required to implement intelligent character recognition successfully
Optical Mark Recognition/Reading (OMR)
- OMR is a form-scanning method whereby responses are read into a computer without a keyboard
- OMR technology reads responses to “tick-box” type questions on specially designed paper
- The machine detects only the presence or absence of a mark
- The scanned responses are transformed into codes
- Handwritten responses must be manually entered or coded using computer-assisted methods
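Because OMR detects only whether a mark is present, turning a scanned question into a code reduces to a threshold decision for each response box. The sketch assumes the scanner reports a darkness value between 0 and 1 for each box; the 0.5 cut-off and the box codes are hypothetical. Questions with no mark or with several marks are flagged rather than guessed.

```python
from typing import Dict, Optional, Tuple

MARK_THRESHOLD = 0.5  # hypothetical cut-off; real equipment is calibrated for paper and ink


def read_question(box_darkness: Dict[str, float],
                  threshold: float = MARK_THRESHOLD) -> Tuple[Optional[str], str]:
    """Convert the darkness readings of one question's tick boxes into a response code.

    Returns (code, status), where status is "ok", "blank" or "multiple".
    """
    marked = [code for code, darkness in box_darkness.items() if darkness >= threshold]
    if len(marked) == 1:
        return marked[0], "ok"
    if not marked:
        return None, "blank"
    # Several boxes marked: the response is ambiguous and is flagged for review.
    return None, "multiple"


print(read_question({"1": 0.05, "2": 0.82, "3": 0.10}))  # ('2', 'ok')
print(read_question({"1": 0.05, "2": 0.07, "3": 0.10}))  # (None, 'blank')
print(read_question({"1": 0.71, "2": 0.82, "3": 0.10}))  # (None, 'multiple')
```

This is also why OMR restricts paper, ink and pen or pencil: faint or smudged marks move readings toward the threshold and make the decision unreliable.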
Advantages and Disadvantages of OMR
Advantages
- Improved data accuracy
- Data capture faster than keyboard data entry
- Equipment is relatively inexpensive
- Relatively simple to install and run
- A well-established technology that has been used in many countries
Disadvantages
- Restrictions on form design
- Restrictions on type of paper and ink
- Precision required in the printing process and cutting of sheets
- Response boxes must be marked correctly with an appropriate pen or pencil
- Does not capture textual responses
Optical Character Recognition (OCR)/Intelligent Character Recognition (ICR)
- OCR and ICR combine scanning and character recognition technology to scan the whole form and interpret the responses
- OCR technology recognizes machine-printed characters only
- ICR technology reads both machine-printed and handwritten responses in specific locations on the page and transforms the responses into codes
- With OCR, handwritten responses must be manually entered or coded using computer-assisted methods
Advantages of OCR/ICR
- Form design is not as stringent as for OMR
- Processing time can be reduced due to the automated nature of the process
- Allows digital filing of questionnaires, making their storage and retrieval for future use more efficient
- Some handwritten responses can be automatically coded, thereby improving data quality
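Automatic coding of recognized text responses is commonly done by looking each response up in a coding index. The sketch assumes a hypothetical occupation index with illustrative codes and uses only exact matching after normalization. Responses that do not match are returned as None and would go to computer-assisted manual coding.

```python
import re
from typing import Optional

# Hypothetical coding index: normalized description -> classification code (illustrative codes).
OCCUPATION_INDEX = {
    "primary school teacher": "2341",
    "teacher primary school": "2341",
    "subsistence farmer": "6310",
}


def normalize(text: str) -> str:
    """Lower-case the text, replace any run of characters other than a-z with a space, collapse spaces."""
    text = re.sub(r"[^a-z]+", " ", text.lower())
    return " ".join(text.split())


def auto_code(response: str, index: dict) -> Optional[str]:
    """Return the code for an exact index match, or None to refer the response to manual coding."""
    return index.get(normalize(response))


print(auto_code("Primary-School Teacher", OCCUPATION_INDEX))  # 2341
print(auto_code("Farmer", OCCUPATION_INDEX))  # None
```

Exact matching keeps the automatic codes conservative. Vague answers such as "Farmer" are left for a human coder rather than forced into a category.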
Disadvantages of OCR/ICR
- Higher equipment costs (sophisticated hardware/software required)
- High-calibre IT staff required to support the system
- Handwriting on census forms must be as close as possible to the model handwriting to avoid recognition errors
- Possibility of character substitution errors (one character misread as another), which would affect data quality
- Tuning the recognition engine to recognize characters accurately is critical, with a trade-off between quality and cost
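The quality/cost trade-off in tuning can be pictured as a confidence threshold. The sketch assumes the recognition engine returns a confidence score between 0 and 1 for each field; the class, field values, scores and thresholds are all hypothetical. Fields at or above the threshold are accepted automatically, and the rest go to operators for verification.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RecognizedField:
    """One field as returned by a hypothetical recognition engine."""
    name: str
    text: str
    confidence: float  # assumed to lie between 0 and 1


def route_fields(fields: List[RecognizedField],
                 threshold: float) -> Tuple[List[RecognizedField], List[RecognizedField]]:
    """Split fields into those accepted automatically and those sent for manual verification."""
    accepted = [f for f in fields if f.confidence >= threshold]
    to_verify = [f for f in fields if f.confidence < threshold]
    return accepted, to_verify


batch = [
    RecognizedField("age", "34", 0.97),
    RecognizedField("occupation", "TEACHER", 0.88),
    RecognizedField("birthplace", "ARUSHA", 0.62),
]

# Raising the threshold accepts fewer possibly misread fields but sends more work to operators.
for threshold in (0.6, 0.9):
    accepted, to_verify = route_fields(batch, threshold)
    print(threshold, len(accepted), len(to_verify))
```

With these made-up scores, a threshold of 0.6 accepts all three fields, while 0.9 sends two of them to operators. Higher quality is bought with more manual work, and therefore higher cost.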
Personal Digital Assistant (PDA)
- The contents of the census form are stored on the PDA so that the questions appear sequentially on the screen
- Data are entered into a hand-held computer instead of onto a paper census form
- Data are then electronically transmitted to a national statistical office (NSO) database for further processing
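Presenting the questions one at a time lets the device apply routing (skip) rules, so the enumerator sees only the questions that apply to the person being enumerated. A minimal sketch, assuming a hypothetical three-question sequence with age-based routing; the age limits are placeholders, not recommendations.

```python
from typing import Callable, Dict, List, Tuple

Answers = Dict[str, int]

# (name, prompt, condition on earlier answers that decides whether the question is asked)
QUESTIONS: List[Tuple[str, str, Callable[[Answers], bool]]] = [
    ("age", "Age in completed years?", lambda answers: True),
    # Hypothetical routing rules based on age.
    ("marital_status", "Marital status?", lambda answers: answers["age"] >= 12),
    ("attending_school", "Currently attending school?", lambda answers: answers["age"] >= 3),
]


def run_interview(get_answer: Callable[[str], int]) -> Answers:
    """Present the questions in order, skipping any whose condition is not met."""
    answers: Answers = {}
    for name, prompt, condition in QUESTIONS:
        if condition(answers):
            answers[name] = get_answer(prompt)
    return answers


# Scripted answers stand in for the enumerator's input on the device.
scripted = iter([7, 1])
print(run_interview(lambda prompt: next(scripted)))  # {'age': 7, 'attending_school': 1}
```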
Advantages and Disadvantages of Using the PDA
Advantages
- Instant data capture at the point of collection, reducing manual input errors
- Immediate data validation, reducing re-verification at later stages
- Time-effective, with real-time logical validation rules reducing logical errors
- Faster processing of census information, leading to timely availability of results
Disadvantages
- Setting up the process may take a long time, as it requires extensive testing
- Enumerators must be able to use the device, which may require administering a test
- Requires intensive training of enumerators on use of the device (training is more complicated)
- The battery needs recharging and could run out during enumeration
- Possibility of equipment failure
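Real-time logical validation can be sketched as a set of consistency rules checked as soon as a person's record is entered, so that inconsistencies can be resolved with the respondent on the spot. The rules and codes here are hypothetical; actual edit rules are set by each census office.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, int]

# Hypothetical consistency rules: (message, check that returns True when the record passes).
# Assumed codes: sex 2 = female, marital_status 2 = married.
RULES: List[Tuple[str, Callable[[Record], bool]]] = [
    ("Age must be between 0 and 120", lambda r: 0 <= r["age"] <= 120),
    ("Persons under 12 should not be recorded as married",
     lambda r: not (r["age"] < 12 and r.get("marital_status") == 2)),
    ("Children ever born is recorded only for females",
     lambda r: r.get("children_ever_born") is None or r.get("sex") == 2),
]


def validate(record: Record) -> List[str]:
    """Return the messages of all rules the record fails; an empty list means it is consistent."""
    return [message for message, check in RULES if not check(record)]


# Checked as soon as the record is entered, so the enumerator can query the respondent immediately.
print(validate({"age": 9, "sex": 1, "marital_status": 2}))
# ['Persons under 12 should not be recorded as married']
```

Keeping rules as data rather than scattered conditionals makes them easier to review and test during the extensive testing the set-up phase requires.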
Internet-based Data Collection
- Use of the Internet for census data collection is growing
  - However, the method is always complementary to other, more established methods
- As with PDAs, the online form is not a downloadable version of the paper form
- Use of this method requires a password to access and fill in the form
- Development of the Internet system for data collection is generally outsourced for lack of in-house expertise
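The password requirement can be met in several ways. One common approach, sketched here under assumptions, is to give each household a unique access code (for example, printed on a letter) and to store only a salted hash of that code on the server. The function names, code length and iteration count are hypothetical, and the sketch covers only the access check, not the secure transfer of data.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 100_000  # illustrative work factor


def hash_access_code(code: str, salt: bytes) -> bytes:
    """Derive a salted hash of a household access code with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", code.encode("utf-8"), salt, ITERATIONS)


def issue_access_code():
    """Create a new access code; return (code for the household, salt, hash to store)."""
    code = secrets.token_hex(6)  # 12 hexadecimal characters
    salt = secrets.token_bytes(16)
    return code, salt, hash_access_code(code, salt)


def check_access_code(entered: str, salt: bytes, stored_hash: bytes) -> bool:
    """Compare in constant time so the check does not leak timing information."""
    return hmac.compare_digest(hash_access_code(entered, salt), stored_hash)


code, salt, stored = issue_access_code()
print(check_access_code(code, salt, stored))          # True
print(check_access_code("wrong-code", salt, stored))  # False
```

Tying each code to one household also supports the "once and only once" management of responses listed among the disadvantages of this method.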
Advantages and Disadvantages of Using the Internet
Advantages
- Fewer resources needed for form handling and data capture
- Better opportunity to enumerate hard-to-reach geographic areas and population groups
- Automatic filtering of irrelevant questions
- Better-quality data due to a built-in interactive verification mechanism
- Faster availability of census results through simplified data entry and editing
Disadvantages
- Requires that respondents have a computer with Internet access
- Management of responses can be problematic, e.g., ensuring that each household has responded once and only once
- Requires a high-security system to ensure safe transfer of data
- A parallel processing system must be built, as not everyone will use the Internet
- Requires a mechanism to check for omitted and duplicate submissions
- Is costly and requires substantial resources to set up and adequately test the system
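The requirement that each household respond once and only once, and the check for omitted and duplicate submissions, can be sketched as a reconciliation of received submissions against a list of households expected to respond. All names and data are hypothetical, and keeping the latest of several submissions is an assumed policy; an office might instead keep the first or review duplicates manually.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Set, Tuple


@dataclass
class Submission:
    household_id: str  # hypothetical identifier, e.g. linked to the household's access code
    submitted_at: datetime


def reconcile(expected: Set[str],
              submissions: List[Submission]) -> Tuple[Dict[str, Submission], Set[str], Set[str], Set[str]]:
    """Keep one submission per household and report duplicates, omissions and unexpected IDs.

    Assumed policy: when a household submits more than once, its latest submission is kept.
    """
    kept: Dict[str, Submission] = {}
    duplicates: Set[str] = set()
    for s in submissions:
        previous = kept.get(s.household_id)
        if previous is not None:
            duplicates.add(s.household_id)
            if s.submitted_at <= previous.submitted_at:
                continue  # the later submission is already stored
        kept[s.household_id] = s
    omitted = expected - set(kept)      # on the expected list but no submission received
    unexpected = set(kept) - expected   # submitted but not on the expected list
    return kept, duplicates, omitted, unexpected


subs = [
    Submission("HH001", datetime(2010, 3, 1, 10, 0)),
    Submission("HH002", datetime(2010, 3, 1, 11, 0)),
    Submission("HH001", datetime(2010, 3, 2, 9, 30)),
]
kept, dups, omitted, unexpected = reconcile({"HH001", "HH002", "HH003"}, subs)
print(sorted(dups), sorted(omitted), sorted(unexpected))  # ['HH001'] ['HH003'] []
print(kept["HH001"].submitted_at)  # 2010-03-02 09:30:00
```

Households reported as omitted are the ones the parallel (non-Internet) collection would still need to reach.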
Issues to Consider in Choosing a Method
- The method to use depends on national circumstances
- The choice of method should be part of the overall strategic objectives of the census in terms of timeliness, accuracy and cost
- The processing system and technology to use need to be chosen early in the census cycle
- Enough time is required to test and implement the system
- When imaging technology is used for data capture, extensive testing is required well in advance of the census
- Outsourcing is an option when the required expertise is not available in-house
Issues to Consider (cont.)
- Extensive testing of the system is also critical when data are collected by PDA or via the Internet
- The design and paper quality of the census form should be linked to the method of data capture
- When imaging technology is to be used, adequate training of enumerators on how to fill in the forms properly is crucial
Thank you