Census Data Processing: Contemporary Technologies for Data Capture Bangkok, Thailand

15-19 September 2008
By
Jatan Kumar Saha
Systems Analyst
Bangladesh Bureau of Statistics
Ministry of Planning
Government of the People’s Republic of Bangladesh
1. Bangladesh Experience
Bangladesh Bureau of Statistics (BBS) first used Optical
Mark Recognition (OMR) in the 1981 Census of Population.
OMR technology was used again to capture data for the
Population Census 1991.
OMR was used for data capture in the Agriculture Census
1983-84.
OMR was also used for data capture in the Economic Census
1986 and the Economic Census 2001 & 2003.
Optical Character Recognition (OCR) technology was used to
capture the data of the Population Census 2001.
1.1 Population Census 2001
Processing Population Census data is a gigantic
operation. BBS procured 4 OCR machines from the KODAK
Company in Singapore. With OCR technology, the machine
reads the form (questionnaire) and creates an image file,
which can be used for multiple purposes. Initially, the
Bangladesh Government planned to use the individual
records from the population census in future to prepare
the voter list and voter ID cards.
On the basis of previous experience, and considering the
need for faster processing, the adoption of OCR technology
was given high priority. Two experts from the KODAK
Company tested the OCR forms (used as the census
questionnaire), and the result of this test was
satisfactory. The KODAK experts initially developed the
template in collaboration with an expert from GIS (the
agent for KODAK machines) and trained the BBS computer
experts.
1.1.1 OCR Operation
• Hardware: 4 OCR machines, 2 servers and 31
workstations were used for the whole operation.
• Software: Eyes & Hands for Forms (EHF) is a
software application for automatically capturing
and managing information (data) from forms
(questionnaires). EHF scans, interprets and
verifies the forms, then transfers the data to a
host system (server).
The forms do not need to be specially printed for
optical reading, and they can be filled in by hand.
Any necessary corrections are made in an efficient,
user-friendly environment in which the form being
edited is shown directly on the workstation screen.
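The scan, interpret, verify and transfer steps described above can be sketched as a small routing routine. This is only an illustration and not the EHF interface: the class names, the confidence score and the 0.90 threshold are all assumptions, since the source does not say how EHF decides which fields an operator must correct.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical threshold: not taken from the source or from EHF.
CONFIDENCE_THRESHOLD = 0.90


@dataclass
class RecognizedField:
    name: str
    value: str
    confidence: float  # 0.0 to 1.0, assumed to come from the recognition step


@dataclass
class FormResult:
    form_id: str
    fields: list[RecognizedField]
    needs_review: list[str] = field(default_factory=list)


def verify(form: FormResult, threshold: float = CONFIDENCE_THRESHOLD) -> FormResult:
    """Flag every field whose recognition confidence is below the threshold."""
    form.needs_review = [f.name for f in form.fields if f.confidence < threshold]
    return form


def route(forms: list[FormResult]) -> tuple[list[FormResult], list[FormResult]]:
    """Split forms into those ready for the server and those for an operator."""
    ready, review = [], []
    for form in forms:
        (review if verify(form).needs_review else ready).append(form)
    return ready, review


# Illustrative data only: form identifiers and values are made up.
forms = [
    FormResult("EA001-H01", [RecognizedField("age", "34", 0.98),
                             RecognizedField("sex", "F", 0.99)]),
    FormResult("EA001-H02", [RecognizedField("age", "7?", 0.41),
                             RecognizedField("sex", "M", 0.97)]),
]
ready, review = route(forms)
```

In this sketch only the second form would be shown to an operator at a workstation, with its "age" field marked for correction; the first form would go straight to the host system.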
1.1.2 Mode of Processing
• All the filled-in questionnaires (in books) were stored
on steel racks specially designed for the sequence
(by District, Thana/Upazila and Union).
• The filled-in questionnaires were also arranged in
Geo-code sequence up to enumeration area (EA) level,
in an envelope.
• This arrangement provided a quick retrieval system for
any desired questionnaire.
• After manual editing, the questionnaires were separated
from the book and returned to the same envelope, which
carried the identification of the EA.
• These were taken to the OCR room and scanned
systematically in Geo-code sequence.
• The top sheet of every EA is called the Tally Sheet; it
gives the geographic area identification of the EA, the
number of households and the population by sex for that
EA.
• The Tally Sheet was followed by the Household sheets,
containing information on household characteristics
and the individual characteristics of the persons in that
household.
• Around 5000 sheets were scanned per hour by a
single scanner.
• Data capture by OCR was performed in batches
by Thana/Upazila (Sub-District).
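The ordering described in this section (Geo-code sequence down to EA level, the Tally Sheet first within each EA, one batch per Thana/Upazila) can be sketched as a sort followed by a grouping. The record layout and the codes below are hypothetical; they assume zero-padded string codes so that text order matches numeric order.

```python
from itertools import groupby
from typing import NamedTuple


class Sheet(NamedTuple):
    district: str
    thana: str   # Thana/Upazila code
    union: str
    ea: str      # enumeration area code
    kind: str    # "tally" or "household"
    serial: int  # position of the sheet inside its EA envelope


def scan_order(sheets):
    """Sort sheets in Geo-code sequence, with each EA's tally sheet first."""
    return sorted(
        sheets,
        key=lambda s: (s.district, s.thana, s.union, s.ea,
                       s.kind != "tally", s.serial),
    )


def batches_by_thana(sheets):
    """Group the ordered sheets into one batch per (district, thana)."""
    ordered = scan_order(sheets)
    return {
        key: list(group)
        for key, group in groupby(ordered, key=lambda s: (s.district, s.thana))
    }


# Made-up codes, deliberately out of order.
sheets = [
    Sheet("26", "40", "15", "002", "household", 1),
    Sheet("26", "40", "15", "001", "household", 2),
    Sheet("26", "40", "15", "001", "tally", 0),
    Sheet("26", "40", "15", "001", "household", 1),
    Sheet("26", "38", "07", "001", "tally", 0),
]
batches = batches_by_thana(sheets)
```

Because the sort key mirrors the rack and envelope arrangement, the same key could also serve for retrieving any single questionnaire.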
1.2 Economic Census 2001 & 2003
BBS has vast experience in using OMR technology.
GIS (Graphic Information System Ltd), in collaboration
with the BBS Systems Analyst and Programmer, developed
the template for the OMR operation. On the basis of this
template, data were captured from two OMR forms (Tally
sheet and Data sheet) during processing of the Economic
Census 2001 & 2003. The Economic Census was conducted in
two phases.
In 2001 only urban areas were covered. Owing to a lack
of government funds, rural areas could not be covered
at the same time. So only the households with
economic activities in rural areas were covered in
2003. Across both phases, data from only 4.2 million
household/establishment sheets were captured by OMR
machine, and this took 6 months.
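The average capture rate implied by these figures can be worked out as follows. The 4.2 million sheets, the 6 months and the 3 OMR machines come from this presentation; the 25 working days per month is an assumption made only for illustration.

```python
TOTAL_SHEETS = 4_200_000        # from the text
MONTHS = 6                      # from the text
MACHINES = 3                    # OMR machines listed under Hardware
WORKING_DAYS_PER_MONTH = 25     # assumption, not from the source

sheets_per_month = TOTAL_SHEETS // MONTHS
sheets_per_day = sheets_per_month // WORKING_DAYS_PER_MONTH
# Floor division: a fractional sheet has no meaning here.
sheets_per_machine_per_day = sheets_per_day // MACHINES

print(sheets_per_month)            # 700000
print(sheets_per_day)              # 28000
print(sheets_per_machine_per_day)  # 9333
```

Under that assumption each machine averaged roughly 9,300 sheets per working day, which helps put the later remarks on lower OMR speed into context.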
• Hardware: BBS used 3 (three) OMR machines, Model
CD1200i from DRS (Data and Research Services plc), for
data capture of the Economic Census 2001 & 2003.
• Software: DRS provided the OMR machines with the SOSKIT
software, which was supplied to BBS for census data
capture. SOSkit version 7.10 is a suite of three
system programs:
(a) SOSGen,
(b) SOSlnk and
(c) SOSInp.
2. Good Practice
Good practice means learning from other organizations that
have developed successful projects or approaches to problems.
Batch Management
The system administrator can use Batch
Management to create, delete or open batches. In
addition, the administrator can route a batch to a
processing module or change the current status of a
batch. A user can be given rights in Batch Management
to perform batch creation or other operations, as
permitted by the system administrator.
Batch Management can be used to:
• Display a summary table showing the current
status of all active batches.
• Create new batches.
• Delete existing batches.
• Edit batch properties such as the priority, status and
processing queue.
• Display a status history of each active batch in the
system.
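The Batch Management operations listed above can be sketched as a small in-memory class. This is a hypothetical illustration, not the interface of any product named in this presentation: the class, method, status and queue names are invented, and user rights are left out.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Batch:
    name: str
    priority: int = 5
    status: str = "created"
    queue: str = "scan"
    history: list = field(default_factory=list)

    def record(self, event: str) -> None:
        """Append a timestamped entry to the batch's status history."""
        self.history.append((datetime.now(timezone.utc), event))


class BatchManager:
    def __init__(self):
        self._batches = {}

    def create(self, name, priority=5):
        if name in self._batches:
            raise ValueError(f"batch {name!r} already exists")
        batch = Batch(name=name, priority=priority)
        batch.record("created")
        self._batches[name] = batch
        return batch

    def delete(self, name):
        del self._batches[name]

    def edit(self, name, *, priority=None, status=None, queue=None):
        """Change any of priority, status or processing queue and log it."""
        batch = self._batches[name]
        for attr, value in (("priority", priority), ("status", status),
                            ("queue", queue)):
            if value is not None:
                setattr(batch, attr, value)
                batch.record(f"{attr} -> {value}")
        return batch

    def summary(self):
        """One row per batch, ordered by priority and then by name."""
        rows = sorted(self._batches.values(), key=lambda b: (b.priority, b.name))
        return [(b.name, b.status, b.queue, b.priority) for b in rows]

    def history(self, name):
        return [event for _, event in self._batches[name].history]


manager = BatchManager()
manager.create("upazila-A", priority=1)
manager.create("upazila-B")
manager.edit("upazila-A", status="scanned", queue="verify")
manager.delete("upazila-B")
```

Routing a batch to a processing module is modelled here simply as changing its queue; a real system would also check that the user holds the rights granted by the system administrator.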
3. Success
BBS used OMR technology successfully in processing
the Population Censuses of 1981 and 1991.
Reasons:
• Proper management
• Adequate training
• Quality forms (questionnaires)
• Uninterruptible power supply
• Availability of spare parts for the machines.
4. Problems encountered
The following problems were encountered in the
Population Census 2001:
1. Working in a stand-alone system (one PC per OCR machine)
2. Power failure (power interruptions lasting a long time)
3. Paper cutting (forms were not cut along the border
mark on the form)
4. Colour of the printed forms not matching the colour
assigned in the template
5. Non-availability of spare parts
6. Lack of proper training on OCR technology.
Economic Census 2001 & 2003:
Humidity is high in Bangladesh, so the OMR questionnaires
could not be run properly because they had absorbed
moisture. Dehumidifier machines were used to season
the questionnaires every day. Torn or mutilated forms
(sheets) could not pass through the machine; in that case,
the forms should be replaced. All forms should be stored
in a relatively dry, cool and dust-free environment.
Reasons for the lower speed of the OMR operation:
• The bubbles were not marked with the proper intensity.
• The questionnaires (OMR forms) were not stored in a
proper environment.
• The forms were not cut properly along the border
mark.
• The paper thickness of the OMR forms was not consistent
(thickness differed between forms).
• Power failure (electricity disturbance).
Comments:
The Bangladesh experience of using OMR and OCR indicates that:
a. OMR scanning is fast.
b. OMR scanning is more accurate. OMR technology can
consistently provide 99.9% accuracy on read data. ICR and
OCR technologies can also provide 99.5% accuracy if the
system is tuned properly, the forms are well designed, the
characters are written cleanly and neatly, and contextual
editing is used.
c. OMR scanning is efficient and cost effective.
d. OMR scanning is easy to implement and support. Compared
to many PC network installations, OMR scanners' need for
ongoing technical support is minimal.
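To show what the two accuracy figures mean in practice, the expected number of misread fields can be computed for a given volume. The one million fields used here is an illustrative volume, not a figure from the censuses; the two percentages are the ones quoted above.

```python
from decimal import Decimal


def expected_misreads(fields_read: int, accuracy_percent: str) -> Decimal:
    """Expected number of wrongly read fields at a given accuracy."""
    error_rate = (Decimal(100) - Decimal(accuracy_percent)) / Decimal(100)
    return fields_read * error_rate


FIELDS = 1_000_000  # illustrative volume, not from the source

omr_misreads = expected_misreads(FIELDS, "99.9")
ocr_misreads = expected_misreads(FIELDS, "99.5")
ratio = ocr_misreads / omr_misreads
```

Per million fields read, this gives 1,000 expected misreads at 99.9% and 5,000 at 99.5%, so the seemingly small gap in accuracy means five times as many fields to correct.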
Thank you for your patient hearing