Census Data Processing: Contemporary Technologies for Data Capture
Bangkok, Thailand, 15-19 September 2008

By Jatan Kumar Saha
Systems Analyst, Bangladesh Bureau of Statistics
Ministry of Planning
Government of the People’s Republic of Bangladesh

1. Bangladesh Experience

• The Bangladesh Bureau of Statistics (BBS) used Optical Mark Recognition (OMR) for the first time in the 1981 Census of Population.
• OMR technology was used to capture data in the Population Census 1991.
• OMR was used for data capture in the Agriculture Census 1983-84.
• OMR was also used for data capture in the Economic Census 1986 and the Economic Census 2001 & 2003.
• Optical Character Recognition (OCR) technology was used to capture the data of the Population Census 2001.

1.1 Population Census 2001

Processing population census data is a gigantic operation. BBS procured four OCR machines from the KODAK Company in Singapore. With OCR technology, the machine reads the form (questionnaire) and creates an image file, which can be used for multiple purposes. Initially, the Government of Bangladesh planned to use the population census data (individual records) in future to prepare the voter list and voter ID cards.

On the basis of previous experience, and considering the need for faster processing, the adoption of OCR technology was given high priority. Two experts from the KODAK Company tested the OCR forms (used as the census questionnaire), and the result of this test was satisfactory. The KODAK experts initially developed the template in collaboration with an expert from GIS (the agent for KODAK machines) and trained the BBS computer experts.

1.1.1 OCR Operation

• Hardware: 4 OCR machines, 2 servers and 31 workstations were used for the whole operation.
• Software: Eyes & Hands for Forms (EHF) is a software application for automatically capturing and managing information (data) from forms (questionnaires). EHF scans, interprets and verifies the forms, then transfers the data to a host system (server).
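The scan, interpret, verify and transfer stages described for EHF can be pictured with a short Python sketch. It is illustrative only and does not represent EHF's actual interface: every name in it (Form, scan, interpret, verify, transfer, fake_recognizer) is a hypothetical stand-in, and the toy recognizer simply parses text where a real OCR engine would read an image.

```python
from dataclasses import dataclass, field


@dataclass
class Form:
    """One scanned questionnaire sheet (hypothetical structure)."""
    form_id: str
    image: bytes                                  # image file produced by the scanner
    fields: dict = field(default_factory=dict)    # values read from the image
    needs_review: bool = False                    # set when an operator must correct it


def fake_recognizer(image):
    # Stand-in for a real OCR engine: parses b"age=34;sex=M" into a dict.
    text = image.decode("ascii")
    pairs = (item.split("=", 1) for item in text.split(";") if "=" in item)
    return {key: value for key, value in pairs}


def scan(form_id, raw_image):
    # Stage 1: scanning yields an image file for the sheet.
    return Form(form_id=form_id, image=raw_image)


def interpret(form, recognizer):
    # Stage 2: recognition turns the image into field values.
    form.fields = recognizer(form.image)
    return form


def verify(form, required):
    # Stage 3: flag forms with a missing or empty required field
    # so that they can be corrected on a workstation.
    form.needs_review = any(not form.fields.get(name) for name in required)
    return form


def transfer(forms, server):
    # Stage 4: send only verified forms to the host system (a list here).
    accepted = [f for f in forms if not f.needs_review]
    server.extend(accepted)
    return len(accepted)


if __name__ == "__main__":
    server = []
    sheets = [("EA001-01", b"age=34;sex=M"), ("EA001-02", b"age=;sex=F")]
    forms = [
        verify(interpret(scan(form_id, image), fake_recognizer), ["age", "sex"])
        for form_id, image in sheets
    ]
    sent = transfer(forms, server)
    print(f"{sent} form(s) transferred, {len(forms) - sent} held for correction")
```

In this sketch a form with an empty field is held back instead of being transferred, which mirrors the idea that corrections are made at a workstation before data reach the server.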
The forms do not need to be specially printed for optical reading, and they can be filled in with handwritten information. Necessary corrections are made in an efficient and user-friendly environment in which the form being edited is shown directly on the workstation screen.

1.1.2 Mode of Processing

All the filled-in questionnaires (in books) were stored on steel racks specially designed for the sequence (by District, Thana/Upazila and Union). The filled-in questionnaires were also arranged in Geo-code sequence up to enumeration area (EA) level, in an envelope. This arrangement provided a quick retrieval system for any desired questionnaire.

After manual editing, the questionnaires were separated from the book and returned to the same envelope, which carried the identification of the EA. They were then taken to the OCR room and scanned systematically in Geo-code sequence. The top sheet of every EA is called the Tally Sheet; it indicated the geographic area identification of the EA, the number of households and the population by sex for that EA.

The Tally Sheet was followed by Household Sheets containing information on household characteristics and on the individual characteristics of the persons in that household. Around 5,000 sheets were scanned per hour by a single scanner. Data capture by OCR was performed in batches by Thana/Upazila (Sub-District).

1.2 Economic Census 2001 & 2003

BBS has vast experience in using OMR technology. GIS (Graphic Information System Ltd), in collaboration with the BBS Programmer and Systems Analyst, developed the template for OMR operation. On the basis of the template, data were captured from two OMR forms (Tally Sheet and Data Sheet) during processing of the Economic Census 2001 & 2003. The Economic Census was conducted in two phases. In 2001 only urban areas were covered.
Due to lack of government funds, rural areas could not be covered simultaneously, so only the households having economic activities in rural areas were covered in 2003. From both phases, data from only 4.2 million household/establishment sheets were captured by OMR machine, over a period of 6 months.

• Hardware: BBS used 3 (three) OMR machines from DRS (Data and Research Services plc), model CD1200i, for data capture in the Economic Census 2001 & 2003.
• Software: DRS provided the OMR machines with the SOSKIT software, which was supplied to BBS for census data capture. SOSkit version 7.10 is a suite of three system programs: (a) SOSGen, (b) SOSlnk and (c) SOSInp.

2. Good Practice

Learning from other organizations that have developed successful projects or approaches to problems.

Batch Management

The system administrator can use Batch Management to create, delete or open batches. In addition, the administrator can route a batch to a processing module or change the current status of a batch. A user can be given rights to Batch Manager to perform batch creation or other operations as permitted by the system administrator.

Batch Management can be used to:
• Display a summary table showing the current status of all active batches.
• Create new batches.
• Delete existing batches.
• Edit batch properties such as the priority, status and processing queue.
• Display a status history of each active batch in the system.

3. Success

BBS used OMR technology successfully in processing the Population Censuses of 1981 and 1991.

Reasons:
• Proper management
• Adequate training
• Quality forms (questionnaires)
• Uninterruptible power supply
• Availability of spare parts for the machines

4. Problems Encountered

The following problems were encountered in the Population Census 2001:
1. Working in a stand-alone system (one PC per OCR machine)
2. Power failure (power interruptions for long periods)
3. Paper cutting (forms were not cut along the border mark on the form)
4. Colour of the printed forms not matching the colour assigned in the template
5. Non-availability of spare parts
6. Lack of proper training on OCR technology

Economic Census 2001 & 2003:
• Humidity is high in Bangladesh, so the OMR questionnaires could not be run properly because they had absorbed moisture. Dehumidifier machines were used every day to season the questionnaires.
• Torn or mutilated forms (sheets) could not pass through the machine. In that case, the forms had to be replaced.
• All forms should be stored in a relatively dry, cool and dust-free environment.

Reasons for the lower speed of OMR operation:
• The intensity of marking in the bubbles was not adequate.
• The questionnaires (OMR forms) were not stored in a proper environment.
• Forms were not cut properly along the border mark.
• The thickness of the paper (OMR form) was not good (thickness differed between forms).
• Power failure (electricity disturbance)

Comments

Bangladesh's experience of using OMR and OCR indicates that:
a. OMR scanning is fast.
b. OMR scanning is more accurate. OMR technology can consistently provide 99.9% accuracy on read data. ICR and OCR technologies can also provide 99.5% accuracy if the system is tuned properly, the forms are well designed, the characters are written cleanly and neatly, and contextual editing is used.
c. OMR scanning is efficient and cost-effective.
d. OMR scanning is easy to implement and support. Compared to many PC network installations, OMR scanners' need for ongoing technical support is minimal.

Thank you for your patient hearing.