Data Capture Technology, Statistical Centre of Iran

Vice – Presidency for
Strategic Planning and Supervision
Statistical Centre Of Iran
Presented by: Ms. Somaye Ahangar
Data Capture Technologies

Personal Digital Assistant (PDA)

Intelligent Character Recognition (ICR)
PDA Technology






Instances of PDA application in SCI surveys and censuses
General Advantages
Application specification
Localization and customization
Activities for conducting the survey
Limitations
Instances of PDA application in the SCI

Collecting images and multimedia content on tribes
in the Census of Nomadic Tribes of Iran, for use in
the Encyclopedia of Iranian Tribes.

PDAs are frequently used in sample surveys, including
monthly and periodic surveys.
General Advantages of PDA

They are convenient tools for Personal Information Management
(PIM).

They offer several input methods (hardware keyboard,
virtual on-screen keyboard, finger-touch screen).

Communication via USB, infrared and Bluetooth.

Accurate, rapid, one-time data entry.

They can eliminate paper questionnaires in surveys and
censuses.

There is no need for a separate verification step.

Using PDAs reduces the need for human resources,
making projects more efficient and cost-effective.
Application specification

The device models currently in use are:
• HP iPAQ hw6945
• HP iPAQ hw6955

The developed software package consists of three components:
• The mobile edition, which runs on the PDA device
• The desktop application for the central management system in the provinces
• The desktop application for the central management system in the SCI

Developed with .NET technologies and SQL Server 2005 Compact Edition (CE)
Localization and customization

Use of GPS devices to capture spatial information.

Dynamic assignment of sample working areas to enumerators in
the monthly survey. Note that this dynamic process may cause
some problems and concerns.

Measuring and evaluating the time spent at each place under
enumeration.

Error validation at the time of data entry to maintain data consistency.

A flexible navigation tool for moving between pages to fill in
fields that were left blank.

Option to transmit data through the internet.
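
Validation at the time of data entry, as listed above, can be sketched as a set of small consistency rules that run as soon as a record is filled in. This is a minimal illustration in Python, not the SCI's .NET implementation; the record fields and the rules themselves are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class HouseholdRecord:
    # Hypothetical questionnaire fields, chosen only for illustration.
    household_size: int
    employed_members: int
    head_age: int


# A rule returns an error message, or None when the record passes.
Rule = Callable[[HouseholdRecord], Optional[str]]


def size_is_positive(r: HouseholdRecord) -> Optional[str]:
    if r.household_size < 1:
        return "household_size must be at least 1"
    return None


def employed_not_above_size(r: HouseholdRecord) -> Optional[str]:
    if r.employed_members > r.household_size:
        return "employed_members cannot exceed household_size"
    return None


def head_age_plausible(r: HouseholdRecord) -> Optional[str]:
    if not 10 <= r.head_age <= 120:
        return "head_age outside plausible range 10-120"
    return None


RULES: List[Rule] = [size_is_positive, employed_not_above_size, head_age_plausible]


def validate(record: HouseholdRecord, rules: List[Rule] = RULES) -> List[str]:
    """Run every rule and collect the error messages (empty list = consistent)."""
    errors = []
    for rule in rules:
        message = rule(record)
        if message is not None:
            errors.append(message)
    return errors
```

Calling `validate(record)` while the interviewer is still with the respondent lets inconsistencies be resolved on the spot rather than in a later editing phase.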
Activities for conducting the survey

Data collection by interviewers

Moving the collected data from the devices to the central
management system in the provinces

Review and verification of the data by provincial experts

Sending the database files (.sdf files) over a data link from
the provinces to the SCI

Converting and processing the received files

Generating a new .sdf file for use in the next period
Limitations

Requires trained staff

Finding directions and routes with digital geographical maps
requires a more comprehensive road-map database, which in turn
requires updating the current data.

Using GSM features requires adequate mobile network coverage
in all areas under enumeration.

Hardware problems were also encountered, such as the short battery
life of some PDAs, their failure to operate properly, and problems
in synchronizing the PDAs with PCs.

A comprehensive management system is necessary, particularly in
monthly surveys, to process the received files in a timely manner
and generate the required data file for the next month.
Time constraints are sometimes a major problem.
ICR Technology

• Overview
• Top Operational Reasons
• Accuracy
• ICR Stages
• Instances of ICR usage in SCI surveys and censuses
• Application specification and considerations
• Strengths of using ICR in the SCI
ICR Overview

ICR is an advanced automatic data entry system

Captures the images of document forms

Reads handwritten text, machine-printed text, barcodes and other
data fields

The interpretation engines convert the data in the images
into digital format

Recognizes printed letters written one letter per box
Top Operational Reasons

Cost effectiveness

Improved accuracy

Shorter cycle time

Improved work distribution (skill, location)

Audit trails and reporting and tracking tools

Data privacy
How Accurate? It depends

The quality of incoming documents is critical.
It depends on:
• Form design
• Scanning
• Original vs. photocopy
• Dropout vs. non-dropout
ICR Stages

Template/Form creation

Image capture & scan

Image pre-processing

Batch processing

Recognition and data extraction

Data validation and Review
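
The slides do not say how recognized data is selected for review. One common approach, assumed here, is to threshold the confidence score a recognition engine reports for each character: isolated doubtful characters go to character-level verification, and fields with many doubtful characters are re-keyed as a whole. The class names, the 0.80 threshold and the "more than half" rule below are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RecognizedChar:
    char: str
    confidence: float  # 0.0 to 1.0, as reported by a hypothetical engine


@dataclass
class RecognizedField:
    name: str
    chars: List[RecognizedChar]

    @property
    def text(self) -> str:
        return "".join(c.char for c in self.chars)


CHAR_THRESHOLD = 0.80  # assumed cut-off, not taken from the slides


def chars_needing_review(field: RecognizedField,
                         threshold: float = CHAR_THRESHOLD) -> List[int]:
    """Indices of characters whose confidence falls below the threshold."""
    return [i for i, c in enumerate(field.chars) if c.confidence < threshold]


def route(field: RecognizedField, threshold: float = CHAR_THRESHOLD) -> str:
    """Return 'accept', 'character_verification' or 'field_verification'."""
    low = chars_needing_review(field, threshold)
    if not low:
        return "accept"
    # Assumed rule: when more than half the characters are doubtful,
    # re-keying the whole field is cheaper than fixing characters one by one.
    if len(low) * 2 > len(field.chars):
        return "field_verification"
    return "character_verification"


def make_field(name: str, text: str, confidences: List[float]) -> RecognizedField:
    """Convenience constructor for examples."""
    return RecognizedField(name, [RecognizedChar(ch, cf)
                                  for ch, cf in zip(text, confidences)])
```

For example, `route(make_field("year", "1385", [0.99, 0.95, 0.60, 0.97]))` sends only the third digit to an operator, while a field whose characters are mostly doubtful is sent for full re-keying.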
ICR Process Schema
Character Verification
Field Verification
Instances of ICR usage

Conducting a pilot census in some provinces in 2005

Conducting the 2006 National Population and Housing Census

Serving as a major data capture tool in the Iranian Households
Economic Data Collection Project in 2008
Application specification & consideration

Various kinds of forms with different layouts and
contents

Highly complex structure of Farsi script compared with Latin script

Scanning at about 45 pages per minute, double-sided, for
A3-size forms at 200 DPI resolution

Distributing tasks using fully automated
management systems

Modules for rule editing and batch editing developed within the ICR
software packages
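
The scanning figures above can be turned into rough planning numbers. The sketch below assumes that "45 pages per minute" counts page sides (images) rather than sheets, which the slide does not state. The A3 dimensions (297 x 420 mm) are the ISO 216 values. The function names and the workload numbers in the usage note are hypothetical.

```python
A3_MM = (297.0, 420.0)  # ISO 216 A3 width and height in millimetres
MM_PER_INCH = 25.4


def pixels_at_dpi(size_mm=A3_MM, dpi=200):
    """Pixel dimensions of one scanned page side at the given resolution."""
    return tuple(round(mm / MM_PER_INCH * dpi) for mm in size_mm)


def scan_minutes(page_sides, sides_per_minute=45.0, scanners=1):
    """Estimated scanning time, assuming the rated speed is sustained."""
    return page_sides / (sides_per_minute * scanners)
```

With these assumptions, `pixels_at_dpi()` gives the image size of one A3 side at 200 DPI, and `scan_minutes(45000)` estimates the time one scanner needs for 45,000 page sides. Real throughput will be lower once paper handling, jams and batch changes are included.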
Batch & Rule Verification
Strengths

Using, in the 2006 Census, software from the same company whose
product was used in the 2005 Pilot Census, because of its
accuracy and reliability.

Improving the quality of the developed software based on feedback
and experience gained from the Pilot Census and experimental
versions.

Good practice in designing and preparing the questionnaire
forms.

Good practice in providing an acceptable estimate of scanning
and reading times.