The use of Optical Character Recognition
Technology in National Statistical Offices
Learning Objectives
At the end of the session students will be able to:
- Define Optical Character Recognition technology.
- Explain the advantages that accrue to a National Statistical Office from using Optical Character Recognition technology.
- Explain some of the factors to be considered when adopting and implementing this form of technology.
- Explain the implications of this technology for enumerator training, field operations and questionnaire design.
- List the requirements for obtaining good scanning results.
Activities
Activity 1
Watch the PowerPoint presentation on the use of Optical Character Recognition
technology in National Statistical Offices and at the end of the presentation answer the
following questions:
1. What is Optical Character Recognition?
2. What are the advantages that accrue to a National Statistical Office by utilising Optical
Character Recognition technology?
3. What are the factors to be considered when adopting and implementing Optical
Character Recognition technology?
4. What are the implications of adopting this form of technology to enumerator training,
field operations and questionnaire design?
5. What are the requirements for obtaining good scanning results?
Activity 2
The Trainer can invite an expert in OCR technology or make an appointment to visit,
together with students, an institution/organisation in their National Statistical System
where Optical Character Recognition technology is in use for data entry. This is intended
to reinforce students’ understanding of Optical Character Recognition technology and
expose them to:
- The equipment and skills needed to benefit from the technology.
- The scanning process.
- Scanning and Optical Character Recognition data extraction process.
Activity 3
Assume a research organisation in your country has conducted a survey on the
prevalence of HIV/AIDS, the findings of which are intended to be used in the questionnaire
for the 2010 Round of Population and Housing Census. The organisation has just finished the
data collection process and is about to embark on data entry. In the past, the organisation has
used the traditional approach of hiring contract staff for data cleaning, coding and
ultimately data entry. Given the importance of the survey to the forthcoming Census and
the nation, management of the organisation is of the opinion that the traditional approach
to data entry is time-consuming and labour-intensive, and that this may delay the release of the
findings of the study for policy formulation and resource targeting in the country.
In view of this, management has tasked the Data Processing Manager to explore the
availability of technology that will enable the organisation to speed up data entry.
Through research and networking with students in your class, the Manager has established
that there is Optical Character Recognition technology that can be used to address the
immediate problems faced by the organisation. As someone who is aware of the
technology, write two documents (2 pages long) to the research organisation. For the first
document, use the guidelines given below:
- What is Optical Character Recognition?
- What equipment and skills are needed to benefit from the technology?
- What are the advantages of using Optical Character Recognition technology?
- What are the factors to be considered when adopting and implementing Optical Character Recognition technology?
- What are the implications of adopting this form of technology for future enumerator training, field operations and questionnaire design?
- What are the requirements for obtaining good scanning results?
For the second document, include the following elements:
- The disadvantages of OCR compared with efficient data entry based on human input.
- Whether there is a difference in the time required for data entry between the two approaches. If there is, what is the difference and where is the evidence?
- The cost lines that need to be budgeted for when using OCR technology versus the traditional manual approach to data entry.
- The availability of resources, money and the skills required to operationalise the technology.
- Managerial considerations such as error rates, data cleaning and double data entry.
- Practical implications in the field.
Activity 4
In a classroom setting, the trainer may want to set up a debate between two teams, one
proposing to use OCR and the other team taking a more cautious approach.
Resources
1. http://www.afdb.org/pls/portal/docs/PAGE/ADB_ADMIN_PG/DOCUMENTS/STATISTICS/JOURNALVOL1FULL.PDF
2. http://intranet.unescap.org/stat/pop-it/pop-guide/capture_ch06.pdf
3. National Sample Census of Agriculture 2002/2003, Volume 1: Technical and Operation Report, September 2006. Tanzania.
SADC Course in Statistics