SADC Course in Statistics

The use of Optical Character Recognition Technology in National Statistical Offices

Learning Objectives

At the end of the session students will be able to:
- Define Optical Character Recognition technology.
- Explain the advantages that accrue to a National Statistical Office from using Optical Character Recognition technology.
- Explain some of the factors to be considered when adopting and implementing this form of technology.
- Explain the implications of this technology for enumerator training, field operations and questionnaire design.
- List the requirements for obtaining good scanning results.

Activities

Activity 1

Watch the PowerPoint presentation on the use of Optical Character Recognition technology in National Statistical Offices and, at the end of the presentation, answer the following questions:
1. What is Optical Character Recognition?
2. What are the advantages that accrue to a National Statistical Office from using Optical Character Recognition technology?
3. What are the factors to be considered when adopting and implementing Optical Character Recognition technology?
4. What are the implications of adopting this form of technology for enumerator training, field operations and questionnaire design?
5. What are the requirements for obtaining good scanning results?

Activity 2

The Trainer can invite an expert in OCR technology or make an appointment to visit, together with students, an institution/organisation in their National Statistical System where Optical Character Recognition technology is in use for data entry. This is intended to reinforce students' understanding of Optical Character Recognition technology and expose them to:
- The equipment and skills needed to benefit from the technology.
- The scanning process.
- The scanning and Optical Character Recognition data extraction process.

Activity 3

Assume a research organisation in your country has conducted a survey on the prevalence of HIV/AIDS whose findings are intended to inform the questionnaire for the 2010 Round of Population and Housing Census. The organisation has just finished the data collection process and is about to embark on data entry. In the past, the organisation has used the traditional approach of hiring contract staff for data cleaning, coding and ultimately data entry. Given the importance of the survey to the forthcoming Census and to the nation, management of the organisation is of the opinion that the traditional approach to data entry is time consuming and labour intensive, and that this may delay the release of the findings of the study for policy formulation and resource targeting in the country. In view of this, management has tasked the Data Processing Manager to explore the availability of technology that will enable the organisation to speed up data entry. Through research and networking with students in your class, the Manager has established that Optical Character Recognition technology can be used to address the immediate problems faced by the organisation.

As someone who is aware of the technology, write 2 documents (2 pages long) to the research organisation.

For the first document, use the guidelines given below:
- What is Optical Character Recognition?
- The equipment and skills needed to benefit from the technology.
- What are the advantages of using Optical Character Recognition technology?
- What are the factors to be considered when adopting and implementing Optical Character Recognition technology?
- What are the implications of adopting this form of technology for future enumerator training, field operations and questionnaire design?
- What are the requirements for obtaining good scanning results?

For the second document, include the following elements:
- The disadvantages of OCR compared with efficient data entry based on human input.
- Is there a difference in the time required for data entry between the two approaches? If there is, what is the difference and where is the evidence?
- The cost lines that need budgeting for when using OCR technology versus the traditional manual approach to data entry.
- Availability of resources, money and the required skills to operationalise the technology.
- Managerial considerations such as error rates, data cleaning and double data entry.
- Practical implications in the field.

Activity 4

In a classroom setting, the trainer may want to set up a debate between two teams, one proposing to use OCR and the other taking a more cautious approach.

Resources

1. http://www.afdb.org/pls/portal/docs/PAGE/ADB_ADMIN_PG/DOCUMENTS/STATISTICS/JOURNALVOL1FULL.PDF
2. http://intranet.unescap.org/stat/pop-it/pop-guide/capture_ch06.pdf
3. National Sample Census of Agriculture 2002/2003, Volume 1: Technical and Operation Report, September 2006. Tanzania.