Using OmniPage for OCR
Assistive Technology Team. January 2012.
Copyright 2012 University of Aberdeen

The University has OCR software (FreeOCR) available on all classroom PCs (see: A Guide to Scanning Text using Optical Character Recognition (OCR) software). However, if you need to perform OCR with greater accuracy or on more complex documents, we recommend the use of OmniPage.

OmniPage is currently installed in the following locations:
- Taylor Library scanning station
- Assistive Technology Booth 1, Edward Wright Building
- Assistive Technology Booths 107 & 108, First floor of the Sir Duncan Rice Library

Assistive Technology Booths are accessible by staff and authorised students only.

Note: This document has been written primarily for Non-Medical Personal Assistants (NMPAs) who support print-disabled students by converting material from print format to electronic text, which can be read with a screenreader or produced in Braille.

Getting Started

Start OmniPage. You will be presented with the Start Page. Choose Open File or Scan Document as appropriate:
- If you already have a PDF file from scanning using the MFDs or eScan book scanner, choose Open File.
- If you are going to scan directly using the OpticBook Scanner, choose Scan Document. (You may need to set up the scanner to work with OmniPage if you have not used it before; see the Appendix.)

The OmniPage Window

The OmniPage window is split into three sections:
1. Thumbnails: displays a small version of each page in the document.
2. Page Image: displays an image of the page currently selected in the Thumbnails view. You can use this section to define areas of text and/or images to be recognised (see Zones, later in this document).
3. Text Editor: displays the recognised text for the currently selected page.

Note: If any of these sections are not visible, you can choose the appropriate options from the Window menu.
If the Page Image and Text Editor are displayed as tabs rather than as adjacent panes, you may wish to switch to Classic View (from the Window menu) to display them side by side.

Step 1: Open a PDF or Scan using the OpticBook Scanner

Open a PDF

Select Load Files from the second drop-down menu in the OmniPage Toolbox toolbar. Choose the PDF file you wish to work with. The document will load and you will see the pages appear in the Thumbnails section of the OmniPage window.

Note: If you have scanned a document in short sections and consequently have multiple PDFs, you can work with all the sections together by repeating the Load Files process for the additional files.

Scan using the OpticBook Scanner

Select Scan from the second drop-down menu in the OmniPage Toolbox toolbar. The scan options dialog opens. Select the following options, then click Scan:
- Scan Type: Flatbed Reflective
- Page Size: Max Scan Area
- Scan Mode: Greyscale
- Resolution: 300
- Autocrop

The scanning process will start and OmniPage will display a dialog while the scan is in progress. Do NOT click on Cancel, as doing so will interrupt the scanning process.

Once your page has been scanned you have two choices, Stop Loading Pages or Add More Pages:
- Stop Loading Pages: choose this option if you have finished scanning, or if you were only scanning one page.
- Add More Pages: choose this option if you wish to continue scanning:
  1. Put the next page on the scanner, then click on Add More Pages to continue scanning.
  2. Your page will be previewed; press Scan to scan the page.

Step 2: Perform OCR on Simple Documents

1. Set the text editor view to Plain Text: View menu -> Text Editor Views -> Plain Text.
2. Either scan the document or open a PDF using the Load Files option from the second drop-down menu in the OmniPage Toolbox toolbar (see Step 1).
3. Select all pages in the document by clicking one page in the Thumbnails view, then pressing CTRL+A.
4. Press the third button (Perform OCR) in the OmniPage Toolbox toolbar, by default labelled Automatic.

OCR will be performed on all pages in the document and the OCR Proofreader window will open.

Step 3: Proofread

It is recommended that you use the OCR Proofreader to proofread the recognised text, as it allows you to see both the recognised text and the part of the scanned image to which it refers.
- If you have closed the proofreading window and need to reopen it, press F7.
- To correct a highlighted word, select one of the suggested alternatives or edit the suspect text directly, then click on Change (or Change All if you want to change every instance of the word in your document). The proofreader will move on to the next non-dictionary word it finds.
- To leave a word alone, click on Ignore or Ignore All.

If your document contains more than one page, the proofreader will go through each page. You will be prompted once proofreading is complete, at which point click on OK.

Step 4: Save & Tidy Up Document

Select Save to Files from the fourth drop-down menu in the OmniPage Toolbox toolbar. The Save to File window opens and you will be prompted to save your file. We recommend the following settings:
- Look in: Save the file to your H: drive or a USB pen
- File name: Name your file appropriately, in the form author-title-pages (for example: Stanlake-Introductory Economics-pages 23 to 36)
- Save as: Text
- File Type: Microsoft Word 97 (*.doc)
- Formatting Level: Flowing Text
- File Options: Create one file for all pages
- Page Range: All Pages
- View Result: Ticked

Click on OK. Your saved document will be opened in Word. In Word you should tidy up the document:
- Proofread, using the spelling & grammar checker to help.
- Click on and delete any spurious images (e.g. edges of pages, "black space").
- Make sure the original page numbers are clear in the text so that it can be properly referenced.

Notes for Performing OCR on More Complex Documents

Sometimes the Automatic option for performing OCR produces less than satisfactory results. Examples include a page with lots of images that you don't need, pages with graphs (where the graph itself will be recognised as an image but any labels will be recognised as text), and pages where you have accidentally caught part of the next page. Sometimes you will also notice that the recognised text is not in the same order as it is on the page and needs to be corrected.

In these cases you will need to use zones to tell OmniPage what should be recognised as text, what should not, and in some cases the order of the text.

Dealing with Images

Removing Images from a document

If you are scanning a document for a visually impaired student who will not be able to make use of images, it is recommended that images are removed from the document. This can be done either by removing the image zones automatically created in the document, or by deleting the image from the resulting Word document. If you remove an image from a document, be sure to include the image caption followed by [Image Removed] to make it clear that something is missing from the document.
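
For anyone who tidies up converted text with a script rather than by hand, the placeholder convention described above could be sketched in Python as follows. This is only an illustration under assumptions: the function name `image_removed_note` and the sample caption are hypothetical and are not part of OmniPage or Word.

```python
def image_removed_note(caption: str) -> str:
    """Return the text to leave in place of a removed image.

    Hypothetical helper: keeps the original caption and appends the
    [Image Removed] marker recommended in this guide.
    """
    marker = "[Image Removed]"
    caption = caption.strip()
    # If the image had no caption, the marker alone still shows a gap.
    return f"{caption} {marker}" if caption else marker


if __name__ == "__main__":
    # Sample caption invented purely for illustration.
    print(image_removed_note("Figure 2.1: Supply and demand curves"))
```

Keeping the caption in front of the marker means a screenreader user still hears what the missing image was about.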

Keeping Images with text together

Sometimes when performing automatic OCR, OmniPage will recognise the text from an image separately from the image (for example, labels from a graph). If you want the image to remain intact in the document, you will need to draw an Image Zone around it. Note that the text inside the image will not be accessible to a screenreader or text-to-speech software.

Zones

Zones are indicated on the Page Image by either a red outline (a text zone) or a green outline (an image zone). The Zones toolbar appears on the left side of the Page Image section of the OmniPage window.

Zones will appear on pages after performing OCR using the Automatic setting (see Step 2). If you manually change the zones on a page, you will need to perform OCR on that page again by pressing the Perform OCR button. You can draw zones on several pages, select those pages together in the Thumbnails view, and then press Perform OCR once rather than doing each page individually.

Deleting Zones

Before drawing your own zones, you may need to delete existing zones if OCR has already been performed on the page.
1. Select a zone by clicking on it. You can select multiple zones by holding down the Shift key as you click on them. To select all zones, right click then choose Select All Zones.
2. Right click then choose Remove Selected Zones.

Drawing Zones

Click the Draw Text Zone tool on the Zones toolbar. Use the mouse to drag a rectangle around the block of text to be recognised. You can make irregular shapes by drawing overlapping rectangles (for example, if the text is arranged in an L-shape around an image). If you need to draw a different type of zone, hold the mouse over the bottom right corner (blue triangle) of the Draw Text Zone button for options.

Ordering or Reordering Zones

If you manually draw zones, the text will be recognised in the order in which you drew the zones.
You can check the order in which the text will be recognised by clicking the Change zone order tool on the Zones toolbar. Each zone will display a number in the corner which indicates the recognition order. To change the order, click on each zone in turn in the correct order.

Further Information

More detailed help on creating zones or performing other functions with OmniPage can be found in the OmniPage Help File (choose Help Topics from the Help menu).

For further advice and help, you can contact the Assistive Technology Team:

Assistive Technology Team
IT Services, Room G86a/b, Edward Wright Building, University of Aberdeen AB24 3QY
Tel: 01224 273336
Email: atech@abdn.ac.uk
Web Site: http://www.abdn.ac.uk/assistivetechnology

Appendix: Set up the OpticBook Scanner to work with OmniPage

1. Start the Scanner Setup Wizard from the Tools menu (it may run automatically if you attempt to scan without the scanner set up).
2. Select No to "Would you like to download the latest scanner database from Nuance", then click Next.
3. Choose Select and test scanner or digital camera, then click Next.
4. From the list of Available Scanners choose TWAIN: Plustek OpticBook 3800, then click OK.
5. Click Next, then continue to click Next in the following windows until you reach the last window, which has a Finish button. Click Finish.

The scanner is now set up and ready to use.