Using OmniPage for OCR - University of Aberdeen

Using OmniPage for OCR
Assistive Technology Team. January 2012.
The University has OCR software (FreeOCR) available on all classroom PCs (see: A Guide to Scanning
Text using Optical Character Recognition (OCR) software). However, if you need to perform OCR with
greater accuracy or on more complex documents, we recommend using OmniPage.
OmniPage is currently installed in the following locations:
- Taylor Library scanning station
- Assistive Technology Booth 1, Edward Wright Building
- Assistive Technology Booths 107 & 108, first floor of the Sir Duncan Rice Library
Assistive Technology Booths are accessible to staff and authorised students only.
Note: This document has been written primarily for Non-Medical Personal Assistants (NMPAs)
who support print-disabled students by converting material from print format to
electronic text, which can be read with a screen reader or produced in Braille.
Getting Started
- Start OmniPage.
- You will be presented with the Start Page.
- Choose Open File or Scan Document as appropriate:
  - If you already have a PDF file from scanning using the MFDs or eScan book scanner,
    choose Open File.
  - If you are going to scan directly using the OpticBook Scanner, choose Scan Document.
    (You may need to set up the scanner to work with OmniPage if you have not used it
    before; see the Appendix.)
The OmniPage Window
The OmniPage window is split into three sections:
1. Thumbnails: displays a small version of each page in the document.
2. Page Image: displays an image of the page currently selected in the Thumbnails view. You
can use this section to define areas of text and/or images to be recognised (see Zones).
3. Text Editor: displays the recognised text for the currently selected page.
Copyright 2012 University of Aberdeen
Note: If any of these sections are not visible, you can choose the appropriate options from the
Window menu. If the Page Image and Text Editor are displayed as tabs rather than as adjacent
panes, you may wish to switch to Classic View (from the Window menu) to display them side
by side.
Step 1: Open a PDF or Scan using the OpticBook Scanner
Open a PDF
- Select Load Files from the second drop-down menu in the OmniPage Toolbox toolbar.
- Choose the PDF file you wish to work with. The document will load and its pages will
appear in the Thumbnails section of the OmniPage window.
Note: If you have scanned a document in short sections and consequently have multiple PDFs, you
can work with all the sections together by repeating the Load Files process for each additional
file.
Scan using the OpticBook Scanner
- Select Scan from the second drop-down menu in the OmniPage Toolbox toolbar.
- The scan options dialog opens. Select the following options, then click Scan:
  - Scan Type: Flatbed Reflective
  - Page Size: Max Scan Area
  - Scan Mode: Greyscale
  - Resolution: 300 dpi
  - Autocrop
- The scanning process will start and OmniPage will display a scanning dialog.
Do NOT click Cancel; doing so will interrupt the scanning process.
- Once your page has been scanned, you have two choices:
  - Stop Loading Pages: choose this option if you have finished scanning, or if you
    were only scanning one page.
  - Add More Pages: choose this option if you wish to continue scanning:
    1. Put the next page on the scanner, then click Add More Pages to continue scanning.
    2. Your page will be previewed; press Scan to scan the page.
Step 2: Perform OCR on Simple Documents
- Set the text editor view to Plain Text: View menu -> Text Editor Views -> Plain Text.
- Either scan the document or open a PDF using the Load Files option from the second drop-down
menu in the OmniPage Toolbox toolbar (see Step 1).
- Select all pages in the document by clicking one page in the Thumbnails view, then pressing
CTRL+A.
- Press the third button (Perform OCR) in the OmniPage Toolbox toolbar, labelled Automatic
by default.
- OCR will be performed on all pages in the document and the OCR Proofreader window will open.
Step 3: Proofread
We recommend that you use the OCR Proofreader to check the recognised text, as it shows
both the recognised text and the part of the scanned image it comes from.
- If you have closed the Proofreader window and need to reopen it, press F7.
- To correct a highlighted word, select one of the suggested alternatives or edit the suspect
text directly, then click Change (or Change All to change every instance of the word in your
document). The proofreader will then move on to the next non-dictionary word it finds.
- To leave a word unchanged, click Ignore or Ignore All.
- If your document contains more than one page, the proofreader will go through each page in turn.
- You will be prompted once proofreading is complete; click OK.
Step 4: Save & Tidy Up Document
- Select Save to Files from the fourth drop-down menu in the OmniPage Toolbox toolbar.
- The Save to File window opens and you will be prompted to save your file.
  We recommend the following settings:
  - Look in: your H: drive or a USB pen drive
  - File name: name your file appropriately, using the pattern author-title-pages X to Y
    (for example: Stanlake-Introductory Economics-pages 23 to 36)
  - Save as: Text
  - File Type: Microsoft Word 97 (*.doc)
  - Formatting Level: Flowing Text
  - File Options: Create one file for all pages
  - Page Range: All Pages
  - View Result: Ticked
- Click OK. Your saved document will open in Word.
- In Word, tidy up the document:
  - Proofread it, using the spelling and grammar checker to help.
  - Click on and delete any spurious images (e.g. edges of pages, “black space”).
  - Make sure the original page numbers are clear in the text so that it can be properly
    referenced.
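If you are saving many sections, the recommended file-naming pattern can be applied consistently with a small script. This is a minimal Python sketch, not part of OmniPage: the function name `make_filename` is hypothetical, and the list of disallowed characters assumes you are saving on Windows (for example, to your H: drive or a USB pen drive).

```python
import re

# Characters Windows does not allow in file names (assumption: files are saved on Windows).
_INVALID = re.compile(r'[\\/:*?"<>|]')


def make_filename(author: str, title: str, first_page: int, last_page: int,
                  ext: str = ".doc") -> str:
    """Build a name following the pattern author-title-pages X to Y (hypothetical helper)."""
    if first_page > last_page:
        raise ValueError("first_page must not be after last_page")
    stem = f"{author.strip()}-{title.strip()}-pages {first_page} to {last_page}"
    # Remove characters that would make the name invalid on Windows.
    return _INVALID.sub("", stem) + ext


print(make_filename("Stanlake", "Introductory Economics", 23, 36))
```

Running it prints `Stanlake-Introductory Economics-pages 23 to 36.doc`, matching the example name given in Step 4 with the Word 97 extension added.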
Notes for Performing OCR on More Complex Documents
Sometimes the Automatic OCR option produces unsatisfactory results: for example, on a page with
many images that you don’t need, on pages with graphs (where the graph itself is recognised as an
image but its labels are recognised as text), or on pages where you have accidentally captured part
of the next page. You may also notice that the recognised text is not in the same order as it is on
the page, and needs to be corrected.
In these cases you will need to use zones to tell OmniPage what should and should not be
recognised as text and, in some cases, the order in which the text should be recognised.
Dealing with Images
Removing Images from a document
If you are scanning a document for a visually impaired student who will not be able to make use of
images, we recommend removing the images from the document. You can do this either by removing
the image zones that OmniPage creates automatically, or by deleting the images from the resulting
Word document. If you remove an image, be sure to include its caption followed by [Image Removed]
to make it clear that something is missing from the document.
Keeping Images with text together
Sometimes when performing automatic OCR, OmniPage will recognise the text in an image separately
from the image itself (for example, the labels on a graph). If you want the image to remain intact in
the document, you will need to draw an Image Zone around it. Note that any text inside the image
will then not be accessible to screen reader or text-to-speech software.
Zones
Zones are indicated on the Page Image by either a red outline (a text zone) or a green outline (an
image zone).
The Zones toolbar appears on the left side of the Page Image section of the OmniPage window.
Zones will appear on pages after performing OCR using the Automatic setting (see Step 2). If you
manually change the zones on a page, you will need to perform OCR on that page again by
pressing the Perform OCR button. To save time, you can draw zones on several pages, then select
those pages together in the Thumbnails view and press Perform OCR once, rather than processing
each page individually.
Deleting Zones
Before drawing your own zones, you may need to delete existing zones if OCR has already been
performed on the page.
- Select a zone by clicking on it. You can select multiple zones by holding down the Shift key
as you click on them. To select all zones, right-click, then choose Select All Zones.
- Right-click, then choose Remove Selected Zones.
Drawing Zones
- Click the Draw Text Zone tool on the Zones toolbar.
- Use the mouse to drag a rectangle around the block of text to be recognised.
- You can make irregular shapes by drawing overlapping rectangles (for example, if the text is
arranged in an L-shape around an image).
If you need to draw a different type of zone, hold the mouse over the bottom-right corner (blue
triangle) of the Draw Text Zone button to see the other options.
Ordering or Reordering Zones
If you draw zones manually, the text will be recognised in the order in which you drew the zones.
- You can check the order in which the text will be recognised by clicking the
Change zone order tool on the Zones toolbar.
- Each zone will display a number in the corner indicating its position in the recognition order.
- To change the order, click on each zone in turn in the order you want the text to be recognised.
Further Information
More detailed help on creating zones or performing other functions with OmniPage can be found in
the OmniPage Help File (choose Help Topics from the Help menu).
For further advice and help, you can contact the Assistive Technology Team:
Assistive Technology Team
IT Services, Room G86a/b, Edward Wright Building,
University of Aberdeen
AB24 3QY
Tel: 01224 273336
Email: atech@abdn.ac.uk
Web Site: http://www.abdn.ac.uk/assistivetechnology
Appendix: Set up the OpticBook Scanner to work with OmniPage
1. Start the Scanner Setup Wizard from the Tools menu (it may run automatically if you
attempt to scan without the scanner set up).
2. When asked “Would you like to download the latest scanner database from Nuance?”, select No,
then click Next.
3. Choose Select and test scanner or digital camera, then click Next.
4. From the list of Available Scanners, choose TWAIN: Plustek OpticBook 3800, then click OK.
5. Click Next, then continue to click Next in the following windows until you reach the last
window, which has a Finish button. Click Finish. The scanner is now set up and ready to use.