Technology Learning Center
OmniPage Professional 17
Optical Character Recognition Software
916.278.6112, AIRC 3012, http://www.csus.edu/irt/fsrc
Last Updated: February 9, 2016

Contents
OmniPage Professional 17
Overview
    Prerequisites
    Objectives
Creating Accessible Scanned Documents
    What is OCR?
    OmniPage Pro Scanning Software
    Scanning the Document
        Launch OmniPage Pro
        Set Scanning Preferences
        Load file
        Optical Character Recognition Tool
        Export the document
        Start Scanning
        Multiple Pages
    Other ways to load files
        Converting from PDF
        Creating PDF files from other applications
    OCR and Proofreading
    Exporting the Files
        Save the File
        OmniPage Document format
        Save images in the document
        Save to PDF
    References

Manual referred - © Nuance_Omnipage_Professional_17_1_User_Guide

Overview
This manual explains how to create accessible scanned documents using optical character recognition (OCR) software. OmniPage Pro 17 lets users scan text documents such as journal articles, text-based handouts, and pages in a book. It not only recognizes scanned content and converts it to readable text formats, but also converts pictures of documents to readable text formats.

Features available in OmniPage 17 include:
1. Asian Recognition: Recognizes Asian characters (Japanese, Korean, Traditional Chinese, and Simplified Chinese).
2. Vertical Text Recognition: Identifies text written vertically and allows it to be edited in the Text Editor at the True Page formatting level.
3. Improved support for Office 2007: The Direct OCR buttons now appear on a separate Nuance OCR tab instead of being mixed with all other Add-Ins in Office.
4. Robust Batch Processing: The Batch Manager automatically skips files that cannot be processed, including those blocked by password requirements, without stopping the main flow of work. The Job results window indicates which files were not processed.
5. Performance: The program launches faster, and performance is considerably improved on multi-core computers. Support for quad-core machines is introduced.
6. Linking workflows to scanner buttons: OmniPage functions and workflows can be associated with scanner buttons, so the whole pre-processing, recognition, and storage of documents can be launched from the scanner.
7. Output to Kindle: The new Kindle Assistant lets you create workflows that send recognition results to a Kindle account at Amazon, so they can be viewed on a Kindle device registered with that account.

Prerequisites
To use OmniPage, users should have a basic knowledge of how to use a scanner to scan documents, knowledge of word processing applications such as Microsoft Word, Notepad, or WordPad, and the ability to work in the Windows operating system environment.

Objectives
After following the steps in this manual, you should be able to:
- Understand how to use the features in OmniPage Pro that aid in creating accessible scanned documents.
- Apply what you have learned about OmniPage Pro to process different types of course materials (text documents) that need to be scanned so that they are accessible.

Creating Accessible Scanned Documents

What is OCR?
When you scan documents such as journal articles, pages in a book, or handouts, the standard scanning software that comes with your scanner may save a text document as an image. Assistive technologies such as screen readers cannot access image-only content, because it does not contain readable and editable text. OCR (optical character recognition) software takes an image (a scanned document page) and creates editable text from it. This is usually referred to as "OCR'ing" an image file.
The editable text can then be saved in a word processing format such as Microsoft Word (.doc/.docx) or as an Adobe PDF (.pdf). These formats allow further editing and formatting of the text if necessary.

OmniPage Pro Scanning Software
OmniPage Pro OCR software converts the scanned "image" output of print-based documents (such as laser-printed and typewritten documents) and of digital documents (such as an image-only PDF) into editable text. This text can then be edited, formatted, and saved in the application of your choice: Microsoft Word, PowerPoint, Excel, or Adobe PDF. OmniPage also retains various elements from your scanned print-based and digital documents, including:
- Graphics: photos, drawings, charts, graphs, etc.
- Text formatting: fonts, font sizes, font styles, and font color.
- Page formatting: column structure, paragraph spacing, table formats, placement of graphics, etc.

Scanning the Document
Before you begin scanning:
- Gather all the course materials that you will be scanning.
- Think about how you will make the scanned document(s) available to users. For example, will you post the scanned document on a website so that users can download it, or will it be linked to a SacCT course?
- Determine the file format in which to save your scanned document so that it benefits those who access it.

Launch OmniPage Pro
Step 1. Click the Start button on the Windows taskbar.
Step 2. Select All Programs.
Step 3. Select ScanSoft OmniPage 17.
Step 4. Select OmniPage Professional 17.
Step 5. The OmniPage Professional main window displays.

Set Scanning Preferences
Before you begin scanning, you need to specify the processing method you will use to scan documents. You can choose from automatic, manual, or workflow processing.
For this tutorial we will focus on automatic processing, which is the fastest and easiest method. In the automatic process, OmniPage scans the image, performs OCR to generate editable text so that you can check and correct errors in the document, and lastly gives you the option to export the document to the desired format and location. OmniPage completes these three steps from beginning to end in the Automatic Process.

You can set scanning preferences using the Scanner Setup Wizard in the Tools menu:
Step 1. Select your scanning preferences by clicking the downward arrows (the Toolbox drop-down lists) and making a selection for each phase (1, 2, 3). We generally select 1-2-3, which executes phase 1 (loading the file), phase 2 (recognizing and proofreading the document), and phase 3 (saving the document to its destination folder). By default, the file is saved in the Documents library.

Load file
Choose the location of the file you would like OmniPage to convert.

Optical Character Recognition Tool
Choose the Automatic option so that the software can detect the type and style of the document being submitted. If you want to be more specific to help the software understand the type of document, click the option that best describes your document.

Export the document
You can export the document to a desired location. We generally select Save to File, but you could also copy it to the clipboard, send it as an e-mail, and so on. The document can be saved in any of the available formats and later opened in Microsoft Word (or any other text editing tool) for editing.
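The automatic 1-2-3 process runs its three phases (load, recognize, export) one after another without further input. The Python sketch below models only that sequencing, as an illustration. The function names and the dictionary-based document state are hypothetical and are not part of OmniPage, and the "recognition" phase is a stand-in that merely tidies whitespace in strings rather than performing real OCR.

```python
from typing import Callable, Dict, List

# Conceptual sketch only: these hypothetical functions stand in for the three
# Toolbox phases. They are not OmniPage functions and do not perform real OCR.


def load_file(state: Dict) -> Dict:
    # Phase 1 (Load File): bring the page "images" into the document.
    # In this sketch a page image is just a string.
    state["pages"] = list(state["source"])
    return state


def automatic_recognition(state: Dict) -> Dict:
    # Phase 2 (Automatic Recognition): placeholder for OCR.
    # Here it only collapses repeated whitespace in each page.
    state["text"] = [" ".join(page.split()) for page in state["pages"]]
    return state


def save_to_file(state: Dict) -> Dict:
    # Phase 3 (Save to File): placeholder for export.
    # Here it joins the pages into one output string instead of writing a file.
    state["output"] = "\n\n".join(state["text"])
    return state


def run_automatic_process(source: List[str], phases: List[Callable[[Dict], Dict]]) -> Dict:
    """Run each selected phase in order, passing the result of one to the next."""
    state: Dict = {"source": source}
    for phase in phases:
        state = phase(state)
    return state


result = run_automatic_process(
    ["First   page of the   article.", "Second page."],
    [load_file, automatic_recognition, save_to_file],
)
print(result["output"])
```

Swapping one function in the phases list corresponds, conceptually, to picking a different entry in one of the Toolbox drop-down lists, for example a different export target in phase 3.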
Step 1. Now that you have chosen the recommended workflow, click the workflow icon.
Step 2. Once you click the icon, a Scanner Setup window opens. Choose whether to update to the latest scanner database from Nuance.
Step 3. Next, choose "Select and test scanner or digital camera".
Step 4. Locate your device and click Next.
Step 5. Testing the device is optional. If the Setup Wizard reports that your device is ready, click Next.
Step 6. Click Finish to exit the Setup Wizard.
Step 7. The setup can also be done manually by going to Tools > Scanner Setup Wizard. The wizard provides the same options as the previous steps.
Step 8. For normal black-and-white scanning of primarily text, we recommend choosing the Automatic Process 1-2-3 procedure and selecting the following under each phase (1, 2, 3):
  a. Phase 1: Scan B&W
  b. Phase 2: Automatic Recognition
  c. Phase 3: Save to File

Start Scanning
After you select your scanning preferences, you can begin scanning.
Step 1. Click the 1-2-3-> button to start scanning.

Multiple Pages
OmniPage allows you to scan a single page or multiple pages at a time. If you will be scanning multiple pages:
Step 1. After the first page is scanned, the Continue Automatic Processing window appears.
Step 2. If you have more pages to scan, place the next page in the scanner.
Step 3. Click the Add More Pages button.
Step 4. When you are done scanning, click the Stop Loading Pages button.

Other ways to load files

Converting from PDF
To extract text content from a PDF file, load it into OmniPage, recognize it, and save the results to a text format. A variety of outputs is also available from a PDF file's shortcut menu: Word, Excel, RTF, WordPerfect, or text. For more options, use the Convert Now Wizard.

Creating PDF files from other applications
The Nuance PDF Create product supplied with OmniPage Professional lets you create Normal PDF files from documents in any print-capable application on your system. Click File > Print and select the printer ScanSoft PDF Create! Adjust the properties as desired, click OK, and supply a file name and location. If View resulting PDF is selected, your default PDF viewer displays the result.

OCR and Proofreading
After scanning is completed, your image appears in the Image Panel, and a 100% bar appears below the second Toolbox button. The OCR and proofreading steps follow automatically, and you can respond to the OCR Proofreader's suggestions.
Step 1. If OCR does not recognize a word because it is unclear, the Proofreader highlights the suspect characters and displays them with a few suggested corrections. Whenever an unidentified character appears, you need to identify it and then click the appropriate button:
- Ignore: ignores the current flagged item.
- Ignore All: ignores all similar occurrences throughout the document.
- Add: adds the word in the text box as a new word.
- Change: replaces the current word with the selected suggestion or with your edit.
- Change All: changes all occurrences of that word in the remaining document.
- More >>: displays more special characters.
- Page Ready: select this if you do not want to make any more changes on the page.
- Document Ready: select this if you do not want to make any more changes to the document.
- Close: closes the editing window and takes you to the next step, exporting the file. You can either save the file to a folder or e-mail it.

Exporting the Files

Save the File
1. When you are done with the Proofreader, click the Close button on the Proofreader window. The Save to File window appears automatically. Select your preferred format in the Files of type drop-down list. You can save the file in any of the listed formats, but we highly recommend saving it as a Word document, because you will then have an original digital copy in a format that is easy to edit in the future. You can also save your scanned file as a PDF file. In both cases (Word or Acrobat), you will also need to structure the content by assigning tags or styles to the text and adding text equivalents to images and figures.

Save As: OmniPage Document format
Saving in OmniPage Document format keeps the document available in OmniPage for editing. It remains in OmniPage after export and can be edited multiple times, exported to different formats, or mailed through Outlook 2007. You can even add pages or recognize already recognized pages again.

Formatting levels available when exporting include:
Plain Text - Exports plain, left-aligned text with no columns, in a single font and font size. When exporting to Text or Unicode file types, graphics and tables are not supported. You can export plain text to nearly all file types and target applications; for those other than Text and Unicode, graphics, tables, and bullets can be retained.
Formatted Text - Exports text with font and paragraph styling, graphics, and tables, but with no columns. This is available for nearly all file types.
Flowing Page - Keeps the original page layout, including columns. Wherever possible, this is done with column and indent settings rather than with text boxes or frames, so text flows from one column to the next, which does not happen when text boxes are used.
True Page - Keeps the original page layout, including columns, by using text, picture, and table boxes and frames. It is offered only for target applications that can handle these. True Page is the only formatting level for XML export and for all PDF export except the PDF Edited file type.
Spreadsheet - Exports recognition results in tabular form, suitable for use in spreadsheet applications. Each document page is placed on a separate worksheet.

Save images in the document
You can save images in various formats. Under Save to file in the Export results drop-down list, select Image under Save as.
Step 1. Choose a folder location and type in the file name. Then select the format in which you want to save your images and whether to save images from the current page, multiple pages, or all pages.
Step 2. Choose to save only the selected zone image(s), the current page image, selected page images, or all images in the document. For multiple zones or pages, you can store all images in a single multi-page image file, provided you set TIFF, MAX, DCX, JB2, Image-only PDF, or XPS as the file type. Otherwise, each image is placed in a separate file, and OmniPage adds numerical suffixes to the file name you provide to generate unique file names.
Step 3. Click Options... if you want to specify a saving mode (black and white, grayscale, color, or "As is"), a maximum resolution, and other settings. For TIFF files, you also specify the compression method here. Click OK to save the image(s) as specified. Zones and recognized text are not saved with the image file.

Save to PDF
You have five choices when saving to Portable Document Format (PDF) files. The first four are presented as text converters; the last one is listed among the image converters.
- PDF (Normal): Pages are exported as they appeared in the Text Editor in True Page view. The PDF file can be viewed and searched in a PDF viewer and edited in a PDF editor.
- PDF Edited: Use this if you have made significant editing changes in the recognition results. You have three formatting level choices, including True Page. The PDF file can be viewed, searched, and edited.
- PDF Searchable Image (formerly PDF Image on Text): The original images are exported, but there is a linked text file behind each image, so the text can be searched; a found word is highlighted in the image. The PDF file is viewable only and cannot be modified in a PDF editor.
- PDF with image substitutes: As for PDF (Normal), but words containing reject or suspect characters have image overlays, so these uncertain words display as they were in the original document. The PDF file can be viewed, searched, and edited.
- PDF Image (formerly PDF, image only): The original images are exported. The PDF file is viewable only; it cannot be modified in a PDF editor, and its text cannot be searched.

Summary
Topics and techniques described in this manual include:
- What OCR is: OCR software converts the scanned "image" output of print-based documents (such as laser-printed and typewritten documents) and of digital documents (such as an image-only PDF) into editable text.
- The process of launching OmniPage, setting preferences, scanning, and exporting.
- Other ways to load files, such as converting from PDF and creating PDF files from other applications.
- Proofreading your OCR results after a document has been loaded or scanned.
- Export options.

References
For further assistance, please refer to the official OmniPage Professional 17 User Guide by Nuance: http://www.nuance.com/imaging/pdf/ug_OmniPage17UserGuide.pdf