Introduction to OCR in Revu

Optical Character Recognition (OCR), or text recognition, translates the text in scanned PDF documents into searchable text. Once OCR has been run on a scanned PDF, you can search the document for specific text, add bookmarks and hyperlinks on text, copy text to another document or use one of Revu's advanced text editing tools.

Compatibility

Revu eXtreme 12.0 or higher

Contents

Running OCR on a Single Document
Running OCR on Multiple Documents (Batch)
Search for Text in a Document After Running OCR
Copying and Editing Text in a Document After Running OCR

www.bluebeam.com

Running OCR on a Single Document

1. Open the document on which OCR is to be run.
2. From the Command bar, go to Document > OCR or use the keyboard shortcut Ctrl+Shift+O. The OCR dialog box opens.

   The OCR function will also be invoked when the Create PDF from Scanner or Camera function in Revu is used, opening the OCR dialog box automatically.
3. The languages that will be used by the OCR process are shown under Recognition Languages. The American English library is loaded by default. To add other libraries, click Add. To remove a library, select it and click Remove. Multiple libraries can be used on the same document.
4. Set the OCR Configuration options, as desired:
   • Correct Skew: Enable to correct angular deviations in scanned documents.
   • Detect Orientation: Enable to detect the page orientation (90, 180 and 270 degrees) of each page and correct it if needed.
   • Detect Text in Pictures and Drawings: Enable to detect text in graphics.
   • Rotate Markups: If Correct Skew is enabled, use this option to also adjust existing markups so they line up with skew-corrected text or images.
   • Skip Vector Pages: Enable to skip processing of pages with vector content.
   • Page Chunk Size: Use to determine the maximum number of pages sent to the OCR engine at one time.
     Increasing chunk size can increase speed, but will also consume more of the computer's resources.
     Note: Enabling Page Chunk Size and setting it to 1 is recommended for OCR jobs performed on PDFs that have a large number of pages, are of substantial file size or contain large format drawings. If OCR produces no results on a PDF, running it again with a Page Chunk Size of 1 can correct the problem.
   • Max Vector Size: Use to set the maximum vector size that will be analyzed during the OCR process; any vectors larger than this setting will be discarded in pre-processing. Decreasing this value can increase speed, but might also cause larger text (for example, larger fonts) to be inadvertently ignored.
   • Optimize for: Use to optimize the OCR process for the selected document type. For example, the CAD Drawing setting tends to ignore text formatting, while the Text Document setting does not.
5. To select a Page Range, click the Pages menu and select from the following:
   • All Pages: Sets the range to all pages.
   • Current: Sets the range to the current page only. The current page number will appear in parentheses, for example, Current (2) if page 2 is the current page.
   • Selected: Sets the range to the current selection. This option only appears if pages were selected prior to invoking the command.
   • Custom: Sets the range to a custom value. When this option is selected the list becomes a text box. To enter a custom range:
     • Use a dash between page numbers to define those two pages and all pages in between.
     • Use a comma to define pages that are separated.
     For example: 1-3, 5, 9 will include pages 1, 2, 3, 5 and 9.
6. Click OK to run OCR.

Running OCR on Multiple Documents (Batch)

1. From the Command bar, go to File > Batch > OCR. The Batch: OCR dialog box opens.
2. Add documents using one (or both) of the following methods:
   • Click Add Open Files to add currently open files to the list.
   • Click Add to add files from a local or network drive to the list.
3. To select a Page Range, click the Pages menu and select from the following:
   • All Pages: Sets the range to all pages.
   • Custom: Sets the range to a custom value. When this option is selected the list becomes a text box. To enter a custom range:
     • Use a dash between page numbers to define those two pages and all pages in between.
     • Use a comma to define pages that are separated.
     For example: 1-3, 5, 9 will include pages 1, 2, 3, 5 and 9.
4. Click the Apply To lists to select among Even Pages Only, Odd Pages Only or Odd and Even Pages and among Landscape Pages, Portrait Pages or Landscape and Portrait Pages.
5. Select the next PDF in the File List and repeat steps 3 and 4 until Page Range and Page Filter options have been set for each PDF.
6. Click OK. The OCR dialog box opens.
7. The languages that will be used by the OCR process are shown under Recognition Languages. The American English library is loaded by default. To add other libraries, click Add. To remove a library, select it and click Remove. Multiple libraries can be used on the same document.
8. Set the OCR Configuration options, as desired:
   • Correct Skew: Enable to correct angular deviations in scanned documents.
   • Detect Orientation: Enable to detect the page orientation (90, 180 and 270 degrees) of each page and correct it if needed.
   • Detect Text in Pictures and Drawings: Enable to detect text in graphics.
   • Rotate Markups: If Correct Skew is enabled, use this option to also adjust existing markups so they line up with skew-corrected text or images.
   • Skip Vector Pages: Enable to skip processing of pages with vector content.
   • Page Chunk Size: Use to determine the maximum number of pages sent to the OCR engine at one time. Increasing chunk size can increase speed, but will also consume more of the computer's resources.
     Note: Enabling Page Chunk Size and setting it to 1 is recommended for OCR jobs performed on PDFs that have a large number of pages, are of substantial file size or contain large format drawings. If OCR produces no results on a PDF, running it again with a Page Chunk Size of 1 can correct the problem.
   • Max Vector Size: Use to set the maximum vector size that will be analyzed during the OCR process; any vectors larger than this setting will be discarded in pre-processing. Decreasing this value can increase speed, but might also cause larger text (for example, larger fonts) to be inadvertently ignored.
   • Optimize for: Use to optimize the OCR process for the selected document type. For example, the CAD Drawing setting tends to ignore text formatting, while the Text Document setting does not.
9. Click OK to run OCR.

Search for Text in a Document After Running OCR

One advantage of running OCR on a scanned PDF is the ability to search it for a specific text string. Since scanned PDFs are images, this is not possible until after OCR is run.

To search for text in a document:

1. Select the Search tab.
   • If the Search tab is not open, go to Tab Access > Search or use the keyboard shortcut Alt+1 or Ctrl+F.
2. Enter text to search for in the Text field.
3. Select Current Document from the Search In dropdown menu.
4. Select any of the desired Options:
   • Search Pages: Searches for text in the content of the PDF.
   • Search Filenames: Searches the file names in the Recents list when Search In is set to Recents.
   • Search File Properties: Searches for text in the Properties metadata fields.
   • Search Form Fields: Searches for text in the data entered in the form fields.
   • Search Markups: Searches for text in markups.
   • Case Sensitive: Searches for text with the exact case typed in the Search Terms field.
   • Whole Words Only: Searches only for instances where the search term exists as a complete word.
     If the Whole Words Only box is checked, an occurrence of the search term that is only part of another word will not be included in the search results.
5. Click Search. Results are shown below the Options panel.

Copying and Editing Text in a Document After Running OCR

Many advanced features available in Revu can be applied to text in a scanned PDF on which OCR has been run. Use the Select Text tool or the keyboard shortcut Shift+T to select text, then right-click it to open a context menu with several useful commands.

• Add Bookmark: Inserts a bookmark at this location using the selected text as the name of the bookmark.
• Add Hyperlink: Opens the Action dialog box to define a hyperlink action for the selected text.
• Mark for Redaction: Marks the selected text for redaction.
• Copy: Copies the selected text.
• Paste: Pastes previously copied text over the selected text.
• Highlight Selected Text: Highlights the selected text.
• Underline Selected Text: Underlines the selected text.
• Squiggly Selected Text: Inserts a squiggly line under the selected text.
• Strikethrough Selected Text: Strikes through the selected text.
• Replace Selected Text: Opens a Replacement Text pop-up window for the selected text.
• Insert Text at Cursor: Opens an Insert Text pop-up window at the current position of the cursor. This command is unavailable when text has been selected.
• Select All Text: Selects all text on the current page.
• Deselect All Text: Deselects currently selected text.
• Look Up: Opens a WebTab to look up the selected text in Wikipedia.
• Search: Opens the Search tab and searches the current document for the selected text.
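
Appendix: Custom Page Range and Page Chunk Size, Illustrated

The Custom page range syntax (for example, 1-3, 5, 9) and the Page Chunk Size option described in the OCR steps can be illustrated with a short Python sketch. This is not Revu's implementation and Revu does not expose these functions; the names parse_page_range and chunk_pages are hypothetical. How Revu treats out-of-range or reversed entries is not stated in this guide, so rejecting them here is an assumption.

```python
from typing import List


def parse_page_range(spec: str, page_count: int) -> List[int]:
    """Expand a range such as "1-3, 5, 9" into sorted page numbers.

    A dash selects two pages and every page in between; a comma
    separates individual entries. Hypothetical helper, for illustration.
    """
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas (assumption)
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        # Assumption: reject reversed or out-of-range entries outright.
        if start < 1 or end > page_count or start > end:
            raise ValueError(f"invalid page range entry: {part!r}")
        pages.update(range(start, end + 1))
    return sorted(pages)


def chunk_pages(pages: List[int], chunk_size: int) -> List[List[int]]:
    """Split pages into groups of at most chunk_size pages, mirroring the
    idea of a maximum number of pages sent to the OCR engine at one time."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [pages[i:i + chunk_size] for i in range(0, len(pages), chunk_size)]


if __name__ == "__main__":
    selected = parse_page_range("1-3, 5, 9", page_count=10)
    print(selected)
    print(chunk_pages(selected, chunk_size=2))
```

With a chunk size of 1, each selected page forms its own group, which corresponds to the setting the guide recommends for large or problematic PDFs.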