OCR - Bluebeam

Introduction to OCR in Revu
Optical Character Recognition (OCR), or text recognition, translates the text in
scanned PDF documents into searchable text. Once OCR has been run on a
scanned PDF, you can search the document for specific text, add bookmarks and
hyperlinks on text, copy text to another document or use one of Revu's advanced text
editing tools.
Compatibility
Revu eXtreme 12.0 or higher
Contents
Running OCR on a Single Document
Running OCR on Multiple Documents (Batch)
Search for Text in a Document After Running OCR
Copying and Editing Text in a Document After Running OCR
www.bluebeam.com
Running OCR on a Single Document
1. Open the document on which OCR is to be run.
2. From the Command bar, go to Document > OCR or use the keyboard shortcut
Ctrl+Shift+O. The OCR dialog box opens.
  • The OCR function will also be invoked when the Create PDF from
    Scanner or Camera function in Revu is used, opening the OCR dialog
    box automatically.
3. The languages that will be used by the OCR process are shown under Recognition
Languages. The American English library is loaded by default. To add other libraries,
click Add. To remove a library, select it and click Remove. Multiple libraries can be
used on the same document.
4. Set the OCR Configuration options, as desired:
  • Correct Skew: Enable to correct angular deviations in scanned documents.
  • Detect Orientation: Enable to detect the page orientation (90, 180 and 270
    degrees) of each page and correct it if needed.
  • Detect Text in Pictures and Drawings: Enable to detect text in graphics.
  • Rotate Markups: If Correct Skew is enabled, use this option to also adjust
    existing markups so they line up with skew-corrected text or images.
  • Skip Vector Pages: Enable to skip processing of pages with vector content.
  • Page Chunk Size: Use to determine the maximum number of pages
    sent to the OCR engine at one time. Increasing chunk size can
    increase speed, but will also consume more of the computer's
    resources.
    Note: Enabling Page Chunk Size and setting it to 1 is recommended
    for OCR jobs performed on PDFs that have a large number of pages,
    are of substantial file size or contain large format drawings. If
    OCR is run on a PDF with no results, running it again with a Page
    Chunk Size of 1 can correct the problem.
  • Max Vector Size: Use to set the maximum vector size that will be analyzed
    during the OCR process; any vectors larger than this setting will be discarded
    in pre-processing. Decreasing this value can increase speed, but might also
    cause larger text (for example, larger fonts) to be inadvertently ignored.
  • Optimize for: Use to optimize the OCR process for the selected document
    type. The CAD Drawing setting tends to ignore text formatting, for example,
    while the Text Document setting does not.
5. To select a Page Range, click the Pages menu and select from the following:
  • All Pages: Sets the range to all pages.
  • Current: Sets the range to the current page only. The current page number
    will appear in parentheses, for example, Current (2) if page 2 is the current
    page.
  • Selected: Sets the range to the current selection. This option only
    appears if pages were selected prior to invoking the command.
  • Custom: Sets the range to a custom value. When this option is selected the
    list becomes a text box. To enter a custom range:
      • Use a dash between page numbers to define those two pages and all
        pages in between.
      • Use a comma to define pages that are separated.
    For example: 1-3, 5, 9 will include pages 1, 2, 3, 5 and 9.
6. Click OK to run OCR.
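
The Custom page range syntax described in step 5 can be illustrated with a short Python sketch. This is not Revu's own parser; the function name parse_page_range and the handling of reversed ranges are assumptions made only for illustration.

```python
from typing import List


def parse_page_range(spec: str) -> List[int]:
    """Expand a range string such as "1-3, 5, 9" into a sorted list of pages."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # ignore empty entries such as a trailing comma
        if "-" in part:
            # A dash defines the two pages and all pages in between.
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                # Assumption: a reversed range like "3-1" is treated as "1-3".
                start, end = end, start
            pages.update(range(start, end + 1))
        else:
            # A comma-separated single number defines one page.
            pages.add(int(part))
    return sorted(pages)


print(parse_page_range("1-3, 5, 9"))  # [1, 2, 3, 5, 9]
```

The printed result matches the example in step 5: pages 1, 2, 3, 5 and 9.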
Running OCR on Multiple Documents (Batch)
1. From the Command bar, go to File > Batch > OCR. The Batch: OCR
dialog box opens.
2. Add documents using one (or both) of the following methods:
  • Click Add Open Files to add currently open files to the list.
  • Click Add to select files from a local or network drive and add them to
    the list.
3. To select a Page Range, click the Pages menu and select from the following:
  • All Pages: Sets the range to all pages.
  • Custom: Sets the range to a custom value. When this option is selected the
    list becomes a text box. To enter a custom range:
      • Use a dash between page numbers to define those two pages and all
        pages in between.
      • Use a comma to define pages that are separated.
    For example: 1-3, 5, 9 will include pages 1, 2, 3, 5 and 9.
4. Use the Apply To lists to filter the pages in the range: select Even Pages
Only, Odd Pages Only or Odd and Even Pages from one list, and Landscape Pages,
Portrait Pages or Landscape and Portrait Pages from the other.
5. Select the next PDF in the File List and repeat steps 3 and 4 until Page Range and Page
Filter options have been set for each PDF.
6. Click OK. The OCR dialog box opens.
7. The languages that will be used by the OCR process are shown under Recognition
Languages. The American English library is loaded by default. To add other libraries,
click Add. To remove a library, select it and click Remove. Multiple libraries can be
used on the same document.
8. Set the OCR Configuration options, as desired:
  • Correct Skew: Enable to correct angular deviations in scanned documents.
  • Detect Orientation: Enable to detect the page orientation (90, 180 and 270
    degrees) of each page and correct it if needed.
  • Detect Text in Pictures and Drawings: Enable to detect text in graphics.
  • Rotate Markups: If Correct Skew is enabled, use this option to also adjust
    existing markups so they line up with skew-corrected text or images.
  • Skip Vector Pages: Enable to skip processing of pages with vector content.
  • Page Chunk Size: Use to determine the maximum number of pages
    sent to the OCR engine at one time. Increasing chunk size can
    increase speed, but will also consume more of the computer's
    resources.
    Note: Enabling Page Chunk Size and setting it to 1 is recommended
    for OCR jobs performed on PDFs that have a large number of pages,
    are of substantial file size or contain large format drawings. If
    OCR is run on a PDF with no results, running it again with a Page
    Chunk Size of 1 can correct the problem.
  • Max Vector Size: Use to set the maximum vector size that will be analyzed
    during the OCR process; any vectors larger than this setting will be discarded
    in pre-processing. Decreasing this value can increase speed, but might also
    cause larger text (for example, larger fonts) to be inadvertently ignored.
  • Optimize for: Use to optimize the OCR process for the selected document
    type. The CAD Drawing setting tends to ignore text formatting, for example,
    while the Text Document setting does not.
9. Click OK to run OCR.
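
The effect of the Page Chunk Size option can be pictured with a small Python sketch. It is a conceptual illustration only, not how Revu's OCR engine is implemented; the function name chunk_pages is hypothetical. It shows how a page list splits into groups no larger than the chunk size, and why a chunk size of 1 sends one page at a time.

```python
from typing import Iterator, List


def chunk_pages(pages: List[int], chunk_size: int) -> Iterator[List[int]]:
    """Yield consecutive groups of at most chunk_size pages."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(pages), chunk_size):
        yield pages[start:start + chunk_size]


pages = list(range(1, 8))  # a hypothetical 7-page document

# Larger chunks mean fewer, bigger requests to the engine.
print(list(chunk_pages(pages, 3)))  # [[1, 2, 3], [4, 5, 6], [7]]

# A chunk size of 1 processes a single page per request.
print(len(list(chunk_pages(pages, 1))))  # 7
```

Smaller chunks keep the amount of data handled at once low, which is consistent with the note recommending a Page Chunk Size of 1 for large or complex PDFs.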
Search for Text in a Document After Running OCR
One advantage of running OCR on a scanned PDF is the ability to search it for a
specific text string. Since scanned PDFs are images, this is not possible until after OCR
is run.
To search for text in a document:
1. Select the Search tab.
  • If the Search tab is not open, go to Tab Access > Search or use the
    keyboard shortcut Alt+1 or Ctrl+F.
2. Enter text to search for in the Text field.
3. Select Current Document from the Search In dropdown menu.
4. Select any of the desired Options:
  • Search Pages: Searches for text in the content of the PDF.
  • Search Filenames: Searches the file names in the Recents list when Search In
    is set to Recents.
  • Search File Properties: Searches for text in the Properties metadata fields.
  • Search Form Fields: Searches for text in the data entered in the form fields.
  • Search Markups: Searches for text in markups.
  • Case Sensitive: Searches for text with the exact case typed in the Search
    Terms field.
  • Whole Words Only: Searches only for instances where the search term exists
    as a complete word. If the search term appears only as part of a longer
    word and the Whole Words Only box is checked, that instance will not be
    included in the search results.
5. Click Search. Results are shown below the Options panel.
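
The Case Sensitive and Whole Words Only options can be illustrated with a Python sketch built on regular expressions. This is an approximation for illustration, not Revu's search implementation; the function name find_matches is hypothetical, and the sketch assumes that searches ignore case unless Case Sensitive is selected.

```python
import re
from typing import List


def find_matches(text: str, term: str,
                 case_sensitive: bool = False,
                 whole_words_only: bool = False) -> List[int]:
    """Return the start offset of each match of term in text."""
    pattern = re.escape(term)  # treat the term as literal text
    if whole_words_only:
        # \b matches a word boundary, so "plan" will not match inside
        # "Planning". It behaves as expected only when the term starts
        # and ends with a letter, digit or underscore.
        pattern = r"\b" + pattern + r"\b"
    flags = 0 if case_sensitive else re.IGNORECASE
    return [m.start() for m in re.finditer(pattern, text, flags)]


sample = "Plan the floor plan. Planning starts Monday."

print(find_matches(sample, "plan"))                         # [0, 15, 21]
print(find_matches(sample, "plan", whole_words_only=True))  # [0, 15]
print(find_matches(sample, "plan", case_sensitive=True))    # [15]
```

With Whole Words Only enabled, the occurrence inside "Planning" is dropped; with Case Sensitive enabled, only the lowercase "plan" is found.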
Copying and Editing Text in a Document After Running OCR
Many advanced features available in Revu can be applied to text in a scanned PDF on
which OCR has been run. Use the Select Text tool or the keyboard shortcut Shift+T to
select text, then right-click it to open a context menu with several useful
commands:
  • Add Bookmark: Inserts a bookmark at this location using the selected text
    as the name of the bookmark.
  • Add Hyperlink: Opens the Action dialog box to define a hyperlink action for
    the selected text.
  • Mark for Redaction: Marks the selected text for redaction.
  • Copy: Copies the selected text.
  • Paste: Pastes previously copied text over the selected text.
  • Highlight Selected Text: Highlights the selected text.
  • Underline Selected Text: Underlines the selected text.
  • Squiggly Selected Text: Inserts a squiggly line under the selected text.
  • Strikethrough Selected Text: Strikes through the selected text.
  • Replace Selected Text: Opens a Replacement Text pop-up window for the
    selected text.
  • Insert Text at Cursor: Opens an Insert Text pop-up window at the current
    position of the cursor. This command is unavailable when text has been
    selected.
  • Select All Text: Selects all text on the current page.
  • Deselect All Text: Deselects currently selected text.
  • Look Up: Opens a WebTab to look up the selected text in Wikipedia.
  • Search: Opens the Search tab and searches the current document for the
    selected text.