Scanned documents

How can I make scanned PDF files accessible (and searchable)?

The first step in making scanned documents accessible is to perform Optical Character Recognition, or OCR, on the scanned page image. OCR converts images of alphanumeric characters into actual text that can be searched, read by assistive technology, exported to other formats, or copied and pasted into other applications. Acrobat X has an OCR text recognition feature that lets you apply OCR to scanned pages.

Converting a scanned document to text with the OCR feature in Acrobat X Pro also makes the document searchable. Until OCR is applied, a scanned page (such as one scanned to email on a copier) is only an image and cannot be searched for specific words.
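
As a rough illustration of the difference between an image-only scan and a PDF that contains text, the sketch below checks whether a file's raw bytes mention any font resources. This is only a heuristic and is not how Acrobat decides anything: the function name and the sample byte strings are hypothetical, and a PDF that stores its objects in compressed streams can hide its font entries, so a "True" result means "probably needs OCR, open it and check" rather than a definite answer.

```python
def looks_image_only(pdf_bytes: bytes) -> bool:
    """Heuristic (an assumption, not an Adobe method): a PDF whose raw bytes
    never mention a /Font resource probably has no text layer yet."""
    return b"/Font" not in pdf_bytes


# Hypothetical, heavily simplified fragments standing in for real files.
scan_like = b"%PDF-1.4\n1 0 obj\n<< /Type /XObject /Subtype /Image >>\nendobj\n"
text_like = b"%PDF-1.4\n1 0 obj\n<< /Type /Font /Subtype /Type1 >>\nendobj\n"

print(looks_image_only(scan_like))
print(looks_image_only(text_like))

# For a real file, read its bytes first (the path here is a placeholder):
# from pathlib import Path
# looks_image_only(Path("scan.pdf").read_bytes())
```

A file flagged this way would still need to go through the OCR step in Acrobat X Pro described here; the check only helps sort a folder of scans into "likely needs OCR" and "likely already has text".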

The resulting PDF files contain computer-generated text, which is necessary for making the file's information accessible via screen readers and other assistive technologies. You may need to further process the files at this point by using the accessibility authoring tools in Acrobat to add structure (tags), alternate text for graphics that appear in the file, and accessible form fields if applicable. You may also need to adjust the reading and tab order for interactive PDF file components.

Adobe Reader X has a Save as Text feature (Acrobat can also export to other formats, such as Microsoft Word, HTML, RTF, and XML). Adobe Reader X, however, does not include the accessibility authoring tools found in Acrobat X. For example, it cannot perform optical character recognition (OCR), nor can it add missing alternate text descriptions to graphics in PDF files. All staff who need to convert a scan to text will therefore need Acrobat X Pro.

The Acrobat X tools menu is shown at left. The Recognize Text tool, the text touch-up tool (under Content), and the accessibility tools can all be used to create accessible and searchable documents from a scan. Recognize Text is the OCR tool and the first step in converting the scan.
