IBM Intelligent Forms Processing Services Offering - Forms Processing Overview

The Forms Processing (FP) application incorporates state-of-the-art character recognition technology and is the cornerstone of the Intelligent Forms Processing (IFP) Services Offering. The FP application provides the following functions:

- Image Quality Control
- Document Quality Control
- Form Recognition
- Form Dropout
- Optical Character Recognition (OCR)
- Intelligent Character Recognition (ICR)
- Context Checking
- Data Validation
- Statistics Collection
- Remittance Processing

The FP application is the equivalent of several data entry operators working without a break, all day and night, for as long as there is work to do. The FP application runs continuously and unattended on a Windows NT or Windows 2000 platform. Once the processing capacity of one FP platform is reached, you can add another FP workstation to increase the volume of forms that can be processed. There is virtually no limit to the number of FP processors you can add to an IFP system.

Once the FP application is started, it begins to process scanned images of documents (standard TIFF or MODCA format), automatically recognizing the type and version of each form (as defined by the IFP Forms Training Utility, FTU). FP then recognizes the hand-printed or machine-printed characters (OCR/ICR) in the defined fields on the form and applies data validation rules.

Image and Document Quality Control

Scanners are improving, but they still suffer from occasional calibration and feeder problems, especially when scanning documents of mixed sizes and colors. To help detect and resolve these problems, FP deskews, registers, and cleans up each image, and performs quality control checks to ensure image integrity. If an image is skewed or shifted, or if it is too light or too dark, the image may be unreadable or missing data, so the document is sent to an exception process.
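The image quality checks described above can be pictured with a short sketch. This is not FP's implementation: the `ImageStats` fields, the threshold values, and the queue names are all assumptions chosen for illustration, and the skew angle and dark-pixel fraction are assumed to have been measured by an earlier step.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ImageStats:
    """Hypothetical measurements taken from one scanned page."""
    skew_degrees: float   # residual rotation of the page
    dark_fraction: float  # share of black pixels in a bilevel image, 0.0 to 1.0


def quality_problems(stats: ImageStats,
                     max_skew: float = 3.0,
                     min_dark: float = 0.01,
                     max_dark: float = 0.40) -> List[str]:
    """Return the list of quality problems found (empty if the image passes).

    All thresholds are illustrative assumptions, not FP defaults.
    """
    problems = []
    if abs(stats.skew_degrees) > max_skew:
        problems.append("skewed")
    if stats.dark_fraction < min_dark:
        problems.append("too light")
    if stats.dark_fraction > max_dark:
        problems.append("too dark")
    return problems


def route(stats: ImageStats) -> str:
    """Send a failing image to the exception queue, otherwise on to recognition."""
    return "exception" if quality_problems(stats) else "recognition"
```

For example, `route(ImageStats(skew_degrees=0.5, dark_fraction=0.08))` keeps the page in the normal flow, while a nearly blank page with a dark fraction of 0.001 would be routed to the exception process.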
FP also verifies that the document is complete and cohesive, meaning that all required pages are present and there are no extra pages.

[Figures, Image and Document Quality Control: original image from scanner; image has been deskewed]

Powerful Form Recognition

Successful recognition of each image is a prerequisite for high OCR/ICR accuracy. FP uses multiple techniques to identify each image as a form or attachment, and can even distinguish between many similar versions of the same form, often better than a human can. FP can also detect when a different form type is found in a batch of similar forms, a common document preparation mistake.

[Figure: image has been recognized as an invoice (FORMID: INVOICE, Version 1B)]

Form Dropout

A unique image processing option of FP is form dropout, an IBM-patented process that removes the preprinted information from the image before OCR/ICR is performed, leaving only the variable printed information. Form dropout removes lines, captions, and other static text from the image and significantly improves OCR/ICR accuracy. It also reduces the file size of the image, so the image requires much less storage space and less bandwidth to transmit across a network. Form dropout automatically repairs gaps where characters cross lines and detects whiteout areas on the form (such as labels and alterations). This technology eliminates the need to print forms in dropout-colored ink.

Data Validation

[Figure: TOTAL SALES: 678959.02]

After OCR/ICR and context checking are done, FP validates the captured data against predefined edit rules, user-provided dictionaries, tables, and business rules (all defined in FTU). For example, an edit rule can add up the prior fields and check that the "TOTAL SALES" field (shown above) matches the sum. With a dictionary lookup, FP can verify that an account number is valid by comparing it to a predefined list of valid account numbers.
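The two validation examples above, a sum check and a dictionary lookup, can be sketched as follows. The record layout, field names, and error messages are hypothetical; in FP these rules are defined in FTU, whose rule format is not described here.

```python
from decimal import Decimal
from typing import Dict, List, Set


def validate(record: Dict, valid_accounts: Set[str]) -> List[str]:
    """Apply two illustrative rules and return the list of failures.

    `record` is a hypothetical dict of captured fields with string values:
    'line_items' (list of amounts), 'total_sales', and 'account_number'.
    """
    errors = []

    # Edit rule: the total must equal the sum of the prior amount fields.
    # Decimal avoids binary floating-point rounding on money amounts.
    items = [Decimal(value) for value in record["line_items"]]
    if sum(items, Decimal("0")) != Decimal(record["total_sales"]):
        errors.append("TOTAL SALES does not match the sum of prior fields")

    # Dictionary lookup: the account number must be in the predefined list.
    if record["account_number"] not in valid_accounts:
        errors.append("account number not found in dictionary")

    return errors
```

A record with line items `["600000.00", "78959.02"]` and a total of `"678959.02"` passes the sum check; changing any one amount would produce an error entry that a downstream step could use to flag the form for review.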
[Figure: image after form dropout (TOTAL SALES: 678959.02)]

Optical/Intelligent Character Recognition

The accuracy of the data recognized by FP is critical. FP allows one or more OCR/ICR engines to be used to recognize characters on forms, and supports voting among multiple engines. Different engines are better at capturing different types of characters (e.g., hand print, machine print, numbers, OCR-A). IBM provides two engines with FP, both developed by IBM Research. FP also supports the use of other leading commercial engines. Noise filtering, image enhancement, and other functions provided by different engines can improve OCR/ICR results. FP can also use specialized character classifiers to recognize foreign-language characters.

Statistics Collection

FP maintains a record of form processing so that an IFP administrator can monitor FP's performance. This record includes how long each form takes to process, any interruptions or errors that occurred while processing a specific form, and a start/finish entry for each stage (e.g., form recognition, OCR/ICR, context checking) in the processing of each form.

Remittance Processing

[Figure: AMOUNT PAID: $11.00]

FP can also read both MICR and CAR/LAR amounts (example shown above) on checks using industry-leading OCR/ICR engines and technology.

For additional information, contact your IBM marketing representative or visit the IFP web site at www.clearlake.ibm.com/gov/ifp

Context Checking

[Figure: TOTAL SALES: .678959,02]

After the OCR/ICR process, FP can apply numerous context checks (defined in FTU) to improve and reformat results. For example, many OCR/ICR engines know that money-amount fields may contain a decimal point and/or comma, but do not enforce that syntax. As in the example above, a large decimal point or comma might be read as the number "1". The FP "amount field" context check, expecting such syntax, knows how to interpret the character.
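To make the amount-field idea concrete, here is a minimal sketch of a context check that enforces money-amount syntax and repairs a decimal point misread as "1". The function name, the accepted syntax, and the repair rule are assumptions for illustration only; the source does not describe how FP's own check decides.

```python
import re
from typing import Optional

# Assumed amount syntax: digits with optional comma grouping,
# then a decimal point and exactly two digits.
AMOUNT_RE = re.compile(r"^\d{1,3}(,\d{3})*\.\d{2}$|^\d+\.\d{2}$")


def check_amount_field(raw: str) -> Optional[str]:
    """Return a normalized amount such as '678959.02', or None if unrepairable."""
    text = raw.replace(" ", "")

    # Already valid: just drop the grouping commas.
    if AMOUNT_RE.match(text):
        return text.replace(",", "")

    # Assumed repair rule: an all-digit result whose third-from-last
    # character is '1' likely contains a decimal point misread as '1'.
    if text.isdigit() and len(text) >= 4 and text[-3] == "1":
        return text[:-3] + "." + text[-2:]

    # Anything else is left for a later validation or exception step.
    return None
```

Under these assumptions, an engine result of `"678959102"` would be reinterpreted as `"678959.02"`. A real check would probably also weigh the engine's per-character confidence before overriding a digit, since a genuine "1" can appear in that position.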
IBM Corporation, 800 North Frederick Avenue, Gaithersburg, MD 20879

IBM is the registered trademark of International Business Machines Corporation. Windows NT and 2000 are registered trademarks of Microsoft Corporation. IBM product and service names are trademarks or registered trademarks of IBM. Other company product or service names may be trademarks or service marks of other companies.