IBM's ImagePlus Intelligent Forms Processing Solution

IBM Intelligent Forms Processing
Services Offering - Forms Processing
Overview
The Forms Processing (FP) application incorporates
state-of-the-art Character Recognition technology and
is the cornerstone of the Intelligent Forms Processing
(IFP) Services Offering.
The FP application provides the following functions:
 Image Quality Control
 Document Quality Control
 Form Recognition
 Form Dropout
 Optical Character Recognition (OCR)
 Intelligent Character Recognition (ICR)
 Context Checking
 Data Validation
 Statistics Collection
 Remittance Processing.
The FP application is the equivalent of several data
entry operators working without a break, all day and
night, for as long as there is work to do. The FP
application runs continuously and unattended on a
Windows NT or Windows 2000 platform. When the
processing capacity of one FP platform is reached,
you can add another FP workstation to increase the
volume of forms that can be processed. There is
virtually no limit to the number of FP processors you
can add to an IFP system. Once the FP application is started,
it begins to process scanned images (TIFF or
MODCA standard format images) of documents,
automatically recognizing the type and version of the
form (as defined by the IFP Forms Training Utility, FTU). FP then
recognizes the hand-printed or machine-printed characters (OCR/ICR) in the defined fields on
the form and applies data validation rules.
Scanners are improving, but they still suffer from
occasional calibration and feeder problems,
especially when scanning documents of mixed sizes
and colors. To help detect and resolve these
problems, FP deskews, registers, and cleans up each
image, and performs quality control checks to ensure
image integrity. If an image is skewed or shifted, or if
it is too light or dark, the image may be unreadable or
missing data, so the document is sent to an exception
process. FP also verifies that the document is
complete and cohesive, meaning that all required
pages and no extra pages are present.
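The too-light/too-dark check and the completeness check described above can be pictured with a short Python sketch. It is an illustration only: the function names, the grayscale representation (rows of 0-255 values), and every threshold are assumptions made for this example and are not taken from FP.

```python
def ink_ratio(pixels, ink_threshold=128):
    """Fraction of pixels darker than ink_threshold (0 = black, 255 = white)."""
    values = [v for row in pixels for v in row]
    if not values:
        return 0.0
    return sum(1 for v in values if v < ink_threshold) / len(values)


def quality_check(pixels, min_ink=0.005, max_ink=0.5):
    """Classify an image as 'too light', 'too dark', or 'ok'.

    min_ink and max_ink are hypothetical limits chosen for illustration.
    """
    ratio = ink_ratio(pixels)
    if ratio < min_ink:
        return "too light"  # almost no ink: data is probably missing
    if ratio > max_ink:
        return "too dark"   # mostly ink: probably unreadable
    return "ok"


def document_is_complete(page_ids, required_pages):
    """True when all required pages, and no extra pages, are present."""
    return sorted(page_ids) == sorted(required_pages)
```

In this sketch, a document whose image is not "ok", or whose page set is incomplete, would be the one routed to the exception process.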
Image has been deskewed
Powerful Form Recognition
Successful recognition of each image is a
prerequisite for high OCR/ICR accuracy. FP uses
multiple techniques to identify each image as a form
or attachment, and can even distinguish between
many similar versions of the same form, often better
than a human can. FP can also detect when a
different form type is found in a batch of similar forms,
a common document preparation mistake.
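Detecting a stray form type in a batch of similar forms can be sketched as a majority comparison over the recognized form IDs. This is a minimal Python illustration, not a description of how FP does it; the function name and the "ORDER" form ID are hypothetical.

```python
from collections import Counter


def find_misfiled(form_ids):
    """Return the positions of images whose form ID differs from the batch majority."""
    if not form_ids:
        return []
    majority, _ = Counter(form_ids).most_common(1)[0]
    return [i for i, form_id in enumerate(form_ids) if form_id != majority]


# Hypothetical batch: the third image was recognized as a different form type.
batch = ["INVOICE", "INVOICE", "ORDER", "INVOICE"]
misfiled = find_misfiled(batch)
```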
FORMID: INVOICE Version 1B
Image has been recognized as an invoice
Original image from scanner
Image and Document Quality Control
Form Dropout
A unique image processing option in FP is form
dropout, an IBM-patented process that removes the
preprinted information from the image, leaving only
the variable printed information, before OCR/ICR is
performed. Form dropout removes lines, captions,
and other static text from the image and significantly
improves OCR/ICR accuracy. It also reduces the file
size of the image, so the image requires much less
storage space and less bandwidth to transmit across
a network.
Form dropout automatically repairs gaps where
characters cross lines and detects whiteout areas on
the form (such as labels and alterations). This
technology removes the need to print forms in
dropout-colored ink.
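The basic idea of removing preprinted content can be illustrated with a naive template subtraction in Python. This is not IBM's patented process, whose details are not given here. The sketch assumes two already registered binary images of equal size (1 = ink, 0 = background), and the function name is hypothetical.

```python
def drop_out(filled, template):
    """Keep only ink that appears on the filled form but not on the blank template.

    Both arguments are lists of rows of 0/1 values with identical dimensions.
    """
    if len(filled) != len(template) or any(
        len(f_row) != len(t_row) for f_row, t_row in zip(filled, template)
    ):
        raise ValueError("images must have the same dimensions")
    return [
        [1 if f and not t else 0 for f, t in zip(f_row, t_row)]
        for f_row, t_row in zip(filled, template)
    ]


# Row 0 is a preprinted line; the single ink pixel in row 1 is variable data.
filled = [[1, 1, 1], [0, 1, 0]]
template = [[1, 1, 1], [0, 0, 0]]
variable_only = drop_out(filled, template)
```

A plain subtraction like this also erases any character stroke that overlaps a preprinted line. That is the gap the text says form dropout repairs, and the repair is not attempted in this sketch.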
TOTAL SALES: 678959.02
Data Validation
After OCR/ICR and context checking are done, FP
validates the captured data against predefined edit
rules, user-provided dictionaries, tables, and business
rules (all defined in FTU). For example, an edit rule
can check the sum of prior fields to make sure the
“TOTAL SALES” field (shown above) matches the
sum. With a dictionary lookup, FP can verify that an
account number is valid by comparing it to a
predefined list of valid account numbers.
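The two examples above, a sum check and a dictionary lookup, can be sketched in Python as follows. In IFP such rules are defined in FTU; the function names, the line-item values, and the account numbers below are hypothetical and serve only to illustrate the idea.

```python
from decimal import Decimal


def total_matches(line_items, total):
    """Edit rule: the total field must equal the sum of the prior fields."""
    return sum((Decimal(item) for item in line_items), Decimal("0")) == Decimal(total)


def account_is_valid(account, valid_accounts):
    """Dictionary lookup: the captured account must appear in the predefined list."""
    return account in valid_accounts


# Hypothetical captured values; only the total comes from the example figure.
sum_ok = total_matches(["678000.00", "959.02"], "678959.02")
account_ok = account_is_valid("A-1001", {"A-1001", "A-1002"})
```

Decimal is used instead of floating point so that money amounts compare exactly.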
TOTAL SALES: 678959.02
Image after form dropout
Optical/Intelligent Character Recognition
The accuracy of the data recognized by FP is critical.
FP allows one or more OCR/ICR engines to be
used to recognize characters on forms, and supports
voting among multiple engines. Different engines are
better at capturing different types of characters (e.g.
hand-print, machine-print, numbers, OCR-A).
IBM provides two engines with FP, both developed by
IBM Research. FP also supports the use of other
leading commercial engines. Noise filtering, image
enhancement, and other functions provided by
different engines can improve OCR/ICR results. FP
can also use specialized character classifiers to
recognize foreign-language characters.
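Voting among engines can be pictured as a character-by-character majority vote over each engine's reading of a field. The voting scheme FP actually uses is not described here, so this Python sketch is an assumption; it also assumes that all readings have the same length.

```python
from collections import Counter


def vote(readings):
    """Majority vote per character position.

    Returns the voted text and the positions where no character won a
    strict majority, which a later step could flag for review.
    """
    if not readings:
        return "", []
    if len({len(r) for r in readings}) != 1:
        raise ValueError("this sketch requires readings of equal length")
    voted, uncertain = [], []
    for position, chars in enumerate(zip(*readings)):
        char, count = Counter(chars).most_common(1)[0]
        voted.append(char)
        if count <= len(readings) / 2:
            uncertain.append(position)
    return "".join(voted), uncertain


# Three hypothetical engine readings of the same field; one engine misreads a digit.
text, uncertain = vote(["678959.02", "678959.02", "678950.02"])
```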
Statistics Collection
FP maintains a processing record for each form so that an
IFP administrator can monitor FP’s performance.
This record includes how long each form takes to
process, any interruptions or errors that occurred
while processing a specific form, and a
start/finish statement for each stage (e.g. form
recognition, OCR/ICR, context checking) in the
processing of each form.
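A per-form record of this kind could look like the following Python sketch. The class, its fields, and the injectable clock are assumptions made for illustration; the actual FP record layout is not described in this document.

```python
import time
from dataclasses import dataclass, field


@dataclass
class FormRecord:
    """Processing record for one form: stage timings plus any errors."""
    form_id: str
    stages: list = field(default_factory=list)  # (stage name, start, finish)
    errors: list = field(default_factory=list)  # (stage name, error message)

    def run_stage(self, name, func, clock=time.monotonic):
        """Run one stage, recording its start/finish times and any error."""
        start = clock()
        try:
            return func()
        except Exception as exc:
            self.errors.append((name, str(exc)))
            return None
        finally:
            self.stages.append((name, start, clock()))

    def total_seconds(self):
        """Total time spent across all recorded stages."""
        return sum(finish - start for _, start, finish in self.stages)
```

Passing the clock as a parameter keeps the timings testable; in normal use the default monotonic clock applies.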
AMOUNT PAID:
$11.00
Remittance Processing
FP can also read both MICR and CAR/LAR amounts
(example shown above) on checks using
industry-leading OCR/ICR engines and technology.
For additional information contact your IBM
marketing representative or visit the IFP web site at
www.clearlake.ibm.com/gov/ifp
TOTAL SALES: .678959,02
Context Checking
After the OCR/ICR process, FP can apply numerous
context checks (defined in FTU) to improve and
reformat results. For example, many OCR/ICR
engines know that money-amount fields may contain
a decimal point and/or comma, but do not enforce that
syntax. As in the example above, a large decimal point or
comma might be read as the number “1”. The FP
“amount field” context check, expecting such syntax,
knows how to interpret the character.
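One way to picture an amount-field context check is a small normalizer that enforces the expected syntax on a raw reading. The Python sketch below assumes a field that always carries two decimal places and handles only stray or misplaced separators. It does not attempt the harder case of a separator misread as "1", and it is not the actual FP rule.

```python
import re


def normalize_amount(raw, decimals=2):
    """Reformat a raw OCR reading of a money field as digits with one decimal point.

    Returns None when the reading has too few digits to be an amount.
    """
    digits = re.sub(r"\D", "", raw)  # drop everything except digits
    if len(digits) <= decimals:
        return None
    whole, fraction = digits[:-decimals], digits[-decimals:]
    return f"{int(whole)}.{fraction}"


# Raw reading taken from the example figure for this section.
cleaned = normalize_amount(".678959,02")
```

Under the two-decimal assumption, the raw reading from the example is reformatted to 678959.02.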
IBM Corporation, 800 North Frederick Avenue
Gaithersburg, MD 20879
IBM is a registered trademark of International Business
Machines Corporation. Windows NT and 2000 are
registered trademarks of Microsoft Corporation.
IBM product and service names are trademarks or
registered trademarks of IBM. Other company product or
service names may be trademarks or service marks of
other companies.