ABBYY Headquarters
P.O. Box 54, Moscow
129301, Russia
Tel.: +7 (495) 783 3700
Fax: +7 (495) 783 2663
office@abbyy.com
www.ABBYY.com
Recognition Server 4.0
Release Notes
© ABBYY. All rights reserved.
Table of Contents
RECOGNITION SERVER 4.0 RELEASE NOTES
TABLE OF CONTENTS
INTRODUCTION
  About This Document
  About the Product
  Key Enhancements
    Release 2
    Release 1 Multilingual
    Release 1
    Release 1 (specially for 3A)
    Arabic Edition
  Installing the New Version
  License Usage
NEW FEATURES AND IMPROVEMENTS
  1. Server Features
    1.1. Separate workflow queues
    1.2. Easy recovery after failure without data loss
    1.3. Support working on Failover cluster
    1.4. Internal database
    1.5. Server exceptions folder
  2. Administration Console
    2.1. User rights management
      2.1.1. Usage of Active Directory groups
    2.2. Logs and reports
      2.2.1. Improved logging
      2.2.2. Saving information about the operator who edited or rejected the document
      2.2.3. Correspondence between input and output files
    2.3. Notifications
      2.3.1. Including server and workflow names into the text of notification messages
      2.3.2. Notification about near license expiry
    2.4. Job rejection without loss of files
    2.5. Interface improvements
      2.5.1. Main window of Administration Console
      2.5.2. Workflow status pane
    2.6. Soft stop of the workflow processing
  3. Workflow settings
    3.1. Document Library workflow type
      3.1.1. Periodical crawling of document libraries
    3.2. Input settings
      3.2.1. Processing SharePoint libraries
      3.2.2. Using IFilter for processing PDF files in MS SharePoint
      3.2.3. Filtering files for processing and settings for unprocessed files
      3.2.4. Using the SSL protocol for data protection
    3.3. Processing settings
      3.3.1. Special mode for processing technical drawings
      3.3.2. Despeckle images option
      3.3.3. Additional fonts
      3.3.4. To speed up processing, text in pictures is not recognized by default
      3.3.5. Blank page detection settings
    3.4. PDF processing options
      3.4.1. Improved MRC compression method of output PDF files
      3.4.2. Version, format, and other parameters of an output PDF file
      3.4.3. Export to PDF/A-3 format
      3.4.4. Tagged PDF enabled by default
      3.4.5. Possibility to skip processing PDFs with a text layer
      3.4.6. Ability to embed a text layer and keep the image and all PDF file properties
      3.4.7. Enabling and disabling Fast Web View for PDF files
      3.4.8. Using PDF text layer for recognition results improvement
      3.4.9. Using PDF text layer for generating quality output files of different formats
    3.5. Output settings
      3.5.1. Overwriting files in an output folder
      3.5.2. Export format compatible with FineReader Engine 11
      3.5.3. KeepPages parameter
      3.5.4. Export to specific column types in SharePoint
      3.5.5. Export to ePub3 format
      3.5.6. Settings of units measurement for export to ALTO XML
  4. Document processing
    4.1. Improved recognition of Arabic texts
    4.2. Ability to limit the number of processed pages in input files
    4.3. Support of new barcode type - USPS-4CB (Intelligent Mail Barcode)
    4.4. Disabled image compression of lossy JBIG2 type
  5. Scanning Station
    5.1. Sending registration parameters values to index fields
  6. Verification and Indexing Stations
    6.1. Manual selection of documents for verification and indexing
    6.2. Saving documents
    6.3. Timeout of inactivity
    6.4. Improved work with document types and index fields on Indexing Stations
      6.4.1. Import of index fields from files
      6.4.2. Quick input of index fields
      6.4.3. Possibility to combine values from several regions into a one index field
    6.5. User interface changes
      6.5.1. Verification Station
      6.5.2. Indexing Station
  7. Operating systems
    7.1. Support for Windows Server 2012 Release 2
    7.2. Discontinued support for Windows XP and Windows Server 2003
  8. Scripting
    8.1. Access to subsequent pages from the document assembly script
    8.2. Detecting the workflow name by script
  9. Changes in the COM-based API and Web API
    9.1. Namespace changes
    9.2. Compatible API
    9.3. Automatic API deployment on 64x operating systems
    9.4. Added objects
      9.4.1. Correspondence between input and output files
      9.4.2. Support of the recognition service scenario (for NLC)
      9.4.3. Deleting of jobs
  10. UI and Documentation localization
CORRECTED ISSUES
KNOWN ISSUES
Introduction
About This Document
This document describes the features that have been implemented in the ABBYY Recognition Server 4.
About the Product
ABBYY Recognition Server 4 provides new technology, including significantly improved recognition of texts in
Arabic, new export settings, and other technology improvements. The main server features, such as stability,
performance, and auto-recovery have been revised and improved. The new version can also process document
libraries stored in read-only folders. Other improvements include advanced logging, GUI changes, and bug fixes.
The main changes introduced in the new version are described below.
Key Enhancements
Release 2
Part #: 1135/6, build # 4.0.3.1167, OCR Technologies build # 13.0.15.131, release date: 14/11/2014
New features and changes in Release 2 are marked with the blue color here and in the document below.
The major features:
• Improved MRC compression method (producing compressed PDF files of the minimum size)
• Using IFilter for processing PDF files in MS SharePoint
• Processing SharePoint document libraries:
o Crawling of the whole site (including multiple libraries and folders)
o Periodical re-crawling settings
• Export to specific column types in SharePoint (support of Date, Number, and some other formats)
• Export to PDF/A-3
Other improvements:
• Improved email notifications:
o Notifications on near license expiry
o Information on server name in the message text
• Sending registration parameter values from Scanning Station to index fields
• Soft stop of the workflow processing
• Support for working on a failover cluster
• Using the PDF text layer for generating quality output files
• Blank page detection parameters
• New barcode type - USPS-4CB (Intelligent Mail Barcode)
• New export format: ePub3
• Unit of measurement settings for export to ALTO XML
• Disabled image compression of lossy JBIG2 type
• Tagged PDF enabled by default
• Possibility to combine values from several regions into one index field
• Access to subsequent pages from the document assembly script
• Detecting the workflow name by script
Release 1 Multilingual
Part #: 1135/5, build # 4.0.2.952, OCR Technologies build number 13.0.13.21, release date: 14/08/2014
• Translation of the UI and help into the following languages:
o French
o German
o Italian
o Spanish
o Chinese
o Portuguese (Brazil)
o Czech
o Hungarian
o Polish
• Bug fix for ABBYY USA
Release 1
Part #: 1135/4, build # 4.0.2.943, OCR Technologies build number 13.0.13.15, release date: 19/05/2014
• Improved failure recovery
• Ability to limit the number of processed pages
• Verification and Indexing Station improvements
o Selecting documents from a queue
o Timeout settings
o Saving changes on the stations
• Indexing Station improvements
o Importing document types from an external source
Release 1 (specially for 3A)
Part #: 1135/3, build # 4.0.1.795, OCR Technologies build number 13.0.8.108, release date: 29/01/2014
• Improved server operation
o Redundancy
o Reports and statistics
• Enhanced work with PDF files
• Ability to process documents in read-only folders
• Processing documents placed in SharePoint libraries
• Latest technology version
Arabic Edition
Part #: 1135/2, build # 4.0.0.461, OCR Technologies build number 13.0.0.58, release date: 06/05/2013
• Improved recognition of Arabic texts
• Ability to process documents in read-only folders
• Improved logging
• Bug fixes
A detailed description of the changes can be found in Release Notes for Recognition Server Arabic Edition.
Installing the New Version
Recognition Server 4 can be installed on the same computer where Recognition Server 3.5 or earlier is installed.
The configuration settings used in a previous version of ABBYY Recognition Server can be imported into ABBYY
Recognition Server 4. For further information, please see the section “Upgrade from the previous versions of
ABBYY Recognition Server” in the System Administrator’s Guide.
Note: Please be aware that some changes have been made to the XML result file, which describes the results of
task processing. This may require changes in the software used for integrating ABBYY Recognition Server with data
storage systems. For details, see the description of changes to the XML result file below or refer to the “XML Result”
section in the Help file.
License Usage
Recognition Server 4 cannot work with licenses generated for previous versions of Recognition Server.
Recognition Server 4 can work with a license generated for Recognition Server Arabic Edition, but since some of
the license parameters have been changed (ISIS option has been added), we recommend generating new licenses
for Recognition Server 4 Release 1 (for 3A), Recognition Server Release 1 and other maintenance releases.
New Features and Improvements
1. Server Features
1.1. Separate workflow queues
Each workflow now has a separate queue, so an overloaded queue in one workflow does not stop the other workflows.
The default maximum number of jobs in the queue of each workflow is 50. This number can be changed in the
Configuration.xml file: MaxJobsCount="50". Prior to this change, this key set the total number of jobs in the server
queue. Now it sets the number of jobs in the queue of each workflow.
Implemented in: release 1 for 3A
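The per-workflow queue limit described above could be adjusted with a short script along the following lines. This is only a sketch: the release notes name the MaxJobsCount attribute but do not show which element of Configuration.xml carries it, so the code updates the attribute wherever it occurs, and the element names in the sample fragment are placeholders, not the real file layout.

```python
import xml.etree.ElementTree as ET


def set_max_jobs_count(xml_text: str, new_limit: int) -> str:
    """Return xml_text with every MaxJobsCount attribute set to new_limit."""
    if new_limit < 1:
        raise ValueError("queue limit must be a positive integer")
    root = ET.fromstring(xml_text)
    # iter() visits the root and all of its descendants, so no assumption
    # is made about which element carries the attribute.
    for element in root.iter():
        if "MaxJobsCount" in element.attrib:
            element.set("MaxJobsCount", str(new_limit))
    return ET.tostring(root, encoding="unicode")


# Hypothetical fragment: only the MaxJobsCount attribute comes from the
# release notes; the element names are placeholders.
sample = '<Configuration><Server MaxJobsCount="50" /></Configuration>'
updated = set_max_jobs_count(sample, 200)
```

Before editing a real Configuration.xml it would be prudent to back the file up, since a corrupted configuration file is listed among the causes of failed jobs.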
1.2. Easy recovery after failure without data loss
Recovery after failure is now smoother and does not require manual copying of files. GUIDs are not used in file
names anymore, so it is always possible to find a file by its name.
When Recognition Server processes jobs, files are stored in the folder %programdata%\ABBYY Recognition Server
4.0\RS4WF\Images\<Workflow name>. File names are the same as the names of source files with the only
difference: job ID is added at the beginning of the name.
Implemented in: release 1 for 3A
1.3. Support working on Failover cluster
Recognition Server can now run on a failover cluster. Recognition Server instances can be installed on separate
nodes of the failover cluster. All Recognition Server settings can be stored in a shared folder available to the cluster.
Please note: this feature has not been tested. Testing can be done upon request.
(Detailed instructions for installing Recognition Server on a failover cluster will be provided later.)
Implemented in: Release 2.
1.4. Internal database
The current system state is now stored in the internal SQLite database. This database is installed together with
Recognition Server and is invisible to users.
Implemented in: Arabic Edition
1.5. Server exceptions folder
A new folder with server exceptions is now created in C:\ProgramData\ABBYY Recognition Server
4.0\RS4WF\Exceptions. This folder contains jobs which failed due to the faulty operation of the server or server
flows. Jobs may fail if, for example, the database or the configuration file becomes corrupted.
Implemented in: release 1 for 3A
2. Administration Console
2.1. User rights management
2.1.1. Usage of Active Directory groups
In the Users node of the Administration Console it is now possible to add groups of users from Active Directory.
The full name of a group should be specified, including the domain name. A role (e.g. Verifier or Indexer) can be
assigned to a group, and all members of the group will have the rights corresponding to the assigned role. Any
users added to the group will automatically receive the rights required to work with Recognition Server.
Implemented in: release 1 for 3A, release 1
When a user adds a new group, the application checks if this group exists in Active Directory and displays a warning
if the group cannot be found. The user can still add a group with this name.
Implemented in: release 1
2.2. Logs and reports
2.2.1. Improved logging
The job log contains records about every finished job in Recognition Server.
The details pane has two tabs: the Files tab shows the input and output files of the job and the paths to these files,
and the Details tab shows detailed information about the job, including Processing notes.
The job log may now contain more than 500 records. The number of records is limited only by the size of the
log or by the maximum number of days for which data is logged. These values can be changed in the Job Log
Properties dialog box, which can be opened by clicking the Options button.
The Find button allows users to search for records by input and output file name or by error text. Wildcard
searches are supported.
The job log can be saved to a *.csv file by clicking Export to CSV File on the shortcut menu.
By default, the Job Log provides two views: all jobs without filtering and failed jobs. It is also possible to
create custom views by applying a custom filter to the log. To create a custom view, select the corresponding
item from the shortcut menu or click the Create Custom View button and specify a view name and
filtering settings. The custom log will appear in the tree below the Job Log node.
Implemented in: Arabic edition
2.2.2. Saving information about the operator who edited or rejected the document
The XML result file now contains information about the operator who verified or indexed the document. This
information is available in the verificationUserName and indexingUserName fields inside the <XmlResult> and
<JobDocument> tags. If indexing and verification are switched off, these fields will remain empty.
The XML result file now also contains information about the time of document indexing and verification.
The job log contains information about the rejected jobs in Processing notes (who rejected a job and on which
station).
Implemented in: release 1 for 3A
2.2.3. Correspondence between input and output files
The XML result file allows you to establish a correspondence between the original and the resulting files: in the
log, you can see the input and output files for each job.
Changes in the XML result file:
• The attribute “Id” has been added to the <InputFile> tag. This is the identifier of the input file.
• An embedded <Page> tag has been added to the <InputFile> tag. It has the following parameters: Id –
the page identifier of the input document; PageNumber – the number of the page in the input file.
• A <Pages> tag with embedded <FileId> and <PageId> tags has been added to the <JobDocument> tag.
<FileId> is the input file identifier and <PageId> is the page identifier which indicates the page of the
input file which is the origin of the current processed page.
Changes in the log:
The log now has a Files tab which shows input and output files for each job.
Implemented in: Arabic edition
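Integration code could use the identifiers described above to trace each output page back to its source page. The following sketch shows one way to do so. The sample XML is hypothetical and built only from the tag names mentioned in this section: it assumes that Id and PageNumber are attributes, that <FileId> and <PageId> hold their values as element text, and that the file has no XML namespace. The real XML result file may differ, so check the “XML Result” section of the Help file before relying on this layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the nesting and attribute placement are assumptions.
SAMPLE = """
<XmlResult>
  <InputFile Id="1">
    <Page Id="10" PageNumber="1" />
    <Page Id="11" PageNumber="2" />
  </InputFile>
  <JobDocument>
    <Pages>
      <FileId>1</FileId><PageId>11</PageId>
      <FileId>1</FileId><PageId>10</PageId>
    </Pages>
  </JobDocument>
</XmlResult>
"""


def page_origins(xml_text: str) -> list:
    """Return (input file id, input page number) for each output page."""
    root = ET.fromstring(xml_text)
    # Index input pages by (file id, page id) -> page number.
    page_numbers = {}
    for input_file in root.iter("InputFile"):
        for page in input_file.iter("Page"):
            key = (input_file.get("Id"), page.get("Id"))
            page_numbers[key] = int(page.get("PageNumber"))
    origins = []
    for document in root.iter("JobDocument"):
        for pages in document.iter("Pages"):
            file_ids = [e.text for e in pages.findall("FileId")]
            page_ids = [e.text for e in pages.findall("PageId")]
            # Pair identifiers in document order; pages that cannot be
            # resolved get None as their page number.
            for file_id, page_id in zip(file_ids, page_ids):
                origins.append((file_id, page_numbers.get((file_id, page_id))))
    return origins


origins = page_origins(SAMPLE)
```

Pairing by position keeps the sketch working whether a <Pages> element holds one identifier pair or several.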
2.3. Notifications
2.3.1. Including server and workflow names into the text of notification messages
The server name and the workflow names are now included in the text of the notification messages sent by email
to the administrator. This makes it easier to manage servers and workflows and to resolve possible problems.
The subject of the email message has the following structure (to be used for filtering the emails):
ABBYY Recognition Server (<Server Name>): <Reason of notification>
Implemented in: Release 2.
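Because the subject line follows a fixed structure, a mail filter or monitoring script can extract the server name and the reason from it. The sketch below assumes that the subject matches the structure quoted above exactly. The server name and reason text in the usage line are made-up examples.

```python
import re
from typing import Optional, Tuple

# Mirrors "ABBYY Recognition Server (<Server Name>): <Reason of notification>".
SUBJECT_PATTERN = re.compile(
    r"^ABBYY Recognition Server \((?P<server>.+?)\): (?P<reason>.+)$"
)


def parse_subject(subject: str) -> Optional[Tuple[str, str]]:
    """Return (server name, reason), or None for unrelated messages."""
    match = SUBJECT_PATTERN.match(subject.strip())
    if match is None:
        return None
    return match.group("server"), match.group("reason")


# Made-up example values, for illustration only.
parsed = parse_subject("ABBYY Recognition Server (OCR-SRV-01): License expires soon")
```

Returning None for non-matching subjects lets the caller skip unrelated mail without raising an error.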
2.3.2. Notification about near license expiry
New notifications about near license expiry have been added.
Notifications can be sent based on the following event notification options:
• percentage of remaining pages in license;
• number of days left before the license expiry.
Implemented in: Release 2.
2.4. Job rejection without loss of files
Now it is possible to reject or delete a job without deleting the files.
The new commands Reject Job and Reject All Jobs are used to reject a job or all jobs. The files will be saved to
the Exceptions folder of the corresponding workflow.
The commands Delete Job and Delete All Jobs are used to delete a job or all jobs. The files will be placed into the
Exceptions folder of the server.
Implemented in: release 1 for 3A
2.5. Interface improvements
2.5.1. Main window of Administration Console
The interface of the main window has been changed. New toolbars, panes, and buttons have been added. The
order of nodes is also slightly different. The stations are now gathered in the Stations node.
2.5.2. Workflow status pane
The Workflow status pane displays the current state of the selected workflow. Available information depends on
the workflow type.
The status pane displays the following information:
• State: started or stopped
• Start time
• Stop time (if the workflow was stopped)
• Duration
• Total number of jobs
• Number of processed jobs
• Number of copied files
• Number of failed jobs
• Paths to Output folders
• Path to Exceptions folder
For a Document Library workflow which has been started, the status pane also displays a progress bar showing
the percentage completed.
For a workflow with errors, the reason for the failure is given in the status pane.
2.6. Soft stop of the workflow processing
Now it is possible to stop the processing of jobs using the so-called “soft” stop mechanism.
With a “soft” stop, all jobs that are currently being processed are completed, and no new jobs are taken for
processing. After the results of all current jobs are published, the workflow is stopped.
To perform a manual “soft” stop, use the Stop command. If processing runs on a schedule, the workflows
are always stopped “softly”.
If processing must be interrupted and the current jobs postponed without completion, use
the manual Stop immediately command. It frees the computing power at once. The postponed jobs are finished
when the workflow is started again.
Implemented in: Release 2.
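The difference between the two stop commands can be illustrated with a small toy model. It is not the Recognition Server API: the class, its fields, and its method names are invented here purely to mirror the behavior described in this section.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List


@dataclass
class WorkflowModel:
    """Toy model of a workflow queue (hypothetical, for illustration)."""
    waiting: Deque[str] = field(default_factory=deque)    # jobs not yet taken
    in_progress: List[str] = field(default_factory=list)  # jobs being processed
    published: List[str] = field(default_factory=list)    # finished jobs
    postponed: List[str] = field(default_factory=list)    # interrupted jobs
    running: bool = True

    def stop(self) -> None:
        """Soft stop: finish the current jobs and take no new ones."""
        self.published.extend(self.in_progress)
        self.in_progress.clear()
        self.running = False

    def stop_immediately(self) -> None:
        """Immediate stop: postpone the current jobs without finishing them."""
        self.postponed.extend(self.in_progress)
        self.in_progress.clear()
        self.running = False

    def start(self) -> None:
        """Restart: the postponed jobs are finished first."""
        self.running = True
        self.published.extend(self.postponed)
        self.postponed.clear()


soft = WorkflowModel(waiting=deque(["c"]), in_progress=["a", "b"])
soft.stop()              # "a" and "b" are published, "c" stays in the queue

hard = WorkflowModel(waiting=deque(["c"]), in_progress=["a", "b"])
hard.stop_immediately()  # "a" and "b" are postponed, nothing is published
```

In both cases the waiting job stays in the queue; the commands differ only in what happens to the jobs that are already being processed.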
3. Workflow settings
3.1. Document Library workflow type
New Recognition Server functionality allows users to process document libraries which shouldn’t be modified.
Now users don’t need to copy files to the Hot Folder. Instead, they can simply specify the root folder of a document
library as the input folder and then specify an output folder, an output format, and processing settings. The files in
the document library will be recognized and the processed files will appear in the specified output folder. The
structure of the original document library will be preserved.
Files which do not require recognition can be skipped, or moved to the output folder if you need to preserve the
entire structure of the document library.
The input files will not be deleted, as opposed to using the Hot Folder.
A new workflow type has been created especially for processing document libraries.
The Document Library workflow will be stopped after all files in the indicated library are processed. If new files
are placed into the library, the workflow must be started again. As all processed files are registered, only new files
will be processed.
If the workflow settings have been changed and it is necessary to reprocess all files, use the Restart command
(click the arrow next to the Start button to see the command).
As a document library might be quite big and take a long time to process, workflows of the Document Library type
have a progress bar. See Workflow status pane for details.
Implemented in: Arabic edition, modified in release 1 for 3A
3.1.1. Periodical crawling of document libraries
A crawling frequency can now be set up for workflows of the Document Library type to ensure that incoming
files are processed quickly.
To use this feature, enable the new option Crawl for new files in library every:. The crawling period can be
selected from the drop-down list (from 10 minutes to 11 hours) or typed manually, e.g. “2 hours”, “12 hours”, etc.
After periodical crawling is enabled and the workflow is started, the system monitors the
document library and counts down the time until the next crawl.
If the Crawl for new files in library every: option is not enabled, the library is crawled only once. The start time of
crawling depends on the Workflow Activity settings (General tab).
Settings for periodical crawling of document libraries can also be specified in the configuration file
(Configuration.xml).
The EnablePeriodicCrawling parameter enables or disables periodical crawling; the possible values are
True and False (the default is False). The CrawlingInterval parameter sets the crawling interval in milliseconds (the
default value is 7200000 ms, i.e. 2 hours).
Implemented in: Release 2.
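Since CrawlingInterval is expressed in milliseconds, a value taken from the user interface (for example “2 hours”) has to be converted before it is written to Configuration.xml. The helper functions below are a minimal sketch of that arithmetic. Their names are hypothetical and they are not part of the product.

```python
MS_PER_HOUR = 60 * 60 * 1000  # 3,600,000 milliseconds in one hour


def hours_to_crawling_interval(hours: float) -> int:
    """Convert a crawl period in hours to a CrawlingInterval value in ms."""
    if hours <= 0:
        raise ValueError("crawl period must be positive")
    return int(round(hours * MS_PER_HOUR))


def crawling_interval_to_hours(milliseconds: int) -> float:
    """Convert a CrawlingInterval value in ms back to hours."""
    return milliseconds / MS_PER_HOUR


default_interval = hours_to_crawling_interval(2)      # the documented default
ten_minutes = hours_to_crawling_interval(10 / 60)     # shortest drop-down value
```

The documented default of 7200000 ms corresponds to 2 hours, and the shortest drop-down value of 10 minutes corresponds to 600000 ms.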
3.2. Input settings
3.2.1. Processing SharePoint libraries
SharePoint libraries can now be indicated as a source for a Document Library workflow.
Users can indicate the input source: a site, a particular library or several libraries, a folder or several folders.
If the Export output files to source library option is enabled when the MS SharePoint input source is configured,
the output settings always include an output file whose export destination is the SharePoint source libraries.
Output files are saved to the same libraries and folders that the input files came from.
The format and naming schema of a file can be configured. By default, the output files are saved under the same
names as the input files. If a file already exists, a new version is created.
If the Export output files to source library option is not enabled, then the output settings can be configured as usual,
including saving the files into any SharePoint library/folder. Only one library/folder can be selected.
If one and the same folder or library is indicated as input and output, files can be overwritten, or files with new
names can be created, or the versions of the files can be changed. The behavior is determined by the option
selected from the If file exists drop-down list in the Output Format Settings dialog box. See also Overwriting files
in the output folder.
Limitations:
1. If the input library is the same as the output library, the option For each folder cannot be used; you can only
create a job for each file.
2. Only one site including all its libraries can be processed within one workflow. For child sites one should create
separate workflows.
Implemented in: release 1 for 3A.
The ability to specify several libraries as input was implemented in Release 2.
3.2.2. Using IFilter for processing PDF files in MS SharePoint
Microsoft Search IFilter for SharePoint 2013 can again be used for indexing PDF files due to the lifting of the
Microsoft ban.
To enable this, the cumulative update package for SharePoint Server 2013 must be installed. It is available at:
http://support2.microsoft.com/default.aspx?scid=kb;EN-US;2882989
Please note: The update for MS SharePoint should be installed before Recognition Server 4 Release 2 is installed.
If Recognition Server 4 Release 2 has already been installed, install the update for MS SharePoint, then run the
Recognition Server 4 Release 2 installer again and use the Repair command to modify the installation.
Implemented in: Release 2.
3.2.3. Filtering files for processing and settings for unprocessed files
It is possible now to filter files to be processed using a “mask” (i.e. a template) for file names. If you specify a name
mask, the program will process only files with names and extensions which fit the mask.
Files can be selected in the workflow properties: Input tab, Select files to process.
You can use the “?” and “*” symbols in the mask. “?” stands for any single character and “*” stands for any
number of any characters. For instance, the mask *.* will select all files, the mask *.tiff will select only files with
the “.tiff” extension, and the image*.* mask will select files of all types whose names start with “image”.
For workflows of the Hot Folder and Mail types, the default mask is *.*, i.e. all files from the Input folder will be
processed. For workflows of the Document Library type, the default mask selects files in all of the supported image
formats (*.bmp, *.dib, *.rle, *.dcx, *.djvu, *.djv, *.gif, *.jb2, *.jbig2, *.jp2, *.j2k, *.jpf, *.jpx, *.jpc, *.jpg, *.jpeg,
*.pcx, *.pdf, *.png, *.tif, *.tiff, *.wdp, *.wmp). You can specify any other mask that suits your needs. For instance,
you may wish to have a mask that processes image files but ignores files with the “.tmp” extension, which may be
created in the input folder when scanning documents.
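The wildcard rules above can be tried out with a short Python sketch. It relies on the standard fnmatch module, whose “?” and “*” semantics are similar to those described here; this is an assumption for illustration only and not the product's own matching code. For example, fnmatch requires a literal dot for the mask *.*, and it also treats “[” as a special character. The function names are hypothetical.

```python
from fnmatch import fnmatchcase


def fits_mask(file_name, masks):
    """Return True if the file name fits at least one mask (case-insensitive)."""
    lowered = file_name.lower()
    return any(fnmatchcase(lowered, mask.lower()) for mask in masks)


def split_by_mask(file_names, masks):
    """Split names into (fitting, other) lists, preserving the input order."""
    fitting = [name for name in file_names if fits_mask(name, masks)]
    other = [name for name in file_names if not fits_mask(name, masks)]
    return fitting, other


files = ["scan01.tiff", "scan01.tmp", "image_07.png", "notes.txt"]
print(split_by_mask(files, ["*.tiff"]))    # only scan01.tiff fits the mask
print(split_by_mask(files, ["image*.*"]))  # only image_07.png fits the mask
```

The second list returned by split_by_mask corresponds to the files that are handled according to the Other files setting.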
Under Other files, you can specify which actions should be performed on files that do not fit the mask:
• Exceptions folder - Any files that do not fit the mask will be placed into the Exceptions folder. Use this
option when only files of certain types must be processed.
• Output folders - Any files that do not fit the mask will be placed into an output folder. Use this option
for processing archives where all documents must be preserved together with the folder structure.
Processed image files will be converted to images with a text layer and all other files will be copied or
moved to an output folder “as is.”
• No action - Any files that do not fit the mask will be ignored. Use this option when only files of certain
types must be processed. Note: We do not recommend using the No action option for workflows of the
Hot Folder type, as this may fill up the folder with unprocessed files.
Note: A separate job is always created for unprocessed files. If the workflow must create one job per folder and
a folder contains both processed and unprocessed files, the workflow will create one job for the processed files
and another job for the unprocessed files.
The mask option is useful in the following scenarios:
• Hot Folder. Sometimes scanners create *.tmp files besides *.tiff files and place both kinds of files in the
same folder. Only *.tiff files should be processed, and the *.tmp files should be ignored.
• Read-only folder. The user might need to recreate in the output folder the structure of the input folder.
Only images should be processed and the other files must be moved to the output folder.
• Mail. Besides an attached image file, a letter may contain a logo or signature in GIF format. Only the
attached image file should be processed and the GIF logos and signatures should be ignored.
The input files of failed jobs can now be moved to output folders, moved to the Exceptions folder, or ignored. To
tell the program what it should do with failed jobs, use the Save failed jobs to option on the Quality control tab
of the Workflow Properties dialog box.
Note: If the user chooses to move unprocessed or failed files to output folders and the workflow contains several
output folders, the unprocessed or failed files will appear in all output folders.
Implemented in: Arabic Edition, modified in release 1 for 3A
3.2.4. Using the SSL protocol for data protection
Communicating with a POP3 server over the SSL protocol
is now supported. If POP3 E-mail Server is selected as the
source type, the option Use SSL becomes available. Port
995 should be specified in the Port number field.
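Before configuring the workflow, it can be useful to confirm that the mail server accepts SSL connections on port 995. The sketch below uses Python's standard poplib module for this check; it is independent of Recognition Server, and the host name and credentials in the usage comment are placeholders.

```python
import poplib


def check_pop3_ssl(host, user, password, port=poplib.POP3_SSL_PORT, timeout=10):
    """Log in to a POP3 server over SSL and return the number of messages."""
    conn = poplib.POP3_SSL(host, port, timeout=timeout)
    try:
        conn.user(user)
        conn.pass_(password)
        message_count, _mailbox_size = conn.stat()
        return message_count
    finally:
        conn.quit()


# Hypothetical usage (requires a reachable mail server):
# print(check_pop3_ssl("mail.example.com", "scanner", "secret"))
```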
Implemented in: release 1
3.3. Processing settings
3.3.1. Special mode for processing technical drawings
Working with technical drawings such as
construction blueprints has been significantly
improved. Since the processing of technical
drawings requires settings different to those
required for regular documents, users should
enable the Processing mode for technical
drawings option on the 2. Process tab of the
Workflow Settings dialog box.
It is recommended to enable this mode for
documents that contain a lot of fine details. The
graphical objects will remain unchanged and the
text will be recognized.
Recognition in this mode is done in three directions:
• The direction of the principal orientation, which is automatically detected
• Rotated clockwise relative to the principal orientation
• Rotated counterclockwise relative to the principal orientation
In the XML output file, the orientation of the text will be indicated in the orientation attribute:
• RotatedClockwise
• RotatedCounterclockwise
• If not indicated, the orientation is “normal” (i.e. the text is oriented horizontally)
Note: Using this mode can slow down image processing.
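A consumer of the XML output can treat a missing orientation attribute as the normal orientation. The Python sketch below illustrates this; the element names Page and Text in the sample are hypothetical, as only the orientation attribute and its two values are taken from the description above.

```python
import xml.etree.ElementTree as ET


def text_orientations(xml_text, tag="Text"):
    """Return the orientation of every <tag> element, defaulting to 'Normal'."""
    root = ET.fromstring(xml_text)
    return [element.get("orientation", "Normal") for element in root.iter(tag)]


# Hypothetical fragment; the real XML result file has a different structure.
sample = """<Page>
  <Text orientation="RotatedClockwise">A-101</Text>
  <Text>General notes</Text>
  <Text orientation="RotatedCounterclockwise">Section B</Text>
</Page>"""

print(text_orientations(sample))
```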
Implemented in: release 1 for 3A
3.3.2. Despeckle images option
The Despeckle option is now available in the product GUI
(Workflow properties, 2. Process tab, Advanced
Processing Settings). This option removes noise from the
image. Noise can be introduced by scanning, and it is
recommended that it be removed for better data
recognition. During despeckling, the program also
removes background dots or boundary lines of raster
forms.
By default, the option is switched off, because in some
cases it can adversely affect recognition (the program
may even fail to recognize some text fragments). We
recommend switching the option on only if you are
certain that it will help to remove noise from your images
(please try it first on several sample images).
The corresponding API method is RemoveGarbage.
Implemented in: release 1 for 3A
3.3.3. Additional fonts
This setting is only available in the configuration file.
By default, Recognition Server uses only a limited number of fonts so that the result does not depend on the set of
fonts installed on each processing station. These fonts might not be enough to correctly display text in Chinese,
Korean, Japanese, Thai or Arabic.
To solve this problem, a new parameter, AllowedFontsMode, is available in the section RecognitionParams of the
configuration file (Configuration.xml).
Possible values are:
• Default – In this mode, only the following fonts will be used: Arial, Times New Roman, and Courier New.
• All – All possible fonts will be used. Please note that processing will take longer. It is also important that
the user have the same set of fonts on all the processing stations; otherwise the result might be different
on different computers.
Users can also use a custom font set as an addition to the main font set. In this case, a list of additional fonts can
be added below the section RecognitionParams using the element AdditionalAllowedFont.
This example illustrates adding the font AngsanaUPC to the set of main fonts:
<RecognitionParams RecognitionQuality="Fast" LookForBarcodes="true" VerificationMode="AlwaysVerify"
RecognitionMode="FullPage" TextExtractionMode="false" AllowedFontsMode="Default">
<AdditionalAllowedFont>AngsanaUPC</AdditionalAllowedFont>
Implemented in: release 1
3.3.4. To speed up processing, text in pictures is not recognized by default
To speed up processing, recognition of text in pictures is now disabled by default. If you need to recognize text in
pictures, you can enable this feature in the configuration file. This can only be done for the quality recognition
mode.
The name of the parameter is ProhibitHiddenTextDetection, the default value is true.
Implemented in: release 1
3.3.5. Blank page detection settings
Settings for flexible detection of empty pages have been added. They help to avoid incorrect blank page
detection for low-quality images, images with noise left after scanning, images with non-textual
objects, etc.
The margins, the percentage of blackness, and the objects allowed on a page for it to be considered empty can be
specified in the Document Separation parameters.
Implemented in: Release 2.
3.4. PDF processing options
3.4.1. Improved MRC compression method of output PDF files
The quality of output PDF files generated using the MRC compression method has been significantly
improved. The enhanced MRC compression method now provides noticeably better visual quality of
documents while keeping almost the same small file size.
MRC compression of output files now minimizes the file size and preserves the visual
quality as well as competing products (incl. CVISION).
The improved compression method is now used by default in all new and previously created workflows with
compressed PDF output format enabled (Enhanced compression (MRC) option).
To disable the updated MRC and use the previous compression mode one should set the LegacyMRCMode flag to
True in the Configuration.xml of ABBYY Recognition Server settings.
To manage the quality/size parameters of the output
files, the Max Quality – (balanced) – Min Size
profiles can be selected.
These profiles help you to select the desired output
quality/size and have the settings configured
automatically. For instance, when selecting Min Size
profile, the quality parameter is set to 30% and the
MRC compression is enabled.
Implemented in: Release 2.
3.4.2. Version, format, and other parameters of an output PDF file
Export settings for PDF and PDF/A have
been expanded: it is now possible to specify
a version for output PDF files and select a
PDF/A standard. The list of available PDF
standards includes PDF/A-1a, PDF/A-1b,
PDF/A-2a, PDF/A-2b, and PDF/A-2u.
Implemented in: release 1 for 3A
3.4.3. Export to PDF/A-3 format
Export of output files to PDF/A-3 format is now supported. It is possible to select the PDF/A-3a, PDF/A-3b, or PDF/A-3u standard of the PDF/A format.
Please note: attachments cannot be written into the output PDF/A-3 file.
Implemented in: Release 2.
3.4.4. Tagged PDF enabled by default
When a new output format for saving documents to PDF files is added, the Enable tagged PDF (compatible
with Adobe Acrobat 5.0 or above) option is now enabled by default. This helps to avoid excess
spaces in words and ensures correct searching within the PDF file.
Please note: this option may result in up to a 10% increase in the file size.
Implemented in: Release 2.
3.4.5. Possibility to skip processing PDFs with a text layer
It is now possible to skip the processing of PDF files. PDF files with a text layer can now be moved to an output
folder if the user selects the option Do not modify files with high-quality text layer. The user can also select a
detection mode:
• In Fast mode, the application looks for a text layer in the file. If a text layer is detected, the file will be
moved to an output folder and the other export settings will be ignored. The application will not treat the
pages in this file as OCRed, but please note that if there are other output folders with formats other than
PDF specified, OCR will be performed, affecting the page counter.
• In Thorough mode, the application compares the text layer of a PDF file with OCR results (a piece of text
on each page will be compared). If the text in the text layer and the text obtained through OCR are
identical, the file will be moved to an output folder. In this case, the pages are considered OCRed,
which affects the page counter.
When a text layer is compared to OCR results, the default threshold is 5%. This means that the program will use
the OCR results if there is more than a 5% difference between the texts. This threshold can be changed in the
Configuration.xml file: SkipRecognizePdfsWithTextLayerCoefficient="25"
This setting is located in the ExportFormat node and appears in the file when you set up output to PDF.
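The threshold logic can be illustrated with a small Python sketch. How Recognition Server actually measures the difference between the two texts is not documented here, so the character-level similarity from the standard difflib module is used purely as a stand-in; the function names are hypothetical.

```python
from difflib import SequenceMatcher


def difference_percent(layer_text, ocr_text):
    """Character-level difference between two texts, in percent (0-100)."""
    # Assumption: difflib similarity stands in for the product's own metric.
    similarity = SequenceMatcher(None, layer_text, ocr_text).ratio()
    return (1.0 - similarity) * 100.0


def keep_original_file(layer_text, ocr_text, threshold_percent=5.0):
    """Return True if the existing text layer is close enough to the OCR result."""
    return difference_percent(layer_text, ocr_text) <= threshold_percent


layer = "Invoice 2041 for consulting services"
print(keep_original_file(layer, layer))                                   # True
print(keep_original_file(layer, "lnv0ice 2O4l f0r c0nsu1ting serv1ces"))  # False
```

Raising the threshold, as in the value 25 shown above, makes the program accept text layers that differ more from the OCR results.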
Note:
1. Files skipped in Fast mode will not be sent to operator stages (i.e. indexing or verification).
2. The setting is only applicable to source files in PDF format.
Implemented in: release 1 for 3A
3.4.6. Ability to embed a text layer and keep the image and all PDF file properties
Sometimes PDF files don’t have a good text layer but have bookmarks, attachments or other parameters which
must be preserved. It is now possible to preserve all attributes of a PDF file and embed only recognized text. The
option Modify text layer only is available on the Format Settings tab for PDF and PDF/A.
Note: The option is only applicable to source files in PDF format.
Implemented in: release 1 for 3A
3.4.7. Enabling and disabling Fast Web View for PDF files
The option Fast Web View is available on the
Format Settings tab for PDF and PDF/A. If the
option is enabled, a preview will be created for
fast opening of the file on websites.
Implemented in: release 1 for 3A
3.4.8. Using PDF text layer for recognition results improvement
When PDF files with a text layer are OCRed by Recognition Server, the source text layer is used to improve the
recognition results. For example, characters recognized with low confidence are checked against the text layer and
copied from it.
Implemented in: release 1 for 3A
3.4.9. Using PDF text layer for generating quality output files of different formats
If an imported PDF file contains a text layer, it can be reused for creating quality output files in PDF and other
formats, for example PDF/A, ALTO XML, etc.
When imported files are OCRed, the original text layer is detected. The quality of the original text characters
is evaluated before they are copied to the resulting file. This algorithm ensures the same or better quality of the
output file compared to the original file.
Please note that the license counter is decreased even if the original files contain a text layer.
Implemented in: Release 2.
3.5. Output settings
3.5.1. Overwriting files in an output folder
It is now possible to overwrite an output file if it
already exists in an output folder. If the option
Overwrite if file exist is not selected, a 4-digit index
will be added to the file name.
In the XML result file, the attribute
RewriteIfFileExists has been added to the tag
<FormatSettings>. The value true indicates that the
files in the output folder were overwritten.
Implemented in: Arabic edition
SharePoint options:
When you save output files in a SharePoint library, you have a choice of the following options:
• Create new name – The output file will be given a new name.
• Overwrite file – The output file will replace the original file.
• Use SharePoint versioning options – The output file will replace the original file and a new version number
will be calculated using the current settings of SharePoint versioning.
Implemented in: release 1 for 3A
3.5.2. Export format compatible with FineReader Engine 11
Recognition Server 4 supports export to an internal
FineReader format which is compatible with
FineReader Engine 11.
To export to this internal FineReader format, select
FineReader Internal format (*.layout, *.image) as the
output format. As a result, two files will be created
with *.layout and *.image extensions.
This feature is useful for complicated image processing
in FRE. Instead of creating a distributed system,
Recognition Server will be used for text layer creation.
Implemented in: release 1 for 3A
3.5.3. KeepPages parameter
This setting is only available in the configuration file.
The new parameter KeepPages regulates page breaks in the output formats doc, rtf, and docx. This parameter is
available in the export settings inside the ExportFormat tag of the configuration file (Configuration.xml). Possible
values are true and false (the default value is false).
Usage scenario:
The size of a text fragment on a page can decrease if font size is decreased. To keep the page breaks as in the
source document, the parameter should be set to true, otherwise content from the beginning of one page may be
placed on the preceding page.
In other cases, the size of a text fragment can increase and if you keep the page breaks, the end of the text fragment
from one page can be placed on the following page. If this is the case, we recommend setting the KeepPages
parameter to false.
Implemented in: release 1 for 3A
3.5.4. Export to specific column types in SharePoint
Export of index fields to specific SharePoint column types is now supported:
• Single line of text;
• Multiple lines of text;
• Choice (menu to choose from);
• Number;
• Currency;
• Date and Time;
• Yes/No (checkbox);
• Hyperlink or Picture;
• Managed Metadata.
The document attributes (index fields) should be mapped with the appropriate content types imported from the
selected SharePoint library.
To configure the mapping, click the Settings button when selecting the SharePoint document
library in the output parameters. In the Mapping Document Attributes to SharePoint Columns window, the links
between the RS document types (created on the Indexing tab) and SharePoint content types (submitted from the
selected library) should be established. After the appropriate SharePoint content type is selected, the RS document
attributes (index fields) can be mapped with the SharePoint columns.
Implemented in: Release 2.
3.5.5. Export to ePub3 format
Export of output files to the ePub v.3 format is now supported.
Implemented in: Release 2.
3.5.6. Settings of measurement units for export to ALTO XML
A unit of measurement (pixels, inches, or millimeters) can be selected when configuring export to ALTO XML format.
Implemented in: Release 2.
4. Document processing
4.1. Improved recognition of Arabic texts
A new version of OCR Technologies is used in the Recognition Server 4, where Arabic OCR has been significantly
improved. Please see the Test results for recognition speed and quality as compared to the OCR Technologies
inside Recognition Server 3.5. There you can also find a comparison with other OCR products that can recognize
texts in Arabic.
Besides these technological tests, productivity of Recognition Server 4 was measured on 2,500 pages of Arabic
texts which were exported to RTF. This test has shown Recognition Server 4 to be 17-20% faster compared to
Recognition Server 3.5.
Implemented in: Arabic edition
4.2. Ability to limit the number of processed pages in input files
In many scenarios that involve searching document libraries, it is sufficient to have text from the first few pages in
order to find a document. In such cases, clients would like to save time and pages in the page counter by limiting
the number of processed pages to the first N pages in each file.
This feature can be switched on for IFilter and GSA connectors in the workflow settings via the GUI. For other
workflows, it can only be switched on using the XML ticket.
Example of enabling this feature in the XML ticket:
<XmlTicket PageNumToRecognizeForSingleInputFile="2">
<InputFile Name="50.pdf" />
<ExportParams>
<ExportFormat OutputFileFormat="Text"
OutputFlowType="SharedFolder">
<OutputLocation>D:\Output Folder</OutputLocation>
</ExportFormat>
</ExportParams>
</XmlTicket>
This feature will work only if there is no document assembly (the option Create one document for each file in job
is selected), otherwise the setting will be ignored.
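If XML tickets are generated by a script, a ticket like the one above can be assembled programmatically. The Python sketch below builds the same structure with the standard ElementTree module; all element and attribute names are copied from the example above, while the function name and its parameters are illustrative.

```python
import xml.etree.ElementTree as ET


def build_xml_ticket(input_name, output_folder, page_limit, output_format="Text"):
    """Build an XML ticket that limits recognition to the first page_limit pages."""
    ticket = ET.Element(
        "XmlTicket", {"PageNumToRecognizeForSingleInputFile": str(page_limit)}
    )
    ET.SubElement(ticket, "InputFile", {"Name": input_name})
    export_params = ET.SubElement(ticket, "ExportParams")
    export_format = ET.SubElement(
        export_params,
        "ExportFormat",
        {"OutputFileFormat": output_format, "OutputFlowType": "SharedFolder"},
    )
    ET.SubElement(export_format, "OutputLocation").text = output_folder
    return ET.tostring(ticket, encoding="unicode")


print(build_xml_ticket("50.pdf", r"D:\Output Folder", page_limit=2))
```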
This setting has the following effect:
• Only processed pages will be counted in any output files
• The time of processing will be reduced, as only the specified number of pages will be processed
• This setting will be ignored if the output format is PDF
• Output files in text formats will contain only N pages
• Output files in image formats will contain all pages, but the page counter will be decremented only by N
pages for each file
• If an operator station is included in the processing, all pages can be opened on this station, but only the
first N pages will be available for indexing and editing. The operator will be able to recognize other pages
on the Verification Stations if necessary. In this case, the page counter will be decremented by the number
of recognized pages.
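As a rough illustration of the effect on the page counter, the rules above can be written as a small Python function. This is a simplified reading of the list, not the product's licensing logic: it ignores pages that an operator recognizes additionally, and the function name is hypothetical.

```python
def pages_counted(total_pages, page_limit, output_format):
    """Estimate how many pages of one input file are taken from the page counter."""
    if total_pages < 0 or page_limit <= 0:
        raise ValueError("total_pages must be >= 0 and page_limit must be > 0")
    if output_format.upper() == "PDF":
        # The limit is ignored for PDF output, so every page is processed.
        return total_pages
    return min(total_pages, page_limit)


print(pages_counted(50, 2, "Text"))  # 2
print(pages_counted(50, 2, "PDF"))   # 50
print(pages_counted(1, 2, "Text"))   # 1
```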
Implemented in: release 1
4.3. Support of new barcode type - USPS-4CB (Intelligent Mail Barcode)
Extraction of USPS-4CB barcodes, which are used on mail in the USA and are required by the US Postal Service, is
now supported.
USPS-4CB (Intelligent Mail Barcode) barcodes can be recognized in documents and can also be selected as a
barcode type for document separation in the workflow settings.
Implemented in: Release 2.
4.4. Disabled image compression of lossy JBIG2 type
Lossy JBIG2 image compression has been removed from the UI and internal compression parameters, as it
produced low-quality output files.
Implemented in: Release 2.
5. Scanning Station
5.1. Sending registration parameters values to index fields
When scanning a batch, the registration parameters
entered for a document can be sent as the values of the
document index fields.
The lists of index fields (document types and their
attributes) must be pre-configured in the workflow
properties (Indexing tab).
At the Scanning station in the batch type settings one
should specify the batch sending parameters: select the
desired workflow and import the list of index fields by
clicking the Import Registration Parameters button.
When creating a batch, select the desired Batch Type,
assemble the documents and assign the Document Types
in the Registration Parameters window.
After processing the batch in Recognition Server, the documents with pre-filled index fields’ values are shown at
the Indexing station. It is possible to skip the indexing stage by using the following code in the indexing script:
“SkipManualIndexing = true;”. In this case index fields’ values will be exported according to the workflow settings.
The values of document registration parameters can be obtained from indexing or export script by using the
standard Attributes object. Also they are accessible from the XML result file.
Please note:
• Only values of the parameters imported from the workflow can be sent as index fields to Recognition
Server, even though it is possible to create more registration parameters in the batch type settings at the
Scanning Station.
• The types of the entered values should match the types of the index fields specified in the workflow
properties.
Implemented in: Release 2.
6. Verification and Indexing Stations
6.1. Manual selection of documents for verification and indexing
Operators of Verification and Indexing Stations can
now select documents manually from the queue.
This feature can be very useful if an operator needs
to speed up the processing of recently added urgent
documents.
The Get documents Automatically button on both stations toggles between
manual and automatic modes of receiving the next document.
The Select Document button on both stations should be used to
open the Select Document for Verification or Select
Document for Indexing dialog box. In this dialog box,
the operator can find the required document, sort
documents by name, priority or creation date, and
select the found document, which will be opened for
verification or indexing.
A new information pane displays the number of
documents in the queue and allows starting
verification or indexing and selecting documents
manually. This pane appears:
• Between tasks in manual mode
• When connection with the server is lost
• When the current document is returned to the queue after the timeout is reached
Implemented in: release 1
6.2. Saving documents
It is now possible to save changes in the current document on both Verification and Indexing Stations. Now if a
failure occurs during verification, the verification results will not be lost. The verification results will be saved if the
operator selects Document > Save or presses Ctrl+S.
When the station is closed during document verification, the operator will be asked to save the results.
The current document with the saved changes will be returned to the server and will become available to other
operators.
On Indexing Stations, it is only possible to save results after the document type is selected.
Implemented in: release 1
6.3. Timeout of inactivity
To prevent documents from sitting forever on
operators’ stations, documents are returned to the
queue after a timeout is reached. In previous versions,
the timeout value was set to 120 minutes and could
not be changed. That proved insufficient for verifying
large documents (for example, books). The 120-minute timeout is also not suitable for companies
which allow operators to leave the current document
opened when they go home after work or break for
lunch.
Now the timeout value can be changed in the
Recognition Server Properties dialog box, or in the
configuration file Configuration.xml (change the value
in OperatorStationInactiveTimeoutInMinutes="120"
in the QueueManager node).
Important! This timeout is applied to all workflows
and to all jobs on Verification and Indexing Stations.
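When the value is changed in the configuration file, a script can apply the edit consistently. The Python sketch below updates the attribute in a QueueManager node using the standard ElementTree module. The root element name in the sample is hypothetical, a real Configuration.xml contains many more nodes, and it is advisable to keep a backup of the file before modifying it.

```python
import xml.etree.ElementTree as ET

TIMEOUT_ATTRIBUTE = "OperatorStationInactiveTimeoutInMinutes"


def set_inactivity_timeout(config_xml, minutes):
    """Return the configuration text with the operator timeout set to minutes."""
    if minutes <= 0:
        raise ValueError("the timeout must be a positive number of minutes")
    root = ET.fromstring(config_xml)
    node = root if root.tag == "QueueManager" else root.find(".//QueueManager")
    if node is None:
        raise LookupError("QueueManager node not found")
    node.set(TIMEOUT_ATTRIBUTE, str(minutes))
    return ET.tostring(root, encoding="unicode")


# Hypothetical minimal configuration used only for this illustration.
sample = (
    "<Configuration>"
    '<QueueManager OperatorStationInactiveTimeoutInMinutes="120" />'
    "</Configuration>"
)
print(set_inactivity_timeout(sample, 480))
```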
Implemented in: release 1
6.4. Improved work with document types and index fields on Indexing Stations
6.4.1. Import of index fields from files
The ability to import document types, index fields, and values from an XML or CSV file has been added. This feature
is useful if there is a need to use the same field in different workflows.
The feature is available on the Indexing tab of the Workflow Properties dialog box (click the Import…
button).
Imported files should have the following structure:
• XML: Indexing.xml
• CSV:
DocumentType  FieldName  IsObligitary  FieldType      PossibleValues          IsDefault
type1         bbb                      List           Field1;Field2;Field3    TRUE
type1         ccc                      SingleLine     Don't say
type2         test       TRUE          MultipleLines  Do this 1; test twice
There are some other changes on the 5. Indexing tab of the Workflow Properties dialog box:
• Order of document types can be changed using the Up and Down buttons.
• The default document type can be selected using the Default type checkbox.
Implemented in: release 1
6.4.2. Quick input of index fields
When the operator starts typing an index field value, the values starting with the same letter will be automatically
selected from the list of allowed values.
Implemented in: release 1
6.4.3. Possibility to combine values from several regions into one index field
The ability to use several regions as a source of values for one index field has been added. This feature can be
useful for setting multi-line text as an index field value.
To combine the values, hold the CTRL key and click the regions that contain the values to be used as a
single index field. The values are aggregated and separated with spaces automatically.
Implemented in: Release 2.
6.5. User interface changes
6.5.1. Verification Station
The main toolbar on the Verification Station has been changed:
• The new Warnings button allows the user to hide/show the warnings pane. The button also displays the
number of issued warnings.
• The number of low-confidence characters is displayed on the Check Spelling button.
• The new Select Document button allows selecting documents manually from the verification queue.
• The new Get documents Automatically button allows switching between automatic and manual
document selection.
• The Reject All Documents button is hidden; this command is only available in the menu *.
*Note: The Reject All Documents command should not be used very often because it rejects all documents of the
job while the operator works on the current document only. The Reject command returns only the current
document to the queue.
Information about the number of documents in the queue is now displayed in the status bar.
Implemented in: Release 1
6.5.2. Indexing Station
The main toolbar on the Indexing Station has been changed:
• The new Select Document button allows selecting documents manually from the indexing queue.
• The new Get documents Automatically button allows switching between automatic and manual
document selection.
• The Reject All Documents button is hidden; this command is only available in the menu *.
*Note: The Reject All Documents command should not be used very often because it rejects all documents of the
job while the operator works on the current document only. The Reject command returns only the current
document to the queue.
Information about the number of documents in the queue is now displayed in the status bar.
Implemented in: release 1
7. Operating systems
7.1. Support for Windows Server 2012 R2
Recognition Server 4 can be installed and run on Windows Server 2012 R2.
Implemented in: release 1
7.2. Discontinued support for Windows XP and Windows Server 2003
We stopped supporting Windows XP and Windows Server 2003. Recognition Server 4 cannot be installed on these
operating systems.
8. Scripting
8.1. Access to subsequent pages from the document assembly script
A new property, RecognizedPage: UserProperty, was added to the page object to enable document assembly based on
the analysis of subsequent pages.
The decision on whether the page belongs to a document can be made based on the information about the next
pages. For example, the same ID values on all the pages.
Implemented in: Release 2.
8.2. Detecting the workflow name by script
A new property, RecognizedPage: WorkflowName, was added to the page object to get the name of the workflow for the page that is being processed.
This possibility allows copying scripts to several workflows without manual modifications.
Implemented in: Release 2.
9. Changes in the COM-based API and Web API
9.1. Namespace changes
The namespace of the COM API has been changed from ABBYYRecognitionServer3 to ABBYYRecognitionServer.
The namespace of the Web API has been changed from RSSoapService3 to RSSoapService.
Implemented in: Arabic edition
9.2. Compatible API
By default, the API is not fully compatible, which allows Recognition Server 4 to be installed and run on the same
computer where a previous version is installed.
If there is a need to have a fully compatible API without recompiling your applications, you can achieve this by
following simple instructions.
This feature is available by request only; please contact ABBYY HQ for the instructions.
9.3. Automatic API deployment on 64-bit operating systems
Both the Web API and the COM API are automatically deployed by the installer on 64-bit operating systems without any
additional manual setup.
9.4. Added objects
The goal of adding new objects to the API is to support these scenarios:
1. Ability to establish a correspondence between the input and output files.
2. Ability to delete jobs after asynchronous processing (for Ricoh).
3. Setup of the recognition service if Recognition Server is accessed and settings are changed by a user
working on the same computer (for NLC).
9.4.1. Correspondence between input and output files
The following objects are added to the COM-based and Web-based API to support the ability to establish a
correspondence between input and output files.
InputFile
This object represents one input image file and the results of processing this file.
Properties
Name   Type               Description
Pages  Pages, read-only   Returns a collection of pages of the input file.
ID     String, read-only  Unique identifier of the input file generated by RS.
Pages
This object represents a collection of Page objects.
Page
This object represents a page of the input file. This is a child object of InputFile.
Properties
Name    Type               Description
ID      String, read-only  Unique page identifier generated by RS.
Number  String, read-only  Page number in the input file.
JobDocument
This object represents one output document.
Properties
Name           Type                      Description
PagePositions  PagePositions, read-only  Returns a collection of pages of the output document with information about the position of each page in the input file.
PagePositions
This object represents a collection of PagePosition objects.
PagePosition
This object represents a page in the output document and information about the position of this page in the input
file. This is a child object of JobDocument.
Properties
Name    Type               Description
FileId  String, read-only  ID of the input file to which the page belongs.
PageId  String, read-only  ID of the page in that input file.
Implemented in: Arabic edition
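How these objects can be combined to trace each output page back to its source may be sketched as follows. This is a minimal illustration in Python that models the objects as plain data classes; it does not call the Recognition Server API. The class and field names mirror the properties listed above, while the function name and the sample identifiers are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Page:
    id: str       # models Page.ID
    number: str   # models Page.Number (described as a string)


@dataclass(frozen=True)
class InputFile:
    id: str                  # models InputFile.ID
    pages: Tuple[Page, ...]  # models InputFile.Pages


@dataclass(frozen=True)
class PagePosition:
    file_id: str  # models PagePosition.FileId
    page_id: str  # models PagePosition.PageId


def trace_output_pages(input_files: List[InputFile],
                       page_positions: List[PagePosition]) -> List[Tuple[str, str]]:
    """Return (input file ID, page number in that file) for each output page, in order."""
    lookup: Dict[Tuple[str, str], str] = {
        (f.id, p.id): p.number for f in input_files for p in f.pages
    }
    return [(pos.file_id, lookup[(pos.file_id, pos.page_id)]) for pos in page_positions]


# Sample identifiers are invented for the illustration.
scan = InputFile("file-1", (Page("p-1", "1"), Page("p-2", "2")))
fax = InputFile("file-2", (Page("p-9", "1"),))
output_document = [PagePosition("file-2", "p-9"), PagePosition("file-1", "p-2")]
print(trace_output_pages([scan, fax], output_document))
# [('file-2', '1'), ('file-1', '2')]
```

The lookup is keyed by both the file ID and the page ID, so the sketch does not rely on page IDs being unique across input files.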
9.4.2. Support of the recognition service scenario (for NLC)
In this scenario, Recognition Server works as a service which is almost invisible to the user and is called if documents
processed with NLC are in an image file format and should be recognized first.
Recognition Server is installed silently and uses the default workflow. However, its settings are available on the
same computer and the user can change these settings. In this situation, it is necessary to have the ability to check
if the job can be processed and cancel the job if it cannot be processed at the moment.
The new API methods allow you to:
• Check if the workflow is started or stopped
• Check if there is a connection with the server
• Check if indexing and/or verification is switched on in the workflow and change the indexing or
verification settings
The following objects have been added to the COM-based API.
It is now possible to get the state of a workflow. The WorkflowState property has been added to the IWorkflow
interface.
Interface  Name           Type                          Description
IWorkflow  WorkflowState  WorkflowStateEnum, read-only  Returns the current state of the workflow.
WorkflowStateEnum is an enumeration that defines the possible workflow states.
Name                 Description
WS_ApplyingSettings  The state of a workflow after it has been started and before the processing has begun. At this stage, the program checks if it can access the folder that contains the input documents. This state is very short in duration and is not indicated in the console (the word "Starting" is displayed instead).
WS_Crawling          At this stage, the program checks the folders of the Document Library workflow. It counts the files, adds them to the database, and prepares to process them. The word "Crawling" is displayed in the console.
WS_Finishing         The state of a workflow when processing is coming to an end. At this stage, the program writes the files for the last time and completes publishing the large files. The words "Finishing Processing" are displayed in the console.
WS_NotAvailable      The state of a workflow that is inaccessible. The words "Not Available" are displayed in the console, together with the reason why the workflow cannot be accessed.
WS_Processing        The principal state of a workflow, when files are being received, processed, and recognized. The word "Processing" is displayed in the console.
WS_StartingProcess   The state of a workflow after the start command has been executed and before information about the beginning of processing has been returned. The word "Starting" is displayed in the console.
WS_Suspended         The state of a workflow that has been stopped. The word "Stopped" is displayed in the console.
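A client could use these states to decide whether to submit a job, wait, or cancel. The following Python sketch shows one possible policy; it models the enumeration locally and does not call the COM API. The state names come from the table above, but the numeric values and the mapping of states to actions are assumptions made for the illustration.

```python
from enum import Enum, auto


class WorkflowState(Enum):
    # Names follow WorkflowStateEnum; the numeric values are placeholders,
    # not the values used by the COM API.
    WS_ApplyingSettings = auto()
    WS_Crawling = auto()
    WS_Finishing = auto()
    WS_NotAvailable = auto()
    WS_Processing = auto()
    WS_StartingProcess = auto()
    WS_Suspended = auto()


# Assumed policy: the two start-up states are short-lived, so waiting is reasonable.
STARTING_STATES = {WorkflowState.WS_ApplyingSettings, WorkflowState.WS_StartingProcess}


def decide(state: WorkflowState) -> str:
    """Return 'submit', 'wait' or 'cancel' for a job, given the workflow state."""
    if state is WorkflowState.WS_Processing:
        return "submit"
    if state in STARTING_STATES:
        return "wait"
    # Stopped, unavailable, crawling or finishing: do not submit now (assumed).
    return "cancel"


print(decide(WorkflowState.WS_Processing))       # submit
print(decide(WorkflowState.WS_StartingProcess))  # wait
print(decide(WorkflowState.WS_Suspended))        # cancel
```

An integration may well treat WS_Crawling or WS_Finishing differently; the three-way split here only illustrates the check described in the scenario.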
Besides workflow states, it is possible to get the state of the server.
Interface  Name                        Description
IClient    Connect(string serverName)  Establishes a connection with the server. If the server is stopped, a COMException is thrown with this text: “ABBYY Recognition Server is not available: The client has successfully connected to the server, but the server is not running.”
A method that deletes a job and all of its images has been added to the IClient interface.
Interface  Name                     Description
IClient    DeleteJob(string jobId)  Deletes a job with all of its images.
It is now possible to get the server’s Exceptions folder via the IClient interface.
Interface  Name                    Type               Description
IClient    ServerExceptionsFolder  string, read-only  Returns the folder with the server’s exceptions.
It is now possible to switch verification on or off using IXmlTicket.
Interface           Name                       Type                  Description
IRecognitionParams  VerificationMode           VerificationModeEnum  Returns the verification type: whether verification will be performed or not.
IRecognitionParams  VerificationModeThreshold  double                Sets the verification threshold.
VerificationModeEnum is an enumeration that defines the verification types.
Name                           Description
DVM_DoNotVerify                Verification is switched off.
DVM_VerifyAlways               Documents are always verified.
DVM_VerifyIfThresholdExceeded  Documents in which the number of low-confidence characters exceeds the threshold (VerificationModeThreshold) are verified.
Implemented in: Release 1
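The effect of the three verification modes can be summarized in a short Python sketch. It models the decision only and is not the server's implementation: the mode names come from VerificationModeEnum, while the function, its parameters, and the unit of the threshold (the source states only that it is a double) are assumptions.

```python
from enum import Enum, auto


class VerificationMode(Enum):
    # Names follow VerificationModeEnum; the numeric values are placeholders.
    DVM_DoNotVerify = auto()
    DVM_VerifyAlways = auto()
    DVM_VerifyIfThresholdExceeded = auto()


def needs_verification(mode: VerificationMode,
                       low_confidence_chars: float,
                       threshold: float) -> bool:
    """Decide whether a document would be sent to verification.

    low_confidence_chars and threshold are assumed to use the same unit.
    """
    if mode is VerificationMode.DVM_DoNotVerify:
        return False
    if mode is VerificationMode.DVM_VerifyAlways:
        return True
    # DVM_VerifyIfThresholdExceeded: verify only above the threshold.
    return low_confidence_chars > threshold


print(needs_verification(VerificationMode.DVM_VerifyIfThresholdExceeded, 12, 10))  # True
print(needs_verification(VerificationMode.DVM_VerifyIfThresholdExceeded, 10, 10))  # False
```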
9.4.3. Deleting jobs
The following method has been added to the COM-based and Web-based API to support deleting a job
after asynchronous processing.
A method that deletes a job and all of its images has been added to the IClient interface.
Interface  Name                     Description
IClient    DeleteJob(string jobId)  Deletes a job with all of its images.
Implemented in: Release 1
10. UI and Documentation localization
ABBYY Recognition Server 4 is localized as shown in the table below. The localization added in Release 2 is the
Help file for Scanning Station in French.
                      English  Russian  French  German  Italian  Spanish  Chinese  Portuguese (Brazil)  Czech  Hungarian  Polish
Resources
Console               +        +        +       +       +        +        +        +                    +      +          +
Indexing Station      +        +        +       +       +        +        +        +                    +      +          +
Verification Station  +        +        +       +       +        +        +        +                    +      +          +
Scanning Station      +        +        +       +       +        +        +        +                    +      +          +
Protection            +        +        +       +       +        +        +        +                    +      +          +
Help
Console               +        +        +       +       +        +        -        -                    -      -          -
Indexing Station      +        +        +       +       +        +        -        -                    -      -          -
Verification Station  +        +        +       +       +        +        -        -                    -      -          -
Scanning Station      +        +        +       +       -        -        -        -                    -      -          -
Open API              +        -        -       -       -        -        -        -                    -      -          -
Admin Guide           +        +        +       +       +        +        -        -                    -      -          -
EULA                  +        +        +       +       +        +        +        +                    +      +          +
Installer
Recognition Server    +        +        +       +       +        +        +        +                    +      +          +
IFilter               +        +        +       +       +        +        +        +                    +      +          +
Autorun               +        +        +       +       +        +        +        +                    +      +          +
Implemented in: Release 1 Multilingual, Release 2
Corrected Issues
Corrected in Release 2
№ 1
Office: ABBYY Europe
Description: It was not possible to use the NEW operator to create an InputFile object (in the COM API under 64-bit operating systems). Now, for the xmlTicket.InputFiles.Add(inputFile); method to work, the InputFile object should be created using the CreateInputFile method: InputFile file = _clientObject.CreateInputFile();

№ 2
Description: Microsoft Search IFilter for SharePoint 2013 could not be used for indexing PDF files due to a Microsoft restriction. Now PDF files can be indexed again after installing the cumulative update package for SharePoint Server 2013.

№ 3
Description: The “There is no workflow for processing IFilter requests” error occurred after adding the IFilter component to a computer with Recognition Server already installed.

№ 4
Description: A license without ISIS drivers allowed scanning with ISIS.
Corrected in Release 1
№ 1
Office: ABBYY USA
Description: There were extra spaces in PDF files for particular images. To solve the problem, the following registry value can be created and enabled:
[HKCU\Software\ABBYY\RecognitionServer\4.0\OCRProcessor\Export\DebugOptions]
"Final_PdfUseScalingForPreventingVirtualSpaces"="true"
Note that if the RS Processing service runs under the LocalSystem account, HKCU must be changed to “HKEY_USERS\S-1-5-18”.

№ 2
Office: ABBYY USA
Description: Some PDF files were opened on the second page.
Known Issues
E-mail support
Description: When Exchange Mailbox is selected as input and output, and the "Reply to all:" mode is selected, Recognition Server does not send the resulting letter to the addresses indicated in the "To:" field but only to the sender and to the addresses in the "CC:" field.
Solution or workaround: In the "To:" field, only the mailbox checked by Recognition Server should be specified. In the "CC:" field, all the addressees that must receive the recognition result should be indicated.
Release with issue: Release 2; Release 1; Release 1 (specially for 3A)
Export
Description: The settings that map index fields to SharePoint columns are not inherited when upgrading from Release 1 to Release 2. Only document types are preserved.
Release with issue: Release 2

Description: If the Content Type configured in the indexing parameters contains required fields, the values of the documents' index fields must be non-empty at the export step. Otherwise, these documents are left checked out in MS SharePoint with an error in the processing notes.
Solution or workaround: All index fields configured as required must be assigned values.
Release with issue: Release 2

Description: The export options for PDF/A-2b and PDF/A-3b create files in the PDF/A-2u and PDF/A-3u formats, respectively. (As 2b and 3b formats of PDF/A files are subsets of 2u and 3u formats.)
Release with issue: Release 2; Release 1

Description: When exporting documents in Arabic to PDF with MRC, the visual quality of the picture layer becomes worse than in the original image. At the same time, the text layer is correct and can be used for searches.
Release with issue: Release 2; Release 1; Release 1 (specially for 3A); Arabic edition

Description: When Arabic text is exported to *.docx format, it is oriented from left to right, which is incorrect.
Solution or workaround: Enable Microsoft Office 2007 Language Settings (Programs > Microsoft Office Tools > Microsoft Office 2007 Language Services).
Release with issue: Release 2; Release 1; Release 1 (specially for 3A); Arabic edition

Description: After exporting to PDF-a (tagged PDF), the text layer differs from the text layer in a non-tagged PDF and from the original image.
Solution or workaround: It is recommended to use Adobe Reader 11 to open tagged PDF files. Adobe Reader 10 opens tagged PDF files incorrectly.
Release with issue: Release 2; Release 1; Release 1 (specially for 3A); Arabic edition

Description: The structure of Arabic documents is not saved in exported files when using the Formatted mode.
Solution or workaround: Use Exact Copy or Plain Text mode.
Release with issue: Release 2; Release 1; Release 1 (specially for 3A); Arabic edition

Description: Failed files are duplicated if several export profiles are used and if they are sent to an output folder. For example, each export profile sends a failed file to an output folder.
Release with issue: Release 2; Release 1; Release 1 (specially for 3A); Arabic edition
Workflow
Description: The child sites of a SharePoint site cannot be selected for crawling within one workflow. When specifying the settings for processing documents in SharePoint, it is possible to select the top-level site and any of its libraries.
Solution or workaround: Create separate workflows for the child sites.
Release with issue: Release 2

Description: Index fields or document properties are not saved into the SharePoint libraries when crawling the SharePoint libraries and exporting the output files to the source library. (There is no option to set up saving values into the SharePoint columns.)
Release with issue: Release 2

Description: It is not possible to establish a connection to SharePoint in the workflows of Recognition Server 3.5 if it was installed on a computer with Recognition Server 4 Release 2 already installed.
Release with issue: Release 2
IFilter
Description: IFilter in Recognition Server 4 for SharePoint 2013 cannot index JPEG files.
Solution or workaround: For processing JPEG files stored in SharePoint, we recommend using a Document Library workflow instead of IFilter and specifying a folder in SharePoint as the input folder.
A possible workaround for processing files stored in SharePoint 2013 is to create two workflows.
One workflow of type Document Library should have a file name mask that allows processing only JPEGs. You can specify the same SharePoint library as input and output. As a result, all JPEG files can be replaced with PDF files with a text layer.
The second workflow of type IFilter should index other image files stored in SharePoint.
Release with issue: Release 2; Release 1; Release 1 (specially for 3A); Arabic edition
Recognition
Description: If both English and Arabic are selected as recognition languages, the orientation of a document can be detected incorrectly.
1. The orientation of documents in Arabic is always correct.
2. The orientation of documents in two languages is generally correct, but there may be some documents with incorrect orientation.
3. The orientation of documents in English is always incorrect in this case.
The problem is difficult to avoid. Although orientation correction on a Verification Station helps to get correctly recognized text, the orientation of the document will remain incorrect after export.
Solution or workaround: Process documents in English separately and select only English as the recognition language for them.
Release with issue: Release 2; Release 1; Release 1 (specially for 3A); Arabic edition

Performance
Description: The processing speed of the Document Library workflow type is several times slower than that of the Hot Folder workflow type.
Release with issue: Release 2; Release 1

Protection
Description: If the license contains both types of page limits (standard and gothic) but no standard pages are left in the page counter, it cannot be selected as the current license, even if some gothic pages are left for processing.
Release with issue: Release 2

Server operation
Description: A "Not enough memory" error may occur when processing very large multi-page files.
Solution or workaround: In certain cases, the following workaround can be used: disable text layer substitution by setting ReplaceTextLayerOnlyInPdfs to "False".
Release with issue: Release 2

Description: When processing multi-page files of 3,000 pages, the following error message appears: "OCR Processor: Not enough storage is available to complete this operation."
Solution or workaround: Split files containing 3,000 pages into smaller files.
Release with issue: Release 2; Release 1; Release 1 (specially for 3A)

Description: Documents are not submitted for recognition until all files of the same job in the import folder are loaded. If a job contains too many files to be imported, the Processing Stations can stay idle for a significant period of time.
Release with issue: Release 2; Release 1; Release 1 (specially for 3A); Arabic edition
System Administration
Description: The database where the job log is stored is not compatible with the previous version of the database. For this reason, when upgrading to Recognition Server Release 1 or Release 2 from previous releases of Recognition Server 4, it is not possible to save the job log.
Release with issue: Release 2; Release 1

Description: The Event log for the new version of the application (RS 4.0) is stored in a new branch to speed up working with the Administration Console. After the upgrade from RS 3.5 to RS 4.0, the previous Event log is not displayed in the Console, but it can be viewed in the system's application log.
Release with issue: Release 2; Release 1
Verification
Description: If a barcode was not correctly recognized and documents were not separated, re-recognizing this barcode at the verification stage does not help to separate the documents correctly.
Release with issue: Release 2; Release 1; Release 1 (specially for 3A); Arabic edition

Description: Verification Stations of Recognition Server 4 cannot be started if a Verification Station of Recognition Server 3.5 has already been started on the same computer.
Release with issue: Release 2; Release 1
Customization, Scripts
Description: Only files located in the same folder can be addressed from an XML Ticket. It is not possible to address files in subfolders.
Release with issue: Release 2; Release 1; Release 1 (specially for 3A); Arabic edition

Description: It is not possible to create new objects using the NEW operator under 64-bit operating systems in the COM API. The warning is given in the Open API Help: Open API overview -> COM-based API -> Using the COM-based API within 64-bit Applications.
Release with issue: Release 2; Release 1

Description: The FileName property of the XMLTicket object contains the full path to the file if the file was added to processing via the API. The FileName property contains only the name of the file if the file was added to processing via a hot folder or other standard inputs.
Solution or workaround: The OriginalFileName property of the InputFile object can be used:
    ABBYYRecognitionServer.Client client = new ABBYYRecognitionServer.Client();
    ABBYYRecognitionServer.XmlTicket xmlTicket = client.CreateXmlTicket(workflowName);
    ABBYYRecognitionServer.InputFile inputFile = new ABBYYRecognitionServer.InputFile();
    inputFile.FileName = filePath; // the path to the existing file
    xmlTicket.AddImage(filePath);
    xmlTicket.InputFiles.Item(0).OriginalFileName = Path.GetFileName(filePath);
Another workaround is to add a small piece of code to the script where the original file name is used: check whether the FileName property contains ‘\’ and, if so, extract only the file name from the whole path; otherwise, FileName already contains the name of the file.
Release with issue: Release 2; Release 1
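The script-side check described in the second workaround can be sketched as follows. The sketch is in Python for illustration only; the function name is an assumption, and an actual script would apply the same test to the FileName property in its own scripting language.

```python
def original_file_name(file_name: str) -> str:
    """Return only the file name, whether or not a full Windows path was given."""
    if "\\" in file_name:
        # FileName holds a full path (the file was added via the API):
        # keep the part after the last backslash.
        return file_name.rpartition("\\")[2]
    # FileName already holds just the name (hot folder or other standard inputs).
    return file_name


print(original_file_name(r"C:\Scans\Inbox\invoice.tif"))  # invoice.tif
print(original_file_name("invoice.tif"))                  # invoice.tif
```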
Help
Description: The following features are not described in the Help file:
1) Export to the specific column types of MS SharePoint
2) Import of registration parameters (lists of index fields from the workflow) at Scanning Station
3) Notifications
4) User interface changes of the Jobs node
Release with issue: Release 2