Recognition Server 4.0 Release Notes

ABBYY Headquarters
P.O. Box 54, Moscow 129301, Russia
Tel.: +7 (495) 783 3700
Fax: +7 (495) 783 2663
office@abbyy.com
www.ABBYY.com

© ABBYY. All rights reserved.

Table of Contents

Introduction
    About This Document
    About the Product
    Key Enhancements
        Release 2
        Release 1 Multilingual
        Release 1
        Release 1 (specially for 3A)
        Arabic Edition
    Installing the New Version
    License Usage
New Features and Improvements
1. Server Features
    1.1. Separate workflow queues
    1.2. Easy recovery after failure without data loss
    1.3. Support working on Failover cluster
    1.4. Internal database
    1.5. Server exceptions folder
2. Administration Console
    2.1. User rights management
        2.1.1. Usage of Active Directory groups
    2.2. Logs and reports
        2.2.1. Improved logging
        2.2.2. Saving information about the operator who edited or rejected the document
        2.2.3. Correspondence between input and output files
    2.3. Notifications
        2.3.1. Including server and workflow names into the text of notification messages
        2.3.2. Notification about near license expiry
    2.4. Job rejection without loss of files
    2.5. Interface improvements
        2.5.1. Main window of Administration Console
        2.5.2. Workflow status pane
    2.6. Soft stop of the workflow processing
3. Workflow settings
    3.1. Document Library workflow type
        3.1.1. Periodical crawling of document libraries
    3.2. Input settings
        3.2.1. Processing SharePoint libraries
        3.2.2. Using IFilter for processing PDF files in MS SharePoint
        3.2.3. Filtering files for processing and settings for unprocessed files
        3.2.4. Using the SSL protocol for data protection
    3.3. Processing settings
        3.3.1. Special mode for processing technical drawings
        3.3.2. Despeckle images option
        3.3.3. Additional fonts
        3.3.4. To speed up processing, text in pictures is not recognized by default
        3.3.5. Blank page detection settings
    3.4. PDF processing options
        3.4.1. Improved MRC compression method of output PDF files
        3.4.2. Version, format, and other parameters of an output PDF file
        3.4.3. Export to PDF/A-3 format
        3.4.4. Tagged PDF enabled by default
        3.4.5. Possibility to skip processing PDFs with a text layer
        3.4.6. Ability to embed a text layer and keep the image and all PDF file properties
        3.4.7. Enabling and disabling Fast Web View for PDF files
        3.4.8. Using PDF text layer for recognition results improvement
        3.4.9. Using PDF text layer for generating quality output files of different formats
    3.5. Output settings
        3.5.1. Overwriting files in an output folder
        3.5.2. Export format compatible with FineReader Engine 11
        3.5.3. KeepPages parameter
        3.5.4. Export to specific column types in SharePoint
        3.5.5. Export to ePub3 format
        3.5.6. Settings of units measurement for export to ALTO XML
4. Document processing
    4.1. Improved recognition of Arabic texts
    4.2. Ability to limit the number of processed pages in input files
    4.3. Support of new barcode type - USPS-4CB (Intelligent Mail Barcode)
    4.4. Disabled image compression of lossy JBIG2 type
5. Scanning Station
    5.1. Sending registration parameters values to index fields
6. Verification and Indexing Stations
    6.1. Manual selection of documents for verification and indexing
    6.2. Saving documents
    6.3. Timeout of inactivity
    6.4. Improved work with document types and index fields on Indexing Stations
        6.4.1. Import of index fields from files
        6.4.2. Quick input of index fields
        6.4.3. Possibility to combine values from several regions into a one index field
    6.5. User interface changes
        6.5.1. Verification Station
        6.5.2. Indexing Station
7. Operating systems
    7.1. Support for Windows Server 2012 Release 2
    7.2. Discontinued support for Windows XP and Windows Server 2003
8. Scripting
    8.1. Access to subsequent pages from the document assembly script
    8.2. Detecting the workflow name by script
9. Changes in the COM-based API and Web API
    9.1. Namespace changes
    9.2. Compatible API
    9.3. Automatic API deployment on 64x operating systems
    9.4. Added objects
        9.4.1. Correspondence between input and output files
        9.4.2. Support of the recognition service scenario (for NLC)
        9.4.3. Deleting of jobs
10. UI and Documentation localization
Corrected Issues
Known Issues

Introduction

About This Document

This document describes the features that have been implemented in the ABBYY Recognition Server 4.

About the Product

ABBYY Recognition Server 4 provides new technology, including significantly improved recognition of texts in Arabic, new export settings, and other technology improvements. The main server features, such as stability, performance, and auto-recovery have been revised and improved. The new version can also process document libraries stored in read-only folders. Other improvements include advanced logging, GUI changes, and bug fixes. The main changes introduced in the new version are described below.

Key Enhancements

Release 2

Part #: 1135/6, build # 4.0.3.1167, OCR Technologies build # 13.0.15.131, release date: 14/11/2014

New features and changes in Release 2 are marked in blue here and in the rest of the document.

The major features:
• Improved MRC compression method (producing compressed PDF files of the minimum size)
• Using IFilter for processing PDF files in MS SharePoint
• Processing the SharePoint document libraries:
    o Crawling of the whole site (including multiple libraries and folders)
    o Periodical re-crawling settings
• Export to specific column types in SharePoint (support of Date, Number, and some other formats)
• Export to PDF/A-3

Other improvements:
• Improved email notifications:
    o Notifications on near license expiry
    o Information on server name in the message text
• Sending registration parameters values from Scanning Station to index fields
• Soft stop of the workflow processing
• Support working on failover cluster
• Using PDF text layer for generating quality output files
• Blank page detection parameters
• New barcode type - USPS-4CB (Intelligent Mail Barcode)
• New export format: ePub3
• Settings of units measurement for export to ALTO XML
• Disabled image compression of lossy JBIG2 type
• Tagged PDF enabled by default
• Possibility to combine values from several regions into one index field
• Access to subsequent pages from the document assembly script
• Detecting the workflow name by script

Release 1 Multilingual

Part #: 1135/5, build # 4.0.2.952, OCR Technologies build number 13.0.13.21, release date: 14/08/2014

• Translation of the UI and help into the following languages:
    o French
    o German
    o Italian
    o Spanish
    o Chinese
    o Portuguese (Brazil)
    o Czech
    o Hungarian
    o Polish
• Bug fix for ABBYY USA

Release 1

Part #: 1135/4, build # 4.0.2.943, OCR Technologies build number 13.0.13.15, release date: 19/05/2014

• Improved failure recovery
• Ability to limit the number of processed pages
• Verification and Indexing Station improvements
    o Selecting documents from a queue
    o Timeout settings
    o Saving changes on the stations
• Indexing Station improvements
    o Importing document types from an external source

Release 1 (specially for 3A)

Part #: 1135/3, build # 4.0.1.795, OCR Technologies build number 13.0.8.108, release date: 29/01/2014

• Improved server operation
    o Redundancy
    o Reports and statistics
• Enhanced work with PDF files
• Ability to process documents in read-only folders
• Processing documents placed in SharePoint libraries
• Latest technology version

Arabic Edition

Part #: 1135/2, build # 4.0.0.461, OCR Technologies build number 13.0.0.58, release date: 06/05/2013

• Improved recognition of Arabic texts
• Ability to process documents in read-only folders
• Improved logging
• Bug fixes

A detailed description of the changes can be found in Release Notes for Recognition Server Arabic Edition.

Installing the New Version

Recognition Server 4 can be installed on the same computer where Recognition Server 3.5 or earlier is installed. The configuration settings used in a previous version of ABBYY Recognition Server can be imported into ABBYY Recognition Server 4. For further information, please see the section “Upgrade from the previous versions of ABBYY Recognition Server” in the System Administrator’s Guide.

Note: Some changes have been made to the XML result file, which describes the results of task processing. This may require changes in the software used for integrating ABBYY Recognition Server with data storage systems. For details, see the description of changes to the XML result file below or refer to the “XML Result” section in the Help file.
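
Integration software that reads the XML result file can be written so that it accepts both the old and the new layout. The following Python sketch is only an illustration of that idea. It assumes that the operator fields verificationUserName and indexingUserName (described in section 2.2.2) appear on <JobDocument> either as attributes or as child elements; the release notes do not say which, so the helper checks both. The sample documents are hypothetical and omit any namespaces or additional elements the real file may contain.

```python
import xml.etree.ElementTree as ET

# Hypothetical result files; the real XML result layout may differ.
SAMPLE_NEW = """<XmlResult>
  <JobDocument verificationUserName="jdoe" indexingUserName="asmith"/>
</XmlResult>"""

SAMPLE_OLD = """<XmlResult>
  <JobDocument/>
</XmlResult>"""


def get_field(element, name):
    """Return a field stored as an attribute or as a child element, or '' if absent."""
    if name in element.attrib:
        return element.attrib[name]
    child = element.find(name)
    if child is not None and child.text:
        return child.text.strip()
    return ""


def read_operators(xml_text):
    """Return (verifier, indexer) pairs, one per <JobDocument> element."""
    root = ET.fromstring(xml_text)
    return [
        (get_field(doc, "verificationUserName"), get_field(doc, "indexingUserName"))
        for doc in root.iter("JobDocument")
    ]


if __name__ == "__main__":
    print(read_operators(SAMPLE_NEW))
    print(read_operators(SAMPLE_OLD))
```

Returning an empty string for a missing field lets the same code process result files produced by earlier versions, and it also matches the documented behavior that the fields remain empty when verification and indexing are switched off.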

License Usage

Recognition Server 4 cannot work with licenses generated for previous versions of Recognition Server. Recognition Server 4 can work with a license generated for Recognition Server Arabic Edition, but since some of the license parameters have been changed (the ISIS option has been added), we recommend generating new licenses for Recognition Server 4 Release 1 (for 3A), Recognition Server Release 1, and other maintenance releases.

New Features and Improvements

1. Server Features

1.1. Separate workflow queues

Each workflow now has a separate queue, which prevents other workflows from being stopped if one of them has an overloaded queue. The default number of jobs in the queue of each workflow is 50. This number can be changed in the Configuration.xml file: MaxJobsCount="50". Prior to this change, this key set the total number of jobs in the server queue. Now it sets the number of jobs in the queue of each workflow.

Implemented in: release 1 for 3A

1.2. Easy recovery after failure without data loss

Recovery after failure is now smoother and does not require manual copying of files. GUIDs are no longer used in file names, so it is always possible to find a file by its name. When Recognition Server processes jobs, files are stored in the folder %programdata%\ABBYY Recognition Server 4.0\RS4WF\Images\<Workflow name>. File names are the same as the names of the source files, with one difference: the job ID is added at the beginning of the name.

Implemented in: release 1 for 3A

1.3. Support working on Failover cluster

Recognition Server can now run on a failover cluster. Recognition Server instances can be installed on separate nodes of the failover cluster. All Recognition Server settings can be stored in a shared folder available to the cluster.

Please note: this feature has not been tested. Testing can be performed upon request. (Detailed instructions for installing Recognition Server on a failover cluster will be provided later.)

Implemented in: Release 2.

1.4. Internal database

The current system state is now stored in an internal SQLite database. This database is installed together with Recognition Server and is invisible to users.

Implemented in: Arabic Edition

1.5. Server exceptions folder

A new folder with server exceptions is now created in C:\ProgramData\ABBYY Recognition Server 4.0\RS4WF\Exceptions. This folder contains jobs that failed due to faulty operation of the server or server flows. Jobs may fail if, for example, the database or the configuration file becomes corrupted.

Implemented in: release 1 for 3A

2. Administration Console

2.1. User rights management

2.1.1. Usage of Active Directory groups

In the Users node of the Administration Console it is now possible to add groups of users from Active Directory. The full name of a group should be specified, including the domain name. A role (e.g. Verifier or Indexer) can be assigned to a group, and all members of the group will have the rights corresponding to the assigned role. Any users added to the group will automatically receive the rights required to work with Recognition Server.

Implemented in: release 1 for 3A, release 1

When a user adds a new group, the application checks if this group exists in Active Directory and displays a warning if the group cannot be found. The user can still add a group with this name.

Implemented in: release 1

2.2. Logs and reports

2.2.1. Improved logging

The job log contains records about every finished job in Recognition Server. The details pane has two tabs: the Files tab shows the input and output files of the job and the paths to these files, and the Details tab shows detailed information about the job, including Processing notes.

Now the job log may contain more than 500 records.
The number of records is now limited only by the size of the log or by the maximum number of days for which data is logged. These values can be changed in the Job Log Properties dialog box, which can be opened by clicking the Options button.

The Find button allows users to search for records by input and output file name or by error text. Wildcard searches are supported. The job log can be saved to a *.csv file by clicking Export to CSV File on the shortcut menu.

By default, the Job Log provides two views: all jobs without filtering and failed jobs. It is also possible to create custom views by applying a custom filter to the log. To create a custom view, select the corresponding item from the shortcut menu or click the Create Custom View button and specify a view name and filtering settings. The custom log will appear in the tree below the Job Log node.

Implemented in: Arabic edition

2.2.2. Saving information about the operator who edited or rejected the document

The XML result file now contains information about the operator who verified or indexed the document. This information is available in the verificationUserName and indexingUserName fields inside the <XmlResult> and <JobDocument> tags. If indexing and verification are switched off, these fields will remain empty. The XML result file now also contains information about the time of document indexing and verification.

The job log contains information about rejected jobs in Processing notes (who rejected a job and on which station).

Implemented in: release 1 for 3A

2.2.3. Correspondence between input and output files

The XML result file allows you to establish a correspondence between the original and the resulting files. In the log, you can see the input and output files for each job.

Changes in the XML result file:
• The attribute “Id” has been added to the <InputFile> tag. This is the identifier of the input file.
• An embedded <Page> tag has been added to the <InputFile> tag. It has the following parameters: Id, the page identifier of the input document; PageNumber, the number of the page in the input file.
• A <Pages> tag with embedded <FileId> and <PageId> tags has been added to the <JobDocument> tag. <FileId> is the input file identifier and <PageId> is the page identifier, which indicates the page of the input file from which the current processed page originates.

Changes in the log: The log now has a Files tab which shows input and output files for each job.

Implemented in: Arabic edition

2.3. Notifications

2.3.1. Including server and workflow names into the text of notification messages

The server name and the workflow names are now included in the text of the notification messages sent by email to the administrator. This makes it easier to manage servers and workflows and to troubleshoot problems.

The subject of the email message has the following structure, which can be used for filtering the emails:

ABBYY Recognition Server (<Server Name>): <Reason of notification>

Implemented in: Release 2.

2.3.2. Notification about near license expiry

New notifications about near license expiry have been added. Notifications can be sent based on the following event notification options:
• percentage of remaining pages in the license;
• number of days left before the license expires.

Implemented in: Release 2.

2.4. Job rejection without loss of files

It is now possible to reject or delete a job without deleting the files. The new commands Reject Job and Reject All Jobs are used to reject a job or all jobs. The files will be saved to the Exceptions folder of the corresponding workflow. The commands Delete Job and Delete All Jobs are used to delete a job or all jobs. The files will be placed into the Exceptions folder of the server.

Implemented in: release 1 for 3A

2.5. Interface improvements

2.5.1. Main window of Administration Console

The interface of the main window has been changed. New toolbars, panes, and buttons have been added. The order of nodes is also slightly different. The stations are now gathered in the Stations node.

2.5.2. Workflow status pane

The Workflow status pane displays the current state of the selected workflow. The available information depends on the workflow type. The status pane displays the following information:
• State: started or stopped
• Start time
• Stop time (if the workflow was stopped)
• Duration
• Total number of jobs
• Number of processed jobs
• Number of copied files
• Number of failed jobs
• Paths to Output folders
• Path to Exceptions folder

For a Document Library workflow which has been started, the status pane also displays a progress bar with the percentage completed. For a workflow with errors, the reason for the failure is given in the status pane.

2.6. Soft stop of the workflow processing

It is now possible to stop the processing of jobs using the so-called “soft” stop mechanism. With a soft stop, all current jobs are processed to completion and no new jobs are taken into processing. After the results of all current jobs are published, the workflow is stopped.

To perform a soft stop manually, use the Stop command. If processing runs on a schedule, the workflows are always stopped softly.

If processing must be interrupted and the current jobs must be postponed without completion, use the manual Stop immediately command. It frees the computing power at once. The postponed jobs are finished when the workflow is started again.

Implemented in: Release 2.

3. Workflow settings

3.1. Document Library workflow type

New Recognition Server functionality allows users to process document libraries which shouldn’t be modified. Now users don’t need to copy files to the Hot Folder.
Instead, they can simply specify the root folder of a document library as the input folder and then specify an output folder, an output format, and processing settings. The document library will be recognized and the processed files will appear in the specified output folder. The structure of the original document library will be preserved. Files which do not require recognition can be skipped, or moved to the output folder if you need to preserve the entire structure of the document library. Unlike with a Hot Folder, the input files will not be deleted.

A new workflow type has been created especially for processing document libraries. The Document Library workflow will be stopped after all files in the indicated library are processed. If new files are placed into the library, the workflow must be restarted. As all processed files are registered, only new files will be processed. If the workflow settings have been changed and it is necessary to reprocess all files, use the Restart command (click the arrow next to the Start button to see the command).

As a document library might be quite big and take a long time to process, workflows of the Document Library type have a progress bar. See Workflow status pane for details.

Implemented in: Arabic edition, modified in release 1 for 3A

3.1.1. Periodical crawling of document libraries

A crawling frequency can now be set up for workflows of the Document Library type to ensure that newly added files are processed quickly. To do this, enable the new option Crawl for new files in library every:. The crawling period can be selected from the drop-down list (from 10 minutes to 11 hours) or typed manually, e.g. “2 hours” or “12 hours”.

After periodical crawling is enabled and the workflow is started, the system monitors the document library and counts down the time until the next crawl. If the Crawl for new files in library every: option is not enabled, the library is crawled only once.
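
A crawl period typed as text, such as “2 hours”, corresponds to a plain number of milliseconds, which is the unit used by the CrawlingInterval parameter in Configuration.xml. The conversion can be sketched in Python as follows. The function name and the accepted input format (a whole number followed by “minutes” or “hours”) are assumptions made for this illustration; they are not part of Recognition Server.

```python
import re

# Milliseconds per unit; only the units mentioned for the crawl period are covered.
UNIT_MS = {"minute": 60_000, "hour": 3_600_000}


def period_to_ms(text):
    """Convert a period such as '2 hours' or '10 minutes' to milliseconds."""
    match = re.fullmatch(r"\s*(\d+)\s*(minute|hour)s?\s*", text.lower())
    if match is None:
        raise ValueError(f"unsupported period: {text!r}")
    return int(match.group(1)) * UNIT_MS[match.group(2)]


if __name__ == "__main__":
    for period in ("10 minutes", "2 hours", "12 hours"):
        print(period, "=", period_to_ms(period), "ms")
```

For example, “2 hours” converts to 7200000 ms, which is the documented default of CrawlingInterval.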
The start time of crawling depends on the Workflow Activity settings (General tab). Settings of periodical crawling of document libraries can be also specified in the configuration file (Configuration.xml). © ABBYY. All rights reserved. Page 13 of 39 Parameter EnablePeriodicCrawling stands for enabling/disabling the periodical crawling, the possible values are True and False (the default is False). Parameter CrawlingInterval sets the crawling interval in milliseconds (the default value is 7200000 ms). Implemented in: Release 2. 3.2. Input settings 3.2.1. Processing SharePoint libraries SharePoint libraries can now be indicated as a source for a Document Library workflow. Users can indicate the input source: a site, a particular library or several libraries, a folder or several folders. If Export output files to source library option is enabled when configuring the input source of MS SharePoint, the output parameters will always include an output file with the export destination of SharePoint source libraries. Output files are saved into the same libraries/folders as they are at input. The format and naming schema of a file can be configured. By default the output files are saved under the same names as at input. If a file already exists, a new version is created. If Export output files to source library option is not enabled, than the output settings can be configured as usual, including saving the files into any SharePoint library/folder. Only one library/folder can be selected. If one and the same folder or library is indicated as input and output, files can be overwritten, or files with new names can be created, or the versions of the files can be changed. The behavior is determined by the option selected from the If file exists drop-down list in the Output Format Settings dialog box. See also Overwriting files in the output folder. © ABBYY. All rights reserved. Page 14 of 39 Limitations: 1. 
If the input library is the same as the output library, the option For each folder cannot be used; you can only create a job for each file.
2. Only one site including all its libraries can be processed within one workflow. Separate workflows should be created for child sites.
Implemented in: release 1 for 3A. Possibility to indicate several libraries as input was implemented in Release 2.
3.2.2. Using IFilter for processing PDF files in MS SharePoint
Microsoft Search IFilter for SharePoint 2013 can again be used for indexing PDF files due to the lifting of the Microsoft ban. To enable this possibility, the cumulative update package for SharePoint Server 2013 should be installed. Link to install it: http://support2.microsoft.com/default.aspx?scid=kb;EN-US;2882989
Please note: The update for MS SharePoint should be installed before the installation of Recognition Server 4 Release 2. If Recognition Server 4 Release 2 has already been installed, install the update for MS SharePoint, then run the installation of Recognition Server 4 Release 2 again and use the Repair command to modify the installation.
Implemented in: Release 2.
3.2.3. Filtering files for processing and settings for unprocessed files
It is now possible to filter files to be processed using a “mask” (i.e. a template) for file names. If you specify a name mask, the program will process only files with names and extensions which fit the mask. Files can be selected in the workflow properties: Input tab, Select files to process.
You can use the “?” and “*” symbols in the mask. “?” stands for any single character and “*” stands for any number of any characters. For instance, the mask *.* will select all files, the mask *.tiff will select only files with the “.tiff” extension, and the image*.* mask will select files of all types whose names start with “image”.
For workflows of the Hot Folder and Mail types, the default mask is *.*, i.e.
all files from the Input folder will be processed. For workflows of the Document Library type, the default mask selects files in all of the supported image formats (*.bmp, *.dib, *.rle, *.dcx, *.djvu, *.djv, *.gif, *.jb2, *.jbig2, *.jp2, *.j2k, *.jpf, *.jpx, *.jpc, *.jpg, *.jpeg, *.pcx, *.pdf, *.png, *.tif, *.tiff, *.wdp, *.wmp).
You can specify any other mask that suits your needs. For instance, you may wish to have a mask that processes image files but ignores files with the “.tmp” extension, which may be created in the input folder when scanning documents.
Under Other files, you can specify which actions should be performed on files that do not fit the mask:
• Exceptions folder - Any files that do not fit the mask will be placed into the Exceptions folder. Use this option when only files of certain types must be processed.
• Output folders - Any files that do not fit the mask will be placed into an output folder. Use this option for processing archives where all documents must be preserved together with the folder structure. Processed image files will be converted to images with a text layer and all other files will be copied or moved to an output folder “as is.”
• No action - Any files that do not fit the mask will be ignored. Use this option when only files of certain types must be processed.
Note: We do not recommend using the No action option for workflows of the Hot Folder type, as this may fill up the folder with unprocessed files.
Note: A separate job is always created for unprocessed files. If the workflow must create one job per folder and a folder contains both processed and unprocessed files, the workflow will create one job for the processed files and another job for the unprocessed files.
The mask option is useful in the following scenarios:
• Hot Folder. Sometimes scanners create *.tmp files besides *.tiff files and place both kinds of files in the same folder. Only *.tiff files should be processed, and the *.tmp files should be ignored.
• Read-only folder. The user might need to recreate the structure of the input folder in the output folder. Only images should be processed and the other files must be moved to the output folder.
• Mail. Besides an attached image file, a letter may contain a logo or signature in GIF format. Only the attached image file should be processed and the GIF logos and signatures should be ignored.
The input files of failed jobs can now be moved to output folders, moved to the Exceptions folder, or ignored. To tell the program what it should do with failed jobs, use the Save failed jobs to option on the Quality control tab of the Workflow Properties dialog box.
Note: If the user chooses to move unprocessed or failed files to output folders and the workflow contains several output folders, the unprocessed or failed files will appear in all output folders.
Implemented in: Arabic Edition, modified in release 1 for 3A
3.2.4. Using the SSL protocol for data protection
Communicating with a POP3 server over the SSL protocol is now supported. If POP3 E-mail Server is selected as the source type, the option Use SSL becomes available. Port 995 should be specified in the Port number field.
Implemented in: release 1
3.3. Processing settings
3.3.1. Special mode for processing technical drawings
Working with technical drawings such as construction blueprints has been significantly improved. Since the processing of technical drawings requires settings different from those required for regular documents, users should enable the Processing mode for technical drawings option on the 2. Process tab of the Workflow Settings dialog box. It is recommended to enable this mode for documents that contain a lot of fine details. The graphical objects will remain unchanged and the text will be recognized.
Recognition in this mode is done in three directions:
• The direction of the principal orientation, which is automatically detected
• Rotated clockwise relative to the principal orientation
• Rotated counterclockwise relative to the principal orientation
In the XML output file, the orientation of the text will be indicated in the orientation attribute:
• RotatedClockwise
• RotatedCounterclockwise
• If not indicated, the orientation is “normal” (i.e. the text is oriented horizontally)
Note: Using this mode can slow down image processing.
Implemented in: release 1 for 3A
3.3.2. Despeckle images option
The Despeckle option is now available in the product GUI (Workflow properties, 2. Process tab, Advanced Processing Settings). This option removes noise from the image. Noise can be introduced by scanning, and it is recommended that it be removed for better data recognition. During despeckling, the program also removes background dots or boundary lines of raster forms.
By default, the option is switched off, because in some cases it can adversely affect recognition (the program may even fail to recognize some text fragments). We recommend switching the option on only if you are certain that it will help to remove noise from your images (please try it first on several sample images). The corresponding API method is RemoveGarbage.
Implemented in: release 1 for 3A
3.3.3. Additional fonts
This setting is only available in the configuration file.
By default, Recognition Server uses only a limited number of fonts so that the result does not depend on the set of fonts installed on each processing station. These fonts might not be enough to correctly display text in Chinese, Korean, Japanese, Thai, or Arabic. To solve this problem, a new parameter, AllowedFontsMode, is available in the section RecognitionParams of the configuration file (Configuration.xml).
Possible values are:
• Default – In this mode, only the following fonts will be used: Arial, Times New Roman, and Courier New.
• All – All possible fonts will be used. Please note that processing will take longer. It is also important that the user have the same set of fonts on all the processing stations; otherwise the result might be different on different computers.
Users can also use a custom font set as an addition to the main font set. In this case, a list of additional fonts can be added below the section RecognitionParams using the element AdditionalAllowedFont. This example illustrates adding the font AngsanaUPC to the set of main fonts:
<RecognitionParams RecognitionQuality="Fast" LookForBarcodes="true" VerificationMode="AlwaysVerify" RecognitionMode="FullPage" TextExtractionMode="false" AllowedFontsMode="Default">
<AdditionalAllowedFont>AngsanaUPC</AdditionalAllowedFont>
Implemented in: release 1
3.3.4. To speed up processing, text in pictures is not recognized by default
To speed up processing, recognition of text in pictures is now disabled by default. If you need to recognize text in pictures, you can enable this feature in the configuration file. This can only be done for the quality recognition mode. The name of the parameter is ProhibitHiddenTextDetection; the default value is true.
Implemented in: release 1
3.3.5. Blank page detection settings
Settings for flexible detection of blank pages have been added. They help to avoid incorrect blank page detection for low-quality images, images with noise left after scanning, images with non-textual objects, etc.
The margins, percentage of blackness, and objects allowed on a page for it to be considered empty can be specified in the Document Separation parameters.
Implemented in: Release 2.
3.4. PDF processing options
3.4.1.
Improved MRC compression method of output PDF files
The quality of output PDF files generated using the MRC method of compression has been significantly improved. The enhanced method of MRC compression now provides noticeably better visual quality of documents while keeping almost the same small file size. In minimizing the file size while preserving the visual quality, MRC compression of output files now shows the same results as competing products (incl. CVISION).
The improved compression methods are now used by default in all new and previously created workflows with the compressed PDF output format enabled (Enhanced compression (MRC) option). To disable the updated MRC and use the previous compression mode, set the LegacyMRCMode flag to True in the Configuration.xml of ABBYY Recognition Server settings.
To manage the quality/size parameters of the output files, the Max Quality – (balanced) – Min Size profiles can be selected. These profiles help you to select the desired output quality/size and have the settings configured automatically. For instance, when the Min Size profile is selected, the quality parameter is set to 30% and MRC compression is enabled.
Implemented in: Release 2.
3.4.2. Version, format, and other parameters of an output PDF file
Export settings for PDF and PDF/A have been expanded: it is now possible to specify a version for output PDF files and select a PDF/A standard. The list of available PDF standards includes PDF/A-1a, PDF/A-1b, PDF/A-2a, PDF/A-2b, and PDF/A-2u.
Implemented in: release 1 for 3A
3.4.3. Export to PDF/A-3 format
Export of output files to PDF/A-3 format is now supported. It is possible to select the PDF/A-3a, PDF/A-3b, or PDF/A-3u standards of the PDF/A format.
Please note: attachments cannot be written into the output PDF/A-3 file.
Implemented in: Release 2.
3.4.4.
Tagged PDF enabled by default
When a new output format for saving documents to PDF files is added, the option Enable tagged PDF (compatible with Adobe Acrobat 5.0 or above) is now enabled by default. This helps to avoid problems with excess spaces in words and ensures correct search within the PDF file.
Please note: this option may result in up to a 10% increase in the file size.
Implemented in: Release 2.
3.4.5. Possibility to skip processing PDFs with a text layer
It is now possible to skip the processing of PDF files. PDF files with a text layer can now be moved to an output folder if the user selects the option Do not modify files with high-quality text layer. The user can also select a detection mode:
• In Fast mode, the application looks for a text layer in the file. If a text layer is detected, the file will be moved to an output folder and the other export settings will be ignored. The application will not treat the pages in this file as OCRed, but please note that if there are other output folders with formats other than PDF specified, OCR will be performed, affecting the page counter.
• In Thorough mode, the application compares the text layer of a PDF file with OCR results (a piece of text on each page will be compared). If the text in the text layer and the text obtained through OCR are identical, the file will be moved to an output folder. In this case, pages are considered OCRed, which affects the page counter.
When a text layer is compared to OCR results, the default threshold is 5%. This means that the program will use the OCR results if there is more than a 5% difference between the texts. This threshold can be changed in the Configuration.xml file:
SkipRecognizePdfsWithTextLayerCoefficient="25"
This setting is located in the ExportFormat node and appears in the file when you set up output to PDF.
Notes: Files skipped in Fast mode will not be sent to operator stages (i.e. indexing or verification).
The setting is only applicable to source files in PDF format.
Implemented in: release 1 for 3A
3.4.6. Ability to embed a text layer and keep the image and all PDF file properties
Sometimes PDF files don’t have a good text layer but have bookmarks, attachments or other parameters which must be preserved. It is now possible to preserve all attributes of a PDF file and embed only recognized text. The option Modify text layer only is available on the Format Settings tab for PDF and PDF/A.
Note: The option is only applicable to source files in PDF format.
Implemented in: release 1 for 3A
3.4.7. Enabling and disabling Fast Web View for PDF files
The option Fast Web View is available on the Format Settings tab for PDF and PDF/A. If the option is enabled, a preview will be created for fast opening of the file on websites.
Implemented in: release 1 for 3A
3.4.8. Using PDF text layer for recognition results improvement
When PDF files with a text layer are OCRed by Recognition Server, the source text layer is used to improve the recognition results. For example, characters recognized with low confidence are checked against the text layer and copied from it.
Implemented in: release 1 for 3A
3.4.9. Using PDF text layer for generating quality output files of different formats
If an imported PDF file contains a text layer, it can be reused for creating quality output files in PDF and other formats, for example PDF/A, ALTO XML, etc.
When OCR is run on imported files, the original text layer is detected. The quality of each original text character is evaluated before it is copied to the resulting file. This algorithm ensures the same or better quality of the output file compared to the original file.
Please note that the license counter is decreased even if the original files contain a text layer.
Implemented in: Release 2.
3.5. Output settings
3.5.1.
Overwriting files in an output folder
It is now possible to overwrite an output file if it already exists in an output folder. If the option Overwrite if file exist is not selected, a 4-digit index will be added to the file name.
In the XML result file, the attribute RewriteIfFileExists has been added to the tag <FormatSettings>. The value true indicates that the files in the output folder were overwritten.
Implemented in: Arabic edition
When you save output files in a SharePoint library, you have a choice of the following options:
• Create new name – The output file will be given a new name.
• Overwrite file – The output file will replace the original file.
• Use SharePoint versioning options – The output file will replace the original file and a new version number will be calculated using the current settings of SharePoint versioning.
SharePoint options implemented in: release 1 for 3A
3.5.2. Export format compatible with FineReader Engine 11
Recognition Server 4 supports export to an internal FineReader format which is compatible with FineReader Engine 11. To export to this internal FineReader format, select FineReader Internal format (*.layout, *.image) as the output format. As a result, two files will be created with *.layout and *.image extensions.
This feature is useful for complicated image processing in FRE. Instead of creating a distributed system, Recognition Server will be used for text layer creation.
Implemented in: release 1 for 3A
3.5.3. KeepPages parameter
This setting is only available in the configuration file.
The new parameter KeepPages regulates page breaks in the output formats doc, rtf, and docx. This parameter is available in the export settings inside the ExportFormat tag of the configuration file (Configuration.xml). Possible values are true and false (the default value is false).
Usage scenario: The size of a text fragment on a page can decrease if the font size is decreased.
To keep the page breaks as in the source document, the parameter should be set to true; otherwise content from the beginning of one page may be placed on the preceding page. In other cases, the size of a text fragment can increase, and if you keep the page breaks, the end of the text fragment from one page can be placed on the following page. If this is the case, we recommend setting the KeepPages parameter to false.
Implemented in: release 1 for 3A
3.5.4. Export to specific column types in SharePoint
Export of index fields to specific SharePoint column types is now supported:
• Single line of text;
• Multiple lines of text;
• Choice (menu to choose from);
• Number;
• Currency;
• Date and Time;
• Yes/No (checkbox);
• Hyperlink or Picture;
• Managed Metadata.
The document attributes (index fields) should be mapped to the appropriate content types imported from the selected SharePoint library. To configure the mapping, click the Settings button when selecting the SharePoint document library in the output parameters.
In the Mapping Document Attributes to SharePoint Columns window, the links between the RS document types (created on the Indexing tab) and the SharePoint content types (submitted from the selected library) should be established. After the appropriate SharePoint content type is selected, the RS document attributes (index fields) can be mapped to the SharePoint columns.
Implemented in: Release 2.
3.5.5. Export to ePub3 format
Export of output files to ePub v.3 format is now supported.
Implemented in: Release 2.
3.5.6. Settings of units measurement for export to ALTO XML
A unit of measurement (pixels, inches, or millimeters) can be selected when configuring export to ALTO XML format.
Implemented in: Release 2.
4. Document processing
4.1. Improved recognition of Arabic texts
A new version of OCR Technologies is used in Recognition Server 4, where Arabic OCR has been significantly improved.
Please see the Test results for recognition speed and quality as compared to the OCR Technologies inside Recognition Server 3.5. There you can also find a comparison with other OCR products that can recognize texts in Arabic.
Besides these technological tests, the productivity of Recognition Server 4 was measured on 2,500 pages of Arabic texts which were exported to RTF. This test has shown Recognition Server 4 to be 17-20% faster compared to Recognition Server 3.5.
Implemented in: Arabic edition
4.2. Ability to limit the number of processed pages in input files
In many scenarios that involve searching document libraries, it is sufficient to have text from the first few pages in order to find a document. In such cases, clients would like to save time and pages in the page counter by limiting the number of processed pages to the first N pages in each file.
This feature can be switched on for IFilter and GSA connectors in the workflow settings via the GUI. For other workflows, it can only be switched on using the XML ticket.
Example of enabling this feature in the XML ticket:
<XmlTicket PageNumToRecognizeForSingleInputFile="2">
  <InputFile Name="50.pdf" />
  <ExportParams>
    <ExportFormat OutputFileFormat="Text" OutputFlowType="SharedFolder">
      <OutputLocation>D:\Output Folder</OutputLocation>
    </ExportFormat>
  </ExportParams>
</XmlTicket>
This feature will work only if there is no document assembly (the option Create one document for each file in job is selected); otherwise the setting will be ignored.
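For workflows where the page limit can only be set through the XML ticket, the ticket can be generated programmatically. The following Python sketch builds a ticket with the same elements and attributes as the example above, using only the standard library. The function name build_ticket and its parameters are hypothetical helpers introduced for illustration; they are not part of the Recognition Server API, and how the resulting ticket is submitted to the server is outside the scope of this sketch.

```python
import xml.etree.ElementTree as ET


def build_ticket(input_name, output_dir, max_pages, output_format="Text"):
    """Return an XML ticket string limiting recognition to the first max_pages pages.

    Hypothetical helper: only the element and attribute names are taken
    from the ticket example in these release notes.
    """
    if max_pages < 1:
        raise ValueError("max_pages must be a positive integer")

    # Root element carries the page limit for each input file.
    ticket = ET.Element(
        "XmlTicket",
        {"PageNumToRecognizeForSingleInputFile": str(max_pages)},
    )
    ET.SubElement(ticket, "InputFile", {"Name": input_name})

    # Export settings: one output format written to a shared folder.
    params = ET.SubElement(ticket, "ExportParams")
    export_format = ET.SubElement(
        params,
        "ExportFormat",
        {"OutputFileFormat": output_format, "OutputFlowType": "SharedFolder"},
    )
    ET.SubElement(export_format, "OutputLocation").text = output_dir

    return ET.tostring(ticket, encoding="unicode")


if __name__ == "__main__":
    print(build_ticket("50.pdf", r"D:\Output Folder", 2))
```

Building the ticket with an XML library rather than string concatenation keeps file names and paths that contain characters such as & or < correctly escaped.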
This setting has the following effect:
• Only processed pages will be counted in any output files
• The time of processing will be reduced, as only the specified number of pages will be processed
• This setting will be ignored if the output format is PDF
• Output files in text formats will contain only N pages
• Output files in image formats will contain all pages, but the page counter will be decremented only by N pages for each file
• If an operator station is included in the processing, all pages can be opened on this station, but only the first N pages will be available for indexing and editing. The operator will be able to recognize other pages on the Verification Stations if necessary. In this case, the page counter will be decremented by the number of recognized pages.
Implemented in: release 1
4.3. Support of new barcode type - USPS-4CB (Intelligent Mail Barcode)
Extraction of barcodes of the USPS-4CB type, which is used on mail in the USA and is required by the US postal service, is now supported. USPS-4CB (Intelligent Mail Barcode) barcodes can be recognized in documents and can also be selected as a barcode type for document separation in the workflow settings.
Implemented in: Release 2.
4.4. Disabled image compression of lossy JBIG2 type
Lossy JBIG2 image compression has been removed from the UI and internal compression parameters, as it produced output files of low quality.
Implemented in: Release 2.
5. Scanning Station
5.1. Sending registration parameters values to index fields
When scanning a batch, the registration parameters entered for a document can be sent as the values of the document index fields.
The lists of index fields (document types and their attributes) must be pre-configured in the workflow properties (Indexing tab).
At the Scanning Station, specify the batch sending parameters in the batch type settings: select the desired workflow and import the list of index fields by clicking the Import Registration Parameters button.
When creating a batch, select the desired Batch Type, assemble the documents and assign the Document Types in the Registration Parameters window.
After the batch is processed in Recognition Server, the documents with pre-filled index field values are shown at the Indexing station. It is possible to skip the indexing stage by using the following code in the indexing script: “SkipManualIndexing = true;”. In this case, index field values will be exported according to the workflow settings.
The values of document registration parameters can be obtained from an indexing or export script by using the standard Attributes object. They are also accessible from the XML result file.
Please note:
• Only values of the parameters imported from the workflow can be sent as index fields to Recognition Server, even though it is possible to create more registration parameters in the batch type settings at the Scanning Station.
• The types of the entered values should coincide with the types of index fields specified in the workflow properties.
Implemented in: Release 2.
6. Verification and Indexing Stations
6.1. Manual selection of documents for verification and indexing
Operators of Verification and Indexing Stations can now select documents manually from the queue. This feature can be very useful if an operator needs to speed up the processing of recently added urgent documents.
The button on both stations toggles between manual and automatic modes of receiving the next document.
The button on both stations should be used to open the Select Document for Verification or Select Document for Indexing dialog box.
In this dialog box, the operator can find the required document, sort documents by name, priority or creation date, and select the found document, which will be opened for verification or indexing.
A new information pane displays the number of documents in the queue and allows starting verification or indexing and selecting documents manually. This pane appears:
• Between tasks in manual mode
• When connection with the server is lost
• When the current document is returned to the queue when the timeout is reached
Implemented in: release 1
6.2. Saving documents
It is now possible to save changes in the current document on both Verification and Indexing Stations. Now if a failure occurs during verification, the verification results will not be lost.
The verification results will be saved if the operator selects Document > Save or presses Ctrl+S. When the station is closed during document verification, the operator will be asked to save the results. The current document with the saved changes will be returned to the server and will become available to other operators.
On Indexing Stations, it is only possible to save results after the document type is selected.
Implemented in: release 1
6.3. Timeout of inactivity
To prevent documents from sitting forever on operators’ stations, documents are returned to the queue after a timeout is reached. In previous versions, the timeout value was set to 120 minutes and could not be changed. That proved insufficient for verifying large documents (for example, books). The 120-minute timeout is also not suitable for companies which allow operators to leave the current document opened when they go home after work or break for lunch.
Now the timeout value can be changed in the Recognition Server Properties dialog box, or in the configuration file Configuration.xml (change the value in OperatorStationInactiveTimeoutInMinutes="120" in the QueueManager node).
Important!
This timeout is applied to all workflows and to all jobs on Verification and Indexing Stations.
Implemented in: release 1
6.4. Improved work with document types and index fields on Indexing Stations
6.4.1. Import of index fields from files
The ability to import document types, index fields, and values from an XML or CSV file has been added. This feature is useful if there is a need to use the same field in different workflows. The feature is available on the Indexing tab of the Workflow Properties dialog box (click the Import… button).
Imported files should have the following structure:
• XML
Indexing.xml
• CSV
DocumentType   FieldName   IsObligitary   FieldType       PossibleValues          IsDefault
type1          bbb                        List            Field1;Field2;Field3    TRUE
type1          ccc                        SingleLine      Don't say
type2          test        TRUE           MultipleLines   Do this 1; test twice
There are some other changes on the 5. Indexing tab of the Workflow Properties dialog box:
• The order of document types can be changed using the Up and Down buttons.
• The default document type can be selected using the Default type checkbox.
Implemented in: release 1
6.4.2. Quick input of index fields
When the operator starts typing an index field value, the values starting with the same letter will be automatically selected from the list of allowed values.
Implemented in: release 1
6.4.3. Possibility to combine values from several regions into one index field
The possibility to use several regions as a source of values for one index field has been added. This feature can be useful for setting multi-line text as an index field value.
To combine the values, hold the CTRL key and click on the regions that contain the values to be used as a single index field. The values are aggregated and separated with spaces automatically.
Implemented in: Release 2.
6.5. User interface changes
6.5.1.
Verification Station
The main toolbar on the Verification Station has been changed:
• The new Warnings button allows the user to hide/show the warnings pane. The button also displays the number of issued warnings.
• The number of low-confidence characters is displayed on the Check Spelling button.
• The new Select Document button allows selecting documents manually from the verification queue.
• The new Get documents Automatically button allows switching between automatic and manual document selection.
• The Reject All Documents button is hidden; this command is only available in the menu*.
*Note: The Reject All Documents command should not be used very often because it rejects all documents of the job while the operator works on the current document only. The Reject command returns only the current document to the queue.
Information about the number of documents in the queue is now displayed in the status bar.
Implemented in: Release 1
6.5.2. Indexing Station
The main toolbar on the Indexing Station has been changed:
• The new Select Document button allows selecting documents manually from the indexing queue.
• The new Get documents Automatically button allows switching between automatic and manual document selection.
• The Reject All Documents button is hidden; this command is only available in the menu*.
*Note: The Reject All Documents command should not be used very often because it rejects all documents of the job while the operator works on the current document only. The Reject command returns only the current document to the queue.
Information about the number of documents in the queue is now displayed in the status bar.
Implemented in: release 1
7. Operating systems
7.1. Support for Windows Server 2012 R2
Recognition Server 4 can be installed and run on Windows Server 2012 R2.
Implemented in: release 1
7.2. Discontinued support for Windows XP and Windows Server 2003
We stopped supporting Windows XP and Windows Server 2003.
Recognition Server 4 cannot be installed on these operating systems.
8. Scripting
8.1. Access to subsequent pages from the document assembly script
A new property, RecognizedPage: UserProperty, was added to the page object to enable document assembly based on the analysis of subsequent pages. The decision on whether a page belongs to a document can be made based on information about the next pages, for example, the same ID values on all the pages.
Implemented in: Release 2.
8.2. Detecting the workflow name by script
A new property, RecognizedPage: WorkflowName, was added to the page object to get the name of the workflow for the page that is being processed. This allows scripts to be copied to several workflows without manual modifications.
Implemented in: Release 2.
9. Changes in the COM-based API and Web API
9.1. Namespace changes
The namespace of the COM API is changed from ABBYYRecognitionServer3 to ABBYYRecognitionServer.
The namespace of the Web API is changed from RSSoapService3 to RSSoapService.
Implemented in: Arabic edition
9.2. Compatible API
By default, the API is not fully compatible, which allows Recognition Server 4 to be installed and run on the same computer where a previous version is installed. If there is a need to have a fully compatible API without recompiling your applications, you can achieve this by following simple instructions. This feature is available by request only; please contact ABBYY HQ for the instructions.
9.3. Automatic API deployment on 64-bit operating systems
Both the Web and the COM API are automatically deployed by the installer on 64-bit operating systems without any additional manual setup.
9.4. Added objects
The goal of adding new objects to the API is to support these scenarios:
1. Ability to establish a correspondence between the input and output files.
2. Ability to delete jobs after asynchronous processing (for Ricoh).
3.
Setup of the recognition service if Recognition Server is accessed and settings are changed by a user working on the same computer (for NLC). 9.4.1. Correspondence between input and output files The following objects are added to the COM-based and Web-based API to support the ability to establish a correspondence between input and output files. InputFile This object represents one input image file and the results of processing this file. © ABBYY. All rights reserved. Page 31 of 39 Properties Name Type Description Pages Pages, read-only Returns a collection of pages of the input file. ID String, read-only Unique identifier of the input file generated by RS. Pages This object represents a collection of Page objects. Page This object represents a page of the input file. This is a child object of InputFile. Properties Name Type Description ID String, read-only Unique page identifier generated by RS. Number String, read-only Page number in the input file. JobDocument This object represents one output document. Properties Name Type Description PagePositions PagePositions, read- Returns a collection of pages of the output document with the information only about the position of each page in the input file. PagePositions This object represents a collection of PagePosition objects. PagePosition This object represents a page in the output document and information about the position of this page in the input file. This is a child object of JobDocument. Properties Name Type Description FileId String, read-only ID of the input file to which the page belongs. PageId String, read-only ID of the page in that input file. Implemented in: Arabic edition 9.4.2. Support of the recognition service scenario (for NLC) In this scenario, Recognition Server works as a service which is almost invisible to the user and is called if documents processed with NLC are in an image file format and should be recognized first. Recognition Server is installed silently and uses the default workflow. 
However, its settings are available on the same computer, and the user can change them. It is therefore necessary to be able to check whether a job can be processed and to cancel the job if it cannot be processed at the moment. With the new API methods you can:
• Check whether the workflow is started or stopped
• Check whether there is a connection with the server
• Check whether indexing and/or verification is switched on in the workflow, and change the indexing or verification settings

The following objects have been added to the COM-based API.

It is now possible to get the state of a workflow. The WorkflowState property has been added to the IWorkflow interface.
Interface | Name | Type | Description
IWorkflow | WorkflowState | WorkflowStateEnum, read-only | Returns the current state of the workflow.

WorkflowStateEnum is an enumeration that defines the possible workflow states.
Name | Description
WS_ApplyingSettings | The state of a workflow after it has been started and before processing has begun. At this stage, the program checks whether it can access the folder that contains the input documents. This state is very short and is not indicated in the console (the word "Starting" is displayed instead).
WS_Crawling | At this stage, the program checks the folders of the Document Library workflow. It counts the files, adds them to the database, and prepares to process them. The word "Crawling" is displayed in the console.
WS_Finishing | The state of a workflow when processing is coming to an end. At this stage, the program writes the files for the last time and completes publishing the large files. The words "Finishing Processing" are displayed in the console.
WS_NotAvailable | The state of a workflow that is inaccessible. The words "Not Available" are displayed in the console, together with the reason why the workflow cannot be accessed.
WS_Processing | The principal state of a workflow, when files are being received, processed, and recognized. The word "Processing" is displayed in the console.
WS_StartingProcess | The state of a workflow after the start command has been executed and before information about the beginning of processing has been returned. The word "Starting" is displayed in the console.
WS_Suspended | The state of a workflow that has been stopped. The word "Stopped" is displayed in the console.

Besides workflow states, it is possible to get the state of the server.
Interface | Name | Description
IClient | Connect(string serverName) | Establishes a connection with the server. If the server is stopped, a COMException is thrown with this text: "ABBYY Recognition Server is not available: The client has successfully connected to the server, but the server is not running."

A method that deletes a job and all of its images has been added to the IClient interface.
Interface | Name | Description
IClient | DeleteJob(string jobId) | Deletes a job with all of its images.

It is now possible to get the server's Exceptions folder via the IClient interface.
Interface | Name | Type | Description
IClient | ServerExceptionsFolder | string, read-only | Returns the folder with the server's exceptions.

It is now possible to switch verification on or off using IXmlTicket.
Interface | Name | Type | Description
IRecognitionParams | VerificationMode | VerificationModeEnum | Returns the verification type: whether verification will be performed or not.
IRecognitionParams | VerificationModeThreshold | double | Sets the verification threshold.

VerificationModeEnum is an enumeration that defines the verification types.
Name | Description
DVM_DoNotVerify | Verification is switched off.
DVM_VerifyAlways | Documents are always verified.
DVM_VerifyIfThresholdExceeded | Documents in which the number of low-confidence characters is above the threshold (VerificationModeThreshold) are verified.
Implemented in: Release 1
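The meaning of the three verification modes can be illustrated with a short C# sketch. It does not call the Recognition Server API: the enumeration below is a local stand-in that only reuses the documented member names, and WillBeVerified is a hypothetical helper, not a method of the product. The notes do not say in which unit the threshold is expressed, so the helper assumes that the low-confidence measure and the threshold use the same unit.

```csharp
using System;

// Local stand-in for the VerificationModeEnum described above.
// The real enumeration lives in the ABBYYRecognitionServer COM library;
// it is redeclared here (numeric values are not meaningful) only so that
// the sketch compiles on its own.
public enum VerificationModeEnum
{
    DVM_DoNotVerify,
    DVM_VerifyAlways,
    DVM_VerifyIfThresholdExceeded
}

public static class VerificationSketch
{
    // Hypothetical helper that mirrors the documented meaning of each mode.
    // lowConfidenceMeasure and threshold are assumed to share the same unit.
    public static bool WillBeVerified(
        VerificationModeEnum mode, double lowConfidenceMeasure, double threshold)
    {
        switch (mode)
        {
            case VerificationModeEnum.DVM_DoNotVerify:
                return false; // verification is switched off
            case VerificationModeEnum.DVM_VerifyAlways:
                return true;  // every document goes to verification
            case VerificationModeEnum.DVM_VerifyIfThresholdExceeded:
                // "above the threshold" is read as a strict comparison
                return lowConfidenceMeasure > threshold;
            default:
                throw new ArgumentOutOfRangeException(nameof(mode));
        }
    }
}
```

With these assumptions, a document whose low-confidence measure equals the threshold is not sent to verification in DVM_VerifyIfThresholdExceeded mode; whether the server treats the boundary the same way is not stated in these notes.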
9.4.3. Deleting of jobs
The following method has been added to the COM-based and Web-based API to support deleting a job after asynchronous processing. It is part of the IClient interface.
Interface | Name | Description
IClient | DeleteJob(string jobId) | Deletes a job with all of its images.
Implemented in: Release 1

10. UI and Documentation localization
ABBYY Recognition Server 4 is localized according to the table below. The only language added in Release 2 is French for the Scanning Station Help file (highlighted in blue in the original table).

Component | English | Russian | French | German | Italian | Spanish | Chinese | Portuguese (Brazil) | Czech | Hungarian | Polish
Resources:
Console | + | + | + | + | + | + | + | + | + | + | +
Indexing Station | + | + | + | + | + | + | + | + | + | + | +
Verification Station | + | + | + | + | + | + | + | + | + | + | +
Scanning Station | + | + | + | + | + | + | + | + | + | + | +
Protection | + | + | + | + | + | + | + | + | + | + | +
Help:
Console | + | + | + | + | + | + | - | - | - | - | -
Indexing Station | + | + | + | + | + | + | - | - | - | - | -
Verification Station | + | + | + | + | + | + | - | - | - | - | -
Scanning Station | + | + | + | + | - | - | - | - | - | - | -
Open API | + | - | - | - | - | - | - | - | - | - | -
Admin Guide | + | + | + | + | + | + | - | - | - | - | -
EULA | + | + | + | + | + | + | + | + | + | + | +
Installer:
Recognition Server | + | + | + | + | + | + | + | + | + | + | +
IFilter | + | + | + | + | + | + | + | + | + | + | +
Autorun | + | + | + | + | + | + | + | + | + | + | +
Implemented in: Release 1 Multilingual, Release 2

Corrected Issues
Corrected in Release 2
№ | Office | Description
1 | ABBYY Europe | It was not possible to use the NEW operator (under 64-bit operating systems in the COM API) when creating an InputFile object. For the xmlTicket.InputFiles.Add(inputFile); method to work, the InputFile object should now be created using the CreateInputFile method: InputFile file = _clientObject.CreateInputFile();
2 | | Microsoft Search IFilter for SharePoint 2013 could not be used for indexing PDF files due to a Microsoft restriction. PDF files can now be indexed again after installing the cumulative update package for SharePoint Server 2013.
3 | | The "There is no workflow for processing IFilter requests" error occurred after adding the IFilter component to a computer with Recognition Server already installed.
4 | | A license without ISIS drivers allowed scanning with ISIS.

Corrected in Release 1
№ | Office | Description
1 | ABBYY USA | There were extra spaces in PDF files for particular images. The registry key [HKCU\Software\ABBYY\RecognitionServer\4.0\OCRProcessor\Export\DebugOptions] "Final_PdfUseScalingForPreventingVirtualSpaces"="true" can be created and enabled to solve the extra spaces problem in PDF. Note that if the RS Processing service runs under the LocalSystem account, HKCU must be replaced with "HKEY_USERS\S-1-5-18".
2 | ABBYY USA | Some PDF files were opened on the second page.

Known Issues

E-mail support
Description: When Exchange Mailbox is selected as input and output, and the "Reply to all:" mode is selected, Recognition Server does not send the resulting letter to the addresses indicated in the "To:" field but only to the sender and to the addresses in the "CC:" field.
Solution or workaround: In the "To:" field, specify only the mailbox checked by Recognition Server. In the "CC:" field, indicate all the addressees that must receive the recognition result.
Release with issue: Release 2; Release 1; Release 1 (specially for 3A)

Export
Description: Settings that map index fields to SharePoint columns are not inherited when upgrading from Release 1 to Release 2. Only document types are preserved.
Release with issue: Release 2

Description: If the Content Type configured in the indexing parameters contains required fields, the values of the documents' index fields must be non-empty at the export step. Otherwise, these documents remain checked out in MS SharePoint with an error in the processing notes.
Solution or workaround: Assign values to all index fields configured as required.
Release with issue: Release 2

Description: Export options for PDF/A-2b and PDF/A-3b create files of PDF/A-2u and PDF/A-3u formats, respectively. (Files that conform to PDF/A-2u and PDF/A-3u also satisfy the requirements of PDF/A-2b and PDF/A-3b.)
Release with issue: Release 2; Release 1

Description: When exporting documents in Arabic to PDF with MRC, the visual quality of the picture layer becomes worse than in the original image. At the same time, the text layer is correct and can be used for searches.
Release with issue: Release 2; Release 1; Release 1 (specially for 3A); Arabic edition

Description: When Arabic text is exported to *.docx format, it is oriented from left to right, which is incorrect.
Solution or workaround: Enable Microsoft Office 2007 Language Settings (Programs > Microsoft Office Tools > Microsoft Office 2007 Language Services).
Release with issue: Release 2; Release 1; Release 1 (specially for 3A); Arabic edition

Description: After exporting to PDF/A (tagged PDF), the text layer differs from the text layer in non-tagged PDF and from the original image.
Solution or workaround: It is recommended to use Adobe Reader 11 to open tagged PDF files. Adobe Reader 10 opens tagged PDF files incorrectly.
Release with issue: Release 2; Release 1; Release 1 (specially for 3A); Arabic edition

Description: The structure of Arabic documents is not saved in exported files when using the Formatted mode.
Solution or workaround: Use Exact Copy or Plain Text mode.
Release with issue: Release 2; Release 1; Release 1 (specially for 3A); Arabic edition

Description: Failed files are duplicated if several export profiles are used and the failed files are sent to an output folder: each export profile sends its own copy of the failed file to the output folder.
Release with issue: Release 2; Release 1; Release 1 (specially for 3A); Arabic edition

Workflow
Description: The child sites of a SharePoint site cannot be selected for crawling within one workflow.
Solution or workaround: When specifying the settings for processing documents in SharePoint, it is possible to select the top-level site and any of its libraries. For child sites, create separate workflows.
Release with issue: Release 2

Description: Index fields or document properties are not saved into the SharePoint libraries when crawling SharePoint libraries and exporting the output files to the source library. (There is no option to set up saving values into the SharePoint columns.)
Release with issue: Release 2

Description: It is not possible to establish a connection to SharePoint in the workflows of Recognition Server 3.5 if it was installed on a computer with Recognition Server 4 Release 2 already installed.
Release with issue: Release 2

IFilter
Description: IFilter in Recognition Server 4 for SharePoint 2013 cannot index JPEG files.
Solution or workaround: For processing JPEG files stored in SharePoint, we recommend using a Document Library workflow instead of IFilter and specifying a folder in SharePoint as the input folder. A possible workaround for processing files stored in SharePoint 2013 is to create two workflows. One workflow of type Document Library should have a file name mask that allows processing only JPEGs. You can specify the same SharePoint library as input and output. As a result, all JPEG files can be replaced with PDF files with a text layer. The second workflow, of type IFilter, should index the other image files stored in SharePoint.
Release with issue: Release 2; Release 1; Release 1 (specially for 3A); Arabic edition

Recognition
Description: If both English and Arabic are selected as recognition languages, the orientation of a document can be detected incorrectly.
1. The orientation of documents in Arabic is always correct.
2. The orientation of documents in two languages is generally correct, but there may be some documents with incorrect orientation.
3. The orientation of documents in English is always incorrect in this case.
The problem is difficult to avoid. Although orientation correction on a Verification Station helps to get correctly recognized text, the orientation of the document will remain incorrect after export.
Solution or workaround: Process documents in English separately and select only English as the recognition language for them.
Release with issue: Release 2; Release 1; Release 1 (specially for 3A); Arabic edition

Performance
Description: The Document Library workflow type processes documents several times slower than the Hot Folder workflow type.
Release with issue: Release 2; Release 1

Protection
Description: If a license contains both types of page limits (standard and gothic) but no standard pages are left in the page counter, it cannot be selected as the current license, even if some gothic pages are left for processing.
Release with issue: Release 2

Server operation
Description: A "Not enough memory" error may occur when processing very large multi-page files.
Solution or workaround: In certain cases the following method can be used as a workaround: disable the text layer substitution by setting ReplaceTextLayerOnlyInPdfs to "False".
Release with issue: Release 2

Description: When processing multi-page files of 3,000 pages, the following error message appears: "OCR Processor: Not enough storage is available to complete this operation."
Solution or workaround: Split files containing 3,000 pages into smaller files.
Release with issue: Release 2; Release 1; Release 1 (specially for 3A)

Description: Documents are not submitted for recognition until all files of the same job in the import folder are loaded. If a job contains too many files to be imported, the Processing Stations can stay idle for a significant period of time.
Release with issue: Release 2; Release 1; Release 1 (specially for 3A); Arabic edition

System Administration
Description: The database where the job log is stored is not compatible with the previous version of the database. For this reason, the job log cannot be saved when upgrading to Recognition Server Release 1 or Release 2 from previous releases of Recognition Server 4.
Release with issue: Release 2; Release 1

Description: The Event log for the new version of the application (RS 4.0) is stored in a new branch to speed up work with the Administration Console. After an upgrade from RS 3.5 to RS 4.0, the previous Event log is not displayed in the Console, but it can be viewed in the system's application log.
Release with issue: Release 2; Release 1

Verification
Description: If a barcode was not correctly recognized and documents were not separated, re-recognizing this barcode at the verification stage does not help to separate the documents correctly.
Release with issue: Release 2; Release 1; Release 1 (specially for 3A); Arabic edition

Description: Verification Stations of Recognition Server 4 cannot be started if a Verification Station of Recognition Server 3.5 has already been started on the same computer.
Release with issue: Release 2; Release 1

Customization, Scripts
Description: Only files located in the same folder can be addressed from an XML Ticket. It is not possible to address files in subfolders.
Release with issue: Release 2; Release 1; Release 1 (specially for 3A); Arabic edition

Description: It is not possible to create new objects using the NEW operator under 64-bit operating systems in the COM API.
Solution or workaround: The warning is given in the Open API Help: Open API overview -> COM-based API -> Using the COM-based API within 64-bit Applications.
Release with issue: Release 2; Release 1

Description: The FileName property of the XMLTicket object contains the full path to the file if the file was added to processing via the API. The FileName property contains only the name of the file if the file was added to processing via a hot folder or other standard inputs.
Solution or workaround: To work around this, the OriginalFileName property of the InputFile object can be used:
    ABBYYRecognitionServer.Client client = new ABBYYRecognitionServer.Client();
    ABBYYRecognitionServer.XmlTicket xmlTicket = client.CreateXmlTicket(workflowName);
    // Under 64-bit operating systems, create the object with the CreateInputFile
    // method instead of the NEW operator (see Corrected in Release 2, issue 1).
    ABBYYRecognitionServer.InputFile inputFile = new ABBYYRecognitionServer.InputFile();
    inputFile.FileName = filePath; // the path to an existing file
    xmlTicket.AddImage(filePath);
    // Store only the file name, without the path
    xmlTicket.InputFiles.Item(0).OriginalFileName = Path.GetFileName(filePath);
Another workaround is to add a small check in the script where the original file name is used: if the FileName property contains '\', extract only the file name from the whole path; otherwise FileName already contains only the name of the file.
Release with issue: Release 2; Release 1

Help
Description: The following features are not described in the Help file:
1) Export to the specific column types of MS SharePoint
2) Import of registration parameters (lists of index fields from the workflow) at Scanning Station
3) Notifications
4) User interface changes of the Jobs node
Release with issue: Release 2
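The script-side workaround for the FileName property described under Customization, Scripts can be sketched as follows. This is a minimal illustration in C#, chosen to match the sample above; an actual Recognition Server script may be written in a different scripting language. FileNameSketch and ExtractFileName are hypothetical names introduced only for this example and are not part of the Recognition Server API.

```csharp
using System;

public static class FileNameSketch
{
    // Hypothetical helper: returns only the file name, whether the input
    // is a full path (file added via the API) or already a bare name
    // (file added via a hot folder or another standard input).
    public static string ExtractFileName(string fileName)
    {
        if (string.IsNullOrEmpty(fileName))
        {
            return fileName; // nothing to extract
        }

        // Look for the last backslash, as described in the workaround.
        int lastSeparator = fileName.LastIndexOf('\\');

        // No backslash: the value is already just the file name.
        return lastSeparator >= 0
            ? fileName.Substring(lastSeparator + 1)
            : fileName;
    }
}
```

For example, ExtractFileName(@"C:\Input\scan001.tif") and ExtractFileName("scan001.tif") both return "scan001.tif". The backslash is searched for explicitly so that the result does not depend on the platform's path separator; on Windows, Path.GetFileName gives the same result for these inputs.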
