EMC® InfoArchive
Version 3.1
Configuration and Administration Guide

EMC Corporation
Corporate Headquarters
Hopkinton, MA 01748-9103
1-508-435-1000
www.EMC.com
Legal Notice
Copyright © 2014 EMC Corporation. All Rights Reserved.
EMC believes the information in this publication is accurate as of its publication date. The information is subject to change
without notice.
THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” EMC CORPORATION MAKES NO REPRESENTATIONS
OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY
DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.
For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com. Adobe and Adobe PDF
Library are trademarks or registered trademarks of Adobe Systems Inc. in the U.S. and other countries. All other trademarks
used herein are the property of their respective owners.
Documentation Feedback
Your opinion matters. We want to hear from you regarding our product documentation. If you have feedback about how we can make our documentation better or easier to use, please send us your feedback directly at IIGDocumentationFeedback@emc.com.
Table of Contents

Preface .......... 11

Chapter 1  EMC InfoArchive Overview .......... 15
    Key Features and Benefits .......... 15
    InfoArchive Architecture .......... 18
        Required Components .......... 19
        Optional Components .......... 20

Chapter 2  Concepts: How InfoArchive Works .......... 21
    InfoArchive Data Model .......... 21
        Archive Holding .......... 21
        Submission Information Package (SIP) .......... 22
            Data Submission Session (DSS) .......... 23
            SIP Descriptor (eas_sip.xml) .......... 25
                SIP Descriptor (eas_sip.xml) Schema .......... 26
                Customizing SIP/AIP Metadata .......... 30
            PDI File (eas_pdi.xml) .......... 30
            Archival Information Unit (AIU) .......... 32
        Archival Information Package (AIP) .......... 32
            Unstructured Content File Table of Contents (eas_ri.xml) .......... 33
        Archival Information Collection (AIC) .......... 36
    How Data is Ingested .......... 36
        Ingestion Modes .......... 36
            Asynchronous Ingestion .......... 36
            Synchronous Ingestion .......... 37
            Asynchronous Ingestion vs. Synchronous Ingestion .......... 38
        Archiving Process .......... 39
            Reception .......... 39
                Receiver .......... 39
                Reception Process and Lifecycle .......... 41
                Reception Node .......... 42
            Enumerator .......... 43
            Ingestion .......... 44
                Ingestor .......... 44
                Ingestion Process and Lifecycle .......... 44
                Ingestion Node .......... 48
                Ingestion Priority .......... 48
    How Data is Stored .......... 48
        xDB Modes .......... 50
            xDB Library Pool (xDB Mode 3) .......... 52
            xDB Pooled Library Assignment Policy (xDB Mode 3) .......... 54
            xDB Library Online Backup .......... 54
            xDB Caching .......... 55
            xDB Library Locking .......... 57
        AIP Modes .......... 58
            AIP Parent Assignment Policy .......... 60
            AIP Parent Closure .......... 60
        Supported AIP Mode and xDB Mode Combinations .......... 61
    How Data is Searched .......... 62
        Order (Asynchronous Search) .......... 63
            Order Lifecycle .......... 65
    Confirmations—How the Loop is Closed .......... 66
        Confirmation Event Types .......... 66
        Confirmation Job (eas_confirmation) .......... 67

Chapter 3  InfoArchive Configuration .......... 69
    InfoArchive Configuration Overview .......... 69
        Working with InfoArchive Configuration Objects .......... 70
            Creating a Contentless Configuration Object .......... 70
            Creating a Configuration Object with Content .......... 71
            Importing Content into a Configuration Object .......... 72
            Editing the Properties of a Configuration Object .......... 72
    Performing System Global Configurations .......... 72
        Configuring Global Settings .......... 72
        Configuring an xDB Cache Access Node .......... 74
        Configuring an xDB Library .......... 76
        Configuring the Configuration Cache .......... 79
        Configuring a Centera Store for Use with InfoArchive .......... 81
            Creating a Centera Store Object .......... 81
            Configuring Key Centera Store Object Properties .......... 84
    Quickstart Using the InfoArchive Holding Configuration Wizard .......... 84
        Launching the InfoArchive Holding Configuration Wizard .......... 85
        Configuring a Basic Holding Using the InfoArchive Holding Configuration Wizard .......... 86
    Configuring a Holding .......... 96
        Configuring a Holding Configuration (eas_cfg_holding) Object .......... 97
        Configuring an AIC View (eas_cfg_aic_view) Object .......... 104
            Configuring a DQL Predicate for the AIC View .......... 104
            Configuring an XQuery for the AIC View .......... 104
    Configuring Holding Security .......... 104
    Defining the Structured Data (PDI) Schema .......... 108
        Creating a Structured Data (PDI) Schema .......... 109
            PDI (eas_pdi.xml) Schema Definition Best Practices .......... 110
            Sample PDI (eas_pdi.xml) Schema .......... 111
        Configuring a Schema Configuration Object .......... 113
    Configuring xDB Modes .......... 114
        Configuring an xDB Parent Library (xDB Mode 1 and 2) .......... 115
        Configuring an xDB Library Pool (xDB Mode 3) .......... 116
        Configuring a Custom Pooled Library Assignment Policy (xDB Mode 3) .......... 120
        Configuring xDB Mode Settings for a Holding .......... 121
            Configuring Settings for xDB Mode 1 .......... 121
            Configuring Settings for xDB Mode 2 .......... 122
            Configuring Settings for xDB Mode 3 .......... 124
            Converting xDB Library (Pool) Backup Renditions .......... 124
    Configuring the Ingestion Process .......... 125
        Defining Ingestor Parameters .......... 127
            pdi.index.creator—Creating xDB Indexes .......... 127
                Path Index .......... 128
                Full-Text Index .......... 129
            pdi.aiu.cnt—Counting the Number of AIUs .......... 131
            minmax—Defining the AIP Partitioning Key .......... 132
                Optimizing XQuery for Partitioning Keys .......... 134
            pdi.aiu.id—Generating AIU IDs .......... 135
            pdi.ci.id—Generating Content File IDs .......... 136
            toc.creator—Creating the Table of Contents (eas_ri.xml) for Unstructured Content Files .......... 137
            ci.hash—Configuring Content Hashing (Unstructured Data) .......... 138
            ci.compressor—Configuring Compression of Content Files (Unstructured Data) .......... 141
        Configuring an Ingestion Configuration (eas_cfg_pdi) Object .......... 143
        Configuring a Reception Node Configuration (eas_cfg_receive_node) Object .......... 143
        Configuring an Ingestion Node Configuration (eas_cfg_ingest_node) Object .......... 146
    Configuring Ingestion for Unstructured Data .......... 148
    Configuring an Encryption-enabled Holding .......... 149
        Preparing Resources for InfoArchive Installation .......... 150
        Configuring RSA DPM Key Manager .......... 151
        Creating Configuration Objects for Encryption Settings .......... 151
            eas_cfg_crypto_provider .......... 151
            eas_cfg_aic_crypto .......... 152
            eas_cfg_holding_crypto .......... 153
            eas_cfg_pdi_crypto .......... 155
                pdi.xdb.encryption .......... 155
                pdi.index.creator .......... 156
                pdi.xdb.importer .......... 156
                set.schema .......... 156
                ci.encryption .......... 157
            eas_query_config .......... 157
        Adding Properties for the Holding Object .......... 159
        Auto Populating Properties for Runtime Objects .......... 159
        RSA Encryption Header .......... 161
        Operator Restraints for Querying Encrypted Data .......... 161
    Configuring Synchronous Ingestion .......... 162
        Defining the Custom AIP Assignment Policy .......... 162
        Configuring the AIP Parenting Policy .......... 163
        Enabling Synchronous Ingestion .......... 167
    Configuring Query .......... 168
        Configuring a Query Quota Configuration (eas_cfg_query_quota) Object .......... 169
        Defining the Search Criteria and Search Result .......... 171
            Configuring Query for Unstructured Data .......... 171
            request-config—Defining Search Criteria .......... 173
                Configuring Search Criteria for Multi-Level XML Structures .......... 174
                Grouping Search Criteria .......... 175
                Searching Multiple Paths .......... 176
            query-template—Defining Search Results .......... 177
        Configuring a Query Configuration (eas_cfg_query) Object .......... 178
        Configuring an Order Configuration (eas_cfg_order) Object .......... 179
        Configuring an Order Node Configuration Object .......... 182
        Starting the Order Node .......... 184
    Configuring InfoArchive GUI .......... 184
        Configuring the Search Menu .......... 185
        Configuring a Search Form Folder .......... 186
        Configuring a Search Form .......... 187
            Configuring XForms .......... 188
                XForms Structure .......... 189
                XForms Example .......... 189
                Search Criteria .......... 191
                Search Form Binding .......... 192
                Search Form Controls .......... 193
                Logical Grouping of Search Criteria .......... 193
                Multiple Input Values for a Single Search Criterion .......... 194
                Defining InfoArchive GUI Locales .......... 195
                Localizing a Search Form .......... 195
                Configuring Hints and Error Messages .......... 198
            Configuring a Search Form Configuration (eas_cfg_search_form) Object .......... 198
        Configuring the Search Results .......... 200
            Configuring a Stylesheet Configuration (eas_cfg_stylesheet) Object .......... 201
            Creating a Stylesheet .......... 202
                Stylesheet Components .......... 202
                Stylesheet Elements and Attributes .......... 203
                Namespaces in the Stylesheet .......... 211
                Stylesheet Example .......... 211
            Localizing a Stylesheet .......... 213
        Implementing InfoArchive GUI Single Sign-On (SSO) .......... 215
            InfoArchive Reserved Context Variables .......... 215
            Custom Context Variables .......... 216
        Enabling Security Policies .......... 216
            Clear-text Username/Password Policy .......... 216
                Configuration on Web Services .......... 217
                Configuration on Web GUI .......... 218
            X.509 Certificates .......... 218
                Configuration on Web Services .......... 218
                Configuration on Web GUI .......... 219
            Customizing a Security Policy .......... 220
        Customizing InfoArchive GUI .......... 220
            UI Strings .......... 221
            CSS .......... 221
            Autocomplete .......... 221
        Configuring Advanced InfoArchive GUI Settings .......... 221
    Configuring Confirmations .......... 222
        Confirmation Configuration (eas_cfg_confirmation) Object .......... 222
            Query Configuration Object (eas_cfg_query) for Confirmations .......... 223
                XQuery for Confirmations .......... 224
            Delivery Channel Configuration Object (eas_cfg_delivery_channel) .......... 225
                Delivery Channel Configuration Parameters .......... 226
        Configuring a Delivery Channel Configuration Object for Confirmations .......... 228
        Configuring a Query Configuration (eas_cfg_query) Object for Confirmations .......... 229
        Defining the Criteria for Applicable AIPs (Optional) .......... 230
        Configuring a Confirmation Configuration Object .......... 231

Chapter 4  InfoArchive Administration .......... 233
    InfoArchive Administration Overview .......... 233
    Generating SIPs .......... 233
        PDF Files Consolidation & Dynamic Extraction .......... 234
        Generating SIPs Containing Consolidated PDF Files .......... 234
    Starting the xDB Cache .......... 235
    Archiving Data in Asynchronous Ingestion Mode .......... 236
        Receiving SIPs .......... 237
            Configuring the Receiver Properties .......... 237
            Running the Receiver .......... 238
            Verifying the Reception Process .......... 238
            Troubleshooting Receiver Errors .......... 240
                Receiver Return Codes .......... 240
                Receiver Log File .......... 242
        Enumerating AIPs for Ingestion .......... 242
            Configuring Enumerator Properties .......... 242
            Running the Enumerator .......... 243
            Troubleshooting Enumerator Errors .......... 244
                Enumerator Return Codes .......... 244
                Re-enumerating Failed AIPs .......... 244
        Ingesting AIPs .......... 245
            Configuring Ingestor Properties .......... 245
            Running the Ingestor .......... 246
            Verifying the Ingestion Process .......... 247
            Troubleshooting Ingestion Errors .......... 248
                Ingestor Return Codes .......... 249
                Ingestor Log File .......... 250
        Committing Ingested AIPs .......... 250
            Executing the eas_commit Job .......... 251
            Verifying the Commit .......... 252
    Managing AIPs .......... 252
        AIP States .......... 254
        Rejecting or Invalidating an AIP .......... 255
        Updating Custom Package Metadata .......... 256
            Ingesting SIPs Containing Custom Package Metadata .......... 256
            Updating Custom Package Metadata .......... 257
                Custom Package Metadata Update Dates .......... 257
                Updating Custom Package Metadata in DA .......... 258
                Updating Custom Package Metadata Using DQL Scripts .......... 258
            Propagating Changes to AIP Renditions .......... 258
        Performing Data Retention Management .......... 260
            Using InfoArchive’s Date-Based Retention Management Capabilities .......... 261
                Changing the Retention Date .......... 262
                Applying/Removing a Purge Lock .......... 262
            Using Extended Retention Management Capabilities of Retention Policy Services (RPS) .......... 262
                Licensing and Installing RPS Components .......... 263
                RPS Basic Concepts .......... 263
                    Retention Policy .......... 263
                    Retainer .......... 264
                    Applying Retention Markups .......... 264
                Escalating DFC Instances to Privileged Users .......... 265
                Aligning the Retention Base Time .......... 266
                Creating a Retention Policy .......... 266
                Applying RPS Retention Policies to AIP Objects .......... 267
                    Applying a Retention Policy to a Folder .......... 267
                    Closing and Reopening a Folder .......... 267
                    Setting a Retention Policy at the Holding Level .......... 267
                    Setting a Retention Policy in the SIP Descriptor .......... 268
                Changing RPS Retentions .......... 268
                Disposing AIPs under RPS Retention .......... 268
                Rejecting/Invalidating AIP Objects Under RPS Retention .......... 269
                Deleting AIP Objects Under RPS Retention .......... 269
    Working with InfoArchive GUI .......... 269
        Logging in to InfoArchive GUI .......... 270
        Searching Archived Data .......... 271
        Working with the Search Results Page .......... 272
            Exporting AIUs on the Search Results Page .......... 273
        Using the InfoArchive GUI Direct Search URL .......... 274
    Using Archival Reports .......... 276
        Using Archival Reports in DA (Common Operations) .......... 277
        List of Ingestions in Progress .......... 278
        List of Ingestions with Errors .......... 278
        List of AIPs by Retention Date .......... 278
        AIPs for Disposition .......... 278
        Current xDB Library Pool Volume .......... 279
        Current Archived Volume .......... 279
        Archived Volume History .......... 280
        Performed Actions .......... 281
        Exported Information .......... 282
        Archived AIPs and Archival Metrics .......... 283
        Configuring the Calculation of Archived Structured Data Volume .......... 284
        Configuring the eas_report Job .......... 285
Managing Orders ...........................................................................................
Viewing Order Properties and Processing History ........................................
Suspending/Resuming an Order ..................................................................
Changing the Priority of an Order ...............................................................
Cancelling an Order....................................................................................
Purging Orders ..........................................................................................
287
287
287
288
289
290
Managing Jobs ...............................................................................................
InfoArchive Jobs .........................................................................................
Viewing Job Trace Logs ...........................................................................
InfoArchive Content Server Jobs ..................................................................
Modifying InfoArchive Methods Timeout Settings....................................
Increasing the JVM Heap Size Allocated to InfoArchive Jobs .....................
Archive Audit (eas_archive_audit) ...........................................................
DCTM Clean (eas_dctm_clean) ................................................................
DCTM Clean Arguments ....................................................................
DCTM Clean Return Codes .................................................................
Close (eas_close) .....................................................................................
Close Arguments ................................................................................
Close Return Codes ............................................................................
Commit (eas_commit) .............................................................................
Commit Arguments ............................................................................
Commit Return Codes ........................................................................
DMClean CAStore (dm_DMClean_CAStore) ............................................
DMClean CAStore Arguments ............................................................
DMClean CAStore Return Codes .........................................................
Invalidation/Rejection (eas_rejinv) ...........................................................
Invalidation/Rejection Arguments .......................................................
Invalidation/Rejection Return Codes ....................................................
Confirmation (eas_confirmation) .............................................................
Confirmation Arguments ....................................................................
Confirmation Return Codes.................................................................
Purge (eas_purge)...................................................................................
Purge Argument.................................................................................
The Purge Return Code .......................................................................
Update Content Metadata (eas_sip_cmeta_refresh) ..................................
Update Content Metadata Argument ...................................................
Update Content Metadata Return Codes ..............................................
InfoArchive Command Line Jobs .................................................................
Clean (eas-launch-clean) .........................................................................
Clean properties file............................................................................
Clean Return Codes ............................................................................
xDB Enumeration No Backup (eas-launch-xdb-enumeration-nobackup) ....................
xDB Enumeration No Backup properties file ........................................
xDB Enumeration No Backup options ..................................................
xDB Enumeration No Backup Return Codes .........................................
xDB Clean (eas-launch-xdb-clean) ...........................................................
xDB Clean Properties File ....................................................................
xDB Clean Options .............................................................................
xDB Clean Return Codes .....................................................................
Managing Audit .............................................................................................
InfoArchive Audit Trail Events ....................................................................
Documentum Content Server Events .......................................................
InfoArchive-Specific Events .....................................................................
Archiving InfoArchive Audit Records ..........................................................
Configuring the Archive Audit (eas_archive_audit) Job Arguments .....................
Running the Archive Audit (eas_archive_audit) Job ..................................
Troubleshooting the Archive Audit Job Errors ..........................................
Viewing Archived Audit Records ................................................................
Purging InfoArchive Audit Records .............................................................
Configuring Purge Audit (eas_purge_audit) Job Arguments .....................
Running the Purge Audit (eas_purge_audit) Job .......................................
Troubleshooting the Purge Audit Job Errors .............................................
Administrating the Configuration Cache ..........................................................
Logging ........................................................................................................
Command Line Jobs Logging ......................................................................
File Loggers for Command Line Jobs .......................................................
Console Loggers for Command Line Jobs .................................................
Content Server Jobs Logging .......................................................................
Configuring the Logging Level for InfoArchive Web GUI and Web Services ..............
Preface
EMC InfoArchive is highly configurable and allows you to customize many aspects of the archiving
process to meet your specific business requirements, from ingestion, to query, to the look and feel
of InfoArchive GUI.
This guide provides information and instructions about how to configure InfoArchive.
Intended Audience
This document is intended for system administrators responsible for configuring and administering
InfoArchive.
To use this document, you need the following:
• Administrative privileges on the host where you are configuring InfoArchive
• Working knowledge of:
— Microsoft Windows or Linux
— EMC Documentum Content Server configuration and administration
— EMC Documentum xDB configuration and administration
— XML, XQuery, XPath, and XForms
Related Documentation
The following documentation provides additional information:
• EMC InfoArchive Release Notes
• EMC InfoArchive Installation Guide
• EMC InfoArchive Object Reference Guide
• EMC InfoArchive Development Guide
Conventions
The following conventions are used in this document:
Font Type   Meaning
boldface    Graphical user interface elements associated with an action
italic      Book titles, emphasis, or placeholder variables for which you supply particular values
monospace   Commands within a paragraph, URLs, code in examples, text that appears on the screen, or text that you enter
Path Conventions
This guide uses the following path conventions:
Path Variable    Description
EAS_HOME         EMC InfoArchive installation directory
WEBSERVER_HOME   Web application server installation directory
TOMCAT_HOME      Apache Tomcat installation directory
XHIVE_HOME       EMC Documentum xDB installation directory
Note: EMC InfoArchive was named Enterprise Archiving Solution (EAS) prior to the 3.0 release.
Names of variables, properties, and object types may still contain the EAS abbreviation.
Acronyms and Abbreviations
AIC    Archival Information Collection
AIP    Archival Information Package
AIU    Archival Information Unit
DSS    Data Submission Session
JSP    Java Server Pages
LWSO   Light Weight System Object
OAIS   Open Archival Information System
PDI    Preservation Description Information
SIP    Submission Information Package
XSD    XML Schema Document
XML    eXtensible Markup Language
Revision History
The following changes have been made to this document.
Revision Date   Description
December 2014   Initial publication
Chapter 1
EMC InfoArchive Overview
With the explosive growth of information and the increased focus on regulatory compliance,
companies today are facing the challenges of retaining and protecting ever-growing business-critical
data for prolonged periods of time to meet corporate, regulatory, and legal requirements while
minimizing storage cost and increasing operational efficiency.
EMC InfoArchive is a powerful, secure, and flexible enterprise archiving system for long-term or
permanent preservation and access of digital information. Standards-compliant and built on the
proven strengths of EMC Documentum Content Server and xDB, InfoArchive provides a unified
and cost-effective solution for storing, managing, and retrieving large volumes of structured and
unstructured data. Highly configurable and scalable, InfoArchive is designed to help organizations of
all sizes to address their complete information retention needs and achieve regulatory compliance.
Key Features and Benefits
EMC InfoArchive provides the following key features:
• Compliance with free, open industry standards
InfoArchive is compliant with the following free, open industry standards:
— Reference Model for an Open Archival Information System (OAIS)
InfoArchive is designed with reference to the OAIS framework.
— Extensible Markup Language (XML)
InfoArchive stores all structured data in XML format, which is platform and vendor neutral.
Unlike proprietary data formats, XML can be read easily by XML editors, most major word
processors, and text editors.
— XQuery
InfoArchive uses the XQuery language to query archived data.
By leveraging open technologies, InfoArchive mitigates the risk that rapidly changing digital
technologies will leave archived information impossible to restore, render, or interpret, and
ensures long-term data immutability and readability. Even if the archiving
system itself becomes obsolete, its archived data can still be permanently preserved and retrieved.
• Support for data from all types of source applications
InfoArchive is source application-agnostic, which means that it can archive any data produced by
any source application, as long as the data is packaged into a designated InfoArchive-supported
format for ingestion. For example, InfoArchive can archive scanned documents, recorded videos,
business data exported from an ERP system, and of course, information extracted from EMC
Documentum Content Server. Once archived, the information is securely preserved and can be
retrieved at any time.
• Unified management of both structured and unstructured data of varying levels of complexity
InfoArchive consolidates structured and unstructured data into a single archive, thus eliminating
the need to maintain two separate systems.
InfoArchive can archive objects of varying levels of complexity, ranging from simple objects (such
as flat files), to mildly complex ones (such as content metadata), to highly complex ones (such
as SEPA financial records).
While being able to archive up to tens of billions of objects, InfoArchive also takes advantage of
the lightweight and sharable system object types in EMC Documentum Content Server 6.5 and
later to manage objects with shared property values to minimize the storage footprint.
• Synchronous (transactional) and asynchronous (batch) ingestion
InfoArchive supports two ingestion modes for archiving data objects of varying granularity:
— Synchronous ingestion: Archiving of large quantities of single data objects to keep up with
steady input streams of data to be archived.
— Asynchronous ingestion: Scheduled ingestion of items in batches for optimal performance
when information to be archived comes in intermittently.
• Active archiving
Unlike in traditional passive archives, InfoArchive allows authorized users to search, view, and
export both online and offline content stored in the archive through a customizable web-based
search application called InfoArchive GUI. You can configure many aspects of the query such
as search criteria and query quota, and even customize the look and feel of the web-based user
interface.
Furthermore, because InfoArchive GUI is built on a set of exposed web services, you can leverage
the exposed APIs to integrate the query capabilities into your own business applications or build
your own search application.
• Powerful administration functionality
InfoArchive provides administrators with a set of powerful functions to perform day-to-day
archiving operations and management of archived data in an efficient manner, including storage
management, retention management, audit management, AIP management, and metrics
reporting. Most of these administrative functions are performed in the familiar user interface
of EMC Documentum Administrator (DA).
• Data security
InfoArchive ensures the privacy and security of archived data using role-based permission access
and data encryption.
• Support for EMC Centera, Atmos, Data Domain, and Isilon
InfoArchive provides built-in support for archiving content on a wide selection of EMC storage
platforms such as Centera, Atmos, Data Domain, and Isilon, to meet different business and
regulatory compliance requirements.
• Integration readiness
Key InfoArchive functions are exposed as web services that are based on a common framework
and can be consumed using Java client libraries or with standard web services tools. You can
develop consumers of these web services and integrate InfoArchive capabilities into your own
business applications.
• Virtualization readiness
InfoArchive supports fully virtualized infrastructures as supported by Documentum Content
Server and can be readily installed on virtual machines.
InfoArchive can bring about the following benefits:
• Reduced cost
As enterprise data grows on storage resources, it can quickly exhaust terabytes or more of
expensive storage space. By offloading infrequently accessed data from tier-one storage to
lower-cost storage tiers, InfoArchive reclaims expensive primary storage capacity and minimizes
storage cost as well as associated administrative cost.
InfoArchive’s ability to permanently preserve and retrieve application data also means that you
can decommission legacy applications, leading to further cost savings.
• Increased operational efficiency
Reduction in stored data on primary storage frees up space for active data and optimizes
performance of existing applications.
In addition, InfoArchive’s powerful high-volume data ingestion capability, unified and
simplified management of structured and unstructured data in a single consolidated archive,
and customizable queries to quickly locate archived data through InfoArchive GUI all help to
increase operational efficiency.
• Data accessibility
Through active archiving, InfoArchive ensures easy and flexible access to archived data by
authorized users.
• Regulatory and legal compliance
Government agencies and other regulatory organizations have established requirements for
data retention, security, and accessibility. Organizations must also be able to retrieve relevant
information in the event of legal discovery, audits, and business or personnel investigations.
By using InfoArchive as the information archiving system, organizations can stay in compliance
with these regulatory requirements and legal mandates. InfoArchive enables data governance
and ensures that historical records are systematically stored in a centralized archive for as long as
you are required to keep them, with security in place to guard against tampering or inadvertent
deletion, and can be retrieved and presented in a timely manner.
• Long-term availability of preserved information
With information stored in the open, vendor-neutral XML format, InfoArchive ensures long-term
immutability and readability of data provided at the storage level. This means that an organization
can continue to access its archived data even when the business application that produced the
data has been decommissioned or the archiving system itself has become unavailable.
• Versatility and flexibility
Whether the data you want to archive is structured or unstructured, simple or highly
complex, and regardless of which source application the data is extracted from and the granularity
of the data, InfoArchive is a truly versatile, flexible, all-in-one solution that can meet all your
archiving requirements.
• Ability to meet today’s specific business requirements as well as tomorrow’s evolving needs
InfoArchive is highly configurable and allows you to customize many aspects of the archiving
process to meet your specific business requirements, from ingestion, to query, to the look and feel
of InfoArchive GUI. InfoArchive also provides you with the utility to update the metadata of
archived data to be in sync with changes in the data model of your source business application.
InfoArchive not only provides the flexibility to tailor to your current business needs, but also lays
a reliable foundation for your evolving requirements into the future, protecting your existing
IT investments.
InfoArchive can be scaled to meet your ever-growing storage needs in the long run, in support of
virtually limitless archiving volumes.
Standards-compliant and built with open technologies, InfoArchive mitigates the danger of
information loss due to fast-changing digital technologies that may render a software application
obsolete in a matter of years. You can rest assured that business-critical data archived by
InfoArchive will always be retrievable and readable well into the future, and can even outlive
the applications that produced it.
InfoArchive Architecture
Built on the proven EMC Documentum platform, InfoArchive taps into the powerful content
management capabilities of Content Server and xDB to manage archived data—both structured data
(i.e. XML documents) and unstructured data (i.e. data stored in any non-XML format). Data is
archived into the repository through a set of standalone Java programs and is queried and retrieved
through InfoArchive web services that are exposed to consumer applications, such as the optional
search application InfoArchive GUI. InfoArchive provides an administration user interface through
customized Documentum Administrator (DA) and lets you perform various archive maintenance
activities through a set of Content Server jobs and command line jobs.
The following InfoArchive system architecture diagram illustrates the required and optional
components in a typical InfoArchive deployment.
Required Components
• Receiver (standalone Java program): Places received SIPs in the ingestion queue.
• Enumerator (standalone Java program): Returns an ordered list of queued ingestions.
• Ingestor (standalone Java program): Archives SIPs.
• Web services (JAX-WS stateless web services): Exposes search and retrieval services to
consumer applications.
• Documentum Administrator (DA) extensions (WDK customization): Provides the user interface
for performing most configuration and administrative functions.
• Content Server jobs (standalone Java programs): Archive maintenance jobs implemented as
Content Server repository objects to perform various tasks ranging from ingestion commit, to
confirmation, to purge.
• Command line jobs (standalone Java programs): Archive maintenance jobs executed by a shell
with predefined arguments to perform tasks such as cleaning reception, ingestion, and xDB
cache working directories.
Optional Components
• InfoArchive GUI (web application): The default search application for searching archived data.
• Order processor (standalone Java program): Launched as a background daemon to execute
orders (asynchronous search requests) and load/unload xDB data files; useful for search
profiles that generally retrieve a huge number of results or execute over an extremely large
range of data.
• xDB cache (standalone Java program): Launched as a background daemon to access xDB data
files at the file system level.
Chapter 2
Concepts: How InfoArchive Works
It is important to understand fundamental InfoArchive concepts before performing configuration
and administration tasks. This chapter grounds you in the essentials of InfoArchive terminology and
concepts that underlie the complete data archiving and retrieval process.
InfoArchive Data Model
InfoArchive is designed around a unified OAIS-compliant data model that dictates the format in
which information is ingested into, stored in, and retrieved from InfoArchive throughout its lifecycle.
Content to be archived is exported from the producer (source application) and packaged into
Submission Information Packages (SIPs). SIPs are ingested into InfoArchive and stored there as Archival
Information Packages (AIPs), one AIP corresponding to one SIP. The consumer (user or application)
retrieves the content from InfoArchive by either performing a search (synchronous) or creating an
order (asynchronous search) on the AIPs; the information units contained in the AIPs (called
AIUs) that match the search criteria are returned as query results.
Archive Holding
A holding is a logical destination archive in which data is ingested and stored, usually data of the
same type that shares common characteristics. For example, you can create a holding to archive data from
the same source application (such as ERP data), of the same format (such as audio recordings), or
belonging to the same business entity.
An InfoArchive instance can contain multiple archive holdings.
You can create multiple archive holdings for a single data type to apply different access rights
and target storage areas, meeting the requirements of different data owners.
Because the definition of an archive holding is highly structured, it must be carefully
designed according to:
• The expected content types to be archived
• The data segregation constraints for isolating the data owned by different business entities
in distinct archive holdings
Most InfoArchive system configurations are performed at the holding level. Holding configuration
encompasses many aspects of data archiving such as storage areas, retention policy, ingestion
sequence, AIP mode, and xDB mode. The settings defined at the archive holding level are used
throughout the whole lifecycle of the data archived in the holding.
Submission Information Package (SIP)
Data to be archived must be packaged into Submission Information Packages (SIPs) to be ingested
into InfoArchive.
A SIP is a data container used to transport data to be archived from the producer (source application)
to InfoArchive. It consists of a SIP descriptor, containing packaging and archival information about
the package, and the data to be archived. The data in turn comprises a PDI file
eas_pdi.xml (structured data) and optionally one or more content files (unstructured data).
A SIP must be compressed into .zip format and have the following files at the root level:
• eas_sip.xml
SIP descriptor that identifies and describes the information package. This file must conform to
the SIP schema.
• eas_pdi.xml
PDI (preservation description information) file that contains the structured data to archive.
InfoArchive does not dictate the structure of this file.
• Optionally, one or more unstructured content files to be archived, such as audio (.mp3), video
(.avi), and graphic (.png) files. These file names must be referenced in the PDI file (eas_pdi.xml).
InfoArchive can archive any type of data produced from any source application as long as the
data is packaged into SIPs that meet all the file structure and format requirements. However,
InfoArchive is not responsible for generating SIPs; the source application must produce them or you
can develop utilities to extract information to be archived from the source application and convert it
into InfoArchive-compliant SIPs. Use a file transfer program of your choice to move the generated
SIPs to where they can be ingested into InfoArchive.
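Because a malformed package is rejected at ingestion, it can be worth checking the layout before transfer. The following is a minimal sketch of such a check; the validate_sip_structure helper is illustrative, not part of InfoArchive, and it only verifies that the two mandatory files sit at the root of the .zip:

```python
import zipfile

REQUIRED_ROOT_FILES = {"eas_sip.xml", "eas_pdi.xml"}  # mandatory SIP members

def validate_sip_structure(sip_path):
    """Return a list of problems found in the SIP package layout."""
    problems = []
    with zipfile.ZipFile(sip_path) as sip:
        # Only consider entries at the root of the archive (no directory part).
        root_names = {n for n in sip.namelist() if "/" not in n.rstrip("/")}
        for required in sorted(REQUIRED_ROOT_FILES):
            if required not in root_names:
                problems.append("missing root-level file: " + required)
    return problems
```

Any additional root-level entries are the unstructured content files; a fuller check would also confirm that each one is referenced in the PDI file (eas_pdi.xml).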
Data Submission Session (DSS)
Every SIP pertains to a data submission session (DSS), also referred to as a batch. The data submission
session (DSS) provides the packaging information used to identify, bind, and relate SIPs of different
levels of granularity produced by the source application.
When information is produced and packaged into a discrete, standalone SIP, the SIP pertains to a
single-package DSS.
Sometimes, information has to be packaged into more than one sequential SIP grouped together
as a batch rather than a single SIP due to limitations in file size, file transfer, time, or the source
application. In this case, multiple SIPs pertain to a single DSS, with each SIP in the batch assigned
a sequence number (seqno). The last SIP in a DSS has the is_last value set to True. A DSS
associates multiple SIPs together.
Each DSS (batch) has a unique identifier derived from the information contained in the SIP descriptor:
external DSS ID = holding + producer + id (internal DSS ID within the SIP)
For example, given the following DSS information:
<holding>PhoneCalls</holding>
<producer>CC</producer>
<id>2011060118</id>
The external DSS ID is PhoneCallsCC2011060118.
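The derivation above can be sketched in code. The helper below is illustrative only: it parses the dss element out of a SIP descriptor and concatenates holding + producer + id, taking the descriptor's XML namespace into account:

```python
import xml.etree.ElementTree as ET

SIP_NS = "urn:x-emc:eas:schema:sip:1.0"  # namespace of eas_sip.xml

def external_dss_id(descriptor_xml):
    """Compute external DSS ID = holding + producer + id from a SIP descriptor."""
    root = ET.fromstring(descriptor_xml)
    dss = root.find("{%s}dss" % SIP_NS)
    text = lambda name: dss.findtext("{%s}%s" % (SIP_NS, name))
    return text("holding") + text("producer") + text("id")
```

For the DSS information shown above, this yields PhoneCallsCC2011060118.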
When archiving multiple SIPs belonging to the same DSS (batch archiving), InfoArchive is insensitive to
the order in which the SIPs are received, regardless of their sequence number, and can ingest multiple
SIPs belonging to the same DSS concurrently. InfoArchive also natively supports the commit and
rollback of all the SIPs in a batch at the DSS level.
The following is an example of the DSS element contained in the SIP descriptor (eas_sip.xml). All
SIPs that belong to the same DSS have the same values in the DSS element.
<?xml version="1.0" encoding="UTF-8"?>
<sip xmlns="urn:x-emc:eas:schema:sip:1.0">
<dss>
<holding>PhoneCalls</holding>
<id>2011060118</id>
<pdi_schema>urn:eas-samples:en:xsd:phonecalls.1.0</pdi_schema>
<pdi_schema_version/>
<production_date>2011-06-01T00:00:00.000</production_date>
<base_retention_date>2011-06-01T00:00:00.000</base_retention_date>
<producer>CC</producer>
<entity>PhoneCalls</entity>
<priority>0</priority>
<application>CC</application>
<retention_class>R1</retention_class>
</dss>
...
</sip>
holding: The destination archive holding into which to ingest the SIP.

id: Internal DSS identifier (ID) assigned by the application or utility that generated the SIP.
This ID, in conjunction with holding and producer, is used to form the external DSS ID that
uniquely identifies a DSS: external DSS ID = holding + producer + id

pdi_schema: The uniform resource name (URN) of the XML schema used to validate the PDI file
(eas_pdi.xml), with its version number appended to it, for example:
urn:eas-samples:en:xsd:phonecalls.1.0

pdi_schema_version: It is recommended that you leave this element blank and append the PDI file
schema version number to the schema URN in the pdi_schema element. When this information is not
included in the schema URN (not recommended), you can specify a version of the schema. This
element is included for alignment with the xsd:schema standard. However, using this element
brings several inherent XML limitations; for example, it is not possible to include a schema in
another schema by referencing its schema version. For this reason, it is recommended that you put
the version directly in the URN of the schema, which is the most common XML practice.

production_date: The datetime when the DSS (batch) that the SIP belongs to was produced, in the
following coordinated universal time (UTC) format: yyyy-mm-ddThh:mm:ss.000
Typically, it is the creation time of the first SIP in a batch.

base_retention_date: The base retention date of the SIP, used to calculate the retention date of
the information package in the archive holding:
retention date = base retention date + retention period of the holding
The date must be in the following coordinated universal time (UTC) format:
yyyy-mm-ddThh:mm:ss.000

producer: The application or utility that produced the SIP. This can be the same as the
application element.

entity: The business entity that owns the information contained in the information package.

priority: The ingestion priority of the batch. Within the same archive holding, InfoArchive
ingests batches with a higher priority value first.

application: The application or utility that produced the SIP. This can be the same as the
producer element.

retention_class: Retention class used to calculate the retention date of the SIP. This element
is optional.
• If empty, the retention period (eas_retention_period property) configured for the destination
archive holding is applied.
• If present, the retention period (eas_retention_class_period property) associated with the
specified retention class (eas_retention_class property) defined for the destination archive
holding is applied. If the retention period or the retention class is not found during reception,
an error occurs.
SIP Descriptor (eas_sip.xml)
A SIP must have a SIP descriptor (eas_sip.xml) at the root level of the package. The SIP descriptor
contains two types of information:
• Archival information about the package such as producer (the source application from which the
data originates), the destination archive holding (an InfoArchive instance can contain multiple
archive holdings), and base retention date. This information is used by InfoArchive during the
ingestion process to facilitate searching archived data. This information is the same for all SIPs in
a DSS (batch).
• Packaging information that provides encapsulation and identification of the content to be
archived. It identifies whether the SIP is a standalone single-item package or one of multiple
sequential packages in a batch (and if so, which sequence number it is).
Each SIP is uniquely identified by an external SIP ID:
external SIP ID = external DSS ID + seqno = holding + producer + internal DSS ID + seqno
The producer information in the external SIP ID allows different source applications or utilities to
produce SIPs without conflicting IDs.
In the following SIP descriptor (eas_sip.xml) example, the external SIP ID is
PhoneCallsCC20110201131:
<?xml version="1.0" encoding="UTF-8"?>
<sip xmlns="urn:x-emc:eas:schema:sip:1.0">
<dss>
<holding>PhoneCalls</holding>
<id>2011020113</id>
<pdi_schema>urn:eas-samples:en:xsd:phonecalls.1.0</pdi_schema>
<pdi_schema_version />
<production_date>2011-02-01T00:00:00.000+01:00</production_date>
<base_retention_date>2011-02-01T00:00:00.000+01:00</base_retention_date>
<producer>CC</producer>
<entity>PhoneCalls</entity>
<priority>0</priority>
<application>CC</application>
</dss>
<production_date>2011-02-01T00:00:00.000+01:00</production_date>
<seqno>1</seqno>
<is_last>true</is_last>
<aiu_count>10</aiu_count>
<page_count>0</page_count>
</sip>
Note: In eas_sip.xml, the text of the holding, id, and producer elements cannot contain the
following reserved characters: < (less than), > (greater than), : (colon), " (double quote), / (forward
slash), \ (backward slash), | (pipe), ? (question mark), * (asterisk).
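A SIP-generating utility can screen these element values up front. A minimal sketch (the helper name is illustrative):

```python
RESERVED = set('<>:"/\\|?*')  # characters disallowed in holding, id, and producer

def has_reserved_chars(value):
    """Return True if the element text contains any reserved character."""
    return any(ch in RESERVED for ch in value)
```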
SIP Descriptor (eas_sip.xml) Schema
A SIP descriptor (eas_sip.xml) must be a valid XML document that conforms to the
predefined InfoArchive SIP descriptor schema (eas_sip.xsd), which can be found in the
install/resources/xsd directory of the InfoArchive installation package.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<xs:schema elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:sip="urn:x-emc:eas:schema:sip:1.0" targetNamespace="urn:x-emc:eas:schema:sip:1.0"
version="1.0">
<xs:element name="sip">
<xs:complexType>
<xs:sequence>
<xs:element name="dss">
<xs:complexType>
<xs:sequence>
<xs:element name="holding" nillable="false">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:maxLength value="32"/>
<xs:minLength value="1"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
<xs:element name="id" nillable="false">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:maxLength value="32"/>
<xs:minLength value="1"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
<xs:element name="pdi_schema" nillable="false">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:maxLength value="256"/>
<xs:minLength value="1"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
<xs:element name="pdi_schema_version" nillable="false">
<xs:simpleType>
<xs:restriction base="xs:token">
<xs:maxLength value="32"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
<xs:element name="production_date" type="xs:dateTime"/>
<xs:element name="base_retention_date" type="xs:dateTime" nillable="false"/>
<xs:element name="producer" nillable="false">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:maxLength value="32"/>
<xs:minLength value="1"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
<xs:element name="entity" nillable="false">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:maxLength value="32"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
<xs:element name="priority" type="xs:int" nillable="false"/>
<xs:element name="application" nillable="false">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:maxLength value="32"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
<xs:element name="retention_class" nillable="false" minOccurs="0">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:minLength value="1"/>
<xs:maxLength value="32"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="production_date" type="xs:dateTime" nillable="false"/>
<xs:element name="seqno" nillable="false">
<xs:simpleType>
<xs:restriction base="xs:int">
<xs:minInclusive value="1"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
<xs:element name="is_last" type="xs:boolean" default="false"/>
<xs:element name="aiu_count" nillable="false">
<xs:simpleType>
<xs:restriction base="xs:long">
<xs:minInclusive value="0"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
<xs:element name="page_count" nillable="false" minOccurs="0">
<xs:simpleType>
<xs:restriction base="xs:long">
<xs:minInclusive value="0"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
<xs:element ref="sip:pdi_hash" minOccurs="0"/>
<xs:element ref="sip:custom" minOccurs="0"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="pdi_hash">
<xs:complexType>
<xs:simpleContent>
<xs:extension base="xs:string">
<xs:attribute name="algorithm" use="required">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="MD2"/>
<xs:enumeration value="MD5"/>
<xs:enumeration value="SHA-1"/>
<xs:enumeration value="SHA-256"/>
<xs:enumeration value="SHA-384"/>
<xs:enumeration value="SHA-512"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
<xs:attribute name="encoding" use="required">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="base64"/>
<xs:enumeration value="hex"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
</xs:extension>
</xs:simpleContent>
</xs:complexType>
</xs:element>
<xs:element name="custom">
<xs:complexType>
<xs:sequence>
<xs:element ref="sip:attributes" minOccurs="0" maxOccurs="1"/>
<xs:element ref="sip:data" minOccurs="0" maxOccurs="1"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="attributes">
<xs:complexType>
<xs:sequence>
<xs:element minOccurs="0" maxOccurs="unbounded" ref="sip:attribute"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="attribute">
<xs:complexType>
<xs:simpleContent>
<xs:extension base="xs:string">
<xs:attribute name="name" use="required" type="xs:string"/>
</xs:extension>
</xs:simpleContent>
</xs:complexType>
</xs:element>
</xs:schema>
The global SIP elements defined in the schema are described as follows:
• production_date: The datetime when the SIP was produced, in the following coordinated
universal time (UTC) format: yyyy-mm-ddThh:mm:ss.000. Generally, it is the creation time of
the PDI file.
• seqno: The sequence number of this SIP, denoting its position in the DSS (batch). This sequence
number, in conjunction with the internal DSS id, holding, and producer information, is used to
form the external SIP ID that uniquely identifies a SIP and avoids conflicting SIP IDs produced
by different source applications:
external SIP ID = holding + producer + id + seqno
• is_last: Whether this SIP is the last one in the DSS (batch).
• aiu_count: The number of AIUs contained in the SIP. This information is used for consistency
checks during ingestion.
• page_count: Not currently implemented; reserved for future use. Always set this to 0 (zero).
• pdi_hash: (Optional) Hash value (Base64 or hexadecimal encoded binary) of the PDI file.
InfoArchive uses this element to perform consistency checks of PDI files at the start of ingestion.
The element has two attributes:
— algorithm: The algorithm used to compute hash values. The following algorithms are
currently supported: MD2, MD5, SHA-1, SHA-256, SHA-384, and SHA-512.
— encoding: The encoding scheme used to store hash values. The hex and base64 encoding
schemes are currently supported.
• custom: Optional element that lets you customize SIP/AIP metadata by defining custom
attributes and/or data to be included in the SIP descriptor (eas_sip.xml). See Customizing
SIP/AIP Metadata, page 30.
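The seqno and pdi_hash elements lend themselves to small producer-side helpers. The sketch below shows hypothetical routines (not InfoArchive APIs): one joins the four components of the external SIP ID (a plain join is assumed; the exact concatenation format InfoArchive uses internally is not documented here), and one computes a pdi_hash value for eas_pdi.xml using Python's hashlib, which covers MD5 and the SHA family but not MD2:

```python
import base64
import hashlib

def external_sip_id(holding, producer, dss_id, seqno):
    """Illustrative only: combine the four components of the external SIP ID.
    The separator is an assumption; InfoArchive's internal format may differ."""
    return "%s %s %s %d" % (holding, producer, dss_id, seqno)

def pdi_hash(path, algorithm="SHA-256", encoding="base64"):
    """Compute a pdi_hash value for the PDI file at `path`.
    hashlib supports MD5 and the SHA family; MD2 is not available in Python."""
    h = hashlib.new(algorithm.replace("-", "").lower())   # e.g. "SHA-256" -> sha256
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return (base64.b64encode(h.digest()).decode("ascii")
            if encoding == "base64" else h.hexdigest())
```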
Customizing SIP/AIP Metadata
You can customize SIP/AIP metadata by modifying the predefined InfoArchive SIP descriptor schema
(eas_sip.xsd) and defining custom elements to be included in the SIP descriptor (eas_sip.xml).
The predefined InfoArchive SIP descriptor schema (eas_sip.xsd) can be found in the
install/resources/xsd directory of the InfoArchive installation package.
In the following example, a custom SIP element data is defined in the SIP descriptor schema:
...
<xs:element name="custom">
<xs:complexType>
<xs:sequence>
<xs:element ref="sip:attributes" minOccurs="0" maxOccurs="1"/>
<xs:element ref="sip:data" minOccurs="0" maxOccurs="1"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="data">
<xs:complexType>
<xs:sequence>
<xs:any processContents="lax" minOccurs="0" maxOccurs="10000"/>
</xs:sequence>
</xs:complexType>
</xs:element>
...
Note: You cannot include a parent node that contains both a text node and an element at the same
level in the custom element; for example:
<data>textnode1<foo1/>textnode2<foo2/></data>
After you define the custom element, you can directly include it in the SIP descriptor.
Note: Currently, custom AIP attributes cannot be pushed to the Centera store.
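As an illustration, a SIP descriptor that includes the custom element might look like the following. All values are made up for the example, and the data content assumes the custom data definition shown earlier:

```xml
<sip xmlns="urn:x-emc:eas:Schema:sip:1.0">
  <dss>
    <holding>PhoneCalls</holding>
    <id>DSS001</id>
    <pdi_schema>urn:eas-samples:en:xsd:phonecalls.1.0</pdi_schema>
    <pdi_schema_version>1.0</pdi_schema_version>
    <production_date>2011-02-01T00:00:00.000+01:00</production_date>
    <base_retention_date>2011-02-01T00:00:00.000+01:00</base_retention_date>
    <producer>CRM</producer>
    <entity>Calls</entity>
    <priority>0</priority>
    <application>PhoneRecorder</application>
  </dss>
  <production_date>2011-02-01T00:00:00.000+01:00</production_date>
  <seqno>1</seqno>
  <is_last>true</is_last>
  <aiu_count>10</aiu_count>
  <page_count>0</page_count>
  <custom>
    <attributes>
      <attribute name="department">Billing</attribute>
    </attributes>
    <data>
      <!-- any elements accepted by the custom data definition -->
    </data>
  </custom>
</sip>
```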
PDI File (eas_pdi.xml)
The PDI (Preservation Description Information) file eas_pdi.xml in a SIP stores structured data
to archive.
Unlike the SIP descriptor (eas_sip.xml), there is no predefined schema for the PDI file. You can
define your own schema using an XML editor for the type of data to archive based on your business
requirements (the PDI file name must be eas_pdi.xml though). Any schema can be used as long as
you import it into InfoArchive and perform some additional configurations.
A file name can be stored as a value of one or several XML elements and/or attributes, and the same
file name can be referenced multiple times within an eas_pdi.xml file.
Here is an example of a PDI file containing phone call recordings (eas_pdi.xml):
<?xml version="1.0" encoding="UTF-8" ?>
<Calls xmlns="urn:eas-samples:en:xsd:phonecalls.1.0">
<Call>
<SentToArchiveDate>2011-06-01</SentToArchiveDate>
<CallStartDate>2011-05-27T04:11:37.234+01:00</CallStartDate>
<CallEndDate>2011-05-27T04:57:16.234+01:00</CallEndDate>
<CallFromPhoneNumber>1773708622</CallFromPhoneNumber>
<CallToPhoneNumber>251286403</CallToPhoneNumber>
<CustomerID>000601</CustomerID>
<CustomerLastName>Mills</CustomerLastName>
<CustomerFirstName>William</CustomerFirstName>
<RepresentativeID>024</RepresentativeID>
<Attachments>
<Attachment>
<AttachmentName>recording1</AttachmentName>
<FileName>recording1.mp3</FileName>
<CreatedBy>PhoneRecorder</CreatedBy>
<CreatedOnDate>2011-05-27T04:57:16.234+01:00</CreatedOnDate>
</Attachment>
<Attachment>
<AttachmentName>recording2</AttachmentName>
<FileName>recording2.mp3</FileName>
<CreatedBy>PhoneRecorder</CreatedBy>
<CreatedOnDate>2011-05-27T04:57:16.234+01:00</CreatedOnDate>
</Attachment>
</Attachments>
</Call>
<Call>
<SentToArchiveDate>2011-06-01</SentToArchiveDate>
<CallStartDate>2011-05-04T19:56:28.234+01:00</CallStartDate>
<CallEndDate>2011-05-04T20:19:50.234+01:00</CallEndDate>
<CallFromPhoneNumber>1616330136</CallFromPhoneNumber>
<CallToPhoneNumber>1885236136</CallToPhoneNumber>
<CustomerID>000123</CustomerID>
<CustomerLastName>Martin</CustomerLastName>
<CustomerFirstName>Camila</CustomerFirstName>
<RepresentativeID>013</RepresentativeID>
<Attachments>
<Attachment>
<AttachmentName>recording3</AttachmentName>
<FileName>recording3.mp3</FileName>
<CreatedBy>PhoneRecorder</CreatedBy>
<CreatedOnDate>2011-05-04T20:19:50.234+01:00</CreatedOnDate>
</Attachment>
</Attachments>
</Call>
<Call>
...
</Call>
</Calls>
Archival Information Unit (AIU)
An archival information unit (AIU) is conceptually the smallest archival unit (like an information
atom) of an information package. Each AIU corresponds to a record or item of the archived data. A
single customer order, a patient profile, or a financial transaction record in an information package
is an AIU.
The PDI file (eas_pdi.xml) in a SIP describes all the AIUs in the package. An AIU in eas_pdi.xml
consists of an XML block in the file containing its structured data, and optionally, references to one
or more associated unstructured content files.
In the following example:
• AIU #1 is described by its structured data in eas_pdi.xml with a reference to one content file.
• AIU #2 contains two content files referenced by its structured data in eas_pdi.xml.
• AIU #3 only contains structured data stored in eas_pdi.xml with no content files.
InfoArchive processes all AIUs in the same way regardless of whether they contain any content files.
Archival Information Package (AIP)
When a SIP is ingested into InfoArchive, it is converted into an archival information package (AIP)
and stored in the system. An AIP is represented as an eas_aip type object in Content Server.
The way InfoArchive organizes and stores information in an AIP is very different from the way
information is packaged in a SIP. During the SIP to AIP conversion process, information is extracted
from the SIP descriptor and stored in the properties of the AIP, along with some additional archival
information. Content files, if any, are imported as a part of the AIP object and stored on the
designated file store. The structured data of all the AIUs in eas_pdi.xml within the received SIP are
stored as an XML document in xDB.
Like a SIP, an AIP is also made up of AIUs. Although stored physically very differently in
InfoArchive than in the SIP file, AIUs conceptually remain the basic unit that constitutes an AIP.
Unstructured Content File Table of Contents (eas_ri_xml)
Unstructured content files associated with AIUs, if any, are imported as the original content of the
AIP object in the aggregated content file (eas_ci_container) format and stored in the configured
storage area. The eas_ci_container rendition content aggregates all unstructured data associated
with the AIP. Meanwhile, a table of contents (eas_ri_xml) which describes the aggregated file is
generated by the ingestion process.
The description of each unstructured content file is stored in the eas_ri_xml (table of contents)
rendition of the AIP.
The unstructured data table of contents (eas_ri_xml) is also imported into xDB, with indexes created
on the file name and sequence number elements. The file extension for the table of contents XML file is ri.
The table of contents contains the following information:
• File name
• File size
• Format
• Sequence number of the unstructured content file in the AIP
• Byte position in the eas_ci_container rendition content
• Hash value for future consistency checking purposes (optional)
• Operations performed during the ingestion such as compression and encryption (optional)
Here is an example of the table of contents:
<?xml version="1.0"?>
<ris xmlns="urn:x-emc:eas:schema:ri">
<ri schemaVersion="1.0" seqno="1" pdi_key="recording1.mp3"
xmlns:ri="urn:x-emc:eas:schema:ri">
<step seqno="1">
<ci mime_type="audio/x-mpeg" dctm_format="mp3" size="41123"/>
</step>
</ri>
<ri schemaVersion="1.0" seqno="2" pdi_key="recording2.mp3"
xmlns:ri="urn:x-emc:eas:schema:ri">
<step seqno="1">
<ci mime_type="audio/x-mpeg" dctm_format="mp3" size="40183"/>
</step>
</ri>
</ris>
• The namespace is urn:x-emc:eas:schema:ri.
• Each content file is described by an ri element, which has the following attributes:
— schemaVersion: Always set to 1.0.
— seqno: Sequence number of the unstructured file in the AIP, incremented for each content
file name returned by the configured XQuery defined in the content of the ingestion
configuration (eas_cfg_pdi) object.
— pdi_key: The original content file name.
• Each step element has a ci sub-element, which has the following attributes:
— mime_type: MIME type corresponding to the format of the content file, as returned by the
configured query.
— dctm_format: Repository format corresponding to the format of the content file.
— position: Byte position in the eas_ci_container rendition.
— size: Byte size in the eas_ci_container rendition.
Together, seqno and pdi_key form a unique identifier for a content file. If the query returns several
results for a single content file, an error occurs.
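The table of contents is straightforward to consume programmatically. The following sketch is hypothetical (not an InfoArchive utility) and uses Python's standard ElementTree to extract the seqno, pdi_key, and size of each entry from an eas_ri_xml document:

```python
import xml.etree.ElementTree as ET

# Namespace of the eas_ri_xml table of contents, as shown in the example above.
RI_NS = "{urn:x-emc:eas:schema:ri}"

def list_toc_entries(toc_xml):
    """Return (seqno, pdi_key, size) for every <ri> entry in a TOC document."""
    root = ET.fromstring(toc_xml)
    entries = []
    for ri in root.findall(RI_NS + "ri"):
        ci = ri.find(RI_NS + "step/" + RI_NS + "ci")
        entries.append((int(ri.get("seqno")), ri.get("pdi_key"),
                        int(ci.get("size"))))
    return entries

# Sample TOC mirroring the example in the text.
SAMPLE = """<?xml version="1.0"?>
<ris xmlns="urn:x-emc:eas:schema:ri">
  <ri schemaVersion="1.0" seqno="1" pdi_key="recording1.mp3">
    <step seqno="1"><ci mime_type="audio/x-mpeg" dctm_format="mp3" size="41123"/></step>
  </ri>
  <ri schemaVersion="1.0" seqno="2" pdi_key="recording2.mp3">
    <step seqno="1"><ci mime_type="audio/x-mpeg" dctm_format="mp3" size="40183"/></step>
  </ri>
</ris>"""
```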
The following AIP properties are related to the unstructured data.
• eas_ci_cnt (DA label: #contents): Number of unstructured data files associated with the AIP
• eas_ci_size (DA label: Cumulated size of the contents): Combined size of all of the
unstructured data files
• eas_ci_format (DA label: Content store format): Format of the content storing the AIP
unstructured data (always eas_ci_container)
• Content description size: Size in bytes of an <ri> block in the TOC
Archival Information Collection (AIC)
AIPs provide the information necessary for a consumer to locate and order AIUs of interest. However,
it can be impossible for a consumer to sort through the millions of AIPs contained in a large archive.
AICs address this problem.
An Archival Information Collection (AIC) organizes a set of AIPs, which can support flexible and
efficient data access. AIPs are aggregated into the AIC using specified criteria (through DQL, for
example) determined by the archivist. Generally AICs are based on the AIPs of interest having
common themes or origins, and a common set of properties. For example, an archive for digital
movies may have AICs based on the subject area of the movie such as action, science fiction, or
horror. In addition, the archive may have AICs based on other factors such as director or lead actor.
It is common for one AIC to correspond to one holding. At a minimum, InfoArchive can be viewed
as having at least one AIC, which contains all the AIPs archived in the system.
How Data is Ingested
Ingestion Modes
Companies today are dealing with all sorts of information that varies widely in volume and variety
from business to business, with the need to enforce different data management policies on different
types of information in compliance with business rules and regulatory requirements. InfoArchive
provides two ingestion modes to serve a wide range of information archiving needs with optimal
storage usage and performance: asynchronous ingestion and synchronous ingestion. A holding can
be configured to support both ingestion modes.
Asynchronous Ingestion
In the asynchronous ingestion mode, InfoArchive performs scheduled ingestions of large SIPs
(containing large numbers of AIUs) in batches. Each Data Submission Session (DSS), or batch,
contains multiple sequentially numbered SIPs. SIPs in a batch can be ingested in any order, but
ingested packages are not committed and made searchable until all the SIPs in a batch have been
ingested. As a result, information is not searchable or accessible immediately after ingestion.
Asynchronous ingestion provides small data management granularity: you can manage each
package individually, for example, by adjusting its retention date and applying permission sets.
Asynchronous ingestion is performed through a set of scheduled scripts.
Synchronous Ingestion
In the synchronous ingestion mode (also called transactional ingestion mode), the client application
calls the InfoArchive ingest web service to synchronously archive data to the archive holding. If
the ingest web service call is successful, archived data can be searched and accessed immediately
after ingestion.
You use synchronous ingestion in the following scenarios:
• Regulatory compliance
In some industries, regulations mandate that certain types of documents must be synchronously
archived to their final location.
• Business requirements
In some business environments, data must be promptly archived and immediately accessible
so that it can be referenced by a business process to achieve a high service level (as stipulated
in the service level agreement).
• Small data management granularity
You want to manage archived data at the AIP level—for example, you want to apply policies,
access rights, and folder classifications to individual AIPs and not to groups of AIPs.
Synchronous ingestion is suitable for ingesting large quantities of small, standalone SIPs (one SIP
per DSS, or batch, each containing a small number of AIUs) that can be managed at the package level.
The synchronous ingestion mode has the following characteristics:
• Only discrete, standalone SIPs can be ingested. A standalone SIP is the only (the first and last)
SIP in a DSS (batch), as indicated by the following properties in its SIP descriptor: seqno=1,
is_last=true.
• SIPs share a set of common characteristics that can become shared properties of the parent
shareable AIP object; for example, the same target archive holding, PDI schema, retention class,
close base retention dates, and permission sets.
• SIPs are of small size, containing a moderate number of AIUs.
• Received SIPs are ingested immediately without delay.
The synchronous ingestion mode has the following limitations:
• If InfoArchive receives the same SIP from a business application multiple times, it will archive
multiple identical copies of the SIP.
• Unlike in the asynchronous ingestion mode, you cannot defer the commit or roll back ingestions.
• The synchronous ingestion mode provides large management granularity, which means that you
can only manage packages collectively at the aggregated AIP level.
• This mode is not suitable for ingesting large SIPs, which may cause system latency, timeouts,
unresponsiveness, or crashes.
• Compared with the asynchronous ingestion mode, the synchronous ingestion mode is less
efficient, consumes more resources, and has a larger storage footprint.
Synchronous ingestion supports all three AIP modes.
Although synchronous ingestion supports AIP mode 1, this AIP mode has the largest storage
footprint at the repository and RDBMS level. Therefore, use AIP mode 1 only when there are only a
moderate number of AIPs to retain during the retention period.
You perform synchronous ingestion by calling InfoArchive Web services exposed to consumers.
For information about how to consume InfoArchive Web services, refer to the EMC InfoArchive
Development Guide.
Asynchronous Ingestion vs. Synchronous Ingestion
An archive holding can concurrently support both ingestion modes.
Both ingestion modes share the same ingestion configuration and underlying ingestion
code, and when you search archived data, it makes no difference which ingestion mode was used
to archive the data.
However, since SIPs are ingested in an ad hoc and discrete manner rather than in scheduled batches,
synchronous ingestion is relatively less efficient, consumes more system resources, and has a larger
storage footprint.
Choose the appropriate ingestion mode to archive your information based on the nature of the data,
system environment, information management policies, and business rules or regulatory compliance
requirements.
The following table compares the two ingestion modes:
• Number of SIPs per DSS (batch): Asynchronous: multiple (sequentially numbered).
Synchronous: one (discrete, standalone).
• SIP size (number of AIUs in a SIP): Asynchronous: large. Synchronous: small.
• Ingestion frequency: Asynchronous: scheduled. Synchronous: ad hoc.
• Ingestion commit: Asynchronous: not until all the SIPs in a batch have been ingested.
Synchronous: immediately after ingestion.
• Supports ingestion rollback: Asynchronous: yes. Synchronous: no.
• Immediately searchable and accessible: Asynchronous: no. Synchronous: yes.
• Supported AIP modes: Asynchronous: 1. Synchronous: 1, 2, 3.
• Management granularity: Asynchronous: small. Synchronous: large.
• Ingestion efficiency: Asynchronous: high. Synchronous: low.
• Resource consumption: Asynchronous: low. Synchronous: high.
• Use case: Asynchronous: large packages that need to be managed individually (for example,
video footage). Synchronous: large quantities of small packages with shared properties that can
be managed collectively (for example, payment transaction records).
• Performed through: Asynchronous: a set of scheduled scripts. Synchronous: the exposed
Ingest service.
Archiving Process
Both asynchronous and synchronous archiving processes consist of four distinct sub-processes:
reception, enumeration, ingestion, and commit. The only difference is that for asynchronous
ingestion, each of the sub-processes is executed through a scheduled script located in EAS_HOME/bin
whereas for synchronous ingestion, all the sub-processes are executed in one run through the
InfoArchive web service call and are transparent to the user.
• Reception
Queues the SIP file for ingestion. An AIP object is created in the Documentum repository and
assigned attribute values based on the information in the SIP.
• Enumeration
Outputs a list of AIPs awaiting ingestion, in descending order of ingestion priority.
• Ingestion
Ingests the AIPs and their associated AIUs, in order of AIP ingestion priority. At this stage, the
AIUs are not searchable yet.
• Commit
Commits the AIUs into the xDB database. The AIU data for the holding can now be searched.
Reception
Receiver
Reception is performed by running the receiver, a standalone Java program that adds a received
SIP file to the ingestion queue.
This program is typically activated either by third-party file transfer software after a SIP file is
received, or by a custom script launched as a daemon that periodically scans predefined reception
directories and launches the receiver for each file found. If necessary, several concurrent instances of
the receiver can be launched.
The arguments expected by the receiver are stored in a properties file. You can override the default
arguments on the command line. One property contains the name of a receiver configuration object
containing additional configuration settings.
Besides the arguments and the receiver configuration object, the execution of the receiver is also driven
by the configuration of the target archive holding referenced by the SIP descriptor.
The receiver connects to the repository, loads the referenced receiver configuration object, and then:
• Creates an AIP repository object and assigns attributes with the reception start date, the path
name and the size of the received file
• Creates a reception working directory and initializes a reception log file
• Attaches the reception lifecycle to the AIP repository object
• Checks that the file path name matches an acceptable pattern defined in the receiver configuration
object
• Extracts the SIP descriptor from the received SIP file
• Assigns attributes of the AIP repository object with information read in the SIP descriptor
• Checks that the archive holding referenced in the SIP descriptor is configured
• Creates a destruction lock on the AIP repository object if activated in the archive holding
configuration
• Classifies the AIP repository object as defined in the archive holding configuration
• Checks the presence of the hash value in the SIP descriptor if demanded by the archive holding
configuration
• Checks whether another non-invalidated AIP repository object already has the same identifier
as the received file
• Checks whether the Data Submission Session (DSS) sequence number is consistent with those of
the other AIP repository objects referencing the same DSS (asynchronous ingestion only)
• Rejects the received file if it references a rejected DSS identifier
• Encrypts the received file if activated by the archive holding configuration
• Imports several files as renditions of the AIP repository object
— The received file as format eas_sip_zip or eas_sip_zip_crypto.
— The SIP descriptor as format eas_sip_xml.
— The compressed reception log file as format eas_receive_logs_zip.
• Destroys the received file
Throughout the execution of these actions, the receiver promotes the AIP repository object within its
attached reception lifecycle.
Consequently, a successful reception leads to an AIP repository object that is in the
Waiting_Ingestion state.
The reception working directories are periodically destroyed by the Clean (eas-launch-clean) job.
Importing the received file as content of the AIP repository object both secures it in a private Content
Server storage area and makes it available to any ingestor running on any server connected to the
network.
Reception Process and Lifecycle
The SIP reception process entails a sequence of predefined processing steps, each associated with
a lifecycle state.
• Initialization:
— Extracts and performs XML validation of the SIP descriptor (eas_sip.xml)
— Sets the AIP properties based on the information in the SIP descriptor (eas_sip.xml)
— Verifies that the target archive holding is configured
• Validation: Performs the following validations and actions:
— If this SIP has already been processed, raises an error
Note: This validation is only performed in asynchronous ingestion; synchronous ingestion
does not check for SIP duplication.
— If this SIP belongs to a rejected DSS (batch), attaches it to the reject lifecycle
— If this SIP does not reference the PDI schema configured for the holding, attaches it to the
reject lifecycle
— Computes the ingestion deadline and retention date
The administrator can reject a DSS (batch) by rejecting a SIP within the DSS that has already
been received, while other SIPs of the DSS are still being produced or transferred. In this
situation, the reception of further SIPs for this DSS leads the receiver to immediately attach the
AIPs to the rejection lifecycle without raising any error. This can be useful when you manage a
large DSS for which many large SIPs are created.
• Encryption (optional): Encrypts the entire received SIP file, using a configured crypto provider
such as EMC RSA DPM (Data Protection Manager).
• Import: Imports the following as contents of the AIP:
— SIP descriptor (eas_sip.xml)
— Received SIP file
— Gzipped reception log file (optional)
• Waiting_Ingestion: No action
Reception Node
The receiver processes SIPs according to instructions set in the reception node. You must configure a
reception node for a holding. Multiple holdings can share the same reception node.
A reception node contains the following settings:
• The repository folder path in which the AIP object must be initially classified.
• The file system path of the directory in which the reception working directory must be created.
• A table describing the patterns of the accepted incoming files; for example:
File Name Pattern      Customer Pattern    Type Pattern
*.zip                  PRODUCTION          SEPA
*.zip                  PRODUCTION          ATMLOGS
The receiver sequentially scans the table to find a row whose patterns match the received filename,
customer, and type arguments.
The customer and type arguments are intended to be passed by the third-party file transfer software:
• The customer argument is generally set to the identification of the file transfer destination agent,
which differs per environment.
• The type argument is generally set to a code associated with the data type transported by
the received file.
Using the customer and type arguments is not mandatory when this information can be
deduced from the normalized filename of the received file; for example:
File Name Pattern      Customer Pattern    Type Pattern
prod_sepa_*.zip        *                   *
prod_atmlogs_*.zip     *                   *
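The receiver's table scan amounts to glob-style matching on three values. The sketch below is hypothetical (not the actual receiver code) and reproduces that lookup with Python's fnmatch, using the normalized-filename table above:

```python
from fnmatch import fnmatch

# Rows of (file name pattern, customer pattern, type pattern), scanned in order.
PATTERN_TABLE = [
    ("prod_sepa_*.zip",    "*", "*"),
    ("prod_atmlogs_*.zip", "*", "*"),
]

def match_reception(filename, customer, sip_type):
    """Return the first row whose three patterns all match, or None."""
    for row in PATTERN_TABLE:
        if (fnmatch(filename, row[0]) and fnmatch(customer, row[1])
                and fnmatch(sip_type, row[2])):
            return row
    return None
```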
By default, for diagnostic purposes, the receiver does not delete the received file when a reception
error occurs. You can alter this behavior by adjusting the settings of the receiver configuration
object so that the file is deleted:
• If it does not match any acceptable pattern.
• If it matches a given pattern but the error occurs after the match is found.
Activate this mechanism when constraints require that files containing sensitive data not be kept
in regular form.
Enumerator
The ingestion enumerator is a standalone Java utility that returns a list of AIP identifiers
pending ingestion, in descending order of ingestion priority.
This utility is typically invoked on each ingestion node by a custom shell script that is run
periodically by the corporate job scheduler, or launched as a permanent daemon.
The arguments expected by the enumerator are read by default from a properties file but can be
overridden on the command line.
Every AIP is associated with a given archive holding, and the holding configuration defines which
ingestion nodes are allowed to execute an ingestion for the holding. Based on this information, the
nodes argument restricts the selection to AIPs that can be ingested by the ingestion nodes named
in the argument. This mechanism allows each ingestion node to obtain the list of AIP identifiers
it can process.
The operational plan defines the maximum number of concurrent ingestions that can be
executed on each ingestion node at any point in time. It is the responsibility of the custom shell to
determine that number, but the following arguments restrict the returned list to the AIPs for which
an ingestion can be launched:
• Passing that number as the value of the max argument leads the enumerator to return up to that
number of AIPs.
• Also passing the minusrunning value in the flag argument leads the enumerator to:
— Determine the number of ingestions currently being executed
— Subtract that number from the value passed in the max argument
For example:
• 20 AIPs are pending for ingestion and eligible for being processed by the ingestion node 1
• The maximum number of concurrent ingestions allowed for the ingestion node 1 is 10
• 3 ingestions are being executed on the ingestion node 1
• A shell launched on ingestion node 1 invokes the enumerator with the nodes=ingestion node
1, max=10, and f=minusrunning arguments. The enumerator returns 7 AIPs (at most 10 – 3 = 7)
• Consequently, the custom shell can immediately launch an ingestion for each returned AIP
without risk of exceeding the maximum allowed number of concurrent ingestions.
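The max/minusrunning arithmetic in this example reduces to a one-line rule. A hypothetical sketch of the computation (not the actual enumerator code):

```python
def aips_to_return(pending, max_concurrent, running=0, minusrunning=True):
    """How many AIP identifiers the enumerator returns for an ingestion node."""
    limit = max_concurrent - running if minusrunning else max_concurrent
    return max(0, min(pending, limit))

# Example from the text: 20 pending, max=10, 3 running -> 7 AIPs returned.
```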
Ingestion
Ingestor
The ingestor is a standalone Java program which executes the ingestion of an AIP.
This program is generally launched by the custom shell that invokes the enumerator, in one of
two ways:
• Direct activation of the ingestor as a subprocess per AIP to ingest
• Indirect activation through the triggering of a job created per AIP
The arguments expected by the ingestor are read by default from a properties file but can be
overridden on the command line. One property contains the name of an ingestion node configuration object
containing additional configuration settings.
Besides the arguments and the ingestion node configuration object, the ingestion is mainly driven by
the configuration of the archive holding to which the AIP belongs.
The ingestor connects to the repository, loads the referenced ingestor configuration object, and then:
• Creates an ingestion working directory and initializes an ingestion log file
• Analyzes the configuration to determine the ingestion sequence to apply
• Attaches the lifecycle associated with the ingestion sequence to the AIP repository object
• Sequentially executes the ingestion steps described in the ingestion sequence (e.g., validation of
the structured data of the AIUs, import of the structured data in xDB, creation of xDB indexes,
compression of the contents associated with the AIUs, …).
• Enriches the AIP repository object with additional contents according to the configuration (e.g.,
compressed xDB detachable library, ingestion logs, aggregate containing the contents associated
with the AIUs, table of contents, …)
Throughout the execution of the ingestion, the ingestor promotes the AIP repository object within the
lifecycle associated with the applied ingestion sequence.
A successful ingestion leads to an AIP repository object which is in the pending commit state.
The ingestion working directories are periodically destroyed by the Clean (eas-launch-clean) job.
Ingestion Process and Lifecycle
The SIP ingestion process entails a sequence of processing steps. Ingestion step actions are executed
by processors referenced in the XML content of the ingestion configuration (eas_cfg_pdi) object. The
XML content also contains the settings for these processors.
InfoArchive ships with two pre-configured ingestion configuration objects:
• eas_ingest_sip_zip-pdi
The ingestion process for ingesting structured data only—SIPs containing no content files.
• eas_ingest_sip_zip-ci
The ingestion process for ingesting both structured and unstructured data—SIPs containing
content files.
For information about the SIP structure, see Submission Information Package (SIP), page 22.
In addition to the same processing steps as eas_ingest_sip_zip-pdi for ingesting structured data,
eas_ingest_sip_zip-ci also contains the steps for processing unstructured data.
eas_ingest_sip_zip-ci must be configured for archiving unstructured data. The process also works
for structured data, but if you archive structured data only, eas_ingest_sip_zip-pdi should be
configured since it entails less overhead, with fewer processing steps and fewer audit records.
eas_ingest_sip_zip-pdi is associated with the eas_ingest_sip_zip-pdi.en_US lifecycle, which contains
structured data processing related states. The eas_ingest_sip_zip-ci ingestion process is associated
with the eas_ingest_sip_zip-ci.en_US lifecycle, which contains additional unstructured data processing
related states. There are no out-of-the-box lifecycles for other locales such as fr_FR and zh_CN.
Structured Data Lifecycle States and Actions:
• Initialize: Creates the ingestion working directory
• Decrypt (optional): Decrypts the SIP file if InfoArchive has been configured for encrypting
received SIPs during ingestion using a crypto provider such as EMC RSA DPM (Data Protection
Manager)
• Decompress_Metadata: Extracts eas_pdi.xml to the ingestion working directory and compresses
it as an eas_pdi_xml_gzip rendition of the AIP object (eas_aip) if configured for the holding
Note: As opposed to the zip format, which can compress and store multiple files, gzip
compresses a single file only. Unstructured data is archived separately as the
eas_ci_container rendition of the AIP (eas_aip) object.
• Validate_Metadata_Hash: Checks the eas_pdi.xml hash, if present in the SIP descriptor
• Encrypt_Metadata (optional): Encrypts only the configured subset of structured data in
eas_pdi.xml, as well as the compressed eas_pdi.xml file, before storing it as an
eas_pdi_xml_gzip_crypto rendition
• Import_Metadata: Creates the xDB detachable library associated with the AIP (xDB mode 2)
and imports eas_pdi.xml into xDB
• Query_Metadata: Executes XQuery to count the number of archived AIUs; executes XQuery to
obtain the minimum and maximum values of the partitioning criteria; assigns the minimum and
maximum values to the eas_pkey_min_date and eas_pkey_max_date attributes of the AIP object
• Create_Metadata_Index: Creates xDB indexes
• Import: Imports the following as renditions of the AIP object: the compressed eas_pdi.xml file
(optionally ciphered) for reversibility; the compressed xDB detachable library containing the
data extracted from eas_pdi.xml (xDB mode 2); the compressed ingestion log file
• Waiting_Commit: The AIP has been successfully ingested but is waiting for the final commit
before it can be searched. If synchronous commit is enabled on the holding configuration
object, standalone SIPs (seqno=1 and is_last=true) bypass this step and are directly committed.
Unstructured Data Lifecycle States and Actions:
• Decompress_Content: Unzips the unstructured data files contained in the SIP to the ingestion
working directory; executes the configured XQuery on the structured data to initialize the TOC;
checks for the presence of each unstructured data file
• Validate_Content_Hash: Executes the configured XQuery on the structured data to compute and
check the hash values of the unstructured data files
• Compress_Content: Executes the configured XQuery on the structured data to compress the
desired unstructured data files. Unstructured data is usually provided in an electronic format
that is already compressed, such as pdf, tiff, jpeg, mp3, mp4, and so on. You can configure the
processor not to compress specified types of compressed data to reduce the processing overhead
during ingestion and retrieval while still having an acceptable compression ratio.
• Encrypt_Content: Executes the configured XQuery on the structured data to encrypt the desired
unstructured data files
• Aggregate_Content: Aggregates the processed unstructured data files
Ingestion Node
For scalability purposes, the ingestion workload can be distributed across multiple ingestion nodes.
A repeating property of the archive holding configuration defines the ingestion nodes allowed
to process the ingestion of the SIPs.
This setting is considered by the InfoArchive enumerator activated on each ingestion node: the
enumerator returns the list of AIPs pending ingestion, restricted to those allowed to be processed
on the current node. This mechanism lets you expedite ingestion for specific holdings on dedicated
ingestion nodes.
Ingestion Priority
The InfoArchive enumerator returns the list of AIPs pending ingestion in descending order
of their ingestion priority.
This ingestion priority is driven by two sorting criteria:
1. Descending order of the ingestion priority of the destination archive holding
A SIP destined for an archive holding with a high ingestion priority is returned before a SIP
destined for a holding with a low ingestion priority.
2. Ascending order of the ingestion deadline date of the SIP
For SIPs to be archived in archive holdings having the same ingestion priority, the SIP with
the closest ingestion deadline is returned first.
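The two sorting criteria above can be sketched as a sort key (illustrative Python; the dictionary field names are hypothetical stand-ins for the holding's ingestion priority and the SIP's deadline date, not InfoArchive attributes):

```python
def enumeration_order(pending_aips):
    """Order pending AIPs: higher holding priority first, then earlier
    ingestion deadline first (ISO date strings sort chronologically)."""
    return sorted(
        pending_aips,
        key=lambda aip: (-aip["holding_priority"],  # criterion 1: descending
                         aip["deadline"]))          # criterion 2: ascending

aips = [
    {"id": "A", "holding_priority": 1, "deadline": "2014-07-03"},
    {"id": "B", "holding_priority": 5, "deadline": "2014-07-09"},
    {"id": "C", "holding_priority": 5, "deadline": "2014-07-01"},
]
print([a["id"] for a in enumeration_order(aips)])  # ['C', 'B', 'A']
```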
How Data is Stored
InfoArchive ingests SIPs into the repository and stores data as AIPs, each AIP corresponding to one
SIP. The AIP is represented by the eas_aip type object in Content Server. An AIP repository object
and an xDB data file follow the proprietary Content Server and xDB data models, respectively.
The following general actions are performed during the ingestion:
• The information contained in the SIP descriptor (eas_sip.xml) and PDI file (eas_pdi.xml)
populates the properties of the AIP repository object in order to facilitate the subsequent
operations.
• Unstructured content files associated with AIUs, if any, are imported as the original content of the
AIP object in the aggregated content file (eas_ci_container) format and stored in the configured
storage area. The eas_ci_container rendition content aggregates all unstructured data associated
with the AIP. Meanwhile, a table of contents (eas_ri_xml) which describes the aggregated file is
generated during the ingestion process.
• The structured data files (eas_sip.xml, eas_pdi.xml, and eas_ri.xml) contained in the
SIP are imported into xDB.
In situations where full data reversibility is required at the storage level, settings of the archive
holding enable redundant storage of this information as renditions of the AIP repository object.
• eas_sip_xml: Rendition storing the SIP descriptor
• eas_pdi_xml_gzip: Rendition storing the SIP structured data compressed with the gzip algorithm
When AIUs have associated contents, the following information is always stored as content of the
AIP repository object regardless of the related holding settings.
• eas_ci_container: This content aggregates all the contents associated with AIUs. It is imported
as the original content of the AIP repository object.
• eas_empty: When the AIP does not contain any unstructured content, the AIP content is in
eas_empty format.
• eas_ri_xml: This rendition corresponds to the table of contents which describes the aggregated
content file (eas_ci_container).
xDB Modes
xDB modes determine how structured data is stored in xDB.
A SIP may contain three types of structured data files. InfoArchive assigns a new file extension to
metadata files when importing them into xDB:
• SIP descriptor: XML file describing the attributes of a SIP (document file extension: sip)
• PDI file: XML file describing the attributes of AIUs (document file extension: pdi)
• Table of contents: XML file describing the attributes of the content files contained in a SIP
(document file extension: ri)
There are three ways structured data is ingested into xDB:
• xDB Mode 1
Structured data files of all ingested AIPs are stored directly in a designated xDB detachable library.
xDB mode 1 is suitable for the following scenarios:
— Frequent searches required on AIPs with long retention periods
— Large number of AIPs with a moderate structured data volume
— Long xDB backup time acceptable
• xDB Mode 2
For each ingested AIP, a new sub-library is created in a designated xDB detachable library and
structured data files are stored in the sub-library. The AIP object ID (eas_aip_id) constitutes
part of the sub-library name.
xDB mode 2 is suitable for ingesting a moderate number of large AIPs with a high structured
data volume.
• xDB Mode 3
Like in xDB mode 2, AIP structured data files are stored in the sub-library exclusively created for
each AIP. However, AIP sub-libraries are organized according to a defined assignment policy
in a pool of libraries (called pooled libraries, which is purely an InfoArchive concept and not a
library type in xDB) created in a designated xDB detachable library. The pooled libraries logically
pertain to a configured library pool.
The logical library pool in xDB mode 3 gives you another layer of data manageability. For
example, you can configure the assignment policy to ensure that a given pooled library stores
only AIPs belonging to a specific holding, or group AIPs in different pooled libraries by
year/month/week based on their retention dates.
xDB mode 3 also significantly increases the ability to manage more archived data in an xDB
database without suffering from performance issues. Compared with xDB mode 2, xDB mode 3
dramatically reduces the number of managed xDB detachable libraries. For performance reasons,
managing more than 100,000 detachable libraries in an xDB database is not recommended.
Note: xDB creates a library or a sub-library when all XML files in the folder are compressed and
archived to the same destination. For example, when the xDB mode is set to 2, or the AIP mode
is set to 3.
The example below illustrates how AIP structured data files are stored in xDB after ingestion by
each of the three xDB modes:
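Schematically, the three layouts can be sketched as follows (library and document names here are hypothetical, for illustration only):

```
xDB mode 1: structured data files of all AIPs directly in one detachable library
    detachable-library/
        <AIP 1 files: .sip, .pdi, .ri>
        <AIP 2 files: .sip, .pdi, .ri>

xDB mode 2: one sub-library per AIP, named in part with the AIP object ID
    detachable-library/
        <eas_aip_id 1>/   <.sip, .pdi, .ri>
        <eas_aip_id 2>/   <.sip, .pdi, .ri>

xDB mode 3: per-AIP sub-libraries grouped into pooled libraries
    detachable-library/
        pooled-library-A/
            <eas_aip_id 1>/   <.sip, .pdi, .ri>
            <eas_aip_id 2>/   <.sip, .pdi, .ri>
        pooled-library-B/
            <eas_aip_id 3>/   <.sip, .pdi, .ri>
```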
xDB Library Pool (xDB Mode 3)
An xDB library pool is a set of logically grouped xDB libraries which can be configured as destination
of one or several archive holdings. xDB libraries in a library pool are called pooled libraries (an
InfoArchive concept) and a pooled library can store AIPs belonging to different holdings.
In Content Server, an xDB library pool is represented by an eas_cfg_xdb_library_pool configuration
object. For each xDB pooled library, an eas_xdb_pooled_library repository object is created and
associated with an eas_cfg_xdb_library_pool configuration object.
In DA, a library pool is displayed as a folder, which you can double-click to view its associated
AIP objects.
Note: The library pool folder and its subordinate AIP objects visually represent their logical
parent-child relations rather than the actual state of the objects. Therefore, even when an AIP object
assigned to a library pool has been pruned, it is still present in the library pool folder.
The following illustration shows the mapping between a pooled library in xDB and its corresponding
eas_xdb_pooled_library object in the repository.
A library pool is represented by an eas_cfg_xdb_library_pool configuration object defining:
• Its assigned name
• The xDB database in which the libraries are stored
• The rule applied for assigning a new AIP to a library of the pool; for example, the maximum
number of AIPs per library
• The settings applied when a new library must be added to the pool; for example, the repository
folder in which the eas_xdb_pooled_library repository object must be classified and the
permission set to apply to it
• The maximum number of libraries of the pool which can be cached in at the xDB level
• The conditions to be fulfilled for closing a library of the pool and the settings to apply during
closure; for example, importing the xDB library as content of the eas_xdb_pooled_library
repository object
A library of the pool is closed according to one of the following closure modes:
• 0: The library is never closed automatically.
• 1: The library is closed when the current date >= the date returned by the XQuery assigned to
eas_close_hint_date on the library + eas_close_period. If this mode is configured, the XQuery
must return not only the partitioning value but also a date; otherwise, an ingestion error occurs.
• 2: The library is closed when the current date >= the library opening date + eas_close_period.
• 3: The library is closed when the current date >= the eas_last_ingest_date defined on the
library + eas_close_period.
Note: The library is also closed whenever a close request is explicitly set on the library.
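The four closure modes can be sketched as follows (illustrative Python; the parameter names are hypothetical stand-ins for eas_close_period, the XQuery hint date, the library opening date, and eas_last_ingest_date):

```python
from datetime import date, timedelta

def should_close(mode, today, close_period_days,
                 hint_date=None, opening_date=None, last_ingest_date=None):
    """Evaluate the closure condition for a pooled library."""
    period = timedelta(days=close_period_days)
    if mode == 0:                       # never closed automatically
        return False
    if mode == 1:                       # date returned by the XQuery hint
        if hint_date is None:
            raise ValueError("mode 1 requires the XQuery to return a date")
        return today >= hint_date + period
    if mode == 2:                       # library opening date
        return today >= opening_date + period
    if mode == 3:                       # last ingestion date
        return today >= last_ingest_date + period
    raise ValueError("unknown close mode")

# A library opened on May 20 with a 30-day close period is closed on July 1.
print(should_close(2, date(2014, 7, 1), 30, opening_date=date(2014, 5, 20)))  # True
```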
In DA, different xDB pooled library states are represented by different icons. The states are:
• Pooled library
• Pooled library is open
• Pooled library is online
• Pooled library is offline
xDB Pooled Library Assignment Policy (xDB Mode 3)
When an AIP is ingested, InfoArchive searches for an available xDB pooled library in which to store its
structured data. An available xDB pooled library must be one that is not closed and has not reached
its close condition or storage quota. If no such pooled library is found, a new pooled library is created.
By default, a new AIP is assigned to the latest open library in the pool. You can configure a custom
assignment policy to assign new AIPs to libraries to meet your specific requirements.
The most common assignment policy is the time-based partitioning logic (e.g., by week, month,
quarter, or year). For example, in a quarter partitioning assignment policy, AIPs archived during
different quarters are assigned to their respective quarter-based xDB pooled libraries. Multiple
pooled libraries can be open at a single point in time for a given library pool. To archive data from
2014 Q3 through Q4 with a quarter partitioning policy, the corresponding xDB pooled libraries
must remain open. When there is no more data to be archived for a particular quarter, the library
corresponding to that quarter can be closed.
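A quarter-based assignment policy of the kind described above can be sketched as follows (illustrative Python; the "YYYY-Qn" library key naming is an assumption, not an InfoArchive convention):

```python
from datetime import date

def pooled_library_key(d: date) -> str:
    """AIPs whose partitioning date falls in the same quarter are
    assigned to the same pooled library."""
    quarter = (d.month - 1) // 3 + 1
    return f"{d.year}-Q{quarter}"

print(pooled_library_key(date(2014, 8, 15)))   # 2014-Q3
print(pooled_library_key(date(2014, 11, 2)))   # 2014-Q4
```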
xDB Library Online Backup
For xDB libraries created in xDB mode 2 and 3, the libraries can be compressed and imported
into the repository as a rendition of the AIP object while the xDB library still resides in the xDB
database in read-only mode. The xDB library backup renditions, in eas_xdb_backup (uncompressed)
or eas_xdb_backup_gzip (compressed) format, contain the xDB library contents and are stored in the
Content Server repository storage area, such as on EMC Centera or Atmos.
Note: Earlier releases of InfoArchive store xDB library (pool) renditions in the eas_xdb_library or
eas_xdb_library_gzip format. If you have previously backed up xDB library (pool) renditions
in the legacy format, you need to migrate them to the new xDB backup format. InfoArchive is
backward-compatible with the legacy xDB library (pool) backup format, but the legacy format may
not be supported in future InfoArchive releases. InfoArchive provides a data migration tool to help
you convert the old xDB library (pool) backup format to the new format. See Converting xDB
Library (Pool) Backup Renditions, page 124.
In xDB mode 2, the xDB library is automatically imported into the repository at the end of the AIP
ingestion; in xDB mode 3, the xDB pooled library is imported into the repository when the pooled
library is closed by the eas_close job.
Archived data is always searchable during the online backup process since the xDB library does not
need to be detached.
Importing xDB libraries into the repository helps to exclude the libraries from DB backups and
provides the following benefits:
• The xDB backup duration is fairly constant, independent of the volume of archived data.
• If a rendition is assigned to a replicated content-addressable storage (CAS) like EMC Centera or
Atmos, no backup operations need to be managed for this data.
xDB Caching
Through its caching mechanism, InfoArchive uses the xDB system as a cache for querying archived
data. xDB libraries created in xDB mode 2 and 3 can be temporarily removed from the xDB file
system (cached out). Caching out xDB libraries prevents the xDB file system from growing too large
and reduces the xDB space requirements. Some data is archived solely for compliance purposes,
without any need to be searched; xDB caching is generally configured for such holdings to keep the
number of xDB libraries in the file system low.
The order node requests the xDB cache process to cache out libraries. A background thread of the
order node periodically creates, for each holding and library pool, a list of the unlocked libraries
that can be cached out of the xDB file system. A library cannot be cached out while it is being
accessed by a search in progress. The list of libraries to cache out is ordered by the following criteria:
1. The number of orders currently being searched
2. The date of the last search in the library
3. The date of the last caching of the library
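The ordering of cache-out candidates can be sketched as follows (illustrative Python; the text does not state the sort directions, so this sketch assumes libraries with fewer active orders and older search and cache-in dates are cached out first, and the field names are hypothetical):

```python
def cache_out_order(libraries):
    """Rank cache-out candidates by the three criteria above
    (ISO date strings compare chronologically)."""
    return sorted(
        libraries,
        key=lambda lib: (lib["active_orders"],       # fewest active orders first
                         lib["last_search_date"],    # least recently searched first
                         lib["last_cache_in_date"])) # least recently cached first

libs = [
    {"name": "L1", "active_orders": 2,
     "last_search_date": "2014-06-01", "last_cache_in_date": "2014-05-01"},
    {"name": "L2", "active_orders": 0,
     "last_search_date": "2014-06-20", "last_cache_in_date": "2014-05-02"},
    {"name": "L3", "active_orders": 0,
     "last_search_date": "2014-04-10", "last_cache_in_date": "2014-03-01"},
]
print([l["name"] for l in cache_out_order(libs)])  # ['L3', 'L2', 'L1']
```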
When an xDB library has been cached out, you can put it back into the xDB database later (cache it
in) when it is needed for searching. Caching in xDB libraries provides file system access to the
xDB data files.
The following screenshot illustrates how an AIP’s xDB library in xDB is cached in as a compressed
xDB backup rendition of the same AIP object in DA.
In DA, you can view xDB caching related properties of an AIP.
• eas_xdb_cache_support (DA label: Cache out support): Equals T when the AIP is stored in an
xDB library which can be cached in/out
• eas_xdb_is_cached (DA label: Cached): Equals T when the library which stores the AIP is
currently present in the xDB file system
• eas_xdb_clib_cache_lck_date (DA label: Locked in cache deadline date): The library storing
the AIP must be kept in the xDB file system until this date and time (i.e., cannot be cached out)
• eas_xdb_clib_cache_in_cnt (DA label: # of cache in): The number of times that the library has
been placed in the xDB file system, including the initial ingestion
• eas_xdb_clib_cache_in_date (DA label: Last cache in date): The date and time when the library
was last placed on the xDB file system
xDB Library Locking
You can prevent xDB libraries from being cached out for a specified period of time to ensure that the
most recent archived data can always be searched synchronously. When an xDB library cannot be
removed from the xDB file system, it is locked in the cache; otherwise, it is in the unlocked state.
During ingestion, the eas_xdb_clib_cache_lck_date property of the AIP (eas_aip) object is computed
from the retention date (/sip/dss/base_retention_date) read from the SIP descriptor eas_sip.xml and
the eas_xdb_cache_lock_period property of the holding configuration object (eas_cfg_holding):
eas_xdb_clib_cache_lck_date = base_retention_date + eas_xdb_cache_lock_period
The xDB library is prevented from being cached out from the xDB file system until this date.
The cache lock period does not start from the current ingestion date because it is not appropriate to
lock xDB libraries in the cache when the archived AIPs are old, migrated data.
For example, if the locked in cache period is set to 31 (days) and the current ingestion date is February
8, the locked in cache dates of the following AIPs are computed as follows:

AIP      Retention Date    Lock in xDB Cache Date    Lock State
AIP #1   January 1         February 1                Unlocked
AIP #2   January 15        February 15               Locked
AIP #3   January 31        March 3                   Locked
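The lock date computation and the table above can be reproduced as follows (illustrative Python; the year 2014, a non-leap year, is assumed since the text gives only month and day):

```python
from datetime import date, timedelta

# eas_xdb_clib_cache_lck_date = base_retention_date + eas_xdb_cache_lock_period
def lock_deadline(base_retention_date: date, lock_period_days: int) -> date:
    return base_retention_date + timedelta(days=lock_period_days)

def is_locked(deadline: date, today: date) -> bool:
    # The library must stay in the xDB file system until the deadline.
    return today <= deadline

today = date(2014, 2, 8)  # current ingestion date in the example
for retention in (date(2014, 1, 1), date(2014, 1, 15), date(2014, 1, 31)):
    d = lock_deadline(retention, 31)
    print(retention, "->", d, "locked" if is_locked(d, today) else "unlocked")
```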
You can configure xDB caching settings so that archived data can be synchronously searched
throughout the entire retention period in some holdings but only the most recent data (for example,
several months old) can be synchronously searched in others. The xDB caching settings let you adopt
a strategy for each archive holding based on their respective demand for synchronous searches.
AIP Modes
There are three ways AIPs are created and stored in InfoArchive.
• AIP mode 1
AIPs are created and stored as full-blown materialized AIP objects.
• AIP mode 2
AIPs are created and stored as unmaterialized AIP lightweight objects (LWSOs) attached to a
shared AIP parent shareable object (eas_aip_parent).
In this mode, multiple AIP child lightweight objects (eas_aip) are created, each from one SIP, with
their common properties (for example, the same holding and PDI schema) inherited from an
AIP parent shareable object.
• AIP mode 3
In AIP mode 3, AIPs are created and stored as unmaterialized AIP lightweight objects (LWSOs) in
exactly the same way as in AIP mode 2, except the following:
— The eas_close InfoArchive job closes any open AIP parent shareable object that meets the
defined close condition by aggregating all its AIP child LWSOs into a single materialized
AIP object.
— The eas_dctm_clean job deletes the aggregated AIP child LWSOs.
In the following situations, an AIP is considered to be in an unsteady state:
— The AIP is a future aggregate and its associated AIP parent is open
— The AIP belongs to an aggregated AIP parent but is yet to be pruned
Of the three AIP modes, AIP mode 1 takes up the most storage space but offers the smallest
management granularity. For example, in this mode, you can individually manage the retention
period and permission set for each AIP or classify/move repository objects around the repository
folders.
Both AIP mode 2 and AIP mode 3 take advantage of the Content Server lightweight object type,
which reduces the storage space needed in the database for AIP objects by inheriting their common
properties from a single copy of the shared parent object instead of having multiple redundant rows
in the underlying SysObject tables to support all AIP instances. AIP mode 2 and 3 are designed for
ingesting large quantities of SIPs that share many common properties. For detailed information about
the lightweight object type, refer to EMC Documentum Content Server Fundamentals.
Through aggregation of AIPs, AIP mode 3 has the smallest storage footprint of the three modes but
does not offer much flexibility in terms of managing individual AIPs. You can only manage AIPs at
the aggregated AIP level in this mode.
Which AIP mode to use depends on the ingestion mode. AIP mode 1 is used with the asynchronous
ingestion mode; all AIP modes can be used with the synchronous ingestion mode.
The following table compares the three AIP modes:

                          AIP Mode 1          AIP Mode 2          AIP Mode 3
AIP Object Type           Materialized        Unmaterialized      Unmaterialized
                          lightweight         lightweight         lightweight object
                          object (LWSO)       object (LWSO)       (LWSO), aggregated
                                                                  into a materialized
                                                                  lightweight object
                                                                  when the parent
                                                                  shareable object
                                                                  is closed
AIP Management            Small               Medium              Large
Granularity
Storage Footprint         Large               Medium              Small
Ingestion Mode            Synchronous and     Synchronous         Synchronous
                          asynchronous
An archive holding can concurrently support all three AIP modes. AIP search and retrieval is
AIU-based and works in the same manner regardless of which AIP mode is used.
The default AIP mode is set through the eas_default_aip_mode property of the
eas_cfg_aip_parent_policy configuration object. You can change the AIP mode after creating a
holding. When you change the AIP mode of a holding, the new AIP mode is only applied to new
packages, and previously archived packages remain stored according to the mode configured when
they were ingested.
AIP Parent Assignment Policy
In AIP mode 2 and 3, when an AIP is ingested, InfoArchive searches for an available AIP parent to
which to attach the AIP. An eligible AIP parent must be open, must not have reached any defined
quota, and must share common characteristics with the new AIP, such as the same holding and PDI
schema. If no suitable AIP parent is found, a new AIP parent is created.
An open AIP parent shareable object (eas_aip_parent) has multiple unmaterialized AIP child
lightweight objects (eas_aip). Content files (unstructured data) pertaining to the AIP lightweight
objects are stored individually in temporary file system storage areas such as a file store configured in
Content Server. AIP structured data is stored in an xDB segment assigned to the AIP parent.
By default, the AIP is attached to the first compatible open AIP parent. You can create a custom
assignment policy.
The most common AIP assignment policy is the time-based partitioning logic (e.g., week, month,
quarter, or year). For example, in a quarter partitioning assignment policy, AIPs archived during
different quarters are attached to their corresponding AIP parents. The AIP parent of a particular
quarter must remain open until all AIPs pertaining to that quarter have been archived. Only when
there is no more data to be archived for that quarter, can the corresponding AIP parent be closed.
AIP Parent Closure
In AIP mode 3, an InfoArchive job (eas_close) closes any open AIP parent shareable object that meets
the close condition defined in the AIP parenting policy by aggregating all its AIP child
lightweight objects into a single materialized AIP object. Once an AIP parent is closed, AIP child
objects can no longer be attached to it.
When an open AIP parent shareable object is closed:
• All its child lightweight objects are aggregated into a single materialized AIP lightweight object.
The eas_is_aip_aggregate property of the aggregated object is set to TRUE to indicate that the
AIP object is an aggregate of multiple AIP lightweight objects.
• The original child lightweight objects are attached to the Prune lifecycle to be deleted from the
system.
• The individual content files originally pertaining to the child lightweight objects are aggregated
into a single content file in the archive storage area. Original content files are removed from the
temporary file system storage area.
• The xDB segment is assigned to the aggregated AIP object.
Closing AIP parent shareable objects and aggregating child lightweight objects reduces the storage
space and optimizes system performance in several ways:
• The destruction of AIP lightweight objects reclaims the storage space needed in the underlying
RDBMS such as SQL Server or Oracle database.
• The consolidation of individual content files into one content file reclaims the temporary file
system storage space.
• If you use Content Addressable Storage (CAS) such as EMC Centera, the storage of content
files in the storage area and xDB does not require backups, which reduces system overhead
and improves performance.
Supported AIP Mode and xDB Mode Combinations
Not all AIP mode/xDB mode combinations are valid, and some combinations are supported by
only one ingestion mode. For information about AIP modes and xDB modes, see AIP
Modes, page 58 and xDB Modes, page 50.
                xDB Mode 1               xDB Mode 2          xDB Mode 3
AIP Mode 1      Asynchronous and         Asynchronous        Asynchronous and
                synchronous ingestion    ingestion           synchronous ingestion
AIP Mode 2      Synchronous ingestion    Unsupported         Synchronous ingestion
AIP Mode 3      Synchronous ingestion    Synchronous         Synchronous ingestion
                                         ingestion
Note: In xDB mode 2 and 3, the number of xDB detachable libraries automatically created in the
xDB database must not exceed 100,000.
How Data is Searched
InfoArchive performs a search in two distinct tiers:
1. In the tier-1 search, a DQL query is executed against the AIP repository objects, using the
partitioning key (derived from the structured PDI data, eas_pdi.xml) as a filter criterion to return
a subset of eligible AIPs that may contain query results.
The partitioning key is defined as part of the ingestion configuration, and its minimum and
maximum values are determined and assigned to each AIP object during ingestion; this forms
the basis for the tier-1 search. If the search criteria do not contain the partitioning key, all AIPs
are considered in the tier-2 search. For information about the partitioning key, see
minmax—Defining the AIP Partitioning Key, page 132.
For better performance, the tier-1 search retrieves only a subset of AIP attributes.
2. In the tier-2 search, an XQuery expression is executed at the xDB level against the candidate
AIPs returned by the tier-1 search. The tier-2 search retrieves the AIUs that match the exact
criteria.
A quota (AIP quota or results quota) can be defined and applied to each tier of the search to limit the
number of AIPs or AIUs returned by the query. The staged search approach makes queries very fast.
This is because the AIPs returned by the tier-1 search effectively limit the scope of the tier-2 search.
This prevents the tier-2 search from querying for all the AIPs stored at the xDB level.
Depending on whether or not search results can be returned instantly, there are two types of searches:
synchronous search (also referred to simply as search) and asynchronous search (also referred to
as a background search or an order).
• In a synchronous search, the search results reside in an online xDB library and can be immediately
returned to the user. A synchronous search fails if it attempts to access data from an AIP stored in
a library that is not present on the xDB file system.
• In an asynchronous search, the search results cannot be retrieved in a reasonable amount of time,
either because the result set is too large or because the candidate AIPs that may contain the
search results do not reside in any online xDB library. In order to satisfy this search, data must be
retrieved from a detached (i.e., offline) xDB library.
The detached libraries are placed online in xDB (i.e., the data is cached in) and are therefore available
for searching. The order node requests the xDB cache process to cache in the library when needed
for executing an order. For information about xDB caching, see xDB Caching, page 55.
For example, suppose you perform a search for phone calls received from a customer named John
between February 1 and March 1, 2012, using the following search criteria:
StartDate >= 2012-02-01
AND StartDate < 2012-03-01
AND CustomerName = 'John'
In the tier-1 search, InfoArchive executes the following DQL query and returns two AIP objects (AIP
2 and AIP 3) that may contain the records you want:
SELECT AIP WHERE
eas_pkey_min >= 2012-02-01
AND
eas_pkey_max < 2012-03-01
In the tier-2 search, InfoArchive executes an XQuery expression like the following against
AIP 2 and AIP 3:
for $Call in doc('/aip/02/00000000000/eas_pdi.pdi')/n:Calls/n:Call[
  n:CallStartDate[
    . ge xs:dateTime('2011-02-01')
    and
    . lt xs:dateTime('2011-03-01')
  ]
  and
  n:CustomerFirstName[. = 'John']
...
However, since AIP 3 resides offline in a detached library and cannot be searched until it is cached
in to the online xDB library, an asynchronous search (order) has to be created for the search.
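The two-tier flow in this example can be sketched in Python as follows. This is a minimal illustration, not InfoArchive code: the in-memory AIP structures, field names, and sample data are hypothetical stand-ins for the repository metadata (tier 1) and the xDB content (tier 2).

```python
from datetime import date

# Hypothetical in-memory stand-ins for AIP range metadata (tier 1) and the
# call records inside each AIP (tier 2); not actual InfoArchive structures.
aips = [
    {"id": "AIP 1", "pkey_min": date(2011, 1, 1), "pkey_max": date(2011, 1, 31),
     "online": True,
     "calls": [{"start": date(2011, 1, 15), "customer": "John"}]},
    {"id": "AIP 2", "pkey_min": date(2011, 2, 1), "pkey_max": date(2011, 2, 14),
     "online": True,
     "calls": [{"start": date(2011, 2, 3), "customer": "John"},
               {"start": date(2011, 2, 4), "customer": "Jane"}]},
    {"id": "AIP 3", "pkey_min": date(2011, 2, 15), "pkey_max": date(2011, 2, 28),
     "online": False,  # resides in a detached library; needs an order to cache in
     "calls": [{"start": date(2011, 2, 20), "customer": "John"}]},
]

def tier1(aips, lo, hi):
    # DQL-like pruning on the AIP partition-key range metadata.
    return [a for a in aips if a["pkey_min"] >= lo and a["pkey_max"] < hi]

def tier2(aip, lo, hi, customer):
    # XQuery-like scan of the records inside a single AIP.
    return [c for c in aip["calls"]
            if lo <= c["start"] < hi and c["customer"] == customer]

lo, hi = date(2011, 2, 1), date(2011, 3, 1)
candidates = tier1(aips, lo, hi)                  # narrows the scope to AIP 2 and AIP 3
searchable = [a for a in candidates if a["online"]]
hits = [c for a in searchable for c in tier2(a, lo, hi, "John")]
```

In the sketch, only the online candidate (AIP 2) can be scanned synchronously; the offline candidate (AIP 3) is what forces the order.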
Order (Asynchronous Search)
In some situations, searches cannot be completed within a reasonable transactional response time
for one of the following reasons:
• A very large result set is returned
• The candidate AIPs that may contain the search results do not reside in any online xDB library
and must be cached in first for the search to complete
Such searches can be performed asynchronously through an order (asynchronous search). Generally,
orders return more results than synchronous searches. Using orders also helps balance the query
load on the server and optimize the system performance.
You issue an order through the InfoArchive GUI, which provides a Background Search button that lets
you submit a search order. After an order is submitted, a confirmation message is displayed.
When an order is generated, an eas_order object is created in a repository folder (configured through
the order configuration object); for example, in the /Order/order_node_01 folder.
Orders are executed by an order processor, which is configured through an order node configuration
object in the repository.
Order results are stored in a configurable location in xDB; for example, in the /root-library/order/01
library. The document name of an order includes the r_object_id of the eas_order object from the
repository.
Intermediate order results and final order results can be stored in the same location or separate xDB
libraries. If needed, the destination xDB libraries can even be hosted on two xDB databases.
Optionally, if order results contain data that must be encrypted, they can be encrypted and stored as
a rendition of the order object in the repository instead of in xDB.
Order Lifecycle
After creation, an order goes through a set of states:
• When an order is created, it is in the Dormant state.
• After the order processor reads the order request, the order state becomes Queued.
• When the order processor begins to execute the query, the order state becomes Started.
• When the order execution is finished, its state becomes:
— Terminated if the execution completed without any error.
— Error if an error occurred during processing.
• An administrator can suspend/resume the execution of an order. A suspended order is in the
Suspended state.
Orders are kept until deleted—either manually or by an InfoArchive job.
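The lifecycle above can be summarized as a small state machine. The sketch below is based only on the states and transitions named in this section; the class and its API are illustrative, not the InfoArchive implementation, and the exact transitions allowed around Suspended are an assumption.

```python
# Order lifecycle states from the text; the transition table is a hedged reading
# of the section (in particular, where suspend/resume may occur is assumed).
ALLOWED = {
    "Dormant":    {"Queued"},                 # order processor reads the request
    "Queued":     {"Started", "Suspended"},
    "Started":    {"Terminated", "Error", "Suspended"},
    "Suspended":  {"Queued", "Started"},      # an administrator resumes the order
    "Terminated": set(),                      # kept until deleted manually or by a job
    "Error":      set(),
}

class Order:
    def __init__(self):
        self.state = "Dormant"                # initial state at creation

    def transition(self, new_state):
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state

order = Order()
for s in ("Queued", "Started", "Terminated"):
    order.transition(s)
```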
Confirmations—How the Loop is Closed
You create a closed-loop system using InfoArchive through confirmations. A confirmation is a
message generated in response to an AIP event to acknowledge that the event has occurred and to
capture the information of the relevant AIP. Confirmations are generally used to send feedback to
source applications or to notify other business applications (such as a web portal) of AIP events.
Through the confirmation mechanism, InfoArchive can interoperate with other applications to form
a closed-loop system.
Multiple confirmations can be generated for a single event, and the content of the messages is
configurable. You can also limit the scope of confirmation generation to a specified set of AIPs or
event types. Confirmation messages are output to a file system location or to xDB.
Confirmation processing is performed by the eas_confirmation Documentum job.
Confirmation Event Types
Confirmations can be generated for the following AIP event types:
• Receipt: The SIP has been successfully received into the repository and the AIP has been created.
• Storage: The AIP ingestion has been committed in xDB.
• Purge: The AIP has been deleted from the repository.
• Reject: The AIP has been rejected by the InfoArchive administrator.
• Invalid: The AIP has been invalidated by the InfoArchive administrator.
When a confirmation event occurs, the event timestamp is stored in a date property of the AIP. When
the confirmation job processes the event, the confirmation timestamp is assigned to another date
property of the AIP to record the time of confirmation. The confirmation job uses the timestamp
pair (occurrence and confirmation) for each event type to keep track of confirmation processing so
that it can pick up where it left off in the last run and incrementally process AIP events.
You can view the event timestamp and the confirmation timestamp for the event in the Tracking
tab of an AIP properties page in Documentum Administrator. A void event timestamp means the
event has not occurred for the AIP; a void event confirmation timestamp means the AIP event has
not been confirmed yet.
Note: The ingestion confirmation date records the storage event timestamp.
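The timestamp-pair bookkeeping can be sketched as follows. The dictionaries and key names are hypothetical stand-ins for the AIP date properties; an event is pending exactly when its occurrence timestamp is set but its confirmation timestamp is not, which is how the job resumes incrementally between runs.

```python
from datetime import datetime

# Each AIP carries, per event type, an occurrence timestamp (set when the event
# happens) and a confirmation timestamp (set when the confirmation job processes it).
aips = [
    {"id": "aip-1", "receipt_date": datetime(2014, 1, 1), "receipt_confirm_date": None},
    {"id": "aip-2", "receipt_date": datetime(2014, 1, 2),
     "receipt_confirm_date": datetime(2014, 1, 2)},          # already confirmed
    {"id": "aip-3", "receipt_date": None, "receipt_confirm_date": None},  # not occurred
]

def unconfirmed(aips):
    # Pending = occurred but not yet confirmed; this is the job's incremental cursor.
    return [a for a in aips
            if a["receipt_date"] is not None and a["receipt_confirm_date"] is None]

def confirm(aip, now):
    aip["receipt_confirm_date"] = now   # record the time of confirmation on the AIP

pending = unconfirmed(aips)
for a in pending:
    confirm(a, datetime(2014, 1, 3))
```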
Confirmation Job (eas_confirmation)
Confirmation processing is performed by the eas_confirmation Documentum job, which can be run
from the command prompt or Documentum Administrator.
Confirmation processing is very CPU-intensive and I/O-intensive due to frequent queries on AIPs
and frequent writes to the working directory configured for the confirmation job. To minimize its
impact on system performance during business-critical hours, you can schedule the eas_confirmation
job to run in periods that are less resource-demanding.
The eas_confirmation job performs the following tasks when processing confirmations:
1. Searches for all unprocessed AIP events.
2. For each AIP event, searches for applicable confirmation configurations.
3. Searches for the query configuration object.
4. Generates a confirmation message for each eligible AIP event using the XQuery in the query
configuration object.
5. Updates the event confirmation timestamp on the AIP.
Chapter 3
InfoArchive Configuration
InfoArchive Configuration Overview
EMC InfoArchive is highly configurable and allows you to customize many aspects of the archiving
process to meet your specific business requirements, from ingestion, to query, to the look and feel
of the InfoArchive GUI.
All configurations are represented as Documentum repository objects. All InfoArchive configuration
objects have the eas_cfg prefix in their name and are subtypes of the eas_cfg object type, whose
parent type is dm_sysobject.
You perform most InfoArchive configuration by setting the properties of InfoArchive
configuration objects through EMC Documentum Administrator (DA). Some important configuration
information is stored in the content of configuration objects. For example, the PDI schema file
that defines the data structure of the eas_pdi.xml file is imported as the content of the schema
configuration (eas_cfg_schema) object.
Since a holding is a logical archival destination for the same type of data that shares similar
characteristics, most InfoArchive configurations are performed at the holding level and apply to all
the data archived into a particular holding. The configuration settings of a holding govern all the
aspects of data archiving in the holding throughout the archived data lifecycle, from ingestion, to
storing, to retrieving, to disposition.
The following diagram depicts the core InfoArchive configuration objects and the relations among
them. Arrowed lines denote how the configuration objects are related through references, with
the arrowhead pointing to the object being referenced. At the center of the InfoArchive
configurations sits the holding configuration (eas_cfg_holding) object, which is directly or
indirectly associated with most of the other configuration objects.
The EMC InfoArchive Object Reference Guide contains detailed information about all InfoArchive
objects (both configuration and non-configuration objects) and their properties.
A complete working configuration of a holding entails properly configuring all the configuration
objects associated with the holding. There is no mandatory, fixed sequence to follow when
configuring them: you can configure the objects in any order you want and jump back and
forth until all of them are correct. However, it is good practice to first configure objects
with no or few dependencies, that is, objects that do not reference other objects that have not
been configured yet.
As an alternative to manually creating and configuring a new holding, InfoArchive provides a holding
configuration wizard that guides you through the process of creating a basic holding configuration to
give you a quickstart. See Quickstart Using the InfoArchive Holding Configuration Wizard, page 84.
Working with InfoArchive Configuration Objects
You manage InfoArchive configuration objects in the same way as you manage other repository
objects in DA.
Creating a Contentless Configuration Object
1. Navigate to the folder where you want to create the configuration object.
You can create the configuration object at any location, but the best practice is to group
holding-specific configuration objects in one folder (created for a specific holding and preferably
named after it); for configuration objects that are shared across multiple holdings, you can group
them by their type.
For example, the location of all the configuration objects pertaining to the PhoneCalls sample
holding is /System EAS/Archive Holdings/PhoneCalls; all non-holding-specific default
delivery channel, node, search form, and xDB library configuration objects are classified into
their respective folders.
2. From the File menu, choose New > Document.
Note: Make sure no existing object is selected; otherwise, the New > Document command is
greyed out.
3. Under the Create tab, type a name for the configuration object and select the object type (the
type name always starts with eas_cfg_); then click Next.
The name here is just for display in DA and is not used by InfoArchive.
4. Under the Info tab, set the properties of the configuration object as needed and click Finish. If
you do not set its properties now, you can always edit them later.
Note: The name here corresponds to the eas_name property of the configuration object and is
used by InfoArchive. Make sure that the value of eas_name is unique.
The configuration object created this way has no content. You can import a file as the object
content later if needed.
Creating a Configuration Object with Content
1. Navigate to the folder where you want to create the configuration object.
You can create the configuration object at any location, but the best practice is to group
holding-specific configuration objects in one folder (created for a specific holding and preferably
named after it); for configuration objects that are shared across multiple holdings, you can group
them by their type.
For example, the location of all the configuration objects pertaining to the PhoneCalls sample
holding is /System EAS/Archive Holdings/PhoneCalls; all non-holding-specific default
delivery channel, node, search form, and xDB library configuration objects are classified into
their respective folders.
2. From the File menu, choose Import.
Note: Make sure no existing object is selected; otherwise, the Import command is greyed out.
3. On the File Selection page, select a file to import as the content of the configuration object and
then click Next.
4. On the Object Definition page, type a name for the configuration object and select the object type
(the type name always starts with eas_cfg_). The corresponding configuration object properties
are displayed.
Note: The first Name field is just for display in DA and is not used by InfoArchive; the
second name field corresponds to the eas_name property of the configuration object and is
referenced by InfoArchive. Make sure that the value of eas_name is unique.
5. Set the properties of the configuration object as needed and click Finish. If you do not set its
properties now, you can always edit them later.
The configuration object is created with the imported file as its content. You can change the object
content later if needed.
Importing Content into a Configuration Object
After you create a configuration object, you can always import content into it (if it is a contentless
object) or change its existing content.
1. Right-click the configuration object and choose Check Out. A Checked out by you icon
appears next to the object.
2. Right-click the checked-out configuration object and choose Check In.
3. At the bottom of the Checkin page, select a file and click OK. The file format is automatically
detected and the file is checked in as the new content of the configuration object.
Editing the Properties of a Configuration Object
To edit the properties of a configuration object, right-click the object and choose Properties from the
shortcut menu.
Performing System Global Configurations
Global settings and configuration objects are shared by multiple holdings. Some default global
configuration objects that can be used out of the box were created when you installed InfoArchive.
You can modify them or create new configuration objects as needed.
Configuring Global Settings
Global settings are configured through the global configuration (eas_cfg_config) object and apply to
all holdings. To modify the default global settings, edit the properties of this object.
• eas_site (DA label: Site): Name of the current execution site, referenced in several other
configuration objects (for example, eas_cfg_xdb_library, eas_cfg_rsa_rkm, and so on).
• eas_receive_policy (DA label: Reception lifecycle): Name of the lifecycle to attach to an AIP
when it is received.
• eas_rec_reattach_state (DA label: Reception reattach state): Name of the second state of the
reception lifecycle. In the event the alias set configured for the archive holding is different from
the one configured for the reception, this information is used for reattaching the AIP directly to
this state with the alias set defined for the holding.
• eas_invalid_policy (DA label: Invalidity lifecycle): Name of the lifecycle to attach to an
Archival Information Package (AIP) when it has been invalidated by the administrator (for
example, unknown file format, unknown archive holding).
• eas_reject_policy (DA label: Reject lifecycle): Name of the lifecycle to attach to an AIP when it
has been rejected by the administrator (for example, the business application representative states
this AIP contains wrong data).
• eas_purge_policy (DA label: Purge lifecycle): Name of the lifecycle to attach to an AIP when it
has to be disposed.
• eas_prune_policy (DA label: Prune lifecycle): Name of the lifecycle to attach to an
eas_aip_parent object, which is not required anymore after its associated eas_aip has been
aggregated to a materialized eas_aip.
• eas_rejinv_logs_store_enable (DA label: Archive reject/invalidation log for unknown holding):
Flag to indicate that the reject/invalidation logs should be archived for AIPs referencing an
unknown holding.
• eas_rejinv_logs_store (DA label: Reject/invalidation log store for unknown holding): Name of
the log store used to store reject/invalidation logs for AIPs referencing an unknown holding.
• eas_rejinv_retention_enable (DA label: Set retention on reject/invalidation for unknown
holding): Push the retention date to the storage subsystem for the contents associated with an
invalidated or rejected AIP referencing an unknown holding.
• eas_rejinv_retention_period (DA label: Retention period (d) on reject/invalidation for
unknown holding): Retention time in days for a rejected or an invalidated AIP referencing
an unknown holding (eas_aip.eas_retention_date = eas_aip.eas_receive_start_date +
eas_cfg_config.eas_rejinv_retention_period).
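The retention formula given for eas_rejinv_retention_period can be illustrated directly. The function below is a sketch (its name is ours, not an InfoArchive API); the arithmetic is exactly the formula from the property description.

```python
from datetime import datetime, timedelta

def rejinv_retention_date(receive_start_date, retention_period_days):
    # eas_aip.eas_retention_date =
    #     eas_aip.eas_receive_start_date + eas_cfg_config.eas_rejinv_retention_period
    return receive_start_date + timedelta(days=retention_period_days)

# Example: an AIP received on March 1, 2014 with a 30-day retention period.
d = rejinv_retention_date(datetime(2014, 3, 1), 30)
```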
Configuring an xDB Cache Access Node
InfoArchive accesses XML documents on the xDB file system through a cache access node.
You must configure a cache access node for xDB caching to work.
The cache access node is configured through the cache access node configuration
(eas_cfg_cache_access_node) object in the repository. A default cache access node /System
EAS/Nodes/cache_access_node_01 was configured during InfoArchive installation and can be
used out of the box. You can modify the default cache access node as needed.
To create and configure a new cache access node, create an object of type eas_cfg_cache_access_node
and set its properties.
• eas_name (DA label: Name): Name of the access node.
• eas_log_level (DA label: Log level): Log level for processing cache access requests.
• eas_fs_working_root (DA label: Working directory): Path to the working directory in the file
system used by the cache access node.
• eas_queue_user (DA label: Queue user): User account (virtual user, no connection) in the
repository associated with this node. Each cache access node must be associated with a user
account; ingestion nodes and order nodes must post their cache requests to that user.
• eas_polling_interval (DA label: Polling interval (ms)): Interval in milliseconds for polling the
queue of cache requests.
• eas_suspend_enabled (DA label: Suspended): Flag to temporarily suspend the processing of
requests without having to stop the node.
• eas_requestor_timeout (DA label: Requestor timeout): Time in milliseconds after which the
application that issued the request to this node must assume that the request cannot be processed
at this time, unless it has received a callback.
• eas_processed_request_cnt (DA label: # requests processed): Counter indicating the number of
requests processed by this node.
• eas_log_close_pending (DA label: Log close pending): Flag to indicate that the log file should be
closed without stopping the node.
• eas_start_date (DA label: Start date): Date of the last time the node started.
• eas_start_proc_request_cnt (DA label: # requests processed since start date): Counter indicating
the number of requests processed by this node since the last boot (or since the last reset).
• eas_stop_pending (DA label: Stop pending): Flag to indicate that the node should be stopped.
Setting this flag to true (for example, using DQL) allows stopping the node.
• eas_stop_date (DA label: Stop date): Date of the last time the node stopped.
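The behavior implied by the polling and control-flag properties above can be sketched as follows. This is an illustration only: the real node is a repository-backed process, and everything here, from the config dictionary to the run loop, is a hypothetical stand-in.

```python
import queue

# Hypothetical stand-ins for the node's repository flags and counter.
config = {"eas_suspend_enabled": False, "eas_stop_pending": False,
          "eas_processed_request_cnt": 0}

requests = queue.Queue()
for name in ("cache-in lib A", "cache-in lib B"):
    requests.put(name)

def run_node(config, requests, handle):
    # Poll the request queue, honoring the suspend and stop flags.
    while not config["eas_stop_pending"]:
        if config["eas_suspend_enabled"]:
            # Suspended: a real node would keep polling (sleeping eas_polling_interval
            # ms between checks) without processing; this sketch stops iterating.
            break
        try:
            req = requests.get_nowait()
        except queue.Empty:
            break  # a real node would sleep eas_polling_interval ms and poll again
        handle(req)
        config["eas_processed_request_cnt"] += 1

handled = []
run_node(config, requests, handled.append)
```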
Configuring an xDB Library
xDB libraries used for storing archived data are represented as xDB library configuration
(eas_cfg_xdb_library) objects in the repository. This object defines the target xDB server, the
library path, and the xDB cache process to be used (i.e., the name of an eas_cfg_cache_access_node
configuration object).
For information about xDB library, refer to the EMC Documentum xDB Administration Guide.
Four default xDB libraries were configured during InfoArchive installation and can be used out of the
box. You can modify these xDB libraries as needed.
• aip_01_xdb_lib
The default xDB parent library in xDB mode 1 and 2.
For information about configuring an xDB parent library, see Configuring an xDB Parent Library
(xDB Mode 1 and 2), page 115.
• confirmation_aip_xdb_lib
The default xDB library to store generated confirmations, if confirmations are configured to use
the xDB library as the delivery channel (set as the value of the eas_cfg_xdb_library parameter on
the delivery channel configuration object for confirmations).
For information about delivery channel, see Delivery Channel Configuration Object
(eas_cfg_delivery_channel) , page 225.
• confirmation_audit_xdb_lib
The default xDB library to store confirmation audit trails, referenced in the purge audit properties
file.
• order_01_xdb_lib
The default xDB library to store intermediate order (asynchronous search) results in xDB,
referenced by the delivery channel configuration (eas_cfg_delivery_channel) object used as the
working delivery channel by the order configuration (eas_cfg_order) object.
For information about configuring an order configuration object, see Configuring an Order
Configuration (eas_cfg_order) Object , page 179.
To create and configure a new xDB library, create an object of type eas_cfg_xdb_library and set its
properties.
• eas_name (DA label: Name): Name of this xDB library configuration.
• eas_library_path (DA label: Library path): Full path to the xDB library where structured data
can be imported.
• eas_database_name (DA label: Database): Name of the xDB database to which the library belongs.
• eas_federation_name (DA label: Federation): Name of the federation to which the database
belongs.
• eas_federation_set_name (DA label: Federation set): Name of the federation set to which the
federation belongs.
• eas_user_name (DA label: xDB user name): xDB user name to use to connect to the xDB database.
• eas_cfg_crypto_pro_user_pwd (DA label: Cfg crypto provider for password): Name of the
service provider used for encrypting the password (blank if the password is not encrypted).
• eas_user_pwd_crypto_key_id (DA label: Password crypto key id): Identifier of the key used to
encrypt the password if the password has been encrypted using a cryptographic provider. An
empty value indicates that the password is not encrypted.
• (Reserved for future use)
• eas_user_password_encoding (DA label: Password encoding): Name of the encoding (for
example, base64) used to store the password.
• eas_user_password (DA label: xDB password): Password for the user name used to connect to
the xDB database.
• eas_cfg_cache_access_node (DA label: Cfg cache access node): Name of the cache access node
having access to the xDB file system of the xDB database to which the library belongs.
• eas_segment_location (DA label: Segment location): Logical name of the storage path of the
xDB segment file associated with the property eas_seg_root_fs_path at the same index. This
logical name indirection makes it possible to not include file system-level paths in the
configuration of the archive holding.
• eas_seg_root_fs_path (DA label: Path of the segment location): File system path to the location
where xDB segment files can be stored.
• eas_node_proximity (DA label: Site Proximity): Proximity value of the xDB node associated
with the xDB node name at the same index. Reserved for future use.
• eas_node_name (DA label: xDB node name): Name of the xDB node.
• eas_node_host (DA label: Host): Name or IP address of the xDB database host associated with
the xDB node name.
• eas_node_port (DA label: Port): Port of the node associated with the xDB node name at the
same index.
• eas_node_write_enabled (DA label: Write enabled): Flag associated with the xDB node name at
the same index indicating if the node can write to the library.
• eas_node_read_enabled (DA label: Read enabled): Flag associated with the xDB node name at
the same index indicating if the node can read from the library.
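Several of these repeating properties are paired positionally ("at the same index"): the value at index i of one property describes the value at index i of its companion. The sketch below illustrates that convention for eas_segment_location and eas_seg_root_fs_path; the sample values are hypothetical.

```python
# Repeating properties on the xDB library object pair up by position: index i of
# eas_segment_location names the segment whose path is index i of
# eas_seg_root_fs_path. Sample values below are hypothetical.
eas_segment_location = ["segment_01", "segment_02"]
eas_seg_root_fs_path = ["/xdb/segments/01", "/xdb/segments/02"]

def segment_map(locations, paths):
    # Resolve each logical segment name to its file-system path by index.
    return dict(zip(locations, paths))

mapping = segment_map(eas_segment_location, eas_seg_root_fs_path)
```

The same index-pairing applies to the eas_node_* properties (name, host, port, read/write flags).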
Configuring the Configuration Cache
When enabled, the configuration cache boosts the performance of web services calls and offloads the
Content Server by temporarily storing configuration objects fetched from the repository on the web
services server so that they can be retrieved rapidly in subsequent calls.
An atomic entry in the configuration cache is called an element. An element has a key, a value, and
a record of accesses. Elements are put into and removed from caches. They can also expire and be
removed by the cache, depending on the cache settings.
You enable the configuration cache and configure its settings through the options in the
eas-services.properties file located in WEBSERVER_HOME/conf on the web application
server.
You can configure the following options in eas-services.properties. If an option is not set in
the properties file, it falls back to its default value.
• eas_config_cache_enabled: Whether or not to enable the configuration cache. Default: false.
• eas_config_cache_memory_size: The maximum size of the JVM memory that can be used for
caching. When this limit is reached, elements are cached out (evicted) from the memory store
based on the cache strategy (eviction algorithm). Default: 256M.
• eas_config_cache_time_to_live: Length of time (in seconds) elements are kept in the cache after
they are placed in it. 0 means that elements have an indefinitely long life span unless evicted
based on the cache strategy. Default: 86400 (1 day).
• eas_config_cache_time_to_idle: Length of time (in seconds) elements are kept in the cache after
their last use. 0 means that elements will not be cached out based on idle time. Default:
86400 (1 day).
• eas_config_cache_strategy: Which eviction algorithm to use to cache out elements when the
cache limit is reached. Default: LRU.
  • FIFO (First In First Out): Elements are evicted in the same order as they come in. When a put
  call is made for a new element (and assuming that the max limit is reached for the memory
  store), the element that was placed first (first in) in the store is the first candidate for eviction
  (first out). This algorithm is suitable if the use of an element makes it less likely to be used in
  the future. It takes a random sample of the elements and evicts the earliest placed one.
  • LFU (Least Frequently Used): The first element to be evicted is the least frequently used. For
  each get() call on an element, its number of hits is updated. When a put() call is made for a
  new element (and assuming that the max limit is reached), the element with the least number
  of hits is evicted. If cache-element usage follows a Pareto distribution, this algorithm might
  give better results than LRU.
  • LRU (Least Recently Used): The first element to be evicted is the least recently used. The
  last-used timestamp is updated when an element is put into the cache or retrieved from the
  cache with a get call. This algorithm takes a random sample of the elements and evicts the
  one with the oldest last-used timestamp.
• eas_config_cache_statistics_enabled: Whether or not to enable detailed cache statistics. Set this
option to true if you want to view basic and detailed statistics of the configuration cache, such as
cache hits, cache misses, eviction count, and average get time. Default: false.
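The size-bounded LRU behavior described above can be sketched as follows. This illustrates only the eviction idea (max size, time-to-idle, least-recently-used ordering); it is not the cache implementation InfoArchive uses, and details such as random sampling are simplified away.

```python
import time
from collections import OrderedDict

# Minimal sketch of an LRU cache with a max entry count and a time-to-idle,
# mirroring eas_config_cache_memory_size / eas_config_cache_time_to_idle / LRU.
class LruCache:
    def __init__(self, max_entries, time_to_idle):
        self.max_entries = max_entries
        self.time_to_idle = time_to_idle      # seconds; 0 means never idle out
        self.entries = OrderedDict()          # key -> (value, last_used)

    def put(self, key, value):
        self.entries[key] = (value, time.monotonic())
        self.entries.move_to_end(key)
        if len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)  # evict the least recently used entry

    def get(self, key):
        item = self.entries.get(key)
        if item is None:
            return None
        value, last_used = item
        if self.time_to_idle and time.monotonic() - last_used > self.time_to_idle:
            del self.entries[key]             # element idled out of the cache
            return None
        self.entries[key] = (value, time.monotonic())  # refresh last-used timestamp
        self.entries.move_to_end(key)
        return value

cache = LruCache(max_entries=2, time_to_idle=0)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")        # touch "a" so that "b" becomes the least recently used
cache.put("c", 3)     # exceeds max_entries, evicting "b"
```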
You perform administrative tasks on the configuration cache through the InfoArchive web services
administration pages. See Administrating the Configuration Cache, page 318.
Configuring a Centera Store for Use with InfoArchive
InfoArchive natively supports data archiving on Content Addressable Storage (CAS) platforms
such as EMC Centera.
If you use Centera with InfoArchive, archived data is pushed to Centera at the storage level at the
end of the archiving process. The storage of content files in the storage area and xDB does not
require backups, which reduces system overhead and improves performance.
To configure a Centera Store for use with InfoArchive, you need to create a Centera Store repository
object in DA and set some key properties.
Creating a Centera Store Object
In DA, create a Centera Store object by executing a DQL statement in DQL Editor. Use the following
sample DQL as a reference:
CREATE dm_ca_store OBJECT
SET name                       = 'AssignStoreName',
SET a_plugin_id                = (SELECT r_object_id FROM dm_plugin WHERE
                                  object_name = 'CSEC Plugin'),
SET a_retention_attr_name      = 'eas_retention_date',
SET a_retention_attr_required  = False,
APPEND a_storage_params        = 'AssignedIpAddress#1,…,
                                  AssignedIpAddress#n?AssignedFilepath',
APPEND a_content_attr_name     = 'eas_aip_id',
APPEND a_content_attr_desc     = 'eas_aip_id',
APPEND a_content_attr_name     = 'eas_dss_holding',
APPEND a_content_attr_desc     = 'eas_dss_holding',
APPEND a_content_attr_name     = 'eas_dss_id',
APPEND a_content_attr_desc     = 'eas_dss_id',
APPEND a_content_attr_name     = 'eas_dss_pdi_schema',
APPEND a_content_attr_desc     = 'eas_dss_pdi_schema',
APPEND a_content_attr_name     = 'eas_dss_pdi_schema_version',
APPEND a_content_attr_desc     = 'eas_dss_pdi_schema_version',
APPEND a_content_attr_name     = 'eas_dss_production_date',
APPEND a_content_attr_desc     = 'eas_dss_production_date',
APPEND a_content_attr_name     = 'eas_dss_base_retention_date',
APPEND a_content_attr_desc     = 'eas_dss_base_retention_date',
APPEND a_content_attr_name     = 'eas_dss_producer',
APPEND a_content_attr_desc     = 'eas_dss_producer',
APPEND a_content_attr_name     = 'eas_dss_entity',
APPEND a_content_attr_desc     = 'eas_dss_entity',
APPEND a_content_attr_name     = 'eas_dss_priority',
APPEND a_content_attr_desc     = 'eas_dss_priority',
APPEND a_content_attr_name     = 'eas_dss_application',
APPEND a_content_attr_desc     = 'eas_dss_application',
APPEND a_content_attr_name     = 'eas_sip_production_date',
APPEND a_content_attr_desc     = 'eas_sip_production_date',
APPEND a_content_attr_name     = 'eas_sip_seqno',
APPEND a_content_attr_desc     = 'eas_sip_seqno',
APPEND a_content_attr_name     = 'eas_sip_is_last',
APPEND a_content_attr_desc     = 'eas_sip_is_last',
APPEND a_content_attr_name     = 'eas_sip_aiu_count',
APPEND a_content_attr_desc     = 'eas_sip_aiu_count',
APPEND a_content_attr_name     = 'eas_sip_page_count',
APPEND a_content_attr_desc     = 'eas_sip_page_count',
APPEND a_content_attr_name     = 'eas_sip_pdi_hash_algorithm',
APPEND a_content_attr_desc     = 'eas_sip_pdi_hash_algorithm',
APPEND a_content_attr_name     = 'eas_sip_pdi_hash_encoding',
APPEND a_content_attr_desc     = 'eas_sip_pdi_hash_encoding',
APPEND a_content_attr_name     = 'eas_sip_pdi_hash',
APPEND a_content_attr_desc     = 'eas_sip_pdi_hash',
APPEND a_content_attr_name     = 'eas_cfg_crypto_provider',
APPEND a_content_attr_desc     = 'eas_cfg_crypto_provider',
APPEND a_content_attr_name     = 'eas_pdi_crypto_key_id',
APPEND a_content_attr_desc     = 'eas_pdi_crypto_key_id',
APPEND a_content_attr_name     = 'eas_crypto_encoding',
APPEND a_content_attr_desc     = 'eas_crypto_encoding',
APPEND a_content_attr_name     = 'eas_sip_crypto_iv',
APPEND a_content_attr_desc     = 'eas_sip_crypto_iv',
APPEND a_content_attr_name     = 'eas_pdi_crypto_iv',
APPEND a_content_attr_desc     = 'eas_pdi_crypto_iv',
APPEND a_content_attr_name     = 'eas_ci_crypto_iv',
APPEND a_content_attr_desc     = 'eas_ci_crypto_iv',
APPEND a_content_attr_name     = 'eas_sip_crypto_propbag',
APPEND a_content_attr_desc     = 'eas_sip_crypto_propbag',
APPEND a_content_attr_name     = 'eas_pdi_crypto_propbag',
APPEND a_content_attr_desc     = 'eas_pdi_crypto_propbag',
APPEND a_content_attr_name     = 'eas_ci_crypto_propbag',
APPEND a_content_attr_desc     = 'eas_ci_crypto_propbag',
APPEND a_content_attr_name     = 'eas_pdi_crypto_hash_algo',
APPEND a_content_attr_desc     = 'eas_pdi_crypto_hash_algo',
APPEND a_content_attr_name     = 'eas_pdi_crypto_hash_salt',
APPEND a_content_attr_desc     = 'eas_pdi_crypto_hash_salt',
APPEND a_content_attr_name     = 'eas_ci_crypto_key_id',
APPEND a_content_attr_desc     = 'eas_ci_crypto_key_id',
APPEND a_content_attr_name     = 'eas_reject_date',
APPEND a_content_attr_desc     = 'eas_reject_date',
APPEND a_content_attr_name     = 'eas_invalid_date',
APPEND a_content_attr_desc     = 'eas_invalid_date',
APPEND a_content_attr_name     = 'eas_rejinv_user_name',
APPEND a_content_attr_desc     = 'eas_rejinv_user_name',
APPEND a_content_attr_name     = 'eas_rejinv_description',
APPEND a_content_attr_desc     = 'eas_rejinv_description',
APPEND a_content_attr_name     = 'eas_phase',
APPEND a_content_attr_desc     = 'eas_phase',
APPEND a_content_attr_name     = 'eas_state',
APPEND a_content_attr_desc     = 'eas_state',
APPEND a_content_attr_name     = 'a_content_type',
APPEND a_content_attr_desc     = 'a_content_type',
APPEND a_content_attr_name     = 'r_page_cnt',
APPEND a_content_attr_desc     = 'r_page_cnt',
APPEND a_content_attr_name     = 'content.r_object_id',
APPEND a_content_attr_desc     = 'content.r_object_id',
APPEND a_content_attr_name     = 'content.full_format',
APPEND a_content_attr_desc     = 'content.full_format',
APPEND a_content_attr_name     = 'content.page_modifier',
APPEND a_content_attr_desc     = 'content.page_modifier',
APPEND a_content_attr_name     = 'content.page',
APPEND a_content_attr_desc     = 'content.page',
APPEND a_content_attr_name     = 'content.set_file',
APPEND a_content_attr_desc     = 'content.set_file'
GO
The sample DQL defines a Centera Store object with AIP and content attributes assigned to
it, which determine what information is pushed to Centera at the storage level when the
eas_commit job commits ingested data at the end of the archiving process. As a naming convention,
content.attribute_name refers to an attribute of the content object (dmr_content); an attribute name
not preceded by content indicates an AIP (eas_aip) object attribute.
The attributes defined in the sample DQL are the recommended ones. You can adjust them based
on your own storage needs. If you do not want to push a content attribute to Centera at the storage
level, remove the assignment of that attribute from the statements. For example, if you
remove the assignment of a_retention_attr_name from the DQL statements, the retention
date attribute will not be pushed to Centera.
InfoArchive Configuration
Configuring Key Centera Store Object Properties
Edit the properties of the Centera Store object:
• Select the Configure Retention Information option and set Retention Attribute Name to
eas_retention_date, which is an AIP (eas_aip) object property.
• Do not select the Application Provides Retention option.
• The content attribute names and descriptions were defined when you created the Centera Store
object by executing the DQL statement. You can edit the attributes as needed.
Quickstart Using the InfoArchive Holding
Configuration Wizard
As an alternative to manually creating and configuring a holding, InfoArchive provides a holding
configuration wizard that guides you through the process of creating a basic holding configuration to
give you a quickstart in getting the system up and running. The web-based wizard walks you through
some key configuration steps for a holding in an intuitive manner and generates a compressed .zip
file that contains all the configuration files required for installing the configured holding. You then
install the holding into the repository on the InfoArchive Content Server host using the ant installer.
While the InfoArchive configuration wizard streamlines the holding configuration process, it
only lets you define key configuration settings, keeping all other configuration settings
transparent by using preset default values. The simple holding configuration package generated
by the wizard can either be installed directly into the repository using the ant installer, or used as a
basic configuration template for configuring advanced settings.
The following sections describe the general workflow of the InfoArchive holding configuration wizard.
Launching the InfoArchive Holding Configuration
Wizard
The InfoArchive Holding Configuration Wizard is a web-based application that can be accessed in
a web browser, either remotely or locally on the web application server. There are two ways you
can start the wizard:
• Launch the wizard remotely as a web application.
Deploy the wizard WAR package located in EAS_HOME/tools/holding-configurator/WAR
to the web application server and access it in a web browser from a remote client.
For example, on Apache Tomcat, deploy the wizard WAR package to TOMCAT_HOME/webapps
/eas-wizard, and then access the wizard via the URL http://hostname:8080/eas-wizard.
• Launch the wizard locally as a standalone application.
Note: If you are running the wizard on a computer other than the InfoArchive server, make sure
Java 7 has been installed.
Execute the following script:
— EAS_HOME/tools/holding-configurator/run.bat (Windows)
— EAS_HOME/tools/holding-configurator/run.sh (Linux)
This will deploy the wizard application on an embedded Jetty server and directly launch a web
browser pointing to the wizard URL: http://localhost:9000/holding-configurator (default).
If needed, you can change the port number by editing EAS_HOME/tools/holding-configurator/conf.xml.
Note: The standalone InfoArchive Configuration Wizard can be directly extracted from the
InfoArchive installation package and launched using the run.bat/run.sh script. If you do so on
Linux, make sure that proper permissions are set on the run.sh script.
Configuring a Basic Holding Using the InfoArchive
Holding Configuration Wizard
On the first screen of the InfoArchive Holding Configuration Wizard, choose to create a new holding
configuration or load an existing holding configuration that you saved earlier using the wizard.
Note: Holding configuration files created outside the wizard cannot be loaded.
The wizard lets you save an unfinished holding configuration and complete it at a later time. Any
time during the configuration process, you can click the Save button to download an interim
configuration file (.zip) to your local drive.
Note: If you are running the InfoArchive Holding Configuration Wizard in Microsoft Internet
Explorer on a Windows Server operating system, you must disable Internet Explorer Enhanced
Security Configuration to be able to save the interim configuration file.
You can load the saved configuration file later to continue with the configuration. Note that every
time you modify a saved configuration, you must regenerate the final holding configuration package
(.zip) to be installed into the repository. Do not directly modify settings in the interim configuration
file or the generated holding configuration package outside the wizard.
Note: In the InfoArchive Holding Configuration Wizard, when you load an existing holding
configuration that you saved earlier, you may not be returned to the screen where you saved
the configuration, which may cause some configuration steps to be skipped. To be safe, when
modifying a previously saved holding configuration in the wizard, always go back to the first
configuration screen after loading it and resume from there.
To create a new holding configuration, follow the on-screen instructions that guide you through the
configuration process:
1. Specify a descriptive name that uniquely identifies the holding. The holding name provided here
can contain up to 18 alphanumeric characters.
A holding is a logical destination archive where data is ingested and stored, usually data of the
same type that shares common characteristics. For example, you can create a holding to archive
data from the same source application (such as ERP data), of the same format (such as audio
recordings), or belonging to the same business entity.
An InfoArchive instance can contain multiple archive holdings. The SIP descriptor
(eas_sip.xml) contains the name of the holding to be used for data archiving.
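The naming rule above can be expressed as a quick check. This sketch takes the rule literally (up to 18 alphanumeric characters); if your holding names also use underscores, the pattern would need to be widened, so treat it only as an illustration:

```python
import re

def is_valid_holding_name(name: str) -> bool:
    """Check a holding name against the wizard's stated rule:
    up to 18 alphanumeric characters (a literal reading of the rule)."""
    return bool(re.fullmatch(r"[A-Za-z0-9]{1,18}", name))
```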
2. Define the PDI schema.
Select the schema that formally describes the structured data in the information packages to
archive. The specified schema will be imported into the repository as the content of the schema
configuration (eas_cfg_schema) object.
Note: The PDI schema must not contain elements with identical names.
Unlike the SIP descriptor file eas_sip.xml, there is no predefined schema for structured data
(PDI) in the information package. You must create a schema with a target namespace that defines
the elements, attributes, and simple and complex types in the eas_pdi.xml file according to your
business requirements. The eas_pdi.xml file in the information package must conform to
the defined schema.
For detailed information about defining the PDI schema, see Creating a Structured Data (PDI)
Schema, page 109.
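Because the wizard rejects PDI schemas that contain elements with identical names, a quick pre-check of the schema file can save a round trip. This sketch uses Python's standard library and is only an illustration, not InfoArchive's actual validation:

```python
import xml.etree.ElementTree as ET
from collections import Counter

XSD_NS = "{http://www.w3.org/2001/XMLSchema}"

def duplicate_element_names(xsd_text: str):
    """Return the xs:element names declared more than once in a schema."""
    root = ET.fromstring(xsd_text)
    names = [el.get("name") for el in root.iter(XSD_NS + "element") if el.get("name")]
    return sorted(n for n, count in Counter(names).items() if count > 1)
```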
3. Select the Archival Information Unit (AIU) node.
From the schema diagram, select the node that represents the archival information unit (AIU).
This diagram is a graphical representation of the structured data (PDI) schema.
Use the following navigational operations to locate the AIU node:
• Pan and zoom around the diagram—Wheel up to zoom in; wheel down to zoom out
• Click the plus sign (+) on a node to expand it; click the minus sign (-) to collapse a node
• Click the plus sign (+) and the minus sign (-) on the left side of the diagram to expand/collapse
the nodes all at once
Note: Make sure you select the correct node that represents the AIU. The wizard does not
validate your selection. If you select the wrong node, ingestion will fail.
An archival information unit (AIU) is conceptually the smallest archival unit (like an information
atom) of an information package. Each AIU corresponds to a record or item of the archived data.
A single customer order, a patient profile, or a financial transaction record in an information
package is an AIU.
The structured data (PDI) file (eas_pdi.xml) in a SIP describes all the AIUs in the package. An
AIU in eas_pdi.xml consists of an XML block in the file containing its structured data, and
optionally, references to one or more associated unstructured content files. For information about
AIU, see Archival Information Unit (AIU) , page 32.
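To make the AIU concept concrete, the following sketch counts AIU records in an eas_pdi.xml payload. The tag names in the example are hypothetical, and this is illustrative only:

```python
import xml.etree.ElementTree as ET

def count_aius(pdi_xml: str, aiu_tag: str) -> int:
    """Count AIU records in a PDI payload by local tag name,
    ignoring any namespace prefix."""
    root = ET.fromstring(pdi_xml)
    return sum(1 for el in root.iter() if el.tag.split("}")[-1] == aiu_tag)
```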
4. Select a date or dateTime node to be used as the AIP partitioning key.
You can only select one AIU child node (element) of type date or dateTime as the partitioning
key (all the other nodes are greyed out in the schema diagram), which is the most common use
scenario. However, through manual configurations, you can also define multiple AIP partitioning
keys and the element does not have to be of type date or dateTime.
InfoArchive uses the partitioning key, which is an AIU child node (element) defined in the
structured data (PDI) schema, to calculate the value range (min/max values) of the information
package in terms of this AIU attribute. The value range serves as the AIP partitioning criterion
used to quickly locate the AIPs that may contain matching AIUs in the query for archived data
(tier-1 search).
For more information about defining the partitioning key, see minmax—Defining the AIP
Partitioning Key, page 132.
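The min/max computation InfoArchive performs over the partitioning key can be pictured as a simple range calculation across all AIU values in the package (a conceptual sketch, not the actual implementation):

```python
from datetime import date

def partition_key_range(values):
    """Return the (min, max) range of the partitioning key values of a
    package; this range becomes the AIP's partitioning criterion."""
    return min(values), max(values)
```

At query time, an AIP whose range does not overlap the search interval can be skipped entirely, which is what makes the tier-1 search fast.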
5. If information packages include unstructured content files, select the node (AIU child element or
attribute) that contains their names.
During the ingestion process, InfoArchive executes an XQuery expression that selects the distinct
values of this element and uses them to create the table of contents (eas_ri.xml) that
references unstructured content files.
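The distinct-values selection described above can be approximated with standard-library XML parsing; the element names used here are hypothetical:

```python
import xml.etree.ElementTree as ET

def distinct_content_files(pdi_xml: str, file_tag: str):
    """Collect the distinct content file names referenced by AIUs, in
    document order, mimicking an XQuery distinct-values() call."""
    root = ET.fromstring(pdi_xml)
    seen, result = set(), []
    for el in root.iter():
        name = el.text if el.tag.split("}")[-1] == file_tag else None
        if name and name not in seen:
            seen.add(name)
            result.append(name)
    return result
```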
6. Select the format of content files. In most cases, this is the content file name extension, and the
MIME type attribute is derived from the format.
The format must be defined in the Documentum repository. If needed, you can define additional
formats in the repository using DA or by editing the configuration template files, which can be
installed using ant.
This information is used to create the table of contents (eas_ri.xml) that references
unstructured content files.
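The relationship between a content file's extension and its MIME type can be sketched as follows. The lookup here is a plain extension/MIME mapping and only stands in for the repository's real format registry:

```python
import mimetypes
from pathlib import Path

def guess_format(filename: str):
    """Derive an (extension-based format, MIME type) pair for a content
    file name; a stand-in for the repository format lookup."""
    fmt = Path(filename).suffix.lstrip(".").lower()
    mime, _encoding = mimetypes.guess_type(filename)
    return fmt, mime
```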
7. Select one or more nodes to be used as search criteria.
8. Define the search criteria:
• Data types of the search criteria are displayed on the screen. If a data type is not correct, fix
it in the PDI schema and reload the schema in the wizard.
• Create xDB index: Specify whether to index the element in xDB. Creating xDB indexes speeds
up searches but consumes more storage space.
• Set the default sorting order on the search results page
This information will be used to construct the XQuery XML file to be imported into the repository
as the content of the query configuration (eas_cfg_query) object. Specifically, this step configures
the path.value.index section in the eas_cfg_pdi.xml and eas_cfg_query.01.xml
configuration template files. For details, see Defining the Search Criteria and Search Result,
page 171.
9. If you want to take advantage of the confirmations feature, select the event types for which
you want to generate confirmation messages.
A confirmation is a message generated in response to an AIP event to acknowledge that the event
has occurred and to capture the information of the relevant AIP.
The wizard only provides the most basic confirmation configuration options.
You can perform additional manual configurations to:
• Generate multiple confirmations for a single event and configure the content of the messages
• Limit the scope of confirmation generation to a specified set of AIPs
• Configure where to output confirmation messages: xDB or a file system location
For information about configuring confirmations, see Configuring Confirmations, page 222.
10. Specify the xDB mode and retention policy for the holding.
xDB modes determine how structured data is stored in xDB.
If you choose xDB mode 3 (pooled libraries), specify the following settings for the library pool:
• Close period (in days)
The number of days after the last ingestion date before the library pool can be closed.
• Partition period per
Specify a partitioning period: day, week, month, quarter, or year.
• Pooled libraries quota
The maximum number of xDB libraries that can be assigned to a library pool.
For information about xDB modes, see xDB Modes, page 50.
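The close-period setting can be read as a simple date rule. This sketch assumes the pool becomes closable once the stated number of days has elapsed since the last ingestion:

```python
from datetime import date, timedelta

def pool_is_closable(last_ingestion: date, close_period_days: int, today: date) -> bool:
    """An xDB library pool may be closed once close_period_days have
    passed since the last ingestion into it (assumed semantics)."""
    return today >= last_ingestion + timedelta(days=close_period_days)
```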
By default, the AIP mode is set to 1. To change the AIP mode, after the wizard generates
the holding configuration package, you need to edit the configuration template file
022-aip-parent-policy.xml inside the package.
By default, synchronous ingestion is not enabled.
If you choose to use a retention policy, select one of the following:
• Retention period (days):
Specify the default retention period used to calculate the retention date of the information
packages in the holding:
retention date = base retention date + retention period
• Retention Policy Services (RPS) retention policy:
If InfoArchive is integrated with Retention Policy Services (RPS) for retention management,
specify the RPS retention policy to use.
For information about retention management using RPS, see Using Extended Retention
Management Capabilities of Retention Policy Services (RPS) , page 262.
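The retention-date formula above translates directly into date arithmetic:

```python
from datetime import date, timedelta

def retention_date(base_retention_date: date, retention_period_days: int) -> date:
    """retention date = base retention date + retention period (days)."""
    return base_retention_date + timedelta(days=retention_period_days)
```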
11. Select search criteria to include in the search form.
From the search criteria you defined in the previous steps, select the ones you want to use on the
InfoArchive GUI search form and use the Up/Down arrows to arrange them in the order
you want them to appear on the search form.
12. Configure the search form:
• Label: The label of the search criterion to display in InfoArchive GUI
• Operator: Search operator that defines how to compare the specified value against archived
data.
When you specify an operator for a criterion, you can only choose from the list a valid
operator supported by the node's data type. The supported operators for each data type are
listed below:
Operator           Date   DateTime   String   Integer   Double
Equal               *        *          *        *         *
Not equal           *        *          *        *         *
Greater             *        *                   *         *
Greater or equal    *        *                   *         *
Less                *        *                   *         *
Less or equal       *        *                   *         *
Between             *        *                   *         *
Begins with                             *
Contains                                *
Note: The information provided in this step will be used to generate the
eas_cfg_search_form.01.xml configuration template file in the holding configuration
package generated by the wizard. This file will be imported into the search
form configuration (eas_cfg_search_form) object during holding installation.
In the XForms eas_cfg_search_form.01.xml:
• The Between operator automatically translates into the combined Greater or equal and
Less or equal operator pair, which requires two input values.
• The Begins with operator in the wizard translates into the StartsWith operator.
For information about search form configuration, see Configuring a Search Form, page 187.
• Default value: Default value of the search criterion.
You can specify a fixed value such as an integer or a string, or directly use an XForms function
such as now() for the dateTime type or local-date() for the date type.
Note: You must ensure the default value you set is valid and ensure that the returned value is
of the same data type as that of the search criterion.
String type values must be surrounded by single quotes ('); no additional single quotes (')
or double quotes ('') are allowed. For example, 'MyValue' is a valid entry, but 'My'Value',
'My''Value', or ''MyValue'' will either cause technical errors or incorrect default values
in InfoArchive GUI.
• Required: Whether the criterion is a required search condition.
If you set a search criterion of type date or a dateTime as required but do not set its default
value, the current date will be automatically set as the default value in the generated search
form.
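The operator translation and default-value behavior described in this step can be sketched as follows. The internal operator names and the fallback-to-today rule are assumptions based on the text above, not verified internals:

```python
from datetime import date, datetime

def translate_operator(op: str):
    """Map a wizard operator to the XForms operator(s) written into
    eas_cfg_search_form.01.xml (internal names are assumed)."""
    mapping = {
        "Begins with": ["StartsWith"],                  # per the note above
        "Between": ["GreaterOrEqual", "LessOrEqual"],   # expands to a pair
    }
    return mapping.get(op, [op])

def resolve_default(value: str, required_date: bool = False):
    """Resolve a search-form default: XForms-style functions, a quoted
    literal, or today's date for a required date criterion with no default."""
    if not value:
        return date.today() if required_date else None
    if value == "now()":
        return datetime.now()
    if value == "local-date()":
        return date.today()
    return value.strip("'")
```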
13. Select columns to display on the result page. Use the Up/Down arrows to arrange them in the
order that you want them to appear on the result page.
14. Define the column labels to display on the result page.
The labels default from the ones defined for the search form. You can modify the labels, but
they must be unique.
Information provided in steps 13 and 14 will be used to generate the template stylesheet
eas_cfg_stylesheet.01.xml.
15. Enter the Documentum repository information required for installing the configured holding.
The wizard does not check the validity of this information.
If you do not provide the repository superuser password (for security reasons), the wizard
supplies a predefined dummy value so that the Content Server trusted login mechanism is used to
install the configured holding.
You can configure these settings later in the build.properties contained in the holding
configuration package to be generated by the wizard.
16. Review the holding configuration settings.
Click Back to modify configuration settings as needed; click Generate Configuration to generate
and download a compressed .zip file that contains all the configuration files required for
installing the configured holding.
17. Copy the generated holding configuration package (.zip) to the InfoArchive server.
18. Unzip the holding configuration package.
19. Optionally, if you want to modify the holding configuration settings, including default
settings such as the AIP mode that were not exposed for configuration in the wizard, edit the
following files in the decompressed package folder:
• build.properties
• Configuration object content template files in /template/content
Note: Changes you make directly to the files in the holding configuration package will not
take effect if you reload the modified package in the wizard.
20. Run the ant installer to install the configured holding into the Content Server repository.
After you install the basic holding configured using the wizard, you can modify the holding
configurations in one of the following ways:
• Modify the configuration objects pertaining to the holding in DA. This allows you to configure
more advanced settings that are not available in the wizard.
• Modify the settings in build.properties and configuration object content template files in
/template/content in the holding configuration package and reinstall the holding into the
repository to overwrite the existing one.
• Load the existing holding configuration in the wizard, modify the configurations, regenerate
the holding configuration package, and reinstall the holding into the repository to overwrite
the existing one.
Configuring a Holding
Most InfoArchive system configuration is performed at the holding level. Holding configuration
encompasses many aspects of data archiving such as storage areas, retention policy, ingestion
sequence, AIP mode, and xDB mode. The settings defined at the archive holding level are used
throughout the whole lifecycle of the data archived in the holding.
A holding is configured through the holding configuration (eas_cfg_holding) object, which directly or
indirectly references many of the other InfoArchive configuration objects.
Configuring a Holding Configuration (eas_cfg_holding)
Object
In DA, create an object of type eas_cfg_holding in the holding folder (e.g., /System EAS/Archive
Holdings/MyHolding) and configure its properties.
The holding configuration object name and its location in the repository folder are
arbitrary. However, the general convention is to place the object in the /System EAS/Archive
Holdings folder.
eas_criteria_name (Criteria name): Name of the partitioning criterion to use during ingestion as well as during searches when queries are executed against the collection.

eas_criteria_type (Criteria type): Name of the XQuery atomic type (for example, string, date, date-time, float, integer, or boolean) of the partitioning criteria specified in eas_criteria_name.

eas_criteria_pkey_min_attr (Criteria min. value): Name of the AIP property containing the minimum value of the partitioning criteria in eas_criteria_name at the corresponding index.

eas_criteria_pkey_max_attr (Criteria max. value): Name of the AIP property containing the maximum value of the partitioning criteria in eas_criteria_name at the corresponding index.
All these properties are related to the AIP partitioning criteria, which must be already defined as part
of the ingestor parameters in ingestion configuration.
Note: Partition criteria must be defined before data is ingested.
eas_order_no (Order No): Number to control the sort order in which items are returned by the InfoArchive web service.
eas_consumer_application (User application): Name of the consumer application for which the labels are valid (eas_language_code, eas_title, and eas_description).

eas_language_code (Language Code): Language code in the format language_country (ISO 639, ISO 3166); for example, fr_FR for French and zh_CN for simplified Chinese.

eas_title (Title): Title of the AIC in the language specified in eas_language_code.

eas_description (Description): Description of the AIC in the language specified in eas_language_code.
eas_sip_format (Received file format): Name of a Documentum format associated with a SIP to receive or ingest (for example, eas_sip_zip).

eas_cfg_ingest (Cfg ingest process): Name of the ingestion configuration to apply for a SIP format at the same index.

eas_pdi_schema (Schema): Name of an XML schema in which the holding can ingest structured data.

eas_pdi_schema_version (Cfg metadata): Version of the XML schema in which this holding can ingest structured data, associated with the schema at the same index.

eas_cfg_pdi (Cfg metadata): Technical name of the ingestion parameters applied for ingesting the AIP.

eas_cfg_pdi_version (Cfg metadata version): Version of the ingestion parameters applied for ingesting the AIP.

eas_sip_store (Reception filestore): Name of the repository storage area in which the receiver must store the received file.

eas_delete_on_error (Delete on ingestion error): Whether to enable the deletion of received data from the working directory and xDB if a processing error occurs. This setting is useful for protecting sensitive data. When data must be encrypted for the holding, the eas_delete_on_error property of the eas_cfg_holding_crypto configuration object overrides this property defined at the holding level.

eas_create_deletion_lock (Automatic creation of purge lock): Enables the automatic creation of a deletion lock on the AIP when it is ingested (refer to the eas_litigation_hold type).

eas_keep_sip_ingest_enabled (Keep received file on reject or invalidation): Flag to indicate whether the received file should be retained after the ingestion of the AIP is committed.

eas_cfg_ingest_node (Ingest nodes): Name of the ingestion node that processed this AIP.

eas_priority (Priority): Ingestion priority of this holding.

eas_dss_priority (Sub-priority): Ingestion sub-priority code mentioned in the SIP descriptor (eas_sip.xml). The ingest deadline date is computed based on the duration defined in the holding for the current eas_dss_priority (same as the value in eas_sip.xml).

eas_dss_deadline (Sub-priority deadline (mn)): Maximum target time in minutes for the ingestion of an AIP having the priority value eas_dss_priority at the same index.
eas_logs_store_enabled (Archive logs): Flag indicating whether reception and ingestion logs must be stored in the repository.

eas_logs_store (Log store): Name of the repository storage area in which to store reception and ingestion logs.

eas_root_folder_path (Root folder): Path to the root folder for the classification of AIPs in the repository. AIPs are classified chronologically according to their base retention date (eas_dss_base_retention_date present in the descriptor): <Root>/YYYY/YYYY-MM/YYYY-MM-DD

eas_folder_type (Sub-folder type): Type of folders to create under the root classification folder defined by eas_root_folder_path.

eas_folder_acl_name (Sub-folder ACL name): Name of the permission set to apply to folders created under the root classification folder defined by eas_root_folder_path.

eas_folder_acl_domain (Sub-folder ACL domain): Domain of the permission set to apply to folders created under the root classification folder defined by eas_root_folder_path.

eas_aip_type (AIP type): AIP type used by this holding. It must be a subtype of the eas_aip base type.

eas_sip_xml_store_enabled (Archive SIP descriptor): Flag to indicate whether the XML descriptor (eas_sip.xml) of the AIP should be stored as content of the AIP for redundancy and reversibility.

eas_pdi_xml_hash_enforced (Mandatory metadata hash): Flag to indicate whether an error should be generated if the SIP descriptor (eas_sip.xml) does not contain the hash value of the PDI file (eas_pdi.xml).

eas_pdi_xml_hash_validation (Metadata hash validation): Flag to indicate whether the hash value associated with the XML structured data elements of the AIP should be validated if the hash is present in the SIP descriptor.

eas_pdi_xml_store_enabled (Archive XML metadata): Flag to indicate whether the structured data of the AIP should be stored as compressed XML.

eas_xml_store (XML store): Name of the repository storage area to which to assign the contents to ensure reversibility (formats eas_sip_xml_zip, eas_ri_xml_zip, eas_pdi_xml_zip, or eas_pdi_xml_zip_crypto).
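The chronological classification path described for eas_root_folder_path can be built from the base retention date as follows (the root path in the test is illustrative):

```python
from datetime import date

def aip_folder_path(root: str, base_retention_date: date) -> str:
    """Build the <Root>/YYYY/YYYY-MM/YYYY-MM-DD classification path
    from eas_dss_base_retention_date."""
    d = base_retention_date
    return "{0}/{1:%Y}/{1:%Y-%m}/{1:%Y-%m-%d}".format(root, d)
```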
eas_keep_xml_rejinv_enabled (Keep XML file on reject or invalidation): Flag to indicate whether structured data files should be retained to ensure reversibility when an AIP is rejected or invalidated.

eas_keep_ci_rejinv_enabled (Keep contents on reject or invalidation): Flag to indicate whether unstructured content files associated with the items contained in an AIP should be retained in case of rejection or invalidation of the AIP.

eas_retention_period (Default Retention period (d)): Retention time of AIPs expressed in days, applied by default if no retention class is mentioned in the SIP descriptor (eas_sip.xml). The retention date of the AIP is calculated as: eas_aip.eas_dss_base_retention_date + eas_cfg_holding.eas_retention_period.

eas_retention_class (Retention class): Logical name of a retention class.

eas_retention_class_period (Retention class period (d)): Retention period in days associated with the retention class name at the same index.

eas_auxiliary_alias_set (Auxiliary Alias set): Alias set name assigned as the auxiliary set during reception or ingestion of the AIP. This alias set is defined as the default alias set for the session that runs the state transitions of the lifecycle. It generally contains aliases referencing the permission sets to be applied, assuming those aliases are defined as actions in the lifecycle attached to the AIP.

eas_alias_set (Alias Set): Name of the alias set to apply to the lifecycle after the creation of the object (optional).

eas_sync_commit_enabled (Synchronous commit enabled): Activates commit at the end of the ingestion when the AIP is a standalone DSS (the DSS is contained in this single AIP).

eas_sync_ingest_enabled (Synchronous ingestion enabled): Must be set to authorize synchronous ingestion (performed by the ingest web service) for the archive holding.

eas_sing_cfg_aip_parent_pol (Cfg AIP parenting policy for sync. ingest): Name of the AIP parenting policy (eas_cfg_aip_parent_policy) to apply for synchronous ingestion. If not set, a synchronous ingestion is processed like a batch ingestion (creation of a materialized AIP object with the eas_xdb_mode configured at the holding level).

eas_xdb_mode (Metadata ingest mode): xDB ingestion mode of the metadata applied for the AIP (refer to the eas_cfg_holding type).
Configuring an AIC View (eas_cfg_aic_view) Object
Optionally, you can configure the AIC view (eas_cfg_aic_view) object, which is a collection of selected
AIUs. The holding configuration (eas_cfg_holding) object can work without this.
You have two methods of selecting AIUs in the AIC view scope:
• DQL
• XQuery
Configuring a DQL Predicate for the AIC View
You specify a DQL predicate with the eas_aip_predicate property of the AIC view object. You
can configure a DQL query to select a collection of AIPs from one or more holdings. For example, the
following DQL predicate selects AIPs in the EAS_AUDIT_001 holding:
eas_aip WHERE eas_dds_holding IN ('EAS_AUDIT_001')
When InfoArchive performs a search in background, the predicate is integrated into a DQL SELECT
statement to identify the AIP objects.
Configuring an XQuery for the AIC View
You can further refine the AIUs visible in the AIC view scope by attaching an XML file, which
contains XQuery criteria, to the AIC view object. Attaching an XML file to the AIC view object is
similar to attaching XML files to eas_cfg_query or eas_cfg_ingest objects.
The syntax of the XML file is similar to the query configuration XML. The following example shows
an XML snippet used to build an XQuery that selects dm_logon_failure event AIUs from
the audit holding.
<?xml version="1.0" encoding="UTF-8"?>
<aicQueryCriteria type="AND">
<operand>
<name>event-name</name>
<operator>BeginWithFullText</operator>
<value>dm_logon_failure</value>
</operand>
</aicQueryCriteria>
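A criteria file like the one above can also be generated programmatically. This sketch reproduces the sample's structure with Python's standard library and is only an illustration:

```python
import xml.etree.ElementTree as ET

def build_aic_criteria(name: str, operator: str, value: str, combine: str = "AND") -> str:
    """Serialize an aicQueryCriteria document with a single operand,
    matching the shape of the sample above."""
    root = ET.Element("aicQueryCriteria", {"type": combine})
    operand = ET.SubElement(root, "operand")
    ET.SubElement(operand, "name").text = name
    ET.SubElement(operand, "operator").text = operator
    ET.SubElement(operand, "value").text = value
    return ET.tostring(root, encoding="unicode")
```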
Configuring Holding Security
You must configure security settings for a holding with the following group/role hierarchy.
Note: myholding is the name of the holding for which you are configuring security settings. The
names of the domain, roles, and groups are for illustrative purposes only. You can use your own
naming conventions.
The domain is used by the UI to display different sets of search forms depending on the specified
domain. The dynamic roles are used for access control by the InfoArchive access web services, and
the user eas_usr_webservice must be a member of each dynamic role. You can grant users different
access rights by assigning them to the corresponding groups.
It is a good practice to create a dynamic role to use with a holding. When a dynamic role is specified,
InfoArchive GUI handles the assignment of users at runtime, which makes it easier to manage the
holding:
• The permissions configured for AIPs and configuration objects just need to include the role
dedicated to the holding
• To grant or revoke access to the holding, you just need to manage the desired users, groups, or
roles associated with the dynamic role.
It is much harder to reliably manage access to the holding directly using permissions (ACLs), since
users and roles are often subject to change.
In DA, follow these steps to configure repository security settings for your holding:
1. Under Administration/User Management/Groups, create two groups: g_myholding_read and
g_myholding_admin.
2. Under Administration/User Management/Roles:
• Create a dynamic role r_myholding_read and add the group g_myholding_read and the user
eas_usr_webservice to it.
• Create a dynamic role r_myholding_admin and add the group g_myholding_admin and the
user eas_usr_webservice to it.
Note: In the New Role window, select Dynamic Role to create the dynamic roles.
3.
Under Administration/User Management/Roles, create a domain myholding and add the
dynamic roles r_myholding_read and r_myholding_admin to it. Choose File > New Role and select
Create role as domain in the New Role window.
Note: After you create the domain, you will find it not under Administration/User
Management/Roles but under Administration/User Management/Groups.
4.
Grant appropriate access rights users by adding them to the g_myholding_admin and
g_myholding_read groups respectively. For example, add the InfoArchive administrator user
to the g_myholding_admin group.
105
InfoArchive Configuration
Under Administration/User Management/Groups, double-click a group and then select File >
Add Member(s) to add users to it.
5. Create the default permission set (ACL) for the holding.
   a. Under Administration/Security, choose File > New > Permission Set.
   b. In the New Permission Set window, under the Info tab, specify a unique name for the permission set; for example, you can use the holding name myholding as the permission set name.
   c. Under the Permissions tab:
      • Add r_myholding_read and r_myholding_admin to the permission set with Read permission
      • Add r_myholding_admin with Relate permission
      • Remove permissions for dm_world
   Note: In most situations, grant the Relate rather than the Write or higher access right on AIPs to the InfoArchive administrator for compliance reasons. The Relate permission grants access to the InfoArchive administrative functions on AIPs, including purge lock/unlock, retention date management, reject, and invalidation.
   Audit management and job execution require the standard Content Server privileges and are not related to the ACL applied to AIPs.
6. Create an alias set for the holding.
   The archive holding configuration references the name of an alias set to be used for determining the permissions to apply on the AIP during its lifecycle. It is a good practice to use the holding name as the alias set name.
   For more information about alias sets, refer to the EMC Documentum Content Server Fundamentals Guide.
   a. Under Administration/Alias Sets, choose File > New > Alias Set to create a new alias set.
b. Add the following aliases to the alias set, each alias corresponding to an AIP processing phase:
Alias (value): Description
• EAS_ACL_RECEPTION (eas_aip_non_visible): Permission set to apply when an AIP is received.
• EAS_ACL_INGESTION_WAIT (eas_aip_non_visible): Permission set to apply when an AIP is waiting for ingestion.
• EAS_ACL_INGESTION (eas_aip_non_visible): Permission set to apply when an AIP is ingested.
• EAS_ACL_COMMIT_WAIT (eas_aip_non_visible): Permission set to apply when an AIP has been ingested but has not yet been committed.
• EAS_ACL_TERMINATED (myholding): Permission set to apply when an AIP has been ingested and the ingestion has been committed. The EAS_ACL_TERMINATED alias should be assigned an ACL that grants:
  - Read access to users performing searches against the archive
  - Relate access to users who perform administrative tasks
• EAS_ACL_PURGE (eas_aip_non_visible): Permission set to apply when an AIP is purged.
• EAS_ACL_REJECT (eas_aip_non_visible): Permission set to apply when an AIP is rejected.
• EAS_ACL_INVALID (eas_aip_non_visible): Permission set to apply when an AIP is invalidated.
• EAS_ACL_PRUNE (eas_aip_non_visible): Permission set to apply to an AIP after it has been aggregated (in synchronous ingestion).
• EAS_AIP_OWNER (repository_owner): User name of the repository owner account.
The most common configuration is to assign to all the aliases except EAS_ACL_TERMINATED
a generic ACL that grant Relate access to InfoArchive administrators and None to World.
If there is a need to make the AIP visible to different administrator groups depending on the
AIP processing phase, assign the appropriate ACLs to the alias for each processing phase.
You can also create an alias set quickly using a DQL statement; for example:
CREATE dm_alias_set OBJECT
SET object_name           = 'AssignedAliasSetName',
SET object_description    = 'AssignedAliasSetDescription',
SET owner_name            = (SELECT owner_name FROM dm_docbase_config),
APPEND alias_name         = 'EAS_ACL_RECEPTION',
APPEND alias_value        = 'AssignedAclName',
APPEND alias_description  = 'ACL for the reception phase',
APPEND alias_category     = 6,
APPEND alias_usr_category = 1,
APPEND alias_name         = 'EAS_ACL_INGESTION_WAIT',
APPEND alias_value        = 'AssignedAclName',
APPEND alias_description  = 'ACL for the pending ingestion phase',
APPEND alias_category     = 6,
APPEND alias_usr_category = 1,
APPEND alias_name         = 'EAS_ACL_INGESTION',
APPEND alias_value        = 'AssignedAclName',
APPEND alias_description  = 'ACL for the ingestion phase',
APPEND alias_category     = 6,
APPEND alias_usr_category = 1,
APPEND alias_name         = 'EAS_ACL_COMMIT_WAIT',
APPEND alias_value        = 'AssignedAclName',
APPEND alias_description  = 'ACL for the pending commit phase',
APPEND alias_category     = 6,
APPEND alias_usr_category = 1,
APPEND alias_name         = 'EAS_ACL_TERMINATED',
APPEND alias_value        = 'AssignedAclName',
APPEND alias_description  = 'ACL for the completed phase',
APPEND alias_category     = 6,
APPEND alias_usr_category = 1,
APPEND alias_name         = 'EAS_ACL_PURGE',
APPEND alias_value        = 'AssignedAclName',
APPEND alias_description  = 'ACL for the purge phase',
APPEND alias_category     = 6,
APPEND alias_usr_category = 1,
APPEND alias_name         = 'EAS_ACL_REJECT',
APPEND alias_value        = 'AssignedAclName',
APPEND alias_description  = 'ACL for the reject phase',
APPEND alias_category     = 6,
APPEND alias_usr_category = 1,
APPEND alias_name         = 'EAS_ACL_INVALID',
APPEND alias_value        = 'AssignedAclName',
APPEND alias_description  = 'ACL for the invalid phase',
APPEND alias_category     = 6,
APPEND alias_usr_category = 1
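The repository groups from step 1 and their memberships can also be created with DQL instead of through DA. The following is a minimal sketch using the example naming convention; the member names jsmith and eas_admin are placeholders, and the dynamic roles and the domain must still be created in DA as described above.

```sql
CREATE GROUP g_myholding_read
CREATE GROUP g_myholding_admin
ALTER GROUP g_myholding_read ADD jsmith
ALTER GROUP g_myholding_admin ADD eas_admin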
Defining the Structured Data (PDI) Schema
Unlike the SIP descriptor file (eas_sip.xml), there is no predefined schema for structured data
(PDI). You must create a schema with a target namespace that defines the elements, attributes, and
simple and complex types in the eas_pdi.xml file according to your business requirements. The
eas_pdi.xml file in the information package must conform to your defined schema.
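For instance, against the sample PhoneCalls schema shown later in this chapter, a minimal eas_pdi.xml with a single AIU could look like the following sketch (all element values are illustrative):

```xml
<!-- Illustrative eas_pdi.xml instance for the sample PhoneCalls schema;
     all values are made up for demonstration purposes. -->
<Calls xmlns="urn:eas-samples:en:xsd:phonecalls.1.0">
  <Call>
    <SentToArchiveDate>2014-01-09</SentToArchiveDate>
    <CallStartDate>2014-01-08T22:08:08.158+01:00</CallStartDate>
    <CallEndDate>2014-01-08T22:12:41.000+01:00</CallEndDate>
    <CallFromPhoneNumber>33612345678</CallFromPhoneNumber>
    <CallToPhoneNumber>33687654321</CallToPhoneNumber>
    <CustomerID>1234567</CustomerID>
    <CustomerLastName>Smith</CustomerLastName>
    <CustomerFirstName>Anna</CustomerFirstName>
    <RepresentativeID>42</RepresentativeID>
    <Attachments/>
  </Call>
</Calls>
```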
The structured data (PDI) schema is configured through the schema configuration (eas_cfg_schema)
object, with the structured data (PDI) schema (.xsd) file as its content. The holding configuration
(eas_cfg_holding) object references the schema through the schema name.
Here are the steps for defining a PDI schema:
1. Create a PDI schema (.xsd) file with a designated namespace and a target namespace.
2. Configure a schema configuration (eas_cfg_schema) object with the PDI schema (.xsd) file as
its content.
The defined schema must also be specified in the SIP descriptor.
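As a sketch, the schema is referenced from the SIP descriptor by its URN. In the following fragment, the sip, dss, and base_retention_date names appear in examples elsewhere in this guide, while the pdi_schema element name and the values shown are assumptions to verify against the SIP descriptor schema for your InfoArchive version.

```xml
<!-- Sketch of a SIP descriptor fragment; pdi_schema and all values
     are assumptions to verify against your eas_sip.xml schema. -->
<sip xmlns="urn:x-emc:eas:schema:sip:1.0">
  <dss>
    <holding>myholding</holding>
    <pdi_schema>urn:eas-samples:en:xsd:phonecalls.1.0</pdi_schema>
    <base_retention_date>2014-01-01T00:00:00.000+01:00</base_retention_date>
  </dss>
</sip>
```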
Creating a Structured Data (PDI) Schema
Create a schema (XSD) using an XML editor. You then import this document into the repository as
the content of the schema configuration (eas_cfg_schema) object.
Note: The document you create must have the filename extension .xsd. A schema embedded in other
document types (such as a Word file) is not valid.
Also, the schema used by InfoArchive must meet the following requirements:
• The schema version must be 1.0. XSD 1.1 is currently not supported.
• The schema must be a standalone document; xs:include or xs:import that references another
schema is not supported.
• There must be one and only one target namespace in the schema. Multiple namespaces are not
supported.
• A PDI schema containing only one AIU element at the root level is not supported.
• The schema must specifically describe its data. Ambiguous data descriptions—for example,
using the any element—will make it difficult to perform configuration tasks such as defining the
partitioning key, specifying the unstructured content file, and creating xDB indexes.
• Element names cannot contain the dot (.) character.
• If you install multiple holdings in one repository, make sure the PDI schema used by each distinct
holding is identified by a unique namespace; otherwise, installing a new holding will break a
previously installed one that uses a PDI schema with the same namespace due to name conflicts.
A schema formally describes the elements in an Extensible Markup Language (XML) document. It defines the elements and attributes, and their data types, that can appear in an XML document. It can be used to verify each piece of content in a document and to express a set of rules to which an XML document must conform in order to be considered valid against that schema.
In the following example, the schema dictates:
1. The child elements must appear in the sequence specified in <xs:sequence>.
2. The data type for each element is defined in the type attribute.
3. CallFromPhoneNumber must be an 11-digit positive integer.
<xs:sequence>
<xs:element name="SentToArchiveDate" type="xs:date" nillable="false" />
<xs:element name="CallStartDate" type="xs:dateTime" nillable="false" />
<xs:element name="CallEndDate" type="xs:dateTime"/>
<xs:element name="CallFromPhoneNumber">
<xs:simpleType>
<xs:restriction base="xs:positiveInteger">
<xs:minInclusive value="1" />
<xs:totalDigits value="11" />
</xs:restriction>
</xs:simpleType>
PDI (eas_pdi.xml) Schema Definition Best Practices
Here are some best practices for creating your PDI file schema:
• Whenever applicable, use standardized schemas such as ISO 20022, Rosettanet, METS, DITA,
XBRL, and SWIFT.
• Leverage standard schema features to control XML content. For example, use standard XML data
types, especially for the date(time) information.
• Configure minimum and maximum lengths for attribute and/or element values.
• Adopt a consistent naming rule for the schema URN. This makes it easier to remember the URNs,
which are referenced in multiple places during the configuration.
• Include the version number of the schema in its URN, defined as the value of targetNamespace;
for example:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="urn:eas-samples:en:xsd:PhoneCalls.1.0"
xmlns:ns1="urn:eas-samples:en:xsd:PhoneCalls.1.0">
...
</xs:schema>
• In eas_pdi.xml, include date and date time values that explicitly include the time zone
information; for example, specify an offset from UTC:
<CallStartDate>2014-01-08T22:08:08.158+01:00</CallStartDate>
• Configure value uniqueness, if acceptable from a performance point of view; use namespaces to
separate and identify duplicate XML elements. Also avoid non-ASCII characters in the node name.
If a uniqueness value constraint is defined in the schema, the uniqueness checking is performed
during the validation of the XML. This validation occurs at the beginning of the ingestion. If the
XML is large, uniqueness checking can consume considerable resources and considerably slow
down the ingestion. In such situations, it is frequently better to remove uniqueness definitions
in the schema. Instead, configure the creation of unique xDB indexes during the ingestion. The
XML validation will occur more quickly. Also, if several AIUs having the same value exist
in the package, the ingestion will fail during the creation of the xDB indexes, thus enforcing
the uniqueness constraint.
• When customizing your schema for XML data types, use a professional XML editor such as
oXygen XML Editor and Altova XMLSpy XML Editor.
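As an illustration of the uniqueness trade-off above, a constraint could be declared in the sample schema on the element that contains the repeated AIUs; whether to keep such a constraint or to rely on unique xDB indexes instead depends on the ingestion performance considerations described above. The constraint name and field choice here are illustrative.

```xml
<!-- Sketch: each Call must be unique by calling number plus start time.
     The ns1 prefix must be bound to the schema's target namespace. -->
<xs:element name="Calls">
  <xs:complexType>
    <!-- sequence of Call elements, as in the sample schema -->
  </xs:complexType>
  <xs:unique name="UniqueCall">
    <xs:selector xpath="ns1:Call"/>
    <xs:field xpath="ns1:CallFromPhoneNumber"/>
    <xs:field xpath="ns1:CallStartDate"/>
  </xs:unique>
</xs:element>
```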
Sample PDI (eas_pdi.xml) Schema
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace=
"urn:eas-samples:en:xsd:phonecalls.1.0" version="1.0"
elementFormDefault="qualified" xmlns:Q1="urn:eas-samples:en:xsd:phonecalls.1.0">
<xs:element name="Calls">
<xs:complexType>
<xs:sequence maxOccurs="unbounded">
<xs:element name="Call">
<xs:complexType>
<xs:sequence>
<xs:element name="SentToArchiveDate" type="xs:date" nillable="false" />
<xs:element name="CallStartDate" type="xs:dateTime" nillable="false" />
<xs:element name="CallEndDate" type="xs:dateTime"/>
<xs:element name="CallFromPhoneNumber">
<xs:simpleType>
<xs:restriction base="xs:positiveInteger">
<xs:minInclusive value="1" />
<xs:totalDigits value="11" />
</xs:restriction>
</xs:simpleType>
</xs:element>
<xs:element name="CallToPhoneNumber" nillable="false">
<xs:simpleType>
<xs:restriction base="xs:positiveInteger">
<xs:minInclusive value="1" />
<xs:totalDigits value="11" />
</xs:restriction>
</xs:simpleType>
</xs:element>
<xs:element name="CustomerID" nillable="false">
<xs:simpleType>
<xs:restriction base="xs:positiveInteger">
<xs:totalDigits value="11" />
<xs:minInclusive value="1" />
</xs:restriction>
</xs:simpleType>
</xs:element>
<xs:element name="CustomerLastName" nillable="false">
<xs:simpleType>
<xs:restriction base="xs:normalizedString">
<xs:minLength value="1" />
<xs:maxLength value="32" />
</xs:restriction>
</xs:simpleType>
</xs:element>
<xs:element name="CustomerFirstName" nillable="false">
<xs:simpleType>
<xs:restriction base="xs:normalizedString">
<xs:minLength value="1" />
<xs:maxLength value="32" />
</xs:restriction>
</xs:simpleType>
</xs:element>
<xs:element name="RepresentativeID" nillable="false">
<xs:simpleType>
<xs:restriction base="xs:positiveInteger">
<xs:minInclusive value="1" />
<xs:totalDigits value="7" />
</xs:restriction>
</xs:simpleType>
</xs:element>
<xs:element name="Attachments" nillable="false" minOccurs="1">
<xs:complexType>
<xs:sequence maxOccurs="unbounded" minOccurs="0">
<xs:element name="Attachment">
<xs:complexType>
<xs:sequence>
<xs:element name="AttachmentName" nillable="false" maxOccurs="1">
<xs:simpleType>
<xs:restriction base="xs:normalizedString">
<xs:minLength value="1" />
<xs:maxLength value="32" />
</xs:restriction>
</xs:simpleType>
</xs:element>
<xs:element name="FileName" nillable="false" minOccurs="1" maxOccurs="1">
<xs:simpleType>
<xs:restriction base="xs:normalizedString">
<xs:minLength value="1" />
<xs:maxLength value="32" />
</xs:restriction>
</xs:simpleType>
</xs:element>
<xs:element name="CreatedBy" nillable="false" maxOccurs="1">
<xs:simpleType>
<xs:restriction base="xs:normalizedString">
<xs:minLength value="1" />
<xs:maxLength value="32" />
</xs:restriction>
</xs:simpleType>
</xs:element>
<xs:element name="CreatedOnDate" type="xs:dateTime" nillable="false" />
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
Configuring a Schema Configuration Object
In DA, create an object of type eas_cfg_schema in the holding folder (e.g., /System EAS/Archive
Holdings/MyHolding) and configure its properties. Import the PDI XML schema (.xsd) file into the
repository as the content of the object.
Note: If you check in multiple versions of PDI schema (.xsd) as the content of the schema
configuration (eas_cfg_schema) object, ingestion will fail. Make sure that the schema configuration
(eas_cfg_schema) object only contains one version of PDI schema and track the schema version as
part of the object name instead.
Property name (DA label): Description
• eas_name (Name): Technical name of the schema that is referenced by InfoArchive configuration objects. EMC recommends that you append the version number directly to the schema name.
  Note: The URN is case-sensitive.
• eas_version (Version): Version of the schema. The recommended approach to schema versioning is to add the version number in the name of the schema itself and not to use this property.
On the Description tab, include information about the applications that will query InfoArchive.
Property name (DA label): Description
• eas_order_no (Sort order): Number that controls the order in which the items are returned by the service returning the available schemas.
• eas_consumer_application (User application): Name of the consumer application for which the eas_language_code value is specified.
  Note: eas_gui is the client application name assigned to InfoArchive GUI.
• eas_language_code (Language code): Language code in the format language_country (ISO 639, ISO 3166); for example, fr_FR for French and zh_CN for simplified Chinese.
• eas_title (Title): Title of the schema in the language specified in eas_language_code at the same index.
• eas_description (Description): Description of the schema in the language specified in eas_language_code at the same index.
Configuring xDB Modes
You configure settings for each xDB mode and specify which xDB mode to use for a holding through
the holding configuration object (eas_cfg_holding).
For xDB modes 1 and 2, you must configure an xDB library configuration object (eas_cfg_xdb_library).
This defines how to connect to the xDB database as well as the root library to use. The xDB library
configuration object contains the information needed to connect to the target detachable xDB
database and the path of the library in the database.
For xDB mode 3, you must configure an xDB library pool configuration object
(eas_cfg_xdb_library_pool). Optionally, you can configure a custom pooled library assignment
policy by creating a text file containing the XQuery expression and importing it into the xDB library
pool configuration object as its content.
Both the parent xDB library configuration object and the xDB library pool configuration object are
referenced by the holding configuration object.
You can change the xDB mode for a holding and the change will take effect immediately. After you
change the xDB mode, the new xDB mode is applied to newly ingested AIPs. Previously archived
AIPs remain stored in xDB according to the mode configured when they were ingested. Data ingested
in different xDB modes is transparent to searches.
A sample holding EAS-AUDIT-001 (/System EAS/Archive Holdings/EAS/EAS-AUDIT-001),
which uses xDB mode 3, is provided with the InfoArchive installation package. You can install
this sample holding and use its holding configuration (eas_cfg_holding) and library pool
(eas_cfg_library_pool) objects as a reference for configuring xDB mode 3 for other holdings.
Configuring an xDB Parent Library (xDB Mode 1 and 2)
A default xDB library configuration object was created (/System EAS/xDB Libraries/aip_01
_xdb_lib) during InfoArchive installation and can be used out of the box. You can modify this
object in DA as needed for your InfoArchive deployment.
You can also create a new object of type eas_cfg_xdb_library and edit its properties.
The name of the xDB library configuration object is referenced by the holding configuration object.
115
InfoArchive Configuration
Configuring an xDB Library Pool (xDB Mode 3)
An xDB library pool is configured through an xDB library pool configuration object
(eas_cfg_xdb_library_pool).
In DA, create an object of type eas_cfg_xdb_library_pool and configure its properties.
Property name (DA label): Description
• eas_name (Name): Name of the xDB library pool configuration. The name is referenced by the holding configuration object that uses the xDB library pool.
• eas_create_library_disabled (Disable auto. library creation): Whether to disable automatic creation of new libraries in the pool. This setting lets the administrator manually manage the libraries in the pool or temporarily disable creation of new libraries in the pool. This setting only applies to xDB libraries created by InfoArchive. InfoArchive does not automatically manage xDB libraries not created by it; they cannot be closed or cached in/out.
• eas_aip_quota (AIP quota per library): Maximum number of AIPs allowed in a library. The value zero indicates an unlimited number.
• eas_aiu_quota (AIU quota per library): Maximum number of Archival Information Units (AIUs) allowed in a library. The value zero indicates an unlimited number. This property is for information only and does not actually constrain the number of AIUs in a SIP during ingestion.
• eas_xdb_seg_size_quota (Library segment size quota): Maximum size allowed for the xDB segment of a library. 0 (zero) indicates an unlimited size.
• eas_unitary_purge_enabled (Unitary AIP purge): Reserved for future use.
• eas_last_library_seqno (Last assigned library seqno): Last sequence number assigned to an xDB library created in this pool. Internally, this sequence number is not really used; it serves only to provide a meaningful way of naming the pooled library objects. When browsing in DA, you can see the chronology of the pooled libraries immediately by sorting on their object_name.
• eas_folder_path (Library folder): Repository folder in which to put the xDB library repository objects created in this pool. The repository folder path can be any repository location; it does not have to be the same folder path as that of the holding configuration (eas_cfg_holding) object. Since a pooled library can hold archived data pertaining to different holdings, it is common to classify pooled libraries into a dedicated sub-folder in the root folder.
• eas_acl_name (Library ACL name): Name of the permission set to apply to an xDB library repository object in Content Server created for this pool.
• eas_acl_domain (Library ACL domain): Domain of the permission set to apply to an xDB library repository object in Content Server created for this pool.
• eas_close_mode (Close mode): Mode to apply for closing a library of the pool automatically created by EMC InfoArchive:
  - 0: The library is never closed automatically unless a close request has been manually set on the library.
  - 1: The library can be automatically closed when the current date is greater than or equal to the date returned by the XQuery plus the delay defined in the eas_close_period property.
  - 2: The library can be automatically closed when the current date is greater than or equal to the library opening date plus the delay defined in the eas_close_period property.
  - 3: The library can be automatically closed when the current date is greater than or equal to the last ingestion date in the library plus the delay defined in the eas_close_period property.
  The scheduled eas_close_pooled_libraries job is in charge of closing open pooled libraries according to the close mode setting.
• eas_param_close_period (Close period (d)): Period in days used according to the chosen close mode.
• eas_xdb_clib_is_detachable (Detachable library): Whether to set the DETACHABLE_LIBRARY flag when creating a new library. This setting must be set to true for xDB mode 3 to work.
• eas_xdb_cache_quota (# of cached library quota (unlocked)): Maximum number of non-locked libraries of the pool that are allowed in the xDB cache. When this limit is exceeded, EMC InfoArchive attempts to cache out the least used libraries to comply with this quota. Since in xDB mode 3 AIPs belonging to different holdings can be stored in a pooled library, the maximum number of unlocked libraries associated with a library pool is defined at the library pool level rather than at the holding level.
Since the same library pool can be configured as destination of multiple holdings, the library pool
configuration has similar settings as those available on the holding configuration (eas_cfg_holding)
object for xDB mode 2.
Note: Users must have at least Read permission on the library pool object for searches to be successful.
Configuring a Custom Pooled Library Assignment
Policy (xDB Mode 3)
By default, a new AIP is assigned to the latest open library in the pool. You can configure a custom
assignment policy to assign new AIPs to libraries to meet your specific requirements.
You define the pooled library assignment policy using XQuery in a text file to be imported into the
library pool (eas_cfg_xdb_library_pool) object as its content. The content determines the pooled
library to which AIPs will be assigned.
You can use any information present in the SIP descriptor, including custom metadata, in the XQuery
to define the assignment logic. For example, you can use the retention date (which is most commonly
used), holding, or entity as the condition for assigning AIPs.
Here is an example of the assignment policy XQuery expression.
xquery version "1.0" encoding "utf-8";
declare namespace n = "urn:x-emc:eas:schema:sip:1.0";
let $d as xs:dateTime:= xs:dateTime(/n:sip/n:dss/n:base_retention_date/text())
let $quarter as xs:integer := xs:integer((ceiling(month-from-dateTime($d) div 3)))
let $tz as xs:dayTimeDuration := timezone-from-dateTime($d)
let $start as xs:dateTime := adjust-dateTime-to-timezone(xs:dateTime
(concat(year-from-dateTime($d),"-01-01T00:00:00")),$tz)
let $nextQuarter as xs:dateTime := $start + xs:yearMonthDuration
(concat("P",string(($quarter*3)),"M"))
return
<pool partitioning_key="{year-from-dateTime($d)}-Q{$quarter}"
close_hint_date="{$nextQuarter}"/>
In this example, the quarter-based assignment logic is used and the base retention date is used to
compute the following values:
• partitioning_key="{year-from-dateTime($d)}-Q{$quarter}"
A string type value using the YYYY-Qn pattern for the year and quarter number of the base
retention date
• close_hint_date="{$nextQuarter}"
A datetime type value corresponding to the first day of the next quarter
Those computed values are returned as values of the partitioning_key and close_hint_date attributes
of a <pool> element expected by InfoArchive. The close_hint_date attribute is mandatory only if
close mode 1 is applied.
Here is an example of the <pool> element returned by the XQuery expression:
<pool partitioning_key="2011-Q1" close_hint_date="2014-01-01T12:00:00.000+01:00"/>
If a pooled library already exists whose Partitioning key is equal to the pool.partitioning_key, the
AIP is assigned to it; otherwise, a new pooled library is created and its Partitioning key and Closing
hint date are obtained from the <pool> element.
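As another sketch, the assignment logic can key on other SIP descriptor values. The following variant partitions pooled libraries by holding and year; it assumes a holding element under dss in the SIP descriptor, and it omits close_hint_date, so it is only suitable when close mode 1 is not used.

```xquery
xquery version "1.0" encoding "utf-8";
declare namespace n = "urn:x-emc:eas:schema:sip:1.0";
(: Sketch: one pooled library per holding per retention year.
   Assumes /n:sip/n:dss/n:holding is present in the SIP descriptor. :)
let $d as xs:dateTime := xs:dateTime(/n:sip/n:dss/n:base_retention_date/text())
let $h as xs:string := xs:string(/n:sip/n:dss/n:holding/text())
return
<pool partitioning_key="{$h}-{year-from-dateTime($d)}"/>
```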
Configuring xDB Mode Settings for a Holding
Configure the settings for different xDB modes under their respective tabs in the holding
configuration object properties window.
Configuring Settings for xDB Mode 1
Property name (DA label): Description
• eas_cfg_xdb_library_parent (Cfg parent library (mode 1 and 2)): Set to the eas_name of an eas_cfg_xdb_library object, which defines:
  - Information needed to connect to the target xDB database
  - The path of the library in the database where AIP data will be stored
  - Aliases pointing to file system paths in which data files can be created
Configuring Settings for xDB Mode 2
The xDB library configuration object also defines one or more segment location aliases. Each segment
location alias refers to a file system path accessible to the xDB server, in which new xDB data files
can be created. In xDB, documents are grouped in libraries. Each library specifies one or more
segments. Each segment specifies a file path on the xDB server host. The segment location alias
on the xDB library configuration object corresponds to the segment location field on the holding
configuration object.
When an AIP is ingested with the xDB ingestion mode 2, InfoArchive reads the segment location
alias referenced on the holding configuration object. It uses this alias to obtain the associated file
system path from the library configuration object.
Lock with parent and Concurrent library are xDB library settings. They are not required by xDB
mode 2. For information about these two settings, refer to the EMC Documentum xDB Administration
Guide.
In xDB mode 2, all xDB caching settings are configured on the holding configuration object
(eas_cfg_holding); therefore, you can apply different caching settings for each holding.
Property name (DA label): Description
• eas_xdb_cache_lock_period (Locked in cache period (days)): Duration in days after which the xDB sub-library may be removed from the cache. The end date of cache locking = eas_aip.eas_dss_base_retention_date + eas_cfg_holding.eas_xdb_cache_lock_period.
• eas_cfg_xdb_library_parent (Cfg parent library (mode 1 and 2)): Set to the eas_name of an eas_cfg_xdb_library object, which defines:
  - Information needed to connect to the target xDB database
  - The path of the library in the database where AIP data will be stored
  - Aliases pointing to file system paths in which data files can be created
• eas_xdb_seg_location (Segment location): The alias corresponding to the file system path under which the data file of a new detachable library must be created.
• eas_xdb_detachable_option (Detachable library): Must be selected or ingestion will fail.
• eas_xdb_store_cache_out (Cached out after library storage): Reserved for future use.
• eas_xdb_cache_in_req_period (Cache in delay (d)): Reserved for future use.
• eas_xdb_cache_quota (# of cached AIP quota (unlocked)): Maximum number of unlocked libraries associated with this holding that are allowed on the xDB file system.
Configuring Settings for xDB Mode 3
Specify the name of the library pool configuration (eas_cfg_xdb_library_pool) object to use.
Set the Locked in cache period to the number of days after the AIP’s base retention date that the AIP
must be kept on the xDB file system. A pooled library is kept on the xDB file system up to the highest
locking date of the AIPs stored in it.
The rest of the configuration for xDB mode 3 is done on the library pool configuration object.
Converting xDB Library (Pool) Backup Renditions
If you have xDB library (pool) backup renditions in the legacy formats (eas_xdb_library and
eas_xdb_library_gzip) backed up using earlier versions of InfoArchive, you need to convert them
to the new formats eas_xdb_backup and eas_xdb_backup_gzip.
InfoArchive is backward-compatible with the legacy xDB library (pool) backup formats, but they may
not be supported in future InfoArchive releases.
To convert xDB library (pool) backup renditions, execute the following command located in
EAS_HOME/tools/XdbBackupMigration/bin:
• eas-launch-xdb-backup-migration.bat (Windows)
• eas-launch-xdb-backup-migration.sh (Linux)
You can set the command options in one of the following ways:
• Configure the eas-xdb-backup-migration.properties file located in
EAS_HOME/tools/XdbBackupMigration/conf
• Set the options when running the command, which overrides the settings in the configuration file
The xDB library backup migration command has the following options; all are optional:
• -h <holding_name>: Specifies the holding for which to convert xDB library (pool) backup renditions. If this option is not set, the migration command converts xDB library (pool) backups in all the holdings present in the repository.
• -m <mode>: Specifies the processing mode of the migration script:
  • MIGRATE: Adds eas_xdb_backup and eas_xdb_backup_gzip renditions only; all old eas_xdb_library and eas_xdb_library_gzip renditions are retained
  • MIGRATE_DELETE: Adds eas_xdb_backup and eas_xdb_backup_gzip renditions and deletes all old eas_xdb_library and eas_xdb_library_gzip renditions
  • DELETE: If corresponding eas_xdb_backup and eas_xdb_backup_gzip renditions are present, deletes the old eas_xdb_library and eas_xdb_library_gzip renditions
• -l [ERROR|WARN|DEBUG|INFO|TRACE]: Specifies the logging level to use. Default: INFO.
• -f <file_name>: Specifies the log file to which to write logging information. By default, logging information is sent to the output console; with this option, it is written to the specified log file instead.
• -x: Displays migration statistics without actually performing the migration. With this option, the migration script only reports detailed information about the objects to migrate: the number of holdings to update, the number of pool library configuration objects to update, the number of AIPs and pool libraries to update, and the size of structured data to update.
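For example, assuming a holding named MyHolding and a log file name of your choice (both hypothetical values, shown only to illustrate the option syntax), a dry run followed by an actual migration might look like this:

```
# Preview only: report migration statistics without migrating (-x)
eas-launch-xdb-backup-migration.sh -h MyHolding -x

# Migrate that holding, logging at DEBUG level to a file
eas-launch-xdb-backup-migration.sh -h MyHolding -m MIGRATE -l DEBUG -f migration.log
```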
If the data volume to process is large, the migration process may take a while to complete.
During the migration process (processing mode MIGRATE), the tool:
1. Updates the old library archive format of the holding and the xDB library pool from eas_xdb_library/eas_xdb_library_gzip to eas_xdb_backup/eas_xdb_backup_gzip by:
   • Updating the eas_xdb_store_format property value of the eas_cfg_holding object
   • Updating the eas_xdb_store_format property value of the eas_cfg_xdb_library_pool object
2. Creates the xDB library and xDB library pool backup renditions in the new format.
Configuring the Ingestion Process
The ingestion process consists of a sequence of sub-processes, each performed by an ingestion
processor, which is a Java class file. Which sub-processes to perform during ingestion is defined by
the ingestion sequence configuration (eas_cfg_ingest) object, whose XML content references the
processors to invoke during the ingestion process.
What can or must be configured depends on the ingestion process (structured or unstructured
data). InfoArchive ships with two pre-configured ingestion sequence configuration (eas_cfg_ingest)
objects—eas_ingest_sip_zip-pdi and eas_ingest_sip_zip-ci—for ingesting structured data and
unstructured data respectively. You can view the XML content of these two objects to see what
sub-processes are included in the ingestion process by default.
The XML content of the ingestion sequence configuration object (eas_cfg_ingest) contains default
settings for most blocks. If the default settings of a block meet your requirements, it is not necessary
to configure them in the ingestion configuration. If needed, you can customize the XML content to
change the ingestion processors to use.
The ingestion process is mostly configured through the ingestion configuration (eas_cfg_pdi)
object, whose XML content contains ingestion parameters that define the actions performed by
the processors. The processor parameters are grouped by XML <data> blocks, each block defining
settings for a processor (with corresponding processor ID). You configure the processor parameters
by modifying the XML content of the ingestion configuration (eas_cfg_pdi) object.
You can view the XML content of the ingestion configuration objects to view the default processor
configurations.
Since each ingestion configuration is specific to a PDI schema, the ingestion configuration object also
references a PDI schema configuration (eas_cfg_schema) object.
You must also configure at least one reception node and one ingestion node, which are required to run the ingestion receiver and the ingestor.
The ingestor ships with a set of default values in .properties files created during InfoArchive installation. During the ingestion process, the ingestor:
1. Looks for the relevant XML block defined in the content of the ingestion configuration (eas_cfg_pdi) object.
2. Overwrites the ingestor default values with the values contained in that XML block.
To configure the ingestion process:
1. Create an XML file defining the ingestor parameters.
2. Configure an ingestion configuration (eas_cfg_pdi) object with the XML file as its content.
3. Configure a reception node configuration (eas_cfg_receive_node) object.
4. Configure an ingestion node configuration (eas_cfg_ingestion_node) object.
Defining Ingestor Parameters
Create an XML file defining the ingestor parameters.
Parameters for specific processors are often schema-specific. In the XML file, you define parameters in <data> blocks whose identifier is that of the processor. The structure of each block depends on the parameters expected by the processor.
pdi.index.creator—Creating xDB Indexes
xDB indexing can improve search performance if applied properly. However, indexes add to the total load on the system by consuming time and space, so their use is justified only if they actually improve search performance.
xDB index creation takes place after the eas_pdi.xml file is imported into xDB (and its file name extension is changed from .xml to .pdi). In the xDB Admin Client, you can see the indexes that have been created on the imported PDI file.
The pdi.index.creator block defines the xDB indexes to be created. Before configuring this block, determine the key search criteria expected to be used by end users and business applications. You can parameterize as many indexes as required, but evaluate the distribution of the values to be indexed to ensure adequate selectivity. For asynchronous ingestion, indexes are defined at the xDB document level.
The XML block for creating xDB indexes is illustrated below.
<data id="pdi.index.creator">
<key.document.name>xdb.pdi.name</key.document.name>
<indexes>
<Index_Definition_1> ... </Index_Definition_1>
<Index_Definition_2> ... </Index_Definition_2>
...
<Index_Definition_n> ... </Index_Definition_n>
</indexes>
</data>
• The id of this XML block is pdi.index.creator.
• The key.document.name is always xdb.pdi.name, which is the alias for the PDI file
(eas_pdi.xml) imported into xDB. When defined, indexes are created on the PDI files imported
into xDB.
• The element indexes is the parent element for all indexes created in this block.
InfoArchive supports path indexes and full-text indexes. The path index is the most common index type and consumes less space than the full-text index.
This parameters block is optional; if it is not defined, no index is created on the PDI file. The block is defined in the XML content of the eas_cfg_pdi object.
Path Index
A path index indexes XML elements and attributes in the PDI file. In the following example, two path indexes, on CustomerID and CallStartDate, are created.
<path.value.index>
<name>CustomerID</name>
<path>/{urn:eas-samples:en:xsd:phonecalls.1.0}Calls/
{urn:eas-samples:en:xsd:phonecalls.1.0}Call
[{urn:eas-samples:en:xsd:phonecalls.1.0}CustomerID<LONG>]</path>
<compressed>false</compressed>
<unique.keys>false</unique.keys>
<concurrent>false</concurrent>
<build.without.logging>true</build.without.logging>
</path.value.index>
<path.value.index>
<name>CallStartDate</name>
<path>/{urn:eas-samples:en:xsd:phonecalls.1.0}Calls/
{urn:eas-samples:en:xsd:phonecalls.1.0}Call
[{urn:eas-samples:en:xsd:phonecalls.1.0}CallStartDate<DATE_TIME>]</path>
<compressed>false</compressed>
<unique.keys>false</unique.keys>
<concurrent>false</concurrent>
<build.without.logging>true</build.without.logging>
</path.value.index>
The elements of a path index definition block are listed below. For detailed information about the path index, refer to the Path Index section of the EMC Documentum xDB Administration Guide.
• name: A unique name for the index.
• path: XPath to the element or attribute to be indexed, with the element URN. The data type can be STRING, INT, LONG, FLOAT, DOUBLE, DATE, or DATE_TIME.
• compressed: Boolean value indicating whether the index is compressed in xDB. Default: false. Set this to false except when the volume of the text is particularly large.
• unique.keys: Boolean value indicating whether the indexed value is unique. Default: false. Specify true if the indexed values must be unique; otherwise specify false. Activating this option raises an ingestion error if the same index value is found more than once within the AIU structured data of the ingested SIP.
• concurrent: Boolean value indicating whether multiple xDB transactions can update the index concurrently. Default: false. Since AIU structured data is never updated in InfoArchive, this is always set to false.
• build.without.logging: Boolean value indicating whether the index is created without writing to the xDB transaction log. Default: true. When restarting a failed ingestion, InfoArchive automatically performs a cleanup at the xDB level before the restart. For this reason, it is recommended that you set this to true to reduce the disk I/O activity on the xDB transaction logs.
Full-Text Index
A full-text index is a special form of value index, which indexes XML elements and attributes and
tokenizes the values into a number of terms. Each term-element combination is added to the index.
A full-text index lets you search for individual words contained in the indexed values. It is also less sensitive to misspellings because it allows wildcard characters in the search query. However, a full-text index consumes more storage than a path index.
The following example shows the definition of a full-text index named CustomerLastName on the value of the CustomerLastName element, which is defined by the schema urn:eas-samples:en:xsd:phonecalls.1.0.
...
<full.text.index>
<name>CustomerLastName</name>
<compressed>false</compressed>
<concurrent>false</concurrent>
<optimize.leading.wildcard.search>true</optimize.leading.wildcard.search>
<index.all.text>true</index.all.text>
<include.attributes>false</include.attributes>
<support.phrases>false</support.phrases>
<support.scoring>false</support.scoring>
<convert.terms.to.lowercase>true</convert.terms.to.lowercase>
<filter.english.stop.words>false</filter.english.stop.words>
<support.start.end.token.flags>false</support.start.end.token.flags>
<element.uri>urn:eas-samples:en:xsd:phonecalls.1.0</element.uri>
<element.name>CustomerLastName</element.name>
<attribute.uri/>
<attribute.name/>
</full.text.index>
...
The elements of a full-text index definition block are described as follows:
• name: Name of the index to create in xDB.
• compressed: Boolean value indicating whether the index is compressed in xDB. Default: false. Set this to false except when the volume of the text is particularly large. Note: When ingestion mode 2 is applied, InfoArchive compresses the entire xDB data file before importing it into the repository.
• concurrent: When set to true, multiple xDB transactions can concurrently update the index. Always set this to false, since AIU structured data is never updated in InfoArchive.
• optimize.leading.wildcard.search: If set to true, enables searching the index for terms with a leading wildcard (for example, "*abcd") at the expense of a longer index creation time and a larger index volume. Default: true.
• index.all.text: If set to true, the element is indexed by its string value, which is computed from the string value of all descendant nodes. If set to false, the element can only have text-child nodes, but the index updates faster. Set this to false for better performance.
• include.attributes: Boolean value indicating whether to index only the specified element or the element together with its attributes. Default: false.
• support.phrases: Boolean value indicating whether to optimize the index to support proximity phrases in queries. Default: false. Using this option increases the index size.
• support.scoring: Boolean value indicating whether to store additional information about the indexed tokens to improve the quality of the relevance score calculation. Default: false. Always set this to false, since scoring is not used.
• convert.terms.to.lowercase: Boolean value indicating whether to convert indexed terms to lowercase. Default: true. Set this to true if case-insensitive searching is desired.
• filter.english.stop.words: Boolean value indicating whether to exclude stop words from the index. Default: false. Setting this to true applies a stop word filter; words on the stop word list are not indexed. For the vast majority of InfoArchive use cases, where a short text value is associated with the structured data to index, there is little or no benefit from setting this to true, so set it to false in most cases.
• support.start.end.token.flags: Boolean value indicating whether to index the first and last terms of the indexed values. Default: true. Setting this to true optimizes queries containing starts with, ends with, or other proximity constraints.
• element.uri: Namespace of the element to index.
• element.name: Name of the element to index.
• attribute.uri: Namespace of the attribute to index. Default: blank. It is recommended that you index only the value of the element.
• attribute.name: Name of the attribute to index. Default: blank. It is recommended that you index only the value of the element.
pdi.aiu.cnt—Counting the Number of AIUs
The pdi.aiu.cnt block contains the XQuery expression that returns the total number of AIUs in the
imported structured data (PDI) file.
The AIU count is specified in the aiu_count element of the SIP descriptor (eas_sip.xml). When properly configured, the ingestor checks the number of AIUs contained in the SIP; if there is any discrepancy between the specified number of AIUs and the number actually received, the ingestor throws an error.
Make sure the correct XML path to the AIU node in the PDI file is specified in the XQuery expression;
for example:
<data id="pdi.aiu.cnt">
<select.query xml:space="preserve">
declare namespace n = "urn:eas-samples:en:xsd:phonecalls.1.0";
count(/n:Calls/n:Call)
</select.query>
</data>
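The check performed by this block can be sketched outside InfoArchive as follows. This is an illustrative stand-in, not product code: the sample PDI data, function name, and declared count are hypothetical.

```python
# Hypothetical sketch of the pdi.aiu.cnt check: count the AIU nodes
# in the structured data and compare the result with the count
# declared in the SIP descriptor.
import xml.etree.ElementTree as ET

NS = {"n": "urn:eas-samples:en:xsd:phonecalls.1.0"}

def count_aius(pdi_xml):
    """Return the number of /n:Calls/n:Call elements (one per AIU)."""
    root = ET.fromstring(pdi_xml)  # root is the Calls element
    return len(root.findall("n:Call", NS))

sample_pdi = (
    '<Calls xmlns="urn:eas-samples:en:xsd:phonecalls.1.0">'
    "<Call/><Call/><Call/>"
    "</Calls>"
)
declared_count = 3  # would come from aiu_count in eas_sip.xml
if count_aius(sample_pdi) != declared_count:
    raise ValueError("AIU count mismatch: the ingestor raises an error")
```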
minmax—Defining the AIP Partitioning Key
The minmax block contains the parameters for returning the minimum and maximum values of the partitioning key found in the structured data (PDI) of the SIP. In the structured data (PDI), some XML elements can be used as partitioning keys, which make archive searches efficient and responsive.
InfoArchive performs the following actions to determine the partitioning keys of an AIP:
• Searches for the minimum and maximum values of the key criteria contained in the PDI file.
• Assigns the minimum and maximum values found to the eas_pkey_min_date and eas_pkey_max_date properties of the AIP (eas_aip) object, respectively.
Archiving is mostly a long-term task, and elements related to date or time are good candidates for partitioning keys. For example, you can set the partitioning criteria to tag AIPs with the range of record creation dates or retention dates.
In some situations, you need to define multiple partitioning keys:
• Create a customized eas_aip subtype with additional attributes. Additional partitioning criteria can be managed by adding custom attributes to the eas_aip subtype used for the holding. The criteria can be of any data type (for example, DATE, STRING, INTEGER, FLOAT, or BOOLEAN).
• Add as many key elements as required in the parameters block, referencing the names of the additional attributes.
For performance reasons, define an RDBMS index on the attributes used as partitioning criteria. An RDBMS index can be created using the MAKE_INDEX Content Server administration method. This method is exposed in Documentum Administrator and can also be used in IAPI or DQL scripts. Refer to the EMC Content Server Administration Guide for details.
To further optimize query performance, instead of creating two indexes on the pair of AIP properties representing the lower and upper bounds of the partitioning key (e.g., eas_pkey_min_date and eas_pkey_max_date), create one composite index on the two properties and drop the original indexes.
For example, in Oracle, you can create a composite index using a script like this:
CREATE INDEX "MYREPO"."EAS_AIP_PEM_08" ON "MYREPO"."EAS_AIP_PEM_S"
("EAS_PKEY_MIN_DATE", "EAS_PKEY_MAX_DATE") TABLESPACE "DM_MYREPO_INDEX"
PCTFREE 10 INITRANS 2 MAXTRANS 255 STORAGE ( INITIAL 16K BUFFER_POOL
DEFAULT)
Then, update the table statistics using the following script:
exec DBMS_STATS.GATHER_DATABASE_STATS;
Note: When you drop the original indexes, you must drop them using DQL statements on the
Content Server side instead of directly dropping them in the database; otherwise, Content Server will
recreate the indexes in the database.
The dm_UpdateStats job is enabled by default and may impact query performance.
The value range information stored with the AIP dramatically improves search performance, because InfoArchive executes two-tiered searches:
• InfoArchive first looks for the subset of AIPs that fit into the specified partitions (value ranges); for example, the subset of AIPs that contain phone calls from March 1, 2013 to April 1, 2013.
• InfoArchive then searches at the xDB level for the AIUs satisfying all the search criteria and returns the search results.
The following example uses CallStartDate as the partitioning key and determines its value range.
<data id="minmax">
<key name="CallStartDate" type="date-time" xml:space="preserve">
<min field="eas_pkey_min_date" xml:space="preserve">
declare namespace n = "urn:eas-samples:en:xsd:phonecalls.1.0";
min(/n:Calls/n:Call/xs:dateTime(n:CallStartDate))
</min>
<max field="eas_pkey_max_date" xml:space="preserve">
declare namespace n = "urn:eas-samples:en:xsd:phonecalls.1.0";
max(/n:Calls/n:Call/xs:dateTime(n:CallStartDate))
</max>
</key>
</data>
• The id of this XML block must be set to minmax.
• In the example, n refers to the urn:eas-samples:en:xsd:phonecalls.1.0 namespace; therefore, in the XML files, all tags defined by this namespace are prefixed with n:. A plain XPath such as /Calls/Call would suffice to select the <Call> elements, but it is good practice to qualify the path with the namespace prefix: /n:Calls/n:Call.
• The use of XQuery functions is supported; for example, count, adjust-dateTime-to-timezone, min, max, and so on.
• Define one key element per partitioning criterion. The value range for the partitioning key is retrieved by computing the minimum and maximum values of the <key>. The value of the type attribute is only written to the processing logs. Valid data types are DATE, DATE-TIME, STRING, INTEGER, DOUBLE, and BOOLEAN.
• The minimum and maximum values found for the /Calls/Call/CallStartDate elements are assigned to the eas_pkey_min_date and eas_pkey_max_date attributes of the AIP repository object, respectively.
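The extraction performed by this block can be sketched as follows. The sample PDI data, function name, and string comparison of ISO 8601 date-times are illustrative assumptions, not InfoArchive code.

```python
# Sketch of the min/max extraction over a hypothetical PDI sample:
# the resulting pair is what would populate eas_pkey_min_date and
# eas_pkey_max_date on the AIP (eas_aip) object.
import xml.etree.ElementTree as ET

NS = {"n": "urn:eas-samples:en:xsd:phonecalls.1.0"}

def pkey_range(pdi_xml):
    root = ET.fromstring(pdi_xml)
    dates = [e.text for e in root.findall("n:Call/n:CallStartDate", NS)]
    # ISO 8601 date-time strings in the same timezone sort lexically
    return min(dates), max(dates)

sample_pdi = (
    '<Calls xmlns="urn:eas-samples:en:xsd:phonecalls.1.0">'
    "<Call><CallStartDate>2013-03-14T09:00:00</CallStartDate></Call>"
    "<Call><CallStartDate>2013-03-01T08:15:00</CallStartDate></Call>"
    "<Call><CallStartDate>2013-04-01T17:30:00</CallStartDate></Call>"
    "</Calls>"
)
pkey_min, pkey_max = pkey_range(sample_pdi)
```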
The attributes of the key element are described as follows:
• name: This name appears in the ingestion log when the value range extraction is performed for the current key element.
• type: Indicates the data type of the value returned by the configured XQuery. The acceptable values are DATE, DATETIME, STRING, INTEGER, FLOAT, and DOUBLE.
Once defined, all these partitioning criteria attributes must be registered with the corresponding
holding configuration (eas_cfg_holding) object.
Note: In earlier releases of InfoArchive, the minmax processor also counted the number of AIUs in the imported structured data (PDI); this function is now performed by the separate pdi.aiu.cnt processor. If you have upgraded from a previous InfoArchive version, you must modify the XML content of the ingestion configuration (eas_cfg_pdi) object by removing the count element, such as the following, from the minmax XML block:
<count xml:space="preserve">
declare namespace n = "urn:eas-samples:en:xsd:phonecalls.1.0";
count(/n:Calls/n:Call)
</count>
Optimizing XQuery for Partitioning Keys
The execution time of the embedded XQuery has a significant impact on ingestion performance. If the partitioning key is also indexed in xDB, the XQuery execution time is reduced substantially when the query is executed within the index context.
The following example demonstrates how to put XQuery within the index context.
<data id="minmax">
<key name="date-oper" type="date" xml:space="preserve">
<min field="eas_pkey_min_date" xml:space="preserve">
declare namespace n = "urn:eas-samples:en:xsd:phonecalls.1.0";
(for $d in /n:Calls/n:Call[n:CallStartDate]
order by
$d/n:CallStartDate ascending return $d/n:CallStartDate/xs:dateTime(.))[1]
</min>
<max field="eas_pkey_max_date" xml:space="preserve">
declare namespace n = "urn:eas-samples:en:xsd:phonecalls.1.0";
(for $d in /n:Calls/n:Call[n:CallStartDate]
order by
$d/n:CallStartDate descending return $d/n:CallStartDate/xs:dateTime(.))[1]
</max>
</key>
</data>
The min and max queries retrieve the minimum and maximum values directly from the index. The optimized query avoids a full scan of the structured data (PDI) in xDB.
pdi.aiu.id — Generating AIU IDs
The pdi.aiu.id block contains XQuery expressions that specify the path to the AIU node and return the AIU elements in eas_pdi.xml. The pdi.aiu.id ingestion processor uses this information to generate an ID in the structured data file (eas_pdi.xml) for each AIU in ingested AIPs. AIU IDs are useful for identifying AIUs and retrieving them later, as well as for generating granular audit trails at the AIU level.
In the following example, the AIU node is specified as /n:Calls/n:Call.
<data id="pdi.aiu.id">
<select.query xml:space="preserve">
declare namespace n = "urn:eas-samples:en:xsd:phonecalls.1.0";
/n:Calls/n:Call
</select.query>
</data>
Optionally, you can also define XQuery expressions to return some additional contextual information
such as customer ID for each AIU, which will be automatically logged as the eas_fetch event in
the audit trail; for example:
<data id="pdi.aiu.id">
<select.query xml:space="preserve">
declare namespace n = "urn:eas-samples:en:xsd:phonecalls.1.0";
for $call in /n:Calls/n:Call
return ($call, $call/n:CustomerID/text())
</select.query>
</data>
During the ingestion process, the pdi.aiu.id processor creates a unique ID for each AIU in eas_pdi.xml using the pattern AIP_ID + ":aiu:" + AIU_sequence. In the following example, the Calls element is assigned the AIP ID (080000018000a5e78000a65d), and each Call element, which represents an AIU, is assigned an AIU ID derived from the AIP ID. The customer ID is saved as the value of the audit attribute of each AIU as contextual information during the ingestion process. When an AIU is retrieved later, the access to the AIU is logged as an eas_fetch audit trail event capturing the AIU ID along with its associated contextual (eas_aiu_audit) information.
<Calls xmlns="urn:eas-samples:en:xsd:phonecalls.1.0" xmlns:ns1="urn:x-emc:eas:schema:pdi"
ns1:id="080000018000a5e78000a65d">
<Call ns1:id="080000018000a5e78000a65d:aiu:1" ns1:audit="000502">
...
</Call>
<Call ns1:id="080000018000a5e78000a65d:aiu:2" ns1:audit="000503">
...
</Call>
...
</Calls>
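The ID pattern shown above can be sketched as follows. This is a hypothetical illustration of the documented pattern, not the PDIAiuIdProcessor itself; the sample data and function name are assumptions.

```python
# Sketch of the documented AIU ID pattern: the root element gets the
# AIP ID and each AIU element gets AIP_ID + ":aiu:" + sequence number.
import xml.etree.ElementTree as ET

NS = "urn:eas-samples:en:xsd:phonecalls.1.0"
PDI_NS = "urn:x-emc:eas:schema:pdi"

def assign_aiu_ids(pdi_xml, aip_id):
    root = ET.fromstring(pdi_xml)
    root.set("{%s}id" % PDI_NS, aip_id)
    for seq, call in enumerate(root.findall("{%s}Call" % NS), start=1):
        call.set("{%s}id" % PDI_NS, "%s:aiu:%d" % (aip_id, seq))
    return root

sample_pdi = (
    '<Calls xmlns="urn:eas-samples:en:xsd:phonecalls.1.0">'
    "<Call/><Call/>"
    "</Calls>"
)
calls = assign_aiu_ids(sample_pdi, "080000018000a5e78000a65d")
ids = [c.get("{%s}id" % PDI_NS) for c in calls]
# ids == ["080000018000a5e78000a65d:aiu:1", "080000018000a5e78000a65d:aiu:2"]
```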
Note: The AIU node specified in the XML content of the eas_cfg_pdi object must match the node
level defined for the pdi.aiu.id processor in the XML content of the eas_cfg_ingest object. If needed,
you can edit the XML content of the eas_cfg_ingest object to redefine the AIU node level:
<processor id="pdi.aiu.id">
<display.name>Add AIU IDs in PDI</display.name>
<class>com.emc.documentum.eas.ingestor.transform.processor.importer.PDIAiuIdProcessor</class>
<data>
<select.query>/node()/node()</select.query>
</data>
</processor>
Note: For SIPs already ingested without AIU IDs, you must re-ingest them with the appropriate
pdi.aiu.id configuration to generate the AIU IDs.
To optimize queries, the ingestion process also invokes the pdi.aiu.index processor, which is referenced in the eas_cfg_ingest object, to create xDB indexes on the AIU ID attribute in the PDI file. The default settings for the processor are sufficient in most scenarios.
pdi.ci.id — Generating Content File IDs
The pdi.ci.id block contains the XQuery expression that returns strings that uniquely identify
unstructured content files. The pdi.ci.id ingestion processor uses this information to set the content
ID as the value of the cid attribute of the element containing the unstructured content file name in
eas_pdi.xml. Content file IDs are useful for identifying content files and retrieving them later, as
well as generating granular audit trails at the content file level.
Make sure that in the XQuery expression, the XML path to the node that contains the file name of the
content file is specified; for example:
<data id="pdi.ci.id">
<select.query>
<![CDATA[
declare namespace n = "urn:eas-samples:en:xsd:phonecalls.1.0";
declare namespace ri = "urn:x-emc:eas:schema:ri";
let $pdi_uri := root(.)
let $aip_id := xhive:metadata($pdi_uri, 'eas_aip_id')
let $ri_uri := replace(document-uri($pdi_uri), '\.pdi$', '.ri')
for $ri in doc($ri_uri)/ris/ri[@pdi_key]
for $n in /n:Calls/n:Call/n:Attachments/n:Attachment/n:FileName[. = $ri/@pdi_key]
return ($n,concat($aip_id,":ci:",$ri/@seqno))
]]>
</select.query></data>
...
Here is an example of a content file ID created in eas_pdi.xml.
<Calls xmlns="urn:eas-samples:en:xsd:phonecalls.1.0" xmlns:ns1="urn:x-emc:eas:schema:pdi"
ns1:id="080000018000a71b8000a9a8">
<Call ns1:id="080000018000a71b8000a9a8:aiu:1">
…
<Attachments>
<Attachment>
<AttachmentName>recording</AttachmentName>
<FileName ns1:cid="080000018000a71b8000a9a8:ci:1">recording1.mp3</FileName>
…
</Attachment>
</Attachments>
</Call>
</Calls>
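The resulting ID pattern AIP_ID + ":ci:" + seqno can be sketched as follows. In InfoArchive the sequence number comes from the ri table of contents; numbering distinct file names in first-seen order is an assumption made only for this illustration.

```python
# Hypothetical illustration of the content file ID pattern
# AIP_ID + ":ci:" + sequence number, one ID per distinct file name.
def content_ids(file_names, aip_id):
    ids = {}
    for name in file_names:
        if name not in ids:  # the same file can be referenced by several AIUs
            ids[name] = "%s:ci:%d" % (aip_id, len(ids) + 1)
    return ids

cids = content_ids(
    ["recording1.mp3", "recording2.mp3", "recording1.mp3"],
    "080000018000a71b8000a9a8",
)
# cids["recording1.mp3"] == "080000018000a71b8000a9a8:ci:1"
```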
Note: The same content file can be referenced by multiple AIUs.
For SIPs already ingested without content file IDs, you must re-ingest them with the appropriate
pdi.ci.id configuration to generate the content file IDs.
To optimize queries, the ingestion process also invokes the pdi.ci.index processor, which is referenced in the eas_cfg_ingest object, to index content file IDs in the PDI file. The default settings for the processor are sufficient in most scenarios.
toc.creator—Creating the Table of Contents (eas_ri.xml) for
Unstructured Content Files
The toc.creator block contains parameters for returning information about each unstructured content file, including the file name and file format, which is required by the TOC.
When content files are contained in a SIP and referenced in the PDI file, the ingestor generates a table of contents during the ingestion. If content files are contained in the SIP but not referenced in the PDI file, the ingestor does not archive them. The default settings of this block return nothing.
Note: To archive unstructured data, the toc.creator block must always be included in the ingestion
configuration (eas_cfg_pdi) object content.
An XQuery expression, which can be flexibly configured, is embedded in the <select.query> element; it is responsible for scanning all the PDI elements and attributes and returning the information used for referencing the content files.
During the ingestion, the ingestor:
• Executes a query on the PDI file (eas_pdi.xml) in xDB, and returns a list of unstructured
content files—one XML block per distinct unstructured content file name referenced in the PDI
file (eas_pdi.xml); for example:
<content type="audio/x-mpeg" format="mp3">recording08.mp3</content>
The XQuery selects the distinct values of the /Calls/Call/Attachments/Attachment/FileName elements.
The content MIME type and format are statically defined in the XQuery. Only the unstructured
content files of the specified MIME type and format will be returned.
It is a good practice to order the results returned by the XQuery by file name in ascending order.
Note: If the XQuery does not return the file name of an unstructured data file in the SIP, the
ingestion can still be completed successfully but the unstructured data file is NOT archived.
• Verifies no referenced content file is missing in the SIP by checking whether the file exists
in the working directory, where unstructured content files have been uncompressed:
/EAS_HOME/working/IngestionNodeDirectory/TimestampedSubDirectory
An ingestion error is raised if a returned file name is not found.
• For each returned file name, ingestion processing:
— Obtains the size of each referenced file
— Adds an entry to the TOC initialized in xDB
Here is an example of the toc.creator block for creating the table of contents:
<data id="toc.creator">
<select.query>
<![CDATA[
declare namespace n = "urn:eas-samples:en:xsd:phonecalls.1.0";
for $ci in
distinct-values(/n:Calls/n:Call/n:Attachments/n:Attachment/n:FileName)
order by $ci
return <content type="audio/x-mpeg" format="mp3" audit="{$ci}">{ $ci }</content>
]]>
</select.query>
</data>
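The behavior of the query above can be approximated as follows. This is a rough Python equivalent over a hypothetical PDI sample, not InfoArchive code; the MIME type and format are the statically defined values from the example.

```python
# One TOC entry per distinct referenced file name, ordered ascending,
# mirroring the distinct-values/order by logic of the toc.creator query.
import xml.etree.ElementTree as ET

NS = {"n": "urn:eas-samples:en:xsd:phonecalls.1.0"}

def toc_entries(pdi_xml):
    root = ET.fromstring(pdi_xml)
    names = {e.text for e in root.findall(
        "n:Call/n:Attachments/n:Attachment/n:FileName", NS)}
    return [{"type": "audio/x-mpeg", "format": "mp3", "file": name}
            for name in sorted(names)]

sample_pdi = (
    '<Calls xmlns="urn:eas-samples:en:xsd:phonecalls.1.0">'
    "<Call><Attachments><Attachment>"
    "<FileName>recording2.mp3</FileName>"
    "</Attachment></Attachments></Call>"
    "<Call><Attachments><Attachment>"
    "<FileName>recording1.mp3</FileName>"
    "</Attachment></Attachments></Call>"
    "</Calls>"
)
entries = toc_entries(sample_pdi)
```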
The attributes of the <content> element in the XQuery are described as follows:
• type: MIME type corresponding to the electronic format of the unstructured content file. The MIME type can be ambiguous; for example, files created by different versions of the same application can have the same MIME type.
• format: Format of the unstructured content file; in most cases, the file name extension. The MIME type can be extrapolated from this attribute. The format must be defined in the Documentum repository. If needed, you can define additional formats in the repository using DA.
• audit: This external variable is declared for the toc.creator processor, and a new audit attribute is saved into each ri element as contextual information during the ingestion process. When a content file is retrieved later, the access to the content file is logged as a content retrieval (eas_getcontent) audit trail event.
The XQuery can be as complex as required; for example, it can:
• Obtain the file names from multiple elements and attributes of the structured data
• Determine the MIME types and format names based on information present in the structured data, such as the file name extension, dedicated elements, context, and so on.
ci.hash—Configuring Content Hashing (Unstructured Data)
When a SIP contains content files, you can configure the ingestion parameters within the ci.hash XML block to define how hashing is performed on the unstructured data, including:
• Computation of hash values for unstructured content files
• Comparison of hash values with values provided in the structured data (any discrepancy results in an ingestion error)
Hashing ensures the consistency of the contents included in a SIP. Several algorithms can be used to compute hash values: MD2, MD5, SHA-1, SHA-256, SHA-384, and SHA-512. You can choose between base64 and hexadecimal encoding schemes.
The default settings in the ci.hash XML block dictate that:
• SHA-1 hash values are computed for all unstructured content files
• Hash values are not compared with any values stored in the structured data
The following example shows the structure of the ci.hash XML block:
<data id="ci.hash">
<select.query>
declare namespace ri = "urn:x-emc:eas:schema:ri";
let $uri := replace(document-uri(.), '\.pdi$', '.ri')
for $c in doc($uri)/ris/ri
return
<content filename="{ $c/@pdi_key }">
<hash encoding="base64|hex" algorithm="SHA-1" provided="false" />
</content>
</select.query>
</data>
• The id of this XML block is ci.hash.
• An XQuery is embedded in select.query. The query will be executed against the PDI file
(eas_pdi.xml) in the xDB.
The conditions applied in the configured XQuery can selectively compute or validate a hash for
a subset of unstructured data files.
• The uri variable is assigned with the xDB path of the table of contents which equals to the path of
the execution context except that the extension is ri instead of pdi.
• A content element is returned for each /ris/ri element found in the table of contents
• A single hash sub-element is included in each returned content element
• The algorithm attribute of the hash sub-elements is statically set to SHA-1, requesting that this
algorithm be applied.
• Since no hash value is obtained from the structured data of the AIUs, the hash sub-elements
are not assigned a value and their provided attribute is set to false.
The XML block returned by the query for each unstructured content file consists of a single
<content> element. When the PDI file contains a hash value, the query returns that value in the
hash element. The XQuery does not return an XML block for an unstructured data file on which
no hash computation or validation action is to be performed.
Here is the structure of the returned XML block:
<content filename="Unstructured_Content_File_Name">
<hash encoding="base64" algorithm="MD2|MD5|SHA-1|SHA-256|SHA-384|SHA-512"
provided="true|false">Encoded_Hash_Value</hash>
</content>
The <hash> attributes are described as follows:
• encoding: The encoding to be applied when inserting the hash value in the table of contents, and
the encoding of the value of the hash sub-element. The current version supports only the base64
encoding.
• algorithm: Name of the hash algorithm to apply. The following algorithms are currently
supported: MD2, MD5, SHA-1, SHA-256, SHA-384, and SHA-512.
• provided: Boolean indicating whether the presence of this hash value in the structured data of
the AIUs is required. Setting this attribute to true raises an ingestion error if the value of the hash
element is void (no value to compare with).
When the hash sub-element has a value, an ingestion error is raised if the computed hash value
differs from the provided one.
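The validation rule above can be sketched in Python (an illustration of the rule, not product code): recompute the digest of the content bytes, then fail when a declared value is missing or differs:

```python
import base64
import hashlib

def validate_hash(data, provided_value, algorithm="sha1"):
    """Recompute the digest of the content bytes and compare it with the value
    provided in the structured data; raise on any discrepancy (ingestion error)."""
    computed = base64.b64encode(hashlib.new(algorithm, data).digest()).decode("ascii")
    if not provided_value:
        # provided="true" but the hash element is void: nothing to compare with
        raise ValueError("ingestion error: no provided hash value to compare with")
    if provided_value != computed:
        raise ValueError("ingestion error: computed hash differs from provided value")
    return computed
```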
The configured XQuery can be as complex as needed, depending on the desired behavior of this
ingestion step:
• If the PDI file also contains content hash values, the query can:
— Obtain the hash value for each content file.
— Set the provided attribute to true.
• The query can compute multiple hash values using different algorithms for each unstructured
data file. Multiple hash elements are then returned, since multiple hash algorithms are applied
as per your configuration.
Most common electronic archiving standards mandate that at least one hash value is computed for
future consistency checks. Defining the parameters block with a query not returning any information
will prevent hash computation during the ingestion.
The following example ci.hash block requests the computation of two hash values for all unstructured
content files:
<data id="ci.hash">
<select.query>
<![CDATA[
declare namespace n = "urn:eas-samples:en:xsd:phonecalls.1.0";
for $ci in distinct-values(/n:Calls/n:Call/n:Attachments/n:Attachment/n:FileName) order by $ci
return
<content filename="{ $ci }">
<hash encoding="base64" algorithm="MD5" provided="false" />
<hash encoding="base64" algorithm="SHA-1" provided="false" />
</content>
]]>
</select.query>
</data>
You can use information from the unstructured table of contents (TOC) for hashing. In situations
where it is not required to query structured data, using the TOC generally results in faster query
execution.
Content hashing populates the table of contents with the computed hash values. In the following
example, the table of contents includes a hash value computed during ingestion with the SHA-1
algorithm for the first content file, but no value comparison has been performed:
<?xml version="1.0"?>
<ris xmlns="urn:x-emc:eas:Schema:ri">
<ri SchemaVersion="1.0" seqno="1" pdi_key="recording1.mp3"
xmlns:ri="urn:x-emc:eas:Schema:ri">
<step seqno="1">
<ci mime_type="audio/x-mpeg" dctm_format="mp3" size="41123">
<hash encoding="base64" algorithm="SHA-1" provided="false">
OWJjYjNkNDk2MjNlNzg3YjhkNTI2NWFiMTY3ZmQwODA3NjYzYjNiYQ==</hash>
</ci>
</step>
</ri>
…
</ris>
One hash element is inserted in the ci element per computed hash value (encoded). The
attributes of the hash element are described as follows:
• encoding: Encoding applied to the hash value.
• algorithm: Name of the algorithm used to compute the hash value.
• provided: Set to true when the hash has been compared with the one obtained from the
structured data.
ci.compressor—Configuring Compression of Content Files
(Unstructured Data)
The ci.compressor block contains parameters that define unstructured content files to compress with
the gzip compression algorithm in order to save storage space.
In the following example, the ci.compressor parameter block requests that all unstructured content
files be compressed.
<data id="ci.compressor">
<select.query>
<![CDATA[
declare namespace n = "urn:eas-samples:en:xsd:phonecalls.1.0";
for $ci in distinct-values(/n:Calls/n:Call/n:Attachments/n:Attachment/n:FileName)
return $ci
]]>
</select.query>
</data>
• The XML block id is ci.compressor.
• An XQuery is embedded in the <select.query> element.
— By default, the XQuery is executed against the PDI file (eas_pdi.xml) imported into xDB, which
is more efficient.
— Its expected result is the list of file names of the contents to compress.
• Alternatively, the XQuery can run against the table of contents. In that case, a uri variable is
assigned the xDB path of the table of contents, which equals the path of the execution context
except that the extension is ri instead of pdi, and the query returns the values of the pdi_key
attribute of all /ris/ri elements.
In either case, the XQuery returns the list of file names of the unstructured content files to compress.
The ci.compressor parameter block is not mandatory; its default settings do not enable compression.
If this block is absent, no content is compressed.
The XQuery can be as complex as required to determine the unstructured content files to compress;
for example, based on information available in the structured data, such as the file extension, or
information in the TOC, such as MIME type and format.
Depending on how the unstructured content files are generated, files in a given format can be
compressible or not; for example, optimized versus non-optimized PDF. Consequently, compression
rules must be defined in each ingestion context according to the characteristics of the expected
unstructured data files.
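One way to derive such a rule empirically (a sketch, not part of InfoArchive) is to gzip a sample of the expected files and keep compression only when it actually saves space:

```python
import gzip

def worth_compressing(data, min_saving=0.05):
    """Return True when gzip shrinks the payload by at least min_saving (5% here).
    Already-compressed formats (optimized PDF, MP3, JPEG) typically fail this test."""
    compressed = gzip.compress(data, compresslevel=6)
    return len(compressed) <= len(data) * (1.0 - min_saving)

text_like = b"call transcript " * 1000       # repetitive text: compresses well
already_packed = gzip.compress(text_like)    # gzip output: compresses poorly
```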
In the following ci.compressor parameter block example, the XQuery dictates that all unstructured
content files are compressed:
<data id="ci.compressor">
<select.query>
declare namespace ri = "urn:x-emc:eas:schema:ri";
let $uri := replace(document-uri(.), '\.pdi$', '.ri')
for $c in doc($uri)/ris/ri
return $c/@pdi_key
</select.query>
</data>
The following TOC example shows that the unstructured content file recording1.mp3 is
compressed using gzip.
<?xml version="1.0"?>
<ris xmlns="urn:x-emc:eas:Schema:ri">
<ri SchemaVersion="1.0" seqno="1" pdi_key="recording1.mp3"
xmlns:ri="urn:x-emc:eas:Schema:ri">
<step seqno="1">
<ci mime_type="audio/x-mpeg" dctm_format="mp3" size="41123">
<hash encoding="base64" algorithm="SHA-1" provided="false">
OWJjYjNkNDk2MjNlNzg3YjhkNTI2NWFiMTY3ZmQwODA3NjYzYjNiYQ==</hash>
</ci>
</step>
<step seqno="2">
<compress mime_type="application/x-gzip" dctm_format="gzip" size="40739"/>
</step>
</ri>
</ris>
• A <step> element is appended after the current <step> in the <ri> element of the content file.
• The seqno attribute value is the seqno of the previous step + 1.
• A compress element with the following attributes is created within step.
• mime_type: MIME type of the compression format. Set to application/x-gzip since the current
version applies this compression algorithm.
• dctm_format: Name of the repository format corresponding to the compression format. Set to
gzip since the current version applies this compression algorithm.
• size: Size of the content file in bytes after the compression.
Configuring an Ingestion Configuration (eas_cfg_pdi) Object
In DA, create an object of type eas_cfg_pdi with the ingestion parameter XML file as its content in the
holding folder (e.g., /System EAS/Archive Holdings/MyHolding) and configure its properties.
• eas_name (Name): Name assigned to these ingestion parameters. Generally, the URN of the
schema for which the parameters are applicable is assigned.
• eas_version (Version): Generally, no value is assigned to this property.
Configuring a Reception Node Configuration
(eas_cfg_receive_node) Object
At least one reception node must be configured for the reception process.
The reception node is configured through the reception node configuration (eas_cfg_receiver_node)
object in the repository. A default reception node /System EAS/Nodes/reception_node_01
was configured during InfoArchive installation and can be used out of the box. You can modify
the default reception node as needed.
To create and configure a new reception node, create an object of type eas_cfg_receiver_node and
set its properties.
• eas_name (Name): Name of the receiver node.
• eas_log_level (Log level): Verbosity level of the reception logging: 0 = TRACE, 1 = DEBUG,
2 = INFO, 3 = WARN, 4 = ERROR.
• eas_logs_store_enabled (Archive logs): When set, InfoArchive stores reception logs in the
repository for unknown holdings (for example, when a reception error occurs before the holding
has been identified), which is useful for troubleshooting reception errors with incorrectly
configured data.
• eas_logs_store (Log store): Storage area to use for storing reception logs when
eas_logs_store_enabled is set to true.
• eas_alias_set (Alias set): Name of the alias set to apply to the lifecycle after the creation of the
object (optional).
• eas_auxiliary_alias_set (Auxiliary alias set): (Optional) The alias set in the session that transitions
states in the AIP lifecycle during the reception phase. It contains aliases referencing the permission
sets to be applied to actions throughout the reception lifecycle. If not using custom lifecycle
states, leave the default value as is.
• eas_folder_path (Creation Folder): Path to the temporary repository folder in which to create
AIP objects, before the configured settings of the target holding are determined and applied.
• eas_fs_working_root (Working directory): Path to the working directory in the file system used
by the reception node.
• eas_delete_on_unknown (Delete on unknown flow): Enables deletion of the received file if no
eas_filename_pattern, eas_app_user_arg1_pattern, eas_app_user_arg2_pattern table line matches
the arguments. This attribute is associated with the Argument 1 mask at the same index.
• eas_filename_pattern (Filename pattern): Pattern (Java regular expression) that must match the
value of the filename passed to the receiver. This attribute is associated with the Argument 1
mask at the same index.
• eas_app_user_arg1_pattern (Argument 1 mask): Pattern (Java regular expression) that must
match the value of the “o” argument passed to the receiver.
• eas_app_user_arg2_pattern (Argument 2 mask): Pattern (Java regular expression) that must
match the value of the “t” argument passed to the receiver. This attribute is associated with the
Argument 1 mask at the same index.
• eas_sip_format (Format): Documentum format name associated with the received file (for
example, eas_sip_zip). This attribute is associated with the Argument 1 mask at the same index.
• eas_java_class (XML extraction Java class): Fully qualified name of the Java class invoked to
extract the eas_sip.xml file from the received SIP with the eas_sip_format at the same index. This
attribute is associated with the Argument 1 mask at the same index.
• eas_java_class_arg1 (Java class arguments): String value passed as an argument of the Java class.
This attribute is associated with the Argument 1 mask at the same index.
• eas_delete_on_error (Delete on error): Whether or not to enable the deletion of received data
from the working directory and xDB if a processing error occurs. This setting is useful for
protecting sensitive data. When data must be encrypted for the holding, the eas_delete_on_error
property of the eas_cfg_holding_crypto configuration object overrides this property defined at
the holding level. This property is associated with the Argument 1 mask at the same index.
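The pattern properties form parallel, repeating values matched by index, like rows of a table. The selection logic can be sketched in Python (re.fullmatch approximates Java's whole-string matches(); all patterns and values below are purely hypothetical):

```python
import re

# One tuple per index: (filename pattern, "o" argument pattern, "t" argument pattern, format)
pattern_table = [
    (r".*\.zip", r"PhoneCalls", r".*", "eas_sip_zip"),   # illustrative values
]

def resolve_format(filename, arg_o, arg_t):
    """Return the eas_sip_format of the first line whose three patterns all match,
    or None (in which case eas_delete_on_unknown governs what happens to the file)."""
    for fn_pat, o_pat, t_pat, sip_format in pattern_table:
        if (re.fullmatch(fn_pat, filename)
                and re.fullmatch(o_pat, arg_o)
                and re.fullmatch(t_pat, arg_t)):
            return sip_format
    return None
```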
Configuring an Ingestion Node Configuration
(eas_cfg_ingest_node) Object
At least one ingestion node must be configured for ingestion processing (required by both the
enumerator and ingestor as an argument).
You can configure multiple ingestion nodes to distribute the ingestion workload across different
holdings.
The ingestion node is configured through the ingestion node configuration (eas_cfg_ingest_node)
object in the repository. A default ingestion node /System EAS/Nodes/ingestion_node_01
was configured during InfoArchive installation and can be used out of the box. You can modify
the default ingestion node as needed.
To create and configure a new ingestion node, create an object of type eas_cfg_ingest_node and
set its properties.
• eas_name (Name): Name of the ingestion node.
• eas_log_level (Log level): Verbosity level of the ingestion logging: 0 = TRACE, 1 = DEBUG,
2 = INFO, 3 = WARN, 4 = ERROR.
• eas_fs_working_root (Working directory): File system path of the root working directory of
the ingestor.
• eas_fstore_access_enabled (Filestore access): Activates direct read access to the repository
filestores storing the received files. This requires that these file systems are accessible on the
host server of the ingestion node; it accelerates the ingestion and optimizes disk usage since
no local copy of the received file is created. If you run ingestions on the Content Server using
the installation owner user account, the ingestor can access the repository file stores directly at
the operating system level. This is quicker than accessing files using the DFC getfile method.
File access at the operating system level speeds up ingestion, especially for large SIP files.
• eas_fstore_nomap_enabled (Use same filesystem path): Indicates that all the filestores of the
repository are available in the file system with the same paths as on the Content Server.
• eas_map_fstore (Filestore): Name of the repository filestore associated with the
eas_map_local_mount property at the same index. Reserved for future use.
• eas_map_local_mount (Local path): Mount path of the local filestore named in eas_map_fstore
at the same index. Reserved for future use.
Configuring Ingestion for Unstructured Data
Ingestion of unstructured data is performed by additional processors referenced in the
eas_ingest_sip_zip-ci ingestion sequence configuration (eas_cfg_ingest) object.
Before ingesting unstructured data, make sure the following has been configured for the target
holding:
• The ingestion sequence must be set to eas_ingest_sip_zip-ci.
If the eas_ingest_sip_zip-pdi ingestion sequence is used for SIPs that contain unstructured data,
the ingestion process still proceeds without any errors. However, the unstructured data is not
archived.
• The content store must be specified for storing unstructured data.
If content hashing is required, you select the Content hash validation option. This option lets you
quickly activate or deactivate hash processing without having to alter the ingestion configuration.
Configuring an Encryption-enabled Holding
InfoArchive supports data encryption to ensure data security. Data is encrypted during ingestion
and decrypted when searched and accessed by authorized users.
InfoArchive can encrypt the following data:
• Received SIPs (sip)
• Structured data in PDI XML (pdi)
• Content files (ci)
• Order results (dip)
You can configure an encryption-enabled holding to ensure data security. InfoArchive has two
certified encryption providers:
• DemoCryptoProvider: This is not a real encryption provider. You can use it for demo and training
purposes.
• RSACryptoProvider: InfoArchive currently supports EMC RSA DPM (Data Protection Manager)
as a crypto provider out of the box. If you want to use RSA DPM with InfoArchive, a separate
RSA DPM license is required.
Note:
— InfoArchive 3.1 supports RSA DPM Java Key Client 3.5.2 and RSA DPM Server 3.5 or later.
— You should have basic knowledge of RSA DPM in order to use RSA with InfoArchive. Refer to
RSA documentation for more information about RSA DPM Key Client and Server.
You can also develop your own encryption provider. The encryption Java classes to be used with
InfoArchive must follow the ICryptoProvider interface. You can find the source code, the required
methods, and the documentation about ICryptoProvider in resources/java of the InfoArchive
installation package.
In this documentation, RSA Crypto Provider is used as an example to illustrate how to configure
encryption.
Preparing Resources for InfoArchive Installation
1. If the host name of the DPM server is not known by the DNS, map the DPM appliance host
IP to the host name.
On Windows, you map a host name by editing C:\Windows\System32\drivers\etc\hosts;
on Linux, by editing /etc/hosts.
2. Update the JVM Crypto Policy libraries on the InfoArchive host. You can download the policy
files from the following sites:
• Oracle JVM: http://www.oracle.com/technetwork/java/javase/downloads/jce-7-download-432124.html
• IBM JVM: https://www14.software.ibm.com/webapp/iwm/web/preLogin.do?source=jcesdk
Unzip the downloaded package and copy the files to <jdk_install_dir>/jre/lib/security
on your InfoArchive host.
3. Copy the RSA DPM Key Client Java libraries to core/template/home/external/lib in the
installation package.
4. Copy sample-config.properties provided by the RSA DPM Java Key Client to
core/template/home/external/resources in the installation package.
5. Change the filename of sample-config.properties.
The properties file should follow the naming convention
<eas_cfg_crypto_provider.eas_name>.properties. For RSA Crypto Provider, you must change
the filename to RSACryptoProvider.properties.
6. Modify the value of pki.client_keystore_file and pki.server_keystore_file.
Note:
• The value of pki.client_keystore_file is the absolute path to the file (.p12) you upload
to DPM appliance as the Identity Certificate. The value of pki.server_keystore_file
is the absolute path to the file (.pem) you upload to DPM appliance as the Trusted CA
Certificate. If the value is a Windows path, you must use two back slash characters as path
separators. For example, C:\\certificates\\cert_1.p12.
• pki.client_keystore_password must be set with the password to the
pki.client_keystore_file.
• If InfoArchive components share the same certificate, Receiver and Ingestor are launched in
parallel. Cache on disk and registration client functions are not needed. The properties file
should not contain the following properties that are added or overwritten by InfoArchive at
runtime:
— server.host
— server.port
— cache.file
— client.registration_file
— client.app_name
• If InfoArchive components are distributed across several hosts, EMC recommends that you
use several custom .properties files, and set the paths to the properties files in the CLASSPATH
environment variable. In this scenario, no properties are overwritten by InfoArchive at
runtime.
After the preparation is finished, you can start InfoArchive installation following the instructions
described in the EMC InfoArchive Installation Guide.
Configuring RSA DPM Key Manager
To encrypt data archived into InfoArchive using RSA DPM, you must create at least one key class
(you can create more if needed).
Follow these rules when creating the key class:
• The identity group linked to the key class must contain an identity whose identity certificate is
set in RSACryptoProvider.properties.
• Set Key duration to Infinite.
• Set Key behavior to New key each time or Use current key depending on your needs.
Creating Configuration Objects for Encryption Settings
InfoArchive saves encryption settings in a separate set of objects. In this way, a dedicated ACL can be
configured to ensure data security. To configure an encryption-enabled holding, you must create the
following objects to utilize the Java classes contained in the libraries you installed earlier.
• eas_cfg_crypto_provider
• eas_cfg_aic_crypto
• eas_cfg_holding_crypto
• eas_cfg_pdi_crypto
• eas_query_config
In DA, you can create a new object by selecting File > New > Document. Then select a specific type
in the Type drop-down list.
eas_cfg_crypto_provider
You must create an eas_cfg_crypto_provider object for the encryption libraries you installed.
For RSA Crypto Provider, use the specific class eas_cfg_rsa_key_manager, which is a child of the
eas_cfg_crypto_provider class.
To create an eas_cfg_crypto_provider object:
1. In DA, navigate to /System EAS/System.
2. Select File > New > Document in the menu.
3. Fill in the Name, Type, and Format controls in the Create tab. For RSACryptoProvider, choose
eas_cfg_rsa_key_manager in the Type drop-down list.
4. Click Next.
5. Fill in the textboxes in the Info tab.
• In the Encryption service section, you must provide the exact same name as that
of the .properties file in the Name field. The Java Class field should have the
name of the RSA implementation class. For RSACryptoProvider, the Java Class is
com.emc.documentum.eas.crypt.rsa.rkm.RSADpmCryptoProvider.
• In the RSA RKM section, the Site value should match the site value set in the global
configuration (eas_cfg_config) object. The default value is A.
6. Click Next.
7. In the Permission tab, ensure dm_world has the Read permission.
The properties of the created object are as follows:
• object_name (Name, Info tab): RSACryptoProvider; must be identical to the file name (without
file name extension) of the .properties file in the external/resources folder.
• eas_java_class (Java Class): Fully qualified name of the Java class implementing
the InfoArchive encryption interface. For RSACryptoProvider, it is
com.emc.documentum.eas.crypt.rsa.rkm.RSADpmCryptoProvider.
• eas_node_site (Site): Must match the eas_site property of the eas_cfg_config object.
• eas_node_proximity (Proximity): Proximity from the site of the RSA DPM server; should be an
integer bigger than 0. At runtime, the CryptoProvider first tries to connect to the RSA DPM
server with the lowest proximity value.
• eas_node_host (Host): The name of the DPM appliance host.
• eas_node_port (Port): The DPM appliance port.
eas_cfg_aic_crypto
This is the parent type for eas_cfg_holding_crypto. You can use the child type,
eas_cfg_holding_crypto, as the AIC for queries (search and order). Moreover, you can create a specific
eas_cfg_aic_crypto to accommodate custom requirements.
You can configure eas_cfg_aic_crypto properties on the Collection tab of the
eas_cfg_holding_crypto object. Specify the following properties:
• eas_name (Name): The name of the configuration.
• eas_cfg_crypto_provider (Cfg crypto provider): The crypto provider to use for the holding.
• eas_pdi_crypto_query_role (Encrypted structured data query role): The role necessary to use
encrypted criteria on a query. If blank, no specific role is needed.
• eas_ci_crypto_access_role (Encrypted content accessor role): The role to access the decrypted
content. If left blank, no specific role is needed.
• eas_pdi_crypto_access_role (Encrypted structured data accessor role): The role necessary to
access decrypted data in a query result. If blank, no specific role is needed.
• eas_dip_crypto_enabled (Encrypt order result): Flag indicating whether order results are
encrypted.
• eas_dip_crypto_always (Always encrypt order result): Flag indicating whether to encrypt order
results even if the result contains no encrypted data.
• eas_dip_crypto_key_class (Order result key class): Key class to encrypt order results.
• eas_dip_crypto_parameters (Order result crypto parameters): Specific encryption parameters
for order results. The RSA Encryption Header section provides an example of using
cryptographic parameters.
eas_cfg_holding_crypto
The eas_cfg_holding_crypto object holds properties related to encryption of a holding. You
need to specify the following properties.
• eas_name (Name): The name of the encryption holding object; for example, crypto.PhoneCalls.
• eas_cfg_crypto_provider (Cfg crypto provider): The name of the encryption provider; for
example, RSACryptoProvider.
• eas_pdi_schema (Schema): Name of an XML schema in which this holding can ingest structured
data.
• eas_cfg_pdi_crypto (Cfg metadata crypto): eas_cfg_pdi_crypto and eas_pdi_schema are coupled
values. When a SIP whose schema is the value of eas_pdi_schema is received, the corresponding
eas_cfg_pdi_crypto is used to override the PDI ingestion configuration.
• eas_delete_on_error (Delete on error): Flag indicating whether data files should be deleted if
an error is encountered during the ingestion.
• eas_sip_crypto_enabled (Encrypt reception): Set this flag to FALSE to disable the encryption
of received files.
• eas_sip_crypto_key_class (Reception key class): Name of the key class for encrypting received
files.
• eas_sip_crypto_parameters (Reception crypto parameters): Parameters to pass to the class
implementing the encryption of the SIP data; for example, a parameter that specifies whether to
include the RSA header in the encrypted value in the RSA RKM implementation. The RSA
Encryption Header section provides an example of using cryptographic parameters.
• eas_pdi_crypto_enabled (Encrypt structured data): Set this flag to FALSE to disable the
encryption of the PDI data.
• eas_pdi_crypto_key_class (Metadata key class): Name of the key class for the encryption of
the PDI data.
• eas_pdi_crypto_parameters (Metadata crypto parameters): Parameters to pass to the class
implementing the encryption of the PDI data; for example, a parameter that specifies whether to
include the RSA header in the encrypted value in the RSA RKM implementation. The RSA
Encryption Header section provides an example of using cryptographic parameters.
• eas_ci_crypto_enabled (Encrypt reception): Set this flag to false to disable the encryption of
unstructured contents.
• eas_ci_crypto_key_class (Content key class): Name of the key class known by the crypto
provider for encrypting unstructured content files.
• eas_ci_crypto_parameters (Content crypto parameters): Parameters to pass to the class
implementing the encryption of CI; for example, a parameter that specifies whether to include
the RSA header in the encrypted value in the RSA RKM implementation. The RSA Encryption
Header section provides an example of using cryptographic parameters.
eas_cfg_pdi_crypto
Similar to the eas_cfg_pdi type, eas_cfg_pdi_crypto configures parameters for encrypted
ingestion with an imported XML file. If encryption is enabled, both the eas_cfg_pdi and
eas_cfg_pdi_crypto configurations are applied to ingestion. If there is any overlap, the
configurations held by the eas_cfg_pdi_crypto object override those of the eas_cfg_pdi object.
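The override behaves like a keyed merge in which same-id blocks from the crypto configuration win. A Python sketch with illustrative block ids and placeholder contents:

```python
# Parameter blocks keyed by their id attribute (contents are placeholders)
pdi_blocks = {
    "pdi.xdb.importer": "plain importer settings",
    "pdi.index.creator": "index paths",
}
pdi_crypto_blocks = {
    "pdi.xdb.importer": "encrypted importer settings",  # overlaps: crypto wins
    "pdi.xdb.encryption": "elements to encrypt",
}

# Blocks from eas_cfg_pdi_crypto override same-id blocks from eas_cfg_pdi
effective = {**pdi_blocks, **pdi_crypto_blocks}
```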
The imported XML file has the following XML structure:
<datas>
<data id="pdi.xdb.encryption">...</data>
<data id="pdi.xdb.importer">...</data>
<data id="set.schema">...</data>
<data id="ci.encryption">...</data>
<data id="pdi.index.creator">...</data>
</datas>
pdi.xdb.encryption
The pdi.xdb.encryption block is used to determine the elements in the structured data to encrypt
during ingestion. Encryption of structured data occurs before it is imported into xDB, which prevents
unauthorized access at the xDB level.
The following XML block configures the CustomerLastName and RepresentativeID elements in
eas_pdi.xml to be encrypted during the ingestion process.
<data id="pdi.xdb.encryption" >
<prefix>ns1</prefix>
<namespace>urn:x-emc:eas:schema:pdi</namespace>
<paths>
<path>/{urn:eas-samples:en:xsd:phonecalls.1.0}Calls
/{urn:eas-samples:en:xsd:phonecalls.1.0}Call
/{urn:eas-samples:en:xsd:phonecalls.1.0}CustomerLastName
</path>
<path>/{urn:eas-samples:en:xsd:phonecalls.1.0}Calls
/{urn:eas-samples:en:xsd:phonecalls.1.0}Call
/{urn:eas-samples:en:xsd:phonecalls.1.0}RepresentativeID
</path>
</paths>
</data>
This is an example of PDI structured data before encryption.
<RepresentativeID>000502</RepresentativeID>
After encryption, a hash value and an encrypted element value are generated:
<RepresentativeID ns1:hash="wPTEW63YwTeh1dKn8kd+UJKSIDPsC/B9pRA9rwD39ag=">
MDAwNTAycGRpa2V5X0FfNw==</RepresentativeID>
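The shape of this result can be sketched in Python. This is illustrative only: the actual encryption and the exact hash construction are delegated to the configured crypto provider, and both the keyed SHA-256 hash and the reversed-text "cipher" below are stand-ins, not the product's scheme:

```python
import base64
import hashlib

def encrypt_element(text, key):
    """Return (hash attribute value, encrypted element value) for one element.
    A keyed SHA-256 stands in for the provider's searchable hash; the 'ciphertext'
    is a placeholder so the example stays self-contained (NOT real encryption)."""
    hash_attr = base64.b64encode(hashlib.sha256(key + text.encode()).digest()).decode("ascii")
    cipher_value = base64.b64encode(text.encode()[::-1]).decode("ascii")  # placeholder
    return hash_attr, cipher_value

h, v = encrypt_element("000502", b"demo-key")
element = '<RepresentativeID ns1:hash="%s">%s</RepresentativeID>' % (h, v)
```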
Note: Structured data encryption has the following limitations:
1. The text value and attribute value of an XML element cannot both be encrypted.
2. In an XML element, only one attribute can be encrypted.
3. The namespace must be set for every single XML element.
4. The namespace is optional for attributes. If a namespace is declared on an attribute, you must
also declare the namespace in the PDI XML.
pdi.index.creator
If an encrypted element in the PDI XML file is to be indexed, the index should target the hashed
value. For example, RepresentativeID is encrypted as follows:
<RepresentativeID ns1:hash="wPTEW63YwTeh1dKn8kd+UJKSIDPsC/B9pRA9rwD39ag=">
MDAwNTAycGRpa2V5X0FfNw==</RepresentativeID>
In the XML content of the eas_cfg_pdi object, the pdi.index.creator section specifies a path
to RepresentativeID as follows:
<path>/{urn:eas-samples:en:xsd:phonecalls.1.0}Calls/{urn:eas
-samples:en:xsd:phonecalls.1.0}Call[{urn:eas-samples:en:xsd:phonecalls.1
.0}RepresentativeID<LONG>]</path>
In the XML content of the eas_cfg_pdi_crypto object, the pdi.index.creator section should
instead target the hash attribute rather than the RepresentativeID text value, specifying the
following path:
<path>/{urn:eas-samples:en:xsd:phonecalls.1.0}Calls/{urn:eas
-samples:en:xsd:phonecalls.1.0}Call/{urn:eas-samples:en:xsd:phonecalls.1
.0}RepresentativeID[@{urn:x-emc:eas:schema:pdi}hash<STRING>]</path>
Note: The pdi.index.creator defined by the eas_cfg_pdi_crypto object overrides the one defined by
the eas_cfg_pdi object. Therefore, you must declare all indexes, including those for elements that
are not encrypted.
pdi.xdb.importer
After encryption is enabled, a PDI XML containing the encrypted structured data is created during
the ingestion process and then imported into xDB.
The following XML block must be configured in order to import the encrypted PDI XML into xDB.
<data id="pdi.xdb.importer">
<pdi.env.name>pdi.file.xml.metadata.encrypted</pdi.env.name>
</data>
set.schema
The encrypted PDI XML may have a different structure than the regular one. The schema for the
regular PDI XML may not be applicable to the encrypted one. This block defines a schema for the
encrypted PDI XML.
The following XML block defines a schema for the encrypted PDI XML:
<data id="set.schema">
<name>urn:eas-samples:en:xsd:phonecalls:crypto.1.0</name>
</data>
ci.encryption
Unstructured content files can contain sensitive data as well. You can also encrypt content files during
the ingestion process. Content encryption usually happens after compression.
The following XML block defines the process of selecting content files from a table of contents
(eas_ri.xml) file.
<data id="ci.encryption">
<select.query>
declare namespace ri = "urn:x-emc:eas:schema:ri";
let $toc as document-node(element(ri:ris)):=
doc(concat('eas_aip_id_', xhive:metadata(., 'eas_aip_id'), '.ri'))
return
$toc/ri:ris/ri:ri/@pdi_key[contains(., 'mp3')]/string(.)
</select.query>
</data>
The following code shows an XML block in the table of contents after encryption is enabled.
<ris xmlns="urn:x-emc:eas:schema:ri">
<ri seqno="1" pdi_key="recording1.mp3" schemaVersion="1.0" audit="recording1.mp3" xmlns="urn:x-emc:eas:schema:ri">
<step>
<ci mime_type="audio/x-mpeg" dctm_format="mp3" size="41123"/>
</step>
<step>
<crypto size="41136"/>
</step>
<step>
<container position="0" size="41136"/>
</step>
</ri>
</ris>
eas_query_config
On the query configuration object side, no new attributes need to be added to the existing
eas_cfg_query object. You need to modify the imported XML of the eas_cfg_query object to include
information about how the encryption schema must be handled.
For example, the XML content of the query configuration object for the sample PhoneCall holding is
as follows:
<?xml version="1.0" encoding="UTF-8"?>
<request-configs xmlns:n="urn:eas-samples:en:xsd:phonecalls.1.0"
xmlns:eas="urn:emc:eas">
<query-prolog>
declare namespace f = "urn:x-emc:eas:functions";
declare namespace pdi = "urn:x-emc:eas:schema:pdi";
declare function eas:decrypt-tree($node as node(),
$aipid as xs:string) as node()?
{
typeswitch ( $node )
case element(n:CustomerLastName)
return element { node-name($node) } { f:decrypt($node/text(), $aipid) }
case element(n:RepresentativeID)
return element { node-name($node) } { f:decrypt($node/text(), $aipid) }
case element()
return element { node-name($node) } { $node/@*,
$node/node()/eas:decrypt-tree(., $aipid) }
default
return $node
};
</query-prolog>
<request-config schema="urn:eas-samples:en:xsd:phonecalls.1.0">
<entity>
<path>/n:Calls/n:Call</path>
<order-by>n:CallStartDate</order-by>
</entity>
<param name="CustomerID" type="decimal" index="true">
<path>n:CustomerID</path>
</param>
<param name="CallStartDate" type="date-time" index="true">
<path>n:CallStartDate</path>
</param>
<param name="CustomerLastName" type="string" index="true">
<path>n:CustomerLastName</path>
</param>
<param name="RepresentativeID" type="decimal" index="true">
<path>n:RepresentativeID</path>
</param>
<param name="CustomerFirstName" type="string" index="true">
<path>n:CustomerFirstName</path>
</param>
</request-config>
<request-config schema="urn:eas-samples:en:xsd:phonecalls:crypto.1.0">
<entity>
<path>/n:Calls/n:Call</path>
<order-by>n:CallStartDate</order-by>
</entity>
<param name="CustomerID" type="decimal" index="true">
<path>n:CustomerID</path>
</param>
<param name="CallStartDate" type="date-time" index="true">
<path>n:CallStartDate</path>
</param>
<param name="CustomerLastName" type="hashed" index="true">
<path>n:CustomerLastName/@pdi:hash</path>
</param>
<param name="RepresentativeID" type="hashed" index="true">
<path>n:RepresentativeID/@pdi:hash</path>
</param>
<param name="CustomerFirstName" type="string" index="true">
<path>n:CustomerFirstName</path>
</param>
<query-template xmlns:xhive="http://www.x-hive.com/2001/08/xquery-functions">
for $entity in
<select/>
let $root := root($entity)
let $aipid := xhive:metadata($root, 'eas_aip_id')
return eas:decrypt-tree($entity, $aipid)
</query-template>
</request-config>
</request-configs>
Adding Properties for the Holding Object
You must modify the regular holding configuration object (eas_cfg_holding) to accept the encrypted
received file format by adding the following attributes:
Property: eas_sip_format (DA label: Received file format)
  Value: eas_sip_zip_crypto
Property: eas_cfg_ingest (DA label: Cfg ingest process)
  Value: eas_ingest_sip_zip-ci or eas_ingest_sip_zip-pdi
Auto Populating Properties for Runtime Objects
New attributes are automatically populated for InfoArchive runtime objects. If objects are created
after the encryption option is enabled for a holding, you can view the auto-populated attributes
in their properties dialogs.
New properties per object:

eas_aip:
• eas_pdi_crypto_key_id
• eas_pdi_crypto_iv
• eas_pdi_crypto_hash_salt
• eas_pdi_crypto_propbag
• eas_ci_crypto_key_id
• eas_ci_crypto_iv
• eas_ci_crypto_propbag
• eas_sip_crypto_key_id
• eas_sip_crypto_iv
• eas_sip_crypto_propbag
eas_order:
• eas_cfg_aic_crypto
• eas_cfg_aic_crypto_id
• eas_cfg_crypto_provider
• eas_cfg_crypto_provider_id
• eas_dip_crypto
• eas_dip_crypto_key_id
• eas_dip_crypto_iv
• eas_dip_crypto_propbag
• eas_crypto_encoding
• eas_pdi_decrypt_cnt
• eas_ci_decrypt_cnt
eas_aip_parent:
• eas_cfg_crypto_provider
• eas_cfg_pdi_crypto
• eas_cfg_pdi_crypto_version
• eas_crypto_encoding
• eas_pdi_crypto_hash_algo
eas_open_aip_parent:
• eas_aggr_crypto_mode
• eas_aggr_crypto_encoding
• eas_aggr_pdi_crypto_halgo
• eas_aggr_pdi_crypto_hsalt
• eas_aggr_pdi_crypto_key_id
• eas_aggr_ci_crypto_key_id
• eas_aggr_sip_crypto_key_id
• eas_aggr_pdi_crypto_iv
• eas_aggr_ci_crypto_iv
• eas_aggr_sip_crypto_iv
• eas_aggr_pdi_crypto_propbag
• eas_aggr_ci_crypto_propbag
• eas_aggr_sip_crypto_propbag
RSA Encryption Header
The RSA encryption header is an RSA encryption parameter. When you use a header, RSA stores
decryption metadata (the key ID and IV) at the encrypted value level, so the encrypted value itself
contains all the information required for decryption. There is no need to access AIP objects to
retrieve the key ID and IV values.
You set the following parameter properties according to your encryption requirements:
• eas_sip_crypto_parameters
• eas_pdi_crypto_parameters
• eas_ci_crypto_parameters
• eas_dip_crypto_parameters
Valid parameter values are as follows:
• HEADER=Default: Use the most recent header version when protecting; automatically determine
  the header type when processing.
• HEADER=Version 1.5: Use a Key Manager Java Client Version 1.5 compatible header.
• HEADER=Version 1.5 Base64: Use a Key Manager Java Client Version 1.5 compatible header
  with Base64 encoding.
• HEADER=Version 2.1: Use a Key Manager Java Client Version 2.1 header.
• HEADER=Version 2.7: Use a Key Manager Java Client Version 2.7 header.
• HEADER=None: No header is used.
For example, you can set the eas_pdi_crypto_parameters property value to HEADER=Default for the
PhoneCalls holding. In the PDI XML file, the Representative ID is encrypted as follows:
<RepresentativeID ns1:hash="ZXio5+YN3cY1/DVee3qxuR3qQAo7hlXflX+wvA5jusk=">
UktNQzI3MAD/////AAAABW11aWQAAAAAIFJlFFDkcl+lzIeFd4txg4eLP3RpZupcPRvY+Z4REoUR
/////wAAAANpdgAAAAAQeAuiLT5AE9A4QeIZOVw3HP////8AAAAFY3N1bQAAAAAgSd8DOFGOoxBy
ao6Q5yjCxTihQ8JM1LdWukRsGh5+fmGritDkwyKAZxV4o02hfoRO</RepresentativeID>
If you retrieve the Representative ID from a web application, the application does not need to
access the corresponding AIP object. Instead, the application retrieves the encrypted value, decrypts
it based on the header information embedded in the value itself, and returns the decrypted value.
Operator Restraints for Querying Encrypted Data
When you build a query on encrypted data, the following operator rules apply to single-value and
multi-value queries:

• Equal: allowed for single-value queries; not allowed for multi-value queries
• NotEqual: allowed for single-value queries; not allowed for multi-value queries
• All other operators: not allowed for either query type
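These rules follow from how encrypted values are indexed: a type="hashed" criterion is compared against the stored pdi:hash attribute, and a salted hash supports only equality tests, not ordering. The following is an illustrative sketch based on the PhoneCalls crypto schema described earlier; $hashedValue stands for the hash InfoArchive computes from the supplied search value, and the actual generated query may differ:

```xquery
declare namespace n = "urn:eas-samples:en:xsd:phonecalls.1.0";
declare namespace pdi = "urn:x-emc:eas:schema:pdi";
(: Equal/NotEqual can be evaluated hash-to-hash against the stored attribute: :)
for $call in /n:Calls/n:Call[n:RepresentativeID/@pdi:hash = $hashedValue]
return $call
(: Range operators (greater than, less than) cannot be evaluated this way,
   because a salted hash does not preserve the ordering of the original values. :)
```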
Configuring Synchronous Ingestion
Synchronous ingestion configuration involves defining the custom AIP assignment policy and
configuring the AIP parenting policy. You must also enable synchronous ingestion both at the access
node and the holding level.
The AIP parenting policy is configured through an eas_cfg_aip_parent_policy configuration object
and referenced by the eas_cfg_holding configuration object of the holding to which the policy is
applied. One AIP parenting policy can be applied to multiple holdings.
The AIP assignment policy is defined using XQuery in a text file and imported into
eas_cfg_aip_parent_policy as the content of the object. You activate synchronous ingestion by
configuring the eas_cfg_access_node and eas_cfg_holding objects.
Defining the Custom AIP Assignment Policy
You define the AIP assignment policy using XQuery in a text file to be imported into
eas_cfg_aip_parent_policy as the content of the object.
You can use any information present in the SIP descriptor, including custom metadata, in the XQuery
to define the assignment logic. For example, you can use the retention date (which is most commonly
used) or the holding as the condition for assigning AIPs.
Here is an example of the assignment policy XQuery expression.
xquery version "1.0" encoding "utf-8";
declare namespace n = "urn:x-emc:eas:schema:sip:1.0";
let $xdbMode := /n:sip/n:dss/n:entity/text()
let $aipMode := /n:sip/n:dss/n:application/text()
let $d as xs:dateTime:= xs:dateTime(/n:sip/n:dss/n:base_retention_date/text())
let $quarter as xs:integer := xs:integer((ceiling(month-from-dateTime($d) div 3)))
let $tz as xs:dayTimeDuration := timezone-from-dateTime($d)
let $start as xs:dateTime := adjust-dateTime-to-timezone(xs:dateTime
(concat(year-from-dateTime($d),"-01-01T00:00:00")),$tz)
let $nextQuarter as xs:dateTime := $start + xs:yearMonthDuration
(concat("P",string(($quarter*3)),"M"))
return
<policy partitioning_key="{year-from-dateTime($d)}-Q{$quarter}"
close_hint_date="{$nextQuarter}" aip_mode="{$aipMode}" xdb_mode="{$xdbMode}"/>
In this example, the quarter-based assignment logic is used and the base retention date is used to
compute the following values:
• partitioning_key="{year-from-dateTime($d)}-Q{$quarter}"
A string type value using the YYYY-Qn pattern for the year and quarter number of the base
retention date
• close_hint_date="{$nextQuarter}"
A datetime type value corresponding to the first day of the next quarter
Those computed values are returned as the partitioning_key and close_hint_date attributes of a
policy element expected by InfoArchive. The close_hint_date attribute is mandatory only if
close mode 1 is applied.
The AIP and xDB modes to apply can also be dynamically determined from the SIP descriptor
by returning the optional aip_mode and xdb_mode attributes of the policy element. However, the
returned combination must be configured in the eas_cfg_aip_parent_policy configuration object.
Here is an example of the policy element returned by the XQuery expression:
<policy partitioning_key="2012-Q2" close_hint_date="2014-01-01T12:00:00.000+01:00"/>
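For comparison, a minimal holding-based policy might look as follows. This is a sketch only: it assumes the SIP descriptor's dss section exposes a holding element, and it omits close_hint_date, which is acceptable only when close mode 1 is not applied:

```xquery
xquery version "1.0" encoding "utf-8";
declare namespace n = "urn:x-emc:eas:schema:sip:1.0";
(: Partition AIPs by the holding named in the SIP descriptor :)
let $holding := /n:sip/n:dss/n:holding/text()
return <policy partitioning_key="{$holding}"/>
```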
Configuring the AIP Parenting Policy
1. In DA, create an object of type eas_cfg_aip_parent_policy by importing the AIP assignment
XQuery text file as its content.
2. Configure the eas_cfg_aip_parent_policy object properties.
Property: eas_name (DA label: Name)
  Name of the AIP parenting policy.
Property: eas_default_aip_mode (DA label: Default AIP mode)
  The default AIP mode to apply for creating the eas_aip repository object:
  • 1 = Creation of a materialized eas_aip object (as in batch ingestion)
  • 2 = Creation of an eas_aip lightweight object attached to a shared eas_aip_parent parent object
  • 3 = Same as mode 2, but when the parent object is closed, its lightweight objects are aggregated
    into a single materialized eas_aip object, and the lightweight objects and the parent object
    are pruned
  The returned result of the AIP assignment policy XQuery overrides the default AIP mode.
Property: eas_default_xdb_mode (DA label: Default xDB mode)
  The default xDB mode to apply when ingesting structured data into xDB:
  • 0 = The mode configured at the archive holding level is applied (see the
    eas_cfg_holding.eas_xdb_mode property)
  • 1 = Metadata files of all ingested AIPs are stored directly in a designated xDB detachable library
  • 2 = For each ingested AIP, a new sub-library is created in a designated xDB detachable library
    and metadata files are stored in the sub-library
  • 3 = As in xDB mode 2, AIP metadata files are stored in the sub-library created exclusively for
    each AIP. However, the AIP sub-libraries are organized, according to a defined assignment
    policy, into a pool of libraries (called pooled libraries, which are purely an InfoArchive
    concept and not a library type in xDB) created in a designated xDB detachable library. The
    pooled libraries logically pertain to a configured library pool.
  The returned result of the AIP assignment policy XQuery overrides the default xDB mode.
  Note: The parameters associated with the mode are read from the archive holding configuration.
  The parameters required by a selectable mode must be set at the archive holding level.
Property: eas_aip_aggr_staging_store (DA label: Staging store of the AIP creation mode 3)
  Only needed if AIP creation mode 3 is applied:
  • Contents associated with the lightweight AIPs are all imported into this transient storage area
    (the storage areas designated in the archive holding configuration are ignored).
  • When the AIP parent is closed, the contents associated with the lightweight AIPs are retrieved
    and aggregated, and those aggregates are imported as contents of the materialized AIP into the
    storage areas designated in the archive holding configuration. The lightweight AIPs are then
    marked for later pruning, in order to reclaim their associated RDBMS and storage area space.
The following repeating attributes form a matrix of all possible combinations of AIP parent quota
settings, AIP parent close settings, and confirmation event options for the different AIP mode and
xDB mode pairs.
AIP parent quotas determine when the ingestion service automatically creates a new AIP parent
shareable object for newly ingested packages. The AIP parent close mode (eas_param_close_mode)
sets the conditions for closing the AIP parent.
Property: eas_param_aip_quota (DA label: AIP quota per parent)
  Maximum number of AIPs per AIP parent. The value zero indicates an unlimited number.
Property: eas_param_aiu_quota (DA label: AIU quota per parent)
  Maximum number of AIUs per AIP parent. The value zero indicates an unlimited number.
Property: eas_param_ci_quota (DA label: CI count quota per parent)
  Maximum number of contents per AIP parent (sum of the number of contents of the associated
  lightweight AIPs). The value zero indicates an unlimited number.
Property: eas_param_ci_size_quota (DA label: CI size quota per parent)
  Maximum content size per AIP parent (sum of the content size of the associated lightweight
  AIPs). The value zero indicates an unlimited size.
Property: eas_param_close_mode (DA label: AIP parent close mode)
  Sets the conditions for closing the AIP parent:
  • 0 = Never close the parent unless a close request is manually issued
  • 1 = Automatically close the parent when: current date >= close hint date + eas_close_period.
    The close hint date must be returned by the XQuery statement; if the date is not returned,
    an ingestion error occurs.
  • 2 = Automatically close the parent when: current date >= parent opening date +
    eas_close_period
  • 3 = Automatically close the parent when: current date >= last ingestion date of the parent +
    eas_close_period
  For example, with close mode 2 and eas_close_period = 90, a parent opened on 2014-01-01 is
  closed on or after 2014-04-01.
  The eas_close job is responsible for closing AIP parent shareable objects when the conditions
  are met. Regardless of the AIP parent close mode you choose, the eas_close job always closes an
  open AIP parent when a manual close request is received.
  The close request, parent opening date, close hint date, and last ingestion date information
  are stored as properties of the eas_open_aip_parent_rel relation associated with the parent.
Property: eas_param_close_period (DA label: Close period (d))
  Period, expressed in days, used according to the close mode.
Property: eas_param_conf_rec_disabled (DA label: Receipt conf. disabled)
  When set, the receipt event of the ingested AIP is not processed by the confirmation job (this
  exclusion is done by assigning the ingestion date to the eas_conf_receive_date attribute of the
  AIP). When no receipt confirmation needs to be generated, setting this flag dramatically
  decreases the execution time of the confirmation job when a large number of synchronously
  ingested AIPs is expected.
Property: eas_param_conf_ing_disabled (DA label: Archival conf. disabled)
  Whether the receipt event type will be processed by the confirmation (eas_confirmation) job.
  When this option is set to true, the receipt event timestamp is still assigned to the
  eas_conf_receive_date property, but no confirmation messages are generated for storage events.
  Setting this option to true significantly reduces the execution time of the confirmation job,
  especially with large numbers of synchronously ingested AIPs.
3. Configure the holding to which you want to apply the AIP parenting policy.
Edit the eas_cfg_holding object properties and, under the Holding tab, specify the AIP parenting
policy name in the Cfg AIP parenting policy for sync ingest field.
Enabling Synchronous Ingestion
To enable synchronous ingestion, you need to activate the synchronous ingestion service at both
the access node and the holding level.
1. Activate the synchronous ingestion service at the access node level.
In DA, locate the access node (e.g., /System/EAS/Nodes/access_node_01), which is the
application server instance hosting the InfoArchive web services. Edit its properties and, under the
Ingestion tab, select the SIP ingestion enabled option.
Note: You need to restart the web services for the change to take effect.
2. Activate the synchronous ingestion service at the holding level.
Configure the holding properties and, under the Holding tab, select the Synchronous ingestion
enabled option at the bottom. This change takes effect immediately.
Configuring Query
There are two types of queries: synchronous search, which returns query results in real time, and
order (asynchronous search), which creates search orders and returns search results with a delay.
Both types of queries are configured through a query configuration object (eas_cfg_query) and a
query quota object (eas_cfg_query_quota). Orders require the configuration of an additional order
configuration object (eas_cfg_order) and an order node configuration object (eas_cfg_order_node).
Note: An order node does not correspond to a single holding. It works globally for all the holdings
configured in the system.
• Query configuration object (eas_cfg_query)
The query configuration object (eas_cfg_query) defines:
— Which Archival Information Collections (AICs, each corresponding to a holding) to search
(the query can be configured to search one or more specified holdings)
— What search criteria can be used (through the XML content of the query configuration object)
— What data in the XML PDI can be returned
Optionally, the query configuration object (eas_cfg_query) also lets you dynamically adjust the
returned XML result.
• Query quota configuration object (eas_cfg_query_quota)
The query quota configuration object (eas_cfg_query_quota) defines:
— The maximum search range—the number of AIPs and/or AIUs—that can be searched. Data
outside the defined scope will not be searched.
— The maximum number of results that can be returned. Search execution will stop after
this number is reached and the search results will be returned.
Only one query quota is allowed for each query configuration, but several query configurations
can be defined for an archive holding.
You can apply different ACLs on query quota configuration (eas_cfg_query_quota) objects so that
different users and groups have different query quotas applied to their searches.
A query quota configuration object (eas_cfg_query_quota) can be configured for search and order,
as well as delivery channel, which defines where the query results will be returned.
• Order configuration object (eas_cfg_order)
The order configuration object is applicable to delivery channels and it applies ACLs so that
orders are accessible to users with appropriate privileges.
• Order node configuration object (eas_cfg_order_node)
The order node configuration object configures how the order processor works and is used when
an order processor is started.
Configuring a Query Quota Configuration
(eas_cfg_query_quota) Object
In DA, create an object of type eas_cfg_query_quota in the holding folder (/System/EAS/Archive
Holdings/MyHolding) and configure its properties.
Property: eas_name (DA label: Name)
  Technical name of the query quota configuration.
Property: eas_query_applicable (DA label: Valid for search (synchronous))
  Whether the quotas are applicable to synchronous searches.
Property: eas_order_applicable (DA label: Valid for orders (asynchronous))
  Whether the quotas are applicable to orders.
Property: eas_max_duration (DA label: Max. duration)
  Not implemented; reserved for future use.
Property: eas_aip_quota (DA label: AIP quota)
  The maximum number of AIPs allowed for a search range. This value is generally much greater
  for an order than for a synchronous search.
Property: eas_result_cnt_quota (DA label: # results quota)
  Maximum number of results that can be returned by the query. This value is generally much
  greater for an order than for a synchronous search.
Property: eas_result_size_quota (DA label: Result size quota)
  Not implemented; reserved for future use.
Property: eas_superuser_applied (DA label: Applied to superusers)
  Because superuser Documentum accounts have Read access to all configuration objects, this
  flag indicates whether the configuration should be applied to superusers. However, it is
  recommended not to apply it to a Documentum account having superuser privileges, since such
  accounts are used by the web services (refer to the eas_cfg_access_node type).
Property: eas_delivery_channel (DA label: Delivery channels)
  One or more delivery channels for which the quota configuration is applicable.
Defining the Search Criteria and Search Result
You define the search criteria by creating them in an XML file and importing it into the repository
as the content of the query configuration object (eas_cfg_query).
The outline of the query configuration XML is as follows. Elements in square brackets [] are optional.
<?xml version="1.0" encoding="UTF-8"?>
<request-configs>
[<query-prolog> ... </query-prolog>]
<request-config schema="URN of the SIP Metadata XML">
<entity> ... </entity>
<param> ... </param>
...
[<query-template>]
...
</request-config>
...
</request-configs>
Element: query-prolog (optional)
  Defines custom XQuery functions that contain custom query processing logic. The text of the
  element is dynamically inserted as a prolog into the XQuery expression defined in the query
  configuration, and the XQuery functions defined within the prolog can be referenced in the
  query-template element.
Element: request-config (required)
  Defines the search criteria.
Element: query-template (optional)
  Defines the query results.
Configuring Query for Unstructured Data
Unstructured data is stored in the eas_ci_container rendition of an AIP (eas_aip) object. In order to
retrieve an unstructured data file from an eas_ci_container rendition, you must first get the AIP
ID—eas_aip_id of the AIP (eas_aip) object—and the sequence number of the content file within the
AIP that was assigned during ingestion.
The unstructured content ID can be constructed from the eas_aip_id metadata in xDB (as is shown
below) and the sequence number in the TOC. The eas_aip_id value can be retrieved using the
xhive:metadata xDB function.
To enable a client application to issue queries that can retrieve unstructured data files:
• The XQuery must be written to obtain the AIP ID and the sequence number.
• The XQuery results must be modified to insert the AIP id and the sequence number as attributes
of the element that contains the name of the unstructured data file.
In the following example, a custom XQuery function named eas:add-aip-seqno is defined within
the query-prolog XML block.
<query-prolog>
declare function eas:add-aip-seqno($node as node(), $aip_id as xs:string,
$ri_uri as xs:string) as node()?
{
typeswitch ( $node )
case element(n:FileName)
return element { node-name($node) } { attribute {xs:QName('eas:seqno')}
{ doc($ri_uri)/ris/ri[@pdi_key eq $node/text()][1]/@seqno },
attribute {xs:QName('eas:aipid')}{$aip_id}, $node/node() }
case element()
return element { node-name($node) } { $node/@*,
$node/node()/eas:add-aip-seqno(., $aip_id, $ri_uri) }
default
return $node
};
</query-prolog>
As its name indicates, the eas:add-aip-seqno function appends eas:seqno and eas:aipid
attributes to each FileName element in the node hierarchy passed as an argument. The value of the
inserted eas:seqno attribute is the sequence number of the content associated with the file name
present in the table of contents. The eas:aipid is the AIP identifier passed as an argument.
To modify the XQuery results, in the query-template XML block, configure an XQuery that
processes the default results referenced by the <select/> placeholder, using the function defined in
the query-prolog XML block; for example:
<query-template xmlns:ri="urn:x-emc:eas:schema:ri"
xmlns:xhive="http://www.x-hive.com/2001/08/xquery-functions">
for $Call in <select/>
let $pdi_uri := root($Call)
let $aip_id := xhive:metadata($pdi_uri, 'eas_aip_id')
let $ri_uri := replace(document-uri($pdi_uri),'\.pdi$', '.ri')
return eas:add-aip-seqno($Call, $aip_id, $ri_uri)
</query-template>
In plain English, the XQuery translates into:
For each query result
Return the structured content file path containing the result
Return the AIP ID (using the xDB function xhive:metadata)
Shape the path of the TOC of this AIP
Return the modified query result (calling the eas:add-aip-seqno function)
The path to the TOC needs to be retrieved because the TOC contains the sequence number for each
unstructured content file. The eas:add-aip-seqno function uses this information to add the sequence
number to the XQuery results.
Here is an example of what the XQuery might return:
<FileName xmlns:eas="urn:x-emc:eas" eas:seqno="1"
eas:aipid="080f424080002144">recording1.mp3</FileName>
Instead of adding the AIP id and the unstructured content file sequence number as distinct elements in
the results, they can be returned in a single element with the structure of AIPID:ci:SequenceNumber.
ci stands for Content Information and is a constant value here.
To use this approach, in the query-prolog XML block, define the function to add the eas:cid
attribute to each FileName element.
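One possible definition, modeled directly on the eas:add-aip-seqno function shown above; the function name eas:add-cid is illustrative, not mandated by InfoArchive, and the AIPID:ci:SequenceNumber layout follows the convention just described:

```xquery
declare function eas:add-cid($node as node(), $aip_id as xs:string,
  $ri_uri as xs:string) as node()?
{
  typeswitch ( $node )
    case element(n:FileName)
      (: Build the single content ID "AIPID:ci:SequenceNumber" from the TOC :)
      return element { node-name($node) } { attribute {xs:QName('eas:cid')}
        { concat($aip_id, ':ci:',
          doc($ri_uri)/ris/ri[@pdi_key eq $node/text()][1]/@seqno) },
        $node/node() }
    case element()
      return element { node-name($node) } { $node/@*,
        $node/node()/eas:add-cid(., $aip_id, $ri_uri) }
    default
      return $node
};
```

The corresponding query-template is then the same as before, with eas:add-cid substituted for eas:add-aip-seqno.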
InfoArchive does not dictate how this value is inserted in the query results. However, it is good
practice to:
• Assign it in a distinct attribute (for example, eas_ci_id) of the element that contains the file name
of the unstructured content file
• Associate a designated namespace to this attribute (e.g., urn:x-company:eas)
• Apply a uniform rule across all holdings to facilitate easy configuration and maintenance
This alternative simplifies the structure of the results, which might be something like this:
<FileName xmlns:eas="urn:x-emc:eas" eas:cid="080f424080002144:ci:1">recording1.mp3</FileName>
request-config—Defining Search Criteria
The request-config XML block defines the search criteria.
At least one request-config block must be defined. You can define multiple request-config
blocks when:
• The query covers AIPs using different schemas. Additional request-config blocks are needed
in order to define the mapping rules to apply for each schema.
• When the query configuration is used for an order, and the order results must be sorted, an
additional <request-config> block for the schema URN of the results must be defined for the
criteria by which the sorting must be performed.
The following example configures a query for returning the phone calls metadata and the identifiers
of their associated content, ordered by call start date. The search criteria can be customer ID, call start
date, or representative ID.
<request-config schema="urn:eas-samples:en:xsd:phonecalls.1.0">
<entity>
<path>/n:Calls/n:Call</path>
<order-by>n:CallStartDate</order-by>
</entity>
<param name="CustomerID" type="integer" index="true">
<path>n:CustomerID</path>
</param>
<param name="CallStartDate" type="date-time" index="true">
<path>n:CallStartDate</path>
</param>
<param name="RepresentativeID" type="integer" index="true">
<path>n:RepresentativeID</path>
</param>
</request-config>
• The schema attribute specifies the XML schema to apply to AIP metadata in xDB.
• The entity element defines the XML block to return for each matching result.
• The order-by element defines the default sorting of the results if the query does not include a
sort clause.
• Each param element defines a search criterion.
— The name attribute is the criteria name in the InfoArchive query.
— The type attribute is the data type of the element. The valid values are: string, date, dateTime,
integer, and double. For example, the data type for CallStartDate is dateTime.
— The index attribute indicates whether to create an index for the element in xDB. The value of
this attribute affects the structure of the XQuery generated during query execution.
In the example, the listed elements will all be indexed in xDB.
— You can set the AIU ID (eas_aiu_id) or content file ID (eas_ci_id) as a search criterion to reduce
the scope of the query or directly access an AIU or content file:
<param name="eas_aiu_id" type="string" index="true" ancestor_alias="0">
<path>.//@_eas:id</path>
</param>
<param name="eas_ci_id" type="string" index="true" ancestor_alias="0">
<path>.//@_eas:cid</path>
</param>
• The path element maps to the path of the XML element, relative to the path of the entity
element with which the criterion is associated.
• By default, the matching entities are returned “as is”, without any alteration.
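For example, a search supplying a CustomerID value against this configuration is translated into an XQuery expression along these lines; this is illustrative, the value 123 is a placeholder, and the exact generated form may differ:

```xquery
declare namespace n = "urn:eas-samples:en:xsd:phonecalls.1.0";
for $call in /n:Calls/n:Call[n:CustomerID = 123]
(: order-by supplies the default sorting when the query has no sort clause :)
order by $call/n:CallStartDate
return $call
```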
Configuring Search Criteria for Multi-Level XML Structures
When eas_pdi.xml has an XML document structure with multiple levels of nested elements,
define the search criteria in a top-down manner starting from higher-level elements rather than just
traversing the AIU elements at the lowest level. This will significantly improve query performance,
especially when the XML structure is deep.
For example, in the following XML structure, the AIU element Call is nested within the Dept
element, which is in turn nested within the Calls element:
<Calls>
<Dept id="1">
<Call call-id="1"/>
<Call call-id="2"/>
<Call call-id="3"/>
</Dept>
<Dept id="2">
<Call call-id="1"/>
<Call call-id="5"/>
</Dept>
</Calls>
For optimal query performance, define the search criteria for this XML structure this way:
<request-config>
<entity>
<path>/Calls[]/Dept[]/Call</path>
</entity>
<param name="dept-id" type="string" ancestor_alias="1">
<path>@id</path>
</param>
<param name="call-id" type="string">
<path>@call-id</path>
</param>
</request-config>
In the path element that specifies the path to the AIU element, each nesting level above the AIU
element is represented by an empty bracket []. The ancestor_alias attribute denotes the nesting
level of the element, with the lowest level being 0, the level above it being 1, and so forth. If not
specified, the default ancestor_alias value for a parameter is 0.
In the example above, the ancestor_alias values for the elements are as follows (internally interpreted
as such):
<paths>
<path alias="2">/Calls</path>
<path alias="1">/Dept</path>
<path alias="0">/Call</path>
</paths>
During query, the search criteria defined above will be translated into XQuery expressions such
as the following:
for $call in /Calls/Dept[@id='2']/Call[@call-id='5']
return $call
Note: The syntax for defining search criteria for multi-level XML structures has the following
limitations:
• Two or more search criteria cannot be used together in an OR relation
• Search results cannot be ordered by search criteria defined in this way
Grouping Search Criteria
For some content types, the search configuration must include the definition of a significant number
of search criteria below a common sub path. In such situations you can enclose related parameter
definitions in the <group> element to avoid repeating the sub path for each criterion. The example
below shows how to group search criteria.
<group>
<path>op:index</path>
<param name="settlement" type="date">
<path>head:settlementDay</path>
</param>
<param name="ref-order" type="string" index="true">
<path>head:orderRef</path>
</param>
</group>
• A path element is inserted directly under the <group> tag. The value enclosed between <path>
tags is the common sub path.
• The settlement and ref-order criteria resolve respectively to op:index/head:settlementDay
and op:index/head:orderRef.
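For comparison, the grouped definition above resolves to the same paths as repeating the common sub path in every parameter; a sketch of the ungrouped equivalent:

```xml
<!-- Ungrouped equivalent of the example above: the common sub path
     op:index is repeated in each parameter definition -->
<param name="settlement" type="date">
  <path>op:index/head:settlementDay</path>
</param>
<param name="ref-order" type="string" index="true">
  <path>op:index/head:orderRef</path>
</param>
```

Note that beyond readability, grouping also affects how the XQuery is generated, as described below.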
As shown in the example below, you can nest group tags.
<group>
<param name="settlement_rop" type="date" index="true">
<path>op:cancel-operation/op:index/head:settlementDay</path>
<path>op:R-operation/op:index/head:settlementDay</path>
</param>
<group>
<path>(op:R-operation | op:cancel-operation)/op:index</path>
<param name="direction_rop" type="string">
<path>head:orderDirection</path>
</param>
<param name="amount_rop" type="decimal">
<path>head:amount</path>
</param>
</group>
</group>
Using the group tag not only enhances the readability of the configuration but also lets you
influence the generation of the XQuery.
If a criterion belongs to a group, the generated XQuery also includes all the criteria defined in the
group:
• Inclusion of the received filtering condition for the received criterion
• Inclusion of the condition that the other criteria defined in the group exist
This behavior is retained so that potential xDB indexes composed of multiple criteria can be
leveraged. A side effect of this behavior is that the generated XQuery does not return an AIU if one
of the criteria of the group does not exist for that AIU. Because of this behavior, it is recommended
to ensure that the data contains all the elements associated with the criteria.
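As an illustration of this behavior, a request filtering only on the settlement criterion from the group above could generate a condition along the following lines. This is a sketch only: the element name op:operation and the exact generated form are hypothetical.

```xquery
(: Sketch: the filter on settlementDay is combined with an existence
   test on the other criterion of the group, orderRef :)
for $op in //op:operation[op:index/head:settlementDay = xs:date('2014-01-15')
    and exists(op:index/head:orderRef)]
return $op
```

An AIU whose op:index lacks a head:orderRef element would not be returned by such a query, which is the side effect described above.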
To summarize, the following rules apply depending on the XML element associated with a search
criterion:
• If an XML element is mandatory and is not included in a composed index, it makes no difference
whether the criterion is enclosed in a group tag.
• If an XML element is optional, the criterion must be isolated in a dedicated group to avoid the
side effect mentioned above.
• If the element is included in a composed index, the criterion must be included in a group
containing the criteria associated with the other elements of the index.
Searching Multiple Paths
If an element appears at multiple locations in the XML hierarchy, you can include all of these paths
in a single search criterion. The query configuration then performs an OR operation across the paths.
The following example shows a repeated <FirstName> in <path> tags.
<param name="Document.RecipientFirstName" type="string" index="false">
<path>n:Documents/n:Document/n:Recipients/n:Recipient/n:InternalEntity
/n:Person/n:FirstName</path>
<path>n:Documents/n:Document/n:Recipients/n:Recipient/n:ExternalEntity
/n:Person/n:FirstName</path>
</param>
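Conceptually, a value supplied for such a criterion matches an AIU if any of the listed paths contains it; the generated query behaves like the following sketch (only the paths are taken from the example above, the overall expression is illustrative):

```xquery
(: Sketch: a single FirstName value matches recipients that are either
   internal or external entities :)
for $doc in //n:Documents/n:Document
where $doc/n:Recipients/n:Recipient/n:InternalEntity/n:Person/n:FirstName = 'John'
   or $doc/n:Recipients/n:Recipient/n:ExternalEntity/n:Person/n:FirstName = 'John'
return $doc
```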
query-template—Defining Search Results
The optional query-template element contains the XQuery expression that defines the XML
content of the returned query results, which will be formatted by the stylesheet.
You can define query results to return not only AIU properties contained in eas_pdi.xml
but also associated AIP properties. In the XQuery expression, you can call the dctm-retriever
custom function defined in the query-prolog element to get standard or custom properties (e.g.,
r_object_id, eas_dss_holding, eas_aip_type) of the AIPs returned by the query and include them
in the query results.
The custom function dctm-retriever takes two arguments: the AIP object ID and the AIP property
to retrieve; for example:
<query-template>
for $aiu in <select/>
let $pdi_uri:=root($aiu)
let $aip_id:=xhive:metadata($pdi_uri,'eas_aip_id')
return
<parent>{
(
$aiu/@id,
attribute{xs:QName('object_id')}{eas_functions:dctm-retriever($aip_id,'r_object_id')}
)
}
</parent>
</query-template>
In this example, the XQuery expression within the query-template element retrieves every AIU
(referenced by the <select/> placeholder) and returns each one as a parent element in the query
result. The custom function dctm-retriever gets the r_object_id property of the AIP for each
returned AIU and sets it as the object_id attribute of the parent element in the query result.
The returned query result looks something like this:
<Calls xmlns="urn:eas-samples:en:xsd:phonecalls.1.0">
<parent object_id="0800000180002904"/>
<parent object_id="0800000180002904"/>
</Calls>
The parent element can repeat as many times as the number of AIUs contained in the AIP.
Note: For performance reasons, in the 2-tiered InfoArchive search, the tier-1 search (DQL) does not
fetch all of the AIP properties. The dctm-retriever function executes as a part of the tier-2 search
and can only retrieve AIP properties already fetched by the DQL query. If the AIP property the
dctm-retriever function tries to get is not present in the DQL query result, or if the property is a
custom subtype, the function will return an empty result and an exception will occur.
For example, if you try to use the dctm-retriever function to get the value of the eas_purge_date
property, which is not fetched by the tier-1 search, the query will fail.
Although eas_purge_lock_count is not a standard AIP property, it is exposed as a virtual property
and can be passed as an argument to the dctm-retriever function to return the number of
AIUs with a purge lock in each AIP in the query result; for example:
<query-template>
for $aiu in <select/>
let $pdi_uri:=root($aiu)
let $aip_id:=xhive:metadata($pdi_uri,'eas_aip_id')
return
<parent>{
(
$aiu/@id,
attribute{xs:QName('count')}{eas_functions:dctm-retriever($aip_id,'eas_purge_lock_count')}
)
}
</parent>
</query-template>
Configuring a Query Configuration (eas_cfg_query) Object
In DA, create an object of type eas_cfg_query in /System EAS/Archive Holdings/MyHolding (or
any other folder) and configure its properties. The query configuration object must contain an XML
file that defines the search criteria. See Defining the Search Criteria and Search Result, page 171.
eas_name (Name)
Name of the query configuration; for example, query.MyHolding

eas_result_schema (Result Schema name)
URN of the schema applied by the results that are returned by this query configuration. You can
only create one query configuration object for the same result schema.

eas_result_root_element (Result root element)
Root of the data in the PDI file; for example, Calls for PhoneCalls in eas_pdi.xml

eas_cfg_quota_predicate (Query quota predicate)
DQL predicate used to search for a query quota (eas_cfg_query_quota) object to be used with this
configuration; for example:
eas_cfg_query_quota where eas_name like 'quota.PhoneCalls.%'
The value specified here must be a valid DQL predicate that returns only one eas_cfg_query_quota
object for the dynamic roles received in a search request. If this DQL predicate returns more than
one eas_cfg_query_quota object, search execution will fail.

eas_cfg_order_predicate (Cfg order predicate)
DQL predicate that selects one and only one eas_cfg_order object for a given user; for example:
eas_cfg_order where eas_name like 'order.PhoneCalls.%'

eas_result_root_element (Result root element)
XML root element returned in the query results

eas_result_root_ns_enabled (Namespace in root element)
If checked, the result namespace is associated with the XML root element of the result

eas_cfg_aic (Archival Information Collection (AIC))
Name of the Archival Information Collection (AIC; a holding is a collection) for which this query
configuration can be applied; for example, PhoneCalls
Configuring an Order Configuration (eas_cfg_order) Object
In DA, create an object of type eas_cfg_order in the holding folder (e.g., /System EAS/Archive
Holdings/MyHolding) and configure its properties.
eas_name (Name)
Name of the order configuration object referenced by other objects

eas_delivery_channel (Delivery channel)
One or more delivery channels to which this order configuration object is applicable

eas_priority (Priority)
Execution priority; the order node processes orders having the highest value first

eas_deadline (Execution deadline (mn))
Maximum desired execution deadline, used to prioritize order execution. The execution deadline
corresponds to an SLA agreed with the business owner.
When the order is created, its eas_deadline_date property is set to the creation date plus the
execution deadline.
This setting does not guarantee the order will complete before this date and time; it is only used
to prioritize execution among orders having the same execution priority. When two orders have
the same priority, the order with the closest deadline date/time gets selected for execution. The
execution deadline can therefore be considered a sub-priority.

eas_folder_path (Repository folder)
Repository location in which the order (eas_order) objects are created

eas_acl_name (ACL name)
Name of the permission set to apply to the order configuration object; for example,
PhoneCalls.ORDER

eas_acl_domain (ACL domain)
Domain of the permission set to apply to the order configuration object

eas_alias_set (Alias set)
The permission set applied on the eas_order repository object must grant Read access to one of
the user's roles. This makes it possible to limit the visibility of an order to specific roles. For
example, one role of users can be responsible for posting orders in sequence while another role
can process the results of completed orders.

eas_owner_name (Owner)
Owner to assign to the order repository object

eas_retention_period (Retention period)
Maximum retention period of the order after its execution

eas_working_deliv_channel (Working delivery channel)
Defines where to store the intermediate results in xDB

eas_working_deliv_par_name (Working deliv.chan.param)
Working delivery channel parameter name

eas_working_deliv_par_value (Working deliv.chan.param.value)
Working delivery channel parameter value associated with the parameter name at the same index

eas_working_store (Working filestore)
Only used when encrypted data is returned. When the result contains structured data that was
encrypted during ingestion, the order node does not store the result within xDB. Instead, it
dynamically encrypts the whole result and stores it as content in the repository, using the
configured working filestore. This principle ensures that sensitive structured data contained in
the order result is not accessible to an administrator having access at the xDB and file system
levels.

eas_superuser_applied (Applied to superusers)
Since superuser accounts have at least Read access to all configuration objects, this boolean
indicates whether this configuration should also be applied to superusers. However, it is
recommended not to grant the superuser privilege to the Documentum accounts used by the
web services (refer to the eas_cfg_access_node type).
Configuring an Order Node Configuration Object
A default order node configuration object, order_node_01, is created during InfoArchive
installation and can be used out of the box. You can modify its properties as needed. To create and
configure a new order node, create an object of type eas_cfg_order_node and configure its properties.
eas_name (Name)
Name of the order node

eas_order_predicate (Order predicate)
DQL predicate to restrict the order requests considered by the node for processing. This is used
to dedicate order nodes to specific types of queries.

eas_log_level (Log level)
Log level used by the order processing

eas_fs_working_root (Working directory)
Path to the working directory in the file system for this node

eas_worker_thread_cnt (# worker threads)
Number of order execution threads on the host computer. After updating the thread count, you
must restart the order processor for the change to take effect.

eas_polling_interval (Polling interval (ms))
Order queue polling interval in milliseconds

eas_cacheout_enabled (Cache out processing)
Activates the background task caching out the least used xDB detachable libraries for archive
holdings and library pools having exceeded their quota. If this option is not selected, the order
node will not remove libraries from the xDB file system. You do not need to restart the system
for changes to this setting to take effect.

eas_cacheout_interval (Cache out processing interval (ms))
Time interval between activations of the background task caching out the least used xDB libraries

eas_user_name (User name)
Deprecated property

eas_processed_order_cnt (# processed orders)
Incremented when an order is processed; used for monitoring

eas_act_indicators_enabled (Update activity indicators)
Indicates whether the order node must update the AIP activity indicators.
For performance reasons, the order processor does not individually update an AIP (eas_aip)
object each time it processes an order scoping the AIP. The order processor manages in memory
the list of AIPs scoped by processed orders and regularly flushes this list to update the AIP
activity indicators. If a number of processed orders are scoped by the same AIP, the AIP activity
indicators are updated in memory and the AIP object itself is updated only once, later.

eas_def_upd_max_entries (Deferred update max. entries)
Maximum number of AIP entries to keep in memory

eas_def_upd_flush_threshold (Deferred update flush threshold)
Maximum number of AIP entries modified in the activity indicator cache since the last flush.
When this limit is reached, modified entries are written to the database.

eas_def_upd_flush_interval (Deferred update max. interval (sec.))
Maximum time interval in seconds between flushes. When the first of these limits is reached, the
order processor updates the activity indicators of the AIP objects marked as modified since the
last flush.

eas_log_close_pending (Log close pending)
Flag indicating that the log should be closed without stopping the node itself

eas_start_date (Last start date)
The date of the latest startup of the node

eas_start_proc_order_cnt (# processed orders since startup)
Reset to 0 at order processor startup, then incremented when an order is processed; used for
monitoring

eas_is_suspended (Suspended)
Allows the dynamic suspension or resumption of the associated order processor

eas_stop_pending (Stop pending)
This boolean property allows the administrator to remotely stop the associated order processor;
for example, using DA or IDQL

eas_stop_date (Last stop date)
The date of the latest stop of the node
Starting the Order Node
Start the order node by executing the following command located in EAS_HOME/bin:
• eas-launch-order-node.sh (Linux)
• eas-launch-order-node.bat (Windows)
Keep the prompt window open.
Note: To stop the order node, you must explicitly execute the following command located in
EAS_HOME/bin:
• eas-stop-order-node.sh (Linux)
• eas-stop-order-node.bat (Windows)
Shutting down the host without stopping the order node properly may cause failure the next time
you try to start it. If this happens, force-start the order node using the force (-f) option.
On Windows, the order node is installed as a Windows service. You can start/stop the order node by
starting/stopping the EAS Order Node service in the Services Microsoft Management Console (MMC).
Note: NEVER execute the eas-launch-order-node script when the EAS Order Node Windows
service is already running.
Configuring InfoArchive GUI
Shipped with the InfoArchive installation package, InfoArchive GUI is the default web-based
search application for searching data archived in InfoArchive holdings. You must configure
InfoArchive GUI for your holdings before using this search application.
Skip this section if you build your own search application by leveraging InfoArchive web services
instead of using InfoArchive GUI.
You can configure the following components of InfoArchive GUI:
• Search menu
The search menu groups search forms under folders based on which holding they are used to
search, each folder corresponding to a distinct holding.
• Search form
The search form contains search criteria fields and search buttons with which the end user
performs queries against a holding. A search form is specific to InfoArchive GUI and is the
mechanism used to define what can be searched.
• Search result
The search result page displays the returned query results.
In addition, you can customize the cascading style sheet (CSS) used by InfoArchive GUI by creating
your custom styles in the empty custom.css file located in the css directory within the InfoArchive
GUI web application. For example, on Apache Tomcat, the file can be found in the following location:
TOMCAT_HOME/webapps/eas-gui/css
Configuring the Search Menu
You configure the InfoArchive GUI search menu by creating a search form folder configuration object
(eas_cfg_search_form_folder) for each holding in a configured root folder in the repository (/System
EAS/Search Forms by default). You then populate each folder with search form configuration
(eas_cfg_search_form) objects used to search that holding. Each search form folder configuration
(eas_cfg_search_form_folder) object is automatically rendered as a folder in InfoArchive GUI with
search forms grouped under it.
The root folder that holds search form folder configuration (eas_cfg_search_form_folder) objects in the
repository is configured through an access node configuration (eas_cfg_access_node) object, which
also links the search forms to the InfoArchive web services. InfoArchive web services are configured
through the eas_service.properties file located in the web application deployment directory
such as C:\app\apache-tomcat-7.0.42\webapps\eas-services\WEB-INF\classes. The
eas_service.properties file contains an eas_access_node property that points to the access
node configuration (eas_cfg_access_node) object. InfoArchive GUI solely uses the InfoArchive web
services and does not directly connect to the repository and xDB.
Multiple access nodes can point to the same search form folder configuration
(eas_cfg_search_form_folder) object.
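For example, the linkage in eas_service.properties is a single entry naming the access node configuration object; the value shown here (access_node_01) is hypothetical and must match the name of your own eas_cfg_access_node object:

```properties
# eas_service.properties (illustrative excerpt; other entries omitted)
eas_access_node=access_node_01
```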
Configuring a Search Form Folder
In DA, create an object of type eas_cfg_search_form_folder in /System EAS/Search Forms (the default root
folder) and configure its properties.
eas_name (Name)
Name that InfoArchive uses to refer to the search form folder

eas_order_no (Order No.)
The integer value that determines the order in which search form folders are displayed in the
search menu in InfoArchive GUI. Items are displayed in ascending order of the assigned values.

eas_consumer_application (Consumer application)
Restricts visibility to a specific consumer application; for example, eas_gui (InfoArchive GUI)

eas_language_code (Language code)
Language code in the format language_country (ISO 639, ISO 3166); for example, fr_FR for French
and zh_CN for simplified Chinese

eas_title (Title)
Title of the menu item for the associated language/locale

eas_description (Description)
Description of the menu item
Configuring a Search Form
InfoArchive GUI uses standard XForms 1.1 to display the search user interface, which is rendered
completely on the client browser by the EMC Documentum XForms Engine. The EMC Documentum
XForms Engine is a pure client-side XForms implementation that runs entirely within a web
browser. It is capable of rendering very flexible and dynamic forms without the need for a plugin or
processing outside of the browser.
The search form is primarily configured through the search form configuration object
(eas_cfg_search_form). Here are the general steps:
1. Configure a query configuration (eas_cfg_query) object that defines the data to be searched and
the usable search criteria.
Note: For the content (unstructured data) functionality to work, the internal identifier of the
contents must be included in the query result.
2. Configure one or more query quota configuration (eas_cfg_query_quota) objects to define the
quota to be considered for serving the search.
3. Configure a delivery channel configuration (eas_cfg_delivery_channel) object to define the
delivery channel to use.
4. Configure XForms for searching the holding.
5. Configure a search form configuration (eas_cfg_search_form) object with the configured XForms
as its content.
The recommended approach to creating a new search form is to start with an existing search form
(such as the search forms in the sample holdings that ship with InfoArchive). The PhoneCalls search
form sample can be found here:
/install/unsupported/holdings/PhoneCalls/template/content/eas_cfg_search
_form.PhoneCall.01.xml
Configuring XForms
You configure an XForms XML file and later import it into the repository as the content of the search
form configuration (eas_cfg_search_form) object.
To get a quick start on configuring an XForms, you can customize an XForms of the sample
PhoneCalls holding that ships with InfoArchive, or configure and install a simple holding using the
InfoArchive Holding Configuration Wizard and then build upon the basic configuration.
To accelerate the development of search forms, EMC recommends that you test
XForms locally using a web browser. To do this, retrieve the Formula project from
https://community.emc.com/docs/DOC-7172 and replace the $formula/war/example.xml with
your own XForms; then open index.xml in Firefox. You can adjust the look and feel of your search
form by referencing eas.css and bootstrap.css in the header.
XForms Structure
XForms consists of the following structural components:
• Search criteria
Defines the search criteria and logical operators used to construct a search request in the search
form.
• Bindings
— Links the search criteria to form controls on the search form
— Defines the validation rules to apply on search criteria values
— Defines the computation rules (optional) to apply on search criteria values
• Form controls
Provides the controls in the search form for the user to interact with, such as fields,
lists, and buttons. The form control definition also enables the application of the desired
initialization/reinitialization rules on the value presented in the controls; for example,
reinitialization of the value presented in a control when the value of another control is changed.
XForms Example
Here is an example of a simple XForms that renders into the following search form in InfoArchive
GUI:
In the example:
• Defined within the criterias tags are the criteria by which to search data, each criteria
element corresponding to a single search criterion.
• The xforms:bind elements are used to bind each search criterion to a form control in the search
form.
• Defined within the xhtml:body tags are the form controls—fields, labels, and buttons— that
the user can interact with in the search form.
<?xml version='1.0' encoding='UTF-8'?>
<xhtml:html xmlns:xhtml="http://www.w3.org/2002/06/xhtml2"
xmlns:xforms="http://www.w3.org/2002/xforms" xmlns:xsi="http://www.w3.org/2001/XMLSchema"
xmlns:ev="http://www.w3.org/2001/xml-events" xmlns:fn="http://www.w3.org/2005/xpath-functions">
<xforms:model>
<xforms:instance id="PhoneCallsSimple1" xmlns="">
<request>
<criterias>
<criteria name="CritCallStartDateLower" operator="GreaterOrEqual"
model="CallStartDate" gui_display="From date"/>
<criteria name="CritCallStartDateUpper" operator="LessOrEqual"
model="CallStartDate" gui_display="To date"/>
<criterias relation="OR">
<criteria name="CritCallCustomerID" operator="Equal"
model="CustomerID" gui_display="Customer ID"/>
<criteria name="CritCallCustomerLastName" operator="StartsWith"
model="CustomerLastName" gui_display="Customer LastName"/>
</criterias>
</criterias>
</request>
</xforms:instance>
<xforms:bind id="bindCallStartDateLower" required="true()" type="xforms:dateTime"
nodeset="/request/criterias/criteria[@name='CritCallStartDateLower']"/>
<xforms:bind id="bindCallStartDateUpper" required="true()" type="xforms:dateTime"
nodeset="/request/criterias/criteria[@name='CritCallStartDateUpper']"/>
<xforms:bind id="bindCallCustomerID" required="false()" type="xforms:positiveInteger"
nodeset="/request/criterias/criterias/criteria[@name='CritCallCustomerID']"/>
<xforms:bind id="bindCallCustomerLastName" required="false()" type="xforms:string"
nodeset="/request/criterias/criterias/criteria[@name='CritCallCustomerLastName']"/>
<xforms:submission id="search" ref="/request" replace="none"/>
<xforms:submission id="order" ref="/request" replace="none"/>
</xforms:model>
<xhtml:body>
<xhtml:div class="form-horizontal">
<xhtml:div style="margin-left: 10px">
<xhtml:fieldset>
<xhtml:legend>Phone Calls</xhtml:legend>
<xhtml:div class="control-group">
<xhtml:label class="control-label">Call received between</xhtml:label>
<xhtml:div class="controls controls-row">
<xforms:input bind="bindCallStartDateLower" id="input_call_start_date_lower"
class="input-small">
<xforms:hint>Start date format:<xforms:output value="instance('eas-context-info')
/date_formats/date_format[1]"/></xforms:hint>
<xforms:message level="ephemeral" ev:event="xforms-invalid">Start date is invalid.
It must follow the format: <xforms:output value="instance('eas-context-info')
/date_formats/date_format[1]"/></xforms:message>
</xforms:input>
<xforms:input bind="bindCallStartDateUpper" id="input_call_start_date_upper"
class="input-small">
<xforms:hint>End date format:<xforms:output value="instance('eas-context-info')
/date_formats/date_format[1]"/></xforms:hint>
<xforms:message level="ephemeral" ev:event="xforms-invalid">End date is invalid.
It must follow the format: <xforms:output value="instance('eas-context-info')
/date_formats/date_format[1]"/></xforms:message>
</xforms:input>
</xhtml:div>
</xhtml:div>
<xhtml:div class="control-group">
<xhtml:label class="control-label">Customer ID</xhtml:label>
<xhtml:div class="controls">
<xforms:input bind="bindCallCustomerID" id="input_call_customer_id"
class="input-large">
<xforms:hint>Customer ID</xforms:hint>
</xforms:input>
</xhtml:div>
</xhtml:div>
<xhtml:div class="control-group">
<xhtml:label class="control-label">Customer Last Name</xhtml:label>
<xhtml:div class="controls">
<xforms:input bind="bindCallCustomerLastName" id="input_call_customer_last_name"
class="input-large">
<xforms:hint>Customer Last Name</xforms:hint>
</xforms:input>
</xhtml:div>
</xhtml:div>
</xhtml:fieldset>
</xhtml:div>
<xhtml:div class="form-actions">
<xhtml:span class="pull-right">
<xforms:trigger class="btn">
<xforms:label>Reset</xforms:label>
<xforms:reset ev:event="DOMActivate"/>
</xforms:trigger>
<xforms:submit submission="order" incremental="false" class="btn">
<xforms:label>Background search</xforms:label>
</xforms:submit>
<xforms:submit submission="search" incremental="false" class="btn btn-primary">
<xforms:label>Search</xforms:label>
</xforms:submit>
</xhtml:span>
</xhtml:div>
</xhtml:div>
</xhtml:body>
</xhtml:html>
Search Criteria
Search criteria filter data and limit search results to a subset of data that matches the search
conditions. You define search criteria inside the criterias tags, each criteria element specifying
a search criterion in the search form. Once defined, search criteria will be used to construct XQuery
expressions for querying data, and will also be displayed on the search result page.
For example, the following criterion definition translates into a search field labeled Customer ID in
the search form, and when a value is specified in the field, only AIU records with the exact matching
CustomerID value will be returned as search results.
<criteria name="CritCallCustomerID" operator="Equal" model="CustomerID" gui_display=
"Customer ID"/>
The criteria element has the following attributes:

name
Descriptive name of the criterion, referenced in the search binding

operator
Comparison operator that defines how a value specified in the search field is compared against
the AIU property (metadata) value to filter data. Valid operators (case-sensitive) are: Equal,
NotEqual, Greater, GreaterOrEqual, Less, LessOrEqual, StartsWith, and Contains.
Make sure the operator you use is supported by the property data type. The valid operators for
each data type are:
• Equal, NotEqual: Date, DateTime, String, Integer, Double
• Greater, GreaterOrEqual, Less, LessOrEqual: Date, DateTime, Integer, Double
• StartsWith, Contains: String

model
AIU property to use as a search filter. This must be a search parameter defined as a search
criterion in the query configuration (eas_cfg_query) object (XML content) configured for the
same holding; for example:
<param name="CustomerID" type="decimal" index="true">
<path>n:CustomerID</path>
</param>
For information about defining search criteria, see request-config—Defining Search Criteria,
page 173.

gui_display
Label of the search field displayed in the search form
Search Form Binding
In search form binding, each xforms:bind element identifies an input value for the search criterion to
be bound to a form control in the search form. A binding also specifies the data type of the search
value and whether it is required; for example:
<xforms:bind id="bindCallCustomerID" required="false()" type="xforms:positiveInteger"
nodeset="/request/criterias/criterias/criteria[@name='CritCallCustomerID']"/>
id
Uniquely identifies the binding

required
Whether or not the input value of the search criterion is required

type
Data type of the search value

nodeset
Full XPath to the search value attribute in the criterion definition
Search Form Controls
The bind attribute of the form control links the form control to an <xforms:bind> element, as well
as a criterion in the data model; for example:
<xforms:input
bind="bindCallStartDateStart"
id="input_from_creation_date">
<xforms:hint>Start date</xforms:hint>
</xforms:input>
InfoArchive GUI uses the ID of the submission elements to distinguish between synchronous
searches and orders (asynchronous searches); for example:
Submission in the data model section:
<xforms:submission id="search"
replace="none" ref="/request"/>
• id: “search” or “order”
• replace: Must be “none”, or a blank page will be rendered in the browser

Submission buttons in the form controls section:
<xforms:submit submission="search" incremental="false">
<xforms:label>Search</xforms:label>
</xforms:submit>
• submission: References the submission id
Logical Grouping of Search Criteria
By default, multiple search criteria are combined using the AND relation (translated into the AND
logical operator in the resultant XQuery expression). That is, all the search criteria must be met for
the data to be returned by the query.
You can group multiple search criteria using the OR relation by enclosing them inside the criterias
element and setting its relation attribute to OR (if not specified, the default value is AND). Multiple
criterias elements can be nested to create more complex search criteria with a combination of
logical relations.
Note: If an element is defined as a partitioning key, do not include it in the OR relation in the search
criteria; otherwise, the returned search results will be incorrect. The partitioning key is used to
narrow down the scope of the AIPs in the tier-1 search, and the tier-2 XQuery search constructed by
the search criteria can only be executed within this scope (AND relation), not outside it (OR relation).
In plain English, the following criteria definition translates into: a call record meets the search criteria
when the CallStartDate is between a specified date A and a specified date B AND either the customer
ID equals a specified value OR the customer last name starts with a specified value.
193
InfoArchive Configuration
<criterias relation="AND">
<criteria name="CritCallStartDateLower" operator="GreaterOrEqual"
model="CallStartDate" gui_display="From date"/>
<criteria name="CritCallStartDateUpper" operator="LessOrEqual"
model="CallStartDate" gui_display="To date"/>
<criterias relation="OR">
<criteria name="CritCallCustomerID" operator="Equal"
model="CustomerID" gui_display="Customer ID"/>
<criteria name="CritCallCustomerLastName" operator="StartsWith"
model="CustomerLastName" gui_display="Customer LastName"/>
</criterias>
</criterias>
When rendered in the search form, the logical relations among multiple criteria are not spelled out
on the search screen (unless you specify them in the field label), but can be found in the Search
Details panel at the top of the search results screen.
Multiple Input Values for a Single Search Criterion
A search criterion can contain multiple input values combined with the OR relation. For example,
a search form can present three input fields for the Customer Last Name search criterion, and
records that match any one of the entered values will be returned.
To define multiple input values for a search criterion, include as many empty value elements as you
need inside the criteria element in the search criteria definition; for example:
<criteria name="CritCallCustomerLastName" operator="StartsWith"
    model="CustomerLastName" gui_display="Customer LastName">
  <value></value><value></value><value></value>
</criteria>
In the search bindings, create a binding for each input value you have defined; for example:
<xforms:bind id="bindCallCustomerLastName1" required="false()" type="xforms:string"
nodeset="/request/criterias/criterias/criteria[@name='CritCallCustomerLastName']/value[1]"/>
<xforms:bind id="bindCallCustomerLastName2" required="false()" type="xforms:string"
nodeset="/request/criterias/criterias/criteria[@name='CritCallCustomerLastName']/value[2]"/>
<xforms:bind id="bindCallCustomerLastName3" required="false()" type="xforms:string"
nodeset="/request/criterias/criterias/criteria[@name='CritCallCustomerLastName']/value[3]"/>
In the form controls definition, define an input field for each input value you have defined; for
example:
<xhtml:div class="control-group">
<xhtml:label class="control-label">Customer Last Name</xhtml:label>
<xhtml:div class="controls">
<xforms:input bind="bindCallCustomerLastName1" id="input_call_customer_last_name1"
class="input-large">
<xforms:hint>Customer Last Name</xforms:hint>
</xforms:input>
<xforms:input bind="bindCallCustomerLastName2" id="input_call_customer_last_name2"
class="input-large">
<xforms:hint>Customer Last Name</xforms:hint>
</xforms:input>
<xforms:input bind="bindCallCustomerLastName3" id="input_call_customer_last_name3"
class="input-large">
<xforms:hint>Customer Last Name</xforms:hint>
</xforms:input>
</xhtml:div>
</xhtml:div>
Defining InfoArchive GUI Locales
You define which locales are supported by InfoArchive GUI, as well as the default locale and formats,
in the eas-gui.properties file located in the WEB-INF/classes directory of the InfoArchive
GUI web application. For example, on Apache Tomcat, the configuration file can be found in the
following location:
TOMCAT_HOME/webapps/eas-gui/WEB-INF/classes
Edit eas-gui.properties and configure locale-related settings; for example:
eas.locale.default=en_US
eas.locales=en_US,fr_FR,zh_CN
eas.client.config.default.dateformat=yyyy-MM-dd
eas.client.config.en_US.dateformat=dd/MM/yyyy,ddMMyyyy
eas.client.config.fr_FR.dateformat=dd/MM/yyyy,ddMMyyyy
eas.client.config.zh_CN.dateformat=dd/MM/yyyy,ddMMyyyy
For each locale, you can define multiple supported date and/or dateTime formats, delimited by
commas. The first value listed is the default format used by the date/dateTime control on the UI.
Restart the web application server for the changes to take effect.
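The date and dateTime patterns in these properties follow the java.text.SimpleDateFormat conventions. As a standalone sketch (not part of InfoArchive itself), the following shows how the two patterns configured above for a locale render the same date:

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;

public class DateFormatDemo {
    public static void main(String[] args) {
        // Build 25 December 2014 as a Date
        Calendar cal = Calendar.getInstance();
        cal.set(2014, Calendar.DECEMBER, 25);

        // First pattern listed for a locale: the default used by the date control
        SimpleDateFormat primary = new SimpleDateFormat("dd/MM/yyyy");
        // Second pattern: an additional format accepted for user input
        SimpleDateFormat secondary = new SimpleDateFormat("ddMMyyyy");

        System.out.println(primary.format(cal.getTime()));   // prints "25/12/2014"
        System.out.println(secondary.format(cal.getTime())); // prints "25122014"
    }
}
```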
Localizing a Search Form
A search form can be localized into multiple languages. In InfoArchive GUI, you can switch between
different language versions of a search form by choosing a language locale at login.
In the configuration of a localized search form, the labels of the form controls are externalized. All the
localized resources (form name, labels, and hint messages) are stored in a localization properties file,
one for each language. The localization properties file is imported into the search form configuration
(eas_cfg_search_form) object as one of its renditions. It is applied to the corresponding language
version of the search form in InfoArchive through a language code defined for the application in the
search form configuration.
To localize a search form into a non-English language:
1. Edit the properties of the search form configuration (eas_cfg_search_form) object. Under the
Description tab, add the language code to be supported by the InfoArchive GUI application.
(eas_gui is the application code for InfoArchive GUI.)
2. Externalize the localization resources such as the form name, labels, and hint messages.
Edit the XML content of the search form configuration (eas_cfg_search_form) object to replace
actual form name, labels, and hint messages with string variables; for example:
...
<request>
<criterias>
<criteria name="CritCallCustomerID" operator="Equal"
model="CustomerID" gui_display="${customer.id.label}"/>
</criterias>
</request>
...
<xhtml:body>
<xhtml:div class="form-horizontal">
<xhtml:div style="margin-left: 10px">
<xhtml:fieldset>
<xhtml:legend>${form.name}</xhtml:legend>
<xhtml:div class="control-group">
<xhtml:label class="control-label">${customer.id.label}</xhtml:label>
<xhtml:div class="controls">
<xforms:input bind="bindCallCustomerID" id="input_call_customer_id"
class="input-large">
<xforms:hint>${customer.id.hint}</xforms:hint>
</xforms:input>
</xhtml:div>
</xhtml:div>
</xhtml:fieldset>
</xhtml:div>
<xhtml:div class="form-actions">
<xhtml:span class="pull-right">
<xforms:submit submission="order" incremental="false" class="btn">
<xforms:label>${button.backgroundsearch.label}</xforms:label>
</xforms:submit>
<xforms:submit submission="search" incremental="false" class="btn btn-primary">
<xforms:label>${button.search.label}</xforms:label>
</xforms:submit>
</xhtml:span>
</xhtml:div>
</xhtml:body>
...
3. Create a localization properties file containing translations of the externalized resources (form
name, labels, and hint messages) in the target language. The localization file must be encoded
in UTF-8 and its file name must be suffixed with the language code and have the .properties
extension, such as form.PhoneCalls.zh_CN.properties.
In the following example localization properties file, strings translated into the Simplified Chinese
language are assigned to the resource variables.
form.name=通话录音
customer.id.label=客户 ID
customer.id.hint=通话客户的标识
button.backgroundsearch.label=后台搜索
button.search.label=搜索
4. Import the localization properties file as a rendition of the search form configuration
(eas_cfg_search_form) object.
a. Right-click the eas_cfg_search_form object and choose View > Renditions from the shortcut
menu.
b. With the existing XML Document rendition selected, choose Import Rendition from the
File menu.
c. Select the localization properties file to import.
d. If the localization properties file conforms to the correct naming convention and file format
(.properties), InfoArchive recognizes its format as eas_localization (localization
properties file) and automatically fills out the Format field.
e. In the Page modifier field, enter the language code, such as zh_CN for Simplified Chinese.
f. Click OK. The imported file is displayed as a rendition of the search form configuration
(eas_cfg_search_form) object.
The search form is localized. When you access the form in InfoArchive GUI using the corresponding
locale, the form name, labels, and hint messages will all appear in the target language.
When the localization property file is missing for a specific locale, the default one defined in the
eas-services.properties file (eas.locale.default=en_US) is used.
Configuring Hints and Error Messages
A hint or tooltip is a piece of concise information about a UI control (e.g., input field) that appears in a
small “hover box” when the user hovers the cursor over the control.
An error message is displayed when an error occurs, to alert the user to the possible causes of the
error and help the user troubleshoot.
You can optionally configure hints and error messages using the xforms:hint and
xforms:message elements respectively nested within UI control elements.
In the following example, a hint and an error message are defined for the Call Start Date input control
to inform the user of the acceptable date format.
<xforms:input bind="bindCallStartDateLower" id="input_call_start_date_lower"
class="input-small">
<xforms:hint>Start date format:<xforms:output value="instance('eas-context-info')
/date_formats/date_format[1]"/></xforms:hint>
<xforms:message level="ephemeral" ev:event="xforms-invalid">Start date is invalid.
It must follow the format: <xforms:output value="instance('eas-context-info')
/date_formats/date_format[1]"/></xforms:message>
</xforms:input>
Note: In the hint, the xforms:output element retrieves the first (as indicated by [1]) valid date
format of the current locale from eas-gui.properties, where locales are defined.
Both hints and error messages are optional for UI controls. However, if you do not define an error
message for an input control, when an error occurs upon search form submission, a default system
error message will still display in the following format:
input_control_id is invalid.
In the example above, without the xforms:message element, the default system error message
would be:
input_call_start_date_lower is invalid.
Configuring a Search Form Configuration (eas_cfg_search_form)
Object
In DA, create an object of type eas_cfg_search_form under the corresponding search form folder
configuration (eas_cfg_search_form_folder) object and configure its properties. You then import the
configured XForms as the content of the object.
eas_name (DA label: Name)
Value used by InfoArchive to refer to this form.

eas_aic_predicate (DA label: AIC DQL predicate)
Returns the holding(s) that can be searched with the form. The AIC DQL predicate must select
eas_cfg_aic objects. Most often, this predicate selects an eas_cfg_holding object (a subtype of
eas_cfg_aic), so it is generally used to select the holding to query.

eas_result_schema_predicate (DA label: Result Schema DQL predicate)
Returns the XML schema that can be requested by the form. The Result Schema DQL predicate
must select an eas_cfg_schema object. This predicate most often selects the eas_cfg_schema
associated with the schema applied to the data archived in the holding to query.

eas_result_delivery_pred (DA label: Delivery channel DQL predicate)
Returns the delivery channels that can be requested by the form as the destination of search
results. The Delivery channel DQL predicate must select an eas_cfg_delivery_channel object.
With the current InfoArchive release, this predicate must always select the standard
eas_cfg_delivery_channel: eas_access_services.

eas_order_no (DA label: Order No.)
The integer value that determines the order in which search forms are displayed in the search
menu in InfoArchive GUI. Search forms are displayed in ascending order of the assigned values.

eas_consumer_application (DA label: User application)
Restricts visibility to a specific consumer application; for example, eas_gui (InfoArchive GUI).

eas_language_code (DA label: Language code)
Language code in the format language_country (ISO 639, ISO 3166); for example, fr_FR for
French and zh_CN for Simplified Chinese.

eas_title (DA label: Title)
Title of the search form for the associated language/locale.

eas_description (DA label: Description)
Description of the search form.
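As an illustrative sketch of an AIC DQL predicate (the holding name MyHolding is an assumption, and the exact predicate syntax accepted by your repository should be verified), a predicate selecting a single holding could take a form like:

```
eas_cfg_holding where object_name = 'MyHolding'
```

Because eas_cfg_holding is a subtype of eas_cfg_aic, such a predicate satisfies the requirement that the AIC DQL predicate select eas_cfg_aic objects.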
Configuring the Search Results
Search results are rendered in InfoArchive GUI using an XML-based stylesheet conforming to
the predefined stylesheet schema eas_gui_stylesheet_1.2.xsd, which can be found in the
install/resources/xsd directory of the InfoArchive installation package. The stylesheet
must be imported into the repository as a stylesheet configuration (eas_cfg_stylesheet) object. The
stylesheet is specific to a schema and is linked to one or more search forms.
Configuring a Stylesheet Configuration (eas_cfg_stylesheet) Object
In DA, create an object of type eas_cfg_stylesheet in the holding folder (e.g., /System EAS/Archive
Holdings/MyHolding) and configure its properties.
eas_name (DA label: Name)
The name used to refer to this stylesheet in the InfoArchive configuration.

eas_schema (DA label: Schema name)
Indicates the result schema(s) that can be displayed by this stylesheet.

eas_search_form (DA label: Search form)
Indicates that this stylesheet can be used for displaying the results of a search submitted from
these forms.

eas_form_alias_id (DA label: Form Alias ID)
Resolves the references to search forms that can be defined in the <query> element of the
stylesheet.

eas_form_alias_pred (DA label: Form Alias Predicate)
DQL predicate that is executed to find the search form associated with the eas_form_alias_id
at the same index value.
Creating a Stylesheet
You create a stylesheet in an XML file and later import it into the repository as the content of the
stylesheet configuration (eas_cfg_stylesheet) object.
Stylesheet Components
The following components make up a stylesheet, each defining a portion of the search results:
• Main
Defines the main paged result table containing the AIUs returned in the result set. From a
configuration perspective, it is identical to a htable.
• Layout
There are three types of layout elements:
— htable
Displays a table containing any number of columns and rows
— vtable
Displays a table containing two columns where the left column contains labels and the right
column contains values
— detail
A collapsible panel that can contain any other layout
• Item
There are three types of items:
— value
Displays something from the XML of the result set
— image
Displays an image
— html
202
InfoArchive Configuration
Displays a piece of static HTML
• Modifier
A modifier modifies an item, for instance turning it into an external link or a link to download a
file. There are four types of modifiers:
— content
Creates a link to download a file
— link
Creates a custom external link
— query
Creates a link that triggers another search
— display
Creates a conditional display of an item or a modifier other than itself
You can use the cssclass and style attributes to apply custom styling to any component.
If an attribute name ends with "xpath", its value must be a valid XPath 2.0 expression. For a static
value, you must enclose the text in single quotes, for example: filename_xpath="'audio.mp3'"
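As a short sketch combining both points (the class name customCssClassA is assumed to be defined in the stylesheet's cssstyle element, and the attribute paths are illustrative):

```xml
<!-- cssclass applies a custom CSS class; filename_xpath carries a quoted static value -->
<g:column label="Recording" cssclass="customCssClassA">
  <g:content aip_xpath="FileName/@aipid" seqno_xpath="FileName/@seqno"
             filename_xpath="'audio.mp3'" type="DOWNLOAD">
    <g:html>Download</g:html>
  </g:content>
</g:column>
```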
Stylesheet Elements and Attributes
Stylesheets are defined using the <stylesheet> element. They should include an xmlns attribute
that points to the URN of the result set schema.
The main element and its children define how the result set XML is formatted and presented to the
user. The available stylesheet components are described below.

Main
Defines the main paged table presenting the result set.
Example:
<g:main path="[PATH TO AIU]">...</g:main>
Attributes:
• path: A simple path that selects a NodeSet (a list of objects) representing the rows of the table.
• style: CSS class name to apply.
Children: At least one Column.

Column
Defines one column of values in a table (htable or main).
Example:
<g:column label="First name"> </g:column>
Attributes:
• label: The name of the column header.
• cssclass: The custom CSS class to apply to the column.
• header_cssclass: The custom CSS class to apply to the column header.
Children: One of Link, Query, Content, Value, Image, Html, Details.

Details
Displays a collapsible panel containing other layout components.
Example:
<g:details collapsed_label="Attachments (show)" expanded_label="Attachments (hide)"> </g:details>
Attributes:
• expanded_label: The label of the expanded panel.
• collapsed_label: The label of the collapsed panel.
• style: Optional. CSS class name to apply.
• auto_expand_xpath: Optional. For non-top-level details, if the XPath evaluates to true, the
details panel is automatically expanded.
Children: Optional Html (before); one of Htable or Vtable; any number of Details or Display_if;
optional Html (after).

Vtable
Displays a table containing two columns, where the left column contains labels and the right
column contains values.
Example:
<g:vtable> </g:vtable>
Attributes:
• style: Optional. CSS class name to apply.
Children: At least one of Row, Display_if.

Row
Defines one labeled row in a vtable.
Example:
<g:row label="Customer ID"> ... </g:row>
Attributes:
• label: The label of the row.
Children: One of Link, Query, Content, Value, Image, Html.

Htable
Displays a table containing any number of columns and rows.
Example:
<g:htable path="Attachments/Attachment"> ... </g:htable>
Attributes:
• path: A simple path, relative to the parent table row node, that selects a NodeSet (a list of
objects) representing the rows of the table.
• style: Optional. CSS class name to apply.
Children: At least one Column.

Value
Displays a value from the result set XML; the value is extracted using an XPath expression
relative to the current node.
Example:
<g:value xpath="CreatedOnDate/text()" datatype="DATE" format="dd/MM/yyyy HH:mm:ss" />
Attributes:
• xpath: An XPath expression, relative to the current node, that evaluates to the value to display.
• datatype: The data type of the value: DATE, DATETIME, TEXT, or NUMBER.
• format: Optional. How to format the value; applies only to DATE, DATETIME, and NUMBER.
The format is specified according to java.text.SimpleDateFormat and java.text.DecimalFormat,
respectively.
• input_format: Optional. How to parse the value in the XML; applies only to DATE, DATETIME,
and NUMBER. The format is specified according to java.text.SimpleDateFormat and
java.text.DecimalFormat, respectively.
• style: Optional. CSS class name to apply.
Children: None.

Image
Displays an image.
Example:
<g:image url_xpath="'img/defaultContent.png'" width="16" height="16" alt_xpath="'alt'" />
Attributes:
• url_xpath: An XPath that evaluates to the URL of the image.
• width: The width of the image.
• height: The height of the image.
• alt_xpath: An XPath that evaluates to the alternate text of the image.
• style: Optional. CSS class name to apply.
Children: None.

Html
Displays a piece of plain static HTML.
Example:
<g:html> <![CDATA[ This is <b>HTML</b>]]> </g:html>
Attributes: None.
Children: None.

Link
Creates an external link (i.e., an element with the href attribute set).
Example:
<g:link url_xpath="concat('phone:', CallToPhoneNumber/text())" style="phonelink"
target="_notarget"> </g:link>
Attributes:
• url_xpath: An XPath that evaluates to the URL of the link.
• target: Optional. The target of the link.
• style: Optional. CSS class name to apply.
Children: One of Value, Image, Html.

Query
Creates a link that triggers another search. Note that schema, aic, and delivery_channel are
resolved against the form, and eas_form_alias_pred corresponds to the id attribute. Only the
values returned by those predicates are valid, and the attributes on the query can only restrict
this to a single value.
Example:
<g:query id="form1" mode="replace"> ... </g:query>
Attributes:
• id: The id of a form alias corresponding to the eas_form_alias_pred attribute of the
eas_cfg_stylesheet object.
• mode: The mode of the query; currently only replace.
• schema: Optional, if a specific schema should be used.
• schema_version: Optional, if a specific schema version should be used.
• aic: Optional, if a specific collection should be queried.
• delivery_channel: Optional, if a specific delivery channel should be used.
Children: A Criteria element, plus one of Value, Image, Html.

Criteria
Specifies search criteria; the sub-criteria are treated as if contained in a conjunction (logical AND).
Example:
<g:criteria> ... </g:criteria>
Attributes: None.
Children: At least one of or, and, arg.

or
Specifies a disjunction of search criteria.
Example:
<g:or> ... </g:or>
Attributes: None.
Children: At least one of or, and, arg.

and
Specifies a conjunction of search criteria.
Example:
<g:and> ... </g:and>
Attributes: None.
Children: At least one of or, and, arg.

arg
Specifies a search criterion.
Example:
<g:arg name="CustomerID" value_xpath="CustomerID/text()" operator="Equal"
datatype="STRING" reminder_label="Customer ID"/>
Attributes:
• name: The name of the search criterion; it corresponds to a name in the query configuration.
• value_xpath: The value of the search criterion.
• operator: The operator to use; depends on the datatype.
• datatype: The type of the value, used to determine the formatting of the reminder.
• reminder_label: The label of the criterion reminder.
• reminder_format: Optional. The format of the reminder.
• reminder_input_format: Optional. The format of the value.
Children: None.

Content
Creates a link to download or show a content file. Note that to leverage the content control, the
query configuration must be configured to add the content information to the result set.
Example:
<g:content aip_xpath="FileName/@aipid" seqno_xpath="FileName/@seqno"
type="DOWNLOAD"> ... </g:content>
or
<g:content cid_xpath="FileName/@cid" seqno_xpath="FileName/@seqno"
type="DOWNLOAD"> ... </g:content>
Attributes:
• cid_xpath: The "AIP id:ci:sequence number" value extracted using an XPath expression. The
cid value must have the same format as the ID used to retrieve the content with the web
services: 0000000000000000:CI:1.
• aip_xpath: The AIP ID value extracted using an XPath expression.
• seqno_xpath: The sequence number of the content item, extracted using an XPath expression.
• filename_xpath: An XPath that evaluates to the filename to use for the file.
• type: DOWNLOAD or SHOW, depending on whether the content should be downloaded or
shown in the browser. (The difference lies in the content disposition, attachment vs. inline, of
the header in the reply to the HTTP request; ultimately, it is up to the browser what action
is taken.)
• style: Optional. CSS class name to apply.
Children: One of Value, Image, Html.

Display_if
A special element that can wrap other elements and, depending on the value of an XPath, either
hide or show those elements. Display_if can wrap:
• Child details
• Columns in a htable
• Rows in a vtable
• Items
Example:
<g:display_if condition_xpath="Attachments/Attachment"> ... </g:display_if>
Attributes:
• condition_xpath: An XPath that should evaluate to a boolean, which is used to determine the
visibility of the child elements.
Children: Depend on the context the display_if occurs in; it can only contain children that would
be valid if the display_if wasn't there.
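As a short sketch of conditional display, assuming the phonecalls result schema used elsewhere in this chapter, a display_if can suppress an Attachments panel for calls that have no attachments:

```xml
<!-- The details panel renders only when at least one Attachment node exists -->
<g:display_if condition_xpath="Attachments/Attachment">
  <g:details collapsed_label="+ Attachments" expanded_label="- Attachments">
    <g:htable path="Attachments/Attachment">
      <g:column label="Name">
        <g:value xpath="AttachmentName/text()" datatype="STRING"/>
      </g:column>
    </g:htable>
  </g:details>
</g:display_if>
```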
In the following example:
• The title element defines the title of the stylesheet, used by the UI as the “header” of the result set
page
• The path element defines the path to the AIUs relative to the root node. Note that this is a simple
path and not a full XPath.
• In the cssstyle element, you can define custom CSS classes.
<g:stylesheet xmlns:g="urn:eas:xsd:stylesheet.1.2"
xmlns="urn:eas-samples:en:xsd:phonecalls.1.0">
<g:title>PhoneCalls</g:title>
<g:cssstyle>.customCssClassA { background-color: red; }
.customCssClassB { background-color: blue; }</g:cssstyle>
<g:main path="Call">
...
</g:main>
</g:stylesheet>
Namespaces in the Stylesheet
It is important that you declare the namespaces correctly in the stylesheet. At least two namespaces
must be defined: the namespace of the stylesheet and the namespaces of the result set XML. The
result set namespace must be used in the XPath expressions in the stylesheet. A good practice is to
declare the stylesheet namespace with the g prefix and the result set namespace (if there is only
one) with the blank prefix; for example:
<?xml version="1.0" encoding="UTF-8"?>
<g:stylesheet xmlns:g="urn:eas:xsd:stylesheet.1.2"
xmlns="urn:eas-samples:en:xsd:phonecalls.1.0">
</g:stylesheet>
Stylesheet Example
The code presented below shows a complete example of a stylesheet used to present the search results
of the phonecalls example data. It uses all the available controls and, in addition, shows how to handle
a variable number of associated content files by presenting them in a nested htable.
<?xml version="1.0" encoding="UTF-8"?>
<g:stylesheet xmlns:g="urn:eas:xsd:stylesheet.1.2" xmlns="urn:eas-samples:en:xsd:
phonecalls.1.0" xmlns:eas="urn:x-emc:eas:schema:pdi">
<g:title>Phonecalls</g:title>
<g:main path="Call">
<g:column label="">
<g:details expanded_label="-" collapsed_label="+">
<g:vtable>
<g:row label="Customer ID">
<g:value xpath="CustomerID/text()" format="" datatype="STRING" />
</g:row>
<g:row label="Last Name">
<g:value xpath="CustomerLastName/text()" format="" datatype="STRING" />
</g:row>
<g:row label="First name">
<g:value xpath="CustomerFirstName/text()" format="" datatype="STRING" />
</g:row>
<g:row label="Sent to archive at">
<g:value xpath="SentToArchiveDate/text()" format="dd/MM/yyyy HH:mm:ss" datatype="DATE" />
</g:row>
<g:row label="Call started at">
<g:value xpath="CallStartDate/text()" format="dd/MM/yyyy HH:mm:ss" datatype="DATE" />
</g:row>
<g:row label="Call ended at">
<g:value xpath="CallEndDate/text()" format="dd/MM/yyyy HH:mm:ss" datatype="DATE" />
</g:row>
<g:row label="Call from phone number">
<g:link url_xpath="concat('phone:',CallFromPhoneNumber/text())" style="phonelink"
target="_notarget">
<g:value xpath="CallFromPhoneNumber/text()" format="" datatype="STRING" />
</g:link>
</g:row>
<g:row label="Call to phone number">
<g:link url_xpath="concat('phone:',CallToPhoneNumber/text())" style="phonelink"
target="_notarget">
<g:value xpath="CallToPhoneNumber/text()" format="" datatype="STRING" />
</g:link>
</g:row>
<g:row label="Representative ID">
<g:value xpath="RepresentativeID/text()" format="" datatype="STRING" />
</g:row>
<g:row label="Search">
<g:query id="form1" mode="replace">
<g:value xpath="'Search for calls'" format="" datatype="STRING" />
<g:criteria>
<g:arg name="CallStartDate" value_xpath="'2000-01-01T08:00:00.000+01:00'"
operator="GreaterOrEqual" datatype="DATE" reminder_label="From date"></g:arg>
<g:arg name="CallStartDate" value_xpath="current-dateTime()" operator="LessOrEqual"
datatype="DATE" reminder_label="To date"></g:arg>
<g:arg name="CustomerID" value_xpath="CustomerID/text()" operator="Equal"
datatype="STRING" reminder_label="Customer ID"></g:arg>
</g:criteria>
</g:query>
</g:row>
</g:vtable>
<g:details collapsed_label="+ Attachments" expanded_label="- Attachments">
<g:htable path="Attachments/Attachment">
<g:column label="File">
<g:content cid_xpath="FileName/@eas:cid" filename_xpath="AttachmentName/text()"
type="SHOW">
<g:image url_xpath="'img/defaultContent.png'" width="16" height="16" alt_xpath="'alt'" />
</g:content>
</g:column>
<g:column label="Name">
<g:value xpath="AttachmentName/text()" datatype="STRING" />
</g:column>
<g:column label="Created by">
<g:value xpath="CreatedBy/text()" datatype="STRING" />
</g:column>
<g:column label="Created on">
<g:value xpath="CreatedOnDate/text()" datatype="DATE" format="dd/MM/yyyy HH:mm:ss" />
</g:column>
</g:htable>
</g:details>
</g:details>
</g:column>
<g:column label="Call received at">
<g:value xpath="CallStartDate/text()" datatype="DATE" format="dd/MM/yyyy HH:mm:ss" />
</g:column>
<g:column label="ID">
<g:value xpath="CustomerID/text()" datatype="NUMBER" />
</g:column>
<g:column label="First name">
<g:value xpath="CustomerFirstName/text()" datatype="STRING" />
</g:column>
<g:column label="Last name">
<g:value xpath="CustomerLastName/text()" datatype="STRING" />
</g:column>
<g:column label="Call ended at">
<g:value xpath="CallEndDate/text()" datatype="DATETIME" format="yyyy/MM/dd HH:mm:ss" />
</g:column>
<g:column label="Audio">
<g:content cid_xpath="Attachments/Attachment[AttachmentName='recording']/FileName/@eas:cid"
filename_xpath="'audio.mp3'" type="SHOW" style="btn">
<g:html>Listen</g:html>
</g:content>
</g:column>
</g:main>
</g:stylesheet>
Localizing a Stylesheet
Like the search form, the search results screen can also be localized into multiple languages through
its stylesheet. The locale you choose at InfoArchive GUI login is applied to both search forms and
search results pages.
You localize a stylesheet in much the same way as you localize a search form. In the configuration
of a localized stylesheet, the labels of the form controls are externalized. All the localized resources
(page title and labels) are stored in a localization properties file, one for each language. The
localization properties file is imported into the stylesheet configuration (eas_cfg_stylesheet) object
as one of its renditions. It is applied to the corresponding language version of the stylesheet in
InfoArchive through a language code defined for the application in the stylesheet configuration.
To localize a stylesheet into a non-English language:
1. Edit the properties of the stylesheet configuration (eas_cfg_stylesheet) object. Under the
Description tab, add the language code to be supported by the InfoArchive GUI application, if
the language code has not been added yet. (eas_gui is the application code for InfoArchive GUI.)
2. Externalize the localization resources such as the page title and labels.
Edit the XML content of the stylesheet configuration (eas_cfg_stylesheet) object to replace actual
page title and labels with string variables; for example:
• Externalized row header:
<g:row label="${customer.id.label}">
<g:value xpath="CustomerID/text()" format="" datatype="STRING"/>
</g:row>
• Externalized column header:
<g:column label="${created.on.label}">
<g:value xpath="CreatedOnDate/text()" datatype="DATE" format="dd/MM/yyyy HH:mm:ss" />
</g:column>
3. Create a localization properties file containing translations of the externalized resources (page
title and labels) in the target language. The localization file must be encoded in UTF-8 and its
file name must be suffixed with the language code and have the .properties extension, such as
stylesheet.PhoneCalls.zh_CN.properties.
In the following example localization properties file, strings translated into the Simplified Chinese
language are assigned to the resource variables.
title=通话记录
customer.id.label=客户 ID
search.label=搜索
attachments.label=附件
file.label=文件
name.label=名称
created.by.label=创建人
created.on.label=创建日期
call.received.at.label=通话时间
id.label=ID
audio.label=音频
play.label=播放
4. Import the localization properties file as a rendition of the stylesheet configuration
(eas_cfg_stylesheet) object.
a. Right-click the eas_cfg_stylesheet object and choose View > Renditions from the shortcut
menu.
b. With the existing XML Document rendition selected, choose Import Rendition from the
File menu.
c. Select the localization properties file to import.
d. If the localization properties file conforms to the correct naming convention and file format
(.properties), InfoArchive recognizes its format as eas_localization (localization
properties file) and automatically fills out the Format field.
e. In the Page modifier field, enter the language code, such as zh_CN for Simplified Chinese.
f. Click OK. The imported file is displayed as a rendition of the stylesheet configuration
(eas_cfg_stylesheet) object.
The stylesheet is localized. When you view the search results pages in InfoArchive GUI using the
corresponding locale, the page title and labels will all appear in the target language.
When the localization property file is missing for a specific locale, the default one defined in the
eas-services.properties file (eas.locale.default=en_US) is used.
Implementing InfoArchive GUI Single Sign-On (SSO)
If you use a third-party authentication server (LDAP, Active Directory, and so on) for user login, you
may want to implement single sign-on (SSO) and handle user context variables using a custom JSP
file, instead of creating authorized users in the InfoArchive GUI web application. You can create a JSP
file that calls the external authentication system to authenticate users logging in to InfoArchive GUI.
In addition to the username and password, you can include other context variables in the JSP file,
such as the email address and client IP address. These context variables are added to InfoArchive event
properties for auditing purposes.
Here is an example of the custom JSP file for SSO:
<%@ page import="com.emc.documentum.eas.gui.gwt.server.EASGUIApplication" %>
<%@ page import="java.util.Map" %>
<%@ page import="java.util.HashMap" %>
<%@ page import="com.emc.documentum.eas.gui.gwt.server.webservices.EASAccessLayer" %>
<%@ page import="com.emc.documentum.eas.service.model.authenticate.AuthenticationResponse" %>
<%@ page import="com.emc.documentum.eas.gui.gwt.server.web.jsp.EASGuiJspUtils" %>
<%
String user = "eas_usr_webservice";
String password = "eas_usr_webservice";
Map<String, String> variables = new HashMap<String, String>();
// use variables.put() to include additional variables
variables.put("key1", "value1");
variables.put("key2", "value2");
AuthenticationResponse authenticationResponse = EASAccessLayer.login(user, password);
EASGUIApplication.getAuthenticationHook().onUserAuthenticated(user, null,
    authenticationResponse.getProfile(), variables);
response.sendRedirect(EASGuiJspUtils.getWelcomeURL(request));
%>
After you deploy the custom JSP file in the InfoArchive GUI directory, you can access InfoArchive
GUI via this page; for example:
http://myhost:8080/eas-gui/testsso.jsp
InfoArchive Reserved Context Variables
The InfoArchive GUI application has the following reserved context variables:
• useragent
• remote.addr
• remote.host
• remote.user
You can audit these variables, which contain authentication and connection information. For example,
you can enable the auditing of remote.addr by adding the following in eas-gui.properties:
eas.audit.remote.addr=true
After auditing is enabled, you can see the remote address information added as properties of an
eas_query or eas_order event:
eas_consumer_ctxvar_name  [0]: remote.addr
eas_consumer_ctxvar_value [0]: 127.0.0.1
Note: When the audit option is set to true in eas-gui.properties, the variables.put()
calls for the reserved context variables in the JSP file are no longer effective.
Therefore, do not add InfoArchive reserved variables with hard-coded values in the custom JSP file
for auditing purposes; enable reserved context variable auditing in eas-gui.properties instead.
Custom Context Variables
You can add custom variables such as DNS, email, and other authentication or connection information
in the JSP file for SSO. The information is later added to each query or order event related to an SSO.
For example, if you have the following in the JSP file:
variables.put("DNS", "10.27.0.10");
variables.put("Email", "Admin@InfoArchive.com");
The query or order event related to the SSO will have the following properties, which can be used
for auditing purposes.
eas_consumer_ctxvar_name  [0]: DNS
eas_consumer_ctxvar_value [0]: 10.27.0.10
eas_consumer_ctxvar_name  [1]: Email
eas_consumer_ctxvar_value [1]: Admin@InfoArchive.com
Enabling Security Policies
By default, security policies are disabled for InfoArchive. You can enable security by choosing a
security policy for your system. Enabling a security policy for InfoArchive consists of two configuration steps:
• Configuration on web services
• Configuration on web GUI
Typically, you perform the security configuration tasks after deploying WAR files for web services
and web GUI. InfoArchive provides the following security policies:
• Clear-text username/password policy
• X.509 policy
• Customized policies
Clear-text Username/Password Policy
The clear-text username/password policy has several options. The basic option is to validate the
username and password saved in a file. In addition, you can add extra validation mechanisms to this
policy, such as Documentum Content Server validation or LDAP validation.
Configuration on Web Services
1. Modify TOMCAT_HOME/webapps/eas-services/WEB-INF/wsdl/eas-service-ws-policies.wsdl.
a. Remove the comment tags (<!-- -->) for the Clear-text Username/Password Policy
section.
b. Save the file.
Note: You can have only one active security policy for web services. Therefore, whenever you
remove comment tags in eas-service-ws-policies.wsdl, you must add comment tags
for the currently active policy to disable it.
2. Modify TOMCAT_HOME/webapps/eas-services/WEB-INF/cxf-context.xml.
a. Locate the jaxws:endpoint Properties for file-based Username/Password
checking, jaxws:endpoint Properties for Documentum Username/Password
checking, or jaxws:endpoint Properties for LDAP Username/Password
checking section, and copy the whole <entry> element in the section to each
<jaxws:properties> section in the file.
b. Examples of completed sections:
• A completed <jaxws:properties> section for the file-based validation should look
like the following:
<jaxws:properties>
<entry key="ws-security.callback-handler"
value="com.emc.documentum.eas.webservice.security.UsernamePasswordCallbackHandler"/>
</jaxws:properties>
• A completed <jaxws:properties> section for the Documentum Content Server
validation should look like the following:
<jaxws:properties>
<entry key="ws-security.ut.validator"
value-ref="documentumCredentialValidator"/>
</jaxws:properties>
• A completed <jaxws:properties> section for the LDAP validation should look like
the following:
<jaxws:properties>
<entry key="ws-security.ut.validator"
value-ref="ldapCredentialValidator"/>
</jaxws:properties>
c. If you choose Documentum or LDAP validation, you must also remove the comment tags
for the corresponding <bean> section. For example, remove the comment tags (<!-- -->)
in the following section:
<!--
<bean id="documentumCredentialValidator"
class="com.emc.documentum.eas.webservice.security.DocumentumCredentialValidator"/>
-->
d. Save cxf-context.xml.
3. Add the username and the password to TOMCAT_HOME/webapps/eas-services/WEB-INF
/classes/uname_token.dat. On the web services side, you can add multiple username and
password combinations, with each pair of username and password for a front-end application.
If you choose the LDAP validation for the username and the password, you must also complete the
following steps:
1. Create a jaas.config file on the file system for LdapLoginModule. The file must contain
the following information:
easldap {
com.sun.security.auth.module.LdapLoginModule REQUIRED
userProvider="ldap://LDAP_server_IP:LDAP_Port/ou=users,dc=mycompany,dc=com"
authIdentity="cn=USERNAME,ou=users,dc=mycompany,dc=com"
useSSL=false;
};
For a full list of parameters contained in this configuration file, refer to the Class
LdapLoginModule reference page.
2. Ensure the web application server is restarted with
-Djava.security.auth.login.config=<absolute path to jaas.config>.
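On Apache Tomcat, for example, one common way to pass this JVM argument is through the setenv script; the jaas.config path below is illustrative:

```shell
# TOMCAT_HOME/bin/setenv.sh (create the file if it does not exist)
# Point the JVM at the JAAS configuration before Tomcat starts.
export JAVA_OPTS="$JAVA_OPTS -Djava.security.auth.login.config=/opt/infoarchive/jaas.config"
```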
Configuration on Web GUI
The only configuration task to perform is to add the username and the password to
TOMCAT_HOME/webapps/eas-gui/WEB-INF/classes/uname_token.dat. On the web GUI
side, you must add only one pair of username and password.
X.509 Certificates
Configuration on Web Services
1. Modify TOMCAT_HOME/webapps/eas-services/WEB-INF/wsdl/eas-service-ws-policies.wsdl.
a. Remove the comment tags (<!-- -->) for the X.509 Policy section.
b. Save the file.
Note: You can have only one active security policy for web services. Therefore, whenever you
remove comment tags in eas-service-ws-policies.wsdl, you must add comment tags
for the currently active policy to disable it.
2. Modify TOMCAT_HOME/webapps/eas-services/WEB-INF/cxf-context.xml.
a. Locate the jaxws:endpoint Properties for X.509 checking section, and copy the
whole <entry> element in the section to each <jaxws:properties> section in the file.
b. A completed <jaxws:properties> section for X.509 checking should look like
the following:
<jaxws:properties>
<entry key="ws-security.callback-handler"
value="com.emc.documentum.eas.webservice.security.X509PasswordCallbackHandler"/>
<entry key="ws-security.encryption.properties"
value="WEB-INF/serviceKeystore.properties"/>
<entry key="ws-security.signature.properties"
value="WEB-INF/serviceKeystore.properties"/>
<entry key="ws-security.encryption.username"
value="useReqSigCert"/>
</jaxws:properties>
c. Save the file.
3. Create keystores for InfoArchive web services and each client with an X.509 v3 certificate. Each
certificate contains a new private key, using the RSA algorithm for key generation and the
SHA1 with RSA algorithm for the certificate signature.
4. Import the certificate with the public key of InfoArchive web services into each client keystore,
and the public key of each client into the InfoArchive web services keystore.
5. Place the InfoArchive web services keystore in a safe directory on the server where InfoArchive
web services are hosted.
6. Modify the TOMCAT_HOME/webapps/eas-services/WEB-INF/serviceKeystore
.properties file with the correct values for the InfoArchive web services keystore:
org.apache.ws.security.crypto.provider=<cryptography provider>
org.apache.ws.security.crypto.merlin.keystore.file=<path to keystore>
org.apache.ws.security.crypto.merlin.keystore.password=
<service keystore password>
org.apache.ws.security.crypto.merlin.keystore.type=<keystore type>
org.apache.ws.security.crypto.merlin.keystore.alias=
<service private key alias>
org.apache.ws.security.crypto.merlin.keystore.private.password=
<service private key password>
• cryptography provider is the class providing the cryptography implementation. The default is
org.apache.ws.security.components.crypto.Merlin.
• path to keystore is the absolute path to the keystore file containing the InfoArchive web services
private key; for example, C:\path_to_keystores\serviceKeystore.jks.
• service keystore password is the password protecting the keystore file.
• keystore type is the type of the keystore file; for example, jks.
• service private key alias is the alias of the InfoArchive web services private key.
• service private key password is the password protecting the InfoArchive web services private key.
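A filled-in serviceKeystore.properties might therefore look like the following; the path, alias, and password values shown are illustrative:

```
org.apache.ws.security.crypto.provider=org.apache.ws.security.components.crypto.Merlin
org.apache.ws.security.crypto.merlin.keystore.file=C:/path_to_keystores/serviceKeystore.jks
org.apache.ws.security.crypto.merlin.keystore.password=keystore_password
org.apache.ws.security.crypto.merlin.keystore.type=jks
org.apache.ws.security.crypto.merlin.keystore.alias=eas_service
org.apache.ws.security.crypto.merlin.keystore.private.password=private_key_password
```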
Configuration on Web GUI
1. Create a .java file with the following example code. The service key alias in the last line can be
found in serviceKeystore.properties in Step 6 of the Configuration on Web Services section.
package com.emc.documentum.eas.gui.gwt.server.webservices;
import org.apache.cxf.endpoint.Client;
public class CustomSecurityConfigurator implements SecurityConfigurator
{
@Override
public void configureSecurity(Client client) throws Exception {
client.getRequestContext().put("ws-security.callback-handler",
"com.emc.documentum.eas.gui.gwt.server.webservices.X509PasswordCallbackHandler");
client.getRequestContext().put("ws-security.encryption.properties",
"clientKeystore.properties");
client.getRequestContext().put("ws-security.signature.properties",
"clientKeystore.properties");
client.getRequestContext().put("ws-security.encryption.username",
"<service key alias>");
}
}
2. Compile the .java file, then place the compiled .class file in TOMCAT_HOME/webapps/eas
-gui/WEB-INF/classes/com/emc/documentum/eas/gui/gwt/server/webservices.
Customizing a Security Policy
You can also customize a security policy for InfoArchive web services. Enabling a customized policy
for InfoArchive consists of configuration tasks on web services and web GUI.
On the web services side, you must perform the following tasks.
1. Add the customized policy to TOMCAT_HOME/webapps/eas-services/WEB-INF/wsdl/eas
-service-ws-policies.wsdl, and ensure the customized policy is the only active one. The
active policy must have the attribute wsu:Id="ActivePolicy", and the other policies must be
disabled.
2. Add the customized policy to the Apache CXF configuration file cxf-context.xml.
On the web GUI side, you must create a customized CustomSecurityConfigurator class and place
the .class file in the specified directory.
The following code sample outlines a CustomSecurityConfigurator class. As with the X.509
configuration, you must place the compiled class in TOMCAT_HOME/webapps/eas-gui/WEB-INF
/classes/com/emc/documentum/eas/gui/gwt/server/webservices.
package com.emc.documentum.eas.gui.gwt.server.webservices;
import org.apache.cxf.endpoint.Client;
public class CustomSecurityConfigurator implements SecurityConfigurator {
@Override
public void configureSecurity(Client client) throws Exception {
// CXF Configuration goes here.
}
}
The following websites provide detailed information about security policies:
• Apache CXF
• WS-Security configuration
• WS-SecurityPolicy configuration
Customizing InfoArchive GUI
The installation of InfoArchive includes a standard InfoArchive web GUI. You can customize UI
strings and CSS on the web GUI.
UI strings
Files in the folder TOMCAT_HOME/webapps/eas-gui/WEB-INF/classes/l10n contain UI strings
that can be customized to meet your requirements. These properties files are encoded with ANSI
character sets; use an ANSI-compatible text editor to avoid introducing garbage characters.
CSS
You can define the style for your InfoArchive GUI in eas-gui/custom.css, which is an empty
file after deployment.
You no longer need to add a reference to the customized CSS in local JSP files. The JSP files already
contain the following link, which refers to custom.css:
<link type="text/css" rel="stylesheet" href="custom.css?v=1"/>
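For example, a minimal custom.css could adjust colors globally. The selectors below are generic HTML elements, since the GUI's own class names are not documented here; adapt them to the actual GUI markup:

```
/* Illustrative overrides only; inspect the rendered pages for real selectors. */
body {
    background-color: #f4f4f4;
    font-family: Arial, sans-serif;
}
a {
    color: #006699;
}
```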
Autocomplete
You can activate the autocomplete feature on the InfoArchive login page by setting the
autocomplete value to ON in eas-gui/WEB-INF/pages/LoginPage.jsp.
When the autocomplete feature is turned on, login fields are populated automatically after your
first login.
Configuring Advanced InfoArchive GUI Settings
The eas-gui.properties file located in the InfoArchive GUI web application directory (for
example, TOMCAT_HOME/webapps/eas-gui/WEB-INF/classes on Apache Tomcat) contains
preset InfoArchive web services settings and some advanced settings that you can configure if needed.
You can edit the eas-gui.properties file to modify the following advanced settings:
• eas.cache.directory: InfoArchive GUI cache directory. Default: /tmp/eas
• eas.locale.default, eas.locales: Properties for defining InfoArchive locales for GUI search forms and stylesheets (search results pages); see Defining InfoArchive GUI Locales, page 195. Default: en_US
• eas.client.config.default.dateformat: Default date format used by the GUI client. Default: yyyy-MM-dd
• eas.client.result.page.size: Maximum number of search results that can be displayed on a synchronous search results page. Default: 10
• eas.client.order.page.size: Maximum number of search results that can be displayed on an asynchronous/background search results page. Default: 10
• eas.export.content.quota: Maximum total size in MB of unstructured content files that can be exported in a single operation. Default: 200
• eas.cache.clean.shutdown: Whether or not the entire cache is cleared on application shutdown (true or false). Default: true
• eas.cache.clean.logout: Whether or not a session's cache is cleared on logout (true or false). Default: true
• eas.xdb.page.size: Size of the xDB page used for the cache. Default: 8192
• eas.client.order.delivery.channels: Delivery channels used by orders (asynchronous searches). You can define multiple channels in addition to eas_access_services, delimited by commas; for example: eas.client.order.delivery.channels=eas_access_services,my_delivery_channel_1,my_delivery_channel_2
After you modify the settings, restart the web application for the changes to take effect.
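For example, a few overrides in eas-gui.properties might look like the following; the values and the custom delivery channel name are illustrative:

```
eas.cache.directory=/var/cache/eas
eas.client.result.page.size=25
eas.export.content.quota=500
eas.client.order.delivery.channels=eas_access_services,my_delivery_channel_1
```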
Configuring Confirmations
You configure confirmations using the confirmation configuration object (eas_cfg_confirmation) in
conjunction with the query configuration object (eas_cfg_query) and the delivery channel object
(eas_cfg_delivery_channel).
Here are the general steps for configuring confirmations:
1. Configure a delivery channel configuration object for confirmations.
2. Configure a query configuration object for confirmations.
3. Optionally, define the criteria for applicable AIPs.
4. Configure a confirmation configuration object.
Confirmation Configuration (eas_cfg_confirmation)
Object
The confirmation configuration object primarily determines the scope of confirmation
processing—which set of AIPs to generate confirmations for (defined by an XQuery text file imported
as the content of the confirmation configuration object) for what event types.
The confirmation configuration object also specifies:
• The result schema name used to find the query configuration object, which defines the content
of confirmation messages (through an XQuery XML file imported as the content of the query
configuration object)
• The delivery channel object to use that defines the output destination of confirmation messages.
The confirmation configuration object also passes delivery parameters to the delivery channel
object.
Query Configuration Object (eas_cfg_query) for
Confirmations
You configure a query configuration object (eas_cfg_query) to define the content of confirmation
messages.
The XML content of the query configuration object defines:
• The path of the XML elements to return
• The configured XQuery to execute on the selected XML elements for adjusting the results
The properties of the query configuration object define:
• The holdings against which the confirmation job runs
• The schema URN to which the query results must comply
The confirmation job uses the following criteria to find a query configuration object:
• The holding of the AIP
• The schema URN of the confirmation configuration object
If there is no query configuration object that matches these criteria, the confirmation job does the
following and exits with a non-zero return code:
• Writes an error message in the log
• Attempts to generate the remaining confirmations applicable to the AIP event
• Does not set the confirmation timestamp for the AIP event
• Continues the processing for other AIP events
The confirmation job will attempt to generate the confirmations for this AIP event again the next time
it runs. Confirmation messages that were already successfully generated for this AIP event will
be regenerated at the next job run.
XQuery for Confirmations
The XQuery content of the query configuration object for confirmations defines the content of
confirmation messages. During confirmation processing, the confirmation job executes the XQuery
on the SIP descriptor (eas_sip.xml) or the PDI (eas_pdi.xml) rendition of the AIP and returns
the query results in the designated format to construct the confirmation message.
The default execution scope of the XQuery depends on the event type:
• receipt: SIP descriptor (eas_sip.xml); scope changeable: no
• storage: PDI file (eas_pdi.xml); scope changeable: yes
• purge: PDI file (eas_pdi.xml); scope changeable: yes
• reject: SIP descriptor (eas_sip.xml); scope changeable: no
• invalid: SIP descriptor (eas_sip.xml); scope changeable: no
For storage and purge event types, the XQuery can use any information stored in the AIP structured
data—eas_pdi.xml and eas_ci.xml.
The XQuery can return results in an XML file or in other file formats such as .csv or fixed-record files.
A fixed-record file is a file containing records requested by a business application, where each
field has a constant maximum size; for example, info1 from position 1 to 10, info2 from position 11 to
15, info3 from position 16 to 20, and so on. Each field has a predefined layout.
Fixed-record files are often used to generate confirmations with data produced by a mainframe
application. For example, you can write an XQuery that returns a subset of the keys of archived documents
along with their associated InfoArchive content identifiers in fixed-record files. A business application
reading the confirmation message can then use the content identifier to retrieve content without having
to issue an InfoArchive search first; the content is extracted directly from the eas_ci_container rendition,
using the ci position.
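The fixed-record layout described above can be sketched in a few lines of Java; the field values and widths here are invented for illustration, not taken from an actual InfoArchive confirmation:

```java
class FixedRecord {
    // Pads a value with trailing spaces, or truncates it, so that it occupies
    // exactly `width` character positions in the record.
    static String fixed(String value, int width) {
        if (value.length() >= width) {
            return value.substring(0, width);
        }
        StringBuilder sb = new StringBuilder(value);
        while (sb.length() < width) {
            sb.append(' ');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // info1 occupies positions 1-10, info2 positions 11-15, info3 positions 16-20,
        // so every record is exactly 20 characters long regardless of field content.
        String record = fixed("2011020113", 10) + fixed("CC", 5) + fixed("inv", 5);
        System.out.println("[" + record + "]");
    }
}
```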
Dynamic variables can also be used in the XQuery for confirmations. In the following example, the
AIP event is defined as an external variable ($eas_conf_type):
<request-configs xmlns:s="urn:x-emc:eas:schema:sip:1.0">
<request-config>
<entity>
<path/>
</entity>
<query-template xmlns:xhive="http://www.x-hive.com/2001/08/xquery-functions">
declare variable $eas_conf_type as xs:string external;
<confirmation xmlns='urn:x-emc:eas:schema:confirmation:1.0'> {
let $uriSip := root(<select/>)
let $uriSip := replace(document-uri($uriSip), '\.pdi$', '.sip')
return (doc($uriSip)/s:sip, element eas_aip_id {xhive:metadata(doc($uriSip),
'eas_aip_id')}, element eas_conf_type {$eas_conf_type}) }
</confirmation>
</query-template>
</request-config>
</request-configs>
Here is an example of the confirmation messages returned by the XQuery:
<confirmation xmlns="urn:x-emc:eas:schema:confirmation:1.0">
<sip xmlns="urn:x-emc:eas:schema:sip:1.0">
<dss>
<holding>PhoneCalls</holding>
<id>2011020113</id>
<pdi_schema>urn:eas-samples:en:xsd:phonecalls.1.0</pdi_schema>
<pdi_schema_version/>
<production_date>2011-02-01T00:00:00.000+01:00</production_date>
<base_retention_date>2011-02-01T00:00:00.000+01:00</base_retention_date>
<producer>CC</producer>
<entity>PhoneCalls</entity>
<priority>0</priority>
<application>CC</application>
</dss>
<production_date>2011-02-01T00:00:00.000+01:00</production_date>
<seqno>1</seqno>
<is_last>true</is_last>
<aiu_count>10</aiu_count>
<page_count>0</page_count>
</sip>
<eas_aip_id>0800000b80009c9e80009ca3</eas_aip_id>
<eas_conf_type>invalid</eas_conf_type>
</confirmation>
Delivery Channel Configuration Object (eas_cfg_delivery_channel)
The delivery channel configuration object defines the output destination of confirmation messages.
InfoArchive provides two delivery channel implementations:
• com.emc.documentum.eas.delivery.module.FileDeliveryModule: outputs to the file system
• com.emc.documentum.eas.delivery.module.XdbDeliveryModule: outputs to xDB
Using the xDB delivery channel facilitates the aggregation of multiple messages using a custom job,
as is the case with the InfoArchive audit archive holding. If you use the xDB delivery channel,
confirmation messages must be in XML format.
The delivery channel configuration object accepts a set of parameters. You define the values of the
parameters to pass to the delivery channel configuration object when configuring the confirmation
configuration object.
Delivery Channel Configuration Parameters
Each type of delivery channel (file system and xDB) has a list of valid configuration parameters.
A parameter marked with an asterisk (*) is required.
File system delivery channel configuration object parameters are as follows:
• name*: Filename convention of confirmations: a pattern consisting of fixed parts and confirmation variables to construct a unique filename for each generated confirmation; for example: conf_%eas_conf_aip_id%_%eas_conf_type%
• file.path*: Full path to the destination output directory; for example: c:/tmp/infoarchive/confirmations. Make sure the specified directory exists.
• file.name.prefix: Prefix of the file name.
• file.name.suffix: Suffix of the file name, sometimes used to denote the file extension; for example: .xml
• file.overwrite (default: false): If the file already exists, whether or not to overwrite the existing file (true|false).
• file.zip (default: false): Whether to compress the file and add the .zip extension (true|false).
• file.audittrail (default: false): Whether to generate a confirmation file export audit trail entry (true|false).
• audit:event_name (default: eas_delivery): Event name to put in the confirmation file export audit trail entry.
• file.audittrail.content.attr (default: null): Attribute where the OID content is saved. If null, the OID content is not saved. Some electronic archiving regulations require generated confirmation messages to be archived. Setting the name of an id attribute of the audittrail object type to the file.audittrail.content.attr parameter triggers: the import of the generated confirmation file as an eas_audit_trail_content repository object, and the assignment of the eas_audit_trail_content object identifier to the designated id attribute of the audit trail. Archiving audit trails also archives the content associated with those audit trail entries. After archiving, InfoArchive destroys the eas_audit_trail_content objects in the repository.
• audit:audited_obj_id (default: AIP OID)
• audit:string_1|2|3|4|5: Audit strings: string_1: eas_aip_id; string_2: eas_dss_holding,eas_dss_producer,eas_dss_id; string_3: eas_aip_id:ci:position[:start_page:page_count]; string_4: pdi_key; string_5: audit
• audit:id_1|2|3|4|5: Audit IDs
xDB delivery channel configuration object parameters are as follows:
• eas_cfg_xdb_library*: Name of the eas_cfg_xdb_library object, which points to the xDB library that stores generated confirmations.
• name*: Naming convention of the XML documents created for confirmations: a pattern consisting of fixed parts and confirmation variables to construct a unique name for each generated confirmation; for example: conf_%eas_conf_aip_id%_%eas_conf_type%
• meta:metadata_name: The name of the xDB metadata to assign to each XML document. You can set a confirmation variable as the value of the metadata; for example, assign variable %eas_conf_datetime% to parameter meta:eas_conf_datetime.
The following confirmation variables can be used to construct filenames or XML document names for
confirmations and assign metadata to XML documents.
• eas_aip.property_name: Any property of the eas_aip object
• eas_conf_audittrail_id: r_object_id of the audit trail entry created for the current confirmation processing
• eas_conf_type: Confirmation event type
• eas_conf_datetime: Date and time when the processing of the current confirmation started
• eas_conf_event_datetime: Date and time when the confirmed event occurred
• eas_conf_aip_id: The AIP object ID
• eas_conf_aip_oid: r_object_id of the AIP object
• eas_conf_cfg_id: r_object_id of the confirmation configuration object (eas_cfg_confirmation)
• eas_conf_cfg: Object name of the confirmation configuration object (eas_cfg_confirmation)
• eas_conf_schema: eas_result_schema of the confirmation configuration object (eas_cfg_confirmation)
• eas_conf_schema_version: eas_result_schema_version property of the confirmation configuration object (eas_cfg_confirmation)
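To illustrate how a pattern such as conf_%eas_conf_aip_id%_%eas_conf_type% resolves to a concrete name, the following sketch expands %variable% placeholders from a map of confirmation variables; this is an illustrative re-implementation, not InfoArchive's actual resolver:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ConfirmationNames {
    private static final Pattern VAR = Pattern.compile("%([^%]+)%");

    // Replaces every %variable% in the pattern with its value from the map;
    // unknown variables expand to the empty string.
    static String expand(String pattern, Map<String, String> vars) {
        Matcher m = VAR.matcher(pattern);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = vars.get(m.group(1));
            m.appendReplacement(out, Matcher.quoteReplacement(value == null ? "" : value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> vars = new HashMap<String, String>();
        vars.put("eas_conf_aip_id", "2011020113");
        vars.put("eas_conf_type", "storage");
        System.out.println(expand("conf_%eas_conf_aip_id%_%eas_conf_type%", vars));
        // prints conf_2011020113_storage
    }
}
```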
Configuring a Delivery Channel Configuration Object
for Confirmations
1. In DA, navigate to a folder where you want to create the delivery channel configuration object.
You can use an archive holding folder (e.g., /System EAS/Archive Holdings/MyHolding)
if you are configuring the delivery channel exclusively for that holding, or a shared folder (e.g.,
/System EAS/Delivery Channels) if the delivery channel is to be shared by multiple
holdings.
2. Create a new object of type eas_cfg_delivery_channel and set its properties.
• eas_name (DA label: Name): The unique name of the delivery channel.
• eas_java_class (DA label: Delivery Java class): Fully qualified name of the Java class implementing the delivery channel. InfoArchive provides two delivery channel implementations: com.emc.documentum.eas.delivery.module.FileDeliveryModule for the file system delivery channel and com.emc.documentum.eas.delivery.module.XdbDeliveryModule for the xDB delivery channel.
• eas_parameter_name (DA label: Parameter name): Name of the delivery channel configuration parameter. Each type of delivery channel (file system and xDB) has a set of valid parameters.
• eas_parameter_value (DA label: Parameter value): The value of the delivery channel configuration parameter specified at the same index. These parameters are passed to the delivery channel implementation class at runtime.
Configuring a Query Configuration (eas_cfg_query)
Object for Confirmations
1. Create an XML file containing the XQuery for generating confirmations. For information about
how to construct the XQuery, see XQuery for Confirmations, page 224.
2. In DA, create a new object of type eas_cfg_query in the archive holding folder by importing the
XQuery XML file and then configure its properties.
• eas_name (DA label: Name): Technical name of the query configuration.
• eas_result_schema (DA label: Result schema name): Name of the schema in which the query results are returned.
• eas_result_root_element (DA label: Result root element): The root XML element of the result.
• eas_cfg_aic (DA label: Archival Information Collection (AIC)): The Archival Information Collections (AICs) that this query configuration can be used to query. A query configuration can query multiple AICs.
Defining the Criteria for Applicable AIPs (Optional)
By default, a confirmation configuration is applicable to all AIPs. To restrict the applicability of a
confirmation configuration to a subset of AIPs, such as those of a particular holding, you can create a
text file containing the XQuery defining the applicability scope and import it into the repository as
the content of the confirmation configuration object.
The XQuery’s execution context is restricted to the SIP descriptor (eas_sip.xml) associated with an
AIP and cannot be dynamically changed to query the PDI file (eas_pdi.xml). Any information
contained in an AIP’s SIP descriptor can be used to define the applicability condition.
The XQuery applicability condition must return a boolean value (true/false). The confirmation
configuration is applicable to the AIP only when the returned value is true. In the following example, the
XQuery limits the application of the confirmation configuration to the AIPs pertaining to the
PhoneCalls holding.
declare namespace sip="urn:x-emc:eas:schema:sip:1.0";
let $holding := /sip:sip/sip:dss/sip:holding
return matches($holding, 'PhoneCalls', 'i')
The “i” argument of the matches XQuery function indicates that a case-insensitive comparison of
the values will be performed. For more information about XQuery functions and operators, see
http://www.w3.org/TR/xpath-functions/#flags.
Configuring a Confirmation Configuration Object
In Documentum Administrator, create a new object of type eas_cfg_confirmation in the archive
holding folder and configure its properties.
If you created an XQuery text file for defining the applicability scope of the confirmation
configuration, choose File > Import to import the XQuery text file to create the object; otherwise,
choose File > New > Document.
• eas_confirmation_type (DA label: Confirmation type): Confirmation event types that this configuration is applicable to: receipt, storage, purge, reject, invalid.
• eas_result_schema (DA label: Result schema name): Uniform resource name of the result schema for generating confirmation messages; for example: urn:x-emc:eas:schema:confirmation:1.0. This value is used to find the query configuration object, which defines the content of confirmation messages. Only query configuration objects with an identical value of the eas_result_schema property will be used for the confirmation configuration.
• eas_delivery_channel (DA label: Delivery channel): Delivery channel to use for outputting confirmation messages.
• eas_delivery_param_name (DA label: Delivery parameter name): Name of the delivery channel parameter to pass to the delivery channel.
• eas_delivery_param_value (DA label: Delivery parameter value): Value of the delivery channel parameter to pass to the delivery channel.
Chapter 4
InfoArchive Administration
InfoArchive Administration Overview
EMC InfoArchive provides the functions and services for the overall operation of the archive system,
including the following administrative functions:
• Managing data throughout its entire lifecycle within InfoArchive, from archiving data into the
system, to managing archived data, to disposing of information at the end of its retention period
• Continuously monitoring the functionality of the entire InfoArchive system and systematically
controlling changes to the configuration. This function maintains the integrity and traceability
of the configuration and audits system operations, performance, and usage. It receives
operational statistics from archival storage areas and periodically provides archival reports.
• Defining and modifying retention policies for archived data
• Managing InfoArchive jobs to monitor and improve archiving operations, and to inventory,
report on, and update the contents of the archive
• Managing audit trails to ensure established archive standards and policies are met in compliance
with regulatory and legal requirements
• Performing backup and restore as part of disaster recovery capabilities
• Querying archives using InfoArchive GUI
The level of complexity of the administrative tasks is highly dependent on the nature of archived
information and the level of integration of InfoArchive into your IT infrastructure.
Generating SIPs
InfoArchive is source application-agnostic, which means that it can archive any data produced by any
source application, as long as the data is packaged into a designated InfoArchive-supported format
for ingestion. For example, InfoArchive can archive scanned documents, recorded videos, business
data exported from an ERP system, and of course, information extracted from EMC Documentum
Content Server. Once archived, the information is securely preserved and can be retrieved at any time.
InfoArchive is not responsible for generating SIPs; you must use the source application or develop
your own utilities to generate SIPs that conform to the InfoArchive required format. You optionally
use a file transfer utility to transport the generated SIPs to where they can be ingested by InfoArchive.
EMC provides the following out-of-the-box utilities for generating SIPs:
• FS SIP Creator
FS SIP Creator is a standalone, configurable command line tool that creates SIP from a PDI file
template, a metadata configuration file, and content files on the file system. This tool ships with
InfoArchive and can be found in the /unsupported/tools/ directory of the InfoArchive
installation package. For information about using FS SIP Creator, refer to its accompanying
documentation.
• EMC InfoArchive Documentum Connector
Documentum Connector is a command-line data extraction and transformation utility that exports
content to be archived directly from the EMC Documentum repository and generates Submission
Information Packages (SIPs) to ingest into InfoArchive. For information about using Documentum
Connector, refer to the EMC Documentum Connector User Guide.
• EMC InfoArchive SharePoint Connector
SharePoint Connector is a command-line data extraction and transformation utility that extracts
documents and task items from SharePoint sites and generates SIPs to ingest into InfoArchive.
For information about using SharePoint Connector, refer to the EMC InfoArchive SharePoint
Connector User Guide.
Neither EMC InfoArchive Documentum Connector nor EMC InfoArchive SharePoint Connector is part of
the InfoArchive installation package; both must be downloaded separately. For information about
how to use these utilities, refer to their respective user guides, downloadable from the EMC
Online Support site (https://support.emc.com).
PDF Files Consolidation & Dynamic Extraction
When there are a large number of PDF documents with identical formatting and layout to be
archived, instead of generating a SIP containing multiple PDF files, one for each AIU, you can
consolidate all PDF files in the SIP into a single PDF file, with each AIU corresponding to a specific
page range of the file. A SIP can include several such consolidated PDF files.
When you perform queries on the archived package, AIUs are returned with their associated content
files (PDF) dynamically generated by extracting corresponding pages from the consolidated PDF file.
For example, you want to archive invoice records in PDF format. Instead of generating a SIP
containing many PDF documents, one for each invoice record, you can create a SIP that contains a
single consolidated PDF file. In the PDI file eas_pdi.xml, each invoice record (AIU) contains a
reference to a specific page range (denoted by a start page and a page count) of the PDF file. When
you query archived invoice records, each invoice record in the search results is returned as a single
PDF document, dynamically generated based on its page range in the consolidated PDF file.
PDF files consolidation and dynamic extraction provide the following benefits:
• Because information stored in one consolidated PDF file is shared by multiple documents (AIUs),
only one content file is needed, as opposed to one PDF file for each AIU.
• Consolidating large numbers of PDF documents into one significantly reduces the total file size
(to one-third of the original size or less) and speeds up the ingestion process.
Generating SIPs Containing Consolidated PDF Files
When you generate SIP files that contain one or more consolidated PDF files, use a PDI schema
that includes the start_page and page_count elements. InfoArchive uses these two elements to
calculate an AIU’s page range in the consolidated PDF file: start_page indicates from which page a
PDF document (AIU) starts and page_count indicates how many pages the document spans, with
start_page being the first page.
In the following sample eas_pdi.xml:
• The first document corresponds to the first two pages of the file Invoices001.pdf.
• The second document corresponds to pages 3 through 5 of the file Invoices001.pdf.
• The third document corresponds to the first page of the file Invoices002.pdf.
...
<document>
<start_page>1</start_page>
<page_count>2</page_count>
<filename>Invoices001.pdf</filename>
</document>
<document>
<start_page>3</start_page>
<page_count>3</page_count>
<filename>Invoices001.pdf</filename>
</document>
<document>
<start_page>1</start_page>
<page_count>1</page_count>
<filename>Invoices002.pdf</filename>
</document>
...
When using the PDF files consolidation & dynamic extraction feature, note the following:
• Limit the number of pages in the consolidated PDF file to around 200 for optimal query
performance. Too many PDF pages lower query performance.
• Ensure the values of the start_page and page_count elements in eas_pdi.xml are valid. The
ingestion process does not validate these values. If page ranges are invalid, PDF documents can
still be ingested but cannot be successfully retrieved through queries.
• Ensure consolidated PDF files do not have any security settings enabled. InfoArchive cannot
handle PDF files with certain security settings turned on, for example, password protected or page
extraction restricted. The ingestion process does not check the validity of PDF files.
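Because the ingestion process validates neither the page ranges nor the PDF files themselves, a pre-ingestion sanity check can catch invalid ranges before they make AIUs unretrievable. The following is a minimal sketch, assuming the un-namespaced element names shown in the sample above; the actual page totals per PDF must be supplied by the caller (obtained separately, for example with a PDF library, since reading PDFs is outside the Python standard library):

```python
import xml.etree.ElementTree as ET

def check_page_ranges(pdi_xml, page_totals):
    """Flag documents whose start_page/page_count fall outside the
    consolidated PDF's actual page count.

    page_totals maps file name -> total page count.
    Returns a list of human-readable error strings (empty if all ok)."""
    errors = []
    root = ET.fromstring(pdi_xml)
    for i, doc in enumerate(root.iter("document"), start=1):
        name = doc.findtext("filename")
        start = int(doc.findtext("start_page"))
        count = int(doc.findtext("page_count"))
        total = page_totals.get(name)
        if total is None:
            errors.append("document %d: unknown file %s" % (i, name))
        elif start < 1 or count < 1 or start + count - 1 > total:
            errors.append("document %d: pages %d-%d outside %s (1-%d)"
                          % (i, start, start + count - 1, name, total))
    return errors
```

If your PDI schema places these elements in a namespace, adapt the element lookups accordingly.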
Starting the xDB Cache
If you use xDB ingestion mode 2, you must start the xDB cache before launching the ingestor.
The xDB cache is a standalone Java program that runs as a background daemon for importing the
xDB detachable library associated with the AIP at the end of the ingestion. You can only run one
instance of the xDB cache.
By default, the arguments for the xDB cache process are defined in the EAS_HOME/conf
/eas-xdb-cache.properties file. The name of the xDB access node configuration
(eas_cfg_cache_access_node) object to be used by the xDB cache process is defined in the file.
Note: The xDB server can be hosted on a dedicated machine. If this is the case, you must start the
xDB cache process on the xDB server host using the same OS account.
You can also start/stop the xDB cache via commands. You can override the arguments defined in
eas-xdb-cache.properties at the command line as needed.
Start the xDB cache by executing the following command located in EAS_HOME/bin:
• eas-launch-xdb-cache.sh (Linux)
• eas-launch-xdb-cache.bat (Windows)
Keep the prompt window open.
Note: To stop the xDB cache, you must explicitly execute the following command located in
EAS_HOME/bin:
• eas-stop-xdb-cache.sh (Linux)
• eas-stop-xdb-cache.bat (Windows)
Shutting down the host without stopping the xDB cache properly may cause a failure the next time
you try to start it. If this happens, force-start the xDB cache using the force (-f) option.
On Windows, xDB cache is installed as a Windows service. You can run/stop xDB cache by
starting/stopping the EAS XDB Cache service in the Services Microsoft Management Console (MMC).
Note: NEVER execute the eas-launch-xdb-cache script when the EAS XDB Cache Windows service is
already running.
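The launch-then-force-start sequence described above can be wrapped in a small script. The following is a hedged sketch, assuming the script name and the force (-f) option from this section; the runner parameter exists only so the retry logic can be exercised without the real script:

```python
import subprocess

def start_xdb_cache(script="EAS_HOME/bin/eas-launch-xdb-cache.sh",
                    runner=subprocess.call):
    """Attempt a normal xDB cache start; if that fails (for example
    after the host was shut down without stopping the cache properly),
    retry once with the force (-f) option. Returns the final exit code."""
    rc = runner([script])
    if rc != 0:
        # Normal start failed; force-start as the section above advises.
        rc = runner([script, "-f"])
    return rc
```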
Archiving Data in Asynchronous Ingestion
Mode
In the asynchronous ingestion mode, you archive data into InfoArchive in the following distinct steps
by running corresponding scripts or jobs (typically scheduled) located in EAS_HOME/bin:
1. Reception (eas-launch-receiver.bat / eas-launch-receiver.sh)
   Queues the SIP file for ingestion. An AIP object is created in the Documentum repository and
   assigned attribute values based on the information in the SIP.
2. Enumeration (eas-launch-enumeration.bat / eas-launch-enumeration.sh)
   Outputs a list of AIPs awaiting ingestion for a specified ingestion node.
3. Ingestion (eas-launch-ingestor.bat / eas-launch-ingestor.sh)
   Ingests the AIPs and their associated AIUs, in order of AIP ingestion priority. At this stage,
   the AIUs are not searchable yet.
4. Commit (eas-launch-job-commit.bat / eas-launch-job-commit.sh)
   Commits the AIUs into the xDB database. The AIU data for the holding can now be searched.
Each command's parameters are configured through a properties file or directly on the command
line. Settings read from the holding configuration in the repository override the arguments
provided through the properties file or the command line.
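The four steps above can be scripted so that a non-zero return code halts the sequence. The following is a minimal Python sketch, assuming the Linux script names listed above and a hypothetical SIP path; it deliberately glosses over passing the enumerator's output to the ingestor (in practice the enumerator's stdout supplies the AIP IDs):

```python
import subprocess

# Stage commands for asynchronous ingestion; the SIP path is hypothetical.
STEPS = [
    ["eas-launch-receiver.sh", "-f", "/data/incoming/sip001.zip"],
    ["eas-launch-enumeration.sh"],
    ["eas-launch-ingestor.sh"],
    ["eas-launch-job-commit.sh"],
]

def run_pipeline(steps=STEPS, runner=subprocess.call):
    """Run each stage in order, stopping at the first non-zero return
    code. Returns (stages_completed, last_return_code)."""
    for done, cmd in enumerate(steps):
        rc = runner(cmd)
        if rc != 0:
            return done, rc
    return len(steps), 0
```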
Receiving SIPs
The eas-launch-receiver script executes the receiver process, which performs the following
actions:
1. Creates an AIP (eas_aip) object
2. Creates a directory under the reception root working directory
   (EAS_HOME/working/reception_node_name)
3. Attaches the eas_receive.en_US lifecycle to the AIP
4. Promotes the AIP through the lifecycle
5. Executes actions appropriate to each lifecycle state
6. Deletes the received file from the receiving area after reception is complete
Before running the receiver, make sure a reception node has been configured for the reception. See
Configuring a Reception Node Configuration (eas_cfg_receive_node) Object, page 143.
Configuring the Receiver Properties
Configure the general receiver properties, such as reception node, repository name, and login
credentials, in EAS_HOME/conf/receiver.properties as needed.
Each entry lists the property long (short) name, its description, and its default value.

config (c)
    Name of the reception node configured for the reception. This value must match the eas_name
    property of the reception node configuration (eas_cfg_receive_node) object so as to associate
    the AIP being received with that object.
    Default: reception_node_01

domain (d)
    The user's domain, if any.
    Default: empty value

delete (e)
    (Optional) Deletes the supplied file (specified in the file option). This option does not
    require an argument.
    Default: N/A

file (f)
    The file to process.
    Default: N/A

aek (k)
    (Optional) The aek file to use for the encrypted password used to connect to the repository.
    This must be the password for the user specified in the user property.
    Default: N/A

level (l)
    (Optional) Logging level: ERROR, WARN, DEBUG, INFO, TRACE.
    Default: N/A

customer (o)
    The name of the customer supplying the file.
    Default: EAS

password (p)
    (Optional) The user password with which to connect to the repository. If the job executes on
    the same host where Content Server is installed, you do not need to specify this property; the
    job can connect to the repository through the Content Server trusted login feature.
    Default: password of the installation owner user account

rkm (r)
    (Optional) The RSA DPM client configuration file to use to connect to the RSA DPM Server.
    Default: N/A

server (s)
    The repository name.
    Default: name of the repository

type (t)
    A qualifying type identifying the configuration of the file to process.
    Default: EAS

user (u)
    The name of the Documentum user with which to connect.
    Default: login name of the installation owner user account
Running the Receiver
Except for the file argument, all other mandatory arguments default from the properties file but
can be overridden at the command line.
eas-launch-receiver -f SIP_file_path_name
Note: The SIP file to be received cannot be read-only; otherwise, a reception error will occur.
Return code 0 (zero) indicates the SIP file has been successfully received. A good practice is to not
delete the original file without getting a return code of zero from the receiver.
If the reception failed, the created AIP object will be located in the reception node root folder. You can
either delete or invalidate the AIP object, troubleshoot the receiver error, and run the receiver again.
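The good practice above (delete the original file only on return code 0) can be automated in a wrapper. The following sketch assumes the receiver script name and -f option from this section; the runner parameter is only there so the deletion logic can be tested without the real script:

```python
import os
import subprocess

def receive_sip(sip_path, script="EAS_HOME/bin/eas-launch-receiver.sh",
                runner=subprocess.call):
    """Run the receiver for one SIP file and delete the original only
    when the receiver reports success (return code 0)."""
    rc = runner([script, "-f", sip_path])
    if rc == 0:
        # Safe to remove: the receiver confirmed a successful reception.
        os.remove(sip_path)
    return rc
```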
Verifying the Reception Process
Return code 0 (zero) indicates the SIP file has been successfully received. You can optionally
check the following to verify the reception process.
• A new AIP object is created in the root AIP classification folder in the repository.
• The AIP is in queue for ingestion, as is indicated by its Waiting_Ingestion state.
• As displayed on the AIP properties page (AIP tab), both the retention date and the ingestion
deadline were computed during the reception process. Most of the properties are populated using
the information in the SIP descriptor (eas_sip.xml).
• Information related to the reception is recorded under the Tracking tab of the AIP properties
page for tracking and reporting purposes.
• The reception log file (eas_receive_logs_zip) has been included as part of the AIP renditions.
Troubleshooting Receiver Errors
When a reception error occurs, the receiver command returns a non-zero return code, and more
information can be found in the reception log file created in the working directory of the
reception. If configured, the reception log is also attached as content of the AIP object.
For reporting and traceability purposes, the following information is also populated as attributes
of the AIP repository object created by the reception:
• Return code and message of the reception error
• Reception date, node, working directory
• Received file path name and size
The AIP repository object remains in the reception lifecycle state where the error occurred.
After diagnosing and fixing the cause of the reception error:
• The AIP repository object created by the failed reception must be invalidated in the DA interface.
• A new reception must be launched; this reception creates a new AIP repository object.
This procedure does not correspond to a restart of the failed reception; a restart approach was
not retained for the following reasons:
• Given the basic actions performed by the receiver, it has been judged a low source of errors.
• Nothing prevents the received file from being overwritten by a newer file between the reception
error and a restart. Restarting a reception by relying solely on the file system path of the
received file has therefore not been considered a safe approach.
Receiver Return Codes
-1  E_UNEXPECTED: Unexpected error
 0  OK: Successful execution
 1  E_PARSE: Error while parsing arguments
 2  E_DFCINIT: Error while initializing the DFC
 3  E_CREDENTIALS: Cannot connect to the repository with the configured credentials
 4  E_PARAMS: Error while validating the parameters
 5  E_GLOBALCONFIG: Cannot load the InfoArchive global configuration object
 6  E_RECEIVERNODE: Cannot load the reception node configuration object
 7  E_CREATELOG: Cannot create the log file in the working directory
 8  E_CREATEAIP: Cannot create the AIP repository object
 9  E_UPDATEAIP: Cannot update the AIP repository object
10  E_INITPOLICY: The policy cannot be attached to the AIP due to JMS issues, missing policy, or
    other reasons
11  E_LOCATESIPEX: The method configured for extracting the SIP descriptor returned a non-zero
    return code
12  E_LOCATESIP: Cannot locate the descriptor in the SIP
13  E_WRITESIP: The SIP descriptor could not be imported as content of the AIP repository object
14  E_LOCATEHOLDING: The SIP descriptor references an unknown archive holding
15  E_VALIDATESIP: The SIP cannot be queued for one of these reasons: the structure of its
    descriptor is invalid; an existing non-invalidated AIP repository object has the same
    identifier; the descriptor does not contain a hash value for the PDI file although the holding
    configuration demands one; an inconsistency of the SIP sequence number has been detected with
    respect to the AIP repository objects having the same DSS identifier; or the destruction lock
    cannot be created
16  E_REJECT: The SIP has been set to the rejected state because its descriptor references a
    rejected DSS identifier
17  E_WRITEHOLDING: Error while applying the changes to the AIP repository object configured in
    the archive holding (i.e., changing the type of the AIP, classifying the AIP in the folder
    hierarchy, or attaching the ingestion lifecycle to the AIP)
18  E_COMPLETEAIP: Error while promoting the AIP to the ingestion pending state
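When wrapping the receiver in scripts, mapping return codes to their mnemonics makes logs easier to read. The following is a small helper built from the table above:

```python
# Receiver return codes, transcribed from the table above.
RECEIVER_CODES = {
    -1: "E_UNEXPECTED", 0: "OK", 1: "E_PARSE", 2: "E_DFCINIT",
    3: "E_CREDENTIALS", 4: "E_PARAMS", 5: "E_GLOBALCONFIG",
    6: "E_RECEIVERNODE", 7: "E_CREATELOG", 8: "E_CREATEAIP",
    9: "E_UPDATEAIP", 10: "E_INITPOLICY", 11: "E_LOCATESIPEX",
    12: "E_LOCATESIP", 13: "E_WRITESIP", 14: "E_LOCATEHOLDING",
    15: "E_VALIDATESIP", 16: "E_REJECT", 17: "E_WRITEHOLDING",
    18: "E_COMPLETEAIP",
}

def receiver_mnemonic(rc):
    """Translate a receiver return code into its mnemonic for logging."""
    return RECEIVER_CODES.get(rc, "UNKNOWN(%d)" % rc)
```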
Receiver Log File
Every execution of the receiver leads to the creation of a reception working directory and a log
file in that directory:
• The name of the reception working directory consists of the reception start date timestamp and
the AIP (eas_aip) object ID.
• The log file name is eas_receive_logs_zip.log
For traceability purposes, if activated for the archive holding, a compressed form of the
reception log file is imported as content of the AIP repository object in the
eas_receive_logs_zip format, even if an execution error occurs. The log file can thus be accessed
both at the file system level and, as content of the AIP repository object, within the DA
interface.
Enumerating AIPs for Ingestion
The enumeration process is executed by running the eas-launch-enumeration script and outputs
a list of AIPs awaiting ingestion for a specified ingestion node, ordered by:
1. The ingestion priority of their respective archive holding
2. The ingestion deadline date computed during reception
Configuring Enumerator Properties
The enumerator properties file is named job-enumeration.properties and contains the
following properties.
Each entry lists the property long (short) name, its description, and its default value.

cutoff_days (c)
    The cutoff, in days, for the search.
    Default: 31

flags (f)
    (Optional) minusrunning: returns the maximum number of AIPs to enumerate, minus the number of
    ingestions already running on the ingestion node.

faileduntildate (F)
    Includes AIPs whose ingestion has failed, with an ingestion start date earlier than the
    specified date. This option lets you trigger a batch retry of failed ingestions.

server (s)
    The repository name.
    Default: name of the repository

domain (d)
    The user's domain, if any.
    Default: empty value

user (u)
    The name of the Documentum user to connect with.
    Default: login name of the installation owner user account

password (p)
    (Optional) The user password with which to connect to the repository. If the job executes on
    the same host where Content Server is installed, you do not need to specify this property; the
    job can connect to the repository through the Content Server trusted login feature.
    Default: password of the installation owner user account

aek (k)
    (Optional) The aek file to use for the encrypted password.
    Default: not set

level (l)
    (Optional) The logging level to use: ERROR, WARN, DEBUG, INFO, TRACE.
    Default: not set

max (m)
    (Optional) The maximum number of AIPs to return.

nodes (n)
    (Optional) Ingestion node name(s) for which to return the AIP list.
    Default: ingestion_node_01
The purpose of the cutoff_days property is to optimize the DQL query issued by the enumerator by
adding a criterion that restricts the search to AIPs received after the current date minus the
number of days specified in this property.
Running the Enumerator
All arguments default from the properties file but can be overridden at the command line.
eas-launch-job-enumeration
The return code is set to 0 (zero) for success and non-zero for error.
When successful, the enumerator returns a list of AIP object IDs.
To facilitate invoking the ingestion from within a shell:
• The list of AIPs is written to the standard output (stdout). Parsing the result returned to
stdout allows for simple integration within a shell script, for example one that dynamically
creates new ingestion jobs in a custom job scheduler.
• Messages are written to the standard error output (stderr).
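Because AIP IDs go to stdout and messages to stderr, a wrapper can parse the enumerator's output and launch one ingestion per AIP. The following sketch assumes the ingestor is launched with the AIP object ID as its command argument; the runner parameter makes the logic testable without the real script:

```python
import subprocess

def parse_aip_ids(stdout_text):
    """The enumerator writes one AIP object ID per line to stdout;
    messages go to stderr, so stdout can be parsed as-is."""
    return [line.strip() for line in stdout_text.splitlines() if line.strip()]

def ingest_all(aip_ids, script="EAS_HOME/bin/eas-launch-ingestor.sh",
               runner=subprocess.call):
    """Launch one ingestor run per enumerated AIP; returns a dict of
    AIP ID -> ingestor return code."""
    return {aip: runner([script, aip]) for aip in aip_ids}
```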
Troubleshooting Enumerator Errors
The enumerator is a utility that performs no processing other than returning an ordered list of
AIPs. In case of error, the cause of the error must be identified and fixed, but no special
recovery procedure needs to be applied.
Enumerator Return Codes
-1  E_UNEXPECTED: Unexpected error
 0  OK: Successful execution
 1  E_PARSE: Error while parsing arguments
 2  E_DFCINIT: Error while initializing the DFC
 3  E_CREDENTIALS: Cannot connect to the repository with the configured credentials
 4  E_PARAMS: Error while validating the parameters
 5  E_INGESTORNODES: Error while loading the configuration of the referenced ingestor nodes
Re-enumerating Failed AIPs
When the ingestion of an AIP fails, the enumerator:
• Does not consider such a failed ingestion as an ingestion being executed
• By default, does not include such AIPs in the returned list
However, in situations where a system-wide problem leads to the failure of many ingestions, it
can be useful to restart those failed ingestions en masse after fixing the problem.
Such an operation can be achieved using the "faileduntildate=DQL date" flag of the enumerator.
This flag leads the enumerator to also consider AIPs in an ingestion error state whose last
ingestion attempt was triggered up to the date passed in the argument.
Including those AIPs in the results allows the invoking shell to restart their ingestion.
The passed date is a safeguard against an infinite restart loop when an ingestion constantly
fails.
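The cutoff date passed to the faileduntildate flag can be computed relative to the current time. The following is a sketch; the date format the enumerator expects is an assumption here (mm/dd/yyyy hh:mi:ss is a common DQL date form) and should be verified in your environment:

```python
from datetime import datetime, timedelta

def failed_until(days_back=1, now=None, fmt="%m/%d/%Y %H:%M:%S"):
    """Build the date string for the enumerator's faileduntildate flag,
    covering failed ingestions whose last attempt started up to
    days_back days ago. The format string is an assumption; adjust it
    to the DQL date form your environment uses."""
    base = now or datetime.now()
    return (base - timedelta(days=days_back)).strftime(fmt)
```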
Ingesting AIPs
The ingestion process is invoked by running the eas-launch-ingestor script, which performs the
following actions:
• Creates a directory under the ingestion root working directory
• Attaches the AIP to the lifecycle associated with the ingestion sequence (defined by the
eas_cfg_ingest object) to be executed
Before running the ingestor, make sure that:
• An ingestion node has been properly configured. See Configuring an Ingestion Node
Configuration (eas_cfg_ingest_node) Object, page 146.
• xDB cache is running. See Starting the xDB Cache, page 235.
The xDB cache process imports the xDB detachable library associated with the AIP and stores
it as a rendition of the AIP object at the end of the ingestion.
Configuring Ingestor Properties
The ingestor properties file ingestor.properties located in EAS_HOME/conf contains the
following properties.
Each entry lists the property long (short) name, its description, and its default value.

config (c)
    Name of the ingestion node configuration (eas_cfg_ingest_node) object to use for the
    ingestion.
    Default: ingestion_node_01

server (s)
    The repository name.
    Default: current repository name

domain (d)
    The user's domain, if any.

user (u)
    The name of the Documentum user with which to connect to the repository.
    Default: login name of the installation owner user account

password (p)
    (Optional) The user password with which to connect to the repository. If the job executes on
    the same host where Content Server is installed, you do not need to specify this property; the
    job can connect to the repository through the Content Server trusted login feature.
    Default: password of the installation owner user account

aek (k)
    (Optional) The aek encrypted password file for connecting to the repository. This must be the
    password of the user specified in the user property.

rkm (r)
    (Optional) The RKM client configuration file to use to connect to the RKM Server.

id (a)
    ID (r_object_id) of the AIP object.

level (l)
    (Optional) The logging level to use: ERROR, WARN, DEBUG, INFO, TRACE.

delete (e)
    Whether to delete data placed in the working directory if an error occurs (true|false). By
    default, in the event of an ingestion error, data being archived is not deleted from the
    working directory. Set this to true to automatically delete sensitive data from the working
    directory in case of an ingestion error.
    Default: false
Running the Ingestor
To ingest an AIP, launch the ingestor using the following command:
• EAS_HOME/bin/eas-launch-ingestor.bat aip_id (Windows)
• EAS_HOME/bin/eas-launch-ingestor.sh aip_id (Linux)
Except for aip_id, which is the AIP object ID, all other argument values default from the
ingestor.properties file located in EAS_HOME/conf. You can override the default values by
specifying the arguments on the command line.
After the ingestor is run, a return code is displayed: 0 (zero) indicates a successful ingestion
and a non-zero value means the ingestion has failed.
If the ingestion of an AIP failed, identify and resolve the cause of the error, and re-ingest the
AIP. When you restart an ingestion, the ingestor reverts the AIP to the initial ingestion
lifecycle state and re-executes the ingestion process.
Note: You can only restart a failed ingestion on the same ingestion node, for the following
reasons:
• The working directory for an ingestion node is not accessible by other ingestion nodes.
Therefore, the log file of a failed ingestion can only be accessed from the ingestion node on
which the ingestion was performed.
• The log of new ingestion attempts is appended to the old log file so that the entire ingestion
history is kept for traceability.
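For transient technical issues, a wrapper can relaunch a failed ingestion a bounded number of times on the same node. The following is a sketch, assuming the Linux ingestor script name from this section; the runner and sleep parameters exist only to make the retry logic testable:

```python
import subprocess
import time

def ingest_with_retry(aip_id, script="EAS_HOME/bin/eas-launch-ingestor.sh",
                      attempts=3, delay=60,
                      runner=subprocess.call, sleep=time.sleep):
    """Re-launch a failed ingestion up to `attempts` times. Per the note
    above, this must run on the node that made the first attempt, since
    that node holds the working directory and log file. Returns the
    last return code; only sensible for transient errors."""
    rc = 0
    for n in range(attempts):
        rc = runner([script, aip_id])
        if rc == 0:
            return rc
        if n < attempts - 1:
            sleep(delay)  # back off before re-launching the ingestor
    return rc
```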
Verifying the Ingestion Process
Return code 0 (zero) indicates the AIP has been successfully ingested. You can optionally check
the following to verify the ingestion process:
• On the ingested AIP’s properties page:
— The current state is Waiting_Commit.
— The min and max values of the partitioning key have been computed and recorded.
— Ingestion related information has been recorded.
• Information relating to contents imported into xDB has been captured and recorded.
The Locked in cache deadline date property denotes until when the associated xDB library must
remain online so that the archived data can be searched.
• In the ingestion node working directory (default: EAS_HOME/working/ingestion_node_01),
an ingestion directory has been created for the AIP, its name consisting of the ingestion
start date timestamp and the AIP object ID. The directory contains the ingestion log file
(eas_ingest_logs_zip.log). If configured, the file is also imported as a rendition of the
AIP object.
Troubleshooting Ingestion Errors
When an ingestion error occurs, a non-zero return code is returned at the ingestor command line. You
can find more information in the ingestion log file created in the ingestion node working directory. If
configured, the ingestion log file is also imported as content of the AIP object.
For traceability purposes, the AIP repository object properties are populated with the following
information:
• Return code and message of the ingestion error
• Ingestion date, node, working directory
The AIP repository object remains in the ingestion lifecycle state where the error occurred.
The data files managed by the ingestor in the working directory are normally left as is, to
facilitate diagnosis of the problem. If configured, InfoArchive instead deletes those files when
an error occurs, to protect sensitive data.
The actions to perform depend on the diagnosed cause of the error:
Each entry lists the cause of the error and the actions to perform.

Transient technical issue
    After fixing the cause of the error, the ingestion can be restarted by relaunching the
    ingestor for this AIP. The ingestor detects a restart, reattaches the AIP to the first state
    of the ingestion lifecycle, performs the necessary cleanup, and executes the ingestion again.
    To keep logging consistent across the multiple executions of the ingestion, a restart reuses
    the existing ingestion working directory and appends its logs to the existing log file. For
    that reason, a restart must be done using the same ingestion node.

Configuration error
    After fixing the configuration error, the ingestion can be restarted as described above.

Incorrect DSS
    The error originates from the business application having generated an incorrect SIP, and it
    has been determined that all SIPs belonging to that DSS are incorrect and must be rejected.
    In this situation, the administrator has to reject the AIP.

Incorrect SIP
    The error originates from an isolated incorrect SIP file, but other SIP files belonging to the
    same DSS are correct. In this situation, the administrator has to invalidate the AIP.
Ingestor Return Codes
-1  E_UNEXPECTED: Unexpected error
 0  OK: Successful execution
 1  I_DUP: Ingestion refused since an ongoing processing on the AIP has been detected
 2  I_PARALLEL: Ingestion refused since another ongoing ingestion on this AIP has been detected
10  E_PARSE: Error while parsing arguments
11  E_DFCINIT: Error while initializing the DFC
12  E_CREDENTIALS: Cannot connect to the repository with the configured credentials
13  E_PARAMS: Error while validating the parameters
14  E_GLOBALCONFIG: Cannot load the InfoArchive global configuration object
15  E_INGESTORNODE: Cannot load the ingestor node configuration object
16  E_BINDLOG: Cannot create the log file in the working directory
17  E_NOAIP: Cannot find the AIP having the provided identifier
18  E_LOCATEHOLDING: Cannot find the holding referenced by the AIP
19  E_LOCATECONFIG: Cannot find an ingestion configuration applicable to the AIP
20  E_INVALIDAIP: The AIP state is not applicable for launching the ingestion on the AIP
21  E_CANNOTRESTART: The ingestion cannot be restarted
22  E_TRANSFORMCONFIG: Empty eas_cfg_pdi: no PDI schema is specified
23  E_LOCATEXDBNODE: Cannot load the configuration of the target xDB library to use
24  E_LOADRKMCLIENT: Cannot load the RSA RKM client
25  E_PROCESSOR: Invalid eas_cfg_pdi value: the specified PDI schema is invalid
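Wrapper scripts can use the table above to distinguish refused ingestions (codes 1 and 2, where processing is already under way on the AIP) from genuine errors. The following is a small helper built from the table:

```python
# Ingestor return codes, transcribed from the table above.
INGESTOR_CODES = {
    -1: "E_UNEXPECTED", 0: "OK", 1: "I_DUP", 2: "I_PARALLEL",
    10: "E_PARSE", 11: "E_DFCINIT", 12: "E_CREDENTIALS",
    13: "E_PARAMS", 14: "E_GLOBALCONFIG", 15: "E_INGESTORNODE",
    16: "E_BINDLOG", 17: "E_NOAIP", 18: "E_LOCATEHOLDING",
    19: "E_LOCATECONFIG", 20: "E_INVALIDAIP", 21: "E_CANNOTRESTART",
    22: "E_TRANSFORMCONFIG", 23: "E_LOCATEXDBNODE",
    24: "E_LOADRKMCLIENT", 25: "E_PROCESSOR",
}

def is_refused(rc):
    """Codes 1 (I_DUP) and 2 (I_PARALLEL) mean the ingestion was
    refused because processing is already under way on the AIP; they
    are not failures and should not trigger error handling."""
    return rc in (1, 2)
```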
Ingestor Log File
Every ingestion leads to the creation of a working directory and a compressed log
file eas_ingest_logs_zip.log in that directory. The name of the working
directory consists of the ingestion timestamp and the AIP object ID; for example:
20120501T144822536_0800046e800061ee.
For traceability purposes, if activated for the archive holding, a compressed form of the ingestion
log files is imported as content of the AIP repository object with the format eas_ingest_logs_zip,
even if an ingestion error occurs.
The log file can therefore be accessed both at the file system level and, as content of the AIP
repository object, within the DA interface.
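The working directory naming convention can be sketched as follows. This is a minimal illustration assuming the timestamp format YYYYMMDDTHHMMSS plus milliseconds, inferred from the example above; the function name is hypothetical, not part of the product.

```python
from datetime import datetime

def working_dir_name(aip_object_id: str, when: datetime) -> str:
    # Timestamp format inferred from the example above: YYYYMMDDTHHMMSS plus
    # three millisecond digits, joined to the AIP object ID with an underscore.
    millis = when.microsecond // 1000
    return f"{when.strftime('%Y%m%dT%H%M%S')}{millis:03d}_{aip_object_id}"
```

For example, an ingestion on 2012-05-01 at 14:48:22.536 of AIP 0800046e800061ee yields the directory name 20120501T144822536_0800046e800061ee.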
Committing Ingested AIPs
The commit process commits AIUs into the xDB database so that the archived data can be
searched. It is invoked either during ingestion (synchronous mode) or by the eas_commit Content
Server job, which you schedule to run at regular intervals to scan for ingested AIPs pending
commit (lifecycle state Waiting_Commit) and commit them.
InfoArchive Administration
The commit process can be synchronous or asynchronous, depending on the ingestion mode used:
• Asynchronous commit
In asynchronous ingestion mode, when multiple ingested SIPs are part of a single DSS (batch),
you can execute the eas_commit job multiple times, but until all the SIPs have been ingested, none
of the archived data in the batch is searchable.
When all SIPs pertaining to a DSS have been ingested (lifecycle state = Waiting_Commit),
executing the eas_commit job performs the following actions on every SIP in the batch:
— Imports a compressed commit log file as a rendition of the AIP (eas_aip) object for traceability
purposes
— Deletes the original SIP file rendition from the AIP object, unless the holding is configured to
keep SIP files
— Promotes the AIP object to the Completed lifecycle state
If you use EMC Centera as the archive store, the eas_commit job also pushes the AIP retention
date and the content properties:
— Populates corresponding Centera properties with the values of some properties of the AIP and
its content for reversibility
— Assigns the AIP retention date to the contents stored in Centera
This deferred push of the retention date and content properties to Centera (as opposed to pushing
them during the reception or ingestion stage) allows the ingested contents to be destroyed
before the commit is complete. This way, when you cancel the ingestion of AIPs before they are
committed, the ingested content is deleted.
• Synchronous commit
In synchronous ingestion mode, you enable synchronous commit for the target holding, and
ingested standalone SIPs (one SIP per DSS) are automatically committed without the need to
explicitly invoke the eas_commit job.
To enable synchronous commit for the target holding, edit the properties of the holding
configuration (eas_cfg_holding) object and, under the Holding tab, select the Synchronous
commit enabled option.
When the commit process (synchronous or asynchronous) is complete, the AIP lifecycle state is
Completed and the archived data is searchable.
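The asynchronous batching rule above can be sketched as a simple predicate. This is an illustrative sketch, not product code; the function name is hypothetical.

```python
def dss_ready_for_commit(sip_states):
    # In asynchronous mode, none of the archived data in a DSS (batch) becomes
    # searchable until every SIP in the batch has been ingested, i.e. every SIP
    # has reached the Waiting_Commit lifecycle state.
    return bool(sip_states) and all(s == "Waiting_Commit" for s in sip_states)
```

Running the eas_commit job before the whole batch is ingested therefore has no visible effect on searchability.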
Executing the eas_commit Job
You can execute the eas_commit job in one of the following ways:
• Run the eas_commit job in DA
Under the repository folder Administration/Job Management/Jobs, right-click the
eas_commit job and choose Run.
• Run the eas_commit job at the command prompt using the runjob command:
EAS_HOME/bin/runjob repository_name eas_commit
Verifying the Commit
After a successful commit, you can see in the properties page of the ingested AIP object that:
• The current Phase and State of the AIP are both Completed.
• The Commit date is recorded.
Also, the eas_zip_zip rendition of the committed AIP has been removed, unless the holding is
configured to keep SIP files. Meanwhile, the compressed commit job log file eas_commit_logs_zip
has been imported as a rendition of the AIP.
Managing AIPs
An AIP is represented as an eas_aip type object in Content Server. You manage AIPs in DA in the
same way as you manage other repository objects using standard DA features such as searching for
objects and viewing object properties.
Right-click an AIP and choose a command from the shortcut menu to perform the following actions:

Properties
View the properties of the AIP.
For information about the properties of an eas_aip object, refer to the
EMC InfoArchive Object Reference Guide.

Reject
Reject the AIP.
When you reject an AIP, all existing and future AIPs within the same
DSS (batch) are also automatically rejected.
Note: An open AIP aggregate cannot be rejected.
You must perform additional steps to complete the entire AIP rejection
cycle. See Rejecting or Invalidating an AIP, page 255.

Invalidate
Invalidate the AIP.
Invalidating an AIP declares that the SIP that led to the creation of the AIP
is to be ignored. However, for traceability purposes, the AIP object is kept.
If other AIPs or future received SIP files have the same DSS identifier as the
invalidated AIP, they are still processed.
Note: An open AIP aggregate cannot be invalidated.
You must perform additional steps to complete the entire AIP
invalidation cycle. See Rejecting or Invalidating an AIP, page 255.

Apply/Remove the Purge Lock
Apply or remove a purge lock on the AIP.
A purge lock prevents an AIP from being deleted when its retention
date is reached. Such a lock can be created automatically when an AIP
is created, or manually by the administrator.

Cache In/Out
Temporarily remove the archived AIP structured data from the xDB file
system (cache out) and put it back into xDB (cache in).
For information about xDB caching, see xDB Caching, page 55.

View > Renditions
View all the renditions of the AIP object.
To close an open AIP parent, right-click it and choose Request AIP Parent Close from the shortcut
menu. The AIP parent will be closed the next time the eas_close job is executed.
AIP States
In DA, different AIP states are represented by different icons, each with an unlocked and a
locked variant.

AIP parent states:
• AIP parent
• AIP parent is open
• AIP parent is pruned

AIP states:
• AIP is online
• AIP is offline
• AIP is not searchable
• AIP is in unsteady state. One of the following scenarios leads to the unsteady state:
— The AIP is being ingested (work in progress)
— The AIP is a future aggregate and its associated AIP parent is open (AIP mode 3)
— The AIP belongs to an aggregated AIP parent but is yet to be pruned (AIP mode 3)
Rejecting or Invalidating an AIP
Rejecting or invalidating an AIP and promoting it through the complete rejection/invalidation
lifecycle entails the following steps:
1. In DA, right-click an AIP object and choose the Reject or Invalidate command from the
shortcut menu.
The lifecycle state of the AIP object changes to REJ-WCOM or INV-WCOM.
2. Make sure confirmations are enabled for the reject and invalid event types. Run the
confirmation (eas_confirmation) job.
3. Run the rejection/invalidation (eas_rejinv) job.
The lifecycle state of the AIP object changes to REJ-WXDBCLEAN or INV-WXDBCLEAN.
4. Run the xDB clean job by executing eas-launch-xdb-clean.bat (Windows) or
eas-launch-xdb-clean.sh (Linux).
The lifecycle state of the AIP object changes to REJ-WPROC or INV-WPROC.
5. Run the rejection/invalidation (eas_rejinv) job again.
The lifecycle state of the AIP object changes to REJ-DONE or INV-DONE.
The AIP rejection/invalidation cycle is complete.
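The rejection half of the lifecycle above can be sketched as a small state table. The state names come from this guide; the dictionary and function are an illustrative sketch, and the step-to-job mapping shown in the comments follows the procedure above.

```python
# Hedged sketch of the AIP rejection lifecycle: each state maps to the job
# that advances it and the resulting state (the INV-* flow is symmetrical).
REJECTION_TRANSITIONS = {
    "REJ-WCOM":      ("eas_confirmation + eas_rejinv", "REJ-WXDBCLEAN"),
    "REJ-WXDBCLEAN": ("eas-launch-xdb-clean",          "REJ-WPROC"),
    "REJ-WPROC":     ("eas_rejinv",                    "REJ-DONE"),
}

def advance(state: str) -> str:
    # Return the next lifecycle state after running the required job.
    _job, next_state = REJECTION_TRANSITIONS[state]
    return next_state
```

REJ-DONE is terminal: it has no entry in the table, so the cycle is complete when it is reached.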
Updating Custom Package Metadata
In the SIP descriptor, you can add custom package metadata to incorporate extra information about
the package, for example, the department that owns the SIP or partitioning criteria. Custom
package metadata must be enclosed within <custom> tags, as shown in the following example.
<?xml version="1.0" encoding="UTF-8" ?>
<sip xmlns="urn:x-emc:eas:schema:sip:1.0">
  <dss>
    <holding>PhoneCalls</holding>
    <id>20140418012221</id>
    <pdi_schema>urn:eas-samples:en:xsd:phonecalls.1.0</pdi_schema>
    <pdi_schema_version />
    <production_date>2011-02-01T00:00:00.000+01:00</production_date>
    <base_retention_date>2011-02-01T00:00:00.000+01:00</base_retention_date>
    <producer>CC</producer>
    <entity>PhoneCalls</entity>
    <priority>0</priority>
    <application>CC</application>
  </dss>
  <production_date>2011-02-01T00:00:00.000+01:00</production_date>
  <seqno>1</seqno>
  <is_last>true</is_last>
  <aiu_count>10</aiu_count>
  <page_count>0</page_count>
  <custom>
    <attributes>
      <attribute name="rep_attr">value1</attribute>
      <attribute name="rep_attr">value2</attribute>
      <attribute name="int_attr">1</attribute>
      <attribute name="boolean_attr">true</attribute>
      <attribute name="rep_attr">value3</attribute>
      <attribute name="date_attr">1990-02-01</attribute>
      <attribute name="float_attr">3.1</attribute>
    </attributes>
  </custom>
</sip>
Ingesting SIPs Containing Custom Package Metadata
In order to ingest SIPs containing custom package metadata, you must create a subtype of the base
AIP type (eas_aip), and specify the created subtype in the archive holding. The following procedure
shows how to create a subtype and specify that type in the archive holding.
1. Launch Documentum Server Manager.
2. Click IAPI on the Repository tab.
3. Enter the following DQL script in the IAPI command prompt. The script creates a subtype that
holds the custom attributes shown in the previous section. For more information about CREATE
TYPE and other DQL statements, refer to the EMC Documentum Content Server DQL Reference.
create type "test_phonecalls"
("boolean_attr" boolean(default=false), "float_attr" float,
"rep_attr" string(32) repeating, "date_attr" date, "int_attr" integer (NOT NULL))
with supertype "eas_aip"
Note: The custom package metadata value must be one of the following types:
• xs:string
• xs:float
• xs:integer
• xs:date
• xs:datetime
4. In the archive holding, change the eas_aip_type attribute to the created subtype name, for
example, test_phonecalls.
In DA, this attribute is set in the AIP type text box on the Holding tab.
After ingestion, custom package metadata becomes AIP object attributes.
Updating Custom Package Metadata
You may occasionally need to change custom package metadata, for example after a company
reorganization, merger, or acquisition. InfoArchive 3.0 first exposed custom package metadata to
users with appropriate access rights. As a system administrator or an IT specialist with WRITE or
RELATE access, you can update custom package metadata through DA or using DQL scripts.
Custom Package Metadata Update Dates
After a SIP is ingested, it turns into an AIP repository object. The SIP package metadata
contained in the SIP descriptor is applied as the attribute values of the AIP object. If
eas_sip_xml_store_enabled is set to TRUE, the SIP descriptor is saved as an AIP rendition.
Therefore, whenever you update custom package metadata (AIP attribute values), you must also
update the SIP descriptor in the AIP rendition to keep them consistent. The following attributes
are added to the eas_aip type:
• eas_aip_cmeta_modify_date: The last time you modified the custom package metadata of
an AIP object.
• eas_sip_cmeta_refresh_date: The last time the refresh job propagated custom package
metadata changes to the SIP descriptor rendition.
Updating Custom Package Metadata in DA
You can update custom package metadata for an individual AIP in DA, if you log in to DA with
RELATE and WRITE privileges. Complete the following procedure to update custom package
metadata in DA:
1. Right-click the AIP object whose custom package metadata you want to change and then select
Properties.
2. Select the AIP tab.
3. Locate the custom package metadata you want to change. If the metadata is not shown in
READ/WRITE mode, you may not have sufficient privileges to manipulate the data. Contact
your system administrator.
Note: The DA label for custom package metadata is identical to the attribute name of the
metadata in the SIP descriptor. For example, the DA label for the following metadata is
boolean_attr. EMC recommends using descriptive attribute names.
<attribute name="boolean_attr">true</attribute>
4. Change the metadata, and click Save.
The eas_aip_cmeta_modify_date attribute is now set to the time when the metadata was
changed. If SIP descriptors are saved as AIP renditions, a refresh job is needed to propagate the
changes and keep the data consistent. See Propagating Changes to AIP Renditions, page 258.
Updating Custom Package Metadata using DQL scripts
You can mass update the custom package metadata for a batch of AIP objects by using DQL scripts.
You must have at least WRITE access to the AIPs. In addition to updating metadata, you must also
update the eas_aip_cmeta_modify_date attribute. The DQL scripts should have the following
pattern:
UPDATE "test_phonecalls" OBJECTS
SET "int_attr" = 2,
SET "boolean_attr" = false,
INSERT "rep_attr"[4] = 'value4',
SET "eas_aip_cmeta_modify_date" = DATE(NOW)
WHERE conditions_for_update
After the DQL update, you must run the refresh job to propagate package metadata (AIP attribute
value) changes to AIP renditions.
Propagating Changes to AIP Renditions
If eas_sip_xml_store_enabled is set to TRUE in the archive holding, SIP descriptors are saved
as AIP renditions. To view an AIP's renditions, select View > Renditions in the shortcut menu,
and double-click eas_sip_xml (XML description of an AIP) in the list.
The eas_sip_cmeta_refresh job propagates custom package metadata changes to AIP renditions.
This refresh job selects AIPs using the following DQL query:
SELECT ... FROM eas_aip
WHERE eas_aip_cmeta_modify_date IS NOT NULLDATE
AND eas_aip_cmeta_modify_date > eas_sip_cmeta_refresh_date
ORDER BY eas_aip_cmeta_modify_date
The query may also select AIPs that are not eligible for propagating changes. The refresh job
handles the following scenarios:
• AIP is of the base eas_aip type
AIPs of the eas_aip type do not contain any custom package metadata. The refresh job:
1. Sets eas_sip_cmeta_refresh_date to the current date to avoid processing the AIP in the future.
2. Writes information into the log file.
3. Proceeds to the next AIP object.
• AIP is part of an open AIP aggregate
Changing the custom package metadata of an open AIP aggregate (eas_aip_mode = 3) is not
allowed. The refresh job:
1. Sets eas_sip_cmeta_refresh_date to the current date to avoid processing the AIP in the future.
2. Writes information into the log file.
3. Proceeds to the next AIP object.
• AIP is in a transient mode
AIPs whose eas_is_in_unsteady_state is TRUE are in transient mode. The refresh job:
1. Postpones processing until the AIP is no longer in transient mode.
2. Writes information into the log file.
3. Proceeds to the next AIP object.
• AIP is a subtype of eas_aip or a closed AIP aggregate
The refresh job:
1. Looks for the rendition that has the highest page_rendition value.
2. If no SIP descriptor rendition is found, performs one of the following actions:
a. If the AIP is not rejected or invalidated, writes an error message to the log.
b. If the AIP is rejected or invalidated, sets the eas_sip_cmeta_refresh_date attribute
to the current date to avoid processing the AIP in the future.
3. Exports the rendition to the working directory, for example, C:\app\eas\working.
4. If the rendition is a package, unzips the package, and then replaces the custom package
metadata in the SIP descriptor with the new AIP attribute values.
5. Imports the updated SIP descriptor to replace the old eas_sip_xml rendition.
a. If the old rendition cannot be removed because it has not reached its retention date, the
updated SIP descriptor is imported with a page modifier set to the current date. The date
is formatted as YYYYMMDDHHMMSS.
b. If the XML store is a retention-enabled storage, for example, Centera:
1. Structured data must be pushed to the storage.
2. If the AIP is in one of the final states (COM, REJ-DONE, or INV-DONE), the retention
date must be pushed to the storage.
3. If the AIP is not in a final state and eas_rejinv_retention_enabled is set to
FALSE, the retention date is not pushed to the storage.
6. Sets the eas_sip_cmeta_refresh_date attribute to the current date to avoid processing
the AIP in the future.
7. Writes information into the log file.
8. Proceeds to the next AIP object.
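The per-AIP eligibility decision described in the scenarios above can be sketched as follows. This is an illustrative sketch only; the Aip class and function name are hypothetical stand-ins for the attributes the guide names (eas_aip_mode, eas_is_in_unsteady_state).

```python
from dataclasses import dataclass

@dataclass
class Aip:
    type_name: str               # repository object type, e.g. "eas_aip"
    is_open_aggregate: bool      # eas_aip_mode = 3 and the aggregate is open
    in_unsteady_state: bool      # eas_is_in_unsteady_state

def refresh_disposition(aip: Aip) -> str:
    # Base-type AIPs carry no custom metadata: mark refreshed and move on.
    if aip.type_name == "eas_aip":
        return "skip"
    # Open aggregates may not have their custom metadata changed.
    if aip.is_open_aggregate:
        return "skip"
    # Transient AIPs are retried once they leave the unsteady state.
    if aip.in_unsteady_state:
        return "postpone"
    # Otherwise export, update, and re-import the SIP descriptor rendition.
    return "refresh"
```

Only the last outcome actually rewrites the eas_sip_xml rendition; the others just log and advance.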
The following table lists the exit codes for the refresh job.

Exit Code  Description
0          s.ok: No error
1          e.parse: The return code for an argument parsing error
2          e.dfc.init: The return code for a DFC initialization error
3          e.credentials: The return code for a server credentials error
4          e.params: The return code for a parameter validation error
5          e.report: The return code for a report creation error
-1         e.unexpected: The return code for an unexpected error
Performing Data Retention Management
Data retention management is essential to an enterprise archiving system. Data retention defines
the policies for persistent data and records management to meet legal and business data archival
requirements. It controls and governs an organization's archived records throughout the records
lifecycle, from the time the records are ingested to their eventual disposal.
InfoArchive lets you perform data retention management in the following two ways:
• Using InfoArchive's date-based retention management capabilities
• Using the extended retention management capabilities of Retention Policy Services (RPS)
InfoArchive provides two archival reports to help you better manage data retention:
• List of AIPs by Retention Date
View all archived AIPs whose retention date is before a specified date.
• AIPs for Disposition
View all AIPs with a disposition date determined by the retention policy.
To perform data retention management tasks including changing the retention date, adding/removing
a purge lock, and rejecting/invalidating an AIP, you must have at least the Relate permission.
Using InfoArchive’s Date-Based Retention Management Capabilities
InfoArchive retains and disposes archived data based on the retention date, which is stored in the
eas_retention_date property of the AIP (eas_aip) object.
When a SIP is received, the retention date of the AIP is calculated based on the base retention date
specified in the SIP descriptor (eas_sip.xml) and the retention period defined for the destination
holding:
retention date = base retention date + retention period
Depending on whether a retention class is specified in the SIP descriptor (eas_sip.xml), different
retention periods are used:
• If no retention class is defined, the retention period defined for the destination holding is
used. This is the eas_retention_period property of the eas_holding object.
• If a retention class is present, the retention period associated with that retention class is
used to calculate the retention date. Retention classes are defined as a pair of repeating
properties, eas_retention_class and eas_retention_class_period, on the destination holding. If
the specified retention class has not been defined, a reception error occurs.
For example, given the following retention class definitions, if the retention class specified in the
SIP descriptor (eas_sip.xml) is Class B, then retention period used to calculate the retention
date is 365.
eas_retention_class  eas_retention_class_period
Class A              3650
Class B              365
Class C              30
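The calculation above can be sketched as follows, using the example class definitions. The constants and function name are illustrative stand-ins for the holding-level eas_retention_period and the eas_retention_class / eas_retention_class_period pairs; periods are assumed to be in days.

```python
from datetime import date, timedelta
from typing import Optional

# Hypothetical holding configuration mirroring the example definitions above.
DEFAULT_PERIOD_DAYS = 3650                                   # eas_retention_period
RETENTION_CLASSES = {"Class A": 3650, "Class B": 365, "Class C": 30}

def retention_date(base: date, retention_class: Optional[str] = None) -> date:
    if retention_class is None:
        period = DEFAULT_PERIOD_DAYS          # no class: holding-level period
    elif retention_class in RETENTION_CLASSES:
        period = RETENTION_CLASSES[retention_class]
    else:
        # An undefined retention class causes a reception error.
        raise ValueError(f"undefined retention class: {retention_class}")
    return base + timedelta(days=period)
```

With a base retention date of 2011-02-01 and Class B, the retention date is 2012-02-01.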
Before archiving data, there must be an agreement between the data owner and data producer on the
retention class definitions and the retention period defined at the holding level.
If you use EMC Centera as the archive store, when an ingested AIP is committed, its retention date
(along with content attributes) is pushed to Centera at the storage level, and is also assigned to
the associated content files stored in Centera.
When the retention date of an AIP is reached and the AIP does not have a purge lock, the purge
(eas_purge) job attaches the purge lifecycle to the AIP object and moves the AIP to a dedicated
repository folder, /System EAS/data/purge.
To facilitate retention management, it is a common practice to use the retention date as a condition
for assigning AIPs when defining the pooled library assignment policy. See Configuring a Custom
Pooled Library Assignment Policy (xDB Mode 3), page 120.
Changing the Retention Date
You can change the retention date of an archived AIP. To do so, in DA, right-click an AIP object and
choose Change Retention from the shortcut menu, and then specify the change you want to make:
• Increase or decrease the retention date by a specific number of days
• Set a new retention date
• Set the retention date to null so that the AIP object will never expire
If you use a content-addressable storage (CAS) such as EMC Centera as the archive store, the
new retention date will be pushed to the CAS at the storage level, and will also be assigned to the
associated content files stored in the CAS.
Applying/Removing a Purge Lock
By default, an AIP object that has reached its retention date is automatically purged by the
scheduled purge (eas_purge) job. However, such automated disposal is not desirable in some
situations:
• Some electronic archiving standards, such as ISO 14641-1, forbid automated disposal of expired
AIPs.
• The data owner may be required to formally confirm the disposal.
A purge lock prevents the scheduled disposal of an AIP when its retention date is reached. The AIP
is not disposed of until the administrator explicitly removes the purge lock.
In DA, you can manually apply purge locks to AIPs or configure the holding to automatically apply
purge locks to AIPs during the archiving process.
To manually apply/remove a purge lock to an AIP:
Right-click an AIP and choose Apply/Remove the purge lock from the shortcut menu.
To configure the holding to automatically apply purge locks to AIPs during the archiving process:
Edit the properties of the holding configuration (eas_cfg_holding) object and under the Holding tab
of the Properties window, select Automatic creation of purge lock.
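The purge eligibility rules above (an expired retention date, no purge lock, and a null retention date meaning the AIP never expires) can be sketched as a single predicate. This is an illustrative sketch; the function name is hypothetical.

```python
from datetime import date
from typing import Optional

def eligible_for_purge(retention: Optional[date], today: date,
                       purge_locked: bool) -> bool:
    # A null retention date means the AIP never expires; a purge lock blocks
    # disposal until an administrator explicitly removes it.
    if retention is None or purge_locked:
        return False
    return retention <= today
```

The eas_purge job only acts on AIPs for which this kind of check holds.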
Using Extended Retention Management Capabilities of Retention
Policy Services (RPS)
By integrating EMC Documentum Retention Policy Services (RPS) with InfoArchive to manage
disposition of archived data, you can take advantage of RPS extended retention management features
such as event-based retention, multi–phase retention, and markups.
You must install RPS and Records Client, the RPS interface, to perform RPS actions on AIP objects.
An extension package of RPS Records Client (limited to exclusive use with InfoArchive) ships with
the product and can be deployed as part of the InfoArchive installation. The Records Client
extension provides custom-built shortcut menus that you can use to perform actions on AIP objects
directly from RPS Records Client.
Refer to the following documents for more information about RPS:
• EMC Documentum Retention Policy Service Administrator User Guide
• EMC Documentum Retention Policy Services Installation Guide
• EMC Documentum Records Client Administration and User Guide
• EMC InfoArchive Installation Guide
Licensing and Installing RPS Components
To use RPS with InfoArchive, you must perform the following licensing and installation tasks:
1. Launch Documentum Content Server Configuration Program to activate Retention Policy
Services using a valid license.
2. Download the following RPS components from EMC SubscribeNet:
• Documentum Retention Policy Services DAR
• Documentum Records Client WAR
3. Launch Documentum DAR Installer to install Documentum Retention Policy Services DAR
(rps.dar).
4. Deploy Documentum Records Client (records.war) on the web application server.
Note: InfoArchive provides an extension for Records Client. Refer to the EMC InfoArchive
Installation Guide for more information about how to deploy the extension on a standard Records
Client.
5. Restart the repository and the web application server.
RPS Basic Concepts
This section introduces RPS basic concepts to help you quickly get started with RPS.
Retention Policy
A data retention policy is a recognized and proven protocol within an organization to retain
information for operational use while adhering to the laws and regulations concerning them. It is a
set of guidelines that describes which data will be archived, how long it will be kept, and other factors
concerning the retention of the data. Its objectives are to keep important information for future
use or reference, to organize information so it can be searched and retrieved at a later date and to
dispose of information that is no longer needed.
You can only apply RPS retention policies to AIP (eas_aip) objects. Applying them to other
InfoArchive object types, such as configuration objects (for example, eas_cfg_holding) and
runtime objects (for example, eas_audittrail), is not supported.
A retention policy determines the length of time an object is kept in a repository. There are two
types of retention policies:
• Linked retention policy
• Individual retention policy
Retainer
A retainer is created when you apply a retention policy to an object. There are two types of
retainers, shared and individual. Retainers are created differently depending on which type the
policy belongs to. The following describes the behavior differences on retainer creation:
• Applied to containing objects (for example, folders):
— Individual retention policy: Each object under the folder inherits an individual retainer.
Objects age with their own retainer.
— Linked retention policy: A shared retainer is created. All objects under the folder age with
the folder.
• Applied to non-containing objects (for example, eas_aip objects):
— Individual retention policy: An individual retainer is created.
— Linked retention policy: An individual retainer is created.
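The retainer-creation rules above reduce to a single special case: only a linked policy applied to a containing object produces a shared retainer. A minimal sketch, with a hypothetical function name:

```python
def retainer_created(policy_type: str, applied_to_container: bool) -> str:
    # Only a linked policy on a containing object (such as a folder) creates
    # a shared retainer; every other combination creates individual retainers.
    if policy_type == "linked" and applied_to_container:
        return "shared"
    return "individual"
```

This is why objects in a folder under a linked policy all age together, while every other case lets each object age with its own retainer.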
Applying Retention Markups
In Records Client, right-click a folder and choose Retention > Apply Retention Markups from
the shortcut menu to apply retention markups.
You can create your own retention markups with custom markup names. The types of retention
markups are restricted to the following:
• Hold: Stops destruction of objects.
• Permanent: Stops destruction of objects, and only retention manager can remove permanent
markups.
• Freeze: Stops the promotion of an object from one phase to the next phase.
• Review: Sends a notification (either an email or an inbox) to a named contact following a time
period that you select.
• Vital: Used to designate which records are critical to the day-to-day operation of the business. It
does not prevent disposition.
Only the Hold and Permanent retention markups prevent an item from being disposed of, including
privileged deletes. Hold and Permanent markups are used in a similar way to InfoArchive purge
locks.
Once retention markups are applied, the retention of an AIP object is determined by the retention
markups. After you remove the retention markup from an AIP object, the AIP reverts to RPS
retention or InfoArchive date-based retention.
InfoArchive never purges an AIP with an RPS retention markup applied to it. You must remove the
markup before the AIP can be purged.
Escalating DFC Instances to Privileged Users
InfoArchive makes changes to AIPs and the repository when executing jobs; for example, it adds
and deletes renditions, creates repository folders, and deletes AIP objects. Only privileged
users can make changes to objects under retention.
You must escalate InfoArchive DFC instances to privileged users after you install Records Client
to ensure successful execution of InfoArchive jobs.
To escalate DFC instances to privileged users in DA:
1. Choose Client Rights Management > Privileged Clients.
2. Click Manage Clients in the upper-right corner.
3. Add all InfoArchive DFC clients to the right, and approve them from the shortcut menu.
4. Click OK and exit.
5. In the Privileged Clients list, right-click a client and choose Approve Privilege from the
shortcut menu.
6. Restart the web application server.
Note: If InfoArchive components are distributed among several hosts, you must set the DFC instances
of the following components as privileged clients:
• InfoArchive receiver
• InfoArchive ingestor
• JMS instances
• Content Server jobs
Aligning the Retention Base Time
If you create an RPS retention policy based on date, the default base date is the
r_creation_date of the retained object.
To align the RPS policy base retention date with the base retention date set using InfoArchive's
date-based retention mechanism, create a base date in Records Client, and map the base date to
the eas_dss_base_retention_date property.
To align an RPS base date with the eas_dss_base_retention_date property of AIP objects:
1. In Records Client, navigate to Retention Policy Services > Base Dates.
2. Choose File > New > Base Date.
3. In the dialog, select eas_aip in the Object type drop-down list, and Batch base retention
date (eas_dss_base_retention_date) in the Attribute drop-down list.
4. Click OK.
Creating a Retention Policy
To create a retention policy:
1. Launch Records Client (http://host_name:8080/records) in the web browser.
2. Click Retention Policy Services > Authorities in the left pane.
3. Create an authority.
4. Click Retention Policy Services > Conditions in the left pane.
5. Create a condition.
6. Click Retention Policy Services > Retention Policies in the left pane.
7. Create a retention policy.
You have the following constraints:
• Disposition Strategy
You can only choose the Export All or Destroy All disposition strategy. All other disposition
strategies are currently not supported.
• Rendition Rule
When choosing whether to protect the renditions from deletion, you must select Primary
Format Only as the parent rendition rule. The All Renditions option is currently not
supported.
• Metadata Immutable
You must set Make Parent Metadata Immutable to No. Setting this option to Yes prevents
InfoArchive from updating AIP metadata (attributes).
Applying RPS Retention Policies to AIP Objects
There are three ways of applying an RPS retention policy to AIP objects:
• Applying a retention policy to a folder
• Setting a retention policy at the holding level
• Setting a retention policy in the SIP descriptor (eas_sip.xml)
Applying a Retention Policy to a Folder
Folders are a type of containing object in RPS. If you want to retain the objects in a folder in
a synchronized manner, for example, so that all objects in the folder are purged at the same
time, you can apply a linked retention policy to the direct containing folder.
You apply a retention policy to a folder by right-clicking the folder and selecting Retention >
Apply Retention Policy from the shortcut menu.
Closing and Reopening a Folder
If you have all AIP objects ingested into a folder, and you want to prevent any more contents being
added, you can close the folder.
You close a folder by right-clicking the folder and choosing Retention > Close Folder from the
shortcut menu in Records Client.
If you attempt to receive an AIP object into a closed folder, the following message appears:
[dfc.error]: Exception:DfException:: THREAD: main; MSG: [DMC_RPS_LINK
_ERROR] This folder is marked as closed and cannot be linked into;
ERRORCODE: ff; NEXT: null
You can reopen a closed folder by right-clicking the folder and choosing Retention > Re-open Folder
from the shortcut menu in Records Client.
Setting a Retention Policy at the Holding Level
You can apply retention policies to a set of AIP objects at the holding level with the
eas_def_retention_class property.
When you set a default retention class (RPS_0_DAY in this example) at the holding level, the
following rules apply:
• If no retention class is specified in the SIP descriptor, InfoArchive applies RPS_0_DAY to AIP
objects.
• If Default retention class is not defined at the holding level, the Default Retention period
(d) is applied to AIP objects.
• If EAS retention disabled is TRUE for a retention class, the EAS retention period is not
applied after you remove the applied retention from AIP objects. If EAS retention disabled is
FALSE, the EAS retention period is applied after you remove the applied retention from AIP
objects.
• A retention class can map to one or more retention policies. You can apply multiple policies
to a single AIP.
Setting a Retention Policy in the SIP Descriptor
You can specify a retention class in the SIP descriptor. The following SIP descriptor (eas_sip.xml)
specifies a retention class of RPS_1_DAY.
<?xml version="1.0" encoding="UTF-8"?>
<sip xmlns="urn:x-emc:eas:schema:sip:1.0">
<dss>
...
<retention_class>RPS_1_DAY</retention_class>
...
</dss>
......
</sip>
• The retention class specified in the SIP descriptor overrides the default retention class at the
archive holding level.
• If the folder into which the SIP is ingested has its own retention policy, the AIP is retained using
two retention policies.
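As an illustration only (not InfoArchive code), the retention class declared in a SIP descriptor like the one above can be read with any namespace-aware XML parser. The following Java sketch assumes the eas_sip.xml structure shown above; the class name and sample string are hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class SipRetentionReader {
    static final String SIP_NS = "urn:x-emc:eas:schema:sip:1.0";

    // Minimal sample descriptor based on the eas_sip.xml fragment above.
    static final String SAMPLE =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
      + "<sip xmlns=\"" + SIP_NS + "\">"
      + "<dss><retention_class>RPS_1_DAY</retention_class></dss>"
      + "</sip>";

    /** Returns the retention class declared in the descriptor, or null if absent. */
    public static String retentionClass(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true); // the sip element is in the urn:x-emc:eas namespace
        Document doc = f.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList nodes = doc.getElementsByTagNameNS(SIP_NS, "retention_class");
        return nodes.getLength() > 0 ? nodes.item(0).getTextContent().trim() : null;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(retentionClass(SAMPLE)); // prints RPS_1_DAY
    }
}
```

If no retention_class element is present, the method returns null, which corresponds to the case where the holding-level default retention class applies.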
Changing RPS Retentions
You can extend, modify, or replace retention policies in Records Client when the policy is not
in Active state.
You can also increase or decrease the retention date for an individual AIP by choosing Change
Retention from the shortcut menu for individual AIPs.
Disposing AIPs under RPS Retention
Disposition refers to removing AIP objects from the repository through a set of jobs. Disposition
can only be performed on AIP objects in a steady state. Rejection, invalidation, or pruning can be
performed on AIPs in an unsteady state.
Disposition can be triggered by one of the following jobs:
• eas_purge, when eas_retention_date is expired
• dmc_rps_DispositionJob, when the retainer is in its final state
For an AIP under RPS retention, you must promote its retainer to the Final state before disposing the
AIP. You perform the following jobs to dispose an AIP under RPS retention:
1. Run dmc_rps_PromotionJob to move the retainer to the Final state.
2. Run dmc_rps_DispositionJob to dispose of the AIP. The AIP is moved to the /System
   EAS/Data/Purged repository folder.
3. Run the confirmation (eas_confirmation) job.
4. Run the purge (eas_purge) job.
Rejecting/Invalidating AIP Objects Under RPS Retention
When you reject or invalidate AIP objects under RPS retention, you remove all renditions of the
object. The object itself is not removed. The format of rejected/invalidated AIP objects changes
from eas_ci_container to eas_empty.
Deleting AIP Objects Under RPS Retention
You cannot delete AIP objects under RPS retention. However, if you are a privileged user, you
can perform privilege deletion on AIP objects. Privilege deletion results vary depending on the
AIP state prior to the action.
AIP State Code    Result
COM               Attach the purge lifecycle to the AIP object
PUR-WDEL          Delete the AIP object
PRU-WPROC         Delete the AIP object
REJ-DONE          Delete the AIP object
INV-DONE          Delete the AIP object
Others            No action
Working with InfoArchive GUI
Shipped with the InfoArchive installation package, InfoArchive GUI is the default web-based
search application for searching data archived in InfoArchive holdings. InfoArchive GUI consists
of a Search UI, which lets you perform both real-time (synchronous) searches and background
searches/orders (asynchronous), and a Background Searches UI, which displays all submitted orders.
For information about InfoArchive search, see How Data is Searched, page 62.
The Search UI consists of a search menu and a search pane that displays search form and search
result page:
• Search menu
The search menu groups search forms under folders based on which holding they are used to
search, each folder corresponding to a distinct holding.
• Search form
The search form contains search criteria fields and search buttons with which the end user
performs queries against a holding. A search form is specific to InfoArchive GUI and is the
mechanism used to define what can be searched.
• Search result
The search result page displays the returned query results.
The search UI components are completely customizable and you must configure them before you can
use InfoArchive GUI to search archived data in your holdings.
Logging in to InfoArchive GUI
1. Launch InfoArchive GUI in a browser. The default URL is http://myhost:8080/eas-gui.
2. Enter your username and password. You must have at least read access rights to the holdings
   you want to search.
3. Choose the locale in which you want to view search forms.
   After you log in, search forms that have been localized will appear in the corresponding
   language; otherwise, the default locale defined in the eas-services.properties file
   (eas.locale.default=en_US) will be used.
4. Click Sign In.
Searching Archived Data
To search for archived data in InfoArchive GUI:
1. From the search menu on the left, click the holding you want to search.
2. In the search form, enter the search criteria.
3. Do one of the following:
• To perform a synchronous search, click Search.
• To submit an order (asynchronous search), click Background Search and then optionally
enter a name to identify your background search. You can see the order you submitted in the
Background Search screen.
When the order is executed, you can view the search results or export all the contents of
the returned AIUs.
In the search page, the logical relations among multiple criteria are not spelled out on the search
screen (unless you specify them in the field label), but can be found in the Search Details panel at
the top of the search results screen.
Working with the Search Results Page
Records returned by a search (synchronous or asynchronous) are displayed in the search result page.
A typical search results page consists of the components described below.
The search results page is configurable through the stylesheet configuration (eas_cfg_stylesheet)
object. See Configuring the Search Results, page 200.
On the search results page, you can perform the following actions:
• Expand the Search Details panel to view the detailed information about the search:
— The search criteria entered in the preceding search form for executing the query, including the
specified attribute values and their logical relations (AND/OR)
— The search options set in the query configuration, including the ACL/holding name, delivery
channel, and result schema used by the query.
• Sort results by a column by clicking on a column header, switching between ascending and
descending order.
• Filter results by a column by entering or selecting a filter value below the column header. For
some value data types, you can choose a comparison operator for the filter.
• You can filter the results by more than one column. By default, multiple filter values are combined
using the AND relation. At the bottom of the screen, you can click the AND/OR button to switch
between these two relations. To remove all filter criteria, click the Clear filters button.
With the OR relation applied, search results that meet any one of the specified filter conditions are
displayed. For example, if “Logan” and “Johnson” are used as the filter values for First Name and
Last Name respectively, with the OR option selected at the bottom of the page, records that satisfy
either of the filter criteria are displayed, sorted here by Call received at in ascending order.
• If a result contains an unstructured content file, you can click the action button corresponding to
the file data type (if configured) to download or open the file.
• Click the + (plus) sign to the left of a record to view its detailed information. If the record has
associated unstructured content files, they are displayed below as attachments.
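The AND/OR combination of column filters described above can be sketched with plain Java predicates. This is an illustration of the filtering semantics only, not InfoArchive GUI code; all names are hypothetical:

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class FilterRelation {
    /** Combines per-column filters with AND or OR, mirroring the AND/OR toggle. */
    public static List<Map<String, String>> filter(
            List<Map<String, String>> rows,
            List<Predicate<Map<String, String>>> filters,
            boolean useOr) {
        // AND starts from "always true" and narrows; OR starts from "always false" and widens.
        Predicate<Map<String, String>> combined = useOr
                ? filters.stream().reduce(r -> false, Predicate::or)
                : filters.stream().reduce(r -> true, Predicate::and);
        return rows.stream().filter(combined).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = List.of(
                Map.of("First Name", "Logan", "Last Name", "Smith"),
                Map.of("First Name", "Mary", "Last Name", "Johnson"),
                Map.of("First Name", "Mary", "Last Name", "Smith"));
        List<Predicate<Map<String, String>>> filters = List.of(
                r -> r.get("First Name").equals("Logan"),
                r -> r.get("Last Name").equals("Johnson"));
        // OR: rows matching either filter; AND: rows matching both.
        System.out.println(filter(rows, filters, true).size());  // 2
        System.out.println(filter(rows, filters, false).size()); // 0
    }
}
```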
Exporting AIUs on the Search Results Page
To export structured and/or unstructured data from AIUs on the search results page:
• Select the AIUs you want to export and click Export Selected (only appears when at least one
AIU is selected) at the bottom of the screen. If the search results span multiple pages, navigate
through the pages to select them.
If you want to export all the AIUs, click Export All without having to select any AIUs.
• If returned AIUs have unstructured content files associated with them, you can use the Export
with Structured/Unstructured Content icon at the bottom to choose whether to export structured
data only or export both structured data and unstructured content files.
• By default, all unstructured data is exported in XML format. You can change the export format
to CSV or TXT format using the format option button next to the Export Selected and Export
All buttons.
Each export operation is logged in the audit trail complete with detailed information such as the AIU
IDs and content IDs of exported records. To see a detailed report of who exported what information
during the specified period of time, see the Exported Information report (Exported Information,
page 282).
Using the InfoArchive GUI Direct Search URL
If you want to integrate InfoArchive GUI’s search capabilities into a third-party or custom-built
business application, you can use a direct search URL to perform queries without having to log
in to InfoArchive GUI.
Using this approach, you can bypass explicit InfoArchive GUI login, include search criteria as the
query parameter directly in the URL, and leverage the configured InfoArchive GUI search forms
and results pages, which allows for seamless integration of InfoArchive GUI with your business
application.
To use a direct search URL:
1. Create a JSP (Java Server Pages) file and deploy it as part of the InfoArchive GUI web application.
   For example, on Apache Tomcat, place the JSP file in the following location:
   TOMCAT_HOME/webapps/eas-gui
   Here is a sample of the JSP code:
<%@ page import="com.emc.documentum.eas.gui.gwt.server.EASGUIApplication" %>
<%@ page import="com.emc.documentum.eas.gui.gwt.server.NoDomainGroupInProfileListException" %>
<%@ page import="com.emc.documentum.eas.service.model.authenticate.Profile" %>
<%@ page import="java.util.ArrayList" %>
<%@ page import="java.util.List" %>
<%
    try {
        String user = request.getParameter("user");
        String domain = request.getParameter("domain");
        String profile = request.getParameter("profile");
        String query = request.getParameter("query");
        String schema = request.getParameter("schema");
        String channel = request.getParameter("delivery");
        if (domain != null && profile != null) {
            List<Profile> profiles = new ArrayList<Profile>();
            Profile d = new Profile();
            d.setName(domain);
            d.setIsDomain("true");
            d.setIsDynamic(false);
            d.setClazz("domain");
            d.setDescription("");
            Profile p1 = new Profile();
            p1.setName(profile);
            p1.setIsDomain("false");
            p1.setIsDynamic(true);
            p1.setClazz("role");
            p1.setDescription("");
            d.getProfiles().add(p1);
            profiles.add(d);
            EASGUIApplication.getAuthenticationHook().onUserAuthenticated(profile, "en_US", profiles);
            response.sendRedirect(request.getContextPath() + "/directSearch?query=" + query
                    + "&schema=" + schema + "&delivery=" + channel);
        } else if (user != null) {
            EASGUIApplication.getAuthenticationHook().onUserAuthenticated(user, "en_US", user);
            response.sendRedirect(request.getContextPath() + "/directSearch?query=" + query
                    + "&schema=" + schema + "&delivery=" + channel);
        } else {
            response.sendRedirect(request.getContextPath() + "/login?locale=en_US");
        }
    } catch (NoDomainGroupInProfileListException e) {
        response.sendRedirect(request.getContextPath() + "/login?locale=en_US");
    }
%>
2. In the JSP file, provide the URL of the page to redirect to in the response.sendRedirect
   method. For example, if you want to be redirected to a search form, provide the redirect URL
   like this:
   response.sendRedirect(request.getContextPath() + "/eas#Root:search_form.PhoneCalls.01?locale=en_US");
3. Restart the web application server. You can then access InfoArchive GUI via the JSP file by
   including the following parameters directly in the URL:
Parameter   Description
user        Username with which to log in to InfoArchive GUI
profile     The role to which the user pertains, with access privileges to the holding to search
domain      Domain to use to display a specific set of search forms
schema      PDI schema to use for the search
delivery    Delivery channel to use as the destination of search results. Always specify
            the standard delivery channel: eas_access_services.
query       DQL expression that will be parsed into the search criteria for the query. The
            DQL expression uses the following syntax:
            AIC where criteria order by element

In the criteria expression, you can use the following comparison operators,
which will be parsed into the corresponding operators that InfoArchive understands:

DQL Operator   InfoArchive Operator
=              Equal
!=, <>         NotEqual
>              Greater
>=             GreaterOrEqual
<              Less
<=             LessOrEqual
like           StartsWith

The following built-in functions are also supported in the criteria expression:

DQL Function                  InfoArchive Operator
contains(name,'value')        Contains
starts-with(name,'value')     StartsWithFullText
You can also use the AND and OR logical operators in the criteria.
Here is an example direct search URL containing parameters to pass to the search form:
http://localhost:8080/eas-gui/example.jsp?user=admin&domain=samples
&profile=phonecalls_role&query=PhoneCalls where CostumerID='1' or FirstName like 'John'
&schema=urn:eas-samples:en:xsd:phonecalls.1.0&delivery=eas_access_services
4. The direct search URL redirects you to the search results page.
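Because the query parameter typically contains spaces, quotes, and other characters that are not URL-safe, a calling application should percent-encode each parameter value when building the direct search URL. The following Java sketch is illustrative only; the helper class is hypothetical, and the parameter names match the table above:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class DirectSearchUrl {
    /** Builds a direct search URL, percent-encoding each parameter value. */
    public static String build(String base, String user, String domain, String profile,
                               String query, String schema, String delivery) {
        return base
            + "?user=" + enc(user)
            + "&domain=" + enc(domain)
            + "&profile=" + enc(profile)
            + "&query=" + enc(query)       // DQL expression: spaces and quotes get encoded
            + "&schema=" + enc(schema)
            + "&delivery=" + enc(delivery);
    }

    private static String enc(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String url = build("http://localhost:8080/eas-gui/example.jsp",
                "admin", "samples", "phonecalls_role",
                "PhoneCalls where CostumerID='1' or FirstName like 'John'",
                "urn:eas-samples:en:xsd:phonecalls.1.0", "eas_access_services");
        System.out.println(url);
    }
}
```

Note that URLEncoder uses form encoding (spaces become +), which servlet containers decode transparently when the JSP reads the parameters with request.getParameter.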
Using Archival Reports
InfoArchive is a system designed to archive large volumes of data of varying complexity,
including structured and unstructured data, over a long period of time. To make the most effective
use of the system, it is important to understand how it is being used.
InfoArchive automatically collects key system usage data and provides out-of-the-box archival
reporting capabilities that help you gain visibility into how business data is being archived by the
system and turn current and historical statistics into valuable insight to optimize system utilization, as
well as troubleshoot ingestion problems. In addition, the Archived Volume History report provides
the basis for InfoArchive volume-based pricing.
InfoArchive provides key archival metrics through a set of pre-configured saved searches in DA:
• List of Ingestions in Progress
View valid AIPs in the system that have not been successfully archived yet.
• List of Ingestions with Errors
View AIPs with errors encountered during the ingestion process (return code is not 0), including
rejected or invalidated AIPs.
• List of AIPs by Retention Date
View all archived AIPs whose retention date is before a specified date.
• AIPs for Disposition
View all AIPs with a disposition date determined by the retention policy.
• Current xDB Library Pool Volume
View the most current archival metrics of the xDB library pools that store archived AIPs.
• Current Archived Volume
View the most current archival metrics for a specified holding and entity.
• Archived Volume History
View detailed as well as aggregate historical records of archival metrics for specified holdings and
entities over a past period of time.
• Performed Actions
View who (users and roles) performed what actions on a specific holding during a specified
period of time.
• Exported Information
View who (users and roles) have exported what information (which AIUs) from a specified
holding or all holdings during a specified period of time.
Using Archival Reports in DA (Common Operations)
As with other saved searches, you can use the archival reports through common operations in DA:
• To access the reports, in the DA navigation pane, click Saved Searches.
• To run a report, right-click the report and select Run Search.
• To rerun the report, in the Search Results page, click Restart from the menu on the top-right
corner of the screen. (This button is not available for Current xDB Library Pool Volume.)
• To sort on an attribute, click the column header in the search results table.
• To customize the search results table, in the Search Results page, click the Column Preferences
icon next to the right-most column header. In the Column Preferences pop-up window, you can
add or remove attributes to display in the search result table. (This feature is not available for
Current xDB Library Pool Volume.)
• To better understand the archival reports, refer to the EMC InfoArchive Object Reference Guide,
which contains descriptions of all the object properties that map to corresponding columns in the
search results tables.
• To monitor search results in real time when a report is running, on the search results page, click
the Open the Search Status icon (magnifying glass) as soon as the search has started.
• To view the underlying DQL query statement for the report, in the Real Time Search Monitoring
page, click Show native query in the table.
List of Ingestions in Progress
Use the List of Ingestions in Progress report to view valid AIPs in the system that have not been
successfully archived yet, which include AIP objects whose eas_phase_seqno values are any of the
following (representing corresponding phases in the AIP lifecycle):
• 1 (Receive)
• 2 (Pending ingestion)
• 3 (Ingestion)
• 4 (Pending commit)
List of Ingestions with Errors
Use the List of Ingestions with Errors report to view AIPs with errors encountered during the
ingestion process (return code is not 0), including rejected or invalidated AIPs.
Ingesting an AIP always returns a code (represented by the eas_return_code property of the AIP
object): 0 (zero) means the AIP has been successfully ingested; a value other than 0 (zero) means
the ingestion has failed. You can use the return code and its corresponding return message to
troubleshoot the ingestion error. After you resolve the ingestion problem, you can either resume the
ingestion process or invalidate/reject the AIP.
List of AIPs by Retention Date
Use the List of AIPs by Retention Date report to view all archived AIPs whose retention date is before
a specified date. If you do not specify a date, all AIPs with a retention date are returned.
AIPs for Disposition
Use the AIPs for Disposition report to view all AIPs with a disposition date determined by the
retention policy. The report provides the following details:
Column                    Description
AIP                       Name of the AIP with a disposition date already set
Holding Name              Name of the holding to which the AIP belongs
Disposition Date          Exact date and time of disposition
Time to Disposition       Number of days remaining until the disposition date; if the disposition
                          date has already passed, the status Expired is displayed
Hold Requested by         The user who applied the purge lock to the AIP, if there is one
Retention Markup Details  The retention policy and markup designation if Retention Policy Services
                          (RPS) is implemented
Current xDB Library Pool Volume
Use the Current xDB Library Pool Volume report to view the most current archival metrics of the
xDB library pools that store archived AIPs.
When defining the search criteria, you can select a library pool and specify a range for the last cache
in date, last cache out date, or opening date. If you do not specify any search criteria, the report
displays metrics for all the xDB library pools that store archived data.
You can export the report to the comma-separated values (CSV) format by clicking Export to CSV
from the menu on the upper-right corner of the page.
Current Archived Volume
Use the Current Archived Volume report to view the most current archival metrics for a specified
holding and entity. The report displays aggregated statistics computed in real time from all archived
AIPs currently stored in the system.
If you do not specify a holding or an entity when you define the search criteria, all holding/entity
combinations with archived AIPs are returned. It may take a long time to return the search results
when the number of archived AIPs is large.
Note: Empty holdings and entities not associated with any archived AIPs are not available for
selection when you define the search criteria; holding/entity combinations with no archived AIPs are
not returned in search results.
Archived Volume History
Use the Archived Volume History report to view detailed as well as aggregate historical records of
archival metrics for specified holdings and entities over a past period of time. By offering you a view
of the archival metrics at varying levels of granularity, the report lets you see both the forest and the
trees—you can view a high-level summary of archival metrics to understand the archiving trend
during a period, as well as dive in to see individual snapshot of metrics at the time of archiving.
There are two report types:
• List
The list report type displays detailed historical records of archival metrics during a specified
period of time, in chronological order. If you do not specify a time period, all historical records are listed.
The report displays the same metrics as the Current Archived Volume, but instead of showing the
most current archival metrics computed in real time, it pulls from an underlying registered table
past records of archival metrics captured periodically by a scheduled job. Each record has a date
timestamp and represents a snapshot of archival metrics at that point in time.
You can export the list type report to the comma-separated values (CSV) format by clicking Export
to CSV from the menu on the upper-right corner of the page.
• Aggregate
The aggregate report gives you the flexibility to view a host of archival metrics aggregated by
specified functions in varying levels of granularity. For example, you can view the average
number of archived packages aggregated by each week for the past three months or the maximum
archived volume of structured and unstructured data combined aggregated by each month
during last year.
Note: InfoArchive volume-based pricing is based on the maximum metadata + contents volume
metric (received structured data and associated unstructured data volume).
To define the search criteria for the Archived Volume History report, you first specify a holding and
an entity for which you want to generate the report, and then choose a report type.
If you do not specify a holding or an entity when you define the search criteria, all holding/entity
combinations with archived AIPs are returned. It may take a long time to return the search results
when the number of archived AIPs is large.
Note: Empty holdings and entities not associated with any archived AIPs are not available for
selection when you define the search criteria; holding/entity combinations with no archived AIPs are
not returned in search results.
After the report is generated, you can:
• Click the Export To CSV command on the top-right corner of the screen to export the generated
report into a .csv file that contains a snapshot of all the metrics for each interval period during
a specified period of time
• Click the Show Histogram command on the top-right corner of the screen to view the archival
metrics visually represented as a histogram.
Performed Actions
Use the Performed Actions report to view who (users and roles) performed what actions on a specific
holding during a specified period of time.
You must export the report to CSV to view the complete action history information, including the
event name and object ID associated with the performed actions.
The exported Performed Actions report provides detailed information about the following
InfoArchive audit trail events (of object type eas_audittrail): eas_query, eas_fetch, eas_getcontent,
eas_order, and eas_getorder.
Exported Information
Use the Exported Information report to view who (users and roles) have exported what information
(which AIUs) from a specified holding or all holdings during a specified period of time.
Archived AIPs and Archival Metrics
The Current Archived Volume and Archived Volume History reports share the same set of archival
metrics. Archival metrics are calculated based on AIPs that have been successfully archived into the
system (ingestion committed with the phase being Complete in the AIP lifecycle) and are not in a
transient state (such as included in an aggregate). Qualified AIPs have the following attribute values:
• eas_phase_code = COM
• eas_is_in_unsteady_state = false
The following AIPs are not taken into account by archival metrics calculation and reporting:
• AIPs in any phase of the lifecycle prior to Complete: Receive, Pending ingestion, Ingestion, and
Pending commit
• AIPs that have been invalidated, rejected, or purged
• AIPs being synchronously ingested (eas_is_in_unsteady_state = true)
• AIPs to be aggregated
Note: The symbol G in the archival metrics represents the Giga unit prefix that denotes a factor of a
billion (1,000,000,000) rather than 1024M (1,073,741,824).
• Metadata volume (GChar): Total number of characters, in billions, contained in archived
PDI files (structured data), calculated as the sum of the eas_pdi_values_char_count property
values of all archived AIPs. The structured data character count for an AIP includes XML
element values and attribute values in the PDI file (eas_pdi.xml) and excludes XML tags,
to accurately reflect the volume of actual business data archived. Therefore, the count is
independent of the XML schema used. For example, the following element contains 6 characters
of structured data:
<myelement myattribute="ABC">DEF</myelement>
• Contents volume (GB): Total volume in gigabytes of archived unstructured content,
calculated as the sum of the eas_ci_size property values of all archived AIPs. The unstructured
content volume for a single AIP is the uncompressed size of all the content files received in the
package, before any compression or encryption applied during the ingestion process. For
example, a SIP package consisting of a 500KB PDF file and a 300KB ZIP file is 800KB in data size.
• Metadata + Contents volume: Total volume of archived data, including structured data (PDI
files) and unstructured data (content files), calculated as the sum of the structured data volume
and the contents volume (note that the two component metrics use different units). The largest
value of this metric over a given period of time (calendar year) is used for InfoArchive
volume-based pricing.
• Number of packages: Total number of archived AIPs stored in the xDB library, calculated as
the sum of the eas_aip_count property values of all archived AIPs.
• Number of AIUs: Total number of archived AIUs, calculated as the sum of the
eas_sip_aiu_count property values of all archived AIPs.
• Received package volume (GB): Total size in gigabytes of received SIP files that have been
archived, calculated as the sum of the eas_sip_file_size property values of all archived AIPs.
• Raw XML metadata volume (GB): Total size in gigabytes of PDI files (eas_pdi.xml) that have
been archived, calculated as the sum of the eas_pdi_file_size property values of all archived AIPs.
• xDB volume (GB): Total size in gigabytes of the xDB storage space occupied by archived
data, including PDI files (eas_pdi.xml), tables of contents (eas_ri.xml), and indexes created
on these documents. The metric is calculated by aggregating the sum of the eas_xdb_pdi_size,
eas_xdb_ri_size, and eas_xdb_index_size property values of all archived AIPs; it does not
include the size of any xDB indexes created at an upper level for AIPs ingested with xDB
mode 1 or 3.
• Storage volume (GB): Total size in gigabytes of all the content stored in the repository
(e.g., renditions) pertaining to archived AIPs, calculated as the sum of the eas_contents_size
property values of all archived AIPs.
Configuring the Calculation of Archived Structured
Data Volume
To keep track of the cumulative volume of archived structured data, InfoArchive counts the number
of characters contained in the PDI file (eas_pdi.xml) of the AIP during the ingestion process
and stores this information as the value of the eas_pdi_values_char_count property of the AIP. If a
counting error occurs, structured data hash is not validated correctly and ingestion fails.
If xDB mode is set to 2, InfoArchive sums up the structured data character count of all LWSO AIPs
into the eas_pdi_values_char_count property value of the aggregated AIP.
The structured data character count for an AIP includes XML element values and attribute values in
the PDI file (eas_pdi.xml) and excludes XML tags to accurately reflect the volume of actual business
data archived. Therefore, the structured data character count is independent of the XML schema used.
You can configure whether spaces are ignored when counting structured data characters by setting
the are.whiteSpaces.ignored value in the following ingestion configurations:
• eas_cfg_ingest_sip_zip-pdi for ingesting structured information
• eas_cfg_ingest_sip_zip-ci for ingesting unstructured information
Both ingestion configurations are .xml files imported into the repository as eas_cfg_ingest type
objects. Add the following element in the files:
...
<processor>
<class>com.emc.documentum.eas.ingestor.transform.processor.validator.PDISchemaValidator</class>
<data>
<are.whiteSpaces.ignored>true</are.whiteSpaces.ignored>
</data>
</processor>
...
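The counting rule described above (element values and attribute values counted, XML tags excluded, whitespace optionally ignored) can be sketched with a standard DOM traversal. This is an illustration of the rule only, not InfoArchive's implementation; the class name is hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class PdiCharCounter {
    /** Counts characters in element text and attribute values, excluding XML tags.
        When ignoreWhitespace is true, whitespace characters are not counted. */
    public static long count(String xml, boolean ignoreWhitespace) throws Exception {
        Node root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        return countNode(root, ignoreWhitespace);
    }

    private static long countNode(Node node, boolean ignoreWs) {
        long total = 0;
        if (node.getNodeType() == Node.TEXT_NODE) {
            // Element values: only the text between tags is counted.
            total += countChars(node.getNodeValue(), ignoreWs);
        } else if (node.getNodeType() == Node.ELEMENT_NODE) {
            // Attribute values are counted; attribute names and tags are not.
            NamedNodeMap attrs = node.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                total += countChars(attrs.item(i).getNodeValue(), ignoreWs);
            }
            NodeList children = node.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                total += countNode(children.item(i), ignoreWs);
            }
        }
        return total;
    }

    private static long countChars(String s, boolean ignoreWs) {
        return ignoreWs
                ? s.chars().filter(c -> !Character.isWhitespace(c)).count()
                : s.length();
    }

    public static void main(String[] args) throws Exception {
        // The example from the metrics description: 3 attribute chars + 3 text chars = 6.
        System.out.println(count("<myelement myattribute=\"ABC\">DEF</myelement>", false));
    }
}
```

Run against the documentation's own example, the sketch yields 6 characters, matching the stated rule that tags and attribute names contribute nothing to the count.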
Configuring the eas_report Job
In both the list and aggregate type Archived Volume History reports, historical metrics data is pulled
from a registered table named eas_history_volume. A scheduled job, eas_report, automatically collects
and aggregates AIP archiving data and periodically saves all historical statistics in the
eas_history_volume table. Each record represents a snapshot of the archival metrics at the point in
time the archiving data was captured.
The eas_report job must be activated and scheduled to run at least once a week to ensure enough
archiving data is captured to generate the Archived Volume History report and provide the basis for
InfoArchive volume-based pricing. InfoArchive installation automatically creates the eas_report job
in DA and sets it to the Active state. You can change the job schedule to meet your specific reporting
requirements but the minimum frequency of weekly job runs must be maintained. You can also use
your own job scheduler to schedule the eas_report job outside DA.
To schedule the eas_report job in DA, you use the same steps as for other jobs:
1. In the DA navigation pane, connect to your repository and click Administration > Job
   Management > Jobs.
2. Use the search box to locate the eas_report job.
3. Right-click the job and select Properties.
4. On the Info page of the Job Properties window, make sure State is Active.
5. Select a trace level from the Trace Level list:
   • 0: TRACE
   • 1: DEBUG
   • 2: INFO
   • 3: WARN
   • 4: ERROR
Trace logs contain status information logged by the job. The trace level set for the job determines
the amount of information logged.
6. Click the Schedule tab and change the job schedule.
Field Label               Description
Next Run Date and Time    Specifies the next start date and time for the job.
                          The default is the current date and time.
Repeat                    Specifies the time interval in which the job is repeated.
Frequency                 Specifies how many times the job is repeated.
                          For example, if Repeat is set to Weeks and Frequency is set to
                          1, the job repeats every week. If Repeat is set to Weeks and
                          Frequency is set to 3, the job repeats every three weeks.
End Date and Time         Specifies the end date and time for the job. The default end
                          date is 10 years from the current date and time.
After                     Specifies the number of invocations after which the job becomes
                          inactive.
After the eas_report job is run:
• To view the job report, right-click the job and select View Job Report.
• To view the job trace logs, right-click the job and select View Trace File.
• Right-click the job and select Properties to view the job run history information at the bottom of
the Info page.
Last Return Code displays the last value returned by the job, which can help you troubleshoot
problems when errors occur:
— 0: The job was executed successfully without any errors.
— -1: An unexpected error occurred.
— 3: There was an error related to credentials.
— 4: There was an error related to job parameters.
— 5: There was an error related to job initialization.
Managing Orders
InfoArchive orders are created by asynchronous (background) searches and are represented by
eas_order objects in Content Server. You manage orders using DA in the same way you manage other
Content Server objects. You can create orders through SOAP requests or the InfoArchive GUI.
Viewing Order Properties and Processing History
To view the properties of an order, right-click the order in DA and then select Properties. In the
Properties window, you can click through tabs to view various order properties grouped under them.
You can view detailed order processing history under the Tracking Details tab.
Suspending/Resuming an Order
To temporarily suspend an order or to resume a suspended order:
1. In DA, right-click an order and select Properties. In the Properties window, click the Tracking tab.
2. Select the Suspended option to suspend the order; clear this option to resume the order.
3. Click OK.
Changing the Priority of an Order
To change the priority of an order:
1. In DA, right-click an order and select Properties. In the Properties window, click the Order tab.
2. In the Priority field, set the new priority for the order.
3. Click OK.
Cancelling an Order
To abort the execution of an order:
1. In DA, right-click an order and select Properties. In the Properties window, click the Tracking tab.
2. Select the Delete requested option to cancel the order.
3. Click OK.
Purging Orders
Two scheduled jobs are executed to purge orders:
1. Run the eas_clean_order Content Server job.
This job destroys all completed orders that have reached their retention date, making associated
data in xDB orphaned.
2. Run the eas-launch-xdb-clean job on the xDB server.
This job destroys orphaned xDB data.
Managing Jobs
InfoArchive Jobs
InfoArchive includes several specific jobs to be scheduled:
• As Content Server repository jobs, for those that always have to be executed on the server
hosting the Content Server.
• As command line jobs to be triggered, for those that, depending on the deployed architecture,
may have to be executed on servers not hosting the Content Server.
Since InfoArchive relies on a repository, you must also schedule the execution of some standard
Content Server jobs.
Viewing Job Trace Logs
Trace logs contain status information logged by a job. The trace level set for a particular job
determines the amount of information logged.
After a job run, you can view the trace log for the job by right-clicking the job and selecting View Trace
File in DA.
However, it may take an InfoArchive job several hours to complete, and the trace file is not available
until the job has stopped running. For a job that is still running, you can find its log file in a
location like the following:
DOCUMENTUM/dba/log/00000001/sysadmin/job_log_file
Where job_log_file is a job-specific log file in .txt format:
• eas_close: eas_closeTrace.txt
• eas_confirmation: eas_confirmationTrace.txt
• eas_dctm_clean: eas_dctm_cleanTrace.txt
In addition, for Content Server methods being executed, you can find some information
pertaining to the related running job in the Content Server repository log located in
DOCUMENTUM/dba/log/repository_log_file.
If a job encounters an error before the listener (the object writing the report and the trace file) is
attached, the error is written to the standard output only. For example, if a memory exception occurs,
or if the arguments in the dm_job are not correct, an error is written to the standard output.
If the trace file or the report file is empty, you can find the error log in /Temp/Jobs/job_name. For
example, the close job log is located in /Temp/Jobs/eas_close.
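When monitoring a job that is still running, it can help to pick out the most recently modified trace file in a log directory. A minimal sketch in shell; the helper name latest_trace is illustrative, not an InfoArchive command, and the example path follows the layout described above.

```shell
# Print the most recently modified .txt trace file in a log directory.
# latest_trace is an illustrative helper, not part of the product.
latest_trace() {
  # List .txt files newest first and keep the first one.
  ls -1t "$1"/*.txt 2>/dev/null | head -n 1
}

# Example usage (the job id folder 00000001 varies per installation):
# tail -f "$(latest_trace "$DOCUMENTUM/dba/log/00000001/sysadmin")"
```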
InfoArchive Content Server Jobs
Content Server jobs are repository objects that automate method object execution. Methods associated
with jobs are executed automatically on a user-defined schedule. The properties of a job define the
execution schedule and turn execution on or off. Jobs are invoked by the agent exec process, a process
installed with Content Server. At regular intervals, the agent exec process examines the job objects in
the repository and runs those jobs that are ready for execution.
All InfoArchive job and method names start with eas_ as the prefix. You manage InfoArchive jobs
and methods under Administration > Job Management in DA. You manage InfoArchive jobs and
methods in the same way as you do with standard Content Server jobs. For information about
how to manage Content Server jobs, such as reschedule jobs, run jobs, view running job statuses,
view job reports, set job trace levels, and view job trace logs, refer to the Job Management chapter in
the EMC Documentum Administrator User Guide.
InfoArchive provides the RunJob utility to run Content Server jobs from the command line:
• On Windows:
EAS_HOME\bin\runJob.bat repository_name job_name
For example:
c:\InfoArchive\bin\runJob.bat repo01 eas_commit
• On Linux or UNIX:
EAS_HOME/bin/runJob.sh repository_name job_name
For example:
/usr/local/bin/InfoArchive/bin/runJob.sh repo01 eas_commit
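In scripted environments, the RunJob call is often wrapped so that a non-zero return code is surfaced to the operator; the return codes themselves are listed later in this chapter. A minimal sketch, assuming EAS_HOME points at the InfoArchive installation; the wrapper name run_ia_job is illustrative.

```shell
# Invoke an InfoArchive job via RunJob and report a non-zero return code.
# run_ia_job is an illustrative wrapper, not part of the product.
run_ia_job() {
  repo="$1"
  job="$2"
  "${EAS_HOME}/bin/runJob.sh" "$repo" "$job"
  rc=$?
  if [ "$rc" -ne 0 ]; then
    echo "runJob $job on $repo failed with return code $rc" >&2
  fi
  return "$rc"
}

# Example: run_ia_job repo01 eas_commit
```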
InfoArchive provides the following Content Server jobs:
Job Name
Job Code
Description
Archive Audit
eas_archive_audit
The Archive Audit job creates SIP files containing a
copy of the repository audit trails and posts them in the
ingestion queue by triggering the receiver.
DCTM Clean
eas_dctm_clean
The DCTM Clean job destroys the repository objects
associated with the orders having reached their
retention date.
Commit
eas_commit
The purpose of the Commit job is to seal the AIP
belonging to the same data submission session (DSS)
when they have all been ingested.
DMClean CAStore
dm_DMClean_CAStore
The DMClean CAStore job is a clone of the standard
dm_DMClean Content Server job with different
argument values in order to detect orphaned
content stored in Centera.
Invalidation/Rejection
eas_rejinv
Use this job to process invalidated or rejected AIPs.
Confirmation
eas_confirmation
Generates one or several confirmation messages when
some events occur on an AIP.
Purge
eas_purge
The Purge job is responsible for the disposal of AIPs.
Purge Audit
eas_purge_audit
The Purge Audit job is responsible for purging Content
Server audit trails that have been successfully archived.
Update Content Metadata
eas_sip_cmeta_refresh
Use the Update Content Metadata job to propagate
the AIP custom element changes to the SIP descriptor
rendition.
Modifying InfoArchive Methods Timeout Settings
The default timeout for InfoArchive methods is one hour. If it normally takes longer than one hour
to run a job, you need to modify the method’s timeout settings to extend the time allowed to pass
before the job fails.
To modify the timeout setting of a method:
1. In DA, navigate to Administration > Job Management > Methods.
2. Locate the method; then right-click it and select Properties.
3. Specify new values for the following timeout fields:
• Timeout Minimum
The minimum timeout that can be specified on the command line for this procedure. The
minimum timeout value cannot be greater than the default value specified in the timeout
default field.
• Timeout Default
The default timeout value for the procedure. The system uses the default timeout value if no
other time-out is specified on the command line.
• Timeout Maximum
The maximum timeout that can be specified on the command line for this procedure.
Increasing the JVM Heap Size Allocated to InfoArchive Jobs
Jobs that process sizable volumes of data can run out of system memory with the default settings
(JOB_JAVA_MEMORY=256M). For example, you may encounter an out-of-memory error when
running the eas_close job to close a large quantity of open AIP parent shareable objects.
To avoid out-of-memory issues, increase the JVM heap size allocated to InfoArchive jobs:
• On Windows, edit EAS_HOME\bin\eas-set-env.bat and change:
SET JOB_JAVA_MEMORY=256M
To:
SET JOB_JAVA_MEMORY=4096M
• On Linux or UNIX, edit EAS_HOME/bin/eas-set-env.sh and change:
export JOB_JAVA_MEMORY=256M
To:
export JOB_JAVA_MEMORY=4096M
Archive Audit (eas_archive_audit)
The Archive Audit job creates SIP files containing a copy of the repository audit trails and posts
them in the ingestion queue by triggering the receiver. For detailed information about the Archive
Audit job, see .
DCTM Clean (eas_dctm_clean)
The DCTM Clean job destroys the repository objects associated with the orders having reached
their retention date.
DCTM Clean Arguments
Argument
Description
-WorkingDir DirectoryPath
File system path of the working directory to be used by the job.
-report_only True|False
Setting this argument to true makes the job report the purge
processing to be done without executing it.
-OrderNodes node_name
Nodes of the orders to clean.
-PhasesToProcess
order|prune|aip_parent
Which phase of orders to delete:
• Order phase: Clean outdated and deleted orders
• AIP Parent phase: Clean empty AIP parents
• Prune phase: Delete AIP parents and LWSOs that are attached
to the Prune lifecycle
DCTM Clean Return Codes
Return Code
Mnemonic
Description
-1
E_UNEXPECTED
Unexpected error
0
OK
Successful execution
1
E_PARSE
Error while parsing arguments
2
E_DFCINIT
Error while initializing the DFC
3
E_CREDENTIALS
Cannot connect to the repository with the
configured credentials
4
E_PARAMS
Error while validating the parameters
5
E_REPORT
Error while creating the log of the job
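The same return codes recur for most InfoArchive Content Server jobs described in this chapter, so in scripts the mapping can be captured once. A sketch based on the table above; the function name eas_rc_mnemonic is illustrative.

```shell
# Map a common InfoArchive job return code to its mnemonic,
# per the return code tables in this chapter.
eas_rc_mnemonic() {
  case "$1" in
    -1) echo "E_UNEXPECTED" ;;
     0) echo "OK" ;;
     1) echo "E_PARSE" ;;
     2) echo "E_DFCINIT" ;;
     3) echo "E_CREDENTIALS" ;;
     4) echo "E_PARAMS" ;;
     5) echo "E_REPORT" ;;
     *) echo "UNKNOWN" ;;
  esac
}

# Example: eas_rc_mnemonic 3   # prints E_CREDENTIALS
```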
Close (eas_close)
The close job (eas_close) is responsible for:
• Checking whether the AIP parent shareable objects that meet the defined close condition have been
closed (their associated eas_open_aip_parent_rel objects have been destroyed)
• Closing any AIP parent shareable object that meets the defined close condition by aggregating all
its AIP child lightweight objects into a single materialized AIP object and attaching the parent
to the prune lifecycle
• Detecting pooled libraries that can be closed, and closing them. Such closure includes
archiving them in the repository if this option is activated.
Close Arguments
Argument
Description
-report_only TRUE|FALSE
When set to true, the log of the job only indicates the xDB pooled
libraries ready to be closed, but the job does not close them.
-pooled_library_predicate
Optional argument that restricts the scope of the pooled
libraries to be considered by the job.
This argument has to be populated with a standard DQL predicate
(for example, "eas_xdb_pooled_library where …"). This argument
lets you create several dm_job objects responsible for closing
different subsets of libraries.
If empty, there is no predicate.
-aip_parent_predicate
Optional argument that restricts the scope of the AIP parent
shareable objects to be considered by the job.
This argument has to be populated with a standard DQL predicate
(for example, "eas_aip_parent where …"). This argument lets you
create several dm_job objects responsible for closing different
subsets of AIP parents.
If empty, there is no predicate.
-PhasesToProcess aip_parent|pooled_library
Whether to close AIP parents or pooled libraries.
-WorkingDir
File system path of the directory to be used by the job as working
area.
-pooled_library_close_delay
The amount of delay time in minutes before closing pooled
libraries. Default: 10 (minutes).
-aip_parent_close_delay
The amount of delay time in minutes before closing AIP parents.
Default: 10 (minutes).
Close Return Codes
Return Code
Mnemonic
Description
-1
E_UNEXPECTED
Unexpected error
0
OK
Successful execution
1
E_PARSE
Error while parsing arguments
2
E_DFCINIT
Error while initializing the DFC
3
E_CREDENTIALS
Cannot connect to the repository with the
configured credentials
4
E_PARAMS
Error while validating the parameters
5
E_REPORT
Error while creating the log of the job
Commit (eas_commit)
The commit process is invoked by running the eas-launch-ingestor script; it commits AIUs into
the xDB database so that the archived data can be searched.
When all SIPs pertaining to a DSS have been ingested (lifecycle state = Waiting_Commit), executing
the eas_commit job performs the following actions on every SIP in the batch:
• Attach a commit log file content to the AIP for traceability purposes
• Push the retention date as well as content attributes to the contents stored in Centera
• Compress the received SIP file
• Promote the AIP to the Completed state
Commit Arguments
Argument
Description
-WorkingDir
File system path of the working directory to be used by the job.
Standard Content Server job arguments are passed to the job.
Commit Return Codes
Return Code
Mnemonic
Description
-1
E_UNEXPECTED
Unexpected error
0
OK
Successful execution
1
E_PARSE
Error while parsing arguments
2
E_DFCINIT
Error while initializing the DFC
3
E_CREDENTIALS
Cannot connect to the repository with the
configured credentials
4
E_PARAMS
Error while validating the parameters
5
E_REPORT
Error while creating the log of the job
RunJob Return Codes
Return Code
Mnemonic
Message
0
RUNJOB_I_NORMAL
Normal successful completion
1
RUNJOB_E_LAUNCHFAILED
Could not launch method
2
RUNJOB_E_TIMEDOUT
The method timed out
3
RUNJOB_E_RETURNVAL
The method returned an error value
4
RUNJOB_E_RESULT
The method returned an error as the execution
result
5
RUNJOB_E_INTERNAL
Generic error
6
RUNJOB_E_AMBIGUOUSJOB
Several jobs have the same name
7
RUNJOB_E_MAXITERATIONS
The maximum number of times to execute the
job has been reached
8
RUNJOB_E_USERCHECKEDOUT
Job is currently checked out by user
9
RUNJOB_E_NOJOBMETHOD
Job does not have any associated
method
10
RUNJOB_E_DMCL
Error while processing DMCL
command
11
RUNJOB_E_RESULTFILE
12
RUNJOB_E_JOBLOGFILE
13
RUNJOB_E_CONNECT
Cannot connect to the docbase
14
RUNJOB_E_JOBNOTFOUND
No job found
DMClean CAStore (dm_DMClean_CAStore)
The DMClean CAStore job is a clone of the standard dm_DMClean Content Server job with
different argument values in order to detect orphaned content stored in Centera.
DMClean CAStore Arguments
The arguments set on the job are identical to the arguments of the standard dm_DMClean job except
the value of the -clean_castore argument which is set to true.
DMClean CAStore Return Codes
The job returns the same return codes as the standard dm_DMClean job.
Invalidation/Rejection (eas_rejinv)
Use this job to process invalidated or rejected AIPs.
An administrator can invalidate or reject an AIP within the DA interface; such an action
immediately attaches the invalidation or rejection lifecycle to the selected AIP.
If the invalidated AIP was in the Completed state and part of a Data Submission Session
(DSS) that has multiple AIPs, the job reverts the other AIPs to the Waiting Commit state.
If the rejected AIP is part of a Data Submission Session (DSS) that has multiple AIPs, the job
also rejects the other AIPs by attaching them to the reject lifecycle.
Once a rejected AIP has been considered by the Confirmation (eas_confirmation) job for the
invalidation or rejection event, the job removes the contents associated with the AIP, except
those configured to be retained in that case.
Invalidation/Rejection Arguments
Argument
Description
-WorkingDir
DirectoryPath
File system path of the working directory to be used by the job.
-PhasesToProcess
invalid|reject
Whether to process invalidated or rejected AIPs.
-cutoff_days
NumberOfDays
Optional argument ensuring a fast scan even if a large number of AIPs
are stored in the repository.
If this argument is present, the criterion receive date >= (current date – the
number of days) is added to the searches.
Since RDBMS indexes are created on the AIP receive date attributes, the
inclusion of this criterion lets you quickly narrow down the search to
the recently received AIPs.
Standard Content Server job arguments are passed to the job.
Invalidation/Rejection Return Codes
Return Code
Mnemonic
Description
-1
E_UNEXPECTED
Unexpected error
0
OK
Successful execution
1
E_PARSE
Error while parsing arguments
2
E_DFCINIT
Error while initializing the DFC
3
E_CREDENTIALS
Cannot connect to the repository with the
configured credentials
4
E_PARAMS
Error while validating the parameters
5
E_REPORT
Error while creating the log of the job
Confirmation (eas_confirmation)
Depending on the configuration, InfoArchive can generate one or more confirmation messages when
one of the following events occurs on an AIP:
• The SIP file associated with the AIP has been received
• The AIP has been ingested
• The AIP has been purged
• The AIP has been rejected
• The AIP has been invalidated
The structure and the delivery of those confirmations are also driven by the configuration.
These confirmations are generated by the Confirmation (eas_confirmation) job, which incrementally
scans the AIPs on which the events have occurred since its last execution.
When an error occurs, check the job report, which is available in DA as well as in the /tmp directory
(e.g., c:/tmp) on the file system.
Confirmation Arguments
Argument
Description
-ConfirmationTypes
[receipt][,storage][,purge][,reject][,invalid]
Specifies the confirmation events to be considered by the job using
the following keywords:
• receipt searches the AIPs that have not yet been considered by
the confirmation job, to potentially notify the reception of
the SIP file
• storage searches the AIPs that have not yet been considered by
the confirmation job, to potentially notify the archiving of
the AIP
• purge searches the AIPs that have not yet been considered by the
confirmation job, to potentially notify the purge of the AIP
• reject searches the AIPs that have not yet been considered by the
confirmation job, to potentially notify the rejection of an AIP
• invalid searches the AIPs that have not yet been considered by
the confirmation job, to potentially notify the invalidation
of an AIP
-WorkingDir DirectoryPath
File system path of the working directory to be used by the job
-cutoff_days NumberOfDays
Optional argument ensuring a fast scan even if a large number of
AIPs are stored in the repository.
If this argument is present:
• The criterion receive date >= (current date – the number of days)
is added to the searches issued for the receipt, storage, reject,
and invalid events.
• The criterion purge date >= (current date – the number of days) is
added to the searches issued for the purge events.
Since RDBMS indexes are created on those AIP receive and purge
date attributes, the inclusion of those criteria lets you quickly
narrow down the search to the recently altered AIPs.
Confirmation Return Codes
Return Code
Mnemonic
Description
-1
E_UNEXPECTED
Unexpected error
0
OK
Successful execution
1
E_PARSE
Error while parsing arguments
2
E_DFCINIT
Error while initializing the DFC
3
E_CREDENTIALS
Cannot connect to the repository with the
configured credentials
4
E_PARAMS
Error while validating the parameters
5
E_REPORT
Error while creating the log of the job
6
E_NOT_CACHED
The configuration requires generating a
confirmation containing the details of the AIU, but
the AIP is not cached at the xDB level
7
E_DELIVERY_CHANNEL
Cannot load the delivery channel configuration
(i.e. not found or found more than once)
8
E_NOTIFICATION_SENT
The configured command line activated for
sending a generated confirmation returned a
non-zero return code
9
E_GLOBALCONFIG
Cannot load the InfoArchive global configuration
object
10
E_STAMP_UPDATE
Cannot update the AIP with the confirmation
timestamp indicating when the AIP has been
considered by the job
11
E_CONF_AUDIT
The audit trail associated with a generated
confirmation could not be saved
12
E_CONF_COMPLETED_AUDIT
The audit trail indicating the completion of the job
execution could not be saved
13
E_NO_QUERY_CONFIG
Cannot find the Query configuration
Purge (eas_purge)
The Purge job is in charge of executing the processing related to the disposal of AIPs:
• Attaching the purge lifecycle to the AIPs that have reached their retention date and do not
have any disposal lock.
• Destroying the AIPs attached to the purge lifecycle that have been considered by the
Confirmation (eas_confirmation) job for the purge event.
• Destroying the pooled library objects storing only destroyed AIPs.
Purge Argument
Argument
Description
-PhasesToProcess
[detect][,destroy]
[,destroy_pooled_library]
Specifies the execution scope of the job using the following keywords:
• detect activates the search for the AIPs that have reached their retention
date and do not have any disposal lock, in order to attach them to the
purge lifecycle.
• destroy activates the search for the AIPs attached to the purge lifecycle
that have been considered by the confirmation job.
• destroy_pooled_library activates the search for pooled libraries storing
only destroyed AIPs.
-report_only
TRUE|FALSE
Setting this argument to true reports the purge processing to
be done without executing it.
The Purge Return Code
Return Code
Mnemonic
Description
-1
E_UNEXPECTED
Unexpected error
0
OK
Successful execution
1
E_PARSE
Error while parsing arguments
2
E_DFCINIT
Error while initializing the DFC
3
E_CREDENTIALS
Cannot connect to the repository with the
configured credentials
4
E_PARAMS
Error while validating the parameters
5
E_REPORT
Error while creating the log of the job
6
E_DETECT
Error while detecting the AIP
7
E_DESTROYAIP
Error while attempting to destroy an orphan AIP
8
E_DESTROYPOOL
Error while attempting to destroy an orphan xDB
pooled library
Update Content Metadata (eas_sip_cmeta_refresh)
Use the Update Content Metadata job to propagate the AIP custom element changes to the SIP
descriptor rendition.
Update Content Metadata Argument
Argument
Description
-WorkingDir DirectoryPath
File system path of the working directory to be used by the job
Update Content Metadata Return Codes
Return Code
Mnemonic
Description
-1
E_UNEXPECTED
Unexpected error
0
OK
Successful execution
1
E_PARSE
Error while parsing arguments
2
E_DFCINIT
Error while initializing the DFC
3
E_CREDENTIALS
Cannot connect to the repository with the
configured credentials
4
E_PARAMS
Error while validating the parameters
5
E_REPORT
Error while creating the log of the job
InfoArchive Command Line Jobs
InfoArchive provides the following command line jobs:
Job Name
Job Code
Description
Clean
eas-launch-clean
Performs a regular cleanup of the reception, ingestion
and xDB cache working directories.
xDB Enumeration No Backup
eas-launch-xdb-enumeration-nobackup
This job enumerates to stdout the list of xDB segments
that have been archived in the repository.
xDB Clean
eas-launch-xdb-clean
The xDB Clean job detects and destroys the xDB
segments and XML documents created by InfoArchive
but no longer referenced in the repository.
These jobs are not implemented as Content Server jobs because, depending on the chosen
architecture, they may have to be executed on servers not hosting the Content Server repository.
For clarity and brevity, the names of the scripts associated with the command line jobs are sometimes
referred to without their platform-specific extension (.bat on Windows and .sh on Linux) in this
document.
Clean (eas-launch-clean)
The purpose of this job is to perform a regular cleanup of the reception, ingestion and xDB cache
working directories:
• The arguments indicate the nodes for which the cleanup must be executed.
— The file system path of the working areas is read from the configuration object of these nodes.
— These file system areas must be accessible at the file system level on which the command is
launched.
• The job is driven by a scan, at the file system level, of the working directories, which have
timestamped names.
• Directories timestamped with a date older than a parameterized period are deleted, unless
they are associated with an AIP that is likely in error.
Clean Properties File
The parameters of this job are stored in the conf/eas-clean.properties file; they can be
overridden in the command line.
Property Long
(Short) Name
Description
Default Value
docbase_name (s)
The repository name
Name of the repository
domain (d)
The user’s domain if any
Empty value
user_name (u)
The name of the Documentum user to
connect with
Login name of the installation
owner user account
password (p)
(Optional) The user password with
which to connect to the repository.
If the job executes on the same host
where Content Server is installed, you
do not need to specify this property
here. The job can connect to the
repository through the Content Server
trusted login feature.
Password of the installation owner
user account
receivers (r)
(Optional) List of reception node
configuration names (separated by ’,’)
indicating which reception working
areas must be cleaned up. The ’*’
value indicates that all nodes must be
considered
*
ingestors (i)
(Optional) List of ingestor node
configuration names (separated by ’,’)
indicating which ingestion working
areas must be cleaned up. The ’*’
value indicates that all nodes must be
considered
*
cache_access (c)
(Optional) List of xDB cache node
configuration names (separated by ’,’)
indicating which xDB cache working
areas must be cleaned up. The ’*’
value indicates that all nodes must be
considered
*
keephistory (h)
Working directories having a name
timestamped prior to the current
date minus the value of this property
(expressed in hours) are destroyed.
5
level (l)
Logging level: ERROR, WARN, DEBUG,
INFO, TRACE
INFO
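Putting the properties above together, a conf/eas-clean.properties file might look as follows. The values shown are illustrative examples, not defaults to copy.

```
# Illustrative conf/eas-clean.properties (values are examples only)
docbase_name=repo01
user_name=dmadmin
receivers=*
ingestors=*
cache_access=*
keephistory=5
level=INFO
```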
Clean Return Codes
Return Code
Mnemonic
Description
-1
E_UNEXPECTED
Unexpected error
0
OK
Successful execution
1
E_PARSE
Error while parsing arguments
2
E_DFCINIT
Error while initializing the DFC
3
E_CREDENTIALS
Cannot connect to the repository with the
configured credentials
4
E_PARAMS
Error while validating the parameters
5
E_INJECTOR_FOLDER
Error while deleting an ingestion working
directory
6
E_RECEIVER_FOLDER
Error while deleting a reception working
directory
7
E_CACHEACCESS_FOLDER
Error while deleting a cache access node working
directory
xDB Enumeration No Backup (eas-launch-xdb-enumeration
-nobackup)
This job enumerates to stdout the list of xDB segments that have been archived in the repository by:
• Parsing the segments of the xDB federation
• Checking in the repository whether the segments have been archived
The xDB segments that have been archived are written to stdout, while the job processing
messages are written to stderr.
The returned list of segments is intended to be used when invoking the standard xDB backup
command line, to skip the backup of those segments.
xDB Enumeration No Backup Properties File
This properties file defines the known repositories and their associated credentials; this file has the
same syntax as the properties file of the xDB Clean job.
xDB Enumeration No Backup Options
Usage:
eas-launch-xdb-enumeration-nobackup.sh [-c <arg>] -f <arg> [-l <arg>] -P
<arg> [-p <arg>]
This command supports the following options:
Option
Description
-c --cache <arg>
The xDB cache pages for the database session
-f --federation <arg>
The xDB federation bootstrap path or URL
-l --level <arg>
The optional logging level to use (TRACE, DEBUG, INFO, WARN, or
ERROR)
-p --password <arg>
The password of the xDB superuser
-P --properties
Properties file containing the list of repositories to connect to with the
credentials to use
xDB Enumeration No Backup example:
eas-launch-xdb-enumeration-nobackup.sh -f //localhost:1235 -p mypassword
-P /app/eas/conf/eas-xdb-clean.properties 1>skip-segments.txt 2>
eas-launch-xdb-enumeration-nobackup.log
xDB Enumeration No Backup Return Codes
Return Code
Mnemonic
Description
-1
E_UNEXPECTED
Unexpected error
0
OK
Successful execution
1
E_PARSE
Error while parsing arguments
2
E_DFCINIT
Error while initializing the DFC
5
E_XDBINIT
Error while initializing the xDB client
xDB Clean (eas-launch-xdb-clean)
The xDB Clean job detects and destroys the xDB segments and XML documents created by
InfoArchive but no longer referenced in the repository.
The job performs the following actions in sequence:
• Scans the xDB segments having a name prefixed by eas_aip_id: a segment is considered
orphaned if no AIP with this identifier is found in the repository.
• Scans the xDB segments having a name prefixed by eas_pooled_library_id: a segment is
considered orphaned if no pooled library object with this identifier is found in the
repository.
• Scans the xDB documents having an attribute named eas_aip_id: a document is considered
orphaned when:
— No AIP with this identifier is found in the repository
— The AIP is invalidated with status INV-WXDBCLEAN
— The AIP is rejected with status REJ-WXDBCLEAN
• Scans the xDB documents having an attribute named eas_order_id: a document is considered
orphaned if no order with this identifier is found in the repository.
xDB Clean Properties File
The conf/eas-xdb-clean.properties properties file contains the repositories to connect to
with the credentials to use for each repository.
Property
Description
Default Value
dfc.servers
The repository names to connect with
separated by a comma
Name of the repository
dfc.server.repositoryName.user
The name of the Documentum user to
connect with for the specified repository
Login name of the
installation owner user
account
dfc.server.repositoryName.password
(Optional) The user password with
which to connect to the repository.
If the job executes on the same host
where Content Server is installed, you
do not need to specify this property here.
The job can connect to the repository
through the Content Server trusted login
feature.
Password of the
installation owner
user account
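Putting these properties together, a conf/eas-xdb-clean.properties file might look as follows. The repository name repo01 and the credential values are illustrative examples only.

```
# Illustrative conf/eas-xdb-clean.properties (values are examples only)
dfc.servers=repo01
dfc.server.repo01.user=dmadmin
dfc.server.repo01.password=changeme
```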
xDB Clean Options
Option
Description
-d --database
The xDB database name
-u --username <arg>
The user name. xDB automatically uses the superuser or Administrator user
name where needed.
-p --password <arg>
The password of the user
-f --federation <arg>
The federation bootstrap path or URL
-c --cache <arg>
The cache pages for the database session
-l --level <arg>
The optional logging level to use (TRACE, DEBUG, INFO, WARN, or ERROR)
-P --properties
Properties file containing the list of repositories to connect to with the
credentials to use
-r --report_only
Reports the xDB libraries and documents to be deleted without deleting them
-xa --exclude_aip
Deactivates the search for the xDB libraries/documents associated with AIPs
-xo --exclude_order
Deactivates the search for the xDB libraries/documents associated with orders
-xi --exclude_invalid
Deactivates the search for the xDB libraries/documents associated with invalid
AIPs
-xr --exclude_reject
Deactivates the search for the xDB libraries/documents associated with rejected
AIPs
xDB Clean command line example:
eas-launch-xdb-clean.bat -P /app/eas/conf/eas-xdb-clean.properties -d xdb01
-u Administrator -p dmadmin -f xhive://localhost:1235
xDB Clean Return Codes
Return Code | Mnemonic | Description
-1 | E_UNEXPECTED | Unexpected error
0 | OK | Successful execution
1 | E_PARSE | Error while parsing arguments
2 | E_DFCINIT | Error while initializing the DFC
5 | E_XDBINIT | Error while initializing the xDB client library
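A calling script can interpret these return codes along the lines of the following sketch; the mapping mirrors the table above, and treating any other value as unknown is an assumption:

```python
# Return codes of eas-launch-xdb-clean, from the table above.
XDB_CLEAN_CODES = {
    -1: "E_UNEXPECTED: unexpected error",
    0: "OK: successful execution",
    1: "E_PARSE: error while parsing arguments",
    2: "E_DFCINIT: error while initializing the DFC",
    5: "E_XDBINIT: error while initializing the xDB client library",
}

def describe_xdb_clean_rc(code: int) -> str:
    """Map an eas-launch-xdb-clean return code to its description."""
    return XDB_CLEAN_CODES.get(code, f"unknown return code: {code}")
```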
Managing Audit
Audit management is essential to regulatory and legal compliance, which demands a convincingly
documented audit trail. Most audits commence with a request for information, followed by a request
for an audit trail for the supplied information. A properly stored, well-managed, and tamper-proof
audit trail that can be conveniently and quickly accessed builds confidence with auditing bodies
and contributes to favorable outcomes.
In InfoArchive, auditing can be activated for both standard Documentum Content Server events and
InfoArchive-specific events, and audit records are managed through the standard Audit Management
feature in DA. Audit records can be archived by the eas_archive_audit job into a designated
InfoArchive holding and accessed through two pre-configured search forms in the InfoArchive GUI.
Archived audit records that have aged beyond their compliance requirements can be deleted using
the eas_purge_audit job.
InfoArchive Audit Trail Events
In InfoArchive, audit records result from two types of events:
• Standard Documentum Content Server events (with event names beginning with the dm_ prefix)
InfoArchive leverages the standard Documentum auditing capabilities to provide documentary
evidence of the sequence of activities and events that have affected InfoArchive-specific objects
(with object names beginning with the eas_ prefix), such as AIP objects, configuration objects,
and pooled library objects, just as with other repository objects.
The auditing of standard Documentum Content Server events is activated by default.
You view and manage audit records of standard Documentum events in DA using the features in
the Administration/Audit Management folder.
• InfoArchive-specific events
Aside from standard Documentum Content Server events, InfoArchive also audits events that are
specific to the InfoArchive application—events with their names beginning with the eas_ prefix.
Among these events, InfoArchive audit trail events—eas_query, eas_getcontent, eas_order,
eas_getorder, and eas_fetch—are not audited by default, but need to be manually activated at the
global or holding/AIC level.
Documentum Content Server Events
Here is a list of standard Documentum Content Server events audited by InfoArchive:
Audit Object Type | Event
dm_user | dm_connect, dm_destroy, dm_disconnect, dm_getlogin, dm_logon_failure, dm_save, dm_security_check_failed
dm_group | dm_destroy, dm_save, dm_security_check_failed
dm_policy | dm_install, dm_uninstall, dm_validate
eas_cfg | dm_addrendition, dm_branch, dm_checkin, dm_checkout, dm_destroy, dm_link, dm_lock, dm_mark, dm_prune, dm_removecontent, dm_removerendition, dm_save, dm_saveasnew, dm_setfile, dm_unlink, dm_unlock, dm_unmark
eas_aip | dm_addrendition, dm_addretention, dm_branch, dm_checkin, dm_checkout, dm_destroy, dm_link, dm_lock, dm_mark, dm_prune, dm_removecontent, dm_removerendition, dm_removeretention, dm_save, dm_saveasnew, dm_setfile, dm_unlink, dm_unlock, dm_unmark, dm_updatepart, dm_bp_attach, dm_bp_demote, dm_bp_promote, dm_bp_resume, dm_bp_suspend
dm_acl | dm_destroy, dm_save, dm_saveasnew
dm_job | dm_addrendition, dm_branch, dm_checkin, dm_checkout, dm_destroy, dm_link, dm_lock, dm_mark, dm_prune, dm_removecontent, dm_removerendition, dm_save, dm_saveasnew, dm_setfile, dm_unlink, dm_unlock, dm_unmark, dm_jobstart
dmr_content | dm_move_content
InfoArchive-Specific Events
Here is a list of InfoArchive-specific events audited by InfoArchive:
Event | User | Object Type | Description
eas_query | webservice | eas_audittrail | A search is performed
eas_fetch | webservice | eas_audittrail | An AIU ID is returned by a search. The event contains the same information as the eas_query event except for the query criteria and the AIP count. To minimize the database footprint, only one eas_fetch record is created, and all AIU IDs are saved into a distinct persistence object (eas_audittrail_fetch). Each persistence object contains a reference to the eas_fetch audit trail.
eas_getcontent | webservice | eas_audittrail | An unstructured content file is retrieved from InfoArchive
eas_order | webservice | eas_audittrail | An order (asynchronous search) is created
eas_getorder | webservice | eas_audittrail | The result of an order is retrieved
eas_confirmation | dmadmin | dm_audittrail | A confirmation job is run
eas_confirmation_completed | dmadmin | dm_audittrail | A confirmation job has completed
eas_archive_audit | dmadmin | dm_audittrail | The archive audit (eas_archive_audit) job is run
eas_purge_audit | dmadmin | dm_audittrail | The purge audit (eas_purge_audit) job is run
eas_purge_audit_completed | dmadmin | dm_audittrail | The purge audit (eas_purge_audit) job has completed
eas_change_retention | user | dm_audittrail | The retention date of an AIP has changed
eas_unlock | user | dm_audittrail | A purge lock has been removed from an AIP
eas_lock | user, dmadmin | dm_audittrail | A purge lock has been attached to an AIP
eas_rollback | dmadmin | dm_audittrail | An asynchronous ingestion operation has been rolled back
eas_reject | dmadmin, user | dm_audittrail | An AIP has been rejected
eas_invalid | user | dm_audittrail | An AIP has been invalidated
eas_purge | dmadmin | dm_audittrail | An AIP has been purged
eas_aip_cmeta_modify | user | dm_audittrail | The custom metadata of an AIP has been updated
Archiving InfoArchive Audit Records
Audit records can be archived in InfoArchive in the same way as other archived data. Audit records
are archived using the archive audit (eas_archive_audit) job.
When you installed InfoArchive, a pre-configured holding EAS-AUDIT-001 was installed for storing
audit records. By default, the primary holding configuration objects are located in the /System
EAS/Archive Holdings/EAS/EAS-AUDIT-001 repository folder and /EAS/EAS-AUDIT-001 is
the default location for audit AIPs. If needed, you can modify the default holding configurations.
When executed, the archive audit (eas_archive_audit) job performs the following actions:
1. Creates a SIP file containing audit records that have not been archived so far.
2. Triggers the Receiver to receive the generated SIPs into the audit holding and put them in the queue for ingestion.
Once the SIP is ingested and committed into the audit holding, the audit records contained within
can be searched and accessed in the InfoArchive GUI.
Configuring the Archive Audit (eas_archive_audit) Job Arguments
You pass standard Documentum Content Server job method arguments to the archive audit
(eas_archive_audit) job.
To edit the job arguments:
1. In DA, navigate to the Administration/Job Management/Jobs repository folder.
2. Locate and right-click the eas_archive_audit job and choose Properties from the shortcut menu.
3. Under the Method tab of the Job Properties page, make sure the Pass standard arguments option is selected and click Edit.
4. Edit the method arguments and save your changes.
Archive audit (eas_archive_audit) job arguments are listed as follows:
Job Method Argument | Description | Default Value
docbase_name (s) | Name of the Documentum Content Server repository to connect to |
domain (d) | The user's domain, if any |
user_name (u) | Username with which to connect to the Documentum repository | Login name of the installation owner user account
password (p) | (Optional) The user password with which to connect to the repository. If the job executes on the same host where Content Server is installed, you do not need to specify this property here; the job can connect to the repository through the Content Server trusted login feature. | Password of the installation owner user account
job_id (j) | r_object_id of the dm_job repository object corresponding to the job |
predicate | DQL dm_audittrail predicate for filtering the audit records to be archived by the job. If not set, all audit records are archived. |
holding | Name of the target archive holding to fill in the descriptor of the generated SIP | EAS-AUDIT-001
producer | Designation of the application producing the SIP to fill in the descriptor of the generated SIP | eas
entity | Name of the business entity to fill in the descriptor of the generated SIP | EAS
pdischema | PDI schema URN to fill in the descriptor of the generated SIP | urn:x-emc:eas:schema:audittrail:1.0
pdischemaversion | PDI schema version to fill in the descriptor of the generated SIP |
application | Designation of the application producing the data to fill in the descriptor of the generated SIP | eas
priority | Ingestion priority to fill in the descriptor of the generated SIP | 0
maxaudit | Maximum number of audit trails to include in a generated SIP | 100000
commandline | Command line to trigger the reception of a generated SIP. The job dynamically substitutes the %file% placeholder with the file path of the generated SIP file. | eas-launch-receiver -f %file% -e true -c reception_node_01 -o EAS
checkaudit | Boolean indicating whether the job must first validate a signed audit trail before exporting it | false
checksumalgo | Java name of the hash algorithm to apply for computing the hash value of the previous SIP | MD5
checksumencoding | Java name of the encoding algorithm to use for including the hash value of the previous SIP in the data file header of the current SIP | base64
workingdir | File system path of the directory to be used by the job as the working area | Value defined during the installation
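The %file% substitution performed for the commandline argument can be illustrated with a short sketch; the SIP file path below is a placeholder:

```python
# Default commandline value from the table above; the job replaces the
# %file% placeholder with the path of the generated SIP before executing it.
commandline = "eas-launch-receiver -f %file% -e true -c reception_node_01 -o EAS"

def expand_commandline(template: str, sip_path: str) -> str:
    """Substitute the %file% placeholder with the generated SIP file path."""
    return template.replace("%file%", sip_path)

# The SIP path is a placeholder.
print(expand_commandline(commandline, "/tmp/eas/sip_0001.zip"))
# prints: eas-launch-receiver -f /tmp/eas/sip_0001.zip -e true -c reception_node_01 -o EAS
```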
Running the Archive Audit (eas_archive_audit) Job
You archive audit trail records by running the eas_archive_audit job. When successfully completed,
the eas_archive_audit job returns code 0 (zero). An AIP containing all audit trail records that
have not been archived yet is created in the audit holding (default: EAS-AUDIT-001) with the
Waiting_Ingestion lifecycle state. You must then ingest and commit the received AIP to complete
the archiving process. If synchronous commit is enabled, ingestion is automatically followed
by the commit operation. To ensure the integrity of the archived audit trail, you must run the
eas_confirmation job after the commit; otherwise, the next eas_archive_audit job run will fail.
Archived and confirmed audit trail records can be purged periodically.
Follow these steps to archive InfoArchive audit records:
1. Run the archive audit (eas_archive_audit) job in one of the following ways:
• Execute the following command in a prompt window:
runJob repository_name eas_archive_audit
• In DA, run the eas_archive_audit job under Administration/Job Management/Jobs.
You can schedule the job to run on a regular basis.
2. Ingest the received AIP by executing the following command:
EAS_HOME/bin/eas-launch-ingestor aip_id
See Ingesting AIPs, page 245.
3. Commit the ingestion by running the eas_commit job.
See Committing Ingested AIPs, page 250.
4. Run the confirmation (eas_confirmation) job.
See Confirmation (eas_confirmation), page 299.
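The steps above can be sketched as a command sequence. The repository name and AIP ID are placeholders, and launching the eas_commit and eas_confirmation jobs through runJob is an assumption; you can equally run them from DA:

```python
def audit_archive_commands(repository, aip_id, eas_home="EAS_HOME"):
    """Illustrative command sequence for the archiving steps above.

    "repository", "aip_id", and "eas_home" are placeholders; running
    eas_commit and eas_confirmation via runJob is an assumption.
    """
    return [
        f"runJob {repository} eas_archive_audit",        # 1. create and receive the SIP
        f"{eas_home}/bin/eas-launch-ingestor {aip_id}",  # 2. ingest the received AIP
        f"runJob {repository} eas_commit",               # 3. commit the ingestion
        f"runJob {repository} eas_confirmation",         # 4. confirm the archived audit trail
    ]
```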
Troubleshooting the Archive Audit Job Errors
When an audit archiving error occurs, a non-zero code is returned. You can find more information
about the error in the log file eas_archive_audit.log created in EAS_HOME/bin. Identify and fix
the cause of the error and run the eas_archive_audit job again.
If the error persists, delete the dm_audittrail objects in question by executing the following DQL and
then rerun the eas_archive_audit job:
delete dm_audittrail objects where event_name='eas_archive_audit'
Here is a list of eas_archive_audit job return codes:
Return Code | Mnemonic | Description
-1 | E_UNEXPECTED | Unexpected error
0 | OK | Successful execution
1 | E_PARSE | Error while parsing arguments
2 | E_DFCINIT | Error while initializing the DFC
3 | E_CREDENTIALS | Cannot connect to the repository with the configured credentials
4 | E_PARAMS | Error while validating the parameters
5 | E_REPORT | Error while creating the log of the job
6 | E_EXTERNAL_COMMAND | Error returned by the configured command line launched for posting the ingestion of a generated SIP
7 | E_CHECK_AUDIT | Error returned when the verification of the signature associated with an audit trail fails
8 | E_COMMIT_WAIT | Execution refused because the last SIP posted for ingestion has not been ingested and confirmed
Viewing Archived Audit Records
Once archived by the eas_archive_audit job, audit records can be searched and viewed through two built-in
search forms in the InfoArchive GUI, both located in the EIA system forms folder under the Search tab:
• Archive lifecycle log
Use this form to search for audit records of events that affect the lifecycle state of an AIP, such
as eas_confirmation and eas_purge.
• Event log
Use this form to search for audit records of events that do not affect the AIP lifecycle, such as
eas_query and eas_purge_audit.
Both the archive lifecycle log and event log search forms return audit records in the search results
page, on which you can sort and filter audit records, view record details, and export records. For
information about the search result page, see Working with the Search Results Page, page 272.
Purging InfoArchive Audit Records
Ever-growing audit records can quickly take up storage space and degrade system performance.
Once they have aged beyond compliance requirements, they should be deleted to reclaim storage space.
The Purge Audit (eas_purge_audit) job is responsible for purging archived audit records and
performs the following actions:
• Incremental reading of the archival confirmation messages created for the AIPs archived in
the audit archive holding; each audit trail repository object referenced in such a confirmation is
flagged as archived.
• Destruction of the archived repository objects older than a configurable time frame.
Configuring Purge Audit (eas_purge_audit) Job Arguments
You pass standard Documentum Content Server job method arguments to the purge audit
(eas_purge_audit) job.
To edit the job arguments:
1. In DA, navigate to the Administration/Job Management/Jobs repository folder.
2. Locate and right-click the eas_purge_audit job and choose Properties from the shortcut menu.
3. Under the Method tab of the Job Properties page, make sure the Pass standard arguments option is selected and click Edit.
4. Edit the method arguments and save your changes.
Purge audit (eas_purge_audit) job arguments are listed as follows:
Property Long (Short) Name | Description | Default Value
library | Name of the xDB library configuration object in which the archival confirmations must be scanned | confirmation_audit_xdb_lib
cutoffaudit | Sets the current date minus the specified number of days as the audit archive cutoff date. Archived audit trail objects with a timestamp prior to this date will be destroyed by the eas_purge_audit job. For example, if this value is set to 7, all audit trail records archived more than a week ago will be destroyed. | 92
cutoffconf | Sets the current date minus the specified number of days as the cutoff date for archival confirmation messages. Archival confirmation messages created prior to this date will be destroyed by the eas_purge_audit job. For example, if this value is set to 7, all archival confirmation messages created more than a week ago will be destroyed. | 92
workingdir | File system path of the directory to be used by the job as the working area | Value defined during the installation
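The cutoff computation for cutoffaudit and cutoffconf can be illustrated as follows; this is a sketch of the rule stated above, not the job's actual implementation:

```python
from datetime import datetime, timedelta

def purge_cutoff(days, now=None):
    """Current date minus the configured number of days (e.g. cutoffaudit=92)."""
    now = now or datetime.now()
    return now - timedelta(days=days)

# With cutoffaudit=7, anything archived more than a week before "now"
# falls before the cutoff and is destroyed by eas_purge_audit.
```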
Running the Purge Audit (eas_purge_audit) Job
You can run the purge audit (eas_purge_audit) job in one of the following ways:
• Execute the following command in a prompt window:
runJob repository_name eas_purge_audit
• In DA, run the eas_purge_audit job under Administration/Job Management/Jobs.
When successfully completed, the eas_purge_audit job returns code 0 (zero).
Troubleshooting the Purge Audit Job Errors
When an audit purge error occurs, a non-zero code is returned. You can find more information about
the error in the log file eas_purge_audit.log created in EAS_HOME/bin. Identify and fix the
cause of the error and run the eas_purge_audit job again.
Here is a list of eas_purge_audit job return codes:
Return Code | Mnemonic | Description
-1 | E_UNEXPECTED | Unexpected error
0 | OK | Successful execution
1 | E_PARSE | Error while parsing arguments
2 | E_DFCINIT | Error while initializing the DFC
3 | E_CREDENTIALS | Cannot connect to the repository with the configured credentials
4 | E_PARAMS | Error while validating the parameters
5 | E_REPORT | Error while creating the log of the job
6 | E_USERPERMISSION | The repository user account configured to be used by the job does not have the audit-related extended privileges required for the execution of the job
Administering the Configuration Cache
Use the InfoArchive web services administration pages (http://hostname:port/eas-services/) to perform
administrative tasks on the configuration cache, including flushing the cache, viewing basic and detailed
cache statistics, and viewing detailed element key information in the cache.
The configuration cache administration pages require basic HTTP authentication. To access
the pages, you must give access privileges to an appropriate user by granting access to the
eas-services-admin role and assigning the user to this role.
For example, on Apache Tomcat, add the following to TOMCAT_HOME/conf/tomcat-users.xml,
and then restart the web application server for the changes to take effect.
<role rolename="eas-services-admin"/>
<user username="<username>" password="<password>" roles="eas-services-admin"/>
There are three configuration cache administration pages:
• Configuration Cache Flush (http://<hostname>:<port>/eas-services/admin/config-cache/flush)
Use this page to flush the cache. Every time you access this page, the cache is flushed.
• Configuration Cache Statistics (http://<hostname>:<port>/eas-services/admin/config-cache/stats)
View basic and detailed statistics of the cache on this page. You must set eas_config_cache
_statistics_enabled in eas-services.properties to true to be able to view the detailed
statistics.
• Configuration Cache Keys (http://<hostname>:<port>/eas-services/admin/config-cache/keys)
View detailed element key information on this page.
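For example, the flush page can be called from a script using basic authentication. The host, port, and credentials below are placeholders, and the final urlopen call is commented out so the sketch stays self-contained:

```python
import base64
import urllib.request

# Placeholders: adjust the host, port, and eas-services-admin credentials.
host, port = "localhost", 8080
user, password = "cacheadmin", "secret"

url = f"http://{host}:{port}/eas-services/admin/config-cache/flush"
token = base64.b64encode(f"{user}:{password}".encode()).decode()
request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
# urllib.request.urlopen(request)  # accessing the page flushes the cache
```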
Logging
Logging mechanisms are implemented differently among InfoArchive components. This section
describes logging mechanisms and how you can customize logging according to your needs.
Command Line Jobs Logging
Command line jobs are stand-alone Java programs that each perform a specific task. Most command
line jobs have two loggers: a file logger and a console logger.
File Loggers for Command Line Jobs
When the command line job’s corresponding configuration object is loaded, a file logger is attached
to the event. When the event is triggered, logs are saved into a file in the specified location. The
following table describes the file logging process in detail.
Job | Log Setting | Log Location
eas-launch-receiver | On a reception node's property page (eas_cfg_receive_node), the log level and working directory settings set the logging verbosity and the log file location, respectively. | In the working directory, the subfolder name follows the <date-time>_<aipID> pattern. If you enable archive logs for a holding, the reception logs are also compressed and saved as a rendition of the AIP object.
eas-launch-ingestor | On an ingestion node's property page (eas_cfg_ingest_node), the log level and working directory settings set the logging verbosity and the log file location, respectively. | In the working directory, the subfolder name follows the <date-time>_<aipID> pattern. If you enable archive logs for a holding, the ingestion logs are also compressed and saved as a rendition of the AIP object.
eas-launch-order-node | On an order node's property page (eas_cfg_order_node), the log level and working directory settings set the logging verbosity and the log file location, respectively. | The log file is saved in the directory specified by working directory.
eas-launch-xdb-cache | On a cache access node's property page (eas_cfg_cache_access_node), the log level and working directory settings set the logging verbosity and the log file location, respectively. | The log file is saved in the directory specified by working directory.
The command line job logging level is specified by the -l parameter in the .properties file. The
value is one of the following: ERROR, WARN, DEBUG, INFO, or TRACE.
Console Loggers for Command Line Jobs
Command line jobs, except eas-clean, also have console loggers. The default console logger
displays logs on the screen prompt or saves screen logs to a file you specify in the command line.
You can set the log level and redirect the log output for command line jobs' console loggers in the
following ways:
1. Enable and set the -l argument in the corresponding property file in the conf folder.
2. To temporarily override the log level value in the configuration file, append the -l argument
and your desired log level value to the command. For example, the following command sets the
log level of the current command to DEBUG:
eas-launch-receiver.bat -f myfile.zip -l DEBUG
3. To redirect the logging output to a file, specify the file path in the command line. For example,
the following command saves logs to a file:
eas-launch-receiver.bat -f myfile.zip > C:\temp\details\receiver.log
The console loggers for the following command line jobs behave differently from the default console
logger.
• eas-enumeration: The console logger only outputs the AIP ID.
• eas-xdb-cache, eas-xdb-order: You cannot set the logging level for the console logger.
• eas-launch-xdb-enumeration-nobackup: The console logger produces output only when an error occurs.
Content Server Jobs Logging
Content Server jobs refer to jobs that are triggered by dm_job objects in a Content Server repository.
Content Server jobs perform lifecycle management and update, modify, or delete Content Server objects.
Content Server jobs save their logs in the $Documentum/dba/log/mylog/sysadmin folder.
The files in this folder are also imported into the repository; therefore, you can also view job logs
under cabinets\Temp\Jobs in DA.
Content Server job logging has the following logging levels:
• 0: ERROR
• 1: WARN
• 2: INFO
• 3: DEBUG
• 4: TRACE
If you specify an integer between 4 and 10 inclusive, the logging level is TRACE.
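The level mapping above, including the clamping of values from 4 through 10 to TRACE, can be sketched as a small hypothetical helper:

```python
# Mirrors the Content Server job logging levels listed above;
# integers from 4 through 10 (inclusive) all mean TRACE.
_LEVELS = {0: "ERROR", 1: "WARN", 2: "INFO", 3: "DEBUG", 4: "TRACE"}

def cs_job_log_level(value):
    """Translate a Content Server job logging-level integer to its name."""
    if 4 <= value <= 10:
        return "TRACE"
    return _LEVELS[value]
```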
For eas_commit, eas_rejinv, and eas_report jobs, logs are also saved in the directory specified by the
-WorkingDir argument. You can edit this argument in the Method tab.
Configuring the Logging Level for InfoArchive Web GUI
and Web Services
For InfoArchive web services, the logging verbosity and the location are defined by
WEB-INF/classes/log4j.xml. You can locate this file after you deploy WAR files.
The logging level is set to INFO by default. You can change the logging level to ERROR, WARN,
DEBUG, or TRACE:
<category name="com.emc">
<priority value="INFO" />
</category>
The log location is specified by the following line:
<param name="file" value="/tmp/eas/eas-gui.log" />