EMC® InfoArchive
Version 3.1 Configuration and Administration Guide

EMC Corporation
Corporate Headquarters
Hopkinton, MA 01748-9103
1-508-435-1000
www.EMC.com

Legal Notice

Copyright © 2014 EMC Corporation. All Rights Reserved.

EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” EMC CORPORATION MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Use, copying, and distribution of any EMC software described in this publication requires an applicable software license. For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com. Adobe and Adobe PDF Library are trademarks or registered trademarks of Adobe Systems Inc. in the U.S. and other countries. All other trademarks used herein are the property of their respective owners.

Documentation Feedback

Your opinion matters. We want to hear from you regarding our product documentation. If you have feedback about how we can make our documentation better or easier to use, please send us your feedback directly at IIGDocumentationFeedback@emc.com.

Table of Contents

Preface
Chapter 1  EMC InfoArchive Overview
    Key Features and Benefits
    InfoArchive Architecture
        Required Components
        Optional Components
Chapter 2  Concepts: How InfoArchive Works
    InfoArchive Data Model
        Archive Holding
        Submission Information Package (SIP)
            Data Submission Session (DSS)
            SIP Descriptor (eas_sip.xml)
                SIP Descriptor (eas_sip.xml) Schema
                Customizing SIP/AIP Metadata
            PDI File (eas_pdi.xml)
            Archival Information Unit (AIU)
        Archival Information Package (AIP)
            Unstructured Content File Table of Contents (eas_ri.xml)
        Archival Information Collection (AIC)
    How Data is Ingested
        Ingestion Modes
            Asynchronous Ingestion
            Synchronous Ingestion
            Asynchronous Ingestion vs. Synchronous Ingestion
        Archiving Process
            Reception
                Receiver
                Reception Process and Lifecycle
                Reception Node
            Enumerator
            Ingestion
                Ingestor
                Ingestion Process and Lifecycle
                Ingestion Node
                Ingestion Priority
    How Data is Stored
        xDB Modes
            xDB Library Pool (xDB Mode 3)
            xDB Pooled Library Assignment Policy (xDB Mode 3)
            xDB Library Online Backup
            xDB Caching
            xDB Library Locking
        AIP Modes
            AIP Parent Assignment Policy
            AIP Parent Closure
            Supported AIP Mode and xDB Mode Combinations
    How Data is Searched
        Order (Asynchronous Search)
            Order Lifecycle
    Confirmations—How the Loop is Closed
        Confirmation Event Types
        Confirmation Job (eas_confirmation)
Chapter 3  InfoArchive Configuration
    InfoArchive Configuration Overview
        Working with InfoArchive Configuration Objects
            Creating a Contentless Configuration Object
            Creating a Configuration Object with Content
            Importing Content into a Configuration Object
            Editing the Properties of a Configuration Object
    Performing System Global Configurations
        Configuring Global Settings
        Configuring an xDB Cache Access Node
        Configuring an xDB Library
        Configuring the Configuration Cache
        Configuring a Centera Store for Use with InfoArchive
            Creating a Centera Store Object
            Configuring Key Centera Store Object Properties
    Quickstart Using the InfoArchive Holding Configuration Wizard
        Launching the InfoArchive Holding Configuration Wizard
        Configuring a Basic Holding Using the InfoArchive Holding Configuration Wizard
    Configuring a Holding
        Configuring a Holding Configuration (eas_cfg_holding) Object
        Configuring an AIC View (eas_cfg_aic_view) Object
            Configuring a DQL Predicate for the AIC View
            Configuring an XQuery for the AIC View
    Configuring Holding Security
    Defining the Structured Data (PDI) Schema
        Creating a Structured Data (PDI) Schema
            PDI (eas_pdi.xml) Schema Definition Best Practices
            Sample PDI (eas_pdi.xml) Schema
        Configuring a Schema Configuration Object
    Configuring xDB Modes
        Configuring an xDB Parent Library (xDB Mode 1 and 2)
        Configuring an xDB Library Pool (xDB Mode 3)
        Configuring a Custom Pooled Library Assignment Policy (xDB Mode 3)
        Configuring xDB Mode Settings for a Holding
            Configuring Settings for xDB Mode 1
            Configuring Settings for xDB Mode 2
            Configuring Settings for xDB Mode 3
            Converting xDB Library (Pool) Backup Renditions
    Configuring the Ingestion Process
        Defining Ingestor Parameters
            pdi.index.creator—Creating xDB Indexes
                Path Index
                Full-Text Index
            pdi.aiu.cnt—Counting the Number of AIUs
            minmax—Defining the AIP Partitioning Key
                Optimizing XQuery for Partitioning Keys
            pdi.aiu.id—Generating AIU IDs
            pdi.ci.id—Generating Content File IDs
            toc.creator—Creating the Table of Contents (eas_ri.xml) for Unstructured Content Files
            ci.hash—Configuring Content Hashing (Unstructured Data)
            ci.compressor—Configuring Compression of Content Files (Unstructured Data)
        Configuring an Ingestion Configuration (eas_cfg_pdi) Object
        Configuring a Reception Node Configuration (eas_cfg_receive_node) Object
        Configuring an Ingestion Node Configuration (eas_cfg_ingest_node) Object
    Configuring Ingestion for Unstructured Data
    Configuring an Encryption-enabled Holding
        Preparing Resources for InfoArchive Installation
        Configuring RSA DPM Key Manager
        Creating Configuration Objects for Encryption Settings
            eas_cfg_crypto_provider
            eas_cfg_aic_crypto
            eas_cfg_holding_crypto
            eas_cfg_pdi_crypto
                pdi.xdb.encryption
                pdi.index.creator
                pdi.xdb.importer
                set.schema
                ci.encryption
            eas_query_config
        Adding Properties for the Holding Object
        Auto Populating Properties for Runtime Objects
        RSA Encryption Header
        Operator Restraints for Querying Encrypted Data
    Configuring Synchronous Ingestion
        Defining the Custom AIP Assignment Policy
        Configuring the AIP Parenting Policy
        Enabling Synchronous Ingestion
    Configuring Query
        Configuring a Query Quota Configuration (eas_cfg_query_quota) Object
        Defining the Search Criteria and Search Result
            Configuring Query for Unstructured Data
            request-config—Defining Search Criteria
                Configuring Search Criteria for Multi-Level XML Structures
                Grouping Search Criteria
                Searching Multiple Paths
            query-template—Defining Search Results
        Configuring a Query Configuration (eas_cfg_query) Object
        Configuring an Order Configuration (eas_cfg_order) Object
        Configuring an Order Node Configuration Object
        Starting the Order Node
    Configuring InfoArchive GUI
        Configuring the Search Menu
        Configuring a Search Form Folder
        Configuring a Search Form
            Configuring XForms
                XForms Structure
                XForms Example
                Search Criteria
                Search Form Binding
                Search Form Controls
                Logical Grouping of Search Criteria
                Multiple Input Values for a Single Search Criterion
                Defining InfoArchive GUI Locales
                Localizing a Search Form
                Configuring Hints and Error Messages
            Configuring a Search Form Configuration (eas_cfg_search_form) Object
        Configuring the Search Results
            Configuring a Stylesheet Configuration (eas_cfg_stylesheet) Object
            Creating a Stylesheet
                Stylesheet Components
                Stylesheet Elements and Attributes
                Namespaces in the Stylesheet
                Stylesheet Example
            Localizing a Stylesheet
        Implementing InfoArchive GUI Single Sign-On (SSO)
            InfoArchive Reserved Context Variables
            Custom Context Variables
        Enabling Security Policies
            Clear-text Username/Password Policy
                Configuration on Web Services
                Configuration on Web GUI
            X.509 Certificates
                Configuration on Web Services
                Configuration on Web GUI
            Customizing a Security Policy
        Customizing InfoArchive GUI
            UI Strings
            CSS
            Autocomplete
        Configuring Advanced InfoArchive GUI Settings
    Configuring Confirmations
        Confirmation Configuration (eas_cfg_confirmation) Object
        Query Configuration Object (eas_cfg_query) for Confirmations
            XQuery for Confirmations
        Delivery Channel Configuration Object (eas_cfg_delivery_channel)
            Delivery Channel Configuration Parameters
        Configuring a Delivery Channel Configuration Object for Confirmations
        Configuring a Query Configuration (eas_cfg_query) Object for Confirmations
        Defining the Criteria for Applicable AIPs (Optional)
        Configuring a Confirmation Configuration Object
Chapter 4  InfoArchive Administration
    InfoArchive Administration Overview
    Generating SIPs
        PDF Files Consolidation & Dynamic Extraction
        Generating SIPs Containing Consolidated PDF Files
    Starting the xDB Cache
    Archiving Data in Asynchronous Ingestion Mode
        Receiving SIPs
            Configuring the Receiver Properties
            Running the Receiver
            Verifying the Reception Process
            Troubleshooting Receiver Errors
                Receiver Return Codes
                Receiver Log File
        Enumerating AIPs for Ingestion
            Configuring Enumerator Properties
            Running the Enumerator
            Troubleshooting Enumerator Errors
                Enumerator Return Codes
                Re-enumerating Failed AIPs
        Ingesting AIPs
            Configuring Ingestor Properties
            Running the Ingestor
            Verifying the Ingestion Process
            Troubleshooting Ingestion Errors
                Ingestor Return Codes
                Ingestor Log File
        Committing Ingested AIPs
            Executing the eas_commit Job
            Verifying the Commit
    Managing AIPs
        AIP States
        Rejecting or Invalidating an AIP
        Updating Custom Package Metadata
            Ingesting SIPs Containing Custom Package Metadata
            Updating Custom Package Metadata
                Custom Package Metadata Update Dates
                Updating Custom Package Metadata in DA
                Updating Custom Package Metadata Using DQL Scripts
            Propagating Changes to AIP Renditions
        Performing Data Retention Management
            Using InfoArchive’s Date-Based Retention Management Capabilities
                Changing the Retention Date
                Applying/Removing a Purge Lock
            Using Extended Retention Management Capabilities of Retention Policy Services (RPS)
                Licensing and Installing RPS Components
                RPS Basic Concepts
                    Retention Policy
                    Retainer
                    Applying Retention Markups
                Escalating DFC Instances to Privileged Users
                Aligning the Retention Base Time
                Creating a Retention Policy
                Applying RPS Retention Policies to AIP Objects
                    Applying a Retention Policy to a Folder
                    Closing and Reopening a Folder
                    Setting a Retention Policy at the Holding Level
                    Setting a Retention Policy in the SIP Descriptor
                Changing RPS Retentions
                Disposing AIPs under RPS Retention
                Rejecting/Invalidating AIP Objects Under RPS Retention
                Deleting AIP Objects Under RPS Retention
    Working with InfoArchive GUI
        Logging in to InfoArchive GUI
        Searching Archived Data
        Working with the Search Results Page
            Exporting AIUs on the Search Results Page
        Using the InfoArchive GUI Direct Search URL
    Using Archival Reports
        Using Archival Reports in DA (Common Operations)
        List of Ingestions in Progress
        List of Ingestions with Errors
        List of AIPs by Retention Date
        AIPs for Disposition
        Current xDB Library Pool Volume
        Current Archived Volume
        Archived Volume History
        Performed Actions
        Exported Information
        Archived AIPs and Archival Metrics
        Configuring the Calculation of Archived Structured Data Volume
        Configuring the eas_report Job
    Managing Orders
        Viewing Order Properties and Processing History
        Suspending/Resuming an Order
        Changing the Priority of an Order
        Cancelling an Order
        Purging Orders
    Managing Jobs
        InfoArchive Jobs
            Viewing Job Trace Logs
        InfoArchive Content Server Jobs
            Modifying InfoArchive Methods Timeout Settings
            Increasing the JVM Heap Size Allocated to InfoArchive Jobs
            Archive Audit (eas_archive_audit)
            DCTM Clean (eas_dctm_clean)
                DCTM Clean Arguments
                DCTM Clean Return Codes
            Close (eas_close)
                Close Arguments
                Close Return Codes
            Commit (eas_commit)
                Commit Arguments
                Commit Return Codes
            DMClean CAStore (dm_DMClean_CAStore)
                DMClean CAStore Arguments
                DMClean CAStore Return Codes
            Invalidation/Rejection (eas_rejinv)
                Invalidation/Rejection Arguments
                Invalidation/Rejection Return Codes
            Confirmation (eas_confirmation)
                Confirmation Arguments
                Confirmation Return Codes
            Purge (eas_purge)
                Purge Argument
                The Purge Return Code
            Update Content Metadata (eas_sip_cmeta_refresh)
                Update Content Metadata Argument
                Update Content Metadata Return Codes
        InfoArchive Command Line Jobs
            Clean (eas-launch-clean)
                Clean Properties File
                Clean Return Codes
            xDB Enumeration No Backup (eas-launch-xdb-enumeration-nobackup)
                xDB Enumeration No Backup Properties File
                xDB Enumeration No Backup Options
                xDB Enumeration No Backup Return Codes
            xDB Clean (eas-launch-xdb-clean)
                xDB Clean Properties File
                xDB Clean Options
                xDB Clean Return Codes
    Managing Audit
        InfoArchive Audit Trail Events
            Documentum Content Server Events
            InfoArchive-Specific Events
        Archiving InfoArchive Audit Records
            Configuring the Archive Audit (eas_archive_audit) Job Arguments
            Running the Archive Audit (eas_archive_audit) Job
            Troubleshooting the Archive Audit Job Errors
        Viewing Archived Audit Records
        Purging InfoArchive Audit Records
            Configuring Purge Audit (eas_purge_audit) Job Arguments
            Running the Purge Audit (eas_purge_audit) Job
            Troubleshooting the Purge Audit Job Errors
    Administering the Configuration Cache
    Logging
        Command Line Jobs Logging
            File Loggers for Command Line Jobs
            Console Loggers for Command Line Jobs
        Content Server Jobs Logging
        Configuring the Logging Level for InfoArchive Web GUI and Web Services

Preface

EMC InfoArchive is highly configurable and allows you to customize many aspects of the archiving process to meet your specific business requirements, from ingestion, to query, to the look and feel of InfoArchive GUI. This guide provides information and instructions about how to configure InfoArchive.
Intended Audience

This document is intended for system administrators responsible for configuring and administrating InfoArchive. To use this document, you need the following:
• Administrative privileges on the host where you are configuring InfoArchive
• Working knowledge of:
— Microsoft Windows or Linux
— EMC Documentum Content Server configuration and administration
— EMC Documentum xDB configuration and administration
— XML, XQuery, XPath, and XForms

Related Documentation

The following documentation provides additional information:
• EMC InfoArchive Release Notes
• EMC InfoArchive Installation Guide
• EMC InfoArchive Object Reference Guide
• EMC InfoArchive Development Guide

Conventions

The following conventions are used in this document:
• boldface: Graphical user interface elements associated with an action
• italic: Book titles, emphasis, or placeholder variables for which you supply particular values
• monospace: Commands within a paragraph, URLs, code in examples, text that appears on the screen, or text that you enter

Path Conventions

This guide uses the following path conventions:
• EAS_HOME: EMC InfoArchive installation directory
• WEBSERVER_HOME: Web application server installation directory
• TOMCAT_HOME: Apache Tomcat installation directory
• XHIVE_HOME: EMC Documentum xDB installation directory

Note: EMC InfoArchive was named Enterprise Archiving Solution (EAS) prior to the 3.0 release. Names of variables, properties, and object types may still contain the EAS abbreviation.
Acronyms and Abbreviations

• AIC: Archival Information Collection
• AIP: Archival Information Package
• AIU: Archival Information Unit
• DSS: Data Submission Session
• JSP: Java Server Pages
• LWSO: Light Weight System Object
• OAIS: Open Archival Information System
• PDI: Preservation Description Information
• SIP: Submission Information Package
• XSD: XML Schema Document
• XML: eXtensible Markup Language

Revision History

The following changes have been made to this document.
• December 2014: Initial publication

Chapter 1 EMC InfoArchive Overview

With the explosive growth of information and the increased focus on regulatory compliance, companies today are facing the challenges of retaining and protecting ever-growing business-critical data for prolonged periods of time to meet corporate, regulatory, and legal requirements while minimizing storage cost and increasing operational efficiency. EMC InfoArchive is a powerful, secure, and flexible enterprise archiving system for long-term or permanent preservation and access of digital information. Standards-compliant and built on the proven strengths of EMC Documentum Content Server and xDB, InfoArchive provides a unified and cost-effective solution for storing, managing, and retrieving large volumes of structured and unstructured data. Highly configurable and scalable, InfoArchive is designed to help organizations of all sizes to address their complete information retention needs and achieve regulatory compliance.

Key Features and Benefits

EMC InfoArchive provides the following key features:
• Compliance with free, open industry standards
InfoArchive is compliant with the following free, open industry standards:
— Reference Model for an Open Archival Information System (OAIS)
InfoArchive is designed by referencing the OAIS framework.
— Extensible Markup Language (XML)
InfoArchive stores all structured data in XML format, which is platform and vendor neutral.
Unlike proprietary data formats, XML can be read easily by XML editors, most major word processors, and text editors.
— XQuery
InfoArchive uses the XQuery language to query archived data.
By leveraging open technologies, InfoArchive mitigates the risk that rapidly changing digital technologies will make it impossible to restore, render, or interpret archived information, and ensures long-term data immutability and readability. Even if the archiving system itself becomes obsolete, its archived data can still be permanently preserved and retrieved.
• Support for data from all types of source applications
InfoArchive is source application-agnostic: it can archive any data produced by any source application, as long as the data is packaged into a designated InfoArchive-supported format for ingestion. For example, InfoArchive can archive scanned documents, recorded videos, business data exported from an ERP system, and of course, information extracted from EMC Documentum Content Server. Once archived, the information is securely preserved and can be retrieved at any time.
• Unified management of both structured and unstructured data of varying levels of complexity
InfoArchive consolidates structured and unstructured data into a single archive, thus eliminating the need to maintain two separate systems. InfoArchive can archive objects of varying levels of complexity, ranging from simple objects (such as flat files), to mildly complex ones (such as content metadata), to highly complex ones (such as SEPA financial records). While being able to archive up to tens of billions of objects, InfoArchive also takes advantage of the lightweight and sharable system object types in EMC Documentum Content Server 6.5 and later to manage objects with shared property values, minimizing the storage footprint.
• Synchronous (transactional) and asynchronous (batch) ingestion
InfoArchive supports two ingestion modes for archiving data objects of varying granularity:
— Synchronous ingestion: Archiving of large quantities of single data objects to keep up with steady input streams of data to be archived.
— Asynchronous ingestion: Scheduled ingestion of items in batches for optimal performance when information to be archived comes in intermittently.
• Active archiving
Unlike in traditional passive archives, InfoArchive allows authorized users to search, view, and export both online and offline content stored in the archive through a customizable web-based search application called InfoArchive GUI. You can configure many aspects of the query such as search criteria and query quota, and even customize the look and feel of the web-based user interface. Furthermore, because InfoArchive GUI is built on a set of exposed web services, you can leverage the exposed APIs to integrate the query capabilities into your own business applications or build your own search application.
• Powerful administration functionality
InfoArchive provides administrators with a set of powerful functions to perform day-to-day archiving operations and management of archived data in an efficient manner, including storage management, retention management, audit management, AIP management, and metrics reporting. Most of these administrative functions are performed in the familiar user interface of EMC Documentum Administrator (DA).
• Data security
InfoArchive ensures the privacy and security of archived data using role-based permission access and data encryption.
• Support for EMC Centera, Atmos, Data Domain, and Isilon
InfoArchive provides built-in support for archiving content on a wide selection of EMC storage platforms such as Centera, Atmos, Data Domain, and Isilon, to meet different business and regulatory compliance requirements.
• Integration readiness
Key InfoArchive functions are exposed as web services that are based on a common framework and can be consumed using Java client libraries or with standard web services tools. You can develop consumers of these web services and integrate InfoArchive capabilities into your own business applications.
• Virtualization readiness
InfoArchive supports fully virtualized infrastructures as supported by Documentum Content Server and can be readily installed on virtual machines.
InfoArchive can bring about the following benefits:
• Reduced cost
As enterprise data grows on storage resources, it can quickly exhaust terabytes or more of expensive storage space. By offloading infrequently accessed data from tier-one storage to lower-cost storage tiers, InfoArchive reclaims expensive primary storage capacity and minimizes storage cost as well as associated administrative cost. InfoArchive’s ability to permanently preserve and retrieve application data also means that you can decommission legacy applications, leading to further cost savings.
• Increased operational efficiency
Reduction in stored data on primary storage frees up space for active data and optimizes performance of existing applications. In addition, InfoArchive’s powerful high-volume data ingestion capability, unified and simplified management of structured and unstructured data in a single consolidated archive, and customizable queries to quickly locate archived data through InfoArchive GUI all help to increase operational efficiency.
• Data accessibility
Through active archiving, InfoArchive ensures easy and flexible access to archived data by authorized users.
• Regulatory and legal compliance
Government agencies and other regulatory organizations have established requirements for data retention, security, and accessibility. Organizations must also be able to retrieve relevant information in the event of legal discovery, audits, and business or personnel investigations.
By using InfoArchive as the information archiving system, organizations can stay in compliance with these regulatory requirements and legal mandates. InfoArchive enables data governance and ensures that historical records are systematically stored in a centralized archive for as long as you are required to keep them, with security in place to guard against tampering or inadvertent deletion, and can be retrieved and presented in a timely manner.
• Long-term availability of preserved information
With information stored in the open, vendor-neutral XML format, InfoArchive ensures long-term immutability and readability of data at the storage level. This means that an organization can continue to access its archived data even when the business application that produced the data has been decommissioned or the archiving system itself has become unavailable.
• Versatility and flexibility
Whether the data you want to archive is structured or unstructured, simple or highly complex, and regardless of which source application the data is extracted from and the granularity of the data, InfoArchive is the truly versatile, flexible, all-in-one solution that can meet all your archiving requirements.
• Ability to meet today’s specific business requirements as well as tomorrow’s evolving needs
InfoArchive is highly configurable and allows you to customize many aspects of the archiving process to meet your specific business requirements, from ingestion, to query, to the look and feel of InfoArchive GUI. InfoArchive also provides you with the utility to update the metadata of archived data to keep it in sync with changes in the data model of your source business application. InfoArchive not only provides the flexibility to tailor to your current business needs, but also lays a reliable foundation for your evolving requirements into the future, protecting your existing IT investments.
InfoArchive can be scaled to meet your ever-growing storage needs in the long run, in support of virtually limitless archiving volumes. Standards-compliant and built with open technologies, InfoArchive mitigates the danger of information loss due to fast-changing digital technologies that may render a software application obsolete in a matter of years. You can rest assured that business-critical data archived by InfoArchive will always be retrievable and readable well into the future, and can even outlive the applications that produced it.

InfoArchive Architecture

Built on the proven EMC Documentum platform, InfoArchive taps into the powerful content management capabilities of Content Server and xDB to manage archived data: both structured data (i.e., XML documents) and unstructured data (i.e., data stored in any non-XML format). Data is archived into the repository through a set of standalone Java programs and is queried and retrieved through InfoArchive web services that are exposed to consumer applications, such as the optional search application InfoArchive GUI. InfoArchive provides an administration user interface through customized Documentum Administrator (DA) and lets you perform various archive maintenance activities through a set of Content Server jobs and command line jobs. The following InfoArchive system architecture diagram illustrates the required and optional components in a typical InfoArchive deployment.
Required Components

• Receiver (standalone Java program): Places received SIPs in the ingestion queue.
• Enumerator (standalone Java program): Returns an ordered list of queued ingestions.
• Ingestor (standalone Java program): Archives SIPs.
• Web services (JAX-WS stateless web services): Expose search and retrieval services to consumer applications.
• Documentum Administrator (DA) extensions (WDK customization): Provide the user interface for performing most configuration and administrative functions.
• Content Server jobs (standalone Java programs): Archive maintenance jobs implemented as Content Server repository objects to perform various tasks ranging from ingestion commit, to confirmation, to purge.
• Command line jobs (standalone Java programs): Archive maintenance jobs executed by a shell with predefined arguments to perform tasks such as cleaning reception, ingestion, and xDB cache working directories.

Optional Components

• InfoArchive GUI (web application): The default search application for searching archived data.
• Order processor (standalone Java program): Launched as a background daemon to execute orders (asynchronous search requests) and load/unload xDB data files; useful for search profiles that generally retrieve a huge number of results or execute over an extremely large range of data.
• xDB cache (standalone Java program): Launched as a background daemon to access xDB data files at the file system level.

Chapter 2 Concepts: How InfoArchive Works

It is important to understand fundamental InfoArchive concepts before performing configuration and administration tasks. This chapter grounds you in the essentials of InfoArchive terminology and concepts that underlie the complete data archiving and retrieval process.
InfoArchive Data Model

InfoArchive is designed around a unified OAIS-compliant data model that dictates the format in which information is ingested into, stored in, and retrieved from InfoArchive throughout its lifecycle. Content to be archived is exported from the producer (source application) and packaged into Submission Information Packages (SIPs). SIPs are ingested into InfoArchive and stored there as Archival Information Packages (AIPs), one AIP corresponding to one SIP. The consumer (user or application) retrieves the content from InfoArchive by either performing a search (synchronous) or creating an order (asynchronous search) on the AIPs; the information units contained in the AIPs (called AIUs) that match the search criteria are returned as query results.

Archive Holding

A holding is a logical destination archive into which data is ingested and stored, usually data of the same type that shares common characteristics. For example, you can create a holding to archive data from the same source application (such as ERP data), of the same format (such as audio recordings), or belonging to the same business entity. An InfoArchive instance can contain multiple archive holdings. You can create multiple archive holdings for a single data type to apply different access rights and target storage areas, in order to meet the requirements of different data owners. Because the definition of an archive holding is highly structured, it must be carefully designed according to:
• The expected content types to be archived
• The data segregation constraints for isolating the data owned by different business entities in distinct archive holdings
Most InfoArchive system configurations are performed at the holding level. Holding configuration encompasses many aspects of data archiving such as storage areas, retention policy, ingestion sequence, AIP mode, and xDB mode.
The settings defined at the archive holding level are used throughout the whole lifecycle of the data archived in the holding.

Submission Information Package (SIP)

Data to be archived must be packaged into Submission Information Packages (SIPs) to be ingested into InfoArchive. A SIP is a data container used to transport data to be archived from the producer (source application) to InfoArchive. It consists of a SIP descriptor, which contains packaging and archival information about the package, and the data to be archived. The latter in turn comprises a PDI file, eas_pdi.xml (structured data), and optionally one or more content files (unstructured data). A SIP must be compressed into .zip format and have the following files at the root level:
• eas_sip.xml
SIP descriptor that identifies and describes the information package. This file must conform to the SIP schema.
• eas_pdi.xml
PDI (preservation description information) file that contains the structured data to archive. InfoArchive does not dictate the structure of this file.
• Optionally, one or more unstructured content files to be archived, such as audio (.mp3), video (.avi), and graphic (.png) files. These file names must be referenced in the PDI file (eas_pdi.xml).
InfoArchive can archive any type of data produced by any source application as long as the data is packaged into SIPs that meet all the file structure and format requirements. However, InfoArchive is not responsible for generating SIPs; the source application must produce them, or you can develop utilities to extract information to be archived from the source application and convert it into InfoArchive-compliant SIPs. Use a file transfer program of your choice to move the generated SIPs to where they can be ingested into InfoArchive.

Data Submission Session (DSS)

Every SIP pertains to a data submission session (DSS), also referred to as a batch.
The data submission session (DSS) provides the packaging information used to identify, bind, and relate SIPs of different levels of granularity produced by the source application. When information is produced and packaged into a discrete, standalone SIP, the SIP pertains to a single-package DSS. Sometimes, information has to be packaged into more than one sequential SIP grouped together as a batch rather than a single SIP, due to limitations in file size, file transfer, time, or the source application. In this case, multiple SIPs pertain to a single DSS, with each SIP in the batch assigned a sequence number (seqno). The last SIP in a DSS has the is_last value set to True. A DSS associates multiple SIPs together. Each DSS (batch) has a unique identifier derived from the information contained in the SIP descriptor:
external DSS ID = holding + producer + id (internal DSS ID within the SIP)
For example, given the following DSS information:
<holding>PhoneCalls</holding>
<producer>CC</producer>
<id>2011060118</id>
The external DSS ID is PhoneCallsCC2011060118.
When archiving multiple SIPs belonging to the same DSS (batch archiving), InfoArchive is insensitive to the order in which the SIPs are received, regardless of their sequence number, and can ingest multiple SIPs belonging to the same DSS concurrently. InfoArchive also natively supports the commit and rollback of all the SIPs in a batch at the DSS level. The following is an example of the DSS element contained in the SIP descriptor (eas_sip.xml). All SIPs that belong to the same DSS have the same values in the DSS element.
<?xml version="1.0" encoding="UTF-8"?>
<sip xmlns="urn:x-emc:eas:schema:sip:1.0">
  <dss>
    <holding>PhoneCalls</holding>
    <id>2011060118</id>
    <pdi_schema>urn:eas-samples:en:xsd:phonecalls.1.0</pdi_schema>
    <pdi_schema_version/>
    <production_date>2011-06-01T00:00:00.000</production_date>
    <base_retention_date>2011-06-01T00:00:00.000</base_retention_date>
    <producer>CC</producer>
    <entity>PhoneCalls</entity>
    <priority>0</priority>
    <application>CC</application>
    <retention_class>R1</retention_class>
  </dss>
  ...
</sip>

The DSS elements are described as follows:
• holding: The destination archive holding into which to ingest the SIP.
• id: Internal DSS identifier (ID) assigned by the application or utility that generated the SIP. This ID, in conjunction with holding and producer, is used to form the external DSS ID that uniquely identifies a DSS: external DSS ID = holding + producer + id
• pdi_schema: The uniform resource name (URN) of the XML schema used to validate the PDI file (eas_pdi.xml), with its version number appended to it, for example: urn:eas-samples:en:xsd:phonecalls.1.0
• pdi_schema_version: It is recommended that you leave this element blank and append the PDI file schema version number to the schema URN in the pdi_schema element. When this information is not included in the schema URN (not recommended), you can specify a version of the schema. This element is included for alignment with the xsd:schema standard. However, using this element brings several inherent XML limitations; for example, it is not possible to include a schema in another schema by referencing its schema version. For this reason, it is recommended that you put the version directly in the URN of the schema, which is the most common XML practice.
• production_date: The datetime when the DSS (batch) that the SIP belongs to was produced, in the following coordinated universal time (UTC) format: yyyy-mm-ddThh:mm:ss.000. Typically, it is the creation time of the first SIP in a batch.
• base_retention_date: The base retention date of the SIP, used to calculate the retention date of the information package in the archive holding: retention date = base retention date + retention period of the holding. The date must be in the following coordinated universal time (UTC) format: yyyy-mm-ddThh:mm:ss.000
• producer: The application or utility that produced the SIP. This can be the same as the application element.
• entity: The business entity that owns the information contained in the information package.
• priority: The ingestion priority of the batch. Within the same archive holding, InfoArchive ingests batches with a higher priority value first.
• application: The application or utility that produced the SIP. This can be the same as the producer element.
• retention_class: Retention class used to calculate the retention date of the SIP. This element is optional.
— If empty, the retention period (eas_retention_period property) configured for the destination archive holding is applied.
— If present, the retention period (eas_retention_class_period property) associated with the specified retention class (eas_retention_class property) defined for the destination archive holding is applied. If the retention period or the retention class is not found during reception, an error occurs.

SIP Descriptor (eas_sip.xml)

A SIP must have a SIP descriptor (eas_sip.xml) at the root level of the package. The SIP descriptor contains two types of information:
• Archival information about the package, such as producer (the source application from which the data originates), the destination archive holding (an InfoArchive instance can contain multiple archive holdings), and base retention date.
This information is used by InfoArchive during the ingestion process to facilitate searching archived data. This information is the same for all SIPs in a DSS (batch).
• Packaging information that provides encapsulation and identification of the content to be archived. It identifies whether the SIP is a standalone single-item package or one of multiple sequential packages in a batch (and if so, which sequence number it has).
Each SIP is uniquely identified by an external SIP ID:
external SIP ID = external DSS ID + seqno = holding + producer + internal DSS ID + seqno
The producer information in the external SIP ID allows different source applications or utilities to produce SIPs without conflicting IDs. In the following SIP descriptor (eas_sip.xml) example, the external SIP ID is PhoneCallsCC20110201131:

<?xml version="1.0" encoding="UTF-8"?>
<sip xmlns="urn:x-emc:eas:schema:sip:1.0">
  <dss>
    <holding>PhoneCalls</holding>
    <id>2011020113</id>
    <pdi_schema>urn:eas-samples:en:xsd:phonecalls.1.0</pdi_schema>
    <pdi_schema_version />
    <production_date>2011-02-01T00:00:00.000+01:00</production_date>
    <base_retention_date>2011-02-01T00:00:00.000+01:00</base_retention_date>
    <producer>CC</producer>
    <entity>PhoneCalls</entity>
    <priority>0</priority>
    <application>CC</application>
  </dss>
  <production_date>2011-02-01T00:00:00.000+01:00</production_date>
  <seqno>1</seqno>
  <is_last>true</is_last>
  <aiu_count>10</aiu_count>
  <page_count>0</page_count>
</sip>

Note: In eas_sip.xml, the text of the holding, id, and producer elements cannot contain the following reserved characters: < (less than), > (greater than), : (colon), " (double quote), / (forward slash), \ (backward slash), | (pipe), ? (question mark), * (asterisk).
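To illustrate the ID rules described above, the following Python sketch derives the external DSS ID and external SIP ID from a SIP descriptor and enforces the reserved-character restriction. This is not product code; the function and variable names are hypothetical, and only the formulas and restrictions stated in this section are assumed.

```python
import xml.etree.ElementTree as ET

# XML namespace used by eas_sip.xml (see the SIP descriptor schema).
NS = {"sip": "urn:x-emc:eas:schema:sip:1.0"}

# Characters that must not appear in the holding, id, and producer elements.
RESERVED = set('<>:"/\\|?*')

def external_ids(sip_descriptor_xml):
    """Derive (external DSS ID, external SIP ID) from an eas_sip.xml document."""
    root = ET.fromstring(sip_descriptor_xml)
    holding = root.findtext("sip:dss/sip:holding", namespaces=NS)
    producer = root.findtext("sip:dss/sip:producer", namespaces=NS)
    dss_id = root.findtext("sip:dss/sip:id", namespaces=NS)
    seqno = root.findtext("sip:seqno", namespaces=NS)
    for value in (holding, producer, dss_id):
        if set(value) & RESERVED:
            raise ValueError("reserved character in %r" % value)
    # external DSS ID = holding + producer + id
    external_dss_id = holding + producer + dss_id
    # external SIP ID = external DSS ID + seqno
    return external_dss_id, external_dss_id + seqno
```

Applied to the PhoneCalls descriptor above, this returns "PhoneCallsCC2011020113" as the external DSS ID and "PhoneCallsCC20110201131" as the external SIP ID.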
SIP Descriptor (eas_sip.xml) Schema

A SIP descriptor (eas_sip.xml) must be a valid XML document that conforms to the predefined InfoArchive SIP descriptor schema (eas_sip.xsd), which can be found in the install/resources/xsd directory of the InfoArchive installation package.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<xs:schema elementFormDefault="qualified"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:sip="urn:x-emc:eas:schema:sip:1.0"
    targetNamespace="urn:x-emc:eas:schema:sip:1.0" version="1.0">
  <xs:element name="sip">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="dss">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="holding" nillable="false">
                <xs:simpleType>
                  <xs:restriction base="xs:string">
                    <xs:maxLength value="32"/>
                    <xs:minLength value="1"/>
                  </xs:restriction>
                </xs:simpleType>
              </xs:element>
              <xs:element name="id" nillable="false">
                <xs:simpleType>
                  <xs:restriction base="xs:string">
                    <xs:maxLength value="32"/>
                    <xs:minLength value="1"/>
                  </xs:restriction>
                </xs:simpleType>
              </xs:element>
              <xs:element name="pdi_schema" nillable="false">
                <xs:simpleType>
                  <xs:restriction base="xs:string">
                    <xs:maxLength value="256"/>
                    <xs:minLength value="1"/>
                  </xs:restriction>
                </xs:simpleType>
              </xs:element>
              <xs:element name="pdi_schema_version" nillable="false">
                <xs:simpleType>
                  <xs:restriction base="xs:token">
                    <xs:maxLength value="32"/>
                  </xs:restriction>
                </xs:simpleType>
              </xs:element>
              <xs:element name="production_date" type="xs:dateTime"/>
              <xs:element name="base_retention_date" type="xs:dateTime" nillable="false"/>
              <xs:element name="producer" nillable="false">
                <xs:simpleType>
                  <xs:restriction base="xs:string">
                    <xs:maxLength value="32"/>
                    <xs:minLength value="1"/>
                  </xs:restriction>
                </xs:simpleType>
              </xs:element>
              <xs:element name="entity" nillable="false">
                <xs:simpleType>
                  <xs:restriction base="xs:string">
                    <xs:maxLength value="32"/>
                  </xs:restriction>
                </xs:simpleType>
              </xs:element>
              <xs:element name="priority" type="xs:int" nillable="false"/>
              <xs:element name="application" nillable="false">
                <xs:simpleType>
                  <xs:restriction base="xs:string">
                    <xs:maxLength value="32"/>
                  </xs:restriction>
                </xs:simpleType>
              </xs:element>
              <xs:element name="retention_class" nillable="false" minOccurs="0">
                <xs:simpleType>
                  <xs:restriction base="xs:string">
                    <xs:minLength value="1"/>
                    <xs:maxLength value="32"/>
                  </xs:restriction>
                </xs:simpleType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="production_date" type="xs:dateTime" nillable="false"/>
        <xs:element name="seqno" nillable="false">
          <xs:simpleType>
            <xs:restriction base="xs:int">
              <xs:minInclusive value="1"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:element>
        <xs:element name="is_last" type="xs:boolean" default="false"/>
        <xs:element name="aiu_count" nillable="false">
          <xs:simpleType>
            <xs:restriction base="xs:long">
              <xs:minInclusive value="0"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:element>
        <xs:element name="page_count" nillable="false" minOccurs="0">
          <xs:simpleType>
            <xs:restriction base="xs:long">
              <xs:minInclusive value="0"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:element>
        <xs:element ref="sip:pdi_hash" minOccurs="0"/>
        <xs:element ref="sip:custom" minOccurs="0"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="pdi_hash">
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base="xs:string">
          <xs:attribute name="algorithm" use="required">
            <xs:simpleType>
              <xs:restriction base="xs:string">
                <xs:enumeration value="MD2"/>
                <xs:enumeration value="MD5"/>
                <xs:enumeration value="SHA-1"/>
                <xs:enumeration value="SHA-256"/>
                <xs:enumeration value="SHA-384"/>
                <xs:enumeration value="SHA-512"/>
              </xs:restriction>
            </xs:simpleType>
          </xs:attribute>
          <xs:attribute name="encoding" use="required">
            <xs:simpleType>
              <xs:restriction base="xs:string">
                <xs:enumeration value="base64"/>
                <xs:enumeration value="hex"/>
              </xs:restriction>
            </xs:simpleType>
          </xs:attribute>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>
  <xs:element name="custom">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="sip:attributes" minOccurs="0" maxOccurs="1"/>
        <xs:element ref="sip:data" minOccurs="0" maxOccurs="1"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="attributes">
    <xs:complexType>
      <xs:sequence>
        <xs:element minOccurs="0" maxOccurs="unbounded" ref="sip:attribute"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="attribute">
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base="xs:string">
          <xs:attribute name="name" use="required" type="xs:string"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>
</xs:schema>

The global SIP elements defined in the schema are described as follows:
• production_date: The datetime when the SIP was produced, in the following coordinated universal time (UTC) format: yyyy-mm-ddThh:mm:ss.000. Generally, it is the creation time of the PDI file.
• seqno: The sequence number of this SIP that denotes its position in the DSS (batch). This sequence number, in conjunction with the internal DSS id, holding, and producer information, is used to form the external SIP ID that uniquely identifies a SIP, avoiding conflicting SIP IDs produced by different source applications: external SIP ID = holding + producer + id + seqno
• is_last: Whether this SIP is the last one in the DSS (batch).
• aiu_count: The number of AIUs contained in the SIP. This information is used for a consistency check during ingestion.
• page_count: Currently not implemented; reserved for future use. Always set this to 0 (zero).
• pdi_hash: (Optional) Hash value (Base64 or hexadecimal encoded binary) of the PDI file. The element has two attributes:
— algorithm: The algorithm used to compute hash values. The following algorithms are currently supported: MD2, MD5, SHA-1, SHA-256, SHA-384, and SHA-512.
— encoding: The encoding scheme used to store hash values.
The hex and base64 encoding schemes are currently supported. InfoArchive uses this element to perform consistency checks of PDI files at the start of ingestion.

custom
Optional element that lets you customize SIP/AIP metadata by defining custom attributes and/or data to be included in the SIP descriptor (eas_sip.xml). See Customizing SIP/AIP Metadata, page 30.

Customizing SIP/AIP Metadata

You can customize SIP/AIP metadata by modifying the predefined InfoArchive SIP descriptor schema (eas_sip.xsd) and defining custom elements to be included in the SIP descriptor (eas_sip.xml). The predefined InfoArchive SIP descriptor schema (eas_sip.xsd) can be found in the install/resources/xsd directory of the InfoArchive installation package.

In the following example, a custom SIP element named data is defined in the SIP descriptor schema:

...
<xs:element name="custom">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="sip:attributes" minOccurs="0" maxOccurs="1"/>
      <xs:element ref="sip:data" minOccurs="0" maxOccurs="1"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
<xs:element name="data">
  <xs:complexType>
    <xs:sequence>
      <xs:any processContents="lax" minOccurs="0" maxOccurs="10000"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
...

Note: Within the custom element, you cannot include a parent node that contains both a text node and an element at the same level; for example:

<data>textnode1<foo1/>textnode2<foo2/></data>

After you define the custom element, you can include it directly in the SIP descriptor.

Note: Currently, custom AIP attributes cannot be pushed to the Centera store.

PDI File (eas_pdi.xml)

The PDI (Preservation Description Information) file eas_pdi.xml in a SIP stores the structured data to archive. Unlike the SIP descriptor (eas_sip.xml), there is no predefined schema for the PDI file.
You can define your own schema, using an XML editor, for the type of data to archive based on your business requirements (the PDI file name must be eas_pdi.xml, though). Any schema can be used as long as you import it into InfoArchive and perform some additional configuration. A file name can be stored as the value of one or several XML elements and/or attributes, and the same file name can be referenced multiple times within an eas_pdi.xml file.

Here is an example of a PDI file containing phone call recordings (eas_pdi.xml):

<?xml version="1.0" encoding="UTF-8" ?>
<Calls xmlns="urn:eas-samples:en:xsd:phonecalls.1.0">
  <Call>
    <SentToArchiveDate>2011-06-01</SentToArchiveDate>
    <CallStartDate>2011-05-27T04:11:37.234+01:00</CallStartDate>
    <CallEndDate>2011-05-27T04:57:16.234+01:00</CallEndDate>
    <CallFromPhoneNumber>1773708622</CallFromPhoneNumber>
    <CallToPhoneNumber>251286403</CallToPhoneNumber>
    <CustomerID>000601</CustomerID>
    <CustomerLastName>Mills</CustomerLastName>
    <CustomerFirstName>William</CustomerFirstName>
    <RepresentativeID>024</RepresentativeID>
    <Attachments>
      <Attachment>
        <AttachmentName>recording1</AttachmentName>
        <FileName>recording1.mp3</FileName>
        <CreatedBy>PhoneRecorder</CreatedBy>
        <CreatedOnDate>2011-05-27T04:57:16.234+01:00</CreatedOnDate>
      </Attachment>
      <Attachment>
        <AttachmentName>recording2</AttachmentName>
        <FileName>recording2.mp3</FileName>
        <CreatedBy>PhoneRecorder</CreatedBy>
        <CreatedOnDate>2011-05-27T04:57:16.234+01:00</CreatedOnDate>
      </Attachment>
    </Attachments>
  </Call>
  <Call>
    <SentToArchiveDate>2011-06-01</SentToArchiveDate>
    <CallStartDate>2011-05-04T19:56:28.234+01:00</CallStartDate>
    <CallEndDate>2011-05-04T20:19:50.234+01:00</CallEndDate>
    <CallFromPhoneNumber>1616330136</CallFromPhoneNumber>
    <CallToPhoneNumber>1885236136</CallToPhoneNumber>
    <CustomerID>000123</CustomerID>
    <CustomerLastName>Martin</CustomerLastName>
    <CustomerFirstName>Camila</CustomerFirstName>
    <RepresentativeID>013</RepresentativeID>
    <Attachments>
      <Attachment>
        <AttachmentName>recording3</AttachmentName>
        <FileName>recording3.mp3</FileName>
        <CreatedBy>PhoneRecorder</CreatedBy>
        <CreatedOnDate>2011-05-04T20:19:50.234+01:00</CreatedOnDate>
      </Attachment>
    </Attachments>
  </Call>
  <Call>
    ...
  </Call>
</Calls>

Archival Information Unit (AIU)

An archival information unit (AIU) is conceptually the smallest archival unit (like an information atom) of an information package. Each AIU corresponds to a record or item of the archived data. A single customer order, a patient profile, or a financial transaction record in an information package is an AIU.

The PDI file (eas_pdi.xml) in a SIP describes all the AIUs in the package. An AIU in eas_pdi.xml consists of an XML block in the file containing its structured data and, optionally, references to one or more associated unstructured content files. In the following example:
• AIU #1 is described by its structured data in eas_pdi.xml with a reference to one content file.
• AIU #2 contains two content files referenced by its structured data in eas_pdi.xml.
• AIU #3 contains only structured data stored in eas_pdi.xml, with no content files.

InfoArchive processes all AIUs in the same way regardless of whether they contain any content files.

Archival Information Package (AIP)

When a SIP is ingested into InfoArchive, it is converted into an archival information package (AIP) and stored in the system. An AIP is represented as an eas_aip type object in Content Server.

The way InfoArchive organizes and stores information in an AIP is very different from the way information is packaged in a SIP. During the SIP-to-AIP conversion process, information is extracted from the SIP descriptor and stored in the properties of the AIP, along with some additional archival information. Content files, if any, are imported as a part of the AIP object and stored on the designated file store.
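To tie the earlier pieces together, the following is an illustrative SIP descriptor (eas_sip.xml) for a standalone phone-call SIP like the one shown above. This is a sketch only: the element structure follows the eas_sip.xsd schema reproduced earlier, but all values (holding name, DSS identifier, dates, producer, and so on) are invented for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sip xmlns="urn:x-emc:eas:Schema:sip:1.0">
  <dss>
    <holding>PhoneCalls</holding>            <!-- target archive holding (illustrative) -->
    <id>dss-20110601-001</id>                <!-- DSS (batch) identifier (illustrative) -->
    <pdi_schema>urn:eas-samples:en:xsd:phonecalls.1.0</pdi_schema>
    <pdi_schema_version>1.0</pdi_schema_version>
    <production_date>2011-06-01T10:00:00.000</production_date>
    <base_retention_date>2011-06-01T10:00:00.000</base_retention_date>
    <producer>PhoneRecorder</producer>
    <entity>Calls</entity>
    <priority>0</priority>
    <application>CallCenter</application>
  </dss>
  <production_date>2011-06-01T10:05:00.000</production_date>
  <seqno>1</seqno>            <!-- first SIP of the DSS... -->
  <is_last>true</is_last>     <!-- ...and also the last: a standalone SIP -->
  <aiu_count>3</aiu_count>    <!-- number of Call records in eas_pdi.xml -->
  <page_count>0</page_count>  <!-- reserved; always set to 0 -->
</sip>
```

During ingestion, values such as holding, producer, id, and seqno are extracted from this descriptor and stored in the corresponding AIP properties.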
The structured data of all the AIUs in eas_pdi.xml within the received SIP is stored as an XML document in xDB. Like a SIP, an AIP is also made up of AIUs. Although AIUs are physically stored very differently in InfoArchive than in the SIP file, conceptually they remain the same basic unit that constitutes an AIP.

Unstructured Content File Table of Contents (eas_ri_xml)

Unstructured content files associated with AIUs, if any, are imported as the original content of the AIP object in the aggregated content file (eas_ci_container) format and stored in the configured storage area. The eas_ci_container rendition content aggregates all unstructured data associated with the AIP. Meanwhile, a table of contents (eas_ri_xml) that describes the aggregated file is generated by the ingestion process. The description of each unstructured content file is stored in the eas_ri_xml (table of contents) rendition of the AIP.

The unstructured data table of contents (eas_ri_xml) is also imported into xDB, with indexes created on the file name and sequence number elements. The extension of the table of contents XML file is ri.
The table of contents contains the following information:
• File name
• File size
• Format
• Sequence number of the unstructured content file in the AIP
• Byte position in the eas_ci_container rendition content
• Hash value for future consistency checking purposes (optional)
• Operations performed during the ingestion, such as compression and encryption (optional)

Here is an example of the table of contents:

<?xml version="1.0"?>
<ris xmlns="urn:x-emc:eas:schema:ri">
  <ri schemaVersion="1.0" seqno="1" pdi_key="recording1.mp3"
      xmlns:ri="urn:x-emc:eas:schema:ri">
    <step seqno="1">
      <ci mime_type="audio/x-mpeg" dctm_format="mp3" size="41123"/>
    </step>
  </ri>
  <ri schemaVersion="1.0" seqno="2" pdi_key="recording2.mp3"
      xmlns:ri="urn:x-emc:eas:schema:ri">
    <step seqno="1">
      <ci mime_type="audio/x-mpeg" dctm_format="mp3" size="40183"/>
    </step>
  </ri>
</ris>

• The namespace is urn:x-emc:eas:schema:ri.
• Each content file is described by an ri element, which has the following attributes:

schemaVersion
Always set to 1.0.

seqno
Sequence number of the unstructured file in the AIP, incremented for each content file name returned by the configured XQuery defined in the content of the ingestion configuration (eas_cfg_pdi) object.

pdi_key
The original content file name.

• Each step element has a ci sub-element with the following attributes:

mime_type
MIME type corresponding to the format of the content file, as returned by the configured query.

dctm_format
Repository format corresponding to the format of the content file.

position
Byte position in the eas_ci_container rendition.

size
Byte size in the eas_ci_container rendition.

seqno and pdi_key together make a unique identifier for a content file. If the query returns several results for a single content file, an error occurs.

The following AIP properties are related to the unstructured data.
Property Name | DA Label | Description
eas_ci_cnt | #contents | Number of unstructured data files associated with the AIP
eas_ci_size | Cumulated size of the contents | Combined size of all of the unstructured data files
eas_ci_format | Content store format | Format of the content storing the AIP unstructured data (always eas_ci_container)
Content description size | Content description size | Size in bytes of an <ri> block in the table of contents

Archival Information Collection (AIC)

AIPs provide the information necessary for a consumer to locate and order AIUs of interest. However, it can be impossible for a consumer to sort through the millions of AIPs contained in a large archive. The AIC addresses this problem.

An Archival Information Collection (AIC) organizes a set of AIPs to support flexible and efficient data access. AIPs are aggregated into the AIC using specified criteria (through DQL, for example) determined by the archivist. Generally, AICs are based on the AIPs of interest having common themes or origins and a common set of properties. For example, an archive of digital movies may have AICs based on the subject area of the movie, such as action, science fiction, or horror. In addition, the archive may have AICs based on other factors, such as director or lead actor. It is common for one AIC to correspond to one holding. At a minimum, InfoArchive can be viewed as having at least one AIC, which contains all the AIPs archived in the system.

How Data is Ingested

Ingestion Modes

Companies today deal with information that varies widely in volume and variety from business to business, and they need to enforce different data management policies on different types of information in compliance with business rules and regulatory requirements.
InfoArchive offers two ingestion modes to support a wide range of information archiving needs with optimal storage usage and performance: asynchronous ingestion and synchronous ingestion. A holding can be configured to support both ingestion modes.

Asynchronous Ingestion

In the asynchronous ingestion mode, InfoArchive performs scheduled ingestions of large SIPs (containing large numbers of AIUs) in batches. Each Data Submission Session (DSS), or batch, contains multiple sequentially numbered SIPs. SIPs in a batch can be ingested in any order, but ingested packages are not committed and made searchable until all the SIPs in a batch have been ingested. As a result, information is not searchable or accessible immediately after ingestion. Asynchronous ingestion provides small granularity of data management, as you can manage each package individually, such as adjusting the retention date and applying permission sets.

Asynchronous ingestion is performed through a set of scheduled scripts.

Synchronous Ingestion

In the synchronous ingestion mode (also called transactional ingestion mode), the client application calls the InfoArchive ingest web service to synchronously archive data to the archive holding. If the ingest web service call is successful, archived data can be searched and accessed immediately after ingestion.

You use synchronous ingestion in the following scenarios:
• Regulatory compliance
In some industries, regulations mandate that certain types of documents be synchronously archived to their final location.
• Business requirements
In some business environments, data must be promptly archived and immediately accessible to be referenced by a business process in order to achieve a high service level (as stipulated in the service level agreement).
• Small data management granularity
You want to manage archived data at the AIP level—for example, you want to apply policies, access rights, and folder classifications to individual AIPs and not to groups of AIPs.

Synchronous ingestion is suitable for ingesting large quantities of standalone (one SIP per DSS, or batch), small SIPs (containing small numbers of AIUs) that can be managed at the package level.

The synchronous ingestion mode has the following characteristics:
• Only discrete, standalone SIPs can be ingested. A standalone SIP is the only (the first and last) SIP in a DSS (batch), as indicated by the following properties in its SIP descriptor: seqno=1, is_last=true.
• SIPs share a set of common characteristics that can become shared properties of the AIP parent shareable object, for example, the same target archive holding, PDI schema, retention class, close base retention dates, and permission sets.
• SIPs are of small size, containing a moderate number of AIUs.
• Received SIPs are ingested immediately, without delay.

The synchronous ingestion mode has the following limitations:
• If InfoArchive receives the same SIP from a business application multiple times, it archives multiple identical copies of the SIP.
• Unlike in the asynchronous ingestion mode, you cannot defer the commit or roll back ingestions.
• The synchronous ingestion mode provides large management granularity, which means that you can only manage packages collectively at the aggregated AIP level.
• This mode is not suitable for ingesting large SIPs, which may cause system latency, timeouts, unresponsiveness, or crashes.
• Compared with the asynchronous ingestion mode, the synchronous ingestion mode is less efficient, consumes more resources, and has a larger storage footprint.

Synchronous ingestion supports all three AIP modes.
Although synchronous ingestion supports AIP mode 1, this AIP mode has the largest storage footprint at the repository and RDBMS level. Therefore, use AIP mode 1 only when there is a moderate number of AIPs to retain during the retention period.

You perform synchronous ingestion by calling InfoArchive web services exposed to consumers. For information about how to consume InfoArchive web services, refer to the EMC InfoArchive Development Guide.

Asynchronous Ingestion vs. Synchronous Ingestion

An archive holding can concurrently support both ingestion modes. Both ingestion modes share essentially the same ingestion configuration and underlying ingestion code, and when you search archived data, it makes no difference in which ingestion mode the data was archived. However, since SIPs are ingested in an ad hoc and discrete manner rather than in scheduled batches, synchronous ingestion is relatively less efficient, consumes more system resources, and has a larger storage footprint. Choose the appropriate ingestion mode for archiving your information based on the nature of the data, the system environment, information management policies, and business rules or regulatory compliance requirements.

The following table compares the two ingestion modes:

Criterion | Asynchronous Ingestion | Synchronous Ingestion
Number of SIPs per DSS (batch) | Multiple (sequentially numbered) | One (discrete, standalone)
SIP size (number of AIUs in a SIP) | Large | Small
Ingestion frequency | Scheduled | Ad hoc
Ingestion commit | Not until all the SIPs in a batch have been ingested | Immediately after ingestion
Supports ingestion rollback | Yes | No
Immediately searchable and accessible? | No | Yes
Supported AIP modes | 1 | 1, 2, 3
Management granularity | Small | Large
Ingestion efficiency | High | Low
Resource consumption | Low | High
Use case | Large packages that need to be managed individually; for example, video footage | Large quantities of small packages with shared properties that can be managed collectively; for example, payment transaction records
Performed through | A set of scheduled scripts | The exposed Ingest service

Archiving Process

Both asynchronous and synchronous archiving processes consist of four distinct sub-processes: reception, enumeration, ingestion, and commit. The only difference is that for asynchronous ingestion, each of the sub-processes is executed through a scheduled script located in EAS_HOME/bin, whereas for synchronous ingestion, all the sub-processes are executed in one run through the InfoArchive web services call and are transparent to the user.
• Reception
Queues the SIP file for ingestion. An AIP object is created in the Documentum repository and assigned attribute values based on the information in the SIP.
• Enumeration
Outputs a list of AIPs awaiting ingestion, in descending order of ingestion priority.
• Ingestion
Ingests the AIPs and their associated AIUs, in order of AIP ingestion priority. At this stage, the AIUs are not yet searchable.
• Commit
Commits the AIUs into the xDB database. The AIU data for the holding can now be searched.

Reception

Receiver

Reception is performed by running the receiver. The receiver is a standalone Java program that adds a received SIP file to the ingestion queue. This program is typically activated either by the third-party file transfer software after a SIP file is received, or by a custom script launched as a daemon that periodically scans predefined reception directories and launches the receiver for each file found. If necessary, several concurrent executions of the receiver can be launched.
The arguments expected by the receiver are stored in a properties file. You can override the default arguments on the command line. One property contains the name of a receiver configuration object containing additional configuration settings. Besides the arguments and the receiver configuration object, the execution of the receiver is also driven by the configuration of the target archive holding referenced by the SIP descriptor.

The receiver connects to the repository, loads the referenced receiver configuration object, and then:
• Creates an AIP repository object and assigns attributes with the reception start date, the path name, and the size of the received file
• Creates a reception working directory and initializes a reception log file
• Attaches the reception lifecycle to the AIP repository object
• Checks that the file path name matches an acceptable pattern defined in the receiver configuration object
• Extracts the SIP descriptor from the received SIP file
• Assigns attributes of the AIP repository object with information read from the SIP descriptor
• Checks that the archive holding referenced in the SIP descriptor is configured
• Creates a destruction lock on the AIP repository object if activated in the archive holding configuration
• Classifies the AIP repository object as defined in the archive holding configuration
• Checks the presence of the hash value in the SIP descriptor if required by the archive holding configuration
• Checks whether another non-invalidated AIP repository object already has the same identifier as the received file
• Checks whether the Data Submission Session (DSS) sequence number is consistent with those of the other AIP repository objects referencing the same DSS (asynchronous ingestion only)
• Rejects the received file if it references a rejected DSS identifier
• Encrypts the received file if activated by the archive holding configuration
• Imports several files as renditions of the AIP repository object:
— The received file, as format eas_sip_zip or eas_sip_zip_crypto
— The SIP descriptor, as format eas_sip_xml
— The compressed reception log file, as format eas_receive_logs_zip
• Destroys the received file

Throughout the execution of those actions, the receiver promotes the AIP repository object within its attached reception lifecycle. Consequently, a successful reception leads to an AIP repository object that is in the Waiting_Ingestion state. The reception working directories are periodically destroyed by the Clean (eas-launch-clean) job.

Importing the received file as content of the AIP repository object both secures it in a private Content Server storage area and makes it available to any ingestor running on any server connected to the network.

Reception Process and Lifecycle

The SIP reception process entails a sequence of predefined processing steps, each associated with a lifecycle state:

Initialization
• Extracts and performs XML validation of the SIP descriptor (eas_sip.xml)
• Sets the AIP properties based on the information in the SIP descriptor (eas_sip.xml)
• Verifies that the target archive holding is configured

Validation
Performs the following validations and actions:
• If this SIP has already been processed, raises an error
Note: This validation is only performed in asynchronous ingestion; synchronous ingestion does not check for SIP duplication.
• If this SIP belongs to a rejected DSS (batch), attaches it to the reject lifecycle
• If this SIP does not reference the PDI schema configured for the holding, attaches it to the reject lifecycle
• Computes the ingestion deadline and retention date

The administrator can reject a DSS (batch) by rejecting a SIP within the DSS that has already been received, while other SIPs of the DSS are still being produced or transferred.
In this situation, the reception of further SIPs for this DSS leads the receiver to immediately attach the AIPs to the rejection lifecycle without raising any error. This can be useful when you manage a large DSS for which many large SIPs are created.

Encryption (optional)
Encrypts the entire received SIP file, using a configured crypto provider such as EMC RSA DPM (Data Protection Manager).

Import
Imports the following as contents of the AIP:
• SIP descriptor (eas_sip.xml)
• Received SIP file
• Gzipped reception log file (optional)

Waiting_Ingestion
No action

Reception Node

The receiver processes SIPs according to instructions set in the reception node. You must configure a reception node for a holding. Multiple holdings can share the same reception node.

A reception node contains the following settings:
• The repository folder path in which the AIP object must be initially classified.
• The directory file system path in which the reception working directory must be created.
• A table describing the patterns of the accepted incoming files; for example:

File Name Pattern | Customer Pattern | Type Pattern
*.zip | PRODUCTION | SEPA
*.zip | PRODUCTION | ATMLOGS

The receiver sequentially scans the table to find a line for which the received file name, customer, and type arguments match the configured patterns. The customer and type arguments are intended to be passed by the third-party file transfer software:
• The customer argument is generally set with the identification of the file transfer destination agent, which is different per environment.
• The type argument is generally set with a code associated with the data type transported by the received file.
The usage of the customer and type arguments is not mandatory when this information can be deduced from the normalized file name of the received file; for example:

File Name Pattern | Customer Pattern | Type Pattern
prod_sepa_*.zip | * | *
prod_atmlogs_*.zip | * | *

By default, for diagnosis purposes, the receiver does not delete the received file when a reception error occurs. This behavior can be altered by adjusting the settings of the receiver configuration object so that the file is deleted:
• If it does not match any acceptable pattern.
• If it matches a given pattern but the error occurs after the match is found.
This mechanism is activated when constraints require that files containing sensitive data not be kept in their regular form.

Enumerator

The ingestion enumerator is a standalone Java utility that returns a list of AIP identifiers pending ingestion, ordered by descending ingestion priority. This utility is typically activated on each ingestion node by a custom shell script that is periodically triggered by the corporate job scheduler or launched as a permanent daemon. The arguments expected by the enumerator are defaulted from a properties file but can be overridden on the command line.

Every AIP is associated with a given archive holding, and a holding configuration defines which ingestion nodes are allowed to execute an ingestion for the holding. Based on this information, the nodes argument restricts the selection of the AIPs to those allowed to be ingested by the ingestion node names passed in the argument. This mechanism allows each ingestion node to obtain the list of AIP identifiers it can process. The operational plan defines the maximum allowed number of concurrent ingestions that can be executed on each ingestion node at any point in time.
It is the responsibility of the custom shell script to determine that number, but the following arguments restrict the returned list to the AIPs for which the ingestion can be launched:
• Passing that number as the value of the max argument leads the enumerator to return up to that number of AIPs.
• Passing also the minusrunning value in the flag argument leads the enumerator to:
— Determine the number of ingestions being executed
— Subtract this number of running ingestions from the value passed in the max argument

For example:
• 20 AIPs are pending ingestion and eligible to be processed by ingestion node 1
• The maximum number of concurrent ingestions allowed for ingestion node 1 is 10
• 3 ingestions are being executed on ingestion node 1
• A shell script launched on ingestion node 1 invokes the enumerator with the nodes=ingestion node 1, max=10, and f=minusrunning arguments. The enumerator returns 7 AIPs (that is, 10 – 3)
• Consequently, the custom shell script can immediately launch an ingestion for each returned AIP without risk of exceeding the maximum allowed number of concurrent ingestions.

Ingestion

Ingestor

The ingestor is a standalone Java program that executes the ingestion of an AIP. This program is generally launched by the custom shell script invoking the enumerator:
• Direct activation of the ingestor as a subprocess per AIP to ingest
• Indirect activation through the triggering of a job created per AIP

The arguments expected by the ingestor are defaulted in a properties file but can be overridden on the command line. One property contains the name of an ingestion node configuration object containing additional configuration settings. Besides the arguments and the ingestion node configuration object, the ingestion is mainly driven by the configuration of the archive holding to which the AIP belongs.
The ingestor connects to the repository, loads the referenced ingestor configuration object, and then:
• Creates an ingestion working directory and initializes an ingestion log file
• Analyzes the configuration to determine the ingestion sequence to apply
• Attaches the lifecycle associated with the ingestion sequence to the AIP repository object
• Sequentially executes the ingestion steps described in the ingestion sequence (for example, validation of the structured data of the AIUs, import of the structured data into xDB, creation of xDB indexes, compression of the contents associated with the AIUs, and so on)
• Enriches the AIP repository object with additional contents according to the configuration (for example, the compressed xDB detachable library, ingestion logs, the aggregate containing the contents associated with the AIUs, the table of contents, and so on)

Throughout the execution of the ingestion, the ingestor promotes the AIP repository object within the lifecycle associated with the applied ingestion sequence. A successful ingestion leads to an AIP repository object that is in the pending commit state. The ingestion working directories are periodically destroyed by the Clean (eas-launch-clean) job.

Ingestion Process and Lifecycle

The SIP ingestion process entails a sequence of processing steps. Ingestion step actions are executed by processors referenced in the XML content of the ingestion configuration (eas_cfg_pdi) object. The XML content also contains settings. InfoArchive ships with two pre-configured ingestion configuration objects:
• eas_ingest_sip_zip-pdi
The ingestion process for ingesting structured data only—SIPs containing no content files.
• eas_ingest_sip_zip-ci
The ingestion process for ingesting both structured and unstructured data—SIPs containing content files.

For information about the SIP structure, see Submission Information Package (SIP), page 22.
In addition to the same processing steps as eas_ingest_sip_zip-pdi for ingesting structured data, eas_ingest_sip_zip-ci also contains the steps for processing unstructured data. eas_ingest_sip_zip-ci must be configured for archiving unstructured data. The process also works for structured data, but if you archive structured data only, eas_ingest_sip_zip-pdi should be configured, since it entails less overhead with fewer processing steps and fewer audit records.

eas_ingest_sip_zip-pdi is associated with the eas_ingest_sip_zip-pdi.en_US lifecycle, which contains structured data processing related states. The eas_ingest_sip_zip-ci ingestion process is associated with the eas_ingest_sip_zip-ci.en_US lifecycle, which contains additional unstructured data processing related states. There are no out-of-the-box lifecycles for other locales such as fr_FR and zh_CN.

The structured data lifecycle states and their actions are as follows:

Initialize
Creates the ingestion working directory

Decrypt (optional)
Decrypts the SIP file if InfoArchive has been configured for encrypting received SIPs during ingestion, using a crypto provider such as EMC RSA DPM (Data Protection Manager)

Decompress_Metadata
Extracts eas_pdi.xml to the ingestion working directory and compresses it as an eas_pdi_xml_gzip rendition of the AIP object (eas_aip) if configured for the holding
Note: As opposed to the zip format, which can compress and store multiple files, gzip compresses a single file only. Unstructured data is archived separately as the eas_ci_container rendition of the AIP (eas_aip) object.
Validate_Metadata_Hash: Checks the eas_pdi.xml hash, if present in the SIP descriptor
Encrypt_Metadata (optional): Encrypts only the configured subset of structured data in eas_pdi.xml, as well as the compressed eas_pdi.xml file, before storing it as an eas_pdi_xml_gzip_crypto rendition
Import_Metadata: Creates the xDB detachable library associated with the AIP (xDB mode 2) and imports eas_pdi.xml into xDB
Query_Metadata:
• Executes XQuery for counting the number of archived AIUs
• Executes XQuery for obtaining the minimum and maximum values of the partitioning criteria
• Assigns the minimum and maximum values to the AIP object in the eas_pkey_min_date and eas_pkey_max_date attributes
Create_Metadata_Index: Creates xDB indexes
Import: Imports the following as renditions of the AIP object:
• The compressed eas_pdi.xml file (optionally ciphered) for reversibility
• The compressed xDB detachable library containing the data extracted from eas_pdi.xml (xDB mode 2)
• The compressed ingestion log file
Waiting_Commit: The AIP has been successfully ingested but is waiting for the final commit before it can be searched. If synchronous commit is enabled on the holding configuration object, standalone SIPs (seqno=1 and is_last=true) bypass this step and are directly committed.
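The Waiting_Commit bypass for standalone SIPs can be sketched as a small decision function. This is an illustration only, assuming the state names from the lifecycle above; the function is not a product API.

```python
# Hedged sketch of the Waiting_Commit bypass: standalone SIPs
# (seqno == 1 and is_last) are committed directly when synchronous
# commit is enabled on the holding configuration object.
def final_ingestion_state(seqno, is_last, synchronous_commit):
    if synchronous_commit and seqno == 1 and is_last:
        return "Committed"        # bypasses Waiting_Commit
    return "Waiting_Commit"       # waits for the final commit
```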
Unstructured Data Lifecycle

State: Actions
Decompress_Content:
• Unzips the unstructured data files contained in the SIP to the ingestion working directory
• Executes the configured XQuery on the structured data to initialize the TOC
• Checks for the presence of each unstructured data file
Validate_Content_Hash: Executes the configured XQuery on the structured data to compute and check the hash values of the unstructured data files
Compress_Content: Executes the configured XQuery on the structured data to compress the desired unstructured data files
Unstructured data is usually provided in an electronic format that is already compressed, such as pdf, tiff, jpeg, mp3, mp4, and so on. You can configure the processor not to compress specified types of compressed data to reduce the processing overhead during ingestion and retrieval while still achieving an acceptable compression ratio.
Encrypt_Content: Executes the configured XQuery on the structured data to encrypt the desired unstructured data files
Aggregate_Content: Aggregates the processed unstructured data files

Ingestion Node

For scalability purposes, the ingestion workload can be distributed across multiple ingestion nodes. A repeating property of the archive holding configuration defines the ingestion nodes allowed to process the ingestion of the SIPs. This setting is considered by the InfoArchive enumerator activated on each ingestion node. It returns the list of AIPs pending ingestion, restricted to those allowed to be processed on the current node. This mechanism lets you expedite ingestion on specific ingestion nodes associated with specific holdings.

Ingestion Priority

The InfoArchive enumerator returns the list of AIPs pending ingestion in descending order of their ingestion priority. This ingestion priority is driven by two sorting criteria:
1.
In descending order of the ingestion priority of the destination archive holding
In an archive holding, a SIP with a high ingestion priority is returned before a SIP with a low ingestion priority.
2. In ascending order of the ingestion deadline date of the SIP
For SIPs to be archived in archive holdings having the same ingestion priority, the SIP having the closest ingestion deadline is returned before the other SIPs.

How Data is Stored

InfoArchive ingests SIPs into the repository and stores data as AIPs, with each AIP corresponding to one SIP. The AIP is represented by the eas_aip type object in Content Server. An AIP repository object and an xDB data file respectively follow the proprietary Content Server and xDB data models. The following general actions are performed during the ingestion:
• The information contained in the SIP descriptor (eas_sip.xml) and PDI file (eas_pdi.xml) populates the properties of the AIP repository object in order to facilitate subsequent operations.
• Unstructured content files associated with AIUs, if any, are imported as the original content of the AIP object in the aggregated content file (eas_ci_container) format and stored in the configured storage area. The eas_ci_container rendition content aggregates all unstructured data associated with the AIP. Meanwhile, a table of contents (eas_ri_xml) which describes the aggregated file is generated during the ingestion process.
• The structured data files (eas_sip.xml, eas_pdi.xml, and eas_ri.xml) contained in the SIP are imported into xDB. In situations where full data reversibility is required at the storage level, settings of the archive holding enable the activation of redundant storage of this information as renditions of the AIP repository object.
Rendition format: Description
eas_sip_xml: Rendition storing the SIP descriptor
eas_pdi_xml_gzip: Rendition storing the SIP structured data compressed with the gzip algorithm
When AIUs have associated contents, the following information is always stored as content of the AIP repository object, regardless of the related holding settings.
Format: Description
eas_ci_container: This content aggregates all the contents associated with AIUs. It is imported as the original content of the AIP repository object.
eas_empty: When the AIP does not contain any unstructured content, the AIP content is in eas_empty format.
eas_ri_xml: This rendition corresponds to the table of contents which describes the aggregated content file (eas_ci_container).

xDB Modes

xDB modes determine how structured data is stored in xDB. A SIP may contain three types of structured data files. InfoArchive assigns a new file extension to metadata files when importing them into xDB:
SIP descriptor: XML file describing the attributes of a SIP (document file extension: sip)
PDI file: XML file describing the attributes of AIUs (document file extension: pdi)
Table of contents: XML file describing the attributes of the content files contained in a SIP (document file extension: ri)
There are three ways structured data is ingested into xDB:
• xDB Mode 1
Structured data files of all ingested AIPs are stored directly in a designated xDB detachable library.
xDB mode 1 is suitable for the following scenarios:
— Frequent searches required on AIPs with long retention periods
— Large number of AIPs with a moderate structured data volume
— Long xDB backup time acceptable
• xDB Mode 2
For each ingested AIP, a new sub-library is created in a designated xDB detachable library and structured data files are stored in the sub-library. The AIP object ID (eas_aip_id) constitutes part of the sub-library name.
xDB mode 2 is suitable for ingesting a moderate number of large AIPs with a high structured data volume.
• xDB Mode 3
As in xDB mode 2, AIP structured data files are stored in a sub-library exclusively created for each AIP. However, AIP sub-libraries are organized, according to a defined assignment policy, in a pool of libraries (called pooled libraries, which is purely an InfoArchive concept and not a library type in xDB) created in a designated xDB detachable library. The pooled libraries logically pertain to a configured library pool.
The logical library pool in xDB mode 3 gives you another layer of data manageability. For example, you can configure the assignment policy to ensure that a given pooled library stores only AIPs belonging to a specific holding, or group AIPs in different pooled libraries by year/month/week based on their retention dates. xDB mode 3 also significantly increases the ability to manage more archived data in an xDB database without suffering from performance issues. Compared with xDB mode 2, xDB mode 3 dramatically reduces the number of managed xDB detachable libraries. For performance reasons, managing more than 100,000 detachable libraries in an xDB database is not recommended.
Note: xDB creates a library or a sub-library when all XML files in the folder are compressed and archived to the same destination, for example, when the xDB mode is set to 2 or the AIP mode is set to 3.
The example below illustrates how AIP structured data files are stored in xDB after ingestion in each of the three xDB modes:

xDB Library Pool (xDB Mode 3)

An xDB library pool is a set of logically grouped xDB libraries which can be configured as the destination of one or several archive holdings. xDB libraries in a library pool are called pooled libraries (an InfoArchive concept) and a pooled library can store AIPs belonging to different holdings.
In Content Server, an xDB library pool is represented by an eas_cfg_xdb_library_pool configuration object. For each xDB pooled library, an eas_xdb_pooled_library repository object is created and associated with an eas_cfg_xdb_library_pool configuration object. In DA, a library pool is displayed as a folder, which you can double-click to view its associated AIP objects.
Note: The library pool folder and its subordinate AIP objects visually represent their logical parent-child relations rather than the actual state of the objects. Therefore, even when an AIP object assigned to a library pool has been pruned, it is still present in the library pool folder.
The following illustration shows the mapping between a pooled library in xDB and its corresponding eas_xdb_pooled_library object in the repository.
A library pool is represented by an eas_cfg_xdb_library_pool configuration object defining:
• Its assigned name.
• The xDB database in which the libraries are stored.
• The rule to be applied for assigning a new AIP to a library of the pool, for example, the maximum number of AIPs per library.
• The settings to be applied when it is required to add a new library to the pool, for example, the repository folder in which the eas_xdb_pooled_library repository object must be classified and the permission set to apply to it.
• The maximum number of libraries of the pool which can be cached in at the xDB level.
• The conditions to be fulfilled for closing a library of the pool and the settings to apply during a closure, for example, importing the xDB library as content of the eas_xdb_pooled_library repository object. The close condition is driven by a close mode:
0: The library is never closed automatically.
1: The library is closed when the current date >= the date returned by the XQuery assigned to eas_close_hint_date on the library + eas_close_period. If this mode is configured, the XQuery must not only return the partitioning value but also a date; if not, an ingestion error occurs.
2: The library is closed when the current date >= the library opening date + eas_close_period.
3: The library is closed when the current date >= eas_last_ingest_date defined on the library + eas_close_period.
Note: The library is closed whenever a close request is explicitly set on the library.
In DA, different xDB pooled library states are represented by different icons:
• Pooled library
• Pooled library is open
• Pooled library is online
• Pooled library is offline

xDB Pooled Library Assignment Policy (xDB Mode 3)

When an AIP is ingested, InfoArchive searches for an available xDB pooled library in which to store its structured data. An available xDB pooled library is one that is not closed and has not reached its close condition or storage quota. If no such pooled library is found, a new pooled library is created. By default, a new AIP is assigned to the latest open library in the pool. You can configure a custom assignment policy to assign new AIPs to libraries to meet your specific requirements. The most common assignment policy is time-based partitioning logic (e.g., by week, month, quarter, or year). For example, with a quarter partitioning assignment policy, AIPs archived during different quarters are assigned to their respective quarter-based xDB pooled libraries. Multiple pooled libraries can be open at a single point in time for a given library pool. To archive data from 2014 Q3 through Q4 with a quarter partitioning policy, the corresponding xDB pooled libraries must remain open. When there is no more data to be archived for a particular quarter, the library corresponding to that quarter can be closed.

xDB Library Online Backup

xDB libraries created in xDB modes 2 and 3 can be compressed and imported into the repository as a rendition of the AIP object while the xDB library still resides in the xDB database in read-only mode.
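The pooled-library close modes (0 through 3) described earlier in this section can be sketched as a decision function. This is an illustrative sketch: the keyword arguments are simplified stand-ins for the eas_close_hint_date, library opening date, and eas_last_ingest_date values, and eas_close_period is assumed to be expressed in days.

```python
from datetime import date, timedelta

# Hedged sketch of the pooled-library close modes; field names are
# simplified stand-ins for the eas_* properties, and the close period
# is assumed to be a number of days.
def close_condition_met(mode, today, close_period_days,
                        close_hint_date=None, opening_date=None,
                        last_ingest_date=None):
    period = timedelta(days=close_period_days)
    if mode == 0:
        return False                               # never closed automatically
    if mode == 1:
        return today >= close_hint_date + period   # XQuery-supplied hint date
    if mode == 2:
        return today >= opening_date + period      # library opening date
    if mode == 3:
        return today >= last_ingest_date + period  # last ingestion date
    raise ValueError("unknown close mode: %s" % mode)
```

A close request explicitly set on the library closes it regardless of the mode, which is why that case is not modeled here.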
The xDB library backup renditions, in eas_xdb_backup (uncompressed) or eas_xdb_backup_gzip (compressed) format, contain the xDB library contents and are stored in the Content Server repository storage area, such as EMC Centera or Atmos.
Note: Earlier releases of InfoArchive store xDB library (pool) renditions in the eas_xdb_library or eas_xdb_library_gzip format. If you have previously backed up xDB library (pool) renditions in the legacy format, you need to migrate them to the new xDB backup format. InfoArchive is backward-compatible with the legacy xDB library (pool) backup format, but it may not be supported in future InfoArchive releases. InfoArchive provides a data migration tool to help you convert the old xDB library (pool) backup format to the new format. See Converting xDB Library (Pool) Backup Renditions, page 124.
In xDB mode 2, the xDB library is automatically imported into the repository at the end of the AIP ingestion; in xDB mode 3, the xDB pooled library is imported into the repository when the pooled library is closed by the eas_close job. Archived data is always searchable during the online backup process since the xDB library does not need to be detached. Importing xDB libraries into the repository helps to exclude the libraries from xDB backups and provides the following benefits:
• The xDB backup duration is fairly constant, independent of the volume of archived data.
• If a rendition is assigned to replicated content-addressable storage (CAS) such as EMC Centera or Atmos, no backup operations need to be managed for this data.

xDB Caching

Through the xDB caching mechanism, InfoArchive uses the xDB system as a cache for querying archived data. xDB libraries created in xDB modes 2 and 3 can be temporarily removed from the xDB file system (cached out). Caching out xDB libraries prevents the xDB file system from growing too large and reduces the xDB space requirements.
Some data is archived solely for compliance purposes, without any need to be searched. For such holdings, xDB caching is generally configured to keep the number of xDB libraries present in the xDB file system low. The order node requests the xDB cache process to cache out the library. A background thread of the order node periodically creates, for each holding and library pool, a list of the unlocked libraries which can be cached out of the xDB file system. A library cannot be cached out if it is accessed by a search that is in progress. The list of libraries to cache out is ordered by the following criteria:
1. The number of orders currently being searched
2. The date of the last search in the library
3. The date of the last caching of the library
When an xDB library has been cached out, you can put it back into the xDB database later (cache it in) when it is needed for searching. Caching in xDB libraries provides file system access to the xDB data files.
The following screenshot illustrates how an AIP's xDB library in xDB is cached in as a compressed xDB backup rendition of the same AIP object in DA. In DA, you can view xDB caching related properties of an AIP.
Property Name (DA Label): Description
eas_xdb_cache_support (Cache out support): Equals T when the AIP is stored in an xDB library which can be cached in/out
eas_xdb_is_cached (Cached): Equals T when the library which stores the AIP is currently present in the xDB file system
eas_xdb_clib_cache_lck_date (Locked in cache deadline date): The library storing the AIP must be kept in the xDB file system until this date and time (i.e., cannot be cached out)
eas_xdb_clib_cache_in_cnt (# of cache in): The number of times that the library has been placed in the xDB file system, including the initial ingestion
eas_xdb_clib_cache_in_date (Last cache in date): The date and time when the library was last placed on the xDB file system

xDB Library Locking

You can prevent xDB libraries from being cached out for a specified period of time to ensure that the most recent archived data can always be searched synchronously. When an xDB library cannot be removed from the xDB file system, it is locked in the cache; otherwise, it is in the unlocked state. During ingestion, the eas_xdb_clib_cache_lck_date property of the AIP (eas_aip) object is computed from the retention date (/sip/dss/base_retention_date) read from the SIP descriptor eas_sip.xml and the eas_xdb_cache_lock_period property of the holding configuration object (eas_cfg_holding):
eas_xdb_clib_cache_lck_date = base_retention_date + eas_xdb_cache_lock_period
The xDB library is prevented from being cached out of the xDB file system until this date. The cache lock period does not start from the current ingestion date because it is not appropriate to lock xDB libraries in the cache when the archived AIPs are old, migrated data.
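This computation can be sketched in a few lines. This is a minimal illustration assuming the lock period (eas_xdb_cache_lock_period) is expressed in days; the property names are shortened for readability.

```python
from datetime import date, timedelta

# Minimal sketch of:
#   eas_xdb_clib_cache_lck_date = base_retention_date + eas_xdb_cache_lock_period
# assuming the lock period is a number of days.
def cache_lock_date(base_retention_date, lock_period_days):
    return base_retention_date + timedelta(days=lock_period_days)

def is_locked_in_cache(base_retention_date, lock_period_days, today):
    # The library cannot be cached out until the lock date has passed.
    return today < cache_lock_date(base_retention_date, lock_period_days)
```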
For example, if the locked in cache period is set to 31 (days) and the current ingestion date is February 8, the locked in cache dates and lock states of the following AIPs are computed as follows:
AIP: Retention Date; Lock in xDB Cache Date; Lock State
AIP #1: January 1; February 1; Unlocked
AIP #2: January 15; February 15; Locked
AIP #3: January 31; March 3; Locked
You can configure xDB caching settings so that archived data can be synchronously searched throughout the entire retention period in some holdings, but only the most recent data (for example, several months old) can be synchronously searched in others. The xDB caching settings let you adopt a strategy for each archive holding based on its respective demand for synchronous searches.

AIP Modes

There are three ways AIPs are created and stored in InfoArchive.
• AIP mode 1
AIPs are created and stored as full-blown materialized AIP objects.
• AIP mode 2
AIPs are created and stored as unmaterialized AIP lightweight objects (LWSOs) attached to a shared AIP parent shareable object (eas_aip_parent). In this mode, multiple AIP child lightweight objects (eas_aip) are created, each from one SIP, with their common properties (for example, the same holding and PDI schema) inherited from an AIP parent shareable object.
• AIP mode 3
In AIP mode 3, AIPs are created and stored as unmaterialized AIP lightweight objects (LWSOs) in exactly the same way as in AIP mode 2, except for the following:
— The eas_close InfoArchive job closes any open AIP parent shareable object that meets the defined close condition by aggregating all its AIP child LWSOs into a single materialized AIP object.
— The eas_dctm_clean job deletes the aggregated AIP child LWSOs.
In the following situations, an AIP is considered to be in an unsteady state:
— The AIP is a future aggregate and its associated AIP parent is open
— The AIP belongs to an aggregated AIP parent but is yet to be pruned
Of the three AIP modes, AIP mode 1 takes up the most storage space but offers the finest (smallest) management granularity. For example, in this mode, you can individually manage the retention period and permission set for each AIP or classify/move repository objects around the repository folders. Both AIP mode 2 and AIP mode 3 take advantage of the Content Server lightweight object type, which reduces the storage space needed in the database for AIP objects by inheriting their common properties from a single copy of the shared parent object instead of having multiple redundant rows in the underlying SysObject tables to support all AIP instances. AIP modes 2 and 3 are designed for ingesting large quantities of SIPs that share many common properties. For detailed information about the lightweight object type, refer to EMC Documentum Content Server Fundamentals. Through aggregation of AIPs, AIP mode 3 has the smallest storage footprint of the three modes but does not offer much flexibility in terms of managing individual AIPs. You can only manage AIPs at the aggregated AIP level in this mode. Which AIP mode to use depends on the ingestion mode. AIP mode 1 is used with the asynchronous ingestion mode; all AIP modes can be used with the synchronous ingestion mode.
The following table compares the three AIP modes:
AIP Object Type: AIP mode 1, materialized lightweight object (LWSO); AIP mode 2, unmaterialized lightweight object (LWSO); AIP mode 3, unmaterialized lightweight object (LWSO), aggregated into a materialized lightweight object when the parent shareable object is closed
AIP Management Granularity: AIP mode 1, small; AIP mode 2, medium; AIP mode 3, large
Storage Footprint: AIP mode 1, large; AIP mode 2, medium; AIP mode 3, small
Ingestion Mode: AIP mode 1, synchronous and asynchronous; AIP mode 2, synchronous; AIP mode 3, synchronous
An archive holding can concurrently support all three AIP modes. AIP search and retrieval is AIU-based and works in the same manner regardless of which AIP mode is used. The default AIP mode is set through the eas_default_aip_mode property of the eas_cfg_aip_parent_policy configuration object. You can change the AIP mode after creating a holding. When you change the AIP mode of a holding, the new AIP mode is only applied to new packages; previously archived packages remain stored according to the mode configured when they were ingested.

AIP Parent Assignment Policy

In AIP modes 2 and 3, when an AIP is ingested, InfoArchive searches for an available AIP parent to which to attach the AIP. The eligible AIP parent must be open, must not have reached any defined quota, and must share some common characteristics with the new AIP, such as the same holding and PDI schema. If no suitable AIP parent is found, a new AIP parent is created. An open AIP parent shareable object (eas_aip_parent) has multiple unmaterialized AIP child lightweight objects (eas_aip). Content files (unstructured data) pertaining to the AIP lightweight objects are stored individually in temporary file system storage areas, such as a file store configured in Content Server. AIP structured data is stored in an xDB segment assigned to the AIP parent. By default, the AIP is attached to the first compatible open AIP parent. You can create a custom assignment policy.
The most common AIP assignment policy is time-based partitioning logic (e.g., by week, month, quarter, or year). For example, with a quarter partitioning assignment policy, AIPs archived during different quarters are attached to their corresponding AIP parents. The AIP parent of a particular quarter must remain open until all AIPs pertaining to that quarter have been archived. Only when there is no more data to be archived for that quarter can the corresponding AIP parent be closed.

AIP Parent Closure

In AIP mode 3, an InfoArchive job (eas_close) closes any open AIP parent shareable object that meets the close condition defined in the AIP parenting policy by aggregating all its AIP child lightweight objects into a single materialized AIP object. Once an AIP parent is closed, AIP child objects can no longer be attached to it. When an open AIP parent shareable object is closed:
• All its child lightweight objects are aggregated into a single materialized AIP lightweight object. The eas_is_aip_aggregate property of the aggregated object is set to TRUE to indicate that the AIP object is an aggregate of multiple AIP lightweight objects.
• The original child lightweight objects are attached to the Prune lifecycle to be deleted from the system.
• The individual content files originally pertaining to the child lightweight objects are aggregated into a single content file in the archive storage area. Original content files are removed from the temporary file system storage area.
• The xDB segment is assigned to the aggregated AIP object.
Closing AIP parent shareable objects and aggregating child lightweight objects reduces the storage space and optimizes system performance in several ways:
• The destruction of AIP lightweight objects reclaims the storage space needed in the underlying RDBMS, such as a SQL Server or Oracle database.
• The consolidation of individual content files into one content file reclaims the temporary file system storage space.
• If you use Content Addressable Storage (CAS) such as EMC Centera, the storage of content files in the storage area and xDB does not require backups, which reduces system overhead and improves performance.

Supported AIP Mode and xDB Mode Combinations

Not all AIP mode/xDB mode combinations are valid, and some combinations are only supported by one ingestion mode but not the other. For information about AIP modes and xDB modes, see AIP Modes, page 58 and xDB Modes, page 50.
AIP mode 1: xDB mode 1, asynchronous and synchronous ingestion; xDB mode 2, asynchronous ingestion; xDB mode 3, asynchronous and synchronous ingestion
AIP mode 2: xDB mode 1, synchronous ingestion; xDB mode 2, unsupported; xDB mode 3, synchronous ingestion
AIP mode 3: xDB mode 1, synchronous ingestion; xDB mode 2, synchronous ingestion; xDB mode 3, synchronous ingestion
Note: In xDB modes 2 and 3, the number of xDB detachable libraries automatically created in the xDB database must not exceed 100,000.

How Data is Searched

InfoArchive performs a search in two distinct tiers:
1. In the tier-1 search, a DQL query is executed against structured PDI data (eas_pdi.xml), using the partitioning key as a filter criterion, to return a subset of eligible AIPs that may contain query results. The partitioning key was defined as part of the ingestion configuration and its minimum and maximum values were determined and assigned to each AIP object during ingestion, which forms the basis for the tier-1 search. If the search criteria do not contain the partitioning key, all AIPs are considered in the tier-2 search. For information about the partitioning key, see minmax—Defining the AIP Partitioning Key, page 132. For better performance, the tier-1 search only retrieves a subset of AIP attributes.
2.
In the second tier of the search, an XQuery expression is executed at the xDB level against the candidate AIPs returned by the tier-1 search. This tier-2 search retrieves the AIUs that match the exact criteria. A quota (AIP quota or results quota) can be defined and applied to each tier of the search to limit the number of AIPs or AIUs returned by the query.
The staged search approach makes queries very fast, because the AIPs returned by the tier-1 search effectively limit the scope of the tier-2 search. This prevents the tier-2 search from querying all the AIPs stored at the xDB level.
Depending on whether or not search results can be returned instantly, there are two types of searches: synchronous search (also referred to simply as search) and asynchronous search (also referred to as a background search or an order).
• In a synchronous search, the search results reside in an online xDB library and can be immediately returned to the user. A synchronous search fails if it attempts to access data from an AIP stored in a library that is not present on the xDB file system.
• In an asynchronous search, the search results cannot be retrieved in a reasonable amount of time, either because the result set is too large or because the candidate AIPs that may contain the search results do not reside in any online xDB library. To satisfy this search, data must be retrieved from detached (i.e., offline) xDB libraries. The detached libraries are placed online in xDB (i.e., the data is cached in) and are therefore available for searching. The order node requests the xDB cache process to cache in a library when it is needed for executing an order. For information about xDB caching, see xDB Caching, page 55.
For example, suppose you perform a search for phone calls received from a customer named John between Feb.
1 and March 1, 2011 using the following search criteria:
StartDate >= 2011-02-01 AND StartDate < 2011-03-01 AND CustomerName = 'John'
In the tier-1 search, InfoArchive executes the following DQL query and returns two AIP objects (AIP 2 and AIP 3) that may contain the records you want:
SELECT AIP WHERE eas_pkey_min >= 2011-02-01 AND eas_pkey_max < 2011-03-01
In the tier-2 search, InfoArchive executes an XQuery expression something like the following against AIP 2 and AIP 3:
for $Call in doc('/aip/02/00000000000/eas_pdi.pdi')/n:Calls/n:Call[
n:CallStartDate[ . ge xs:dateTime('2011-02-01') and . lt xs:dateTime('2011-03-01') ]
and n:CustomerFirstName[. = 'John'] ...
However, since AIP 3 resides offline in a detachable library and cannot be searched until it is cached in to the online xDB library, an asynchronous search or order has to be created for the search.

Order (Asynchronous Search)

In some situations, searches cannot be completed within a reasonable transactional response time for one of the following reasons:
• A very large result set is returned
• The candidate AIPs that may contain the search results do not reside in any online xDB library and must be cached in first for the search to complete
Such searches can be performed asynchronously through an order (asynchronous search). Generally, orders return more results than synchronous searches. Using orders also helps balance the query load on the server and optimize system performance. You issue an order through the InfoArchive GUI, which provides a Background Search button that lets you submit a search order. After an order is submitted, a confirmation message is displayed. When an order is generated, an eas_order object is created in a repository folder (configured through the order configuration object); for example, in the /Order/order_node_01 folder.
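The tier-1 narrowing shown in the DQL example earlier can be sketched as a simple filter. This is an illustrative sketch only: AIPs are plain dicts carrying the partitioning-key min/max values assigned at ingestion, whereas the real tier 1 is a DQL query against the repository and tier 2 an XQuery run only against the candidates it returns.

```python
# Hedged sketch of the tier-1 candidate selection; dicts stand in for
# AIP objects and ISO date strings compare correctly lexicographically.
def tier1_candidates(aips, start, end):
    # Mirrors the example predicate: eas_pkey_min >= start AND eas_pkey_max < end.
    return [a for a in aips
            if a["pkey_min"] >= start and a["pkey_max"] < end]

aips = [
    {"id": "AIP 1", "pkey_min": "2011-01-01", "pkey_max": "2011-01-20"},
    {"id": "AIP 2", "pkey_min": "2011-02-03", "pkey_max": "2011-02-20"},
]
candidates = tier1_candidates(aips, "2011-02-01", "2011-03-01")
```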
Orders are executed by an order processor, which is configured through an order node configuration object in the repository. Order results are stored in a configurable location in xDB; for example, in the /root-library/order/01 library. The document name of an order includes the r_object_id of the eas_order object from the repository. Intermediate order results and final order results can be stored in the same location or in separate xDB libraries. If needed, the destination xDB libraries can even be hosted on two xDB databases. Optionally, if order results contain encrypted data, they can be encrypted and stored as a rendition of the Order object in the repository instead of in xDB.

Order Lifecycle

After creation, an order goes through a set of states:
• When an order is created, it is in the Dormant state.
• After the order processor reads the order request, the order state becomes Queued.
• When the order processor begins to execute the query, the order state becomes Started.
• When the order execution is finished, its status is:
— Terminated if the execution completed without any error.
— Error if an error occurred during its processing.
• An administrator can suspend/resume the execution of an order. A suspended order is in the Suspended state.
Orders are kept until deleted, either manually or by an InfoArchive job.

Confirmations—How the Loop is Closed

You create a closed-loop system with InfoArchive through confirmations. A confirmation is a message generated in response to an AIP event to acknowledge that the event has occurred and to capture the information of the relevant AIP. Confirmations are generally used to send feedback to source applications or to notify other business applications (such as a web portal) of AIP events. Through the confirmation mechanism, InfoArchive can interoperate with other applications to form a closed-loop system.
Multiple confirmations can be generated for a single event, and the content of the messages is configurable. You can also limit the scope of confirmation generation to a specified set of AIPs or event types. Confirmation messages are output to a file system location or to xDB. Confirmation processing is performed by the eas_confirmation Documentum job.

Confirmation Event Types

Confirmations can be generated for the following AIP event types:
• Receipt: The SIP has been successfully received into the repository and the AIP has been created.
• Storage: The AIP ingestion has been committed in xDB.
• Purge: The AIP has been deleted from the repository.
• Reject: The AIP has been rejected by the InfoArchive administrator.
• Invalid: The AIP has been invalidated by the InfoArchive administrator.

When a confirmation event occurs, the event timestamp is stored in a date property of the AIP. When an occurred event is processed by the confirmation job, the confirmation timestamp is assigned to another date property of the AIP to record the time of confirmation. The confirmation job uses the timestamp pair (occurrence and confirmation) for each event type to keep track of confirmation processing so that it can pick up where it left off in the last run and incrementally process AIP events.

You can view the event timestamp and the confirmation timestamp for the event in the Tracking tab of an AIP properties page in Documentum Administrator. A void event timestamp means the event has not occurred for the AIP; a void event confirmation timestamp means the AIP event has not been confirmed yet.

Note: The ingestion confirmation date records the storage event timestamp.

Confirmation Job (eas_confirmation)

Confirmation processing is performed by the eas_confirmation Documentum job, which can be run from the command prompt or from Documentum Administrator.
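Event timestamps can also be queried outside DA. The following DQL sketch lists the reject and invalidation event timestamps for AIPs in a sample holding; eas_reject_date, eas_invalid_date, and eas_dss_holding are eas_aip attributes that appear in the Centera store attribute list later in this guide, and the full set of event and confirmation timestamp properties is documented in the EMC InfoArchive Object Reference Guide:

```sql
SELECT eas_aip_id, eas_dss_holding, eas_reject_date, eas_invalid_date
FROM eas_aip
WHERE eas_dss_holding = 'PhoneCalls'
ORDER BY eas_reject_date DESC
```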
Confirmation processing is very CPU-intensive and I/O-intensive due to frequent queries on AIPs and frequent writes to the working directory configured for the confirmation job. To minimize its impact on system performance during business-critical hours, you can schedule the eas_confirmation job to run in periods that are less resource-demanding.

The eas_confirmation job performs the following tasks when processing confirmations:
1. Searches for all unprocessed AIP events
2. For each AIP event, searches for applicable confirmation configurations
3. Searches for the query configuration object
4. Generates a confirmation message for each eligible AIP event using the XQuery in the query configuration object
5. Updates the event confirmation timestamp on the AIP

Chapter 3 InfoArchive Configuration

InfoArchive Configuration Overview

EMC InfoArchive is highly configurable and allows you to customize many aspects of the archiving process to meet your specific business requirements, from ingestion, to query, to the look and feel of the InfoArchive GUI. All configurations are represented as Documentum repository objects. All InfoArchive configuration objects have the eas_cfg prefix in their name and are subtypes of the eas_cfg object type, whose parent type is dm_sysobject. You perform most InfoArchive configuration tasks by setting the properties of InfoArchive configuration objects through EMC Documentum Administrator (DA). Some important configuration information is stored in the content of configuration objects. For example, the PDI schema file that defines the data structure of the eas_pdi.xml file is imported as the content of the schema configuration (eas_cfg_schema) object.

Since a holding is a logical archival destination for the same type of data that shares similar characteristics, most InfoArchive configuration is performed at the holding level and applies to all the data archived into a particular holding.
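Because every configuration object is a subtype of eas_cfg, you can get an inventory of the configuration objects in a repository with a single DQL query in DA's DQL Editor. The following is a sketch; the /System EAS cabinet is the layout used by the sample configurations described in this chapter, so adjust the path to your installation:

```sql
SELECT r_object_id, r_object_type, object_name, eas_name
FROM eas_cfg
WHERE FOLDER('/System EAS', DESCEND)
ORDER BY r_object_type, object_name
```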
The configuration settings of a holding govern all aspects of data archiving in the holding throughout the archived data lifecycle, from ingestion, to storage, to retrieval, to disposition.

The following diagram depicts the core InfoArchive configuration objects and the relations among them. Arrowed lines denote how the configuration objects are related through references, with the arrowhead pointing to the object being referenced. Sitting at the center of the InfoArchive configuration is the holding configuration (eas_cfg_holding) object, which is directly or indirectly associated with most of the other configuration objects.

The EMC InfoArchive Object Reference Guide contains detailed information about all InfoArchive objects—both configuration objects and non-configuration objects—and their properties.

A complete working configuration of a holding entails properly configuring all the configuration objects associated with the holding. There is no mandatory, fixed sequence to follow when configuring the configuration objects—you can configure them in any order you want and jump back and forth until all objects are correctly configured. However, it is good practice to configure objects with no or few dependencies first—objects that do not reference other objects that have not been configured yet.

As an alternative to manually creating and configuring a new holding, InfoArchive provides a holding configuration wizard that guides you through the process of creating a basic holding configuration to give you a quickstart. See Quickstart Using the InfoArchive Holding Configuration Wizard, page 84.

Working with InfoArchive Configuration Objects

You manage InfoArchive configuration objects in the same way as you manage other repository objects in DA.

Creating a Contentless Configuration Object

1. Navigate to the folder where you want to create the configuration object.
You can create the configuration object at any location, but the best practice is to group holding-specific configuration objects in one folder (created for a specific holding and preferably named after it); for configuration objects that are shared across multiple holdings, you can group them by their type. For example, the location of all the configuration objects pertaining to the PhoneCalls sample holding is /System EAS/Archive Holdings/PhoneCalls; all non-holding-specific default delivery channel, node, search form, and xDB library configuration objects are classified into their respective folders.

2. From the File menu, choose New > Document.
Note: Make sure no existing object is selected; otherwise, the New > Document command is greyed out.

3. Under the Create tab, type a name for the configuration object and select the object type (which always starts with eas_cfg_); then click Next.
The name here is just for display in DA and will not be used by InfoArchive.

4. Under the Info tab, set the properties of the configuration object as needed and click Finish. If you do not set its properties now, you can always edit them later.
Note: The name here corresponds to the eas_name property of the configuration object and will be used by InfoArchive. Make sure that the value of eas_name is unique.

The configuration object created this way has no content. You can import a file as the object content later if needed.

Creating a Configuration Object with Content

1. Navigate to the folder where you want to create the configuration object. You can create the configuration object at any location, but the best practice is to group holding-specific configuration objects in one folder (created for a specific holding and preferably named after it); for configuration objects that are shared across multiple holdings, you can group them by their type.
For example, the location of all the configuration objects pertaining to the PhoneCalls sample holding is /System EAS/Archive Holdings/PhoneCalls; all non-holding-specific default delivery channel, node, search form, and xDB library configuration objects are classified into their respective folders.

2. From the File menu, choose Import.
Note: Make sure no existing object is selected; otherwise, the Import command is greyed out.

3. On the File Selection page, select a file to import as the content of the configuration object and then click Next.

4. On the Object Definition page, type a name for the configuration object and select the object type (which always starts with eas_cfg_). The corresponding configuration object properties are displayed.
Note: The first Name field is just for display in DA and will not be used by InfoArchive; the second name field corresponds to the eas_name property of the configuration object and will be referenced by InfoArchive. Make sure that the value of eas_name is unique.

5. Set the properties of the configuration object as needed and click Finish. If you do not set its properties now, you can always edit them later.

The configuration object is created with the imported file as its content. You can change the object content later if needed.

Importing Content into a Configuration Object

After you create a configuration object, you can always import content into it (if it is a contentless object) or change its existing content.
1. Right-click the configuration object and choose Check Out. A Checked out by you icon appears next to the object.
2. Right-click the checked out configuration object and choose Check In.
3. At the bottom of the Checkin page, select a file and click OK. The file format is automatically detected and the file is checked in as the new content of the configuration object.
Editing the Properties of a Configuration Object

To edit the properties of a configuration object, right-click the object and choose Properties from the shortcut menu.

Performing System Global Configurations

Global settings and configuration objects are shared by multiple holdings. Some default global configuration objects that can be used out of the box were created when you installed InfoArchive. You can modify them or create new configuration objects as needed.

Configuring Global Settings

Global settings are configured through the global configuration (eas_cfg_config) object and apply to all holdings. To modify the default global settings, edit the properties of this object.

• eas_site (Site): Name of the current execution site, referenced in several other configuration objects (for example, eas_cfg_xdb_library, eas_cfg_rsa_rkm, and so on).
• eas_receive_policy (Reception lifecycle): Name of the lifecycle to attach to an AIP when it is received.
• eas_rec_reattach_state (Reception reattach state): Name of the second state of the reception lifecycle. In the event the alias set configured for the archive holding is different from the one configured for the reception, this information is used for reattaching the AIP directly to this state with the alias set defined for the holding.
• eas_invalid_policy (Invalidity lifecycle): Name of the lifecycle to attach to an Archival Information Package (AIP) when it has been invalidated by the administrator (for example, unknown file format, unknown archive holding).
• eas_reject_policy (Reject lifecycle): Name of the lifecycle to attach to an AIP when it has been rejected by the administrator (for example, the business application representative states that this AIP contains wrong data).
• eas_purge_policy (Purge lifecycle): Name of the lifecycle to attach to an AIP when it has to be disposed of.
• eas_prune_policy (Prune lifecycle): Name of the lifecycle to attach to an eas_aip_parent object, which is no longer required after its associated eas_aip has been aggregated into a materialized eas_aip.
• eas_rejinv_logs_store_enable (Archive reject/invalidation log for unknown holding): Flag to indicate that the reject/invalidation logs should be archived for AIPs referencing an unknown holding.
• eas_rejinv_logs_store (Reject/invalidation log store for unknown holding): Name of the log store used to store reject/invalidation logs for AIPs referencing an unknown holding.
• eas_rejinv_retention_enable (Set retention on reject/invalidation for unknown holding): Pushes the retention date to the storage subsystem for the contents associated with an invalidated or rejected AIP referencing an unknown holding.
• eas_rejinv_retention_period (Retention period (d) on reject/invalidation for unknown holding): Retention time in days for a rejected or an invalidated AIP referencing an unknown holding (eas_aip.eas_retention_date = eas_aip.eas_receive_start_date + eas_cfg_config.eas_rejinv_retention_period).

Configuring an xDB Cache Access Node

InfoArchive accesses XML documents on the xDB file system through the cache access node. You must configure a cache access node for xDB caching to work. The cache access node is configured through the cache access node configuration (eas_cfg_cache_access_node) object in the repository. A default cache access node /System EAS/Nodes/cache_access_node_01 was configured during InfoArchive installation and can be used out of the box. You can modify the default cache access node as needed. To create and configure a new cache access node, create an object of type eas_cfg_cache_access_node and set its properties.

• eas_name (Name): Name of the access node.
• eas_log_level (Log level): Log level for processing cache access requests.
• eas_fs_working_root (Working directory): Path to the working directory in the file system used by the cache access node.
• eas_queue_user (Queue user): User account (virtual user, no connection) in the repository associated with this node. Each cache access node must be associated with a user account; ingestion nodes and order nodes must post their cache requests to that user.
• eas_polling_interval (Polling interval (ms)): Interval in milliseconds for polling the queue of cache requests.
• eas_suspend_enabled (Suspended): Flag to temporarily suspend the processing of requests without having to stop the node.
• eas_requestor_timeout (Requestor timeout): Time in milliseconds after which the application that issued the request to this node must assume that the request cannot be processed at this time, unless it has received a callback.
• eas_processed_request_cnt (# requests processed): Counter indicating the number of requests processed by this node.
• eas_log_close_pending (Log close pending): Flag to indicate that the log file should be closed without stopping the node.
• eas_start_date (Start date): Date of the last time the node started.
• eas_start_proc_request_cnt (# requests processed since start date): Counter indicating the number of requests processed by this node since the last boot (or since the last reset).
• eas_stop_pending (Stop pending): Flag to indicate that the node should be stopped. Setting this flag to true (for example, using DQL) stops the node.
• eas_stop_date (Stop date): Date of the last time the node stopped.

Configuring an xDB Library

xDB libraries to be used for storing archived data are represented as xDB library configuration (eas_cfg_xdb_library) objects in the repository. This object defines the target xDB server, the library path, and the xDB cache process to be used (that is, the name of an eas_cfg_cache_access_node configuration object). For information about xDB libraries, refer to the EMC Documentum xDB Administration Guide.
Four default xDB libraries were configured during InfoArchive installation and can be used out of the box. You can modify these xDB libraries as needed.

• aip_01_xdb_lib: The default xDB parent library in xDB modes 1 and 2. For information about configuring an xDB parent library, see Configuring an xDB Parent Library (xDB Mode 1 and 2), page 115.
• confirmation_aip_xdb_lib: The default xDB library for storing generated confirmations, if confirmations are configured to use the xDB library as the delivery channel (set as the value of the eas_cfg_xdb_library parameter on the delivery channel configuration object for confirmations). For information about delivery channels, see Delivery Channel Configuration Object (eas_cfg_delivery_channel), page 225.
• confirmation_audit_xdb_lib: The default xDB library for storing confirmation audit trails, referenced in the purge audit properties file.
• order_01_xdb_lib: The default xDB library for storing intermediate order (asynchronous search) results in xDB, referenced by the delivery channel configuration (eas_cfg_delivery_channel) object used as the working delivery channel by the order configuration (eas_cfg_order) object. For information about configuring an order configuration object, see Configuring an Order Configuration (eas_cfg_order) Object, page 179.

To create and configure a new xDB library, create an object of type eas_cfg_xdb_library and set its properties.

• eas_name (Name): Name of this xDB library configuration.
• eas_library_path (Library path): Full path to the xDB library where structured data can be imported.
• eas_database_name (Database): Name of the xDB database to which the library belongs.
• eas_federation_name (Federation): Name of the federation to which the database belongs.
• eas_federation_set_name (Federation set): Name of the federation set to which the federation belongs.
• eas_user_name (xDB user name): xDB user name to use to connect to the xDB database.
• eas_cfg_crypto_pro_user_pwd (Cfg crypto provider for password): Name of the service provider used for encrypting the password (blank if the password is not encrypted).
• eas_user_pwd_crypto_key_id (Password crypto key id): Identifier of the key used to encrypt the password if the password has been encrypted using a cryptographic provider. An empty value indicates that the password is not encrypted.
• eas_user_password_encoding (Password encoding): Name of the encoding (for example, base64) used to store the password.
• eas_user_password (xDB password): Password for the user name used to connect to the xDB database.
• eas_cfg_cache_access_node (Cfg cache access node): Name of the cache access node having access to the xDB file system of the xDB database to which the library belongs.
• eas_segment_location (Segment location): Logical name of the storage path of the xDB segment file, associated with the property eas_seg_root_fs_path at the same index. This logical-name indirection makes it possible to keep file system-level paths out of the archive holding configuration.
• eas_seg_root_fs_path (Path of the segment location): File system path to the location where xDB segment files can be stored.
• eas_node_proximity (Site Proximity): Proximity value of the xDB node associated with the xDB node name at the same index. Reserved for future use.
• eas_node_name (xDB node name): Name of the xDB node.
• eas_node_host (Host): Name or IP address of the xDB database host associated with the xDB node name.
• eas_node_port (Port): Port of the node associated with the xDB node name at the same index.
• eas_node_write_enabled (Write enabled): Flag associated with the xDB node name at the same index indicating whether the node can write to the library.
• eas_node_read_enabled (Read enabled): Flag associated with the xDB node name at the same index indicating whether the node can read from the library.

Configuring the Configuration Cache

When enabled, the configuration cache boosts web service call performance and offloads the Content Server by temporarily storing configuration objects fetched from the repository on the web services server so that they can be retrieved rapidly in subsequent calls. An atomic entry in the configuration cache is called an element. An element has a key, a value, and a record of accesses. Elements are put into and removed from caches. They can also expire and be removed by the cache, depending on the cache settings.

You enable the configuration cache and configure its settings through the options in the eas-services.properties file located in WEBSERVER_HOME/conf on the web application server. You can configure the following options in eas-services.properties. If an option is not set in the properties file, it falls back to its default value.

• eas_config_cache_enabled: Whether or not to enable the configuration cache. Default: false.
• eas_config_cache_memory_size: The maximum size of the JVM memory that can be used for caching. When this maximum size limit is reached, elements are cached out (evicted) from the memory store based on the cache strategy (eviction algorithm). Default: 256M.
• eas_config_cache_time_to_live: Length of time (in seconds) elements are kept in the cache after they are placed in the cache. 0 means that elements have an indefinitely long life span unless evicted based on the cache strategy. Default: 86400 (1 day).
• eas_config_cache_time_to_idle: Length of time (in seconds) elements are kept in the cache after their last use. 0 means that elements will not be cached out based on idle time. Default: 86400 (1 day).
• eas_config_cache_strategy: Which eviction algorithm to use to cache out elements when the cache limit is reached. Default: LRU.
— FIFO (First In First Out): Elements are evicted in the same order as they come in. When a put call is made for a new element (and assuming that the maximum limit is reached for the memory store), the element that was placed first in the store (first in) is the candidate for eviction (first out). This algorithm is suitable if the use of an element makes it less likely to be used in the future.
— LFU (Least Frequently Used): The first element to be evicted is the least frequently used. For each get() call on an element, its number of hits is updated. When a put() call is made for a new element (and assuming that the maximum limit is reached), the element with the least number of hits (the least frequently used element) is evicted. If cache-element usage follows a Pareto distribution, this algorithm might give better results than LRU.
— LRU (Least Recently Used): The first element to be evicted is the least recently used, that is, the element with the oldest last-used timestamp. The last-used timestamp is updated when an element is put into the cache or retrieved from the cache with a get call.
• eas_config_cache_statistics_enabled: Whether or not to enable detailed cache statistics. Set this option to true if you want to view basic and detailed statistics of the configuration cache, such as cache hits, cache misses, eviction count, and average get time. Default: false.

You perform administrative tasks on the configuration cache through the InfoArchive web services administration pages. See Administrating the Configuration Cache, page 318.
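Putting these options together, a minimal eas-services.properties fragment that enables the configuration cache might look like the following; the values shown are illustrative, and any option you omit falls back to its default:

```properties
# Enable the configuration cache with statistics for tuning
eas_config_cache_enabled=true
eas_config_cache_memory_size=512M
# Keep elements at most one day, evict after one hour of inactivity
eas_config_cache_time_to_live=86400
eas_config_cache_time_to_idle=3600
eas_config_cache_strategy=LRU
eas_config_cache_statistics_enabled=true
```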
Configuring a Centera Store for Use with InfoArchive

InfoArchive natively supports data archiving on Content Addressable Storage (CAS) platforms such as EMC Centera. If you use Centera with InfoArchive, archived data is pushed to Centera at the storage level at the end of the archiving process. The storage of content files in the storage area and xDB does not require backups, which reduces system overhead and improves performance. To configure a Centera store for use with InfoArchive, you need to create a Centera Store repository object in DA and set some key properties.

Creating a Centera Store Object

In DA, create a Centera Store object by executing a DQL statement in DQL Editor. Use the following sample DQL as a reference:

CREATE dm_ca_store OBJECT
SET name = 'AssignStoreName',
SET a_plugin_id = (SELECT r_object_id FROM dm_plugin WHERE object_name = 'CSEC Plugin'),
SET a_retention_attr_name = 'eas_retention_date',
SET a_retention_attr_required = False,
APPEND a_storage_params = 'AssignedIpAddress#1,…,AssignedIpAddress#n?AssignedFilepath',
APPEND a_content_attr_name = 'eas_aip_id',
APPEND a_content_attr_desc = 'eas_aip_id',
APPEND a_content_attr_name = 'eas_dss_holding',
APPEND a_content_attr_desc = 'eas_dss_holding',
APPEND a_content_attr_name = 'eas_dss_id',
APPEND a_content_attr_desc = 'eas_dss_id',
APPEND a_content_attr_name = 'eas_dss_pdi_schema',
APPEND a_content_attr_desc = 'eas_dss_pdi_schema',
APPEND a_content_attr_name = 'eas_dss_pdi_schema_version',
APPEND a_content_attr_desc = 'eas_dss_pdi_schema_version',
APPEND a_content_attr_name = 'eas_dss_production_date',
APPEND a_content_attr_desc = 'eas_dss_production_date',
APPEND a_content_attr_name = 'eas_dss_base_retention_date',
APPEND a_content_attr_desc = 'eas_dss_base_retention_date',
APPEND a_content_attr_name = 'eas_dss_producer',
APPEND a_content_attr_desc = 'eas_dss_producer',
APPEND a_content_attr_name = 'eas_dss_entity',
APPEND a_content_attr_desc = 'eas_dss_entity',
APPEND a_content_attr_name = 'eas_dss_priority',
APPEND a_content_attr_desc = 'eas_dss_priority',
APPEND a_content_attr_name = 'eas_dss_application',
APPEND a_content_attr_desc = 'eas_dss_application',
APPEND a_content_attr_name = 'eas_sip_production_date',
APPEND a_content_attr_desc = 'eas_sip_production_date',
APPEND a_content_attr_name = 'eas_sip_seqno',
APPEND a_content_attr_desc = 'eas_sip_seqno',
APPEND a_content_attr_name = 'eas_sip_is_last',
APPEND a_content_attr_desc = 'eas_sip_is_last',
APPEND a_content_attr_name = 'eas_sip_aiu_count',
APPEND a_content_attr_desc = 'eas_sip_aiu_count',
APPEND a_content_attr_name = 'eas_sip_page_count',
APPEND a_content_attr_desc = 'eas_sip_page_count',
APPEND a_content_attr_name = 'eas_sip_pdi_hash_algorithm',
APPEND a_content_attr_desc = 'eas_sip_pdi_hash_algorithm',
APPEND a_content_attr_name = 'eas_sip_pdi_hash_encoding',
APPEND a_content_attr_desc = 'eas_sip_pdi_hash_encoding',
APPEND a_content_attr_name = 'eas_sip_pdi_hash',
APPEND a_content_attr_desc = 'eas_sip_pdi_hash',
APPEND a_content_attr_name = 'eas_cfg_crypto_provider',
APPEND a_content_attr_desc = 'eas_cfg_crypto_provider',
APPEND a_content_attr_name = 'eas_pdi_crypto_key_id',
APPEND a_content_attr_desc = 'eas_pdi_crypto_key_id',
APPEND a_content_attr_name = 'eas_crypto_encoding',
APPEND a_content_attr_desc = 'eas_crypto_encoding',
APPEND a_content_attr_name = 'eas_sip_crypto_iv',
APPEND a_content_attr_desc = 'eas_sip_crypto_iv',
APPEND a_content_attr_name = 'eas_pdi_crypto_iv',
APPEND a_content_attr_desc = 'eas_pdi_crypto_iv',
APPEND a_content_attr_name = 'eas_ci_crypto_iv',
APPEND a_content_attr_desc = 'eas_ci_crypto_iv',
APPEND a_content_attr_name = 'eas_sip_crypto_propbag',
APPEND a_content_attr_desc = 'eas_sip_crypto_propbag',
APPEND a_content_attr_name = 'eas_pdi_crypto_propbag',
APPEND a_content_attr_desc = 'eas_pdi_crypto_propbag',
APPEND a_content_attr_name = 'eas_ci_crypto_propbag',
APPEND a_content_attr_desc = 'eas_ci_crypto_propbag',
APPEND a_content_attr_name = 'eas_pdi_crypto_hash_algo',
APPEND a_content_attr_desc = 'eas_pdi_crypto_hash_algo',
APPEND a_content_attr_name = 'eas_pdi_crypto_hash_salt',
APPEND a_content_attr_desc = 'eas_pdi_crypto_hash_salt',
APPEND a_content_attr_name = 'eas_ci_crypto_key_id',
APPEND a_content_attr_desc = 'eas_ci_crypto_key_id',
APPEND a_content_attr_name = 'eas_reject_date',
APPEND a_content_attr_desc = 'eas_reject_date',
APPEND a_content_attr_name = 'eas_invalid_date',
APPEND a_content_attr_desc = 'eas_invalid_date',
APPEND a_content_attr_name = 'eas_rejinv_user_name',
APPEND a_content_attr_desc = 'eas_rejinv_user_name',
APPEND a_content_attr_name = 'eas_rejinv_description',
APPEND a_content_attr_desc = 'eas_rejinv_description',
APPEND a_content_attr_name = 'eas_phase',
APPEND a_content_attr_desc = 'eas_phase',
APPEND a_content_attr_name = 'eas_state',
APPEND a_content_attr_desc = 'eas_state',
APPEND a_content_attr_name = 'a_content_type',
APPEND a_content_attr_desc = 'a_content_type',
APPEND a_content_attr_name = 'r_page_cnt',
APPEND a_content_attr_desc = 'r_page_cnt',
APPEND a_content_attr_name = 'content.r_object_id',
APPEND a_content_attr_desc = 'content.r_object_id',
APPEND a_content_attr_name = 'content.full_format',
APPEND a_content_attr_desc = 'content.full_format',
APPEND a_content_attr_name = 'content.page_modifier',
APPEND a_content_attr_desc = 'content.page_modifier',
APPEND a_content_attr_name = 'content.page',
APPEND a_content_attr_desc = 'content.page',
APPEND a_content_attr_name = 'content.set_file',
APPEND a_content_attr_desc = 'content.set_file'
GO

The sample DQL defines a Centera Store object with AIP and content attributes assigned to it, which determine what information will be pushed to Centera at the storage level when the eas_commit job commits ingested data at the end of the archiving process.
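After executing the CREATE statement, you can verify the key retention settings on the new store object with a query such as the following sketch, where 'AssignStoreName' is the placeholder store name used in the sample:

```sql
SELECT r_object_id, name, a_retention_attr_name, a_retention_attr_required
FROM dm_ca_store
WHERE name = 'AssignStoreName'
```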
As a naming convention, content.attribute_name refers to an attribute of the content object (dmr_content); an attribute name not preceded by content. indicates an AIP (eas_aip) object attribute. The attributes defined in the sample DQL are the recommended ones. You can configure them based on your own storage needs. If you do not want to push a content attribute to Centera at the storage level, remove the assignment of the attribute from the statements. For example, if you remove the assignment of a_retention_attr_name from the DQL statements, the retention date attribute will not be pushed to Centera.

Configuring Key Centera Store Object Properties

Edit the properties of the Centera Store object:
• Select the Configure Retention Information option and set Retention Attribute Name to eas_retention_date, which is an AIP (eas_aip) object property.
• Do not select the Application Provides Retention option.
• The content attribute names and descriptions were defined when you created the Centera Store object by executing the DQL statement. You can edit the attributes as needed.

Quickstart Using the InfoArchive Holding Configuration Wizard

As an alternative to manually creating and configuring a holding, InfoArchive provides a holding configuration wizard that guides you through the process of creating a basic holding configuration to give you a quickstart in getting the system up and running. The web-based wizard walks you through some key configuration steps for a holding in an intuitive manner and generates a compressed .zip file that contains all the configuration files required for installing the configured holding. You then install the holding into the repository on the InfoArchive Content Server host using the ant installer.
While the InfoArchive holding configuration wizard streamlines the configuration process for a holding, it only lets you define some key configuration settings, making all other configuration settings transparent by using preset default values. The simple holding configuration package generated by the wizard can be either directly installed into the repository using the ant installer, or used as a basic configuration template for configuring advanced settings. Here is the general workflow of the InfoArchive holding configuration wizard:

Launching the InfoArchive Holding Configuration Wizard

The InfoArchive Holding Configuration Wizard is a web-based application that can be accessed in a web browser, either remotely or locally on the web application server. There are two ways you can start the wizard:
• Launch the wizard remotely as a web application. Deploy the wizard WAR package located in EAS_HOME/tools/holding-configurator/WAR to the web application server and access it in a web browser from a remote client. For example, on Apache Tomcat, deploy the wizard WAR package to TOMCAT_HOME/webapps/eas-wizard, and then access the wizard via the URL http://hostname:8080/eas-wizard.
• Launch the wizard locally as a standalone application.
Note: If you are running the wizard on a computer other than the InfoArchive server, make sure Java 7 has been installed.
Execute the following script:
— EAS_HOME/tools/holding-configurator/run.bat (Windows)
— EAS_HOME/tools/holding-configurator/run.sh (Linux)
This deploys the wizard application on an embedded Jetty server and directly launches a web browser pointing to the wizard URL: http://localhost:9000/holding-configurator (default). If needed, you can change the port number by editing EAS_HOME/tools/holding-configurator/conf.xml.
Note: The standalone InfoArchive Configuration Wizard can be directly extracted from the InfoArchive installation package and launched using the run.bat/run.sh script.
If you do so on Linux, make sure that proper permissions are set on the run.sh script.

Configuring a Basic Holding Using the InfoArchive Holding Configuration Wizard

On the first screen of the InfoArchive Holding Configuration Wizard, choose to create a new holding configuration or load an existing holding configuration that you saved earlier using the wizard.
Note: Holding configuration files created outside the wizard cannot be loaded.
The wizard lets you save an unfinished holding configuration and complete it at a later time. At any time during the configuration process, you can click the Save button to download an interim configuration file (.zip) to your local drive.
Note: If you are running the InfoArchive Holding Configuration Wizard in Microsoft Internet Explorer on a Windows Server operating system, you must disable Internet Explorer Enhanced Security Configuration to be able to save the interim configuration file.
You can load the saved configuration file later to continue with the configuration. Note that every time you modify a saved configuration, you must regenerate the final holding configuration package (.zip) to be installed into the repository. Do not directly modify settings in the interim configuration file or the generated holding configuration package outside the wizard.
Note: In the InfoArchive Holding Configuration Wizard, when you load an existing holding configuration that you saved earlier, you may not be returned to the screen where you saved the configuration, which may lead to some configuration steps being skipped. To be safe, when modifying a previously saved holding configuration in the wizard, always go back to the first configuration screen after loading the configuration and resume from there.
To create a new holding configuration, follow the on-screen instructions that guide you through the configuration process:
1. Specify a descriptive name that uniquely identifies the holding.
The holding name provided here can contain up to 18 alphanumeric characters. A holding is a logical destination archive in which to ingest and store data, usually data of the same type that shares common characteristics. For example, you can create a holding to archive data from the same source application (such as ERP data), of the same format (such as audio recordings), or belonging to the same business entity. An InfoArchive instance can contain multiple archive holdings. The SIP descriptor (eas_sip.xml) contains the name of the holding to be used for data archiving.
2. Define the PDI schema.
Select the schema that formally describes the structured data in the information packages to archive. The specified schema will be imported into the repository as the content of the schema configuration (eas_cfg_schema) object.
Note: The PDI schema must not contain elements with identical names.
Unlike the SIP descriptor file eas_sip.xml, there is no predefined schema for structured data (PDI) in the information package. You must create a schema with a target namespace that defines the elements, attributes, and simple and complex types in the eas_pdi.xml file according to your business requirements. The eas_pdi.xml file in the information package must conform to the defined schema. For detailed information about defining the PDI schema, see Creating a Structured Data (PDI) Schema, page 109.
3. Select the Archival Information Unit (AIU) node.
From the schema diagram, select the node that represents the archival information unit (AIU). This diagram is a graphical representation of the structured data (PDI) schema.
Use the following navigational operations to locate the AIU node:
• Pan and zoom around the diagram: wheel up to zoom in; wheel down to zoom out.
• Click the plus sign (+) on a node to expand it; click the minus sign (-) to collapse it.
• Click the plus sign (+) or the minus sign (-) on the left side of the diagram to expand or collapse all the nodes at once.
Note: Make sure you select the correct node that represents the AIU. The wizard does not validate your selection. If you select the wrong node, ingestion will fail.
An archival information unit (AIU) is conceptually the smallest archival unit (like an information atom) of an information package. Each AIU corresponds to a record or item of the archived data. A single customer order, a patient profile, or a financial transaction record in an information package is an AIU. The structured data (PDI) file (eas_pdi.xml) in a SIP describes all the AIUs in the package. An AIU in eas_pdi.xml consists of an XML block in the file containing its structured data and, optionally, references to one or more associated unstructured content files. For information about AIUs, see Archival Information Unit (AIU), page 32.
4. Select a date or dateTime node to be used as the AIP partitioning key.
You can only select one AIU child node (element) of type date or dateTime as the partitioning key (all the other nodes are greyed out in the schema diagram), which is the most common use scenario. However, through manual configuration, you can also define multiple AIP partitioning keys, and the element does not have to be of type date or dateTime. InfoArchive uses the partitioning key, which is an AIU child node (element) defined in the structured data (PDI) schema, to calculate the value range (min/max values) of the information package in terms of this AIU attribute.
The value range serves as the AIP partitioning criterion used to quickly locate the AIPs that may contain matching AIUs when querying archived data (tier-1 search). For more information about defining the partitioning key, see minmax—Defining the AIP Partitioning Key, page 132.
5. If information packages include unstructured content files, select the node (AIU child element or attribute) that contains their names.
By executing an XQuery expression to select distinct values of this node during the ingestion process, InfoArchive uses this information to create the table of contents (eas_ri.xml) that references unstructured content files.
6. Select the format of content files.
In most cases, this is the content file name extension, and the MIME type attribute is extrapolated from the format. The format must be defined in the Documentum repository. If needed, you can define additional formats in the repository using DA or by editing the configuration template files, which can be installed using ant. This information is used to create the table of contents (eas_ri.xml) that references unstructured content files.
7. Select one or more nodes to be used as search criteria.
8. Define the search criteria:
• Data types of the search criteria are displayed on the screen. If a data type is not correct, fix it in the PDI schema and reload the schema in the wizard.
• Create xDB index: Specify whether to index the element in xDB. Creating xDB indexes speeds up searches but consumes more storage space.
• Set the default sorting order on the search results page.
This information will be used to construct the XQuery XML file to be imported into the repository as the content of the query configuration (eas_cfg_query) object. Specifically, this step configures the path.value.index section in the eas_cfg_pdi.xml and eas_cfg_query.01.xml configuration template files.
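To make the preceding steps concrete, here is a sketch of what an eas_pdi.xml file might contain for the phone-call example used later in this chapter. The Calls and Call wrapper elements, the Attachment element, and the namespace URN are illustrative assumptions, not a predefined InfoArchive structure; your actual element names are dictated by your own PDI schema:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative eas_pdi.xml sketch; element names and namespace are
     assumptions and must match your own PDI schema. -->
<Calls xmlns="urn:example:phonecalls:1.0">
  <!-- Each Call element is one AIU (step 3) -->
  <Call>
    <SentToArchiveDate>2014-03-01</SentToArchiveDate>
    <!-- A date/dateTime element such as this one could serve as the
         AIP partitioning key (step 4) -->
    <CallStartDate>2014-02-28T09:15:00</CallStartDate>
    <CallEndDate>2014-02-28T09:22:30</CallEndDate>
    <CallFromPhoneNumber>15085551234</CallFromPhoneNumber>
    <!-- Hypothetical element holding the unstructured content file
         name (step 5) -->
    <Attachment>recording_0001.mp3</Attachment>
  </Call>
</Calls>
```

In this sketch, InfoArchive would compute the min/max CallStartDate values across all AIUs as the partitioning criterion, and collect the distinct Attachment values to build the table of contents (eas_ri.xml).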
For details, see Defining the Search Criteria and Search Result, page 171.
9. If you want to take advantage of the confirmations feature, select the event types for which you want to generate confirmation messages.
A confirmation is a message generated in response to an AIP event to acknowledge that the event has occurred and to capture the information of the relevant AIP. The wizard only provides the most basic confirmation configuration options. You can perform additional manual configurations to:
• Generate multiple confirmations for a single event and configure the content of the messages
• Limit the scope of confirmation generation to a specified set of AIPs
• Configure where to output confirmation messages: xDB or a file system location
For information about configuring confirmations, see Configuring Confirmations, page 222.
10. Specify the xDB mode and retention policy for the holding.
xDB modes determine how structured data is stored in xDB. If you choose xDB mode 3 (pooled libraries), specify the following settings for the library pool:
• Close period (in days): The number of days after the last ingestion date when the library pool can be closed.
• Partition period: Specify a partitioning period: day, week, month, quarter, or year.
• Pooled libraries quota: The maximum number of xDB libraries that can be assigned to a library pool.
For information about xDB modes, see xDB Modes, page 50.
By default, the AIP mode is set to 1. To change the AIP mode, after the wizard generates the holding configuration package, edit the configuration template file 022-aip-parent-policy.xml inside the package. By default, synchronous ingestion is not enabled.
If you choose to use a retention policy, select one of the following:
• Retention period (days): Specify the default retention period used to calculate the retention date of the information packages in the holding: retention date = base retention date + retention period.
• Retention Policy Services (RPS) retention policy: If InfoArchive is integrated with Retention Policy Services (RPS) for retention management, specify the RPS retention policy to use. For information about retention management using RPS, see Using Extended Retention Management Capabilities of Retention Policy Services (RPS), page 262.
11. Select search criteria to include in the search form.
From the search criteria you defined in the previous steps, select the ones you want to use on the InfoArchive GUI search form and use the Up/Down arrows to arrange them in the order in which you want them to appear on the search form.
12. Configure the search form:
• Label: The label of the search criterion to display in InfoArchive GUI.
• Operator: Search operator that defines how to compare the specified value against archived data.
When you specify an operator for a criterion, you can only choose from the list a valid operator supported by the node data type. The supported operators for each data type are listed as follows (an asterisk indicates the operator is supported):

Operator           Date   DateTime   String   Integer   Double
Equal               *        *         *         *        *
Not equal           *        *         *         *        *
Greater             *        *                   *        *
Greater or equal    *        *                   *        *
Less                *        *                   *        *
Less or equal       *        *                   *        *
Between             *        *                   *        *
Begins with                            *
Contains                               *

Note: The information provided in this step will be used to generate the eas_cfg_search_form.01.xml configuration template file in the holding configuration package to be generated by the wizard. This file will be imported into the search form configuration (eas_cfg_search_form) object during holding installation.
In the XForms file eas_cfg_search_form.01.xml, the Between operator automatically translates into the combined Greater or equal and Less or equal operator pair, which requires two input values. The Begins with operator in the wizard translates into the StartsWith operator. For information about search form configuration, see Configuring a Search Form, page 187.
• Default value: Default value of the search criterion. You can specify a fixed value such as an integer or a string, or directly use an XForms function such as now() for the dateTime type or local-date() for the date type.
Note: You must ensure that the default value you set is valid and that the returned value is of the same data type as that of the search criterion. String type values must be surrounded by single quotes ('); no additional single quotes (') or double quotes ('') are allowed. For example, 'MyValue' is a valid entry, but 'My'Value', 'My''Value', or ''MyValue'' will either cause technical errors or result in incorrect default values in InfoArchive GUI.
• Required: Whether the criterion is a required search condition. If you set a search criterion of type date or dateTime as required but do not set its default value, the current date will be automatically set as the default value in the generated search form.
13. Select columns to display on the result page.
Use the Up/Down arrows to arrange them in the order in which you want them to appear on the result page.
14. Define the column labels to display on the result page.
The labels default to the ones defined for the search form. You can modify the labels, but they must be unique.
Information provided in steps 13 and 14 will be used to generate the template stylesheet eas_cfg_stylesheet.01.xml.
15. Enter the Documentum repository information required for installing the configured holding.
The wizard does not check the validity of this information.
If you do not provide the repository superuser password (for security reasons), the wizard supplies a predefined dummy value so that the Content Server trusted login mechanism is used to install the configured holding. You can configure these settings later in the build.properties file contained in the holding configuration package generated by the wizard.
16. Review the holding configuration settings.
Click Back to modify configuration settings as needed; click Generate Configuration to generate and download a compressed .zip file that contains all the configuration files required for installing the configured holding.
17. Copy the generated holding configuration package (.zip) to the InfoArchive server.
18. Unzip the holding configuration package.
19. Optionally, if you want to modify the holding configuration settings, including default settings such as the AIP mode that were not exposed for configuration in the wizard, edit the following files in the decompressed package folder:
• build.properties
• Configuration object content template files in /template/content
Note: Changes you make directly to the files in the holding configuration package will not take effect if you reload the modified package in the wizard.
20. Run the ant installer to install the configured holding into the Content Server repository.
After you install the basic holding configured using the wizard, you can modify the holding configuration in one of the following ways:
• Modify the configuration objects pertaining to the holding in DA. This allows you to configure more advanced settings that are not available in the wizard.
• Modify the settings in build.properties and the configuration object content template files in /template/content in the holding configuration package, and reinstall the holding into the repository to overwrite the existing one.
• Load the existing holding configuration in the wizard, modify the configuration, regenerate the holding configuration package, and reinstall the holding into the repository to overwrite the existing one.

Configuring a Holding

Most InfoArchive system configuration is performed at the holding level. Holding configuration encompasses many aspects of data archiving, such as storage areas, retention policy, ingestion sequence, AIP mode, and xDB mode. The settings defined at the archive holding level are used throughout the whole lifecycle of the data archived in the holding. A holding is configured through the holding configuration (eas_cfg_holding) object, which directly or indirectly references many of the other InfoArchive configuration objects.

Configuring a Holding Configuration (eas_cfg_holding) Object

In DA, create an object of type eas_cfg_holding in the holding folder (for example, /System EAS/Archive Holdings/MyHolding) and configure its properties. The holding configuration object name and its location in the repository folder are arbitrary; however, the general convention is to place the object in the /System EAS/Archive Holdings folder. The properties are listed below in the form property name (DA label): description.

eas_criteria_name (Criteria name): Name of the partitioning criterion to use during ingestion as well as during searches when queries are executed against the collection.
eas_criteria_type (Criteria type): Name of the XQuery atomic type (for example, string, date, date-time, float, integer, or boolean) of the partitioning criterion specified in eas_criteria_name.
eas_criteria_pkey_min_attr (Criteria min. value): Name of the AIP property containing the minimum value of the partitioning criterion in eas_criteria_name at the corresponding index.
eas_criteria_pkey_max_attr (Criteria max. value): Name of the AIP property containing the maximum value of the partitioning criterion in eas_criteria_name at the corresponding index.
All these properties relate to the AIP partitioning criteria, which must already be defined as part of the ingestor parameters in the ingestion configuration.
Note: Partitioning criteria must be defined before data is ingested.
eas_order_no (Order No): Number that controls the sort order in which items are returned by the InfoArchive web service.
eas_consumer_application (User application): Name of the consumer application for which the labels (eas_language_code, eas_title, and eas_description) are valid.
eas_language_code (Language Code): Language code in the format language_country (ISO 639, ISO 3166); for example, fr_FR for French and zh_CN for simplified Chinese.
eas_title (Title): Title of the AIC in the language specified in eas_language_code.
eas_description (Description): Description of the AIC in the language specified in eas_language_code.
eas_sip_format (Received file format): Name of the Documentum format associated with a SIP to receive or ingest (for example, eas_sip_zip).
eas_cfg_ingest (Cfg ingest process): Name of the ingestion configuration to apply for the SIP format at the same index.
eas_pdi_schema (Schema): Name of an XML schema in which the holding can ingest structured data.
eas_pdi_schema_version (Cfg metadata): Version of the XML schema in which this holding can ingest structured data, associated with the schema at the same index.
eas_cfg_pdi (Cfg metadata): Technical name of the ingestion parameters applied for ingesting the AIP.
eas_cfg_pdi_version (Cfg metadata version): Version of the ingestion parameters applied for ingesting the AIP.
eas_sip_store (Reception filestore): Name of the repository storage area in which the receiver must store the received file.
eas_delete_on_error (Delete on ingestion error): Whether to enable the deletion of received data from the working directory and xDB if a processing error occurs. This setting is useful for protecting sensitive data. When data must be encrypted for the holding, the eas_delete_on_error property of the eas_cfg_holding_crypto configuration object overrides this property defined at the holding level.
eas_create_deletion_lock (Automatic creation of purge lock): Enables the automatic creation of a deletion lock on the AIP when it is ingested (refer to the eas_litigation_hold type).
eas_keep_sip_ingest_enabled (Keep received file on reject or invalidation): Flag indicating whether the received file should be retained after the ingestion of the AIP is committed.
eas_cfg_ingest_node (Ingest nodes): Name of the ingestion node that processed this AIP.
eas_priority (Priority): Ingestion priority of this holding.
eas_dss_priority (Sub-priority): Ingestion sub-priority code mentioned in the SIP descriptor (eas_sip.xml). The ingest deadline date is computed based on the duration defined in the holding for the current eas_dss_priority (same as the value in eas_sip.xml).
eas_dss_deadline (Sub-priority deadline (mn)): Maximum target time in minutes for the ingestion of an AIP having the priority value in eas_dss_priority at the same index.
eas_logs_store_enabled (Archive logs): Flag indicating whether reception and ingestion logs must be stored in the repository.
eas_logs_store (Log store): Name of the repository storage area in which to store reception and ingestion logs.
eas_root_folder_path (Root folder): Path to the root folder for the classification of AIPs in the repository.
AIPs are classified chronologically according to their base retention date (eas_dss_base_retention_date present in the descriptor): <Root>/YYYY/YYYY-MM/YYYY-MM-DD
eas_folder_type (Sub-folder type): Type of folders to create under the root classification folder defined by eas_root_folder_path.
eas_folder_acl_name (Sub-folder ACL name): Name of the permission set to apply to folders created under the root classification folder defined by eas_root_folder_path.
eas_folder_acl_domain (Sub-folder ACL domain): Domain of the permission set to apply to folders created under the root classification folder defined by eas_root_folder_path.
eas_aip_type (AIP type): AIP type used by this holding. It must be a subtype of the eas_aip base type.
eas_sip_xml_store_enabled (Archive SIP descriptor): Flag indicating whether the XML descriptor (eas_sip.xml) of the AIP should be stored as content of the AIP for redundancy and reversibility.
eas_pdi_xml_hash_enforced (Mandatory metadata hash): Flag indicating whether an error should be generated if the SIP descriptor (eas_sip.xml) does not contain the hash value of the PDI file (eas_pdi.xml).
eas_pdi_xml_hash_validation (Metadata hash validation): Flag indicating whether the hash value associated with the XML structured data elements of the AIP should be validated if the hash is present in the SIP descriptor.
eas_pdi_xml_store_enabled (Archive XML metadata): Flag indicating whether the structured data of the AIP should be stored as compressed XML.
eas_xml_store (XML store): Name of the repository storage area to which to assign the contents to ensure reversibility (formats eas_sip_xml_zip, eas_ri_xml_zip, eas_pdi_xml_zip, or eas_pdi_xml_zip_crypto).
eas_keep_xml_rejinv_enabled (Keep XML file on reject or invalidation): Flag indicating whether structured data files should be retained to ensure reversibility when an AIP is rejected or invalidated.
eas_keep_ci_rejinv_enabled (Keep contents on reject or invalidation): Flag indicating whether unstructured content files associated with the items contained in an AIP should be retained in case of rejection or invalidation of the AIP.
eas_retention_period (Default Retention period (d)): Retention time of AIPs expressed in days, applied by default if no retention class is mentioned in the SIP descriptor (eas_sip.xml). The retention date of the AIP is calculated as: eas_aip.eas_dss_base_retention_date + eas_cfg_holding.eas_retention_period.
eas_retention_class (Retention class): Logical name of a retention class.
eas_retention_class_period (Retention class period (d)): Retention period in days associated with the retention class name at the same index.
eas_auxiliary_alias_set (Auxiliary Alias set): Alias set name assigned as the auxiliary set during reception or ingestion of the AIP. This alias set is defined as the default alias set for the session that runs the state transitions of the lifecycle. This alias set generally contains aliases referencing the permission sets to be applied, assuming those aliases are defined as actions in the lifecycle attached to the AIP.
eas_alias_set (Alias Set): Name of the alias set to apply to the lifecycle after the creation of the object (optional).
eas_sync_commit_enabled (Synchronous commit enabled): Activates commit at the end of the ingestion when the AIP is a standalone DSS (the DSS is contained in this single AIP).
eas_sync_ingest_enabled (Synchronous ingestion enabled): Must be set to authorize synchronous ingestion (performed by the ingest web service) for the archive holding.
eas_sing_cfg_aip_parent_pol (Cfg AIP parenting policy for sync. ingest): Name of the AIP parenting policy (eas_cfg_aip_parent_policy) to apply for synchronous ingestion. If not set, a synchronous ingestion is processed like a batch ingestion (creation of a materialized AIP object with the eas_xdb_mode configured at the holding level).
eas_xdb_mode (Metadata ingest mode): xDB ingestion mode of the metadata applied for the AIP (refer to the eas_cfg_holding type).

Configuring an AIC View (eas_cfg_aic_view) Object

Optionally, you can configure the AIC view (eas_cfg_aic_view) object, which is a collection of selected AIUs. The holding configuration (eas_cfg_holding) object can work without it. You have two methods of selecting AIUs in the AIC view scope:
• DQL
• XQuery

Configuring a DQL Predicate for the AIC View

You specify a DQL predicate with the eas_aip_predicate property of the AIC view object. You can configure a DQL query to select a collection of AIPs from one or more holdings. For example, the following DQL predicate selects AIPs in the EAS_AUDIT_001 holding:
eas_aip WHERE eas_dds_holding IN ('EAS_AUDIT_001')
When InfoArchive performs a search in the background, the predicate is integrated into a DQL SELECT statement to identify the AIP objects.

Configuring an XQuery for the AIC View

You can further refine the AIUs visible in the AIC view scope by attaching an XML file, which contains XQuery criteria, to the AIC view object. Attaching an XML file to the AIC view object is similar to attaching XML files to eas_cfg_query or eas_cfg_ingest objects, and the syntax of the XML file is similar to the query configuration XML. The following example shows an XML snippet used to build an XQuery that selects dm_logon_failure event AIUs from the audit holding:
<?xml version="1.0" encoding="UTF-8"?>
<aicQueryCriteria type="AND">
  <operand>
    <name>event-name</name>
    <operator>BeginWithFullText</operator>
    <value>dm_logon_failure</value>
  </operand>
</aicQueryCriteria>

Configuring Holding Security

You must configure security settings for a holding with the following group/role hierarchy.
Note: myholding is the name of the holding for which you are configuring security settings. The names of the domain, roles, and groups are for illustrative purposes only.
You can use your own naming conventions.
The domain is used by the UI to display different sets of search forms depending on the specified domain. The dynamic roles are used for access control by the InfoArchive access web services, and the eas_usr_webservice user must be a member of each dynamic role. You can grant users different access rights by assigning them to the corresponding groups.
It is a good practice to create a dynamic role to use with a holding. When a dynamic role is specified, InfoArchive GUI handles the assignment of users at runtime, which makes it easier to manage the holding:
• The permissions configured for AIPs and configuration objects just need to include the role dedicated to the holding.
• To grant or revoke access to the holding, you just need to manage the users, groups, or roles associated with the dynamic role.
It is much harder to reliably manage access to the holding directly using permissions (ACLs), since users and roles are often subject to change.
In DA, follow these steps to configure repository security settings for your holding:
1. Under Administration/User Management/Groups, create two groups: g_myholding_read and g_myholding_admin.
2. Under Administration/User Management/Roles:
• Create a dynamic role r_myholding_read and add the group g_myholding_read and the user eas_usr_webservice to it.
• Create a dynamic role r_myholding_admin and add the group g_myholding_admin and the user eas_usr_webservice to it.
Note: In the New Role window, select Dynamic Role to create the dynamic roles.
3. Under Administration/User Management/Roles, create a domain myholding and add the dynamic roles r_myholding_read and r_myholding_admin to it. Choose File > New Role and select Create role as domain in the New Role window.
Note: After you create the domain, you will find it not under Administration/User Management/Roles but under Administration/User Management/Groups.
4.
Grant appropriate access rights to users by adding them to the g_myholding_admin and g_myholding_read groups respectively. For example, add the InfoArchive administrator user to the g_myholding_admin group. Under Administration/User Management/Groups, double-click a group and then select File > Add Member(s) to add users to it.
5. Create the default permission set (ACL) for the holding.
a. Under Administration/Security, choose File > New > Permission Set.
b. In the New Permission Set window, under the Info tab, specify a unique name for the permission set; for example, you can use the holding name myholding as the permission set name.
c. Under the Permissions tab:
• Add r_myholding_read and r_myholding_admin to the permission set with Read permission
• Add r_myholding_admin with Relate permission
• Remove permissions for dm_world
Note: In most situations, grant the Relate access right rather than the Write or higher access right on AIPs to the InfoArchive administrator, for compliance reasons. The Relate permission grants access to the InfoArchive administrative functions on AIPs, including purge lock/unlock, retention date management, reject, and invalidation. Audit management and job execution require the standard Content Server privileges and are not related to the ACL applied to AIPs.
6. Create an alias set for the holding.
The archive holding configuration references the name of an alias set to be used for determining the permissions to apply to the AIP during its lifecycle. It is a good practice to use the holding name as the alias set name. For more information about alias sets, refer to the EMC Documentum Content Server Fundamentals Guide.
a. Under Administration/Alias Sets, choose File > New > Alias Set to create a new alias set.
b.
Add the following aliases to the alias set, each alias corresponding to an AIP processing phase. The aliases are listed below in the form alias (value): description.
EAS_ACL_RECEPTION (eas_aip_non_visible): Permission set to apply when an AIP is received.
EAS_ACL_INGESTION_WAIT (eas_aip_non_visible): Permission set to apply when an AIP is waiting for ingestion.
EAS_ACL_INGESTION (eas_aip_non_visible): Permission set to apply when an AIP is ingested.
EAS_ACL_COMMIT_WAIT (eas_aip_non_visible): Permission set to apply when an AIP has been ingested but has not yet been committed.
EAS_ACL_TERMINATED (myholding): Permission set to apply when an AIP has been ingested and the ingestion has been committed. The EAS_ACL_TERMINATED alias should be assigned an ACL that grants Read access to users performing searches against the archive, and Relate access to users who perform administrative tasks.
EAS_ACL_PURGE (eas_aip_non_visible): Permission set to apply when an AIP is purged.
EAS_ACL_REJECT (eas_aip_non_visible): Permission set to apply when an AIP is rejected.
EAS_ACL_INVALID (eas_aip_non_visible): Permission set to apply when an AIP is invalidated.
EAS_ACL_PRUNE (eas_aip_non_visible): Permission set to apply to an AIP after it has been aggregated (in synchronous ingestion).
EAS_AIP_OWNER (repository_owner): User name of the repository owner account.
The most common configuration is to assign to all the aliases except EAS_ACL_TERMINATED a generic ACL that grants Relate access to InfoArchive administrators and None to World. If there is a need to make the AIP visible to different administrator groups depending on the AIP processing phase, assign the appropriate ACLs to the alias for each processing phase.
You can also create an alias set quickly using a DQL statement; for example:
CREATE dm_alias_set OBJECT
SET object_name = 'AssignedAliasSetName',
SET object_description = 'AssignedAliasSetDescription',
SET owner_name = (SELECT owner_name FROM dm_docbase_config),
APPEND alias_name = 'EAS_ACL_RECEPTION',
APPEND alias_value = 'AssignedAclName',
APPEND alias_description = 'ACL for the reception phase',
APPEND alias_category = 6,
APPEND alias_usr_category = 1,
APPEND alias_name = 'EAS_ACL_INGESTION_WAIT',
APPEND alias_value = 'AssignedAclName',
APPEND alias_description = 'ACL for the pending ingestion phase',
APPEND alias_category = 6,
APPEND alias_usr_category = 1,
APPEND alias_name = 'EAS_ACL_INGESTION',
APPEND alias_value = 'AssignedAclName',
APPEND alias_description = 'ACL for the ingestion phase',
APPEND alias_category = 6,
APPEND alias_usr_category = 1,
APPEND alias_name = 'EAS_ACL_COMMIT_WAIT',
APPEND alias_value = 'AssignedAclName',
APPEND alias_description = 'ACL for the pending commit phase',
APPEND alias_category = 6,
APPEND alias_usr_category = 1,
APPEND alias_name = 'EAS_ACL_TERMINATED',
APPEND alias_value = 'AssignedAclName',
APPEND alias_description = 'ACL for the completed phase',
APPEND alias_category = 6,
APPEND alias_usr_category = 1,
APPEND alias_name = 'EAS_ACL_PURGE',
APPEND alias_value = 'AssignedAclName',
APPEND alias_description = 'ACL for the purge phase',
APPEND alias_category = 6,
APPEND alias_usr_category = 1,
APPEND alias_name = 'EAS_ACL_REJECT',
APPEND alias_value = 'AssignedAclName',
APPEND alias_description = 'ACL for the reject phase',
APPEND alias_category = 6,
APPEND alias_usr_category = 1,
APPEND alias_name = 'EAS_ACL_INVALID',
APPEND alias_value = 'AssignedAclName',
APPEND alias_description = 'ACL for the invalid phase',
APPEND alias_category = 6,
APPEND alias_usr_category = 1

Defining the Structured Data (PDI) Schema

Unlike the SIP descriptor file (eas_sip.xml), there is no predefined schema for
structured data (PDI). You must create a schema with a target namespace that defines the elements, attributes, and simple and complex types in the eas_pdi.xml file according to your business requirements. The eas_pdi.xml file in the information package must conform to your defined schema. The structured data (PDI) schema is configured through the schema configuration (eas_cfg_schema) object, with the structured data (PDI) schema (.xsd) file as its content. The holding configuration (eas_cfg_holding) object references the schema through the schema name.

Here are the steps for defining a PDI schema:
1. Create a PDI schema (.xsd) file with a designated namespace and a target namespace.
2. Configure a schema configuration (eas_cfg_schema) object with the PDI schema (.xsd) file as its content.

The defined schema must also be specified in the SIP descriptor.

Creating a Structured Data (PDI) Schema

Create a schema (XSD) using an XML editor. You then import this document into the repository as the content of the schema configuration (eas_cfg_schema) object.

Note: The document you create must have the filename extension .xsd. A schema embedded in another document type (such as a Word file) is not valid. The schema used by InfoArchive must also meet the following requirements:
• The schema version must be 1.0. XSD 1.1 is currently not supported.
• The schema must be a standalone document; using xs:include or xs:import to reference another schema is not supported.
• There must be one and only one target namespace in the schema. Multiple namespaces are not supported.
• A PDI schema containing only one AIU element at the root level is not supported.
• The schema must specifically describe its data. Ambiguous data descriptions (for example, using the any element) make it difficult to perform configuration tasks such as defining the partitioning key, specifying the unstructured content file, and creating xDB indexes.
• Element names cannot contain the dot (.) character.
• If you install multiple holdings in one repository, make sure the PDI schema used by each distinct holding is identified by a unique namespace; otherwise, installing a new holding will break a previously installed one that uses a PDI schema with the same namespace, due to name conflicts.

A schema formally describes the elements in an Extensible Markup Language (XML) document. It defines the elements and attributes that can appear in an XML document and their data types. It can be used to verify each piece of content in a document and expresses a set of rules to which an XML document must conform in order to be considered valid according to that schema. In the following example, the schema dictates:
1. The child elements must appear in the sequence specified in <xs:sequence>.
2. The data type for each element is defined in the type attribute.
3. CallFromPhoneNumber must be a positive integer with at most 11 digits.

<xs:sequence>
  <xs:element name="SentToArchiveDate" type="xs:date" nillable="false" />
  <xs:element name="CallStartDate" type="xs:dateTime" nillable="false" />
  <xs:element name="CallEndDate" type="xs:dateTime"/>
  <xs:element name="CallFromPhoneNumber">
    <xs:simpleType>
      <xs:restriction base="xs:positiveInteger">
        <xs:minInclusive value="1" />
        <xs:totalDigits value="11" />
      </xs:restriction>
    </xs:simpleType>

PDI (eas_pdi.xml) Schema Definition Best Practices

Here are some best practices for creating your PDI file schema:
• Whenever applicable, use standardized schemas such as ISO 20022, RosettaNet, METS, DITA, XBRL, and SWIFT.
• Leverage standard schema features to control XML content. For example, use standard XML data types, especially for date and time information.
• Configure minimum and maximum lengths for attribute and element values.
• Include the version number of the schema in its URN.
• Adopt a consistent naming rule for the schema URN.
This makes it easier to remember the URNs, which are referenced in multiple places during the configuration.
• Include the version number in the schema URN, defined as the value of targetNamespace; for example:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
  targetNamespace="urn:eas-samples:en:xsd:PhoneCalls.1.0"
  xmlns:ns1="urn:eas-samples:en:xsd:PhoneCalls.1.0">
  ...
</xs:schema>

• In eas_pdi.xml, include date and dateTime values that explicitly carry time zone information; for example, specify an offset from UTC:

<CallStartDate>2014-01-08T22:08:08.158+01:00</CallStartDate>

• Configure value uniqueness if it is acceptable from a performance point of view; use namespaces to separate and identify duplicate XML elements, and avoid non-ASCII characters in node names. If a uniqueness constraint is defined in the schema, uniqueness checking is performed during validation of the XML, which occurs at the beginning of the ingestion. If the XML is large, uniqueness checking can consume considerable resources and considerably slow down the ingestion. In such situations, it is frequently better to remove the uniqueness definitions from the schema and instead configure the creation of unique xDB indexes during the ingestion. The XML validation then completes more quickly, and if several AIUs with the same value exist in the package, the ingestion fails during the creation of the xDB indexes, thus still enforcing the uniqueness constraint.
• When customizing your schema for XML data types, use a professional XML editor such as oXygen XML Editor or Altova XMLSpy.
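As a minimal illustration of the time zone recommendation above, the following Python sketch produces a dateTime value with an explicit UTC offset, suitable for a field such as CallStartDate in eas_pdi.xml. The variable names are illustrative only; this is not InfoArchive code.

```python
from datetime import datetime, timedelta, timezone

# A fixed +01:00 offset, matching the <CallStartDate> example above.
cet = timezone(timedelta(hours=1))
call_start = datetime(2014, 1, 8, 22, 8, 8, 158000, tzinfo=cet)

# isoformat() emits the offset explicitly, so the archived value is
# unambiguous regardless of the server's local time zone.
print(call_start.isoformat())  # 2014-01-08T22:08:08.158000+01:00
```

Writing values this way keeps uniqueness checks and date-based queries stable even when producers and the archive run in different time zones.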
Sample PDI (eas_pdi.xml) Schema

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<xs:Schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
  targetNamespace="urn:eas-samples:en:xsd:phonecalls.1.0"
  version="1.0" elementFormDefault="qualified"
  xmlns:Q1="urn:eas-samples:en:xsd:phonecalls.1.0">
  <xs:element name="Calls">
    <xs:complexType>
      <xs:sequence maxOccurs="unbounded">
        <xs:element name="Call">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="SentToArchiveDate" type="xs:date" nillable="false" />
              <xs:element name="CallStartDate" type="xs:dateTime" nillable="false" />
              <xs:element name="CallEndDate" type="xs:dateTime"/>
              <xs:element name="CallFromPhoneNumber">
                <xs:simpleType>
                  <xs:restriction base="xs:positiveInteger">
                    <xs:minInclusive value="1" />
                    <xs:totalDigits value="11" />
                  </xs:restriction>
                </xs:simpleType>
              </xs:element>
              <xs:element name="CallToPhoneNumber" nillable="false">
                <xs:simpleType>
                  <xs:restriction base="xs:positiveInteger">
                    <xs:minInclusive value="1" />
                    <xs:totalDigits value="11" />
                  </xs:restriction>
                </xs:simpleType>
              </xs:element>
              <xs:element name="CustomerID" nillable="false">
                <xs:simpleType>
                  <xs:restriction base="xs:positiveInteger">
                    <xs:totalDigits value="11" />
                    <xs:minInclusive value="1" />
                  </xs:restriction>
                </xs:simpleType>
              </xs:element>
              <xs:element name="CustomerLastName" nillable="false">
                <xs:simpleType>
                  <xs:restriction base="xs:normalizedString">
                    <xs:minLength value="1" />
                    <xs:maxLength value="32" />
                  </xs:restriction>
                </xs:simpleType>
              </xs:element>
              <xs:element name="CustomerFirstName" nillable="false">
                <xs:simpleType>
                  <xs:restriction base="xs:normalizedString">
                    <xs:minLength value="1" />
                    <xs:maxLength value="32" />
                  </xs:restriction>
                </xs:simpleType>
              </xs:element>
              <xs:element name="RepresentativeID" nillable="false">
                <xs:simpleType>
                  <xs:restriction base="xs:positiveInteger">
                    <xs:minInclusive value="1" />
                    <xs:totalDigits value="7" />
                  </xs:restriction>
                </xs:simpleType>
              </xs:element>
              <xs:element name="Attachments" nillable="false" minOccurs="1">
                <xs:complexType>
                  <xs:sequence maxOccurs="unbounded" minOccurs="0">
                    <xs:element name="Attachment">
                      <xs:complexType>
                        <xs:sequence>
                          <xs:element name="AttachmentName" nillable="false" maxOccurs="1">
                            <xs:simpleType>
                              <xs:restriction base="xs:normalizedString">
                                <xs:minLength value="1" />
                                <xs:maxLength value="32" />
                              </xs:restriction>
                            </xs:simpleType>
                          </xs:element>
                          <xs:element name="FileName" nillable="false" minOccurs="1" maxOccurs="1">
                            <xs:simpleType>
                              <xs:restriction base="xs:normalizedString">
                                <xs:minLength value="1" />
                                <xs:maxLength value="32" />
                              </xs:restriction>
                            </xs:simpleType>
                          </xs:element>
                          <xs:element name="CreatedBy" nillable="false" maxOccurs="1">
                            <xs:simpleType>
                              <xs:restriction base="xs:normalizedString">
                                <xs:minLength value="1" />
                                <xs:maxLength value="32" />
                              </xs:restriction>
                            </xs:simpleType>
                          </xs:element>
                          <xs:element name="CreatedOnDate" type="xs:dateTime" nillable="false" />
                        </xs:sequence>
                      </xs:complexType>
                    </xs:element>
                  </xs:sequence>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:Schema>

Configuring a Schema Configuration Object

In DA, create an object of type eas_cfg_schema in the holding folder (e.g., /System EAS/Archive Holdings/MyHolding) and configure its properties. Import the PDI XML schema (.xsd) file into the repository as the content of the object.

Note: If you check in multiple versions of the PDI schema (.xsd) as the content of the schema configuration (eas_cfg_schema) object, ingestion will fail. Make sure that the schema configuration (eas_cfg_schema) object contains only one version of the PDI schema, and track the schema version as part of the object name instead.

• eas_name (DA label: Name)
The technical name of the schema, referenced by InfoArchive configuration objects. EMC recommends that you append the version number directly to the schema name.
Note: The URN is case-sensitive.
• eas_version (DA label: Version)
The version of the schema. The recommended approach to schema versioning is to include the version number in the name of the schema itself and not to use this property.

On the Description tab, include information about the applications that will query InfoArchive.

• eas_order_no (DA label: Sort order)
A number that controls the order in which items are returned by the service that returns the available schemas.
• eas_consumer_application (DA label: User application)
The name of the consumer application for which the eas_language_code value is specified. Note: eas_gui is the client application name assigned to the InfoArchive GUI.
• eas_language_code (DA label: Language code)
The language code in the format language_country (ISO 639, ISO 3166); for example, fr_FR for French and zh_CN for Simplified Chinese.
• eas_title (DA label: Title)
The title of the schema in the language specified in eas_language_code at the same index.
• eas_description (DA label: Description)
The description of the schema in the language specified in eas_language_code at the same index.

Configuring xDB Modes

You configure settings for each xDB mode, and specify which xDB mode to use for a holding, through the holding configuration object (eas_cfg_holding).

For xDB modes 1 and 2, you must configure an xDB library configuration object (eas_cfg_xdb_library). This object defines how to connect to the xDB database as well as the root library to use: it contains the information needed to connect to the target detachable xDB database and the path of the library in the database.

For xDB mode 3, you must configure an xDB library pool configuration object (eas_cfg_xdb_library_pool). Optionally, you can configure a custom pooled library assignment policy by creating a text file containing the XQuery expression and importing it into the xDB library pool configuration object as its content.
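To summarize the relationship described above, the following sketch maps each xDB mode to the configuration object type it relies on. The dictionary itself is purely illustrative (not an InfoArchive artifact); only the object type names are taken from this guide.

```python
# Which configuration object type each xDB mode relies on, per the guide:
# modes 1 and 2 use an xDB library configuration object; mode 3 uses an
# xDB library pool configuration object (plus an optional XQuery policy
# imported as the pool object's content).
XDB_MODE_CONFIG_OBJECT = {
    1: "eas_cfg_xdb_library",
    2: "eas_cfg_xdb_library",
    3: "eas_cfg_xdb_library_pool",
}

print(XDB_MODE_CONFIG_OBJECT[3])  # eas_cfg_xdb_library_pool
```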
Both the parent xDB library configuration object and the xDB library pool configuration object are referenced by the holding configuration object.

You can change the xDB mode for a holding, and the change takes effect immediately. After you change the xDB mode, the new mode is applied to newly ingested AIPs; previously archived AIPs remain stored in xDB according to the mode configured when they were ingested. Searches work transparently across data ingested in different xDB modes.

A sample holding, EAS-AUDIT-001 (/System EAS/Archive Holdings/EAS/EAS-AUDIT-001), which uses xDB mode 3, is provided with the InfoArchive installation package. You can install this sample holding and use its holding configuration (eas_cfg_holding) and library pool (eas_cfg_xdb_library_pool) objects as a reference for configuring xDB mode 3 for other holdings.

Configuring an xDB Parent Library (xDB Mode 1 and 2)

A default xDB library configuration object (/System EAS/xDB Libraries/aip_01_xdb_lib) was created during InfoArchive installation and can be used out of the box. You can modify this object in DA as needed for your InfoArchive deployment. You can also create a new object of type eas_cfg_xdb_library and edit its properties. The name of the xDB library configuration object is referenced by the holding configuration object.

Configuring an xDB Library Pool (xDB Mode 3)

An xDB library pool is configured through an xDB library pool configuration object (eas_cfg_xdb_library_pool). In DA, create an object of type eas_cfg_xdb_library_pool and configure its properties.

• eas_name (DA label: Name)
The name of the xDB library pool configuration. The name is referenced by the holding configuration object that uses the xDB library pool.
• eas_create_library_disabled (DA label: Disable auto. library creation)
Whether to disable automatic creation of new libraries in the pool.
This setting lets the administrator manually manage the libraries in the pool or temporarily disable the creation of new libraries in the pool. The setting applies only to xDB libraries created by InfoArchive. InfoArchive does not automatically manage xDB libraries it did not create; such libraries cannot be closed or cached in or out.
• eas_aip_quota (DA label: AIP quota per library)
The maximum number of AIPs allowed in a library. The value 0 (zero) indicates an unlimited number.
• eas_aiu_quota (DA label: AIU quota per library)
The maximum number of Archival Information Units (AIUs) allowed in a library. The value 0 (zero) indicates an unlimited number. This property is for information only and does not actually constrain the number of AIUs in a SIP during ingestion.
• eas_xdb_seg_size_quota (DA label: Library segment size quota)
The maximum size allowed for the xDB segment of a library. 0 (zero) indicates an unlimited size.
• eas_unitary_purge_enabled (DA label: Unitary AIP purge)
Reserved for future use.
• eas_last_library_seqno (DA label: Last assigned library seqno)
The last sequence number assigned to an xDB library created in this pool. Internally, this sequence number is not really used; it serves only to provide a meaningful way of naming the pooled library objects. When browsing in DA, you can see the chronology of the pooled libraries immediately by sorting on their object_name.
• eas_folder_path (DA label: Library folder)
The repository folder in which to put the xDB library repository objects created in this pool. The repository folder path can be any repository location; it does not have to be the same folder path as that of the holding configuration (eas_cfg_holding) object. Since a pooled library can hold archived data pertaining to different holdings, it is common to classify pooled libraries into a dedicated sub-folder in the root folder.
• eas_acl_name (DA label: Library ACL name)
The name of the permission set to apply to an xDB library repository object created in Content Server for this pool.
• eas_acl_domain (DA label: Library ACL domain)
The domain of the permission set to apply to an xDB library repository object created in Content Server for this pool.
• eas_close_mode (DA label: Close mode)
The mode to apply for automatically closing a library of the pool created by EMC InfoArchive:
- 0: The library is never closed automatically unless a close request has been manually set on the library.
- 1: The library can be automatically closed when the current date is greater than or equal to the date returned by the XQuery + the delay defined in the eas_close_period property.
- 2: The library can be automatically closed when the current date is greater than or equal to the library opening date + the delay defined in the eas_close_period property.
- 3: The library can be automatically closed when the current date is greater than or equal to the last ingestion date in the library + the delay defined in the eas_close_period property.
The scheduled eas_close_pooled_libraries job is in charge of closing open pooled libraries according to the close mode setting.
• eas_param_close_period (DA label: Close period (d))
The period in days used according to the chosen close mode.
• eas_xdb_clib_is_detachable (DA label: Detachable library)
Whether to set the DETACHABLE_LIBRARY flag when creating a new library. This setting must be set to true for xDB mode 3 to work.
• eas_xdb_cache_quota (DA label: # of cached library quota (unlocked))
The maximum number of non-locked libraries of the pool that are allowed in the xDB cache. When this limit is exceeded, EMC InfoArchive attempts to cache out the least used libraries to comply with this quota.
Since in xDB mode 3 AIPs belonging to different holdings can be stored in the same pooled library, the maximum number of unlocked libraries associated with a library pool is defined at the library pool level rather than at the holding level. And since the same library pool can be configured as the destination of multiple holdings, the library pool configuration has settings similar to those available on the holding configuration (eas_cfg_holding) object for xDB mode 2.

Note: Users must have at least Read permission on the library pool object for searches to be successful.

Configuring a Custom Pooled Library Assignment Policy (xDB Mode 3)

By default, a new AIP is assigned to the latest open library in the pool. You can configure a custom assignment policy that assigns new AIPs to libraries according to your specific requirements. You define the pooled library assignment policy as an XQuery expression in a text file that you import into the library pool (eas_cfg_xdb_library_pool) object as its content. The content determines the pooled library to which AIPs are assigned. You can use any information present in the SIP descriptor, including custom metadata, in the XQuery expression to define the assignment logic. For example, you can use the retention date (the most common choice), the holding, or the entity as the condition for assigning AIPs. Here is an example of the assignment policy XQuery expression.
xquery version "1.0" encoding "utf-8";
declare namespace n = "urn:x-emc:eas:schema:sip:1.0";

let $d as xs:dateTime := xs:dateTime(/n:sip/n:dss/n:base_retention_date/text())
let $quarter as xs:integer := xs:integer(ceiling(month-from-dateTime($d) div 3))
let $tz as xs:dayTimeDuration := timezone-from-dateTime($d)
let $start as xs:dateTime := adjust-dateTime-to-timezone(xs:dateTime(
    concat(year-from-dateTime($d), "-01-01T00:00:00")), $tz)
let $nextQuarter as xs:dateTime := $start + xs:yearMonthDuration(
    concat("P", string($quarter * 3), "M"))
return <pool partitioning_key="{year-from-dateTime($d)}-Q{$quarter}"
    close_hint_date="{$nextQuarter}"/>

In this example, quarter-based assignment logic is used, and the base retention date is used to compute the following values:
• partitioning_key="{year-from-dateTime($d)}-Q{$quarter}"
A string value using the YYYY-Qn pattern for the year and quarter number of the base retention date
• close_hint_date="{$nextQuarter}"
A dateTime value corresponding to the first day of the next quarter

These computed values are returned as the partitioning_key and close_hint_date attributes of a <pool> element expected by InfoArchive. The close_hint_date attribute is mandatory only if close mode 1 is applied. Here is an example of the <pool> element returned by the XQuery expression:

<pool partitioning_key="2011-Q1" close_hint_date="2014-01-01T12:00:00.000+01:00"/>

If a pooled library already exists whose Partitioning key is equal to pool.partitioning_key, the AIP is assigned to it; otherwise, a new pooled library is created and its Partitioning key and Closing hint date are obtained from the <pool> element.

Configuring xDB Mode Settings for a Holding

Configure the settings for the different xDB modes under their respective tabs in the holding configuration object properties window.
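Before detailing the per-mode settings, it may help to restate the date arithmetic of the quarter-based assignment policy shown earlier. The following Python sketch mirrors that logic only; the function and variable names are invented for illustration, and it is not InfoArchive code.

```python
from datetime import datetime

def assign_pool(base_retention_date: datetime):
    """Mirror the quarter-based policy: compute the YYYY-Qn partitioning
    key and the first day of the next quarter (the close hint date)
    from an AIP's base retention date."""
    quarter = (base_retention_date.month - 1) // 3 + 1
    partitioning_key = f"{base_retention_date.year}-Q{quarter}"
    if quarter < 4:
        next_quarter_start = datetime(base_retention_date.year, quarter * 3 + 1, 1)
    else:
        # Q4 rolls over to January 1 of the following year.
        next_quarter_start = datetime(base_retention_date.year + 1, 1, 1)
    return partitioning_key, next_quarter_start

key, close_hint = assign_pool(datetime(2011, 2, 15))
print(key)                     # 2011-Q1
print(close_hint.isoformat())  # 2011-04-01T00:00:00
```

All AIPs whose base retention dates fall in the same quarter share a partitioning key and therefore land in the same pooled library, which can then be closed once the quarter's close hint date has passed (under close mode 1).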
Configuring Settings for xDB Mode 1

• eas_cfg_xdb_library_parent (DA label: Cfg parent library (mode 1 and 2))
Set to the eas_name of an eas_cfg_xdb_library object, which defines:
- The information needed to connect to the target xDB database
- The path of the library in the database where AIP data will be stored
- Aliases pointing to file system paths in which data files can be created

Configuring Settings for xDB Mode 2

The xDB library configuration object also defines one or more segment location aliases. Each segment location alias refers to a file system path, accessible to the xDB server, in which new xDB data files can be created. In xDB, documents are grouped in libraries. Each library specifies one or more segments, and each segment specifies a file path on the xDB server host. The segment location alias on the xDB library configuration object corresponds to the segment location field on the holding configuration object. When an AIP is ingested with xDB ingestion mode 2, InfoArchive reads the segment location alias referenced on the holding configuration object and uses this alias to obtain the associated file system path from the library configuration object.

Lock with parent and Concurrent library are xDB library settings; they are not required by xDB mode 2. For information about these two settings, refer to the EMC Documentum xDB Administration Guide.

In xDB mode 2, all xDB caching settings are configured on the holding configuration object (eas_cfg_holding); therefore, you can apply different caching settings for each holding.

• eas_xdb_cache_lock_period (DA label: Locked in cache period (days))
The duration in days after which the xDB sub-library may be removed from the cache. The end date of cache locking = eas_aip.eas_dss_base_retention_date + eas_cfg_holding.eas_xdb_cache_lock_period.
• eas_cfg_xdb_library_parent (DA label: Cfg parent library (mode 1 and 2))
Set to the eas_name of an eas_cfg_xdb_library object, which defines:
- The information needed to connect to the target xDB database
- The path of the library in the database where AIP data will be stored
- Aliases pointing to file system paths in which data files can be created
• eas_xdb_seg_location (DA label: Segment location)
The alias corresponding to the file system path under which the data file of a new detachable library must be created.
• eas_xdb_detachable_option (DA label: Detachable library)
Must be selected, or ingestion will fail.
• eas_xdb_store_cache_out (DA label: Cached out after library storage)
Reserved for future use.
• eas_xdb_cache_in_req_period (DA label: Cache in delay (d))
Reserved for future use.
• eas_xdb_cache_quota (DA label: # of cached AIP quota (unlocked))
The maximum number of unlocked libraries associated with this holding that are allowed on the xDB file system.

Configuring Settings for xDB Mode 3

Specify the name of the library pool configuration (eas_cfg_xdb_library_pool) object to use. Set the Locked in cache period to the number of days after the AIP's base retention date during which the AIP must be kept on the xDB file system. A pooled library is kept on the xDB file system up to the highest locking date of the AIPs stored in it. The rest of the configuration for xDB mode 3 is done on the library pool configuration object.

Converting xDB Library (Pool) Backup Renditions

If you have xDB library (pool) backup renditions in the legacy formats (eas_xdb_library and eas_xdb_library_gzip) backed up using earlier versions of InfoArchive, you need to convert them to the new formats, eas_xdb_backup and eas_xdb_backup_gzip. InfoArchive is backward-compatible with the legacy xDB library (pool) backup formats, but they may not be supported in future InfoArchive releases.
To convert xDB library (pool) backup renditions, execute the following command, located in EAS_HOME/tools/XdbBackupMigration/bin:
• eas-launch-xdb-backup-migration.bat (Windows)
• eas-launch-xdb-backup-migration.sh (Linux)

You can set the command options in one of the following ways:
• Configure the eas-xdb-backup-migration.properties file located in EAS_HOME/tools/XdbBackupMigration/conf
• Set the options when running the command, which overrides the settings in the configuration file

The xDB library backup migration command has the following options; all are optional:

• -h <holding_name>
Specifies the holding for which to convert xDB library (pool) backup renditions. If this option is not set, the migration command converts xDB library (pool) backups in all the holdings present in the repository.
• -m <mode>
Specifies the processing mode of the migration script:
- MIGRATE: Adds eas_xdb_backup and eas_xdb_backup_gzip renditions only; all old eas_xdb_library and eas_xdb_library_gzip renditions are retained
- MIGRATE_DELETE: Adds eas_xdb_backup and eas_xdb_backup_gzip renditions and deletes all old eas_xdb_library and eas_xdb_library_gzip renditions
- DELETE: If the corresponding eas_xdb_backup and eas_xdb_backup_gzip renditions are present, deletes the old eas_xdb_library and eas_xdb_library_gzip renditions
• -l [ERROR | WARN | DEBUG | INFO | TRACE]
Specifies the logging level to use: ERROR, WARN, DEBUG, INFO, or TRACE. Default: INFO.
• -f <file_name>
Specifies the log file to which to write logging information. By default, logging information is sent to the output console; with this option, it is sent to the specified log file instead.
• -x
Displays migration statistics without actually performing the migration process.
If this option is specified, the migration script does nothing but display detailed information about the objects to migrate, including the number of holdings to update, the number of pool library configuration objects to update, the number of AIPs and pooled libraries to update, and the size of structured data to update.

If the data volume to process is large, the migration process may take a while to complete. During the migration process (processing mode MIGRATE), the tool:
1. Updates the old library archive format of the holding and the xDB library pool from eas_xdb_library/eas_xdb_library_gzip to eas_xdb_backup/eas_xdb_backup_gzip by:
• Updating the eas_xdb_store_format property value of the eas_cfg_holding object
• Updating the eas_xdb_store_format property value of the eas_cfg_xdb_library_pool object
2. Creates the xDB library and xDB library pool backup renditions in the new format.

Configuring the Ingestion Process

The ingestion process consists of a sequence of sub-processes, each performed by an ingestion processor, which is a Java class file. Which sub-processes to perform during ingestion is defined by the ingestion sequence configuration (eas_cfg_ingest) object, whose XML content references the processors to invoke during the ingestion process. What can or must be configured depends on the ingestion process (structured or unstructured data). InfoArchive ships with two pre-configured ingestion sequence configuration (eas_cfg_ingest) objects, eas_ingest_sip_zip-pdi and eas_ingest_sip_zip-ci, for ingesting structured data and unstructured data respectively. You can view the XML content of these two objects to see which sub-processes are included in the ingestion process by default. The XML content of the ingestion sequence configuration (eas_cfg_ingest) object contains default settings for most blocks.
If the default settings of a block meet your requirements, it is not necessary to configure them in the ingestion configuration. If needed, you can customize the XML content to change the ingestion processors to use.

The ingestion process is mostly configured through the ingestion configuration (eas_cfg_pdi) object, whose XML content contains ingestion parameters that define the actions performed by the processors. The processor parameters are grouped in XML <data> blocks, each block defining the settings for one processor (with the corresponding processor ID). You configure the processor parameters by modifying the XML content of the ingestion configuration (eas_cfg_pdi) object, and you can view the XML content of the ingestion configuration objects to see the default processor configurations. Since each ingestion configuration is specific to a PDI schema, the ingestion configuration object also references a PDI schema configuration (eas_cfg_schema) object.

You must also configure at least one reception node and one ingestion node, which are required for the execution of the ingestion receiver and the ingestor. The ingestor is given a set of default values in .properties files during InfoArchive installation. During the ingestion process, the ingestor will:
1. Look for a certain XML block defined in the content of the ingestion configuration (eas_cfg_pdi) object.
2. Overwrite the ingestor default values with the values contained in the XML block.

To configure the ingestion process:
1. Create an XML file defining the ingestor parameters.
2. Configure an ingestion configuration (eas_cfg_pdi) object with the XML file as its content.
3. Configure a reception node configuration (eas_cfg_receive_node) object.
4. Configure an ingestion node configuration (eas_cfg_ingestion_node) object.

Defining Ingestor Parameters

Create an XML file defining the ingestor parameters. Parameters for the specific processors are often specific to a schema.
You define parameters in the <data> blocks carrying the identifier of the processor in the XML file. The structure of each block depends on the parameters expected by the processor.

pdi.index.creator: Creating xDB Indexes

xDB indexing can improve search performance if applied properly. Indexes add to the total load on the system by consuming time and space, so their use is justified only if they actually improve search performance. xDB index creation happens after the eas_pdi.xml file is imported into xDB (and its file name extension is changed from .xml to .pdi). In xDB Admin Client, you can see the indexes that have been created on the imported PDI file.

The pdi.index.creator block defines the xDB indexes to be created. Before configuring this block, determine the key search criteria that are expected to be used by end users and business applications. You can parameterize as many indexes as required, but evaluate the distribution of the values to be indexed to ensure a minimum level of selectivity. For asynchronous ingestion, indexes are defined at the xDB document level. The XML block for creating xDB indexes is illustrated below.

<data id="pdi.index.creator">
  <key.document.name>xdb.pdi.name</key.document.name>
  <indexes>
    <Index_Definition_1>
    ...
    </Index_Definition_1>
    <Index_Definition_2>
    ...
    </Index_Definition_2>
    ...
    <Index_Definition_n>
    ...
    </Index_Definition_n>
  </indexes>
</data>

• The id of this XML block is pdi.index.creator.
• The key.document.name is always xdb.pdi.name, which is the alias for the PDI file (eas_pdi.xml) imported into xDB. When indexes are defined, they are created on the PDI files imported into xDB.
• The indexes element is the parent element for all indexes created in this block. InfoArchive supports the path index and the full-text index. The path index is the most common index type and consumes less space than the full-text index.

The definition of this parameters block is not mandatory.
If it is not defined, no index is created on the PDI file. This block is defined in the XML content of the eas_cfg_pdi object.

Path Index

A path index indexes XML elements and attributes extracted from the PDI file. In the following example, two path indexes, on CustomerID and CallStartDate, will be created.

<path.value.index>
  <name>CustomerID</name>
  <path>/{urn:eas-samples:en:xsd:phonecalls.1.0}Calls/
    {urn:eas-samples:en:xsd:phonecalls.1.0}Call
    [{urn:eas-samples:en:xsd:phonecalls.1.0}CustomerID<LONG>]</path>
  <compressed>false</compressed>
  <unique.keys>false</unique.keys>
  <concurrent>false</concurrent>
  <build.without.logging>true</build.without.logging>
</path.value.index>
<path.value.index>
  <name>CallStartDate</name>
  <path>/{urn:eas-samples:en:xsd:phonecalls.1.0}Calls/
    {urn:eas-samples:en:xsd:phonecalls.1.0}Call
    [{urn:eas-samples:en:xsd:phonecalls.1.0}CallStartDate<DATE_TIME>]</path>
  <compressed>false</compressed>
  <unique.keys>false</unique.keys>
  <concurrent>false</concurrent>
  <build.without.logging>true</build.without.logging>
</path.value.index>

The elements of a path index definition block are listed below. For detailed information about the path index, refer to the Path Index section of the EMC Documentum xDB Administration Guide.

• name
A unique name for the index.
• path
The XPath to the element or attribute to be indexed, with the element URN. The data type can be STRING, INT, LONG, FLOAT, DOUBLE, DATE, or DATE_TIME.
• compressed
Boolean value indicating whether the index is compressed in xDB. Default: false. Set this to false except when the volume of the text is particularly large.
• unique.keys
Boolean value indicating whether the indexed value is unique. Default: false. Specify true if the indexed values must be unique; otherwise specify false. Activating this option raises an ingestion error if the same index value is found more than once within the AIU structured data of the ingested SIP.
concurrent Boolean value indicating whether multiple xDB transactions can update the index concurrently. Default: false. When set to true, multiple xDB transactions can concurrently update the index. However, since AIU structured data is never updated in InfoArchive, this is always set to false. build.without.logging Boolean value indicating whether the index is created without xDB logs. Default: true. When set to true, the index is created without writing to the xDB transaction log. When restarting a failed ingestion, InfoArchive automatically performs a cleanup at the xDB level before the restart. For this reason, it is recommended that you set this to true to reduce the disk I/O activity on the xDB transaction logs. Full-Text Index A full-text index is a special form of value index, which indexes XML elements and attributes and tokenizes the values into a number of terms. Each term-element combination is added to the index. A full-text index lets you search for an individual word contained in the indexed values. It is also less sensitive to misspellings, because it allows you to use wildcard characters in the search query. However, a full-text index consumes more storage than a path index. The following example shows the definition of a full-text index CustomerLastName on the value of the CustomerLastName element, defined in the schema urn:eas-samples:en:xsd:phonecalls.1.0. ...
<full.text.index> <name>CustomerLastName</name> <compressed>false</compressed> <concurrent>false</concurrent> <optimize.leading.wildcard.search>true</optimize.leading.wildcard.search> <index.all.text>true</index.all.text> <include.attributes>false</include.attributes> <support.phrases>false</support.phrases> <support.scoring>false</support.scoring> <convert.terms.to.lowercase>true</convert.terms.to.lowercase> <filter.english.stop.words>false</filter.english.stop.words> <support.start.end.token.flags>false</support.start.end.token.flags> <element.uri>urn:eas-samples:en:xsd:phonecalls.1.0</element.uri> <element.name>CustomerLastName</element.name> <attribute.uri/> <attribute.name/> </full.text.index> ... Element Name Description name Name of the index to create in xDB compressed Boolean value indicating whether the index is compressed in xDB. Default: false. Set this to false except when the volume of the text is particularly large. Note: When ingestion mode 2 is applied, InfoArchive compresses the entire xDB data file before importing it into the repository. concurrent Always set this to false since AIU structured data is never updated in InfoArchive. When set to true, multiple xDB transactions can concurrently update the index. optimize.leading.wildcard.search If set to true, enables searching the index for terms with a leading wildcard (e.g., “*abcd”) at the expense of a longer index creation time and a larger index volume. Default: true. index.all.text If this is set to true, the element is indexed by its string value, which is computed from the string value of all descendant nodes. If this is set to false, the element can only have text-child nodes, but the index updates faster. Set this to false for better performance. include.attributes Boolean value indicating whether to index only the specified element or the element together with its attributes. Default: false.
support.phrases Boolean value indicating whether to optimize the index to support proximity phrases in queries. Default: false. Using this option increases the index size. support.scoring Boolean value indicating whether to store additional information about the indexed tokens to improve the quality of the relevance score calculation. Default: false. Always set this to false since scoring is not used. convert.terms.to.lowercase Boolean value indicating whether to convert indexed terms to lowercase. Default: true. Set this to true if case-insensitive searching is desired. filter.english.stop.words Boolean value indicating whether to exclude stop words from the indexes. Default: false. Setting this to true applies a stop word filter. Words on the stop word list are not indexed. For the vast majority of InfoArchive use cases, where a short text value is associated with the structured data to index, there is little or no benefit from setting this to true. So, set this to false in most cases. support.start.end.token.flags Boolean value indicating whether to include the first and last terms of the indexed values. Setting this to true optimizes queries containing starts with, ends with, or other proximity constraints. Default: true. element.uri Namespace of the element to index. element.name Name of the element to index. attribute.uri Namespace of the attribute to index. It is recommended that you only index the value of the element. Default: blank. attribute.name Name of the attribute to index. It is recommended that you only index the value of the element. Default: blank. pdi.aiu.cnt—Counting the Number of AIUs The pdi.aiu.cnt block contains the XQuery expression that returns the total number of AIUs in the imported structured data (PDI) file. The AIU count is specified in the aiu_count element of the SIP descriptor eas_sip.xml. When properly configured, the ingestor checks the number of AIUs contained in the SIP.
If there is any discrepancy between the specified number of AIUs and the actual number of AIUs received, the ingestor throws an error. Make sure the correct XML path to the AIU node in the PDI file is specified in the XQuery expression; for example: <data id="pdi.aiu.cnt"> <select.query xml:space="preserve"> declare namespace n = "urn:eas-samples:en:xsd:phonecalls.1.0"; count(/n:Calls/n:Call) </select.query> </data> minmax—Defining the AIP Partitioning Key The minmax block contains the parameters for returning the minimum and maximum values of the partitioning key found in the structured data (PDI) of the SIP. In the structured data (PDI), some XML elements can be used as partitioning keys, which make archive searches efficient and responsive. InfoArchive performs the following actions to determine partitioning keys in an AIP: • Searches for the related minimum and maximum values contained in the PDI file. • Assigns the minimum and maximum values found for the criteria to the eas_pkey_min_date and eas_pkey_max_date properties of the AIP (eas_aip) object, respectively. Archiving is mostly a long-term task, and elements related to date or time are good candidates for partitioning keys. For example, you can set the partitioning criteria to tag AIPs with the range of record creation dates or retention dates. In some situations, you need to define multiple partitioning keys: • Create a customized eas_aip subtype with additional attributes. Additional partitioning criteria can be managed by adding custom attributes to the eas_aip subtype used for this holding. The criteria can be of any data type (for example, DATE, STRING, INTEGER, FLOAT, BOOLEAN). • Add as many key elements as required in the parameters block, referencing the names of the additional attributes. For performance reasons, define an RDBMS index on the attributes used as partitioning criteria. An RDBMS index can be created using the MAKE_INDEX Content Server administration method.
This method is exposed in Documentum Administrator. It can also be used in IAPI or DQL scripts. Refer to the EMC Content Server Administration Guide for details. To further optimize query performance, instead of creating two indexes on a pair of AIP properties representing the lower and upper range of the partitioning key (e.g., eas_pkey_min_date and eas_pkey_max_date), you can create one composite index on the two properties and drop the original indexes. For example, in Oracle, you can create a composite index using a script like this: CREATE INDEX "MYREPO"."EAS_AIP_PEM_08" ON "MYREPO"."EAS_AIP_PEM_S" ("EAS_PKEY_MIN_DATE", "EAS_PKEY_MAX_DATE") TABLESPACE "DM_MYREPO_INDEX" PCTFREE 10 INITRANS 2 MAXTRANS 255 STORAGE ( INITIAL 16K BUFFER_POOL DEFAULT) Then, update the table statistics using the following script: exec DBMS_STATS.GATHER_DATABASE_STATS; Note: When you drop the original indexes, you must drop them using DQL statements on the Content Server side instead of directly dropping them in the database; otherwise, Content Server will recreate the indexes in the database. The dm_UpdateStats job is enabled by default and may impact query performance. The value range information stored with the AIP dramatically improves search performance, as InfoArchive executes two-tiered searches: • InfoArchive first looks for the subset of AIPs that fit into the specified partitions (value ranges); for example, the subset of AIPs that contain phone calls from March 1st, 2013 to April 1st, 2013. • Then, InfoArchive searches at the xDB level for the AIUs satisfying all the search criteria and returns the search results. The following example uses CallStartDate as the partitioning key and determines its value range.
<data id="minmax"> <key name="CallStartDate" type="date-time" xml:space="preserve"> <min field="eas_pkey_min_date" xml:space="preserve"> declare namespace n = "urn:eas-samples:en:xsd:phonecalls.1.0"; min(/n:Calls/n:Call/xs:dateTime(n:CallStartDate)) </min> <max field="eas_pkey_max_date" xml:space="preserve"> declare namespace n = "urn:eas-samples:en:xsd:phonecalls.1.0"; max(/n:Calls/n:Call/xs:dateTime(n:CallStartDate)) </max> </key> </data> • The name of this XML block must be set to minmax. • In the example, n is used to refer to the urn:eas-samples:en:xsd:phonecalls.1.0 namespace. Therefore, in the XML files, all of the tags defined by this namespace are prefixed with n:. • The use of XQuery functions is supported; for example, count, adjust-dateTime-to-timezone, min, max, and so on. • A configured XQuery is embedded in the count element used to query for the number of AIUs. This query is executed during the ingestion and returns an integer. The number of /Calls/Call elements found in the structured data corresponds to the number of AIUs. An XPath statement suffices as a simple XQuery to obtain all of the <Call> elements: /Calls/Call. However, it is good practice to specify the namespace (urn:eas-samples:en:xsd:phonecalls.1.0), which is bound to the prefix n: /n:Calls/n:Call. To count these elements, this XPath is passed to the count function. • Define one key element per partitioning criterion. The value range for the partitioning key is retrieved by getting the minimum and maximum values of the <key>. The value of the type attribute is only written to the processing logs. Valid data types are: DATE, DATE-TIME, STRING, INTEGER, DOUBLE, and BOOLEAN. • The minimum and maximum values found for the /Calls/Call/CallStartDate elements are respectively assigned to the eas_pkey_min_date and eas_pkey_max_date attributes of the AIP repository object.
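The minmax and pdi.aiu.cnt queries can be pictured with a small Python sketch (the sample data below is hypothetical; this illustrates what the XQuery computes, not the InfoArchive implementation): parse the PDI structured data, take the minimum and maximum CallStartDate values, and count the Call elements.

```python
# Sketch only: what the minmax and pdi.aiu.cnt queries compute over the PDI data.
import xml.etree.ElementTree as ET

NS = "urn:eas-samples:en:xsd:phonecalls.1.0"

# Hypothetical miniature eas_pdi.xml content.
pdi = """<Calls xmlns="urn:eas-samples:en:xsd:phonecalls.1.0">
  <Call><CustomerID>42</CustomerID><CallStartDate>2013-03-04T10:00:00Z</CallStartDate></Call>
  <Call><CustomerID>43</CustomerID><CallStartDate>2013-03-01T09:30:00Z</CallStartDate></Call>
  <Call><CustomerID>44</CustomerID><CallStartDate>2013-03-09T17:15:00Z</CallStartDate></Call>
</Calls>"""

root = ET.fromstring(pdi)
dates = [c.text for c in root.iter("{%s}CallStartDate" % NS)]
# ISO 8601 timestamps in the same timezone sort lexicographically,
# so string min/max suffices for this sketch.
pkey_min, pkey_max = min(dates), max(dates)      # -> eas_pkey_min_date / eas_pkey_max_date
aiu_count = len(root.findall("{%s}Call" % NS))   # what pdi.aiu.cnt counts
print(pkey_min, pkey_max, aiu_count)
```

The two extracted values correspond to what InfoArchive assigns to the eas_pkey_min_date and eas_pkey_max_date attributes of the AIP object.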
The attributes of the key element are described as follows: 133 InfoArchive Configuration Attribute Name Description name This name will appear in the ingestion log when the value range extraction is performed for the current key element. type Indicates the data type of the value to be returned by the configured XQuery. The acceptable values are DATE, DATETIME, STRING, INTEGER, FLOAT, and DOUBLE. Once defined, all these partitioning criteria attributes must be registered with the corresponding holding configuration (eas_cfg_holding) object. Note: In earlier releases of InfoArchive, the minmax processor also included the function of counting the number of AIUs in the imported structured data (PDI), which is now performed by a separate pdi.aiu.cnt processor. If you have upgraded from a previous InfoArchive version, you must modify the XML content of the ingestion configuration (eas_cfg_pdi) object by removing the count element such as in the following example from the minmax XML block: <count xml:space="preserve"> declare namespace n = "urn:eas-samples:en:xsd:phonecalls.1.0"; count(/n:Calls/n:Call) </count> Optimizing XQuery for Partitioning Keys The execution time for the embedded XQuery has a significant impact on ingestion performance. If the partitioning key is also an index item in xDB, the execution time for XQuery will be reduced substantially when XQuery is executed within the index context. The following example demonstrates how to put XQuery within the index context. 
<data id="minmax"> <key name="date-oper" type="date" xml:space="preserve"> <min field="eas_pkey_min_date" xml:space="preserve"> declare namespace n = "urn:eas-samples:en:xsd:phonecalls.1.0"; (for $d in /n:Calls/n:Call[n:CallStartDate] order by $d/n:CallStartDate ascending return $d/n:CallStartDate/xs:dateTime(.))[1] </min> <max field="eas_pkey_max_date" xml:space="preserve"> declare namespace n = "urn:eas-samples:en:xsd:phonecalls.1.0"; (for $d in /n:Calls/n:Call[n:CallStartDate] order by $d/n:CallStartDate descending return $d/n:CallStartDate/xs:dateTime(.))[1] </max> </key> </data> The min and max queries respectively retrieve the minimum and maximum values directly from the index. The optimized queries prevent a full scan of the structured data (PDI) in xDB. pdi.aiu.id — Generating AIU IDs The pdi.aiu.id block contains XQuery expressions that specify the path to the AIU node and return the AIU elements in eas_pdi.xml. The pdi.aiu.id ingestion processor uses this information to generate IDs in the structured data file (eas_pdi.xml) for each AIU in ingested AIPs. AIU IDs are useful for identifying AIUs and retrieving them later, as well as for generating granular audit trails at the AIU level. In the following example, the AIU node is specified as /n:Calls/n:Call.
<data id="pdi.aiu.id"> <select.query xml:space="preserve"> declare namespace n = "urn:eas-samples:en:xsd:phonecalls.1.0"; /n:Calls/n:Call </select.query> </data> Optionally, you can also define XQuery expressions to return some additional contextual information, such as the customer ID, for each AIU, which will be automatically logged with the eas_fetch event in the audit trail; for example: <data id="pdi.aiu.id"> <select.query xml:space="preserve"> declare namespace n = "urn:eas-samples:en:xsd:phonecalls.1.0"; for $call in /n:Calls/n:Call return ($call, $call/n:CustomerID/text()) </select.query> </data> During the ingestion process, the pdi.aiu.id processor creates a unique ID for each AIU in eas_pdi.xml using the following pattern: AIP_ID + ":aiu:" + AIU_sequence. In the following example, the Calls element is assigned the AIP ID (080000018000a5e78000a65d) and each Call element, which represents an AIU, is assigned an AIU ID that is derived from the AIP ID. The customer ID is saved as the value of the audit attribute for each AIU as contextual information during the ingestion process. When an AIU is retrieved later, the access to the AIU is logged as the eas_fetch audit trail event, capturing the AIU ID along with its associated contextual (eas_aiu_audit) information. <Calls xmlns="urn:eas-samples:en:xsd:phonecalls.1.0" xmlns:ns1="urn:x-emc:eas:schema:pdi" ns1:id="080000018000a5e78000a65d"> <Call ns1:id="080000018000a5e78000a65d:aiu:1" ns1:audit="000502"> ... </Call> <Call ns1:id="080000018000a5e78000a65d:aiu:2" ns1:audit="000503"> ... </Call> ... </Calls> Note: The AIU node specified in the XML content of the eas_cfg_pdi object must match the node level defined for the pdi.aiu.id processor in the XML content of the eas_cfg_ingest object.
If needed, you can edit the XML content of the eas_cfg_ingest object to redefine the AIU node level: <processor id="pdi.aiu.id"> <display.name>Add AIU IDs in PDI</display.name> <class>com.emc.documentum.eas.ingestor.transform.processor.importer.PDIAiuIdProcessor</class> <data> <select.query>/node()/node()</select.query> </data> </processor> 135 InfoArchive Configuration Note: For SIPs already ingested without AIU IDs, you must re-ingest them with the appropriate pdi.aiu.id configuration to generate the AIU IDs. To optimize query, the ingestion process also invokes the pdi.aiu.index processor, which is referenced in the eas_cfg_ingest object, to create xDB indexes on the AIU ID attribute in the PDI file. The default settings for the processor are sufficient in most use scenarios. pdi.ci.id — Generating Content File IDs The pdi.ci.id block contains the XQuery expression that returns strings that uniquely identify unstructured content files. The pdi.ci.id ingestion processor uses this information to set the content ID as the value of the cid attribute of the element containing the unstructured content file name in eas_pdi.xml. Content file IDs are useful for identifying content files and retrieving them later, as well as generating granular audit trails at the content file level. Make sure that in the XQuery expression, the XML path to the node that contains the file name of the content file is specified; for example: <data id="pdi.ci.id"> <select.query> <![CDATA[ declare namespace n = "urn:eas-samples:en:xsd:phonecalls.1.0"; declare namespace ri = "urn:x-emc:eas:schema:ri"; let $pdi_uri := root(.) let $aip_id := xhive:metadata($pdi_uri, 'eas_aip_id') let $ri_uri := replace(document-uri($pdi_uri), '\.pdi$', '.ri') for $ri in doc($ri_uri)/ris/ri[@pdi_key] for $n in /n:Calls/n:Call/n:Attachments/n:Attachment/n:FileName[. = $ri/@pdi_key] return ($n,concat($aip_id,":ci:",$ri/@seqno)) ]]> </select.query></data> ... 
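Both ID patterns can be sketched in a few lines of Python (a hypothetical illustration of the documented patterns, not InfoArchive code):

```python
# Sketch of the documented ID patterns: AIP_ID + ":aiu:" + sequence for AIUs,
# and AIP_ID + ":ci:" + sequence for content files.
def aiu_id(aip_id, seq):
    return "{}:aiu:{}".format(aip_id, seq)

def ci_id(aip_id, seq):
    return "{}:ci:{}".format(aip_id, seq)

aip = "080000018000a5e78000a65d"  # AIP ID from the earlier example
print(aiu_id(aip, 1))  # 080000018000a5e78000a65d:aiu:1
print(ci_id(aip, 1))   # 080000018000a5e78000a65d:ci:1
```

The sequence number is simply the position of the AIU (or content file) within the AIP, which is why IDs remain unique across AIPs: the AIP ID prefix differs.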
Here is an example of a content file ID created in eas_pdi.xml. <Calls xmlns="urn:eas-samples:en:xsd:phonecalls.1.0" xmlns:ns1="urn:x-emc:eas:schema:pdi" ns1:id="080000018000a71b8000a9a8"> <Call ns1:id="080000018000a71b8000a9a8:aiu:1"> … <Attachments> <Attachment> <AttachmentName>recording</AttachmentName> <FileName ns1:cid="080000018000a71b8000a9a8:ci:1">recording1.mp3</FileName> … </Attachment> </Attachments> </Call> </Calls> Note: The same content file can be referenced by multiple AIUs. For SIPs already ingested without content file IDs, you must re-ingest them with the appropriate pdi.ci.id configuration to generate the content file IDs. To optimize queries, the ingestion process also invokes the pdi.ci.index processor, which is referenced in the eas_cfg_ingest object, to index content file IDs in the PDI file. The default settings for the processor are sufficient in most use scenarios. toc.creator—Creating the Table of Contents (eas_ri.xml) for Unstructured Content Files The toc.creator block contains parameters for returning information on each unstructured content file, including the file name and file format, which is required by the TOC. When content files are contained in a SIP and referenced in the PDI file, the ingestor generates a table of contents during the ingestion. If content files are contained in the SIP but not referenced in the PDI file, the ingestor does not archive them. The default settings of this block return nothing. Note: To archive unstructured data, the toc.creator block must always be included in the ingestion configuration (eas_cfg_pdi) object content. An XQuery expression, which can be flexibly configured, is embedded in the <select.query> element; it scans all the PDI elements and attributes and returns the information used for referencing the content files.
During the ingestion, the ingestor: • Executes a query on the PDI file (eas_pdi.xml) in xDB, and returns a list of unstructured content files—one XML block per distinct unstructured content file name referenced in the PDI file (eas_pdi.xml); for example: <content type="audio/x-mpeg" format="mp3">recording08.mp3</content> The XQuery selects the distinct values of the /Calls/Call/Attachments/Attachment /FileName elements. The content MIME type and format are statically defined in the XQuery. Only the unstructured content files of the specified MIME type and format will be returned. It is a good practice to order the results returned by the XQuery by file name in ascending order. Note: If the XQuery does not return the file name of an unstructured data file in the SIP, the ingestion can still be completed successfully but the unstructured data file is NOT archived. • Verifies no referenced content file is missing in the SIP by checking whether the file exists in the working directory, where unstructured content files have been uncompressed: /EAS_HOME/working/IngestionNodeDirectory/TimestampedSubDirectory An ingestion error is raised if a returned file name is not found. • For each returned file name, ingestion processing: — Obtains the size of each referenced file — Adds an entry to the TOC initialized in xDB Here is an example of the toc.creator block for creating the table of contents: <data id="toc.creator"> <select.query> <![CDATA[ declare namespace n = "urn:eas-samples:en:xsd:phonecalls.1.0"; for $ci in 137 InfoArchive Configuration distinct-values(/n:Calls/n:Call/n:Attachments/n:Attachment/n:FileName) order by $ci return <content type="audio/x-mpeg" format="mp3" audit="{$ci}">{ $ci }</content> ]]> </select.query> </data> The attributes of the <content> element in the XQuery are described as follows. Attribute Name Description type MIME type corresponding to the electronic format of the unstructured content file. The MIME type can be ambiguous. 
For example, files created by different versions of the same application can have the same MIME type. format Format of the unstructured content file; in most cases, the file name extension. The MIME type can be extrapolated from this attribute. The format must be defined in the Documentum repository. If needed, you can define additional formats in the repository using DA. audit This external variable is declared for the toc.creator processor and a new attribute audit is saved into each ri element as contextual information during the ingestion process. When a content file is retrieved later, the access to the content file will be logged as the content retrieval (eas_getcontent) audit trail event. The XQuery can be as complex as required; for example, it can: • Obtain the file names from multiple elements and attributes of the structured data • Determine the MIME types and format names based on information present in structured data, such as the extension of the file name , dedicated elements, context, and so on. ci.hash—Configuring Content Hashing (Unstructured Data) When SIP contains content files, you can configure the ingestion parameters within the ci.hash XML block to define how hashing is performed on the unstructured data, including: • Computation of hash values for unstructured content files • Comparison of hash values with values provided in the structured data (any discrepancy will result in an ingestion error) Hashing ensures the consistency of contents included in a SIP. There are several algorithms to compute hash values, for example, MD2, MD5, SHA-1, SHA-256, SHA-384, and SHA-512. You can choose between base64 and hexadecimal encoding schemes. 
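The hash computation just described can be sketched in Python (an illustration, not the InfoArchive implementation): compute a digest over the raw bytes of a content file and encode it with base64.

```python
# Sketch of the content hashing step: digest of the file bytes, base64-encoded.
import base64
import hashlib

def content_hash(data, algorithm="sha1"):
    # hashlib.new accepts the algorithm name (md5, sha1, sha256, sha384, sha512, ...)
    digest = hashlib.new(algorithm, data).digest()
    # base64-encode the raw digest; hex encoding would use digest.hex() instead
    return base64.b64encode(digest).decode("ascii")

data = b"hypothetical content file bytes"
print(content_hash(data))            # SHA-1, base64-encoded
print(content_hash(data, "sha256"))  # same file, stronger algorithm
```

Re-running the same computation at a later date and comparing the result with the stored value is what makes future consistency checks possible.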
The default settings in the ci.hash XML block dictate that: • SHA-1 hash values are computed for all unstructured content files • Hash values are not compared with any values stored in the structured data The following example shows the structure of the ci.hash XML block: <data id="ci.hash"> <select.query> declare namespace ri = "urn:x-emc:eas:schema:ri"; let $uri := replace(document-uri(.), '\.pdi$', '.ri') for $c in doc($uri)/ris/ri return <content filename="{ $c/@pdi_key }"> <hash encoding="base64|hex" algorithm="SHA-1" provided="false" /> </content> </select.query> </data> • The id of this XML block is ci.hash. • An XQuery is embedded in select.query. The query is executed against the PDI file (eas_pdi.xml) in xDB. The conditions applied in the configured XQuery can selectively compute or validate a hash for a subset of unstructured data files. • The uri variable is assigned the xDB path of the table of contents, which equals the path of the execution context except that the extension is ri instead of pdi. • A content element is returned for each /ris/ri element found in the table of contents. • A single hash sub-element is included in each returned content element. • The algorithm attribute of the hash sub-elements is statically set to SHA-1 to request this algorithm. • Since no hash value is obtained from the structured data of the AIUs, the hash sub-elements are not assigned a value, and their provided attribute is set to false. The XML block returned by the query for each unstructured content file consists of a single <content> element. When the PDI file contains a hash value, the query returns that value in the hash element. The XQuery does not return such an XML block for an unstructured data file on which no hash computation or validation action is performed.
Here is the structure of the returned XML block: <content filename="Unstructured_Content_File_Name"> <hash encoding="base64" algorithm="MD2|MD5|SHA-1|SHA-256|SHA-384|SHA-512" provided="true|false">Encoded_Hash_Value</hash> </content> The <hash> attributes are described as follows: Attribute Name Description encoding Indicates the encoding: • To be applied for inserting the hash value in the table of contents • Of the value of the hash sub-element The current version supports only the base64 encoding. algorithm Name of the hash algorithm to apply. The following algorithms are currently supported: MD2, MD5, SHA-1, SHA-256, SHA-384, and SHA-512. provided Boolean indicating whether this hash value must be present in the structured data of the AIUs. Setting this attribute to true raises an ingestion error if the hash element is empty (no value to compare with). When the hash sub-element has a value, an ingestion error is raised if the computed hash value differs from the provided one. The configured XQuery can be as complex as needed, depending on the desired behavior of this ingestion step: • If the PDI file also contains content hash values, the query will: — Obtain the hash value for each content file. — Set the provided attribute to true. • Compute multiple hash values using different algorithms for each unstructured data file. Multiple hash elements will be returned since multiple hash algorithms are applied as per your configuration. Most common electronic archiving standards mandate that at least one hash value is computed for future consistency checks. Defining the parameters block with a query that does not return any information prevents hash computation during the ingestion.
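The provided="true" behavior can be sketched as follows (a Python illustration under assumed names, not the actual ingestor code): the computed hash is compared with the value carried in the structured data, and any discrepancy raises an ingestion error.

```python
# Sketch of hash validation when provided="true": compare computed vs. supplied value.
import base64
import hashlib

def validate_hash(data, provided_b64, algorithm="sha1"):
    computed = base64.b64encode(hashlib.new(algorithm, data).digest()).decode("ascii")
    if not provided_b64:
        # provided="true" demands a value to compare with
        raise ValueError("provided='true' but no hash value to compare with")
    if computed != provided_b64:
        raise ValueError("hash mismatch: %s != %s" % (computed, provided_b64))

data = b"recording bytes"
good = base64.b64encode(hashlib.sha1(data).digest()).decode("ascii")
validate_hash(data, good)  # passes silently: computed value matches
try:
    validate_hash(data, "bogus==")
except ValueError as e:
    print("ingestion error:", e)
```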
The following example ci.hash block requests the computation of two hash values for all unstructured content files: <data id="ci.hash"> <select.query> <![CDATA[ declare namespace n = "urn:eas-samples:en:xsd:phonecalls.1.0"; for $ci in distinct-values(/n:Calls/n:Call/n:Attachments/n:Attachment/n:FileName) order by $ci return <content filename="{ $ci }"> <hash encoding="base64" algorithm="MD5" provided="false" /> <hash encoding="base64" algorithm="SHA-1" provided="false" /> </content> ]]> </select.query> </data> You can use information from the unstructured table of contents (TOC) for hashing. In situations where it is not required to query structured data, using the TOC generally results in faster query execution. The content hashing step populates the table of contents with the computed hash values. In the following example, the table of contents includes a hash value for the first content file, computed during ingestion with the SHA-1 algorithm, but no value comparison has been performed: <?xml version="1.0"?> <ris xmlns="urn:x-emc:eas:Schema:ri"> <ri SchemaVersion="1.0" seqno="1" pdi_key="recording1.mp3" xmlns:ri="urn:x-emc:eas:Schema:ri"> <step seqno="1"> <ci mime_type="audio/x-mpeg" dctm_format="mp3" size="41123"> <hash encoding="base64" algorithm="SHA-1" provided="false"> OWJjYjNkNDk2MjNlNzg3YjhkNTI2NWFiMTY3ZmQwODA3NjYzYjNiYQ==</hash> </ci> </step> </ri> … </ris> One <hash> element is inserted per computed hash value (encoded) in the <ci> element. The attributes of the hash element are described as follows: Attribute Name Description encoding Encoding applied to the hash value algorithm Name of the algorithm used to compute the hash value provided Set to true when the hash has been compared with the one obtained from structured data.
ci.compressor—Configuring Compression of Content Files (Unstructured Data) The ci.compressor block contains parameters that define the unstructured content files to compress with the gzip compression algorithm in order to save storage space. In the following example, the ci.compressor parameter block requests all unstructured content files to be compressed. <data id="ci.compressor"> <select.query> <![CDATA[ declare namespace n = "urn:eas-samples:en:xsd:phonecalls.1.0"; for $ci in distinct-values(/n:Calls/n:Call/n:Attachments/n:Attachment/n:FileName) return $ci ]]> </select.query> </data> • The XML block id is ci.compressor. • An XQuery is embedded in the <select.query> element. — By default, the XQuery is executed against the PDI file (eas_pdi.xml) imported into xDB, which is more efficient. — Its expected result is the file names of the contents to compress. • Alternatively, the query can read the table of contents: the uri variable is then assigned the xDB path of the table of contents, which equals the path of the execution context except that the extension is ri instead of pdi, and the query returns the values of the pdi_key attribute of all /ris/ri elements. In either case, the XQuery returns a list of file names of the unstructured content files to compress. This ci.compressor parameters block is not mandatory; if it is absent, no content is compressed. The default ci.compressor settings do not enable compression. The XQuery can be as complex as required to determine the unstructured content files to compress; for example, based on information available in the structured data, such as the file extension, or information in the TOC, such as the MIME type and format. Depending on how the unstructured content files are generated, files in a given format can be compressible or not compressible; for example, optimized or non-optimized PDF.
Consequently, it is necessary to have compression rules defined in each ingestion context according to the characteristics of the expected unstructured data files. In the following ci.compressor parameter block example, the XQuery dictates that all unstructured content files are compressed: <data id="ci.compressor"> <select.query> declare namespace ri = "urn:x-emc:eas:schema:ri"; let $uri := replace(document-uri(.), '\.pdi$', '.ri') for $c in doc($uri)/ris/ri return $c/@pdi_key </select.query> </data> The following TOC example shows that the unstructured content file recording1.mp3 is compressed using gzip. <?xml version="1.0"?> <ris xmlns="urn:x-emc:eas:Schema:ri"> <ri SchemaVersion="1.0" seqno="1" pdi_key="recording1.mp3" xmlns:ri="urn:x-emc:eas:Schema:ri"> <step seqno="1"> <ci mime_type="audio/x-mpeg" dctm_format="mp3" size="41123"> <hash encoding="base64" algorithm="SHA-1" provided="false"> OWJjYjNkNDk2MjNlNzg3YjhkNTI2NWFiMTY3ZmQwODA3NjYzYjNiYQ==</hash> </ci> </step> <step seqno="2"> <compress mime_type="application/x-gzip" dctm_format="gzip" size="40739"/> </step> </ri> • A <step> element is appended after the current <step> in the <ri> element of the content file. • The seqno attribute value is the seqno of the previous step + 1. • A compress element with the following attributes is created within step. Attribute name Description mime_type MIME type of the compression format. Set to application/x-gzip since the current version applies this compression algorithm. 142 InfoArchive Configuration Attribute name Description dctm_format Name of the repository format corresponding to the compression format. Set to gzip since the current version applies this compression algorithm. size Size of the content file in bytes after the compression. 
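The compression step recorded in the TOC above can be sketched in Python (illustrative only; the payload below is hypothetical):

```python
# Sketch of the ci.compressor step: gzip-compress a content file's bytes and
# record the before/after sizes, as the TOC <compress> entry does.
import gzip

original = b"A" * 41123  # hypothetical, highly compressible payload
compressed = gzip.compress(original)

# The TOC <compress> entry records the size after compression.
print(len(original), len(compressed))
print(compressed[:2] == b"\x1f\x8b")  # gzip magic number, prints True
```

Whether compression is worthwhile depends on the data: an already-compressed format (such as an optimized PDF or an MP3) may shrink very little, which is why the selection query should filter by format.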
Configuring an Ingestion Configuration (eas_cfg_pdi) Object

In DA, create an object of type eas_cfg_pdi with the ingestion parameter XML file as its content in the holding folder (e.g., /System EAS/Archive Holdings/MyHolding) and configure its properties.

Property Name | DA label | Description
eas_name | Name | Name assigned to these ingestion parameters. Generally, the URN of the schema to which the parameters apply is assigned as the name.
eas_version | Version | Generally, no value is assigned to this property.

Configuring a Reception Node Configuration (eas_cfg_receiver_node) Object

At least one reception node must be configured for the reception process. The reception node is configured through the reception node configuration (eas_cfg_receiver_node) object in the repository. A default reception node /System EAS/Nodes/reception_node_01 was configured during InfoArchive installation and can be used out of the box. You can modify the default reception node as needed. To create and configure a new reception node, create an object of type eas_cfg_receiver_node and set its properties.

Property Name | DA Label | Description
eas_name | Name | Name of the receiver node.
eas_log_level | Log level | Verbosity level of the reception logging:
• 0: TRACE
• 1: DEBUG
• 2: INFO
• 3: WARN
• 4: ERROR
eas_logs_store_enabled | Archive logs | When set, InfoArchive stores reception logs in the repository, even for unknown holdings (for example, when a reception error occurs before the holding has been identified), which is useful for troubleshooting reception errors with incorrectly configured data.
eas_logs_store | Log store | Storage area to use for storing reception logs when eas_logs_store_enabled is set to true.
eas_alias_set | Alias set | Name of the alias set to apply to the lifecycle after the creation of the object (optional).
eas_auxiliary_alias_set | Auxiliary alias set | (Optional) The alias set in the session that transitions states in the AIP lifecycle during the reception phase. It contains aliases referencing the permission sets to be applied to actions throughout the reception lifecycle. If not using custom lifecycle states, leave the default value as is.
eas_folder_path | Creation Folder | Path to the temporary repository folder in which to create AIP objects, before the configured settings of the target holding are determined and applied.
eas_fs_working_root | Working directory | Path to the working directory in the file system used by the reception node.
eas_delete_on_unknown | Delete on unknown flow | Enables deletion of the received file if no eas_filename_pattern, eas_app_user_arg1_pattern, eas_app_user_arg2_pattern table line matches the arguments.

The following repeating properties form a table in which the entries at the same index are associated with one another:

Property Name | DA Label | Description
eas_filename_pattern | Filename pattern | Pattern (Java regular expression) that must match the value of the filename passed to the receiver. This attribute is associated with the Argument 1 mask at the same index.
eas_app_user_arg1_pattern | Argument 1 mask | Pattern (Java regular expression) that must match the value of the "o" argument passed to the receiver.
eas_app_user_arg2_pattern | Argument 2 mask | Pattern (Java regular expression) that must match the value of the "t" argument passed to the receiver. This attribute is associated with the Argument 1 mask at the same index.
eas_sip_format | Format | Documentum format name associated with the received file (for example, eas_sip_zip), associated with the Argument 1 mask at the same index.
eas_java_class | XML extraction Java class | Fully qualified name of the Java class invoked to extract the eas_sip.xml file from the received SIP in the eas_sip_format at the same index.
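A minimal sketch of how a receiver could select a configuration line by index can clarify the "associated at the same index" wording above. The patterns and the second format name below are hypothetical, and Python's re module stands in for Java regular expressions; this is not the actual receiver implementation.

```python
import re

# One entry per repeating-property index; values are illustrative.
receiver_table = [
    {"filename_pattern": r".*\.zip",
     "arg1_pattern": r"PhoneCalls",   # mask for the "o" argument
     "arg2_pattern": r"SIP",          # mask for the "t" argument
     "sip_format": "eas_sip_zip"},
    {"filename_pattern": r".*\.tar",
     "arg1_pattern": r".*",
     "arg2_pattern": r".*",
     "sip_format": "eas_sip_tar"},    # hypothetical format name
]

def select_format(filename, arg_o, arg_t):
    """Return the sip_format of the first line whose three patterns all match."""
    for line in receiver_table:
        if (re.fullmatch(line["filename_pattern"], filename)
                and re.fullmatch(line["arg1_pattern"], arg_o)
                and re.fullmatch(line["arg2_pattern"], arg_t)):
            return line["sip_format"]
    # No line matched: with eas_delete_on_unknown set, the received
    # file would be deleted at this point.
    return None
```

The same index lookup would then also supply the extraction Java class and its argument for the matched line.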
This attribute is associated with the Argument 1 mask at the same index.
eas_java_class_arg1 | Java class arguments | String value passed as an argument of the Java class. This attribute is associated with the Argument 1 mask at the same index.
eas_delete_on_error | Delete on error | Whether to enable deletion of received data from the working directory and xDB if a processing error occurs. This setting is useful for protecting sensitive data. When data must be encrypted for the holding, the eas_delete_on_error property of the eas_cfg_holding_crypto configuration object, defined at the holding level, overrides this property. This property is associated with the Argument 1 mask at the same index.

Configuring an Ingestion Node Configuration (eas_cfg_ingest_node) Object

At least one ingestion node must be configured for ingestion processing (it is required by both the enumerator and the ingestor as an argument). You can configure multiple ingestion nodes to distribute the ingestion workload across different holdings. The ingestion node is configured through the ingestion node configuration (eas_cfg_ingest_node) object in the repository. A default ingestion node /System EAS/Nodes/ingestion_node_01 was configured during InfoArchive installation and can be used out of the box. You can modify the default ingestion node as needed. To create and configure a new ingestion node, create an object of type eas_cfg_ingest_node and set its properties.

Property Name | DA Label | Description
eas_name | Name | Name of the ingestion node.
eas_log_level | Log level | Verbosity level of the ingestion logging:
• 0: TRACE
• 1: DEBUG
• 2: INFO
• 3: WARN
• 4: ERROR
eas_fs_working_root | Working directory | File system path of the root working directory of the ingestor.
eas_fstore_access_enabled | Filestore access | Activates direct read access to the repository filestores storing the received files:
• Requires that these file systems are accessible on the host server of the ingestion node.
• Accelerates the ingestion and optimizes disk usage, since no local copy of the received file is created.
If you run ingestions on the Content Server using the installation owner user account, the ingestor can access the repository file stores directly at the operating system level. This is quicker than accessing files using the DFC getfile method. File access at the operating system level speeds up ingestion, especially for large SIP files.
eas_fstore_nomap_enabled | Use same filesystem path | Indicates that all the filestores of the repository are available in the file system with the same paths as on the Content Server.
eas_map_fstore | Filestore | Name of the repository filestore associated with the eas_map_local_mount property at the same index. Reserved for future use.
eas_map_local_mount | Local path | Mount path of the local filestore named in eas_map_fstore at the same index. Reserved for future use.

Configuring Ingestion for Unstructured Data

Ingestion of unstructured data is performed by additional processors referenced in the eas_ingest_sip_zip-ci ingestion sequence configuration (eas_cfg_ingest) object. Before ingesting unstructured data, make sure the following has been configured for the target holding:
• The ingestion sequence must be set to eas_ingest_sip_zip-ci. If the eas_ingest_sip_zip-pdi ingestion sequence is used for SIPs that contain unstructured data, the ingestion process still proceeds without any errors; however, the unstructured data is not archived.
• The content store must be specified for storing unstructured data. If content hashing is required, select the Content hash validation option.
This option lets you quickly activate or deactivate hash processing without having to alter the ingestion configuration.

Configuring an Encryption-enabled Holding

InfoArchive supports data encryption to ensure data security. Data is encrypted during ingestion and decrypted when searched and accessed by authorized users. InfoArchive can encrypt the following data:
• Received SIPs (sip)
• Structured data in PDI XML (pdi)
• Content files (ci)
• Order results (dip)

You can configure an encryption-enabled holding to ensure data security. InfoArchive has two certified encryption providers:
• DemoCryptoProvider: This is not a real encryption provider. You can use it for demo and training purposes.
• RSACryptoProvider: InfoArchive currently supports EMC RSA DPM (Data Protection Manager) as a crypto provider out of the box. If you want to use RSA DPM with InfoArchive, a separate RSA DPM license is required.

Note:
— InfoArchive 3.1 supports RSA DPM Java Key Client 3.5.2 and RSA DPM Server 3.5 or later.
— You should have basic knowledge of RSA DPM in order to use RSA with InfoArchive. Refer to the RSA documentation for more information about the RSA DPM Key Client and Server.

You can also develop your own encryption provider. The encryption Java classes to be used with InfoArchive must implement the ICryptoProvider interface. You can find the source code, the required methods, and the documentation about ICryptoProvider in resources/java of the InfoArchive installation package. In this documentation, RSA Crypto Provider is used as an example to illustrate how to configure encryption.

Preparing Resources for InfoArchive Installation

1. If the host name of the DPM server is not known by the DNS, map the DPM appliance host IP to the host name. On Windows, you map a host name by editing C:\Windows\System32\drivers\etc\hosts; on Linux, by editing /etc/hosts.
2.
Update the JVM crypto policy libraries on the InfoArchive host. You can download the policy files from the following sites:
• Oracle JVM: http://www.oracle.com/technetwork/java/javase/downloads/jce-7-download-432124.html
• IBM JVM: https://www14.software.ibm.com/webapp/iwm/web/preLogin.do?source=jcesdk
Unzip the downloaded package and copy the files to <jdk_install_dir>/jre/lib/security on your InfoArchive host.
3. Copy the RSA DPM Key Client Java libraries to core/template/home/external/lib in the installation package.
4. Copy sample-config.properties provided by the RSA DPM Java Key Client to core/template/home/external/resources in the installation package.
5. Rename sample-config.properties. The properties file must follow the naming convention <eas_cfg_crypto_provider.eas_name>.properties. For RSA Crypto Provider, you must change the filename to RSACryptoProvider.properties.
6. Modify the values of pki.client_keystore_file and pki.server_keystore_file.
Note:
• The value of pki.client_keystore_file is the absolute path to the file (.p12) you upload to the DPM appliance as the Identity Certificate. The value of pki.server_keystore_file is the absolute path to the file (.pem) you upload to the DPM appliance as the Trusted CA Certificate. If the value is a Windows path, you must use two backslash characters as path separators; for example, C:\\certificates\\cert_1.p12.
• pki.client_keystore_password must be set to the password of the pki.client_keystore_file.
• If InfoArchive components share the same certificate, the Receiver and Ingestor are launched in parallel. Cache on disk and registration client functions are not needed.
The properties file should not contain the following properties, which are added or overwritten by InfoArchive at runtime:
— server.host
— server.port
— cache.file
— client.registration_file
— client.app_name
• If InfoArchive components are distributed across several hosts, EMC recommends that you use several custom .properties files, and set the paths to the properties files in the CLASSPATH environment variable. In this scenario, no properties are overwritten by InfoArchive at runtime.

After the preparation is finished, you can start the InfoArchive installation following the instructions described in the EMC InfoArchive Installation Guide.

Configuring RSA DPM Key Manager

To encrypt data archived into InfoArchive using RSA DPM, you must create at least one key class (you can create more if needed). Follow these rules when creating the key class:
• The identity group linked to the key class must contain an identity whose identity certificate is set in RSACryptoProvider.properties.
• Set Key duration to Infinite.
• Set Key behavior to New key each time or Use current key, depending on your needs.

Creating Configuration Objects for Encryption Settings

InfoArchive saves encryption settings in a separate set of objects. In this way, a dedicated ACL can be configured to ensure data security. To configure an encryption-enabled holding, you must create the following objects to utilize the Java classes contained in the libraries you installed earlier:
• eas_cfg_crypto_provider
• eas_cfg_aic_crypto
• eas_cfg_holding_crypto
• eas_cfg_pdi_crypto
• eas_query_config

In DA, you can create a new object by selecting File > New > Document, and then selecting a specific type in the Type drop-down list.

eas_cfg_crypto_provider

You must create an eas_cfg_crypto_provider object for the encryption libraries you installed. For RSA Crypto Provider, use the specific type eas_cfg_rsa_key_manager, which is a child of the eas_cfg_crypto_provider type.
To create an eas_cfg_crypto_provider object:
1. In DA, navigate to /System EAS/System.
2. Select File > New > Document in the menu.
3. Fill in the Name, Type, and Format controls in the Create tab. For RSACryptoProvider, choose eas_cfg_rsa_key_manager in the Type drop-down list.
4. Click Next.
5. Fill in the textboxes in the Info tab.
• In the Encryption service section, you must provide, in the Name field, the exact same name as that of the .properties file. The Java Class field should contain the name of the RSA implementation class. For RSACryptoProvider, the Java class is com.emc.documentum.eas.crypt.rsa.rkm.RSADpmCryptoProvider.
• In the RSA RKM section, the Site value should match the site value set in the global configuration (eas_cfg_config) object. The default value is A.
6. Click Next.
7. In the Permission tab, ensure dm_world has the Read permission.

The properties of the created object are as follows:

Property Name | DA Label | Description
object_name | Name (Info tab) | RSACryptoProvider; must be identical to the file name (without the file name extension) of the .properties file in the external/resources folder.
eas_java_class | Java Class | Fully qualified name of the Java class implementing the InfoArchive encryption interface. For RSACryptoProvider, it is com.emc.documentum.eas.crypt.rsa.rkm.RSADpmCryptoProvider.
eas_node_site | Site | Must match the eas_site property of the eas_cfg_config object.
eas_node_proximity | Proximity | Proximity from the site of the RSA DPM server; should be an integer greater than 0. At runtime, the crypto provider first tries to connect to the RSA DPM server with the lowest proximity value.
eas_node_host | Host | The name of the DPM appliance host.
eas_node_port | Port | The DPM appliance port.

eas_cfg_aic_crypto

This is the parent type of eas_cfg_holding_crypto. You can use the child type, eas_cfg_holding_crypto, as the AIC for queries (search and order). Moreover, you can create a specific eas_cfg_aic_crypto object to accommodate custom requirements.
You can configure eas_cfg_aic_crypto properties on the Collection tab of the eas_cfg_holding_crypto object. Specify the following properties:

Property Name | DA Label | Description
eas_name | Name | The name of the configuration.
eas_cfg_crypto_provider | Cfg crypto provider | The crypto provider to use for the holding.
eas_pdi_crypto_query_role | Encrypted structured data query role | The role necessary to use encrypted criteria in a query. If blank, no specific role is needed.
eas_ci_crypto_access_role | Encrypted content accessor role | The role necessary to access decrypted content. If left blank, no specific role is needed.
eas_pdi_crypto_access_role | Encrypted structured data accessor role | The role necessary to access decrypted data in a query result. If blank, no specific role is needed.
eas_dip_crypto_enabled | Encrypt order result | Flag indicating whether order results are encrypted.
eas_dip_crypto_always | Always encrypt order result | Flag indicating whether to encrypt order results even if the result contains no encrypted data.
eas_dip_crypto_key_class | Order result key class | Key class used to encrypt order results.
eas_dip_crypto_parameters | Order result crypto parameters | Specific encryption parameters for order results. The RSA Encryption Header section provides an example of using cryptographic parameters.

eas_cfg_holding_crypto

The eas_cfg_holding_crypto object holds properties related to the encryption of a holding. You need to specify the following properties:

Property Name | DA Label | Description
eas_name | Name | The name of the encryption holding object; for example, crypto.PhoneCalls.
eas_cfg_crypto_provider | Cfg crypto provider | The name of the encryption provider; for example, RSACryptoProvider.
eas_pdi_schema | Schema | Name of an XML schema in which this holding can ingest structured data.
eas_cfg_pdi_crypto | Cfg metadata crypto | eas_cfg_pdi_crypto and eas_pdi_schema are coupled values.
When a SIP whose schema is the value of eas_pdi_schema is received, the corresponding eas_cfg_pdi_crypto is used to override the PDI ingestion configuration.
eas_delete_on_error | Delete on error | Flag indicating whether data files should be deleted if an error is encountered during ingestion.
eas_sip_crypto_enabled | Encrypt reception | Set this flag to FALSE to disable the encryption of received files.
eas_sip_crypto_key_class | Reception key class | Name of the key class for encrypting received files.
eas_sip_crypto_parameters | Reception crypto parameters | Parameters to pass to the class implementing the encryption of the SIP data; for example, a parameter that specifies whether to include the RSA header in the encrypted value in the RSA RKM implementation. The RSA Encryption Header section provides an example of using cryptographic parameters.
eas_pdi_crypto_enabled | Encrypt structured data | Set this flag to FALSE to disable the encryption of the PDI data.
eas_pdi_crypto_key_class | Metadata key class | Name of the key class for the encryption of the PDI data.
eas_pdi_crypto_parameters | Metadata crypto parameters | Parameters to pass to the class implementing the encryption of the PDI data; for example, a parameter that specifies whether to include the RSA header in the encrypted value in the RSA RKM implementation. The RSA Encryption Header section provides an example of using cryptographic parameters.
eas_ci_crypto_enabled | Encrypt content | Set this flag to FALSE to disable the encryption of unstructured content.
eas_ci_crypto_key_class | Content key class | Name of the key class known by the crypto provider for encrypting unstructured content files.
eas_ci_crypto_parameters | Content crypto parameters | Parameters to pass to the class implementing the encryption of CI; for example, a parameter that specifies whether to include the RSA header in the encrypted value in the RSA RKM implementation.
The RSA Encryption Header section provides an example of using cryptographic parameters.

eas_cfg_pdi_crypto

Similar to the eas_cfg_pdi type, eas_cfg_pdi_crypto configures parameters for encrypted ingestion with an imported XML file. If encryption is enabled, both the eas_cfg_pdi and eas_cfg_pdi_crypto configurations are applied to ingestion. If there is any overlap, the configuration held by the eas_cfg_pdi_crypto object overrides that of the eas_cfg_pdi object. The imported XML file has the following structure:

<datas>
  <data id="pdi.xdb.encryption">...</data>
  <data id="pdi.xdb.importer">...</data>
  <data id="set.schema">...</data>
  <data id="ci.encryption">...</data>
  <data id="pdi.index.creator">...</data>
</datas>

pdi.xdb.encryption

The pdi.xdb.encryption block is used to determine the elements in the structured data to encrypt during ingestion. Encryption of structured data occurs before it is imported into xDB, which prevents unauthorized access at the xDB level. The following XML block configures the CustomerLastName and RepresentativeID elements in eas_pdi.xml to be encrypted during the ingestion process.

<data id="pdi.xdb.encryption">
  <prefix>ns1</prefix>
  <namespace>urn:x-emc:eas:schema:pdi</namespace>
  <paths>
    <path>/{urn:eas-samples:en:xsd:phonecalls.1.0}Calls
      /{urn:eas-samples:en:xsd:phonecalls.1.0}Call
      /{urn:eas-samples:en:xsd:phonecalls.1.0}CustomerLastName
    </path>
    <path>/{urn:eas-samples:en:xsd:phonecalls.1.0}Calls
      /{urn:eas-samples:en:xsd:phonecalls.1.0}Call
      /{urn:eas-samples:en:xsd:phonecalls.1.0}RepresentativeID
    </path>
  </paths>
</data>

This is an example of PDI structured data before encryption.
<RepresentativeID>000502</RepresentativeID>

After encryption, a hash value and an encrypted element value are generated:

<RepresentativeID ns1:hash="wPTEW63YwTeh1dKn8kd+UJKSIDPsC/B9pRA9rwD39ag=">MDAwNTAycGRpa2V5X0FfNw==</RepresentativeID>

Note: Structured data encryption has the following limitations:
1. The text value and an attribute value of an XML element cannot both be encrypted.
2. In an XML element, only one attribute can be encrypted.
3. The namespace must be set for every single XML element.
4. The namespace is optional for attributes. If a namespace is declared on an attribute, you must also declare the namespace in the PDI XML.

pdi.index.creator

If an encrypted element in the PDI XML file is to be indexed, the index should target the hashed value. For example, RepresentativeID is encrypted as follows:

<RepresentativeID ns1:hash="wPTEW63YwTeh1dKn8kd+UJKSIDPsC/B9pRA9rwD39ag=">MDAwNTAycGRpa2V5X0FfNw==</RepresentativeID>

In the XML content of the eas_cfg_pdi object, the pdi.index.creator section specifies a path to RepresentativeID as follows:

<path>/{urn:eas-samples:en:xsd:phonecalls.1.0}Calls/{urn:eas-samples:en:xsd:phonecalls.1.0}Call[{urn:eas-samples:en:xsd:phonecalls.1.0}RepresentativeID<LONG>]</path>

In the XML content of the eas_cfg_pdi_crypto object, the pdi.index.creator section should instead target the hash attribute rather than the RepresentativeID text value:

<path>/{urn:eas-samples:en:xsd:phonecalls.1.0}Calls/{urn:eas-samples:en:xsd:phonecalls.1.0}Call/{urn:eas-samples:en:xsd:phonecalls.1.0}RepresentativeID[@{urn:x-emc:eas:schema:pdi}hash<STRING>]</path>

Note: The pdi.index.creator defined by the eas_cfg_pdi_crypto object overrides the one defined by the eas_cfg_pdi object. Therefore, you must declare all indexes, including those on elements that are not encrypted.
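The relationship between the stored hash attribute and the encrypted text value can be sketched as follows. This is an illustrative Python model only: the hash algorithm, salt handling, and the stand-in cipher are assumptions, not the actual ICryptoProvider implementation. The point it demonstrates is that queries and indexes match on a deterministic hash of the plaintext, while the element value itself is stored encrypted.

```python
import base64
import hashlib

def protect_element(plaintext: str, hash_salt: bytes, encrypt) -> dict:
    """Model of encrypting a PDI element value while keeping it queryable.

    The hash attribute is a deterministic, salted digest of the plaintext,
    so an index built on it (pdi.index.creator) can answer Equal/NotEqual
    queries without decrypting. The element text becomes the encrypted value.
    """
    digest = hashlib.sha256(hash_salt + plaintext.encode()).digest()
    return {
        "hash": base64.b64encode(digest).decode(),        # indexed attribute
        "value": base64.b64encode(encrypt(plaintext.encode())).decode(),
    }

# Stand-in cipher for the sketch (a real provider would call its key manager).
fake_encrypt = lambda data: data[::-1]

a = protect_element("000502", b"salt", fake_encrypt)
b = protect_element("000502", b"salt", fake_encrypt)
assert a["hash"] == b["hash"]   # same plaintext and salt -> same hash
```

Because the hash is deterministic for a given salt, an Equal query on the encrypted field can be answered against the hash index; anything beyond equality comparison is not possible, which is consistent with the operator restrictions described later.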
pdi.xdb.importer

After encryption is enabled, a PDI XML file containing the encrypted structured data is created during the ingestion process and then imported into xDB. The following XML block must be configured in order to import the encrypted PDI XML into xDB.

<data id="pdi.xdb.importer">
  <pdi.env.name>pdi.file.xml.metadata.encrypted</pdi.env.name>
</data>

set.schema

The encrypted PDI XML may have a different structure than the regular one, so the schema for the regular PDI XML may not be applicable to the encrypted one. This block defines a schema for the encrypted PDI XML:

<data id="set.schema">
  <name>urn:eas-samples:en:xsd:phonecalls:crypto.1.0</name>
</data>

ci.encryption

Unstructured content files can contain sensitive data as well. You can also encrypt content files during the ingestion process. Content encryption usually happens after compression. The following XML block defines the process of selecting content files from a table of contents (eas_ri.xml) file.

<data id="ci.encryption">
  <select.query>
    declare namespace ri = "urn:x-emc:eas:schema:ri";
    let $toc as document-node(element(ri:ris)) := doc(concat('eas_aip_id_', xhive:metadata(., 'eas_aip_id'), '.ri'))
    return $toc/ri:ris/ri:ri/@pdi_key[contains(., 'mp3')]/string(.)
  </select.query>
</data>

The following code shows an XML block in the table of contents after encryption is enabled.

<ris xmlns="urn:x-emc:eas:schema:ri">
  <ri seqno="1" pdi_key="recording1.mp3" schemaVersion="1.0" audit="recording1.mp3">
    <step>
      <ci mime_type="audio/x-mpeg" dctm_format="mp3" size="41123"/>
    </step>
    <step>
      <crypto size="41136"/>
    </step>
    <step>
      <container position="0" size="41136"/>
    </step>
  </ri>
</ris>

eas_query_config

On the query configuration object side, no new attributes need to be added to the existing eas_cfg_query object.
You need to modify the imported XML of the eas_cfg_query object to include information about how the encryption schema must be handled. For example, the XML content of the query configuration object for the sample PhoneCalls holding is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<request-configs xmlns:n="urn:eas-samples:en:xsd:phonecalls.1.0" xmlns:eas="urn:emc:eas">
  <query-prolog>
    declare namespace f = "urn:x-emc:eas:functions";
    declare namespace pdi = "urn:x-emc:eas:schema:pdi";
    declare function eas:decrypt-tree($node as node(), $aipid as xs:string) as node()?
    {
      typeswitch ( $node )
        case element(n:CustomerLastName) return
          element { node-name($node) } { f:decrypt($node/text(), $aipid) }
        case element(n:RepresentativeID) return
          element { node-name($node) } { f:decrypt($node/text(), $aipid) }
        case element() return
          element { node-name($node) } { $node/@*, $node/node()/eas:decrypt-tree(., $aipid) }
        default return $node
    };
  </query-prolog>
  <request-config schema="urn:eas-samples:en:xsd:phonecalls.1.0">
    <entity>
      <path>/n:Calls/n:Call</path>
      <order-by>n:CallStartDate</order-by>
    </entity>
    <param name="CustomerID" type="decimal" index="true">
      <path>n:CustomerID</path>
    </param>
    <param name="CallStartDate" type="date-time" index="true">
      <path>n:CallStartDate</path>
    </param>
    <param name="CustomerLastName" type="string" index="true">
      <path>n:CustomerLastName</path>
    </param>
    <param name="RepresentativeID" type="decimal" index="true">
      <path>n:RepresentativeID</path>
    </param>
    <param name="CustomerFirstName" type="string" index="true">
      <path>n:CustomerFirstName</path>
    </param>
  </request-config>
  <request-config schema="urn:eas-samples:en:xsd:phonecalls:crypto.1.0">
    <entity>
      <path>/n:Calls/n:Call</path>
      <order-by>n:CallStartDate</order-by>
    </entity>
    <param name="CustomerID" type="decimal" index="true">
      <path>n:CustomerID</path>
    </param>
    <param name="CallStartDate" type="date-time" index="true">
      <path>n:CallStartDate</path>
    </param>
    <param
name="CustomerLastName" type="hashed" index="true">
      <path>n:CustomerLastName/@pdi:hash</path>
    </param>
    <param name="RepresentativeID" type="hashed" index="true">
      <path>n:RepresentativeID/@pdi:hash</path>
    </param>
    <param name="CustomerFirstName" type="string" index="true">
      <path>n:CustomerFirstName</path>
    </param>
    <query-template xmlns:xhive="http://www.x-hive.com/2001/08/xquery-functions">
      for $entity in <select/>
      let $root := root($entity)
      let $aipid := xhive:metadata($root, 'eas_aip_id')
      return eas:decrypt-tree($entity, $aipid)
    </query-template>
  </request-config>
</request-configs>

Adding Properties for the Holding Object

You must modify the regular holding configuration object (eas_cfg_holding) to accept the encrypted received file format by adding the following attributes:

Property Name | DA Label | Value
eas_sip_format | Received file format | eas_sip_zip_crypto
eas_cfg_ingest | Cfg ingest process | eas_ingest_sip_zip-ci or eas_ingest_sip_zip-pdi

Auto Populating Properties for Runtime Objects

New attributes are automatically populated for InfoArchive runtime objects. If objects are created after the encryption option is enabled for a holding, you can view the auto-populated attributes in their properties dialogs.
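The recursive decryption performed by the eas:decrypt-tree function shown earlier in the query prolog can be mimicked in Python. This is a hedged illustration only: xml.etree stands in for xDB, a reversible stand-in replaces the f:decrypt function, and the element names follow the PhoneCalls sample.

```python
import xml.etree.ElementTree as ET

NS = "urn:eas-samples:en:xsd:phonecalls.1.0"
ENCRYPTED = {f"{{{NS}}}CustomerLastName", f"{{{NS}}}RepresentativeID"}

def fake_decrypt(text: str) -> str:
    # Stand-in for f:decrypt($node/text(), $aipid); a real provider would
    # call the configured crypto provider with the AIP's key material.
    return text[::-1]

def decrypt_tree(node: ET.Element) -> ET.Element:
    """Rebuild the tree, decrypting the text of the configured elements
    and recursing into all others, like eas:decrypt-tree."""
    out = ET.Element(node.tag, node.attrib)
    if node.tag in ENCRYPTED:
        out.text = fake_decrypt(node.text or "")
        return out
    out.text = node.text
    for child in node:
        out.append(decrypt_tree(child))
    return out

call = ET.fromstring(
    f'<Call xmlns="{NS}"><RepresentativeID>205000</RepresentativeID></Call>')
decrypted = decrypt_tree(call)
```

As in the XQuery version, only the elements named in the configuration are transformed; all other elements pass through with their attributes and children intact.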
Object | New Properties
eas_aip:
• eas_pdi_crypto_key_id
• eas_pdi_crypto_iv
• eas_pdi_crypto_hash_salt
• eas_pdi_crypto_propbag
• eas_ci_crypto_key_id
• eas_ci_crypto_iv
• eas_ci_crypto_propbag
• eas_sip_crypto_key_id
• eas_sip_crypto_iv
• eas_sip_crypto_propbag
eas_order:
• eas_cfg_aic_crypto
• eas_cfg_aic_crypto_id
• eas_cfg_crypto_provider
• eas_cfg_crypto_provider_id
• eas_dip_crypto
• eas_dip_crypto_key_id
• eas_dip_crypto_iv
• eas_dip_crypto_propbag
• eas_crypto_encoding
• eas_pdi_decrypt_cnt
• eas_ci_decrypt_cnt
eas_aip_parent:
• eas_cfg_crypto_provider
• eas_cfg_pdi_crypto
• eas_cfg_pdi_crypto_version
• eas_crypto_encoding
• eas_pdi_crypto_hash_algo
eas_open_aip_parent:
• eas_aggr_crypto_mode
• eas_aggr_crypto_encoding
• eas_aggr_pdi_crypto_halgo
• eas_aggr_pdi_crypto_hsalt
• eas_aggr_pdi_crypto_key_id
• eas_aggr_ci_crypto_key_id
• eas_aggr_sip_crypto_key_id
• eas_aggr_pdi_crypto_iv
• eas_aggr_ci_crypto_iv
• eas_aggr_sip_crypto_iv
• eas_aggr_pdi_crypto_propbag
• eas_aggr_ci_crypto_propbag
• eas_aggr_sip_crypto_propbag

RSA Encryption Header

The RSA encryption header is an RSA encryption parameter. When you use a header, RSA stores description data (key ID and IV) at the encrypted value level, so the encrypted value itself contains all the information required for decryption; there is no need to access AIP objects to retrieve key ID and IV values. You set the following parameter properties according to your encryption requirements:
• eas_sip_crypto_parameters
• eas_pdi_crypto_parameters
• eas_ci_crypto_parameters
• eas_dip_crypto_parameters

Valid parameter values are as follows:

Value | Description
HEADER=Default | Use the most recent header version when protecting; automatically determine the header type when processing.
HEADER=Version 1.5 | Use a Key Manager Java Client Version 1.5 compatible header.
HEADER=Version 1.5 Base64 | Use a Key Manager Java Client Version 1.5 compatible header with Base64 encoding.
HEADER=Version 2.1 | Use a Key Manager Java Client Version 2.1 header.
HEADER=Version 2.7 | Use a Key Manager Java Client Version 2.7 header.
HEADER=None | No header is used.

For example, you can set the eas_pdi_crypto_parameters property value to HEADER=Default for the PhoneCalls holding. In the PDI XML file, the Representative ID is then encrypted as follows:

<RepresentativeID ns1:hash="ZXio5+YN3cY1/DVee3qxuR3qQAo7hlXflX+wvA5jusk=">
UktNQzI3MAD/////AAAABW11aWQAAAAAIFJlFFDkcl+lzIeFd4txg4eLP3RpZupcPRvY+Z4REoUR
/////wAAAANpdgAAAAAQeAuiLT5AE9A4QeIZOVw3HP////8AAAAFY3N1bQAAAAAgSd8DOFGOoxBy
ao6Q5yjCxTihQ8JM1LdWukRsGh5+fmGritDkwyKAZxV4o02hfoRO</RepresentativeID>

If you try to retrieve the Representative ID from a web application, the application does not need to access the corresponding AIP object. Instead, the application retrieves the encrypted value, decrypts it based on the header information contained in the value itself, and returns the decrypted value.

Operator Restrictions for Querying Encrypted Data

When you build a query on encrypted data, you must observe the following operator rules for single-value and multi-value queries:

Operator | Single-value Query | Multi-value Query
Equal | Allowed | Not allowed
NotEqual | Allowed | Not allowed
Other operators | Not allowed | Not allowed

Configuring Synchronous Ingestion

Synchronous ingestion configuration involves defining the custom AIP assignment policy and configuring the AIP parenting policy. You must also enable synchronous ingestion both at the access node and at the holding level. The AIP parenting policy is configured through an eas_cfg_aip_parent_policy configuration object and referenced by the eas_cfg_holding configuration object of the holding to which the policy is applied. One AIP parenting policy can be applied to multiple holdings.
The AIP assignment policy is defined using XQuery in a text file and imported into the eas_cfg_aip_parent_policy object as its content. You activate synchronous ingestion by configuring the eas_cfg_access_node and eas_cfg_holding objects.

Defining the Custom AIP Assignment Policy

You define the AIP assignment policy using XQuery in a text file to be imported into the eas_cfg_aip_parent_policy object as its content. You can use any information present in the SIP descriptor, including custom metadata, in the XQuery to define the assignment logic. For example, you can use the retention date (which is most commonly used) or the holding as the condition for assigning AIPs. Here is an example of the assignment policy XQuery expression:

xquery version "1.0" encoding "utf-8";
declare namespace n = "urn:x-emc:eas:schema:sip:1.0";
let $xdbMode := /n:sip/n:dss/n:entity/text()
let $aipMode := /n:sip/n:dss/n:application/text()
let $d as xs:dateTime := xs:dateTime(/n:sip/n:dss/n:base_retention_date/text())
let $quarter as xs:integer := xs:integer((ceiling(month-from-dateTime($d) div 3)))
let $tz as xs:dayTimeDuration := timezone-from-dateTime($d)
let $start as xs:dateTime := adjust-dateTime-to-timezone(xs:dateTime(concat(year-from-dateTime($d),"-01-01T00:00:00")),$tz)
let $nextQuarter as xs:dateTime := $start + xs:yearMonthDuration(concat("P",string(($quarter*3)),"M"))
return <policy partitioning_key="{year-from-dateTime($d)}-Q{$quarter}" close_hint_date="{$nextQuarter}" aip_mode="{$aipMode}" xdb_mode="{$xdbMode}"/>

In this example, the quarter-based assignment logic is used and the base retention date is used to compute the following values:
• partitioning_key="{year-from-dateTime($d)}-Q{$quarter}": a string value using the YYYY-Qn pattern for the year and quarter number of the base retention date
• close_hint_date="{$nextQuarter}": a datetime value corresponding to the first day of the next quarter
Those computed values are returned
as values of the partitioning_key and close_hint_date attributes of a policy element expected by InfoArchive. The close_hint_date attribute is mandatory only if close mode 1 is applied. The AIP and xDB modes to apply can also be dynamically determined based on the SIP descriptor by returning the optional aip_mode and xdb_mode attributes of the policy element. However, the returned combination must be configured in the eas_cfg_aip_parent_policy configuration object. Here is an example of the policy element returned by the XQuery expression:

<policy partitioning_key="2012-Q2" close_hint_date="2014-01-01T12:00:00.000+01:00"/>

Configuring the AIP Parenting Policy

1. In DA, create an object of type eas_cfg_aip_parent_policy by importing the AIP assignment XQuery text file as its content.
2. Configure the eas_cfg_aip_parent_policy object properties.

eas_name (DA label: Name)
  Name of the AIP parenting policy.

eas_default_aip_mode (DA label: Default AIP mode)
  The default AIP mode to apply for creating the eas_aip repository object:
  • 1: Creation of a materialized eas_aip object (as in batch ingestion)
  • 2: Creation of a lightweight eas_aip object attached to a shared eas_aip_parent parent object
  • 3: Same as mode 2, but when the parent object is closed, its lightweight objects are aggregated into a single materialized eas_aip object, and the lightweight objects and the parent object are pruned
  The result returned by the AIP assignment policy XQuery overrides the default AIP mode.
eas_default_xdb_mode (DA label: Default xDB mode)
  The default xDB mode to apply when ingesting structured data in xDB:
  • 0: The mode configured at the archive holding level is applied (see property eas_cfg_holding.eas_xdb_mode)
  • 1: Metadata files of all ingested AIPs are stored directly in a designated xDB detachable library
  • 2: For each ingested AIP, a new sub-library is created in a designated xDB detachable library and metadata files are stored in the sub-library
  • 3: As in xDB mode 2, AIP metadata files are stored in the sub-library exclusively created for each AIP. However, AIP sub-libraries are organized, according to a defined assignment policy, in a pool of libraries (called pooled libraries, which is purely an InfoArchive concept and not a library type in xDB) created in a designated xDB detachable library. The pooled libraries logically pertain to a configured library pool.
  The result returned by the AIP assignment policy XQuery overrides the default xDB mode.
  Note: The parameters associated with the mode are read from the archive holding configuration. The parameters required by a selectable mode must be set at the archive holding level.

eas_aip_aggr_staging_store (DA label: Staging store of the AIP creation mode 3)
  Only needed if AIP creation mode 3 is applied:
  • Contents associated with the lightweight AIPs are all imported into this transient storage area (the storage areas designated in the archive holding configuration are ignored).
  • When the AIP parent is closed, the contents associated with the lightweight AIPs are retrieved and aggregated, and those aggregates are imported as contents of the materialized AIP into the storage areas designated in the archive holding configuration. The lightweight AIPs are then marked for later pruning, in order to reclaim their associated RDBMS and storage area space.
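For illustration, the quarter-based computation performed by the sample assignment policy XQuery shown earlier can be sketched in Python. This is a hypothetical helper for understanding the logic only; InfoArchive expects the policy itself to be written in XQuery.

```python
import math
from datetime import datetime

def assign_partition(retention_date):
    """Illustrative sketch of the quarter-based assignment logic.

    Mirrors the sample XQuery policy: the partitioning key uses the
    YYYY-Qn pattern, and the close hint date is the first day of the
    quarter following the base retention date's quarter.
    """
    quarter = math.ceil(retention_date.month / 3)        # 1..4
    months_to_next_quarter = quarter * 3                 # offset from January 1
    year_offset, month_index = divmod(months_to_next_quarter, 12)
    close_hint = datetime(retention_date.year + year_offset, month_index + 1, 1)
    return {
        "partitioning_key": "%d-Q%d" % (retention_date.year, quarter),
        "close_hint_date": close_hint.isoformat(),
    }
```

For a base retention date of 2012-05-10, this yields the partitioning key 2012-Q2 and a close hint date of 2012-07-01, matching the behavior of the XQuery example.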
The repeating attributes listed below form a matrix of all possible combinations of AIP parent quota settings, AIP parent close settings, and confirmation event options for different AIP mode and xDB mode pairs. AIP parent quotas determine when the ingestion service automatically creates a new AIP parent shareable object for newly ingested packages. The AIP parent close mode (eas_param_close_mode) sets the conditions for closing the AIP parent.

eas_param_aip_quota (DA label: AIP quota per parent)
  Maximum number of AIPs per AIP parent. The value zero indicates an unlimited number.

eas_param_aiu_quota (DA label: AIU quota per parent)
  Maximum number of AIUs per AIP parent. The value zero indicates an unlimited number.

eas_param_ci_quota (DA label: CI count quota per parent)
  Maximum number of contents per AIP parent (sum of the number of contents of the associated lightweight AIPs). The value zero indicates an unlimited number.

eas_param_ci_size_quota (DA label: CI size quota per parent)
  Maximum content size per AIP parent (sum of the content sizes of the associated lightweight AIPs). The value zero indicates an unlimited size.

eas_param_close_mode (DA label: AIP parent close mode)
  Sets the conditions for closing the AIP parent:
  • 0: Never close the parent unless a close request is manually issued
  • 1: Automatically close the parent when the following condition is met: current date >= close hint date + eas_close_period. The close hint date must be returned by the XQuery statement; if the date is not returned, an ingestion error occurs.
  • 2: Automatically close the parent when the following condition is met: current date >= parent opening date + eas_close_period
  • 3: Automatically close the parent when the following condition is met: current date >= last ingestion date of the parent + eas_close_period
  The eas_close job is responsible for closing AIP parent shareable objects when the conditions are met.
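The close-mode conditions can be summarized in a short Python sketch. The function and parameter names here are hypothetical; in InfoArchive the evaluation is performed internally by the eas_close job.

```python
from datetime import datetime, timedelta

def parent_should_close(close_mode, close_period_days, now,
                        close_hint_date=None, opening_date=None,
                        last_ingestion_date=None):
    """Illustrative evaluation of the eas_param_close_mode conditions."""
    period = timedelta(days=close_period_days)
    if close_mode == 0:
        # Mode 0: close only on a manual close request
        return False
    if close_mode == 1:
        # Mode 1: based on the close hint date returned by the XQuery policy
        if close_hint_date is None:
            raise ValueError("close mode 1 requires a close hint date")
        return now >= close_hint_date + period
    if close_mode == 2:
        # Mode 2: based on the parent opening date
        return now >= opening_date + period
    if close_mode == 3:
        # Mode 3: based on the last ingestion date of the parent
        return now >= last_ingestion_date + period
    raise ValueError("unknown close mode: %d" % close_mode)
```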
Regardless of the AIP parent close mode you choose, the eas_close job always closes an open AIP parent when a manual close request is received. The close request, parent opening date, close hint date, and last ingestion date are stored as properties of the eas_open_aip_parent_rel relation associated with the parent.

eas_param_close_period (DA label: Close period (d))
  Period, expressed in days, used according to the close mode.

eas_param_conf_rec_disabled (DA label: Receipt conf. disabled)
  When set, the receipt event of the ingested AIP is not processed by the confirmation job (this exclusion is done by assigning the ingestion date to the eas_conf_receive_date attribute of the AIP). When no receipt confirmation needs to be generated, setting this flag dramatically decreases the execution time of the confirmation job when a large number of synchronously ingested AIPs is expected.

eas_param_conf_ing_disabled (DA label: Archival conf. disabled)
  Whether the receipt event type will be processed by the confirmation (eas_confirmation) job. When this option is set to true, the receipt event timestamp is still assigned to the eas_conf_receive_date property, but no confirmation messages are generated for storage events. Setting this option to true significantly reduces the execution time of the confirmation job, especially with large numbers of synchronously ingested AIPs.

3. Configure the holding to which you want to apply the AIP parenting policy. Edit the eas_cfg_holding object properties and, under the Holding tab, specify the AIP parenting policy name in the Cfg AIP parenting policy for sync ingest field.

Enabling Synchronous Ingestion

To enable synchronous ingestion, you need to activate the synchronous ingestion service at both the access node and the holding level.

1. Activate the synchronous ingestion service at the access node level.
In DA, locate the access node (e.g., /System/EAS/Nodes/access_node_01), which is the application server instance hosting the InfoArchive web services. Edit its properties and, under the Ingestion tab, select the SIP ingestion enabled option.

Note: You need to restart the web services for the change to take effect.

2. Activate the synchronous ingestion service at the holding level.

Configure the holding properties and, under the Holding tab, select the Synchronous ingestion enabled option at the bottom. This change takes effect immediately.

Configuring Query

There are two types of queries: synchronous search, which returns query results in real time, and order (asynchronous search), which creates search orders and returns search results with a delay. Both types of queries are configured through a query configuration object (eas_cfg_query) and a query quota object (eas_cfg_query_quota). Order requires the configuration of an additional order configuration object (eas_cfg_order) and an order node configuration object (eas_cfg_order_node).

Note: An order node does not correspond to a single holding. It works globally for all the holdings configured in the system.

• Query configuration object (eas_cfg_query)

The query configuration object (eas_cfg_query) defines:
— Which Archival Information Collection (AIC, each corresponding to a holding) to search (can be configured to search one or more specified holdings)
— What search criteria can be used (through the XML content of the query configuration object)
— What data in the XML PDI can be returned

Optionally, the query configuration object (eas_cfg_query) also lets you dynamically adjust the returned XML result.

• Query quota configuration object (eas_cfg_query_quota)

The query quota configuration object (eas_cfg_query_quota) defines:
— The maximum search range—the number of AIPs and/or AIUs—that can be searched. Data outside the defined scope will not be searched.
— The maximum number of results that can be returned. Search execution stops after this number is reached and the search results are returned.

Only one query quota is allowed for each query configuration, but several query configurations can be defined for an archive holding. You can apply different ACLs to query quota configuration (eas_cfg_query_quota) objects so that different users and groups have different query quotas applied to their searches. A query quota configuration object (eas_cfg_query_quota) can be configured for search and order, as well as for a delivery channel, which defines where the query results will be returned.

• Order configuration object (eas_cfg_order)

The order configuration object is applicable to delivery channels, and it applies ACLs so that orders are accessible only to users with appropriate privileges.

• Order node configuration object (eas_cfg_order_node)

The order node configuration object configures how the order processor works and is used when an order processor is started.

Configuring a Query Quota Configuration (eas_cfg_query_quota) Object

In DA, create an object of type eas_cfg_query_quota in the holding folder (/System EAS/Archive Holdings/MyHolding) and configure its properties.

eas_name (DA label: Name)
  Technical name of the query quota configuration.

eas_query_applicable (DA label: Valid for search (synchronous))
  Whether the quotas are applicable to synchronous searches.

eas_order_applicable (DA label: Valid for orders (asynchronous))
  Whether the quotas are applicable to orders.

eas_max_duration (DA label: Max. duration)
  Not implemented; reserved for future use.

eas_aip_quota (DA label: AIP quota)
  The maximum number of AIPs allowed for a search range. This value is generally much greater for an order than for a synchronous search.

eas_result_cnt_quota (DA label: # results quota)
  Maximum number of results that can be returned by the query. This value is generally much greater for an order than for a synchronous search.
eas_result_size_quota (DA label: Result size quota)
  Not implemented; reserved for future use.

eas_superuser_applied (DA label: Applied to superusers)
  Since superuser Documentum accounts have Read access to all configuration objects, this flag is used to indicate that this configuration should be applied to superusers. However, it is recommended not to configure a Documentum account that has superuser privileges, since such accounts are used by the web services (refer to the eas_cfg_access_node type).

eas_delivery_channel (DA label: Delivery channels)
  One or more delivery channels for which the quota configuration is applicable.

Defining the Search Criteria and Search Result

You define the search criteria by creating them in an XML file and importing it into the repository as the content of the query configuration object (eas_cfg_query). The outline of the query configuration XML is as follows. Elements in square brackets [] are optional.

<?xml version="1.0" encoding="UTF-8"?>
<request-configs>
  [<query-prolog> ... </query-prolog>]
  <request-config schema="URN of the SIP Metadata XML">
    <entity> ... </entity>
    <param> ... </param>
    ...
    [<query-template> ... </query-template>]
    ...
  </request-config>
  ...
</request-configs>

• query-prolog (Required: No)
  Defines custom XQuery functions that contain custom query processing logic. The text of the element is dynamically inserted as a prolog into the XQuery expression defined in the query configuration, and the XQuery functions defined within the prolog can be referenced in the query-template element.
• request-config (Required: Yes)
  Defines search criteria.
• query-template (Required: No)
  Defines query results.

Configuring Query for Unstructured Data

Unstructured data is stored in the eas_ci_container rendition of an AIP (eas_aip) object. In order to retrieve an unstructured data file from an eas_ci_container rendition, you must first get the AIP ID—eas_aip_id of the AIP (eas_aip) object—and the sequence number of the content file within the AIP that was assigned during ingestion.
The unstructured content ID can be constructed from the eas_aip_id metadata in xDB (as shown below) and the sequence number in the TOC. The eas_aip_id value can be retrieved using the xhive:metadata xDB function.

To enable a client application to issue queries that can retrieve unstructured data files:

• The XQuery must be written to obtain the AIP ID and the sequence number.
• The XQuery results must be modified to insert the AIP ID and the sequence number as attributes of the element that contains the name of the unstructured data file.

In the following example, a custom XQuery function named eas:add-aip-seqno is defined within the query-prolog XML block.

<query-prolog>
declare function eas:add-aip-seqno($node as node(),
    $aip_id as xs:string, $ri_uri as xs:string) as node()?
{
  typeswitch ( $node )
    case element(n:FileName) return
      element { node-name($node) }
      {
        attribute {xs:QName('eas:seqno')}
          { doc($ri_uri)/ris/ri[@pdi_key eq $node/text()][1]/@seqno },
        attribute {xs:QName('eas:aipid')} {$aip_id},
        $node/node()
      }
    case element() return
      element { node-name($node) }
      { $node/@*, $node/node()/eas:add-aip-seqno(., $aip_id, $ri_uri) }
    default return $node
};
</query-prolog>

As its name indicates, the eas:add-aip-seqno function appends eas:seqno and eas:aipid attributes to each FileName element in the node hierarchy passed as an argument. The value of the inserted eas:seqno attribute is the sequence number of the content associated with the file name present in the table of contents. The eas:aipid is the AIP identifier passed as an argument.
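The lookup that eas:add-aip-seqno performs can be mirrored in a short Python sketch. The function name and data shapes below are hypothetical; in InfoArchive this logic runs as XQuery against the TOC document.

```python
def add_aip_seqno(file_names, toc, aip_id):
    """Illustrative analogue of the eas:add-aip-seqno XQuery function.

    For each file name, look up its sequence number in the table of
    contents (a list of (pdi_key, seqno) pairs) and attach it together
    with the AIP ID, as the XQuery does with eas:seqno and eas:aipid.
    """
    seqno_by_key = {}
    for key, seqno in toc:
        # Keep the first match for a key, as the XQuery predicate [1] does
        seqno_by_key.setdefault(key, seqno)
    return [
        {"FileName": name,
         "eas:seqno": seqno_by_key.get(name),
         "eas:aipid": aip_id}
        for name in file_names
    ]
```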
To modify the XQuery results, in the query-template XML block, configure an XQuery that processes the default results referenced by the <select/> placeholder, using the function defined in the query-prolog XML block; for example:

<query-template xmlns:ri="urn:x-emc:eas:schema:ri"
    xmlns:xhive="http://www.x-hive.com/2001/08/xquery-functions">
for $Call in <select/>
let $pdi_uri := root($Call)
let $aip_id := xhive:metadata($pdi_uri, 'eas_aip_id')
let $ri_uri := replace(document-uri($pdi_uri),'\.pdi$', '.ri')
return eas:add-aip-seqno($Call, $aip_id, $ri_uri)
</query-template>

In plain English, the XQuery translates into: for each query result,

• Get the structured content file path containing the result
• Get the AIP ID (using the xDB function xhive:metadata)
• Derive the path of the TOC of this AIP
• Return the modified query result (calling the eas:add-aip-seqno function)

The path to the TOC needs to be retrieved because the TOC contains the sequence number for each unstructured content file. The eas:add-aip-seqno function uses this information to add the sequence number to the XQuery results. Here is an example of what the XQuery might return:

<FileName xmlns:eas="urn:x-emc:eas" eas:seqno="1"
    eas:aipid="080f424080002144">recording1.mp3</FileName>

Instead of adding the AIP ID and the unstructured content file sequence number as distinct attributes in the results, they can be returned in a single attribute with the structure AIPID:ci:SequenceNumber. ci stands for Content Information and is a constant value here. To use this approach, in the query-prolog XML block, define the function to add the eas:cid attribute to each FileName element. InfoArchive does not dictate how this value is inserted in the query results.
However, it is good practice to:

• Assign it in a distinct attribute (for example, eas_ci_id) of the element that contains the file name of the unstructured content file
• Associate a designated namespace with this attribute (e.g., urn:x-company:eas)
• Apply a uniform rule across all holdings to facilitate easy configuration and maintenance

This alternative simplifies the structure of the results, which might be something like this:

<FileName xmlns:eas="urn:x-emc:eas"
    eas:cid="080f424080002144:ci:1">recording1.mp3</FileName>

request-config—Defining Search Criteria

The request-config XML block defines the search criteria. At least one request-config block must be defined. You can define multiple request-config blocks when:

• The query covers AIPs using different schemas. Additional request-config blocks are needed in order to define the mapping rules to apply for each schema.
• The query configuration is used for an order and the order results must be sorted. In this case, an additional <request-config> block for the schema URN of the results must be defined for the criteria by which the sorting must be performed.

The following example configures a query for returning the phone calls metadata and the identifiers of their associated content, ordered by call start date. The search criteria can be customer ID, call start date, or representative ID.

<request-config schema="urn:eas-samples:en:xsd:phonecalls.1.0">
  <entity>
    <path>/n:Calls/n:Call</path>
    <order-by>n:CallStartDate</order-by>
  </entity>
  <param name="CustomerID" type="integer" index="true">
    <path>n:CustomerID</path>
  </param>
  <param name="CallStartDate" type="date-time" index="true">
    <path>n:CallStartDate</path>
  </param>
  <param name="RepresentativeID" type="integer" index="true">
    <path>n:RepresentativeID</path>
  </param>
</request-config>

• The schema attribute specifies the XML schema to apply to AIP metadata in xDB.
• The entity element defines the XML block to return for each matching result.
• The order-by element defines the default sorting of the results if the query does not include a sort clause.
• Each param element defines a search criterion.
  — The name attribute is the criterion name in the InfoArchive query.
  — The type attribute is the data type of the element. The valid values are: string, date, dateTime, integer, and double. For example, the data type for CallStartDate is dateTime.
  — The index attribute indicates whether to create an index for the element in xDB. The value of this attribute impacts the structure of the XQuery generated during query execution. In the example, the listed elements will all be indexed in xDB.
  — You can set the AIU ID (eas_aiu_id) or content file ID (eas_ci_id) as a search criterion to reduce the scope of the query or directly access an AIU or content file:

    <param name="eas_aiu_id" type="string" index="true" ancestor_alias="0">
      <path>.//@_eas:id</path>
    </param>
    <param name="eas_ci_id" type="string" index="true" ancestor_alias="0">
      <path>.//@_eas:cid</path>
    </param>

• The path element maps to the path of the XML element relative to the path of the entity element with which the criterion is associated.
• By default, the matching entities are returned “as is”, without any alteration.

Configuring Search Criteria for Multi-Level XML Structures

When eas_pdi.xml has an XML document structure with multiple levels of nested elements, define the search criteria in a top-down manner starting from higher-level elements rather than just traversing the AIU elements at the lowest level. This will significantly improve query performance, especially when the XML structure is deep.
For example, in the following XML structure, the AIU element Call is nested within the Dept element, which is in turn nested within the Calls element:

<Calls>
  <Dept id="1">
    <Call call-id="1"/>
    <Call call-id="2"/>
    <Call call-id="3"/>
  </Dept>
  <Dept id="2">
    <Call call-id="1"/>
    <Call call-id="5"/>
  </Dept>
</Calls>

For optimal query performance, define the search criteria for this XML structure this way:

<request-config>
  <entity>
    <path>/Calls[]/Dept[]/Call</path>
  </entity>
  <param name="dept-id" type="string" ancestor_alias="1">
    <path>@id</path>
  </param>
  <param name="call-id" type="string">
    <path>@call-id</path>
  </param>
</request-config>

In the path element that specifies the path to the AIU element, each nesting level above the AIU element is represented by an empty bracket pair []. The ancestor_alias attribute denotes the nesting level of the element, with the lowest level being 0, the level above being 1, and so on. If not specified, the default ancestor_alias value for a parameter is 0. In the example above, the ancestor_alias values for the elements are as follows (internally interpreted as such):

<paths>
  <path alias="2">/Calls</path>
  <path alias="1">/Dept</path>
  <path alias="0">/Call</path>
</paths>

During query execution, the search criteria defined above are translated into XQuery expressions such as the following:

for $call in /Calls/Dept[@id='2']/Call[@call-id='5'] return $call

Note: The syntax for defining search criteria for multi-level XML structures has the following limitations:

• Two or more search criteria cannot be used together in an OR relation
• Search results cannot be ordered by search criteria defined in this way

Grouping Search Criteria

For some content types, the search configuration must include the definition of a significant number of search criteria below a common sub path. In such situations you can enclose related parameter definitions in the <group> element to avoid repeating the sub path for each criterion.
The example below shows how to group search criteria.

<group>
  <path>op:index</path>
  <param name="settlement" type="date">
    <path>head:settlementDay</path>
  </param>
  <param name="ref-order" type="string" index="true">
    <path>head:orderRef</path>
  </param>
</group>

• A path element is inserted directly under the <group> tag. The value enclosed between <path> tags is the common sub path.
• The settlement and ref-order criteria respectively resolve to op:index/head:settlementDay and op:index/head:orderRef.

As shown in the example below, you can nest group tags.

<group>
  <param name="settlement_rop" type="date" index="true">
    <path>op:cancel-operation/op:index/head:settlementDay</path>
    <path>op:R-operation/op:index/head:settlementDay</path>
  </param>
  <group>
    <path>(op:R-operation | op:cancel-operation)/op:index</path>
    <param name="direction_rop" type="string">
      <path>head:orderDirection</path>
    </param>
    <param name="amount_rop" type="decimal">
      <path>head:amount</path>
    </param>
  </group>
</group>

The usage of the group tag not only enhances the readability of the configuration but also allows you to influence the generation of the XQuery. If a criterion belongs to a group, the generated XQuery also includes all the criteria defined in the group:

• Inclusion of the received filtering condition for the received criterion
• Inclusion of the condition that the other criteria defined in the group exist

This behavior has been retained in order to leverage potential xDB indexes composed of multiple criteria. A side effect of this behavior is that the generated XQuery does not return an AIU if one of the criteria of the group does not exist for this AIU. Because of this behavior, it is recommended to ensure that all the elements associated with the criteria are present in the data.
To summarize, the following rules can be considered depending on the XML element associated with a search criterion:

• If an XML element is mandatory and is not included in a composed index, it makes no difference whether the criterion is included in a group tag.
• If an XML element is optional, the criterion must be isolated in a dedicated group to avoid the side effect mentioned above.
• If the element is included in a composed index, the criterion must be included in a group containing the criteria associated with the other elements of that index.

Searching Multiple Paths

If an element appears at multiple locations in the XML hierarchy, you can include all these tags in a search criterion. This query configuration will perform an OR operation on the tags. The following example shows a repeated <FirstName> in <path> tags.

<param name="Document.RecipientFirstName" type="string" index="false">
  <path>n:Documents/n:Document/n:Recipients/n:Recipient/n:InternalEntity/n:Person/n:FirstName</path>
  <path>n:Documents/n:Document/n:Recipients/n:Recipient/n:ExternalEntity/n:Person/n:FirstName</path>
</param>

query-template—Defining Search Results

The optional query-template element contains the XQuery expression that defines the XML content of the returned query results, which will be formatted by the stylesheet. You can define query results to return not only AIU properties contained in eas_pdi.xml, but also associated AIP properties. In the XQuery expression, you can call the dctm-retriever custom function defined in the query-prolog element to get standard or custom properties (e.g., r_object_id, eas_dss_holding, eas_aip_type, etc.) of the AIPs returned by the query and include them in the query results.
The custom function dctm-retriever takes two arguments: the AIP object ID and the property of the AIP to retrieve; for example:

<query-template>
for $aiu in <select/>
let $pdi_uri := root($aiu)
let $aip_id := xhive:metadata($pdi_uri,'eas_aip_id')
return
  <parent>{
    ( $aiu/@id,
      attribute {xs:QName('object_id')}
        {eas_functions:dctm-retriever($aip_id,'r_object_id')} )
  }</parent>
</query-template>

In this example, the XQuery expression within the query-template element retrieves every AIU (referenced by the <select/> placeholder) and returns each one as a parent element in the query result. The custom function dctm-retriever is used to get the r_object_id attribute for each returned AIP and set it as the object_id attribute of the parent element in the query result. The returned query result looks something like this:

<Calls xmlns="urn:eas-samples:en:xsd:phonecalls.1.0">
  <parent object_id="0800000180002904"/>
  <parent object_id="0800000180002904"/>
</Calls>

The parent element can repeat as many times as the number of AIUs contained in the AIP.

Note: For performance reasons, in the 2-tiered InfoArchive search, the tier-1 search (DQL) does not fetch all of the AIP properties. The dctm-retriever function executes as a part of the tier-2 search and can only retrieve AIP properties already fetched by the DQL query. If the AIP property the dctm-retriever function tries to get is not present in the DQL query result, or if the property belongs to a custom subtype, the function will return an empty result and an exception will occur. For example, if you try to use the dctm-retriever function to get the value of the eas_purge_date property, which is not fetched by the tier-1 search, the query will fail.
Although eas_purge_lock_count is not a standard AIP property, it is exposed as a virtual property and can be passed as an argument to the dctm-retriever function to return the number of AIUs with a purge lock in each AIP in the query result; for example:

<query-template>
for $aiu in <select/>
let $pdi_uri := root($aiu)
let $aip_id := xhive:metadata($pdi_uri,'eas_aip_id')
return
  <parent>{
    ( $aiu/@id,
      attribute {xs:QName('count')}
        {eas_functions:dctm-retriever($aip_id,'eas_purge_lock_count')} )
  }</parent>
</query-template>

Configuring a Query Configuration (eas_cfg_query) Object

In DA, create an object of type eas_cfg_query in the /System EAS/Archive Holdings/MyHolding folder (or any other folder) and configure its properties. The query configuration object must contain an XML file that defines the search criteria. See Defining the Search Criteria and Search Result, page 171.

eas_name (DA label: Name)
  Name of the query configuration; for example, query.MyHolding.

eas_result_schema (DA label: Result Schema name)
  URN of the schema applied to the results that are returned by this query configuration. You can only create one query configuration object for the same result schema.

eas_result_root_element (DA label: Result root element)
  Root of the data in the PDI file; for example, Calls for PhoneCalls in eas_pdi.xml.

eas_cfg_quota_predicate (DA label: Query quota predicate)
  DQL predicate used to search for a query quota (eas_cfg_query_quota) object to be used with this configuration; for example:
  eas_cfg_query_quota where eas_name like 'quota.PhoneCalls.%'
  The value specified here must be a valid DQL predicate that returns only one eas_cfg_query_quota object for the dynamic roles received in a search request. If this DQL predicate returns more than one eas_cfg_query_quota object, search execution will fail.
eas_cfg_order_predicate (DA label: Cfg order predicate)
  DQL predicate that selects one and only one eas_cfg_order object for a given user; for example:
  eas_cfg_order where eas_name like 'order.PhoneCalls.%'

eas_result_root_element (DA label: Result root element)
  XML root element returned in the query results.

eas_result_root_ns_enabled (DA label: Namespace in root element)
  If checked, the result namespace is associated with the XML root element of the result.

eas_cfg_aic (DA label: Archival Information Collection (AIC))
  Name of the Archival Information Collection (AIC; a holding is a collection) for which this query configuration can be applied; for example, PhoneCalls.

Configuring an Order Configuration (eas_cfg_order) Object

In DA, create an object of type eas_cfg_order in the holding folder (e.g., /System EAS/Archive Holdings/MyHolding) and configure its properties.

eas_name (DA label: Name)
  Name of the order configuration object referenced by other objects.

eas_delivery_channel (DA label: Delivery channel)
  One or more delivery channels to which this order configuration object is applicable.

eas_priority (DA label: Priority)
  Execution priority; the order node processes orders having the highest value first.

eas_deadline (DA label: Execution deadline (mn))
  Maximum desired execution deadline, used to prioritize order execution. The execution deadline corresponds to an SLA agreed with the business owner. When the order is created, its eas_deadline_date property is set to the creation date plus the execution deadline. This setting does not guarantee that the order will complete before this date and time; it is only used to prioritize order execution for orders having the same execution priority. When two orders have the same priority, the order with the closest deadline date/time gets selected for execution. Therefore, the execution deadline can be considered a sub-priority.
eas_folder_path (DA label: Repository folder)
  Repository location in which the order (eas_order) objects are created.

eas_acl_name (DA label: ACL name)
  Name of the permission set to apply to the order configuration object; for example, PhoneCalls.ORDER.

eas_acl_domain (DA label: ACL domain)
  Domain of the permission set to apply to the order configuration object.

eas_alias_set (DA label: Alias set)
  The permission set applied to the eas_order repository object must grant Read access to one of the user’s roles. This makes it possible to limit the visibility of an order to specific roles. For example, one role of users can be responsible for posting orders in sequence while another role can process the results of completed orders.

eas_owner_name (DA label: Owner)
  Owner to assign to the order repository object.

eas_retention_period (DA label: Retention period)
  Maximum retention period of the order after its execution.

eas_working_deliv_channel (DA label: Working delivery channel)
  Defines where to store the intermediate results in xDB.

eas_working_deliv_par_name (DA label: Working deliv.chan.param)
  Working delivery channel parameter name.

eas_working_deliv_par_value (DA label: Working deliv.chan.param. value)
  Working delivery channel parameter value associated with the parameter name at the same index.

eas_working_store (DA label: Working filestore)
  Only used when encrypted data is returned. When the result contains structured data that was encrypted during ingestion, the order node does not store the result within xDB. Instead, it dynamically encrypts the whole result and stores it as content in the repository, using the configured working filestore. This principle ensures that sensitive structured data contained in the order result is not accessible to an administrator having access at the xDB and file system levels.

eas_superuser_applied (DA label: Applied to superusers)
  Since superuser accounts have at least Read access to all configuration objects, this boolean is used to indicate that this configuration should be applied to superusers.
However, it is recommended not to apply this configuration to a Documentum account having the superuser privilege, because such accounts are used by web services (refer to the eas_cfg_access_node type).

Configuring an Order Node Configuration Object

A default order node configuration object, order_node_01, was configured during InfoArchive installation and can be used out of the box. You can modify its properties as needed. To create and configure a new order node, create an object of type eas_cfg_order_node and configure its properties.

Property Name (DA Label): Description
eas_name (Name): Name of the order node
eas_order_predicate (Order predicate): DQL predicate to restrict the order requests considered by the node for processing. This is used to dedicate order nodes to specific types of queries.
eas_log_level (Log level): Log level used by the order processing
eas_fs_working_root (Working directory): Path to the working directory in the file system for this node
eas_worker_thread_cnt (# worker threads): Number of order execution threads on the host computer. After updating the thread count, you must restart the order processor for the change to take effect.
eas_polling_interval (Polling interval (ms)): Order queue polling interval in milliseconds
eas_cacheout_enabled (Cache out processing): Activates the background task caching out the least used xDB detachable libraries for archive holdings and library pools having exceeded their quota. If this option is not selected, the order node will not remove libraries from the xDB file system. You do not need to restart the system for changes to this setting to take effect.
eas_cacheout_interval (Cache out processing interval (ms)): Time interval between activations of the background task caching out the least used xDB libraries
eas_user_name (User name): Deprecated property
eas_processed_order_cnt (# processed orders): Incremented when an order is processed; used for monitoring
eas_act_indicators_enabled (Update activity indicators): Indicates whether the order node must update the AIP activity indicators. For performance reasons, the order processor does not individually update an AIP (eas_aip) object each time it processes an order scoping the AIP. Instead, it manages in memory the list of AIPs scoped by processed orders and regularly flushes this list to update the AIP activity indicators. If a number of processed orders are scoped by the same AIP, the AIP activity indicators are updated in memory and the AIP object itself is updated only once, later.
eas_def_upd_max_entries (Deferred update max. entries): Maximum number of AIP entries to keep in memory
eas_def_upd_flush_threshold (Deferred update flush threshold): Maximum number of AIP entries modified in the activity indicator cache since the last flush. When this limit is reached, modified entries are written to the database.
eas_def_upd_flush_interval (Deferred update max. interval (sec.)): Maximum time interval in seconds between flushes. When the first limit is reached, the order processor updates the activity indicators of the AIP objects marked as modified since the last flush.
eas_log_close_pending (Log close pending): Flag indicating that the log should be closed without stopping the node itself
eas_start_date (Last start date): The date of the latest startup of the node
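The deferred update mechanism described above (max. entries, flush threshold, flush interval) can be sketched as follows. This is an illustrative Python sketch only; the class and method names are hypothetical and do not correspond to InfoArchive internals.

```python
import time

# Illustrative sketch of the deferred AIP activity-indicator cache described
# above; class and method names are hypothetical, not InfoArchive APIs.
class DeferredUpdateCache:
    def __init__(self, max_entries=1000, flush_threshold=100, flush_interval=60):
        self.max_entries = max_entries          # eas_def_upd_max_entries
        self.flush_threshold = flush_threshold  # eas_def_upd_flush_threshold
        self.flush_interval = flush_interval    # eas_def_upd_flush_interval (sec.)
        self.modified = {}                      # AIP id -> in-memory indicator updates
        self.last_flush = time.monotonic()
        self.flush_count = 0

    def record_order(self, aip_id):
        # Update the indicators in memory instead of touching the eas_aip object.
        self.modified[aip_id] = self.modified.get(aip_id, 0) + 1
        if (len(self.modified) >= self.flush_threshold
                or len(self.modified) >= self.max_entries
                or time.monotonic() - self.last_flush >= self.flush_interval):
            self.flush()

    def flush(self):
        # A real node would write the modified AIP objects to the database here.
        self.flush_count += 1
        self.modified.clear()
        self.last_flush = time.monotonic()

cache = DeferredUpdateCache(flush_threshold=3)
for aip in ["aip1", "aip2", "aip1", "aip3"]:
    cache.record_order(aip)
print(cache.flush_count)
```

Note that the repeated order on "aip1" only touches the in-memory entry; the flush fires when the third distinct modified AIP reaches the threshold.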
eas_start_proc_order_cnt (# processed orders since startup): Reset to 0 at order processor startup, then incremented when an order is processed; used for monitoring
eas_is_suspended (Suspended): Allows the dynamic suspension or resumption of the associated order processor
eas_stop_pending (Stop pending): This boolean property allows the administrator to remotely stop the associated order processor; for example, using DA or IDQL
eas_stop_date (Last stop date): The date of the latest stop of the node

Starting the Order Node

Start the order node by executing the following command located in EAS_HOME/bin:
• eas-launch-order-node.sh (Linux)
• eas-launch-order-node.bat (Windows)
Keep the prompt window open.
Note: To stop the order node, you must explicitly execute the following command located in EAS_HOME/bin:
• eas-stop-order-node.sh (Linux)
• eas-stop-order-node.bat (Windows)
Shutting down the host without properly stopping the order node may cause a failure the next time you try to start it. If this happens, force-start the order node using the force (-f) option.
On Windows, the order node is installed as a Windows service. You can start and stop the order node by starting or stopping the EAS Order Node service in the Services Microsoft Management Console (MMC).
Note: NEVER execute the eas-launch-order-node script when the EAS Order Node Windows service is already running.

Configuring InfoArchive GUI

Shipped with the InfoArchive installation package, InfoArchive GUI is the default web-based search application for searching data archived in InfoArchive holdings. You must configure InfoArchive GUI for your holdings before using this search application. Skip this section if you build your own search application by leveraging InfoArchive web services instead of using InfoArchive GUI.
You can configure the following components of InfoArchive GUI:
• Search menu
The search menu groups search forms under folders based on which holding they are used to search, each folder corresponding to a distinct holding.
• Search form
The search form contains the search criteria fields and search buttons with which the end user performs queries against a holding. A search form is specific to InfoArchive GUI and is the mechanism used to define what can be searched.
• Search result
The search result page displays the returned query results.
In addition, you can customize the cascading style sheet (CSS) used by InfoArchive GUI by creating your custom styles in the empty custom.css file located in the css directory within the InfoArchive GUI web application. For example, on Apache Tomcat, the file can be found in the following location: TOMCAT_HOME/webapps/eas-gui/css

Configuring the Search Menu

You configure the InfoArchive GUI search menu by creating a search form folder configuration (eas_cfg_search_form_folder) object for each holding in a configured root folder in the repository (/System EAS/Search Forms by default). You then populate each folder with search form configuration (eas_cfg_search_form) objects used to search that holding. Each search form folder configuration (eas_cfg_search_form_folder) object is automatically rendered as a folder in InfoArchive GUI with search forms grouped under it. The root folder that holds search form folder configuration (eas_cfg_search_form_folder) objects in the repository is configured through an access node configuration (eas_cfg_access_node) object, which also links the search forms to the InfoArchive web services. InfoArchive web services are configured through the eas_service.properties file located in the web application deployment directory, such as C:\app\apache-tomcat-7.0.42\webapps\eas-services\WEB-INF\classes.
The eas_service.properties file contains an eas_access_node property that points to the access node configuration (eas_cfg_access_node) object. InfoArchive GUI solely uses the InfoArchive web services and does not directly connect to the repository and xDB. Multiple access nodes can point to the same search form folder configuration (eas_cfg_search_form_folder) object.

Configuring a Search Form Folder

In DA, create an object of type eas_cfg_search_form_folder in /System EAS/Search Forms (the default root folder) and configure its properties.

Property Name (DA Label): Description
eas_name (Name): Name that InfoArchive uses to refer to the search form folder
eas_order_no (Order No.): The integer value that determines the order in which search form folders are displayed in the search menu in InfoArchive GUI. Items are displayed in ascending order of the assigned values.
eas_consumer_application (Consumer application): Restricts visibility to a specific consumer application; for example, eas_gui (InfoArchive GUI)
eas_language_code (Language code): Language code in the format language_country (ISO 639, 3166); for example, fr_FR for French and zh_CN for Simplified Chinese
eas_title (Title): Title of the menu item for the associated language/locale
eas_description (Description): Description of the menu item

Configuring a Search Form

InfoArchive GUI uses standard XForms 1.1 for displaying the search user interface, rendered completely in the client browser by the EMC Documentum XForms Engine. The EMC Documentum XForms Engine is a pure client-side XForms implementation that runs entirely from within a web browser. It is capable of rendering very flexible and dynamic forms without the need for a plugin or processing outside of the browser. The search form is primarily configured through the search form configuration object (eas_cfg_search_form). Here are the general steps:
1.
Configure a query configuration (eas_cfg_query) object that defines the data to be searched and the usable search criteria.
Note: For the content (unstructured data) functionality to work, the internal identifier of the contents must be included in the query result.
2. Configure one or more query quota configuration (eas_cfg_query_quota) objects to define the quota to be considered for serving the search.
3. Configure a delivery channel configuration (eas_cfg_delivery_channel) object to define the delivery channel to use.
4. Configure XForms for searching the holding.
5. Configure a search form configuration (eas_cfg_search_form) object with the configured XForms as its content.
The recommended approach to creating a new search form is to start with an existing search form (such as the search forms in the sample holdings that ship with InfoArchive). The PhoneCalls search form sample can be found here: /install/unsupported/holdings/PhoneCalls/template/content/eas_cfg_search_form.PhoneCall.01.xml

Configuring XForms

You configure an XForms XML file and later import it into the repository as the content of the search form configuration (eas_cfg_search_form) object. To get a quick start on configuring an XForms, you can customize an XForms of the sample PhoneCalls holding that ships with InfoArchive, or configure and install a simple holding using the InfoArchive Holding Configuration Wizard and then build upon that basic configuration. To accelerate the development of search forms, EMC recommends that you test XForms locally using a web browser. To do this, retrieve the Formula project from https://community.emc.com/docs/DOC-7172 and replace $formula/war/example.xml with your own XForms; then open index.xml in Firefox. You can adjust the look and feel of your search form by referencing eas.css and bootstrap.css in the header.
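At a glance, every search form XForms follows the same skeleton. The fragment below is a minimal sketch distilled from the full PhoneCalls example later in this section; the element names and namespaces match that example, while the placeholder values ("..." and the bindExample id) are illustrative only.

```xml
<xhtml:html xmlns:xhtml="http://www.w3.org/2002/06/xhtml2"
            xmlns:xforms="http://www.w3.org/2002/xforms">
  <xforms:model>
    <!-- search criteria: the data model the form submits -->
    <xforms:instance id="MyForm" xmlns="">
      <request><criterias>...</criterias></request>
    </xforms:instance>
    <!-- bindings: link criteria to controls and define validation -->
    <xforms:bind id="bindExample"
        nodeset="/request/criterias/criteria[@name='...']"/>
    <!-- submissions: id "search" (synchronous) or "order" (background) -->
    <xforms:submission id="search" ref="/request" replace="none"/>
  </xforms:model>
  <xhtml:body>
    <!-- form controls: fields, labels, and buttons -->
    <xforms:input bind="bindExample"/>
    <xforms:submit submission="search"><xforms:label>Search</xforms:label></xforms:submit>
  </xhtml:body>
</xhtml:html>
```

The three sections of this skeleton (criteria instance, bindings, form controls) correspond to the structural components described next.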
XForms Structure

XForms consists of the following structural components:
• Search criteria
Defines the search criteria and logical operators used to construct a search request in the search form.
• Bindings
— Links the search criteria to form controls on the search form
— Defines the validation rules to apply on search criteria values
— Defines the computation rules (optional) to apply on search criteria values
• Form controls
Provides the controls in the search form for the user to interact with, such as fields, lists, and buttons. The form control definition also enables the application of the desired initialization/reinitialization rules on the values presented in the controls; for example, reinitialization of the value presented in a control when the value of another control is changed.

XForms Example

Here is an example of a simple XForms that renders into the following search form in InfoArchive GUI. In the example:
• Defined within the criterias tags are the criteria by which to search data, each criteria element corresponding to a single search criterion.
• The xforms:bind elements are used to bind each search criterion to a form control in the search form.
• Defined within the xhtml:body tags are the form controls (fields, labels, and buttons) that the user can interact with in the search form.
<?xml version='1.0' encoding='UTF-8'?>
<xhtml:html xmlns:xhtml="http://www.w3.org/2002/06/xhtml2"
    xmlns:xforms="http://www.w3.org/2002/xforms"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema"
    xmlns:ev="http://www.w3.org/2001/xml-events"
    xmlns:fn="http://www.w3.org/2005/xpath-functions">
  <xforms:model>
    <xforms:instance id="PhoneCallsSimple1" xmlns="">
      <request>
        <criterias>
          <criteria name="CritCallStartDateLower" operator="GreaterOrEqual"
              model="CallStartDate" gui_display="From date"/>
          <criteria name="CritCallStartDateUpper" operator="LessOrEqual"
              model="CallStartDate" gui_display="To date"/>
          <criterias relation="OR">
            <criteria name="CritCallCustomerID" operator="Equal"
                model="CustomerID" gui_display="Customer ID"/>
            <criteria name="CritCallCustomerLastName" operator="StartsWith"
                model="CustomerLastName" gui_display="Customer LastName"/>
          </criterias>
        </criterias>
      </request>
    </xforms:instance>
    <xforms:bind id="bindCallStartDateLower" required="true()" type="xforms:dateTime"
        nodeset="/request/criterias/criteria[@name='CritCallStartDateLower']"/>
    <xforms:bind id="bindCallStartDateUpper" required="true()" type="xforms:dateTime"
        nodeset="/request/criterias/criteria[@name='CritCallStartDateUpper']"/>
    <xforms:bind id="bindCallCustomerID" required="false()" type="xforms:positiveInteger"
        nodeset="/request/criterias/criterias/criteria[@name='CritCallCustomerID']"/>
    <xforms:bind id="bindCallCustomerLastName" required="false()" type="xforms:string"
        nodeset="/request/criterias/criterias/criteria[@name='CritCallCustomerLastName']"/>
    <xforms:submission id="search" ref="/request" replace="none"/>
    <xforms:submission id="order" ref="/request" replace="none"/>
  </xforms:model>
  <xhtml:body>
    <xhtml:div class="form-horizontal">
      <xhtml:div style="margin-left: 10px">
        <xhtml:fieldset>
          <xhtml:legend>Phone Calls</xhtml:legend>
          <xhtml:div class="control-group">
            <xhtml:label class="control-label">Call received between</xhtml:label>
            <xhtml:div class="controls
controls-row">
              <xforms:input bind="bindCallStartDateLower" id="input_call_start_date_lower" class="input-small">
                <xforms:hint>Start date format: <xforms:output value="instance('eas-context-info')/date_formats/date_format[1]"/></xforms:hint>
                <xforms:message level="ephemeral" ev:event="xforms-invalid">Start date is invalid. It must follow the format: <xforms:output value="instance('eas-context-info')/date_formats/date_format[1]"/></xforms:message>
              </xforms:input>
              <xforms:input bind="bindCallStartDateUpper" id="input_call_start_date_upper" class="input-small">
                <xforms:hint>End date format: <xforms:output value="instance('eas-context-info')/date_formats/date_format[1]"/></xforms:hint>
                <xforms:message level="ephemeral" ev:event="xforms-invalid">End date is invalid. It must follow the format: <xforms:output value="instance('eas-context-info')/date_formats/date_format[1]"/></xforms:message>
              </xforms:input>
            </xhtml:div>
          </xhtml:div>
          <xhtml:div class="control-group">
            <xhtml:label class="control-label">Customer ID</xhtml:label>
            <xhtml:div class="controls">
              <xforms:input bind="bindCallCustomerID" id="input_call_customer_id" class="input-large">
                <xforms:hint>Customer ID</xforms:hint>
              </xforms:input>
            </xhtml:div>
          </xhtml:div>
          <xhtml:div class="control-group">
            <xhtml:label class="control-label">Customer Last Name</xhtml:label>
            <xhtml:div class="controls">
              <xforms:input bind="bindCallCustomerLastName" id="input_call_customer_last_name" class="input-large">
                <xforms:hint>Customer Last Name</xforms:hint>
              </xforms:input>
            </xhtml:div>
          </xhtml:div>
        </xhtml:fieldset>
      </xhtml:div>
      <xhtml:div class="form-actions">
        <xhtml:span class="pull-right">
          <xforms:trigger class="btn">
            <xforms:label>Reset</xforms:label>
            <xforms:reset ev:event="DOMActivate"/>
          </xforms:trigger>
          <xforms:submit submission="order" incremental="false" class="btn">
            <xforms:label>Background search</xforms:label>
          </xforms:submit>
          <xforms:submit submission="search" incremental="false" class="btn
btn-primary">
            <xforms:label>Search</xforms:label>
          </xforms:submit>
        </xhtml:span>
      </xhtml:div>
    </xhtml:div>
  </xhtml:body>
</xhtml:html>

Search Criteria

Search criteria filter data and limit search results to a subset of data that matches the search conditions. You define search criteria inside the criterias tags, each criteria element specifying a search criterion in the search form. Once defined, search criteria are used to construct XQuery expressions for querying data, and are also displayed on the search result page. For example, the following criterion definition translates into a search field labeled Customer ID in the search form; when a value is specified in the field, only AIU records with the exact matching CustomerID value are returned as search results.

<criteria name="CritCallCustomerID" operator="Equal" model="CustomerID" gui_display="Customer ID"/>

The criteria element has the following attributes:

name: Descriptive name of the criterion, referenced in the search binding
operator: Comparison operator that defines how a value specified in the search field is compared against the AIU property (metadata) value to filter data. Valid operators (case-sensitive) are: Equal, NotEqual, Greater, GreaterOrEqual, Less, LessOrEqual, StartsWith, and Contains. Make sure the operator you use is supported by the property data type. The following table lists the valid operators supported by each data type:

Operator        Date  DateTime  String  Integer  Double
Equal            *       *        *       *       *
NotEqual         *       *        *       *       *
Greater          *       *                *       *
GreaterOrEqual   *       *                *       *
Less             *       *                *       *
LessOrEqual      *       *                *       *
StartsWith                        *
Contains                          *

model: AIU property to use as a search filter.
This must be a search parameter defined as a search criterion in the query configuration (eas_cfg_query) object (XML content) configured for the same holding; for example:

<param name="CustomerID" type="decimal" index="true">
  <path>n:CustomerID</path>
</param>

For information about defining search criteria, see request-config—Defining Search Criteria, page 173.
gui_display: Label of the search field displayed in the search form

Search Form Binding

In search form binding, each xforms:bind element identifies an input value for the search criterion to be bound to a form control in the search form. A binding also specifies the data type of the search value and whether it is required; for example:

<xforms:bind id="bindCallCustomerID" required="false()" type="xforms:positiveInteger" nodeset="/request/criterias/criterias/criteria[@name='CritCallCustomerID']"/>

id: Uniquely identifies the binding
required: Whether or not the input value of the search criterion is required
type: Data type of the search value
nodeset: Full XPath to the search value attribute in the criterion definition

Search Form Controls

The bind attribute of a form control links the form control to an <xforms:bind> element, as well as to a criterion in the data model; for example:

<xforms:input bind="bindCallStartDateStart" id="input_from_creation_date">
  <xforms:hint>Start date</xforms:hint>
</xforms:input>

InfoArchive GUI uses the ID of the submission elements to distinguish between synchronous searches and orders (asynchronous searches); for example:

Submission in the data model section:
<xforms:submission id="search" replace="none" ref="/request"/>
• id: "search" or "order"

Submission buttons in the form controls section:
<xforms:submit submission="search" incremental="false">
  <xforms:label>Search</xforms:label>
</xforms:submit>
• submission: References the submission id
• replace: Must be "none", or a blank page will be rendered in the browser
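The operator/data-type validity rules from the table earlier in this section can be captured in a simple lookup. This is an illustrative Python sketch only, not an InfoArchive API; the type names follow the table.

```python
# Valid comparison operators per data type, per the operator table above.
RANGE_OPS = {"Equal", "NotEqual", "Greater", "GreaterOrEqual", "Less", "LessOrEqual"}
VALID_OPERATORS = {
    "Date":     set(RANGE_OPS),
    "DateTime": set(RANGE_OPS),
    "String":   {"Equal", "NotEqual", "StartsWith", "Contains"},
    "Integer":  set(RANGE_OPS),
    "Double":   set(RANGE_OPS),
}

def operator_is_valid(operator, data_type):
    # Operators are case-sensitive, so no normalization is applied.
    return operator in VALID_OPERATORS.get(data_type, set())

print(operator_is_valid("StartsWith", "String"))
print(operator_is_valid("StartsWith", "Integer"))
```

A check like this can be useful when validating hand-written XForms criteria before importing them, since an unsupported operator/type combination would otherwise only surface at query time.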
Logical Grouping of Search Criteria

By default, multiple search criteria are combined using the AND relation (translated into the AND logical operator in the resultant XQuery expression). That is, all the search criteria must be met for the data to be returned by the query. You can group multiple search criteria using the OR relation by enclosing them inside a criterias element and setting its relation attribute to OR (if not specified, the default value is AND). Multiple criterias elements can be nested to create more complex search criteria with a combination of logical relations.

Note: If an element is defined as a partitioning key, do not include it in an OR relation in the search criteria; otherwise, the returned search results will be incorrect. The partitioning key is used to narrow down the scope of the AIPs in the tier-1 search, and the tier-2 XQuery search constructed from the search criteria can only be executed within this scope (AND relation), not outside it (OR relation).

In plain English, the following criteria definition translates into: a call record meets the search criteria when the CallStartDate is between a specified date A and a specified date B, AND either the customer ID equals a specified value OR the customer last name starts with a specified value.
<criterias relation="AND">
  <criteria name="CritCallStartDateLower" operator="GreaterOrEqual" model="CallStartDate" gui_display="From date"/>
  <criteria name="CritCallStartDateUpper" operator="LessOrEqual" model="CallStartDate" gui_display="To date"/>
  <criterias relation="OR">
    <criteria name="CritCallCustomerID" operator="Equal" model="CustomerID" gui_display="Customer ID"/>
    <criteria name="CritCallCustomerLastName" operator="StartsWith" model="CustomerLastName" gui_display="Customer LastName"/>
  </criterias>
</criterias>

When rendered in the search form, the logical relations among multiple criteria are not spelled out on the search screen (unless you specify them in the field label), but can be found in the Search Details panel at the top of the search results screen.

Multiple Input Values for a Single Search Criterion

A search criterion can contain multiple input values combined with the OR relation. For example, in the following search form, the user can enter three values for the Customer Last Name search criterion, and records that match any one of these values will be returned. To define multiple input values for a search criterion, in the search criteria definition, include as many empty value elements as the number of input values you want to allow inside the criteria element; for example:

<criteria name="CritCallCustomerLastName" operator="StartsWith" model="CustomerLastName" gui_display="Customer LastName"><value></value><value></value><value></value></criteria>

In the search bindings, create a binding for each input value you have defined; for example:

<xforms:bind id="bindCallCustomerLastName1" required="false()" type="xforms:string" nodeset="/request/criterias/criterias/criteria[@name='CritCallCustomerLastName']/value[1]"/>
<xforms:bind id="bindCallCustomerLastName2" required="false()" type="xforms:string" nodeset="/request/criterias/criterias/criteria[@name='CritCallCustomerLastName']/value[2]"/>
<xforms:bind id="bindCallCustomerLastName3"
required="false()" type="xforms:string" nodeset="/request/criterias/criterias/criteria[@name='CritCallCustomerLastName']/value[3]"/>

In the form controls definition, define an input field for each input value you have defined; for example:

<xhtml:div class="control-group">
  <xhtml:label class="control-label">Customer Last Name</xhtml:label>
  <xhtml:div class="controls">
    <xforms:input bind="bindCallCustomerLastName1" id="input_call_customer_last_name1" class="input-large">
      <xforms:hint>Customer Last Name</xforms:hint>
    </xforms:input>
    <xforms:input bind="bindCallCustomerLastName2" id="input_call_customer_last_name2" class="input-large">
      <xforms:hint>Customer Last Name</xforms:hint>
    </xforms:input>
    <xforms:input bind="bindCallCustomerLastName3" id="input_call_customer_last_name3" class="input-large">
      <xforms:hint>Customer Last Name</xforms:hint>
    </xforms:input>
  </xhtml:div>
</xhtml:div>

Defining InfoArchive GUI Locales

You define which locales are supported by InfoArchive GUI, as well as the default locale and formats, in the eas-gui.properties file located in the WEB-INF/classes directory of the InfoArchive GUI web application. For example, on Apache Tomcat, the configuration file can be found in the following location: TOMCAT_HOME/webapps/eas-gui/WEB-INF/classes

Edit eas-gui.properties and configure the locale-related settings; for example:

eas.locale.default=en_US
eas.locales=en_US,fr_FR,zh_CN
eas.client.config.default.dateformat=yyyy-MM-dd
eas.client.config.en_US.dateformat=dd/MM/yyyy,ddMMyyyy
eas.client.config.fr_FR.dateformat=dd/MM/yyyy,ddMMyyyy
eas.client.config.zh_CN.dateformat=dd/MM/yyyy,ddMMyyyy

For each locale, you can define multiple supported date and/or dateTime formats, delimited by commas. The first value is the default format used by the date/dateTime control on the UI. Restart the web application server for the changes to take effect.

Localizing a Search Form

A search form can be localized into multiple languages.
In InfoArchive GUI, you can switch between different language versions of a search form by choosing a language locale at login. In the configuration of a localized search form, the labels of the form controls are externalized. All the localized resources (form name, labels, and hint messages) are stored in a localization properties file, one for each language. The localization properties file is imported into the search form configuration (eas_cfg_search_form) object as one of its renditions. It is applied to the corresponding language version of the search form in InfoArchive through a language code defined for the application in the search form configuration.

To localize a search form into a non-English language:
1. Edit the properties of the search form configuration (eas_cfg_search_form) object. Under the Description tab, add the language code to be supported by the InfoArchive GUI application; for example, eas_gui is the application code for InfoArchive GUI.
2. Externalize the localization resources such as form name, labels, and hint messages. Edit the XML content of the search form configuration (eas_cfg_search_form) object to replace the actual form name, labels, and hint messages with string variables; for example:
...
<request>
  <criterias>
    <criteria name="CritCallCustomerID" operator="Equal" model="CustomerID" gui_display="${customer.id.label}"/>
  </criterias>
</request>
...
<xhtml:body>
  <xhtml:div class="form-horizontal">
    <xhtml:div style="margin-left: 10px">
      <xhtml:fieldset>
        <xhtml:legend>${form.name}</xhtml:legend>
        <xhtml:div class="control-group">
          <xhtml:label class="control-label">${customer.id.label}</xhtml:label>
          <xhtml:div class="controls">
            <xforms:input bind="bindCallCustomerID" id="input_call_customer_id" class="input-large">
              <xforms:hint>${customer.id.hint}</xforms:hint>
            </xforms:input>
          </xhtml:div>
        </xhtml:div>
      </xhtml:fieldset>
    </xhtml:div>
    <xhtml:div class="form-actions">
      <xhtml:span class="pull-right">
        <xforms:submit submission="order" incremental="false" class="btn">
          <xforms:label>${button.backgroundsearch.label}</xforms:label>
        </xforms:submit>
        <xforms:submit submission="search" incremental="false" class="btn btn-primary">
          <xforms:label>${button.search.label}</xforms:label>
        </xforms:submit>
      </xhtml:span>
    </xhtml:div>
  </xhtml:div>
</xhtml:body>
...
3. Create a localization properties file containing translations of the externalized resources (form name, labels, and hint messages) in the target language. The localization file must be encoded in UTF-8, and its file name must be suffixed with the language code and have the .properties extension, such as form.PhoneCalls.zh_CN.properties. In the following example localization properties file, strings translated into Simplified Chinese are assigned to the resource variables:

form.name=通话录音
customer.id.label=客户 ID
customer.id.hint=通话客户的标识
button.backgroundsearch.label=后台搜索
button.search.label=搜索

4. Import the localization properties file as a rendition of the search form configuration (eas_cfg_search_form) object.
a. Right-click the eas_cfg_search_form object and choose View > Renditions from the shortcut menu.
b. With the existing XML Document rendition selected, choose Import Rendition from the File menu.
c. Select the localization properties file to import.
d.
If the localization properties file conforms to the correct naming convention and file format (.properties), InfoArchive recognizes its format as eas_localization (localization properties file) and automatically fills out the Format field.
e. In the Page modifier field, enter the language code, such as zh_CN for Simplified Chinese.
f. Click OK. The imported file is displayed as a rendition of the search form configuration (eas_cfg_search_form) object.
The search form is localized. When you access the form in InfoArchive GUI using the corresponding locale, the form name, labels, and hint messages all appear in the target language. When the localization properties file is missing for a specific locale, the default one defined in the eas-services.properties file (eas.locale.default=en_US) is used.

Configuring Hints and Error Messages

A hint or tooltip is a piece of concise information about a UI control (e.g., an input field) that appears in a small "hover box" when the user hovers the cursor over the control. An error message is displayed when an error occurs, to alert the user to the possible causes of the error and help the user troubleshoot. You can optionally configure hints and error messages using the xforms:hint and xforms:message elements, respectively, nested within UI control elements. In the following example, a hint and an error message are defined for the Call Start Date input control to inform the user of the acceptable date format:

<xforms:input bind="bindCallStartDateLower" id="input_call_start_date_lower" class="input-small">
  <xforms:hint>Start date format: <xforms:output value="instance('eas-context-info')/date_formats/date_format[1]"/></xforms:hint>
  <xforms:message level="ephemeral" ev:event="xforms-invalid">Start date is invalid.
It must follow the format: <xforms:output value="instance('eas-context-info')/date_formats/date_format[1]"/></xforms:message>
</xforms:input>

Note: In the hint, the xforms:output element retrieves the first (as indicated by [1]) valid date format of the current locale from eas-gui.properties, where locales are defined. Both hints and error messages are optional for UI controls. However, if you do not define an error message for an input control, when an error occurs upon search form submission, a default system error message is displayed in the following format: input_control_id is invalid. In the example above, without the xforms:message element, the default system error message would be: input_call_start_date_lower is invalid.

Configuring a Search Form Configuration (eas_cfg_search_form) Object

In DA, create an object of type eas_cfg_search_form under the corresponding search form folder configuration (eas_cfg_search_form_folder) object and configure its properties. You then import the configured XForms as the content of the object.

Property Name (DA Label): Description
eas_name (Name): Value used by InfoArchive to refer to this form
eas_aic_predicate (AIC DQL predicate): Returns the holding(s) which can be searched with the form. The AIC DQL predicate must select eas_cfg_aic objects. Most often, this predicate selects an eas_cfg_holding object (a sub-type of eas_cfg_aic); therefore, it is generally used to select the holding to query.
eas_result_schema_predicate (Result Schema DQL predicate): Returns the XML schema which can be requested by the form. The Result Schema DQL predicate must select an eas_cfg_schema object. This predicate most often selects the eas_cfg_schema associated with the schema applied to the data archived in the holding to query.
eas_result_delivery_pred (Delivery channel DQL predicate): Returns the delivery channels which can be requested by the form as the destination of search results.
The Delivery channel DQL predicate must select an eas_cfg_delivery_channel object. With the current InfoArchive release, this predicate must always select the standard eas_cfg_delivery_channel: eas_access_services.
• eas_order_no (Order No.): The integer value that determines the order in which search forms are displayed in the search menu in InfoArchive GUI. Search forms are displayed in ascending order of the assigned values.
• eas_consumer_application (User application): Restricts visibility to a specific consumer application; for example, eas_gui (InfoArchive GUI).
• eas_language_code (Language code): Language code in the format language_country (ISO 639, ISO 3166); for example, fr_FR for French and zh_CN for Simplified Chinese.
• eas_title (Title): Title of the search form for the associated language/locale.
• eas_description (Description): Description of the search form.

Configuring the Search Results

Search results are rendered in InfoArchive GUI using an XML-based stylesheet conforming to the predefined stylesheet schema eas_gui_stylesheet_1.2.xsd, which can be found in the install/resources/xsd directory of the InfoArchive installation package. Each stylesheet must be imported into the repository as a stylesheet configuration (eas_cfg_stylesheet) object. A stylesheet is specific to a schema and is linked to one or more search forms.

Configuring a Stylesheet Configuration (eas_cfg_stylesheet) Object

In DA, create an object of type eas_cfg_stylesheet in the holding folder (e.g., /System EAS/Archive Holdings/MyHolding) and configure its properties.
• eas_name (Name): The name used to refer to this stylesheet in the InfoArchive configuration.
• eas_schema (Schema name): Indicates the result schema(s) that can be displayed by this stylesheet.
• eas_search_form (Search form): Indicates that this stylesheet can be used for displaying the results of a search submitted from these forms.
• eas_form_alias_id (Form Alias ID): Resolves the references to search forms that can be defined in the <query> element of the stylesheet.
• eas_form_alias_pred (Form Alias Predicate): DQL predicate that is executed to find the search form associated with the eas_form_alias_id at the same index value.

Creating a Stylesheet

You create a stylesheet in an XML file and later import it into the repository as the content of the stylesheet configuration (eas_cfg_stylesheet) object.

Stylesheet Components

The following components make up a stylesheet, each defining a portion of the search results:
• Main — Defines the main paged result table containing the AIUs returned in the result set. From a configuration perspective, it is identical to an htable.
• Layout — There are three types of layout elements:
— htable: Displays a table containing any number of columns and rows
— vtable: Displays a table containing two columns, where the left column contains labels and the right column contains values
— detail: A collapsible panel that can contain any other layout
• Item — There are three types of items:
— value: Displays something from the XML of the result set
— image: Displays an image
— html: Displays a piece of static HTML
• Modifier — A modifier modifies an item, for instance turning it into an external link or a link to download a file. There are four types of modifiers:
— content: Creates a link to download a file
— link: Creates a custom external link
— query: Creates a link that triggers another search
— display: Creates a conditional display of an item or a modifier other than itself

You can use the cssclass and style attributes to apply custom CSS styling to any component. If an attribute name ends with "xpath", the value must be a valid XPath 2.0 expression. For a static value, you must enclose the text within single quotes, for example: filename_xpath="'audio.mp3'"

Stylesheet Elements and Attributes

Stylesheets are defined using the <stylesheet> element. They should include an xmlns attribute that points to the URN of the result set schema. The main element and its children define how the result set XML is formatted and presented to the user. The following list describes the available stylesheet elements.

Main — Defines the main paged table presenting the result set.
Example: <g:main path="[PATH TO AIU]">...</g:main>
Attributes:
• path: a simple path that selects a NodeSet (a list of objects) representing the rows of the table
• style: CSS class name to apply
Children: at least one Column

Column — Defines one column of values in a table (htable or main).
Example: <g:column label="First name">...</g:column>
Attributes:
• label: the name of the column header
• cssclass: the custom CSS class to apply to the column
• header_cssclass: the custom CSS class to apply to the column header
Children: one of Link, Query, Content, Value, Image, Html, or Details

Details — Displays a collapsible panel containing other layout components.
Example: <g:details collapsed_label="Attachments (show)" expanded_label="Attachments (hide)">...</g:details>
Attributes:
• expanded_label: the label of the expanded panel
• collapsed_label: the label of the collapsed panel
• style: optional; CSS class name to apply
• auto_expand_xpath: optional; for non-top-level details, if the XPath evaluates to true, the details panel is auto-expanded
Children: an optional Html (before), one of Htable or Vtable, any number of Details or Display_if, and an optional Html (after)

Vtable — Displays a table containing two columns, where the left column contains labels and the right column contains values.
Example: <g:vtable>...</g:vtable>
Attributes:
• style: optional; CSS class name to apply
Children: at least one of Row or Display_if

Row — Defines one row in a vtable.
Example: <g:row label="Customer ID">...</g:row>
Attributes:
• label: the label of the row
Children: one of Link, Query, Content, Value, Image, or Html

Htable — Displays a table containing any number of columns and rows.
Example: <g:htable path="Attachments/Attachment">...</g:htable>
Attributes:
• path: a simple path, relative to the parent table row node, that selects a NodeSet (a list of objects) representing the rows of the table
• style: optional; CSS class name to apply
Children: at least one Column

Value — Displays a value from the result set XML; the value is extracted using an XPath expression relative to the current node.
Example: <g:value xpath="CreatedOnDate/text()" datatype="DATE" format="dd/MM/yyyy HH:mm:ss" />
Attributes:
• xpath: an XPath expression, relative to the current node, that selects the value to display
• datatype: the data type of the value: DATE, DATETIME, TEXT, or NUMBER
• format: optional; how to format the value; only applies to DATE, DATETIME, and NUMBER. The format is specified according to java.text.SimpleDateFormat and java.text.DecimalFormat, respectively.
• input_format: optional; how to parse the value in the XML; only applies to DATE, DATETIME, and NUMBER. The format is specified according to java.text.SimpleDateFormat and java.text.DecimalFormat, respectively.
• style: optional; CSS class name to apply
Children: none

Image — Displays an image.
Example: <g:image url_xpath="'img/defaultContent.png'" width="16" height="16" alt_xpath="'alt'" />
Attributes:
• url_xpath: an XPath that evaluates to the URL of the image
• width: the width of the image
• height: the height of the image
• alt_xpath: an XPath that evaluates to the alternate text of the image
• style: optional; CSS class name to apply
Children: none

Html — Displays a piece of plain static HTML.
Example: <g:html><![CDATA[This is <b>HTML</b>]]></g:html>
Attributes: none
Children: none

Link — Creates an external link (i.e., an element with the href attribute set).
Example: <g:link url_xpath="concat('phone:', CallToPhoneNumber/text())" style="phonelink" target="_notarget">...</g:link>
Attributes:
• url_xpath: an XPath that evaluates to the URL of the link
• target: optional; the target of the link
• style: optional; CSS class name to apply
Children: one of Value, Image, or Html

Query — Creates a link that triggers another search. Note that schema, aic, and delivery_channel are resolved against the form, and the id attribute corresponds to eas_form_alias_pred. Only the values returned by those predicates are valid, and the attributes on the query can only restrict this to a single value.
Example: <g:query id="form1" mode="replace">...</g:query>
Attributes:
• id: the ID of a form alias corresponding to the eas_form_alias_pred attribute of the eas_cfg_stylesheet object
• mode: the mode of the query; currently only replace
• schema: optional; if a specific schema should be used
• schema_version: optional; if a specific schema version should be used
• aic: optional; if a specific collection should be queried
• delivery_channel: optional; if a specific delivery channel should be used
Children: a Criteria element, plus one of Value, Image, or Html

Criteria — Specifies search criteria; the sub-criteria are treated as if contained in a conjunction (logical AND).
Example: <g:criteria>...</g:criteria>
Attributes: none
Children: at least one of or, and, or arg

or — Specifies a disjunction of search criteria.
Example: <g:or>...</g:or>
Attributes: none
Children: at least one of or, and, or arg

and — Specifies a conjunction of search criteria.
Example: <g:and>...</g:and>
Attributes: none
Children: at least one of or, and, or arg

arg — Specifies a search criterion.
Example: <g:arg name="CustomerID" value_xpath="CustomerID/text()" operator="Equal" datatype="STRING" reminder_label="Customer ID"></g:arg>
Attributes:
• name: the name of the search criterion; it corresponds to a name in the query configuration
• value_xpath: the value of the search criterion
• operator: the operator to use; depends on the datatype
• datatype: the type of the value, used to determine the formatting of the reminder
• reminder_label: the label of the criterion reminder
• reminder_format: optional; the format of the reminder
• reminder_input_format: optional; the format of the value
Children: none

Content — Creates a link to download or display a content file. Note that to leverage the content control, the query configuration must be configured to add the content information to the result set.
Example: <g:content aip_xpath="FileName/@aipid" seqno_xpath="FileName/@seqno" type="DOWNLOAD">...</g:content> or <g:content cid_xpath="FileName/@cid" seqno_xpath="FileName/@seqno" type="DOWNLOAD">...</g:content>
Attributes:
• cid_xpath: the "AIP ID:CI:sequence number" value extracted using an XPath expression. The cid value must have the same format as the ID used to retrieve the content with the web services, for example: 0000000000000000:CI:1.
• aip_xpath: the AIP ID value extracted using an XPath expression
• seqno_xpath: the sequence number of the content item, extracted using an XPath expression
• filename_xpath: an XPath that evaluates to the filename to use for the file
• type: DOWNLOAD or SHOW, depending on whether the content should be downloaded or shown in the browser. (Note that the difference is in the content disposition (attachment vs. inline) of the header in the reply to the HTTP request; ultimately, it is up to the browser what action is taken.)
• style: optional; CSS class name to apply
Children: one of Value, Image, or Html

Display_if — A special element that can wrap other elements and, depending on the value of an XPath expression, either hide or show those elements. Display_if can wrap child details, columns in an htable, rows in a vtable, and items.
Example: <g:display_if condition_xpath="Attachments/Attachment">...</g:display_if>
Attributes:
• condition_xpath: an XPath that should evaluate to a boolean, which is used to determine the visibility of the child elements
Children: depend on the context it occurs in; a display_if can only contain children that would be valid if the display_if wasn't there

In the following example:
• The title element defines the title of the stylesheet, used by the UI as the "header" of the result set page.
• The path attribute of the main element defines the path to the AIUs relative to the root node. Note that this is a simple path and not a full XPath.
• In the cssstyle element, you can define custom CSS classes.

<g:stylesheet xmlns:g="urn:eas:xsd:stylesheet.1.2"
    xmlns="urn:eas-samples:en:xsd:phonecalls.1.0">
  <g:title>PhoneCalls</g:title>
  <g:cssstyle>.customCssClassA { background-color: red; }
    .customCssClassB { background-color: blue; }</g:cssstyle>
  <g:main path="Call">
    ...
  </g:main>
</g:stylesheet>

Namespaces in the Stylesheet

It is important that you declare the namespaces correctly in the stylesheet. At least two namespaces must be defined: the namespace of the stylesheet and all the namespaces of the result set XML. The namespaces of the result set must be used in the XPath expressions in the stylesheet. A good practice is to define the stylesheet namespace with the g prefix.
If the result set has only one namespace, define it with the blank (default) prefix, for example:

<?xml version="1.0" encoding="UTF-8"?>
<g:stylesheet xmlns:g="urn:eas:xsd:stylesheet.1.2"
    xmlns="urn:eas-samples:en:xsd:phonecalls.1.0">
</g:stylesheet>

Stylesheet Example

The code presented below shows a complete example of a stylesheet used to present the search results of the phonecalls example data. It uses all the available controls and, in addition, shows how to handle a variable number of associated content files by presenting them in a nested htable.

<?xml version="1.0" encoding="UTF-8"?>
<g:stylesheet xmlns:g="urn:eas:xsd:stylesheet.1.2"
    xmlns="urn:eas-samples:en:xsd:phonecalls.1.0"
    xmlns:eas="urn:x-emc:eas:schema:pdi">
  <g:title>Phonecalls</g:title>
  <g:main path="Call">
    <g:column label="">
      <g:details expanded_label="-" collapsed_label="+">
        <g:vtable>
          <g:row label="Customer ID">
            <g:value xpath="CustomerID/text()" format="" datatype="STRING" />
          </g:row>
          <g:row label="Last Name">
            <g:value xpath="CustomerLastName/text()" format="" datatype="STRING" />
          </g:row>
          <g:row label="First name">
            <g:value xpath="CustomerFirstName/text()" format="" datatype="STRING" />
          </g:row>
          <g:row label="Sent to archive at">
            <g:value xpath="SentToArchiveDate/text()" format="dd/MM/yyyy HH:mm:ss" datatype="DATE" />
          </g:row>
          <g:row label="Call started at">
            <g:value xpath="CallStartDate/text()" format="dd/MM/yyyy HH:mm:ss" datatype="DATE" />
          </g:row>
          <g:row label="Call ended at">
            <g:value xpath="CallEndDate/text()" format="dd/MM/yyyy HH:mm:ss" datatype="DATE" />
          </g:row>
          <g:row label="Call from phone number">
            <g:link url_xpath="concat('phone:',CallFromPhoneNumber/text())" style="phonelink" target="_notarget">
              <g:value xpath="CallFromPhoneNumber/text()" format="" datatype="STRING" />
            </g:link>
          </g:row>
          <g:row label="Call to phone number">
            <g:link url_xpath="concat('phone:',CallToPhoneNumber/text())" style="phonelink" target="_notarget">
              <g:value xpath="CallToPhoneNumber/text()" format="" datatype="STRING" />
            </g:link>
          </g:row>
          <g:row label="Representative ID">
            <g:value xpath="RepresentativeID/text()" format="" datatype="STRING" />
          </g:row>
          <g:row label="Search">
            <g:query id="form1" mode="replace">
              <g:value xpath="'Search for calls'" format="" datatype="STRING" />
              <g:criteria>
                <g:arg name="CallStartDate" value_xpath="'2000-01-01T08:00:00.000+01:00'"
                    operator="GreaterOrEqual" datatype="DATE" reminder_label="To date"></g:arg>
                <g:arg name="CallStartDate" value_xpath="current-dateTime()"
                    operator="LessOrEqual" datatype="DATE" reminder_label="From date"></g:arg>
                <g:arg name="CustomerID" value_xpath="CustomerID/text()"
                    operator="Equal" datatype="STRING" reminder_label="Customer ID"></g:arg>
              </g:criteria>
            </g:query>
          </g:row>
        </g:vtable>
        <g:details collapsed_label="+ Attachments" expanded_label="- Attachments">
          <g:htable path="Attachments/Attachment">
            <g:column label="File">
              <g:content cid_xpath="FileName/@eas:cid" filename_xpath="AttachmentName/text()" type="SHOW">
                <g:image url_xpath="'img/defaultContent.png'" width="16" height="16" alt_xpath="'alt'" />
              </g:content>
            </g:column>
            <g:column label="Name">
              <g:value xpath="AttachmentName/text()" datatype="STRING" />
            </g:column>
            <g:column label="Created by">
              <g:value xpath="CreatedBy/text()" datatype="STRING" />
            </g:column>
            <g:column label="Created on">
              <g:value xpath="CreatedOnDate/text()" datatype="DATE" format="dd/MM/yyyy HH:mm:ss" />
            </g:column>
          </g:htable>
        </g:details>
      </g:details>
    </g:column>
    <g:column label="Call received at">
      <g:value xpath="CallStartDate/text()" datatype="DATE" format="dd/MM/yyyy HH:mm:ss" />
    </g:column>
    <g:column label="ID">
      <g:value xpath="CustomerID/text()" datatype="NUMBER" />
    </g:column>
    <g:column label="First name">
      <g:value xpath="CustomerFirstName/text()" datatype="STRING" />
    </g:column>
    <g:column label="Last name">
      <g:value xpath="CustomerLastName/text()" datatype="STRING" />
    </g:column>
    <g:column label="Call ended at">
      <g:value xpath="CallEndDate/text()" datatype="DATETIME" format="yyyy/MM/dd HH:mm:ss" />
    </g:column>
    <g:column label="Audio">
      <g:content cid_xpath="Attachments/Attachment[AttachmentName='recording']/FileName/@eas:cid"
          filename_xpath="'audio.mp3'" type="SHOW" style="btn">
        <g:html>Listen</g:html>
      </g:content>
    </g:column>
  </g:main>
</g:stylesheet>

Localizing a Stylesheet

Like the search form, the search results screen can also be localized into multiple languages through its stylesheet. The locale you choose at InfoArchive GUI login is applied to both search forms and search results pages. You localize a stylesheet in much the same way you localize a search form. In the configuration of a localized stylesheet, the labels of the form controls are externalized. All the localized resources (page title and labels) are stored in a localization properties file, one for each language. The localization properties file is imported into the stylesheet configuration (eas_cfg_stylesheet) object as one of its renditions. It is applied to the corresponding language version of the stylesheet in InfoArchive through a language code defined for the application in the stylesheet configuration.

To localize a stylesheet into a non-English language:
1. Edit the properties of the stylesheet configuration (eas_cfg_stylesheet) object. Under the Description tab, add the language code to be supported by the InfoArchive GUI application, if the language code has not been added yet; for example: eas_gui is the application code for InfoArchive GUI.
2. Externalize the localization resources, such as the page title and labels.
Edit the XML content of the stylesheet configuration (eas_cfg_stylesheet) object to replace the actual page title and labels with string variables; for example:
• Externalized row header:
<g:row label="${customer.id.label}">
  <g:value xpath="CustomerID/text()" format="" datatype="STRING"/>
</g:row>
• Externalized column header:
<g:column label="${created.on.label}">
  <g:value xpath="CreatedOnDate/text()" datatype="DATE" format="dd/MM/yyyy HH:mm:ss" />
</g:column>
3. Create a localization properties file containing translations of the externalized resources (page title and labels) in the target language. The localization file must be encoded in UTF-8, and its file name must be suffixed with the language code and have the .properties extension, such as stylesheet.PhoneCalls.zh_CN.properties. In the following example localization properties file, strings translated into Simplified Chinese are assigned to the resource variables.
title=通话记录
customer.id.label=客户 ID
search.label=搜索
attachments.label=附件
file.label=文件
name.label=名称
created.by.label=创建人
created.on.label=创建日期
call.received.at.label=通话时间
id.label=ID
audio.label=音频
play.label=播放
4. Import the localization properties file as a rendition of the stylesheet configuration (eas_cfg_stylesheet) object.
a. Right-click the eas_cfg_stylesheet object and choose View > Renditions from the shortcut menu.
b. With the existing XML Document rendition selected, choose Import Rendition from the File menu.
c. Select the localization properties file to import.
d. If the localization properties file conforms to the correct naming convention and file format (.properties), InfoArchive recognizes its format as eas_localization (localization properties file) and fills in the Format field automatically.
e. In the Page modifier field, enter the language code, such as zh_CN for Simplified Chinese.
f. Click OK.
The imported file is displayed as a rendition of the stylesheet configuration (eas_cfg_stylesheet) object. The stylesheet is localized. When you view the search results pages in InfoArchive GUI using the corresponding locale, the page title and labels will all appear in the target language. When the localization properties file is missing for a specific locale, the default one defined in the eas-services.properties file (eas.locale.default=en_US) is used.

Implementing InfoArchive GUI Single Sign-On (SSO)

If you use a third-party authentication server (LDAP, Active Directory, and so on) for user login, you may want to implement single sign-on (SSO) and handle user context variables using a custom JSP file, instead of creating authorized users in the InfoArchive GUI web application. You can create a JSP file that calls the external authentication system to authenticate users logging in to InfoArchive GUI. In addition to the username and password, you can include other context variables in the JSP file, such as the email address and client IP address. These context variables are added to InfoArchive event properties for auditing purposes.
Here is an example of the custom JSP file for SSO:

<%@ page import="com.emc.documentum.eas.gui.gwt.server.EASGUIApplication" %>
<%@ page import="java.util.Map" %>
<%@ page import="java.util.HashMap" %>
<%@ page import="com.emc.documentum.eas.gui.gwt.server.webservices.EASAccessLayer" %>
<%@ page import="com.emc.documentum.eas.service.model.authenticate.AuthenticationResponse" %>
<%@ page import="com.emc.documentum.eas.gui.gwt.server.web.jsp.EASGuiJspUtils" %>
<%
String user = "eas_usr_webservice";
String password = "eas_usr_webservice";
Map<String, String> variables = new HashMap<String, String>();
// use variables.put() to include additional variables
variables.put("key1", "value1");
variables.put("key2", "value2");
AuthenticationResponse authenticationResponse = EASAccessLayer.login(user, password);
EASGUIApplication.getAuthenticationHook().onUserAuthenticated(user, null,
    authenticationResponse.getProfile(), variables);
response.sendRedirect(EASGuiJspUtils.getWelcomeURL(request));
%>

After you deploy the custom JSP file in the InfoArchive GUI directory, you can access InfoArchive GUI via this page; for example: http://myhost:8080/eas-gui/testsso.jsp

InfoArchive Reserved Context Variables

The InfoArchive UI application has the following reserved context variables:
• useragent
• remote.addr
• remote.host
• remote.user
You can audit these variables, which contain authentication and connection information. For example, you can enable the auditing of remote.addr by adding the following in eas-gui.properties:
eas.audit.remote.addr=true
After auditing is enabled, you can see the remote address information added as properties of an eas_query or eas_order event:
eas_consumer_ctxvar_name [0]: remote.addr
eas_consumer_ctxvar_value [0]: 127.0.0.1

Note: When the audit option is set to true in eas-gui.properties, calls to variables.put() for the reserved context variables in the JSP file are no longer effective.
Therefore, do not add InfoArchive reserved variables and hard-code their values in the custom JSP file for auditing purposes; enable reserved context variable auditing in eas-gui.properties instead.

Custom Context Variables

You can add custom variables such as DNS, email, and other authentication or connection information in the JSP file for SSO. The information is later added to each query or order event related to an SSO. For example, if you have the following in the JSP file:
variables.put("DNS", "10.27.0.10");
variables.put("Email", "Admin@InfoArchive.com");
The query or order event related to the SSO will have the following properties, which can be used for auditing purposes:
eas_consumer_ctxvar_name [0]: DNS
eas_consumer_ctxvar_value [0]: 10.27.0.10
eas_consumer_ctxvar_name [1]: Email
eas_consumer_ctxvar_value [1]: Admin@InfoArchive.com

Enabling Security Policies

By default, security policies are disabled for InfoArchive. You can enable them by choosing a security policy for your system. Enabling a security policy for InfoArchive consists of two configuration steps:
• Configuration on web services
• Configuration on web GUI
Typically, you perform the security configuration tasks after deploying the WAR files for web services and web GUI. InfoArchive provides the following security policies:
• Clear-text username/password policy
• X.509 policy
• Customized policies

Clear-text Username/Password Policy

The clear-text username/password policy has many options. The basic option is to validate the username and password saved in a file. Furthermore, you can include extra validation mechanisms for this policy, such as Documentum Content Server validation or LDAP validation.

Configuration on Web Services
1. Modify TOMCAT_HOME/webapps/eas-services/WEB-INF/wsdl/eas-service-ws-policies.wsdl.
a. Remove the comment tags (<!-- -->) for the Clear-text Username/Password Policy section.
b. Save the file.
Note: You can have only one active security policy for web services. Therefore, whenever you remove comment tags in eas-service-ws-policies.wsdl, you must add comment tags around the currently active policy to disable it.
2. Modify TOMCAT_HOME/webapps/eas-services/WEB-INF/cxf-context.xml.
a. Locate the jaxws:endpoint Properties for file-based Username/Password checking, jaxws:endpoint Properties for Documentum Username/Password checking, or jaxws:endpoint Properties for LDAP Username/Password checking section, and copy the whole <entry> element in the section to each <jaxws:properties> section in the file.
b. Examples of completed sections:
• A completed <jaxws:properties> section for the file-based validation should look like the following:
<jaxws:properties>
  <entry key="ws-security.callback-handler"
      value="com.emc.documentum.eas.webservice.security.UsernamePasswordCallbackHandler"/>
</jaxws:properties>
• A completed <jaxws:properties> section for the Documentum Content Server validation should look like the following:
<jaxws:properties>
  <entry key="ws-security.ut.validator" value-ref="documentumCredentialValidator"/>
</jaxws:properties>
• A completed <jaxws:properties> section for the LDAP validation should look like the following:
<jaxws:properties>
  <entry key="ws-security.ut.validator" value-ref="ldapCredentialValidator"/>
</jaxws:properties>
c. If you choose Documentum or LDAP validation, you must also remove the comment tags for the corresponding <bean> section. For example, remove the comment tags (<!-- -->) in the following section:
<!--
<bean id="documentumCredentialValidator"
    class="com.emc.documentum.eas.webservice.security.DocumentumCredentialValidator"/>
-->
d. Save cxf-context.xml.
3. Add the username and the password to TOMCAT_HOME/webapps/eas-services/WEB-INF/classes/uname_token.dat.
On the web services side, you can add multiple username and password combinations, with each pair of username and password for a front-end application. If you choose the LDAP validation for the username and the password, you must also complete the following steps:
1. Create a jaas.config file on the file system for LdapLoginModule. The file must contain the following information:
easldap {
  com.sun.security.auth.module.LdapLoginModule REQUIRED
  userProvider="ldap://LDAP_server_IP:LDAP_Port/ou=users,dc=mycompany,dc=com"
  authIdentity="cn=USERNAME,ou=users,dc=mycompany,dc=com"
  useSSL=false;
};
For a full list of parameters contained in this configuration file, refer to the Class LdapLoginModule reference page.
2. Ensure the web application server is restarted with -Djava.security.auth.login.config=<absolute path to jaas.config>.

Configuration on Web GUI

The only configuration task to perform is to add the username and the password to TOMCAT_HOME/webapps/eas-gui/WEB-INF/classes/uname_token.dat. On the web GUI side, you must add only one pair of username and password.

X.509 Certificates

Configuration on Web Services
1. Modify TOMCAT_HOME/webapps/eas-services/WEB-INF/wsdl/eas-service-ws-policies.wsdl.
a. Remove the comment tags (<!-- -->) for the X.509 Policy section.
b. Save the file.
Note: You can have only one active security policy for web services. Therefore, whenever you remove comment tags in eas-service-ws-policies.wsdl, you must add comment tags around the currently active policy to disable it.
2. Modify TOMCAT_HOME/webapps/eas-services/WEB-INF/cxf-context.xml.
a. Locate the jaxws:endpoint Properties for X.509 checking section, and copy the whole <entry> element in the section to each <jaxws:properties> section in the file.
b.
Examples of completed sections:
• A completed <jaxws:properties> section for the X.509 checking should look like the following:
<jaxws:properties>
  <entry key="ws-security.callback-handler"
      value="com.emc.documentum.eas.webservice.security.X509PasswordCallbackHandler"/>
  <entry key="ws-security.encryption.properties" value="WEB-INF/serviceKeystore.properties"/>
  <entry key="ws-security.signature.properties" value="WEB-INF/serviceKeystore.properties"/>
  <entry key="ws-security.encryption.username" value="useReqSigCert"/>
</jaxws:properties>
c. Save the file.
3. Create keystores for InfoArchive web services and each client with an X.509 v3 certificate. The X.509 certificate contains a new private key using the RSA algorithm for key generation and the SHA1 with RSA algorithm for the certificate signature.
4. Import the certificate with the public key of InfoArchive web services into each client keystore, and the public key of each client into the InfoArchive web services keystore.
5. Place the InfoArchive web services keystore in a safe directory on the server where InfoArchive web services are hosted.
6. Modify the TOMCAT_HOME/webapps/eas-services/WEB-INF/serviceKeystore.properties file with the correct values for the InfoArchive web services keystore:
org.apache.ws.security.crypto.provider=<cryptography provider>
org.apache.ws.security.crypto.merlin.keystore.file=<path to keystore>
org.apache.ws.security.crypto.merlin.keystore.password=<service keystore password>
org.apache.ws.security.crypto.merlin.keystore.type=<keystore type>
org.apache.ws.security.crypto.merlin.keystore.alias=<service private key alias>
org.apache.ws.security.crypto.merlin.keystore.private.password=<service private key password>
• cryptography provider is the class providing the cryptography implementation. The default is org.apache.ws.security.components.crypto.Merlin.
• path to keystore is the absolute path to the keystore file containing the InfoArchive web services private key, for example, C:\path_to_keystores\serviceKeystore.jks.
• service keystore password is the password protecting the keystore file.
• service private key alias is the alias of the InfoArchive web services private key.
• service private key password is the password protecting the InfoArchive web services private key.

Configuration on Web GUI
1. Create a .java file with the following example code. The service key alias in the last line can be found in serviceKeystore.properties in Step 6 of the Configuration on Web Services section.
package com.emc.documentum.eas.gui.gwt.server.webservices;

import org.apache.cxf.endpoint.Client;

public class CustomSecurityConfigurator implements SecurityConfigurator {
    @Override
    public void configureSecurity(Client client) throws Exception {
        client.getRequestContext().put("ws-security.callback-handler",
            "com.emc.documentum.eas.gui.gwt.server.webservices.X509PasswordCallbackHandler");
        client.getRequestContext().put("ws-security.encryption.properties",
            "clientKeystore.properties");
        client.getRequestContext().put("ws-security.signature.properties",
            "clientKeystore.properties");
        client.getRequestContext().put("ws-security.encryption.username",
            "<service key alias>");
    }
}
2. Compile the .java file, then place the compiled .class file in TOMCAT_HOME/webapps/eas-gui/WEB-INF/classes/com/emc/documentum/eas/gui/gwt/server/webservices.

Customizing a Security Policy

You can also customize a security policy for InfoArchive web services. Enabling a customized policy for InfoArchive consists of configuration tasks on web services and web GUI. On the web services side, you must perform the following tasks:
1. Add the customized policy to TOMCAT_HOME/webapps/eas-services/WEB-INF/wsdl/eas-service-ws-policies.wsdl, and ensure the customized policy is the only active one.
The active policy must have the attribute wsu:Id="ActivePolicy", and the other policies must be disabled.
2. Add the customized policy to the Apache CXF configuration file cxf-context.xml.
On the web GUI side, you must create a customized CustomSecurityConfigurator class and place the .class file in the specified directory. The following code sample outlines a CustomSecurityConfigurator class. As with X.509 certificates, you must place the compiled class in TOMCAT_HOME/webapps/eas-gui/WEB-INF/classes/com/emc/documentum/eas/gui/gwt/server/webservices.

  package com.emc.documentum.eas.gui.gwt.server.webservices;

  import org.apache.cxf.endpoint.Client;

  public class CustomSecurityConfigurator implements SecurityConfigurator {
      @Override
      public void configureSecurity(Client client) throws Exception {
          // CXF configuration goes here.
      }
  }

The following websites provide detailed information about security policies:
• Apache CXF
• WS-Security configuration
• WS-SecurityPolicy configuration

Customizing InfoArchive GUI
The installation of InfoArchive includes a standard InfoArchive web GUI. You can customize UI strings and CSS on the web GUI.

UI strings
Files in the folder TOMCAT_HOME/webapps/eas-gui/WEB-INF/classes/l10n contain UI strings that can be customized to meet your requirements. These properties files are encoded with ANSI character sets. You must use an ANSI-compatible text editor to avoid introducing garbage characters.

CSS
You can define the style for your InfoArchive GUI in eas-gui/custom.css, which is an empty file after deployment. You no longer need to add a reference to the customized CSS in local JSP files. In JSP files, the following link refers to custom.css:

  <link type="text/css" rel="stylesheet" href="custom.css?v=1"/>

Autocomplete
You can activate the autocomplete feature on the InfoArchive login page by setting the autocomplete value to ON in eas-gui/WEB-INF/pages/LoginPage.jsp.
When the autocomplete feature is turned on, login fields are populated automatically after your first login.

Configuring Advanced InfoArchive GUI Settings
The eas-gui.properties file located in the InfoArchive GUI web application directory (for example, TOMCAT_HOME/webapps/eas-gui/WEB-INF/classes on Apache Tomcat) contains preset InfoArchive web services settings and some advanced settings that you can configure if needed. You can edit the eas-gui.properties file to modify the following advanced settings:

• eas.cache.directory (default /tmp/eas): InfoArchive GUI cache directory
• eas.locale.default (default en_US): Default InfoArchive GUI locale
• eas.locales: Properties for defining InfoArchive locales for GUI search forms and stylesheets (search results pages); see Defining InfoArchive GUI Locales, page 195.
• eas.client.config.default.dateformat (default yyyy-MM-dd): Default date format
• eas.client.result.page.size (default 10): Maximum number of search results that can be displayed on a synchronous search results page
• eas.client.order.page.size (default 10): Maximum number of search results that can be displayed on an asynchronous/background search results page
• eas.export.content.quota (default 200): Maximum total size in MB of unstructured content files that can be exported in a single operation
• eas.cache.clean.shutdown (default true): Whether or not the entire cache is cleared on application shutdown (true or false)
• eas.cache.clean.logout (default true): Whether or not a session's cache is cleared on logout (true or false)
• eas.xdb.page.size (default 8192): Size of the xDB page used for the cache
• eas.client.order.delivery.channels: Delivery channels used by orders (asynchronous searches). You can define multiple channels in addition to eas_access_services, delimited by commas: eas.client.order.delivery.channels=eas_access_services, my_delivery_channel_1, my_delivery_channel_2

After you modify the settings, restart the web application for the changes to take effect.
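Taken together, an eas-gui.properties fragment using these settings might look like the sketch below. The values shown are the documented defaults, except my_delivery_channel_1, which is an illustrative channel name, not a shipped setting:

```properties
# Advanced InfoArchive GUI settings (documented defaults shown;
# my_delivery_channel_1 is an illustrative example channel).
eas.cache.directory=/tmp/eas
eas.locale.default=en_US
eas.client.config.default.dateformat=yyyy-MM-dd
eas.client.result.page.size=10
eas.client.order.page.size=10
eas.export.content.quota=200
eas.cache.clean.shutdown=true
eas.cache.clean.logout=true
eas.xdb.page.size=8192
eas.client.order.delivery.channels=eas_access_services,my_delivery_channel_1
```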
Configuring Confirmations
You configure confirmations using the confirmation configuration object (eas_cfg_confirmation) in conjunction with the query configuration object (eas_cfg_query) and the delivery channel object (eas_cfg_delivery_channel). Here are the general steps for configuring confirmations:
1. Configure a delivery channel configuration object for confirmations.
2. Configure a query configuration object for confirmations.
3. Optionally, define the criteria for applicable AIPs.
4. Configure a confirmation configuration object.

Confirmation Configuration (eas_cfg_confirmation) Object
The confirmation configuration object primarily determines the scope of confirmation processing: which set of AIPs to generate confirmations for (defined by an XQuery text file imported as the content of the confirmation configuration object) and for what event types.
The confirmation configuration object also specifies:
• The result schema name used to find the query configuration object, which defines the content of confirmation messages (through an XQuery XML file imported as the content of the query configuration object)
• The delivery channel object to use, which defines the output destination of confirmation messages. The confirmation configuration object also passes delivery parameters to the delivery channel object.

Query Configuration Object (eas_cfg_query) for Confirmations
You configure a query configuration object (eas_cfg_query) to define the content of confirmation messages.
The XML content of the query configuration object defines:
• The path of the XML elements to return
• The configured XQuery to execute on the selected XML elements for adjusting the results

The properties of the query configuration object define:
• The holdings against which the confirmation job runs
• The schema URN to which the query results must comply

The confirmation job uses the following criteria to find a query configuration object:
• The holding of the AIP
• The schema URN of the confirmation configuration object

If there is no query configuration object that matches these criteria, the confirmation job does the following and exits with a non-zero return code:
• Writes an error message in the log
• Attempts to generate the remaining confirmations applicable to the AIP event
• Does not set the confirmation timestamp for the AIP event
• Continues the processing for other AIP events

The confirmation job will attempt again to generate the confirmations for this AIP event the next time it runs. Confirmation messages that have already been successfully generated for this AIP event will be regenerated at the next job run.

XQuery for Confirmations
The XQuery content of the query configuration object for confirmations defines the content of confirmation messages. During confirmation processing, the confirmation job executes the XQuery on the SIP descriptor (eas_sip.xml) or the PDI (eas_pdi.xml) rendition of the AIP and returns the query results in the designated format to construct the confirmation message.
The default execution scope of the XQuery depends on the event type:

Event Type    Default Execution Scope         Changeable
receipt       SIP descriptor (eas_sip.xml)    No
storage       PDI file (eas_pdi.xml)          Yes
purge         PDI file (eas_pdi.xml)          Yes
reject        SIP descriptor (eas_sip.xml)    No
invalid       SIP descriptor (eas_sip.xml)    No

For storage and purge event types, the XQuery can use any information stored in the AIP structured data (eas_pdi.xml and eas_ci.xml).
The XQuery can return results in an XML file or in other file formats such as a .csv or fixed record file. A fixed record file is a file containing records requested by a business application, where each field has a constant maximum size; for example, info1 from position 1 to 10, info2 from position 11 to 15, info3 from position 16 to 20, and so on. Each field has a predefined layout. Fixed record files are often used to generate confirmations with data produced by a mainframe application.
For example, you can write an XQuery that returns a subset of the keys of archived documents along with their associated InfoArchive content identifiers in fixed record files. A business application reading the confirmation message can then use the content identifier to retrieve content without having to issue an InfoArchive search first; the content is extracted directly from the eas_ci_container rendition, using the ci position.
Dynamic variables can also be used in the XQuery for confirmations.
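The fixed record layout described above (info1 in positions 1 to 10, info2 in 11 to 15, info3 in 16 to 20) can be sketched in a few lines of shell. The helper function below is purely illustrative and not part of InfoArchive; it only demonstrates how constant-width, padded fields form one record:

```shell
# Illustrative only: build one fixed record with three constant-width fields
# (info1: 10 chars, info2: 5 chars, info3: 5 chars). Each field is
# left-aligned, space-padded, and truncated if too long.
make_fixed_record() {
  printf '%-10.10s%-5.5s%-5.5s' "$1" "$2" "$3"
}

# Example: a 20-character record for one archived document.
make_fixed_record "2011020113" "00042" "inv"
```

A mainframe application can then parse each record purely by character positions, without delimiters.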
In the following example, the AIP event is defined as an external variable ($eas_conf_type):

  <request-configs xmlns:s="urn:x-emc:eas:schema:sip:1.0">
    <request-config>
      <entity>
        <path/>
      </entity>
      <query-template xmlns:xhive="http://www.x-hive.com/2001/08/xquery-functions">
        declare variable $eas_conf_type as xs:string external;
        <confirmation xmlns='urn:x-emc:eas:schema:confirmation:1.0'>
        {
          let $uriSip := root(<select/>)
          let $uriSip := replace(document-uri($uriSip), '\.pdi$', '.sip')
          return (doc($uriSip)/s:sip,
            element eas_aip_id {xhive:metadata(doc($uriSip), 'eas_aip_id')},
            element eas_conf_type {$eas_conf_type})
        }
        </confirmation>
      </query-template>
    </request-config>
  </request-configs>

Here is an example of the confirmation message returned by the XQuery:

  <confirmation xmlns="urn:x-emc:eas:schema:confirmation:1.0">
    <sip xmlns="urn:x-emc:eas:schema:sip:1.0">
      <dss>
        <holding>PhoneCalls</holding>
        <id>2011020113</id>
        <pdi_schema>urn:eas-samples:en:xsd:phonecalls.1.0</pdi_schema>
        <pdi_schema_version/>
        <production_date>2011-02-01T00:00:00.000+01:00</production_date>
        <base_retention_date>2011-02-01T00:00:00.000+01:00</base_retention_date>
        <producer>CC</producer>
        <entity>PhoneCalls</entity>
        <priority>0</priority>
        <application>CC</application>
      </dss>
      <production_date>2011-02-01T00:00:00.000+01:00</production_date>
      <seqno>1</seqno>
      <is_last>true</is_last>
      <aiu_count>10</aiu_count>
      <page_count>0</page_count>
    </sip>
    <eas_aip_id>0800000b80009c9e80009ca3</eas_aip_id>
    <eas_conf_type>invalid</eas_conf_type>
  </confirmation>

Delivery Channel Configuration Object (eas_cfg_delivery_channel)
The delivery channel configuration object defines the output destination of confirmation messages.
InfoArchive provides two delivery channel implementations:

Delivery Channel (Java Class)                                 Output Destination
com.emc.documentum.eas.delivery.module.FileDeliveryModule     File system
com.emc.documentum.eas.delivery.module.XdbDeliveryModule      xDB

Using the xDB delivery channel facilitates the aggregation of multiple messages using a custom job, as is the case with the InfoArchive audit archive holding. If you use the xDB delivery channel, confirmation messages must be in XML format.
The delivery channel configuration object accepts a set of parameters. You define the values of the parameters to pass to the delivery channel configuration object when configuring the confirmation configuration object.

Delivery Channel Configuration Parameters
Each type of delivery channel (file system and xDB) has a list of valid configuration parameters. A parameter marked with an asterisk (*) is required.
File system delivery channel configuration object parameters are as follows:

name *        Filename convention for confirmations: a pattern consisting of fixed parts and confirmation variables used to construct a unique filename for each generated confirmation; for example: conf_%eas_conf_aip_id%_%eas_conf_type%
file.path *   Full path to the destination output directory; for example: c:/tmp/infoarchive/confirmations. Make sure the specified directory exists.
file.name.prefix    Prefix of the file
file.name.suffix    Suffix of the file, sometimes used to denote the file extension; for example: .xml
file.overwrite      (default false) If the file already exists, whether or not to overwrite the existing file (true|false)
file.zip            (default false) Whether to compress the file and add the .zip extension (true|false)
file.audittrail     (default false) Whether to generate a confirmation file export audit trail entry (true|false)
audit:event_name    (default eas_delivery) Event name to put in the confirmation file export audit trail entry
file.audittrail.content.attr    (default null) Attribute where the OID content is saved. If null, the OID content is not saved.

Some electronic archiving regulations require generated confirmation messages to be archived. Setting the name of an id attribute of the audittrail object type in the file.audittrail.content.attr parameter triggers:
• The import of the generated confirmation file as an eas_audit_trail_content repository object
• The assignment of the eas_audit_trail_content object identifier to the designated id attribute of the audit trail
Archiving audit trails also archives the content associated with those audit trail entries. After archiving, InfoArchive destroys the eas_audit_trail_content objects in the repository.
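As an illustration, a file system delivery channel for confirmations might carry the following parameter name/value pairs on the eas_cfg_delivery_channel object. The path and naming pattern are the example values from this section, not required settings, and rendering the repeating eas_parameter_name/eas_parameter_value attributes as flat name=value lines is a simplification:

```properties
# Example file system delivery channel parameters
# (eas_parameter_name / eas_parameter_value pairs at matching indexes).
name=conf_%eas_conf_aip_id%_%eas_conf_type%
file.path=c:/tmp/infoarchive/confirmations
file.name.suffix=.xml
file.overwrite=false
file.zip=false
file.audittrail=true
```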
audit:audited_obj_id      (default AIP OID)
audit:string_1|2|3|4|5    Audit strings:
  • string_1: eas_aip_id
  • string_2: eas_dss_holding,eas_dss_producer,eas_dss_id
  • string_3: eas_aip_id:ci:position[:start_page:page_count]
  • string_4: pdi_key
  • string_5: audit
audit:id_1|2|3|4|5        Audit IDs

xDB delivery channel configuration object parameters are as follows:

eas_cfg_xdb_library *    Name of the eas_cfg_xdb_library object, which points to the xDB library where generated confirmations are stored
name *                   Naming convention for the XML documents created for confirmations: a pattern consisting of fixed parts and confirmation variables used to construct a unique name for each generated confirmation; for example: conf_%eas_conf_aip_id%_%eas_conf_type%
meta:metadata_name       The name of the xDB metadata to assign to each XML document. You can set a confirmation variable as the value of the metadata; for example, assign variable %eas_conf_datetime% to parameter meta:eas_conf_datetime.

The following confirmation variables can be used to construct filenames or XML document names for confirmations and to assign metadata to XML documents.
Confirmation Variable       Description
eas_aip.property_name       Specify any property of the eas_aip object
eas_conf_audittrail_id      r_object_id of the audit trail entry created for the current confirmation processing
eas_conf_type               Confirmation event type
eas_conf_datetime           Date and time when the processing of the current confirmation started
eas_conf_event_datetime     Date and time when the confirmed event occurred
eas_conf_aip_id             The AIP object ID
eas_conf_aip_oid            r_object_id of the AIP object
eas_conf_cfg_id             r_object_id of the confirmation configuration object (eas_cfg_confirmation)
eas_conf_cfg                Object name of the confirmation configuration object (eas_cfg_confirmation)
eas_conf_schema             eas_result_schema of the confirmation configuration object (eas_cfg_confirmation)
eas_conf_schema_version     eas_result_schema_version property of the confirmation configuration object (eas_cfg_confirmation)

Configuring a Delivery Channel Configuration Object for Confirmations
1. In DA, navigate to a folder where you want to create the delivery channel configuration object. You can use an archive holding folder (e.g., /System EAS/Archive Holdings/MyHolding) if you are configuring the delivery channel exclusively for that holding, or a shared folder (e.g., /System EAS/Delivery Channels) if the delivery channel is to be shared by multiple holdings.
2. Create a new object of type eas_cfg_delivery_channel and set its properties.

Property Name    DA Label               Description
eas_name         Name                   The unique name of the delivery channel.
eas_java_class   Delivery Java class    Fully qualified name of the Java class implementing the delivery channel. InfoArchive provides two delivery channel implementations: com.emc.documentum.eas.delivery.module.FileDeliveryModule for the file system delivery channel and com.emc.documentum.eas.delivery.module.XdbDeliveryModule for the xDB delivery channel.
eas_parameter_name    Parameter name     Name of the delivery channel configuration parameter. Each type of delivery channel (file system and xDB) has a set of valid parameters.
eas_parameter_value   Parameter value    The value of the delivery channel configuration parameter specified at the same index. These parameters are passed to the delivery channel implementation class at runtime.

Configuring a Query Configuration (eas_cfg_query) Object for Confirmations
1. Create an XML file containing the XQuery for generating confirmations. For information about how to construct the XQuery, see XQuery for Confirmations, page 224.
2. In DA, create a new object of type eas_cfg_query in the archive holding folder by importing the XQuery XML file, and then configure its properties.

Property Name            DA Label                                 Description
eas_name                 Name                                     Technical name of the query configuration.
eas_result_schema        Result schema name                       Name of the schema in which the query results are returned.
eas_result_root_element  Result root element                      The root XML element of the result.
eas_cfg_aic              Archival Information Collection (AIC)    The Archival Information Collections (AICs) that this query configuration can be used to query. A query configuration can query multiple AICs.

Defining the Criteria for Applicable AIPs (Optional)
By default, a confirmation configuration is applicable to all AIPs. To restrict the applicability of a confirmation configuration to a subset of AIPs, such as those of a particular holding, create a text file containing the XQuery defining the applicability scope and import it into the repository as the content of the confirmation configuration object.
The XQuery's execution context is restricted to the SIP descriptor (eas_sip.xml) associated with an AIP and cannot be dynamically changed to query the PDI file (eas_pdi.xml). Any information contained in an AIP's SIP descriptor can be used to define the applicability condition.
The XQuery applicability condition must return a boolean value (true/false). Only when the returned value is true is the confirmation configuration applicable to the AIP.
In the following example, the XQuery limits the application of the confirmation configuration to only the AIPs pertaining to the PhoneCalls holding:

  declare namespace sip="urn:x-emc:eas:schema:sip:1.0";
  let $holding := /sip:sip/sip:dss/sip:holding
  return matches($holding, 'PhoneCalls', 'i')

The "i" argument of the matches XQuery function indicates that a case-insensitive comparison of the values is performed. For more information about XQuery functions and operators, see http://www.w3.org/TR/xpath-functions/#flags.

Configuring a Confirmation Configuration Object
In Documentum Administrator, create a new object of type eas_cfg_confirmation in the archive holding folder and configure its properties. If you created an XQuery text file for defining the applicability scope of the confirmation configuration, choose File > Import to import the XQuery text file to create the object; otherwise, choose File > New > Document.

Property Name          DA Label              Description
eas_confirmation_type  Confirmation type     Confirmation event types that this configuration is applicable to: receipt, storage, purge, reject, invalid.
eas_result_schema      Result schema name    Uniform resource name of the result schema for generating confirmation messages; for example: urn:x-emc:eas:schema:confirmation:1.0. This value is used to find the query configuration object, which defines the content of confirmation messages. Only query configuration objects with an identical value of the eas_result_schema property will be used for the confirmation configuration.
eas_delivery_channel   Delivery channel      Delivery channel to use for outputting confirmation messages.
eas_delivery_param_name   Delivery parameter name    Name of the delivery channel parameter to pass to the delivery channel.
eas_delivery_param_value  Delivery parameter value   Value of the delivery channel parameter to pass to the delivery channel.

Chapter 4  InfoArchive Administration

InfoArchive Administration Overview
EMC InfoArchive provides the functions and services for the overall operation of the archive system, including the following administrative functions:
• Managing data throughout its entire lifecycle within InfoArchive, from archiving data into the system, to managing archived data, to disposing of information at the end of its retention period
• Continuously monitoring the functionality of the entire InfoArchive system and systematically controlling changes to the configuration. This function maintains the integrity and traceability of the configuration and audits system operations, system performance, and system usage. It receives operational statistics from archival storage areas and periodically provides archival reports.
• Defining and modifying retention policies for archived data
• Managing InfoArchive jobs to monitor and improve archiving operations, and to inventory, report on, and update the contents of the archive
• Managing audit trails to ensure established archive standards and policies are met in compliance with regulatory and legal requirements
• Performing backup and restore as part of disaster recovery capabilities
• Querying archives using the InfoArchive GUI
The level of complexity of the administrative tasks depends heavily on the nature of the archived information and the level of integration of InfoArchive into your IT infrastructure.

Generating SIPs
InfoArchive is source-application-agnostic: it can archive any data produced by any source application, as long as the data is packaged into a designated InfoArchive-supported format for ingestion.
For example, InfoArchive can archive scanned documents, recorded videos, business data exported from an ERP system, and, of course, information extracted from EMC Documentum Content Server. Once archived, the information is securely preserved and can be retrieved at any time.
InfoArchive is not responsible for generating SIPs; you must use the source application or develop your own utilities to generate SIPs that conform to the InfoArchive required format. You can optionally use a file transfer utility to transport the generated SIPs to where they can be ingested by InfoArchive.
EMC provides the following out-of-the-box utilities for generating SIPs:
• FS SIP Creator
FS SIP Creator is a standalone, configurable command line tool that creates SIPs from a PDI file template, a metadata configuration file, and content files on the file system. This tool ships with InfoArchive and can be found in the /unsupported/tools/ directory of the InfoArchive installation package. For information about using FS SIP Creator, refer to its accompanying documentation.
• EMC InfoArchive Documentum Connector
Documentum Connector is a command-line data extraction and transformation utility that exports content to be archived directly from the EMC Documentum repository and generates Submission Information Packages (SIPs) to ingest into InfoArchive. For information about using Documentum Connector, refer to the EMC InfoArchive Documentum Connector User Guide.
• EMC InfoArchive SharePoint Connector
SharePoint Connector is a command-line data extraction and transformation utility that extracts documents and task items from SharePoint sites and generates SIPs to ingest into InfoArchive. For information about using SharePoint Connector, refer to the EMC InfoArchive SharePoint Connector User Guide.
Neither EMC InfoArchive Documentum Connector nor EMC InfoArchive SharePoint Connector is part of the InfoArchive installation package; both must be downloaded separately.
For information about how to use these utilities, refer to their respective user guides, downloadable from the EMC Online Support site (https://support.emc.com).

PDF Files Consolidation & Dynamic Extraction
When there are a large number of PDF documents with identical formatting and layout to be archived, instead of generating a SIP containing multiple PDF files, one for each AIU, you can consolidate all PDF files in the SIP into a single PDF file, with each AIU corresponding to a specific page range of the file. A SIP can include several such consolidated PDF files. When you perform queries on the archived package, AIUs are returned with their associated content files (PDF) dynamically generated by extracting the corresponding pages from the consolidated PDF file.
For example, suppose you want to archive invoice records in PDF format. Instead of generating a SIP containing many PDF documents, one for each invoice record, you can create a SIP that contains a single consolidated PDF file. In the PDI file eas_pdi.xml, each invoice record (AIU) contains a reference to a specific page range (denoted by a start page and a page count) of the PDF file. When you query archived invoice records, each invoice record in the search results is returned as a single PDF document, dynamically generated based on its page range in the consolidated PDF file.
PDF files consolidation and dynamic extraction provide the following benefits:
• Because information stored in one consolidated PDF file is shared by multiple documents (AIUs), only one content file is needed, as opposed to one PDF file for each AIU.
• Consolidating large numbers of PDF documents into one significantly reduces the total file size (to at least one-third of the original size) and speeds up the ingestion process.

Generating SIPs Containing Consolidated PDF Files
When you generate SIP files that contain one or more consolidated PDF files, use a PDI schema that includes the start_page and page_count elements.
InfoArchive uses these two elements to calculate an AIU's page range in the consolidated PDF file: start_page indicates on which page a PDF document (AIU) starts, and page_count indicates how many pages the document spans, with start_page being the first page.
In the following sample eas_pdi.xml:
• The first document corresponds to the first two pages of the file Invoices001.pdf.
• The second document corresponds to pages 3 through 5 of the file Invoices001.pdf.
• The third document corresponds to the first page of the file Invoices002.pdf.

  ...
  <document>
    <start_page>1</start_page>
    <page_count>2</page_count>
    <filename>Invoices001.pdf</filename>
  </document>
  <document>
    <start_page>3</start_page>
    <page_count>3</page_count>
    <filename>Invoices001.pdf</filename>
  </document>
  <document>
    <start_page>1</start_page>
    <page_count>1</page_count>
    <filename>Invoices002.pdf</filename>
  </document>
  ...

When using the PDF files consolidation & dynamic extraction feature, note the following:
• Limit the number of pages in the consolidated PDF file to around 200 for optimal query performance. Too many PDF pages lowers query performance.
• Ensure the values of the start_page and page_count elements in eas_pdi.xml are valid. The ingestion process does not validate these values. If page ranges are invalid, PDF documents can still be ingested but cannot be successfully retrieved through queries.
• Ensure consolidated PDF files do not have any security settings enabled. InfoArchive cannot handle PDF files with certain security settings turned on, for example, password protection or page extraction restriction. The ingestion process does not check the validity of PDF files.

Starting the xDB Cache
If you use xDB ingestion mode 2, you must start the xDB cache before launching the ingestor. The xDB cache is a standalone Java program that runs as a background daemon for importing the xDB detachable library associated with the AIP at the end of the ingestion.
You can run only one instance of the xDB cache.
By default, the arguments for the xDB cache process are defined in the EAS_HOME/conf/eas-xdb-cache.properties file. The name of the xDB access node configuration (eas_cfg_cache_access_node) object to be used by the xDB cache process is defined in this file.
Note: The xDB server can be hosted on a dedicated server. If this is the case, you must start the xDB cache process on the xDB server using the same OS account.
You can also start/stop the xDB cache via commands, and you can override the arguments defined in eas-xdb-cache.properties at the command line as needed.
Start the xDB cache by executing the following command located in EAS_HOME/bin:
• eas-launch-xdb-cache.sh (Linux)
• eas-launch-xdb-cache.bat (Windows)
Keep the prompt window open.
Note: To stop the xDB cache, you must explicitly execute the following command located in EAS_HOME/bin:
• eas-stop-xdb-cache.sh (Linux)
• eas-stop-xdb-cache.bat (Windows)
Shutting down the host without stopping the xDB cache properly may cause a failure the next time you try to start it. If this happens, force-start the xDB cache using the force (-f) option.
On Windows, the xDB cache is installed as a Windows service. You can run/stop the xDB cache by starting/stopping the EAS XDB Cache service in the Services Microsoft Management Console (MMC).
Note: NEVER execute the eas-launch-xdb-cache script when the EAS XDB Cache Windows service is already running.

Archiving Data in Asynchronous Ingestion Mode
In the asynchronous ingestion mode, you archive data into InfoArchive in the following distinct steps by running the corresponding scripts or jobs (typically scheduled) located in EAS_HOME/bin:
1. Reception (eas-launch-receiver.bat / eas-launch-receiver.sh)
Queues the SIP file for ingestion. An AIP object is created in the Documentum repository and assigned attribute values based on the information in the SIP.
2.
Enumeration (eas-launch-enumeration.bat / eas-launch-enumeration.sh)
Outputs a list of AIPs awaiting ingestion for a specified ingestion node.
3. Ingestion (eas-launch-ingestor.bat / eas-launch-ingestor.sh)
Ingests the AIPs and their associated AIUs, in order of AIP ingestion priority. At this stage, the AIUs are not yet searchable.
4. Commit (eas-launch-job-commit.bat / eas-launch-job-commit.sh)
Commits the AIUs into the xDB database. The AIU data for the holding can now be searched.
For each command, its parameters are configured through a properties file or directly in the command line. Once read from the repository, the settings in the holding configuration override the arguments provided through the properties file or the command line.

Receiving SIPs
The eas-launch-receiver script executes the receiver process, which performs the following actions:
1. Creates an AIP (eas_aip) object
2. Creates a directory under the reception root working directory (EAS_HOME/working/reception_node_name)
3. Attaches the eas_receive.en_US lifecycle to the AIP
4. Promotes the AIP through the lifecycle
5. Executes actions appropriate to each lifecycle state
6. After reception is complete, deletes the received file from the receiving area
Before running the receiver, make sure a reception node has been configured for the reception. See Configuring a Reception Node Configuration (eas_cfg_receive_node) Object, page 143.

Configuring the Receiver Properties
Configure the general receiver properties, such as the reception node, repository name, and login credentials, in EAS_HOME/conf/receiver.properties as needed.

Property Long (Short) Name    Description    Default Value
config (c)    Name of the reception node configured for the reception. This value must match the eas_name property of the reception node configuration (eas_cfg_receive_node) object so as to associate the AIP being received with the object.    reception_node_01
domain (d)      The user's domain, if any    Empty value
delete (e)      (Optional) Deletes the supplied file (specified in the file option). This option does not require any argument.    N/A
file (f)        The file to process    N/A
aek (k)         (Optional) The aek file to use for the encrypted password used to connect to the repository. This must be the password for the user specified in the user property.    N/A
level (l)       (Optional) Logging level: ERROR, WARN, DEBUG, INFO, TRACE    N/A
customer (o)    The name of the customer supplying the file    EAS
password (p)    (Optional) The user password with which to connect to the repository. If the job executes on the same host where Content Server is installed, you do not need to specify this property here; the job can connect to the repository through the Content Server trusted login feature.    Password of the installation owner user account
rkm (r)         (Optional) The RSA DPM client configuration file to use to connect to the RSA DPM Server    N/A
server (s)      The repository name    Name of the repository
type (t)        A qualifying type identifying the configuration of the file to process    EAS
user (u)        The name of the Documentum user with which to connect to the repository    Login name of the installation owner user account

Running the Receiver
Except for the file argument, all other mandatory arguments default from the properties file but can be overridden at the command line:

  eas-launch-receiver -f SIP_file_path_name

Note: The SIP file to be received cannot be read-only; otherwise, a reception error will occur.
Return code 0 (zero) indicates the SIP file has been successfully received. A good practice is to not delete the original file without getting a return code of zero from the receiver. If the reception fails, the created AIP object will be located in the reception node root folder. You can either delete or invalidate the AIP object, troubleshoot the receiver error, and run the receiver again.
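The practice of deleting the original SIP only on a zero return code can be sketched as a small wrapper script. This is an illustrative sketch, not an InfoArchive tool: the archive_sip function name and RECEIVER override are assumptions, and the default script path follows this guide's EAS_HOME/bin examples:

```shell
#!/bin/sh
# Illustrative sketch: run the receiver on one SIP file and delete the
# original only when the receiver returns 0 (successful reception).
# RECEIVER can be overridden (e.g., for testing); the default path is an
# assumption based on this guide.
archive_sip() {
  sip="$1"
  receiver="${RECEIVER:-${EAS_HOME:-/opt/eas}/bin/eas-launch-receiver.sh}"
  if "$receiver" -f "$sip"; then
    rm -f -- "$sip"    # return code 0: safe to delete the original SIP
    return 0
  else
    rc=$?
    echo "reception failed with code $rc; keeping $sip" >&2
    return "$rc"
  fi
}
```

On failure, the SIP is kept so the reception can be retried after troubleshooting.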
Verifying the Reception Process
Return code 0 (zero) indicates the SIP file has been successfully received. You can optionally check the following to verify the reception process:
• A new AIP object is created in the root AIP classification folder in the repository.
• The AIP is in queue for ingestion, as indicated by its Waiting_Ingestion state.
• As displayed on the AIP properties page (AIP tab), both the retention date and ingestion deadline were computed during the reception process. Most of the properties are populated using the information in the SIP descriptor (eas_sip.xml).
• Information related to the reception is recorded under the Tracking tab of the AIP properties page for tracking and reporting purposes.
• The reception log file (eas_receive_logs_zip) has been included as part of the AIP renditions.
Troubleshooting Receiver Errors
When a reception error occurs, a non-zero return code is returned by the receiver command line, and more information can be found in the reception log file created in the working directory of the reception. If configured, the reception log is also attached as content of the AIP object.
For reporting and traceability purposes, the following information is also populated as attributes of the AIP repository object created by the reception:
• Return code and message of the reception error
• Reception date, node, working directory
• Received file path name and size
The AIP repository object remains in the reception lifecycle state where the error occurred. After diagnosing and fixing the cause of the reception error:
• The AIP repository object created by the failed reception must be invalidated in the DA interface.
• A new reception must be launched; this reception creates a new AIP repository object.
This procedure does not correspond to a restart of the failed reception; a restart approach was not adopted for the following reasons:
• Considering the basic actions performed by the receiver, it is judged to be a low source of errors.
• Nothing prevents the received file from being overwritten by a newer file between the reception error and a restart. As such, restarting a reception by relying solely on the file system path of the received file is not considered a safe approach.
Receiver Return Codes
Return Code Mnemonic Description
-1 E_UNEXPECTED Unexpected error
0 OK Successful execution
1 E_PARSE Error while parsing arguments
2 E_DFCINIT Error while initializing the DFC
3 E_CREDENTIALS Cannot connect to the repository with the configured credentials
4 E_PARAMS Error while validating the parameters
5 E_GLOBALCONFIG Cannot load the InfoArchive global configuration object
6 E_RECEIVERNODE Cannot load the reception node configuration object
7 E_CREATELOG Cannot create the log file in the working directory
8 E_CREATEAIP Cannot create the AIP repository object
9 E_UPDATEAIP Cannot update the AIP repository object
10 E_INITPOLICY The policy cannot be attached to the AIP due to JMS issues, missing policy, or other reasons.
11 E_LOCATESIPEX The method configured for extracting the SIP descriptor returned a non-zero return code
12 E_LOCATESIP Cannot locate the descriptor in the SIP
13 E_WRITESIP The SIP descriptor could not be imported as a content of the AIP repository object
14 E_LOCATEHOLDING The SIP descriptor references an unknown archive holding
15 E_VALIDATESIP The SIP cannot be queued because the structure of its descriptor is invalid, an existing non-invalidated AIP repository object has the same identifier, the descriptor does not contain a hash value for the PDI file although the holding configuration requires one, an inconsistency in the SIP sequence number has been detected among the AIP repository objects having the same DSS identifier, or the destruction lock cannot be created.
16 E_REJECT The SIP has been set to the rejected state because its descriptor references a rejected DSS identifier
17 E_WRITEHOLDING Error while applying the changes to the AIP repository object configured in the archive holding (i.e., changing the type of the AIP, classifying the AIP in the folder hierarchy, or attaching the ingestion lifecycle to the AIP)
18 E_COMPLETEAIP Error while promoting the AIP to the ingestion pending state
Receiver Log File
Every execution of the receiver leads to the creation of a reception working directory and a log file in that directory:
• The name of the reception working directory consists of the reception start date timestamp and the AIP (eas_aip) object ID.
• The log file name is eas_receive_logs_zip.log
For traceability purposes, if activated for the archive holding, a compressed form of the reception log files is imported as content of the AIP repository object as format eas_receive_logs_zip even if an execution error occurs. As such, the log file can be accessed both at the file system level and within the DA interface as content of the AIP repository object.
Enumerating AIPs for Ingestion
The enumeration process is executed by running the eas-launch-enumeration script and outputs a list of AIPs awaiting ingestion for a specified ingestion node, ordered by:
1. The ingestion priority of their respective archive holding
2. The ingestion deadline date computed during reception
Configuring Enumerator Properties
The enumerator properties file is named job-enumeration.properties and contains the following properties.
• cutoff_days (c): The cutoff in days for the search. Default: 31.
• flags (f): (Optional) minusrunning: Returns the maximum number of AIPs to enumerate, minus the number of ingestions already running on the ingestion node.
• faileduntildate (F): Includes AIPs whose ingestion has failed, with an ingestion start date earlier than the specified date. This option lets you trigger a batch retry of failed ingestions.
• server (s): The repository name. Default: name of the repository.
• domain (d): The user's domain, if any. Default: empty value.
• user (u): The name of the Documentum user to connect with. Default: login name of the installation owner user account.
• password (p): (Optional) The user password with which to connect to the repository. If the job executes on the same host where Content Server is installed, you do not need to specify this property here; the job can connect to the repository through the Content Server trusted login feature. Default: password of the installation owner user account.
• aek (k): (Optional) The aek file to use for the encrypted password. This property is not set by default.
• level (l): (Optional) The logging level to use (ERROR, WARN, DEBUG, INFO, TRACE). This property is not set by default.
• max (m): (Optional) The maximum number of AIPs to return.
• nodes (n): (Optional) Ingestor node name(s) for which to return the AIP list. Default: ingestion_node_01.
The purpose of the cutoff_days property is to optimize the DQL query issued by the enumerator by adding a criterion that restricts the search to AIPs received after the current date minus the number of days specified in this property.
Running the Enumerator
All arguments default from the properties file but can be overridden at the command line.
eas-launch-job-enumeration
The return code is set to 0 (zero) for success and non-zero for error. When successful, the enumerator returns a list of AIP object IDs. To facilitate the invocation of the ingestion within a shell:
• The list of AIPs is written to the standard output (stdout). Parsing the result returned to stdout allows for simple integration within a shell script, for example, one that dynamically creates new ingestion jobs in a custom job scheduler.
• Messages are written to the standard error output (stderr).
Troubleshooting Enumerator Errors
The enumerator is a utility that performs no processing other than returning an ordered list of AIPs. In case of error, the cause of the error must be identified and fixed, but no special recovery procedure needs to be applied.
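The stdout/stderr contract described above makes this kind of shell integration straightforward. The following Python sketch reads one AIP object ID per line from the enumerator's standard output and launches one ingestion per AIP; the script names and the faileduntildate (F) and id (a) options come from the property tables in this chapter, while the wrapper itself, the one-ID-per-line assumption, and the short-option spellings are assumptions for illustration.

```python
import subprocess

def enumerate_and_ingest(enum_cmd=None, ingest_cmd=None, failed_until=None):
    """Run the enumerator, then launch one ingestion per returned AIP.

    Returns (all_aip_ids, failed_aip_ids). Command lists are injectable
    so the flow can be tested without the real EAS scripts.
    """
    cmd = list(enum_cmd) if enum_cmd else ["eas-launch-job-enumeration"]
    if failed_until:
        # Batch retry: also enumerate AIPs whose ingestion failed, with a
        # last attempt up to this date (the faileduntildate flag).
        cmd += ["-F", failed_until]
    enum = subprocess.run(cmd, capture_output=True, text=True)
    if enum.returncode != 0:
        # Diagnostics go to stderr; the AIP list goes to stdout.
        raise RuntimeError("enumeration failed: " + enum.stderr)
    aip_ids = [ln.strip() for ln in enum.stdout.splitlines() if ln.strip()]
    failed = []
    for aip_id in aip_ids:
        ing = list(ingest_cmd) if ingest_cmd else ["eas-launch-ingestor"]
        # The id (a) property passes the AIP object ID (r_object_id).
        if subprocess.run(ing + ["-a", aip_id]).returncode != 0:
            failed.append(aip_id)
    return aip_ids, failed
```

A real scheduler would typically run the per-AIP ingestions as parallel jobs rather than sequentially, as the documentation suggests when it mentions dynamically creating ingestion jobs.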
Enumerator Return Codes
Return Code Mnemonic Description
-1 E_UNEXPECTED Unexpected error
0 OK Successful execution
1 E_PARSE Error while parsing arguments
2 E_DFCINIT Error while initializing the DFC
3 E_CREDENTIALS Cannot connect to the repository with the configured credentials
4 E_PARAMS Error while validating the parameters
5 E_INGESTORNODES Error while loading the configuration of the referenced ingestor nodes
Re-enumerating Failed AIPs
When the ingestion of an AIP fails, the enumerator:
• Does not consider such a failed ingestion as an ingestion being executed
• By default, does not include such an AIP in the returned list
However, in situations where a system-wide problem leads to the failure of many ingestions, it can be useful to restart those failed ingestions en masse after fixing the problem. This can be achieved by using the "faileduntildate=DQL date" flag of the enumerator. This flag causes the enumerator to also consider AIPs in an ingestion error state whose last ingestion attempt was triggered up to the date passed in the argument. Including those AIPs in the results allows the invoking shell to restart their ingestion. The passed date is a safeguard mechanism for avoiding an infinite restart loop when the ingestion constantly fails.
Ingesting AIPs
The ingestion process is invoked by running the eas-launch-ingestor script, and it performs the following actions:
• Creates a directory under the ingestion root working directory
• Attaches the AIP to the lifecycle associated with the ingestion sequence (defined by the eas_cfg_ingest object) to be executed
Before running the ingestor, make sure that:
• An ingestion node has been properly configured. See Configuring an Ingestion Node Configuration (eas_cfg_ingest_node) Object, page 146.
• xDB cache is running. See Starting the xDB Cache, page 235.
The xDB cache process imports the xDB detachable library associated with the AIP and stores it as a rendition of the AIP object at the end of the ingestion.
Configuring Ingestor Properties
The ingestor properties file ingestor.properties located in EAS_HOME/conf contains the following properties.
• config (c): Name of the ingestion node configuration (eas_cfg_ingest_node) object to use for the ingestion. Default: ingestion_node_01.
• server (s): The repository name. Default: current repository name.
• domain (d): The user's domain, if any.
• user (u): The name of the Documentum user with which to connect to the repository. Default: login name of the installation owner user account.
• password (p): (Optional) The user password with which to connect to the repository. If the job executes on the same host where Content Server is installed, you do not need to specify this property here; the job can connect to the repository through the Content Server trusted login feature. Default: password of the installation owner user account.
• aek (k): (Optional) The aek encrypted password file for connecting to the repository. This must be the password of the user specified in the user property.
• rkm (r): (Optional) The RKM client configuration file to use to connect to the RKM Server.
• id (a): ID (r_object_id) of the AIP object.
• level (l): (Optional) The logging level to use (ERROR, WARN, DEBUG, INFO, TRACE).
• delete (e): Whether to delete data placed in the working directory if an error occurs (true|false). Default: True. By default, in the event of an ingestion error, data being archived is not deleted from the working directory. Set this to true to automatically delete sensitive data from the working directory in case of an ingestion error.
Running the Ingestor
To ingest an AIP, launch the ingestor using the following command:
• EAS_HOME/bin/eas-launch-ingestor.bat aip_id (Windows)
• EAS_HOME/bin/eas-launch-ingestor.sh aip_id (Linux)
Except for aip_id, which is the AIP object ID, all other argument values default from the eas-ingestor.properties file located in EAS_HOME/conf. You can override the default values by specifying the arguments in the command line.
After the ingestor runs, a return code is displayed: 0 (zero) indicates successful ingestion and a non-zero value means the ingestion has failed. If ingestion of an AIP fails, identify and resolve the cause of the error, and re-ingest it. When you restart an ingestion, the ingestor reverts to the initial ingestion lifecycle state and re-executes the ingestion process.
Note: You can only restart a failed ingestion on the same ingestion node, for the following reasons:
• The working directory for an ingestion node is not accessible by other ingestion nodes. Therefore, the log file of a failed ingestion can only be accessed from the ingestion node on which the ingestion was performed.
• To ensure the logs of new ingestion attempts are appended to the old log file so that the entire ingestion history is kept for traceability.
Verifying the Ingestion Process
Return code 0 (zero) indicates the AIP has been successfully ingested. You can optionally check the following to verify the ingestion process:
• On the ingested AIP's properties page:
— The current state is Waiting_Commit.
— The min and max values of the partitioning key have been computed and recorded.
— Ingestion-related information has been recorded.
• Information relating to contents imported into xDB has been captured and recorded. The Locked in cache deadline date property denotes until when the associated xDB library must remain online so that the archived data can be searched.
• In the ingestion node working directory (default: EAS_HOME/working/ingestion_node_01), an ingestion directory has been created for the AIP, its name consisting of the ingestion start date timestamp and the AIP object ID. The directory contains the ingestion log file (eas_ingest_logs_zip.log). If configured, the file is also imported as a rendition of the AIP object.
Troubleshooting Ingestion Errors
When an ingestion error occurs, a non-zero return code is returned at the ingestor command line. You can find more information in the ingestion log file created in the ingestion node working directory. If configured, the ingestion log file is also imported as content of the AIP object.
For traceability purposes, the AIP repository object properties are populated with the following information:
• Return code and message of the ingestion error
• Ingestion date, node, working directory
The AIP repository object remains in the ingestion lifecycle state where the error occurred. The data files managed by the ingestor in the working directory are normally left as is to facilitate diagnosis of the problem. If configured, InfoArchive deletes those files when an error occurs to protect sensitive data.
The actions to perform depend on the diagnosed cause of the error:
Cause of the Error Actions to Perform
Transient technical issue After fixing the cause of the error, the ingestion can be restarted by relaunching the ingestor for this AIP. The ingestor detects a restart, reattaches the AIP at the first state of the ingestion lifecycle, performs the necessary cleanup, and executes the ingestion again. To keep consistent logging across the multiple executions of the ingestion, a restart reuses the existing ingestion working directory and appends its logs to the existing log file. For that reason, a restart must be done using the same ingestion node.
Configuration error After fixing the configuration error, the ingestion can be restarted as described above.
Incorrect DSS The error originates from the business application having generated an incorrect SIP, and it is identified that all SIPs belonging to that DSS are incorrect and must be rejected. In this situation, the administrator must reject the AIP.
Incorrect SIP The error originates from an incorrect isolated SIP file, but other SIP files belonging to the same DSS are correct. In this situation, the administrator must invalidate the AIP.
Ingestor Return Codes
Return Code Mnemonic Description
-1 E_UNEXPECTED Unexpected error
0 OK Successful execution
1 I_DUP Ingestion refused because ongoing processing on the AIP has been detected
2 I_PARALLEL Ingestion refused because another ongoing ingestion on this AIP has been detected
10 E_PARSE Error while parsing arguments
11 E_DFCINIT Error while initializing the DFC
12 E_CREDENTIALS Cannot connect to the repository with the configured credentials
13 E_PARAMS Error while validating the parameters
14 E_GLOBALCONFIG Cannot load the InfoArchive global configuration object
15 E_INGESTORNODE Cannot load the ingestor node configuration object
16 E_BINDLOG Cannot create the log file in the working directory
17 E_NOAIP Cannot find the AIP having the provided identifier
18 E_LOCATEHOLDING Cannot find the holding referenced by the AIP
19 E_LOCATECONFIG Cannot find an ingestion configuration applicable to the AIP
20 E_INVALIDAIP The AIP state is not applicable for launching the ingestion on the AIP
21 E_CANNOTRESTART The ingestion cannot be restarted
22 E_TRANSFORMCONFIG Empty eas_cfg_pdi: No PDI schema is specified
23 E_LOCATEXDBNODE Cannot load the configuration of the target xDB library to use
24 E_LOADRKMCLIENT Cannot load the RSA RKM client
25 E_PROCESSOR Invalid eas_cfg_pdi value: The specified PDI schema is invalid
Ingestor Log File
Every ingestion leads to the creation of a working directory and a compressed log file eas_ingest_logs_zip.log in that directory. The name of the working directory consists of the ingestion date timestamp and the AIP object ID; for example: 20120501T144822536_0800046e800061ee.
For traceability purposes, if activated for the archive holding, a compressed form of the ingestion log files is imported as content of the AIP repository object as format eas_ingest_logs_zip even if an ingestion error occurs. As such, the log file can be accessed both at the file system level and within the DA interface as content of the AIP repository object.
Committing Ingested AIPs
The commit process is invoked by running the eas-launch-job-commit script, and it commits AIUs into the xDB database so that the archived data can be searched. You schedule the eas_commit Content Server job to run at regular intervals to scan for ingested AIPs pending commit (lifecycle state is Waiting_Commit) and commit them.
The commit process can be synchronous or asynchronous depending on the ingestion mode used:
• Asynchronous commit
In asynchronous ingestion mode, when multiple ingested SIPs are part of a single DSS (batch), you can execute the eas_commit job multiple times, but until all the SIPs have been ingested, none of the archived data in the batch will be searchable. When all SIPs pertaining to a DSS have been ingested (lifecycle state = Waiting_Commit), executing the eas_commit job performs the following actions on every SIP in the batch:
— Imports a compressed commit log file as a rendition of the AIP (eas_aip) object for traceability purposes
— Deletes the original SIP file rendition from the AIP object, unless the holding is configured to keep SIP files.
If you use an EMC Centera as the archive store, the eas_commit job also pushes the AIP retention date and the content properties to Centera:
— Populates corresponding Centera properties with the values of some properties of the AIP and its content for reversibility
— Assigns the AIP retention date to the contents stored in Centera
— Promotes the AIP object to the Completed lifecycle state
This deferred push of the retention date and content properties to Centera (as opposed to during the reception or ingestion stage) allows for the potential destruction of the ingested contents before the commit is complete. This way, when you cancel the ingestion of AIPs before they are committed, the ingested content is deleted.
• Synchronous commit
In synchronous ingestion mode, you enable synchronous commit for the target holding, and ingested standalone SIPs (one SIP per DSS) are automatically committed without the need to explicitly invoke the eas_commit job. To enable synchronous commit for the target holding, edit the properties of the holding configuration (eas_cfg_holding) object and, under the Holding tab, select the Synchronous commit enabled option.
When the commit process (both synchronous and asynchronous) is complete, the AIP lifecycle state is Completed and the archived data is searchable.
Executing the eas_commit Job
You can execute the eas_commit job in one of the following ways:
• Run the eas_commit job in DA. Under the repository folder Administration/Job Management/Jobs, right-click the eas_commit job and choose Run.
• Run the eas_commit job at the command prompt using the runJob command:
EAS_HOME/bin/runjob repository_name eas_commit
Verifying the Commit
After a successful commit, you can see in the properties page of the ingested AIP object that:
• The current Phase and State of the AIP are both Completed.
• The Commit date is recorded.
Also, the eas_zip_zip rendition of the committed AIP has been removed, unless the holding is configured to keep SIP files. Meanwhile, the compressed commit job log file eas_commit_logs_zip has been imported as a rendition of the AIP.
Managing AIPs
An AIP is represented as an eas_aip type object in Content Server. You manage AIPs in DA in the same way as you manage other repository objects, using standard DA features such as searching for objects and viewing object properties.
Right-click an AIP and choose a command from the shortcut menu to perform the following actions:
• Properties: View the properties of the AIP. For information about the properties of an eas_aip object, refer to the EMC InfoArchive Object Reference Guide.
• Reject: Reject the AIP. When you reject an AIP, all existing and future AIPs within the same DSS (batch) will also be automatically rejected. Note: An open AIP aggregate cannot be rejected. You must perform additional steps to complete the entire AIP rejection cycle. See Rejecting or Invalidating an AIP, page 255.
• Invalidate: Invalidate the AIP. Invalidating an AIP consists in declaring that the SIP that led to the creation of the AIP is to be ignored. However, for traceability purposes, the AIP object is kept. In the event other AIPs or future received SIP files have the same DSS identifier as the invalidated AIP, they will be processed. Note: An open AIP aggregate cannot be invalidated. You must perform additional steps to complete the entire AIP invalidation cycle. See Rejecting or Invalidating an AIP, page 255.
• Apply/Remove the Purge Lock: Apply/remove a purge lock on the AIP.
• Cache In/Out: Temporarily remove the archived AIP structure data from the xDB file system (cache out) and put it back into xDB (cache in).
A purge lock prevents an AIP from being deleted when its retention date is reached.
Such a lock can be created automatically when an AIP is created, or manually by the administrator.
For information about xDB caching, see xDB Caching, page 55.
• View > Renditions: View all the renditions of the AIP object.
To close an open AIP parent, right-click it and choose Request AIP Parent Close from the shortcut menu. The AIP parent will be closed the next time the eas_close job is executed.
AIP States
In DA, different AIP states are represented by different icons (one icon for the unlocked state and one for the locked state):
• AIP parent: AIP parent is open; AIP parent is pruned
• AIP: AIP is online; AIP is offline; AIP is not searchable; AIP is in an unsteady state
One of the following scenarios leads to the unsteady state:
• The AIP is being ingested (work in progress)
• The AIP is a future aggregate and its associated AIP parent is open (AIP mode 3)
• The AIP belongs to an aggregated AIP parent but is yet to be pruned (AIP mode 3)
Rejecting or Invalidating an AIP
Rejecting or invalidating an AIP and promoting it through the complete rejection/invalidation lifecycle entails the following steps:
1. In DA, right-click an AIP object and choose the Reject or Invalidate command from the shortcut menu. The lifecycle state of the AIP object changes to REJ-WCOM or INV-WCOM.
2. Make sure confirmations are enabled for the reject and invalidate event types. Run the confirmation (eas_confirmation) job.
3. Run the rejection/invalidation (eas_rejinv) job. The lifecycle state of the AIP object changes to REJ-WXDBCLEAN or INV-WXDBCLEAN.
4. Run the xDB clean job by executing eas-launch-xdb-clean.bat (Windows) or eas-launch-xdb-clean.sh (Linux). The lifecycle state of the AIP object changes to REJ-WPROC or INV-WPROC.
5. Run the rejection/invalidation (eas_rejinv) job again. The lifecycle state of the AIP object changes to REJ-DONE or INV-DONE. The AIP rejection/invalidation cycle is complete.
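When scripting verification of the rejection/invalidation cycle, the expected state after each job run can be encoded directly from the steps above. The state names come from the documentation; the helper itself is a sketch, not a product utility.

```python
# Expected lifecycle states, in order, for the rejection and
# invalidation cycles described in the procedure above.
CYCLE_STATES = {
    "reject": ["REJ-WCOM", "REJ-WXDBCLEAN", "REJ-WPROC", "REJ-DONE"],
    "invalidate": ["INV-WCOM", "INV-WXDBCLEAN", "INV-WPROC", "INV-DONE"],
}

def next_expected_state(action, current_state):
    """Return the state the AIP should reach after the next job run
    (eas_rejinv, the xDB clean job, then eas_rejinv again), or None
    when the cycle is already complete."""
    states = CYCLE_STATES[action]
    if current_state == states[-1]:
        return None
    return states[states.index(current_state) + 1]
```

A monitoring script could compare an AIP's actual lifecycle state against this sequence after each scheduled job run.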
Updating Custom Package Metadata
In the SIP descriptor, you can add custom package metadata to incorporate extra information about the package, for example, the department that owns the SIP, partitioning criteria, and so on. Custom package metadata must be enclosed within <custom> tags, as shown in the following example.
<?xml version="1.0" encoding="UTF-8" ?>
<sip xmlns="urn:x-emc:eas:schema:sip:1.0">
  <dss>
    <holding>PhoneCalls</holding>
    <id>20140418012221</id>
    <pdi_schema>urn:eas-samples:en:xsd:phonecalls.1.0</pdi_schema>
    <pdi_schema_version />
    <production_date>2011-02-01T00:00:00.000+01:00</production_date>
    <base_retention_date>2011-02-01T00:00:00.000+01:00</base_retention_date>
    <producer>CC</producer>
    <entity>PhoneCalls</entity>
    <priority>0</priority>
    <application>CC</application>
  </dss>
  <production_date>2011-02-01T00:00:00.000+01:00</production_date>
  <seqno>1</seqno>
  <is_last>true</is_last>
  <aiu_count>10</aiu_count>
  <page_count>0</page_count>
  <custom>
    <attributes>
      <attribute name="rep_attr">value1</attribute>
      <attribute name="rep_attr">value2</attribute>
      <attribute name="int_attr">1</attribute>
      <attribute name="boolean_attr">true</attribute>
      <attribute name="rep_attr">value3</attribute>
      <attribute name="date_attr">1990-02-01</attribute>
      <attribute name="float_attr">3.1</attribute>
    </attributes>
  </custom>
</sip>
Ingesting SIPs Containing Custom Package Metadata
In order to ingest SIPs containing custom package metadata, you must create a subtype of the base AIP type (eas_aip) and specify the created subtype in the archive holding. The following procedure shows how to create a subtype and specify that type in the archive holding.
1. Launch Documentum Server Manager.
2. Click IAPI on the Repository tab.
3. Enter the following DQL script in the IAPI command prompt. The script creates a subtype that holds the custom attributes shown in the previous section.
For more information about CREATE TYPE and other DQL statements, refer to the EMC Documentum Content Server DQL Reference.
create type "test_phonecalls"
  ("boolean_attr" boolean(default=false),
   "float_attr" float,
   "rep_attr" string(32) repeating,
   "date_attr" date,
   "int_attr" integer (NOT NULL))
with supertype "eas_aip"
Note: The custom package metadata value must be one of the following types:
• xs:string
• xs:float
• xs:integer
• xs:date
• xs:datetime
4. In the archive holding, change the eas_aip_type attribute to the created subtype name, for example, test_phonecalls. In DA, this attribute is set in the AIP type text box on the Holding tab.
5. After ingestion, custom package metadata turns into AIP object attributes.
Updating Custom Package Metadata
You may occasionally need to change custom package metadata, for example, because of a company reorganization, merger, or acquisition. InfoArchive 3.0 first exposed custom package metadata to users with appropriate access rights. If you are the system administrator or an IT specialist with WRITE or RELATE access, you can update custom package metadata through DA or using DQL scripts.
Custom Package Metadata Update Dates
After a SIP is ingested, it turns into an AIP repository object. The SIP package metadata contained in the SIP descriptor is applied as the attribute values of the AIP object. If eas_sip_xml_store_enabled is set to TRUE, the SIP descriptor is saved as an AIP rendition. Therefore, whenever you update custom package metadata (AIP attribute values), you must also update the SIP descriptor in the AIP rendition to maintain consistency.
The following attributes are added for the eas_aip type:
• eas_aip_cmeta_modify_date: The last time the custom package metadata of the AIP object was modified.
• eas_sip_cmeta_refresh_date: The last time the refresh job propagated custom package metadata changes to the SIP descriptor rendition.
Updating Custom Package Metadata in DA
You can update custom package metadata for an AIP individually in DA, if you log in to DA with RELATE and WRITE privileges. Complete the following procedure to update custom package metadata in DA:
1. Right-click the AIP object whose custom package metadata you want to change, and then select Properties.
2. Select the AIP tab.
3. Locate the custom package metadata you want to change. If the metadata is not shown in READ/WRITE mode, you may not have enough privileges to manipulate the data. Contact your system administrator.
Note: The DA label for custom package metadata is identical to the attribute name of the metadata in the SIP descriptor. For example, the DA label for the following metadata is boolean_attr. EMC recommends using descriptive attribute names.
<attribute name="boolean_attr">true</attribute>
4. Change the metadata, and click Save.
At this point, the eas_aip_cmeta_modify_date attribute is set to the time when the metadata was changed. If SIP descriptors are saved as AIP renditions, a refresh job is needed to propagate the changes and keep the data consistent. The refresh job is described in Propagating Changes to AIP Renditions, page 258.
Updating Custom Package Metadata Using DQL Scripts
You can mass-update the custom package metadata for a batch of AIP objects by using DQL scripts. You must have at least WRITE access to the AIPs. In addition to updating the metadata, you must also update the eas_aip_cmeta_modify_date attribute. The DQL scripts should have the following pattern:
UPDATE "test_phonecalls" OBJECTS
SET "int_attr" = 2,
SET "boolean_attr" = false,
INSERT "rep_attr"[4] = 'value4',
SET "eas_aip_cmeta_modify_date" = DATE(NOW)
WHERE conditions_for_update
After the DQL update, you must run the refresh job to propagate package metadata (AIP attribute value) changes to AIP renditions.
Propagating Changes to AIP Renditions
If eas_sip_xml_store_enabled is set to TRUE in the archive holding, SIP descriptors are saved as AIP renditions. To view an AIP's rendition, select View > Renditions in the shortcut menu, and double-click eas_sip_xml (XML description of an AIP) in the list.
The eas_sip_cmeta_refresh job propagates custom package metadata changes to AIP renditions. This refresh job selects AIPs using the following DQL script:
SELECT ... FROM eas_aip
WHERE eas_aip_cmeta_modify_date IS NOT NULLDATE
AND eas_aip_cmeta_modify_date > eas_sip_cmeta_refresh_date
ORDER BY eas_aip_cmeta_modify_date
The DQL script may also select AIPs that are not eligible for propagating changes. The refresh job may encounter the following scenarios:
• AIP is of the base eas_aip type. AIPs of the eas_aip type do not contain any custom package metadata. The refresh job:
1. Sets eas_sip_cmeta_refresh_date to the current date to avoid processing the AIP in the future.
2. Writes information into the log file.
3. Proceeds to the next AIP object.
• AIP is part of an open AIP aggregate. Changing the custom package metadata of an open AIP aggregate (eas_aip_mode = 3) is not allowed. The refresh job:
1. Sets eas_sip_cmeta_refresh_date to the current date to avoid processing the AIP in the future.
2. Writes information into the log file.
3. Proceeds to the next AIP object.
• AIP is in a transient mode. AIPs whose eas_is_in_unsteady_state is TRUE are in transient mode. The refresh job:
1. Postpones processing until the AIP is no longer in transient mode.
2. Writes information into the log file.
3. Proceeds to the next AIP object.
• AIP is a subtype of eas_aip or a closed AIP aggregate. The refresh job:
1. Looks for the rendition with the highest page_rendition value.
2. If no SIP descriptor rendition is found, performs the following actions based on different conditions:
a. If the AIP is not rejected or invalidated, an error message is written to the log.
b.
If the AIP is rejected or invalidated, sets the eas_sip_cmeta_refresh_date attribute to the current date to avoid processing the AIP in the future.
3. Exports the rendition to the working directory, for example, C:\app\eas\working.
4. If the rendition is a package, unzips the package, and then replaces the custom package metadata in the SIP descriptor with the new AIP attribute values.
5. Imports the updated SIP descriptor to replace the old eas_sip_xml rendition.
a. If the old rendition cannot be removed because it has not reached its retention date, the updated SIP descriptor is imported with a page modifier set to the current date. The date is formatted as YYYYMMDDHHMMSS.
b. If the XML store is retention-enabled storage, for example, Centera:
1. The structured data must be pushed to the storage.
2. If the AIP is in one of the final states (COM, REJ-DONE, or INV-DONE), the retention date must be pushed to the storage.
3. If the AIP is not in a final state and eas_rejinv_retention_enabled is set to FALSE, the retention date is not pushed to the storage.
6. Sets the eas_sip_cmeta_refresh_date attribute to the current date to avoid processing the AIP again in the future.
7. Writes information to the log file.
8. Proceeds to the next AIP object.

The following table lists the exit codes for the refresh job.

Exit Code   Description
0           s.ok: No error
1           e.parse: Argument parsing error
2           e.dfc.init: DFC initialization error
3           e.credentials: Server credentials error
4           e.params: Parameter validation error
5           e.report: Report creation error
-1          e.unexpected: Unexpected error

Performing Data Retention Management

Data retention management is essential to an enterprise archiving system. Data retention defines the policies of persistent data and records management for meeting legal and business data archival requirements.
It controls and governs an organization's archived records throughout the records lifecycle, from the time the records are ingested to their eventual disposal. InfoArchive lets you perform data retention management in the following two ways:

• Using InfoArchive's date-based retention management capabilities
• Using the extended retention management capabilities of Retention Policy Services (RPS)

InfoArchive provides two archival reports to help you better manage data retention:

• List of AIPs by Retention Date
View all archived AIPs whose retention date is before a specified date.
• AIPs for Disposition
View all AIPs with a disposition date determined by the retention policy.

To perform data retention management tasks, including changing the retention date, adding/removing a purge lock, and rejecting/invalidating an AIP, you must have at least the Relate permission.

Using InfoArchive's Date-Based Retention Management Capabilities

InfoArchive retains and disposes of archived data based on the retention date, which is stored in the eas_retention_date property of the AIP (eas_aip) object. When a SIP is received, the retention date of the AIP is calculated from the base retention date specified in the SIP descriptor (eas_sip.xml) and the retention period defined for the destination holding:

retention date = base retention date + retention period

Depending on whether a retention class is specified in the SIP descriptor (eas_sip.xml), different retention periods are used:

• If no retention class is defined, the retention period defined for the destination holding is used. This is the eas_retention_period property of the eas_holding object.
• If a retention class is present, the retention period associated with that retention class is used to calculate the retention date. These are a pair of repeating properties, eas_retention_class and eas_retention_class_period, defined for the destination holding.
If the specified retention class has not been defined, a reception error occurs. For example, given the following retention class definitions, if the retention class specified in the SIP descriptor (eas_sip.xml) is Class B, then the retention period used to calculate the retention date is 365.

eas_retention_class   eas_retention_class_period
Class A               3650
Class B               365
Class C               30

Before archiving data, the data owner and the data producer must agree on the retention class definitions and the retention period defined at the holding level.

If you use EMC Centera as the archive store, when an ingested AIP is committed, its retention date (along with content attributes) is pushed to Centera at the storage level, and is also assigned to the associated content files stored in Centera. When the retention date of an AIP is reached and the AIP does not have a purge lock, the purge (eas_purge) job attaches the purge lifecycle to the AIP object and the AIP is moved to a dedicated repository folder, /System EAS/data/purge.

To facilitate retention management, it is a common practice to use the retention date as a condition for assigning AIPs when defining the pooled library assignment policy. See Configuring a Custom Pooled Library Assignment Policy (xDB Mode 3), page 120.

Changing the Retention Date

You can change the retention date of an archived AIP. To do so, in DA, right-click an AIP object, choose Change Retention from the shortcut menu, and then specify the change you want to make:

• Increase or decrease the retention date by a specific number of days
• Set a new retention date
• Set the retention date to null so that the AIP object will never expire

If you use a content-addressable storage (CAS) such as EMC Centera as the archive store, the new retention date is pushed to the CAS at the storage level and is also assigned to the associated content files stored in the CAS.
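The retention date calculation described above can be sketched as follows. This Python example is illustrative only (InfoArchive performs this computation internally during reception); the default period value is an assumption, while the class definitions come from the sample table:

```python
from datetime import date, timedelta

# Sample holding configuration. The class/period pairs are from the example
# table above; the holding-level default period is an assumed value.
DEFAULT_RETENTION_PERIOD = 3650          # eas_retention_period, in days (assumption)
RETENTION_CLASSES = {                    # eas_retention_class / eas_retention_class_period
    "Class A": 3650,
    "Class B": 365,
    "Class C": 30,
}

def compute_retention_date(base_retention_date, retention_class=None):
    """retention date = base retention date + retention period."""
    if retention_class is None:
        # No retention class in the SIP descriptor: use the holding-level period.
        period = DEFAULT_RETENTION_PERIOD
    elif retention_class in RETENTION_CLASSES:
        period = RETENTION_CLASSES[retention_class]
    else:
        # An undefined retention class causes a reception error.
        raise ValueError("reception error: undefined retention class %r" % retention_class)
    return base_retention_date + timedelta(days=period)

# Class B uses a 365-day retention period.
d = compute_retention_date(date(2014, 1, 1), "Class B")
```

An AIP received with an undefined class (say, "Class X") would raise the reception error instead of producing a date.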
Applying/Removing a Purge Lock

By default, an AIP object that has reached its retention date is automatically purged by the scheduled purge (eas_purge) job. However, such automated disposal is not desirable in some situations:

• Some electronic archiving standards, such as ISO 14641-1, forbid automated disposal of expired AIPs
• Such disposal must be formally confirmed by the data owner

A purge lock prevents the scheduled disposal of an AIP when its retention date is reached. The AIP is not disposed of until the administrator explicitly removes the purge lock. In DA, you can manually apply purge locks to AIPs or configure the holding to automatically apply purge locks to AIPs during the archiving process.

To manually apply a purge lock to or remove a purge lock from an AIP: Right-click the AIP and choose Apply/Remove the purge lock from the shortcut menu.

To configure the holding to automatically apply purge locks to AIPs during the archiving process: Edit the properties of the holding configuration (eas_cfg_holding) object and, under the Holding tab of the Properties window, select Automatic creation of purge lock.

Using Extended Retention Management Capabilities of Retention Policy Services (RPS)

By integrating EMC Documentum Retention Policy Services (RPS) with InfoArchive to manage the disposition of archived data, you can take advantage of RPS extended retention management features such as event-based retention, multi-phase retention, and markups.

You must install RPS and Records Client, the RPS interface, to perform RPS actions on AIP objects. An extension package of RPS Records Client (limited to exclusive use with InfoArchive) ships with the product and can be deployed as part of the InfoArchive installation. The Records Client extension provides custom-built shortcut menus that you can use to perform actions on AIP objects directly from RPS Records Client.
Refer to the following documents for more information about RPS:

• EMC Documentum Retention Policy Services Administrator User Guide
• EMC Documentum Retention Policy Services Installation Guide
• EMC Documentum Records Client Administration and User Guide
• EMC InfoArchive Installation Guide

Licensing and Installing RPS Components

To use RPS with InfoArchive, you must perform the following licensing and installation tasks:

1. Launch the Documentum Content Server Configuration Program to activate Retention Policy Services using a valid license.
2. Download the following RPS components from EMC SubscribeNet:
• Documentum Retention Policy Services DAR
• Documentum Records Client WAR
3. Launch Documentum DAR Installer to install the Documentum Retention Policy Services DAR (rps.dar).
4. Deploy Documentum Records Client (records.war) on the web application server.
Note: InfoArchive provides an extension for Records Client. Refer to the EMC InfoArchive Installation Guide for more information about how to deploy the extension on a standard Records Client.
5. Restart the repository and the web application server.

RPS Basic Concepts

This section introduces basic RPS concepts to help you quickly get started with RPS.

Retention Policy

A data retention policy is a recognized and proven protocol within an organization for retaining information for operational use while adhering to the laws and regulations concerning it. It is a set of guidelines that describes which data will be archived, how long it will be kept, and other factors concerning the retention of the data. Its objectives are to keep important information for future use or reference, to organize information so it can be searched and retrieved at a later date, and to dispose of information that is no longer needed.
You can only apply RPS retention policies to AIP (eas_aip) objects; applying them to other InfoArchive object types, such as configuration objects (for example, eas_cfg_holding) and other runtime objects (for example, eas_audittrail), is not supported.

A retention policy determines the length of time an object is kept in a repository. There are two types of retention policies:

• Linked retention policy
• Individual retention policy

Retainer

A retainer is created when you apply a retention policy to an object. There are two types of retainers, shared and individual. Retainers are created differently depending on the type of the policy. The following table describes the differences in retainer creation behavior.

Applied to containing objects (for example, folders):
- Individual retention policy: Each object under the folder inherits an individual retainer. Objects age with their own retainer.
- Linked retention policy: A shared retainer is created. All objects under the folder age with the folder.

Applied to non-containing objects (for example, eas_aip objects):
- Individual retention policy: An individual retainer is created.
- Linked retention policy: An individual retainer is created.

Applying Retention Markups

In Records Client, right-click a folder and choose Retention > Apply Retention Markups from the shortcut menu to apply retention markups. You can create your own retention markups with custom markup names. The types of retention markups are restricted to the following:

• Hold: Stops the destruction of objects.
• Permanent: Stops the destruction of objects; only a retention manager can remove permanent markups.
• Freeze: Stops the promotion of an object from one phase to the next.
• Review: Sends a notification (either an email or an inbox notification) to a named contact after a time period that you select.
• Vital: Designates which records are critical to the day-to-day operation of the business. It does not prevent disposition.
Only the Hold and Permanent retention markups prevent an item from being disposed of, including through privileged deletes. Hold and Permanent markups are used in a similar way to InfoArchive purge locks.

Once retention markups are applied, the retention of an AIP object is determined by the retention markups. After you remove the retention markup from an AIP object, the AIP reverts to RPS retention or InfoArchive date-based retention. InfoArchive never purges an AIP with an RPS retention markup applied to it. You must remove the markup before the AIP can be purged.

Escalating DFC Instances to Privileged Users

InfoArchive makes changes to AIPs and the repository when executing jobs; for example, adding/deleting renditions, creating repository folders, deleting AIP objects, and so on. Only privileged users can make changes to objects under retention. You must escalate InfoArchive DFC instances to privileged users after you install Records Client to ensure the successful execution of InfoArchive jobs. To escalate DFC instances to privileged users in DA:

1. Choose Client Rights Management > Privileged Clients.
2. Click Manage Clients in the upper-right corner.
3. Add all InfoArchive DFC clients to the right, and approve them from the shortcut menu.
4. Click OK and exit.
5. In the Privileged Clients list, right-click a client and choose Approve Privilege from the shortcut menu.
6. Restart the web application server.

Note: If InfoArchive components are distributed among several hosts, you must set the DFC instances of the following components as privileged clients:

• InfoArchive receiver
• InfoArchive ingestor
• JMS instances
• Content Server jobs

Aligning the Retention Base Time

If you create an RPS retention policy based on date, the default base date is the r_creation_date of the retained object.
To align the RPS policy base retention date with the base retention date set using InfoArchive's date-based retention mechanism, create a base date in Records Client, and map the base date to the eas_dss_base_retention_date property. To align an RPS base date with the eas_dss_base_retention_date property of AIP objects:

1. In Records Client, navigate to Retention Policy Services > Base Dates.
2. Choose File > New > Base Date.
3. In the dialog, select eas_aip in the Object type drop-down list, and Batch base retention date (eas_dss_base_retention_date) in the Attribute drop-down list.
4. Click OK.

Creating a Retention Policy

To create a retention policy:

1. Launch Records Client (http://host_name:8080/records) in a web browser.
2. Click Retention Policy Services > Authorities in the left pane.
3. Create an authority.
4. Click Retention Policy Services > Conditions in the left pane.
5. Create a condition.
6. Click Retention Policy Services > Retention Policies in the left pane.
7. Create a retention policy. The following constraints apply:
• Disposition Strategy
You can only choose the Export All or Destroy All disposition strategy. All other disposition strategies are currently not supported.
• Rendition Rule
When choosing whether to protect the renditions from deletion, you must select Primary Format Only as the parent rendition rule. The All Renditions option is currently not supported.
• Metadata Immutable
You must set Make Parent Metadata Immutable to No. Setting this option to Yes prevents InfoArchive from updating AIP metadata (attributes).
Applying RPS Retention Policies to AIP Objects

There are three ways of applying an RPS retention policy to AIP objects:

• Applying a retention policy to a folder
• Setting a retention policy at the holding level
• Setting a retention policy in the SIP descriptor (eas_sip.xml)

Applying a Retention Policy to a Folder

Folders are a type of containing object in RPS. If you want to retain the objects in a folder in a synchronized manner (for example, purge all objects in the folder at the same time), you can apply a linked retention policy to the folder that directly contains them. You apply a retention policy to a folder by right-clicking the folder and selecting Retention > Apply Retention Policy from the shortcut menu.

Closing and Reopening a Folder

If you have all AIP objects ingested into a folder and you want to prevent any more content from being added, you can close the folder. You close a folder by right-clicking the folder and choosing Retention > Close Folder from the shortcut menu in Records Client. If you attempt to receive an AIP object into a closed folder, the following message appears:

[dfc.error]: Exception:DfException:: THREAD: main; MSG: [DMC_RPS_LINK_ERROR] This folder is marked as closed and cannot be linked into; ERRORCODE: ff; NEXT: null

You can reopen a closed folder by right-clicking the folder and choosing Retention > Re-open Folder from the shortcut menu in Records Client.

Setting a Retention Policy at the Holding Level

You can apply retention policies to a set of AIP objects at the holding level with the eas_def_retention_class property. The screenshot above shows an example of setting a default retention class at the holding level:

• If no retention class is specified in the SIP descriptor, InfoArchive applies RPS_0_DAY to AIP objects.
• If Default retention class is not defined at the holding level, the Default Retention period (d) is applied to AIP objects.
• If EAS retention disabled is TRUE for a retention class, the EAS retention period is not applied after you remove the applied retention from AIP objects. If EAS retention disabled is FALSE, the EAS retention period is applied after you remove the applied retention from AIP objects.
• A retention class can map to one or more retention policies. You can apply multiple policies to a single AIP.

Setting a Retention Policy in the SIP Descriptor

You can specify a retention class in the SIP descriptor. The following SIP descriptor (eas_sip.xml) specifies a retention class of RPS_1_DAY:

<?xml version="1.0" encoding="UTF-8"?>
<sip xmlns="urn:x-emc:eas:schema:sip:1.0">
<dss>
...
<retention_class>RPS_1_DAY</retention_class>
...
</dss>
...
</sip>

• The retention class specified in the SIP descriptor overrides the default retention class at the archive holding level.
• If the folder into which the SIP is ingested has its own retention policy, the AIP is retained using two retention policies.

Changing RPS Retentions

You can extend, modify, or replace retention policies in Records Client when the policy is not in the Active state. You can also increase or decrease the retention date for an individual AIP by choosing Change Retention from the shortcut menu.

Disposing of AIPs under RPS Retention

Disposition refers to removing AIP objects from the repository through a set of jobs. Disposition can only be performed on AIP objects in steady states. Rejection, invalidation, or pruning can be performed on AIPs in an unsteady state. Disposition can be triggered by one of the following jobs:

• eas_purge, when eas_retention_date has expired
• dmc_rps_DispositionJob, when the retainer is in its final state

For an AIP under RPS retention, you must promote its retainer to the Final state before disposing of the AIP. You perform the following jobs to dispose of an AIP under RPS retention:

1. Run dmc_rps_PromotionJob to move the retainer to the Final state.
2.
Run dmc_rps_DispositionJob to dispose of the AIP. The AIP is moved to the /System EAS/Data/Purged repository folder.
3. Run the confirmation (eas_confirmation) job.
4. Run the purge (eas_purge) job.

Rejecting/Invalidating AIP Objects Under RPS Retention

When you reject or invalidate AIP objects under RPS retention, you remove all renditions of the object. The object itself is not removed. The format of rejected/invalidated AIP objects changes from eas_ci_container to eas_empty.

Deleting AIP Objects Under RPS Retention

You cannot delete AIP objects under RPS retention. However, if you are a privileged user, you can perform privileged deletion on AIP objects. Privileged deletion results vary depending on the AIP state prior to the action.

AIP State Code   Result
COM              Attach the purge lifecycle to the AIP object
PUR-WDEL         Delete the AIP object
PRU-WPROC        Delete the AIP object
REJ-DONE         Delete the AIP object
INV-DONE         Delete the AIP object
Others           No action

Working with InfoArchive GUI

Shipped with the InfoArchive installation package, InfoArchive GUI is the default web-based search application for searching data archived in InfoArchive holdings. InfoArchive GUI consists of a Search UI, which lets you perform both real-time searches (synchronous) and background searches/orders (asynchronous), and a Background Searches UI, which displays all submitted orders. For information about InfoArchive search, see How Data is Searched, page 62.

The Search UI consists of a search menu and a search pane that displays the search form and the search results page:

• Search menu
The search menu groups search forms under folders based on which holding they are used to search, each folder corresponding to a distinct holding.
• Search form
The search form contains the search criteria fields and search buttons with which the end user performs queries against a holding. A search form is specific to InfoArchive GUI and is the mechanism used to define what can be searched.
• Search results
The search results page displays the returned query results.

The search UI components are completely customizable, and you must configure them before you can use InfoArchive GUI to search archived data in your holdings.

Logging in to InfoArchive GUI

1. Launch InfoArchive GUI in a browser. The default URL is http://myhost:8080/eas-gui.
2. Enter your username and password. You must have at least read access rights to the holdings you want to search.
3. Choose the locale in which you want to view search forms.
After you log in, search forms that have been localized appear in the corresponding language; otherwise, the default locale defined in the eas-services.properties file (eas.locale.default=en_US) is used.
4. Click Sign In.

Searching Archived Data

To search for archived data in InfoArchive GUI:

1. From the search menu on the left, click the holding you want to search.
2. In the search form, enter the search criteria.
3. Do one of the following:
• To perform a synchronous search, click Search.
• To submit an order (asynchronous search), click Background Search and then optionally enter a name to identify your background search. You can see the order you submitted in the Background Search screen. When the order is executed, you can view the search results or export all the contents of the returned AIUs.

On the search page, the logical relations among multiple criteria are not spelled out (unless you specify them in the field label), but they can be found in the Search Details panel at the top of the search results screen.

Working with the Search Results Page

Records returned by a search (synchronous or asynchronous) are displayed in the search results page. A typical search results page consists of the components illustrated below. The search results page is configurable through the stylesheet configuration (eas_cfg_stylesheet) object.
See Configuring the Search Results, page 200. On the search results page, you can perform the following actions:

• Expand the Search Details panel to view detailed information about the search:
- The search criteria entered in the preceding search form for executing the query, including the specified attribute values and their logical relations (AND/OR)
- The search options set in the query configuration, including the ACL/holding name, delivery channel, and result schema used by the query
• Sort results by a column by clicking the column header, switching between ascending and descending order.
• Filter results by a column by entering or selecting a filter value below the column header. For some value data types, you can choose a comparison operator for the filter.
• Filter the results by more than one column. By default, multiple filter values are combined using the AND relation. At the bottom of the screen, you can click the AND/OR button to switch between these two relations. To remove all filter criteria, click the Clear filters button. With the OR relation applied, search results that meet any one of the specified filter conditions are displayed. In the following example, "Logan" and "Johnson" are used as the filter values for First Name and Last Name respectively using the OR relation (with the OR option selected at the bottom of the page); records that satisfy either of the filter criteria are displayed, sorted by Call received at in ascending order.
• If a result contains an unstructured content file, click the action button corresponding to the file data type (if configured) to download or open the file.
• Click the + (plus) sign to the left of a record to view its detailed information. If the record has associated unstructured content files, they are displayed below as attachments.
Exporting AIUs on the Search Results Page

To export structured and/or unstructured data from AIUs on the search results page:

• Select the AIUs you want to export and click Export Selected (which only appears when at least one AIU is selected) at the bottom of the screen. If the search results span multiple pages, navigate through the pages to select them. If you want to export all the AIUs, click Export All without selecting any AIUs.
• If the returned AIUs have unstructured content files associated with them, you can use the Export with Structured/Unstructured Content icon at the bottom to choose whether to export structured data only or both structured data and unstructured content files.
• By default, structured data is exported in XML format. You can change the export format to CSV or TXT using the format option button next to the Export Selected and Export All buttons.

Each export operation is logged in the audit trail, complete with detailed information such as the AIU IDs and content IDs of the exported records. To see a detailed report of who exported what information during a specified period of time, see the Exported Information report (Exported Information, page 282).

Using the InfoArchive GUI Direct Search URL

If you want to integrate InfoArchive GUI's search capabilities into a third-party or custom-built business application, you can use a direct search URL to perform queries without having to log in to InfoArchive GUI. Using this approach, you can bypass the explicit InfoArchive GUI login, include search criteria as query parameters directly in the URL, and leverage the configured InfoArchive GUI search forms and results pages, which allows for seamless integration of InfoArchive GUI with your business application. To use a direct search URL:

1. Create a JSP (JavaServer Pages) file and deploy it as part of the InfoArchive GUI web application.
For example, on Apache Tomcat, place the JSP file in the following location: TOMCAT_HOME/webapps/eas-gui

Here is a sample of the JSP code:

<%@ page import="com.emc.documentum.eas.gui.gwt.server.EASGUIApplication" %>
<%@ page import="com.emc.documentum.eas.gui.gwt.server.NoDomainGroupInProfileListException" %>
<%@ page import="com.emc.documentum.eas.service.model.authenticate.Profile" %>
<%@ page import="java.util.ArrayList" %>
<%@ page import="java.util.List" %>
<%
try {
    String user = request.getParameter("user");
    String domain = request.getParameter("domain");
    String profile = request.getParameter("profile");
    String query = request.getParameter("query");
    String schema = request.getParameter("schema");
    String channel = request.getParameter("delivery");
    if (domain != null && profile != null) {
        List<Profile> profiles = new ArrayList<Profile>();
        Profile d = new Profile();
        d.setName(domain);
        d.setIsDomain("true");
        d.setIsDynamic(false);
        d.setClazz("domain");
        d.setDescription("");
        Profile p1 = new Profile();
        p1.setName(profile);
        p1.setIsDomain("false");
        p1.setIsDynamic(true);
        p1.setClazz("role");
        p1.setDescription("");
        d.getProfiles().add(p1);
        profiles.add(d);
        EASGUIApplication.getAuthenticationHook().onUserAuthenticated(profile, "en_US", profiles);
        response.sendRedirect(request.getContextPath() + "/directSearch?query=" + query +
            "&schema=" + schema + "&delivery=" + channel);
    } else if (user != null) {
        EASGUIApplication.getAuthenticationHook().onUserAuthenticated(user, "en_US", user);
        response.sendRedirect(request.getContextPath() + "/directSearch?query=" + query +
            "&schema=" + schema + "&delivery=" + channel);
    } else {
        response.sendRedirect(request.getContextPath() + "/login?locale=en_US");
    }
} catch (NoDomainGroupInProfileListException e) {
    response.sendRedirect(request.getContextPath() + "/login?locale=en_US");
}
%>

2. In the JSP file, provide the URL of the page to redirect to in the response.sendRedirect method.
For example, if you want to be redirected to a search form, provide the redirect URL like this:

response.sendRedirect(request.getContextPath() + "/eas#Root:search_form.PhoneCalls.01?locale=en_US");

3. Restart the web application server. You can then access InfoArchive GUI via the JSP file by including the following parameters directly in the URL:

Parameter   Description
user        Username with which to log in to InfoArchive GUI
profile     The role to which the user pertains, with access privileges to the holding to search
domain      Domain to use to display a specific set of search forms
schema      PDI schema to use for the search
delivery    Delivery channel to use as the destination of search results. Always specify the standard delivery channel: eas_access_services.
query       DQL expression that will be parsed into the search criteria for the query. The DQL expression uses the following syntax: AIC where criteria order by element

In the criteria expression, you can use the following comparison operators, which will be parsed into the corresponding operators that InfoArchive understands:

DQL Operator   InfoArchive Operator
=              Equal
!=, <>         NotEqual
>              Greater
>=             GreaterOrEqual
<              Less
<=             LessOrEqual
like           StartsWith

The following built-in functions are also supported in the criteria expression:

DQL Function                InfoArchive Operator
contains(name,'value')      Contains
starts-with(name,'value')   StartsWithFullText

You can also use the AND and OR logical operators in the criteria. Here is an example direct search URL containing parameters to pass to the search form:

http://localhost:8080/eas-gui/example.jsp?user=admin&domain=samples&profile=phonecalls_role&query=PhoneCalls where CustomerID='1' or FirstName like 'John'&schema=urn:eas-samples:en:xsd:phonecalls.1.0&delivery=eas_access_services

4. The direct search URL redirects you to the search results page.
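In practice, the query parameters in a direct search URL should be URL-encoded. The following Python sketch assembles the example URL above with standard encoding; the host, JSP file name, and criteria values are taken from the example and are assumptions for illustration:

```python
from urllib.parse import urlencode

# Parameter names come from the direct search URL table; the values mirror
# the phonecalls sample used in the documentation (assumed for illustration).
params = {
    "user": "admin",
    "domain": "samples",
    "profile": "phonecalls_role",
    "query": "PhoneCalls where CustomerID='1' or FirstName like 'John'",
    "schema": "urn:eas-samples:en:xsd:phonecalls.1.0",
    "delivery": "eas_access_services",   # always the standard delivery channel
}

# urlencode percent-encodes spaces, quotes, and '=' inside the DQL criteria.
url = "http://localhost:8080/eas-gui/example.jsp?" + urlencode(params)
```

A business application can then open this URL directly, landing the user on the search results page without an explicit InfoArchive GUI login.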
Using Archival Reports

InfoArchive is a system designed to archive large volumes of data of varying levels of complexity, including structured and unstructured data, over a long period of time. To make the most effective use of the system, it is important to understand how it is being used. InfoArchive automatically collects key system usage data and provides out-of-the-box archival reporting capabilities that help you gain visibility into how business data is being archived by the system and turn current and historical statistics into valuable insight to optimize system utilization, as well as to troubleshoot ingestion problems. In addition, the Archived Volume History report provides the basis for InfoArchive volume-based pricing.

InfoArchive provides key archival metrics through a set of pre-configured saved searches in DA:

• List of Ingestions in Progress
View valid AIPs in the system that have not been successfully archived yet.
• List of Ingestions with Errors
View AIPs with errors encountered during the ingestion process (return code is not 0), including rejected or invalidated AIPs.
• List of AIPs by Retention Date
View all archived AIPs whose retention date is before a specified date.
• AIPs for Disposition
View all AIPs with a disposition date determined by the retention policy.
• Current xDB Library Pool Volume
View the most current archival metrics of the xDB library pools that store archived AIPs.
• Current Archived Volume
View the most current archival metrics for a specified holding and entity.
• Archived Volume History
View detailed as well as aggregate historical records of archival metrics for specified holdings and entities over a past period of time.
• Performed Actions
View who (users and roles) performed what actions on a specific holding during a specified period of time.
• Exported Information: View who (users and roles) have exported what information (which AIUs) from a specified holding or all holdings during a specified period of time.
Using Archival Reports in DA (Common Operations)
As with other saved searches, you can use the archival reports through common operations in DA:
• To access the reports, in the DA navigation pane, click Saved Searches.
• To run a report, right-click the report and select Run Search.
• To rerun a report, on the Search Results page, click Restart from the menu on the top-right corner of the screen. (This button is not available for Current xDB Library Pool Volume.)
• To sort on an attribute, click the column header in the search results table.
• To customize the search results table, on the Search Results page, click the Column Preferences icon next to the right-most column header. In the Column Preferences pop-up window, you can add or remove attributes to display in the search results table. (This feature is not available for Current xDB Library Pool Volume.)
• To better understand the archival reports, refer to the EMC InfoArchive Object Reference Guide, which contains descriptions of all the object properties that map to corresponding columns in the search results tables.
• To monitor search results in real time while a report is running, on the search results page, click the Open the Search Status icon (magnifying glass) as soon as the search has started.
• To view the underlying DQL query statement for the report, on the Real Time Search Monitoring page, click Show native query in the table.
List of Ingestions in Progress
Use the List of Ingestions in Progress report to view valid AIPs in the system that have not been successfully archived yet. These include AIP objects whose eas_phase_seqno values are any of the following (representing corresponding phases in the AIP lifecycle):
• 1 (Receive)
• 2 (Pending ingestion)
• 3 (Ingestion)
• 4 (Pending commit)
List of Ingestions with Errors
Use the List of Ingestions with Errors report to view AIPs with errors encountered during the ingestion process (return code is not 0), including rejected or invalidated AIPs. Ingesting an AIP always returns a code (represented by the eas_return_code property of the AIP object): 0 (zero) means the AIP has been successfully ingested; a value other than 0 (zero) means the ingestion has failed. You can use the return code and its corresponding return message to troubleshoot the ingestion error. After you resolve the ingestion problem, you can either resume the ingestion process or invalidate/reject the AIP.
List of AIPs by Retention Date
Use the List of AIPs by Retention Date report to view all archived AIPs whose retention date is before a specified date. If you do not specify a date, all AIPs with a retention date are returned.
AIPs for Disposition
Use the AIPs for Disposition report to view all AIPs with a disposition date determined by the retention policy.
The report provides the following details:

Column                    Description
AIP                       Name of the AIP with a disposition date already set
Holding Name              Name of the holding to which the AIP belongs
Disposition Date          Exact date and time of disposition
Time to Disposition       Number of days remaining until the disposition date; if the disposition date has already passed, the status Expired is displayed
Hold Requested by         The user who applied the purge lock to the AIP, if there is one
Retention Markup Details  The retention policy and markup designation if Retention Policy Services (RPS) is implemented

Current xDB Library Pool Volume
Use the Current xDB Library Pool Volume report to view the most current archival metrics of the xDB library pools that store archived AIPs. When defining the search criteria, you can select a library pool and specify a range for the last cache in date, last cache out date, or opening date. If you do not specify any search criteria, the report displays metrics for all the xDB library pools that store archived data. You can export the report to the comma-separated values (CSV) format by clicking Export to CSV from the menu on the upper-right corner of the page.
Current Archived Volume
Use the Current Archived Volume report to view the most current archival metrics for a specified holding and entity. The report displays aggregated statistics computed in real time from all archived AIPs currently stored in the system. If you do not specify a holding or an entity when you define the search criteria, all holding/entity combinations with archived AIPs are returned. It may take a long time to return the search results when the number of archived AIPs is large.
Note: Empty holdings and entities not associated with any archived AIPs are not available for selection when you define the search criteria; holding/entity combinations with no archived AIPs are not returned in search results.
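The Time to Disposition column in the AIPs for Disposition report above is a simple date calculation. The helper below is a hypothetical sketch of that rule (it is not InfoArchive code; the function name and the fixed dates are illustrative):

```python
from datetime import datetime

def time_to_disposition(disposition_date, now=None):
    """Return the number of days remaining until the disposition date,
    or the status 'Expired' if the disposition date has already passed."""
    now = now or datetime.utcnow()
    remaining = (disposition_date - now).days
    return "Expired" if remaining < 0 else remaining

# Illustrative values with a fixed reference date:
print(time_to_disposition(datetime(2015, 1, 1), now=datetime(2014, 1, 1)))
print(time_to_disposition(datetime(2013, 1, 1), now=datetime(2014, 1, 1)))
```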
Archived Volume History
Use the Archived Volume History report to view detailed as well as aggregate historical records of archival metrics for specified holdings and entities over a past period of time. By offering you a view of the archival metrics at varying levels of granularity, the report lets you see both the forest and the trees: you can view a high-level summary of archival metrics to understand the archiving trend during a period, as well as dive in to see individual snapshots of metrics at the time of archiving.
There are two report types:
• List
The list report type displays detailed historical records of archival metrics during a specified period of time, in chronological order. If you do not specify a time period, all historical records are listed. The report displays the same metrics as the Current Archived Volume report, but instead of showing the most current archival metrics computed in real time, it pulls past records of archival metrics from an underlying registered table; these records are captured periodically by a scheduled job. Each record has a date timestamp and represents a snapshot of archival metrics at that point in time. You can export the list type report to the comma-separated values (CSV) format by clicking Export to CSV from the menu on the upper-right corner of the page.
• Aggregate
The aggregate report type gives you the flexibility to view a host of archival metrics aggregated by specified functions at varying levels of granularity. For example, you can view the average number of archived packages aggregated by week for the past three months, or the maximum archived volume of structured and unstructured data combined aggregated by month during the last year.
Note: InfoArchive volume-based pricing is based on the maximum metadata + contents volume metric (received structured data and associated unstructured data volume).
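The kind of roll-up performed by the aggregate report type can be sketched as follows. This is a minimal, hypothetical illustration: the snapshot records and their values are invented, whereas the real report aggregates the periodic snapshots stored by the scheduled job.

```python
from collections import defaultdict
from datetime import date

# Hypothetical periodic snapshots: (capture date, metadata + contents volume in GB).
snapshots = [
    (date(2014, 1, 7), 10.0), (date(2014, 1, 14), 12.5),
    (date(2014, 2, 4), 13.0), (date(2014, 2, 25), 15.5),
]

# Aggregate the maximum archived volume by calendar month,
# analogous to "maximum archived volume aggregated by month".
max_by_month = defaultdict(float)
for day, volume in snapshots:
    key = (day.year, day.month)
    max_by_month[key] = max(max_by_month[key], volume)

for (year, month), volume in sorted(max_by_month.items()):
    print(year, month, volume)
```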
To define the search criteria for the Archived Volume History report, you first specify a holding and an entity for which you want to generate the report, and then choose a report type. If you do not specify a holding or an entity when you define the search criteria, all holding/entity combinations with archived AIPs are returned. It may take a long time to return the search results when the number of archived AIPs is large.
Note: Empty holdings and entities not associated with any archived AIPs are not available for selection when you define the search criteria; holding/entity combinations with no archived AIPs are not returned in search results.
After the report is generated, you can:
• Click the Export To CSV command on the top-right corner of the screen to export the generated report into a .csv file that contains a snapshot of all the metrics for each interval during the specified period of time.
• Click the Show Histogram command on the top-right corner of the screen to view the archival metrics visually represented as a histogram.
Performed Actions
Use the Performed Actions report to view who (users and roles) performed what actions on a specific holding during a specified period of time. You must export the report to CSV to view the complete action history information, including the event name and object ID associated with the performed actions.
The exported Performed Actions report provides detailed information about the following InfoArchive audit trail events (of object type eas_audittrail): eas_query, eas_fetch, eas_getcontent, eas_order, and eas_getorder.
Exported Information
Use the Exported Information report to view who (users and roles) have exported what information (which AIUs) from a specified holding or all holdings during a specified period of time.
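Because the complete action history is only available in the exported CSV, it is often convenient to post-process that file, for example to isolate one audit trail event. The sketch below is purely illustrative: the column names and sample rows are assumptions, not the actual export layout (refer to the EMC InfoArchive Object Reference Guide for the real property names).

```python
import csv
import io

# Hypothetical exported Performed Actions CSV; column names are assumed.
exported = io.StringIO(
    "user_name,event_name,object_id\n"
    "alice,eas_query,0900000180001234\n"
    "bob,eas_getcontent,0900000180005678\n"
    "alice,eas_fetch,0900000180009abc\n"
)

# Keep only the rows for one of the documented audit trail events.
rows = [r for r in csv.DictReader(exported) if r["event_name"] == "eas_query"]
for r in rows:
    print(r["user_name"], r["object_id"])
```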
Archived AIPs and archival metrics
The Current Archived Volume and Archived Volume History reports share the same set of archival metrics. Archival metrics are calculated based on AIPs that have been successfully archived into the system (ingestion committed, with the phase being Complete in the AIP lifecycle) and are not in a transient state (such as being included in an aggregate). Qualified AIPs have the following attribute values:
• eas_phase_code = COM
• eas_is_in_unsteady_state = false
The following AIPs are not taken into account by archival metrics calculation and reporting:
• AIPs in any phase of the lifecycle prior to Complete: Receive, Pending ingestion, Ingestion, and Pending commit
• AIPs that have been invalidated, rejected, or purged
• AIPs being synchronously ingested (eas_is_in_unsteady_state = true)
• AIPs to be aggregated
Note: The symbol G in the archival metrics represents the Giga unit prefix, which denotes a factor of a billion (1,000,000,000) rather than 1024M (1,073,741,824).

Metadata volume (GChar)
Total number of characters, in billions, contained in archived PDI files (structured data), calculated as the sum of the eas_pdi_values_char_count property value of all the archived AIPs. The structured data character count for an AIP includes XML element values and attribute values in the PDI file (eas_pdi.xml) and excludes XML tags, to accurately reflect the volume of actual business data archived. Therefore, the count is independent of the XML schema used. For example, the following element contains 6 characters worth of structured data: <myelement myattribute="ABC">DEF</myelement>

Contents volume (GB)
Total volume in gigabytes of archived unstructured content, calculated as the sum of the eas_ci_size property value of all the archived AIPs.
The unstructured content volume for a single AIP is the uncompressed size of all the content files received in the package, before any compression or encryption applied during the ingestion process. For example, a SIP package consisting of a 500KB PDF file and a 300KB ZIP file is 800KB in data size.

Metadata + Contents volume
Total volume of archived data, including structured data (PDI files) and unstructured data (content files), calculated as the sum of the structured data volume and the contents volume (despite the different units). The largest value of this metric over a given period of time (calendar year) is used for InfoArchive volume-based pricing.

Number of packages
Total number of archived AIPs stored in the xDB library, calculated as the sum of the eas_aip_count property value of all the archived AIPs.

Number of AIUs
Total number of archived AIUs, calculated as the sum of the eas_sip_aiu_count property value of all the archived AIPs.

Received package volume (GB)
Total size in gigabytes of received SIP files that have been archived, calculated as the sum of the eas_sip_file_size property value of all the archived AIPs.

Raw XML metadata volume (GB)
Total size in gigabytes of PDI files (eas_pdi.xml) that have been archived, calculated as the sum of the eas_pdi_file_size property value of all the archived AIPs.

xDB volume (GB)
Total size in gigabytes of the xDB storage space occupied by archived data, including PDI files (eas_pdi.xml), tables of contents (eas_ri.xml), and the indexes created on these documents. The metric is calculated by aggregating the sum of the eas_xdb_pdi_size, eas_xdb_ri_size, and eas_xdb_index_size property values of all the archived AIPs; it does not include the size of potential xDB indexes created at an upper level for AIPs ingested with xDB mode 1 or 3.
Storage volume (GB)
Total size in gigabytes of all the content stored in the repository (e.g., renditions) pertaining to archived AIPs, calculated as the sum of the eas_contents_size property value of all the archived AIPs.

Configuring the Calculation of Archived Structured Data Volume
To keep track of the cumulative volume of archived structured data, InfoArchive counts the number of characters contained in the PDI file (eas_pdi.xml) of the AIP during the ingestion process and stores this information as the value of the eas_pdi_values_char_count property of the AIP. If a counting error occurs, the structured data hash is not validated correctly and the ingestion fails. If xDB mode is set to 2, InfoArchive sums up the structured data character counts of all LWSO AIPs into the eas_pdi_values_char_count property value of the aggregated AIP.
The structured data character count for an AIP includes XML element values and attribute values in the PDI file (eas_pdi.xml) and excludes XML tags, to accurately reflect the volume of actual business data archived. Therefore, the structured data character count is independent of the XML schema used.
You can configure whether to ignore spaces when counting structured data characters by setting the are.whiteSpaces.ignored value in the following ingestion configurations:
• eas_cfg_ingest_sip_zip-pdi for ingesting structured information
• eas_cfg_ingest_sip_zip-ci for ingesting unstructured information
Both ingestion configurations are .xml files imported into the repository as eas_cfg_ingest type objects. Add the following in the files:
...
<processor>
  <class>com.emc.documentum.eas.ingestor.transform.processor.validator.PDISchemaValidator</class>
  <data>
    <are.whiteSpaces.ignored>true</are.whiteSpaces.ignored>
  </data>
</processor>
...
Configuring the eas_report Job
In both the list and aggregate type Archived Volume History reports, historical metrics data is pulled from a registered table named eas_history_volume.
A scheduled job, eas_report, automatically collects and aggregates AIP archiving data and periodically saves all historical statistics in the eas_history_volume table. Each record represents a snapshot of the archival metrics at the point in time the archiving data was captured. The eas_report job must be activated and scheduled to run at least once a week to ensure that enough archiving data is captured to generate the Archived Volume History report and provide the basis for InfoArchive volume-based pricing.
InfoArchive installation automatically creates the eas_report job in DA and sets it to the Active state. You can change the job schedule to meet your specific reporting requirements, but the minimum frequency of weekly job runs must be maintained. You can also use your own job scheduler to schedule the eas_report job outside DA.
To schedule the eas_report job in DA, you use the same steps as for other jobs:
1. In the DA navigation pane, connect to your repository and click Administration > Job Management > Jobs.
2. Use the search box to locate the eas_report job.
3. Right-click the job and select Properties.
4. On the Info page of the Job Properties window, make sure State is Active.
5. Select a trace level from the Trace Level list:
• 0: TRACE
• 1: DEBUG
• 2: INFO
• 3: WARN
• 4: ERROR
Trace logs contain status information logged by the job. The trace level set for the job determines the amount of information logged.
6. Click the Schedule tab and change the job schedule.

Field Label             Description
Next Run Date and Time  Specifies the next start date and time for the job. The default is the current date and time.
Repeat                  Specifies the time interval in which the job is repeated.
Frequency               Specifies the number of Repeat intervals between job runs. For example, if Repeat is set to Weeks and Frequency is set to 1, the job repeats every week; if Repeat is set to Weeks and Frequency is set to 3, the job repeats every three weeks.
End Date and Time       Specifies the end date and time for the job. The default end date is 10 years from the current date and time.
After                   Specifies the number of invocations after which the job becomes inactive.

After the eas_report job is run:
• To view the job report, right-click the job and select View Job Report.
• To view the job trace logs, right-click the job and select View Trace File.
• To view the job run history information, right-click the job and select Properties; the history appears at the bottom of the Info page. Last Return Code displays the last value returned by the job, which can help you troubleshoot problems when an error occurs:
— 0: The job was executed successfully without any errors.
— -1: An unexpected error occurred.
— 3: There was an error related to credentials.
— 4: There was an error related to job parameters.
— 5: There was an error related to job initialization.
Managing Orders
InfoArchive orders are created by asynchronous (background) searches and are represented by eas_order objects in Content Server. You manage orders using DA in the same way you manage other Content Server objects. You can create orders through SOAP requests or InfoArchive GUI.
Viewing Order Properties and Processing History
To view the properties of an order, right-click the order in DA and then select Properties. In the Properties window, you can click through the tabs to view the various order properties grouped under them. You can view detailed order processing history under the Tracking Details tab.
Suspending/Resuming an Order
To temporarily suspend an order or to resume a suspended order:
1. In DA, right-click an order and select Properties. In the Properties window, click the Tracking tab.
2. Select the Suspended option to suspend the order; clear this option to resume the order.
3. Click OK.
Changing the Priority of an Order
To change the priority of an order:
1. In DA, right-click an order and select Properties.
In the Properties window, click the Order tab.
2. In the Priority field, set the new priority for the order.
3. Click OK.
Cancelling an Order
To abort the execution of an order:
1. In DA, right-click an order and select Properties. In the Properties window, click the Tracking tab.
2. Select the Delete requested option to cancel the order.
3. Click OK.
Purging Orders
Two scheduled jobs are executed to purge orders:
1. Run the eas_clean_order Content Server job. This job destroys all completed orders that have reached their retention date, leaving the associated data in xDB orphaned.
2. Run the eas_launch_xdb_clean job on the xDB server. This job destroys the orphaned xDB data.
Managing Jobs
InfoArchive Jobs
InfoArchive includes several specific jobs to be scheduled:
• In the form of Content Server repository jobs, for those that always have to be executed on the server hosting the Content Server.
• In the form of a command line to be triggered, for those jobs that may have to be executed on servers not hosting the Content Server, depending on the deployed architecture.
Since InfoArchive relies on the usage of a repository, you must also schedule the execution of some standard Content Server jobs.
Viewing Job Trace Logs
Trace logs contain status information logged by a job. The trace level set for a particular job determines the amount of information logged. After a job run, you can view the trace log for the job by right-clicking the job and selecting View Trace File in DA. However, it may take an InfoArchive job several hours to complete, and the trace file is not available until the job has stopped running.
For a job that is still running, you can find its log file in a location like the following: DOCUMENTUM/dba/log/00000001/sysadmin/job_log_file
Where job_log_file is a job-specific log file in .txt format:
• eas_close: eas_closeTrace.txt
• eas_confirmation: eas_confirmationTrace.txt
• eas_dctm_clean: eas_dctm_cleanTrace.txt
In addition, for Content Server methods being executed, you can find some information pertaining to the related running job in the Content Server repository log located in DOCUMENTUM/dba/log/repository_log_file.
If a job encounters an error before the listener (the object writing the report and the trace file) is attached, the error is written to the standard output only. For example, if a memory exception occurs, or if the arguments in the dm_job object are not correct, an error is written to the standard output. If the trace file or the report file is empty, you can find the error log in /Temp/Jobs/job_name. For example, the close job log is located in /Temp/Jobs/eas_close.
InfoArchive Content Server Jobs
Content Server jobs are repository objects that automate method object execution. Methods associated with jobs are executed automatically on a user-defined schedule. The properties of a job define the execution schedule and turn execution on or off. Jobs are invoked by the agent exec process, a process installed with Content Server. At regular intervals, the agent exec process examines the job objects in the repository and runs those jobs that are ready for execution.
All InfoArchive job and method names start with the eas_ prefix. You manage InfoArchive jobs and methods under Administration > Job Management in DA, in the same way as you manage standard Content Server jobs.
For information about how to manage Content Server jobs, such as rescheduling jobs, running jobs, viewing running job statuses, viewing job reports, setting job trace levels, and viewing job trace logs, refer to the Job Management chapter in the EMC Documentum Administrator User Guide.
InfoArchive provides the RunJob utility to run Content Server jobs from the command line:
• On Windows: EAS_HOME\bin\runJob.bat repository_name job_name
For example: c:\InfoArchive\bin\runJob.bat repo01 eas_commit
• On Linux or UNIX: EAS_HOME/bin/runJob.sh repository_name job_name
For example: /usr/local/bin/InfoArchive/bin/runJob.sh repo01 eas_commit
InfoArchive provides the following Content Server jobs:

Job Name                Job Code            Description
Archive Audit           eas_archive_audit   Creates SIP files containing a copy of the repository audit trails and posts them in the ingestion queue by triggering the receiver.
DCTM Clean              eas_dctm_clean      Destroys the repository objects associated with the orders that have reached their retention date.
Commit                  eas_commit          Seals the AIPs belonging to the same data submission session (DSS) when they have all been ingested.
DMClean CAStore         dm_DMClean_CAStore  A clone of the standard dm_DMClean Content Server job with different argument values in order to include the detection of the orphan contents stored in Centera.
Invalidation/Rejection  eas_rejinv          Processes invalidated or rejected AIPs.
Confirmation            eas_confirmation    Generates one or several confirmation messages when certain events occur on an AIP.
Purge                   eas_purge           Responsible for the disposal of AIPs.
Purge Audit             eas_purge_audit     Responsible for purging Content Server audit trails that have been successfully archived.
Update Content Metadata  eas_sip_cmeta_refresh  Propagates AIP custom element changes to the SIP descriptor rendition.

Modifying InfoArchive Method Timeout Settings
The default timeout for InfoArchive methods is one hour. If it normally takes longer than one hour to run a job, you need to modify the method's timeout settings to extend the time allowed to pass before the job fails.
To modify the timeout settings of a method:
1. In DA, navigate to Administration > Job Management > Methods.
2. Locate the method; then right-click it and select Properties.
3. Specify new values for the following timeout fields:
• Timeout Minimum: The minimum timeout that can be specified on the command line for this procedure. The minimum timeout value cannot be greater than the default value specified in the Timeout Default field.
• Timeout Default: The default timeout value for the procedure. The system uses the default timeout value if no other timeout is specified on the command line.
• Timeout Maximum: The maximum timeout that can be specified on the command line for this procedure.
Increasing the JVM Heap Size Allocated to InfoArchive Jobs
Jobs running on sizable volumes of data can run out of system memory using the default settings (JOB_JAVA_MEMORY=256M). For example, you may encounter an out-of-memory error when running the eas_close job to close a large quantity of open AIP parent shareable objects. To avoid out-of-memory issues, increase the JVM heap size allocated to InfoArchive jobs:
• On Windows, edit EAS_HOME\bin\eas-set-env.bat and change:
SET JOB_JAVA_MEMORY=256M
To:
SET JOB_JAVA_MEMORY=4096M
• On Linux or UNIX, edit EAS_HOME/bin/eas-set-env.sh and change:
export JOB_JAVA_MEMORY=256M
To:
export JOB_JAVA_MEMORY=4096M
Archive Audit (eas_archive_audit)
The Archive Audit job creates SIP files containing a copy of the repository audit trails and posts them in the ingestion queue by triggering the receiver.
For detailed information about the Archive Audit job, see .
DCTM Clean (eas_dctm_clean)
The DCTM Clean job destroys the repository objects associated with the orders that have reached their retention date.
DCTM Clean Arguments

Argument                                 Description
-WorkingDir DirectoryPath                File system path of the working directory to be used by the job.
-report_only True|False                  Setting this argument to true makes the job report the purge processing to be done without executing it.
-OrderNodes node_name                    Nodes of the orders to clean.
-PhasesToProcess order|prune|aip_parent  Which phase of orders to delete:
• Order phase: Clean outdated and deleted orders
• AIP Parent phase: Clean empty AIP parents
• Prune phase: Delete AIP parents and LWSOs that are attached to the Prune lifecycle

DCTM Clean Return Codes

Return Code  Mnemonic       Description
-1           E_UNEXPECTED   Unexpected error
0            OK             Successful execution
1            E_PARSE        Error while parsing arguments
2            E_DFCINIT      Error while initializing the DFC
3            E_CREDENTIALS  Cannot connect to the repository with the configured credentials
4            E_PARAMS       Error while validating the parameters
5            E_REPORT       Error while creating the log of the job

Close (eas_close)
The close job (eas_close) is responsible for:
• Checking whether the AIP parent shareable objects that meet the defined close condition have been closed (their associated eas_open_aip_parent_rel objects have been destroyed)
• Closing any AIP parent shareable object that meets the defined close condition by aggregating all its AIP child lightweight objects into a single materialized AIP object and attaching the parent to the prune lifecycle
• Detecting pooled libraries that can be closed in order to close them; such closure includes their archiving in the repository if this option is activated
Close Arguments

Argument                 Description
-report_only TRUE|FALSE  When set to true, the log of the job only indicates the xDB pooled libraries ready to be closed; the job does not close them.
-pooled_library_predicate    Optional argument that restricts the scope of the pooled libraries to be considered by the job. This argument has to be populated with a standard DQL predicate (for example, "eas_xdb_pooled_library where …"). This argument lets you create several dm_job objects responsible for closing different subsets of libraries. If empty, no predicate is applied.
-aip_parent_predicate        Optional argument that restricts the scope of the AIP parent shareable objects to be considered by the job. This argument has to be populated with a standard DQL predicate (for example, "eas_aip_parent where …"). This argument lets you create several dm_job objects responsible for closing different subsets of AIP parents. If empty, no predicate is applied.
-PhasesToProcess aip_parent|pooled_library  Whether to close AIP parents or pooled libraries.
-WorkingDir                  File system path of the directory to be used by the job as a working area.
-pooled_library_close_delay  The delay in minutes before closing pooled libraries. Default: 10 (minutes).
-aip_parent_close_delay      The delay in minutes before closing AIP parents. Default: 10 (minutes).

Close Return Codes

Return Code  Mnemonic       Description
-1           E_UNEXPECTED   Unexpected error
0            OK             Successful execution
1            E_PARSE        Error while parsing arguments
2            E_DFCINIT      Error while initializing the DFC
3            E_CREDENTIALS  Cannot connect to the repository with the configured credentials
4            E_PARAMS       Error while validating the parameters
5            E_REPORT       Error while creating the log of the job

Commit (eas_commit)
The commit process is invoked by running the eas_launch-ingestor script; it commits AIUs into the xDB database so that the archived data can be searched.
When all SIPs pertaining to a DSS have been ingested (lifecycle state = Waiting_Commit), executing the eas_commit job performs the following actions on every SIP in the batch:
• Attaches a commit log file content to the AIP for traceability purposes
• Pushes the retention date as well as content attributes to the contents stored in Centera
• Compresses the received SIP file
• Promotes the AIP to the Completed state
Commit Arguments

Argument     Description
-WorkingDir  File system path of the working directory to be used by the job.

Standard Content Server job arguments are passed to the job.

Commit Return Codes

Return Code  Mnemonic       Description
-1           E_UNEXPECTED   Unexpected error
0            OK             Successful execution
1            E_PARSE        Error while parsing arguments
2            E_DFCINIT      Error while initializing the DFC
3            E_CREDENTIALS  Cannot connect to the repository with the configured credentials
4            E_PARAMS       Error while validating the parameters
5            E_REPORT       Error while creating the log of the job

The RunJob utility returns the following codes:

Return Code  Mnemonic                 Message
0            RUNJOB_I_NORMAL          Normal successful completion
1            RUNJOB_E_LAUNCHFAILED    Could not launch method
2            RUNJOB_E_TIMEDOUT        The method timed out
3            RUNJOB_E_RETURNVAL       The method returned an error value
4            RUNJOB_E_RESULT          The method returned an error as execution result
5            RUNJOB_E_INTERNAL        Generic error
6            RUNJOB_E_AMBIGUOUSJOB    Several jobs have the same name
7            RUNJOB_E_MAXITERATIONS   The maximum number of times to execute the job has been reached
8            RUNJOB_E_USERCHECKEDOUT  The job is currently checked out by a user
9            RUNJOB_E_NOJOBMETHOD     The job does not have any associated method
10           RUNJOB_E_DMCL            Error while processing a DMCL command
11           RUNJOB_E_RESULTFILE
12           RUNJOB_E_JOBLOGFILE
13           RUNJOB_E_CONNECT         Cannot connect to the docbase
14           RUNJOB_E_JOBNOTFOUND     No job found

DMClean CAStore (dm_DMClean_CAStore)
The DMClean CAStore job is a clone of the standard dm_DMClean Content Server job having different argument values in order to include the
detection of the orphan contents stored in Centera.
DMClean CAStore Arguments
The arguments set on the job are identical to the arguments of the standard dm_DMClean job, except the value of the -clean_castore argument, which is set to true.
DMClean CAStore Return Codes
The job returns the same return codes as the standard dm_DMClean job.
Invalidation/Rejection (eas_rejinv)
Use this job to process invalidated or rejected AIPs. An administrator can invalidate or reject an AIP within the DA interface; such an action immediately attaches the invalidation or rejection lifecycle to the selected AIP.
If the invalidated AIP was in the Completed state and part of a Data Submission Session (DSS) having multiple AIPs, the job reverts the other AIPs to the Waiting Commit state. If the rejected AIP is part of a Data Submission Session (DSS) having multiple AIPs, the job also rejects the other AIPs by attaching them to the reject lifecycle. Once a rejected AIP has been considered by the Confirmation (eas_confirmation) job for the invalidation or rejection event, the job removes the contents associated with the AIP, except those configured to be retained in such a case.
Invalidation/Rejection Arguments

Argument                         Description
-WorkingDir DirectoryPath        File system path of the working directory to be used by the job.
-PhasesToProcess invalid|reject  Whether to process invalidated or rejected AIPs.
-cutoff_days NumberOfDays        Optional argument ensuring a fast scan even if a large number of AIPs are stored in the repository. If this argument is present, the criterion receive date >= (current date - the number of days) is added to the searches. Since RDBMS indexes are created on the AIP receive date attributes, the inclusion of this criterion quickly narrows down the search to the recently received AIPs.

Standard Content Server job arguments are passed to the job.
Invalidation/Rejection Return Codes
Return Code  Mnemonic  Description
-1  E_UNEXPECTED  Unexpected error
0  OK  Successful execution
1  E_PARSE  Error while parsing arguments
2  E_DFCINIT  Error while initializing the DFC
3  E_CREDENTIALS  Cannot connect to the repository with the configured credentials
4  E_PARAMS  Error while validating the parameters
5  E_REPORT  Error while creating the log of the job
Confirmation (eas_confirmation)
Depending on the configuration, InfoArchive can generate one or more confirmation messages when one of the following events occurs on an AIP:
• The SIP file associated with the AIP has been received
• The AIP has been ingested
• The AIP has been purged
• The AIP has been rejected
• The AIP has been invalidated
The structure and the delivery of these confirmations are also driven by the configuration. The confirmations are generated by the Confirmation (eas_confirmation) job, which incrementally scans the AIPs on which the events occurred since its last execution.
When an error occurs, check the job report, which is available in DA as well as in the /tmp directory (e.g., c:/tmp) on the file system.
Confirmation Arguments
Argument  Description
-ConfirmationTypes [receipt][,storage][,purge][,reject][,invalid]  Specifies the confirmation events to be considered by the job using the following keywords:
• receipt searches the AIPs which have not yet been considered by the confirmation job for potentially notifying the reception of the SIP file
• storage searches the AIPs which have not yet been considered by the confirmation job for potentially notifying the archiving of the AIP
• purge searches the AIPs which have not yet been considered by the confirmation job for potentially notifying the purge of the AIP
• reject searches the AIPs which have not yet been considered by the confirmation job for potentially notifying the rejection of an AIP
• invalid searches the AIPs which have not yet been considered by the confirmation job for potentially notifying the invalidation of an AIP
-WorkingDir DirectoryPath  File system path of the working directory to be used by the job
-cutoff_days NumberOfDays  Optional argument ensuring a fast scan even if a large number of AIPs are stored in the repository. If this argument is present:
• The criterion receive date >= (current date – the number of days) is added to the searches issued for the receipt, storage, reject and invalid events.
• The criterion purge date >= (current date – the number of days) is added to the searches issued for the purge events.
Since RDBMS indexes are created on the AIP receive and purge date attributes, the inclusion of these criteria lets you quickly narrow down the search to the recently altered AIPs.
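As a rough illustration, a -ConfirmationTypes value could be validated against the five documented keywords as sketched below; the real job's parsing and error handling are not documented here, so this helper is an assumption.

```python
# The five confirmation event keywords documented for -ConfirmationTypes.
VALID_TYPES = {"receipt", "storage", "purge", "reject", "invalid"}

def parse_confirmation_types(arg):
    """Split a -ConfirmationTypes value such as 'receipt,storage'
    and reject unknown keywords (illustrative helper only)."""
    types = [t.strip() for t in arg.split(",") if t.strip()]
    unknown = [t for t in types if t not in VALID_TYPES]
    if unknown:
        raise ValueError("unknown confirmation type(s): %s" % ", ".join(unknown))
    return types

print(parse_confirmation_types("receipt,storage,purge"))
# → ['receipt', 'storage', 'purge']
```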
Confirmation Return Codes
Return Code  Mnemonic  Description
-1  E_UNEXPECTED  Unexpected error
0  OK  Successful execution
1  E_PARSE  Error while parsing arguments
2  E_DFCINIT  Error while initializing the DFC
3  E_CREDENTIALS  Cannot connect to the repository with the configured credentials
4  E_PARAMS  Error while validating the parameters
5  E_REPORT  Error while creating the log of the job
6  E_NOT_CACHED  The configuration requires generating a confirmation containing the details of the AIU, but the AIP is not cached at the xDB level
7  E_DELIVERY_CHANNEL  Cannot load the delivery channel configuration (i.e., not found or found more than once)
8  E_NOTIFICATION_SENT  The configured command line activated for sending a generated confirmation returned a non-zero return code
9  E_GLOBALCONFIG  Cannot load the InfoArchive global configuration object
10  E_STAMP_UPDATE  Cannot update the AIP with the confirmation timestamp indicating when the AIP has been considered by the job
11  E_CONF_AUDIT  The audit trail associated with a generated confirmation could not be saved
12  E_CONF_COMPLETED_AUDIT  The audit trail indicating the completion of the job execution could not be saved
13  E_NO_QUERY_CONFIG  Cannot find the Query configuration
Purge (eas_purge)
The Purge job is in charge of executing the processing related to the disposal of AIPs:
• Attachment of the purge lifecycle to the AIPs that have reached their retention date and do not have any disposal lock.
• Destruction of the AIPs attached to the purge lifecycle that have been considered by the Confirmation (eas_confirmation) job for the purge event.
• Destruction of the pooled library objects storing only destroyed AIPs.
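The detect phase described in the first bullet above (AIPs past their retention date and without a disposal lock move to the purge lifecycle) amounts to a simple filter. In this sketch the dictionary keys are illustrative placeholders, not InfoArchive attribute names.

```python
from datetime import date

def detect_purgeable(aips, today=None):
    """Return the AIPs eligible for the purge lifecycle: retention
    date reached and no disposal lock (illustrative field names)."""
    today = today or date.today()
    return [a for a in aips
            if a["retention_date"] <= today and not a["disposal_lock"]]

aips = [
    {"id": "A1", "retention_date": date(2014, 1, 1), "disposal_lock": False},
    {"id": "A2", "retention_date": date(2014, 1, 1), "disposal_lock": True},   # locked: kept
    {"id": "A3", "retention_date": date(2020, 1, 1), "disposal_lock": False},  # not yet due
]
print([a["id"] for a in detect_purgeable(aips, today=date(2014, 6, 1))])
# → ['A1']
```

Note that detection only attaches the purge lifecycle; actual destruction happens in the destroy phase, after the Confirmation job has processed the purge event.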
Purge Arguments
Argument  Description
-PhasesToProcess [detect][,destroy][,destroy_pooled_library]  Specifies the execution scope of the job using the following keywords:
• detect activates the search of the AIPs having reached their retention date and not having any disposal lock, in order to attach them to the purge lifecycle.
• destroy activates the search of the AIPs attached to the purge lifecycle which have been considered by the confirmation job.
• destroy_pooled_library activates the search of pooled libraries storing only destroyed AIPs.
-report_only TRUE|FALSE  Setting this argument to true reports the purge processing to be done without executing it.
Purge Return Codes
Return Code  Mnemonic  Description
-1  E_UNEXPECTED  Unexpected error
0  OK  Successful execution
1  E_PARSE  Error while parsing arguments
2  E_DFCINIT  Error while initializing the DFC
3  E_CREDENTIALS  Cannot connect to the repository with the configured credentials
4  E_PARAMS  Error while validating the parameters
5  E_REPORT  Error while creating the log of the job
6  E_DETECT  Error while detecting the AIPs
7  E_DESTROYAIP  Error while attempting to destroy an orphan AIP
8  E_DESTROYPOOL  Error while attempting to destroy an orphan xDB pooled library
Update Content Metadata (eas_sip_cmeta_refresh)
Use the Update Content Metadata job to propagate the AIP custom element changes to the SIP descriptor rendition.
Update Content Metadata Arguments
Argument  Description
-WorkingDir DirectoryPath  File system path of the working directory to be used by the job
Update Content Metadata Return Codes
Return Code  Mnemonic  Description
-1  E_UNEXPECTED  Unexpected error
0  OK  Successful execution
1  E_PARSE  Error while parsing arguments
2  E_DFCINIT  Error while initializing the DFC
3  E_CREDENTIALS  Cannot connect to the repository with the configured credentials
4  E_PARAMS  Error while validating the parameters
5  E_REPORT  Error while creating the log of the job
InfoArchive Command Line Jobs
InfoArchive provides the following command line jobs:
Job Name  Job Code  Description
Clean  eas-launch-clean  Performs a regular cleanup of the reception, ingestion and xDB cache working directories.
xDB Enumeration No Backup  eas-launch-xdb-enumeration-nobackup  Enumerates to stdout the list of xDB segments which have been archived in the repository.
xDB Clean  eas-launch-xdb-clean  Detects and destroys the xDB segments and XML documents created by InfoArchive but no longer referenced in the repository.
These jobs have not been implemented as Content Server jobs because, depending on the chosen architecture, their execution can be required on servers not hosting the Content Server repository.
For clarity and brevity, the names of the scripts associated with the command line jobs are sometimes referred to without their platform-specific extension (.bat on Windows and .sh on Linux) in this document.
Clean (eas-launch-clean)
The purpose of this job is to perform a regular cleanup of the reception, ingestion and xDB cache working directories:
• The arguments indicate the nodes for which the cleanup must be executed
— The file system path of the working areas is read in the configuration object of these nodes.
— These file system areas must be accessible at the file system level on which the command is launched.
• The job works by scanning, at the file system level, the working directories having a timestamped name.
• Directories timestamped with a date older than a parameterized period are deleted, unless they are associated with an AIP that is likely in error.
Clean properties file
The parameters of this job are stored in the conf/eas-clean.properties file; they can be overridden on the command line.
Property Long (Short) Name  Description  Default Value
docbase_name (s)  The repository name  Name of the repository
domain (d)  The user’s domain, if any  Empty value
user_name (u)  The name of the Documentum user to connect with  Login name of the installation owner user account
password (p)  (Optional) The user password with which to connect to the repository. If the job executes on the same host where Content Server is installed, you do not need to specify this property here. The job can connect to the repository through the Content Server trusted login feature.  Password of the installation owner user account
receivers (r)  (Optional) List of reception node configuration names (separated by ’,’) indicating which reception working areas must be cleaned up. The ’*’ value indicates that all nodes must be considered  *
ingestors (i)  (Optional) List of ingestor node configuration names (separated by ’,’) indicating which ingestion working areas must be cleaned up. The ’*’ value indicates that all nodes must be considered  *
cache_access (c)  (Optional) List of xDB cache node configuration names (separated by ’,’) indicating which xDB cache working areas must be cleaned up. The ’*’ value indicates that all nodes must be considered  *
keephistory (h)  Working directories having a name timestamped prior to the current date minus the value of this property (expressed in hours) are destroyed.  5
level (l)  Logging level: ERROR, WARN, DEBUG, INFO, TRACE  INFO
Clean Return Codes
Return Code  Mnemonic  Description
-1  E_UNEXPECTED  Unexpected error
0  OK  Successful execution
1  E_PARSE  Error while parsing arguments
2  E_DFCINIT  Error while initializing the DFC
3  E_CREDENTIALS  Cannot connect to the repository with the configured credentials
4  E_PARAMS  Error while validating the parameters
5  E_INJECTOR_FOLDER  Error while deleting an ingestion working directory
6  E_RECEIVER_FOLDER  Error while deleting a reception working directory
7  E_CACHEACCESS_FOLDER  Error while deleting a cache access node working directory
xDB Enumeration No Backup (eas-launch-xdb-enumeration-nobackup)
This job enumerates to stdout the list of xDB segments which have been archived in the repository by:
• Parsing the segments of the xDB federation
• Checking in the repository whether each segment has been archived
The xDB segments which have been archived are written to stdout, while the job processing messages are written to stderr. The returned list of segments is intended to be used when invoking the standard xDB backup command line, in order to skip the backup of those segments.
xDB Enumeration No Backup properties file
This properties file defines the known repositories and their associated credentials; it has the same syntax as the properties file of the xDB Clean job.
xDB Enumeration No Backup options
Usage: eas-launch-xdb-enumeration-nobackup.sh [-c <arg>] -f <arg> [-l <arg>] -P <arg> [-p <arg>]
This command supports the following options:
Option  Description
-c, --cache <arg>  The xDB cache pages for the database session
-f, --federation <arg>  The xDB federation bootstrap path or URL
-l, --level <arg>  The optional logging level to use (TRACE, DEBUG, INFO, WARN, or ERROR)
-p, --password <arg>  The password of the xDB superuser
-P, --properties  Properties file containing the list of repositories to connect to with the credentials to use
xDB Enumeration No Backup example:
eas-launch-xdb-enumeration-nobackup.sh -f //localhost:1235 -p mypassword -P /app/eas/conf/eas-xdb-clean.properties 1>skip-segments.txt 2>eas-launch-xdb-enumeration-nobackup.log
xDB Enumeration No Backup Return Codes
Return Code  Mnemonic  Description
-1  E_UNEXPECTED  Unexpected error
0  OK  Successful execution
1  E_PARSE  Error while parsing arguments
2  E_DFCINIT  Error while initializing the DFC
5  E_XDBINIT  Error while initializing the xDB client
xDB Clean (eas-launch-xdb-clean)
The xDB Clean job detects and destroys the xDB segments and XML documents created by InfoArchive but no longer referenced in the repository. The job performs the following actions in sequence:
• Scan of the xDB segments having a name prefixed by eas_aip_id: a segment is considered orphaned if no AIP having this identifier is found in the repository.
• Scan of the xDB segments having a name prefixed by eas_pooled_library_id: a segment is considered orphaned if no pooled library object having this identifier is found in the repository.
• Scan of the xDB documents having an attribute named eas_aip_id: a document is considered orphaned when:
— No AIP having this identifier is found in the repository
— The AIP is invalidated with status INV-WXDBCLEAN
— The AIP is rejected with status REJ-WXDBCLEAN
• Scan of the xDB documents having an attribute named eas_order_id: a document is considered orphaned if no order having this identifier is found in the repository.
xDB Clean Properties File
The conf/eas-xdb-clean.properties properties file contains the repositories to connect to, with the credentials to use for each repository.
Property  Description  Default Value
dfc.servers  The repository names to connect with, separated by a comma  Name of the repository
dfc.server.repositoryName.user  The name of the Documentum user to connect with for the specified repository  Login name of the installation owner user account
dfc.server.repositoryName.password  (Optional) The user password with which to connect to the repository. If the job executes on the same host where Content Server is installed, you do not need to specify this property here. The job can connect to the repository through the Content Server trusted login feature.  Password of the installation owner user account
xDB Clean Options
Option  Description
-d, --database  The xDB database name
-u, --username <arg>  The user name. xDB automatically uses the superuser or Administrator user name where needed.
-p, --password <arg>  The password of the user
-f, --federation <arg>  The federation bootstrap path or URL
-c, --cache <arg>  The cache pages for the database session
-l, --level <arg>  The optional logging level to use (TRACE, DEBUG, INFO, WARN, or ERROR)
-P, --properties  Properties file containing the list of repositories to connect to with the credentials to use
-r, --report_only  Reports the xDB libraries and documents to be deleted without deleting them
-xa, --exclude_aip  Deactivates the search of the xDB libraries/documents associated with AIPs
-xo, --exclude_order  Deactivates the search of the xDB libraries/documents associated with orders
-xi, --exclude_invalid  Deactivates the search of the xDB libraries/documents associated with invalid AIPs
-xr, --exclude_reject  Deactivates the search of the xDB libraries/documents associated with rejected AIPs
xDB Clean command line example:
eas-launch-xdb-clean.bat -P /app/eas/conf/eas-xdb-clean.properties -d xdb01 -u Administrator -p dmadmin -f xhive://localhost:1235
xDB Clean Return Codes
Return Code  Mnemonic  Description
-1  E_UNEXPECTED  Unexpected error
0  OK  Successful execution
1  E_PARSE  Error while parsing arguments
2  E_DFCINIT  Error while initializing the DFC
5  E_XDBINIT  Error while initializing the xDB client library
Managing Audit
Audit management is essential to regulatory and legal compliance, which demands a convincingly documented audit trail. Most audits commence with a request for information, followed by a request for an audit trail for the supplied information. A properly stored, well-managed, and tamper-proof audit trail that can be conveniently and quickly accessed builds confidence with auditing bodies and contributes to favorable outcomes.
In InfoArchive, auditing can be activated for both standard Documentum Content Server events and InfoArchive-specific events; audit records are managed through the standard Audit Management feature in DA.
Audit records can be archived by the eas_archive_audit job into a designated InfoArchive holding and accessed through two pre-configured search forms in the InfoArchive GUI. Archived audit records that have aged beyond their compliance requirements can be deleted using the eas_purge_audit job.
InfoArchive Audit Trail Events
In InfoArchive, audit records result from two types of events:
• Standard Documentum Content Server events (with event names beginning with the dm_ prefix)
InfoArchive leverages the standard Documentum auditing capabilities to provide documentary evidence of the sequence of activities and events that have affected InfoArchive-specific objects (with object names beginning with the eas_ prefix), such as AIP objects, configuration objects, and pooled library objects, just as with other repository objects. The auditing of standard Documentum Content Server events is activated by default. You view and manage audit records of standard Documentum events in DA using the features in the Administration/Audit Management folder.
• InfoArchive-specific events
Aside from standard Documentum Content Server events, InfoArchive also audits events that are specific to the InfoArchive application—events with their names beginning with the eas_ prefix.
Among these events, the InfoArchive audit trail events—eas_query, eas_getcontent, eas_order, eas_getorder, and eas_fetch—are not audited by default, but need to be manually activated at the global or holding/AIC level.
Documentum Content Server Events
Here is a list of standard Documentum Content Server events audited by InfoArchive:
Audit Object Type  Event
dm_user  dm_connect, dm_destroy, dm_disconnect, dm_getlogin, dm_logon_failure, dm_save, dm_security_check_failed
dm_group  dm_destroy, dm_save, dm_security_check_failed
dm_policy  dm_install, dm_uninstall, dm_validate
eas_cfg  dm_addrendition, dm_branch, dm_checkin, dm_checkout, dm_destroy, dm_link, dm_lock, dm_mark, dm_prune, dm_removecontent, dm_removerendition, dm_save, dm_saveasnew, dm_setfile, dm_unlink, dm_unlock, dm_unmark
eas_aip  dm_addrendition, dm_addretention, dm_branch, dm_checkin, dm_checkout, dm_destroy, dm_link, dm_lock, dm_mark, dm_prune, dm_removecontent, dm_removerendition, dm_removeretention, dm_save, dm_saveasnew, dm_setfile, dm_unlink, dm_unlock, dm_unmark, dm_updatepart, dm_bp_attach, dm_bp_demote, dm_bp_promote, dm_bp_resume, dm_bp_suspend
dm_acl  dm_destroy, dm_save, dm_saveasnew
dm_job  dm_addrendition, dm_branch, dm_checkin, dm_checkout, dm_destroy, dm_link, dm_lock, dm_mark, dm_prune, dm_removecontent, dm_removerendition, dm_save, dm_saveasnew, dm_setfile, dm_unlink, dm_unlock, dm_unmark, dm_jobstart
dmr_content  dm_move_content
InfoArchive-Specific Events
Here is a list of InfoArchive-specific events audited by InfoArchive:
Event  User  Object Type  Description
eas_query  webservice  eas_audittrail  A search is performed
eas_fetch  webservice  eas_audittrail  An AIU ID is returned by a search. The event contains the same information as the eas_query event except for the query criteria and the AIP count. To minimize the database footprint, only one eas_fetch record is created, and all AIU IDs are saved into a distinct persistence object (eas_audittrail_fetch). Each persistence object contains a reference to the eas_fetch audit trail.
eas_getcontent  webservice  eas_audittrail  An unstructured content file is retrieved from InfoArchive
eas_order  webservice  eas_audittrail  An order (asynchronous search) is created
eas_getorder  webservice  eas_audittrail  The result of an order is retrieved
eas_confirmation  dmadmin  dm_audittrail  A confirmation job is run
eas_confirmation_completed  dmadmin  dm_audittrail  A confirmation job has completed
eas_archive_audit  dmadmin  dm_audittrail  The archive audit (eas_archive_audit) job is run
eas_purge_audit  dmadmin  dm_audittrail  The purge audit (eas_purge_audit) job is run
eas_purge_audit_completed  dmadmin  dm_audittrail  The purge audit (eas_purge_audit) job has completed
eas_change_retention  user  dm_audittrail  The retention date of an AIP has changed
eas_unlock  user  dm_audittrail  A purge lock has been removed from an AIP
eas_lock  user, dmadmin  dm_audittrail  A purge lock has been attached to an AIP
eas_rollback  dmadmin  dm_audittrail  An asynchronous ingestion operation has been rolled back
eas_reject  dmadmin, user  dm_audittrail  An AIP has been rejected
eas_invalid  user  dm_audittrail  An AIP has been invalidated
eas_purge  dmadmin  dm_audittrail  An AIP has been purged
eas_aip_cmeta_modify  user  dm_audittrail  The custom metadata of an AIP has been updated
Archiving InfoArchive Audit Records
Audit records can be archived in InfoArchive in the same way as other archived data. Audit records are archived using the archive audit (eas_archive_audit) job.
When you installed InfoArchive, a pre-configured holding EAS-AUDIT-001 was installed for storing audit records. By default, the primary holding configuration objects are located in the /System EAS/Archive Holdings/EAS/EAS-AUDIT-001 repository folder, and /EAS/EAS-AUDIT-001 is the default location for audit AIPs. If needed, you can modify the default holding configurations.
When executed, the archive audit (eas_archive_audit) job performs the following actions:
1. Creates a SIP file containing the audit records that have not been archived so far.
2. Triggers the Receiver to receive the generated SIPs into the audit holding and put them in the queue for ingestion.
Once the SIP is ingested and committed into the audit holding, the audit records contained within can be searched and accessed in the InfoArchive GUI.
Configuring the Archive Audit (eas_archive_audit) Job Arguments
You pass standard Documentum Content Server job method arguments to the archive audit (eas_archive_audit) job. To edit the job arguments:
1. In DA, navigate to the Administration/Job Management/Jobs repository folder.
2. Locate and right-click the eas_archive_audit job and choose Properties from the shortcut menu.
3. Under the Method tab of the Job Properties page, make sure the Pass standard arguments option is selected and click Edit.
4. Edit the method arguments and save your changes.
Archive audit (eas_archive_audit) job arguments are listed as follows:
Job Method Arguments  Description  Default Value
docbase_name (s)  Name of the Documentum Content Server repository to connect to
domain (d)  The user’s domain, if any
user_name (u)  Username with which to connect to the Documentum repository  Login name of the installation owner user account
password (p)  (Optional) The user password with which to connect to the repository. If the job executes on the same host where Content Server is installed, you do not need to specify this property here. The job can connect to the repository through the Content Server trusted login feature.  Password of the installation owner user account
job_id (j)  r_object_id of the dm_job repository object corresponding to the job
predicate  DQL dm_audittrail predicate for filtering the audit records to be archived by the job. If not set, all audit records are archived.
holding  Name of the target archive holding to fill in the descriptor of the generated SIP  EAS-AUDIT-001
producer  Designation of the application producing the SIP to fill in the descriptor of the generated SIP  eas
entity  Name of the business entity to fill in the descriptor of the generated SIP  EAS
pdischema  PDI schema URN to fill in the descriptor of the generated SIP  urn:x-emc:eas:schema:audittrail:1.0
pdischemaversion  PDI schema version to fill in the descriptor of the generated SIP
application  Designation of the application producing the data to fill in the descriptor of the generated SIP  eas
priority  Ingestion priority to fill in the descriptor of the generated SIP  0
maxaudit  Maximum number of audit trails to include in a generated SIP  100000
commandline  Command line to trigger the reception of a generated SIP. The job dynamically substitutes the placeholder %file% with the file path of the generated SIP file  eas-launch-receiver -f %file% -e true -c reception_node_01 -o EAS
checkaudit  Boolean indicating if the job must first validate a signed audit trail before exporting it  false
checksumalgo  Java name of the hash algorithm to apply for computing the hash value of the previous SIP  MD5
checksumencoding  Java name of the encoding algorithm to use for including the hash value of the previous SIP in the data file header of the current SIP  base64
workingdir  File system path of the directory to be used by the job as working area  Value defined during the installation
Running the Archive Audit (eas_archive_audit) Job
You archive audit trail records by running the eas_archive_audit job. When successfully completed, the eas_archive_audit job returns code 0 (zero).
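The checksumalgo and checksumencoding arguments above describe how each generated SIP embeds a hash of the previous one, chaining the audit SIPs together. A minimal sketch of that computation with the default MD5/base64 settings follows; the file name and helper function are illustrative, not part of the product.

```python
import base64
import hashlib

def previous_sip_hash(path, algorithm="MD5"):
    """Hash a SIP file with the configured Java algorithm name and
    base64-encode the digest, as described for checksumalgo and
    checksumencoding (illustrative helper, not InfoArchive code)."""
    h = hashlib.new(algorithm.lower())
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return base64.b64encode(h.digest()).decode("ascii")

# Throwaway file standing in for the previously generated SIP.
with open("previous_sip.zip", "wb") as f:
    f.write(b"example sip bytes")
print(previous_sip_hash("previous_sip.zip"))
```

A verifier can recompute this value from the retained previous SIP and compare it with the value stored in the data file header of the current SIP, which is what makes tampering with an archived audit SIP detectable.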
An AIP containing all audit trail records that have not been archived yet is created in the audit holding (default: EAS-AUDIT-001) with the Waiting_Ingestion lifecycle state. You must then ingest and commit the received AIP to complete the archiving process. If synchronous commit is enabled, ingestion is automatically followed by the commit operation. To ensure the integrity of the archived audit trail, you must run the eas_confirmation job after the commit; otherwise, the next eas_archive_audit job run will fail. Archived and confirmed audit trail records can be periodically purged.
Follow these steps to archive InfoArchive audit records:
1. Run the archive audit (eas_archive_audit) job in one of the following ways:
• Execute the following command in a prompt window: runJob repository_name eas_archive_audit
• In DA, run the eas_archive_audit job under Administration/Job Management/Jobs. You can schedule the job to run on a regular basis.
2. Ingest the received AIP by executing the following command: EAS_HOME/bin/eas-launch-ingestor aip_id
See Ingesting AIPs, page 245.
3. Commit the ingestion by running the eas_commit job.
See Committing Ingested AIPs, page 250.
4. Run the confirmation (eas_confirmation) job.
See Confirmation (eas_confirmation), page 299.
Troubleshooting the Archive Audit Job Errors
When an audit archiving error occurs, a non-zero code is returned. You can find more information about the error in the log file eas_archive_audit.log created in EAS_HOME/bin. Identify and fix the cause of the error and run the eas_archive_audit job again.
If the error persists, delete the dm_audittrail objects in question by executing the following DQL and then rerun the eas_archive_audit job:
delete dm_audittrail objects where event_name='eas_archive_audit'
Here is a list of eas_archive_audit job return codes:
Return Code  Mnemonic  Description
-1  E_UNEXPECTED  Unexpected error
0  OK  Successful execution
1  E_PARSE  Error while parsing arguments
2  E_DFCINIT  Error while initializing the DFC
3  E_CREDENTIALS  Cannot connect to the repository with the configured credentials
4  E_PARAMS  Error while validating the parameters
5  E_REPORT  Error while creating the log of the job
6  E_EXTERNAL_COMMAND  Error returned by the configured command line launched for posting the ingestion of a generated SIP
7  E_CHECK_AUDIT  Error returned when the verification of the signature associated with an audit trail fails
8  E_COMMIT_WAIT  Execution refused because the last SIP posted for ingestion has not been ingested and confirmed
Viewing Archived Audit Records
Once archived by the eas_archive_audit job, audit records can be searched and viewed in two built-in search forms in the InfoArchive GUI, both located in the EIA system forms folder under the Search tab:
• Archive lifecycle log
Use this form to search for audit records of events that affect the lifecycle state of an AIP, such as eas_confirmation and eas_purge.
• Event log
Use this form to search for audit records of events that do not affect the AIP lifecycle, such as eas_query and eas_purge_audit.
Both the archive lifecycle log and event log search forms return audit records in the search results page, on which you can sort and filter audit records, view record details, and export records. For information about the search results page, see Working with the Search Results Page, page 272.
Purging InfoArchive Audit Records
Ever-growing audit records can quickly take up storage space and cause system performance degradation.
Once they have aged beyond compliance requirements, they should be deleted to reclaim storage space.
The Purge Audit (eas_purge_audit) job is responsible for purging archived audit records and performs the following actions:
• Incremental reading of the archival confirmation messages created for the AIPs archived in the audit archive holding; each audit trail repository object referenced in such a confirmation is flagged as archived.
• Destruction of the archived repository objects older than a configurable time frame.
Configuring Purge Audit (eas_purge_audit) Job Arguments
You pass standard Documentum Content Server job method arguments to the purge audit (eas_purge_audit) job. To edit the job arguments:
1. In DA, navigate to the Administration/Job Management/Jobs repository folder.
2. Locate and right-click the eas_purge_audit job and choose Properties from the shortcut menu.
3. Under the Method tab of the Job Properties page, make sure the Pass standard arguments option is selected and click Edit.
4. Edit the method arguments and save your changes.
Purge audit (eas_purge_audit) job arguments are listed as follows:
Property Long (Short) Name  Description  Default Value
library  Name of the xDB library configuration object in which the archival confirmations must be scanned  confirmation_audit_xdb_lib
cutoffaudit  Sets the current date minus the specified number of days as the audit archive cutoff date. Archived audit trail objects with a timestamp prior to this date will be destroyed by the eas_purge_audit job. For example, if this value is set to 7, all audit trail records archived more than a week ago will be destroyed.  92
cutoffconf  Sets the current date minus the specified number of days as the cutoff date for archival confirmation messages. Archival confirmation messages created prior to this date will be destroyed by the eas_purge_audit job. For example, if this value is set to 7, all archival confirmation messages created more than a week ago will be destroyed.  92
workingdir  File system path of the directory to be used by the job as working area  Value defined during the installation
Running the Purge Audit (eas_purge_audit) Job
You can run the purge audit (eas_purge_audit) job in one of the following ways:
• Execute the following command in a prompt window: runJob repository_name eas_purge_audit
• In DA, run the eas_purge_audit job under Administration/Job Management/Jobs.
When successfully completed, the eas_purge_audit job returns code 0 (zero).
Troubleshooting the Purge Audit Job Errors
When an audit purge error occurs, a non-zero code is returned. You can find more information about the error in the log file eas_purge_audit.log created in EAS_HOME/bin. Identify and fix the cause of the error and run the eas_purge_audit job again.
Here is a list of eas_purge_audit job return codes:
Return Code  Mnemonic  Description
-1  E_UNEXPECTED  Unexpected error
0  OK  Successful execution
1  E_PARSE  Error while parsing arguments
2  E_DFCINIT  Error while initializing the DFC
3  E_CREDENTIALS  Cannot connect to the repository with the configured credentials
4  E_PARAMS  Error while validating the parameters
5  E_REPORT  Error while creating the log of the job
6  E_USERPERMISSION  The repository user account configured to be used by the job does not have the audit-related extended privileges required for the execution of the job
Administrating the Configuration Cache
Use the InfoArchive web services administration pages (http://hostname:port/eas-services/) to perform administrative tasks on the configuration cache, including flushing the cache, viewing basic and detailed cache statistics, and viewing detailed element key information in the cache.
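These pages require HTTP basic authentication (the role setup is described below). As a minimal sketch, a script could call the flush page as follows; the host, port, and credentials are placeholders, not values shipped with the product.

```python
import base64
import urllib.request

def flush_request(host, port, user, password):
    """Build an authenticated GET request for the configuration cache
    flush page (host, port, and credentials are placeholders)."""
    url = "http://%s:%d/eas-services/admin/config-cache/flush" % (host, port)
    token = base64.b64encode(("%s:%s" % (user, password)).encode()).decode()
    req = urllib.request.Request(url)
    req.add_header("Authorization", "Basic " + token)
    return req  # pass to urllib.request.urlopen(req) against a live server

req = flush_request("localhost", 8080, "admin", "secret")
print(req.full_url)
```

Since simply accessing the flush page flushes the cache, a request like this is enough to force a reload of the configuration after changing configuration objects.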
The configuration cache administration pages require basic HTTP authentication. To access the pages, you must grant access privileges to an appropriate user by granting access to the eas-services-admin role and assigning the user to this role. For example, on Apache Tomcat, add the following to TOMCAT_HOME/conf/tomcat-users.xml, and then restart the web application server for the changes to take effect.
<role rolename="eas-services-admin"/>
<user username="<username>" password="<password>" roles="eas-services-admin"/>
There are three configuration cache administration pages:
• Configuration Cache Flush (http://<hostname>:<port>/eas-services/admin/config-cache/flush) — Use this page to flush the cache. Every time you access this page, the cache is flushed.
• Configuration Cache Statistics (http://<hostname>:<port>/eas-services/admin/config-cache/stats) — View basic and detailed statistics of the cache on this page. You must set eas_config_cache_statistics_enabled in eas-services.properties to true to be able to view the detailed statistics.
• Configuration Cache Keys (http://<hostname>:<port>/eas-services/admin/config-cache/keys) — View detailed element key information on this page.

Logging

Logging mechanisms are implemented differently among InfoArchive components. This section describes the logging mechanisms and how you can customize logging according to your needs.

Command Line Jobs Logging

Command line jobs are stand-alone Java programs, each dedicated to a certain task. Most command line jobs have two loggers: a file logger and a console logger.

File Loggers for Command Line Jobs

When a command line job’s corresponding configuration object is loaded, a file logger is attached to the event. When the event is triggered, logs are saved to a file in the specified location. The file logging process for each command line job is described below.
• eas-launch-receiver — Open a reception node’s property page (eas_cfg_receive_node); log level and working directory set the logging verbosity and the log file location, respectively. In the working directory, the subfolder name follows the <date-time>_<aipID> pattern. If you enable archive logs for a holding, the reception logs are also compressed and saved as a rendition of the AIP object.
• eas-launch-ingestor — Open an ingestion node’s property page (eas_cfg_ingest_node); log level and working directory set the logging verbosity and the log file location, respectively. In the working directory, the subfolder name follows the <date-time>_<aipID> pattern. If you enable archive logs for a holding, the ingestion logs are also compressed and saved as a rendition of the AIP object.
• eas-launch-order-node — Open an order node’s property page (eas_cfg_order_node); log level and working directory set the logging verbosity and the log file location, respectively. The log file is saved in the directory specified by working directory.
• eas-launch-xdb-cache — Open a cache access node’s property page (eas_cfg_cache_access_node); log level and working directory set the logging verbosity and the log file location, respectively. The log file is saved in the directory specified by working directory.

The command line job logging level is specified by the -l parameter in the .properties file. The value is one of the following: ERROR, WARN, DEBUG, INFO, TRACE.

Console Loggers for Command Line Jobs

Command line jobs, except eas-clean, also have console loggers. The default console logger displays logs at the screen prompt or saves screen logs to a file you specify in the command line. You can set the log level and redirect the log for command line jobs’ console loggers in the following ways:
1. Enable and set the -l argument in the corresponding property file in the conf folder.
2.
To temporarily override the log level value in the configuration file, append the -l argument and your desired log level value to the command. For example, the following command sets the log level for the current run to DEBUG: eas-launch-receiver.bat -f myfile.zip -l DEBUG
3. To redirect the logging output to a file, specify the file path in the command line. For example, the following command saves logs to a file: eas-launch-receiver.bat -f myfile.zip > C:\temp\details\receiver.log

The console loggers for the following command line jobs behave differently from the default console logger:
• eas-enumeration: The console logger outputs only the AIP ID.
• eas-xdb-cache, eas-xdb-order: You cannot set the logging level for the console logger.
• eas-launch-xdb-enumeration-nobackup: The console produces output only when an error occurs.

Content Server Jobs Logging

Content Server jobs are jobs that are triggered by dm_job objects in a Content Server repository. Content Server jobs perform lifecycle management and update, modify, or delete Content Server objects. Content Server jobs save their logs in the $Documentum/dba/log/mylog/sysadmin folder. The files in this folder are also imported to the repository, so you can also view job logs in cabinets\Temp\Jobs of DA. Content Server job logging has the following logging levels:
• 0: ERROR
• 1: WARN
• 2: INFO
• 3: DEBUG
• 4: TRACE
If you specify an integer between 4 and 10 inclusive, the logging level is TRACE. For the eas_commit, eas_rejinv, and eas_report jobs, logs are also saved in the directory specified by the -WorkingDir argument. You can edit this argument on the Method tab.

Configuring the Logging Level for InfoArchive Web GUI and Web Services

For InfoArchive web services, the logging verbosity and the log location are defined by WEB-INF/classes/log4j.xml. You can locate this file after you deploy the WAR files. The logging level is set to INFO by default.
You can change the logging level to ERROR, WARN, DEBUG, or TRACE:
<category name="com.emc">
  <priority value="INFO" />
</category>
The log location is specified by the following line:
<param name="file" value="/tmp/eas/eas-gui.log" />
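Putting the two settings together, a minimal log4j.xml fragment might pair the com.emc category (here raised to DEBUG) with a file appender writing to the configured location. This is a sketch only: the appender name, appender class, and conversion pattern below are illustrative assumptions, not values mandated by InfoArchive.

```xml
<!-- Sketch only: appender name/class and pattern are illustrative assumptions -->
<appender name="file" class="org.apache.log4j.RollingFileAppender">
  <param name="file" value="/tmp/eas/eas-gui.log" />
  <layout class="org.apache.log4j.PatternLayout">
    <param name="ConversionPattern" value="%d %-5p [%c] %m%n" />
  </layout>
</appender>

<category name="com.emc">
  <priority value="DEBUG" />
  <appender-ref ref="file" />
</category>
```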