EMC® Kazeon-eDiscovery
Version 4.8.0

IS1200 EMC Celerra FLR Retention Manager User and Configuration Guide

EMC Corporation
Corporate Headquarters: Hopkinton, MA 01748-9103
1-508-435-1000
www.EMC.com

Copyright © 2007 - 2015 EMC Corporation. All rights reserved.

Published September 2015

EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

THE INFORMATION IN THIS PUBLICATION IS PROVIDED "AS IS." EMC CORPORATION MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.

For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com.

Adobe and Adobe PDF Library are trademarks or registered trademarks of Adobe Systems Inc. in the U.S. and other countries. All other trademarks used herein are the property of their respective owners.

The IS1200 software is based in part on software licensed from the following:
Outside In® Content Access © 1991-2015, Chicago, Inc.
Open Source code from www.java2s.com called itext.asian.jar, available at: http://www.java2s.com/Code/Jar/GHI/itext-asian.jar.htm Copyright 2009 - 12 Demo Source and Support. All rights reserved.
In part on the work of the Independent JPEG Group.
Code from Inxight Software, Inc. Copyright © 1996-2015. All rights reserved. www.inxight.com.
Certain icons used by the Kazeon Web applications come from the Silk Icon set (http://www.famfamfam.com/lab/icons/silk/) licensed under the Creative Commons Attribution 2.5 license (http://creativecommons.org/licenses/by/2.5/).
IS1200 EMC Celerra FLR Retention Manager User and Configuration Guide— v4.8.0

Contents

Figures
Tables
Preface

Chapter 1 Introduction
  About the IS1200
    Extending Server Functionality
    Celerra FLR Functionality
    The Celerra FLR Retention Manager
  Supported Celerra FLR Configurations
  Celerra FLR Retention Manager Implementation Overview
    Celerra FLR Retention Manager Implementation in New Installations
    Celerra FLR Retention Manager Implementation In Existing Installations

Chapter 2 Registering Celerra FLR Repositories
  Supported Configurations
  Celerra FLR Repository Registration Requirements
  Creating a Metadata Repository for a Celerra FLR Repository
  Registering Celerra FLR Repositories
    Registering Repositories Using Web-Admin
    Registering Repositories from the CLI

Chapter 3 Workflow Changes
  Celerra FLR Retention Capabilities
  Celerra FLR Legal Hold Capabilities
  Celerra FLR Collection Options
  Administration
  Searching
  Reporting

Glossary
Index

Figures
1 The Web-Admin Repositories Page
2 The Web-Admin Add Repository Tab for NFS
3 The Web-Admin Add Repository Tab for CIFS
4 Retention Options for Celerra FLR Data Repositories

Tables
1 Revision History Details
2 Stop Words

Preface

As part of an effort to improve its product lines, EMC periodically releases revisions of its software and hardware. Therefore, some functions described in this document may not be supported by all versions of the software or hardware currently in use. The product release notes provide the most up-to-date information on product features.
Contact your EMC technical support professional if a product does not function properly or does not function as described in this document.

Note: This document was accurate at publication time. Go to EMC Online Support (https://support.emc.com) to ensure that you are using the latest version of this document.

Audience
This guide is intended for Administrators, Business Analysts, and Compliance Auditors who need to set up and configure an EMC Kazeon - eDiscovery server to work with EMC Celerra repositories, and then search and produce reports on those repositories.

Related Documentation
IS1200 Installation and Quickstart Guide - describes installing and configuring the IS1200 server software.
IS1200 Web-Admin User and Configuration Guide - describes using Web-Admin to set up and manage Kazeon clusters.
IS1200 Web-Search User Guide - describes using Web-Search to perform basic and advanced searches.
IS1200 Web-Reports User Guide - describes using Web-Reports to create and use basic and advanced reports.
IS1200 eDiscovery Case Manager Administrators and Supervisors Guide - for legal representatives, a primer on all the web-based interfaces above for performing eDiscovery.
IS1200 Command Line Interface Reference Guide - describes the IS1200 Command Line Interface and all its commands.

Follow these steps to download IS1200 documents from the web:
1. Go to https://support.emc.com and click the SUPPORT BY PRODUCT option on the home page.
2. In the Find a Product field, enter Kazeon. From the product selection list, choose one of the sub-headers (such as Kazeon ECS) and click the Find button.
3. The Kazeon ECS window is displayed. Click the Documentation link.
4. In the left-navigation menu, choose a version level to display the available documents.

Conventions used in this document
EMC uses the following conventions for special notices:
DANGER indicates a hazardous situation which, if not avoided, will result in death or serious injury.
WARNING indicates a hazardous situation which, if not avoided, could result in death or serious injury.
CAUTION, used with the safety alert symbol, indicates a hazardous situation which, if not avoided, could result in minor or moderate injury.
NOTICE is used to address practices not related to personal injury.
Note: A note presents information that is important, but not hazard-related.

IMPORTANT
An important notice contains information essential to software or hardware operation.

Typographical conventions
EMC uses the following type style conventions in this document.

Normal          Used in running (nonprocedural) text for:
                • Names of interface elements (such as names of windows, dialog boxes, buttons, fields, and menus)
                • Names of resources, attributes, pools, Boolean expressions, buttons, DQL statements, keywords, clauses, environment variables, functions, utilities
                • URLs, pathnames, filenames, directory names, computer names, links, groups, service keys, file systems, notifications
Bold            Used in running (nonprocedural) text for:
                • Names of commands, daemons, options, programs, processes, services, applications, utilities, kernels, notifications, system calls, man pages
                Used in procedures for:
                • Names of interface elements (such as names of windows, dialog boxes, buttons, fields, and menus)
                • What the user specifically selects, clicks, presses, or types
Italic          Used in all text (including procedures) for:
                • Full titles of publications referenced in text
                • Emphasis (for example a new term)
                • Variables
Courier         Used for:
                • System output, such as an error message or script
                • URLs, complete paths, filenames, prompts, and syntax when shown outside of running text
Courier bold    Used for:
                • Specific user input (such as commands)
Courier italic  Used in procedures for:
                • Variables on command line
                • User input variables
<>              Angle brackets enclose parameter or variable values supplied by the user
[]              Square brackets enclose optional values
|               Vertical bar indicates alternate selections - the bar means “or”
{}              Braces indicate content that you must specify (that is, x or y or z)
...             Ellipses indicate nonessential information omitted from the example

Where to get help
EMC support, product, and licensing information can be obtained as follows.

Product information: For documentation, release notes, software updates, or for information about EMC products, licensing, and service, go to EMC Online Support at: https://support.emc.com

Technical Support: Go to EMC Online Support and click Service Center. You will see several options for contacting EMC Technical Support. Note that to open a service request, you must have a valid support agreement. Contact your EMC sales representative for details about obtaining a valid support agreement or with questions about your account.

Documentation Feedback
Your suggestions help us continue to improve the accuracy, organization, and overall quality of the user publications. Please send your comments or opinions on this document to:
ECD.Documentation.Feedback@emc.com

Revision History
Table 1 Revision History Details

Revision Date    Description
September 2015   Updated the Deduplication section in “Glossary”.
August 2014      Added an update to the mixed mode support by Kazeon in “Creating a Metadata Repository for a Celerra FLR Repository” on page 9.
May 2014         • Updated information about support for mixed mode in “Creating a Metadata Repository for a Celerra FLR Repository” on page 9.
                 • Changed any instances of Exchange connector to FLR connector.
December 2013    Initial Publication

Chapter 1 Introduction

This guide is a companion to the IS1200 Web-Admin User and Configuration Guide, which should be read first because it contains most of the basic IS1200 server setup and maintenance information on which this guide builds.

Topics include:
◆ About the IS1200
◆ Extending Server Functionality
◆ Celerra FLR Functionality
◆ The Celerra FLR Retention Manager
◆ Supported Celerra FLR Configurations
◆ Celerra FLR Retention Manager Implementation Overview
◆ Celerra FLR Retention Manager Implementation in New Installations
◆ Celerra FLR Retention Manager Implementation In Existing Installations

About the IS1200
The IS1200 is an integrated hardware and software system that provides information management solutions enabling organizations to classify, manage, and retrieve data efficiently and cost-effectively. It provides consistent information visibility and control across distributed files, minimizes the risk of unmanaged files, integrates seamlessly with existing infrastructure, and scales to support billions of files for searching, reporting, backup search and recovery, and file migration and archiving.
Extending Server Functionality
The standard IS1200 uses clustering to offer a scalable solution for classifying, searching, reporting on, and applying Actionable Services to search and report results found on registered data repositories. Data repository types include NFS, CIFS, and many vendor-specific servers such as Microsoft Exchange and Microsoft SharePoint servers. The IS1200’s standard functionality, and the types of servers it can access, can be expanded with add-on modules such as the FLR Connector. The FLR Connector requires an additional IS1200 license and allows the IS1200 to register and manage Celerra FLR servers.

Celerra FLR Functionality
The main objective of the EMC Celerra FLR (file-level retention) server is to protect files from deletion or modification until a specified retention date. FLR lets you create a permanent, unalterable set of files and directories, and ensures the integrity of the data they contain for a controllable retention period. The server prevents users from deleting or modifying files that are locked and protected. This is especially useful in legal matters where files must be preserved without modification for the entire course of a case, and for organizations such as health service providers that must retain medical records for fixed periods determined by legal mandates. The Celerra FLR is designed to provide file retention at both the self-regulated and government-regulated levels.

The Celerra FLR Retention Manager
While the standard IS1200 can register standard Celerra shares that have been exported as CIFS or NFS, the FLR Connector allows the IS1200 to register Celerra FLR servers and work with their FLR-specific retention capabilities.
Once a Celerra FLR server has been registered as a data repository and classified, it can be searched and reported on, and Actionable Services such as Retention can be applied to search and report results. Adding the FLR Connector also enables the IS1200 to provide additional Celerra FLR-specific search options, retention reports, and new Action options. The IS1200 can transfer files to Celerra FLR servers, and access those files to classify and search them. Most importantly, the FLR Connector allows the IS1200 to help manage all the files on Celerra FLR servers, including file settings such as retention times.

Additionally, standard IS1200 retention reports such as “Expired Files”, “Retention by Aging”, and “UpForExpiry” identify files that have passed their retention dates, and Actionable Services such as Retention, Copy, Move, and Delete let you modify or extend the retention periods of report results (files), or efficiently archive or delete expired files.

Supported Celerra FLR Configurations
The FLR Connector supports EMC Celerra FLR versions 5.6.4.3 and above.

Celerra FLR Retention Manager Implementation Overview
If FLR Connector capabilities are ordered with the original appliance purchase, the appropriate optional module license is automatically included in the server master license and is installed along with the server license. If the FLR Connector is purchased after the original installation, the FLR Connector license key must be added to the installation before Celerra FLR repositories can be registered and managed. See the Installing License Keys chapter of the IS1200 Web-Admin User and Configuration Guide for details on obtaining and installing optional module licenses for existing installations.
Celerra FLR Retention Manager Implementation in New Installations
The FLR Connector software is automatically installed with all IS1200 installations (ECS and FI); the FLR Connector license key simply activates it. To use an FLR Connector purchased with a new installation, follow the standard hardware and software installation and configuration instructions in the IS1200 Installation and Quickstart Guide, and then use the rest of this guide to configure the connector.

Celerra FLR Retention Manager Implementation In Existing Installations
If the FLR Connector license key is obtained after the original IS1200 installation, use the following general steps to implement FLR Connector capabilities:
1. Use the current Web-Admin application to add the FLR Connector license key to the IS1200, and then quit and relaunch the Web-Admin application. See the Installing License Keys chapter of the IS1200 Web-Admin User and Configuration Guide for details on obtaining and installing optional module licenses for existing installations.
2. Register your Celerra FLR servers as IS1200 data repositories.
3. Classify your Celerra FLR repositories.
4. Search, report on, and apply Actions to your Celerra FLR repositories as necessary.

Chapter 2 Registering Celerra FLR Repositories

This chapter discusses registering Celerra FLR network shares as EMC Kazeon - eDiscovery data repositories.

Topics include:
◆ Supported Configurations
◆ Celerra FLR Repository Registration Requirements
◆ Creating a Metadata Repository for a Celerra FLR Repository
◆ Registering Celerra FLR Repositories
◆ Registering Repositories Using Web-Admin
◆ Registering Repositories from the CLI

Before Celerra FLR repositories can be classified, searched, or reported on, they must be registered with the IS1200 as data repositories and have a metadata repository assigned.

Supported Configurations
The Celerra FLR Retention Manager supports all Celerra FLR versions 5.6.4.3 and above. The connector supports registering up to sixteen Celerra FLR repositories per IS1200 node (the same limit the IS1200 applies to all NFS and CIFS repositories). Each Celerra FLR repository can consist of multiple Celerra servers.

Celerra FLR Repository Registration Requirements
To register and access Celerra FLR repositories, the IS1200 needs the following:
◆ An appropriate metadata repository to associate with the repository when it is registered.
◆ If the Celerra FLR repository is exported as a CIFS share, an identity that has complete access to the repository to be registered.

Creating a Metadata Repository for a Celerra FLR Repository
Before registering a Celerra FLR repository, one or more metadata repositories must be registered to store the extracted metadata. For information on adding metadata repositories, see the Repository Registration and Management chapter of the IS1200 Web-Admin User and Configuration Guide.

For optimal performance, use metadata repositories on NFS systems. Dedicated IS1200 metadata repositories should be assigned to each Celerra FLR server. Because a Celerra FLR server typically hosts many terabytes of data, multiple metadata repositories may be needed to store the metadata for a single Celerra FLR server. If you find that the number of metadata repositories is inadequate, register more metadata repositories.
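The prerequisites listed under Supported Configurations and Celerra FLR Repository Registration Requirements can be gathered into a planning checklist before registration begins. The following Python sketch is illustrative only: PlannedRepository and registration_problems are hypothetical names, not IS1200 APIs, and the version parser assumes purely numeric dotted version strings such as 5.6.4.3. Versions are compared as integer tuples rather than as text, because a plain string comparison would rank 5.6.10.0 below 5.6.4.3.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Minimum Celerra FLR version named in "Supported Configurations".
MIN_FLR_VERSION = (5, 6, 4, 3)
# Documented limit on registered repositories per IS1200 node.
MAX_REPOSITORIES_PER_NODE = 16


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted numeric version such as '5.6.4.3' into (5, 6, 4, 3)."""
    return tuple(int(part) for part in text.split("."))


@dataclass
class PlannedRepository:
    """Hypothetical planning record; the field names are not IS1200 names."""
    name: str
    flr_version: str
    protocol: str                        # "NFS" or "CIFS"
    metadata_repository: Optional[str] = None
    identity: Optional[str] = None       # required for CIFS shares


def registration_problems(repo: PlannedRepository, registered_on_node: int) -> list[str]:
    """List unmet prerequisites; an empty list means none were found."""
    problems = []
    # Tuple comparison orders versions numerically, element by element.
    if parse_version(repo.flr_version) < MIN_FLR_VERSION:
        problems.append(f"Celerra FLR {repo.flr_version} is older than 5.6.4.3")
    if registered_on_node >= MAX_REPOSITORIES_PER_NODE:
        problems.append("node already has 16 registered repositories")
    if not repo.metadata_repository:
        problems.append("no metadata repository assigned")
    if repo.protocol.upper() == "CIFS" and not repo.identity:
        problems.append("CIFS share needs an identity with complete access")
    return problems
```

For example, a CIFS repository on Celerra FLR 5.6.4.3 with a metadata repository assigned, planned for a node that already has three registered repositories, reports a single problem: the missing identity.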
Kazeon supports only basic crawl on data repositories that use mixed mode. For a mixed-mode environment to work, user mapping must already be configured. To perform this mapping, refer to the documentation for your storage system (for example, Isilon or Celerra). If user mapping is not configured correctly, the custodian or owner information reported for an object may be incorrect. For more information about mixed mode support, see the IS1200 4.8 Web-Admin User Guide.

Note: Metadata repositories may not be shared between two registered Celerra FLR servers, or between a Celerra FLR cluster and any other registered data repository.

Registering Celerra FLR Repositories
Celerra FLR repositories may be registered with the IS1200 using either the CLI or Web-Admin.

Registering Repositories Using Web-Admin
To register a Celerra FLR server as a data repository from Web-Admin, do the following:
1. In Web-Admin, select Repository View under Repositories in the left-navigation menu. The Repositories tab opens:
   Figure 1 The Web-Admin Repositories Page
2. In the Repositories tab toolbar, click Add Repository.
3. Select NFS or CIFS from the Repository Type drop-down menu, depending on how the Celerra FLR repository you want to register is shared. One of the following two tabs appears:
   Figure 2 The Web-Admin Add Repository Tab for NFS
   Figure 3 The Web-Admin Add Repository Tab for CIFS
4. Fill in the fields as follows:
   Name. Enter a reference name for this repository.
   The IS1200 uses reference names, instead of full repository file paths, in all menus where a user must choose a repository, for example when choosing a data repository for a classification or collection. Reference names must be unique. Reference names are limited to 127 bytes (127 ASCII characters, or as few as 31 four-byte UTF-8 encoded characters). Reference names may include some special characters; see “Special Characters” on page 31 for details.
   Metadata File System. From the drop-down menu, select the metadata repository you created for the Celerra cluster, or let the IS1200 auto-select one.
   Server. Enter the name of the NFS/CIFS file server hosting the repository to add. This field may already be filled in (and cannot be changed) if you are registering a “discovered” data repository.
   If you are registering an NFS repository:
   Mount Path. Enter the mount path of the NFS repository on the host file server.
   If you are registering a CIFS repository:
   Share Name. Enter the share name, or mount point, of the CIFS repository to register.
   Identity. From the drop-down list, select a predefined user identity to use when the IS1200 accesses this filer. If an appropriate identity is not available, click the Create Identity button to add one. For more information on identities, see The Identity Vault chapter of the IS1200 Web-Admin User and Configuration Guide.
   Specify Use. Select one of the following:
   • Source. Register this repository as a source. Its reference name (specified above) is then included in all dialogs where a repository can be chosen as a source, for example when doing a Collection in either Web-Admin or the eDiscovery Case Manager.
   • Target. Register this repository as a target. Its reference name (specified above) is then included in all dialogs where a repository can be chosen as a target.
   • Source and Target.
     Register this repository as both a source and a target.
   Read Only. Check to indicate that the filer being registered is read-only to the IS1200. Use this option if the repository is exported or shared as read-only. If this option is not set, the IS1200 assumes the filer being added is read-write.
   Repository Vendor. Check the EMC Celerra FLR checkbox. Once this checkbox is set and the repository is registered, the option cannot be unset. The only way to clear it is to take the repository offline, delete it, and re-add it without the option checked.
   Storage Tier. Optionally, specify the storage tier where the data repository is located. The storage tier can be any number from 0 to 255. The default is 0.
   Force add on errors. Select to add this repository to the registered list even if errors occur.
   Preserve Access Time. This option is unavailable for Celerra FLR repositories.
5. Click Submit to register the data repository.

Registering Repositories from the CLI
To register a Celerra FLR server from the Command Line Interface, use the following general command:

   add datafs <referenceName> mount <mountPoint> [as <identity>] <attributesHere>

Where:
<referenceName> is the name the IS1200 should use when displaying this repository in repository selection menus.
<mountPoint> is the mount path for the repository.
<identity> is the name of an identity (already stored in the IS1200 Identity Vault) to use to access the CIFS repository. <identity> is only required when adding CIFS repositories.
<attributesHere> is the keyword attributes followed by a comma-separated list of attributes appropriate to the repository.
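When many repositories are registered from a script, the command text can be assembled and checked before it is sent to the CLI. The following Python sketch is a hypothetical helper, not part of the IS1200: build_add_datafs only builds the command string, using the attribute names from the documented examples, and it checks the 127-byte UTF-8 limit on reference names and the identity requirement for CIFS shares. It assumes that a mount point beginning with // is a CIFS share and that a target-only repository uses source_repository=no,target_repository=yes (the guide shows only source-only and source-and-target examples). It does not check name uniqueness or special characters, which depend on the list of registered repositories and the Special Characters rules.

```python
from typing import Optional

# Source/target flags for each "Specify Use" choice. The "source" and "both"
# pairs match the documented examples; the "target" pair is an assumption.
USE_FLAGS = {
    "source": ("yes", "no"),
    "target": ("no", "yes"),
    "both": ("yes", "yes"),
}

MAX_REFERENCE_NAME_BYTES = 127


def build_add_datafs(reference_name: str, mount_point: str, use: str,
                     identity: Optional[str] = None) -> str:
    """Return an 'add datafs' command string for a Celerra FLR repository."""
    # The limit is on UTF-8 bytes, not characters: 31 four-byte characters fit.
    size = len(reference_name.encode("utf-8"))
    if size > MAX_REFERENCE_NAME_BYTES:
        raise ValueError(f"reference name is {size} bytes; the limit is 127")
    if use not in USE_FLAGS:
        raise ValueError("use must be 'source', 'target', or 'both'")

    # Assumption: a mount point written as //server/share is a CIFS share.
    is_cifs = mount_point.startswith("//")
    if is_cifs and not identity:
        raise ValueError("CIFS repositories need an identity from the Identity Vault")

    source, target = USE_FLAGS[use]
    attributes = ",".join([
        "celerraflr=yes",
        "fs_preserve_timestamp=no",
        f"source_repository={source}",
        f"target_repository={target}",
    ])
    parts = ["add datafs", reference_name, "mount", mount_point]
    if is_cifs:
        parts += ["as", identity]
    parts += ["attributes", attributes]
    return " ".join(parts)
```

Because the check counts bytes, a name made of 31 four-byte characters (124 bytes) is accepted, while 32 such characters (128 bytes) are rejected even though both are far shorter than 127 characters.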
The following examples show typical commands.

To add an NFS repository from a Celerra FLR server and make it a source repository:

   add datafs celerra_nfs1 mount celerraflr_server:/nfs1 attributes celerraflr=yes,fs_preserve_timestamp=no,source_repository=yes,target_repository=no

To add an NFS repository from a Celerra FLR server and make it both a source and a target repository:

   add datafs celerra_nfs2 mount celerraflr_server:/nfs2 attributes celerraflr=yes,fs_preserve_timestamp=no,source_repository=yes,target_repository=yes

When adding CIFS repositories, an identity containing credentials that allow complete access to the repository must be available. The following examples assume the identity celerraflr_identity is available.

To add a CIFS repository from a Celerra FLR server and make it a source repository:

   add datafs celerra_cifs1 mount //celerraflr_server/cifs1 as celerraflr_identity attributes celerraflr=yes,fs_preserve_timestamp=no,source_repository=yes,target_repository=no

To add a CIFS repository from a Celerra FLR server and make it both a source and a target repository:

   add datafs celerra_cifs2 mount //celerraflr_server/cifs2 as celerraflr_identity attributes celerraflr=yes,fs_preserve_timestamp=no,source_repository=yes,target_repository=yes

Chapter 3 Workflow Changes

This chapter describes how installing a Celerra FLR Retention Manager license on the IS1200 changes the standard workflow procedures.

Note: Before continuing, be sure a valid Celerra FLR Retention Manager license key is installed on all nodes of your IS1200 cluster.
See the Installing License Keys chapter of the IS1200 Web-Admin User and Configuration Guide for details on obtaining and installing a license key if you have not already installed the key.

Topics include:
◆ Celerra FLR Retention Capabilities
◆ Celerra FLR Legal Hold Capabilities
◆ Celerra FLR Collection Options
◆ Administration
◆ Searching
◆ Reporting

Celerra FLR Retention Capabilities
When a Celerra FLR Retention Manager license is added to the IS1200, new retention options become available in all screens, pages, or tabs that move or copy files to a Celerra FLR repository. The new options generally look like the following:

Figure 4 Retention Options for Celerra FLR Data Repositories

The new options work as follows:

Retention Date Selection: Check this box to set the retention date using one of the following methods:
◆ Absolute Date/Time: Select this radio button to set an absolute retention date. Use the Time drop-down menu and the Calendar tool to set the specific retention time and date.
◆ Relative Date: Select this radio button to set a relative retention date, that is, a date determined using the following formula:

   retention date = <soMany> <timeUnits> from <fileTimeAttribute>

   Use the fields to the right of the Relative Date radio button to set up the formula:
   • use the first (empty) field to set the <soMany> number
   • use the middle drop-down menu to set the <timeUnits>
   • use the last drop-down menu to set which <fileTimeAttribute> to base the date on.
Retention Class Selection: This radio button is present whenever an EMC repository is selected, but is not available (grayed out) for Celerra FLR repositories.

Celerra FLR Legal Hold Capabilities
Only Legal Hold without Enforcement is available on Celerra FLR servers. More specifically, while a metadata tag for legal hold may be set for files on registered Celerra FLR repositories, the standard IS1200 Actionable Services legal hold option “Enforce Legal Hold at the repository level” is not available for Celerra FLR files. The IS1200 cannot change the file privileges on registered Celerra FLR repositories to prevent users from moving, changing, or deleting files, even those with the legal hold metadata tag set.

Celerra FLR Collection Options
Collections performed from either Web-Admin or the eDiscovery Case Manager to Celerra FLR repositories include options for setting retention options on the target. See “Celerra FLR Retention Capabilities” on page 18 for details on setting these options.

Administration
Web-Admin can register, classify, and perform Single Step Collections to Celerra FLR repositories.

Searching
A new metadata namespace, “Retention”, contains the following new fields: retentionlock, retentiondate, retentionsetdate, retentionsetuser, retentionreportdate

All these fields can be viewed using the Show Metadata icon in search results, but only retentionsetuser is routinely indexed and can be searched for.

Actionable Services such as Copy and Move that are applied to search results from Celerra FLR repositories include new interfaces for setting retention options on their targets. A new Actionable Service tab called Retention allows extending retention settings. See “Celerra FLR Retention Capabilities” on page 18 for details on setting these new retention options.
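The relative retention date formula under Celerra FLR Retention Capabilities and the Retention namespace fields listed above can be tied together in a small model. The following Python sketch is illustrative only and does not describe how the IS1200 or Web-Reports actually computes dates or reports. The time units (days, weeks, months, years), the clamping of month-end dates (for example, January 31 plus one month gives February 28), the treatment of retentionlock as a boolean, and the 30-day soon-to-expire window are all assumptions. Files with no retentiondate are skipped, mirroring the empty Retention reports returned for filers without retention capabilities.

```python
from __future__ import annotations

import calendar
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


def add_months(start: datetime, months: int) -> datetime:
    """Add calendar months, clamping the day to the end of shorter months."""
    index = start.month - 1 + months
    year, month = start.year + index // 12, index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return start.replace(year=year, month=month, day=day)


def relative_retention_date(so_many: int, time_unit: str, file_time: datetime) -> datetime:
    """retention date = <soMany> <timeUnits> from <fileTimeAttribute>."""
    if time_unit == "days":
        return file_time + timedelta(days=so_many)
    if time_unit == "weeks":
        return file_time + timedelta(weeks=so_many)
    if time_unit == "months":
        return add_months(file_time, so_many)
    if time_unit == "years":
        return add_months(file_time, 12 * so_many)
    raise ValueError(f"unsupported time unit: {time_unit}")


@dataclass
class RetentionRecord:
    """A file's Retention namespace values (value types are assumed)."""
    path: str
    retentionlock: bool
    retentiondate: Optional[datetime]    # None: no retention metadata


def group_for_reports(records: list[RetentionRecord], now: datetime,
                      window: timedelta = timedelta(days=30)) -> dict[str, list[str]]:
    """Sort file paths into expired, up-for-expiry, and locked groups."""
    groups: dict[str, list[str]] = {"expired": [], "up_for_expiry": [], "locked": []}
    for rec in records:
        if rec.retentiondate is None:
            continue  # nothing to report for files without retention settings
        if rec.retentiondate <= now:
            groups["expired"].append(rec.path)
            continue
        if rec.retentiondate - now <= window:
            groups["up_for_expiry"].append(rec.path)
        if rec.retentionlock:
            groups["locked"].append(rec.path)
    return groups
```

For instance, with a 7-year relative retention based on a file time of February 29, 2012, the sketch yields February 28, 2019, because 2019 is not a leap year.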
Reporting

Web-Reports contains a report category called Retention reports. Retention reports list expired, soon-to-expire, and locked files, allowing administrators to manage these files with Actions that reset retention settings or delete expired files.

Retention reports apply only to repositories, such as Celerra FLR repositories, that implement specific retention features for the files they contain. Retention reports run on filers without retention capabilities return empty reports. Retention reports prefixed with "Snaplock" are designed for SnapLock repositories; running them on Celerra FLR repositories returns empty reports or errors.

Actions such as Copy or Move that are applied to report results from Celerra FLR repositories include new interfaces for setting retention options on their targets. See “Celerra FLR Retention Capabilities” on page 18 for details on setting these new retention options.

Glossary

This glossary contains terms related to disk storage subsystems, networks, file management, and eDiscovery. Many of these terms are used in this manual.

A

active case
In eDiscovery situations, a company may have more than one legal issue (case) in progress at a time. It is often advantageous to limit job or search scope to just one case. When the user interface scope is limited to a single case, that case is the active case.

Active Directory (AD)
A Microsoft technology that provides a variety of network services, including LDAP-like directory services, Kerberos-based authentication, and DNS-based naming and other network information.

Actions, Actionable Services
Services such as copy, move, delete, and tagging that can be applied to search and report results, making the IS1200 an effective file management tool for registered repositories.

Access Control List (ACL)
A file system level data file that specifies how users or groups may access resources on a computer or network, such as an application, file, or printer, and the rights they have to it, for example read access or write access. For more information on how the IS1200 uses ACLs, see the Controlling ACL Checking section of the Configuration Files and Utilities appendix of any IS1200 User Guide.

Advanced Search
A search made from the IS1200 Advanced Search link. Allows searching for extracted metadata by tag-value pairs, and allows multiple-variable and boolean searches.

Agents
See “connectors” on page 25.

Assignment Rules
An assignment rule is a type of classification rule. It tags files with metadata and assigns files to policy groups. Assignment rules are contained in Assignment Rule Sets (ASRs). See the Policies: Classification, Extraction and Assignment Rules chapter of any IS1200 User Guide for more details.

Auditing
A service that allows the IS1200 to record all system events according to who did what, when, and the event result. This data is especially useful to Legal Service Providers when providing an audit trail for responsive data produced during eDiscovery. Complete details are available in the Auditing and Data Verification chapter of any IS1200 User Guide.

Authorization Rule
A policy rule that filters search results to ensure that the assigned files can only be viewed by authorized users. IS1200 authorization policies may be used to add further levels of security to the Access Control Lists (ACLs) for file objects found in registered data repositories. See the Policy Groups: Authorization Policies chapter of any IS1200 User Guide for more details.

Authentication
The process of identifying users based on user name and password to ensure that only authorized users can access the IS1200.

B

Basic Search
A search made from the Search page using only the Search field.
Searches only the content found in the fullText field populated during classifications.

C

CAS Device
EMC’s Content Addressed Storage (CAS) devices are clusterable archival devices that host archival business file content such as email, office productivity files (like word processing and spreadsheet files), images, and other file documents.

CASID
A unique IS1200 ID that the system generates for each classified file during basic classification.

Centera Server
The EMC Centera server is a networked storage system specifically designed to store and provide fast, easy access to fixed content (information in its final form). It is a CAS device that provides long-term retention and assured integrity, designed to store and manage data that require or have legally mandated retention periods, for example medical records and files relevant to legal matters.

Celerra Server
An EMC server designed to store and manage archival data. The Celerra File Level Retention (FLR) server also allows enforcing enterprise or governmental retention policies.

checkpoints, checkpointing
Checkpoints and checkpointing allow IS1200 jobs and services to resume more efficiently if the job or service is paused or stopped before it completes. Basically, the IS1200 records “bookmarks” about which file or object was last processed. This allows the IS1200 to skip to the bookmark (the checkpoint) when the job or service is resumed, and avoid reprocessing all the files and objects already processed. However, checkpoints are not set for every file accessed; instead, most jobs divide file processing into “batches”, and the checkpoints indicate where batches started. Consequently, when a job restarts at a checkpoint, some objects may be reprocessed, and, in cases such as a 'Copy' service with the 'enable-versioning' option selected, duplicate versioned files will be created on the target repository when those objects are reprocessed.
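The batch-level checkpointing described above, and why it can cause reprocessing, can be sketched in Python. This is a hypothetical model (run_with_checkpoints, batch_size, and stop_after are illustrative names, not IS1200 settings): the checkpoint records only where the current batch began, so resuming replays any objects already handled in the interrupted batch.

```python
from typing import Callable, List, Optional


def run_with_checkpoints(
    items: List[str],
    process: Callable[[str], None],
    batch_size: int,
    checkpoint: Optional[int] = None,
    stop_after: Optional[int] = None,
) -> Optional[int]:
    """Process items batch by batch, checkpointing at the start of each batch.

    Returns None if all items were processed, otherwise the checkpoint to
    resume from. `stop_after` simulates a pause after that many items.
    """
    start = checkpoint if checkpoint is not None else 0
    done = 0
    for batch_start in range(start, len(items), batch_size):
        saved_checkpoint = batch_start  # only batch starts are recorded
        for item in items[batch_start:batch_start + batch_size]:
            if stop_after is not None and done == stop_after:
                return saved_checkpoint  # items already done in this batch will be redone
            process(item)
            done += 1
    return None
```

With seven objects, a batch size of three, and a stop after four objects, the resumed run starts at the fourth object again, so that object is processed twice; for a Copy with versioning enabled, such a replay is what produces a duplicate versioned file.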
Classification Rule
Rules that the system implements during data classification to extract metadata, tag files, and assign files to policy groups. The two types of classification rules are extraction rules and assignment rules.

Classification Service
Sometimes called a “crawl”. An IS1200 service that accesses job-specified registered repositories and extracts and records their metadata to facilitate later comprehensive and cross-repository searches. Classifications extract metadata according to extraction rules, compute digests for all objects, and assign files to policy groups according to assignment rules. See “Assignment Rules” on page 22, “Extraction Rules” on page 29, “Hash Values” on page 30, and “Policy Groups” on page 34 for more details. Classifications may be “full”, in which every object in the specified repositories is parsed and its metadata repopulated in the indexes and databases, or “differential”; see “Differential Classifications” on page 27 for more details.

Cluster
A set of IS1200 appliance nodes working as a unit. A cluster can contain a maximum of four nodes. A cluster can be used to control other clusters; see “Information Center Server” on page 31 for details.

CAS (Content Addressable Storage)
Rather than addressing data objects by a file name at a physical location, a CAS device uses a content address (hash-code identifiers) based on file contents to store file objects in a flat file system that maximizes storage efficiency. This returns a unique identifier (Content Address) used to store and retrieve data objects.

CSV (Comma Separated Values)
A file type used to transfer data between applications such as databases and spreadsheets.

CLI (Command Line Interface)
The CLI is a traditional command line interface that allows direct communication with the IS1200 “backend” using the set of commands defined in the IS1200 Command Line Interface Reference Guide.

Concepts Search
The standard IS1200 software supports keyword exploration.
However, in the initial stages of the legal discovery process (often called eDiscovery), keyword search alone may not be as concise or as time-efficient as standard legal timetables require. Concepts augments standard keyword searching by automatically suggesting filters based on the results of a current search. By default, it looks for concepts based on persons, countries, noun groups, organizations, company names, and products. Concepts Search is an optional module that requires an additional license key for each IS1200 cluster node. See the IS1200 Concepts Search User and Configuration Guide for complete details.

conceptfinder Ruleset
The conceptfinder ruleset is an assignment ruleset that extracts the concepts listed in the Review/Analysis Results Grouping Concepts pane, which is only available when a valid Concepts license is installed on the IS1200. The conceptfinder ruleset must be used in deep classifications to get the best results from the Concepts heading of the Review/Analysis Results Grouping pane. The ConceptFinder_DWF assignment ruleset combines the conceptfinder ruleset and the DocsWithoutFullText ruleset. See “DocsWithoutFullText Assignment Ruleset” on page 28 for more details.

connectors
Connectors are IS1200 optional modules that allow an IS1200 to work with repository types beyond the standard CIFS and NFS repositories. See “optional modules” on page 34 for more details. Optional module connectors require separate licenses to be purchased and installed on all nodes of an IS1200 cluster. For a complete list of available optional modules, see the Introduction chapter of any IS1200 User Guide. Some connectors, such as the Microsoft Exchange Server Connector, require agents. Agents are additional server platforms, usually Windows servers, that provide the additional CPU cycles and network staging the IS1200 needs to work with the repository types they connect to.
All connectors have their own user guides, which can be accessed from the Kazeon Documentation link on the IS1200 Manager page (https://<yourIS1200Name>/manager).

Container file/object
A file (object) that contains other files (sub-objects), such as ZIP, TAR, JAR, PST, or NSF files. The container file is often called the “parent” and the contained objects are called “children”. Container objects should not be confused with files that have embedded objects, such as Microsoft Word files with embedded charts or graphics (OLE).

Custodian
A legal term used by Legal Service Providers (LSP) and other legal personnel to describe the owners or responsible parties for electronic documents pertinent (responsive) to a legal matter.

D

Data
A file of any type and size, such as a short email, a word processor document, or a large spreadsheet.

Datamap
A report that lists the electronic storage locations of all possible sources of relevant ESI. This can include standard file servers, groupware servers, and email servers (and their backup and archive systems), as well as custodians’ desktop and laptop computers.

Data-Mount
The NFS file system that the IS1200 accesses to parse data and extract metadata.

Data Server
The file server that exports an NFS or CIFS file system so that the IS1200 can classify data on the file system to create metadata.

Data-Share
The CIFS file system that the IS1200 accesses to extract metadata.

Data Repository
A networked file system registered with the IS1200 so that it can be classified, searched, and reported on. Data repositories created on the IS1200 itself (sometimes called localdatafs) are strongly discouraged!

Data Verification
Builds on Auditing and is only available when system auditing is enabled. For job services such as Actionable Services Copy or Move, Legal Hold Copy, and Single Step Collections, Data Verification generates an audit trail proving that files were not altered during these actions.
This is especially valuable in eDiscovery situations. Complete details are available in the Auditing and Data Verification chapter of any IS1200 User Guide.

Deduplication
A process that identifies file or email object and sub-object duplicates based on their digest values (see “Digest Values” on page 27 for details). In IS1200 software version 4.7.0 and earlier, deduplication was only available for export actions (Actionable Services such as Download, Legal Export, and Copy). This allowed exporting only the unique files and email objects from a set of search results.

With IS1200 version 4.8.0, deduplication's functionality is expanded, and deduplication is automatically applied during case collections and processing so that deduplicated search results can be displayed. Note that when deduplication is applied to the display of search results, duplicates are only suppressed from display; however, duplicates are physically removed from exported file sets. Deduplication is available only in the ECS version of IS1200 and applies only in case context.

The search results view can be configured as either a deduplicated or a non-deduplicated view. This lets you see whether an object in the search results has duplicates, and which objects are duplicates of the original. Besides the automatic deduplication of collections and processing, deduplication may also be started manually from the IS1200 case dashboard.

Deduplication reports describing how a particular job or service applied deduplication are available from the IS1200 case dashboard as well as from web search. Reports can list all results, only unique (deduplicated) results, or the percentages of unique and duplicate results.

Reduplication is a process that identifies the duplicates of unique files so that tagging processes can apply metadata tags to the unique files as well as to all their copies. Legal Tags reduplication can be done after documents are added to the case.
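Digest-based deduplication and reduplication can be sketched in Python as follows. The sketch assumes digests have already been computed, treats the first object seen with a given digest as the original, and uses hypothetical function names (deduplicate, reduplicate_tags); it does not reflect how the IS1200 actually chooses originals or stores tags.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def deduplicate(objects: List[Tuple[str, str]]) -> Tuple[List[str], Dict[str, List[str]]]:
    """Split (object_id, digest) pairs into unique originals and their duplicates."""
    originals: Dict[str, str] = {}  # digest -> id of the first object seen with it
    duplicates: Dict[str, List[str]] = defaultdict(list)  # original id -> duplicate ids
    unique: List[str] = []
    for obj_id, digest in objects:
        if digest in originals:
            duplicates[originals[digest]].append(obj_id)
        else:
            originals[digest] = obj_id
            unique.append(obj_id)
    return unique, dict(duplicates)


def reduplicate_tags(tags: Dict[str, Set[str]], duplicates: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Copy each original's tags to every one of its duplicates."""
    result = {obj: set(t) for obj, t in tags.items()}
    for original, dups in duplicates.items():
        for dup in dups:
            result.setdefault(dup, set()).update(tags.get(original, set()))
    return result
```

In this model, displaying deduplicated results corresponds to showing only the unique list, while reduplication walks the duplicates mapping so that a tag applied to an original also reaches its copies.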
Differential Classifications
Differential classifications do not reclassify all file objects in the selected repositories. Instead, they examine the metadata from previous crawls; if there is no previous metadata (indicating the object is new since the last classification) or the metadata has changed (based on atime or mtime changes), the object is parsed and its metadata repopulated in the database.

Note: System classification configuration settings default to using mtime to determine whether files have changed for differential classifications. If atime is desired instead, see the Using atimes for Differential Crawls section of the Configuration Files and Utilities appendix of any IS1200 User Guide for details on resetting the default to atime. Additionally, atime may be applied only to selected classifications by initiating them from the Command Line Interface; see the add service deep-classification command and the crawl-atime-check-enabled option in the IS1200 Command Line Interface Reference Guide for details.

Digest Values
Digests are numerical values calculated from file and email content that are unique for all unique objects. Digest values allow file objects to be compared very quickly. Digests are calculated during basic and deep classifications, or during collections or processing when indexing is enabled.

Digests are calculated differently for standard files, emails, and container objects. For standard files, a physical digest is computed for the entire file, much like a hash value. For email objects, only the subject, the message content (including attachments), and certain specific addresses are combined, and an email digest value is calculated from the combination. Container objects, such as ZIP or PST files, and their sub-objects have digests calculated both as complete objects and as individual sub-objects.
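The difference between a physical digest and an email digest can be sketched with Python's hashlib. SHA-256 is an assumption (the guide does not name an algorithm), and because the guide only mentions "certain specific addresses", the choice of addresses and their normalization (lowercased and sorted) in the hypothetical email_digest function is illustrative only.

```python
import hashlib
from typing import Iterable


def physical_digest(data: bytes) -> str:
    """Digest over the entire file, much like a hash value."""
    return hashlib.sha256(data).hexdigest()


def email_digest(subject: str, body: str, attachments: Iterable[bytes], addresses: Iterable[str]) -> str:
    """Digest over selected email fields only (the field choice is an assumption)."""
    h = hashlib.sha256()
    h.update(subject.strip().encode("utf-8"))
    h.update(b"\x00")  # separator so adjacent fields cannot run together
    h.update(body.encode("utf-8"))
    for att in attachments:
        h.update(b"\x00")
        h.update(hashlib.sha256(att).digest())
    # Normalize addresses so recipient order and letter case do not matter.
    for addr in sorted(a.strip().lower() for a in addresses):
        h.update(b"\x00")
        h.update(addr.encode("utf-8"))
    return h.hexdigest()
```

Under these assumptions, two copies of the same message whose raw bytes differ (for example, in a Date header) get different physical digests but the same email digest, which is why email digests let such copies be recognized as duplicates.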
Note: Calculating email digests requires access to the email object's fullText, so only classifications that include the fullText rule can produce email digests. Emails classified without the fullText rule receive the same kind of physical digest that other files do. Consequently, identical emails on different repositories, one classified with and one without the fullText rule, will not be identified as duplicates.

Domino Server (Lotus)
A Lotus server providing groupware solutions and storage.

Domino XML Language (DXL)
A Lotus version of eXtensible Markup Language (XML) used to import and export Lotus email files.

DocsWithoutFullText Assignment Ruleset
Some file objects, such as graphics files (for example .jpeg, .gif, or .bmp files), contain no text, and hence will have no fullText extracted by the FullTextRuleset; see “fullText” on page 30 for more details. In legal cases, these files may still contain responsive information, but not textual information that text searches can locate. The DocsWithoutFulltext assignment ruleset identifies these files and adds the metadata tag and value “DocWithoutFulltext=true” to all files that contain no searchable text. This allows these files to be easily searched for later and inspected for legal responsiveness by non-search methods. The ConceptFinder_DWF assignment ruleset combines the DocsWithoutFullText ruleset with the conceptfinder ruleset. See “conceptfinder Ruleset” on page 24 for more details.

Note: Parent file objects that don’t contain text (such as .zip, .tar, and .pst files) are not tagged with the DocWithoutFulltext tag.

Documentum Server (EMC)
The EMC Documentum server manages business content including documents, photos, video, medical images, e-mail, Web pages, fixed content, XML-tagged documents, and so on.
The Documentum core is a repository that stores content securely under compliance rules and appears as a unified environment, even though content may reside on multiple servers and physical storage devices within a distributed environment.

E

eDiscovery
The process of reviewing electronic files to determine their relevance and responsiveness to a legal matter or case.

eDiscovery Case Manager
An IS1200 tab that facilitates eDiscovery for Legal Service Providers.

Electronic Discovery Reference Model (EDRM)
The EDRM was a project created to provide standards and guidelines for the electronic discovery market. The model defines a common, flexible, and extensible framework for the development, selection, evaluation, and use of electronic discovery products and services.

Enterprise Vault
A Symantec networked repository for archived email.

eth1, eth2
Most IS1200 platforms require two Ethernet connections for proper deployment. These connections are called eth1 and eth2, must each have a unique IP address, and must be Gigabit (1 Gb/s) or faster connections. Additionally, all network segments between eth1 and all registered metadata and data repositories must be Gigabit. eth1 is used to communicate between the IS1200 and its registered repositories. The IS1200 hostname should be DNS mapped to the eth1 IP address. eth2 must be connected to a private network between the IS1200 nodes and is used to coordinate and balance system-wide operations. The eth2 IP address should not be DNS mapped.

Extended Attributes
User-defined keywords that are extracted during data classification.

Extraction Rules
Extraction rules are a type of classification rule. They extract user-defined keywords (custom metadata) to add to the metadata file. Extraction rules are grouped into Extraction Rule Sets (ERSs). See the Policies: Classification, Extraction and Assignment Rules chapter of any IS1200 User Guide for more details.

Exchange Server (Microsoft)
A Microsoft server designed to store and manage email.

F

Federation
A defined group of member-clusters on a Federation server that can be managed, searched, and reported on as a group. Member-clusters are referred to as Federated clusters.

Federation Server
A single-node IS1200 server, with a Federation license, that allows consolidated searching and reporting of up to eight Federated member-clusters of its defined Federation.

Filer
A file server that exports its file systems using the NFS or CIFS protocol.

fullText
fullText is the “content” portion of a file, for example the textual content of word processing files and the message body of emails. fulltext is an extraction rule that saves file textual content as metadata to the Search Index during classifications. By default it saves up to 10 megabytes of content. This default may be changed, but changing it is not recommended. Fulltext extraction is required by Review/Analysis for the Previewer pane to work and to generate Concepts in the Results Grouping pane.

fulltext is extracted differently for container objects and their sub-objects than for files with embedded objects. Container objects (such as ZIP or PST files) and their sub-objects are classified individually, and the fulltext of the parent container file and of each child sub-object is extracted and added to the relevant metadata repository separately. Files with embedded objects (such as a Microsoft Word file with an embedded spreadsheet) are classified together: the fulltext of the embedded object is included in the fulltext of its parent object and is not collected separately. For more details on fullText, see Chapter 1 of the IS1200 Metadata Reference Guide.

G

Groupware
Collaborative software designed to help people involved in common tasks achieve their goals. Incorporates services such as email, calendaring, text chat, wiki, web-sharing, document control, and advanced search.
H

Hash Values
Hash values are used to compare one file with another for duplicates. An extremely simplified description of hashing is that the numeric values of all bytes in a file are mathematically combined into a single fixed-size value. The chances of two different files yielding the same result (hash value) are remotely small, so hash values can be used to identify duplicate files, or to compare files with the same name to decide whether they have been modified.

Computing a hash on an entire file is called a full-hash, and computing a hash on a portion of the file is called a partial-hash. Using a partial-hash, or turning hashing off altogether, can increase classification speed.

I

identity
A single entry in the Identity Vault database. The identity contains a single username and password that the IS1200 can retrieve when it needs to access a registered data or metadata repository or another server, such as an authentication service.

Identity Vault
An encrypted database of usernames and passwords that the IS1200 uses to store the credentials needed to access registered data repositories, send email notifications, and work with authentication services.

Information Center Server
The standard IS1200 server offers clustering as a scalable solution for classifying, searching, and reporting on registered network repositories. While clustering is ideal for scaling to large numbers of files on a LAN, it is not a viable solution for WANs. Enterprises with multiple IS1200 clusters deployed, or with IS1200 clusters deployed in remote offices, need the ability to set up and manage unified reports and searches across all their clusters. The IS1200 Information Center server provides this solution. Each Federation server supports one federation. A Federation may include up to eight clusters (with four nodes each).
Once a federation is established, it becomes a central management point that allows classifications, searches, and reports to be set up or managed on all the federation's members from the Information Center server. See the IS1200 Information Center User and Configuration Guide for complete details.

Intelligent Platform Management Interface (IPMI)
IS1200 clusters may contain more than one node. Normally each node communicates with the others to share information and workload. The IS1200 appliance includes an Intelligent Platform Management Interface (IPMI) to shut down nodes when individual node or software errors would degrade overall cluster performance. The IPMI is an autonomous micro-controller, installed in all cluster nodes, that the cluster’s “leader” node uses to power down nodes with errors or performance problems. The IPMI requires its own unique IP address, but communicates over the eth1 port; see “eth1, eth2” on page 29 for more details.

K

Kazeon EVAgent
An IS1200 service, installed on the Enterprise Vault server, that allows the IS1200 to directly open and access Enterprise Vault email for classification services.

Kaz-mount
The NFS file system that is the IS1200 metadata repository, on which the IS1200 stores metadata.

Kazeon Query Language (KQL)
A programming language used in classification and assignment rules to identify files that should receive specified metadata tags.

KQL Reserved Words
The KQL language reserves the following words. Consequently, they cannot be searched for, or used as tags or aliases.
"ADD", "ALL", "ALTER", "AND", "ANY", "AS", "ASC", "AVG", "BETWEEN", "BY", "CASCADE", "CHECK", "COLUMN", "COUNT", "DESC", "DISTINCT", "ESCAPE", "EXISTS", "FROM", "FULL", "GRANT", "GROUP", "HAVING", "IN", "INTO", "IS", "JOIN", "KEY", "LEFT", "LIKE", "MAX", "MIN", "NOT", "NULL", "ON", "OR", "ORDER", "OUTER", "REVOKE", "RIGHT", "SELECT", "SET", "SUM", "UNION", "UNIQUE", "UPDATE", "VALUES", "VIEW", "WHERE"

Kaz-server
The file server where the metadata repository is located.

Kaz-share
The CIFS file system on which the IS1200 stores metadata.

Kaz Schema
Defines the set of metadata fields used to build a Search Index for registered data repositories (file systems).

L

Legal Hold
Files placed on legal hold are either copied to a secure secondary location where they can be preserved for later use, or are locked in their original locations against further change until a legal matter is resolved.

Legal Service Provider (LSP)
A lawyer or trained legal professional who provides legal services for a fee.

Local
Refers to the local resources (usually the metadata repository) of the Federation server.

localdatafs
A data repository created on the IS1200 itself. This practice is not recommended.

localkazfs
A metadata repository created on the IS1200 itself. This practice is not recommended.

Logging rule
Logging rules audit user actions on files such as file access, creation, modification, and deletion.

M

Manifest Reports
Manifests are reports that summarize the results of an IS1200 job or service. Manifests are produced for Collections (from either Administration or Case Mgmt) and for some Actionable Services. Collection Manifests summarize which files were, or were not, collected during a collection. Actionable Service Manifests reconcile Actionable Services object counts with the object counts of the search results they are performed on, because processes such as deduplication can cause the two counts to differ.
The reports detail the count of differences and the reasons for them. For more information, see Manifests in the IS1200 Web-Search User Guide.

Note: Collection manifests are available ONLY for collections done from v4.6.0 or later; earlier versions did not generate collection manifests.

Member-cluster
Any of the clusters registered to a particular Federation.

Metadata
Data about data. Metadata is used to search for information and to create reports. Metadata can be file system metadata or custom metadata that the IS1200 extracts from files during classification. File system metadata, such as file type and file path, is extracted during basic classification. Custom metadata is generated during deep classification.

Metadata Repository
A registered repository that the IS1200 uses exclusively to record the metadata extracted during classification services on the registered data repository that the metadata repository is mapped to. The primary metadata repository hosts the repository registration database, the report results database, Environment Discovery job results, the Auditing and Data Verification databases, and miscellaneous databases the cluster requires for routine operation. Collectively these are called the Cluster Data Base. Metadata repositories created on the IS1200 itself (sometimes called localkazfs) are strongly discouraged!

N

Namespaces
IS1200 software versions 4.0 and higher organize metadata fields into a hierarchy defined by namespaces. Namespaces group similar sets of tags; for example, all the file level tags such as FileType, FileSize, aTime, and cTime are grouped together in the System namespace. See the IS1200 Metadata Reference Guide for complete details.

Network File System (NFS)
A protocol used primarily by Unix-based computers for accessing computer systems and filers over a network.
Network Information System (NIS)
A network naming, administration, and authentication system for smaller networks that was developed by Sun Microsystems and is used primarily by Unix systems.

Node
A single IS1200 appliance.

Notes Storage File (NSF)
A standardized storage file format used by Lotus to store email, attachments, notes, calendars, and so on.

O

optional modules
The standard IS1200 license provides a default set of features that allows the IS1200 to register, classify, search, and report on CIFS and NFS data repositories. Optional modules are additional software licenses that add further capabilities, such as working with repository types other than CIFS and NFS, providing Concepts Search capabilities, or applying legal hold. Some optional modules require connectors; see “connectors” on page 25 for more details. For a complete list of available optional modules, see the Introduction chapter of any IS1200 User Guide.

P

PEA Files
A Pool Entry Authorization (PEA) file is generated by the Centera server administrator. A PEA file defines which applications and users can perform read, write, delete, query, copy, or hold operations on Centera objects.

Policy Groups
Associates one or more authorization rules and logging rules with one or more files to protect information and audit user actions on files.

PST Files
Personal STorage files are generally used by email programs such as Microsoft Outlook to store user email locally. PST files are also called “composite” files, because they are packages meant to efficiently store a number of smaller related files. Another example of a composite file is a ZIP storage file.

R

Retention
The process of enforcing corporate or legal standards for how long certain kinds of files must be preserved for access. Examples of retained files include files responsive to legal matters and medical records.

Roles
All IS1200 users have a role, either admin, auditor, or end-user.
If a legal license is installed, there may also be legaladmin, legalsupervisor, legalreviewer, or custodian roles. Roles determine which parts of the IS1200 interface may be seen, and how much of search and report results is displayed.

S

Search Analytics Pre-Processing
Search Analytics Pre-processing was introduced in release 4.5.0 to minimize search results display time and improve the overall efficiency of eDiscovery culling. Analytics Pre-processing is an integral, automatic post-processing job performed after any job that modifies the Search Index. Analytics Pre-processing trades a longer post-job indexing period for significantly reduced search results display times after the affected jobs complete.

A variety of jobs require Search Index changes and therefore require Analytics Pre-processing. These include Collections, Classifications, Delete, and Tagging jobs. The time required by Analytics Pre-processing is determined primarily by the number of objects in the affected data repository, the number of distinct analytic (result filter grouping) attributes (such as custodians, mail senders, mail recipients, sender domains, and recipient domains), and the read/write performance of the metadata repository associated with the data repository.

Additionally, once any Analytics Pre-processing job is launched, all subsequent Analytics Pre-processing jobs (that might be required by other concurrent jobs in progress) wait for the current Analytics Pre-processing job to finish. However, before beginning any Analytics Pre-processing job for a particular data repository, the IS1200 checks all other jobs in progress for that repository to see whether they might also require Analytics Pre-processing. If such jobs are found, the IS1200 waits for all of them to finish so that it can launch a single Analytics Pre-processing job for all the jobs that affected the Search Index for that data repository.
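The grouping behavior described above can be modeled with a small Python sketch. PreprocessingCoordinator and its methods are hypothetical names; the sketch only shows per-repository coalescing (one pre-processing run, launched once no index-modifying job remains on that repository) and omits the cluster-wide rule that pre-processing runs execute one at a time.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class PreprocessingCoordinator:
    """Hypothetical model: one Analytics Pre-processing run per repository,
    launched only when no other index-modifying job is still running there."""

    running: Dict[str, Set[str]] = field(default_factory=dict)  # repo -> jobs in progress
    waiting: Dict[str, List[str]] = field(default_factory=dict)  # repo -> finished jobs needing pre-processing
    launched: List[Tuple[str, Tuple[str, ...]]] = field(default_factory=list)  # runs started

    def start(self, repo: str, job: str) -> None:
        self.running.setdefault(repo, set()).add(job)

    def finish(self, repo: str, job: str) -> None:
        self.running[repo].discard(job)
        self.waiting.setdefault(repo, []).append(job)
        if not self.running[repo]:
            # Nothing else is still changing this repository's Search Index,
            # so a single pre-processing run covers every waiting job.
            self.launched.append((repo, tuple(self.waiting.pop(repo))))
```

This model also shows why small concurrent jobs are cheaper to pre-process together: three overlapping jobs on one repository produce one pre-processing run instead of three.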
Therefore, two best practices are suggested for scheduling jobs that affect the Search Index:
• Schedule large classifications or collections so that both they and the Analytics Pre-Processing they require can fully complete before any other job starts. This allows the IS1200 to schedule the required processing resources most efficiently. Large jobs are those that affect data repositories with tens of thousands of objects or terabytes of data.
• Schedule small jobs (such as incremental collections or post-search tagging operations) to run concurrently, so the IS1200 can identify their common Analytics Pre-processing requirements and group them into a single job.

Note: IS1200 appliances that are upgraded to v4.5.0 may need some additional configuration to make the most efficient use of Analytics Pre-Processing. See the Configuring the IS1200 To Use Proactive Indexing section of the Configuration Files and Utilities appendix of any IS1200 User Guide for complete details.

Search Index
An IS1200 database that stores and indexes file content metadata (including extended attributes and fullText). It covers both standard metadata and the custom user-defined metadata that extraction rules produce during classifications.

SharePoint Server (Microsoft)
A Microsoft server in the groupware category.

snippets
A snippet is a subset of a document’s actual content. Snippets are displayed only if they are enabled in Review/Analysis Preferences, and only in Paragraph View, immediately under the first line of the result listing. After a keyword search completes, result snippets are created as small, standard-size chunks of data taken from the text surrounding a search query hit. For example, if a search is made for “medicine”, the snippet contains about 300 bytes of the text surrounding the location where the word “medicine” was found. If multiple search hits are found, the most relevant hit is used to create the snippet.
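A rough Python sketch of how a snippet window around a hit can be cut follows. It rests on several assumptions. It counts characters rather than bytes, which is equivalent only for ASCII text. It uses the first hit rather than the most relevant one. It falls back to the start of the text when there is no hit. The function name `make_snippet` is hypothetical, and the 300 default is simply the figure from the example above.

```python
def make_snippet(full_text, keyword=None, size=300):
    """Return about `size` characters of text centered on the first keyword hit.

    Falls back to the start of the text when there is no keyword or no hit.
    Simplification: the first hit is used, not the most relevant one.
    """
    if keyword:
        pos = full_text.lower().find(keyword.lower())
        if pos >= 0:
            center = pos + len(keyword) // 2
            # Clamp the window so it stays inside the text.
            start = max(0, min(center - size // 2, len(full_text) - size))
            return full_text[start:start + size]
    return full_text[:size]


text = "x" * 1000 + " medicine " + "y" * 1000
snippet = make_snippet(text, "medicine")
print(len(snippet), "medicine" in snippet)  # 300 True
```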
For searches made without keywords, snippets are simply the first 300 bytes of file text. Snippet size is configurable; see the Configuration Files and Utilities appendix of any IS1200 User Guide for details on setting snippet size. In all cases, snippets are taken from the result file’s fullText.

SourceOne Archive Server (EMC)
The EMC SourceOne server is a comprehensive, policy-based system. It automatically collects, organizes, indexes, and retains messages and their attachments, and stores them in designated archives connected to shared storage. EMC SourceOne provides indexed searching that works with both EMC storage and other brands such as IBM or NetApp.

Special Characters
The IS1200 supports alphanumeric ASCII and UTF-8 characters. Non-alphanumeric ASCII characters are defined as Special Characters and include the following:
'"-_\/!@#$%^&*+={}[]()<>|:;,.?~`
Special characters are not universally supported in the IS1200 interfaces. Note the following limitations:

Search Queries and Special Characters: Special characters pose a searching challenge. IS1200 tokenization removes special characters from indexed text during classification, so special characters are never entered into the IS1200 metadata indexes. Consequently, special characters cannot be searched for directly. For more details, see Tokenization and Stemming in the IS1200 Web-Search User Guide.

Although special characters cannot be searched for directly, the text that contains them can be searched. For example, the string "-ACME-" is tokenized on the hyphens and recorded in the metadata only as "ACME". Consequently, searching for the string with the hyphens (-) will NOT work. However, you can search for “?ACME?” (using the question mark wildcard). This returns results such as “!ACME!”, “@ACME.”, and so on. See the IS1200 Web-Search User Guide for more details on wildcards.

Note: The question mark character (?) cannot be searched for in filepaths, even when escaped.
This exception is limited to filepath searches only.

AD login names and NIS login names support only alphanumeric ASCII and UTF-8 characters; they do NOT support the following special characters:
'"-_\/!@#$%^&*+={}[]()<>|:;,.?~`
However, in Active Directory (AD), registered users may have both an AD login name and a display name. For example, John Smith may have the AD login name “jsmith” and the display name “John Smith”. New legal supervisors or reviewers created in Case Mgmt with the AD lookup button take the display name, not the login name. The display name may contain special characters as described below.

Legal Supervisor Names and Legal Reviewer Names only
  Support: '-_!@#$%^&*+={}[]()|:;,.?~`
  Do NOT support: "\/<>

Custodian Names only
  Support: '"-_!@#%^&*+={}[]()|:;,.?~`
  Do NOT support: \/<>$

Case Names, Legal Export Profile Names, Repository Names, Rule Names, and Policy Names only
  Support: _ (underscore)
  Do NOT support: " - \ / ! @ # $ % ^ & * + = { } [ ] ( ) < > | : ; , . ? ~ `

Email IDs used in Legal Hold notifications and acknowledgements, search filters, collection filters, and so on
  Support: '-_!#$%^&{}:;,.?~`
  Do NOT support: " \ / @ * + = [ ] ( ) | < >

Mail Domain Names
  Do NOT support any special characters.

File names and directory names in source and destination file names only
  Support: '-_!@^+={}[]()<>;,.~
  Do NOT support: " \ / # $ % & * | : ? `

Tag Names only
  Support: _ (underscore)
  Do NOT support: ' " - \ / ! @ # $ % ^ & * + = { } [ ] ( ) < > | : ; , . ? ~ `

Tag Values only
  Support: '-_\/!@#$%^&*+={}[]()<>|:;,.?~`
  Do NOT support: " (double quote)

Rule definitions
  Special characters must be “escaped” before they can be used in rule definitions. To escape a character, place a backslash (\) before it.

Search technology uses reserved words, stop words, special characters, tokenizers, and so on. These are common to almost ALL search technologies, not just the Kazeon search engine.
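The backslash escaping rule can be illustrated with a small Python helper. This is a sketch, not an IS1200 utility. The name `escape_special` is hypothetical. The helper escapes every character in the Special Characters list given in this glossary, which may be more than a particular rule or query strictly requires. For example, "(1+1):2" becomes "\(1\+1\)\:2".

```python
# Special Characters as listed in this glossary (assumed to be the full set to escape).
SPECIAL_CHARACTERS = set("'\"-_\\/!@#$%^&*+={}[]()<>|:;,.?~`")


def escape_special(text):
    """Prefix each special character with a backslash; leave others unchanged."""
    return "".join("\\" + ch if ch in SPECIAL_CHARACTERS else ch for ch in text)


print(escape_special("(1+1):2"))  # \(1\+1\)\:2
```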
One major reason search engines treat these tokens specially is index size. These categories of tokens occur so frequently that indexing every character and word, regardless of semantics, could grow the search index beyond any manageable size. Besides, there is little value in indexing stop words (as, the, or, and so on) and tokenizers (@ , . - and so on). Omitting such characters from your search query is simply part of this optimization.

For example, when you search for (1+1):2, the characters “(”, “+”, “)”, and “:” have special meaning in search:
• The parentheses specify grouping.
• The plus specifies inclusive terms in a query.
• The colon separates a tag from its value, as in "filepath:*".
To use these characters in your query, escape each of them with a backslash, as follows:
\(1\+1\)\:2
However, escaping does not make the characters part of your query. It only means that search does not interpret them with special semantics. The query is preprocessed to drop those characters, so the final query actually executed by the search engine is:
fulltext:"1 1 2"
This searches for a 1 followed by a 1, followed in turn by a 2, with no other valid indexable search tokens between the three numbers. The results may match
1-1+2
1:1:2
1-1-2
and so on. However, they will not match
1:3:1:2
1-43+1:2
and so on. Hence, to search for (1+1):2, use the following query:
\(1\+1\)\:2

stop words
Stop words are the most commonly used words in sentences, such as “a”, “an”, “the”, and “and”. If indexed individually, they would consume an excessive amount of metadata storage space, so they are not individually indexed. If stop words are used in a search query, they are ignored unless they are part of a quoted phrase.
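The rule that unquoted stop words are ignored while quoted phrases are kept whole can be sketched in Python as follows. The word set is copied from Table 2. The function name `drop_stop_words` is hypothetical. The sketch ignores other query syntax such as operators, wildcards, and tag:value prefixes.

```python
import re

# Stop words copied from Table 2 of this glossary.
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "not", "of", "on", "or", "such", "that", "the",
    "their", "then", "there", "these", "they", "this", "to", "was", "will", "with",
}


def drop_stop_words(query):
    """Remove unquoted stop words from a query; quoted phrases are kept intact."""
    # Each match is either a "quoted phrase" or a single unquoted word.
    parts = re.findall(r'"[^"]*"|\S+', query)
    kept = [p for p in parts if p.startswith('"') or p.lower() not in STOP_WORDS]
    return " ".join(kept)


print(drop_stop_words('the minutes of "the board" meeting'))
# minutes "the board" meeting
```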
The table below lists all stop words:

Table 2 Stop Words
a       an      and     are     as      at      be      but
by      for     if      in      into    is      it      not
of      on      or      such    that    the     their   then
there   these   they    this    to      was     will    with

Stemming
Stemming is a search technique designed to increase search efficiency and broaden relevant search hits. When stemming is used, fullText indexing first attempts to identify each word’s “stem”, and then indexes words by their stems. For example, the words “connected”, “connecting”, and “connectable” all share the same stem and are indexed under “connect”. Search query criteria are automatically stemmed, so querying “connected” returns all instances where “connect”, “connected”, and “connecting” are used. Nouns such as “connector” are not stemmed. Stemming is ON by default but may be disabled.

stubs
Stubs are created by many file archiving applications, most notably email archiving systems. When stubbing is used and a file object is moved to archival storage, a “stub” is left behind on the original file system that points to the archived file’s new location. Thereafter, if a user attempts to open the archived file from the original filer, the stub allows that filer to retrieve the archived file. The filer then returns it to the user transparently, as if it were still on the original filer.
Stubs may be searched for using the metadata field “mailMessageClass”. For example, use the search query “mailMessageClass:IPM.Note.ExShortcut” to find email message stubs.

sub-objects
A file found inside a “container object”; see “Container file/object” on page 25 for more details. A container file is often called the “parent” and the contained sub-objects are called “children”. Sub-objects should not be confused with embedded files such as OLE objects, for example spreadsheets or graphics embedded in a Microsoft Word file. Note, however, that an email message with attachments is a container object. An email message without attachments is a simple non-container object, even if a graphic is embedded in its body.
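The following sketch illustrates the parent/child relationship with a common composite format. It uses Python's standard `zipfile` module to list the children of a ZIP container. It shows the concept only and does not reflect how the IS1200 enumerates sub-objects. `list_sub_objects` is a hypothetical name, and nested containers (for example, a ZIP inside a ZIP) are not expanded.

```python
import io
import zipfile


def list_sub_objects(container_bytes):
    """Return the names of the files ("children") inside a ZIP container ("parent")."""
    with zipfile.ZipFile(io.BytesIO(container_bytes)) as archive:
        # Skip directory entries; only file members count as sub-objects here.
        return [info.filename for info in archive.infolist() if not info.is_dir()]


# Build a small in-memory ZIP parent holding two children.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("report.txt", "quarterly numbers")
    archive.writestr("notes/todo.txt", "follow up")

children = list_sub_objects(buffer.getvalue())
print(children)  # ['report.txt', 'notes/todo.txt']
```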
T

Tags
The names of metadata fields. Tags are always associated with a value. For example, the metadata tag “filename” for any given file is always followed by a value (a text string) containing the actual filename.

Tokenization
Tokenization is an IS1200 classification procedure that breaks word strings into “tokens” for better search results. During classifications, Numbers, AlphaNums, HostNames, and EmailAddresses (in fullText) are tokenized in the same way as alpha-only strings. For example, tokenization splits these strings into separate words as follows:
• “www.kazeon.com” yields “www”, “kazeon”, and “com”.
• “fred@kazeon.com” yields “fred”, “kazeon”, and “com”.
• “11,22,333,44” yields “11”, “22”, “333”, and “44”.
This allows searching for “kazeon” and getting all email addresses that contain the domain name.

U

UTF-8
Unicode Transformation Format - 8 (UTF-8) is an 8-bit coding scheme for digitally representing characters. It covers both the standard western alphabet (Aa-Zz) with its punctuation characters and non-western word characters, such as the glyphs found in the Chinese, Japanese, and Korean languages. UTF-8 encodes all its characters as sequences of 8-bit bytes (octets). The first 128 UTF-8 characters are identical to the first 128 ASCII characters and require only one byte each. All other characters, including those of non-western languages, are coded using two to four octets each. UTF-8 can encode all 1,112,064 valid code points in the Unicode character set, which covers the majority of languages in use around the world.

W

Web-Admin
An IS1200 web application that IT personnel use to administer the server itself. When the IS1200 is used to help administer other IT resources, Web-Admin is used to administer those resources as well. Web-Admin is the preferred interface for administering the server.

Web-Reports
An IS1200 web application that provides advanced reporting capabilities based on IS1200 metadata.

Web-Search
An IS1200 web application that provides basic, advanced, and specialized email searches against IS1200 metadata.
X

XML (eXtensible Markup Language)
A file type that uses the XML language to define and describe data that can be transferred between applications like databases and spreadsheets.