Uploaded by Bellamy Kravitz

IS1200 EMC Celerra FLR Retention Manager User and Configuration Guide

Title Page
EMC® Kazeon-eDiscovery
Version 4.8.0
IS1200 EMC Celerra FLR Retention Manager
User and Configuration Guide
EMC Corporation
Corporate Headquarters:
Hopkinton, MA 01748-9103
1-508-435-1000
www.EMC.com
Copyright © 2007 - 2015 EMC Corporation. All rights reserved.
Published September 2015
EMC believes the information in this publication is accurate as of its publication date. The information is subject to
change without notice.
THE INFORMATION IN THIS PUBLICATION IS PROVIDED "AS IS." EMC CORPORATION MAKES NO
REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS
PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR
FITNESS FOR A PARTICULAR PURPOSE.
Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.
For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com. Adobe and
Adobe PDF Library are trademarks or registered trademarks of Adobe Systems Inc. in the U.S. and other countries. All
other trademarks used herein are the property of their respective owners.
The IS1200 software is based in part on software licenses from the following:
Outside In® Content Access © 1991-2015, Chicago, Inc.
Open Source code from www.java2s.com called the itext.asian.jar available at:
http://www.java2s.com/Code/Jar/GHI/itext-asian.jar.htm
Copyright 2009 - 12 Demo Source and Support. All rights reserved
In part on the work of the Independent JPEG Group.
Code from Inxight Software, Inc. Copyright © 1996-2015. All rights reserved. www.inxight.com.
Certain icons used by the Kazeon Web applications come from the Silk Icon set
(http://www.famfamfam.com/lab/icons/silk/)
licensed under the Creative Commons Attribution 2.5 license
(http://creativecommons.org/licenses/by/2.5/).
IS1200 EMC Celerra FLR Retention Manager User and Configuration Guide— v4.8.0
Contents
Figures ..... v
Tables ..... vii
Preface ..... ix
Chapter 1 Introduction
About the IS1200 ..... 2
    Extending Server Functionality ..... 2
    Celerra FLR Functionality ..... 2
    The Celerra FLR Retention Manager ..... 3
Supported Celerra FLR Configurations ..... 4
Celerra FLR Retention Manager Implementation Overview ..... 5
    Celerra FLR Retention Manager Implementation in New Installations ..... 5
    Celerra FLR Retention Manager Implementation In Existing Installations ..... 5
Chapter 2 Registering Celerra FLR Repositories
Supported Configurations ..... 8
Celerra FLR Repository Registration Requirements ..... 8
Creating a Metadata Repository for a Celerra FLR Repository ..... 9
Registering Celerra FLR Repositories ..... 10
    Registering Repositories Using Web-Admin ..... 10
    Registering Repositories from the CLI ..... 14
Chapter 3 Workflow Changes
Celerra FLR Retention Capabilities ..... 18
Celerra FLR Legal Hold Capabilities ..... 19
Celerra FLR Collection Options ..... 19
Administration ..... 19
Searching ..... 19
Reporting ..... 20
Glossary ..... 21
Index ..... 43
Figures
Title ..... Page
1 The Web-Admin Repositories Page ..... 10
2 The Web-Admin Add Repository Tab for NFS ..... 11
3 The Web-Admin Add Repository Tab for CIFS ..... 12
4 Retention Options for Celerra FLR Data Repositories ..... 18
Tables
Title ..... Page
1 Revision History Details ..... xii
2 Stop Words ..... 40
Preface
As part of an effort to improve its product lines, EMC periodically releases
revisions of its software and hardware. Therefore, some functions described
in this document may not be supported by all versions of the software or
hardware currently in use. The product release notes provide the most
up-to-date information on product features.
Contact your EMC technical support professional if a product does not
function properly or does not function as described in this document.
Note: This document was accurate at publication time. Go to EMC Online
Support (https://support.emc.com) to ensure that you are using the latest
version of this document.
Audience
This guide is intended for Administrators, Business Analysts, and
Compliance Auditors who need to set up and configure an EMC
Kazeon - eDiscovery server to work with EMC Celerra repositories, and
then search and produce reports on those repositories.
Related Documentation
IS1200 Installation and Quickstart Guide
- describes installing and configuring the IS1200 server software.
IS1200 Web-Admin User and Configuration Guide
- describes using Web-Admin to set up and manage Kazeon clusters.
IS1200 Web-Search User Guide
- describes using Web-Search to perform basic and advanced searches.
IS1200 Web-Reports User Guide
- describes using Web-Reports to create and use basic and advanced reports.
IS1200 eDiscovery Case Manager Administrators and Supervisors Guide
- for legal representatives, a primer on all the web-based interfaces
above for performing eDiscovery.
IS1200 Command Line Interface Reference Guide
- describes the IS1200 Command Line Interface and all its commands.
Follow these steps to download IS1200 documents from the web:
1. Go to https://support.emc.com and click the SUPPORT BY
PRODUCT option on the home page.
2. In the Find a Product field, enter Kazeon. From the product
selection list, choose one of the sub-headers (such as Kazeon
ECS) and click the Find button.
3. The Kazeon ECS window is displayed. Click the link for
Documentation.
4. In the left-navigation menu, choose a version level to display the
available documents.
Conventions used in this document
EMC uses the following conventions for special notices:
DANGER indicates a hazardous situation which, if not avoided, will
result in death or serious injury.
WARNING indicates a hazardous situation which, if not avoided,
could result in death or serious injury.
CAUTION, used with the safety alert symbol, indicates a hazardous
situation which, if not avoided, could result in minor or moderate injury.
NOTICE is used to address practices not related to personal injury.
Note: A note presents information that is important, but not hazard-related.
IMPORTANT
An important notice contains information essential to software or
hardware operation.
Typographical conventions
EMC uses the following type style conventions in this document.
Normal
Used in running (nonprocedural) text for:
• Names of interface elements (such as names of windows, dialog boxes, buttons, fields, and menus)
• Names of resources, attributes, pools, Boolean expressions, buttons, DQL statements, keywords,
clauses, environment variables, functions, utilities
• URLs, pathnames, filenames, directory names, computer names, filenames, links, groups, service
keys, file systems, notifications
Bold
Used in running (nonprocedural) text for:
• Names of commands, daemons, options, programs, processes, services, applications, utilities,
kernels, notifications, system calls, man pages
Used in procedures for:
• Names of interface elements (such as names of windows, dialog boxes, buttons, fields, and menus)
• What user specifically selects, clicks, presses, or types
Italic
Used in all text (including procedures) for:
• Full titles of publications referenced in text
• Emphasis (for example a new term)
• Variables
Courier
Used for:
• System output, such as an error message or script
• URLs, complete paths, filenames, prompts, and syntax when shown outside of running text
Courier bold
Used for:
• Specific user input (such as commands)
Courier italic
Used in procedures for:
• Variables on command line
• User input variables
<> Angle brackets enclose parameter or variable values supplied by the user
[] Square brackets enclose optional values
| Vertical bar indicates alternate selections - the bar means “or”
{} Braces indicate content that you must specify (that is, x or y or z)
... Ellipses indicate nonessential information omitted from the example
Where to get help
EMC support, product, and licensing information can be obtained as
follows.
Product information — For documentation, release notes, software
updates, or for information about EMC products, licensing, and
service, go to the EMC Online Support at:
https://support.emc.com
Technical Support — Go to EMC Online Support and click Service
Center. You will see several options for contacting EMC Technical
Support. Note that to open a service request, you must have a valid
support agreement. Contact your EMC sales representative for details
about obtaining a valid support agreement or with questions about
your account.
Documentation Feedback
Your suggestions help us continue to improve the accuracy,
organization, and overall quality of the user publications. Please send
your comments or opinions on this document to:
ECD.Documentation.Feedback@emc.com
Revision History
Table 1 Revision History Details
Revision Date      Description
September 2015     Updated the Deduplication section in “Glossary”.
August 2014        Added an update to the mixed mode support by Kazeon in “Creating a
                   Metadata Repository for a Celerra FLR Repository” on page 9.
May 2014           • Updated information about support for mixed mode in “Creating a
                     Metadata Repository for a Celerra FLR Repository” on page 9.
                   • Changed any instances of Exchange connector to FLR connector.
December 2013      Initial Publication
1
Introduction
This guide is a companion to the IS1200 Web-Admin
User and Configuration Guide, which should be read first because it
contains most of the basic IS1200 server setup and maintenance
information on which this guide builds.
Topics include:
◆ About the IS1200 ..... 2
◆ Extending Server Functionality ..... 2
◆ Celerra FLR Functionality ..... 2
◆ The Celerra FLR Retention Manager ..... 3
◆ Supported Celerra FLR Configurations ..... 4
◆ Celerra FLR Retention Manager Implementation Overview ..... 5
◆ Celerra FLR Retention Manager Implementation in New Installations ..... 5
◆ Celerra FLR Retention Manager Implementation In Existing Installations ..... 5
About the IS1200
The IS1200 is an integrated hardware and software system that
provides information management solutions enabling organizations
to efficiently and cost-effectively classify, manage, and retrieve data.
It provides consistent information visibility and control across
distributed files, minimizes the risk of unmanaged files, integrates
seamlessly with existing infrastructure, and scales to support billions
of files for searching, reporting, backup search and recovery, and file
migration and archiving.
Extending Server Functionality
The standard IS1200 uses clustering to offer a scalable solution for
classifying, searching, reporting on, and applying Actionable Services
to search and report results found on registered data repositories.
Data repository types include NFS and CIFS shares and many
vendor-specific servers, such as Microsoft Exchange and Microsoft
SharePoint servers.
The IS1200’s standard functionality, and the types of servers it can
access, can be expanded with add-on modules like the FLR
Connector. The FLR Connector requires an additional license for the
IS1200 and allows that IS1200 to register and manage Celerra FLR
Servers.
Celerra FLR Functionality
The main objective of the EMC Celerra FLR (file-level retention)
server is to protect files from deletion or modification until a specified
retention date. FLR allows creating a permanent, unalterable set of
files and directories, and ensures the integrity of the data they contain
for a controllable retention period.
This server prevents users from deleting or modifying files that are
locked and protected. This is especially useful in legal matters, where
files must be preserved without modification for the entire course of a
legal case, and for organizations such as health service providers that
must retain medical records for fixed periods determined by legal
mandates.
The Celerra FLR is designed to provide file retention at both the
self-regulated and government-regulated levels.
The Celerra FLR Retention Manager
While the standard IS1200 can register standard Celerra shares that
have been exported as CIFS or NFS, the FLR Connector allows the
IS1200 to register Celerra FLR servers and work with their
FLR-specific retention capabilities. Once a Celerra FLR server has
been registered as a data repository, and classified, it can be searched,
reported on, and have Actionable Services such as Retention applied
to search and report results.
Adding the FLR Connector also enables the IS1200 to provide
additional Celerra FLR-specific search options, retention reports, and
new Action options. The IS1200 can transfer files to Celerra FLR
servers, and access those files to classify and search them.
Most importantly, the FLR Connector allows the IS1200 to help
manage all the files on Celerra FLR servers, including file settings
such as retention times. Additionally, IS1200 standard retention
reports such as “Expired Files”, “Retention by Aging”, and
“UpForExpiry” identify files that have passed their
retention dates, and Actionable Services such as Retention, Copy,
Move, and Delete allow the retention periods of report results (files)
to be modified or extended, or expired files to be efficiently archived
or deleted.
Supported Celerra FLR Configurations
The FLR Connector supports EMC Celerra FLR versions 5.6.4.3 and
above.
Celerra FLR Retention Manager Implementation Overview
If FLR Connector capabilities are ordered with the original appliance
purchase, the appropriate optional module license is automatically
included in the server master license and is routinely installed along
with the server license.
If the FLR Connector is purchased after the original installation, the
FLR Connector license key must be added to the installation before
Celerra FLR repositories can be registered and managed. See the
Installing License Keys chapter of the IS1200 Web-Admin
User and Configuration Guide for details on obtaining and installing
optional module licenses for existing installations.
Celerra FLR Retention Manager Implementation in New Installations
The FLR Connector software is automatically installed with all IS1200
installations (ECS and FI); the FLR Connector license key simply
activates it.
To use the FLR Connector purchased with a new installation, follow
the standard hardware and software installation and configuration
instructions in the IS1200 Installation and Quickstart Guide, and then
use the rest of this guide to configure the connectors.
Celerra FLR Retention Manager Implementation In Existing Installations
If the FLR Connector license key is obtained after the original IS1200
installation, use the following general steps to implement FLR
Connector capabilities.
1. Use the current Web-Admin application to add the FLR
Connector license key to the IS1200, and then quit and
relaunch the Web-Admin application. See the Installing
License Keys chapter of the IS1200 Web-Admin
User and Configuration Guide for details on obtaining and
installing optional module licenses for existing
installations.
2. Register your Celerra FLR servers as IS1200 data
repositories.
3. Classify your Celerra FLR repositories.
4. Search, report on, and apply Actions to your Celerra FLR
servers as necessary.
2
Registering Celerra FLR Repositories
This chapter discusses registering Celerra FLR network shares as
EMC Kazeon - eDiscovery data repositories.
Topics include:
◆ Supported Configurations ..... 8
◆ Celerra FLR Repository Registration Requirements ..... 8
◆ Creating a Metadata Repository for a Celerra FLR Repository ..... 9
◆ Registering Celerra FLR Repositories ..... 10
◆ Registering Repositories Using Web-Admin ..... 10
◆ Registering Repositories from the CLI ..... 14
Before Celerra FLR repositories can be classified, or searched and
reported on, they must be registered with the IS1200 as data
repositories and have a metadata repository assigned.
Supported Configurations
The Celerra FLR Retention Manager supports all Celerra FLR
versions 5.6.4.3 and above. The Connector supports registering up to
sixteen Celerra FLR repositories per IS1200 node (the same IS1200
support supplied for all NFS or CIFS repositories). Each Celerra FLR
repository can consist of multiple Celerra servers.
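As a rough illustration of the limits stated above, the following Python sketch compares a dotted Celerra version string against the 5.6.4.3 minimum and checks the sixteen-repository-per-node limit. The function names are hypothetical and chosen for illustration only; they are not part of the IS1200 software or its CLI.

```python
MIN_FLR_VERSION = (5, 6, 4, 3)   # minimum supported version stated above
MAX_FLR_REPOS_PER_NODE = 16      # per-node registration limit stated above


def parse_version(text):
    """Turn a dotted version string such as '5.6.4.3' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def is_supported_version(text):
    """Return True if the version is 5.6.4.3 or above."""
    return parse_version(text) >= MIN_FLR_VERSION


def can_register_another(registered_on_node):
    """Return True if the node is still below the sixteen-repository limit."""
    return registered_on_node < MAX_FLR_REPOS_PER_NODE
```

Comparing tuples of integers avoids the pitfall of comparing version strings as text, where "5.10" would sort before "5.6".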
Celerra FLR Repository Registration Requirements
To register and access Celerra repositories, the IS1200 needs the
following:
◆ An appropriate metadata repository to associate with the
repository when it is registered.
◆ If the Celerra FLR repository is exported as a CIFS share, an
identity that has complete access to the repository to be
registered.
Creating a Metadata Repository for a Celerra FLR Repository
Before registering a Celerra FLR repository, one or more metadata
repositories must be registered to store the extracted metadata. For
information on adding metadata repositories, see the Repository
Registration and Management chapter of the IS1200 Web-Admin
User and Configuration Guide. For optimal performance, use metadata
repositories on NFS systems.
Dedicated IS1200 metadata repositories should be assigned to each
Celerra FLR server. Because a Celerra FLR server typically hosts many
terabytes of data, multiple metadata repositories may be needed to
store the metadata for a single Celerra FLR server. If you find that the
number of metadata repositories is inadequate, register more
metadata repositories.
Kazeon supports only basic crawl on data repositories that use mixed
mode. For the mixed-mode environment to work, the user mapping
must already be configured. To perform this mapping, refer to
the documentation for the respective storage system (for example,
Isilon or Celerra).
If the user mapping is not performed correctly, you may not get the
correct custodian or owner information of the object. For more
information about mixed mode support, see the IS1200 4.8
Web-Admin User Guide.
Note: Metadata repositories may not be shared between two registered
Celerra FLR servers or between a Celerra FLR cluster and any other
registered data repository.
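The sharing restriction in the note above can be checked against a planned layout before registration. The following Python sketch is illustrative only: it assumes the planned assignments are written down as a dictionary mapping each data repository to its metadata repositories, and all repository names in it are hypothetical.

```python
from collections import defaultdict


def find_shared_metadata_repos(assignments):
    """Return metadata repositories assigned to more than one data repository.

    `assignments` maps each data repository name to the list of metadata
    repository names planned for it (hypothetical names, for illustration).
    """
    users = defaultdict(set)
    for data_repo, metadata_repos in assignments.items():
        for metadata_repo in metadata_repos:
            users[metadata_repo].add(data_repo)
    # Keep only the metadata repositories used by two or more data repositories.
    return {name: sorted(repos) for name, repos in users.items() if len(repos) > 1}


planned = {
    "celerra_a": ["meta1", "meta2"],
    "celerra_b": ["meta2"],
    "cifs_c": ["meta3"],
}
conflicts = find_shared_metadata_repos(planned)
```

An empty result means the plan does not share any metadata repository between data repositories.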
Registering Celerra FLR Repositories
Celerra FLR repositories may be registered with the IS1200 using
either the CLI or Web-Admin.
Registering Repositories Using Web-Admin
To register a Celerra FLR server as a data repository from Web-Admin,
do the following:
1. In Web-Admin, select Repository View under Repositories in
the left-navigation menu. The Repositories tab opens:
Figure 1 The Web-Admin Repositories Page
2. In the Repository tab tool-bar, click Add Repository.
3. Select NFS or CIFS from the Repository Type drop-down
menu, depending on how the Celerra FLR repository you
want to register was shared.
One of the following two tabs appears:
Figure 2 The Web-Admin Add Repository Tab for NFS
Figure 3 The Web-Admin Add Repository Tab for CIFS
4. Fill in the fields on the screen you see as follows:
Name. Enter a reference name for this repository. The IS1200 uses
reference names, instead of the full repository filepaths, in all
menus where a user must choose a repository, for example when
choosing a data repository for a classification or collection.
Reference names must be unique. Reference names are limited to
127 bytes (127 ASCII characters, or as few as 31 four-byte UTF-8
encoded characters). Reference names may include some special
characters; see “Special Characters” on page 31 for details.
Metadata File System. From the drop-down menu, select the
metadata repository you created for the Celerra cluster, or let the
IS1200 automatically select one.
Server. Enter the name of the NFS/CIFS file server hosting the
repository to add. This may already be entered (and
unchangeable) if registering a “discovered” data repository.
If you are registering an NFS repository:
Mount Path. Enter the mount path of the NFS repository on the
host file server.
If you are registering a CIFS repository:
Share Name: Enter the share name, or mount point, of the CIFS
repository to register.
Identity: Select a pre-defined user identity—from the drop-down
list—to use when the IS1200 accesses this filer. If an appropriate
identity is not available, click the Create Identity button to add
one. For more information on identities, see The Identity Vault
chapter of the IS1200 Web-Admin User and Configuration Guide.
Specify Use: Select one of the following:
• Source. Register this repository as a source. This includes the
reference name (specified above) in all dialogs where a
repository can be chosen as a source, for example when doing
a Collection in either Web-Admin or the
eDiscovery Case Manager.
• Target. Register this repository as a target. This includes the
reference name (specified above) in all dialogs where a
repository can be chosen as a target.
• Source and Target. Register this repository as both a source
and a target for dialogs.
Read Only. Check to indicate the filer being registered is Read
Only to the IS1200. The option should be used if the repository
(being registered) is exported or shared as Read Only. If this option
is not set, the IS1200 assumes the filer being added is Read Write.
Repository Vendor. Check the EMC Celerra FLR checkbox.
Once the EMC Celerra FLR checkbox is set and the repository is
registered, the option cannot be cleared. The only way to clear
this option is to take the repository offline, delete it, and re-add it
without the option checked.
Storage Tier. Optionally, specify the storage tier where the data
repository is located. The storage tier can be any number between
0 and 255. Default is 0.
Force add on errors. Select to force adding this device to the
registered list in spite of errors.
Preserve Access Time. This option is unavailable for Celerra FLR
repositories.
5. Click Submit to register the data repository.
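Two of the fields in step 4 have numeric limits that can be checked before submitting the form: the reference name is limited to 127 bytes, and the storage tier must be a number between 0 and 255. The following Python sketch shows one way to check them. It assumes the byte limit applies to the UTF-8 encoding of the name and that the storage tier range is inclusive; the function name is hypothetical, and the special-character rules are not covered.

```python
def validate_registration_fields(name, storage_tier=0):
    """Return a list of problems found in the Name and Storage Tier values."""
    problems = []
    # The limit is in bytes, so multi-byte UTF-8 characters count more than once.
    name_bytes = len(name.encode("utf-8"))
    if name_bytes > 127:
        problems.append(f"reference name is {name_bytes} bytes; the limit is 127")
    # Assumption: both ends of the 0 to 255 range are allowed (0 is the default).
    if not 0 <= storage_tier <= 255:
        problems.append("storage tier must be between 0 and 255")
    return problems
```

Measuring the encoded length rather than the character count is what makes a name of 32 four-byte characters fail while 127 ASCII characters pass.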
Registering Repositories from the CLI
To register a Celerra FLR server from the Command Line Interface,
use the following general command:
add datafs <referenceName> mount <mountPoint> as <identity> <attributesHere>
Where:
<referenceName> is the name the IS1200 should use when
displaying this repository in repository selection menus.
<mountPoint> is the mount path for the repository.
<identity> is the name of an identity (already stored in the
IS1200 Identity Vault) to use to access the CIFS repository.
<identity> is only required when adding CIFS repositories.
<attributesHere> is the keyword attributes followed by a
comma-separated list of attributes appropriate to the repository.
Examples follow:
To add an NFS repository from a Celerra FLR server and make it a source
repository:
add datafs celerra_nfs mount celerraflr_server:/nfs1 attributes celerraflr=yes,fs_preserve_timestamp=no,source_repository=yes,target_repository=no
To add an NFS repository from a Celerra FLR server and make it a
source repository and a target repository:
add datafs celerra_nfs mount celerraflr_server:/nfs2 attributes celerraflr=yes,fs_preserve_timestamp=no,source_repository=yes,target_repository=yes
When adding CIFS repositories, an identity must be available
containing credentials allowing complete access to the repository.
Assume the identity celerraflr_identity is available for the following
examples.
To add a CIFS repository from a Celerra FLR server and make it a
source repository:
add datafs celerra_nfs mount //celerraflr_server/cifs1 as celerraflr_identity attributes celerraflr=yes,fs_preserve_timestamp=no,source_repository=yes,target_repository=no
To add a CIFS repository from a Celerra FLR server and make it a
source repository and a target repository:
add datafs celerra_nfs mount //celerraflr_server/cifs2 as celerraflr_identity attributes celerraflr=yes,fs_preserve_timestamp=no,source_repository=yes,target_repository=yes
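When many repositories must be registered, the command lines above can be generated rather than typed. The following Python sketch only composes the command text in the form shown in the examples; it does not connect to the IS1200 or run anything. The function name and parameters are hypothetical, and the attribute names are taken from the examples, with the target attribute spelled target_repository as in the first example.

```python
def build_add_datafs(reference_name, mount_point, identity=None,
                     source=True, target=False):
    """Compose an `add datafs` command line in the form used by the examples."""
    def yes_no(flag):
        return "yes" if flag else "no"

    attributes = ",".join([
        "celerraflr=yes",
        "fs_preserve_timestamp=no",
        f"source_repository={yes_no(source)}",
        f"target_repository={yes_no(target)}",
    ])
    parts = ["add", "datafs", reference_name, "mount", mount_point]
    # The identity clause is only needed for CIFS repositories.
    if identity is not None:
        parts += ["as", identity]
    parts += ["attributes", attributes]
    return " ".join(parts)


nfs_command = build_add_datafs("celerra_nfs", "celerraflr_server:/nfs1")
cifs_command = build_add_datafs("celerra_nfs", "//celerraflr_server/cifs2",
                                identity="celerraflr_identity", target=True)
```

Building the attribute list from one place keeps the attribute spellings consistent across every generated command.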
3
Workflow Changes
This chapter describes how installing a Celerra FLR Retention
Manager license on the IS1200 changes the standard workflow
procedures.
Note: Before continuing, be sure a valid Celerra FLR Retention Manager
license key is installed on all nodes of your IS1200 cluster. See the Installing
License Keys chapter of the IS1200 Web-Admin User and Configuration Guide
for details on obtaining and installing a license key if you have not already
installed the key.
Topics include:
◆ Celerra FLR Retention Capabilities ..... 18
◆ Celerra FLR Legal Hold Capabilities ..... 19
◆ Celerra FLR Collection Options ..... 19
◆ Administration ..... 19
◆ Searching ..... 19
◆ Reporting ..... 20
Celerra FLR Retention Capabilities
When a Celerra FLR Retention Manager license is added to the
IS1200, new retention options become available in all screens, pages,
or tabs that move or copy files to a Celerra FLR repository. The new
options generally look like the following:
Figure 4
Retention Options for Celerra FLR Data Repositories
The new options work as follows:
Retention Date Selection: Check this box to determine the retention
date to use:
◆ Absolute Date/Time: Select this radio button to set an absolute
retention date. Use the Time drop-down menu and the Calendar tool
to determine the specific retention time and date.
◆ Relative Date: Select this radio button to set a relative retention
date, that is, a date determined using the following formula:
retention date = <soMany> <timeUnits> from <fileTimeAttribute>
Use the fields to the right of the Relative Date radio button to set
up the formula:
• use the first empty field to set the <soMany> number
• use the middle drop-down menu to set the <timeUnits>
• use the remaining drop-down menu to set which <fileTimeAttribute>
to base the date on.
Retention Class Selection: This radio button is present whenever an
EMC repository is selected, but is not available (grayed out) for
Celerra FLR repositories.
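The Relative Date formula can be worked through outside the product to preview the date a given setting would produce. The following Python sketch is an illustration under stated assumptions: this guide does not list the available time units or file time attributes, so the units used here (days, weeks, months, years) and the rule of clamping to the last day of a shorter month are assumptions, and the function names are hypothetical.

```python
import calendar
from datetime import datetime, timedelta


def add_months(moment, months):
    """Add calendar months, clamping the day to the end of a shorter month."""
    month_index = moment.month - 1 + months
    year = moment.year + month_index // 12
    month = month_index % 12 + 1
    day = min(moment.day, calendar.monthrange(year, month)[1])
    return moment.replace(year=year, month=month, day=day)


def relative_retention_date(file_time, so_many, time_units):
    """Compute: retention date = so_many time_units from file_time."""
    if so_many < 0:
        raise ValueError("so_many must not be negative")
    if time_units == "days":
        return file_time + timedelta(days=so_many)
    if time_units == "weeks":
        return file_time + timedelta(weeks=so_many)
    if time_units == "months":
        return add_months(file_time, so_many)
    if time_units == "years":
        return add_months(file_time, 12 * so_many)
    raise ValueError(f"unsupported time unit: {time_units}")


# Example: seven years from a file time of 1 December 2013.
seven_years = relative_retention_date(datetime(2013, 12, 1), 7, "years")
```

Here `file_time` stands for whichever file time attribute is selected in the dialog, such as a modification time read from the file.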
Celerra FLR Legal Hold Capabilities
Only Legal Hold without Enforcement is available on Celerra FLR
servers.
More specifically, while a metadata tag for legal hold may be set for
files on registered Celerra FLR repositories, the standard IS1200
Actionable Services legal hold option “Enforce Legal Hold at the
repository level” is not available for Celerra FLR files. The IS1200
cannot change the file privileges on registered Celerra FLR
repositories to prevent users from moving, changing, or deleting files,
even those with the legal hold metadata tag set.
Celerra FLR Collection Options
Collections done from either Web-Admin or the
eDiscovery Case Manager to Celerra FLR repositories include options
for setting the retention options on the target. See “Celerra FLR
Retention Capabilities” on page 18 for details on setting these
options.
Administration
Web-Admin can register, classify, and do Single Step Collections to
Celerra FLR repositories.
Searching
A new metadata namespace, “Retention”, contains the following new
fields:
retentionlock, retentiondate, retentionsetdate,
retentionsetuser, retentionreportdate
All these fields are viewable using the Show Metadata icon from
search results, but only retentionsetuser is routinely indexed
and can be searched for.
Actionable Services like Copy and Move that are applied to search
results from Celerra FLR repositories contain new interfaces for
setting retention options on their targets. A new Actionable Service
tab called Retention allows extending retention settings. See “Celerra
FLR Retention Capabilities” on page 18 for details on setting these
new retention options.
Reporting
Web-Reports contains a report category called Retention reports.
Retention reports list expired, soon-to-expire, and locked files
allowing administrators to manage these files with Actions that reset
retention settings or delete expired files.
Retention reports are only for repositories, such as Celerra FLR
repositories, that implement specific retention features for the files
they contain. Retention reports run on filers without retention
capabilities return empty reports.
Retention reports prefixed by "Snaplock" are designed for SnapLock
repositories. Running these on Celerra FLR repositories will return
empty reports or errors.
Actions like Copy or Move that are applied to report results from
Celerra FLR repositories contain new interfaces for setting retention
options on their targets. See “Celerra FLR Retention Capabilities” on
page 18 for details on setting these new retention options.
Glossary
This glossary contains terms related to disk storage subsystems,
networks, file management, and eDiscovery. Many of these terms are
used in this manual.
A
active case
In eDiscovery situations, a company may have more than one legal
issue (case) in progress at a time. Often it is advantageous to limit job
or search scope to just one case. When the user interface scope is
limited to a particular single case, that case is the active case.
Active Directory (AD)
A technology created by Microsoft that provides a variety of network
services, including: LDAP-like directory services, Kerberos-based
authentication, and DNS-based naming and other network
information.
Actions,
Actionable Services
Services such as copy, move, delete, tagging, and so on, that can be
applied to search and report results and allow the IS1200 to be an
effective file management tool for registered repositories.
Access Control List
(ACL)
A file system level data file that specifies how users or groups may
access resources on a computer or network, like an application, file or
printer, and the rights they have to it, for example read access, write
access, and so forth. For more information on how the IS1200 may use
ACLs, see the Controlling ACL Checking section of the Configuration
Files and Utilities appendix of any IS1200 User Guide.
Advanced Search
A search made from the IS1200 Advanced Search link. Allows
searching for extracted metadata by tag-value pairs, and allows
multiple variable and boolean searches.
Agents
See “connectors” on page 25.
Assignment Rules
An assignment rule is a type of classification rule. It tags files with
metadata and assigns files to policy groups. Assignment rules are
contained in Assignment Rule Sets (ASRs). See the Policies:
Classification, Extraction and Assignment Rules chapter of any
IS1200 User Guide for more details.
Auditing
A service that allows the IS1200 to record all system events according
to who did what, when, and the event result. This data is especially
useful to Legal Service Providers when providing an audit trail for
responsive data produced during eDiscovery. Complete details are
available in the Auditing and Data Verification chapter of any IS1200
User Guide.
Authorization Rule
A policy rule that filters search results to ensure that the assigned files
can only be viewed by authorized users. IS1200 authorization policies
may be used to add additional levels of security to the Access Control
Lists (ACLs) for file objects found in registered data repositories. See
the Policy Groups: Authorization Policies chapter of any IS1200 User
Guide for more details.
Authentication
The process of identifying users based on user name and password to
ensure that only authorized users can access the IS1200.
B
Basic Search
A search made from the Search page using only the Search field.
Searches only the content found in the fullText field populated
during classifications.
C
CAS Device
EMC’s Content Addressed Storage (CAS) devices are cluster-able
archival devices that host archival business file content such as email,
office productivity files (like word processing and spreadsheet files),
images, and other file documents.
CASID
A unique IS1200 ID for each classified file that the system generates
during basic classification.
Centera Server
The EMC Centera server is a networked storage system specifically
designed to store and provide fast, easy access to fixed content
(information in its final form). It is a CAS device providing long-term
retention and assured integrity designed to store and manage data
that require or have legally mandated retention periods, for example
medical records and files relevant to legal matters.
Celerra Server
An EMC server designed to store and manage archival data. The
Celerra File Level Retention (FLR) server also allows enforcing
enterprise or governmental retention policies.
checkpoints,
checkpointing
Checkpoints and checkpointing allow IS1200 jobs and services to
resume more efficiently if the job or service is paused or stopped
before it completes. Basically, the IS1200 records “bookmarks” about
what file or object was last processed. This allows the IS1200 to skip
to the bookmark—the checkpoint—when the job or service is
resumed, and avoid reprocessing all the files and objects already
processed.
However, checkpoints are not set for every file accessed. Instead, most
jobs divide file processing into “batches” and the checkpoints
indicate where batches started. Consequently, when a job restarts at a
checkpoint, some objects may be reprocessed and, in cases
such as a 'Copy' service with the 'enable-versioning' option
selected, duplicate versioned files will be created on the target
repository when those objects are reprocessed.
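The batch behavior described above can be illustrated with a minimal sketch. This is not the IS1200 implementation; the function name run_job, the batch_size value, and the fail_at parameter (which simulates a job being stopped) are all hypothetical and exist only for illustration.

```python
from typing import Callable, Sequence


def run_job(items: Sequence[str], process: Callable[[str], None],
            checkpoint: int = 0, batch_size: int = 3,
            fail_at: int = -1) -> int:
    """Process items starting at `checkpoint`; return the last checkpoint.

    Hypothetical sketch: a checkpoint is recorded only at the start of
    each batch, never per item. `fail_at` simulates a stop just before
    the item at that index is processed.
    """
    for start in range(checkpoint, len(items), batch_size):
        checkpoint = start  # the bookmark marks where this batch started
        for i in range(start, min(start + batch_size, len(items))):
            if i == fail_at:
                return checkpoint  # job stopped in the middle of a batch
            process(items[i])
    return len(items)  # everything processed


items = list("abcdefg")
processed = []
saved = run_job(items, processed.append, fail_at=4)  # stops inside batch 2
run_job(items, processed.append, checkpoint=saved)   # resumes at batch start
# Item "d" was handled before the stop and again after the resume.
```

Because the resume point is the start of the interrupted batch, any item in that batch that was already handled is handled a second time, which is why a versioned Copy can produce duplicate versions.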
Classification Rule
Rules that the system implements during data classification to extract
metadata, tag files, and assign files to policy groups. The two types of
classification rules are extraction rules and assignment rules.
Classification Service
Sometimes called a “crawl”. An IS1200 service that accesses
job-specified registered repositories and extracts and records their
metadata to later facilitate comprehensive and cross-repository
searches. Classifications extract metadata according to extraction
rules, compute digests for all objects, and assign files to policy
groups according to assignment rules. See “Assignment Rules” on
page 22, “Extraction Rules” on page 29, “Hash Values” on page 30,
and “Policy Groups” on page 34 for more details.
Classifications may be “full”, in which every object in the specified
repositories is parsed and its metadata repopulated in the indexes and
databases, or they may be “differential”; see “Differential
Classifications” on page 27 for more details.
Cluster
A set of IS1200 appliance nodes working as a unit. A cluster can
contain a maximum of four nodes. A cluster can be used to control
other clusters, see “Information Center Server” on page 31 for details.
CAS
Content Addressable
Storage
Rather than addressing data objects by file name at a physical location,
a CAS device uses a content address (a hash-code identifier) based on
file contents to store file objects in a flat file system that maximizes
storage efficiency. Storing an object returns a unique identifier (its
Content Address) that is then used to retrieve the object.
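The idea of content addressing can be shown with a small in-memory sketch. The ContentStore class and the choice of SHA-256 are assumptions made for illustration; this guide does not state which hash function a CAS device actually uses.

```python
import hashlib


class ContentStore:
    """Hypothetical in-memory store keyed by a hash of the content."""

    def __init__(self) -> None:
        self._objects = {}  # flat mapping: content address -> bytes

    def put(self, data: bytes) -> str:
        # The address is derived from the content, not from a name or path.
        address = hashlib.sha256(data).hexdigest()
        self._objects[address] = data
        return address

    def get(self, address: str) -> bytes:
        return self._objects[address]


store = ContentStore()
first = store.put(b"quarterly report")
second = store.put(b"quarterly report")  # same content, same address
```

Storing identical content twice yields the same address and occupies a single slot, which is the storage-efficiency property described above.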
CSV
Comma Separated
Values
A file type used to transfer data between applications such as
databases and spreadsheets.
CLI
Command Line
Interface
The CLI is a traditional command line interface that allows direct
communications with the IS1200 “backend” using the set of
commands defined in the IS1200 Command Line Interface Reference
Guide.
Concepts Search
The standard IS1200 software supports keyword exploration.
However, in the initial stages of the legal discovery process (often
called eDiscovery), keyword search alone may not be as concise or as
time-efficient as required by standard legal timetables.
Concepts augments standard keyword searching by automatically
suggesting filters based on the results of a current search. By default it
looks for concepts based on persons, countries, noun groups,
organizations, company names, and products.
Concepts Search is an optional module that requires an additional
license key for each IS1200 cluster node. See the IS1200 Concepts
Search User and Configuration Guide for complete details.
conceptfinder Ruleset
The conceptfinder ruleset is an assignment ruleset that extracts the
concepts listed in the Review/Analysis Results Grouping Concepts pane,
which is only available when a valid Concepts license is installed on
the IS1200. The conceptfinder ruleset must be used in deep
classifications to get the best results in Review/Analysis from the
Concepts heading of the Results Grouping pane.
The ConceptFinder_DWF assignment ruleset combines both the
conceptfinder ruleset and the DocsWithoutFullText ruleset. See
“DocsWithoutFullText Assignment Ruleset” on page 28 for more
details.
connectors
Connectors are IS1200 optional modules that allow an IS1200 to work
with repository types beyond the standard CIFS and NFS
repositories. See “optional modules” on page 34 for more details.
Optional module connectors require separate licenses to be purchased
and installed on all nodes of an IS1200 cluster. For a complete list of
optional modules available, see the Introduction chapter of any IS1200
User Guide.
Some connectors, such as the Microsoft Exchange Server Connector,
require agents. Agents are additional server platforms, usually
Windows servers, that provide the additional CPU cycles and
network staging the IS1200 needs to work with the repository types
they connect to.
All connectors have their own user guides which can be accessed from
the Kazeon Documentation link on the IS1200 Manager page
(https://<yourIS1200Name>/manager).
Container file/object
A file (object) that contains other files (sub-objects), such as ZIP,
TAR, JAR, PST, or NSF files. The container file is often called the
“parent” and the contained objects are called “children”. Container
objects should not be confused with files that have embedded objects,
such as Microsoft Word files that have embedded charts or graphics
(OLE).
Custodian
A legal term used by Legal Service Providers (LSP) and other legal
personnel to describe the owners or responsible parties for electronic
documents pertinent (responsive) to a legal matter.
D
Data
A file of any type and size such as a short email, a word processor
document, or a large spreadsheet.
Datamap
A report that lists the electronic storage locations of all possible
sources of relevant ESI. This can include standard file servers,
groupware servers, and email servers (and their backup and archive
systems), as well as custodians’ desktop and laptop computers.
Data-Mount
The NFS file system that is accessed by the IS1200 to parse data and
extract metadata.
Data Server
The file server that exports an NFS or CIFS file system so that the
IS1200 can classify data on the file system to create metadata.
Data-Share
The CIFS file system to be accessed by the IS1200 to extract metadata.
Data Repository
A networked file system registered with the IS1200 so it can be
classified, searched, and reported on. Data repositories created on the
IS1200 itself (sometimes called localdatafs) are strongly
discouraged!
Data Verification
Builds on Auditing and is only available when system auditing is
enabled. For job services like Actionable Services Copy or Move,
Legal Hold Copy, and Single Step Collections, Data Verification
generates an audit trail proving that files were not altered during
these actions. This is especially valuable in eDiscovery situations.
Complete details are available in the Auditing and Data Verification
chapter of any IS1200 User Guide.
Deduplication
A process that identifies file or email object and sub-object duplicates
based on their digest values (See “Digest Values” on page 27 for
details).
In the 4.7.0 and prior versions of the IS1200 software, deduplication
was only available for export actions (Actionable Services such as
Download, Legal Export, and Copy). This allowed exporting only the
unique files and email objects from a set of search results. With IS1200
version 4.8.0, deduplication's functionality is expanded and is
automatically applied during case collections and processing to allow
displaying deduplicated search results. Note that when deduplication is
applied to the display of search results, duplicates are only
suppressed from display, whereas duplicates are physically removed
from exported file sets.
Deduplication is available only in the ECS version of IS1200 and is
applicable only in case context.
The search results view can be switched between a deduplicated and a
non-deduplicated view. This shows whether any object in the search
results has duplicates, and which results are duplicates of the
original.
Besides the automatic deduplication of collections and processing,
deduplication may also be started manually from the IS1200's case
dashboard.
Deduplication reports describing how a particular job or service
applied deduplication are available. The reports can be accessed from
the IS1200 case dashboard as well as from web search. Reports can list
all results, only unique (deduplicated) results, or percentages of
unique and duplicates.
Reduplication is a process that allows the duplicates of unique files to
be identified so tagging processes can apply metadata tags to each
unique file as well as all of its copies. Legal Tags reduplication can be
done after documents are added to the case.
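Deduplication and reduplication can both be pictured as operations on groups of objects that share a digest. The sketch below is illustrative only; the function names and the (path, digest) input shape are hypothetical and do not reflect IS1200 internals.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def group_by_digest(objects: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (path, digest) pairs so each digest maps to all of its paths."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for path, digest in objects:
        groups[digest].append(path)
    return dict(groups)


def unique_view(groups: Dict[str, List[str]]) -> List[str]:
    """Deduplicated display: show only the first object seen per digest."""
    return [paths[0] for paths in groups.values()]


def reduplicate_tag(groups: Dict[str, List[str]], tagged_path: str,
                    tag: str, tags: Dict[str, Set[str]]) -> None:
    """Reduplication: apply a tag to the tagged object and all its copies."""
    for paths in groups.values():
        if tagged_path in paths:
            for path in paths:
                tags.setdefault(path, set()).add(tag)


groups = group_by_digest([("a.doc", "d1"), ("b.doc", "d2"), ("copy/a.doc", "d1")])
tags: Dict[str, Set[str]] = {}
reduplicate_tag(groups, "a.doc", "privileged", tags)
```

In this sketch the deduplicated view hides copy/a.doc without deleting it, while the tag applied to a.doc is carried over to the hidden copy.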
Differential
Classifications
Differential classifications do not re-classify all file objects in the
selected repositories. Instead, they examine the metadata from
previous crawls, and if there is no previous metadata (indicating the
object is new since the last classification) or the metadata has changed
(based on atime, or mtime changes), then the object is parsed and its
metadata re-populated in the database.
Note: System classification configuration settings default to using mtime to
determine if files have changed for differential classifications. If atime is
desired instead, see the Using atimes for Differential Crawls section of the
Configuration Files and Utilities appendix of any IS1200 User Guide for details
on resetting the default to atime.
Additionally, atime may be applied only to selected classifications by
initiating them from the Command Line Interface, see the add service
deep-classification command and the crawl-atime-check-enabled
option in the IS1200 Command Line Interface Reference Guide for details.
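The decision a differential classification makes for each object can be sketched as follows. The FileTimes type and the needs_classification function are hypothetical names; only the rule itself (new objects are always parsed, and otherwise mtime is compared by default, or atime when selected) comes from the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FileTimes:
    """Timestamps recorded for an object (hypothetical structure)."""
    mtime: float
    atime: float


def needs_classification(previous: Optional[FileTimes], current: FileTimes,
                         use_atime: bool = False) -> bool:
    """Return True if a differential crawl should re-parse the object."""
    if previous is None:
        return True  # no earlier metadata: the object is new
    if use_atime:
        return current.atime != previous.atime
    return current.mtime != previous.mtime  # default comparison


recorded = FileTimes(mtime=10.0, atime=20.0)
read_only_access = FileTimes(mtime=10.0, atime=25.0)
skip = not needs_classification(recorded, read_only_access)
```

With the default mtime check, a file that was only read since the last crawl is skipped; the same file would be re-parsed if the atime check were enabled.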
Digest Values
Digests are numerical values calculated based on file and email
content and are unique for all unique objects. Digest values allow file
objects to be compared very quickly. Digests are calculated during
basic and deep classifications or during collections or processing
when indexing is enabled.
Digests are calculated differently for standard files, emails, and
container objects. For standard files, a physical digest is computed for
the entire file much like a hash value.
For email objects, just the subject, the message content (including
attachments), and certain specific addresses are combined and an
email digest value is calculated from the combination. Container
objects, like ZIP or PST files, and their sub-objects have digests
calculated both as complete objects and as individual sub-objects.
Note: Calculating email digests requires access to the email object's fullText
and only classifications that include the fullText rule can produce email
digests. Emails classified without the fullText rule receive the same physical
digest that other files do. Consequently, identical emails on different
repositories, one classified with and one without the fullText rule, will not be
identified as duplicates.
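The difference between a physical digest and an email digest can be illustrated with a short sketch. Everything here is an assumption made for illustration: the function names, the use of SHA-256, the length-prefixed encoding, and the normalization of addresses. The guide does not specify the algorithm or which addresses are included.

```python
import hashlib
from typing import Iterable, Sequence


def physical_digest(raw: bytes) -> str:
    """Digest over the entire stored object, byte for byte."""
    return hashlib.sha256(raw).hexdigest()


def email_digest(subject: str, body: str, attachments: Sequence[bytes],
                 addresses: Iterable[str]) -> str:
    """Digest over selected email fields only (hypothetical field choice)."""
    parts = [subject.encode("utf-8"), body.encode("utf-8"), *attachments]
    parts += [a.encode("utf-8") for a in sorted(a.lower() for a in addresses)]
    h = hashlib.sha256()
    for part in parts:
        # Length-prefix each part so field boundaries cannot be confused.
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.hexdigest()


# Two stored copies of one message that differ only in transport headers.
copy_a = b"Received: hop1\nSubject: Hi\n\nbody"
copy_b = b"Received: hop2\nSubject: Hi\n\nbody"
same_email = email_digest("Hi", "body", [], ["a@x.com"]) == \
    email_digest("Hi", "body", [], ["A@x.com"])
same_bytes = physical_digest(copy_a) == physical_digest(copy_b)
```

The two copies compare as the same email but not as the same bytes, which is why an email classified with the fullText rule and an identical one classified without it are not matched as duplicates.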
Domino Server (Lotus)
A Lotus server providing groupware solutions and storage.
Domino XML
Language (DXL)
A Lotus version of eXtensible Markup Language (XML) used to
import and export Lotus email files.
DocsWithoutFullText
Assignment Ruleset
Some file objects, such as graphics files (examples are .jpeg, .gif, or
.bmp files) contain no text, and hence will have no fullText
extracted by the FullTextRuleset; see “fullText” on page 30 for more
details. In legal cases, these files may still contain responsive
information, but not textual information that can be located by text
searches. The DocsWithoutFulltext assignment ruleset identifies these
files and adds the metadata tag and value
“DocWithoutFulltext=true” to all files that contain no searchable
text. This allows these files to be easily searched for later, and
inspected for legal responsiveness by non-search methods.
The ConceptFinder_DWF assignment ruleset combines the
DocsWithoutFullText ruleset with the conceptfinder ruleset. See
“conceptfinder Ruleset” on page 24 for more details.
Note: Parent file objects that don’t contain text (such as .zip, .tar, and .pst files)
are not tagged with the DocWithoutFulltext tag.
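The tagging behavior of the DocsWithoutFullText ruleset can be approximated with a short sketch. The function name, the extension list used to recognize parent objects, and the treatment of whitespace-only text as "no text" are assumptions; only the tag name and value come from this guide.

```python
import os
from typing import Dict

# Hypothetical list of parent (container) types, taken from the note above.
CONTAINER_EXTENSIONS = {".zip", ".tar", ".pst"}


def docs_without_fulltext_tags(path: str, full_text: str) -> Dict[str, str]:
    """Return the tags this rule would add for one classified object."""
    extension = os.path.splitext(path)[1].lower()
    if extension in CONTAINER_EXTENSIONS:
        return {}  # parent objects are never tagged
    if full_text.strip():
        return {}  # searchable text exists, so no tag is needed
    return {"DocWithoutFulltext": "true"}


image_tags = docs_without_fulltext_tags("scan.jpeg", "")
archive_tags = docs_without_fulltext_tags("mail.pst", "")
```

A search for the tag then returns image-like files that need manual inspection, without also returning every container file.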
Documentum Server
(EMC)
The EMC Documentum server manages business content including
documents, photos, video, medical images, e-mail, Web pages, fixed
content, XML-tagged documents, and so on. The Documentum core
is a repository that stores content securely under compliance rules
and appears as a unified environment, even though content may
reside on multiple servers and physical storage devices within a
distributed environment.
E
eDiscovery
The process of reviewing electronic files to determine their relevance
and responsiveness to a legal matter or case.
eDiscovery Case
Manager
An IS1200 tab that facilitates eDiscovery for Legal Service Providers.
Electronic Discovery
Reference Model
(EDRM)
The EDRM was a Project created to provide standards and guidelines
for the electronic discovery market. The model defines a common,
flexible and extensible framework for the development, selection,
evaluation and use of electronic discovery products and services.
Enterprise Vault
A Symantec networked repository for archived email.
eth1, eth2
Most IS1200 platforms require two ethernet connections for proper
deployment. These connections are called eth1 and eth2, must each
have unique IP addresses, and must be gigabit (1 Gb/sec) or faster
connections. Additionally, all network segments between eth1 and all
registered metadata and data repositories must be gigabit.
eth1 is used to communicate between the IS1200 and its registered
repositories. The IS1200 hostname should be DNS mapped to the eth1
IP address.
eth2 must be connected to a private network between the IS1200
nodes and is used to coordinate and balance system wide operations.
The eth2 IP address should not be DNS mapped.
Extended Attributes
User-defined keywords that are extracted during data classification.
Extraction Rules
Extraction rules are a type of classification rule. They extract
user-defined keywords (custom metadata) to add to the metadata file.
Extraction rules are grouped into Extraction Rule Sets (ERSs). See the
Policies: Classification, Extraction and Assignment Rules chapter of any
IS1200 User Guide for more details.
Exchange Server
(Microsoft)
A Microsoft server designed to store and manage email.
F
Federation
A defined group of member-clusters on a Federation server that can
be managed, searched, and reported on as a group. Member-clusters
are referred to as Federated clusters.
Federation Server
A single-node IS1200 server, with a Federation license, that allows
consolidated searching and reporting of up to eight Federated
member-clusters of its defined Federation.
Filer
A file server that exports its file systems using NFS or CIFS protocol.
fullText
fullText is the “content” portion of a file, for example this is the textual
content of word processing files and the message body of emails.
fulltext is an extraction rule that is used to save file textual content as
metadata to the Search Index during classifications. It saves up to 10
megabytes of content by default. This default may be changed, but it
is not recommended. Fulltext extraction is required by
Review/Analysis for the Previewer pane to work and to generate
Concepts in the Results Grouping pane.
fulltext is extracted differently for container objects and sub-objects, and
for files with embedded objects.
Container objects (such as ZIP or PST files) and their sub-objects
are classified individually, and the fulltext of the parent container
file and of each child sub-object is extracted and added to the
relevant metadata repository separately.
Files with embedded objects (such as a Microsoft Word file with an
embedded spreadsheet) are classified together. The fulltext of the
embedded object is included in the fulltext of its parent object and
not collected separately.
For more details on fullText, see Chapter 1 of the IS1200 Metadata
Reference Guide.
G
Groupware
Collaborative software designed to help people involved in common
tasks achieve their goals. Incorporates services such as email,
calendaring, text chat, wiki, web-sharing, document control, and
advanced search.
H
Hash Values
Hash values are used to compare one file with another for duplicates.
An extremely simplified description of hashing is that the numeric
values of all bytes in a file are added into a grand total. The chances of
two different files yielding the same result (hash value) are extremely
small, so hash values can be used to identify duplicate files, or to
compare files with the same name to decide if they have been
modified.
Computing a hash on an entire file is called a full-hash, and computing
a hash on a portion of the file is called a partial-hash. A partial hash
may be used to increase classification speed, and hashing can be
turned on or off; turning it off also increases classification speed.
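The trade-off between a full hash and a partial hash can be seen in a minimal sketch. The function names, the use of SHA-256, and the choice of hashing only the leading bytes are assumptions; the guide does not say which portion of a file a partial hash covers or which algorithm is used.

```python
import hashlib


def full_hash(data: bytes) -> str:
    """Hash every byte of the file content."""
    return hashlib.sha256(data).hexdigest()


def partial_hash(data: bytes, limit: int = 4096) -> str:
    """Hash only the first `limit` bytes (hypothetical choice of portion)."""
    return hashlib.sha256(data[:limit]).hexdigest()


# Two files that are identical except for the final byte.
file_a = b"A" * 5000 + b"x"
file_b = b"A" * 5000 + b"y"

partial_match = partial_hash(file_a) == partial_hash(file_b)
full_match = full_hash(file_a) == full_hash(file_b)
```

The partial hash reads less data and is therefore faster, but in this sketch it reports the two files as identical, while the full hash detects the difference.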
I
identity
A single entry in the Identity Vault database. The identity contains a
single username and password that the IS1200 can retrieve when it
needs to access a registered data or metadata repository or another
server such as an authentication service.
Identity Vault
An encrypted database of usernames and passwords the IS1200 uses
to store the credentials used to access registered data repositories,
send email notifications, and work with authentication services.
Information Center
Server
The standard IS1200 server offers clustering as a scalable solution for
classifying, searching, and reporting on registered network
repositories. While clustering is ideal for scaling to large numbers of
files on a LAN, it is not a viable solution for WANs. Enterprises with
multiple IS1200 clusters deployed, or IS1200 clusters deployed in
remote offices, need the ability to set up and manage unified reports
and searches across all their clusters. The IS1200 Information Center
server provides this solution.
Each Federation server supports one federation. A Federation may
have up to eight clusters (with four nodes each) included in it. Once a
federation is established, it becomes a central management point
allowing classifications, search, and reports to be set up or managed
on all the federation's members from the Information Center server.
See the IS1200 Information Center User and Configuration Guide for
complete details.
Intelligent Platform
Management
Interface (IPMI)
IS1200 clusters may contain more than one node. Normally each node
communicates with the others to share information and workload.
The IS1200 appliance includes an Intelligent Platform Management
Interface (IPMI) to shut down nodes when individual node or
software errors would degrade the overall cluster performance. The
IPMI is an autonomous micro-controller—installed in all cluster
nodes—used by the cluster’s “leader” node to power down nodes
with errors or performance problems. The IPMI requires its own
unique IP address, but communicates over the eth1 port, see “eth1,
eth2” on page 29 for more details.
K
Kazeon EVAgent
An IS1200 service, installed on the Enterprise Vault server, that allows
the IS1200 to directly open and access Enterprise Vault email for
classification services.
Kaz-mount
The NFS file system on which the IS1200 stores metadata (the IS1200
metadata repository).
Kazeon Query
Language (KQL)
A programming language used in classification and assignment rules
to identify files that should receive specified metadata tags.
KQL Reserved Words
The KQL language reserves the following words. Consequently, they
are not allowed to be searched for, or used as tags or aliases.
"ADD", "ALL", "ALTER", "AND", "ANY", "AS", "ASC", "AVG",
"BETWEEN", "BY", "CASCADE", "CHECK", "COLUMN", "COUNT",
"DESC", "DISTINCT", "ESCAPE", "EXISTS", "FROM", "FULL",
"GRANT", "GROUP", "HAVING", "IN", "INTO", "IS", "JOIN", "KEY",
"LEFT", "LIKE", "MAX", "MIN", "NOT", "NULL", "ON", "OR",
"ORDER", "OUTER", "REVOKE", "RIGHT", "SELECT", "SET", "SUM",
"UNION", "UNIQUE", "UPDATE", "VALUES", "VIEW", "WHERE"
Kaz-server
The file server where the metadata repository is located.
Kaz-share
The CIFS file system on which the IS1200 stores metadata.
Kaz Schema
Defines the set of metadata fields used to build a Search Index for
registered data repositories (file systems).
L
Legal Hold
Files placed on legal hold are either copied to a secure secondary
location where they can be preserved for later use, or are locked in their
original locations against further change until a legal matter is
resolved.
Legal Service Provider
(LSP)
A lawyer or trained legal professional that provides legal services for
a fee.
Local
Refers to the local resources (usually the metadata repository) of the
Federation server.
localdatafs
A data repository created on the IS1200 itself. This practice is not
recommended.
localkazfs
A metadata repository created on the IS1200 itself. This practice is not
recommended.
Logging rule
Logging rules audit user actions on files such as file access, creation,
modification, and deletion.
M
Manifest Reports
Manifests are reports that summarize the results of an IS1200 job or
service. Manifests are produced for Collections (from either
Administration or the Case Mgmt) and for some Actionable Services.
Collection Manifests summarize which files were, or were not,
collected during a collection. Actionable Service Manifests reconcile
Actionable Services object-counts with the search result object-counts
they are performed on, because processes such as deduplication can
result in the two counts not matching. The report details the count of
differences and the reasons for the differences. For more information,
see Manifests in the IS1200 Web-Search User Guide.
Note: Collection manifests are available ONLY for collections done from
v4.6.0 or later; earlier versions did not generate collection manifests.
Member-cluster
Any of the clusters registered to a particular Federation.
Metadata
Data about data. Metadata is used to search for information and to
create reports. Metadata can be file system or custom metadata that
the IS1200 extracts from files during classification. File system
metadata includes file type, and file path extracted during basic
classification. Custom metadata is generated during deep
classification.
Metadata Repository
A registered repository the IS1200 uses exclusively to record the
metadata extracted during classification services on the registered
data repository the metadata repository is mapped to.
The primary metadata repository is the host of the repository
registration database, the report results database, Environment
Discovery job results, Auditing and Data Verification databases, and
miscellaneous databases the cluster requires for routine operation.
Collectively these are called the Cluster Data Base.
Metadata repositories created on the IS1200 itself (sometimes called
localkazfs) are strongly discouraged!
N
Namespaces
IS1200 software, versions 4.0 and higher, organizes metadata fields
into a hierarchy defined by namespaces. Namespaces group similar sets
of tags, for example all the file level tags such as FileType, FileSize,
aTime, and cTime are grouped together in the System namespace. See
the IS1200 Metadata Reference Guide for complete details.
Network File System
(NFS)
A protocol used primarily by Unix based computers for accessing
computer systems and filers over a network.
Network Information
System (NIS)
A network naming, administration, and authentication system for
smaller networks that was developed by Sun Microsystems and is
used primarily by Unix systems.
Node
A single IS1200 appliance.
Notes Storage File
(NSF)
A standardized storage file format used by Lotus to store email,
attachments, notes, calendars, and so on.
O
optional modules
The standard IS1200 license provides a default set of features that
allows the IS1200 to register, classify, and search and report on CIFS
and NFS data repositories. Optional modules are additional software
licenses that can add further capabilities, such as being able to work
with repository types other than CIFS and NFS, or providing
Concepts Search capabilities, or applying legal hold. Some optional
modules require connectors, see “connectors” on page 25 for more
details. For a complete list of available optional modules, see the
Introduction chapter of any IS1200 User Guide.
P
PEA Files
A Pool Entry Authorization (PEA) file is generated by the Centera
server administrator. A PEA file defines what applications and users
can perform read, write, delete, query, copy, or hold operations for
Centera objects.
Policy Groups
Associates one or more authorization rules and logging rules with one
or more files to protect information and audit user actions on files.
PST Files
Personal STorage files are generally used by email programs like
Microsoft Outlook to store user email locally. PST files are also called
“composite” files, because they are packages meant to efficiently
store a number of smaller related files. Another example of a
composite file is a ZIP storage file.
R
Retention
The process of enforcing corporate or legal standards for how long
certain kinds of files must be preserved for access. Examples of
retained files include files responsive to legal matters and medical
records.
Roles
All IS1200 users have a role, either admin, auditor, or end-user. If a
legal license is installed, there may also be legaladmin,
legalsupervisor, legalreviewer, or custodian roles. Roles
determine what parts of the IS1200 interface may be seen, and how
much of search and report results are displayed.
S
Search Analytics
Pre-Processing
Search Analytics Pre-processing was introduced in release 4.5.0 to
minimize search results display time and improve the overall
efficiency of eDiscovery culling. Analytics Pre-processing is an
integral, automatic, post-processing job performed after any job that
modifies the Search Index. Analytics Pre-processing trades an
increased post-job indexing period for significantly reduced search
results display times after the affected jobs complete.
A variety of jobs require Search Index changes and therefore require
Analytics Pre-processing. These include Collections, Classifications,
Delete, and Tagging jobs. The time required by Analytics
Pre-processing is determined primarily by the number of objects in
the affected data repository, the number of distinct analytic (result
filter grouping) attributes (such as custodians, mail senders, mail
recipients, sender domains, recipient domains, and so on), and the
read/write performance of the metadata repository associated with
the data repository.
Additionally, once any Analytics Pre-processing job is launched, all
subsequent Analytics Pre-processing jobs (that might be required by
other concurrent jobs-in-progress) wait for the current Analytic
Pre-processing job to finish. However, before beginning any Analytics
Pre-processing job for a particular data repository, the IS1200 checks
all other jobs-in-progress for that repository to see if they might also
require Analytics Pre-processing. If other jobs are found, the IS1200
waits for all these jobs to finish in order to launch a single Analytics
Pre-processing job for all the jobs that affected the Search Index for
that data repository.
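The grouping behavior described above can be sketched roughly as follows. This is an illustrative model only, not the IS1200 scheduler: the function name, the (job name, repository, finish time) tuple shape, and the sample job names are all assumptions made for the example.

```python
def plan_preprocessing(jobs):
    """Group index-modifying jobs by repository (hypothetical model).

    jobs: iterable of (job_name, repository, finish_time) tuples.
    Returns {repository: (start_time, [job_names])}, where start_time is
    the finish time of the last job in the group, because a single
    pre-processing run waits for every in-progress job on that repository.
    """
    plan = {}
    for name, repo, finish in jobs:
        start, names = plan.get(repo, (0, []))
        plan[repo] = (max(start, finish), names + [name])
    return plan


# Assumed sample data: two jobs touch repoA, one touches repoB.
jobs = [
    ("collect", "repoA", 10),
    ("tag", "repoA", 25),
    ("classify", "repoB", 15),
]
plan = plan_preprocessing(jobs)
```

In this model, repoA receives one pre-processing run that starts only after both of its jobs have finished, rather than one run per job.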
Therefore, there are two best practices suggested for scheduling jobs
that affect the Search Index:
• Schedule large classifications or collections so that both they
and the Analytics Pre-processing they require can fully
complete before any other job starts. This allows the IS1200 to
schedule the required processing resources most efficiently. Large
jobs are those that affect data repositories with tens of thousands
of objects or terabytes of data.
• Schedule small jobs (such as incremental collections or
post-search tagging operations) to run concurrently so the IS1200
can identify their common Analytics Pre-processing requirements
and group them into a single job.
Note: IS1200 systems that are upgraded to v4.5.0 may need some additional
configuration to make the most efficient use of Analytics Pre-processing. See
the Configuring the IS1200 To Use Proactive Indexing section of the Configuration
Files and Utilities appendix of any IS1200 User Guide for complete details.
Search Index
An IS1200 database that stores and indexes the file content metadata
(including extended attributes and fullText) for standard and custom
user-defined metadata produced by extraction rules during
classifications.
SharePoint Server (Microsoft)
A Microsoft server in the groupware category.
snippets
A snippet is a subset of a document’s actual content. Snippets are
only displayed if they are enabled in Review/Analysis Preferences, and
only in Paragraph View immediately under the first line of the result
listing.
After a keyword search completes, result snippets are created as
small standard size chunks of data taken from the text surrounding a
search query hit. For example, if a search is made for “medicine”, the
snippet will contain about 300 bytes of the text surrounding the
paragraph where the word “medicine” was found. If multiple search
hits are found, the most relevant hit is used to create the snippet. For
searches made without keywords, snippets are simply the first 300
bytes of file text.
Snippet size is configurable; see the Configuration Files and Utilities
appendix of any IS1200 User Guide for details on setting snippet size.
In all cases, snippets are taken from the result file’s fullText.
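A minimal sketch of the snippet behavior described above, assuming a 300-byte default window. The function name and parameters are hypothetical, and the sketch uses the first keyword hit as a stand-in for the "most relevant" hit, since the relevance ranking is not described here.

```python
def make_snippet(full_text, keyword=None, size=300):
    """Return roughly `size` bytes of text (hypothetical helper).

    With a keyword, the window is centered on the first hit (a stand-in
    for the most relevant hit). Without a keyword, or when there is no
    hit, the window is the first `size` bytes of the text.
    """
    data = full_text.encode("utf-8")
    pos = -1
    if keyword:
        needle = keyword.lower().encode("utf-8")
        # bytes.lower() only folds ASCII and preserves byte offsets.
        pos = data.lower().find(needle)
    if pos < 0:
        window = data[:size]
    else:
        start = max(0, pos - (size - len(needle)) // 2)
        window = data[start:start + size]
    # A window may cut a multi-byte character; drop the broken bytes.
    return window.decode("utf-8", errors="ignore")


text = "x" * 500 + " medicine " + "y" * 500
hit_snippet = make_snippet(text, "medicine")
plain_snippet = make_snippet(text)
```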
SourceOne Archive Server (EMC)
The EMC SourceOne server is a comprehensive, policy-based system
that automatically collects, organizes, indexes and retains messages
and associated attachments and stores them in designated archives
connected to shared storage. EMC SourceOne provides indexed
searching that works with both EMC storage and other brands such
as IBM or NetApp.
Special Characters
The IS1200 supports alphanumeric ASCII and UTF-8 characters.
Non-alphanumeric ASCII characters are defined as Special Characters
and include the following:
'"-_\/!@#$%^&*+={}[]()<>|:;,.?~`
Special characters are not universally supported in the IS1200
interfaces. The following limitations must be noted:
Search Queries and Special Characters: Special characters pose a
searching challenge. Because the IS1200 tokenization removes special
characters from indexed text as it is classified, special characters are
never entered into the IS1200 metadata indexes. Consequently,
special characters cannot be searched for directly. For more details,
see Tokenisation and Stemming in the IS1200 Web-Search User Guide.
While special characters cannot be searched for directly, the text
that contains them can be searched. For example, the string
"-ACME-" is tokenized on the hyphens and recorded in the metadata
only as "ACME". Consequently, searching for the string with the
hyphens (-) will NOT work. However, you can search for “?ACME?”
(using the question mark wildcard), which returns results such as
“!ACME!”, “@ACME.”, and so on. See the IS1200 Web-Search User
Guide for more details on wildcards.
Note: The question mark character ( ? ) may not be searched for in filepaths,
even when escaped. This exception is limited to filepath searches only.
AD login names and NIS login names support only alphanumeric
ASCII and UTF-8 characters. They do NOT support the following special
characters:
'"-_\/!@#$%^&*+={}[]()<>|:;,.?~`
However, in Active Directory (AD), registered users may have both an
AD login name and a display name. For example, John Smith may have
the AD login name “jsmith” and the display name “John Smith”.
When new legal supervisors or reviewers are created in the Case
Mgmt using the AD lookup button, they take the display name, not the
login name, and the display name may contain special characters as
described below.
Legal Supervisor Names and Legal Reviewer Names only,
support:
'-_!@#$%^&*+={}[]()|:;,.?~`
do NOT support:
"\/<>
Custodian Names only,
support:
'"-_!@# %^&*+={}[]()|:;,.?~`
do NOT support:
\/<>$
Case Names, Legal Export Profile Names, Repository Names,
Rule Names, and Policy Names only,
support:
_ (underscore)
do NOT support: " - \ / ! @ # $ % ^ & * + = { } [ ] ( ) < > | : ; , . ? ~ `
Email IDs which are used in Legal Hold notification and
Acknowledgements, Search filters, Collection filters and so on,
support:
'-_!#$%^&{}:;,.?~`
do NOT support: " \ / @ * + = [ ] ( ) | < >
Mail Domain Names DO NOT support any special characters.
File names/ Directory names in source and destination file names
only,
support:
'-_!@^+={}[]()<>;,.~
do NOT support: " \ / # $ % & * | : ? `
Tag Names only,
support:
_ (underscore)
do NOT support: ' " - \ / ! @ # $ % ^ & * + = { } [ ] ( ) < > | : ; , . ? ~ `
Tag Values only,
support:
'-_\/!@#$%^&*+={}[]()<>|:;,.?~`
do NOT support: " (double quote)
Rule definitions: Special characters must be “escaped” before they
may be used in rule definitions. To escape a character, place a
backslash (\) before the character.
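The per-field restrictions listed above lend themselves to a simple lookup table. The following is a hedged sketch covering only four of the field types; the dictionary keys and the function name are hypothetical, while the character sets are copied from the "do NOT support" lists above.

```python
# Unsupported special characters per field type, copied from the lists
# above. The keys are illustrative names, not IS1200 identifiers.
UNSUPPORTED = {
    "legal_reviewer_name": set('"\\/<>'),
    "custodian_name": set('\\/<>$'),
    "file_name": set('"\\/#$%&*|:?`'),
    "tag_value": set('"'),
}


def unsupported_chars(field, value):
    """Return the sorted characters in `value` that `field` does not support."""
    return sorted(set(value) & UNSUPPORTED[field])


ok_custodian = unsupported_chars("custodian_name", "O'Brien & Sons")
bad_custodian = unsupported_chars("custodian_name", "A$B")
bad_file = unsupported_chars("file_name", "report:v1?.txt")
bad_tag = unsupported_chars("tag_value", 'say "hi"')
```

An empty result means the value uses no character that the field rejects. A name check of this kind could run before a name is submitted, but it is not a substitute for the validation the IS1200 itself performs.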
Search technology uses reserved words, stop words, special
characters, tokenizers, and so on. These are common to almost ALL
search technologies and are not specific to the Kazeon search engine.
One major reason for this design is that if all characters and words
were indexed regardless of semantics, the search index could grow
beyond any manageable size, because such tokens occur so frequently.
Besides, there is not much value in indexing stop words (as, the, or,
and so on) and tokenizers (@ , . - and so on). The omission of such
characters from your search query is just a part of the optimization.
For example, when you search for (1+1):2, the characters “(”, “+”,
“)”, and “:” have special meaning in search: the parentheses are used
to specify grouping, the plus is used to specify inclusive terms in a
query, and the colon is used to separate a tag from its value, as in
"filepath:*". In order to use these in your query, you need to escape
them with a backslash, as follows:
\(1\+1\)\:2
However, the escaping does not mean the characters are now a part of
your query. It only means that those characters are not interpreted by
search with special semantics. This query is preprocessed to drop
those characters from the final query which appears as follows when
it is actually executed by the search engine:
fulltext:"1 1 2"
This means that we are searching for a 1 followed by a 1 which is
again followed by a 2 such that there are no other valid indexable
search tokens between the three numbers. The results may match
1-1+2
1:1:2
1-1-2 and so on.
However, they will not match
1:3:1:2
1-43+1:2 and so on.
Hence to search for (1+1):2, use the following query:
\(1\+1\)\:2
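The escaping and preprocessing steps walked through above can be approximated in a few lines. This is a sketch under assumptions: the reserved set contains only the four characters named in the text, the function names are hypothetical, and the phrase match is a simplified model of "no other indexable tokens between the three numbers".

```python
import re

RESERVED = "()+:"  # assumption: only the characters named in the text


def escape_query(text):
    """Prefix each reserved character with a backslash."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in text)


def tokens(text):
    """Keep only runs of ASCII letters and digits (simplified tokenizer)."""
    return re.findall(r"[A-Za-z0-9]+", text)


def phrase_query(text):
    """Model the preprocessed query that is actually executed."""
    return 'fulltext:"' + " ".join(tokens(text)) + '"'


def phrase_matches(query_text, document_text):
    """True if the query tokens appear consecutively in the document tokens."""
    q, d = tokens(query_text), tokens(document_text)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


escaped = escape_query("(1+1):2")
executed = phrase_query(escaped)
```

Under this model, "1-1+2" and "1:1:2" match the query, while "1:3:1:2" and "1-43+1:2" do not, because another indexable token sits between the numbers.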
stop words
Stop words consist of the most commonly used words in sentences,
such as “a”, “an”, “the”, and “and”. If indexed individually, they
would consume an excessive amount of metadata storage space, and
consequently are not individually indexed.
If stop words are used in a search query, they are ignored unless they
are part of a quoted phrase. The table below lists all stop words:
Table 2. Stop Words
a      an     and    are    as     at     be     but
by     for    if     in     into   is     it     not
of     on     or     such   that   the    their  then
there  these  they   this   to     was    will   with
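As a rough illustration of the rule that stop words are ignored unless they are part of a quoted phrase, the sketch below filters a query string. The function name and the query-splitting pattern are assumptions; the word list is the one from Table 2.

```python
import re

STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but",
    "by", "for", "if", "in", "into", "is", "it", "not",
    "of", "on", "or", "such", "that", "the", "their", "then",
    "there", "these", "they", "this", "to", "was", "will", "with",
}


def strip_stop_words(query):
    """Drop bare stop words; keep quoted phrases exactly as written."""
    # Split into quoted phrases and whitespace-separated bare terms.
    parts = re.findall(r'"[^"]*"|\S+', query)
    return [p for p in parts if p.startswith('"') or p.lower() not in STOP_WORDS]


kept = strip_stop_words('the cat and "the hat"')
```

Here the bare "the" and "and" are dropped, while the quoted phrase keeps its stop word.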
Stemming
Stemming is a search technique designed to increase search efficiency
and broaden relevant search hits. When stemming is used, fullText
indexing first attempts to identify each word’s “stem”, and then
indexes words by their stems. For example, the words “connected”,
“connecting”, and “connectable” all share the same stem and are
indexed under “connect”. Search query criteria are automatically
stemmed, and so querying “connected” returns all instances where
“connect”, “connected”, and “connecting” are used. Nouns like
“connector” are not stemmed. Stemming is ON by default but may be
disabled.
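The idea of indexing words by their stems can be shown with a deliberately naive suffix stripper. This is a toy and not the stemming algorithm the IS1200 uses, which is not described here. The suffix list and the minimum stem length are assumptions chosen only to reproduce the “connect” example.

```python
SUFFIXES = ("able", "ing", "ed")  # assumed toy suffix list


def toy_stem(word):
    """Strip one known suffix if at least three letters remain."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


stems = {w: toy_stem(w) for w in
         ("connect", "connected", "connecting", "connectable", "connector")}
```

With this toy, the three inflected forms share the stem “connect”, while “connector” is left unchanged. A real stemmer handles many cases that this sketch gets wrong.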
stubs
Stubs are created by many file archiving applications, most notably
email archiving systems. When stubbing is used, and a file object is
moved to archival storage, a “stub” is left behind on the original file
system that points to the archived file’s new location. Thereafter, if a
user attempts to open the archived file from the original filer, the stub
allows that filer to retrieve the archived file and return it to the user
transparently (as if it were still on the original filer).
Stubs may be searched for using the metadata field
“mailMessageClass”. For example, use the search query
“mailMessageClass:IPM.Note.ExShortcut” to find email message
stubs.
sub-objects
A file found inside a “container object”. See “Container file/object”
on page 25 for more details. A container file is often called the
“parent” and the contained sub-objects are called “children”.
Sub-objects should not be confused with embedded files such as OLE
objects, for example spreadsheets or graphics embedded in a
Microsoft Word file. Note, however, that an email message may be a
container object if it “contains” attachments, or a simple
non-container object that still has a graphic embedded in its body.
T
Tags
The names of metadata fields. Tags are always associated with a
value. For example, the metadata tag “filename” for any given file is
always followed by a value (a text string) containing the actual
filename.
Tokenization
Tokenization is an IS1200 classification procedure that breaks word
strings into “tokens” for better search results. During classifications,
Numbers, AlphaNums, HostNames and EmailAddresses (in
fullText) are tokenized in the same way as alpha-only strings. With
tokenization, the strings “www.kazeon.com”, “fred@kazeon.com”,
and “11,22,333,44” are tokenized into separate words, yielding
“www”, “kazeon”, and “com”; “fred”, “kazeon”, and “com”; and
“11”, “22”, “333”, and “44”. This allows searching for “kazeon” and
getting all email addresses that contain the domain name.
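The examples above can be reproduced with a short regular-expression sketch. It assumes that every non-alphanumeric character, including the underscore, acts as a token separator. The function name is hypothetical, and the IS1200 tokenizer may differ in details not covered here.

```python
import re


def tokenize(text):
    """Split text into runs of letters and digits (Unicode-aware)."""
    # [^\W_] matches any word character except the underscore.
    return re.findall(r"[^\W_]+", text)


host = tokenize("www.kazeon.com")
email = tokenize("fred@kazeon.com")
numbers = tokenize("11,22,333,44")
acme = tokenize("-ACME-")
```

Because both the host name and the email address yield the token “kazeon”, a search for that token finds both.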
U
UTF-8
Unicode Transformation Format - 8 is an 8-bit coding scheme for
digitally representing both the standard western alphabet (Aa-Zz)
and its punctuation characters, and non-western word characters
such as the glyphs found in the Chinese, Japanese, and Korean
languages. UTF-8 encodes all its characters as sequences of 8-bit
bytes (or octets). The first 128 UTF-8 characters are identical to the
first 128 ASCII characters and require only one byte each. All other
characters are coded using two to four octets each. UTF-8 can encode
all of the 1,112,064 code points in the Unicode character set, which
covers the majority of languages in use around the world.
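The variable-length encoding is easy to observe with Python's built-in string encoder. The helper name and the sample characters below are chosen only for illustration.

```python
def utf8_length(ch):
    """Return the number of octets UTF-8 uses for one character."""
    return len(ch.encode("utf-8"))


lengths = {
    "A": utf8_length("A"),    # ASCII letter
    "é": utf8_length("é"),    # Latin letter with accent
    "中": utf8_length("中"),  # CJK ideograph
    "😀": utf8_length("😀"),  # character outside the Basic Multilingual Plane
}
# ASCII text has the same bytes in both encodings.
same_as_ascii = "IS1200".encode("utf-8") == "IS1200".encode("ascii")
```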
W
Web-Admin
An IS1200 web application used by IT personnel to administer the
server itself and, when the IS1200 is used for that purpose, to help
administer other IT resources. Administration is the preferred
interface for administering the server.
Web-Reports
An IS1200 web application that provides advanced reporting
capabilities based on IS1200 metadata.
Web-Search
An IS1200 web application that provides basic, advanced, and
specialized email searches against IS1200 metadata.
X
XML eXtensible Markup Language
A file type that uses the XML language to define and describe data
that can be transferred between applications like databases and
spreadsheets.
Index
A
Access Control List (ACL) 21
Actionable Services 21
Actions 21
Active Directory (AD) 21
Advanced Search 22
Assignment Rule 22
Auditing 22
Authentication 22
Authorization Rule 22
B
Basic Search 22
C
CAS 24
CAS Device 22
Case Manager 29
CASID 22
Celerra FLR Repositories 10
Celerra FLR repositories 8
Celerra FLR Retention Manager 8
Celerra FLR server 9
Celerra FLR Servers 2
Celerra Server 23
Centera Server 23
checkpointing 23
CIFS 2
Classification Rules 23
Classification Service 23
CLI Command Line Interface 24
Cluster 24
conceptFinder Ruleset 24
Concepts Search 24
connectors 25
Container 25
Content Addressable Storage 24
CSV Comma Separated Values 24
Custodian 25
D
data 25
Data Repository 26
Data Server 25
Data Verification 26
Datamap 25
Data-Mount 25
Data-Share 26
Deduplication 26
Differential Classifications 27
DocsWithoutFullText 28
Documentum Server 28
Domino Server 28
Domino XML Language 28
DXL 28
E
eDiscovery 28
eDiscovery Case Manager 29
Electronic Discovery Reference Model 29
EMC
Celerra Server 23
Centera Server 23
Documentum Server 28
SourceOne Archive Server 37
Enterprise Vault 29
eth1 29
eth2 29
EVAgent 32
Exchange Server 29
Extended Attributes 29
Extraction Rule 29
F
Federation 29
Federation Server 29
file-level retention 2
Filer 29
FLR Repository Registration 8
FLR-specific retention 3
full-text 30
G
groups
policy 34
Groupware 30
GUIs
eDiscovery Case Manager 29
Web-Admin 42
Web-Reports 42
Web-Search 42
H
Hash Values 30
I
identity 31
Identity Vault 31
Intelligent Platform Management Interface 31
IPMI 31
K
Kaz Schema 32
Kazeon EVAgent 32
Kazeon Query Language 32
Kaz-mount 32
Kaz-server 32
Kaz-share 32
L
Legal Hold 32
Legal Service Provider (LSP) 32
Local 32
localdatafs 32
localkazfs 33
Logging rule 33
Lotus
Domino 28
M
Manifest Reports 33
Member-cluster 33
Metadata 33
Metadata Repository 33
Microsoft
Active Directory 21
Exchange Server 29
SharePoint 36
Mount Path 12
N
Network File System (NFS) 34
Network Information System (NIS) 34
NFS 2
Node 34
Notes Storage File 34
NSF 34
O
optional modules 34
P
PEA Files 34
Policy Group 34
PST 35
R
Reporting 20
reports
manifest 33
Web-Reports 42
Retention 35
Retention Class Selection 18
Retention reports 20
Roles 35
rules
classification 23
extraction 29
logging 33
S
Search
advanced 22
Basic 22
Web-Search 42
Search Index 36
Searching 19
SharePoint Server 36
Snaplock 20
SnapLock repositories 20
snippets 36
source repository 15
SourceOne Archive Server 37
Stemming 40
stop words 40
sub-objects 41
Symantec
Enterprise Vault 29
T
Tags 41
target repository 15
Tokenization 41
W
Web-Admin 42
Web-Reports 42
Web-Search 42
X
XML eXtensible Markup Language 42