LTPI Production Glossary 20151110 For Public Comment

PRODUCTION GLOSSARY
The Legal Technology Professionals Institute Production Glossary is designed as an educational resource on terminology used in connection with producing
electronically stored information. While a number of useful industry-wide glossaries exist, we could not find one that specifically discussed document
production, nor one that discussed not only the “what”, but also the “why”, so we created one. You will find the words below used by both lawyers and
litigation support technicians. You will also find these terms used in ESI stipulations as well as in Load Files. This is the language of “Production”.
This is a work in progress. If you are confused by a term or think we have missed one, please contact us at programs@legaltechpi.org. Please also visit
our website http://www.LegalTechPI.org/. If you find LTPI’s work applicable to you, please join LTPI and contribute your time and resources today.
TERM
DEFINITION
SOURCE
Native Production
A Native Production is one of the three primary methods of producing ESI. A Native Production contains three main
components:
(1) A copy of each original electronic document being produced, in the format created by the authoring / native
application such as Word (DOC) or Excel (XLS/XLSX), i.e., the Native File;
(2) The text extracted from each Native File during processing, provided in an associated .TXT File; and
(3) A Load File containing the metadata and source information extracted from each Native File during processing,
provided in an associated .DAT, .LEF, .DII or other similar standard format.
The advantage of Native Production is that, if the files are preserved properly, Metadata and Extracted Text should be
100% accurate and intact, improving the efficiency of the review and analysis of produced ESI. Native Production also
preserves potential evidence in its “best evidence” form throughout the process. However, the receiving party will need a
matching application for each file type produced, or Viewer Software such as “QuickViewPlus” (or similar) that can
read and render most standard Native Files in readable format.
TIFF Production
TIFF Production is one of the three primary methods of producing ESI. A TIFF Production should usually contain four
primary components:
(1) A series of Static Images showing each page from each original hard copy or electronic document being produced,
fixed in the same manner as if the document was printed or copied on paper, usually provided in either TIFF or PDF
file format;
(2) The text extracted from each Native File during processing, or electronic text created through OCR, provided in an
associated .TXT File (this is optional but highly recommended);
(3) A Load File containing the metadata and source information extracted from each Native File during processing,
provided in an associated .DAT, .LEF, .DII or other similar standard format; and
(4) A Load File containing information necessary to link each Static Image with all other Static Images that are part of
the same document or a series of documents (see, Attachment and Family), provided in an associated .OPT or .DII file.
TIFF Productions facilitate the fixed pagination of documents and placement of Bates Numbers, Branding, and
Redactions on each individual page.
© 2015 Legal Technology Professionals Institute, subject to CC BY 4.0 International License. (Last edited November 10, 2015)
Image Production
See, TIFF Production. The term “Image Production” is a more generic term than “TIFF Production” and may be used
when referencing a production of Static Images in either TIFF or PDF format. Remember: Unless you request
extracted text with your images, what you will get are just images, which makes “search” next to impossible.
Hybrid Production
Hybrid Production is one of the three primary methods of producing ESI. A Hybrid Production is a mix of Native
Production and Image or TIFF Production elements, in which certain Native Files are produced in their Native File
format and other Native Files are rendered to a TIFF or PDF image. A common example of a Hybrid Production
would be to produce spreadsheets, PowerPoint presentations, graphic images, and specialty applications as Native
Files, while emails and word processing files are produced as Static Images. The necessary metadata and text elements
for each type of production would be included as appropriate.
Hard Copy Production
Documents produced in paper format. This is not a primary method of producing ESI. The December 1, 2006
amendments to the Federal Rules of Civil Procedure (FRCP), specifically Rule 34(b), made the default obligation to
produce a document “in a form or forms in which it is ordinarily maintained or in a form or forms that are reasonably
usable” unless the requesting party (or, failing that, the producing party) specifies a different format. A party that
produces ESI in Hard Copy format only, without a request or court order, will likely be required to reproduce the
documents in an electronic format at its own expense.
502(d) Order
Federal Rule of Evidence 502(d) provides “A federal court may order that the privilege or protection is not waived by
disclosure connected with the litigation pending before the court — in which event the disclosure is also not a waiver
in any other federal or state proceeding.” Thus, parties who obtain a 502(d) order protect privilege and avoid waiver
of privilege or subject matter in the present litigation as well as in other matters in any other federal or state courts.
Attachment
An Attachment is an electronic file that has been “attached” to another electronic file, most frequently an email.
Sometimes it is referred to as a “Child” and the “Parent” is the file that it has been attached to; together, they are
referred to as a “Family”. An Attachment is similar to, but different from, an Embedded Item. Both are Children and
part of a Family, but an attachment can be easily detached and saved separate from the Parent, while an embedded
item is contained within an electronic file and cannot be easily detached from the Parent without special software.
Each party should ensure their ESI Stipulation contains a protocol to address embedded items.
Author
Author field extracted from the metadata of a non-email document.
Bates Number
A Control Number used to identify a unique page within a production. Sometimes this term is used incorrectly to
reference a Control Number or Production Number placed on an electronic file. The Bates Number is named after Edwin
G. Bates’ Automatic Numbering-Machine, patented between 1890 and 1901. A “Bates stamper” places a unique,
sequenced number on a series of pages.
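As a rough illustration of the sequenced page numbering described under Bates Number, the sketch below builds page numbers from a prefix and a zero-padded counter. The prefix "ABC", the eight-digit width, and the function name bates_numbers are assumptions made for this example only; actual numbering formats are whatever the parties agree to.

```python
def bates_numbers(prefix, start, page_count, width=8):
    """Return one Bates Number per page, e.g. ABC00000015, ABC00000016, ..."""
    return [f"{prefix}{n:0{width}d}" for n in range(start, start + page_count)]


# Three consecutive pages, starting at number 15 (hypothetical values).
pages = bates_numbers("ABC", 15, 3)
print(pages)
```

Padding every number to the same width is a design choice that keeps the values in page order when they are sorted as plain text.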
Control Number
A unique number or combination of letters and numbers used to identify a unique document or unique page within a
complete document set.
BCC
BCC or blind carbon copy field extracted from an email message.
BegAttach
Unique number identifying the first page or first document of an Attachment or set of Attachments. See BegDoc and Bates
Number. BegAttach will be the document number assigned to the first page of an attachment in a TIFF Production or
the first document attachment in a Native Production.
BegBates
See, BegDoc.
BegDoc
Unique number identifying the first page of a document in a TIFF Production or a number assigned to identify an
entire native file regardless of the number of pages. BegDoc ranges should not be reused. Ideally, each production
should have a unique Bates range, unless it is a re-production designed to replace an existing production.
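One way to check that BegDoc ranges are not reused across productions is to compare the numeric ranges directly. The sketch below is only an illustration: it assumes values made of an alphabetic prefix followed by digits, and the helper names parse_bates and ranges_overlap are hypothetical.

```python
import re


def parse_bates(value):
    """Split a value such as 'ABC00000015' into ('ABC', 15)."""
    match = re.fullmatch(r"([A-Za-z_.-]*?)(\d+)", value)
    if match is None:
        raise ValueError(f"not a recognizable Bates value: {value!r}")
    return match.group(1), int(match.group(2))


def ranges_overlap(first, second):
    """Return True if two (BegDoc, EndDoc) ranges share any number."""
    prefix1, beg1 = parse_bates(first[0])
    _, end1 = parse_bates(first[1])
    prefix2, beg2 = parse_bates(second[0])
    _, end2 = parse_bates(second[1])
    # Ranges with different prefixes can never collide.
    return prefix1 == prefix2 and beg1 <= end2 and beg2 <= end1


# Hypothetical production ranges.
prod001 = ("ABC00000001", "ABC00000100")
prod002 = ("ABC00000101", "ABC00000150")
print(ranges_overlap(prod001, prod002))
```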
Bookmark
Navigation feature that allows a reader to quickly locate a link or point of interest within a Word or PDF document.
Branding
The process of applying a permanent, unique number or phrase to a Static Image in an Image or TIFF Production, or
Hybrid Production. Also known as Endorsing and Stamping. For example, Bates Numbers and confidentiality
designations are branded onto production images.
CC
CC or carbon copy field extracted from an email message.
Child (or Children)
A Child is an Attachment to or Embedded Item within a Parent document, and is part of a Family.
ConfDesignation
The confidentiality designation assigned by the producing party to a particular document.
Container File
A Container File contains multiple other files and is generally compressed to reduce the amount of disk space used.
Common examples of container files include PST files (email, contact, calendar and tasks typically created in or
converted to Microsoft Outlook / Exchange format); NSF files (email, contact, calendar, tasks and other database
elements created through Lotus Notes / Lotus Domino); ZIP; 7ZIP; and RAR. A Container File may be encrypted and
require an encryption key (or password) to open.
ConversationID
The metadata field extracted from an email thread that is generated by the email system for each conversation.
Cooperation
In simple terms: Playing nicely together! Cooperation is the process of parties, frequently through their counsel,
participating in discussions designed to provide a just, speedy and inexpensive determination in each matter. See,
The Sedona Conference® Cooperation Proclamation.
Custodian
Name of the custodian of the file(s) produced (last name, first name).
Custodial Deduplication
See, Deduplication. Also known as “Vertical Deduplication”, i.e., within the set of documents obtained from a single
custodian.
#5
.CSV File
Comma Separated Values file. This is an industry standard format used to import or export electronic data between
different software applications and database platforms. Data is maintained in a table or spreadsheet layout: each
record is stored as a row and each field as a column, with a comma separating fields and a line break separating
records. A CSV file is similar to a DAT file in form and function, but differs in the delimiters used.
.DAT File
A DAT file is a Concordance load file (see Load File below), which is widely accepted for loading documents into
various litigation support platforms. A DAT file contains information (data or metadata) laid out in specific fields and
can also contain searchable text that gets loaded into a TEXT field. Concordance offers an alternate way to load
searchable text. Instead of separate .TXT files, the searchable text can be added as the last field in the DAT file. A DAT
file contains a header row with field names and delimiters. Also, see DII File.
Date Format/Time Formats
Different electronic systems record dates and times in different formats. Here is an example of a commonly used format
in North America: mm/dd/yyyy. Consistent use of date formats is important, especially for litigation that spans several time
zones or areas where date formats differ. Date formats and time zone conventions should be agreed by all
parties at collection time and standardized across all collections regardless of document source or country of origin of
materials. This will prevent confusion as to document sequence for all parties.
Date Created
Date that a file was created (mm/dd/yyyy format).
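To show why an agreed date format and time zone matter for document sequence, here is a small sketch that converts a mm/dd/yyyy date and a 24-hour time recorded at a known UTC offset into a single UTC timestamp. The function name to_utc_iso and the sample values are assumptions for illustration; real collections may carry time zone information in other ways.

```python
from datetime import datetime, timedelta, timezone


def to_utc_iso(date_text, time_text, utc_offset_hours):
    """Parse mm/dd/yyyy and HH:MM values recorded at a known UTC offset
    and return an ISO 8601 timestamp expressed in UTC."""
    local = datetime.strptime(f"{date_text} {time_text}", "%m/%d/%Y %H:%M")
    local = local.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))
    return local.astimezone(timezone.utc).isoformat()


# An email sent at 9:30 PM on November 10, 2015 at UTC-5 (hypothetical).
print(to_utc_iso("11/10/2015", "21:30", -5))

# The same digits read as mm/dd/yyyy and as dd/mm/yyyy give different dates.
as_us = datetime.strptime("03/04/2015", "%m/%d/%Y").date()
as_eu = datetime.strptime("03/04/2015", "%d/%m/%Y").date()
print(as_us, as_eu)
```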
DateLastModified or Date Last Modified
Modification date(s) of a non-email document.
DateRcvd or Date Received
Received date of an email message (mm/dd/yyyy format).
DateSent or Date Sent
Sent date of an email message (mm/dd/yyyy format).
Deduplication: Global or Custodial
Deduplication is the process of removing exact duplicates of a document from a data set. In productions, deduplication
can be performed Globally (i.e., across all documents) or Custodially (i.e., within each custodian’s documents). The purpose of
deduplication is to reduce the number of documents to be reviewed. However, in certain instances it is acceptable to
produce duplicates in order to show document context and to show who knew what and when. For example, an
email with an Attachment may exist in the folders of two custodians; if deduplicated, it may exist in only
one. This would remove the indication that the other holder of the document had knowledge of the document.
Sometimes, if a document has been removed/deduplicated, this is indicated in the Duplicates field in the metadata.
(Also please research Methods for Deduplication, which will discuss HASH VALUES, such as MD5, SHA1 etc.)
Delimiters
Unique characters used to identify breaks in records and fields in a Load File. For example, in a CSV file (Comma
Separated Values), the comma is the separator/delimiter. Common delimiters in load files include:
Space: ASCII character 32
Quote ("): ASCII character 34
Pipe (|): ASCII character 124
Concordance DAT files conventionally use þ (character 254) as the quote character around each field value and
character 20, often displayed as ¶, as the field separator.
An example of how delimiters appear in the header row of a Load File is shown below (the non-printing field separator
between adjacent þ characters is not visible):
þProdNoþþProdBegAttþþProdEndAttþþAuthorþþFromþþToþþCCþþBCCþþDateSentþþTimeRcvdþþDateRcvdþþTimeSentþþDateCreatedþþTimeCreatedþþDateLastModþþTimeLastModþþEMail_SubjectþþTitleþþDocExtþþFilenameþþNativeFileþþOCRPathþ
Delivery Methods – FTP, CD, DVD, Hard drive, Cloud
Delivery methods can range from hand delivery of physical media (hard drive, CD, DVD, thumb drive, etc.), to mailing or
shipping physical media to the opposing parties, to using the internet to email the files or upload them to the cloud (using
FTP, Dropbox or similar). Files delivered electronically are usually placed in a container such as a zip file,
both to reduce their size and to provide a protection layer, since the contents can be password protected and/or
encrypted. Larger volumes of data tend to be written to encrypted physical media, rather than zipped.
As a caveat: Be sure to apply appropriate security protocols such as encryption or password protection AT ALL TIMES!
DII File
A Summation load file that contains fielded information and links to image and text paths. In Summation, text is
usually kept in separate “text files”. Text files have a .TXT file extension.
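The Delimiters and .DAT File entries can be made concrete with a short parsing sketch. It assumes the commonly seen Concordance defaults, namely þ (character 254) around each value and character 20 between fields; the function names and the sample record are hypothetical, and a real load file may use other delimiters and needs its text encoding confirmed before reading.

```python
FIELD_SEP = "\x14"  # character 20, often displayed as ¶ (assumed default)
QUOTE = "\xfe"      # character 254, displayed as þ (assumed default)


def parse_dat_line(line):
    """Split one DAT record into field values and strip the þ qualifiers."""
    values = []
    for field in line.rstrip("\r\n").split(FIELD_SEP):
        if len(field) >= 2 and field.startswith(QUOTE) and field.endswith(QUOTE):
            field = field[1:-1]
        values.append(field)
    return values


def parse_dat(text):
    """Return one dict per record, keyed by the names in the header row."""
    lines = [line for line in text.split("\n") if line.strip("\r")]
    header = parse_dat_line(lines[0])
    return [dict(zip(header, parse_dat_line(line))) for line in lines[1:]]


# Hypothetical two-line load file: a header row and one record.
sample = (
    "þProdNoþ\x14þAuthorþ\x14þFilenameþ\n"
    "þABC00000015þ\x14þDoe, Janeþ\x14þbudget.xlsþ\n"
)
records = parse_dat(sample)
print(records)
```

The comma inside the Author value shows one reason load files often avoid the comma as a delimiter: the value passes through intact without any extra escaping.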
Email
An electronic means for sending, receiving and managing communications via a multitude of different structured data
applications (email client software), such as Outlook or Lotus Notes or those often known as “webmail,” such as Gmail
or Yahoo Mail. From a Production standpoint, emails can be produced natively or as images.
Email families
Email families include a file created or received by an electronic mail system and any attachments that may be
transmitted with the email message. A family would include a “parent”, which is the top email hierarchically, and its
“children”, which would be its attachments. The attachments could be embedded files, email, zip containers, or other
files such as PDFs, Office files, or the like.
#2
Email Subject
Subject line extracted from an email message.
Embedded Item
An Embedded Item is an electronic file placed inside of another electronic file. Common embedded items include
contact cards or logos placed inside of emails, and spreadsheets placed inside of PowerPoint presentations. An
embedded item is similar to, but different from, an attachment. Both are Children, but an embedded item is
contained within an electronic file (see, Parent) and cannot be easily detached without special software, while an
attachment can be easily detached and saved separate from the Parent.
Encryption of Deliverables/Files
An electronic or digital process that renders the contents of a message, file or hardware unreadable to anyone who is not
authorized to read it or who lacks the encryption key; used to protect Electronically Stored Information being
stored or transferred from one location to another.
#2
EndAttach
Unique number identifying the last page or last document of an Attachment or set of Attachments. See EndDoc and Bates
Number. EndAttach will be the document number assigned to the last page of an attachment in a TIFF Production or
the last document attachment in a Native Production.
EndBates
See EndDoc.
EndDoc
A common metadata field that contains the Bates Number of the last page of a document in a TIFF Production or the
last document in a Native Production.
Endorsing (or
Endorsement)
The process of applying a permanent, unique number to a Static Image in an Image or TIFF Production, or Hybrid
Production. Also known as Branding and Stamping.
ESI
Electronically Stored Information. As referenced in the United States Federal Rules of Civil Procedure, information that
is stored electronically, regardless of the media or whether it is in the original format in which it was created, as
opposed to hard copy (i.e., on paper).
#2
ESI Order
A standing Order filed with the court outlining the agreed-to guidelines relating to the discovery of Electronically
Stored Information (“ESI”) and confirming that the parties have met and conferred regarding reasonable and
appropriate steps taken to preserve electronic evidence.
Extracted Text
Text can be extracted from fully searchable documents such as native emails, Excel files, Word documents and
searchable PDFs. If you can perform a ctrl-f function in a document, it has extractable text. Text from emails can be
extracted into metadata fields such as from, author, to, subject, date sent, etc. Text from Word documents can be
extracted into a single text field. See OCR for an explanation on retrieving text from non-searchable PDFs.
Family
A family is a group of documents that includes an original document, known as the Parent, and any other documents
that are attached to or embedded inside of the Parent, known as a Child or Children. Families exist in both
ESI and paper files. A common example is an email with attachments, where the email is referred to as the Parent
and each attachment is referred to as a Child; together they form a Family. See also, Email Families, Parent, Child, and
Embedded Item.
#2
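Family relationships are usually carried in load-file fields rather than in the files themselves. The sketch below groups records into families, assuming that every member of a family (Parent and Children) carries the Parent's number in its BegAttach field and that standalone documents leave BegAttach blank. That convention, the function name group_families, and the sample records are assumptions for illustration; check the agreed ESI protocol for the convention actually used.

```python
from collections import defaultdict


def group_families(records):
    """Group BegDoc values into families keyed by the family's first number."""
    families = defaultdict(list)
    for record in records:
        # A document with no BegAttach value is treated as a family of one.
        key = record.get("BegAttach") or record["BegDoc"]
        families[key].append(record["BegDoc"])
    return dict(families)


# Hypothetical records: one email with two attachments, one loose file.
records = [
    {"BegDoc": "ABC00000001", "BegAttach": "ABC00000001"},  # Parent email
    {"BegDoc": "ABC00000003", "BegAttach": "ABC00000001"},  # Child
    {"BegDoc": "ABC00000004", "BegAttach": "ABC00000001"},  # Child
    {"BegDoc": "ABC00000007", "BegAttach": ""},             # standalone
]
print(group_families(records))
```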
File Extensions
Many systems, including DOS and UNIX, allow a filename extension that consists of one or more characters following
the proper filename. For example, image files are usually stored as .bmp, .gif, .jpg or .tiff. Audio files are often stored
as .aud or .wav.
There are over 8,000 known file extensions identifying file formats, of which typically only 20 or 30 (or fewer) are
needed for eDiscovery purposes. The filename extension should indicate what type of file it is; however, users may
change filename extensions to evade firewall restrictions or for other reasons. Therefore, file types should be
identified at a binary level by reference to the internal file header, rather than relying on file extensions. To research
file types, see http://www.filext.com.
File Name
Filename of the original digital file.
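As the File Extensions entry notes, file types are better identified from the internal file header than from the extension. The sketch below checks the first bytes of a file against a handful of well-known signatures. The table is deliberately tiny and the names SIGNATURES and sniff_type are hypothetical; processing tools rely on far larger signature libraries.

```python
# A few well-known leading byte sequences ("magic numbers").
SIGNATURES = {
    b"%PDF": "PDF",
    b"PK\x03\x04": "ZIP container (also DOCX/XLSX/PPTX)",
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": "OLE2 compound file (legacy DOC/XLS/PPT, MSG)",
    b"II*\x00": "TIFF (little-endian)",
    b"MM\x00*": "TIFF (big-endian)",
}


def sniff_type(header_bytes):
    """Return a label for the first matching signature, else 'unknown'."""
    for magic, label in SIGNATURES.items():
        if header_bytes.startswith(magic):
            return label
    return "unknown"


# A file renamed to "notes.txt" would still be recognized by its header.
print(sniff_type(b"%PDF-1.4\n..."))
```

In practice the header bytes would come from reading the first few bytes of each collected file in binary mode.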
File Size or FileSize
The size, in bytes, of the file being produced.
File Type or FileType
The native file type of the original document, e.g., Word, Excel, Adobe, etc.
Filter
The process of identifying and excluding or including data based upon agreed-to parameters, such as file date range,
author(s), folders, directories, and/or keyword search terms.
From
From field extracted from an email message.
Global
Deduplication
See, Deduplication. Also known as “Horizontal Deduplication”, i.e., removing duplicate items across all custodians or
sources in the population.
Hidden Text
Information or text not readily visible in a document. For example, hidden columns or formulas in an Excel
spreadsheet; hidden headers or footers in a Word document. Other examples include track changes on a Microsoft
Office document or presenter notes in a PowerPoint. Text can also be hidden by formatting the text to be the same
color as the document background.
Horizontal Deduplication
See, Deduplication. Also known as “Global Deduplication”, i.e., across all custodians in the population.
#2
Modified
Image Path
Relative file path to the location of TIFF images, if they exist in a production
(e.g., Volume001\PROD001\Images\ABC00015.tif). See Relative Path.
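A relative Image Path can be derived by removing the root of the delivery media from a full path, so that the links in a load file still work wherever the production is copied. This is a minimal sketch using Python's pathlib; the root E:\Deliverables and the function name relative_image_path are assumptions for the example.

```python
from pathlib import PureWindowsPath


def relative_image_path(full_path, production_root):
    """Return full_path expressed relative to production_root."""
    relative = PureWindowsPath(full_path).relative_to(PureWindowsPath(production_root))
    return str(relative)


# Hypothetical location of one image on the delivery drive.
full = r"E:\Deliverables\Volume001\PROD001\Images\ABC00015.tif"
print(relative_image_path(full, r"E:\Deliverables"))
```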
LFP File
An LFP file is an IPRO image load file used with an IPRO image viewer; it includes a cross reference of the production
number and the file path to the TIFF image. This is conceptually related to an Opticon load file (.OPT file extension),
which is another common image load file format.
Load File
In the litigation community “load file” is the term commonly used to refer to a file used to transfer data (coded,
captured or extracted from ESI processing) so that it can be imported into another discovery management database
application complete with links to maintain the relationships of metadata, native files, document images and text.
Load files are frequently text files that have delimited fields of information. Such load files may have data about
documents to be imported into document management software such as Concordance or Summation, or they may
have the path or directory locations where images reside so that the software can link the images to their
corresponding records. Some database programs require one load file for importing images, text and data, while
others require separate load files for data and text.
Common load files and extensions used in the legal industry include Summation (.DII), Concordance (.DAT and
.OPT), Relativity (.DAT), IPRO (.LFP), Ringtail (.MDB) and DB/Text Works (.TXT).
LST File
LST files are word lists that contain predefined values that can be selected from a list when editing fields. When a list
is assigned to a specific field, it is called an Authority List. An LST extension is also sometimes used for text load files,
which are files that contain a cross-reference between the document identifier (e.g. DOCID or BegDoc) and the file
path to the corresponding file containing extracted or OCR text.
MD5 Hash
A unique 32-character hexadecimal value, a “digital file fingerprint,” that is generated with a 128-bit hashing algorithm.
Many other hash algorithms can be used, including SHA-1 and the SHA-2 family (for example, SHA-256). Regardless of what
you use, ensure that both sides agree on the hashing method / algorithm to be used.
Metadata
Metadata literally means “data about data” and consists of coded information that is usually not visible to the user
and reflects characteristics of the ESI (such as origin, usage, structure, and alteration). Systems and applications
automatically generate most metadata. For example, metadata can describe how, when, and by whom ESI was
created, accessed, and modified. Some metadata, such as file dates and sizes, can easily be seen by users. Other
metadata is hidden or embedded and generally unavailable to non-technical users. Metadata can be external to a file
or document such as from the computer’s file system or it can be embedded in the document itself. There can be
hundreds or even thousands of fields of metadata associated with an individual file. In fact, some ESI may contain
more metadata than user-visible data. Because much of the metadata may be neither relevant nor necessary for
searching, sorting and analyzing the ESI, it may be unnecessary or unhelpful to produce certain metadata fields.
Native File
Electronic documents have an associated file structure defined by the application that originally created them. This file
structure is referred to as the native file format of the document. Because viewing or searching documents in native
format may require the original application (for example, viewing a Microsoft Word document requires the Microsoft
Word application or a viewer that can handle the native format), documents may be converted to a neutral format as
part of the record acquisition or archive process. Static formats (often called imaged formats), such as TIFF or PDF, are
designed to retain an image of the document as it would look viewed in the original creating application but do not
allow metadata to be viewed or the document information to be manipulated unless agreed-upon metadata and
extracted text are preserved. In the conversion to static format, some metadata can be captured, processed,
preserved and electronically associated with the static format file. However, with technology advancements, tools and
applications are increasingly available to allow viewing and searching of documents in their native format while still
preserving pertinent metadata. It should be noted that not all ESI may be conducive to production in either the Native
Format or imaged format, and some other form of production may be necessary. Database data files, for example,
often present such issues.
#3
#5
#2
Native Link
Path and filename to produced native file.
Native Path
Relative path to the native file as included in the production (e.g., d:\PROD001\natives\ABC00015.xls) for all files
produced in native format.
NSF File
An NSF file is a database container file from a Lotus Notes / Domino Server system. It may contain emails, contacts,
calendar and task items, and can be used as an archive or to transfer such data offline. A company that uses Lotus
Notes will frequently transfer ESI out of their environment in an NSF file. Also, custodians may retain NSF files offline
that may need to be considered for collection. An NSF file is the functional equivalent of a PST file from a Microsoft
Outlook / Exchange server system.
OCR
Optical Character Recognition (OCR) is a technology process that captures searchable text from an image file so that it
can be associated with the image and searched as text within a discovery database. OCR software evaluates scanned
data for shapes it recognizes as letters or numerals. OCR accuracy depends on the clarity of the image being
converted to text and is not always reliable, especially with poor print or scan quality or faded images. See Extracted
Text. Documents that are produced from scanned paper in TIFF/Image format should be OCR’ed. You should keep in
mind when requesting OCR or Extracted Text that OCR is often less accurate than Native Text Extraction.
Shapes and graphics that appear in the middle of text will usually reduce OCR quality.
OPT File
An OPT file is an image load file used with a Concordance Image viewer that includes a cross reference of the
production number and the file path to the TIFF image. This is conceptually similar to the LFP file.
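To show how an image load file links page images into documents, the sketch below reads OPT-style lines. It assumes the commonly described seven-column comma-separated layout (image key, volume, image path, document break flag, folder break, box break, page count), with "Y" in the fourth column marking the first page of a document. The layout, the function name documents_from_opt, and the sample lines are assumptions for illustration; confirm the layout against the actual file received.

```python
import csv
import io


def documents_from_opt(opt_text):
    """Group image paths into documents using the document-break flag."""
    documents = []
    for row in csv.reader(io.StringIO(opt_text)):
        if len(row) < 4:
            continue  # skip blank or malformed lines
        image_key, path, doc_break = row[0], row[2], row[3]
        if doc_break.upper() == "Y" or not documents:
            documents.append({"BegDoc": image_key, "pages": []})
        documents[-1]["pages"].append(path)
    return documents


# Hypothetical OPT content: a two-page document followed by a one-page one.
sample = r"""
ABC00015,Volume001,Volume001\PROD001\Images\ABC00015.tif,Y,,,2
ABC00016,Volume001,Volume001\PROD001\Images\ABC00016.tif,,,,
ABC00017,Volume001,Volume001\PROD001\Images\ABC00017.tif,Y,,,1
"""
for document in documents_from_opt(sample):
    print(document["BegDoc"], len(document["pages"]))
```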
OriginalFileLocation
For email, the folder, if any, where the email was stored. For non-email, the folder location where the file was stored
in the normal course of business.
OtherCustodians
Identifies duplicate custodian sources for files excluded from production based on MD5 hash de-duplication.
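The relationship between MD5 hash deduplication and the OtherCustodians field can be sketched as follows. This is a simplified illustration that hashes whole file contents with Python's hashlib and keeps the first copy encountered; the function names and sample data are hypothetical, and processing tools often hash selected fields (particularly for email) rather than raw bytes.

```python
import hashlib


def md5_hex(data):
    """Return the 32-character hexadecimal MD5 digest of some bytes."""
    return hashlib.md5(data).hexdigest()


def deduplicate_globally(items):
    """items: iterable of (custodian, filename, content_bytes) tuples.
    Keeps the first copy of each file and records who else held it."""
    kept = {}
    for custodian, filename, content in items:
        digest = md5_hex(content)
        if digest not in kept:
            kept[digest] = {
                "Custodian": custodian,
                "FileName": filename,
                "MD5Hash": digest,
                "OtherCustodians": [],
            }
        elif (custodian != kept[digest]["Custodian"]
              and custodian not in kept[digest]["OtherCustodians"]):
            kept[digest]["OtherCustodians"].append(custodian)
    return list(kept.values())


# Hypothetical collection: two custodians hold the same memo.
collection = [
    ("Doe, Jane", "memo.docx", b"same content"),
    ("Roe, Richard", "memo-copy.docx", b"same content"),
    ("Roe, Richard", "notes.txt", b"different content"),
]
for record in deduplicate_globally(collection):
    print(record["FileName"], record["OtherCustodians"])
```

Recording the other holders keeps the "who knew what" information that would otherwise be lost when the duplicate is suppressed.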
Paper - Quality of
Images
Paper documents that were created in the past fifteen years were very likely created in electronic form and were
printed and rescanned. The digital image quality of the document diminishes as multiple generations of a document
are made (copy of a copy of a copy). These documents yield poor OCR results and are often not reliably searchable.
Parent
An original document that is part of a document Family, to which other documents are attached or in which other
items are embedded.
PDF
Portable Document Format. PDFs can be created using either the native Adobe application or a variety of other
systems. PDFs can be searchable or non-searchable. In discovery it is important to consider which type of PDF is
being produced.
1) A searchable PDF has extractable text and can be identified when you can copy or highlight text or when you
can perform the Windows CTRL-F (Find) function.
2) A non-searchable PDF will require OCR’ing, if you want to be able to search the text content from within that
file.
If created from a native file directly from the computer, PDF quality will typically be good. Conversely, OCR’d or
scanned images can be output as PDFs with searchable text, but will often have poor image quality.
PDF is typically not considered a desirable production format. Although it is common practice at many law firms,
especially those without review tools, it can lead to unwieldy evidence handling. Combining multiple PDFs into a single PDF
file is even less desirable, unless the PDFs are bookmarked. Avoid this type of production protocol and instead
consider leveraging the capabilities of the emerging range of tools that allow inexpensive online hosting and review.
Privilege Log
A Privilege Log is a list of withheld documents and the legal basis on which the producing party is withholding them
(e.g., documents subject to an asserted privilege or the Work Product Doctrine). The Privilege Log must typically
provide enough specificity to permit the requesting party and the court to reasonably determine the sufficiency of the
asserted privilege basis.
Privilege Review
The review of documents by counsel to determine whether they should be withheld on the grounds of asserting some
form of recognized privilege (attorney-client, doctor-patient, priest-penitent) or on the basis of the Work Product
Doctrine. Documents withheld must generally be listed in a Privilege Log.
Processing
Processing is a technical function during which collected ESI is passed through various systems which can capture and
preserve document and file metadata, create file hashes, extract text, create static images and create a load file
which can be imported into a discovery management database application. Some discovery database applications
include processing functions in the application and others rely on separate processing applications to perform the
functions listed above.
Production Number
A Control Number used to identify a unique document or page within a production.
Product Volume or
ProdVolume
Identifies production media deliverable.
PST File
A PST file is an output container file from a Microsoft Outlook / Exchange Server system. It may contain exported
emails, contacts, calendar and task items, and can be used as an archive or to transfer such data offline. A company
that uses Microsoft Outlook / Exchange will frequently transfer ESI to vendors in a PST file. Also, custodians may
retain PST files offline that may need to be collected. It is the functional equivalent of an NSF file from a Lotus Notes
Server / Domino Server system.
Redacted
“Yes,” for redacted documents; otherwise, blank. Redactions may also be applied to metadata fields to protect
privilege. (Also see Redactions below.)
Redactions
A portion of an image or document that is intentionally obscured or removed to prevent disclosure of a specific portion of
the content. Redaction is performed to protect privileged content or to remove irrelevant portions, including highly
confidential, sensitive or proprietary information. Redactions typically contain labels, such as “Redacted – Privileged”,
but specific requirements are typically negotiated between parties.
Spreadsheet redactions are complex and expensive. A better alternative is to filter the spreadsheet and save it, and
add a slip-sheet to the production that indicates “Redacted Natively”.
#2
Relative Path
Relative path to text files in production output (e.g., :\PROD001\Text\ABC00015.xls).
Text Path is the path to the location of text files on the transfer media. When producing documents you may be asked
to provide the text path, which is simply the path to the location of the text files on the transfer media. Generally,
vendors will provide separate folders for native and text files. However, it is not uncommon for some vendors to
include text files and native files in the same folder.
Scan Quality
Paper documents are optically scanned to convert them to TIFF images to allow electronic production as opposed to a
hard copy production. Generally, text will be extracted from the scanned images with OCR (Optical Character
Recognition) software. The quality of the scanned image directly affects the quality of the OCR text, which in turn
affects the number of character errors generated per line of text. Documents such as forms containing boxes, and
documents with handwritten notes, may create a perfect TIFF image but will most likely render poor OCR results,
where the extracted text is less than useful.
Slip Sheets
Slip-sheets are used as placeholders to either separate documents or identify a reason why an expected document is
not present in a collection. For example, a slip-sheet can be used to indicate documents that are produced natively or
withheld for privilege or other reasons.
Speaker Notes
Speaker notes are an element of Microsoft PowerPoint that allows the user to insert notes in a designated section on
a slide for reference during a presentation. Speaker notes are not printed by default, and during processing of ESI,
the text in speaker notes will not be extracted or shown on a Static Image by default. It has become fairly common
for requesting parties to request that Speaker Notes be “turned on” for processing.
Stamping
The process of applying a permanent, unique number to a Static Image in an Image or TIFF Production, or Hybrid
Production. Also known as Endorsing and Branding.
Text
Text is the extracted text from native files, or text from paper documents which have been scanned and processed
with OCR (Optical Character Recognition) software. Extracted text can be contained in a multi-page text file (.TXT) for
each document produced or single page (.TXT) file for each page produced.
Text Path
Path to text files in the production output (e.g., :\PROD001\Text\ABC00015.txt). When producing documents, you may
be asked to provide the text path, which is simply the path to the location of the text files on the production
transfer media. Generally, vendors will provide separate folders for native and text files. However, it is not
uncommon for some vendors to include text files and native files in the same folder.
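A text path can be assembled from the production volume, the text folder and the document's production number. The Python sketch below assumes the folder name "Text" and the naming pattern from the example above; both vary by vendor, and the function name is hypothetical.

```python
from pathlib import PureWindowsPath


def text_path(volume: str, production_number: str, text_folder: str = "Text") -> str:
    """Return the relative path to a document's extracted text file."""
    return str(PureWindowsPath(volume, text_folder, f"{production_number}.txt"))


# Separate text folder (the more common layout described above).
print(text_path("PROD001", "ABC00015"))
# Text delivered alongside the natives in one folder (folder name is an assumption).
print(text_path("PROD001", "ABC00015", text_folder="Natives"))
```

PureWindowsPath is used so that the backslash-separated form is produced on any operating system.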
TIFF Images
TIFF (Tag Image File Format). A TIFF file can be identified as a file with a ".tiff" or ".tif" file name suffix. One of the
most common graphic image formats, TIFF files are commonly used in ESI production, and are typically requested to
be generated in a Group IV 300 dpi black-and-white standard. Color TIFFs can also be requested in special circumstances.
Time Zone
A time zone is a region that observes a uniform standard time for legal, commercial, and social purposes.
Most of the time zones on land are offset from Coordinated Universal Time (UTC) by a whole number of hours
(UTC−12 to UTC+14), but a few are offset by 30 or 45 minutes (for example, Newfoundland Standard Time is UTC−03:30
and Nepal Standard Time is UTC+05:45). Some higher latitude countries use daylight saving time for part of the
year, typically by changing clocks by an hour.
It is important to consider time zones when producing data from multiple regions. It is somewhat common practice
to conform the times (and dates) of all documents to a particular time zone, for ease of overall reference. This helps
the reviewer understand the relative times of messages in a conversation. Without this, it is sometimes difficult to
understand the order of each message.
Title
Title field extracted from the metadata of a non-email document.
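The conforming step described in the Time Zone entry can be sketched in Python as follows. This is an illustration only: it uses the two fixed standard-time offsets given as examples in that entry, ignores daylight saving time, and the message times and function name are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets taken from the examples in the Time Zone entry; daylight saving is ignored.
NEWFOUNDLAND_STANDARD = timezone(timedelta(hours=-3, minutes=-30))
NEPAL_STANDARD = timezone(timedelta(hours=5, minutes=45))


def conform_to_utc(local_time: datetime, source_zone: timezone) -> datetime:
    """Attach the source offset to a naive timestamp and convert it to UTC."""
    return local_time.replace(tzinfo=source_zone).astimezone(timezone.utc)


# Hypothetical messages: by local clock time the reply (09:00) looks earlier
# than the original (18:00), but once conformed it is shown to be later.
sent = conform_to_utc(datetime(2015, 11, 10, 18, 0), NEPAL_STANDARD)
reply = conform_to_utc(datetime(2015, 11, 10, 9, 0), NEWFOUNDLAND_STANDARD)
print(sent.isoformat(), reply.isoformat(), sent < reply)
```

A production workflow that must account for daylight saving would need a full time zone database rather than fixed offsets.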
To
To or Recipient field extracted from an email message.
Unitization
Each page of a document will be electronically saved into an image file. If a document is more than one page, the
unitization of the document and any attachments will be maintained as it existed in the original form and reflected in
the load file. The parties will make their best efforts to unitize documents correctly.
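One way a load file can reflect unitization is by flagging the first page of each document. The Python sketch below groups page images into documents from such flags. The page list layout (image name plus a starts-document flag) and the function name are assumptions for illustration; actual load file formats record document breaks in their own ways.

```python
def unitize(pages: list[tuple[str, bool]]) -> list[list[str]]:
    """Group page image names into documents; True marks the first page of a document."""
    documents: list[list[str]] = []
    for image_name, starts_document in pages:
        # Start a new document at each flagged page (or at the very first page).
        if starts_document or not documents:
            documents.append([])
        documents[-1].append(image_name)
    return documents


# Hypothetical page list: a two-page document followed by a one-page document.
page_list = [("ABC00015", True), ("ABC00016", False), ("ABC00017", True)]
print(unitize(page_list))
```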
Vertical Deduplication
See Deduplication. Also known as “Custodial Deduplication”, i.e., deduplication within a set of documents for a single
custodian, as opposed to horizontally across all custodians.
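The difference between vertical and horizontal deduplication can be sketched in Python as below. The sketch assumes each document record already carries a hash value created during processing; the dictionary keys ("custodian", "hash"), the custodian names and the function name are hypothetical.

```python
def deduplicate(documents: list[dict], vertical: bool = True) -> list[dict]:
    """Keep the first copy of each hash, per custodian (vertical) or across all custodians."""
    seen = set()
    unique = []
    for document in documents:
        # Vertical: a duplicate must share both custodian and hash.
        # Horizontal: a matching hash alone marks a duplicate.
        key = (document["custodian"], document["hash"]) if vertical else document["hash"]
        if key not in seen:
            seen.add(key)
            unique.append(document)
    return unique


# Hypothetical records: Smith holds two copies of a file that Jones also holds.
records = [
    {"custodian": "Smith", "hash": "aaa"},
    {"custodian": "Smith", "hash": "aaa"},
    {"custodian": "Jones", "hash": "aaa"},
]
print(len(deduplicate(records, vertical=True)), len(deduplicate(records, vertical=False)))
```

Vertical deduplication keeps one copy per custodian, so the receiving party can still see that both custodians held the file.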
Definitions Attribution List
1) https://www.law.cornell.edu/rules/fre/rule_502
2) Sedona Conference Glossary
3) http://help.lexisnexis.com/litigation/ac/cn_classic/database_files.htm
4) Gibson Dunn Crutcher (http://www.gibsondunn.com/publications/Documents/E-DiscoveryBasicsProductionofESI-Vol1No9.pdf)
5) Your kindergarten teacher (play nicely together…). Read the various Sedona articles.
Key Contributors
LTPI is grateful to the following people who contributed their time, knowledge, guidance and expertise.
Seth Eichenholtz
Quin Gregor
Cynthia Johnson
Eric Mandel
Nilsa Moreno
Chris Paskach
Bob Rohlf
along with the LTPI Leadership and Advisory Panels for their support and oversight
© 2015 Legal Technology Professionals Institute, subject to CC BY 4.0 International License. (Last edited November 10, 2015)