PRODUCTION GLOSSARY

The Legal Technology Professionals Institute Production Glossary is designed as an educational resource on terminology used in connection with producing electronically stored information. While a number of useful industry-wide glossaries exist, we could not find one that specifically discussed document production, nor one that discussed not only the “what” but also the “why”, so we created one.

You will find the words below used by both lawyers and litigation support technicians. You will also find these terms used in ESI stipulations as well as in Load Files. This is the language of “Production”.

This is a work in progress. If you are confused by a term or think we have missed one, please contact us at programs@legaltechpi.org. Please also visit our website http://www.LegalTechPI.org/. If you find LTPI’s work applicable to you, please join LTPI and contribute your time and resources today.

TERM / DEFINITION / SOURCE

Native Production
A Native Production is one of the three primary methods of producing ESI. A Native Production contains three main components:
(1) A copy of each original electronic document being produced, in the format created by the authoring / native application such as Word (DOC) or Excel (XLS/XLSX), i.e., the Native File;
(2) The text extracted from each Native File during processing, provided in an associated .TXT File; and
(3) A Load File containing the metadata and source information extracted from each Native File during processing, provided in an associated .DAT, .LEF, .DII or other similar standard format.
The advantage of Native Production is that, if the files are preserved properly, Metadata and Extracted Text should be 100% accurate and intact, improving the efficiency of the review and analysis of produced ESI. Native Production also preserves the “best evidence” throughout the process.
However, the receiving party will need a matching application for each file type produced, or Viewer Software such as “QuickViewPlus” (or similar) that can read and render most standard Native Files in readable format.

TIFF Production
TIFF Production is one of the three primary methods of producing ESI. A TIFF Production should usually contain four primary components:
(1) A series of Static Images showing each page from each original hard copy or electronic document being produced, fixed in the same manner as if the document were printed or copied on paper, usually provided in either TIFF or PDF file format;
(2) The text extracted from each Native File during processing, or electronic text created through OCR, provided in an associated .TXT File (this is optional but highly recommended);
(3) A Load File containing the metadata and source information extracted from each Native File during processing, provided in an associated .DAT, .LEF, .DII or other similar standard format; and
(4) A Load File containing information necessary to link each Static Image with all other Static Images that are part of the same document or a series of documents (see Attachment and Family), provided in an associated .OPT or .DII file.
TIFF Productions facilitate the fixed pagination of documents and placement of Bates Numbers, Branding, and Redactions on each individual page.

© 2015 Legal Technology Professionals Institute, subject to CC BY 4.0 International License. (Last edited November 10, 2015)

Image Production
See TIFF Production. The term “Image Production” is more generic than “TIFF Production” and may be used when referencing a production of Static Images in either TIFF or PDF format. Remember: unless you request extracted text with your images, what you will get are just images, which makes “search” next to impossible.

Hybrid Production
Hybrid Production is one of the three primary methods of producing ESI. A Hybrid Production is a mix of Native Production and Image or TIFF Production elements, in which certain Native Files are produced in their Native File format and other Native Files are rendered to a TIFF or PDF image. A common example of a Hybrid Production would be to produce spreadsheets, PowerPoint presentations, graphic images, and specialty applications as Native Files, while emails and word processing files are produced as Static Images. The necessary metadata and text elements for each type of production would be included as appropriate.

Hard Copy Production
Documents produced in paper format. This is not a primary method of producing ESI. The December 1, 2006 amendments to the Federal Rules of Civil Procedure (FRCP), specifically Rule 34(b), made the default obligation to produce a document “in a form or forms in which it is ordinarily maintained or in a form or forms that are reasonably usable” unless the requesting party, or failing that the producing party, specifies a different format. A party that produces ESI in Hard Copy format only, without a request or court order, will likely be required to reproduce the documents in an electronic format at its own expense.

502(d) Order
Federal Rule of Evidence 502(d) provides: “A federal court may order that the privilege or protection is not waived by disclosure connected with the litigation pending before the court — in which event the disclosure is also not a waiver in any other federal or state proceeding.” Thus, parties who obtain a 502(d) order protect privilege and avoid waiver of privilege or subject matter in the present litigation as well as in other matters in any other federal or state courts.

Attachment
An Attachment is an electronic file that has been “attached” to another electronic file, most frequently an email. Sometimes it is referred to as a “Child”, and the “Parent” is the file to which it has been attached; together, they are referred to as a “Family”. An Attachment is similar to, but different from, an Embedded Item. Both are Children and part of a Family, but an attachment can be easily detached and saved separately from the Parent, while an embedded item is contained within an electronic file and cannot be easily detached from the Parent without special software. Each party should ensure its ESI Stipulation contains a protocol to address embedded items.

Author
Author field extracted from the metadata of a non-email document.

Bates Number
A Control Number used to identify a unique page within a production. Sometimes this term is used incorrectly to reference a Control Number or Production Number placed on an electronic file. The Bates Number is named after Edwin G. Bates’ Automatic Numbering-Machine, patented between 1890 and 1901. A “Bates stamper” places a unique, sequenced number on a series of pages.

Control Number
A unique number or combination of letters and numbers used to identify a unique document or unique page within a complete document set.

BCC
BCC or blind carbon copy field extracted from an email message.

BegAttach
Unique number identifying the first page or first document of a document’s Attachment(s). See BegDoc and Bates Number. BegAttach will be the document number assigned to the first page of an attachment in a TIFF Production or to the first attachment document in a Native Production.

BegBates
See BegDoc.

BegDoc
Unique number identifying the first page of a document in a TIFF Production, or a number assigned to identify an entire native file regardless of the number of pages. BegDoc ranges should not be reused. Ideally, each production should have a unique Bates range, unless it is a re-production designed to replace an existing production.

Bookmark
Navigation feature that allows a reader to quickly locate a link or point of interest within a Word or PDF document.

Branding
The process of applying a permanent, unique number or phrase to a Static Image in an Image or TIFF Production, or Hybrid Production. Also known as Endorsing and Stamping. For example, Bates Numbers and confidentiality designations are branded onto production images.

CC
CC or carbon copy field extracted from an email message.

Child (or Children)
A Child is an Attachment to or Embedded Item within a Parent document, and is part of a Family.

ConfDesignation
The confidentiality designation assigned by the producing party to a particular document.

Container File
A Container File contains multiple other files and is generally compressed to reduce the amount of disk space used. Common examples of container files include PST files (email, contact, calendar and tasks typically created in or converted to Microsoft Outlook / Exchange format); NSF files (email, contact, calendar, tasks and other database elements created through Lotus Notes / Lotus Domino); ZIP; 7ZIP; and RAR. A Container File may be encrypted and require an encryption key (or password) to open.

ConversationID
The metadata field extracted from an email thread that is generated by the email system for each conversation.

Cooperation
In simple terms: Playing nicely together! Cooperation is the process of parties, frequently through their counsel, participating in discussions designed to provide a just, speedy and inexpensive determination in each matter. See The Sedona Conference® Cooperation Proclamation.

Custodian
Name of the custodian of the file(s) produced (last name, first name).

Custodial Deduplication
See Deduplication. Also known as “Vertical Deduplication”, i.e., deduplication within the set of documents obtained from a single custodian.

(Source column, entries above: #5)

.CSV File
Comma Separated Values file.
A CSV file allows data to be maintained in a table or spreadsheet format and is an industry-standard format for importing or exporting electronic data. Records are usually stored in rows and fields in columns, the structure typically used in database tables and spreadsheets. Because a comma delimits the fields, electronic data can be moved easily between different software applications and database platforms. A CSV file is similar to a DAT file in form and function, but the two differ in their delimiters.

.DAT File
A DAT file is a Concordance load file (see Load File), which is widely accepted for loading documents into various litigation support platforms. A DAT file contains a header row with field names, followed by information (data or metadata) laid out in specific delimited fields. It can also contain searchable text: Concordance offers an alternate way to load searchable text in which, instead of separate .TXT files, the searchable text is loaded into a TEXT field added as the last field in the DAT file. See also DII File.

Date Format/Time Formats
Different electronic systems record dates and times in various formats. A commonly used format in North America is mm/dd/yyyy. Consistent use of date formats is important, especially for litigation that spans time zones and regions where date formats differ. Date and time zone formats should be agreed by all parties at collection time and standardized across all collections, regardless of document source or country of origin of the materials. This prevents confusion about document sequence for all parties.

Date Created
Date that a file was created (mm/dd/yyyy format).
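The .CSV File, .DAT File and Date Format entries above can be illustrated with a minimal Python sketch. It assumes that the DAT file separates fields with ASCII character 20 and wraps each value in þ, which are common Concordance-style defaults but are not stated in this glossary, and that the source system exported dates as yyyy-mm-dd. The function names and sample records are hypothetical; real load files should follow whatever delimiters and date format the parties have agreed on.

```python
import csv
import io
from datetime import datetime

# Assumptions for illustration only (not taken from the glossary):
DAT_FIELD_SEP = "\x14"  # ASCII 20, assumed field separator in the DAT file
DAT_QUOTE = "\u00fe"    # the "þ" character, assumed to wrap every field value


def read_delimited(text, delimiter=",", quotechar='"'):
    """Parse delimited text whose first row is a header; return a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter, quotechar=quotechar)
    return list(reader)


def to_us_date(value, source_format="%Y-%m-%d"):
    """Convert a date string from source_format to mm/dd/yyyy."""
    return datetime.strptime(value, source_format).strftime("%m/%d/%Y")


# The same hypothetical record, once as CSV and once as a DAT load file.
csv_text = 'BegDoc,DateSent,From\nABC00015,2015-11-10,"Doe, Jane"\n'
dat_text = (
    "þBegDocþ\x14þDateSentþ\x14þFromþ\n"
    "þABC00015þ\x14þ2015-11-10þ\x14þDoe, Janeþ\n"
)

csv_rows = read_delimited(csv_text)
dat_rows = read_delimited(dat_text, delimiter=DAT_FIELD_SEP, quotechar=DAT_QUOTE)

# Standardize the date field to the agreed format in both sets of rows.
for row in csv_rows + dat_rows:
    row["DateSent"] = to_us_date(row["DateSent"])
```

Both calls yield the same records, which is the sense in which the two formats share form and function and differ only in their delimiters. The quoting character matters because a value such as "Doe, Jane" contains the CSV delimiter itself.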
DateLastModified or Date Last Modified
Modification date(s) of a non-email document.

DateRcvd or Date Received
Received date of an email message (mm/dd/yyyy format).

DateSent or Date Sent
Sent date of an email message (mm/dd/yyyy format).

Deduplication: Global or Custodial
Deduplication is the process of removing exact duplicates of a document from a data set. In productions, deduplication can be performed Globally (i.e., across all documents) or Custodially (i.e., within each custodian’s documents). The purpose of deduplication is to reduce the number of documents to be reviewed. However, in certain instances it is acceptable to produce duplicates in order to show document context and to show who knew what and when. For example, an email with an Attachment may exist in two custodians’ folders; once deduplicated, it may exist in only one. This removes an indication that the other holder of the document had knowledge of it. When a document has been removed by deduplication, this is sometimes indicated in the Duplicates field in the metadata. (Please also research methods for deduplication, which rely on hash values such as MD5 and SHA1.)

Delimiters
Unique characters used to identify breaks between records and fields in a Load File. For example, in a CSV (Comma Separated Values) file, the comma is the separator/delimiter. Common delimiters in load files include:
Space   ¶   ASCII character 32
Quote   þ   ASCII character 34
Pipe    |   ASCII character 124
An example of how delimiters appear in a Load File is:
þProdNoþþProdBegAttþþProdEndAttþþAuthorþþFromþþToþþCCþþBCCþþDateSentþþTimeRcvdþþDateRcvdþþTimeSentþþDateCreatedþþTimeCreatedþþDateLastModþþTimeLastModþþEMail_SubjectþþTitleþþDocExtþþFilenameþþNativeFileþþOCRPathþ

Delivery Methods – FTP, CD, DVD, Hard drive, Cloud
Delivery methods range from hand delivery of physical media (hard drive, CD, DVD, thumb drive, etc.), to mailing or shipping physical media to the opposing parties, to using the internet to email the files or upload them to the cloud (using FTP, Dropbox or similar). Files delivered over the internet are usually placed in a container such as a ZIP file, both to reduce their size and to provide a protection layer, since the contents can be password protected and/or encrypted. Larger volumes of data tend to be written to encrypted physical media rather than zipped. As a caveat: be sure to apply appropriate security protocols such as encryption or password protection AT ALL TIMES!

DII File
A Summation load file that contains fielded information and links to image and text paths. In Summation, text is usually kept in separate “text files” with a .TXT file extension.

Email
An electronic means for sending, receiving and managing communications via a multitude of different structured data applications (email client software), such as Outlook or Lotus Notes, or those often known as “webmail”, such as Gmail or Yahoo Mail. From a Production standpoint, emails can be produced natively or as images.

Email families
Email families include a file created or received by an electronic mail system and any attachments transmitted with the email message. A family includes a “parent”, which is the top email hierarchically, and its “children”, which are its attachments.
The attachments could be embedded files, emails, ZIP containers, or other files such as PDFs, Office files, or the like.

(Source column, entries above: #2)

Email Subject
Subject line extracted from an email message.

Embedded Item
An Embedded Item is an electronic file placed inside of another electronic file. Common embedded items include contact cards or logos placed inside of emails, and spreadsheets placed inside of PowerPoint presentations. An embedded item is similar to, but different from, an attachment. Both are Children, but an embedded item is contained within an electronic file (see Parent) and cannot be easily detached without special software, while an attachment can be easily detached and saved separately from the Parent.

Encryption of Deliverables/Files
An electronic or digital process that renders the contents of a message, file or hardware unreadable to anyone who is not authorized to read it or who lacks the encryption key; used to protect Electronically Stored Information being stored or transferred from one location to another.

(Source column, entries above: #2)

EndAttach
Unique number identifying the last page or last document of a document’s Attachment(s). See EndDoc and Bates Number. EndAttach will be the document number assigned to the last page of an attachment in a TIFF Production or to the last attachment document in a Native Production.

EndBates
See EndDoc.

EndDoc
A common metadata field that contains the Bates number of the last page of a document in a TIFF Production or of the last document in a Native Production.

Endorsing (or Endorsement)
The process of applying a permanent, unique number to a Static Image in an Image or TIFF Production, or Hybrid Production. Also known as Branding and Stamping.

ESI
Electronically Stored Information.
As referenced in the United States Federal Rules of Civil Procedure, information that is stored electronically, regardless of the media or whether it is in the original format in which it was created, as opposed to hard copy (i.e., on paper). (Source: #2)

ESI Order
A standing Order filed with the court outlining the agreed-to guidelines relating to the discovery of Electronically Stored Information (“ESI”) and confirming that the parties have met and conferred regarding reasonable and appropriate steps taken to preserve electronic evidence.

Extracted Text
Text can be extracted from fully searchable documents such as native emails, Excel files, Word documents and searchable PDFs. If you can perform a Ctrl-F search in a document, it has extractable text. Text from emails can be extracted into metadata fields such as from, author, to, subject, date sent, etc. Text from Word documents can be extracted into a single text field. See OCR for an explanation of retrieving text from non-searchable PDFs.

Family
A Family is a group of documents that includes an original document, known as the Parent, and any other documents that are attached to or embedded inside of the Parent, known as a Child or Children. Families exist in both ESI and paper files. A common example is an email with attachments, where the email is referred to as the Parent and each attachment is referred to as a Child; together they are a Family. See also Email Families, Parent, Child, and Embedded Item.

(Source column, entries above: #2)

File Extensions
Many systems, including DOS and UNIX, allow a filename extension that consists of one or more characters following the proper filename. For example, image files are usually stored as .bmp, .gif, .jpg or .tiff. Audio files are often stored as .aud or .wav.
There are over 8,000 known file extensions identifying file formats, of which typically only 20 or 30 (or fewer) are needed for eDiscovery purposes. The filename extension should indicate what type of file it is; however, users may change filename extensions to evade firewall restrictions or for other reasons. Therefore, file types should be identified at a binary level by reference to the internal file header, rather than by relying on file extensions. To research file types, see http://www.filext.com.

File Name
Filename of the original digital file.

File Size or FileSize
The size, in bytes, of the file being produced.

File Type or FileType
The native file type of the original document, e.g., Word, Excel, Adobe, etc.

Filter
The process of identifying and excluding or including data based upon agreed-to parameters, such as file date range, author(s), folders, directories, and/or keyword search terms.

From
From field extracted from an email message.

Global Deduplication
See Deduplication. Also known as “Horizontal Deduplication”, i.e., removing duplicate items across all custodians or sources in the population.

Hidden Text
Information or text not readily visible in a document. Examples include hidden columns or formulas in an Excel spreadsheet, hidden headers or footers in a Word document, track changes in a Microsoft Office document, and presenter notes in a PowerPoint. Text can also be hidden by formatting it in the same color as the document background.

Horizontal Deduplication
See Deduplication. Also known as “Global Deduplication”, i.e., across all custodians in the population.

(Source column, entries above: #2 Modified)

Image Path
Relative file path to the location of TIFF images, if they exist in a production (e.g., Volume001\PROD001\Images\ABC00015.tif). See Relative Path.
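The File Extensions entry recommends identifying file types from the internal file header rather than the extension. A minimal Python sketch of that idea follows. The signature table is a small hand-picked sample of well-known header bytes, not a complete list, and the function name and labels are hypothetical; processing tools rely on far larger signature databases.

```python
# A few well-known file header signatures ("magic numbers"); illustrative only.
KNOWN_SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"PK\x03\x04", "ZIP container (also used by DOCX/XLSX/PPTX)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (legacy DOC/XLS/PPT/MSG)"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF8", "GIF"),
]


def identify_by_header(header: bytes) -> str:
    """Return a file type label based on the leading bytes, ignoring any extension."""
    for signature, label in KNOWN_SIGNATURES:
        if header.startswith(signature):
            return label
    return "unknown"


def identify_file(path: str) -> str:
    """Read only the first bytes of the file at `path` and identify its type."""
    with open(path, "rb") as handle:
        return identify_by_header(handle.read(16))


# A PDF that a user renamed to "notes.txt" is still reported as a PDF,
# because only the content is inspected.
renamed_pdf_bytes = b"%PDF-1.4\n..."
detected = identify_by_header(renamed_pdf_bytes)
```

Note that a ZIP signature alone cannot distinguish a plain ZIP archive from a modern Office document; telling them apart requires looking inside the container.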
LFP File
An LFP file is an IPRO image load file, used with an IPRO image viewer, that includes a cross-reference of the production number and the file path to the TIFF image. This is conceptually related to an Opticon load file (.OPT file extension), which is another common image load file format.

Load File
Common load files and extensions used in the legal industry include Summation (.DII), Concordance (.DAT and .OPT), Relativity (.DAT), IPRO (.LFP), Ringtail (.MDB) and DB/Text Works (.TXT). In the litigation community, “load file” is the term commonly used to refer to a file used to transfer data (coded, captured or extracted from ESI processing) so that it can be imported into another discovery management database application, complete with links to maintain the relationships of metadata, native files, document images and text. Load files are frequently text files that have delimited fields of information. Such load files may hold data about documents to be imported into document management software such as Concordance or Summation, or they may hold the path or directory locations where images reside so that the software can link the images to their corresponding records. Some database programs require one load file for importing images, text and data, while others require separate load files for data and text.

LST File
LST files are word lists that contain predefined values that can be selected from a list when editing fields. When a list is assigned to a specific field, it is called an Authority List. An LST extension is also sometimes used for text load files, which are files that contain a cross-reference between the document identifier (e.g., DOCID or BegDoc) and the file path to the corresponding file containing extracted or OCR text.

MD5 Hash
Unique 32-character hexadecimal value, a “digital file fingerprint,” that is generated with a 128-bit algorithm.
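As a rough illustration of how a hash value supports the Global and Custodial Deduplication described elsewhere in this glossary, the following Python sketch computes an MD5 digest with the standard hashlib module and keeps the first copy of each document. The function names, record layout and sample data are hypothetical, and the sketch hashes raw bytes only; processing tools often hash emails on selected fields instead, which is one reason the parties should agree on the method.

```python
import hashlib


def md5_hex(content: bytes) -> str:
    """Return the MD5 digest of the content as a 32-character hexadecimal string."""
    return hashlib.md5(content).hexdigest()


def deduplicate(documents, scope="global"):
    """Keep the first copy of each document.

    documents: iterable of (custodian, filename, content_bytes) tuples.
    scope: "global" compares hashes across all custodians;
           "custodial" compares hashes only within each custodian's set.
    """
    first_seen = {}  # dedup key -> position of the kept record
    kept = []
    for custodian, filename, content in documents:
        digest = md5_hex(content)
        key = digest if scope == "global" else (custodian, digest)
        if key in first_seen:
            record = kept[first_seen[key]]
            # Remember who else held a copy (compare the OtherCustodians field).
            if custodian != record["custodian"] and custodian not in record["other_custodians"]:
                record["other_custodians"].append(custodian)
            continue
        first_seen[key] = len(kept)
        kept.append({
            "custodian": custodian,
            "filename": filename,
            "md5": digest,
            "other_custodians": [],
        })
    return kept


sample = [
    ("Doe, Jane", "memo.msg", b"quarterly results"),
    ("Roe, Richard", "memo.msg", b"quarterly results"),
    ("Doe, Jane", "memo_copy.msg", b"quarterly results"),
    ("Doe, Jane", "budget.xls", b"budget figures"),
]
global_set = deduplicate(sample, scope="global")
custodial_set = deduplicate(sample, scope="custodial")
```

With this sample, global deduplication keeps two records and notes the second custodian on the surviving memo, while custodial deduplication keeps three, because each custodian retains one copy of the memo.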
There are many types of hash that can be used; others include SHA1, SHA2, SHA256 and more. Regardless of which you use, ensure that both sides agree on the hashing method / algorithm to be used.

Metadata
Metadata literally means “data about data” and consists of coded information that is usually not visible to the user and reflects characteristics of the ESI (such as origin, usage, structure, and alteration). Systems and applications automatically generate most metadata. For example, metadata can describe how, when, and by whom ESI was created, accessed, and modified. Some metadata, such as file dates and sizes, can easily be seen by users. Other metadata is hidden or embedded and generally unavailable to non-technical users. Metadata can be external to a file or document, such as metadata from the computer’s file system, or it can be embedded in the document itself. There can be hundreds or even thousands of fields of metadata associated with an individual file. In fact, some ESI may contain more metadata than user-visible data. Because much of the metadata may be neither relevant nor necessary for searching, sorting and analyzing the ESI, it may be unnecessary or unhelpful to produce certain metadata fields.

Native File
Electronic documents have an associated file structure defined by the application that originally created them. This file structure is referred to as the native file format of the document. Because viewing or searching documents in native format may require the original application (for example, viewing a Microsoft Word document requires the Microsoft Word application or a viewer that can handle the native format), documents may be converted to a neutral format as part of the record acquisition or archive process.
Static formats (often called imaged formats), such as TIFF or PDF, are designed to retain an image of the document as it would look when viewed in the original creating application, but do not allow metadata to be viewed or the document information to be manipulated unless agreed-upon metadata and extracted text are preserved. In the conversion to static format, some metadata can be captured, processed, preserved and electronically associated with the static format file. However, with technology advancements, tools and applications are increasingly available to allow viewing and searching of documents in their native format while still preserving pertinent metadata.
It should be noted that not all ESI may be conducive to production in either native format or imaged format, and some other form of production may be necessary. Database data files, for example, often present such issues.

(Source column, entries above: #3, #5, #2)

Native Link
Path and filename to produced native file.

Native Path
Relative path to the native file as included in the production (e.g., d:\PROD001\natives\ABC00015.xls) for all files produced in native format.

NSF File
An NSF file is a database container file from a Lotus Notes / Domino Server system. It may contain emails, contacts, calendar and task items, and can be used as an archive or to transfer such data offline. A company that uses Lotus Notes will frequently transfer ESI out of its environment in an NSF file. Also, custodians may retain NSF files offline that may need to be considered for collection. An NSF file is the functional equivalent of a PST file from a Microsoft Outlook / Exchange server system.

OCR
Optical Character Recognition (OCR) is a technology process that captures searchable text from an image file so that it can be associated with the image and searched as text within a discovery database. OCR software evaluates scanned data for shapes it recognizes as letters or numerals.
OCR accuracy depends on the clarity of the image being converted to text and is not always reliable, especially with poor print or scan quality or faded images. Shapes and graphics that appear in the middle of text will usually reduce OCR quality. See Extracted Text. Documents that are produced from scanned paper in TIFF/Image format should be OCR’ed. Keep in mind when requesting OCR or Extracted Text that OCR is often less accurate than native text extraction.

OPT File
An OPT file is an image load file used with a Concordance Image viewer that includes a cross-reference of the production number and the file path to the TIFF image. This is conceptually similar to the LFP file.

OriginalFileLocation
For email, the folder, if any, where the email was stored. For non-email, the folder location where the file was stored in the normal course of business.

OtherCustodians
Identifies duplicate custodian sources for files excluded from production based on MD5 hash de-duplication.

Paper - Quality of Images
Paper documents that were created in the past fifteen years were very likely created in electronic form and were printed and rescanned. The digital image quality of the document diminishes as multiple generations of a document are made (copy of a copy of a copy). These documents yield poor OCR results and are often not reliably searchable.

Parent
An original document that is part of a document Family, to which other documents are attached or in which other items are embedded.

PDF
Portable Document Format. PDFs can be created using either the native Adobe application or a variety of other systems. PDFs can be searchable or non-searchable. In discovery it is important to consider which type of PDF is being produced.
1) A searchable PDF has extractable text and can be identified when you can copy or highlight text or when you can perform the Windows CTRL-F (Find) function.
2) A non-searchable PDF will require OCR’ing if you want to be able to search the text content within that file.
If created from a native file directly on the computer, PDF quality will typically be good. Conversely, OCR’d or scanned images can be output as PDFs with searchable text, but will often have poor image quality. PDF is typically not considered a desirable production format and, while common practice at many law firms, especially those without review tools, can lead to unwieldy evidence handling. Multiple PDFs crammed into a single PDF file are even less desirable, unless the PDFs are bookmarked. Avoid this type of production protocol and instead consider leveraging the capabilities of the emerging range of tools that allow inexpensive online hosting and review.

Privilege Log
A Privilege Log is a list of withheld documents and the legal basis on which the producing party is withholding them (e.g., documents subject to an asserted privilege or the Work Product Doctrine). The Privilege Log must typically provide enough specificity to permit the requesting party and the court to reasonably determine the sufficiency of the asserted privilege basis.

Privilege Review
The review of documents by counsel to determine whether they should be withheld on the grounds of asserting some form of recognized privilege (attorney-client, doctor-patient, priest-penitent) or on the basis of the Work Product Doctrine. Documents withheld must generally be listed in a Privilege Log.

Processing
Processing is a technical function during which collected ESI is passed through various systems that can capture and preserve document and file metadata, create file hashes, extract text, create static images and create a load file that can be imported into a discovery management database application.
Some discovery database applications include processing functions in the application and others rely on separate processing applications to perform the functions listed above.

Production Number
A Control Number used to identify a unique document or page within a production.

Product Volume or ProdVolume
Identifies production media deliverable.

PST File
A PST file is an output container file from a Microsoft Outlook / Exchange Server system. It may contain exported emails, contacts, calendar and task items, and can be used as an archive or to transfer such data offline. A company that uses Microsoft Outlook / Exchange will frequently transfer ESI to vendors in a PST file. Also, custodians may retain PST files offline that may need to be collected. It is the functional equivalent of an NSF file from a Lotus Notes Server / Domino Server system.

Redacted
“Yes” for redacted documents; otherwise, blank. Redactions may also be applied to metadata fields to protect privilege. (See also Redactions.)

Redactions
A portion of an image or document that is intentionally obscured or removed to prevent disclosure of a specific portion of the content. Redaction is performed to protect privileged content or to remove irrelevant portions, including highly confidential, sensitive or proprietary information. Redactions typically contain labels, such as “Redacted – Privileged”, but specific requirements are typically negotiated between parties.
Spreadsheet redactions are complex and expensive. A better alternative is to filter the spreadsheet and save it, and add a slip-sheet to the production that indicates “Redacted Natively”. (Source: #2)

Relative Path
Relative path to text files in production output (e.g., :\PROD001\Text\ABC00015.xls). Text Path is the path to the location of text files on the transfer media. When producing documents you may be asked to provide the text path, which is simply the path to the location of the text files on the transfer media. Generally, vendors will provide separate folders for native and text files. However, it is not uncommon for some vendors to include text files and native files in the same folder.

Scan Quality
Paper documents are optically scanned to convert them to TIFF images to allow electronic production as opposed to a hard copy production. Generally, text will be extracted from the scanned images with OCR (Optical Character Recognition) software. The quality of the scanned image directly affects the quality of the OCR text, which in turn affects the number of character errors generated per line of text. Documents such as forms containing boxes, and documents with handwritten notes, may create a perfect TIFF image but will most likely render poor OCR results, where the extracted text is less than useful.

Slip Sheets
Slip-sheets are used as placeholders either to separate documents or to identify a reason why an expected document is not present in a collection. For example, a slip-sheet can be used to indicate documents that are produced natively or withheld for privilege or other reasons.

Speaker Notes
Speaker notes are an element of Microsoft PowerPoint that allows the user to insert notes in a designated section on a slide for reference during a presentation. Speaker notes are not printed by default, and during processing of ESI, the text in speaker notes will not be extracted or shown on a Static Image by default. It has become fairly common for requesting parties to request that Speaker Notes be “turned on” for processing.

Stamping
The process of applying a permanent, unique number to a Static Image in an Image or TIFF Production, or Hybrid Production.
Also known as Endorsing and Branding. Text is the extracted text from native files, or text from paper documents which have been scanned and processed with OCR (Optical Character Recognition) software. Extracted text can be contained in a multi-page text file (.TXT) for each document produced or single page (.TXT) file for each page produced. Text Path Text Path (path to text files in production output e.g. :\PROD001\Text\ABC00015.xls) ) Text Path is the path to the location of text files on production transfer media. When producing documents you may be asked to provide the text path, which is simply the path to the location of the text files on the production media. Generally, vendors will provide separate folders for native and text files. However, it is not uncommon for some vendors to include text files and native files in the same folder. Tiff Images TIFF (Tag Image File Format). A TIFF file can be identified as a file with a ".tiff" or ".tif" file name suffix. One of the most common graphic image formats, TIFF files are commonly used in ESI production, and are typically requested to be generated in a Group IV 300 dpi greyscale standard. Color TIFFs can also be requested in special circumstances. A time zone is a region that observes a uniform standard time for legal, commercial, and social purposes. Time Zone Most of the time zones on land are offset from Coordinated Universal Time (UTC) by a whole number of hours (UTC−12 to UTC+14), but a few are offset by 30 or 45 minutes (for example Newfoundland Standard Time is UTC 03:30 and Nepal Standard Time is UTC +05:45). Some higher latitude countries use daylight saving time for part of the year, typically by changing clocks by an hour. Title It is important to consider time zones when producing data from multiple regions. It is somewhat common practice to conform the times (and dates) of all documents to a particular time zone, for ease of overall reference. 
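To illustrate what conforming times to one zone can look like, the Python sketch below converts timestamps recorded in two different zones to UTC and sorts them. The sample messages, the helper name `conform`, and the choice of UTC as the target zone are assumptions made for illustration only. The standard `zoneinfo` module relies on a time zone database being available on the system.

```python
from datetime import datetime
from zoneinfo import ZoneInfo


def conform(timestamp, source_zone, target_zone="UTC"):
    """Interpret a local timestamp in source_zone and convert it to target_zone."""
    local = datetime.fromisoformat(timestamp).replace(tzinfo=ZoneInfo(source_zone))
    return local.astimezone(ZoneInfo(target_zone))


# Hypothetical messages: (zone where the message was sent, local send time)
messages = [
    ("America/St_Johns", "2015-11-10 09:00"),  # Newfoundland
    ("Asia/Kathmandu", "2015-11-10 17:30"),    # Nepal
]

# Once every time is expressed in the same zone, sorting gives the true order.
conformed = sorted(conform(ts, zone) for zone, ts in messages)
for moment in conformed:
    print(moment.isoformat())
```

In this example the Nepal message sorts first once both times are expressed in UTC, even though its local clock time (17:30) is later than the Newfoundland message's (09:00).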
Conforming times to a single time zone helps the reviewer understand the relative times of messages in a conversation. Without it, the order of the messages is sometimes difficult to determine.

Title
Title field extracted from the metadata of a non-email document.

To
To or Recipient field extracted from an email message.

Unitization
Each page of a document will be electronically saved into an image file. If a document is more than one page, the unitization of the document and any attachments will be maintained as it existed in the original form and reflected in the load file. The parties will make their best efforts to unitize documents correctly.

Vertical Deduplication
See, Deduplication. Also known as “Custodial Deduplication”, i.e., deduplication within a set of documents for a single custodian, as opposed to horizontally across all custodians.

Definitions Attribution List
1) https://www.law.cornell.edu/rules/fre/rule_502
2) Sedona Conference Glossary
3) http://help.lexisnexis.com/litigation/ac/cn_classic/database_files.htm
4) Gibson Dunn Crutcher (http://www.gibsondunn.com/publications/Documents/E-DiscoveryBasicsProductionofESI-Vol1No9.pdf)
5) Your kindergarten teacher (play nicely together…). Read the various Sedona articles.

Key Contributors
LTPI is grateful to the following people who contributed their time, knowledge, guidance and expertise: Seth Eichenholtz, Quin Gregor, Cynthia Johnson, Eric Mandel, Nilsa Moreno, Chris Paskach, and Bob Rohlf, along with the LTPI Leadership and Advisory Panels for their support and oversight.