Role of Enterprise Search in E-Discovery
June 18, 2008
© 2008 Clearwell Systems, Inc. Confidential

Enterprise E-Discovery is a business process

Search is central to E-Discovery

[Figure: Electronic Discovery Reference Model (www.edrm.net). Stages: Information Management, Identification, Preservation, Collection, Processing, Review, Analysis, Production, Presentation. Axis labels: VOLUME, RELEVANCE.]

Identification Search
• Custodians
• Meta-Data
• Date Range
• Media Type
• Data Type

Collection Search
• By Custodian
• By Operator
• By Data Type
• By keyword, phrase, concept
• By Project

Analysis/Review Search
• Responsiveness
• Privilege Determination
• Review Grouping
• Near-duplicates
• Quality Control

FRCP Rules governing E-Discovery

Rule             Summary Reading
Rule 16(b)       Outline plans for e-discovery and document production
Rule 26(f)       Procedures and protocols to govern e-discovery
Rule 16(b)(5)    Courts to include scheduling orders
Rule 26(a)       Expansion on definition of ESI
Rule 26(b)(2)    E-Discovery scope; cost-shifting arguments: burden of reasonableness moving to Requesting Party
Rule 26(b)(5)    Inadvertently disclosed ESI and privilege claw-back agreements
Rule 34(b)       Specify forms of production (Native, Image, etc.)
Rule 37(f)       Disallow sanctions when ESI lost due to retention policy and good faith efforts

FRCP Rules and Their Impact on E-Discovery
• Emphasis on co-operation during E-Discovery
• Sedona Principles as a Guide for E-Discovery
• Early Discovery Planning Conferences
• No "Gaming" of E-Discovery
• Prepare for Meet and Confer
    • Organizational Structure
    • Information Assets and Data Map
    • ILM Policies and Procedures
    • Backup and Disaster Recovery Practices
    • Preservation Hold/Legal Hold Policies and Actions
• Establish E-Discovery Scope
    • Estimate Review Size from automated Search Results
    • Raw Volume, Processed Volume, Review Volume
    • Substantiate "Not Reasonably Accessible" Claims
    • Move burden of "cost provability" to the Requesting Party

Enabling E-Discovery within an Enterprise

[Diagram labels: Organizational Data; File Shares; Messaging servers; CMS; Enterprise Intranet; ECM/ILM Policies; Digital Asset Database; Meta-Data Index; Keyword Index; Data Map; Case Data; Preservation Hold; Analysis, Culling, Review; IT Personnel; Legal; Search/Analysts.]

E-Discovery Search Characteristics

Theme
• Produce Entire Results: not sufficient to only produce Top N
• No Estimates of Counts: must provide accurate, actual counts
• Stability of Results
• Very large Result Sets
• Fast Query Response Time

Relevance
• Activity Based Relevance: Responsiveness Search vs. Privilege Search
• Meta-Data based Relevance: Timeliness, People, Connection to other data
• Review-directed Relevance
• Traditional TF/IDF based Relevance
• Provide Complete Hit Context

Data Types
• Many data formats: 10,000 formats
• New communication formats: Wiki, Blogs, SMS, IM, Unified Messaging
• Remove impediments to search: ACLs, Encryption, Container Extraction
• ESI from old, legacy applications
• Incomplete and Corrupt data (Deleted Files, raw disk blocks)
• Handle Multi-language ESI
• Handle Low-fidelity documents: OCR-scanned images

Flexibility
• Advanced Search/Query Language
• Iterative Search and Search Refinement
• Guided Navigation, one-click Filtering
• Saving and Sharing Searches
• Real-time updates for Tagging, Classifying Results

Results Management
• Complete Auditing of all Searches
• Document Hit Count Reports
• Tie back to original Document Meta-Data
• EDRM XML-2 Export to downstream processes
• Group Near-Duplicates, Concept Clusters for Review Efficiency

Workflow
• Incremental ESI Collections (Batches)
• Multi-level Review
• Multi-person Review
• Rolling Productions
• Activity Reports
• Outside Counsel, Opposing Counsel interactions
• Project Management

Search Effectiveness
Techniques to improve Precision and Recall

Precision
• Pre-filtering wildcard expansions
• Boolean Queries
• Proximity Specification
• Keyword Scope (Sentence, Paragraph)
• Meta-Data Context Search
• Entity based Search

Recall
• Misspellings/Fuzzy Search
• Wildcard Specifications
• Synonyms
• Related Terms
• Concept Search
• Bayesian Search

[Figure: Precision (vertical axis, 0.0 to 1.0) plotted against Recall (horizontal axis, 0.0 to 1.0).]

E-Discovery Search: Typical measures and outcomes

Number of truly responsive documents in Retrieved Collection:

Search Method        Retrieved Documents   Sample Size   Responsive in Sample   Estimate in Retrieved   Precision
Keyword Search       14846                 1537          940                    9080                    0.61
Discussion Threads   16515                 1537          1069                   11486                   0.70
Concept Search       18554                 1537          1128                   13617                   0.73

Number of truly responsive documents in Un-Retrieved Collection:

Search Method        Unretrieved Documents   Sample Size   Responsive in Sample   Estimate in Unretrieved   Recall
Keyword Search       1076258                 1537          29                     20307                     0.31
Discussion Threads   1074589                 1537          28                     19576                     0.37
Concept Search       1072550                 1537          26                     18143                     0.43

Interactive Search: Key to Search Efficiency

Interactive wildcard, stemming expansion selection
• Removes precision-recall tradeoff by enabling interactive review and removal of false positive expansions
• Save thousands of dollars per search

Search Report
• Detailed, interactive keyword search report results for iterative large query execution
• Full transparency and auditing
• Significant time savings

E-Discovery is about extracting Relevant Content

[Figure: data volume shrinking across stages. Stage labels: Archive and Store; Preservation Store; Collect and Preserve; Analyze and Review. Volume labels, largest to smallest: 100-1000 TB; 50 TB; 500 GB; 1-2 GB.]

Enterprise Case Study: Global Media Conglomerate
• Eliminating the need to process and review 456,000 documents saved $175,000
• Data culling based on query permutations reduced data set by 99% to 417
• Time = 2.5 days

[Figure: Case Data document counts shown in the chart: 456,448; 208,628; 74,713.]
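
The precision and recall columns in the "Typical measures and outcomes" tables can be reproduced from the sampled counts. The sketch below assumes the estimates were obtained by scaling each sample's responsive rate up to its population, that precision is the estimated responsive count divided by the retrieved count, and that recall is the estimated responsive retrieved count divided by the estimated responsive total. The slides do not state the method, so the function and type names here are hypothetical; only the input numbers come from the tables.

```python
from typing import NamedTuple


class SearchEstimate(NamedTuple):
    responsive_retrieved: int    # estimated responsive docs among retrieved
    responsive_unretrieved: int  # estimated responsive docs among unretrieved
    precision: float
    recall: float


def scale_up(population: int, sample_size: int, responsive_in_sample: int) -> int:
    """Extrapolate the responsive rate seen in a sample to the whole population."""
    return round(population * responsive_in_sample / sample_size)


def estimate(retrieved: int, unretrieved: int, sample_size: int,
             hits_retrieved_sample: int, hits_unretrieved_sample: int) -> SearchEstimate:
    est_ret = scale_up(retrieved, sample_size, hits_retrieved_sample)
    est_unret = scale_up(unretrieved, sample_size, hits_unretrieved_sample)
    precision = est_ret / retrieved
    recall = est_ret / (est_ret + est_unret)
    return SearchEstimate(est_ret, est_unret, precision, recall)


# (retrieved, unretrieved, responsive in retrieved sample, responsive in unretrieved sample)
# taken from the two tables; every sample has 1537 documents.
METHODS = {
    "Keyword Search": (14846, 1076258, 940, 29),
    "Discussion Threads": (16515, 1074589, 1069, 28),
    "Concept Search": (18554, 1072550, 1128, 26),
}

for name, (ret, unret, hits_ret, hits_unret) in METHODS.items():
    e = estimate(ret, unret, 1537, hits_ret, hits_unret)
    print(f"{name}: precision={e.precision:.2f} recall={e.recall:.2f}")
```

Under these assumptions the computed values agree with the tabulated ones, which suggests the recall figures depend heavily on the small number of responsive documents found in the unretrieved sample (26 to 29 out of 1537).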

E-Discovery - Workflow

[Diagram labels: SOURCES; Source SCAN; Meta-Data (Shallow Index); SQL; Full Text Indexer; Copy Engine; Deep Index; Full-Text; Case ESI Store; Processing; Case Mgmt; Processing Manifest.]

Rate of Ingestion
• 1M files/hour
• 10K directory scans
• 1 TB/hour

Rate of Indexing
• 100K files/hour
• 10-20 GB/hour

Rate of Processing
• 100 custodians
• 10K files/hour
• 1 GB PST/custodian

Rate of Extraction
• 20K files/hour
• 2-4 GB/hour

Size of Index
• 0.2 TB
• 1 billion rows
• 10K/s Bulk Load

Size of Index
• 1 TB
• 10 billion objects

Size of Index
• 1 TB (each partition)
• Up to 100 index partitions
• 10 billion objects
• 200-400 file types
• Includes meta-data

Size of Store
• 32 TB FC/SCSI
• 4 TB NTFS
• 300 GB/custodian
• 100 custodians

Size of Manifest
• 10 million items

E-Discovery Search: Collection Workflow

[Diagram labels: SOURCES; Source SCAN; Meta-Data (Shallow Index); Search; Case Document Collection.]

Search Scope
• Owners/SID
• Last Modification Date
• Creation Date
• Author/Title
• Department

Search Technology
• Keyword Search
• Parameterized Date Range

Copy of Original
• Maintain Original Locations
• Hash with Meta-Data for content and location integrity
• Hash without Meta-Data for content integrity

E-Discovery Search: Analysis Workflow

[Diagram labels: Case Document Collection; Search; Potentially Privileged Documents; Privilege Review; Privileged Documents; Privilege "Misses" Review; Potentially Responsive Documents; Document Review; Responsive Documents; Non-Responsive Documents; Production Documents; Search Sampling Engine; Sample Non-Responsive Documents; Responsive Misses "Recall".]

Privilege Search
• Documents
• Emails

Search Scope
• Documents
• Emails

Search Technology
• Keywords
• Boolean Search
• Proximity Search
• Fuzzy Search
• Concept Search
• Tagged Search

Search Refinement
• Additional Keywords
• Additional Search Methods

Search Sampling Engine

Confidence Level   Sample Size
95                 1537
99                 66358

Quality Control
• Documents
• Emails
• Tags

Reports
• Search Reports
• Activity Reports
• QC Reports
• Project Review Reports
• Privilege Log
• Exceptions Reports
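
The Search Sampling Engine sample sizes are given without a derivation. They are consistent with the standard sample-size formula for estimating a proportion, n = z^2 * p * (1 - p) / e^2, under assumptions that are inferred here and not stated on the slide: a worst-case proportion p = 0.5, a margin of error of 2.5% at the 95% confidence level and 0.5% at the 99% level, rounded z-scores of 1.96 and 2.576, and no finite-population correction. The function name and parameters below are hypothetical.

```python
import math

# Conventional rounded z-scores for two-sided confidence levels (assumed, not from the slide).
Z_SCORES = {95: 1.96, 99: 2.576}


def sample_size(confidence_pct: int, margin_of_error: float, p: float = 0.5) -> int:
    """Minimum sample size to estimate a proportion within +/- margin_of_error.

    Uses n = z^2 * p * (1 - p) / e^2 and rounds up to a whole document.
    p = 0.5 is the worst case (largest required sample).
    """
    z = Z_SCORES[confidence_pct]
    return math.ceil(z * z * p * (1 - p) / margin_of_error ** 2)


print(sample_size(95, 0.025))  # 1537
print(sample_size(99, 0.005))  # 66358
```

The 95% figure is also the sample size used in every row of the "Typical measures and outcomes" tables, so the same margin-of-error assumption would apply to those estimates.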