
Technology-Assisted Review
Can be More Effective and
More Efficient Than Exhaustive
Manual Review
Maura R. Grossman
Wachtell, Lipton, Rosen & Katz
mrgrossman@wlrk.com
(212) 403-1391
Gordon V. Cormack
University of Waterloo
gvcormac@uwaterloo.ca
(519) 888-4567 x34450
Watson Versus Jennings and Ritter
2
Debunking the Myth of Manual Review

The Myth:
• That “eyeballs-on” review of each and every document in a
  massive collection of ESI will identify essentially all responsive
  (or privileged) documents; and
• That computers are less reliable than humans in identifying
  responsive (or privileged) documents.

The Facts:
• Humans miss a substantial number of responsive (or privileged)
  documents;
• Computers, aided by humans, find at least as many
  responsive (or privileged) documents as humans alone; and
• Computers, aided by humans, make fewer errors on
  responsiveness (or privilege) than humans alone, and are far
  more efficient than humans.
3
Human Assessors Disagree!

Suppose two assessors, A and B, review the same set of
documents;

Overlap = (# documents coded responsive by both A and B) / (# documents coded responsive by A or B, or both A and B)
Example: Primary and secondary assessors
both code 2,504 documents as responsive.
One or both code 2,531 + 2,504 + 463 =
5,498 documents as responsive.
Overlap = 2,504 ∕ 5,498 = 45.5%.
4
More Human Assessors Disagree Even More!

Suppose three assessors, A, B, and C, review the same set of
documents;

Overlap = (# documents coded responsive by A and B and C) / (# documents coded responsive by one or more of A, B, or C)
Example: Primary, secondary, and tertiary
assessors all code 1,972 documents as
responsive.
One or more code 1,482 + 532 + 224 + 1,972 +
1,049 + 239 + 522 = 6,020 documents as
responsive.
Overlap = 1,972 / 6,020 = 32.8%.
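
Both overlap figures can be checked with a short calculation. The sketch below is illustrative only, not the authors' code: the function name `overlap` and the synthetic document IDs are assumptions, and only the counts are taken from the two examples. Which assessor accounts for the 2,531 and which for the 463 is not stated on the slide; the result is the same either way because overlap is symmetric.

```python
def overlap(*coded_responsive):
    """Overlap among assessors: size of the intersection divided by the size
    of the union of the sets of documents each assessor coded responsive."""
    sets = [set(s) for s in coded_responsive]
    union = set.union(*sets)
    if not union:
        raise ValueError("no assessor coded any document responsive")
    return len(set.intersection(*sets)) / len(union)


# Two assessors, using synthetic IDs that reproduce the slide's counts:
# 2,504 coded responsive by both, 2,531 by one only, 463 by the other only.
both = set(range(0, 2504))
a = both | set(range(10_000, 10_000 + 2531))
b = both | set(range(20_000, 20_000 + 463))
two_way = overlap(a, b)
print(f"{two_way:.1%}")  # 45.5%

# Three assessors, computed directly from the seven region counts.
three_way = 1972 / (1482 + 532 + 224 + 1972 + 1049 + 239 + 522)
print(f"{three_way:.1%}")  # 32.8%
```

Because the numerator can only shrink and the denominator can only grow as assessors are added, overlap never increases with more assessors.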
5
Pairwise Assessor Overlap in the TREC 4 IR Task (Voorhees 2000)
6
Assessor Overlap With the Original Response to
a DOJ Second Request (Roitblat et al. 2010)
7
Assessor Overlap: IR Versus Legal Tasks
8
What is the “Truth”?
Option #1: Deem Someone Correct
Deem the primary reviewer as the gold standard (Voorhees 2000).
9
What is the “Truth”?
Option #2: Take the Majority Vote
Deem the majority vote as the gold standard.
10
What is the “Truth”?
Option #3: Have all Disagreements
Adjudicated by a Topic Authority
Have a senior attorney adjudicate every case of disagreement, and only those
cases (Roitblat et al. 2010; TREC Interactive Task 2009).
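
The three options for establishing the “truth” can be compared side by side on a single document. This is a minimal sketch under stated assumptions: the function names are invented for illustration, codings are modeled as booleans (responsive or not), and `topic_authority` is a hypothetical callback standing in for the senior attorney.

```python
from collections import Counter


def deem_primary(codings):
    """Option #1: the first (primary) assessor's coding is taken as correct."""
    return codings[0]


def majority_vote(codings):
    """Option #2: the most common coding wins (assumes an odd number of assessors)."""
    return Counter(codings).most_common(1)[0][0]


def adjudicate(codings, topic_authority):
    """Option #3: unanimous codings stand; only disagreements are adjudicated."""
    if len(set(codings)) == 1:
        return codings[0]
    return topic_authority(codings)


# One document coded by primary, secondary, and tertiary assessors.
codings = [True, False, False]
print(deem_primary(codings))                                # True
print(majority_vote(codings))                               # False
print(adjudicate(codings, topic_authority=lambda c: True))  # True
```

The same document can end up with a different gold-standard label under each option, which is why results measured against different gold standards are hard to compare.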
11
How Good are Human Eyeballs?

What do we mean by “How Good”?

Recall;
Precision; and
F1.
12
Measures of Information Retrieval

Recall = (# of responsive documents retrieved) / (Total # of responsive documents in the entire document collection)
(“How many of the responsive documents did I find?”)

Precision = (# of responsive documents retrieved) / (Total # of documents retrieved)
(“How much of what I retrieved was junk?”)

F1 = The harmonic mean of Recall and Precision.
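
Written out, the harmonic mean is F1 = 2 × Precision × Recall / (Precision + Recall). The sketch below computes all three measures for a small made-up example; the function name and the document IDs are assumptions for illustration and do not come from any of the studies cited.

```python
def recall_precision_f1(retrieved, responsive):
    """Return (recall, precision, F1) for a set of retrieved documents,
    given the set of documents that are truly responsive."""
    retrieved, responsive = set(retrieved), set(responsive)
    hits = len(retrieved & responsive)  # responsive documents retrieved
    recall = hits / len(responsive) if responsive else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    # Harmonic mean; defined as 0 when nothing responsive was retrieved.
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return recall, precision, f1


# Hypothetical collection: documents 1-10 are responsive; a review retrieves 6-13.
recall, precision, f1 = recall_precision_f1(range(6, 14), range(1, 11))
print(f"recall={recall:.1%} precision={precision:.1%} F1={f1:.1%}")
# recall=50.0% precision=62.5% F1=55.6%
```

F1 sits closer to the lower of the two values, so a review cannot score well by maximizing recall at the expense of precision or vice versa.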
13
Recall and Precision
14
The Recall-Precision Trade-Off
[Figure: Precision (vertical axis, 0% to 100%) plotted against Recall (horizontal axis, 0% to 100%). Labeled items: Perfection; Blair & Maron (1985); typical result in a manual responsiveness review; TREC Best Benchmark (best performance on Precision at a given Recall).]
15
How Good is Manual Review?
16
Effectiveness of Manual Review
17
How Good is Technology-Assisted Review?
18
What is “Technology-Assisted Review”?
19
Defining “Technology-Assisted Review”

The use of machine learning technologies to categorize an entire
collection of documents as responsive or non-responsive, based on
human review of only a subset of the document collection. These
technologies typically rank the documents from most to least likely to
be responsive to a specific information request. This ranking can
then be used to “cut” or partition the documents into one or more
categories, such as potentially responsive or not, in need of further
review or not, etc.

Think of a spam filter that reviews and classifies email into
“ham,” “spam,” and “questionable.”
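
The rank-then-cut idea can be sketched in a few lines. This is an illustration only: the scores, the two cut-off values, and the category labels are assumptions chosen for the example, and a real system would produce the scores from a trained model.

```python
def rank_and_cut(scores, review_above=0.8, discard_below=0.2):
    """Rank documents from most to least likely responsive, then partition them.

    `scores` maps document ID -> estimated likelihood of responsiveness in [0, 1].
    The cut-off values are illustrative assumptions, not recommended settings.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    buckets = {
        "likely responsive": [],
        "needs further review": [],
        "likely non-responsive": [],
    }
    for doc_id in ranked:
        if scores[doc_id] >= review_above:
            buckets["likely responsive"].append(doc_id)
        elif scores[doc_id] < discard_below:
            buckets["likely non-responsive"].append(doc_id)
        else:
            buckets["needs further review"].append(doc_id)
    return ranked, buckets


scores = {"doc-a": 0.95, "doc-b": 0.10, "doc-c": 0.55, "doc-d": 0.85}
ranked, buckets = rank_and_cut(scores)
print(ranked)  # ['doc-a', 'doc-d', 'doc-c', 'doc-b']
print(buckets["needs further review"])  # ['doc-c']
```

Moving the cut-offs trades recall against precision: a lower `review_above` captures more responsive documents but admits more non-responsive ones.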
20
Types of Machine Learning

SUPERVISED LEARNING = where a human chooses the
document exemplars (“seed set”) to feed to the system and
requests that the system rank the remaining documents in the
collection according to their similarity to, or difference from, the
exemplars (i.e., “find more like this”).

ACTIVE LEARNING = where the system chooses the document
exemplars to feed to the human and requests that the human make
responsiveness determinations from which the system then learns
and applies that learning to the remaining documents in the
collection.
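
The difference between the two approaches lies in who picks the exemplars. The sketch below contrasts them with deliberately crude stand-ins: bag-of-words cosine similarity for “find more like this,” and picking the scores closest to 0.5 (uncertainty sampling) for the system's choice of what to show the human. Both techniques, and all function names, are assumptions for illustration; the slides do not say which methods any particular tool uses.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts (a deliberately crude document representation)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two term-count vectors."""
    dot = sum(u[t] * v[t] for t in u)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def find_more_like_this(seed_set, collection):
    """Human-chosen exemplars: rank the collection by similarity to the seed set."""
    profile = sum((vectorize(doc) for doc in seed_set), Counter())
    return sorted(
        collection, key=lambda doc: cosine(profile, vectorize(doc)), reverse=True
    )


def pick_for_human_review(scores, batch_size=1):
    """System-chosen exemplars: ask the human about the documents the system
    is least sure of (scores closest to 0.5)."""
    return sorted(scores, key=lambda doc: abs(scores[doc] - 0.5))[:batch_size]


seed = ["shred the audit documents"]
collection = [
    "lunch menu for friday",
    "please shred those audit files",
    "fantasy football draft",
]
top = find_more_like_this(seed, collection)[0]
print(top)  # please shred those audit files

queried = pick_for_human_review({"d1": 0.97, "d2": 0.52, "d3": 0.05})
print(queried)  # ['d2']
```

In the first function the human's effort goes into choosing good seeds; in the second it goes into answering the system's questions, after which the model would be retrained on those answers.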
21
Machine Learning
Step #1: Achieving High Precision
Document Set for
Review
Source: Servient Inc. http://www.servient.com/
22
Machine Learning
Step #2: Improving Recall
Document Set Excluded From
Review
Source: Servient Inc. http://www.servient.com/
23
How Do We Evaluate Technology-Assisted Review?
24
The Text REtrieval Conference (“TREC”):
Measuring the Effectiveness of Technology-Assisted Review

International, interdisciplinary research project sponsored by the National
Institute of Standards and Technology (NIST), which is part of the U.S.
Department of Commerce.

Designed to promote research into the science of information retrieval.

First TREC conference was held in 1992; the TREC Legal Track began in 2006.

Designed to evaluate the effectiveness of search technologies in the context of e-discovery.

Employs hypothetical complaints and requests for production drafted by
members of The Sedona Conference®.

For the first three years (2006-2008), documents were from the publicly available
7-million-document tobacco litigation Master Settlement Agreement database.

Since 2009, publicly available Enron data sets have been used.

Participating teams of information scientists from around the world and U.S.
litigation support service providers have contributed computer runs attempting to
identify responsive (or privileged) documents.
25
TREC
The TREC Interactive Task

The Interactive Task was introduced in 2008, and repeated in
2009 and 2010.

It models a document review for responsiveness.

It begins with a mock complaint and associated requests for
production (“topics”).

It has a single Topic Authority (“TA”) for each topic.

Teams may interact with the Topic Authority for up to 10 hours.

Each team must submit a binary (“responsive” / “unresponsive”)
decision for each and every document in the collection for their
assigned topic(s).

It provides for a two-step assessment and adjudication process for
the gold standard: where the team and assessor agree on coding,
the coding decision is deemed correct; where the team and
assessor disagree on coding, appeal is made to the Topic
Authority who determines which coding decision is correct.
26
Effectiveness of Technology-Assisted
Review at TREC 2009
27
Manual Versus Technology-Assisted Review
28
But!

Roitblat, Voorhees, and the TREC 2009 Interactive Task all used
different datasets, different topics, and different gold standards, so
we cannot directly compare them.

While technology-assisted review appears to be at least as good as
manual review, we need to control for these differences.
29
Effectiveness of Manual Versus
Technology-Assisted Review
30
So, Technology-Assisted Review is at
Least as Effective as Manual Review, But
is it More Efficient?
31
Efficiency of Technology-Assisted Versus
Exhaustive Manual Review


Exhaustive manual review involves coding 100% of the documents, while technology-assisted
review involves coding between 0.5% (Topic 203) and 5% (Topic 207) of the documents.
Therefore, on average, technology-assisted review is 50 times more efficient
than exhaustive manual review.
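
The arithmetic behind the efficiency comparison is a simple ratio. The sketch below assumes that efficiency is measured only by the share of documents a human must code, ignoring any difference in cost per document; the function name is hypothetical. Only the two endpoint percentages appear on the slide, so the per-topic figures behind the 50-fold average are not reproduced here.

```python
def efficiency_factor(percent_coded):
    """How many times fewer documents are coded than in a 100% manual review."""
    return 100.0 / percent_coded


best = efficiency_factor(0.5)  # Topic 203
worst = efficiency_factor(5)   # Topic 207
print(best, worst)  # 200.0 20.0

# A 50-fold saving corresponds to coding 2% of the collection.
print(efficiency_factor(2))  # 50.0
```

The per-topic factor therefore ranges from 20 to 200, with the reported average of 50 falling inside that range.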
32
Why Are Humans So Lousy at Document
Review?
33
Topic 204 (TREC 2009)

Document Request
• All documents or communications that describe, discuss,
  refer to, report on, or relate to any intentions, plans, efforts,
  or activities involving the alteration, destruction, retention,
  lack of retention, deletion, or shredding of documents or
  other evidence, whether in hard-copy or electronic form.

Topic Authority
• Maura R. Grossman (Wachtell, Lipton, Rosen & Katz)
34
Inarguable Error for Topic 204
35
Interpretation Error for Topic 204
36
Arguable Error for Topic 204
37
Topic 207 (TREC 2009)

Document Request
• All documents or communications that describe, discuss,
  refer to, report on, or relate to fantasy football, gambling on
  football, and related activities, including but not limited to,
  football teams, football players, football games, football
  statistics, and football performance.

Topic Authority
• K. Krasnow Waterman (LawTechIntersect, LLC)
38
Inarguable Error for Topic 207
39
Interpretation Error for Topic 207
40
Arguable Error for Topic 207
41
Types of Manual Coding Errors
42
Take-Away Messages

Technology-assisted review finds at least as many responsive
documents as exhaustive manual review (meaning that recall is
at least as good).

Technology-assisted review is more accurate than exhaustive
manual review (meaning that precision is much better).

Technology-assisted review is orders of magnitude more efficient
than manual review (meaning that it is quicker and cheaper).
43
Measurement is Key

Not all technology-assisted review (and not all exhaustive manual
review) is created equal.

Measurement is important in selecting and defending an
e-discovery strategy.

Measurement also is critical in discovering better search methods
and tools.
44
Additional Resources

TREC
• http://trec.nist.gov/

TREC Legal Track
• http://trec-legal.umiacs.umd.edu/

TREC 2008 Overview
• http://trec.nist.gov/pubs/trec17/papers/LEGAL.OVERVIEW08.pdf

TREC 2009 Overview
• http://trec.nist.gov/pubs/trec18/papers/LEGAL09.OVERVIEW.pdf

TREC 2010 Overview
• Forthcoming (April 2011) at http://trec-legal.umiacs.umd.edu/

Maura R. Grossman & Gordon V. Cormack, Technology-Assisted Review
Can be More Effective and More Efficient Than Exhaustive Manual
Review, XVII:3 Richmond Journal of Law & Technology (Spring 2011) (in
press).
45
Questions?
Thank You!
46