Overview of the TREC 2007 Legal Track, Workshop Slides

No, Not That PMI:
Creating Search Technology
for E-Discovery
Jason Baron,1,4 Douglas W. Oard,1,3
Tamer Elsayed2,3 and Lidan Wang2,3
Plus thanks to: Simon Attfield, David Lewis, Paul Thompson, Stephen Tomlinson, Feng Zhou
1College of Information Studies
2Computer Science Department
3Institute for Advanced Computer Studies
University of Maryland, College Park
October 6, 2008
iSchool Colloquium
U.S. v. Philip Morris et al.
• Civil lawsuit brought by Clinton Administration
against tobacco companies in 1999
• Racketeering allegation that companies have
conspired since 1953 to defraud the American
public as to the true health effects of smoking
• 1,726 Requests to Produce from tobacco
companies for tobacco-related records
(including email) from 30 federal agencies
• 32 million Clinton-era email records held by
National Archives
Query Terms
Round 1
• tobacco
• cigarette
• smoking
• <tar>
• nicotine
• Smokeless
• Synar Amendment
• Philip Morris
• R.J. Reynolds
• BAT Industries
• Liggett group
• Brown and Williamson
Round 2
• Liggett
• PMI
– (Philip Morris Institute)
• MSA
– (Master Settlement Agreement)
• ETS
– (Environmental Tobacco Smoke)
• B&W
– (Brown & Williamson)
• TI (Tobacco Institute)
• …
Suppressing False Positives
• Upper Marlboro, Maryland
• Presidential Management Intern (PMI) program
• Medical Savings Accounts (MSA)
• Metropolitan Standard Area (MSA)
• Educational Testing Service (ETS)
• Black & White photos (B&W)
• TI . . .
[Figure: relevant smoking policy emails versus false positives; labels: White House Counsel, VP Chief of Staff, Office of the U.S. Trade Rep., OMB, Ron Klain]
Final Boolean Query
(
((master settlement agreement OR msa) AND NOT (medical
savings account OR metropolitan standard area)) OR
s. 1415 OR
(ets AND NOT educational testing service) OR
(liggett AND NOT sharon a. liggett) OR
atco OR
lorillard OR
(pmi AND NOT presidential management intern) OR
pm usa OR
rjr OR
(b&w AND NOT photo*) OR
phillip morris OR batco OR ftc test method OR
star scientific OR vector group OR joe camel OR
(marlboro AND NOT upper marlboro)
…
)
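The AND NOT clauses in the query above suppress known false positives for ambiguous abbreviations. A minimal Python sketch of that idea follows. The helper name `matches_with_exclusions`, the case-insensitive substring matching, and the sample messages are illustrative assumptions, not the search system actually used in the case.

```python
def matches_with_exclusions(text, terms, exclusions):
    """Return True if any term occurs in text and no exclusion phrase does.

    Mirrors one clause of the form (a OR b) AND NOT (x OR y), using plain
    case-insensitive substring matching (an assumption made for brevity).
    """
    lowered = text.lower()
    has_term = any(term in lowered for term in terms)
    has_exclusion = any(phrase in lowered for phrase in exclusions)
    return has_term and not has_exclusion


# Hypothetical sample messages, invented for illustration only.
emails = [
    "Draft talking points on the PMI position paper",
    "Welcome to the Presidential Management Intern (PMI) program",
    "Directions to the office in Upper Marlboro, Maryland",
    "Marlboro advertising near schools",
]

# (pmi AND NOT presidential management intern)
pmi_hits = [e for e in emails
            if matches_with_exclusions(e, ["pmi"], ["presidential management intern"])]

# (marlboro AND NOT upper marlboro)
marlboro_hits = [e for e in emails
                 if matches_with_exclusions(e, ["marlboro"], ["upper marlboro"])]
```

Note that a document-level AND NOT also discards any message that mentions both the tobacco sense of a term and the excluded phrase.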
[Figure: Clinton White House, 32 million emails; search request; Tobacco Policy; 200,000; National Archives hired 25 persons for 6 months …; 80,000]
Federal Rules of Civil Procedure
(as amended 12/1/06)
Rule 26(f)
At the parties’ planning meeting, issues
expected to be discussed include:
– “Any issues relating to disclosure or discovery of
electronically stored information, including the
form or forms in which it should be produced”
– “Any issues relating to preserving discoverable
information”
Recent Case Law
• Ameriwood Industries, Inc. v. Liberman, 2007 WL
685623 (E.D. Mo.) (court orders expert report with
number of “hits” based on negotiated search terms,
with expectation that parties will continue to meet and
confer to refine search based on false positives)
• Williams v. Taser Intern, Inc., 2007 WL 1630875 (N.D.
Ga.) (court adjudicates search protocol with keywords
plus use of simple Boolean operators)
• 6/1/07: First published legal opinion in U.S.
discussing difference between “keyword” and
“concept” searching. Disability Rights Council of
Greater Washington, et al. v. Washington Metropolitan
Transit Authority, 242 F.R.D. 139 (D.D.C. 2007)
Text Retrieval Conference (TREC)
• Goals
– Foster development of research communities
– Create “benchmark” evaluation resources
– Establish “baseline” results
• History
– Sponsored by NIST since 1992
– “Legal Track” started in 2006 with an E-Discovery focus
Desiderata in the Legal Realm
• Two-party
– Negotiated (not one-sided) information needs
• Recall-oriented
– “Smoking gun detection” + completeness
• Explainable
– Quantifiable comparison to present best practice
• Affordable
– Minimize amount of human review on back end
IIT CDIP Document Collection
• UCSF Legacy Tobacco Documents Library
– 6,877,327 documents released in lawsuits
– Variety of corporate document types
– Range of printing technologies and handwriting
• IIT CDIP v1.0 Document Collection
– OCR: 50 GB
– Metadata: 5 GB (XML)
• Scanned documents used for assessment
– 42 million TIFF page images: 1.5 TB
Sample Document
Example Document
Scanned
OCR
Philip Moxx's. U.S.A. x.dr~am~c.
cvrrespoaa.aa
Benffrts Departmext Rieh>pwna, Yfe&ia
Ta: Dishlbutfon Data aday 90,1997.
From: Lisa Fislla
Sabj.csr CIGNA WeWedng Newsbttsr Yntsre StratsU
During our last CIGNA Aatfoa Plan
meadng, tlu iasuo of wLetSae to i0op
per'Irw+ng
artieles aod discontinue mndia6 CIGNA
Well-Being aawslener to om employees
was a
msiter of disanision . I Imvm done
somme reaearc>>, and wanted to
pruedt you with my
Sadings and pcdiminary
recwmmeadatioa for PM's atratezy
Ieprding l4aas aewelattee* .
I believe .vayone'a input is valusble, and
would epproolate hoarlng fmaa aaeh of
you on
whetlne you concur with my
reeommendatioa
…
Metadata
Title: CIGNA WELL-BEING NEWSLETTER - FUTURE STRATEGY
Organization Authors: PMUSA, PHILIP MORRIS USA
Person Authors: HALLE, L
Document Date: 19970530
Document Type: MEMO, MEMORANDUM
Bates Number: 2078039376/9377
Page Count: 2
Collection: Philip Morris
“Complaints”
Drafted by The Sedona Conference® lawyers:
(1) Wrongful death and products liability action based on the use of a certain type of radioactive phosphate, resulting in contaminated candy as well as contaminated drinking water;
(2) Patent infringement action on a device named “Suck out the Bad, Blow in the Good,” designed to ventilate smoke;
(3) Shareholder class action suit alleging securities fraud and false advertising in connection with a fictional “Smoke Longer, Feel Younger” campaign relying on ‘60s-era folk music;
(4) Fictional Justice Department antitrust investigation looking into a planned merger and acquisition of a casualty and property insurance company by a tobacco company.
<Complaint>
<ComplaintNumber>No. 2006-3</ComplaintNumber>
<Date>July 1, 2006</Date>
<Court>THE DISTRICT COURT OF THE COMMONWEALTH OF NEW SEARCHLAND</Court>
<Plaintiff>John Doe, et al.</Plaintiff>
<Defendant>Echinoderm Cigarettes, et al.</Defendant>
<Introduction><P>John Doe, on behalf of the Organization of Concerned Parents, brings this action to force the defendant tobacco companies to cease all
placement of tobacco products, brands, and logos in television, film, live theater and rock concerts (collectively referred to as the "public media"). The historical placement of
tobacco products and branding in the public media has forced an increase in product awareness, particularly among young adults and children, by providing consistent and
recurring exposure to on-screen situations that generally glamorize smoking and other tobacco use.</P> </Introduction>
<Party>
<PlaintiffParty>Plaintiff John Doe brings this action on behalf of a nationwide class of individuals injured in childhood and adulthood by defendants' actions. Mr. Doe resides at
1004 Public Avenue, Commonwealth of New Searchland.</PlaintiffParty>
<DefendantParty>Defendants are Echinoderm Cigarettes and other unnamed tobacco companies, with principal places of business in the Commonwealth of New
Searchland.</DefendantParty>
<Coconspirator></Coconspirator> </Party>
<Jurisdiction> <P>This Court has jurisdiction pursuant to 1 Comm. New Searchland, Sec. 1956.</P> </Jurisdiction>
<Background>
<P>According to information and belief, Echinoderm Cigarettes and other companies have a long history of placement of tobacco products and brand images in the public media.
These media, including television (network and cable), film, a live theater, and rock concerts, are regularly viewed by children, teen-agers, and young adults. Such individuals
are at the most impressionable time of their lives, and are unknowingly exposed to de facto advertising for tobacco and tobacco-related products simply by watching such
media.</P>
<P>In particular, the glamorous manner in which smoking and other tobacco use are portrayed on the screen adds a cachet to the habit that encourages young people to try
smoking for the first time. Thus is exposed the true motivation for product placement - inducing non-smokers to become smokers with blatant disregard for the long term effects
and public health risks associated with tobacco use.</P> </Background>
<CauseOfAction>
<P>Echinoderm Cigarettes and other unnamed companies have represented that they do not pay for product placement in the public media. This representation is patently
false. Tobacco concerns regularly pay for placement of their products via direct monetary compensation, exchange of goods and services, and other considerations.</P>
<P>COUNT I Defendants have engaged in a pattern of misleading practices in violation
of state and federal statutes by providing compensation to television networks,
production companies, film production companies, providers of live theater and rock
concerts in exchange for placement of products and brand images.</P>
<P>COUNT II By exposing children and young adults to tobacco products in the public media, and by glamorizing the use of products known to cause health issues, defendants'
actions are in violation of applicable law.</P> </CauseOfAction>
<RequestedRelief>
<P>Declare that Echinoderm Cigarettes and other unnamed defendants are in violation of law by providing compensation for placement of products and brand images, and by
exposing children and young adults to tobacco products in the media.</P>
<P>Enter an order requiring defendants to disgorge all monies, reimbursement, and payments received as a result of product placement in the public media.</P>
<P>Defendants to pay costs and expenses, including attorneys' fees, in connection with the investigation and litigation of this matter.</P>
</RequestedRelief> </Complaint> </ProductionRequest>
A “Production Request”
RequestNumber: 52
RequestText:
Please produce any and all documents that discuss the use
or introduction of high-phosphate fertilizers (HPF) for the
specific purpose of boosting crop yield in commercial
agriculture.
Proposal:
"high-phosphate fertilizer!" AND (boost! w/5 "crop yield")
AND (commercial w/5 agricultur!)
Rejoinder:
(phosphat! OR hpf OR phosphorus OR fertiliz!)
AND (yield! OR output OR produc! OR crop OR crops)
FinalQuery:
(("high-phosphat! fertiliz!" OR hpf) OR
((phosphat! OR phosphorus) w/15 (fertiliz! OR soil))) AND
(boost! OR increas! OR rais! OR augment! OR affect! OR
effect! OR multipl! OR doubl! OR tripl! OR high! OR greater)
AND (yield! OR output OR produc! OR crop OR crops)
B:
3078
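The negotiated queries above rely on two operators: `!` for truncation and `w/k` for proximity. The following Python sketch shows one way to evaluate them, assuming that `!` matches any suffix and that `a w/k b` means the two terms occur within k words of each other in either order. Those semantics, the function names, and the sample sentence are assumptions for illustration; the track's reference Boolean engine may behave differently.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_positions(tokens, pattern):
    """Positions where a token matches pattern; a trailing '!' means prefix match."""
    if pattern.endswith("!"):
        stem = pattern[:-1]
        return [i for i, tok in enumerate(tokens) if tok.startswith(stem)]
    return [i for i, tok in enumerate(tokens) if tok == pattern]


def within(tokens, left_patterns, right_patterns, k):
    """True if any left pattern occurs within k tokens of any right pattern."""
    left = [p for pat in left_patterns for p in term_positions(tokens, pat)]
    right = [p for pat in right_patterns for p in term_positions(tokens, pat)]
    return any(abs(i - j) <= k for i in left for j in right)


# Hypothetical sentence, invented for illustration only.
doc = tokenize("Phosphorus levels in the soil were raised to increase crop yields.")

# ((phosphat! OR phosphorus) w/15 (fertiliz! OR soil))
near = within(doc, ["phosphat!", "phosphorus"], ["fertiliz!", "soil"], 15)
# A subset of the (boost! OR increas! OR rais! ...) clause
boost = any(term_positions(doc, p) for p in ["boost!", "increas!", "rais!"])
# A subset of the (yield! OR ... crop OR crops) clause
crop = any(term_positions(doc, p) for p in ["yield!", "crop", "crops"])
responsive = near and boost and crop
```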
The Resulting “Topic”
- <ProductionRequest>
<RequestNumber>52</RequestNumber>
<RequestText>Please produce any and all documents that discuss the use or introduction of high-phosphate fertilizers (HPF) for the specific purpose of boosting crop yield in commercial agriculture.</RequestText>
- <BooleanQuery>
<FinalQuery>(("high-phosphat! fertiliz!" OR hpf) OR ((phosphat! OR phosphorus) w/15 (fertiliz! OR soil))) AND (boost! OR increas! OR rais! OR augment! OR affect! OR effect! OR multipl! OR doubl! OR tripl! OR high! OR greater) AND (yield! OR output OR produc! OR crop OR crops)</FinalQuery>
- <NegotiationHistory>
<ProposalByDefendant>"high-phosphate fertilizer!" AND (boost! w/5 "crop yield") AND (commercial w/5 agricultur!)</ProposalByDefendant>
<RejoinderByPlaintiff>(phosphat! OR hpf OR phosphorus OR fertiliz!) AND (yield! OR output OR produc! OR crop OR crops)</RejoinderByPlaintiff>
</NegotiationHistory>
</BooleanQuery>
<FinalB>3078</FinalB>
<RequestSource>2007-A-1</RequestSource>
- <Instruction>
<P>1. These requests require the production of all responsive documents within the sole or joint possession, custody or control of the Defendant, including their agents, departments, attorneys, directors, officers, employees, consultants, investigators, insurance companies, or other persons subject to Defendant's custody or control.</P>
<P>2. All documents that respond, in whole or in part, to any portion of these Requests must be produced in their entirety, including all attachments and enclosures.</P>
<P>3. For purposes of these requests, the words used are considered to have, or should be understood to have their ordinary, everyday meanings. Plaintiffs refer Defendant to any dictionary in the event that Defendant asserts that the wording of a request is vague, ambiguous, unintelligible, or confusing.</P>
</Instruction>
- <Definition>
<P>4. The words "and," "or," "each," "any," "all," "refer," and "discuss," shall be construed in their broadest form and the singular shall include the plural and the plural shall include the singular whenever necessary so as to bring within the scope of these Requests all documents (defined below) that might otherwise be construed to be outside
their scope.</P>
<P>5. Solely for the purpose of the TREC 2007 legal track, the term "Defendant" shall include the named defendant companies in this complaint as well as all other companies whose records are found in the TREC collection database.</P>
<P>6. Solely for the purpose of the TREC 2007 legal track, "document" means all data, information or writings stored in the TREC legal database, including, without limitation: any written, electronic or computerized files, data or software; memoranda, emails correspondence, OCR scanned images, communications, reports, summaries, studies,
analyses, evaluations, notes or notebooks, indices, spreadsheets, logs, books, pamphlets, binders, calendar or diary entries, ledger entries, press clippings, graphs, tables, charts, printouts, drawings, maps, meeting minutes, and transcripts. The term document encompasses all metadata associated with the document. The term also includes all
drafts associated with any particular document. The term is also intended to include all electronically stored information as the term is used in the Federal Rules of Civil Procedure,</P>
<P>7. The terms "relating to," "regarding," ‘
discussing," or "concerning," shall be synonymous and should be taken to mean in whole or in part constituting, containing, concerning, discussing, describing, analyzing, identifying or stating.</P>
<P>8. The term "high-phosphate fertilizers" (HPF) shall refer to any high phosphate fertilizer, including, but not limited to calcium phosphate fertilizers and superphosphate fertilizers. In some instances, "high-phosphate" fertilizers will be subsumed in the definition of "phosphatic fertlizers." However, phosphatic fertilizers are a more general
term for fertilizers containing phosphate and the phosphate concentration of various phosphatic fertilizers is likely to vary.</P>
<P>9. The term "Maleic Hydrazide" (MH) refers to a pesticide that is sprayed on sugar beets for the purpose of decreasing sugar loss in beet roots.</P>
</Definition>
- <Complaint>
<ComplaintNumber>2007-A</ComplaintNumber>
<Date>July 1, 2007</Date>
<Court>U.S. DISTRICT COURT SOUTHERN DISTRICT OF GLADSHEIM</Court>
<Plaintiff>MR & MRS. N. EINHERJAR, individually and on behalf of the Estate of DRIFA EINHERJAR, a minor, and the CITY AND COUNTY OF VALHALLA, a government entity.</Plaintiff>
<Defendant>GULLINKAMBI CANDY CO., a Gladsheim corporation; VIKING SUGAR FARMS, a Gladsheim corporation; and U.S. BEET SUGAR ASSOCIATION, a nationwide association with local chapters in Gladsheim.</Defendant>
- <Introduction>
<P>1. Plaintiffs Mr. and Mrs. N. Einherjar bring this action individually and on behalf of the estate of their deceased daughter Drifa Einherjar. These plaintiffs and the City and County of Valhalla (collectively referred to as "Plaintiffs") bring this action against Defendants Gullinkambi Candy Co. (GCC), Viking Sugar Farms (VSF), and the U.S. Beet
Sugar Association (BSA) (hereinafter referred to collectively as "Defendants," or individually by their respective acronyms). This complaint seeks equitable and injunctive relief for the use of lethal substances in the production of VSF sugar, resulting in the death of a child and contamination of the Valhalla County groundwater. This complaint
additionally seeks damages for strict products liability and failure to warn against GCC for the use of and failure to disclose lethal substances contained in its candy. Finally, this complaint seeks treble and punitive damages for fraud and conspiracy in violation of the Racketeer Influenced and Corrupt Organizations Act (RICO), 18 U.S.C. (sec)
1962 for Defendants' collective and organized concealment of lethal substances from Plaintiffs, resulting in the death of a child and massive contamination of Valhalla County's sole source of drinking water.</P>
</Introduction>
- <Party>
<PlaintiffParty>Plaintiffs, Mr. and Mrs. N. Einherjar, are residents of Valhalla, Gladsheim, and their deceased daughter, on whose behalf they are suing, was also a Valhalla resident.</PlaintiffParty>
<DefendantParty>2. Defendants GCC and VSF are both Gladsheim Corporations with principal places of business in Valhalla, Gladsheim. The U.S. Beet Sugar Association has local chapters in Valhalla, Gladsheim, and directs the actions of VSF.</DefendantParty>
<Coconspirator />
</Party>
- <Jurisdiction>
<P>All events giving rise to this incident took place in Valhalla, Gladsheim. Therefore, jurisdiction of this court is proper.</P>
</Jurisdiction>
- <Background>
<P>3. Defendant VSF uses high-phosphate fertilizers (HPF) (sometimes referenced as phosphate fertilizers) to increase the flavor of its sugar beets. HPF contains traces of radioactive elements that remain as a byproduct of phosphate extraction. Phosphate used in HPF is taken from a rock mineral called Apatite which also contains radioactive
radium. The resulting Apatite powder therefore contains traces of radioactive elements that become incorporated into HPF. Studies have shown that health problems caused by HPF include immune disorders, toxic myopathy, chronic fatigue syndrome, liver dysfunctions, irregular heart-beat, reactive depression, and memory loss. In addition to
using HPF, VSF sprays its sugar beets with Maleic Hydrazide (MH) to decrease the loss of sugar content in its sugar beet crop. MH has been shown to cause renal dysfunction in laboratory mice and to eventually lead to death.</P>
<P>4. In 1933, the U.S. Beet Sugar Association conspired with cane-growers in Hawaii to form a powerful sugar cartel that controlled Congress through a strong sugar lobby. Together, the American sugar growers united to create an underground sugar-trade brotherhood secretly referred to as "The Sugar Program." Members of the brotherhood
contributed large sums of money to hire sugar-interest lobbyists who successfully brought about a series of favorable Sugar Acts beginning in 1934 and continuing to the present day. The Sugar Program brotherhood has also been successful in preventing Congress from regulating HPF or MH.</P>
<P>5. For the past five years, the BSA has served as elected leader of The Sugar Program, and has been given the responsibility for regulating the actions of the brotherhood members and for approving all major contracts and actions taken by members under its control.</P>
<P>6. Defendant GCC is a candy company that uses VSF sugar in all of its candy. As part of its contract with VSF, GCC agreed to conceal the levels of HPF and MH contained in VSF sugar from its consumers in exchange for an exclusivity provision and a discount on the wholesale price of its sugar. GCC therefore omitted warnings about HPF
and MH from its candy labels.</P>
<P>7. As a result of Defendants' collective actions and omissions an eight-year old girl died from consuming a piece of GCC candy and the Valhalla community as a whole has been harmed by the contamination of their drinking water with HPF and MH.</P>
</Background>
- <CauseOfAction>
<P>FIRST CAUSE OF ACTION</P>
<P>Wrongful Death</P>
<P>8. On March 23, 2007, decedent Drifa Einherjar (hereinafter "Decedent") purchased a piece of GCC candy for $0.67 from the GCC store on Main Street, Valhalla, Gladsheim. At the time of purchase, Decedent was not warned or informed of any dangers of eating the candy and there were no warnings on the candy wrapper or labels of the candy
bag.</P>
<P>9. GCC knew that VSF used HPF and MH in its sugar production process. Despite this knowledge, GCC contractually agreed to conceal the presence of HPF and MH in its candy as a condition of its agreement with VSF, in exchange for a discount on its bulk sugar purchases.</P>
<P>10. As a direct and proximate result of these stated acts and omissions, Decedent consumed a piece of GCC candy containing HPF and MH, resulting in her death on March 24, 2007. Decedent ate the candy in a manner in which it was intended to be eaten, and received no instructions from any agents of GCC to exercise caution or to eat the
candy in any other way.</P>
<P>SECOND CAUSE OF ACTION</P>
<P>Strict Tort Liability</P>
<P>11. The aforementioned candy and VSF sugar used as a primary ingredient in the candy were unreasonably dangerous to human health due to their high content of HPF and MH.</P>
<P>12. Defendants GCC and VSF knew of this health risk and notwithstanding that knowledge, concealed these dangers from the consuming public.</P>
<P>13. As a result of the HPF and MH contained in GCC candy, Decedent died within 24 hours of consuming a single piece of GCC candy.</P>
<P>THIRD CAUSE OF ACTION</P>
<P>Public Nuisance (Against Defendant VSF only)</P>
<P>14. Defendant VSF's method of sugar beet farming creates a public nuisance that unreasonably endangers the health of all Valhalla residents by contaminating their groundwater.</P>
<P>15. By continuing to use HPF and MH in its sugar beet production and by failing to use the standard method of limestone quicklime phosphate precipitation in the treatment of its waste-water, VSF continues to contaminate the groundwater and will continue to endanger the health of Valhalla residents. The harm to Valhalla residents will
continue until an injunction is issued to stop the use of HPF and MH or to require implementation of the limestone quicklime wastewater treatment to minimize contamination.</P>
<P>16. As a direct and proximate cause of Defendant's acts and omissions, residents of Valhalla have unknowingly ingested harmful substances from their contaminated water supply.</P>
<P>FOURTH CAUSE OF ACTION</P>
<P>Failure to Warn</P>
<P>17. VSF, as a sugar beet farm that uses HPF and MH, had a duty to issue warnings to Plaintiffs and the general public about the presence of HPF and MH in its sugar and the corresponding health risks that these substances posed in groundwater or direct consumption.</P>
<P>18. Defendants VSF and GCC knew, or with the exercise of reasonable care, should have known that HPF contained radioactive substances and that MH added to the diet of mice, resulted in renal dysfunction and eventual death. Despite this knowledge, no information was offered to the Valhalla Community about the potential hazards of
HPF, the lethal nature of MH used in VSF's sugar production, or the presence of HPF or MH in GCC candy.</P>
<P>19. At all times relevant to this litigation, Defendants VSF and GCC had actual and/or constructive knowledge of the dangers mentioned above. Despite this knowledge, VSF continued to operate its sugar beet plant with reckless disregard for the community around it by contaminating their groundwater and GCC continued to sell candy
containing HPF and MH in reckless disregard for the life of children whom it targeted in its advertising campaigns and who therefore could be expected to purchase and consume GCC candy.</P>
<P>20. VSF breached its duty to warn the community about HPF and MH groundwater contamination and GCC breached its duty to warn consumers of the HPF and MH in its candy.</P>
<P>21. Defendant VSF's failure to warn has resulted in the contamination of Valhalla County's drinking water and the endangerment of the health of Valhalla residents.</P>
<P>22. GCC's failure to warn resulted in the death of a child and the illness of several others.</P>
<P>FIFTH CAUSE OF ACTION</P>
<P>Conspiracy and Fraud in Violation of the Racketeer Influenced and Corrupt Organizations Act (RICO), 18 U.S.C. (sec) 1962, and Request for Treble Damages.</P>
<P>23. Defendants VSF, GCC, and BSA engaged in a conspiracy to defraud by collectively agreeing to conceal the presence and adverse health effects of HPF and MH from the American public, the Valhalla community and Plaintiffs in particular.</P>
<P>24. In 1933, Defendants formed a sugar cartel secretly known as "The Sugar Program" which successfully lobbied Congress in passing favorable sugar laws and prevented the regulation of HPF and MH in commercial agriculture.</P>
<P>25. All three Defendants contributed financially to a lobbying fund aimed at fighting HPF and MH regulation and obtaining the passage of favorable "Sugar Acts."</P>
<P>26. For the past five years, the BSA has lead lobbying efforts and approved all actions of The Sugar Program brotherhood.</P>
<P>27. BSA spearheaded the movement to discourage written warnings about HPF and MH, and approved the VSF contract with GCC which provided for a reduction of GCC's wholesale sugar price, and a favorable exclusivity provision between VSF and GCC, under the condition that GCC refrain from publishing warnings about HPF and MH on
its product labels.</P>
<P>28. As a result of this collective action to defraud the public, Plaintiffs have suffered injuries indicated above. Treble damages are therefore appropriate under RICO to punish the conspiratorial nature of Defendants' planned concealment of known health risks presented by HPF and MH from the Valhalla community and from Plaintiffs,
resulting in the death of a child.</P>
<P>SIXTH CAUSE OF ACTION</P>
<P>Negligence</P>
<P>29. Defendant VSF had a duty to the Valhalla community and to Plaintiffs to refrain from contaminating their groundwater and to provide warnings about the known health hazards associated with HPF and MH which it used in the production of its sugar beets.</P>
<P>30. Defendant GCC had a duty to the Valhalla community and to Plaintiffs to disclose the known levels of HPF and MH in VSF sugar which it used as a primary ingredient in its candy.</P>
<P>31. Defendant BSA had a duty to compel members of the brotherhood under its control to require lawful disclosures of HPF and MH.</P>
<P>32. All Defendants breached their respective duties to the Valhalla community and to Plaintiffs. As a result, Plaintiffs have suffered damages indicated above.</P>
<P>Punitive Damages</P>
<P>33. The conduct of Defendants described above is outrageous. Defendants' conduct demonstrates a reckless disregard for human life and a conscious disregard for public safety. The acts and omissions described above were willful and performed with actual or implied malice. Punitive and exemplary damages are therefore appropriate and
should be imposed in this instance.</P>
</CauseOfAction>
- <RequestedRelief>
<P>WHEREFORE, Plaintiffs respectfully pray for a judgment against Defendants for:</P>
<P>1. Injunctive and equitable relief as the Court deems appropriate including:</P>
<P>i) Requiring Defendant VSF to test and to monitor the water near its sugar plant;</P>
<P>ii) Requiring Defendant VSF to use the quicklime limestone method for processing wastewater to minimize phosphate contamination of Valhalla groundwater, if it is permitted to continue operation of its plant and to continue use of HPF and MH in its sugar beet production;</P>
<P>iii) Compelling Defendant VSF to remove existing HPF from the groundwater by any means necessary; and</P>
<P>2. Compensatory damages to be paid by all Defendants, according to proof at trial;</P>
<P>3. Punitive damages as the court deems appropriate;</P>
<P>4. Costs and attorneys fees of this lawsuit, with interest;</P>
<P>5. Any other relief as the court deems appropriate.</P>
</RequestedRelief>
</Complaint>
</ProductionRequest>
2006/07 Research Teams
Carnegie Mellon U
Dartmouth College
Long Island U
Sabir Research, Inc.
U Iowa
U Massachusetts
U Maryland
U Missouri, Kansas City
U Washington
Ursinus College
Fudan U (CN)
National U of Singapore (SG)
Open Text Corporation (CA)
U Amsterdam (NL)
U Waterloo (CA)
Deconstructing
“Concept Search”
“Method”
“Features”
“Specification”
“Result”
Representing “Documents”
• Content
– Count the words
– Weight the words
– Ascribe meaning to words
• Context
– Who said this?
– When was it said?
– Who did they say it to?
“Features”
• Description
– What was said about it?
• Behavior
– What was done with it?
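The “count the words” and “weight the words” steps can be sketched in a few lines of Python. TF-IDF is used here as one common weighting scheme; the slide does not say which weighting is meant, so that choice, the function names, and the toy documents are assumptions.

```python
import math
from collections import Counter


def count_words(text):
    """Count the words: raw term frequencies for one document."""
    return Counter(text.lower().split())


def tfidf(documents):
    """Weight the words: term frequency times log inverse document frequency."""
    counts = [count_words(d) for d in documents]
    n = len(documents)
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())  # each document contributes once per word
    return [
        {word: tf * math.log(n / doc_freq[word]) for word, tf in c.items()}
        for c in counts
    ]


# Hypothetical toy documents, invented for illustration only.
docs = [
    "tobacco settlement memo",
    "tobacco newsletter strategy",
    "tobacco tobacco price memo",
]
weights = tfidf(docs)
```

In this toy collection a word that appears in every document, such as "tobacco", receives zero weight, while rarer words receive more.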
Controlling the Search System
• Proactive
– “Keyword” query
– “Boolean” query
– Query by example
• Reactive
– Ranked list selection
– “More like this” query
– Category exploration
– Social network exploration
• Iterative
– Query refinement
– “Search within” query
“Specification”
Generating Results
• Logic
• Similarity
• Probability
• Result set
• Ranked list
• Classification
• Clustering
• Visualization
“Method”
“Features”
“Specification”
“Result”
2006 Experiments
• 31 “official” runs from 6 sites
– Judged top-100 main site run, top-10 for others
– Scored top-5000
• Reference Boolean run
– Judged stratified sample of 200 documents
– Judged to B
• Expert manual searcher “run”
– ~100 documents/topic
– Tried to find documents systems would miss
2006/07 “Relevancy” Assessors
Bank of America
Department of Justice
FTI Consulting
H5 Technologies Inc.
NARA
Lewis & Roca LLP
New Mexico Attorney General
Preston Gates LLP
Reasonable Discovery LLC
SAIC
Private individuals (CA, UK)
Law Schools
Boston University
Case Western Reserve
George Mason
George Washington
Loyola-Los Angeles
Loyola-New Orleans
U Dayton
U Indiana-Indianapolis
U Maryland
U Texas
2006 Inter-Assessor Agreement
[Figure: Kappa by topic, scale -1.0 to 1.0]
Agreement of 2nd Assessor with Main Assessor
[Figure: Agreement on Positives versus Agreement on Negatives, each on a 0.0 to 1.0 scale]
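For reference, a kappa statistic for two assessors can be computed as below. This sketch assumes Cohen's kappa over binary relevance judgments, which the slides do not state explicitly, and the judgment lists are invented for illustration.

```python
def cohens_kappa(first, second):
    """Cohen's kappa for two assessors' binary (0/1) relevance judgments."""
    if len(first) != len(second) or not first:
        raise ValueError("need two equal-length, non-empty judgment lists")
    n = len(first)
    # Observed agreement: fraction of documents with identical judgments.
    observed = sum(a == b for a, b in zip(first, second)) / n
    # Chance agreement from each assessor's own rate of "relevant" labels.
    p1 = sum(first) / n
    p2 = sum(second) / n
    expected = p1 * p2 + (1 - p1) * (1 - p2)
    if expected == 1.0:
        return 1.0  # both assessors used one and the same label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical judgments (1 = relevant, 0 = not relevant).
main_assessor = [1, 1, 0, 0, 1, 0, 0, 0]
second_assessor = [1, 0, 0, 0, 1, 0, 1, 0]
kappa = cohens_kappa(main_assessor, second_assessor)
```

Kappa is 1.0 for perfect agreement, near 0.0 for chance-level agreement, and negative when the assessors agree less often than chance would predict.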
2006: Nobody Finds Everything
[Figure: Known Relevant Documents by Topic (scale 0 to 500); series: Boolean, Expert Searcher, TREC Systems Only]
Source: TREC 2006 Legal Track
2006: Precision@R
[Figure: Mean R-Precision over 39 topics (scale 0.00 to 0.20); bars: Automatic Ranked Runs, Reference Boolean run, Manual run for pool enrichment]
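R-precision is the precision of a ranked list at depth R, where R is the number of known relevant documents for the topic. A minimal sketch, with hypothetical document identifiers:

```python
def r_precision(ranked_ids, relevant_ids):
    """Precision at rank R, where R is the number of known relevant documents."""
    relevant = set(relevant_ids)
    r = len(relevant)
    if r == 0:
        return 0.0  # undefined without relevant documents; 0.0 is a convention
    top_r = ranked_ids[:r]
    return sum(1 for doc_id in top_r if doc_id in relevant) / r


# Hypothetical ranked list and relevance judgments.
ranked = ["d3", "d7", "d1", "d9", "d2"]
relevant = {"d1", "d2", "d3"}
score = r_precision(ranked, relevant)
```

Here R is 3, and two of the top three ranked documents are relevant.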
Sampling for Affordable Evaluation
[Figure: sampled regions of the result set]
• 4 of 6: 67% relevant in this region
• 1 of 3: 33% relevant in this region
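The idea behind sampling for affordable evaluation is to judge only a sample from each region and scale the observed relevant fraction up to the size of the region. The sketch below reads the slide's figure as 4 of 6 and 1 of 3 sampled documents judged relevant, which is an interpretation. The region sizes are hypothetical, and this simple stratified estimator is not necessarily the one the track used.

```python
def estimate_relevant(strata):
    """Estimate total relevant documents from per-region samples.

    Each stratum is (region_size, number_sampled, number_judged_relevant).
    """
    total = 0.0
    for size, sampled, relevant in strata:
        if sampled <= 0:
            raise ValueError("every region needs at least one judged document")
        # Scale the sample's relevant fraction up to the whole region.
        total += size * relevant / sampled
    return total


# Region sizes (600 and 900) are hypothetical; sample counts follow the figure.
strata = [
    (600, 6, 4),  # 4 of 6 sampled documents relevant (67%)
    (900, 3, 1),  # 1 of 3 sampled documents relevant (33%)
]
estimate = estimate_relevant(strata)
```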
TREC 2007 Experiments
• Making “pools”: 68 runs from 12 groups
– Up to 25,000 documents per run per topic
– Plus 100 random unsubmitted documents
– Before sampling: 195,688-476,252 docs/topic
• Bin 1 (“required”)
– 500 documents, done by 43 of 50 assessors
• Bins 2 through 6 (optional)
– 100 documents each
– 8 of 43 assessors did at least one, 5 did all
Estimated # of Rel Docs in Pool
[Figure: estimated number of relevant documents in the pool, by topic (scale 0 to 80,000)]
Mean per Topic:
• Relevant: 16,904
• Non-rel.: 298,678
• Gray: 4,303
Topic 71 (bromhidrosis):
• Relevant: 77,467
Topic 63 (sugar contract):
• Relevant: 18
Boolean Run Estimated Recall
[Figure: EstR@B by topic (scale 0.0 to 1.0)]
Mean EstR@B: 0.22
• Boolean run missed 78% of the relevant documents (on average per topic)
Topic 84 (1960’s films): EstR@B = 100%
Topic 77 (smoke NOT tobacco): EstR@B = 0%
Median vs. Boolean (EstR@B)
[Figure: Median versus Boolean EstR@B by topic; regions labeled “Median Better” and “Boolean Better”]
• Median won 8 of 43
• Boolean won 31 of 43
• (4 tied)
Topic 99 (natural disasters): 0.31 vs. 0.21
Topic 58 (phosphates and health): 0.07 vs. 0.94
Boolean run had higher mean EstR@B than all submitted runs.
Median vs. Boolean (EstR@25000)
[Figure: Median versus Boolean EstR@25000 by topic; regions labeled “Median Better” and “Boolean Better”]
• Median won 33 of 43
• Boolean won 9 of 43
• (1 tied)
Topic 60 (phosphate precip.): 0.91 vs. 0.07
Topic 58 (phosphates and health): 0.09 vs. 0.94
Highest mean EstR@25000: 47%
Marginal Precision by Depth Band
Depths 1-5000: median Precision=18%
Depths 5001-10000: median Precision=13%
Depths 10001-15000: median Precision=11%
Depths 15001-20000: median Precision=10%
Depths 20001-25000: median Precision=10%
• 3 of 446 (0.7%) random (unsubmitted) documents were judged relevant
– On average, another 50,000 relevant docs per topic?
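The slide does not show how the 50,000 figure was reached. One plausible reading, sketched below, scales the 3-in-446 relevance rate by the number of documents outside the pool. The collection size (7 million) and the mean pool size (relevant + non-relevant + gray) come from other slides in this deck; combining them this way is an assumption made for illustration.

```python
def extrapolate_unpooled_relevant(relevant_in_sample, sample_size, unpooled_docs):
    """Scale a sample relevance rate up to the unpooled part of the collection."""
    rate = relevant_in_sample / sample_size
    return rate * unpooled_docs


collection_size = 7_000_000                  # "7 million documents" (Accomplishments slide)
mean_pool_size = 16_904 + 298_678 + 4_303    # mean relevant + non-rel. + gray per topic
estimate = extrapolate_unpooled_relevant(3, 446, collection_size - mean_pool_size)
print(round(estimate))  # about 45,000, the same order as the slide's 50,000
```

With only 3 relevant documents in the sample, the estimate carries wide uncertainty, which fits the question mark on the slide.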
Median “Run” Marginal Precision
(Depths 20,001-25,000, by Topic)
• Only 6 of 43 topics with Marg. Prec. > 10%
Topic 69 (indoor smoke vent.): MP = 100%
Topic 74 (indoor air quality): MP = 46%
Topic 71 (bromhidrosis): MP = 21%
[Figure: marginal precision (0 to 1) by topic (1 to 43)]
2008 Legal Track
• Interactive task models commercial practice
– Recall-oriented (classify every document)
– “Topic authority” available for clarification
– Fewer topics with much richer sampling
• Relevance feedback task
– Models multi-stage meet and confer
• Third set of ad hoc task topics
– Completes development of reusable(?) collection
Hill Climbing the Boolean Set
• Boolean run on OCR
• Ranked run on OCR
• Ranked run on metadata
• Extract “good” metadata and add to query
• Remove least likely from Boolean
• Add most likely from Ranked
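The last two steps can be sketched as a swap loop that keeps the result set at the Boolean set's size. This is one interpretation of the slide, not the authors' implementation: the score dictionary, the stopping rule, and the treatment of unscored documents are all assumptions.

```python
def hill_climb(boolean_set, scores):
    """Swap low-scoring Boolean documents for high-scoring ranked documents.

    boolean_set: document IDs matched by the Boolean query.
    scores: dict mapping document ID to a ranked-run score (higher = more likely).
    Boolean documents missing from `scores` are treated as least likely (0.0).
    """
    current = set(boolean_set)
    # Boolean documents, least likely first.
    inside = sorted(current, key=lambda d: scores.get(d, 0.0))
    # Ranked documents not in the Boolean set, most likely first.
    outside = sorted((d for d in scores if d not in current),
                     key=scores.get, reverse=True)
    for weak, strong in zip(inside, outside):
        if scores[strong] <= scores.get(weak, 0.0):
            break  # no further swap improves the set
        current.remove(weak)
        current.add(strong)
    return current


# Hypothetical scores: "b" is swapped for "d"; "c" stays because "e" scores lower.
scores = {"a": 0.9, "b": 0.1, "c": 0.5, "d": 0.8, "e": 0.3}
print(sorted(hill_climb({"a", "b", "c"}, scores)))
```

Holding the set size fixed at B keeps the result directly comparable to the Boolean run under Recall@B.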
Metadata-Based Expansion
[Figure: document images from archives, some retrievable and some corrupted]
How to retrieve corrupted documents?
• Expand query with author and recipient names
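A minimal sketch of this kind of expansion follows, assuming each retrieved document carries `author` and `recipients` metadata fields. The field names, the frequency-based selection, and the sample names are hypothetical; the slides do not specify how “good” metadata is chosen.

```python
from collections import Counter


def expand_query(query_terms, top_docs, max_names=5):
    """Add the most frequent author and recipient names from top-ranked
    documents to the query, so that documents whose OCR text is corrupted
    can still match on their metadata."""
    counts = Counter()
    for doc in top_docs:
        names = [doc.get("author")] + list(doc.get("recipients", []))
        counts.update(name for name in names if name)
    expansion = [name for name, _ in counts.most_common(max_names)]
    return list(query_terms) + [n for n in expansion if n not in query_terms]


# Hypothetical metadata for three documents retrieved by the original query.
top_docs = [
    {"author": "J. Smith", "recipients": ["A. Jones"]},
    {"author": "J. Smith", "recipients": []},
    {"author": None, "recipients": ["A. Jones", "B. Lee"]},
]
print(expand_query(["nicotine"], top_docs, max_names=2))
```

The expanded query is then run against the metadata fields, where a document with unreadable OCR text can still be matched through the people who wrote or received it.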
“Beating Boolean”
(but not by much yet!)
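[Bar chart: Estimated Recall@B (axis 0 to 0.3) for five runs: Ranked OCR, Ranked OCR/Metadata, Boolean, Boolean + Ranked OCR, Boolean + Ranked OCR/Metadata]
Bar values as extracted (pairing with run labels not preserved): 0.231 (+63%), 0.216, 0.220, 0.228, 0.142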
TREC-2006/07 “Training Topics” for 2008
Meet-and-Confer Alternatives
[Figure: Incremental Disclosure Benefit for the conditions 15-then-0, 10-then-5, 5-then-10, and 0-then-0; labeled value: 0.56]
50 Topics, Title Queries, TREC-2005 Robust Track Collection
Other Recent Developments
• ICAIL Workshop on Discovery of Electronically
Stored Information (DESI), Stanford, June 2007,
http://www.umiacs.umd.edu/~oard/desi-ws/
• Sedona Best Practices Commentary on the Use
of Search and Information Retrieval Methods in
E-Discovery (August 2007 public draft),
http://www.thesedonaconference.org
• DESI-2 Workshop, London, June 2008,
http://www.cs.ucl.ac.uk/staff/S.Attfield/desi/
Taking the Larger View
Jack G. Conrad, “E-Discovery Revisited: A Broader Perspective for Researchers,” DESI-1
E-Discovery as Sensemaking
Simon Attfield and Ann Blandford, “E-Discovery Viewed as Integrated Human-Computer Sensemaking,” DESI-2
Identity Resolution in Email
Date: Wed Dec 20 08:57:00 EST 2000
From: Kay Mann <kay.mann@enron.com>
To: Suzanne Adams <suzanne.adams@enron.com>
Subject: Re: GE Conference Call has be rescheduled
Did Sheila want Scott to participate? Looks like the
call will be too late for him.
WHO?
3-Step Solution
(1) Identity Modeling
(2) Context Reconstruction
(3) Mention Resolution
Posterior Distribution
Where to Look for Evidence
[Figure: Contextual Space, with regions labeled This Message, This Conversation, On-Topic, and Socially-related Conversations]
Elsayed, Oard and Namata, ACL/HLT 2008
Contextual Resolution
[Figure: mentions and identities (“Sheila Tweed”, “jsheila@enron.com”, “Sheila Walton”, “Sheila”, “sheila”, “sg”) connected by edges labeled social, topical, and conversational; a separate element is labeled Context-Free Resolution]
Test Collections
Collection      Emails    Identities   Mention Queries   Candidates (Min. / Avg. / Max.)
Sager            1,628          627                51    1 / 4 / 11
Shapiro            974          855                49    1 / 8 / 21
Enron-subset    54,018       27,340                78    1 / 152 / 489
Enron-all      248,451      123,783                78    3 / 518 / 1,785
Which Context is the best?
[Figure: MRR (0.0 to 1.0) on Sager, Shapiro, Enron-sub, and Enron-all for Social, Topical, Conversational, Baseline, and Local]
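MRR (mean reciprocal rank) averages 1/rank of the correct identity over all mention queries. Below is a minimal sketch, assuming each query yields a ranked list of candidate identities; the identifiers are hypothetical.

```python
def mean_reciprocal_rank(ranked_candidates, true_identities):
    """Average of 1/rank of the correct identity, one term per mention query.

    A query whose correct identity is missing from the list contributes 0.
    """
    if not true_identities:
        return 0.0
    total = 0.0
    for candidates, truth in zip(ranked_candidates, true_identities):
        if truth in candidates:
            total += 1.0 / (candidates.index(truth) + 1)
    return total / len(true_identities)


# Hypothetical queries: correct identity at rank 1, at rank 2, and missing.
rankings = [["tweed", "walton"], ["walton", "tweed"], ["other"]]
truths = ["tweed", "tweed", "tweed"]
print(mean_reciprocal_rank(rankings, truths))  # (1 + 0.5 + 0) / 3
```

An MRR of 1.0 means the correct identity is always ranked first. An MRR of 0.5 is what results if it always sits at rank 2, though other mixes of ranks give the same average.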
Accomplishments
• Unique test collection
– 7 million documents with OCR and metadata
– 83 rich topics (Boolean, free text, context)
– Recall-oriented evaluation measure
• Moderately robust research community
– 16 research teams from 4 countries
– Attracting attention (and investment) in the law