Text Summarization:
News and Beyond
Kathleen McKeown
Department of Computer Science
Columbia University
1
What is Summarization?
 Data as input (database, software trace,
expert system), text summary as output
 Text as input (one or more articles),
paragraph summary as output
 Multimedia in input or output
 Summaries must convey maximal information
in minimal space
2
Types of Summaries
 Informative vs. Indicative
 Replacing a document vs. describing the
contents of a document
 Extractive vs. Generative
 Choosing bits of the source vs. generating
something new
 Single document vs. Multi Document
 Generic vs. user-focused
3
Questions (from Sparck Jones)
 Should we take the reader into account and how?
 “Similarly, the notion of a basic summary, i.e., one
reflective of the source, makes hidden fact
assumptions, for example that the subject knowledge
of the output’s readers will be on a par with that of the
readers for whom the source was intended. (p. 5)”
 Is the state of the art sufficiently mature to allow
summarization from intermediate representations and
still allow robust processing of domain independent
material?
5
SUMMONS QUERY OUTPUT
Summary:
Wednesday, April 19, 1995, CNN reported that an explosion shook a government building in Oklahoma City. Reuters announced that at least 18 people were killed. At 1 PM, Reuters announced that three males of Middle Eastern origin were possibly responsible for the blast. Two days later, Timothy McVeigh, 27, was arrested as a suspect, U.S. attorney general Janet Reno said. As of May 29, 1995, the number of victims was 166.
Image(s):
1 (okfed1.gif) (WebSeek)
Article(s):
(1) Blast hits Oklahoma City building
(2) Suspects' truck said rented from Dallas
(3) At least 18 killed in bomb blast - CNN
(4) DETROIT (Reuter) - A federal judge Monday ordered James
(5) WASHINGTON (Reuter) - A suspect in the Oklahoma City bombing
SUMMONS, Dragomir Radev, 1995
6
Briefings
 Automatically summarize series of articles
 Input = templates from information extraction
 Merge information of interest to the user from multiple sources
 Show how perception changes over time
 Highlight agreement and contradictions
 Conceptual summarization: planning operators
 Refinement (number of victims)
 Addition (later template contains perpetrator)
7
How is summarization done?
 N input articles parsed by information
extraction system
 N sets of templates produced as output
 Content planner uses planning
operators to identify similarities and
trends
 Refinement (Later template reports new #
victims)
 New template constructed and passed
to sentence generator
8
Sample Template
Message ID:         TST-COL-0001
Secsource: source:  Reuters
Secsource: date:    26 Feb 93
Incident: date:     26 Feb 93, early afternoon
Incident: location: World Trade Center
Incident: type:     Bombing
Hum Tgt: number:    At least 5
9
Text Summarization
 Input: one or more text documents
 Output: paragraph length summary
 Sentence extraction is the standard method
 Using features such as key words, sentence position in document, cue phrases
 Identify sentences within documents that are salient
 Extract and string sentences together
 Luhn – 1950s
 Hovy and Lin 1990s
 Schiffman 2000
 Machine learning for extraction
 Corpus of document/summary pairs
 Learn the features that best determine important sentences
 Kupiec 1995: Summarization of scientific articles
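To make the recipe concrete, here is a minimal sketch of Luhn-style feature-based extraction; the weights, the cue-phrase list, and the position heuristic are invented for illustration and are not the cited systems' actual features.

```python
# Hedged sketch of feature-based sentence extraction; the feature weights,
# cue-phrase list, and position heuristic are illustrative assumptions.
from collections import Counter

CUE_PHRASES = {"in summary", "in conclusion", "significantly"}  # assumed list

def extract_summary(sentences, top_k=2):
    freq = Counter(w.lower() for s in sentences for w in s.split())
    scored = []
    for i, sent in enumerate(sentences):
        tokens = [w.lower() for w in sent.split()]
        keyword = sum(freq[w] for w in tokens) / max(len(tokens), 1)
        position = 1.0 / (i + 1)                    # earlier sentences favored
        cue = any(p in sent.lower() for p in CUE_PHRASES)
        scored.append((keyword + position + (1.0 if cue else 0.0), i, sent))
    # choose the top-scoring sentences, then string them together in
    # document order
    best = sorted(sorted(scored, reverse=True)[:top_k], key=lambda t: t[1])
    return " ".join(sent for _, _, sent in best)
```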
10
A Trainable Document
Summarizer
•To summarize is to reduce in complexity, and hence in length, while
retaining some of the essential qualities of the original.
•This paper focuses on document extracts, a particular kind of
computed document summary.
•Document extracts consisting of roughly 20% of the original can be as
informative as the full text of a document, which suggests that even
shorter extracts may be useful indicative summaries.
•The trends in our results are in agreement with those of Edmundson
who used a subjectively weighted combination of features as opposed
to training the feature weights using a corpus.
•We have developed a trainable summarization program that is
grounded in a sound statistical framework.
11
Text Summarization at Columbia
 Shallow analysis instead of information
extraction
 Extraction of phrases rather than
sentences
 Generation from surface
representations in place of semantics
12
Problems with Sentence
Extraction
 Extraneous phrases
 “The five were apprehended along Interstate 95, heading
south in vehicles containing an array of gear including …
... authorities said.”
 Dangling noun phrases and pronouns
 “The five”
 Misleading
 “Why would the media use this specific word (fundamentalists), so often with relation to Muslims?”
 *“Most of them are radical Baptists, Lutheran and Presbyterian groups.”
13
Cut and Paste in Professional
Summarization
 Humans also reuse the input text to
produce summaries
 But they “cut and paste” the input rather
than simply extract
 Our automatic corpus analysis
 300 summaries, 1,642 sentences
 81% of sentences were constructed by cutting and pasting
 Linguistic studies
14
Major Cut and Paste Operations
 (1) Sentence reduction
 (2) Sentence combination
 (3) Syntactic transformation
 (4) Lexical paraphrasing
15
Summarization at Columbia
 News
 Email
 Meetings
 Journal articles
 Open-ended question-answering
 What is a Loya Jurga?
 Who is Mohammed Naeem Noor Khan?
 What do people think of welfare reform?
19
Summarization at Columbia
 News
 Single Document
 Multi-document
 Email
 Meetings
 Journal articles
 Open-ended question-answering
 What is a Loya Jurga?
 Who is Al Sadr?
 What do people think of welfare reform?
20
Cut and Paste Based Single Document
Summarization -- System Architecture
[Architecture diagram] Input: single document -> Extraction -> extracted sentences -> Generation -> Output: summary. Supporting components: parser, co-reference, sentence reduction, sentence combination, decomposition corpus, lexicon.
21
(1) Decomposition of Human-written Summary Sentences
 Input:
 a human-written summary sentence
 the original document
 Decomposition analyzes how the
summary sentence was constructed
 The need for decomposition
 provide training and testing data for
studying cut and paste operations
22
Sample Decomposition Output
Summary sentence:
Arthur B. Sackler, vice president for law and public policy of Time Warner Cable Inc. and a member of the Direct Marketing Association told the Communications Subcommittee of the Senate Commerce Committee that legislation to protect children’s privacy on-line could destroy the spontaneous nature that makes the…
Document sentences:
S1: A proposed new law that would require web publishers to obtain parental consent before collecting personal information from children could destroy the spontaneous nature that makes the internet unique, a member of the Direct Marketing Association told a Senate panel Thursday.
S2: Arthur B. Sackler, vice president for law and public policy of Time Warner Cable Inc., said the association supported efforts to protect children on-line, but he…
S3: “For example, a child’s e-mail address is necessary …,” Sackler said in testimony to the Communications subcommittee of the Senate Commerce Committee.
S5: The subcommittee is considering the Children’s Online Privacy Act, which was drafted…
23
The Algorithm for
Decomposition
 A Hidden Markov Model based solution
 Evaluations:
 Human judgements
 50 summaries, 305 sentences
 93.8% of the sentences were decomposed
correctly
 Summary sentence alignment
 Tested in a legal domain
 Details in (Jing & McKeown, SIGIR '99)
27
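The HMM itself is not spelled out on the slide; the following is a rough sketch of the idea behind position-based decomposition, with invented transition costs rather than Jing & McKeown's actual model parameters.

```python
# Hedged sketch of position-based decomposition: align each summary word to
# a document position (sentence index, word index) by Viterbi-style search.
# Transition costs are invented for illustration only.
def decompose(summary_words, doc_sentences):
    positions = [(i, j, w.lower()) for i, s in enumerate(doc_sentences)
                 for j, w in enumerate(s.split())]

    def trans_cost(prev, cur):
        if prev is None:
            return 0.0
        (pi, pj, _), (ci, cj, _) = prev, cur
        if ci == pi and cj == pj + 1:
            return 0.0            # adjacent word in the same sentence: free
        if ci == pi:
            return 1.0            # jump within the same sentence
        return 2.0                # jump to another sentence

    best = {None: (0.0, [])}      # state -> (total cost, alignment so far)
    for word in summary_words:
        candidates = [p for p in positions if p[2] == word.lower()]
        new_best = {}
        for cur in candidates:
            cost, path = min(
                ((c + trans_cost(prev, cur), p) for prev, (c, p) in best.items()),
                key=lambda t: t[0])
            new_best[cur] = (cost, path + [cur[:2]])
        best = new_best or best   # word absent from the document: skip it
    return min(best.values(), key=lambda t: t[0])[1]
```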
(2) Sentence Reduction
 An example:
Original Sentence: When it arrives sometime next year
in new TV sets, the V-chip will give parents a new
and potentially revolutionary device to block out
programs they don’t want their children to see.
Reduction Program: The V-chip will give parents a new
and potentially revolutionary device to block out
programs they don’t want their children to see.
Professional: The V-chip will give parents a device to
block out programs they don’t want their children to
see.
28
The Algorithm for Sentence
Reduction
 Preprocess: syntactic parsing
 Step 1: Use linguistic knowledge to decide
what phrases MUST NOT be removed
 Step 2: Determine what phrases are
most important in the local context
 Step 3: Compute the probabilities of
humans removing a certain type of
phrase
 Step 4: Make the final decision
29
Step 1: Use linguistic knowledge
to decide what MUST NOT be
removed
 Syntactic knowledge from a large-scale,
reusable lexicon we have constructed
convince:
meaning 1:
NP-PP :PVAL (“of”)
(E.g., “He convinced me of his innocence”)
NP-TO-INF-OC
(E.g., “He convinced me to go to the party”)
meaning 2:
...
 Required syntactic arguments are not
removed
30
Step 2: Determining context
importance based on lexical
links
 Saudi Arabia on Tuesday decided to sign…
 The official Saudi Press Agency reported
that King Fahd made the decision during a
cabinet meeting in Riyadh, the Saudi
capital.
 The meeting was called in response to … the
Saudi foreign minister, that the Kingdom…
 An account of the Cabinet discussions and
decisions at the meeting…
 The agency...
31
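A minimal sketch of how lexical links could quantify local context importance; the stopword list and scoring are illustrative assumptions, and the real system also follows synonymy and morphological links, not just repetition.

```python
# Hedged sketch of Step 2: a phrase is locally important if its content
# words link (here: repeat) in the surrounding sentences.
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "in", "on", "to", "and", "that"}

def context_importance(phrase, context_sentences):
    context = Counter(w.lower() for s in context_sentences for w in s.split()
                      if w.lower() not in STOPWORDS)
    words = [w.lower() for w in phrase.split() if w.lower() not in STOPWORDS]
    links = sum(context[w] for w in words)
    return links / max(len(words), 1)

# In the Saudi Arabia example above, "meeting" recurs across sentences, so
# phrases containing it score as important and resist removal.
```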
Step 3: Compute probabilities
of humans removing a phrase
[Parse-tree figure] The example sentence is shown as a dependency tree: verb (will give) with children vsubc (when), subj (V-chip), iobj (parents), and obj (device); device has ndet (a) and adjp (and), whose conjuncts are lconj (new) and rconj (revolutionary). Corpus-derived probabilities attach to nodes, e.g.:
Prob(“when_clause is removed” | “v=give”)
Prob(“to_infinitive modifier is removed” | “n=device”)
34
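A sketch of how such probabilities might be estimated by counting over (original, human-reduced) sentence pairs obtained from decomposition; the phrase representation and smoothing are assumptions for illustration.

```python
# Hedged sketch of Step 3: estimate removal probabilities from a corpus of
# aligned original/reduced sentence pairs. Add-alpha smoothing is assumed.
from collections import Counter

seen, removed = Counter(), Counter()

def count_pair(original_phrases, kept_phrases):
    """Each phrase is (phrase_type, head_word), e.g. ("when_clause", "give")."""
    for phrase in original_phrases:
        seen[phrase] += 1
        if phrase not in kept_phrases:
            removed[phrase] += 1

def prob_removed(phrase_type, head_word, alpha=1.0):
    """Smoothed estimate of Prob(phrase_type is removed | head word)."""
    key = (phrase_type, head_word)
    return (removed[key] + alpha) / (seen[key] + 2 * alpha)

# prob_removed("when_clause", "give") plays the role of
# Prob("when_clause is removed" | "v=give") above.
```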
Step 4: Make the final
decision
[Figure] The same dependency tree ((will give), (when), (V-chip), (parents), (a), (device), (and), (new), (revolutionary)), with each node annotated “L Cn Pr”, the three evidence sources for the final decision:
L -- linguistic
Cn -- context
Pr -- probabilities
35
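A hedged sketch of how the three evidence sources could be combined per node; the thresholds and precedence order are invented, not the published decision procedure.

```python
# Hedged sketch of Step 4: combine linguistic constraints (L), context
# importance (Cn), and corpus probabilities (Pr) into a keep/remove call.
def decide(node):
    if node["required_argument"]:        # L: obligatory arguments survive
        return "keep"
    if node["context_importance"] > 0.5: # Cn: locally salient phrases survive
        return "keep"
    if node["prob_removed"] > 0.8:       # Pr: humans usually drop this type
        return "remove"
    return "keep"                        # default to keeping when unsure

# Example node for the "when" clause of the V-chip sentence:
# decide({"required_argument": False, "context_importance": 0.2,
#         "prob_removed": 0.9})  ->  "remove"
```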
Evaluation of Reduction
 Success rate: 81.3%
 500 sentences reduced by humans
 Baseline: 43.2% (remove all the clauses,
prepositional phrases, to-infinitives,…)
 Reduction rate: 32.7%
 Professionals: 41.8%
 Details in (Jing, ANLP '00)
36
Multi-Document Summarization
Research Focus
 Monitor variety of online information sources
 News, multilingual
 Email
 Gather information on events across source
and time
 Same day, multiple sources
 Across time
 Summarize
 Highlighting similarities, new information, different
perspectives, user specified interests in real-time
37
Our Approach
 Use a hybrid of statistical and linguistic
knowledge
 Statistical analysis of multiple documents
 Identify important new, contradictory information
 Information fusion and rule-driven content
selection
 Generation of summary sentences
 By re-using phrases
 Automatic editing/rewriting summary
38
Newsblaster
Integrated in online environment for daily news
updates
http://newsblaster.cs.columbia.edu/
Ani Nenkova
David Elson
39
Newsblaster
http://newsblaster.cs.columbia.edu/
 Clustering articles into events
 Categorization by broad topic
 Multi-document summarization
 Generation of summary sentences
 Fusion
 Editing of references
40
Newsblaster
Architecture
[Architecture diagram] Crawl news sites -> form clusters -> categorize -> title clusters -> summary router -> event summary / biography summary / multi-event summary -> select images -> convert output to HTML.
41
Fusion
43
Theme Computation
 Input: A set of related documents
 Output: Sets of sentences that “mean”
the same thing
 Algorithm
 Compute similarity across sentences using the cosine metric (the standard IR vector-space model)
 Can compare word overlap or phrase overlap
44
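A minimal sketch of the cosine metric with a greedy theme grouping; real theme computation weights terms (e.g., TF-IDF) and can match phrases, which this toy version omits.

```python
# Hedged sketch: cosine similarity over bag-of-words vectors, used to group
# sentences that "mean" the same thing into themes.
import math
from collections import Counter

def cosine(sent_a: str, sent_b: str) -> float:
    va, vb = Counter(sent_a.lower().split()), Counter(sent_b.lower().split())
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * \
           math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

def themes(sentences, threshold=0.5):
    """Greedy clustering: each sentence joins the first theme it matches."""
    clusters = []
    for s in sentences:
        for cluster in clusters:
            if any(cosine(s, t) >= threshold for t in cluster):
                cluster.append(s)
                break
        else:
            clusters.append([s])
    return clusters
```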
Sentence Fusion Computation
 Common information identification
 Alignment of constituents in parsed theme sentences: only some
subtrees match
 Bottom-up local multi-sequence alignment
 Similarity depends on
 Word/paraphrase similarity
 Tree structure similarity
 Fusion lattice computation
 Choose a basis sentence
 Add subtrees from fusion not present in basis
 Add alternative verbalizations
 Remove subtrees from basis not present in fusion
 Lattice linearization
 Generate all possible sentences from the fusion lattice
 Score sentences using statistical language model
45
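A sketch of the linearization step only: score candidate sentences generated from the fusion lattice with a simple bigram language model and keep the best. The add-one smoothing and length normalization are illustrative choices.

```python
# Hedged sketch of lattice linearization scoring with a bigram LM.
import math
from collections import Counter

def train_bigram_lm(corpus_sentences):
    unigrams, bigrams = Counter(), Counter()
    for s in corpus_sentences:
        toks = ["<s>"] + s.lower().split() + ["</s>"]
        unigrams.update(toks[:-1])
        bigrams.update(zip(toks, toks[1:]))
    vocab = len(unigrams) + 1
    def logprob(sentence):
        toks = ["<s>"] + sentence.lower().split() + ["</s>"]
        return sum(math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
                   for a, b in zip(toks, toks[1:]))   # add-one smoothing
    return logprob

def best_linearization(candidates, logprob):
    # per-word normalization keeps longer candidates comparable
    return max(candidates, key=lambda s: logprob(s) / (len(s.split()) + 1))
```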
Tracking Across Days
 Users want to follow a story across time and watch it
unfold
 Network model for connecting clusters across days
 Separately cluster events from today’s news
 Connect new clusters with yesterday’s news
 Allows for forking and merging of stories
 Interface for viewing connections
 Summaries that update a user on what’s new
 Statistical metrics to identify differences between article pairs
 Uses learned model of features
 Identifies differences at clause and paragraph levels
48
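One plausible reading of the network model as code: link today's event clusters to yesterday's when their word-overlap similarity crosses a threshold, with one-to-many links allowing stories to fork and merge. The similarity measure and threshold are assumptions, not Newsblaster's actual model.

```python
# Hedged sketch of cross-day cluster linking by word-overlap similarity.
from collections import Counter

def centroid(cluster_docs):
    counts = Counter()
    for doc in cluster_docs:
        counts.update(doc.lower().split())
    return counts

def link_clusters(today, yesterday, threshold=0.3):
    links = []   # (today_index, yesterday_index) pairs
    for i, t_cluster in enumerate(today):
        ct = centroid(t_cluster)
        for j, y_cluster in enumerate(yesterday):
            cy = centroid(y_cluster)
            overlap = sum((ct & cy).values())
            denom = min(sum(ct.values()), sum(cy.values())) or 1
            if overlap / denom >= threshold:
                links.append((i, j))    # multiple links = forks and merges
    return links
```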
Different Perspectives
 Hierarchical clustering
 Each event cluster is divided into clusters by
country
 Different perspectives can be viewed
side by side
 Experimenting with update summarizer
to identify key differences between sets
of stories
53
Multilingual Summarization
 Given a set of documents on the same
event
 Some documents are in English
 Some documents are translated from
other languages
57
Issues for Multilingual
Summarization
 Problem: Translated text is errorful
 Exploit information available during
summarization
 Similar documents in cluster
 Replace translated sentences with similar
English
 Edit translated text
 Replace named entities with extractions from similar English
58
Multilingual Redundancy
BAGDAD. - A total of 21 prisoners has been died and a hundred more hurt by firings
from mortar in the jail of Abu Gharib (to 20 kilometers to the west of Bagdad), according
to has informed general into the U.S.A. Marco Kimmitt.
Spanish
Bagdad in the Iraqi capital Aufstaendi attacked Bagdad on Tuesday a prison
with mortars and killed after USA gifts 22 prisoners. Further 92 passengers
of the Abu Ghraib prison were hurt, communicated a spokeswoman of the
American armed forces.
German
The Iraqi being stationed US military shot on the 20th, the same day to the allied forces
detention facility which is in アブグレイブ of the Baghdad west approximately 20
kilometers, mortar 12 shot and you were packed, 22 Iraqi human prisoners died, it
announced that nearly 100 people were injured.
Japanese
BAGHDAD, Iraq – Insurgents fired 12 mortars into Baghdad's Abu Ghraib prison
Tuesday, killing 22 detainees and injuring 92, U.S. military officials said.
English
59
Multilingual Similarity-based Summarization
[Flow diagram] Documents on the same event arrive as Arabic, Japanese, and German text alongside related English documents. The foreign-language documents are machine translated into English; the related English sentences are simplified; similar English sentences are detected for each translated sentence; translated sentences are replaced or rewritten accordingly; and summary sentences are selected to form the English summary.
61
Sentence 1
Iraqi President Saddam Hussein that the government of Iraq
over 24 years in a "black" near the port of the northern Iraq
after nearly eight months of pursuit was considered the
largest in history .
Similarity 0.27: Ousted Iraqi President Saddam Hussein is
in custody following his dramatic capture by US forces in
Iraq.
Similarity 0.07: Saddam Hussein, the former president of
Iraq, has been captured and is being held by US forces in
the country.
Similarity 0.04: Coalition authorities have said that the
former Iraqi president could be tried at a war crimes
tribunal, with Iraqi judges presiding and international legal
experts acting as advisers.
62
Sentence Simplification
 Machine-translated sentences are long and ungrammatical
 Use sentence simplification on English
sentences to reduce to approximately
“one fact” per sentence
 Use Arabic sentences to find most
similar simple sentences
 Present multiple high similarity
sentences
63
Simplification Examples
 'Operation Red Dawn', which led to the capture of
Saddam Hussein, followed crucial information from a
member of a family close to the former Iraqi leader.
 ' Operation Red Dawn' followed crucial information from a
member of a family close to the former Iraqi leader.
 Operation Red Dawn led to the capture of Saddam Hussein.
 Saddam Hussein had been the object of intensive
searches by US-led forces in Iraq but previous
attempts to locate him had proved unsuccessful.
 Saddam Hussein had been the object of intensive searches
by US-led forces in Iraq.
 But previous attempts to locate him had proved unsuccessful.
64
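A deliberately crude sketch that reproduces just the two splits shown above with surface patterns; an actual simplifier operates on parse trees rather than regular expressions.

```python
# Hedged illustration: peel off a non-restrictive relative clause
# (", which ... ,") and break sentence-level "but" coordination.
import re

def simplify(sentence: str) -> list[str]:
    # ", which ... ," non-restrictive relative clause on the subject
    m = re.match(r"^(.*?), which (.*?), (.*)$", sentence)
    if m:
        head, clause, rest = m.groups()
        return [f"{head} {rest}", f"{head} {clause}."]
    # coordination: "X but Y" -> "X." / "But Y."
    m = re.match(r"^(.*?) but (.*)$", sentence)
    if m:
        x, y = m.groups()
        return [x.rstrip(",") + ".", "But " + y]
    return [sentence]

# Applied to the two example sentences above, this yields the same splits
# shown on the slide.
```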
Evaluation
 DUC (Document Understanding Conference):
run by NIST yearly
 Manual creation of topics (sets of documents)
 2-7 human written summaries per topic
 How well does a system generated summary
cover the information in a human summary?
 Metrics
 Rouge
 Pyramid
65
Rouge
 ROUGE version 1.2.1:
 Publicly available at: http://www.isi.edu/~cyl/ROUGE
 Version 1.2.1 includes:
 ROUGE-N: n-gram-based co-occurrence statistics
 ROUGE-L: longest common subsequence (LCS) based co-occurrence statistics
 ROUGE-W: LCS-based co-occurrence statistics favoring consecutive LCSes
 Measures recall
 Rouge-1: How many unigrams in the human
summary did the system summary find?
 Rouge-2: How many bigrams?
66
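A minimal sketch of ROUGE-N recall; the real ROUGE adds stemming, stopword handling, and aggregation over multiple reference summaries.

```python
# Hedged sketch of ROUGE-N: the fraction of the human summary's n-grams
# that also appear in the system summary (clipped counts).
from collections import Counter

def ngrams(text: str, n: int) -> Counter:
    toks = text.lower().split()
    return Counter(zip(*(toks[i:] for i in range(n))))

def rouge_n(system: str, reference: str, n: int = 1) -> float:
    ref, sys_ = ngrams(reference, n), ngrams(system, n)
    overlap = sum((ref & sys_).values())
    total = sum(ref.values())
    return overlap / total if total else 0.0

# Rouge-1 counts shared unigrams, Rouge-2 shared bigrams:
# rouge_n(system_summary, human_summary, 1); rouge_n(..., 2)
```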
Pyramids
 Uses multiple human summaries
 Previous data indicated 5 needed for score
stability
 Information is ranked by its importance
 Allows for multiple good summaries
 A pyramid is created from the human
summaries
 Elements of the pyramid are content units
 System summaries are scored by comparison
with the pyramid
67
Summarization Content Units
 Near-paraphrases from different human
summaries
 Clause or less
 Avoids explicit semantic representation
 Emerges from analysis of human
summaries
68
SCU: A cable car caught fire
(Weight = 4)
A. The cause of the fire was unknown.
B. A cable car caught fire just after entering a
mountainside tunnel in an alpine resort in Kaprun,
Austria on the morning of November 11, 2000.
C. A cable car pulling skiers and snowboarders to the
Kitzsteinhorn resort, located 60 miles south of
Salzburg in the Austrian Alps, caught fire inside a
mountain tunnel, killing approximately 170 people.
D. On November 10, 2000, a cable car filled to capacity
caught on fire, trapping 180 passengers inside the
Kitzsteinhorn mountain, located in the town of
Kaprun, 50 miles south of Salzburg in the central
Austrian Alps.
69
SCU: The cause of the fire is unknown (Weight = 1)
[The same four summaries A–D as above, highlighting the phrase that expresses this SCU.]
70
SCU: The accident happened in the Austrian Alps (Weight = 3)
[The same four summaries A–D as above, highlighting the phrases that express this SCU.]
71
Idealized representation
[Pyramid figure] Tiers of differentially weighted SCUs, from W=3 at the top to W=1 at the bottom
 Top: few SCUs, high weight
 Bottom: many SCUs, low weight
72
Pyramid Score
SCORE = D / MAX
D: sum of the weights of the SCUs in a summary
MAX: sum of the weights of the SCUs in an ideally informative summary
Measures the proportion of good information in the summary: precision
73
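A small sketch of the score, assuming SCUs have already been matched against the summary; taking the ideal summary to contain the same number of SCUs is a simplifying assumption here.

```python
# Hedged sketch of the pyramid score D/MAX. `pyramid` maps SCU id -> weight;
# `summary_scus` lists the SCU ids a system summary expresses.
def pyramid_score(summary_scus, pyramid, size=None):
    d = sum(pyramid[scu] for scu in summary_scus if scu in pyramid)
    # an ideally informative summary takes the heaviest SCUs first
    n = size if size is not None else len(summary_scus)
    max_d = sum(sorted(pyramid.values(), reverse=True)[:n])
    return d / max_d if max_d else 0.0

# Example: pyramid {"fire": 4, "alps": 3, "cause_unknown": 1}; a summary
# expressing {"fire", "cause_unknown"} scores (4+1)/(4+3) = 0.71.
```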
Questions (from Sparck Jones)
 Should we take the reader into account and how?
 “Similarly, the notion of a basic summary, i.e., one
reflective of the source, makes hidden fact
assumptions, for example that the subject knowledge
of the output’s readers will be on a par with that of the
readers for whom the source was intended. (p. 5)”
 Is the state of the art sufficiently mature to allow
summarization from intermediate representations and
still allow robust processing of domain independent
material?
74
User Study: Objectives
 Does multi-document summarization help?
 Do summaries help the user find information needed to perform a report-writing task?
 Do users use information from summaries in gathering their facts?
 Do summaries increase user satisfaction with the online news system?
 Do users create better quality reports with summaries?
 How do full multi-document summaries compare with minimal 1-sentence summaries such as Google News?
75
User Study: Design
 Four parallel news systems
 Source documents only; no summaries
 Minimal single sentence summaries (Google
News)
 Newsblaster summaries
 Human summaries
 All groups write reports given four scenarios
 A task similar to an analyst's
 Can only use Newsblaster for research
 Time-restricted
76
User Study: Execution
 4 scenarios
 4 event clusters each
 2 directly relevant, 2 peripherally relevant
 Average 10 documents/cluster
 45 participants
 Balance between liberal arts, engineering
 138 reports
 Exit survey
 Multiple-choice and open-ended questions
 Usage tracking
 Each click logged, on or off-site
77
“Geneva” Prompt
 The conflict between Israel and the
Palestinians has been difficult for government
negotiators to settle. Most recently,
implementation of the “road map for peace”,
a diplomatic effort sponsored by ……
 Who participated in the negotiations that produced the
Geneva Accord?
 Apart from direct participants, who supported the Geneva
Accord preparations and how?
 What has the response been to the Geneva Accord by
the Palestinians?
78
Measuring Effectiveness
 Score report content and compare
across summary conditions
 Compare user satisfaction per summary
condition
 Compare where subjects took report content from
79
User Satisfaction
 With Newsblaster, users rated the system more effective than a web search
 Not true with documents only or with single-sentence summaries
 Easier to complete the task with summaries than with documents only
 More likely to report having enough time with summaries than with documents only
 Summaries helped most
 5% single sentence summaries
 24% Newsblaster summaries
 43% human summaries
81
User Study: Conclusions
 Summaries measurably improve a news browser's effectiveness for research
 Users are more satisfied with Newsblaster summaries than with single-sentence summaries like those of Google News
 Users want search
 Not included in evaluation
82
Email Summarization
 Cross between speech and text
 Elements of dialog
 Informal language
 More context explicitly repeated than speech
 Wide variety of types of email
 Conversation to decision-making
 Different reasons for summarization
 Browsing large quantities of email – a mailbox
 Catch-up: join a discussion late and participate – a thread
83
Email Summarization: Approach
 Collected and annotated multiple corpora of
email
 Hand-written summaries, categorization of threads & messages
 Identified 3 categories of email to address:
 Event planning, Scheduling, Information gathering
 Developed tools:
 Automatic categorization of email
 Preliminary summarizers
 Statistical extraction using email specific features
 Components of category specific summarization
84
Email Summarization by
Sentence Extraction
 Use features to identify key sentences
 Non-email specific: e.g., similarity to centroid
 Email specific: e.g., following quoted material
 Rule-based supervised machine learning
 Training on human-generated summaries
 Add “wrappers” around sentences to show
who said what
85
Data for Sentence Extraction
 Columbia ACM chapter executive board mailing list
 Approximately 10 regular participants
 ~300 threads, ~1000 messages
 Threads include: scheduling and planning of meetings and events, question and answer, general discussion and chat
 Annotated by human annotators:
 Hand-written summary
 Categorization of threads and messages
 Highlighting important information (such as question-answer pairs)
86
Email Summarization by
Sentence Extraction
 Creation of Training Data
 Start with human-generated summaries
 Use SimFinder (a trained sentence similarity measure –
Hatzivassiloglou et al 2001) to label sentences in threads as
important
 Learning of Sentence Extraction Rules
 Use Ripper (a rule learning algorithm – Cohen 1996) to learn rules
for sentence classification
 Use basic and email-specific features in machine learning
 Creating summaries
 Run learned rules on unseen data
 Add “wrappers” around sentences to show who said what
 Results
 Basic: .55 precision; .40 F-measure
 Email-specific: .61 precision; .50 F-measure
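An illustrative sketch of extraction with email-specific features; the feature set and hand-written rules stand in for the rules Ripper would learn, and the wrapper format loosely follows the sample summary on the next slide.

```python
# Hedged sketch: each sentence becomes a feature vector; two hand-written
# rules substitute for learned rules; extracted sentences get a "wrapper"
# naming the sender. Feature names and thresholds are invented here.
def features(sent, position, centroid_words, prev_was_quote):
    toks = set(sent.lower().split())
    return {
        "centroid_sim": len(toks & centroid_words) / max(len(toks), 1),
        "is_question": sent.rstrip().endswith("?"),
        "follows_quote": prev_was_quote,   # email-specific feature
        "position": position,
    }

def extract(messages, centroid_words):
    """messages: list of (sender, date, lines); quoted lines start with '>'."""
    summary = []
    for sender, date, lines in messages:
        prev_quote = False
        for i, line in enumerate(lines):
            if line.startswith(">"):
                prev_quote = True
                continue
            f = features(line, i / max(len(lines), 1), centroid_words, prev_quote)
            # stand-ins for learned rules:
            if f["centroid_sim"] > 0.3 or (f["follows_quote"] and f["is_question"]):
                summary.append(f"On {date}, {sender} wrote: {line}")
            prev_quote = False
    return summary
```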
87
Sample Automatically Generated
Summary (ACM0100)
Regarding "meeting tonight...", on Oct 30, 2000,
David Michael Kalin wrote: Can I reschedule my C
session for Wednesday night, 11/8, at 8:00?
Responding to this on Oct 30, 2000, James J Peach
wrote: Are you sure you want to do it then?
Responding to this on Oct 30, 2000, Christy
Lauridsen wrote: David , a reminder that your
scheduled to do an MSOffice session on Nov. 14,
at 7pm in 252Mudd.
88
Information Gathering Email:
The Problem
Summary from our rule-based sentence extractor:
Regarding "acm home/bjarney", on Apr 9, 2001, Mabel Dannon
wrote:
Two things: Can someone be responsible for the press releases
for Stroustrup?
Responding to this on Apr 10, 2001, Tina Ferrari wrote:
I think Peter, who is probably a better writer than most of us, is
writing up something for dang and Dave to send out to various
ACM chapters. Peter, we can just use that as our "press
release", right?
In another subthread, on Apr 12, 2001, Keith Durban wrote:
Are you sending out upcoming events for this week?
89
Detection of Questions
 Questions in interrogative form: inverted subject-verb order
 Supervised rule induction approach: trained on Switchboard, tested on the ACM corpus
 Results: Recall 0.56, Precision 0.96, F-measure 0.70
 Recall is low because questions in the ACM corpus often start with a declarative clause:
 So, if you're available, do you want to come?
 if you don't mind, could you post this to the class bboard?
 Results without declarative-initial questions: Recall 0.72, Precision 0.96, F-measure 0.82
90
Detection of Answers
Supervised machine learning approach
 Use human-annotated data to generate gold-standard training data
 Annotators were asked to highlight and associate question-answer pairs in the ACM corpus
 Learn a classifier that predicts whether a segment following a question segment answers it
 Represent each question and candidate answer segment by a feature vector

            Labeller 1   Labeller 2   Union
Precision   0.690        0.680        0.728
Recall      0.652        0.612        0.732
F1-Score    0.671        0.644        0.730
91
Integrating QA detection with
summarization
 Use QA labels as features in sentence
extraction (F=.545)
 Add automatically detected answers to
questions in extractive summaries
(F=.566)
 Start with QA pair sentences and
augmented with extracted sentences
(F=.573)
92
Integrated in Microsoft Outlook
93
Meeting Summarization
(joint with Berkeley, SRI, Washington)
 Goal: automatic summarization of meetings
by generating “minutes” highlighting the
debate that affected each decision.
 Work to date: Identification of
agreement/disagreement
 Machine learning approach: lexical, structure,
acoustic features
 Use of context: who agreed with whom so far?
 Addressee identification
 Bayesian modeling of context
94
Conclusions
 Non-extractive summarization is practical today
 User studies show summarization improves access
to needed information
 Advances and ongoing research in tracking events,
multilingual summarization, perspective identification
 Moves to new media (email, meetings) raise new
challenges with dialog, informal language
95
Sparck Jones claims
 Need more power than text extraction and more flexibility than fact extraction (p. 4)
 In order to develop effective procedures it is necessary to identify and respond to the context factors, i.e. input, purpose and output factors, that bear on summarising and its evaluation. (p. 1)
 It is important to recognize the role of context factors because the idea of a general-purpose summary is manifestly an ignis fatuus. (p. 5)
 Similarly, the notion of a basic summary, i.e., one reflective of the source, makes hidden fact assumptions, for example that the subject knowledge of the output’s readers will be on a par with that of the readers for whom the source was intended. (p. 5)
 I believe that the right direction to follow should start with intermediate source processing, as exemplified by sentence parsing to logical form, with local anaphor resolutions
96