Dissertation Defense Multiple Alternative Sentence Compressions

University of Maryland College Park
Department of Computer Science
Dissertation Defense
Multiple Alternative Sentence
Compressions as a Tool for
Automatic Summarization Tasks
David Zajic
November 28, 2006
Intuition
• A newspaper editor was found dead in a
hotel room in this Pacific resort city a day
after his paper ran articles about
organized crime and corruption in the city
government. The body of the editor, Misael
Tamayo Hernández, of the daily El
Despertar de la Costa, was found early
Friday with his hands tied behind his back
in a room at the Venus Motel…
Intuition
• A newspaper editor was found dead in a
hotel room in this Pacific resort city a day
after his paper ran articles about
organized crime and corruption in the city
government.
• Newspaper editor found dead in Pacific
resort city
Intuition
• A newspaper editor was found dead in a
hotel room in this Pacific resort city a day
after his paper ran articles about
organized crime and corruption in the city
government.
• Paper ran articles about corruption in
government
Intuition
• A newspaper editor was found dead in a
hotel room in this Pacific resort city a day
after his paper ran articles about
organized crime and corruption in the city
government.
• Hernández, Zihuatanejo: Newspaper editor
found dead in Pacific resort city
Intuition
• A newspaper editor was found dead in a
hotel room in this Pacific resort city a day
after his paper ran articles about
organized crime and corruption in the city
government.
• Newspaper Editor Killed in Mexico
– (A) Newspaper Editor (was) killed in Mexico
Talk Roadmap
• Introduction
• Automatic Summarization under MASC
framework
– HMM Hedge, Trimmer, Topiary
– Single Document, Multi-document
– Experimental evidence supporting hypotheses
• Review of Evaluations
• Conclusion
• Future Work
Introduction
• Automatic Summarization
– Distillation of important information from a
source into an abridged form
– Extractive Summarization: select sentences
with important content from the document
– Limitations
• Sentences contain a mixture of relevant and non-relevant information
• Sentences are partially redundant with the rest of the summary
Introduction
• Multiple Alternative Sentence
Compressions (MASC)
– Framework for Automatic Text Summarization
– Generation of many sentence compressions
of source sentences to serve as candidates
– Select from candidates using weighted
features to generate summary
– Environment for testing hypotheses
Introduction
• Hypotheses
– Extractive summarization systems can create
better summaries using larger pool of
compressed candidates
– Sentence selectors choose better summary
candidates using larger sets of features
– For Headline Generation, combination of
fluent text and topics better than either alone
Introduction
• Sentence Compression
– HMM Hedge
– Trimmer
– Topiary
• Sentence Selection
– Lead Sentence for Headline Generation
– Maximal Marginal Relevance for Multi-document Summarization
Summarization Tasks
• Single Document Summarization
– Very short: Headline Generation
– Single sentence
– 75 characters
– DUC 2002, 2003, 2004
• Query-focused Multi-Document Summarization
– Multiple sentences
– 100–250 words
– DUC 2005, 2006
Headline Generation
• Newspaper Headlines
– Natural example of human summarization
– Three criteria for a good headline:
• Summarize a story
• Make people want to read it
• Fit in specified space
– Headlinese: compressed form of English
Introduction
• Headline Types
– Eye-Catcher: Under God Under Fire
– Indicative: Pledge of Allegiance
– Informative: U.S. Court Decides Pledge of Allegiance Unconstitutional
Talk Roadmap
• Introduction
• Automatic Summarization under MASC
framework
– HMM Hedge, Trimmer, Topiary
– Single Document, Multi-document
– Experimental evidence supporting hypotheses
• Review of Evaluations
• Conclusion
• Future Work
General Architecture
[Architecture diagram: Document → Compression (HMM Hedge, Trimmer, Topiary) → Candidates → Selection (Lead Sentence Selection or Maximal Marginal Relevance) → Summary]
Sentence Compression
• Selecting words in order from a sentence
– Or from a window of words
• Human Studies
– Humans can almost always do this for written
news
– Bias for words from within a single sentence
– Bias for words early in document
Sentence Compression
• Two implementations of select-words-in-order
– Statistical Method: HMM Hedge (Headline
Generation)
– Syntactic Method: Trimmer
Talk Roadmap
• Introduction
• Automatic Summarization under MASC
framework
– HMM Hedge, Trimmer, Topiary
– Single Document, Multi-document
– Experimental evidence supporting hypotheses
• Review of Evaluations
• Conclusion
• Future Work
HMM Hedge Architecture
(Single Compression)
[Architecture diagram: Document → Part of Speech Tagger → Verb Tags → HMM Hedge (with Headline Language Model and General Language Model) → Selection → Summary]
HMM Hedge
Noisy Channel Model
• Underlying method: select words in order
• Sentences are observed data
• Headlines are unobserved data
• Noisy channel adds words to headlines to create sentences
– Headline: President signed legislation
– Sentence: On Tuesday the President signed the controversial legislation at a private ceremony
HMM Hedge
Noisy Channel Model
• Probability of Headline estimated with bigram
model of Headlinese
• Probability of observed Sentence given
unobserved Headline (the channel model)
estimated by unigram model of General English
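A minimal sketch of this noisy-channel scoring, assuming toy log-probability tables; the in-order matching below is a crude stand-in for the HMM's word alignment, and all names and values are illustrative:

```python
# Hypothetical log-prob tables; in practice these are trained on
# Headlinese (bigrams) and general English (unigrams).
bigram_lp = {("<s>", "president"): -2.0, ("president", "signed"): -3.1,
             ("signed", "legislation"): -2.7}
unigram_lp = {"on": -3.0, "tuesday": -6.0, "the": -1.5,
              "controversial": -9.0, "at": -2.5, "a": -1.8,
              "private": -7.0, "ceremony": -8.5}

def headline_logprob(headline):
    """log P(H): bigram model of Headlinese over the headline words."""
    words = ["<s>"] + headline
    return sum(bigram_lp.get(pair, -12.0) for pair in zip(words, words[1:]))

def channel_logprob(sentence, headline):
    """log P(S | H): words the channel inserts around the headline are
    generated by a unigram model of general English."""
    remaining = list(headline)
    added = []
    for w in sentence:
        if remaining and w == remaining[0]:
            remaining.pop(0)   # headline word, carried through unchanged
        else:
            added.append(w)    # word inserted by the noisy channel
    return sum(unigram_lp.get(w, -12.0) for w in added)

headline = ["president", "signed", "legislation"]
sentence = ("on tuesday the president signed the controversial "
            "legislation at a private ceremony").split()
# Noisy-channel objective: argmax over H of log P(H) + log P(S | H)
print(headline_logprob(headline) + channel_logprob(sentence, headline))
```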
HMM Hedge
• Decoding parameters to mimic Headlines
– Groups of contiguous words, clumpiness
– Size of gaps between words, gappiness
– Sentence position of words
– Require verb
HMM Hedge
• Adaptation to multi-candidate compression
• Finds the 5 most likely headlines at each length from 5 to 15 words for each document sentence
Automatic Evaluation
• Recall-Oriented Understudy for Gisting Evaluation (Rouge)
– Rouge Recall: ratio of matching candidate n-gram count to reference n-gram count
– Rouge Precision: ratio of matching candidate n-gram count to candidate n-gram count times the number of references
– R1 preferred for single-document summarization
– R2 preferred for multi-document summarization
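A small sketch of Rouge-n recall and precision following the slide's definitions (not any official Rouge implementation):

```python
from collections import Counter

def ngrams(words, n):
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

def rouge_n(candidate, references, n=1):
    """Rouge-n recall and precision as defined above."""
    cand = ngrams(candidate, n)
    match = ref_total = 0
    for ref in references:
        r = ngrams(ref, n)
        match += sum(min(c, r[g]) for g, c in cand.items())  # clipped matches
        ref_total += sum(r.values())
    # Precision denominator: candidate count times number of references.
    cand_total = sum(cand.values()) * len(references)
    return (match / ref_total if ref_total else 0.0,
            match / cand_total if cand_total else 0.0)

recall, precision = rouge_n("newspaper editor found dead".split(),
                            ["newspaper editor killed in mexico".split()])
print(recall, precision)  # 0.4 0.5
```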
HMM Hedge
[Figure: Rouge-1 recall and precision curves for candidates drawn from 1, 2, and 3 source sentences, as N-best at each length increases from 1 to 5; scores rise as the candidate pool grows.]
HMM Hedge
• Features (default weight, optimized weight fold A)
– Word position sum (-0.05, 1.72)
– Small gaps (-0.01, 1.02)
– Large gaps (-0.05, 3.70)
– Clumps (-0.05, -0.17)
– Sentence position (0, -945)
– Length in words (1, 42)
– Length in characters (1, 85)
– Unigram probability of story words (1, 1.03)
– Bigram probability of headline words (1, 1.51)
– Emit probability of headline words (1, 3.60)
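A sketch of scoring one candidate as a linear combination of these features; the weight values mirror the (default, fold-A optimized) pairs above, but the feature names are abbreviated and the candidate's feature values are invented for illustration:

```python
default_w = {"word_pos_sum": -0.05, "small_gaps": -0.01, "large_gaps": -0.05,
             "clumps": -0.05, "sent_pos": 0.0, "len_words": 1.0,
             "len_chars": 1.0, "unigram_lp": 1.0, "bigram_lp": 1.0,
             "emit_lp": 1.0}
optimized_w = {"word_pos_sum": 1.72, "small_gaps": 1.02, "large_gaps": 3.70,
               "clumps": -0.17, "sent_pos": -945.0, "len_words": 42.0,
               "len_chars": 85.0, "unigram_lp": 1.03, "bigram_lp": 1.51,
               "emit_lp": 3.60}

def score(features, weights):
    # Selection score: dot product of feature values and weights.
    return sum(weights[k] * v for k, v in features.items())

candidate = {"word_pos_sum": 12.0, "small_gaps": 2.0, "large_gaps": 0.0,
             "clumps": 3.0, "sent_pos": 0.0, "len_words": 9.0,
             "len_chars": 61.0, "unigram_lp": -42.3, "bigram_lp": -31.8,
             "emit_lp": -18.5}
print(score(candidate, default_w), score(candidate, optimized_w))
```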
HMM Hedge
Fold | Default R1 Recall | Default R1 Prec. | Optimized R1 Recall | Optimized R1 Prec.
A    | 0.11214           | 0.10726          | 0.24722             | 0.21482
B    | 0.11021           | 0.10231          | 0.24307             | 0.21425
C    | 0.11781           | 0.10811          | 0.24129             | 0.20795
D    | 0.11993           | 0.10660          | 0.16595             | 0.13454
E    | 0.11282           | 0.10003          | 0.25341             | 0.21775
Avg  | 0.11458           | 0.10486          | 0.23019             | 0.19786
Talk Roadmap
• Introduction
• Automatic Summarization under MASC
framework
– HMM Hedge, Trimmer, Topiary
– Single Document, Multi-document
– Experimental evidence supporting hypotheses
• Review of Evaluations
• Conclusion
• Future Work
Multi-Document Summarization
• A human study showed the potential for saving space by using sentence compression in multi-document summarization
• The subject was shown 103 sentences relevant to 39 queries and asked to make relevance judgments on 430 compressed versions
• Potential for 16.7% reduction by word count,
17.6% reduction by characters, with no loss of
relevance
HMM Hedge
Multi-Document Summarization
[Architecture diagram: Document → Part of Speech Tagger → Verb Tags → HMM Hedge (with Headline and General Language Models) → Candidates + HMM Features → URA (URA Index, optional Query) → Candidates + HMM and URA Features → Selection (Feature Weights) → Summary]
Multi-Document Sentence
Selection
• Maximal Marginal Relevance (MMR)
(Carbonell and Goldstein, 1998)
– All candidates are scored with a linear combination of static and dynamic features
– The highest-ranking candidate is added to the summary
• Other compressions of its source sentence are removed from the pool
– Dynamic features are recalculated and candidates rescored
– Iterate until the summary is complete
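A sketch of this selection loop, assuming candidates are dicts carrying static features and a source-sentence id; dynamic_features is a hypothetical callback that scores a candidate against the summary built so far:

```python
def mmr_select(candidates, weights, word_limit, dynamic_features):
    """MASC selection with MMR: score, take the best, drop other
    compressions of its source sentence, rescore, repeat."""
    summary, pool = [], list(candidates)
    used = 0
    while pool and used < word_limit:
        def total(c):
            feats = {**c["static"], **dynamic_features(c, summary)}
            return sum(weights.get(k, 0.0) * v for k, v in feats.items())
        best = max(pool, key=total)
        summary.append(best)
        used += len(best["words"])
        # Remove every other compression of the same source sentence;
        # dynamic features are recomputed on the next pass of the loop.
        pool = [c for c in pool if c["source_id"] != best["source_id"]]
    return summary
```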
Multi-Document Sentence
Selection
• Static Features
– Sentence Position
– Relevance
– Centrality
– Compression-specific features
• Dynamic Features
– Redundancy
– Count of summary candidates from source
document
Relevance and Centrality
• Universal Retrieval Architecture (URA)
– Infrastructure for information retrieval tasks
• Four score components
– Candidate Query Relevance: Matching score
between candidate and query
– Document Query Relevance: Lucene similarity score
between document and query
– Candidate Centrality: Average Lucene similarity of
candidate to other sentences in document
– Document Centrality: Average Lucene similarity of
document to other documents in cluster
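A sketch of the four components, using cosine similarity over term counts as a stand-in for Lucene's similarity score (illustration only, not Lucene's actual formula):

```python
import math
from collections import Counter

def sim(a, b):
    """Cosine similarity over term counts; stand-in for Lucene scoring."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0

def ura_features(cand, query, doc_sents, doc, cluster_docs):
    # All arguments are lists of tokens (or lists of token lists).
    return {
        "cand_query_rel": sim(cand, query),
        "doc_query_rel": sim(doc, query),
        "cand_centrality": sum(sim(cand, s) for s in doc_sents)
                           / max(len(doc_sents), 1),
        "doc_centrality": sum(sim(doc, d) for d in cluster_docs)
                          / max(len(cluster_docs), 1),
    }
```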
Redundancy: Intuition
• Consider a summary about earthquakes
• “Generated” by topic: Earthquake, seismic,
Richter scale
• “Generated” by general language: Dog,
under, during
• Sentences with many words “generated”
by the topic are redundant
Redundancy: Formal
$$P(w \mid D) = \frac{\mathrm{count}(w, D)}{\mathrm{size}(D)} \qquad P(w \mid C) = \frac{\mathrm{count}(w, C)}{\mathrm{size}(C)}$$

$$\mathrm{redundancy}(S) = \sum_{s \in S} \big[\, \lambda\, P(s \mid D) + (1 - \lambda)\, P(s \mid C) \,\big]$$
HMM Hedge Multi-doc
• Placeholder for results of HMM Hedge
Talk Roadmap
• Introduction
• Automatic Summarization under MASC
framework
– HMM Hedge, Trimmer, Topiary
– Single Document, Multi-document
– Experimental evidence supporting hypotheses
• Review of Evaluations
• Conclusion
• Future Work
Trimmer
• Underlying method: select words in order
• Parse and Trim
• Rules come from study of Headlinese
– Different distributions of syntactic structures
Phenomenon                  | Headlines | Lead Sentences
Preposed adjunct            | 0%        | 2.7%
Time expression             | 1.5%      | 24%
Noun phrase relative clause | 0.3%      | 3.5%
Trimmer: Mask Operation / Mask Outside
[Parse-tree figures illustrating the Mask and Mask Outside operations]
Trimmer Single Document
Document
Entity
Tagger
Entity
Tags
Parser
Parses
Trimmer
Candidates,
Trimmer Features
Selection
Summary
Trimmer: Root S
• Select the lowest leftmost S which has NP
and VP children, in that order.
[S [S [NPRebels] [VP agree to talks with
government]] officials said Tuesday.]
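A sketch of the Root-S search over a toy parse tree; the Node class and depth-first traversal are illustrative, not the dissertation's implementation:

```python
class Node:
    def __init__(self, label, children=()):
        self.label, self.children = label, list(children)

def root_s(node):
    """Return the lowest, leftmost S with NP and VP children in that
    order, via a depth-first, left-to-right search."""
    for child in node.children:
        found = root_s(child)   # recurse first, so deeper S wins
        if found:
            return found
    labels = [c.label for c in node.children]
    if (node.label == "S" and "NP" in labels
            and "VP" in labels[labels.index("NP") + 1:]):
        return node
    return None

# [S [S [NP Rebels] [VP agree ...]] officials said Tuesday.]
inner = Node("S", [Node("NP"), Node("VP")])
tree = Node("S", [inner, Node("NP"), Node("VP")])
assert root_s(tree) is inner   # the embedded clause is selected
```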
Trimmer: Preposed Adjunct
• Remove [YP …] preceding first NP inside
chosen S
[S [PP According to a now-finalized
blueprint described by U.S. officials and
other sources] [NP the Bush
administration] [VP plans to take complete,
unilateral control of a post-Saddam
Hussein Iraq]]
Trimmer: Conjunction
• From [X] [CC] [X], remove the conjunction together with either conjunct
[S Illegal fireworks [VP [VP injured hundreds of
people] [CC and] [VP started six fires.]]]
[S A company offering blood cholesterol tests in
grocery stores says [S [S medical technology
has outpaced state laws,] [CC but] [S the
state says the company doesn’t have the
proper licenses.]]]
Trimmer
• Adaptation to multi-candidate compression
• Multi-candidate rules
– Root S
– Preamble
– Conjunction
Trimmer
• Multi-candidate Root S
• [S1 [S2 The latest flood crest, the eighth this summer,
passed Chongqing in southwest China], and [S3 waters
were rising in Yichang, in central China’s Hubei province,
on the middle reaches of the Yangtze], state television
reported Sunday.]
• The single-candidate version would choose only S2; multi-candidate Root-S generates all three choices.
Trimmer: Preamble Rule
[Parse-tree figures illustrating the Preamble rule]
Trimmer: Conjunction
• [S Illegal fireworks [VP [VP injured
hundreds of people] [CC and] [VP started
six fires.]]]
• Illegal fireworks injured hundreds of
people
• Illegal fireworks started six fires
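A sketch of the multi-candidate Conjunction rule over flat strings rather than parse trees; the helper below is hypothetical:

```python
def conjunction_candidates(prefix, x1, cc, x2):
    """From a shared prefix and the coordination [X1] [CC] [X2],
    emit the full version plus one candidate per conjunct."""
    return [f"{prefix} {x1} {cc} {x2}",
            f"{prefix} {x1}",
            f"{prefix} {x2}"]

for cand in conjunction_candidates("Illegal fireworks",
                                   "injured hundreds of people",
                                   "and", "started six fires"):
    print(cand)
# Illegal fireworks injured hundreds of people and started six fires
# Illegal fireworks injured hundreds of people
# Illegal fireworks started six fires
```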
Trimmer: Multi-candidate Rules
[Figure: Rouge-1 recall by multi-candidate rule configuration (Trimmer, +R, +P, +C, +R+P, +R+C, +S+C, +R+S+C), one curve per feature set (LUL, L, R, C, LR, LC, RC, LRC); recall rises with greater use of multi-candidate rules.]
Trimmer: Features
• Selection among Trimmer candidates
based on three sets of features
– L: Length in characters or words
– R: Counts of rule applications
– C: Centrality
• Baseline LUL: select longest version under
limit
Trimmer: Features
[Figure: Rouge-1 recall by feature set (LUL, L, R, C, LR, LC, RC, LRC), one curve per rule configuration (Trimmer, +R, +P, +C, +R+P, +R+C, +S+C, +R+S+C); recall rises with larger feature sets.]
Talk Roadmap
• Introduction
• Automatic Summarization under MASC
framework
– HMM Hedge, Trimmer, Topiary
– Single Document, Multi-document
– Experimental evidence supporting hypotheses
• Review of Evaluations
• Conclusion
• Future Work
Trimmer Multi-Document
[Architecture diagram: Document → Entity Tagger → Entity Tags → Parser → Parses → Trimmer → Candidates + Trimmer Features → URA (URA Index, optional Query) → Candidates + Trimmer and URA Features → Selection (Feature Weights) → Summary]
Trimmer
System           | R1 Recall | R1 Prec. | R2 Recall | R2 Prec.
Trimmer MultiDoc | 0.38198   | 0.37617  | 0.08051   | 0.07922
HMM MultiDoc     | 0.37404   | 0.37405  | 0.07884   | 0.07887
Talk Roadmap
• Introduction
• Automatic Summarization under MASC
framework
– HMM Hedge, Trimmer, Topiary
– Single Document, Multi-document
– Experimental evidence supporting hypotheses
• Review of Evaluations
• Conclusion
• Future Work
Topiary
• Combines topic terms and fluent text
– Fluent text comes from Trimmer
– Topics come from Unsupervised Topic Detection
(UTD)
• Single-candidate algorithm
– Lower the Trimmer length threshold to make room for the highest-scoring non-redundant topic term
– Trim to the lower threshold
– Adjust if topic redundancy changes because of trimming
Topiary Single-Candidate
[Architecture diagram: Document → Entity Tagger → Entity Tags → Parser → Parses → Topiary (with Topics from Unsupervised Topic Detection) → Summary]
Topiary, Trimmer, UTD
[Figure: Rouge-1 through Rouge-4 recall for First 75 chars, Topiary, Trimmer 2003, Trimmer 2004, and UTD; Topiary, combining Trimmer text with UTD topics, scores highest.]
Topiary
• Multi-candidate Algorithm
– Generate Multi-candidate Trimmer candidates
– Fill space in all Trimmer candidates with all
combinations of non-redundant topics
– Score and select summary
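A sketch of the multi-candidate combination step, assuming a 75-character limit and the "topics: text" format seen in the earlier Hernández example; the function is illustrative:

```python
from itertools import combinations

def topiary_candidates(trimmer_cands, topics, limit=75):
    """Pair each Trimmer compression with every combination of topic
    terms not already present in it, keeping versions under the limit."""
    out = []
    for text in trimmer_cands:
        fresh = [t for t in topics if t.lower() not in text.lower()]
        for r in range(len(fresh) + 1):
            for combo in combinations(fresh, r):
                cand = ", ".join(combo) + ": " + text if combo else text
                if len(cand) <= limit:
                    out.append(cand)
    return out

print(topiary_candidates(
    ["Newspaper editor found dead in Pacific resort city"],
    ["Hernández", "Zihuatanejo"]))
```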
Topiary Multi-Candidate
[Architecture diagram: Document → Entity Tagger → Entity Tags → Parser → Parses → Topiary (Topics from Unsupervised Topic Detection) → Candidates + Trimmer Features → URA (URA Index, optional Query) → Candidates + Trimmer and URA Features → Selection (Feature Weights) → Summary]
DUC 2004 Task 1 Results (Rouge)
[Figure: DUC 2004 Task 1 Rouge-1, Rouge-L, Rouge-W-1.2, Rouge-2, Rouge-3, and Rouge-4 scores for the automatic systems (by system ID, including TOPIARY), the baseline, and human references A–H; Topiary was the top-scoring automatic system, below the human references.]
Topiary Evaluation
Rouge Metric   | Topiary | Multi-Candidate Topiary
Rouge-1 Recall | 0.25027 | 0.26490
Rouge-2 Recall | 0.06484 | 0.08168*
Rouge-3 Recall | 0.02130 | 0.02805
Rouge-4 Recall | 0.00717 | 0.01105
Rouge-L        | 0.20063 | 0.22283*
Rouge-W-1.2    | 0.11951 | 0.13234*
(* statistically significant improvement)
Talk Roadmap
• Introduction
• Automatic Summarization
– HMM Hedge, Trimmer, Topiary
• Single-candidate, MASC versions
– Multi-document Summarization
• HMM Hedge, Trimmer
• Evaluation
• Conclusion
• Future Work
Evaluation: Review
• HMM Hedge, Single-document. Rouge-1 recall
increases as number of candidates increases
• HMM Hedge, Single-document. Rouge-1 roughly doubles when candidates are scored with optimized feature weights
• Trimmer, Single-document. Rouge-1 increases with
greater use of multi-candidate rules
• Trimmer, Single-document. Rouge-1 increases with
larger set of features
• Topiary, Single-document. Multi-candidate Topiary
scores significantly higher on some Rouge metrics than
single-candidate Topiary.
• Trimmer scored higher than HMM for Multi-document
summarization
Evaluation
• Human extrinsic evaluation of HMM Hedge, Trimmer, Topiary, and the First-75-characters baseline
• LDC agreement task: ~20x increase in speed, with some loss of accuracy
• Relevance Prediction: the First-75-characters baseline is hard to beat
Talk Roadmap
• Introduction
• Automatic Summarization
– HMM Hedge, Trimmer, Topiary
– Multiple Alternative Sentence Compressions
(MASC)
• Evaluation
• Conclusion
• Future Work
Conclusion
• MASC improves performance across summarization tasks and compression sources
• Fluent and informative summaries can be
constructed by selecting words in order
from sentences
• Headlines combining fluent text and topic
terms score better than either alone
Future Work
• Enhance redundancy score with
paraphrase detection
• Anaphora resolution in candidates
• Expand candidates by sentence merging
• Sentence ordering in multi-sentence
summaries
End