Dissertation Defense Multiple Alternative Sentence Compressions

University of Maryland College Park
Department of Computer Science
Dissertation Defense
Multiple Alternative Sentence
Compressions as a Tool for
Automatic Summarization Tasks
David Zajic
November 28, 2006
Talk Roadmap
• Problem Description
• Contributions and Hypotheses
• Automatic Summarization under MASC
framework
– Single Document Summarization and Evaluation
• HMM Hedge, Trimmer, Topiary
– Extension to Multi-document Summarization
• Collaboration and Novel Genres
• Summary and Conclusion
• Future Work
Problem Description
• Automatic Summarization
– Distillation of important information from a
source into an abridged form
Problem Description
Abstractive vs. Extractive
• Abstractive Summarization: new text
generated by the summarizer
• Extractive Summarization: select
sentences with important content from the
document
• Challenges of Extractive Summarization
– Sentences contain a mixture of relevant and
non-relevant information
– Sentences are partially redundant with the rest
of the summary
Problem Description
Summarization Tasks
• Single Document Summarization
– Very short: Headline Generation
– Single sentence
– 75 characters
• Query-focused Multi-Document
Summarization
– Multiple sentences
– 100 – 250 words
Problem Description
Headline Generation
• Newspaper Headlines
– Natural example of human summarization
– Three criteria for a good headline:
• Summarize a story
• Make people want to read it
• Fit in specified space
– Headlinese: compressed form of English
Problem Description
Headline Types
• Eye-Catcher
– Under God Under Fire
• Indicative
– Pledge of Allegiance
• Informative
– U.S. Court Decides Pledge of Allegiance
Unconstitutional
Problem Description
Sentence Compression
• Selecting words in order from a sentence
– Or window of words
• Feasibility Studies
– Humans can almost always do this for written
news
– Bias for words from within a single sentence
– Bias for words early in document
Problem Description
Sentence Compression
• Single-candidate
– Generates single compression of sentence
• Multi-candidate
– Generates multiple compressions of sentence
– Schizophrenia patients whose medication couldn't stop the imaginary
voices in their heads gained some relief after researchers repeatedly
sent a magnetic field into a small area of their brains.
– Schizophrenia patients gained some relief after researchers repeatedly
sent magnetic field into small area of brains.
– Schizophrenia patients gained some relief.
– researchers repeatedly sent magnetic field into small area of brains
Problem Description
Potential of Compression
• Sentence compression can reduce size of
sentences while preserving relevance.
• A subject was shown 103 sentences
relevant to 39 queries and asked to make
relevance judgments on 430 compressed
versions
• Potential for 16.7% reduction by word
count, 17.6% reduction by characters, with
no loss of relevance
Problem Description
Single Document
• A newspaper editor was found dead in a
hotel room in this Pacific resort city a day
after his paper ran articles about
organized crime and corruption in the city
government. The body of the editor, Misael
Tamayo Hernández, of the daily El
Despertar de la Costa, was found early
Friday with his hands tied behind his back
in a room at the Venus Motel…
Problem Description
Single Document
• A newspaper editor was found dead in a
hotel room in this Pacific resort city a day
after his paper ran articles about
organized crime and corruption in the city
government.
• Newspaper editor found dead in Pacific
resort city
Problem Description
Single Document
• A newspaper editor was found dead in a
hotel room in this Pacific resort city a day
after his paper ran articles about
organized crime and corruption in the city
government.
• Paper ran articles about corruption in
government
Problem Description
Single Document
• A newspaper editor was found dead in a
hotel room in this Pacific resort city a day
after his paper ran articles about
organized crime and corruption in the city
government.
• Hernández, Zihuatanjo: Newspaper editor
found dead in Pacific resort city
Problem Description
Single Document
• A newspaper editor was found dead in a
hotel room in this Pacific resort city a day
after his paper ran articles about
organized crime and corruption in the city
government.
• Newspaper Editor Killed in Mexico
– (A) Newspaper Editor (was) killed in Mexico
Problem Description
Multi-Document
• Gunmen killed a prominent Shiite politician
• The killing of a prominent Shiite politician
by gunmen appeared to have been a
sectarian assassination.
• The killing appeared to have been a
sectarian assassination.
Automatic Evaluation of
Summarization
• Recall-Oriented Understudy for Gisting
Evaluation (Rouge) (Lin 2004)
– Rouge Recall: ratio of matching candidate
n-gram count to reference n-gram count
• Rouge parameters for high correlation with
human judgments (Lin 2004)
– Rouge with unigrams (R1) for headline
generation
– Rouge with bigrams (R2) for multi-sentence
summaries (single- and multi-document)
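The Rouge recall ratio described above can be sketched as a minimal single-reference function (the official Rouge toolkit additionally handles stemming, stopword removal, and multiple references):

```python
from collections import Counter

def rouge_n_recall(candidate, reference, n=1):
    """Ratio of matching candidate n-gram count to reference n-gram count.
    Simplified single-reference version of Rouge-N recall."""
    def ngrams(words):
        return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    cand = ngrams(candidate.lower().split())
    ref = ngrams(reference.lower().split())
    # clipped match count: a candidate n-gram matches at most as often
    # as it appears in the reference
    matches = sum(min(cand[g], ref[g]) for g in ref)
    return matches / sum(ref.values())

# 2 of the 5 reference unigrams appear in the candidate
print(rouge_n_recall("newspaper editor found dead",
                     "newspaper editor killed in mexico"))  # 0.4
```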
Talk Roadmap
• Problem Description
• Contributions and Hypotheses
• Automatic Summarization under MASC
framework
– Single Document Summarization and Evaluation
• HMM Hedge, Trimmer, Topiary
– Extension to Multi-document Summarization
• Collaboration and Novel Genres
• Summary and Conclusion
• Future Work
Contributions
• Multiple Alternative Sentence
Compressions Summarization Framework
• Sentence compression methodologies
– HMM Hedge
– Trimmer
• Topiary Headline Generation
Contributions
MASC
• Framework for Automatic Text
Summarization
• Generation of many compressions of
source sentences to serve as candidates
• Select from candidates using weighted
features to generate summary
• Environment for testing hypotheses
Contributions
Sentence Compression
• Underlying technique: select words in
order, with morphological variation
• HMM Hedge (Headline Generation).
Statistical method, uses language models
to mimic Headlinese
• Trimmer. Syntactic method, uses syntactic
parse-and-trim rules to mimic Headlinese
Contributions
Topiary
• Headline Generation technique
– Combines fluent text with topic terms
– Highest scoring system in Document
Understanding Conference 2004 (DUC2004)
• Most recent evaluation of headline generation
Hypotheses
1. Extractive summarization systems can
achieve higher Rouge scores by using
larger pools of candidates.
2. Extractive summarization systems can
achieve higher Rouge scores by giving
the sentence selector appropriate sets of
features.
3. For Headline Generation, a combination of
fluent text and topic terms is better than
either alone.
Talk Roadmap
• Problem Description
• Contributions and Hypotheses
• Automatic Summarization under MASC
framework
– Single Document Summarization and Evaluation
• HMM Hedge, Trimmer, Topiary
– Extending to Multi-document Summarization
• Review of Evaluations
• Collaboration and Novel Genres
• Summary and Conclusion
• Future Work
General Architecture
[Diagram: Document → Sentence Selection (Lead Sentence or First N Sentences) → Sentences → Sentence Compression (HMM Hedge, Trimmer, Topiary) → Candidates → Candidate Selection (Feature Weights, Maximal Marginal Relevance) → Summary]
Sentence Selection
• Select sentences to be compressed
• Single document: Lead sentence
– First non-trivial sentence of document
• Multi-document: First 5 sentences of each
document
HMM Hedge Architecture
[Diagram: Sentence → Part-of-Speech Tagger → Verb Tags → HMM Hedge, which draws on a Headline Language Model and a General Language Model → Candidates]
HMM Hedge
Noisy Channel Model
• Underlying method: select words in order
• Sentences are observed data
• Headlines are unobserved data
• Noisy channel adds words to headlines to create
sentences (Knight and Marcu 2000, 2002,
Banko et al 2000, Turner and Charniak 2005)
• Headline: President signed legislation
• Story: On Tuesday the President signed the
controversial legislation at a private ceremony
HMM Hedge
HMM for Generating Stories from Headlines
• H states emit headline words
• G states emit non-headline words
• A path through the HMM corresponds to a
headline
HMM Hedge
argmax_H P(H|S) = argmax_H P(S|H) P(H) / P(S)
P(H) = P(h_1|start) P(h_2|h_1) ... P(end|h_m)
P(S|H) = P(g_1) P(g_2) ... P(g_n)
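The noisy-channel score can be sketched with toy language models. All probabilities below are invented for illustration; the real system uses a bigram headline model and a unigram general model trained on corpora:

```python
import math

# Toy models: bigram probabilities for headline words, unigram
# probabilities for story (non-headline) words.
HEADLINE_BIGRAM = {
    ("<s>", "president"): 0.20, ("president", "signed"): 0.30,
    ("signed", "legislation"): 0.25, ("legislation", "</s>"): 0.40,
    ("<s>", "the"): 0.05, ("the", "president"): 0.10,
}
GENERAL_UNIGRAM = {
    "on": 0.02, "tuesday": 0.005, "the": 0.06, "controversial": 0.001,
    "at": 0.02, "a": 0.05, "private": 0.002, "ceremony": 0.001,
}

def score(sentence, headline):
    """log P(S|H) + log P(H): headline words under the headline bigram
    model, remaining (non-headline) words under the general unigram model."""
    logp, prev = 0.0, "<s>"
    for h in headline:                                   # P(H)
        logp += math.log(HEADLINE_BIGRAM.get((prev, h), 1e-6))
        prev = h
    logp += math.log(HEADLINE_BIGRAM.get((prev, "</s>"), 1e-6))
    remaining = list(sentence)
    for h in headline:                                   # headline words kept in order
        remaining.remove(h)
    for g in remaining:                                  # P(S|H)
        logp += math.log(GENERAL_UNIGRAM.get(g, 1e-6))
    return logp

S = ("on tuesday the president signed the controversial "
     "legislation at a private ceremony").split()
print(score(S, ["president", "signed", "legislation"]) >
      score(S, ["the", "president", "signed", "legislation"]))   # True
```

Under these toy models the terser, more Headlinese-like compression scores higher, which is the effect the model parameters aim for.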
HMM Hedge
• Model parameters used to mimic
Headlinese
– Penalty for clumps of contiguous words
– Penalty for large gaps between clumps
– Word position penalty
• Viterbi decoder constraints
– Require a verb
– Constrain word length
HMM Hedge
Multi-candidate Compression
• Calculate best compression at each word
length from 5 to 15 words
• Calculate 5-best compressions at each length
• 5 × 11 = 55 compressions per sentence
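The candidate count follows from a loop over target lengths; the sketch below assumes a hypothetical decode(sentence, length, k) that returns the k-best compressions at one word length (in the real system this is a constrained Viterbi decode):

```python
def all_candidates(sentence, decode, lengths=range(5, 16), k=5):
    """k-best compressions at each target word length, pooled together."""
    return [c for n in lengths for c in decode(sentence, n, k)]

# stand-in decoder, just to show the arithmetic: 5-best x 11 lengths = 55
dummy_decode = lambda s, n, k: [f"cand-{n}-{i}" for i in range(k)]
print(len(all_candidates("...", dummy_decode)))   # 55
```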
HMM Hedge
Examples
• Schizophrenia patients whose medication couldn't stop
the imaginary voices in their heads gained some relief
after researchers repeatedly sent a magnetic field into a
small area of their brains.
• Schizophrenia couldn't stop their heads (5)
• Schizophrenia couldn't stop in their heads some relief
after researchers (10)
• Schizophrenia patients whose medication couldn't stop
the voices in their heads some relief after researchers
(15)
HMM Hedge
Candidate Selection
• Linear combination of features
– Bigram probability of headline, P(H)
– Unigram probability of non-headline words,
P(S|H)
– Number of clumps
– Length of headline in words, chars
HMM Hedge
Sentence Selection
[Chart: Rouge-1 scores (0 to 0.3) by selected sentence, Sent 1 through Sent 5]
HMM Hedge
Sentence Boundary vs Word Window
[Chart: Rouge-1 (0.15 to 0.27) by window size: full sentences vs. word windows of 10 to 60 words]
HMM Hedge
Multi-Candidate
[Chart: Rouge-1 (0.244 to 0.256) for 1-best through 5-best candidates at each length]
Trimmer Architecture
[Diagram: Sentence → Parser → Parses; Sentence → Entity Tagger → Entity Tags; Parses and Entity Tags → Trimmer → Candidates]
Trimmer
• Underlying method: select words in order
• Parse and Trim
• Rules come from study of Headlinese
– Some syntactic structures are far less common in
Headlines than in Story sentences.
Phenomenon         Headlines   Lead Sentences
Preposed adjunct   0%          2.7%
Conjoined VP       3%          27%
Trimmer: Mask Operation
[Figure: illustration of the mask and mask-outside operations on a parse tree]
Trimmer Algorithm
• Find all instances of applicable Trimmer
rules
• Apply rule application instances one at a
time until the desired length is reached
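A toy illustration of parse-and-trim, assuming a bracketed parse as nested (label, children-or-word) tuples and a simplified preamble rule; the real Trimmer operates on full parser output with a richer rule set:

```python
def words(node):
    """Surface words of a (label, children-or-word) parse node, in order."""
    label, rest = node
    if isinstance(rest, str):
        return [rest]
    return [w for child in rest for w in words(child)]

def drop_preamble(node):
    """Toy preamble rule: remove a constituent preceding the first NP under S."""
    label, children = node
    if label == "S" and children and children[0][0] != "NP":
        return (label, children[1:])
    return node

tree = ("S", [("PP", [("IN", "after"), ("NP", [("NN", "talks")])]),
              ("NP", [("NNS", "rebels")]),
              ("VP", [("VBP", "agree"), ("NN", "ceasefire")])])
print(" ".join(words(drop_preamble(tree))))   # rebels agree ceasefire
```

Each such rule application leaves a grammatical tree, so its surface string can itself be emitted as a candidate.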
Trimmer Rule: Root S
• Select the lowest leftmost S which has NP
and VP children, in that order.
[S [S [NPRebels] [VP agree to talks with
government]] officials said Tuesday.]
Trimmer Rule: Preposed Adjunct
(Preamble Rule)
• Remove [YP …] preceding first NP inside
chosen S
[S [PP According to a now-finalized
blueprint described by U.S. officials and
other sources] [NP the Bush
administration] [VP plans to take complete,
unilateral control of a post-Saddam
Hussein Iraq]]
Trimmer Rule: Conjunction
• From [X] [CC] [X], remove one conjunct and
the conjunction, keeping the other conjunct
[S Illegal fireworks [VP [VP injured hundreds of
people] [CC and] [VP started six fires.]]]
[S A company offering blood cholesterol tests in
grocery stores says [S [S medical technology
has outpaced state laws,] [CC but] [S the
state says the company doesn’t have the
proper licenses.]]]
Multi-candidate Trimmer
• After each rule application, current state of
parse is a candidate
• Multi-candidate Trimmer rules
– Root S
– Preamble
– Conjunction
Multi-candidate Trimmer Rule:
Root-S
• Multi-candidate Root S
• [S1 [S2 The latest flood crest, the eighth this summer,
passed Chongqing in southwest China], and [S3 waters
were rising in Yichang, in central China’s Hubei province,
on the middle reaches of the Yangtze], state television
reported Sunday.]
• Single-candidate version would choose only S2. Multi-candidate Root-S generates all three choices.
Multi-candidate Trimmer Rule:
Preposed Adjunct (Preamble Rule)
[Figure: parse-tree illustration of the preamble rule generating a candidate at each step]
Multi-candidate Trimmer Rule:
Conjunction
• [S Illegal fireworks [VP [VP injured
hundreds of people] [CC and] [VP started
six fires.]]]
• Illegal fireworks injured hundreds of
people
• Illegal fireworks started six fires
Trimmer Candidate Selection
• Baseline LUL: select longest version under
limit
• Linear Combination of Features
– L: Length in characters or words
– R: Counts of rule applications
– C: Centrality
Trimmer: Multi-candidate Rule
[Chart: number of candidates generated (0 to 14,000) for each multi-candidate rule combination: None, R, P, C, R,P, R,C, P,C, R,P,C]
Trimmer: Candidate Counts
[Chart: Rouge-1 (0.244 to 0.264) vs. candidate count (5,000 to 14,000)]
Trimmer: Candidate Rules
[Chart: Rouge-1 (0.15 to 0.27) by candidate selection feature set (LUL, L, R, C, LR, LC, RC, LRC), one series per multi-candidate rule combination (No Multi-Candidate through R,P,C)]
Trimmer: Candidate Selection Features
[Chart: Rouge-1 (0.15 to 0.27) by multi-candidate rule combination (None through R,P,C), one series per candidate selection feature set (LUL through LRC)]
Topiary
• Combines topic terms and fluent text
– Fluent text comes from Trimmer
– Topics come from Unsupervised Topic
Detection (UTD)
– High-scoring System at DUC2004
• Most recent large-scale evaluation of Headline
Generation systems
Topiary Architecture
[Diagram: Sentence → Parser → Parses; Sentence → Entity Tagger → Entity Tags; Document → Topic Assignment → Topic Terms; all feed Topiary → Candidates]
Topiary Single-candidate Algorithm
1. Adjust length threshold to make room for
highest scoring non-redundant topic term
• "Osama", 75 char → 69 char
2. While fluent text above adjusted length
threshold
1. Apply a Trimmer rule
2. Adjust length threshold to make room for
highest scoring non-redundant topic term
3. Fill remaining space with topic terms
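The single-candidate algorithm above can be sketched as follows. Trimming is faked here by dropping the last word (a stand-in for a real Trimmer rule application), and the topic list is illustrative:

```python
def topiary(fluent_words, topics, limit):
    """Toy Topiary sketch. topics: (term, score) pairs, best first."""
    def next_topic(text):
        # highest-scoring topic term not already in the text
        for term, _ in topics:
            if term.lower() not in text.lower():
                return term
        return None

    def threshold(text):
        # leave room for the next topic term (plus a space), if any
        t = next_topic(text)
        return limit - (len(t) + 1) if t else limit

    text = " ".join(fluent_words)
    while len(text) > threshold(text) and fluent_words:
        fluent_words = fluent_words[:-1]   # stand-in for a Trimmer rule
        text = " ".join(fluent_words)

    while True:                            # fill remaining space with topics
        t = next_topic(text)
        if t is None or len(t) + 1 + len(text) > limit:
            break
        text = t + " " + text
    return text

result = topiary(
    "schizophrenia patients gained some relief after treatment".split(),
    [("STUDY", 1.168), ("BRAIN", 0.315)], limit=50)
print(result)   # STUDY schizophrenia patients gained some relief
```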
Topiary: DUC2004
[Chart: Rouge-1 scores (0 to 0.35) of DUC2004 submitted systems and peers; the human peers score highest, with Topiary the top-scoring automatic system, above the baseline]
DUC2004: First 75, Topiary,
Trimmer, UTD
[Chart: Rouge-1 through Rouge-4 scores (0 to 0.3) for First 75 chars, Topiary, Trimmer, and UTD]
Topiary: Multi-candidate Algorithm
• Generate Multi-candidate Trimmer
candidates
• Generate Topic Terms
• Combine each Trimmer candidate with
each Topic Term. Fill empty space with all
combinations of Topic Terms
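The pairing step can be sketched with itertools over toy candidates and an illustrative limit; the real system also scores candidates and skips redundant topic terms:

```python
from itertools import combinations

def topiary_candidates(trimmer_cands, topics, limit):
    """Pair every Trimmer candidate with every topic-term prefix that fits."""
    out = []
    for text in trimmer_cands:
        for r in range(len(topics) + 1):          # 0, 1, ... topic terms
            for combo in combinations(topics, r):
                cand = " ".join(list(combo) + [text])
                if len(cand) <= limit:
                    out.append(cand)
    return out

cands = topiary_candidates(
    ["patients gained relief", "patients gained some relief after study"],
    ["STUDY", "BRAIN"], limit=40)
print(len(cands))   # 5: four fit for the short candidate, one for the long
```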
Topiary: Multi-candidate Examples
• Document APW19990519.0113 Topics
– STUDY (1.168)
– SCHIZOPHRENIA (0.315)
– BRAIN (0.144)
– DOCTORS (0.074)
• Schizophrenia patients whose medication couldn't stop
the imaginary voices
• STUDY Schizophrenia patients whose medication
couldn't stop the imaginary v
• DOCTORS STUDY BRAIN Schizophrenia patients
gained some relief.
Topiary Candidate Selection
• Linear Combination of Features
– L: Length in characters or words
– R: Counts of rule applications
– C: Centrality
– T: Topic counts, sum of scores
Topiary DUC2004 Evaluation
[Chart: Rouge-1 through Rouge-4 scores (0 to 0.3) for First 75 chars, Trimmer, UTD, Topiary, and Multi-Candidate Topiary]
Talk Roadmap
• Problem Description
• Contributions and Hypotheses
• Automatic Summarization under MASC
framework
– Single Document Summarization and Evaluation
• HMM Hedge, Trimmer, Topiary
– Extension to Multi-document Summarization
• Collaboration and Novel Genres
• Summary and Conclusion
• Future Work
General Architecture
[Diagram: Document → Sentence Selection (Lead Sentence or First N Sentences) → Sentences → Sentence Compression (HMM Hedge, Trimmer, Topiary) → Candidates → Candidate Selection (Feature Weights, Maximal Marginal Relevance) → Summary]
Multi-Document Summarization
Candidate Selection
• Maximal Marginal Relevance (MMR)
(Carbonell and Goldstein, 1998)
1. Score all candidates with linear
combination of static and dynamic
features
2. While summary not full
1. Add highest-scoring candidate to summary
2. Remove other compressions of its source
sentence from the pool
3. Recalculate dynamic features, Rescore
candidates
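The MMR loop above can be sketched as follows; the scoring function and feature values are illustrative stand-ins for the trained combination of static and dynamic features:

```python
def mmr_select(candidates, score, word_budget):
    """candidates: dicts with 'words' and 'source' (source sentence id).
    score(cand, summary) mixes static features with dynamic redundancy."""
    summary, used = [], 0
    pool = list(candidates)
    while pool:
        best = max(pool, key=lambda c: score(c, summary))
        if used + len(best["words"]) > word_budget:
            pool.remove(best)                 # doesn't fit; try others
            continue
        summary.append(best)
        used += len(best["words"])
        # remove other compressions of the same source sentence,
        # then rescore on the next iteration (dynamic features)
        pool = [c for c in pool if c["source"] != best["source"]]
    return summary

def score(c, summary):
    # toy: relevance minus word overlap with the growing summary
    summary_words = {w for s in summary for w in s["words"]}
    return c["relevance"] - 0.5 * len(set(c["words"]) & summary_words)

cands = [
    {"source": 1, "words": ["gunmen", "killed", "politician"], "relevance": 3.0},
    {"source": 1, "words": ["gunmen", "killed", "prominent", "shiite",
                            "politician"], "relevance": 3.5},
    {"source": 2, "words": ["killing", "appeared", "sectarian"], "relevance": 2.0},
]
chosen = mmr_select(cands, score, word_budget=10)
print([c["source"] for c in chosen])   # [1, 2]
```

Note that picking the 5-word compression of sentence 1 removes its 3-word sibling from the pool, so each source sentence contributes at most one candidate.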
Multi-Document Summarization
Candidate Selection Features
• Static Features
– Sentence Position
– Relevance
– Centrality
– Compression-specific features
• Dynamic Features
– Redundancy
– Count of summary candidates from source
document
Relevance and Centrality
• Candidate Query Relevance: Matching score
between candidate and query
• Document Query Relevance: Lucene similarity
score between document and query
• Candidate Centrality: Average Lucene similarity
of candidate to other sentences in document
• Document Centrality: Average Lucene similarity
of document to other documents in cluster
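A toy version of candidate centrality, using cosine similarity over word counts as a stand-in for the Lucene similarity score:

```python
from collections import Counter
import math

def cosine(a, b):
    """Cosine similarity of two token lists via word-count vectors."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def centrality(candidate, doc_sentences):
    """Average similarity of the candidate to the other document sentences."""
    others = [s for s in doc_sentences if s != candidate]
    return sum(cosine(candidate, s) for s in others) / len(others)

doc = [["editor", "found", "dead"],
       ["paper", "ran", "articles"],
       ["editor", "wrote", "articles"]]
print(round(centrality(["editor", "found", "dead"], doc), 3))   # 0.167
```

Document centrality follows the same pattern, averaged over the other documents in the cluster.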
Redundancy: Weighted Word
Overlap
P(w|S) = count(w, S) / size(S)
P(w|G) = count(w, G) / size(G)
redundancy(C, S) = Σ_{w∈C} [ λ P(w|S) + (1−λ) P(w|G) ]
• λ estimates ratio of topic-specific words in
summary to size of summary.
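The redundancy score can be computed directly from the formula; the counts and λ = 0.3 below are toy values for illustration:

```python
def redundancy(candidate, summary_words, general_counts, general_size, lam=0.3):
    """Weighted word-overlap redundancy of a candidate against the summary,
    mixing the summary model P(w|S) with a general model P(w|G)."""
    s_size = max(len(summary_words), 1)
    total = 0.0
    for w in candidate:
        p_s = summary_words.count(w) / s_size
        p_g = general_counts.get(w, 0) / general_size
        total += lam * p_s + (1 - lam) * p_g
    return total

summary = ["earthquake", "struck", "the", "coast", "richter", "scale"]
general = {"the": 500, "a": 300, "dog": 5}
same_topic = ["earthquake", "hit", "richter"]
new_topic = ["election", "results", "announced"]
# a candidate repeating the summary's topic words scores as more redundant
print(redundancy(same_topic, summary, general, 10000) >
      redundancy(new_topic, summary, general, 10000))   # True
```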
DUC 2006 Evaluation
[Chart: Rouge-1 and Rouge-2 recall (0 to 0.45) for the Trimmer and HMM multi-document systems]
Talk Roadmap
• Problem Description
• Contributions and Hypotheses
• Automatic Summarization under MASC
framework
– Single Document Summarization and Evaluation
• HMM Hedge, Trimmer, Topiary
– Extension to Multi-document Summarization
• Collaboration and Novel Genres
• Summary and Conclusion
• Future Work
Collaboration under MASC
• University of Maryland and the Institute for
Defense Analyses Center for Computing
Sciences (IDA/CCS) collaborated on a
submission to DUC2006
• Sentence compression: Trimmer, CCS
shallow compression
• Sentence selection features: Relevance,
Centrality, Omega Estimated Oracle Score
Collaboration under MASC
• University of Maryland, Carnegie Mellon
University and IBM collaborated on GALE
Rosetta Team Distillation Evaluation
• Trimmer was a component of the system
for generating snippets
Broadcast News
• 560 broadcast news transcripts
– ABC, CNN, NBC, PRI, VOA
– January 1998 – June 1998, October 2000 –
December 2000
– Transcribed by BBN’s BYBLOS Large
Vocabulary Continuous Speech Recognition
(Colthurst et al. 2000)
• Baseline (first 75), Multi-candidate
Trimmer, Multi-candidate Topiary
Broadcast News Evaluation
[Chart: Rouge-1 (0.236 to 0.256) for Baseline, Trimmer, and Topiary on broadcast news]
Broadcast News
Trimmer Feature Sets
[Chart: Rouge-1 (0 to 0.3) by candidate selection feature set (LUL, L, R, C, LR, LC, RC, LRC), comparing First-1 and First-5 sentence selection]
Talk Roadmap
• Problem Description
• Contributions and Hypotheses
• Automatic Summarization under MASC
framework
– Single Document Summarization and Evaluation
• HMM Hedge, Trimmer, Topiary
– Extension to Multi-document Summarization
• Collaboration and Novel Genres
• Summary and Conclusion
• Future Work
Hypothesis No. 1
• Extractive Summarization Systems can
achieve higher Rouge scores by using
larger pools of candidates.
– HMM Hedge, Single-document. Rouge-1 recall
increases as number of candidates increases
– Trimmer, Single-document. Rouge-1 increases with
greater use of multi-candidate rules
– Topiary, Single-document. Multi-candidate
Topiary scores significantly higher for Rouge 2
than single-candidate Topiary.
Hypothesis No. 2
• Extractive summarization systems can
achieve higher Rouge scores by giving the
sentence selector appropriate sets of
features.
– Trimmer, Single-document. Rouge-1
increases with larger set of features
Hypothesis No. 3
• For Headline Generation, a combination of
fluent text and topic terms is better than
either alone
– Topiary scored significantly higher than
Trimmer and UTD
Contributions
• Multiple Alternative Sentence
Compressions Summarization Framework
• Sentence compression methodologies
– HMM Hedge
– Trimmer
• Topiary Headline Generation
Contributions
• Demonstration of MASC framework
performance across summarization tasks
and compression sources
• Fluent and informative summaries can be
constructed by selecting words in order
from sentences, as verified by human
feasibility studies
• Headlines combining fluent text and topic
terms score better than either alone
Future Work
• Enhance redundancy score with paraphrase
detection (Ibrahim et al 2003, Shen et al 2006)
• Anaphora resolution in candidates (LingPipe
tools)
• Expand candidate pool by sentence merging
(Jing & McKeown 2000)
• Sentence ordering in multi-sentence summaries
(Radev 1999, Barzilay 2002, Lapata 2003,
Okazaki 2004, Conroy et al 2006)
End
Feasibility Study
• Three subjects
• 56 AP newswire stories from TIPSTER
corpus
• Construct headlines by selecting words in
order from the stories
• Task could be done for 53 prose stories,
not for 3 non-prose stories
• Only 7 headlines drew words from beyond
the 60th word of the story
Feasibility Study
• Two subjects
• 73 AP newswire stories from TIPSTER
corpus
• Construct headlines by selecting words in
order from the stories, allowing
morphological variation
HMM Hedge
[Chart: Rouge-1 recall and precision (0.1 to 0.28) for 1-, 2-, and 3-sentence selection, as N-best at each length varies from 1 to 5]
HMM Hedge
• Linear combination scoring function features,
with weights tuned by 5-fold cross-validation;
shown as (default weight, optimized weight for fold A)
– Word position sum (-0.05, 1.72)
– Small gaps (-0.01, 1.02)
– Large gaps (-0.05, 3.70)
– Clumps (-0.05, -0.17)
– Sentence position (0, -945)
– Length in words (1, 42)
– Length in characters (1, 85)
– Unigram probability of story words (1, 1.03)
– Bigram probability of headline words (1, 1.51)
– Emit probability of headline words (1, 3.60)
HMM Hedge
Multi-Document Summarization
[Diagram: Document and optional Query → Part-of-Speech Tagger → Verb Tags → HMM Hedge (Headline and General Language Models) → Candidates with HMM Features → URA (with URA Index) → Candidates with HMM and URA Features → Selection (Feature Weights) → Summary]
Topiary Evaluation
Rouge Metric     Topiary   Multi-Candidate Topiary
Rouge-1 Recall   0.25027   0.26490
Rouge-2 Recall   0.06484   0.08168*
Rouge-3 Recall   0.02130   0.02805
Rouge-4 Recall   0.00717   0.01105
Rouge-L          0.20063   0.22283*
Rouge-W1.2       0.11951   0.13234*
Redundancy: Intuition
• Consider a summary about earthquakes
• “Generated” by topic: Earthquake, seismic,
Richter scale
• “Generated” by general language: Dog,
under, during
• Sentences with many words “generated”
by the topic are redundant
Trimmer
System            R1 Recall   R1 Prec.   R2 Recall   R2 Prec.
Trimmer MultiDoc  0.38198     0.37617    0.08051     0.07922
HMM MultiDoc      0.37404     0.37405    0.07884     0.07887
Evaluation
Human extrinsic evaluation of HMM,
Trimmer, Topiary and First 75
LDC agreement: ~20x increase in speed.
Some loss of accuracy.
Relevance Prediction
Baseline (first 75 characters) is hard to beat