University of Maryland, College Park
Department of Computer Science
Dissertation Defense

Multiple Alternative Sentence Compressions as a Tool for Automatic Summarization Tasks
David Zajic
November 28, 2006

Intuition
• A newspaper editor was found dead in a hotel room in this Pacific resort city a day after his paper ran articles about organized crime and corruption in the city government. The body of the editor, Misael Tamayo Hernández, of the daily El Despertar de la Costa, was found early Friday with his hands tied behind his back in a room at the Venus Motel…

Intuition
• A newspaper editor was found dead in a hotel room in this Pacific resort city a day after his paper ran articles about organized crime and corruption in the city government.
• Newspaper editor found dead in Pacific resort city

Intuition
• A newspaper editor was found dead in a hotel room in this Pacific resort city a day after his paper ran articles about organized crime and corruption in the city government.
• Paper ran articles about corruption in government

Intuition
• A newspaper editor was found dead in a hotel room in this Pacific resort city a day after his paper ran articles about organized crime and corruption in the city government.
• Hernández, Zihuatanejo: Newspaper editor found dead in Pacific resort city

Intuition
• A newspaper editor was found dead in a hotel room in this Pacific resort city a day after his paper ran articles about organized crime and corruption in the city government.
• Newspaper Editor Killed in Mexico
  – (A) Newspaper Editor (was) killed in Mexico

Talk Roadmap
• Introduction
• Automatic Summarization under MASC framework
  – HMM Hedge, Trimmer, Topiary
  – Single-document, Multi-document
  – Experimental evidence supporting hypotheses
• Review of Evaluations
• Conclusion
• Future Work

Introduction
• Automatic Summarization
  – Distillation of important information from a source into an abridged form
  – Extractive summarization: select sentences with important content from the document
  – Limitations
    • Sentences contain a mixture of relevant and non-relevant information
    • Sentences are partially redundant with the rest of the summary

Introduction
• Multiple Alternative Sentence Compressions (MASC)
  – Framework for automatic text summarization
  – Generate many sentence compressions of each source sentence to serve as candidates
  – Select from the candidates using weighted features to generate the summary (a minimal selection sketch follows below)
  – Environment for testing hypotheses

Introduction
• Hypotheses
  – Extractive summarization systems can create better summaries using a larger pool of compressed candidates
  – Sentence selectors choose better summary candidates using larger sets of features
  – For headline generation, the combination of fluent text and topic terms is better than either alone

Introduction
• Sentence Compression
  – HMM Hedge
  – Trimmer
  – Topiary
• Sentence Selection
  – Lead sentence for headline generation
  – Maximal Marginal Relevance for multi-document summarization
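The selection step of the MASC framework can be pictured as weighted-feature ranking over competing compressions. The sketch below is a minimal illustration for a headline-generation setting with a 75-character limit; the candidate texts, feature names, and weights are invented for the example and are not the feature sets or weights used by the actual systems (those are described later in the talk).

```python
def select_headline(candidates, weights, char_limit=75):
    """candidates: (text, features) pairs, several compressions per source
    sentence; return the highest-scoring text that fits the space."""
    def score(features):
        return sum(weights.get(name, 0.0) * value for name, value in features.items())
    fitting = [(text, feats) for text, feats in candidates if len(text) <= char_limit]
    return max(fitting, key=lambda c: score(c[1]))[0] if fitting else ""

# Hypothetical candidates and weights (illustrative values only):
candidates = [
    ("Newspaper editor found dead in Pacific resort city",
     {"length_chars": 50, "rule_applications": 3, "centrality": 0.8}),
    ("Editor found dead",
     {"length_chars": 17, "rule_applications": 6, "centrality": 0.5}),
]
weights = {"length_chars": 0.01, "rule_applications": -0.1, "centrality": 1.0}
print(select_headline(candidates, weights))
# Newspaper editor found dead in Pacific resort city
```

The point of the framework is that the compression modules only propose candidates; the choice among them is deferred to a selector that can weigh length, compression-specific, and relevance features together.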
Summarization Tasks
• Single-Document Summarization
  – Very short: headline generation
  – Single sentence
  – 75 characters
  – DUC 2002, 2003, 2004
• Query-Focused Multi-Document Summarization
  – Multiple sentences
  – 100 – 250 words
  – DUC 2005, 2006

Headline Generation
• Newspaper headlines
  – Natural example of human summarization
  – Three criteria for a good headline:
    • Summarize a story
    • Make people want to read it
    • Fit in a specified space
  – Headlinese: compressed form of English

Introduction
• Headline Types:
  – Eye-Catcher: "Under God Under Fire"
  – Indicative: "Pledge of Allegiance"
  – Informative: "U.S. Court Decides Pledge of Allegiance Unconstitutional"

Talk Roadmap
• Introduction
• Automatic Summarization under MASC framework
  – HMM Hedge, Trimmer, Topiary
  – Single-document, Multi-document
  – Experimental evidence supporting hypotheses
• Review of Evaluations
• Conclusion
• Future Work

General Architecture
[Diagram: Document → Compression (HMM Hedge, Trimmer, Topiary) → Candidates → Selection (Lead Sentence Selection or Maximal Marginal Relevance) → Summary]

Sentence Compression
• Selecting words in order from a sentence
  – Or from a window of words
• Human studies
  – Humans can almost always do this for written news
  – Bias toward words from within a single sentence
  – Bias toward words early in the document

Sentence Compression
• Two implementations of select-words-in-order
  – Statistical method: HMM Hedge (headline generation)
  – Syntactic method: Trimmer

Talk Roadmap
• Introduction
• Automatic Summarization under MASC framework
  – HMM Hedge, Trimmer, Topiary
  – Single-document, Multi-document
  – Experimental evidence supporting hypotheses
• Review of Evaluations
• Conclusion
• Future Work

HMM Hedge Architecture
[Diagram: Document → Part-of-Speech Tagger → Verb Tags → HMM Hedge (Headline Language Model, General Language Model) → Single Compression → Selection → Summary]

HMM Hedge Noisy Channel Model
• Underlying method: select words in order
• Sentences are observed data
• Headlines are unobserved data
• Noisy channel adds words to headlines to create sentences
  – Headline: President signed legislation
  – Sentence: On Tuesday the President signed the controversial legislation at a private ceremony

HMM Hedge Noisy Channel Model
• Probability of a headline estimated with a bigram model of Headlinese
• Probability of the observed sentence given the unobserved headline (the channel model) estimated with a unigram model of general English

HMM Hedge
• Decoding parameters to mimic headlines
  – Groups of contiguous words (clumpiness)
  – Size of gaps between words (gappiness)
  – Sentence position of words
  – Require a verb

HMM Hedge
• Adaptation to multi-candidate compression
• Finds the 5 most likely headlines at each length from 5 to 15 words for the document's sentences
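The core of the noisy channel model can be sketched in a few lines. The function below is a minimal illustration, assuming the headline's words appear in order in the sentence; `headline_bigram` and `general_unigram` are toy dictionaries standing in for the Headlinese bigram model and general-English unigram model, and the gap, clump, and position parameters are omitted.

```python
import math

def channel_score(headline, sentence, headline_bigram, general_unigram, floor=1e-9):
    """log P(H) + log P(S | H) for one candidate headline whose words appear,
    in order, within the sentence."""
    score = 0.0
    # P(H): bigram Headlinese language model over the headline words.
    prev = "<s>"
    for w in headline:
        score += math.log(headline_bigram.get((prev, w), floor))
        prev = w
    # P(S | H): the channel adds the remaining sentence words; estimate each
    # added word with a unigram model of general English.
    keep = iter(headline)
    target = next(keep, None)
    for w in sentence:
        if w == target:
            target = next(keep, None)   # this word survives into the headline
        else:
            score += math.log(general_unigram.get(w, floor))
    return score

# Hypothetical usage with toy models:
headline = "President signed legislation".split()
sentence = ("On Tuesday the President signed the controversial "
            "legislation at a private ceremony").split()
bigram = {("<s>", "President"): 0.1, ("President", "signed"): 0.2,
          ("signed", "legislation"): 0.2}
unigram = {w: 0.01 for w in sentence}
print(channel_score(headline, sentence, bigram, unigram))
```

The actual decoder searches over the in-order word subsets of the sentence, with the decoding parameters above biasing that search, rather than scoring a single fixed candidate as this sketch does.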
Automatic Evaluation
• Recall-Oriented Understudy for Gisting Evaluation (Rouge)
  – Rouge recall: ratio of matching candidate n-gram count to reference n-gram count
  – Rouge precision: ratio of matching candidate n-gram count to the candidate n-gram count times the number of references
  – R1 preferred for single-document summarization
  – R2 preferred for multi-document summarization

HMM Hedge
[Chart: Rouge-1 recall and precision (roughly 0.10 – 0.28) vs. n-best candidates at each length (1 – 5), when compressing the first 1, 2, or 3 sentences of the document]

HMM Hedge
• Features (default weight, optimized weight for fold A)
  – Word position sum (-0.05, 1.72)
  – Small gaps (-0.01, 1.02)
  – Large gaps (-0.05, 3.70)
  – Clumps (-0.05, -0.17)
  – Sentence position (0, -945)
  – Length in words (1, 42)
  – Length in characters (1, 85)
  – Unigram probability of story words (1, 1.03)
  – Bigram probability of headline words (1, 1.51)
  – Emit probability of headline words (1, 3.60)

HMM Hedge

Fold   Default Weights           Optimized Weights
       R1 Recall   R1 Prec.      R1 Recall   R1 Prec.
A      0.11214     0.10726       0.24722     0.21482
B      0.11021     0.10231       0.24307     0.21425
C      0.11781     0.10811       0.24129     0.20795
D      0.11993     0.10660       0.16595     0.13454
E      0.11282     0.10003       0.25341     0.21775
Avg    0.11458     0.10486       0.23019     0.19786

Talk Roadmap
• Introduction
• Automatic Summarization under MASC framework
  – HMM Hedge, Trimmer, Topiary
  – Single-document, Multi-document
  – Experimental evidence supporting hypotheses
• Review of Evaluations
• Conclusion
• Future Work

Multi-Document Summarization
• A human study showed the potential for saving space by using sentence compression in multi-document summarization
• A subject was shown 103 sentences relevant to 39 queries and asked to make relevance judgments on 430 compressed versions
• Potential for a 16.7% reduction by word count and a 17.6% reduction by characters, with no loss of relevance

HMM Hedge Multi-Document Summarization
[Diagram: Document → Part-of-Speech Tagger → Verb Tags → HMM Hedge (Headline Language Model, General Language Model) → Candidates with HMM features → URA (URA index, optional query) → Candidates with HMM and URA features → Selection (feature weights) → Summary]

Multi-Document Sentence Selection
• Maximal Marginal Relevance (MMR) (Carbonell and Goldstein, 1998)
  – All candidates are given scores: a linear combination of static and dynamic features
  – The highest-ranking candidate is included in the summary
    • Other compressions of its source sentence are removed from the pool
  – Recalculate the dynamic features and rescore the candidates
  – Iterate until the summary is complete

Multi-Document Sentence Selection
• Static features
  – Sentence position
  – Relevance
  – Centrality
  – Compression-specific features
• Dynamic features
  – Redundancy
  – Count of summary candidates from the source document

Relevance and Centrality
• Universal Retrieval Architecture (URA)
  – Infrastructure for information retrieval tasks
• Four score components
  – Candidate query relevance: matching score between candidate and query
  – Document query relevance: Lucene similarity score between document and query
  – Candidate centrality: average Lucene similarity of the candidate to other sentences in the document
  – Document centrality: average Lucene similarity of the document to other documents in the cluster

Redundancy: Intuition
• Consider a summary about earthquakes
• "Generated" by the topic: earthquake, seismic, Richter scale
• "Generated" by general language: dog, under, during
• Sentences with many words "generated" by the topic are redundant

Redundancy: Formal
• P(w|D) = count(w, D) / size(D)
• P(w|C) = count(w, C) / size(C)
• redundancy(S) = Σ_{s ∈ S} [ λ · P(s|D) + (1 − λ) · P(s|C) ]
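The redundancy score is a linear interpolation of two unigram models over the words of a candidate. The sketch below is a minimal illustration of the formula above, assuming D is the text of the summary (or document set) accumulated so far and C is a general-English corpus; the token lists and the interpolation weight `lam` are placeholders, not values from the dissertation.

```python
from collections import Counter

def unigram(tokens):
    """Maximum-likelihood unigram model: P(w) = count(w) / size."""
    counts, total = Counter(tokens), len(tokens)
    return lambda w: counts[w] / total if total else 0.0

def redundancy(candidate, summary_so_far, general_corpus, lam=0.5):
    """redundancy(S) = sum over words s in S of lam*P(s|D) + (1-lam)*P(s|C)."""
    p_d = unigram(summary_so_far)    # P(w | D): "generated by the topic"
    p_c = unigram(general_corpus)    # P(w | C): "generated by general language"
    return sum(lam * p_d(w) + (1 - lam) * p_c(w) for w in candidate)
```

In the MMR loop this is the dynamic feature that is recomputed after each candidate is added to the growing summary.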
HMM Hedge Multi-Doc
• Placeholder for results of HMM Hedge

Talk Roadmap
• Introduction
• Automatic Summarization under MASC framework
  – HMM Hedge, Trimmer, Topiary
  – Single-document, Multi-document
  – Experimental evidence supporting hypotheses
• Review of Evaluations
• Conclusion
• Future Work

Trimmer
• Underlying method: select words in order
• Parse and trim
• Rules come from a study of Headlinese
  – Different distributions of syntactic structures:

Phenomenon                     Headlines   Lead Sentences
Preposed adjunct               0%          2.7%
Time expression                1.5%        24%
Noun phrase relative clause    0.3%        3.5%

Trimmer: Mask Operation [figure]
Trimmer: Mask Outside [figure]

Trimmer Single Document
[Diagram: Document → Entity Tagger → Entity Tags, and Parser → Parses → Trimmer → Candidates with Trimmer features → Selection → Summary]

Trimmer: Root S
• Select the lowest, leftmost S which has NP and VP children, in that order.
• [S [S [NP Rebels] [VP agree to talks with government]] officials said Tuesday.]

Trimmer: Preposed Adjunct
• Remove a [YP …] preceding the first NP inside the chosen S
• [S [PP According to a now-finalized blueprint described by U.S. officials and other sources] [NP the Bush administration] [VP plans to take complete, unilateral control of a post-Saddam Hussein Iraq]]

Trimmer: Conjunction
• Remove one conjunct and the conjunction from [X] [CC] [X], keeping either the first or the second [X]
• [S Illegal fireworks [VP [VP injured hundreds of people] [CC and] [VP started six fires.]]]
• [S A company offering blood cholesterol tests in grocery stores says [S [S medical technology has outpaced state laws,] [CC but] [S the state says the company doesn't have the proper licenses.]]]

Trimmer
• Adaptation to multi-candidate compression
• Multi-candidate rules
  – Root S
  – Preamble
  – Conjunction

Trimmer
• Multi-candidate Root S
• [S1 [S2 The latest flood crest, the eighth this summer, passed Chongqing in southwest China], and [S3 waters were rising in Yichang, in central China's Hubei province, on the middle reaches of the Yangtze], state television reported Sunday.]
• The single-candidate version would choose only S2. Multi-candidate Root S generates all three choices.

Trimmer: Preamble Rule [figures illustrating the rule]

Trimmer: Conjunction
• [S Illegal fireworks [VP [VP injured hundreds of people] [CC and] [VP started six fires.]]]
  – Illegal fireworks injured hundreds of people
  – Illegal fireworks started six fires
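The multi-candidate Conjunction rule can be sketched directly on a parse tree. The code below is a minimal illustration using nltk's Tree class as a stand-in for the parser output in the real system: for each [X] [CC] [X] coordination it emits one candidate per conjunct, reproducing the two compressions shown above.

```python
from nltk import Tree

def conjunction_candidates(tree):
    """Yield copies of `tree` in which one coordination [X CC X] has been
    replaced by a single conjunct (one candidate per conjunct)."""
    for pos in tree.treepositions():
        if not pos:                       # skip the root; it cannot be replaced in place
            continue
        node = tree[pos]
        if not isinstance(node, Tree) or len(node) != 3:
            continue
        labels = [c.label() if isinstance(c, Tree) else None for c in node]
        if labels[1] == "CC" and labels[0] == labels[2]:   # [X] [CC] [X]
            for keep in (0, 2):
                alt = tree.copy(deep=True)
                alt[pos] = node[keep].copy(deep=True)      # keep only one conjunct
                yield alt

parse = Tree.fromstring(
    "(S (NP Illegal fireworks)"
    " (VP (VP injured hundreds of people) (CC and) (VP started six fires)))")
for candidate in conjunction_candidates(parse):
    print(" ".join(candidate.leaves()))
# Illegal fireworks injured hundreds of people
# Illegal fireworks started six fires
```

The real Trimmer rules operate on masked spans of the full parse and interact with the other rules; this sketch only shows the candidate-generation idea behind the multi-candidate version.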
Trimmer: Multi-Candidate Rules
[Chart: Rouge-1 recall (roughly 0.15 – 0.27) for rule configurations Trimmer, +R, +P, +C, +R+P, +R+C, +S+C, +R+S+C across feature sets LUL, L, R, C, LR, LC, RC, LRC]

Trimmer: Features
• Selection among Trimmer candidates based on three sets of features
  – L: Length in characters or words
  – R: Counts of rule applications
  – C: Centrality
• Baseline LUL: select the longest version under the limit

Trimmer: Features
[Chart: Rouge-1 recall for the same rule configurations, grouped by feature set (LUL, L, R, C, LR, LC, RC, LRC)]

Talk Roadmap
• Introduction
• Automatic Summarization under MASC framework
  – HMM Hedge, Trimmer, Topiary
  – Single-document, Multi-document
  – Experimental evidence supporting hypotheses
• Review of Evaluations
• Conclusion
• Future Work

Trimmer Multi-Document
[Diagram: Document → Entity Tagger → Entity Tags, and Parser → Parses → Trimmer → Candidates with Trimmer features → URA (URA index, optional query) → Candidates with Trimmer and URA features → Selection (feature weights) → Summary]

Trimmer

System             R1 Recall   R1 Prec.   R2 Recall   R2 Prec.
Trimmer MultiDoc   0.38198     0.37617    0.08051     0.07922
HMM MultiDoc       0.37404     0.37405    0.07884     0.07887

Talk Roadmap
• Introduction
• Automatic Summarization under MASC framework
  – HMM Hedge, Trimmer, Topiary
  – Single-document, Multi-document
  – Experimental evidence supporting hypotheses
• Review of Evaluations
• Conclusion
• Future Work

Topiary
• Combines topic terms and fluent text
  – Fluent text comes from Trimmer
  – Topics come from Unsupervised Topic Detection (UTD)
• Single-candidate algorithm
  – Lower the Trimmer threshold to make room for the highest-scoring non-redundant topic term
  – Trim to the lower threshold
  – Adjust if topic redundancy changes because of trimming

Topiary Single-Candidate
[Diagram: Document → Entity Tagger → Entity Tags, and Parser → Parses → Topiary, with Topics from Unsupervised Topic Detection → Summary]

Topiary, Trimmer, UTD
[Chart: Rouge-1 through Rouge-4 scores (roughly 0.00 – 0.30) for First 75 chars, Topiary, Trimmer 2003, Trimmer 2004, and UTD]

Topiary
• Multi-candidate algorithm
  – Generate multi-candidate Trimmer candidates
  – Fill the remaining space in all Trimmer candidates with all combinations of non-redundant topics
  – Score and select the summary

Topiary Multi-Candidate
[Diagram: Document → Entity Tagger → Entity Tags, and Parser → Parses → Topiary, with Topics from Unsupervised Topic Detection → Candidates with Trimmer features → URA (URA index, optional query) → Candidates with Trimmer and URA features → Selection (feature weights) → Summary]

DUC 2004 Task 1 Results (Rouge)
[Chart: ROUGE-1, ROUGE-L, ROUGE-W-1.2, ROUGE-2, ROUGE-3, and ROUGE-4 scores for the human references (A – H), Topiary, the baseline, and the other automatic systems]

Topiary Evaluation

Rouge Metric      Topiary    Multi-Candidate Topiary
Rouge-1 Recall    0.25027    0.26490
Rouge-2 Recall    0.06484    0.08168*
Rouge-3 Recall    0.02130    0.02805
Rouge-4 Recall    0.00717    0.01105
Rouge-L           0.20063    0.22283*
Rouge-W-1.2       0.11951    0.13234*

Talk Roadmap
• Introduction
• Automatic Summarization
  – HMM Hedge, Trimmer, Topiary
    • Single-candidate, MASC versions
  – Multi-document Summarization
    • HMM Hedge, Trimmer
• Evaluation
• Conclusion
• Future Work

Evaluation: Review
• HMM Hedge, single-document: Rouge-1 recall increases as the number of candidates increases
• HMM Hedge, single-document: Rouge-1 roughly doubles when candidates are scored with optimized feature weights
• Trimmer, single-document: Rouge-1 increases with greater use of multi-candidate rules
• Trimmer, single-document: Rouge-1 increases with a larger set of features
• Topiary, single-document: multi-candidate Topiary scores significantly higher on some Rouge metrics than single-candidate Topiary
• Trimmer scored higher than HMM Hedge for multi-document summarization

Evaluation
• Human extrinsic evaluation of HMM Hedge, Trimmer, Topiary, and First 75 characters
• LDC agreement: ~20x increase in speed, with some loss of accuracy
• Relevance Prediction
• The First-75-characters baseline is hard to beat

Talk Roadmap
• Introduction
• Automatic Summarization
  – HMM Hedge, Trimmer, Topiary
  – Multiple Alternative Sentence Compressions (MASC)
• Evaluation
• Conclusion
• Future Work

Conclusion
• MASC improves performance across summarization tasks and compression sources
• Fluent and informative summaries can be constructed by selecting words in order from sentences
• Headlines combining fluent text and topic terms score better than either alone

Future Work
• Enhance the redundancy score with paraphrase detection
• Anaphora resolution in candidates
• Expand candidates by sentence merging
• Sentence ordering in multi-sentence summaries

End