Query-oriented Multi-document Summarization via Unsupervised Deep Learning
Yan LIU, Sheng-hua ZHONG, Wenjie LI
Department of Computing, The Hong Kong Polytechnic University
www.comp.polyu.edu.hk/~csshzhong

Outline
- Problem: extractive-style query-oriented multi-document summarization
- Idea: deep models, consistent with the human cortex
- Proposed technique: query-oriented deep extraction, which extracts important concepts layer by layer by inheriting the extraction ability of deep learning
- Experiments: DUC 2005, DUC 2006 and DUC 2007
- Conclusion: providing human-like document summarization

Query-oriented Multi-document Summarization
- Task: generate the summary by extracting a proper set of sentences from multiple documents based on a pre-given query
- From automatic generic text summarization to query-oriented document summarization; from single-document to multi-document summarization
- Multi-document summarization remains a well-known challenge; the extractive approach is the mainstream
- Important in both information retrieval and natural language processing
- Yet humans do not have difficulty with multi-document summarization: how does the neocortex process this lexical-semantic task?
Aim of This Paper
- This is the first paper to utilize deep learning in document summarization
- Provide human-like judgment by referencing the architecture of the human neocortex

Deep Learning
- Physical structure. Human: dozens of cortical layers are involved in even the simplest lexical-semantic processing. Deep: model the problem using multiple layers of parameterized nonlinear modules.
- Evolution of intelligence. Human: the multi-layer structure began to appear in the neocortex of old-world monkeys about 40 million years ago. Deep: the development of intelligence follows the multi-layer structure.
- Propagation of information. Human: there are several reasons to believe that our lexical-semantic systems contain multi-layer generative models. Deep: layer-wise reconstruction learns multiple levels of representation and abstraction, which helps make sense of data.

Human Neocortex and Deep Architectures
[Figure: (a) the human neocortex; (b) an example of a deep architecture]

Proposed Model in Vision Recognition
[Figure: (a) the proposed BDBN, mapping an input image through hidden layers H1 and H2 to a feature map and an attention map; (b) samples of first-layer weights, which represent "strokes" of digits and resemble V1 responses; (c) a saliency map based on the BDBN]

Three-stage Learning of QODE
- Input: the tf value of every word in the documents. Output: the summary.
- Stage 1, query-oriented concept extraction: hidden layers abstract the documents using a greedy layer-wise extraction algorithm
- Stage 2, reconstruction validation for global adjustment: reconstruct the data distribution by fine-tuning the whole deep architecture globally
- Stage 3, summary generation via dynamic programming: dynamic programming over the candidate sentence pool maximizes the importance of the summary under the length constraint

[Architecture: preprocessing turns the document topic set into a word list and the query into a query word list; the visible layer h^0 holds the tf vector f_d = [f_{d1}, f_{d2}, ..., f_{dV}]; layer h^1 (weights A^1) filters out unimportant words; layer h^2 (weights A^2) discovers key words; layer h^3 (weights A^3) extracts candidate sentences; reconstruction runs back through (A^3)^T, (A^2)^T and (A^1)^T; the query word list drives the query-oriented initial weight setting and the query-oriented penalty process]

Query-oriented Concept Extraction
A joint configuration (h^0, h^1) of the visible layer H^0 and the first hidden layer H^1
has energy

E(h^0, h^1; \theta^1) = -\big((h^0)^\top A^1 h^1 + (b^1)^\top h^0 + (c^1)^\top h^1\big), \qquad \theta^1 = \{A^1, b^1, c^1\}

The Contrastive Divergence algorithm is utilized to update the parameter space. The log-likelihood gradient is

\frac{\partial \log p(h^0(0))}{\partial \theta^1} = -\Big\langle \frac{\partial E(h^1(0), h^0(0))}{\partial \theta^1} \Big\rangle_{p(h^1(0)\mid h^0(0))} + \Big\langle \frac{\partial E(h^1(k), h^0(k))}{\partial \theta^1} \Big\rangle_{p(h^1(k),\, h^0(k))}

which gives the CD-1 updates

A^1 \leftarrow A^1 + \eta\big(\langle h^0(0)\,(h^1(0))^\top\rangle_{\mathrm{data}} - \langle h^0(1)\,(h^1(1))^\top\rangle_{\mathrm{recon}}\big)
b^1 \leftarrow b^1 + \eta\,\big(h^0(0) - h^0(1)\big)
c^1 \leftarrow c^1 + \eta\,\big(h^1(0) - h^1(1)\big)

Reconstruction Validation for Global Adjustment
- Backpropagation adjusts the entire deep network to find good local-optimum parameters by minimizing the cross-entropy error

\theta^* = \arg\min_{\theta}\Big[-\sum_v \big(f_v \log \hat f_v + (1 - f_v)\log(1 - \hat f_v)\big)\Big]

- Before backpropagation, a good region of the whole parameter space has already been found by the layer-wise pretraining
- The convergence of backpropagation learning is therefore not slow, and the result generally converges to a good local minimum on the error surface

Summary Generation via Dynamic Programming
- Dynamic programming simplifies a complicated problem by breaking it down into simpler sub-problems in a recursive manner; here it maximizes the importance of the summary under the length constraint
- The importance of sentence t is \mathrm{In}_t = \sum_i \lambda_i, with

\lambda_i = \begin{cases} \lambda, & \text{if } (i \in UN) \wedge (i \in q) \\ 1, & \text{if } i \in UN \\ 0, & \text{otherwise} \end{cases}

- The summary length is defined as \mathrm{Le} = l_1 + \dots + l_t + \dots + l_T with word budget N_S
- The objective maximizes the total importance, \max \mathrm{In} = \sum_t \mathrm{In}_t, subject to the length constraint
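The CD-1 updates above can be sketched for a Bernoulli RBM in NumPy. This is an illustrative sketch, not the authors' implementation: the learning rate, the use of probabilities rather than samples in the statistics, and the function name are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(h0, A, b, c, lr=0.1):
    """One CD-1 update for a Bernoulli RBM.

    h0 : (V,) visible vector; A : (V, H) weights; b : (V,) visible bias; c : (H,) hidden bias.
    """
    # Positive phase: hidden probabilities given the data, then a sample h1(0)
    p_h0 = sigmoid(h0 @ A + c)
    h1_0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    # Negative phase: one Gibbs step gives the reconstruction h0(1) and h1(1)
    p_v1 = sigmoid(A @ h1_0 + b)
    p_h1 = sigmoid(p_v1 @ A + c)
    # Updates: <h0 h1^T>_data - <h0 h1^T>_recon, using probabilities (a common CD-1 variant)
    A += lr * (np.outer(h0, p_h0) - np.outer(p_v1, p_h1))
    b += lr * (h0 - p_v1)
    c += lr * (p_h0 - p_h1)
    return A, b, c
```

Repeated calls over the document tf vectors would train one layer; stacking trained layers gives the greedy layer-wise scheme described above.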
\mathrm{Le} \le N_S. The dynamic programming recursion is

f_K(\xi_K) = \max_{u_K \in \{0,1\}}\big\{u_K\,\mathrm{In}_K + f_{K-1}(\xi_{K-1})\big\}, \qquad 1 \le K \le T

with \xi_K = \xi_{K-1} + u_K l_K, \quad \xi_0 = 0, \quad \xi_T = N_S, \quad f_0(\xi_0) = 0.

Experiment Setting
- Datasets: DUC 2005, DUC 2006 and DUC 2007, the benchmark datasets of the multi-document summarization task evaluation in the Document Understanding Conference (DUC); systems produce query-oriented multi-document summaries with an allowance of 250 words
- Evaluation standards: ROUGE-1, ROUGE-2 and ROUGE-SU4
- Compared algorithms:
  - Graph-based sentence ranking: manifold-ranking model [Wan & Xiao, 2009], multiple-modality model [Wan, 2009], document-sensitive model [Wei et al., 2010]
  - Supervised learning based sentence ranking: SVM classification [Vapnik, 1995], ranking SVM [Joachims et al., 2002], regression [Ouyang et al., 2011]
  - Classical relevance- and redundancy-based selection: maximum marginal relevance (MMR) [Goldstein et al., 2000], greedy search [Filatova & Hatzivassiloglou, 2004], integer linear programming (ILP) [McDonald, 2007]
  - NIST baseline system [Dang, 2005]

Average Recall Scores Comparison
Comparison to representative algorithms on DUC 2005 (—— = not reported; * = best in column):

System               ROUGE-1   ROUGE-2   ROUGE-SU4
QODE                 0.3751    0.0775*   0.1341*
Manifold-ranking     0.3839*   0.0737    0.1317
Multiple-modality    0.3718    0.0676    0.1293
Document-sensitive   ——        0.0771    0.1337
SVM classification   0.3663    0.0701    0.1243
Ranking SVM          0.3702    0.0711    0.1299
Regression           0.3770    0.0761    0.1329
Greedy search        0.3560    0.0610    ——
MMR                  0.3701    0.0701    0.1289
ILP                  0.3580    0.0610    ——
NIST baseline        ——        0.0403    0.0872

Query-Oriented Contribution Analysis (components: 1. query-oriented initial weight setting; 2. query-oriented penalty process; 3. summary importance maximization by DP): with all three components QODE scores 0.3751 / 0.0775 / 0.1341 (ROUGE-1 / ROUGE-2 / ROUGE-SU4); variants that each drop one component fall to 0.3731 / 0.0742 / 0.1315, 0.3734 / 0.0755 / 0.1329, and 0.3704 / 0.0740 / 0.1301, so every component contributes.

Comparison to representative algorithms on DUC 2006:

System               ROUGE-1   ROUGE-2   ROUGE-SU4
QODE                 0.4015    0.0928*   0.1479
Manifold-ranking     0.4101*   0.0886    0.1420
Multiple-modality    0.4031    0.0851    0.1400
Document-sensitive   ——        0.0899    0.1427
SVM classification   ——        0.0834    0.1387
Ranking SVM          ——        0.0890    0.1443
Regression           ——        0.0926    0.1485*
NIST baseline        ——        0.0491    0.0962

Comparison to representative algorithms on DUC 2007:

System               ROUGE-1   ROUGE-2   ROUGE-SU4
QODE                 0.4295    0.1163    0.1685*
Manifold-ranking     0.4204    0.1030    0.1460
Multiple-modality    ——        0.1123    0.1682
Document-sensitive   0.4211    0.1103    0.1628
SVM classification   ——        0.1075    0.1616
Ranking SVM          0.4301*   0.1175*   0.1682
NIST baseline        0.3091    0.0599    0.1036

Query Word Importance Analysis
[Figure: (a) ROUGE-2 recall performance and (b) ROUGE-SU4 recall performance under varying query word importance]
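The dynamic-programming summary generation used by QODE is a 0/1 knapsack over candidate sentences: maximize total importance subject to a word budget. A minimal sketch (the function name and the example inputs are illustrative, with sentence lengths in words):

```python
def select_sentences(importances, lengths, budget):
    """0/1 knapsack by dynamic programming: choose u_t in {0,1} to maximize
    sum_t u_t * In_t subject to sum_t u_t * l_t <= budget."""
    T = len(importances)
    # f[k][w] = best total importance using the first k sentences within length w
    f = [[0.0] * (budget + 1) for _ in range(T + 1)]
    for k in range(1, T + 1):
        imp, ln = importances[k - 1], lengths[k - 1]
        for w in range(budget + 1):
            f[k][w] = f[k - 1][w]                       # u_k = 0: skip sentence k
            if ln <= w:                                 # u_k = 1: take sentence k
                f[k][w] = max(f[k][w], f[k - 1][w - ln] + imp)
    # Backtrack to recover which sentences were selected
    chosen, w = [], budget
    for k in range(T, 0, -1):
        if f[k][w] != f[k - 1][w]:
            chosen.append(k - 1)
            w -= lengths[k - 1]
    return list(reversed(chosen)), f[T][budget]
```

For example, with importances [3.0, 5.0, 4.0], lengths [10, 20, 15] and a budget of 25 words, the best choice is sentences 0 and 2 with total importance 7.0.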
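The ROUGE scores reported above are recall-oriented n-gram overlap measures computed with the official ROUGE toolkit. The core idea for unigrams can be sketched as follows (a simplification that ignores stemming, stopword handling and multiple references):

```python
from collections import Counter

def rouge_1_recall(candidate, reference):
    """ROUGE-1 recall against a single reference: the fraction of reference
    unigrams that also appear in the candidate, with clipped counts."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(count, cand[word]) for word, count in ref.items())
    return overlap / max(sum(ref.values()), 1)
```

For example, `rouge_1_recall("the cat sat", "the cat sat on the mat")` returns 0.5, since 3 of the 6 reference unigram tokens are covered.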
Filtering Out Words in the First Layer of QODE
Statistical analysis of words in layer H^1:

Words                Number   In Human Summary   Percentage
Filtered-out words   1032     65                 6.3%
Remaining words      1000     211                21.2%

Key Words Discovery of QODE
Statistical analysis of words in layer H^2:

Words          Number   In Human Summary   Percentage
Random words   250      34                 13.6%
Key words      250      99                 39.6%

Candidate Sentence Extraction of QODE
[Figure: candidate sentences extracted in layer H^3]

Advanced Extraction Ability of QODE
Candidate sentence extracted in layer H^3, with the IDs (A-J) of the human summaries containing matching content:
- "An international war crimes tribunal covering the former Yugoslavia formally opens in The Hague today with a request for the extradition from Germany of a Bosnian Serb alleged to have killed three Moslem prisoners." (A, B, C, D, E, G, H, I, J)

Sentences with a union of key words in the automatically extracted summary:
- "The extradition is important to the tribunal - the first international war crimes court since the Nuremberg trials after the second world war - because it has no power to try suspects in absentia." (B, C, D, E, G, H, I, J)
- "World News in Brief: Court rules on border." (A, C, D, E, G, H, I, J)
- "The International Court of Justice in The Hague ruled in Chad's favour in a 20-year border dispute with Libya which has caused two wars." (B, D, E, H, I, J)
- "Maybe we'll go full circle; the World Court can condemn this action and then the Soviets can defy that body, just as the United States defied the court's condemnation of our embargo of Nicaragua." (C, D, G, I, J)
- "Ever since the Reagan Administration walked out of the Hague to protest Nicaragua's claim of illegality in U.S. aid to the Contras, the State Department has opposed submitting to the World Court any case that involves the use of military force." (H, I, J)
- "They refused to appear in the World Court 10 years ago when Washington sought the release of American hostages in Tehran." (H, I, J)
- "A year after Noriega's capture, the court was still hearing arguments on whether Bush could be subpoenaed and the World Court was in preliminary hearings on Panama's complaint." (J)
- "After six months of uproar, the U.S. district court judge in Miami ordered that the case proceed to trial." (Null)
- "Mr Edwin Williamson, a legal adviser to the U.S. State Department who will address the court later in the proceedings, said yesterday, 'This (court) action in no way inhibits what the Security Council is doing.'" (Null)

Conclusion and Future Work
- A novel document summarization architecture with three-stage learning
- Simulates the multi-layer physical structure of the cortex: the first layer filters out unimportant words, the second layer discovers key words, and the third layer extracts candidate sentences
- Simulates the procedure of lexical-semantic processing by human beings
- Dynamic programming maximizes the importance of the summary under the length constraint
- Future work: propose novel deep learning models by
referencing more characteristics of the human cortex

Q&A
Thanks!
Yan LIU, Sheng-hua ZHONG, Wenjie LI
Department of Computing, The Hong Kong Polytechnic University
www.comp.polyu.edu.hk/~csshzhong