Query-oriented Multi-document
Summarization via Unsupervised
Deep Learning
Yan LIU, Sheng-hua ZHONG, Wenjie LI
Department of Computing
The Hong Kong Polytechnic University
www.comp.polyu.edu.hk/~csshzhong
Outline

- Problem: extractive-style query-oriented multi-document summarization
- Idea: deep models, consistent with the human cortex
- Proposed technique: query-oriented deep extraction, which extracts important concepts layer by layer by inheriting the extraction ability of deep learning
- Methodology and experiments: DUC 2005, DUC 2006 and DUC 2007
- Conclusion: provide human-like document summarization
Query-oriented Multi-document Summarization

- Extractive-style query-oriented multi-document summarization
  - Generate the summary by extracting a proper set of sentences from multiple documents, based on a pre-given query
  - Important in both information retrieval and natural language processing
- Multi-document summarization remains a well-known challenge
  - Automatic generic text summarization → query-oriented document summarization
  - Single-document → multi-document summarization
  - The extractive approach is the mainstream
- Humans have no difficulty with multi-document summarization
  - How does the neocortex process the lexical-semantic task?
- Aim of this paper
  - The first work to apply deep learning to document summarization
  - Provide human-like judgment by referencing the architecture of the human neocortex
Deep Learning

- Physical structure
  - Human: dozens of cortical layers are involved in even the simplest lexical-semantic processing
  - Deep: model the problem using multiple layers of parameterized nonlinear modules
- Evolution of intelligence
  - Human: the multi-layer structure began to appear in the neocortex of Old World monkeys about 40 million years ago
  - Deep: the development of intelligence follows the multi-layer structure
- Propagation of information
  - Human: there are several reasons to believe that our lexical-semantic systems contain multi-layer generative models
  - Deep: layer-wise reconstruction learns multiple levels of representation and abstraction that help make sense of data
Human Neocortex and Deep Architectures

[Figure: (a) the human neocortex; (b) an example of a deep architecture]
Proposed Model in Vision Recognition

[Figure: (a) the proposed bilinear deep belief network (BDBN), taking an input image through hidden layers H1 and H2 to feature and attention maps; (b) samples of first-layer weights, which represent "strokes" of digits and resemble V1 responses; (c) a saliency map based on the BDBN]
Three-stage Learning of QODE

- Information flow
  - Input: the tf value of every word in the documents
  - Output: the summary
- Three-stage learning
  1. Query-oriented concept extraction: hidden layers abstract the documents using a greedy layer-wise extraction algorithm
  2. Reconstruction validation for global adjustment: reconstruct the data distribution by fine-tuning the whole deep architecture globally
  3. Summary generation via dynamic programming: dynamic programming maximizes the importance of the summary under the length constraint

[Figure: the QODE architecture. After preprocessing (word list, document topic set, query word list), the tf vector f^d = [f^d_1, ..., f^d_v, ..., f^d_V] feeds the visible layer h0. Layer h1 (weights A^1) filters out unimportant words, h2 (weights A^2) discovers key words, and h3 (weights A^3) extracts candidate sentences into the candidate sentence pool; the transposed weights (A^1)^T, (A^2)^T, (A^3)^T perform reconstruction validation. Query-oriented initial weight setting and a query-oriented penalty process inject the query; dynamic programming then generates the summary.]
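The visible layer h0 takes the term-frequency (tf) value of every word in the document set. A minimal sketch of building that input vector; the helper name and sample vocabulary are illustrative, not from the slides:

```python
from collections import Counter

def tf_vector(documents, vocabulary):
    """Build the term-frequency input vector f^d = [f_1, ..., f_V]
    over a fixed vocabulary, as fed to the visible layer h0."""
    counts = Counter(
        word
        for doc in documents
        for word in doc.lower().split()
    )
    return [counts[word] for word in vocabulary]

vocab = ["court", "war", "tribunal", "summary"]
docs = ["The tribunal opens in The Hague", "the war crimes tribunal"]
print(tf_vector(docs, vocab))  # [0, 1, 2, 0]
```

In practice the vocabulary comes from the preprocessing step (word list with stop words removed), and the raw counts may be normalized before entering the network.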
Query-oriented Concept Extraction

- A joint configuration (h^0, h^1) of the visible layer H^0 and the first hidden layer H^1 has energy

  E(h^0, h^1; θ^1) = −((h^0)^T A^1 h^1 + (b^1)^T h^0 + (c^1)^T h^1),  θ^1 = {A^1, b^1, c^1}

- Utilize the Contrastive Divergence algorithm to update the parameter space:

  ∂ log p(h^0(0)) / ∂θ^1 = −Σ_{h^1(0)} p(h^1(0) | h^0(0)) ∂E(h^1(0), h^0(0))/∂θ^1 + Σ_{h^1(k), h^0(k)} p(h^1(k), h^0(k)) ∂E(h^1(k), h^0(k))/∂θ^1

  giving the one-step (k = 1) updates

  A^1 ← A^1 + η_A (⟨h^1(0)^T h^0(0)⟩_data − ⟨h^1(1)^T h^0(1)⟩_recon)
  b^1 ← b^1 + η_b (h^0(0) − h^0(1))
  c^1 ← c^1 + η_c (h^1(0) − h^1(1))
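The Contrastive Divergence updates above can be sketched for a single training vector. This is a generic CD-1 step for a binary RBM; the learning rate, seeding, and sampling details are illustrative, not the talk's exact settings:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(h0, A, b, c, lr=0.1, seed=0):
    """One Contrastive Divergence (CD-1) step for a binary RBM:
    visible vector h0 (V,), weights A (V x H), biases b (V,), c (H,)."""
    rng = np.random.default_rng(seed)
    # Positive phase: hidden probabilities given the data vector
    p_h_data = sigmoid(h0 @ A + c)
    h_sample = (rng.random(p_h_data.shape) < p_h_data).astype(float)
    # Negative phase: one-step reconstruction of the visible layer
    v_recon = sigmoid(A @ h_sample + b)
    p_h_recon = sigmoid(v_recon @ A + c)
    # <data> statistics minus <reconstruction> statistics
    A = A + lr * (np.outer(h0, p_h_data) - np.outer(v_recon, p_h_recon))
    b = b + lr * (h0 - v_recon)
    c = c + lr * (p_h_data - p_h_recon)
    return A, b, c
```

Stacking such layers and training them greedily, one on top of the other, yields the layer-wise extraction used in the first stage.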
Reconstruction Validation for Global Adjustment

- Backpropagation adjusts the entire deep network to find good local-optimum parameters by minimizing the cross-entropy error

  θ* = argmin_θ [ −Σ_v f_v log f̂_v − Σ_v (1 − f_v) log(1 − f̂_v) ]

- Before backpropagation, a good region of the whole parameter space has already been found, so
  - the convergence of backpropagation learning is not slow, and
  - the result generally converges to a good local minimum on the error surface
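The fine-tuning objective above is the standard elementwise cross-entropy between the input vector and its reconstruction. A minimal sketch; the clipping constant is an implementation detail, not from the slides:

```python
import numpy as np

def cross_entropy_error(f, f_recon, eps=1e-12):
    """Cross-entropy between the (0,1)-scaled input tf vector f and its
    reconstruction f_recon, summed over the vocabulary."""
    f_recon = np.clip(f_recon, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.sum(f * np.log(f_recon)
                         + (1.0 - f) * np.log(1.0 - f_recon)))
```

Minimizing this error over all documents is what backpropagation does during the global-adjustment stage.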
Summary Generation via Dynamic Programming

- Dynamic programming
  - Simplifies a complicated problem by breaking it down into simpler sub-problems in a recursive manner
  - Maximizes the importance of the summary under the length constraint
- The importance In_t of sentence t is

  In_t = Σ_i λ_i,  where λ_i = ε·γ_i (ε ≥ 1) if (i ∈ UN) and (i ∈ q); λ_i = γ_i if i ∈ UN; λ_i = 0 otherwise

- The summary length is defined as Le = l_1 + ... + l_t + ... + l_T ≤ N_S
- The objective function: max In = Σ_t u_t·In_t, s.t. Le ≤ N_S
- Dynamic programming function:

  f_K(ψ_K) = max { u_K·In_K + f_{K−1}(ψ_{K−1}) },  K = t, 1 ≤ t ≤ T
  ψ_K = ψ_{K−1} − u_{K−1}·l_{K−1},  ψ_T ≥ 0,  ψ_0 = N_S,  f_0(ψ_0) = 0
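The dynamic program on this slide is a 0/1 knapsack over sentences: maximize total importance subject to the length budget N_S. A minimal sketch with integer word lengths; the function name and sample values are illustrative:

```python
def select_sentences(importance, lengths, budget):
    """0/1 knapsack DP: pick sentences maximizing total importance
    subject to a total-length budget (e.g. 250 words)."""
    T = len(importance)
    # f[k][w] = best total importance using the first k sentences
    # with at most w words of budget remaining
    f = [[0.0] * (budget + 1) for _ in range(T + 1)]
    for k in range(1, T + 1):
        for w in range(budget + 1):
            f[k][w] = f[k - 1][w]                  # u_k = 0: skip sentence k
            if lengths[k - 1] <= w:                # u_k = 1: take sentence k
                cand = f[k - 1][w - lengths[k - 1]] + importance[k - 1]
                if cand > f[k][w]:
                    f[k][w] = cand
    # Backtrack the chosen sentence indices
    chosen, w = [], budget
    for k in range(T, 0, -1):
        if f[k][w] != f[k - 1][w]:
            chosen.append(k - 1)
            w -= lengths[k - 1]
    return f[T][budget], sorted(chosen)

score, idx = select_sentences([3.0, 1.0, 4.0], [100, 80, 120], 250)
print(score, idx)  # 7.0 [0, 2]
```

With a 250-word budget, sentences 0 and 2 (total length 220) beat any other feasible combination, matching the objective max Σ_t u_t·In_t subject to Le ≤ N_S.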
Experiment Setting

- Database: DUC 2005, DUC 2006 and DUC 2007
  - Benchmark datasets for multi-document summarization evaluation in the Document Understanding Conference (DUC)
  - Produce query-oriented multi-document summaries with an allowance of 250 words
- Evaluation standards: ROUGE-1, ROUGE-2 and ROUGE-SU4
- Compared algorithms
  - Graph-based sentence ranking algorithms
    - Manifold-ranking model [Wan & Xiao, 2009]
    - Multiple-modality model [Wan, 2009]
    - Document-sensitive model [Wei et al., 2010]
  - Supervised learning based sentence ranking models
    - SVM classification [Vapnik, 1995]
    - Ranking SVM [Joachims et al., 2002]
    - Regression [Ouyang et al., 2011]
  - Classical relevance- and redundancy-based selection algorithms
    - Maximum marginal relevance (MMR) [Goldstein et al., 2000]
    - Greedy search [Filatova & Hatzivassiloglou, 2004]
    - Integer linear programming (ILP) [McDonald, 2007]
  - NIST baseline system [Dang, 2005]
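ROUGE-n scores count n-gram overlap between a candidate summary and human references. A simplified unigram-recall sketch of ROUGE-1; the real ROUGE toolkit additionally supports stemming, stop-word removal, and multi-reference averaging:

```python
from collections import Counter

def rouge_1_recall(candidate, reference):
    """Unigram recall: fraction of reference unigrams covered by the
    candidate summary (a simplified ROUGE-1, no stemming)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: each reference occurrence can be matched at most once
    overlap = sum(min(cand[w], n) for w, n in ref.items())
    return overlap / max(sum(ref.values()), 1)

print(rouge_1_recall("the court ruled today",
                     "the court ruled in the hague"))  # 0.5
```

ROUGE-2 and ROUGE-SU4 follow the same recipe over bigrams and skip-bigrams (with unigrams), respectively.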
Average Recall Scores Comparison

Comparison to representative algorithms on DUC 2005 (* marks the best score; —— not reported):

  System              ROUGE-1   ROUGE-2   ROUGE-SU4
  QODE                0.3751    0.0775*   0.1341*
  Manifold-ranking    0.3839*   0.0737    0.1317
  Multiple-modality   0.3718    0.0676    0.1293
  Document-sensitive  ——        0.0771    0.1337
  SVM Classification  0.3663    0.0701    0.1243
  Ranking SVM         0.3702    0.0711    0.1299
  Regression          0.3770    0.0761    0.1329
  Greedy search       0.3560    0.0610    ——
  MMR                 0.3701    0.0701    0.1289
  ILP                 0.3580    0.0610    ——
  NIST Baseline       ——        0.0403    0.0872
Query-oriented contribution analysis (1. query-oriented initial weight setting, 2. query-oriented penalty process, 3. summary importance maximization by DP; the first row uses all three components, and each following row removes one):

  1   2   3   ROUGE-1   ROUGE-2   ROUGE-SU4
  √   √   √   0.3751    0.0775    0.1341
      √   √   0.3731    0.0742    0.1315
  √       √   0.3734    0.0755    0.1329
  √   √       0.3704    0.0740    0.1301
Average Recall Scores Comparison

Comparison to representative algorithms on DUC 2006 (* marks the best score; —— not reported):

  System              ROUGE-1   ROUGE-2   ROUGE-SU4
  QODE                0.4015    0.0928*   0.1479
  Manifold-ranking    0.4101*   0.0886    0.1420
  Multiple-modality   0.4031    0.0851    0.1400
  Document-sensitive  ——        0.0899    0.1427
  SVM Classification  ——        0.0834    0.1387
  Ranking SVM         ——        0.0890    0.1443
  Regression          ——        0.0926    0.1485*
  NIST Baseline       ——        0.0491    0.0962

Comparison to representative algorithms on DUC 2007:

  System              ROUGE-1   ROUGE-2   ROUGE-SU4
  QODE                0.4295    0.1163    0.1685*
  Manifold-ranking    0.4204    0.1030    0.1460
  Multiple-modality   ——        0.1123    0.1682
  Document-sensitive  0.4211    0.1103    0.1628
  SVM Classification  ——        0.1075    0.1616
  Ranking SVM         0.4301*   0.1175*   0.1682
  NIST Baseline       0.3091    0.0599    0.1036
Query Word Importance Analysis

[Figure: (a) ROUGE-2 recall performance vs. the query-word importance weight; (b) ROUGE-SU4 recall performance vs. the same weight]
Filtering Out Words in the First Layer of QODE

Statistical analysis of words in layer H1:

  Words               Number   In Human Summary   Percentage
  Filtered-out words  1032     65                 6.3%
  Remaining words     1000     211                21.2%
Key Words Discovery of QODE

Statistical analysis of words in layer H2:

  Words         Number   In Human Summary   Percentage
  Random words  250      34                 13.6%
  Key words     250      99                 39.6%
Candidate Sentence Extraction of QODE
Advanced Extraction Ability of QODE

Candidate sentence extracted in layer H3 (IDs of the matching human summaries in brackets):

- "An international war crimes tribunal covering the former Yugoslavia formally opens in The Hague today with a request for the extradition from Germany of a Bosnian Serb alleged to have killed three Moslem prisoners." [A,B,C,D,E,G,H,I,J]

Sentences with the union of key words in the automatically extracted summary:

- "The extradition is important to the tribunal - the first international war crimes court since the Nuremberg trials after the second world war - because it has no power to try suspects in absentia." [B,C,D,E,G,H,I,J]
- "World News in Brief: Court rules on border." [A,C,D,E,G,H,I,J]
- "The International Court of Justice in The Hague ruled in Chad's favour in a 20-year border dispute with Libya which has caused two wars." [B,D,E,H,I,J]
- "Maybe we'll go full circle; the World Court can condemn this action and then the Soviets can defy that body, just as the United States defied the court's condemnation of our embargo of Nicaragua." [C,D,G,I,J]
- "Ever since the Reagan Administration walked out of the Hague to protest Nicaragua's claim of illegality in U.S. aid to the Contras, the State Department has opposed submitting to the World Court any case that involves the use of military force." [H,I,J]
- "They refused to appear in the World Court 10 years ago when Washington sought the release of American hostages in Tehran." [H,I,J]
- "A year after Noriega's capture, the court was still hearing arguments on whether Bush could be subpoenaed and the World Court was in preliminary hearings on Panama's complaint." [J]
- "After six months of uproar, the U.S. district court judge in Miami ordered that the case proceed to trial." [Null]
- "Mr Edwin Williamson, a legal adviser to the U.S. State Department who will address the court later in the proceedings, said yesterday, 'This (court) action in no way inhibits what the Security Council is doing.'" [Null]
Conclusion and Future Work

- Novel document summarization architecture
  - Simulates the multi-layer physical structure of the cortex
  - The first layer filters out unimportant words
  - The second layer discovers key words
  - The third layer extracts candidate sentences
- Three-stage learning
  - Simulates the procedure of lexical-semantic processing by human beings
  - Dynamic programming maximizes the importance of the summary under the length constraint
- Future work
  - Propose novel deep learning models by referring to more characteristics of the human cortex
Q&A
Thanks!
Yan LIU, Sheng-hua ZHONG, Wenjie LI
Department of Computing
The Hong Kong Polytechnic University
www.comp.polyu.edu.hk/~csshzhong