Multi-Document Text Summarization Using
Deep Learning Algorithm With Fuzzy Logic
S. Sudha Lakshmi¹, Dr. M. Usha Rani²
¹ Research Scholar, Dept. of Computer Science, SPMVV, Tirupati, India (e-mail: s_sudhamca@yahoo.com)
² Professor, Dept. of Computer Science, SPMVV, Tirupati, India (e-mail: musha_rohan@yahoo.co.in)
ABSTRACT
Multi-document text summarization focuses on extracting the key information from a collection of related documents and presenting it as a brief summary. In this paper, we present multi-document text summarization using a deep learning algorithm with fuzzy logic, an important research area in natural language processing (NLP), data mining (DM), and machine learning (ML). To improve accuracy, we use a Restricted Boltzmann Machine to generate a shortened version of the original documents without losing their valuable information. The method consists of two steps: 1) a training phase and 2) a testing phase. The prominent role of the training phase is to generate an effective summary. The testing phase is then implemented to validate the efficiency and accuracy of the proposed method.
Index Terms: Deep Learning, Fuzzy Logic, Multi-document Summary, RBM.
I.INTRODUCTION
Due to the rapid growth of documents on the internet, users require all related data in one place without any hassle. Automatic text summarization is a mechanism for generating a short, meaningful text that summarizes textual content using a computer algorithm [1]. Text summarization can be classified into two categories, abstractive and extractive, producing a summary from single or multiple documents [2]. Text summarization methods and approaches currently in development include neural networks [3], graph-theoretic approaches [4], Term Frequency-Inverse Document Frequency (TF-IDF) [5,6], cluster-based methods [6], machine learning [7], concept-oriented approaches [8], fuzzy logic [9,12], multi-document summarization [10], and multilingual extractive summarization [11]. In this paper, multi-document text summarization using fuzzy logic combined with a deep learning algorithm is presented. Current research has shown that deep learning algorithms strongly influence the multi-document text summarization process by presenting the most important objects from a collection of objects. The following are the two phases of the text summarization method.
Phase I: Feature extraction from multiple documents. The feature extraction process generates a feature matrix from the features extracted from each sentence. This feature matrix is then processed by a fuzzy classifier, and a new feature matrix is generated based on the fuzzy scores, with the rules generated by the fuzzy classifier given as input. The newly created feature matrix is processed by the deep learning algorithm, which generates the text summary layer by layer.
Phase II: The testing phase is implemented to check the efficiency of the proposed approach.
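As a rough illustration of the Phase I flow (sentence segmentation → feature extraction → scoring → selection), consider the following sketch. The function names and the stand-in scoring are assumptions for illustration only, not the authors' implementation; the paper instead refines fuzzy-scored feature matrices with a deep learning algorithm.

```python
def split_sentences(doc):
    """Naive segmentation on the full-stop delimiter, as in the preprocessing step."""
    return [s.strip() for s in doc.split(".") if s.strip()]

def feature_vector(sentence):
    """Stand-in for the seven-feature vector; here just the word count."""
    return [len(sentence.split())]

def summarize(documents, top_n=2):
    """Extract the top-n sentences by a stand-in score (the paper instead
    scores sentences with a fuzzy classifier refined by an RBM)."""
    sentences = [s for doc in documents for s in split_sentences(doc)]
    ranked = sorted(sentences, key=lambda s: feature_vector(s)[0], reverse=True)
    return ranked[:top_n]
```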
II. PREVIOUS WORK
El-Haj et al. proposed optimizing a generic extractive Arabic and English multi-document summarization technique; this work ranked among the top five systems in the DUC-2002 multi-document summarization task. Over the last two decades, deep learning has shown a strong impact in different areas, specifically in natural language processing (NLP) tasks (Collobert et al., 2011; Srivastava and Salakhutdinov, 2012). Yan Liu et al. [13] proposed a document summarization framework using a deep learning model which showed notable extraction ability in document summarization. The framework includes the following: 1) concept extraction, 2) summary generation, and 3) reconstruction validation. Using dynamic programming, important sentences are extracted as the summary from the concepts extracted by the deep architecture. Kiani et al. [14] proposed a novel approach that extracts sentences based on an evolutionary fuzzy inference engine. F. Kyoomarsi et al. [15] presented an approach for creating text summaries; in this technique they used fuzzy logic and WordNet to extract the most relevant sentences from an original document. Experimental results reveal that this approach performs better than other commercially available text summarizers in extracting relevant sentences. In [16], Witt et al. present a fuzzy concept approach based on coreference resolution. However, all of the above related works clearly show that some sentences in the respective functions are too short or too long; to solve these issues, in this paper a deep learning algorithm with fuzzy logic is used to generate brief and relevant summaries.
III. METHODOLOGY
In order to summarize the text, a particular model is needed for processing it, which can then be given as input to a Restricted Boltzmann Machine (RBM).
A. Restricted Boltzmann Machine (RBM)
A Restricted Boltzmann Machine (RBM) fundamentally executes a binary version of factor analysis. In brief technical terms, an RBM is a stochastic neural network; stochastic means that the activations have a probabilistic element. It consists of:
• One layer of visible units (the users' movie choices, whose states we know and set).
• One layer of hidden units (the latent factors we try to learn).
• A bias unit (whose state is always on, and which is a way of adjusting for the different inherent popularities of each movie).
There are no connections between the units within a layer; otherwise, every unit is connected to every unit in the other layer, as shown in Figure 1.
Consider six movies: Terminator, Ben-Hur, Gandhi, Jurassic Park, Titanic, and E.T. (the Extra-Terrestrial). We ask the audience to tell us which movies they are interested in watching. Suppose we want to learn two latent units capturing basic movie preferences; in this set of six movies, two categories appear: science fiction/fantasy (Terminator, Jurassic Park, and E.T.) and Oscar-winning movies (Ben-Hur, Titanic, and Gandhi). The latent units will correspond to these categories, and the RBM then looks like Figure 1(b). Connections between neurons are bidirectional and symmetric: during the training phase, information flows in both directions, and the corresponding weights are identical in both directions.
Primarily, the network is trained on a data set by setting the neurons on the visible layer to match the data points in that data set. With unsupervised learning, once the network is trained it can be applied to new, unseen data to make a classification. During text summarization, the given text documents are pre-processed using different pre-processing techniques and then transformed into a feature matrix; the RBM receives each row of this feature matrix as input. In the present text summarization algorithm, a fuzzy classifier assigns class labels to the sentences based on the structured matrix, and a rule selector is used to calculate the relevance of each sentence. A new feature matrix is formed by dividing the corresponding sentences using the rules. The collection of top (high-priority) words from the RBM input query is compared with the sentence-vector word output to generate the extractive summary of the text document.
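The movie-preference example above can be sketched as a minimal RBM trained with one-step contrastive divergence (CD-1). This is an illustrative simplification, not the authors' code; the data, learning rate, and epoch count are assumptions. Note that the same symmetric weight matrix W is used for both the visible-to-hidden and hidden-to-visible passes, reflecting the bidirectional connections described above.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Visible units: six movies (Terminator, Ben-Hur, Gandhi,
# Jurassic Park, Titanic, E.T.); hidden units: two latent genres.
n_visible, n_hidden = 6, 2
W = rng.normal(0, 0.1, size=(n_visible, n_hidden))
b_v = np.zeros(n_visible)   # visible biases
b_h = np.zeros(n_hidden)    # hidden biases

# Each row: one viewer's binary movie choices.
data = np.array([
    [1, 0, 0, 1, 0, 1],     # sci-fi/fantasy fan
    [0, 1, 1, 0, 1, 0],     # Oscar-winner fan
    [1, 0, 0, 1, 0, 1],
    [0, 1, 1, 0, 1, 0],
], dtype=float)

lr = 0.1
for epoch in range(500):
    v0 = data
    p_h0 = sigmoid(v0 @ W + b_h)                       # up pass
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)  # sample hidden states
    p_v1 = sigmoid(h0 @ W.T + b_v)                     # down pass (reconstruction)
    p_h1 = sigmoid(p_v1 @ W + b_h)
    # CD-1 updates: data statistics minus reconstruction statistics.
    W += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / len(data)
    b_v += lr * (v0 - p_v1).mean(axis=0)
    b_h += lr * (p_h0 - p_h1).mean(axis=0)

# Hidden activations for each viewer; viewers with identical choices
# get identical activations, and the two groups are driven apart.
acts = sigmoid(data @ W + b_h)
```

After training, the two hidden units tend to specialize on the two viewer groups, mirroring the science-fiction vs. Oscar-winner categories in the example.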
Figure 1. (a) Restricted Boltzmann Machine; (b) RBM example.
Figure 2. Process flow in multi-document text summarization.
B. Preprocessing
In the preprocessing phase, the system takes as input multiple documents from the DUC 2002 dataset which need to be summarized. Preprocessing involves 1) segmentation, 2) stop-word removal, and 3) stemming.
1) Segmentation
Sentence segmentation is performed by identifying the delimiter (full stop) ".". This step separates the sentences in the given input documents so that each individual sentence in a document can be understood.
2) Stop Words Removal
Insignificant and noisy words are identified and removed in the stop-word removal step. For instance, predefined words such as "a", "an", "by", "in", "and", and "this" are separated out before the rest of the pre-processing phase.
3) Stemming
Stemming is the process of reducing a word to its base (or root) form, called the stem. Stemming uses the singular form of a word instead of the plural, and removes the prefix and suffix of a word to obtain its root form. For example, the words "presentation", "presented", and "presenting" can all be reduced to the common base form "present". There are many algorithms, called stemmers, used to perform the stemming process.
C. Deep Learning Algorithm with Fuzzy Logic
The proposed algorithm efficiently combines a deep learning algorithm with fuzzy logic in two phases, namely the training phase and the testing phase. The training phase generates the text summary from the given input documents using the deep learning algorithm along with a fuzzy logic classifier. The testing phase is implemented to check the efficiency of the algorithm.
a) Phase I: Training Phase
The training phase uses the deep learning algorithm to generate the text summary. The features extracted from the multiple text documents are the most important attributes for the summarization process. In the training phase, the proposed approach defines seven features:
1. Title Similarity Feature, 2. Positional Feature, 3. Term Weight Feature, 4. Concept Feature, 5. Sentence to Centroid Similarity Feature, 6. Number of Numerals Feature, 7. POS Tagger Feature.
1) Title Similarity Feature
The title similarity feature is calculated as the ratio of the number of words in the sentence that also occur in the title to the total number of words in the title:

f1(Title Feature) = |S ∩ t| / |t|    (1)

where f1 = the feature extracted according to the title similarity of the documents, S = the set of words extracted by analyzing the sentences present in each document, and t = the set of words extracted by analyzing the titles of each document.
2) Positional Feature
The positional score of a sentence is calculated directly: at the beginning of each paragraph a fresh discussion is started, and at the end of each paragraph there is a final closing, so in these positions the feature value f2 is assigned as 1; otherwise, if the sentence is in the middle of the paragraph, the feature value is assigned as 0.

f2(Positional Feature) = {1, if the sentence is the first or last sentence of a paragraph; 0, otherwise}    (2)

3) Term Weight Feature
The term frequency of a word is given by TF(t, d), where t is the given word (term) and d is the text present in the document. The total term weight is calculated as the product of the term frequency and the IDF for a document, where IDF is the inverse document frequency. IDF indicates whether the term is frequent or infrequent across the whole set of input documents. We obtain IDF by dividing the total number of documents by the number of documents containing the term, then computing the log of that quotient:

IDF(t, D) = log( |D| / |{d ∈ D : t ∈ d}| )    (3)

where |D| = the total number of documents and |{d ∈ D : t ∈ d}| = the number of text documents in which the term t appears. The total term weight TF×IDF is calculated as follows:

f3(Term Weight) = TF×IDF(t, d, D) = TF(t, d) × IDF(t, D)    (4)
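Equations (1)-(4) can be sketched in plain Python. The tokenization details and the natural-log base in IDF are assumptions, since the paper does not specify them:

```python
import math

def tokens(text):
    """Crude tokenizer: lowercase and strip trailing punctuation (an assumption)."""
    return [w.lower().strip(".,") for w in text.split() if w.strip(".,")]

def f1_title_similarity(sentence, title):
    """Eq. (1): |S ∩ t| / |t|."""
    s, t = set(tokens(sentence)), set(tokens(title))
    return len(s & t) / len(t) if t else 0.0

def f2_positional(index, n_sentences):
    """Eq. (2): 1 for the first or last sentence of a paragraph, else 0."""
    return 1 if index in (0, n_sentences - 1) else 0

def f3_term_weight(term, doc_tokens, all_docs_tokens):
    """Eqs. (3)-(4): TF(t, d) x IDF(t, D)."""
    tf = doc_tokens.count(term)
    n_with_term = sum(1 for d in all_docs_tokens if term in d)
    if n_with_term == 0:
        return 0.0
    idf = math.log(len(all_docs_tokens) / n_with_term)
    return tf * idf
```

For example, a sentence containing every title word scores f1 = 1.0, and a term appearing in one of two documents gets IDF = log 2.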
4) Concept Feature
The concept feature of the text document is obtained using mutual information and a windowing process. In the windowing process, a virtual window of size k is moved over the document from left to right. We then trace the co-occurrence of words within the same window, which is obtained by the following formula:

f4(Concept Feature) ⇒ MI(wi, wj) = log2( P(wi, wj) / (P(wi) × P(wj)) )    (5)

where P(wi, wj) = the joint probability of the co-occurrence of the keywords in a text window, and P(wi) = the probability that the keyword wi appears in a text window, calculated as

P(wi) = |swt| / |sw|    (6)

where swt = the total number of windows containing the keyword and sw = the total number of windows constructed from the text document.
5) Sentence to Centroid Similarity Feature
The sentence having the maximum TF-IDF score is treated as the centroid sentence, so we calculate the cosine similarity of each sentence with that centroid sentence:

f5(Sentence_Similarity) ⇒ cosine(sentence, centroid)    (7)

6) Number of Numerals Feature
Numerals play a crucial role in representing facts, so this feature gives more attention to sentences containing figures. For each sentence, we calculate the ratio of numerals to the total number of words in the sentence:

f6(Sentence_Numerals) = (number of numerals in the sentence) / (total number of words in the sentence)    (8)

7) POS Tagger Feature
A POS tagger categorizes the words of the text document by part of speech, such as noun, adjective, verb, and adverb. Dynamic programming and algorithms such as hidden Markov models are used to perform this task. The POS tags of each document form feature seven (f7) of the given text documents.
b) Feature Matrix
Consider the sentence matrix S = (S1, S2, …, Sn), where Si = (f1, f2, …, f7), 1 ≤ i ≤ n, is the feature vector. For the proposed multi-document text summarization algorithm, these seven features are the most important attributes. All the text documents under observation are subjected to feature extraction, and a set of features is extracted accordingly. The feature matrix is then formed from the collected features by mapping them to feature values, according to the sentences extracted from the multiple text documents. Apart from the seven features, an additional attribute, the class label of each sentence, is also associated with the feature matrix. Here, the S attribute represents the sentences together with the class label and class value of each sentence. Usually, class labels and class values are assigned manually by domain experts, but in this approach we use a fuzzy classifier to assign the class label to each sentence. The function of the fuzzy classifier is to assign class labels to the sentences, based on fuzzy rules, by processing the sentences. Figure 3 represents the feature matrix for the given set of text documents:

            f1  f2  f3  f4  f5  f6  f7  C
      s1  [ ..  ..  ..  ..  ..  ..  ..  .. ]
S =   s2  [ ..  ..  ..  ..  ..  ..  ..  .. ]
      ...
      sn  [ ..  ..  ..  ..  ..  ..  ..  .. ]

Figure 3. Feature matrix for the given set of text documents.
c) Fuzzy Logic System
The proposed algorithm utilizes fuzzy logic to assign class labels and to compute the significance of each sentence. A pre-summarized set of documents is given as input to the fuzzy logic system. It has three main components: the fuzzifier, the rule selector, and the de-fuzzifier.
Fuzzifier:
The role of the fuzzifier in the proposed approach is to translate the inputs into fuzzy values; the fuzzifier assigns a value from 1 to 7 to each feature. It generates fuzzy rules for each sentence according to the weight given to the features based on the fuzzy values. The fuzzy rules are defined in order to weigh the feature values when judging significant sentences. If a feature has the value VERY LOW, it assigns the least importance to the sentence; sentences are considered increasingly important for the values LOW, MEDIUM, HIGH, and VERY HIGH. Thus, if a fuzzy rule assigns a sentence all seven feature values of 1, that sentence is considered least important for the summary, and vice versa. In this way, a set of rules is framed by comparing the sentences from the set of documents with the sentences from the multi-document text summary.
Rule Selector:
The rule selector selects the prominent rules needed for text summarization from the set of rules generated by the fuzzifier and stores them in a set.
De-fuzzifier:
The de-fuzzifier selects the needed rules from the rule selector and assigns a fuzzy score to each sentence accordingly. Finally, the de-fuzzifier is used for data preparation for the deep learning algorithm: it alters the feature matrix based on the feature values assigned to a specific rule and derives the fuzzy logic score by evaluating the feature values. The newly created feature matrix, which is the input to the deep learning algorithm, is formed by dividing the rules into the corresponding sentences.
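A minimal sketch of the fuzzifier and de-fuzzifier stages described above, assuming normalized feature values in [0, 1] and a simple sum-of-ranks score. The label thresholds and the scoring rule are illustrative assumptions; the paper's actual rule base is derived from pre-summarized documents:

```python
LEVELS = ["VERY LOW", "LOW", "MEDIUM", "HIGH", "VERY HIGH"]

def fuzzify(value):
    """Map a normalized feature value in [0, 1] to one of the five fuzzy labels."""
    idx = min(int(value * len(LEVELS)), len(LEVELS) - 1)
    return LEVELS[idx]

def defuzzify(labels):
    """Fuzzy score of a sentence: the sum of label ranks, so a sentence whose
    features are all VERY LOW scores lowest, and vice versa."""
    return sum(LEVELS.index(label) for label in labels)

def fuzzy_score(feature_vector):
    """Fuzzify each of the seven features, then de-fuzzify to a single score."""
    return defuzzify([fuzzify(v) for v in feature_vector])
```

Under this sketch, a sentence with feature vector [0.9] * 7 outranks one with [0.1] * 7, matching the VERY LOW vs. VERY HIGH ordering described above.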
d) Deep Learning Algorithm
The sentence matrix S = (S1, S2, …, Sn) contains the feature vectors si, where each si is the set of all seven features extracted for sentence si. The deep architecture of the RBM [17] takes this set as input at the visible layer. A few random values are selected for the biases hi, where i = 1, 2, 3, …, because the RBM has at least two hidden layers. The whole process is described briefly as follows: S = (S1, S2, …, Sn), where Si = (f1, f2, …, f7), 1 ≤ i ≤ n, and n = the total number of sentences in the document. The RBM consists of two hidden layers, so two sets of bias values, H0 and H1, are selected randomly and operated on the sentence matrix given as input to the RBM: H0 = {h0, h1, h2, …, hn}; H1 = {h0, h1, h2, …, hn}.
The RBM works in the two steps given below:
Step 1: The sentence matrix S = (S1, S2, …, Sn), with the seven features of each sentence, is given as input. A new refined sentence matrix is produced during the first cycle of the RBM by performing

Si + hi    (9)

Hence S' = (S1', S2', …, Sn').
Step 2: The same method is applied to the new refined set with H1 to obtain a more refined sentence matrix, given by S'' = (S1'', S2'', …, Sn''). The refined sentence matrix from the RBM is further tested against a specific, randomly generated threshold value for each feature. If a feature value is less than the threshold value, it is filtered out, and the remainder becomes a member of the new feature vector set. The deep learning algorithm generates a good feature vector set in this initial phase. The feature vector set is then fine-tuned by adjusting the weights of the units of the RBM using the back-propagation algorithm, in order to identify a good, optimal feature vector set for a brief contextual summary of the text. At this stage, the deep learning algorithm uses the cross-entropy error, calculated for each feature of every sentence, to fine-tune the new feature vector set.
e) Summary Generation
Step 1: The sentence score is the ratio of the number of words common to the user query and the specific sentence to the total number of words in the document. It is given by:

Sc = |S ∩ Q| / Wc    (10)

where Sc = the sentence score, S = the sentence, Q = the user query, and Wc = the total word count of the text.
Step 2: Sentence ranking is performed using the sentence scores obtained in Step 1. Based on the sentence scores, the sentences are arranged in descending order, and the top-N sentences are selected using a compression rate given as input by the user. The number of top sentences is found as follows:

N = Ns × C    (11)

where Ns = the number of sentences in the document and C = the compression rate (given by the user). This is the final step in summary generation; the final set of sentences is then obtained.
IV. RESULT ANALYSIS
In this approach, a set of documents on related topics is given as input, and the efficiency of the proposed method is evaluated based on metrics such as recall, precision, and F-measure.

Figure 4. Comparative analysis.

The maximum recall, precision, and F-measure values for the present dataset are 0.40, 0.92, and 0.55, respectively.
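Steps 1 and 2 of summary generation (Eqs. (10) and (11)) can be sketched as follows. Treating Wc as the total word count over all sentences, and rounding Ns × C down with a floor of one sentence, are assumptions where the paper leaves the details open:

```python
def sentence_score(sentence, query, total_words):
    """Eq. (10): words shared by the sentence and the user query,
    divided by the total word count Wc."""
    shared = set(sentence.lower().split()) & set(query.lower().split())
    return len(shared) / total_words

def generate_summary(sentences, query, compression_rate):
    """Eq. (11): rank sentences by score in descending order and
    keep the top Ns x C of them."""
    total_words = sum(len(s.split()) for s in sentences)
    ranked = sorted(sentences,
                    key=lambda s: sentence_score(s, query, total_words),
                    reverse=True)
    n_keep = max(1, int(len(sentences) * compression_rate))
    return ranked[:n_keep]
```

For instance, with three sentences and a compression rate of about one third, only the sentence sharing the most words with the query survives.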
V. CONCLUSION
In this paper, we presented a multi-document text summarization scheme using an unsupervised deep learning algorithm along with fuzzy logic. A feature matrix with seven features is built from a sample dataset from DUC 2002 (Document ID: AP880911-0016). The feature matrix is passed through the various levels of the RBM, and finally an efficient text summary is generated. The results indicate that this method generates a more efficient text summary than previous methods, based on the evaluation metrics.
REFERENCES
[1] Aarti Komal Pharande, Dipali Nale, Roshani Agrawal, "Automatic Text Summarization", Vol. 109, No. 17, January 2015.
[2] Yapinus, G., Erwin, A., Galinium, M., Muliady, W., "Automatic multi-document summarization for Indonesian documents using hybrid abstractive-extractive summarization technique", pp. 1-5, Oct. 2014.
[3] Khosrow Kaikhah, "Automatic Text Summarization with Neural Networks", International Conference on Intelligent Systems, IEEE, pp. 40-44, USA, June 2004.
[4] G. Erkan and Dragomir R. Radev, "LexRank: Graph-based Centrality as Salience in Text Summarization", Journal of Artificial Intelligence Research, Vol. 22, pp. 457-479, 2004.
[5] Joeran Beel, "Research-paper recommender systems: a literature survey", Vol. 17, Issue 4, pp. 305-338, November 2016.
[6] KyoJoong, "Research Trend Analysis using Word Similarities and Clusters", Vol. 8, No. 1, January 2013.
[7] Joel Larocca Neto, Alex A. Freitas and Celso A. A. Kaestner, "Automatic Text Summarization using a Machine Learning Approach", Advances in Artificial Intelligence: Lecture Notes in Computer Science, Springer Berlin/Heidelberg, Vol. 2507, pp. 205-215, 2002.
[8] Meng Wang, Xiaorong Wang and Chao Xu, "An Approach to Concept-Oriented Text Summarization", in Proceedings of ISCIT'05, IEEE International Conference, China, pp. 1290-1293, 2005.
[9] Farshad Kyoomarsi, Hamid Khosravi, Esfandiar Eslami and Pooya Khosravyan Dehkordy, "Optimizing Text Summarization Based on Fuzzy Logic", in Proceedings of the Seventh IEEE/ACIS International Conference on Computer and Information Science, IEEE, University of Shahid Bahonar Kerman, UK, pp. 347-352, 2008.
[10] Junlin Zhang, Le Sun and Quan Zhou, "A Cue-based Hub-Authority Approach for Multi-Document Text Summarization", in Proceedings of NLP-KE'05, IEEE, pp. 642-645, 2005.
[11] David, "Multilingual Single Document Keyword Extraction for Information Retrieval", Proceedings of NLP-KE'05, IEEE, Tokushima, 2005.
[12] Ladda Suanmali, Mohammed Salem Binwahlan and Naomie Salim, "Sentence Features Fusion for Text Summarization using Fuzzy Logic", IEEE, pp. 142-145, 2009.
[13] Yan Liu, Sheng-hua Zhong, Wen-jie Li, "Query-oriented Unsupervised Multi-document Summarization via Deep Learning", Elsevier Science, 2008, pp. 3306-3309.
[14] B. Arman Kiani, Akbarzadeh, "Automatic Text Summarization using: Hybrid Fuzzy GA-GP", IEEE International Conference on Fuzzy Systems, 2006.
[15] C. Gordon and R. Debus, "Developing deep learning approaches and personal teaching efficacy within a preservice teacher education context", British Journal of Educational Psychology, Vol. 72, No. 4, pp. 483-511, 2002.
[16] L. Zadeh, "Fuzzy sets", Information and Control, Vol. 8, pp. 338-353, 1965.
[17] "An Approach For Text Summarization Using Deep Learning Algorithm", Journal of Computer Science, Vol. 6, No. 11, 2013.
Electronic copy available at: https://ssrn.com/abstract=3165331