Multi-Document Text Summarization Using Deep Learning Algorithm With Fuzzy Logic

S. Sudha Lakshmi (1), Dr. M. Usha Rani (2)
(1) Research Scholar, Dept. of Computer Science, SPMVV, Tirupati, India (e-mail: s_sudhamca@yahoo.com)
(2) Professor, Dept. of Computer Science, SPMVV, Tirupati, India (e-mail: musha_rohan@yahoo.co.in)

ABSTRACT
Multi-document text summarization extracts the key information from a collection of related documents and presents it as a brief summary. In this paper we present multi-document text summarization using a deep learning algorithm with fuzzy logic, an important research area in natural language processing (NLP), data mining (DM) and machine learning (ML). To improve accuracy, a Restricted Boltzmann Machine is used to generate a shortened version of the original documents without losing their valuable information. The method consists of two steps: (1) a training phase and (2) a testing phase. The role of the training phase is to generate an effective summary; the testing phase is then used to validate the efficiency and accuracy of the proposed method.

Index Terms: Deep Learning, Fuzzy Logic, Multi-document Summary, RBM.

I. INTRODUCTION
Due to the rapid growth of documents on the internet, users require all related information in one place without any hassle. Automatic text summarization is a mechanism for generating a short, meaningful text that summarizes a larger textual content by means of a computer algorithm [1]. Text summarization can be classified into two categories, abstractive and extractive, applied to single or multiple documents [2]. Methods and approaches currently in development include neural networks [3], graph-theoretic methods [4], term frequency-inverse document frequency (TF-IDF) [5, 6], cluster-based methods [6], machine learning [7], concept-oriented methods [8], fuzzy logic [9, 12], multi-document summarization [10] and multilingual extraction [11]. In this paper, multi-document text summarization using fuzzy logic combined with a deep learning algorithm is presented. Recent research has shown that deep learning has a strong influence on the multi-document summarization process, since it can surface the most important objects from a collection of objects. The proposed text summarization method has two phases.

Phase I: Feature extraction from multiple documents. A feature matrix is generated from the features extracted from each sentence. This matrix is processed by a fuzzy classifier, and a new feature matrix is generated from the fuzzy scores produced by the rules of the classifier. The new feature matrix is then processed, layer by layer, by the deep learning algorithm, which generates the text summary.

Phase II: The testing phase, which checks the efficiency of the proposed approach.
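To make the two-phase flow concrete, the following is a minimal, runnable sketch of the overall pipeline. It is not the authors' implementation: the function names are illustrative, a single toy feature stands in for the full seven-feature matrix, and the fuzzy-scoring and RBM-refinement stages are elided.

```python
# Minimal sketch of the Phase I pipeline described above (illustrative only).

def segment(text):
    """Split raw text into sentences on the full-stop delimiter."""
    return [s.strip() for s in text.split(".") if s.strip()]

def feature_matrix(sentences, title):
    """One toy feature per sentence: overlap with the title (stands in for
    the seven-feature matrix built in Phase I)."""
    t = set(title.lower().split())
    return [len(t & set(s.lower().split())) / max(len(t), 1) for s in sentences]

def summarize(text, title, compression=0.5):
    sents = segment(text)
    scores = feature_matrix(sents, title)      # fuzzy scoring + RBM refinement would follow here
    n = max(1, int(len(sents) * compression))  # top-N via the compression rate
    ranked = sorted(range(len(sents)), key=lambda i: -scores[i])
    return ". ".join(sents[i] for i in sorted(ranked[:n])) + "."

print(summarize("Deep learning refines features. The weather was pleasant. "
                "Fuzzy logic scores sentences.", "deep learning fuzzy summarization"))
```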
II. PREVIOUS WORK
El-Haj et al. proposed a generic extractive multi-document summarization technique for Arabic and English; it was ranked among the top five systems in the DUC-2002 multi-document summarization task. Deep learning has shown a strong impact in many areas, particularly in natural language processing tasks (Collobert et al., 2011; Srivastava and Salakhutdinov, 2012). Yan Liu et al. [13] proposed a document summarization framework using a deep learning model that showed notable extraction ability. The framework comprises three stages: (1) concept extraction, (2) summary generation and (3) reconstruction validation; using dynamic programming over the concepts extracted by the deep architecture, important sentences are extracted as the summary. Kiani et al. [14] proposed a novel approach that extracts sentences with an evolutionary fuzzy inference engine. F. Kyoomarsi et al. [15] presented an approach that uses fuzzy logic and WordNet to extract the most relevant sentences from an original document; their experimental results show better performance than other commercially available text summarizers at extracting relevant sentences. In [16], Witt et al. present a fuzzy concept approach based on coreference resolution. In all of the above work, however, some of the extracted sentences are too short or too long; to address this issue, this paper uses a deep learning algorithm with fuzzy logic to generate brief, relevant summaries.

III. METHODOLOGY
To summarize text, a model is needed to process the text into a form that can be given as input to a Restricted Boltzmann Machine (RBM).

A. Restricted Boltzmann Machine (RBM)
A Restricted Boltzmann Machine fundamentally performs a binary version of factor analysis. In brief technical terms, an RBM is a stochastic neural network ("stochastic" meaning that its activations have a probabilistic component) consisting of:
- one layer of visible units (the users' movie choices, whose states we know and set);
- one layer of hidden units (the latent factors we try to learn);
- a bias unit (whose state is always on, a way of adjusting for the different inherent popularity of each movie).

There are no connections between units within a layer; every visible unit is connected to every hidden unit, as shown in Figure 1. Consider six movies: Terminator, Ben-Hur, Gandhi, Jurassic Park, Titanic and E.T. (the Extra-Terrestrial). We ask the audience which movies they are interested in watching. If we want to learn two latent units of basic movie preference, two natural categories in this set are science fiction/fantasy (Terminator, Jurassic Park and E.T.) and Oscar winners (Ben-Hur, Titanic and Gandhi), so the latent units will come to correspond to these categories; the RBM then looks like Figure 1(b). Connections between neurons are bidirectional and symmetric: during training, information flows in both directions, and the same weights are used in both directions. The network is first trained on a data set by setting the neurons of the visible layer to match the data points in that set. Once trained in this unsupervised fashion, the network can be applied to new, unseen data to perform classification. For text summarization, the given documents are preprocessed using several pre-processing techniques and then transformed into a feature matrix; the RBM takes each row of this feature matrix as input. In the present summarization algorithm, a fuzzy classifier assigns class labels to the sentences based on this structured matrix.
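To make the structure concrete, the following is a minimal Bernoulli RBM trained with one-step contrastive divergence (CD-1) on the six-movie example above. This is a generic textbook sketch, not the paper's implementation; the layer sizes, learning rate and toy viewing data are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 2            # 6 movies, 2 latent "taste" units
W = rng.normal(0, 0.1, (n_visible, n_hidden))
b_v = np.zeros(n_visible)             # visible biases
b_h = np.zeros(n_hidden)              # hidden biases

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Each row: which of the six movies a viewer liked
# [Terminator, Ben-Hur, Gandhi, Jurassic Park, Titanic, E.T.]
data = np.array([[1, 0, 0, 1, 0, 1],    # science-fiction fan
                 [0, 1, 1, 0, 1, 0],    # Oscar-winner fan
                 [1, 0, 0, 1, 1, 1]],   # mostly science fiction
                dtype=float)

lr = 0.1
for epoch in range(2000):
    v0 = data
    p_h0 = sigmoid(v0 @ W + b_h)                        # hidden given visible
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)  # stochastic activation
    p_v1 = sigmoid(h0 @ W.T + b_v)                      # reconstruct visible
    p_h1 = sigmoid(p_v1 @ W + b_h)                      # hidden given reconstruction
    W += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / len(data) # CD-1 weight update
    b_v += lr * (v0 - p_v1).mean(axis=0)
    b_h += lr * (p_h0 - p_h1).mean(axis=0)

# After training, the two hidden units tend to specialize into the two genres.
print(np.round(sigmoid(data @ W + b_h), 2))
```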
Similarly, a rule selector is used to compute the relevance of each sentence, and a new feature matrix is formed by applying the selected rules to the corresponding sentences. The RBM takes the input query together with a collection of the top (high-priority) words, and its sentence-vector output is compared against them to generate the extractive summary of the text documents.

Figure 1. (a) Restricted Boltzmann Machine; (b) RBM example.
Figure 2. Process flow in multi-document text summarization.

B. Preprocessing
In the preprocessing phase, the system takes as input multiple documents from the DUC 2002 dataset, which are to be summarized. Preprocessing involves (1) segmentation, (2) stop-word removal and (3) stemming.

1) Segmentation: Sentence segmentation is performed by identifying the full-stop delimiter ".". This step separates the sentences of the input documents so that each individual sentence can be handled on its own.

2) Stop-Word Removal: Insignificant, noisy words are identified and removed. For instance, predefined words such as "a", "an", "by", "in", "and" and "this" are filtered out before further processing.

3) Stemming: Stemming reduces a word to its base (root) form, called the stem, for example using the singular form of a word instead of its plural, and removing the prefix and suffix of a word to obtain its root. For example, the words "presentation", "presented" and "presenting" can all be reduced to the common base form "present". Many algorithms, called stemmers, exist to perform stemming.

C. Deep Learning Algorithm with Fuzzy Logic
The proposed algorithm efficiently combines a deep learning algorithm with fuzzy logic in two phases, a training phase and a testing phase. The training phase generates the text summary from the input documents using the deep learning algorithm together with a fuzzy logic classifier; the testing phase checks the efficiency of the algorithm.

a) Phase I: Training Phase
The training phase uses the deep learning algorithm to generate the text summary. Features extracted from the multiple text documents are the key attributes of the summarization process. The proposed approach defines seven features: (1) title similarity, (2) positional, (3) term weight, (4) concept, (5) sentence-to-centroid similarity, (6) number of numerals and (7) POS tagger.

1) Title Similarity Feature: The title similarity feature is the ratio of the number of words in the sentence that also occur in the title to the total number of words in the title:

f1(Title feature) = |S ∩ t| / |t|    (1)

where S is the set of words extracted from the sentence and t is the set of words extracted from the document title.

2) Positional Feature: The positional score reflects the observation that a fresh discussion starts at the beginning of a paragraph and a closing statement appears at its end:

f2(Positional feature) = { 1, if the sentence is the first or last sentence of a paragraph; 0, otherwise }    (2)

(A code sketch of the preprocessing steps and these first two features follows.)
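The sketch below implements the preprocessing steps and the features f1 and f2 under simple assumptions: regex tokenization, a tiny illustrative stop-word list, and crude suffix stripping in place of a real stemmer. None of these choices come from the paper.

```python
import re

STOPWORDS = {"a", "an", "the", "by", "in", "and", "this", "of", "to", "is"}

def preprocess(sentence):
    words = re.findall(r"[a-z]+", sentence.lower())              # tokenize
    words = [w for w in words if w not in STOPWORDS]             # stop-word removal
    return [re.sub(r"(ing|ed|ation|s)$", "", w) for w in words]  # toy stemming

def title_feature(sentence, title):
    """f1: title words occurring in the sentence / total title words (Eq. 1)."""
    s, t = set(preprocess(sentence)), set(preprocess(title))
    return len(s & t) / len(t) if t else 0.0

def positional_feature(index, n_sentences):
    """f2: 1 for the first or last sentence of a paragraph, else 0 (Eq. 2)."""
    return 1.0 if index in (0, n_sentences - 1) else 0.0

para = ["Presenting a summary helps readers.", "The weather is nice.",
        "The summary is presented briefly."]
title = "Presenting summaries"
for i, s in enumerate(para):
    print(round(title_feature(s, title), 2), positional_feature(i, len(para)))
```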
3) Term Weight Feature: The term frequency of a word is TF(t, d), where t is the term and d the text of the document. The total term weight is the product of the term frequency and the inverse document frequency (IDF), where IDF indicates whether the term is frequent or rare across the whole set of input documents. IDF is obtained by dividing the total number of documents by the number of documents containing the term and taking the logarithm of that quotient:

IDF(t, D) = log( |D| / |{d ∈ D : t ∈ d}| )    (3)

where |D| is the total number of documents and |{d ∈ D : t ∈ d}| is the number of documents in which the term t appears. The total term weight is then

f3(Term weight) = TF-IDF(t, d, D) = TF(t, d) × IDF(t, D).    (4)

4) Concept Feature: The concept feature is obtained from the text using mutual information and a windowing process: a virtual window of size k is moved over the document from left to right, and the co-occurrence of words within the same window is traced using

f4(Concept feature) ⇒ MI(wi, wj) = log2( P(wi, wj) / (P(wi) × P(wj)) )    (5)

where P(wi, wj) is the joint probability that the two keywords co-occur in a text window, and P(wi) is the probability that keyword wi appears in a text window, calculated as

P(wi) = |swt| / |sw|    (6)

where |swt| is the number of windows containing the keyword and |sw| is the total number of windows constructed from the text document.

5) Sentence-to-Centroid Similarity Feature: The sentence with the maximum TF-IDF score is treated as the centroid sentence, and the cosine similarity of each sentence with the centroid is computed:

f5(Sentence similarity) ⇒ cosine(sentence, centroid).    (7)

6) Number of Numerals: Numerals play a crucial role in representing facts, so this feature gives more weight to sentences containing figures. For each sentence, we calculate

f6(Sentence numerals) = (number of numerals in the sentence) / (total number of words in the sentence).    (8)

7) POS Tagger Feature: A part-of-speech tagger categorizes the words of the document as nouns, adjectives, verbs, adverbs and so on; dynamic programming and algorithms such as hidden Markov models are used to perform this task. The POS tags of each document constitute the seventh feature (f7).
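The corpus-level features f3 and f4 can be sketched as follows on pre-tokenized documents. The window size k and the toy corpus are illustrative choices, not values taken from the paper.

```python
import math
from collections import Counter

def idf(term, docs):
    """Eq. 3: log(total documents / documents containing the term)."""
    df = sum(term in d for d in docs)
    return math.log(len(docs) / df) if df else 0.0

def term_weight(term, doc, docs):
    """Eq. 4: f3 = TF(t, d) x IDF(t, D)."""
    return Counter(doc)[term] * idf(term, docs)

def concept_feature(wi, wj, doc, k=4):
    """Eqs. 5-6: pointwise mutual information of wi, wj over sliding windows."""
    windows = [set(doc[i:i + k]) for i in range(len(doc) - k + 1)]
    p = lambda *ws: sum(all(w in win for w in ws) for win in windows) / len(windows)
    p_ij, p_i, p_j = p(wi, wj), p(wi), p(wj)
    return math.log2(p_ij / (p_i * p_j)) if p_ij and p_i and p_j else 0.0

docs = [["deep", "learning", "refines", "features"],
        ["fuzzy", "logic", "scores", "sentences"],
        ["deep", "learning", "with", "fuzzy", "logic"]]
print(round(term_weight("deep", docs[2], docs), 3))   # f3 for "deep" in doc 3
print(concept_feature("fuzzy", "logic", docs[2], k=2))  # f4 for a word pair
```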
b) Feature Matrix
Consider the sentence matrix S = (S1, S2, …, Sn), where Si = (f1, f2, …, f7), 1 ≤ i ≤ n, is the feature vector of sentence i. For the proposed multi-document summarization algorithm these seven features are the most important attributes. All text documents under observation are subjected to feature extraction, and the corresponding features are collected. The feature matrix is then formed by mapping the extracted features to their feature values, with one row per sentence extracted from the multiple text documents. In addition to the seven features, a class label is associated with each sentence in the matrix. Class labels and class values are usually assigned manually by domain experts; in this approach, a fuzzy classifier assigns the class labels to the sentences by processing them against fuzzy rules. Figure 3 shows the feature matrix for the set of text documents under consideration.

Figure 3. Feature matrix for the given set of text documents (rows s1…sn; columns f1…f7 and class label C).

c) Fuzzy Logic System
The proposed algorithm uses fuzzy logic to assign class labels and to compute the significance of each sentence. Pre-summarized sets of documents are given as input to the fuzzy logic system, which has three main components: the fuzzifier, the rule selector and the defuzzifier.

Fuzzifier: The fuzzifier translates the inputs into feature values, assigning a value from 1 to 7 to each feature, and generates fuzzy rules for each sentence according to the weights given to the features based on the fuzzy values. The fuzzy rules are defined so that feature values can be used to judge significant sentences: a feature with the value VERY LOW assigns the least importance to the sentence, while sentences are considered increasingly important for the values LOW, MEDIUM, HIGH and VERY HIGH. Thus, if a fuzzy rule assigns a sentence the value 1 for all seven features, that sentence is considered least important for the summary, and vice versa. The set of rules is framed by comparing the sentences of the source documents with the sentences of the multi-document summary.

Rule Selector: The rule selector selects the prominent rules needed for summarization from the set of rules generated by the fuzzifier and stores them in a set.

Defuzzifier: The defuzzifier selects the needed rules from the rule selector and assigns a fuzzy score to each sentence, thereby preparing the data for the deep learning algorithm. It alters the feature matrix according to the feature values assigned by each specific rule and derives the fuzzy logic score by evaluating the feature values. The new feature matrix, which is the input to the deep learning algorithm, is formed by applying the rules to the corresponding sentences.

d) Deep Learning Algorithm
The sentence matrix S = (S1, S2, …, Sn), where Si = (f1, f2, …, f7), 1 ≤ i ≤ n, and n is the total number of sentences in the documents, is given as input to the visible layer of the deep RBM architecture [17]. Because the RBM used here has two hidden layers, two sets of random bias values are selected, H0 = {h0, h1, …, hn} and H1 = {h0, h1, …, hn}, which operate on the sentence matrix. The RBM works in two steps.

Step 1: The sentence matrix S, with its seven features per sentence, is given as input. A refined sentence matrix is produced during the first cycle of the RBM by computing

Si' = Si + hi    (9)

hence S' = (S1', S2', …, Sn').

Step 2: The same operation is applied to the refined set with the second bias set H1, yielding the further refined sentence matrix S'' = (S1'', S2'', …, Sn''). The refined sentence matrix from the RBM is then tested against a randomly generated threshold value for each feature: a feature value below the threshold is filtered out, and the surviving values become members of the new feature-vector set.

The deep learning algorithm thus produces a good set of feature vectors in the initial phase. This set is then fine-tuned by adjusting the weights of the RBM units with the back-propagation algorithm, in order to find an optimal feature-vector set for a brief, contextual summary. At this stage the algorithm uses the cross-entropy error, calculated for every feature of every sentence, to fine-tune the new feature-vector set. (A numerical sketch of these refinement steps follows.)
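The following is a numerical sketch of the two refinement steps and the threshold filter described above, assuming random per-sentence biases and random per-feature thresholds as the text states; the matrix sizes and distributions are illustrative. The cross-entropy here is computed between the original and refined features only to show the form of the fine-tuning signal.

```python
import numpy as np

rng = np.random.default_rng(1)
n_sentences, n_features = 5, 7
S = rng.random((n_sentences, n_features))      # sentence-by-feature matrix

H0 = rng.normal(0, 0.05, (n_sentences, 1))     # first per-sentence bias set
H1 = rng.normal(0, 0.05, (n_sentences, 1))     # second per-sentence bias set

S1 = S + H0                                    # Step 1 (Eq. 9): Si' = Si + hi
S2 = S1 + H1                                   # Step 2: Si'' = Si' + hi

thresholds = rng.random(n_features) * 0.3      # random threshold per feature
refined = np.where(S2 >= thresholds, S2, 0.0)  # drop sub-threshold feature values

# Cross-entropy between original and refined features as a fine-tuning signal
# (values clipped into (0, 1) so the logarithms stay defined).
p = np.clip(S, 1e-6, 1 - 1e-6)
q = np.clip(refined, 1e-6, 1 - 1e-6)
cross_entropy = float(-(p * np.log(q) + (1 - p) * np.log(1 - q)).mean())

print(np.round(refined, 2))
print(round(cross_entropy, 4))
```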
e) Summary Generation
Step 1: The sentence score is the ratio of the number of words common to the user query and the sentence to the total number of words in the document:

Sc = |S ∩ Q| / Wc    (10)

where Sc is the sentence score, S the sentence, Q the user query and Wc the total word count of the text.

Step 2: Sentences are ranked by the score obtained in Step 1 and arranged in descending order. The top N sentences are then selected according to a compression rate supplied by the user:

N = Ns × C    (11)

where Ns is the number of sentences in the document and C is the compression rate (given by the user). This is the final step of summary generation; the selected sentences form the final summary.

IV. RESULT ANALYSIS
In this approach, a set of documents on related topics is given as input, and the efficiency of the proposed method is evaluated using the metrics Recall, Precision and F-measure.

Figure 4. Comparative analysis.

The maximum Recall, Precision and F-measure values for the present dataset are 0.40, 0.92 and 0.55 respectively.

V. CONCLUSION
In this paper we presented a multi-document text summarization scheme using an unsupervised deep learning algorithm together with fuzzy logic. A feature matrix with seven features is built from a sample dataset drawn from DUC 2002 (Document ID: AP8809110016), passed through the various layers of the RBM, and an efficient text summary is finally generated. The results indicate that, on the evaluation metrics, this method generates a more efficient text summary than previous methods.

REFERENCES
[1] A. K. Pharande, D. Nale and R. Agrawal, "Automatic Text Summarization", Volume 109, No. 17, January 2015.
[2] G. Yapinus, A. Erwin, M. Galinium and W. Muliady, "Automatic multi-document summarization for Indonesian documents using hybrid abstractive-extractive summarization technique", pp. 1-5, Oct. 2014.
[3] K. Kaikhah, "Automatic Text Summarization with Neural Networks", International Conference on Intelligent Systems, IEEE, pp. 40-44, USA, June 2004.
[4] G. Erkan and D. R. Radev, "LexRank: Graph-based Lexical Centrality as Salience in Text Summarization", Journal of Artificial Intelligence Research, Vol. 22, pp. 457-479, 2004.
[5] J. Beel et al., "Research-paper recommender systems: a literature survey", Volume 17, Issue 4, pp. 305-338, November 2016.
[6] KyoJoong et al., "Research Trend Analysis using Word Similarities and Clusters", Vol. 8, No. 1, January 2013.
[7] J. Larocca Neto, A. A. Freitas and C. A. A. Kaestner, "Automatic Text Summarization using a Machine Learning Approach", Advances in Artificial Intelligence, Lecture Notes in Computer Science, Vol. 2507, Springer Berlin/Heidelberg, pp. 205-215, 2002.
[8] M. Wang, X. Wang and C. Xu, "An Approach to Concept-Oriented Text Summarization", Proceedings of ISCIT'05, IEEE International Conference, China, pp. 1290-1293, 2005.
[9] F. Kyoomarsi, H. Khosravi, E. Eslami and P. Khosravyan Dehkordy, "Optimizing Text Summarization Based on Fuzzy Logic", Proceedings of the Seventh IEEE/ACIS International Conference on Computer and Information Science, IEEE, pp. 347-352, 2008.
[10] J. Zhang, L. Sun and Q. Zhou, "A Cue-based Hub-Authority Approach for Multi-Document Text Summarization", Proceedings of NLP-KE'05, IEEE, pp. 642-645, 2005.
[11] David, "Multilingual Single Document Keyword Extraction for Information Retrieval", Proceedings of NLP-KE'05, IEEE, Tokushima, 2005.
[12] L. Suanmali, M. S. Binwahlan and N. Salim, "Sentence Features Fusion for Text Summarization using Fuzzy Logic", IEEE, pp. 142-145, 2009.
[13] Y. Liu, S. Zhong and W. Li, "Query-oriented Unsupervised Multi-document Summarization via Deep Learning", Elsevier Science, pp. 3306-3309, 2008.
[14] A. Kiani and M. R. Akbarzadeh, "Automatic Text Summarization using Hybrid Fuzzy GA-GP", IEEE International Conference on Fuzzy Systems, 2006.
[15] C. Gordon and R. Debus, "Developing deep learning approaches and personal teaching efficacy within a preservice teacher education context", British Journal of Educational Psychology, Vol. 72, No. 4, pp. 483-511, 2002.
[16] L. Zadeh, "Fuzzy sets", Information and Control, Vol. 8, pp. 338-353, 1965.
[17] "An Approach for Text Summarization Using Deep Learning Algorithm", Journal of Computer Science, Vol. 6, No. 11, 2013.