related work - Academic Science,International Journal of Computer

RELATIONSHIP ENTITY ATTRIBUTES EXTRACTION USING A FUZZY LOGIC BASED INTELLIGENT AGENT

Dr. K. Perumal, Associate Professor, Department of Computer Applications, Madurai Kamaraj University, Madurai, India. perumalmkucs@gmail.com
S. Nagarajan, Assistant Professor, Department of Computer Applications, Yadava College, Madurai, India. nagasethu2000@yahoo.com

ABSTRACT
A large amount of information is available on the web. Information retrieval and information extraction are the main processes for collecting that information. Our research focuses on information extraction from web sources. Various methods have been proposed for extracting information that consists of individual attributes rather than a particular kind of information. The proposed method is based on a fuzzy logic intelligent agent: the system extracts the entities related to a given keyword, together with their attributes, through the fuzzy intelligent agent. Fuzzy logic is used to manage the membership function values of the relationship entity information.

Keywords: Information retrieval, Information extraction, Fuzzy logic, Fuzzy intelligent agent, Incremental information.

INTRODUCTION
In data mining, information extraction is the process concerned with collecting such information. Extracting particular information from a huge volume of data is a tedious process that involves data analysis, preprocessing, etc. Information extraction was originally used for document retrieval, but in recent years its use has been generalized to searching for other types of information, such as the contents of a database, a web page or, in general, any set of knowledge. Information retrieval deals with voluminous collections of textual material, and its aim is to satisfy user queries and needs [1].
These needs increase when the material is not in text form, when the user is unfamiliar with the subject matter, or when there is ambiguous content, poor organization or, simply, complex topics or a great amount of information that is difficult to manage [2]. The abundance of information brought about by the rise of information technology is an enormous advantage for information seekers. The difficulty lies in finding the particular site requested, that is, in distinguishing the necessary information from a huge quantity of unnecessary data. This led to the development of data mining [3]. One approach combines a Vector Space Model (VSM) classification of contents, built by creating a few indexes based on keywords, with a consultation method based on a Fuzzy Logic (FL) application whose interface lets the user interact in natural language. In this work, fuzzy logic is employed for information retrieval. Extracting information from popular web site pages allows users to create extensive databases of entities. These databases can then be queried by search engines to improve the ranking and rendering of search results, and by users to access product features, reviews, etc. [4].

Fuzzy logic is an ideal tool for managing this kind of vague and heterogeneous information. Moreover, this method has been implemented and validated for information extraction in web portals where the information provided was imprecise and disordered. A fuzzy logic system also gives flexibility in term weighting: more important than having a concrete value for each weight is the fact that a feature is represented by a word. The information retrieval process includes the extraction, generalization, validation and interpretation of the database contents to obtain previously unknown, potentially useful information. In the proposed system, an intelligent agent performs the extraction.

RELATED WORK
Bhavana Dalvi, William W. Cohen and Jamie Callan [5] discuss the extraction of entity sets from the web using unsupervised information extraction. They describe open-domain information extraction of concept-instance pairs from an HTML corpus. The approach relies on a novel method for clustering terms found in HTML tables and then assigning concept names to these clusters using pattern recognition, and it can be applied efficiently to a large corpus. From their experimental results they conclude that their clustering algorithm is more time-efficient than K-means or agglomerative clustering.

Samabia Tehsin, Asif Masood, Sumaira Kausar and Fahim Arif describe a novel fuzzy-based method for text extraction from images and videos. Their results advocate a post-processing segmentation method that handles the variation in text sizes and image resolution. The methodology is tested on the ICDAR 2011 Robust Reading Challenge dataset, which demonstrates the strength of the method. Four factors are put forward for joining characters into words; these factors are fed into a fuzzy system that decides whether or not to join regions [6].

Text extraction techniques can be divided into two main types according to the text features they use: region-based and texture-based methods. Texture-based methods rely on the textural properties that distinguish text from the background, while region-based methods use distinct region features to extract the text content [7].

Hoifung Poon and Pedro Domingos present joint inference in information extraction. They propose a joint approach in which the segmentation of all records and entity resolution are performed together in a single integrated inference process [8].

Raphael Hoffmann, Congle Zhang, Xiao Ling, Luke Zettlemoyer and Daniel S. Weld [9] address the information extraction of overlapping relations.
They propose a novel approach to multi-instance learning with overlapping relations that combines a sentence-level extraction model with a simple, corpus-level component for aggregating the individual facts. They apply their model to learn extractors from New York Times text using weak supervision from Freebase. Experiments show that the approach runs quickly and yields surprising gains in accuracy at both the aggregate and the sentence level. The authors define an undirected graphical model that allows joint reasoning about aggregate (corpus-level) and sentence-level extraction decisions.

Amr El-Helw, Mina H. Farid and Ihab F. Ilyas [10] present just-in-time information extraction using extraction views, a lightweight implementation of just-in-time extraction that does not require fundamental modifications to the DBMS. Their framework integrates information extraction with traditional query processing through view-matching techniques. They introduce extraction views: database views whose data are obtained by running information extractors on specific document collections rather than by running SQL queries on relational data. Extraction views leverage the view infrastructure already available in most commercial DBMSs.

PROPOSED METHOD
Information extraction is the task of extracting information from large sequences of data, using extraction patterns predefined by the user. The information extraction process depends on the structure and information representation of the document. Generally, the user gives the patterns, and the designed algorithm matches those patterns and extracts the matching content from the resources. Ontology-based search marks the next milestone of information extraction.

The proposed method focuses on extracting the relationship entities of a selected keyword and their attributes, a popular topic in search techniques for gaining more knowledge. It builds a fuzzy intelligent agent (FIA) that analyzes the relationship entities of the given search keyword. The agent aggregates the relationship entity attributes through fuzzy rules and extracts the information from the resources.

There are three processes involved in the extraction. Preprocessing is the initial stage: it parses the resources and converts them into a general text format. Since we concentrate on text content extraction, plain text is a suitable format for all kinds of platforms. The attributes are then collected and stored in the database with membership values. In the university domain, the collected attributes include student, marks, department, degree, courses, examination, grade, subject and so on. Each attribute stored in the database is assigned a membership value, generating a fuzzy set that contains the combination of relationship entity attributes for a given keyword in the university domain.

For instance, 'student' is a crisp word in the given text file. This crisp word is converted into fuzzy words through fuzzification, for which various methods are available. The keyword 'student' is identified through the collection of its properties and characteristics: firstname, lastname, grade, mark, regno, department, subject and semester. These relationship entities are stored in the database with their membership values. In the fuzzification process, the crisp word is fuzzified into fuzzy words through the membership values of the relationship entity attributes.

Fig 1: Architecture of information extraction process (components: input keyword, Fuzzy Intelligent Agent, database of attribute collections, output keyword + relationship entity attributes)

The fuzzy intelligent agent is the core mechanism for aggregating the relationship entity attributes. The collected attributes are stored in the database in a hierarchical structure: each attribute is related to many keywords and also participates in the properties of other keywords. The database structure is given below.

Fig 2: Functions of Intelligent Agent

Once the database has been built, the fuzzy logic system generates the relationship entity attributes of the given keyword through the fuzzy rules. The threshold value decides the hierarchical level to which relationship entities are extracted. If the threshold value is high, the FIA applies the generated rules and searches down to the maximum level of the hierarchical structure, so the aggregated attribute values become part of the searched keyword, although some of them may not fall within the vagueness value of the search keyword. If the threshold value is low, the FIA aggregates fewer relationship connections and the strength of vagueness is very low. A moderate threshold value therefore generates reasonable relationship entity attributes and keeps the extraction nearly feasible. The threshold value is never greater than 1: the fuzzification value of the given input keyword ranges between 0 and 1, because the membership value determines the vagueness value of the input keyword.

Fig 2: Fuzzy Logic Control (components: Fuzzy Intelligent Agent with fuzzification, fuzzy rules and a fuzzy inference system; output: relationship entities gathered according to the membership values of the fuzzy set)

The rules take the form IF antecedent THEN consequent, where the assignment statements limit the value of a variable to a specific quantity. The canonical rule formation for a fuzzy rule-based system is given in table 1:

Rule 1: IF condition C THEN assign A1
Rule 2: IF condition A1 THEN assign A2
Rule 3: IF condition A2 THEN assign A3
...
Rule n: IF condition A(n-1) THEN assign An

Here A1, A2, A3, ..., An are the relationship entities of the given keyword and their attributes. The value n decides the hierarchical level, that is, it corresponds to the threshold value. Aggregation is the process of obtaining the overall consequent from the individual consequents provided by each rule. Rule 1, Rule 2, Rule 3, ..., Rule n are conjoined to obtain the relationship entities of the given keyword: the aggregated relationship entities and their attributes Y are determined by the fuzzy intersection of all the individual rule consequents $y_i$, $i = 1, \dots, n$:

$$Y = y_1 \wedge y_2 \wedge y_3 \wedge \cdots \wedge y_n = y_1 \cap y_2 \cap y_3 \cap \cdots \cap y_n$$

RESULT AND DISCUSSION
Our experiment was implemented on the live web portal of the university. The portal registers more than 60,000 daily visits, and its page rank value is 223 among more than 4,000 universities in the Webometrics ranking of universities' web impact. Every web page in the portal is considered an object, and these objects are organized in a hierarchical structure. Every object consists of a set of paragraphs containing keywords, which are collected in a database. The number of keywords and related entities associated with each web page depends on the amount of information contained on that page.

The precision of information extraction, as a percentage, is calculated as follows. Let EA be the number of related entity attributes extracted from the web pages, and let TA be the total number of related attributes in the database. Then

$$\text{Result} = \frac{EA}{TA} \times 100$$

Table 1: Information Extraction through Fuzzy Intelligent Agent based on the threshold value

Threshold value | Keyword | Relationship entities | Attributes | Precision (%)
0.0 | student | - | Rollno, firstname | 81
0.1 | student | department | Rollno, firstname, department name | 83.8
0.3 | student | department, course | Rollno, firstname, department, coursename, duration | 85.3
0.6 | student | department, course, subjects | Rollno, firstname, department, coursename, duration, subject name | 87.4
0.8 | student | department, course, degree, subjects, grade | Rollno, firstname, department, coursename, duration, subject name, grade level, CGPA | 92
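The threshold-controlled rule chain (Rule 1 to Rule n), the aggregation of the consequents y1 to yn and the precision measure Result = EA/TA × 100 can be illustrated with a small Python sketch. It is not the authors' implementation. The names `Level`, `STUDENT_HIERARCHY`, `fire_rules`, `aggregate` and `precision` are hypothetical, and so are the per-level `min_threshold` values, which are assumed to be the lowest threshold at which Table 1 first lists each entity. The sketch also assumes that the consequents are combined by collecting all of their attributes, because that is what the worked "student" example at threshold 0.8 returns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    """One level of the (hypothetical) relationship hierarchy of a keyword."""
    entity: str                   # relationship entity, e.g. "department"
    attributes: tuple[str, ...]   # attributes contributed by this entity
    min_threshold: float          # assumed: lowest threshold at which Table 1 reaches this level


# Hypothetical hierarchy for the keyword "student", read off the worked
# example for threshold 0.8 (y1 = student ... y5 = grade).
STUDENT_HIERARCHY = [
    Level("student", ("Rollno", "firstname"), 0.0),
    Level("department", ("department name",), 0.1),
    Level("course", ("coursename", "duration"), 0.3),
    Level("subjects", ("subject name",), 0.6),
    Level("grade", ("grade level", "CGPA"), 0.8),
]


def fire_rules(keyword: str, hierarchy: list[Level], threshold: float) -> list[Level]:
    """Fire the chain Rule 1: IF C THEN A1, Rule k: IF A(k-1) THEN Ak.

    Condition C is assumed to be "the keyword names the root entity". The
    chain stops at the first level the threshold does not reach, so the
    threshold fixes the hierarchical depth n.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie in [0, 1]")
    if not hierarchy or keyword != hierarchy[0].entity:
        return []  # condition C fails: no rule fires
    fired = []
    for level in hierarchy:
        if level.min_threshold > threshold:
            break  # Rule k cannot fire, so no later rule in the chain fires either
        fired.append(level)
    return fired


def aggregate(fired: list[Level]) -> list[str]:
    """Combine the consequents y1..yn into one ordered attribute list Y.

    Assumption: attributes are collected by union (duplicates dropped,
    order kept), which reproduces the worked example's output.
    """
    combined: list[str] = []
    for level in fired:
        for attribute in level.attributes:
            if attribute not in combined:
                combined.append(attribute)
    return combined


def precision(extracted: int, total: int) -> float:
    """Result = EA / TA x 100 (EA extracted, TA total related attributes)."""
    if total <= 0:
        raise ValueError("TA must be positive")
    return 100 * extracted / total


if __name__ == "__main__":
    for t in (0.0, 0.1, 0.3, 0.6, 0.8):
        print(t, aggregate(fire_rules("student", STUDENT_HIERARCHY, t)))
```

Running the loop prints the growing attribute list for each threshold in Table 1. The `precision` function needs real EA and TA counts, which are not reported, so it is shown only as the formula.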
Fig 3: Performance of fuzzy logic intelligent agent (precision in percentage for threshold values 0, 0.1, 0.3, 0.6 and 0.8)

Table 1 shows the results of information extraction: the threshold value decides the hierarchical level of the relationship entities and attributes extracted for the given input keyword. Fig 3 shows the precision, as a percentage, of information extraction for each threshold value. If the threshold value is 0.8, the search word concatenates its related entities and attributes as follows:

keyword = student
y1 = student, y2 = department, y3 = course, y4 = subjects, y5 = grade
Y = {Rollno, firstname} ∩ {department name} ∩ {coursename, duration} ∩ {subject name} ∩ {grade level, CGPA}

Finally, for keyword = student, the extracted result is: Rollno, firstname, department, department name, course, coursename, duration, subject, subject name, grade level, CGPA. At this threshold the precision is 92%.

CONCLUSION
Users gain knowledge from a collection of various resources, and the Internet is one of the major resources for collecting information. A massive amount of information is available on the Internet, and its formats and data representations differ from source to source; information extraction therefore has to be a generalized process. Many recent techniques work well for information extraction. Our contribution focuses on the extraction of relationship entity attributes based on the FIA for static information. The enhancement of our proposed method targets feasible extraction for incremental information.

REFERENCES
1. K. L. Kwok, 1989, "A neural network for probabilistic information retrieval", in Proceedings of the 12th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Cambridge, Massachusetts, United States.
2. Jorge Ropero, Ariel Gómez, Carlos León and Alejandro Carrasco, 2007, "Information Extraction in a Set of Knowledge Using a Fuzzy Logic Based Intelligent Agent", in ICCSA 2007, LNCS 4707, Part III, pp. 811–820.
3. Jorge Ropero, Ariel Gómez, Alejandro Carrasco and Carlos León, 2011, "A Fuzzy Logic intelligent agent for Information Extraction: Introducing a new Fuzzy Logic-based term weighting scheme", in Expert Systems with Applications, Elsevier, 31 October 2011, pp. 1–15.
4. Sandeepkumar Satpal, Sahely Bhadra, S. Sundararajan, Rajeev Rastogi and Prithviraj Sen, 2011, "Web Information Extraction Using Markov Logic Networks", in KDD'11, August 21–24, 2011, San Diego, California, USA.
5. Bhavana Dalvi, William W. Cohen and Jamie Callan, 2012, "WebSets: Extracting Sets of Entities from the Web Using Unsupervised Information Extraction", in WSDM'12, February 8–12, 2012, Seattle, Washington, USA.
6. Samabia Tehsin, Asif Masood, Sumaira Kausar and Fahim Arif, 2014, "Fuzzy-Based Segmentation for Variable Font-Sized Text Extraction from Images/Videos", in Mathematical Problems in Engineering, Hindawi Publishing Corporation, Volume 2014, Article ID 389547, 10 pages.
7. R. Lienhart, 2003, "Video OCR: A Survey and Practitioner's Guide", in Video Mining, Springer, Burlingame, Calif., USA.
8. Hoifung Poon and Pedro Domingos, "Joint Inference in Information Extraction", in Association for the Advancement of Artificial Intelligence.
9. Raphael Hoffmann, Congle Zhang, Xiao Ling, Luke Zettlemoyer and Daniel S. Weld, "Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations".
10. Amr El-Helw, Mina H. Farid and Ihab F. Ilyas, 2012, "Just-in-Time Information Extraction using Extraction Views", in SIGMOD'12, May 20–24, 2012, Scottsdale, Arizona, USA.