related work - Academic Science,International Journal of Computer

RELATIONSHIP ENTITY ATTRIBUTES EXTRACTION USING A FUZZY LOGIC
BASED INTELLIGENT AGENT
Dr. K. Perumal
Associate Professor, Department of Computer Applications
Madurai Kamaraj University, Madurai, India
perumalmkucs@gmail.com

S. Nagarajan
Assistant Professor, Department of Computer Applications
Yadava College, Madurai, India
nagasethu2000@yahoo.com
ABSTRACT
A large amount of information is available on the web. Information retrieval and extraction are the major processes for collecting information from the web. Our research focuses mainly on information extraction from web sources. Various methods have been proposed for extracting information that contains individual attributes, rather than a particular kind of information. The proposed method is based on a fuzzy logic intelligent agent. The aim of this system is to extract the related entities of a given keyword, and their attributes, through a fuzzy intelligent agent. Fuzzy logic is used to manage the membership function values of relationship entity information.
Keywords
Information retrieval, Information extraction, Fuzzy logic,
Fuzzy intelligent agent, incremental information.
INTRODUCTION
In data mining, information extraction is the process of obtaining specific information from a data source. Extracting particular information from a huge volume of data is a tedious process that involves data analysis, preprocessing, etc. Information extraction was originally used for document retrieval and extraction, but in recent years its use has been generalized to searching for other types of information, such as that held in a database, a web page or, in general, any set of knowledge.

Information Retrieval deals with voluminous collections of textual material, and its aim is to satisfy user queries and needs [1]. These needs increase when the material is not in the form of text, when the user is not familiar with the subject matter, or when there are ambiguous contents, bad organization or, simply, complex topics or a great amount of information that is difficult to manage [2]. The abundance of information resulting from the rise of Information Technology is an enormous advantage for information searchers. The difficulty lies in searching for the particular site requested, that is, in distinguishing the necessary information from the huge quantity of unnecessary data. This has led to the development of data mining [3]. One approach combines a Vector Space Model (VSM) based classification of contents, built by creating a few indexes based on keywords, with a consultation method based on a Fuzzy Logic (FL) application whose interface users can interact with in natural language. Fuzzy logic is employed in this work for information retrieval. Extracting information from popular web site pages allows users to create extensive databases of entities. These databases can then be queried by search engines to improve the ranking and rendering of search results, and by users to access product features, reviews, etc. [4]. Fuzzy logic is an ideal tool for managing this kind of vague and heterogeneous information. Moreover, this method has been implemented and validated for information extraction in web portals, where the information provided was imprecise and disordered. A fuzzy logic system gives flexibility in term weighting: more important than having a concrete value for the weights is that each feature is represented by a word. The information retrieval process includes information extraction, generalization, validation and interpretation of the database to obtain previously unknown, potentially useful information. An intelligent agent is provided to perform the extraction.
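To make the Vector Space Model idea mentioned above concrete, the following minimal sketch represents documents and a query as keyword-count vectors and ranks the documents by cosine similarity. The document names and texts are hypothetical, and plain term frequency stands in for the fuzzy logic term weighting discussed in [3], which this sketch does not implement.

```python
import math
from collections import Counter


def tf_vector(text: str) -> Counter:
    """Term-frequency vector over lower-cased whitespace tokens."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query: str, docs: dict[str, str]) -> list[tuple[str, float]]:
    """Return (doc_id, score) pairs sorted by decreasing similarity to the query."""
    q = tf_vector(query)
    scores = [(doc_id, cosine(q, tf_vector(text))) for doc_id, text in docs.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical keyword indexes for three portal pages.
docs = {
    "admissions": "student admission department course",
    "exams": "examination grade subject student marks",
    "sports": "sports day events",
}
print(rank("student grade", docs))
```

Here the "exams" page scores highest because it shares both query keywords, while "sports" shares none and scores 0.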
RELATED WORK
Bhavana Dalvi, William W. Cohen and Jamie Callan [5] discuss extracting sets of entities from the web using unsupervised information extraction. They describe an open-domain information extraction method for extracting concept-instance pairs from an HTML corpus. It relies on a novel approach for clustering terms found in HTML tables and then assigning concept names to these clusters using pattern recognition. The method can be applied efficiently to a large corpus. From their experimental results, they conclude that their clustering algorithm has lower time complexity than K-means or agglomerative clustering.
Samabia Tehsin, Asif Masood, Sumaira Kausar, and Fahim Arif describe a solution using a novel fuzzy-based method. Their results advocate a post-processing segmentation method that can solve the problem of variation in text sizes and image resolution. The methodology is tested on the ICDAR 2011 Robust Reading Challenge dataset, which amply demonstrates the strength of their recommended method. Four factors are put forth for joining characters into words. These factors are fed into the fuzzy system, which decides whether or not to join regions [6].
Text extraction techniques can be categorized into two main types according to the text features they use: region-based and texture-based methods. Texture-based methods rely on the textural properties of text that distinguish it from the background. Region-based methods use distinct region features to extract text content [7].
Hoifung Poon and Pedro Domingos present the concept of joint inference in information extraction. They proposed a joint approach to information extraction in which segmentation of all records and entity resolution are performed together in a single integrated inference process [8].
Raphael Hoffmann, Congle Zhang, Xiao Ling, Luke Zettlemoyer and Daniel S. Weld [9] address information extraction of overlapping relations. They proposed a novel approach for multi-instance learning with overlapping relations that combines a sentence-level extraction model with a simple, corpus-level component for aggregating the individual facts. They apply their model to learn extractors from NY Times text using weak supervision from Freebase. Experiments show that the approach runs quickly and yields surprising gains in accuracy at both the aggregate and sentence level. The authors define an undirected graphical model that allows joint reasoning about aggregate (corpus-level) and sentence-level extraction decisions.
Amr El-Helw, Mina H. Farid and Ihab F. Ilyas [10] present Just-in-Time Information Extraction using Extraction Views, a lightweight implementation of just-in-time extraction that does not require fundamental modifications to the DBMS. The authors' framework integrates information extraction with traditional query processing through view matching techniques. They introduced extraction views: database views whose data are obtained by running information extractors on specific document collections, rather than by running SQL queries on relational data. Extraction views leverage the view infrastructure available in most commercial DBMSs.
Fuzzification converts crisp words into fuzzy words through the membership values of relationship entity attributes.
Fig 1: Architecture of information extraction process (components: input keyword, Fuzzy Intelligent Agent, database holding collections of attributes, and output keyword + relationship entity attributes)
The Fuzzy Intelligent Agent is the core mechanism for aggregating the relationship entity attributes. The collected attributes are stored in the database in a hierarchical structure; each attribute is related to many keywords and participates in the properties of other keywords. The database structure is given below.
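As an illustration of the hierarchical attribute store described above, the sketch below models the university-domain hierarchy for the keyword "student" as an ordered list of levels, each holding a related entity and its attributes. The entity and attribute names are taken from Table 1; the list-based layout and the helper `attributes_up_to` are assumptions for illustration, not the authors' database schema.

```python
# Hypothetical layout: for each keyword, an ordered list of (related entity, attributes) levels.
HIERARCHY: dict[str, list[tuple[str, list[str]]]] = {
    "student": [
        ("student", ["Rollno", "firstname"]),
        ("department", ["Department name"]),
        ("course", ["coursename", "duration"]),
        ("subjects", ["subject name"]),
        ("grade", ["grade level", "CGPA"]),
    ],
}


def attributes_up_to(keyword: str, depth: int) -> list[str]:
    """Collect the attributes of the first `depth` levels of the keyword's hierarchy."""
    levels = HIERARCHY.get(keyword, [])  # unknown keywords have no stored levels
    collected: list[str] = []
    for _entity, attrs in levels[:depth]:
        collected.extend(attrs)
    return collected


print(attributes_up_to("student", 3))
```

A real store would also need links between keywords, since each attribute can participate in the properties of several keywords; the single ordered list here only covers one keyword's chain.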
PROPOSED METHOD
Information extraction is a task that extracts information from large sequences of data. The extraction patterns are predefined by the user, and the information extraction process is based on the structure and information representation of the document. Generally, the user gives the patterns, and a purpose-built algorithm matches those patterns and extracts the matching content from the resources. Ontology-based search is the next milestone of information extraction. The proposed method focuses on extracting the relationship entities of a selected keyword and their attributes, a very popular topic in search techniques for gaining more knowledge. The proposed method builds a fuzzy intelligent agent for analyzing the relationship entities of a given search keyword. The fuzzy intelligent agent aggregates the relationship entity attributes through fuzzy rules and extracts the information from the resources.
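The pattern-driven extraction described above, where the user supplies patterns and an algorithm matches them against the resources, can be sketched with Python's standard `re` module. The attribute names and regular expressions below are hypothetical university-domain examples, not patterns taken from the paper.

```python
import re

# Hypothetical user-defined patterns: attribute name -> regex with exactly one capture group.
PATTERNS = {
    "regno": re.compile(r"regno:\s*(\w+)", re.IGNORECASE),
    "department": re.compile(r"department:\s*([^;\n]+)", re.IGNORECASE),
    "semester": re.compile(r"semester:\s*(\d+)", re.IGNORECASE),
}


def extract(text: str) -> dict[str, list[str]]:
    """Apply every pattern to the text and collect all captured values per attribute."""
    return {name: [m.strip() for m in rx.findall(text)] for name, rx in PATTERNS.items()}


record = "Regno: 12CA034; Department: Computer Applications; Semester: 3"
print(extract(record))
```

Because each pattern has a single capture group, `findall` returns the captured strings directly, so attributes that do not occur simply map to an empty list.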
There are three processes involved in the extraction. Preprocessing is the initial stage: it parses the resources and converts them into a general text format. Here we concentrate on text content extraction, so plain text is a suitable format across all kinds of platform. Next, the attributes are collected and stored in the database with membership values. In the university domain, the collected attributes include student, marks, department, degree, courses, examination grade, subject and so on. For each attribute stored in the database, a membership value is assigned to generate a fuzzy set that contains the combination of relationship entity attributes for a given keyword in the university domain. For instance, the keyword ‘student’ is a crisp word in the given text file. This crisp word is converted into fuzzy words through fuzzification, for which various methods are available.
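The preprocessing stage (parsing a resource and converting it into plain text) can be sketched with the standard-library `html.parser` module, assuming the resources are HTML pages. The class name `TextExtractor` and the choice to skip `<script>` and `<style>` content are assumptions of this sketch, not details given in the paper.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text chunks, skipping <script> and <style> content."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0          # > 0 while inside <script> or <style>
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    """Parse an HTML resource and return its text content as one space-joined string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = "<html><head><style>p{}</style></head><body><h1>Students</h1><p>Regno: 12CA034</p></body></html>"
print(html_to_text(page))
```

The resulting plain text can then be passed to the attribute-collection step.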
The keyword ‘student’ is identified through the collection of its properties and characteristics: firstname, lastname, grade, mark, regno, department, subject and semester. These relationship entities are stored in the database with membership values. In the fuzzification process, the crisp word is converted into fuzzy words using these membership values.
Fig 2: Functions of Intelligent Agent.
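One simple way to realise the fuzzification step is a lookup: the crisp keyword is mapped to the fuzzy set of its relationship entities, each paired with the membership value stored in the database. The membership values below are hypothetical placeholders, since the paper does not list them; the entity names come from the ‘student’ example above.

```python
# Hypothetical membership values in [0, 1] for entities related to the crisp word "student".
MEMBERSHIP_DB: dict[str, dict[str, float]] = {
    "student": {
        "firstname": 1.0,
        "lastname": 1.0,
        "regno": 1.0,
        "department": 0.9,
        "semester": 0.7,
        "subject": 0.6,
        "mark": 0.5,
        "grade": 0.4,
    },
}


def fuzzify(crisp_word: str) -> dict[str, float]:
    """Map a crisp keyword to its fuzzy set {related entity: membership degree}."""
    fuzzy_set = MEMBERSHIP_DB.get(crisp_word.lower(), {})
    for entity, mu in fuzzy_set.items():
        if not 0.0 <= mu <= 1.0:
            raise ValueError(f"membership of {entity!r} must lie in [0, 1], got {mu}")
    return dict(fuzzy_set)  # copy so callers cannot modify the stored values


def membership(crisp_word: str, entity: str) -> float:
    """Degree to which `entity` belongs to the fuzzy set of `crisp_word` (0.0 if unrelated)."""
    return fuzzify(crisp_word).get(entity, 0.0)


print(membership("Student", "department"))
```

Entities absent from the stored set are treated as having membership 0, which is the usual convention for a fuzzy set over a larger universe.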
Once the database has been built, the fuzzy logic system generates the relationship entity attributes of a given keyword through the fuzzy rules. The threshold value decides the hierarchical level of relationship entity extraction. If the threshold value is high, the fuzzy intelligent agent (FIA) applies the generated rules and searches down to the maximum level of the hierarchical structure. The aggregated attribute values are then only partly related to the searched keyword, and some of them fall outside the vagueness value of the search keyword. If the threshold value is low, the FIA aggregates fewer relationship connections and the strength of vagueness is very low. An average threshold value therefore generates reasonable relationship entity attributes and extracts the information with reasonable accuracy. The threshold value is never greater than 1: the fuzzification value of a given input keyword ranges between 0 and 1, because the membership value determines the vagueness value of the given input keyword.
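Table 1 suggests a step mapping from the threshold value to the number of hierarchical levels searched: a threshold of 0.0 returns only the keyword's own attributes, and each of 0.1, 0.3, 0.6 and 0.8 adds one more level. The sketch below encodes that reading with `bisect`; treating the Table 1 thresholds as level cut-offs, and the helper name `depth_for_threshold`, are assumptions rather than a rule stated by the authors.

```python
from bisect import bisect_right

# Threshold values from Table 1, read (as an assumption) as the cut-off for each extra level.
LEVEL_CUTOFFS = [0.0, 0.1, 0.3, 0.6, 0.8]


def depth_for_threshold(threshold: float) -> int:
    """Number of hierarchical levels to search for a threshold in [0, 1]."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie in [0, 1]")
    # Count how many cut-offs are <= threshold.
    return bisect_right(LEVEL_CUTOFFS, threshold)


for t in (0.0, 0.1, 0.45, 0.8, 1.0):
    print(t, depth_for_threshold(t))
```

Under this reading, a threshold between two table values (for example 0.45) searches the same depth as the lower value.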
Fig 2: Fuzzy Logic Control (components: Fuzzy Intelligent Agent, Fuzzy Inference System, Fuzzy Rules, Fuzzification, and relationship entities gathered based on the membership values of the fuzzy set)
Rules are formed as IF antecedent THEN consequent. The assignment statements limit the value of a variable to a specific quantity. The canonical rule formation for the fuzzy rule-based system is as follows.
Rule 1: IF condition C THEN assign A1
Rule 2: IF condition A1 THEN assign A2
Rule 3: IF condition A2 THEN assign A3
...
Rule n: IF condition A(n-1) THEN assign An
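The rule chain above is a forward-chaining scheme: the initial condition C fires Rule 1, and each consequent A(i-1) becomes the antecedent of the next rule. The sketch below runs such a chain for at most n steps. The concrete rule table (entity names for the ‘student’ keyword) and the function `fire_chain` are illustrative assumptions, and fuzzy degrees are left out so that the chaining itself stays visible.

```python
# Hypothetical rule base: antecedent -> consequent (IF antecedent THEN assign consequent).
RULES: dict[str, str] = {
    "C": "student",
    "student": "department",
    "department": "course",
    "course": "subjects",
    "subjects": "grade",
}


def fire_chain(start: str, n: int) -> list[str]:
    """Fire at most n rules starting from `start`; return the assigned consequents A1..An."""
    assigned: list[str] = []
    current = start
    for _ in range(n):
        if current not in RULES:
            break  # no rule has the current value as its antecedent
        current = RULES[current]
        assigned.append(current)
    return assigned


print(fire_chain("C", 3))
```

The parameter n plays the role of the hierarchical level in the text: a larger n lets the chain reach deeper related entities.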
Let TA represent the total number of related attributes in the database, and EA the number of extracted related entity attributes. Then

$$\text{Precision}\,(\%) = \frac{EA}{TA} \times 100$$
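The precision formula can be written as a small function. The counts in the usage line are hypothetical, because the paper reports only the resulting percentages, not EA and TA themselves. The check that EA does not exceed TA is an added assumption that keeps the result within 0 to 100.

```python
def precision_percent(extracted: int, total: int) -> float:
    """Precision (%) = EA / TA x 100, with EA extracted and TA total related attributes."""
    if total <= 0:
        raise ValueError("TA (total related attributes) must be positive")
    if not 0 <= extracted <= total:  # assumption: EA cannot exceed TA
        raise ValueError("EA must lie between 0 and TA")
    return extracted / total * 100


print(precision_percent(9, 12))  # hypothetical counts, not values from the paper
```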
Threshold value | Keyword | Relationship | Attributes | Precision (%)
0.0 | student | – | Roll no, firstname | 81
0.1 | student | Department | Rollno, Firstname, Department name | 83.8
0.3 | student | Department, course | Rollno, firstname, Department, coursename, duration | 85.3
0.6 | student | Department, course, subjects | Rollno, firstname, Department, coursename, duration, subject name | 87.4
0.8 | student | Department, course, degree, subjects, grade | Rollno, firstname, Department, coursename, duration, subject name, grade level, CGPA | 92

Table 1: Information Extraction through Fuzzy Intelligent Agent based on the threshold value
A1, A2, A3, ..., An are the relationship entities of a given keyword and their attributes. The value of n decides the hierarchical level, which is set by the threshold value. Aggregation is the process of obtaining the overall consequent from the individual consequents provided by each rule. Rules 1, 2, 3, ..., n are conjoined to obtain the relationship entities of a given keyword. Here the aggregated relationship entities and their attributes, Y, are determined by the fuzzy intersection of all individual rule consequents y_i, where i = 1, ..., n:

$$Y = y_1 \wedge y_2 \wedge y_3 \wedge \cdots \wedge y_n = y_1 \cap y_2 \cap y_3 \cap \cdots \cap y_n = \bigcap_{i=1}^{n} y_i$$
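The fuzzy intersection in the equation above is conventionally computed with the minimum t-norm: over a common universe of attributes, the membership of an element in Y is the minimum of its memberships in y1, ..., yn. The sketch below shows that standard operator. The membership values in the usage lines and the choice of the minimum (rather than another t-norm) are assumptions, since the paper specifies neither.

```python
FuzzySet = dict[str, float]  # element -> membership degree; missing elements have degree 0


def fuzzy_intersection(*sets: FuzzySet) -> FuzzySet:
    """Minimum t-norm intersection of fuzzy sets, keeping only elements with degree > 0."""
    if not sets:
        return {}
    universe = set().union(*sets)  # union of all keys
    result = {x: min(s.get(x, 0.0) for s in sets) for x in universe}
    return {x: mu for x, mu in result.items() if mu > 0.0}


# Hypothetical consequents of two rules for the keyword "student".
y1 = {"Rollno": 1.0, "Department name": 0.8}
y2 = {"Rollno": 0.9, "Department name": 0.6, "coursename": 0.5}
print(fuzzy_intersection(y1, y2))
```

An element missing from any consequent has degree 0 there, so it drops out of the intersection ("coursename" above).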
RESULT AND DISCUSSION
Our experiment has been implemented on the live web portal of the university. The portal registers more than 60,000 daily visits, and it ranks 223rd among more than 4,000 universities in the Webometrics ranking of universities by web impact. Every web page in the portal is considered an object, and these objects are gathered in a hierarchical structure. Every object consists of a set of paragraphs containing keywords, which are collected in a database. The number of keywords and related entities associated with each web page depends on the amount of information contained on that page.
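The portal model described above (pages as objects in a hierarchy, each holding paragraphs whose keywords are collected in a database) can be sketched as a small inverted index from keyword to pages. The `Page` class, the page paths and the paragraph texts are hypothetical; the paper does not describe its storage format.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Page:
    """A web page treated as an object in the portal hierarchy (hypothetical model)."""
    path: str  # position in the hierarchy, e.g. "/academics/students"
    paragraphs: list[str] = field(default_factory=list)


def build_index(pages: list[Page], keywords: set[str]) -> dict[str, set[str]]:
    """Map each known keyword to the set of page paths whose paragraphs mention it."""
    index: dict[str, set[str]] = defaultdict(set)
    for page in pages:
        for paragraph in page.paragraphs:
            for token in paragraph.lower().split():
                word = token.strip(".,;:")  # drop trailing punctuation
                if word in keywords:
                    index[word].add(page.path)
    return dict(index)


pages = [
    Page("/academics/students", ["Every student has a regno.", "Grades are released each semester."]),
    Page("/academics/departments", ["Each department offers courses."]),
]
print(build_index(pages, {"student", "department", "semester"}))
```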
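The precision percentage of information extraction is calculated as follows. Let EA represent the number of extracted related entity attributes in the web pages; precision is the ratio of EA to the total number of related attributes in the database (TA), expressed as a percentage.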
Fig 3: Performance of Fuzzy logic intelligent agent (precision in percentage versus threshold values 0, 0.1, 0.3, 0.6 and 0.8)
Table 1 shows the results of information extraction; the threshold value decides the hierarchical level of the relationship entities and their attributes for the given input keyword. In Fig 3, the precision percentage shows the performance of information extraction as a function of the threshold value. If the threshold value is 0.8, the search word concatenates the related entities and their attributes as follows.
keyword = student
y1 = Student, y2 = Department, y3 = course, y4 = subjects, y5 = grade
Y = {Roll no, firstname} ∩ {Department name} ∩ {coursename, duration} ∩ {subject name} ∩ {grade level, CGPA}
Finally, for keyword = student, the extracted result is: Rollno, firstname, Department, department name, course, coursename, duration, subject, subject name, grade level, CGPA, with a precision of 92%.
CONCLUSION
Users gain knowledge through a collection of various resources, and the Internet is one of the major resources for collecting information. A massive amount of information is available on the Internet, with information formats and data representations that differ from source to source. Hence information extraction has to be a generalized process. Many techniques have recently worked well for information extraction. Our contribution focuses on the extraction of relationship entity attributes based on the fuzzy intelligent agent (FIA) for static information. An enhancement of the proposed method is intended to achieve feasible extraction for incremental information.
REFERENCES
1. K. L. Kwok, 1989, “A neural network for probabilistic information retrieval”, in Proceedings of the 12th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Cambridge, Massachusetts, United States.
2. Jorge Ropero, Ariel Gómez, Carlos León, and Alejandro Carrasco, 2007, “Information Extraction in a Set of Knowledge Using a Fuzzy Logic Based Intelligent Agent”, in ICCSA, LNCS 4707, Part III, pp. 811–820.
3. Jorge Ropero, Ariel Gómez, Alejandro Carrasco, and Carlos León, 2011, “A Fuzzy Logic intelligent agent for Information Extraction: Introducing a new Fuzzy Logic-based term weighting scheme”, in Expert Systems with Applications, Elsevier, 31 October 2011, pp. 1–15.
4. Sandeepkumar Satpal, Sahely Bhadra, S. Sundararajan, Rajeev Rastogi, and Prithviraj Sen, 2011, “Web Information Extraction Using Markov Logic Networks”, in KDD’11, August 21–24, 2011, San Diego, California, USA.
5. Bhavana Dalvi, William W. Cohen, and Jamie Callan, 2012, “WebSets: Extracting Sets of Entities from the Web Using Unsupervised Information Extraction”, in WSDM’12, February 8–12, 2012, Seattle, Washington, USA.
6. Samabia Tehsin, Asif Masood, Sumaira Kausar, and Fahim Arif, 2014, “Fuzzy-Based Segmentation for Variable Font-Sized Text Extraction from Images/Videos”, in Mathematical Problems in Engineering, Hindawi Publishing Corporation, Volume 2014, Article ID 389547, 10 pages.
7. R. Lienhart, 2003, “Video OCR: A Survey and Practitioner’s Guide”, in Video Mining, Springer, Burlingame, Calif., USA.
8. Hoifung Poon and Pedro Domingos, “Joint Inference in Information Extraction”, in Association for the Advancement of Artificial Intelligence.
9. Raphael Hoffmann, Congle Zhang, Xiao Ling, Luke Zettlemoyer, and Daniel S. Weld, “Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations”.
10. Amr El-Helw, Mina H. Farid, and Ihab F. Ilyas, 2012, “Just-in-Time Information Extraction using Extraction Views”, in SIGMOD ’12, May 20–24, Scottsdale, Arizona, USA.