Research on Neural Network Based MultiAgent Semantic Web Content Mining

advertisement
International Journal of Application or Innovation in Engineering & Management (IJAIEM)
Web Site: www.ijaiem.org Email: editor@ijaiem.org
Volume 4, Issue 6, June 2015
ISSN 2319 - 4847
Research on Neural Network Based MultiAgent
Semantic Web Content Mining
1
Ms. Ashwini H. Bhuskat , Ms. Nupoor M. Yawale2, Ms. Pranita P. Deshmukh3,
Ms. Rutuja A. Gulhane4 , Ms. Meghana A. Deshmukh5
PRMIT&R, Badnera (INDIA)
ABSTRACT
As with the rapid increase of the huge amount of online information, there is a strong demand for Information retrieving from
the web which helps to discover some useful knowledge from Web documents. Agents are often developed not in isolation but as
part of a multi-agent system. Single agent is unable to retrieve the information from multiple searching tools. Single agent
requires more time to retrieve information than multi-agent. Multi-agent neural network system is an effective solution to
large scale Web mining. This work proposes Multi-agent neural network based framework for mining contents of semantic
web, which would provide query relevant knowledge using STC (Suffix Tree Cluster) clustering technique. Clustering helps to
provide user with query relevant cluster of web contents, which better satisfy user requirement and provides optimal utilization
of web surfing time. It uses fuzzy neural network to classify the relevancy of search results on a multi-agents. The fuzzy neural
network is used by us to enable multi-agent to match a query terms which create a clusters and return search result with high
accuracy in a reasonably short time.
Keywords: Multi-agent Systems, Hierarchical Clustering, neural network, web mining, content mining.
1.INTRODUCTION
Due to the rapid increase of the huge amount of online information, there is a strong demand for Web text mining
which helps people to discover some useful knowledge from Web documents. There are number of search tools
available for retrieving the information, information agent called search tools used to build database indices for web
information. But these approaches often causes unacceptable access delays under highly competitive access situations as
in search tools. As the amount of information stored in the database indices increases, it leads to the unacceptable
access delay problem. To overcome this problem meta search services is used.
Figure 1 Meta Search Service
Meta search service retrieve the information from multiple search tools. But this technique also have a problem, i.e.
multiple source problem means if there exist multiple search tool then the question arises to which source the given
query be submitted to retrieve the information. To solve this question neural network is used. In this, the internal neural
network mechanism discovers the search tools from which the associated information for the query is retrieved.
The single agent approach is impractical for large scale information using the Multi-Agent approach. In this IR system
number of agents collaboratively retrieves the desired information from the distributed web search tools. Multi agent
neural network system is solution to the large web text mining. A multi-agent system is one in which a number of
agents cooperates and interact with each other in a complex and distributed environment.
Web mining [11] can be defined as mining of the World Wide Web (WWW) to find useful knowledge about user
behavior, content, and structure of the web. Web Content Mining focuses on extracting knowledge from the contents or
their descriptions. It involves techniques for summarizing, classification and clustering of the web contents. It can
provide useful and interesting patterns about user needs and contribution behavior. In this paper, the neural network is
applied using multi-agent for semantic web content mining. In the present era of WWW, the user is more interested in
getting useful, relevant and knowledge oriented contents from the WWW. The paradigm is shifting from demand of
Volume 4, Issue 6, June 2015
Page 45
International Journal of Application or Innovation in Engineering & Management (IJAIEM)
Web Site: www.ijaiem.org Email: editor@ijaiem.org
Volume 4, Issue 6, June 2015
ISSN 2319 - 4847
information to demand for knowledge. Web content mining when applied on semantic web contents can lead to
discovery of knowledge that could be provided to end users to better serve their requirements.
2.RELATED WORK
Shu Bo and Kak Subhash [29] propose Meta search engine-Anvish. Determining the relevancy of web pages to a query
term is basic to the working of any search engine. The relevancy of search results on a meta search engine is classified
by using neural network algorithm. The meta search engine is able to handle a query term in a reasonably short time
and return the search results with high accuracy. The meta search engine-Anvish has learning neural network to
classify and organize the search results[29]. Anvish is further improved by expanding the current binary network into
continuous network so that the Web pages can be ranked quantitatively.
Choi, Y.S. and Yoo, S.I.[7], The multi agent learning approach was proposed to the web information retrieval using
neural network . This approach provides the method for locating the search tools that will give the desired information
and retrieving the relevant information. This approach also captures the user interest and retrieves such information.
The learning and generalization mechanism of artificial neural network can be effectively utilized as an internal
knowledge mechanism of each IR agent in the context of the multi- agent Web IR. In this approach the BPN (Back
Propagation neural Network) algorithm is used for this neural network associative memory.
As the large amount of information provided on the web; single agent fails to retrieve the information from multiple
search tools. The problem of unacceptable access delays occurs due to the under highly competitive access situations as
in search tools. This situation occurs as the amount of information stored in the database increases. So then the neural
network [16] is used for single agent, but this have the problem of convergence as the number of search tools increases,
the single agent information comes to have difficulty in training its neural network. Such problem is called ‘bounded
rationality’ for single agent approach. This approach is impractical or inefficient for the multiple search tools, so the
multi agent hierarchically organized multi-agent approach is proposed. In this system, information agents interacting
with each other learn collaboratively about their environment from user’s feedback so that it can retrieve the desired
information effectively from the distributed Web search tools. In this system, each information agent can dynamically
join or leave the collaborative organization and the information sources are subject to asynchronous changes of their
themes, contents, and structures.
With the rapid increase of the huge amount of online information, there is a strong demand for Web text mining which
helps people discover some useful knowledge from Web documents. For this purpose back propagation neural network
(BPNN)-based Web text mining system for decision support is used[22][24]. BPNN is used as an intelligent learning
agent that learns about Web documents. To access the information from large number of web document, multi-agentbased neural network system for Web text mining is used.
Singh A [30][31], uses Ontology to retrieve the information from web. Here ontology plays an important role in
enhancing the efficiency of existing agent based focused crawlers [31]. Agent based Focused crawlers (ABFC) [6][11]
are the intelligent miners, which selectively seek web pages that are most relevant to input information. ABFC follows
the context-based approach that analyzes content of web page thereby reducing the redundant information and hence
deduces the relevant information from a page. Ontology is defined as a well organized knowledge scheme that
represents high level background knowledge with concepts and relations [31][32]. Ontology based crawling [27]
eliminates simple keyword based crawling method as it introduces semantics/context for improving crawl efficiency in
which a keyword is being searched .This work proposes an Ontology Driven Agent Based Focused Crawler that
attempts to improve the efficiency of exiting ABFCs by introducing semantics in which a keyword is searched.
Tremendous growth of web-sites and text and multimedia contents on the WWW (World Wide Web) has lead to
demand of a strategy which could provide knowledge from the vast data scattered over different servers and also could
make useful predictions for otherwise uncertain user behavior. Web mining [10][28] can be defined as mining of the
World Wide Web (WWW) to find useful knowledge about user behavior, content, and structure of the web. Web
mining [20] is uniquely different from data mining as it works on web contents that are unstructured files or server logs
in contrast to well-structured databases used in data mining. The next generation of WWW will be knowledge oriented
and to satisfy the customers web mining is a promising solution [19][30][42].
Scatter/Gather [9] is the first query result visualization algorithm using the clustering technique in the Information
Retrieval community. Before this work, document clustering was traditionally investigated mainly as a method for
improving document search and retrieval, but was not widely used because of speed and poor performance of improving
near-neighbor search. Instead of attempting to reduce the number of documents returned, Cutting et al. introduced
document clustering as a document browsing method. They state that the Scatter/Gather system is particularly helpful
in situations in which it is difficult or undesirable to specify a query formally:
1. When the user is not looking for anything specific, just wants to discover the general information content of the
corpus (to gain an overview);
Volume 4, Issue 6, June 2015
Page 46
International Journal of Application or Innovation in Engineering & Management (IJAIEM)
Web Site: www.ijaiem.org Email: editor@ijaiem.org
Volume 4, Issue 6, June 2015
ISSN 2319 - 4847
2. When it is difficult to formulate the query precisely (help user formulate a search request). Two near linear time
clustering algorithms were presented: Buckshot and Fractionation. However, their work is based on general document
collections, not on dynamically generated search results.
Zamir & Etzioni [40] followed this paradigm and propose the notion of search results clustering (also called ephemeral
clustering. In their Grouper system, STC (Suffix Tree Clustering) treats a document as a string instead of a set of
words. It attempted to cluster documents “snippets” returned by search engine according to common phrases they
contain, thus employing information about the proximity and order of single keywords in addition to their frequencies.
STC is organized into two phases: discovering base clusters using a suffix tree and merging base clusters. In the first
stage, the retrieved document “snippets” are inserted into a suffix tree, where each node in the tree represents a group
of documents and a phrase that is common to all of them. For each phrase shared by two or more documents, they
assign a score: s(B)=|B| * f(|P|) to penalize single word terms, where |B| is the number of documents in base cluster B,
and |P| is the length of the phrase P. Only the base clusters whose score is higher than an arbitrarily chosen minimal
base cluster score are retained. In the second phase, N top-ranking base clusters are merged using a version of the AHC
algorithm, with binary single-link merge criterion and predetermined minimal similarity between base clusters as the
halting criterion [13]. The two distinguishing features of STC are: linear time complexity; clustering documents
according to shared phrases instead of word frequency. These make it “a substantial momentum” [17] of ephemeral
clustering.
Carrot system built by Weiss and Stefanowski extended STC’s application into the Polish Language, by using different
stemming techniques. They investigate the influence of two primary STC parameters: merge threshold and minimum
base cluster score on the number and quality of results produced by STC algorithm.
SnakeT [2, 23] constructs two knowledge bases offline and takes advantage of them. The first one is called Anchor
Text and Link Database. It is used to enrich the snippets returned by a search engine. The other is called Semantic
Knowledge Base, it is used to help ranking in the process of generate “approximate sentences”. This approach is
different from Grouper and other approaches as they treated sentences formed by contiguous terms, while SnakeT
extracts sentences involving non-contiguous terms. It first extracts “approximate” sentences from the enriched snippets
collection, then uses a knowledge base to help ranking, sentences above threshold are used as a set of meaningful
labels. These labels will be then used to form and name the clusters and are called the primary label. To generate the
labels of the nodes in the higher levels of the hierarchy, k-approximate sentences are used which have a good rank and
occur in at least c% of the documents contained in the cluster. This set of secondary labels “provide a description for
the cluster at a coarser level and thus is more useful for hierarchical formation and labeling”.
In addition to the above academic tools, there also has been a surge of commercial interest in novel IR-tools that
support users in searching tasks [11]. The following are existing industrial systems implementing clustering techniques
in their (meta) search engines: Vivisimo, Grokker, Clusty and Iboogie provide cluster hierarchies in addition to the flat
ranked list of search results; Kartoo use a network visualization interface, Mooter also use a network visualisation tool
but followed by hierarchical clusters presentation when a node in the network is clicked; Copernic and Dog-pile
concentrate more on supporting users on query formulation (providing revised/refined query suggestions). Among the
various clustering search engines, Vivisimo.com deserves a special mention. This commercial Meta search engine
organizes search results into hierarchical and very well described thematic groups and can be considered a benchmark
and state of- the-art in current research [17]. But very little information about this software is available as it is not
publicly accessible. Much academic research attempts to address the search results clustering problem, but the
attainable performance are far from the one achieved by Vivisimo. Only SnakeT claims to achieve efficiency and
efficacy performance close to it [12]. Following are some snapshots of available commercial clustering search engines.
Figure 2 Home Page of Grouper Clustering Engine
Volume 4, Issue 6, June 2015
Page 47
International Journal of Application or Innovation in Engineering & Management (IJAIEM)
Web Site: www.ijaiem.org Email: editor@ijaiem.org
Volume 4, Issue 6, June 2015
ISSN 2319 - 4847
Figure 3 Results of Grouper Clustering Engine for the word ‘clinton’
Figure 4 Home Page and Results of Vivisimo Clustering Engine for word ‘canada’
Volume 4, Issue 6, June 2015
Page 48
International Journal of Application or Innovation in Engineering & Management (IJAIEM)
Web Site: www.ijaiem.org Email: editor@ijaiem.org
Volume 4, Issue 6, June 2015
ISSN 2319 - 4847
Figure 5 Home Page and Results of Clusty Clustering Engine for word ‘data mining’
Figure 6 Home Page and Results of iBoogie Clustering Engine for word ‘data mining’
Figure 7 Home Page and Results of HOB Search Clustering Engine for word ‘paris’
Volume 4, Issue 6, June 2015
Page 49
International Journal of Application or Innovation in Engineering & Management (IJAIEM)
Web Site: www.ijaiem.org Email: editor@ijaiem.org
Volume 4, Issue 6, June 2015
ISSN 2319 - 4847
Moreover, as specified by [21] in their paper, they reformalize the search result clustering problem as a salient phrases
ranking problem. Thus they convert an unsupervised clustering problem to a supervised learning problem. Although a
supervised learning method requires additional training data, it makes the performance of search result grouping
significantly improve, and enables us to evaluate it accurately. Given a query and the ranked list of search results, their
method first parses the whole list of titles and snippets, extracts all possible phrases (n-grams) from the contents, and
calculates several properties for each phrase such as phrase frequencies, document frequencies, phrase length, etc. A
regression model learned from previous training data is then applied to combine these properties into a single salience
score. The phrases are ranked according to the salience score, and the top-ranked phrases are taken as salient phrases.
The salient phrases are in fact names of candidate clusters, which are further merged according to their corresponding
documents.
They specify that their method is more suitable for Web search results clustering because, emphasize the efficiency of
identifying relevant clusters for Web users. It generates shorter (and thus hopefully more readable) cluster names,
which enable users to quickly identify the topics of a specified cluster. Furthermore, the clusters are ranked according to
their salience scores, thus the more likely clusters required by users are ranked higher.
Amin Milani Fard, Reza Ghaemi[3],proposes a neuro-fuzzy architecture for Web content taxonomy using hybrid of
Adaptive Resonance Theory (ART) neural networks and fuzzy logic concept. The search engine called Kavosh1 is
equipped with unsupervised neural networks for dynamic data clustering. This work implements text mining method.
Unsupervised self-organizing map (SOM) based neural networks are mainly used for datasets clustering.
3.PROPOSED WORK
3.1Snippet Fetching and Clustering
3.1.1Preprocessing
In web search results clustering, it is the web snippets that serve as the input data for the grouping algorithm. Due to
the rather small size of the snippets and the fact that they are automatically generated summaries of the original
documents, proper data preprocessing is of enormous importance. Although LSI is capable of dealing with noisy data,
in a setting where only extremely small pieces of documents are available, this ability is severely limited. As a result,
without sufficient preprocessing, the majority of abstract concepts discovered by the LSI would be related to
meaningless terms, which would make them useless as cluster labels. Thus, the primary aim of the preprocessing phase
is to remove from the input documents all characters and terms that can possibly affect the quality of group
descriptions.
Text Filtering
Stemming
Stop words Removal
Figure 8 Pre-processing of snippets
In STC, there are three steps to the preprocessing phase: text filtering, stemming and stop words marking.
3.1.2Text filtering
In the text filtering step, all terms that are useless or would introduce noise in cluster labels are removed from the input
documents. Among such terms are:
• HTML tags (e.g. <table>) and entities (e.g. &)
• non-letter characters such as "$", "%" or "#" (except white spaces and sentence markers such as '.', '?' or '!')
Note that at this stage the stop-words are not removed from the input documents. Additionally, words that appear in
snippet titles are marked in order to increase their weight in further phases of clustering.
3.1.3Stemming
Stemming algorithms are used to transform the words in texts into their grammatical root form, and are mainly used to
improve the Information Retrieval System’s efficiency. To stem a word is to reduce it to a more general form, possibly
its root. For example, stemming the term interesting may produce the term interest. Though the stem of a word might
not be its root, we want all words that have the same stem to have the same root. The effect of stemming on searches of
English document collections has been tested extensively. Several algorithms exist with different techniques.
3.1.4Elimination of Stop Words
After stemming it is necessary to remove unwanted words. There are 400 to 500 types of stop words such as “of”,
“and”, “the,” etc., that provide no useful information about the document’s topic. Stop-word removal is the process of
removing these words. Stop-words account for about 20% of all words in a typical document [4]. These techniques
greatly reduce the size of the search engine’s index. Stemming alone can reduce the size of an index by nearly 40%. To
compare a webpage with another webpage, all unnecessary content must be removed and the text put into an array.
Although they alone do not present any descriptive value, stop words may help to understand or disambiguate the
Volume 4, Issue 6, June 2015
Page 50
International Journal of Application or Innovation in Engineering & Management (IJAIEM)
Web Site: www.ijaiem.org Email: editor@ijaiem.org
Volume 4, Issue 6, June 2015
ISSN 2319 - 4847
meaning of a phrase (compare: "Chamber Commerce" and "Chamber of Commerce"). That is why we have decided to
retain them in the input documents, only adding appropriate markers. This will enable the further phases of the
algorithm to e.g. filter out phrases ending with a stop word or prevent the LSI from indexing stop words at all.
Following tables 1 and 2 represent the document “Software for the sparse singular value decomposition” pre-processing
steps:
Table 1: Preprocessing Steps
Word
Document
Software
for, the
in
Pre-processing step
Word
Sparse
Stopwords-removed
during
Word
singular
Word
Value
Word
Decomposition
Stemming-decompose
Table 2: Result of Pre-processing Steps
Result of pre-processing
1. Software
2. Sparse
3. Singular
4. Value
5. decompose
3.2 Finding Similarity Between Input Query And Fetching Snippets
Finding similarity between Input query and fetching snippet is the second sub module of this system. Here the fuzzy
neural network is used to check the relevancy of query term into documents. To create a term document frequency
matrix, Consider the following seven documents as input snippets, also the frequent terms and phrases are generated by
using preprocessing frequents phrase extraction component then, all other components in clustering engine will
perform computational operations on it to cluster them in groups.
Documents:D1: Large-scale singular value computations
D2: Software for the sparse singular value decomposition
D3: Introduction to modern information retrieval
D4: Linear algebra for intelligent information retrieval
D5: Matrix computations
D6: Singular value cryptogram analysis
D7: Automatic information organization
Terms:T1: Information T2: Singular
T3: Value T4: Computations T5: Retrieval
Phrases:P1: Singular value P2: Information retrieval
Figure 9 Steps for query matching
Volume 4, Issue 6, June 2015
Page 51
International Journal of Application or Innovation in Engineering & Management (IJAIEM)
Web Site: www.ijaiem.org Email: editor@ijaiem.org
Volume 4, Issue 6, June 2015
ISSN 2319 - 4847
3.2.1 Neural Network for Clustering
Unsupervised self-organizing map (SOM) based neural networks [3] are mainly used for datasets clustering. SOM
neural network structure changes during learning process based on the observed data; however these methods are
usually not efficient when dealing with dynamic group clustering such as for web documents. The Adaptive Resonance
Theory (ART) neural network [9], however, is an unsupervised incremental clustering neural network especially
designed to satisfy this demand.
The proposed System uses a fuzzy approach [21] to evaluate and assign a score to a web page. The inputs to the fuzzy
inference system are the normalized value of term frequency in the document, position of the word, and number of web
links to the page. The final page score is then calculated as bellow:
PS = 2*frequency_score + position_score + link_score
The fuzzy rule base and experimental triangular membership functions are shown in Table 3 and figure 10. Following
are the linguistics variables
Linguistics Variables:
FK: frequency_of_keyword
FS: frequency_score
PK: position_of_keyword
NL: number_of_links
PS: position_score
LS: link_score
Table 3: Fuzzy Rule Base
IF
THEN
FK
High
FS is High
FK
Medium
FK is Medium
FK
Low
FK is Low
PK
Close_to_top
PK is High
PK
In_the middle
PK is Medium
PK
Far_from_top
PK is Low
NL
Many
NK is High
NL
Medium
NK is Medium
NL
Few
NK is Low
Figure 10 (a) Frequency_score MF
Figure 10 (b) Position_of_keyword MF
Figure 10 (c) Number_of_links MF
Figure 10: Membership Functions
Volume 4, Issue 6, June 2015
Page 52
International Journal of Application or Innovation in Engineering & Management (IJAIEM)
Web Site: www.ijaiem.org Email: editor@ijaiem.org
Volume 4, Issue 6, June 2015
ISSN 2319 - 4847
3.2.2System Model
Web Clustering Engine is a post-retrieval clustering that clusters the search results retrieved for the broad topic. The
output of the Web clustering engines ensures fast subtopic retrieval, quick topic exploration within unknown topics, and
easy identification of relevant search results within the broad topic. Figure 11 has the System Model.
3.2.2.1Search Keyword
The system provides an interface to accept the search keyword from the user.
3.2.2.2Search Result Acquisition or input snippets
The component accepts the search keyword as input. It allows us to configure the number of search results to be
extracted from various search engines. The component extracts and stores the search results which include the URL
pointing to the document, title, and snippet. Here, as discussed earlier in this work already extracted snippets are used
AMBIENT and ODP239 datasets.
3.2.2.3Multi-Agent
The term agent is a search tool. In this system, there are two datasets in which number of snippets is saved. The dataset
is divided into three agents to make it multi-agent. As the Dataset is divided into three agents, the query term is
accepted by all agent and check the relevancy of term in that agent by using neuro fuzzy member function as shown
above. The fuzzy rule base is applied on the documents for ranking the term. As multi-agent is used, query term is
simultaneously matched in number of agents. So that result obtains in less time than single agent.
3.2.2.4 Clustering Engine
The clustering engine preprocesses the input snippets and then converts the preprocessed search results to a format
suitable for the clustering algorithm. It extracts the features and provides them as input to the clustering algorithm
within. The clustering algorithm would build the cluster and identifies the label that best describes each cluster.
Figure 11: System Model
4.SYSTEM IMPLEMENTATION
In this system, a search engine is developed to retrieve information from datasets as already discussed. The semantic
content mining is done using clustering technique, suffix tree clustering. As query is entered in search box, first preprocessing takes place. In pre-processing document cleaning phase of STC is performed. Then the query is passed to
the datasets to retrieve related information. In this system, the time required retrieving snippets from single agent and
multi agent are compared. In single agent, data is retrieved from the single searching tool while in multi agent it
retrieves the information simultaneously from multiple search tools. In this system, datasets are divided into three parts
to make it multi-agents. If System uses multi agents, the time for getting result will reduce to great extent as compare to
single agent.
4.1Search
In this system, Search module is a first module. In this module, first query is entered in a search box and then choose
mode single agent or multi agent from drop down list. If user select multi agent, results will obtain within less time
Volume 4, Issue 6, June 2015
Page 53
International Journal of Application or Innovation in Engineering & Management (IJAIEM)
Web Site: www.ijaiem.org Email: editor@ijaiem.org
Volume 4, Issue 6, June 2015
ISSN 2319 - 4847
than selecting single agent. here also dropdown list is given for downloads. In it, system gives the number of results
present in a datasets. In figure 12, user enter Art query for 100 results and select the multi agent option.
Figure 12: Search
4.2 Multi-Agent
After pressing the search button, query is processed to give results from datasets. As user selects the multi agent mode
then it will search in multiple agents simultaneously and gives results within a very less time. In figure 13, user is
searching for the Art query and selects multi agent mode from dropdown list then it gives result within 16ms. This time
is definitely less than the time required for single agent.
Figure 13: Multi-Agent
4.3 Single Agent
The system use single agent option only for the comparison with multi-agents. As in figure 14, user selects single agent
it require 750 ms to retrieve the 100 results of Art query while in multi agent for the same query it were return the
results in 16ms.
Volume 4, Issue 6, June 2015
Page 54
International Journal of Application or Innovation in Engineering & Management (IJAIEM)
Web Site: www.ijaiem.org Email: editor@ijaiem.org
Volume 4, Issue 6, June 2015
ISSN 2319 - 4847
Figure 14: Single Agent
4.4Clusters
As user enter the Query and press search button it will check the query term in the datasets for the relevancy or
threshold value. If the relevancy is matched using fuzzy neural network then it will produce cluster for particular query
term. In the following figure, it will produce number of clusters for Art query and fetches the snippet related to Art.
Each cluster contains result of similar meaning for example if user selects animation cluster it will give only result of
animation. Such type of mining is called semantic content mining i.e. meaningful search.
Also in the figure 15 this system shows clusters for particular snippet. For example if our cursor is on 1st link then our
system shows the clusters in which this link is present. That is our system allows a document to appear more than one
cluster.
Figure 15:Clusters
Volume 4, Issue 6, June 2015
Page 55
International Journal of Application or Innovation in Engineering & Management (IJAIEM)
Web Site: www.ijaiem.org Email: editor@ijaiem.org
Volume 4, Issue 6, June 2015
ISSN 2319 - 4847
Figure 16: Snippets belonging to multiple clusters.
5.COMPARISON
In traditional way, particular search engine returns a long list of snippets for a given query. Generally User searches for
two to three pages and never goes up to last pages as a human tendency. But if the desired result is on 10th page then
user won’t be able to find it. While this system provides clusters which give snippets of similar meaning so that we can
retrieve accurate result.
As in existing approaches, generally single agent is used to retrieve the information but in this system multi agent
approach is used so that system can retrieve information from number of search tools simultaneously to get result in
minimum time. Figure 17 compares the existing System and proposed system.
Figure 17: Comparison between traditional system and proposed system.
Volume 4, Issue 6, June 2015
Page 56
International Journal of Application or Innovation in Engineering & Management (IJAIEM)
Web Site: www.ijaiem.org Email: editor@ijaiem.org
Volume 4, Issue 6, June 2015
ISSN 2319 - 4847
6.CONCLUSION AND FUTURE SCOPE
6.1Conclusion
Organizing web search results fastens the user browsing process as the task of searching for relevant web pages in an
unstructured list of results is time consuming. This work has proposed multi-agent based solution for mining semantic
web contents using neural network, with the aim to provide context based knowledge oriented results to the user. This
system implements Multi-agent neural network based framework for mining contents of semantic web, which would
provide query relevant knowledge using STC(Suffix Tree Cluster) clustering technique. To classify the relevancy of
search result fuzzy neural network is used. Using the fuzzy neural network, multi-agent matches the query term within
the document to create a cluster which return search result with high accuracy in a reasonably short time. Instead of
giving the long list of snippets our system provides snippet with cluster which give result within reasonably short time
period.
6.2Future Scope
The proposed work implements the neural network based multi-agent semantic web content mining. We can extend
this work in future as follow:
 As in the proposed work we use only text mining in the future it can be extended as multimedia mining.
 Also, system uses static datasets, it wills extended as a dynamic datasets i.e. datasets will update as the values
added.
 The amalgamation of web mining techniques with agent technology will lead to improved performance, reduced
network traffic, and better results.
REFERENCES
[1]. A. Spink, D. Wolfram, B.J. Jansen, T. Saracevis, Searching the Web: The public and their queries. Journal of the
American Society for Information Science and Technology 52 (3), 2001, pp 226-234.
[2]. Adam Shenker, Mark Last, and Abraham Kandel. Design and Implementation of a Web Mining System for
Organizing Search Engine Results. Proceedings of the CAiSE'01 Workshop Data Integration over the Web
(DIWeb'01), pp. 62 -75, Interlaken, Switzerland, 2001.
[3]. Amin Milani Fard, Reza Ghaemi, Mohammad –R. Akbarzadeh –T.1, Hoda Akbari, Kavosh: An Intelligent NeuroFuzzy Search Engine Published in proceedings of Seventh International Conference on Intelligent Systems Design
and Applications
[4]. April Kontostathis and William Pottenger. A Mathematical View of Latent Semantic Indexing: Tracing Term Coocurrences.http://www.cse.lehigh.edu/techreports/2002/LU-CSE-02-006.pdf.
[5]. B. Yuwono and D. L. Lee. Search and ranking algorithms for locating resources in World Wide Web. Proceedings
of the International Conference on Data Engineering (ICDE), pp. 164-171, New Orleans, USA, 1996.
[6]. C.Dimou, A.Batzios, A.L.Symeonidis and P.A.Mitkas, ‘A Multi-agent framework for spiders traversing the
semantic web’,Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence.
[7]. Choi, Y.S. and Yoo, S.I. Multi-agent Learning Approach to WWW Information Retrieval using Neural Network,
in Proceedings of 1999 ACM International Conference on Intelligent User Interfaces.
[8]. Claudio Carpineto, Stanisiaw Osinski, Giovanni Romano and Dawid Weiss, A Survey of Web Clustering Engines,
ACM Computing Surveys, Volume 41, No. 3, Article 17 (2009) July, pages 17:1 17:38.
[9]. Cutting, D.R., Karger, D.R., Pedersen, J.O., Tukey, J.W. 1992. Scatter/Gather: A Cluster-based Approach to
Browsing Large Document Collections. ACM SIGIR Conference on Research and Development in Information
Retrieval, pp.318-329
[10]. Eirinaki M. & Vazirgiannis M., ‘Web Mining for Web Personalization’. Published in ACM Transactions on
Internet Technology, Vol.3 , No. 1, February 2003, pp. 1-27.
[11]. F. Buccafurri, G. Lax, D. Rosaci and D. Ursino, ‘Dealing with semantic heterogeneity for improving web usage’,
Data Knowl. Eng. 58 (3) (2006), pp. 436–465.
[12]. Ferragina, P. and Gulli, A. (2004) The Anatomy of a Hierarchical Clustering Engine for Web-page, News and
Book Snippets. Technical report, RR04-04 In-formatica, Pisa.
[13]. Ferragina, P. and Gulli, A. (2005) A personalized Search Engine Based OnWeb- Snippet Hierarchical Clustering.
In 14th International World Wide Web Conference.
[14]. Hua-Jun Zeng, Qi-Cai He, Zheng Chen, Wei-Ying Ma, Jinwen Ma-Learning to Cluster Web Search Results,;
SIGIR04, July 2529, 2004, She_eld, South York- shire, UK
[15]. J. Han and M. Kamber. Data Mining - Concepts and Techniques. Academic Press, 2001.
[16]. James A. Freeman, David M. Skapura, Neural Networks Algorithms, Applications, and Programming Techniques
in COMPUTATION AND NEURAL SYSTEMS SERIES, Christof Koch ,California Institute of Technology,1991
[17]. Jiang, Z. H., Joshi, A., Krishnapuram, R. and Yi, L. Y. (2002). Retriever: improving web search engine results
using clustering. In Managing Business and Electronic Commerce.
Volume 4, Issue 6, June 2015
Page 57
International Journal of Application or Innovation in Engineering & Management (IJAIEM)
Web Site: www.ijaiem.org Email: editor@ijaiem.org
Volume 4, Issue 6, June 2015
ISSN 2319 - 4847
[18]. Jon Kleinberg. Authoritative sources in a hyperlinked environment. ACM-SIAM Symposium on Discrete
Algorithms, pp. 668-667, San Francisco, USA,1998.
[19]. Karayannidis N. & Sellis T., ‘Hierarchical Clustering for OLAP: The CUBE File Approach’. Published in The
VLDB Journal — The International Journal on Very Large Data Bases, Vol. 17, Issue 4, July 2008.
[20]. Kosala R. & Blockeel H., ‘Web Mining Research: A Survey’. Published in ACM SIGKDD, Vol. 2, Issue 1,July
2000
[21]. Krishna Bharat and Monika R. Henzinger. Algorithms for Topic Distillation in a Hyperlinked Environment.
Proceedings of the 21st ACM SIGIR Conference on Research and Development in Information Retrieval, 1998.
[22]. Lean Yu, Shouyang Wang and Lai Kin Keung,A Multi-Agent Neural Network System for Web Text Mining,
Information Science Reference, (2008) 162-183
[23]. M. A. Hearst and J. O. Pedersen. Reexamining the Cluster Hypothesis: Scatter/Gather on Retrieval Results.
Proceedings of the Nineteenth Annual International ACM SIGIR Conference, Zurich, June 1996.
[24]. Nikola K. Kasabov, Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering, The MIT
Press Cambridge, Massachusetts London, England,1998.
[25]. Rasmussen, E. 1992. Clustering Algorithms. Information Retrieval, W.B.Frakes R. Baeza-Yates, Prentice Hall
PTR, New Jersey
[26]. S. Osinski and D. Weiss. A concept-driven algorithms for clustering search results. IEEE Intelligent Systems, 20
(3), (2005), pages 48-54.
[27]. S.Ganesh, M. Jayaraj, V.Kalyan & G. Aghila ,’Ontology based Web Crawler’, Published in proceedings of the
international Conference on Information Technology Coding & Computation, 2004
[28]. Sharma K., Shrivastava G. & Kumar V., ‘Web Mining: Today and Tommorrow’. In Proceedings of the IEEE 3rd
International Conference on Electronics Computer Technology, 2011.
[29]. Shu Bo and Kak Subhash, A neural network-based intelligent metasearch engine, Information Sciences 120
(1999) 1-11
[30]. Singh A, Agent Based Framework for Semantic Web Content Mining, International Journal of Advancements in
Technology 3(April 12)
[31]. Singh A., Juneja D. and Sharma A.K., ‘Design of Ontology-Driven Agent based Focused Crawlers’. In
proceedings of 3rd International Conference on Intelligent Systems & Networks (IISN-2009),Organized by
Institute of Science and Technology, Klawad, 14 -16 Feb 2009, pp.178-181.Available online in ECONOMICS OF
NETWORKS ABSTRACTS, Volume 2, No. 8: Jan 25, 2010.
[32]. Singh A., Juneja D., Sharma A.K., ‘General Design Structure of Ontological Databases in Semantic Web’.
Published in International Journal of Engineering, Science & Technology, Vol. 2, Issue 5, pp. 1227-1232, 2010
[33]. Steinbach, M., G. Karypis, G., Kumar, V. 2000. A Comparison of Document Clustering Techniques. KDD
Workshop on Text Mining
[34]. Udi Manber and Gene Myers. Su_x arrays: a new method for on-line string searches.In Proceedings of the _rst
annual ACM-SIAM symposium on Discrete algorithms, pages 319327, 1990.
[35]. Voorhees, E. M. 1986. Implementing agglomerative hierarchic clustering algorithms for use in document
retrieval. Information Processing Management,22:465-476
[36]. Wang, Y. and Kitsuregawa, M. (2001) Link based clustering of web search results. In proceedings of the 2nd
International Conference on Web-Age Information Management (WAIM2001), XiAn, P.R.China, Spring-Verlag
LNCS, July, 2001.
[37]. Willett, P. 1988. Recent Trends in Hierarchic document Clustering: a critical review. Information Management,
24(5):577-597
[38]. Y. S. Choi, and I. Yoo,Jaeho Lee Neural Network Based Multi-agent Information Retrieval System, in
Proceedings of 2001 ACM International Conference on Intelligent User Interfaces.
[39]. Yuvarani Meiyappan, N. Ch. S. Narayana Iyengar and A. Kannan SRCluster: Web Clustering Engine based on
Wikipedia International Journal of Advanced Science and Technology Vol. 39, February, 2012, PP, 1-18
[40]. Zamir O. and Etzioni, O., (1998)Web document clustering: a feasibility demon-stration. SIGIR 98, Melbourne,
Australia.
[41]. Zamir O. Clustering web documents: a phrase-based method for grouping search engine
results, Doctoral
dissertation, University of Washington and O. M. Oren Zamir, Oren Etzioni and R. Karp. Fast and
intuitiveclustering of web documents. In Proceedings of the 3rd International Conference on Knowledge Discovery
and Data Mining, pages 287-290, 1999
[42]. Zhan L. & Zhijing L., ‘Web Mining based on Multi-Agents’. Published in proceedings of Fifth International
Conference on Computational Intelligence and Multimedia Applications (ICCIMA’03),
[43]. Zhang, D. and Dong, Y. S. (2001). Semantic, Hierarchical, Online Clustering of Web Search Results. In ACM 3rd
Workshop on Web Information and Data.
[44]. Zhao Y and Karypis G, Criterion functions for document clustering: experiments and
Analysis, Technical
Report, Department of Computer Science, University of Minnesota, (2010).
Volume 4, Issue 6, June 2015
Page 58
International Journal of Application or Innovation in Engineering & Management (IJAIEM)
Web Site: www.ijaiem.org Email: editor@ijaiem.org
Volume 4, Issue 6, June 2015
ISSN 2319 - 4847
AUTHOR
A.H.Bhuskat has received her ME. Degree in Computer Science & Engineering from
P.R.M.C.E.A.M,Badnera, Amravati, India. Currently she is working as an Assistant Professor at Prof.
Ram Meghe Institute of Technology & Research, Badnera, Maharashtra, India.
N.M.Yawale has received her ME. Degree in Computer Science & Engineering from P.R.Patil College
of Engineering, Amravati, India.Currently she is working as an Assistant Professor at Prof. Ram
Meghe Institute of Technology & Research, Badnera, Maharashtra, India.
P.P.Deshmukh has received her ME. Degree in Computer Science & Engineering from
P.R.M.C.E.A.M,Badnera, Amravati, India. Currently she is working as an Assistant Professor at Prof.
Ram Meghe Institute of Technology & Research, Badnera, Maharashtra, India
R.A.Gulhane has received her ME. Degree in Computer Science & Engineering from
P.R.M.C.E.A.M,Badnera, Amravati, India. Currently she is working as an Assistant Professor at Prof.
Ram Meghe Institute of Technology & Research, Badnera, Maharashtra, India.
M.A.Deshmukh has received her ME. Degree in Computer Science & Engineering from P.R.M.I.T &
R,Badnera, Amravati, India. Currently she is working as an Assistant Professor at Prof. Ram Meghe
Institute of Technology & Research, Badnera, Maharashtra, India.
Volume 4, Issue 6, June 2015
Page 59
Download