International Journal of Engineering Trends and Technology (IJETT) – Volume 4 Issue 10 - Oct 2013
ISSN: 2231-5381, http://www.ijettjournal.org
A Pattern Based Approach For User Interesting
Results Over Search Engine
Seeatayya Narthu¹, V. Srikanth²
¹M.Tech Scholar, ²Associate Professor
¹,²Dept. of CSE, Pydah College of Engg. & Technology, Visakhapatnam, A.P.
Abstract:- In this paper we propose an efficient search
mechanism based on user search queries for finding the
optimal search results. Our approach first finds the
patterns that have already been visited frequently for a relevant
query within a specific session. To find the best pattern,
we apply an efficient pattern mining algorithm, which retrieves
the pattern that is optimal among all patterns.
I. INTRODUCTION
Thanks to the ubiquity of the Internet search engine box,
users have come to depend on Web search engines both to
find new information and to re-find previously viewed
information. A recent Pew Internet and American Life
report showed that Internet searches are a top Internet
activity, second only to email [1]; in a study of Web users
[2], 17% of those surveyed reported “Not being able to
return to a page I once visited,” as one of “the biggest
problems in using the Web.” As a result, knowledge
workers are estimated to waste 15% of their
time because they cannot find information that they know
already exists [7]. Despite these known problems, the use
of keyword search engines for re-finding has not been
significantly studied. While many searches are for new
information, a significant use of search engines is to find
information that was found before. For example, a query
or keyword is often used to “bookmark” a Web page. In
this paper, we build on earlier work [2] to explore how
keyword search is used for re-finding. We analyze the
queries and result clicks of 114 anonymous Yahoo users
over the course of a year. Our analysis demonstrates that
re-finding queries are common and provides a detailed
characterization of them. Given the pervasiveness of re-finding queries, we explore which search engine features
support or hinder re-finding. In particular, we concentrate
on changes in rank and demonstrate the detrimental impact
of rank changes on this type of task. Making use of our
understanding of re-finding behavior, we describe
algorithmic methods to detect re-finding intent and suggest
ways in which search engines can better support this
behavior.
II. RELATED WORK
Most of the work on query similarity is related to query
expansion or query clustering. One early technique
proposed by Raghavan and Sever [14] attempts to measure
query similarity using the differences in the ordering of
documents retrieved in the answers, which is not feasible in
the current Web. Later, Fitzpatrick and Dent [11]
measured query similarity using the normalized set
intersection of the top 200 documents in the answers for
the queries. Again, this is not meaningful in the Web, as the
intersection for semantically similar queries that use
different synonyms can and will be very small. Wen et al.
[17] proposed to cluster similar queries to recommend
URLs to frequently asked queries of a search engine. They
used four notions of query distance based on:
(1) keywords or phrases of the query; (2) string
matching of keywords; (3) common clicked URLs; and (4)
the distance of the clicked documents in some pre-defined
hierarchy. Beeferman and Berger [4] also proposed a query
clustering technique based on distance notion (3). As the
average number of words in queries is small (about two)
and the number of clicks in the answer pages is also small
[1], notions (1) and (2) generate very sparse distance
matrices. Notion (4) needs a concept taxonomy and requires the
clicked documents to be classified into the taxonomy,
which cannot be done at a large scale. Notion (3) is also sparse,
but this sparsity can be diminished using large query logs,
as in this paper. Fonseca et al. [12] propose to discover
related queries using association rules. The query log is
viewed as a set of transactions, with each transaction
representing a session in which a single user submits a
sequence of related queries in a time interval. The method
shows good results, but two
problems arise: first, it is difficult to determine sessions of
queries belonging to the same search process; second, the
most interesting related queries, those submitted by
different users, cannot be discovered, since the support of a
rule increases only if its queries appear in the same query
session (i.e., they are submitted by the same user). Baeza-Yates et al. [2, 3] used the content of clicked Web pages to
define a term-weight vector model for a query. They
consider terms in the URLs clicked after a query. Each
term is weighted according to the number of occurrences of
the query and the number of clicks of the documents in
which the term appears. Then the similarity of two queries
is equivalent to the similarity of their vector
representations, measured for example with the cosine function. This
notion of query similarity has several advantages. First, it is
simple and easy to compute. Second, it allows relating
queries that happen to be worded differently but stem
from the same topic, hence capturing semantic
relationships among queries. Recently, Sahami and
Heilman [15] used a query similarity based on the snippets
of the answers to the queries. However, they do not
consider the feedback of the users (i.e. clicked pages).
Another related paper defines a query taxonomy to cluster
the answers [18], but does not use query logs, whereas Cid
et al. [5] use query logs to maintain a taxonomy, but
not to build one.
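The term-weight vector model of Baeza-Yates et al. [2, 3] described above can be illustrated with a minimal sketch. It simplifies the weighting: a term's weight is taken to be the total number of clicks on clicked URLs that contain the term, ignoring the query-occurrence factor, and URLs are split into terms on non-alphanumeric characters with a small hypothetical stop list. The function names are illustrative, not from [2, 3].

```python
import math
import re
from collections import Counter

# Hypothetical stop list: URL fragments too common to carry meaning.
STOP_TERMS = {"www", "http", "https", "com", "org", "co", "in"}


def url_terms(url: str) -> set[str]:
    # Split a URL into lowercase alphanumeric terms, dropping stop terms.
    return {t for t in re.split(r"[^a-z0-9]+", url.lower()) if t and t not in STOP_TERMS}


def query_vector(clicks: dict[str, int]) -> Counter:
    """Term-weight vector for one query.

    `clicks` maps each URL clicked after the query to its click count.
    Simplifying assumption: a term's weight is the total clicks on URLs
    that contain the term.
    """
    vec: Counter = Counter()
    for url, n_clicks in clicks.items():
        for term in url_terms(url):
            vec[term] += n_clicks
    return vec


def cosine(u: Counter, v: Counter) -> float:
    # Cosine similarity of two sparse vectors; 0.0 if either is empty.
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Two queries that are worded differently but are followed by clicks on the same URLs get a high score, which is the semantic-relationship advantage noted above; queries whose clicked URLs share no terms score 0.0.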
Organizing the query groups within a user’s history is
challenging for a number of reasons. First, related queries
may not appear close to one another, as a search task may
span days or even weeks. This is further complicated by the
interleaving of queries and clicks from different search
tasks due to users’ multi-tasking, opening multiple browser
tabs, and frequently changing search topics. Second, related queries
may not be textually similar, and traditional approaches such as
query-click and query-fusion graphs may not give optimal
results.
III. PROPOSED WORK
The main objective of this work is to retrieve results of interest to the user from the search engine, based on the user's query. We introduce a novel approach to this problem, an integrated pattern mining approach. In this approach, we first retrieve the search history that is relevant to the user's keyword with respect to individual sessions. On these patterns we apply pattern mining to find the optimal patterns; we then extract the individual URLs from the patterns and display the optimal results to the user.

The user searches for the required information with a known search keyword. The search engine receives the query and forwards it to the search history. The search history retrieves the session-oriented results that are relevant to the search query and forwards them to the pattern mining component, which performs the main task (mining) over these session-oriented results. For this step we use the Apriori algorithm to find the optimal search results. Apriori works on the session-oriented results (a set of records), where each record contains a session ID and a pattern (the sequence of navigated URLs). A session ID identifies one session, i.e. the period between the user's initiation and termination of the session. Example search history results are shown below; for simplicity, we represent each URL as a single letter, such as www.wikipedia.com (w), www.encyclopedia.com (e), www.abc.com (a), www.xyz.co.in (x), and so on. The results retrieved by the search history are forwarded to the mining approach, which mines them with the Apriori algorithm.
Session ID   Pattern
S1           wea
S2           abex
S3           xya
S4           wxe
Figure: Architecture (Search engine, Search history, Pattern Mining)
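The data flow in the architecture (search engine to search history to pattern mining) can be sketched as below. The record type `HistoryRecord`, its `query` field, and the keyword-containment test used for relevance are all assumptions made for illustration; the text states only that each record holds a session ID and a pattern and that the retrieved sessions are relevant to the user's query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HistoryRecord:
    # Hypothetical search-history record. The text names the session ID and
    # the pattern; the `query` field is an assumption added for filtering.
    session_id: str
    query: str
    pattern: str  # navigated URLs in order, one letter per URL


def session_results(history: list[HistoryRecord], keyword: str) -> list[HistoryRecord]:
    """Search-history step: keep the sessions relevant to the user's keyword.

    Assumption: a session is relevant if its query contains the keyword,
    ignoring case.
    """
    kw = keyword.lower()
    return [r for r in history if kw in r.query.lower()]


def to_transactions(records: list[HistoryRecord]) -> list[frozenset[str]]:
    # Hand-off to pattern mining: each pattern becomes an unordered itemset.
    return [frozenset(r.pattern) for r in records]
```

A real system would likely use a richer relevance test than substring matching; the point here is only the hand-off from session-oriented history records to the itemsets that the mining step consumes.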
Mining algorithm:
1. Scan the entire transaction database to get the support S of each 1-itemset, compare S with min_sup, and get the set of frequent 1-itemsets, L1.
2. Join L(k-1) with L(k-1) to generate a set of candidate k-itemsets. Use the Apriori property to prune every candidate k-itemset that has an infrequent (k-1)-subset.
3. Scan the transaction database to get the support S of each remaining candidate k-itemset, compare S with min_sup, and get the set of frequent k-itemsets, Lk.
4. If the candidate set is not empty, increment k and go to step 2; otherwise stop.
5. For each frequent itemset l, generate all nonempty proper subsets of l.
6. For every nonempty proper subset s of l, output the rule s => (l - s) if its confidence C = support(l) / support(s) is greater than min_conf.
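The six steps above can be sketched in Python as follows. This is a minimal, unoptimized illustration of standard Apriori with rule generation, not the authors' implementation; min_sup is assumed to be an absolute support count (the text does not say whether it is a count or a fraction), each session pattern is treated as an unordered set of URLs, and the names `apriori` and `rules` are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_sup):
    """Return {itemset: support_count} for all itemsets with support >= min_sup.

    transactions: list of frozensets, one per session pattern.
    min_sup: minimum support as an absolute count (assumption).
    """
    # Step 1: count every 1-itemset and keep the frequent ones (L1).
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_sup}
    all_frequent = dict(frequent)
    k = 2
    while frequent:
        prev = list(frequent)
        # Step 2: join L(k-1) with itself to form candidate k-itemsets ...
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # ... and prune candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
        }
        # Step 3: scan the transactions to count candidate support (Lk).
        frequent = {}
        for c in candidates:
            sup = sum(1 for t in transactions if c <= t)
            if sup >= min_sup:
                frequent[c] = sup
        all_frequent.update(frequent)
        # Step 4: continue while new frequent itemsets were found.
        k += 1
    return all_frequent


def rules(frequent, min_conf):
    """Steps 5-6: return (s, l - s, confidence) for rules with confidence > min_conf."""
    out = []
    for itemset, sup in frequent.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for subset in combinations(itemset, r):
                s = frozenset(subset)
                # Every subset of a frequent itemset is frequent, so s is in `frequent`.
                conf = sup / frequent[s]
                if conf > min_conf:
                    out.append((s, itemset - s, conf))
    return out
```

On the example sessions with min_sup = 2 (an illustrative choice), the frequent itemsets are the four single URLs w, e, a, x and the pairs {w, e}, {e, a}, {e, x}, {a, x}; the only 3-itemset candidate that survives pruning, {e, a, x}, occurs in one session only. With min_conf = 0.7, the single rule produced is w => e, with confidence 1.0.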
The pattern mining algorithm forwards the resulting patterns to the
search engine, which presents their individual URLs to the user as search results.
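The final step, turning mined patterns into individual URLs, is not specified in detail. The sketch below assumes one plausible ordering: longer patterns first, then higher support, with ties broken alphabetically and URLs de-duplicated in the order they are first seen. The `URLS` mapping follows the letter codes of the example; `patterns_to_results` is a hypothetical name.

```python
# Letter-to-URL mapping, following the example in the text.
URLS = {
    "w": "www.wikipedia.com",
    "e": "www.encyclopedia.com",
    "a": "www.abc.com",
    "x": "www.xyz.co.in",
}


def patterns_to_results(frequent: dict[frozenset, int]) -> list[str]:
    """Flatten frequent patterns into an ordered list of unique URLs.

    Assumed ordering (not specified in the text): patterns with more URLs
    first, then higher support, then alphabetical order as a tie-breaker.
    """
    ranked = sorted(frequent.items(), key=lambda kv: (-len(kv[0]), -kv[1], sorted(kv[0])))
    results: list[str] = []
    seen = set()
    for pattern, _support in ranked:
        for item in sorted(pattern):
            if item not in seen:
                seen.add(item)
                # Unknown letters are passed through unchanged.
                results.append(URLS.get(item, item))
    return results
```

For the frequent itemsets of the example (support count 2 or more), this ordering would return www.abc.com, www.encyclopedia.com, www.xyz.co.in and www.wikipedia.com; the alphabetical tie-break is arbitrary and only makes the output deterministic.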
IV. CONCLUSION
We conclude our research work with an efficient
pattern mining approach in place of clustering
and other traditional approaches. The system can be enhanced
by improving the pattern mining approach with
an evolutionary algorithm (e.g., a genetic algorithm or an ant colony
algorithm).
REFERENCES
[1] R. Baeza-Yates. Applications of web query mining.
ECIR'05.
[2] R. Baeza-Yates, C. Hurtado, and M. Mendoza. Query
clustering for boosting web page ranking. AWIC'04.
[3] R. Baeza-Yates, C. Hurtado, and M. Mendoza. Query
recommendation using query logs in a search engine.
EDBT Workshops, 2004.
[4] D. Beeferman and A. Berger. Agglomerative clustering
of a search engine query log. KDD'99. Boston, MA USA.
[5] A. Cid, C. Hurtado, and M. Mendoza. Automatic
maintenance of Web directories using clickthrough data.
WIRI'06.
[6] S.-L. Chuang and L.-F. Chien. Automatic query
taxonomy generation for information retrieval applications.
Online Information Review 27(4), 2003.
[7] S.-L. Chuang and L.-F. Chien. Enriching web
taxonomies through subject categorization of query terms
from search engine logs. Decision Support System 30(1),
2003.
[8] S.-L. Chuang and L.-F. Chien. Towards automatic
generation of query taxonomy: A hierarchical query
clustering approach. ICDM'02.
[9] P.-J. Cheng, C.-H. Tsai, C.-M. Hung, and L.-F. Chien.
Query Taxonomy Generation for Web Search (poster).
CIKM'06.
[10] G. Dupret and M. Mendoza. Automatic Query
Recommendation using Click-Through Data. IFIP PPAI'06.
[11] V.I. Levenshtein, “Binary Codes Capable of
Correcting Deletions, Insertions and Reversals,” Soviet
Physics Doklady, vol. 10, pp. 707-710, 1966.
[12] M. Sahami and T.D. Heilman, “A Web-based Kernel
Function for Measuring the Similarity of Short Text
Snippets,” Proc. the 15th Int’l Conf. World Wide Web
(WWW ’06), pp. 377-386, 2006.
[13] J.-R. Wen, J.-Y. Nie, and H.-J. Zhang, “Query
Clustering Using User Logs,” ACM Trans. in Information
Systems, vol. 20, no. 1, pp. 59-81, 2002.
[14] A. Fuxman, P. Tsaparas, K. Achan, and R. Agrawal,
“Using the Wisdom of the Crowds for Keyword
Generation,” Proc. the 17th Int’l Conf. World Wide Web
(WWW ’08), 2008.
[15] K. Avrachenkov, N. Litvak, D. Nemirovsky, and N.
Osipova, “Monte Carlo Methods in PageRank
Computation: When One Iteration Is Sufficient,” SIAM J.
Numerical Analysis, vol. 45, no. 2, pp. 890-904, 2007.
BIOGRAPHIES
Seeatayya Narthu is pursuing an M.Tech in the
Department of Computer Science and
Engineering at Pydah College of Engg. &
Technology. His area of interest is
data mining.
V. Srikanth is working as an Associate
Professor in the Department of CSE at Pydah
College of Engg. & Technology. His
area of interest is data mining.