A Pattern Based Approach For User Interesting Results Over Search Engine

International Journal of Engineering Trends and Technology (IJETT) – Volume 4 Issue 10 - Oct 2013

Seeatayya Narthu (1), V. Srikanth (2)
(1) M.Tech Scholar, (2) Associate Professor, Dept. of CSE, Pydah College Of Engg & Technology, Visakhapatnam, A.P.

Abstract: In this paper we propose an efficient search mechanism based on user search queries. To find the optimal search results, our approach first finds the patterns that have already been visited frequently for a relevant query within a specific session. To find the best pattern, we apply an efficient pattern mining algorithm, which retrieves the pattern that is optimal among all patterns.

I. INTRODUCTION

Thanks to the ubiquity of the Internet search engine box, users have come to depend on Web search engines both to find new information and to re-find previously viewed information. A recent Pew Internet and American Life report showed that Internet searches are a top Internet activity, second only to email [1]; in a study of Web users [2], 17% of those surveyed reported “Not being able to return to a page I once visited,” as one of “the biggest problems in using the Web.” The effect of this is that knowledge workers are estimated to waste 15% of their time because they cannot find information that they know already exists [7].

Despite these known problems, the use of keyword search engines for re-finding has not been significantly studied. While many searches are for new information, a significant use of search engines is to find information that was found before. For example, a query or keyword is often used to “bookmark” a Web page. In this paper, we build on earlier work [2] to explore how keyword search is used for re-finding. We analyze the queries and result clicks of 114 anonymous Yahoo users over the course of a year. Our analysis demonstrates that re-finding queries are common and provides a detailed characterization of them.
Given the pervasiveness of re-finding queries, we explore which search engine features support or hinder re-finding. In particular, we concentrate on changes in rank and demonstrate the detrimental impact of rank changes on this type of task. Making use of our understanding of re-finding behavior, we describe algorithmic methods to detect re-finding intent and suggest ways in which search engines can better support this behavior.

II. RELATED WORK

Most of the work on query similarity is related to query expansion or query clustering. One early technique, proposed by Raghavan and Sever [14], attempts to measure query similarity using the differences in the ordering of documents retrieved in the answers, which is not feasible on the current Web. Later, Fitzpatrick and Dent [11] measured query similarity using the normalized set intersection of the top 200 documents in the answers for the queries. Again, this is not meaningful on the Web, as the intersection for semantically similar queries that use different synonyms can and will be very small.

Wen et al. [17] proposed to cluster similar queries to recommend URLs to frequently asked queries of a search engine. They used four notions of query distance based on: (1) keywords or phrases of the query; (2) string matching of keywords; (3) common clicked URLs; and (4) the distance of the clicked documents in some pre-defined hierarchy. Beeferman and Berger [4] also proposed a query clustering technique based on distance notion (3).
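
As an illustration of distance notion (3), the following minimal sketch scores two queries by the normalized overlap of the URLs clicked for each. The Jaccard normalization, the function name `click_similarity`, and the placeholder URLs `u1` to `u4` are assumptions made for this example; they are not the exact formulas used in [17] or [4].

```python
def click_similarity(clicks_a, clicks_b):
    """Jaccard overlap of the URL sets clicked for two queries (0.0 to 1.0)."""
    a, b = set(clicks_a), set(clicks_b)
    if not a and not b:
        # No clicks recorded for either query: treat as unrelated.
        return 0.0
    return len(a & b) / len(a | b)


def click_distance(clicks_a, clicks_b):
    """Distance form of the same measure: 0.0 means identical click sets."""
    return 1.0 - click_similarity(clicks_a, clicks_b)


# Hypothetical click log: query text -> URLs clicked after that query.
click_log = {
    "query one": ["u1", "u2"],
    "query two": ["u1", "u3"],
    "query three": ["u4"],
}

sim_related = click_similarity(click_log["query one"], click_log["query two"])
sim_unrelated = click_similarity(click_log["query one"], click_log["query three"])
```

Two queries that share no clicked URL get similarity 0 regardless of how similar their wording is, which is why this notion yields sparse distance matrices unless the query log is large.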
As the average number of words in queries is small (about two) and the number of clicks in the answer pages is also small [1], notions (1) and (2) generate very sparse distance matrices. Notion (4) needs a concept taxonomy and requires the clicked documents to be classified into that taxonomy, which cannot be done on a large scale. Notion (3) is also sparse, but this sparsity can be diminished by using large query logs, as in this paper.

Fonseca et al. [12] propose to discover related queries using association rules. The query log is viewed as a set of transactions, with each transaction representing a session in which a single user submits a sequence of related queries in a time interval. The method shows good results, but two problems arise: it is difficult to determine sessions of queries belonging to the same search process; moreover, the most interesting related queries, those submitted by different users, cannot be discovered, since the support of a rule increases only if its queries appear in the same query session (i.e., they are submitted by the same user).

Baeza-Yates et al. [2, 3] used the content of clicked Web pages to define a term-weight vector model for a query. They consider terms in the URLs clicked after a query. Each term is weighted according to the number of occurrences of the query and the number of clicks of the documents in which the term appears. The similarity of two queries is then the similarity of their vector representations, measured for example with the cosine function. This notion of query similarity has several advantages. First, it is simple and easy to compute. Second, it can relate queries that happen to be worded differently but stem from the same topic, hence capturing semantic relationships among queries.
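
The vector model can be sketched as follows. This is a simplified illustration, not the weighting scheme of [2, 3]: here each term is weighted only by the number of clicks on the documents that contain it, and the names `query_vector` and `cosine`, as well as the sample terms and click counts, are assumptions made for the example.

```python
import math
from collections import Counter


def query_vector(clicked_docs):
    """Build a term-weight vector for one query.

    clicked_docs is a list of (terms, clicks) pairs, one per clicked
    document: the terms of the document and how often it was clicked.
    """
    vec = Counter()
    for terms, clicks in clicked_docs:
        for term in terms:
            vec[term] += clicks
    return vec


def cosine(u, v):
    """Cosine similarity of two sparse term-weight vectors."""
    dot = sum(weight * v.get(term, 0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical data: two differently worded queries whose clicked
# documents share the term "mining".
q1 = query_vector([(["apriori", "mining"], 2)])
q2 = query_vector([(["mining", "rules"], 1)])
similarity = cosine(q1, q2)
```

Because the vectors are built from clicked documents rather than from the query text, two queries with no words in common can still obtain a non-zero similarity.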
Recently, Sahami and Heilman [15] used a query similarity based on the snippets of the answers to the queries. However, they do not consider the feedback of the users (i.e., clicked pages). Another related paper defines a query taxonomy to cluster the answers [18], but query logs are not used, while Cid et al. [5] use query logs to maintain a taxonomy, but not to build one.

Organizing the query groups within a user’s history is challenging for a number of reasons. First, related queries may not appear close to one another, as a search task may span days or even weeks. This is further complicated by the interleaving of queries and clicks from different search tasks due to users’ multi-tasking, opening multiple browser tabs, and frequently changing search topics. Moreover, because related queries may not be textually similar, traditional approaches such as query click graphs and query fusion graphs may not give optimal results.

III. PROPOSED WORK

The main objective of the project is to retrieve from the search engine the results that interest the user, based on the user's query. We introduce a novel approach to this problem, an integrated pattern mining approach. In this approach we first retrieve the search history that is relevant to the user's keyword for each individual session. We then apply pattern mining to those patterns to find the optimal patterns, extract the individual URLs from the patterns, and display the optimal results to the user.

The user searches for the required information with a known search keyword; the search engine receives the query and forwards it to the search history. The search history retrieves the session-oriented results that are relevant to the search query and forwards them to the pattern mining component. The pattern mining component performs the main task (mining) over the session-oriented results; for this we use the Apriori algorithm to find the optimal search results.
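
The mining step can be sketched in a few lines. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it runs Apriori over the four example sessions used in this section (S1 to S4, with URLs abbreviated to single letters), treats each navigation pattern as an unordered set of URLs as Apriori does, counts support as a number of sessions, and uses the assumed thresholds `min_sup=2` and `min_conf=0.7`. The function names `apriori` and `rules` are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_sup):
    """Return {itemset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    # Frequent 1-itemsets (support compared with min_sup using >=).
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_sup:
            current[frozenset([item])] = s
    frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join L(k-1) with itself to form candidate k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Apriori property: every (k-1)-subset must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev
                             for s in combinations(c, k - 1))}
        # Count support of the surviving candidates.
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_sup:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


def rules(frequent, min_conf):
    """Return (lhs, rhs, confidence) for rules with confidence > min_conf."""
    out = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                conf = sup / frequent[lhs]
                if conf > min_conf:
                    out.append((lhs, itemset - lhs, conf))
    return out


# Example sessions from this section: session ID -> visited URLs.
sessions = {"S1": "wea", "S2": "abex", "S3": "xya", "S4": "wxe"}
frequent = apriori(sessions.values(), min_sup=2)
found_rules = rules(frequent, min_conf=0.7)
```

With these assumed thresholds, the frequent itemsets are the four single URLs w, e, a, x and the four pairs {w,e}, {e,a}, {e,x}, {a,x}; the only rule whose confidence exceeds 0.7 is w ⇒ e. Choosing different thresholds, or a sequence-aware miner that respects navigation order, would change the result.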
Apriori works on the search-oriented results (a set of records), where each record contains a session ID and a pattern (the sequentially navigated URLs). The session ID identifies one session, that is, the period between the user starting and ending a search. Example search history results are given below; for simplicity we represent the URLs by single letters, such as www.wikipedia.com (w), www.encyclopedia.com (e), www.abc.com (a), www.xyz.co.in (x), and so on. The results retrieved by the search history are forwarded to the mining component, where they are mined with the Apriori algorithm.

Session ID   Pattern
S1           wea
S2           abex
S3           xya
S4           wxe

[Figure: Architecture, with the components Search engine, Search history, and Pattern Mining]

Mining algorithm:

1. Scan the entire transaction database to get the support S of each 1-itemset, compare S with min_sup, and get the set of frequent 1-itemsets, L1.
2. Join Lk-1 with Lk-1 to generate a set of candidate k-itemsets. Use the Apriori property to prune the infrequent k-itemsets.
3. Scan the transaction database to get the support S of each candidate k-itemset in the final set, compare S with min_sup, and get the set of frequent k-itemsets, Lk.
4. If the candidate set is not empty, go to step 2.
5. For each frequent itemset l, generate all nonempty subsets of l.
6. For every nonempty subset s of l, output the rule s ⇒ (l − s) if its confidence C > min_conf.

The pattern mining algorithm forwards the patterns to the search engine as individual URLs, which are returned as the search results.

IV. CONCLUSION

We conclude our research work with an efficient pattern mining approach in place of clustering and other traditional approaches. The system can be enhanced by improving the pattern mining approach with an evolutionary algorithm (genetic algorithm, ant colony algorithm, etc.).

REFERENCES

[1] R. Baeza-Yates. Applications of web query mining. ECIR'05.
[2] R. Baeza-Yates, C. Hurtado, and M. Mendoza. Query clustering for boosting web page ranking. AWIC'04.
[3] R. Baeza-Yates, C. Hurtado, and M. Mendoza. Query recommendation using query logs in a search engine. EDBT Workshops, 2004.
[4] D. Beeferman and A. Berger. Agglomerative clustering of a search engine query log. KDD'99, Boston, MA, USA.
[5] A. Cid, C. Hurtado, and M. Mendoza. Automatic maintenance of Web directories using clickthrough data. WIRI'06.
[6] S.-L. Chuang and L.-F. Chien. Automatic query taxonomy generation for information retrieval applications. Online Information Review 27(4), 2003.
[7] S.-L. Chuang and L.-F. Chien. Enriching web taxonomies through subject categorization of query terms from search engine logs. Decision Support System 30(1), 2003.
[8] S.-L. Chuang and L.-F. Chien. Towards automatic generation of query taxonomy: A hierarchical query clustering approach. ICDM'02.
[9] P.-J. Cheng, C.-H. Tsai, C.-M. Hung, and L.-F. Chien. Query Taxonomy Generation for Web Search (poster). CIKM'06.
[10] G. Dupret and M. Mendoza. Automatic Query Recommendation using Click-Through Data. IFIP PPAI'06.
[11] V.I. Levenshtein. Binary Codes Capable of Correcting Deletions, Insertions and Reversals. Soviet Physics Doklady, vol. 10, pp. 707-710, 1966.
[12] M. Sahami and T.D. Heilman. A Web-based Kernel Function for Measuring the Similarity of Short Text Snippets. Proc. the 15th Int’l Conf. World Wide Web (WWW ’06), pp. 377-386, 2006.
[13] J.-R. Wen, J.-Y. Nie, and H.-J. Zhang. Query Clustering Using User Logs. ACM Trans. in Information Systems, vol. 20, no. 1, pp. 59-81, 2002.
[14] A. Fuxman, P. Tsaparas, K. Achan, and R. Agrawal. Using the Wisdom of the Crowds for Keyword Generation. Proc. the 17th Int’l Conf. World Wide Web (WWW ’08), 2008.
[15] K. Avrachenkov, N. Litvak, D. Nemirovsky, and N. Osipova. Monte Carlo Methods in PageRank Computation: When One Iteration Is Sufficient. SIAM J. Numerical Analysis, vol. 45, no. 2, pp. 890-904, 2007.

BIOGRAPHIES

Seeatayya Narthu is pursuing an M.Tech in the Department of Computer Science and Engineering at Pydah College Of Engg & Technology. His area of interest is data mining.

V. Srikanth is working as an Associate Professor in the Department of CSE at Pydah College Of Engg & Technology. His area of interest is data mining.
