International Journal of Engineering Trends and Technology (IJETT) – Volume 4 Issue 9- Sep 2013 Personalized Query Results using User Search Logs Santhi Kolli1 1 V.Ramachandran 2 Dr.Rupa3 PG Student (M.Tech CSE), Vasireddy Venkatadri institute of Technology, Guntur, AP, India 2 Research Scholor, Acharya Nagarjuna University, Nambur, Guntur, AP, India 3 Professor & H.O.D, Vasireddy Venkatadri institute of Technology, Guntur, AP, India ABSTRACT: Most users want their search engine to incorporate three key features in query results. Relevant results (results they are actually interested in), Uncluttered (easy to read interface), Helpful options to broaden or tighten a search for accuracy. This paper addresses the third aspect with new improvement measures for an enhanced experience to the end user. A trivial query like purchasing a laptop has to be broken down into a number of co-dependent steps over a period of time based on prior search patterns of the same user .For instance, a user may first search for company and later the features and price. After deciding the company and price the user may then search for the accessories like mouse and modem etc. Each step requires more queries and each query completes with more clicks. Current search engines cannot support this kind of hierarchical queries. We propose to implement Random walk propagation methods that can construct user profiles based on the credentials obtained from their prior search history repositories. Combined with click points driven click graphs of user search behavior the IR system can support complex queries for future requests at reduced navigations. Random walk propagation over the query fusion graph methods support complex search quests in IR systems at reduced times. For developing an interactive IR system we also propose to use these search quests as auto complete features in similar query propagations. Biasing the ranking of search results can also be provided using ranking algorithms (top-k algorithms).Supporting these methods yields dynamic and improved performance in IR systems, by providing enriched user querying experience. A practical implementation of the proposed system validates our claim. I. INTRODUCTION Web search engines attempt to satisfy users’ 20% of the queries are the navigational on the various information needs by ranking web pages with respect analyses of query logs and the rest are transactional to queries. AS the size and richness of information on or advisory in the nature. A series of interactions the Web grows, hence the variety and the complexity between user and search engine can be necessary to of tasks that users try to accomplish online. Only satisfy a single information need. ISSN: 2231-5381 http://www.ijettjournal.org Page 4227 International Journal of Engineering Trends and Technology (IJETT) – Volume 4 Issue 9- Sep 2013 The Internet searches are a top Internet activity, second only to email in the report of the web search. The information retrieval is the important task doing by the users The search engine is still through the keyword queries to access information online. Over a period of time the complex tasks such as the purchasing a laptop is divided into many steps. Although some websites such as the redbus, ebay are helpful to provide from the single database we can attain the steps. So the user want to search the company,features and price. After the selection of the company user searches for accessories like mouse and modem etc.Each step requires more than one query and each query requires more clicks on the relevant pages. Fig 1: Example of search history feature in Google Query grouping takes users session. The search engines are considered to be good if they identify query groups and the clicks in the query groups. The components of search engines such as the result During the online searching group the related queries need to be grouped together.All the search engines are using using the “Search History” where users can allow tracking their online searches by recording their queries and clicks. The figure1 can illustrate the portion of user’s history as it shown by the google. In addition to viewing their search history, users can manipulate it by manually editing and organizing related queries and clicks into groups, or by sharing them with their friends.. Identifying groups of related queries has applications beyond helping the users to make sense and keep track of queries and clicks in their search history. ranking, sessionization, query suggestions, collaborative search and query alterations. Consider a search engine knows that a current query “Mac” belongs to the {“Apple”, “Mac”} group query. The rank of the page can be improved which provides information about how to get a Mac instead of article on “Apple”. Query grouping can also helps other users by promoting task-level collaborative search. Explicit collaborative search can also be performed by allowing users in a trusted community to combine, find and share relevant query groups to perform larger, long-term tasks on the Web. The query group has been arrange in the automated and dynamic fashion in our report of organizing the user’s search history. Each query group is a collection of queries by the same user that are relevant to each other around a common informational need. When the user issues a new ISSN: 2231-5381 http://www.ijettjournal.org Page 4228 International Journal of Engineering Trends and Technology (IJETT) – Volume 4 Issue 9- Sep 2013 query then Group 1 query APJ Abdulkalam groups can APJ Abdulkalam Speech automatic APJ Abdulkalam Quotes 11:57:23 Abdulkalam Nagarjuna Speech University Age 19:22:13 Nagarjuna ally University updated. To illustrate our intension let see the below Results tables1, the related queries “Apple” and “Apple IPhone” are separated by many unrelated queries. Time Query Time Query Group 2 12:00:02 Apple 11:40:27 12:32:42 Apple India APPSC Apple Apple Fruit 12:22:22 APPSC Results Apple Free Apps Acharya 19:25:49 19:50:12 Apple Fruit Apple Iphone 20:11:56 Apple Free Apps 20:44:01 Apple Mac Apple I phone (a) User’s Search History Apple Mac 11:51:48 Aeroplane 12:59:12 APPSC Material 12:52:24 Age Calculator 13:03:34 APPSC Notification Group 3 Acharya Nagarjuna University Acharya Nagarjuna University Results 01:59:28 APJ 16:34:09 Axis Bank Abdulkalam Quotes 11:25:04 APJ ISSN: 2231-5381 17:52:49 Acharya http://www.ijettjournal.org Page 4229 International Journal of Engineering Trends and Technology (IJETT) – Volume 4 Issue 9- Sep 2013 Our design is to clinch good performance Group 4 while avoiding disruption of existing user- Age define query groups. Age Calculator We analysis two potential ways of using clicks in order to enhance this process: Group 5 APPSC APPSC Results APPSC Notification (b) Query Groups 1. Table 1: Search history of a real user over the Query fusion graph is referred as query click graph into a single period of one day together with the query groups graph by fusing the query reformulation graph. But, the similar quires may not be textually similar. 2. The query set when computing Consider in the example table1 (b) the related queries relevance to also include other “trip advisor Barbados” and “Caribbean cruise” in queries with similar clicked URLs Group 2 have no words in common. Finally, as users on the expanding. may also manually alter their respective query Our proposed search log-based method groups, the users can edit or manual effort of the user shows the effectiveness and the robustness can automate by query grouping. by comprehensive experimental evaluation. To achieve more effective and robust query grouping, we do not rely solely on textual or II. PRELIMINARIES temporal properties of queries. We leverage search behavioral data as captured within a commercial search engine’s log. In general, we develop an online query grouping The aim is to organize a user’s search logs into query groups,which contains one or more related queries The query group consists of atomic information from method over the query fusion graph that combines a probabilistic query reformulation graph. Related to our problem, the problems of session identification, the queries. For broader informational queries the query group consists of a sub queries and clicks we may see at the table1 (b) group 5. and query clustering, that have also used similar graphs in the past. In our scheme we make the following contributions: ISSN: 2231-5381 Query Group: A query group is an ordered list of queries, qi, together with the corresponding set of http://www.ijettjournal.org Page 4230 International Journal of Engineering Trends and Technology (IJETT) – Volume 4 Issue 9- Sep 2013 clicked URLs, clki of qi. A query group is denoted as s 2) A set of existing query group S = {s1,….,sm} 3) A similarity threshold Tsim , 0 < Tsim < 1 = h{q1, clk1}, . . . , {qi, clki}. The query group may be either an existing query groups or a new query group if the query group does Output: The query group s that best matches sc, or a new One if necessary not exists. Query group is used to measure the relevance (0) S=0 (1) Tmax = Tsim (2) for i = 1 to m between two queries. Dynamic Query Grouping: The identification of (3) if sim(sc,si) > Tmax (4) s = si query groups is the important task. For this consider the query group as a singleton query group, and (5) Tmax = sim(sc,si) (6) if s = 0 (7) S = S U sc (8) s = sc (9) return s singleton query can be combined iteratively.. First, changing a user’s existing query groups effects the users history. Second, it involves a high computational cost. The grouping in done by using online clustering algorithms. First place the current query and clicks into a singleton query group sinc= {qc, clk}. Compare it with each existing query group si within a user’s history (i.e.,si ∈ S). Identify if there are existing query groups which are relevant to sc. Fig.2: Algorithm for selecting the query group that is the most similar to the given query and clicked URLs. The multitasking is not considered in this scenario.. The relevance between query groups around textually similar queries such as “Apple” and “AppleIphone”. It may not give relevancy measure between the query groups around queries such as “Apple” and “Apple Query Relevance: Iphone”. The user search logs are used in order to To verify whether the query group contains related queries and clicks. The measure sim between the current query singleton group sc and an existing query group si ∈ S.. we assume that users give similar queries and clicks within a short period of time. can determine the relevance between query groups. To measure the relevance between query groups query logs and the click logs are used. QUERY III. RELEVANCE USING SEARCH LOGS The following time-based relevance metric simtime that can be used in place of sim in figure2. Query Relevance means SelectBestQueryGroup based on Web search. The works defines the main the system properties of relevant queries. Input: 1. 1) The current singleton query group sc containing the Current query qc and set of clicks dkc ISSN: 2231-5381 developing Frequent queries which appear together as reformulations http://www.ijettjournal.org Page 4231 International Journal of Engineering Trends and Technology (IJETT) – Volume 4 Issue 9- Sep 2013 2. queries which forces the users to click on include only the query pairs whose counts is similar pages. more than threshold value ( Tv). These graphs are used to compute query relevance The edge weight, and the way of incorporating the clicks of a user’s normalized count of the query transitions: query to enhance our relevance metric. Wr(qi,qj) = is the cntγ (qi,qj) ∑ 3.1 Wr(qi,qj), (qi,qk)Eqcntγ(qi,qk) Search Behavior Graphs The search history of a search engine can categorized 2. Query Click Graph into three types of graphs. 1. 2. relevant queries from the search historyt are that are likely reformulations of each likely to force users to click frequently on the other same set of URLs. Consider the queries “Mac” query click graph - forces users to click on query fusion and “Apple” do not share any text and close to each other..They are relevant to each other graph - merges the information in the previous two graphs. 1. queries differently to capture relationship between a pair of queries same URL’s. 3. Take query reformulation Graph - represents the because they give the same results. The bipartite click-through graph, CG = Query Reformulation Graph (Vm Vn, Eg). The infrequent pairs are removed If two queries that are issued at the same using a threshold Tg. time by many users they are likely to be similar wg(qi,qj) = ∑uk min (cntg(qi,uk),cntg(qj,uk) to each other. The relevance of two queries can ∑uk cntg(qi,qk) be measured by the time metric (St) means interval between the timestamps of the queries. Our main aim is to define the statistical 3. Query Fusion Graph frequency of two queries which appear next to each other over all of the users of the system. The query reformulation graph (QRG) and the Query fusion graph (QFG) shows the The query reformulation graph is properties QRG = (Vq, Eq) Eq = set of edges, constructed as per the query sets (qi,qj) , where we can issue the qi, of relevant queries.The query reformulation information is combined with QRG and the query click information is before the qj with in the users day activity., combined with QCG into a single graph QFG= we remove (Vf, Efg). The weight of edge (qi,qj) in QFG, ISSN: 2231-5381 infrequent query pairs and http://www.ijettjournal.org Page 4232 International Journal of Engineering Trends and Technology (IJETT) – Volume 4 Issue 9- Sep 2013 Wf(qi,qj), is the sum of the weights, Wr(qi,qj) Fig. 3. Algorithm for the query relevance in EQR and Wg(qi,qj) in EQC. The algorithm computes the fusion fq Wf(qi,qj) = α X Wr(qi,qj) + (1 – α) X Wg(qi,qj) 3.2 relevance vector of a given query q rel . It requires the following inputs in addition to QFG. Two other inputs for the accuracy and the time budget are Computing Query Relevance 1.Total number of random walks (noRWs). 2.The size of neighborhood (maxHops). For a user query q, the relevance vector is calculated using QFG, where each entry represents the relevance value of each query qj ∈ VQ to q. Because this is not sufficient to use the pair wise relevance values in QFG a more sophisticated called Markov chain for q (MKq) is used.This is called the fusion relevance vector of q rel fq for stationary Tthe length of each random walk is maxHops, if the queries from q to ql are not alike. The number of random walk or the lengths of random walks are limited by decreasing both parameters for a faster computation of relfq . . The selection of the next node is based on the outgoing edges of the distribution. Many users issuing queries and clicks in current node v in QFG. The damping factor d is real-time and the huge size of QFG. Relavance(q) performed by the SelectNextNodeToVisit process in Input: Step (7) of the algorithm, it can be shown in the 1) 2) 3) 4) 5) 6) Query FusionGgraph, QFG Jump Vector, j Damping Factor, d Number of Random Walks, noRWs Size of Neighborhood, mxHops the given query, q Output:Fusion Relevance Vector for q, relFq (0) Initialize relFq= 0 (1) noWaks = 0; noVisits = 0 (2) while noWaks < noRWs (3) noHops = 0; v=q (4) While v ≠ NULL ٨ noHops < mxHops (5) noHops++ (6) relFq(v)++; noVists++ (7) v = SelectNextNodeToVisit(v) (8) noWalks++ (9) For each v, normalize relFq(v) = relFq(v)/noVisits figure 4. SelectNextNodeToVisit(v) Input: 1.Query fusion graph, QFG 2. Jump Vector, j 3.Damping Factor, d 4. Current Node, v Output: Next Node to visit, qi (0) if rdm() < d (1) V = {qi | (v,qi) € ∑QF } (2) Pick a node qi € V with probability wf(v,qi) (3) else (4) V = {qi | j(qi) > 0} (5) Pick a node qi € V with probability g(qi) (6) Return qi Fig. 4. Algorithm to select the next node to visit. ISSN: 2231-5381 http://www.ijettjournal.org Page 4233 International Journal of Engineering Trends and Technology (IJETT) – Volume 4 Issue 9- Sep 2013 queries to a search engine. A complex task such as IV. QUERY GROUPING Buying a laptop has to be broken down into a The relevance of other queries is captured in number of co-dependent steps over a period of time. an image. The context vector is noted which For example, a user may first search on the company, aggregates the images of queries to form an overall features of laptops, prices etc. After deciding the representation.. The query reformulation graph, query company and prices the user may search for the most images, and context vectors are used in which lend accessories like mouse and modem.Each step Markov chain process to determine requires one or more queries, and each query results relevance in one or more clicks on relevant pages. Keyword between queries and query groups. based search engines cannot address this kind of is used to complicated tasks. So a better system is required that calculate the similarity between the query group and can enable the user to pursue complex search quests the user’s query group. The query group s of the online. Search Engine tries to construct user profile context vector is denoted as the cxts. The context based on his ipaddress/login credentials from its user vector of a query group s by weighting the queries search history repositories. If the user already exists, and the clicks in s by recency, as follows: the search engine checks from its user search history Context Vector: A context vector cxts = ωrecency ωrecency) kjrel(qs1,clk1) repositories up to a certain threshold whether the user already queried the same query previously. If the user Query Image: consider two relevant queries, did, then search engine further retrieves click points “Apple” and “Apple Mac in table 1. The relevance from value in the fusion relevance vectors rel”Apple” Mac”) (“Apple . user search history repositories and reformulates query results by generating click graphs. Click graphs contain useful information on user behavior when searching online. This step is called Online Query Grouping: To avoid performing the query fusion graph. Uses random walk propagation random walk computation of fusion relevance vector over the query fusion graph instead of time-based and for each query estimate that to cache the fusion keyword similarity based approaches. This entire relevance vectors of 100 million queries.. process is called Personalizing the user search histories into query groups. V. EXISTING SYSTEM A user queries a search engine has been VI. PROPOSED SYSTEM existed. The search engine displays results based page ranking algorithms.. The primary means of Most users want their search engine to accessing information online is still through keyword incorporate three key features in query results. A ISSN: 2231-5381 http://www.ijettjournal.org Page 4234 International Journal of Engineering Trends and Technology (IJETT) – Volume 4 Issue 9- Sep 2013 query such as buying a laptop has to be divided into process is called personalizing user search histories different steps.iew. For example, a user may first into query groups. For making the IR Systems search on the company, features of laptops, prices effective and dynamic we propose to use these search etc. After deciding the company and prices the user quests as auto complete features in similar query may search for the most accessories like mouse and propagations. Biasing the ranking of search results modem. The click points of click graphs of user can search behavior the IR system can support complex algorithms(top-k queries for future requests at reduced navigations. methods yields dynamic performance in IR systems, Random walk propagation over the query fusion by providing enriched user querying experience also be provided using algorithms). any ranking Supporting these graph methods support complex search quests in IR systems at reduced times. For developing an interactive IR system we also propose to use these VIII. REFERENCES search quests as auto complete features in similar query propagations. Biasing the ranking of search results can also be provided using ranking algorithms (top-k algorithms).These methods gives improved performance in IR systemsA practical implementation of the proposed system validates our claim. During the online complex quest to identify and to group the related queries together we have a standout step towards the enabling services and feature that are capable. Currently we are using the “Search History” in major search engines where users can allow tracking their online searches by recording [1] Heasoo Hwang, Hady W. Lauw, Lise Getoor and Alexandros Ntoulas, Organizing User Search Histories, Volume: 24 , Issue: 5 in IEEE 2012 Transactions on Knowledge and Data Engineering [2] J. Teevan, E. Adar, R. Jones, and M. A. S. Potts, “Information reretrieval: repeat queries in yahoo’s logs,” in SIGIR. New York, NY, USA: ACM, 2007, pp. 151–158. [3] A. Broder, “A taxonomy of web search,” SIGIR Forum, vol. 36, no. 2, pp. 3–10, 2002. [4] R. Jones and K. L. Klinkner, “Beyond the session timeout: Automatic hierarchical segmentation of the queries and clicks. search topics in query logs,” in CIKM, 2008. [5] A. Spink, M. Park, B. J. Jansen, and J. Pedersen, VII. “Multitasking CONCLUSION during Web search sessions,” Information Processing and Management, vol. 42, Click graphs have information on user behavior no. 1, pp. 264–275, 2006. while searching online. This step is called query [6] A. Spink, B. J. Jansen, and H. C. Ozmultu. Use of fusion graph. Uses random walk propagation over the query reformulation and relevance feedback by query fusion graph instead of time-based and Excite users. Internet Research: Electronic keyword similarity based approaches. This entire ISSN: 2231-5381 http://www.ijettjournal.org Page 4235 International Journal of Engineering Trends and Technology (IJETT) – Volume 4 Issue 9- Sep 2013 Networking Applications and Policy, 10(4):317–328, 2000. Authors Profile Dr Rupa, working as theof the Department of Computer Science and Engineering in VVIT.Nambur.She has 12 yearsof teaching Santhi Kolli She completed her Masters Computer Science from experience. Periyar University.Right now she is the student of B.Tech(CSIT) Degree in Computer Science & Vasireddy Information Technology from JNTU, Hyderabad Pursuing Venkatadri M.Tech in institute of Computer Technology Science and Engineering. Dr Ch.Rupa, obtained her in 2002 and M.Tech (IT) in Information Technology from Andhra University in 2005. She was awarded Ph. D (CSE) in Computer Science Engineering by Andhra University. Ramachandran, is a research scholar at Acharya NagarjunaUniversity, Nambur. He got his B.TECH Computer Science &Systems engineering Degreefrom Andhra University andM.TECH in Computer ScienceEngineering from JNTU, Kakinada. He is verymuch interested in image processing, medical retrieval , human vision & pattern recognition.He did several projects in image processing. ISSN: 2231-5381 http://www.ijettjournal.org Page 4236