Personalized Query Results using User Search Logs Santhi Kolli

International Journal of Engineering Trends and Technology (IJETT) – Volume 4 Issue 9- Sep 2013
Personalized Query Results using User Search Logs
Santhi Kolli 1, V. Ramachandran 2, Dr. Rupa 3
1 PG Student (M.Tech CSE), Vasireddy Venkatadri Institute of Technology, Guntur, AP, India
2 Research Scholar, Acharya Nagarjuna University, Nambur, Guntur, AP, India
3 Professor & H.O.D, Vasireddy Venkatadri Institute of Technology, Guntur, AP, India
ABSTRACT:
Most users want their search engine to offer three key features in query results: relevant results (results they are actually interested in), an uncluttered, easy-to-read interface, and helpful options to broaden or tighten a search for accuracy. This paper addresses the third aspect with new improvement measures for an enhanced end-user experience. A seemingly simple task such as purchasing a laptop has to be broken down into a number of co-dependent steps over a period of time, based on prior search patterns of the same user. For instance, a user may first search for a company and later for features and price. After deciding on the company and price, the user may then search for accessories such as a mouse and a modem. Each step requires one or more queries, and each query results in one or more clicks. Current search engines cannot support this kind of hierarchical querying. We propose random walk propagation methods that construct user profiles from the credentials found in prior search history repositories. Combined with click graphs of user search behavior, the IR system can support complex queries for future requests with fewer navigation steps. Random walk propagation over the query fusion graph supports complex search quests in IR systems in less time. To build an interactive IR system, we also propose to use these search quests as auto-complete features for similar queries. The ranking of search results can also be biased using ranking algorithms (top-k algorithms). Supporting these methods yields dynamic and improved performance in IR systems by providing an enriched querying experience. A practical implementation of the proposed system validates our claim.
I. INTRODUCTION
Web search engines attempt to satisfy users' information needs by ranking web pages with respect to queries. As the size and richness of information on the Web grow, so do the variety and complexity of the tasks that users try to accomplish online. According to various analyses of query logs, only 20% of queries are navigational; the rest are transactional or advisory in nature. A series of interactions between the user and the search engine can be necessary to satisfy a single information need.
ISSN: 2231-5381
http://www.ijettjournal.org
Internet search is a top Internet activity, second only to email according to a report on web search. Information retrieval is an important task for users, and keyword queries are still the primary means of accessing information online. A complex task such as purchasing a laptop is divided into many steps over a period of time. Although some websites, such as redbus and ebay, help by offering these steps from a single database, the user generally has to search for the company, features and price separately. After selecting the company, the user searches for accessories such as a mouse and a modem. Each step requires one or more queries, and each query requires one or more clicks on the relevant pages.
Fig 1: Example of search history feature in Google
To support complex search tasks online, related queries need to be grouped together. Major search engines offer a "Search History" feature, where users can track their online searches by recording their queries and clicks. Figure 1 illustrates a portion of a user's history as shown by Google. In addition to viewing their search history, users can manipulate it by manually editing and organizing related queries and clicks into groups, or by sharing them with their friends. Identifying groups of related queries has applications beyond helping users make sense of and keep track of the queries and clicks in their search history.
Query grouping helps the search engine understand a user's session. A search engine that identifies query groups and the clicks within them can improve components such as result ranking, sessionization, query suggestions, collaborative search and query alterations. For example, suppose a search engine knows that the current query "Mac" belongs to the query group {"Apple", "Mac"}. It can then boost the rank of a page that provides information about how to get a Mac instead of an article on "Apple". Query grouping can also help other users by promoting task-level collaborative search. Explicit collaborative search can also be performed by allowing users in a trusted community to find, share and combine relevant query groups to perform larger, long-term tasks on the Web.
In this paper, query groups are organized in an automated and dynamic fashion as part of organizing the user's search history. Each query group is a collection of queries by the same user that are relevant to each other around a common informational need. When the user issues a new query, the query groups are updated automatically. To illustrate our intention, consider Table 1, in which the related queries "Apple" and "Apple Iphone" are separated by many unrelated queries.

(a) User's Search History
Time      Query
01:59:28  APJ Abdulkalam Quotes
11:25:04  APJ Abdulkalam
11:40:27  Apple India
11:51:48  Aeroplane
11:57:23  APJ Abdulkalam Speech
12:00:02  Apple
12:22:22  APPSC Results
12:32:42  APPSC
12:52:24  Age Calculator
12:59:12  APPSC Material
13:03:34  APPSC Notification
16:34:09  Axis Bank
17:52:49  Acharya Nagarjuna University
19:22:13  Acharya Nagarjuna University Results
19:25:49  Apple Fruit
19:50:12  Apple Iphone
20:11:56  Apple Free Apps
20:44:01  Apple Mac
          Age

(b) Query Groups
Group 1: APJ Abdulkalam; APJ Abdulkalam Speech; APJ Abdulkalam Quotes
Group 2: Apple; Apple India; Apple Fruit; Apple Free Apps; Apple Iphone; Apple Mac
Group 3: Acharya Nagarjuna University; Acharya Nagarjuna University Results
Group 4: Age; Age Calculator
Group 5: APPSC; APPSC Results; APPSC Notification

Table 1: Search history of a real user over the period of one day together with the query groups

However, related queries may not be textually similar. For example, the related queries "trip advisor Barbados" and "Caribbean cruise" have no words in common. Finally, users may also manually alter their query groups, so automated query grouping should reduce this manual effort without disrupting the groups the user has already defined.
To achieve more effective and robust query grouping, we do not rely solely on textual or temporal properties of queries. Instead, we leverage search behavioral data as captured within a commercial search engine's log. In particular, we develop an online query grouping method over the query fusion graph, which combines a probabilistic query reformulation graph with a query click graph. Related problems, such as session identification and query clustering, have also used similar graphs in the past. In our scheme we make the following contributions:
• Our design aims to achieve good performance while avoiding disruption of existing user-defined query groups.
• We analyze two potential ways of using clicks to enhance this process:
  1. Fusing the query reformulation graph and the query click graph into a single graph, referred to as the query fusion graph.
  2. Expanding the query set when computing relevance to also include other queries with similar clicked URLs.
• We show the effectiveness and robustness of our proposed search log-based method through comprehensive experimental evaluation.

II. PRELIMINARIES
The aim is to organize a user's search logs into query groups, each containing one or more related queries and their clicks. A query group may correspond to an atomic information need expressed by its queries. For broader informational needs, the query group consists of several sub-queries and clicks, as seen in Group 5 of Table 1(b).
Query Group: A query group is an ordered list of queries, qi, together with the corresponding set of clicked URLs, clki, of qi. A query group is denoted as s = ⟨{q1, clk1}, ..., {qk, clkk}⟩.
The query group selected for a new query may be either an existing query group or, if no suitable group exists, a new query group. Query grouping relies on a measure of relevance between two queries or query groups.
Dynamic Query Grouping: Identifying query groups is the central task. One approach is to treat every query as a singleton query group and then combine the singleton groups iteratively. This has two drawbacks. First, changing a user's existing query groups affects the user's history. Second, it involves a high computational cost.
Instead, the grouping is done using an online clustering algorithm. First, place the current query and clicks into a singleton query group sc = {qc, clkc}. Then compare it with each existing query group si within the user's history (i.e., si ∈ S) and identify whether there are existing query groups that are relevant to sc.
Query Relevance:
To verify whether a query group contains related queries and clicks, we need a measure sim between the current singleton query group sc and an existing query group si ∈ S. A time-based approach assumes that users issue similar queries and clicks within a short period of time. A time-based relevance metric simtime, derived from the interval between the timestamps of the queries, can be used in place of sim in Fig. 2.

SelectBestQueryGroup
Input:
1) The current singleton query group sc containing the current query qc and set of clicks clkc
2) A set of existing query groups S = {s1, ..., sm}
3) A similarity threshold Tsim, 0 < Tsim < 1
Output: The query group s that best matches sc, or a new one if necessary
(0) s = ∅
(1) Tmax = Tsim
(2) for i = 1 to m
(3)   if sim(sc, si) > Tmax
(4)     s = si
(5)     Tmax = sim(sc, si)
(6) if s = ∅
(7)   S = S ∪ sc
(8)   s = sc
(9) return s
Fig. 2: Algorithm for selecting the query group that is the most similar to the given query and clicked URLs.

Multitasking is not considered in this scenario. A text-based measure captures the relevance between query groups around textually similar queries such as "Apple" and "Apple Iphone". It may not capture the relevance between query groups around queries such as "Mac" and "Apple", which share no text. User search logs are therefore used to determine the relevance between query groups: both the query logs and the click logs contribute to the measure.

III. QUERY RELEVANCE USING SEARCH LOGS
Query relevance is developed here on the basis of Web search logs. The measure captures two main properties of relevant queries:
1. Queries that frequently appear together as reformulations.
2. Queries that lead users to click on similar pages.
Graphs built from the search logs are used to compute query relevance, and the clicks of a user's query are incorporated to enhance the relevance metric.

3.1 Search Behavior Graphs
The search history of a search engine can be categorized into three types of graphs:
1. Query reformulation graph - represents the relationship between a pair of queries that are likely reformulations of each other.
2. Query click graph - represents the relationship between queries that frequently lead users to click on the same URLs.
3. Query fusion graph - merges the information in the previous two graphs.
The three graphs connect queries differently in order to capture these relationships.

1. Query Reformulation Graph
If two queries are issued consecutively by many users, they are likely to be similar to each other. The relevance of two queries can also be measured by a time metric (St), that is, the interval between the timestamps of the queries. Our main aim is to capture the statistical frequency with which two queries appear next to each other, over all of the users of the system.
The query reformulation graph is QRG = (Vq, Eq), where Eq is the set of edges constructed from the query pairs (qi, qj) in which qi is issued before qj within a user's day of activity. We remove infrequent query pairs and include only the query pairs whose count exceeds a threshold value (Tv). The edge weight, Wr(qi,qj), is the normalized count of the query transitions:
$$ W_r(q_i,q_j) = \frac{cnt_r(q_i,q_j)}{\sum_{(q_i,q_k) \in E_q} cnt_r(q_i,q_k)} $$

2. Query Click Graph
Relevant queries in the search history are likely to lead users to click frequently on the same set of URLs. Consider the queries "Mac" and "Apple": they do not share any text, yet they are relevant to each other because they lead to the same results.
The query click graph is derived from the bipartite click-through graph, CG = (Vm ∪ Vn, Eg), whose two vertex sets are the queries and the clicked URLs. Infrequent pairs are removed using a threshold Tg. The edge weight is:
$$ W_g(q_i,q_j) = \frac{\sum_{u_k} \min\big(cnt_g(q_i,u_k),\, cnt_g(q_j,u_k)\big)}{\sum_{u_k} cnt_g(q_i,u_k)} $$

3. Query Fusion Graph
The query reformulation graph (QRG) and the query click graph (QCG) capture two properties of relevant queries. The query reformulation information in QRG and the query click information in QCG are combined into a single graph, QFG = (Vf, Ef). The weight of edge (qi,qj) in QFG, Wf(qi,qj), is the weighted sum of the weights Wr(qi,qj) in Eq and Wg(qi,qj) in Eg:
$$ W_f(q_i,q_j) = \alpha \times W_r(q_i,q_j) + (1 - \alpha) \times W_g(q_i,q_j) $$

3.2 Computing Query Relevance
For a user query q, the relevance vector is calculated using QFG, where each entry represents the relevance value of each query qj ∈ Vf to q. Because the pairwise relevance values in QFG alone are not sufficient, a more sophisticated model, a Markov chain for q (MCq), is used. Its stationary distribution is called the fusion relevance vector of q, relFq. Because many users issue queries and clicks in real time and QFG is huge, the vector is approximated with random walks, as shown in Fig. 3.

Relevance(q)
Input:
1) Query fusion graph, QFG
2) Jump vector, j
3) Damping factor, d
4) Number of random walks, noRWs
5) Size of neighborhood, maxHops
6) The given query, q
Output: Fusion relevance vector for q, relFq
(0) Initialize relFq = 0
(1) noWalks = 0; noVisits = 0
(2) while noWalks < noRWs
(3)   noHops = 0; v = q
(4)   while v ≠ NULL ∧ noHops < maxHops
(5)     noHops++
(6)     relFq(v)++; noVisits++
(7)     v = SelectNextNodeToVisit(v)
(8)   noWalks++
(9) For each v, normalize relFq(v) = relFq(v)/noVisits
Fig. 3. Algorithm for the query relevance

The algorithm computes the fusion relevance vector of a given query q, relFq. In addition to QFG, it takes two inputs that trade accuracy against the time budget:
1. Total number of random walks (noRWs).
2. The size of neighborhood (maxHops).
The length of each random walk is limited to maxHops, on the assumption that queries too many hops away from q are not related to it. The number of random walks and the length of each walk can be limited by decreasing both parameters for a faster computation of relFq. The selection of the next node is based on the outgoing edges of the current node v in QFG and on the damping factor d. It is performed by the SelectNextNodeToVisit process in Step (7) of the algorithm, which is shown in Fig. 4.

SelectNextNodeToVisit(v)
Input:
1) Query fusion graph, QFG
2) Jump vector, j
3) Damping factor, d
4) Current node, v
Output: Next node to visit, qi
(0) if random() < d
(1)   V = {qi | (v,qi) ∈ Ef}
(2)   Pick a node qi ∈ V with probability Wf(v,qi)
(3) else
(4)   V = {qi | j(qi) > 0}
(5)   Pick a node qi ∈ V with probability j(qi)
(6) return qi
Fig. 4. Algorithm to select the next node to visit.
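The edge-weight fusion of Section 3.1 and the random-walk procedure of Figs. 3 and 4 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the adjacency-dictionary representation of the graphs, the function names (fuse_weights, select_next_node, relevance), the seed parameter and the toy weights in the usage note are all introduced here for illustration. A walk that reaches a node with no outgoing edges is assumed to stop, which corresponds to v = NULL in Fig. 3.

```python
import random
from collections import defaultdict


def fuse_weights(w_r, w_g, alpha):
    """Combine edge weights: Wf = alpha * Wr + (1 - alpha) * Wg.

    w_r and w_g map a query to {neighbor query: weight}; this dictionary
    layout is an assumption of the sketch, not something the paper fixes.
    """
    fused = defaultdict(dict)
    for graph, coeff in ((w_r, alpha), (w_g, 1.0 - alpha)):
        for qi, neighbors in graph.items():
            for qj, w in neighbors.items():
                fused[qi][qj] = fused[qi].get(qj, 0.0) + coeff * w
    return dict(fused)


def select_next_node(qfg, jump, d, v, rng):
    """Fig. 4: with probability d follow an out-edge of v, else jump."""
    if rng.random() < d:
        candidates = qfg.get(v, {})
    else:
        candidates = {q: p for q, p in jump.items() if p > 0}
    if not candidates:
        return None  # assumed stopping rule: nothing to move to
    nodes, weights = zip(*candidates.items())
    # random.choices samples proportionally to the given weights
    return rng.choices(nodes, weights=weights, k=1)[0]


def relevance(qfg, jump, d, no_rws, max_hops, q, seed=0):
    """Fig. 3: estimate the fusion relevance vector of q by random walks."""
    rng = random.Random(seed)  # seeded only to make the sketch repeatable
    visits = defaultdict(int)
    no_visits = 0
    for _ in range(no_rws):
        v, no_hops = q, 0
        while v is not None and no_hops < max_hops:
            no_hops += 1
            visits[v] += 1
            no_visits += 1
            v = select_next_node(qfg, jump, d, v, rng)
    # Step (9): normalize visit counts into a distribution
    return {node: count / no_visits for node, count in visits.items()}


if __name__ == "__main__":
    # Toy weights, invented for illustration only.
    w_r = {"apple": {"apple mac": 1.0}}
    w_g = {"apple": {"mac": 1.0}}
    qfg = fuse_weights(w_r, w_g, alpha=0.7)
    rel = relevance(qfg, {"apple": 1.0}, d=0.6, no_rws=1000, max_hops=5, q="apple")
    print(sorted(rel.items(), key=lambda item: -item[1]))
```

Here the jump vector places all of its mass on the query itself, so every jump restarts the walk at q. Lowering no_rws or max_hops shortens the computation at the cost of a noisier estimate, which is the accuracy and time-budget trade-off described above.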
IV. QUERY GROUPING
The relevance of other queries to a given query is captured in its query image. A context vector aggregates the images of the queries in a query group to form an overall representation of the group. The query reformulation graph, query images and context vectors lend themselves to a Markov chain process that determines the relevance between queries and query groups.
Context Vector: A context vector is used to calculate the similarity between the current query group and the user's existing query groups. The context vector of a query group s is denoted as cxts. It is computed by weighting the queries and the clicks in s by recency, as follows, where k is the number of queries in s:
$$ cxt_s = \omega_{recency} \sum_{j=1}^{k} (1 - \omega_{recency})^{k-j} \, rel_{(q_j,\, clk_j)} $$
Query Image: Consider two relevant queries, "Apple" and "Apple Mac", in Table 1. Their relevance can be read from the fusion relevance vector of "Apple" as relF"Apple"("Apple Mac").
Online Query Grouping: To avoid performing the random walk computation of the fusion relevance vector for every new query, the fusion relevance vectors can be precomputed and cached. The estimate considered here is a cache holding the fusion relevance vectors of 100 million queries.

V. EXISTING SYSTEM
In the existing system, a user submits queries to a search engine, and the search engine displays results based on page ranking algorithms. The primary means of accessing information online is still through keyword queries to a search engine. A complex task such as buying a laptop has to be broken down into a number of co-dependent steps over a period of time. For example, a user may first search for the company, features of laptops, prices etc. After deciding on the company and prices, the user may search for accessories such as a mouse and a modem. Each step requires one or more queries, and each query results in one or more clicks on relevant pages. Keyword-based search engines cannot address this kind of complicated task. So a better system is required that enables the user to pursue complex search quests online. The search engine tries to construct a user profile based on the user's IP address or login credentials from its user search history repositories. If the user already exists, the search engine checks its user search history repositories, up to a certain threshold, to see whether the user has issued the same query previously. If so, the search engine further retrieves click points from the user search history repositories and reformulates the query results by generating click graphs. Click graphs contain useful information on user behavior when searching online. This step builds the query fusion graph. The system uses random walk propagation over the query fusion graph instead of time-based and keyword similarity based approaches. This entire process is called personalizing the user search histories into query groups.

VI. PROPOSED SYSTEM
Most users want their search engine to incorporate three key features in query results. A query such as buying a laptop has to be divided into different steps. For example, a user may first search for the company, features of laptops, prices etc. After deciding on the company and prices, the user may search for accessories such as a mouse and a modem. Using the click points in click graphs of user search behavior, the IR system can support complex queries for future requests with fewer navigation steps. Random walk propagation over the query fusion graph supports complex search quests in IR systems in less time. To develop an interactive IR system, we also propose to use these search quests as auto-complete features for similar queries. The ranking of search results can also be biased using ranking algorithms (top-k algorithms). These methods give improved performance in IR systems. A practical implementation of the proposed system validates our claim. Identifying and grouping related queries during complex online search quests is a key step towards enabling such services and features. Currently, major search engines offer a "Search History" feature, where users can track their online searches by recording the queries and clicks.

VII. CONCLUSION
Click graphs carry information on user behavior while searching online. This step builds the query fusion graph. The system uses random walk propagation over the query fusion graph instead of time-based and keyword similarity based approaches. This entire process is called personalizing user search histories into query groups. To make IR systems effective and dynamic, we propose to use these search quests as auto-complete features for similar queries. The ranking of search results can also be biased using any ranking algorithm (top-k algorithms). Supporting these methods yields dynamic performance in IR systems by providing an enriched user querying experience.
Authors Profile
Santhi Kolli completed her Masters in Computer Science from Periyar University. She is currently a student at Vasireddy Venkatadri Institute of Technology, pursuing an M.Tech in Computer Science and Engineering.
Dr. Ch. Rupa is working as the Head of the Department of Computer Science and Engineering in VVIT, Nambur. She has 12 years of teaching experience. She obtained her B.Tech (CSIT) degree in Computer Science & Information Technology from JNTU, Hyderabad in 2002 and her M.Tech (IT) in Information Technology from Andhra University in 2005. She was awarded a Ph.D (CSE) in Computer Science Engineering by Andhra University.
Ramachandran is a research scholar at Acharya Nagarjuna University, Nambur. He received his B.Tech degree in Computer Science & Systems Engineering from Andhra University and his M.Tech in Computer Science Engineering from JNTU, Kakinada. He is very much interested in image processing, medical retrieval, human vision and pattern recognition. He has done several projects in image processing.