PPT - KBS

Web Science presentation on the topic: Query Recommendation
Papers:
1. Query Suggestions in the Absence of Query Logs
2. DQR: A Probabilistic Approach to Diversified Query Recommendation
Muhammad Nuruddin
ITIS M. Sc. Student
Leibniz Universität Hannover
Winter Semester 2012/13
Matrikelnummer: 2961230
Query Suggestion/Recommendation
Assisting users by providing a list of suggested queries has proven to be effective.
Paper 1. Query Suggestions in the Absence of Query Logs
Background:
- Most existing query suggestion work is based on query logs.
- Log-based suggestion suits systems with a large user base, many interactions, and a record of past usage.
- It is not suitable for systems with a small user base or without a large log.
- It is not suitable for newly deployed systems.
- Examples: desktop search, personal email search.
1. Query Suggestions in the Absence of Query Logs
How can queries be suggested when users and query logs are insufficient?
- The authors propose a document-centric probabilistic mechanism.
- Phrases that occur in the documents are suggested as queries.
- Phrases indexed from the document corpus are suggested to complete the partial user query.
1. Query Suggestions in the Absence of Query Logs
Steps:
1. Phrase extraction
- Extract N-gram phrases of order 1, 2 and 3 from the document corpus.
- Example: “president of Germany”, “president of”, “of Germany”, “president”, “Germany”.
2. Query suggestion
- Suggestions follow a probabilistic model.
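The phrase extraction step above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `extract_phrases`, the lowercasing, and the regex tokenizer are assumptions made for the example. Note that it also emits the unigram "of", which the slide's example does not list.

```python
import re
from collections import Counter


def extract_phrases(text, max_order=3):
    """Count every word n-gram of order 1..max_order in a text.

    Hypothetical helper: tokenization and lowercasing are assumptions.
    """
    words = re.findall(r"\w+", text.lower())
    counts = Counter()
    for n in range(1, max_order + 1):
        # Slide a window of n words over the token list.
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts


phrases = extract_phrases("President of Germany")
print(sorted(phrases))
# ['germany', 'of', 'of germany', 'president', 'president of', 'president of germany']
```

In a real system the counts would be accumulated over the whole document corpus and stored in an index.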
1. Query Suggestions in the Absence of Query Logs
2. Query suggestion (1/9)
Probabilistic Model for Query Suggestion
Suppose a user has typed an incomplete query Q.
The query can be decomposed as follows: Q = Qc + Qt
- Qc denotes the completed portion of the query.
- Qt denotes the last word of Q, which the user is still typing.
Example: “Einsteins Rel…” gives Qc = Einsteins and Qt = Rel.
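The decomposition of a partial query into its completed part (Qc) and the last word still being typed (Qt) can be sketched as below. The function name `split_partial_query` and the treatment of a trailing space (the last word counts as completed) are assumptions for illustration.

```python
def split_partial_query(query):
    """Split a partial query into (Qc, Qt).

    Qc is the completed part, Qt the word still being typed.
    Assumption: a trailing space means the last word is complete, so Qt is empty.
    """
    if not query or query[-1].isspace():
        return query.strip(), ""
    head, _, tail = query.rpartition(" ")
    return head.strip(), tail


print(split_partial_query("Einsteins Rel"))  # ('Einsteins', 'Rel')
print(split_partial_query("Bill Gate"))      # ('Bill', 'Gate')
```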
1. Query Suggestions in the Absence of Query Logs
2. Query suggestion (2/9)
Probabilistic Model for Query Suggestion
pi = phrase i (an N-gram) from the document corpus, extracted in step 1 (phrase extraction).
Using Bayes' theorem, we calculate P(pi|Q), the probability (suitability) of pi as a suggested completion of the partial query Q.
We can then recommend the m phrases from the phrase set P that have the highest values of P(pi|Q).
1. Query Suggestions in the Absence of Query Logs
2. Query suggestion (3/9)
Probabilistic Model for Query Suggestion
They reduce the probability to: P(pi|Q) ∝ P(pi|Qt) × P(Qc|pi)
- P(pi|Qt) = probability that the user types phrase pi given that Qt has already been typed (phrase selection probability)
- P(Qc|pi) = correlation between phrase pi and the already completed part (Qc) of the query
Example: “Albert Einsteins Rel…” = Qc + Qt, with Qc = Albert Einsteins and Qt = Rel
1. Query Suggestions in the Absence of Query Logs
2. Query suggestion (4/9)
Probabilistic Model for Query Suggestion
Example:
P(pi|Qt) = probability that the user types phrase pi given that Qt has already been typed (phrase selection probability)
Bill Gate… = Qc + Qt
Qt = Gate
P = { “Bill Gates”, “Indian Gate”, “Gateway”, “Bill Gates life”, “Bill Gates Foundation”, “India Gate Rice”, … }
1. Query Suggestions in the Absence of Query Logs
2. Query suggestion (5/9)
Probabilistic Model for Query Suggestion
Example:
P(Qc|pi) = correlation between phrase pi and the already completed part (Qc) of the query
Bill Gate… = Qc + Qt
Qc = Bill
P = { “Bill Gates”, “Indian Gate”, “Gateway”, “Bill Gates life”, “Bill Gates Foundation”, “India Gate Rice”, … }
1. Query Suggestions in the Absence of Query Logs
P(pi|Qt) = probability that the user types phrase pi given that Qt has already been typed (phrase selection probability)
C = {c1, c2, …, cm} is the set of m possible completions of the word Qt
Bill Gate… = Qc + Qt
Qt = Gate
C = { “Gates”, “Gate”, “Gateway”, …, cm }
P = { “Bill Gates”, “Indian Gate”, “Gateway”, “Bill Gates life”, “Bill Gates Foundation”, “India Gate Rice”, …, pn }
P(ci|Qt) ~ freq(ci): words that are used more often in the corpus have a higher probability of being useful for query recommendation.
Without IDF, some rare but relevant words would be suppressed.
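The idea that P(ci|Qt) grows with the corpus frequency of ci, tempered by IDF so that rare words are not suppressed, can be sketched as follows. The exact weighting is not preserved in the slides, so the product freq × IDF, the smoothed IDF form, and the function name `completion_probabilities` are assumptions for illustration only.

```python
import math
from collections import Counter


def completion_probabilities(prefix, documents):
    """Estimate P(c | Qt) for every corpus word c that starts with the prefix Qt.

    Assumed scoring: corpus frequency times a smoothed IDF, normalized
    over all candidate completions.
    """
    tokenized = [doc.lower().split() for doc in documents]
    freq = Counter(w for doc in tokenized for w in doc)           # corpus frequency
    doc_freq = Counter(w for doc in tokenized for w in set(doc))  # document frequency
    n_docs = len(tokenized)
    scores = {}
    for word, f in freq.items():
        if word.startswith(prefix.lower()):
            idf = math.log(1 + n_docs / doc_freq[word])  # smoothed, always > 0
            scores[word] = f * idf
    total = sum(scores.values())
    return {w: s / total for w, s in scores.items()} if total else {}


docs = [
    "bill gates foundation",
    "bill gates life",
    "india gate rice",
    "gateway computers",
]
for word, p in sorted(completion_probabilities("gate", docs).items()):
    print(word, round(p, 3))
```

With frequency alone, "gates" would receive half of the probability mass in this toy corpus; the IDF factor shifts some of it to the rarer "gate" and "gateway".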
1. Query Suggestions in the Absence of Query Logs
2. Query suggestion (9/9)
Probabilistic Model for Query Suggestion
Example:
Bill Gate… = Qc + Qt
Qc = Bill
P = { “Bill Gates”, “Indian Gate”, “Gateway”, “Bill Gates life”, “Bill Gates Foundation”, “India Gate Rice”, … }
1. Query Suggestions in the Absence of Query Logs
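Overview (diagram):
- The document corpus yields the phrase set P = { “Bill Gates”, “Indian Gate”, “Gateway”, “Bill Gates life”, “Bill Gates Foundation”, “India Gate Rice”, …, pn }.
- The partial query “Bill Gate…” is split into Qc = Bill and Qt = Gate.
- Each phrase pi in P is evaluated against Qc and Qt.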
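Putting the pieces together, the ranking of candidate phrases for a partial query could look like the sketch below. It assumes that the two terms P(pi|Qt) and P(Qc|pi) are multiplied, and it uses deliberately crude stand-in estimates based on document counts and substring matching. The estimators and the function name `rank_phrases` are hypothetical and are not taken from the paper.

```python
def rank_phrases(qc, qt, phrases, documents, m=3):
    """Rank phrases for the partial query Qc + Qt.

    Stand-in estimates (assumptions, not the paper's estimators):
      P(pi|Qt) ~ share of candidate-phrase occurrences that belong to pi
      P(Qc|pi) ~ fraction of documents containing pi that also contain Qc
    """
    docs = [d.lower() for d in documents]
    qc, qt = qc.lower(), qt.lower()
    # Candidates are phrases with at least one word that completes Qt.
    candidates = [p for p in phrases
                  if any(w.startswith(qt) for w in p.lower().split())]
    doc_sets = {p: {i for i, d in enumerate(docs) if p.lower() in d}
                for p in candidates}
    total = sum(len(s) for s in doc_sets.values())
    qc_docs = {i for i, d in enumerate(docs) if qc in d}
    scored = []
    for p in candidates:
        containing = doc_sets[p]
        if not containing:
            continue
        selection = len(containing) / total
        correlation = len(containing & qc_docs) / len(containing)
        score = selection * correlation
        if score > 0:
            scored.append((score, p))
    # Highest score first; ties broken alphabetically.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [p for _, p in scored[:m]]


docs = [
    "Bill Gates founded Microsoft",
    "The Bill Gates Foundation funds health programs",
    "India Gate is a monument in Delhi",
    "India Gate Rice is a rice brand",
    "Gateway sold computers",
]
phrases = ["Bill Gates", "India Gate", "Gateway",
           "Bill Gates Foundation", "India Gate Rice"]
print(rank_phrases("Bill", "Gate", phrases, docs))
# ['Bill Gates', 'Bill Gates Foundation']
```

The completed part "Bill" removes "India Gate" and "Gateway" from the list even though they complete "Gate" equally well, which is the role the correlation term plays in the model.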
End of discussion on Paper 1. Query Suggestions in the Absence of Query Logs
Paper 2. DQR: A Probabilistic Approach to Diversified Query Recommendation
• The paper proposes a query recommendation method for log-based systems.
• The proposed system has two components:
1. Query concept building (concept mining)
- Clustering the search logs.
2. Recommending queries from the concepts
- A probabilistic model selects the top m query concepts, and a representative query is selected for each concept.
2. DQR: A Probabilistic Approach to Diversified Query Recommendation
A good-quality recommender system should have five properties:
1. Relevancy: Recommended queries should be semantically relevant to the user's search query.
2. Redundancy-free: The recommendation should not contain redundant queries that repeat similar search intents.
3. Diversity: The recommendation should cover search intents of different interpretations of the keywords given in the input query.
4. Ranking: Highly relevant queries should be ranked ahead of less relevant ones in the recommendation list.
5. Efficiency: Query recommendation provides online help, so recommendation algorithms should achieve fast response times.
The authors claim that DQR is the first system to address all five requirements.
2. DQR: A Probabilistic Approach to Diversified Query Recommendation
A click-through bipartite graph
2. DQR: A Probabilistic Approach to Diversified Query Recommendation
• The number of queries in Q is huge.
• For example, the AOL dataset contains 10 million queries.
• Even picking, say, m = 10 recommended queries from Q involves a huge search space.
2. DQR: A Probabilistic Approach to Diversified Query Recommendation
1. Query concept building (Concept Mining) (1/3)
1.1) Concept Mining:
- Similar queries are grouped to form a query concept.
- For this grouping, each query is represented by a |D|-dimensional vector, with one dimension dj per clicked URL.
- The entry of query qi for dimension dj is its user-frequency-inverse-query-frequency (UF-IQF) score:
  UF: Nu(qi,dj) = number of unique users who issued qi and clicked URL dj
  IQF: computed from Nq(dj) = number of queries that led to a click on URL dj
- Normalized weight [formula not preserved]
- Similarity of queries qi and qj [formula not preserved]
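One way to picture the UF-IQF representation is the sketch below. The formulas themselves are missing from the slide text, so the TF-IDF-style product Nu(qi,dj) × log(|Q| / Nq(dj)), the L2 normalization, and the use of cosine similarity are all assumptions chosen by analogy with TF-IDF. The function names and the toy click log are hypothetical.

```python
import math
from collections import defaultdict


def uf_iqf_vectors(click_log):
    """Build L2-normalized UF-IQF vectors from (user, query, url) click records.

    Assumed weight: Nu(qi, dj) * log(|Q| / Nq(dj)), by analogy with TF-IDF.
    """
    users = defaultdict(set)            # (query, url) -> distinct users
    queries_per_url = defaultdict(set)  # url -> distinct queries
    all_queries = set()
    for user, query, url in click_log:
        users[(query, url)].add(user)
        queries_per_url[url].add(query)
        all_queries.add(query)
    vectors = defaultdict(dict)
    for (query, url), who in users.items():
        uf = len(who)                                                  # Nu(qi, dj)
        iqf = math.log(len(all_queries) / len(queries_per_url[url]))  # assumed IQF
        vectors[query][url] = uf * iqf
    for vec in vectors.values():
        norm = math.sqrt(sum(w * w for w in vec.values()))
        if norm:
            for url in vec:
                vec[url] /= norm
    return dict(vectors)


def cosine(u, v):
    """Cosine similarity of two already normalized sparse vectors."""
    return sum(w * v.get(url, 0.0) for url, w in u.items())


log = [
    ("u1", "yahoo mail", "mail.yahoo.com"),
    ("u2", "yahoo mail", "mail.yahoo.com"),
    ("u1", "ymail", "mail.yahoo.com"),
    ("u3", "yahoo finance", "finance.yahoo.com"),
]
vecs = uf_iqf_vectors(log)
print(round(cosine(vecs["yahoo mail"], vecs["ymail"]), 3))
print(round(cosine(vecs["yahoo mail"], vecs["yahoo finance"]), 3))
```

Queries that lead to the same clicked URLs end up close together, which is what allows them to be grouped into one concept.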
2. DQR: A Probabilistic Approach to Diversified Query Recommendation
1. Query concept building (Concept Mining) (2/3)
1.1) Concept Mining:
- K-means clustering is not suitable: the algorithm had not terminated after two days.
- Instead, a one-pass algorithm is proposed.
- It is very efficient but highly sensitive to the order in which the queries are processed.
Example (compactness constraint: average pairwise distance in a cluster < 0.5):
- Order q1, q2, q3 gives C1 = {{q1,q2},{q3}}
- Order q2, q3, q1 gives C1 = {{q1},{q2,q3}}
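The order sensitivity in the example above can be reproduced with a small sketch of a one-pass clustering loop. This is an assumed reading of the algorithm: each query joins the first existing cluster that stays compact (average pairwise distance below 0.5), otherwise it opens a new cluster. For simplicity the queries are placed on a line through the hypothetical `position` table instead of using real query vectors.

```python
from itertools import combinations


def avg_pairwise_distance(points):
    """Average absolute distance over all pairs of 1-D points."""
    pairs = list(combinations(points, 2))
    if not pairs:
        return 0.0
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def one_pass_cluster(items, position, max_avg_distance=0.5):
    """Assign each item, in the given order, to the first cluster that stays compact."""
    clusters = []
    for item in items:
        for cluster in clusters:
            candidate = [position[x] for x in cluster + [item]]
            if avg_pairwise_distance(candidate) < max_avg_distance:
                cluster.append(item)
                break
        else:
            # No existing cluster can absorb the item: open a new one.
            clusters.append([item])
    return clusters


# Hypothetical 1-D positions standing in for query vectors.
position = {"q1": 0.0, "q2": 0.4, "q3": 0.8}
print(one_pass_cluster(["q1", "q2", "q3"], position))  # [['q1', 'q2'], ['q3']]
print(one_pass_cluster(["q2", "q3", "q1"], position))  # [['q2', 'q3'], ['q1']]
```

Each query is visited exactly once, which explains the efficiency, but the middle query q2 is claimed by whichever neighbour arrives first.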
2. DQR: A Probabilistic Approach to Diversified Query Recommendation
1. Query concept building (Concept Mining) (3/3)
1.1) Concept Mining:
Diameter measure L(C) of cluster C
2. DQR: A Probabilistic Approach to Diversified Query Recommendation
2. Recommending queries from the concepts
- A probabilistic model selects the top m query concepts, and a representative query is selected for each concept.
- A heuristic algorithm is applied to find a set Yc of m query concepts whose probability under the model, given the input query, is maximum. To construct Yc incrementally, the authors apply a greedy strategy.
- The greedy approach adds one concept at a time until Yc contains m concepts. At each step it picks the concept that maximizes the probability increment [formula not preserved], which is defined in terms of the input query, the candidate query concept, and the set of query concepts selected so far.
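The greedy construction of the concept set can be sketched generically as below. The probability used by DQR is not preserved in the slide text, so the example plugs in a hypothetical stand-in objective (`coverage`, the total weight of distinct search intents covered). The intent weights and the names `greedy_select`, `intents` and `coverage` are invented for illustration.

```python
def greedy_select(candidates, set_score, m):
    """Add one concept at a time, always taking the largest score increment."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < m:
        base = set_score(selected)
        best = max(remaining, key=lambda c: set_score(selected + [c]) - base)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical data: each concept covers some weighted search intents.
intents = {
    "yahoo mail": {"mail": 0.5},
    "yahoo mail login": {"mail": 0.5, "login": 0.1},
    "yahoo finance": {"finance": 0.3},
    "yahoo games": {"games": 0.2},
}


def coverage(selected):
    """Stand-in objective: total weight of the distinct intents covered."""
    covered = {}
    for concept in selected:
        covered.update(intents[concept])
    return sum(covered.values())


print(greedy_select(intents, coverage, 3))
# ['yahoo mail login', 'yahoo finance', 'yahoo games']
```

Because "yahoo mail" adds nothing once "yahoo mail login" is selected, its increment is zero and it is never chosen, which illustrates how an increment-based greedy choice avoids redundant recommendations.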
2. DQR: A Probabilistic Approach to Diversified Query
Recommendation
Selecting the representative query of each concept:
- The choice is made by popularity vote from the log.
- For a concept C, the representative query is the one issued by the largest number of distinct users among all the queries in C.
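The popularity vote can be sketched as follows. The log format (pairs of user and query), the alphabetical tie-break, and the function name `representative_query` are assumptions for the example.

```python
from collections import defaultdict


def representative_query(concept, log):
    """Return the query in the concept that was issued by the most distinct users.

    concept: iterable of query strings; log: iterable of (user, query) pairs.
    Ties are broken alphabetically (an assumption).
    """
    members = set(concept)
    users = defaultdict(set)
    for user, query in log:
        if query in members:
            users[query].add(user)
    # max() keeps the first maximum, so sorting first gives the alphabetical tie-break.
    return max(sorted(members), key=lambda q: len(users[q]))


log = [
    ("u1", "yahoo mail"), ("u2", "yahoo mail"), ("u3", "yahoo mail"),
    ("u1", "ymail"), ("u1", "ymail"), ("u1", "ymail"), ("u1", "ymail"),
]
print(representative_query({"yahoo mail", "ymail"}, log))  # yahoo mail
```

Here "ymail" appears more often in the log, but all four occurrences come from one user, so "yahoo mail" with three distinct users wins the vote.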
2. DQR: A Probabilistic Approach to Diversified Query Recommendation
Result comparison of different approaches:
Top 10 queries recommended by the 6 methods for the input query “yahoo”
- SR = Similarity-based Ranking. Finds similar past queries in the log; ignores redundancy.
- MMR = Maximal Marginal Relevance. Considers relevancy and diversity*; ignores redundancy.
- CACB = Context-Aware Concept-Based method. Builds query concepts based on search sessions; ignores diversity*.
- DQR-ND = DQR with No Diversity. Same as DQR, but ignores diversity.
- DQR-OPC = DQR with One-Pass Clustering. Same as DQR, but uses only one pass for clustering.
- DQR = Diversified Query Recommendation.
*Diversity: The recommendation should cover search intents of different interpretations of the keywords given in the input query.
End of the Presentation
Thank you very much for your attention!