PPT - KBS

Web Science presentation on the topic Query Recommendation

Papers:
1. Query Suggestions in the Absence of Query Logs
2. DQR: A Probabilistic Approach to Diversified Query Recommendation

Muhammad Nuruddin
ITIS M.Sc. student, Leibniz Universität Hannover
Winter Semester 2012/13
Matriculation number: 2961230

Query Suggestion/Recommendation
Assisting users by providing a list of suggested queries has proven to be effective.

Paper 1. Query Suggestions in the Absence of Query Logs

Background:
- Most existing query suggestion work is based on query logs.
- Log-based suggestion suits systems with a large user base, many interactions, and past usage data.
- It does not suit systems with a small user base or without a large log.
- It does not suit newly deployed systems.
- Examples: desktop search, personal email search.

How can queries be suggested when users and query logs are insufficient?
- The authors propose a document-centric probabilistic mechanism.
- Phrases present in the documents are suggested: phrases indexed from the document corpus are offered to complete the partial user query.

Steps:
1. Phrase extraction
   - N-gram phrases of order 1, 2 and 3 are extracted from the document corpus.
   - Example: “president of Germany”, “president of”, “of Germany”, “president”, “Germany”.
2. Query suggestion
   - Follows a probabilistic model.

Probabilistic Model for Query Suggestion
Suppose a user has typed an incomplete query Q. The query can be decomposed as Q = Qc + Qt, where
- Qc denotes the completed portion of the query, and
- Qt denotes the last word of Q, which the user is still typing.
Example: “Einsteins Rel…” gives Qc = “Einsteins” and Qt = “Rel”.
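The phrase extraction step and the split of a partial query into Qc and Qt can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `extract_phrases` and `split_query` are hypothetical, tokenization is plain whitespace splitting, and no stop-word filtering is applied (so the unigram “of” is kept, unlike in the slide example).

```python
from collections import Counter


def extract_phrases(documents, max_order=3):
    """Count every word n-gram of order 1..max_order in the corpus."""
    counts = Counter()
    for doc in documents:
        words = doc.lower().split()  # assumption: whitespace tokenization
        for n in range(1, max_order + 1):
            for i in range(len(words) - n + 1):
                counts[" ".join(words[i:i + n])] += 1
    return counts


def split_query(query):
    """Split a partial query into (Qc, Qt).

    Qc is the completed portion, Qt the last word still being typed.
    """
    words = query.lower().split()
    if not words:
        return "", ""
    return " ".join(words[:-1]), words[-1]


phrases = extract_phrases(["president of Germany"])
print(sorted(phrases))
print(split_query("Einsteins Rel"))
```

The counts returned by `extract_phrases` would serve as the candidate phrase set P for the suggestion step.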
Probabilistic Model for Query Suggestion (continued)
Let pi denote phrase i (an n-gram) from the document corpus, obtained in step 1 (phrase extraction from documents). Using Bayes’ theorem, we compute P(pi|Q), the probability (suitability) of pi as a suggested completion of the query Q. We can then recommend the m phrases of P with the highest values of P(pi|Q).

The authors reduce this probability to two factors:
P(pi|Q) ∝ P(pi|Qt) × P(Qc|pi)
- P(pi|Qt) = probability that phrase pi is the one being typed, given the partial word Qt already typed (phrase selection probability).
- P(Qc|pi) = correlation between phrase pi and the already completed part Qc of the query.
Example: “Albert Einsteins Rel…” = Qc + Qt, with Qc = “Albert Einsteins” and Qt = “Rel”.

Example for P(pi|Qt), the phrase selection probability:
“Bill Gate…” = Qc + Qt, with Qt = “Gate”
P = { “Bill Gates”, “Indian Gate”, “Gateway”, “Bill Gates life”, “Bill Gates Foundation”, “India Gate Rice”, … }

Example for P(Qc|pi), the phrase-query correlation:
“Bill Gate…” = Qc + Qt, with Qc = “Bill”
The candidate phrases P are the same as above.

Estimating P(pi|Qt):
C = {c1, c2, …, cm} is the set of m possible words that complete Qt.
For Qt = “Gate”: C = { “Gates”, “Gate”, “Gateway”, … }
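To make the two factors concrete, the sketch below ranks candidate phrases by the product P(pi|Qt) × P(Qc|pi). Both estimators are assumptions chosen for illustration only: the selection probability is taken as the phrase's share of the corpus frequency among candidates matching Qt, and the correlation as the add-one-smoothed share of documents containing pi that also contain Qc. The names `rank_phrases` and `contains` and the toy data are hypothetical.

```python
def contains(doc, phrase):
    """True if the phrase occurs in the document on word boundaries."""
    return f" {phrase} " in f" {doc} "


def rank_phrases(phrase_counts, docs, qc, qt, m=3):
    """Rank candidate phrases by P(pi|Qt) * P(Qc|pi) with toy estimators."""
    # Candidates: phrases with at least one word starting with the partial word Qt.
    cands = {p: n for p, n in phrase_counts.items()
             if any(w.startswith(qt) for w in p.split())}
    total = sum(cands.values())
    scores = {}
    for p, n in cands.items():
        selection = n / total  # assumed estimate of P(pi|Qt)
        docs_with_p = [d for d in docs if contains(d, p)]
        if qc and docs_with_p:
            hits = sum(1 for d in docs_with_p if contains(d, qc))
            # assumed estimate of P(Qc|pi), add-one smoothed
            correlation = (hits + 1) / (len(docs_with_p) + 2)
        else:
            correlation = 1.0  # nothing to correlate with
        scores[p] = selection * correlation
    return sorted(scores, key=scores.get, reverse=True)[:m]


docs = ["bill gates foundation", "bill gates life",
        "india gate rice", "gateway computers"]
phrase_counts = {"bill gates": 2, "india gate": 1,
                 "gateway": 1, "bill gates foundation": 1}
top = rank_phrases(phrase_counts, docs, qc="bill", qt="gate", m=2)
print(top)
```

With Qc = “bill”, phrases that co-occur with “bill” are pushed ahead of equally frequent phrases such as “india gate”.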
For “Bill Gate…” (Qt = “Gate”), the candidate words are C = { “Gates”, “Gate”, “Gateway”, …, cm } and the candidate phrases are P = { “Bill Gates”, “Indian Gate”, “Gateway”, “Bill Gates life”, “Bill Gates Foundation”, “India Gate Rice”, …, pn }.
- P(ci|Qt) ~ freq(ci): words used more often in the corpus have a higher probability of being useful for query recommendation.
- Without IDF, some rare but relevant words would be suppressed.

Example for P(Qc|pi):
“Bill Gate…” = Qc + Qt, with Qc = “Bill”
P = { “Bill Gates”, “Indian Gate”, “Gateway”, “Bill Gates life”, “Bill Gates Foundation”, “India Gate Rice”, … }

Overview (diagram): the phrases P = { …, pn } are extracted from the document corpus; the query “Bill Gate…” is split into Qc = “Bill” and Qt = “Gate”; each phrase pi is scored against Qc and Qt.

End of discussion on Paper 1, Query Suggestions in the Absence of Query Logs.

Paper 2. DQR: A Probabilistic Approach to Diversified Query Recommendation

• The paper proposes a query recommendation method for log-based systems.
• The proposed system has two components:
1. Query concept building (Concept Mining)
   - Clustering the search logs.
2. Recommending queries from the concepts
   - A probabilistic model selects the top m query concepts, and a representative query is selected for each concept.

A good query recommender should have five properties:
1. Relevancy: Recommended queries should be semantically relevant to the user's search query.
2. Redundancy-free: The recommendation should not contain redundant queries that repeat similar search intents.
3. Diversity: The recommendation should cover search intents of different interpretations of the keywords given in the input query.
4. Ranking: Highly relevant queries should be ranked ahead of less relevant ones in the recommendation list.
5. Efficiency: Query recommendation provides online help, so recommendation algorithms should achieve fast response times.
The authors claim that DQR is the first system to address all five requirements.

Figure: a click-through bipartite graph.

• The number of queries in Q is huge.
• The AOL dataset contains 10 million queries.
• Even picking, say, m = 10 recommended queries from Q involves a huge search space.

1. Query concept building (Concept Mining)
- Similar queries are grouped to form a query concept.
- For this grouping, each query is represented by a |D|-dimensional vector, with one dimension per clicked URL.
- The entry of query qi for dimension dj is its user-frequency-inverse-query-frequency (UF-IQF) score, where
  - UF: Nu(qi,dj) = number of unique users who issued qi and clicked URL dj
  - IQF: Nq(dj) = number of queries that led to a click on URL dj
- The weights are normalized, and the similarity of two queries qi and qj is computed from their normalized vectors.
- K-means clustering is not suitable: the algorithm did not terminate after two days.
- Instead, a one-pass algorithm is proposed. It is very efficient but highly sensitive to the order in which the queries are processed.
Example, with the compactness requirement that the average pairwise distance in a cluster is < 0.5:
  order q1, q2, q3: C1 = {{q1,q2},{q3}}
  order q2, q3, q1: C1 = {{q1},{q2,q3}}
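The order sensitivity of one-pass clustering can be reproduced with a small sketch. This is an illustrative variant under stated assumptions, not the paper's exact algorithm: each query joins the first existing cluster whose average pairwise distance stays below the threshold, and otherwise opens a new cluster. The function names, the one-dimensional toy positions, and the absolute-difference distance are all hypothetical stand-ins for distances between UF-IQF vectors.

```python
from itertools import combinations


def avg_pairwise_distance(cluster, dist):
    """Average distance over all unordered pairs in the cluster."""
    pairs = list(combinations(cluster, 2))
    if not pairs:
        return 0.0
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


def one_pass_clustering(items, dist, max_avg_dist=0.5):
    """Assign each item once, in input order, without revisiting earlier choices."""
    clusters = []
    for item in items:
        for cluster in clusters:
            # Join the first cluster that stays compact after adding the item.
            if avg_pairwise_distance(cluster + [item], dist) < max_avg_dist:
                cluster.append(item)
                break
        else:
            clusters.append([item])  # no compact cluster found: open a new one
    return clusters


# Toy data (assumption): q2 lies halfway between q1 and q3.
positions = {"q1": 0.0, "q2": 0.4, "q3": 0.8}


def dist(a, b):
    return abs(positions[a] - positions[b])


print(one_pass_clustering(["q1", "q2", "q3"], dist))
print(one_pass_clustering(["q2", "q3", "q1"], dist))
```

Because q2 is close to both neighbours, whichever of them arrives first claims it, and the third query no longer fits under the 0.5 compactness limit.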
Query concept building (Concept Mining), continued
- A diameter measure L(C) is defined for a cluster C.

2. Recommending queries from the concepts
- A probabilistic model selects the top m query concepts and a representative query for each concept.
- A heuristic algorithm is applied to find a set Yc of m query concepts whose probability under the model is maximal.
- Yc is constructed incrementally with a greedy strategy: one concept is added at a time until Yc contains m concepts. At each step, the concept that maximizes the probability increment is picked. The increment depends on the input query, the candidate query concept, and the set Yc of concepts selected so far.

Selecting the representative query of each concept
- By popularity vote from the log.
- For a concept C, the representative query is the one issued by the largest number of distinct users among all the queries in C.

Result comparison of different approaches
Table: top 10 queries recommended by the 6 methods for the input query “yahoo”.
- SR = Similarity-based Ranking. Finds similar past queries in the log; ignores redundancy.
- MMR = Maximal Marginal Relevance. Considers relevancy and diversity*; ignores redundancy.
- CACB = Context-Aware Concept-Based method. Builds query concepts based on search sessions; ignores diversity*.
- DQR-ND = DQR with No Diversity. Same as DQR, but ignores diversity.
- DQR-OPC = DQR with One-Pass Clustering. Same as DQR, but uses only one pass for clustering.
- DQR = Diversified Query Recommendation.
*Diversity: The recommendation should cover search intents of different interpretations of the keywords given in the input query.
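The greedy construction of the concept set and the popularity vote for representative queries can be sketched as below. The sketch takes the increment as a caller-supplied function `gain(concept, selected)`, because the slides do not preserve the actual probability formula; the toy gain used here (relevance, heavily discounted when a concept with the same intent is already selected) is purely an assumption, as are all names and data.

```python
def greedy_select(concepts, gain, m):
    """Add one concept at a time, always taking the largest increment."""
    selected = []
    remaining = list(concepts)
    while remaining and len(selected) < m:
        best = max(remaining, key=lambda c: gain(c, selected))
        selected.append(best)
        remaining.remove(best)
    return selected


def representative_query(concept_queries, log):
    """Return the query in the concept issued by the most distinct users."""
    users = {q: set() for q in concept_queries}
    for user, query in log:  # log entries are (user id, query) pairs
        if query in users:
            users[query].add(user)
    return max(concept_queries, key=lambda q: len(users[q]))


# Hypothetical concepts with a relevance score and a search intent.
relevance = {"mail": 0.9, "email login": 0.8, "finance": 0.6, "games": 0.5}
intent = {"mail": "mail", "email login": "mail",
          "finance": "finance", "games": "games"}


def toy_gain(concept, selected):
    # Assumed increment: relevance, discounted if the intent is already covered.
    covered = any(intent[s] == intent[concept] for s in selected)
    return relevance[concept] * (0.1 if covered else 1.0)


chosen = greedy_select(relevance, toy_gain, m=3)
print(chosen)

log = [("u1", "yahoo mail"), ("u2", "yahoo mail"),
       ("u3", "yahoo email login"), ("u3", "yahoo email login"),
       ("u3", "yahoo email login")]
rep = representative_query(["yahoo email login", "yahoo mail"], log)
print(rep)
```

In the toy log, “yahoo email login” has more raw occurrences, but “yahoo mail” wins the vote because it was issued by more distinct users.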
End of the Presentation

Thank you very much for your attention!
