Personalized Query Expansion for the Web
Paul-Alexandru Chirita, Claudiu S. Firan, Wolfgang Nejdl
Gabriel Barata

Motivation
(image by Tojosan @ Flickr)

What is query expansion?
Add meaningful search terms to the query…

What is PIR-based query expansion?
(PIR: Personal Information Repository)
Add meaningful search terms to the query…
… related to the user's interests.

Why PIR-based query expansion?
Better personalization quality!
More privacy!

Example
Google search: “canon book”
Top 3 results:
• The Canon: A Whirligig Tour of the Beautiful Basics of Science (Hardcover) @ Amazon
• Western Canon @ Wikipedia
• Biblical Canon @ Wikipedia

Example
Expanded query: “canon book bible”
Top 3 results:
• Biblical Canon @ Wikipedia
• Books of the Bible @ Wikipedia
• The Canon of the Bible @ catholicapologetics.org

Query Expansion using Desktop Data
(image by Old Shoe Woman @ Flickr)

Algorithms
• Expanding with Local Desktop Analysis
• Expanding with Global Desktop Analysis

Expanding with Local Desktop Analysis
• Term and Document Frequency
• Lexical Compounds
• Sentence Selection

Term and Document Frequency
$$\mathit{TermScore} = \left[\frac{1}{2} + \frac{1}{2}\cdot\frac{\mathit{nrWords} - \mathit{pos}}{\mathit{nrWords}}\right]\cdot\log(1 + \mathit{TF})$$

Lexical Compounds
{ adjective? noun+ }

Sentence Selection
$$\mathit{SentenceScore} = \frac{SW^2}{TW} + PS + \frac{TQ^2}{NQ}$$
$$TF > ms = \begin{cases} 7 - 0.1 \times (25 - NS), & \text{if } NS < 25 \\ 7, & \text{if } NS \in [25, 40] \\ 7 + 0.1 \times (NS - 40), & \text{if } NS > 40 \end{cases}$$
$$PS = \begin{cases} \dfrac{\mathrm{Avg}(NS) - \mathit{SentenceIndex}}{\mathrm{Avg}^2(NS)}, & \text{if } \mathit{SentenceIndex} \le 10 \\ 0, & \text{if } \mathit{SentenceIndex} > 10 \end{cases}$$

Expanding with Global Desktop Analysis
• Term Co-occurrence Statistics
• Thesaurus-based Expansion

Term Co-occurrence Statistics

Thesaurus-based Expansion

Experiments & Evaluation
(image by Canadian Museum of Nature @ Flickr)

Experiments
• 18 users
• Files indexed within user-selected paths, emails, and Web cache
• Each user chose 4 queries:
– 1 from the top 2% log queries (avg. length = 2.0)
– 1 random log query (avg. length = 2.3)
– 1 self-selected specific query (avg. length = 2.9)
– 1 self-selected ambiguous query (avg. length = 1.8)

Evaluation
$$DCG(i) = \begin{cases} G(1), & \text{if } i = 1 \\ DCG(i - 1) + \dfrac{G(i)}{\log_2(i)}, & \text{otherwise} \end{cases}$$

Evaluation
• Evaluated algorithms:
– Google: Google query output
– TF, DF: Term and Document Frequency
– LC, LC[O]: Regular and Optimized Lexical Compounds
– TC[CS], TC[MI], TC[LR]: Term Co-occurrence Statistics using Cosine Similarity, Mutual Information, and Likelihood Ratio
– WN[SYN], WN[SUB], WN[SUP]: WordNet-based expansion with synonyms, sub-concepts, and super-concepts

Results
Log queries:

Results
Self-selected queries:

Introducing Adaptivity
(image by RavenCore17 @ Flickr)

Query Clarity

Adaptive Expansion

Experiments
• Same experimental setup as for the previous analysis.

Results
Log queries:

Results
Self-selected queries:

Results

Conclusions
(image by ThisIsIt2 @ Flickr)

Conclusions
• Five techniques for determining expansion terms from personal documents.
• Empirical analysis showed that these approaches perform very well.
• The expansion process adapts to query features.
• The adaptive expansion process yielded significant improvements over the static one.

End
Any questions?
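
Backup: Term and Document Frequency, sketched in code
The TermScore formula from the Term and Document Frequency slide can be tried out on a single tokenized document. This is only a minimal sketch under assumptions that the slides do not state: `pos` is taken as the zero-based position of a term's first occurrence, the logarithm is natural, and the function name `term_scores` is hypothetical.

```python
import math
from collections import Counter


def term_scores(words):
    """Score every distinct term of one tokenized document.

    Assumptions (not from the slides): `pos` is the zero-based index of the
    term's first occurrence, and log is the natural logarithm.
    """
    nr_words = len(words)
    if nr_words == 0:
        return {}
    tf = Counter(words)  # TF: occurrences of each term in the document
    first_pos = {}
    for pos, word in enumerate(words):
        first_pos.setdefault(word, pos)  # keep only the first position
    return {
        word: (0.5 + 0.5 * (nr_words - first_pos[word]) / nr_words)
        * math.log(1 + tf[word])
        for word in tf
    }


if __name__ == "__main__":
    scores = term_scores(["canon", "book", "bible", "canon"])
    # Highest-scoring terms would be the expansion candidates.
    for word, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{word}: {score:.4f}")
```

Terms that appear early and often score highest, which is the behaviour the formula's position factor and log-dampened frequency suggest.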
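
Backup: Sentence Selection, sketched in code
The three Sentence Selection formulas fit together as follows. The symbol meanings are read from their names and are assumptions, since the slides do not define them: SW is the number of significant words in the sentence (words whose document TF exceeds the threshold ms), TW the total number of words in the sentence, TQ the number of query terms present in the sentence, NQ the number of query terms, NS the number of sentences in the document, and SentenceIndex is one-based. All function names are hypothetical.

```python
def significance_threshold(ns):
    """Threshold ms: a word is significant if its TF exceeds this value."""
    if ns < 25:
        return 7 - 0.1 * (25 - ns)
    if ns <= 40:
        return 7
    return 7 + 0.1 * (ns - 40)


def position_score(sentence_index, avg_ns):
    """PS: favours the first ten sentences (index assumed one-based)."""
    if sentence_index > 10:
        return 0.0
    return (avg_ns - sentence_index) / avg_ns ** 2


def sentence_score(sentence, doc_tf, ns, sentence_index, avg_ns, query_terms):
    """SentenceScore = SW^2 / TW + PS + TQ^2 / NQ for one tokenized sentence."""
    ms = significance_threshold(ns)
    tw = len(sentence)
    # SW: word tokens whose document-level frequency exceeds the threshold.
    sw = sum(1 for word in sentence if doc_tf.get(word, 0) > ms)
    present = set(sentence)
    tq = sum(1 for term in query_terms if term in present)
    nq = len(query_terms)
    significance = sw ** 2 / tw if tw else 0.0
    query_match = tq ** 2 / nq if nq else 0.0
    return significance + position_score(sentence_index, avg_ns) + query_match


if __name__ == "__main__":
    doc_tf = {"canon": 9, "bible": 8, "books": 1}
    score = sentence_score(
        ["canon", "bible", "books"], doc_tf,
        ns=30, sentence_index=1, avg_ns=20, query_terms=["canon", "book"],
    )
    print(round(score, 4))
```

Counting SW over word tokens rather than distinct words is a design choice made here for simplicity; either reading is compatible with the formula as shown on the slide.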
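
Backup: DCG, sketched in code
The recursive DCG definition from the Evaluation slide unrolls into a simple loop over the ranked results. In this sketch, `gains` is a hypothetical list holding the relevance rating G(i) of the result at rank i, in rank order; the rating scale is not given on the slides and is left to the caller.

```python
import math


def dcg(gains):
    """Discounted cumulative gain over ratings listed in rank order.

    DCG(1) = G(1); DCG(i) = DCG(i - 1) + G(i) / log2(i) for i > 1.
    """
    total = 0.0
    for i, gain in enumerate(gains, start=1):
        # Rank 1 is undiscounted; log2(1) = 0 would otherwise divide by zero.
        total += gain if i == 1 else gain / math.log2(i)
    return total


if __name__ == "__main__":
    # Hypothetical ratings for the top 3 results of one query.
    print(round(dcg([2, 1, 1]), 4))
```

Because log2(2) = 1, the second result is also counted at full weight; discounting only starts at rank 3.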