Personalized Query Expansion for the Web

Paul-Alexandru Chirita, Claudiu S. Firan, Wolfgang Nejdl
Gabriel Barata
Motivation
by Tojosan @ Flickr
What is query expansion?
Add meaningful search terms to the query…
What is PIR based query expansion?
Add meaningful search terms to the query…
… related to the user’s interests.
Why PIR based query expansion?
More personalization quality!
More privacy!
Example
Google search: “canon book”
Example
Top 3 results:
• The Canon: A Whirligig Tour of the Beautiful
Basics of Science (Hardcover) @ Amazon
• Western Canon @ Wikipedia
• Biblical Canon @ Wikipedia
Example
Expanded query: “canon book bible”
Example
Top 3 results:
• Biblical Canon @ Wikipedia
• Books of the Bible @ Wikipedia
• The Canon of the Bible @ catholicapologetics.org
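The example above can be sketched as a small helper that appends the best-scoring candidate terms to the original query. This is only an illustration: the function name `expand_query`, the parameter `k`, and the term scores used below are hypothetical and do not come from the slides.

```python
def expand_query(query: str, scored_terms: dict[str, float], k: int = 1) -> str:
    """Append the k highest-scoring terms that are not already in the query.

    scored_terms maps candidate expansion terms to scores; where the scores
    come from (e.g. the user's desktop data) is outside this sketch.
    """
    present = set(query.lower().split())
    # Rank candidates from highest to lowest score.
    ranked = sorted(scored_terms.items(), key=lambda kv: kv[1], reverse=True)
    extra = []
    for term, _score in ranked:
        if len(extra) >= k:
            break
        if term.lower() not in present:
            extra.append(term)
            present.add(term.lower())
    return " ".join([query] + extra)


# Hypothetical scores for a user whose documents are mostly about religion.
candidates = {"bible": 0.9, "book": 0.8, "camera": 0.2}
expanded = expand_query("canon book", candidates, k=1)
```

Here `expanded` is "canon book bible": "book" is skipped because the query already contains it.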
Query Expansion using Desktop data
by Old Shoe Woman @ Flickr
Algorithms
• Expanding with Local Desktop Analysis
• Expanding with Global Desktop Analysis
Expanding with Local Desktop Analysis
• Term and Document Frequency
• Lexical Compounds
• Sentence Selection
Term and Document Frequency
$$TermScore = \left[ \frac{1}{2} + \frac{1}{2} \cdot \frac{nrWords - pos}{nrWords} \right] \cdot \log(1 + TF)$$
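A minimal sketch of the TermScore formula in Python. The slide does not define the symbols, so this assumes `pos` is the position of the term's first occurrence in the document, `nr_words` is the document length in words, and `tf` is the term frequency; the natural logarithm is also an assumption, since the slide gives no base.

```python
import math


def term_score(pos: int, nr_words: int, tf: int) -> float:
    """TermScore = [1/2 + 1/2 * (nrWords - pos) / nrWords] * log(1 + TF)."""
    if nr_words <= 0:
        raise ValueError("nr_words must be positive")
    # Position weight: 1.0 for a term at the very start, 0.5 at the very end.
    position_weight = 0.5 + 0.5 * (nr_words - pos) / nr_words
    # Natural log is an assumption; the slide does not state the base.
    return position_weight * math.log(1 + tf)


# A term near the start outranks an equally frequent term near the end.
early = term_score(pos=5, nr_words=1000, tf=12)
late = term_score(pos=900, nr_words=1000, tf=12)
```

With equal frequency, the position weight alone decides the order, so `early > late`.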
Lexical Compounds
{ adjective? Noun+ }
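The pattern above (an optional adjective followed by one or more nouns) can be matched over part-of-speech tagged text. The sketch below assumes the tokens are already tagged by some external tagger and that the tags are spelled "ADJ" and "NOUN"; both the tag names and the function name `lexical_compounds` are illustrative assumptions.

```python
def lexical_compounds(tagged: list[tuple[str, str]]) -> list[str]:
    """Return every maximal match of the pattern { adjective? Noun+ }.

    tagged is a list of (word, tag) pairs with tags such as "ADJ" or "NOUN".
    """
    compounds = []
    i, n = 0, len(tagged)
    while i < n:
        start = i
        # Optional single adjective.
        if tagged[i][1] == "ADJ":
            i += 1
        # One or more nouns.
        j = i
        while j < n and tagged[j][1] == "NOUN":
            j += 1
        if j > i:
            compounds.append(" ".join(word for word, _tag in tagged[start:j]))
            i = j
        else:
            # No noun followed; restart one token after the start.
            i = start + 1
    return compounds


sentence = [
    ("the", "DET"), ("biblical", "ADJ"), ("canon", "NOUN"),
    ("includes", "VERB"), ("book", "NOUN"), ("lists", "NOUN"),
]
found = lexical_compounds(sentence)
```

For this sentence `found` is `["biblical canon", "book lists"]`. Note that a single noun also satisfies the pattern as written.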
Sentence Selection
$$SentenceScore = \frac{SW^2}{TW} + PS + \frac{TQ^2}{NQ}$$
$$TF > ms = \begin{cases} 7 - 0.1 \times (25 - NS), & \text{if } NS < 25 \\ 7, & \text{if } NS \in [25, 40] \\ 7 + 0.1 \times (NS - 40), & \text{if } NS > 40 \end{cases}$$
$$PS = \begin{cases} \dfrac{Avg(NS) - SentenceIndex}{Avg^2(NS)}, & \text{if } SentenceIndex \le 10 \\ 0, & \text{if } SentenceIndex > 10 \end{cases}$$
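The three formulas above translate directly into code. The slide does not spell out the symbols, so this sketch assumes SW is the number of significant words in the sentence (those with TF above the threshold ms), TW the total number of words in the sentence, TQ the number of query terms in the sentence, NQ the number of terms in the query, and NS the number of sentences in the document. The function names are illustrative.

```python
def significance_threshold(ns: int) -> float:
    """Threshold ms: a word counts as significant when its TF exceeds it."""
    if ns < 25:
        return 7 - 0.1 * (25 - ns)
    if ns <= 40:
        return 7.0
    return 7 + 0.1 * (ns - 40)


def position_score(sentence_index: int, avg_ns: float) -> float:
    """PS: only the first ten sentences receive a position bonus."""
    if sentence_index <= 10:
        return (avg_ns - sentence_index) / avg_ns ** 2
    return 0.0


def sentence_score(sw: int, tw: int, tq: int, nq: int, ps: float) -> float:
    """SentenceScore = SW^2 / TW + PS + TQ^2 / NQ."""
    return sw ** 2 / tw + ps + tq ** 2 / nq


# Third sentence of a document, with an average of 20 sentences per document.
ps = position_score(sentence_index=3, avg_ns=20)
score = sentence_score(sw=2, tw=4, tq=1, nq=2, ps=ps)
```

Squaring SW and TQ rewards sentences that are dense in significant words and query terms rather than merely long.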
Expanding with Global Desktop Analysis
• Term Co-occurrence Statistics
• Thesaurus based Expansion
Term Co-occurrence Statistics
Thesaurus based Expansion
Experiments & Evaluation
by Canadian Museum of Nature @ Flickr
Experiments
• 18 users
• Files indexed within user selected paths,
Emails and Web cache
Experiments
• They chose 4 queries:
– 1 from the top 2% log queries (avg. length = 2.0)
– 1 random log query (avg. length = 2.3)
– 1 self-selected specific query (avg. length = 2.9)
– 1 self-selected ambiguous query (avg. length = 1.8)
Evaluation
$$DCG(i) = \begin{cases} G(1), & \text{if } i = 1 \\ DCG(i - 1) + \dfrac{G(i)}{\log_2(i)}, & \text{otherwise} \end{cases}$$
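The recursive DCG definition above unrolls into a simple loop. In this sketch, `gains` is assumed to be the list of relevance ratings G(1), G(2), … for the results in rank order; the rating scale in the usage example is made up for illustration.

```python
import math


def dcg(gains: list[float]) -> float:
    """Discounted Cumulative Gain over a ranked list of relevance gains.

    DCG(1) = G(1); DCG(i) = DCG(i - 1) + G(i) / log2(i) for i > 1.
    """
    total = 0.0
    for i, gain in enumerate(gains, start=1):
        # Rank 1 is not discounted; later ranks are divided by log2(rank).
        total += gain if i == 1 else gain / math.log2(i)
    return total


# Hypothetical ratings for the top 3 results of one query.
value = dcg([2, 1, 2])
```

Since log2(2) = 1, the first two ranks count in full and the discount only starts at rank 3.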
Evaluation
• Evaluated algorithms:
– Google: Google query output
– TF, DF: Term and Document Frequency
– LC, LC[O]: Regular and Optimized Lexical Compounds
– TC[CS], TC[MI], TC[LR]: Term Co-occurrence Statistics
using Cosine Similarity, Mutual Information and
Likelihood Ratio
– WN[SYN], WN[SUB], WN[SUP]: WordNet based
expansion with synonyms, sub-concepts and super-concepts.
Results
Log queries:
Results
Self-selected queries:
Introducing Adaptivity
by RavenCore17 @ Flickr
Query Clarity
Adaptive Expansion
Experiments
• Same experimental setup as for the previous
analysis.
Results
Log queries:
Results
Self-selected queries:
Results
Conclusions
by ThisIsIt2 @ Flickr
Conclusions
• Five techniques for determining expansion
terms from personal documents.
• Empirical analysis showed that these
approaches perform very well.
• The expansion process adapts to query
features.
• The adaptive expansion process yielded
significant improvements over the static one.
End
Any questions?