Retrieval and Evaluation Techniques for Personal Information
Jin Young Kim
7/26 Ph.D. Dissertation Seminar
Personal Information Retrieval (PIR)

- The practice and the study of helping users retrieve their personal information effectively
Personal Information Retrieval in the Wild

- Everyone has unique information & practices
  - Different information and information needs
  - Different preferences and behaviors
- Many existing software solutions
  - Platform-level: desktop search, folder structure
  - Application-level: email, calendar, office suites
Previous Work in PIR (Desktop Search)

- Focus
  - User interface issues [Dumais03,06]
  - Desktop-specific features [Solus06] [Cohen08]
- Limitations
  - Each is based on a different environment and user group
  - None of them performed a comparative evaluation
  - Research findings do not accumulate over the years
Our Approach

- Develop general techniques for PIR
  - Start from essential characteristics of PIR
  - Applicable regardless of users and information types
- Make contributions to related areas
  - Structured document retrieval
  - Simulated evaluation for known-item finding
- Build a platform for sustainable progress
  - Develop repeatable evaluation techniques
  - Share the research findings and the data
Essential Characteristics of PIR

- Many document types
  - Unique metadata for each type
- People combine search and browsing [Teevan04]
- Long-term interactions with a single user
- People mostly find known-items [Elsweiler07]
- Privacy concerns for the data set

[Diagram: these characteristics map to the three thesis components: field-based search models, the associative browsing model, and simulated evaluation methods.]
Search and Browsing Retrieval Models

- Challenge
  - Users may remember different things about the document
  - How can we present effective results for both cases?

[Figure: a query ("james registration") drawn from the user's lexical memory leads to search, while associative memory leads to browsing; both produce a ranked list of retrieval results.]
Information Seeking Scenario in PIR

[Figure: user input and system output over a session]
- A user initiates a session with a keyword query ("james registration")
- The user switches to browsing by clicking on an email document
- The user switches back to search with a different query ("james registration 2011")
Simulated Evaluation Techniques

- Challenge
  - The user's query originates from what she remembers.
  - How can we simulate the user's querying behavior realistically?

[Figure: the query ("james registration") originates from the user's lexical and associative memory and produces a ranked list of retrieval results.]
Research Questions

- Field-based Search Models
  - How can we improve retrieval effectiveness in PIR?
  - How can we improve type prediction quality?
- Associative Browsing Model
  - How can we enable browsing support for PIR?
  - How can we improve the suggestions for browsing?
- Simulated Evaluation Methods
  - How can we evaluate a complex PIR system by simulation?
  - How can we establish the validity of simulated evaluation?
Field-based Search Models
Searching for Personal Information

- An example of desktop search
Field-based Search Framework for PIR

- Type-specific Ranking
  - Rank documents in each document collection (type)
- Type Prediction
  - Predict the document type relevant to the user's query
- Final Results Generation
  - Merge into a single ranked list (see the sketch below)
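To make the framework concrete, here is a minimal, self-contained Python sketch of the three steps. The scoring and type-prediction functions are toy assumptions (simple term-frequency matching and normalized collection scores), not the exact formulation used in the dissertation.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def score_document(query_terms, doc):
    """Toy document score: smoothed term-frequency match over all fields."""
    tf = Counter(tokenize(" ".join(doc.values())))
    length = sum(tf.values()) or 1
    return sum(math.log(1.0 + tf[t] / length) for t in query_terms)

def rank_collection(query_terms, docs):
    """Step 1: type-specific ranking inside one collection."""
    scored = [(doc, score_document(query_terms, doc)) for doc in docs]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

def predict_type_scores(query_terms, collections):
    """Step 2: toy type prediction, normalized total match per collection."""
    raw = {doc_type: sum(score_document(query_terms, d) for d in docs)
           for doc_type, docs in collections.items()}
    total = sum(raw.values()) or 1.0
    return {doc_type: s / total for doc_type, s in raw.items()}

def field_based_search(query, collections, k=10):
    """Step 3: merge the per-type rankings, weighted by the type prediction."""
    query_terms = tokenize(query)
    type_scores = predict_type_scores(query_terms, collections)
    merged = []
    for doc_type, docs in collections.items():
        for doc, score in rank_collection(query_terms, docs):
            merged.append((doc_type, doc, type_scores[doc_type] * score))
    return sorted(merged, key=lambda t: t[2], reverse=True)[:k]

# Example usage with two tiny collections.
collections = {
    "email": [{"subject": "registration deadline", "body": "see james for details"}],
    "calendar": [{"title": "james registration meeting", "place": "CS 150"}],
}
print(field_based_search("james registration", collections, k=2))
```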
Type-specific Ranking for PIR

- Individual collections have type-specific features
  - Thread-based features for emails
  - Path-based features for documents
- Most of these documents have rich metadata
  - Email: <sender, receiver, date, subject, body>
  - Document: <title, author, abstract, content>
  - Calendar: <title, date, place, participants>
- We focus on developing general retrieval techniques for structured documents
Structured Document Retrieval

- Field operator / advanced search interface
- User's search terms are found in multiple fields

Elsweiler, D., Harvey, M., Hacker, M. Understanding Re-finding Behavior in Naturalistic Email Interaction Logs. [SIGIR'11]
Structured Document Retrieval: Models

- Document-based Retrieval Model
  - Score each document as a whole
- Field-based Retrieval Model
  - Combine evidence from each field (sketched in the formulas below)

[Figure: document-based scoring matches query terms q1 ... qm against the document as a whole, while field-based scoring matches them against each field f1 ... fn and combines the per-field scores with weights w1 ... wn.]
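To make the distinction concrete, here is a sketch of the two scoring schemes in language-modeling notation; the exact smoothing and combination are my assumptions rather than the slide's.

$$ \mathrm{score}_{\text{doc}}(Q, D) \;=\; \prod_{i=1}^{m} P(q_i \mid \theta_D) $$

$$ \mathrm{score}_{\text{field}}(Q, D) \;=\; \prod_{i=1}^{m} \sum_{j=1}^{n} w_j \, P(q_i \mid \theta_{D, f_j}) $$

The field-based form with fixed weights $w_j$ roughly corresponds to mixture-of-field-language-model approaches such as MFLM.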
Field Relevance Model for Structured IR

- Field Relevance
  - Different fields are important for different query terms
  - 'james' is relevant when it occurs in <to>
  - 'registration' is relevant when it occurs in <subject>
Estimating the Field Relevance: Overview

- If User Provides Feedback
  - A relevant document provides sufficient information
- If No Feedback is Available
  - Combine field-level term statistics from multiple sources

[Figure: field-level term statistics (from/to, title, content) from the collection and from the top-k documents are combined to approximate those of the relevant documents.]
Estimating Field Relevance using Feedback

- Assume a user who marked D_R as relevant
  - Estimate field relevance from the field-level term distribution of D_R (see the formula below)
- We can personalize the results accordingly
  - Rank higher the documents with a similar field-level term distribution
  - This weight is provably optimal under the LM retrieval framework

[Figure: for the relevant document D_R, <to> is the relevant field for 'james' and <content> is the relevant field for 'registration'.]
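As a sketch of the feedback case (my reconstruction): take the field-level term distribution of the relevant document $D_R$ and normalize it across fields to obtain the per-term field relevance.

$$ P(F_j \mid q_i) \;\approx\; \frac{P(q_i \mid \theta_{D_R, f_j})}{\sum_{j'} P(q_i \mid \theta_{D_R, f_{j'}})} $$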
Estimating Field Relevance without Feedback

- Linear Combination of Multiple Sources (see the formula below)
  - Weights estimated using training queries
- Features
  - Field-level term distribution of the collection (unigram and bigram LM; the unigram case is the same as PRM-S)
  - Field-level term distribution of the top-k documents (unigram and bigram LM; pseudo-relevance feedback)
  - A priori importance of each field (w_j), similar to MFLM and BM25F
    - Estimated using held-out training queries
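A hedged sketch of the no-feedback estimate: mix the per-term field distributions from the collection and the top-k documents with a query-independent field prior $w_j$, with mixture weights $\lambda$ fit on training queries. The particular sources and normalization shown here are my assumptions.

$$ \hat{P}(F_j \mid q_i) \;=\; \lambda_C \, P_C(F_j \mid q_i) \;+\; \lambda_T \, P_{\text{top-}k}(F_j \mid q_i) \;+\; \lambda_W \, w_j $$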
Retrieval Using the Field Relevance

- Comparison with Previous Work

[Figure: previous field-based models score query terms q1 ... qm against fields f1 ... fn with fixed field weights w1 ... wn (summed), whereas the field relevance model uses per-term field weights P(Fj | qi) and multiplies the per-term scores.]

- Ranking in the Field Relevance Model (formula below)
  - Per-term field score
  - Per-term field weight
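Putting the pieces together, the ranking function sketched by the diagram multiplies, over query terms, the field-relevance-weighted field scores (my reconstruction):

$$ \mathrm{score}(Q, D) \;=\; \prod_{i=1}^{m} \sum_{j=1}^{n} P(F_j \mid q_i) \; P(q_i \mid \theta_{D, f_j}) $$

Compared with the fixed-weight form above, the only change is replacing $w_j$ with the per-term field relevance $P(F_j \mid q_i)$.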
Evaluating the Field Relevance Model

- Retrieval Effectiveness (metric: Mean Reciprocal Rank)

| Collection | DQL   | BM25F | MFLM  | FRM-C | FRM-T | FRM-R |
|------------|-------|-------|-------|-------|-------|-------|
| TREC       | 54.2% | 59.7% | 60.1% | 62.4% | 66.8% | 79.4% |
| IMDB       | 40.8% | 52.4% | 61.2% | 63.7% | 65.7% | 70.4% |
| Monster    | 42.9% | 27.9% | 46.0% | 54.2% | 55.8% | 71.6% |

[Chart: the same numbers plotted per collection; the per-term field weight models (FRM-C, FRM-T, FRM-R) outperform the fixed field weight baselines (DQL, BM25F, MFLM).]
Type Prediction Methods

- Field-based collection Query-Likelihood (FQL)
  - Calculate a QL score for each field of a collection
  - Combine the field-level scores into a collection score (see the sketch below)
- Feature-based Method
  - Combine existing type-prediction methods
  - Grid search / SVM for finding combination weights
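Here is a toy sketch of FQL, assuming Dirichlet-smoothed field language models built at the collection level and a simple average to combine the field scores; the real method's smoothing and combination may differ.

```python
import math
from collections import Counter

def field_language_models(docs, fields):
    """Aggregate term counts per field over a whole collection."""
    models = {f: Counter() for f in fields}
    for doc in docs:
        for f in fields:
            models[f].update(doc.get(f, "").lower().split())
    return models

def fql_score(query, docs, fields, mu=100.0):
    """Field-based collection Query-Likelihood (toy version):
    score the query against each field's collection-level LM, then combine."""
    models = field_language_models(docs, fields)
    vocab = set()
    for m in models.values():
        vocab.update(m)
    field_scores = []
    for f in fields:
        model, total = models[f], sum(models[f].values())
        score = 0.0
        for term in query.lower().split():
            # Dirichlet-style smoothing against a uniform background.
            p = (model[term] + mu / max(len(vocab), 1)) / (total + mu)
            score += math.log(p)
        field_scores.append(score)
    # Combine field-level scores into a single collection score (here: mean).
    return sum(field_scores) / len(field_scores)

# Usage: predict the collection (type) with the highest FQL score for the query.
collections = {
    "email": ([{"subject": "registration", "body": "james sent the form"}],
              ["subject", "body"]),
    "calendar": ([{"title": "group meeting", "place": "CS 150"}],
                 ["title", "place"]),
}
query = "james registration"
predicted = max(collections, key=lambda t: fql_score(query, *collections[t]))
print(predicted)
```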
Type Prediction Performance

[Charts: % of queries with the correct type predicted, on the Pseudo-desktop Collections and the CS Collection.]

- FQL improves performance over CQL
- Combining features improves the performance further
Summary So Far…

- Field relevance model for structured document retrieval
  - Enables relevance feedback through field weighting
  - Improves performance using linear feature-based estimation
- Type prediction methods for PIR
  - Field-based type prediction method (FQL)
  - Combining features improves the performance further
- We now move on to the associative browsing model
  - What happens when users can't recall good search terms?
Associative Browsing Model

Recap: Retrieval Framework for PIR

[Figure: keyword search (e.g., "james registration") and associative browsing as the two retrieval modes of the framework.]
User Interaction for Associative Browsing

- Users enter a concept or document page by search
- The system provides a list of suggestions for browsing

[Figure: data model and user interface]

- How can we build associations? Automatically? Manually?
  - "Participants wouldn't create associations beyond simple tagging operations" (Sauermann et al. 2005)
- How would it match the user's preference?
Building the Associative Browsing Model

1. Document Collection
2. Concept Extraction
3. Link Extraction (term similarity, temporal similarity, co-occurrence, ...)
4. Link Refinement (click-based training)
Link Extraction and Refinement

- Link Scoring
  - Combination of link type scores: S(c1, c2) = Σ_i [ w_i × Link_i(c1, c2) ]
  - Link types include term vector similarity, temporal similarity, tag similarity, string similarity, path/type similarity, co-occurrence, and concept similarity
- Link Presentation
  - Ranked list of suggested items (browsing suggestions)
  - Users click on them for browsing
- Link Refinement (training the w_i)
  - Maximize click-based relevance (see the sketch below)
    - Grid Search: maximize retrieval effectiveness (MRR)
    - RankSVM: minimize error in pairwise preference

[Figure: link types connecting concepts and documents, e.g. the concept "Search Engine".]
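Below is a toy sketch of the weighted link score and a coarse grid search over the weights that maximizes MRR on click data; the feature set, grid, and MRR computation are simplified assumptions rather than the dissertation's exact procedure.

```python
import itertools

def link_score(features, weights):
    """S(c1, c2) = sum_i w_i * Link_i(c1, c2), one feature value per link type."""
    return sum(weights[name] * value for name, value in features.items())

def mrr(weights, click_data):
    """Mean reciprocal rank of the clicked item when candidates are ranked by link_score.

    click_data: list of (candidates, clicked), where candidates maps an item id
    to its feature dict and clicked is the id the user actually clicked.
    """
    total = 0.0
    for candidates, clicked in click_data:
        ranked = sorted(candidates,
                        key=lambda c: link_score(candidates[c], weights),
                        reverse=True)
        total += 1.0 / (ranked.index(clicked) + 1)
    return total / len(click_data)

def grid_search(feature_names, click_data, grid=(0.0, 0.5, 1.0)):
    """Pick the weight vector on a coarse grid that maximizes MRR on the clicks."""
    best_weights, best_mrr = None, -1.0
    for combo in itertools.product(grid, repeat=len(feature_names)):
        weights = dict(zip(feature_names, combo))
        score = mrr(weights, click_data)
        if score > best_mrr:
            best_weights, best_mrr = weights, score
    return best_weights, best_mrr

# Tiny usage example with two link types.
click_data = [
    ({"a": {"term": 0.9, "time": 0.1}, "b": {"term": 0.2, "time": 0.8}}, "a"),
    ({"a": {"term": 0.1, "time": 0.9}, "b": {"term": 0.3, "time": 0.2}}, "a"),
]
print(grid_search(["term", "time"], click_data))
```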
Evaluating the Associative Browsing Model

- Data set: CS Collection
  - Public documents collected in the UMass CS department
  - CS dept. people competed in known-item finding tasks
- Value of browsing for known-item finding
  - % of sessions in which browsing was used
  - % of sessions in which browsing was used & led to success
- Quality of browsing suggestions
  - Mean Reciprocal Rank using clicks as judgments
  - 10-fold cross-validation over the collected click data
Value of Browsing for Known-item Finding

| Evaluation Type                    | Total (#sessions) | Browsing used | Successful outcome |
|------------------------------------|-------------------|---------------|--------------------|
| Simulation                         | 63,260            | 9,410 (14.8%) | 3,957 (42.0%)      |
| User Study (1): Document Only      | 290               | 42 (14.5%)    | 15 (35.7%)         |
| User Study (2): Document + Concept | 142               | 43 (30.2%)    | 32 (74.4%)         |

- Comparison with Simulation Results
  - Roughly matches in terms of overall usage and success ratio
- The Value of Associative Browsing
  - Browsing was used in 30% of all sessions
  - When used, browsing led to success in roughly 75% of sessions
Quality of Browsing Suggestions

[Charts: MRR of concept browsing and document browsing on the CS collection (Top1 and Top5), comparing individual link types (title, content, tag, time, string, co-occurrence for concepts; title, content, tag, time, topic, path, type, concept for documents) against combined weights (Uniform, Grid, SVM).]
Simulated Evaluation Methods
Challenges in PIR Evaluation

- Hard to create a 'test collection'
  - Each user has different documents and habits
  - People will not donate their documents and queries for research
- Limitations of user studies
  - Experimenting with a working system is costly
  - Experimental control is hard with real users and tasks
  - Data is not reusable by third parties
Our Approach: Simulated Evaluation

- Simulate components of evaluation
  - Collection: user's documents with metadata
  - Task: search topics and relevance judgments
  - Interaction: query and click data
Simulated Evaluation Overview

- Simulated document collections
  - Pseudo-desktop Collections: subsets of the W3C mailing list + other document types
  - CS Collection: UMass CS mailing list / calendar items / crawl of homepages
- Evaluation methods

|                      | Controlled User Study           | Simulated Interaction       |
|----------------------|---------------------------------|-----------------------------|
| Field-based Search   | DocTrack search game            | Query generation methods    |
| Associative Browsing | DocTrack search + browsing game | Probabilistic user modeling |
Controlled User Study: DocTrack Game

- Procedure
  - Collect public documents in the UMass CS dept. (CS Collection)
  - Build a web interface where participants can find documents
  - People in the CS department participated
- DocTrack search game
  - 20 participants / 66 games played
  - 984 queries collected for 882 target documents
- DocTrack search+browsing game
  - 30 participants / 53 games played
  - 290 + 142 search sessions collected
DocTrack Game

[Screenshot: the game shows a target item and asks the player to find it. Users can use both search and browsing in the DocTrack search+browsing game.]
Query Generation for Evaluating PIR

- Known-item finding for PIR
  - A target document represents an information need
  - Users would take terms from the target document
- Query Generation for PIR
  - Randomly select a target document
  - Algorithmically take terms from the document (see the sketch below)
- Parameters of Query Generation
  - Choice of extent: Document [Azzopardi07] vs. Field
  - Choice of term: Uniform vs. TF vs. IDF vs. TF-IDF [Azzopardi07]
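Here is a toy sketch of the generation procedure using TF-IDF term selection over the whole document; the parameters mirror the list above, but the weighting details and query length are illustrative assumptions.

```python
import math
import random
from collections import Counter

def tfidf_weights(doc_terms, collection):
    """TF-IDF weight for each term of the target document."""
    n_docs = len(collection)
    df = Counter()
    for doc in collection:
        df.update(set(doc))
    tf = Counter(doc_terms)
    return {t: tf[t] * math.log((n_docs + 1) / (df[t] + 1)) for t in tf}

def generate_query(collection, query_length=3, rng=random):
    """Simulate a known-item query: pick a target document, then sample
    query terms from it in proportion to their TF-IDF weights."""
    target = rng.choice(collection)
    weights = tfidf_weights(target, collection)
    terms, w = zip(*weights.items())
    query = rng.choices(terms, weights=w, k=query_length)
    return target, " ".join(query)

# Usage with a toy collection of tokenized documents.
collection = [
    "james registration form deadline".split(),
    "group meeting room schedule".split(),
    "travel reimbursement form request".split(),
]
random.seed(0)
print(generate_query(collection, query_length=2))
```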
Validating the Generated Queries

- Basic Idea
  - Use the set of human-generated queries for validation
  - Compare at the level of query terms and retrieval scores
- Validation by Comparing Query Terms
  - The generation probability of a manual query q under P_term
- Validation by Comparing Retrieval Scores [Azzopardi07]
  - Two-sided Kolmogorov-Smirnov test (see the snippet below)
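One way to run the score-distribution check is with SciPy's two-sample KS test; the score arrays here are placeholders, and treating the p-value this way is my illustration rather than the exact protocol used.

```python
from scipy.stats import ks_2samp

# Retrieval scores of the target documents for human queries vs. generated
# queries (placeholder numbers; in practice these come from the retrieval runs).
human_scores = [0.82, 0.61, 0.74, 0.55, 0.90]
generated_scores = [0.79, 0.58, 0.70, 0.66, 0.85]

# Two-sided Kolmogorov-Smirnov test: a large p-value means we cannot
# distinguish the two score distributions, i.e. the generated queries
# behave like the human ones at the retrieval-score level.
statistic, p_value = ks_2samp(human_scores, generated_scores)
print(f"KS statistic = {statistic:.3f}, p-value = {p_value:.3f}")
```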
Validation Results for Generated Queries

- Validation based on query terms
- Validation based on retrieval score distribution
Probabilistic User Model for PIR

- Query generation model
  - Term selection from a target document
- State transition model
  - Use browsing when the result looks marginally relevant
- Link selection model
  - Click on browsing suggestions based on perceived relevance
A User Model for Link Selection

- User's level of knowledge (see the sketch below)
  - Random: randomly click on the ranked list
  - Informed: more likely to click on a more relevant item
  - Oracle: always click on the most relevant item
- Relevance is estimated using the position of the target item

[Figure: example ranked lists illustrating the three click behaviors.]
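Below is a toy sketch of the three click policies over a ranked suggestion list, where each suggestion carries an estimated relevance (e.g. derived from the target item's position); the softmax used for the informed user is my assumption.

```python
import math
import random

def select_link(suggestions, knowledge="informed", rng=random):
    """Pick which suggestion a simulated user clicks.

    `suggestions` is a list of (item, estimated_relevance) pairs, already ranked.
    """
    if knowledge == "random":
        # Random user: any item on the list is equally likely.
        return rng.choice(suggestions)[0]
    if knowledge == "oracle":
        # Oracle user: always clicks the most relevant item.
        return max(suggestions, key=lambda pair: pair[1])[0]
    # Informed user: click probability grows with estimated relevance (softmax).
    weights = [math.exp(rel) for _, rel in suggestions]
    items = [item for item, _ in suggestions]
    return rng.choices(items, weights=weights, k=1)[0]

# Usage: the same ranked list clicked by the three user types.
ranked = [("doc_a", 0.2), ("doc_b", 0.9), ("doc_c", 0.5)]
random.seed(0)
for level in ("random", "informed", "oracle"):
    print(level, "->", select_link(ranked, knowledge=level))
```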
Success Ratio of Browsing

- Varying the level of knowledge and the fan-out for simulation
- Exploration is valuable for users with a low knowledge level

[Chart: success ratio (about 0.30 to 0.48) for random, informed, and oracle users as the fan-out increases from FO1 to FO3 (more exploration).]
Community Efforts using the Data Sets
Conclusions & Future Work

Major Contributions

- Field-based Search Models
  - Field relevance model for structured document retrieval
  - Field-based and combination-based type prediction method
- Associative Browsing Model
  - An adaptive technique for generating browsing suggestions
  - Evaluation of associative browsing in known-item finding
- Simulated Evaluation Methods for Known-item Finding
  - DocTrack game for controlled user study
  - Probabilistic user model for generating simulated interaction
Field Relevance for Complex Structures

- Current work assumes documents with a flat structure
- Field relevance for complex structures?
  - XML documents with hierarchical structure
  - Joined database relations with graph structure

Cognitive Model of Query Generation

- Current query generation methods assume:
  - Queries are generated from the complete document
  - Query terms are chosen independently from one another
- Relaxing these assumptions
  - Model the user's degradation in memory
  - Model the dependency in query term selection
- Ongoing work
  - Graph-based representation of documents
  - Query terms can be chosen by random walk
Thank you for your attention!

Special thanks to my advisor, coauthors, and all of you here!

Are we closer to the superhuman now?

One More Slide: What I Learned…

- Start from what's happening in the user's mind
  - Field relevance / query generation, …
- Balance user input and algorithmic support
  - Generating suggestions for associative browsing
- Learn from your peers & make contributions
  - Query generation method / DocTrack game
  - Simulated test collections & workshop