Personalizing Information
Search: Understanding
Users and their Interests
Diane Kelly
School of Information & Library Science
University of North Carolina
dianek@email.unc.edu
IPAM | 04 October 2007
Background: IR and TREC
 What is IR?
 Who works on problems in IR?
 Where can I find the most recent work in IR?
 A TREC primer
Background: Personalization
 Personalization is a process in which retrieval is
customized to the individual (rather than one-size-fits-all
searching)
 Hans Peter Luhn was one of the first people to
personalize IR through selective dissemination of
information (SDI) (now called ‘filtering’)
 Profiles and user models are often employed to
‘house’ data about users and represent their
interests
 Figuring out how to populate and maintain the
profile or user model is a hard problem
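In practice, a profile is often just a weighted term vector updated as evidence about the user accrues. The sketch below is a generic, hypothetical illustration; the decay constant and update rule are assumptions, not a method described in this talk.

```python
# Hypothetical sketch: a user profile as a decayed, weighted term vector.
# The decay constant and update rule are illustrative assumptions.
from collections import Counter

class UserProfile:
    def __init__(self, decay=0.95):
        self.weights = Counter()
        self.decay = decay

    def update(self, document_terms, evidence=1.0):
        """Decay existing interests, then add evidence from a viewed document."""
        for term in self.weights:
            self.weights[term] *= self.decay
        for term in document_terms:
            self.weights[term] += evidence

    def top_interests(self, n=5):
        return self.weights.most_common(n)

profile = UserProfile()
profile.update(["tick", "fever", "rash"])
profile.update(["fever", "treatment"])
print(profile.top_interests())  # 'fever' rises to the top after two updates
```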
Major Approaches
 Explicit Feedback
 Implicit Feedback
 User’s desktop
Explicit Feedback
 Term relevance feedback is one of the most widely
used and studied explicit feedback techniques (a
Rocchio-style sketch follows this slide)
 Typical relevance feedback scenarios (examples)
 Systems-centered research has found that
relevance feedback works (including pseudo-relevance
feedback)
 User-centered research has found mixed results
about its effectiveness
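A classic formalization of term relevance feedback is Rocchio's algorithm, which moves the query vector toward judged-relevant documents and away from judged-non-relevant ones. The sketch below is a minimal illustration with toy term vectors and conventional but arbitrary alpha/beta/gamma weights; it is not the implementation behind any study reported in these slides.

```python
# Minimal Rocchio-style term relevance feedback (toy example; the
# alpha/beta/gamma weights and term vectors are illustrative assumptions).
from collections import defaultdict

def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return alpha*q + beta*centroid(relevant) - gamma*centroid(nonrelevant)."""
    new_q = defaultdict(float)
    for term, weight in query.items():
        new_q[term] += alpha * weight
    for docs, coeff in ((relevant, beta), (nonrelevant, -gamma)):
        for doc in docs:
            for term, weight in doc.items():
                new_q[term] += coeff * weight / len(docs)
    # Negative weights are conventionally clipped to zero
    return {t: w for t, w in new_q.items() if w > 0}

# Toy usage: expand a two-term query with terms from one judged-relevant page
query = {"spotted": 1.0, "fever": 1.0}
relevant = [{"fever": 0.5, "tick": 0.8, "rocky": 0.6}]
print(rocchio(query, relevant, []))
```

Pseudo-relevance feedback is the same update with the top-k retrieved documents simply assumed relevant, which is how runs like pseudo05–pseudo50 later in this deck are typically built.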
Explicit Feedback
 Terms are not presented in context, so it may be
hard for users to understand how they can help
 The quality of suggested terms is not always good
 Users lack the additional cognitive resources to
engage in explicit feedback
 Users are too lazy to provide feedback
 There are questions about the sustainability of
explicit feedback for long-term modeling
Examples
[Screenshots of example relevance feedback interfaces]
Query Elicitation Study
 Users typically pose very short queries
 This may be because
 users have a difficult time articulating their
information needs
 traditional search interfaces encourage short
queries
 Polyrepresentative extraction of information
needs suggests obtaining multiple representations
of a single information need (reference interview)
Motivation
 Research has demonstrated that a positive
relationship exists between query length and
performance in batch-mode experimental IR
 Query expansion is an effective technique for
increasing query length, but research has
demonstrated that users have some difficulty with
traditional term relevance feedback features
Elicitation Form
[Screenshot of the elicitation form; its questions ask what the user already knows about the topic, why they want to know about it, and for keywords describing it]
Results: Number of Terms
[Bar chart (N=45): mean number of terms by source. Baseline queries averaged 2.33 terms, while the elicitation questions (already know, why, keywords) drew far longer responses, with means of 9.33, 10.67, and 16.18 terms]
Experimental Runs

| Run ID | Source of Terms |
| --- | --- |
| baseline | Baseline |
| pseudo05, pseudo10, pseudo20, pseudo50 | Baseline + Pseudo-Relevance Feedback |
| Q2 | Baseline + Elicitation Form Q2 |
| Q3 | Baseline + Elicitation Form Q3 |
| Q4 | Baseline + Elicitation Form Q4 |
| Q3Q4, Q2Q3, Q2Q4, Q234 | Baseline + Combination of Elicitation Form Questions |
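As a rough illustration of how such runs can be assembled (the slides do not show the actual indexing, stemming, or term weighting), the sketch below simply appends tokenized elicitation responses to the baseline query. The variable names and sample responses are invented.

```python
# Hypothetical sketch: build each run's query by appending elicitation-form
# responses to the baseline query (tokenization and sample text invented).
import re

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def build_run_query(baseline, responses=()):
    """Concatenate baseline terms with terms from the selected responses."""
    terms = tokenize(baseline)
    for response in responses:
        terms.extend(tokenize(response))
    return " ".join(terms)

baseline = "rocky mountain spotted fever"
q2 = "I already know it is carried by ticks"       # 'already know' response
q4 = "tick-borne illness symptoms rash treatment"  # 'keywords' response
print(build_run_query(baseline))            # run: baseline
print(build_run_query(baseline, [q2, q4]))  # run: Q2Q4
```

A combined run such as Q234 would concatenate all three responses, consistent with the query-length effect shown two slides below.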
Overall Performance
[Bar chart: MAP for each run (baseline, pseudo05–pseudo50, Q2, Q3, Q4, and their combinations), ranging from 0.2843 for the baseline up to 0.3685 for the best run, Q234]
Query Length and Performance
[Scatter plot: MAP vs. query length (0–40 terms) for each run, from baseline up to Q234, with fitted regression line y = 0.263 + 0.000265x, p < .001]
Major Findings
 Users provided lengthy responses to some of the
questions
 There were large differences in the length of users’
responses to each question
 In most cases responses significantly improved
retrieval
 Query length and performance were significantly
related
Implicit Feedback
 What is it?
Information about users, their needs and document
preferences that can be obtained unobtrusively, by
watching users’ interactions and behaviors with systems
 What are some examples?
 Examine: Select, View, Listen, Scroll, Find, Query,
Cumulative measures
 Retain: Print, Save, Bookmark, Purchase, Email
 Reference: Link, Cite
 Annotate/Create: Mark up, Type, Edit, Organize, Label
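One way to operationalize these categories is to log each interaction as a typed event. The sketch below is hypothetical: the field names and category map are mine, not those of any system described in these slides.

```python
# Hypothetical implicit-feedback event log; the category map mirrors the
# Examine/Retain/Reference/Annotate grouping listed above.
from dataclasses import dataclass, field
from datetime import datetime, timezone

CATEGORIES = {
    "examine": {"select", "view", "listen", "scroll", "find", "query"},
    "retain": {"print", "save", "bookmark", "purchase", "email"},
    "reference": {"link", "cite"},
    "annotate": {"mark_up", "type", "edit", "organize", "label"},
}

@dataclass
class FeedbackEvent:
    user_id: str
    url: str
    behavior: str  # e.g. "view", "print", "cite"
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    @property
    def category(self) -> str:
        for name, behaviors in CATEGORIES.items():
            if self.behavior in behaviors:
                return name
        return "other"

event = FeedbackEvent("subject07", "http://example.org/doc", "bookmark")
print(event.category)  # -> "retain"
```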
Implicit Feedback
 Why is it important?
 It is generally believed that users are unwilling to
engage in explicit relevance feedback
 It is unlikely that users can maintain their profiles over
time
 Users generate large amounts of data each time they
engage in online information-seeking activities, and the
things in which they are 'interested' are in this data
somewhere
Implicit Feedback
 What do we “know” about it?
 There seems to be a positive correlation between
selection (click-through) and relevance
 There seems to be a positive correlation between
display time and relevance
 What is problematic about it?
 Much of the research has been based on incomplete
data and general behavior
 And has not considered the impact of contextual
variables – such as task and a user’s familiarity with a
topic – on behaviors
Implicit Feedback Study
 To investigate:
 the relationship between behaviors and
relevance
 the relationship between behaviors and context
 To develop a method for studying and measuring
behaviors, context and relevance in a natural
setting, over time
Method
 Approach: naturalistic and longitudinal, but
some control
 Subjects/Cases: 7 Ph.D. students
 Study period: 14 weeks
 Compensation: new laptops and printers
Data Collection
 Context
 Tasks: endurance, persistence, stage
 Topics: familiarity
 Relevance
 Document usefulness
 Behaviors
 Display time, frequency, printing, saving
Protocol
 Client- & server-side logging
[Timeline: context evaluation at START; context evaluations and document evaluations weekly from Week 1 through Week 13; document evaluations at END of the 14 weeks]
Results: Description of Data

| Subject | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Client | 2.6 MB | 6.8 MB | 3.9 MB | 2.0 MB | 1.5 MB | 21.7 MB | 4.9 MB |
| Proxy | 1.7 GB | 83 MB | 39 MB | 42 MB | 48 MB | 2.9 GB | 2.1 GB |
| URLs Requested | 15,499 | 5,319 | 3,157 | 3,205 | 3,404 | 14,586 | 11,657 |
| Docs Evaluated | 870 (5%) | 802 (14%) | 384 (12%) | 353 (11%) | 200 (6%) | 1,328 (8%) | 1,160 (10%) |
| Tasks | 6 | 11 | 19 | 25 | 12 | 21 | 33 |
| Topics | 9 | 80 | 17 | 35 | 25 | 40 | 26 |
Relevance: Usefulness
[Bar chart: mean usefulness rating (SD) on a 1–7 scale for Subjects 1–7: 6.1 (2.00), 4.8 (1.65), 6.0 (0.80), 5.3 (2.20), 5.3 (2.40), 4.6 (0.80), 5.0 (2.40)]
Relevance: Usefulness
[Line chart: usefulness ratings (1–7 scale) by week viewed (Weeks 01–14), one line per subject]
Display Time
[Histogram of client display times, ranging from about one second to tens of minutes; most display times were short]
Display Time & Usefulness
[Chart: mean display time by usefulness rating (1–7), one line per subject (1–7); no consistent pattern across subjects]
Display Time & Task
[Chart: mean display time by task. Tasks: 1. Researching Dissertation (N=33), 2. Shopping (N=127), 3. Read News (N=78), 4. Movie Reviews & Schedules (N=17), 5. Preparing Course (N=30), 6. Entertainment (N=16)]
Major Findings
 Behaviors differed for each subject, but in general
 most display times were low
 most usefulness ratings were high
 not much printing or saving
 No direct relationship between display time and
usefulness
Major Findings
 Main effects for display time and all contextual
variables:
 Task (5 subjects)
 Topic (6 subjects)
 Familiarity (5 subjects)
Lower levels of familiarity associated with higher display
times
 No clear interaction effects among behaviors,
context and relevance
Personalizing Search
 Using the display time, task and relevance
information from the study, we evaluated the
effectiveness of a set of personalized retrieval
algorithms
 Four algorithms for using display time as implicit
feedback were tested (sketched after this list):
1. User
2. Task
3. User + Task
4. General
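The slides do not define the four algorithms precisely. One plausible reading, sketched below, learns a display-time threshold at each granularity (general, per user, per task, per user-task pair) and treats pages displayed longer than the threshold as implicitly relevant; the mean-based learning rule, the 30-second default, and the data layout are all assumptions.

```python
# Hedged sketch: display-time thresholds as implicit relevance evidence,
# learned at four granularities (General, User, Task, User + Task).
from statistics import mean

def learn_thresholds(history, key_fn):
    """Group logged page views by key; use mean display time as the threshold."""
    groups = {}
    for view in history:
        groups.setdefault(key_fn(view), []).append(view["display_time"])
    return {key: mean(times) for key, times in groups.items()}

def implicitly_relevant(view, thresholds, key_fn, default=30.0):
    return view["display_time"] >= thresholds.get(key_fn(view), default)

history = [
    {"user": "u1", "task": "dissertation", "display_time": 95.0},
    {"user": "u1", "task": "shopping", "display_time": 12.0},
    {"user": "u2", "task": "shopping", "display_time": 8.0},
]
granularities = {
    "general": lambda v: "all",
    "user": lambda v: v["user"],
    "task": lambda v: v["task"],
    "user+task": lambda v: (v["user"], v["task"]),
}
view = {"user": "u1", "task": "shopping", "display_time": 15.0}
for name, key_fn in granularities.items():
    thresholds = learn_thresholds(history, key_fn)
    print(name, implicitly_relevant(view, thresholds, key_fn))
```

Under this reading, a 15-second view looks long for shopping but short against the user's overall average, which matches the finding below that task-tailored thresholds helped while purely user-tailored ones did not.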
Results
[Line chart: MAP by feedback iteration (0–20) for the four algorithms; MAP generally increases as iterations accumulate]
Major Findings
 Tailoring display time thresholds based on task
information improved performance, but doing so
based on user information did not
 There was a lot of variability between subjects,
with the user-centered algorithms performing
well for some and poorly for others
 The effectiveness of most of the algorithms
increased with time (and more data)
Some Problems
Relevance
 What are we modeling? Does click = relevance?
 Relevance is multi-dimensional and dynamic
 A single measure does not adequately reflect
'relevance'
 Most pages are likely to be rated as useful, even if
the value or importance of the information differs
[Examples of pages likely rated useful despite differing value: a definition, a recipe, a weather forecast, information about Rocky Mountain Spotted Fever, a paper about personalization]
Page Structure
 Some behaviors are more likely to occur on some
types of pages
 A more ‘intelligent’ modeling function would
know when and what to observe and expect (a
hypothetical sketch follows this slide)
 The structure of a page encourages or inhibits
certain behaviors
 Not all pages are equally useful for
modeling a user’s interests
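As a purely hypothetical illustration of such an 'intelligent' modeling function, one could condition which behaviors count as evidence on the page type; every page type and mapping below is invented for the example.

```python
# Purely hypothetical sketch: condition which behaviors a modeling function
# observes on the type of page (page types and mappings are invented).
EXPECTED_BEHAVIORS = {
    "search_results": {"select", "query"},
    "news_article": {"scroll", "print", "email"},
    "shopping": {"select", "purchase", "bookmark"},
    "research_paper": {"save", "print", "cite"},
}

def is_informative(page_type: str, behavior: str) -> bool:
    """Treat a behavior as modeling evidence only where it is expected."""
    return behavior in EXPECTED_BEHAVIORS.get(page_type, set())

print(is_informative("research_paper", "save"))    # True
print(is_informative("news_article", "purchase"))  # False
```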
[Series of screenshots of different page types, each asking: What types of behaviors do you expect here?]
The Future
 New interaction styles and systems create new
opportunities for explicit and implicit feedback
 Collaborative search features and query
recommendation
 Features/Systems that support the entire
search process (e.g., saving, organizing, etc.)
 QA systems
 New types of feedback
 Negative
 Physiological
Thank You
Diane Kelly (dianek@email.unc.edu)
WEB:
http://ils.unc.edu/~dianek/research.html
Collaborators: Nick Belkin, Xin Fu, Vijay Dollu, Ryen White
TREC
[Text REtrieval Conference]
It’s not this … [image]
What is TREC?
 TREC is a workshop series sponsored by the National
Institute of Standards and Technology (NIST) and the US
Department of Defense.
 Its purpose is to build infrastructure for large-scale
evaluation of text retrieval technology.
 TREC collections and evaluation measures are the de facto
standard for evaluation in IR.
 TREC comprises different tracks, each of which focuses
on different issues (e.g., question answering, filtering).
TREC Collections
 Central to each TREC Track is a collection, which consists
of three major components:
1. A corpus of documents (typically newswire)
2. A set of information needs (called topics)
3. A set of relevance judgments.
 Each Track also adopts particular evaluation measures
 Precision and Recall; F-measure
 Average Precision (AP) and Mean AP (MAP)
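For reference, the standard definitions behind these measures (standard IR usage; the slide itself does not spell them out):

```latex
P = \frac{|\text{relevant} \cap \text{retrieved}|}{|\text{retrieved}|}
\qquad
R = \frac{|\text{relevant} \cap \text{retrieved}|}{|\text{relevant}|}
\qquad
F_1 = \frac{2PR}{P + R}

\mathrm{AP} = \frac{1}{|\text{relevant}|} \sum_{k \,:\, d_k \ \text{relevant}} P@k
\qquad
\mathrm{MAP} = \frac{1}{|Q|} \sum_{q \in Q} \mathrm{AP}(q)
```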
Comparison of Measures
(R = relevant, NR = non-relevant; precision is computed at each relevant rank)

| Rank | List 1 | Precision | List 2 | Precision |
| --- | --- | --- | --- | --- |
| 1 | R | 1/1 = 1.00 | NR | |
| 2 | R | 2/2 = 1.00 | NR | |
| 3 | R | 3/3 = 1.00 | NR | |
| 4 | R | 4/4 = 1.00 | NR | |
| 5 | R | 5/5 = 1.00 | NR | |
| 6 | NR | | R | 1/6 = .167 |
| 7 | NR | | R | 2/7 = .286 |
| 8 | NR | | R | 3/8 = .375 |
| 9 | NR | | R | 4/9 = .444 |
| 10 | NR | | R | 5/10 = .500 |
| AP | | 1.00 | | .354 |
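A few lines of code (a sketch, not NIST's trec_eval tool) reproduce the AP values in this comparison:

```python
# Compute average precision (AP) for a ranked list of relevance judgments;
# reproduces the List 1 / List 2 comparison above.
def average_precision(judgments, num_relevant=None):
    """judgments: booleans in rank order; num_relevant defaults to the
    number of relevant documents retrieved."""
    if num_relevant is None:
        num_relevant = sum(judgments)
    hits, precision_sum = 0, 0.0
    for rank, relevant in enumerate(judgments, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / num_relevant if num_relevant else 0.0

list1 = [True] * 5 + [False] * 5
list2 = [False] * 5 + [True] * 5
print(round(average_precision(list1), 3))  # 1.0
print(round(average_precision(list2), 3))  # 0.354
# MAP is simply the mean of AP across topics:
print((average_precision(list1) + average_precision(list2)) / 2)
```

Both lists retrieve the same five relevant documents, but List 1 ranks them first and earns a perfect AP of 1.0, while List 2 places them at ranks 6–10 and drops to .354: AP rewards ranking relevant documents early.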
Learn more about TREC
 http://trec.nist.gov
 Voorhees, E. M., & Harman, D. K. (2005). TREC:
Experiment and Evaluation in Information Retrieval.
Cambridge, MA: MIT Press.
Example Topic
[Screenshot of an example TREC topic]
Learn more about IR
 ACM SIGIR Conference
 Sparck Jones, K., & Willett, P. (1997). Readings in
Information Retrieval. Morgan Kaufmann Publishers.
 Baeza-Yates, R., & Ribeiro-Neto, B. (1999). Modern
Information Retrieval. New York, NY: ACM Press.
 Grossman, D. A., & Frieder, O. (2004). Information
Retrieval: Algorithms and Heuristics. The Netherlands:
Springer.