An Implicit Feedback approach for
Interactive Information Retrieval
Ryen W. White, Joemon M. Jose, Ian Ruthven
University of Glasgow
Hamza Hydri Syed
Course Presentation - Web Information Retrieval
Roadmap
• Introduction
• Searcher Interaction
• Binary Voting Model
• Evaluation
• Results & Analysis
• Conclusions
Introduction
Relevance Feedback (RF):
• Automatically improves a system's representation of a searcher's information need through an iterative process of feedback [1].
• Depends on a series of relevance assessments made explicitly by the user.
• Assumes that the underlying need stays the same across all iterations.
Implicit RF:
• The IR system unobtrusively monitors search behaviour.
• Removes the need for the searcher to explicitly indicate which documents are relevant [2].
• A variety of "surrogate" measures have been employed: hyperlinks clicked, mouseovers, scrollbar activity [3, 4].
• These can be unreliable indicators, so this approach uses interaction with the full-text documents as implicit feedback.
Introduction
Approach:
• Searchers can interact with different representations of each document.
• Representations are of varying length, focused on the query, and logically connected at the interface to form an interactive search path.
• Develops a means of better representing searcher needs while minimizing the burden of explicitly reformulating queries.
Searcher Interaction
Document Representations:
• Focus on query-relevant parts of documents.
• Reduce the likelihood of selecting erroneous terms.
• The interface uses multiple document representations:
(1) Top-ranking sentences (TRS), from each of the top 30 documents retrieved
(2) Title
(3) Query-biased summary of the document
(4) Summary sentence
(5) Sentence in context
(6) Document itself
Searcher Interaction
Relevance Path:
• The further along the path a searcher travels, the more relevant the information in the path is considered to be.
• Paths can vary in length, and searchers can access the full text of the document from any step in the path.
Binary Voting Model
Features:
• Heuristics-based model which implicitly selects terms for query modification.
• Utilizes searcher interaction with document representations and relevance paths.
• A term present in a viewed representation receives a "vote"; when not present, it receives no vote.
• Winning terms are those with the most votes and hence best describe the information viewed by the searcher.
• The contribution of a vote is weighted based on the indicative worth of the representation:
– Title: 0.1
– TRS: 0.2
– Summary Sentence: 0.2
– Sentence in Context: 0.2
– Summary: 0.3
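As a rough illustration of this voting scheme, the indicative weights above can be applied per viewed representation. This is a sketch under my own naming, not the paper's implementation; the representation keys and the `cast_votes` helper are invented for the example.

```python
# Indicative weights per representation, as listed on the slide.
REP_WEIGHTS = {
    "title": 0.1,
    "trs": 0.2,                  # top-ranking sentence
    "summary_sentence": 0.2,
    "sentence_in_context": 0.2,
    "summary": 0.3,
}

def cast_votes(viewed, vocabulary):
    """viewed: list of (representation_name, set_of_terms) the searcher looked at.
    Returns accumulated weighted votes per vocabulary term."""
    votes = {t: 0.0 for t in vocabulary}
    for rep, terms in viewed:
        w = REP_WEIGHTS[rep]
        for t in terms:
            if t in votes:        # a term present in a viewed representation gets a vote,
                votes[t] += w     # weighted by the representation's indicative worth
    return votes

vocab = ["t1", "t2", "t5", "t7", "t9"]
seen = [("title", {"t1", "t2", "t7"}), ("summary", {"t2", "t5"})]
print(cast_votes(seen, vocab))   # t2 collects votes from both representations
```

A term seen in several representations (here t2) accumulates the most weight, which is exactly the "winning terms" intuition above.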
Binary Voting Model
Features:
• Each document is represented by a vector of length n, where n is the total number of unique non-stop-word terms.
• The list holding these terms is the vocabulary.
• A document × term matrix is built, of size (d+1) × n, where d is the number of documents the searcher has seen.
Binary Voting Model
Example – Simple Updating:
• The original query Q0 contains t5 and t9.
• The vector is normalised to give each term a value in [0, 1].
• Each occurring term is assigned a weight w_t, where:
– p: number of steps taken
– D: document
– t: term
– r: representation
– w_t,r: weight of t for the representation r
• The weight for each term is added to the appropriate term/document entry in the matrix.
Binary Voting Model
Example – Simple Updating:
• Initial state of the document × term matrix.
• The searcher expresses interest in the Title of document D1, which has a step weight of 0.1 and contains terms t1, t2 and t7.
• The matrix changes accordingly:
– Weights of terms t1, t2 and t7 are directly updated.
– t2 is now seen as being important to D1.
– t1 and t7 are seen as more important than before to D1.
– Scoring is cumulative.
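The update above can be sketched as follows. The matrix layout (one extra row beyond the d seen documents) and all names are my assumptions for illustration; the term indices and the 0.1 Title step weight follow the slide's example.

```python
# A minimal sketch of the (d+1) x n document x term matrix update from the example.
vocab = ["t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8", "t9"]   # n = 9 unique terms
d = 1                                                            # documents seen so far
matrix = [[0.0] * len(vocab) for _ in range(d + 1)]              # (d+1) x n, all zeros

def view_representation(doc_row, terms, step_weight):
    """Add the representation's step weight to each term it contains (cumulative)."""
    for t in terms:
        matrix[doc_row][vocab.index(t)] += step_weight

# Searcher views the Title of D1 (step weight 0.1), containing t1, t2 and t7
view_representation(1, ["t1", "t2", "t7"], 0.1)
# A later view of another representation simply adds weight: scoring is cumulative
view_representation(1, ["t1", "t7"], 0.2)
print(matrix[1])   # t1 and t7 now outweigh t2 for D1
```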
Binary Voting Model
Query Creation:
• A new query is computed for every 5 paths, by which point sufficient implicit evidence has been gathered from searcher interaction.
• To compute the new query, the average score for each term across all documents is calculated.
• Terms are ranked by this score.
• A high average score implies the term has appeared in many viewed representations and/or in those with high indicative weights.
• In the example, the top 6 terms chosen are t9, t5, t1, t7, t3 and t2. Although t2, t3 and t8 have the same score, t8 is not included, since t3 occurs more recently and t2 occurs in more than one document.
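A minimal sketch of this ranking step, assuming the matrix rows hold per-document term scores. The recency and document-spread tie-breaking described above is not reproduced; Python's stable sort simply keeps earlier vocabulary terms first on ties.

```python
# Average each term's score across document rows, rank descending, take the top 6.
def new_query(matrix, vocab, top_k=6):
    d = len(matrix)   # number of document rows
    avg = {t: sum(row[i] for row in matrix) / d for i, t in enumerate(vocab)}
    ranked = sorted(avg.items(), key=lambda kv: kv[1], reverse=True)
    return [t for t, _ in ranked][:top_k]

vocab = ["t1", "t2", "t3", "t5", "t7", "t9"]
matrix = [                            # illustrative scores, not the slide's exact matrix
    [0.3, 0.2, 0.1, 0.5, 0.2, 0.6],
    [0.2, 0.2, 0.3, 0.4, 0.3, 0.5],
]
print(new_query(matrix, vocab))
```

A high average falls out of exactly the two causes named above: many viewed representations containing the term, or representations with high indicative weights.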
Binary Voting Model
Tracking Information Need:
• A change in the information need can be measured by computing the change in the term ordering between the term lists at different steps, i.e., between queries qm and qm+1.
• Since the vocabulary is static, only the order of the terms in the list will change.
• The change is estimated with the Spearman rank-order correlation coefficient, which computes the difference between two lists of unique terms and stands in for the change in the searcher's information need.
• The correlation returns values between -1 and 1.
• A result closer to -1 means the term lists are dissimilar w.r.t. rank ordering.
• A result closer to 1 means the similarity between the term rankings increases.
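The tracking step can be illustrated with a small pure-Python Spearman computation over two rankings of the same vocabulary (for lists of unique terms, `scipy.stats.spearmanr` would give the same value); the term lists are invented for the example.

```python
# Spearman rank-order correlation for two orderings of the same unique terms:
# rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), d_i = rank difference of term i.
def spearman(list_a, list_b):
    n = len(list_a)
    rank_b = {t: i for i, t in enumerate(list_b)}
    d_sq = sum((i - rank_b[t]) ** 2 for i, t in enumerate(list_a))
    return 1 - (6 * d_sq) / (n * (n ** 2 - 1))

q_m  = ["t9", "t5", "t1", "t7", "t3", "t2"]
q_m1 = ["t9", "t5", "t7", "t1", "t3", "t2"]   # only t1 and t7 swapped
print(spearman(q_m, q_m1))                    # close to 1: little change in need
```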
Binary Voting Model
Strategies Implemented:
• Re-searching – a coefficient value < 0.2 indicates a large change in the term lists, i.e., they are substantially different w.r.t. rank ordering, reflecting a large change in the information need. A new search is run to retrieve a new set of documents.
• Reordering Documents – a result in the range [0.2, 0.5) indicates a weak correlation and consequently a less substantial change in the information need. The new query is used to reorder the top 30 retrieved documents, using best-match tf-idf scoring.
• Reordering TRS – a coefficient in the range [0.5, 0.8) indicates a strong correlation between the two term lists and a small change in the predicted information need. The new query is used to re-rank the TRS list.
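The three bands above reduce to a simple dispatch on the coefficient. The band edges come from the slide; the behaviour for a coefficient >= 0.8 is not stated here, so treating it as taking no action is an assumption.

```python
# Map the Spearman coefficient rho to the strategy band it falls in.
def choose_strategy(rho):
    if rho < 0.2:
        return "re-search"           # large change in need: retrieve a new document set
    elif rho < 0.5:
        return "reorder-documents"   # weak correlation: re-rank top 30 by best-match tf-idf
    elif rho < 0.8:
        return "reorder-trs"         # strong correlation: re-rank the TRS list
    return "no-action"               # assumption: need essentially unchanged

print(choose_strategy(0.94))   # prints "no-action"
```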
Evaluation
Manual Baseline System:
• Similar to the implicit feedback system, except that the searcher is solely responsible for adding new query terms and selecting which action is undertaken.
• The baseline interface has an additional component, a term/strategy control panel, which allows searchers to decide how best to use the query.
• This nature of the baseline allows evaluation of how well the implicit feedback system detected information needs from the perspective of the subject.
Evaluation
Experimental Subjects:
• Mainly undergraduate and postgraduate students of the University of Glasgow, divided into 2 groups: experienced and inexperienced.
Experimental Tasks:
• Each subject was asked to complete one search task from each of 4 categories:
– fact search (finding a person's mail address)
– background search (finding information on dust allergies)
– decision search (choosing the best financial instrument)
– search for a number of items (finding contact details of a number of employees)
• Search scenarios reflect real-life search situations and allow the subject to make personal assessments of what constitutes relevant material [5].
Results & Analysis
Hypotheses tested:
1. "The terms selected for implicit feedback represent the information needs of the subject (i.e., term selection support)"
2. "The implicit feedback approach estimates changes in the subject's information need"
3. "The implicit feedback approach makes search decisions that correspond closely with those of the subject"
Results & Analysis
Hypothesis 1 – Information need detection:
• The degree of term overlap is measured using the baseline system.
• The BVM runs in the background, invisible to the subject and not involved directly in any query modification decisions.
• High values of term overlap suggest that the terms chosen by the BVM are of good value and match the subject's own impression of the information need.
• Shows the average percentage of occasions where the top 6 terms chosen by the BVM included at least one of the subject's own terms.
• The difference between inexperienced and experienced subjects was not significant.
• Term overlap for experienced subjects was generally higher than that for inexperienced subjects.
• Shows the average number of query iterations and the average query length.
• An "iteration" is the use of a query for any action: reordering the TRS, reordering the documents, or re-searching the Web.
• The average query length is the number of terms in the new query that were not in the original query.
• Shows the average frequency of query manipulation for each subject performing different types of search.
• Query manipulation means adding and/or removing terms.
• Subjects added terms to queries more often for decision and background searches than for fact searches and searches for a number of items.
• Implicit feedback works better for decision search than for fact search.
Hypothesis 2 – Information need tracking:
• Shows the average number of actions carried out on each system across all search tasks.
• There were differences in the number of times the TRS were reordered.
• Experienced subjects make more use of unfamiliar actions.
• Both groups reorder the list of TRS more often than the implicit feedback system does, and reorder the documents less frequently.
• Reordering sentences/documents allows the system to reshape the information space.
• Shows the proportion of each type of action that was undone.
• A reversal indicates dissatisfaction with the outcome of the action or with the terms suggested.
• Subjects responded well to the search strategy employed on their behalf.
• Inexperienced subjects disliked the effects of the TRS reordering.
• Experienced subjects liked TRS re-ranking, but reversed the re-searching operation more often.
Hypothesis 3 – Relevance paths:
• Subjects were asked to rate the worth of following a relevance path from one document representation to another.
• The relevance paths were significantly more helpful, beneficial, appropriate and useful to experienced subjects than to inexperienced ones.
• The distance travelled along the relevance path was a good indicator of the relevance of the information in that path.
• Shows the most common path taken, the average number of steps followed, the average number of complete and partial paths, etc.
• Subjects used relevance paths consistently, although experienced subjects followed the paths further.
• Experienced subjects interacted more with the retrieved documents and more frequently used the document representations to view the full text of a document.
Conclusions
• The interface uses query-relevant document representations to facilitate access to potentially useful information and to allow searchers to closely examine results.
• This form of implicit feedback is at the extreme end of a spectrum of searcher support. It may be best used to make decisions in conjunction with, not in place of, the searcher.
• The approach has the potential to alleviate some of the problems inherent in explicit relevance feedback, while preserving many of its benefits.
• The success of the approach bodes well for the construction of effective implicit RF systems that work in concert with the searcher.
References
1. Salton, G., & Buckley, C. (1990). Improving retrieval performance by relevance feedback. Journal of the American Society for Information Science, 41(4), 288–297.
2. Morita, M., & Shinoda, Y. (1994). Information filtering based on user behavior analysis and best match text retrieval. In Proceedings of the 17th annual ACM SIGIR conference on research and development in information retrieval (pp. 272–281).
3. Lieberman, H. (1995). Letizia: an agent that assists web browsing. In Proceedings of the 14th international joint conference on artificial intelligence (pp. 475–480).
4. Joachims, T., Freitag, D., & Mitchell, T. (1997). WebWatcher: a tour guide for the world wide web. In Proceedings of the 16th joint international conference on artificial intelligence (pp. 770–775).
5. Ingwersen, P. (1992). Information retrieval interaction. London: Taylor Graham.
Questions
Thank You!