Future Direction: Collaborative Filtering

Motivating Observations:
• Relevance feedback is useful, but expensive:
a) Humans often don’t have time to give positive/negative judgments on a long list of returned web pages to improve individual searches
b) Effort is used once, then wasted
• We want pooling and re-use of effort across individuals
cs466-25
Collaborative Filtering
Motivating Observations (continued):
• Relevance is not the same as quality
Queries: bootleg CD’s, NAFTA, Medical School Admissions, Simulated Annealing, REM, Alzheimer’s
Many web pages can be “about” a topic (specialized unit), but there are great differences in their quality of presentation, detail, professionalism, substance, etc.
Possible Solution:
Build a supervised learner for quality, NOT topic matter
Train on examples of high- and low-quality pages and learn the properties that distinguish them
One Solution:
Supervised Learner for “Quality” of a Page
Estimate P(Quality | Features) in addition to topic similarity
Salient features may include:
• # of links
• Size
• How often cited
• Variety of content
• “Top 5th of Web” awards, etc.
• Usage counter (hit count)
• Complexity of graphics (an indicator of quality??)
• Prior quality rating of server
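A minimal sketch of such a learner, assuming each page has already been reduced to a few numeric features scaled to [0, 1]. The feature names (`links`, `citations`, `hits`, `server_rating`) and the toy training labels are made up for illustration. The slides do not say which learner to use; plain logistic regression, fit by gradient descent, is just one reasonable way to estimate P(Quality | Features). The `combined_score` helper, which multiplies topic similarity by the quality probability, is likewise one hypothetical way to use quality “in addition to” topic similarity.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical feature order; each value is assumed pre-scaled to [0, 1].
FEATURES = ["links", "citations", "hits", "server_rating"]

# Made-up toy training data: (feature vector, label), where 1 = high quality.
TOY_EXAMPLES: List[Tuple[List[float], int]] = [
    ([0.9, 0.8, 0.7, 0.9], 1),
    ([0.8, 0.9, 0.6, 0.7], 1),
    ([0.7, 0.6, 0.9, 0.8], 1),
    ([0.2, 0.1, 0.3, 0.2], 0),
    ([0.1, 0.3, 0.2, 0.1], 0),
    ([0.3, 0.2, 0.1, 0.3], 0),
]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_quality(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Estimate P(Quality = high | features x) with a logistic model."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train_quality_model(examples: List[Tuple[List[float], int]],
                        lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit logistic-regression weights by batch gradient descent on log-loss."""
    n_features = len(examples[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    m = len(examples)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, label in examples:
            # For log-loss, d(loss)/d(z) is simply (prediction - label).
            err = predict_quality(weights, bias, x) - label
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
    return weights, bias


def combined_score(topic_similarity: float, p_quality: float) -> float:
    """One hypothetical way to rank by topic similarity AND estimated quality."""
    return topic_similarity * p_quality


weights, bias = train_quality_model(TOY_EXAMPLES)
p_good = predict_quality(weights, bias, [0.85, 0.8, 0.75, 0.8])
p_poor = predict_quality(weights, bias, [0.15, 0.2, 0.2, 0.15])
print(f"P(high quality): {p_good:.2f} vs {p_poor:.2f}")
```

In practice the hard part is the one the slide flags with “??”: deciding which features actually track quality, and collecting enough labeled examples to train on.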
Collaborative Filtering
Problem: Different humans have different profiles of relevance/quality
[Figure: for the query “Alzheimer’s disease”, different documents (web pages) are relevant and of high quality for different users: a care giver, a 6th grader, and a medical researcher.]
One Solution:
Pool collective wisdom and compute a weighted average of page rankings across multiple users in an affinity group (taking into account topic relevance, quality, and other intangibles)
Hypothesis: humans have a better idea than machines of what other humans will find interesting
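A minimal sketch of the pooling step, assuming each user in the affinity group supplies a ranking (1 = best) for some of the pages, and each user carries a weight (hypothetically, how much the group trusts that user’s judgments). The slide leaves the choice of weights open, so the values below are illustrative only. Each page’s weighted-average rank uses only the users who actually ranked that page, and pages are returned from best to worst.

```python
from typing import Dict, List, Tuple

Rankings = Dict[str, Dict[str, int]]  # user -> {page: rank}, where 1 = best


def pooled_ranking(rankings: Rankings,
                   user_weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (page, weighted-average rank) pairs, best (lowest) rank first."""
    weighted_sum: Dict[str, float] = {}
    weight_total: Dict[str, float] = {}
    for user, ranks in rankings.items():
        w = user_weights.get(user, 0.0)
        if w <= 0:
            continue  # users with no weight do not contribute
        for page, rank in ranks.items():
            weighted_sum[page] = weighted_sum.get(page, 0.0) + w * rank
            weight_total[page] = weight_total.get(page, 0.0) + w
    averages = {page: weighted_sum[page] / weight_total[page] for page in weighted_sum}
    # Sort by average rank; break ties by page id so the order is deterministic.
    return sorted(averages.items(), key=lambda item: (item[1], item[0]))


group = {
    "A": {"p1": 1, "p2": 2, "p3": 3},
    "B": {"p1": 2, "p2": 1, "p3": 3},
    "C": {"p2": 1, "p3": 2},
}
weights = {"A": 1.0, "B": 2.0, "C": 1.0}
for page, avg_rank in pooled_ranking(group, weights):
    print(page, round(avg_rank, 2))  # prints: p2 1.25 / p1 1.67 / p3 2.75
```

Because user B has twice the weight, B’s preference for p2 over p1 outweighs A’s opposite preference.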
Collaborative Filtering
Idea: instead of trying to model (often intangible) quality judgments, keep a record of previous human relevance and quality judgments
[Table of user rankings of web pages for a query (Query: Alzheimer’s): one axis lists users A through G, the other lists web pages (including pages 1059, 1060, and 1061); each cell holds that user’s ranking of that page.]
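Solution 1:
Identify individuals with similar tastes (a high Pearson correlation coefficient between their ranking judgments of the same pages)
instead of:
$$P(\text{relevant to me} \mid \text{Page}_i\ \text{content})$$
compute:
$$P(\text{relevant to me} \mid \text{relevant to you}) \cdot P(\text{relevant to you} \mid \text{Page}_i\ \text{content})$$
where the first factor is my similarity to you and the second factor is your judgments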
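A minimal sketch of Solution 1 in the style of user-based (neighborhood) collaborative filtering. It assumes judgments are stored as relevance scores where higher means more relevant (rankings like those in the table would need converting first), and all user and page names are hypothetical. `pearson` measures “my similarity to you” over the pages we both judged; `predict_relevance` then averages other users’ judgments of a page, weighted by that similarity, which plays the role of the product above summed over like-minded users. Ignoring users with non-positive correlation is a common design choice, not something the slide specifies.

```python
import math
from typing import Dict, Optional

Ratings = Dict[str, Dict[str, float]]  # user -> {page: relevance score}, higher = more relevant


def pearson(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Pearson correlation of two users' scores over the pages both have judged."""
    common = [page for page in a if page in b]
    n = len(common)
    if n < 2:
        return 0.0  # not enough overlap to measure similarity
    mean_a = sum(a[p] for p in common) / n
    mean_b = sum(b[p] for p in common) / n
    cov = sum((a[p] - mean_a) * (b[p] - mean_b) for p in common)
    var_a = sum((a[p] - mean_a) ** 2 for p in common)
    var_b = sum((b[p] - mean_b) ** 2 for p in common)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant scorer carries no similarity signal
    return cov / math.sqrt(var_a * var_b)


def predict_relevance(ratings: Ratings, me: str, page: str) -> Optional[float]:
    """Similarity-weighted average of like-minded users' scores for `page`."""
    numerator = 0.0
    denominator = 0.0
    for other, their_scores in ratings.items():
        if other == me or page not in their_scores:
            continue
        similarity = pearson(ratings[me], their_scores)  # "my similarity to you"
        if similarity <= 0:
            continue  # only borrow judgments from users with similar tastes
        numerator += similarity * their_scores[page]  # weighted "your judgments"
        denominator += similarity
    return numerator / denominator if denominator > 0 else None


ratings: Ratings = {
    "me":  {"p1": 5, "p2": 1, "p3": 4},
    "ann": {"p1": 5, "p2": 2, "p3": 4, "p4": 5},
    "bob": {"p1": 1, "p2": 5, "p3": 2, "p4": 1},
}
print(round(pearson(ratings["me"], ratings["ann"]), 3))
print(predict_relevance(ratings, "me", "p4"))
```

Here “ann” agrees closely with “me” while “bob” is perfectly anti-correlated, so the prediction for the unseen page p4 comes entirely from ann’s judgment.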
Solution 2:
Model group profiles for relevance judgments (e.g., junior high school students vs. medical researchers)
compute:
$$P(\text{relevant to me} \mid \text{relevant to group}_g) \cdot P(\text{relevant to group}_g \mid \text{Page}_i\ \text{content})$$
where the first factor is my similarity to the group and the second factor is the group’s collective (average) relevance judgments
Supervised Learning
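A minimal sketch of Solution 2 under simple assumptions: group membership is already known (the `junior_high` and `medical_researchers` groups and their members are hypothetical), a group’s profile is the average of its members’ relevance scores per page, and my similarity to a group is the Pearson correlation between my scores and that profile. The slide envisions learning these quantities with supervised learning; plain averaging and correlation are stand-ins chosen here to keep the example short. The prediction mixes group profiles, weighted by my (positive) similarity to each group.

```python
import math
from typing import Dict, Optional

Scores = Dict[str, float]  # page -> relevance score, higher = more relevant


def pearson(a: Scores, b: Scores) -> float:
    """Pearson correlation over the pages both score vectors contain."""
    common = [page for page in a if page in b]
    n = len(common)
    if n < 2:
        return 0.0
    mean_a = sum(a[p] for p in common) / n
    mean_b = sum(b[p] for p in common) / n
    cov = sum((a[p] - mean_a) * (b[p] - mean_b) for p in common)
    var_a = sum((a[p] - mean_a) ** 2 for p in common)
    var_b = sum((b[p] - mean_b) ** 2 for p in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def group_profile(members: Dict[str, Scores]) -> Scores:
    """The group's collective (average) relevance judgment for each page."""
    totals: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for scores in members.values():
        for page, score in scores.items():
            totals[page] = totals.get(page, 0.0) + score
            counts[page] = counts.get(page, 0) + 1
    return {page: totals[page] / counts[page] for page in totals}


def predict_from_groups(me: Scores, groups: Dict[str, Dict[str, Scores]],
                        page: str) -> Optional[float]:
    """Mix group profiles for `page`, weighted by my similarity to each group."""
    numerator = 0.0
    denominator = 0.0
    for members in groups.values():
        profile = group_profile(members)
        if page not in profile:
            continue
        similarity = pearson(me, profile)  # "my similarity to the group"
        if similarity <= 0:
            continue
        numerator += similarity * profile[page]  # group's average judgment
        denominator += similarity
    return numerator / denominator if denominator > 0 else None


groups = {
    "junior_high": {"s1": {"p1": 5, "p2": 1, "p3": 2}, "s2": {"p1": 4, "p2": 2, "p3": 1}},
    "medical_researchers": {"r1": {"p1": 1, "p2": 5, "p3": 4}, "r2": {"p1": 2, "p2": 4, "p3": 5}},
}
me = {"p1": 1, "p2": 5}
print(predict_from_groups(me, groups, "p3"))
```

Compared with Solution 1, a user only needs to be matched against a handful of group profiles rather than every other individual, and each profile averages away some individual noise.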