Collaborative Filtering: A Tutorial

Recommender Systems and Collaborative Filtering
Drawing heavily on online slide decks in this area, especially those of
William W. Cohen (CMU)
You visit an online bookshop ...
The shop has 100,000 books.
On its webpage, it will display 5 book covers, chosen especially for you.
Which ones will it display?
Why?
• The same goes for books, webpages, music, films, clothes, food,
everything ... This matters a great deal for e-commerce: stores see a
big financial uplift if they get recommendations ‘right’.
• What if the website is not selling you anything (e.g. research papers,
search, an interest-group forum)? Why does such a site need to make
good recommendations?
Basic approaches used for recommendation
• User-based
– Recommend things that were purchased or
viewed by users who are similar to you
• Item-based
– Recommend things that are similar to the items
that you have viewed/purchased before
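The slide does not say how item similarity is measured. One possibility, assumed here, is a content-based comparison: describe each item by a set of tags and score pairs of items by Jaccard similarity (shared tags divided by all tags). The catalogue, tags, and function names below are hypothetical, and similarity could just as well be computed from which users bought each item. This is a minimal sketch, not a production recommender.

```python
# Hypothetical catalogue: each item is described by a set of tags.
CATALOGUE = {
    "book1": {"python", "programming", "beginner"},
    "book2": {"python", "data", "statistics"},
    "book3": {"cooking", "italian"},
    "book4": {"programming", "java", "beginner"},
}


def jaccard(a, b):
    """Size of the intersection over size of the union of two tag sets."""
    return len(a & b) / len(a | b) if a | b else 0.0


def item_based(history, k=2):
    """Rank unseen items by their best similarity to any item already viewed."""
    scores = {}
    for item, tags in CATALOGUE.items():
        if item in history:
            continue  # don't recommend what the user has already seen
        scores[item] = max(jaccard(tags, CATALOGUE[h]) for h in history)
    return sorted(scores, key=scores.get, reverse=True)[:k]


print(item_based({"book1"}))  # ['book4', 'book2']
```

book4 shares two of four tags with book1 (0.5), book2 shares one of five (0.2), and the cookbook shares none.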
Amazon: ‘cold-start’ recommendation
Amazon: with minimal info about me via a cookie on this netbook
Amazon: when I logged in
User Profiles
For user-based recommendation, sites need some kind of user profile.
Similarity to other users is measured as a distance between profiles.
What do you think could be in a user profile?
Potential contents of user profiles
• Demographic data: age, gender, salary,
profession, country of residence, country of
origin, religion ...
• Site behaviour: Purchase history at the site;
viewing history, perhaps including time
spent on certain pages/items; clickstream
sequence
K-Nearest Neighbour based Recommendation
[Figure: users plotted by Age and Salary, with “You” marked]
(Think in terms of many dimensions, not just these two)
[Figure: the same plot, with your nearest neighbours highlighted]
Your neighbours: recommend things that they have viewed/purchased
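The k-nearest-neighbour idea can be sketched in a few lines. Everything here is a hypothetical illustration: the users, their (age, salary in thousands) profiles, and their purchases are made up, and Euclidean distance after min-max scaling is one common choice of distance, assumed rather than taken from the slides. Scaling matters because, unscaled, salary differences would swamp age differences.

```python
import math

# Hypothetical profiles: (age, salary in thousands) and the items each user bought.
USERS = {
    "alice": ((34, 52), {"b1", "b2"}),
    "bob":   ((61, 90), {"b3"}),
    "carol": ((29, 48), {"b2", "b4"}),
    "dave":  ((36, 55), {"b1", "b5"}),
}


def scale(vectors):
    """Min-max scale each dimension to [0, 1] so no single feature dominates."""
    lows = [min(col) for col in zip(*vectors)]
    highs = [max(col) for col in zip(*vectors)]
    return [tuple((x - lo) / (hi - lo) if hi > lo else 0.0
                  for x, lo, hi in zip(v, lows, highs)) for v in vectors]


def knn_recommend(you_profile, you_items, k=2):
    """Recommend items bought by the k users whose profiles are closest to yours."""
    names = list(USERS)
    vecs = scale([USERS[n][0] for n in names] + [you_profile])
    you_vec, others = vecs[-1], vecs[:-1]
    # Euclidean distance in the scaled profile space
    ranked = sorted(zip(names, others), key=lambda pair: math.dist(you_vec, pair[1]))
    recs = set()
    for name, _ in ranked[:k]:
        recs |= USERS[name][1]  # pool the neighbours' purchases
    return recs - you_items  # drop what you already have


print(sorted(knn_recommend((32, 50), {"b1"})))  # ['b2', 'b4']
```

With more profile dimensions (country, profession, viewing history) the same code applies once each feature is turned into a number.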
Collaborative Filtering: The main idea
People who purchased A also purchased B
This differs from nearest-neighbour recommendation: it can lead to
recommendations based on the behaviour of users who are very dissimilar
to you
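“People who purchased A also purchased B” can be computed by simply counting co-purchases, with no user profiles at all, which is why dissimilar users still contribute. In the minimal sketch below the baskets are hypothetical, and ranking by raw co-purchase count is an assumed simplification (real systems usually normalise for item popularity).

```python
from collections import Counter
from itertools import permutations

# Hypothetical purchase baskets, one set of items per customer.
BASKETS = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "D"},
    {"B", "C"},
]


def co_counts(baskets):
    """Count, for every ordered pair (x, y), how many customers bought both."""
    counts = Counter()
    for basket in baskets:
        for x, y in permutations(basket, 2):
            counts[(x, y)] += 1
    return counts


def also_bought(item, baskets, k=2):
    """Items most often bought together with `item` (ties broken alphabetically)."""
    counts = co_counts(baskets)
    pairs = [(y, n) for (x, y), n in counts.items() if x == item]
    return [y for y, _ in sorted(pairs, key=lambda p: (-p[1], p[0]))[:k]]


print(also_bought("A", BASKETS))  # ['B', 'C']
```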
Other forms/aspects of collaborative filtering
Why “collaborative”? Basically, someone else (in fact many someones) has
gone to the effort of viewing/filtering things and chosen the best few.
You get a recommendation of the best few without having to spend the
effort.
Widespread examples of CF: Twitter, PageRank, StumbleUpon, Digg,
Facebook (Likes), etc.
Another look at Google’s PageRank
(this bit adapted from slides of William Cohen, CMU)
[Figure: a small web graph of sites linking to one another]
Inlinks are “good” (recommendations).
Inlinks from a “good” site are better than inlinks from a “bad” site,
but inlinks from sites with many outlinks are not as “good”...
“Good” and “bad” are relative.
Google’s PageRank
(Brin & Page, http://www-db.stanford.edu/~backrub/google.html)
[Figure: the same small web graph]
Imagine a “pagehopper” that always either
• follows a random link, or
• jumps to a random page.
PageRank ranks pages by the amount of time the pagehopper spends on each
page. Equivalently, if there were many pagehoppers, PageRank is the
expected “crowd size” at each page.
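Rather than simulating a hopper, the long-run fraction of time it spends on each page can be computed by power iteration. In this minimal sketch, the probability d of following a link (0.85 is the commonly used value), the iteration count, the rule that a page with no outlinks links to every page, and the three-page graph are all assumptions for illustration. Note how a page's rank is split among its outlinks, which is why inlinks from sites with many outlinks count for less.

```python
def pagerank(links, d=0.85, iters=100):
    """Approximate PageRank as the pagehopper's long-run visit probabilities.

    links maps each page to the list of pages it links to. At each step the
    hopper follows a random outlink with probability d; otherwise it jumps to
    a page chosen uniformly at random. A page with no outlinks is treated as
    linking to every page.
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1 - d) / n for p in pages}  # random-jump contribution
        for p in pages:
            outs = links.get(p) or pages  # dangling page: spread rank evenly
            share = d * rank[p] / len(outs)  # more outlinks -> smaller share each
            for q in outs:
                new[q] += share
        rank = new
    return rank


ranks = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
for page, r in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(r, 3))
```

Here c collects links from both a and b and ends up ranked highest, while b receives only half of a's rank and ends up lowest.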
Collaborative Filtering and User Ratings
Many systems ask users to rate items, e.g. on a scale of 1 to 10. These
ratings let the system give more precise/accurate recommendations, using
a variety of sophisticated learning/prediction algorithms.
E.g. here are user ratings for some items (“?” means unrated):

         A   B   C   D   E   F   G   H
You      7   2   1   8   9   9   ?   ?
User1    1   8   8   2   ?   2   8   7
User2    6   3   3   7   6   5   3   1
User3    7   2   1   7   7   ?   3   1

How might a system predict your rating for items G and H?
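One simple answer, sketched here under stated assumptions rather than as the method the slides intend: measure how similar each user is to you with the Pearson correlation over the items you have both rated, then predict your rating as a similarity-weighted average of the ratings given by positively correlated users. The ratings come from the table above; the function names and the choice to ignore negatively correlated users are illustrative.

```python
import math

# Ratings from the table above; None marks an unrated item ("?").
ITEMS = "ABCDEFGH"
RATINGS = {
    "You":   [7, 2, 1, 8, 9, 9, None, None],
    "User1": [1, 8, 8, 2, None, 2, 8, 7],
    "User2": [6, 3, 3, 7, 6, 5, 3, 1],
    "User3": [7, 2, 1, 7, 7, None, 3, 1],
}


def pearson(u, v):
    """Pearson correlation computed over the items both users have rated."""
    pairs = [(a, b) for a, b in zip(u, v) if a is not None and b is not None]
    if len(pairs) < 2:
        return 0.0
    xs, ys = zip(*pairs)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def predict(target, item):
    """Similarity-weighted average of ratings from positively correlated users."""
    col = ITEMS.index(item)
    num = den = 0.0
    for name, ratings in RATINGS.items():
        if name == target or ratings[col] is None:
            continue
        w = pearson(RATINGS[target], ratings)
        if w > 0:  # skip users whose tastes run opposite to yours
            num += w * ratings[col]
            den += w
    return num / den if den else None


for item in "GH":
    print(item, round(predict("You", item), 2))  # G 3.0, then H 1.0
```

User1's ratings are anti-correlated with yours, so only User2 and User3 contribute, and both gave G a 3 and H a 1. A plain average over all three users would instead give G about 4.67 and H 3, pulled up by the user least like you.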
Collaborative Filtering Works
BellCore’s MovieRecommender
(Bell Communications Research)
• Participants sent email to videos@bellcore.com
• System replied with a list of 500 movies to rate on
a 1-10 scale (250 random, 250 popular)
– Only a subset needs to be rated
• New participant P sends in rated movies via email
• System compares ratings for P to ratings of (a
random sample of) previous users
• Most similar users are used to predict scores for
unrated movies
• System returns recommendations in an email
message.
Start your own business?
Bookmark-based recommendation
Display the right adverts on your site
End