Recommender Systems

Filtering and Recommender Systems
Content-based and Collaborative
Personalization
• Recommenders are instances of
personalization software.
• Personalization concerns adapting to the
individual needs, interests, and preferences of
each user.
• Includes:
– Recommending
– Filtering
– Predicting (e.g. form or calendar appt. completion)
• From a business perspective, it is viewed as
part of Customer Relationship Management
(CRM).
Feedback &
Prediction/Recommendation
• Traditional IR has a single user, probably
working in single-shot mode
– Relevance feedback…
• Web search engines have:
– Users working continually
• User profiling (you know this one)
– Profile is a “model” of the user
• (and also relevance feedback)
– Many users
• Collaborative filtering
– Propagate user preferences to other
users…
Recommender Systems in Use
• Systems for recommending items (e.g. books,
movies, CDs, web pages, newsgroup
messages) to users based on examples of
their preferences.
• Many on-line stores provide
recommendations (e.g. Amazon, CDNow).
• Recommenders have been shown to
substantially increase sales at on-line stores.
Feedback Detection
Non-Intrusive
– Click certain pages in a certain order while
ignoring most pages.
– Read some clicked pages longer than other
clicked pages.
– Save/print certain clicked pages.
– Follow some links in clicked pages to reach
more pages.
– Buy items/put them in wish-lists/shopping carts.
Intrusive
– Explicitly ask users to rate items/pages.
Justifying Recommendations
• Recommendation systems must justify their
recommendations
– Even if the justification is bogus
– For search engines, the “justifications” are the page
synopses
• Some recommendation algorithms are better at
providing human-understandable justifications than
others
– Content-based ones can justify in terms of classifier
features
– Collaborative ones are hard-pressed to say anything more
than “people like you seem to like this stuff”
– In general, giving good justifications is important
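Content-based vs. Collaborative Recommendation
[Figure, panel "Content/Profile-based": books the user rated (Red Mars, Foundation, Jurassic Park, Lost World, 2001) go into Machine Learning, which builds a User Profile; the profile is used to recommend Neuromancer, 2010, and Difference Engine. Needs description of items…]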
[Figure, panel "Collaborative Filtering": the Active User's ratings over items A…Z are compared by Correlation Match against the rating vectors in the User Database; recommendations (here, item C) are extracted from the best-matching users. Needs only ratings from other users.]
Content-Based Recommending
• Recommendations are based on
information on the content of items
rather than on other users’ opinions.
• Uses machine learning algorithms to
induce a profile of the user’s preferences
from examples based on a featural
description of content.
• Lots of systems
Adapting Naïve Bayes idea for Book
Recommendation
• Vector of Bags model
– E.g. books have several different fields that are all text
• Authors, description, …
• A word appearing in one field is different from the same word appearing in another
– Want to keep each bag different: a vector of S bags, with conditional probabilities for each word w.r.t. each class and bag
$$P(c_j \mid \text{Book}) = \frac{P(c_j)}{P(\text{Book})} \prod_{m=1}^{S} \prod_{i=1}^{|d_m|} P(a_{mi} \mid c_j, s_m)$$
where s_m is the m-th bag (slot), d_m is the text in that bag, and a_{mi} is the i-th word of d_m
• Can give a profile of a user in terms of the words that are most predictive of what they like
– Strength of a keyword: log[P(w|rel)/P(w|~rel)]
– We can summarize a user’s profile in terms of the words that have strength above some threshold.
– Related to mutual information
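A minimal sketch of the vector-of-bags classifier and the keyword-strength score, assuming Laplace smoothing (the slide does not specify any), two hypothetical class labels "rel"/"nonrel", and made-up book data. P(Book) is dropped because it is the same for every class and does not change which class wins.

```python
import math
from collections import Counter


class BagNaiveBayes:
    """Naive Bayes over a vector of bags: one word distribution per (class, bag)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumed, not on the slide)

    def fit(self, books, labels):
        # books: list of dicts {bag name: list of words}; labels: one class per book
        self.classes = sorted(set(labels))
        self.prior = {c: labels.count(c) / len(labels) for c in self.classes}
        self.counts = {c: {} for c in self.classes}  # counts[c][bag] -> Counter
        self.vocab = {}                               # vocab[bag] -> set of words
        for book, c in zip(books, labels):
            for bag, words in book.items():
                self.counts[c].setdefault(bag, Counter()).update(words)
                self.vocab.setdefault(bag, set()).update(words)
        return self

    def word_prob(self, word, c, bag):
        # P(a | c_j, s_m): smoothed frequency of `word` within bag `bag` of class c
        bag_counts = self.counts[c].get(bag, Counter())
        v = max(1, len(self.vocab.get(bag, ())))
        return (bag_counts[word] + self.alpha) / (sum(bag_counts.values()) + self.alpha * v)

    def log_score(self, book, c):
        # log P(c_j) + sum over bags m and words i of log P(a_mi | c_j, s_m);
        # P(Book) is identical for all classes, so it is omitted
        score = math.log(self.prior[c])
        for bag, words in book.items():
            score += sum(math.log(self.word_prob(w, c, bag)) for w in words)
        return score

    def predict(self, book):
        return max(self.classes, key=lambda c: self.log_score(book, c))

    def strength(self, word, bag, rel="rel", nonrel="nonrel"):
        # keyword strength log[P(w|rel) / P(w|~rel)], computed within one bag
        return math.log(self.word_prob(word, rel, bag) / self.word_prob(word, nonrel, bag))


books = [
    {"authors": ["asimov"], "description": ["robots", "galaxy"]},
    {"authors": ["crichton"], "description": ["dinosaurs", "island"]},
    {"authors": ["asimov"], "description": ["empire", "galaxy"]},
]
labels = ["rel", "nonrel", "rel"]
nb = BagNaiveBayes().fit(books, labels)
print(nb.predict({"authors": ["asimov"], "description": ["galaxy"]}))  # rel
print(nb.strength("asimov", "authors") > 0, nb.strength("asimov", "description") < 0)  # True True
```

The last line shows why the bags are kept separate: "asimov" is a strong positive keyword in the authors bag but not in the description bag, where it never appears.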
Item-User Matrix
• The input to the collaborative filtering
algorithm is an m×n matrix where rows are
items and columns are users
– Sort of like a term-document matrix (items are terms
and users are documents)
– Can do vector similarity between users
– The Pearson correlation coefficient is a variation
• And find who the most similar users are
– Can do scalar clusters over items, etc.
• And find which items are most correlated
Think: users ↔ docs, items ↔ keywords
• Can think of users as vectors in the space of
items (or vice versa)
A Collaborative Filtering Method
(think kNN)
• Weight all users with respect to similarity with the
active user.
– How to measure similarity?
• Could use cosine similarity; normally the Pearson coefficient is
used
• Select a subset of the users (neighbors) to use as
predictors.
• Normalize ratings and compute a prediction from a
weighted combination of the selected neighbors’
ratings.
• Present items with highest predicted ratings as
recommendations.
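The four steps above can be sketched end to end. This is a minimal illustration, assuming ratings are stored as a dict of dicts (user → item → rating), Pearson similarity over co-rated items, top-k neighbors, and mean-centered weighted prediction; the function names and the toy data are hypothetical.

```python
import math


def pearson(ra, ru):
    """Pearson correlation over the items both users rated (0.0 if undefined)."""
    common = [i for i in ra if i in ru]
    if len(common) < 2:
        return 0.0
    ma = sum(ra[i] for i in common) / len(common)
    mu = sum(ru[i] for i in common) / len(common)
    cov = sum((ra[i] - ma) * (ru[i] - mu) for i in common)
    sa = math.sqrt(sum((ra[i] - ma) ** 2 for i in common))
    su = math.sqrt(sum((ru[i] - mu) ** 2 for i in common))
    return cov / (sa * su) if sa and su else 0.0


def recommend(ratings, active, k=2, top_n=3):
    """kNN collaborative filtering: returns up to top_n unrated items for `active`."""
    ra = ratings[active]
    # 1. Weight all other users by similarity with the active user
    weights = {u: pearson(ra, r) for u, r in ratings.items() if u != active}
    # 2. Select the k most similar users as neighbors (predictors)
    neighbors = sorted(weights, key=weights.get, reverse=True)[:k]
    # 3. Normalize (center on each user's overall mean) and predict from a
    #    weighted combination of the neighbors' ratings
    mean = {u: sum(r.values()) / len(r) for u, r in ratings.items()}
    preds = {}
    for item in {i for u in neighbors for i in ratings[u]} - set(ra):
        raters = [u for u in neighbors if item in ratings[u]]
        norm = sum(abs(weights[u]) for u in raters)
        if norm > 0:
            dev = sum(weights[u] * (ratings[u][item] - mean[u]) for u in raters)
            preds[item] = mean[active] + dev / norm
    # 4. Present the items with the highest predicted ratings
    return sorted(preds, key=preds.get, reverse=True)[:top_n]


ratings = {
    "alice": {"A": 9, "B": 3, "Z": 5},
    "bob": {"A": 10, "B": 4, "C": 8, "Z": 1},
    "carol": {"A": 2, "B": 8, "C": 2, "Z": 6},
    "dave": {"A": 8, "B": 2, "D": 9, "Z": 4},
}
print(recommend(ratings, "alice"))  # ['D', 'C']
```

Here carol's ratings run exactly opposite to alice's (correlation -1), so she is never picked as a neighbor; dave and bob are.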
Finding User Similarity with the Pearson
Correlation Coefficient
• Typically use the Pearson correlation coefficient
between ratings for the active user, a, and another
user, u.
$$c_{a,u} = \frac{\mathrm{covar}(r_a, r_u)}{\sigma_{r_a}\,\sigma_{r_u}}$$
r_a and r_u are the ratings vectors for the m items rated by
both a and u; r_{i,j} is user i’s rating for item j.
$$\mathrm{covar}(r_a, r_u) = \frac{\sum_{i=1}^{m} (r_{a,i} - \bar{r}_a)(r_{u,i} - \bar{r}_u)}{m}$$
$$\bar{r}_x = \frac{\sum_{i=1}^{m} r_{x,i}}{m} \qquad \sigma_{r_x} = \sqrt{\frac{\sum_{i=1}^{m} (r_{x,i} - \bar{r}_x)^2}{m}}$$
Pearson Correlation Coefficient is
the same as vector similarity over
centered ratings vectors
• It is easy to check for yourself that the
Pearson correlation coefficient is the
same as the cosine of the angle
between the centered ratings vectors
– Covariance = dot product of the centered vectors, divided by m
– Sqrt(variance of each vector) = norm of
each centered vector, divided by √m
– The 1/m factors cancel between numerator and denominator
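To check the claim numerically, here is a minimal sketch: `pearson` follows the slide's formulas literally (including the 1/m factors) and `centered_cosine` takes the cosine of the mean-centered vectors. The function names and the sample ratings are made up for illustration.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def pearson(ra, ru):
    """c_{a,u} = covar(r_a, r_u) / (sigma_a * sigma_u), with 1/m normalization."""
    m = len(ra)
    ma, mu = mean(ra), mean(ru)
    covar = sum((x - ma) * (y - mu) for x, y in zip(ra, ru)) / m
    sa = math.sqrt(sum((x - ma) ** 2 for x in ra) / m)
    su = math.sqrt(sum((y - mu) ** 2 for y in ru) / m)
    return covar / (sa * su)


def centered_cosine(ra, ru):
    """Cosine of the angle between the mean-centered rating vectors."""
    ca = [x - mean(ra) for x in ra]
    cu = [y - mean(ru) for y in ru]
    dot = sum(x * y for x, y in zip(ca, cu))
    return dot / (math.sqrt(sum(x * x for x in ca)) * math.sqrt(sum(y * y for y in cu)))


# ratings of active user a and user u on the same m = 4 co-rated items
r_a = [9, 3, 5, 7]
r_u = [10, 4, 1, 8]
print(pearson(r_a, r_u), centered_cosine(r_a, r_u))
```

The two printed values agree up to floating-point rounding.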
Neighbor Selection
• For a given active user, a, select correlated
users to serve as source of predictions.
• Standard approach is to use the most similar
k users, u, based on similarity weights, w_{a,u}
• Alternate approach is to include all users
whose similarity weight is above a given
threshold.
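Both neighbor-selection rules are one-liners over a dict of similarity weights w_{a,u}. A minimal sketch; the function names and the weights below are invented for illustration.

```python
def top_k_neighbors(weights, k):
    """Standard approach: the k users with the largest similarity weight w_{a,u}."""
    return sorted(weights, key=weights.get, reverse=True)[:k]


def threshold_neighbors(weights, threshold):
    """Alternative: every user whose similarity weight is above the threshold."""
    return [u for u, w in weights.items() if w > threshold]


# hypothetical similarity weights w_{a,u} of four users to the active user a
w = {"u1": 0.9, "u2": 0.2, "u3": 0.75, "u4": -0.4}
print(top_k_neighbors(w, 2))          # ['u1', 'u3']
print(threshold_neighbors(w, 0.1))    # ['u1', 'u2', 'u3']
```

The trade-off: top-k fixes how many predictors are used regardless of how weak they are, while a threshold fixes their minimum quality but lets the neighborhood size vary from user to user.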
Rating Prediction
• Predict a rating, p_{a,i}, for each item i, for the active user,
a, by using the k selected neighbor users,
u ∈ {1,2,…,k}.
• To account for users’ different rating levels, base
predictions on differences from a user’s average
rating.
• Weight users’ rating contributions by their
similarity to the active user.
$$p_{a,i} = \bar{r}_a + \frac{\sum_{u=1}^{k} w_{a,u}\,(r_{u,i} - \bar{r}_u)}{\sum_{u=1}^{k} \lvert w_{a,u} \rvert}$$
r_{i,j} is user i’s rating for item j; \bar{r}_a and \bar{r}_u are the average ratings of a and u.
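The prediction formula transcribes directly. A minimal sketch, assuming each neighbor is given as a (w_{a,u}, ratings) pair and that neighbors who have not rated item i are left out of both sums (the slide does not say how to handle them); `predict_rating` is a hypothetical name.

```python
def predict_rating(active_mean, neighbors, item):
    """p_{a,i} = mean(r_a) + sum_u w_au (r_ui - mean(r_u)) / sum_u |w_au|.

    neighbors: list of (w_au, ratings_u) pairs, ratings_u a dict item -> rating.
    Neighbors who did not rate `item` are skipped (an assumption).
    """
    num = den = 0.0
    for w, ratings in neighbors:
        if item not in ratings:
            continue
        mean_u = sum(ratings.values()) / len(ratings)
        num += w * (ratings[item] - mean_u)  # weighted deviation from u's average
        den += abs(w)
    # with no usable neighbors, fall back to the active user's average rating
    return active_mean if den == 0 else active_mean + num / den


neighbors = [
    (0.75, {"A": 10, "B": 4, "C": 8, "Z": 2}),  # mean 6, rated C two above it
    (0.25, {"A": 6, "B": 4, "C": 2}),           # mean 4, rated C two below it
]
print(predict_rating(5.0, neighbors, "C"))  # 6.0
```

Because deviations rather than raw ratings are averaged, a generous rater's 8 and a harsh rater's 2 can carry the same signal.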
Significance Weighting
• Important not to trust correlations based
on very few co-rated items.
• Include significance weights, s_{a,u}, based
on number of co-rated items, m.
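$$w_{a,u} = s_{a,u}\,c_{a,u}$$
$$s_{a,u} = \begin{cases} 1 & \text{if } m > 50 \\ \dfrac{m}{50} & \text{if } m \le 50 \end{cases}$$
$$c_{a,u} = \frac{\mathrm{covar}(r_a, r_u)}{\sigma_{r_a}\,\sigma_{r_u}}$$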
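Significance weighting only rescales the correlation. This sketch hard-codes the slide's cutoff of 50 as a default; the parameter name `cutoff` and the function names are ours.

```python
def significance(m, cutoff=50):
    """s_{a,u}: 1 if there are more than `cutoff` co-rated items, else m / cutoff."""
    return 1.0 if m > cutoff else m / cutoff


def weight(c_au, m, cutoff=50):
    """w_{a,u} = s_{a,u} * c_{a,u}: shrink correlations built on few co-rated items."""
    return significance(m, cutoff) * c_au


print(weight(0.8, 25))   # 0.4  (strong correlation, but only 25 co-rated items)
print(weight(0.6, 120))  # 0.6  (enough co-rated items, left unchanged)
```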
Item-centered Collaborative
Filtering
• Starting with a “centered” user-item matrix, we found the k nearest
users to the active user and used them to recommend unrated
items
• We can also use the centered U-I matrix to compute item-item
correlations by starting with (U-I)ᵀ × (U-I), and doing (a) association
clusters and (b) scalar clusters
• This gives us, for each item, its k nearest items
– Now, given a new item I_n to be rated for a user U, we first find the k items
closest to I_n (among those U has rated) and take U’s (weighted) average
rating of them as the prediction of U’s rating of I_n
– An advantage of this method over the “user-centered” idea is that the
justifications for the recommendations can be more meaningful (you can tell
the user that we are recommending I_n because she rated the items in its
association cluster highly)
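A minimal item-centered sketch, assuming the "centered" matrix subtracts each user's mean rating and that item-item similarity is the cosine of two centered item columns over users who rated both (one concrete way to get scalar-cluster-style correlations). The function names and data are hypothetical.

```python
import math


def item_similarity(ratings, i, j):
    """Cosine between items i and j over users who rated both, using each
    user's ratings centered on that user's mean (the centered U-I matrix)."""
    num = ni = nj = 0.0
    for r in ratings.values():
        if i in r and j in r:
            mu = sum(r.values()) / len(r)
            di, dj = r[i] - mu, r[j] - mu
            num += di * dj
            ni += di * di
            nj += dj * dj
    return num / math.sqrt(ni * nj) if ni and nj else 0.0


def predict_item_based(ratings, user, new_item, k=2):
    """Weighted average of `user`'s ratings on the k items closest to new_item.
    Returns (prediction or None, nearest items); the latter double as a justification."""
    rated = ratings[user]
    sims = {j: item_similarity(ratings, new_item, j) for j in rated if j != new_item}
    nearest = sorted(sims, key=sims.get, reverse=True)[:k]
    positive = [j for j in nearest if sims[j] > 0]  # keep a proper weighted average
    den = sum(sims[j] for j in positive)
    pred = sum(sims[j] * rated[j] for j in positive) / den if den else None
    return pred, nearest


ratings = {
    "u1": {"A": 5, "B": 4, "C": 1},
    "u2": {"A": 4, "B": 5, "C": 2},
    "u3": {"A": 1, "B": 2, "C": 5},
    "U": {"B": 5, "C": 1},
}
pred, nearest = predict_item_based(ratings, "U", "A")
print(pred, nearest)
```

A rises and falls with B across users and opposes C, so the prediction comes almost entirely from U's 5 for B; "we recommend A because you rated B highly" is the justification the slide has in mind.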
LSI-style techniques for
collaborative filtering
• The Netflix Prize was won by an approach
that relied heavily on “latent factor analysis” (aka LSI) on the
u-i matrix, so that both users and items are seen
as vectors in a k-dimensional factor space
• One technical difficulty in doing LSI on the u-i matrix
is that it has many “null” values
– The d-t matrix is sparse, and that is good. The u-i
matrix has null values, and that is bad
(because null != 0)
• Two approaches:
– “Fill in” the missing ratings (the “imputation”
method) so we have no more null values
– Compute the distance between vectors only in
terms of their common non-null dimensions
• Problem: overfitting. Solution:
regularization, i.e. penalize “large factor”
values.
q_i: item i in factor space
p_u: user u in factor space
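A minimal sketch of the second approach plus regularization, named plainly: Funk-style matrix factorization trained by stochastic gradient descent. It fits r_ui ≈ q_i · p_u using only the known (non-null) ratings and minimizes squared error plus λ(‖q_i‖² + ‖p_u‖²). It is not the Netflix Prize system (which blended many models and used bias terms), and the hyperparameter values and data are arbitrary choices for illustration.

```python
import random


def factorize(ratings, k=2, lam=0.05, lr=0.02, epochs=1000, seed=0):
    """ratings: list of (user, item, rating) triples for the known cells only.
    Learns factor vectors p[u], q[i] so that r_ui ~ q[i] . p[u]."""
    rng = random.Random(seed)
    p = {u: [rng.gauss(0, 0.1) for _ in range(k)] for u, _, _ in ratings}
    q = {i: [rng.gauss(0, 0.1) for _ in range(k)] for _, i, _ in ratings}
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - sum(a * b for a, b in zip(q[i], p[u]))
            for f in range(k):
                pu, qi = p[u][f], q[i][f]
                # gradient step on err^2 + lam * (|q_i|^2 + |p_u|^2)
                p[u][f] += lr * (err * qi - lam * pu)
                q[i][f] += lr * (err * pu - lam * qi)
    return p, q


def predict(p, q, u, i):
    return sum(a * b for a, b in zip(q[i], p[u]))


ratings = [
    ("ann", "m1", 5), ("ann", "m2", 5), ("ann", "m3", 1),
    ("bob", "m1", 5), ("bob", "m3", 1), ("bob", "m4", 1),
    ("cy", "m2", 1), ("cy", "m3", 5), ("cy", "m4", 5),
    ("dee", "m1", 1), ("dee", "m4", 5),
]
p, q = factorize(ratings)
print(predict(p, q, "bob", "m2"), predict(p, q, "ann", "m4"))  # null cells: high, low
```

Missing cells never enter the loss, so "null != 0" is respected; λ keeps the factors from growing just to memorize the few known ratings.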
Problems with Collaborative Filtering
• Cold Start: There needs to be enough other users
already in the system to find a match.
• Sparsity: If there are many items to be recommended,
even if there are many users, the user/ratings matrix is
sparse, and it is hard to find users that have rated the
same items.
• First Rater: Cannot recommend an item that has not
been previously rated.
– New items
– Esoteric items
• Popularity Bias: Cannot recommend items to
someone with unique tastes.
– Tends to recommend popular items.
• WHAT DO YOU MEAN YOU DON’T CARE FOR BRITNEY
SPEARS YOU DUNDERHEAD? #$%$%$&^
Advantages of Content-Based
Approach
• No need for data on other users.
– No cold-start or sparsity problems.
• Able to recommend to users with unique tastes.
• Able to recommend new and unpopular items
– No first-rater problem.
• Can provide explanations of recommended items
by listing content-features that caused an item to
be recommended.
• Well-known technology: the entire field of
classification learning is at (y)our disposal!
Disadvantages of Content-Based
Method
• Requires content that can be encoded as
meaningful features.
• Users’ tastes must be represented as a
learnable function of these content features.
• Unable to exploit quality judgments of other
users.
– Unless these are somehow included in the content
features.
Content-Boosted CF - I
[Figure: the user-ratings vector (user-rated items) supplies training examples to a content-based predictor, which fills in the unrated items to give a pseudo user-ratings vector (items with predicted ratings).]
Content-Boosted CF - II
[Figure: user ratings matrix → content-based predictor → pseudo user ratings matrix]
• Compute the pseudo user ratings matrix
– Full matrix that approximates the actual full user ratings matrix
• Perform CF
– Using Pearson correlation between pseudo user-rating vectors
• This works better than either pure CF or pure content-based filtering!
Unlabeled examples help only when they are drawn
from the same distribution as the labeled ones.
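A minimal sketch of content-boosted CF. The slides do not say which content-based predictor to use, so the one here is a hypothetical stand-in (a feature-overlap-weighted average of the user's own ratings, using Jaccard similarity of item feature sets); any content-based classifier could take its place. The data are made up.

```python
import math


def jaccard(a, b):
    """Overlap between two feature sets."""
    return len(a & b) / len(a | b) if a | b else 0.0


def content_predict(user_ratings, features, item):
    """Stand-in content-based predictor (hypothetical): the user's own ratings,
    weighted by feature overlap between `item` and each rated item."""
    sims = {j: jaccard(features[item], features[j]) for j in user_ratings}
    total = sum(sims.values())
    if total == 0:  # no overlapping features: fall back to the user's mean
        return sum(user_ratings.values()) / len(user_ratings)
    return sum(s * user_ratings[j] for j, s in sims.items()) / total


def pseudo_ratings(ratings, features):
    """Dense pseudo user-ratings matrix: real ratings where they exist,
    content-based predictions for every unrated item."""
    return {u: {i: r[i] if i in r else content_predict(r, features, i)
                for i in sorted(features)}
            for u, r in ratings.items()}


def pearson(x, y):
    """Pearson correlation between two equal-length vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


features = {
    "m1": {"scifi", "space"}, "m2": {"scifi", "robots"},
    "m3": {"romance", "drama"}, "m4": {"romance", "comedy"},
}
ratings = {"ann": {"m1": 5, "m3": 1}, "bob": {"m2": 5, "m4": 2}}  # no co-rated items
P = pseudo_ratings(ratings, features)
sim = pearson(list(P["ann"].values()), list(P["bob"].values()))
print(round(sim, 3))  # 1.0
```

Plain CF cannot compare ann and bob at all because they share no rated items; on the pseudo vectors their shared taste (sci-fi over romance) shows up as a strong correlation, which is the sparsity relief the slide describes.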
Why can’t the pseudo ratings be
used to help content-based filtering?
• How about using the pseudo ratings to improve a content-based filter
itself? (Or: how can access to unlabelled examples improve accuracy?)
– Learn a naive Bayes classifier (NBC) C0 using the few items for which we have user
ratings
– Use C0 to predict the ratings for the rest of the items
– Loop
• Learn a new classifier C1 using all the ratings (real and predicted)
• Use C1 to (re-)predict the ratings for all the unknown items
– Until no change in ratings
• With a small change, this actually works in finding a better classifier!
– Change: keep the class posterior prediction (rather than just the max class)
• This means that each (unlabelled) entity could belong to multiple classes, with
fractional membership in each
• We weight the counts by the membership fractions
– E.g. P(A=v|c) = sum of class weights of all examples in c that have A=v, divided by sum
of class weights of all examples in c
• This is called expectation maximization (EM)
– Very useful on the web, where you have tons of data but very little of it is
labelled
– Reminds you of K-means, doesn’t it?
(No coincidence: K-means is “hard-assignment” EM)
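A minimal sketch of the loop with the "small change", for items described by categorical attributes. Assumptions not on the slide: Laplace smoothing, every item has every attribute, and a fixed number of iterations instead of "until no change". Function names and data are hypothetical.

```python
import math


def m_step(items, memberships, classes, alpha=1.0):
    """Naive Bayes parameters from (possibly fractional) class memberships:
    P(A=v|c) = c-weight of items with A=v / c-weight of all items (smoothed).
    Assumes every item has every attribute."""
    values = {}
    for x in items:
        for a, v in x.items():
            values.setdefault(a, set()).add(v)
    prior, cond = {}, {}
    for c in classes:
        w_c = sum(m[c] for m in memberships)
        prior[c] = (w_c + alpha) / (len(items) + alpha * len(classes))
        cond[c] = {}
        for a, vs in values.items():
            for v in vs:
                w_cv = sum(m[c] for x, m in zip(items, memberships) if x.get(a) == v)
                cond[c][(a, v)] = (w_cv + alpha) / (w_c + alpha * len(vs))
    return prior, cond


def e_step(x, prior, cond):
    """Full class posterior P(c | x): a soft assignment, not just the max class."""
    logp = {c: math.log(prior[c]) + sum(math.log(cond[c][(a, v)])
                                        for a, v in x.items() if (a, v) in cond[c])
            for c in prior}
    top = max(logp.values())
    z = sum(math.exp(s - top) for s in logp.values())
    return {c: math.exp(s - top) / z for c, s in logp.items()}


def em_naive_bayes(labelled, labels, unlabelled, classes, iters=20):
    hard = [{c: float(c == y) for c in classes} for y in labels]
    prior, cond = m_step(labelled, hard, classes)  # C0: rated items only
    for _ in range(iters):  # fixed iteration count instead of "until no change"
        soft = [e_step(x, prior, cond) for x in unlabelled]               # E-step
        prior, cond = m_step(labelled + unlabelled, hard + soft, classes)  # M-step
    return prior, cond


labelled = [{"genre": "scifi", "length": "long"}, {"genre": "romance", "length": "short"}]
labels = ["like", "dislike"]
unlabelled = [{"genre": "scifi", "length": "short"},
              {"genre": "scifi", "length": "long"},
              {"genre": "romance", "length": "long"}]
prior, cond = em_naive_bayes(labelled, labels, unlabelled, ["like", "dislike"])
print(e_step({"genre": "scifi", "length": "long"}, prior, cond))
```

Replacing `e_step`'s output with a one-hot vector for the argmax class gives the original hard-assignment loop, the K-means-like special case.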
(boosted) content filtering
Discussion of the Google News
Collaborative Filtering Paper