Intro to RecSys and CCF
Brian Ackerman

Roadmap
• Introduction to Recommender Systems & Collaborative Filtering
• Collaborative Competitive Filtering

Introduction to Recommender Systems & Collaborative Filtering

Motivation
• Netflix has over 20,000 movies, but you may only be interested in a small number of them
• Recommender systems provide personalized suggestions over a large set of items such as movies
– This can be done in a variety of ways; the most popular is collaborative filtering

Collaborative Filtering
• If two users rate a subset of items similarly, then they might rate other items similarly as well

        User 1  User 2
Item A    ?       1
Item B    3       3
Item C    4       4
Item D    5       5
Item E    3       ?

Roadmap (RS-CF)
• Motivation
• Problem
• Main CF Types
– Memory-based: User-based
– Model-based: Regularized SVD

Problem Setting
• Set of users, U
• Set of items, I
• Users can rate items, where r_ui is user u's rating on item i
• Ratings are often stored in a rating matrix R of size |U|×|I|

Sample Rating Matrix

Item:    A  B  C  D  E  F  G  H  I
User 1   -  5  -  3  -  -  2  -  -
User 2   4  -  5  -  -  4  -  1  -
User 3   -  4  -  3  -  -  2  -  -
User 4   1  2  -  -  -  5  -  3  -
User 5   -  -  3  -  4  -  -  2  -
User 6   -  2  -  -  1  -  -  2  -
User 7   4  -  -  5  -  -  4  -  1

A number is a user rating; "-" is a null entry (not rated)

Problem
• Input
– Rating matrix R (|U|×|I|)
– Active user a (the user currently interacting with the system)
• Output
– Predictions for all null entries of the active user

Main Types
• Memory-based
– User-based* [Resnick et al. 1994]
– Item-based [Sarwar et al. 2001]
– Similarity Fusion (User/Item-based) [Wang et al. 2006]
• Model-based
– SVD (Singular Value Decomposition) [Sarwar et al. 2000]
– RSVD (Regularized SVD)* [Funk 2006]

User-based

Item:    A  B  C  D  E  F  G  H  I
Active   ?  5  ?  3  ?  ?  2  ?  ?
User 2   4  -  5  -  -  4  -  1  -
User 3   -  4  -  3  -  -  2  -  -
User 4   1  2  -  -  -  5  -  3  -
User 5   -  -  3  -  4  -  -  2  -
User 6   -  2  -  -  1  -  -  2  -
User 7   4  -  -  5  -  -  4  -  1

• Find similar users – via KNN or a threshold
• Make a prediction

User-based – Similar Users
• Consider each user (row) to be a vector
• Compare vectors to find the similarity between two users
– Let a be the vector for the active user and u3 the vector for User 3
– Cosine similarity can be used to compare the vectors
• KNN (k-nearest neighbors, or top-k)
– Keep only the k most similar users
• Threshold
– Keep all users whose similarity is at least θ

User-based – Make Prediction
• Weighted by similarity
– Weight each similar user's rating by that user's similarity to the active user
– r'_ai = Σ_{u ∈ N} sim(a, u) · r_ui / Σ_{u ∈ N} sim(a, u), where N is the set of similar users and r'_ai is the prediction for the active user a on item i

Regularized SVD
• The Netflix data has 8.5 billion possible entries, from 17 thousand movies and 0.5 million users
• Only 100 million ratings are observed – about 1.1% of all possible entries
• Why should we need to operate on such a large matrix?

Regularized SVD – Setup
• Let each user and item be represented by a feature vector of length k
– E.g. Item A may be the vector A = [a1 a2 a3 … ak]
• Imagine the features for items were fixed – E.g.
items are movies and each feature is a genre such as comedy, drama, etc.
• The entries of the user vector say how much the user likes each feature

Regularized SVD – Setup
• Consider the movie Die Hard
– Its feature vector may be i = [1 0 0] if the features are action, comedy, and drama
• Maybe the user has the feature vector u = [3.87 2.64 1.32]
• We can predict the user's rating with the dot product of the two vectors
– r'_ui = u · i = [3.87 2.64 1.32] · [1 0 0] = 3.87

Regularized SVD – Goal
• Find values for each item vector that work for all users
• Find values for each user vector that reproduce the actual ratings when dotted with the item vectors
• I.e. minimize the difference between the actual rating and the predicted (dot-product) rating

Regularized SVD – Setup
• In reality, we cannot choose k large enough to enumerate fixed, named features
– There are too many to consider (e.g. genres, actors, directors, etc.)
• Usually k is only 25 to 50, which cuts the parameter count to (|U| + |I|) · k ≈ 13 to 26 million values (compared to 8.5 billion matrix entries)
• Because k is so small, the values in the vectors are NOT directly tied to any named feature

Regularized SVD – Goal
• Let u be a user, i an item, r_ui the rating by user u on item i, R the set of all observed ratings, and φu, φi the user and item feature vectors
• At first thought, it seems simple to use the following optimization goal:

  min_φ Σ_{(u,i) ∈ R} (r_ui − φu · φi)²

Regularized SVD – Overfitting
• The problem is overfitting of the features
– Solved by regularization

Regularized SVD – Regularization
• Introduce a new optimization goal that includes a term for regularization:

  min_φ Σ_{(u,i) ∈ R} (r_ui − φu · φi)² + λu ||φu||² + λi ||φi||²

• The magnitudes of the feature vectors are minimized as well
– Controlled by the fixed parameters λu and λi

Regularized SVD
• Many improvements to the regularized objective have been proposed
– RSVD2/NSVD1/NSVD2 [Paterek 2007]: added a user-bias term and an item-bias term, and reduced the number of parameters
– Integrated Neighborhood SVD++ [Koren
2008]: combined a neighborhood-based approach with RSVD

Collaborative Competitive Filtering: Learning Recommender Using Context of User Choice
Georgia Tech and Yahoo! Labs
Best Student Paper at SIGIR'11

Motivation
• A user may be given 5 random movies and choose Die Hard
– This tells us the user prefers action movies
• A user may be given 5 action movies and choose Die Hard over Rocky and Terminator
– This tells us the user prefers Bruce Willis

Roadmap (CCF)
• Motivation
• Problem Setting & Input
• Techniques
• Extensions

Problem Setting
• Set of users, U
• Set of items, I
• Each user interaction has an offer set O and a decision set D
• Each user interaction is stored as a tuple (u, O, D), where D ⊆ O

CCF Input
[Matrix: rows are user sessions U1-S1 … U3-S2, columns are Items A–I; 1 marks a user interaction (a chosen item), "-" marks an item that was in the offer set but not chosen]

Local Optimality of User Choice
• Each item has a potential revenue to the user, r_ui
• Users also consider the opportunity cost (OC) when deciding
– The OC is what the user gives up by making a given decision
• The OC is c_ui = max_{i' ∈ O \ {i}} r_ui'
• The profit is π_ui = r_ui − c_ui

Local Optimality of User Choice
• A user interaction is an opportunity give-and-take process
– The user is given a set of opportunities
– The user decides to select one of the many opportunities
– Each opportunity comes with some revenue (utility or relevance)

Collaborative Competitive Filtering
• Local optimality constraint
– Each item in the decision set has a revenue higher than every item not in the decision set
– The problem is intractable with only this constraint: there is no unique solution

CCF – Hinge Model
• Optimization goal –
minimize the error (ξ, a slack variable) and the model complexity

CCF – Hinge Model
• Find the average potential utility
– The average utility of the non-chosen items
• Constraints
– Chosen items must have a higher utility than that average
– e_ui is an error term

CCF – Hinge Model
• Optimization goal – assume ξ is 0
– The chosen item's utility is compared against the average relevance of the non-chosen items

CCF – How to Use the Results
• We can predict the relevance of all items from the user and item vectors
– A threshold can be set if more than one item can be chosen (e.g. θ > .9 implies the item would be taken)

Item   User Action   Predicted Relevance
A      1             .98
B      -             .93
C      -             .56
D      -             .25
E      -             .11

Extensions
• Sessions without a response
– The user does not take any opportunity
• Adding content features
– Fixed features for each item, rather than a limited number of free parameters, to improve prediction accuracy for new items
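The user-based approach from the first half (cosine similarity between user rows, then a similarity-weighted average over the most similar raters) can be sketched in a few lines of Python. The matrix is the sample rating matrix from the slides; the choice of k = 2 and the `cosine`/`predict` helper names are illustrative, not from the deck:

```python
import numpy as np

# Sample rating matrix from the slides: rows = Users 1..7, columns = Items A..I;
# 0 stands in for a null (unrated) entry.
R = np.array([
    [0, 5, 0, 3, 0, 0, 2, 0, 0],  # User 1 (the active user; Item A unknown)
    [4, 0, 5, 0, 0, 4, 0, 1, 0],  # User 2
    [0, 4, 0, 3, 0, 0, 2, 0, 0],  # User 3
    [1, 2, 0, 0, 0, 5, 0, 3, 0],  # User 4
    [0, 0, 3, 0, 4, 0, 0, 2, 0],  # User 5
    [0, 2, 0, 0, 1, 0, 0, 2, 0],  # User 6
    [4, 0, 0, 5, 0, 0, 4, 0, 1],  # User 7
], dtype=float)

def cosine(a, b):
    """Cosine similarity between two rating vectors (nulls treated as 0)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def predict(R, active, item, k=2):
    """Similarity-weighted mean over the k most similar users who rated `item`."""
    sims = np.array([cosine(R[active], R[u]) if u != active else -1.0
                     for u in range(len(R))])
    neighbors = [u for u in np.argsort(-sims) if R[u, item] > 0][:k]
    w = sims[neighbors]
    return float(w @ R[neighbors, item] / w.sum())

# Predict User 1's missing rating on Item A
print(round(predict(R, active=0, item=0), 2))
```

Note that User 3, the most similar user overall, has not rated Item A, so the prediction falls back to the next most similar raters (Users 7 and 4).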
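The Die Hard prediction from the RSVD setup slides is just a dot product; in NumPy:

```python
import numpy as np

# Features (action, comedy, drama), per the slide example
i = np.array([1.0, 0.0, 0.0])      # Die Hard's item vector
u = np.array([3.87, 2.64, 1.32])   # the user's affinity for each feature

r_pred = u @ i                     # predicted rating r'_ui = u . i
print(r_pred)                      # 3.87
```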
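A minimal sketch of minimizing the regularized RSVD objective by stochastic gradient descent, in the style popularized by Funk. The tiny rating set, k, λ, and learning rate below are made-up illustrative values, not from the deck:

```python
import numpy as np

rng = np.random.default_rng(0)

# (user, item, rating) triples standing in for the observed rating set R
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 4.0), (2, 2, 2.0)]
n_users, n_items, k = 3, 3, 2
lam, lr = 0.02, 0.05     # regularization weight (lambda_u = lambda_i) and step size

P = 0.1 * rng.standard_normal((n_users, k))   # phi_u, one row per user
Q = 0.1 * rng.standard_normal((n_items, k))   # phi_i, one row per item

for epoch in range(1000):
    for u, i, r in ratings:
        e = r - P[u] @ Q[i]                   # error on this rating
        P[u] += lr * (e * Q[i] - lam * P[u])  # step down the gradient of
        Q[i] += lr * (e * P[u] - lam * Q[i])  # (r_ui - phi_u . phi_i)^2 + reg.

print(P[0] @ Q[0])   # should land near the observed rating 5.0
```

The regularization terms pull the learned vectors toward zero, which is exactly the λ-controlled shrinkage the overfitting slide calls for.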
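The opportunity-cost arithmetic from the local-optimality slides can be checked numerically, reusing the predicted-relevance table (A = .98 through E = .11) as the utilities of one offer set; the `profit` helper name is mine:

```python
# Utilities r_ui for one offer set, taken from the predicted-relevance table
offer = {"A": 0.98, "B": 0.93, "C": 0.56, "D": 0.25, "E": 0.11}
chosen = "A"   # the item the user actually took

def profit(offer, i):
    """pi_ui = r_ui - c_ui, where the opportunity cost c_ui is the best
    utility among the other offered items (the max over O \\ {i})."""
    c_ui = max(r for j, r in offer.items() if j != i)
    return offer[i] - c_ui

# Local optimality: only the chosen item has a positive profit
print({i: round(profit(offer, i), 2) for i in offer})
```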