Recommender Systems/Collaborative Filtering

A recommender system is essentially what it says: a system (or algorithm/procedure) that can be used to make recommendations. For example, consider the screenshot from http://www.movielens.org (not reproduced here). These are the movies recommended to me based on my previous ratings of other movies in their very extensive database. Most of these I have seen, so I could rate these as well and it would give me more recommendations based upon my updated reviews.

Recommendation systems (also called collaborative filtering) try to recommend new products/items to users to help them sort through all the options available to them. Recommendations are personalized to the user's past behavior in some way and also to their similarity to other users within the system.

One of the core algorithms behind recommender systems is called k Nearest Neighbors (kNN). It is the dirt-simplest algorithm you'll ever see, yet it is almost ALWAYS part of any recommendation system. Practically every machine learning competition winner has kNN as some component of their algorithm for making recommendations.

"Simple" Example – Bob Ross paintings

> br3.dist = dist(BobRoss3[,-1],"jaccard")   # Jaccard distance on the binary painting features
> summary(br3.dist)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.0000  0.6154  0.7333  0.7262  0.8462  1.0000

k.nearest.neighbors <- function(i, distance.matrix, k = 5) {
  # Order the points by increasing distance from row i.
  ordered.neighbors <- order(distance.matrix[i, ])
  # The first entry is always row i itself (distance 0), so
  # skip it and return points 2:(k+1) instead of 1:k.
  return(ordered.neighbors[2:(k + 1)])
}

Suppose I really like the Bob Ross painting Mt. McKinley and would like some recommendations for other paintings of his that I might also like. We can use the function above to find the five paintings closest to Mt. McKinley in terms of distance based upon the painting characteristics.

> MMneighbors = k.nearest.neighbors(2,as.matrix(br3.dist),k=5)   # Painting 2 = Mt. McKinley
> BobRoss3$TitleShort[MMneighbors]
[1] PERFECTWI FIRSTSNOW CHRISTMAS ARCTICWIN WINTERSP

[Image: A Perfect Winter Day by Bob Ross]

Let's try another one:

> BobRoss3$TitleShort[350]
[1] FORESTRIV

Now we will again use our kNN function to find the k = 5 nearest neighbors in terms of the painting features.

> FRneighbors = k.nearest.neighbors(350,as.matrix(br3.dist),k=5)
> BobRoss3$TitleShort[FRneighbors]
[1] LAZYRIVER HIDDENCRE CYPRESSCR SUNLIGHTI AWALKIN
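To see exactly what the "jaccard" option computes, here is a minimal sketch of the Jaccard distance on a pair of binary feature vectors. The helper name and the toy vectors p1 and p2 are our own inventions for illustration (not actual Bob Ross features), but the computation mirrors what dist() from the proxy library returns for binary data.

jaccard.dist <- function(x, y) {
  both   <- sum(x == 1 & y == 1)   # features present in both paintings
  either <- sum(x == 1 | y == 1)   # features present in at least one painting
  1 - both/either                  # distance = 1 - Jaccard similarity
}

> p1 = c(1,0,1,1,0)   # hypothetical binary painting features
> p2 = c(1,1,1,0,0)
> jaccard.dist(p1,p2)   # 1 - 2/4
[1] 0.5

A distance of 0 means two paintings share exactly the same features; a distance of 1 means they share none.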
These are simple examples of an item-based collaborative filter (IBCF). In IBCF we use item similarities or distances (i.e. item-based metrics) to recommend items. As another example of the IBCF concept, suppose I like the movie "The Grand Budapest Hotel"; an IBCF recommender would probably tell me that I would also like "Moonrise Kingdom" or "The Hotel New Hampshire", as these are movies that might be judged similar based on users' reviews. It would certainly not recommend a movie like "Fast and Furious 6".

The other major type of collaborative filtering is user-based collaborative filtering (UBCF). In UBCF the similarity/distance between users is the basis for recommendations. For example, if another user gives "Darjeeling Limited" a high rating, a UBCF recommender would recommend this movie to me (or predict that I would give it a high rating) if I was similar to that user based upon our rating profiles. Thus in order to give recommendations or rating predictions it compares me to other users and finds those that are most like me based upon ratings profiles or "purchase" histories.

Let's consider an example using the MovieLense database.

> data(MovieLense)
> MovieLense
943 x 1664 rating matrix of class 'realRatingMatrix' with 99392 ratings.
> image(sample(MovieLense,500))
> movie.ratings = getRatings(MovieLense)
> barplot(table(movie.ratings)/length(movie.ratings),xlab="Movie Ratings")
> summary(movie.ratings)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   1.00    3.00    4.00    3.53    4.00    5.00

> movie.z = getRatings(normalize(MovieLense,method="Z-score"))
> hist(movie.z,xlab="Normalized Ratings",main=" ")
> hist(movie.z,xlab="Normalized Ratings",main=" ",prob=T)

> qplot(rowCounts(MovieLense), binwidth = 10,
+       main = "Movies Rated on average",
+       xlab = "# of users",
+       ylab = "# of movies rated")

> summary(rowCounts(MovieLense))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   19.0    32.0    64.0   105.4   147.5   735.0

What is the distribution of the mean rating for each of the movies? It is important to note this is not the same as the mean of all of the ratings in the database.

> movie.means = colMeans(MovieLense)
> qplot(movie.means,binwidth=.1,xlab="Movie Means",ylab="Freq")

Plotting the number of ratings against the mean rating for the movies in the database is potentially interesting.

> plot(colMeans(MovieLense),colCounts(MovieLense),xlab="Mean Rating",ylab="# of Ratings")
> identify(colMeans(MovieLense),colCounts(MovieLense),labels=dimnames(MovieLense)[[2]],cex=.6)
 [1]   1  50  98 100 121 127 181 243 258 285 287 293 299 402 421 682 742

Let's consider the process of estimating ratings for users based upon their rating profiles. Using UBCF we would want to find users similar to the target user and then find the mean ratings they assigned to movies the target user has not yet rated. We can return these estimates and then sort them to return some movie recommendations based upon the highest estimated movie ratings. We will first attempt to do this by writing our own code, utilizing course concepts to this point along with some standard functionality within R. We will then look at a package of functions specifically for developing and assessing the performance of recommender systems called recommenderlab.

> ML.mat = as(MovieLense,"matrix")
> dim(ML.mat)
[1]  943 1664
> ML.mat[1:10,1:10]   # examine a small portion of the user/rating database
   Toy Story (1995) GoldenEye (1995) Four Rooms (1995) Get Shorty (1995) Copycat (1995)
1                 5                3                 4                 3              3
2                 4               NA                NA                NA             NA
3                NA               NA                NA                NA             NA
4                NA               NA                NA                NA             NA
5                 4                3                NA                NA             NA
6                 4               NA                NA                NA             NA
7                NA               NA                NA                 5             NA
8                NA               NA                NA                NA             NA
9                NA               NA                NA                NA             NA
10                4               NA                NA                 4             NA
   Shanghai Triad (Yao a yao yao dao waipo qiao) (1995) Twelve Monkeys (1995) Babe (1995)
1                                                     5                     4           1
2                                                    NA                    NA          NA
3                                                    NA                    NA          NA
4                                                    NA                    NA          NA
5                                                    NA                    NA          NA
6                                                    NA                     2           4
7                                                    NA                     5           5
8                                                    NA                     3          NA
9                                                     5                     4          NA
10                                                   NA                     4          NA
   Dead Man Walking (1995) Richard III (1995)
1                        5                  3
2                       NA                  2
3                       NA                 NA
4                       NA                 NA
5                       NA                 NA
6                        4                 NA
7                        5                  4
8                       NA                 NA
9                       NA                 NA
10                       4                 NA

We will now go through the operations necessary to find some recommended movies for a given target user. Here are the basic steps we will be performing in R (a compact function wrapping all four steps follows the list):
1) Form a distance matrix for all the pairwise distances between users based upon their rating profiles.
2) Find the k nearest neighbors to the target user on the basis of these distances.
3) Find the "mean" of the movie ratings for each movie in the database using the k nearest neighbors.
4) Return the top N movies the target user has not rated by sorting these mean ratings.
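As a preview, all four steps fit in one short function. This is only a sketch under simple assumptions: it reuses the k.nearest.neighbors() function from the Bob Ross example, requires the proxy library for dist(), and the name recommend.topN is our own. The following pages perform the same steps interactively.

recommend.topN <- function(target, ratings, k = 20, N = 10, method = "cosine") {
  d   <- as.matrix(dist(ratings, method = method))   # step 1: user-to-user distance matrix
  nn  <- k.nearest.neighbors(target, d, k = k)       # step 2: k nearest users to the target
  est <- colMeans(ratings[nn, ], na.rm = TRUE)       # step 3: neighbors' mean movie ratings
  est <- est[is.na(ratings[target, ])]               # keep only movies the target has not rated
  sort(est, decreasing = TRUE)[1:N]                  # step 4: top N estimated ratings
}

> recommend.topN(1, ML.mat)   # e.g., top 10 recommendations for user 1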
We will use simple options for all of this, but realize that a better system could be built by tweaking some or all of these options. The main options we need to think about are the scaling of the ratings, the distance metric used to measure the distance between users, how many neighbors to consider, and the method used to find the predicted "mean" ratings.

> ML.mat = as(MovieLense,"matrix")
> ML.eucdist = dist(ML.mat,method="Euclidean")   # requires proxy library
> summary(ML.eucdist)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.   NA's
   0.00   43.08   53.96   55.34   65.71  163.20  15450

> ML.eucmat = as.matrix(ML.eucdist)   # form the actual distance matrix
> U1nn = k.nearest.neighbors(1,ML.eucmat,k=20)   # k = 20 nearest neighbors for user 1
> U1nn
 [1] 155 418 812 876 105 433 111 309 351 516 651 895 107 237 691 364 485 845 800 767
> U1pred = colMeans(ML.mat[U1nn,],na.rm=T)
> U1ratings = ML.mat[1,]
> U1recommend = U1pred[is.na(U1ratings)]
> U1top10 = sort(U1recommend,decreasing=T)[1:10]
> U1top10
     Schindler's List (1993)         Close Shave, A (1995)
                           5                             5
              Vertigo (1958)         Apartment, The (1960)
                           5                             5
It's a Wonderful Life (1946)  Rebel Without a Cause (1955)
                           5                             5
       Third Man, The (1949)               Fantasia (1940)
                           5                             5
           Casablanca (1942)     Lawrence of Arabia (1962)
                           5                             5

These are actually pretty bad recommendations, and that is because Euclidean distance is not a good choice! The distance between users 1 and 155 is 0, and this is because I believe they have only 1 movie in common in terms of the ratings they have given, and of course that rating is the same! Thus user 155 is not that informative when it comes to making recommendations for user 1.

A better choice is the cosine measure between the rating vectors of users i and j:

$$\operatorname{sim}(u_i, u_j) = \frac{u_i' u_j}{\|u_i\|\,\|u_j\|}$$

This is the cosine similarity; the cosine distance used below is 1 minus this quantity. Ratings are typically scaled/normalized first.

> ML.cosdist = dist(ML.mat,method="cosine")
> summary(ML.cosdist)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.   NA's
  0.121   0.750   0.844   0.822   0.914   1.000  15450

> ML.cosmat = as.matrix(ML.cosdist)
> U1nn = k.nearest.neighbors(1,ML.cosmat,k=20)
> U1pred = colMeans(ML.mat[U1nn,],na.rm=T)
> U1recommend = U1pred[is.na(U1ratings)]
> U1top10 = sort(U1recommend,decreasing=T)[1:10]
> U1top10
                  Secrets & Lies (1996)                East of Eden (1955)
                                      5                                  5
                  Paths of Glory (1957)                   Braindead (1992)
                                      5                                  5
Some Folks Call It a Sling Blade (1993)                   Notorious (1946)
                                      5                                  5
           American in Paris, An (1951)                 Shadowlands (1993)
                                      5                                  5
                            Diva (1981)         Waiting for Guffman (1996)
                                      5                                  5

These are certainly different recommendations than those returned using Euclidean distance.
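As a sanity check, the cosine distance for a single pair of users can be computed by hand. A minimal sketch using only the movies both users have rated (an assumption on our part; the proxy library's exact handling of missing values may differ):

cosine.dist <- function(x, y) {
  ok <- !is.na(x) & !is.na(y)   # movies rated by both users
  if (!any(ok)) return(NA)      # no overlap: distance undefined (the NA's above)
  x <- x[ok]; y <- y[ok]
  1 - sum(x * y)/(sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

> cosine.dist(ML.mat[1,], ML.mat[2,])   # compare with ML.cosmat[1,2]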
Clearly we need to do some form of model validation in order to determine which options are "best". Functions for building systems within the recommenderlab library allow model options to be easily altered and, more importantly, have built-in methods to assess model/system performance. Before looking at the functions in recommenderlab, let's review the basics of the different methods for making recommendations.

User-based Collaborative Filtering (UBCF)

Again, UBCF starts by finding a neighborhood of users similar to the target or active user, using an appropriate metric for similarity/distance. We can denote this neighborhood of the active user as N(a) and estimate the rating the active user a would give to item j as

$$\hat{r}_{aj} = \frac{1}{|N(a)|} \sum_{i \in N(a)} r_{ij} = \text{the average rating given to item } j \text{ by the users in } N(a).$$

A weighted average may also be used, where users most similar to the active user are given more weight:

$$\hat{r}_{aj} = \frac{1}{\sum_{i \in N(a)} s_{ai}} \sum_{i \in N(a)} s_{ai}\, r_{ij}, \qquad \text{where } s_{ai} = \text{the similarity between the active user } a \text{ and user } i.$$

Also part of this process is normalizing the ratings of users to account for individual user bias, i.e. some users rate consistently high (give 5's more liberally, for example) while other users rate consistently lower (tend to give ratings of 1, 2, and 3, for example). A popular choice is to subtract each user's mean rating from all of the items they rated, or to use z-scores, which also take the variability of each user's ratings into account. These are options that can be set as part of the modeling process.

[Graphic illustrating UBCF, taken from the recommenderlab paper by Michael Hahsler]

The two main problems with UBCF recommenders are that the entire ratings database needs to be stored and that the similarity computations between the active user and all the other users must be performed each time recommendations are requested.

Item-based Collaborative Filtering (IBCF)

In IBCF the rating for an item is estimated by taking a weighted average of the ratings the active user has given to similar items. The weights are determined by item similarities, rather than user similarities. Typically only information on the k most similar items for each item is stored, thus the dimensionality of the data stored is much smaller than that required for UBCF models. We denote this set of the k most similar items for item i as S(i). The rating of item i for active user a is estimated by

$$\hat{r}_{ai} = \frac{1}{\sum_{j \in S(i)} s_{ij}} \sum_{j \in S(i)} s_{ij}\, r_{aj}$$

where s_ij = the similarity between items i and j, and r_aj = the rating given to item j by the active user.

[Graphic demonstrating the IBCF process, taken from the recommenderlab paper by M. Hahsler]

For example, the 0.0 for item 2 comes from the fact that the active user has not rated any of its four most similar items. The 4.6 rating for item 3 comes from applying the weighted-average formula to the similar items the active user has rated; as an illustration (with made-up numbers, since the graphic's values are not reproduced here), two rated neighbors with similarities 0.4 and 0.6 and ratings 4 and 5 give (0.4·4 + 0.6·5)/(0.4 + 0.6) = 4.6.

Similar to UBCF, the item ratings can be normalized within each user before finding the between-item similarities/distances. IBCF recommenders generally do only slightly worse than UBCF ones; however, because of the reduction in the amount of information that needs to be stored, IBCF recommenders are generally better suited for large-scale systems (e.g. Amazon.com). Also, the item similarities can be fully computed in advance, i.e. before a user engages the system.
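Both the UBCF and IBCF estimates above are similarity-weighted averages, so a single small helper covers both cases. A minimal sketch (the function name is our own; sims and ratings would come from the neighborhood N(a) in UBCF or the item set S(i) in IBCF, with NA marking unrated entries):

weighted.estimate <- function(sims, ratings) {
  ok <- !is.na(ratings)                        # use only neighbors the active user has rated
  if (!any(ok)) return(0)                      # nothing rated: no information (the 0.0 case above)
  sum(sims[ok] * ratings[ok])/sum(sims[ok])    # similarity-weighted average
}

> weighted.estimate(c(0.4, 0.6, 0.3), c(4, 5, NA))   # reproduces the 4.6 illustration
[1] 4.6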
Measuring Performance of Rating Recommender Systems

When evaluating any predictive model it is necessary to perform some form of cross-validation. In cross-validation we use part of the available data to train the model and set aside a portion of the data to test its predictive ability. For the MovieLense data we can develop a recommender using a randomly selected set of users and then make recommendations for the users not selected to train the model. We can compare the recommendations/estimated ratings for the test users to the actual ratings they assigned to the items, or to their actual top-N ranked items. However, in order to make recommendations for the test users we have to retain a randomly selected portion of their actual ratings from which to estimate the missing ratings.

Some of the basic schemes for creating training and test sets are discussed below:

• Splitting – we randomly split the entire database into the training and test cases. We put p% of the observations in the training data and the remaining (100 – p)% into the test data, where p is chosen in advance, e.g. 66.6% training and 33.4% test.

• Bootstrap sampling – we sample users with replacement from the original database and use the individuals not chosen in this process as the test cases.

• K-fold cross-validation – randomly splits the database into k equal-size sets and uses each portion in turn as the test cases for a model trained on the other (k – 1) sets. Thus each user eventually will be part of the test cases.

As mentioned above, we also need to randomly choose some items for the recommender to "fill in", i.e. make predictions for. The terminology for this is Given x, where x represents the number of items given to the recommender to make predictions/recommendations from; common choices are Given 2, Given 5, and Given 10. Another is All-but-x, where x items are withheld and recommendations/ratings are given for these x items. All of these cross-validation options are available for evaluating recommenders using the functions in recommenderlab.

Metrics for Measuring Accuracy of Predicted Ratings

There are two main metrics for measuring the accuracy of predicted ratings.

Mean Absolute Error (MAE)

$$MAE = \frac{1}{|\mathcal{K}|} \sum_{(i,j) \in \mathcal{K}} \left| r_{ij} - \hat{r}_{ij} \right|$$

where $\mathcal{K}$ = the set of user-item pairs $(i, j)$ we need to make rating predictions for.

Root Mean Squared Error (RMSE)

$$RMSE = \sqrt{\frac{\sum_{(i,j) \in \mathcal{K}} \left( r_{ij} - \hat{r}_{ij} \right)^2}{|\mathcal{K}|}}$$

RMSE penalizes larger errors more heavily and is best used when small prediction errors matter less than large ones.

Another way to look at the success of a recommender system is to consider the accuracy of the top-N recommendations returned to users. When we omit some items from users' rating profiles for cross-validation purposes, some of the omitted items would already be in their top-N items, and we can see how many of the recommended items in the top-N from the predicted ratings match the top-N items based upon actual ratings. We could also look at the number of predicted ratings (for items omitted from the user profiles) that are "good" (predicted rating > 4) and compare them to the actual "good" ratings (actual rating > 4).

Using either approach we can construct a confusion matrix, i.e. a 2 x 2 table containing counts of matches and mismatches. Here a positive-positive match (d) would be an item in the actual top-N rated items and in the top-N based upon predicted ratings. A negative-negative match (a) would be items not in either top-N list. A positive-negative mismatch (c) counts the items in the actual top-N (or that received an actual "good" rating) that did not appear in the top-N or receive a "good" rating based upon the predicted ratings. A negative-positive mismatch (b) is the reverse of the statement above.

                    Predicted
Actual        Negative    Positive
Negative          a           b
Positive          c           d

Using this table we can calculate different measures of performance:

Accuracy = (a + d)/(a + b + c + d) = proportion of correct recommendations
Proportion incorrect = (b + c)/(a + b + c + d) = 1 – Accuracy
Precision = d/(b + d) = proportion of correct positive recommendations (PPV)
Recall = d/(c + d) = proportion of useful recommendations (sensitivity)

$$E\text{-measure} = \frac{1}{\alpha\left(\frac{1}{\text{Precision}}\right) + (1 - \alpha)\left(\frac{1}{\text{Recall}}\right)} \qquad \text{or} \qquad F\text{-measure} = \frac{2}{\frac{1}{\text{Precision}} + \frac{1}{\text{Recall}}}$$

ROC curves, which plot sensitivity vs. the false positive rate, can also be used.
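These measures are easy to compute directly; recommenderlab's calcPredictionAccuracy(), used later, automates the rating-accuracy ones. A minimal sketch with our own helper names:

mae  <- function(actual, pred) mean(abs(actual - pred), na.rm = TRUE)
rmse <- function(actual, pred) sqrt(mean((actual - pred)^2, na.rm = TRUE))

# performance measures from the 2 x 2 confusion-matrix counts a, b, c, d defined above
cm.measures <- function(a, b, c, d) {
  c(accuracy  = (a + d)/(a + b + c + d),
    precision = d/(b + d),
    recall    = d/(c + d))
}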
Recommenderlab

Ratings are stored in an object called a ratingMatrix (like transactions in arules) and we can perform the following operations on it: dim(), dimnames(), colCounts(), rowCounts(), colMeans(), rowMeans(), colSums(), and rowSums(). sample() can be used to randomly sample users from the database, and image() plots the ratings as shown earlier. Examples of these commands applied to the MovieLense rating matrix:

> dim(MovieLense)
[1]  943 1664
> dimnames(MovieLense)[[2]][1:25]
 [1] "Toy Story (1995)"
 [2] "GoldenEye (1995)"
 [3] "Four Rooms (1995)"
 [4] "Get Shorty (1995)"
 [5] "Copycat (1995)"
 [6] "Shanghai Triad (Yao a yao yao dao waipo qiao) (1995)"
 [7] "Twelve Monkeys (1995)"
 [8] "Babe (1995)"
 [9] "Dead Man Walking (1995)"
[10] "Richard III (1995)"
[11] "Seven (Se7en) (1995)"
[12] "Usual Suspects, The (1995)"
[13] "Mighty Aphrodite (1995)"
[14] "Postino, Il (1994)"
[15] "Mr. Holland's Opus (1995)"
[16] "French Twist (Gazon maudit) (1995)"
[17] "From Dusk Till Dawn (1996)"
[18] "White Balloon, The (1995)"
[19] "Antonia's Line (1995)"
[20] "Angels and Insects (1995)"
[21] "Muppet Treasure Island (1996)"
[22] "Braveheart (1995)"
[23] "Taxi Driver (1976)"
[24] "Rumble in the Bronx (1995)"
[25] "Birdcage, The (1996)"
> colCounts(MovieLense)[1:5]
 Toy Story (1995)  GoldenEye (1995) Four Rooms (1995) Get Shorty (1995)    Copycat (1995)
              452               131                90               209                86
> rowCounts(MovieLense)[1:5]
  1   2   3   4   5
271  61  51  23 175
> colMeans(MovieLense)[100:105]
> rowMeans(MovieLense)[10:15]
      10       11       12       13       14       15
4.206522 3.455556 4.392157 3.095238 4.091837 2.873786

To fit a recommender model the basic function call is:

Rec.fit = Recommender(data,method,parameter=NULL)   # NULL can be replaced by tuning options

To make recommendations, i.e. predictions, we use the basic command:

Rec.pred = predict(Rec.fit,newdata,n=10,type=c("topNList","ratings"),...)

When we specify n = 10 in the predict command it assumes we wish to have the top 10 recommendations returned. If we don't specify n and instead set type = "ratings", then the predicted ratings for movies the user has not yet seen or rated will be returned.

We now fit a recommender model using the first 500 users in the MovieLense database, first using user-based collaborative filtering (UBCF).

> Rec.fit = Recommender(MovieLense[1:500,],method="UBCF")

Find the top 10 recommendations for users 501 – 510.

> newusers = predict(Rec.fit,MovieLense[501:510,],n=10)

Display the recommendations for these users.

> as(newusers,"list")
[[1]]
 [1] "Star Wars (1977)"                  "Dead Man Walking (1995)"
 [3] "Usual Suspects, The (1995)"        "L.A. Confidential (1997)"
 [5] "Pulp Fiction (1994)"               "Full Monty, The (1997)"
 [7] "Silence of the Lambs, The (1991)"  "Monty Python and the Holy Grail (1974)"
 [9] "Toy Story (1995)"                  "Secrets & Lies (1996)"

[[2]]
 [1] "Good Will Hunting (1997)"  "Full Monty, The (1997)"    "Jackie Brown (1997)"
 [4] "G.I. Jane (1997)"          "As Good As It Gets (1997)" "Edge, The (1997)"
 [7] "Kiss the Girls (1997)"     "Star Wars (1977)"          "Apt Pupil (1998)"
[10] "Return of the Jedi (1983)"

[[3]]
 [1] "Casablanca (1942)"
 [2] "Amadeus (1984)"
 [3] "Shawshank Redemption, The (1994)"
 [4] "One Flew Over the Cuckoo's Nest (1975)"
 [5] "North by Northwest (1959)"
 [6] "Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb (1963)"
 [7] "Lawrence of Arabia (1962)"
 [8] "Young Frankenstein (1974)"
 [9] "Strictly Ballroom (1992)"
[10] "Philadelphia Story, The (1940)"

[[4]]
 [1] "Princess Bride, The (1987)"   "Shawshank Redemption, The (1994)"
 [3] "Braveheart (1995)"            "Empire Strikes Back, The (1980)"
 [5] "Usual Suspects, The (1995)"   "Fugitive, The (1993)"
 [7] "Seven (Se7en) (1995)"         "Long Kiss Goodnight, The (1996)"
 [9] "Heathers (1989)"              "Toy Story (1995)"

[[5]]
 [1] "Schindler's List (1993)"                 "Dead Poets Society (1989)"
 [3] "Shawshank Redemption, The (1994)"        "Boot, Das (1981)"
 [5] "Casablanca (1942)"                       "Maltese Falcon, The (1941)"
 [7] "One Flew Over the Cuckoo's Nest (1975)"  "My Fair Lady (1964)"
 [9] "Client, The (1994)"                      "Apollo 13 (1995)"

[[6]]
 [1] "Shawshank Redemption, The (1994)"  "Die Hard (1988)"
 [3] "Braveheart (1995)"                 "Silence of the Lambs, The (1991)"
 [5] "Godfather, The (1972)"             "Toy Story (1995)"
 [7] "Casablanca (1942)"                 "Schindler's List (1993)"
 [9] "Beauty and the Beast (1991)"       "Independence Day (ID4) (1996)"

[[7]]
 [1] "Fargo (1996)"                     "Secrets & Lies (1996)"
 [3] "Raiders of the Lost Ark (1981)"   "Mother (1996)"
 [5] "Empire Strikes Back, The (1980)"  "Lone Star (1996)"
 [7] "Casablanca (1942)"                "Sting, The (1973)"
 [9] "Toy Story (1995)"                 "Monty Python and the Holy Grail (1974)"

[[8]]
 [1] "Godfather, The (1972)"             "Casablanca (1942)"
 [3] "Pulp Fiction (1994)"               "Citizen Kane (1941)"
 [5] "North by Northwest (1959)"         "Blade Runner (1982)"
 [7] "Shawshank Redemption, The (1994)"  "Lone Star (1996)"
 [9] "Chinatown (1974)"                  "2001: A Space Odyssey (1968)"

[[9]]
 [1] "Full Monty, The (1997)"           "Titanic (1997)"
 [3] "Godfather, The (1972)"            "Fargo (1996)"
 [5] "Leaving Las Vegas (1995)"         "Good Will Hunting (1997)"
 [7] "Empire Strikes Back, The (1980)"  "Dead Man Walking (1995)"
 [9] "Boot, Das (1981)"                 "Jerry Maguire (1996)"

[[10]]
 [1] "Good Will Hunting (1997)"    "Shadow Conspiracy (1997)"   "Face/Off (1997)"
 [4] "Jerry Maguire (1996)"        "Kiss the Girls (1997)"      "Wag the Dog (1997)"
 [7] "Conspiracy Theory (1997)"    "As Good As It Gets (1997)"  "Braveheart (1995)"
[10] "Edge, The (1997)"

Next we find predicted ratings for movies not yet rated by users 501 – 510.

> newusers.ratings = predict(Rec.fit,MovieLense[501:510,],type="ratings")
> newusers.mat = as(newusers.ratings,"matrix")
> newusers.mat[,1:5]
      Toy Story (1995) GoldenEye (1995) Four Rooms (1995) Get Shorty (1995) Copycat (1995)
 [1,]         4.049278         3.668791          3.714380          3.962566       3.580816
 [2,]         3.252201         3.118388          3.103183          3.118704       3.016261
 [3,]               NA         3.914827          3.937875          4.187932       3.883274
 [4,]         3.968095         3.612346          3.661557                NA             NA
 [5,]               NA         3.384510          3.366071          3.184141       3.374750
 [6,]         4.266543               NA          3.555912          3.792952             NA
 [7,]         4.880831         4.724138          4.724138          4.755315       4.724138
 [8,]               NA         3.648793          3.854255          3.851918       3.735318
 [9,]         2.586061         2.580645          2.537261          2.587956       2.580645
[10,]         2.827498         2.793103          2.793103          2.793103       2.793103

Why are there missing ratings for some of the movies in these estimated ratings? Viewing the rating profiles for users 501-510 shows that when a user has already rated a movie, the rating is not estimated and the predict function returns NA.
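We can confirm this programmatically. A quick check (a sketch using the ML.mat matrix formed earlier):

> rated = !is.na(ML.mat[501:510,])   # positions users 501-510 have actually rated
> all(is.na(newusers.mat[rated]))    # those positions should be NA in the predictions, i.e. TRUE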
> ML.mat = as(MovieLense,"matrix")
> ML.mat[501:510,1:5]
    Toy Story (1995) GoldenEye (1995) Four Rooms (1995) Get Shorty (1995) Copycat (1995)
501               NA               NA                NA                NA             NA
502               NA               NA                NA                NA             NA
503                5               NA                NA                NA             NA
504               NA               NA                NA                 4              4
505                3               NA                NA                NA             NA
506               NA                4                NA                NA              4
507               NA               NA                NA                NA             NA
508                5               NA                NA                NA             NA
509               NA               NA                NA                NA             NA
510               NA               NA                NA                NA             NA

Next we will consider a recommender built on the same subset using item-based collaborative filtering (IBCF).

> Rec.fit2 = Recommender(MovieLense[1:500,],method="IBCF")
> newusers = predict(Rec.fit2,MovieLense[501:510,],n=10)
> top10 = as(newusers,"list")
> top10
[[1]]
 [1] "White Balloon, The (1995)"
 [2] "Taxi Driver (1976)"
 [3] "Clerks (1994)"
 [4] "Exotica (1994)"
 [5] "Mystery Science Theater 3000: The Movie (1996)"
 [6] "Citizen Kane (1941)"
 [7] "Monty Python and the Holy Grail (1974)"
 [8] "Brazil (1985)"
 [9] "Good, The Bad and The Ugly, The (1966)"
[10] "Apocalypse Now (1979)"

Compared to the top 10 recommendations from UBCF for this user, there is only one movie in common…. Hmmm.

[[2]]
 [1] "Seven (Se7en) (1995)"
 [2] "Haunted World of Edward D. Wood Jr., The (1995)"
 [3] "Mars Attacks! (1996)"
 [4] "Lost Highway (1997)"
 [5] "Picture Perfect (1997)"
 [6] "Wild Things (1998)"
 [7] "Quiet Room, The (1996)"
 [8] "Trial and Error (1997)"
 [9] "Very Natural Thing, A (1974)"
[10] "Walk in the Sun, A (1945)"

[[3]]
 [1] "Big Bang Theory, The (1994)"
 [2] "From Dusk Till Dawn (1996)"
 [3] "Desperado (1995)"
 [4] "Strange Days (1995)"
 [5] "To Wong Foo, Thanks for Everything! Julie Newmar (1995)"
 [6] "Natural Born Killers (1994)"
 [7] "Hudsucker Proxy, The (1994)"
 [8] "Nightmare Before Christmas, The (1993)"
 [9] "Welcome to the Dollhouse (1995)"
[10] "Aristocats, The (1970)"

[[4]]
 [1] "Horseman on the Roof, The (Hussard sur le toit, Le) (1995)"
 [2] "Unhook the Stars (1996)"
 [3] "Ridicule (1996)"
 [4] "Radioland Murders (1994)"
 [5] "Bringing Up Baby (1938)"
 [6] "Last Supper, The (1995)"
 [7] "Touch (1997)"
 [8] "Eve's Bayou (1997)"
 [9] "Sweet Hereafter, The (1997)"
[10] "Mercury Rising (1998)"

[[5]]
 [1] "Price Above Rubies, A (1998)"
 [2] "Client, The (1994)"
 [3] "Black Beauty (1994)"
 [4] "Little Rascals, The (1994)"
 [5] "Philadelphia Story, The (1940)"
 [6] "Sabrina (1954)"
 [7] "Sunset Blvd. (1950)"
 [8] "His Girl Friday (1940)"
 [9] "Local Hero (1983)"
[10] "My Life as a Dog (Mitt liv som hund) (1985)"

[[6]]
 [1] "Guantanamera (1994)"      "Supercop (1992)"
 [3] "Rosewood (1997)"          "Jeffrey (1995)"
 [5] "My Favorite Year (1982)"  "My Fair Lady (1964)"
 [7] "Wyatt Earp (1994)"        "Enchanted April (1991)"
 [9] "Kundun (1997)"            "Month by the Lake, A (1995)"

[[7]]
 [1] "Toy Story (1995)"              "GoldenEye (1995)"
 [3] "Babe (1995)"                   "Usual Suspects, The (1995)"
 [5] "Mr. Holland's Opus (1995)"     "Antonia's Line (1995)"
 [7] "Muppet Treasure Island (1996)" "Braveheart (1995)"
 [9] "Rumble in the Bronx (1995)"    "Apollo 13 (1995)"

[[8]]
 [1] "Postino, Il (1994)"
 [2] "Desperado (1995)"
 [3] "Doom Generation, The (1995)"
 [4] "Eat Drink Man Woman (1994)"
 [5] "Exotica (1994)"
 [6] "Three Colors: Blue (1993)"
 [7] "Grand Day Out, A (1992)"
 [8] "Unbearable Lightness of Being, The (1988)"
 [9] "Shall We Dance? (1996)"
[10] "Promesse, La (1996)"

[[9]]
 [1] "Toy Story (1995)"
 [2] "Seven (Se7en) (1995)"
 [3] "Apollo 13 (1995)"
 [4] "Shawshank Redemption, The (1994)"
 [5] "Silence of the Lambs, The (1991)"
 [6] "Wallace & Gromit: The Best of Aardman Animation (1996)"
 [7] "Haunted World of Edward D. Wood Jr., The (1995)"
 [8] "Godfather, The (1972)"
 [9] "2001: A Space Odyssey (1968)"
[10] "Swingers (1996)"

[[10]]
 [1] "Mr. Holland's Opus (1995)"        "Braveheart (1995)"
 [3] "Rock, The (1996)"                 "Phenomenon (1996)"
 [5] "Star Trek: First Contact (1996)"  "Full Monty, The (1997)"
 [7] "Time to Kill, A (1996)"           "Face/Off (1997)"
 [9] "As Good As It Gets (1997)"        "Good Will Hunting (1997)"

Next we obtain rating predictions from IBCF.

> newusers.ratings = predict(Rec.fit2,MovieLense[501:510,],type="ratings")
> newratings.mat = as(newusers.ratings,"matrix")
> newratings.mat[,1:5]
    Toy Story (1995) GoldenEye (1995) Four Rooms (1995) Get Shorty (1995) Copycat (1995)
501         4.000000         4.000000                NA                NA             NA
502               NA               NA                NA                NA             NA
503               NA         1.899343                NA          4.374983             NA
504         4.078962         4.000000          3.514523                NA             NA
505               NA         3.379930                NA          2.306433             NA
506         4.759811               NA          3.027949          4.269734             NA
507         5.000000         5.000000                NA                NA             NA
508               NA         4.000000                NA          4.110303              2
509         5.000000               NA                NA                NA             NA
510               NA               NA                NA                NA             NA

There are far more missing ratings for the first few movies in the database when using IBCF. A comparison of the estimated ratings for Toy Story is shown below.

[Histograms of the UBCF and IBCF estimated ratings for Toy Story]

Another method of collaborative filtering uses the SVD to fill in the missing ratings. In general, SVD is commonly used to estimate missing data in a data matrix. When you consider that recommender systems are essentially trying to estimate missing ratings for users, the use of SVD makes sense.
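To see why, here is a minimal sketch of the generic SVD-imputation idea; this is our own illustration, not recommenderlab's exact algorithm. Fill the missing cells with a baseline, take a low-rank SVD approximation, and read the estimated ratings off the reconstructed matrix.

svd.impute <- function(R, rank = 10) {
  filled <- R
  mu <- colMeans(R, na.rm = TRUE)        # baseline: movie means (assumes every movie has a rating)
  for (j in seq_len(ncol(R)))
    filled[is.na(R[, j]), j] <- mu[j]    # fill missing cells with the baseline
  s <- svd(filled)                       # decompose the filled matrix
  d <- s$d
  d[-(1:rank)] <- 0                      # keep only the top 'rank' singular values
  s$u %*% diag(d) %*% t(s$v)             # low-rank reconstruction = estimated ratings
}

The low-rank approximation pulls the filled-in baseline values toward rating patterns shared across users and movies, which is exactly the "estimate the missing ratings" task.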
> Rec.fit3 = Recommender(MovieLense[1:500,],method="SVD")
> newusers.top10 = predict(Rec.fit3,MovieLense[501:510,],n=10)
> as(newusers.top10,"list")
[[1]]
 [1] "Reservoir Dogs (1992)"       "Star Wars (1977)"
 [3] "Dead Man Walking (1995)"     "Secrets & Lies (1996)"
 [5] "Usual Suspects, The (1995)"  "Hoop Dreams (1994)"
 [7] "Boogie Nights (1997)"        "Shawshank Redemption, The (1994)"
 [9] "Raising Arizona (1987)"      "Bullets Over Broadway (1994)"

[[2]]
 [1] "Event Horizon (1997)"  "Crash (1996)"   "Mars Attacks! (1996)"
 [4] "Happy Gilmore (1996)"  "Speed (1994)"   "Twelve Monkeys (1995)"
 [7] "Jaws (1975)"           "Psycho (1960)"  "Shining, The (1980)"
[10] "Cop Land (1997)"

[[3]]
 [1] "Casablanca (1942)"          "L.A. Confidential (1997)"  "Strictly Ballroom (1992)"
 [4] "Air Force One (1997)"       "Men in Black (1997)"       "From Dusk Till Dawn (1996)"
 [7] "Lawrence of Arabia (1962)"  "Amadeus (1984)"            "Aladdin (1992)"
[10] "Young Frankenstein (1974)"

[[4]]
 [1] "Shawshank Redemption, The (1994)"  "Long Kiss Goodnight, The (1996)"
 [3] "To Kill a Mockingbird (1962)"      "Game, The (1997)"
 [5] "Conspiracy Theory (1997)"          "Hunt for Red October, The (1990)"
 [7] "Secrets & Lies (1996)"             "Mission: Impossible (1996)"
 [9] "Lost Highway (1997)"               "Event Horizon (1997)"

[[5]]
 [1] "Dead Poets Society (1989)"  "Sound of Music, The (1965)"  "Boot, Das (1981)"
 [4] "Schindler's List (1993)"    "Fargo (1996)"                "Casablanca (1942)"
 [7] "Bed of Roses (1996)"        "Trainspotting (1996)"        "On Golden Pond (1981)"
[10] "Home Alone (1990)"

(Comparing this list to the UBCF top 10, we see 4 movies in common.)

[[6]]
 [1] "Willy Wonka and the Chocolate Factory (1971)"
 [2] "Sound of Music, The (1965)"
 [3] "Clockwork Orange, A (1971)"
 [4] "Beauty and the Beast (1991)"
 [5] "Twelve Monkeys (1995)"
 [6] "Fish Called Wanda, A (1988)"
 [7] "First Wives Club, The (1996)"
 [8] "Men in Black (1997)"
 [9] "Rock, The (1996)"
[10] "Dumbo (1941)"

[[7]]
 [1] "Jurassic Park (1993)"     "E.T. the Extra-Terrestrial (1982)"
 [3] "Home Alone (1990)"        "Empire Strikes Back, The (1980)"
 [5] "Batman Forever (1995)"    "Batman Returns (1992)"
 [7] "Waterworld (1995)"        "Get Shorty (1995)"
 [9] "Harold and Maude (1971)"  "Lost Highway (1997)"

[[8]]
 [1] "Blade Runner (1982)"
 [2] "Harold and Maude (1971)"
 [3] "Manchurian Candidate, The (1962)"
 [4] "Godfather, The (1972)"
 [5] "Casablanca (1942)"
 [6] "Dumbo (1941)"
 [7] "Wallace & Gromit: The Best of Aardman Animation (1996)"
 [8] "Bridge on the River Kwai, The (1957)"
 [9] "Vertigo (1958)"
[10] "Rear Window (1954)"

[[9]]
 [1] "Empire Strikes Back, The (1980)"  "Full Monty, The (1997)"
 [3] "Game, The (1997)"                 "Jaws (1975)"
 [5] "Raiders of the Lost Ark (1981)"   "Titanic (1997)"
 [7] "Good Will Hunting (1997)"         "Natural Born Killers (1994)"
 [9] "Fifth Element, The (1997)"        "Star Trek: The Wrath of Khan (1982)"

[[10]]
 [1] "Twister (1996)"                "Good Will Hunting (1997)"
 [3] "Conspiracy Theory (1997)"      "Jerry Maguire (1996)"
 [5] "Phenomenon (1996)"             "E.T. the Extra-Terrestrial (1982)"
 [7] "Platoon (1986)"                "Braveheart (1995)"
 [9] "First Wives Club, The (1996)"  "Peacemaker, The (1997)"

Comparing UBCF and IBCF Recommenders

First we set the evaluation criteria and the form of cross-validation we wish to use. The syntax for the commands below is quite cumbersome; it would be helpful to write some wrapper code to shorten it (a sketch of such a wrapper follows this first comparison). We first look at the accuracy of estimated ratings using MAE, MSE, and RMSE.

> ev = evaluationScheme(MovieLense[1:500,],method="split",train=.6666,given=10,goodRating=5)
> ev
Evaluation scheme with 10 items given
Method: 'split' with 1 run(s).
Training set proportion: 0.667
Good ratings: >=5.000000
Data set: 500 x 1664 rating matrix of class 'realRatingMatrix' with 56431 ratings.
> rs.UBCF = Recommender(getData(ev,"train"),method="UBCF")
> rs.UBCF
Recommender of type 'UBCF' for 'realRatingMatrix' learned using 333 users.
> rs.IBCF = Recommender(getData(ev,"train"),method="IBCF")
> rs.IBCF
Recommender of type 'IBCF' for 'realRatingMatrix' learned using 333 users.
> pred.UBCF = predict(rs.UBCF,getData(ev,"known"),type="ratings")
> pred.UBCF
167 x 1664 rating matrix of class 'realRatingMatrix' with 276218 ratings.
> poo = as(pred.UBCF,"matrix")
> poo[1:10,1:5]
      Toy Story (1995) GoldenEye (1995) Four Rooms (1995) Get Shorty (1995) Copycat (1995)
 [1,]         2.853690         2.729452          2.800000          2.840479       2.771065
 [2,]         3.042116               NA          2.910423          2.912725       2.846906
 [3,]         3.361667         3.491101          3.500000          3.597445       3.440803
 [4,]         4.394431         4.161705          4.142394          4.205322       4.189194
 [5,]         2.634023         2.690252          2.775944          2.737192       2.778320
 [6,]         2.412400         2.200000          2.222101          2.208721       2.200000
 [7,]         4.128952         4.023582          3.865842          4.026260       4.002548
 [8,]         2.924500         2.609048          2.624472          2.806677       2.687384
 [9,]         4.211232         4.111044          4.201743          4.202020       4.223178
[10,]         4.295566         3.971317          4.055703          3.889486       4.008767
> pred.IBCF = predict(rs.IBCF,getData(ev,"known"),type="ratings")
> pred.IBCF
167 x 1664 rating matrix of class 'realRatingMatrix' with 31409 ratings.
> poo2 = as(pred.IBCF,"matrix")
> poo2[1:10,1:5]
   Toy Story (1995) GoldenEye (1995) Four Rooms (1995) Get Shorty (1995) Copycat (1995)
3          2.979738               NA                NA                NA             NA
5                NA               NA                NA                NA             NA
6          2.000000               NA                NA                NA             NA
10               NA               NA                NA                NA             NA
11         3.111881               NA                NA                 3             NA
15               NA               NA                NA                NA             NA
16               NA               NA                NA                 5             NA
21         4.000000               NA                NA                NA             NA
23               NA                5                NA                NA             NA
25               NA               NA                NA                NA             NA
> error = rbind(calcPredictionAccuracy(pred.UBCF,getData(ev,"unknown")),
+               calcPredictionAccuracy(pred.IBCF,getData(ev,"unknown")))
> rownames(error) = c("UBCF","IBCF")
> error
         RMSE      MSE       MAE
UBCF 1.065576 1.135453 0.8423495
IBCF 1.212927 1.471191 0.8630322
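As promised, a wrapper shortens these comparisons considerably. A minimal sketch (the name rec.accuracy is our own; it simply bundles the recommenderlab calls used above, and it takes the evaluation scheme as an argument so that every method is evaluated on the same split):

rec.accuracy <- function(ev, method) {
  fit  <- Recommender(getData(ev, "train"), method = method)
  pred <- predict(fit, getData(ev, "known"), type = "ratings")
  calcPredictionAccuracy(pred, getData(ev, "unknown"))   # returns RMSE, MSE, MAE
}

> rbind(UBCF = rec.accuracy(ev, "UBCF"), IBCF = rec.accuracy(ev, "IBCF"))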
Using given = 5 instead of 10:

> error
         RMSE      MSE       MAE
UBCF 1.085761 1.178877 0.8647943
IBCF 1.206584 1.455845 0.8429172

Using 90% of the database to train and 10% to test (instead of 66.66% / 33.34%):

> error
         RMSE      MSE       MAE
UBCF 1.072561 1.150386 0.8520082
IBCF 1.245301 1.550775 0.8557315

Evaluating top-N recommendations from UBCF and IBCF

> scheme = evaluationScheme(MovieLense[1:500,],method="cross",k=4,given=3,goodRating=4)
> scheme
Evaluation scheme with 3 items given
Method: 'cross-validation' with 4 run(s).
Good ratings: >=4.000000
Data set: 500 x 1664 rating matrix of class 'realRatingMatrix' with 56431 ratings.
> results = evaluate(scheme,method="UBCF",n = c(1,3,5,10,15,20))
UBCF run
1 [0.02sec/12.47sec]
2 [0sec/12.45sec]
3 [0sec/12.48sec]
4 [0sec/12.45sec]
> results
Evaluation results for 4 runs using method 'UBCF'.
> getConfusionMatrix(results)[[1]]   # 1st run; there are 3 other cross-validation runs we could display
      TP     FP    FN   TN precision  recall     TPR       FPR
1  0.512  0.408 60.24 1600    0.5565 0.01322 0.01322 0.0002518
3  1.408  1.352 59.34 1599    0.5101 0.03505 0.03505 0.0008349
5  2.176  2.424 58.58 1598    0.4730 0.05243 0.05243 0.0014978
10 3.744  5.456 57.01 1595    0.4070 0.08545 0.08545 0.0033790
15 5.048  8.752 55.70 1591    0.3658 0.11086 0.11086 0.0054258
20 6.440 11.960 54.31 1588    0.3500 0.13789 0.13789 0.0074143
> avg(results)   # average results over the 4 cross-validation runs for UBCF
      TP     FP    FN   TN precision  recall     TPR       FPR
1  0.452  0.440 60.97 1599    0.5062 0.01324 0.01324 0.0002724
3  1.214  1.462 60.20 1598    0.4530 0.03250 0.03250 0.0009052
5  1.904  2.556 59.51 1597    0.4265 0.04913 0.04913 0.0015825
10 3.326  5.594 58.09 1594    0.3727 0.08107 0.08107 0.0034658
15 4.550  8.830 56.87 1591    0.3400 0.10536 0.10536 0.0054722
20 5.752 12.088 55.67 1587    0.3224 0.12784 0.12784 0.0074917
> results = evaluate(scheme,method="IBCF",n = c(1,3,5,10,15,20))
IBCF run
1 [81.2sec/0.6sec]
2 [77.69sec/0.61sec]
3 [81.06sec/0.61sec]
4 [75.06sec/0.63sec]
> getConfusionMatrix(results)[[1]]   # 1st run; there are 3 other cross-validation runs we could display
      TP     FP    FN   TN precision   recall      TPR       FPR
1  0.224  0.696 60.53 1600    0.2435 0.003877 0.003877 0.0004322
3  0.576  2.184 60.18 1598    0.2087 0.011565 0.011565 0.0013603
5  1.040  3.560 59.71 1597    0.2261 0.019643 0.019643 0.0022147
10 2.144  7.056 58.61 1593    0.2330 0.043143 0.043143 0.0043881
15 3.144 10.656 57.61 1590    0.2278 0.063757 0.063757 0.0066295
20 4.008 14.392 56.74 1586    0.2178 0.081451 0.081451 0.0089576
> avg(results)   # average results over the 4 cross-validation runs for IBCF
      TP     FP    FN   TN precision   recall      TPR       FPR
1  0.212  0.682 61.21 1599    0.2369 0.004752 0.004752 0.0004244
3  0.588  2.094 60.83 1597    0.2193 0.012722 0.012722 0.0013036
5  1.024  3.446 60.39 1596    0.2293 0.021775 0.021775 0.0021439
10 1.976  6.964 59.44 1593    0.2212 0.043822 0.043822 0.0043343
15 2.802 10.608 58.62 1589    0.2090 0.061052 0.061052 0.0066053
20 3.558 14.306 57.86 1585    0.1990 0.078379 0.078379 0.0089122

In terms of accuracy of top-N recommendations, IBCF fares poorly when compared to UBCF. Below are similar results with given = 10 rather than given = 3.
> avg(results.UBCF)
      TP     FP    FN   TN precision  recall     TPR       FPR
1  0.528  0.472 56.80 1596    0.5280 0.01934 0.01934 0.0002918
3  1.378  1.622 55.95 1595    0.4593 0.04544 0.04544 0.0010048
5  2.166  2.834 55.16 1594    0.4332 0.06766 0.06766 0.0017554
10 3.822  6.178 53.51 1590    0.3822 0.10516 0.10516 0.0038264
15 5.214  9.786 52.12 1587    0.3476 0.13447 0.13447 0.0060644
20 6.474 13.526 50.86 1583    0.3237 0.15817 0.15817 0.0083848

> plot(results.UBCF,annotate=T)
> plot(results.UBCF,"prec/rec",annotate=T)

> avg(results.IBCF)
      TP     FP    FN   TN precision   recall      TPR       FPR
1  0.204  0.796 57.13 1596    0.2040 0.004264 0.004264 0.0004956
3  0.652  2.348 56.68 1594    0.2173 0.013481 0.013481 0.0014615
5  1.088  3.912 56.24 1593    0.2176 0.022812 0.022812 0.0024345
10 2.132  7.868 55.20 1589    0.2132 0.045688 0.045688 0.0048971
15 3.068 11.932 54.26 1585    0.2045 0.066380 0.066380 0.0074311
20 4.032 15.962 53.30 1581    0.2016 0.088308 0.088308 0.0099426

Putting it all together…

The command below displays all the recommender system options available, with information on some of the tuning parameters that can be tweaked to improve performance.

> recommenderRegistry$get_entries(dataType = "realRatingMatrix")
$IBCF_realRatingMatrix
Recommender method: IBCF
Description: Recommender based on item-based collaborative filtering (real data).
Parameters:
   k method normalize normalize_sim_matrix alpha na_as_zero minRating
1 30 Cosine    center                FALSE   0.5      FALSE        NA

$PCA_realRatingMatrix
Recommender method: PCA
Description: Recommender based on PCA approximation (real data).
Parameters:
  categories method normalize normalize_sim_matrix alpha na_as_zero minRating
1         20 Cosine    center                FALSE   0.5      FALSE        NA

$POPULAR_realRatingMatrix
Recommender method: POPULAR
Description: Recommender based on item popularity (real data).
Parameters: None

$RANDOM_realRatingMatrix
Recommender method: RANDOM
Description: Produce random recommendations (real ratings).
Parameters: None

$SVD_realRatingMatrix
Recommender method: SVD
Description: Recommender based on SVD approximation (real data).
Parameters:
  categories method normalize normalize_sim_matrix alpha treat_na minRating
1         50 Cosine    center                FALSE   0.5   median        NA

$UBCF_realRatingMatrix
Recommender method: UBCF
Description: Recommender based on user-based collaborative filtering (real data).
Parameters:
  method nn sample normalize minRating
1 cosine 25  FALSE    center        NA

In the code below we compare multiple recommenders simultaneously using the same train/test data.

> scheme = evaluationScheme(MovieLense, method = "split", train = .9, k = 1, given = 10, goodRating = 4)
> scheme
Evaluation scheme with 10 items given
Method: 'split' with 1 run(s).
Training set proportion: 0.900
Good ratings: >=4.000000
Data set: 943 x 1664 rating matrix of class 'realRatingMatrix' with 99392 ratings.
> algorithms <- list(
+   "random items"  = list(name="RANDOM",  param=list(normalize = "Z-score")),
+   "popular items" = list(name="POPULAR", param=list(normalize = "Z-score")),
+   "user-based CF" = list(name="UBCF",    param=list(normalize = "Z-score", method="Cosine",
+                                                     nn=50, minRating=3)),
+   "item-based CF" = list(name="IBCF",    param=list(normalize = "Z-score"))
+ )
> results = evaluate(scheme, algorithms, n=c(1, 3, 5, 10, 15, 20))
> plot(results, annotate = 1:4, legend="topleft")
> plot(results, "prec/rec", annotate=2:4)

User-based collaborative filtering (UBCF) is definitely superior to the other recommenders here.

Binary Recommendations

Rather than considering the actual ratings on a 1-5 point scale, we can binarize the ratings to be either good (e.g. a rating of 4 or more, or perhaps only a rating of 5) or not. Thus the user ratings will be either 1 for a good movie rating and 0 for a rating that is not considered good under our classification.
To assess the performance of the recommendation system we use measures like precision, recall, the true positive rate, and the false positive rate. The cross-validation process is very similar to what was done above: first define a scheme to use, run the different recommendation methods, and then examine performance measures appropriate for binary recommendations.

> MovieLense_binary = binarize(MovieLense,minRating=4)   # classify a good rating as 4 or 5
> ML_binary = MovieLense_binary[rowCounts(MovieLense_binary)>25]   # subset users with more than 25 good ratings
> algorithms_binary = list("random items"=list(name="RANDOM",param=NULL),
+   "popular items"=list(name="POPULAR",param=NULL),
+   "user-based CF"=list(name="UBCF",param=list(method="Jaccard",nn=50)),
+   "item-based CF"=list(name="IBCF",param=list(method="Jaccard")))
> scheme_binary = evaluationScheme(ML_binary[1:200],method="split",train=.9,k=1,given=20)
> results_binary = evaluate(scheme_binary,algorithms_binary,n=c(1,3,5,10,15,20))
RANDOM run 1 [0.01sec/0.08sec]
POPULAR run 1 [0sec/0.11sec]
UBCF run 1 [0.01sec/0.29sec]
IBCF run 1 [13.66sec/0.06sec]
> plot(results_binary,annotate=1:4,legend="bottomright")

Again, user-based collaborative filtering is far superior to the other methods.