Recommender Systems and Collaborative Filtering

Drawing heavily on online slides in this area, especially those of William W. Cohen (CMU)

You visit an online bookshop ...
The shop has 100,000 books. On the webpage, they will display 5 book covers, chosen especially for you. Which ones will they display? Why?
• The same applies to books, webpages, music, films, clothes, food, everything ... This matters a great deal for e-commerce: there is a big financial uplift if stores get recommendations ‘right’
• What if the website is not selling you anything (e.g. research papers, search, interest group forum)? Why does such a site need to make good recommendations?

Basic approaches used for recommendation
• User-based
  – Recommend things that were purchased or viewed by users who are similar to you
• Item-based
  – Recommend things that are similar to the items that you have viewed/purchased before

Amazon: ‘cold-start’ recommendation

Amazon: with minimal info about me via a cookie on this netbook

Amazon, when I logged in

User Profiles
For user-based recommendation, sites need to have some kind of user profile. Similarity with other users is measured as a distance between profiles.
What do you think could be in a user profile?

Potential contents of user profiles
• Demographic data: age, gender, salary, profession, country of residence, country of origin, religion ...
• Site behaviour: purchase history at the site; viewing history, perhaps including time spent on certain pages/items; clickstream sequence

K-Nearest Neighbour based Recommendation
[Figure: users plotted by Age and Salary, with “You” marked]
(Think in terms of many dimensions, not just these two)

K-Nearest Neighbour based Recommendation
[Figure: the same Age/Salary plot, with your nearest neighbours highlighted]
Your neighbours: recommend things that they have viewed/purchased

Collaborative Filtering: The main idea
People who purchased A also purchased B
This is different from nearest-neighbour: it can lead to recommendations based on the behaviour of users who are very dissimilar to you

Other forms/aspects of collaborative filtering
Why “collaborative”?
Basically, someone else (in fact many someones) has gone to the effort of viewing/filtering things, and chosen the best few. You get a recommendation of the best few, without having to spend the effort.
Widespread examples of CF: Twitter, PageRank, StumbleUpon, Digg, Facebook (Likes), etc ...

Another look at Google’s PageRank
(this bit adapted from slides of William Cohen, CMU)
[Figure: web sites linking to one another]
• Inlinks are “good” (recommendations)
• Inlinks from a “good” site are better than inlinks from a “bad” site
• ... but inlinks from sites with many outlinks are not as “good” ...
• “Good” and “bad” are relative.

Google’s PageRank
(Brin & Page, http://www-db.stanford.edu/~backrub/google.html)
[Figure: web sites linking to one another]
Imagine a “pagehopper” that always either
• follows a random link, or
• jumps to a random page
PageRank ranks pages by the amount of time the pagehopper spends on a page
• or, if there were many pagehoppers, PageRank is the expected “crowd size”

Collaborative Filtering and User Ratings
Many systems ask users to rate items, e.g. on a scale of 1 to 10. These ratings then enable the system to give more precise/accurate recommendations, and to use a variety of sophisticated learning/prediction algorithms.
E.g. here are user ratings for some items (“?” means unrated):

Item    You    User1    User2    User3
A        7       1        6        7
B        2       8        3        2
C        1       8        3        1
D        8       2        7        7
E        9       ?        6        7
F        9       2        5        ?
G        ?       8        3        3
H        ?       7        1        1

How might a system predict your rating for items G and H?
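One possible answer to the question above, as a minimal sketch rather than the method of any particular system: find the users whose ratings are closest to yours on the items you have both rated, then average their ratings for G and H, weighted by similarity. The slides do not name a similarity measure, so the one used here (1 / (1 + mean absolute rating difference)) is an assumption chosen for simplicity, as are the function names and the choice of k = 2 neighbours.

```python
# Ratings from the table above; None marks an unrated item ("?").
ratings = {
    "You":   {"A": 7, "B": 2, "C": 1, "D": 8, "E": 9, "F": 9, "G": None, "H": None},
    "User1": {"A": 1, "B": 8, "C": 8, "D": 2, "E": None, "F": 2, "G": 8, "H": 7},
    "User2": {"A": 6, "B": 3, "C": 3, "D": 7, "E": 6, "F": 5, "G": 3, "H": 1},
    "User3": {"A": 7, "B": 2, "C": 1, "D": 7, "E": 7, "F": None, "G": 3, "H": 1},
}


def similarity(a, b):
    """Assumed measure: 1 / (1 + mean absolute difference on co-rated items)."""
    shared = [i for i in a if a[i] is not None and b.get(i) is not None]
    if not shared:
        return 0.0
    mean_abs_diff = sum(abs(a[i] - b[i]) for i in shared) / len(shared)
    return 1.0 / (1.0 + mean_abs_diff)


def neighbours(target):
    """Other users, most similar to `target` first."""
    me = ratings[target]
    others = [u for u in ratings if u != target]
    return sorted(others, key=lambda u: similarity(me, ratings[u]), reverse=True)


def predict(target, item, k=2):
    """Similarity-weighted average of the k most similar users who rated `item`."""
    me = ratings[target]
    candidates = [
        (similarity(me, r), r[item])
        for user, r in ratings.items()
        if user != target and r.get(item) is not None
    ]
    nearest = sorted(candidates, reverse=True)[:k]
    total = sum(sim for sim, _ in nearest)
    if total == 0:
        return None  # nobody comparable has rated this item
    return sum(sim * rating for sim, rating in nearest) / total


if __name__ == "__main__":
    print(neighbours("You"))               # User3 is closest, User1 furthest
    print(round(predict("You", "G"), 2))   # about 3
    print(round(predict("You", "H"), 2))   # about 1
```

With k = 2 the prediction follows User3 and User2, whose tastes track yours, and ignores User1, whose ratings are nearly the opposite of yours. Other choices (Pearson correlation, a different k, subtracting each user's mean rating) would give somewhat different numbers.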
Collaborative Filtering Works
BellCore’s MovieRecommender (Bell Communications Research)
• Participants sent email to videos@bellcore.com
• System replied with a list of 500 movies to rate on a 1-10 scale (250 random, 250 popular)
  – Only a subset needs to be rated
• New participant P sends in rated movies via email
• System compares ratings for P to ratings of (a random sample of) previous users
• Most similar users are used to predict scores for unrated movies
• System returns recommendations in an email message.

Start your own business?
• Bookmark-based recommendation
• Display the right adverts on your site

End