AudioScrobbler Audioscrobbler and last.fm Collaborative filtering What is a neighbor? What is the network? CompSci 001 6.1 Duke Scrobbler Integration with facebook CompSci 001 6.2 What can we do with real data? How do we find a graph’s diameter? This is the maximal shortest path between any pair of vertices Can we do this in big graphs? What is the center of a graph? From rumor mills to terrorists How is this related to diameter? Demo GUESS (as augmented at Duke) IM data, Audioscrobbler data CompSci 001 6.3 My recommendations at Amazon CompSci 001 6.4 And again… CompSci 001 6.5 Collaborative Filtering Goal: predict the utility of an item to a particular user based on a database of user profiles User profiles contain user preference information Preference may be explicit or implicit • Explicit means that a user votes explicitly on some scale • Implicit means that the system interprets user behavior or selections to impute a vote Problems Missing data: voting is neither complete nor uniform Preferences may change over time Interface issues CompSci 001 6.6 Google’s PageRank web site xxx web site xxx web site xxx web site a b c defg web Inlinks are “good” (recommendations) Inlinks from a “good” site are better than inlinks from a “bad” site site web site yyyy pdq pdq .. web site a b c defg web site yyyy but inlinks from sites with many outlinks are not as “good”... “Good” and “bad” 6.7 are relative. CompSci 001 W. Cohen Google’s PageRank web site xxx web site xxx Imagine a “pagehopper” that always either • follows a random link, or web site a b c defg • jumps to random page web site web site yyyy pdq pdq .. web site a b c defg web site yyyy 6.8 CompSci 001 W. Cohen Google’s PageRank (Brin & Page, http://www-db.stanford.edu/~backrub/google.html) web site xxx web site xxx Imagine a “pagehopper” that always either • follows a random link, or web site a b c defg • jumps to random page web site web site yyyy web site a b c defg web site yyyy CompSci 001 pdq pdq .. PageRank ranks pages by the amount of time the pagehopper spends on a page: • or, if there were many pagehoppers, PageRank is the expected “crowd size” 6.9 Everyday Examples of Collaborative Filtering... Bestseller lists Top 40 music lists The “recent returns” shelf at the library Unmarked but well-used paths thru the woods The printer room at work Many weblogs “Read any good books lately?” .... Common insight: personal tastes are correlated: If Alice and Bob both like X and Alice likes Y then Bob is more likely to like Y especially (perhaps) if Bob knows Alice 6.10 CompSci 001 W. Cohen