lect06.ppt

advertisement
AudioScrobbler

Audioscrobbler and last.fm
 Collaborative filtering
 What is a neighbor?
 What is the network?
CompSci 001
6.1
Duke Scrobbler

Integration
with
facebook
CompSci 001
6.2
What can we do with real data?

How do we find a graph’s diameter?
 This is the maximal shortest path between any pair of
vertices
 Can we do this in big graphs?

What is the center of a graph?
 From rumor mills to terrorists
 How is this related to diameter?

Demo GUESS (as augmented at Duke)
 IM data, Audioscrobbler data
CompSci 001
6.3
My recommendations at Amazon
CompSci 001
6.4
And again…
CompSci 001
6.5
Collaborative Filtering

Goal: predict the utility of an item to a particular user based on
a database of user profiles
 User profiles contain user preference information
 Preference may be explicit or implicit
• Explicit means that a user votes explicitly on some scale
• Implicit means that the system interprets user behavior or
selections to impute a vote

Problems
 Missing data: voting is neither complete nor uniform
 Preferences may change over time
 Interface issues
CompSci 001
6.6
Google’s PageRank
web site
xxx
web site
xxx
web site
xxx
web site a b c
defg
web
Inlinks are “good”
(recommendations)
Inlinks from a
“good” site are
better than inlinks
from a “bad” site
site
web site yyyy
pdq pdq ..
web site a b c
defg
web site yyyy
but inlinks from
sites with many
outlinks are not as
“good”...
“Good” and “bad” 6.7
are relative.
CompSci 001
W. Cohen
Google’s PageRank
web site
xxx
web site
xxx
Imagine a “pagehopper”
that always either
• follows a random link, or
web site a b c
defg
• jumps to random page
web
site
web site yyyy
pdq pdq ..
web site a b c
defg
web site yyyy
6.8
CompSci 001
W. Cohen
Google’s PageRank
(Brin & Page, http://www-db.stanford.edu/~backrub/google.html)
web site
xxx
web site
xxx
Imagine a “pagehopper”
that always either
• follows a random link, or
web site a b c
defg
• jumps to random page
web
site
web site yyyy
web site a b c
defg
web site yyyy
CompSci 001
pdq pdq ..
PageRank ranks pages by
the amount of time the
pagehopper spends on a
page:
• or, if there were many
pagehoppers, PageRank is
the expected “crowd size”
6.9
Everyday Examples of Collaborative
Filtering...









Bestseller lists
Top 40 music lists
The “recent returns” shelf at the library
Unmarked but well-used paths thru the woods
The printer room at work
Many weblogs
“Read any good books lately?”
....
Common insight: personal tastes are correlated:
 If Alice and Bob both like X and Alice likes Y then Bob is more
likely to like Y
 especially (perhaps) if Bob knows Alice
6.10
CompSci 001
W. Cohen
Download