Evaluating Similarity Measures:
A Large-Scale Study in the orkut
Social Network
Ellen Spertus
spertus@google.com
Recommender systems
• What are they?
• Example: Amazon
Controversial recommenders
“What to do when your TiVo thinks you’re
gay”, Wall Street Journal, Nov. 26, 2002
http://tinyurl.com/2qyepg
Controversial recommenders
Wal-Mart DVD recommendations
http://tinyurl.com/2gp2hm
Google’s mission
To organize the world's information and
make it universally accessible and useful.
[Chart: growth of orkut members and communities, 1/28/2004–10/28/2004; y-axis 0–3,500,000]
Community recommender
• Goal: Per-community ranked recommendations
• How to determine?
– Implicit collaborative filtering
– Look for common membership between pairs of communities (sketched below)
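A minimal sketch of this co-membership computation, assuming communities are available as plain sets of member IDs (the data layout, names, and toy data are illustrative, not orkut's actual pipeline):

# Minimal sketch: given community membership sets, compute the overlap
# |B ∩ R| that all of the similarity measures below are built on.
from itertools import combinations

# Hypothetical toy data: community name -> set of member IDs
members = {
    "Pizza":  {1, 2, 3, 4, 5},
    "Cheese": {2, 3, 5, 8},
    "Linux":  {5, 9},
}

overlap = {}
for b, r in combinations(members, 2):
    common = len(members[b] & members[r])
    if common:                      # only co-memberships are informative
        overlap[(b, r)] = common

print(overlap)  # {('Pizza', 'Cheese'): 3, ('Pizza', 'Linux'): 1, ('Cheese', 'Linux'): 1}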
Terminology
• Consider each community to be a set of
members
– B: base community (e.g., “Pizza”)
– R: related community (e.g., “Cheese”)
• Similarity measure
– Based on overlap |B∩R|
Example: Pizza
Terminology
• Consider each community to be a set of
members
– B: base community (e.g., “Wine”)
– R: related community (e.g., “Linux”)
• Similarity measure
– Based on overlap |B∩R|
– Also depends on |B| and |R|
– Possibly asymmetric
Example of asymmetry
[Diagram: "Stanford" (2,756 members) and "Stanford Class of 2006" (52 members) share an overlap of 5 members]
Similarity measures
• L1 normalization
• L2 normalization
• Pointwise mutual information
– Positive correlations
– Positive and negative correlations
• Salton tf-idf
• Log-odds
L1 normalization
• Vector notation
• Set notation
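The slide's formula was an image; a hedged reconstruction in the standard form, treating B and R as 0/1 membership vectors (equivalently, as member sets):

\[
  \mathrm{L1}(B,R) \;=\; \frac{B \cdot R}{\lVert B \rVert_1 \, \lVert R \rVert_1}
  \;=\; \frac{|B \cap R|}{|B|\,|R|}
\]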
L2 normalization
• Vector notation
• Set notation
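Again a hedged reconstruction (the slide showed an image); the L2-normalized form divides the overlap by the geometric mean of the community sizes:

\[
  \mathrm{L2}(B,R) \;=\; \frac{B \cdot R}{\lVert B \rVert_2 \, \lVert R \rVert_2}
  \;=\; \frac{|B \cap R|}{\sqrt{|B|\,|R|}}
\]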
Mutual information:
positive correlation
• Formally,
• Informally, how well membership in the base
community predicts membership in the
related community
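The formula itself was an image; one standard form consistent with the informal description, treating membership in B and R as binary events over the N orkut members (this exact parameterization is an assumption):

\[
  \mathrm{MI1}(B,R) \;=\; P(b,r)\,\log\frac{P(b,r)}{P(b)\,P(r)},
  \qquad
  P(b,r)=\frac{|B \cap R|}{N},\;\; P(b)=\frac{|B|}{N},\;\; P(r)=\frac{|R|}{N}
\]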
Mutual information:
positive and negative correlation
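As above, a hedged reconstruction: this variant also credits negative correlation by adding the analogous term for joint non-membership:

\[
  \mathrm{MI2}(B,R) \;=\; P(b,r)\,\log\frac{P(b,r)}{P(b)\,P(r)}
  \;+\; P(\bar b,\bar r)\,\log\frac{P(\bar b,\bar r)}{P(\bar b)\,P(\bar r)}
\]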
Salton tf-idf
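The slide showed only an image of the formula; a common way to adapt Salton's tf-idf weighting to set overlap (offered as an assumption, not necessarily the exact form used here) treats |B ∩ R| / |B| as the term frequency and log(N / |R|) as the inverse document frequency, with N the total number of members:

\[
  \mathrm{IDF}(B,R) \;=\; \frac{|B \cap R|}{|B|}\,\log\frac{N}{|R|}
\]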
LogOdds0
• Formally,
• Informally, how much likelier a member of
B is to belong to R than a non-member of
B is.
• This yielded the same rankings as L1.
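Written out from the informal description above (a sketch; b̄ denotes non-membership in B):

\[
  \mathrm{LogOdds0}(B,R) \;=\; \log\frac{P(r \mid b)}{P(r \mid \bar b)}
\]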
LogOdds
Predictions?
• Were there significant differences among
the measures?
– Top-ranked recommendations
– User preference
• Which measure was “best”?
• Was there a partial or total ordering of
measures?
Recommendations for
“I love wine” (2400)
Experiment
• Precomputed top 12 recommendations for each base community for each similarity measure
• When a user views a community page:
– Hash the community and user ID to select an ordered pair of measures
– Interleave the two measures' recommendations, filtering out duplicates (see the sketch below)
• Track clicks of new users
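A sketch of this serving logic, assuming precomputed per-measure lists; the hash scheme, function names, and the measure list are illustrative, not the production implementation:

import hashlib
from itertools import permutations

MEASURES = ["L1", "L2", "MI1", "MI2", "IDF", "LogOdds"]
PAIRS = list(permutations(MEASURES, 2))            # all ordered pairs of measures

def pick_measure_pair(community_id: int, user_id: int) -> tuple:
    # Deterministically map (community, user) to one ordered pair of measures.
    digest = hashlib.sha1(f"{community_id}:{user_id}".encode()).hexdigest()
    return PAIRS[int(digest, 16) % len(PAIRS)]

def interleave(recs_a, recs_b, k=12):
    # Alternate the two precomputed ranked lists, dropping duplicates.
    merged, seen = [], set()
    for a, b in zip(recs_a, recs_b):
        for rec in (a, b):
            if rec not in seen:
                seen.add(rec)
                merged.append(rec)
    return merged[:k]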
Click interpretation
[Table: interpretation of a click, by the clicker's membership in the base community (member / non-member) and the outcome in the related community; cells marked +, ?, or ??]
Overall click rate (July 1-18)
Total recommendation pages generated: 4,106,050
Analysis
For each pair of similarity measures Ma
and Mb and each click C, either:
• Ma recommended C more highly than Mb
• Ma and Mb recommended C equally
• Mb recommended C more highly than Ma
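A minimal sketch of this tally, assuming each logged click carries the two measures being compared and the clicked community's rank under each (field layout is illustrative):

from collections import defaultdict

def tally(clicks):
    # clicks: iterable of (measure_a, measure_b, rank_a, rank_b) tuples, where
    # rank_* is the clicked community's position under that measure
    # (smaller = recommended more highly, None = not recommended at all).
    counts = defaultdict(lambda: {"a_higher": 0, "equal": 0, "b_higher": 0})
    for ma, mb, rank_a, rank_b in clicks:
        ra = rank_a if rank_a is not None else float("inf")
        rb = rank_b if rank_b is not None else float("inf")
        if ra < rb:
            counts[(ma, mb)]["a_higher"] += 1   # Ma recommended C more highly
        elif rb < ra:
            counts[(ma, mb)]["b_higher"] += 1   # Mb recommended C more highly
        else:
            counts[(ma, mb)]["equal"] += 1      # both ranked C equally
    return counts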
Results
• Clicks leading to joins
L2 » MI1 » MI2 » IDF › L1 » LogOdds
• All clicks
L2 » L1 » MI1 » MI2 › IDF » LogOdds
Positional effects
• Original experiment
– Ordered recommendations by rank
• Second experiment
– Generated recommendations using L2
– Pseudo-randomly ordered recommendations,
tracking clicks by placement
– Tracked 1.3 M clicks between September 22 and October 21
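A sketch of the randomized placement, assuming the recommendations are laid out in a grid of three columns (grid shape and names are assumptions):

import random
from collections import Counter

def serve_randomized(recs, user_id, cols=3):
    # Pseudo-randomly (but reproducibly per user) shuffle the L2 recommendations
    # and return a mapping of grid position -> community for click logging.
    rng = random.Random(user_id)
    shuffled = list(recs)
    rng.shuffle(shuffled)
    return {(i // cols, i % cols): c for i, c in enumerate(shuffled)}

clicks_by_position = Counter()    # incremented by the click handler

def log_click(position):
    clicks_by_position[position] += 1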
Results: single row (n=28,108)
[Screenshot: a single row of recommendations for the community "Namorado Para o Bulldog"]
Relative click rates by position: 1.00, 1.01, .98 (p = .12, not significant)
Results: two rows (n=24,459)
Relative click rates by position: 1.04, .97, 1.05, .94, 1.08, .92 (p < .001)
Results: three rows (n=1,226,659)
Relative click rates by position: 1.11, 1.01, 1.01, 1.06, .97, .94, 1.04, .99, .87 (p < .001)
Users’ reactions
• Hundreds of requests per day to add
recommendations
• Angry requests from community creators
– General
– Specific
Amusing recommendations
• Base community: C++
• Recommended: "For every time a woman has confused you…"
• What's she trying to say?
Amusing recommendations
• Base community: Chocolate
• Recommended: PMS
Allowing community owners to set
recommendations
Manual recommendations
• Eight days after release
– 50,876 community owners
– Added 267,623 recommendations
– Deleted 59,599 recommendations
– Affecting 73,230 base communities and 111,936 related communities
• Open question: How do they compare with
automatic recommendations?
Future research 1
Determining similar users based on
common communities
– Is it useful?
– Will the measures make the same total order?
Other types of information
• Distance in
social network
• Demographic
– Country
– Age
– Etc.
Future research 2
Per-user community recommendations
– Using social network information
– Using profile information (e.g., country)
Future research 3
Do we get the same ordering for other
domains?
L2 » MI1 » MI2 » IDF › L1 » LogOdds
Acknowledgments
• Mehran Sahami
• Orkut Buyukkokten
• orkut team
Bonus material
Self-rated beauty
• "beauty contest winners"
• "very attractive"
• "attractive"
• "average"
• "mirror-cracking material"
Self-rated beauty: men
• "beauty contest winners": 8%
• "very attractive": 18%
• "attractive": 39%
• "average": 24%
• "mirror-cracking material": 11%
Self-rated beauty: women
• "beauty contest winners": 8%
• "very attractive": 16%
• "attractive": 39%
• "average": 27%
• "mirror-cracking material": 9%
Self-rated beauty by country
• Most beautiful
– men: Syrian
– women: Barbadian
• Least beautiful
– men: Gambian
– women: Ascension Islanders
Ratings by others
• Karma
– trustiness
– sexiness
– coolness
• How do these correlate with age?
Ratings by others
Friend counts
Self-rated best body part