Evaluating Similarity Measures: A Large-Scale Study in the orkut Social Network
Ellen Spertus
spertus@google.com

Recommender systems
• What are they?
• Example: Amazon

Controversial recommenders
“What to do when your TiVo thinks you’re gay”, Wall Street Journal, Nov. 26, 2002
http://tinyurl.com/2qyepg

Controversial recommenders
Wal-Mart DVD recommendations
http://tinyurl.com/2gp2hm

Google’s mission
To organize the world's information and make it universally accessible and useful.

[Chart: orkut members and communities by month, 1/28/2004 to 11/28/2004; vertical axis 0 to 3,500,000]

Community recommender
• Goal: Per-community ranked recommendations
• How to determine?
– Implicit collaborative filtering
– Look for common membership between pairs of communities

Terminology
• Consider each community to be a set of members
– B: base community (e.g., “Pizza”)
– R: related community (e.g., “Cheese”)
• Similarity measure
– Based on overlap |B∩R|

Example: Pizza

Terminology
• Consider each community to be a set of members
– B: base community (e.g., “Wine”)
– R: related community (e.g., “Linux”)
• Similarity measure
– Based on overlap |B∩R|
– Also depends on |B| and |R|
– Possibly asymmetric

Example of asymmetry
Stanford (2756)
5
Stanford Class of 2006 (52)

Similarity measures
• L1 normalization
• L2 normalization
• Pointwise mutual information
– Positive correlations
– Positive and negative correlations
• Salton tf-idf
• Log-odds

L1 normalization
• Vector notation
• Set notation

L2 normalization
• Vector notation
• Set notation

Mutual information: positive correlation
• Formally: [formula not preserved]
• Informally, how well membership in the base community predicts membership in the related community
[Figure: table of b, b̄ against r, r̄]

Mutual information: positive and negative correlation
[Figure: table of b, b̄ against r, r̄]

Salton tf-idf

LogOdds0
• Formally: [formula not preserved]
• Informally, how much likelier a member of B is to belong to R than a non-member of B is.
• This yielded the same rankings as L1.

LogOdds

Predictions?
• Were there significant differences among the measures?
– Top-ranked recommendations
– User preference
• Which measure was “best”?
• Was there a partial or total ordering of measures?

Recommendations for “I love wine” (2400)

Experiment
• Precomputed top 12 recommendations for each base community for each similarity measure
• When a user views a community page
– Hash the community and user ID to
– Select an ordered pair of measures to
– Interleave, filtering out duplicates
• Track clicks of new users

Click interpretation
[Table: base community (member, non-member) by related community (M, n, j); cell entries as extracted: member: ?, +; non-member: ??, ??]

Overall click rate (July 1-18)
Total recommendation pages generated: 4,106,050

Analysis
For each pair of similarity measures Ma and Mb and each click C, either:
• Ma recommended C more highly than Mb
• Ma and Mb recommended C equally
• Mb recommended C more highly than Ma

Results
• Clicks leading to joins
L2 » MI1 » MI2 » IDF › L1 » LogOdds
• All clicks
L2 » L1 » MI1 » MI2 › IDF » LogOdds

Positional effects
• Original experiment
– Ordered recommendations by rank
• Second experiment
– Generated recommendations using L2
– Pseudo-randomly ordered recommendations, tracking clicks by placement
– Tracked 1.3M clicks between September 22 and October 21

Results: single row (n=28,108)
[Screenshot: “Namorado Para o Bulldog”]
1.00 1.01 .98
p=.12 (not significant)

Results: two rows (n=24,459)
1.04 .97 1.05 .94 1.08 .92
p < .001

Results: 3 rows (n=1,226,659)
1.11 1.01 1.01 1.06 .97 .94 1.04 .99 .87
p < .001

Users’ reactions
• Hundreds of requests per day to add recommendations
• Angry requests from community creators
– General
– Specific

Amusing recommendations
C++
What’s she trying to say? For every time a woman has confused you…

Amusing recommendations
Chocolate
PMS

Allowing community owners to set recommendations

Manual recommendations
• Eight days after release
– 50,876 community owners
– Added 267,623 recommendations
– Deleted 59,599 recommendations
– Affecting 73,230 base communities and
– 111,936 related communities
• Open question: How do they compare with automatic recommendations?

Future research 1
Determining similar users based on common communities
– Is it useful?
– Will the measures make the same total order?
(9 users)

Other types of information
• Distance in social network
• Demographic
– Country
– Age
– Etc.

Future research 2
Per-user community recommendations
– Using social network information
– Using profile information (e.g., country)

Future research 3
Do we get the same ordering for other domains?
L2 » MI1 » MI2 » IDF › L1 » LogOdds

Acknowledgments
• Mehran Sahami
• Orkut Buyukkokten
• orkut team

Bonus material

Self-rated beauty: men
• “beauty contest winners”
• “very attractive”
• “attractive”
• “average”
• “mirror-cracking material”
8% 18% 39% 24% 11%

Self-rated beauty: women
• “beauty contest winners”
• “very attractive”
• “attractive”
• “average”
• “mirror-cracking material”
8% 16% 39% 27% 9%

Self-rated beauty by country
• Most beautiful
– men: Syrian
– women: Barbadian
• Least beautiful
– men: Gambian
– women: Ascension Islanders

Ratings by others
• Karma
– trustiness
– sexiness
– coolness
• How do these correlate with age?

Friend counts

Self-rated best body part
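
Sketch: similarity measures and interleaving

The formulas on the "L1 normalization" and "L2 normalization" slides did not survive in this text, so the sketch below assumes the usual set-notation forms: overlap divided by |B|·|R| for L1, and overlap divided by sqrt(|B|·|R|) for L2. It also illustrates the "Experiment" steps (rank the top 12, hash the community and user ID to an ordered pair of measures, interleave, filter duplicates). All function names, the toy communities, and the use of SHA-256 are hypothetical choices for illustration, not a description of the orkut implementation.

```python
import hashlib
import math
from itertools import permutations, zip_longest


def l1_norm(base: set, related: set) -> float:
    """Assumed L1 form: overlap divided by the product of the set sizes."""
    return len(base & related) / (len(base) * len(related))


def l2_norm(base: set, related: set) -> float:
    """Assumed L2 form: overlap divided by sqrt(|B| * |R|)."""
    return len(base & related) / math.sqrt(len(base) * len(related))


# Only two of the measures from the slides are sketched here.
MEASURES = {"L1": l1_norm, "L2": l2_norm}


def top_recommendations(base_name, communities, measure, k=12):
    """Rank overlapping communities against the base and keep the top k."""
    base = communities[base_name]
    scored = [
        (measure(base, members), name)
        for name, members in communities.items()
        if name != base_name and base & members  # skip zero-overlap pairs
    ]
    # Highest score first; ties broken by name for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:k]]


def pick_measure_pair(community_id, user_id):
    """Hash (community, user) to a stable ordered pair of distinct measures."""
    pairs = list(permutations(sorted(MEASURES), 2))
    digest = hashlib.sha256(f"{community_id}:{user_id}".encode()).digest()
    return pairs[int.from_bytes(digest[:8], "big") % len(pairs)]


def interleave(first, second):
    """Alternate items from two ranked lists, dropping duplicates."""
    merged, seen = [], set()
    for a, b in zip_longest(first, second):
        for item in (a, b):
            if item is not None and item not in seen:
                seen.add(item)
                merged.append(item)
    return merged


# Toy data: sets of hypothetical member IDs.
communities = {
    "Pizza": {1, 2, 3, 4},
    "Cheese": set(range(1, 13)),  # large community containing all of Pizza
    "Crust": {4},                 # tiny community with one shared member
    "Linux": {20, 21},            # no overlap with Pizza
}

print(top_recommendations("Pizza", communities, l1_norm))  # ['Crust', 'Cheese']
print(top_recommendations("Pizza", communities, l2_norm))  # ['Cheese', 'Crust']

# The same (community, user) always maps to the same ordered pair.
name_a, name_b = pick_measure_pair("Pizza", 42)
shown = interleave(
    top_recommendations("Pizza", communities, MEASURES[name_a]),
    top_recommendations("Pizza", communities, MEASURES[name_b]),
)
print(shown)
```

On this toy data the two measures disagree: the assumed L1 form favors the tiny community, while L2 favors the large one with more shared members.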