SRS 2013
4th International Workshop on Social Recommender Systems
Co-located with WWW 2013, Rio de Janeiro, Brazil

Recommending Collaborators Using Keywords
Lior Ebel, Sara Cohen
The Hebrew University of Jerusalem

Collaborator Recommendation
[Figure: co-authorship graph with author nodes Alice, Bob, David and Julia; edges are labeled with publication titles (e.g., "An XML Search Engine", "Probability in XML") and publication years (2012, 2013).]
Classic link prediction/recommendation: recommend Julia for Alice
  Recommend(Alice) = {Julia}
Collaborator prediction/recommendation: recommend Julia for Alice and the specific topic
  Recommend(Alice, "Probability in Databases") = {Julia}

Motivation
• A new researcher (such as myself) can benefit from recommended collaborators on a desired topic, chosen according to the social graph
• Important in cross-domain collaborations
• Grant regulations
• A foundation for the more generic context-based people recommendation / context-based link prediction problem: given a source s and a textual context k, recommend/predict target nodes t for a link with context k

Take Home Message
• A new problem variant
  – Define the collaborator recommendation problem
  – Not addressed before in the literature
• Scoring functions
  – Empirical results for several structural, textual and importance-based scoring functions
  – Two large real-world DBLP-based co-authorship networks
• Results
  – A sophisticated hybrid scoring function based on structural and textual measures outperforms the baselines
  – Our ranking function is effective

Problem Definition
• Author node attributes
  – Profile(v): a bag of words of all publication titles
• Co-authorship edge attributes
  – Label(e): publication title
  – Time(e): publication year
  – Setting(e): publication venue/journal
• A query q=(s,k)
  – s: the source node in the network (the querying author)
  – k: a set of keywords (e.g., a desired topic or a future publication title)
• A function score(u,q)
  – score(u,q) > score(v,q) → u is more likely than v to form a collaboration with s described by k

Structural Scoring Functions
• Distance variants: score(u,q) = 1/distance(s,u)
  – Simple distance
  – Weighted by time: weight(e) = 1/log(age(e)), where age(e) = current year – Time(e)
  – Weighted by publication frequency: weight(e) = 1/Mutual(e), where Mutual(e) = number of mutual publications of the authors of e
• Adamic-Adar (Social Networks, 2001)
  – score(u,q) = a weighted sum over the mutual neighbors of s and u
  – Each mutual neighbor v has weight 1/log(N(v)), where N(v) is the number of neighbors of v

Textual Scoring Functions
• TF-IDF
  – score(u,q) = tf-idf(k, Profile(u))
• COLLAB (developed in this paper)
  – Step 1: score(u,q) = a weighted sum over u's publications, considering:
    • Textual score of the publication for k
    • Publication age
    • Publication venue (did s publish in it?)
    • Publication participants (did s publish with them?)
  – Step 2: apply the unseen-bigrams approach to the results of step 1 (Kleinberg et al., 2007)

Combining Scoring Functions
• Linear combinations
• Re-ranking
• Borda normalizations
• SoCScore: a linear combination of re-ranking functions
  – First ranking: structural (e.g., time-based distance, Adamic-Adar)
  – Second ranking: text-based

Results – All Collaborators
Method     TopScore@10   Recall@10   MRR
dist       62.5          48.1        0.349
cdist      62.5          47.63       0.443
tdist      66.0          55.15       0.444
ad         6.0           3.30        0.027
tfidf      38.0          27.57       0.235
collab     27.5          22.86       0.185
SoCScore   66.5          55.43       0.425

Results – Only New Collaborators
Method     TopScore@10   Recall@10   MRR
dist       10.0          7.14        0.046
cdist      12.5          8.91        0.065
tdist      14.5          11.47       0.064
ad         15.0          11.26       0.067
tfidf      10.0          7.64        0.054
collab     16.0          11.65       0.081
SoCScore   20.5          16.09       0.113

Conclusion
• We presented a novel problem definition
• We examined scoring functions and their combinations, and developed an effective function
• Future work:
  – Incorporating abstracts
  – Incorporating machine learning
  – Most promising: the generic context-based link prediction/person recommendation problem

Thank you! Questions?
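
Appendix sketch: to make the structural scoring functions from the slides concrete, the following Python sketch computes the simple distance score, 1/distance(s,u), and the Adamic-Adar score on a small co-authorship graph. The toy graph and its edges, the function names (bfs_distances, distance_score, adamic_adar_score), and the use of the natural logarithm are all assumptions made for illustration; the slides do not specify them, and this is not the authors' implementation.

```python
import math
from collections import deque

# Hypothetical toy co-authorship graph (undirected), stored as adjacency sets.
# The edges are assumed for illustration and are not taken from the paper.
GRAPH = {
    "Alice": {"Bob", "David"},
    "Bob": {"Alice", "Julia"},
    "David": {"Alice", "Julia"},
    "Julia": {"Bob", "David"},
}


def bfs_distances(graph, source):
    """Return the hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def distance_score(graph, source, candidate):
    """Simple distance score: 1 / distance(s, u).

    Returns 0.0 when the candidate is unreachable or equals the source.
    """
    d = bfs_distances(graph, source).get(candidate)
    return 1.0 / d if d else 0.0


def adamic_adar_score(graph, source, candidate):
    """Sum of 1 / log(N(v)) over the mutual neighbors v of s and u.

    N(v) is the number of neighbors of v. The natural logarithm is an
    assumption; neighbors with N(v) <= 1 are skipped to avoid dividing by zero.
    """
    mutual = graph[source] & graph[candidate]
    return sum(
        1.0 / math.log(len(graph[v])) for v in mutual if len(graph[v]) > 1
    )


if __name__ == "__main__":
    print(distance_score(GRAPH, "Alice", "Julia"))
    print(adamic_adar_score(GRAPH, "Alice", "Julia"))
```

On this toy graph, Julia is two hops from Alice, so her distance score is 0.5, and the two mutual neighbors (Bob and David) each contribute 1/log(2) to her Adamic-Adar score. The weighted distance variants from the slides would replace the breadth-first search with a shortest-path search over edge weights.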