SRS 2013
4th International Workshop on Social Recommender Systems
Co-located with WWW 2013
Rio de Janeiro, Brazil
Recommending Collaborators Using Keywords
Lior Ebel
Sara Cohen
The Hebrew University of Jerusalem
Collaborator Recommendation
[Figure: a small co-authorship network of Alice, Bob, Julia and David (2012 and 2013), with edges labeled by publication titles such as "An XML Search Engine" and "Probability in XML"]
• Classic link prediction/recommendation: recommend Julia for Alice
– Recommend(Alice) = {Julia}
• Collaborator prediction/recommendation: recommend Julia for Alice and a specific topic
– Recommend(Alice, "Probability in Databases") = {Julia}
Motivation
• A new researcher (such as myself ), can benefit from
recommended collaborators on a desired topic
according to the social graph
• Important in cross-domain collaborations
• Grant regulations
• A foundation for the more generic context-based
people recommendation / context-based link
prediction problem, where given a source s and
textual context k, we recommend/predict target nodes t
for a link of context k
Take Home Message
• A new problem variant
– Define the collaborator recommendation problem
– Not addressed before in the literature
• Scoring functions
– Empirical results for several structural, textual and importance-based scoring functions
– Evaluated on two large real-world, DBLP-based co-authorship networks
• Results
– A sophisticated hybrid scoring function based on structural and textual measures outperforms the baselines
– Our ranking function is effective, most clearly when predicting new collaborators
Problem Definition
• Author node attributes
– Profile(v): a bag of words over the titles of all of v's publications
• Co-authorship edge attributes
– Label(e): publication title
– Time(e): publication year
– Setting(e): publication venue/journal
• A query q = (s, k)
– s: the source node in the network (the querying
author)
– k: set of keywords (e.g. desired topic or future
publication title)
• A scoring function score(u,q)
– score(u,q) > score(v,q) → u is more likely than v to form a collaboration with s on a topic described by k
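The definitions above can be sketched as a small data model. This is a minimal sketch, not the authors' code: the class and function names (`Publication`, `Query`, `CoauthorNetwork`, `rank_candidates`), the lowercase whitespace tokenization of titles, and the toy publications are all hypothetical. Edges are derived from publications, so each co-author pair carries the title, year and venue (Label, Time, Setting), and any score(u, q) function can be plugged into the ranking.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, Dict, List, Set, Tuple


@dataclass(frozen=True)
class Publication:
    title: str                # Label(e) for every co-author pair
    year: int                 # Time(e)
    venue: str                # Setting(e)
    authors: Tuple[str, ...]


@dataclass(frozen=True)
class Query:
    source: str                # s: the querying author
    keywords: Tuple[str, ...]  # k: desired topic or future title words


class CoauthorNetwork:
    """Hypothetical container: authors are nodes, co-authorships are edges."""

    def __init__(self, publications: List[Publication]) -> None:
        self.publications = publications
        # edges[(a, b)] holds every publication that a and b co-authored (a < b)
        self.edges: Dict[Tuple[str, str], List[Publication]] = {}
        for pub in publications:
            for a, b in combinations(sorted(set(pub.authors)), 2):
                self.edges.setdefault((a, b), []).append(pub)

    def authors(self) -> Set[str]:
        return {a for pub in self.publications for a in pub.authors}

    def profile(self, v: str) -> Counter:
        """Profile(v): bag of words over the titles of all of v's publications."""
        words: Counter = Counter()
        for pub in self.publications:
            if v in pub.authors:
                words.update(pub.title.lower().split())
        return words


ScoreFn = Callable[[str, Query], float]


def rank_candidates(net: CoauthorNetwork, q: Query, score: ScoreFn,
                    top_n: int = 10) -> List[str]:
    """Return the top_n authors u != s, ordered by decreasing score(u, q)."""
    candidates = [u for u in net.authors() if u != q.source]
    # Ties are broken alphabetically so the output is deterministic
    candidates.sort(key=lambda u: (-score(u, q), u))
    return candidates[:top_n]


# Toy data, invented for illustration only
pubs = [
    Publication("An XML Search Engine", 2012, "VenueA", ("Alice", "Bob")),
    Publication("Probability in XML", 2012, "VenueB", ("Bob", "Julia")),
    Publication("Graph Mining", 2012, "VenueC", ("Bob", "David")),
]
net = CoauthorNetwork(pubs)
q = Query("Alice", ("probability", "xml"))


def keyword_overlap(u: str, q: Query) -> float:
    # Placeholder score: how often the query keywords occur in Profile(u)
    prof = net.profile(u)
    return float(sum(prof[w] for w in q.keywords))


print(rank_candidates(net, q, keyword_overlap))  # ['Bob', 'Julia', 'David']
```

Keeping publications (rather than bare edges) as the primary record means a three-author paper contributes its title to each author's profile once, while still producing one edge per co-author pair.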
Structural Scoring Functions
• Distance variants: Score(u,q) = 1/distance(s,u)
– Simple distance
– Weighted by time: weight(e) = 1/log(age(e)), where age(e) = current year - time(e)
– Weighted by publication frequency: weight(e) = 1/Mutual(e), where Mutual(e) = number of joint publications of the two authors of e
• Adamic-Adar (Social Networks, 2001)
– Score(u,q) = a weighted sum over the mutual neighbors of s and u, where each mutual neighbor v has weight 1/log(N(v)) and N(v) is the number of neighbors of v
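A sketch of these structural scores, assuming an undirected co-authorship graph stored as a symmetric adjacency dict whose values are Mutual(e). Distances use Dijkstra's algorithm with a pluggable edge length, shown for the simple and publication-frequency variants. The time-weighted variant is left out because 1/log(age(e)) is undefined for age(e) of 0 or 1, and the slide does not say how such edges are handled. All names and the toy graph are hypothetical.

```python
import heapq
import math
from typing import Callable, Dict

# Hypothetical toy graph: mutual[a][b] = Mutual(e), the number of joint
# publications of a and b. The adjacency must be symmetric.
Graph = Dict[str, Dict[str, int]]
EdgeLength = Callable[[str, str], float]


def shortest_distances(graph: Graph, source: str,
                       length: EdgeLength) -> Dict[str, float]:
    """Dijkstra's algorithm; edge lengths must be positive."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v in graph[u]:
            nd = d + length(u, v)
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def distance_score(graph: Graph, s: str, u: str, length: EdgeLength) -> float:
    """score(u, q) = 1 / distance(s, u); 0 if u is unreachable (u != s assumed)."""
    d = shortest_distances(graph, s, length).get(u, math.inf)
    return 0.0 if math.isinf(d) else 1.0 / d


def adamic_adar(graph: Graph, s: str, u: str) -> float:
    """Sum of 1 / log(N(v)) over the common neighbors v of s and u (u != s)."""
    common = set(graph[s]) & set(graph[u])
    # Each common neighbor touches both s and u, so N(v) >= 2 and log(N(v)) > 0
    return sum(1.0 / math.log(len(graph[v])) for v in common)


mutual: Graph = {
    "Alice": {"Bob": 1},
    "Bob": {"Alice": 1, "Julia": 3, "David": 1},
    "Julia": {"Bob": 3, "David": 1},
    "David": {"Bob": 1, "Julia": 1},
}


def hops(a: str, b: str) -> float:
    return 1.0  # simple distance: every edge has length 1


def by_frequency(a: str, b: str) -> float:
    return 1.0 / mutual[a][b]  # weight(e) = 1/Mutual(e)


print(distance_score(mutual, "Alice", "Julia", hops))          # 0.5
print(distance_score(mutual, "Alice", "Julia", by_frequency))  # 0.75
print(adamic_adar(mutual, "Alice", "Julia"))                   # 1/ln(3), about 0.91
```

Passing the edge length as a function keeps one shortest-path routine for all distance variants; only the weight definition changes.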
Textual Scoring Functions
• TF-IDF
– Score(u,q) = tf-idf(k, profile(u))
• COLLAB (developed in this paper)
– Step 1: Score(u,q) = a weighted sum of u’s
publications, considering:
• Textual score for the publication for k
• Publication age
• Publication venue (did s publish in it?)
• Publication participants (did s publish with them?)
– Step 2: apply the unseen-bigrams approach to the results of step 1 (Liben-Nowell and Kleinberg, 2007)
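A hedged sketch of the two textual ingredients. The tf-idf variant used here (relative term frequency, idf = log(N/df) over author profiles) is one common choice and an assumption, since the slide does not fix one. The second function follows the weighted unseen-bigrams formulation of Liben-Nowell and Kleinberg: keep the δ best-scored nodes and rescore each candidate by the base scores of its neighbors in that set. Because COLLAB's step-1 weights (age, venue, shared co-authors) are not given, the sketch feeds plain tf-idf scores into step 2 instead. Names and data are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Set


def tfidf_score(keywords: Iterable[str], author: str,
                profiles: Dict[str, Counter]) -> float:
    """tf-idf(k, profile(u)), assuming tf = relative frequency in profile(u)
    and idf = log(N / df), N = number of authors, df = profiles containing w."""
    profile = profiles[author]
    total = sum(profile.values())
    if total == 0:
        return 0.0
    n = len(profiles)
    score = 0.0
    for w in keywords:
        df = sum(1 for p in profiles.values() if p[w] > 0)
        if df:
            score += (profile[w] / total) * math.log(n / df)
    return score


def unseen_bigrams(base: Dict[str, float], neighbors: Dict[str, Set[str]],
                   delta: int) -> Dict[str, float]:
    """Weighted unseen bigrams: keep the delta best-scored nodes S, then
    rescore each y as the sum of base[z] over z in N(y) intersected with S."""
    top = sorted(base, key=lambda z: (-base[z], z))[:delta]
    return {y: sum(base[z] for z in top if z in neighbors[y]) for y in neighbors}


profiles = {
    "Bob": Counter("xml search engine".split()),
    "Julia": Counter("probability in xml".split()),
    "David": Counter("graph mining".split()),
}
neighbors = {"Bob": {"Julia", "David"}, "Julia": {"Bob"}, "David": {"Bob"}}
keywords = ["probability", "xml"]

base = {u: tfidf_score(keywords, u, profiles) for u in profiles}
smoothed = unseen_bigrams(base, neighbors, delta=1)
print(sorted(base, key=base.get, reverse=True))  # ['Julia', 'Bob', 'David']
print(smoothed["Bob"] == base["Julia"])           # True
```

In the toy run, Bob's profile barely matches the keywords, but after smoothing he inherits the score of his neighbor Julia, the best textual match; this is the intended effect of step 2.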
Combining Scoring Functions
• Linear combinations
• Re-ranking
• Borda normalization
• SocScore: a linear combination of re-ranking functions:
– First ranking: structural (e.g., time-weighted distance, Adamic-Adar)
– Second ranking: text-based
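The slide names these combination techniques without defining them, so the sketch below assumes one plausible reading of each: Borda normalization turns a ranking of n items into points (n - i)/n, re-ranking keeps the top-m items of a structural ranking and reorders them by a text score, and a SocScore-like score is a weighted sum of the Borda points of several re-rankings. The cutoff m, the weights and the toy scores are hypothetical, not the values used in the paper.

```python
from typing import Dict, List, Sequence


def rank_by(scores: Dict[str, float]) -> List[str]:
    """Order items by decreasing score, breaking ties alphabetically."""
    return sorted(scores, key=lambda u: (-scores[u], u))


def borda(ranking: Sequence[str]) -> Dict[str, float]:
    """Borda normalization: the item at 0-based position i of n gets (n - i) / n."""
    n = len(ranking)
    return {item: (n - i) / n for i, item in enumerate(ranking)}


def rerank(first: Dict[str, float], second: Dict[str, float], m: int) -> List[str]:
    """Keep the top-m items of the first ranking, reorder them by the second score."""
    head = rank_by(first)[:m]
    return sorted(head, key=lambda u: (-second.get(u, 0.0), u))


def linear_combination(rankings: Sequence[Sequence[str]],
                       weights: Sequence[float]) -> List[str]:
    """Weighted sum of Borda points; items missing from a ranking get 0 points."""
    combined: Dict[str, float] = {}
    for ranking, w in zip(rankings, weights):
        for item, points in borda(ranking).items():
            combined[item] = combined.get(item, 0.0) + w * points
    return rank_by(combined)


# Invented toy scores: two structural functions and one text-based function
tdist = {"Bob": 1.0, "Julia": 0.5, "David": 0.5, "Eve": 0.33}
ad = {"Julia": 0.9, "David": 0.9, "Bob": 0.0, "Eve": 0.2}
text = {"Julia": 0.5, "Eve": 0.3, "Bob": 0.14, "David": 0.0}

r1 = rerank(tdist, text, m=3)  # ['Julia', 'Bob', 'David']
r2 = rerank(ad, text, m=3)     # ['Julia', 'Eve', 'David']
print(linear_combination([r1, r2], [0.6, 0.4]))  # ['Julia', 'Bob', 'David', 'Eve']
```

Combining Borda points rather than raw scores sidesteps the fact that distance, Adamic-Adar and text scores live on very different scales.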
Results – All Collaborators
Scoring function   TopScore@10   Recall@10   MRR
dist               62.5          48.1        0.349
cdist              62.5          47.63       0.443
tdist              66.0          55.15       0.444
ad                 6.0           3.30        0.027
tfidf              38.0          27.57       0.235
collab             27.5          22.86       0.185
SocScore           66.5          55.43       0.425
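For reference, the Recall@10 and MRR columns can be computed as below under their standard definitions: each query pairs a ranked list with the set of actual (held-out) collaborators, and the per-query values are averaged. The table appears to report Recall@10 as a percentage; TopScore@10 is not defined on these slides, so it is left out. Function names and data are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the actual collaborators that appear in the top-k list."""
    if not relevant:
        return 0.0
    hits = sum(1 for u in ranked[:k] if u in relevant)
    return hits / len(relevant)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / (1-based position of the first actual collaborator), or 0 if none."""
    for position, u in enumerate(ranked, start=1):
        if u in relevant:
            return 1.0 / position
    return 0.0


def evaluate(queries: List[Tuple[Sequence[str], Set[str]]],
             k: int = 10) -> Tuple[float, float]:
    """Mean Recall@k and MRR over (ranked list, actual collaborators) pairs."""
    n = len(queries)
    recall = sum(recall_at_k(r, rel, k) for r, rel in queries) / n
    mrr = sum(reciprocal_rank(r, rel) for r, rel in queries) / n
    return recall, mrr


# Invented example: two queries with their held-out collaborators
queries = [
    (["Julia", "Bob", "David"], {"Bob"}),
    (["Eve", "Dan"], {"Zoe", "Dan"}),
]
print(evaluate(queries))  # (0.75, 0.5)
```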
Results – Only New Collaborators
Scoring function   TopScore@10   Recall@10   MRR
dist               10.0          7.14        0.046
cdist              12.5          8.91        0.065
tdist              14.5          11.47       0.064
ad                 15.0          11.26       0.067
tfidf              10.0          7.64        0.054
collab             16.0          11.65       0.081
SocScore           20.5          16.09       0.113
Conclusion
• We presented a novel problem definition
• We examined scoring functions and their
combinations, and developed an effective
function
• Future work:
– Incorporating abstracts
– Incorporating machine learning
– Most promising: the generic context-based link prediction/person recommendation problem
Thank you!
Questions?