Advisor-advisee Relationship Mining from Research Publication

Advisor-advisee Relationship Mining
from Research Publication Network
Chi Wang¹, Jiawei Han¹, Yuntao Jia¹, Jie Tang², Duo Zhang¹,
Yintao Yu¹, Jingyi Guo²
¹ University of Illinois at Urbana-Champaign
{chiwang1, hanj, yjia3, dzhang22, yintao}@illinois.edu
² Tsinghua University
{jietang, guojy07@mails}.tsinghua.edu.cn
Motivation
• Latent knowledge in information networks:
– Relationships: friends, relatives, colleagues, enemies?
• If these relationships can be mined from links, it will benefit
the study of:
– Community structure → clustering & classification
– Expert searching → search & ranking
– Evolution patterns → prediction & recommendation
Overall Framework
• a_i: author i
• p_j: paper j
• py: paper year
• pn: number of papers
• st_{i,y_i}: starting time
• ed_{i,y_i}: ending time
• r_{i,y_i}: ranking score
Heuristics
• ASSUMPTION 1: at each time t during the
publication history of a node x, x is either
being advised or not being advised. Once x
starts to advise another node, it will never be
advised again.
• ASSUMPTION 2: for a given pair of advisor and
advisee, the advisor always has a longer
publication history than the advisee.
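As a rough illustration of how Assumption 2 can prune candidates, the sketch below keeps a coauthor as a candidate advisor only if that coauthor started publishing earlier. Reading "longer publication history" as "earlier first publication year" is an assumption made here, and the names `candidate_advisors` and `first_pub_year` and the toy data are hypothetical.

```python
def candidate_advisors(author, coauthors, first_pub_year):
    """Return coauthors whose publication history starts before the author's.

    Assumption (not from the slides): "longer publication history" is
    read as "strictly earlier first publication year".
    """
    start = first_pub_year[author]
    return sorted(c for c in coauthors if first_pub_year[c] < start)


# Hypothetical toy data: author -> year of first publication.
first_pub_year = {"a1": 1990, "a2": 2001, "a3": 2003, "a4": 2001}

# Only a1 started publishing before a2, so only a1 survives.
print(candidate_advisors("a2", ["a1", "a3", "a4"], first_pub_year))
```

An author who started in the same year (a4 here) is dropped as well, since the history is not strictly longer.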
Stage 1: Preprocessing
• From the author-paper bipartite network to a
homogeneous coauthor collaboration network.
• A filtering process then removes unlikely
advisor-advisee relations.
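The first preprocessing step can be sketched as follows: every paper links each pair of its authors, and each edge keeps the publication years and the number of joint papers per year. The input format (a list of `(year, authors)` tuples) and the function name `build_collaboration_network` are assumptions for this example, and the filtering step is not shown.

```python
from collections import defaultdict
from itertools import combinations


def build_collaboration_network(papers):
    """Collapse an author-paper bipartite network into a coauthor network.

    `papers` is a list of (year, [authors]) tuples (hypothetical format).
    Returns {(a, b): {year: number of joint papers}} with a < b.
    """
    edges = defaultdict(lambda: defaultdict(int))
    for year, authors in papers:
        # Each unordered author pair on a paper gets one joint publication.
        for a, b in combinations(sorted(set(authors)), 2):
            edges[(a, b)][year] += 1
    return {pair: dict(counts) for pair, counts in edges.items()}


# Hypothetical toy data.
papers = [
    (2001, ["a1", "a2"]),
    (2002, ["a1", "a2", "a3"]),
    (2002, ["a1", "a2"]),
]
network = build_collaboration_network(papers)
print(network[("a1", "a2")])  # {2001: 1, 2002: 2}
```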
Stage 1: Preprocessing
• Author a_j is not considered to be a_i’s advisor if
one of the following conditions holds:
Stage 1: Preprocessing
• In addition, estimate:
– the starting time st is estimated as the time a_i and a_j
started to collaborate;
– the ending time ed can be estimated, for instance, as the
time point when the Kulczynski measure starts to
decrease;
– the local likelihood l_ij of a_j being a_i’s advisor.
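A minimal sketch of the time estimates above, using the standard Kulczynski measure, i.e. the average of the two conditional probabilities P(A|B) and P(B|A). Several choices here are assumptions rather than details from the slides: the measure is computed on cumulative paper counts, the ending time is taken as the last year before the first decrease, and the names `kulczynski` and `estimate_interval` and the per-year count dictionaries are hypothetical.

```python
def kulczynski(n_joint, n_a, n_b):
    """Kulczynski measure: average of n_joint/n_a and n_joint/n_b."""
    if n_a == 0 or n_b == 0:
        return 0.0
    return 0.5 * (n_joint / n_a + n_joint / n_b)


def estimate_interval(joint, pubs_a, pubs_b):
    """Estimate (st, ed) for an author pair from per-year paper counts.

    Each argument maps year -> number of papers (hypothetical format).
    st: first year with a joint paper.
    ed: last year before the cumulative Kulczynski measure first drops,
        or the last joint year if it never drops (our assumption).
    """
    joint_years = sorted(y for y, n in joint.items() if n > 0)
    st, ed = joint_years[0], joint_years[-1]
    first = min(min(pubs_a), min(pubs_b))
    last = max(max(pubs_a), max(pubs_b))
    c_joint = c_a = c_b = 0
    prev = 0.0
    for year in range(first, last + 1):
        c_joint += joint.get(year, 0)
        c_a += pubs_a.get(year, 0)
        c_b += pubs_b.get(year, 0)
        kulc = kulczynski(c_joint, c_a, c_b)
        if year > st and kulc < prev:
            ed = year - 1
            break
        prev = kulc
    return st, ed


# Hypothetical toy data: a is the junior author, b the senior one.
pubs_a = {2001: 1, 2002: 2, 2003: 2, 2004: 2}
pubs_b = {2000: 2, 2001: 2, 2002: 2, 2003: 2, 2004: 2}
joint = {2001: 1, 2002: 2}
print(estimate_interval(joint, pubs_a, pubs_b))  # (2001, 2002)
```

In this toy case the measure rises to 0.75 in 2002 and falls to 0.4875 in 2003, once the junior author publishes without the senior one, so the estimated interval is 2001 to 2002.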
Stage 2: Graph Factor Model
• For each node a_i, there are three variables to
decide: y_i (the advisor of a_i), st_i, and ed_i.
Suppose a local feature function g(y_i, st_i, ed_i)
has already been defined on the three variables
of any given node.
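To make the joint decision concrete, here is a toy sketch that picks one (advisor, st, ed) candidate per author so that the product of local scores is maximal, while respecting Assumption 1 (an author's own advised period must end before that author starts advising). It uses exhaustive enumeration, which is only workable for tiny examples and is not claimed to be the inference procedure of the model in these slides. The candidate lists, the scores, and the function names are hypothetical, and the local likelihood stands in for g.

```python
from itertools import product

# Hypothetical candidates: author -> list of (advisor, st, ed, local score).
# Advisor None means "not advised by anyone in the network".
candidates = {
    "a1": [(None, 0, 0, 1.0)],
    "a2": [("a1", 1995, 1999, 0.9), (None, 0, 0, 0.1)],
    "a3": [("a2", 1998, 2002, 0.6), ("a1", 2000, 2003, 0.4)],
}


def is_consistent(assignment):
    """Assumption 1: an advisor's own advised period ends before it advises."""
    for advisor, st, _ed, _score in assignment.values():
        if advisor is None:
            continue
        adv_advisor, _adv_st, adv_ed, _adv_score = assignment[advisor]
        if adv_advisor is not None and adv_ed >= st:
            return False
    return True


def best_assignment(candidates):
    """Enumerate all joint choices and keep the consistent one with the
    highest product of local scores (exhaustive search, toy sizes only)."""
    authors = sorted(candidates)
    best, best_score = None, -1.0
    for choice in product(*(candidates[a] for a in authors)):
        assignment = dict(zip(authors, choice))
        if not is_consistent(assignment):
            continue
        score = 1.0
        for _advisor, _st, _ed, local in choice:
            score *= local
        if score > best_score:
            best, best_score = assignment, score
    return best, best_score


best, score = best_assignment(candidates)
print({author: choice[0] for author, choice in best.items()})
```

Here a3's locally best advisor is a2 (score 0.6), but a2 is itself still being advised until 1999, after a3's candidate start in 1998. The time constraint rules that combination out, so the joint optimum assigns a1 as advisor of both a2 and a3.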
Experiment Results
• DBLP data: 654,628 authors, 1,076,946
publications, with publication years provided.
Datasets  RULE          SVM           IndMAX                  TPFG
          (heuristics)  (supervised   empirical   optimized   empirical   optimized
                        learning)     parameter   parameter   parameter   parameter
TEST1     69.9%         73.4%         75.2%       78.9%       80.2%       84.4%
TEST2     69.8%         74.6%         74.6%       79.0%       81.5%       84.3%
TEST3     80.6%         86.7%         83.1%       90.9%       88.8%       91.3%
Case Study
Advisee        Top Ranked Advisor     Time   Note
David M. Blei  1. Michael I. Jordan   01-03  PhD advisor, 2004 grad
               2. John D. Lafferty    05-06  Postdoc, 2006
Hong Cheng     1. Qiang Yang          02-03  MS advisor, 2003
               2. Jiawei Han          04-08  PhD advisor, 2008
Sergey Brin    1. Rajeev Motwani      97-98  “Unofficial advisor”
Effect of rules - ROC curve
• Filtering rules in TPFG
THANK YOU