Discovering Top-k Teams of Experts
with/without a Leader in Social Networks
Mehdi Kargar
Aijun An
York University, Toronto, Canada
CIKM’11
Discovering Top-k Teams of Experts with/without a Leader in Social Networks
Overview
• Team Formation in Social Networks
• Communication Cost
• Challenges in Finding Teams
• Approximation Algorithm for Finding Teams
• Enumerating Top-k Teams in Polynomial Delay
• Finding Teams with Leader
• Empirical Results
• Conclusion
Team of Experts
• Given a social network, find the top-k teams of experts that can collaborate effectively to complete a project.
• Each team may or may not have a leader.
Team of Experts
• Project: a set of required skills.
• Expert: an individual with a specific skill set.
• Social network: represents the strength of relationships (the degree of collaboration) between pairs of experts.
• For example: LinkedIn, DBLP and …
Example
Project = {AI, DB, DM, IR}
The number on each edge represents how easily the two experts can communicate; smaller numbers indicate better communication.
[Figure: weighted social network of five experts: Jack {IR}, Thomas {AI, DB, DM}, Jessie {DB}, and John and Susan, who between them hold {DB} and {AI, DM}.]
Are Jack and Thomas Able to Communicate Effectively?!
[Figure: the example network, with Jack {IR} and Thomas {AI, DB, DM} as a candidate team.]
What about Jack, John and Susan?
[Figure: the example network, with Jack, John and Susan as a candidate team.]
Team of Experts
• Team of experts: given a set of experts and a project that requires a set of skills {s1, s2, ..., sp}, a team of experts is a set of p skill-expert pairs:
{(s1, cs1), (s2, cs2), ..., (sp, csp)},
where csk is an expert having skill sk for k = 1, ..., p.
• A skill-expert pair (sk, csk) means that expert csk is
responsible for skill sk in the project.
• How can we make sure that the experts can communicate with each other?
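As a minimal sketch, a team can be stored as a mapping from each required skill sk to the expert csk responsible for it. The skill sets and expert names below are hypothetical; the check only encodes the definition above: exactly one expert per required skill, and that expert must hold the skill.

```python
# Hypothetical skill sets; names are illustrative only.
EXPERT_SKILLS = {
    "a": {"AI", "DM"},
    "b": {"DB"},
    "c": {"IR"},
}


def is_valid_team(project, team, expert_skills):
    """Check that `team` maps every required skill to one expert holding it."""
    if set(team) != set(project):
        return False  # a required skill is missing, or an extra skill is present
    return all(skill in expert_skills.get(expert, set())
               for skill, expert in team.items())


project = ["AI", "DB", "DM", "IR"]
team = {"AI": "a", "DB": "b", "DM": "a", "IR": "c"}  # one expert may cover several skills
print(is_valid_team(project, team, EXPERT_SKILLS))  # True
```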
Communication Cost
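• Consider a team of experts T = {(s1, cs1), ..., (sp, csp)} without a leader.
• The sum of distances of T is the sum of the shortest distances between the experts responsible for each pair of skills, i.e., the sum of d(csi, csj) over all 1 ≤ i < j ≤ p, where d is the shortest-path distance in the social network.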
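The sum of distances can be sketched in Python as follows, assuming the social network is an undirected graph with non-negative edge weights, so that Dijkstra's algorithm gives the shortest distances. The graph, team and helper names are hypothetical. The sketch reruns Dijkstra for every pair, which is fine for tiny examples; a real implementation would cache distances per source.

```python
import heapq
from itertools import combinations

# Hypothetical weighted, undirected network (smaller weight = easier communication).
EDGES = [("a", "b", 1), ("b", "c", 2), ("a", "c", 4), ("c", "d", 1)]


def build_graph(edges):
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, []).append((v, w))
        graph.setdefault(v, []).append((u, w))
    return graph


def dijkstra(graph, source):
    """Shortest distances from `source` to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def sum_of_distances(team, graph):
    """Sum of shortest distances between the experts of every pair of skills."""
    experts = list(team.values())
    return sum(dijkstra(graph, u).get(v, float("inf"))
               for u, v in combinations(experts, 2))


graph = build_graph(EDGES)
team = {"AI": "a", "DB": "c", "IR": "d"}
print(sum_of_distances(team, graph))  # d(a,c) + d(a,d) + d(c,d) = 3 + 4 + 1 = 8
```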
Benefit of Sum of Distances
• Previous work in this area defines two communication cost functions [Lappas et al.]:
• Diameter of the sub-graph: the largest shortest-path distance between any two nodes in the sub-graph
• Cost of the minimum spanning tree
• These measures have the following problems:
1) They do not consider the communication cost between every pair of skill holders.
2) Instability:
a) A slight change in the graph may result in a radical change in the solution.
b) On the other hand, they may be insensitive to adding, deleting or changing a connection in the graph, since they only measure part of the communication cost.
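The insensitivity in 2b) can be illustrated on a hypothetical three-member team. For simplicity the diameter is taken here as the largest pairwise shortest distance among the team members (an assumption; Lappas et al. measure it on the connecting sub-graph), and the distances are made up but satisfy the triangle inequality. Improving one connection that is not the longest leaves the diameter unchanged, while the sum of distances reflects it.

```python
from itertools import combinations


def diameter(members, d):
    """Largest pairwise shortest distance among the team members (simplified)."""
    return max(d[frozenset(pair)] for pair in combinations(members, 2))


def sum_of_distances(members, d):
    """Sum of pairwise shortest distances among the team members."""
    return sum(d[frozenset(pair)] for pair in combinations(members, 2))


members = ["a", "b", "c"]
# Hypothetical shortest distances before and after the a-b connection improves.
before = {frozenset("ab"): 3, frozenset("ac"): 5, frozenset("bc"): 4}
after = {**before, frozenset("ab"): 2}

print(diameter(members, before), diameter(members, after))  # 5 5
print(sum_of_distances(members, before), sum_of_distances(members, after))  # 12 11
```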
Problem 1
• Team Formation without a Leader:
Given a project P and a graph G representing the social network of a set of experts C, the problem of team formation without a leader is to find a team of experts T for P from G such that the communication cost of T, defined as the sum of distances of T, is minimized.
• What about finding the top-k teams of experts?
• A user may be interested in more than one team.
Challenges
• Theorem: Problem 1 is NP-hard.
• Proved in the paper by reduction from 3-satisfiability (3-SAT).
• Solution: an approximation algorithm with a guaranteed ratio.
• The total number of teams is exponential in the number of required skills.
• It is not efficient to generate all teams and then sort them.
• Solution: enumerate the teams with polynomial delay.
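To make the blow-up concrete: with Ci the holders of skill ki, there are |C1| × ... × |Cp| candidate teams, and an exhaustive search (the kind used later to obtain exact answers for comparison) must score every one of them. The sketch below is hypothetical: the holder lists are invented, and distances are taken as gaps between made-up positions on a line, which stand in for shortest-path distances.

```python
from itertools import combinations, product
from math import prod

# Hypothetical holders of each required skill (C1, C2, C3).
HOLDERS = {"k1": ["a", "b"], "k2": ["c", "d"], "k3": ["a", "e"]}
# Stand-in for shortest-path distances: gaps between made-up positions on a line.
POS = {"a": 0, "b": 5, "c": 1, "d": 6, "e": 2}


def dist(u, v):
    return abs(POS[u] - POS[v])


def sum_of_distances(experts):
    return sum(dist(u, v) for u, v in combinations(experts, 2))


def exhaustive_best(holders):
    """Score every team in C1 x C2 x ... x Cp and keep the cheapest."""
    skills = list(holders)
    best = min(product(*(holders[s] for s in skills)), key=sum_of_distances)
    return dict(zip(skills, best)), sum_of_distances(best)


print(prod(len(c) for c in HOLDERS.values()))  # 2 * 2 * 2 = 8 candidate teams
print(exhaustive_best(HOLDERS))  # ({'k1': 'a', 'k2': 'c', 'k3': 'a'}, 2)
```

Each extra required skill multiplies the number of candidates, which is why ranking all teams is avoided.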
Finding Best Approximate Team (without Leader)
• Step 1: for every expert (skill holder) n and every required skill ki, find the expert closest to n that holds ki.
• Step 2: for every expert n, calculate the sum of the distances from n to the closest holder of each required skill ki.
• Step 3: find the expert with the minimum sum of distances among all experts.
• Step 4: return the set of closest skill holders found for that expert.
• The approximation ratio of the algorithm is 2:
• the weight of the answer is at most twice the weight of the optimal answer.
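The four steps can be sketched in Python as follows. This is a minimal reading of the slide, not the authors' code: the graph, skill holders and function names are hypothetical, every skill holder is tried as the center n, ties are broken by sorted node name, and distances come from Dijkstra's algorithm (assuming non-negative weights). The sketch also reports the sum of distances of the returned team, which by the stated ratio should be at most twice the optimum.

```python
import heapq
from itertools import combinations

# Hypothetical network and skill holders.
EDGES = [("a", "b", 1), ("b", "c", 2), ("a", "c", 4),
         ("c", "d", 1), ("d", "e", 3), ("b", "e", 5)]
HOLDERS = {"k1": ["a", "e"], "k2": ["c"], "k3": ["b", "d"]}


def build_graph(edges):
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, []).append((v, w))
        graph.setdefault(v, []).append((u, w))
    return graph


def dijkstra(graph, source):
    """Shortest distances from `source` to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


def approx_best_team(graph, holders):
    """Try every skill holder n as the center, take the closest holder of each
    skill (Step 1), score n by the sum of those distances (Step 2), and keep
    the center with the smallest score (Steps 3 and 4)."""
    centers = sorted({e for experts in holders.values() for e in experts})
    best = None
    for n in centers:
        dist = dijkstra(graph, n)
        team = {s: min(experts, key=lambda e: dist.get(e, float("inf")))
                for s, experts in holders.items()}
        score = sum(dist.get(e, float("inf")) for e in team.values())
        if best is None or score < best[1]:
            best = (team, score)
    return best


def sum_of_distances(team, graph):
    experts = list(team.values())
    return sum(dijkstra(graph, u).get(v, float("inf"))
               for u, v in combinations(experts, 2))


graph = build_graph(EDGES)
team, score = approx_best_team(graph, HOLDERS)
print(team, score)                    # {'k1': 'a', 'k2': 'c', 'k3': 'b'} 3
print(sum_of_distances(team, graph))  # 6
```

In this small example the returned team also happens to be optimal (an exhaustive search over the four candidate teams gives a minimum sum of distances of 6).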
Enumerating in Approximate Order
• Lawler's technique is used to find the top-k teams.
• In each iteration, the next team is generated by finding the top team under constraints.
• Two problems must be solved:
1- What are the constraints?
2- How can the top answer be found efficiently under the constraints?
System Overview
• Input: the required skills and the value of k.
• Step 1: find the best team with no constraint.
• Step 2: insert the best team, together with its search space, into a priority queue.
• Step 3: fetch the best team from the priority queue and print it.
• Step 4: if the top-k teams have already been printed or the priority queue is empty, terminate.
• Step 5: otherwise, divide the search space of the fetched answer into sub-spaces, find the best team in each sub-space under the associated constraints, insert each answer with its sub-space into the priority queue, and go back to Step 3.
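The flowchart can be sketched in Python as follows. This is a hedged illustration, not the authors' implementation: a sub-space is represented (an assumption) as one tuple of allowed experts per skill, it is divided with the usual Lawler split around its best team, and the best team in each sub-space is scored by the leader distance, for which picking the closest allowed holder of every skill per candidate leader is exact. For teams without a leader, swapping in the approximation from Steps 1 to 4 would give the approximate order described earlier. The holders and the line-position distances are hypothetical.

```python
import heapq
from itertools import count

# Hypothetical input: holders of k1 and k2, and shortest distances given as
# gaps between made-up positions on a line (every individual may lead).
HOLDERS = [["a", "d"], ["b", "c"]]
POS = {"a": 0, "b": 1, "c": 3, "d": 7}
DIST = {u: {v: abs(POS[u] - POS[v]) for v in POS} for u in POS}


def best_in_subspace(space, dist):
    """Best (cost, team, leader) in a sub-space given as one tuple of allowed
    experts per skill, using the leader distance as the cost."""
    best = None
    for leader in sorted(dist):
        team = tuple(min(allowed, key=lambda e: dist[leader][e]) for allowed in space)
        cost = sum(dist[leader][e] for e in team)
        if best is None or cost < best[0]:
            best = (cost, team, leader)
    return best


def split(space, team):
    """Lawler-style split of `space` around its best `team` into disjoint sub-spaces."""
    for i in range(len(space)):
        rest = tuple(e for e in space[i] if e != team[i])
        if rest:  # skip sub-spaces with an empty factor
            yield tuple((v,) for v in team[:i]) + (rest,) + space[i + 1:]


def top_k_teams(holders, dist, k):
    tie = count()  # tie-breaker so heap entries never compare teams directly
    space = tuple(tuple(h) for h in holders)
    cost, team, leader = best_in_subspace(space, dist)       # best team, no constraint
    queue = [(cost, next(tie), team, leader, space)]
    results = []
    while queue and len(results) < k:                        # stop at k or empty queue
        cost, _, team, leader, space = heapq.heappop(queue)  # fetch the best team
        results.append((cost, team, leader))
        for sub in split(space, team):                       # divide its search space
            c, t, l = best_in_subspace(sub, dist)            # best team per sub-space
            heapq.heappush(queue, (c, next(tie), t, l, sub))
    return results


for cost, team, leader in top_k_teams(HOLDERS, DIST, 3):
    print(cost, team, leader)
```

With k = 3 this prints the teams (a, b), (a, c) and (d, c) with leader distances 1, 3 and 4. Each pop does work polynomial in the input, which is what gives the polynomial delay between consecutive answers.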
Constraints and Search Space
• Let's illustrate this with an example.
• Suppose that the required skills are {k1, k2, k3, k4}.
• Ci = the set of experts that hold skill ki.
• The search space that contains the best team can be represented as C1 × C2 × C3 × C4.
• Assume that the best team is (v1, v2, v3, v4), where vi is an expert holding skill ki.
[Figure: the whole search space.]
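A common way to write Lawler's split, assumed here to match the constraints on this slide, replaces the whole space by p disjoint sub-spaces: sub-space i fixes v1, ..., v(i-1), excludes vi from Ci, and leaves C(i+1), ..., Cp unchanged. Together they contain every team in C1 × C2 × C3 × C4 except (v1, v2, v3, v4). The expert names x1 to x4 are hypothetical.

```python
def lawler_split(space, best):
    """Split C1 x ... x Cp around its best team (v1, ..., vp): sub-space i
    fixes v1..v(i-1), removes vi from Ci and leaves C(i+1)..Cp unchanged."""
    subspaces = []
    for i, vi in enumerate(best):
        remaining = [e for e in space[i] if e != vi]
        if remaining:  # an empty factor would make the whole sub-space empty
            subspaces.append([[v] for v in best[:i]] + [remaining] + space[i + 1:])
    return subspaces


# Hypothetical holders: Ci = {vi, xi} for the skills k1..k4.
C = [["v1", "x1"], ["v2", "x2"], ["v3", "x3"], ["v4", "x4"]]
for sub in lawler_split(C, ["v1", "v2", "v3", "v4"]):
    print(sub)
```

This prints four sub-spaces of sizes 8, 4, 2 and 1: 15 teams in all, i.e. every team of the 16-team space except (v1, v2, v3, v4).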
Team of Experts with a Leader
• A project often has a leader who is
responsible for monitoring and
coordinating the project
• Each expert in the team needs to
communicate with the leader to
report the progress and discuss
issues related to the project.
• The communication cost of the
team heavily depends on the
distance between the leader and
each of the project members.
Communication Cost
• Consider a team of experts T = {(s1, cs1), ..., (sp, csp)} with a leader.
• Assume that the team has a leader L, where L is an expert in the social network who may or may not belong to the team.
• The leader distance of a team of experts is the sum of the shortest distances between its leader and the expert responsible for each required skill, i.e., the sum of d(L, csi) for i = 1, ..., p.
Problem 2
• Team Formation with a Leader:
Given a project P and a graph G representing the social network of a set of experts C, the problem of team formation with a leader is to find a team of experts T and an expert L from C as the leader of the team such that the communication cost, defined as the leader distance, is minimized.
• This problem can be solved in polynomial time.
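One way to see why Problem 2 is polynomial: once the leader L is fixed, each required skill contributes its own term to the leader distance, so the best expert for each skill can be chosen independently. Writing C_k ⊆ C for the experts holding skill s_k and d for the shortest-path distance:

```latex
\min_{L \in C,\ T} \sum_{k=1}^{p} d(L, c_{s_k})
  \;=\; \min_{L \in C} \sum_{k=1}^{p} \min_{c \in C_k} d(L, c)
```

With the shortest distances from every candidate leader precomputed, evaluating the right-hand side takes O(|C| · Σ_k |C_k|) time.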
Finding Best Team (with Leader)
• Step 1: for every individual i and every required skill ki, find the expert closest to i that holds ki.
• Step 2: for every individual i (leader candidate), calculate the leader distance from i to the closest holders of the required skills found in Step 1.
• Step 3: find the individual (leader) with the minimum leader distance among all individuals.
• Step 4: return that leader together with the set of experts that gives the minimum leader distance.
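A minimal Python sketch of these steps, assuming an undirected graph with non-negative weights (Dijkstra's algorithm for shortest distances); the star-shaped network and all names are hypothetical. In this example the best leader h holds none of the required skills, matching the remark that the leader may or may not belong to the team.

```python
import heapq

# Hypothetical star-shaped network: h holds no required skill.
EDGES = [("h", "x", 1), ("h", "y", 1), ("h", "z", 1)]
HOLDERS = {"k1": ["x"], "k2": ["y"], "k3": ["z"]}


def build_graph(edges):
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, []).append((v, w))
        graph.setdefault(v, []).append((u, w))
    return graph


def dijkstra(graph, source):
    """Shortest distances from `source` to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


def best_team_with_leader(graph, holders):
    """Every individual in the graph is a leader candidate; for each one, take
    the closest holder of every skill and keep the smallest leader distance."""
    best = None
    for leader in sorted(graph):  # sorted only for deterministic tie-breaking
        dist = dijkstra(graph, leader)
        team = {s: min(experts, key=lambda e: dist.get(e, float("inf")))
                for s, experts in holders.items()}
        cost = sum(dist.get(e, float("inf")) for e in team.values())
        if best is None or cost < best[2]:
            best = (leader, team, cost)
    return best


graph = build_graph(EDGES)
print(best_team_with_leader(graph, HOLDERS))
# ('h', {'k1': 'x', 'k2': 'y', 'k3': 'z'}, 3)
```

Because the closest holder is chosen independently for each skill once the leader is fixed, this search is exact rather than approximate.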
Experimental Results
• The algorithms proposed in this work (Best-SumDistance and Best-Leader) are compared with the following methods:
• Rarest-First, which minimizes the diameter
• Enhanced-Steiner, which minimizes the cost of the minimum spanning tree
• Two datasets are used: DBLP and IMDb.
• DBLP contains 5,658 experts and 8,588 edges.
• IMDb contains 6,784 experts and 35,875 edges.
• For comparison, exact answers to the NP-hard problems are obtained by exhaustive search.
Communication Cost
[Figure: communication cost on the DBLP dataset.]
Communication Cost
[Figure: communication cost on the IMDb dataset.]
Other Quality Measures
[Figure: other quality measures of the approximation algorithms on the DBLP dataset.]
Other Quality Measures
[Figure: other quality measures of the exact algorithms on the DBLP dataset.]
Scalability
[Figure: scalability on the DBLP dataset.]
Conclusion
• Two problems are defined:
• Finding top-k teams of experts with a leader.
• Finding top-k teams of experts without a leader.
• An approximation algorithm with a guaranteed approximation ratio is proposed for finding a team of experts without a leader.
• An exact polynomial-time algorithm is proposed for finding a team of experts with a leader.
• A procedure for finding the top-k teams of experts with polynomial delay is introduced.
Try the system at
http://graph.cse.yorku.ca:8080/team
The system has been accepted as a demo paper at ICDM’11.