Discovering Top-k Teams of Experts with/without a Leader in Social Networks
Mehdi Kargar, Aijun An
York University, Toronto, Canada
CIKM’11

Overview
• Team Formation in Social Networks
• Communication Cost
• Challenges in Finding Teams
• Approximation Algorithm for Finding Teams
• Enumerating Top-k Teams in Polynomial Delay
• Finding Teams with Leader
• Empirical Results
• Conclusion

Team of Experts
• Given a social network, find the top-k teams of experts that can collaborate effectively to complete a project.
• Each team may or may not have a leader.

Team of Experts
• Project: a set of required skills.
• Expert: an individual with a specific skill set.
• Social network: represents the strength of relationships (the degree of collaboration between any two experts).
• Examples: LinkedIn, DBLP and …

Example
Project = {AI, DB, DM, IR}
The numbers on the edges represent how easily two experts can communicate; smaller numbers represent better communication.
[Figure: example social network with five experts: Jack {IR}; John and Susan, who hold {DB} and {AI, DM}; Thomas {AI, DB, DM}; Jessie {DB}. Edge weights range from 1 to 8.]

Are Jack and Thomas Able to Communicate Effectively?
[Figure: the same example network.]

What about Jack, John and Susan?
[Figure: the same example network.]

Team of Experts
• Team of experts: given a set of experts and a project that requires a set of skills {s1, s2, ..., sp}, a team of experts is a set of p skill-expert pairs {(s1, c_s1), (s2, c_s2), ..., (sp, c_sp)}, where c_sk is an expert having skill sk for k = 1, ..., p.
• A skill-expert pair (sk, c_sk) means that expert c_sk is responsible for skill sk in the project.
• How can we make sure that the experts can communicate with each other?

Communication Cost
• For a team of experts (without a leader):
• The sum of distances of a team of experts is the sum of the shortest distances between the experts responsible for each pair of skills, that is, the sum over all pairs i < j of d(c_si, c_sj), where d is the shortest-path distance in the network.

Benefit of Sum of Distances
• Previous work in this area defines two communication cost functions [Lappas et al.]:
  • Diameter of the sub-graph: the largest shortest path between any two nodes in the sub-graph.
  • Cost of the minimum spanning tree.
• These measures have the following problems:
  1) They do not consider the communication cost between each pair of skill holders.
  2) Instability:
     a) A slight change in the graph may result in a radical change in the solution.
     b) On the other hand, they may be insensitive to adding, deleting or changing a connection in the graph, since they measure only part of the communication cost.
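The sum-of-distances measure defined above can be illustrated with a short Python sketch that also reports the diameter for contrast. This is only an illustration under stated assumptions: the graph, the skill labels and the function names are hypothetical (the edge weights of the slide example are not reproduced here), and the diameter is computed over shortest paths in the whole network rather than in an induced sub-graph.

```python
import heapq
from itertools import combinations

INF = float("inf")

def shortest_distances(graph, source):
    """Dijkstra over a dict-of-dicts graph: {node: {neighbor: weight}}."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in graph[u].items():
            if d + w < dist.get(v, INF):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def pairwise_costs(graph, team):
    """Return (sum of distances, diameter) for a team.

    team maps each required skill to the expert responsible for it.
    """
    experts = list(team.values())  # one entry per skill, so repeats are kept
    dist = {e: shortest_distances(graph, e) for e in set(experts)}
    pairs = [dist[a].get(b, INF) for a, b in combinations(experts, 2)]
    return sum(pairs), max(pairs, default=0)

# Hypothetical toy network (not the one from the slides).
graph = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 2},
    "C": {"A": 4, "B": 2},
}
team = {"s1": "A", "s2": "B", "s3": "C"}
print(pairwise_costs(graph, team))  # (6, 3)
```

Every pair of skill holders contributes to the first number, whereas the diameter depends on a single pair only, which is the insensitivity the slide points out.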
Problem 1
• Team formation without a leader: given a project P and a graph G representing the social network of a set of experts C, find a team of experts T for P from G such that the communication cost of T, defined as the sum of distances of T, is minimized.
• What about finding the top-k teams of experts?
  • A user might be interested in more than one team.

Challenges
• Theorem: Problem 1 is NP-hard.
  • Proved in the paper by reduction from 3-satisfiability (3-SAT).
  • Solution: an approximation algorithm with a guaranteed ratio.
• The total number of teams is exponential in the number of required skills.
  • It is not efficient to generate all teams and then sort them.
  • Solution: enumerate teams with polynomial delay.

Finding Best Approximate Team (without Leader)
• Step 1: for every expert (skill holder) n and every required skill ki, find the node closest to n that holds ki.
• Step 2: for every expert n, add up the distances from n to the closest holder of each required skill ki.
• Step 3: find the expert with the minimum sum of distances among all experts.
• Step 4: return the set of experts associated with that minimum sum of distances.
• The approximation ratio of the algorithm is 2.
  • The weight of the answer is at most twice the weight of the optimal answer.

Enumerating in Approximate Order
• Lawler’s technique is used for finding the top-k teams.
• In each iteration, the next team is generated by finding the top team under constraints.
• Two questions must be answered:
  1- What are the constraints?
  2- How can the top answer be found efficiently under the constraints?

System Overview
[Flowchart, read in order:]
1. Input: the required skills and the value of k.
2. Find the best team with no constraint.
3. Insert the best team, together with its search space, into the priority queue.
4. Fetch the best team from the priority queue and print it.
5. If the top-k teams have already been printed or the priority queue is empty, terminate.
6. Otherwise, divide the search space of the fetched team into sub-spaces.
7. Find the best team in each sub-space under the associated constraints.
8. Insert each answer, together with its search space, into the priority queue, and return to step 4.

Constraints and Search Space
• Let’s work through an example.
• Suppose that the required skills are {k1, k2, k3, k4}.
• Ci = the set of experts that hold skill ki.
• The search space that contains the best team can be represented as C1 × C2 × C3 × C4.
• Assume that the best team is (v1, v2, v3, v4), where vi is an expert holding skill ki.
[Figure: the whole search space.]

Team of Experts with a Leader
• A project often has a leader who is responsible for monitoring and coordinating the project.
• Each expert in the team needs to communicate with the leader to report progress and discuss issues related to the project.
• The communication cost of the team depends heavily on the distance between the leader and each of the project members.

Communication Cost
• For a team of experts (with a leader):
• Assume that the team has a leader L, where L is an expert in the social network who may or may not belong to the team.
• The leader distance of a team of experts is the sum of the shortest distances between its leader and the expert for each required skill, that is, the sum over i = 1, ..., p of d(L, c_si).
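Looking back at the leaderless case, the enumeration loop from the System Overview slide can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the sub-space split is the standard Lawler partition (fix the first i-1 members of the fetched team and exclude its i-th member), the finder is a simplified root-based greedy in the spirit of the four approximation steps, teams are ranked in the queue by their actual sum of distances, and all names and the toy graph are hypothetical.

```python
import heapq
from itertools import count

INF = float("inf")

def all_pairs(graph):
    """Floyd-Warshall on a small symmetric dict-of-dicts graph."""
    nodes = list(graph)
    d = {u: {v: 0 if u == v else graph[u].get(v, INF) for v in nodes} for u in nodes}
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

def sum_dist(d, team):
    """Sum of shortest distances over all pairs of team positions."""
    p = len(team)
    return sum(d[team[i]][team[j]] for i in range(p) for j in range(i + 1, p))

def best_in_space(d, space):
    """Greedy finder: try each candidate as root and attach, for every other
    skill, the allowed expert closest to the root. Returns a team or None."""
    if any(not s for s in space):
        return None
    best = None
    for i, cands in enumerate(space):
        for root in sorted(cands):
            team = tuple(root if j == i else min(sorted(s), key=lambda c: d[root][c])
                         for j, s in enumerate(space))
            root_cost = sum(d[root][c] for c in team)
            if best is None or root_cost < best[0]:
                best = (root_cost, team)
    return best[1]

def top_k_teams(graph, holders, skills, k):
    d = all_pairs(graph)
    tie = count()  # tie-breaker so the heap never compares teams or spaces
    heap = []

    def push(space):
        team = best_in_space(d, space)
        if team is not None:
            heapq.heappush(heap, (sum_dist(d, team), next(tie), team, space))

    push([frozenset(holders[s]) for s in skills])
    out = []
    while heap and len(out) < k:
        cost, _, team, space = heapq.heappop(heap)
        out.append((cost, team))
        # Lawler partition: sub-space i keeps team[0..i-1] and excludes team[i].
        for i, v in enumerate(team):
            rest = space[i] - {v}
            if rest:
                push([frozenset([t]) for t in team[:i]] + [rest] + list(space[i + 1:]))
    return out

# Hypothetical toy network and skill holders.
graph = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 2},
    "C": {"A": 4, "B": 2, "D": 5},
    "D": {"C": 5},
}
holders = {"s1": {"A", "D"}, "s2": {"B"}, "s3": {"C"}}
print(top_k_teams(graph, holders, ["s1", "s2", "s3"], 2))
# [(6, ('A', 'B', 'C')), (14, ('D', 'B', 'C'))]
```

Because the sub-spaces are disjoint and together cover everything except the fetched team, no team is reported twice. Since the finder is approximate, the output order is approximate as well.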
Problem 2
• Team formation with a leader: given a project P and a graph G representing the social network of a set of experts C, find a team of experts T and an expert L from C as the leader of the team such that the communication cost, defined as the leader distance, is minimized.
• This problem can be solved in polynomial time.

Finding Best Team (with Leader)
• Step 1: for every individual n and every required skill ki, find the expert closest to n that holds ki.
• Step 2: for every individual n (leader candidate), compute the leader distance by adding up the distances from n to the closest holder of each required skill ki.
• Step 3: find the individual (leader) with the minimum leader distance among all individuals.
• Step 4: return that leader together with the set of experts that yields the minimum leader distance.

Experimental Results
• The algorithms proposed in this work (Best-SumDistance and Best-Leader) are compared with the following methods:
  • Rarest-First, which minimizes the diameter.
  • Enhanced-Steiner, which minimizes the cost of the minimum spanning tree.
• Two datasets are used: DBLP and IMDb.
  • DBLP contains 5,658 experts and 8,588 edges.
  • IMDb contains 6,784 experts and 35,875 edges.
• For the purpose of comparison, exact answers to the NP-hard problems are obtained by exhaustive search.
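The four-step procedure for the problem with a leader, described on the "Finding Best Team (with Leader)" slide, can be sketched in Python as below. This is a minimal sketch, assuming a connected or partly connected undirected graph stored as a dict of dicts; the function names and the toy data are hypothetical and are not taken from the paper. It runs one Dijkstra search per leader candidate, so it stays polynomial, in line with the slide's claim.

```python
import heapq

INF = float("inf")

def shortest_distances(graph, source):
    """Dijkstra over a dict-of-dicts graph: {node: {neighbor: weight}}."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in graph[u].items():
            if d + w < dist.get(v, INF):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def best_team_with_leader(graph, holders, skills):
    """Return (leader distance, leader, {skill: expert}) or None."""
    best = None
    for leader in sorted(graph):                      # every individual is a candidate
        dist = shortest_distances(graph, leader)
        team, total = {}, 0
        for s in skills:
            reachable = [c for c in sorted(holders[s]) if c in dist]
            if not reachable:
                break                                 # this leader cannot reach skill s
            expert = min(reachable, key=dist.__getitem__)  # Step 1: closest holder
            team[s] = expert
            total += dist[expert]                     # Step 2: accumulate leader distance
        else:
            if best is None or total < best[0]:       # Step 3: keep the minimum
                best = (total, leader, team)
    return best                                       # Step 4

# Hypothetical toy network and skill holders.
graph = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 2},
    "C": {"A": 4, "B": 2, "D": 5},
    "D": {"C": 5},
}
holders = {"s1": {"A", "D"}, "s2": {"B"}, "s3": {"C"}}
print(best_team_with_leader(graph, holders, ["s1", "s2", "s3"]))
# (3, 'B', {'s1': 'A', 's2': 'B', 's3': 'C'})
```

Given a fixed leader, the choice for each skill is independent of the others, which is why picking the closest holder per skill is exact here, unlike in the leaderless problem.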
Communication Cost, DBLP Dataset
[Figure]

Communication Cost, IMDb Dataset
[Figure]

Other Quality Measures, Approximation Algorithms, DBLP Dataset
[Figure]

Other Quality Measures, Exact Algorithms, DBLP Dataset
[Figure]

Scalability, DBLP Dataset
[Figure]

Conclusion
• Two problems are defined:
  • Finding top-k teams of experts with a leader.
  • Finding top-k teams of experts without a leader.
• An approximation algorithm with a bounded guarantee has been proposed for finding a team of experts without a leader.
• An exact polynomial-time algorithm has been proposed for finding a team of experts with a leader.
• A procedure for finding the top-k teams of experts with polynomial delay has been introduced.

Check the system at http://graph.cse.yorku.ca:8080/team
It has been accepted as a demo paper at ICDM’11.