WEIGHTED SYNERGY GRAPHS FOR EFFECTIVE TEAM FORMATION WITH HETEROGENEOUS AD HOC AGENTS
Somchaya Liemhetcharat, Manuela Veloso
Presented by: Raymond Mead

Problem
• Written for the RoboCup Rescue Simulator, where teams of robots are used to solve tasks.
  • We want to choose the best team of robots to tackle a disaster.
  • Around 50 possible agents.
• How can we form the best team when everyone's abilities, and how well people work together, are known?
• Given observations of groups and their performances, how can we generate a graph to model each person's ability, and how well people work together?

Modeling Teams
• For forming teams, we want to look at:
  • The compatibility between members of the team.
  • Each person's ability.
• Using a weighted graph:
  • Each vertex represents a person, who has a certain ability.
  • Edges are used to show similarity between people.
• A person's ability is modeled as a normal distribution.
  • For someone, $a_i$, their ability is $C_i \sim N(\mu_i, \sigma_i^2)$

Example Graph

Compatibility
• $d(a_i, a_j)$ is the minimum (shortest-path) distance in the graph between $a_i, a_j \in \mathcal{A}$, the set of all people.
• $\phi(d)$ is a compatibility function.
  • Models how well people work together.
  • Larger distance → less compatible
  • $\phi(d) = \frac{1}{d}$
  • $\phi(d) = \exp\left(-\frac{d \ln 2}{h}\right)$, exponential decay

Synergy of a Pair
• A pair of people: $a_i, a_j$
• For a pair's synergy, add their abilities, $C_i, C_j$, and scale the sum by how compatible they are, $\phi(d)$.
  • $S_2(a_i, a_j) = \phi(d) \cdot (C_i + C_j)$
• Normal distribution $\sim N(\mu_{i,j}, \sigma_{i,j}^2)$
  • $\mu_{i,j} = \phi(d) \cdot (\mu_i + \mu_j)$
  • $\sigma_{i,j}^2 = \phi(d)^2 \cdot (\sigma_i^2 + \sigma_j^2)$

Synergy of a Team
• Average the synergy between all pairs in a team $A$
  • $S(A) = \frac{1}{\binom{|A|}{2}} \sum_{a_i, a_j \in A} S_2(a_i, a_j)$
• Normal distribution $\sim N(\mu_A, \sigma_A^2)$
  • $\mu_A = \frac{1}{\binom{|A|}{2}} \sum_{a_i, a_j \in A} \phi(d(a_i, a_j)) \cdot (\mu_i + \mu_j)$
  • $\sigma_A^2 = \frac{1}{\binom{|A|}{2}^2} \sum_{a_i, a_j \in A} \phi^2(d(a_i, a_j)) \cdot (\sigma_i^2 + \sigma_j^2)$

Example Synergies
• $S(a_1, a_2, a_3) \sim N(17.8, 2.8)$
• $S(a_1, \ldots, a_5) \sim N(13.0, 0.3)$

Evaluating a Team
• The $\delta$-value of a team is $\delta_A$ s.t. $P(S(A) \ge \delta_A) = \delta$.
  • The probability of the team's performance being $\ge \delta_A$ is $\delta$.
  • If $\delta = .5$, then $\delta_A = \mu_A$
  • $\delta < .5$ → high risk, high reward
  • $\delta > .5$ → low risk, low reward
• $A'$ is better than $A$ if:
  • $\delta_{A'} \ge \delta_A$
• $\delta$-optimal team: $A^*_\delta$
  • Has the largest $\delta_A$

Problem: Finding the $\delta$-Optimal Team
• Among all possible teams, find the best team for a given $\delta$.
  • Need to check all possible sizes of teams.
  • Need to check most, if not all, teams for each team size.
• NP-Hard
  • Reduce the Max-Clique problem to finding the optimal team.
  • Max-Clique: find the largest subgraph where there is an edge between every pair of vertices.
  • NP-Complete (as a decision problem)

Algorithm: $\delta$-optimal team of size $n$
• Branch and Bound Algorithm:
  • $A$ is a team used for exploring possible teams.
  • Bound the performance of $A$ to decide whether to keep exploring.
  • $A_{best}$ is the current known best team, with $\delta_{best}$.
  • Initially, $A, A_{best} = \emptyset$, and $\delta_{best} = -\infty$.
  • Check all pairs, unless a new best is not possible with the current members.
• $O(N^n)$ if the team size $n$ is known, with $N$ people in total
• $O(2^N)$ otherwise

Algorithm: $\delta$-optimal team of size $n$
FindδOpt($S, \delta, n, A, A_{best}, \delta_{best}$), for synergy graph $S$:
    If $|A| = n$, compare $A$ and $A_{best}$:
        Return $(A, \delta_A)$ if $A$ is better, $(A_{best}, \delta_{best})$ otherwise.
    For $k$ = (largest index in $A$) + 1, …, $N$:
        $A' = A \cup \{a_k\}$
        $min_{A'}, max_{A'}$ ← lower and upper bounds on the $\delta$-value of any size-$n$ team containing $A'$, computed from $(A', n, \delta, S)$
            • All nodes that can still be added are assumed to be worst or best case
            • Min compatibility with min ability → worst
            • Max compatibility with max ability → best
        If $max_{A'} \ge \delta_{best}$:
            $A_{best}, \delta_{best}$ ← FindδOpt($S, \delta, n, A', A_{best}, \delta_{best}$)
    Return $(A_{best}, \delta_{best})$

Reducing the Max-Clique Problem
• $G = (V, E)$ is unweighted; we want to find its max-clique.
• The max-clique in $G$ will be the largest optimal team.
• Create $G' = (V, E')$ to run with FindδOpt
  • Each edge in $E$ corresponds to an edge of weight 1 in $E'$
  • Everyone's ability is $\sim N(1, 1)$
  • $\delta = .5$, so evaluating a team depends only on the mean, and every person's mean ability is 1.
  • $\phi(d) = \begin{cases} 1 & d \le 1 \\ 0 & \text{otherwise} \end{cases}$

Max-Clique → Best Team
• Evaluating $S(A)$, with $\delta = .5$ and every mean ability equal to 1:
  $S(A) = \frac{1}{\binom{|A|}{2}} \sum_{a_i, a_j \in A} S_2(a_i, a_j)$ (definition)
  $= \frac{1}{\binom{|A|}{2}} \sum_{a_i, a_j \in A} 2 \cdot \phi(d(a_i, a_j))$ (only the mean matters, and $\mu_i + \mu_j = 2$)
  $= \frac{1}{\binom{|A|}{2}} \cdot 2 \cdot |\text{edges of weight 1 for pairs in } A|$
• $\phi(d) = 1$ only when there is an edge between a pair in $A$
  • 0 otherwise
• Maximized when there is an edge between every pair of $A$

Approximation Algorithm
• Simulated Annealing
  • Look at teams similar to the current best, and compare them.
• Generate a random team
• Repeat a constant number of times:
  • Find a new team similar to the current best by swapping a node in $A$
  • Evaluate both teams
  • Replace if the new team is better
• Return the best team found
• Runs in $O(n^2)$ if $n$ is known.
  • Evaluating $S(A)$ is $O(n^2)$, where $n = |A|$
• $O(N^3)$ if $n$ is unknown

Approximation Algorithm
ApproxδOptimal($S, \delta, n$):
    $A_{best}$ ← Random($S, n$)
    Repeat $k$ times ($k$ constant):
        $A_{new}$ ← swap some $a \in A_{best}$ with $a' \in \mathcal{A} \setminus A_{best}$, where $\mathcal{A}$ is the set of all people
        Compare $A_{new}$ and $A_{best}$
        Replace $A_{best}$ if $A_{new}$ is better
    Return $A_{best}$

Comparison
• Effectiveness of team $A$ is $\frac{\delta_A - \delta_{min}}{\delta_{max} - \delta_{min}}$
  • Where $A$'s performance falls between the worst ($\delta_{min}$) and the best ($\delta_{max}$).

Learning the Synergy Graph
• We have observations, $O$, covering all people, $\mathcal{A}$.
  • Each observation is $o = (A, p)$: team $A$, performance $p$.
• Find a synergy graph that best fits the observations.
  • Need to find the ability of each person.
  • Need to find the compatibility between people.
• Strategy: Simulated Annealing

Learning Algorithm
LearnSynergyGraph($O$):
    $G$ ← RandomGraph($\mathcal{A}$)
    $C$ ← FitAbilitiesToGraph($G, O$)
    $score$ ← LogLikelihood($G, C, O$)
    Repeat a constant number of times:
        $G'$ ← SimilarGraph($G$)
        $C'$ ← FitAbilitiesToGraph($G', O$)
        Compare the scores of $G$ and $G'$
        $G$ ← $G'$ if $G'$ is better
    Return $G$

Generating G and Finding Similar G'
• RandomGraph($\mathcal{A}$), where $\mathcal{A}$ is the set of all people
  • Vertices represent each person
  • Randomly put edges of random weights between vertices
• SimilarGraph($G$)
  • Do one of the following to $G$:
    • Increase a random edge's weight by 1
    • Decrease a random edge's weight by 1
    • Remove a random edge
    • Add a random edge of random weight

Similar Graph: Fitting Abilities to a Graph
• Look at all teams of size 2 or 3 drawn from $\mathcal{A}$, the set of all people: $A_{2,3}$.
  • For each $A \in A_{2,3}$, there are observations of $A$, each with a performance.
  • Fit a normal distribution to the observed performance of $A$.
  • $D_A \sim N(x_A, s_A^2)$ is the observed distribution of $A$
  • $D$ is the set of all $D_A$
• We want the distribution of $S(A)$ to match the distribution of $D_A$.
  • Fit $S(A) \sim N(\mu_A, \sigma_A^2)$ to $D_A \sim N(x_A, s_A^2)$ as well as we can by choosing $(\mu_i, \sigma_i^2)$ for each person

Fitting Abilities
• For $S(A)$ with $A$ of size 2:
  • $\mu_A = \phi(d)\mu_i + \phi(d)\mu_j$
  • $\sigma_A^2 = \phi(d)^2\sigma_i^2 + \phi(d)^2\sigma_j^2$
• Similar for $A$ of size 3.
• We know $\phi(d)$ from the graph, and the $x_A, s_A^2$ we want to fit to.
• $M_1$: matrix of $\phi(d)$, one row per team; $B_1 = (x_{A_1}, \ldots, x_{A_{|D|}})$
  • Fit $M_1 X_1 = B_1$, for $X_1 = (\mu_1, \ldots, \mu_N)$
• $M_2$: matrix of $\phi^2(d)$, one row per team; $B_2 = (s_{A_1}^2, \ldots, s_{A_{|D|}}^2)$
  • Fit $M_2 X_2 = B_2$, for $X_2 = (\sigma_1^2, \ldots, \sigma_N^2)$

Code: Log-Likelihood
• Sum of the log-likelihoods of each observation, given the synergy graph and abilities.
• For an observation $o = (A, p)$:
  $\log\left(\frac{1}{\sigma_A\sqrt{2\pi}} \exp\left(-\frac{(p - \mu_A)^2}{2\sigma_A^2}\right)\right)$
  • Probability density of the normal distribution at value $p$.

Code Evaluation
• Generate a hidden graph, with compatibility and abilities.
• Generate a set of observations
• Run the learning algorithm
• Compare the log-likelihood of the learned graph with that of the true graph.

Results

Using for RoboCup

Thoughts:
• Domain specific:
  • Works well for the given problem, but may not be good for other applications.
• Tested for relatively small graphs.
  • May not be generalizable to large sparse graphs.
  • Due to randomness of search.
• Modifying for learning large graphs:
  • Generate a better initial graph.
  • Make a better choice for a similar graph.
  • More localized evaluation.
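
Worked sketch: team synergy and $\delta$-value
As a recap of the model from the earlier slides, here is a minimal Python sketch of how a team's synergy distribution and $\delta$-value could be computed. Everything specific in it is an assumption for illustration: the three-agent graph, the ability values, the function names, and the choice $\phi(d) = 1/d$. Pair terms are treated as independent when summing variances, and the team search is a plain exhaustive scan over all size-$n$ teams, not the branch-and-bound or annealing procedures from the slides.

```python
from itertools import combinations
from math import comb, sqrt
from statistics import NormalDist

# Hypothetical toy graph: edge weights between agents (smaller = more compatible).
EDGES = {("a1", "a2"): 1, ("a2", "a3"): 2, ("a1", "a3"): 4}
# Hypothetical abilities: agent -> (mean, variance) of its normal distribution.
ABILITY = {"a1": (10.0, 4.0), "a2": (6.0, 1.0), "a3": (8.0, 9.0)}


def shortest_distances(agents, edges):
    """All-pairs shortest-path distances d(a_i, a_j) via Floyd-Warshall."""
    inf = float("inf")
    d = {(u, v): (0 if u == v else inf) for u in agents for v in agents}
    for (u, v), w in edges.items():
        d[u, v] = d[v, u] = min(d[u, v], w)
    for k in agents:
        for u in agents:
            for v in agents:
                d[u, v] = min(d[u, v], d[u, k] + d[k, v])
    return d


def phi(d):
    """Compatibility function phi(d) = 1/d (one of the two options on the slides)."""
    return 1.0 / d


def team_synergy(team, dist, ability):
    """Return (mean, variance) of S(A), averaging pairwise synergies over the team."""
    pairs = list(combinations(team, 2))
    n_pairs = comb(len(team), 2)
    mean = sum(phi(dist[i, j]) * (ability[i][0] + ability[j][0])
               for i, j in pairs) / n_pairs
    var = sum(phi(dist[i, j]) ** 2 * (ability[i][1] + ability[j][1])
              for i, j in pairs) / n_pairs ** 2
    return mean, var


def delta_value(mean, var, delta):
    """Value v such that P(S(A) >= v) = delta for S(A) ~ N(mean, var)."""
    return NormalDist(mean, sqrt(var)).inv_cdf(1.0 - delta)


def best_team(agents, n, delta, dist, ability):
    """Exhaustive search over all size-n teams for the largest delta-value."""
    return max(combinations(agents, n),
               key=lambda t: delta_value(*team_synergy(t, dist, ability), delta))


if __name__ == "__main__":
    agents = sorted(ABILITY)
    dist = shortest_distances(agents, EDGES)
    for delta in (0.25, 0.5, 0.75):
        print(delta, best_team(agents, 2, delta, dist, ABILITY))
```

In this toy graph the direct a1-a3 edge (weight 4) is longer than the path through a2 (1 + 2 = 3), so the shortest-path distance, not the edge weight, sets the compatibility of that pair. Lowering $\delta$ below .5 moves the $\delta$-value above the mean, which is the "high risk, high reward" case from the slides.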