WEIGHTED SYNERGY GRAPHS FOR
EFFECTIVE TEAM FORMATION WITH
HETEROGENEOUS AD HOC AGENTS
Somchaya Liemhetcharat, Manuela Veloso
Presented by:
Raymond Mead
Problem
• Written for the RoboCup Rescue Simulator, where teams of
robots are used to solve tasks.
• We want to choose the best team of robots to tackle a disaster.
• Around 50 possible agents.
• How can we form the best team when everyone’s abilities,
and how well people work together, are known?
• Given observations of groups and their performances,
how can we generate a graph to model each person’s
ability, and how well people work together?
Modeling Teams
• For forming teams, we want to look at:
• The compatibility between members of the team.
• Each person’s ability.
• Using a weighted graph:
• Each vertex represents a person, who has a certain ability
• Edges are used to show similarity between people
• A person’s ability is modeled as a normal distribution
• For a person $a_i$, their ability is $C_i \sim N(\mu_i, \sigma_i^2)$
Example Graph
Compatibility
• $d(a_i, a_j)$ is the minimum distance between $a_i, a_j \in A$
• $\phi(d)$ is a compatibility function.
• Models how well people work together.
• Larger distance → Less compatible
• $\phi(d) = \frac{1}{d}$
• $\phi(d) = \exp\left(-\frac{d \ln 2}{h}\right)$, exponential decay
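The two compatibility functions above can be sketched in a few lines. The function names are illustrative, and `h` is read here as the distance at which compatibility halves, which follows from the $\ln 2$ factor in the formula.

```python
import math


def phi_fraction(d):
    """Compatibility 1/d for a graph distance d > 0."""
    return 1.0 / d


def phi_decay(d, h):
    """Exponential decay exp(-d * ln 2 / h); equals 0.5 when d == h."""
    return math.exp(-d * math.log(2) / h)


# Larger distance gives lower compatibility under both functions.
print(phi_fraction(1), phi_fraction(4))
print(phi_decay(1, h=2), phi_decay(4, h=2))
```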
Synergy of a Pair
• A pair of people: $a_i, a_j$
• For a pair’s synergy, add their abilities, $C_i$ and $C_j$, and scale the sum
by how compatible they are, $\phi(d)$.
• $\mathbb{S}_2(a_i, a_j) = \phi(d) \cdot (C_i + C_j)$
• Normal distribution $\sim N(\mu_{i,j}, \sigma_{i,j}^2)$
• $\mu_{i,j} = \phi(d) \cdot (\mu_i + \mu_j)$
• $\sigma_{i,j}^2 = \phi(d)^2 \cdot (\sigma_i^2 + \sigma_j^2)$
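A small numeric sketch of the pair formulas, assuming each ability is stored as a `(mean, variance)` tuple. The representation and the function name are illustrative, not taken from the slides.

```python
def pair_synergy(phi_d, ability_i, ability_j):
    """Return (mean, variance) of phi(d) * (C_i + C_j) for independent normal abilities."""
    mu_i, var_i = ability_i
    mu_j, var_j = ability_j
    mean = phi_d * (mu_i + mu_j)
    variance = phi_d ** 2 * (var_i + var_j)
    return mean, variance


# Two people at compatibility 0.5 with abilities N(10, 4) and N(6, 2).
print(pair_synergy(0.5, (10.0, 4.0), (6.0, 2.0)))  # (8.0, 1.5)
```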
Synergy of a Team
• Average the synergy between all pairs in a team $A$
• $\mathbb{S}(A) = \frac{1}{\binom{|A|}{2}} \sum_{a_i, a_j \in A} \mathbb{S}_2(a_i, a_j)$
• Normal distribution $\sim N(\mu_A, \sigma_A^2)$
• $\mu_A = \frac{1}{\binom{|A|}{2}} \sum_{a_i, a_j \in A} \phi(d(a_i, a_j)) \cdot (\mu_i + \mu_j)$
• $\sigma_A^2 = \frac{1}{\binom{|A|}{2}^2} \sum_{a_i, a_j \in A} \phi^2(d(a_i, a_j)) \cdot (\sigma_i^2 + \sigma_j^2)$
Example Synergies
• $\mathbb{S}(\{a_1, a_2, a_3\}) \sim N(17.8, 2.8)$
• $\mathbb{S}(\{a_1, \dots, a_5\}) \sim N(13.0, 0.3)$
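To make the team computation concrete, here is a minimal sketch that derives pairwise distances with Floyd-Warshall and then averages the pair synergies. The graph encoding (`{(u, v): weight}`), the `(mean, variance)` ability tuples, and the division of the variance sum by the squared number of pairs (which follows from scaling a sum by $1/\binom{|A|}{2}$) are assumptions of this sketch. The example graph is made up and is not the one from the slides.

```python
from itertools import combinations


def shortest_distances(vertices, edges):
    """All-pairs shortest distances for an undirected graph {(u, v): weight}."""
    vertices = list(vertices)
    inf = float("inf")
    dist = {(u, v): (0.0 if u == v else inf) for u in vertices for v in vertices}
    for (u, v), w in edges.items():
        dist[u, v] = dist[v, u] = min(dist[u, v], w)
    for k in vertices:
        for i in vertices:
            for j in vertices:
                if dist[i, k] + dist[k, j] < dist[i, j]:
                    dist[i, j] = dist[i, k] + dist[k, j]
    return dist


def team_synergy(team, abilities, dist, phi):
    """Return (mean, variance) of the averaged pairwise synergy of a team."""
    pairs = list(combinations(team, 2))
    mean_sum = 0.0
    var_sum = 0.0
    for i, j in pairs:
        c = phi(dist[i, j])
        mean_sum += c * (abilities[i][0] + abilities[j][0])
        var_sum += c ** 2 * (abilities[i][1] + abilities[j][1])
    k = len(pairs)
    return mean_sum / k, var_sum / k ** 2


# Hypothetical three-person chain: 1 -(1)- 2 -(2)- 3.
dist = shortest_distances([1, 2, 3], {(1, 2): 1, (2, 3): 2})
abilities = {1: (10.0, 1.0), 2: (6.0, 4.0), 3: (4.0, 9.0)}
print(team_synergy([1, 2, 3], abilities, dist, lambda d: 1.0 / d))
```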
Evaluating a Team
• The $\delta$-value of a team is $\delta_A$ s.t. $P(\mathbb{S}(A) \ge \delta_A) = \delta$.
• Probability of a team’s performance being $\ge \delta_A$ is $\delta$.
• If $\delta = .5$, then $\delta_A = \mu_A$
• $\delta < .5$ → high risk, high reward
• $\delta > .5$ → low risk, low reward
• $A'$ is better than $A$ if:
• $\delta_{A'} \ge \delta_A$
• $\delta$-optimal team: $A^*_\delta$
• Has largest $\delta_A$
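Because a team's synergy is normally distributed, the condition $P(\mathbb{S}(A) \ge \delta_A) = \delta$ is equivalent to $\delta_A = \mu_A + \sigma_A \Phi^{-1}(1 - \delta)$. The sketch below computes this with the standard library. The function name is illustrative, and the second parameter of each example distribution is read as a variance, following the $N(\mu, \sigma^2)$ notation used earlier.

```python
from statistics import NormalDist


def delta_value(mean, variance, delta):
    """Value v such that a N(mean, variance) variable is >= v with probability delta."""
    return NormalDist(mean, variance ** 0.5).inv_cdf(1.0 - delta)


# The two example teams from the slides, compared at several risk levels.
for delta in (0.25, 0.5, 0.75):
    small_team = delta_value(17.8, 2.8, delta)
    large_team = delta_value(13.0, 0.3, delta)
    print(delta, round(small_team, 2), round(large_team, 2))
```

A smaller $\delta$ moves the value above the mean (optimistic, risky), while a larger $\delta$ moves it below the mean (conservative).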
Problem: Finding the $\delta$-Optimal Team
• Among all possible teams, find the best team for a given $\delta$.
• Need to check all possible sizes of teams
• Need to check most, if not all teams for each team size.
• NP-Hard
• Reduce the Max-Clique problem to Finding the Optimal Team.
• Max-Clique: Find the largest subgraph, where there is an edge
between every pair of vertices.
• NP-Complete
Algorithm: $\delta$-optimal team of size $n$
• Branch and Bound Algorithm:
• $A$ is a team used for exploring possible teams.
• Bound the performance of $A$ to decide whether to keep exploring.
• $A_{best}$ is the current known best team, with $\delta_{best}$.
• Initially, $A, A_{best} = \emptyset$, and $\delta_{best} = -\infty$.
• Check all pairs, unless a new best is not possible with the
current members.
• $O\left(\binom{N}{n}\right)$ if the best $n$ is known
• $O(2^N)$ otherwise
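The $\binom{N}{n}$ bound corresponds to visiting every team of size $n$. As a point of comparison for the branch-and-bound search, here is a baseline sketch with no pruning at all. `score` is a hypothetical callable standing in for a team's $\delta$-value and is not defined in the slides.

```python
from itertools import combinations


def best_team_exhaustive(agents, n, score):
    """Score every team of size n and return the best (team, score) pair."""
    best_team, best_score = None, float("-inf")
    for team in combinations(agents, n):
        s = score(team)
        if s > best_score:
            best_team, best_score = team, s
    return best_team, best_score


# Toy usage: the "score" of a team is just the sum of its member ids.
print(best_team_exhaustive(range(6), 2, sum))  # ((4, 5), 9)
```

Branch and bound explores the same space but skips any partial team whose upper bound cannot beat the best score found so far.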
Algorithm: $\delta$-optimal team of size $n$
FindδOpt(n, δ, S, A, A_best, δ_best):
  If |A| = n, compare A and A_best:
    Return (A, δ_A) if A is better, (A_best, δ_best) otherwise.
  For k = i + 1, …, N, where i ← largest index in A:
    A′ = A ∪ {a_k}
    (Min_A′, Max_A′) ← BoundδVal(A′, n, δ, S)
      • All nodes that can be added are assumed to be worst or best case
      • Min compatibility with min ability → worst
      • Max compatibility with max ability → best
    If Max_A′ ≥ δ_best:
      (A_best, δ_best) ← FindδOpt(n, δ, S, A′, A_best, δ_best)
  Return (A_best, δ_best)
Reducing the Max-Clique Problem
• $G = (V, E)$ is unweighted; we want to find the max-clique.
• The max-clique in $G$ will be the largest optimal team.
• Create $G' = (V, E')$ to run with FindδOpt
• Each edge in $E$ corresponds to an edge of weight 1 in $E'$
• Everyone’s ability is $\sim N(1, 1)$
• $\delta = .5$, so evaluating a team depends only on the mean; every ability mean is 1.
• $\phi(d) = \begin{cases} 1 & d \le 1 \\ 0 & \text{otherwise} \end{cases}$
Max-Clique → Best Team
• Evaluating $\mathbb{S}(A)$:
$\mathbb{S}(A) = \frac{1}{\binom{|A|}{2}} \sum_{a_i, a_j \in A} \mathbb{S}_2(a_i, a_j)$, definition
$= \frac{1}{\binom{|A|}{2}} \sum_{a_i, a_j \in A} 2 \cdot \phi(d(a_i, a_j))$, only mean matters
$= \frac{1}{\binom{|A|}{2}} \cdot 2 \cdot |\text{edges of weight 1 for pairs of } A|$
• $\phi(d) = 1$ only when there is an edge between a pair in $A$
• 0 otherwise
• Maximized when there is an edge between every pair of $A$
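The reduction can be checked on a small graph. Under the construction above (all edge weights 1, all ability means 1, step-function compatibility), a team's mean synergy is twice the fraction of its pairs that share an edge. The sketch assumes edges are stored as sorted `(u, v)` tuples, and the example graph is made up.

```python
from itertools import combinations


def reduced_team_mean(team, edges):
    """Mean synergy of a team under the max-clique reduction."""
    pairs = list(combinations(sorted(team), 2))
    linked = sum(1 for pair in pairs if pair in edges)
    return 2.0 * linked / len(pairs)


# Triangle 1-2-3 plus a pendant vertex 4 attached to 3.
edges = {(1, 2), (1, 3), (2, 3), (3, 4)}
print(reduced_team_mean({1, 2, 3}, edges))  # 2.0, a clique
print(reduced_team_mean({2, 3, 4}, edges))  # below 2.0, pair (2, 4) has no edge
```

Only cliques reach the maximum value of 2, so the largest team that attains it is a max-clique.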
Approximation Algorithm
• Simulated Annealing
• Looking at teams similar to the current best, and comparing them
• Generate a random team
• Repeat constant times:
• Find a new team similar to the current best: swap a node in $A$
• Evaluate both teams
• Replace if the new team is better
• Return the best team found
• Runs in $O(n^2)$ if $n$ is known.
• Evaluating $\mathbb{S}(A)$ is $O(n^2)$, where $n = |A|$
• $O(N^3)$ if $n$ is unknown
Approximation Algorithm
ApproxδOptTeam(n, δ, S):
  A_best ← Random(S, n)
  Repeat k times:
    A_new ← swap some a ∈ A_best with a′ ∈ V \ A_best
    Compare A_new and A_best
    Replace A_best if A_new is better
  Return A_best
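A runnable sketch of the swap-based search above. It follows the pseudocode literally and accepts a new team only when it scores higher; a full simulated-annealing schedule would also accept some worse teams with a temperature-dependent probability. `score` is a hypothetical callable for a team's $\delta$-value, and the `k` and `seed` defaults are arbitrary.

```python
import random


def approx_optimal_team(agents, n, score, k=1000, seed=0):
    """Random-restart-free local search over teams of size n using single swaps."""
    rng = random.Random(seed)
    best = set(rng.sample(sorted(agents), n))
    best_score = score(best)
    for _ in range(k):
        outside = sorted(set(agents) - best)
        if not outside:
            break  # the team already contains every agent
        candidate = set(best)
        candidate.remove(rng.choice(sorted(best)))
        candidate.add(rng.choice(outside))
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best, best_score


# Toy usage: maximise the sum of member ids among agents 0..9.
print(approx_optimal_team(range(10), 3, sum, k=2000, seed=1))
```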
Comparison
• Effectiveness of team $A$ is $\frac{\delta_A - \delta_{min}}{\delta_{max} - \delta_{min}}$
• This measures where $A$’s performance falls between the worst and best teams.
Learning the Synergy Graph
• We have observations, $O$, containing all people, $A$.
• Each observation is $o = (A, p)$: a team $A$ and its performance $p$.
• Find a synergy graph that best fits the observations.
• Need to find ability of each person.
• Need to find the compatibility between people.
• Strategy: Simulated Annealing
Learning Algorithm
LearnSynergyGraph(O):
  G ← RandomGraph(A)
  C ← FitAbilitiesToGraph(G, O)
  score ← LogLikelihood(G, C, O)
  Repeat constant times:
    G′ ← SimilarGraph(G)
    C′ ← FitAbilitiesToGraph(G′, O)
    Compare scores of G and G′
    G ← G′ if G′ is better
  Return G
Generating G and Finding Similar G’
• RandomGraph($A$)
• Vertices represent each person
• Randomly put edges of random weights between vertices
• SimilarGraph($G$)
• Do one of the following to $G$:
• Increase a random edge’s weight by 1
• Decrease a random edge’s weight by 1
• Remove a random edge
• Add a random edge of random weight
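The four neighbour moves can be sketched as follows. The graph is assumed to be a dict `{(u, v): weight}` with `u < v`, and the sketch adds two assumptions not stated on the slide: weights never drop below 1, and new edges get a weight between 1 and a hypothetical `max_weight`. It does not check that the graph stays connected.

```python
import random
from itertools import combinations


def similar_graph(edges, vertices, rng, max_weight=5):
    """Return a copy of edges with exactly one random local change applied."""
    new_edges = dict(edges)
    present = sorted(new_edges)
    absent = [p for p in combinations(sorted(vertices), 2) if p not in new_edges]
    moves = []
    if present:
        moves += ["increase", "remove"]
        if any(new_edges[e] > 1 for e in present):
            moves.append("decrease")
    if absent:
        moves.append("add")
    if not moves:
        return new_edges  # fewer than two vertices, nothing to change
    move = rng.choice(moves)
    if move == "increase":
        new_edges[rng.choice(present)] += 1
    elif move == "decrease":
        new_edges[rng.choice([e for e in present if new_edges[e] > 1])] -= 1
    elif move == "remove":
        del new_edges[rng.choice(present)]
    else:
        new_edges[rng.choice(absent)] = rng.randint(1, max_weight)
    return new_edges


rng = random.Random(0)
print(similar_graph({(0, 1): 2, (1, 2): 1}, range(4), rng))
```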
Similar Graph:
Fitting Abilities to a Graph
• Look at all teams of size 2 or 3 of $A$, $A_{2,3}$.
• For each $A \in A_{2,3}$, there are observations of $A$, each with a
performance.
• Fit a normal distribution to the observed performance of $A$.
• $D_A \sim N(x_A, s_A^2)$ is the observed distribution of $A$
• $D$ is the set of all $D_A$
• We want the distribution of $\mathbb{S}(A)$ to match the distribution
of $D_A$.
• Fit $\mathbb{S}(A) \sim N(\mu_A, \sigma_A^2)$ to $D_A \sim N(x_A, s_A^2)$ as best we can by choosing
$(\mu_i, \sigma_i^2)$ for each person
Fitting Abilities
• For $\mathbb{S}(A)$ with $A$ of size 2:
• $\mu_A = \phi(d)\mu_i + \phi(d)\mu_j$
• $\sigma_A^2 = \phi(d)^2\sigma_i^2 + \phi(d)^2\sigma_j^2$
• Similar for $A$ of size 3.
• We know $\phi(d)$ from the graph, and the $x_A, s_A$ we want to fit to.
• $M_1$: matrix of $\phi(d)$, one row per team; $B_1 = (x_{A_1}, \dots, x_{A_{|D|}})$
• Fit $M_1 X_1 = B_1$ for $X_1 = (\mu_1, \dots, \mu_N)$
• $M_2$: matrix of $\phi^2(d)$, one row per team; $B_2 = (s_{A_1}^2, \dots, s_{A_{|D|}}^2)$
• Fit $M_2 X_2 = B_2$ for $X_2 = (\sigma_1^2, \dots, \sigma_N^2)$
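As a worked illustration of "similar for size 3", consider a hypothetical team $A = \{a_1, a_2, a_3\}$ and write $\phi_{ij} = \phi(d(a_i, a_j))$. Expanding the team formulas and collecting terms per person gives the entries of that team's row in $M_1$ and $M_2$. The factor $1/9$ assumes the variance of an average over $\binom{3}{2} = 3$ pairs scales by $1/3^2$.

```latex
\begin{align*}
\mu_A &= \tfrac{1}{3}\left[\phi_{12}(\mu_1 + \mu_2) + \phi_{13}(\mu_1 + \mu_3) + \phi_{23}(\mu_2 + \mu_3)\right] \\
      &= \tfrac{\phi_{12} + \phi_{13}}{3}\,\mu_1 + \tfrac{\phi_{12} + \phi_{23}}{3}\,\mu_2 + \tfrac{\phi_{13} + \phi_{23}}{3}\,\mu_3 \\
\sigma_A^2 &= \tfrac{1}{9}\left[\phi_{12}^2(\sigma_1^2 + \sigma_2^2) + \phi_{13}^2(\sigma_1^2 + \sigma_3^2) + \phi_{23}^2(\sigma_2^2 + \sigma_3^2)\right] \\
      &= \tfrac{\phi_{12}^2 + \phi_{13}^2}{9}\,\sigma_1^2 + \tfrac{\phi_{12}^2 + \phi_{23}^2}{9}\,\sigma_2^2 + \tfrac{\phi_{13}^2 + \phi_{23}^2}{9}\,\sigma_3^2
\end{align*}
```

The row of $M_1$ for this team therefore holds the three coefficients of $\mu_1, \mu_2, \mu_3$ and zeros for everyone else, and likewise for $M_2$. Both systems are linear in the unknowns, so they can be fitted by least squares.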
Code:
Log-Likelihood
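• Sum of log-likelihoods for each observation, given the
synergy graph and abilities.
• For an observation $o = (A, p)$:
$\log\left(\frac{1}{\sigma_A\sqrt{2\pi}} \exp\left(-\frac{(p - \mu_A)^2}{2\sigma_A^2}\right)\right)$
• The argument of the log is the probability density of the normal distribution $N(\mu_A, \sigma_A^2)$ at value $p$.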
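A direct sketch of the summed log-likelihood. It expands the log of the normal density algebraically so that no tiny densities are exponentiated. `team_distribution` is a hypothetical callable that returns `(mu_A, sigma_A ** 2)` for a team under the current graph and abilities.

```python
import math


def log_likelihood(observations, team_distribution):
    """Sum of log N(p; mu_A, var_A) over observations given as (team, p) pairs."""
    total = 0.0
    for team, p in observations:
        mu, var = team_distribution(team)
        # log(1 / sqrt(2*pi*var)) - (p - mu)^2 / (2*var)
        total += -0.5 * math.log(2.0 * math.pi * var) - (p - mu) ** 2 / (2.0 * var)
    return total


# Toy usage: every team is modelled as a standard normal.
observations = [((1, 2), 0.0), ((1, 3), 1.0)]
print(log_likelihood(observations, lambda team: (0.0, 1.0)))
```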
Code
Evaluation
• Generate a hidden graph, with compatibility and abilities.
• Generate a set of observations
• Run the learning algorithm
• Compare Log-Likelihood of learned graph with true graph.
Results
Using for RoboCup
Thoughts:
• Domain specific:
• Works well for the given problem, but may not be good for other
applications.
• Tested for relatively small graphs.
• May not be generalizable to large sparse graphs.
• Due to randomness of search.
• Modifying for learning large graphs:
• Generate a better initial graph.
• Make better choice for a similar graph.
• More localized evaluation.