Towards a Theoretic Understanding of DCEE
Scott Alfeld, Matthew E. Taylor, Prateek Tandon, and Milind Tambe
Lafayette College
http://teamcore.usc.edu

Forward Pointer
When Should There be a "Me" in "Team"? Distributed Multi-Agent Optimization Under Uncertainty
Matthew E. Taylor, Manish Jain, Yanquin Jin, Makoto Yokoo, & Milind Tambe
Wednesday, 8:30 – 10:30, Coordination and Cooperation

Teamwork: Foundational MAS Concept
• Joint actions improve outcome
• But they increase communication & computation
• Over two decades of work
• This paper: increased teamwork can harm the team
  – Even without considering communication & computation
  – Only considering team reward
  – Multiple algorithms, multiple settings
  – But why?

DCOPs: Distributed Constraint Optimization Problems
• Multiple domains
  – Meeting scheduling
  – Traffic light coordination
  – RoboCup soccer
  – Multi-agent plan coordination
  – Sensor networks
• Distributed
  – Robust to failure
  – Scalable
• (In)Complete
  – Quality bounds

DCOP Framework
• Agents a1, a2, a3 with constraints (a1, a2) and (a2, a3)
• Each constraint's reward table covers four joint assignments, with rewards 10, 0, 0, and 6
• Team reward is the sum of the constraint rewards: $R(a) = \sum_{S} R_S(a_S)$
• Different "levels" of teamwork are possible
• Finding a complete (optimal) solution is NP-hard

D-CEE: Distributed Coordination of Exploration and Exploitation
• Environment may be unknown
• Maximize on-line reward over some number of rounds
  – Exploration vs. exploitation
• Demonstrated on a mobile ad-hoc network
  – Simulation [Released] & Robots [Released Soon]

DCOP: Distributed Constraint Optimization Problem

DCOP → DCEE: Distributed Coordination of Exploration and Exploitation

DCEE Algorithm: SE-Optimistic (will build upon later)
• Rewards on [1, 200]
• Chain a1 – a2 – a3 – a4 with current constraint rewards 99 (a1–a2), 50 (a2–a3), and 75 (a3–a4)
• Each agent is optimistic: "If I move, I'd get R = 200" on every one of its constraints
• Resulting gains ("If I move, I'd gain ..."):
  – a1: 200 − 99 = 101
  – a2: (200 − 99) + (200 − 50) = 251
  – a3: (200 − 50) + (200 − 75) = 275
  – a4: 200 − 75 = 125
• Explore or exploit?

Success! [ATSN-09][IJCAI-09]
• Both classes of (incomplete) algorithms
• Simulation and on robots
  – Ad hoc wireless network
[Figure: scaled cumulative signal strength (improvement if performance > 0) for SE-Optimistic, SE-Mean, BE-Stay, BE-Backtrack, and BE-Rebid across topologies Chain, Density = 1/3, Density = 2/3, and Full]

k-Optimality
• Increased coordination
  – Originally a DCOP formulation
  – In DCOP, increased k = increased team reward
• Find groups of agents to change variables
  – Joint actions
  – Neighbors of a moving group cannot move
• Defines the amount of teamwork (higher communication & computation overheads)

"k-Optimality" in DCEE
• k = 1, 2, ...
  o Groups of size k form; those with the most to gain move (change the value of their variable)
  o A group can only move if no other agents in its neighborhood move

Example: SE-Optimistic-2
• Rewards on [1, 200]; same chain as above, with individual gains 101 (a1), 251 (a2), 275 (a3), and 125 (a4)
• A pair's gain is the sum of the two individual gains minus the gain on their shared constraint, which both agents counted:
  – (a1, a2): 101 + 251 − 101 = 251
  – (a2, a3): 251 + 275 − 150 = 376
  – (a3, a4): 275 + 125 − 125 = 275

Sample coordination results
• Omniscient: confirms the DCOP result, as expected
[Figure: Artificially Supplied Rewards (DCOP); total average gain for k = 1, 2, 3 with Omniscient and Optimistic agents on a complete graph and a chain graph]

Physical Implementation
• Create robots
• Mobile ad-hoc wireless network

Confirms Team Uncertainty Penalty
[Figure: total gain on chain and complete graphs]
• Averaged over 10 trials each
• Trend confirmed!
• (Huge standard error)

Problem with "k-Optimal"
• Unknown rewards: agents cannot know whether moving will increase reward!
• Define a new term: L-Movement
  – Number of agents that can change variables per round
  – Independent of exploration algorithm
  – Graph dependent
  – Alternate measure of teamwork

L-Movement
• Example: k = 1 algorithms
  – L is the size of the maximum independent set (the largest maximal independent set) of the graph
  – NP-hard to calculate for a general graph; harder for higher k
• Consider ring & complete graphs, both with 5 vertices
  – Ring graph: the maximum independent set has size 2
  – Complete graph: the maximum independent set has size 1
• For k = 1
  – L = 1 for a complete graph
  – L = ⌊n/2⌋ for a ring graph with n vertices (the size of its maximum independent set)

Configuration Hypercube
• No (partial-)assignment is believed to be better than another
  – Wlog, agents can select the next value when exploring
• Define the configuration hypercube C
  – Each agent is a dimension
  – C[v1, ..., vn] is the total reward when each agent ai takes value vi
  – C[v1, ..., vn] cannot be calculated without exploration
  – Values are drawn from a known reward distribution
• Moving along an axis in the hypercube → an agent changing its value
  – Example: with 3 agents, C is 3-dimensional; changing from C[a, b, c] to C[a, b, c'] means agent a3 changes its value from c to c'
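The L-Movement slide defines L as the number of agents that can change their variables in one round. The following is a minimal brute-force sketch in Python. It rests on one interpretation that the slides do not state as a definition: a set of agents may move together if and only if every connected component of the subgraph they induce has at most k agents. Under that assumption it reproduces the 5-agent ring and complete-graph values given in the "k and L" table. The names `l_movement`, `induced_components`, `ring`, and `complete` are hypothetical. The search is exponential in the number of agents, which is consistent with L being NP-hard to compute for a general graph.

```python
from itertools import combinations


def induced_components(graph, movers):
    """Connected components of the subgraph induced by the moving agents."""
    movers = set(movers)
    seen, components = set(), []
    for start in movers:
        if start in seen:
            continue
        # Depth-first search that only follows edges between moving agents.
        stack, comp = [start], []
        seen.add(start)
        while stack:
            v = stack.pop()
            comp.append(v)
            for u in graph[v]:
                if u in movers and u not in seen:
                    seen.add(u)
                    stack.append(u)
        components.append(comp)
    return components


def l_movement(graph, k):
    """Largest number of agents that can change value in one round.

    Assumption (an interpretation, not the authors' definition): agents
    move in groups of at most k, and no moving agent may neighbor a
    different moving group, i.e. every component of the induced subgraph
    has <= k agents. Brute force over subsets, largest first.
    """
    agents = list(graph)
    for size in range(len(agents), 0, -1):
        for movers in combinations(agents, size):
            if all(len(c) <= k for c in induced_components(graph, movers)):
                return size
    return 0


def ring(n):
    """Cycle graph on n vertices as an adjacency dict."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def complete(n):
    """Complete graph on n vertices as an adjacency dict."""
    return {i: set(range(n)) - {i} for i in range(n)}


if __name__ == "__main__":
    # One line per k: k, L for the 5-ring, L for the 5-clique.
    for k in range(1, 6):
        print(k, l_movement(ring(5), k), l_movement(complete(5), k))
```

Running the script prints `1 2 1`, `2 3 2`, `3 3 3`, `4 4 4`, and `5 5 5`. For k = 1 this is just the maximum independent set, so `l_movement(ring(n), 1)` equals ⌊n/2⌋.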
Agent A3 changes from c to c’ 23 How many agents can move? (1/2) • In a ring graph with 5 nodes o k=1:L=2 o k=2:L=3 • In a complete graph with 5 nodes o k=1:L=1 o k=2:L=2 24 How many agents can move? (2/2) Configuration is reachable by an algorithm with movement L in s steps if an only if and C[2,2] reachable for L=1 if s ≥ 4 25 L-Movement Experiments For various DCEE problems, distributions, and L: For steps s = 1...30: 1. Construct hypercube with s values per dimension 2. Find M, the max achievable reward in s steps, given L 3. Return average of 50 runs Example: 2D Hypercube o Only half reachable if L=1 o All locations reachable if L=2 s s Average Maximum Reward Discovered Restricting to L-Movement: Complete Complete Graph o k=1:L=1 o k=2:L=2 L=1→2 27 Average Maximum Reward Discovered Restricting to L-Movement: Ring L=2→3 Ring graph o k=1:L=2 o k=2:L=3 28 Complete Ring 1. Uniform distribution of rewards 2. 4 agents 3. Different normal distribution 29 k and L: 5-agent graphs K value Ring Graph, L value Complete Graph, L value 1 2 1 2 3 2 3 3 3 4 4 4 5 5 5 • Increasing k changes L less in ring than complete • Configuration Hypercube is upper bound • Posit a consistent negative effect • Suggests why increasing k has different effects: • Larger improvement in complete than ring for increasing k 30 L-movement May Help Explain Team Uncertainty Penalty • L = 2 will be able to explore more of C than algorithm with L = 1 – Independent of exploration algorithm! – Determined by k and graph structure – C is upper bound – posit constant negative effect • Any algorithm experiences diminishing returns as k increases – Consistent with DCOP results • L-movement difference between k = 1 algorithms and k = 2 – Larger difference in graphs with more agents – For k = 1, L = 1 for a complete graph – For k = 1, L increases with the number of vertices in a ring graph 31 Towards a Theoretic Understanding of DCEE Scott Alfeld, Matthew E. 
Taylor, Prateek Tandon, and Milind Tambe http://teamcore.usc.edu 33
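The following small Python sketch illustrates the configuration-hypercube upper bound behind the L-Movement Experiments. It relies on several assumptions, none of them taken from the slides:

• All agents start at value index 0 and explore values in order, so agent i needs config[i] moves.
• At most L agents move per step, and each agent moves at most once per step.
• Rewards are i.i.d. Uniform[0, 1).

Under those assumptions, a configuration is reachable in s steps when its coordinates sum to at most L·s and none exceeds s. This agrees with the C[2, 2] example (L = 1 needs s ≥ 4) and with the 2D example (about half reachable for L = 1, all for L = 2). It is not presented as the slides' missing condition. All function names are hypothetical.

```python
import itertools
import random


def reachable(config, L, s):
    """Whether a hypercube cell can be reached in s steps with movement L.

    Assumes agents start at index 0, explore values in order, at most L
    agents move per step, and each agent moves at most once per step.
    """
    return sum(config) <= L * s and max(config) <= s


def reachable_fraction(n_agents, s, L):
    """Fraction of an s-values-per-dimension hypercube reachable in s steps."""
    cells = list(itertools.product(range(s), repeat=n_agents))
    return sum(reachable(c, L, s) for c in cells) / len(cells)


def max_discovered_reward(n_agents, s, L, rng):
    """One run: sample every cell's reward, keep the best reachable one."""
    best = float("-inf")
    for config in itertools.product(range(s), repeat=n_agents):
        reward = rng.random()  # assumed reward distribution: Uniform[0, 1)
        if reachable(config, L, s):
            best = max(best, reward)
    return best


def average_max_reward(n_agents, s, L, runs=50, seed=0):
    """Average of M over `runs` independently sampled hypercubes."""
    rng = random.Random(seed)
    total = sum(max_discovered_reward(n_agents, s, L, rng) for _ in range(runs))
    return total / runs
```

For example, `reachable_fraction(2, 4, 1)` gives 13/16 and `reachable_fraction(2, 4, 2)` gives 1.0. With the same seed, every cell gets the same sampled reward for both values of L, and the L = 2 reachable set contains the L = 1 set. The L = 2 average is therefore never lower.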