Towards a Theoretic
Understanding of DCEE
Scott Alfeld, Matthew E. Taylor,
Prateek Tandon, and Milind Tambe
Lafayette
College
http://teamcore.usc.edu
Forward Pointer
When Should There be a “Me” in “Team”? Distributed
Multi-Agent Optimization Under Uncertainty
Matthew E. Taylor, Manish Jain, Yanqin Jin, Makoto
Yokoo, & Milind Tambe
Wednesday, 8:30 – 10:30 Coordination and Cooperation 1
Teamwork:
Foundational MAS Concept
• Joint actions improve outcome
• But increase communication & computation overhead
• Over two decades of work
• This paper: increased teamwork can harm team
– Even without considering communication & computation
– Only considering team reward
– Multiple algorithms, multiple settings
– But why?
DCOPs:
Distributed Constraint Optimization Problems
• Multiple domains
– Meeting scheduling
– Traffic light coordination
– RoboCup soccer
– Multi-agent plan coordination
– Sensor networks
• Distributed
– Robust to failure
– Scalable
• (In)Complete
– Quality bounds
DCOP Framework
Constraint graph (chain): a1 - a2 - a3

Reward tables (one per constraint; rewards listed in row order):
(a1, a2): 10, 0, 0, 6
(a2, a3): 10, 0, 0, 6

R(a) = \sum_{S} R_S(a_S)
Different “levels” of teamwork possible
Complete Solution is NP-Hard
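
The sum R(a) = \sum_S R_S(a_S) and the idea of different "levels" of teamwork can be illustrated on a three-agent chain. The following is a minimal Python sketch, not the authors' implementation. It assumes binary variables and assumes the value pairs (0,0), (0,1), (1,0), (1,1) map to rewards 10, 0, 0, 6 for both constraints; that mapping and the helper names `total_reward` and `is_k_optimal` are illustrative.

```python
from itertools import product

# Assumed reward table shared by both constraints (a1, a2) and (a2, a3).
# The mapping from value pairs to rewards is an illustrative assumption.
TABLE = {(0, 0): 10, (0, 1): 0, (1, 0): 0, (1, 1): 6}
CONSTRAINTS = [(0, 1), (1, 2)]  # index pairs for (a1, a2) and (a2, a3)


def total_reward(assignment):
    """R(a): sum of each constraint's reward under the joint assignment."""
    return sum(TABLE[(assignment[i], assignment[j])] for i, j in CONSTRAINTS)


def is_k_optimal(assignment, k):
    """True if no group of at most k agents can raise R(a) by changing values."""
    base = total_reward(assignment)
    for other in product((0, 1), repeat=len(assignment)):
        changed = sum(x != y for x, y in zip(assignment, other))
        if 0 < changed <= k and total_reward(other) > base:
            return False
    return True


# Brute force over all 2^3 joint assignments (feasible only for tiny problems).
best = max(product((0, 1), repeat=3), key=total_reward)
print(best, total_reward(best))
print(is_k_optimal((1, 1, 1), 2), is_k_optimal((1, 1, 1), 3))
```

Under these assumed tables, the assignment (1, 1, 1) cannot be improved by any one or two agents acting together, yet it is not the complete solution, which is why the amount of coordination matters.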
D-CEE: Distributed Coordination of
Exploration and Exploitation
• Environment may be unknown
• Maximize on-line reward over some number of rounds
– Exploration vs. Exploitation
• Demonstrated on a mobile ad-hoc network
– Simulation [Released] & Robots [Released Soon]
DCOP
Distributed Constraint Optimization Problem
DCOP → DCEE
Distributed Coordination of Exploration and Exploitation
DCEE Algorithm: SE-Optimistic
(Will build upon later)
Rewards on [1,200]
Current constraint rewards (chain a1 - a2 - a3 - a4):
(a1, a2): 99
(a2, a3): 50
(a3, a4): 75
Each agent: “If I move, I’d get R=200”
DCEE Algorithm: SE-Optimistic
(Will build upon later)
Rewards on [1,200]
Current constraint rewards (chain a1 - a2 - a3 - a4):
(a1, a2): 99
(a2, a3): 50
(a3, a4): 75
Optimistic gain per agent (“If I move, I’d gain ...”):
a1: 101
a2: 251
a3: 275
a4: 125
Explore or Exploit?
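
The per-agent gains on this slide follow from assuming that every constraint touching a moving agent jumps to the maximum reward of 200. A minimal Python sketch of that arithmetic, with illustrative names (`edges`, `optimistic_gain`) that are not from the original:

```python
MAX_REWARD = 200  # upper end of the reward range [1, 200]

# Chain a1 - a2 - a3 - a4 with the current reward on each constraint.
edges = {("a1", "a2"): 99, ("a2", "a3"): 50, ("a3", "a4"): 75}


def optimistic_gain(agent):
    """Gain if every constraint touching `agent` rose to MAX_REWARD."""
    return sum(MAX_REWARD - r for pair, r in edges.items() if agent in pair)


gains = {a: optimistic_gain(a) for a in ("a1", "a2", "a3", "a4")}
print(gains)
```

For example, a2 touches two constraints, so its gain is (200 - 99) + (200 - 50) = 251.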
Success! [ATSN-09][IJCAI-09]
• Both classes of (incomplete) algorithms
• Simulation and on Robots
– Ad hoc Wireless Network
[Figure: Scaled Cumulative Signal Strength (improvement if performance > 0) under varying topology (Chain, Density = 1/3, Density = 2/3, Full) for SE-Optimistic, SE-Mean, BE-Stay, BE-Backtrack, and BE-Rebid]
k-Optimality
• Increased coordination – originally a DCOP formulation
– In DCOP, increased k = increased team reward
• Find groups of agents to change variables
– Joint actions
– Neighbors of moving group cannot move
• Defines amount of teamwork
(Higher communication & computation overheads)
“k-Optimality” in DCEE
• k=1, 2, ...
o Groups of size k form, those with the most to gain
move (change the value of their variable)
o A group can only move if no other agents in its
neighborhood move
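
One way to read the two rules above is as a local-maximum test: a group moves only if its gain beats every group it overlaps or borders. The Python sketch below is a centralized illustration under that assumption, not the distributed protocol itself; the gains are made-up numbers, and `conflicts` and `select_movers` are hypothetical helper names.

```python
def conflicts(g1, g2, neighbors):
    """Two groups conflict if they share an agent or contain neighboring agents."""
    if set(g1) & set(g2):
        return True
    return any(b in neighbors[a] for a in g1 for b in g2)


def select_movers(group_gains, neighbors):
    """Groups with positive gain that strictly beat every conflicting group."""
    winners = []
    for group, gain in group_gains.items():
        if gain <= 0:
            continue
        rivals = [h for h in group_gains
                  if h != group and conflicts(group, h, neighbors)]
        if all(gain > group_gains[h] for h in rivals):
            winners.append(group)
    return winners


# Hypothetical chain a1 - a2 - a3 - a4 with made-up gains for groups of size k = 2.
neighbors = {"a1": {"a2"}, "a2": {"a1", "a3"}, "a3": {"a2", "a4"}, "a4": {"a3"}}
group_gains = {("a1", "a2"): 30, ("a2", "a3"): 80, ("a3", "a4"): 50}
movers = select_movers(group_gains, neighbors)
print(movers)
```

Here only the middle pair moves, because both other pairs conflict with it and have less to gain.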
Example: SE-Optimistic-2
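Rewards on [1,200]
Current constraint rewards (chain a1 - a2 - a3 - a4):
(a1, a2): 99
(a2, a3): 50
(a3, a4): 75
Individual gains (“If I move, I’d gain ...”):
a1: 200 - 99 = 101
a2: 251
a3: 275
a4: 125
Pair gains (sum of the two individual gains minus the gain on the shared constraint, which would otherwise be counted twice):
(a1, a2): 101 + 251 - 101 = 251
(a2, a3): 251 + 275 - 150 = 376
(a3, a4): 275 + 125 - 125 = 275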
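
The pair arithmetic in this example can be checked mechanically. The sketch below is a minimal Python illustration with hypothetical helper names. It assumes a pair's joint gain is the sum of both agents' optimistic gains minus the gain on their shared constraint, so that constraint is counted once.

```python
MAX_REWARD = 200  # upper end of the reward range [1, 200]

# Chain a1 - a2 - a3 - a4 with the current reward on each constraint.
edges = {("a1", "a2"): 99, ("a2", "a3"): 50, ("a3", "a4"): 75}


def edge_gain(pair):
    """Optimistic gain on one constraint."""
    return MAX_REWARD - edges[pair]


def agent_gain(agent):
    """Optimistic gain for one agent moving alone."""
    return sum(edge_gain(p) for p in edges if agent in p)


def pair_gain(pair):
    """Joint gain for two neighboring agents; shared constraint counted once."""
    a, b = pair
    return agent_gain(a) + agent_gain(b) - edge_gain(pair)


pair_gains = {p: pair_gain(p) for p in edges}
print(pair_gains)
```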
Sample coordination results
Omniscient: confirms DCOP result, as expected
[Figure: Total Average Gain for k=1, k=2, and k=3 on the Complete Graph and the Chain Graph, comparing Omniscient (artificially supplied rewards, i.e. DCOP) with Optimistic]
Physical Implementation
• Create Robots
• Mobile ad-hoc Wireless Network
Confirms Team Uncertainty Penalty
• Averaged over 10 trials each
• Trend confirmed!
• (Huge standard error)
[Figure: Total Gain on Chain and Complete graphs]
Problem with “k-Optimal”
• Unknown rewards
– cannot know whether moving will increase reward!
• Define new term: L-Movement
– # of agents that can change variables per round
– Independent of exploration algorithm
– Graph dependent
– Alternate measure of teamwork
L-Movement
• Example: k = 1 algorithms
– L is the size of the maximum independent set of the graph
– NP-hard to calculate for a general graph
– harder for higher k
• Consider ring & complete graphs, both with 5 vertices
– ring graph: maximum independent set has size 2
– complete graph: maximum independent set has size 1
• For k = 1
– L = 1 for a complete graph
– L for a ring graph with n vertices is the size of its maximum independent set: \lfloor n/2 \rfloor (2 when n = 5)
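
For small graphs the k = 1 values of L can be checked by brute force. The Python sketch below uses illustrative helper names and compares ring graphs against the standard graph-theory fact that a cycle on n vertices has a maximum independent set of size floor(n/2). Exhaustive search is exponential, so this only suits tiny graphs.

```python
from itertools import combinations


def ring(n):
    """Adjacency sets for a ring (cycle) graph on n >= 3 vertices."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def complete(n):
    """Adjacency sets for a complete graph on n vertices."""
    return {i: set(range(n)) - {i} for i in range(n)}


def max_independent_set_size(neighbors):
    """Largest set of vertices with no edge between any two of them."""
    nodes = list(neighbors)
    for size in range(len(nodes), 0, -1):
        for subset in combinations(nodes, size):
            if all(b not in neighbors[a] for a, b in combinations(subset, 2)):
                return size
    return 0


ring_sizes = {n: max_independent_set_size(ring(n)) for n in range(3, 9)}
print(ring_sizes)
print(max_independent_set_size(complete(5)))
```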
Configuration Hypercube
• No (partial) assignment is believed to be better than another
– wlog, agents can select the next value when exploring
• Define configuration hypercube: C
– Each agent is a dimension
– C[v_1, ..., v_n] is the total reward when each agent i takes value v_i
– cannot be calculated without exploration
– values drawn from known reward distribution
• Moving along an axis in the hypercube → agent changing value
• Example: 3 agents (C is 3-dimensional)
– Changing from C[a, b, c] to C[a, b, c’]
– Agent A3 changes from c to c’
How many agents can move? (1/2)
• In a ring graph with 5 nodes
o k=1:L=2
o k=2:L=3
• In a complete graph with 5 nodes
o k=1:L=1
o k=2:L=2
How many agents can move? (2/2)
A configuration C[v_1, ..., v_n] is reachable by an algorithm with movement L in s steps
if and only if
\max_i v_i \le s
and
\sum_i v_i \le L \cdot s
Example: C[2,2] is reachable for L=1 if s ≥ 4
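
Reachability can be explored with a small simulation. The Python sketch below rests on stated assumptions: agents start at index 0 on every axis, each round at most L agents advance one step along their own axis, and the closed-form test is that no agent needs more than s moves and the total number of moves is at most L times s. That test is consistent with the C[2,2] example but is an assumption here, and both function names are illustrative.

```python
from itertools import combinations, product


def reachable(config, L, s):
    """Assumed closed-form test: per-agent and total move budgets."""
    return max(config) <= s and sum(config) <= L * s


def reachable_by_search(config, L, s):
    """Brute force: simulate s rounds, advancing up to L agents one step each."""
    n = len(config)
    start = tuple([0] * n)
    seen = {start}
    frontier = {start}
    for _ in range(s):
        nxt = set()
        for state in frontier:
            for m in range(1, L + 1):
                for movers in combinations(range(n), m):
                    nxt.add(tuple(v + (i in movers) for i, v in enumerate(state)))
        frontier = nxt - seen
        seen |= frontier
    return tuple(config) in seen


print(reachable((2, 2), L=1, s=4), reachable((2, 2), L=1, s=3))

# Cross-check the assumed formula against the simulation on small cases.
agree = all(
    reachable(c, L, s) == reachable_by_search(c, L, s)
    for n in (2, 3)
    for L in range(1, n + 1)
    for s in range(1, 5)
    for c in product(range(s + 2), repeat=n)
)
print(agree)
```

Under this movement model the closed form and the simulation agree on every small case checked, which is the main reason for including both.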
L-Movement Experiments
For various DCEE problems, distributions, and L:
For steps s = 1...30:
1. Construct hypercube with s values per dimension
2. Find M, the max achievable reward in s steps, given L
3. Return average of 50 runs
Example: 2D Hypercube
o Only half reachable if L=1
o All locations reachable if L=2
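
The experimental loop above can be sketched in a few lines of Python. This is an illustration under loud assumptions, not the authors' experimental code: each hypercube cell gets an independent reward drawn uniformly from [1, 200], agents start at index 0, and a cell counts as reachable when the sum of its indices is at most L times s. The function names and the seed handling are hypothetical.

```python
import random
from itertools import product


def max_reachable_reward(n_agents, L, s, rng):
    """Draw one hypercube with s values per dimension; return the best reachable reward."""
    best = float("-inf")
    for config in product(range(s), repeat=n_agents):
        reward = rng.uniform(1, 200)   # assumed reward distribution
        # Assumed reachability test; each index is below s, so only the sum matters.
        if sum(config) <= L * s:
            best = max(best, reward)
    return best


def average_max_reward(n_agents, L, s, runs=50, seed=0):
    """Average of the best reachable reward over `runs` random hypercubes."""
    rng = random.Random(seed)
    total = sum(max_reachable_reward(n_agents, L, s, rng) for _ in range(runs))
    return total / runs


for s in (2, 5, 10):
    print(s, average_max_reward(2, 1, s), average_max_reward(2, 2, s))
```

Because a reward is drawn for every cell whether or not it is reachable, the same seed produces the same hypercubes for both values of L, so the L=2 average can never fall below the L=1 average.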
Restricting to L-Movement: Complete
Complete Graph
o k=1:L=1
o k=2:L=2
[Figure: Average Maximum Reward Discovered, L=1→2]
Restricting to L-Movement: Ring
Ring graph
o k=1:L=2
o k=2:L=3
[Figure: Average Maximum Reward Discovered, L=2→3]
[Figure: results for Complete and Ring graphs under three settings: (1) uniform distribution of rewards, (2) 4 agents, (3) a different normal distribution]
k and L: 5-agent graphs
k value    Ring Graph, L value    Complete Graph, L value
1          2                      1
2          3                      2
3          3                      3
4          4                      4
5          5                      5
• Increasing k changes L less in ring than complete
• Configuration Hypercube is upper bound
• Posit a consistent negative effect
• Suggests why increasing k has different effects:
– Larger improvement in complete than ring for increasing k
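
The k-to-L values for the 5-agent graphs can be reproduced by brute force under one reading of L-movement. The sketch below assumes L is the largest set of agents that can move in one round when the movers form connected groups of at most k agents and no two distinct groups are adjacent. That reading is an assumption chosen because it reproduces the values above, and the helper names are illustrative.

```python
from itertools import combinations


def ring(n):
    """Adjacency sets for a ring graph on n >= 3 vertices."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def complete(n):
    """Adjacency sets for a complete graph on n vertices."""
    return {i: set(range(n)) - {i} for i in range(n)}


def component_sizes(nodes, neighbors):
    """Sizes of the connected components induced by `nodes`."""
    remaining = set(nodes)
    sizes = []
    while remaining:
        stack = [remaining.pop()]
        size = 0
        while stack:
            v = stack.pop()
            size += 1
            for u in neighbors[v] & remaining:
                remaining.remove(u)
                stack.append(u)
        sizes.append(size)
    return sizes


def l_movement(neighbors, k):
    """Largest mover set whose induced components all have at most k agents."""
    all_nodes = list(neighbors)
    for size in range(len(all_nodes), 0, -1):
        for movers in combinations(all_nodes, size):
            if max(component_sizes(movers, neighbors)) <= k:
                return size
    return 0


ring_L = [l_movement(ring(5), k) for k in range(1, 6)]
complete_L = [l_movement(complete(5), k) for k in range(1, 6)]
print(ring_L, complete_L)
```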
L-movement May Help Explain
Team Uncertainty Penalty
• An algorithm with L = 2 can explore more of C than an algorithm with L = 1
– Independent of exploration algorithm!
– Determined by k and graph structure
– C is upper bound – posit constant negative effect
• Any algorithm experiences diminishing returns as k increases
– Consistent with DCOP results
• L-movement difference between k = 1 algorithms and k = 2 algorithms
– Larger difference in graphs with more agents
– For k = 1, L = 1 for a complete graph
– For k = 1, L increases with the number of vertices in a ring graph