Towards Equilibrium Transfer in Markov Games
胡裕靖
2013-9-9
Outline
Background
Preliminary Ideas
Some Results
Background
Multi-agent Reinforcement Learning
Single-agent RL:
Path finding
Mountain Car
RL in multi-agent tasks
Robot Soccer
IKEA furniture robot
Markov Games
Markov decision process: $\langle S, A, r, p \rangle$
$S$: the discrete state space.
$A$: the action space of the agent.
$r: S \times A \to \mathbb{R}$ is the reward function.
$p: S \times A \times S \to [0, 1]$ is the transition function.
From one agent to more than one:
Markov game: $\langle N, S, \{A_i\}_{i=1}^{n}, \{r_i\}_{i=1}^{n}, p \rangle$
$N$: the set of agents.
$S$: the discrete state space.
$A = A_1 \times \cdots \times A_n$: the joint action space of the agents.
$r_i: S \times A \to \mathbb{R}$ is the reward function of agent $i$.
$p: S \times A \times S \to [0, 1]$ is the transition function.
Agents take joint actions.
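The tuple above can be written down as a small data structure. The following is a minimal sketch, not the author's implementation: the class name `MarkovGame`, its field names, the calling convention `reward(i, s, a)` and the toy values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Tuple

JointAction = Tuple[str, ...]


@dataclass
class MarkovGame:
    agents: List[str]                                      # N
    states: List[str]                                      # S
    actions: Dict[str, List[str]]                          # A_i for each agent i
    reward: Callable[[str, str, JointAction], float]       # r_i(s, a), called as reward(i, s, a)
    transition: Callable[[str, JointAction, str], float]   # p(s, a, s')

    def joint_actions(self) -> List[JointAction]:
        """Enumerate the joint action space A = A_1 x ... x A_n."""
        return list(product(*(self.actions[i] for i in self.agents)))

    def transition_is_valid(self, tol: float = 1e-9) -> bool:
        """Check that p(s, a, .) sums to 1 for every state and joint action."""
        return all(
            abs(sum(self.transition(s, a, s2) for s2 in self.states) - 1.0) <= tol
            for s in self.states
            for a in self.joint_actions()
        )


# Toy two-agent, two-state game; every value here is an illustrative assumption.
game = MarkovGame(
    agents=["agent1", "agent2"],
    states=["s0", "s1"],
    actions={"agent1": ["a1", "a2"], "agent2": ["b1", "b2"]},
    reward=lambda i, s, a: 1.0 if (s == "s1" and i == "agent1") else 0.0,
    transition=lambda s, a, s2: 0.5,  # uniform over the two states
)
```

With a single agent, `joint_actions()` reduces to that agent's own action set, which recovers the MDP case.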
Equilibrium-based MARL
Some equilibrium solution concepts in game theory can be adopted
Our Previous Work
• Equilibrium-based MARL:
  • Multi-agent reinforcement learning with meta equilibrium []
  • Multi-agent reinforcement learning by negotiation with unshared value functions []
  • Focus: combining MARL with equilibrium solution concepts
• Problematic issues:
  • Computing an equilibrium is complicated and time-consuming
  • Nash equilibrium computation belongs to the complexity class TFNP! []
  • For tasks with many agents, equilibrium-based MARL algorithms may take too much time
How to accelerate the learning process of equilibrium-based MARL?
Transfer Learning in RL
Matthew E. Taylor and Peter Stone. Transfer learning for reinforcement learning domains. Journal of Machine Learning Research, 2009.
Alessandra Lazaric. Transfer in reinforcement learning: a framework and a survey. Reinforcement Learning, Springer, 2012.
[Figure: knowledge learnt in $MDP$ (instances, policy, value function, model, …) is reused to accelerate learning in $MDP'$.]
Transfer Learning in Markov Games?
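[Figure: knowledge (instances, policy, value function, model, …) transferred from $MarkovGame$ to $MarkovGame'$.]
Inter-task transfer
Inner-task transfer
[Figure: the sequence of normal-form games encountered over time $t$ within one Markov game.]
Why not transfer between these normal-form games within a Markov game?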
Inner-task Transfer
• Transfer equilibria between similar normal-form games during learning in a Markov game:
  • Reuse the equilibria computed in previous games
  • Reduce learning time
• Key problems:
  • Which games are similar?
    • For example, the games that occur on different visits to the same state
  • How to transfer an equilibrium?
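Payoff tables of agent 1 in state $s$ at times $t$ and $t+1$, $Q_1^{t}(s, a, b)$ and $Q_1^{t+1}(s, a, b)$. The two tables differ only in entry (a2, b2):

(a, b)   b1    b2
a1       2     1
a2       −1    0

(a, b)   b1    b2
a1       2     1
a2       −1    −0.5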
Preliminary Ideas
Game Similarity
• Games with the same action space?
• Games with different action spaces?
• Similarity as a distance between payoffs?
• Equilibrium-based similarity or equilibrium-independent similarity?
Drew Fudenberg and David M. Kreps. A theory of learning,
experimentation and equilibrium in games. 1990.
Game Similarity
Equilibrium-based similarity: find the equilibria of the two games, then compute their similarity.
Weird cycle: measuring the similarity already requires the equilibrium of the second game, so transfer seems senseless!
Equilibrium transfer
Why not take (𝑈, 𝐶) in the second game?
Our Idea
Transfer equilibria between games that are thought to be similar.
Evaluate how much loss the equilibrium transfer incurs.
Transfer is acceptable when the loss is small.
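The idea above amounts to an accept-or-recompute rule. The sketch below is one possible reading, not the author's algorithm: `EquilibriumCache`, `solve`, `transfer_error` and `threshold` are hypothetical names, and the "game" used to exercise it is a degenerate one-agent payoff vector chosen only to keep the example short.

```python
from typing import Callable, Dict, Hashable, Sequence

Game = Sequence[float]   # assumption: degenerate one-agent "game", one payoff per action
Profile = int            # assumption: a profile is the index of the chosen action


class EquilibriumCache:
    """Reuse a stored equilibrium whenever the loss caused by transfer is small."""

    def __init__(self,
                 solve: Callable[[Game], Profile],
                 transfer_error: Callable[[Game, Profile], float],
                 threshold: float) -> None:
        self.solve = solve
        self.transfer_error = transfer_error
        self.threshold = threshold
        self.stored: Dict[Hashable, Profile] = {}
        self.solver_calls = 0

    def equilibrium_for(self, state: Hashable, game: Game) -> Profile:
        # Try to transfer the equilibrium computed on an earlier visit of this state.
        if state in self.stored:
            candidate = self.stored[state]
            if self.transfer_error(game, candidate) <= self.threshold:
                return candidate
        # Otherwise the loss is too large (or nothing is stored): solve from scratch.
        self.solver_calls += 1
        profile = self.solve(game)
        self.stored[state] = profile
        return profile


def solve(game: Game) -> Profile:
    """Stand-in solver: the best action of the single agent."""
    return max(range(len(game)), key=lambda k: game[k])


def transfer_error(game: Game, k: Profile) -> float:
    """Gain the agent could obtain by deviating from action k."""
    return max(game) - game[k]


cache = EquilibriumCache(solve, transfer_error, threshold=0.1)
first = cache.equilibrium_for("s", (1.0, 0.0))    # nothing stored yet: solve
second = cache.equilibrium_for("s", (1.0, 0.95))  # small loss: reuse
third = cache.equilibrium_for("s", (1.0, 2.0))    # large loss: solve again
```

In a real Markov game, `solve` would be an equilibrium solver for normal-form games and `transfer_error` the loss measure for the transferred profile; the threshold trades solver calls against accuracy.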
Payoff tables of agent 1 in state $s$ at times $t$ and $t+1$, $Q_1^{t}(s, a, b)$ and $Q_1^{t+1}(s, a, b)$:

(a, b)   b1    b2
a1       2     1
a2       −1    0

(a, b)   b1    b2
a1       2     1
a2       −1    −0.5

The two games differ in only one entry.
Problem Definition
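[Figure: two payoff tables that differ only in entry (a2, b2). The Nash equilibrium $p^*$ of game $G$ has been computed; what does a transfer method give for game $G'$?]

(a, b)   b1    b2
a1       2     1
a2       −1    −0.5

(a, b)   b1    b2
a1       2     1
a2       −1    0

$G, p^*$ → transfer method? → $G', ?$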
• Can we find a transfer method that maps the computed Nash equilibrium $p^*$ of game $G$ to a strategy profile $p'$ of game $G'$ such that, for all $i \in N$ and all $a_i \in A_i$,
$$U_i^{G'}(a_i, p'_{-i}) \le U_i^{G'}(p') + \epsilon,$$
where $\epsilon$ is close to 0? Such a $p'$ is an approximate Nash equilibrium of $G'$.
• In other words, given a transfer method, if $\epsilon$ is small enough, then the transfer method is acceptable.
• Furthermore,
Problem Definition
• For all $i \in N$ and all $a_i \in A_i$, define the transfer error
$$\epsilon_i(a_i, p') = U_i^{G'}(a_i, p'_{-i}) - U_i^{G'}(p')$$
• Let $\epsilon_i(p') = \max_{a_i} \epsilon_i(a_i, p')$
• Let $\epsilon(p') = \max_{i} \epsilon_i(p')$
Given a transfer method, we need to find a bound on $\epsilon(p')$!
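The definitions above can be evaluated directly for a two-agent game. The sketch below is an illustration under stated assumptions: the function names are invented here, agent 1's payoffs are taken from the table with entry (a2, b2) = 0 (assumed to be $G'$), and agent 2's payoffs are hypothetical because the slides only show agent 1's table.

```python
from typing import List, Sequence

Matrix = List[List[float]]  # U[a][b]: payoff when agent 1 plays a and agent 2 plays b


def expected_payoff(U: Matrix, p: Sequence[float], q: Sequence[float]) -> float:
    """Expected payoff under the mixed profile (p, q)."""
    return sum(p[a] * q[b] * U[a][b] for a in range(len(p)) for b in range(len(q)))


def transfer_error(U1: Matrix, U2: Matrix,
                   p: Sequence[float], q: Sequence[float]) -> float:
    """epsilon(p') for the profile p' = (p, q) in the two-agent game (U1, U2)."""
    n_a, n_b = len(p), len(q)
    v1 = expected_payoff(U1, p, q)
    v2 = expected_payoff(U2, p, q)
    # epsilon_1(p'): best gain of agent 1 from a pure deviation against q.
    eps1 = max(sum(q[b] * U1[a][b] for b in range(n_b)) - v1 for a in range(n_a))
    # epsilon_2(p'): best gain of agent 2 from a pure deviation against p.
    eps2 = max(sum(p[a] * U2[a][b] for a in range(n_a)) - v2 for b in range(n_b))
    return max(eps1, eps2)


# Agent 1's payoffs: the slide table with entry (a2, b2) = 0 (assumed to be G').
U1_new = [[2.0, 1.0], [-1.0, 0.0]]
# Agent 2's payoffs: hypothetical, not given in the slides.
U2_new = [[1.0, 0.0], [0.0, 1.0]]

eps_good = transfer_error(U1_new, U2_new, [1.0, 0.0], [1.0, 0.0])  # profile (a1, b1)
eps_bad = transfer_error(U1_new, U2_new, [0.0, 1.0], [1.0, 0.0])   # profile (a2, b1)
```

A profile is an exact Nash equilibrium of the game when this value is 0, and an approximate one when it is small and positive.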
A Naïve Transfer Method
Direct Transfer
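[Figure: direct transfer. The Nash equilibrium $p^*$ computed in $G$ is used unchanged in $G'$, i.e. $p' = p^*$.]

(a, b)   b1    b2
a1       2     1
a2       −1    −0.5

(a, b)   b1    b2
a1       2     1
a2       −1    0

$G, p^*$ → $p^*$ → $G', ?$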
• Define the difference of the two games $\delta = G' - G$ such that, for all $i \in N$ and all $a \in A$,
$$\delta_i(a) = U_i^{G'}(a) - U_i^{G}(a).$$
• Examine the transfer error
$$\epsilon_i(a_i, p') = \epsilon_i(a_i, p^*) = U_i^{G'}(a_i, p^*_{-i}) - U_i^{G'}(p^*)$$
A Naïve Transfer Method
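$$
\begin{aligned}
\epsilon_i(a_i, p') &= U_i^{G'}(a_i, p^*_{-i}) - U_i^{G'}(p^*) \\
&= \sum_{a_{-i}} p^*_{-i}(a_{-i})\, U_i^{G'}(a_i, a_{-i}) - \sum_{a_i'} \sum_{a_{-i}} p_i^*(a_i')\, p^*_{-i}(a_{-i})\, U_i^{G'}(a_i', a_{-i}) \\
&= \sum_{a_{-i}} p^*_{-i}(a_{-i}) \Big[ U_i^{G'}(a_i, a_{-i}) - \sum_{a_i'} p_i^*(a_i')\, U_i^{G'}(a_i', a_{-i}) \Big] \\
&= \sum_{a_{-i}} p^*_{-i}(a_{-i}) \Big[ U_i^{G}(a_i, a_{-i}) + \delta_i(a_i, a_{-i}) - \sum_{a_i'} p_i^*(a_i') \big[ U_i^{G}(a_i', a_{-i}) + \delta_i(a_i', a_{-i}) \big] \Big] \\
&= \sum_{a_{-i}} p^*_{-i}(a_{-i}) \Big[ U_i^{G}(a_i, a_{-i}) - \sum_{a_i'} p_i^*(a_i')\, U_i^{G}(a_i', a_{-i}) \Big] \\
&\quad + \sum_{a_{-i}} p^*_{-i}(a_{-i}) \Big[ \delta_i(a_i, a_{-i}) - \sum_{a_i'} p_i^*(a_i')\, \delta_i(a_i', a_{-i}) \Big] \\
&\le \sum_{a_{-i}} p^*_{-i}(a_{-i}) \Big[ \delta_i(a_i, a_{-i}) - \sum_{a_i'} p_i^*(a_i')\, \delta_i(a_i', a_{-i}) \Big] \\
&= \sum_{a_{-i}} p^*_{-i}(a_{-i})\, \delta_i(a_i, a_{-i}) - \sum_{a} p^*(a)\, \delta_i(a) \\
&\le \sum_{a_{-i}} p^*_{-i}(a_{-i})\, \delta_i^{+}(a_i, a_{-i}) - \sum_{a} p^*(a)\, \delta_i(a) \\
&\le \sum_{a_{-i}} \delta_i^{+}(a_i, a_{-i}) - \sum_{a} p^*(a)\, \delta_i(a)
\end{aligned}
$$
where $\delta_i^{+}(a_i, a_{-i}) = \max(0, \delta_i(a_i, a_{-i}))$.
The first inequality holds because $p^*$ is a Nash equilibrium of $G$, so the first sum is at most 0; the last two hold because $\delta_i \le \delta_i^{+}$, $\delta_i^{+} \ge 0$ and $p^*_{-i}(a_{-i}) \le 1$.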
A Naïve Transfer Method
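$$\epsilon_i(a_i, p') \le \sum_{a_{-i}} \delta_i^{+}(a_i, a_{-i}) - \sum_{a} p^*(a)\, \delta_i(a)$$
Many entries of $\delta$ are zero if the two games are very similar, so the bound is small.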
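The bound can be checked numerically on the example tables. This is a minimal sketch under assumptions: the function names are invented here, the table with entry (a2, b2) = −0.5 is taken to be $G$ and the one with 0 to be $G'$, and agent 2's payoffs (identical in both games, with $p^* = (a_1, b_1)$ a Nash equilibrium of $G$) are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]  # M[a][b]: entry for agent 1 action a, agent 2 action b


def game_difference(U_new: Matrix, U_old: Matrix) -> Matrix:
    """delta_i(a) = U_i^{G'}(a) - U_i^{G}(a) for one agent i."""
    return [[x_new - x_old for x_new, x_old in zip(row_new, row_old)]
            for row_new, row_old in zip(U_new, U_old)]


def direct_transfer_bound(delta1: Matrix, delta2: Matrix,
                          p: Sequence[float], q: Sequence[float]) -> float:
    """Largest value, over agents and actions, of the bound on epsilon_i(a_i, p*)."""
    n_a, n_b = len(p), len(q)
    # Second term of the bound: the expectation of delta_i under p*.
    mean1 = sum(p[a] * q[b] * delta1[a][b] for a in range(n_a) for b in range(n_b))
    mean2 = sum(p[a] * q[b] * delta2[a][b] for a in range(n_a) for b in range(n_b))
    # First term: the positive parts of delta_i summed over the other agent's actions.
    bound1 = max(sum(max(0.0, delta1[a][b]) for b in range(n_b))
                 for a in range(n_a)) - mean1
    bound2 = max(sum(max(0.0, delta2[a][b]) for a in range(n_a))
                 for b in range(n_b)) - mean2
    return max(bound1, bound2)


# Agent 1's payoffs from the slide tables (which table is G is an assumption).
U1_old = [[2.0, 1.0], [-1.0, -0.5]]   # G
U1_new = [[2.0, 1.0], [-1.0, 0.0]]    # G'
# Agent 2's payoffs: hypothetical and identical in both games.
U2_old = [[1.0, 0.0], [0.0, 1.0]]
U2_new = [[1.0, 0.0], [0.0, 1.0]]

delta1 = game_difference(U1_new, U1_old)
delta2 = game_difference(U2_new, U2_old)
# p* = (a1, b1) is a pure Nash equilibrium of G under the payoffs assumed above.
bound = direct_transfer_bound(delta1, delta2, [1.0, 0.0], [1.0, 0.0])
```

Here only one entry of $\delta_1$ is nonzero (0.5 at (a2, b2)), so the bound evaluates to 0.5. In this instance $(a_1, b_1)$ remains an exact equilibrium of $G'$, so the true transfer error is 0 and the bound is loose but valid.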
Some Results
Future Work
• Some problems:
  • Other transfer methods?
  • Only Nash equilibrium?
  • Equilibrium finding algorithms
• Transfer between games with different action spaces
• Transfer between games with different numbers of agents
• Game abstraction
Thanks!