Document 12388652

Outline
• MDP (brief)
  – Background
  – Learning MDP
    • Q learning
• Game theory (brief)
  – Background
• Markov games (2-player)
  – Background
  – Learning Markov games
    • Littman’s Minimax Q learning (zero-sum)
    • Hu & Wellman’s Nash Q learning (general-sum)

Stochastic games (SG) / Partially observable SG (POSG)

Equation annotations: expectation over next states; immediate reward; value of next state.

• Model-based reinforcement learning:
  1. Learn the reward function and the state transition function
  2. Solve for the optimal policy
• Model-free reinforcement learning:
  1. Directly learn the optimal policy without knowing the reward function or the state transition function

Equation annotations: number of times action a causes state transition s → s’; number of times action a has been executed in state s; total reward accrued when applying a in s; v(s’).

1. Start with arbitrary initial values of Q(s,a), for all s ∈ S, a ∈ A
2. At each time t the agent chooses an action and observes its reward r_t
3. The agent then updates its Q-values based on the Q-learning rule
4. The learning rate α_t needs to decay over time in order for the learning algorithm to converge

Famous game theory example
A co-operative game

Generalization of MDP
Mixed strategy
Stationary: the agent’s policy does not change over time
Deterministic: the same action is always chosen whenever the agent is in state s

Example: payoff matrices for State 1 and State 2 [figure]

v(s,π*) ≥ v(s,π) for all s ∈ S, π ∈ Π

Max V
Such that: rock + paper + scissors = 1

Equation annotations: worst case; best response; expectation over all actions.

Equation annotations: quality of a state-action pair; discounted value of all succeeding states weighted by their likelihood.

This learning rule converges to the correct values of Q and v

Equation annotations: discounted value of all succeeding states; expected reward for taking action a when opponent chooses o from state s.

explor controls how often the agent will deviate from its current policy

Hu and Wellman: general-sum Markov games as a framework for RL

Theorem (Nash, 1951): There exists a mixed strategy Nash equilibrium for any finite bimatrix game
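
The four Q-learning steps listed earlier (arbitrary initial Q-values, act and observe a reward, apply the Q-learning rule, decay the learning rate) can be sketched in a few lines of Python. This is a minimal illustration, not the slides' own code. Everything concrete in it is an assumption made for the example: the four-state chain environment, the `step` helper, the 1/n learning-rate schedule, and the reuse of the name `EXPLOR` for the exploration probability in a single-agent setting.

```python
import random

# Hypothetical toy MDP (not from the slides): a chain of states 0..3.
# State 3 is the terminal goal; entering it yields reward 1, all else 0.
N_STATES = 4
ACTIONS = (0, 1)   # 0 = move left, 1 = move right
GAMMA = 0.9        # discount factor (assumed value)
EXPLOR = 0.2       # probability of deviating from the current greedy policy


def step(state, action):
    """Deterministic transition: returns (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def choose_action(q, state, rng):
    """Epsilon-greedy choice with random tie-breaking."""
    if rng.random() < EXPLOR:
        return rng.choice(ACTIONS)
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def q_learning(episodes=2000, max_steps=50, seed=0):
    rng = random.Random(seed)
    # Step 1: arbitrary initial values of Q(s, a); zeros are used here.
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    visits = [[0, 0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            # Step 2: choose an action and observe the reward r_t.
            action = choose_action(q, state, rng)
            next_state, reward, done = step(state, action)
            # Step 4: learning rate decays as 1 / (visits to this pair).
            visits[state][action] += 1
            alpha = 1.0 / visits[state][action]
            # Step 3: Q-learning rule; no bootstrapping from a terminal state.
            target = reward + (0.0 if done else GAMMA * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Greedy action for every non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


q_values = q_learning()
policy = greedy_policy(q_values)
```

On this chain the greedy policy read off `q_values` should choose action 1 (right) in every non-terminal state. The 1/n schedule is one common way to satisfy the decaying-learning-rate requirement; slower decays are often preferred in practice because they forget early, inaccurate targets faster.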
