Reinforcement Learning in Simulated Soccer with Kohonen Networks
Chris White and David Brogan
University of Virginia, Department of Computer Science

Simulated Soccer
- How does an agent decide what to do with the ball?
- Complexities:
  - Continuous inputs
  - High dimensionality

Reinforcement Learning (RL)
- Learning to associate utility values with state-action pairs
- The agent incrementally updates the value associated with each state-action pair based on its interaction with the environment (Russell & Norvig)
- Problems:
  - The state space explodes exponentially with dimensionality
  - Current methods of managing state-space explosion lack automation
  - RL does not scale well to problems with the complexities of simulated soccer

Quantization
- Divide the state space into regions of interest
- No automated method for choosing the regions
- Tile Coding (Sutton & Barto, 1998): granularity, heterogeneity, location
- Prefer a learned abstraction of the state space

Kohonen Networks
- Clustering algorithm
- Data driven
- Example clusters: no nearby opponents, agent near opponent goal, teammate near opponent goal
- Learned prototypes partition the state space into a Voronoi diagram

State Space Reduction
- 90 continuous-valued inputs describe the state of a soccer game
- Naïve discretization: 2^90 states
- Filtering out unnecessary inputs: still 2^18 states
- Clustering algorithm: only 5000 states
- Big win!

Two-Pass Algorithm
- Pass 1: Use a Kohonen network and a large training set to learn the state space (a sketch follows below)
- Pass 2: Use reinforcement learning (SARSA) to learn utilities for those states
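As a rough illustration of Pass 1 (a sketch, not the authors' implementation), the code below trains a small Kohonen map over recorded state vectors and then maps each new state to the index of its best-matching prototype, which becomes the discrete state handed to Pass 2. The 90 inputs come from the State Space Reduction slide; the map size, learning-rate and neighborhood schedules, and the NumPy implementation are assumptions made for illustration.

```python
import numpy as np

class KohonenQuantizer:
    """Minimal 1-D Kohonen map: clusters raw state vectors and maps each
    vector to the index of its best-matching prototype (its Voronoi cell)."""

    def __init__(self, n_prototypes=5000, n_inputs=90, seed=0):
        rng = np.random.default_rng(seed)
        # One prototype vector per discrete state.
        self.weights = rng.uniform(-1.0, 1.0, size=(n_prototypes, n_inputs))

    def best_matching_unit(self, state):
        # Index of the prototype closest to this state (Euclidean distance).
        return int(np.argmin(np.linalg.norm(self.weights - state, axis=1)))

    def train(self, states, epochs=10, lr0=0.5, radius0=50.0):
        """Pass 1: fit the prototypes to a large set of recorded game states."""
        idx = np.arange(len(self.weights))
        for epoch in range(epochs):
            # Decay the learning rate and neighborhood radius over time.
            lr = lr0 * np.exp(-epoch / epochs)
            radius = max(radius0 * np.exp(-epoch / epochs), 1.0)
            for state in states:
                bmu = self.best_matching_unit(state)
                # Prototypes near the winner (in map index space) are pulled
                # toward the sample, the winner most of all.
                influence = np.exp(-((idx - bmu) ** 2) / (2 * radius ** 2))
                self.weights += lr * influence[:, None] * (state - self.weights)

    def quantize(self, state):
        """Discrete state id used by the Pass-2 learner."""
        return self.best_matching_unit(state)
```

Because every prototype owns the inputs closest to it, the trained map induces exactly the Voronoi partition mentioned on the Kohonen Networks slide, and the granularity of the abstraction is set by the training data rather than by a hand-chosen tiling.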
Fragility of Learned Actions
- What happens to the attacker's utility if the goalie crosses the dotted line?

Unresolved Issues
- Increased generalization leads to frequency aliasing
- Example: a Riemann sum with few samples vs. many samples
- This becomes a sampling problem

Aliasing & Sampling
- The utility function is not band-limited
- How can we sample to reduce error?
  - Uniformly increase the sampling rate? (not the best idea)
  - Adaptively supersample?
  - Choose sample points based on special criteria?

Forcing Functions
- Use a forcing function to sample an action in a state only when the action is likely to be effective (valleys are ignored)
- Reduces the variance in the experienced reward for a state-action pair
- How do we create such a forcing function? (one hand-built gate is sketched at the end of the deck)

Results
- Evaluate three systems:
  - Control: random action selection
  - SARSA
  - SARSA with forcing functions
- Evaluation criteria: goals scored, time of possession, cumulative score

SARSA vs. Random Policy
[Chart: cumulative goals scored vs. games played for the learning team and the random team]

Time of Possession
[Chart: time of possession vs. games played for the team with forcing functions]

SARSA with Forcing Function vs. Random Policy
[Chart: cumulative score vs. games played for the learning team with forcing functions and the random team playing against it]

With Forcing vs. Without
[Chart: cumulative score vs. games played for the learning and random teams, with and without forcing functions]

Summary
- Two-pass learning algorithm for simulated soccer
- State space abstraction is automated
- Data-driven technique
- Improved state of the art for simulated soccer

Future Work
- Learned distance metric
- Additional automation in the process
- Better generalization
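To make the two-pass summary concrete, here is a hedged sketch of Pass 2: tabular SARSA over the cluster indices produced by a quantizer like the one above, with the forcing function acting as a gate on which actions are sampled in a state. The slides do not say how their forcing functions were constructed, so the likely_effective predicate, the action set, the reward signal, and the hyperparameters below are hypothetical placeholders.

```python
import random
from collections import defaultdict

ACTIONS = ["dribble", "pass", "shoot", "clear"]  # illustrative action set

def likely_effective(state_features, action):
    """Hypothetical forcing function: allow an action only when it is
    plausibly useful, so low-utility 'valleys' are never sampled."""
    if action == "shoot":
        return state_features.get("dist_to_goal", 99.0) < 20.0
    if action == "pass":
        return state_features.get("open_teammate", False)
    return True

class SarsaAgent:
    def __init__(self, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.q = defaultdict(float)  # Q[(cluster_id, action)]
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def choose(self, cluster_id, state_features):
        # The forcing function prunes the candidate actions before selection.
        allowed = [a for a in ACTIONS if likely_effective(state_features, a)]
        if not allowed:
            allowed = ACTIONS  # fall back if the gate rejects everything
        if random.random() < self.epsilon:
            return random.choice(allowed)  # exploration stays inside the gate
        return max(allowed, key=lambda a: self.q[(cluster_id, a)])

    def update(self, s, a, reward, s_next, a_next):
        # SARSA: Q(s,a) += alpha * (r + gamma * Q(s',a') - Q(s,a))
        td_target = reward + self.gamma * self.q[(s_next, a_next)]
        self.q[(s, a)] += self.alpha * (td_target - self.q[(s, a)])
```

In an on-policy loop the agent quantizes each raw 90-dimensional observation, picks an action with choose, and calls update with the next state-action pair; pruning unlikely actions before selection is what keeps the variance of the experienced reward low for the pairs that remain, which is the benefit the Forcing Functions slide attributes to this idea.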