Reinforcement Learning in Real-Time Strategy Games
Nick Imrei
Supervisors: Matthew Mitchell & Martin Dick

Outline
- Background: what this research is about
  - Motivation and aim
  - RTS games
  - Reinforcement learning explained
  - Applying RL to RTS
- This project
  - Methodology
  - Evaluation
  - Summary

Motivation and Aims
- Problem: AI has been a neglected area – game developers have adopted a "not broken, so why fix it" philosophy
- Internet thrashing – my own experience
- Aim: use learning to develop a human-like player
  - Simulate beginner-to-intermediate level play
  - Use RL and A-life-like techniques, e.g. Black and White, Pengi [Scott]

RTS Games – The Domain
- Two or more teams of individuals/cohorts in a warlike situation on a series of battlefields
- Teams can have a variety of weapons, units, resources and buildings
- Players are required to manage all of the above to achieve the end goal (destroy all units, capture the flag, etc.)
- Examples: Command & Conquer, Starcraft, Age of Empires, Red Alert, Empire Earth

Challenges offered in RTS games
- Real-time constraints on actions
- High-level strategies combined with low-level tactics
- Multiple goals and choices

The Aim and Approach
- Create a human-like opponent: realistic, with diverse (not boring) behaviour
- This is difficult to do! It requires both tactics and strategy
- Agents will be reactive to the environment
- Learn rather than code – reinforcement learning

The Approach Part 1 – Reinforcement Learning
- Learning is driven by reward and penalty
- Action rewards/penalties, e.g. penalise being shot, reward killing a player on the other team
- Strategic rewards/penalties, e.g. securing/occupying a certain area, staying in certain group formations, destroying all enemy units
- The agent aims to receive maximum reward over time
- Problem: credit assignment. Which rewards should be given to which behaviours?

The Approach Part 2 – Credit Assignment
- Decide on a state space and an action space
- Assign values to states, or to state-action pairs
- Train the agent in this space (see the sketch after the training slide)

Reinforcement Learning example
(worked example presented as a diagram in the talk)

Why use Reinforcement Learning?
- Well suited to problems with delayed reward (tactics and strategy)
- The trained agent selects actions in (worst-case) linear time, so it remains reactive
- Problems: large state spaces (addressed by state aggregation) and long training times (addressed by experience replay and shaping)

The Approach Part 3 – Getting Diversity
- A-life-like behaviour using aggregated state spaces (see the aggregation sketch after the training slide)
- (diagram: an agent and its aggregated state space)

Research Summary
- Investigate this approach using a simple RTS game
- Issues: empirical research, applying RL in a novel way, not using the entire state space
- Need to investigate appropriate reward functions and appropriate state spaces

Problems with Training
- Lots of trials will be needed – the propagation problem
- The number of trials can be reduced using shaping [Mahadevan] and experience replay [Lin]
- Training by self-play (cf. Tesauro, Samuel); other possibilities include A* and human opponents
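To make the reward, value-assignment and credit-assignment ideas above concrete, here is a minimal sketch of a tabular Q-learning update for a single unit. The action set, reward values, state encoding and learning parameters are illustrative assumptions, not the system described in this talk.

```python
# Minimal tabular Q-learning sketch for a single RTS unit.
# All names, reward values and parameters here are hypothetical.
import random
from collections import defaultdict

ACTIONS = ["advance", "retreat", "attack", "hold"]
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1   # learning rate, discount, exploration

Q = defaultdict(float)                    # (state, action) -> estimated value

def choose_action(state):
    """Epsilon-greedy selection over the current value estimates."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state):
    """One-step Q-learning backup: credit the action for the reward it led to."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

def reward_for(event):
    """Action-level rewards/penalties like those listed in Part 1 (illustrative values)."""
    return {"was_shot": -1.0, "killed_enemy": +1.0,
            "secured_area": +0.5, "in_formation": +0.1}.get(event, 0.0)

# Example of one learning step with made-up states:
s, a = ("low_health", "enemy_near"), "retreat"
update(s, a, reward_for("was_shot"), ("low_health", "enemy_far"))
```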
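The "reduced state space" idea relies on aggregating the rich raw game state into a small number of discrete features, so the value table above stays manageable. The sketch below shows one hypothetical way to do this; the features, thresholds and field names are assumptions made purely for illustration.

```python
# Illustrative state aggregation: collapse a rich RTS game state into a
# coarse tuple. Feature choices and thresholds are assumptions, not the
# project's actual design.

def bucket(value, thresholds):
    """Return the index of the first threshold the value falls below."""
    for i, t in enumerate(thresholds):
        if value < t:
            return i
    return len(thresholds)

def aggregate(unit, world):
    """Map one unit's local view of the world to a small discrete state."""
    return (
        bucket(unit["health"], (25, 50, 75)),              # health band
        bucket(world["nearest_enemy_dist"], (5, 15, 40)),  # enemy proximity
        bucket(world["allies_nearby"], (1, 3, 6)),         # local support
        world["holding_objective"],                        # True / False
    )

# Many different raw situations map to the same aggregated state,
# which is what keeps the number of values to learn small.
s = aggregate({"health": 60},
              {"nearest_enemy_dist": 12, "allies_nearby": 4,
               "holding_objective": False})
print(s)   # (2, 1, 2, False)
```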
Methodology
- Hypothesis: "The combination of RL and reduced state spaces in a rich (RTS) environment will lead to human-like gameplay"
- Empirical investigation to test the hypothesis:
  - Evaluate system behaviour
  - Analyse the observed results
  - Describe interesting phenomena

Evaluation
- Measure the diversity of strategies: how big a change (and of what type) is required to alter the behaviour – a qualitative analysis
- Measure the success of strategies, i.e. what level of gameplay is achieved: time to win, points scored, resemblance to human play
- Compare to human strategies, using the "10 requirements of a challenging and realistic opponent" [Scott]

Summary
- Interested in a human-level game-playing program
- Want to avoid brittle, predictable, hand-programmed solutions
- Search the program space for the most diverse solutions, using RL to direct the search
- This allows specification of the desired results without needing to specify how they are achieved
- Evaluate the results

References
- Bob Scott. The illusion of intelligence. AI Game Programming Wisdom, pages 16–20, 2002.
- Sridhar Mahadevan and Jonathan Connell. Automatic programming of behavior-based robots using reinforcement learning. Artificial Intelligence, 55:311–364, 1992.
- L. Lin. Reinforcement learning for robots using neural networks. PhD thesis, School of Computer Science, Carnegie Mellon University, Pittsburgh, USA, 1993.
- Mark Bishop Ring. Continual Learning in Reinforcement Environments. MIT Press, 1994.

Stay Tuned!
For more information, see http://www.csse.monash.edu.au/~ngi/
Thanks for listening!