Tutorial Agenda
• Tutorial Introduction
• Commercial Games
• Supervised Learning
• Machine Learning
• Coffee Break
• Reinforcement Learning
• Artificial Intelligence
• Unsupervised Learning
• Pathfinding
• Planning in Games
• Next Steps
• Testbeds
• Future Challenges

Test Beds for Artificial Intelligence
• Perfect instrumentation and measurements
• Perfect control and manipulation
• Reduced cost
• Reduced risk
• Great way to showcase algorithms

Improve User Experience
• Create adaptive, believable game AI
• Compose great multiplayer matches based on skill and social criteria
• Mitigate network latency using prediction
• Create realistic character movement

Partially observable stochastic games
• States are only partially observed
• Multiple agents choose actions
• Stochastic pay-offs and state transitions depend on the state and on all the other agents' actions
• Goal: optimise long-term pay-off (reward)
• Just like life: complex, adversarial, uncertain, and we are in it for the long run!

Approximations and Methods
• Reinforcement Learning
• Unsupervised Learning
• Supervised Learning
• Planning and Pathfinding

What is the best AI?
• Maximises its chances of winning
• Delivers the best entertainment value
• ???

[Figure: agents in games, 1977 and 2001: non-player characters and human players.]

[Charts: number of games consoles sold as of May 2009 (Wii ≈ 50 million, Xbox 360 ≈ 30 million, PS3 ≈ 23 million), and worldwide video game revenues rising from $21.9 bln in 2002 through $23.3 bln, $26.3 bln, $27.7 bln and $31.6 bln to $41.9 bln in 2007.]

Hardware development
• Graphics (graphics cards, displays)
• Sound (5.1 / 7.1 surround, speakers, headphones)
• CPU speed, cache
• Memory, hard disks, DVD, HD DVD (Blu-ray)
• Networking (broadband)

Software development
• Graphics (rendering etc.)
• Sound (sound rendering etc.)
• Physics simulation (e.g., Havok engine)
• Networking software (e.g., Xbox Live)
• Artificial Intelligence
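As a compact restatement of the partially observable stochastic game formulation above, one standard way to write each agent's objective is shown below. The notation (hidden states s_t, observations o_{j,t}, joint actions, discount factor γ) is my own shorthand and is not taken from the slides.

```latex
% Agent i chooses a policy \pi_i mapping its observation history to actions and
% maximises its expected long-term discounted pay-off, given the other agents' policies:
\max_{\pi_i}\; \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r_i\!\left(s_t, a_{1,t},\dots,a_{n,t}\right)\right],
\qquad
a_{j,t} \sim \pi_j\!\left(\cdot \mid o_{j,1:t}\right),\quad
s_{t+1} \sim P\!\left(\cdot \mid s_t, a_{1,t},\dots,a_{n,t}\right).
```

The methods listed next (reinforcement learning, unsupervised learning, supervised learning, planning and pathfinding) can all be read as approximations to this general problem.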
Commercial Games

Creatures
• The objective is to nurture creatures called Norns
• The model incorporates artificial-life features
• Norns have neural-network brains
• Their development can be influenced by player feedback

Black & White
• Peter Molyneux's famous "god game"
• The player determines the fate of the villagers as their "god" (seen as a hand)
• The creature can be taught complex behaviour
• Good and Evil - actions have consequences

Racing game AI with neural networks
• Variety of tracks, drivers and road conditions
• The racing line is provided by the author; a neural network keeps the car on the racing line
• Multilayer perceptrons trained with RPROP
• Simple rules for recovery and overtaking

Drivatar (Forza Motorsport)
• Adaptive avatar for driving
• Separate game mode
• Basis of all in-game AI
• Basis of the "dynamic" racing line

Xbox game
• Dynamic racing line
• Learning a Drivatar
• Using a Drivatar

Ranking and Matchmaking for Xbox Live and Halo 3

Xbox 360 Live (launched 2005)
• Every game uses TrueSkill™ to match players
• > 20 million players, > 3 million matches per day
• > 2 billion hours of gameplay

Halo 3 (launched 2007)
• Largest entertainment launch in history
• > 200,000 players concurrently (peak: 1 million)

Halo 3 game
• Matchmaking
• Skill stats
• Tight matches

Machine Learning

Supervised Learning
• Learning an input-output relationship from examples
• Tasks: regression, classification, ranking
• Applications: skill estimation, behavioural cloning

Reinforcement Learning
• Learning policies from state-action-reward sequences
• Tasks: control, value estimation, policy learning
• Applications: learning to drive, learning to walk, learning to fight

Unsupervised Learning
• Learning the underlying structure from examples
• Tasks: clustering, manifold learning, density estimation
• Applications: modelling motion capture data, user behaviour

Supervised learning in a first-person shooter
• Goal: learn from skilled players how to act in a first-person shooter (FPS) game
• Test environment: the Unreal Tournament FPS game engine with the Gamebots control framework
• Idea: a Naive Bayes classifier learns under which circumstances to switch behaviour

Variables
• St: bot's state at time t
• St+1: bot's state at time t+1
• H: health level
• W: weapon
• OW: opponent's weapon
• HN: hear noise
• NE: number of close enemies
• PW: weapon close by?
• PH: health pack close by?

[Diagram: Bayesian network in which the bot's next state St+1 depends on St, and the observations H, W, OW, HN, NE, PW, PH depend on St+1.]

Naive Bayes ("inverse programming")
• Specify the probabilities P(H | St+1) and P(St+1 | St)
• Use Bayes' rule to obtain P(St+1 | H, W, ...) etc.

Supervised learning
• Recognise the state of the human trainer
• Read out the variables from the game engine
• Determine relative frequencies to estimate the probabilities P(H | St+1) (as a table)
(A small code sketch of this estimation appears at the end of this section.)

Drivatar revisited

[Architecture diagram: the Drivatar Learning System, with components for the "built-in" AI behaviour development tool, the Drivatar racing line, the behaviour model, vehicle interaction and racing strategy, recorded player driving, the controller, car behaviour and Drivatar AI driving.]

Drivatar racing line: a two-phase process
1. Pre-generate possible racing lines prior to the race from a (compressed) racing table.
2. Switch between the lines during the race to add variability.
Compression reduces the memory needed per racing-line segment; switching makes for smoother racing lines.

[Diagram: the racing line consists of segments a1, a2, a3, a4. The ("causal") physics simulation system maps the car's position and speed at time t, the static car properties and the car controls to the position and speed at time t+1; the ("inverse") controller maps a desired position and speed at time t+1 back to car controls.]
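Returning to the Naive Bayes behaviour switcher described above: the sketch below shows, under stated assumptions, how such a classifier could be estimated from relative frequencies and applied with Bayes' rule. The class structure, feature encoding and training-data format are hypothetical illustrations; only the feature names (H, W, OW, HN, NE, PW, PH) come from the slides, and this is not the actual Gamebots / Unreal Tournament implementation.

```python
from collections import Counter, defaultdict

FEATURES = ["H", "W", "OW", "HN", "NE", "PW", "PH"]  # feature names from the slides

class NaiveBayesBehaviour:
    """Naive Bayes over bot behaviours: P(S' | S, features) ∝ P(S' | S) * Π_f P(f | S')."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                           # Laplace smoothing (an extra assumption)
        self.transitions = Counter()                 # counts of (S, S') pairs
        self.state_counts = Counter()                # counts of S
        self.next_counts = Counter()                 # counts of S'
        self.feature_counts = defaultdict(Counter)   # (S', feature) -> Counter over values
        self.feature_values = defaultdict(set)       # feature -> observed values

    def fit(self, episodes):
        """episodes: iterable of (prev_state, next_state, feature_dict) recorded from a human trainer."""
        for prev_s, next_s, feats in episodes:
            self.transitions[(prev_s, next_s)] += 1
            self.state_counts[prev_s] += 1
            self.next_counts[next_s] += 1
            for f in FEATURES:
                self.feature_counts[(next_s, f)][feats[f]] += 1
                self.feature_values[f].add(feats[f])

    def _p_transition(self, prev_s, next_s):
        # Relative frequency estimate of P(S' | S), smoothed.
        n_next = len(self.next_counts)
        return (self.transitions[(prev_s, next_s)] + self.alpha) / (
            self.state_counts[prev_s] + self.alpha * n_next)

    def _p_feature(self, f, value, next_s):
        # Relative frequency estimate of P(f = value | S'), smoothed.
        counts = self.feature_counts[(next_s, f)]
        n_vals = max(len(self.feature_values[f]), 1)
        return (counts[value] + self.alpha) / (sum(counts.values()) + self.alpha * n_vals)

    def predict(self, prev_s, feats):
        """Pick the behaviour S' maximising P(S' | S) * Π_f P(f | S') (Bayes' rule, unnormalised)."""
        scores = {}
        for next_s in self.next_counts:
            p = self._p_transition(prev_s, next_s)
            for f in FEATURES:
                p *= self._p_feature(f, feats[f], next_s)
            scores[next_s] = p
        return max(scores, key=scores.get)
```

Estimating each P(f | St+1) by relative frequencies is exactly the table-based estimate mentioned on the slides; the smoothing constant alpha is an extra assumption here to avoid zero probabilities for feature values the trainer never produced.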
Ranking and matchmaking: TrueSkill™

Competition is central to our lives
• An innate biological trait
• The driving principle of many sports

Chess rating for fair competition
• Elo: developed in 1960 by Árpád Imre Élő
• A matchmaking system for tournaments

Challenges of online gaming
• Learn efficiently from few match outcomes
• Support multiple teams and multiple players per team

Problem statement
• Given: match outcomes, i.e. orderings among k teams consisting of n1, n2, ..., nk players respectively
• Questions: a skill si for each player, such that we obtain a global ranking among all players and can set up fair matches between teams of players

Model
• Latent Gaussian performance model for fixed skills
• Possible outcomes: player 1 wins over player 2 (and vice versa)
• [Factor graph: skills s1, s2 generate performances p1, p2, which determine the outcome y12.]

Inference
• [Factor graph: Gaussian prior factors over the skills s1, ..., s4; team performances t1, t2, t3; ranking likelihood factors over the outcomes y12, y23.]
• Fast and efficient approximate message passing using Expectation Propagation

Leaderboard and matchmaking
• Leaderboard: global ranking of all players
• Matchmaking: for gamers, the most uncertain outcome; for inference, the most informative match. Both are equivalent!

Data set: Halo 2 Beta
• 3 game modes: Free-for-All, Two Teams, 1 vs. 1
• > 60,000 match outcomes, ≈ 6,000 players
• 6 weeks of game play
• Publicly available

[Chart: skill level (0-40) against number of games (0-400) for the players "char" and "SQLWildman", comparing the TrueSkill™ estimates with the Halo 2 rank.]

[Chart: predicted winning probability (char wins, SQLWildman wins, both players draw) against the number of games played (0-500); 5/8 of the games were won by char.]

Skill levels by game type
• Golf (18 holes): 60 levels
• Car racing (3-4 laps): 40 levels
• UNO (chance game): 10 levels

TrueSkill through time
• Model a time-series of skills by smoothing across time
• History of chess: 3.5 million game outcomes (ChessBase)
• 20 million variables (one for each of 200,000 players in each year of their lifetime, plus latent variables)
• 40 million factors
• [Factor graph: skills s_{t,i}, s_{t,j} and performances p_{t,i}, p_{t,j} in year t are linked to s_{t+1,i}, s_{t+1,j} and p_{t+1,i}, p_{t+1,j} in year t+1.]

[Chart: skill estimates (1400-3000) from 1850 to 2006 for famous chess players, including Adolf Anderssen, Paul Morphy, Wilhelm Steinitz, Emanuel Lasker, José Raúl Capablanca, Mikhail Botvinnik, Boris V. Spassky, Robert James Fischer, Anatoly Karpov and Garry Kasparov.]

Reinforcement Learning

[Diagram: the agent observes the game state and sends actions to the game; a learning algorithm updates the agent's parameters based on reward / punishment.]

[Illustration: a Q-table for a fighting game, with the actions THROW, KICK, STAND as columns and game states (1 ft to 6 ft separation, each on GROUND or KNOCKED) as rows; sample entries 13.2, 10.2, -1.3 and 3.2, 6.0, 4.0; an example reward of +10.0; the players are 3 ft apart.]

Q-learning and SARSA
• Q(s, a): expected reward for taking action a in state s
• α: learning rate
• a: action chosen
• r: reward resulting from a
• s: current state
• s': state after executing a
• γ: discount factor for future rewards

Q-learning (off-policy): Q(s, a) ← Q(s, a) + α [ r + γ max_{a'} Q(s', a') − Q(s, a) ]
SARSA (on-policy): Q(s, a) ← Q(s, a) + α [ r + γ Q(s', a') − Q(s, a) ]
(In SARSA, a' is the action actually chosen in state s'.)

Game state features (reinforcement learner)
• Separation (5 binned ranges)
• Last action (6 categories)
• Mode (ground, air, knocked)
• Proximity to obstacle

Available actions
• 19 aggressive (kick, punch)
• 10 defensive (block, lunge)
• 8 neutral (run)
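To make the two update rules concrete, here is a minimal tabular sketch that uses a state abstraction in the spirit of the fighting-game features listed above (separation bin, mode, last action). The action set, constants and state encoding are illustrative assumptions, not the shipped game code.

```python
import random
from collections import defaultdict

Q = defaultdict(float)                  # Q(s, a) table; unseen entries default to 0.0

ACTIONS = ["THROW", "KICK", "STAND"]    # tiny action set for illustration
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1   # learning rate, discount factor, exploration rate

def choose_action(state):
    """Epsilon-greedy action selection from the current Q-table."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_learning_update(s, a, r, s_next):
    """Off-policy: bootstrap from the best action available in the next state."""
    best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])

def sarsa_update(s, a, r, s_next, a_next):
    """On-policy: bootstrap from the action the policy actually chose in the next state."""
    Q[(s, a)] += ALPHA * (r + GAMMA * Q[(s_next, a_next)] - Q[(s, a)])

# Hypothetical state abstraction: (separation bin in feet, mode, last action)
s = (3, "GROUND", "KICK")
a = choose_action(s)
r = 10.0                                # e.g. the opponent's health decreased
s_next = (2, "KNOCKED", a)
q_learning_update(s, a, r, s_next)
```

The only difference between the two updates is the bootstrap target: Q-learning uses the best action in the next state regardless of what the policy will actually do (off-policy), whereas SARSA uses the action the current policy actually selects next (on-policy).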
Q-function representation
• One-layer neural net (tanh)
• In-game AI code

Learning with different reward functions
• Reward for a decrease in Wulong Goth's health: early in the learning process ... and after 15 minutes of learning
• Punishment for a decrease in either player's health: early in the learning process ... and after 15 minutes of learning

Learning the state-action abstraction
1. Collect experience
2. Learn transition probabilities and rewards
3. Revise the value function and policy
4. Revise the state-action abstraction
5. Return to 1 and collect more experience

[Diagram: abstract states over features such as left distance and speed; representational complexity ranges from too coarse through just right to too fine, and is adjusted by splitting and merging abstract states.]

Learning to race
• Real-time racing simulation; goal: lap times as fast as possible
• Laser range finder measurements as features
• Progress along the track as reward
• Actions: Coast, Accelerate, Brake, Hard-Left, Hard-Right, Soft-Left, Soft-Right

2D demo
• Principles explained

Xbox 360 integration
• Efficient implementation
• 60 fps

Unsupervised Learning

Motion capture
• Fix markers at key body positions
• Record their positions in 3D during motion
• A fundamental technology in animation today
• Free download of mocap files: www.bvhfiles.com

Gaussian Process Latent Variable Model (GPLVM)
• Generative model for dimensionality reduction
• Probabilistic equivalent of PCA that defines a probability distribution over the data
• Non-linear manifolds based on kernels
• Visualisation of high-dimensional data
• Back-projection from latent space to data space
• Can deal with missing data

PCA versus GPLVM
• Principal Component Analysis: marginalise over the latent variables x and optimise the weight matrix W
• Gaussian Process Latent Variable Model: marginalise over W and optimise x
• [Diagram: latent variables x, weight matrix W, data y.]
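To spell out the PCA / GPLVM contrast above in equations, here is the standard formulation (my notation, not copied from the slides). Both models start from the same linear-Gaussian mapping y_n = W x_n + ε_n with noise ε_n ~ N(0, σ²I) from latent variables x to data y.

```latex
% Probabilistic PCA: prior x_n ~ N(0, I); marginalise over the latents x and optimise W.
p(Y \mid W) \;=\; \prod_{n=1}^{N} \mathcal{N}\!\left(y_n \,\middle|\, 0,\; W W^{\top} + \sigma^{2} I\right)

% GPLVM: prior on the rows of W ~ N(0, I); marginalise over W and optimise the latents X.
p(Y \mid X) \;=\; \prod_{d=1}^{D} \mathcal{N}\!\left(y_{:,d} \,\middle|\, 0,\; X X^{\top} + \sigma^{2} I\right)
```

Replacing the linear kernel X Xᵀ with a non-linear kernel matrix K(X, X) gives the non-linear manifolds mentioned on the slide, and because the model defines a full probability distribution over the data it supports back-projection from latent to data space and can handle missing data, as listed above.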