Evolving Multi-modal Behavior in NPCs
Jacob Schrum – schrum2@cs.utexas.edu
Risto Miikkulainen – risto@cs.utexas.edu
University of Texas at Austin, Department of Computer Sciences

Introduction
- Goal: discover NPC behavior automatically
- Benefits
  - Save production time/effort
  - Learn counterintuitive behaviors
  - Find weaknesses in static scripts
  - Tailor behavior to human players

Introduction
- Challenges
  - Games are complex
  - Multiple objectives
  - Multi-modal behavior required
- RL & Evolution popular approaches
- How to encourage multi-modal behavior?

Typical Agent Architecture
- One policy
- [Diagram: Environment -> Sensor input -> Agent policy -> Actions -> Environment]
- Why not several policies?

Agent With Multiple Policies
- Policy for each mode
- Individual policies simpler than monolithic
- Must choose which policy to use
- [Diagram: Environment -> Sensor input -> arbitrate -> policy 1, policy 2, …, policy n -> Actions -> Environment]

Multi-modal Game
- Game to test multi-modal architecture
- Make task delineation clear
- Same NPCs perform two distinct tasks
- Must determine their task from sensors
- New Game: “Fight or Flight”

Fight or Flight
- Fight Task
  - Player fights with bat
  - NPCs avoid bat
  - NPCs fight back
- Flight Task
  - Player has no weapon
  - Player runs away
  - NPCs confine/attack

NPC Objectives
- Fight Task
  - Deal damage
  - Avoid damage
  - Stay alive
- Flight Task
  - Deal damage
  - Not the same objective as in the Fight task!
- How do we deal with multiple, competing objectives?

Multi-Objective Optimization
- Imagine game with two objectives:
  - Damage Dealt
  - Health Remaining
- [Plot: Damage Dealt vs. Health Remaining. Annotations: "High health but did not deal much damage"; "Tradeoff between objectives"; "Dealt a lot of damage, but lost lots of health"]
- A dominates B iff A is strictly better in one objective and at least as good in others
- Population of points not dominated are best: Pareto Front

NSGA-II
- Evolution: natural approach for finding optimal population
- Non-Dominated Sorting Genetic Algorithm II*
  - Population P with size N; Evaluate P
  - Use mutation to get P´ size N; Evaluate P´
  - Calculate non-dominated fronts of {P ∪ P´} size 2N
  - New population size N from highest fronts of {P ∪ P´}
- *K. Deb et al. 2000

Neuroevolution
- Genetic Algorithms + Artificial Neural Networks
  - NNs good at generating behavior
  - GA creates new nets, evaluates them
- Four basic mutations (no crossover used)
  - Perturb Weight
  - Add Connection
  - Add Neuron
  - Merge Neurons

New Mode Mutation
- New mode with inputs from preexisting mode
- [Diagram: network before and after the mutation]
- Maximum preference neuron determines mode

Experiment
- Compare 1Mode vs. ModeMutation
  - 10 trials each
- What to evolve against?
  - Bot with static policy (instead of player)
  - Bot has a first person perspective
- Fight Task
  - Swing bat constantly
  - Approach nearest bot in front
- Flight Task
  - Back away from nearest bot in front

Incremental Evolution
- Hard to evolve against proposed bot strategies
  - Could easily fail to evolve interesting behavior
- Incremental evolution against increasing speeds
  - 0%, 40%, 80%, 100%
  - Increase speed when all goals are met
  - End when goals met at 100%

Goals
- Average population performance high enough?
  - Then increase speed
- Each objective has a goal:
  - Fight
    - At least 50 damage to bot (1 kill)
    - Less than 20 damage per NPC on average (2 hits)
    - Survive at least 800 time steps (80% of trial)
  - Flight
    - At least 100 damage to bot (2 kills)
- [Figure label: Average population objective score met goal value?]
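
The progression rule on the Goals slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the objective key names, the `(comparison, threshold)` encoding, and the helper names `goals_met` and `next_speed_index` are all hypothetical. Each individual's scores are assumed to be already expressed in the units the slide uses (for example, damage received averaged per NPC).

```python
from statistics import mean

# Speed schedule from the slides, as fractions of the bot's full speed.
SPEEDS = [0.0, 0.4, 0.8, 1.0]

# Goal thresholds from the Goals slide. The key names and the
# (comparison, threshold) encoding are assumptions made for this sketch.
GOALS = {
    "fight_damage_dealt": (">=", 50),     # at least 50 damage to bot
    "fight_damage_received": ("<", 20),   # less than 20 damage per NPC
    "fight_time_alive": (">=", 800),      # survive at least 800 time steps
    "flight_damage_dealt": (">=", 100),   # at least 100 damage to bot
}


def goals_met(population_scores):
    """Map each objective to whether the population AVERAGE meets its goal.

    population_scores is a list of dicts, one per individual, keyed like GOALS.
    """
    result = {}
    for name, (comparison, threshold) in GOALS.items():
        average = mean(scores[name] for scores in population_scores)
        if comparison == ">=":
            result[name] = average >= threshold
        else:
            result[name] = average < threshold
    return result


def next_speed_index(speed_index, population_scores):
    """Return (new_speed_index, finished) after one generation."""
    if not all(goals_met(population_scores).values()):
        return speed_index, False      # keep evolving at the current speed
    if speed_index == len(SPEEDS) - 1:
        return speed_index, True       # all goals met at 100% speed: stop
    return speed_index + 1, False      # all goals met: increase bot speed


# Hypothetical two-member population, invented for illustration only.
population = [
    {"fight_damage_dealt": 60, "fight_damage_received": 10,
     "fight_time_alive": 900, "flight_damage_dealt": 120},
    {"fight_damage_dealt": 40, "fight_damage_received": 25,
     "fight_time_alive": 700, "flight_damage_dealt": 80},
]
print(next_speed_index(0, population))
```

Because the check uses population averages, the second individual above misses every goal on its own, yet the population as a whole still advances to the next speed.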
[Figure label, continued from the Goals slide: Goal met]

Mode Mutation Results
- Performs well in both tasks
- Fight Task
  - Baiting behavior
  - One NPC takes damage so others can sneak up behind
  - Bot knocked back and forth
- Flight Task
  - Corralling behavior
  - Keep bot confined in ring of NPCs
  - Move to scare the bot into enclosure

Use of Multiple Modes
- Different modes for baiting and attacking
- Similar elements of modes co-opted for different tasks
- Many unselected modes
  - As many as 7 unused modes
  - Still have outward connections
  - Are they vestigial?

1 Mode Results
- Only performs well in one task
- Example 1
  - Runs away in Fight task
  - Corralling behavior in Flight task
- Example 2
  - Overly aggressive in Fight task
  - Lets bot escape in Flight task
- Population averages of individual objectives are high enough, but few individuals do well in all objectives

Why Different Behaviors?
- Progression method
  - Numerically similar performance
  - Drastically different distribution of behaviors
- 1Mode evolves groups for subsets of objectives
- ModeMutation biases towards solving all objectives
  - Changes shape of fitness landscape

Future Work
- Improve progression
  - More granularity in tougher end of task sequence
  - Can incremental evolution be avoided?
- Improve multiobjective selection
  - Bias towards middle of trade-off surface
  - Other algorithms: SPEA2, PESA-II

Future Work
- Improve ModeMutation
  - Should new modes be strongly differentiated?
  - Different arbitration mechanism?
  - Better option than randomly applying mutation?
  - Different initial connectivity?
- [Diagram labels: P(x), P(y)]

Conclusion
- ModeMutation encourages multi-modal behavior
- ModeMutation better than 1Mode
  - Biases search toward multi-modal solutions
  - More successes in shorter amount of time
- Lead to multi-modal behavior in future games

Questions?
- Movies: http://nn.cs.utexas.edu/?multimodal09
- E-mail: schrum2@cs.utexas.edu

Auxiliary Slides

Ignore Achieved Goals for Objectives
- Goal is met → Drop objective
  - Focus selection on most difficult objectives
  - Prevents stagnation
- Reshaping fitness landscape helps escape peaks
- Project scores into lower dimension
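
A small Python sketch may help show what "project scores into lower dimension" does to Pareto dominance. It is an assumption-laden illustration, not the authors' code: the helper names `dominates`, `project`, and `pareto_front` are hypothetical, all objectives are assumed to be maximized, and the score tuples are invented. It covers only the dominance test from the Multi-Objective Optimization slide, not the full NSGA-II procedure.

```python
def dominates(a, b):
    """True if a is at least as good as b in every objective and strictly
    better in at least one (all objectives assumed to be maximized)."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def project(scores, achieved):
    """Drop every objective whose goal has already been met."""
    return tuple(s for s, done in zip(scores, achieved) if not done)


def pareto_front(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical scores: (damage dealt, health remaining, time alive).
population = [
    (60, 30, 900),
    (40, 80, 850),
    (55, 20, 950),
]

# With all three objectives active, no individual dominates another.
print(pareto_front(population))

# Suppose the time-alive goal has been met, so that objective is dropped.
achieved = (False, False, True)
projected = [project(p, achieved) for p in population]

# In the projected space, (60, 30) dominates (55, 20), so the third
# individual no longer counts as non-dominated.
print(pareto_front(projected))
```

In this toy case the third individual was protected only by its lead in an objective whose goal was already met. Dropping that objective removes the protection, which is one way to read "focus selection on most difficult objectives".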