
Evolving Multi-modal
Behavior in NPCs
Jacob Schrum – schrum2@cs.utexas.edu
Risto Miikkulainen – risto@cs.utexas.edu
University of Texas at Austin
Department of Computer Sciences
Introduction


Goal: discover NPC behavior automatically
Benefits
Save production time/effort
 Learn counterintuitive behaviors
 Find weaknesses in static scripts
 Tailor behavior to human players

Introduction

Challenges
Games are complex
 Multiple objectives
 Multi-modal behavior required



RL & Evolution popular approaches
How to encourage multi-modal behavior?
Typical Agent Architecture
One policy
Why not several policies?
[Diagram labels: Environment, Sensor input, Agent (policy), Actions]
Agent With Multiple Policies
Policy for each mode
Individual policies simpler than monolithic policy
Must choose which policy to use
[Diagram labels: Environment, Sensor input, Agent (policy 1, policy 2, …, policy n, arbitrate), Actions]
Multi-modal Game



Game to test multi-modal architecture
Make task delineation clear
Same NPCs perform two distinct tasks


Must determine their task from sensors
New Game: “Fight or Flight”
Fight or Flight



Fight Task
Player fights with bat
NPCs avoid bat
NPCs fight back



Flight Task
Player has no weapon
Player runs away
NPCs confine/attack
NPC Objectives



Fight Task
Deal damage
Avoid damage
Stay alive

Flight Task
Deal damage

Not the same objective as in the Fight task!
How do we deal with multiple, competing objectives?
Multi-Objective Optimization
Imagine game with two objectives: Damage Dealt and Health Remaining
Tradeoff between objectives
A dominates B iff A is strictly better in at least one objective and at least as good in all others
Points in the population that are not dominated are best: Pareto Front
[Plot: Damage Dealt vs. Health Remaining; annotated points: "High health but did not deal much damage" and "Dealt lot of damage, but lost lots of health"]
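The dominance rule on this slide can be written out as a short sketch. The score tuples below are made-up (damage dealt, health remaining) values for illustration only, and both objectives are assumed to be maximized.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b in every objective and
    strictly better in at least one (all objectives maximized)."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (damage dealt, health remaining) scores, not from the talk.
scores = [(10, 90), (80, 20), (50, 50), (40, 40), (10, 50)]
front = pareto_front(scores)
print(front)
```

Here (40, 40) is dominated by (50, 50) and (10, 50) is dominated by (10, 90), so only the remaining three points are on the front.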
NSGA-II


Evolution: natural approach for finding optimal population
Non-Dominated Sorting Genetic Algorithm II*




Population P with size N; Evaluate P
Use mutation to get P´ size N; Evaluate P´
Calculate non-dominated fronts of P ∪ P´ (size 2N)
New population of size N from highest fronts of P ∪ P´
*K. Deb et al. 2000
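One generation of the loop above might look like the following simplified sketch. The names `evaluate`, `mutate` and `next_generation` are hypothetical. Published NSGA-II also uses a crowding-distance measure to choose within the last front that fits, which this sketch replaces with plain truncation.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Individual = TypeVar("Individual")
Scores = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Maximization: a is no worse everywhere and better somewhere."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def non_dominated_fronts(scores: List[Scores]) -> List[List[int]]:
    """Index lists: front 0 is non-dominated, front 1 is non-dominated
    once front 0 is removed, and so on."""
    remaining = set(range(len(scores)))
    fronts = []
    while remaining:
        front = sorted(
            i for i in remaining
            if not any(dominates(scores[j], scores[i]) for j in remaining)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts


def next_generation(
    parents: List[Individual],
    evaluate: Callable[[Individual], Scores],
    mutate: Callable[[Individual], Individual],
) -> List[Individual]:
    """Create N children by mutation, then keep the best N of the 2N."""
    n = len(parents)
    children = [mutate(p) for p in parents]
    combined = parents + children
    scores = [evaluate(ind) for ind in combined]
    survivors: List[Individual] = []
    for front in non_dominated_fronts(scores):
        for i in front:
            # Simplification: truncate the last front in index order
            # instead of ranking it by crowding distance.
            if len(survivors) < n:
                survivors.append(combined[i])
    return survivors
```

In a toy setting where an individual is simply its own score tuple, `next_generation([(0.0, 0.0), (1.0, 1.0)], lambda ind: ind, lambda p: (p[0] + 1, p[1] + 1))` keeps the child (2.0, 2.0) from the first front and one (1.0, 1.0) individual from the second.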
Neuroevolution




Genetic Algorithms + Artificial Neural Networks
NNs good at generating behavior
GA creates new nets, evaluates them
Four basic mutations (no crossover used)
Perturb Weight
Add Connection
Add Neuron
Merge Neurons
New Mode Mutation

New mode with inputs from preexisting mode
[Figure: network before and after the mutation]
Maximum preference neuron determines mode
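Arbitration by preference neuron could be sketched as below. The output layout is an assumption made for illustration (each mode contributes `policy_size` policy outputs followed by one preference output), and `select_mode` is a hypothetical helper, not code from the talk.

```python
from typing import List, Sequence, Tuple


def select_mode(outputs: Sequence[float],
                policy_size: int) -> Tuple[int, List[float]]:
    """Split the network outputs into per-mode blocks and return the
    index and policy outputs of the mode with the highest preference.

    Assumed layout: [policy outputs..., preference] repeated per mode.
    """
    block = policy_size + 1
    if len(outputs) == 0 or len(outputs) % block != 0:
        raise ValueError("outputs do not divide evenly into mode blocks")
    modes = [outputs[i:i + block] for i in range(0, len(outputs), block)]
    # The last value in each block is that mode's preference neuron.
    chosen = max(range(len(modes)), key=lambda m: modes[m][-1])
    return chosen, list(modes[chosen][:-1])


# Two modes with two policy outputs each; preferences are 0.1 and 0.7.
mode, action = select_mode([0.2, -0.5, 0.1, 0.9, 0.3, 0.7], policy_size=2)
print(mode, action)
```

With these example values the second mode has the larger preference, so its policy outputs drive the NPC on that time step.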
Experiment



Compare 1Mode vs. ModeMutation
10 trials each
What to evolve against?


Bot with static policy (instead of player)
Bot has a first person perspective
Fight Task


Swing bat constantly
Approach nearest bot in front
Flight Task

Back away from nearest bot in front
Incremental Evolution

Hard to evolve against proposed bot strategies
Could easily fail to evolve interesting behavior
Incremental evolution against increasing speeds
0%, 40%, 80%, 100%
Increase speed when all goals are met
End when goals met at 100%
Goals

Average population performance high enough?
Then increase speed
Each objective has a goal:
Fight
At least 50 damage to bot (1 kill)
Less than 20 damage per NPC on average (2 hits)
Survive at least 800 time steps (80% of trial)
Flight
At least 100 damage to bot (2 kills)
Average population objective score met goal value?
Goal met
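The progression rule can be sketched as a check of population averages against the goal values listed on this slide. The objective names and the dictionary-per-individual score format are hypothetical. The speed schedule and thresholds are taken from the slides.

```python
from statistics import mean
from typing import Callable, Dict, List

# Bot speeds as fractions of full speed (0%, 40%, 80%, 100%).
SPEEDS = [0.0, 0.4, 0.8, 1.0]

# Goal checks applied to the population average of each objective.
# Objective names are made up for this sketch.
GOALS: Dict[str, Callable[[float], bool]] = {
    "fight_damage_dealt": lambda avg: avg >= 50,     # 1 kill
    "fight_damage_received": lambda avg: avg < 20,   # 2 hits per NPC
    "fight_time_alive": lambda avg: avg >= 800,      # 80% of trial
    "flight_damage_dealt": lambda avg: avg >= 100,   # 2 kills
}


def goals_met(population: List[Dict[str, float]]) -> Dict[str, bool]:
    """Report, per objective, whether the population average meets its goal."""
    return {
        name: check(mean(ind[name] for ind in population))
        for name, check in GOALS.items()
    }


def next_speed_index(current: int, population: List[Dict[str, float]]) -> int:
    """Advance to the next speed only when every goal is met.
    A result equal to len(SPEEDS) means goals were met at 100% speed."""
    if all(goals_met(population).values()):
        return current + 1
    return current


# Made-up scores for a population of two.
population = [
    {"fight_damage_dealt": 60, "fight_damage_received": 10,
     "fight_time_alive": 900, "flight_damage_dealt": 120},
    {"fight_damage_dealt": 40, "fight_damage_received": 25,
     "fight_time_alive": 750, "flight_damage_dealt": 90},
]
print(next_speed_index(0, population))
```

Neither individual is necessarily good at everything here: the second one misses every goal on its own, yet the averages (50, 17.5, 825, 105) still meet all four goals, so the speed increases.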
Mode Mutation Results


Performs well in both tasks
Fight Task

Baiting behavior



One NPC takes damage so others can sneak up behind
Bot knocked back and forth
Flight Task

Corralling behavior


Keep bot confined in ring of NPCs
Move to scare the bot into enclosure
Use of Multiple Modes



Different modes for baiting and attacking
Similar elements of modes co-opted for different tasks
Many unselected modes
As many as 7 unused modes
Still have outward connections
Are they vestigial?

1Mode Results


Only performs well in one task
Example 1
Runs away in Fight task
Corralling behavior in Flight task
Example 2
Overly aggressive in Fight task
Lets bot escape in Flight task
Population averages of individual objectives are high enough, but few individuals do well in all objectives
Why Different Behaviors?

Progression method




Numerically similar performance
Drastically different distribution of behaviors
1Mode evolves groups for subsets of objectives
ModeMutation biases towards solving all objectives

Changes shape of fitness landscape
Future Work

Improve progression
More granularity in tougher end of task sequence
Can incremental evolution be avoided?
Improve multiobjective selection
Bias towards middle of trade-off surface
Other algorithms: SPEA2, PESA-II

Future Work

Improve ModeMutation




Should new modes be strongly differentiated?
Different arbitration mechanism?
Better option than randomly applying mutation?
Different initial connectivity?
[Figure labels: P(x), P(y)]
Conclusion

ModeMutation encourages multi-modal behavior


ModeMutation better than 1Mode


Biases search toward multi-modal solutions
More successes in shorter amount of time
Lead to multi-modal behavior in future games
Questions?
Movies: http://nn.cs.utexas.edu/?multimodal09
E-mail: schrum2@cs.utexas.edu
Auxiliary Slides
Ignore Achieved Goals for Objectives





Goal is met → Drop objective
Focus selection on most difficult objectives
Prevents stagnation
Reshaping fitness landscape helps escape peaks
Project scores into lower dimension
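Dropping achieved objectives amounts to projecting each score vector onto the objectives whose goals are still unmet. The sketch below uses hypothetical names throughout. Keeping all objectives when every goal is met is an assumption made here so that selection always has something to compare.

```python
from typing import Dict, List, Tuple


def active_objectives(goal_status: Dict[str, bool]) -> List[str]:
    """Objectives whose goal has not been met yet.
    Assumption: if every goal is met, keep all objectives."""
    unmet = [name for name, met in goal_status.items() if not met]
    return unmet if unmet else list(goal_status)


def project(scores: Dict[str, float], active: List[str]) -> Tuple[float, ...]:
    """Project one individual's scores onto the active objectives only."""
    return tuple(scores[name] for name in active)


# Made-up status: the damage-dealt goal is already met, so it is dropped.
status = {"damage_dealt": True, "damage_received": False, "time_alive": False}
active = active_objectives(status)
individual = {"damage_dealt": 70, "damage_received": -15, "time_alive": 820}
print(project(individual, active))
```

Selection would then compare individuals on the projected tuples only, so the objectives that still need improvement determine who survives.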