Evolving Multimodal Networks for Multitask Games
Jacob Schrum – schrum2@cs.utexas.edu
Risto Miikkulainen – risto@cs.utexas.edu
University of Texas at Austin
Department of Computer Science

Evolution in videogames
 • Automatically learn interesting behavior
 • Complex but controlled environments

Stepping stone to real world
 • Robots
 • Training simulators

Complexity issues
 • Multiple contradictory objectives
 • Multiple challenging tasks
Multitask Games
NPCs perform two or more separate tasks
 • Each task has own performance measures
 • Task linkage
    ◦ Independent
    ◦ Dependent
Not blended
 • Inherently multiobjective

Test Domains

Designed to study multimodal behavior
Two tasks in similar environments
Different behavior needed to succeed
Main challenge: perform well in both
Front/Back Ramming

Same goal, opposite embodiments

Front Ramming
 • Attack w/front ram
 • Avoid counterattacks

Back Ramming
 • Attack w/back ram
 • Avoid counterattacks
Predator/Prey

Same embodiment, opposite goals

Predator
 • Attack prey
 • Prevent escape

Prey
 • Avoid attack
 • Stay alive
Multiobjective Optimization

Game with two objectives:
 • Damage Dealt
 • Remaining Health

A dominates B iff A is strictly better in at least one objective and at least as good in the others

Points in the population that are not dominated are best: Pareto front

Weighted sum provably incapable of capturing non-convex front

[Figure: Pareto front showing the tradeoff between objectives. One extreme: high health, but did not deal much damage. Other extreme: dealt a lot of damage, but lost a lot of health.]
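
The dominance rule and the Pareto front can be sketched in a few lines of Python. This is a minimal illustration, assuming both objectives are maximized; the function names and the (damage dealt, remaining health) scores are hypothetical and not taken from the experiments.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b in every objective and
    strictly better in at least one (all objectives maximized)."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (damage dealt, remaining health) scores, for illustration only.
scores = [(10, 90), (50, 50), (90, 10), (40, 40), (10, 80)]
front = pareto_front(scores)
print(front)
```

Here (40, 40) is dominated by (50, 50) and (10, 80) by (10, 90), so only the three tradeoff points remain on the front.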
NSGA-II


Evolution: natural approach for finding optimal population
Non-Dominated Sorting Genetic Algorithm II*
 • Population P with size N; evaluate P
 • Use mutation to get P´ of size N; evaluate P´
 • Calculate non-dominated fronts of {P ∪ P´}, size 2N
 • New population of size N from highest fronts of {P ∪ P´}
*K. Deb et al. A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II. Evol. Comp. 2002
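
One generation of the loop above might look like the following Python sketch. It assumes maximized objectives and that each parent yields exactly one mutated child. The crowding-distance tie-break for the last front that does not fit comes from the cited Deb et al. paper rather than from the slide, and the toy genome (a single float), `evaluate`, and `mutate` are hypothetical stand-ins.

```python
import random
from typing import Callable, Dict, List, Tuple

Scores = Tuple[float, ...]


def dominates(a: Scores, b: Scores) -> bool:
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def nondominated_fronts(scores: List[Scores]) -> List[List[int]]:
    """Front 0 is non-dominated; front 1 is non-dominated once front 0 is removed; etc."""
    remaining = set(range(len(scores)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(scores[j], scores[i]) for j in remaining)]
        fronts.append(sorted(front))
        remaining -= set(front)
    return fronts


def crowding_distance(front: List[int], scores: List[Scores]) -> Dict[int, float]:
    """Larger distance means a less crowded region of the front."""
    dist = {i: 0.0 for i in front}
    for m in range(len(scores[front[0]])):
        ordered = sorted(front, key=lambda i: scores[i][m])
        lo, hi = scores[ordered[0]][m], scores[ordered[-1]][m]
        dist[ordered[0]] = dist[ordered[-1]] = float("inf")  # always keep extremes
        if hi == lo:
            continue
        for k in range(1, len(ordered) - 1):
            gap = scores[ordered[k + 1]][m] - scores[ordered[k - 1]][m]
            dist[ordered[k]] += gap / (hi - lo)
    return dist


def next_generation(parents: list, evaluate: Callable, mutate: Callable,
                    rng: random.Random) -> list:
    n = len(parents)
    children = [mutate(p, rng) for p in parents]   # P' of size N
    combined = parents + children                  # P and P' together, size 2N
    scores = [evaluate(g) for g in combined]
    survivors: List[int] = []
    for front in nondominated_fronts(scores):
        if len(survivors) + len(front) > n:
            # Last front does not fit: prefer the least crowded individuals.
            dist = crowding_distance(front, scores)
            front = sorted(front, key=lambda i: dist[i], reverse=True)[: n - len(survivors)]
        survivors.extend(front)
        if len(survivors) == n:
            break
    return [combined[i] for i in survivors]


# Toy problem (hypothetical): genome is a float in [0, 1] with two conflicting objectives.
rng = random.Random(0)
population = [rng.random() for _ in range(8)]
evaluate = lambda x: (x, 1.0 - x * x)
mutate = lambda x, r: min(1.0, max(0.0, x + r.gauss(0.0, 0.1)))
for _ in range(5):
    population = next_generation(population, evaluate, mutate, rng)
```

Because parents compete with their own children for the N slots, the scheme is elitist: a non-dominated parent is never lost unless its front overflows.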
Constructive Neuroevolution
Genetic Algorithms + Neural Networks
 • Build structure incrementally (complexification)
 • Good at generating control policies
 • Three basic mutations (no crossover used)
    ◦ Perturb Weight
    ◦ Add Connection
    ◦ Add Node
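
The three mutations can be illustrated on a very small genome. This is a sketch under stated assumptions: the `Genome` and `Link` classes are hypothetical, and splicing a new node into an existing link (NEAT-style) is one common way to add a node, not necessarily the one used here.

```python
import random
from dataclasses import dataclass, field
from typing import List


@dataclass
class Link:
    source: int
    target: int
    weight: float


@dataclass
class Genome:
    """Hypothetical genome: node ids run inputs, then outputs, then hidden nodes."""
    num_inputs: int
    num_outputs: int
    num_hidden: int = 0
    links: List[Link] = field(default_factory=list)

    def node_ids(self) -> List[int]:
        return list(range(self.num_inputs + self.num_outputs + self.num_hidden))


def perturb_weight(g: Genome, rng: random.Random, scale: float = 0.5) -> None:
    """Add Gaussian noise to one randomly chosen link weight."""
    if g.links:
        rng.choice(g.links).weight += rng.gauss(0.0, scale)


def add_connection(g: Genome, rng: random.Random) -> None:
    """Add a link with a random weight; inputs never receive links.
    Duplicate and recurrent links are not filtered out in this sketch."""
    source = rng.choice(g.node_ids())
    target = rng.choice(g.node_ids()[g.num_inputs:])
    g.links.append(Link(source, target, rng.uniform(-1.0, 1.0)))


def add_node(g: Genome, rng: random.Random) -> None:
    """Assumed NEAT-style splice: replace one link with link -> new node -> link."""
    if not g.links:
        return
    old = rng.choice(g.links)
    g.links.remove(old)
    new_id = g.num_inputs + g.num_outputs + g.num_hidden
    g.num_hidden += 1
    g.links.append(Link(old.source, new_id, 1.0))
    g.links.append(Link(new_id, old.target, old.weight))


rng = random.Random(1)
genome = Genome(num_inputs=3, num_outputs=2)
add_connection(genome, rng)   # network starts minimal, then complexifies
add_node(genome, rng)
perturb_weight(genome, rng)
```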
Multimodal Networks (1)

Multitask Learning*
 • One mode per task
 • Shared hidden layer
 • Knows current task

Previous work
 • Supervised learning context
 • Multiple tasks learned quicker than individual
 • Not tried with evolution yet
* R. A. Caruana, "Multitask learning: A knowledge-based source of inductive bias" ICML 1993
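
A multitask network of this kind can be sketched as a shared hidden layer feeding one output mode per task, with the current task supplied from outside. The layered layout, `tanh` activation, class name, and weights below are assumptions for illustration; evolved networks need not be this regular.

```python
import math
from typing import List


def layer(inputs: List[float], weights: List[List[float]]) -> List[float]:
    """weights[j][i] connects input i to unit j; tanh activation (an assumption)."""
    return [math.tanh(sum(w * x for w, x in zip(row, inputs))) for row in weights]


class MultitaskNetwork:
    """Hypothetical network: one hidden layer shared by all tasks, one output mode per task."""

    def __init__(self, hidden_weights: List[List[float]],
                 mode_weights: List[List[List[float]]]) -> None:
        self.hidden_weights = hidden_weights  # shared across tasks
        self.mode_weights = mode_weights      # mode_weights[t] is the output layer for task t

    def act(self, sensors: List[float], task: int) -> List[float]:
        hidden = layer(sensors, self.hidden_weights)
        # The network is told which task it is in and uses that task's mode.
        return layer(hidden, self.mode_weights[task])


net = MultitaskNetwork(
    hidden_weights=[[0.5, -0.5], [1.0, 1.0]],
    mode_weights=[[[1.0, 0.0]], [[0.0, 1.0]]],
)
sensors = [1.0, 0.0]
action_task0 = net.act(sensors, task=0)
action_task1 = net.act(sensors, task=1)
```

The same sensor reading produces different actions depending on the task, while both modes reuse the hidden-layer features.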
Multimodal Networks (2)
Mode Mutation
 • Extra modes evolved
 • Networks choose mode
 • Chosen via preference neurons

MM Previous
 • Links from previous mode
 • Weights = 1.0

MM Random
 • Links from random sources
 • Random weights
 • Supports mode deletion

[Figure: starting network with one mode, and the networks that result from MM(P) and MM(R)]
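
Mode choice via preference neurons can be sketched as follows. The sketch assumes that every mode contributes a block of policy outputs followed by one preference neuron, and that the mode with the highest preference output controls the agent on that time step; the flat output layout and the function name are hypothetical.

```python
from typing import List, Sequence, Tuple


def choose_mode(outputs: Sequence[float], num_modes: int) -> Tuple[int, List[float]]:
    """Split the output layer into equal-sized modes; the last neuron of each
    mode is treated as its preference neuron (assumed layout)."""
    size = len(outputs) // num_modes
    modes = [list(outputs[m * size:(m + 1) * size]) for m in range(num_modes)]
    best = max(range(num_modes), key=lambda m: modes[m][-1])
    return best, modes[best][:-1]  # policy outputs of the winning mode


# Two modes, each with two policy neurons and one preference neuron.
outputs = [0.2, -0.7, 0.1,   # mode 0: policy (0.2, -0.7), preference 0.1
           0.9, 0.3, 0.6]    # mode 1: policy (0.9, 0.3),  preference 0.6
mode, action = choose_mode(outputs, num_modes=2)
print(mode, action)
```

Unlike the multitask networks, nothing tells the network which task it is in: the preference neurons are ordinary outputs, so the mode division is itself evolved.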
Experiment

Compare 4 conditions:
 • Control: Unimodal networks
 • Multitask: One mode per task
 • MM(P): Mode Mutation Previous
 • MM(R): Mode Mutation Random + Delete Mutation

500 generations
Population size 52
“Player” behavior scripted
Network controls homogeneous team of 4
MO Performance Assessment

Reduce Pareto front to single number
 • Hypervolume of dominated region

Pareto compliant
 • Front A dominates front B implies HV(A) > HV(B)

Standard statistical comparisons of average HV
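
For two objectives, the hypervolume is simply the area dominated by the front. The sketch below assumes both objectives are maximized and measured from a reference point at the origin; the function name and the example fronts are hypothetical.

```python
from typing import Iterable, Tuple


def hypervolume_2d(front: Iterable[Tuple[float, float]],
                   reference: Tuple[float, float] = (0.0, 0.0)) -> float:
    """Area dominated by the front and bounded below by the reference point."""
    rx, ry = reference
    area = 0.0
    best_y = ry
    # Sweep from the largest x to the smallest; each point that rises above
    # best_y adds a new horizontal strip of width (x - rx).
    for x, y in sorted(front, key=lambda p: p[0], reverse=True):
        if y > best_y and x > rx:
            area += (x - rx) * (y - best_y)
            best_y = y
    return area


front_a = [(1, 3), (2, 2), (3, 1)]
front_b = [(1, 2), (2, 1)]          # every point here is dominated by a point in front_a
print(hypervolume_2d(front_a), hypervolume_2d(front_b))
```

Front A dominates front B, and its hypervolume (6) is larger than that of front B (3), as Pareto compliance requires.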
Front/Back Ramming Behaviors
Multitask
MM(R)
Predator/Prey Behaviors
Multitask
MM(R)
Discussion (1)

Front/Back Ramming
 • Control < MM(P), MM(R) < Multitask
 • Multiple modes help
 • Explicit knowledge of task helps
Discussion (2)

Predator/Prey
 • MM(P), Control, Multitask < MM(R)
 • Multiple modes not necessarily helpful
 • Disparity in relative difficulty of tasks

Multitask ends up wasting effort
 • Mode deletion aids search for one good mode
How To Apply

Multitask good if:
 • Task division known, and
 • Tasks are comparably difficult

Mode mutation good if:
 • Task division is unknown, or
 • “Obvious” task division is misleading
Future Work

Games with more tasks
 • Does method scale?
 • Control mode bloat

Games with independent tasks
 • Ms. Pac-Man
    ◦ Collect pills while avoiding ghosts
    ◦ Eat ghosts after eating power pill

Games with blended tasks
 • Unreal Tournament 2004
    ◦ Fight while avoiding damage
    ◦ Fight or run away?
    ◦ Collect items or seek opponents?
Conclusion

Domains with multiple tasks are common
 • Both in real world and games
Multimodal networks improve learning in multitask games
 • Will allow interesting/complex behavior to be developed in future

Questions?
Jacob Schrum – schrum2@cs.utexas.edu
Risto Miikkulainen – risto@cs.utexas.edu
University of Texas at Austin
Department of Computer Science