Evolving Agent Behavior in Multiobjective Domains Using Fitness-Based Shaping

Jacob Schrum and Risto Miikkulainen
University of Texas at Austin
Department of Computer Science
Typical Uses of MOEAs

Where have MOEAs proven themselves?

Wireless Sensor Networks (Woehrle et al., 2010)
Groundwater Management (Siegfried et al., 2009)
Hydrologic model calibration (Tang et al., 2006)
Epoxy polymerization (Deb et al., 2004)
Voltage-controlled oscillator design (Chu et al., 2004)
Multi-spindle gear-box design (Deb & Jain, 2003)
Foundry casting scheduling (Deb & Reddy, 2001)
Multipoint airfoil design (Poloni & Pediroda, 1997)
Design of aerodynamic compressor blades (Obayashi, 1997)
Electromagnetic system design (Michielssen & Weile, 1995)
Microprocessor design (Stanley & Mudge, 1995)
Design of laminated ceramic composites (Belegundu et al., 1994)

Many engineering/design problems!
New Domains for MOEAs


Simulated agents often face multiple objectives
Automatic discovery of intelligent behavior

Video game opponents in Unreal Tournament (van Hoorn, 2009)
Predator/prey scenarios (Schrum & Miikkulainen, 2009)
Race car driving in TORCS (Agapitos et al., 2008)

Comparatively little work so far
Direct application of an MOEA is seldom successful
Success often depends on “shaping”
What is Shaping?



Term from behavioral psychology
Identified by B. F. Skinner (1938)

Task-based example: train a rat to press a lever
First reward proximity
Then any interaction with the lever
Then actual pressing of the lever
Evolutionary Shaping



Environment changes, making the task harder
Evolution shapes behavior across generations

Example: migration given continental drift [1]
Animals become accustomed to a short migration
Continental drift increases the distance of migration
Ability to travel increasing distances is required
(Images: Arctic Tern, Atlantic Salmon)

Modeled in EC with incremental evolution (e.g. [2])
[1] B. F. Skinner. The shaping of phylogenic behavior. Experimental Analysis of Behavior. 1975.
[2] Schrum and Miikkulainen. Constructing Complex NPC Behavior via Multiobjective Neuroevolution. 2008.
Fitness-Based Shaping




Not extensively used
Little/no domain knowledge needed
Multiobjective approach a good fit

Selection criteria change:
Exploiting ignored objectives (TUG)
Exploiting unfilled niches (BD)

(Figures: objective space, with a dominated point exploiting a mostly ignored objective; behavior space, with crowded vs. uncrowded niches)
Multiobjective Optimization

Pareto dominance (assumes maximization): $v \succ u$ iff
$\forall i \in \{1, \ldots, n\}: v_i \geq u_i$, and
$\exists i \in \{1, \ldots, n\}: v_i > u_i$
Want nondominated points
NSGA-II used in this work
(A dominance-check sketch follows below)

What to evolve?
NNs as control policies
(Figure: nondominated points in objective space)
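The dominance test above can be made concrete with a short sketch (plain Python, maximization assumed; the function names and example scores are illustrative, not the authors' code):

```python
def dominates(v, u):
    """True if objective vector v Pareto-dominates u (all objectives maximized)."""
    return (all(vi >= ui for vi, ui in zip(v, u))
            and any(vi > ui for vi, ui in zip(v, u)))


def nondominated(points):
    """Keep only points not dominated by any other point (a Pareto front)."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]


# Example with three objective vectors (e.g. damage dealt, damage avoided, time alive)
scores = [(10.0, 3.0, 50.0), (8.0, 3.0, 40.0), (2.0, 9.0, 60.0)]
print(nondominated(scores))  # -> [(10.0, 3.0, 50.0), (2.0, 9.0, 60.0)]
```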
Constructive Neuroevolution

Genetic Algorithms + Neural Networks
Build structure incrementally (complexification)
Good at generating control policies
Three basic mutations (no crossover used), sketched below:
Perturb Weight
Add Connection
Add Node
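A hedged sketch of these three mutations on a toy edge-list genome (the genome encoding, mutation rates, and weight ranges are illustrative assumptions, not the authors' implementation):

```python
import random


def perturb_weight(genome, sigma=0.3):
    """Perturb Weight: add Gaussian noise to one randomly chosen connection."""
    i = random.randrange(len(genome["links"]))
    src, dst, w = genome["links"][i]
    genome["links"][i] = (src, dst, w + random.gauss(0.0, sigma))


def add_connection(genome):
    """Add Connection: link two random nodes with a small random weight."""
    src, dst = random.choice(genome["nodes"]), random.choice(genome["nodes"])
    genome["links"].append((src, dst, random.uniform(-1.0, 1.0)))


def add_node(genome):
    """Add Node: split a random connection by inserting a new hidden node."""
    i = random.randrange(len(genome["links"]))
    src, dst, w = genome["links"].pop(i)
    new = max(genome["nodes"]) + 1
    genome["nodes"].append(new)
    genome["links"] += [(src, new, 1.0), (new, dst, w)]  # roughly preserves behavior


# Complexification: start minimal, then grow structure across generations
genome = {"nodes": [0, 1, 2], "links": [(0, 2, 0.5), (1, 2, -0.2)]}
add_node(genome)
add_connection(genome)
perturb_weight(genome)
```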
Targeting Unachieved Goals (TUG)

Main ideas:
Temporarily deactivate “easy” objectives
Focus on “hard” objectives

“Hard” and “easy” defined in terms of goal values:
Easy: average fitness “persists” above the goal (achieved)
Hard: goal not yet achieved

Objectives reactivated when no longer achieved
Increase goal values when all are achieved
(A selection-mask sketch follows below)
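A minimal sketch of the deactivation/reactivation idea (the data layout and goal handling are illustrative assumptions; the actual achievement test uses a recency-weighted average, given in the TUG Details slide at the end):

```python
def active_objectives(avg_fitness, goals):
    """Return indices of "hard" objectives, i.e. goals not yet achieved.

    avg_fitness, goals: per-objective values (maximization).  Achieved
    ("easy") objectives are temporarily deactivated; if every goal is
    achieved, all objectives stay active and the goals should be raised.
    """
    hard = [i for i, (f, g) in enumerate(zip(avg_fitness, goals)) if f < g]
    return hard if hard else list(range(len(goals)))


# Example: damage goal already achieved, survival goal still unmet
print(active_objectives([12.0, 30.0], [10.0, 45.0]))  # -> [1]
```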
TUG Example

(Figure: fitness over noisy evaluations; annotations: goal achieved, recency-weighted average reset, other goals also achieved → goals increase)
Behavioral Diversity (BD)

Originally developed for single-objective tasks [3]:
Add a behavioral diversity objective
Encourage exploration of new behaviors
Domain-specific behavior measure required

Extensions in this work:
Multiobjective task
Domain-independent method
Only requires a policy mapping ℝ^N to ℝ^M, e.g. NNs
(Figure: network policy with sense inputs and action outputs)
[3] J.-B. Mouret and S. Doncieux. Using behavioral exploration objectives to solve deceptive problems in neuro-evolution. 2009.
Behavioral Diversity Details

Behavior vector:
Given a fixed set of input vectors, concatenate the policy's outputs
(Figure: example input vectors and the resulting concatenated behavior vector)

Behavioral diversity objective:
Average distance from the other behavior vectors
High average distance from other points
(Both steps are sketched below)
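A sketch of both steps, assuming NumPy and policies that map ℝ^N inputs to ℝ^M outputs (the toy linear policies and the random input vectors are illustrative assumptions):

```python
import numpy as np


def behavior_vector(policy, input_vectors):
    """Concatenate the policy's outputs on a fixed set of input vectors."""
    return np.concatenate([policy(x) for x in input_vectors])


def diversity_scores(behaviors):
    """Behavioral diversity objective: average distance to the other vectors."""
    b = np.stack(behaviors)                                    # (population, dim)
    dists = np.linalg.norm(b[:, None, :] - b[None, :, :], axis=-1)
    return dists.sum(axis=1) / (len(behaviors) - 1)            # self-distance is 0


# Toy example: 10 random linear policies evaluated on 20 shared input vectors
rng = np.random.default_rng(0)
inputs = rng.normal(size=(20, 5))                              # 20 inputs in R^5
population = [(lambda x, W=rng.normal(size=(3, 5)): W @ x) for _ in range(10)]
behaviors = [behavior_vector(p, inputs) for p in population]
print(diversity_scores(behaviors))                             # one score per individual
```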
Battle Domain

Evolved monsters (blue): monsters can hurt the fighter
Scripted fighter (green): its bat can hurt the monsters

Three objectives:
Deal damage
Avoid damage
Stay alive

Previous work required incremental evolution to solve this domain
Experimental Comparison

NN copied to 4 monsters (homogeneous teams)

In paper:
Control: plain NSGA-II
TUG: NSGA-II with TUG using expert initial goals
BD: NSGA-II with BD using random input vectors

Additional methods since publication:
TUG-Low: NSGA-II with TUG using minimal initial goals
BD-Obs: NSGA-II with BD using inputs observed during evaluations

Each repeated 30 times
Attainment Surfaces [4]

Result attainment surface:
Shows the space dominated by a single Pareto front
Summary attainment surface s:
Union of the space dominated in at least s out of n runs
Surface s weakly dominates surface s+1, etc.
(An attainment-count sketch follows below)

(Figures: individual Pareto fronts (approximation sets) whose surfaces intersect; result attainment surfaces; summary attainment surfaces 1–3)
[4] J. Knowles. A summary-attainment surface plotting method for visualizing the performance of stochastic multiobjective optimizers. 2005.
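A small sketch of the attainment count underlying these definitions (maximization assumed; this only illustrates the definition, not the plotting method of [4]):

```python
def attained_by(point, front):
    """True if some solution in the front weakly dominates the point (maximization)."""
    return any(all(fi >= pi for fi, pi in zip(f, point)) for f in front)


def attainment_count(point, fronts):
    """Number of runs whose Pareto front attains the given point."""
    return sum(attained_by(point, f) for f in fronts)


# A point is covered by summary attainment surface s iff attainment_count >= s
fronts = [[(3, 1), (1, 3)], [(2, 2)], [(4, 0), (0, 4)]]
print(attainment_count((1, 1), fronts))  # -> 2 (attained by two of the three runs)
```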
Final Summary Attainment Surfaces

(Animation: worst to best summary attainment surface for Control, TUG, TUG-Low, BD, and BD-Obs)
Hypervolume Metric [5]

Hypervolume of the result attainment surface
Computed WRT a reference point set slightly below the minimum scores
Simply “volume” for the 3 domain objectives
Pareto-compliant metric: $F_1 \succeq F_2 \Rightarrow HV(F_1) \geq HV(F_2)$
(Figure: hypervolume = A + B + C + D; an estimation sketch follows below)
[5] E. Zitzler and L. Thiele. Multiobjective optimization using evolutionary algorithms – a comparative case study. 1998.
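A hedged sketch of a Monte Carlo hypervolume estimate with respect to a reference point (maximization; the sampling bounds and sample count are illustrative assumptions, and [5] does not prescribe this estimator; exact algorithms are normally used):

```python
import random


def hypervolume_mc(front, ref, upper, samples=100_000, seed=0):
    """Estimate the volume (maximization) dominated by `front`, bounded below
    by the reference point `ref`, by uniformly sampling the box [ref, upper]."""
    rng = random.Random(seed)
    dims = len(ref)
    hits = 0
    for _ in range(samples):
        p = [rng.uniform(ref[d], upper[d]) for d in range(dims)]
        if any(all(f[d] >= p[d] for d in range(dims)) for f in front):
            hits += 1
    box = 1.0
    for lo, hi in zip(ref, upper):
        box *= hi - lo
    return box * hits / samples


# Toy 3-objective front (e.g. damage dealt, damage avoided, time alive)
front = [(10.0, 3.0, 50.0), (2.0, 9.0, 60.0)]
print(hypervolume_mc(front, ref=(0.0, 0.0, 0.0), upper=(12.0, 10.0, 70.0)))
```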
Hypervolume
(Figure: hypervolume results for each method)

Successful Behaviors
(Movie clips of evolved behavior: BD, TUG, BD-Obs, TUG-Low)
Discussion

Control: more extreme trade-offs
BD: more precise timing
BD-Obs and BD similar: “real” inputs give no advantage
TUG: more teamwork, but depends on the particular initial objectives
TUG-Low more like BD than TUG
ALL are better than Control
Future Work

How to combine TUG and BD
Naïve combination doesn’t work

Scaling up:
Many objectives
More complex domains
Current work in Unreal Tournament promising
Conclusion

BD and TUG improve multiobjective evolution
Domain independence!
Contrast to task-based shaping
Expands MOEAs to a new range of domains
Questions?
Email: schrum2@cs.utexas.edu
See movies at:
http://nn.cs.utexas.edu/?fitness-shaping
TUG Details

Persistence:
Recency-weighted average: $r_{t+1} = r_t + \alpha (x_{t+1} - r_t)$
An objective is achieved while $r_t$ stays above its goal
Objectives reactivated when no longer achieved

Goals:
Initial values based on domain knowledge
Or simply the minimal values for the objectives
When all goals are achieved, increase each goal: $g_o \leftarrow g_o + \eta (o_{\max} - g_o)$
(Both updates are sketched below)

(Figure: TUG cycles of achieving and raising goals)
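The two update rules above, written as a small sketch (the step sizes α and η are illustrative; the slide does not specify their values):

```python
def update_persistence(r_t, x_next, alpha=0.15):
    """Recency-weighted average of an objective's scores:
    r_{t+1} = r_t + alpha * (x_{t+1} - r_t)."""
    return r_t + alpha * (x_next - r_t)


def increase_goal(g_o, o_max, eta=0.15):
    """When all goals are achieved, move each goal toward the best score seen:
    g_o <- g_o + eta * (o_max - g_o)."""
    return g_o + eta * (o_max - g_o)
```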