Evolving Agent Behavior in Multiobjective Domains Using Fitness-Based Shaping Jacob Schrum and Risto Miikkulainen University of Texas at Austin Department of Computer Science Typical Uses of MOEAs Where have MOEAs proven themselves? Wireless Sensor Networks (Woehrle et al, 2010) Groundwater Management (Siegfried et al 2009) Hydrologic model calibration (Tang et al, 2006) Epoxy polymerization (Deb et al, 2004) Voltage-controlled oscillator design (Chu et al, 2004) Multi-spindle gear-box design (Deb & Jain, 2003) Foundry casting scheduling (Deb & Reddy, 2001) Multipoint airfoil design (Poloni & Pediroda, 1997) Design of aerodynamic compressor blades (Obayashi, 1997) Electromagnetic system design (Michielssen & Weile, 1995) Microprocessor design (Stanley & Mudge, 1995) Design of laminated ceramic composites (Belegundu et al, 1994) Many engineering/design problems! New Domains for MOEAs Simulated agents often face multiple objectives Automatic discovery of intelligent behavior Video game opponents in Unreal Tournament (van Hoorn, 2009) Predator/prey scenarios (Schrum & Miikkulainen 2009) Race car driving in TORCS (Agapitos et al, 2008) Comparatively little so far Direct application of MOEA seldom successful Success often depends on “shaping” What is Shaping? Term from Behavioral Psychology Identified by B. F. Skinner (1938) Task-Based Example: Train rat to press lever First reward proximity Then any interaction with lever Then actual pressing of lever Evolutionary Shaping Environment changes, making task harder Evolution shapes behavior across generations Example: Migration given continental drift [1] Animals become accustomed to short migration Continental drift increases distance of migration Ability to travel increasing distances required Arctic Tern Atlantic Salmon EC models with incremental evolution (ex. [2]) [1] B. F. Skinner. The shaping of phylogenic behavior. Experimental Analysis of Behavior. 1975. [2] Schrum and Miikkulainen. Constructing Complex NPC Behavior via Multiobjective Neuroevolution. 2008. Fitness-Based Shaping Not extensively used Little/no domain knowledge needed Multiobjective approach a good fit Selection criteria change Exploiting ignored objectives (TUG) Exploiting unfilled niches (BD) Objective Space Dominated, but exploiting mostly ignored objective Uncrowded Niches Behavior Space Crowded Niches Mutiobjective Optimization v u iff Pareto dominance: i 1, , n: v u i i i 1, , n: v u i i Assumes maximization Want nondominated points NSGA-II used in this work What to evolve? NNs as control policies Nondominated Constructive Neuroevolution Genetic Algorithms + Neural Networks Build structure incrementally (complexification) Good at generating control policies Three basic mutations (no crossover used) Perturb Weight Add Connection Add Node Targeting Unachieved Goals Main ideas: “Hard” and “easy” defined in terms of goal values Temporarily deactivate “easy” objectives Focus on “hard” objectives Easy: average fitness “persists” above goal (achieved) Hard: goal not yet achieved Objectives reactivated when no longer achieved Increase goal values when all achieved Hard Objectives TUG Example Other goals also achieved → Goals increase Noisy evaluations Goal achieved Reset recency-weighted average Behavioral Diversity Originally developed for single-objective tasks [3] Add behavioral diversity objective Encourage exploration of new behaviors Domain-specific behavior measure required Extensions in this work: Senses Multiobjective task Domain independent method Only requires policy mapping ℝN to ℝ M, e.g. NNs Actions [3] J.-B. Mouret and S. Doncieux. Using behavioral exploration objectives to solve deceptive problems in neuro-evolution. 2009. Behavioral Diversity Details Behavior vector: Given input vectors, concatenate outputs 0.1 2.3 4.3 5.2 3.2 Behavior vector 0.5 5.3 7.5 3.4 2.1 2.4 4.3 0.7 4.2 … 1.3 4.2 5.6 4.5 7.7 Behavioral diversity objective: AVG distance from other behavior vectors High average distance from other points … 2.1 3.5 Battle Domain Evolved monsters (blue) Scripted fighter (green) Bat can hurt monsters Three objectives Monsters can hurt fighter Deal damage Avoid damage Stay alive Previous work required incremental evolution to solve Experimental Comparison NN copied to 4 monsters In paper Control: Plain NSGA-II TUG: NSGA-II with TUG using expert initial goals BD: NSGA-II with BD using random input vectors Additional methods since publication Homogeneous teams TUG-Low: NSGA-II with TUG using minimal initial goals BD-Obs: NSGA-II with BD using inputs from evaluations Each repeated 30 times Attainment Surfaces [4] Result attainment surface Shows space dominated by single Pareto front Summary attainment surface s Union of space dominated in at least s out of n runs Surface s weakly dominates s+1, etc. Individual surfaces intersect Surface 1 Surface 2 Surface 3 Pareto Fronts (Approximation Sets) Result Attainment Surfaces Summary Attainment Surfaces [4] J. Knowles. A summary-attainment surface plotting method for visualizing the performance of stochastic multiobjective optimizers. 2005. Final Summary Attainment Surfaces Animation: worst to best summary attainment surface Control TUG TUG-Low BD BD-Obs Hypervolume Metric [5] Hypervolume of result attainment surface WRT reference point Simply “volume” for 3 domain objectives Slightly less than minimum scores Pareto-compliant metric F1 F2 HV1 HV2 Hypervolume = A + B + C + D [5] E. Zitzler and L. Thiele. Multiobjective optimization using evolutionary algorithms – a comparative case study. 1998. Hypervolume Successful Behaviors BD TUG BD-Obs TUG-Low Discussion Control: more extreme trade-offs BD: more precise timing BD-Obs and BD similar “Real” inputs give no advantage TUG: more teamwork Particular initial objectives TUG-Low more like BD than TUG ALL are better than Control Future Work How to combine TUG and BD Naïve combination doesn’t work Scaling up Many objectives More complex domains Current work in Unreal Tournament promising Conclusion BD and TUG improve MO evolution Domain independence! Contrast to task-based shaping Expand MOEAs to a new range of domains Questions? Email: schrum2@cs.utexas.edu See movies at: http://nn.cs.utexas.edu/?fitness-shaping TUG Details Persistence: rt 1 rt ( xt 1 rt ) Goals: Recency-weighted average rt surpasses goal Initial values based on domain knowledge Or simply the minimal values for objectives Increase each goal g when all are achieved g o g o (omax g o ) Goal achieved o Objectives reactivated when no longer achieved TUG Cycles