UT^2: Human-like Behavior via Neuroevolution of Combat Behavior and Replay of Human Traces
Jacob Schrum (schrum2@cs.utexas.edu)
Igor V. Karpov (ikarpov@cs.utexas.edu)
Risto Miikkulainen (risto@cs.utexas.edu)
Our Approach: UT^2
• Human traces to get unstuck and navigate
– Filter data to get general-purpose traces
• Evolve skilled combat behavior
– Restrictions/filters maintain humanness
• Observe and judge like a human
– Necessary to account for the judging game
Bot Architecture
Human Trace Replay
Record and Index Human Games
Synthetic pose data
Indexed by nearest navpoint
Replay nearest trace when needed
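A minimal sketch of the indexing idea on this slide, assuming a trace is a list of (x, y, z) poses and that each trace is filed under the navpoint nearest to its first pose. The data layout and the names `nearest_navpoint`, `build_index`, and `select_trace` are illustrative assumptions, not the actual UT^2 code.

```python
import math
from collections import defaultdict

def nearest_navpoint(pos, navpoints):
    """Return the id of the navpoint closest to pos."""
    return min(navpoints, key=lambda nid: math.dist(pos, navpoints[nid]))

def build_index(traces, navpoints):
    """File each trace under the navpoint nearest to its first pose (assumed scheme)."""
    index = defaultdict(list)
    for trace in traces:
        index[nearest_navpoint(trace[0], navpoints)].append(trace)
    return index

def select_trace(bot_pos, index, navpoints):
    """Pick the indexed trace that starts closest to the bot, or None if there is none."""
    candidates = index.get(nearest_navpoint(bot_pos, navpoints), [])
    if not candidates:
        return None
    return min(candidates, key=lambda t: math.dist(bot_pos, t[0]))

# Toy data: two navpoints and two recorded traces.
navpoints = {"A": (0.0, 0.0, 0.0), "B": (100.0, 0.0, 0.0)}
traces = [
    [(5.0, 0.0, 0.0), (20.0, 10.0, 0.0)],
    [(95.0, 5.0, 0.0), (80.0, 20.0, 0.0)],
]
index = build_index(traces, navpoints)
chosen = select_trace((90.0, 0.0, 0.0), index, navpoints)  # the trace starting near "B"
```

Looking up by navpoint first keeps the search small: only traces filed under the bot's nearest navpoint are compared by distance.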
Unstuck Controller
• Mix scripted responses and human traces
– Previous UT^2 used only human traces
Stuck Condition | Response
Still | Move Forward
Collide With Wall | Move Away
Frequent Collisions | Dodge Away
Under Elevator | Goto Nearest Item or Dodge
Bump Agent | Move Away
Same Navpoint | Human Traces
Off Navpoint Grid | Human Traces
• Human traces also used after repeated failures
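The condition-to-response mapping on this slide could be sketched as a lookup table. The string labels and the `MAX_FAILURES` threshold are assumptions for illustration; the slide only says that human traces are used "after repeated failures".

```python
# Stuck condition -> response, following the slide; the labels are illustrative.
RESPONSES = {
    "still": "move_forward",
    "collide_with_wall": "move_away",
    "frequent_collisions": "dodge_away",
    "under_elevator": "goto_nearest_item_or_dodge",
    "bump_agent": "move_away",
    "same_navpoint": "human_traces",
    "off_navpoint_grid": "human_traces",
}

MAX_FAILURES = 3  # assumed value; the slides do not give a number

def unstuck_response(condition, failures=0):
    """Return the response for a stuck condition, falling back to human traces."""
    if failures >= MAX_FAILURES:
        return "human_traces"
    return RESPONSES[condition]
```

For example, `unstuck_response("still")` gives the scripted response, while `unstuck_response("still", failures=3)` switches to trace replay.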
Explorative Retrace
• Explore the level like a human
• Collisions allowed when using RETRACE
– Humans often bump walls with no problem
• If RETRACE fails
– No trace available, or trace gets bot stuck
– Fall through to PATH module (Nav graph)
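The fall-through from RETRACE to PATH can be read as a priority list in which a module may decline to act. The sketch below assumes each module is a function that returns an action or `None`; the state keys and action tuples are hypothetical.

```python
def retrace(state):
    """Follow a human trace, or decline when none is available or it got the bot stuck."""
    trace = state.get("trace")
    if trace is None or state.get("stuck_on_trace", False):
        return None  # decline so the next module can act
    return ("follow_trace", trace)

def path(state):
    """Always-available fallback: navigate along the nav graph."""
    return ("follow_nav_graph", state["goal"])

def navigate(state, modules=(retrace, path)):
    """Return the action of the first module that does not decline."""
    for module in modules:
        action = module(state)
        if action is not None:
            return action
    raise RuntimeError("no module produced an action")
```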
Evolved Battle Controller
Battle Controller Outputs
• 6 movement outputs
– Advance
– Retreat
– Strafe left
– Strafe right
– Move to nearest item
– Stand still
• Additional output
– Jump?
[Figure labels: Item, Bot, Enemy]
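One plausible way to turn the network's outputs into an action is to take the highest-scoring movement output and threshold the jump output. The slides do not say how the outputs are interpreted, so the argmax rule, the output ordering, and `jump_threshold` are all assumptions.

```python
MOVES = [
    "advance",
    "retreat",
    "strafe_left",
    "strafe_right",
    "move_to_nearest_item",
    "stand_still",
]

def decode_outputs(outputs, jump_threshold=0.5):
    """Map 7 network outputs to (movement, jump).

    The first six values score the movements (assumed order as in MOVES);
    the seventh is the jump output.
    """
    if len(outputs) != len(MOVES) + 1:
        raise ValueError("expected 6 movement outputs plus 1 jump output")
    best = max(range(len(MOVES)), key=lambda i: outputs[i])
    return MOVES[best], outputs[-1] > jump_threshold
```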
Battle Controller Inputs
Pie slice sensors for enemies
Ray traces for walls/level geometry
Other misc. sensors for current weapon properties, nearby item properties, etc.
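A pie slice sensor can be sketched as an angular binning of enemy positions around the bot. This version works in 2D, starts slice 0 at the bot's heading, and scales activation linearly with proximity; the slice count, range, and activation formula are assumptions, since the slide does not give them.

```python
import math

def pie_slice_sensors(bot_pos, bot_yaw, enemies, num_slices=4, max_range=1000.0):
    """Return one activation per angular slice around the bot (2D sketch)."""
    activations = [0.0] * num_slices
    width = 2 * math.pi / num_slices
    for ex, ey in enemies:
        dx, dy = ex - bot_pos[0], ey - bot_pos[1]
        dist = math.hypot(dx, dy)
        if dist > max_range:
            continue  # enemy too far away to be sensed
        # Angle of the enemy relative to the bot's heading, in [0, 2*pi).
        rel = (math.atan2(dy, dx) - bot_yaw) % (2 * math.pi)
        idx = min(int(rel // width), num_slices - 1)
        # Closer enemies give stronger activation; keep the strongest per slice.
        activations[idx] = max(activations[idx], 1.0 - dist / max_range)
    return activations

# One enemy ahead-left, one behind-right, one out of range.
acts = pie_slice_sensors((0.0, 0.0), 0.0, [(100.0, 100.0), (-100.0, -100.0), (5000.0, 0.0)])
```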
Battle Controller Inputs
• Opponent movement sensors
– Opponent performing movement action X?
– Opponents modeled as moving like bot
– Approximation used
Constructive Neuroevolution
• Genetic Algorithms + Neural Networks
• Build structure incrementally (complexification)
• Good at generating control policies
• Three basic mutations (no crossover used)
– Perturb Weight
– Add Connection
– Add Node
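The three mutations can be sketched on a toy genome made of a node list and a list of `[source, target, weight]` connections. The genome layout, the Gaussian perturbation, and the choice to add a node by splitting an existing connection are assumptions for illustration, not details taken from the slides.

```python
import random

def perturb_weight(genome, rng, sigma=0.1):
    """Add Gaussian noise to the weight of one random connection."""
    conn = rng.choice(genome["conns"])
    conn[2] += rng.gauss(0.0, sigma)

def add_connection(genome, rng):
    """Connect two random nodes (may create recurrent or duplicate links)."""
    src = rng.choice(genome["nodes"])
    dst = rng.choice(genome["nodes"])
    genome["conns"].append([src, dst, rng.gauss(0.0, 1.0)])

def add_node(genome, rng):
    """Split a random connection src->dst into src->new->dst."""
    conn = rng.choice(genome["conns"])
    src, dst, weight = conn
    new = max(genome["nodes"]) + 1
    genome["nodes"].append(new)
    genome["conns"].remove(conn)
    genome["conns"].append([src, new, 1.0])
    genome["conns"].append([new, dst, weight])

def mutate(genome, rng):
    """Apply one of the three mutations, chosen uniformly (no crossover)."""
    rng.choice([perturb_weight, add_connection, add_node])(genome, rng)

# Toy genome: two inputs (0, 1) wired to one output (2).
genome = {"nodes": [0, 1, 2], "conns": [[0, 2, 0.5], [1, 2, -0.5]]}
rng = random.Random(0)
add_node(genome, rng)        # now 4 nodes, 3 connections
add_connection(genome, rng)  # now 4 connections
perturb_weight(genome, rng)  # structure unchanged, one weight nudged
```

Because structure only grows through these mutations, a network can start small and become more complex over generations.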
Evolving Battle Controller
• Used NSGA-II* with 3 objectives
– Damage dealt
– Damage received (negative)
– Geometry collisions (negative)
• Evolved in DM-1on1-Albatross
– Small level to encourage combat
– One native bot opponent
• High score favored in selection of final network
• Final combat behavior highly constrained
*K. Deb et al. A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II. Evol. Comp. 2002
Action Filtering
• Network choice not always used
– Forced to stand still sometimes (sniping, not threatened, high ground)
– Prevented from jumping while still
– Prevented from jumping near walls/opponents
– Prevented from going to unwanted items
– Prevented from strafing/retreating into walls
– Etc.
• Forced lower accuracy
• Forced delays to simulate human response time
• Evolution constrained to look human
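A few of the filters listed on this slide could be sketched as a post-processing step on the network's chosen action. The context flags, the action labels, and the fallback of standing still when a move is vetoed are assumptions; the slides do not say what replaces a blocked action.

```python
def filter_action(move, jump, ctx):
    """Veto network choices that would look un-human.

    ctx is a dict of hypothetical flags describing the bot's surroundings.
    """
    wall_in_direction = ctx.get("wall_in_direction", {})
    # No strafing or retreating into walls.
    if move in ("strafe_left", "strafe_right", "retreat") and wall_in_direction.get(move, False):
        move = "stand_still"  # assumed fallback
    # No detours to items the bot does not want.
    if move == "move_to_nearest_item" and not ctx.get("item_wanted", True):
        move = "stand_still"  # assumed fallback
    # No jumping while still, or near walls or opponents.
    if move == "stand_still" or ctx.get("near_wall") or ctx.get("near_opponent"):
        jump = False
    return move, jump
```

For instance, `filter_action("retreat", True, {"wall_in_direction": {"retreat": True}})` yields `("stand_still", False)`: the retreat is vetoed, and the jump is then dropped because the bot is standing still.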
Importance of Observing
• Humans don’t just want max score
• Human goal is to judge correctly
– Requires observation w/o fighting
• Observe module
– Bot hasn’t judged opponent
– Avoids crowds
• Judging module
– Lengthy observation leads to judging
Observation Behavior
[Figure labels: Still, Approach, Use Battle Controller, Retreat]
Human Subject Evaluation
• BotPrize tests humanness without saying what is human-like vs. bot-like
• Idea: BotPrize style experiment in which players are extensively interviewed
• IRB Human Subject Study w/cash prizes
• Performed at UT:
– 6 human volunteers
– 3 human interviewers
– 4 versions of UT^2
– Native bots
Justify Judgments
• Record each match and replay to human
• Human explains rationale for judgments
• Downsides:
– Humans forget
– Humans make things up
– Humans change their minds
• Still, many common themes emerged:
Humans Aren’t Killing Machines
• Accuracy affected by movement/distraction
• Pause before responding to surprises
• Humans don’t fire non-stop
– Waiting for opportune shot
– Saving ammo
• Few weapon switches
• Pause to observe
Humans Aren’t Stupid
• Humans rapidly correct mistakes
– Get unstuck quickly
– Move/dodge when fired upon
– Don’t stare at walls
• Humans know their limitations
– Prefer weapons requiring less accuracy
– Don’t fight with a weak weapon
Complex Human Movements
• Do
– Chase opponents tenaciously
– Retreat while firing on opponent
– Move in and out from cover
• Don’t
– Perform many rapid movements too quickly
– Turn around too quickly
Cognitive Issues
• Theory of Mind
– Behavior transitions
• A chasing human expects to fight
• Humans expect to be chased (traps)
– Communication via judging
• Human knows that its action will be recognized as human-like by humans
– Emotion
• Revenge on humans more satisfying
• Fear of dangerous opponents
Conclusion
• Human trace replay provides human style exploration and gets bot unstuck
• Multiobjective neuroevolution provides combat behavior
• Simulated observation makes bot seem more human-like
• Future work: Incorporate Theory of Mind
Questions?
Auxiliary Slides
Multiobjective Optimization
• Game with two objectives:
– Damage Dealt
– Remaining Health
• A dominates B iff A is strictly better in one objective and at least as good in others
• Non-dominated points in the population are best: Pareto Front
• Weighted-sum provably incapable of capturing non-convex front
[Figure annotations: "High health but did not deal much damage"; "Tradeoff between objectives"; "Dealt a lot of damage, but lost lots of health"]
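The dominance definition and the weighted-sum limitation can be checked on toy data. The sketch below assumes both objectives (damage dealt, remaining health) are maximized; the point values are made up for illustration.

```python
def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

def weighted_sum_winners(points, steps=101):
    """Collect the best point under w*obj0 + (1-w)*obj1 for w swept over [0, 1]."""
    winners = set()
    for i in range(steps):
        w = i / (steps - 1)
        winners.add(max(points, key=lambda p: w * p[0] + (1 - w) * p[1]))
    return winners

# (damage dealt, remaining health)
points = [(10, 90), (50, 50), (90, 10), (40, 40), (30, 30)]
front = pareto_front(points)  # (40, 40) and (30, 30) are dominated by (50, 50)

# A non-convex front: (40, 40) is non-dominated but lies below the line
# joining the two extremes, so no weighting ever selects it.
non_convex = [(0, 100), (40, 40), (100, 0)]
ws = weighted_sum_winners(non_convex)
```

In the non-convex case every point is on the Pareto front, yet the weighted sum only ever returns the two extremes, which is the limitation the slide refers to.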
NSGA-II
• Evolution: natural approach for finding optimal population
• Non-Dominated Sorting Genetic Algorithm II*
– Population P with size N; Evaluate P
– Use mutation to get P´ size N; Evaluate P´
– Calculate non-dominated fronts of {P ∪ P´} size 2N
– New population size N from highest fronts of {P ∪ P´}
*K. Deb et al. A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II. Evol. Comp. 2002
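The four steps above can be sketched as one generation of a simplified loop. This is not full NSGA-II: the crowding-distance tie-break for the last admitted front is omitted and that front is simply truncated. The `evaluate` and `mutate` callables are placeholders supplied by the caller, and all objectives are assumed to be maximized.

```python
import random

def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def non_dominated_fronts(scored):
    """Peel (individual, objectives) pairs into successive non-dominated fronts."""
    remaining = list(scored)
    fronts = []
    while remaining:
        front = [s for s in remaining
                 if not any(dominates(t[1], s[1]) for t in remaining)]
        fronts.append(front)
        taken = {id(s) for s in front}
        remaining = [s for s in remaining if id(s) not in taken]
    return fronts

def next_generation(population, evaluate, mutate, rng):
    """P (size N) -> P' by mutation -> sort P and P' together -> best N survive."""
    n = len(population)
    children = [mutate(ind, rng) for ind in population]
    combined = [(ind, evaluate(ind)) for ind in population + children]
    survivors = []
    for front in non_dominated_fronts(combined):
        for ind, _ in front:
            if len(survivors) < n:
                survivors.append(ind)  # plain truncation, no crowding distance
    return survivors

# Toy run: individuals are their own objective vectors, mutation adds noise.
def jitter(ind, rng):
    return tuple(v + rng.uniform(-0.1, 0.1) for v in ind)

rng = random.Random(0)
population = [(rng.random(), rng.random()) for _ in range(8)]
for _ in range(5):
    population = next_generation(population, lambda ind: ind, jitter, rng)
```

Because parents and children compete in the same sort, a good parent is never lost to a worse child, which is the elitism the algorithm's title refers to.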