Modeling Two-Player Games in the Sigma Graphical Cognitive Architecture
David V. Pynadath, Paul S. Rosenbloom,
Stacy C. Marsella and Lingshan Li
8.1.2013
Overall Desiderata for Sigma (𝚺)
 A new breed of cognitive architecture that is
 Grand unified
 Cognitive + key non-cognitive (perception, motor, affective, …)
 Functionally elegant
 Broadly capable yet simple and theoretically elegant
 “cognitive Newton’s laws”
 Sufficiently efficient
 Fast enough for anticipated applications
 For virtual humans & intelligent agents/robots that are
 Broadly, deeply and robustly cognitive
 Interactive with their physical and social worlds
 Adaptive given their interactions and experience
Hybrid: Discrete + Continuous
Mixed: Symbolic + Probabilistic
Sample ICT Virtual Humans
Ada & Grace
INOTS
Gunslinger
SASO
For education, training, interfaces, health, entertainment, …
Theory of Mind (ToM) in Sigma
 ToM models the minds of others, to enable for example:
 Understanding multiagent situations
 Participating in social interactions
 ToM approach based on PsychSim (Marsella & Pynadath)
 Decision theoretic problem solving based on POMDPs
 Recursive agent modeling
 Questions to be answered
 Can Sigma elegantly extend to comparable ToM?
 What are the benefits for ToM?
 What new phenomena emerge from this combination?
 Results reported here concern:
 Multiagent Sigma
 Implementation of single-shot, two-player games
 Both simultaneous and sequential moves
The Structure of Sigma
 Constructed in layers
 In analogy to computer systems
[Figure: Sigma's layered construction, in analogy to a computer system]
Computer System: Programs & Services / Computer Architecture / Microcode Architecture / Hardware
𝚺 Cognitive System: Knowledge & Skills / Cognitive Architecture / Graphical Architecture / Lisp
Cognitive Architecture: Memory Access, Decision, Learning, Perception, Action, over Predicates (WM) and Conditionals (LTM)
Graphical Architecture: Graph Solution and Graph Modification, over graphical models and piecewise linear functions
Conditionals: Deep blending of rules and probabilistic networks
Graphical models: Factor graphs + summary product algorithm
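
As a rough illustration of the summary product idea (a toy two-variable factor graph in plain array code, with made-up factor values; not Sigma's piecewise-linear function representation), a marginal is computed by multiplying incoming factor messages into a factor and summing out the other variables:

import numpy as np

# Toy factor graph: variables X and Y, a unary factor f(X) and a pairwise factor g(X, Y).
# These factor values are invented purely for illustration.
f = np.array([0.7, 0.3])           # f(X)
g = np.array([[0.9, 0.1],          # g(X, Y)
              [0.2, 0.8]])

# Summary product at the g node: multiply the incoming message (f) into the factor,
# then summarize X out (here by summation) to get the message toward Y.
msg_to_Y = (f[:, None] * g).sum(axis=0)
p_Y = msg_to_Y / msg_to_Y.sum()    # normalize into a marginal distribution over Y
print(p_Y)                         # e.g., [0.69, 0.31]

In Sigma the same product-then-summarize step runs over piecewise linear functions rather than small discrete tables.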
Control Structure: Soar-like Nesting of Three Layers
 A reactive layer
 One (internally parallel) graph/cognitive cycle
Which acts as the inner loop for
 A deliberative layer
 Serial selection and application of operators
Which acts as the inner loop for
 A reflective layer
 Recursive, impasse-driven, meta-level generation (tie and no-change impasses)
 The layers differ in
 Time scales
 Serial versus parallel
 Controlled versus uncontrolled
Single-Shot, Simultaneous-Move, Two-Player Games
 Two players move simultaneously
 Played only once (not repeated)
Prisoner's Dilemma payoff matrix (A's payoff, B's payoff):

A \ B        Cooperate   Defect
Cooperate    .3, .3      .1, .4
Defect       .4, .1      .2, .2
 So no need to look beyond current decision
 Symmetric and asymmetric games
 Socially preferred outcome: optimum in some sense
 Nash equilibrium: Neither player can unilaterally increase
their payoff by altering their own choice
 Key result: Sigma found the best Nash equilibrium in
one memory access (i.e., graph solution)
 Although the linear combination (see article) can't always guarantee it
Results (A's payoffs with the resulting choice distributions for A and B):

Prisoner's Dilemma   Cooperate   Defect   A Result   B Result
Cooperate            .3          .1       .43        .43
Defect               .4          .2       .57        .57
(602 messages)

Stag Hunt            Cooperate   Defect   A Result   B Result
Cooperate            .25         0        .54        .54
Defect               .1          .1       .46        .46
(962 messages)
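
With 2x2 payoff tables like those above, pure-strategy Nash equilibria can be verified by a brute-force check. A minimal sketch in plain Python (not Sigma's one-graph-solution computation), using the Prisoner's Dilemma payoffs from the table:

import itertools

# Prisoner's Dilemma payoffs from the table above: payoff[a_choice][b_choice],
# with 0 = Cooperate and 1 = Defect.
payoff_A = [[0.3, 0.1], [0.4, 0.2]]
payoff_B = [[0.3, 0.4], [0.1, 0.2]]

def pure_nash_equilibria(pa, pb):
    # Profiles where neither player can gain by unilaterally switching choices.
    eqs = []
    for a, b in itertools.product(range(2), repeat=2):
        a_best = all(pa[a][b] >= pa[a2][b] for a2 in range(2))
        b_best = all(pb[a][b] >= pb[a][b2] for b2 in range(2))
        if a_best and b_best:
            eqs.append((a, b))
    return eqs

print(pure_nash_equilibria(payoff_A, payoff_B))  # [(1, 1)], i.e., Defect/Defect

The same check on the Stag Hunt payoffs (.25/0 for Cooperate, .1/.1 for Defect) finds two pure equilibria, Cooperate/Cooperate and Defect/Defect.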
Sequential Games
 Players (A, B) alternate moves
 E.g., Ultimatum, centipede and negotiation
 Decision-theoretic approach with softmax combination
 Use expected value at each level of search
 Action probabilities assumed exponential in their utilities (à la Boltzmann; sketched below)
 There may be many Nash equilibria
 Instead seek stricter concept of subgame perfection
 Overall strategy is an equilibrium strategy over any subgame
 Key result: Games solvable in two modes:
 Automatic/reactive/system-1
 Controlled/deliberate/system-2
Both modes well documented in humans for general processing
Combination not found previously in ToM models
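
A minimal sketch of the Boltzmann assumption above, in plain array code: action probabilities are proportional to the exponential of their utilities. The gain K here is an illustrative value, not a parameter taken from the paper.

import numpy as np

def boltzmann(utilities, K=5.0):
    # P(action) proportional to exp(K * utility); K is a hypothetical gain,
    # larger values concentrate probability on the best action.
    u = np.asarray(utilities, dtype=float)
    w = np.exp(K * (u - u.max()))  # shift by the max for numerical stability
    return w / w.sum()

# Example: a utility function over money like the <.1, .4, .7, 1> used later
print(boltzmann([0.1, 0.4, 0.7, 1.0]))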
The Ultimatum Game
 A starts with a fixed amount of money (3)
 A decides how much (in 0-3) to offer B
 B decides whether or not to accept the offer
 If B accepts, each gets the resulting amount
 If B rejects, both get 0
 Each has a utility function over money
 E.g., <.1, .4, .7, 1>
Automatic/Reactive Approach
 A trellis (factor) graph in LTM with one stage per move
 Focus on backwards messages from reward(s)
[Figure: trellis factor graph over the variables offer, accept and money, with transition factors TA and TB, the Reward function, and an exp factor]

CONDITIONAL Transition-B
   Conditions: Money(agent:B quantity:moneyb)
   Condacts: Accept(offer:offer acceptance:choice)
   Function(choice,offer,moneyb): 1<T,0,0>, 1<T,1,1>, 1<T,2,2>, 1<T,3,3>, 1<F,*,0>

CONDITIONAL Reward
   Condacts: Money(agent:agent quantity:money)
   Function(agent,money): .1<*,0>, .4<*,1>, .7<*,2>, 1<*,3>

CONDITIONAL Transition-A
   Conditions: Money(agent:A quantity:moneya)
               Accept-E(offer:offer acceptance:choice)
   Condacts: Offer(agent:A quantity:offer)
   Function(choice,offer,moneya): 1<T,0,3>, 1<T,1,2>, 1<T,2,1>, 1<T,3,0>, 1<F,*,0>

V_B(offer, choice) = Reward_B(Money_B(offer, choice))
V_A(offer) = Σ_choice Reward_A(Money_A(offer, choice)) * exp(K * V_B(offer, choice))
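
A numerical sketch of the two value equations above in plain array code (not Sigma's message passing; K is an illustrative gain, and the resulting numbers are not expected to match the distribution reported later, which depends on Sigma's actual parameters and normalization):

import numpy as np

reward = np.array([0.1, 0.4, 0.7, 1.0])   # utility over money 0..3, same for A and B
K = 5.0                                    # hypothetical softmax gain

offers = np.arange(4)                                    # A may offer 0..3 of the 3 units
money_B = np.stack([offers, np.zeros(4, int)], axis=1)   # columns: accept, reject
money_A = np.stack([3 - offers, np.zeros(4, int)], axis=1)

V_B = reward[money_B]                                    # V_B(offer, choice)
V_A = (reward[money_A] * np.exp(K * V_B)).sum(axis=1)    # V_A(offer), per the equation above
print(V_A / V_A.sum())                                   # normalized preference over A's offers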
Controlled/Deliberate (Reflective) Approach
 Decision-theoretic problem-space search across metalevels
 Very Soar-like, but with softmax combination
 Depends on summary product and Sigma’s mixed aspect
 Corresponds to PsychSim’s online reasoning
[Figure: metalevel problem-space search for the Ultimatum Game. A's tie impasse among offers 0-3 spawns evaluation operators such as E(2); their no-change impasses recurse into B's subgame for the candidate offer (accept vs. reject, with its own tie and an E(accept) evaluation), and the returned evaluations resolve A's tie, with offer 1 selected]
Comments on the Ultimatum Game
 Automatic version (5 conditionals)
 A’s normalized distribution over offers: <.315, .399, .229, .057>
 1 decision (94 messages) and .02 s (on a MacBook Air)
 Controlled version (19 conditionals)
 A's normalized distribution over offers: <.314, .400, .229, .057> (comparable to the automatic version)
 72 decisions (868 messages/decision) and 126.69 s (a speed ratio of >6000 relative to automatic)
 Same result, with distinct computational properties
 Automatic is fast and occurs in parallel with other memory processing, but is not (easily) penetrable by other new knowledge
 Controlled is slow, sequential, but can (easily) integrate new knowledge
 Distinction also maps onto expert versus novice behavior in general
Raises possibility of a generalization of Soar’s chunking mechanism
 Compile/learn automatic trellises from controlled problem solving
 Finer grained, mixed(/hybrid) learning mechanism
Conclusion
 Simultaneous games are solvable within a single decision
 Yield Nash equilibria (although the linear combination doesn't guarantee this)
 Sequential games are solvable in either an automatic or a
controlled manner
 Raises possibility of a mixed variant of chunking that automatically
learns probabilistic trellises (HMMs, DBNs, …) from problem solving
 May yield a novel form of general structure learning for graphical models
 Two architectural modifications to Sigma were required
 Multiagent decision making (and reflection)
 Optional exponentiation of outgoing WM messages (for softmax)
 Future work includes
 More complex games
 Belief updating (learning models of others)
Overall Progress in Sigma
 Memory [ICCM 10]
   Procedural (rule)
   Declarative (semantic/episodic)
   Constraint
 Problem solving
   Preference based decisions [AGI 11]
   Impasse-driven reflection [AGI 13]
   Decision-theoretic (POMDP) [BICA 11b]
   Theory of Mind [AGI 13]
 Learning [ICCM 13]
   Episodic
   Concept (supervised/unsupervised)
   Reinforcement [AGI 12b]
   Action modeling [AGI 12b]
   Map (as part of SLAM)
 Mental imagery [BICA 11a; AGI 12a]
   1-3D continuous imagery buffer
   Object transformation
   Feature & relationship detection
 Perception [BICA 11b]
   Object recognition (CRFs)
   Localization
 Natural language
   Question answering (selection)
   Word sense disambiguation [ICCM 13]
   Part of speech tagging [ICCM 13]
   Isolated word speech recognition
 Graph integration [BICA 11b]
   CRF + Localization + POMDP
Some of these are still just beginnings