Σ Modeling Two-Player Games in the Sigma Graphical Cognitive Architecture
David V. Pynadath, Paul S. Rosenbloom, Stacy C. Marsella and Lingshan Li
8.1.2013

Overall Desiderata for Sigma (Σ)
- A new breed of cognitive architecture that is:
  - Grand unified: cognitive + key non-cognitive (perception, motor, affective, ...)
  - Functionally elegant: broadly capable yet simple and theoretically elegant ("cognitive Newton's laws")
  - Sufficiently efficient: fast enough for anticipated applications
- For virtual humans & intelligent agents/robots that are:
  - Broadly, deeply and robustly cognitive
  - Interactive with their physical and social worlds
  - Adaptive given their interactions and experience
- Hybrid: discrete + continuous
- Mixed: symbolic + probabilistic

Sample ICT Virtual Humans
- Ada & Grace, INOTS, Gunslinger, SASO
- For education, training, interfaces, health, entertainment, ...

Theory of Mind (ToM) in Sigma
- ToM models the minds of others, to enable, for example:
  - Understanding multiagent situations
  - Participating in social interactions
- ToM approach based on PsychSim (Marsella & Pynadath):
  - Decision-theoretic problem solving based on POMDPs
  - Recursive agent modeling
- Questions to be answered:
  - Can Sigma elegantly extend to comparable ToM?
  - What are the benefits for ToM?
  - What new phenomena emerge from this combination?
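The combination of decision-theoretic choice and recursive agent modeling can be illustrated with a minimal, hypothetical sketch (this is not PsychSim's or Sigma's actual code): a row player chooses actions via a Boltzmann (softmax) distribution over expected payoffs, computed against a softmax model of the other player's choice. The function names, the uniform prior used at the base of the recursion, and the rationality parameter `k` are all assumptions for illustration.

```python
import math

def softmax(values, k=1.0):
    """Boltzmann distribution: P(i) proportional to exp(k * values[i])."""
    weights = [math.exp(k * v) for v in values]
    total = sum(weights)
    return [w / total for w in weights]

def level1_row_policy(row_payoffs, col_payoffs, k=5.0):
    """One level of recursive modeling in a 2x2 simultaneous-move game:
    the row player models the column player as a softmax responder
    (against a uniform prior over the row player's own moves), then
    soft-best-responds to that model."""
    # Column player's expected payoff per column under a uniform row prior.
    col_values = [sum(col_payoffs[r][c] for r in range(2)) / 2
                  for c in range(2)]
    col_policy = softmax(col_values, k)
    # Row player's expected payoff per row against the modeled column policy.
    row_values = [sum(col_policy[c] * row_payoffs[r][c] for c in range(2))
                  for r in range(2)]
    return softmax(row_values, k)
```

With prisoner's-dilemma-style payoffs, the dominant move (defect) receives the higher probability, while the softmax keeps the distribution graded rather than deterministic.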
Results reported here concern:
- Multiagent Sigma
- Implementation of single-shot, two-player games
- Both simultaneous and sequential moves

The Structure of Sigma
- Constructed in layers, in analogy to computer systems:

    Computer System          Σ Cognitive System
    Programs & Services      Knowledge & Skills
    Computer Architecture    Cognitive Architecture
    Microcode Architecture   Graphical Architecture
    Hardware                 Lisp

- Cognitive architecture: perception, memory access, decision, learning, action
  - Predicates (WM) and conditionals (LTM)
- Graphical architecture: graph solution and graph modification
  - Graphical models with piecewise-linear functions
- Conditionals: deep blending of rules and probabilistic networks
- Graphical models: factor graphs + the summary product algorithm

Control Structure: Soar-like Nesting of Three Layers
- A reactive layer: one (internally parallel) graph/cognitive cycle
  - Which acts as the inner loop for
- A deliberative layer: serial selection and application of operators
  - Which acts as the inner loop for
- A reflective layer: recursive, impasse-driven, meta-level generation (tie and no-change impasses)
- The layers differ in:
  - Time scales
  - Serial versus parallel
  - Controlled versus uncontrolled

Single-Shot, Simultaneous-Move, Two-Player Games
- Two players (A, B) move simultaneously
- Played only once (not repeated), so there is no need to look beyond the current decision
- Symmetric and asymmetric games
- Socially preferred outcome: optimum in some sense
- Nash equilibrium: neither player can unilaterally increase their payoff by altering their own choice

  Prisoner's Dilemma (A's payoff, B's payoff):

                   B: Cooperate   B: Defect
    A: Cooperate   .3, .3         .1, .4
    A: Defect      .4, .1         .2, .2

- Key result: Sigma found the best Nash equilibrium in one memory access (i.e., graph solution)
  - Although the linear combination used in the article can't always guarantee it

  Prisoner's Dilemma (602 messages):
                 vs Cooperate   vs Defect   A Result   B Result
    Cooperate    .3             .1          .43        .43
    Defect       .4             .2          .57        .57

  Stag Hunt (962 messages):
                 vs Cooperate   vs Defect   A Result   B Result
    Cooperate    .25            0           .54        .54
    Defect       .1             .1          .46        .46

Sequential Games
- Players (A, B) alternate moves
- E.g.:
  - Ultimatum, centipede, and negotiation
- Decision-theoretic approach with softmax combination:
  - Use expected value at each level of search
  - Action probabilities assumed exponential in their utilities (à la Boltzmann)
- There may be many Nash equilibria
  - Instead seek the stricter concept of subgame perfection: the overall strategy is an equilibrium strategy over any subgame
- Key result: games are solvable in two modes:
  - Automatic/reactive/system-1
  - Controlled/deliberate/system-2
  - Both modes are well documented in humans for general processing
  - The combination has not been found previously in ToM models

The Ultimatum Game
- A starts with a fixed amount of money (3)
- A decides how much (0-3) to offer B
- B decides whether or not to accept the offer
- If B accepts, each gets the resulting amount; if B rejects, both get 0
- Each player has a utility function over money, e.g., <.1, .4, .7, 1>

Automatic/Reactive Approach
- A trellis (factor) graph in LTM, with one stage per move
- Focus on backwards messages from the reward(s)

  [Trellis diagram: offer -> T_A -> accept -> T_B -> money -> reward, with an exp factor on B's value]

  CONDITIONAL Transition-B
     Conditions: Money(agent:B quantity:moneyb)
     Condacts:   Accept(offer:offer acceptance:choice)
     Function(choice,offer,moneyb): 1<T,0,0>, 1<T,1,1>, 1<T,2,2>, 1<T,3,3>, 1<F,*,0>

  V_B(offer, choice) = Reward_B(Money_B(offer, choice))
  V_A(offer) = Σ_choice Reward_A(Money_A(offer, choice)) * e^(K*V_B(offer, choice))

  CONDITIONAL Reward
     Condacts:   Money(agent:agent quantity:money)
     Function(agent,money): .1<*,0>, .4<*,1>, .7<*,2>, 1<*,3>

  CONDITIONAL Transition-A
     Conditions: Money(agent:A quantity:moneya)
                 Accept-E(offer:offer acceptance:choice)
     Condacts:   Offer(agent:A quantity:offer)
     Function(choice,offer,moneya): 1<T,0,3>, 1<T,1,2>, 1<T,2,1>, 1<T,3,0>, 1<F,*,0>

Controlled/Deliberate (Reflective) Approach
- Decision-theoretic problem-space search across metalevels
- Very Soar-like, but with softmax combination
- Depends on summary product and Sigma's mixed aspect
- Corresponds to PsychSim's online reasoning

  [Diagram: metalevel search driven by tie and no-change impasses, expanding A's offers (0-3) and B's accept/reject responses via evaluation operators E(2) and E(accept)]

Comments on the Ultimatum Game
- Automatic version (5 conditionals):
  - A's normalized distribution over offers: <.315, .399, .229, .057>
  - 1 decision (94 messages) and .02 s (on a MacBook Air)
- Controlled version (19 conditionals):
  - A's normalized distribution over offers: <.314, .400, .229, .057> (comparable)
  - 72 decisions (868 messages/decision) and 126.69 s
  - Speed ratio >6000
- Same result, with distinct computational properties:
  - Automatic is fast and occurs in parallel with other memory processing, but is not (easily) penetrable by new bits of other knowledge
  - Controlled is slow and sequential, but can (easily) integrate new knowledge
- The distinction also maps onto expert versus novice behavior in general
- Raises the possibility of a generalization of Soar's chunking mechanism:
  - Compile/learn automatic trellises from controlled problem solving
  - A finer-grained, mixed(/hybrid) learning mechanism

Conclusion
- Simultaneous games are solvable within a single decision
  - Yield Nash equilibria (although the linear combination doesn't guarantee it)
- Sequential games are solvable in either an automatic or a controlled manner
- Raises the possibility of a mixed variant of chunking that automatically learns probabilistic trellises (HMMs, DBNs, ...) from problem solving
  - May yield a novel form of general structure learning for graphical models
- Two architectural modifications to Sigma were required:
  - Multiagent decision making (and reflection)
  - Optional exponentiation of outgoing WM messages (for softmax)
- Future work includes:
  - More complex games
  - Belief updating (learning models of others)

Overall Progress in Sigma
- Memory [ICCM 10]: procedural (rule), declarative (semantic/episodic), constraint
- Problem solving: preference-based decisions [AGI 11], impasse-driven reflection [AGI 13], decision-theoretic (POMDP) [BICA 11b], Theory of Mind [AGI 13]
- Learning [ICCM 13]: episodic, concept (supervised/unsupervised), reinforcement [AGI 12b], action modeling [AGI 12b], map (as part of SLAM)
- Mental imagery [BICA 11a; AGI 12a]: 1-3D continuous imagery buffer, object transformation, feature & relationship detection
- Perception [BICA 11b]: object recognition (CRFs), localization
- Natural language: question answering (selection), word sense disambiguation [ICCM 13], part-of-speech tagging [ICCM 13], isolated word speech recognition
- Graph integration [BICA 11b]: CRF + localization + POMDP
- Some of these are still just beginnings
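The decision-theoretic treatment of the ultimatum game described above (expected values backed up through softmax action probabilities) can be sketched as follows. This is a plain backward-induction reading, not Sigma's trellis or conditionals; the temperature `k` and the exact combination rule are assumptions, so the resulting numbers differ from the reported <.315, .399, .229, .057> distribution, though the qualitative ordering of offers (1 most likely, 3 least likely) comes out the same.

```python
import math

def softmax(values, k):
    """Boltzmann distribution: P(i) proportional to exp(k * values[i])."""
    weights = [math.exp(k * v) for v in values]
    total = sum(weights)
    return [w / total for w in weights]

# Utility over money amounts 0..3, as in the deck: <.1, .4, .7, 1>
UTILITY = [0.1, 0.4, 0.7, 1.0]
TOTAL = 3  # A starts with 3 units of money

def offer_distribution(k=5.0):
    """A's softmax distribution over offers 0..3, backing up
    B's softmax accept/reject decision (k is an assumed temperature)."""
    values_a = []
    for offer in range(TOTAL + 1):
        # B compares accepting (receives `offer`) with rejecting (receives 0).
        p_accept = softmax([UTILITY[offer], UTILITY[0]], k)[0]
        # A keeps TOTAL - offer if accepted, 0 if rejected.
        v_a = p_accept * UTILITY[TOTAL - offer] + (1 - p_accept) * UTILITY[0]
        values_a.append(v_a)
    return softmax(values_a, k)
```

Offering 0 risks rejection (B is indifferent), while offering everything leaves A with nothing, so intermediate offers dominate, which is why the softmax peaks at an offer of 1.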