Representing von Neumann-Morgenstern Games in the Situation Calculus

From: AAAI Technical Report WS-02-06. Compilation copyright © 2002, AAAI (www.aaai.org). All rights reserved.
Oliver Schulte and James Delgrande
School of Computing Science
Simon Fraser University
Burnaby, B.C., Canada
{oschulte,jim}@cs.sfu.ca
Abstract
Sequential von Neumann-Morgenstern (VM) games are a very general formalism for representing multi-agent interactions and planning problems in a variety of types of environments. We show that sequential VM games with countably many actions and continuous utility functions have a sound and complete axiomatization in the situation calculus. We discuss the application of various concepts from VM game theory to the theory of planning and multi-agent interactions, such as representing concurrent actions and using the Baire topology to define continuous payoff functions.

Keywords: decision and game theory, multiagent systems, knowledge representation, reasoning about actions and change
Introduction
Much recent research has shown that systems of interacting software agents are a useful basis for computational intelligence. Our best general theory of interactions between agents is the mathematical theory of games, developed to a high level since the seminal work of von Neumann and Morgenstern (von Neumann & Morgenstern 1944). Von Neumann and Morgenstern (VM for short) designed game theory as a very general tool for modelling agent interactions; experience has confirmed the generality of the formalism: computer scientists, economists, political scientists and others have used game theory to model hundreds of scenarios that arise in economic, social and political spheres.
Our aim is to represent the concepts of VM game theory in a logical formalism so as to enable computational agents to employ game-theoretical models for the interactions that they are engaged in. It turns out that the epistemic extension of the situation calculus (Levesque, Pirri, & Reiter 1998, Sec. 7) is adequate for this task. We consider VM games in which the agents can take at most countably many actions and in which the agents' payoff functions are continuous. We show that every such VM game G has an axiomatization Axioms(G) in the situation calculus that represents the game, in the following strong sense: The game G itself is a model of Axioms(G), and all models of Axioms(G) are isomorphic (that is, Axioms(G) is a categorical axiomatization of G). It follows that the axiomatization is correct, in the sense that Axioms(G) entails only true assertions about the game G, and complete in the sense that Axioms(G) entails all true assertions about the game G.
This result makes a number of contributions to Artificial Intelligence. First, it shows how to construct a sound and complete set of situation calculus axioms for a given application domain. Bart et al. give an extended construction of such a set of axioms for a version of the board game "Clue" (Bart, Delgrande, & Schulte 2001), which we outline in the penultimate section. Second, the result establishes that the situation calculus is a very general language for representing multi-agent interactions. Third, it provides an agent designer with a general recipe for utilizing a game-theoretic model of a given type of interaction (e.g., Prisoner's Dilemma, Battle of the Sexes, Cournot Duopoly (Osborne & Rubinstein 1994)) and describing the model in a logical formalism. Fourth, it opens up the potential for applying solution and search algorithms from games research to multi-agent interactions (Koller & Pfeffer 1997; van den Herik & Iida 2001; Bart, Delgrande, & Schulte 2001). Finally, game theory is a major mathematical development of the 20th century, and we expect that many of the general concepts of game theory will prove fruitful for research into intelligent agents. For example, we introduce a standard topological structure associated with games known as the Baire topology. The Baire topology determines the large class of continuous payoff functions, which can be defined in the situation calculus in a natural way.
An attractive feature of VM game theory is that it provides a single formalism for representing both multi-agent interactions and single-agent planning problems. This is because a single-agent interaction with an environment can be modelled as a 2-player game in which one of the players, the environment, is indifferent about the outcome of the interaction. Thus our representation result includes planning problems as a special case.
The paper is organized as follows. We begin with the definition of a sequential VM game. The next section introduces the situation calculus. Then we specify the set of axioms Axioms(G) for a given game G, in two stages. First, we axiomatize the structure of the agents' interaction: roughly, what agents can do and what they know when. Second, we show how to define continuous utility functions in the situation calculus. Finally, for illustration we outline an extended application from previous work in which we used the situation calculus to represent a variant of the board game "Clue". (Some readers may want to look over this discussion in the next-to-last section, before the more abstract results in the main sections.)
Sequential Games
Definitions Sequential games are also known as extensive form games or game trees. The following definition is due to von Neumann, Morgenstern, and Kuhn; we adopt the formulation of (Osborne & Rubinstein 1994, Ch. 11.1). We begin with some notation for sequences. An infinite sequence is a function from the positive natural numbers to some set; we denote infinite sequences by the letter h and variants such as h′ or h_i. A finite sequence of length n is a function with domain {1, ..., n}, denoted by the letter s and variants. Almost all the sequences we consider in this paper are sequences of actions. We denote actions throughout by the letter a and variants. We write s = a_1, ..., a_n to indicate the finite sequence whose i-th member is a_i, and write h = a_1, ..., a_n, ... for infinite sequences. If s = a_1, ..., a_n is a finite sequence of n actions, the concatenation s * a = a_1, ..., a_n, a yields a finite sequence of length n + 1. We follow the practice of set theory and write s′ ⊆ s to indicate that sequence s′ is a prefix of s (s′ ⊂ s for proper prefixes); likewise, we write s ⊂ h to indicate that s is a finite initial segment of the infinite sequence h. The term "sequence" without qualification applies both to finite and infinite sequences, in which case we use the letter σ and variants.
Now we are ready to define a sequential game.
Definition 0.1 (von Neumann, Morgenstern, Kuhn) A sequential game G is a tuple (N, H, player, f_c, {I_i}, {u_i}) whose components are as follows.
1. A finite set N (the set of players).
2. A set H of sequences satisfying the following three properties.
(a) The empty sequence ∅ is a member of H.
(b) If σ is in H, then every initial segment of σ is in H.
(c) If h is an infinite sequence such that every finite initial segment of h is in H, then h is in H.
Each member of H is a history; each element of a history is an action taken by a player. A history σ is terminal if there is no history s ∈ H such that σ ⊂ s. (Thus all infinite histories are terminal.) The set of terminal histories is denoted by Z. The set of actions available at a finite history s is denoted by A(s) = {a : s * a ∈ H}.
3. A function player that assigns to each nonterminal history a member of N ∪ {c}. The function player determines which player takes an action after the history s. If player(s) = c, then it is "nature's" turn to make a chance move.
4. A function f_c that assigns to every history s for which player(s) = c a probability measure f_c(·|s) on A(s). Each probability measure is independent of every other such measure. (Thus f_c(a|s) is the probability that "nature" chooses action a after the history s.)
[Figure 1: game tree omitted in this copy. Server 1 receives payoff x, Server 2 receives payoff y; nodes c and d are in the same information set.]

Figure 1: A game-theoretic model of the interaction between two internet servers, S1 and S2, with two users.
5. For each player i ∈ N an information partition 𝓘_i defined on {s ∈ H : player(s) = i}. An element I_i of 𝓘_i is called an information set of player i. We require that if s, s′ are members of the same information set I_i, then A(s) = A(s′).
6. For each player i ∈ N a payoff function u_i : Z → R that assigns a real number to each terminal history.
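To make Definition 0.1 concrete, the following is a minimal sketch (ours, not code from the paper) of a finite game tree represented as a prefix-closed set of histories; the action names follow the server example of Figure 1 and are illustrative assumptions.

```python
# A finite sequential game (Definition 0.1): histories as tuples of actions.
# Names (Show, Poli, 2S2, U2) follow the server example; illustrative only.

H = {
    (),                                       # empty history: environment moves
    ("Show",), ("Poli",),                     # S1's turn
    ("Show", "2S2"), ("Poli", "2S2"),         # S2's turn (nodes c, d)
    ("Show", "2S2", "U2"), ("Poli", "2S2", "U2"),  # terminal histories
}

def prefixes(s):
    return [s[:i] for i in range(len(s) + 1)]

# Clause 2b: H must be closed under initial segments.
assert all(p in H for s in H for p in prefixes(s))

def A(s):
    """Actions available at history s: A(s) = {a : s*a in H}."""
    return {t[-1] for t in H if len(t) == len(s) + 1 and t[:-1] == s}

def terminal(s):
    """A history is terminal iff no history in H properly extends it."""
    return not A(s)

Z = {s for s in H if terminal(s)}             # the set of terminal histories

def player(s):
    """Who moves after nonterminal history s (the environment moves first)."""
    return {0: "env", 1: "S1", 2: "S2"}[len(s)]
```

Clause 2c is vacuous for a finite H; it only constrains trees with infinite histories.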
An Example To illustrate how interactions between agents may be represented as game trees, we adopt an abridged version of a scenario from (Bicchieri, Ephrati, & Antonelli 1996). We return to this example throughout the paper. Consider servers on the internet. Each server is connected to several sources of information and several users, as well as other servers. There is a cost to receiving and transmitting messages for the servers, which they recover by charging their users. We have two servers S1 and S2, and two users (journalists) U1 and U2. Both servers are connected to each other and to user U1; server S2 also serves user U2. There are two types of news items that interest the users: politics and showbiz. The various costs and charges for transmissions add up to payoffs for the servers, depending on what message gets sent where. For example, it costs S1 4 cents to send a message to S2, and it costs S2 2 cents to send a message to U2. If U2 receives a showbiz message from S2 via S1, he pays S1 and S2 each 6 cents. So in that case the overall payoff to S1 is -4 + 6 = 2, and the overall payoff to S2 is -2 + 6 = 4. Bicchieri et al. describe the charges in detail; we summarize them in Figure 1.
Figure 1 represents the structure of the interaction possibilities between the servers and the users. We begin with the environment delivering a type of item to the first server. If the chance p of each type of item arising is known among the players, the root node r corresponding to the empty history ∅ would be a chance node (i.e., player(∅) = c) with associated probability p. If the probability p is unknown, we would assign the root node to an "environment" player (i.e., player(∅) = env) who is indifferent among all outcomes (and whose payoff function hence need not be listed). Thus game theory offers two ways of representing uncertainty about the environment, depending on whether the probabilities governing the environment are common knowledge among the players or not.

Every node in the tree represents a history in the game. The nodes belonging to S1 are {a, b}, and its information partition is 𝓘_1 = {{a}, {b}}. Bicchieri et al. assume that the messages sent from S1 to S2 are encoded so that S2 does not know which type of message it receives. Therefore the information partition of S2 is 𝓘_2 = {{c, d}}. If S2 knew what type of news item it receives, its information partition would be 𝓘_2 = {{c}, {d}}. The payoff functions are as illustrated, with server S1's payoffs shown on top. Thus u1(∅ * Showbiz * send S2 * send U2) = 2, and u2(∅ * Showbiz * send S2 * send U2) = 4.
The game tree of Figure 1 admits different interpretations from the one discussed so far. For example, it also represents a situation in which the messages are not encoded, but in which server S2 has to make a decision independent of what news item S1 sends it. In the latter case it would be more natural to reorder the game tree so that S2 chooses first, then the environment, finally S1. But the insight from game theory here is that for rational choice we do not need to represent the "absolute" time at which the players move, but what each player knows when she makes a move. From this point of view, there is no important difference between a setting in which player 1 chooses first, and then player 2 chooses without knowing 1's choice, as in Figure 1, and the setting in which both players choose simultaneously (cf. (Osborne & Rubinstein 1994, p.202, p.102)). Thus we can model simultaneous action as equivalent to a situation in which one player moves first, and the other player does not know that choice.
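The payoff bookkeeping of the server example can be sketched as follows (our illustration; the charges are simplified to the three figures given in the text: S1 pays 4 cents to forward to S2, S2 pays 2 cents to deliver to U2, and U2 pays each server 6 cents for a delivered item).

```python
# Payoff computation for the server example (a sketch; cost figures are
# the ones quoted in the text, other charges from Bicchieri et al. omitted).

SEND_COST = {"S1->S2": 4, "S2->U2": 2}
USER_FEE = 6

def payoffs(history):
    """Return (u1, u2) for a history such as ('Show', '2S2', 'U2')."""
    u1 = u2 = 0
    if "2S2" in history:              # S1 forwarded the item to S2
        u1 -= SEND_COST["S1->S2"]
    if "U2" in history:               # S2 delivered the item to U2
        u2 -= SEND_COST["S2->U2"]
        u1 += USER_FEE                # U2 pays each server 6 cents
        u2 += USER_FEE
    return u1, u2

# The encoded-message assumption puts histories c and d into a single
# information set of S2: indistinguishable, with the same available actions.
info_partition_S2 = [{("Show", "2S2"), ("Poli", "2S2")}]
```

This reproduces the arithmetic in the text: the showbiz item delivered to U2 yields -4 + 6 = 2 for S1 and -2 + 6 = 4 for S2.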
The Situation Calculus With Infinite Histories
Our presentation of the situation calculus follows (Levesque, Pirri, & Reiter 1998). To the foundational situation calculus axioms we add elements for referring to infinite sequences of actions. Our axiomatization result in the next section applies to the epistemic extension of the situation calculus. We use mnemonic symbols for parts of the situation calculus according to their intended meanings. It is important to keep in mind that these are merely symbols in a formal language; it is the task of our axiomatization to ensure that the symbols carry the intended meaning. To emphasize the distinction between syntactic elements and their interpretations, we use boldface type, which will always indicate a symbol in a formal language (although conversely, we do not use boldface for all elements of the situation calculus).
The situation calculus with infinite histories is a many-sorted language that contains, in addition to the usual connectives and quantifiers, at least the following elements.
1. A sort action for actions, with variables a, a′ and constants a_i.
2. A sort situation for situations, with variables s, s′.
3. A sort inhist for infinite sequences of actions, with variables h, h′ etc.
4. A sort objects for everything else.
5. A function do : action × situation → situation.
6. A distinguished constant S0 ∈ situation.
7. A relation ⊑ : situation × (situation ∪ inhist). We use s ⊏ s′ as a shorthand for s ⊑ s′ ∧ ¬(s = s′), and similarly for s ⊏ h. The intended interpretation is that ⊑ denotes the relation "extends" between sequences, that is, ⊑ denotes ⊆ as applied to sequences viewed as sets of ordered pairs.
8. A predicate poss(a, s), intended to indicate that action a is possible in situation s.
9. Predicates possible(s) and possible(h).
We adopt the standard axioms for situations (see (Levesque, Pirri, & Reiter 1998)). Here and elsewhere in the paper, all free variables are understood to be universally quantified.

¬(s ⊏ S0)    (1)
s ⊏ do(a, s′) ≡ s ⊑ s′    (2)
do(a, s) = do(a′, s′) → (a = a′) ∧ (s = s′)    (3)

Axiom 3 ensures that every situation has a unique name. We adopt the second-order induction axiom on situations.

∀P.[P(S0) ∧ ∀a, s.(P(s) → P(do(a, s)))] → ∀s.P(s)    (4)

A consequence of Axiom 4 is that every situation corresponds to a finite sequence of actions (cf. (Ternovskaia 1997, Sec. 3)).
Next we specify axioms that characterize infinite histories.

S0 ⊏ h    (5)
s ⊏ h → ∃s′. s ⊏ s′ ∧ s′ ⊏ h    (6)
(s′ ⊑ s ∧ s ⊏ h) → s′ ⊏ h    (7)
h = h′ ≡ (∀s. s ⊏ h ≡ s ⊏ h′)    (8)
possible(h) ≡ ∀s ⊏ h. possible(s)    (9)

The final set of axioms says that the possible predicate defines which action sequences are possible: an action sequence is possible if no impossible action is ever taken along it.

possible(S0)    (10)
possible(do(a, s)) ≡ possible(s) ∧ poss(a, s)    (11)
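One concrete reading of these axioms treats a situation as the finite sequence of actions performed since S0. The following sketch (ours, not code from the paper; the no-immediate-repeat precondition is an illustrative stand-in for a domain-specific poss) mirrors Axioms 3 and 10-11.

```python
# Situations as finite action sequences reachable from S0 by do.

S0 = ()

def do(a, s):
    """Axiom 3 (unique names) holds automatically for tuples: two
    situations are equal iff their last actions and predecessors are."""
    return s + (a,)

def precedes(s1, s2):
    """s1 is an initial segment of s2 (the relation written s1 ⊑ s2)."""
    return s2[:len(s1)] == s1

def poss(a, s):
    """Illustrative precondition: an action may not repeat immediately.
    (Replace with the application's real precondition axioms.)"""
    return not s or s[-1] != a

def possible(s):
    """Axioms 10-11: possible(S0) holds, and possible(do(a, s)) iff
    possible(s) and poss(a, s); i.e. no impossible action was ever taken."""
    for i, a in enumerate(s):
        if not poss(a, s[:i]):
            return False
    return True
```

Unfolding the recursion of Axiom 11 into the loop above is exactly why possible(s) holds iff every action along s was possible when it was taken.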
Representing Game Forms in the Situation Calculus
The situation calculus is a natural language for describing games because its central notion is the same as that of VM sequential games: a sequence of actions. We establish a precise sense in which the situation calculus is apt for formalizing VM games: every VM game G with countably many actions and continuous payoff functions has a categorical axiomatization Axioms(G) in the situation calculus, thus an axiomatization that describes all and only features of the game in question.

Let (N, H, player, f_c, {I_i}, {u_i}) be a sequential game G. The tuple (N, H, player, f_c, {I_i}) is the game form of G (Osborne & Rubinstein 1994, p.201), which we denote as F(G). The game form specifies what actions are possible at various stages of a game, but it does not tell us how the players evaluate the outcomes. In this section, we construct a categorical axiomatization of a game form with countably many actions A(F). We consider axiomatizations of payoff functions in a later section. Our construction proceeds as follows. We introduce constants for players, actions and situations, and assign each action constant a_i a denotation [a_i], which defines a denotation for situation constants. Then we specify a set of axioms for the other aspects of the game in terms of the denotations of the action and situation constants.
Completeness. Clause 2c of Definition 0.1 requires that in a game tree, if every finite initial segment s of an infinite history h is one of the finite histories in the game tree, then h is an infinite history in the game tree. This clause rules out, for example, a situation in which for some action a and every n, the sequence a^n is part of the game tree, but the infinite sequence a^∞ is not. In such a situation we might think of the infinite sequence a^∞ as "missing" from the game tree, and view clause 2c as ruling out this kind of incompleteness. (In topological terms, Clause 2c requires that the Baire topology renders a game tree a complete metric space; see the section on Defining Continuous Payoff Functions and references there.) This notion of completeness is topological and different from the concept of completeness of a logical system. From a logical point of view, topological completeness is complex because it requires quantification over sequences of situations, which we represent as certain kinds of properties of situations as follows. Let basis(P) stand for the formula P(S0), let inf(P) stand for ∀s.P(s) → ∃s′. s ⊏ s′ ∧ P(s′), let closed(P) stand for (s′ ⊑ s ∧ P(s)) → P(s′), and finally let Seq(P) stand for basis(P) ∧ inf(P) ∧ closed(P). The completeness axiom corresponding to Clause 2c is then the following:

∀P.Seq(P) → (∃h.∀s.[P(s) ≡ s ⊏ h])    (12)
Axiom 12 is independent of any particular game structure. Nonetheless we list it as an axiom for game trees rather than as a general axiom characterizing infinite sequences because the completeness requirement is part of the notion of a game tree, rather than part of the notion of an infinite sequence. As we saw in the section on the introduction to the situation calculus, the properties of infinite sequences can be defined in first-order logic, whereas the completeness requirement in the definition of a game tree requires second-order quantification.
Players. We introduce a set of constants to denote players, such as {c, 1, 2, ..., n}, and one variable p ranging over a new sort players. In the server game with chance moves, we add constants c, S1, S2. We introduce unique names axioms of the general form i ≠ j, i ≠ c for i ≠ j. For the server game, we add c ≠ S1, c ≠ S2, S1 ≠ S2. We also add a domain closure axiom ∀p. p = c ∨ p = 1 ∨ ... ∨ p = n. In the server game, the domain closure axiom is ∀p. p = c ∨ p = S1 ∨ p = S2.
Actions. We add a set of constants for actions. If there are finitely many actions a_1, ..., a_n, we proceed as with the players, adding finitely many constants, unique names axioms and a domain closure axiom. For example, in the server game, we may introduce names U1, U2, 2S2, Show, Poli for each action and then include axioms such as U1 ≠ U2, Show ≠ Poli, etc. However, with infinitely many actions we cannot formulate a finite domain closure axiom. We propose a different solution for this case. Just as the induction axiom 4 for situations serves as a domain closure axiom for situations, we use a second-order induction axiom for actions to ensure that there are at most countably many actions in any model of our axioms. The formal axioms are as follows.

We introduce a successor operation + : action → action that takes an action a to the "next" action a⁺. We write a^(n) as a shorthand for the term obtained by applying the successor operation n times to a distinguished constant a_0 (thus a^(0) = a_0). The constants a^(n) may serve as names for actions. As in the case of finitely many actions, we introduce unique names axioms of the form a^(i) ≠ a^(j) where i ≠ j. We adopt the induction axiom

∀P.[P(a_0) ∧ ∀a.(P(a) → P(a⁺))] → ∀a.P(a).    (13)
We assume familiarity with the notion of an interpretation or model (see for example (Boolos & Jeffrey 1974, Ch. 9)).

Lemma 1 Let M = (actions, +, a_0) be a model of Axiom 13 and the unique names axioms. Then for every a ∈ actions, there is one and only one constant a^(n) such that [a^(n)] = a.

Proof (Outline). The unique names axioms ensure that for every action a, there are no two constants a^(n), a^(n′) such that [a^(n)] = [a^(n′)] = a. To establish that there is one such constant, use an inductive proof on the property of being named by a constant; more precisely, use the induction axiom to show that for every action a ∈ actions, there is some constant a^(n) such that [a^(n)] = a. □
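The naming scheme behind Lemma 1 can be mimicked computationally: a small sketch (ours, purely illustrative) in which the constant a^(n) denotes the result of applying the successor operation n times to a_0.

```python
# Countable action naming via an iterated successor operation.

def successor(a):
    """Stand-in denotation of +: maps the n-th action to the (n+1)-th."""
    return a + 1

def denote(n, a0=0):
    """[a^(n)]: apply the successor operation n times to the constant a0."""
    a = a0
    for _ in range(n):
        a = successor(a)
    return a

# Unique names: distinct constants a^(i), a^(j) denote distinct actions,
# and in this model every action has exactly one name (as in Lemma 1).
names = [denote(n) for n in range(5)]
assert len(set(names)) == len(names)
```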
Now choose a function [·] that is a 1-1 and total assignment of actions in the game form A(F) to action constants. For example, we may choose [Show] = Showbiz, [2S2] = send S2, [U1] = send U1, etc.
Situations. Let do(a1, ..., an) denote the result of repeatedly applying the do constructor to the actions a1, ..., an from the initial situation. Thus do(a1, ..., an) is shorthand for do(an, do(a(n-1), ..., do(a1, S0) ...)). We stipulate that do() = S0, so that do(a1, ..., an) = S0 if n = 0. The constant expressions of the form do(a1, ..., an) serve as our names for situations. We extend the denotation [a_i] for actions to situations in the obvious way: [do(a1, ..., an)] = [a1] * ... * [an]. For example, [do(Show, 2S2)] = Showbiz * send S2. Note that [S0] = ∅.

Knowledge. To represent knowledge, we follow (Levesque, Pirri, & Reiter 1998, Sec. 7) and employ a binary relation K, where K(s, s′) is read as "situation s is an epistemic alternative to situation s′". The relation K is intended to hold between two situation constants s, s′ just in case s and s′ denote situations in the same information set. For example, in the server game we have that K(do(Show, 2S2), do(Poli, 2S2)) holds.

Triviality Axioms. We add axioms to ensure that all functions of situations take on a "don't care" value ⊥ for impossible situations. For example, player(do(Show, U1, U2)) = ⊥ holds.

Table 1 specifies the set of axioms Axioms(F) for a VM game form F. For simplicity, we assume that there are no chance moves in F (see below). We write I(s) to denote the information set to which a finite game history s belongs (in F). To reduce notational clutter, the table denotes situation constants by symbols such as s_i, with the understanding that s_i stands for some constant term of the form do(a1, ..., an). Let F = (N, H, player, {I_i}) be a game form whose set of actions is A. In set-theoretic notation, the set of all infinite sequences of actions is A^ω, and we write A^{<ω} for the set of all finite action sequences. We say that the tree-like structure for F is the tuple I(F) = (N, A, A^{<ω}, A^ω, ⊥, poss, possible, ⊏, *, player_I, K), where each object interprets the obvious symbol (e.g., A^ω is the sort for h, and * interprets do), and

1. possible(s) ⟺ s ∈ H,
2. K(s, s′) ⟺ I(s) = I(s′),
3. player_I(s) = ⊥ if ¬possible(s), and player_I(s) = player(s) otherwise.

The intended interpretation for the set of axioms Axioms(F), defined via the denotation function [·] as described above, is the pair I = (I(F), [·]). We note that if two interpretations are isomorphic, then the same sentences are true in both interpretations (Boolos & Jeffrey 1974, p.191).

Axiom Set                                    Constraints
Players: Unique Names  i ≠ j                 i ≠ j
Players: Domain Closure  ∀p.p = 1 ∨ ... ∨ p = n
Actions: Unique Names                        see text
Actions: Domain Closure                      see text
poss(a_i, s_j)                               [a_i] ∈ A([s_j])
¬poss(a_i, s_j)                              [a_i] ∉ A([s_j])
player(s_j) = i                              player([s_j]) = [i]
player(s_j) = ⊥                              [s_j] ∉ H
K(s_i, s_j)                                  I([s_i]) = I([s_j])
¬K(s_i, s_j)                                 I([s_i]) ≠ I([s_j])

Table 1: The set Axioms(F) associated with a game form F = (N, H, player, {I_i}) and a denotation function [·]. Terms such as a_i and s_j stand for action and situation constants, respectively.

Theorem 0.1 Let F = (N, H, player, {I_i}) be a game form. Suppose that [·] gives a 1-1 and onto mapping between action constants and Actions(F).

1. The intended interpretation of Axioms(F), defined via the denotation function [·], is a model of Axioms(F) and Axioms 1-13.
2. All models of Axioms(F) and Axioms 1-13 are isomorphic.
Proof. (Outline) Part 1: It is fairly straightforward to verify that the intended interpretation for F (basically, the game tree) satisfies the general axioms and Axioms(F). For Part 2, the argument follows our construction. Let M = (M, [·]_M) be a model of Axioms(F) and Axioms 1-13. We can show that M is isomorphic to I = (I(F), [·]). (1) The domain closure and unique names axioms for the sorts player and action guarantee that there is a 1-1 mapping between N and A and the respective sorts player and action in M. More precisely, let name_M(p) be the constant i of sort player such that [i]_M = p, and define similarly name_M(a) for action a. Then the function f defined by f(p) = [name_M(p)] takes an object of sort player in M to an object of sort player in I, and f is a 1-1 onto mapping. Similarly for action objects. (2) Given that every action in M has a unique action constant naming it, it follows from Axioms 1-4 that every situation in M has a unique situation constant naming it (as before, we may write name_M(s)), which implies that there is a 1-1 onto mapping between the situations in M and A^{<ω}. (3) Axioms 5-8 ensure that every object in the sort inhist in M corresponds to an infinite sequence of (nested) situations; Axiom 12 ensures that for every such nested infinite sequence of situations, there is some object h in inhist such that h extends each situation in the sequence. Given the result of step (2), it follows that there is a 1-1 onto mapping between the sort inhist in M and A^ω. (4) Since poss_M(a, s) holds in M iff poss(name_M(a), name_M(s)) is in Axioms(F), which is true just in case poss_I([name_M(a)], [name_M(s)]) holds in I, this map constitutes a 1-1 onto mapping between the extensions of poss_M and poss_I. Together with Axioms 9-11, this ensures a 1-1 mapping between the extensions of the possible predicate in each model. (5) We also have that player_M(s) = i iff player(name_M(s)) = name_M(i) is in Axioms(F), which holds just in case player_I([name_M(s)]) = [name_M(i)], so the graphs of the player function are isomorphic in each model. (6) By the same argument, the extensions of the K predicate are isomorphic in each model. So any model M of the axioms Axioms(F) and Axioms 1-13 is isomorphic to the intended model I, which shows that all models of these axioms are isomorphic. □
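The construction of Axioms(F) from a finite game form can be sketched mechanically. The following is our own illustration (not the paper's procedure verbatim): it emits the positive atoms of Table 1 from the histories, the player function, and the information sets; the matching negative atoms are the complements.

```python
# Generating the atomic axioms of Table 1 from a finite game form.

def sit_const(s):
    """Write the history s as a situation constant do(a1, ..., an)."""
    return "do({})".format(", ".join(s)) if s else "S0"

def axioms_of(H, player, info):
    """H: prefix-closed set of histories (tuples of action-name strings);
    player, info: maps defined on nonterminal histories.
    Returns poss/player/K atoms as strings (positive atoms only)."""
    A = lambda s: {t[-1] for t in H if len(t) == len(s) + 1 and t[:-1] == s}
    atoms = []
    for s in sorted(H, key=len):
        for a in sorted(A(s)):
            atoms.append("poss({}, {})".format(a, sit_const(s)))
        if A(s):  # nonterminal: a player moves here
            atoms.append("player({}) = {}".format(sit_const(s), player[s]))
    for s in sorted(H, key=len):
        for t in sorted(H, key=len):
            if A(s) and A(t) and info[s] == info[t]:
                atoms.append("K({}, {})".format(sit_const(s), sit_const(t)))
    return atoms
```

Supplied with the encoded-message information sets of the server game, this would emit atoms such as K(do(Show, 2S2), do(Poli, 2S2)).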
It follows from Theorem 0.1 that for each player i, the relation K from our axiomatization is an equivalence relation on the situations at which i moves, because K represents the information partition of player i.
In order to represent chance moves, we need axioms characterizing real numbers; we do not want to go into axiomatic real number theory here since that topic is well known and the issues are similar to those that we take up in the section Defining Continuous Payoff Functions. A brief outline: Start with (second-order) axioms characterizing the real numbers. There are at most countably many pairs (a, s) such that "nature" chooses action a in situation s with some probability f_c(a|s). Introduce a constant for each of these probabilities, and axioms to ensure that the constant denotes the correct real number. Add axioms of the form f_c(a, s) = p whenever f_c([a] | [s]) = [p], as well as appropriate triviality axioms for f_c.
Discussion and Related Work
We may evaluate a representation formalism such as the situation calculus with respect to two dimensions: expressive power (how many domains can the formalism model?) and tractability (how difficult is it to carry out reasoning in the formal language?). Typically, there is a trade-off between these two desiderata. Our results so far indicate how great the expressive power of the situation calculus is. Because situation calculus axioms can represent so many and such varied models of agent interactions, we cannot guarantee in general that the axiomatization for a given game tree is tractable. An important research topic is what classes of games have tractable representations in the situation calculus (see the conclusions section).

To illustrate this point, we note that in extant applications of the situation calculus, the poss predicate typically has an inductive definition in terms of what actions are possible in a situation given what fluents hold in the situation, and then what fluents obtain as a consequence of the actions taken. (We will give examples in the next section.) By contrast, game theory deals with a set of histories, without any assumptions about how that set may be defined. The absence of state successor axioms for fluents in our axiomatization reflects the generality of VM games for representing many different types of environment. For example, we may distinguish between static and dynamic environments (Russell & Norvig 1994, Ch. 2.4). An environment is dynamic for an agent "if the environment can change while an agent is deliberating". In a dynamic environment, a frame axiom that says that a fluent will remain the same unless an agent acts to change it may well be inappropriate. For example, if an agent senses that a traffic light is green at time t, it should not assume that the light will be green at time t + 1, even if it takes no actions affecting the light. Hence a VM model of this and other dynamic environments cannot employ a simple frame axiom (cf. (Poole 1998)).
We have followed (Levesque, Pirri, & Reiter 1998) in introducing a binary relation K(s, s′) between situations. One difference between our treatment and that of Levesque et al. is that they allow uncertainty about the initial situation; they introduce a predicate K0(s) whose intended interpretation is that the situation s may be the initial one for all the agent knows. It is possible to extend game-theoretic models to contain a set of game trees, often called a "game forest", rather than a single tree. A game forest contains several different possible initial situations, one for each game tree, just as the approach of (Levesque, Pirri, & Reiter 1998) allows different possible initial situations. Game theorists use game forests to model games of incomplete information in which there is some uncertainty among the players as to the structure of their interaction (Osborne & Rubinstein 1994).
Another difference is that Levesque et al. use the binary relation K to represent the knowledge of a single agent. Mathematically the single relation K suffices for our axiomatization, even in the multi-agent setting, because the situations are partitioned according to which player moves at them. In effect, the relation K summarizes a number of relations K_i defined on the situations at which player i moves. To see this, define K_i(s, s′) ⟺ (player(s) = player(s′) = i) ∧ K(s, s′). Then K_i represents a partition of player i's nodes. From the point of view of the epistemic situation calculus, the difficulty with this is that player i's knowledge is defined by the K_i relation (and hence by the K relation) only at situations at which it is player i's turn to move. Thus if player(s) = i, what another player j knows at s is undefined and so we cannot directly represent, for example, what i knows about j's knowledge at s. To represent at each situation what each player knows at that situation we would require a relation K_i for all players that is a partition of all situations. In terms of the game tree, we would require that the information partition 𝓘_i of player i constitutes a partition of all nodes in the game tree, rather than just those at which player i moves. The general point is that VM game trees leave implicit the representation of what a player j knows when it is not her turn to move. This may not pose a difficulty for humans using the apparatus of VM game theory, but if we want to explicitly represent the players' knowledge at any situation, we will require additional structure, such as information partitions comprising all nodes in the game tree.
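The derived per-player relations K_i described above can be computed directly from the single K relation; a minimal sketch (ours) of the definition K_i(s, s′) ⟺ player(s) = player(s′) = i ∧ K(s, s′):

```python
# Restricting the global epistemic relation K to a single player i.

def make_Ki(K, player, i):
    """K: set of situation pairs; player: situation -> mover.
    Returns K_i, the restriction of K to situations where i moves."""
    return {(s, t) for (s, t) in K if player(s) == i and player(t) == i}

# Example: with histories c, d in one information set of player 2,
# K_2 relates them while K_1 is empty on those situations.
K = {("c", "c"), ("c", "d"), ("d", "c"), ("d", "d")}
player = {"c": 2, "d": 2}.get
```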
Other logical formalisms have been developed to represent classes of games. For example, Parikh's game logic uses dynamic logic to represent zero-sum games of perfect information (Parikh 1983). Poole's independent choice logic is a formalism for simultaneous move, or matrix, games (Poole 1997, Sec. 3.3), and the dynamic independent choice logic has resources to describe the components of a game tree (Poole 1997, Sec. 5.5). The construction in this paper is different because it aims for more generality with respect to the class of representable game trees, and because it focuses on the notion of a sequence of actions central to both game theory and the situation calculus.
Defining Continuous Payoff Functions in the Situation Calculus
It remains to define the payoff functions u_i : Z → R that assign a payoff for player i to each terminal history. To simplify, we assume in this section that all terminal histories are infinite. This is no loss of generality because we can define the payoff from a finite history s as the payoff assigned to all infinite histories extending s, whenever all histories extending s share the same payoff. There are in general uncountably many infinite histories, so we cannot introduce a name for each of them in a countable language. Moreover, payoff functions in infinite games can be of arbitrary complexity (Moschovakis 1980), so we cannot expect all such functions to have a definition in the situation calculus. In economic applications, continuous payoff functions typically suffice to represent agents' preferences. Our next step is to show that all continuous payoff functions have a definition in the situation calculus provided that we extend the situation calculus with constants for rational numbers. Before we formally define continuous functions, let us consider two examples to motivate the definition.
Example 1. In a common setting, familiar from Markov decision processes, an agent receives a "reward" at each situation (Sutton & Barto 1998, Ch. 3). Assuming that the value of the reward can be measured as a real number, an ongoing interaction between an agent and its environment gives rise to an infinite sequence of rewards r1, ..., rn, .... A payoff function u may now measure the value of the sequence of rewards by a real number. A common measure is the discounted sum: we use a discount factor δ with 0 < δ < 1, and define the payoff to the agent from an infinite sequence of rewards by u(r1, ..., rn, ...) = Σ_{i=1}^∞ δ^i r_i. For example, suppose that the agent's task is to ensure the truth of a certain fluent, for example that ison(light, s) holds. We may then define the reward fluent function by reward(s) = 1 if ison(light, s) and reward(s) = 0 if ¬ison(light, s), so that the agent receives a reward of 1 if the light is on, and a reward of 0 otherwise. Then for an infinite history h we have an infinite sequence of rewards r1, ..., rn, ... consisting of 0s and 1s. If the agent manages to keep the light on all the time, reward(s) = 1 will be true in all situations, and its total discounted payoff is Σ_{i=1}^∞ δ^i × 1 = δ/(1 − δ).
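As an illustration (ours, not the paper's; the function name is our own), the discounted sum above can be approximated on a finite prefix of the reward sequence, and for the all-ones reward stream the approximation converges to δ/(1 − δ):

```python
# Sketch: approximating the discounted-sum payoff
# u(r1, r2, ...) = sum_{i>=1} delta^i * r_i on a finite reward prefix.
def discounted_sum(rewards, delta):
    """Discounted sum of a finite reward prefix with discount factor delta."""
    return sum(delta ** i * r for i, r in enumerate(rewards, start=1))

# If the light stays on forever (reward 1 in every situation), the payoff
# converges to delta / (1 - delta); a long prefix approximates it closely.
delta = 0.9
approx = discounted_sum([1] * 1000, delta)
exact = delta / (1 - delta)  # about 9 for delta = 0.9
assert abs(approx - exact) < 1e-9
```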
Example 2. Suppose that the agent's job is to ensure that a certain condition never occurs. Imagine that there is an action DropContainer, and the agent's goal is to ensure that it never takes this action. We may represent this task specification with the following payoff function: u(h) = 0 if there is a situation s ⊂ h such that do(DropContainer, s) ⊂ h, and u(h) = 1 otherwise. It is impossible to define this payoff function as a discounted sum of bounded rewards. The special character of the 0-1 payoff function u stems from the fact that if DropContainer is taken at situation s, the payoff function "jumps" from a possible value of 1 to a determinate value of 0, no matter how many times beforehand the agent managed to avoid the action. Intuitively, the payoff to the agent is not built up incrementally from situation to situation. Using standard concepts from topology, we can formalize this intuitive difference with the notion of a continuous payoff function. To that end, we shall introduce a number of standard topological notions. Our use of topology will be self-contained but terse; a text that covers the definitions we use is (Royden 1988).
Let A, B be two topological spaces. A mapping f : A → B is continuous if for every open set Y ⊆ B, its preimage f⁻¹(Y) is an open subset of A. Thus to define the set of continuous payoff functions, we need to introduce a system of open sets on Z, the set of infinite histories, and on R, the set of real numbers. We shall employ the standard topology on R, denoted by ℛ.
The Baire topology is a standard topology for a space that consists of infinite sequences, such as the set of infinite game histories. It is important in analysis, game theory (Moschovakis 1980), other areas of mathematics, and increasingly in computer science (Finkel 2001), (Duparc, Finkel, & Ressayre 2001). Let A be a set of actions. Let [s] = {h : s ⊂ h} be the set of infinite action sequences that extend s. The basic open sets are the sets of the form [s] for each finite sequence (situation) s. An open set is a union of basic open sets, including again the empty set ∅. We denote the resulting topological space by B(A).
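Concretely (our illustration; the function name is our own), membership of a history in a basic open set [s] is a prefix test: every history extending a prefix p lies in [s] exactly when s is a prefix of p.

```python
# Sketch: a basic open set [s] of the Baire space B(A) is the set of
# infinite histories extending the finite situation s.  Certifying that
# every history extending a finite prefix lies in [s] is a prefix check.
def in_basic_open(situation, history_prefix):
    """True iff every infinite history extending history_prefix lies in
    [situation], i.e. situation is a prefix of history_prefix."""
    n = len(situation)
    return history_prefix[:n] == situation

assert in_basic_open(["a"], ["a", "b", "c"])
assert not in_basic_open(["b"], ["a", "b", "c"])
```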
For each rational q and natural number n > 0, define an open interval O(q, n) centered at q by O(q, n) = {z : |z − q| < 1/n}. Let E_u(q, n) = u⁻¹(O(q, n)). Since u is continuous, each set E_u(q, n) is an open set in the Baire space and hence a union of basic open sets. So the function u induces a characteristic relation interval_u(s, q, n) that holds just in case [s] ⊆ E_u(q, n). In other words, interval_u(s, q, n) holds iff for all histories h extending s the utility u(h) is within distance 1/n of the rational q. In the example with the discontinuous payoff function above, interval_u(s, 1, n) does not hold for any situation s for n > 1. For given any situation s at which the disaster has not yet occurred, there is a history h ⊃ s in which the disaster occurs, so that u(h) = 0 and |u(h) − 1| = 1 ≥ 1/n. Intuitively, with a continuous payoff function u an initial situation s determines the value u(h) for any history h extending s up to a certain "confidence interval" that bounds the possible values of histories extending s. As we move further into the history h, with longer initial segments s of h, the "confidence intervals" associated with s become centered around the value u(h). The following theorem exploits the fact that the collection of these intervals associated with situations uniquely determines a payoff function.
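The shrinking "confidence intervals" can be made concrete for the discounted-sum payoff of Example 1 (our illustration; names and the choice δ = 1/2 are our own). With rewards in {0, 1}, a finite situation s bounds u(h) for every history h extending s, and the bound tightens as s grows:

```python
# Sketch: for the discounted-sum payoff with rewards in {0, 1}, a finite
# prefix pins u(h) into an interval; this is the set of values compatible
# with s, which interval_u(s, q, n) compares against the ball O(q, n).
def payoff_bounds(prefix_rewards, delta=0.5):
    """Tightest [lo, hi] containing u(h) for every infinite h extending
    the situation with reward prefix prefix_rewards."""
    lo = sum(delta ** i * r for i, r in enumerate(prefix_rewards, start=1))
    # Largest possible contribution of the tail: reward 1 forever after.
    tail = delta ** len(prefix_rewards) * delta / (1 - delta)
    return lo, lo + tail

lo, hi = payoff_bounds([1, 0, 1])
# The interval width is the undiscounted tail mass sum_{i>=4} (1/2)^i = 1/8,
# so longer prefixes shrink the interval toward the true payoff u(h).
assert hi - lo == 0.5 ** 3
```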
Assume that constants for situations, natural numbers and rationals have been defined, along with a relation symbol interval_u such that interval_u(s, q, n) is true just in case interval_u([s], [q], [n]) holds.

Theorem 0.2. Let A be a set of actions, and let u : B(A) → ℛ be a continuous function. Let payoff be the axiom

∀h, n. ∃s ⊂ h. interval_u(s, u(h), n).

Then:
1. u satisfies payoff, and
2. if u′ : B(A) → ℛ satisfies payoff, then u′ = u.
Proof. For part 1, let a history h and a number n be given. Clearly u(h) ∈ O(u(h), n), so h ∈ E_u(u(h), n) = u⁻¹(O(u(h), n)). Since E_u(u(h), n) is an open set, there is a situation s ⊂ h such that [s] ⊆ E_u(u(h), n). Hence interval_u(s, u(h), n) holds, and s witnesses the claim.

For part 2, suppose that u′ satisfies the axiom payoff in a model, so that for all histories h and numbers n there is a situation s ⊂ h satisfying interval_u(s, u′(h), n). We show that for all histories h and every n, it is the case that |u(h) − u′(h)| < 1/n, which implies that u(h) = u′(h). Let h, n be given, and choose a situation s ⊂ h satisfying interval_u(s, u′(h), n). By the definition of the relation interval_u, it follows that h ∈ E_u(u′(h), n) = u⁻¹(O(u′(h), n)). So u(h) ∈ O(u′(h), n), which by the definition of O(u′(h), n) implies that |u(h) − u′(h)| < 1/n. Since this holds for any history h and number n, the claim that u(h) = u′(h) follows. □

We remark that if a VM game G has only a finite set of histories, any utility function ui is continuous, so Theorem 0.2 guarantees that payoff functions for finite games are definable in the situation calculus.

An Application: Axioms for a Variant of "Clue"

Our results so far provide a general mathematical foundation for the claim that the situation calculus is a strong formalism for representing game-theoretic structures; they show that for a large class of games, the situation calculus can supply axioms characterizing the game structure exactly (categorically). This section outlines the representation of a fairly complex parlour game to illustrate what such axiomatizations are like, and to give a sense of the usefulness of the situation calculus in describing game structures. The discussion indicates how the situation calculus can take advantage of invariances and localities present in an environment to give a compact representation of the environmental dynamics. We show how a successor state axiom can define memory assumptions like the game-theoretic notion of perfect recall in a direct and elegant manner.

In previous work, we have studied a variant of the well-known board game "Clue", which we call MYST. For our present purposes, a brief informal outline of MYST suffices; for more details we refer the reader to previous work (Bart, Delgrande, & Schulte 2001), (Bart 2000). In particular, we will not go into the details here of defining formally the terms of our language for describing MYST. MYST begins with a set of cards c1, ..., cn. These cards are distributed among a number of players 1, 2, ..., k, and a "mystery pile". Each player can see her own cards, but not those of the others, nor those in the mystery pile. The players' goal is to guess the contents of the mystery pile; the first person to make a correct guess wins. The players gain information by taking turns querying each other. The queries are of the form "do you hold one of the cards from C?", where C is a set of cards. If the queried player holds none of the cards in C, he answers "no". Otherwise he shows one of the cards from C to the player posing the query. In what follows, we discuss some aspects of the axiomatization of MYST; a fuller discussion is (Bart, Delgrande, & Schulte 2001), and (Bart 2000) offers a full set of axioms.

Table 2 shows the fluents we discuss below together with their intended meaning.

Fluent        | Meaning
H(i, cx, s)   | player i holds card cx
AskPhase(s)   | a query may be posed
Know(i, φ, s) | player i knows that φ

Table 2: Three fluents used to axiomatize MYST

A central fluent in our axiomatization is the card holding fluent H(i, cx, s), read as "player i holds card cx in situation s". We write H(0, cx, s) to denote that card cx is in the mystery pile in situation s. In MYST, as in Clue, cards do not move around. This is a static, invariant aspect of the game environment that the situation calculus can capture concisely in the following successor state axiom:

H(i, cx, do(a, s)) ≡ H(i, cx, s)
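Because the successor state axiom makes H independent of the action performed, any query about H in a later situation reduces to a lookup in the initial situation. A minimal executable sketch (our encoding, not the paper's):

```python
# Sketch: since H(i, cx, do(a, s)) ≡ H(i, cx, s), card holdings fixed in
# the initial situation persist through any sequence of actions, so
# evaluating H in a later situation "regresses" to the initial one.
def holds(holder, card, actions, initial_holdings):
    """Evaluate H(holder, card, s) where s = do(actions, S0); the action
    list is irrelevant because the fluent is static."""
    return initial_holdings.get(card) == holder

s0 = {"c1": 1, "c2": 0}  # holder 0 is the mystery pile
assert holds(1, "c1", ["asks", "shows"], s0)  # invariant under actions
assert holds(0, "c2", [], s0)
```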
Another fact that remains invariant across situations, and that is crucial to reasoning about MYST, is that every card is held by exactly one player or the mystery pile. We may express this with two separate axioms.

Exclusiveness: H(i, cx, s) → ∀j ≠ i. ¬H(j, cx, s).
If player i holds card cx, then no other player j (or the mystery pile) holds cx. If the mystery pile holds card cx, then cx is not held by any player.

Exhaustiveness: ∨_{i=0}^{k} H(i, cx, s).
Every card is held by at least one player (or the mystery pile).
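Together, Exclusiveness and Exhaustiveness say that the holdings form a total function from cards to holders 0..k. A small check of this reading (our illustration; the data layout and names are our own):

```python
# Sketch: with holdings represented as a map card -> holder (holder 0 is
# the mystery pile, 1..k the players), Exclusiveness is built into the map
# (one holder per card) and Exhaustiveness says the map is total.
def satisfies_axioms(holdings, cards, k):
    """True iff every card is held by exactly one holder in 0..k."""
    return all(c in holdings and 0 <= holdings[c] <= k for c in cards)

cards = ["c1", "c2", "c3"]
assert satisfies_axioms({"c1": 1, "c2": 2, "c3": 0}, cards, k=2)
assert not satisfies_axioms({"c1": 1, "c2": 2}, cards, k=2)  # c3 unheld
```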
It is straightforward to specify preconditions for actions in MYST. For example, consider asking queries, one of the main actions in this game. We write asks(i, q) to denote that player i takes the action of asking query q. The fluent AskPhase(s) expresses that the situation s is in an "asking phase", i.e., that no query has been posed yet. Then the precondition for asking a query is given by

Poss(asks(i, q), s) ≡ AskPhase(s) ∧ player(s) = i

where player(s) indicates whose turn it is in situation s.
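The precondition axiom can be read directly as an executable test (our sketch; the fluent names follow the paper, the situation encoding is our assumption):

```python
# Sketch: Poss(asks(i, q), s) ≡ AskPhase(s) ∧ player(s) = i,
# with a situation encoded as a record of its fluent values.
def poss_ask(i, situation):
    """True iff player i may pose a query in this situation."""
    return situation["ask_phase"] and situation["player"] == i

s = {"ask_phase": True, "player": 1}
assert poss_ask(1, s)       # it is player 1's turn and no query is pending
assert not poss_ask(2, s)   # player 2 may not ask out of turn
```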
We use an epistemic fluent Know(i, φ, s) to denote that player i knows φ in situation s, where φ is a sentence. Players gain information by observing the results of queries. For example, if player 1 shows card 3 to player 2, then player 2 comes to know that player 1 holds card 3. Using an action shows(j, i, cx), by which player j shows card cx to player i, we may express this effect by the axiom Know(2, H(1, c3, s), do(shows(1, 2, c3), s)). In general, we have the following knowledge axiom that is invariant across situations:

Know(i, H(j, cx, s), do(shows(j, i, cx), s))

Indeed, the fact that player 1 holds card 3 becomes common knowledge among the two players after player 1 shows card 3 to player 2. We use a common knowledge fluent indexed by a set of players to express this principle. The same fluent allows us to capture the fact that in MYST, after player 1 shows card 3 to player 2 in response to query q, it becomes common knowledge among all players that player 1 has some card matching the query (for more details, see (Bart, Delgrande, & Schulte 2001, Sec. 3), (Bart 2000)). This illustrates that the situation calculus together with epistemic logic provides resources to capture some quite subtle differential effects of actions on the knowledge and common knowledge of agents.
In our analysis of strategic reasoning in MYST, we assume that agents do not forget facts once they come to know them. This is part of the game-theoretic notion of perfect recall.¹ We may render this assumption about the memory of the players by the following axiom for the knowledge fluent:

Know(i, φ, s) → Know(i, φ, do(a, s))

for a sentence φ.
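The perfect-recall axiom says an agent's stock of known facts can only grow along a history. A minimal sketch of this monotonicity, with knowledge modeled as a set of sentences per agent (our encoding, not the paper's):

```python
# Sketch: Know(i, phi, s) → Know(i, phi, do(a, s)) means the successor
# knowledge set contains everything known before, plus any new observations.
def do_action(knowledge, observations):
    """Successor knowledge set: prior knowledge is never forgotten."""
    return knowledge | observations

k0 = {"H(1,c3)"}
k1 = do_action(k0, {"AskPhase"})
assert k0 <= k1  # nothing known at s is lost at do(a, s)
```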
Conclusion
Von Neumann-Morgenstern game theory is a very general formalism for representing multi-agent interactions that encompasses single-agent decision processes as a special case. Game-theoretic models of many multi-agent interactions from economics and social settings are available, and the general mathematical theory of VM games is one of the major developments of the 20th century. The situation calculus is a natural language for describing VM games; we established a precise sense in which the situation calculus is well-suited to representing VM games: a large class of VM games has a sound and complete (categorical) set of axioms in the situation calculus. This result underscores the expressive power of the situation calculus. The connection between the situation calculus and game theory suggests fruitful applications of game-theoretic ideas to planning and multi-agent interactions. We considered in particular the use of the Baire topology for defining continuous utility functions, and the representation of concurrent actions. Conversely, the logical resources of the situation calculus allow us to exploit invariants in the dynamics of some environments to provide a more compact representation of their dynamics than a game tree would.
We see two main avenues for further research. First, it is important to know what types of VM games permit compact and tractable axiomatizations in the situation calculus. Without restrictions on the possible game forms F, there is no bound on the computational complexity of an axiomatization Axioms(F). There are a number of plausible subclasses of games that might have computationally feasible representations in the situation calculus. An obvious restriction would be to consider finite game trees, which have first-order categorical axiomatizations in the situation calculus. A weaker finitude requirement would be that each agent should have only finitely many information sets. An information set corresponds to what in many AI formalisms is called a "state", because an information set defines an agent's complete knowledge at the point of making a decision. Games in which the agent has only finitely many information sets closely correspond to Markov Decision Processes, in which the possible states of an agent are typically assumed to be finite. Several authors have noted the utility of the situation calculus for defining Markov Decision Processes ((Poole 1998), (Boutilier et al. 2000)), and the research into compact representations of Markov Decision Processes (cf. (Boutilier, Dean, & Hanks 1999)) may apply to compact representations of game trees as well.

¹For a formal definition of perfect recall in game theory, see (Osborne & Rubinstein 1994). The game-theoretic definition requires that an agent should not only remember facts, but also her actions. This can be represented in the epistemic extension ℒe of the situation calculus by requiring that Ki(s, s′) does not hold whenever situations s, s′ differ with respect to an action by agent i.
Another approach would be to begin with a decidable or tractable fragment of the situation calculus, and then examine which classes of game trees can be axiomatized in a given fragment. There are results about computable fragments of the situation calculus, even with the second-order induction axiom, such as that presented in (Ternovskaia 1999). The main restrictions in this result are the following: (1) there are only finitely many fluents F1, ..., Fn; (2) each fluent F is unary, i.e. of the form F(s), like our AskPhase fluent; (3) the truth of fluents in situation s determines the truth of fluents in a successor situation do(a, s) by successor state axioms of a certain restricted form (such as our axiom from the previous section for the card holding fluent; see (Levesque, Pirri, & Reiter 1998) for more on the proof-theoretic and computational leverage from successor state axioms). Our experience with MYST, and more generally the literature on Markov Decision Processes, suggests that assumptions (1) and (3) hold in many sequential decision situations; we plan to explore in future research the extent to which restriction (2) can be relaxed, and how epistemic components such as the relation K(s, s′) affect computational complexity.
Second, the main purpose of providing an agent with a model of its planning problem or interaction with other agents is to help the agent make decisions. Game theorists have introduced various models of rational decision-making in VM games, some of which have been applied by computer scientists (see (Koller & Pfeffer 1997), (Bicchieri, Ephrati, & Antonelli 1996)). It should be possible to axiomatize methods for solving games in the situation calculus, and to formalize the assumptions, such as common knowledge of rationality, that game theorists use to derive predictions for how agents will act in a given game.

The combination of logical techniques, such as the situation calculus, and the advanced decision-theoretic ideas that we find in game theory provides a promising foundation for analyzing planning problems and multi-agent interactions.
Acknowledgments
Previous versions of this paper were presented at JAIST (Japan Advanced Institute of Science and Technology) and the 2001 Game Theory and Epistemic Logic Workshop at the University of Tsukuba. We are indebted to Eugenia Ternovskaia, Hiroakira Ono, Mamoru Kaneko, Jeff Pelletier and the anonymous AAAI referees for helpful comments. This research was supported by the Natural Sciences and Engineering Research Council of Canada.
References
Bart, B.; Delgrande, J.; and Schulte, O. 2001. Knowledge and planning in an action-based multi-agent framework: A case study. In Advances in Artificial Intelligence, Lecture Notes in Artificial Intelligence, 121-130. Berlin: Springer.
Bart, B. 2000. Representations of and strategies for static information, noncooperative games with imperfect information. Master's thesis, Simon Fraser University, School of Computing Science.
Bicchieri, C.; Ephrati, E.; and Antonelli, A. 1996. Games servers play: A procedural approach. In Intelligent Agents II, 127-142. Berlin: Springer-Verlag.
Boolos, G., and Jeffrey, R. 1974. Computability and Logic. Cambridge: Cambridge University Press.
Boutilier, C.; Reiter, R.; Soutchanski, M.; and Thrun, S. 2000. Decision-theoretic, high-level agent programming in the situation calculus. In AAAI-2000, Seventeenth National Conference on Artificial Intelligence. Austin, Texas: AAAI Press.
Boutilier, C.; Dean, T.; and Hanks, S. 1999. Decision-theoretic planning: Structural assumptions and computational leverage. Journal of Artificial Intelligence Research 11:1-94.
Duparc, J.; Finkel, O.; and Ressayre, J.-P. 2001. Computer science and the fine structure of Borel sets. Theoretical Computer Science 257:85-105.
Finkel, O. 2001. Topological properties of omega context-free languages. Theoretical Computer Science 262:669-697.
Koller, D., and Pfeffer, A. 1997. Representations and solutions for game-theoretic problems. Artificial Intelligence 94(1):167-215.
Levesque, H.; Pirri, F.; and Reiter, R. 1998. Foundations for the situation calculus. Linköping Electronic Articles in Computer and Information Science 3(18).
Moschovakis, Y. 1980. Descriptive Set Theory. New York: Elsevier-North Holland.
Osborne, M. J., and Rubinstein, A. 1994. A Course in Game Theory. Cambridge, Mass.: MIT Press.
Parikh, R. 1983. Propositional game logic. In IEEE Symposium on Foundations of Computer Science, 195-200.
Poole, D. 1997. The independent choice logic for modelling multiple agents under uncertainty. Artificial Intelligence 94(1-2).
Poole, D. 1998. Decision theory, the situation calculus, and conditional plans. Linköping Electronic Articles in Computer and Information Science 3(8).
Royden, H. L. 1988. Real Analysis. New York: Macmillan, 3rd edition.
Russell, S. J., and Norvig, P. 1994. Artificial Intelligence: A Modern Approach. Englewood Cliffs, NJ: Prentice Hall.
Sutton, R. S., and Barto, A. G. 1998. Reinforcement Learning: An Introduction. Cambridge, Mass.: MIT Press.
Ternovskaia, E. 1997. Inductive definability and the situation calculus. In Transactions and Change in Logic Databases, volume 1472 of LNCS.
Ternovskaia, E. 1999. Automata theory for reasoning about actions. In Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence, 153-158. Stockholm, Sweden: IJCAI.
van den Herik, H., and Iida, H. 2001. Foreword for special issue on games research. Theoretical Computer Science 252(1-2):1-3.
von Neumann, J., and Morgenstern, O. 1944. Theory of Games and Economic Behavior. Princeton, NJ: Princeton University Press.