From: AAAI Technical Report SS-95-07. Compilation copyright © 1995, AAAI (www.aaai.org). All rights reserved.

Relating Formalizations of Actions

Tom Costello*
Dept. of Computer Science, Stanford University, Stanford, CA 94305
costello@sail.stanford.edu

Abstract

Reasoning about action is central to much of intelligent behavior. Much of the reasoning that is needed consists not of reasoning about long or complicated sets of actions, but of thinking of a problem at various different levels of abstraction. Problems that in one representation seem enormous can be reduced to a simple exercise in a more suitable representation. A transformation that is especially useful is one from continuous to discrete change. The transformations necessary to move from a complex continuous system to a simple discrete system involve introducing new properties that change with time (fluent names), no longer modeling some other fluent names, and considering new composite actions.

1 Introduction

1.1 Aims of this Paper

The goal of this paper is to show how complex reasoning about continuous concurrent systems in complex space can sometimes be replaced by much simpler reasoning. We shall show how to state relations between various formalizations, and show that we can precisely say when one formalization captures a part of another, more complicated formalization. Our basic tool will be versions of the situation calculus, developed by McCarthy and Hayes in [McCH69].

In an extended version of this paper we consider the domain of tying and untying (tame¹) knots. This involves arbitrary curves² in 3 space. The changes possible are explained in terms of the notion of a subsection of string moving along a path, while other pieces of string possibly move concurrently. We relate this highly complex domain to a much simpler domain, where we have only a finite number of possible actions at any point, where we only consider serial actions, and where all the effects of actions are immediate.

In this paper we show a general method of relating two formalizations so that we can say they both derive the same sentences, modulo translation, in a certain class. In the case of our example we will consider the sentences stating that two knots can be made equal by tying/untying.

1.2 Outline of Paper

We first sketch the essential properties of a formalization of action that we need in order to relate it to another formalism. In the following section we describe how we formalize the relation between two formalizations, a translation, and a class of sentences, and we discuss what relationships pairs of theories stand in. Finally we comment on the general applicability of this method. We relate our formalisms to other formalisms in the field.

* This work was partly supported by ARPA (ONR) grant N00014-94-1-0775 and was partly done while the author was visiting the University of Toronto. The author thanks Ray Reiter, Anna Patterson, and John McCarthy for useful comments.
¹ A tame knot is a knot that has only a finite number of loops that intersect.
² We consider just piecewise polynomials for expository reasons.

2 Requirements

2.1 Discrete Theories

In this paper we consider discrete theories of action that have a notion of situations sit that are identified with sequences of actions evts, and properties or fluent names flt that hold at these situations. A model of a discrete theory is therefore an assignment of fluent names to the sequences of actions, including the empty sequence, which we call the initial situation or $s_0$.

We will also briefly mention axiomatizations of discrete theories. These will consist of sentences of three types:
• That a fluent holds at a situation.
• That an action causes a fluent to hold in the next situation if certain fluents hold.
• That a domain constraint is always satisfied.

Axiomatizations of discrete action such as Kartha and Lifschitz's AR, an extension of Gelfond and Lifschitz's A [GeLi91], Lin and Shoham's "Provably Correct Theories of Action" [LiSh91], Reiter's monotonic version in [Rei91], or the author's SSC [Cos95b], are representative of these. We use $\models_{disc}$ to represent the relation of consequence in these theories.

Domain constraints, or ramifications, are rules that govern how fluents may change. Thus the initial situation must obey the domain constraints, and every action must preserve them. This can be expressed by saying that the effects of an action are closed under logical consequence, or the rule K of modal logic. This solution to the ramification problem was proposed by the author in [Cos91]. The original solution was proposed by Baker in [Bak91], but is more complicated.

The discrete theory we will consider as our motivating example is an axiomatization of manipulating knots. In this we represent a knot as a sequence of crossings. Each crossing is either under or over. We have three types of actions, the Reidemeister moves [Reid32]. We can make or destroy a simple loop, cross two pieces that are next to each other, or move a piece over a crossing if it is above/below both pieces in the crossing. We do not give an axiomatization here for reasons of space. The axiomatization is straightforward enough, save for the use of an exists fluent name function, following McCarthy in [McC76], to represent what crossings exist. Thus we create and destroy objects (these crossings) as we go along.

2.2 Continuous Theories

Continuous theories have as their "backbone" not sequences of actions, but rather an association of events with a timeline, i.e. a timeline of the events that occurred up to a moment in time identifies a situation in a given timeline.
This can alternately be seen as a sequence of events, with an associated function that returns the time between each event. We assume that situations are dense along timelines. Properties that hold at these situations are again called fluents. In general a property will follow a curve, trajectory, or fluxion between events. At events the curve or graph that the property follows over time may change. It should be noted that events may be stated externally to the system, or may be caused by internal confluences of processes. A sink overflowing is perhaps the canonical example of an event that is caused by a confluence of the effects of past events.

We can either state explicitly what timelines or histories we wish to consider, or we can consider all possible histories. We note that we now say that a fluent holds at a situation in a given history. When the history is obvious we will sometimes revert to the usual notation.

Axiomatizations of continuous theories will then need statements of four types:
• That a fluent holds at a situation.
• That an event causes a fluent to begin following a fluxion if certain fluents hold.
• That a domain constraint is always satisfied.
• That an event occurs in a history.

Shanahan's circumscriptive theories of events [Shan94], Reiter and Levesque's Golog, and the author's system F [Cos95b] are examples of continuous systems with the above properties. We use $\models_{cont}$ to describe the consequences of a theory of continuous change.

Our example is the manipulating of knots. We represent these as piecewise polynomials. We allow only a finite number of pieces, thus we exclude wild knots. We use abstract syntax to describe piecewise polynomials, and we represent our string by these. We allow arbitrary deformations along piecewise polynomials. If a deformation breaks the string, or passes two pieces through each other, we trigger a fail event.
We do not give the axiomatization, again for reasons of space; there are no major difficulties in generating it once a theory of piecewise polynomials is in place.

3 Relating Theories

We have sketched two theories that might on the surface seem very different. One is clearly much stronger and more expressive than the other. What we now show is that for a certain class of sentences, the sentences that state whether one knot can be manipulated so that it is equal to another, the two systems are equivalent. We furthermore generalize the notion of relating theories to any two theories, a translation, and a set of sentences. We then consider its applicability to reasoning about events, and show reasons why we should be hopeful that complicated domains may be reduced to simpler ones.

3.1 Removing Fluents

We first consider a relation between two theories, where one theory has a subset of the other's fluent names. We ask when it is possible to keep the same results, yet remove certain fluent names. In general we will find that we can remove all fluent names save those that are true exactly when a particular event occurs. We will show that every other statement about fluent names is provably equivalent to a statement about these fluent names.

We first need to define what it means for a set of fluent names to be sufficient to represent a theory. A set will be deemed sufficient if all sentences involving fluent names outside that set are equivalent to sentences involving only fluent names inside that set, or facts about the initial situation.

Definition: 1 A set of fluent names F is sufficient to represent a theory T if every sentence $\phi$ in an axiomatization of T is provably equivalent to a sentence $\psi$, where $\psi$ does not contain any fluent name f outside F that is not in an expression of the form $holds(f, s_0)$.
Example: 1 If we have a set of fluent name constants F, and we have functions $\land$, $\lor$, $\lnot$ that map fluent names (or pairs of fluent names) to their conjunction, disjunction, and negation, then the fluent name constants are sufficient to represent a theory involving the closure of F under these functions.

$holds(f_1 \land f_2, s) \equiv holds(f_1, s) \land holds(f_2, s)$

Given the above sentence, we can show that every sentence in terms of $f_1 \land f_2$ can be written as a sentence in terms of $f_1$ and $f_2$, without using the function $\land$ on fluent names.

We note that we allow statements about the initial situation to use other fluent names, as long as they are not needed elsewhere. This is analogous to using initial conditions in solving a differential equation.

Example: 2 If we wish to describe the change in position of a body that moves with constant velocity in a straight line, we can either describe its position as a function of time, or else describe its velocity as a function of time and give the body's initial position. It is often simpler to describe systems by using their derivatives rather than the values themselves. If vel is the velocity of the object, loc(x) its location, and time returns the time associated with a situation, then its position can be stated as:

$holds(loc(x), s) \equiv time(s) - time(s_0) = 10 * vel$

Or more simply as

$holds(loc(0), s_0) \land holds(velocity(vel), \ldots)$

where velocity is the velocity of the particle, and we apply inertia to it. Here we have given an example where velocity is a frame for a theory, in the presence of an initial condition, the value of loc at $s_0$.

We now define what it means to change a theory on a set of fluent names F to a different sufficient set F'.

Definition: 2 The theory N(F', T) is the set of sentences $\phi$, not involving fluent names not in F', that are in the theory T. The other sentences, those involving facts about the initial state, we call init(F', T).
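The elimination described in Example 1 can be sketched in code. What follows is a minimal illustration and not part of the paper's formalism: the class names `Const`, `And`, `Or`, `Not` and the function `expand` are hypothetical, and sentences are rendered as plain strings using `&`, `|`, `~` for the connectives. It shows that any holds-sentence about a compound fluent name can be rewritten so that only fluent name constants remain.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Const:
    """A fluent name constant from the sufficient set F (hypothetical encoding)."""
    name: str


@dataclass(frozen=True)
class And:
    left: "Fluent"
    right: "Fluent"


@dataclass(frozen=True)
class Or:
    left: "Fluent"
    right: "Fluent"


@dataclass(frozen=True)
class Not:
    arg: "Fluent"


Fluent = Union[Const, And, Or, Not]


def expand(f: Fluent, s: str) -> str:
    """Rewrite holds(f, s) into a sentence that mentions only constants."""
    if isinstance(f, Const):
        return f"holds({f.name}, {s})"
    if isinstance(f, And):
        return f"({expand(f.left, s)} & {expand(f.right, s)})"
    if isinstance(f, Or):
        return f"({expand(f.left, s)} | {expand(f.right, s)})"
    if isinstance(f, Not):
        return f"~{expand(f.arg, s)}"
    raise TypeError(f"not a fluent name: {f!r}")


# holds(f1 AND (NOT f2), s) becomes a sentence about f1 and f2 only.
print(expand(And(Const("f1"), Not(Const("f2"))), "s"))
# prints: (holds(f1, s) & ~holds(f2, s))
```

The recursion mirrors the equivalence for conjunction given in Example 1; the cases for disjunction and negation are assumed to follow the same pattern.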
Theorem: 1 If T is a domain theory, and F' a sufficient set of fluent names, then

$T \models_{cont} \phi \equiv \exists\psi.\, N(F', T) \models_{cont} \psi$ and $init(F', T) \land \psi \models \phi$

We omit the proofs of this and the following theorems for space reasons. The proofs present no special difficulties, and proceed by induction on each formula in the theory.

Theorem: 2 If T is a domain theory, and F' a sufficient set of fluent names, then

$T \models_{cont} \phi \equiv \exists\psi.\, N(F', T) \models_{cont} \psi$ and $init(F', T) \land \psi \models \phi$

We note that the theorems above show that for continuous and discrete theories, we can change the fluent names that we work with. We note there are large computational advantages in being able to work with a much smaller set of fluent names. More importantly for our purposes, we can change one set of fluent names into another, yet still describe the same underlying domain.

3.2 Frames

We now define a fluent frame. A frame of fluents for a set of fluent names F is a set of fluent names, and a set of sentences, that has the property that every fluent name in the original set is definable from the fluent names in the frame, and that this is derivable from the set of sentences.

Definition: 3 If F is a set of fluent names, and F', S a frame for that set, then for every $f \in F$ there is a sentence $\forall s.\, holds(f, s) \equiv \phi$ such that $S \models \forall s.\, holds(f, s) \equiv \phi$ and no fluent names but the fluent names in F' occur in $\phi$.

Thus a frame may introduce an entirely new set of fluent names, and relate them to the old set.

Example: 3 If we want to describe continuous circular motion we can do so by writing sentences describing how the x and y co-ordinates of a particle change with time.

$holds(x \simeq \sin(d), s) \equiv time(s) - time(s_0) = d$
$holds(y \simeq \cos(d), s) \equiv time(s) - time(s_0) = d$

Alternately we can use a circular co-ordinate scheme, where we have r as the distance from the origin and $\theta$ as the angle.
If we denote the first derivative of $\theta$ with respect to time as $\theta'$, then

$holds(x \simeq a, s) \equiv holds(r \simeq b, s) \land holds(\theta \simeq c, s) \land a = b * \cos\theta$

We also note that to describe continuous circular motion we need only give the initial value of $\theta$, and the values of r and $\theta'$. These do not change over time. Thus r and $\theta'$, with the initial value for $\theta$, are a frame for this example.

We now show that we can reason with a translation of our old theory into these new fluents, and get the correct results. First we must define what a translation of a theory looks like.

Definition: 4 The translation of a theory T into a new frame F', S is the set of sentences that result from replacing the fluent names in the sentences of T by fluent names in F', subject to the following rules. We say that if $\forall s.\, holds(f, s) \equiv \phi$ then f is to be translated by $\phi$. For a given sentence $\chi$ involving f, we write the translation of that sentence to remove f as $Tr(\chi(f), \phi)$. We explain how this is done by structural induction on $\phi$:

1. $Tr(\chi(f), holds(f', s)) = \chi(f')$
2. $Tr(\chi(f), x = y) = (x = y)$
3. $Tr(\chi(f), \phi \land \psi) = Tr(\chi(f), \phi) \land Tr(\chi(f), \psi)$
4. $Tr(\chi(f), \lnot\phi) = \lnot Tr(\chi(f), \phi)$
5. $Tr(\chi(f), \forall x.\, \phi(x)) = \forall x.\, Tr(\chi(f), \phi(x))$

This gives us a method of removing one fluent name from a sentence. Repeating this process will thus remove all fluent names not in F'. We now need to show that neither the order in which we remove fluent names by the above process, nor the statement by which we remove them, matters, and that reasoning in the new system gives the same results.

Theorem: 3 If F', S is a frame for T, then

$T \models_{cont} \phi \equiv \exists\psi.\, Tr(T, S) \models_{cont} \psi$ and $S \land \psi \models \phi$

The proof goes by induction on the sentences in the theory.

Theorem: 4 If F', S is a frame for T, then

$T \models_{disc} \phi \equiv \exists\psi.\, Tr(T, S) \models_{disc} \psi$ and $S \land \psi \models \phi$

The proof again goes by induction on the sentences in the theory.

We finally introduce the notion of modeling a domain with a coarser set of fluents.
These are a set of fluents which do not have the property above, that all other fluents can be reproduced from them and possibly the initial values of fluents. Rather these fluents are coarser, in that though we cannot find expressions that exactly match, we can find expressions that are strictly weaker/stronger in the effects/preconditions of actions. In this way we can find a new domain where a state will be reachable in the original domain if it is reachable in the new domain, but not necessarily vice versa. If we can reach a particular situation just using actions that obey these stronger preconditions, then we can reach it with the original actions. We notice that the analogous operation of weakening the effects has the same result. Finally we notice that the opposite of this, weakening preconditions and/or strengthening effects, is useful if we wish to show the impossibility of a situation resulting. This is commonly done when using induction; that is, we prove something stronger than we actually need, as it is simpler to prove.

Thus we define the notion of a weak frame as follows:

Definition: 5 If F is a set of fluents, and F', S a weak frame for that set, then F' is a new set of fluent names, and S consists of sentences of the form $\forall s.\, holds(f, s) \equiv \phi$ where f is a fluent name in F' and $\phi$ contains only fluent names from F.

This is considerably weaker than the previous definitions, but we now see that there are occasions when we can use this notion. The idea here is that although the weak frame does not capture all of the theory, it captures all of the theory that it is able to describe.

Definition: 6 The theory EX(T, F', S) is the theory that uses only the fluent names in F', such that when we consider the models of it union the sentences S, the parts of those models that describe objects in the original theory T are substructures of some model of T, and furthermore are all such substructures.
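As a minimal sketch of Definition 5, one can read a weak frame as a map from concrete states to coarser ones. The representation below is an assumption made for illustration only: a state is a set of ground facts `("on", block, place)`, and the names `TABLE_LOCATIONS` and `abstract` are hypothetical. The fluents on and ontable and the locations loc1, loc2, loc3 are those of the blocks-world example in this section.

```python
TABLE_LOCATIONS = ("loc1", "loc2", "loc3")


def abstract(state):
    """Map a concrete state to the coarser state that uses only ontable(x).

    A concrete state is a set of ("on", block, place) facts; ontable(x)
    holds exactly when on(x, loc) holds for one of the table locations.
    """
    return frozenset(
        ("ontable", block)
        for (_, block, place) in state
        if place in TABLE_LOCATIONS
    )


s1 = {("on", "A", "loc1"), ("on", "B", "A")}
s2 = {("on", "A", "loc3"), ("on", "B", "A")}

# Two different concrete states collapse to the same coarse state:
# the weak frame forgets which table location block A occupies.
print(abstract(s1) == abstract(s2))
# prints: True
```

The map is many-to-one, which is one way to see why the coarse fluents cannot reproduce the original ones, while everything they can express is still determined by the concrete state.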
Example: 4 Assume that in the blocks world we have three locations on the table, loc1, loc2, and loc3. We introduce a new fluent name function ontable, defining it as:

$holds(ontable(x), s) \equiv holds(on(x, loc1), s) \lor holds(on(x, loc2), s) \lor holds(on(x, loc3), s)$

If we have a sentence $holds(on(A, loc1), s)$, we translate this to $holds(ontable(A), s)$. Rather than it being necessary for a location to be clear to move a block, it is now the case that there must be two or fewer blocks on the table. The model that just uses ontable(x) rather than the individual locations is a weaker model. The fluent name function ontable(x), with the above sentence, is a weak frame for this blocks-world theory.

In particular cases this function is effectively calculable given a particular theory. This is the case in the system F. This translation can be done by a recursive procedure that replaces the original fluent names by the new names.

Theorem: 5 If F', S is a weak frame for T, and $\phi$ contains only fluents in F' and fluent name variables ranging over F', then

$T \models_{cont} \phi \equiv \exists\psi.\, EX(T, F', S) \models_{cont} \psi$ and $S \land \psi \models \phi$

3.3 Composite Actions

We have shown how various manipulations on fluents can be used to simplify reasoning; we now look at what we can do with actions. Changing actions is much more subtle than changing fluents, for when we change actions we can decide to make our theory stronger or weaker. We say one theory is weaker than another if it proves fewer sentences. If we remove actions from a theory, it becomes weaker in that we can "reach" fewer situations. Thus the notions of weaker and stronger can be relativized to a set of sentences, so that a theory is said to be weaker than another theory with respect to a set of sentences if it proves fewer sentences in that set. It is often useful to consider stronger theories in this respect. Sometimes it is easier to prove that a certain situation is not reachable by considering stronger actions than you actually have.
If you can prove this from the stronger actions, then it clearly is unreachable by the weaker actions. Thus the sets of sentences that we will consider will be of the form that particular situations are reachable/non-reachable.

A composite action can be thought of as a disjunction of actions, where all we know is that one action in the set occurred. This however is not general enough; we need to be able to parameterize the set by the state of the world. Thus we have a set A(s), the actions that the composite action is made up of at a given situation. A composite action then has as its result the common results of all actions from the set A(s). This will give us extra flexibility. However, we will see that this is not enough: we will want to be able to consider composite actions as sets, again parameterized by a situation, not of other actions, but of other sequences of actions, or partial timelines. We find that this generality will allow us to state the relationship between our two formalizations of knots. We first define some notions of composite action.

Definition: 7 A (set of) composite action(s) is defined as a (set of) function(s) from situations to sets of actions. We can consider a situation to be characterized by the fluents holding at it in this case. Thus we associate a function $a : 2^{flt} \to 2^{evts}$ with each composite action. The effects of a composite action are exactly the fluents that hold after every action in the composite action. The preconditions for a given effect of a composite action are exactly the preconditions that satisfy all of the actions it contains.

Example: 5 If we imagine that we are in a room, we can describe our position by what quarter of the room we are in or, alternately, where we are exactly in terms of a finer co-ordinate system that measures in feet from the center. Assume we have the actions of moving a certain distance in a certain direction.
Then we can make the composite action, moving to the next quarter, by assigning to each situation those actions that would move from that situation to one where we were in the next quarter. We note that the set of actions that we associate with each composite action varies from situation to situation. If we were in the middle, then perhaps all moves of greater than 10 feet would be in the composite action of moving to the next quarter. If we were near the edge then perhaps all moves of one foot or more might count.

We now define the translation of a domain theory by a set of composite actions.

Definition: 8 The translation of a domain theory T by a set of composite actions A, TR(T, A), is defined as the theory where what holds after a sequence of composite actions, or partial timeline, is the disjunction of all possible sequences, or partial timelines, of choices of actions out of the sets associated with the composite actions. This can be generated effectively from a theory in SSC.

By reachable(f, h) we mean there is a sequence of actions that results in f holding at a situation in a given history h. Defining this is closely related to an induction principle over situations. In the continuous case it is not purely induction, as we can have infinite sequences of later and later situations that do not exhaust all of time. It is necessary to consider whether the elapsed time between all the situations converges in the limit.

We now give a theorem relating composite actions of the first type to the situations that are reachable.

Theorem: 6 If T is a domain theory, A a set of composite actions, and S the translation of T by A, then

$T \models_{cont} reachable(f, h) \equiv S \models_{cont} reachable(f, h)$
$T \models_{cont} \lnot reachable(f, h) \equiv S \models_{cont} \lnot reachable(f, h)$

Here we state the theorem for continuous models; an analogous theorem holds for the discrete case.

We now define the second notion of composite actions, namely composite actions of the second type.
Definition: 9 A composite action of the second type is a function from partial histories to events, where a partial history is defined by the fluents that hold at the beginning of that time interval, what fluxions they are following, and a set of event-duration pairs that occur from then on in that history, until a duration d has passed. We represent the first situation by a fluent. We represent this as

$A_2 : 2^{flt \times flux} \times dur \times 2^{evts \times dur} \to evts$

Example: 6 In the last example we thought of being in a room, and having a single parameterized move action. We now imagine that the size of the areas we can be in can change over time (perhaps they are marked by a spotlight). Now a certain action might send us into a region in one situation. The same action in a situation that is exactly equivalent might not send us into a lit region in another case, because the lit region might move in the second case. We see that the events that might send us into a new region need to be parameterized by the entire partial history, not just the previous events. We need to take into account that the success of an event might depend on other factors, not just on the situation it is attempted in.

We now need to specify a translation of a theory T by a composite action of the second type. We notice that this translation will be from a continuous formalization to a discrete one, e.g. from a theory in F to the simple situation calculus.

Definition: 10 The translation of a domain theory T by a set of composite actions of the second type A, TR2(T, A), is defined as follows. It is a theory in a discrete language, whose events are the events that are mapped to by $A_2$, and whose fluents are the same as those of the theory T. A fluent holds after a sequence of actions in a model of this theory iff there is a model of T such that that fluent holds at a situation in T, and the partial histories that lead to that situation are mapped to the events that lead to the situation in the discrete model.
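Returning to composite actions of the first type (Definition 7), the room example can be sketched as follows. This is an illustrative simplification and not the paper's axiomatization: the room is taken to be one-dimensional, positions are whole feet measured from one wall rather than from the center, and all names (`ROOM`, `QUARTER`, `MOVES`, `next_quarter`, `effects`) are hypothetical. The sketch shows a set A(s) that varies with the situation, and effects computed as the fluents common to every action in that set.

```python
ROOM = 40      # length of the room in feet (assumed for illustration)
QUARTER = 10   # each quarter of the room is 10 feet long
MOVES = range(1, ROOM + 1)  # primitive actions: move d feet away from the wall


def quarter(pos):
    """Index of the quarter containing position pos."""
    return pos // QUARTER


def fluents(pos):
    """Fluents holding at a position: the exact position and the quarter."""
    return {("pos", pos), ("quarter", quarter(pos))}


def next_quarter(pos):
    """Composite action: the set A(s) of moves leading into the next quarter."""
    return {
        d for d in MOVES
        if pos + d < ROOM and quarter(pos + d) == quarter(pos) + 1
    }


def effects(composite, pos):
    """Fluents that hold after every primitive action in composite(pos)."""
    results = [fluents(pos + d) for d in composite(pos)]
    # With no applicable primitive action we return no effects; this is a
    # convention of the sketch, not something fixed by Definition 7.
    return set.intersection(*results) if results else set()


print(sorted(next_quarter(9)))    # near the edge of a quarter: moves 1..10
print(effects(next_quarter, 5))
# prints: {('quarter', 1)}
```

Only the quarter fluent survives the intersection, since the exact position differs from one primitive move to another; this matches the reading of a composite action as a disjunction whose result is what all its members agree on.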
We now extend the last theorem to composite actions of the second type. Let reachable(f) mean, in the continuous case, that there is a possible history h where reachable(f, h), and in the discrete case, that holds(f, s) for some s.

Theorem: 7 If T is a domain theory, A a set of composite actions of the second type, and S the translation of T by A, then

$T \models_{cont} reachable(f) \equiv S \models_{disc} reachable(f)$
$T \models_{cont} \lnot reachable(f) \equiv S \models_{disc} \lnot reachable(f)$

We note that this theorem does not have an analogous discrete theorem, as in the discrete case there is no difference between a partial history and the last situation. Nothing happens between actions in the discrete case.

4 Knots

This allows us to state the relation between the two formalizations of knots. This theorem is thus the generalization of the theorem in topology that states that if two knots have the same topology, then the two representations of the knots can be made identical by a finite sequence of Reidemeister moves.

A knot is usually specified by a projection. If we represent our knot as a curve in three space, removing any one of the dimensions gives a projection. We would like to get a projection that has only a small number of multiple points. A multiple point of a projection is a point in the plane that is the image of more than one point in the knot. We want to get a projection with just double points, and where every double point represents a true crossing, not a tangent. A theorem of topology [Crow63] tells us that "every knot is equivalent under an arbitrarily small rotation to a knot whose projection has only true regular points". Thus every knot can be represented as a projection in two space. Once we have a projection in two space, we can further reduce this to a sequence of crossings by choosing any point on the curve and following it around, numbering the crossings encountered, and then telling which number goes under at each point.
Moving from a representation as a curve in 3 space to this representation is an example of what we called a weak frame. While it throws away much information, it retains the topology, and thus whether the knot is equivalent to another. Thus the set of fluents that tell what the knot's crossings are, and whether they go over or under, is a weak frame for the full theory, when paired with the sentences that relate this to the full theory.

When we consider the three types of Reidemeister moves, we notice that these are examples of composite actions of the second type. They depend on the translation of the fluents given above. This is because they map each partial history of the knot to the set of events that would transform it into a knot that would project to a different set of crossings. Thus the classic theorem of topology is equivalent to the statement that the Reidemeister moves are a set of composite actions for the translation of a curve in three space with arbitrary deformations, by the weak frame given by the above translation.

5 Usefulness of Relations

We have exhibited that there are many different relations that can be made between theories. In particular we have examined five.
• Firstly we have shown that all the fluents that are in the theory are "not necessary". They can be eliminated in favor of a much smaller set, the fluents that correspond to actions.
• Secondly we showed how a formalization can be related to a formalization using an entirely different set of fluents. This allows the changing of language from one formalization to another.
• Thirdly we have shown that we can reason in a set of fluents that are insufficient to represent the complexities of the theory, yet are sufficient to represent the parts that we care about.
• Fourthly we saw how a formalization can be related to another formalization by changing the actions, or events.
• Lastly we saw a generalization of this that allowed partial timelines to be mapped to events in another formalization.
We saw how this allowed reasoning in concurrent continuous domains, as it related them to domains where reasoning is well understood.

These relations between theories allow reasoning to be simplified by recasting it in a simpler domain. This work has been mainly meta-theoretic; these relations need to be expressed in an object-level language to be fully exploited by a reasoning machine.

References

[Bak91] Baker, A. B. (1991) 'Nonmonotonic reasoning in the framework of the situation calculus' in Artificial Intelligence J. 49, 5-23
[Cos91] Costello, T. (1991) 'Causal Inheritance', Second International Symposium on Commonsense Reasoning
[Cos95b] Costello, T. (1995) 'The Method of Fluxions', to appear
[Crow63] Crowell, R. H., Fox, R. H. (1963) Introduction to Knot Theory, Boston
[Kar94] Kartha, G. N., Lifschitz, V. (1994) 'Actions with Indirect Effects', Proceedings of the Fourth International Conference on Knowledge Representation and Reasoning, Morgan Kaufmann
[GeLi91] Gelfond, M., Lifschitz, V. (1992) 'Representing Actions in Extended Logic Programming' in Proc. Joint Int'l Conf. and Symp. on Logic Programming, ed. Krzysztof, pp. 559-573
[LiSh91] Lin, F., Shoham, Y. (1991) 'Provably Correct Theories of Action (preliminary report)' in Proc. AAAI-91, pp. 349-354
[McC76] McCarthy, J. (1976) 'First Order Theories of Concepts and Propositions' in Reasoning about Common Sense, papers by John McCarthy
[McCH69] McCarthy, J., Hayes, P. (1969) 'Some Philosophical Problems from the Standpoint of AI', Meltzer B. and Michie D. (eds) Machine Intelligence 4, Edinburgh University Press
[Reid32] Reidemeister, K. (1932) 'Knotentheorie' in Ergebnisse der Mathematik, vol. 1 no. 1
[Rei91] Reiter, R. (1991) 'The frame problem in the situation calculus: A simple solution (sometimes) and a completeness result for goal regression' in Artificial Intelligence and the Mathematical Theory of Computation, ed. Lifschitz V.
[Shan94] Shanahan, M. (1994) 'A Circumscriptive Calculus of Events' in Artificial Intelligence J., to appear