From: AAAI Technical Report SS-94-06. Compilation copyright © 1994, AAAI (www.aaai.org). All rights reserved.

Issues in Constructing and Learning Abstract Decision Models

Jonathan King Tash
Group in Logic and the Methodology of Science
University of California
Berkeley, CA 94720
tash@math.berkeley.edu

Introduction

When applying decision theory to problems in planning, the complexity of the domain raises several issues. In particular, planning domains generally require choosing among sequences of actions, whose possible number grows exponentially with length. When each action is described in terms of a mapping from an initial state to a distribution over possible resulting states (as in [Tash 1993]), calculating the consequences of each sequence and maximizing over the expected utility of the resulting histories is a huge computational chore. The difficulty of this task demands careful control of computational expenditures, and often simplifying assumptions must be made in order to preserve tractability.

One common simplification is to apply decision theory to a reduced decision model, one whose states are sets of finer states in a more complete model. For example, [Dean et al. 1993] apply decision-theoretic methods to planning on a stochastic automaton, and cluster together all states lying outside of a certain envelope for purposes of policy determination. This paper discusses some of the issues arising in trying to form a coarse, or abstract, decision model so that work done in planning using such a model provides reasonable results for the original problem.

It should be noted that other approaches to controlling the computational expenditures of a planner often involve decisions made on a reduced decision model, so these issues are quite general. For example, when computational effort is controlled using a metalevel architecture, as in [Russell and Wefald 1991], allocation of resources to a computation is determined not on the basis of all the information available to the planner, which is generally adequate to determine the result of the computation itself, but rather on the basis of certain features of the situation which are more easily computed and are used to indicate the expected value of doing the computation. Thus, the metalevel controls computations using a decision model which treats various computations sharing certain features as the same. It is a general characteristic of abstract models that the feature set used to condition decisions is reduced from that of the complete model, because distinctions between fine states in the same abstract state are ignored.

Constructing Abstract Decision Models

To ground intuitions, let us consider a problem domain given as a Markov decision process. This consists of a set of states, where for each state and each action available in that state there are fixed transition probabilities to other states. Each state also has a certain utility, the value the planning agent gets from being in that state. The goal of the agent is to choose actions so as to maximize the long-run expected value. (Infinite-time values can be handled by discounting future rewards by a factor exponential in time.) There are a variety of methods, such as policy iteration, for finding the optimal policy, or action choices, for this problem. Several such methods, and their relation to planning as traditionally conceived, are discussed in [Koenig 1992]. For many planning problems, however, a straightforward application of these methods would be computationally intractable.
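To make the setup concrete, the following is a minimal Python sketch of policy iteration on a toy discounted Markov decision process with state-based utilities, as just described. The three states, the two actions, the reward vector, the transition matrices, and the discount factor are all chosen arbitrarily for illustration.

```python
import numpy as np

n_states = 3
actions = ["a", "b"]
gamma = 0.9                       # discount factor for the infinite-horizon value

# R[s]: utility of being in state s (hypothetical numbers)
R = np.array([0.0, 1.0, 0.5])

# P[action][s, s']: transition probabilities (each row sums to 1, hypothetical numbers)
P = {
    "a": np.array([[0.1, 0.8, 0.1],
                   [0.0, 0.9, 0.1],
                   [0.5, 0.0, 0.5]]),
    "b": np.array([[0.6, 0.2, 0.2],
                   [0.3, 0.3, 0.4],
                   [0.0, 0.1, 0.9]]),
}

def policy_iteration(P, R, gamma):
    policy = {s: "a" for s in range(n_states)}            # arbitrary initial policy
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) V = R for the current policy.
        P_pi = np.array([P[policy[s]][s] for s in range(n_states)])
        V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R)
        # Policy improvement: greedy one-step lookahead against V.
        new_policy = {
            s: max(actions, key=lambda a: R[s] + gamma * P[a][s] @ V)
            for s in range(n_states)
        }
        if new_policy == policy:                          # fixed point: optimal policy
            return policy, V
        policy = new_policy

print(policy_iteration(P, R, gamma))
```

Each iteration evaluates the current policy exactly and then improves it greedily; with finitely many states and actions the loop terminates at an optimal policy, but the cost of such exact computation is what motivates the abstractions considered below.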
We can create an abstract decision model for this problem by partitioning the state space into abstract states, and conditioning our action choices only on the abstract state we find ourselves in, in effect assigning the same action choice to each fine state in a given abstract state. For every abstract policy, therefore, there is a corresponding policy on the original space, or fine policy. These policies form some subset of the total possible fine policies, and for decisions made using the abstract model to be good for the original problem, this subset must contain policies adequately close in value to the optimal fine policy. This immediately suggests a criterion for good abstract decision models.

Criterion 1: The potential value of an abstract decision model is the maximum value of any abstract policy on this model, considered as a policy for the complete decision model.

Clearly, between two abstract decision models, the one with the higher potential value is able to represent better policies. However, applying this criterion to decide upon a good abstract model is likely to involve computational effort comparable to that needed to solve the complete decision model. It is therefore interesting to consider what this criterion says about the relation between fine states within a given abstract state, in order to devise a more locally testable measure of model quality. If the optimal policy for the complete model assigns the same action to each fine state in a given abstract state, then using the abstract state does not prevent representation of that policy. The same action will be assigned to each fine state if the particular state is irrelevant to the consequences of the available actions. This suggests a local test sufficient for guaranteeing lossless abstraction.

Local Test 1: Are action consequences independent of the fine state given the abstract state?

Satisfaction of this test is an even stronger demand than choosing an abstraction so as to maintain maximal potential value, but it is locally determinable, and may therefore offer approximations of tractable use. For example, a test for the degree of action-consequence independence could be used to decide whether to split an abstract state into two substates. Such a test is reminiscent of the G algorithm of [Chapman and Kaelbling 1991], discussed below. No such locally testable criterion can be expected to reproduce the behavior of the potential-value criterion, however, because lossless abstraction is obtainable, even without the same behavior across undistinguished fine states for all actions, using abstract states whose fine states merely all require the same action to be chosen. But the action to be chosen in a state is only deducible using global information, because it could affect rewards in the distant future.
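The split test suggested above can be given a concrete, if simplified, form. The sketch below takes a hypothetical fine model and partition and asks, for each abstract state, whether every fine member induces the same distribution over next abstract states under every action; a block that fails is a candidate for splitting. Projecting action consequences onto abstract states is only one possible reading of the test, and the matrices and partition are invented for illustration.

```python
import numpy as np

actions = ["a", "b"]
# Fine transition matrices P[action][s, s'] over 4 fine states (rows sum to 1).
P = {
    "a": np.array([[0.7, 0.3, 0.0, 0.0],
                   [0.7, 0.3, 0.0, 0.0],
                   [0.0, 0.0, 0.5, 0.5],
                   [0.0, 0.0, 0.5, 0.5]]),
    "b": np.array([[0.1, 0.9, 0.0, 0.0],
                   [0.0, 0.1, 0.9, 0.0],   # differs from fine state 0 under "b"
                   [0.0, 0.0, 0.5, 0.5],
                   [0.0, 0.0, 0.5, 0.5]]),
}
block_of = np.array([0, 0, 1, 1])          # partition: fine state -> abstract state
n_blocks = block_of.max() + 1

def needs_split(block, tol=1e-6):
    """True if some action's abstract-level consequences depend on the fine state."""
    members = np.flatnonzero(block_of == block)
    for act in actions:
        # Project each member's next-state distribution onto the abstract states.
        dists = [np.bincount(block_of, weights=P[act][s], minlength=n_blocks)
                 for s in members]
        if any(np.abs(d - dists[0]).max() > tol for d in dists[1:]):
            return True
    return False

print([needs_split(b) for b in range(n_blocks)])   # [True, False]
```

Note that the first block fails the test only because of action "b", so a learner that only ever tried action "a" there would see no local reason to split it.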
Even if one is given a partition of fine states into abstract ones, further difficulties remain in defining the abstract decision model. In particular, we need to assign utilities and transition probabilities to the abstract states. If one could assign to each abstract state a probability distribution over its constituent fine states, one could average their utilities, and use for the distributions over action consequences the average of those for the fine states. However, such a distribution over fine states is not easy to construct, because the appropriate one can change drastically as the system evolves.

For example, consider the Markov decision process in Figure 1a. The numbers represent transition probabilities, some of which are conditional on the choice between action "a" and action "b."

Figure 1. Effect of abstraction on probabilities.

A smaller, abstract model can be formed by combining the boxed states into a single abstract state, as shown in Figure 1b. In the abstract space, the choice is irrelevant, as either action leads to the abstract state. However, the distribution over fine states, given that one is in the abstract state, does depend on which action choice is made (the first will occur with probability either 0.8 or 0.2). This ambiguity is reflected in the abstract model by the uncertainty over what probabilities to use for transitions out of the abstract state. In Figure 1b, choice "a" leads to the value (0.8 × 0.9 + 0.2 × 0.1) = 0.74 for "x," whereas "b" leads to (0.2 × 0.9 + 0.8 × 0.1) = 0.26.

Let us now consider what happens when a policy choice is fixed. The complete model is reduced to a (decisionless) Markov model. If this model is ergodic, so that every state is reachable from every other state, then the probability of a fine state given its containing abstract state is well defined, and in fact is just its long-term relative frequency in the complete model. Even without ergodicity, if an initial state is fixed as well, the fine state probabilities can be determined, and the abstract model is well defined. (The abstract states not reachable given the chosen policy will of course not have, nor need, well-defined distributions over their constituent fine states.) Therefore, the ambiguities discussed above are always a consequence of a dependence of the dynamics on policy choices which has been hidden in the abstraction.

Learning Abstract Decision Models

One might hope to avoid these problems by learning the abstract model from experience, avoiding reference to the complete model altogether. In any case, one often does not know the probabilities for the complete model either, making learning necessary, and hopefully easier to carry out in the smaller model. However, this does not avoid the difficulty of assigning transition probabilities, because when one is trying action "a," one will observe probability x = 0.74, whereas when one is trying "b," one will observe x = 0.26. If one learns the full dependence of these probabilities on the various possible policies, one is in effect learning the complete decision model, which is assumed to be computationally intractable.

Let us consider a learning strategy where one chooses a policy, learns the model parameters using this policy, and uses the model so learned to choose a new policy. (Of course, some elements of a model cannot be learned given a certain policy, such as the transition probabilities for actions not taken in the policy. We can leave these at the values inherited from previous learning cycles.) Let us define a policy as locally stable if it is a fixed point under this process (i.e., it is optimal with respect to its own model). Such a policy will be optimal for models in some neighborhood of the appropriate one for the policy, giving it some stability with respect to statistical variance in the learned model. Unfortunately, there can exist several policies which are locally stable for a given abstract model.

Consider the complete model in Figure 2a. An abstract model can be formed by combining the boxed states into abstract states, as in Figure 2b.

Figure 2. Effect of abstraction on values.

Given choice "a," the utility "x" is observed to be 1, and "y" is observed to be 0. With those utilities, "a" is the optimal choice, as it leads to the more valuable state with higher probability. Therefore, "a" is a locally stable policy. However, "b" is actually a better policy, because its adoption gives x = 0 and y = 1, for a total expected utility of 0.7 (that of "a" is 0.6).
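A small sketch can make the fixed-point behavior explicit. The code below alternates "perfect" model learning under the current policy with optimal choice for the learned model, on a toy rendition of the Figure 2 situation; since the figure is not reproduced here, the transition numbers 0.6 and 0.3 are assumptions chosen only so that the resulting expected utilities match the 0.6 and 0.7 quoted above.

```python
# Learn-then-optimize cycle on a toy version of the Figure 2 situation.
# Assumed probability that each action leads to abstract state X (else Y).
P_to_X = {"a": 0.6, "b": 0.3}

# Abstract-state utilities as they would be *observed* while following a policy;
# they differ because the hidden fine state depends on the policy being run.
observed_utility = {
    "a": {"X": 1.0, "Y": 0.0},   # running "a": X looks valuable, Y worthless
    "b": {"X": 0.0, "Y": 1.0},   # running "b": the observed values are reversed
}

def best_action_given_model(utility):
    """Optimal choice for the learned abstract model."""
    def expected(action):
        return P_to_X[action] * utility["X"] + (1 - P_to_X[action]) * utility["Y"]
    return max(["a", "b"], key=expected), {act: expected(act) for act in ["a", "b"]}

policy = "a"
for step in range(5):
    learned = observed_utility[policy]   # "perfect" learning under the current policy
    policy, values = best_action_given_model(learned)
    print(step, policy, values)
# The loop settles on "a" (expected value 0.6 under its own model), even though
# actually running "b" would yield 0.7: "a" is locally stable but suboptimal.
```

Started from "b" instead, the same loop settles on "b": both policies are locally stable under their own learned models, and only a global comparison reveals that "b" is the better one.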
Because of these difficulties, no algorithm which learns its model using the current policy and then tries to choose a new policy using its current model can be expected to converge to a globally optimal policy on the abstract model. One can design models where early decisions decide between unseen fine states in ways having no noticeable effect until some arbitrary future point, when they determine the relative merits of another decision. Therefore, no algorithm such as policy iteration, which uses local information to decide on changes improving the current policy, will in general be able to find the policy achieving an abstract model's potential value.

Because exhaustive search through the space of abstract plans for the best one, learning each policy's model, is an excessive computational burden, the planner may still have to resort to using some local learning and policy choice algorithm, such as alternating cycles of policy iteration with model learning. Since such methods are subject to entrapment by locally stable policies, a good model will be one where this possibility is minimized. This suggests another criterion for good abstract models, one reflecting ease of learning (whereas Criterion 1 reflected best-case modeling ability).

Criterion 2: The horizon size of an abstract model is the proportion of its possible abstract policies which lead, via iteration of perfect (given the policy) model learning and optimal (given the model) policy determination, to a policy achieving the model's potential value.

The horizon size characterizes the extent of the basin of attraction around optimal policies when using local learning algorithms. A model with a larger horizon size will, given a random starting policy, be more likely to converge on its optimal policy. However, as with Criterion 1, determination of horizon size for use in choosing a good abstract model is likely to involve computational effort similar to that needed to find the globally optimal abstract policy in the first place. Therefore, again as with Criterion 1, we shall consider what locally testable conditions are informative as to its degree of satisfaction. A fixed distribution over fine states, given their containing abstract state, enables determination of all abstract model parameters from the complete model. This suggests a local test whose satisfaction guarantees that no information about earlier policy decisions can be hidden in the fine states, and that the model parameters are independent of policy.

Local Test 2: Is the distribution over the fine states of an abstract state independent of the actions chosen at its abstract parents, given their fine state distributions?

Satisfaction of this local test is an even stronger demand than that the horizon size be 1. Again, however, no criterion which is locally testable can be expected to reproduce behavior like that of the horizon-size criterion, because model dependencies on action choice (which can be found locally) might not prevent learning the optimal policy, and only global policy comparisons can generally determine which such dependencies create locally stable suboptimal policies.
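As with Local Test 1, one simplified reading of this test can be made concrete. The sketch below fixes a distribution over a parent abstract state's fine members and asks whether the resulting entry distribution over a child abstract state's fine members depends on the action taken at the parent; the transition numbers are hypothetical and merely echo the 0.8/0.2 asymmetry of Figure 1.

```python
import numpy as np

# Fixed distribution over the parent abstract state's two fine members (assumed).
parent_dist = np.array([0.5, 0.5])

# P_entry[action][parent_fine, child_fine]: probabilities of landing in each of the
# child abstract state's two fine members (hypothetical numbers).
P_entry = {
    "a": np.array([[0.8, 0.2],
                   [0.8, 0.2]]),
    "b": np.array([[0.2, 0.8],
                   [0.2, 0.8]]),
}

def entry_distribution(action):
    """Distribution over the child's fine states on entry, given the parent's action."""
    unnormalized = parent_dist @ P_entry[action]
    return unnormalized / unnormalized.sum()

dists = {act: entry_distribution(act) for act in ["a", "b"]}
print(dists)
print("Local Test 2 satisfied:", np.allclose(dists["a"], dists["b"]))  # False here
```

Because the test fails, the distribution hidden inside the child abstract state carries information about the earlier choice, which is exactly the dependence that makes the abstract parameters policy-dependent.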
Both the stated criteria express the intuition that information abstracted away should be minimally relevant to optimal policy choice. Criterion 1 expresses irrelevance to the ability to specify an optimal policy, and Criterion 2 expresses irrelevance to the ability to find the best specifiable policy. In both cases, local tests are suggested which consider the relevance of the ignored information to the ability to set abstract model parameters retaining articulation with the complete model, independent (as required by locality) of what the optimal policy in fact is. One can check locally for the relevance of ignored state distinctions to all possible action choices, or to the assignment of transition probability and utility values, but one cannot determine locally whether this information will actually impact the problem of finding an optimal policy.

Related Work and Conclusions

These issues are relevant to any form of decision-theoretic planning which uses abstraction to reduce the computational burden. For example, in the work by [Dean et al. 1993] mentioned earlier, the more challenging planning cases required several restrictions on the abstract models to be imposed by the designer without recourse to decision-theoretic justification. These included considering a reduced set of policies, designer choice of a small set of variables on which to condition action choices, and hand setting of some transition probabilities in the abstract models used. The above arguments are intended to demonstrate that such restrictions, not chosen to be optimal but simply to reduce the scope of the problem of finding optimal solutions, will generally be necessary.

Other work has been done on how to construct an abstract decision model for large problems. For example, a paper by [Chapman and Kaelbling 1991] described a method, called the G algorithm, for collapsing the state space for Q-learning. This algorithm is based on a principle similar to the local tests discussed above. The algorithm starts with the state space, parametrized by a sequence of bits, abstracted into a single state. It then recursively splits states on the value of a bit when the resulting finer states have significantly different Q-values. This is a local test for relevance, and its usefulness is subject to the constraints discussed above. The authors acknowledge that success of the algorithm requires a well-designed bit representation of the states. Such a representation may often be hard to find.

Another paper, by [Moore 1991], also discusses a model-learning algorithm which starts with a very abstract model and refines it as deemed necessary. His algorithm makes use of a Euclidean structure on the state space. This enables him to choose a central fine state in each abstract state as representative, effectively choosing a distribution over fine states that puts all weight on the central one in order to set abstract model parameters. He can then choose an optimal abstract policy and use it to project the future with the complete model, refining the abstract space along the expected path up to the point where it differs from the projection. Such an algorithm may often refine unnecessarily, and will refrain from doing so to an unmanageable degree only if the complete model's behavior is adequately continuous in the state description parameters, making the abstract model parameter settings and projections sufficiently accurate to avoid excessive exploration of the space. Some such structure on the state space, possibly inherited from the geometry of the physical environment of the planner, may provide the best hope generally for applying local methods such as those described above to the construction of an adequate abstract decision model.

The issue of choosing model parameters for a given abstract space is very closely related to the small worlds problem discussed by [Savage 1972] and [Shafer 1986].
Savage's concern was to assign probabilities in the abstract model (defined slightly differently from here) so as to maintain the preferences between abstract policies found in the complete decision model. Such assignments, given his method for choosing utilities, could involve altering even probabilities having unambiguous values in our formulation. As discussed by Shafer, his formulation of the problem does not appear to address our needs directly, where policy preferences are initially unknown. However, it does exhibit the long history of difficulties experienced in trying to choose abstract models adequate to the task of determining policy values.

It is common knowledge that the computational complexity inherent in decision-theoretic methods makes them only an idealized guide to rational decision making. The purpose of this paper has been to expose the particular simplifying assumptions required for their application to the construction of abstractions in planning. The elucidation of idealized criteria for choosing abstractions, and the consideration of the nature of more applicable but less refined criteria, will hopefully improve our understanding of the structure of realistically solvable problems and the extent to which normative methods can be of use in solving them.

Acknowledgments

This work has benefited from discussions with Stuart Russell and other members of the RUGS group at UC Berkeley and the Berkeley Initiative in Soft Computing. It was supported by a fellowship from NASA.

References

Chapman, D. and Kaelbling, L., "Input Generalization in Delayed Reinforcement Learning: An Algorithm and Performance Comparisons," IJCAI, 1991.

Dean, T., Kaelbling, L., Kirman, J., and Nicholson, A., "Deliberation Scheduling for Time-Critical Sequential Decision Making," Uncertainty in Artificial Intelligence, Morgan Kaufmann, San Mateo, CA, 1993.

Koenig, S., Optimal Probabilistic and Decision-Theoretic Planning using Markovian Decision Theory, Report No. UCB/CSD 92/685, Computer Science Division, University of California, Berkeley, CA, 1992.

Moore, A., "Variable Resolution Dynamic Programming: Efficiently Learning Action Maps in Multivariate Real-valued State-spaces," Machine Learning: Proceedings of the Eighth International Workshop, Morgan Kaufmann, 1991.

Russell, S. and Wefald, E., Do the Right Thing, MIT Press, Cambridge, MA, 1991.

Savage, L., The Foundations of Statistics, Dover, New York, NY, 1972.

Shafer, G., "Savage Revisited," Statistical Science, 1:4, 1986. Reprinted in Shafer, G. and Pearl, J., eds., Readings in Uncertain Reasoning, Morgan Kaufmann, San Mateo, CA, 1990.

Tash, J., "A Framework for Planning Under Uncertainty," 1993 AAAI Spring Symposium on Foundations of Automatic Planning.