Issues in Constructing and Learning Abstract Decision Models

From: AAAI Technical Report SS-94-06. Compilation copyright © 1994, AAAI (www.aaai.org). All rights reserved.

Jonathan King Tash
Group in Logic and the Methodology of Science
University of California
Berkeley, CA 94720
tash@math.berkeley.edu
Introduction
When applying decision theory to problems in planning, the complexity of the domain raises several issues. In particular, planning domains generally require choosing among sequences of actions, whose possible number grows exponentially with length. When each action is described in terms of a mapping from an initial state to a distribution over possible resulting states (as in [Tash 1993]), calculating the consequences of each sequence and maximizing over expected utility of the resulting histories is a huge computational chore. The difficulty of this task demands careful control of computational expenditures, and often simplifying assumptions must be made in order to preserve tractability.
One common simplification is to apply decision theory to a reduced decision model, one whose states are sets of finer states in a more complete model. For example, [Dean et al. 1993] apply decision-theoretic methods to planning on a stochastic automaton, and cluster together all states lying outside of a certain envelope for purposes of policy determination. This paper discusses some of the issues arising in trying to form a coarse, or abstract, decision model so that work done in planning using such a model provides reasonable results for the original problem.
It should be noted that other approaches to controlling the computational expenditures of a planner often involve decisions made on a reduced decision model, so these issues are quite general. For example, when computational effort is controlled using a metalevel architecture, as in [Russell and Wefald 1991], allocation of resources to a computation is not determined on the basis of all the information available to the planner, which is generally adequate to determine the result of the computation itself, but rather on the basis of certain features of the situation which are more easily computed and are used to indicate the expected value of doing the computation. Thus, the metalevel is controlling computations using a decision model which treats various computations sharing certain features as the same. It is a general characteristic of abstract models that the feature set used to condition decisions is reduced from that of the complete model, because distinctions between fine states in the same abstract state are ignored.
Constructing Abstract Decision Models
To ground intuitions, let us consider a problem domain given as a Markov decision process. This consists of a set of states, where for each state and action available in that state there are fixed transition probabilities to other states. Each state also has a certain utility, the value the planning agent gets from being in that state. The goal of the agent is to choose actions so as to maximize the long-run expected value. (Infinite time values can be handled by discounting future rewards by a factor exponential in time.) There are a variety of methods, such as policy iteration, for finding the optimal policy, or action choices, for this problem. Several such methods, and their relation to planning as traditionally conceived, are discussed in [Koenig 1992]. For many planning problems, however, a straightforward application of these methods would be computationally intractable.
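To make the setting concrete, a minimal sketch of policy iteration for such a process follows. The code and its input conventions are illustrative rather than taken from the paper: P is a hypothetical list of per-action transition matrices, u a vector of state utilities, and gamma the discount factor described above.

```python
import numpy as np

def policy_iteration(P, u, gamma=0.9):
    """P[a] is an |S| x |S| transition matrix for action a; u[s] is the
    utility of state s; gamma is the discount factor mentioned above."""
    n_actions, n_states = len(P), len(u)
    policy = np.zeros(n_states, dtype=int)
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) V = u exactly.
        P_pi = np.array([P[policy[s]][s] for s in range(n_states)])
        V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, u)
        # Policy improvement: act greedily with respect to V.
        Q = np.array([u + gamma * P[a] @ V for a in range(n_actions)])
        new_policy = Q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            return policy, V   # optimal policy and its value function
        policy = new_policy
```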
We can create an abstract decision model for this problem by partitioning the state space into abstract states, and conditioning our action choices only on the abstract state we find ourselves in, in effect assigning the same action choice to each fine state in a given abstract state. For every abstract policy, therefore, there is a corresponding policy on the original space, or fine policy. These policies form some subset of the total possible fine policies, and for decisions made using the abstract model to be good for the original problem, this subset must contain policies adequately close in value to the optimal fine policy. This immediately suggests a criterion for good abstract decision models.
Criterion 1: The potential value of an abstract decision model is the maximum value for any abstract policy on this model considered as a policy for the complete decision model.
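A sketch of how Criterion 1 could be evaluated directly, assuming the conventions of the previous sketch (the function names and the abstract_of encoding of the partition are ours): every abstract policy is lifted to a fine policy and evaluated on the complete model.

```python
from itertools import product
import numpy as np

def evaluate(P, u, policy, gamma=0.9):
    """Exact value function of a fixed fine policy on the complete model."""
    n = len(u)
    P_pi = np.array([P[policy[s]][s] for s in range(n)])
    return np.linalg.solve(np.eye(n) - gamma * P_pi, u)

def potential_value(P, u, abstract_of, n_abstract, n_actions, start, gamma=0.9):
    """abstract_of[s] gives the abstract state containing fine state s.
    Enumerates every abstract policy, lifts it to a fine policy, and keeps
    the best value from the start state. The enumeration is exponential in
    the number of abstract states, which is exactly why the criterion is
    impractical to apply directly."""
    best = -np.inf
    for abs_policy in product(range(n_actions), repeat=n_abstract):
        fine_policy = [abs_policy[abstract_of[s]] for s in range(len(u))]
        best = max(best, evaluate(P, u, fine_policy, gamma)[start])
    return best
```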
Clearly, between two abstract decision models, the one with the higher potential value is able to represent better policies. However, applying this criterion to decide upon a good abstract model is likely to involve computational effort comparable to that needed to solve the complete decision model. It is therefore interesting to consider what this criterion says in terms of the relation between fine states within a given abstract state, in order to devise a more locally testable measure of model quality. If the optimal policy for the complete model assigns the same action to each fine state in a given abstract state, then using the abstract state does not prevent representation of that policy. The same action will be assigned to each fine state if the particular state is irrelevant to the consequences of the available actions. This suggests a local test sufficient for guaranteeing lossless abstraction.

Local Test 1: Are action consequences independent of the fine state given the abstract state?
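One illustrative way to mechanize Local Test 1, under the same hypothetical encoding (aggregating consequences over abstract successor states is our interpretation of "action consequences"):

```python
import numpy as np

def passes_local_test_1(P, abstract_of, tol=1e-9):
    """P[a][s, t] is the fine transition probability; abstract_of[s] maps
    fine state s to its abstract state. The test passes when every fine
    state in an abstract state induces the same distribution over abstract
    successor states, for every action."""
    n_states = P[0].shape[0]
    n_abstract = max(abstract_of) + 1
    for a in range(len(P)):
        # Row s of 'agg' is the abstracted consequence distribution of
        # taking action a in fine state s.
        agg = np.zeros((n_states, n_abstract))
        for t in range(n_states):
            agg[:, abstract_of[t]] += P[a][:, t]
        for b in range(n_abstract):
            members = [s for s in range(n_states) if abstract_of[s] == b]
            rows = agg[members]
            if np.abs(rows - rows[0]).max() > tol:
                return False
    return True
```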
Satisfaction of this test is an even stronger demand than choosing an abstraction so as to maintain maximal potential value, but is locally determinable, and may therefore offer approximations of tractable use. For example, a test for the degree of action consequence independence could be used to decide whether to split an abstract state into two substates. Such a test is reminiscent of the G algorithm of [Chapman and Kaelbling 1991], discussed below. No such locally testable criterion can be expected to reproduce the behavior of the potential value criterion, however, because lossless abstraction is obtainable, even without the same behavior across undistinguished fine states for all actions, using abstract states whose fine states merely all require the same action to be chosen. But the action to be chosen in a state is only deducible using global information, because it could affect rewards in the distant future.
Even if one is given a partition of fine states into abstract ones, further difficulties remain in defining the abstract decision model. In particular, we need to assign utilities and transition probabilities to the abstract states. If one could assign to each abstract state a probability distribution over its constituent fine states, one could average their utility and use for distributions over action consequences the average of those for the fine states. However, such a distribution over fine states is not easy to construct, because the appropriate one can change drastically as the system evolves. For example, consider the Markov decision process in Figure 1a. The numbers represent transition probabilities, some of which are conditional on the choice between action "a" and action "b."

[Figure 1: Effect of abstraction on probabilities. (a) The complete model, with a choice between actions "a" and "b"; (b) the abstract model formed by combining the boxed states, with exit probabilities x and 1 - x.]

A smaller, abstract model can be formed by combining the boxed states into a single abstract state, as shown in Figure 1b. In the abstract space, the choice is irrelevant, as either action leads to the abstract state. However, the distribution over fine states given that one is in the abstract state does depend on which action choice is made (the first will occur with either probability 0.8 or 0.2). This ambiguity is reflected in the abstract model by the uncertainty over what probabilities to use for transitions out of the abstract state. In Figure 1b, choice "a" leads to the value (0.8 × 0.9 + 0.2 × 0.1) = 0.74 for "x," whereas "b" leads to (0.2 × 0.9 + 0.8 × 0.1) = 0.26.
Let us now consider what happens when a policy choice is fixed. The complete model is reduced to a (decisionless) Markov model. If this model is ergodic, so every state is reachable from every other state, then the probability of a fine state given its containing abstract state is well defined, and in fact is just its long-term relative frequency in the complete model. Even without ergodicity, if an initial state is fixed as well, the fine state probabilities can be determined, and the abstract model is well defined. (The abstract states not reachable given the chosen policy will of course not have, nor need, well-defined distributions over their constituent fine states.) Therefore, the ambiguities discussed above are always a consequence of a dependence of the dynamics on policy choices which has been hidden in the abstraction.
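Under ergodicity, the well-defined distribution can be computed as sketched below (illustrative code; the eigenvector computation assumes the chain has a unique stationary distribution):

```python
import numpy as np

def fine_given_abstract(P_pi, abstract_of):
    """P_pi is the transition matrix of the fixed-policy Markov chain,
    assumed ergodic as in the text. Returns P(fine state | its containing
    abstract state) as renormalized stationary probabilities."""
    # Stationary distribution: left eigenvector of P_pi with eigenvalue 1.
    vals, vecs = np.linalg.eig(P_pi.T)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
    pi = pi / pi.sum()
    # Renormalize within each abstract state.
    totals = {}
    for s, p in enumerate(pi):
        totals[abstract_of[s]] = totals.get(abstract_of[s], 0.0) + p
    return {s: p / totals[abstract_of[s]] for s, p in enumerate(pi)}
```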
Learning Abstract Decision Models
One might hope to avoid these problems by learning the abstract model from experience, avoiding reference to the complete model altogether. In any case, one often doesn't know the probabilities for the complete model either, making learning necessary, and hopefully easier to carry out in the smaller model. However, this does not avoid the difficulty of assigning transition probabilities, because when one is trying action "a," one will observe probability x = 0.74, whereas when one is trying "b," one will observe x = 0.26. If one learns the full dependence of these probabilities on the various possible policies, one is in effect learning the complete decision model, which is assumed to be computationally intractable.
Let us consider a learning strategy where one chooses a policy, learns the model parameters using this policy, and uses the model so learned to choose a new policy. (Of course, some elements of a model cannot be learned given a certain policy, such as the transition probabilities for actions not taken in the policy. We can leave these at the values inherited from previous learning cycles.) Let us define a policy as locally stable if it is a fixed point under this process (i.e., it is optimal with respect to its model). Such a policy will be optimal for models in some neighborhood of the appropriate one for the policy, giving it some stability with respect to statistical variance in the learned model. Unfortunately, there can exist several policies which are locally stable for a given abstract model.
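The cycle just described can be sketched as a fixed-point iteration (learn_model and optimal_policy are hypothetical placeholders for the two halves of the process):

```python
def find_locally_stable(initial_policy, learn_model, optimal_policy,
                        max_cycles=100):
    """learn_model(policy) fits abstract model parameters from experience
    under the given policy (leaving unobserved parameters at inherited
    values); optimal_policy(model) solves the fitted model."""
    policy = initial_policy
    for _ in range(max_cycles):
        model = learn_model(policy)
        new_policy = optimal_policy(model)
        if new_policy == policy:
            return policy   # fixed point: a locally stable policy
        policy = new_policy
    return None   # cycled without reaching a fixed point
```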
Consider the complete model in Figure 2a. An abstract model can be formed by combining the boxed states into abstract states, as in Figure 2b.

[Figure 2: Effect of abstraction on values. (a) The complete model, with a choice between actions "a" and "b"; (b) the abstract model formed by combining the boxed states, with observed utilities "x" and "y."]
Given choice "a," the utility "x" is observed to be 1, and "y" is observed to be 0. With those utilities, "a" is the optimal choice, as it leads to the more valuable state with higher probability. Therefore, "a" is a locally stable policy. However, "b" is actually a better policy, because its adoption gives x = 0 and y = 1, for a total expected utility of 0.7 (that of "a" is 0.6).
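The trap can be checked numerically. In the sketch below, the transition probabilities are our own reconstruction, chosen only to be consistent with the values 0.6 and 0.7 quoted above, since the figure itself is not fully recoverable:

```python
# P(reach "x", reach "y" | action); 0.6/0.4 and 0.3/0.7 are assumptions
# matching the expected utilities 0.6 and 0.7 discussed in the text.
reach = {"a": (0.6, 0.4), "b": (0.3, 0.7)}
observed = {"a": (1, 0), "b": (0, 1)}   # utilities (x, y) seen under each policy

def eu(action, xy):
    return reach[action][0] * xy[0] + reach[action][1] * xy[1]

# Under the model learned while following "a", "a" looks optimal:
print(eu("a", observed["a"]), eu("b", observed["a"]))   # 0.6 vs 0.3
# But the value actually obtained by adopting "b" is higher:
print(eu("b", observed["b"]))                           # 0.7
```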
Because of these difficulties, no algorithm which learns its model using the current policy and then tries to choose a new policy using its current model can be expected to converge to a globally optimal policy on the abstract model. One can design models where early decisions decide between unseen fine states in ways having no noticeable effect until some arbitrary future point, when they determine the relative merits of another decision. Therefore, no algorithm such as policy iteration, which uses local information to decide on changes improving the current policy, will in general be able to find the policy achieving an abstract model's potential value.

Because exhaustive search through the space of abstract plans for the best one, learning each policy's model, is an excessive computational burden, the planner may still have to resort to using some local learning and policy choice algorithm, such as alternating cycles of policy iteration with model learning. Since such methods are subject to entrapment by locally stable policies, a good model will be one where this possibility is minimized. This suggests another criterion for good abstract models, one reflecting ease of learning (whereas Criterion 1 reflected best-case modeling ability).

Criterion 2: The horizon size of an abstract model is the proportion of its possible abstract policies which lead, via iteration of perfect (given the policy) model learning and optimal (given the model) policy determination, to a policy achieving the model's potential value.

The horizon size characterizes the extent of the basin of attraction around optimal policies, when using local learning algorithms. A model with a larger horizon size will, given a random starting policy, be more likely to converge on its optimal policy. However, as with Criterion 1, determination of horizon size for use in choosing a good abstract model is likely to involve computational effort similar to that needed to find the globally optimal abstract policy in the first place. Therefore, again as with Criterion 1, we shall consider what locally testable conditions are informative as to its degree of satisfaction. A fixed distribution over fine states given their containing abstract state enables determination of all abstract model parameters from the complete model. This suggests a local test whose satisfaction guarantees that no information about earlier policy decisions can be hidden in the fine states, and the model parameters are independent of policy.
Local Test 2: Is the distribution over the fine states of an abstract state independent of the actions chosen at its abstract parents, given their fine state distributions?
Satisfaction of this local test is an even stronger demand than that the horizon size be 1. Again, however, no criterion which is locally testable can be expected to reproduce behavior like that of the horizon size criterion, because model dependencies on action choice (which can be locally found) might not prevent learning the optimal policy, and only global policy comparisons can generally determine which such dependencies create locally stable suboptimal policies.
Both the stated criteria express the intuition that information abstracted away should be minimally relevant to optimal policy choice. Criterion 1 expresses irrelevance to ability to specify an optimal policy, and Criterion 2 expresses irrelevance to ability to find the best specifiable policy. In both cases, local tests are suggested which consider relevance of the ignored information towards ability to set abstract model parameters retaining articulation with the complete model, independent (as required by locality) of what the optimal policy in fact is. One can check locally for relevance of ignored state distinctions to all possible action choices, or to assignation of transition probability and utility values, but one cannot determine locally whether this information will actually impact on the problem of finding an optimal policy.
Related Work and Conclusions
These issues are relevant to any form of decision-theoretic planning which uses abstraction to reduce the computational burden. For example, in the work by [Dean et al. 1993] mentioned earlier, the more challenging planning cases required several restrictions on the abstract models to be imposed by the designer without recourse to decision-theoretic justification. These included considering a reduced set of policies, designer choice of a small set of variables on which to condition action choices, and hand setting of some transition probabilities in the abstract models used. The above arguments are intended to demonstrate that such restrictions, not chosen to be optimal but simply to reduce the scope of the problem of finding optimal solutions, will generally be necessary.
Other work has been done on how to construct an abstract decision model for large problems. For example, a paper by [Chapman and Kaelbling 1991] described a method, called the G algorithm, for collapsing the state space for Q-learning. This algorithm was based on a principle similar to the local tests discussed above. The algorithm started with the state space, parametrized by a sequence of bits, being abstracted into a single state. It then recursively splits states on the value of a bit when the resulting finer states have significantly different Q-values. This is a local test for relevance, and its usefulness is subject to the constraints discussed above. The authors acknowledge that success of the algorithm requires a well-designed bit representation of the states. Such a representation may often be hard to find.
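The flavor of the splitting decision can be suggested with a sketch (ours, not Chapman and Kaelbling's; their algorithm uses a proper statistical test where this sketch merely compares sample means):

```python
def should_split(q_if_bit0, q_if_bit1, threshold=0.1):
    """q_if_bitX: Q-value samples observed in fine states where the
    candidate bit is 0 or 1. Split the abstract state when the two
    halves differ significantly in estimated Q-value."""
    if not q_if_bit0 or not q_if_bit1:
        return False   # no evidence to distinguish the halves
    mean0 = sum(q_if_bit0) / len(q_if_bit0)
    mean1 = sum(q_if_bit1) / len(q_if_bit1)
    return abs(mean0 - mean1) > threshold
```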
Another paper, by [Moore 1991], also discusses a model learning algorithm which starts with a very abstract model and refines as deemed necessary. His algorithm makes use of a Euclidean structure on the state space. This enables him to choose a central fine state in each abstract state as representative, effectively choosing a distribution over fine states putting all weight on the central one in order to set abstract model parameters. He can then choose an optimal abstract policy and use it to project the future with the complete model, refining the abstract space along the expected path until the point where it differs from the projection. Such an algorithm may often refine unnecessarily, and will only refrain from doing so to an unmanageable degree if the complete model behavior is adequately continuous in state description parameters, making the abstract model parameter settings and projections sufficiently accurate to avoid excessive exploration of the space. Some such structure on the state space, possibly inherited from the geometry of the physical environment of the planner, may provide the best hope generally for applying local methods such as those described above to the construction of an adequate abstract decision model.
The issue of choosing model parameters for a given abstract space is very closely related to the small world problem discussed by [Savage 1972] and [Shafer 1986]. Savage's concern was to assign probabilities in the abstract model (defined slightly differently from here) so as to maintain the preferences between abstract policies found in the complete decision model. Such assignments, given his method for choosing utilities, could involve altering even probabilities having unambiguous values in our formulation. As discussed by Shafer, his formulation of the problem does not appear to directly address our needs, where policy preferences are initially unknown. However, it does exhibit the long history of difficulties experienced in trying to choose abstract models adequate to the task of determining policy values.
It is common knowledge that the computational complexity inherent in decision-theoretic methods makes them only an idealized guide to rational decision making. The purpose of this paper has been to expose the particular simplifying assumptions required for their application to the construction of abstractions in planning. The elucidation of idealized criteria for choosing abstractions, and the consideration of the nature of more applicable but less refined criteria, will hopefully improve our understanding of the structure of realistically solvable problems and the extent to which normative methods can be of use in solving them.
Acknowledgments
This work has benefited from discussions with Stuart Russell and other members of the RUGS group at UC Berkeley and the Berkeley Initiative in Soft Computing. It was supported by a fellowship from NASA.
References
Chapman, D. and Kaelbling, L., "Input Generalization in Delayed Reinforcement Learning: An Algorithm and Performance Comparisons," IJCAI, 1991.

Dean, T., Kaelbling, L., Kirman, J., and Nicholson, A., "Deliberation Scheduling for Time-Critical Sequential Decision Making," Uncertainty in Artificial Intelligence, Morgan Kaufmann, San Mateo, CA, 1993.

Koenig, S., Optimal Probabilistic and Decision-Theoretic Planning using Markovian Decision Theory, Report No. UCB/CSD 92/685, Computer Science Division, University of California, Berkeley, CA, 1992.

Moore, A., "Variable Resolution Dynamic Programming: Efficiently Learning Action Maps in Multivariate Real-valued State-spaces," Machine Learning: Proceedings of the Eighth International Workshop, Morgan Kaufmann, 1991.

Russell, S. and Wefald, E., Do The Right Thing, MIT Press, Cambridge, MA, 1991.

Savage, L., The Foundations of Statistics, Dover, New York, NY, 1972.

Shafer, G., "Savage Revisited," Statistical Science, 1:4, 1986; reprinted in Shafer, G. and Pearl, J., eds., Readings in Uncertain Reasoning, Morgan Kaufmann, San Mateo, CA, 1990.

Tash, J., "A Framework for Planning Under Uncertainty," 1993 AAAI Spring Symposium on Foundations of Automatic Planning.