From: AAAI Technical Report WS-93-06. Compilation copyright © 1993, AAAI (www.aaai.org). All rights reserved.
PAGODA: An Integrated Architecture for Autonomous Intelligent Agents
Marie desJardins
SRI International
333 Ravenswood Ave.
Menlo Park, CA 94025
marie@erg.sri.com
Abstract

PAGODA (Probabilistic Autonomous GOal-Directed Agent) is an autonomous intelligent agent that explores a novel environment, building a model of the world and using the model to plan its actions [desJardins, 1992b]. PAGODA incorporates solutions to the problems of selecting learning tasks, choosing a learning bias, classifying observations, and performing inductive learning of world models under uncertainty, in an integrated system for planning and learning in complex domains.

This paper raises some key issues in building autonomous embedded agents and shows how PAGODA addresses these issues. The probabilistic learning mechanism that PAGODA uses to build its world model is described in more detail.
1 Introduction
PAGODA is an autonomous intelligent agent that explores a novel, possibly nondeterministic, environment, building a probabilistic model of the world and using the model to plan its actions to maximize utility. The guiding principles behind PAGODA include probabilistic representation of knowledge, Bayesian evaluation techniques, and limited rationality as a normative behavioral goal. The key properties of PAGODA are:

• The agent operates autonomously, with minimal intervention from humans, and does not require a teacher to present or classify learning instances, or to provide a representation for learned theories.

• The agent handles uncertainty due to inaccurate sensors, randomness in the environment, and sensory limitations. The learned theories express observed uncertainty explicitly.
Section 2 motivates the problem by defining embedded limited rational agents and autonomous learning. Section 3 describes the architecture and the components of PAGODA; the probabilistic learning component is presented in more detail in Section 4. Conclusions are given in Section 5.
2 The Problem of Autonomous Learning
In this section, we define the concepts of embedded limited rational agents and autonomous learning and discuss four specific problems that arise from these definitions. An embedded agent consists of three components: a transducer, a learning module, and a planner. Embedded agents must interact with their environments in real time, and are continuously being affected by and manipulating the environment.

The sensory inputs received from the transducer, which may be incomplete or inconsistent, are referred to as the agent's perceived world. A perceived world may correspond to many actual world states. The agent's actions allow it to move about in limited ways, usually with limited accuracy.

The learning module uses background knowledge and its sensory inputs to build a model of the world for the planner to use. In our model, the agent initially knows the set of actions it can execute, but not what effect those actions have.

A rational agent chooses actions that maximize its expected utility. A limited rational agent takes into account the cost of time, balancing time spent deliberating with time spent performing external actions. An autonomous agent operates independently of human intervention. Specifically, it does not require inputs (except for its initial state) to tell it what its goals are, how to behave, or what to learn.
Deciding What to Learn: In a complex environment, the true world model will be too complicated for an agent with limited resources to learn completely. Therefore, the agent will have to focus attention on learning portions of this true world model. A rational agent should allocate its resources to maximize its ultimate goal achievement, by focusing its learning attention on whatever aspects of the world are expected to be most useful to learn.
Selecting Representations for Learning: An autonomous agent must decide what bias to use for each learning task. Bias, as defined by Mitchell [1980], is "...any basis for choosing one generalization over another, other than strict consistency with the observed training instances." Rather than requiring the designer to specify biases for each potential learning task, the agent must use background or learned knowledge to select biases as learning tasks arise.

[Figure 1: Schematic view of PAGODA. Components shown: the environment, the transducer, the planner (probabilistic search and goal generation), the learner (hypothesis generation and hypothesis evaluation), and the knowledge base; the labeled flows include goals, theories, new theories, and biases.]
Learning in Uncertain Domains: Many real-world environments contain uncertainty, which can arise from randomness, noise in the agent's sensors, sensory limitations, and/or complexity. Traditionally, learning has been defined as the problem of finding a theory that is consistent with all observed instances. However, when uncertainty is present, there may be no consistent theories under a reasonable learning bias. In this case, the agent must either settle for an inconsistent world model or represent uncertainty explicitly in its model.
Planning Under Uncertainty: When an agent's learned world model contains uncertainty, the agent needs a planning mechanism that can maximize goal satisfaction in the face of this uncertainty. Most AI planning techniques require deterministic models of the world, and are therefore inapplicable in this case. Fortunately, decision theory provides a paradigm for behaving optimally under uncertainty: a rational agent should choose whatever action maximizes its expected future average utility per unit time.
3 PAGODA: An Autonomous Agent Model

PAGODA (Probabilistic Autonomous GOal-Directed Agent) is a model for an intelligent agent that was motivated by the issues and problems discussed in the previous section. PAGODA is a limited semi-rational embedded agent that exhibits autonomous learning. We say "semi-rational" because PAGODA does not exhibit optimal resource-bounded behavior. However, the model does explicitly consider issues of limited rationality, providing important contributions towards building an optimal agent.
3.1 Architecture
PAGODA builds an explicit, predictive world model that its planner can use to construct action plans. The learning process is guided by the system's behavioral goal (to maximize utility), and background knowledge is used to select learning biases automatically. Figure 1 shows a schematic view of PAGODA. The behavior cycle of the agent is as follows:
1. PAGODA's initial knowledge goal is to predict its utility: that is, it will first learn theories to predict the utility of performing various actions in specified world states. The agent's utility is provided as one of its sensory inputs.

2. Probabilistic background knowledge is used to assign a value to potential biases for each knowledge goal (Section 3.3). The bias with the highest value is sent to the hypothesis generator.

3. Sensory observations are sent to the agent by the transducer. Probabilities of old theories are updated to reflect the new evidence provided by the observations, and new theories are generated and evaluated (Section 4).

4. The planner analyzes the preconditions of the theories to determine which features of the environment will be most useful (with respect to maximizing utility) to learn (Section 3.2). These most useful preconditions are sent to the learner as new knowledge goals.

5. The planner initiates a forward search through the space of possible outcomes of actions, based on the probabilistic predictions made by the current best theories. The action which maximizes expected utility is taken (Section 3.4).

6. The action chosen by the planner is sent to the transducer, which executes the action in the real or simulated environment.

7. The sequence is repeated.
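To make the cycle concrete, here is a minimal sketch of the loop in Python. It is purely illustrative: the component objects (transducer, learner, planner) mirror Figure 1, but every method name and signature below is an assumption, not PAGODA's actual interface.

    # Illustrative sketch of PAGODA's behavior cycle; all interfaces are assumed.
    def run_agent(transducer, learner, planner, steps=100):
        knowledge_goals = ["utility"]                 # step 1: initial knowledge goal
        for _ in range(steps):
            for goal in knowledge_goals:              # step 2: evaluate biases (PBE)
                learner.set_bias(goal, learner.select_bias(goal))

            observation = transducer.sense()          # step 3: new evidence arrives
            learner.update_theories(observation)      #         update/generate theories

            knowledge_goals = planner.useful_preconditions(   # step 4: new knowledge
                learner.best_theories())                      #         goals (GDL)

            action = planner.choose_action(           # step 5: forward search over
                observation, learner.best_theories()) #         probabilistic outcomes
            transducer.execute(action)                # step 6: act in the environment
            # step 7: repeat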
3.2 Goal-Directed Learning
Initially, PAGODA
has a trivial "theory" for features
in its sensory inputs, in the sense that it can determine
their values by examiningits sensory inputs. Its learning
effort is directed towards being able to predict feature
values resulting from a proposed sequence of actions, by
forming a model of the world that provides a mapping
from perceived world and actions to feature values.
PAGODA
incorporates a novel approach called GoalDirected Learning (GDL),which allows the agent to decide what features of the world are most worth learning
about [desJardins, 1992a]. The agent uses decision theory to computethe expected utility of being able to predict various features of the world, and formulates knowledge 9oals to predict the features with highest utility.
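As a rough illustration of the kind of computation involved, a candidate feature can be scored by the expected utility gain from being able to predict it. The sketch below is an assumption about the general form of such a score, not the GDL algorithm of [desJardins, 1992a]; relevance, utility_if_predicted, and utility_if_unknown are hypothetical inputs the agent would estimate from its theories.

    def rank_knowledge_goals(features, relevance, utility_if_predicted, utility_if_unknown):
        """Order candidate features by the expected utility gain of learning to
        predict them (illustrative sketch of goal-directed learning)."""
        gain = {f: relevance[f] * (utility_if_predicted[f] - utility_if_unknown[f])
                for f in features}
        # Knowledge goals are formulated for the highest-value features first.
        return sorted(features, key=gain.get, reverse=True)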
3.3 Evaluating Learning Biases
PAGODA uses probabilistic background knowledge to evaluate potential biases for each knowledge goal by computing how well each bias is expected to perform during future learning. Probabilistic Bias Evaluation (PBE) chooses a set of features that is as relevant as possible, without being so large that the complexity of the learning task is excessive [desJardins, 1991]. Each potential bias (set of input features) is assigned a value using a decision-theoretic computation which combines the expected accuracy of predictions over time with a time-preference (or discounting) function that expresses the agent's willingness to trade long-term for short-term performance. The computed value represents the expected discounted accuracy of predictions made by theories formed using the given bias. The bias with the highest value is used for learning.
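The following sketch shows one plausible form of this value computation, assuming an accuracy curve accuracy(t) (the predicted accuracy of theories learned under the bias after t observations) and a discount function; the exact functional form used by PBE in [desJardins, 1991] may differ.

    def bias_value(accuracy, discount, horizon):
        """Expected discounted accuracy of a bias over a finite horizon (sketch)."""
        return sum(discount(t) * accuracy(t) for t in range(horizon))

    # Example: a small bias learns fast but plateaus low; a richer bias learns
    # slowly but ends higher.  A steep discount favors the small bias.
    small = bias_value(lambda t: min(0.70, 0.10 * t), lambda t: 0.9 ** t, 50)
    rich  = bias_value(lambda t: min(0.90, 0.03 * t), lambda t: 0.9 ** t, 50)
    chosen = "small-bias" if small > rich else "rich-bias"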
3.4 Probabilistic Planning

PAGODA uses the principle of maximizing expected utility to choose its behaviors: it forward chains through the probability space of predictions, and selects the action that maximizes its expected utility. However, it currently only plans external actions and always searches to a fixed search depth, determined by the designer. The planner occasionally chooses a random action instead of selecting the best apparent action, in order to ensure that exploration continues and the agent does not get stuck on a local maximum, but it does not explicitly reason about the value of taking such sub-optimal actions.
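A minimal sketch of this kind of planner is given below: a fixed-depth forward search over predicted outcome distributions, with an occasional random action for exploration. The interfaces are assumptions: predict(state, action) stands in for the theory-based prediction (returning a distribution over next states) and utility(state) for the sensed utility.

    import random

    def choose_action(state, actions, predict, utility, depth=2, explore_prob=0.1):
        """Fixed-depth forward search over probabilistic predictions (sketch).
        Occasionally picks a random action so exploration continues."""
        if random.random() < explore_prob:
            return random.choice(actions)

        def expected_utility(s, a, d):
            # Utility is evaluated at the leaves of the search tree.
            total = 0.0
            for next_s, p in predict(s, a).items():
                if d == 1:
                    total += p * utility(next_s)
                else:
                    total += p * max(expected_utility(next_s, a2, d - 1) for a2 in actions)
            return total

        return max(actions, key=lambda a: expected_utility(state, a, depth))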
4 Probabilistic Learning
PAGODA's theories are represented as sets of conditional probabilities. In order to evaluate them, the system trades off the predictive accuracy of a proposed theory with its simplicity using Bayesian theory evaluation. The prior probability of a theory is a function of its simplicity; the likelihood of the evidence corresponds to the theory's accuracy with respect to the evidence. These are combined using Bayes' rule to yield an overall probability.

To distinguish between various levels of abstraction of theories, we use the terms class, structure, conditional probabilities, and specific theories. Each prior probability distribution is over theory classes; one or more theories are grouped into each class. The structure of a theory within a class refers to the conditioning contexts of the rules in the theory. When values for the conditional probabilities are added to these rules, the theory is called a specific theory.

Simplicity is considered to be a measure applied to the class that a theory belongs in, rather than the theory itself. Various classification schemes may be used to measure simplicity; some possibilities are briefly presented in Section 4.2.2. The likelihood of the data (i.e., predictive accuracy), on the other hand, is a function of the specific theory and the conditional probabilities it contains.
4.1 Probabilistic Representation and Inference

A theory consists of a set of conditional probability distributions (called "rules"); each of these specifies the observed distribution of values of the predicted ("output") feature, given the conditioning context. Conditioning contexts consist of a perceived world and possibly an action taken by the agent. PCI (Probability Computation using Independence) is a probabilistic inference mechanism that combines the conditional probabilities in a theory to make predictions about the output feature, given a proposed action and a perceived world [desJardins, 1993]. PCI determines which rules (conditional distributions) within a theory are relevant for making a prediction: these are the most specific conditioning contexts that apply to the perceived world. If there are multiple relevant rules, they are combined (using minimal independence assumptions) to get a single predicted distribution.
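The sketch below illustrates the flavor of such an inference step, representing each rule as a conditioning context plus a distribution over output values. The combination rule shown (multiply the matching most-specific distributions and renormalize) is a simplification assumed here for illustration; the actual minimal-independence combination is described in [desJardins, 1993].

    def predict_output(theory, perceived_world, action):
        """Sketch of a PCI-style prediction step.  Each rule in `theory` is a dict
        with a 'conditions' context and a 'distribution' over the output feature.
        Assumes the theory contains a default rule with empty conditions, so at
        least one rule always applies, and that relevant rules share value sets."""
        context = dict(perceived_world, action=action)
        applicable = [r for r in theory
                      if all(context.get(k) == v for k, v in r["conditions"].items())]
        most_specific = max(len(r["conditions"]) for r in applicable)
        relevant = [r for r in applicable if len(r["conditions"]) == most_specific]

        values = list(relevant[0]["distribution"])
        combined = {v: 1.0 for v in values}
        for rule in relevant:                       # combine under independence
            for v in values:
                combined[v] *= rule["distribution"].get(v, 0.0)
        total = sum(combined.values()) or 1.0
        return {v: p / total for v, p in combined.items()}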
4.2 Bayesian Learning

In this section, we develop a Bayesian method for evaluating the relative likelihood of alternative theories. As we will show, only two terms need to be considered: the prior probability of the theory, P(T), and the accuracy of the theory, given by the likelihood of the evidence, P(E | T). The former quantity is defined as a function of the simplicity of the class containing the theory; the latter is computed by applying PCI to the specific theory. The formula used to evaluate theories is derived in Section 4.2.1, and the prior probability distributions used by PAGODA are discussed in Section 4.2.2. Finally, likelihood is discussed in Section 4.2.3.
4.2.1 Bayesian Theory Evaluation

Recall that the theories being evaluated consist of conditional probabilities, which are determined empirically. These probabilities in a theory are distinct from the probability of a theory. The goal in this section is to find the probability of a theory.

The theory with the highest probability should be the one with the most effective structure for representing the observed data. Given this structure, the conditional probabilities within the theory are straightforward to optimize. Complex structures (those with many dependencies) cost the agent in terms of space, computation time, and risk of overfitting. On the other hand, simple structures with only a few dependencies may not capture important relationships in the world.

The probability we wish to find, then, is the probability that the structure of this theory is the best representation of the behavior of the environment. It is not the probability that the particular values of the conditional (statistical) probabilities in the theory are correct, or even that they are close.¹ The statistical probabilities are estimated using observed frequencies; this maximizes the accuracy of the theory as given by the Bayesian likelihood P(E | T ∧ K).
Using the notation

    T   a proposed theory
    K   background knowledge
    E   evidence: a sequence of observations e_1, e_2, ..., e_n,

Bayes' rule gives

    P(T | K ∧ E) = P(T | K) P(E | T ∧ K) / P(E | K)                  (1)

Given that the same evidence and background knowledge is used to evaluate competing theories, the constant factor P(E | K) in the denominator can be dropped,² yielding

    P(T | K ∧ E) ∝ P(T | K) P(E | T ∧ K)                             (2)
We also assume that the individual observations e_1, ..., e_n composing E are independent, given K and T. This standard conditional independence assumption is reasonable, because the theories generated by the agent make independent predictions. Therefore, T embodies an assumption that the observations are independent, which must be true if T holds. Given this independence assumption, Equation 2 becomes:

    P(T | K ∧ E) ∝ P(T | K) ∏(i=1 to n) P(e_i | T ∧ K)               (3)

The first quantity on the right-hand side represents the "informed prior"--i.e., the probability of the theory given the background knowledge K, but no direct evidence. The second quantity represents the likelihood of the theory, i.e., the combined probabilities of each piece of evidence given the theory and K.
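In practice Equation 3 is most conveniently applied in log form. The sketch below shows that computation under the assumption that per-observation likelihoods come from PCI; predict_prob and log_prior are placeholders for those components, not PAGODA's actual functions.

    import math

    def log_posterior(theory, evidence, log_prior, predict_prob):
        """Unnormalized log form of Equation 3: log P(T|K) plus the sum of log
        likelihoods of the (conditionally independent) observations.  Assumes
        predict_prob(theory, e) > 0 for every observation e."""
        return log_prior(theory) + sum(math.log(predict_prob(theory, e))
                                       for e in evidence)

    # The current best hypothesis is the candidate theory with the highest score:
    # best = max(candidates, key=lambda T: log_posterior(T, E, log_prior, pci_prob))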
4.2.2 Prior Probability

The prior probability of a theory, P(T), is the probability of T before any evidence has been collected. A prior, however, is never completely uninformed: even before any direct observations about a particular learning task are made, an agent's past experience, available sensors, and internal representation will affect its disposition to believe a theory, and hence its prior probability distribution. All of the background knowledge available to an agent should ideally be reflected in its "prior."

Footnotes:
¹ This is not to say that the statistical probabilities aren't close, simply that we do not claim to measure their accuracy explicitly with this evaluation technique.
² Note that if we drop these normalizing factors, we no longer have a true probability distribution. However, for notational convenience, the resulting measure will still be called P.

The background knowledge K of Equation 3 consists of probabilistic background knowledge (provided by the designer) and theories about other output features, which are irrelevant to the probability of this theory. The probabilistic background knowledge is used by PAGODA to select the initial bias, i.e., the language in which its theories are represented [desJardins, 1991]. The relevant information in the probabilistic background knowledge is therefore implicit in the initial bias B, so that

    P(T | K) = P(T | B)

The search mechanism in PAGODA only explores the space defined by the initial bias. However, B does not provide a preference function over the space. We can drop the conditioning context B, since all theories are implicitly conditioned by the same bias.
A variety of justifications have been proposed for the use of simplicity as one test of the value of a theory. The most common reference to simplicity is Occam's razor, which tells us to select the simplest of the consistent theories. But this has two problems: first, it does not tell us what to do when we have a complex theory with high accuracy on the learning set and a simpler, but slightly less accurate, theory. (Or, rather, it does tell us what to do--we are to reject all inconsistent theories out of hand, which doesn't seem reasonable.) Second, it does not provide a definition of simplicity.
The approach used by PAGODA is to evaluate theories by first determining the prior probability of the theory class, and then computing the likelihood of specific theories within the class. The prior probability is therefore a distribution over classes, rather than over theories. The prior probability of a theory, P(T), refers to the probability that the true theory T' is in the same class as T, i.e.,

    P(T) = P(Class(T) = Class(T'))
We have used PAGODA as a testbed to experiment with several different metrics of simplicity. They differ in the level of classification (theories, rules, features, or terms) and in the method of finding the probability of the class (the two methods used are the uniform distribution and a Huffman encoding scheme).

In the uniform distribution on theories, all theories are placed in the same class, so equal prior probability is assigned to every theory. This prior leads to a preference for theories that exactly fit the data, if any exist. Under the rule-level classification of theories, all theories with the same number of rules are in the same class and have equal probability. This leads to a weak bias towards simpler theories--theories with fewer rules are preferred, but the complexity of the rules within the theory has no effect. In the feature-level classification of theories, a stronger bias towards simplicity, all theories with the same total number of features in the conditioning contexts of the rules are in the same class.
Huffman encoding of terms is the strongest prior. The classification level is terms ("words" within feature descriptors), but a uniform distribution is not assumed. Rather, the frequency of terms within the theory is used to compute an optimal (Huffman) encoding for the theory, and the length of the encoded theory gives the negative logarithm of its probability.
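For illustration only, the sketch below approximates such an encoding-length prior with the Shannon bound on code length (which a Huffman code approaches), rather than constructing the Huffman code itself; that substitution, and the representation of a theory as a flat list of terms, are assumptions and not the paper's exact scheme.

    import math
    from collections import Counter

    def encoding_length_bits(theory_terms):
        """Approximate optimal code length of a theory given as a list of terms,
        using the Shannon bound -sum(count * log2(count / n))."""
        counts = Counter(theory_terms)
        n = sum(counts.values())
        return -sum(c * math.log2(c / n) for c in counts.values())

    def log2_prior(theory_terms):
        # The length of the encoded theory is the negative log of its prior.
        return -encoding_length_bits(theory_terms)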
Using PAGODA's Bayesian theory evaluation method, all of these priors converge to the best theory, given enough examples. However, we are interested in increasing learning performance in the short term, when a limited number of training examples may be available. We expect that in more complex worlds, the stronger biases towards simplicity will be better at avoiding overfitting in the short term, since stronger evidence is required to include a rule in a theory. In simple deterministic domains, we expect to find a good theory quickly, so the weaker biases should give good results. Preliminary experiments seem to bear this out, but do not yet provide a good understanding of when to select which prior.
4.2.3 Likelihood of Evidence

P(e_t | T ∧ K) is the probability of the observation made at time t, given a theory and background knowledge. P(e_t | T ∧ K) is equal to P(e_t | T) if e_t is conditionally independent of K given T, which is a reasonable assumption since the other theories in K make no predictions regarding T's output feature, and any relevant information in K has already been used to select the current bias.

If the theory being evaluated predicts the output feature o, e_t can be rewritten as

    senses_t ∧ action_t ∧ o_{t+1}

This is because in the current implementation of PAGODA, only features at time t are considered for predicting features at time t + 1. This assumption does not affect the analysis, though; for example, features at time t - 1 could be included without any significant modifications. The probability of e_t given the theory T is
    P(senses_t ∧ action_t ∧ o_{t+1} | T)
        = P(senses_t ∧ action_t | T) P(o_{t+1} | senses_t ∧ action_t ∧ T)

Since T makes no predictions regarding senses_t and action_t, the first term can be rewritten as the prior probability:

    P(senses_t ∧ action_t | T) = P(senses_t ∧ action_t)

We drop this term, since it is a constant for a given set of observations. The second term is computed by applying PCI to the specific theory T [desJardins, 1993].

5 Conclusions and Future Work

PAGODA has been implemented in the RALPH (Rational Agent with Limited Processing Hardware) world, a simulated, nondeterministic robot domain. A good theory of this domain is non-trivial to represent. PAGODA learns a fairly good theory, but is limited by its inability to generate and represent internal states that would allow it to "remember" earlier sensory experiences to aid in predictions. Another result of these tests is that the choice of prior probability distribution has a significant impact on learning. Internal states and automated prior selection are important directions for future research.

The GDL component of PAGODA has not been extensively tested, since the RALPH domain is relatively feature-poor. However, even in this simple domain, GDL makes the optimal choice for a new feature to learn, improving the overall performance of the system within resource bounds. We plan to test GDL and the planning component in a domain with a larger number of interacting features.

There has been a fair amount of recent research on learning probabilistic theories, particularly decision trees (e.g., [Buntine, 1990; Quinlan, 1986]). We believe that the probabilistic rule-like representation used by PAGODA is a more natural representation, and that biases such as those PAGODA uses to prefer simple theories will lead to better hypotheses. However, we have not yet done any direct comparisons of PAGODA to these systems.

To our knowledge, there are no other intelligent agent architectures that integrate solutions to the range of problems PAGODA addresses: selecting learning tasks, choosing biases, learning and reasoning with probabilistic world models, and probabilistic planning. Although much work remains to be done, we believe that PAGODA addresses many of the fundamental concerns arising from the desire to build autonomous learning agents, and provides a foundation for building rational autonomous agents that can learn and plan in complex, nondeterministic environments.

References

[Buntine, 1990] Wray Buntine. A Theory of Learning Classification Rules. PhD thesis, University of Technology, Sydney, February 1990.

[Chaitin, 1977] G. J. Chaitin. Algorithmic information theory. IBM J. Res. Develop., 21:350-359, July 1977.

[desJardins, 1991] Marie desJardins. Probabilistic evaluation of bias for learning systems. In Eighth International Workshop on Machine Learning, pages 495-499. Morgan Kaufmann, 1991.

[desJardins, 1992a] Marie desJardins. Goal-directed learning: A decision-theoretic model for deciding what to learn next. In Proceedings of the Machine Discovery Workshop, 1992.

[desJardins, 1992b] Marie desJardins. PAGODA: A Model for Autonomous Learning in Probabilistic Domains. PhD thesis, UC Berkeley, 1992.

[desJardins, 1993] Marie desJardins. Representing and reasoning with probabilistic knowledge: A Bayesian approach. In Conference on Uncertainty in Artificial Intelligence, 1993.

[Mitchell, 1980] Tom Mitchell. The need for biases in learning generalizations. Technical Report CBM-TR-117, Rutgers University, May 1980.

[Quinlan, 1986] R. Quinlan. The effect of noise on concept learning. In Ryszard Michalski, Jaime Carbonell, and Tom Mitchell, editors, Machine Learning II, pages 149-166. Morgan Kaufmann, 1986.

[Solomonoff, 1964] R. J. Solomonoff. A formal theory of inductive inference. Information and Control, 7, 1964.