Reasoning About Preferences in Intelligent Agent Systems

Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence
Simeon Visser
Utrecht University
Amsterdam, the Netherlands
simeon87@gmail.com
John Thangarajah
RMIT University
Melbourne, Australia
johnt@rmit.edu.au

James Harland
RMIT University
Melbourne, Australia
james.harland@rmit.edu.au

∗ We acknowledge the ARC Discovery Grant DP 1094627.
Abstract

Agent systems based on the BDI paradigm need to make decisions about which plans are used to achieve their goals. Usually the choice of which plan to use to achieve a particular goal is left up to the system to determine. In this paper we show how preferences, which can be set by the user of the system, can be incorporated into the BDI execution process and used to guide the choices made.
1 Introduction
A fundamental feature of agent systems is the ability to make
decisions, and to manage the consequences of these decisions
in complex dynamic environments. In agent systems based
on the popular Belief-Desire-Intention (BDI) model [Rao and
Georgeff, 1992], an agent typically has a set of beliefs about
the current state of the world, a set of goals, which represent
states of the world that it would like to bring about, and plans
which are used to achieve its goals. The plans are often organized into a plan library, i.e. a collection of plans from which
the agent makes a selection in order to achieve a particular
goal. Due to the unpredictable nature of the environment in
which agents are used, it is normal for the agent to have to
choose one of several plans which may be used to achieve
a particular goal; by suitably adapting the choice of plan to the circumstances at the time, the agent can use the range of alternatives in the plan library to provide robust behavior.
For example, a travel agent that is asked to book a holiday may adopt two subgoals of booking accommodation and
booking transport. If there are 5 different accommodation
venues and 6 different means of transport, there are a total
of 30 different combinations that may be used by the agent
to achieve the goal. However, not every such combination
may be available; for instance, a given hotel or flight may
be full. It is precisely this uncertainty that motivates the use
of the BDI approach, in that we can specify what needs to be
done whilst leaving the agent free to find the most appropriate
combination of plans which will achieve this.
In practice, it is common for the user to want to specify some preferences for how the goal should be achieved. For instance, in the travel example above, the user may prefer a particular airline, or to travel by train and spend any money saved on a better class of accommodation. Note that this extra information is included as a preference rather than a goal, as it is acceptable to satisfy the goal without satisfying the preference. For example, if the user prefers to fly on Dodgy Airlines, but no such flights are available, then specifying this as a preference means that the user can still have a holiday; specifying this as a goal would mean that the user refuses to travel by any means other than Dodgy Airlines.

In this paper, we show how preferences can be incorporated into the BDI plan selection process, so that when a choice of plan is made, we can do so using the preferences as a constraint on plan selection. For example, if the user prefers 5∗ accommodation, then the agent should first attempt to book accommodation of this type, i.e. choose plans which book 5∗ hotels in preference to other plans. Similar comments apply to ordering the subgoals in a plan: a preference for travelling by train and spending any money saved on accommodation means that the subgoal of booking the train should be performed first.

We achieve this by adapting the preference language LPP [Baier and McIlraith, 2007; Bienvenu et al., 2006] to specify preferences over properties of goals, which are a way of specifying what will occur when the goal is achieved. For example, booking a 5∗ hotel gives the property accommodation class the value 5∗. The properties of a goal and their values are presented to the user, who can then use these to specify preferences, without having to know how the goals may be achieved. We use the notion of summary information [Thangarajah et al., 2003; 2002] to derive the properties of a goal.
2 Preference Specification
Preferences are expressed in terms of properties of goals,
which can be thought of as the relevant effects of the achievement of a goal. For example, a goal G of booking a holiday
may have a property called payment which specifies the payment method used. Any plan that achieves G by paying for
the holiday with a credit card will result in the value credit
being assigned to this property. Similarly, an alternative plan
may assign the value debit for payment. This means that
the possible values of the property payment for goal G are {credit, debit}. The preference language we use allows the user to specify preferences over the possible values
of these properties. For example, the statement “I would prefer for payment to be made via credit” expresses a preference for the value credit rather than debit for payment.
Preferences can also be specified over the resource usage of the plans used to achieve a goal. Preferences about resource usage are specified in terms of an amount and a comparison operator for a particular resource type (“I prefer to spend at most $100 on transport”).
The user then specifies preferences in terms of properties
and resource usage of goals without having to know the details of how the goal is achieved. For example, for the above
user, it is sufficient to know that paying for a holiday can be
done by credit or debit without knowing that in order to pay,
a mode of transport and accommodation must be selected.
The preferences we wish to express are concerned with the
values of properties and the resource usage of goals. Examples include: “I prefer to minimize the money spent on accommodation.”, “I prefer to fly rather than travel by train.”,
and “If the accommodation is a 5* hotel, I prefer to travel with
Jetstar.”. Our preference language is based on LPP [Baier
and McIlraith, 2007; Bienvenu et al., 2006]. Following the
structure of LPP, we use basic desire formulas to represent
basic statements about the preferred situation, atomic preference formulas to represent an ordering over basic desire
formulas and general preference formulas to express atomic
preference formulas that are optionally subjected to a condition. We introduce the class of conditional preference formulas that allow us to specify conditions with regard to information collected at runtime. The preferences of a user are
specified as a set of general preference formulas.
Definition 2.1 (Basic Desire Formula) A basic desire formula is either
1. the name of a goal property and a desired or an undesired value, expressed as name = value or name ≠ value, where value is not null,¹
2. the predicate minimize(resource), to express that the usage of resource should be minimized,²
3. the predicate usage(resource, amount, comparator ∈ {<, ≤, =, ≥, >}), to express that the usage of resource should be according to the comparator and the amount (e.g., at most 500), or
4. a predicate preceded by a goal name goal (e.g. goal.minimize(resource)).
If ϕ1, . . . , ϕn, n ≥ 2, are of the form name = value or name ≠ value, then ϕ1 ∧ . . . ∧ ϕn (conjunction) and ϕ1 ∨ . . . ∨ ϕn (disjunction) are also basic desire formulas.
Examples include transport.type = train and
usage(money, 500, ≤).
Basic desire formulas can be
ordered in atomic preference formulas to express preferences
in which a basic desire formula is preferred over another.
Definition 2.2 (Atomic Preference Formula) Let vmin and vmax be numeric values such that vmin < vmax. An atomic preference formula is a formula
ϕ0(v0) ≫ ϕ1(v1) ≫ . . . ≫ ϕn(vn), n ≥ 0,
where each ϕi is a basic desire formula, vmin ≤ vi ≤ vmax, and vi < vj for i < j. If ϕi is a predicate, then ϕi can only be used when i = n = 0 (i.e., we do not allow predicates together with other basic desire formulas).
Note that we have dropped the constraint that ϕ0 must have v0 = vmin, as imposed by Bienvenu et al.; we explain the reason for this modification later. We use vmin = 0 and vmax = 100 in our work. An example of an atomic preference formula is transport.type = plane (0) ≫ transport.type = train (100), which indicates that transport by plane is preferred over transport by train. We now define conditional preference formulas, which can be used in the general preference formulas defined later.
Definition 2.3 (Conditional Preference Formula) A conditional preference formula is
1. the name of a goal property and a desired or an undesired value, expressed as name = value or name ≠ value, or
2. the predicate success(goal), failure(goal), or used(goal, resource, amount), where goal is the name of a goal, resource is the name of a resource and amount ≥ 0 is a numerical value.
If ϕ1, . . . , ϕn, n ≥ 2, are conditional preference formulas then ϕ1 ∧ . . . ∧ ϕn (conjunction) and ϕ1 ∨ . . . ∨ ϕn (disjunction) are also conditional preference formulas.
Some examples are failure(book_flight) and success(transport). We now define general preference formulas to express atomic preference formulas that are optionally preceded by a conditional preference formula.
Definition 2.4 (General Preference Formula) A general preference formula is
1. an atomic preference formula, or
2. γ : Ψ, where γ is a conditional preference formula and Ψ is an atomic preference formula.
We can express the examples given at the beginning of this section as the following general preference formulas:
accommodation.minimize(money) (0)
transport.type = plane (0) ≫ transport.type = train (100)
accommodation.type = hotel ∧ accommodation.quality = 5∗ : book_flight.airline = Jetstar (0)
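To make the shape of these formulas concrete, the following Python sketch shows one possible way to represent basic desire, atomic and general preference formulas as data, with the three examples above encoded. The class and field names are our own illustration; they are not taken from the authors' implementation.

from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class BasicDesire:
    # e.g. transport.type = plane, accommodation.minimize(money), usage(money, 500, "<=")
    kind: str                 # "eq", "neq", "minimize" or "usage"
    name: str = ""            # property name, optionally prefixed by a goal path
    value: str = ""           # property value for "eq"/"neq"
    goal: str = ""            # optional goal prefix, e.g. "accommodation"
    resource: str = ""        # resource name for "minimize"/"usage"
    amount: float = 0.0
    comparator: str = ""      # one of <, <=, =, >=, > for "usage"

@dataclass
class Condition:
    # conditional preference formula: a property test, success(goal), failure(goal) or used(...)
    kind: str
    args: Tuple[str, ...]

@dataclass
class GeneralPreference:
    # atomic preference formula phi_0(v_0) >> ... >> phi_n(v_n), optionally guarded by conditions
    alternatives: List[Tuple[BasicDesire, float]]    # (formula, value) pairs, values ascending
    conditions: Optional[List[Condition]] = None     # conjunction of conditions; None if unconditional

# The three example formulas above, encoded with these classes:
minimize_accommodation_money = GeneralPreference(
    [(BasicDesire("minimize", goal="accommodation", resource="money"), 0)])
prefer_plane_over_train = GeneralPreference(
    [(BasicDesire("eq", name="transport.type", value="plane"), 0),
     (BasicDesire("eq", name="transport.type", value="train"), 100)])
prefer_jetstar_if_5star_hotel = GeneralPreference(
    [(BasicDesire("eq", name="book_flight.airline", value="Jetstar"), 0)],
    conditions=[Condition("eq", ("accommodation.type", "hotel")),
                Condition("eq", ("accommodation.quality", "5*"))])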
Resource and property information are propagated upwards to the top level of the goal in a goal-plan tree. We
follow a similar approach to the one taken by Thangarajah et
al. [Thangarajah et al., 2003; 2002] in which goals and plans
are augmented with resource and effect summaries to detect
and resolve resource conflicts and to facilitate the merging of
common subgoals.
Typical BDI agent systems consist of a plan library where, for a given goal, there are one or more plans in the plan library that could possibly achieve the goal (hence, this is an OR decomposition). Each plan performs some actions or posts a number of subgoals (this is an AND decomposition, as all subgoals must be achieved). A subgoal is in turn handled by some other plan in the plan library. This decomposition leads to a natural tree-like hierarchy which is termed a goal-plan tree. A goal node has one or more plan nodes as children and a plan node has zero or more goal nodes as children. Figure 1 shows an example goal-plan tree for the goal of booking a holiday (HolidayGoal), which consists of two subgoals for booking accommodation (AccommodationGoal) and booking transport to the destination (TransportGoal).
¹ We will explain later the purpose of null.
² Resource usage does not always need to be minimized when other preferences are taken into consideration.
[Figure 1: Goal-plan tree example. The tree shows HolidayGoal with its AccommodationGoal and TransportGoal subtrees; each node is annotated with RS = resource summary, PS = property summary, PPS = programmer-specified property summary, and PR = programmer-specified resource requirements.]
Similar to the approach of Thangarajah et al., we annotate nodes in the goal-plan tree with the required information and we automatically propagate this information to other nodes in the tree at compile time. The goal-plan tree in Figure 1 shows, apart from the goal-plan decompositions, both the programmer-specified properties PPS and resources of plans PR, and the propagated annotations of the summary information (property summaries PS and resource summaries RS).

We use resource summaries here as we also allow the user to specify preferences based on resources. A resource summary contains the necessary and possible resource usage for a node, ⟨N, P⟩. Necessary resources are those that are used irrespective of the goal-plan choice, and possible resources are those that may be needed in the worst case, where plans may fail and alternatives are tried. We define a property summary PS of a node N (denoted PSN) in the goal-plan tree as a set of properties, where each property is of the form (goalpath.name, values), where goalpath denotes the path to N in the goal-plan tree, name is the name of the property, and values is a non-empty set of values. The intended meaning of a property p in the property summary of a node N is that, upon successful execution of N, the value of p is exactly one of those in values.
The human-readable names of the goals encountered from the root goal to the property's node (except the root node itself) are given in goalpath. For example, the property summary of the node AccommodationGoal contains a property called book_hotel.quality; this property has book_hotel as its goalpath and quality as its name.

We use the value null ∈ values to indicate that the property could receive no value. For example, the goal TransportGoal has two plans, FlightPlan and TrainPlan, where the former has the property book_flight.airline but the latter does not. Hence the property book_flight.airline of TransportGoal will have the value null if TrainPlan is successfully executed. Note that we allow a programmer-specified property to have more than one value. For example, the property payment of BackpackerPlan has the values {credit, debit}. If the user does not prefer a particular value, the resulting value of the property is irrelevant.

The nodes in the goal-plan tree are annotated with programmer-specified information and computed summary information. Each goal node is annotated with the programmer-defined goalname. For each plan in the plan library, we require that a programmer specifies the resources required by that plan (PR), similar to Thangarajah et al. [Thangarajah et al., 2002], and additionally a property summary (PPS). The corresponding plan node in the goal-plan tree is annotated with this programmer-specified information.
For example, in Figure 1, goal BookFlightGoal is annotated with its name, and its plans FlyWithJetstarPlan and FlyWithQantasPlan are both annotated with the sets of resources and properties required by the plan. Note that it is sufficient to provide this information just for the lowest-level plans (i.e. the leaves of the goal-plan tree), as the information is propagated up the goal-plan tree. We do, however, allow annotation for all plan nodes, such as HotelPlan, which has both programmer-specified and computed information.
Using the programmer-specified information above, we
compute the resource summaries using the techniques described by Thangarajah et al. [Thangarajah et al., 2002],
and the property summaries using similar principles. The
techniques were shown to be computationally effective in
[Thangarajah and Padgham, 2010]. Note that the summary
information is computed at compile time and dynamically updated at run-time.
The goal-plan tree in Figure 1 illustrates the complete computed summaries for each node.
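As a concrete illustration of these annotations, the sketch below (assumed Python structures, not the authors' implementation) encodes the BookFlightGoal subtree of Figure 1 with its programmer-specified PR and PPS and its computed RS and PS.

from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

ResourceSummary = Tuple[Dict[str, float], Dict[str, float]]   # <necessary, possible> usage

@dataclass
class PlanNode:
    name: str
    pr: Dict[str, float]                        # programmer-specified resource requirements (PR)
    pps: Dict[str, Set[str]]                    # programmer-specified property summary (PPS)
    subgoals: List["GoalNode"] = field(default_factory=list)

@dataclass
class GoalNode:
    name: str                                   # programmer-defined goalname
    plans: List[PlanNode]
    rs: Optional[ResourceSummary] = None        # computed resource summary (RS)
    ps: Dict[str, Set[str]] = field(default_factory=dict)   # computed property summary (PS)

book_flight_goal = GoalNode(
    name="book_flight",
    plans=[
        PlanNode("FlyWithJetstarPlan", pr={"money": 90},
                 pps={"payment": {"credit"}, "airline": {"jetstar"}}),
        PlanNode("FlyWithQantasPlan", pr={"money": 150},
                 pps={"payment": {"credit"}, "airline": {"qantas"}}),
    ],
    rs=({"money": 90}, {"money": 240}),          # propagated from the two plans
    ps={"payment": {"credit"}, "airline": {"jetstar", "qantas"}},
)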
3 Reasoning about preferences

Consider the following preference formula:
book_flight.airline = qantas : acc.type = hotel (0)
Based on our previous observations, we want to execute BookFlightGoal before AccommodationGoal to obtain the value of book_flight.airline. By analyzing the goal-plan tree, we see that BookFlightGoal is part of the subtree rooted at TransportGoal. Further, both TransportGoal and AccommodationGoal are subgoals of HolidayPlan. We see that at HolidayPlan, the agent should therefore pursue the subgoal TransportGoal before AccommodationGoal.

Similarly, the preference transport.type = plane (0) ≫ transport.type = train (100) can be satisfied by choosing FlightPlan rather than TrainPlan to achieve TransportGoal. We now describe algorithms that utilize user preferences and summary information in the goal-plan tree to guide these decisions.

When the agent needs to select a plan for a goal, our approach is to express numerically how well a plan satisfies the preference formulas. We can then sort the plans from most to least preferred and attempt the plans in that order to achieve the goal.

Before we can define the semantics of preference formulas, we first discuss the differences between preferences about goal properties and preferences about the resource usage of a goal. These preferences differ in the sense that goal properties and their possible values are more precisely known to the agent than the resource usage of a goal. The latter depends on the possibly unsuccessful execution of plans in the subtree rooted at that goal, whereas the value of a goal property must be one of the values in the known set of values.

Since the actual resource usage of a node in the goal-plan tree is not known to the agent, and is dependent on the path chosen to achieve the goal, we have opted to use estimates of the amount that will be used. The resource summaries of a node only contain the amounts that will necessarily and possibly be used, which is the only information the agent has with regard to the resource usage of a node. However, an estimate is necessary when attempting to satisfy preferences related to resource usage. We use the function k-estimate to compute the estimated resource usage of a particular resource of a plan.

Definition 3.1 (k-estimate) Let (neci, posi) be the necessary and possible usage of a particular resource for a plan Pi. We define the k-estimate of Pi as ei = neci + k · (posi − neci), where 0 ≤ k ≤ 1.

The value of k can be set for each plan and it is related to the expected failure rate of the plan, including the execution of its subgoals. For example, if a plan uses on average 10% more than the necessary amount of a resource, we can set k = 0.1. We emphasize that the estimate serves to guide the agent and that this does not mean the agent is able to execute the plan with a resource usage that is close to the estimate. For example, the resource estimate for FlightPlan (see Figure 1), assuming k = 0.5, would be 90 + 0.5 · (240 − 90) = 165.
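A minimal sketch of Definition 3.1 in Python, checked against the FlightPlan numbers above:

def k_estimate(nec: float, pos: float, k: float) -> float:
    # Definition 3.1: e_i = nec_i + k * (pos_i - nec_i), with 0 <= k <= 1
    assert 0.0 <= k <= 1.0
    return nec + k * (pos - nec)

# FlightPlan in Figure 1: necessary usage 90, possible usage 240, k = 0.5
assert k_estimate(90, 240, 0.5) == 165.0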
The semantics of our preference language are based on the semantics of LPP. For each class of preference formulas, we define an evaluation function w that assigns a value vmin ≤ v ≤ vmax to a formula of that class, where a lower value means more preferred. We evaluate preference formulas only for plan selection for a goal, and we therefore evaluate a formula for a given goal G and a plan Peval of G. We include G in the evaluation of a preference formula because the satisfaction of some basic desire formulas for a plan depends on the summary information of the other plans of G. For example, the basic desire formula minimize(resource) is satisfied for a plan if its resource usage of resource is lowest compared to all other plans.

Compared to the work of Bienvenu et al., our main contributions are the semantics of our basic desire formulas and the semantics of our introduced class of conditional preference formulas. Otherwise, we follow their semantics, apart from the adaptation of general preference formulas to utilize conditional preference formulas.
Definition 3.2 (Basic Desire Satisfaction) Let ϕ be a basic desire formula, let G be a goal and let P1, . . . , Pn, n ≥ 1, be the plans for G. Furthermore, let PSi and Ri respectively be the property summary and resource summary of Pi, and let PSeval be the property summary of Peval. We define w(ϕ, G, Peval) as follows:
• if ϕ is name = value then w(ϕ, G, Peval) = vmin iff there exists a property p ∈ PSeval such that value ∈ values(p) and name is equal to either
  - name(p), or name(p) prepended with the goalpath from the root node to G, excluding the name of the root goal node, or
  - name(p) with the goalpath to G prepended, if there exists a plan Pi ≠ Peval with a property q ∈ PSi such that q has no goalpath and name(q) = name(p).³
  Otherwise, w(ϕ, G, Peval) = vmax.
• if ϕ is name ≠ value then w(ϕ, G, Peval) is as above for (name = value), with the requirement that value ∉ values(p).
• if ϕ is ψ1 ∧ . . . ∧ ψm, m ≥ 2, then w(ϕ, G, Peval) = max({w(ψj, G, Peval) | 1 ≤ j ≤ m})
• if ϕ is ψ1 ∨ . . . ∨ ψm, m ≥ 2, then w(ϕ, G, Peval) = min({w(ψj, G, Peval) | 1 ≤ j ≤ m})
• if ϕ is minimize(resource) then w(ϕ, G, Peval) = vmin iff Peval ∈ S, and w(ϕ, G, Peval) = vmax otherwise, where S is a set of plans computed by one of the following procedures.⁴
  - min_nec_pos: Let (neci, posi) be the necessary and possible resource usage of resource for Pi. Let S contain each plan Pi that satisfies (1) there is no Pj, i ≠ j, such that necj < neci; (2) if multiple plans satisfy (1), we include in S only those for which it additionally holds that there is no Pj, i ≠ j, such that posj < posi.
  - min_estimate: Let ei be the k-estimate of a plan Pi. Then S contains each plan Pi for which there is no Pj, i ≠ j, such that ej < ei.
• if ϕ is usage(resource, amount, comparator) then w(ϕ, G, Peval) = vmin iff Peval ∈ S, and w(ϕ, G, Peval) = vmax otherwise, where S is a set of plans computed using the procedure corresponding to the comparator ∈ {<, ≤, =, ≥, >} that is used.
  - < and ≤ amount: Let (neci, posi) be the necessary and possible resource usage of resource for Pi. Then S contains all plans Pi such that posi < amount (resp. ≤). If no such plans exist, then S contains all plans Pi such that neci < amount (resp. ≤).
  - = amount: Let ei be the k-estimate for Pi. Let S contain all plans Pi such that |ei − amount| is lowest (i.e., ei is closest to amount).
  - > and ≥ amount: Let (neci, posi) be the necessary and possible resource usage of resource for Pi. Then S contains all plans Pi such that neci > amount (resp. ≥). If no such plans exist, then S contains all plans Pi such that ei > amount (resp. ≥), where ei is the k-estimate for Pi.

³ This condition is used for properties that were merged using N.
⁴ The procedure to be used is determined by the designer prior to execution.
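The sketch below illustrates the two minimize procedures and, as one example of the usage case, the ≤ comparator. The dictionary layout of a plan (keys nec, pos and estimate) is our assumption, and the tie-break in min_nec_pos is read as being taken among the plans that already have the lowest necessary usage.

def min_nec_pos(plans):
    # plans with the lowest necessary usage; ties broken by lowest possible usage
    lowest_nec = min(p["nec"] for p in plans)
    candidates = [p for p in plans if p["nec"] == lowest_nec]
    if len(candidates) == 1:
        return candidates
    lowest_pos = min(p["pos"] for p in candidates)
    return [p for p in candidates if p["pos"] == lowest_pos]

def min_estimate(plans):
    # plans whose k-estimate is lowest
    lowest = min(p["estimate"] for p in plans)
    return [p for p in plans if p["estimate"] == lowest]

def usage_at_most(plans, amount):
    # usage(resource, amount, <=): prefer plans whose possible usage stays within
    # amount, falling back to plans whose necessary usage does
    s = [p for p in plans if p["pos"] <= amount]
    return s if s else [p for p in plans if p["nec"] <= amount]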
As an illustration of the semantics of name = value, consider accommodation.quality = 3∗ when evaluating for the goal called accommodation. This formula is satisfied for HotelPlan because the property summaries of HotelPlan and BackpackerPlan contain a property called book_hotel.quality and a property called quality, respectively, which means the second condition for name is satisfied.

Recall that an atomic preference formula orders one or more basic desire formulas, each with an associated value. The value we wish to associate with an atomic preference formula Φ as a whole is the value of a satisfied basic desire formula in Φ with the lowest associated value.

Definition 3.3 (Atomic Preference Satisfaction) Let Φ = ϕ0(v0) ≫ . . . ≫ ϕn(vn), n ≥ 0, be an atomic preference formula and let G be a goal. We define w(Φ, G, Peval) = vi if there exists ϕi such that w(ϕi, G, Peval) = vmin and there is no j < i such that w(ϕj, G, Peval) = vmin. Otherwise, w(Φ, G, Peval) = vmax.
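A sketch of Definition 3.3: the value of an atomic preference formula is the value attached to the first (lowest-valued) basic desire formula that is satisfied for the plan under evaluation. The satisfies callback stands in for the basic desire semantics of Definition 3.2.

VMIN, VMAX = 0, 100

def w_atomic(alternatives, satisfies):
    # alternatives: list of (basic_desire, value) pairs with values in ascending order;
    # satisfies(phi) should return True iff w(phi, G, P_eval) = vmin under Definition 3.2
    for phi, value in alternatives:
        if satisfies(phi):
            return value
    return VMAX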
The evaluation of conditional preference formulas requires information about the execution of goals thus far. For this we define metadata M = ⟨MP, MS, MF, MR⟩:
- MP pairs of a goal name and its received value,
- MS names of goals that have succeeded,
- MF names of goals that have failed, and
- MR a data structure with, per goal and per resource, the amount that was used. We use MR^goal(resource) to denote the amount (possibly none) of resource used thus far for the execution of goal.

Definition 3.4 (Conditional Preference Satisfaction) Let Φ be a conditional preference formula and let metadata M = ⟨MP, MS, MF, MR⟩. We define w(Φ, M) as follows:
• if Φ is not a conjunction or disjunction, we define w(Φ, M) = vmin iff the corresponding condition below holds, and w(Φ, M) = vmax otherwise:
  - if Φ is name = value then (name, value) ∈ MP,
  - if Φ is name ≠ value then (name, value) ∉ MP,
  - if Φ is success(goal) then goal ∈ MS,
  - if Φ is failure(goal) then goal ∈ MF,
  - if Φ is used(goal, resource, amount) then MR^goal(resource) ≥ amount.
• if Φ is ϕ1 ∧ . . . ∧ ϕn, n ≥ 2, then w(Φ, M) = max({w(ϕi, M) | 1 ≤ i ≤ n})
• if Φ is ϕ1 ∨ . . . ∨ ϕn, n ≥ 2, then w(Φ, M) = min({w(ϕi, M) | 1 ≤ i ≤ n})

Definition 3.5 (General Preference Satisfaction) Let Φ be a general preference formula and let G be a goal. We define w(Φ, G, Peval, M) as:
• w(ϕ0 ≫ . . . ≫ ϕn, G, Peval, M) = w(ϕ0 ≫ . . . ≫ ϕn, G, Peval)
• w(γ : Ψ, G, Peval, M) = vmin if w(γ, M) = vmax, and w(Ψ, G, Peval) otherwise.
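The following sketch (field names are our own) shows the metadata M and the evaluation of conditional preference formulas from Definition 3.4; under Definition 3.5, a guarded formula γ : Ψ evaluates Ψ only when its condition γ holds, and is otherwise trivially satisfied.

VMIN, VMAX = 0, 100

class Metadata:
    def __init__(self):
        self.properties = set()   # M_P: (name, value) pairs received so far
        self.succeeded = set()    # M_S: names of goals that have succeeded
        self.failed = set()       # M_F: names of goals that have failed
        self.used = {}            # M_R: (goal, resource) -> amount used so far

def w_conditional(phi, m):
    # phi is a tuple such as ("eq", name, value), ("neq", name, value),
    # ("success", goal), ("failure", goal), ("used", goal, resource, amount),
    # ("and", [phi1, ...]) or ("or", [phi1, ...])
    kind = phi[0]
    if kind == "and":
        return max(w_conditional(p, m) for p in phi[1])
    if kind == "or":
        return min(w_conditional(p, m) for p in phi[1])
    if kind == "eq":
        holds = (phi[1], phi[2]) in m.properties
    elif kind == "neq":
        holds = (phi[1], phi[2]) not in m.properties
    elif kind == "success":
        holds = phi[1] in m.succeeded
    elif kind == "failure":
        holds = phi[1] in m.failed
    else:  # "used"
        holds = m.used.get((phi[1], phi[2]), 0) >= phi[3]
    return VMIN if holds else VMAX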
Recall that in atomic preference formulas, we do not require ϕ0 to have value v0 = vmin, as done by Bienvenu et al. This allows preference formulas to override other formulas. For example, we can prefer prop = val1 (100) and γ : prop = val2 (50), where γ is a conditional preference formula. If γ is satisfied then a plan that satisfies prop = val2 is preferred over a plan that satisfies prop = val1.

We can now present our algorithm for computing the preferred order in which the plans of a goal G should be selected for execution. The input of this algorithm is the set of general preference formulas F, the goal G and the metadata M. For each plan Pi of G, we compute a score scorei which is the sum of the values w(f, G, Pi, M) for each f ∈ F, and sort the plans by scorei in non-decreasing order. The output is an ordered list of the plans, which the agent attempts in that order. In case of plan failure, the next plan in the ordered list is attempted.
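A sketch of this plan-ordering step; w_general is assumed to implement Definition 3.5 (lower values mean more preferred).

def order_plans(plans, formulas, goal, metadata, w_general):
    # score each plan of the goal against every general preference formula and
    # return the plans sorted by non-decreasing total score
    scored = [(sum(w_general(f, goal, p, metadata) for f in formulas), p) for p in plans]
    scored.sort(key=lambda pair: pair[0])        # stable sort keeps the original order on ties
    return [p for _, p in scored]

# The agent attempts the returned plans in order, moving to the next on plan failure.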
Consider the general preference formula
goal1.prop1 = value1 : goal2.prop2 = value2 (0)
which can be read as “if prop1 of goal1 has received the value value1 then I prefer prop2 of goal2 to receive value2”. To satisfy this preference, we should execute goal1 before goal2 to determine the value of prop1. If its value is indeed value1 then we can aim to satisfy the preferred value of prop2 for goal2. We wish to satisfy the user preferences as much as possible, which is why the agent should, given a general preference formula γ : Ψ, execute the goals mentioned in γ before the goals that are referred to in Ψ. We can determine these goal orderings at compile time by analyzing the preference formulas and the structure of the goal-plan tree.
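One possible compile-time realisation of this ordering rule is sketched below; the helper callbacks that extract the goal names mentioned in γ and Ψ are assumptions, since the paper does not spell out this analysis.

def order_subgoals(subgoals, preferences, goals_in_condition, goals_in_body):
    # subgoals: the subgoal names posted by a plan, in their default order;
    # goals_in_condition(f) / goals_in_body(f): goal names referred to in gamma and Psi of f
    first = set()
    for f in preferences:
        cond, body = goals_in_condition(f), goals_in_body(f)
        if any(b in subgoals for b in body):             # the formula constrains one of our subgoals
            first |= {g for g in subgoals if g in cond}  # so post the condition's subgoals first
    return [g for g in subgoals if g in first] + [g for g in subgoals if g not in first]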
4 Discussion
We have implemented our preference system in the agent platform Jadex⁵ and have tested it on a number of examples, including the holiday example discussed above. The implementation consists of around 3000 lines of code and utilizes the metagoal and metaplan features of Jadex. We extract the goal-plan tree from the agent specification and annotate and propagate the summary information before the system runs. We used the preference formulas from Section 2 and analysed the execution of our system both with and without preferences. When preferences were not used or when no preferences were available, the agent randomly selected a plan for a goal and pursued the subgoals of a plan in an arbitrary order. The user preferences were therefore only satisfied when the agent happened to make the right decisions during execution. When we included our reasoning algorithms, the agent booked backpacker accommodation (as the cheapest alternative) and a flight with either airline, as expected. When the plan for backpacker accommodation failed, the agent booked a 3∗ hotel. When this plan failed as well, the 5∗ hotel was selected and, due to the third preference formula, the agent selected the plan to fly with Jetstar rather than Qantas. Further, the accommodation goal was pursued before the transport goal to make this possible.

Preferences have been used for the selection [Hindriks et al., 2008; Nguyen and Wobcke, 2006] and elimination [Hindriks and van Riemsdijk, 2008; Myers and Morley, 2001; 2002] of actions or plans. The preference language on which our work is based is also used by Fritz and McIlraith [Fritz and McIlraith, 2005; 2006] to integrate preferences into the agent programming language DT-Golog.

Various directions for future work exist. Firstly, we have not incorporated consistency checks to see if the user has specified preferences that conflict in some or all executions of the agent system. Secondly, we have not explored interactions that may exist between preferences: satisfying a preference in one part of the goal-plan tree might make it impossible to satisfy preferences in other parts of the goal-plan tree. Thirdly, we have focussed on consumable resources and have not explored the specification of and reasoning about preferences over reusable resources. Lastly, the assumptions that we make in our propagation rules and the way the nodes in the goal-plan tree are annotated influence the usefulness of the computed properties. For example, in Figure 1, if BookFiveStarHotelPlan were also annotated with payment = {credit}, we would end up with payment = {credit} and book_hotel.payment = {credit, null} for HotelPlan, which is redundant. The underlying issue here is that it is not clear whether a property prop of a plan and a property prop of a subgoal of that plan should be considered the same or not. Our future work will address the above limitations.

⁵ http://jadex.informatik.uni-hamburg.de/
References

[Baier and McIlraith, 2007] Jorge A. Baier and Sheila A. McIlraith. On domain-independent heuristics for planning with qualitative preferences. In 7th Workshop on Nonmonotonic Reasoning, Action and Change (NRAC), 2007.
[Bienvenu et al., 2006] Meghyn Bienvenu, Christian Fritz, and Sheila A. McIlraith. Planning with qualitative temporal preferences. In KR, pages 134–144. AAAI Press, 2006.
[Fritz and McIlraith, 2005] Christian Fritz and Sheila McIlraith. Compiling qualitative preferences into decision-theoretic Golog programs. In Proceedings of the 6th Workshop on Nonmonotonic Reasoning, Action, and Change, Edinburgh, UK, 2005.
[Fritz and McIlraith, 2006] C. Fritz and S. McIlraith. Decision-theoretic Golog with qualitative preferences. In KR 2006, pages 153–163, Lake District, UK, 2006.
[Hindriks and van Riemsdijk, 2008] K. Hindriks and B. van Riemsdijk. Using temporal logic to integrate goals and qualitative preferences into agent programming. In DALT, pages 215–232, 2008.
[Hindriks et al., 2008] K. Hindriks, C. Jonker, and W. Pasman. Exploring heuristic action selection in agent programming. In ProMAS, volume 5442 of Lecture Notes in Computer Science, pages 24–39. Springer, 2008.
[Myers and Morley, 2001] K. L. Myers and D. N. Morley. Human directability of agents. In K-CAP, pages 108–115. ACM, 2001.
[Myers and Morley, 2002] K. L. Myers and D. N. Morley. Resolving conflicts in agent guidance. In Proceedings of the Workshop on Preferences in AI and CP: Symbolic Approaches, 2002.
[Nguyen and Wobcke, 2006] Anh Nguyen and Wayne Wobcke. An adaptive plan-based dialogue agent: integrating learning into a BDI architecture. In AAMAS, pages 786–788. ACM, 2006.
[Rao and Georgeff, 1992] A. S. Rao and M. P. Georgeff. An abstract architecture for rational agents. In KR, pages 439–449, 1992.
[Thangarajah and Padgham, 2010] John Thangarajah and Lin Padgham. Computationally effective reasoning about goal interactions. Journal of Automated Reasoning, pages 1–40, 2010.
[Thangarajah et al., 2002] J. Thangarajah, M. Winikoff, L. Padgham, and K. Fischer. Avoiding resource conflicts in intelligent agents. In ECAI, pages 18–22, 2002.
[Thangarajah et al., 2003] J. Thangarajah, L. Padgham, and M. Winikoff. Detecting & exploiting positive goal interaction in intelligent agents. In AAMAS, pages 401–408. ACM, 2003.