Using Decision-Theoretic Planning Agents in Market-Based... Michael Brydon

From: AAAI Technical Report WS-02-06. Compilation copyright © 2002, AAAI (www.aaai.org). All rights reserved.
Using Decision-Theoretic Planning Agents in Market-BasedSystems
Michael Brydon
Faculty of BusinessAdministration and Centre for SystemsScience
SimonFraser University
Bumaby,British Columbia,Canada
mjbrydon@sfu.ca
Abstract
as minimizing the tardiness or makespan of a set of jobs
[23].
A decision-theoretic
planning system for manufacturing
scheduling would explicitly managethe uncertainty in the
production environment and would permit managers to
express their objectives in terms of complex, discontinuous,
multiattribute utility functions. Unfortunately, the richness
of the decision-theoretic planning approach leads to a welldocumented explosion in the number of states required to
represent the problem [5, 11]. Several techniques, such as
state aggregation [5, 6, 12], and reachability analysis [1, 4]
have been used successfully to delay the onset of impractically large state spaces. However,it is unlikely that such
techniques by themselves are sufficient to permit the formulation and solution of large, industrial-scale decision-theoretic planning problems. Instead, some form of problem
decomposition is required [10, 19].
A very general approach to problem decomposition that
is attracting a great deal of research interest is based on the
notion of agents interacting within markets [8, 9, 21, 33,
34]. The purpose of this paper is to examine the suitability
of decision-theoretic planning agents for certain types of
market-based systems. Section 2 begins by defining the
environmental context of the resource allocation problems
considered in this paper. Section 3 works backwards from
economic theory and the characteristics
of the original
resource allocation problem to identify the core requirements for the market-based agents. The suitability of decision-theoretic
planning for implementing market-based
agents is examined in Section 4. The fundamental question
addressed in the section is not whether decision-theoretic
planning is an appropriate framework for fulfilling
the
requirements imposed by economic theory. Instead, the
important question is one of computational practicality.
Although the agent-level planning problems are exponentially smaller than the undecomposedproblem, they typically remain far too large to represent and solve using
conventional dynamic programming techniques (see [24]).
In order to be solvable in practice, the agent-level planning
problems must possess intemal structure
that can be
exploited by state space reduction techniques. The empirical evidence presented in this section suggests that certain
important classes of resource allocation problems do possess internal structure that can yield significant reductions
in the effective size of the agent-level planning problem.
Section 5 concludes with some general observations regard-
This paper examinesa numberof theoretical and practical
issues concerningthe use of decision-theoretic planning to
implement agents in market-based systems. The markets
considered here result from the decompositionof complex,
intra-organizationai resource allocation problems such as
manufacturing scheduling. Althoughthese problems can be
formulatedas monolithicoptimizationproblems,they tend to
be muchtoo large to solve in practice. Marketsprovide a
meansof decomposinglarge resource allocation problems
and distributing the computationof a solution over many
processors.
Animportantpreconditionof efficient marketsis agent-level
rationality. Decisiontheoretic planningcan be used to implementeconomicrationality and thus decision theoretic planning agents fit well into market-based approaches. The
primary challenge in building decision theoretic planning
agents is the size of the agent’s state space. Althoughmarketbased decompositionresults in agent-level problemsthat are
muchsmaller than the original resource allocation problem,
the price mechanismused to achieve independence of the
agent-level problems requires that the agents plan over a
large number of different resource contingencies. This
requirementexacerbatesthe state space explosionthat characterizes decision-theoretic planning. The application of
state space reduction techniques such as structured dynamic
programmingand reachability analysis are shownto yield
significant reductionsin the effective size of the agent-level
problemsand thereby increase the applicability of decision
theoretic planning techniques in market-basedsystems.
1 Introduction
Real-world resource allocation problems such as scheduling production machinery in large manufacturing facilities could benefit greatly from the application of decisiontheoretic planning techniques. Such allocation problems are
characterized
by uncertainty
and complex trade-offs
between objectives such as time, flexibility, and cost. However, the standard techniques used to allocate resources rely
on numerous simplifying assumptions, such as deterministic outcomes and single-attribute objective functions such
Copyright
©2000,American
Associationfor Artificial
(www.aaai.org).
All rights reserved.
19
ing the use of decision-theoretic planningagents in marketbased systems.
2 Market-Based
¯ terminal
J reward
PRODUCTION SYSTEM
\
/~- --~
Decomposition
A distinction is often madein the distributed artificial
intelligence (DAI)literature betweencooperative distributed problem solving (CDPS)environmentsand multiagent
systems (MAS)environments[3, 15]. The critical difference
between CDPSand MASenvironments is the existence of a
global utility function. In CDPS
environments,a global utility function exists; the challengeis to decompose
the problem such that the maximization behavior of the agents
results in maximizationof the global utility function. The
final utilities of the agents themselvesare irrelevant since
the agents exist solely to facilitate the distribution of computation.
In contrast, agents in MAS
environmentsare typically
usedto modelthe distributed, multiagentstructure of a naturally decentralized problem[28]. For example,agents can
be used to represent different firms in a complexsupply
chain coordination problem.Since the fundamentalprinciples of decision theory prohibit the aggregationor comparison of utilities across individual agents [25], no meaningful
measureof global utility exists in such environments.Identifying a "good"of"fair" solution therefore requires a commitmentto a particular game-theoreticbargainingsolution
[26, 28].
The task of allocating scarce production resources in a
manufacturingfacility is an instance of the CDPSclass.
Althoughthe term "cooperative" is misleadingin this case
(since economicagents are strictly self-interested), intraorganizational resource allocation problems have welldefinedutility functions. Thetask facing the designerof the
agents is to decomposethe global utility function into
agent-level utility functions. In the following section, a
decompositionis proposedthat achieves an additive relationship betweenglobal utility and independencebetween
the agent-level problems.
2.1 Decomposing the Scheduling Problem
A natural wayto decomposea resource allocation problem in a manufacturingenvironmentis to create computerbased agents for each part and machinein the production
system(part agents and machineagents respectively). All
the costs and revenues in the problemdomaincan then be
allocated to individual agents as shownin Figure1.
The only source of revenue in the systemis the payment
that occurs whena completed part is shipped to the next
stage in the system’ssupplychain, such as a final customer,
a distributor, or finished goodsinventory. Incomingrevenue
is modeledusing a terminal rewardthat is received by a
part agent whenthe part it represents leaves the production
system. The size and functional form of the reward are
determinedexogenouslybut are knownto the part agent.
On the cost side, both part agents and machineagents
incur costs. In the case of part agents, each part is allocated
the costs of its constituentrawmaterials as well as the cost
2O
II(
materials
part
~"---"--~(machlne-’~
holding
costs
vadable setup and
cost8 maintenance
Figure]: Anallocation of global revenues
andcoststo
a part and machineagent.
of holding the part in work-in-processinventory. The costs
of wear and tear, consumables,and setups are allocated to
individual machines.
The arrows marked"inter-agent transfers" are discussed
in moredetail in Section3. At this point, it is sufficient to
recognize that all inter-agent transfers are zero-sumand
therefore haveno net impacton global utility. As a result,
the global utility of the productionsystem(revenues- costs)
can be expressedas the sumof the individual agent utilities:
=
(1)
i
To simplify exposition in this paper, we makethe assumption that variable, setup, and maintenancecosts are zero,
This permits us to eliminate machine agents from the
decomposed problem and focus on the more complex
requirementsof part agents.
2.2 Microeconomic Theory and CDPS Environments
The purpose of microeconomic
theory is to describe the
interaction of individual rational agents pursuingtheir own
selfish objectives[2, 17]. Thetermrationality, as it is used
in the microeconomiccontext, corresponds to Elster’s
notion of thin rationality [13]. Thinrationality requires only
that an agent maximizeits utility accordingto a consistent
set of preferences over outcomesand that its reasoning
about the expected values of outcomesbe consistent with
the fundamentalaxiomsof probability theory.
Althoughthe microeconomic
formulation of agents is too
simplistic to provide an accurate description of human
agents, the objective in a CDPS
environmentis not to simulate reality. Instead, the objective is to distribute computation and aggregate the results so that global utility is
maximized.The designer of problem solving systems in a
CDPS
environmenthas the luxury of complete control over
the implementationand behavior of the agents in the system. This is clearly not the case for the designers of real
markets (e.g., the NewYorkStock Exchange[32]) or problem solving systems in MASenvironments. It is therefore
possible for a CDPSsystem to be constructed in strict
accordance with microeconomictheory.
The advantage of using microeconomicsas a blueprint
for a market-basedsystemis that the theory provides some
a priori insights and guarantees about the overall outcome
of the market. Specifically, the First Fundamental
Theorem
of WelfareEconomics
states that a competitiveequilibrium
inducesan allocation that is Paretooptimal[17, 16]. Thatis,
if rational, self interested agentsare permittedto competein
a market,the outcomeat equilibriumis that no agent can be
madebetter offwithout makingsomeother agent worseoff.
AlthoughPareto optimality in the general case is insufficient for global optimality, it is possible to achieveequivalence betweenthe twoin cases in whichglobal utility is the
sumof agentutilities.
2.3 Maximization Behavior of Agents
Accordingto the decompositionin Figure 1, an agent in a
manufacturingsystemmaximizesits utility by attaining its
terminal rewardwhile simultaneouslyminimizingits costs.
The cost and rewardsthat the agent accumulatesduring its
progress through the production system are a function of
two things: chance and the agent’s control of production
resources. It is therefore possible to express agent i’s
expectedutility as a function of its allocation of resources
x, : U~=f,(x~).
Productionresourcesare definedas discrete units of processing time on a particular machineat a particular time. A
family of propositional variables Owns(Mj,Tk) can be used
to denote whetherthe agent has exclusiverights to processing on machineMj at time Tk. The total numberof production resources in the system, m, is the numberof machines
in the production systemmultiplied by the numberof time
units in the planninghorizon.
Theonly waythat agents can increase their utility is to
exchangeresources with other agents. Suchexchangescan
occur betweenrational agents if and only if neither agent is
worseoff as a result of the exchangeand at least one agent
is strictly better off. Onemeansof facilitating such individually rational (IR) exchangesbetweenagents is to augment
the m-goodeconomywith a numeraire good. A numeraire
goodis a goodthat possesses the properties of fungibility
and divisibility but whichhas no intrinsic utility to the
agents other than its acceptance within the economyas a
unit of exchange[17]. Acceptanceof the goodas a unit of
exchangeis achieved by adopting and enforcing the conventionthat all agents expresstheir preferencesfor resource
goods in terms of the numeraire goodMrather than their
ownidiosyncratic and incomparableunits of utility. The
result is a quasi-linearutility functionfor agenti:
U~ = Mr + dpi(x,)
(2)
Toconstruct(2), the agent scales its utility functionto make
it linear in good M. The term ¢i(xi) corresponds to the
agent’sintrinsic utility for resourcegoods.In a manufacturing scheduling environment, the form of Cj(x~) is determined by the complex interaction of costs and rewards
faced by the part. Intuitively, high-valueparts with high
holding costs and binding deadlines are boundto value cer-
21
rain productionresources morethan low-valueparts that are
built to stock.
2.4
Determining Reservation
Prices
Determining the rationality of a particular exchange
requires agents to knowtheir ownprivate reservation prices
for goods. Agenti’s reservation price for resource goodx/
is the changein the quantity Mi that makesthe agent indifferent betweenan allocation that contains xJ one that is
identical in everyrespect exceptthat it does not contain x).
Thus given x/ = xi u x/, AMj(x2)= ~i(x/)- qb,.(xj).
Notethat the reservationprice is symmetrical.It contains no
spread betweenthe price at whichthe agent is willing to
buythe resource(bid price) and the price at whichthe agent
is willing to sell the resource if it already ownsit (ask
price). In addition, the quantity AMis independentof the
agent’s endowment
of M. Animplication of the quasi-linear
utility functionis that agents are risk neutral withrespect to
good Mand thus all exchangedecisions are madeon the
basis of marginalchangesin M.
An important issue that arises in resource allocation
problemsis that valuations for resource goodsare seldom
independent. Twocommonforms of resource interdependence are substitutability and complementarity.Twogoods
x1, x2 are substitutes if the utility of owningthemin the
same
allocation
is
subadditive:
Ui(xl, x2)< Ui(xl)+ U~.(x2). Conversely, the goods are
complements
if the utility of owningthemin the sameallocation is superadditive: U~(xl, x2) > U~(xI) + Ui(x2). Substitutability
occurs in manufacturing environments
whenevera contract on a particular machineat a particular
time can be used in place of a contract on a different
machineor at a different time. Complementarityoccurs
wheneveran operation requires more than one unit of processing time or involves precedenceconstraints.
The deeply embeddedinterdependencies of goods in
resource allocation problemsmeansthat the agents invariably participate in a combinatorialauction [18, 27, 29]. In a
combinatorialauction, an agent cannotdetermineits reservation price for a goodby consideringthe goodin isolation.
Instead, the agent mustconsider as manyas 2" unique allocations of resources and calculate its reservation price for
all transition betweenall allocations.
For example, in the m= 2 economyshownin Figure 2,
the agent’s reservation price for good x2 dependson its
ownership of xI . If the agent’s initial allocation is
(xI = 0, x2 = 0, M= 100), its reservation price for 2 i s
$30- $100=-$70. In other words, the agent is willing to
pay up to $70 to acquire the resource. However,if the
agent’sinitial state is (x1 = 1, x2 = 0, M= 50), its reservation price is only $0 - $50 = -$50. Theagent therefore con-
initial allocation
M
~(X 1 m 0, X2 = 0, M =
(X 1 =
fo
o1:o I
15° I
100)
gx,,:}, { }l
1, x2 = O, M= 50
~l = O,x2 = I,M. = 30)
Ixl
Z
Agent/
(x I "= 1,x2= 1, M=O)
Figure 2: Anindifference graph for an economy
consisting of two indivisible resource goods(X1 and
x2 ) and a numeraire good M.
Agent/
1
3
2
10
Total
~10
l{ }, x2)l
ol ~timalallocation
~gent/
~,(x~)
!
15
2
0
total
15
0
;2
100
Total
100
Figure 3: A contract graph showingfor a two-agent
two-good economy.
siders the goodsto 2be partial substitutes since it values x
Iless
. whenit already ownsx
3 Interagent
Exchange
The decomposition of a resource allocation problem
results in apure exchangeeconomy[17]. Noproduction or
consumptionof the m resource goodsor the single numeraire good occurs during the operation of the market.
Instead, the agents exchangegoodsuntil no further IR transactions are possible. The equilibriumoutcomeis said to be
efficient if each goodis ownedby the agent with the reservation price for the good.
A contract graph [30] provides a convenient means of
visualizing the exchangesthat can occur in an n-agent mgoodeconomy.Eachvertex of the graph describes an allocation of goodsto agents. Theedges represent exchangesof
goods(or contracts) betweenagents. Figure 3 showsa very
simple contract graph for an economywith two agents and
tworesourcegoods.Theintrinsic utilities of the agents are
shownfor each vertex as well as the sumof the utilities.
Althoughtransfers of Mare essential for the execution of
the exchanges,interagent transfers of Min a pure exchange
economywith quasi-linear utility functions are zero-sum.
Thetotal utility of money,~ U~(M~),
is constant in all allocations and thus the flows of Mcan be let~ implicit on the
contract graph.
Oneadvantageof the contract graphrepresentation is that
it permits the operation of the marketto be visualized as
local search over allocations. The different auction forms
used to implementthe market result in different search
operators. However,
in this case, the agents are risk-neutral
22
with respect to the auction and possess independentprivate
valuations of the goods. Consequently, the major auction
forms are equivalent in terms of economicefficiency [20].
Becauseof the equivalenceof auction forms, the market
can be implementedas a continuous auction rather than a
call auction. In a call auction, a central auctioneercollects
reservation prices from potential buyers of a good and
awards the good to the highest bidding according to some
price function such as first- or second-price[14, 20]. In a
continuousauction, agents are alwaysin a position to buyor
sell resource goods and transactions are executed without
the intervention of an auctioneer.
The primarydifference betweena call auction and a continuous auction is the numberand sequence of contracts
requiredto attain the equilibriumallocation. Sincethe reservation prices of the bidders are sorted in a call auction, a
single contract is all that is requiredto ensurean efficient
outcome.In a continuousauction, the path through the contract graph dependson the order in whichthe agents interact. Althoughthe distribution of the economicsurplus to
agents maybe different in a continuous auction, the only
outcomeof interest in a CDPS
environmentis global utility.
A continuous auction therefore provides a convenient
meansof avoidingthe synchronizationrequirementof a call
auction.
4 Decision-Theoretic
Agents
Regardless of the auction form used to implementthe
market, the agents that participate in the marketmust be
able to determinetheir reservation prices for arbitrary bundles of resource goods. The agents use their reservation
prices in twoways.First, an agent acting in the buyerrole
needs to identify desirable allocations and determineits
willingness to pay for the resources it does not own.Conversely, an agent in the seller role mustbe able to respondto
ask price inquiries from other agents. To complicatematters, agents in a combinatorialauction maybe selling and
buyingwithin the scope of a single transaction. For example, the contract represented by the horizontal line in
Figure 3 has Agent1 selling xI and buyingx2 at the same
time.
To this point, nothing has been said about the precise
manner in which an agent i determines ~i(xi) where
x,. e Xi and IXi[ = 2m. However,it is importantto recognize that an agent’s expectedreturn on its investmentin a
production resource dependson the wayin whichthe agent
uses the resource. Thus, the valuation problemis tightly
coupledwith an agent-level planning problem.
4.1 Physical and Resource State Spaces
Anagent is a computer-basedprocess that represents the
interests of an object (in this case, a part) in the real world
(in this case, a manufacturing
facility). Adecision-theoretic
planning agent generates a policy x that specifies the
agent’s best action in everystate of the state space of the
part it represents [5, 6, 12]. Thevariables used to represent
the "physical" state of a part agent are shownin Figure 4.
Theactions available to the agent are shownin Figure5.
Figure4: State variables for part agents
State variable/
familyof state
variables
Description
Domain
Time
current systemtime in
{1.....T}
discrete units
Status(Op/)
the completionstatus of {complete,
each operation Opi in the incomplete}
part’s project plan
Shipped
whetherthe part has
{true, false}
receivedits terminal
reward
ElapsedTime(Opi)the numberof units
1, ...,
processingtime already max(d/)}
invested in operationOpi
Figure 5: Actionsfor part agents
Action/family of
actions
Description
Wait
do nothing (the value of Timeis
incremented)
Process(Opi, Mj, ~rocess operation Opi on machineMj
Tk)
for one unit of time starting at timeTk
Ship
exit the systemand receive a terminal
reward
^ Status(Op3)=complete ^ Shipped=false, then the
agent’s policy states that it should executethe Ship action
(and thereby trigger the paymentof its terminal reward).
Actionpreconditionsensure that the Ship action is not executed until the pawsoperations are completeand operations
can only be completedby executing Process0 actions.
The uncertainties that exist in manufacturingenvironmentsregarding the durations of operations, quality problems, and machine breakdowns mean that decisiontheoretic planningat the agent-level is seldomtrivial. The
planning problemis further complicatedfor market-based
agents by contention for scarce resources. Anagent cannot
simply choose to execute an action such as
Process(Op3, M2, T3) since other agents mayalso want
exclusive use of machineM2at time T3. Meuleauet al.
[19] address the problemof resource dependencyby using
heuristic search techniques to mergethe policies of the
agents into a consistent global allocation of resources.
Analternative approachto centralized dispute resolution
is IR exchangebased on market prices, as discussed in
Section 3. However,in order to determinetheir reservation
prices for goods, the agents must plan over all possible
resource contingencies. For every state s t in the agent’s
physicalstate space si e Si , the agent mustconsiderits best
action given every possible allocation of resourcesx; ~ X~.
The decision-theoretic planning algorithm generates the
agent’s optimal policy over the joint physical and resource
state space and therebygeneratesreservation prices as a byproductof its operation.
Toillustrate, considera physical state in whichthe agent
evaluates the expected value of executing the
Process(Op3,M2, T3) action. The action has manypreconditions including precedence constraints (such as
Status(Op2) = complete)and temporal constraints (such
Time= T3). In addition, execution of the action requires
that Owns(M2,
T3) be true for the agent. Howeverthe ownership status of resources is not controllable by the agent
throughits action set and must be treated as a randomvariable. Duringpolicy generation, the decision-theoretic planning algorithmselects an action for the state si n x~ where
Owns(M2,
T3) ~ x~. and determines the expected value
that state given the optimal policy n: Vn(si n xi). The
algorithm also selects an action for sinx/ where
Owns(M2,T3) ~ x/ and determine Vn(sj n xi" ) . The difference in expected values, Vn(si n xi)- Vn(si n xi’),
takes into account any complexinterdependencies between
Owns(M2,
T3) and other resources and provides the agent’s
true
.
reservation
price for the goodwhilein state s
t
4.2
State Space Reduction
The obvious problemwith planning over resource contingenciesis that the effective size of the agent’s state space
can increase by a factor of 2m. Even a small problemwith
If the part requires three operationsand is in a state that
satisfies Status(Opl) = complete^ Status(Op2) = complete
23
five machinesand a planning horizon of ten time units can
increase the size of the agent’s state space by a factor of
1015.This growthin the state spaceeffectively nullifies the
decrease in problemsize achieved through decomposition
in the first place.
Onemeansof attenuating the explosion in problemsize
causedby the needto plan over resourcecontingenciesis to
use a state space reduction technique such as structured
dynamicprogramming[5, 6, 12]. Structured dynamicprogrammingis based on the observation that not all state
informationis relevantfor all decisions.Toillustrate, recall
the exampleused previously in which the Ship action was
associated with any state satisfying the clause:
Status(Opl) = complete ^ Status(Op2) = complete
Status(Op3)= complete^ Shipped= false. Thesefour state
variables are all that is required to choose the optimal
action. All other state variablesare irrelevant to the decision
and can be ignored. Furthermore,if precedenceconstraints
exist betweenthe operations, then only twopieces of information are required
to select
an action:
Status(Op3)= complete^ Shipped= false.
Theirrelevance of certain state variables in certain decision contexts meansthat there is no need to enumeratethe
entire state spaceto generatethe optimalpolicy. Instead, the
relevant state variables can be identified in each decision
contextand states that are identical withrespect to the relevant state variables can be aggregatedinto abstract states.
The abstract states are typically represented using a tree
structure, as shownby the small fragment in Figure 6. A
dynamicprogramming
algorithm can then be applied to the
leaf nodesof the tree rather than the individual states. As
the policy improves,other state variables typically become
relevant and must be included in the tree. For example,the
value of the state variable Timeis often relevant whenthe
Ship action is executedsince the completiontime of the part
determines whetherlateness penalties must be subtracted
from the terminal reward. Thus, the numberof abstract
states growsuntil the optimalpolicy is found.
In the best case, the algorithm uncovers numerous
sources of irrelevance in the problemstructure and generates the optimalpolicy using an abstract state spacethat is a
small fraction of the size of the fully-enumerated state
space. In the worst case, the structured dynamicprogrammingalgorithmcan yield no reduction in the effective size
of the problem but incurs the computational overhead of
continually assessing the relevance of state variables.
Unfortunately,it is difficult to estimate the impactof the
structured approach a priori. Thus, the only wayto know
whetheran otherwise unsolvable decision-theoretic planning problem is solvable using structured dynamicprogramming
is to solve it.
Fortunately, problemswith the sameunderlyingstructure
share the same sources of independenceand irrelevance.
For example, the family of state variables ElapsedTime(Opk)
is importantfor determiningthe probability that
an operationwill be completein the next unit of processing
time given the numberof units of processing time already
invested in the operation. However,these variables always
24
Figure6: A small fragmentof a tree representation of
an agent-level planning and valuation problem.The
optimal policy and expected value of each abstract
state (leaf node)is shown.
cease to be relevant onceoperation Opkis completeregardless of the nature of the operation, the distribution of completion times, or any other problem-specificfactor. In such
cases, the magnitudeof the reductions achievedusing structured dynamicprogrammingfor one problem instance are
typically generalizableto all instances with the sameunderlying structure.
4.3 Empirical Results
Theprimary research question addressed in this paper is
whether structured dynamicprogrammingyields favorable
results whenapplied to decision-theoretic planning problems formulated over a joint physical and resource state
space. To answerthis question, a numberof small but typical agent-level planning and valuation problemswere formulated and solved using a variation of Dearden and
Boutilier’s algorithm [12] that incorporates non-propositional state variables and a limited form of teachability
analysis(see [7]).
The test problemsbelongto the flow shopclass of scheduling problemsin whicheach part requires n operations on
n machines. Each operation Opi can only be performed on
machineMi and precedence constraints require that each
Opi be completebefore Opi+l can be started. The duration
of each operation is represented by either a single value
(deterministic problems)or a probability histogram (stochastic problems). The global objective is to maximizethe
total profit of the productionsystem. Interestingly, even
very small schedulingproblemsof this formare difficult to
solve exactly. For example, the three-machine, three-part
flow shop scheduling problem is knowto be strongly NPhard [23].
Theresults for a numberof deterministic and stochastic
problemsare shownin Figure 7. In each class, the problem
size is varied by changingthe length of the planning hod-
zons. As the results indicate, the structured dynamicprogramming approach does indeed delay the onset of
impractically large state spaces. A probleminstance with
three operations and a planning horizon of 11 time units
induces an explicit state space of 1014states. However,
the
structured algorithm requires fewer than 16,000 abstract
states to determinethe agent’s optimal policy and generate
sufficient reservation price informationto permit the agent
to participate in a combinatorialauction.
Theresults also indicate that there are limits to the structured dynamic programming approach. The number of
abstract state spacesgrowsat a slowerrater than the explicit
state space; however, the growth is exponential in both
cases. Moreover,the growthin the abstract state space is
muchfaster whenthe underlyingproblemis stochastic.
4.4
of "expired" resources and thus the size of the agent’s
resource state space decrease as the value of Timein its
physical state spaceincreases.
The secondstructural pattern, feasibility, emergesfrom
the interaction of three common
properties of scheduling
problems: resource complementarities(multiple resources
on one or moremachinesare required to completethe part),
holding costs that are a function of the value addedto the
part, and a bindingdeadline. Ifa part agent does absolutely
nothing but wait, it receives no terminal reward and,
depending on the problemenvironment, maynot incur any
costs. Underthese circumstances, the lower boundon the
agent’s expected utility in any state is zero since it can
alwaysdo nothing. Accordingly,agents typically price all
resource goodsat zero wheneverthey are in a physical state
in whichthere is little chanceof attaining the terminal
rewardbefore their deadline.
Thethird structural pattern that occurs in the resource
state space is dominance.An allocation xi dominatesallocation x/ if Vx(si nxi) = V=(si nxi’) and xj cx/. Intuitively, the algorithm recognizes non-increasing returns
from additional resources. Returning to the examplein
Figure 6, the left-most branch of the tree correspondingto
the allocation x,. = {Owns(M2,T3)} has an expected
value of 95. If Owns(M2,
T3) is false, the agent can use two
units of processing on machineM3as an imperfect substitute and realize an expectedvalue of 89. However,the algorithm
never
considers
the
allocation
Structure of the Resource State Space
Ananalysis of the algorithm’s performanceon the test
problems suggests that the decision-theoretic planning
problems faced by market-based agents contains three
"structural patterns" that can be exploited by state space
reductionalgorithms.
Thefirst structural pattern is temporalirrelevance. The
representation of goodsused for the resource state space
defines each goodin terms of a machineand a unit of time.
Naturally, as time progresses, units of processingtime that
occur in the past can have no impact on the future streams
of costs and rewardsthat accrueto the agent. Thestructured
dynamicprogramming
algorithm recognizes the irrelevance
6OOOO
5OOOO
4OOOO
[
e Deterministic
3OOOO
¯ Stochastic
2OOOO
100o0
0
,
1.E+O0 1.E+02
¯ ¯ --"
1.E+04 1.E-+06 1.E+08
1.E+10
1.E4-12
1.E+14 1.E+16
number
of explicit states(log scale)
Figure 7: The structured dynamicprogramming
algorithm delays the onset of computationalimpracticality.
25
xi’ = {Owns(M2,T3), Owns(M3,T3), Owns(M3,
becauseall resource goodshavea non-zeroacquisition cost
and the additional goods provide no increase in expected
utility for the agent. In short, the agent has nothingto gain
by owningxi’ over x~.
The reduction in the size of the resource state space
enabledby these three structural patterns is significant. For
example, a problem with three machines and a decision
horizonof eight time units generatesa resourcestate space
of 23 x s = 16.8 million unique resource allocations. However, the numberof allocations actually evaluated in each
physical state varies between508 and 2 according to the
frequencydistribution in Figure 8. Thus, in the majority of
physical states, the agent plans over fewerthan 50 distinct
resource bundles.
20
15
number
of resource
allocationsevaluated
Figure8: The numberof resource allocations
considered by the structured dynamicprogramming
algorithmin each physical state dependson the
attributes on the state.
5
Conclusions
Theuse of decision-theoretic planningto generate optimal solutions to large, complexresource allocation problems is infeasible due to the massivestates spaces that are
required to represent the problems.Althoughemergingstate
space reduction techniquessuch as structured dynamicprogramming
and reachability analysis can greatly reduce the
effective size of a decision-theoretic planning problem,
these techniques merely attenuate the exponential explosion; they donot eliminateit.
Market-baseddecompositionof the same large resource
allocation problemsprovides a morerealistic opportunityto
exploit the powerof decision-theoretic planning techniques
since the agent-level planningproblemsare relatively small.
Moreover,decision-theoretic planners provide a literal and
exact implementationof economicrationality and are there-
26
fore ideal for market-basedsystems based on fundamental
microeconomictheory.
The difficulty that arises in practice is due to the price
mechanism.Price is a critical element of market-based
decompositionsince it permits the agents to plan independently and simultaneously in spite of contention for the
same scarce resources. However, the price mechanism
imposesa significant computational burden on the agents
since they must plan over a large numberof resource contingencies in order to determinetheir reservation prices for
the resources.
Theempiricalresults presentedin this paper suggest that
the computational burden imposedby the price mechanism
can be reduced by exploiting structural patterns in the
resource state space. Specifically, structured dynamicprogramming
can be used to computethe reservation prices of
only those resource allocations that are relevant to the
agent. This minimalset of reservation prices can be combined to determine the agent’s reservation price for any
allocation.
Animportantissue that is not addressedin this paper is
the feasibility of determining the equilibrium outcomein
the resulting combinatorial auction. The winnerdetermination problemfor the auction requires search over a contract
graph that is exponential in the numberof agents and
resource goods[30]. However,a numberof heuristic search
techniques have beenproposedthat performvery well, even
on very large probleminstances [22, 29]. Thus, the main
advantage of the market-basedapproach described here is
that it transforms hard, stochastic optimization problems
into hard deterministic winnerdetermination problemsthat
are moreamenableto heuristic solution techniques.
Giventhe ultimate requirementsfor inexact techniquesat
the market level, one might question the value of using a
computationallyintensive exact approachsuch as decisiontheoretic planning at the agent level. Althoughthe issue
remains unresolved,there are a numberof practical reasons
whythe use of decision-theoretic planning to implement
market-basedagents is appropriate:
1. Computationis distributed -- The agents can solve
their planning and valuation problemsindependently and
simultaneously.Thus, the computationalintensity of the
techniqueis less critical than it wouldbe if the agentlevel problemswere serialized.
2. Decision-theoretic planning agents encapsulate uncertainty -- Manyreal-world resource allocation problems
are characterized by risk and uncertainty. Decision-theoretic planning provides an ideal meansof incorporating
probabilistic information into decision-making. Moreover, the expected values (in the form of reservation
prices) generated by the planning algorithm encapsulate
the uncertainty in the system. The final winnerdetermination problemcan then be formulatedas a deterministic
problem.
3. Accurate preference information simplifies the winner determinationprocess -- Search over the contract
graphis simplified if each agent has a partially ordered
list of resourceallocations and intrinsic utilities. Thisis
especially true if the search is implemented
as a continuous auction in which individual agents must initiate
exchanges.
4. The impact of improvementsis additive -- Improvementsin both hardwareand techniques for solving decision-theoretic planning problemshave an additive effect
in multiagent systems. As such, even incremental
improvements
can havea significant impact on the feasibility of applyingthe market-basedapproachto industrial
scale problems.
Of course, this is not to suggest that the decision-theoretic planningtechniquesused to date in this research are
sufficient for industrial-scale problems.Instead of agents
with three operations and a time horizon of 11 time units,
problemsof a useful size require agents to plan over longer
time horizons and managea dozenor so operations. These
"useful" agent-level problemsare clearly very large. However, they are not so large that they are beyondconsideration. Improvedtechniques for state space reduction,
teachability analysis, and approximationcould makevery
large scale market-basedsystems based on decision-theoretic planningagentsa reality.
[10] ThomasDean and Shieu-Hong Lin, 1995, "Decomposition Techniquesfor Planningin Stochastic Domains,"
in the proceedingof 14th International Joint Conference on Artificial Intelligence (IJCAI-95), Montreal,
PQ, pp. 1121-1128.
[11]ThomasDean, Leslie Pack Kaelbling, Jan Kirmanand
Anne Nicholson, 1993, "Planning with Deadlines in
Stochastic Domains,"in the proceedings of the Eleventh National Conferenceon Artificial Intelligence,
Washington,DC, pp. 574-579.
[12]Richard Deardenand Craig Boutilier, 1997, "Abstraction and ApproximateDecision Theoretic Planning,"
Artificial Intelligence, Vol. 89, pp. 219-283.
[13] Jon Elster, 1983, Sour Grapes:Studies in the Subversion of Reality, CambridgeUniversity Press, Cambridge, UK.
[14]Richard Engelbrecht-Wiggans,1983, "An Introduction
to the Theoryof Biddingfor a Single Object," in Auctions, Bidding, and Contracting: Uses and Theory,
Richard Engelbrecht-Wiggans, Martin Shubik and
Robert M. Stark, eds., NewYorkUniversity Press, New
York, pp. 53-104.
[15]Nicholas R. Jennings, Katia Sycara and Michael J.
Woolridge, 1998, "A Roadmapof Agent Research and
Development," AutonomousAgents and Multi-Agent
Systems,Vol. 1, No.1, pp. 7-38.
[16]DonaldW. Katzner, 1988, Walrasian Microeconomics:
An Introduction to the EconomicTheory of Market
Behavior, Addison-Wesley,Reading, MA.
[17]AndreuMas-Colell, Michael D. Whinstonand Jerry R.
Green, 1995, MicroeconomicTheory, Oxford University Press, NewYork.
[18]R. Preston McAfeeand John McMillan, 1996, "Analyzing the Airwaves Auction," Journal of Economic
Perspectives, Vol. 10, No.1, pp. 159-176.
[19]Nicolas Meuleau, Milos Hauskrecht, Kee-EungKim,
Leonid Peshkin, Leslie Pack Kaelbling, ThomasDean
and Craig Boutilier, 1998, "Solving VeryLarge Weakly
CoupledMarkovDecision Processes," in the proceedings of the 15th National Conference on Artificial
Intelligence, pp. 165-172.
[20]Paul Milgrom, 1989, "Auctions and Bidding: a
Primer," Journal of EconomicPerspectives, Vol. 3, No.
2, pp. 3-22.
[21]Tracy Mullen and Michael E Wellman, 1995, "Some
Issues in the Design of Market-OrientedAgents," in
Intelligent Agents II: IJCAI’95Workshopon AgentTheories, Architectures, and Languages,MichaelJ. Woolridge, MilindTambeand J6rg E Mtiller, eds., SpringerVerlag, Berlin, pp. 283-298.
[22]DavidC. Parkes, 2000, "iBnndle: An Efficient Ascending Price BundleAuction," in the proceedings of the
ACMConference on Electronic Commerce(EC-99),
pp. 148-157.
[23]Michael Pinedo, 1995, Scheduling: Theory, Algorithms, and Systems, Prentice-Hail, Englewood
Cliffs,
NJ.
[24]Martin L. Puterman, 1994, Markov Decision Processes: Discrete Stochastic DynamicProgramming,
John Wiley &Sons, NewYork.
References
[1] AndrewG. Barto, Steven J. Bradtke and Satinder P.
Singh, 1995, "Learningto act using real-time dynamic
programming,"
Artificial Intelligence, Vol. 72, No.1-2,
pp. 81-138.
[2] Gary S. Becker, 1976, The Economic Approach to
HumanBehavior, University of Chicago Press, Chicago, IL.
[3] Alan H. Bondand Les Gasser, 1988, Readings In DistributedArtificial Intelligence, MorganKaufmann,San
Francisco, CA.
[4] Craig Boutilier, RonenI. Brafinan and Christopher
Geib, 1998, "Structured Reachability Analysis for
MarkovDecision Processes," in the proceedingsof the
14th AnnualConferenceon Uncertainty in Artificial
Intelligence (UAI-98), Madison, WI(24-26 July)
24-32.
[5] Craig Boutilier, ThomasDeanand Steve Hanks, 1999,
"Decision-Theoretic Planning: Structural Assumptions
and ComputationalLeverage," Journal of Artificial
Intelligence Research,Vol. 11, pp. 1-94.
[6] Craig Boutilier, Richard Dearden and Mois~sGoldszmidt, 1995, "Exploiting Structure in Policy Construction," in the proceedings of the 12th National
Conferenceon Artificial Intelligence, Seattle, WA,pp.
1104-1111.
to
[7] Michael J. Brydon, 2001, A Market-BasedApproach
Resource Allocation in Manufacturing, unpublished
Ph.D.dissertation, Universityof British Columbia.
[8] John Q. Cheng and Michael P. Wellman, 1998, "The
WALRAS
algorithm: A convergent distributed implementationof general equilibrium outcomes,"Computational Economics,Vol. 12, pp. 1-24.
[9] Scott H. Clearwater, 1996, Market-BasedControl: A
Paradigmfor Distributed ResourceAllocation, World
Scientific, Singapore.
27
[25]HowardRaiffa, 1968, Decision Analysis: Introductory
Lectures on Choices and Uncertainty, Addison-Wesley,
Reading, MA.
[26]Eric Rasmusen, 1989, Gamesand Information: An
Introduction to GameTheory, Blackwell, London.
[27] StephenJ. Rassenti, VernonL. Smithand R. L. Bulrin,
1982, "A Combinatorial Auction Mechanismfor Airport TimeSlot Allocation," Bell Journalof Economics,
Vol. 13, pp. 402-417.
[28] Jeffrey S. Rosenscheinand Gilad Zlotkin, 1994, Rules
of Encounter: Designing Conventions for Automated
Negotiation Among Computers, MIT Press, Cambridge, MA.
[29]Michael H. Rothkopf, Alexsander Pekec and Ronald
M. Harstad, 1998, "Computationally Manageable
Combinational Auctions," ManagementScience, Vol.
44, No. 8, pp. 1131-1147.
[30] TuomasW.Sandholm,1998, "Contract Typesfor Satisricing Task Allocation: I Theoretical Results," in the
proceedings of the AAAI1998 Spring Symposium:
Satisficing Models,Stanford, CA(March23-25).
[31]Tuomas W. Sandholm and Subhash Suri, 2000,
"ImprovedAlgorithms for Optimal WinnerDetermination in CombinatorialAuctions and Generalizations,"
in the proceedingsof the National Conferenceon Artificial Intelligence (AAAI),Austin, TX(July 31-August
2) pp. 90-97.
[32]Robert A. Schwartz, 1988, Equity Markets: Structure,
Trading, and Performance,Harper & Row,NewYork.
[33]WilliamE. Walsh, MichaelP. Wellman,Peter R. Wurmanand Jeffrey K. MacKie-Mason,1998, "SomeEconomics of Market-BasedDistributed Scheduling," in
the proceedings of the 18th International Conference
on Distributed ComputingSystems, pp. 612-621.
[34] MichaelP. Wellman,1996, "Market-OrientedProgramming: SomeEarly Lessons," in Market-BasedControl:
A Paradigmfor Distributed ResourceAllocation, S.H.
Clearwater,ed., WorldScientific, Singapore,pp. 74-95.
28