From: AAAI Technical Report WS-02-06. Compilation copyright © 2002, AAAI (www.aaai.org). All rights reserved.

Copyright © 2000, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.

Using Decision-Theoretic Planning Agents in Market-Based Systems

Michael Brydon
Faculty of Business Administration and Centre for Systems Science
Simon Fraser University
Burnaby, British Columbia, Canada
mjbrydon@sfu.ca

Abstract

This paper examines a number of theoretical and practical issues concerning the use of decision-theoretic planning to implement agents in market-based systems. The markets considered here result from the decomposition of complex, intra-organizational resource allocation problems such as manufacturing scheduling. Although these problems can be formulated as monolithic optimization problems, they tend to be much too large to solve in practice. Markets provide a means of decomposing large resource allocation problems and distributing the computation of a solution over many processors. An important precondition of efficient markets is agent-level rationality. Decision-theoretic planning can be used to implement economic rationality and thus decision-theoretic planning agents fit well into market-based approaches. The primary challenge in building decision-theoretic planning agents is the size of the agent’s state space. Although market-based decomposition results in agent-level problems that are much smaller than the original resource allocation problem, the price mechanism used to achieve independence of the agent-level problems requires that the agents plan over a large number of different resource contingencies. This requirement exacerbates the state space explosion that characterizes decision-theoretic planning. The application of state space reduction techniques such as structured dynamic programming and reachability analysis is shown to yield significant reductions in the effective size of the agent-level problems and thereby increase the applicability of decision-theoretic planning techniques in market-based systems.

1 Introduction

Real-world resource allocation problems such as scheduling production machinery in large manufacturing facilities could benefit greatly from the application of decision-theoretic planning techniques. Such allocation problems are characterized by uncertainty and complex trade-offs between objectives such as time, flexibility, and cost. However, the standard techniques used to allocate resources rely on numerous simplifying assumptions, such as deterministic outcomes and single-attribute objective functions such as minimizing the tardiness or makespan of a set of jobs [23]. A decision-theoretic planning system for manufacturing scheduling would explicitly manage the uncertainty in the production environment and would permit managers to express their objectives in terms of complex, discontinuous, multiattribute utility functions.

Unfortunately, the richness of the decision-theoretic planning approach leads to a well-documented explosion in the number of states required to represent the problem [5, 11]. Several techniques, such as state aggregation [5, 6, 12] and reachability analysis [1, 4], have been used successfully to delay the onset of impractically large state spaces. However, it is unlikely that such techniques by themselves are sufficient to permit the formulation and solution of large, industrial-scale decision-theoretic planning problems. Instead, some form of problem decomposition is required [10, 19]. A very general approach to problem decomposition that is attracting a great deal of research interest is based on the notion of agents interacting within markets [8, 9, 21, 33, 34].

The purpose of this paper is to examine the suitability of decision-theoretic planning agents for certain types of market-based systems. Section 2 begins by defining the environmental context of the resource allocation problems considered in this paper. Section 3 works backwards from economic theory and the characteristics of the original resource allocation problem to identify the core requirements for the market-based agents. The suitability of decision-theoretic planning for implementing market-based agents is examined in Section 4. The fundamental question addressed in that section is not whether decision-theoretic planning is an appropriate framework for fulfilling the requirements imposed by economic theory. Instead, the important question is one of computational practicality. Although the agent-level planning problems are exponentially smaller than the undecomposed problem, they typically remain far too large to represent and solve using conventional dynamic programming techniques (see [24]). In order to be solvable in practice, the agent-level planning problems must possess internal structure that can be exploited by state space reduction techniques. The empirical evidence presented in that section suggests that certain important classes of resource allocation problems do possess internal structure that can yield significant reductions in the effective size of the agent-level planning problem. Section 5 concludes with some general observations regarding the use of decision-theoretic planning agents in market-based systems.

2 Market-Based Decomposition

A distinction is often made in the distributed artificial intelligence (DAI) literature between cooperative distributed problem solving (CDPS) environments and multiagent systems (MAS) environments [3, 15]. The critical difference between CDPS and MAS environments is the existence of a global utility function. In CDPS environments, a global utility function exists; the challenge is to decompose the problem such that the maximization behavior of the agents results in maximization of the global utility function.
The final utilities of the agents themselves are irrelevant since the agents exist solely to facilitate the distribution of computation. In contrast, agents in MAS environments are typically used to model the distributed, multiagent structure of a naturally decentralized problem [28]. For example, agents can be used to represent different firms in a complex supply chain coordination problem. Since the fundamental principles of decision theory prohibit the aggregation or comparison of utilities across individual agents [25], no meaningful measure of global utility exists in such environments. Identifying a "good" or "fair" solution therefore requires a commitment to a particular game-theoretic bargaining solution [26, 28].

The task of allocating scarce production resources in a manufacturing facility is an instance of the CDPS class. Although the term "cooperative" is misleading in this case (since economic agents are strictly self-interested), intra-organizational resource allocation problems have well-defined utility functions. The task facing the designer of the agents is to decompose the global utility function into agent-level utility functions. In the following section, a decomposition is proposed that achieves both an additive relationship between global utility and the agent-level utilities and independence between the agent-level problems.

2.1 Decomposing the Scheduling Problem

A natural way to decompose a resource allocation problem in a manufacturing environment is to create computer-based agents for each part and machine in the production system (part agents and machine agents respectively). All the costs and revenues in the problem domain can then be allocated to individual agents as shown in Figure 1.

The only source of revenue in the system is the payment that occurs when a completed part is shipped to the next stage in the system’s supply chain, such as a final customer, a distributor, or finished goods inventory.
Incoming revenue is modeled using a terminal reward that is received by a part agent when the part it represents leaves the production system. The size and functional form of the reward are determined exogenously but are known to the part agent.

Figure 1: An allocation of global revenues and costs to a part and machine agent.

On the cost side, both part agents and machine agents incur costs. In the case of part agents, each part is allocated the costs of its constituent raw materials as well as the cost of holding the part in work-in-process inventory. The costs of wear and tear, consumables, and setups are allocated to individual machines. The arrows marked "inter-agent transfers" are discussed in more detail in Section 3. At this point, it is sufficient to recognize that all inter-agent transfers are zero-sum and therefore have no net impact on global utility. As a result, the global utility of the production system (revenues - costs) can be expressed as the sum of the individual agent utilities:

\[ U = \sum_i U_i \qquad (1) \]

To simplify exposition in this paper, we make the assumption that variable, setup, and maintenance costs are zero. This permits us to eliminate machine agents from the decomposed problem and focus on the more complex requirements of part agents.

2.2 Microeconomic Theory and CDPS Environments

The purpose of microeconomic theory is to describe the interaction of individual rational agents pursuing their own selfish objectives [2, 17]. The term rationality, as it is used in the microeconomic context, corresponds to Elster’s notion of thin rationality [13]. Thin rationality requires only that an agent maximize its utility according to a consistent set of preferences over outcomes and that its reasoning about the expected values of outcomes be consistent with the fundamental axioms of probability theory.
Although the microeconomic formulation of agents is too simplistic to provide an accurate description of human agents, the objective in a CDPS environment is not to simulate reality. Instead, the objective is to distribute computation and aggregate the results so that global utility is maximized. The designer of problem solving systems in a CDPS environment has the luxury of complete control over the implementation and behavior of the agents in the system. This is clearly not the case for the designers of real markets (e.g., the New York Stock Exchange [32]) or problem solving systems in MAS environments. It is therefore possible for a CDPS system to be constructed in strict accordance with microeconomic theory.

The advantage of using microeconomics as a blueprint for a market-based system is that the theory provides some a priori insights and guarantees about the overall outcome of the market. Specifically, the First Fundamental Theorem of Welfare Economics states that a competitive equilibrium induces an allocation that is Pareto optimal [17, 16]. That is, if rational, self-interested agents are permitted to compete in a market, the outcome at equilibrium is that no agent can be made better off without making some other agent worse off. Although Pareto optimality in the general case is insufficient for global optimality, it is possible to achieve equivalence between the two in cases in which global utility is the sum of agent utilities.

2.3 Maximization Behavior of Agents

According to the decomposition in Figure 1, an agent in a manufacturing system maximizes its utility by attaining its terminal reward while simultaneously minimizing its costs. The costs and rewards that the agent accumulates during its progress through the production system are a function of two things: chance and the agent’s control of production resources. It is therefore possible to express agent i’s expected utility as a function of its allocation of resources \(x_i\): \(U_i = f_i(x_i)\).
Production resources are defined as discrete units of processing time on a particular machine at a particular time. A family of propositional variables Owns(Mj, Tk) can be used to denote whether the agent has exclusive rights to processing on machine Mj at time Tk. The total number of production resources in the system, m, is the number of machines in the production system multiplied by the number of time units in the planning horizon.

The only way that agents can increase their utility is to exchange resources with other agents. Such exchanges can occur between rational agents if and only if neither agent is worse off as a result of the exchange and at least one agent is strictly better off. One means of facilitating such individually rational (IR) exchanges between agents is to augment the m-good economy with a numeraire good. A numeraire good is a good that possesses the properties of fungibility and divisibility but which has no intrinsic utility to the agents other than its acceptance within the economy as a unit of exchange [17]. Acceptance of the good as a unit of exchange is achieved by adopting and enforcing the convention that all agents express their preferences for resource goods in terms of the numeraire good M rather than their own idiosyncratic and incomparable units of utility. The result is a quasi-linear utility function for agent i:

\[ U_i = M_i + \phi_i(x_i) \qquad (2) \]

To construct (2), the agent scales its utility function to make it linear in good M. The term \(\phi_i(x_i)\) corresponds to the agent’s intrinsic utility for resource goods. In a manufacturing scheduling environment, the form of \(\phi_i(x_i)\) is determined by the complex interaction of costs and rewards faced by the part. Intuitively, high-value parts with high holding costs and binding deadlines are bound to value certain production resources more than low-value parts that are built to stock.

2.4 Determining Reservation Prices

Determining the rationality of a particular exchange requires agents to know their own private reservation prices for goods.
Agent i’s reservation price for resource good \(x_j\) is the change in the quantity \(M_i\) that makes the agent indifferent between an allocation that contains \(x_j\) and one that is identical in every respect except that it does not contain \(x_j\). Thus, given \(x_i' = x_i \cup x_j\), \(\Delta M_i(x_j) = \phi_i(x_i) - \phi_i(x_i')\). Note that the reservation price is symmetrical. It contains no spread between the price at which the agent is willing to buy the resource (bid price) and the price at which the agent is willing to sell the resource if it already owns it (ask price). In addition, the quantity \(\Delta M_i\) is independent of the agent’s endowment of M. An implication of the quasi-linear utility function is that agents are risk neutral with respect to good M and thus all exchange decisions are made on the basis of marginal changes in M.

An important issue that arises in resource allocation problems is that valuations for resource goods are seldom independent. Two common forms of resource interdependence are substitutability and complementarity. Two goods \(x_1, x_2\) are substitutes if the utility of owning them in the same allocation is subadditive: \(U_i(x_1, x_2) < U_i(x_1) + U_i(x_2)\). Conversely, the goods are complements if the utility of owning them in the same allocation is superadditive: \(U_i(x_1, x_2) > U_i(x_1) + U_i(x_2)\). Substitutability occurs in manufacturing environments whenever a contract on a particular machine at a particular time can be used in place of a contract on a different machine or at a different time. Complementarity occurs whenever an operation requires more than one unit of processing time or involves precedence constraints.

The deeply embedded interdependencies of goods in resource allocation problems mean that the agents invariably participate in a combinatorial auction [18, 27, 29]. In a combinatorial auction, an agent cannot determine its reservation price for a good by considering the good in isolation. Instead, the agent must consider as many as \(2^m\) unique allocations of resources and calculate its reservation price for all transitions between allocations.
For example, in the m = 2 economy shown in Figure 2, the agent’s reservation price for good \(x_2\) depends on its ownership of \(x_1\). If the agent’s initial allocation is (x1 = 0, x2 = 0, M = 100), its reservation price for \(x_2\) is $30 - $100 = -$70. In other words, the agent is willing to pay up to $70 to acquire the resource. However, if the agent’s initial state is (x1 = 1, x2 = 0, M = 50), its reservation price is only $0 - $50 = -$50. The agent therefore considers the goods to be partial substitutes since it values \(x_2\) less when it already owns \(x_1\).

Figure 2: An indifference graph for an economy consisting of two indivisible resource goods (\(x_1\) and \(x_2\)) and a numeraire good M. The vertices are the allocations (x1 = 0, x2 = 0, M = 100), (x1 = 1, x2 = 0, M = 50), (x1 = 0, x2 = 1, M = 30), and (x1 = 1, x2 = 1, M = 0).

3 Interagent Exchange

The decomposition of a resource allocation problem results in a pure exchange economy [17]. No production or consumption of the m resource goods or the single numeraire good occurs during the operation of the market. Instead, the agents exchange goods until no further IR transactions are possible. The equilibrium outcome is said to be efficient if each good is owned by the agent with the highest reservation price for the good.

Figure 3: A contract graph for a two-agent, two-good economy.

A contract graph [30] provides a convenient means of visualizing the exchanges that can occur in an n-agent, m-good economy. Each vertex of the graph describes an allocation of goods to agents. The edges represent exchanges of goods (or contracts) between agents. Figure 3 shows a very simple contract graph for an economy with two agents and two resource goods. The intrinsic utilities of the agents are shown for each vertex as well as the sum of the utilities.
Although transfers of M are essential for the execution of the exchanges, interagent transfers of M in a pure exchange economy with quasi-linear utility functions are zero-sum. The total utility of money, \(\sum_i U_i(M_i)\), is constant in all allocations and thus the flows of M can be left implicit on the contract graph.

One advantage of the contract graph representation is that it permits the operation of the market to be visualized as local search over allocations. The different auction forms used to implement the market result in different search operators. However, in this case, the agents are risk-neutral with respect to the auction and possess independent private valuations of the goods. Consequently, the major auction forms are equivalent in terms of economic efficiency [20].

Because of the equivalence of auction forms, the market can be implemented as a continuous auction rather than a call auction. In a call auction, a central auctioneer collects reservation prices from potential buyers of a good and awards the good to the highest bidder according to some price function such as first- or second-price [14, 20]. In a continuous auction, agents are always in a position to buy or sell resource goods and transactions are executed without the intervention of an auctioneer.

The primary difference between a call auction and a continuous auction is the number and sequence of contracts required to attain the equilibrium allocation. Since the reservation prices of the bidders are sorted in a call auction, a single contract is all that is required to ensure an efficient outcome. In a continuous auction, the path through the contract graph depends on the order in which the agents interact. Although the distribution of the economic surplus to agents may be different in a continuous auction, the only outcome of interest in a CDPS environment is global utility. A continuous auction therefore provides a convenient means of avoiding the synchronization requirement of a call auction.
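The view of a continuous auction as local search over the contract graph can be illustrated with a minimal sketch. It is not the mechanism used in the paper: the valuation tables, the function names, and the restriction to single-good contracts are all assumptions made for illustration. Each accepted contract moves one good from a seller to a buyer only when the buyer's gain in intrinsic utility exceeds the seller's loss, so that some transfer of M makes the exchange individually rational for both.

```python
from itertools import permutations

def reservation_gain(phi, bundle, good):
    """Change in intrinsic utility from adding `good` to `bundle`."""
    return phi[bundle | {good}] - phi[bundle]

def continuous_auction(phis, allocation):
    """Hill-climb over the contract graph using single-good IR exchanges.

    phis: one dict per agent mapping a frozenset bundle to intrinsic utility.
    allocation: one frozenset bundle per agent (the starting vertex).
    """
    allocation = list(allocation)
    improved = True
    while improved:
        improved = False
        for seller, buyer in permutations(range(len(phis)), 2):
            for good in sorted(allocation[seller]):
                gain = reservation_gain(phis[buyer], allocation[buyer], good)
                loss = reservation_gain(
                    phis[seller], allocation[seller] - {good}, good)
                if gain > loss:  # a price between loss and gain suits both
                    allocation[seller] = allocation[seller] - {good}
                    allocation[buyer] = allocation[buyer] | {good}
                    improved = True
                    break  # bundles changed, so rescan from the start
            if improved:
                break
    return allocation

def bundle(*goods):
    return frozenset(goods)

# Hypothetical valuations for two agents and two goods (not from Figure 3).
phis = [
    {bundle(): 0, bundle("x1"): 3, bundle("x2"): 10, bundle("x1", "x2"): 10},
    {bundle(): 0, bundle("x1"): 15, bundle("x2"): 2, bundle("x1", "x2"): 15},
]
start = [bundle("x1", "x2"), bundle()]
final = continuous_auction(phis, start)
total = sum(phi[b] for phi, b in zip(phis, final))
```

Because the sum of intrinsic utilities rises strictly with every accepted contract, the search terminates. With complementary goods, single-good contracts alone can stall at a local optimum, which is one reason bundle-level reservation prices matter.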
4 Decision-Theoretic Agents

Regardless of the auction form used to implement the market, the agents that participate in the market must be able to determine their reservation prices for arbitrary bundles of resource goods. The agents use their reservation prices in two ways. First, an agent acting in the buyer role needs to identify desirable allocations and determine its willingness to pay for the resources it does not own. Conversely, an agent in the seller role must be able to respond to ask price inquiries from other agents. To complicate matters, agents in a combinatorial auction may be selling and buying within the scope of a single transaction. For example, the contract represented by the horizontal line in Figure 3 has Agent 1 selling \(x_1\) and buying \(x_2\) at the same time.

To this point, nothing has been said about the precise manner in which an agent i determines \(\phi_i(x_i)\) where \(x_i \in X_i\) and \(|X_i| = 2^m\). However, it is important to recognize that an agent’s expected return on its investment in a production resource depends on the way in which the agent uses the resource. Thus, the valuation problem is tightly coupled with an agent-level planning problem.

4.1 Physical and Resource State Spaces

An agent is a computer-based process that represents the interests of an object (in this case, a part) in the real world (in this case, a manufacturing facility). A decision-theoretic planning agent generates a policy \(\pi\) that specifies the agent’s best action in every state of the state space of the part it represents [5, 6, 12]. The variables used to represent the "physical" state of a part agent are shown in Figure 4. The actions available to the agent are shown in Figure 5.
Figure 4: State variables for part agents

State variable / family of state variables | Description | Domain
Time | current system time in discrete units | {1, ..., T}
Status(Opi) | the completion status of each operation Opi in the part’s project plan | {complete, incomplete}
Shipped | whether the part has received its terminal reward | {true, false}
ElapsedTime(Opi) | the number of units of processing time already invested in operation Opi | {1, ..., max(d_i)}

Figure 5: Actions for part agents

Action / family of actions | Description
Wait | do nothing (the value of Time is incremented)
Process(Opi, Mj, Tk) | process operation Opi on machine Mj for one unit of time starting at time Tk
Ship | exit the system and receive a terminal reward

If the part requires three operations and is in a state that satisfies Status(Op1) = complete ∧ Status(Op2) = complete ∧ Status(Op3) = complete ∧ Shipped = false, then the agent’s policy states that it should execute the Ship action (and thereby trigger the payment of its terminal reward). Action preconditions ensure that the Ship action is not executed until the part’s operations are complete and operations can only be completed by executing Process() actions.

The uncertainties that exist in manufacturing environments regarding the durations of operations, quality problems, and machine breakdowns mean that decision-theoretic planning at the agent level is seldom trivial. The planning problem is further complicated for market-based agents by contention for scarce resources. An agent cannot simply choose to execute an action such as Process(Op3, M2, T3) since other agents may also want exclusive use of machine M2 at time T3. Meuleau et al. [19] address the problem of resource dependency by using heuristic search techniques to merge the policies of the agents into a consistent global allocation of resources.

An alternative approach to centralized dispute resolution is IR exchange based on market prices, as discussed in Section 3. However, in order to determine their reservation prices for goods, the agents must plan over all possible resource contingencies. For every state \(s_i\) in the agent’s physical state space, \(s_i \in S_i\), the agent must consider its best action given every possible allocation of resources \(x_i \in X_i\). The decision-theoretic planning algorithm generates the agent’s optimal policy over the joint physical and resource state space and thereby generates reservation prices as a byproduct of its operation.

To illustrate, consider a physical state in which the agent evaluates the expected value of executing the Process(Op3, M2, T3) action. The action has many preconditions including precedence constraints (such as Status(Op2) = complete) and temporal constraints (such as Time = T3). In addition, execution of the action requires that Owns(M2, T3) be true for the agent. However, the ownership status of resources is not controllable by the agent through its action set and must be treated as a random variable. During policy generation, the decision-theoretic planning algorithm selects an action for the state \(s_i \cap x_i\) where Owns(M2, T3) \(\in x_i\) and determines the expected value of that state given the optimal policy \(\pi\): \(V_\pi(s_i \cap x_i)\). The algorithm also selects an action for \(s_i \cap x_i'\) where Owns(M2, T3) \(\notin x_i'\) and determines \(V_\pi(s_i \cap x_i')\). The difference in expected values, \(V_\pi(s_i \cap x_i) - V_\pi(s_i \cap x_i')\), takes into account any complex interdependencies between Owns(M2, T3) and other resources and provides the agent’s true reservation price for the good while in state \(s_i\).

4.2 State Space Reduction

The obvious problem with planning over resource contingencies is that the effective size of the agent’s state space can increase by a factor of \(2^m\). Even a small problem with five machines and a planning horizon of ten time units can increase the size of the agent’s state space by a factor of \(10^{15}\). This growth in the state space effectively nullifies the decrease in problem size achieved through decomposition in the first place.
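The two calculations above can be checked with a short sketch: the number of ownership patterns implied by a given number of machines and time units, and a reservation price read off as a difference of expected values. The value table, the state label "s", and both function names are hypothetical; a real table would come from solving the agent's planning problem.

```python
def resource_contingencies(machines: int, horizon: int) -> tuple[int, int]:
    """Return (m, 2**m): the resource goods and their ownership patterns."""
    m = machines * horizon  # one good per (machine, time unit) pair
    return m, 2 ** m

def reservation_price(values, state, allocation, good):
    """V(s, x with good) - V(s, x without good), from a solved value table."""
    with_good = frozenset(allocation) | {good}
    without_good = frozenset(allocation) - {good}
    return values[(state, with_good)] - values[(state, without_good)]

# Hypothetical value table for one physical state "s"; the numbers are
# illustrative only and are not taken from the paper.
owns_m2_t3 = ("M2", "T3")
values = {
    ("s", frozenset({owns_m2_t3})): 40.0,
    ("s", frozenset()): 25.0,
}

m, patterns = resource_contingencies(machines=5, horizon=10)
price = reservation_price(values, "s", set(), owns_m2_t3)
```

Five machines and ten time units give m = 50 goods and 2^50 (about 1.1 x 10^15) ownership patterns, which matches the order of magnitude quoted above.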
One means of attenuating the explosion in problem size caused by the need to plan over resource contingencies is to use a state space reduction technique such as structured dynamic programming [5, 6, 12]. Structured dynamic programming is based on the observation that not all state information is relevant for all decisions. To illustrate, recall the example used previously in which the Ship action was associated with any state satisfying the clause: Status(Op1) = complete ∧ Status(Op2) = complete ∧ Status(Op3) = complete ∧ Shipped = false. These four state variables are all that is required to choose the optimal action. All other state variables are irrelevant to the decision and can be ignored. Furthermore, if precedence constraints exist between the operations, then only two pieces of information are required to select an action: Status(Op3) = complete ∧ Shipped = false.

The irrelevance of certain state variables in certain decision contexts means that there is no need to enumerate the entire state space to generate the optimal policy. Instead, the relevant state variables can be identified in each decision context and states that are identical with respect to the relevant state variables can be aggregated into abstract states. The abstract states are typically represented using a tree structure, as shown by the small fragment in Figure 6. A dynamic programming algorithm can then be applied to the leaf nodes of the tree rather than the individual states. As the policy improves, other state variables typically become relevant and must be included in the tree. For example, the value of the state variable Time is often relevant when the Ship action is executed since the completion time of the part determines whether lateness penalties must be subtracted from the terminal reward. Thus, the number of abstract states grows until the optimal policy is found.
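The aggregation step can be illustrated with a minimal sketch. It shows only the grouping of explicit states into abstract states, not structured dynamic programming itself, and the planning horizon of ten and the helper name `aggregate` are assumptions. Explicit states that agree on the two variables relevant to the Ship decision collapse into a single abstract state.

```python
from collections import defaultdict
from itertools import product

def aggregate(states, relevant):
    """Group explicit states that agree on every relevant variable."""
    groups = defaultdict(list)
    for state in states:
        key = tuple(state[name] for name in relevant)
        groups[key].append(state)
    return groups

HORIZON = 10  # assumed planning horizon, for illustration only

variables = {
    "Status(Op1)": ["complete", "incomplete"],
    "Status(Op2)": ["complete", "incomplete"],
    "Status(Op3)": ["complete", "incomplete"],
    "Shipped": [True, False],
    "Time": list(range(1, HORIZON + 1)),
}
names = list(variables)

# Every combination of variable values is one explicit state.
explicit = [dict(zip(names, combo)) for combo in product(*variables.values())]

# With precedence constraints, only these two variables matter for Ship.
abstract = aggregate(explicit, relevant=["Status(Op3)", "Shipped"])
```

In this toy case 160 explicit states reduce to 4 abstract states for the Ship decision. As the text notes, variables such as Time can become relevant again as the policy improves, which would split some of these groups.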
In the best case, the algorithm uncovers numerous sources of irrelevance in the problem structure and generates the optimal policy using an abstract state space that is a small fraction of the size of the fully enumerated state space. In the worst case, the structured dynamic programming algorithm can yield no reduction in the effective size of the problem but incurs the computational overhead of continually assessing the relevance of state variables. Unfortunately, it is difficult to estimate the impact of the structured approach a priori. Thus, the only way to know whether an otherwise unsolvable decision-theoretic planning problem is solvable using structured dynamic programming is to solve it.

Figure 6: A small fragment of a tree representation of an agent-level planning and valuation problem. The optimal policy and expected value of each abstract state (leaf node) is shown.

Fortunately, problems with the same underlying structure share the same sources of independence and irrelevance. For example, the family of state variables ElapsedTime(Opk) is important for determining the probability that an operation will be complete in the next unit of processing time given the number of units of processing time already invested in the operation. However, these variables always cease to be relevant once operation Opk is complete regardless of the nature of the operation, the distribution of completion times, or any other problem-specific factor. In such cases, the magnitude of the reductions achieved using structured dynamic programming for one problem instance is typically generalizable to all instances with the same underlying structure.

4.3 Empirical Results

The primary research question addressed in this paper is whether structured dynamic programming yields favorable results when applied to decision-theoretic planning problems formulated over a joint physical and resource state space.
To answer this question, a number of small but typical agent-level planning and valuation problems were formulated and solved using a variation of Dearden and Boutilier’s algorithm [12] that incorporates non-propositional state variables and a limited form of reachability analysis (see [7]).

The test problems belong to the flow shop class of scheduling problems in which each part requires n operations on n machines. Each operation Opi can only be performed on machine Mi and precedence constraints require that each Opi be complete before Opi+1 can be started. The duration of each operation is represented by either a single value (deterministic problems) or a probability histogram (stochastic problems). The global objective is to maximize the total profit of the production system. Interestingly, even very small scheduling problems of this form are difficult to solve exactly. For example, the three-machine, three-part flow shop scheduling problem is known to be strongly NP-hard [23].

The results for a number of deterministic and stochastic problems are shown in Figure 7. In each class, the problem size is varied by changing the length of the planning horizons. As the results indicate, the structured dynamic programming approach does indeed delay the onset of impractically large state spaces. A problem instance with three operations and a planning horizon of 11 time units induces an explicit state space of \(10^{14}\) states. However, the structured algorithm requires fewer than 16,000 abstract states to determine the agent’s optimal policy and generate sufficient reservation price information to permit the agent to participate in a combinatorial auction.

The results also indicate that there are limits to the structured dynamic programming approach. The abstract state space grows at a slower rate than the explicit state space; however, the growth is exponential in both cases. Moreover, the growth in the abstract state space is much faster when the underlying problem is stochastic.
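The flow shop constraints that define the test problems can be sketched as a precondition check on the Process action. This is an illustrative reading of the constraints stated above rather than the paper's implementation: the `PartState` class, the zero-based indices, and the `can_process` helper are all assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class PartState:
    time: int                      # current value of Time
    complete: list[bool]           # complete[i] stands for Status(Op i)
    owns: set[tuple[int, int]] = field(default_factory=set)  # (machine, time)

def can_process(state: PartState, op: int, machine: int, t: int) -> bool:
    """Check the preconditions of Process(op, machine, t) in a flow shop."""
    if machine != op:                          # Op i runs only on machine i
        return False
    if state.complete[op]:                     # nothing left to process
        return False
    if op > 0 and not state.complete[op - 1]:  # precedence constraint
        return False
    if state.time != t:                        # temporal constraint
        return False
    return (machine, t) in state.owns          # Owns(machine, t) must hold

# Hypothetical state: first operation done, agent owns machine 1 at time 3.
state = PartState(time=3, complete=[True, False, False], owns={(1, 3)})
ready = can_process(state, op=1, machine=1, t=3)
blocked = can_process(state, op=2, machine=2, t=3)
```

Here `ready` is true because every precondition holds, while `blocked` is false because the preceding operation is incomplete. The ownership test is the part the agent cannot control through its own actions, which is why the planner must consider both outcomes.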
[Figure 7: The structured dynamic programming algorithm delays the onset of computational impracticality. Horizontal axis: number of explicit states (log scale); series: deterministic and stochastic problems.]

4.4 Structure of the Resource State Space

An analysis of the algorithm's performance on the test problems suggests that the decision-theoretic planning problems faced by market-based agents contain three "structural patterns" that can be exploited by state space reduction algorithms.

The first structural pattern is temporal irrelevance. The representation of goods used for the resource state space defines each good in terms of a machine and a unit of time. Naturally, as time progresses, units of processing time that occur in the past can have no impact on the future streams of costs and rewards that accrue to the agent. The structured dynamic programming algorithm recognizes the irrelevance of "expired" resources, and thus the size of the agent's resource state space decreases as the value of Time in its physical state space increases.

The second structural pattern, feasibility, emerges from the interaction of three common properties of scheduling problems: resource complementarities (multiple resources on one or more machines are required to complete the part), holding costs that are a function of the value added to the part, and a binding deadline. If a part agent does absolutely nothing but wait, it receives no terminal reward and, depending on the problem environment, may not incur any costs. Under these circumstances, the lower bound on the agent's expected utility in any state is zero since it can always do nothing. Accordingly, agents typically price all resource goods at zero whenever they are in a physical state in which there is little chance of attaining the terminal reward before their deadline.

The third structural pattern that occurs in the resource state space is dominance. An allocation x_i dominates allocation x_i' if V_π(s_i ∩ x_i) = V_π(s_i ∩ x_i') and x_i ⊂ x_i'. Intuitively, the algorithm recognizes non-increasing returns from additional resources. Returning to the example in Figure 6, the left-most branch of the tree corresponding to the allocation x_i = {Owns(M2,T3)} has an expected value of 95. If Owns(M2,T3) is false, the agent can use two units of processing on machine M3 as an imperfect substitute and realize an expected value of 89. However, the algorithm never considers the allocation x_i' = {Owns(M2,T3), Owns(M3,T3), Owns(M3, ...)} because all resource goods have a non-zero acquisition cost and the additional goods provide no increase in expected utility for the agent. In short, the agent has nothing to gain by owning x_i' over x_i.

The reduction in the size of the resource state space enabled by these three structural patterns is significant. For example, a problem with three machines and a decision horizon of eight time units generates a resource state space of 2^(3×8) ≈ 16.8 million unique resource allocations. However, the number of allocations actually evaluated in each physical state varies between 508 and 2 according to the frequency distribution in Figure 8. Thus, in the majority of physical states, the agent plans over fewer than 50 distinct resource bundles.

[Figure 8: The number of resource allocations considered by the structured dynamic programming algorithm in each physical state depends on the attributes of the state. Horizontal axis: number of resource allocations evaluated.]

5 Conclusions

The use of decision-theoretic planning to generate optimal solutions to large, complex resource allocation problems is infeasible due to the massive state spaces that are required to represent the problems. Although emerging state space reduction techniques such as structured dynamic programming and reachability analysis can greatly reduce the effective size of a decision-theoretic planning problem, these techniques merely attenuate the exponential explosion; they do not eliminate it.
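To make the scale of the resource state space concrete, the following sketch recomputes the three-machine, eight-period figure quoted in Section 4.4 and applies the dominance test to a toy problem. It is a brute-force illustration under stated assumptions, not the structured dynamic programming algorithm: the goods `M3@a` and `M3@b` are hypothetical placeholders for two units of processing on machine M3, and the value function simply hard-codes the expected values 95 and 89 mentioned in the text.

```python
from itertools import combinations


def resource_space_size(machines, horizon):
    """Number of distinct allocations when each (machine, time unit) pair
    is a good that the agent either owns or does not own."""
    return 2 ** (machines * horizon)


def undominated(goods, value):
    """Keep only allocations that are not dominated.

    An allocation is dominated if a strict subset of it yields the same
    value. Subsets are enumerated by increasing size, so every minimal
    allocation is already in `kept` when its supersets are examined.
    """
    kept = []
    for size in range(len(goods) + 1):
        for combo in combinations(goods, size):
            x = frozenset(combo)
            if not any(k < x and value(k) == value(x) for k in kept):
                kept.append(x)
    return kept


def toy_value(allocation):
    """Hypothetical expected values, loosely following the text's example."""
    if "M2@T3" in allocation:
        return 95  # preferred machine slot is owned
    if {"M3@a", "M3@b"} <= allocation:
        return 89  # two units on M3 as an imperfect substitute
    return 0       # the agent can always do nothing


goods = ["M2@T3", "M3@a", "M3@b"]
kept = undominated(goods, toy_value)
print(resource_space_size(3, 8))
print(kept)
```

In this toy case only three of the eight possible allocations survive, which mirrors the observation that an agent needs reservation prices for a small subset of the allocations it could in principle own. The enumeration itself is still exponential in the number of goods, so the sketch illustrates the pruning criterion rather than a practical way to apply it.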
Market-based decomposition of the same large resource allocation problems provides a more realistic opportunity to exploit the power of decision-theoretic planning techniques since the agent-level planning problems are relatively small. Moreover, decision-theoretic planners provide a literal and exact implementation of economic rationality and are therefore ideal for market-based systems based on fundamental microeconomic theory.

The difficulty that arises in practice is due to the price mechanism. Price is a critical element of market-based decomposition since it permits the agents to plan independently and simultaneously in spite of contention for the same scarce resources. However, the price mechanism imposes a significant computational burden on the agents since they must plan over a large number of resource contingencies in order to determine their reservation prices for the resources.

The empirical results presented in this paper suggest that the computational burden imposed by the price mechanism can be reduced by exploiting structural patterns in the resource state space. Specifically, structured dynamic programming can be used to compute the reservation prices of only those resource allocations that are relevant to the agent. This minimal set of reservation prices can be combined to determine the agent's reservation price for any allocation.

An important issue that is not addressed in this paper is the feasibility of determining the equilibrium outcome in the resulting combinatorial auction. The winner determination problem for the auction requires search over a contract graph that is exponential in the number of agents and resource goods [30]. However, a number of heuristic search techniques have been proposed that perform very well, even on very large problem instances [22, 29].
Thus, the main advantage of the market-based approach described here is that it transforms hard, stochastic optimization problems into hard deterministic winner determination problems that are more amenable to heuristic solution techniques.

Given the ultimate requirements for inexact techniques at the market level, one might question the value of using a computationally intensive exact approach such as decision-theoretic planning at the agent level. Although the issue remains unresolved, there are a number of practical reasons why the use of decision-theoretic planning to implement market-based agents is appropriate:

1. Computation is distributed -- The agents can solve their planning and valuation problems independently and simultaneously. Thus, the computational intensity of the technique is less critical than it would be if the agent-level problems were serialized.
2. Decision-theoretic planning agents encapsulate uncertainty -- Many real-world resource allocation problems are characterized by risk and uncertainty. Decision-theoretic planning provides an ideal means of incorporating probabilistic information into decision-making. Moreover, the expected values (in the form of reservation prices) generated by the planning algorithm encapsulate the uncertainty in the system. The final winner determination problem can then be formulated as a deterministic problem.
3. Accurate preference information simplifies the winner determination process -- Search over the contract graph is simplified if each agent has a partially ordered list of resource allocations and intrinsic utilities. This is especially true if the search is implemented as a continuous auction in which individual agents must initiate exchanges.
4. The impact of improvements is additive -- Improvements in both hardware and techniques for solving decision-theoretic planning problems have an additive effect in multiagent systems.
As such, even incremental improvements can have a significant impact on the feasibility of applying the market-based approach to industrial-scale problems.

Of course, this is not to suggest that the decision-theoretic planning techniques used to date in this research are sufficient for industrial-scale problems. Instead of agents with three operations and a time horizon of 11 time units, problems of a useful size require agents to plan over longer time horizons and manage a dozen or so operations. These "useful" agent-level problems are clearly very large. However, they are not so large that they are beyond consideration. Improved techniques for state space reduction, reachability analysis, and approximation could make very large scale market-based systems based on decision-theoretic planning agents a reality.

References

[1] Andrew G. Barto, Steven J. Bradtke and Satinder P. Singh, 1995, "Learning to act using real-time dynamic programming," Artificial Intelligence, Vol. 72, No. 1-2, pp. 81-138.
[2] Gary S. Becker, 1976, The Economic Approach to Human Behavior, University of Chicago Press, Chicago, IL.
[3] Alan H. Bond and Les Gasser, 1988, Readings in Distributed Artificial Intelligence, Morgan Kaufmann, San Francisco, CA.
[4] Craig Boutilier, Ronen I. Brafman and Christopher Geib, 1998, "Structured Reachability Analysis for Markov Decision Processes," in the proceedings of the 14th Annual Conference on Uncertainty in Artificial Intelligence (UAI-98), Madison, WI (24-26 July), pp. 24-32.
[5] Craig Boutilier, Thomas Dean and Steve Hanks, 1999, "Decision-Theoretic Planning: Structural Assumptions and Computational Leverage," Journal of Artificial Intelligence Research, Vol. 11, pp. 1-94.
[6] Craig Boutilier, Richard Dearden and Moisés Goldszmidt, 1995, "Exploiting Structure in Policy Construction," in the proceedings of the 12th National Conference on Artificial Intelligence, Seattle, WA, pp. 1104-1111.
[7] Michael J. Brydon, 2001, A Market-Based Approach to Resource Allocation in Manufacturing, unpublished Ph.D. dissertation, University of British Columbia.
[8] John Q. Cheng and Michael P. Wellman, 1998, "The WALRAS algorithm: A convergent distributed implementation of general equilibrium outcomes," Computational Economics, Vol. 12, pp. 1-24.
[9] Scott H. Clearwater, 1996, Market-Based Control: A Paradigm for Distributed Resource Allocation, World Scientific, Singapore.
[10] Thomas Dean and Shieu-Hong Lin, 1995, "Decomposition Techniques for Planning in Stochastic Domains," in the proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI-95), Montreal, PQ, pp. 1121-1128.
[11] Thomas Dean, Leslie Pack Kaelbling, Jan Kirman and Anne Nicholson, 1993, "Planning with Deadlines in Stochastic Domains," in the proceedings of the Eleventh National Conference on Artificial Intelligence, Washington, DC, pp. 574-579.
[12] Richard Dearden and Craig Boutilier, 1997, "Abstraction and Approximate Decision Theoretic Planning," Artificial Intelligence, Vol. 89, pp. 219-283.
[13] Jon Elster, 1983, Sour Grapes: Studies in the Subversion of Reality, Cambridge University Press, Cambridge, UK.
[14] Richard Engelbrecht-Wiggans, 1983, "An Introduction to the Theory of Bidding for a Single Object," in Auctions, Bidding, and Contracting: Uses and Theory, Richard Engelbrecht-Wiggans, Martin Shubik and Robert M. Stark, eds., New York University Press, New York, pp. 53-104.
[15] Nicholas R. Jennings, Katia Sycara and Michael J. Woolridge, 1998, "A Roadmap of Agent Research and Development," Autonomous Agents and Multi-Agent Systems, Vol. 1, No. 1, pp. 7-38.
[16] Donald W. Katzner, 1988, Walrasian Microeconomics: An Introduction to the Economic Theory of Market Behavior, Addison-Wesley, Reading, MA.
[17] Andreu Mas-Colell, Michael D. Whinston and Jerry R. Green, 1995, Microeconomic Theory, Oxford University Press, New York.
[18] R. Preston McAfee and John McMillan, 1996, "Analyzing the Airwaves Auction," Journal of Economic Perspectives, Vol. 10, No. 1, pp. 159-176.
[19] Nicolas Meuleau, Milos Hauskrecht, Kee-Eung Kim, Leonid Peshkin, Leslie Pack Kaelbling, Thomas Dean and Craig Boutilier, 1998, "Solving Very Large Weakly Coupled Markov Decision Processes," in the proceedings of the 15th National Conference on Artificial Intelligence, pp. 165-172.
[20] Paul Milgrom, 1989, "Auctions and Bidding: a Primer," Journal of Economic Perspectives, Vol. 3, No. 2, pp. 3-22.
[21] Tracy Mullen and Michael P. Wellman, 1995, "Some Issues in the Design of Market-Oriented Agents," in Intelligent Agents II: IJCAI'95 Workshop on Agent Theories, Architectures, and Languages, Michael J. Woolridge, Milind Tambe and Jörg P. Müller, eds., Springer-Verlag, Berlin, pp. 283-298.
[22] David C. Parkes, 2000, "iBundle: An Efficient Ascending Price Bundle Auction," in the proceedings of the ACM Conference on Electronic Commerce (EC-99), pp. 148-157.
[23] Michael Pinedo, 1995, Scheduling: Theory, Algorithms, and Systems, Prentice-Hall, Englewood Cliffs, NJ.
[24] Martin L. Puterman, 1994, Markov Decision Processes: Discrete Stochastic Dynamic Programming, John Wiley & Sons, New York.
[25] Howard Raiffa, 1968, Decision Analysis: Introductory Lectures on Choices and Uncertainty, Addison-Wesley, Reading, MA.
[26] Eric Rasmusen, 1989, Games and Information: An Introduction to Game Theory, Blackwell, London.
[27] Stephen J. Rassenti, Vernon L. Smith and R. L. Bulfin, 1982, "A Combinatorial Auction Mechanism for Airport Time Slot Allocation," Bell Journal of Economics, Vol. 13, pp. 402-417.
[28] Jeffrey S. Rosenschein and Gilad Zlotkin, 1994, Rules of Encounter: Designing Conventions for Automated Negotiation Among Computers, MIT Press, Cambridge, MA.
[29] Michael H. Rothkopf, Alexsander Pekec and Ronald M. Harstad, 1998, "Computationally Manageable Combinational Auctions," Management Science, Vol. 44, No. 8, pp. 1131-1147.
[30] Tuomas W. Sandholm, 1998, "Contract Types for Satisficing Task Allocation: I Theoretical Results," in the proceedings of the AAAI 1998 Spring Symposium: Satisficing Models, Stanford, CA (March 23-25).
[31] Tuomas W. Sandholm and Subhash Suri, 2000, "Improved Algorithms for Optimal Winner Determination in Combinatorial Auctions and Generalizations," in the proceedings of the National Conference on Artificial Intelligence (AAAI), Austin, TX (July 31-August 2), pp. 90-97.
[32] Robert A. Schwartz, 1988, Equity Markets: Structure, Trading, and Performance, Harper & Row, New York.
[33] William E. Walsh, Michael P. Wellman, Peter R. Wurman and Jeffrey K. MacKie-Mason, 1998, "Some Economics of Market-Based Distributed Scheduling," in the proceedings of the 18th International Conference on Distributed Computing Systems, pp. 612-621.
[34] Michael P. Wellman, 1996, "Market-Oriented Programming: Some Early Lessons," in Market-Based Control: A Paradigm for Distributed Resource Allocation, S.H. Clearwater, ed., World Scientific, Singapore, pp. 74-95.