Lectures
1  AI Programming: Introduction
2  Introduction to RoboRescue
3  Agents and Agents Architecture
4  Multi-Agent Systems
5  Multi-Agent Decision Making
6  Cooperation and Coordination
7  Cooperation and Coordination
8  Machine Learning
9  Knowledge Representation
10 Putting It All Together

TDDD10 AI Programming: Multiagent Decision Making
Cyrille Berger

Lecture goals
Multi-agent decision making in a competitive environment.
Learn about the concepts of utility, rational agents, voting and auctioning.

Lecture content
Self-Interested Agents
Social Choice
Auctions
Single Dimension Auctions
Combinatorial Auctions

Self-Interested Agents: Utilities and Preferences
Assume we have just two agents: Ag = {i, j}.
Agents are assumed to be self-interested: they have preferences over the states the environment can end up in.
Assume Ω = {ω₁, ω₂, …} is the set of "outcomes" that agents have preferences over.
We capture preferences by utility functions:
  uᵢ : Ω → ℝ
  uⱼ : Ω → ℝ
Utility functions lead to preference orderings over outcomes:
  ω ⪰ᵢ ω′ means uᵢ(ω) ≥ uᵢ(ω′)
  ω ≻ᵢ ω′ means uᵢ(ω) > uᵢ(ω′)

Self-Interested Agents: What is utility?
If agents represent individuals or organizations, then we cannot make the benevolence assumption.
Utility is not money, but it plays a similar role.
Agents are assumed to act to further their own interests, possibly at the expense of others.
This creates the potential for conflict and may complicate the design task enormously.

Multiagent Encounters (1/2)
We need a model of the environment in which these agents will act:
Agents simultaneously choose an action to perform, and as a result of the actions they select, an outcome in Ω will result.
The actual outcome depends on the combination of actions.
Assume each agent has just two possible actions that it can perform, C ("cooperate") and D ("defect").
The environment's behaviour is given by a state transformer function τ : Acⁱ × Acʲ → Ω.

Multiagent Encounters (2/2)
Examples of state transformer functions:
This environment is sensitive to the actions of both agents:
  τ(D,D) = ω₁   τ(D,C) = ω₂   τ(C,D) = ω₃   τ(C,C) = ω₄
Neither agent has any influence in this environment:
  τ(D,D) = ω₁   τ(D,C) = ω₁   τ(C,D) = ω₁   τ(C,C) = ω₁
This environment is controlled by j:
  τ(D,D) = ω₁   τ(D,C) = ω₂   τ(C,D) = ω₁   τ(C,C) = ω₂

Coordination game
Suppose we have the case where both agents can influence the outcome, and they have utility functions as follows:
  uᵢ(ω₁) = 2   uᵢ(ω₂) = 1   uᵢ(ω₃) = 3   uᵢ(ω₄) = 4
  uⱼ(ω₁) = 2   uⱼ(ω₂) = 3   uⱼ(ω₃) = 1   uⱼ(ω₄) = 4

Payoff Matrices
We can characterize the previous scenario in a payoff matrix. This environment is sensitive to the actions of both agents:
  τ(D,D) = ω₁   τ(D,C) = ω₂   τ(C,D) = ω₃   τ(C,C) = ω₄
Agent i is the column player, agent j is the row player:

                     i defects           i cooperates
  j defects          uᵢ = 2, uⱼ = 2      uᵢ = 3, uⱼ = 1
  j cooperates       uᵢ = 1, uⱼ = 3      uᵢ = 4, uⱼ = 4

With a bit of abuse of notation:
  uᵢ(D,D) = 2   uᵢ(D,C) = 1   uᵢ(C,D) = 3   uᵢ(C,C) = 4
  uⱼ(D,D) = 2   uⱼ(D,C) = 3   uⱼ(C,D) = 1   uⱼ(C,C) = 4
Then agent i's preferences are: C,C ≻ᵢ C,D ≻ᵢ D,D ≻ᵢ D,C.
Every outcome i can reach by playing C is preferred over every outcome it can reach by playing D, so "C" is the rational choice for i.

Disgrace of Gijón (World Cup 1982)
One game left: Germany (agent i) against Austria (agent j).
  uᵢ(win by 3 or more) = 2    uⱼ(win by 3 or more) = -1
  uᵢ(2-0) = uᵢ(1-0) = 2       uⱼ(2-0) = uⱼ(1-0) = 2
  uᵢ(a-a) = -1                uⱼ(a-a) = 2
  uᵢ(0-a) = -1                uⱼ(0-a) = 2        (for a > 0)
Final score: Germany 1 - 0.

The Prisoner's Dilemma
Two men are collectively charged with a crime and held in separate cells, with no way of meeting or communicating. They are told that:
If one confesses and the other does not, the confessor will be freed, and the other will be jailed for three years.
If both confess, then each will be jailed for two years.
Both prisoners know that if neither confesses, then they will each be jailed for one year.

Payoff matrix for the prisoner's dilemma (confessing corresponds to defecting; i is the column player, j the row player):

                     i defects           i cooperates
  j defects          uᵢ = 2, uⱼ = 2      uᵢ = 1, uⱼ = 4
  j cooperates       uᵢ = 4, uⱼ = 1      uᵢ = 3, uⱼ = 3
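On the slide, the prisoner's dilemma matrix is shown as a figure; as a stand-in, here is a minimal Python sketch (not part of the course material) that encodes the payoff values used in this lecture (sucker 1, punishment 2, reward 3, temptation 4) and derives each agent's preference ordering over the four action profiles. The variable names are illustrative only.

```python
# Minimal encoding of the prisoner's dilemma payoff matrix.
# Lecture values: sucker = 1, punishment = 2, reward = 3, temptation = 4.
# Each entry maps an action profile (action of i, action of j) to (u_i, u_j).
C, D = "cooperate", "defect"

payoff = {
    (D, D): (2, 2),   # punishment for mutual defection
    (C, D): (1, 4),   # i is the sucker, j gets the temptation payoff
    (D, C): (4, 1),   # j is the sucker, i gets the temptation payoff
    (C, C): (3, 3),   # reward for mutual cooperation
}

# Derive each agent's preference ordering over the four outcomes.
for agent, idx in (("i", 0), ("j", 1)):
    ordering = sorted(payoff, key=lambda profile: payoff[profile][idx], reverse=True)
    print(f"agent {agent} prefers:", " > ".join(f"({a},{b})" for a, b in ordering))
```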
Solution Concepts
How will a rational agent behave in any given scenario? This is answered by solution concepts:
  dominant strategies
  Nash equilibrium
  Pareto optimal outcomes
  strategies that maximize social welfare

Reading the prisoner's dilemma payoff matrix:
  Top left: if both defect, then both get the punishment for mutual defection (2).
  Top right: if i cooperates and j defects, i gets the sucker's payoff of 1, while j gets 4.
  Bottom left: if j cooperates and i defects, j gets the sucker's payoff of 1, while i gets 4.
  Bottom right: the reward for mutual cooperation (3).

Dominant Strategies (1/2)
Given any particular strategy (either C or D) of agent i, there will be a number of possible outcomes.
We say s₁ dominates s₂ if every outcome possible by i playing s₁ is preferred over every outcome possible by i playing s₂.
A rational agent will never play a dominated strategy.
So in deciding what to do, we can delete dominated strategies.
Unfortunately, there is not always a unique undominated strategy.

Dominant Strategies (2/2)
Rational action examples:
  Coordination game: playing C gives agent i an outcome worth 3 or 4, playing D an outcome worth 1 or 2, so C dominates D and is the rational action.
  Prisoner's dilemma: D is the best response to every strategy of the other agent, so D is the dominant choice for both agents.

(Pure Strategy) Nash Equilibrium (1/2)
In general, we will say that two strategies s₁ and s₂ are in Nash equilibrium if:
  under the assumption that agent i plays s₁, agent j can do no better than play s₂; and
  under the assumption that agent j plays s₂, agent i can do no better than play s₁.
Neither agent has any incentive to deviate from a Nash equilibrium.

(Pure Strategy) Nash Equilibrium (2/2)
  Coordination game: (C,C) is the only Nash equilibrium.
  Prisoner's dilemma: (D,D) is the only Nash equilibrium.
Unfortunately:
  Not every interaction scenario has a pure strategy Nash equilibrium.
  Some interaction scenarios have more than one Nash equilibrium.

Pareto Optimality (1/2)
An outcome is said to be Pareto optimal (or Pareto efficient) if there is no other outcome that makes one agent better off without making another agent worse off.
If an outcome is Pareto optimal, then at least one agent will be reluctant to move away from it (because this agent will be worse off).
If an outcome ω is not Pareto optimal, then there is another outcome ω′ that makes everyone as happy, if not happier, than ω.
"Reasonable" agents would agree to move to ω′ in this case. (Even if I don't directly benefit from ω′, you can benefit without me suffering.)

Pareto Optimality (2/2)
  Coordination game: (C,C) is the only Pareto optimal outcome.
  Prisoner's dilemma: all outcomes except (D,D) are Pareto optimal.

Social Welfare (1/2)
The social welfare of an outcome ω is the sum of the utilities that each agent gets from ω:
  sw(ω) = Σ_{i ∈ Ag} uᵢ(ω)
Think of it as the "total amount of utility in the system".
As a solution concept, it may be appropriate when the whole system (all agents) has a single owner (then the overall benefit of the system is important, not that of individuals).

Social Welfare (2/2)
  Coordination game: (C,C) maximizes social welfare.
  Prisoner's dilemma: (C,C) maximizes social welfare.

The Prisoner's Dilemma Solution
  D is a dominant choice for both agents.
  (D,D) is the only Nash equilibrium.
  All outcomes except (D,D) are Pareto optimal.
  (C,C) maximizes social welfare.
The individually rational action is to defect: it guarantees a payoff of no worse than 2, whereas cooperating risks the sucker's payoff of 1.
So defection is the best response to all possible strategies: both agents defect, and get payoff = 2.
But intuition says this is not the best outcome: surely they should both cooperate and each get a payoff of 3!
This apparent paradox is the fundamental problem of multi-agent interactions.
It appears to imply that cooperation will not occur in societies of self-interested agents.
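To make the four solution concepts concrete, the following sketch checks them on the prisoner's dilemma matrix above. Function names are illustrative; dominance is tested here in the usual best-response sense (at least as good against every opponent action), which is slightly weaker than the definition stated on the slide.

```python
ACTIONS = ("C", "D")
# Payoff matrix: (u_i, u_j) indexed by (action of i, action of j); prisoner's dilemma values.
payoff = {("D", "D"): (2, 2), ("C", "D"): (1, 4), ("D", "C"): (4, 1), ("C", "C"): (3, 3)}

def u(ai, aj, agent):
    return payoff[(ai, aj)][agent]

def dominant_strategies(agent):
    """Actions that do at least as well as the alternative against every opponent action."""
    result = []
    for a in ACTIONS:
        alt = ACTIONS[0] if a == ACTIONS[1] else ACTIONS[1]
        if agent == 0:
            ok = all(u(a, o, 0) >= u(alt, o, 0) for o in ACTIONS)
        else:
            ok = all(u(o, a, 1) >= u(o, alt, 1) for o in ACTIONS)
        if ok:
            result.append(a)
    return result

def nash_equilibria():
    """Pure-strategy profiles where neither agent can gain by deviating unilaterally."""
    eq = []
    for ai in ACTIONS:
        for aj in ACTIONS:
            best_i = all(u(ai, aj, 0) >= u(x, aj, 0) for x in ACTIONS)
            best_j = all(u(ai, aj, 1) >= u(ai, y, 1) for y in ACTIONS)
            if best_i and best_j:
                eq.append((ai, aj))
    return eq

def pareto_optimal():
    """Profiles not dominated by another profile that makes someone better off and no one worse off."""
    profiles = list(payoff)
    result = []
    for p in profiles:
        dominated = any(
            payoff[q][0] >= payoff[p][0] and payoff[q][1] >= payoff[p][1] and payoff[q] != payoff[p]
            for q in profiles
        )
        if not dominated:
            result.append(p)
    return result

def max_social_welfare():
    """Profiles maximizing the sum of utilities."""
    best = max(sum(v) for v in payoff.values())
    return [p for p, v in payoff.items() if sum(v) == best]

print("dominant for i:", dominant_strategies(0))    # ['D']
print("dominant for j:", dominant_strategies(1))    # ['D']
print("Nash equilibria:", nash_equilibria())        # [('D', 'D')]
print("Pareto optimal:", pareto_optimal())          # everything except ('D', 'D')
print("max social welfare:", max_social_welfare())  # [('C', 'C')]
```

Swapping in the coordination-game values from the earlier slide reproduces the results quoted there: (C,C) is the only Nash equilibrium, the only Pareto optimal outcome, and the social-welfare maximizer.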
The prisoner's dilemma is ubiquitous; real-world examples include:
  nuclear arms reduction ("why don't I keep mine…")
  free rider systems, such as public television licenses
Can we recover cooperation?

The Iterated Prisoner's Dilemma
One answer: play the game more than once.
If you know you will be meeting your opponent again, then the incentive to defect appears to evaporate.
Cooperation is the rational choice in the infinitely repeated prisoner's dilemma.

Backwards Induction
But… suppose you both know that you will play the game exactly n times.
On round n − 1 (the last round), you have an incentive to defect, to gain that extra bit of payoff.
But this makes round n − 2 the last "real" round, and so you have an incentive to defect there, too.
This is the backwards induction problem.
When playing the prisoner's dilemma with a fixed, finite, pre-determined, commonly known number of rounds, defection is the best strategy.

Axelrod's Tournament
Suppose you play the iterated prisoner's dilemma against a range of opponents…
What strategy should you choose, so as to maximize your overall payoff?
Axelrod (1984) investigated this problem with a computer tournament for programs playing the prisoner's dilemma.

Strategies in Axelrod's Tournament
RANDOM: cooperate or defect at random.
ALLD: "always defect" (the hawk).
TIT-FOR-TAT: on round u = 0, cooperate; on round u > 0, do what your opponent did on round u − 1.
TESTER: on the first round, defect. If the opponent retaliated, then play TIT-FOR-TAT. Otherwise intersperse cooperation and defection.
JOSS: as TIT-FOR-TAT, except periodically defect.

Axelrod's Tournament results
TIT-FOR-TAT won the first tournament.
A second tournament was called, and TIT-FOR-TAT won the second tournament as well.

Recipes for Success in Axelrod's Tournament
Axelrod suggests the following rules for succeeding in his tournament:
  Don't be envious: don't play as if it were zero sum!
  Be nice: start by cooperating, and reciprocate cooperation.
  Retaliate appropriately: always punish defection immediately, but use "measured" force and don't overdo it.
  Don't hold grudges: always reciprocate cooperation immediately.
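A minimal sketch of an Axelrod-style round-robin with three of the strategies above (RANDOM, ALLD and TIT-FOR-TAT). The 200-round match length, the pairing scheme and the helper names are assumptions for illustration, not the actual tournament setup.

```python
import itertools
import random

# Prisoner's dilemma payoffs: (my payoff, opponent payoff) for (my move, their move).
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (1, 4), ("D", "C"): (4, 1), ("D", "D"): (2, 2)}

def all_d(my_history, their_history):
    """ALLD: always defect (the 'hawk')."""
    return "D"

def rand(my_history, their_history):
    """RANDOM: cooperate or defect with equal probability."""
    return random.choice("CD")

def tit_for_tat(my_history, their_history):
    """TIT-FOR-TAT: cooperate first, then copy the opponent's previous move."""
    return "C" if not their_history else their_history[-1]

def play(strategy_a, strategy_b, rounds=200):
    """Play an iterated match and return the total payoffs of both strategies."""
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        pa, pb = PAYOFF[(move_a, move_b)]
        score_a, score_b = score_a + pa, score_b + pb
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b

strategies = {"ALLD": all_d, "RANDOM": rand, "TIT-FOR-TAT": tit_for_tat}
totals = {name: 0 for name in strategies}
for (na, sa), (nb, sb) in itertools.combinations(strategies.items(), 2):
    score_a, score_b = play(sa, sb)
    totals[na] += score_a
    totals[nb] += score_b

for name, total in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(name, total)
```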
Competitive and Zero-Sum Interactions
Where the preferences of agents are diametrically opposed, we have strictly competitive scenarios.
Zero-sum encounters are those where utilities sum to zero:
  uᵢ(ω) + uⱼ(ω) = 0 for all ω ∈ Ω
Zero sum implies strictly competitive.
Zero-sum encounters in real life are very rare, but people tend to act in many scenarios as if they were zero sum.

Matching Pennies
Players i and j simultaneously choose the face of a coin, either "heads" or "tails".
If they show the same face, then i wins, while if they show different faces, then j wins.

Mixed Strategies for Matching Pennies
No pair of strategies forms a pure strategy Nash equilibrium: whatever pair of strategies is chosen, somebody will wish they had done something else.
The solution is to allow mixed strategies:
  play "heads" with probability 1/2
  play "tails" with probability 1/2
This is a Nash equilibrium.

Mixed Strategies
A mixed strategy has the form:
  play α₁ with probability p₁
  play α₂ with probability p₂
  …
  play αₖ with probability pₖ
such that p₁ + p₂ + … + pₖ = 1.
Nash proved that every finite game has a Nash equilibrium in mixed strategies.

Social Choice
Social choice theory is concerned with group decision making.
Classic example of social choice theory: voting.
Formally, the issue is combining preferences to derive a social outcome.

Components of a Social Choice Model
Assume a set Ag = {1, …, n} of voters. These are the entities who express preferences.
Voters make group decisions with respect to a set Ω = {ω₁, ω₂, …} of outcomes. Think of these as the candidates.
If |Ω| = 2, we have a pairwise election.

Preferences
Each voter has preferences over Ω: an ordering over the set of possible outcomes.
Example: suppose Ω = {gin, rum, brandy, whisky}, then we might have agent mjw with preference order
  ω(mjw) = (brandy, rum, gin, whisky)
meaning brandy ≻_mjw rum ≻_mjw gin ≻_mjw whisky.

Preference Aggregation
The fundamental problem of social choice theory: given a collection of preference orders, one for each voter, how do we combine these to derive a group decision that reflects as closely as possible the preferences of the voters?
Two variants of preference aggregation: social welfare functions and social choice functions.

Social Welfare Functions
Let Π(Ω) be the set of preference orderings over Ω.
A social welfare function takes the voter preferences and produces a social preference order:
  f : Π(Ω) × … × Π(Ω) → Π(Ω)
We write ≻* for the ordering produced by a social welfare function, e.g.
  whisky ≻* gin ≻* brandy ≻* rum
  S ≻* M ≻* SD ≻* MP ≻* C ≻* V ≻* FP ≻* KD ≻* FI

Social Choice Functions
Sometimes, we just want to select one of the possible candidates, rather than a social order.
This gives social choice functions:
  f : Π(Ω) × … × Π(Ω) → Ω
Example: a presidential election.

Voting Procedures: Plurality
A social choice function: it selects a single outcome.
Each voter submits a preference order.
Each candidate gets one point for every preference order that ranks them first.
The winner is the one with the largest number of points.
Example: political elections in the UK, France, the USA, …
If we have only two candidates, then plurality is a simple majority election.

Anomalies with Plurality
Suppose |Ag| = 100 and Ω = {ω₁, ω₂, ω₃}, with:
  40% of voters voting ω₁ ≻ ω₂ ≻ ω₃
  30% of voters voting ω₂ ≻ ω₃ ≻ ω₁
  30% of voters voting ω₃ ≻ ω₂ ≻ ω₁
With plurality, ω₁ gets elected even though a clear majority (60%) prefer another candidate over ω₁!

Strategic Manipulation by Tactical Voting
Suppose your preferences are ω₁ ≻ ω₂ ≻ ω₃, while you believe 49% of voters have ω₂ ≻ ω₁ ≻ ω₃ and you believe 49% have ω₃ ≻ ω₂ ≻ ω₁.
You may do better voting for ω₂, even though this is not your true preference profile.
This is tactical voting: an example of strategic manipulation of the vote.
It is especially a problem in elections held in two rounds (two legs).

Condorcet's Paradox
Suppose Ag = {1, 2, 3} and Ω = {ω₁, ω₂, ω₃} with:
  ω₁ ≻₁ ω₂ ≻₁ ω₃
  ω₂ ≻₂ ω₃ ≻₂ ω₁
  ω₃ ≻₃ ω₁ ≻₃ ω₂
For every possible candidate, there is another candidate that is preferred by a majority of voters.
This is Condorcet's paradox: there are situations in which, no matter which outcome we choose, a majority of voters will be unhappy with the outcome chosen.

Applications of Social Choice Theory
The main application is to human choice and decision making.
Results aggregation: for example, aggregate the output of several search engines.
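The plurality anomaly and Condorcet's paradox can be checked mechanically. The sketch below uses the preference profiles from the examples above; the function names and the tie-breaking behaviour are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations

def plurality(profiles):
    """Each ballot's top choice gets one point; return the candidate(s) with most points."""
    counts = Counter(ballot[0] for ballot in profiles)
    top = max(counts.values())
    return [c for c, n in counts.items() if n == top]

def pairwise_winner(profiles, a, b):
    """Simple majority between candidates a and b (b wins ties, an arbitrary choice)."""
    a_wins = sum(1 for ballot in profiles if ballot.index(a) < ballot.index(b))
    return a if a_wins * 2 > len(profiles) else b

# Plurality anomaly (lecture example, 100 voters):
anomaly = 40 * [("w1", "w2", "w3")] + 30 * [("w2", "w3", "w1")] + 30 * [("w3", "w2", "w1")]
print("plurality winner:", plurality(anomaly))            # ['w1']
print("w1 vs w2:", pairwise_winner(anomaly, "w1", "w2"))  # w2 beats w1, 60 to 40

# Condorcet's paradox (three voters, cyclic majorities):
paradox = [("w1", "w2", "w3"), ("w2", "w3", "w1"), ("w3", "w1", "w2")]
for a, b in combinations(("w1", "w2", "w3"), 2):
    print(f"{a} vs {b}:", pairwise_winner(paradox, a, b))
```

On the paradox profile, the pairwise majorities form a cycle: ω₁ beats ω₂, ω₂ beats ω₃, and ω₃ beats ω₁, which is exactly the situation described above.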
Auctions

Applications of Auctions
With the rise of the Internet, auctions have become popular in many e-commerce applications (e.g. eBay).
Auctions are an efficient tool for reaching agreements in a society of self-interested agents.
For example: bandwidth allocation on a network, sponsored links.
Auctions can be used for efficient resource allocation within decentralized computational systems.
They are frequently utilized for solving multi-agent and multi-robot coordination problems.
For example: team-based exploration of unknown terrain.

What is an Auction?
An auction takes place between an agent known as the auctioneer and a collection of agents known as the bidders.
The goal of the auction is for the auctioneer to allocate the goods to the bidders.
The auctioneer desires to maximize the price; bidders desire to minimize the price.
Assume some initial allocation of goods and money; exchange is the free alteration of allocations of goods and money between agents.

Limit Price
Each trader has a value or limit price that they place on the good.
A buyer who exchanges more than their limit price for a good makes a loss.
A seller who exchanges a good for less than their limit price makes a loss.
Limit prices clearly have an effect on the behavior of traders.
There are several models, embodying different assumptions about the nature of the good:
  Private value: the good has a value to me that is independent of what it is worth to you. The textbook gives the example of John Lennon's last dollar bill.
  Common value: the good has the same value to all of us, but we have differing estimates of what that value is.
  Correlated value: our values are related. The more you are prepared to pay, the more I should be prepared to pay.

Winner's Curse
The phenomenon was first observed in the 1950s, when oil companies bid for drilling rights in the Gulf of Mexico.
The problem was the bidding process, given the uncertainties in estimating the potential value of an offshore oil field ("Competitive bidding in high risk situations", Capen, Clapp and Campbell, Journal of Petroleum Technology, 1971).
Suppose an oil field had an actual intrinsic value of $10 million; oil companies might guess its value to be anywhere from $5 million to $20 million.
The company that wrongly estimated the value at $20 million and placed a bid at that level would win the auction, and later find that the field was not worth that much.
In many cases the winner is the bidder who has overestimated the most ⇒ the "winner's curse".
Bid shading: offer a bid a certain amount below one's valuation.
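A small Monte Carlo sketch of the winner's curse in a common-value, first-price setting. The $10 million true value is taken from the oil-field example; the ±50% estimation noise, the five bidders and the naive "bid your estimate" behaviour are illustrative assumptions.

```python
import random

def common_value_auction(true_value, n_bidders, noise, rounds=10000):
    """First-price, common-value auction: each bidder naively bids its own noisy estimate."""
    overpaid = 0
    winning_bids = []
    for _ in range(rounds):
        estimates = [true_value * random.uniform(1 - noise, 1 + noise) for _ in range(n_bidders)]
        winning_bid = max(estimates)  # the most optimistic estimate wins
        winning_bids.append(winning_bid)
        if winning_bid > true_value:
            overpaid += 1
    return sum(winning_bids) / rounds, overpaid / rounds

avg_price, frac_overpaid = common_value_auction(true_value=10e6, n_bidders=5, noise=0.5)
print(f"average winning bid: ${avg_price / 1e6:.1f}M (true value $10M)")
print(f"fraction of auctions where the winner overpaid: {frac_overpaid:.0%}")
# The winner is typically the bidder who overestimated the most: the winner's curse.
# Bid shading (bidding below one's estimate) is the standard remedy.
```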
Auction Parameters
  One shot: only one bidding round.
  Ascending: the auctioneer begins at a minimum price, and bidders increase their bids.
  Descending: the auctioneer begins at a price over the value of the good and lowers the price at each round.
  Continuous: the auction runs continuously.
  Standard auction: one seller and multiple buyers.
  Reverse auction: one buyer and multiple sellers.
  Double auction: multiple sellers and multiple buyers.
  Combinatorial auction: bids are placed on bundles of goods.

Single versus Multi-dimensional Auctions
Single-dimensional auctions: the only content of an offer is the price and quantity of some specific type of good.
  "I'll bid $200 for those two chairs."
Multi-dimensional auctions: offers can relate to many different aspects of many different goods.
  "I'm prepared to pay $200 for those two red chairs, but $300 if you can deliver them tomorrow."
  Frequency ranges for cell phone networks.
Buyers and sellers may have combinatorial valuations for bundles of goods.

Single Dimension Auctions

English Auction
An example of a first-price open-cry ascending auction.
Protocol:
  The auctioneer starts by offering the good at a low price.
  The auctioneer offers higher prices until no agent is willing to pay the proposed level.
  The good is allocated to the agent that made the highest offer.
Properties:
  Generates competition between bidders (and revenue for the seller when bidders are uncertain of their valuation).
  Dominant strategy: bid slightly more than the current bid, and withdraw if the bid reaches your personal valuation of the good.
  Winner's curse (for common value goods).

Dutch Auction
Dutch auctions are examples of first-price open-cry descending auctions.
Protocol:
  The auctioneer starts by offering the good at an artificially high value.
  The auctioneer lowers the offer price until some agent makes a bid equal to the current offer price.
  The good is then allocated to the agent that made the offer.
Properties:
  Items are sold rapidly (many lots can be sold within a single day).
  Intuitive strategy: wait for a little bit after your true valuation has been called and hope no one else gets in there before you (there is no general dominant strategy).
  The winner's curse is also possible.

First-Price Sealed-Bid Auctions
First-price sealed-bid auctions are one-shot auctions.
Protocol:
  Within a single round, bidders submit a sealed bid for the good.
  The good is allocated to the agent that made the highest bid.
  The winner pays the price of the highest bid.
Often used in commercial auctions, e.g. public building contracts.
Problem: the difference between the highest and the second highest bid is "wasted money" (the winner could have offered less).
Intuitive strategy: bid a little bit less than your true valuation (there is no general dominant strategy).
The more bidders there are, the smaller this deviation should be.

Vickrey Auctions
Proposed by William Vickrey in 1961 (Nobel Prize in Economic Sciences in 1996).
Vickrey auctions are examples of second-price sealed-bid one-shot auctions.
Protocol:
  Within a single round, bidders submit a sealed bid for the good.
  The good is allocated to the agent that made the highest bid.
  The winner pays the price of the second highest bid.
Dominant strategy: bid your true valuation.
  If you bid more, you risk paying too much.
  If you bid less, you lower your chances of winning while still having to pay the same price in case you win.
Antisocial behavior: bid more than your true valuation to make opponents suffer (not "rational").
For private value auctions, Vickrey auctions are strategically equivalent to the English auction mechanism.

Generalized First Price Auctions
Used by Yahoo for "sponsored links" auctions.
Introduced in 1997 for selling Internet advertising by Yahoo/Overture (before, there were only "banner ads").
Advertisers submit a bid reporting their willingness to pay on a per-click basis for a particular keyword: Cost-Per-Click (CPC).
Advertisers were billed for each "click" on sponsored links leading to their page.
The links were arranged in descending order of bids, making the highest bids the most prominent.
Auctions effectively take place for each search query.
However, the auction mechanism turned out to be unstable: bidders revised their bids as often as possible.
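A minimal sketch of the two sealed-bid payment rules discussed above: first-price (the winner pays its own bid) and Vickrey second-price (the winner pays the second-highest bid). Bidder names and bid values are made up for illustration.

```python
def sealed_bid_auction(bids, second_price=False):
    """One-shot sealed-bid auction.

    bids: dict mapping bidder name -> bid.
    Returns (winner, price): the highest bidder wins; the price is the highest bid
    (first-price) or the second-highest bid (Vickrey / second-price).
    """
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner, highest = ranked[0]
    price = ranked[1][1] if second_price and len(ranked) > 1 else highest
    return winner, price

bids = {"ann": 120, "bob": 100, "carol": 90}
print(sealed_bid_auction(bids))                      # ('ann', 120)  first-price
print(sealed_bid_auction(bids, second_price=True))   # ('ann', 100)  Vickrey: pays the 2nd highest
# Under the Vickrey rule the winner's payment does not depend on its own bid,
# which is why bidding one's true valuation is a dominant strategy.
```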
Generalized Second Price Auctions
Introduced by Google for pricing sponsored links (AdWords Select).
Observation: bidders generally do not want to pay much more than the bid of the rank below them.
Therefore: second price.
Advertisers bid for keywords and keyword combinations.
  Rank: CPC bid × quality score.
  Price: computed with respect to the lower ranks.
Further details: http://www.chipkin.com/googleadwords-actual-cpccalculation/
After seeing Google's success, Yahoo also switched to second price auctions in 2002.

Combinatorial Auctions

Combinatorial Auctions
In a combinatorial auction, the auctioneer puts several goods on sale and the other agents submit bids for entire bundles of goods.
Given a set of bids, the winner determination problem is the problem of deciding which of the bids to accept.
The solution must be feasible (no good may be allocated to more than one agent).
Ideally, it should also be optimal (in the sense of maximizing revenue for the auctioneer).
This is a challenging algorithmic problem.

Complements and Substitutes
The value an agent assigns to a bundle of goods may depend on the combination:
  Complements: the value assigned to a set is greater than the sum of the values assigned to its elements.
    Example: a pair of shoes (a left shoe and a right shoe).
  Substitutes: the value assigned to a set is lower than the sum of the values assigned to its elements.
    Example: a ticket to the theatre and another one to a football match on the same night.
In such cases an auction mechanism allocating one item at a time is problematic, since the best bidding strategy in one auction may depend on the outcome of other auctions.

Optimal Winner Determination: Protocol
One auctioneer, several bidders, and many items to be sold.
Each bidder submits a number of package bids specifying the valuation (price) the bidder is prepared to pay for a particular bundle.
The auctioneer announces a number of winning bids.
The winning bids determine which bidder obtains which item, and how much each bidder has to pay.
No item may be allocated to more than one bidder.
Examples of package bids:
  Agent 1: ({a, b}, 5), ({b, c}, 7), ({c, d}, 6)
  Agent 2: ({a, d}, 7), ({a, c, d}, 8)
  Agent 3: ({b}, 5), ({a, b, c, d}, 12)
Generally, there are 2ⁿ − 1 non-empty bundles for n items; how do we compute the optimal solution?

Optimal Winner Determination: Algorithm
An auctioneer has a set of items M = {1, 2, …, m} to sell.
There are N = {1, 2, …, n} buyers placing bids.
Buyers submit a set of package bids B = {B₁, B₂, …, Bₙ}.
A package bid is a tuple B = (S, vᵢ(S)), where S ⊆ M is a set of items (a bundle) and vᵢ(S) > 0 is buyer i's true valuation (price).
xS,i ∈ {0, 1} is a decision variable for assigning bundle S to buyer i.
The winner determination problem (WDP) is to label the bids as winning or losing (by deciding each xS,i) so as to maximize the sum of the accepted bid prices:
  maximize Σᵢ Σ_S vᵢ(S) · xS,i  subject to each item being covered by at most one accepted bundle.
This is NP-complete!
It can be solved with an integer program solver, or with heuristic search.
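A brute-force sketch of winner determination on the package bids from the example above. It enumerates every subset of bids with pairwise disjoint bundles and keeps the most valuable one; this is exponential in the number of bids and only meant to illustrate the problem, not to be a practical solver.

```python
from itertools import combinations

# Package bids from the lecture example: (bidder, bundle, price).
bids = [
    ("agent1", {"a", "b"}, 5), ("agent1", {"b", "c"}, 7), ("agent1", {"c", "d"}, 6),
    ("agent2", {"a", "d"}, 7), ("agent2", {"a", "c", "d"}, 8),
    ("agent3", {"b"}, 5), ("agent3", {"a", "b", "c", "d"}, 12),
]

def solve_wdp(bids):
    """Try every subset of bids whose bundles are disjoint; keep the one with maximum total price."""
    best_value, best_set = 0, []
    for k in range(1, len(bids) + 1):
        for subset in combinations(bids, k):
            bundles = [b for _, b, _ in subset]
            if sum(len(b) for b in bundles) == len(set().union(*bundles)):  # disjoint bundles
                value = sum(p for _, _, p in subset)
                if value > best_value:
                    best_value, best_set = value, list(subset)
    return best_value, best_set

value, winners = solve_wdp(bids)
print("revenue:", value)
for bidder, bundle, price in winners:
    print(f"  {bidder} wins {sorted(bundle)} for {price}")
```

On this example the optimum accepts ({b, c}, 7) from agent 1 and ({a, d}, 7) from agent 2, for a revenue of 14.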
Solving WDPs by Heuristic Search
There are two ways of representing the state space:
  Branch-on-items: a state is a set of items for which an allocation decision has already been made. Branching is carried out by adding a further item.
  Branch-on-bids: a state is a set of bids for which an acceptance decision has already been made. Branching is carried out by adding a further bid.

Branch-on-Items
Branching is based on the question: "What bid should this item be assigned to?"
Each path in the search tree consists of a sequence of disjoint bids, i.e. bids that do not share items with each other.
A path ends when no bid can be added to it.
The cost at each node is the sum of the prices of the bids accepted on the path.

Problem with Branch-on-Items
What if the auctioneer's revenue can increase by keeping items?
Example: there is no bid for item 1, a $5 bid for item 2, and a $3 bid for {1, 2}.
It is then better to keep item 1 and sell item 2 than to sell both.
The auctioneer's possibility of keeping items can be implemented by placing dummy bids of price zero on those items that received no 1-item bids (Sandholm 2002).

Example of Branch-on-Items
Bids: {1, 2}, {2, 3}, {3}, {1, 3}.
We add the dummy bids {1} and {2}.

Branch-on-Bids
Branching is based on the question: "Should this bid be accepted or rejected?"
This gives a binary tree.
When branching on a bid, the children in the search tree are the world where that bid is accepted (IN), and the world where that bid is rejected (OUT).
No dummy bids are needed.
First a bid graph is constructed that represents all constraints between the bids.
Then bids are accepted or rejected until all bids have been handled:
  On accept: remove the bid and all bids that conflict with it from the graph.
  On reject: remove only the bid itself from the graph.

Branch-on-Bids: Example
Bids: {1, 2}, {2, 3}, {3}, {1, 3}.

Heuristic Function
For any node N in the search tree, let g(N) be the revenue generated by the bids that were accepted on the path up to N.
The heuristic function h(N) estimates, for every node N, how much additional revenue can be expected by continuing from N.
An upper bound on h(N) is given by summing, over the set A of unallocated items, the maximum per-item contribution of any still-available bid:
  h(N) ≤ Σ_{j ∈ A} max { vᵢ(S) / |S| : bid (S, vᵢ(S)) is still available and j ∈ S }
Tighter bounds can be obtained by solving the linear program relaxation over the remaining items (Sandholm 2006).
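A compact sketch of branch-on-bids search with the upper bound just described, reusing the package bids from the earlier example. The recursion, the pruning rule and the price-divided-by-bundle-size split are an illustrative reading of the bound, not Sandholm's exact algorithm.

```python
# Depth-first branch-on-bids search for the winner determination problem, with the
# admissible bound: for each still-unallocated item, take the best price-per-item
# contribution among the remaining bids and sum these contributions.
bids = [({"a", "b"}, 5), ({"b", "c"}, 7), ({"c", "d"}, 6), ({"a", "d"}, 7),
        ({"a", "c", "d"}, 8), ({"b"}, 5), ({"a", "b", "c", "d"}, 12)]

def upper_bound(remaining_bids, free_items):
    """h(N): optimistic estimate of the revenue still obtainable from the free items."""
    best_per_item = {item: 0.0 for item in free_items}
    for bundle, price in remaining_bids:
        if bundle <= free_items:                 # the bid can still be accepted
            per_item = price / len(bundle)
            for item in bundle:
                best_per_item[item] = max(best_per_item[item], per_item)
    return sum(best_per_item.values())

def branch_on_bids(remaining_bids, free_items, g, best):
    """Accept (IN) or reject (OUT) the first remaining bid; prune when g + h cannot beat best."""
    if not remaining_bids or g + upper_bound(remaining_bids, free_items) <= best[0]:
        best[0] = max(best[0], g)
        return
    (bundle, price), rest = remaining_bids[0], remaining_bids[1:]
    if bundle <= free_items:                     # IN: accept the bid, its items become unavailable
        branch_on_bids(rest, free_items - bundle, g + price, best)
    branch_on_bids(rest, free_items, g, best)    # OUT: reject the bid

best = [0]
items = set().union(*(bundle for bundle, _ in bids))
branch_on_bids(bids, items, 0, best)
print("maximum revenue:", best[0])
```

On these bids the search again finds the optimal revenue of 14, while the bound lets it skip branches that cannot improve on the best allocation found so far.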
Auctions for Multi-Robot Exploration
Consider a team of mobile robots that has to visit a number of given targets (locations) in initially partially unknown terrain.
Examples of such tasks are cleaning missions, space exploration, surveillance, and search and rescue.
Continuous re-allocation of targets to robots is necessary; for example, robots might discover that they are separated from their target by a blockage.
To allocate and re-allocate the targets among themselves, the robots can use auctions where they sell and buy targets.
The team objective can be to minimize the sum of all path costs; hence, bidding prices are estimated travel costs.
The path cost of a robot is the sum of the edge costs along its path, from its current location to the last target that it visits.
(Figure: three robots exploring Mars. The robots' task is to gather data around the four craters, e.g. to visit the highlighted target sites. Source: N. Kalra.)

General Exploration
Each robot always follows a minimum-cost path that visits all of its allocated targets.
Whenever a robot gains more information about the terrain, it shares this information with the other robots.
If the remaining path of at least one robot is blocked, then all robots put their unvisited targets up for auction.
The auction(s) close after a predetermined amount of time.
Constraints: each robot wins at most one bundle and each target is contained in exactly one bundle.
After each auction, robots have gained new targets or exchanged targets with other robots.
Then the cycle repeats.

Single-Round Combinatorial Auction
Protocol:
  Every robot bids on all possible bundles of targets.
  The valuation is the estimated smallest path cost needed to visit all targets in the bundle (a travelling salesman tour).
  A central auctioneer determines and informs the winning robots within one round.
Optimal team performance: minimization of the total path cost.
Advantage: combinatorial auctions take all positive and negative synergies between targets into account.
Drawbacks:
  Robots cannot bid on all possible bundles of targets, because the number of possible bundles is exponential in the number of targets.
  Calculating the cost of each bundle requires computing the smallest path cost for visiting a set of targets (a Traveling Salesman Problem).
  Winner determination is NP-hard.

Parallel Single-Item Auctions
Protocol:
  Every robot bids on each target in parallel until all targets are assigned.
  The valuation is the smallest path cost from the robot's current position to the target.
This is similar to target clustering.
Advantage: simple to implement, and efficient in computation and communication.
Disadvantage: the team performance can be highly suboptimal, since it does not take any synergies between the targets into account.

Sequential Single-Item Auctions
Protocol:
  Targets are auctioned in the sequence T1, T2, T3, T4, …
  The valuation is the increase in a robot's smallest path cost that results from winning the auctioned target.
  The robot with the overall smallest bid is allocated the corresponding target.
  Finally, each robot calculates the minimum-cost path for visiting all of its targets and moves along this path.
Advantages:
  Hill-climbing search: some synergies between targets are taken into account (but not all of them).
  Simple to implement, and efficient in computation and communication.
  Since robots can determine the winners by listening to the bids (and identifying the smallest bid), the method can be executed in a decentralized way.
Disadvantage: the order of the targets changes the result.

Summary
Utilities and competitive strategies.
Voting mechanisms.
We discussed English, Dutch, first-price sealed-bid, and Vickrey auctions.
Generalized second price auctions have shown good properties in practice; however, "truth telling" is not a dominant strategy.
Combinatorial auctions are a mechanism to allocate a number of goods to a number of agents.
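To close, a minimal sketch of the sequential single-item auction idea for target allocation: in each round every robot bids its marginal path-cost increase for every unassigned target, and the overall cheapest (robot, target) pair wins. Straight-line distances, the insertion heuristic and the example coordinates are simplifying assumptions; a real system would use planned path costs over the known terrain.

```python
from math import dist  # Euclidean distance; real robots would use planned path costs

def tour_cost(start, targets):
    """Cost of visiting the targets in the given order, starting from 'start'."""
    cost, position = 0.0, start
    for t in targets:
        cost += dist(position, t)
        position = t
    return cost

def marginal_cost(start, targets, new_target):
    """Bid value: cheapest increase in tour cost when inserting new_target anywhere in the tour."""
    base = tour_cost(start, targets)
    best = float("inf")
    for i in range(len(targets) + 1):
        candidate = targets[:i] + [new_target] + targets[i:]
        best = min(best, tour_cost(start, candidate) - base)
    return best

def ssi_allocate(robots, targets):
    """Sequential single-item auction: one target per round, the lowest marginal-cost bid wins."""
    allocation = {name: [] for name in robots}
    unassigned = list(targets)
    while unassigned:
        bids = [(marginal_cost(pos, allocation[name], t), name, t)
                for name, pos in robots.items() for t in unassigned]
        _, winner, target = min(bids)
        allocation[winner].append(target)
        unassigned.remove(target)
    return allocation

robots = {"r1": (0.0, 0.0), "r2": (10.0, 0.0)}
targets = [(1.0, 1.0), (2.0, 0.0), (9.0, 1.0), (11.0, 2.0)]
print(ssi_allocate(robots, targets))
```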