Plan of the course
• Introduction
• Rules of Encounter
• Strategic Negotiation
• Auctions: protocols, strategies
• Argumentation

Machines Controlling and Sharing Resources
• Electrical grids (load balancing)
• Telecommunications networks (routing)
• PDAs (schedulers)
• Shared databases (intelligent access)
• Traffic control (coordination)

Broad Working Assumption
• Designers (from different companies, countries, etc.) come together to agree on standards for how their automated agents will interact (in a given domain).
• They discuss various possibilities and their tradeoffs, and agree on protocols, strategies, and social laws to be implemented in their machines.

Attributes of Standards
• Efficient: Pareto optimal
• Stable: no incentive to deviate
• Simple: low computational and communication cost
• Distributed: no central decision-maker
• Symmetric: agents play equivalent roles
The goal is to design protocols for specific classes of domains that satisfy some or all of these attributes.

Distributed Artificial Intelligence (DAI)
• Distributed Problem Solving (DPS): centrally designed systems with built-in cooperation and a global problem to solve.
• Multi-Agent Systems (MAS): a group of utility-maximizing heterogeneous agents co-existing in the same environment, possibly competitive.

Phone Call Competition Example
• A customer wishes to place a long-distance call.
• Carriers simultaneously bid, sending proposed prices.
• The phone automatically chooses the carrier (dynamically).
Bids: MCI $0.18, AT&T $0.20, Sprint $0.23

Best Bid Wins
• The phone chooses the carrier with the lowest bid.
• The carrier gets the amount that it bid.
Bids: MCI $0.18, AT&T $0.20, Sprint $0.23

Attributes of the Mechanism
• Distributed
• Symmetric
• Stable
• Simple
• Efficient
• Carriers have an incentive to invest effort in strategic behavior.
MCI (bid $0.18): "Maybe I can bid as high as $0.21..."
Bids: MCI $0.18, AT&T $0.20, Sprint $0.23

Best Bid Wins, Gets Second Price
• The phone chooses the carrier with the lowest bid.
• The carrier gets the amount of the second-best price.
Bids: MCI $0.18, AT&T $0.20, Sprint $0.23

Attributes of the Mechanism
• Distributed
• Symmetric
• Stable
• Simple
• Efficient
• Carriers have no incentive to invest effort in strategic behavior.
MCI (bid $0.18): "I have no reason to overbid..."
Bids: MCI $0.18, AT&T $0.20, Sprint $0.23

Database Domain (TOD)
Agents 1 and 2 query a common database:
• "All female employees making over $50,000 a year."
• "All female employees with more than three children."

Negotiation
"A discussion in which interested parties exchange information and come to an agreement." (Davis and Smith, 1977)
• Two-way exchange of information
• Each party evaluates information from its own perspective
• Final agreement is reached by mutual selection

Game Theory: Short Introduction
• Game theory is the study of decision making in multi-person situations where the outcome depends on everyone's choice.
• In decision theory, and in the theory of competitive equilibrium from economics, the other participants' actions are treated as an environmental parameter. The effect of the decision-maker's actions on the other participants is not taken into consideration.

Describing a Game
• Essential elements: players, actions, information, strategies, payoffs, outcome, and equilibria.
• Ways to present social interactions as a game:
  – Extensive form: the most complete description.
  – Strategic form: many details are omitted.
  – Coalitional form: binding agreements exist.

Example of a Two-Player Game
[Figure: game tree with players labeled "india" and "sikh" and actions "deal", "blow", and "op"; the payoffs are not recoverable from the extracted text.]

Nash Equilibrium
• An action profile is an ordered set \(a = (a_1, \dots, a_N)\) of one action for each of the N players in the game.
• An action profile a is a Nash equilibrium (Nash 53) of a strategic game if no agent j has a different action yielding an outcome that it prefers to the one generated when it chooses \(a_j\), given that every other player i chooses \(a_i\).

[Figure: larger game tree for the same example, with a chance node (probabilities 0.4 and 0.6) and actions "yes", "deal", "blow", and "op"; the payoffs are not recoverable from the extracted text.]

Rules of Encounter
Jeffrey S.
Rosenschein, Gilad Zlotkin

Domain Theory
• Task Oriented Domains
  – Agents have tasks to achieve
  – Task redistribution
• State Oriented Domains
  – Goals specify acceptable final states
  – Side effects
  – Joint plan and schedules
• Worth Oriented Domains
  – Function rating states' acceptability
  – Joint plan, schedules, and goal relaxation

Postmen Domain (TOD)
[Figure: agents 1 and 2 at the Post Office, and a graph of delivery nodes a to f.]

Database Domain (TOD)
Agents 1 and 2 query a common database:
• "All female employees making over $50,000 a year."
• "All female employees with more than three children."

Fax Domain (TOD)
[Figure: agents 1 and 2 with faxes to send to locations a to f.]
• Cost is only to establish connection.

Slotted Blocks World (SOD)
[Figure: numbered blocks placed in numbered slots.]

The Multi-Agent Tileworld (WOD)
[Figure: grid with agents A and B, holes, tiles, and obstacles; the numeric labels are not recoverable from the extracted text.]

Task Oriented Domain (TOD)
A TOD is a tuple \(\langle T, A, c \rangle\) where:
• T is the set of all possible tasks
• \(A = A_1, \dots, A_n\) is a list of agents
• c is a monotonic function \(c : [2^T] \to \mathbb{R}^+\)
An encounter is a list \(T_1, \dots, T_n\) of finite sets of tasks from T such that agent \(A_k\) needs to achieve all the tasks in \(T_k\) (also called agent \(A_k\)'s goal).
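To make the TOD definition concrete, here is a minimal Python sketch of a Fax-style domain. The destinations, the per-destination connection costs in `CONNECTION_COST`, and the two-agent encounter are all hypothetical values chosen for illustration; only the structure (a task set, a monotonic cost function over task sets, and one goal set per agent) follows the definition above.

```python
from itertools import chain, combinations

# Hypothetical per-destination connection costs (not from the slides).
CONNECTION_COST = {"a": 1, "b": 2, "c": 1, "d": 3}

def cost(tasks):
    """Cost of a task set: one connection fee per destination in the set."""
    return sum(CONNECTION_COST[d] for d in tasks)

def subsets(tasks):
    """All subsets of a finite task set, as frozensets."""
    items = list(tasks)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

def is_monotonic(cost_fn, tasks):
    """Brute-force check that X being a subset of Y implies c(X) <= c(Y)."""
    all_sets = subsets(tasks)
    return all(cost_fn(x) <= cost_fn(y)
               for x in all_sets for y in all_sets if x <= y)

T = frozenset(CONNECTION_COST)        # the set of all possible tasks
encounter = [frozenset({"a", "b"}),   # T_1: goal of agent A_1
             frozenset({"a", "c"})]   # T_2: goal of agent A_2

stand_alone = [cost(t) for t in encounter]    # cost if each agent works alone
joint = cost(frozenset().union(*encounter))   # cost of all tasks done by one agent
print(is_monotonic(cost, T), stand_alone, joint)
```

The gap between the summed stand-alone costs and the joint cost is what the agents can gain by redistributing tasks.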
Building Blocks
• Domain
  – A precise definition of what a goal is
  – Agent operations
• Negotiation Protocol
  – A definition of a deal
  – A definition of utility
  – A definition of the conflict deal
• Negotiation Strategy
  – In equilibrium
  – Incentive-compatible

Deal and Utility in Two-Agent TOD
• A deal \(\delta\) is a pair \((D_1, D_2)\) such that \(D_1 \cup D_2 = T_1 \cup T_2\)
• Conflict deal: \(\Theta = (T_1, T_2)\)
• \(\mathrm{Utility}_i(\delta) = \mathrm{Cost}(T_i) - \mathrm{Cost}(D_i)\)

Negotiation Protocols
• Agents use a product-maximizing negotiation protocol (as in Nash bargaining theory).
• It should be a symmetric PMM (product-maximizing mechanism).
• Examples: 1-step protocol, monotonic concession protocol, …

Building Blocks
• Domain
  – A precise definition of what a goal is
  – Agent operations
• Negotiation Protocol
  – A definition of a deal
  – A definition of utility
  – A definition of the conflict deal
• Negotiation Strategy
  – In equilibrium
  – Incentive-compatible

Negotiation with Incomplete Information
[Figure: postmen graph with the Post Office and nodes a to h, marked with the letters of agents 1 and 2.]
What if the agents don't know each other's letters?

1-Phase Game: Broadcast Tasks
[Figure: the same postmen graph, with each agent's broadcast letters.]
Agents will flip a coin to decide who delivers all the letters.

Hiding Letters
[Figure: the same postmen graph; one of agent 1's letters is marked "(hidden)".]
They then agree that agent 2 delivers to f and e.

Another Possibility for Deception
[Figure: Post Office with nodes a, b, and c; agents 1 and 2 each have letters for b and c.]
They will agree to flip a coin to decide who goes to b and who goes to c.

Phantom Letter
[Figure: the same graph with an extra node d; agent 1 declares letters for b, c, and d, where the letter for d is a phantom; agent 2 has letters for b and c.]
They agree that agent 1 goes to c.

Negotiation over Mixed Deals
• Mixed deal \((D_1, D_2) : p\): the agents perform \((D_1, D_2)\) with probability p, and the symmetric deal \((D_2, D_1)\) with probability 1 − p.
• Theorem: With mixed deals, agents can always agree on the "all-or-nothing" deal, in which \(D_1 = T_1 \cup T_2\) and \(D_2 = \emptyset\).

Hiding Letters with Mixed All-or-Nothing Deals
[Figure: the postmen graph from the Hiding Letters example.]
They will agree on the mixed deal where agent 1 has a 3/8 chance of delivering to f and e.
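The probability in an all-or-nothing mixed deal can be worked out from the utility definition above. If agent 1 performs all declared tasks with probability p, its expected utility is c(T1) − p·c(T1 ∪ T2) and agent 2's is c(T2) − (1 − p)·c(T1 ∪ T2). The two utilities sum to a constant, so their product is maximized when they are equal, which gives p = 1/2 + (c(T1) − c(T2)) / (2·c(T1 ∪ T2)). The sketch below assumes this product-maximizing rule. The costs 6, 8, and 8 are hypothetical, chosen so that p comes out as 3/8; they are not read off the slide's graph.

```python
from fractions import Fraction

def all_or_nothing_probability(c1, c2, c_all):
    """Probability that agent 1 performs all tasks, chosen to maximize the
    product of expected utilities (valid when the result lies in [0, 1])."""
    return Fraction(1, 2) + Fraction(c1 - c2, 2 * c_all)

def expected_utilities(c1, c2, c_all, p):
    """Expected utility of each agent under the mixed deal (all, nothing):p."""
    u1 = c1 - p * c_all          # agent 1 pays c_all with probability p
    u2 = c2 - (1 - p) * c_all    # agent 2 pays c_all with probability 1 - p
    return u1, u2

# Hypothetical stand-alone costs of the declared task sets and of their union.
c1, c2, c_all = 6, 8, 8
p = all_or_nothing_probability(c1, c2, c_all)
u1, u2 = expected_utilities(c1, c2, c_all, p)
print(p, u1, u2)  # 3/8 3 3
```

Because the agent that declares the cheaper task set gets the smaller probability of doing all the work, under-declaring (hiding a task) shifts p in the liar's favor. That is the effect the hidden-letter example illustrates.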
Phantom Letters with Mixed Deals
[Figure: Post Office with nodes a, b, c, and d; agent 1 declares letters for b, c, and d, where the letter for d is a phantom; agent 2 has letters for b and c.]
They will agree on the mixed deal where A has a 3/4 chance of delivering all the letters, lowering his expected utility.

Sub-Additive TODs
A TOD \(\langle T, A, c \rangle\) is sub-additive if for all finite sets of tasks \(X, Y \subseteq T\) we have:
\(c(X \cup Y) \le c(X) + c(Y)\)

Sub-Additivity
[Figure: two overlapping sets X and Y.]
\(c(X \cup Y) \le c(X) + c(Y)\)

Sub-Additive TODs
• The Postmen Domain, Database Domain, and Fax Domain are sub-additive.
• The "Delivery Domain" (where postmen don't have to return to the Post Office) is not sub-additive.

Incentive Compatible Mechanisms
[Figures: the hidden-letter and phantom-letter postmen examples.]
Sub-Additive
          Hidden   Phantom
Pure      L        L
A/N       T        T/P
Mix       L        T/P
(L: lying may be beneficial; T: telling the truth is beneficial; T/P: telling the truth is beneficial when discovered lies are penalized. A/N: all-or-nothing deals.)
Theorem: For all encounters in all sub-additive TODs, when using a PMM over all-or-nothing deals, no agent has an incentive to hide a task.

Decoy Tasks
Decoy tasks, however, can be beneficial even with all-or-nothing deals.
[Figure: graph example with the tasks of agents 1 and 2.]
Sub-Additive
          Hidden   Phantom   Decoy
Pure      L        L         L
A/N       T        T/P       L
Mix       L        T/P       L

Concave TODs
A TOD \(\langle T, A, c \rangle\) is concave if for all finite sets of tasks \(Y, Z \subseteq T\), and \(X \subseteq Y\), we have:
\(c(Y \cup Z) - c(Y) \le c(X \cup Z) - c(X)\)
Concavity implies sub-additivity.

Concavity
[Figure: set X contained in set Y, with Z overlapping both.]
The cost Z adds to X is at least as large as the cost it adds to Y. (\(Z \setminus X\) is a superset of \(Z \setminus Y\).)

Concave TODs
• The Database Domain and Fax Domain are concave (not the Postmen Domain, unless restricted to trees).
[Figure: postmen graph with task sets X and Z and unit edge costs.]
• This example was not concave; Z adds 0 to X, but adds 2 to its superset Y (all blue nodes).

Three-Dimensional Incentive Compatible Mechanism Table
Theorem: For all encounters in all concave TODs, when using a PMM over all-or-nothing deals, no agent has any incentive to lie.
Concave
          Hidden   Phantom   Decoy
Pure      L        L         L
A/N       T        T         T
Mix       L        T         T
Sub-Additive
          Hidden   Phantom   Decoy
Pure      L        L         L
A/N       T        T/P       L
Mix       L        T/P       L

Modular TODs
A TOD \(\langle T, A, c \rangle\) is modular if for all finite sets of tasks \(X, Y \subseteq T\) we have:
\(c(X \cup Y) = c(X) + c(Y) - c(X \cap Y)\)
Modularity implies concavity.
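The three definitions (sub-additive, concave, modular) can be checked by brute force on a small task set. The Python sketch below does this for three hypothetical cost functions: `fax_cost` (one unit per destination), `setup_cost` (a fixed setup fee plus one unit per task), and `quadratic_cost`. None of these are taken from the slides; they are chosen to show one modular function, one that is concave but not modular, and one that is not even sub-additive.

```python
from itertools import chain, combinations

def subsets(tasks):
    """All subsets of a finite task set, as frozensets."""
    items = list(tasks)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

def is_subadditive(c, tasks):
    s = subsets(tasks)
    return all(c(x | y) <= c(x) + c(y) for x in s for y in s)

def is_concave(c, tasks):
    s = subsets(tasks)
    # For X subset of Y: Z adds no more to Y than it adds to X.
    return all(c(y | z) - c(y) <= c(x | z) - c(x)
               for y in s for z in s for x in s if x <= y)

def is_modular(c, tasks):
    s = subsets(tasks)
    return all(c(x | y) == c(x) + c(y) - c(x & y) for x in s for y in s)

TASKS = frozenset("abcd")

def fax_cost(x):        # one connection per destination
    return len(x)

def setup_cost(x):      # hypothetical: fixed setup of 2, plus 1 per task
    return 0 if not x else 2 + len(x)

def quadratic_cost(x):  # hypothetical: cost grows faster than the task count
    return len(x) ** 2

results = {
    f.__name__: (is_subadditive(f, TASKS), is_concave(f, TASKS), is_modular(f, TASKS))
    for f in (fax_cost, setup_cost, quadratic_cost)
}
for name, flags in results.items():
    print(name, flags)
```

In these examples every modular function is also concave and every concave function is also sub-additive, consistent with the implications stated on the slides.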
Modularity
[Figure: two overlapping sets X and Y.]
\(c(X \cup Y) = c(X) + c(Y) - c(X \cap Y)\)

Modular TODs
• The Fax Domain is modular (not the Database Domain nor the Postmen Domain, unless restricted to a star topology).
• Even in modular TODs, hiding tasks can be beneficial in general mixed deals.

Three-Dimensional Incentive Compatible Mechanism Table
(H: hidden, P: phantom, D: decoy)
Sub-Additive
          H      P      D
Pure      L      L      L
A/N       T      T/P    L
Mix       L      T/P    L
Concave
          H      P      D
Pure      L      L      L
A/N       T      T      T
Mix       L      T      T
Modular
          H      P      D
Pure      L      T      T
A/N       T      T      T
Mix       L      T      T

Related Work
• Coalition formation: Shehory, Sandholm
• Mechanism design: Ephrati, Kraus, Tennenholtz
• Other models of negotiation: Sycara, Durfee, Lesser, Gasser, Gmytrasiewicz, Jennings
• Consensus mechanisms, voting techniques, economic models: Ephrati, Wellman, Sandholm

Conclusions
• By appropriately adjusting the rules of encounter by which agents must interact, we can influence the private strategies that designers build into their machines.
• The interaction mechanism should ensure the efficiency of multi-agent systems.
[Diagram labels: Rules of Encounter, Efficiency]

Conclusions
• To maintain the efficiency of dynamic multi-agent systems over time, the rules must also be stable.
• The use of formal tools enables the design of efficient and stable mechanisms, and the precise characterization of their properties.
[Diagram labels: Stability, Formal Tools]

Strategic Negotiation
Collaborators: Jon Wilkenfeld, Rina Schwartz-Azoulay, Orna Shechter, Esti Freitsis

DAI Overview
[Diagram: AI, DAI, DPS, MA, strategic negotiation.]

Strategic Negotiation Model
• Model of alternating offers (Rubinstein), which takes negotiation time into consideration and thereby reduces negotiation time.
• During the strategic negotiation, agents communicate their respective desires in order to reach a mutually beneficial agreement.
• The model provides a unified solution to many problems.

Structure of the Negotiation
• There are N self-motivated agents, randomly designated 1, 2, …
• All the agents negotiate to reach an agreement.
• The negotiation process may include several iterations at equidistant times 0, 1, 2, … and can continue forever.
• In each time period t, agent j(t) = t mod N makes an offer.

Structure of the Negotiation (cont.)
• The other agents respond simultaneously: Yes, No, or Opt (opt out).
• If the offer is accepted by all the agents, the last offer is implemented.
• If at least one agent opts out, a conflict occurs.
• Otherwise (the offer was rejected by at least one agent), the negotiation proceeds to period t + 1.

Applications
• Information servers (large databases)
• Resource sharing
• Task distribution
• Computer-assisted negotiation
• Union/management negotiation

Negotiation on Data Allocation in a Multi-Server Environment

Environment Description
• There are several information servers.
• Each server is located in a different geographical area.
• Each server receives queries from the clients in its area, and sends documents as responses to queries.
• These documents can be stored locally or on another server.

Environment Description
[Figure: a client in area i sends a query to server i; a query and the resulting document(s) pass between server i and server j in area j, which are separated by some distance.]

Environment Description (cont.)
• The information is clustered in datasets (corresponding to files, fragments, etc.).
• Each new dataset has to be allocated to one of the servers by mutual agreement among the servers.
• Each server wants the datasets stored in a location that reduces its communication and storage costs.
• A negotiation session is initiated when a set of new datasets arrives.

Motivation
• Cooperation among servers with similar areas of interest (e.g., Web servers).
• The Data and Information System component of NASA's Earth Observing System (EOSDIS): a distributed knowledge system that supports archival and distribution of data at multiple, independent servers.

Motivation (cont.)
• Each data collection, or file, is called a dataset.
• The datasets are huge, so each dataset has only one copy.
• The current policy for data allocation in NASA is static: old datasets are not reallocated, and each new dataset is stored at the server with the nearest topics (defined according to the topics of the datasets already stored by that server).

Related Work: File Allocation Problem
• The original problem: how to distribute files among computers in order to optimize system performance.
• Our problem: how self-motivated servers can decide on the distribution of files when each server has its own objectives.

Basic Definitions
• SERVERS: the set of servers.
• DATASETS: the set of datasets (files) to be allocated.
• Allocation: a mapping of each dataset to one of the servers. The set of all possible allocations is denoted by Allocs.
• U: the utility function of each server.

The Conflict Allocation
• If at least one server opts out of the negotiation, then the conflict allocation conflict_alloc is implemented.
• We consider the conflict allocation to be the static allocation (each dataset is stored at the server with the closest topics).

Utility Function
• \(U_{server}(alloc, t)\) specifies the utility of server from \(alloc \in Allocs\) at time t. It consists of:
  – the utility from the assignment of each dataset;
  – the cost of negotiation delay.
• \(U_{server}(alloc, 0) = \sum_{x \in DATASETS} V_{server}(x, alloc(x))\)

Parameters of Utility
• query price: payment for retrieved documents.
• usage(ds, s): the expected number of documents of dataset ds requested by clients in the area of server s.
• storage costs, retrieve costs, answer costs.

Cost over Time
• Cost of communication and computation time of the negotiation.
• Loss of unused information: new documents cannot be used until the negotiation ends.
• Dataset usage and storage cost are assumed to decrease over time, with the same discount ratio \(\delta < 1\).
• Thus, there is a constant discount ratio of the utility from an allocation: \(U_{server}(alloc, t) = \delta^t \cdot U_{server}(alloc, 0) - t \cdot C\).

Assumptions
• Each server prefers any agreement over continuation of the negotiation indefinitely.
• The utility of each server from the conflict allocation is always greater than or equal to 0.
• OFFERS: the set of allocations that are preferred by all the agents over opting out.

Equilibrium
• Nash equilibrium: a strategy profile p is a Nash equilibrium if no player i has a different strategy yielding an outcome that it prefers to the one generated when it chooses \(p_i\).
• Subgame perfect equilibrium: a strategy profile is a subgame perfect equilibrium if the strategy profile it induces in every subgame is a Nash equilibrium of that subgame.

Negotiation Analysis: Simultaneous Responses
• Simultaneous responses: a server, when responding, is not informed of the other responses.
• Theorem: For each offer \(x \in OFFERS\), there is a subgame-perfect equilibrium of the bargaining game with the outcome x offered and unanimously accepted in period 0.

Choosing the Allocation
The designers of the servers can agree in advance on a joint technique for choosing x:
• giving each server at least its conflict utility;
• maximizing a social welfare criterion:
  – the sum of the servers' utilities,
  – or the generalized Nash product of the servers' utilities: \(\prod_{s} (U_s(x) - U_s(conflict))\).

Choosing the Allocation (cont.)
• The problem of finding an optimal allocation is NP-complete (by a reduction from multiprocessor scheduling).
• When finding x is intractable, we suggest the following protocol:
  – each server searches for an allocation;
  – the allocation that maximizes the predefined social welfare criterion is chosen.

Search Methods
We have implemented the following algorithms:
• A backtracking algorithm: searches the search space of the allocation problem.
• A random-restart hill-climbing algorithm: starts with a random allocation and tries to improve it.
• A genetic algorithm: searches by simulating an evolutionary process. Each individual represents an allocation. The algorithm involves reproduction, crossover, and mutation of individuals.

Experimental Evaluation
• How do the parameters influence the results of the negotiation?
• vcost(alloc): the variable costs due to an allocation (excluding storage_cost and the gains due to queries).
• vcost_ratio: the ratio of the vcosts when using negotiation to the vcosts of the static allocation.

Effect of Parameters on the Results
• As the number of servers grows, vcost_ratio increases (more complex computations).
• As the number of datasets grows, vcost_ratio decreases (negotiation is more beneficial).
• Changing the mean usage did not influence vcost_ratio significantly, but vcost_ratio decreases as the standard deviation of the usage increases.

Influence of Parameters (cont.)
• When the standard deviation of the distances between servers increases, vcost_ratio decreases.
• When the distance between servers increases, vcost_ratio decreases.
• In the domains tested, vcost_ratio varied in the same direction as answer_cost and storage_cost, and in the opposite direction to retrieve_cost and query_price.

Social Criteria
• We studied the effect of the choice of the social welfare criterion on the results.
• We compared the following criteria:
  – sum of agents' utilities;
  – product of agents' utilities.
• Maximizing the sum achieves a lower vcost_ratio.
• Maximizing the product achieves a lower dispersion of the agents' utilities.

Incomplete Information
Each server knows:
• The usage frequency of all datasets by clients from its area.
• The usage frequency of the datasets stored in it, by all clients.

Incomplete Information (cont.)
A revelation mechanism:
• First, all the servers simultaneously report all their private information:
  – for each dataset, the past usage of the dataset by the reporting server;
  – for each server, the past usage of each of the reporter's local datasets by that server.
• Then, the negotiation proceeds as in the complete information case.

Incomplete Information (cont.)
• Lemma: There is a Nash equilibrium in which each server tells the truth about its past usage of remote datasets and about the other servers' usage of its local datasets.
• Lies concerning details about local usage of local datasets are intractable.
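Returning to the two social welfare criteria compared above (sum versus product of the servers' utilities), the following Python sketch enumerates all allocations for a tiny instance and picks the best member of OFFERS under each criterion. The servers, datasets, per-assignment values in `V`, and the conflict allocation are hypothetical numbers chosen so that the two criteria disagree. The exhaustive search is only workable because the instance is tiny.

```python
from itertools import product
from math import prod

SERVERS = ["s1", "s2"]
DATASETS = ["d1", "d2"]

# Hypothetical V_server(dataset, location): value to a server of a dataset
# being stored at a given location.
V = {
    "s1": {("d1", "s1"): 5, ("d1", "s2"): 2, ("d2", "s1"): 2, ("d2", "s2"): 6},
    "s2": {("d1", "s1"): 1, ("d1", "s2"): 9, ("d2", "s1"): 1, ("d2", "s2"): 5},
}
CONFLICT = {"d1": "s1", "d2": "s1"}   # hypothetical static allocation

def utility(server, alloc):
    """U_server(alloc, 0): sum of V_server over the dataset assignments."""
    return sum(V[server][(ds, alloc[ds])] for ds in DATASETS)

def all_allocations():
    for locations in product(SERVERS, repeat=len(DATASETS)):
        yield dict(zip(DATASETS, locations))

def offers():
    """Allocations that every server strictly prefers to the conflict allocation."""
    return [a for a in all_allocations()
            if all(utility(s, a) > utility(s, CONFLICT) for s in SERVERS)]

def best_by_sum(candidates):
    return max(candidates, key=lambda a: sum(utility(s, a) for s in SERVERS))

def best_by_nash_product(candidates):
    return max(candidates, key=lambda a: prod(
        utility(s, a) - utility(s, CONFLICT) for s in SERVERS))

sum_choice = best_by_sum(offers())
product_choice = best_by_nash_product(offers())
print(sum_choice, [utility(s, sum_choice) for s in SERVERS])
print(product_choice, [utility(s, product_choice) for s in SERVERS])
```

In this instance the sum criterion picks the allocation with utilities (8, 14), while the Nash product picks (11, 6). That means a higher total for the sum and a more even split of the gains over the conflict allocation for the product.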
Summary: Negotiation on Data Allocation
• We have considered the data allocation problem in a distributed environment.
• We have presented the utility function of the servers, which expresses their preferences.
• We have proposed using a negotiation protocol for solving the problem.
• For incomplete information situations, a revelation process was added to the protocol.

Negotiations in the Pollution Sharing Problem
Collaborator: Esti Freitsis

Environment Description
• There are some closely grouped plants in an industrial region.
• Each plant can produce several types of products.
• Each plant has a utility function (profit).
• There are several types of polluting substances.
• Each plant has norms restricting the maximal emission of each polluting substance that it emits. The pollution always has to be below these norms.
• We refer to the situation in which only these norms have to be met as usual circumstances.

Special Circumstances
• Sometimes there is a need to reduce pollution for some period because of external factors such as weather (high humidity, wind towards a residential area).
• In this case plants receive new norms.
• We refer to this situation as special circumstances.

Current Solution
• Current solution: each plant reduces pollution according to the new norms.
• Disadvantage: for one plant it may be less costly to reduce one substance, while for another it is less costly to reduce a different substance.

Negotiations
• Our solution: plants negotiate to reach beneficial agreements about which substances' emissions must be reduced, and by what percentage each of them must be reduced.
• The conflict solution: following the new norms.
• We consider complete information situations.

Negotiation Protocols
• Simultaneous responses: an agent responding to an offer is not informed of the other responses.
• Sequential responses: an agent responding to an offer is informed of the responses of the preceding agents (assuming that the agents are ordered).
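As a rough illustration of the simultaneous-response protocol (in period t the agent with index t mod N makes an offer, and the others reply Yes, No, or Opt without seeing each other's replies), here is a small Python sketch. The `ThresholdAgent` class, the offers expressed as per-agent payoffs, and the `max_periods` cutoff are assumptions made for the simulation; the model itself allows the negotiation to continue indefinitely.

```python
from enum import Enum

class Response(Enum):
    YES = "yes"
    NO = "no"
    OPT = "opt out"

class ThresholdAgent:
    """Hypothetical agent: always proposes the same offer and accepts any
    offer that gives it at least its reservation value."""
    def __init__(self, name, proposal, reservation):
        self.name = name
        self.proposal = proposal
        self.reservation = reservation

    def propose(self, t):
        return self.proposal

    def respond(self, offer, t):
        ok = offer[self.name] >= self.reservation
        return Response.YES if ok else Response.NO

def negotiate(agents, conflict, max_periods):
    """Run the protocol; return (outcome, period in which it was reached)."""
    for t in range(max_periods):
        proposer = agents[t % len(agents)]
        offer = proposer.propose(t)
        # Simultaneous responses: each reply is computed without the others.
        responses = [a.respond(offer, t) for a in agents if a is not proposer]
        if all(r is Response.YES for r in responses):
            return offer, t            # unanimous acceptance
        if Response.OPT in responses:
            return conflict, t         # someone opted out
    return conflict, max_periods       # simulation cutoff, not part of the model

agents = [
    ThresholdAgent("A", {"A": 6, "B": 2, "C": 2}, reservation=3),
    ThresholdAgent("B", {"A": 4, "B": 3, "C": 3}, reservation=3),
    ThresholdAgent("C", {"A": 2, "B": 2, "C": 6}, reservation=3),
]
outcome, period = negotiate(agents, conflict={"A": 1, "B": 1, "C": 1}, max_periods=10)
print(outcome, period)
```

Here A's offer in period 0 is rejected by B and C, and B's offer in period 1 is accepted by both A and C. A sequential-response variant would pass the earlier replies to each later responder.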
Negotiation Strategies for Simultaneous Responses
As in the data allocation case:
• For each possible agreement x that is better for all the plants than the conflict solution, there is a subgame-perfect equilibrium of the bargaining game with the outcome x offered and unanimously accepted in period 0.

Negotiation Strategies for Sequential Responses
• Assumption: there is a time period T at which the negotiation cannot continue any further. At T the conflict allocation is implemented.
• Perfect equilibrium by backward induction:
  – At T−1, if the negotiation hasn't ended, \(A_{T-1}\) suggests the agreement that is best for itself among those that are better for all agents than the conflict solution (denoted by \(O_{T-1}\)); the other agents accept.
  – At T−2, \(A_{T-2}\) suggests the agreement that is best for itself among those that are better for all agents than the conflict solution and \(O_{T-1}\) (denoted by \(O_{T-2}\)). The other agents accept.
  – By induction, in the first time period \(A_0\) offers \(O_0\) and the others accept.

Assumptions about the Environment
• Profit is a linear function of the number of items of each product produced by the plant.
• Pollution is a linear function of the number of items of each product produced.

Techniques Which Were Checked
• Strategic negotiation:
  – Sequential responses: backtracking
  – Simultaneous responses
• Maximization of the sum with guarantees of default profit: the Simplex method (a method for linear optimization)
• Nash product: Praxis (a method for multi-variable nonlinear function minimization)
• Hill climbing

Simulation Parameters
• The number of plants is varied from 5 to 20.
• The number of pollution types is varied from 5 to 20.
• For each product, pollution of a given type is produced with probability 1/2.
• Each plant produces Max_prod different types of products. Max_prod is varied from 5 to 20.
• Pollution and profit per item of product, and the pollution constraints, are set randomly.
• Results: average of 25 simulation runs.
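The backward-induction argument for sequential responses can be sketched for a finite set of candidate agreements. This Python sketch makes several simplifying assumptions that are not in the slides: utilities do not change over time, "better" is read as "at least as good" (so the conflict outcome is always an acceptable fallback), the proposer in period t is agent t mod N, and the agreement names and utility values are hypothetical.

```python
def backward_induction(utilities, conflict, n_agents, deadline):
    """Return [O_0, ..., O_{T-1}]: the offer made (and accepted) in each period.

    utilities maps an agreement name to a tuple of per-agent utilities.
    """
    floor = utilities[conflict]        # what every agent gets at the deadline T
    plan = [None] * deadline
    for t in range(deadline - 1, -1, -1):
        proposer = t % n_agents
        # Agreements no agent would reject in favor of waiting one more period.
        acceptable = [x for x, u in utilities.items()
                      if all(u[i] >= floor[i] for i in range(n_agents))]
        best = max(acceptable, key=lambda x: utilities[x][proposer])
        plan[t] = best
        floor = utilities[best]        # this outcome becomes the bar for period t-1
    return plan

# Hypothetical agreements with (agent 0, agent 1) utilities.
UTILITIES = {"conflict": (1, 1), "x": (5, 2), "y": (2, 5), "z": (3, 3)}

plan_t2 = backward_induction(UTILITIES, "conflict", n_agents=2, deadline=2)
plan_t3 = backward_induction(UTILITIES, "conflict", n_agents=2, deadline=3)
print(plan_t2[0], plan_t3[0])  # y x
```

With a deadline of 2 the last proposer is agent 1, so the period-0 offer is its favorite, "y". With a deadline of 3 the last proposer is agent 0 and the period-0 offer is "x". Either way the agreement is reached in the first period.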
[Figures: simulation results]
• Plants' utility as a function of the number of plants
• Standard deviation as a function of the number of plants
• Computation time as a function of the number of plants
• Plants' utility as a function of the number of pollution substances
• Standard deviation as a function of the number of pollution substances
• Computation time as a function of the number of pollution substances
• Plants' utility as a function of the number of products
• Standard deviation as a function of the number of products
• Computation time as a function of the number of products (two slides)

Conclusions
• Maximizing the sum yields the highest average utility, but also the highest standard deviation; it requires agreement between the designers on selecting a solution.
• Backward induction yields a reasonable average utility with low standard deviations, and needs no agreement among the designers on a detailed protocol.
• Ongoing work: incomplete information.

Sharing Resources Through Negotiation
• Joint resource: public communication system; satellite.
• Agents: self-motivated.
• Environment: no central controller.

Environment Description
• Two agents must share a joint resource; the resource can only be used by one agent at a time.
• No central controller.
• One agent (A) is using the resource, and the second (W) wants to use it too.
• The agents negotiate to reach an agreement: a schedule <s, t> that divides the usage of the resource.

Environment Description (cont.)
• A continues to use the resource as the negotiation proceeds: A gains over time.
• W is not able to use the resource: W loses over time.
• Opting out causes damage to the resource: both agents wait q time steps.
• Additional option: an agent can leave the negotiation.

Applying the Strategic Model
• We developed a detailed utility function for the agents (U_A; U_W).
• Parameters: type of goal, deadlines, costs of negotiation, gains from goal, etc.
• Main factor in the negotiation: the best agreement for A that is still better for W than opting out (O_n).

Perfect Equilibrium Strategies
• O_n depends on the specific situation; we proved lemmas which specify the value of O_n as a function of the utility function parameters.
• Complete information: the negotiation ends after at most one step, either with an agreement or with W leaving.
• The strategies are simple.

Experiments Using MINUET
[Screenshot: Agent 1 is working on goal 102; Agent 2; a request <5,3> is sent and received; resource 1001 is free and resource 1002 is busy.]

Experimental Results
Metric    Utility score   Abandon goals   Nego./Alter.
Nego.     91%             9.6             21.2
EDF       91%             8.4             15.5

Summary
• A strategic model of negotiation, taking the passage of time into account.
• We consider a wide range of situations:
  – complete or incomplete information;
  – N > 2 agents;
  – all agents lose over time, or some lose and some gain over time.

Summary (cont.)
• The model was applied to different domains.
• We found simple and stable strategies.
• Negotiation ends without delay.