Late-Breaking Developments in the Field of Artificial Intelligence: Papers Presented at the Twenty-Seventh AAAI Conference on Artificial Intelligence

Algorithm Selection in Bilateral Negotiation

Litan Ilany and Ya'akov (Kobi) Gal
Ben-Gurion University of the Negev, Israel

Copyright © 2013, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract

Despite the abundance of strategies in the literature on repeated negotiation under incomplete information, there is no single negotiation strategy that is optimal for all possible settings. Thus, agent designers face an "algorithm selection" problem: which negotiation strategy to choose when facing a new negotiation. Our approach to this problem is to predict the performance of different strategies based on structural features of the domain and to select the negotiation strategy that is predicted to be most successful using a "meta-agent". This agent was able to outperform all of the finalists of the recent Automated Negotiation Agent Competition (ANAC). Our results offer insights for agent designers, demonstrating that "a little learning goes a long way", despite the inherent uncertainty associated with negotiation under incomplete information.

Introduction

Multi-attribute negotiation under incomplete information is a well-studied problem in AI, and there is an abundance of agent designs that use varying methods, heuristics, and techniques (Kraus 2001; Jennings et al. 2001; Lin and Kraus 2010). However, it has been widely observed that no single negotiation strategy is optimal for all domains. This paper studies the "algorithm selection problem" (Xu et al. 2008; Samulowitz and Memisevic 2007), that is, which of a set of possible strategies to choose when negotiating with an unknown partner in a new domain, where both the preferences of the negotiation partner and its negotiation strategy are unknown.

Our methodology consists of defining a set of features that encapsulate the information about the domain that is available to agents at the onset of negotiation. These features are then used to predict the performance of existing negotiation strategies on a new domain using canonical learning methods, including multi-layer neural networks, decision trees, and linear and logistic regression. We designed a "meta-agent" which, at run time, selects the negotiation strategy that is predicted to be most successful on a new domain based on its features. We demonstrate our approach empirically on the negotiation test bed used for the annual international Automated Negotiation Agent Competition (ANAC) (Baarslag et al. 2012). The meta-agent was able to significantly outperform the winner of the 2012 competition (the agent strategy that achieved the best average performance over all domains and all agents), and agreed more often with an "oracle" that chose the optimal agent strategy for each domain in retrospect. Our approach generalized to the case in which both the competition agents and the competition domains were not available for training. These results provide insights for agent designers in negotiation, demonstrating that "a little learning goes a long way", and suggesting that the algorithm selection approach may also be feasible for other multiagent optimization problems such as planning and decision-making.

The Setting

We first make the following definitions for bilateral negotiation: A domain consists of a set of issues. Each issue can take one of several possible discrete values. The domain is common knowledge to the negotiating parties. A proposal is an assignment of values to all issues. A negotiation round involves two participants, termed Agent1 and Agent2. Each agent has a profile that determines its valuation of a proposal, which is private information. In a negotiation round, Agent1 and Agent2 make alternating take-it-or-leave-it offers to each other until a proposal is accepted or a predetermined deadline is reached. If an agreement is reached on a proposal p_t at time t, the utility of Agent1 is its valuation of the agreed proposal.

Throughout this paper we use the same empirical setting and conditions used in the ANAC 2012 tournament (http://anac2012.ecs.soton.ac.uk/). Under these conditions, a "tournament" over a set of agents A and a set of domains D consists of multiple negotiation rounds between all agent pairs in A over all domains in D for all possible profiles; that is, each agent pair negotiates four times in each domain, once as proposer and once as responder for each profile. Agents do not know the identity of their negotiation partners. The agents are "reset" at the onset of each negotiation round, meaning that no information about a domain or the history of past rounds is accessible (and no learning across rounds is possible). The winner of the tournament is the agent that achieves the highest average score over all of the negotiation rounds.

Methodology

We define the algorithm selection problem in negotiation as follows: Given a set of training domains D and training agents A, which agent should be selected from A to negotiate in each domain of a competition that includes a set of test domains D′ and test agents A′? This paper studies the case where both agents and domains are unknown (A ∩ A′ = ∅, D ∩ D′ = ∅). Our approach also performs well when the test agents are known and the domains are unknown (D ∩ D′ = ∅, A = A′), a setting we omit for brevity. Our approach to the algorithm selection problem is to construct a "meta-agent" that chooses an agent in A by predicting its performance in each domain of the tournament using characteristics of the domain that are observable to the meta-agent at the onset of the negotiation.
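As a concrete illustration of the selection step, here is a minimal Python sketch. The feature extractor and the per-strategy performance predictors are hypothetical stand-ins for the learned models described in the text, and the strategy names and feature names are ours, not the paper's:

```python
import random

# Minimal sketch of the meta-agent's selection step. The features and the
# per-strategy predictors are illustrative stand-ins, not the paper's
# actual feature set or learned models.

def extract_features(domain):
    """Observable structural features of a negotiation domain (illustrative)."""
    bid_space = 1
    for values in domain["issues"].values():
        bid_space *= len(values)
    return {"n_issues": len(domain["issues"]), "bid_space": bid_space}

def select_strategy(domain, predictors):
    """Return the strategy with the highest predicted performance,
    breaking ties randomly."""
    feats = extract_features(domain)
    predicted = {name: p(feats) for name, p in predictors.items()}
    best = max(predicted.values())
    return random.choice([n for n, s in predicted.items() if s == best])

# Toy predictors mapping domain features to a predicted score.
predictors = {
    "strategy_A": lambda f: 0.60 + 0.01 * f["n_issues"],
    "strategy_B": lambda f: 0.70 - 0.001 * f["bid_space"],
}
domain = {"issues": {"price": [1, 2, 3], "delivery": ["fast", "slow"]}}
print(select_strategy(domain, predictors))
```

In practice the predictors would be the trained regression or classification models, one per candidate strategy, queried with the features of the new domain.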
Each domain d is described by several features of three types: (a) domain information that is common knowledge, (b) private information about the agent's profile, and (c) information inferred from the first proposal p that Agent1 receives from Agent2.

We describe the design of a class of meta-agents that use standard machine learning algorithms to predict the performance of different agents in a negotiation round. We used canonical supervised learning algorithms to predict the performance of agents given a domain and profile, with two measures of performance. The first measured the difference between the score of agent i when negotiating with any agent k in domain d and the average score over all negotiations among all agents in the domain. The second predicted whether agent i is a "winner" when negotiating with any agent k in a domain.

We used the following learning techniques to predict an agent's performance when negotiating in a new domain (adapting standard overfitting-avoidance methods for each technique): a regression tree algorithm (denoted "CART"; Breiman et al. 1984); a neural network (denoted "NN") with a single hidden layer of four hidden nodes, using early stopping after 150 training iterations; and a linear regression model (denoted "LinReg") with a forward-backward selection method for choosing the predictive variables (Shibata 1981). We used classification methods to determine the probability that an agent is a winner: a classification tree (denoted "CART (class)"), a logistic regression model (denoted "LogReg"), and a neural network with the same structure as described above (denoted "NN (class)"). All of these classifiers computed the probability that an agent is a winner given a specific domain and its features.
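The two performance measures can be illustrated with a short sketch; the tournament scores below are hypothetical, and the helper names are ours, not the paper's:

```python
# Sketch of the two performance targets described above, computed from
# hypothetical tournament logs: scores[(agent, domain)] = average score
# of that agent over its negotiations in that domain.
scores = {
    ("A", "d1"): 0.70, ("B", "d1"): 0.55, ("C", "d1"): 0.61,
    ("A", "d2"): 0.48, ("B", "d2"): 0.66, ("C", "d2"): 0.52,
}

def relative_score(agent, domain):
    """Measure 1 (regression target): the agent's score minus the
    average score over all agents in that domain."""
    domain_scores = [s for (a, d), s in scores.items() if d == domain]
    avg = sum(domain_scores) / len(domain_scores)
    return scores[(agent, domain)] - avg

def is_winner(agent, domain):
    """Measure 2 (classification target): whether the agent achieved
    the top score in that domain."""
    domain_scores = [s for (a, d), s in scores.items() if d == domain]
    return scores[(agent, domain)] == max(domain_scores)

print(relative_score("A", "d1"))  # positive: A scored above the d1 average
print(is_winner("B", "d2"))
```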
Assuming knowledge of a set of training domains D and agents A, the algorithm used by the meta-agent to choose an agent strategy is as follows (presented from the point of view of Agent1): Given a test domain d ∈ D′, the meta-agent computes the features associated with the domain. Some of these features depend on receiving a proposal from Agent2, but if the meta-agent is the first proposer, it needs to make a proposal to Agent2; lacking any information about the profile of Agent2, it makes the proposal that provides it with maximal utility. Finally, it predicts the performance of each agent in A on domain d and returns the agent with the highest predicted performance (breaking ties randomly). The run-time of the algorithm is dominated by the feature selection process, which is polynomial in the size of the bid space of the domain.

[Figure 1: Score comparison between meta-agents, AvgBest agent and oracle]

We report results on a data set that included the 7 agents and 72 domains (144 profiles) used in the finals of the ANAC 2012 competition. We used K-fold cross validation with 4 agents as the portfolio and 60 known domains for the learning process, whereas 3 unknown agents and 12 unknown domains were used for the testing tournament phase, resulting in a total of 18 folds: 6 different sets of domains and 3 different sets of agents. We selected training domains and agents from this set and compared the performance of the meta-agents, using the different learning methods and performance measures, to (1) the "AvgBest" strategy (selecting the agent in A with the maximum score averaged over all known domains in D), which is equivalent to choosing the winning agent in A according to the rules of the ANAC competition, and (2) the "Oracle" strategy, which assumes perfect knowledge of the agents' performance on the test domains and agents.
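The fold construction can be sketched as follows. How the 7 agents were actually split into the 3 agent sets is not stated in the text, so the particular subset choice below is an assumption; only the counts (6 domain sets × 3 agent sets = 18 folds) come from the paper:

```python
import itertools

# Sketch of the 18-fold evaluation design: 72 domains split into 6 disjoint
# test sets of 12, and 7 agents split into (4 portfolio, 3 held-out test)
# in 3 different ways. The choice of which 3-agent subsets to use is an
# assumption; the paper does not specify it.
agents = [f"agent{i}" for i in range(7)]
domains = [f"dom{i}" for i in range(72)]

# 6 disjoint folds of 12 test domains each (60 remain for training).
domain_folds = [domains[i * 12:(i + 1) * 12] for i in range(6)]

# 3 splits of 3 held-out test agents (the other 4 form the portfolio);
# here we simply take the first 3 of the possible 3-agent subsets.
agent_folds = [set(c) for c in itertools.combinations(agents, 3)][:3]

folds = [(test_d, test_a) for test_d in domain_folds for test_a in agent_folds]
print(len(folds))  # 6 x 3 = 18 train/test folds
```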
All of the competition parameters were set exactly as in the finals of the ANAC 2012 competition. Figure 1 compares the scores of the different meta-agents to the Oracle and AvgBest agents. All of the meta-agents outperformed the AvgBest agent in score (p < 0.01 for all meta-agents). Interestingly, the differences in score between the meta-agents were not significant (ANOVA, p = .792). Not shown in the figure is that all of the meta-agents made significantly more good agent-choices than the AvgBest agent (p < 0.017) and agreed more often with the Oracle agent-choice (p < 0.002). The number of oracle-choices made by the meta-agent using the CART (class) method was statistically significantly higher than that of the other meta-agents (p < 0.013).

Our results demonstrate the feasibility of using machine learning methods to solve the algorithm selection problem in negotiation, in that (1) all of the meta-agents significantly outperformed the AvgBest agent and agreed more often with the Oracle choice, and (2) the approach generalized well even when both the competition agents and domains were not known during training. (We emphasize that agents do not observe the identity of their negotiation opponent nor its profile in the tournament, even when the test agents are known.)

References

T. Baarslag, K. Fujita, E. H. Gerding, K. Hindriks, T. Ito, N. R. Jennings, C. Jonker, S. Kraus, R. Lin, V. Robu, et al. Evaluating practical negotiating agents: Results and analysis of the 2011 international competition. Artificial Intelligence, 2012.

L. Breiman, J. Friedman, C. J. Stone, and R. A. Olshen. Classification and Regression Trees. Chapman & Hall/CRC, 1984.

N. R. Jennings, P. Faratin, A. R. Lomuscio, S. Parsons, M. J. Wooldridge, and C. Sierra. Automated negotiation: Prospects, methods and challenges. Group Decision and Negotiation, 10(2):199–215, 2001.

S. Kraus. Strategic Negotiation in Multiagent Environments. MIT Press, 2001.

R. Lin and S. Kraus. Can automated agents proficiently negotiate with humans? Communications of the ACM, 53(1), 2010.

H. Samulowitz and R. Memisevic. Learning to solve QBF. In Proceedings of the National Conference on Artificial Intelligence, volume 22, page 255, 2007.

R. Shibata. An optimal selection of regression variables. Biometrika, 68(1):45–54, 1981.

L. Xu, F. Hutter, H. H. Hoos, and K. Leyton-Brown. SATzilla: Portfolio-based algorithm selection for SAT. Journal of Artificial Intelligence Research, 32(1):565–606, 2008.