Late-Breaking Developments in the Field of Artificial Intelligence
Papers Presented at the Twenty-Seventh AAAI Conference on Artificial Intelligence

Algorithm Selection in Bilateral Negotiation
Litan Ilany and Ya’akov (Kobi) Gal
Ben-Gurion University of the Negev, Israel
Abstract
Despite the abundance of strategies in the literature on repeated negotiation under incomplete information, no single negotiation strategy is optimal for all possible settings. Thus, agent designers face an “algorithm selection” problem: which negotiation strategy to choose when facing a new negotiation. Our approach to this problem is to predict the performance of different strategies based on structural features of the domain, and to select the negotiation strategy that is predicted to be most successful using a “meta-agent”. This agent was able to outperform all of the finalists of the recent Automated Negotiation Agent Competition (ANAC). Our results offer insights for agent designers, demonstrating that “a little learning goes a long way” despite the inherent uncertainty associated with negotiation under incomplete information.
The Setting
We first make the following definitions for bilateral negotiation: A domain consists of a set of issues, where each issue can take one of a set of possible discrete values. The domain is common knowledge to the negotiating parties. A proposal is an assignment of values to all issues. A negotiation round involves two participants, termed Agent1 and Agent2. Each agent has a profile that determines its valuation of a proposal; this profile is private information.
In a negotiation round, Agent1 and Agent2 make alternating take-it-or-leave-it offers to each other until a proposal
is accepted or a predetermined deadline is reached. If an
agreement is reached for a proposal p_t at time t, the utility
of Agent1 is its valuation of the agreed proposal.
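As an illustration, the definitions above can be sketched as simple data structures. This is a minimal sketch under the common assumption of additive valuations; all names here are our own illustrations, not from the ANAC codebase:

```python
from itertools import product

# A domain maps each issue to its set of possible discrete values.
domain = {
    "price": ["low", "medium", "high"],
    "delivery": ["pickup", "courier"],
}

# A proposal assigns one value to every issue in the domain.
def is_proposal(domain, proposal):
    return (proposal.keys() == domain.keys() and
            all(proposal[i] in domain[i] for i in domain))

# A profile gives an agent a private valuation per issue value; here
# the valuation of a proposal is additive over issues (an assumption).
def valuation(profile, proposal):
    return sum(profile[i][proposal[i]] for i in proposal)

# Enumerate the full bid space of a domain.
def bid_space(domain):
    issues = list(domain)
    return [dict(zip(issues, vals))
            for vals in product(*(domain[i] for i in issues))]
```

The bid space grows as the product of the issues' value counts, which is why features derived from it can dominate the meta-agent's run time.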
Throughout this paper we will use the same empirical setting and conditions used in the ANAC 2012 tournament.1
Under these conditions, a “tournament” over a set of agents
A and a set of domains D consists of multiple negotiation
rounds between all agent pairs in A over all domains D for
all possible profiles.2 Agents do not know the identity of
their negotiation partners. The agents are “reset” at the onset of each negotiation round, meaning that no information
about a domain or the history of past rounds is accessible
(and no learning across rounds is possible). The winner of
the tournament is the agent that achieved the highest average
score over all of the negotiation rounds.
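The tournament structure described above can be sketched as a loop over ordered agent pairs, domains, and profile assignments (a hypothetical sketch; `negotiate` stands in for an actual negotiation round between two reset agents):

```python
from itertools import permutations

def run_tournament(agents, domains, profiles, negotiate):
    """Average each agent's score over all pairings, domains, and
    profile assignments; the tournament winner has the highest average.
    `profiles[d]` holds the two preference profiles of domain d."""
    totals = {a: [] for a in agents}
    for a1, a2 in permutations(agents, 2):        # ordered pairs: each side proposes first
        for d in domains:
            for p1, p2 in permutations(profiles[d], 2):  # both profile assignments
                s1, s2 = negotiate(a1, a2, d, p1, p2)    # agents are "reset" each round
                totals[a1].append(s1)
                totals[a2].append(s2)
    avg = {a: sum(v) / len(v) for a, v in totals.items()}
    return max(avg, key=avg.get), avg
```

Note that the two ordered pairings times the two profile assignments give exactly the four negotiations per agent pair per domain mentioned in footnote 2.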
Introduction
Multi-attribute negotiation under incomplete information is
a well studied problem in AI and there is an abundance of
agent-designs that use varying methods, heuristics, and techniques (Kraus 2001; Jennings et al. 2001; Lin and Kraus
2010). However, it has been widely observed that no single
negotiation strategy is optimal for all domains. This paper
studies the “algorithm selection problem” (Xu et al. 2008;
Samulowitz and Memisevic 2007), that is, which of a set of
possible strategies to choose to negotiate with an unknown
partner in a new domain, when the preferences of the negotiation partner as well as its negotiation strategy are unknown.
Our methodology consists of defining a set of features that
encapsulate the information about the domain that is available to agents at the onset of negotiation. These features are
then used to predict the performance of existing negotiation
strategies on a new domain using canonical learning methods including multi-layer neural networks, decision trees,
and linear and logistic regression. We designed a “meta-agent” which, at run time, selects the negotiation strategy
that is predicted to be most successful on a new domain
based on its features.
We demonstrate our approach empirically on a negotiation test bed used for the annual international Automated Negotiation Agent Competition (ANAC) (Baarslag et al. 2012). The meta-agent was able to significantly outperform the winner of the 2012 competition (the agent strategy that achieved the best average performance over all domains and all agents), and to agree more often with an “oracle” that chose the optimal agent strategy for each domain in retrospect. Our approach generalized to the case in which both the competition agents and the competition domains were not available for training. These results provide insights for agent designers in negotiation, demonstrating that “a little learning goes a long way”, and suggesting that the algorithm selection approach may also be feasible for other multiagent optimization problems such as planning and decision-making.
1 http://anac2012.ecs.soton.ac.uk/
2 This means that each agent pair negotiates four times in each domain, once as proposer and once as responder for each profile.

Copyright © 2013, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
Methodology
We define the algorithm selection problem in negotiation
as follows: Given a set of training domains D and training agents A, which agent should be selected from A to negotiate in each domain of a competition that includes a set of test domains D′ and test agents A′? This paper studies the case where both agents and domains are unknown (A ∩ A′ = ∅, D ∩ D′ = ∅). Our approach also performs well when the test agents are known and the domains are unknown (D ∩ D′ = ∅, A = A′), which we omit for brevity.3 Our approach to the algorithm selection problem
is to construct a “meta-agent” that chooses an agent in A by
predicting its performance in each domain in the tournament
using characteristics of the domain that are observable to the
meta-agent at the onset of the negotiation. Each domain d
is defined by several features from three types: (a) domain
information that is common knowledge, (b) private information about the agent’s profile, (c) information that is inferred
from the first proposal p that Agent1 receives from Agent2.
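The three feature types can be sketched as follows. The concrete features below (issue count, bid-space size, utility spread, utility of the first received offer) are our own illustrative choices, not the paper's exact feature set:

```python
from itertools import product

def _bid_space(domain):
    issues = list(domain)
    return [dict(zip(issues, vals))
            for vals in product(*(domain[i] for i in issues))]

def _valuation(profile, proposal):
    # Additive valuation over issues (an illustrative assumption).
    return sum(profile[i][proposal[i]] for i in proposal)

def domain_features(domain, profile, first_offer=None):
    """Illustrative features of the three types: (a) common-knowledge
    domain structure, (b) the agent's own private profile, and (c) the
    first proposal received from Agent2 (None if none received yet)."""
    utils = sorted(_valuation(profile, b) for b in _bid_space(domain))
    feats = {
        "n_issues": len(domain),                  # (a)
        "bid_space_size": len(utils),             # (a)
        "own_max_util": utils[-1],                # (b)
        "own_util_spread": utils[-1] - utils[0],  # (b)
    }
    if first_offer is not None:                   # (c)
        feats["first_offer_util"] = _valuation(profile, first_offer)
    return feats
```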
We describe the design of a class of meta-agents that use
standard machine learning algorithms to predict the performance of different agents in a negotiation round. We used
canonical supervised learning algorithms to predict the performance of agents given a domain and profile. We used
two measures of performance. The first measured the difference between the score of agent i when negotiating with
any agent k in domain d and the average score over all negotiations with all agents in the domain. The second measure
of performance predicted whether an agent i is a “winner”
when negotiating with any agent k on a domain.
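The two performance measures translate into two kinds of training labels, one real-valued and one binary. A minimal sketch, assuming `scores[domain]` maps each agent to its list of scores against every partner in that domain:

```python
def regression_label(scores, agent, domain):
    """Measure 1: the agent's average score in the domain minus the
    average score over all negotiations by all agents in the domain."""
    per_agent = scores[domain]                 # {agent: [scores vs. each partner]}
    mine = sum(per_agent[agent]) / len(per_agent[agent])
    overall = [s for lst in per_agent.values() for s in lst]
    return mine - sum(overall) / len(overall)

def winner_label(scores, agent, domain):
    """Measure 2: 1 if the agent has the highest average score in the
    domain (a "winner"), else 0."""
    per_agent = scores[domain]
    avg = {a: sum(v) / len(v) for a, v in per_agent.items()}
    return int(max(avg, key=avg.get) == agent)
```

The first label supports the regression methods below, the second the classification methods.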
We used the following learning techniques to predict an
agent’s performance when negotiating in a new domain
(adapting standard overfitting avoidance methods for each
technique): a regression tree algorithm (denoted “CART”) (Breiman et al. 1984); a neural network (denoted “NN”) with a single hidden layer of four hidden nodes, using early stopping after 150 training iterations; and a linear regression model (denoted “LinReg”) with a forward-backward selection method for choosing the predictive variables (Shibata 1981). We used classification methods for determining the
probability that an agent is a winner: a classification tree (denoted “CART (class)”), a logistic regression model (denoted
“LogReg”), and a neural network with the same structure as
described above, denoted “NN (class)”. All of these classifiers computed the probability that an agent is a winner for a
specific domain and features.
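The common pattern across all of these techniques is one predictor per portfolio agent, mapping domain features to that agent's performance. As a self-contained stand-in for the paper's learners, the sketch below trains an ordinary-least-squares model per agent on a single feature (the actual work used CART, neural networks, and linear/logistic regression over the full feature set):

```python
def fit_linreg(xs, ys):
    """Ordinary least squares for y = a*x + b; a single-feature
    stand-in for the paper's LinReg predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - a * mx
    return lambda x: a * x + b

def train_models(training_data, fit=fit_linreg):
    """Train one predictor per portfolio agent, mapping a domain
    feature to that agent's observed performance."""
    models = {}
    for agent, rows in training_data.items():   # rows: [(feature, perf), ...]
        xs, ys = zip(*rows)
        models[agent] = fit(list(xs), list(ys))
    return models
```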
Assuming knowledge of a set of training domains D and
agents A, the algorithm used by the meta-agent to choose
an agent strategy is as follows (presented from the point of
view of Agent1): Given a test domain d ∈ D′, the meta-agent will compute the features associated with the domain.
Some of these features depend on receiving a proposal from
Agent2, but if the meta-agent is the first proposer, it needs to
make a proposal to Agent2. Lacking any information about
[Figure 1: Score comparison between the meta-agents (NN, NN (class), CART, CART (class), LinReg, LogReg), the AvgBest agent, and the Oracle; scores range roughly from 0.63 to 0.70.]
the profile of Agent2, it makes the proposal that provides
it with maximal utility. Finally, it predicts the performance
of each agent in A on domain d, and returns the agent with
the highest predicted performance (breaking ties randomly).
The run-time of the algorithm is dominated by the feature
selection process, which is polynomial in the size of the bid
space in the domain.
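The run-time selection step described above can be sketched directly (a minimal sketch; the `tie_break` parameter is our own addition, defaulting to the random tie-breaking the paper specifies):

```python
import random

def select_agent(models, features, tie_break=random.choice):
    """Run-time step of the meta-agent: predict every portfolio
    agent's performance from the domain features and return the
    agent with the highest prediction, breaking ties randomly."""
    preds = {a: model(features) for a, model in models.items()}
    best = max(preds.values())
    tied = [a for a, p in preds.items() if p == best]
    return tie_break(tied)
```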
We report results on a data set that included 7 agents and
72 domains (144 profiles) that were used in the finals of
the ANAC 2012 competition. We used K-fold cross-validation with 4 agents as the portfolio and 60 known domains for the learning process, whereas 3 unknown agents and 12 unknown domains were used for the test tournament phase, resulting in a total of 18 folds: 6 different sets of domains and 3 different sets of agents.
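The fold counts above can be reproduced with a simple construction. The exact splits used in the evaluation are not specified, so the scheme below (6 disjoint test-domain sets crossed with 3 sliding test-agent sets) is only one plausible reading:

```python
def make_folds(agents, domains, n_test_agents=3, domain_sets=6):
    """Build folds matching the stated counts: each combination of a
    held-out domain set and a held-out agent set is one fold, so
    6 domain sets x 3 agent sets = 18 folds."""
    step = len(domains) // domain_sets
    folds = []
    for i in range(0, len(domains), step):
        test_domains = domains[i:i + step]
        train_domains = domains[:i] + domains[i + step:]
        # Sliding windows of 3 test agents over the 7-agent pool.
        for j in range(0, len(agents) - n_test_agents + 1, 2):
            test_agents = agents[j:j + n_test_agents]
            portfolio = [a for a in agents if a not in test_agents]
            folds.append((portfolio, train_domains, test_agents, test_domains))
    return folds
```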
We selected training domains and agents from this set and compared the performance of the meta-agents, using the different learning methods and performance measures, against (1) the “AvgBest” strategy, which selects the agent in A with the maximum score averaged over all known domains in D; this is equivalent to choosing the winning agent in A according to the rules of the ANAC competition; and (2) the “Oracle” strategy, which assumes perfect knowledge of the agents’ performance on the test domains and agents. All of the competition parameters were set exactly as in the finals of the ANAC 2012 competition.
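The two baselines can be sketched as follows, assuming `scores[agent][domain]` gives an agent's average score on a domain (an assumed data layout, not the paper's):

```python
def avg_best(scores, portfolio, train_domains):
    """AvgBest: the portfolio agent with the highest score averaged
    over all known training domains (the ANAC winner criterion)."""
    avg = {a: sum(scores[a][d] for d in train_domains) / len(train_domains)
           for a in portfolio}
    return max(avg, key=avg.get)

def oracle(scores, portfolio, domain):
    """Oracle: for each test domain, the portfolio agent with the best
    actual score on that domain (perfect hindsight)."""
    return max(portfolio, key=lambda a: scores[a][domain])
```

AvgBest commits to one agent for the whole tournament, while the Oracle may pick a different agent per domain; the meta-agents sit between the two.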
Figure 1 compares the scores of the different meta-agents
to the Oracle and AvgBest agents. All of the meta-agents
outperformed the AvgBest agent in score (p < 0.01 for
all meta-agents). It is interesting to see that the difference
in score between meta-agents was not significant (ANOVA
p = .792). Not shown in the figure is that all of the meta-agents made significantly more good agent choices than the AvgBest agent (p < 0.017), and agreed more often with the Oracle agent choice (p < 0.002). The number of Oracle choices made by the meta-agent using the CART (class)
method was (statistically) significantly higher than the other
meta-agents (p < 0.013).
Our results demonstrate the feasibility of using machine
learning methods to solve the algorithm selection problem
in negotiation, in that (1) all of the meta-agents significantly
outperformed the AvgBest agent and agreed more often with
the Oracle choice; (2) the approach generalized well even
when both the competition agents and domains were not
known during training.
3 We emphasize that agents do not observe the identity of their negotiation opponent nor its profile in the tournament, even if the test agents are known.
References
T. Baarslag, K. Fujita, E. H. Gerding, K. Hindriks, T. Ito, N. R. Jennings, C. Jonker, S. Kraus, R. Lin, V. Robu, et al. Evaluating practical negotiating agents: Results and analysis of the 2011 international competition. Artificial Intelligence, 2012.
L. Breiman, J. Friedman, C. J. Stone, and R. A. Olshen. Classification and Regression Trees. Chapman & Hall/CRC, 1984.
N. R. Jennings, P. Faratin, A. R. Lomuscio, S. Parsons, M. J. Wooldridge, and C. Sierra. Automated negotiation: prospects, methods and challenges. Group Decision and Negotiation, 10(2):199–215, 2001.
S. Kraus. Strategic Negotiation in Multiagent Environments. MIT Press, 2001.
R. Lin and S. Kraus. Can automated agents proficiently negotiate with humans? Communications of the ACM, 53(1), 2010.
H. Samulowitz and R. Memisevic. Learning to solve QBF. In Proceedings of the National Conference on Artificial Intelligence, volume 22, page 255, 2007.
R. Shibata. An optimal selection of regression variables. Biometrika, 68(1):45–54, 1981.
L. Xu, F. Hutter, H. H. Hoos, and K. Leyton-Brown. SATzilla: Portfolio-based algorithm selection for SAT. Journal of Artificial Intelligence Research, 32(1):565–606, 2008.