General Opponent* Modeling for Improving Agent-Human Interaction
Sarit Kraus
Dept. of Computer Science, Bar-Ilan University
AMEC, May 2010
Motivation
• Negotiation is an extremely important form of human interaction
Computers interacting with people
• Computer has the control
• Human has the control
• Computer persuades human
Culture sensitive agents
• The development of standardized agents to be used in the collection of data for studies on culture and negotiation
• Buyer/Seller agents that negotiate well across cultures
• PURB agent
Semi-autonomous cars
Medical applications
• Gertner Institute for Epidemiology and Health Policy Research
Automated care-taker
– Agent: I scheduled an appointment for you at the physiotherapist this afternoon.
– Patient: I will be too tired in the afternoon!!!
– Agent tries to reschedule and fails: The physiotherapist has no other available appointments this week. How about resting before the appointment?
Security applications
• Collect
• Update
• Analyze
• Prioritize
People often follow suboptimal decision strategies
• Irrationalities attributed to:
– sensitivity to context
– lack of knowledge of own preferences
– the effects of complexity
– the interplay between emotion and cognition
– the problem of self control
– bounded rationality
General opponent* modeling
Challenges of human opponent* modeling
• Small number of examples
– difficult to collect data on people
• Noisy data
– people are inconsistent (the same person may act differently)
– people are diverse
Agenda
• Multi-attribute multi-round bargaining
– KBAgent
• Revelation + bargaining
– SIGAL
• Optimization problems
– AAT based learning
• Coordination with people:
– Focal point based learning
QOAgent [LIN08]
• Multi-issue, multi-attribute, with incomplete information
• Domain independent
• Implemented several tactics and heuristics
– qualitative in nature
• Non-deterministic behavior, also via means of randomization
• Played at least as well as people

Is it possible to improve the QOAgent? Yes, if you have data.

R. Lin, S. Kraus, J. Wilkenfeld, and J. Barry. Negotiating with bounded rational agents in environments with incomplete information using an automated agent. Artificial Intelligence, 172(6-7):823–851, 2008.
KBAgent [OS09]
• Multi-issue, multi-attribute, with incomplete information
• Domain independent
• Implemented several tactics and heuristics
– qualitative in nature
• Non-deterministic behavior, also via means of randomization
• Using data from previous interactions

Y. Oshrat, R. Lin, and S. Kraus. Facing the challenge of human-agent negotiations via effective general opponent modeling. In AAMAS, 2009.
Example scenario
• Employer and job candidate
– Objective: reach an agreement over hiring terms after a successful interview
General opponent modeling
• Challenge: sparse data from past negotiation sessions between people
• Technique: Kernel Density Estimation
General opponent modeling
• Estimate the likelihood that the other party will:
– accept an offer
– make an offer
– and its expected average utility
• The estimation is done separately for each possible agent type:
– The type of a negotiator is determined using a simple Bayes classifier
• Use the estimation for decision making
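A minimal sketch of how the Kernel Density Estimation step could look (illustrative only: the feature choice, the default bandwidth, and all names below are assumptions, not the KBAgent's actual implementation). Past sessions of human negotiators are grouped by opponent type; a KDE over the utilities of offers each type accepted estimates how likely a new offer is to be accepted, and a simple Bayes classifier assigns the current opponent to a type:

```python
# Illustrative sketch: KDE-based opponent model with a simple Bayes type classifier.
# Assumes each type has several observed sessions with non-identical utility values.
import numpy as np
from scipy.stats import gaussian_kde

class OpponentModel:
    def __init__(self, accepted_utilities_by_type):
        # accepted_utilities_by_type: {type_name: [utilities of offers that were accepted]}
        self.kdes = {t: gaussian_kde(np.asarray(u, dtype=float))
                     for t, u in accepted_utilities_by_type.items()}
        self.priors = {t: 1.0 / len(self.kdes) for t in self.kdes}  # uniform prior over types

    def acceptance_likelihood(self, offer_utility, opp_type):
        # Density of accepted-offer utilities around this offer, for the given type
        return float(self.kdes[opp_type](offer_utility)[0])

    def classify(self, observed_utilities):
        # Simple (naive) Bayes classification of the opponent's type from its offers so far
        scores = {}
        for t, kde in self.kdes.items():
            log_lik = np.sum(np.log(kde(np.asarray(observed_utilities)) + 1e-12))
            scores[t] = np.log(self.priors[t]) + log_lik
        return max(scores, key=scores.get)
```

The agent can then weigh each candidate offer by the estimated acceptance likelihood for the classified type when deciding what to propose.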
KBAgent as the job candidate
• Best result: 20,000, Project Manager, with leased car, 20% pension funds, fast promotion, 8 work hours
• Offers on the table (KBAgent vs. human):
– 12,000, Programmer, without leased car, pension 10%, fast promotion, 10 hours
– 20,000, Team Manager, with leased car, pension 20%, slow promotion, 9 hours
– 20,000, Project Manager, without leased car, pension 20%, slow promotion, 9 hours
KBAgent as the job candidate
• Best agreement: 20,000, Project Manager, with leased car, 20% pension funds, fast promotion, 8 work hours
• Offers on the table (KBAgent vs. human):
– 12,000, Programmer, without leased car, pension 10%, fast promotion, 10 hours
– 20,000, Team Manager, with leased car, pension 20%, slow promotion, 9 hours
• Round 7: 20,000, Programmer, with leased car, pension 10%, slow promotion, 9 hours
Experiments
• Learned from 20 games of human-human negotiation
• 172 grad and undergrad students in Computer Science
• People were told they may be playing a computer agent or a person
• Scenarios:
– Employer-Employee
– Tobacco Convention: England vs. Zimbabwe
Results: Comparing KBAgent to others

Average Utility Value (std)
Role: Employer
– KBAgent vs. people: 468.9 (37.0)
– QOAgent vs. people: 417.4 (135.9)
– People vs. People: 408.9 (106.7)
– People vs. QOAgent: 431.8 (80.8)
– People vs. KBAgent: 380.4 (48.5)

Role: Job Candidate
– KBAgent vs. people: 482.7 (57.5)
– QOAgent vs. people: 397.8 (86.0)
– People vs. People: 310.3 (143.6)
– People vs. QOAgent: 320.5 (112.7)
– People vs. KBAgent: 370.5 (58.9)
Main results
• In comparison to the QOAgent:
– The KBAgent achieved higher utility values than the QOAgent
– More agreements were accepted by people
– The sums of utility values (social welfare) were higher when the KBAgent was involved
• The KBAgent achieved significantly higher utility values than people
• The results demonstrate the proficiency of the negotiation done by the KBAgent

General opponent* modeling improves agent bargaining
Automated care-taker
– Agent: I arranged for you to go to the physiotherapist in the afternoon.
– Patient: I will be too tired in the afternoon!!!
– Agent: How can I convince him? What argument should I give?
Security applications
– How should I convince him to provide me with information?
Argumentation
• Which information to reveal?
– Should I tell him that we are running out of antibiotics?
– Should I tell her that my leg hurts?
– Should I tell him that I will lose a project if I don't hire today?
– Should I tell him that I was fired from my last job?
Build a game that combines information revelation and bargaining
Color Trails (CT)
• An infrastructure for agent design, implementation and evaluation for open environments
• Designed with Barbara Grosz (AAMAS 2004)
• Implemented by Harvard team and BIU team
An experimental test-bed
• Interesting for people to play
– analogous to task settings
– vivid representation of the strategy space (not just a list of outcomes)
• Possible for computers to play
• Can vary in complexity
– repeated vs. one-shot setting
– availability of information
– communication protocol
Game description
• The game is built from phases:
– Revelation phase
– First proposal phase
– Counter-proposal phase
Joint work with Kobi Gal and Noam Peled
Two boards
Why not equilibrium agents?
• Results from the social sciences suggest people do not follow equilibrium strategies
• Equilibrium-based agents that played against people failed
• People rarely design agents to follow equilibrium strategies (Sarne et al., AAMAS 2008)
• Equilibrium strategies are usually not cooperative – all lose
Perfect Equilibrium agent
• Solved using backward induction; no strategic signaling
• Phase two:
– Second proposer: find the most beneficial proposal such that the responder's benefit remains positive
– Second responder: accept any proposal that gives it a positive benefit
Perfect Equilibrium agent
• Phase one:
– First proposer: propose the opponent's counter-proposal
– First responder: accept any proposal that gives it the same or higher benefit than its own counter-proposal
• In both boards, the PE with goal revelation yields lower or equal expected utility than the non-revelation PE
• Revelation: reveals in half of the games
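The two Perfect Equilibrium slides above can be summarized as a small backward-induction sketch (a simplification under assumed inputs: a finite set of candidate proposals and known benefit functions for both players; the names are illustrative, not the actual agent's code):

```python
# Illustrative backward-induction sketch for the two proposal phases described above.

def second_phase(proposals, benefit_proposer, benefit_responder):
    """Second proposer: most beneficial proposal whose responder benefit stays positive.
    Second responder: accepts any proposal with positive benefit (assumed non-empty set)."""
    feasible = [p for p in proposals if benefit_responder(p) > 0]
    return max(feasible, key=benefit_proposer)

def first_phase(proposals, benefit_proposer, benefit_responder):
    """First proposer: simply proposes the opponent's counter-proposal (no strategic signaling).
    First responder: accepts anything at least as good as its own counter-proposal."""
    counter = second_phase(proposals, benefit_responder, benefit_proposer)  # responder's counter-offer
    offer = counter
    accepted = benefit_responder(offer) >= benefit_responder(counter)       # trivially true here
    return offer, accepted
```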
Asymmetric game
Performance
• 140 students
Benefits diversity
• Average proposed benefit to players from the first and second rounds
Revelation effect
• The effect of revelation on performance: only 35% of the games played by humans included revelation
• Revelation had a significant effect on human performance but not on agent performance
• People were deterred by the strategic machine-generated proposals, which heavily depended on the roles of the proposer and the responder
SIGAL agent
• Agent based on general opponent modeling:
– Logistic Regression
– Genetic algorithm
SIGAL Agent: Acceptance
• Learns from previous games
• Predicts the acceptance probability of each proposal using logistic regression
• Features (for both players) relating to proposals:
– Benefit
– Goal revelations
– Player types
– Benefit difference between rounds 2 and 1
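A minimal sketch of this kind of acceptance model (the feature encoding and the use of scikit-learn are illustrative assumptions, not SIGAL's actual implementation):

```python
# Illustrative sketch: logistic-regression model of the probability a human accepts a proposal.
import numpy as np
from sklearn.linear_model import LogisticRegression

def proposal_features(p):
    # Assumed numeric encoding of the slide's features: benefits, goal revelations,
    # player types, and the benefit difference between rounds 2 and 1.
    return [
        p["benefit_proposer"],
        p["benefit_responder"],
        float(p["proposer_revealed_goal"]),
        float(p["responder_revealed_goal"]),
        p["proposer_type"],
        p["responder_type"],
        p["benefit_round2"] - p["benefit_round1"],
    ]

def train_acceptance_model(past_proposals, accepted):
    X = np.array([proposal_features(p) for p in past_proposals])
    y = np.array(accepted)  # 1 if the human accepted the proposal, else 0
    return LogisticRegression().fit(X, y)

def acceptance_probability(model, proposal):
    return model.predict_proba([proposal_features(proposal)])[0, 1]
```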
SIGAL Agent: counter-proposals
• Models the way humans make counter-proposals
SIGAL Agent
• Maximizes expected benefit given any state in the game:
– Round
– Player revelation
– Behavior in round 1
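Given such an acceptance model, the expected-benefit maximization described on this slide could be sketched as follows (the fallback value for a rejected proposal and all names are assumptions for illustration; `acceptance_probability` is from the sketch above):

```python
# Illustrative sketch: pick the proposal with the highest expected benefit for the agent,
# weighting each candidate by its learned acceptance probability.
def best_proposal(candidates, model, own_benefit, fallback_value):
    def expected_benefit(p):
        prob = acceptance_probability(model, p)
        return prob * own_benefit(p) + (1.0 - prob) * fallback_value
    return max(candidates, key=expected_benefit)
```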
Agent strategies comparison
• EQ: Round 1 sends Green:10 Purple:2, receives Gray:11; Round 2 sends Green:2 Purple:10, receives Gray:11
• SIGAL: Round 1 sends Green:2, receives Green:2; Round 2 sends Purple:5, receives Purple:9
SIGAL agent: performance
Agents performance comparison
• Equilibrium Agent vs. SIGAL Agent

General opponent* modeling improves agent negotiations
GENERAL OPPONENT* MODELING IN MAXIMIZATION PROBLEMS
AAT agent
• Agent based on general* opponent modeling:
– Decision Tree / Naïve Bayes
– AAT
Aspiration Adaptation Theory (AAT)
• Economic theory of people's behavior (Selten)
– No utility function exists for decisions (!)
• Relative decisions are used instead
• Retreat and urgency are used for goal variables

Avi Rosenfeld and Sarit Kraus. Modeling Agents through Bounded Rationality Theories. Proc. of IJCAI 2009; JAAMAS, 2010.
Commodity search
• Example: store prices encountered during the search: 1000, 900, 950
• Example decision rule: if price < 800, buy; otherwise visit 5 stores and buy in the cheapest.
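A minimal sketch of the example rule on this slide (the threshold of 800 and the five-store limit come from the slide; the function itself is illustrative):

```python
# Illustrative sketch of the slide's rule:
# "If price < 800 buy; otherwise visit 5 stores and buy in the cheapest."
def commodity_search(store_prices, threshold=800, max_stores=5):
    visited = []
    for price in store_prices:
        if price < threshold:
            return price            # aspiration met: buy immediately
        visited.append(price)
        if len(visited) == max_stores:
            break                   # retreat: stop searching
    return min(visited) if visited else None

# Prices from the slides: no store beats the threshold, so buy in the cheapest visited store.
print(commodity_search([1000, 900, 950]))  # -> 900
```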
Results
[Chart: Using AAT to Quickly Learn – Correct Classification % (65–83%) for Sparse Naïve Learning vs. Sparse AAT]
Behavioral models used in general opponent* modeling are beneficial
General opponent* modeling in cooperative environments
Coordination with limited communication
• Communication is not always possible:
– High communication costs
– Need to act undetected
– Damaged communication devices
– Language incompatibilities
– Goal: limited interruption of human activities

I. Zuckerman, S. Kraus and J. S. Rosenschein. Using Focal Points Learning to Improve Human-Machine Tactic Coordination, JAAMAS, 2010.
Focal Points (Examples)
• Divide £100 into two piles; if your piles are identical to your coordination partner's, you get the £100. Otherwise, you get nothing.
– 101 equilibria
Focal points (Examples)
• 9 equilibria
• 16 equilibria
Focal Points
• Thomas Schelling (1963): Focal Points = prominent solutions to tacit coordination games
Prior work: Focal Points Based Coordination for closed environments
• Domain-independent rules that could be used by automated agents to identify focal points:
– Properties: Centrality, Firstness, Extremeness, Singularity
– Logic based model
– Decision theory based model
• Algorithms for agent coordination

Kraus and Rosenschein, MAAMAW 1992; Fenster et al., ICMAS 1995; Annals of Mathematics and Artificial Intelligence, 2000.
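A minimal sketch of how such domain-independent focal-point properties could be scored over a set of candidate options (the concrete scoring functions below are simplified assumptions, not the definitions in the cited work; centrality is omitted for brevity):

```python
# Illustrative sketch: scoring candidate options on focal-point properties.
from collections import Counter

def singularity(options, idx, attr):
    # An option stands out if its attribute value is rare among the candidates.
    counts = Counter(o[attr] for o in options)
    return 1.0 / counts[options[idx][attr]]

def extremeness(options, idx, attr):
    # An option stands out if its numeric attribute value is near the minimum or maximum.
    values = [o[attr] for o in options]
    v = options[idx][attr]
    return max(v - min(values), max(values) - v) / (max(values) - min(values) + 1e-9)

def firstness(options, idx):
    # Options presented first tend to be more prominent.
    return 1.0 if idx == 0 else 0.0

def focal_point_score(options, idx, numeric_attr, categorical_attr):
    return (singularity(options, idx, categorical_attr)
            + extremeness(options, idx, numeric_attr)
            + firstness(options, idx))
```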
FPL agent
• Agent based on general* opponent modeling:
– Decision Tree / neural network
– Focal Point
FPL agent
• Agent based on general opponent modeling:
– the raw data vector is transformed into a focal-point (FP) vector
– the FP vector is fed to a Decision Tree / neural network
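A sketch of the pipeline this slide describes, reusing the focal-point scoring sketch above (the scikit-learn decision tree and the assumption that every game offers the same number of candidate options are illustrative choices, not the FPL agent's actual design):

```python
# Illustrative sketch: raw game data -> focal-point (FP) feature vector -> classifier.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def to_fp_vector(options, numeric_attr, categorical_attr):
    # One focal-point score per candidate option, using the scoring sketch above.
    return [focal_point_score(options, i, numeric_attr, categorical_attr)
            for i in range(len(options))]

def train_fpl(games, human_choices, numeric_attr, categorical_attr):
    # games: list of option sets (same size each); human_choices: index the human picked.
    X = np.array([to_fp_vector(g, numeric_attr, categorical_attr) for g in games])
    y = np.array(human_choices)
    return DecisionTreeClassifier(max_depth=3).fit(X, y)
```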
Focal Point Learning
• 3 experimental domains
Results – cont.
• "Very similar domain" (VSD) vs. "similar domain" (SD) of the "pick the pile" game

General opponent* modeling improves agent coordination
Evaluation of agents (EDA)
• Experiments with people is a costly process
• Peer Designed Agents (PDA): computer agents developed by humans
• Experiment: 300 human subjects, 50 PDAs, 3 EDAs
• Results:
– EDAs outperformed PDAs in the same situations in which they outperformed people
– On average, EDAs exhibited the same measure of generosity

R. Lin, S. Kraus, Y. Oshrat and Y. Gal. Facilitating the Evaluation of Automated Negotiators using Peer Designed Agents. In AAAI 2010.
Conclusions
• Negotiation and argumentation with people is required for many applications
• General* opponent modeling is beneficial
– Machine learning
– Behavioral models
– Challenge: how to integrate machine learning and behavioral models