General Opponent* Modeling for Improving Agent-Human Interaction
Sarit Kraus, Dept. of Computer Science, Bar-Ilan University
AMEC, May 2010

Motivation
Negotiation is an extremely important form of interaction between people.

Computers interacting with people
– Computer has the control
– Human has the control

Computer persuades human

Culture-sensitive agents
– Development of standardized agents that negotiate well across cultures
– Buyer/Seller agents used to collect data for studies on culture and negotiation
– PURB agent

Semi-autonomous cars

Medical applications
– Gertner Institute for Epidemiology and Health Policy Research

Automated care-taker
– Agent: "I scheduled an appointment for you at the physiotherapist this afternoon."
– Person: "I will be too tired in the afternoon!!!"
– The agent tries to reschedule the appointment and fails.
– Agent: "The physiotherapist has no other available appointments this week. How about resting before the appointment?"

Security applications
• Collect
• Update
• Analyze
• Prioritize

People often follow suboptimal decision strategies
Irrationalities attributed to:
– sensitivity to context
– lack of knowledge of own preferences
– the effects of complexity
– the interplay between emotion and cognition
– the problem of self-control
– bounded rationality
This motivates general opponent* modeling.

Challenges of human opponent* modeling
– Small number of examples: it is difficult to collect data on people
– Noisy data: people are inconsistent (the same person may act differently), and people are diverse

Agenda
– Multi-attribute multi-round bargaining: KBAgent
– Revelation + bargaining: SIGAL
– Optimization problems: AAT-based learning
– Coordination with people: focal point based learning

QOAgent [LIN08]
– Multi-issue, multi-attribute, with incomplete information
– Domain independent
– Implemented several tactics and heuristics, qualitative in nature
– Non-deterministic behavior, also via means of randomization
– Played at least as well as people
Is it possible to improve on the QOAgent? Yes, if you have data.
R. Lin, S. Kraus, J. Wilkenfeld, and J. Barry. Negotiating with bounded rational agents in environments with incomplete information using an automated agent. Artificial Intelligence, 172(6-7):823–851, 2008.

KBAgent [OS09]
– Multi-issue, multi-attribute, with incomplete information
– Domain independent
– Implemented several tactics and heuristics, qualitative in nature
– Non-deterministic behavior, also via means of randomization
– Uses data from previous interactions
Y. Oshrat, R. Lin, and S. Kraus. Facing the challenge of human-agent negotiations via effective general opponent modeling.
In AAMAS, 2009.

Example scenario
– Employer and job candidate
– Objective: reach an agreement over hiring terms after a successful interview

General opponent modeling
– Challenge: sparse data from past negotiation sessions between people
– Technique: Kernel Density Estimation (a sketch follows the Colored Trails slides below)

General opponent modeling
– Estimate the likelihood that the other party will accept an offer, the likelihood that it will make a given offer, and its expected average utility
– The estimation is done separately for each possible agent type; the type of a negotiator is determined using a simple Bayes classifier
– Use the estimates for decision making

KBAgent as the job candidate (example session)
– Best possible agreement for the candidate: 20,000, Project Manager, with leased car, 20% pension funds, fast promotion, 8 hours
– Offers exchanged between the KBAgent and the human employer included: 12,000, Programmer, without leased car, 10% pension, fast promotion, 10 hours; 20,000, Team Manager, with leased car, 20% pension, slow promotion, 9 hours; and 20,000, Project Manager, without leased car, 20% pension, slow promotion, 9 hours
– Round 7: 20,000, Programmer, with leased car, 10% pension, slow promotion, 9 hours

Experiments
– Learned from 20 games of human-human negotiation
– 172 graduate and undergraduate Computer Science students
– People were told they might be playing a computer agent or a person
– Scenarios: Employer-Employee; Tobacco Convention: England vs. Zimbabwe

Results: comparing the KBAgent to others
Average utility value (std):
Player type          | Employer       | Job candidate
KBAgent vs. people   | 468.9 (37.0)   | 482.7 (57.5)
QOAgent vs. people   | 417.4 (135.9)  | 397.8 (86.0)
People vs. people    | 408.9 (106.7)  | 310.3 (143.6)
People vs. QOAgent   | 431.8 (80.8)   | 320.5 (112.7)
People vs. KBAgent   | 380.4 (48.5)   | 370.5 (58.9)

Main results
In comparison to the QOAgent:
– The KBAgent achieved higher utility values than the QOAgent
– More agreements were accepted by people
– The sum of utility values (social welfare) was higher when the KBAgent was involved
The KBAgent also achieved significantly higher utility values than people. The results demonstrate the proficiency of the negotiation carried out by the KBAgent: general opponent* modeling improves agent bargaining.

Automated care-taker
– Agent: "I arranged for you to go to the physiotherapist in the afternoon."
– Person: "I will be too tired in the afternoon!!!"
– Agent: "How can I convince him? What argument should I give?"

Security applications
– "How should I convince him to provide me with information?"

Argumentation: which information to reveal?
– "Should I tell her that my leg hurts?"
– "Should I tell him that we are running out of antibiotics?"
– "Should I tell him that I will lose a project if I don't hire today?"
– "Should I tell him that I was fired from my last job?"
Approach: build a game that combines information revelation and bargaining.

Colored Trails (CT)
– An infrastructure for agent design, implementation and evaluation in open environments
– Designed with Barbara Grosz (AAMAS 2004)
– Implemented by the Harvard team and the BIU team

An experimental test-bed
– Interesting for people to play: analogous to task settings; vivid representation of the strategy space (not just a list of outcomes)
– Possible for computers to play
– Can vary in complexity: repeated vs. one-shot setting; availability of information; communication protocol
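As a concrete illustration of the kernel-density idea named on the KBAgent slides above, the sketch below estimates how likely a human partner is to accept an offer from a handful of past sessions. The numeric feature encoding (utility to each side), the bandwidth, the toy data, and the use of scikit-learn's KernelDensity are assumptions made for the illustration; this is not the authors' implementation.

```python
# Minimal sketch of a KBAgent-style acceptance estimate (illustrative only).
# Assumes each past offer is encoded as a numeric feature vector, e.g. the
# utility it gives each side -- a hypothetical encoding, not the paper's.
import numpy as np
from sklearn.neighbors import KernelDensity

def fit_offer_density(offer_features, bandwidth=0.25):
    """Fit a KDE over offers observed in past human negotiation sessions."""
    kde = KernelDensity(kernel="gaussian", bandwidth=bandwidth)
    kde.fit(np.asarray(offer_features, dtype=float))
    return kde

def acceptance_likelihood(accepted_kde, rejected_kde, offer, prior_accept=0.5):
    """Bayes-style estimate of P(accept | offer) from two class-conditional KDEs."""
    offer = np.asarray(offer, dtype=float).reshape(1, -1)
    p_acc = np.exp(accepted_kde.score_samples(offer))[0] * prior_accept
    p_rej = np.exp(rejected_kde.score_samples(offer))[0] * (1.0 - prior_accept)
    return p_acc / (p_acc + p_rej + 1e-12)

# Usage: one pair of KDEs per opponent type (the type itself is picked by a
# naive Bayes classifier, as on the slide); the agent scores candidate offers
# with acceptance_likelihood() before deciding what to propose.
accepted = [[0.7, 0.5], [0.8, 0.4]]   # offers humans accepted (toy numbers)
rejected = [[0.3, 0.9], [0.2, 0.8]]   # offers humans rejected (toy numbers)
kde_acc, kde_rej = fit_offer_density(accepted), fit_offer_density(rejected)
print(acceptance_likelihood(kde_acc, kde_rej, [0.75, 0.45]))
```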
Game description
The game is built from phases:
– Revelation phase
– First proposal phase
– Counter-proposal phase
Joint work with Kobi Gal and Noam Peled.

Two boards

Why not equilibrium agents?
Results from the social sciences suggest that people do not follow equilibrium strategies:
– Equilibrium-based agents that played against people failed
– People rarely design agents to follow equilibrium strategies (Sarne et al., AAMAS 2008)
– Equilibrium strategies are usually not cooperative: all lose

Perfect Equilibrium (PE) agent
Solved using backward induction; no strategic signaling.
Phase two:
– Second proposer: find the most beneficial proposal while the responder's benefit remains positive
– Second responder: accept any proposal that gives it a positive benefit

Perfect Equilibrium agent
Phase one:
– First proposer: propose the opponent's counter-proposal
– First responder: accept any proposal that gives it the same or higher benefit than its own counter-proposal
Revelation: in both boards, the PE with goal revelation yields lower or equal expected utility than the non-revelation PE; the agent reveals in half of the games (asymmetric game).

Performance
– Experiments with 140 students

Benefits diversity
– Chart: average proposed benefit to the players in the first and second rounds

Revelation effect
The effect of revelation on performance:
– Only 35% of the games played by humans included revelation
– Revelation had a significant effect on human performance but not on agent performance
– People were deterred by the strategic machine-generated proposals, which heavily depended on the roles of the proposer and the responder

SIGAL agent
Agent based on general opponent modeling: genetic algorithm + logistic regression.

SIGAL agent: acceptance
– Learns from previous games
– Predicts the acceptance probability of each proposal using logistic regression
– Features (for both players) relating to proposals: benefit; goal revelations; player types; benefit difference between rounds 2 and 1

SIGAL agent: counter-proposals
– Models the way humans make counter-proposals

SIGAL agent
Maximizes expected benefit given any state in the game:
– round
– player revelation
– behavior in round 1

Agent strategies comparison
– Table: the chips each agent (EQ vs. SIGAL) proposes to send and receive in rounds 1 and 2

SIGAL agent: performance
– Chart: agent performance comparison, Equilibrium agent vs. SIGAL agent
General opponent* modeling improves agent negotiations.

GENERAL OPPONENT* MODELING IN MAXIMIZATION PROBLEMS

AAT agent
Agent based on general* opponent modeling: decision tree / naïve Bayes + AAT.

Aspiration Adaptation Theory (AAT)
Economic theory of people's behavior (Selten):
– No utility function exists for decisions (!)
– Relative decisions are used instead
– Retreat and urgency are used for goal variables
Avi Rosenfeld and Sarit Kraus. Modeling Agents through Bounded Rationality Theories. Proc. of IJCAI 2009; JAAMAS, 2010.

Commodity search
– Prices are observed store by store (e.g., 1000, then 900, then 950)
– Example strategy: if the price < 800, buy; otherwise visit 5 stores and buy at the cheapest one (a sketch follows below)
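The commodity-search slides above describe an aspiration-style buying rule ("if the price < 800, buy; otherwise visit 5 stores and buy at the cheapest"). Below is a minimal sketch of that rule: the threshold and horizon come from the slide, while the function itself and the toy price sequence are hypothetical. The actual AAT agent learns which such rule a person follows using a decision tree or naïve Bayes classifier, which is not shown here.

```python
# Aspiration-style shopping rule from the commodity-search slide (sketch only).
# Threshold 800 and horizon 5 come from the slide; the function and the toy
# price sequence are illustrative, not the agent's learned model.
def aspiration_search(price_stream, aspiration=800, horizon=5):
    """Buy at the first price below the aspiration level; otherwise visit
    `horizon` stores and buy at the cheapest one seen."""
    visited = []
    for price in price_stream:
        visited.append(price)
        if price < aspiration:          # aspiration met: stop searching and buy
            return price, len(visited)
        if len(visited) == horizon:     # horizon reached: settle for the best so far
            return min(visited), len(visited)
    return min(visited), len(visited)   # stream ended early: take the best seen

# Usage with the slide's prices (1000, 900, 950) plus invented continuations:
print(aspiration_search([1000, 900, 950, 870, 790]))   # -> (790, 5)
```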
Results: Using AAT to Quickly Learn
– Chart: correct classification %, sparse naïve learning vs. sparse AAT
Behavioral models used in general opponent* modeling are beneficial.

General opponent* modeling in cooperative environments

Coordination with limited communication
Communication is not always possible:
– High communication costs
– Need to act undetected
– Damaged communication devices
– Language incompatibilities
Goal: limited interruption of human activities.
I. Zuckerman, S. Kraus and J. S. Rosenschein. Using Focal Points Learning to Improve Human-Machine Tacit Coordination. JAAMAS, 2010.

Focal points (examples)
– Divide £100 into two piles; if your piles are identical to your coordination partner's, you get the £100, otherwise you get nothing (101 equilibria)
– Further example games with 9 equilibria and 16 equilibria

Focal points
Thomas Schelling (1963): focal points are prominent solutions to tacit coordination games.

Prior work: focal point coordination for closed environments
Domain-independent rules that automated agents could use to identify focal points.
– Properties: centrality, firstness, extremeness, singularity
– Logic-based model
– Decision-theory-based model
– Algorithms for agent coordination
Kraus and Rosenschein, MAAMAW 1992; Fenster et al., ICMAS 1995; Annals of Mathematics and Artificial Intelligence, 2000.

FPL agent
Agent based on general* opponent modeling: decision tree / neural network + focal points.

FPL agent
Agent based on general opponent modeling: the raw data vector is transformed into an FP (focal point) vector before it is fed to the decision tree / neural network (sketched after the conclusions below).

Focal Point Learning
– 3 experimental domains

Results (cont.)
– "Very similar domain" (VSD) vs. "similar domain" (SD) of the "pick the pile" game
General opponent* modeling improves agent coordination.

Evaluation of agents (EDA)
– Evaluating agents with people is a costly process
– Peer Designed Agents (PDA): computer agents developed by humans
– Experiment: 300 human subjects, 50 PDAs, 3 experiments with EDAs
– Results: EDAs outperformed PDAs in the same situations in which they outperformed people; on average, EDAs exhibited the same measure of generosity
R. Lin, S. Kraus, Y. Oshrat and Y. Gal. Facilitating the Evaluation of Automated Negotiators using Peer Designed Agents. In AAAI 2010.

Conclusions
– Negotiation and argumentation with people is required for many applications
– General* opponent modeling is beneficial:
  – machine learning
  – behavioral models
– Challenge: how to integrate machine learning and behavioral models
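One concrete reading of the closing challenge, integrating machine learning with a behavioral model, is the FPL construction from the coordination slides: re-encode the raw game description as focal-point features (centrality, firstness, extremeness, singularity) and train a standard classifier on them. The sketch below illustrates the idea with invented scoring heuristics and a toy "pick the pile" style encoding; it is not the FPL agent's actual feature set or learner configuration.

```python
# Sketch of the FPL idea: turn each raw coordination option into focal-point
# features and let a standard classifier predict which option a human partner
# will pick. The scoring heuristics below are invented for illustration only.
from sklearn.tree import DecisionTreeClassifier

def focal_point_features(options):
    """options: numeric descriptions of the candidate choices in one game."""
    n = len(options)
    feats = []
    for i, value in enumerate(options):
        centrality  = 1.0 - abs(i - (n - 1) / 2) / max(n - 1, 1)  # middle positions
        firstness   = 1.0 if i == 0 else 0.0                      # first option listed
        extremeness = abs(value - sum(options) / n)               # far from the mean
        singularity = 1.0 / options.count(value)                  # uniqueness of value
        feats.append([centrality, firstness, extremeness, singularity])
    return feats

# Training data would be (option-features, was-chosen) pairs from recorded
# human games; the labels below are toy values, not experimental data.
X = focal_point_features([50, 50, 100]) + focal_point_features([10, 20, 20])
y = [0, 0, 1, 1, 0, 0]
clf = DecisionTreeClassifier(max_depth=3).fit(X, y)
print(clf.predict(focal_point_features([30, 30, 60])))
```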