Analyzing and modelling the strategies in the AKQJ10 Poker Game

Prepared for Prof. Anupama Sharma
As a part of the course Game Theory, BITS F314
By: ε<0

Amrtanshu Bharadwaj 2020B4AA1860G
Bhavya Mehta 2020B4AA1846G
Pranav Ghag 2020B4A32060G
Pranav Kocheta 2020B4A41859G
Aditi Kashyap 2020B4A71654G

Acknowledgement
We would like to express our special gratitude to Prof. Anupama Sharma, Department of Mathematics, BITS Pilani K. K. Birla Goa Campus, who gave us the opportunity to build this game theory model. The project involved a good deal of research, taught us a lot about this model, and helped us polish our game theory concepts. Last but not least, we would like to thank our friends and respondents for their support and their willingness to spend time with us.

Contents
Acknowledgement
Abstract
Introduction
The Model
High-Risk High Reward
Playing It Safe
Non-Dominated Strategies
Calculations
Theoretical Conclusions
Conclusions on Simulating
References

Abstract
The AKQJ10 poker game is a simplified variation of traditional poker. Each player is dealt one card from a five-card deck containing A, K, Q, J, and 10, and the goal is to hold the higher-ranked card. The cards are ranked in descending order: the Ace (A) is highest, followed by the King (K), Queen (Q), Jack (J), and Ten (10). The outcome depends only on the rank of a single card. Even so, the game is interesting from a game theory perspective because it involves strategic decisions such as betting and bluffing. We model this poker variant as a game in order to predict the best response and maximize the player's payoff. We then train an RL model to play the game and further test our hypothesis by playing against the RL-trained bot.

Introduction
In 1995, the pioneering poker theorist Mike Caro proposed a game called AKQ. It is a simplified game that can be solved rigorously with game-theoretic concepts. It has a limited number of game states, so the calculations can be done by hand relatively easily. One of this paper's objectives is to contrast traditionally derived results with results obtained from a Reinforcement Learning (RL) model. For that reason, we increased the complexity of Caro's game and created a slightly different variant called AKQJ10.
We assume there are only five cards: the Ace, King, Queen, Jack, and 10 (AKQJ10), where A > K > Q > J > 10. Each of the two players is dealt one of these cards at random, without replacement. Players know only their own card and can deduce that the other player holds one of the four remaining cards.

Based on his card, Player 1 (P1) can either Bet or Check. A Check ends the game, and no money is exchanged. If P1 bets, play moves to Player 2 (P2), who can either Call or Fold. If P2 folds, he loses the round; if he calls, the higher card wins the pot.

We shall discuss two strategies, High-Risk High Reward and Playing It Safe, which we believe represent the general mindsets of players. They differ in how a player prefers to play the game, and we model this difference by assigning different payoffs. Furthermore, we trained an RL bot that independently arrived at its own strategy. Since its only objective was to maximise its payoff, we believe this model accurately represents a rational player.

The Model
$N := \{1, 2\}$
$A := \{\{\text{Bet}, \text{Check}\}, \{\text{Call}, \text{Fold}\}\}$
Player function: $P(\varnothing) = 1$, $P(\text{Bet}) = 2$
Preferences:
1. High-Risk High Reward: Win by Bluff > Normal Win > Tie > Lose
2. Playing It Safe: Normal Win > Win by Bluff > Tie > Lose

High-Risk High Reward
This mindset assigns a higher payoff to winning by a bluff than to winning with a higher card. The entries below are P1's payoffs:
- A normal win (P2 calls and P1 holds the higher card) is worth 1.
- A win by bluff (P2 folds against P1's lower card) is worth 2.
- A loss is worth -1.
- A check, or a fold against P1's higher card, is worth 0.

Rows give P1's card and action; columns give P2's card and response. "n/a" marks deals in which both players would hold the same card. The payoff matrix is as follows:

| P1 \ P2 | Ace Call | Ace Fold | King Call | King Fold | Queen Call | Queen Fold | Jack Call | Jack Fold | 10 Call | 10 Fold |
|---|---|---|---|---|---|---|---|---|---|---|
| Ace Bet | n/a | n/a | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 |
| Ace Check | n/a | n/a | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| King Bet | -1 | 2 | n/a | n/a | 1 | 0 | 1 | 0 | 1 | 0 |
| King Check | 0 | 0 | n/a | n/a | 0 | 0 | 0 | 0 | 0 | 0 |
| Queen Bet | -1 | 2 | -1 | 2 | n/a | n/a | 1 | 0 | 1 | 0 |
| Queen Check | 0 | 0 | 0 | 0 | n/a | n/a | 0 | 0 | 0 | 0 |
| Jack Bet | -1 | 2 | -1 | 2 | -1 | 2 | n/a | n/a | 1 | 0 |
| Jack Check | 0 | 0 | 0 | 0 | 0 | 0 | n/a | n/a | 0 | 0 |
| 10 Bet | -1 | 2 | -1 | 2 | -1 | 2 | -1 | 2 | n/a | n/a |
| 10 Check | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | n/a | n/a |

When P1 has an Ace, he should always bet. Betting never does worse than checking: it yields 1 or 0, against 0 for a check.
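When P2 has an Ace, he should always call, and when he has a 10, he should always fold. In both cases that choice leaves P1 with the lower payoff. Eliminating these choices gives the following reduced payoff matrix:

| P1 \ P2 | Ace Call | King Call | King Fold | Queen Call | Queen Fold | Jack Call | Jack Fold | 10 Fold |
|---|---|---|---|---|---|---|---|---|
| Ace Bet | n/a | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
| King Bet | -1 | n/a | n/a | 1 | 0 | 1 | 0 | 0 |
| King Check | 0 | n/a | n/a | 0 | 0 | 0 | 0 | 0 |
| Queen Bet | -1 | -1 | 2 | n/a | n/a | 1 | 0 | 0 |
| Queen Check | 0 | 0 | 0 | n/a | n/a | 0 | 0 | 0 |
| Jack Bet | -1 | -1 | 2 | -1 | 2 | n/a | n/a | 0 |
| Jack Check | 0 | 0 | 0 | 0 | 0 | n/a | n/a | 0 |
| 10 Bet | -1 | -1 | 2 | -1 | 2 | -1 | 2 | n/a |
| 10 Check | 0 | 0 | 0 | 0 | 0 | 0 | 0 | n/a |

P1 has only one strategy after receiving an Ace, and P2 has only one after receiving an Ace or a 10. We set these trivial cases aside and arrive at the following simplified payoff matrix:

| P1 \ P2 | King Call | King Fold | Queen Call | Queen Fold | Jack Call | Jack Fold |
|---|---|---|---|---|---|---|
| King Bet | n/a | n/a | 1 | 0 | 1 | 0 |
| King Check | n/a | n/a | 0 | 0 | 0 | 0 |
| Queen Bet | -1 | 2 | n/a | n/a | 1 | 0 |
| Queen Check | 0 | 0 | n/a | n/a | 0 | 0 |
| Jack Bet | -1 | 2 | -1 | 2 | n/a | n/a |
| Jack Check | 0 | 0 | 0 | 0 | n/a | n/a |
| 10 Bet | -1 | 2 | -1 | 2 | -1 | 2 |
| 10 Check | 0 | 0 | 0 | 0 | 0 | 0 |

Playing It Safe
This mindset assigns a higher payoff to winning with a higher card than to winning by a bluff. For P1:
- A normal win is worth 2.
- A win by bluff is worth 1.
- A loss is worth -1.
- A check, or a fold against P1's higher card, is worth 0.

The payoff matrix is as follows:

| P1 \ P2 | Ace Call | Ace Fold | King Call | King Fold | Queen Call | Queen Fold | Jack Call | Jack Fold | 10 Call | 10 Fold |
|---|---|---|---|---|---|---|---|---|---|---|
| Ace Bet | n/a | n/a | 2 | 0 | 2 | 0 | 2 | 0 | 2 | 0 |
| Ace Check | n/a | n/a | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| King Bet | -1 | 1 | n/a | n/a | 2 | 0 | 2 | 0 | 2 | 0 |
| King Check | 0 | 0 | n/a | n/a | 0 | 0 | 0 | 0 | 0 | 0 |
| Queen Bet | -1 | 1 | -1 | 1 | n/a | n/a | 2 | 0 | 2 | 0 |
| Queen Check | 0 | 0 | 0 | 0 | n/a | n/a | 0 | 0 | 0 | 0 |
| Jack Bet | -1 | 1 | -1 | 1 | -1 | 1 | n/a | n/a | 2 | 0 |
| Jack Check | 0 | 0 | 0 | 0 | 0 | 0 | n/a | n/a | 0 | 0 |
| 10 Bet | -1 | 1 | -1 | 1 | -1 | 1 | -1 | 1 | n/a | n/a |
| 10 Check | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | n/a | n/a |

Similarly, we eliminate the dominated strategies and set aside the trivial cases of the Ace and the 10. This gives the following simplified payoff matrix:

| P1 \ P2 | Ace Call | King Call | King Fold | Queen Call | Queen Fold | Jack Call | Jack Fold |
|---|---|---|---|---|---|---|---|
| King Bet | -1 | n/a | n/a | 2 | 0 | 2 | 0 |
| King Check | 0 | n/a | n/a | 0 | 0 | 0 | 0 |
| Queen Bet | -1 | -1 | 1 | n/a | n/a | 2 | 0 |
| Queen Check | 0 | 0 | 0 | n/a | n/a | 0 | 0 |
| Jack Bet | -1 | -1 | 1 | -1 | 1 | n/a | n/a |
| Jack Check | 0 | 0 | 0 | 0 | 0 | n/a | n/a |
| 10 Bet | -1 | -1 | 1 | -1 | 1 | -1 | 1 |
| 10 Check | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

Non-Dominated Strategies
Thus, in both cases, the non-dominated strategies for P1 are:
1. Bet Kings (Kb)
2. Check Kings (Kch)
3. Bet Queens (Qb)
4. Check Queens (Qch)
5. Bet Jacks (Jb)
6. Check Jacks (Jch)
7. Bet 10s (10b)
8. Check 10s (10ch)

The non-dominated strategies for P2 are:
1. Call Kings (Kc)
2. Fold Kings (Kf)
3. Call Queens (Qc)
4. Fold Queens (Qf)
5. Call Jacks (Jc)
6. Fold Jacks (Jf)

Hence there are 8 × 6 = 48 matchups between these strategies. Both players cannot hold the same card, so the 12 matchups that pair strategies for the same rank (for example, Kb against Kc) cannot occur. We need the expected value of each remaining confrontation. We can now make some important observations that will help our calculations.

Calculations
1. If P2 folds while P1 holds the higher card, the payoff is 0, so we can disregard such cases in our calculations.
2. The deck has five cards, so P1 holds any particular card 1/5 of the time. Given P1's card, P2 holds each of the four remaining cards 1/4 of the time. Each ordered deal therefore occurs with probability 1/5 × 1/4 = 1/20.
3. Under both mindsets, P2 never calls with a 10, so we can ignore that case as well.

Say we pick Kb vs Qc:
- With an Ace, P1 wins when called by a King, Queen, or Jack.
- With a King, he loses to an Ace but beats a Queen or Jack.
- With a Queen, he loses to an Ace or King but beats a Jack.
- With a Jack, he loses to an Ace, King, or Queen.

The payoff values are taken from the matrix, and each outcome is weighted by its probability from observation 2.

After similar calculations, we get the following matrices. "n/a" marks matchups that cannot occur because both players would hold the same card.

For High-Risk High Reward:

| P1 \ P2 | Kc | Kf | Qc | Qf | Jc | Jf |
|---|---|---|---|---|---|---|
| Kb | n/a | n/a | 4 | 8 | 6 | 6 |
| Kch | n/a | n/a | 3 | 8 | 5 | 6 |
| Qb | 2 | 10 | n/a | n/a | 6 | 6 |
| Qch | 3 | 8 | n/a | n/a | 5 | 6 |
| Jb | 2 | 10 | 4 | 8 | n/a | n/a |
| Jch | 3 | 8 | 5 | 6 | n/a | n/a |
| 10b | 2 | 10 | 4 | 8 | 6 | 6 |
| 10ch | 2 | 7 | 4 | 5 | 6 | 3 |

For Playing It Safe:

| P1 \ P2 | Kc | Kf | Qc | Qf | Jc | Jf |
|---|---|---|---|---|---|---|
| Kb | n/a | n/a | 6 | 6 | 7 | 3 |
| Kch | n/a | n/a | 3 | 5 | 4 | 2 |
| Qb | 5 | 9 | n/a | n/a | 7 | 3 |
| Qch | 5 | 7 | n/a | n/a | 6 | 4 |
| Jb | 5 | 9 | 6 | 6 | n/a | n/a |
| Jch | 7 | 9 | 8 | 6 | n/a | n/a |
| 10b | 5 | 9 | 6 | 6 | 7 | 3 |
| 10ch | 7 | 9 | 8 | 6 | 9 | 3 |

After eliminating weakly dominated strategies from these matrices, we are left with the strategies summarized under Theoretical Conclusions.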
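The dominance arguments and the 1/20 weighting of deals used in the preceding sections can be checked mechanically. Below is a minimal Python sketch under our own assumptions; it is not the authors' code. It encodes P1's payoffs as read from the two payoff matrices above. It also assumes that P2 acts to minimize P1's payoff, since the report lists only P1's payoffs. It then tests the eliminations made for the Ace and the 10.

The helper names (`payoff`, `expected_payoff`, `p1_weakly_dominates`, `p2_weakly_dominates`) are hypothetical. So is the card-by-card encoding of the Theoretical Conclusions profile. `expected_payoff` is a generic expected-value helper and is not meant to reproduce the matchup values in the tables above.

```python
from fractions import Fraction
from itertools import permutations

# Cards from highest to lowest; a smaller index means a stronger card.
RANKS = ["A", "K", "Q", "J", "10"]

# P1 payoffs as read from the report's payoff matrices for each mindset.
# "fold_vs_higher" is the 0 entry recorded when P1 bets the higher card and P2 folds.
SCHEMES = {
    "high_risk": {"normal_win": 1, "bluff_win": 2, "loss": -1, "fold_vs_higher": 0},
    "safe": {"normal_win": 2, "bluff_win": 1, "loss": -1, "fold_vs_higher": 0},
}


def payoff(p1_card, p2_card, p1_action, p2_action, scheme):
    """Hypothetical helper: P1's payoff for one deal; a Check ends the hand at 0."""
    values = SCHEMES[scheme]
    if p1_action == "Check":
        return 0
    p1_higher = RANKS.index(p1_card) < RANKS.index(p2_card)
    if p2_action == "Call":
        return values["normal_win"] if p1_higher else values["loss"]
    return values["fold_vs_higher"] if p1_higher else values["bluff_win"]


def expected_payoff(p1_strategy, p2_strategy, scheme):
    """Average P1 payoff over the 20 ordered deals, each with probability 1/5 * 1/4."""
    total = Fraction(0)
    for p1_card, p2_card in permutations(RANKS, 2):
        total += payoff(p1_card, p2_card, p1_strategy[p1_card], p2_strategy[p2_card], scheme)
    return total / 20


def p1_weakly_dominates(a, b, p1_card, scheme):
    """True if P1 action a never pays less than b with p1_card, and sometimes pays more."""
    diffs = [
        payoff(p1_card, c, a, reply, scheme) - payoff(p1_card, c, b, reply, scheme)
        for c in RANKS if c != p1_card
        for reply in ("Call", "Fold")
    ]
    return all(d >= 0 for d in diffs) and any(d > 0 for d in diffs)


def p2_weakly_dominates(a, b, p2_card, scheme):
    """Same test for P2. Assumption (ours): P2 wants to minimize P1's payoff."""
    diffs = [
        payoff(c, p2_card, act, b, scheme) - payoff(c, p2_card, act, a, scheme)
        for c in RANKS if c != p2_card
        for act in ("Bet", "Check")
    ]
    return all(d >= 0 for d in diffs) and any(d > 0 for d in diffs)


# Hypothetical encoding of the strategy pair listed under Theoretical Conclusions.
P1_THEORY = {"A": "Bet", "K": "Bet", "Q": "Check", "J": "Check", "10": "Bet"}
P2_THEORY = {"A": "Call", "K": "Call", "Q": "Call", "J": "Call", "10": "Fold"}

if __name__ == "__main__":
    for scheme in SCHEMES:
        print(scheme,
              "| P1 Ace, Bet dominates Check:", p1_weakly_dominates("Bet", "Check", "A", scheme),
              "| P2 Ace, Call dominates Fold:", p2_weakly_dominates("Call", "Fold", "A", scheme),
              "| P2 10, Fold dominates Call:", p2_weakly_dominates("Fold", "Call", "10", scheme),
              "| expected P1 payoff of the profile:", expected_payoff(P1_THEORY, P2_THEORY, scheme))
```

For each mindset, the script prints whether each dominance claim holds under these assumptions. It also prints the expected P1 payoff per hand when both players follow the encoded profile. Exact fractions are used so that the 1/20 weighting stays visible rather than being rounded.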
Theoretical Conclusions

| Player 1 | Player 2 |
|---|---|
| Always bet Ace | Always call Ace |
| Always bet King | Always call King |
| Always check Queen | Always call Queen |
| Always check Jack | Always call Jack |
| Always bet 10 | Always fold 10 |

Conclusions on Simulating
After deriving the theoretical conclusions, we played the solution concepts for the two mindsets against each other 1 million times. The game is zero-sum, so Player 1's win percentage is Player 2's loss percentage and vice versa. The graph of these results therefore shows only Player 1's win and loss percentages.

Case (i): Player 1 plays High-Risk High Reward; Player 2 plays High-Risk High Reward. Player 1 win: 50.03%; Player 2 win: 24.98%.
Case (ii): Player 1 plays Playing It Safe; Player 2 plays Playing It Safe. Player 1 win: 24.99%; Player 2 win: 15.01%.
Case (iii): Player 1 plays High-Risk High Reward; Player 2 plays Playing It Safe. Player 1 win: 24.98%; Player 2 win: 15.02%.
Case (iv): Player 1 plays Playing It Safe; Player 2 plays High-Risk High Reward. Player 1 win: 65.1%; Player 2 win: 14.98%.

From the graph, we observe that Player 1 generally has an advantage. This holds both when the two mindsets play against each other and when both players follow the same one.

The advantage is clearest in Case (iv), where Player 1 plays Playing It Safe and Player 2 plays High-Risk High Reward: Player 1's win percentage rises to 65.1%. This makes sense. Player 2 cannot bluff, so the High-Risk High Reward incentive to bluff is of no use to him. Player 1, meanwhile, bets with the higher cards without taking any risks and checks with the lower ones.

We also played these two concepts against an RL model 1 million times, defining the RL model's payoffs as 1 for a win, -1 for a loss, and 0 for a tie.
This ensures that the RL model is not biased towards either concept and that its preferences correspond exactly to winning and losing under the game's rules. In the graph of these results, the position of a strategy in each label indicates which player follows it. Blue signifies Player 1, and orange represents Player 2. The results are as follows:

Case (v): Player 1 plays High-Risk High Reward; Player 2 is the AI. Player 1 win: 16.19%; Player 1 loss: 14.96%.
Case (vi): Player 1 plays Playing It Safe; Player 2 is the AI. Player 1 win: 8.38%; Player 1 loss: 13.8%.
Case (vii): Player 1 is the AI; Player 2 plays High-Risk High Reward. Player 2 win: 9.95%; Player 2 loss: 14.96%.
Case (viii): Player 1 is the AI; Player 2 plays Playing It Safe. Player 2 win: 9.94%; Player 2 loss: 5.03%.

Consider playing against the AI, or, in a real-world situation, against an opponent whose strategy is unknown. As Player 1, High-Risk High Reward has a better win percentage than Playing It Safe (16.19% versus 8.38%). As Player 2, the win percentage is approximately the same for both concepts. However, the loss percentage for Playing It Safe is significantly lower than for High-Risk High Reward (5.03% versus 14.96%). So, as Player 1 we prefer to play High-Risk High Reward, and as Player 2 we prefer Playing It Safe.

References
Damian Palafox. Thinking Poker Through Game Theory. https://scholarworks.lib.csusb.edu/cgi/viewcontent.cgi?article=1378&context=etd
Bill Chen. Introduction to Game Theory. https://web.mit.edu/willma/www/2013lec3.pdf
Bill Chen and Jerrod Ankenman. The Mathematics of Poker. ConJelCo LLC, Pittsburgh, PA, 2006. https://pokerbooks.lt/books/en/The_Mathematics_of_Poker.pdf
Code of the RL model: BhavyaMehta2/AKQJ10 (github.com)