Analyzing and modelling the strategies in the
AKQJ10 Poker Game
Prepared for
Prof. Anupama Sharma
As a part of the course
Game Theory, BITS F314
By: ε<0
Amrtanshu Bharadwaj   2020B4AA1860G
Bhavya Mehta          2020B4AA1846G
Pranav Ghag           2020B4A32060G
Pranav Kocheta        2020B4A41859G
Aditi Kashyap         2020B4A71654G
Acknowledgement
We would like to express our sincere gratitude to Prof. Anupama Sharma, Department of
Mathematics, BITS Pilani K. K. Birla Goa Campus, who gave us the opportunity to build this
Game Theory Model. The project led us to do a great deal of research, taught us much about
the Model, and helped us polish our Game Theory concepts.
Last but not least, we would like to express our gratitude to our friends and respondents for their
support and willingness to spend some time with us.
Contents
Acknowledgement
Abstract
Introduction
The Model
High-Risk High Reward
Playing It Safe
Non-Dominating Strategies
Calculations
Theoretical Conclusions
Conclusions on Simulating
References:
Abstract
The AKQJ10 poker game is a simplified variant of traditional poker in which each player is dealt one
card from a five-card deck containing one each of A, K, Q, J, and 10, and the goal is to hold the
higher-ranked card. The cards rank in descending order from Ace (A), the highest, through King (K),
Queen (Q), and Jack (J) to Ten (10). The game is intriguing from a game theory perspective because it
involves strategic decision-making, including betting and bluffing, even though it rests on the rank of
a single card. We model this poker variant as a game to predict the best response and maximize the
player's payoff. We then train an RL model to play the game and test our hypothesis further by
playing against the RL-trained bot.
Introduction
In 1995, the pioneering poker theorist Mike Caro proposed a game called AKQ. It is a simplified game
that can be solved rigorously with game-theoretic concepts. It has a limited number of
game states, and hand calculations can be performed relatively easily. Since one of this paper's
objectives is to contrast traditionally derived results with results obtained from a Reinforcement
Learning (RL) model, we have chosen to increase the complexity of Caro’s game, thereby creating a
slightly different variant called AKQJ10.
We assume there are only five cards: the Ace, King, Queen, Jack, and 10 (AKQJ10), where
A > K > Q > J > 10. Each of the two players draws one card at random without replacement.
Each player knows only his own card and can deduce that the other player holds one of the four
remaining cards. On the basis of his card, Player 1 (P1) chooses to either Bet or Check.
A Check ends the game with no money exchanged. If P1 Bets, the turn passes to Player 2 (P2),
who chooses to either Call or Fold. If P2 folds, he loses that round; if he calls, the higher
card wins the pot.
We shall discuss two strategies called High-Risk High Reward and Playing It Safe, which we believe
represent players' general mindset. The fundamental difference lies in how the player prefers to play
the game, and this difference is modelled by assigning different payoffs.
Furthermore, we have trained an RL bot that has independently come up with its own strategy. Since its
only objective was maximising its payoff, we believe this model accurately represents a rational player.
The Model
N := {1, 2}
A := {{Bet, Check}, {Call, Fold}}
Player function: P(∅) = 1, P(Bet) = 2
Preferences:
1. High-Risk High-Reward: Win by Bluff > Normal Win > Tie > Lose
2. Playing It Safe: Normal Win > Win by Bluff > Tie > Lose
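The two preference orderings can be sketched as a small payoff function. This is a minimal illustration, not the authors' code: the numeric values are read off the payoff matrices in this report and interpreted as P1's payoff, and the names `RANKS`, `PAYOFFS`, and `p1_payoff` are hypothetical.

```python
# Card ranks: a higher number beats a lower one.
RANKS = {"10": 0, "J": 1, "Q": 2, "K": 3, "A": 4}

# Payoff values per mindset (assumption: taken from the report's matrices).
# "none" covers a check, and a fold by P2 when P1 already held the higher card.
PAYOFFS = {
    "high_risk": {"bluff_win": 2, "normal_win": 1, "none": 0, "lose": -1},
    "play_safe": {"bluff_win": 1, "normal_win": 2, "none": 0, "lose": -1},
}


def p1_payoff(p1_card, p2_card, p1_action, p2_action=None, mindset="high_risk"):
    """Return P1's payoff for one hand under the given mindset."""
    if p1_card == p2_card:
        raise ValueError("both players cannot hold the same card")
    values = PAYOFFS[mindset]
    if p1_action == "check":
        return values["none"]  # game ends, nothing is exchanged
    p1_higher = RANKS[p1_card] > RANKS[p2_card]
    if p2_action == "fold":
        # Only a fold by the stronger card counts as a win by bluff.
        return values["none"] if p1_higher else values["bluff_win"]
    # P2 called: the higher card wins.
    return values["normal_win"] if p1_higher else values["lose"]


print(p1_payoff("K", "A", "bet", "fold", "high_risk"))  # bluff succeeds
print(p1_payoff("K", "Q", "bet", "call", "play_safe"))  # showdown win
```

Keeping the values in one dictionary per mindset makes the only difference between the two models explicit: the bluff-win and normal-win payoffs are swapped.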
High-Risk High Reward
This mindset assigns a higher payoff to bluffing than to winning by a higher card. Thus, the payoff
matrix is as follows:
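        P2:   | ACE       | KING      | QUEEN     | JACK      | 10
P1            | CALL FOLD | CALL FOLD | CALL FOLD | CALL FOLD | CALL FOLD
ACE     BET   |    -    - |    1    0 |    1    0 |    1    0 |    1    0
        CHECK |    -    - |    0    0 |    0    0 |    0    0 |    0    0
KING    BET   |   -1    2 |    -    - |    1    0 |    1    0 |    1    0
        CHECK |    0    0 |    -    - |    0    0 |    0    0 |    0    0
QUEEN   BET   |   -1    2 |   -1    2 |    -    - |    1    0 |    1    0
        CHECK |    0    0 |    0    0 |    -    - |    0    0 |    0    0
JACK    BET   |   -1    2 |   -1    2 |   -1    2 |    -    - |    1    0
        CHECK |    0    0 |    0    0 |    0    0 |    -    - |    0    0
10      BET   |   -1    2 |   -1    2 |   -1    2 |   -1    2 |    -    -
        CHECK |    0    0 |    0    0 |    0    0 |    0    0 |    -    -
(-: both players cannot hold the same card)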
It is apparent that P1 should always bet when he holds an Ace. Likewise, P2 should always call when
he holds an Ace and always fold when he holds a 10. Eliminating the dominated choices gives
the following reduced payoff matrix.
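        P2:   | ACE  | KING      | QUEEN     | JACK      | 10
P1            | CALL | CALL FOLD | CALL FOLD | CALL FOLD | FOLD
ACE     BET   |    - |    1    0 |    1    0 |    1    0 |    0
KING    BET   |   -1 |    -    - |    1    0 |    1    0 |    0
        CHECK |    0 |    -    - |    0    0 |    0    0 |    0
QUEEN   BET   |   -1 |   -1    2 |    -    - |    1    0 |    0
        CHECK |    0 |    0    0 |    -    - |    0    0 |    0
JACK    BET   |   -1 |   -1    2 |   -1    2 |    -    - |    0
        CHECK |    0 |    0    0 |    0    0 |    -    - |    0
10      BET   |   -1 |   -1    2 |   -1    2 |   -1    2 |    -
        CHECK |    0 |    0    0 |    0    0 |    0    0 |    -
(-: both players cannot hold the same card)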
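The elimination step can be checked mechanically. The sketch below is illustrative only: it treats the matrix entries as P1's payoffs, and it assumes P2's payoff is the negative of P1's (the report later describes the game as zero-sum). The function names are hypothetical.

```python
RANKS = {"10": 0, "J": 1, "Q": 2, "K": 3, "A": 4}
CARDS = list(RANKS)


def p1_payoff(p1_card, p2_card, p1_action, p2_action, bluff=2, win=1):
    """P1's payoff for one matchup; defaults are the High-Risk High Reward values."""
    if p1_action == "check":
        return 0
    p1_higher = RANKS[p1_card] > RANKS[p2_card]
    if p2_action == "fold":
        return 0 if p1_higher else bluff
    return win if p1_higher else -1


def bet_never_worse_than_check(card, **values):
    """True if betting this card never pays P1 less than checking (which pays 0)."""
    opponents = [c for c in CARDS if c != card]
    return all(
        p1_payoff(card, opp, "bet", reply, **values) >= 0
        for opp in opponents
        for reply in ("call", "fold")
    )


def p2_dominant_reply(card, **values):
    """Reply that is never worse for P2 against any betting P1 card, else None.

    Assumption: P2's payoff is the negative of P1's payoff.
    """
    opponents = [c for c in CARDS if c != card]
    # P2's gain from calling rather than folding, for each card P1 might hold.
    gains = [
        p1_payoff(opp, card, "bet", "fold", **values)
        - p1_payoff(opp, card, "bet", "call", **values)
        for opp in opponents
    ]
    if all(g >= 0 for g in gains):
        return "call"
    if all(g <= 0 for g in gains):
        return "fold"
    return None


for card in CARDS:
    print(card, bet_never_worse_than_check(card), p2_dominant_reply(card))
```

Under these assumptions only the Ace passes the check for P1, and only the Ace and the 10 give P2 a reply that is never worse, which matches the choices removed from the matrix.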
Since P1 has only one strategy left after receiving an Ace, and P2 has only one after receiving an Ace
or a 10, we ignore these trivial cases and arrive at the following Simplified Payoff Matrix.
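        P2:   | KING      | QUEEN     | JACK
P1            | CALL FOLD | CALL FOLD | CALL FOLD
KING    BET   |    -    - |    1    0 |    1    0
        CHECK |    -    - |    0    0 |    0    0
QUEEN   BET   |   -1    2 |    -    - |    1    0
        CHECK |    0    0 |    -    - |    0    0
JACK    BET   |   -1    2 |   -1    2 |    -    -
        CHECK |    0    0 |    0    0 |    -    -
10      BET   |   -1    2 |   -1    2 |   -1    2
        CHECK |    0    0 |    0    0 |    0    0
(-: both players cannot hold the same card)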
Playing It Safe
This mindset assigns a higher payoff to winning by a higher card than to bluffing. Thus, the payoff
matrix is as follows:
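        P2:   | ACE       | KING      | QUEEN     | JACK      | 10
P1            | CALL FOLD | CALL FOLD | CALL FOLD | CALL FOLD | CALL FOLD
ACE     BET   |    -    - |    2    0 |    2    0 |    2    0 |    2    0
        CHECK |    -    - |    0    0 |    0    0 |    0    0 |    0    0
KING    BET   |   -1    1 |    -    - |    2    0 |    2    0 |    2    0
        CHECK |    0    0 |    -    - |    0    0 |    0    0 |    0    0
QUEEN   BET   |   -1    1 |   -1    1 |    -    - |    2    0 |    2    0
        CHECK |    0    0 |    0    0 |    -    - |    0    0 |    0    0
JACK    BET   |   -1    1 |   -1    1 |   -1    1 |    -    - |    2    0
        CHECK |    0    0 |    0    0 |    0    0 |    -    - |    0    0
10      BET   |   -1    1 |   -1    1 |   -1    1 |   -1    1 |    -    -
        CHECK |    0    0 |    0    0 |    0    0 |    0    0 |    -    -
(-: both players cannot hold the same card)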
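        P2:   | ACE  | KING      | QUEEN     | JACK
P1            | CALL | CALL FOLD | CALL FOLD | CALL FOLD
KING    BET   |   -1 |    -    - |    2    0 |    2    0
        CHECK |    0 |    -    - |    0    0 |    0    0
QUEEN   BET   |   -1 |   -1    1 |    -    - |    2    0
        CHECK |    0 |    0    0 |    -    - |    0    0
JACK    BET   |   -1 |   -1    1 |   -1    1 |    -    -
        CHECK |    0 |    0    0 |    0    0 |    -    -
10      BET   |   -1 |   -1    1 |   -1    1 |   -1    1
        CHECK |    0 |    0    0 |    0    0 |    0    0
(-: both players cannot hold the same card)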
Similarly, after eliminating the strictly dominated strategies and then the trivial cases of Ace and 10,
we get the following Simplified Payoff Matrix.
Non-Dominating Strategies
Thus, in both cases, the non-dominated strategies for P1 are:
1. Bet Kings as Kb
2. Check Kings as Kch
3. Bet Queens as Qb
4. Check Queens as Qch
5. Bet Jacks as Jb
6. Check Jacks as Jch
7. Bet 10s as 10b
8. Check 10s as 10ch
And the non-dominated strategies for P2 are:
1. Call Kings as Kc
2. Fold Kings as Kf
3. Call Queens as Qc
4. Fold Queens as Qf
5. Call Jacks as Jc
6. Fold Jacks as Jf
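Pairing the 8 strategies of P1 with the 6 strategies of P2 gives 48 combinations. Since both players
can't have the same card, the 12 same-card combinations cannot occur, which leaves 36 matchups, and
we need to consider the expected value of each of these confrontations. Now, we can make some
important observations that will help our calculations.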
Calculations
1. If P2 folds while holding the weaker card, the payoff is 0, so we can disregard such cases in our
calculations.
2. Since we have five cards in our deck, P1 will hold any given card 1/5th of the time, and P2 will
hold each of the remaining cards 1/4th of the time.
3. In both strategies, P2 never calls with a 10, so we can ignore that case as well.
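The 1/5 and 1/4 weighting in observation 2 can be sketched as an expected-payoff sum over the 20 possible deals. This is a minimal sketch under stated assumptions: each player is given a complete pure strategy (one action per card), the example strategies are hypothetical, and the result is a plain expected value. It does not attempt to reproduce the scaling of the integer entries in the report's matrices, which is not stated.

```python
from fractions import Fraction
from itertools import permutations

RANKS = {"10": 0, "J": 1, "Q": 2, "K": 3, "A": 4}


def p1_payoff(p1_card, p2_card, p1_action, p2_action, bluff=2, win=1):
    """P1's payoff for one matchup; defaults are the High-Risk High Reward values."""
    if p1_action == "check":
        return 0
    p1_higher = RANKS[p1_card] > RANKS[p2_card]
    if p2_action == "fold":
        return 0 if p1_higher else bluff
    return win if p1_higher else -1


def expected_p1_payoff(p1_strategy, p2_strategy, bluff=2, win=1):
    """Average P1's payoff over all ordered deals of two distinct cards."""
    total = Fraction(0)
    for p1_card, p2_card in permutations(RANKS, 2):
        # P1 holds a given card 1/5 of the time, P2 each remaining card 1/4.
        weight = Fraction(1, 5) * Fraction(1, 4)
        total += weight * p1_payoff(
            p1_card, p2_card, p1_strategy[p1_card], p2_strategy[p2_card], bluff, win
        )
    return total


# Hypothetical example: P1 bets every card, P2 calls with everything but the 10.
always_bet = {card: "bet" for card in RANKS}
call_unless_ten = {card: ("fold" if card == "10" else "call") for card in RANKS}
print(expected_p1_payoff(always_bet, call_unless_ten))  # -1/5
```

Using `Fraction` keeps the result exact, so it can be compared directly with a hand calculation.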
Say we pick Kb vs Qc.
When P1 has an Ace, he wins when called by a King, Queen, or Jack. If he has a King, he loses to an
Ace but beats a Queen and a Jack. If he has a Queen, he loses to an Ace and a King but beats a Jack.
If he has a Jack, he loses to an Ace, King, and Queen. P1 holds a given card 1/5th of the time, and P2
holds each of the remaining cards 1/4th of the time. The payoff values are taken from the matrix.
Thus, the calculation is as follows:
After similar calculations, we get the following matrices
For High-Risk High Reward:
      |   Kc   Kf   Qc   Qf   Jc   Jf
Kb    |    -    -    4    8    6    6
Kch   |    -    -    3    8    5    6
Qb    |    2   10    -    -    6    6
Qch   |    3    8    -    -    5    6
Jb    |    2   10    4    8    -    -
Jch   |    3    8    5    6    -    -
10b   |    2   10    4    8    6    6
10ch  |    2    7    4    5    6    3
(-: both players cannot hold the same card)
For Playing It Safe:
      |   Kc   Kf   Qc   Qf   Jc   Jf
Kb    |    -    -    6    6    7    3
Kch   |    -    -    3    5    4    2
Qb    |    5    9    -    -    7    3
Qch   |    5    7    -    -    6    4
Jb    |    5    9    6    6    -    -
Jch   |    7    9    8    6    -    -
10b   |    5    9    6    6    7    3
10ch  |    7    9    8    6    9    3
(-: both players cannot hold the same card)
After eliminating the weakly dominated strategies, we are left with the following.
Theoretical Conclusions
Player 1             | Player 2
Always bet Ace       | Always call Ace
Always bet King      | Always call King
Always check Queen   | Always call Queen
Always check Jack    | Always call Jack
Always bet 10        | Always fold 10
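The strategy pair in the table above can be evaluated exactly by enumerating the 20 equally likely deals. The sketch below is illustrative and not the authors' simulation: it counts a fold by P2 as a win for P1 and a check as a tie, following the rules in the Introduction, and the names used are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

RANKS = {"10": 0, "J": 1, "Q": 2, "K": 3, "A": 4}

# Strategies copied from the Theoretical Conclusions table.
P1_STRATEGY = {"A": "bet", "K": "bet", "Q": "check", "J": "check", "10": "bet"}
P2_STRATEGY = {"A": "call", "K": "call", "Q": "call", "J": "call", "10": "fold"}


def outcome(p1_card, p2_card):
    """Classify one deal from P1's point of view as 'win', 'loss', or 'tie'."""
    if P1_STRATEGY[p1_card] == "check":
        return "tie"  # no money is exchanged
    if P2_STRATEGY[p2_card] == "fold":
        return "win"  # P2 loses the round by folding
    return "win" if RANKS[p1_card] > RANKS[p2_card] else "loss"


def outcome_probabilities():
    """Exact probability of each outcome over all ordered deals."""
    deals = list(permutations(RANKS, 2))
    probs = {"win": Fraction(0), "loss": Fraction(0), "tie": Fraction(0)}
    for p1_card, p2_card in deals:
        probs[outcome(p1_card, p2_card)] += Fraction(1, len(deals))
    return probs


print(outcome_probabilities())
# {'win': Fraction(7, 20), 'loss': Fraction(1, 4), 'tie': Fraction(2, 5)}
```

Because the deck is so small, exact enumeration is a cheap cross-check on any sampled win and loss percentages for the same pair of strategies.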
Conclusions on Simulating
After deriving the theoretical conclusions, we played the solution concepts for the two different
mindsets against each other 1 million times to see what results we would get.
In the graph of these results, since it is a zero-sum game, the win percentage of Player 1 is the loss
percentage of Player 2 and vice versa, so the graph shows the win and loss percentage of Player 1.
Case (i): Player 1 plays High-Risk High Reward; Player 2 plays High-Risk High Reward
Player 1 win % is 50.03; Player 2 win % is 24.98.
Case (ii): Player 1 plays Playing It Safe; Player 2 plays Playing It Safe
Player 1 win % is 24.99; Player 2 win % is 15.01.
Case (iii): Player 1 plays High-Risk High Reward; Player 2 plays Playing It Safe
Player 1 win % is 24.98; Player 2 win % is 15.02.
Case (iv): Player 1 plays Playing It Safe; Player 2 plays High-Risk High Reward
Player 1 win % is 65.1; Player 2 win % is 14.98.
From the graph, we can observe that there is in general a Player 1 advantage, whether the two
strategies are played against each other or both players follow the same strategy.
The advantage is clearest when Player 1 plays Playing It Safe and Player 2 plays High-Risk High
Reward, where Player 1's win percentage rises noticeably. This makes sense: Player 2 can't bluff, so
its incentive to bluff is of no use, whereas Player 1 always bets the higher cards, takes no risks,
and checks the smaller cards.
We also played these two concepts against an RL model 1 million times, where we defined a win with a
payoff of 1, a loss with a payoff of -1, and a tie with a payoff of 0. This ensures that the RL model is not
biased towards any concept and its preferences are completely analogous to the win and loss
according to the game's rules.
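The +1 / -1 / 0 scoring can be sketched as a short Monte Carlo loop. This is a hedged illustration rather than the RL code referenced at the end of the report: no learning agent is implemented here, and both seats play fixed strategies taken from the Theoretical Conclusions table so that the scoring itself is easy to follow. The function names and the seed are hypothetical.

```python
import random

RANKS = {"10": 0, "J": 1, "Q": 2, "K": 3, "A": 4}
CARDS = list(RANKS)

# Fixed stand-in strategies (from the Theoretical Conclusions table).
P1_STRATEGY = {"A": "bet", "K": "bet", "Q": "check", "J": "check", "10": "bet"}
P2_STRATEGY = {"A": "call", "K": "call", "Q": "call", "J": "call", "10": "fold"}


def play_hand(rng):
    """Deal one hand and return P1's reward: +1 win, -1 loss, 0 tie."""
    p1_card, p2_card = rng.sample(CARDS, 2)  # two distinct cards
    if P1_STRATEGY[p1_card] == "check":
        return 0
    if P2_STRATEGY[p2_card] == "fold":
        return 1
    return 1 if RANKS[p1_card] > RANKS[p2_card] else -1


def simulate(n_hands, seed=0):
    """Count how often each reward occurs over n_hands simulated hands."""
    rng = random.Random(seed)
    counts = {1: 0, 0: 0, -1: 0}
    for _ in range(n_hands):
        counts[play_hand(rng)] += 1
    return counts


counts = simulate(100_000, seed=1)
total = sum(counts.values())
print({reward: count / total for reward, count in counts.items()})
```

A learning agent would replace one of the two strategy tables and receive the value returned by `play_hand` (or its negative, for Player 2) as its reward signal.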
In the graph, the position of a strategy in the name indicates which player is following that
strategy. Blue signifies Player 1, and orange represents Player 2. From the graphs, we can observe
the following:
Case (v): Player 1 plays High-Risk High Reward; Player 2 is AI
Player 1 win % is 16.19; Player 1 loss % is 14.96.
Case (vi): Player 1 plays Playing It Safe; Player 2 is AI
Player 1 win % is 8.38; Player 1 loss % is 13.8.
Case (vii): Player 1 is AI; Player 2 plays High-Risk High Reward
Player 2 win % is 9.95; Player 2 loss % is 14.96.
Case (viii): Player 1 is AI; Player 2 plays Playing It Safe
Player 2 win % is 9.94; Player 2 loss % is 5.03.
From this we can observe that, when playing as Player 1 against the AI (or, in a real-world situation,
against someone whose strategy we do not know), High-Risk High Reward has a better win % than
Playing It Safe. As Player 2, the win percentage is approximately the same for both concepts, but the
loss percentage for Playing It Safe is significantly lower than for High-Risk High Reward. So as
Player 1 we prefer to play High-Risk High Reward, and as Player 2 we prefer Playing It Safe.
References:
Damian Palafox. Thinking Poker Through Game Theory.
https://scholarworks.lib.csusb.edu/cgi/viewcontent.cgi?article=1378&context=etd
Bill Chen. Introduction to Game Theory.
https://web.mit.edu/willma/www/2013lec3.pdf
Bill Chen and Jerrod Ankenman. The Mathematics of Poker. ConJelCo LLC, Pittsburgh, PA, 2006.
https://pokerbooks.lt/books/en/The_Mathematics_of_Poker.pdf
Code of the RL model: BhavyaMehta2/AKQJ10 (github.com)