Multi Level Reasoning In A Cheat Agent
Using Bayesian Inference to Cover and Uncover
Deception
Rina Blomberg
rinbl907@student.liu.se
729G11: Artificial Intelligence II
Linköping University
September 19, 2011.
One goal in Artificial Intelligence is to design cognitive systems that can perform multi level
reasoning typical of humans. The aim of this paper is to theoretically demonstrate how human
multi-level reasoning within a competitive environment can be modelled in an intelligent agent.
“...it is the means by which rational beings understand
themselves to think about cause and effect, truth and falsehood,
and what is good or bad.”
Reason - From Wikipedia 09-09-2011
Introduction
Human intelligence can be observed in the way people reason rationally in the face of the
unknown: hypotheses are made, tested and evaluated, and new beliefs replace old.
When presented with problems that require uninformed/sub-informed decisions,
humans typically perform strategic actions in order to gain more information about the
problem. With problems that involve two or more persons, humans engage in multi-level reasoning: reasoning about others' beliefs and predicting their plans of action.
One goal in Artificial Intelligence is to design cognitive systems that can perform this
kind of human-level reasoning and the aim of this paper is to theoretically demonstrate
how human multi-level reasoning in a competitive environment can be modelled in an
intelligent agent.
Background
Games have been a key element in Artificial Intelligence (AI) research. They provide
researchers with a structured environment in which human cognition can be observed,
monitored and tested. One of the most popular games since the late 20th century that
involves making rational, sub-informed decisions with multi-level reasoning is poker.
Poker is described as a game of simple rules but complex strategies, and one
challenge within AI is to create a poker agent capable of playing on par with a human
expert.
Despite increasing progress in pokerbot design, the reason why agents are still unable to
compete at an expert level is not due to the complexity of the game space but due to the
fact that poker is a nontransitive zero-sum game.
To tackle this challenge, AI researchers have taken a Bayesian Network-driven approach
(Korb, Nicholson, & Jitnah, 1999; Mihok & Terry, 2005; Tretyakov & Kamm, 2009).
Bayesian Networks are used in AI to reason about uncertainty and provide agents with
the means to make inferences about complex causal relationships between variables via
a directed acyclic graph.
Bayesian inference is an iterative process. Beliefs are modelled as a probability
distribution over uncertain data, and as new evidence is observed the belief system is
updated. The initial belief is termed the prior probability and the modified belief is
termed the posterior probability.
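The iterative step from prior to posterior can be illustrated with a minimal sketch. The hypothesis names and all probability values below are invented for illustration and do not come from the cited pokerbot research; the function name bayes_update is likewise hypothetical.

```python
def bayes_update(prior, likelihood):
    """Return the posterior P(h | e) for every hypothesis h.

    prior:      dict mapping hypothesis -> P(h)
    likelihood: dict mapping hypothesis -> P(e | h) for the observed evidence e
    """
    unnormalised = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalised.values())  # P(e), the normalising constant
    return {h: value / evidence for h, value in unnormalised.items()}


# Illustrative numbers only: the agent's belief that the opponent is bluffing.
belief = {"bluff": 0.5, "honest": 0.5}
# Assumed probability of observing a four-card discard under each hypothesis.
four_card_discard = {"bluff": 0.6, "honest": 0.2}

# Each posterior becomes the prior for the next observation.
belief = bayes_update(belief, four_card_discard)
belief = bayes_update(belief, four_card_discard)
```

With these assumed numbers, two identical observations move the belief in a bluff from 0.5 to roughly 0.9, which shows how repeated evidence sharpens the distribution.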
Pokerbot researchers Brian Mihok and Michael Terry (M&T) defined the different
levels of human reasoning in poker so as to create a solid framework for their model
(see Definition 1). A key element in M&T's system design was the use of Bayesian
Nets for lower-level inferences about the opponent's hand and strategy as a basis for
higher-level strategic reasoning (Mihok & Terry, 2005).
Definition 1. Mihok and Terry define different levels of human reasoning in poker as:
Level 0: Reasoning about your hand
Level 1: Reasoning about your opponent's hand
Level 2: Reasoning about what your opponent thinks about his and your
hand
Level 3: What to do about what your opponent thinks
…
Professional players are often thinking on level 4 or 5.
Influenced by M&T's framework, this project will explain the design of a theoretical
model for a Cheat agent capable of competing on par with an experienced human
player. (Cheat is also known as 'Bullshit' and 'I doubt it'.) The card game Cheat is not
as complex as poker: betting and suiting (clubs, diamonds, hearts, spades), for example,
are not elements of the game. Deception, though, plays a major role in game strategy
and so, as in poker, multi-level reasoning is required for optimal play. For simplicity,
the game space in this model will be confined to the two-player version of the game, but
with a few modifications the model may be usable for versions with 3+ players.
2 Player Cheat
Set Up
A standard, shuffled pack of 52 playing cards is used. The chosen dealer deals 8 cards
face down to both players (the player's hand) and one card face up in the middle of the
board (the tabled card). The remaining cards (the pack) are piled face down beside the
tabled card (see Figure 1). A player may look at the cards in his own hand. The player
who did not deal is the first to move.
Figure 1. Initial board setup.
Object
The object of the game is to be the first player to get rid of all the cards in his hand.
Game Play
Turns: The player whose turn it is (Player) can choose either to discard one to four cards
face down onto the tabled pile and call out their rank, or to pick up one card from the pack
and add it to his hand.
Picking Up: If Player chooses to pick up a card from the pack then Player's turn is
over.
Discarding: Should Player choose to table a number of cards, the chosen rank must
either be one rank higher or one rank lower than the previously tabled card(s) (see
Definition 2). Player must be truthful about the quantity of cards he discards but is
allowed to bluff about the actual cards he tables. For example, Player may claim to
table three 7s but in fact place two Js and one 9 face down. Opponent must
decide whether or not Player made an honest claim.
Accepting A Play: If Opponent accepts Player's claim as honest, then Player's turn is
over and Opponent's turn consists of choosing between picking up a card from the pack
or discarding a number of cards of a rank either one higher or one lower than the (claimed)
rank of the tabled card(s).
Calling A Bluff: If Opponent thinks Player was dishonest, he calls out 'Cheat!'
Player's tabled cards are exposed and one of two things happens:
1. If all of the exposed cards are shown to be of the rank that Player claimed, then
Opponent must pick up and add to his hand the entire tabled pile. Player starts
the next round.
2. If any of the exposed cards are different from the claimed rank, then Player must
pick up and add to his hand the entire tabled pile. Opponent starts the next
round.
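The two outcomes of a called bluff can be sketched as a small function. This is only an illustration of the rule as stated above; the function name, the string labels and the representation of cards as rank strings are assumptions made for the example.

```python
def resolve_call(claimed_rank, exposed_cards):
    """Resolve a 'Cheat!' call.

    claimed_rank:  the rank Player announced, e.g. "7"
    exposed_cards: ranks of the cards Player actually tabled this turn

    Returns (who_picks_up_the_tabled_pile, who_starts_the_next_round).
    """
    claim_was_honest = all(card == claimed_rank for card in exposed_cards)
    if claim_was_honest:
        # Outcome 1: the caller was wrong and takes the whole pile.
        return ("opponent", "player")
    # Outcome 2: at least one card differs from the claim, so Player takes the pile.
    return ("player", "opponent")


# Player claimed three 7s but tabled two Js and one 9.
result = resolve_call("7", ["J", "J", "9"])
```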
Next Round: The next round is started by the player determined by the outcome of the
called bluff. That player tables, face up, any card he chooses from his hand; it then
becomes Opponent's turn to move and play continues as usual.
Ending the Game: The player who has just one card left in his hand and is able to
discard that card without cheating wins the game. Alternatively, a player who collects
30+ cards in his hand automatically loses.
Definition 2. Ranking of cards from lowest to highest.
When a King is tabled the choice of ranks is Queen or Ace.
When an Ace is tabled the choice of ranks is King or Two.
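The wrap-around between King, Ace and Two means that the ranks form a cycle. A minimal sketch of how an agent might look up the two ranks it is allowed to claim is given here. Placing the Ace first in the list is an assumption, but because the order is cyclic only the adjacency matters; the names RANKS and allowed_ranks are hypothetical.

```python
# Cyclic rank order (Ace placed first by assumption; only adjacency matters).
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]


def allowed_ranks(tabled_rank):
    """Return the (lower, higher) ranks that may be claimed on top of tabled_rank."""
    i = RANKS.index(tabled_rank)
    lower = RANKS[(i - 1) % len(RANKS)]   # modulo arithmetic gives the wrap-around
    higher = RANKS[(i + 1) % len(RANKS)]
    return lower, higher


on_king = allowed_ranks("K")
on_ace = allowed_ranks("A")
```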
Cheat Strategy
The key to winning is being the player with the fewest cards in hand and
having increasing opportunities to set the tabled rank to one's own preference. This is
done by successfully calling the opponent's bluff and fooling the opponent into calling
cheat for an honest play.
Holding all four cards of a rank in hand is the easiest way to gain the lead because the
opponent is either forced to pick up or cheat if made to play this rank.
Card memory is a necessity. Remembering which cards one has discarded that have
ended up in the opponent's hand assists in reasoning whether or not to play honestly
and whether or not to call the opponent a cheat.
Cheat becomes a game of pattern recognition. The player who wins exploits patterns in
his opponent's play before the opponent can adjust. Thus, an inexperienced player will
repeatedly call cheat for honest plays.
With every turn, an experienced player will reason on various levels:
• Which rank should I play? What will most likely lead to strong sequential play
and discards to my advantage? (Level 0)
• What is the likelihood that my opponent holds one or more cards of this rank in
his hand? (Level 1)
• What is the likelihood that my opponent will call cheat, and what is the likelihood
that my opponent cheated? (Level 2)
• Is my opponent aware of my bluffing strategy, and what should I do about my
opponent's bluffing strategy? (Level 3)
As M&T have demonstrated in their research, these multiple reasoning levels provide a
framework from which to generate a winning strategy for the agent and aid in
classifying inference tasks. For this theoretical model, Levels 0 to 2 will be formulated.
The advanced Level 3 will remain outside the scope of the project but will be included
in the discussion at the end of this report.
Achieving Level 0
In Cheat, a strong hand is one that allows for sequential play with many discards or
controls whether the opponent will be forced to pick up or cheat. Sequential play
involves playing a series of successive ranks. In most cases the entire hand does not
need to be evaluated in order to determine sequential play; it is enough to reason over
±3 ranks from the rank of the tabled card. Because the outcome of each turn is uncertain, it
is necessary for the agent to recalculate hand strength with each turn.
Reasoning about own hand
Let:
q = the quantity, with the domain {1, 2, 3, 4}
r = a rank, with the domain {A, 2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K}
t = the rank of the tabled card
Represent sequential play in the hand as: H = [q_t, q_{t+1}, q_{t+2}, q_{t+3}]
and: L = [q_{t-3}, q_{t-2}, q_{t-1}, q_t]
where q_x denotes the quantity of rank x held in the hand.
Create a heuristic algorithm to compare the strength of L to H (see Algorithm 1).
Algorithm 1 helps the agent to decide whether to play one rank higher or one rank lower
than the tabled card, even when the agent's hand doesn't actually have that rank. This is
because the algorithm puts a value on the sequence of ranks and their quantities rather
than just valuing a rank per se.
When the outcome of the opponent's turn falls in favour of the agent's direction of
preference, H(igher) vs. L(ower), the agent can reason at this level, together with
condition-action rules, to not risk calling cheat and instead continue to discard cards in
the direction of the preferred sequence.
Algorithm 1. Algorithm to compare the strength of L to H.
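The listing for Algorithm 1 is not reproduced in this text, so the following is only a hedged sketch of one way such a heuristic could look. It is not the original algorithm. The weights, the function names and the representation of a hand as a list of rank strings are all assumptions; the only parts taken from the description above are the two windows H and L and the idea of valuing rank quantities along a sequence.

```python
from collections import Counter

# Cyclic rank order (Ace placed first by assumption; only adjacency matters).
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]

# Hypothetical weights for offsets 0..3 from the tabled rank: ranks that can be
# played sooner count for more. The original heuristic may weigh them differently.
WEIGHTS = [4, 3, 2, 1]


def sequence_strength(hand, tabled_rank, step):
    """Score the hand in one direction: step=+1 scores H, step=-1 scores L."""
    counts = Counter(hand)                    # quantity held of each rank
    start = RANKS.index(tabled_rank)
    strength = 0
    for offset, weight in enumerate(WEIGHTS):
        rank = RANKS[(start + step * offset) % len(RANKS)]
        strength += weight * counts[rank]     # missing ranks count as 0
    return strength


def preferred_direction(hand, tabled_rank):
    """Return 'H', 'L' or 'either' depending on which sequence scores higher."""
    higher = sequence_strength(hand, tabled_rank, +1)
    lower = sequence_strength(hand, tabled_rank, -1)
    if higher == lower:
        return "either"
    return "H" if higher > lower else "L"


hand = ["8", "8", "9", "10", "5"]
direction = preferred_direction(hand, "7")
```

In this example the hand holds an 8-9-10 run above the tabled 7 and only a lone 5 below it, so the sketch prefers the higher direction.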
Achieving Level 1
As the game evolves, a player gains increasing information about what cards the
opponent may be holding. The player gets this information in two ways:
1. By estimating the likelihood that the opponent will have q_j r_i (quantity q_j of
rank r_i) in his hand, based upon the total number of unknown cards on the board.
2. By remembering which cards he has discarded that have ended up in the
opponent's hand and by estimating the likelihood that those cards are still in the
opponent's hand (Level 2).
Reasoning about opponent’s hand
Let:
U = the total number of unknown cards on the board (includes unknown cards in the
opponent's hand, the tabled pile and the pack).
N = the number of cards of rank r_i among the U unknown cards.
C = the total number of unknown cards in the opponent's hand.
Create an algorithm for calculating P(q_j r_i) (see Algorithm 2).
Algorithm 2. Calculating the probability that a certain quantity of a certain rank is
contained in the hand of the opponent.
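The listing for Algorithm 2 is not reproduced in this text. One natural way to compute such a probability from U, N and C is the hypergeometric distribution, which treats the opponent's C unknown cards as a uniformly random draw from the U unknown cards. Whether the original algorithm uses exactly this formula is an assumption, and the function name below is hypothetical.

```python
from math import comb


def prob_quantity_in_hand(j, N, U, C):
    """P(opponent holds exactly j cards of a rank), hypergeometric model.

    j: quantity of the rank being asked about
    N: cards of that rank among the unknown cards
    U: total number of unknown cards (opponent's hand, tabled pile, pack)
    C: number of unknown cards in the opponent's hand
    """
    # Impossible cases: more cards than exist, or not enough other cards to fill the hand.
    if j < 0 or j > N or j > C or C - j > U - N:
        return 0.0
    return comb(N, j) * comb(U - N, C - j) / comb(U, C)


# Opening position: the agent sees its own 8 cards and the tabled card,
# so U = 52 - 9 = 43. Suppose the agent holds one 7 (N = 3) and the opponent holds 8 cards.
distribution = [prob_quantity_in_hand(j, N=3, U=43, C=8) for j in range(5)]
```

The probabilities over all quantities from 0 upwards sum to 1, which is what allows them to be entered as fractional observations later on.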
Algorithm 2 gives the agent necessary insight into the opponent's hand, which, together
with condition-action rules at this level of reasoning, can be used to decide whether or
not to call the opponent a cheat. This level of reasoning, however, is not an optimal
strategy because an experienced human player will take into account a number of other
stochastic variables that influence whether or not an opponent has decided to cheat.
Achieving Level 2
Reasoning about what opponent thinks about his and your hand
Estimating whether or not a known card is still in the opponent's hand after several
turns (within a single round) involves taking into consideration the likelihood that the
opponent has cheated with each turn. Calculating this likelihood requires Level 2
reasoning and the agent achieves this through Bayesian inference. This project now
turns to the design of the Bayesian Net, and does so node by node with reference to the
agent's reasoning needs.
Engineering the Bayesian Net
A Bayesian Net is a directed acyclic graph in which each node in the graph corresponds
to a random variable, which may be discrete or continuous. A probability function
associated with each node takes as input a particular set of values and returns the
probability for a particular value of the variable represented by the node. The edges are
directed links between pairs of nodes and represent conditional dependencies. Each
node X_i has a conditional probability distribution P(X_i | Parents(X_i)) that quantifies the
effect of the parents on the node (Russell & Norvig, 2010). Nodes which are not
connected represent conditionally independent variables.
Algorithm 2 is used to calculate the probability that a certain quantity, q_j, of a certain
rank, r_i, exists in the Opponent's hand: P(O). So the first node, X1, in the Bayesian Net
will contain thirteen different probability tables for P(O), one for each card rank.
Algorithm 2 can also be used for calculating the conditional probability distribution
P(A | O), which represents the opponent's belief about the quantity of r_i in the Agent's
hand given the quantity of r_i contained in his own hand. A conditional probability table
for P(A | O) forms the second node, X2 = A, in the Bayesian Net (see Figure 2).
Figure 2. Connected nodes: X1 and X2, with probability tables.
So far, the agent is able to form knowledge about the cards in the opponent's hand and
what the opponent believes about the agent's hand based only upon the information
contained in his own hand. But this information is not enough to reason about
deception, so the Bayesian Net will need additional nodes.
Experienced players regularly weigh how certain they are that the opponent is bluffing
against the cost of picking up the tabled pile should they be wrong. This is a crucial
decision making process and must therefore be represented in the Bayesian Net.
Let:
H be a random variable with a Bernoulli distribution (see Formula 1).
H = 1 means that picking up the tabled pile will Hurt.
H = 0 means that picking up the tabled pile will not Hurt.
X3 = H
Formula 1. The probability formula for the random variable H, where p = P(H = 1) and
h ∈ {0, 1}.
P(H = h) = p^{h} (1 - p)^{1 - h}
Create an algorithm for determining the value of H for player B (see Algorithm 3),
where:
A = number of cards in A's hand
B = number of cards in B's hand
T = number of cards in the tabled pile
True = 1
False = 0
Algorithm 3 allows the agent to take into account the likelihood that picking up the
tabled pile will hurt either its own chances of winning or the opponent's chances of winning.
This is determined by calculating the ratio A:(B+T). If (B+T)/A > 2.3 for A < B+T, then
picking up the tabled pile is considered detrimental to B's hand because for every one
card A discards, B must discard more than 2.3 cards in order to win. Such a scenario
will increase B's need to cheat with large quantities of cards at a time and thus increase
the risk of being caught again.
Algorithm 3. Algorithm for determining the value of H for player B.
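The listing for Algorithm 3 is not reproduced in this text. A minimal sketch consistent with the description is given here, assuming that the compared quantity is (B + T) / A, which matches the explanation that B must discard more than 2.3 cards for every card A discards. The function and parameter names are hypothetical.

```python
HURT_THRESHOLD = 2.3  # threshold stated in the text


def pickup_hurts(a_cards, b_cards, tabled_cards, threshold=HURT_THRESHOLD):
    """Return 1 (True) if picking up the tabled pile would Hurt player B, else 0.

    a_cards:      number of cards in A's hand
    b_cards:      number of cards in B's hand
    tabled_cards: number of cards in the tabled pile
    """
    if a_cards <= 0:
        raise ValueError("A must hold at least one card while the game is in progress")
    after_pickup = b_cards + tabled_cards      # B's hand size if B takes the pile
    if a_cards < after_pickup and after_pickup / a_cards > threshold:
        return 1
    return 0


# B holds 8 cards, 6 are tabled and A holds 5: (8 + 6) / 5 = 2.8 > 2.3
h = pickup_hurts(a_cards=5, b_cards=8, tabled_cards=6)
```

Because the rule is symmetric, the agent can call the same function with the roles of A and B swapped to judge whether a pick-up would hurt itself or the opponent.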
The choice of how many cards to table can greatly affect the likelihood of being called a
cheat, so the fourth node, X4, in the network will represent the Choice of tabling 1-4
cards. Let C be a random variable with a categorical distribution over the domain {1, 2,
3, 4}. The probability law of C is given in Formula 2.
Formula 2. The probability formula of C, where p_i represents the probability of seeing
element i and \sum_{i=1}^{4} p_i = 1.
P(C = c_i) = p_i
The final node, X5, in the Bayesian Net represents the probability that the player in
question has Bluffed. The random variable B in X5 has the domain {True, False}.
The initial data contained within the conditional probability table P(B | A, C, H) will
only be an estimate. It is intended that, as the Bayesian Net is learned through new data
observations, each parameter in the conditional probability table will come to represent
the actual probability for the sample's value (this is discussed further in the section
Learning The Bayesian Net).
The final topology of the Bayesian Net can be seen in Figure 3, which shows the
conditional dependencies for X5.
Figure 3. Bayesian inference net for Cheat agent.
Bayesian Net for Level 2 reasoning
The semantics of this Bayesian Net allow the agent to use the same data and inference
methods for two different reasoning perspectives:
1. The first perspective is reasoning about the opponent's cards and actions: the
agent reasoning when it is the opponent's turn.
2. The second perspective is reasoning about the opponent's beliefs about the
agent's cards and actions: the agent reasoning when it is its own turn.
By learning the Bayesian Net after each player's turn, the posterior probability of
perspective one forms the hypothesis (prior probability) for perspective two. This
makes the Bayesian Net a powerful tool for Level 2 reasoning and is not so different
from the reasoning methods that humans employ at this level. The next section of this
project will explain how the Bayesian Net is learned.
Learning The Bayesian Net
In order for this multi-level reasoning model to be effective, it must learn from its
observations. This process involves maintaining an observation table for each variable
to be updated in the network, and then updating the variable's conditional probability
table under the assumption that the actual probabilities for the parameters are
proportional to the observed data for that variable. The chosen learning
approach is therefore Maximum-likelihood Parameter Learning (Russell & Norvig,
2010).
In a 2002 report titled “Using Sensitivity Analysis for Selective Parameter Update in
Bayesian Network Learning”, Wang, Rish and Ma discuss a formula for the maximum
likelihood of a specific parameter θ (Wang, Rish, & Ma, 2002). Because
learning the network involves the process of actively updating specific parameters, this
formula will be applied to the model (see Formula 1).
Formula 1. Maximum likelihood parameter learning formula (Wang, Rish, & Ma,
2002). θ is a parameter from row x, column y in a conditional probability table τ. N
is the number of observations made for a particular instance of a variable, with k
possible values, in an observation table τ´.
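The formula itself is not reproduced in this text. The standard maximum-likelihood estimate that fits the description sets each parameter to its share of the observations in its row, that is, the count for column y divided by the sum of the counts over all k values in row x. The sketch below assumes that form. The function name and the uniform fallback for an empty row are assumptions added for the example.

```python
def learn_cpt_row(observation_row):
    """Turn one row of an observation table into one row of a probability table.

    observation_row: dict mapping each of the k values of a variable to the
                     number of times it was observed (counts may be fractional)
    """
    total = sum(observation_row.values())
    if total == 0:
        # Assumption: with no data yet, fall back to a uniform distribution.
        k = len(observation_row)
        return {value: 1.0 / k for value in observation_row}
    # Maximum-likelihood estimate: theta = N_xy / (sum of N over the row).
    return {value: count / total for value, count in observation_row.items()}


# Illustrative counts only: the opponent bluffed 3 times out of 4
# in one particular (A, C, H) situation.
row = learn_cpt_row({"True": 3, "False": 1})
```

Because the estimate only normalises the row, the same function also works when uncertain observations are entered as fractions rather than whole counts.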
Each variable in the Bayesian Net that requires active evidence updating will have a
corresponding observation table (see Figure 4), giving a total of 15 tables: one for B´,
one for A´ and thirteen for O´_i (one per rank).
Figure 4. Observation tables for the variables B´, A´ and O´i.
An observation value entered into a table representing the quantity of a rank in the
opponent's hand does not have to be exactly 1 if the agent is uncertain. Instead, the
agent can use Algorithm 2 and add the value of each P(q_j r_i) to the corresponding value
in the table. The total observations for the table will, however, still be a whole
number because \sum_j P(q_j r_i) = 1.
The reason why observations for the agent's hand are represented in only one table,
rather than in 13 different tables (one for each rank), is that the actual rank
of the cards is assumed to have no varying effect on the behaviour of the opponent. It is only the
quantity of cards in relation to the opponent's cards that needs to be statistically
observed.
Inference And Action
At node X5 the agent is able to use the same inference method to answer two
different questions:
1. Given that my opponent has tabled q_j r_i, what is the probability that he is
bluffing?
2. If I table q_j r_i, what is the probability that the opponent will call cheat?
Using inference by enumeration together with relevant variable selection, the agent is
able to calculate (see Formula 3) the probability of B = True given the value of H as
determined by Algorithm 3 and the value of C as either an observed value (question 1)
or a hypothesis (question 2).
Formula 3. Bayesian inference formula for P(B | H, C)
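The inference formula is not reproduced in this text. The sketch below shows inference by enumeration under the assumption that the network has the structure O → A → B with C and H as further parents of B, as suggested by the conditional probability tables P(A | O) and P(B | A, C, H). Under that assumption C and H are observed root nodes, so P(B = True | c, h) is obtained by summing out the hidden variables O and A. All numbers are invented and the quantity domain is cut down to {0, 1} to keep the example short.

```python
def prob_bluff(p_o, p_a_given_o, p_b_given_ach, c, h):
    """P(B = True | C = c, H = h), enumerating the hidden variables O and A.

    p_o:           dict o -> P(O = o)
    p_a_given_o:   dict o -> dict a -> P(A = a | O = o)
    p_b_given_ach: dict (a, c, h) -> P(B = True | A = a, C = c, H = h)
    """
    total = 0.0
    for o, p_of_o in p_o.items():
        for a, p_of_a in p_a_given_o[o].items():
            total += p_of_o * p_of_a * p_b_given_ach[(a, c, h)]
    return total


# Illustrative tables only (quantities restricted to 0 or 1 for brevity).
p_o = {0: 0.6, 1: 0.4}
p_a_given_o = {0: {0: 0.3, 1: 0.7}, 1: {0: 0.8, 1: 0.2}}
p_b_given_ach = {(0, 2, 1): 0.9, (1, 2, 1): 0.2}

# Observed: two cards were tabled (c = 2) and a pick-up would hurt (h = 1).
p_bluff = prob_bluff(p_o, p_a_given_o, p_b_given_ach, c=2, h=1)
```

For question 2 the same call can be made with c set to the quantity the agent is considering tabling rather than to an observed value.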
This method of inference can be used to calculate all Level 2 queries, such as whether or
not a known quantity of a card rank is still in the opponent's hand after several turns
(within a single round): P(o | b), in which o = q_j r_i and b = True.
The information derived from Bayesian inference, combined with the information
regarding hand strength from Algorithm 1, provides the agent with a solid foundation for
the condition-action rules. Algorithm 4 is a simple example of how the condition-action
rules could be formulated.
Algorithm 4. A simple example of condition action rules for a Cheat agent.
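The listing for Algorithm 4 is not reproduced in this text, so the following is a hypothetical sketch rather than the original rule set. It combines three inputs described in this report: the inferred probability of a bluff, the Hurt value from Algorithm 3 and the direction preference from Algorithm 1. The two thresholds, the function name and the action labels are invented for illustration.

```python
def respond_to_claim(p_bluff, pickup_hurts_me, claim_suits_my_sequence,
                     base_threshold=0.5, cautious_threshold=0.8):
    """Decide how the agent responds to the opponent's claim.

    p_bluff:                 inferred P(B = True | H, C) for the opponent's play
    pickup_hurts_me:         True if a wrong call (picking up the pile) would Hurt the agent
    claim_suits_my_sequence: True if the claimed rank favours the agent's preferred direction
    """
    # Rule 1: if the claim helps the agent's own sequence, do not risk a call.
    if claim_suits_my_sequence:
        return "accept"
    # Rule 2: demand more certainty when a wrong call would be costly.
    threshold = cautious_threshold if pickup_hurts_me else base_threshold
    # Rule 3: call cheat only when the bluff probability clears the threshold.
    return "call cheat" if p_bluff > threshold else "accept"


action = respond_to_claim(p_bluff=0.6, pickup_hurts_me=True, claim_suits_my_sequence=False)
```

Here a bluff probability of 0.6 is not enough to justify a call because a wrong call would be costly, so the sketch accepts the play.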
Discussion
The inference method used in this project was one of exact inference. The complexity
of exact inference is dependent upon the structure of the network. This network's
structure is known as a polytree, in which there is at most one undirected path between
any two nodes. The time and space complexity of exact inference in polytrees is linear
in the size of the network (Russell & Norvig, 2010). The size of this network can be
defined as the total number of parameters from all of the conditional probability tables. If
adapting this network to a 3+ player game involves a structural change such that the
network becomes multiply connected rather than singly connected (polytree), other
inference methods will be needed in order to avoid exponential time and space
complexity.
This project has shown that human multi-level reasoning as high as Level 2
can theoretically be modelled in an agent using Bayesian inference. As previous
pokerbot researchers have shown, a purely Bayesian-driven model will not succeed in
solving higher-level reasoning problems, such as reasoning about an opponent's strategy
and adapting to that strategy.
This is because Bayesian Nets are not meant for the
construction of adaptive algorithms and so model engineers must turn to hybrid
Bayesian networks such as a Dynamic Bayesian Net (Tretyakov & Kamm, 2009).
Mihok and Terry's pokerbot research theoretically incorporated Bayesian Nets into a
Hidden Markov Model (HMM) for the purpose of creating a multivariable inference
system (Mihok & Terry, 2005). An HMM can be considered a simple Dynamic
Bayesian Net (Wikipedia, 2011) and consists of a finite set of states, each of which is
associated with a probability distribution.
M&T argue that humans attempt high levels of reasoning through hypothesis testing,
and an important goal in their research was to demonstrate how an HMM is an applicable
model for this human process. Their approach starts by estimating the range of hands
an opponent in poker may have and, as more information becomes available, the
distribution of initial states is updated. The HMM keeps track of the transitions
between various hand states, and the embedded Bayesian Nets in the HMM are used to
make inferences about observations regarding the opponent's hand and strategy.
Future work for this project will continue to draw on M&T's research and
attempt to design a Cheat agent capable of reasoning on Level 3 by introducing a
Dynamic Bayesian Net to the model.
References
Korb, K. B., Nicholson, A. E., & Jitnah, N. (1999). Bayesian Poker. Proceedings of the
15th International Conference on Uncertainty in Artificial Intelligence (pp. 343-350).
Stockholm: UAI.
Mihok, B. E., & Terry, M. A. (2005). A Bayesian Net Inference Tool for Hidden State in
Texas Hold'em Poker. Cambridge: Massachusetts Institute of Technology.
Mihok, B., & Terry, M. (2005). Statistical Learning and Inference Methods for
Reasoning in Games. Cognitive Robotics 2005 Advance Lectures. Cambridge: MIT
Open Courseware.
Nicholson, A. E., Korb, K. B., & Boulton, D. Using Bayesian Decision Networks to
Play Texas Hold'Em Poker. Australia: Faculty of Information Technology, Monash
University.
Russell, S., & Norvig, P. (2010). Artificial Intelligence: A Modern Approach (3rd ed.).
New Jersey: Pearson.
Tretyakov, K., & Kamm, L. (2009). Modeling Texas Hold'em Poker Strategies with
Bayesian Networks. Tartu: University of Tartu.
Wang, H., Rish, I., & Ma, S. (2002). Using Sensitivity Analysis for Selective Parameter
Update in Bayesian Network Learning. Menlo Park, CA: AAAI Press.
Wikipedia. (2011). Dynamic Bayesian Network. Retrieved September 19, 2011, from
http://en.wikipedia.org/wiki/Dynamic_Bayesian_network