
Bounded Rationality and Strategy Learning using Neural Networks.
Simnan Abbas
This paper examines whether neural networks can be used to simulate the endogenous
emergence of boundedly rational behavior in normal form games. There is an algorithm
that finds the Nash equilibrium of a normal form game, but it is too complex for an
evolutionary, back-propagation learning algorithm to teach itself. Instead the neural
network settles for learning an approximation to the Nash equilibrium in a subset of
games. The experiments run for this paper show that the Neural Network (NN) is able to
correctly classify the Nash Equilibrium (NE) strategy in approximately 70% of the games
it was tested on, much like the behavioral heuristics acquired by a boundedly rational
agent. [1]

[1] Zizzo et al., "Bounded Rational Behavior by Neural Networks in Normal Form Games."
Suppose a child were to observe others playing a wide variety of games during his or her
childhood. Would he/she learn to play Nash strategies? And if so, at what rate would
he/she do that? The experiments carried out below may suggest that such a person will
endogenously learn the rules of thumb that allow him/her to perform when playing new
games. [2]

[2] Zizzo et al., "Bounded Rational Behavior by Neural Networks in Normal Form Games."
Nash Equilibrium
The Nash equilibrium is an important idea because it is often thought to be the minimum
requirement of rational play in games. A Nash equilibrium is a strategy set, played by all
players, from which no player has an incentive to unilaterally deviate. The implicit
assumption of a Nash equilibrium is that both players are rational and act in their own
best interests, that it is common knowledge that the other player is rational, that the
other player knows that the first player knows this, and so on. In a 3×3 game of complete
information, a Nash equilibrium is a strategy profile such that, given the expectation of
player B's strategy and the use of the equilibrium strategy profile, there is no incentive
for player A to unilaterally deviate from his/her strategy. The Nash equilibrium strategy
profile thus defines each player's best response to what he/she perceives to be the other
player's best response.
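To make the definition concrete, here is a minimal sketch, in the same C style as the game generator in the appendix, of a check for a pure-strategy Nash equilibrium in a 3×3 game: a cell is an equilibrium if neither player can gain by unilaterally deviating. The function name and layout are illustrative and are not part of the project code.

#include <stdio.h>

#define NUM 3

/* payoff[row][col][0] is the row player's payoff, payoff[row][col][1] the column player's. */
int isNashEquilibrium(int payoff[NUM][NUM][2], int r, int c) {
    int k;
    /* Row player: no other row gives a strictly higher payoff against column c. */
    for (k = 0; k < NUM; k++)
        if (payoff[k][c][0] > payoff[r][c][0]) return 0;
    /* Column player: no other column gives a strictly higher payoff against row r. */
    for (k = 0; k < NUM; k++)
        if (payoff[r][k][1] > payoff[r][c][1]) return 0;
    return 1;
}

int main(void) {
    int g[NUM][NUM][2] = {
        {{9, 0}, {2, 7}, {6, 2}},
        {{8, 9}, {1, 7}, {5, 3}},
        {{0, 0}, {7, 5}, {9, 0}}
    };
    /* The cell in the third row and second column, payoffs (7, 5), is a mutual best response. */
    printf("%d\n", isNashEquilibrium(g, 2, 1));   /* prints 1 */
    return 0;
}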
In this experiment, we train a NN on randomly generated 3×3 games represented in normal
form and examine whether it is possible to teach the NN to identify NEs in games it has
never seen before. This is much like asking: if one is good at chess, how good can he/she
be at checkers, having never played it before?
Definitions
As mentioned earlier, the NN is trained on a sample of random 3×3 games. Each game is a
simultaneous-move game with static, discrete payoffs between two players who have
complete information about each other's payoffs. Each player has three eligible actions,
and the payoffs are integers drawn at random between 0 and 9 and made known to the
players before the game. Payoffs are common knowledge, so the game is one of complete
information.
The games that the NN is trained on are restricted to those that have exactly one pure
strategy Nash equilibrium. This eliminates games where there is only a mixed strategy
Nash equilibrium and games in which there are multiple equilibria.
The Model
The first step was to produce a training data set for the NN. The training data was
created using a C program that outputs files containing training data. The program
randomly generates 3×3 games for two players with payoffs drawn from 0 to 9. It also
checks that each game produced has a unique Nash equilibrium, and discards any game that
has multiple equilibria. The output is a row of 18 numbers representing the payoffs of
both players in the 3×3 game, followed by the Nash equilibrium strategy. To clarify, one
game of training data looks as follows:
Col:   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18   Out1 Out2
       9  0  2  7  6  2  8  9  1  7  5  3  0  0  7  5  9  0     1    0
This would correspond to the game:
9, 0 2, 7 6, 2
8, 9 1, 7 5, 3
0, 0 7, 5 9, 0
Equilibrium cell (payoffs 7, 5); the row player's equilibrium strategy is the third row, encoded as 1 0.
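As an illustration of how such a line of training data could be produced, the sketch below flattens a game row by row into the 18 payoff columns and appends the two-bit code of the row player's equilibrium strategy. The helper name writeTrainingRow and the exact whitespace are assumptions; only the ordering of the payoffs and the (0,0)/(0,1)/(1,0) encoding follow the description above.

#include <stdio.h>

#define NUM 3

/* Write one training example: 18 payoffs in row-major order (row player's payoff
   first in each cell), followed by the encoding of the row player's NE strategy. */
void writeTrainingRow(FILE *out, int payoff[NUM][NUM][2], int nashRow) {
    int i, j;
    for (i = 0; i < NUM; i++)
        for (j = 0; j < NUM; j++)
            fprintf(out, "%d %d ", payoff[i][j][0], payoff[i][j][1]);

    if (nashRow == 0) fprintf(out, "0 0\n");   /* first strategy  */
    if (nashRow == 1) fprintf(out, "0 1\n");   /* second strategy */
    if (nashRow == 2) fprintf(out, "1 0\n");   /* third strategy  */
}

int main(void) {
    int g[NUM][NUM][2] = {
        {{9, 0}, {2, 7}, {6, 2}},
        {{8, 9}, {1, 7}, {5, 3}},
        {{0, 0}, {7, 5}, {9, 0}}
    };
    writeTrainingRow(stdout, g, 2);   /* prints the sample row shown above */
    return 0;
}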
The first 18 columns, which represent the payoffs generated by the random game generator,
are fed into the network as input. In the output layer, (0,0) represents that the first
strategy was chosen, (0,1) that the second strategy was chosen, and (1,0) that the third
strategy was chosen. The NN used is a back-propagation network with one hidden layer,
TanH activation functions, and a momentum term on the learning rule. [3]
Input layer: 18 nodes
Hidden layer: 10 nodes, TanH activation function, momentum
Output layer: 2 nodes; (0,0) means play the 1st strategy, (0,1) the 2nd strategy, (1,0) the 3rd strategy
Back-propagation component
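A minimal sketch of a forward pass through this 18-10-2 architecture, assuming fully connected layers with a bias and a TanH activation at each node; the weight arrays are placeholders that training would fill in, and the names are illustrative rather than taken from NeuroSolutions.

#include <math.h>

#define N_IN  18
#define N_HID 10
#define N_OUT 2

/* Weights and biases; in the real network these are set by back-propagation. */
double w1[N_HID][N_IN], b1[N_HID];
double w2[N_OUT][N_HID], b2[N_OUT];

/* One forward pass: 18 payoff inputs -> 10 hidden tanh nodes -> 2 output nodes. */
void forward(const double in[N_IN], double out[N_OUT]) {
    double hid[N_HID];
    int i, j;
    for (j = 0; j < N_HID; j++) {
        double sum = b1[j];
        for (i = 0; i < N_IN; i++) sum += w1[j][i] * in[i];
        hid[j] = tanh(sum);                  /* squashes activity into (-1, 1) */
    }
    for (j = 0; j < N_OUT; j++) {
        double sum = b2[j];
        for (i = 0; i < N_HID; i++) sum += w2[j][i] * hid[i];
        out[j] = tanh(sum);
    }
}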
Activation Function
At each node of the neural network, an activation function determines how the summed
input to that node is mapped to an output. I experimented with a few activation
functions, and ended up using the TanH Axon because it gave the best results. The
TanhAxon applies a bias and a tanh function to each neuron in the layer, squashing the
range of each neuron's output to between -1 and 1. Such nonlinear elements give the
network the ability to make soft decisions. [4] The function is

f(xi) = tanh(Bxi), where Bxi is the scaled and offset activity of neuron i.

[3] To create the network I used NeuroSolutions, software that allows users to simulate neural networks.
[4] NeuroSolutions Help File.
Momentum
Step components try to improve the network's performance by taking steps in the direction
estimated by the back-propagation algorithm. Network learning can be very slow if the
step size is small, and can diverge if it is chosen too large. To further complicate
matters, a step size that works well at one location may be unstable at another. Momentum
gives the step change some inertia, so that it tends to keep moving along a direction
that is, on average, going downhill. In the NN I used, the momentum was set to a rate of
0.7, which seemed to work well. [5]
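For a single weight, the standard gradient-descent-with-momentum update this describes looks roughly as follows; whether NeuroSolutions applies exactly this form is an assumption, but the 0.7 momentum rate is the value used in this project.

/* New step = gradient-descent step plus a fraction (momentum) of the previous step. */
double updateWeight(double w, double gradient, double *prevDelta,
                    double learningRate, double momentum) {
    double delta = -learningRate * gradient + momentum * (*prevDelta);
    *prevDelta = delta;
    return w + delta;
}

/* Example call with the momentum rate used here (the learning rate is illustrative):
   w = updateWeight(w, dE_dw, &prevDelta, 0.1, 0.7);                                 */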
The Training Process
The training sample was a set of 4700 random games. The only restriction on the games was
that they had to contain a unique Nash equilibrium. The 18 input payoff values were fed
to the input layer, passed on to the hidden layer, transformed by the activation
function, and then sent to the two-node output layer.

The back-propagation process works as follows. First the weights on the synapses are set
to random values. The network then propagates forward and produces an output with the
random weights in place. The output is compared to the desired output in the training
data, and an error function is used to calculate the degree of error at each output node.
This error function is the mean squared error: the average of the squared differences
between the output and the correct answer over the full set of games. The aim is to
minimize this error function over the span of the training process.
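In code, this error measure amounts to averaging the squared difference between the network's two outputs and the desired 0/1 targets over the training games, roughly as below; whether the software averages over both output nodes or reports each node separately (as in the result tables later) is an assumption.

#include <stddef.h>

/* Mean squared error over all games and both output nodes. */
double meanSquaredError(const double actual[][2], const double desired[][2],
                        size_t nGames) {
    double sum = 0.0;
    size_t g;
    int k;
    for (g = 0; g < nGames; g++)
        for (k = 0; k < 2; k++) {
            double diff = actual[g][k] - desired[g][k];
            sum += diff * diff;
        }
    return sum / (double)(nGames * 2);
}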
[5] NeuroSolutions Help Documentation.
After each forward propagation, the error value is used to estimate how much to change
the weights of the synapses leading back from each node, so as to produce a better
estimate of the output on the next trial. The following graph shows the desired values of
the two output columns (Out 1 and Out 2, the output nodes) compared with the actual
values produced by the neural network during the training process. The Y axis gives the
value of the actual or desired output, while the X axis indexes a particular game in the
training process. The figure shows the neural network trying to approximate the desired
answer from the training data. After 1000 epochs of training the mean squared error
converges, as shown below, and the training period ends.

[Figure: The Training Process — desired versus actual values of the output nodes.]
Best network (training): minimum MSE 0.129586175 at epoch 1000.

[Figure: MSE versus epoch — training MSE over 1000 epochs.]
The Testing Process
The trained network is now tested on 800 randomly generated 3×3 games that it has never
seen before. The question is whether it has learned to generalize, predicting Nash
equilibria better than it would by chance. First we test the neural network using random
weights, to indicate how well the network would have done had it not been trained at all.
The following table shows the results.
Testing With Random Weights

Performance                                    Output 1       Output 2
Mean Squared Error                             0.477395166    0.484265921
Normalized Mean Squared Error                  1.932908989    2.281886051
Mean Absolute Error                            0.525424277    0.522238623
Correct Classifications of Nash Equilibrium    22.2%
Testing Data With Training

Performance                                    Output 1       Output 2
Mean Squared Error                             0.0773375      0.0834955
Normalized Mean Squared Error                  0.313129163    0.393435111
Mean Absolute Error                            0.172878057    0.177709387
Min Absolute Error                             0.000276698    5.1856E-05
Max Absolute Error                             1.033802621    1.013413787
Linear Correlation Coefficient                 0.831162695    0.780079395
Percent Accuracy (accuracy defined as
  [correct answer - NN answer])                77.07910919    92.16590118
Correct Classifications of Nash Equilibrium    70.1%
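For reference, the "correct classifications" row could be computed by rounding each of the two output nodes and comparing the result with the target code; the 0.5 threshold below is an illustrative assumption, not something reported by the software.

/* Count a prediction as correct when both rounded output nodes match the
   target (0,0), (0,1) or (1,0) code.  The 0.5 threshold is illustrative. */
int countCorrect(const double actual[][2], const double desired[][2], int nGames) {
    int g, correct = 0;
    for (g = 0; g < nGames; g++) {
        int a0 = actual[g][0] >= 0.5;
        int a1 = actual[g][1] >= 0.5;
        if (a0 == (int)desired[g][0] && a1 == (int)desired[g][1]) correct++;
    }
    return correct;
}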
The tables demonstrate that the neural network produces significantly better results than
it would by chance, but its accuracy is still well below 100%. The findings imply that
any finite normal form game could be modeled in this way, although we have looked mostly
at 3×3 games, which are the smallest class of games that can be subjected to iterated
elimination of dominated strategies. Perhaps a neural network player that has seen a
large enough sample of games with the NE pointed out to it could learn the Nash
algorithm, although this is somewhat unlikely given the complexity of that algorithm. The
network is more likely to find some simpler but less accurate way of solving the problem
that gives it a fair degree of accuracy in judging the NE of a never-before-seen game.
This parallels the behavioral heuristics that are learned by agents with bounded
rationality. The 70% accuracy rate attained by the neural network is close to the 59.6%
experimental figure obtained by Stahl and Wilson (1994) [6] in their study of humans
trying to play 3×3 symmetric games they had never seen before.
[6] Stahl and Wilson, "Experimental evidence on players' models of other players," Journal of Economic Behavior and Organization.
Alternatives to Nash
At this point it is interesting to ask: if the NN is not perfectly simulating a NE
strategy in every game, is there some other, simpler algorithm it has learned? Research
into this suggests that the NN is following payoff dominance. Payoff dominance is a form
of "low-level rationality" whereby the player picks the highest conceivable payoff for
him/herself and plays the corresponding row, hoping that the other player will pick the
corresponding column. The highest payoff here is the expected payoff obtained by assuming
that the other player simply randomizes between his/her strategies (playing a mixed
strategy with probability 1/3 on each action). [7] What is very interesting about payoff
dominance is that human laboratory subjects, when asked to play similar games that they
had never seen before, tended to resort to the same tactic of payoff dominance to pick
their strategies. [8]
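The payoff-dominance rule described above is simple enough to state in a few lines: assume the opponent mixes uniformly over the three columns and play the row with the highest expected payoff. The sketch below uses the same payoff-matrix layout as the generator code in the appendix; the function name is illustrative.

#define NUM 3

/* Payoff dominance: assume the column player randomizes with probability 1/3
   on each column and choose the row with the highest expected payoff. */
int payoffDominantRow(int payoff[NUM][NUM][2]) {
    int i, j, best = 0;
    double bestValue = -1.0;
    for (i = 0; i < NUM; i++) {
        double expected = 0.0;
        for (j = 0; j < NUM; j++)
            expected += payoff[i][j][0] / 3.0;   /* row player's payoff */
        if (expected > bestValue) { bestValue = expected; best = i; }
    }
    return best;
}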
Conclusion
The lesson from this may be that a checkers master can also be a decent chess player:
there will be times when he/she is able to pick out the best response to others'
strategies, but there will also be times when the checkers master makes many mistakes and
loses at chess.

The NN built in this project has shown that a NN can be trained to generalize NE play in
normal form games into some simpler rules of thumb and apply them to games never seen
before. The network shows some strategic awareness, but this awareness is bounded.
[7] Zizzo et al., "Bounded Rational Behavior by Neural Networks in Normal Form Games."
[8] Bounded Rationality.
Empirical research by Zizzo and Sgroi shows that the NN tends to go for high-payoff
actions, much like humans in similar situations. The project therefore raises some
interesting questions about the nature of bounded rationality. [9] Psychologists have
shown that humans use simple heuristics to simplify computationally demanding problems in
their lives. The most interesting aspect of this project has therefore been how closely
the neural network behaves like human laboratory subjects.
Note
For this project, I wrote a program that randomly generates games, created a neural
network using software, and performed analysis on the training and testing process.
[9] Zizzo et al., "Bounded Rational Behavior by Neural Networks in Normal Form Games."
Code Written for Project
Random Game Generator
#include <stdio.h>
#include <stdlib.h>

#define NUM 3

int matrix[NUM][NUM][2];      /* payoffs: [row][column][player], player 0 = row player */
int matrixNash[NUM][NUM][2];  /* best-response markers: 100 marks a best response      */

/* Print the payoff matrix as "row payoff, column payoff" cells, one row per line. */
void printmatrix() {
    int i, j;
    for (i = 0; i < NUM; i++) {
        for (j = 0; j < NUM; j++) {
            printf("%2d,", matrix[i][j][0]);
            printf("%2d ", matrix[i][j][1]);
        }
        printf("\n");
    }
}

/* Mark each player's best response to every strategy of the other player.
   A cell marked for both players is a pure-strategy Nash equilibrium. */
void ComputeNash() {
    int i, j, max = 0, maxindex = 0;

    for (i = 0; i < NUM; i++) {          /* column player's best reply to each row */
        for (j = 0; j < NUM; j++) {
            if (max <= matrix[i][j][1]) { max = matrix[i][j][1]; maxindex = j; }
        }
        max = 0;
        matrixNash[i][maxindex][1] = 100;
    }

    for (j = 0; j < NUM; j++) {          /* row player's best reply to each column */
        for (i = 0; i < NUM; i++) {
            if (max <= matrix[i][j][0]) { max = matrix[i][j][0]; maxindex = i; }
        }
        max = 0;
        matrixNash[maxindex][j][0] = 100;
    }
}

/* Return 1 unless more than one cell is a mutual best response,
   so that games with multiple pure-strategy equilibria are discarded. */
int checkunique() {
    int i, j, count = 0;
    for (j = 0; j < NUM; j++) {
        for (i = 0; i < NUM; i++) {
            if ((matrixNash[i][j][0] == 100) && (matrixNash[i][j][1] == 100))
                count++;
        }
    }
    if (count > 1) return 0;
    else return 1;
}

/* Print the game, the equilibrium payoffs, and the encoded equilibrium
   strategy of the row player: (0,0), (0,1) or (1,0). */
void printnash() {
    int i, j;
    for (j = 0; j < NUM; j++) {
        for (i = 0; i < NUM; i++) {
            if ((matrixNash[i][j][0] == 100) && (matrixNash[i][j][1] == 100)) {
                printmatrix();
                printf("strategy - %d %d\n", matrix[i][j][0], matrix[i][j][1]);
                if (i == 0) printf("0 0\n");
                if (i == 1) printf("0 1\n");
                if (i == 2) printf("1 0\n");
            }
        }
    }
}

int main() {
    int i, j, p;

    for (p = 0; p < 8000; p++) {
        /* Draw a random 3x3 game with integer payoffs in 0..9
           and reset the best-response markers. */
        for (i = 0; i < NUM; i++) {
            for (j = 0; j < NUM; j++) {
                matrix[i][j][0] = rand() % 10;
                matrix[i][j][1] = rand() % 10;
                matrixNash[i][j][0] = 0;
                matrixNash[i][j][1] = 0;
            }
        }

        ComputeNash();
        if (checkunique() == 1)
            printnash();
    }
    return 0;
}