Bounded Rationality and Strategy Learning Using Neural Networks

Simnan Abbas

This paper examines whether neural networks can simulate the endogenous emergence of boundedly rational behavior in normal form games. There is an algorithm for finding the Nash equilibrium of a normal form game, but it is too complex for an evolutionary back-propagation learner to teach itself. Instead, the neural network settles for learning an approximation to the Nash equilibrium on a subset of games. The experiments run for this paper show that the neural network (NN) correctly classifies the Nash equilibrium (NE) strategy in approximately 70% of the games it was tested on, much like the behavioral heuristics acquired by a boundedly rational agent.

Suppose a child were to observe others playing a wide variety of games during childhood. Would he or she learn to play Nash strategies? And if so, at what rate? The experiments carried out here suggest that such a person would endogenously learn rules of thumb that allow him or her to perform reasonably well in new games.[1]

Nash Equilibrium

The Nash equilibrium is an important idea because it is often thought to be the minimum requirement of rational play in games. A Nash equilibrium is a strategy profile, one strategy per player, that gives no player an incentive to unilaterally deviate from his or her own strategy. The implicit assumptions are that both players are rational and act in their own best interests, that it is common knowledge that the other player is rational, that each player knows that the other knows this, and so on. In a 3×3 game of complete information, a Nash equilibrium is a strategy profile such that, given the expectation that player B plays his or her part of the equilibrium profile, player A has no incentive to unilaterally deviate, and vice versa. Thus the Nash equilibrium strategy profile defines each player's best response to what he or she perceives to be the other player's best response.[1]

In this experiment, we train a NN on randomly generated 3×3 games represented in normal form and examine whether the NN can be taught to identify NEs in games it has never seen before. This is much like asking: if one is good at chess, how good can one be at checkers, having never played it before?

Definitions

As mentioned above, the NN is trained on a sample of random 3×3 games. Each game is a simultaneous-move game between two players, with static, discrete payoffs and complete information about each other's payoffs. Each player has three available actions; the payoff values are drawn at random from the integers 0 to 9 and are made known to the players before the game. Information is complete, as the payoffs are common knowledge. The games the NN is trained on are restricted to those with a single unique pure-strategy Nash equilibrium. This eliminates games where there is only a mixed-strategy Nash equilibrium and games in which there are multiple equilibria.
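To fix notation for what follows, write a_ij and b_ij for the row and column player's payoffs when row i meets column j. A pure-strategy profile (i*, j*) is then a Nash equilibrium exactly when neither player can gain by a unilateral deviation:

\[
a_{i^*j^*} \ge a_{ij^*} \;\; \text{for all } i \in \{1,2,3\},
\qquad
b_{i^*j^*} \ge b_{i^*j} \;\; \text{for all } j \in \{1,2,3\}.
\]

In words, i* is the row player's best response to column j*, and j* is the column player's best response to row i*. A game enters the training set only if exactly one cell satisfies both conditions.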
The Model

The first step was to produce a training data set for the NN. The training data was created using a C program (reproduced at the end of this paper) that outputs files of training examples. The program randomly generates 3×3 games for two players, with payoffs supported on 0 to 9. It also checks that each game produced has a unique pure-strategy Nash equilibrium, discarding any game with multiple equilibria. The output is 18 numbers representing the payoffs of the two players in the 3×3 game, followed by the Nash equilibrium strategy. One game of training data looks as follows:

  Col:   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 | Out1 Out2
         9  0  2  7  6  2  8  9  1  7  5  3  0  0  7  5  9  0 |  1    0

This corresponds to the game

  9,0  2,7  6,2
  8,9  1,7  5,3
  0,0  7,5  9,0

whose unique equilibrium is the cell with payoffs 7, 5 (row 3, column 2). The first 18 columns, the payoffs generated by the random game generator, are fed into the network as input. In the output layer, (0,0) represents that the first strategy was chosen, (0,1) the second, and (1,0) the third; the output pair (1, 0) above therefore encodes the row player's third strategy.

The NN used is a back-propagation network with one hidden layer, TanH activation functions, and a momentum term in the learning rule. It was built in NeuroSolutions, software that allows users to simulate neural networks.[2]

  Input layer:   18 nodes
  Hidden layer:  10 nodes, TanH activation, momentum
  Output layer:  2 nodes; (0,0) plays the 1st strategy, (0,1) the 2nd, (1,0) the 3rd
  Learning:      back-propagation component

Activation Function

At each node of the neural network, an activation function determines how the summed input to that node is mapped to the node's output. I experimented with a few activation functions and ended up using the TanH axon, because it gave the best results. The TanhAxon applies a bias and a tanh function to each neuron in the layer, squashing the range of each neuron's output to between -1 and 1. Such nonlinear elements provide the network with the ability to make soft decisions.[2] The function is

  f(x_i) = tanh(beta * x_i),

where beta * x_i is the scaled and offset activity of neuron i.

Momentum

Step components try to improve the network's performance by taking steps in the direction estimated by the back-propagation algorithm. Learning can be very slow if the step size is small and can diverge if it is chosen too large; to complicate matters further, a step size that works well at one location may be unstable at another. Momentum provides the step change with some inertia, so that the weights tend to keep moving along a direction that is, on average, going downhill. In the NN used here the momentum was set to a rate of 0.7, which seemed to work well.[2]
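NeuroSolutions performs these computations internally; purely as an illustration, the following minimal C sketch shows the two ingredients just described, a TanH activation and a weight step with momentum. The interface (eta, grad, delta_prev) is hypothetical and not taken from the software.

#include <math.h>

/* TanH activation: squashes the scaled activity beta*x into (-1, 1). */
double tanh_axon(double x, double beta) {
    return tanh(beta * x);
}

/* One weight update with momentum (0.7 in the network used here).
   grad is the error gradient dE/dw estimated by back-propagation;
   *delta_prev holds the previous step taken for this weight. */
double update_weight(double w, double grad, double eta,
                     double momentum, double *delta_prev) {
    double delta = -eta * grad + momentum * (*delta_prev);
    *delta_prev = delta;  /* the stored step gives the next one its inertia */
    return w + delta;
}

Each step carries over 0.7 of the previous step, which smooths the descent direction without changing where it points on average.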
The Training Process

The training sample was a set of 4700 random games, the only restriction being that each had to contain a unique Nash equilibrium. The 18 input payoff values were sent to the input layer, passed on to the hidden layer and transformed by the activation function, and then sent to the two-node output layer.

The back-propagation process works as follows. First the weights on the synapses are set to random values. The network then propagates forward and produces an output with the random weights in place. The output is compared to the desired output in the training data, and an error function is used to calculate the degree of error at each output node. This error function is the mean squared error (MSE), the average squared difference between the output and the correct answer over the full set of games; the aim is to minimize it over the span of the training process. After each forward propagation, the error value is used to estimate how much to change the weights of the synapses leading back from each node, so as to produce a better estimate of the output on the next trial.

[Figure: desired versus actual values of Out1 and Out2 (the two output nodes) during training. The y-axis is the value of the actual or desired output; the x-axis is the particular game in the training process. The network's outputs gradually approximate the desired answers in the training data.]

After 1000 epochs of training the mean squared error converges and the training period ends.

  Best network    Epoch #    Minimum MSE
  Training        1000       0.129586175

[Figure: training MSE versus epoch, epochs 1 to 1000. The MSE falls from about 0.5 early in training and flattens out near 0.13.]

The Testing Process

The network is now tested on 800 randomly generated 3×3 games that it has never seen before. The question is whether it has learned to predict Nash equilibria better than it would by chance. First we test the network using random weights, to indicate how well it would have done had it not been trained at all:

  Testing with random weights            Output1        Output2
  Mean squared error                     0.477395166    0.484265921
  Normalized mean squared error          1.932908989    2.281886051
  Mean absolute error                    0.525424277    0.522238623
  Correct classifications of Nash equilibrium:  22.2%

  Testing after training                 Output1        Output2
  Mean squared error                     0.0773375      0.0834955
  Normalized mean squared error          0.313129163    0.393435111
  Mean absolute error                    0.172878057    0.177709387
  Minimum absolute error                 0.000276698    5.1856e-05
  Maximum absolute error                 1.033802621    1.013413787
  Linear correlation coefficient         0.831162695    0.780079395
  Percent accuracy                       77.07910919    92.16590118
  (accuracy defined in terms of [correct answer - NN answer])
  Correct classifications of Nash equilibrium:  70.1%

The tables demonstrate that the neural network produces significantly better results than it would by chance, but its accuracy remains well below 100%. The findings suggest that any finite normal form game could be modeled this way, although we have looked only at 3×3 games, the smallest class of games that can be subjected to iterated elimination of dominated strategies. Perhaps a neural-network player that has seen a large enough sample of games with the NE pointed out to it could learn the Nash algorithm, although this is somewhat unlikely, given the complexity of that algorithm. The network is more likely to find some simpler but less accurate way of solving the problem that still gives it a fair degree of accuracy in judging the NE of a never-before-seen game. This parallels the behavioral heuristics learned by agents with bounded rationality. The 70% accuracy rate attained by the neural network is close to the 59.6% experimental figure obtained by Stahl and Wilson (1994)[3] in their study of humans trying to play 3×3 symmetric games they had never seen before.
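The "correct classifications" rows above require mapping the network's two real-valued outputs back onto one of the three strategies. The paper's scoring procedure is not shown; one natural reading, offered here as an assumption rather than the project's actual code, is to round each output to the nearest bit and match the pair against the three codes:

/* Decode an output pair into a strategy index 1..3, assuming the
   (0,0)/(0,1)/(1,0) coding described above. Each output is rounded
   to the nearest bit; the unused pattern (1,1) returns -1. */
int decode_strategy(double out1, double out2) {
    int b1 = (out1 >= 0.5);
    int b2 = (out2 >= 0.5);
    if (!b1 && !b2) return 1;  /* (0,0): first strategy  */
    if (!b1 &&  b2) return 2;  /* (0,1): second strategy */
    if ( b1 && !b2) return 3;  /* (1,0): third strategy  */
    return -1;                 /* (1,1): no valid code   */
}

Under this reading, a test game is classified correctly when the decoded strategy matches the equilibrium strategy recorded in the data file.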
Alternatives to Nash

At this point it is interesting to ask: if the NN is not perfectly reproducing the NE strategy in every game, is there some other, simpler algorithm it has learned? The evidence suggests that the NN is following payoff dominance. Payoff dominance is a form of "low-level rationality" in which the player picks the highest conceivable payoff for him- or herself and plays the corresponding row, hoping that the other player will pick the corresponding column. Here the highest payoff means the highest expected payoff obtainable on the assumption that the other player simply randomizes between his or her strategies, playing a mixed strategy with probability 1/3 on each action (a sketch of this heuristic is given just before the code listing below).[1] What is very interesting about payoff dominance is that human laboratory subjects, when asked to play similar games they had never seen before, tended to resort to the same tactic to pick their strategies.[1]

Bounded Rationality

Empirical research by Zizzo and Sgroi shows that the NN tends to go for high-payoff actions, much like humans in similar situations.[1] The project therefore raises some interesting questions about the nature of bounded rationality. Psychologists have shown that humans use simple heuristics to simplify computationally demanding problems in their lives. The most interesting aspect of this project has therefore been how closely the neural network behaves like human laboratory subjects.

Conclusion

The lesson from this may be that a checkers master will also be a fairly good chess player: there will be times when he or she can pick out the best response to others' strategies, but there will also be times when the checkers master makes many mistakes and loses at chess. The NN built in this project has shown that a network can be trained to generalize the NE of normal form games into simpler rules of thumb and to apply them to games never seen before. The network shows some strategic awareness, but this awareness is bounded.

Note

For this project, I wrote a program that generates random games, built a neural network using NeuroSolutions, and analyzed the training and testing process.

Notes
[1] Zizzo et al., "Bounded Rational Behavior by Neural Networks in Normal Form Games".
[2] NeuroSolutions help documentation.
[3] Stahl and Wilson (1994), "Experimental evidence on players' models of other players", Journal of Economic Behavior and Organization.
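As promised in "Alternatives to Nash", here is a minimal C sketch of the payoff-dominance heuristic, written for this exposition and not part of the project code: the row player scores each row by its expected payoff against an opponent assumed to randomize uniformly, 1/3 on each column, and plays the best-scoring row.

/* Payoff-dominance choice for the row player of a 3x3 game.
   payoff[i][j] is the row player's payoff in cell (i, j); the
   opponent is assumed to play each column with probability 1/3. */
int payoff_dominant_row(int payoff[3][3]) {
    int best = 0;
    double bestval = -1.0;
    for (int i = 0; i < 3; i++) {
        double expected = (payoff[i][0] + payoff[i][1] + payoff[i][2]) / 3.0;
        if (expected > bestval) {  /* keep the row with the highest expectation */
            bestval = expected;
            best = i;
        }
    }
    return best;  /* 0-based index of the chosen row */
}

Against a uniform opponent the expected payoff ranks rows exactly as the row sum does, so the heuristic amounts to picking the row with the largest total payoff.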
Code Written for Project

Random Game Generator

#include <stdio.h>
#include <stdlib.h>

#define NUM 3

/* matrix[i][j][p]: payoff to player p (0 = row, 1 = column) in cell (i, j) */
int matrix[NUM][NUM][2];
/* matrixNash[i][j][p] is set to 100 where cell (i, j) is a best response for player p */
int matrixNash[NUM][NUM][2];

void printmatrix() {
    int i, j;
    for (i = 0; i < NUM; i++) {
        for (j = 0; j < NUM; j++) {
            printf("%2d,", matrix[i][j][0]);
            printf("%2d ", matrix[i][j][1]);
        }
        printf("\n");
    }
}

/* Mark best responses: for each row, the column player's best cell;
   for each column, the row player's best cell. A cell marked for both
   players is a pure-strategy Nash equilibrium. The <= comparison
   resolves payoff ties in favor of the last maximizer, so only one
   best response is marked per row or column. */
void ComputeNash() {
    int i, j, max = 0, maxindex = 0;

    for (i = 0; i < NUM; i++) {
        for (j = 0; j < NUM; j++) {
            if (max <= matrix[i][j][1]) { max = matrix[i][j][1]; maxindex = j; }
        }
        max = 0;
        matrixNash[i][maxindex][1] = 100;
    }

    for (j = 0; j < NUM; j++) {
        for (i = 0; i < NUM; i++) {
            if (max <= matrix[i][j][0]) { max = matrix[i][j][0]; maxindex = i; }
        }
        max = 0;
        matrixNash[maxindex][j][0] = 100;
    }
}

/* Print the game, the equilibrium cell's payoffs, and the two-bit
   encoding of the row player's equilibrium strategy. */
void printnash() {
    int i, j;
    for (j = 0; j < NUM; j++) {
        for (i = 0; i < NUM; i++) {
            if ((matrixNash[i][j][0] == 100) && (matrixNash[i][j][1] == 100)) {
                printmatrix();
                printf("strategy - %d %d\n", matrix[i][j][0], matrix[i][j][1]);
                if (i == 0) printf("0 0\n");
                if (i == 1) printf("0 1\n");
                if (i == 2) printf("1 0\n");
            }
        }
    }
}

/* Return 1 if at most one cell is doubly marked (unique equilibrium
   candidate), 0 if there are several. */
int checkunique() {
    int i, j, count = 0;
    for (j = 0; j < NUM; j++) {
        for (i = 0; i < NUM; i++) {
            if ((matrixNash[i][j][0] == 100) && (matrixNash[i][j][1] == 100))
                count++;
        }
    }
    return (count > 1) ? 0 : 1;
}

int main() {
    int i, j, p;

    for (p = 0; p < 8000; p++) {
        /* Draw random payoffs in 0..9 and clear the best-response markers. */
        for (i = 0; i < NUM; i++) {
            for (j = 0; j < NUM; j++) {
                matrix[i][j][0] = rand() % 10;
                matrix[i][j][1] = rand() % 10;
                matrixNash[i][j][0] = 0;
                matrixNash[i][j][1] = 0;
            }
        }
        ComputeNash();
        /* Keep only games with a unique pure-strategy equilibrium;
           games whose markers never coincide print nothing. */
        if (checkunique() == 1)
            printnash();
    }
    return 0;
}
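A brief usage note: the listing compiles as a standalone C program, e.g. gcc -o gamegen gamegen.c (the file name is arbitrary), and its standard output can be redirected to a file when building the training set. Since rand() is never seeded with srand(), every run reproduces the same sequence of games, which makes the generated files reproducible.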