Creating Optimal Multi-Layer Perceptron Networks to Play Go with a Genetic Algorithm

by Nathan Erickson
n8erickson@yahoo.com
12/18/03
ECE539 Final Project for Prof. Yu Hen Hu
Univ. of Wisconsin – Madison

Project Overview

The ancient Chinese game of Go has long been a difficult problem for computer programmers. Its complex and rapidly changing game play makes it very difficult for computers to get a handle on how to play well. According to the book “The Way to Go”:

“The situations that arise from the simple objectives of go are complex enough to have thwarted all attempts to program a competitive go-playing computer. Informed opinion doubts that a computer will soon, if ever, challenge the ability of a go professional. Effective go strategy is sublimely subtle.”

Go’s subtleties are profound, and masters of the game spend years, even decades, becoming proficient at the strategies needed to win. In some parts of the world Go is a more popular game than chess. Japan and other Asian nations have professional Go tournaments with winnings in the millions of dollars.

Programming an engine to play Go has been tried with some success in the past. Go to http://www.dcs.warwick.ac.uk/~pwg/go/go.html to play a Java applet version that is reasonably strong. That engine works by attempting to assign relative strengths to positions and placing stones to get maximum influence.

My final project attempts another solution to the problem by using a back-propagating multi-layer perceptron (MLP) to learn and play Go. However, the vast complexity of Go would make selecting an optimal MLP configuration by hand impossible, so I also use a genetic algorithm (GA) to search for a good configuration. The complexity of Go requires a large network of perceptrons with no upper limit on the number of layers or the size of each layer.

The project is a combination of theory, software development, and application to the game of Go.
I explored how MLPs and GAs work while developing the software that implements them, and applied that software to playing Go. To learn how to play Go, please see “The Way to Go” and other great books available for free from the US Go Association (http://www.usgo.org):
http://www.usgo.org/usa/waytogo/index.asp

Project Structure

My project’s name is GoArena. It was primarily intended as a theoretical study of how MLPs and Genetic Algorithms work and interact. It is also a substantial software development effort for the MLP and GA, and an application development project for the game of Go.

The back propagation algorithm requires an enormous number of iterations to find a decent solution, and the genetic algorithm needs many generations to improve it incrementally. To get good speed I decided to implement my project in C++. It was compiled on Windows XP using Microsoft Visual Studio .Net Academic.

I made the following classes and have tested and debugged them as much as time allowed. Some functionality that I wanted is not there or is in the wrong place, but the program compiles and runs a reasonable demonstration of both the MLP and the GA.

    GoArena                  Contains the Genetic Algorithm, move GA to own class
     |
     |- PopulationManager
     |   |
     |   |- Player           Contains the MLP, move MLP to own class
     |       |
     |       |- Neuron
     |
     |- Board

Class Explanation (bottom up):

Board.cpp & Board.h contain the Go board. This class manages the game of Go and provides the rules that state how to move, what is legal, and who wins.

Neuron.cpp & Neuron.h are basically data storage for the MLP implemented in the Player class.

Player.cpp & Player.h are where the brains of the program are. This class contains the MLP data structures and determines what moves to make in a game.

PopManager.cpp & PopManager.h are used to manage the population of GoArena. Once the GA is placed in its own class, this class should use it to breed Players, and it should contain functions to prevent overuse of system resources such as memory.
It maintains statistics about the population such as size, rating, age, etc.

GoArena.cpp is the main program that controls what goes on. It currently contains the Genetic Algorithm. It sets up and plays games, then determines the winner. It could be threaded to create multiple tables so that games are played in parallel on multi-processor systems.

GoArena

GoArena is the main program that controls what goes on. It has #defined constants that control how intensive a search to do: the maximum population, the epoch size, and the number of generations to run for.

Upon start it:
1) Starts a Population Manager to hold the PlayerList
2) Creates a board to play on
3) Loops for XXX generations
4) Plays EPOC games per generation
5) Selects 2 players to play a game
6) Plays the game
7) Determines the winner and sends back propagation data to the players
8) After each EPOC, kills weak and dysfunctional players and creates new ones

Population Manager

PopManager is supposed to use the Genetic Algorithm to maintain a number of Players. It would provide a structured community for the Players and the GA in order to facilitate breeding better players. I did not have time to move the GA out of the main GoArena program into its own class, so this class is largely unused.

When created it
1) Creates 4 Players and enters them in the PlayerList
Or
2) Creates a specified number of Players and enters them in the PlayerList.

It will also be responsible for maintaining statistics about the population for the GA. These statistics will also be used to see how effective the MLP/GA combination is at making strong players.

Player

Each Player represents a person, as in a real-life community. Each Player is seated at a table by the GoArena to play against another Player, and its abilities are measured. The fitness of each Player is evaluated by the GA to determine which Players live and die.

Each Player
1) Sets up its own MLP to represent its brain
2) Uses the output of its MLP to select where to move when playing a game.
3) Learns to play better after it is told whether it won the game.

The Player also maintains its record of wins, losses, age, and id. These would be used by the Population Manager and the GA.

Neuron

The Neuron class is a helper class that contains data for the MLP in each Player. Its only function is to hold the weights of the MLP and calculate its output when given the previous layer of the MLP. It applies an activation function to the output.

Board

This class implements the rules of the game. It is the fitness function that determines a Player’s worth. It could be replaced by another fitness function and the rest of the simulation would run fine. It just wouldn’t be GoArena anymore.

The Board provides a place to play the game of Go. It enforces the rules of Go and determines the winner. It provides a constant that allows you to change the size of the board. There are 3 main board sizes used in the real world, depending on your proficiency and how long you wish to play. 19x19 is the largest and is used by very good players, but a game takes a long time. Other common board sizes are 13x13 and 9x9. GoArena can use any of these sizes just by changing the #define BOARD_SIZE.

The MLP

As a software development project, considerable time was put into creating a C++ implementation of a multi-layer perceptron network. The network in GoArena is a fully connected, feed-forward network. There is no upper bound on the size of the network. For this application it was critical that the network be able to grow to seemingly ridiculous proportions if necessary. Reasonableness would be ensured by the Genetic Algorithm: any network that was unable to perform would be killed and another would take its place. The network is meant to truly mimic the human brain, which has billions of neurons and connections. The no-limits approach to the MLP combined with the GA would make this possible.

Learning takes place after each game.
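To make the Neuron description above concrete, here is a minimal sketch of a neuron that holds its weights and computes its output from the previous layer. It is not taken from the GoArena source: the class name NeuronSketch, the bias term, and the choice of a logistic sigmoid as the activation function are all assumptions made for illustration, since the report does not say which activation function is used.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical neuron: names and layout are illustrative only and are
// not taken from the GoArena source.
class NeuronSketch {
public:
    // One weight per output of the previous layer, plus a bias (assumed).
    NeuronSketch(std::vector<double> weights, double bias)
        : weights_(std::move(weights)), bias_(bias) {}

    // Weighted sum of the previous layer's outputs, passed through a
    // logistic sigmoid. The sigmoid is an assumption.
    double output(const std::vector<double>& previousLayer) const {
        // Fully connected: every previous output needs exactly one weight.
        assert(previousLayer.size() == weights_.size());
        double sum = bias_;
        for (std::size_t i = 0; i < weights_.size(); ++i) {
            sum += weights_[i] * previousLayer[i];
        }
        return 1.0 / (1.0 + std::exp(-sum));
    }

private:
    std::vector<double> weights_;
    double bias_;
};
```

In a fully connected layer, each neuron of this kind would be given the complete output vector of the layer before it, so a layer is simply a list of such neurons evaluated one after another.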
GoArena determines the winner of each game, then passes the result back to each Player, which uses it to update the weights of its MLP. Version 2 of the MLP would include the ability to connect layers backward and so mimic the human brain even more closely.

The Genetic Algorithm

The theory of a genetic algorithm is very simple, but making it work in practice is very complicated. This project combined a GA with an MLP to play Go. This is a large theoretical step that complicates things a lot. How do you combine MLPs? How do you mutate MLPs?

The simplest form of the GA is employed in GoArena. Its current function is simply
1) Kill any player that cannot win any games
2) Kill the weakest players to make room for more
3) Create a new player with randomly chosen values.

A more advanced GA would be able to use the mutate and breed functions to create new MLPs. Mutating MLPs would consist of randomly permuting the weights of the better Players and then testing both old and new in the arena. The Breed function would be more complex, requiring a decision on whether to use the number of layers of the Mother or the Father, to average them, or to average the top ten players. The number of combinations for selective breeding is very large. A GA could even be used to select the best GA for making new Players!

Results

Unfortunately, due to limited time, some functions of the Go board could not be perfected. While it does a reasonable job, it does not always enforce the rules of Go properly. Sometimes groups with no liberties are not removed, and the function that scores the game at the end is a simplified version that doesn’t count dead stones.

The best way to see the results of the GA/MLP is to watch the games it plays as it is learning. This program takes a long time to run. 1000 generations with 1000 games per epoch and 100 Players on a 9x9 board takes approximately 12 hours on an Athlon XP 2500+. As the simulation progresses you can see how the players get smarter.
The best method is to have it show the results of every 100th or so game. Looking at the board at the end, you can get a general idea of how smart the strategies employed by the computer are. This does require some knowledge of the game to appreciate. Please see the introduction for suggestions on learning Go.

At the start the computer is horribly stupid. It just places stones in random places. Later in the simulation you can see it has developed some ability to cover the board and group stones. Also, later in the simulation you see fewer and fewer players being killed for not being able to win any games. On long runs the grouping can become pronounced, with one Player taking half the board and the other taking the other half. This is a striking improvement over the random placement that occurred in the beginning. Passes also become a more frequent way of ending the game, rather than simply filling the board and having nothing left to do.

Ideally Players would eventually learn how to properly end a game with a pass and, more importantly, learn when it is the proper time to do so. I have not seen this in any simulations I have run. This is an extremely complicated behavior, and my current GA/MLP would take a long time to reach this point.

While slow, there is noticeable improvement and change in the play as the simulation progresses. This shows that the MLP and GA are working together to make better players and that, given enough time, we could have a very good Go Player. Anyone have a supercomputer handy?
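For readers who want to experiment, the three-step culling pass listed in the Genetic Algorithm section can be sketched roughly as follows. This is only an illustration under stated assumptions, not the GoArena code: the PlayerRecord struct, the cullAndRefill function, the keep and targetSize parameters, ranking by win count, and the 1 to 8 range for a random number of hidden layers are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical bookkeeping record; field names are illustrative only.
struct PlayerRecord {
    int id;
    int wins;
    int losses;
    int hiddenLayers;  // stands in for the "randomly chosen values" of a new player
};

// One culling pass at the end of an epoch. `keep` (survivor limit) and
// `targetSize` (population size) are assumed parameters.
std::vector<PlayerRecord> cullAndRefill(std::vector<PlayerRecord> players,
                                        std::size_t keep,
                                        std::size_t targetSize,
                                        std::mt19937& rng) {
    assert(keep <= targetSize);

    // Pick fresh ids before anyone is removed so ids are never reused.
    int nextId = 0;
    for (const PlayerRecord& p : players) nextId = std::max(nextId, p.id + 1);

    // 1) Kill any player that could not win any games.
    players.erase(std::remove_if(players.begin(), players.end(),
                                 [](const PlayerRecord& p) { return p.wins == 0; }),
                  players.end());

    // 2) Kill the weakest players: rank by wins (assumed fitness measure)
    //    and keep only the strongest `keep` of them.
    std::stable_sort(players.begin(), players.end(),
                     [](const PlayerRecord& a, const PlayerRecord& b) {
                         return a.wins > b.wins;
                     });
    if (players.size() > keep) players.resize(keep);

    // 3) Create new players with randomly chosen values (range is assumed).
    std::uniform_int_distribution<int> layerCount(1, 8);
    while (players.size() < targetSize) {
        players.push_back(PlayerRecord{nextId++, 0, 0, layerCount(rng)});
    }
    return players;
}
```

Calling this once per epoch with a fixed targetSize keeps the population at a constant size, which is one simple way to bound memory use on long runs.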