Creating Optimal Multi-Layer Perceptron Networks to Play Go with a Genetic Algorithm
by Nathan Erickson
n8erickson@yahoo.com
12/18/03
ECE539 Final Project
for
Prof. Yu Hen Hu
Univ. of Wisconsin – Madison
Project Overview
The ancient Chinese game of Go has long been a difficult problem for computer programmers. Its complex and rapidly changing play makes it very difficult for computers to get a handle on how to play well. According to the book “The Way to Go”:
“The situations that arise from the simple objectives of go are complex enough to have thwarted all attempts to program a competitive go-playing computer. Informed opinion doubts that a computer will soon, if ever, challenge the ability of a go professional. Effective go strategy is sublimely subtle.”
Go’s subtleties are profound, and masters of the game spend years, even decades, becoming proficient at the strategies needed to win. In some parts of the world Go is more popular than chess. Japan and other Asian nations hold professional Go tournaments with winnings in the millions of dollars.
Programming an engine to play Go has been tried with some success in the past. Go to http://www.dcs.warwick.ac.uk/~pwg/go/go.html to play a reasonably strong Java applet version. That engine works by assigning relative strengths to positions and placing stones to gain maximum influence.
My final project attempts another solution to the problem by using a back-propagating multi-layer perceptron (MLP) to learn and play Go. However, the vast complexity of Go makes choosing an optimal MLP configuration by hand impossible, so I also use a genetic algorithm (GA) to find one. The complexity of Go requires a large network of perceptrons with no upper limit on the number of layers or the size of each layer. The project is a combination of theory, software development, and application to the game of Go. I explored how MLPs and GAs work while developing the software that implements them, and applied that software to playing Go.
To learn how to play Go, please see “The Way to Go” and other great books available for free from the American Go Association (http://www.usgo.org). “The Way to Go” is at http://www.usgo.org/usa/waytogo/index.asp.
Project Structure
My project's name is GoArena. It was primarily intended as a theoretical study of how MLPs and genetic algorithms work and interact. It is also a large software development project for the MLP and GA, and an application development project for the game of Go. The back-propagation algorithm requires an enormous number of iterations to find a decent solution, and the genetic algorithm needs many generations to improve it incrementally. For speed, I implemented the project in C++. It was compiled on Windows XP using Microsoft Visual Studio .NET Academic.
I wrote the following classes and have tested and debugged them as much as time allowed. Some functionality that I wanted is missing or in the wrong place, but the program compiles and runs a reasonable demonstration of both the MLP and the GA.
GoArena Contains the Genetic Algorithm, move GA to own class
|
|- PopulationManager
|
|
|
| Player Contains the MLP, move MLP to own class
|
|
|
|- Neuron
|
|- Board
Class Explanation (bottom up):
Board.cpp & Board.h contain the Go board. They manage the game of Go and provide the rules that state how to move, what is legal, and who wins.
Neuron.cpp & Neuron.h are basically data storage for the MLP implemented in the Player class.
Player.cpp & Player.h are where the brains of the program are. They contain the MLP data structures and determine what moves to make in a game.
PopManager.cpp & PopManager.h are used to manage the population of GoArena. Once the GA is placed in its own class, this code should use it to breed Players, and it should contain functions to prevent overusing system resources such as memory. It maintains statistics about the population such as size, rating, age, etc.
GoArena.cpp is the main program that controls what goes on. It currently contains the genetic algorithm. It sets up and plays games, then determines the winner. It could be threaded to run multiple tables of games in parallel for speed on multi-processor systems.
GoArena
GoArena is the main program that controls what goes on. It has #defined constants that control how intensive a search to perform: they set the maximum population, the epoch size, and the number of generations to run.
Upon start it:
1) Starts a Population Manager to hold the PlayerList
2) Creates a board to play on
3) Loops for XXX generations
4) Plays EPOC games per generation
5) Selects 2 players to play a game
6) Plays the game
7) Determines the winner and sends back-propagation data to the players
8) After each EPOC, kills weak and dysfunctional players and creates new ones
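The loop above can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not GoArena's source: the constant names `EPOC` and `GENERATIONS`, the `Player` fields, the `RunStats` struct, and the coin-flip `playGame()` placeholder are all hypothetical, and the back-propagation and GA steps appear only as comments.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Hypothetical names for the #defined constants; the report does not list them.
#define EPOC 1000         // games played per generation (one epoch)
#define GENERATIONS 1000  // number of generations to run

// Minimal stand-in for the real Player class (hypothetical fields).
struct Player {
    int id = 0;
    int wins = 0, losses = 0, age = 0;
};

struct RunStats {
    int gamesPlayed = 0;
    int generations = 0;
};

// Placeholder for one game: returns true if 'black' wins. A real version would
// alternate MLP-chosen moves on a Board and score the final position.
bool playGame(Player& black, Player& white) {
    (void)black;
    (void)white;
    return std::rand() % 2 == 0;
}

// Main loop: for each generation, play gamesPerEpoc games between random pairs.
RunStats runArena(std::vector<Player>& population, int generations, int gamesPerEpoc) {
    assert(population.size() >= 2);
    const int n = static_cast<int>(population.size());
    RunStats stats;
    for (int g = 0; g < generations; ++g) {
        for (int game = 0; game < gamesPerEpoc; ++game) {
            // Select two distinct players at random.
            int a = std::rand() % n;
            int b = std::rand() % (n - 1);
            if (b >= a) ++b;
            bool blackWon = playGame(population[a], population[b]);
            Player& winner = blackWon ? population[a] : population[b];
            Player& loser = blackWon ? population[b] : population[a];
            ++winner.wins;
            ++loser.losses;
            // The real program would send back-propagation data to both players here.
            ++stats.gamesPlayed;
        }
        for (Player& p : population) ++p.age;
        // The GA step (kill weak players, create new ones) would run here.
        ++stats.generations;
    }
    return stats;
}
```

With the hypothetical constants above, a full run would be `runArena(population, GENERATIONS, EPOC)`.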
Population Manager
PopManager is supposed to use the genetic algorithm to maintain a number of Players. It would provide a structured community for the Players and the GA in order to facilitate breeding better players. I did not have time to move the GA out of the main GoArena program into its own class, so this class is largely unused.
When created it either:
1) Creates 4 Players and enters them in the PlayerList, or
2) Creates a specified number of Players and enters them in the PlayerList.
It will also be responsible for maintaining statistics about the population for the GA. These statistics will also be used to measure how effective the MLP/GA combination is at making strong players.
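The two construction paths and a couple of population statistics could look like this. The member names, the `Player` fields, and the choice of mean win rate and oldest age as statistics are assumptions made for illustration, not GoArena's actual interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical Player record; GoArena's real Player also holds an MLP.
struct Player {
    int id = 0;
    int wins = 0, losses = 0, age = 0;
};

// Sketch of a population manager (names are illustrative only).
class PopManager {
public:
    // Default construction: start with 4 Players, as described above.
    PopManager() : PopManager(4) {}

    // Alternative: start with a caller-specified number of Players.
    explicit PopManager(int count) {
        assert(count > 0);
        for (int i = 0; i < count; ++i) addPlayer();
    }

    // Enters a new Player with a fresh id in the PlayerList.
    void addPlayer() {
        Player p;
        p.id = nextId_++;
        playerList_.push_back(p);
    }

    std::size_t size() const { return playerList_.size(); }
    std::vector<Player>& players() { return playerList_; }

    // Statistic for the GA: mean win rate over players who have played.
    double averageWinRate() const {
        double total = 0.0;
        int counted = 0;
        for (const Player& p : playerList_) {
            int games = p.wins + p.losses;
            if (games == 0) continue;  // ignore players with no games yet
            total += static_cast<double>(p.wins) / games;
            ++counted;
        }
        return counted > 0 ? total / counted : 0.0;
    }

    // Statistic for the GA: age of the oldest surviving player.
    int oldestAge() const {
        int oldest = 0;
        for (const Player& p : playerList_)
            if (p.age > oldest) oldest = p.age;
        return oldest;
    }

private:
    std::vector<Player> playerList_;
    int nextId_ = 0;
};
```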
Player
Each Player represents a person, as in a real-life community. Each Player would be seated at a table by GoArena to play against another Player, and its abilities would be measured. The GA evaluates the fitness of each Player to determine which Players live and die.
Each Player:
1) Sets up its own MLP to represent its brain
2) Uses the output of its MLP to select where to move when playing a game
3) Learns to play better after it is told whether it won the game
The Player also maintains its record of wins, losses, age, and ID. These would be used by the Population Manager and the GA.
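Step 2 could be implemented by treating the MLP's output layer as one score per intersection and picking the best empty point. This is a sketch under stated assumptions: the one-output-per-point encoding, the cell values, the `passThreshold` rule for passing, and the `PlayerRecord` struct are all hypothetical, and full legality (suicide, ko) is assumed to be checked by the Board.

```cpp
#include <cassert>
#include <vector>

// Hypothetical encoding: one MLP output per intersection; board cells are
// 0 = empty, 1 = black, 2 = white. Returns the chosen point's index, or -1 to
// pass when no empty point scores at least 'passThreshold'. Only emptiness is
// checked here; suicide and ko are left to the Board.
int selectMove(const std::vector<double>& mlpOutput, const std::vector<int>& board,
               double passThreshold) {
    assert(mlpOutput.size() == board.size());
    int best = -1;
    for (int i = 0; i < static_cast<int>(board.size()); ++i) {
        if (board[i] != 0) continue;  // occupied points are never candidates
        if (best == -1 || mlpOutput[i] > mlpOutput[best]) best = i;
    }
    if (best == -1 || mlpOutput[best] < passThreshold) return -1;  // pass
    return best;
}

// Hypothetical record each Player keeps for the Population Manager and GA.
struct PlayerRecord {
    int id = 0, wins = 0, losses = 0, age = 0;
    void recordResult(bool won) {
        if (won) ++wins;
        else ++losses;
    }
};
```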
Neuron
The Neuron class is a helper class that contains data for the MLP in each Player. Its only function is to hold the weights of the MLP and calculate its output when given the previous layer of the MLP, applying an activation function to that output.
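A minimal version of such a Neuron is sketched below. The report does not name the activation function, so the logistic sigmoid here is an assumption, as are the separate bias term and the member names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical Neuron: one weight per neuron in the previous layer plus a bias.
// The logistic sigmoid is an assumed activation function.
class Neuron {
public:
    Neuron(std::vector<double> weights, double bias)
        : weights_(std::move(weights)), bias_(bias) {}

    // Weighted sum of the previous layer's outputs, passed through the sigmoid.
    double output(const std::vector<double>& previousLayer) const {
        assert(previousLayer.size() == weights_.size());
        double sum = bias_;
        for (std::size_t i = 0; i < weights_.size(); ++i)
            sum += weights_[i] * previousLayer[i];
        return 1.0 / (1.0 + std::exp(-sum));
    }

    std::vector<double>& weights() { return weights_; }  // exposed for training

private:
    std::vector<double> weights_;
    double bias_;
};
```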
Board
This class implements the rules of the game. It is the fitness function that determines a Player's worth. It could be replaced by another fitness function and the rest of the simulation would run fine; it just wouldn’t be GoArena anymore.
The board provides a place to play the game of Go. It enforces the rules of Go and determines the winner. It provides a constant that lets you change the size of the board. There are 3 main board sizes used in the real world, depending on your proficiency and how long you wish to play. 19x19 is the largest size, used by strong players, but a game on it takes a long time. Other common board sizes are 13x13 and 9x9. GoArena can use any of these sizes just by changing #define BOARD_SIZE.
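Enforcing captures is the part of the Board that needs the most care. One common approach, sketched here rather than taken from GoArena, is to flood-fill each enemy group adjacent to a newly placed stone and remove the group if none of its stones touches an empty point. The cell encoding, the `Board` method names, and the direction tables are hypothetical; suicide and ko are not handled.

```cpp
#include <cassert>
#include <vector>

#define BOARD_SIZE 9  // 9, 13, and 19 are the common real-world sizes

// Offsets to the four orthogonal neighbors (up, down, left, right).
const int kDr[4] = {-1, 1, 0, 0};
const int kDc[4] = {0, 0, -1, 1};

// Hypothetical capture sketch. Cells are row-major: 0 = empty, 1 = black, 2 = white.
class Board {
public:
    Board() : cells_(BOARD_SIZE * BOARD_SIZE, 0) {}

    int at(int r, int c) const { return cells_[r * BOARD_SIZE + c]; }

    // Places a stone, then removes every adjacent enemy group left with no
    // liberties. Returns the number of stones captured.
    int play(int r, int c, int color) {
        assert(onBoard(r, c) && at(r, c) == 0 && (color == 1 || color == 2));
        cells_[r * BOARD_SIZE + c] = color;
        const int enemy = 3 - color;
        int captured = 0;
        for (int k = 0; k < 4; ++k) {
            int nr = r + kDr[k], nc = c + kDc[k];
            if (!onBoard(nr, nc) || at(nr, nc) != enemy) continue;
            std::vector<int> group;
            if (!hasLiberty(nr, nc, group)) {
                for (int idx : group) cells_[idx] = 0;
                captured += static_cast<int>(group.size());
            }
        }
        return captured;
    }

private:
    std::vector<int> cells_;

    static bool onBoard(int r, int c) {
        return r >= 0 && r < BOARD_SIZE && c >= 0 && c < BOARD_SIZE;
    }

    // Flood-fills the group containing (r, c), collecting its stones in 'group'.
    // Returns true if any stone in the group touches an empty point.
    bool hasLiberty(int r, int c, std::vector<int>& group) const {
        const int color = at(r, c);
        std::vector<bool> seen(cells_.size(), false);
        std::vector<int> stack(1, r * BOARD_SIZE + c);
        seen[stack[0]] = true;
        bool liberty = false;
        while (!stack.empty()) {
            int idx = stack.back();
            stack.pop_back();
            group.push_back(idx);
            int cr = idx / BOARD_SIZE, cc = idx % BOARD_SIZE;
            for (int k = 0; k < 4; ++k) {
                int nr = cr + kDr[k], nc = cc + kDc[k];
                if (!onBoard(nr, nc)) continue;
                int nidx = nr * BOARD_SIZE + nc;
                if (cells_[nidx] == 0) {
                    liberty = true;
                } else if (cells_[nidx] == color && !seen[nidx]) {
                    seen[nidx] = true;
                    stack.push_back(nidx);
                }
            }
        }
        return liberty;
    }
};
```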
The MLP
As a software development project, considerable time was put into creating a C++ implementation of a multi-layer perceptron network. The network in GoArena is fully forward connected, and there is no upper bound on its size. For this application it was critical that the network be able to grow to seemingly ridiculous proportions if necessary; reasonableness would be ensured by the genetic algorithm. Any network that was unable to perform would be killed and another would take its place. The network is meant to mimic the human brain, which has billions of neurons and connections. The no-limits approach to the MLP combined with the GA would make this possible.
Learning takes place after each game. GoArena determines the winner of each game, then passes the result back to each Player, which uses it to update the weights of its MLP.
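The sketch below shows one way to build a fully forward-connected MLP whose depth and layer widths come from a list of sizes, so nothing but memory caps how large it can grow, together with a standard back-propagation step. It assumes sigmoid units and a squared-error loss; the class and method names are hypothetical, and this report does not specify how GoArena turns a game result into training targets.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical fully forward-connected MLP: layer sizes come from a vector, so
// depth and width are limited only by memory. Sigmoid units, squared error.
class MLP {
public:
    explicit MLP(const std::vector<int>& sizes, unsigned seed = 1) {
        assert(sizes.size() >= 2);
        std::mt19937 rng(seed);
        std::uniform_real_distribution<double> dist(-0.5, 0.5);
        for (std::size_t l = 1; l < sizes.size(); ++l) {
            assert(sizes[l] > 0 && sizes[l - 1] > 0);
            // One row per neuron: a weight from every previous-layer neuron, then a bias.
            std::vector<std::vector<double>> layer(
                static_cast<std::size_t>(sizes[l]),
                std::vector<double>(static_cast<std::size_t>(sizes[l - 1]) + 1));
            for (auto& row : layer)
                for (double& w : row) w = dist(rng);
            weights_.push_back(layer);
        }
    }

    // Forward pass; stores each layer's activations for back-propagation.
    std::vector<double> forward(const std::vector<double>& input) {
        assert(input.size() + 1 == weights_.front().front().size());
        acts_.assign(1, input);
        for (const auto& layer : weights_) {
            const std::vector<double>& prev = acts_.back();
            std::vector<double> out;
            for (const auto& row : layer) {
                double sum = row.back();  // bias
                for (std::size_t i = 0; i < prev.size(); ++i) sum += row[i] * prev[i];
                out.push_back(1.0 / (1.0 + std::exp(-sum)));
            }
            acts_.push_back(out);
        }
        return acts_.back();
    }

    // One gradient-descent step toward 'target'; returns the squared error
    // measured before the update.
    double train(const std::vector<double>& input, const std::vector<double>& target,
                 double rate) {
        std::vector<double> out = forward(input);
        assert(target.size() == out.size());
        double error = 0.0;
        std::vector<double> delta(out.size());
        for (std::size_t j = 0; j < out.size(); ++j) {
            double e = target[j] - out[j];
            error += e * e;
            delta[j] = e * out[j] * (1.0 - out[j]);  // sigmoid derivative
        }
        // Walk back from the output layer, propagating deltas with the old weights.
        for (std::size_t l = weights_.size(); l-- > 0;) {
            const std::vector<double>& prev = acts_[l];
            std::vector<double> prevDelta(prev.size(), 0.0);
            for (std::size_t j = 0; j < weights_[l].size(); ++j) {
                std::vector<double>& row = weights_[l][j];
                for (std::size_t i = 0; i < prev.size(); ++i) {
                    prevDelta[i] += delta[j] * row[i];
                    row[i] += rate * delta[j] * prev[i];
                }
                row.back() += rate * delta[j];
            }
            for (std::size_t i = 0; i < prev.size(); ++i)
                prevDelta[i] *= prev[i] * (1.0 - prev[i]);
            delta = prevDelta;
        }
        return error;
    }

private:
    std::vector<std::vector<std::vector<double>>> weights_;  // [layer][neuron][input]
    std::vector<std::vector<double>> acts_;                 // activations per layer
};
```

One hypothetical way to use a game result would be to call `train()` on positions the Player faced, raising the target at points the winner chose and lowering it at points the loser chose; the report leaves the exact scheme unstated.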
Version 2 of the MLP would add the ability to connect layers backward (feedback connections), mimicking the human brain even more closely.
The Genetic Algorithm
The theory of a genetic algorithm is very simple, but making it work in practice is very complicated. This project combines a GA with an MLP to play Go, a large theoretical step that complicates things considerably and raises two questions:
How do you combine MLPs?
How do you mutate MLPs?
The simplest form of GA is employed in GoArena. Its current function is simply to:
1) Kill any player that cannot win any games
2) Kill the weakest players to make room for more
3) Create a new player with randomly chosen values
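Those three rules translate into one pass over the population per epoch. In this sketch, "cannot win any games" is read as having played at least once with zero wins, weakness is measured by win rate, and a new player's "randomly chosen values" are a flat `genome` vector; all three readings, and the `survivors` and `populationSize` parameters, are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical Player for this sketch; 'genome' stands in for the randomly
// chosen values (weights, layer sizes) that define a new player's MLP.
struct Player {
    int id = 0, wins = 0, losses = 0;
    std::vector<double> genome;
};

double winRate(const Player& p) {
    int games = p.wins + p.losses;
    return games == 0 ? 0.0 : static_cast<double>(p.wins) / games;
}

// One round of the simple GA: cull winless players, keep the strongest
// 'survivors', then refill to 'populationSize' with random newcomers.
void simpleGA(std::vector<Player>& pop, std::size_t survivors, std::size_t populationSize,
              std::size_t genomeLength, std::mt19937& rng, int& nextId) {
    assert(survivors <= populationSize);

    // 1) Kill any player that has played but never won.
    pop.erase(std::remove_if(pop.begin(), pop.end(),
                             [](const Player& p) { return p.wins == 0 && p.losses > 0; }),
              pop.end());

    // 2) Kill the weakest players (lowest win rate) to make room.
    std::sort(pop.begin(), pop.end(),
              [](const Player& a, const Player& b) { return winRate(a) > winRate(b); });
    if (pop.size() > survivors) pop.resize(survivors);

    // 3) Create new players with randomly chosen values.
    std::uniform_real_distribution<double> dist(-1.0, 1.0);
    while (pop.size() < populationSize) {
        Player p;
        p.id = nextId++;
        for (std::size_t i = 0; i < genomeLength; ++i) p.genome.push_back(dist(rng));
        pop.push_back(p);
    }
}
```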
A more advanced GA would use mutate and breed functions to create new MLPs. Mutating an MLP would consist of randomly perturbing the weights of the better Players and then testing both old and new in the arena. The breed function would be more complex, requiring a decision whether to use the number of layers of the mother or of the father, to average the two, or to average the top ten players. The combinations for selective breeding are very large. A GA could even be used to select the best GA for making new Players!
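The mutate and breed functions described above might start out like the sketch below. The `Genome` layout (hidden-layer widths plus a flat weight vector), Gaussian noise as the perturbation, and the round-up rule used when averaging layer counts and widths are all assumptions. The sketch only breeds the layer layout from two parents (the top-ten average is not shown), and a bred child would still need weights initialized to match its layout.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical genome: hidden-layer widths plus a flat list of MLP weights.
struct Genome {
    std::vector<int> hiddenLayers;
    std::vector<double> weights;
};

// Mutation: copy a parent and perturb every weight with Gaussian noise.
Genome mutate(const Genome& parent, double sigma, std::mt19937& rng) {
    assert(sigma > 0.0);
    Genome child = parent;
    std::normal_distribution<double> noise(0.0, sigma);
    for (double& w : child.weights) w += noise(rng);
    return child;
}

// Breeding: which parent's layer layout the child inherits.
enum class LayerRule { Mother, Father, Average };

std::vector<int> breedLayers(const Genome& mother, const Genome& father, LayerRule rule) {
    if (rule == LayerRule::Mother) return mother.hiddenLayers;
    if (rule == LayerRule::Father) return father.hiddenLayers;

    // Average: depth is the rounded-up mean depth; widths are averaged where
    // both parents have that layer and copied from whichever parent has it otherwise.
    const std::vector<int>& m = mother.hiddenLayers;
    const std::vector<int>& f = father.hiddenLayers;
    std::size_t depth = (m.size() + f.size() + 1) / 2;
    std::vector<int> child(depth);
    for (std::size_t i = 0; i < depth; ++i) {
        bool inM = i < m.size(), inF = i < f.size();
        if (inM && inF) child[i] = (m[i] + f[i] + 1) / 2;
        else child[i] = inM ? m[i] : f[i];
    }
    return child;
}
```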
Results
Unfortunately, due to limited time some functions of the Go board could not be perfected. While it does a reasonable job, it does not always enforce the rules of Go properly. Sometimes groups with no liberties are not removed, and the function that scores the game at the end is a simplified version that doesn’t account for dead stones.
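For reference, a simplified area-scoring function of the kind described might look like the following. It is a hypothetical sketch, not GoArena's scorer: each side gets its stones plus any empty region that touches only its own stones, every stone is treated as alive, and no komi is added.

```cpp
#include <cassert>
#include <vector>

// Hypothetical simplified area score on an n x n board stored row-major
// (0 = empty, 1 = black, 2 = white). Every stone counts as alive and no komi
// is added, so dead stones are scored for the side that owns them.
struct Score {
    int black = 0;
    int white = 0;
};

Score areaScore(const std::vector<int>& cells, int n) {
    assert(static_cast<int>(cells.size()) == n * n);
    const int dr[4] = {-1, 1, 0, 0}, dc[4] = {0, 0, -1, 1};
    Score s;
    std::vector<bool> seen(cells.size(), false);
    for (int start = 0; start < n * n; ++start) {
        if (cells[start] == 1) { ++s.black; continue; }  // stones score for their owner
        if (cells[start] == 2) { ++s.white; continue; }
        if (seen[start]) continue;
        // Flood-fill this empty region and note which colors border it.
        std::vector<int> stack(1, start);
        seen[start] = true;
        int size = 0;
        bool touchesBlack = false, touchesWhite = false;
        while (!stack.empty()) {
            int idx = stack.back();
            stack.pop_back();
            ++size;
            int r = idx / n, c = idx % n;
            for (int k = 0; k < 4; ++k) {
                int nr = r + dr[k], nc = c + dc[k];
                if (nr < 0 || nr >= n || nc < 0 || nc >= n) continue;
                int nidx = nr * n + nc;
                if (cells[nidx] == 1) touchesBlack = true;
                else if (cells[nidx] == 2) touchesWhite = true;
                else if (!seen[nidx]) { seen[nidx] = true; stack.push_back(nidx); }
            }
        }
        // Territory: an empty region bordered by only one color.
        if (touchesBlack && !touchesWhite) s.black += size;
        else if (touchesWhite && !touchesBlack) s.white += size;
    }
    return s;
}
```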
The best way to see the results of the GA/MLP is to watch the games it plays as it learns. This program takes a long time to run: 1000 generations with 1000 games per epoch and 100 Players on a 9x9 board takes approximately 12 hours on an Athlon XP 2500+.
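As a rough check on that figure, assuming nearly all of the 12 hours goes to playing and training on games (the 100-player population size does not change the number of games played):

```latex
N_{\text{games}} = 1000 \times 1000 = 10^{6}
\qquad
t_{\text{game}} \approx \frac{12 \times 3600\ \text{s}}{10^{6}} = \frac{43\,200\ \text{s}}{10^{6}} \approx 43\ \text{ms}
```

So each game, including its learning update, averages roughly 43 ms on that machine.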
As the simulation progresses you can see the players get smarter. The best method is to have it show the result of every 100th or so game. Looking at the board at the end, you can get a general idea of how smart the computer's strategies are. This does require some knowledge of the game to appreciate; please see the Project Overview for suggestions on learning Go.
At the start the computer is horribly stupid: it just places stones in random places. Later in the simulation you can see it has developed some ability to cover the board and group stones. Later in the simulation, fewer and fewer players are killed for not being able to win any games. On long runs the grouping can become pronounced, with one Player taking half the board and the other taking the other half. This is a striking improvement over the random placement at the beginning. Passes also become a more frequent way of ending the game, rather than simply filling the board until nothing is left to do.
Ideally Players would eventually learn how to properly end a game with a pass and, more importantly, learn when it is the proper time to do so. I have not seen this in any simulation I have run. This is an extremely complicated behavior, and my current GA/MLP would take a long time to reach this point.
While slow, there is noticeable improvement and change in the play as the simulation progresses. This shows that the MLP and GA are working together to make better players, and that given enough time we could have a very good Go player. Anyone have a supercomputer handy?