Development of a Machine-Learning-Based AI
for Go
Computer Systems Research 2005-2006
Justin Park
Abstract
Since Deep Blue’s victory in 1997 over Garry Kasparov, the World Chess Champion at
the time, the new forefront of artificial intelligence has been the ancient game of Go, developed
in China 2500 to 4000 years ago.
The challenges in Go lie in its large board size (19x19) and the difficulty of developing a
heuristic function. In Go, the influence each stone has on other stones is very abstract, and often
the outcome of a move can be seen only after many plays. As a result, Go programs
are often very complex, with hard-coded patterns and responses to certain standard moves. An
artificial intelligence that uses machine learning to develop these skills would simplify the
programming of strategies. Games on larger boards can then draw on the database of situations from
smaller boards to handle pattern recognition problems that are generally very difficult for an AI.
This project attempts to recreate the “Roving Eye” technique for Go while learning the game at
smaller board sizes.
Introduction
The purpose of this project is to develop a Go algorithm for a 9 by 9 board by
implementing the technique of machine learning. The algorithm would eventually use the Go
Modem Protocol (GMP) to communicate with a standard graphical user interface for Go, thus
enabling matches against other AIs. Each game, in theory, would increase the
performance of the algorithm by building a large database of situations and rating each situation
by its outcome in the game.
In order to develop any AI, a set of rules for the game is required. The rules of Go,
though simple in concept, are harder to program than those of traditional board games such as
chess or checkers. This is because a whole body of stones, rather than a single piece, must be
removed depending on whether or not it is completely surrounded. An even harder
programming task comes at the end of the game, when the area each side has secured is
counted.
After the board rules are developed, the main algorithms for the artificial intelligence portion
of the program need to be coded. These include a guiding heuristic function that
keeps the AI from making arbitrary moves from the start. The function evaluates
board positions based on a sphere of influence radiating from each stone. The second part of the
AI is the actual machine-learning portion. This includes a database that stores each
successfully completed game. Another function searches this database for similar board
states, creates an accompanying evaluation score, and then judges whether to use a move from
the database or the strongest move found by the guiding heuristic function.
The study will put emphasis on gains made by the AI through machine learning when
placed against its parent heuristic function. The analysis will include the effects of direct human
intervention in computer vs. computer games, human vs. computer games, and indirect
intervention through the addition of human vs. human games to the archives.
Background Information
Go is an ancient board game played on a 19 by 19 grid with black and white stones.
Developed between 2000 B.C. and 200 B.C. in China, it is the oldest surviving board game
known to man.
Rules of Play:
The object of the game is to secure the largest amount of territory. Territory is defined
as a surrounded region of the board that contains two or more houses (eyes). This will
be described later.
The players alternate turns, with Black moving first. A stone may be played on any intersection of
the grid, subject to two rules:
1. A move cannot be suicidal unless it captures stones in the process.
2. In the event of Ko, a player must play elsewhere before making a move that would
recreate a previous board state.
These rules will be described in depth later.
When stones of one color occupy all the liberties of an opposing group, the surrounded stones become
“prisoners” and are removed from the board. Each captured stone deducts 1 point from the total
area of its owner at the end of the game. A suicidal move is one that is placed in an area
completely surrounded by the other player, with all liberties covered. Suicidal moves are illegal.
Ko:
If capturing a stone would make the board state the same as it was two moves previously, the
capture is an illegal move (otherwise, the same moves could repeat indefinitely).
Instead, the player must play at a different location before capturing the stone. The
fight over such a stone is called a Ko battle.
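The Ko rule as stated here reduces to comparing a proposed position with the one from two moves
earlier. The following is a minimal sketch of that check, not the project's code; the function
name violates_ko, the history list, and the tuple-of-strings board format are assumptions made for
illustration.

```python
def violates_ko(history, candidate):
    """Return True if `candidate` repeats the position from two moves ago.

    `history` holds past board states, oldest first, and its last entry is
    the current position. `candidate` is the position the move would create,
    so "two moves previously" relative to it is history[-2].
    """
    return len(history) >= 2 and candidate == history[-2]
```

A caller would build the candidate position on a copy of the board (including any capture) and
reject the move when the function returns True.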
If both players pass consecutively, the game ends. The players count the area gained and the
winner is the one with the greater amount. In typical 19x19 games, White is given 5.5 or 6.5
points of compensation, called Komi, for moving second; the half point rules out ties, so every
game has a winner.
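The end-of-game count described above can be sketched as a flood fill over empty regions: a region
counts for a player only when every stone on its border is that player's color. This is an
illustrative sketch under simplifying assumptions (no dead-stone removal, boards given as lists of
strings using ".", "B" and "W"); the names territory and final_scores are hypothetical and not
taken from the project.

```python
EMPTY, BLACK, WHITE = ".", "B", "W"

def territory(board):
    """Count empty points lying in regions bordered by a single color."""
    size = len(board)
    seen = set()
    totals = {BLACK: 0, WHITE: 0}
    for r in range(size):
        for c in range(size):
            if board[r][c] != EMPTY or (r, c) in seen:
                continue
            region, borders, stack = 0, set(), [(r, c)]
            seen.add((r, c))
            while stack:
                cr, cc = stack.pop()
                region += 1
                for nr, nc in ((cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1)):
                    if not (0 <= nr < size and 0 <= nc < size):
                        continue
                    point = board[nr][nc]
                    if point == EMPTY:
                        if (nr, nc) not in seen:
                            seen.add((nr, nc))
                            stack.append((nr, nc))
                    else:
                        borders.add(point)
            # A region touching both colors (or none) belongs to nobody.
            if len(borders) == 1:
                totals[borders.pop()] += region
    return totals

def final_scores(board, captured, komi=6.5):
    """Return (black, white) scores.

    `captured[color]` is the number of stones of that color taken prisoner;
    each one is deducted from its owner's area, and komi is added for White.
    """
    area = territory(board)
    black = area[BLACK] - captured[BLACK]
    white = area[WHITE] - captured[WHITE] + komi
    return black, white
```

For example, on a 5x5 board split by a black wall in the second column and a white wall in the
third, Black holds 5 points and White holds 10 before prisoners and Komi are applied.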
Machine Learning:
Machine Learning is a broad subtopic of Artificial Intelligence dealing with a computer’s
ability to improve its techniques by analyzing data and building upon previous knowledge. In the
context of my project, the machine-learning component examines a database of games for a
given board state and returns the best move after weighing board similarity and board state
strength.
Perhaps the most interesting aspect of Machine Learning in my project is the effect of
human intervention. When human games are placed in the database, they add different paths
that the computer may not have found using only the parent heuristic function.
Human games not only add the factor of creativity, but can also introduce known formations that
have been part of human knowledge for hundreds of years.
Lastly, a goal of this project is to show that building the database on smaller boards
affects initial game-play on larger boards. In particular, 7x7 games will be examined both
before and after the AI gains experience from 5x5 games.
Python:
Python was the language of choice for this project due to its high-level, object-oriented
structure, which allowed for easy creation of classes (essential for database building). Because
of its dynamic typing, it was also very easy to code the program without the need to
declare functions and variables in headers. Python also integrates easily with C and C++.
Using wrapper generators such as SWIG, C and C++ code can be called from Python.
This was a key deciding factor, as I had originally planned to integrate preexisting code written in
C for the GMP protocol and for reading the .sgf format commonly used to
save Go games.
Research Theory and Design Criteria:
In this section I will go in depth through the various algorithms and the program structure.
The program is divided into three classes. The first class, which contains the main body
of the code, is Board. Within it are the main loop of the code, the methods for determining board
rules, and the main code for the AI. The second and third classes deal with the data-mining and
machine-learning aspect of the AI.
The main loop of the program is in AskInput(). It takes input through text-based
commands and then sends them to a parser. The commands I included were move, pass,
resign, show (which prints all the boards), editor (which sets up the board in an arbitrary state),
and fast, which lets two AI players play each other with no pause.
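A command loop of this kind is often organized as a small parser that validates the command name
and hands the rest of the line to a handler. The sketch below is only an illustration: the command
names come from the text, while the argument format (for example "move 2 3") and the names
COMMANDS and parse_command are assumptions, not the project's actual AskInput() code.

```python
# Command names taken from the text; everything else here is hypothetical.
COMMANDS = {"move", "pass", "resign", "show", "editor", "fast"}

def parse_command(line):
    """Split one line of input into (command, arguments).

    Returns None when the line is empty or the command is unknown, so the
    main loop can simply ask for input again.
    """
    parts = line.strip().lower().split()
    if not parts or parts[0] not in COMMANDS:
        return None
    return parts[0], parts[1:]
```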
The next important task for the program was to carry out all
the rules of Go. The majority of this task dealt with finding out which moves were illegal. By
far, the hardest illegal moves to detect were suicidal moves. I used nsurround(), a recursive
function, to check the surrounding stones for a gap around a contiguous body. nsurround() not only
checks for illegal moves but also helps in killing bodies of stones. To do
this, I apply nsurround() in reverse: when a move is placed, I check whether it completes
the capture of neighboring opposite-colored stones.
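The idea behind this check can be sketched as a flood fill that collects a contiguous body of
stones together with its liberties; a body with no liberties is captured, and a move is suicidal
when it leaves its own body without liberties and captures nothing. The project's nsurround() is
recursive, whereas this sketch uses an explicit stack, and the function names and the
list-of-strings board format are assumptions for illustration.

```python
EMPTY = "."

def neighbours(row, col, size):
    """Yield the on-board orthogonal neighbours of (row, col)."""
    for r, c in ((row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)):
        if 0 <= r < size and 0 <= c < size:
            yield r, c

def group_and_liberties(board, row, col):
    """Flood-fill the body containing (row, col); return (stones, liberties)."""
    color = board[row][col]
    stones, liberties = set(), set()
    stack = [(row, col)]
    while stack:
        point = stack.pop()
        if point in stones:
            continue
        stones.add(point)
        for r, c in neighbours(point[0], point[1], len(board)):
            if board[r][c] == EMPTY:
                liberties.add((r, c))
            elif board[r][c] == color:
                stack.append((r, c))
    return stones, liberties

def is_suicide(board, row, col, color):
    """True if playing `color` at (row, col) captures nothing and leaves
    the played stone's own body without liberties."""
    trial = [list(line) for line in board]  # scratch copy of the board
    trial[row][col] = color
    for r, c in neighbours(row, col, len(trial)):
        if trial[r][c] not in (EMPTY, color):
            if not group_and_liberties(trial, r, c)[1]:
                return False  # the move captures, so it is legal
    return not group_and_liberties(trial, row, col)[1]
```

The same group_and_liberties() call serves both purposes described above: it rejects suicidal
moves and identifies opposing bodies that should be removed after a move.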
After this, I needed to create a basic guiding heuristic function. This was done with
influenceHeuristic(), which evaluates positions based on the influence radiating from stones
(the influence decreases with distance). A “plasticBoard,” a scratch copy of the current board, is
needed in order to test the evaluation function at all the possible positions without affecting the
original board. The evaluation is done by powerpoints(), which takes distance into account when
assigning point values. Contiguous stones are awarded bonus points, and empty spaces are given
points depending on their closeness to allied stones. The result is a stronger tendency by the AI to
place beginning stones in the middle, where more areas are influenced, and to connect stones.
Connected stones are much harder to capture than isolated ones.
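An influence evaluation of this kind can be sketched as follows. The sketch assumes a linear
falloff with Manhattan distance and a cutoff radius, and it leaves out the bonus for contiguous
stones; the actual weights used by powerpoints() are not given in the text, so the names
influence_score and best_influence_move and all constants here are hypothetical.

```python
EMPTY, BLACK, WHITE = ".", "B", "W"

def influence_score(board, color, radius=2):
    """Net influence of `color`: each stone radiates a weight that falls off
    linearly with Manhattan distance and vanishes beyond `radius`."""
    size = len(board)
    total = 0
    for sr in range(size):
        for sc in range(size):
            stone = board[sr][sc]
            if stone == EMPTY:
                continue
            sign = 1 if stone == color else -1
            for r in range(size):
                for c in range(size):
                    distance = abs(r - sr) + abs(c - sc)
                    if distance <= radius:
                        total += sign * (radius + 1 - distance)
    return total

def best_influence_move(board, color, radius=2):
    """Try every empty point on a scratch copy and keep the highest score."""
    best_move, best_score = None, None
    for r in range(len(board)):
        for c in range(len(board)):
            if board[r][c] != EMPTY:
                continue
            plastic = [list(row) for row in board]  # original board untouched
            plastic[r][c] = color
            score = influence_score(plastic, color, radius)
            if best_score is None or score > best_score:
                best_move, best_score = (r, c), score
    return best_move, best_score
```

On an empty 5x5 board this sketch prefers the center point, because a central stone's influence
is not cut off by the edges, which matches the tendency described above.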
After the parent heuristic was completed, I needed to create the database structure. Two
classes, Game and Games, were essential to accomplishing this. Game was simply a
record of a completed game: its board states, its moves, and the scores determined by the
heuristic function. Games was a collection of Game objects sorted according to date played,
board size, and number of moves. Games also contains the critical function needed to perform data
mining, searchForMove(). In searchForMove(), boards reached after a similar number of moves
are retrieved and compared using the compareBoards() function.
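The two database classes described here might look roughly like the sketch below. The class names
Game and Games come from the text; the field names, the record() and positions_near() methods,
and the tolerance parameter are assumptions added for illustration, and positions_near() only
stands in for the retrieval step of searchForMove().

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Game:
    """One completed game: parallel lists of positions, moves and scores."""
    played_on: date
    size: int
    boards: list = field(default_factory=list)
    moves: list = field(default_factory=list)
    scores: list = field(default_factory=list)

    def record(self, board, move, score):
        """Store a position, the move played from it, and its heuristic score."""
        self.boards.append(board)
        self.moves.append(move)
        self.scores.append(score)

class Games:
    """Archive of Game objects, kept sorted by date, size and length."""

    def __init__(self):
        self.games = []

    def add(self, game):
        self.games.append(game)
        self.games.sort(key=lambda g: (g.played_on, g.size, len(g.moves)))

    def positions_near(self, move_number, tolerance=1):
        """Yield (game, index) for stored positions whose move number is
        within `tolerance` of `move_number`."""
        for game in self.games:
            for index in range(len(game.moves)):
                if abs(index - move_number) <= tolerance:
                    yield game, index
```

Filtering by move number first keeps the more expensive board comparison limited to positions from
a similar stage of the game.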
In compareBoards(), I checked the similarity of board states by counting the number of
black and white stones in each 2x2 square, and giving bonus points for similar patterns.
compareSquares() is called on the four corners of a board and returns a value based on the
similarity of the squares.
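One possible reading of this comparison is sketched below. It scores each pair of corner squares
by how closely their black and white stone counts agree and adds a bonus for every point that
matches exactly. The scoring constants and the snake_case function names are assumptions; the text
does not give the actual formula used by compareBoards() and compareSquares().

```python
EMPTY, BLACK, WHITE = ".", "B", "W"

def corner_squares(board, k=2):
    """Return the four k x k corner squares of a square board."""
    n = len(board)
    origins = [(0, 0), (0, n - k), (n - k, 0), (n - k, n - k)]
    return [[row[c:c + k] for row in board[r:r + k]] for r, c in origins]

def compare_squares(a, b):
    """Similarity of two squares: higher means more alike."""
    flat_a = [p for row in a for p in row]
    flat_b = [p for row in b for p in row]
    score = 2 * len(flat_a)
    # Penalize differences in the number of stones of each color.
    for color in (BLACK, WHITE):
        score -= abs(flat_a.count(color) - flat_b.count(color))
    # Pattern bonus: one point for every position that matches exactly.
    score += sum(1 for x, y in zip(flat_a, flat_b) if x == y)
    return score

def compare_boards(board_a, board_b, k=2):
    """Sum the corner-by-corner similarity of two boards."""
    pairs = zip(corner_squares(board_a, k), corner_squares(board_b, k))
    return sum(compare_squares(a, b) for a, b in pairs)
```

Because the corners are located relative to each board's own size, this sketch can compare a 5x5
position with a 7x7 position, which is the kind of cross-size lookup the project relies on.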
Once these values are returned to searchForMove(), the possible board states are sorted
based on both the score differential made by each move and the similarity of the board position.
The best move is returned to MainAI(), to be compared with the move chosen by the parent
heuristic function.
In order to encourage growth through Machine Learning, extra weight is placed on the
algorithm that searches through the database over the parent heuristic function.
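The final choice between the database move and the heuristic move can be sketched as a weighted
comparison. The way score differential and similarity are combined (a simple product), the
default weight of 1.5, and the name choose_move are all assumptions for illustration; the text
only states that the database search is given extra weight.

```python
def choose_move(heuristic_move, heuristic_score, candidates, db_weight=1.5):
    """Pick between the heuristic's move and the best database move.

    `candidates` is a list of (move, score_gain, similarity) tuples, with
    similarity scaled to the range 0..1. Returns (move, source).
    """
    if not candidates:
        return heuristic_move, "heuristic"
    # Rank database moves by score gain discounted by board similarity.
    ranked = sorted(candidates, key=lambda c: c[1] * c[2], reverse=True)
    move, gain, similarity = ranked[0]
    # The database value is boosted by db_weight before the comparison.
    if db_weight * gain * similarity > heuristic_score:
        return move, "database"
    return heuristic_move, "heuristic"
```

A weight above 1 biases play toward remembered moves, while a weight of 1 treats both sources
equally. In a full program the chosen move would still need to pass the legality checks.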
Results:
I made three comparisons in the study. The first pitted the parent function
against the Machine-Learning function. In the first trial, the parent function was black and the
Machine-Learning function was white. The parent function won consistently, even though the
Machine-Learning function changed its moves slightly from game to game. In the next evolution,
with the parent function as white and the Machine-Learning function as black, the Machine-Learning
function consistently beat its parent function, and even found a different way to beat it.
After several repeat trials, there was little change in how the function evolved.
When I switched back to the original pairing, with Machine-Learning as white and
the parent function as black, while retaining the database logs, the Machine-Learning function
‘learned’ how to avoid being killed completely by black, though it never gained ground in
beating it.
[Figure: pink shows the trained Machine-Learning function; blue shows the initial function without experience.]
The second study examined how technique carried over from 5x5 boards to 7x7 boards. In a
straight 7x7 match with no experience, the Machine-Learning function (white) lost completely
for the first two rounds, and then learned by itself to retain about half the board. However, in a
7x7 match with 5x5 experience (winning experience from when Machine-Learning was black), it won
the game on the first try by a little more than half the board. After one game of 7x7 experience,
the Machine-Learning function failed utterly, losing the entire board. For subsequent evolutions,
the same result occurred, though I noticed a strong structure in the early game play (the late
game play then failed).
The final study was the effect of human moves in 5x5 games. I entered several
human vs. computer games and found that when evolution had stalled, the addition of
human games encouraged different lines of play, many of which led to victories. In all
cases, human input made some difference.
Discussion:
One interesting finding was that the Machine-Learning algorithm inherited game-winning
techniques. When placed in a better situation, it learned how to win, and then made
the best of worse situations, as in the example where the Machine-Learning function started
off as black. In most cases, I found that machine learning, when based upon its parent function,
degrades in performance over time. When experimenting with various constants, I found
that after increasing the weight of database-backed moves, the 5x5 Machine-Learning function,
which had previously won consistently with black, learned how to lose with black. The
introduction of human input was the only way to stop consistent losses.
It was also interesting to see how the 5x5 game evolved to the 7x7 game. After the first
winning game, I could see a definite structure to the Machine-Learning function's play as it was
winning by a huge margin. However, it wasn’t able to finish the job, and slowly lost its place.
Conclusion and Recommendations:
Machine-Learning algorithms add much more variability to play than
deterministic heuristic functions. The performance of the Machine-Learning function varied
quite a bit while the performance of the parent heuristic function remained nearly constant. The
souring of evolution that occurred when the Machine-Learning function faced the same
opponent again and again can be compared to fish in a pond, where homogeneity in gene
structure leads to crisis and non-adaptability. When human games were added to the database,
the AI was able to play a much wider variety of moves when placed in different situations.
There were many flaws in my program, due to time constraints, that I think could be
improved upon. The major flaw in the machine learning was that it did not look past the first
move in terms of heuristic scoring. Even a 2-3 ply search would likely yield much better results,
especially in Go, where a capture can alter the entire game. Another area that could
be improved upon is the function that compares boards. Instead of a 2x2 square, a 3x3
square would likely give more accurate comparisons.
Further research should test the evolution of the
Machine-Learning heuristic function against a variety of different heuristic functions. It would be
interesting to see how the Machine-Learning function adapts to various circumstances rather than
to the same opponent.
References/Sources:
“Evolving a Roving Eye for Go.” http://nn.cs.utexas.edu/downloads/papers/stanley.gecco04.pdf
“Computer Go: an AI Oriented Survey.” http://www.ai.univ-paris8.fr/~cazenave/CGAISurvey.pdf
“Garry Kasparov.” http://en.wikipedia.org/wiki/Gary_Kasparov
“The Many Faces of Go.” http://www.smart-games.com/manyfaces.html
http://research.microsoft.com/displayArticle.aspx?id=1062
http://en.wikipedia.org/wiki/Go_%28board_game%29
http://www.aaai.org/AITopics/html/go.html
http://www.newscientist.com/article.ns?id=dn6914
http://www.cs.dartmouth.edu/~brd/Teaching/AI/Lectures/Summaries/learning.html#Definitions
http://www.scism.sbu.ac.uk/inmandw/review/ml/review/rev6542.html