Thomas LEMAIRE
ID : 110237824
E-Mail : tlemaire59@yahoo.fr
Exchange Student
Project CS 526
Probabilistic Reasoning applied to Draughts
Winter 2003
Contents
I. Introduction
II. Interests
III. Machine Learning in Games
IV. History of machine learning in Draughts
V. A view of the Probabilistic Algorithms applied to Draughts
   A. Temporal Difference Learning
   B. Artificial Neural Network
   C. Functions applied to units
   D. Reinforcement Learning and TD Learning
VI. NeuroDraughts
   A. Architecture
   B. Training with NeuroDraughts
   C. Through self play
VII. Performance results from the paper
   A. Human playing against the network
   B. Tournament
VIII. Further Topics
   A. Development of NeuroDraughts
   B. Application
IX. Conclusion
X. Research Papers ordered by relevance
I. Introduction
The temporal difference family of reinforcement learning procedures has been
applied successfully to a number of domains, including game playing. Thanks to
TD learning, a system can select actions according to how they contribute to the
realization of some desired future outcome. TD learning is the basis of several
algorithms that allow a program to learn by itself, and it belongs to the family of
probabilistic algorithms.
This project presents an overview of a probabilistic algorithm applied to the game
of Draughts: NeuroDraughts. This algorithm combines an artificial neural network
with TD learning. The game is difficult enough to make probabilistic algorithms
worthwhile and to produce interesting results. Indeed, programming Draughts
software has always been hard because the apparent simplicity of the rules hides
a very complex game. A good player is able to change the pattern of a game as
he wishes and thereby disturb the program. Probabilistic reasoning occurs when
we need to choose the best move without knowing how the opponent will reply:
we should estimate the probabilities of the opponent's next moves and compute
our best move given those probabilities. This is the basic reasoning behind the
problem. The project illustrates this kind of algorithm through the work of Mark
Lynch.
II. Interests
I would like to explain why I chose this project on probabilistic algorithms
applied to the game of draughts (or checkers). I am an experienced draughts
player (I took part in 4 World Championships for young players), and I wanted to
see how a program can be built so that it can win against confirmed players.
Nevertheless, for the moment there is no program that is truly superior to the
best players (even if, up to a certain level, a program seems to be better), and I
wanted to review one of the best algorithms that tries to make a program
stronger. Moreover, my level of play enables me to analyse the performance of
these algorithms.
III. Machine Learning in Games
Strategy games have always been a privileged domain for research in artificial
intelligence.
Since the first algorithms (at the beginning of the 1940s), a lot of progress has
been made. In Othello, a very calculation-heavy game, programs can now beat
any human player. In chess, research has not ended, even though IBM's Deep
Blue beat the world champion, Garry Kasparov, in 1997. Indeed, in 2002 the
world champion, Vladimir Kramnik, played against the program Deep Fritz,
which was able to analyse about 3 million positions per second against roughly
200 million for Deep Blue, although Deep Fritz had far fewer processors than
Deep Blue (around 1000 processors).
This confrontation ended in a draw. The game of Go, an Asian game, remains
extremely difficult for computers for the moment. The kinds of intelligence
required for this game (pattern recognition, understanding of global objectives,
and prioritisation of objectives) are difficult to program, so the best programs are
still beaten by fairly weak players.
Finally, in Backgammon, Poker and Bridge, which involve a notion of probability,
programs play at the level of the best players.
IV. History of machine learning in Draughts
I will begin this report with a brief overview of research in Machine Learning
applied to Draughts. The real beginning for both chess and draughts programs
may be dated to March 9, 1949, when Claude Shannon, a researcher at Bell
Telephone Laboratories, presented a seminar paper about how to program a
computer to play Chess.
Then computers appeared and made it possible to put Shannon's theories into
practice. Within a few years computers became stronger, and in 1992 Chinook,
an American-draughts (checkers) program developed from 1988 at the
University of Alberta, won against all the players in the World Championship but
lost against Marion Tinsley (4 losses, 2 wins, 33 draws).
In 1995, Chinook became world champion by beating Don Lafferty (1 victory, 31
draws).
At the same time, other programs were being developed for other variants of
draughts. Nowadays a project called Buggy, developed by French and Dutch
players, manages to win against the 11th-ranked player in the world. This
program is currently the strongest one (world champion among draughts
programs), and it is continually developed with the goal of beating every player.
Among other things, it uses full and lazy evaluation functions and pattern (shot)
recognition. The matches between humans and machines point out how difficult
it is to program this game and to find efficient algorithms.
V. A view of the Probabilistic Algorithms applied to Draughts
A. Temporal Difference Learning
The TD(lambda) family of learning procedures has been widely applied to game
playing. TD is well suited to games because it updates its prediction at each time
step towards the prediction at the next step. In other words, it predicts the
relative value of moves by evaluating the state of the game after each move is
played. Moreover, no game record needs to be kept, because only the current
state and the previous evaluation are needed to calculate the error and the new
evaluation at any particular stage. Most importantly, TD can improve the
program's play by letting it play against itself: it can run for thousands of games,
learn, and so become a strong player. TD learning thus provides a computational
mechanism that allows the positive or negative reward given at the end of a
game to be passed back in time to earlier moves.
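As a minimal illustration of this mechanism, the sketch below applies the simplest member of the family, TD(0), to a toy lookup table of predictions. NeuroDraughts itself uses TD(lambda) with a neural network; the state names and the learning rate here are assumptions made only for the example.

    # Minimal TD(0) sketch: every prediction is moved towards the prediction
    # at the next step; the terminal reward (+1 win, -1 loss) is the last target.
    ALPHA = 0.1                                   # learning rate (illustrative)

    def td0_update(values, episode, reward):
        """values: dict state -> predicted outcome in [-1, 1]
           episode: ordered list of states visited during one game
           reward: +1.0 for a win, -1.0 for a loss"""
        for current, nxt in zip(episode, episode[1:]):
            error = values.get(nxt, 0.0) - values.get(current, 0.0)
            values[current] = values.get(current, 0.0) + ALPHA * error
        last = episode[-1]
        values[last] = values.get(last, 0.0) + ALPHA * (reward - values.get(last, 0.0))
        return values

    # After many games the final reward "flows back" to earlier states:
    v = {}
    for _ in range(1000):
        v = td0_update(v, ["opening", "middlegame", "ending"], reward=1.0)
    print(v)   # all three predictions drift towards +1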
B. Artificial Neural Network
Samuel pioneered the idea of updating evaluations based on successive
predictions in a checkers program. He used a polynomial evaluation function
(adjusting its coefficients) as opposed to the Artificial Neural Network (ANN)
used in NeuroDraughts (adjusting its weights). But Samuel's work helped a lot in
designing a good feature set for NeuroDraughts and produced many results that
are references in Machine Learning.
An ANN of this kind is a multi-layer perceptron (MLP), which consists of a layer
of inputs, a layer of hidden units and a layer of outputs. Each input is connected
to each hidden unit and each hidden unit is connected to each output. It can be
used to approximate functions.
Training such an MLP is more complex because of the addition of the hidden
layer.
C. Functions applied to units
Some functions are applied to the units of the ANN.
A squashing function is used to control the range of values coming out of each
hidden unit; it keeps the values within a fixed range. The hyperbolic tangent was
chosen for NeuroDraughts, so the values are kept between -1 and +1, because
tanh is a bijective function from ]-infinity, +infinity[ to ]-1, +1[ (-1 for a loss, +1 for
a win).
Moreover, in order to update the weights, backpropagation is used: the error
signal of the network is computed and a linear update rule changes the weights
between the output, hidden and input layers.
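As a concrete illustration, here is a minimal sketch of such a network with one hidden layer, tanh squashing on every unit, and a single backpropagation weight update towards a target evaluation. The layer sizes, the learning rate and the function names are assumptions made for the example, not the actual NeuroDraughts code.

    import numpy as np

    rng = np.random.default_rng(0)
    N_IN, N_HIDDEN = 32, 10                        # illustrative layer sizes
    W1 = rng.normal(0, 0.1, (N_HIDDEN, N_IN))      # input  -> hidden weights
    W2 = rng.normal(0, 0.1, (1, N_HIDDEN))         # hidden -> output weights
    LR = 0.05                                      # learning rate (assumed)

    def forward(x):
        """Evaluate a board vector x; tanh keeps every unit in ]-1, +1[."""
        h = np.tanh(W1 @ x)
        y = np.tanh(W2 @ h)
        return y, h

    def backprop_step(x, target):
        """One backpropagation update of both weight layers towards `target`."""
        global W1, W2
        y, h = forward(x)
        err_out = (target - y) * (1 - y ** 2)      # tanh'(a) = 1 - tanh(a)^2
        err_hid = (W2.T @ err_out) * (1 - h ** 2)
        W2 += LR * np.outer(err_out, h)
        W1 += LR * np.outer(err_hid, x)
        return float(y[0])

    x = rng.uniform(-1, 1, N_IN)                   # a fake encoded board
    print(backprop_step(x, target=1.0))            # prediction before the update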
D. Reinforcement Learning and TD Learning
In reinforcement learning the learner is rewarded for performing well (winning)
and given negative reinforcement for performing badly (losing). Between the
starting board and the final board, when no explicit reward is available, the TD
mechanism tries to update the prediction for the current state towards the
prediction for the next state. For every non-terminal board state, the program
forms a training pair between the current board state and the prediction of a win
for the next state. In this sense TD can be viewed as an extension of
backpropagation.
An important advantage of TD is that we do not need to wait until the final
outcome to train, which means that only one board state has to be kept in
memory. This is also beneficial for learning on multi-processor machines: one
thread could be working out the next move while another trains on the current
move.
TD considers a state good if the most likely state following it is good, and vice
versa. So, after playing many games one can expect the TD-trained network to
be a very accurate predictor of winning and losing.
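To make the memory point concrete, the fragment below sketches one game of such online training, where each board's prediction is trained towards the prediction for the board that follows it and only one past board is ever held. The `predict`/`train_towards` pair is a hypothetical stand-in for the network, not the real NeuroDraughts interface, and the toy dictionary exists only so that the sketch runs.

    def td_train_one_game(predict, train_towards, boards, final_reward):
        """boards: iterator yielding board states as the game is played.
        Forms a TD training pair (previous board, prediction for the next
        board) at every step, keeping only the previous board in memory."""
        prev = next(boards)
        for board in boards:
            train_towards(prev, target=predict(board))    # non-terminal pair
            prev = board
        train_towards(prev, target=final_reward)          # terminal reward

    # Toy dictionary "network":
    values = {}
    predict = lambda b: values.get(b, 0.0)
    def train_towards(b, target, lr=0.1):
        values[b] = predict(b) + lr * (target - predict(b))

    td_train_one_game(predict, train_towards, iter(["b0", "b1", "b2"]), 1.0)
    print(values)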
VI. NeuroDraughts
A. Architecture
In the implementation, the evaluation network and the temporal difference
procedure are incorporated into a single structure that represents the learner.
The mapping of boards to inputs is handled so that there is a direct relation
between the board state and the neural network. The network consists of a layer
of inputs, a hidden layer and a single output unit, and it is trained using
backpropagation. It is trained to evaluate boards after the player has moved.
To represent the board, note that each playable square can hold one of five
contents. So the board can be encoded as a set of squares where 0 means
empty, 0.25 a black man, 0.5 a red man, 0.75 a black king and 1 a red king.
There are other ways to represent the board: one square per 3 inputs, or a board
represented by a given number of features to be described.
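A minimal sketch of this first encoding is given below; the piece symbols and the 32-square list are assumptions chosen for the example, only the 0/0.25/0.5/0.75/1 values come from the paper.

    # Map the contents of each of the 32 playable squares to one input value.
    PIECE_VALUE = {
        None: 0.0,     # empty square
        "bm": 0.25,    # black man
        "rm": 0.5,     # red man
        "bk": 0.75,    # black king
        "rk": 1.0,     # red king
    }

    def encode_board(squares):
        """squares: list of 32 entries, each None or one of the symbols above.
        Returns the 32 input values fed to the network."""
        return [PIECE_VALUE[s] for s in squares]

    # Example: the standard starting position (12 black men, 12 red men).
    start = ["bm"] * 12 + [None] * 8 + ["rm"] * 12
    print(encode_board(start))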
B. Training with NeuroDraughts
Several training methods can be applied to NeuroDraughts. The first is the
straight-play strategy, which lets two opponents play against each other, both
learning, for a set number of games. With this method it is impossible to
measure improvement, because no benchmark is available at any stage. The
second is cloning: a training network plays against a non-training network, and if
the training network wins a certain percentage of games (80%-90%), the
non-training network copies its weights. This method alone is not reliable,
because if the training net wins the last 5 games after losing the first 45 it is still
cloned; so even though it is improving, a much poorer network may in fact be
cloned. Another strategy is to train on expert play, but expert games almost
always finish with six or more pieces left on the board, so on its own this is a
poor choice for NeuroDraughts.
Self play is probably the best method of training, as it is fully automated and
should consistently improve from generation to generation. Applying look-ahead
(seeing 2 moves ahead) reveals a lot, even though for confirmed players seeing
only 2 moves ahead is very weak (on average they can see 9-10 moves ahead).
Finally, training against a human player could be beneficial if this player is of a
high standard.
C. Through self play
This part describes some points of the self-play training. The basis of self play is
the tournament: a set of games during which training occurs is followed by two
test games, which show whether or not the level of play has improved enough to
beat its clone. If this is the case, the weights of the training net are copied to the
non-training opponent. The rules-based move choice is replaced by the neural
net: each possible board is generated from the current position and then
evaluated, and the move with the highest evaluation is passed back to the
player. At the end of the game, depending on who has won, a reward is given to
the network being trained, so its behaviour is either positively or negatively
reinforced. Nevertheless, self play has a big drawback: if a network plays absurd
moves and "thinks" they are good, these bad moves will not be removed by the
network itself, so some manual human intervention in the network may be
necessary.
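The move-selection step described above can be sketched as follows; `legal_moves`, `apply_move` and the evaluation call are hypothetical helpers standing in for the real engine, and the toy definitions exist only so that the sketch runs.

    def choose_move(evaluate, board, legal_moves, apply_move):
        """Generate every board reachable from `board`, evaluate each one,
        and return the move whose resulting board scores highest."""
        scored = [(evaluate(apply_move(board, m)), m) for m in legal_moves(board)]
        return max(scored)[1]

    # Toy stand-ins: a "board" is a number, a "move" adds to it, and the
    # "network" simply prefers larger numbers.
    toy_evaluate = lambda b: b / 10.0
    toy_moves = lambda b: [+1, -1, +2]
    toy_apply = lambda b, m: b + m
    print(choose_move(toy_evaluate, 3, toy_moves, toy_apply))   # -> 2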
VII. Performance results from the paper
A. Human playing against the network
At the beginning, I thought it would be nice to run the experiments myself. I
wanted to use a program written in Turbo Pascal provided by a French
programmer, but it turned out to be too difficult to use (no comments, a single
file). So I decided to use the program that M. Lynch provides with his paper.
Before playing against the program, I had to create a network using the Expert
Trainer and copy it, in order to evaluate the performance of the program.
I have to note that the program uses the rules of checkers and not international
draughts, and even if only the number of pieces differs, I admit that the strategy
is not really the same. So I trained a network extensively, using many of the
functions provided by the program. I have to admit that the program works very
well tactically (it easily sees every way to win pieces): I intentionally made some
mistakes and the program spotted them quickly. On the other hand, like all other
programs it has some problems in strategy, and in a certain kind of game I
managed to win every time. Even so, given how easily it can look ahead, the
strategy of this program is quite interesting; some moves were absurd, but by
playing many more games this point could be improved. Moreover, we could
inspect the network to see where the fault lies.
B. Tournament
From an objective point of view, the paper runs sequences of 2000 training
games and evaluates them with the following method.
First, a tournament between networks is held: all the cloned networks and the
final nets produced within each training sequence are played against each other,
and the best net and the final net of each training sequence are chosen, so each
training sequence is represented by two nets.
These then play in another tournament in which each network plays 2 matches
against every other network trained with the same representation (self play, dual,
expert). Finally, the best 10 nets for each representation are chosen and play in
a cross-representational tournament in which each net plays 58 matches (29x2).
The result is that the networks trained on expert games produced the strongest
set of networks: they won or drew 88% of their games, as opposed to 67% for
dual play and 66% for self play. So the best training remains expert games, even
though those games stop while 6-7 pieces remain on the board. Nevertheless,
continuing with self play afterwards can improve the network further, towards a
network able to beat very good players.
Even so, I do not think we could obtain a network efficient enough to beat any
player with NeuroDraughts alone; by adding it to an evaluation-based program it
might be possible, and this is a good direction to explore.
VIII. Further Topics
A. Development of NeuroDraughts
The aim of such a program is to beat its creator, and this the program achieves
easily. The program is accomplished from both a technical and a theoretical
point of view. It will however remain in development, with the emphasis on
reducing the domain-specific knowledge while keeping the level of play constant.
In the future, genetic algorithm technology could be used to try to discover
features automatically. Moreover, a search algorithm such as MiniMax could be
added, as sketched below: a search strategy which is a simple model of the
probable immediate outcomes of a move.
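As an illustration of that last point, here is a minimal depth-limited MiniMax sketch; the evaluation function, the move generator and the depth are assumptions made for the example, and NeuroDraughts itself does not include this search.

    def minimax(board, depth, maximizing, evaluate, legal_moves, apply_move):
        """Return the best achievable evaluation looking `depth` plies ahead."""
        moves = legal_moves(board)
        if depth == 0 or not moves:
            return evaluate(board)        # e.g. the neural-net evaluation
        children = (minimax(apply_move(board, m), depth - 1, not maximizing,
                            evaluate, legal_moves, apply_move) for m in moves)
        return max(children) if maximizing else min(children)

    # Toy stand-ins so the sketch runs (a "board" is again just a number):
    evaluate = lambda b: b
    legal_moves = lambda b: [1, -1] if abs(b) < 3 else []
    apply_move = lambda b, m: b + m
    print(minimax(0, depth=4, maximizing=True, evaluate=evaluate,
                  legal_moves=legal_moves, apply_move=apply_move))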
B. Application
This fairly new method could be applied to some existing draughts programs.
Indeed, an international project (mainly French) named "Buggy" aims to beat the
best players in the world. Of course, this is still impossible today because of the
complexity of the game, but Buggy tried in March to beat the 11th-ranked player,
and it does not yet use NeuroDraughts, which could improve it.
For the moment Buggy uses a move generator, a search algorithm and an
evaluation function. With the first two components alone, the program can beat
99% of players: its tactics are very strong thanks to its calculation capacity. On
the other hand, to reach a really high level the program must understand the
game. For that, it must acquire knowledge linked to the game and its theory, that
is, the strategy, so the programmer must build an evaluation function as accurate
as possible. But draughts is so rich and complex that it is extremely difficult to
convey even simple notions to the program, so it is always possible to find
positions that the software cannot play correctly.
By incorporating NeuroDraughts into Buggy, we could improve the evaluation
function of this software. For the moment the program has a kind of database of
the different systems of play, and it evaluates positions using this database
together with complete and incomplete evaluation functions; depending on the
kind of system of play, it can estimate the expected outcome of the board.
NeuroDraughts could be used at the point where the board is evaluated, and so
could improve the program.
In a word, applying NeuroDraughts to an existing program would bring the
aspect of a program that learns by itself, which is not common today: current
programs use databases of positions to evaluate a position rather than truly
probabilistic notions.
IX. Conclusion
Even though I had planned to implement the project myself, I was not able to do
so, for lack of the source of an original draughts program; moreover, Lynch's
program does not come with its source code. But I was able to test
NeuroDraughts, which appears as strong tactically as most existing draughts
software and better than other programs strategically.
This project gives an overview of techniques that use reinforcement learning to
improve the artificial intelligence of strategy games, and especially draughts.
Even if the technique seems quite complicated at first, it is simple enough to be
understood, so it could be proposed to the "Buggy" project in order to improve
that program. Moreover, we could use this technique in other strategy games.
Draughts is the most advanced domain in research on learning in games,
because it is the most complicated game to compute (after Go), which makes
research in draughts all the more interesting. We can add that the more a
program plays, the more it learns.
X. Research Papers ordered by relevance
"An application of Temporal Difference Learning to Draughts", Mark Lynch
"NeuroDraughts: the role of representation, search, training regime and architecture in a TD draughts player", N.J.L. Griffith
"Inductive Inference of Chess and Draughts Player Strategy", Anthony R. Jansen, David L. Dowe, and Graham E. Farr
"Experiments with a Bayesian game player", Warren D. Smith, Eric B. Baum
"Exact probabilistic analysis of error detection for parity checkers", V.A. Vardanian