G13GAM -- Game Theory Materi Pendukung : T0264P16_5 Alpha-beta pruning

advertisement
Materi Pendukung : T0264P16_5
G13GAM -- Game Theory
Copyright © Dr A. N. Walker, 1997.
Alpha-beta pruning
Alpha-beta pruning is a potentially hard concept because, at one level, it's counterintuitive. We can understand that we don't want to look at bad moves, why is it that we
don't want to look at good moves? The first partial answer is that this is the other side of
the same coin -- a move that is good for me is bad for you. The second partial answer is
that it isn't that we don't want to know about good moves, rather that it's their existence
not their details that matter. The third partial answer is that we don't need so much good
moves as good enough moves; it is because `good enough' moves are much easier to find
than `good' moves that this technique makes a huge improvement to any game-playing
program.
What is alpha-beta pruning?
[The name is essentially completely arbitrary -- the two variables concerned in a
computer program happened to be called alpha and beta, and we might just as well have
had fred-jim pruning, p-q pruning or foo-bar pruning.]
When we are called upon to analyse a position, we will be given two `levels', alpha and
beta. These are the `floor' and `ceiling', respectively, of the range of values our caller is
interested in. Think of the beta level as representing a value which is `good enough'; we'll
worry later about what `good enough' really means, and also about what the alpha level
means. Let us suppose that a full analysis would give us the picture shown to the right:
we see that of the nine possible moves, moves 1, 4, 5 and 9 are reasonably good, moves
2, 6 and 7 are so-so, move 3 is a disaster, and move 8 is the star move we would like to
find. These values could be found by static evaluation, or by dynamic [recursive]
evaluations looking deeper in the game tree.
The second picture shows what could happen for various possible beta levels. The top
[dotted] line shows a high level. In this case, almost no gain is made; none of the first
seven moves is `good enough', but move 8 at last goes over the line, and we can return its
value. If the line had been a little higher, then no move would have been over the line,
nothing would have been gained, and we would have had to do a full evaluation of all
nine moves. The middle [dashed] line shows a lower beta level. In this case, move 4 is
already `good enough'. We do not need even to glance at moves 5 to 9. We can stop the
analysis immediately, and return the value of move 4. The bottom [mixed] line shows an
even more favourable situation; here the very first move is `good enough', and we can
return its value without glancing at moves 2 to 9. This can be a huge saving of time.
We've saved 8/9 of the analysis already; but it can be better than this in the context of
dynamic analysis, for we can make similar savings at every level of the tree, equivalent
[in this example] to having a computer which is potentially 9×9×9×... times faster, and
the deeper we look, the bigger the advantage. Note that the advantage is maximised by
having the beta level as low as possible, and by having `good enough' moves as early in
the list of moves as possible.
Note however that once pruning has taken place, we are no longer returning the true
value of a position. The third picture shows this in the context of the middle [dashed] beta
level. The true value of this position is the value of the best move, move 8. But we are
returning the value of move 4, the first `good enough' move; we don't know whether or
not later moves will be even better, so all we can say is that the true value of this position
is at least the value of move 4. This is shown by the arrow. The [potentially huge]
savings made by pruning the moves in the blue box are [slightly] offset by the loss of
information and the return of a lower bound rather than a true value. Very often, the
lower bound is all we need, and the pruning is pure gain.
This leaves two questions. Where do the alpha and beta levels come from? And what is
the alpha level for? They come from whoever asked us to analyse this position; we in turn
will pass them on to any subsidiary [`dynamic'] analysis. Because positions are evaluated
from the point of view of the player to move, they will be `inverted' when they are passed
on -- the caller's `ceiling' is the callee's `floor', and vice versa. So the alpha level, the
`floor', will become the beta level, the ceiling, for the subsidiary analysis; the higher we
can make the floor, the lower will be the resulting ceiling, and so the more efficient will
be that analysis. The next picture, left, shows this in action. We have inherited alpha and
beta levels from our `parent'. After evaluating move number 1, we are no longer
interested in moves that are worse than this, so we can raise the floor to the
corresponding value. Moves 2 and 3 now fall below this floor; these values come from
the `children' discovering moves above their beta level, so all we can say is that these
values are upper bounds on the true values -- but even the upper bounds are below our
floor, so we don't need to know the true values, just that moves 2 and 3 are worse than
move 1. This is indicated by the arrows again; note that the tops of the arrows are higher
than the corresponding dots were in the previous pictures.
In summary, when we evaluate a move, one of three things will happen: (a) The value
lies between alpha and beta. In this case, this move is potentially interesting -- it may be
the best move in the position, and its value may be the value of the position as a whole.
We can raise the value of alpha to match, for we are no longer interested in exact values
of worse moves. (b) The value is at least beta. This move is `good enough', and causes a
`beta cutoff'. We return its value as a lower bound for the true value without bothering to
evaluate the remaining moves. This value will, after inversion, be below the floor for the
`parent'. The move causing the cutoff is sufficient to `refute' the move leading to this
position. (c) The value is at most alpha. This move has been refuted, possibly by a beta
cutoff in the `child'. This value is an upper bound, not the true value, which may be much
worse, but this will not affect the value of the current position.
Value of alpha-beta
Some [rather ignorant!] programmers treat alpha-beta pruning as an `optional extra', a
modest bonus that makes searching slightly more efficient at the expense of sophisticated
programming. Not so! Its use typically doubles the depth to which the program can
search. Since the tree is growing exponentially with increasing depth, this is equivalent to
an exponential growth in the speed with which the program can search.
However, the pruning is seen to full advantage only if (a) the beta levels are low enough
to be interesting, and (b) the first move tried is often `good enough'. So much effort has
gone in to ensuring that these two conditions hold.
An insight into the help that alpha-beta pruning gives can be had by imagining that
Kasparov is sitting next to you suggesting good moves to try. When you are doing well,
and when his suggestions are good, you do not need to consider anything beyond his
ideas, and your analysis can be conducted much more efficiently -- a beta cutoff! If his
suggestions are bad, then you will have to consider other ideas as well [this is point (b)
above!], but even rotten suggestions do not cost you anything compared with no
suggestions at all If your position is bad, then his suggestions do not [indeed, cannot]
help; all the evaluations will be too low to cause a cutoff; but even in this case, his
suggestions will raise the floor as far as it can be raised [if he gives you the best ideas
first], so that deeper analysis will see a lower ceiling and easier cutoffs [this is point (a)
above]. In most lines, with best play [as suggested by Kasparov!], good positions and bad
positions will alternate [my piece up is your piece down!]; alpha-beta pruning causes
very quick cutoffs in the good positions, which is why you can search twice as deeply. Of
course, in real life your computer does not have access to Kasparov; it has to do the best
it can with its own analysis.
Refinements
The `correct' way to use alpha-beta pruning is to start with alpha set to minus infinity,
beta to plus infinity, and to rely on the natural `raising of the floor', seen above, to make
the bounds more useful. However, this creates a tension with the observation above that
the pruning works best if the bounds are as tight as possible. So other strategies are
possible.
Essentially, in all of these, we lie to our children about our proposed alpha-beta bounds.
Our children must, however, tell us the truth, as outlined above: if the value is between
alpha and beta, they must return the correct value, otherwise they return an upper or
lower bound. Our lies mean that we place the bounds closer than they should have been,
and therefore that the search is more efficient. The penalty is that sometimes we are
caught out. Even so, we will quite often get away with it; if the returned value is outside
the true bounds, then we only need the upper or lower bound anyway. If however it is
inside the true bounds but outside the fraudulent bounds, then we have to ask our children
to look again, and to refine their evaluations. So, the search is made sometimes more
efficient and sometimes less. Where the balance lies is a matter of pragmatism and
experiment.
Once this viewpoint is accepted, then describing this as `lying' carries unwanted moral
overtones. We should instead re-define the purpose of the analysis as being to provide a
value which is either `guaranteed' or else merely a bound, as described above. Then we
can ask our children questions with a clear conscience!
Effectively, we are asking of the analysis questions like `In this position, is White
between 1 and 2 pawns up?', and getting answers like `Yes, White is 1.376 pawns up' or
`No, White is at least 2.3 pawns up', or `No, White is no more than 0.52 pawns up'. [All
of these questions and answers relative to the requested depth of analysis, of course.]
These questions can be answered much more efficiently than the blunt `How many pawns
up is White?', and the answers are usually, but not always, sufficient. When they are not
sufficient, we have to ask again, `OK, is White between 2.3 and 3 pawns up', and balance
these extra questions against the extra efficiency.
Download