Games, Theory and Application
Jaap van den Herik and Jeroen Donkers
Institute for Knowledge and Agent Technology (IKAT)
Department of Computer Science
Universiteit Maastricht
The Netherlands

SOFSEM 2004
Prague, Hotel VZ Merin
Sunday, January 25, 8.30-10.00h
Contents
• Games in Artificial-Intelligence Research
• Game-tree search principles
• Using opponent models
• Opponent-Model search
• Gains and risks
• Implementing OM search: complexities
• Past future of computer chess
Games, such as Chess, Checkers, and Go,
are an attractive pastime and scientifically interesting
Chess
• Much research has been performed in Computer Chess
• Deep Blue (IBM) defeated the world champion Kasparov in 1997
• A micro computer better than the world champion?
World Champion Programs
Year  City        Program
1974  Stockholm   KAISSA
1977  Toronto     CHESS
1980  Linz        BELLE
1983  New York    CRAY BLITZ
1986  Cologne     CRAY BLITZ
1989  Edmonton    DEEP THOUGHT
1992  Madrid      REBEL
1995  Hong Kong   FRITZ
1999  Paderborn   SHREDDER
2002  Maastricht  JUNIOR
2003  Graz        SHREDDER
2004  Ramat-Gan   ?
International Draughts
• Buggy is the best draughts program
• Humans are better than the computer, but the margin is small
• Challenge: more knowledge in the program
Go
• Computer Go programs are weak
• Problem: recognition of patterns
• Top Go programs: Go4++, Many Faces of Go, GnuGo, and Handtalk
Awari
• A Mancala game (pebble-and-hole game)
• The game is a draw
• Solved by John Romein and Henri Bal (VU Amsterdam) in 2002
• All positions (900 billion) computed in 51 hours on a cluster of 144 1-GHz processors with 72 GB RAM
• Proven that even the best computer programs still make many errors in their games
Scrabble
• Maven beats every human opponent
• Author: Brian Sheppard
• Ph.D. (UM): “Towards Perfect Play of Scrabble” (2002)
• Maven can play Scrabble in any language
Overview
Solved: Connect-Four, Qubic, Go-Moku, Renju, Kalah, Awari, Nine Men’s Morris
Super human: Checkers (8x8), Gipf, Othello, Scrabble, Backgammon, Lines of Action, Bao
World champion: Chess, Draughts (10x10)
Grand master: Amazons, Bridge, Chinese Chess, Hex, Poker
Amateur: Go, Arimaa, Shogi
Computer Olympiad
• Initiative of David Levy (1989)
• Since 1989 there have been 8 olympiads: 4x London, 3x Maastricht, 1x Graz
• Goal:
  • Finding the best computer program for each game
  • Connecting programmers / researchers of different games
• Computers play each other in competition
• Demonstrations:
  • Man versus Machine
  • Man + Machine versus Man + Machine
Computer versus Computer
Computer Olympiad
• Last time in Graz (2003)
• Also World Championship Computer Chess and World Championship Backgammon
• 80 participants in several categories
• Competitions in the olympiad’s history:
  Abalone, Awari, Amazons, Backgammon, Bao, Bridge, Checkers, Chess, Chinese Chess, Dots and Boxes, Draughts, Gipf, Go-Moku, 19x19 Go, 9x9 Go, Hex, Lines of Action, Poker, Renju, Roshambo, Scrabble, and Shogi
UM Programs on the Computer Olympiads
Gipfted - Gipf
MIA - LOA
Anky - Amazons
Bao 1.0 - Bao
Magog - Go
Computer Game-playing
• Can computers beat humans in board games
like Chess, Checkers, Go, Bao?
• This is one of the first tasks of Artificial
Intelligence (Shannon 1950)
• Successes obtained in Chess (Deep Blue),
Checkers, Othello, Draughts, Backgammon,
Scrabble...
Computer Game-Playing
• Sizes of game trees:
  • Nim-5: 28 nodes
  • Tic-Tac-Toe: 10^5 nodes
  • Checkers: 10^31 nodes
  • Chess: 10^123 nodes
  • Go: 10^360 nodes
• In practice it is intractable to find a solution with minimax: so use heuristic search
Three Techniques
• Minimax Search
• α-β Search
• Opponent Modelling
Game-playing
• Domain: two-player zero-sum game with
perfect information (Zermelo, 1913)
• Task: find best response to opponent’s moves,
provided that the opponent does the same
• Solution: Minimax procedure (von Neumann,
1928)
Minimax Search
Nim-5
Players in turn remove 1, 2, or 3 matches.
The player who takes the last match wins the game.
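These rules translate directly into minimax. The following sketch (an illustration, not code from the talk) computes the game value of Nim-5 from the viewpoint of player +1:

```python
# Minimax for Nim-5: players alternately remove 1, 2, or 3 matches;
# whoever takes the last match wins. Value is from player +1's viewpoint.
def minimax(matches, to_move):
    if matches == 0:
        # The previous player took the last match and won,
        # so the player now to move has lost.
        return -to_move
    values = [minimax(matches - take, -to_move)
              for take in (1, 2, 3) if take <= matches]
    return max(values) if to_move == +1 else min(values)

print(minimax(5, +1))  # prints 1: Nim-5 is a first-player win
```

Taking one match leaves the opponent with 4, from which every reply can be answered by taking the remainder.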
Minimax Search
[Figure: the full Nim-5 game tree; node labels give the matches remaining after each move, and the leaves carry pay-offs +1 or –1 for player 1.]
Minimax Search
[Figure: the same Nim-5 tree after minimaxing; the +1/–1 pay-offs are backed up level by level, giving value +1 at the root.]
Pruning
• You do not need the total solution to play (well): only the move at the root is needed
• Methods exist to find this move without a need for a total solution: pruning
  • Alpha-beta search (Knuth & Moore ’75)
  • Proof-number search (Allis et al. ’94), etc.
• Playing is solving a sequence of game trees
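The alpha-beta idea can be sketched over a small explicit tree (an assumed illustration, not the authors' implementation); on the two-level example used later in the talk it prunes the final leaf:

```python
# Alpha-beta over an explicit game tree given as nested lists;
# numbers are leaf pay-offs.
def alphabeta(node, alpha, beta, maximizing):
    if not isinstance(node, list):          # leaf: a numeric pay-off
        return node
    if maximizing:
        value = float('-inf')
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:               # beta cut-off
                break
    else:
        value = float('inf')
        for child in node:
            value = min(value, alphabeta(child, alpha, beta, True))
            beta = min(beta, value)
            if alpha >= beta:               # alpha cut-off
                break
    return value

# MAX root over two MIN nodes: after seeing the leaf 2, the leaf 7 is pruned.
print(alphabeta([[3, 4], [2, 7]], float('-inf'), float('inf'), True))  # prints 3
```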
Heuristic Search
[Figure: a truncated game tree; heuristic values such as 0.25, 0.33, and 0.5 replace the exact pay-offs at the cut-off depth.]
• Truncate the game tree (depth + selection)
• Use a (static heuristic) evaluation function at the leaves to replace pay-offs
• Minimax on the reduced game tree
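The truncate-and-evaluate scheme can be sketched as follows; the game interface (`moves`, `apply_move`, `evaluate`) is an assumption made for illustration:

```python
# Depth-limited minimax: truncate the tree at a fixed depth and score the
# frontier with a static heuristic evaluation function.
def heuristic_minimax(state, depth, maximizing, moves, apply_move, evaluate):
    legal = moves(state)
    if depth == 0 or not legal:             # cut-off or terminal: static eval
        return evaluate(state)
    values = [heuristic_minimax(apply_move(state, m), depth - 1,
                                not maximizing, moves, apply_move, evaluate)
              for m in legal]
    return max(values) if maximizing else min(values)

# Toy usage: the "state" is a nested list and each child is itself the move.
value = heuristic_minimax([[3, 4], [2, 7]], 4, True,
                          moves=lambda s: s if isinstance(s, list) else [],
                          apply_move=lambda s, m: m,
                          evaluate=lambda s: s)
print(value)  # prints 3
```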
Heuristic Search
• The approach works very well in Chess, but...
• Is solving a sequence of reduced games the best way to win a game?
  • Heuristic values are used instead of pay-offs
  • Additional information in the tree is unused
  • The opponent is not taken into account
• We aim at the last item: opponents
Minimax
[Figure: a two-level minimax tree; the MIN nodes evaluate to 3 (from leaves 3 and 4) and 2 (from leaves 2 and 7), so the MAX root has value 3.]
α-β algorithm
3
3
β-pruning
2
3
Sofsem'04
4
2
7
26
The strength of α-β
[Figure: a larger tree illustrating the strength of α-β: more than a thousand prunings.]
The importance of α-β algorithm
3
3
3
Sofsem'04
β-pruning
4
2
28
The Possibilities Of Chess
THE NUMBER OF DIFFERENT, REACHABLE POSITIONS IN CHESS IS (CHINCHALKAR): 10^46
A Clever Algorithm (α-β)
SAVES THE SQUARE ROOT OF THE NUMBER OF POSSIBILITIES, N; THIS IS MORE THAN
99.99999999999999999999999999999999999999999999999999
99999999999999999999999999999999999999999999999999999
99999999999999999999999999999999999999999999999999999
99999999999999999999999999999999999999999999999999999
99999999999999999999999999999999999999999999999999999
99999999999999999999999999999999999999999999999999999
99999999999999999999999999999999999999999999999999999
99999999999999999999999999999999999999999999999999999
99999999999999999999999999999999999999999999999999999
999999999999999999999999%
A Calculation
NUMBER OF POSSIBILITIES:       10^46
SAVINGS BY α-β ALGORITHM:      10^23
1000 PARALLEL PROCESSORS:      10^3
POSITIONS PER SECOND:          10^9
LEADS TO: 10^(23-12) =         10^11 SECONDS
A CENTURY IS                   10^9 SECONDS
SOLVING CHESS:                 10^2 CENTURIES
SO 100 CENTURIES OR 10,000 YEARS
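The slide's arithmetic can be checked mechanically with exact powers of ten (keeping the slide's rounding of a century to 10^9 seconds):

```python
# Back-of-the-envelope check of the slide's chess calculation.
positions = 10**23            # sqrt(10^46): what alpha-beta still must examine
rate = 10**3 * 10**9          # 1000 processors at 10^9 positions per second
seconds = positions // rate   # 10^11 seconds of computation in total
century = 10**9               # a century is roughly 10^9 seconds (rounded)
print(seconds // century)     # prints 100: 100 centuries, i.e. 10,000 years
```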
Using opponent’s strategy (NIM-5)
[Figure: the Nim-5 tree searched under the opponent model “Player 1 never takes 3”: branches in which player 1 takes 3 matches are discarded, changing the backed-up values.]
Using opponent’s strategy
• Well-known Tic-Tac-Toe strategy:
  R1: make 3-in-a-row if possible
  R2: prevent opponent’s 3-in-a-row if possible
  H1: occupy central square if possible
  H2: occupy corner square if possible
• One player knows that the other uses this strategy
Opponent-Model search
Iida, vd Herik, Uiterwijk, Herschberg (1993); Carmel and Markovitch (1993)
• Opponent’s evaluation function is known (unilateral: the opponent uses minimax)
• This is the opponent model
• It is used to predict the opponent’s moves
• The best response is determined, using our own evaluation function
OM Search
• Procedure:
  • At opponent’s nodes: use minimax (alpha-beta) to predict the opponent’s move. Return the own value of the chosen move
  • At own nodes: return (the move with) the highest value
  • At leaves: return the own evaluation
• Implementation: one-pass or probing
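The procedure can be sketched over an explicit tree whose leaves carry both evaluations; the tree shape and values below are assumptions for illustration, not taken from the slides:

```python
# OM search over an explicit tree. Leaves are pairs (v0, vop): our evaluation
# and the opponent's. At MIN nodes the opponent is predicted to minimise vop,
# and the v0 of that predicted move is backed up.
def om_search(node, own_to_move):
    if isinstance(node, tuple):                  # leaf: (v0, vop)
        return node
    children = [om_search(child, not own_to_move) for child in node]
    if own_to_move:                              # MAX: best own value
        return max(children, key=lambda c: c[0])
    return min(children, key=lambda c: c[1])     # MIN: predicted via vop

# A MAX root over two MIN nodes; OM search backs up our value v0 = 8.
print(om_search([[(8, 9), (7, 6)], [(8, 6), (6, 7)]], True))  # prints (8, 6)
```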
OM Search Equations
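The equations on this slide were an image and did not survive extraction. Below is a hedged reconstruction of the usual OM-search back-up rules, consistent with the procedure on the previous slide (V0 is our static evaluation, Vop the opponent's):

```latex
v_0(P) =
\begin{cases}
  V_0(P) & \text{if } P \text{ is a leaf,}\\
  \max_{c \in \mathrm{succ}(P)} v_0(c) & \text{if } P \text{ is a MAX node,}\\
  v_0(c^{*}), \quad c^{*} = \arg\min_{c \in \mathrm{succ}(P)} v_{op}(c)
    & \text{if } P \text{ is a MIN node,}
\end{cases}
```

where v_op is the ordinary minimax value computed with the opponent's evaluation function V_op.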
OM Search Example
[Figure: an OM-search example tree; each node is annotated with the pair (v0, vop). At the opponent’s nodes the move minimising vop is predicted and its v0 is backed up, giving v0 = 8 at the root.]
OM Search Algorithm (probing)
Risks and Gains in OM search
• Gain: difference between the predicted move and the minimax move
• Risk: difference between the move we expect and the move the opponent really selects
• Prediction of the opponent is important, and depends on the quality of the opponent model
Additional risk
• Even if the prediction is correct, there are traps for OM search
• Let P be the set of positions that the opponent selects, v0 be our evaluation function, and vop the opponent’s function
Four types of error
• v0 overestimates a position in P (bad)
• v0 underestimates a position in P
• vop underestimates a position that enters P (good for us)
• vop overestimates a position in P
Base case
[Figure: base case with no errors: at every node v0 = vop, and the minimax value (mmx) and the obtained value (obt) are both 6.]
Legend: v0 = our evaluation, vop = opponent’s evaluation, mmx = minimax value, obt = obtained value.
Type-I error
[Figure: Type-I error: v0 overestimates a position in P (v0 = 9 where vop = 2); the search predicts mmx = 8, but the obtained value drops to 2.]
Type-I error vanished
[Figure: the same tree with vop raised to 9 as well: the error disappears and mmx = obt = 8.]
Type-II error
[Figure: Type-II error: v0 underestimates a position in P (v0 = 1 where vop = 6); the obtained value (6) turns out better than the predicted one.]
Type-III error
[Figure: Type-III error: vop underestimates a position that enters P (vop = 1 where v0 = 8); good for us: mmx = 6 but obt = 8.]
Type-IV error
[Figure: Type-IV error: vop overestimates a position in P (vop = 8 where v0 = 6); the opponent avoids it: mmx = 6, obt = 7.]
Pruning in OM Search
• Pruning at MAX nodes is not possible in OM search; only pruning at MIN nodes can take place (Iida et al., 1993)
• Analogous to α-β search, the pruning version is called β-pruning OM search
Pruning Example
[Figure: a β-pruning OM-search example tree with nodes labelled a–y; several subtrees are pruned at the opponent’s (MIN) nodes.]
Two Implementations
• One-pass:
  • visit every node at most once
  • back up both your own and the opponent’s value
• Probing:
  • at MIN nodes, use α-β search to predict the opponent’s move
  • back up only one value
One-Pass β-pruning OM search
Probing β-pruning OM search
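The probing idea can be sketched as follows (without β-pruning, and with the same assumed leaf convention of (v0, vop) pairs): each MIN node is probed with the opponent's evaluation to predict the move, and OM search continues only in the predicted child:

```python
# Probing OM search sketch: a plain-minimax "probe" with the opponent's
# evaluation predicts the move at each MIN node; only one value is backed up.
def probe(node, maximizing):
    """Minimax value under the opponent's evaluation vop."""
    if isinstance(node, tuple):                  # leaf: (v0, vop)
        return node[1]
    values = [probe(c, not maximizing) for c in node]
    return max(values) if maximizing else min(values)

def om_probing(node, own_to_move):
    if isinstance(node, tuple):
        return node[0]                           # back up our own value v0
    if own_to_move:
        return max(om_probing(c, False) for c in node)
    # Predict the opponent's move by probing each child with vop,
    # then continue OM search only in the predicted child.
    predicted = min(node, key=lambda c: probe(c, True))
    return om_probing(predicted, True)

print(om_probing([[(8, 9), (7, 6)], [(8, 6), (6, 7)]], True))  # prints 8
```

On the same toy tree this returns the same root value as the one-pass version; the two implementations differ in how many nodes they visit, not in the value they compute.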
Best-case time complexity
Best-case time complexity
COM = Subtree A
C’OM = Subtree B
Best-case time complexity
• The time complexity for the one-pass version is equal to that of the theoretical best case
• The time complexity for the probing version is different:
  • first determine the number of leaf evaluations needed;
  • then find the number of probes needed
Best-case time complexity
In the best case, pruning in the probes is not changed by β = v+1: all probes take the same number of evaluations as with a fully open window.
Best-case time complexity
Time complexities compared
(Branching factors 4, 8, 12, 16, 20)
Average-case time complexity
(Measured on random game trees)
Probabilistic
Opponent-model Search
Donkers, vd Herik, Uiterwijk (2000)
• More elaborate opponent model:
  • the opponent uses a mixed strategy: n different evaluation functions (opponent types) plus a probability distribution
  • it is assumed to approximate the true opponent’s strategy
  • our own evaluation function is one of the opponent types
PrOM Search
• Procedure:
  • At opponent’s nodes: use minimax (alpha-beta) to predict the opponent’s move for all opponent types separately. Return the expected own value of the chosen moves
  • At own nodes: return (the move with) the highest value
  • At leaves: return the own evaluation
• Implementation: one-pass or probing
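The back-up can be sketched as follows (one-pass, no pruning; the leaf format and all numbers are illustrative assumptions): leaves carry our value v0 plus one value per opponent type.

```python
# PrOM search sketch. Leaves are (v0, type_values): our evaluation plus one
# value per opponent type. probs[i] = Pr(opponent type i).
def prom_search(node, own_to_move, probs):
    if isinstance(node, tuple):
        return node                                # leaf: (v0, [v per type])
    children = [prom_search(c, not own_to_move, probs) for c in node]
    if own_to_move:                                # MAX node: maximise v0
        return max(children, key=lambda c: c[0])
    # Opponent node: each type picks its own minimising child; back up the
    # expectation of our value over the types, and per-type values by min.
    expected_v0 = sum(p * min(children, key=lambda c: c[1][i])[0]
                      for i, p in enumerate(probs))
    per_type = [min(c[1][i] for c in children) for i in range(len(probs))]
    return (expected_v0, per_type)

# Two opponent types with Pr(0)=0.3 and Pr(1)=0.7: type 0 chooses the move
# worth 6 to us, type 1 the move worth 8, so v0 = 0.3*6 + 0.7*8 = 7.4.
v0, _ = prom_search([(6, [1, 5]), (8, [4, 2])], False, [0.3, 0.7])
print(v0)
```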
PrOM Search Equations
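As with the OM-search equations, the slide image is missing; the following is a hedged reconstruction of the PrOM back-up rules implied by the procedure above (ω_i are the n opponent types):

```latex
v_0(P) =
\begin{cases}
  V_0(P) & \text{if } P \text{ is a leaf,}\\
  \max_{c \in \mathrm{succ}(P)} v_0(c) & \text{if } P \text{ is a MAX node,}\\
  \sum_{i=0}^{n-1} \Pr(\omega_i)\, v_0(c_i^{*}), \quad
  c_i^{*} = \arg\min_{c \in \mathrm{succ}(P)} v_{\omega_i}(c)
    & \text{if } P \text{ is a MIN node,}
\end{cases}
```

where v_{ω_i} is the minimax value computed with opponent type ω_i's evaluation function.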
PrOM Search Example
[Figure: a PrOM-search example with two opponent types, Pr(0) = 0.3 and Pr(1) = 0.7. At the opponent’s node, type 0 is predicted to choose a move with our value 6 and type 1 a move with our value 8, so v0 = 0.3 × 6 + 0.7 × 8 = 7.4.]
PrOM Search Algorithm
How did chess players envision the
future of non-human chess players?
- Euwe
- Donner
- De Groot
- Timman
- Sosonko
- Böhm
Question: Do you think that a computer will be able to play at grandmaster level?
Euwe (June 10, 1980)
“I don’t believe so.
I am almost convinced of that, too”
Timman (June 22, 1980):
“If you would express it in ELO points, then I believe that a computer will not be able to obtain more than ...... let’s say ...... 2500 ELO points.”
Sosonko (August 26, 1980):
• “I haven’t thought about that so much, but I believe that it will certainly not be harmful for the human chess community. It will become more interesting. I am convinced that computer chess will play an increasingly important role. It will be very interesting, I think”
Donner (April 13, 1981):
“But the computer cannot play chess at all and will
never be able to play the game, at least not the first
two thousand years, (...)
What it does now has nothing to do with chess.”
De Groot (June 19, 1981):
“No, certainly not. As a matter of fact, I believe it will not even be possible that it can reach a stable master level. I believe it is possible that the computer can reach a master level, but not that it is a stable master.”
Contributions from Science
• COMPUTERS PLAY STRONGER THAN HUMANS.
• COMPUTERS CANNOT SOLVE CHESS.
• COMPUTERS ENABLE AN ALTERNATIVE FORM OF GAME EXPERIENCE.
Conclusions
1. Chess is a frontrunner among the games
2. Kasparov’s defeat has become a victory for brute force in combination with knowledge and opponent modelling
The Inevitable Final Conclusion
TECHNOLOGY, AND IN PARTICULAR OPPONENT MODELLING, MAKES THE BEST BETTER, BUT DOES NOT LEAVE THE WEAKER PLAYERS EMPTY-HANDED, SINCE THEIR LEVEL WILL INCREASE TOO.