A SCRABBLE CROSSWORD GAME PLAYING ... S t u a r t C. Shapiro

advertisement
A SCRABBLE CROSSWORD GAME PLAYING PROGRAM
S t u a r t C. S h a p i r o
Department of Computer Science
S t a t e U n i v e r s i t y o f New York a t B u f f a l o
A m h e r s t , New York 14226
A program t h a t p l a y s the SCRABBLE Crossword Game has oeen d e s i g n e d and implemented in SIMULA
67 on a DECSystem-­10 and in Pascal on a CYBER 173. The h e a r t of t h e d e s i g n is the d a t a s t r u c -­
t u r e f o r th e l e x i c o n and the a l g o r i t h m f o r s e a r c h i n g i t . The l e x i c o n i s r e p r e s e n t e d a s a
l e t t e r t a b l e , o r t r i e u s i n g a c a n o n i c a l o r d e r i n g o f the l e t t e r s i n t h e words r a t h e r t h a n t h e
o r i g i n a l s p e l l i n g . The a l g o r i t h m takes t h e t r i e and a c o l l e c t i o n o f l e t t e r s , i n c l u d i n g b l a n k s ,
and f i n d s a l l words t h a t can b e formed from any c o m b i n a t i o n and p e r m u t a t i o n o f t h e l e t t e r s .
Words a r e found i n a p p r o x i m a t e l y t h e o r d e r o f t h e i r v a l u e i n the game.
1
INTRODUCTION
game between any number of p l a y e r s , and each
w r o t e a program p l a y e r .
The SCRABBLE Crossword Game is a w e l l known
game c o n s i d e r e d to r e q u i r e a f a i r amount of
i n t e l l i g e n c e , s t r a t e g i c and t a c t i c a l s k i l l ,
f a c i l i t y w i t h w o r d s , and l u c k . U n l i k e most games
i n the a r t i f i c i a l i n t e l l i g e n c e l i t e r a t u r e , i t
can be p l a y e d by two or more p l a y e r s and is
n e i t h e r a zero sum game n o r a game of p e r f e c t
i n f o r m a t i o n . For these r e a s o n s , minimax s e a r c h -­
i n g p r o c e d u r e s do n o t seem a p p l i c a b l e . When
humans p l a y the game, a l a r g e e a s i l y a c c e s s i b l e
v o c a b u l a r y seems to be the most i m p o r t a n t d e t e r -­
m i n e r o f v i c t o r y . One m i g h t , t h e r e f o r e , t h i n k
t h a t i t would b e easy t o w r i t e a program t h a t
p l a y s the SCRABBLE Crossword Game at the cham-­
p i o n s h i p l e v e l . We are examining t h i s question
b y c o n c e n t r a t i n g our e f f o r t s o n the d e s i g n o f
the l e x i c o n and t h e a l g o r i t h m f o r s e a r c h i n g i t
and p a y i n g much l e s s a t t e n t i o n t o the s t r a t e g i e s
and t a c t i c s o f a c t u a l p l a y .
I n t he s e c t i o n s t h a t f o l l o w , w e w i l l b r i e f l y
d e s c r i b e the game manager, b a s i n g the d e s c r i p -­
t i o n o n the P a s c al v e r s i o n , w h i c h i s more gen-­
e r a l t h a n the SIMULA v e r s i o n . We w i l l t h e n
describe
t h e l e x i c o n d a t a s t r u c t u r e and search
a l g o r i t h m i n more d e t a i l . F i n a l l y , w e w i l l
b r i e f l y d e s c r i b e t h e t h r e e program p l a y e r s t h a t
have been w r i t t e n .
2.
THE GAME MANAGER
F i g u r e 1 shows the o v e r a l l o r g a n i z a t i o n o f the
s y s t e m , assuming one human p l a y e r and one p r o -­
gram p l a y e r . I n r e a l i t y ,
t h e r e may b e any
number of human p l a y e r s and any number of p r o -­
gram p l a y e r s as l o n g as t h e r e a r e at l e a s t two
players.
The game manager module h a n d l e s a l l i n t e r a c t i o n
w i t h human p l a y e r s and e i t h e r d e a l s t i l e s o r
a c c e p t s t i l e s p i c k e d b y human a s s i s t a n t s . A f t e r
Our l e x i c o n d i f f e r s fro m those u s u a l l y used i n
n a t u r a l language p r o c e s s i n g programs because o f
t h e use t o w h i c h i t i s t o b e p u t . U s u a l l y one i s
c o n f r o n t e d w i t h a p o s s i b l e w o r d . One must d e t e r -­
mine i f i t i s a w o r d , a n d , i f s o , segment i t i n -­
t o a f f i x e s and s t e m , and r e t r i e v e l e x i c a l i n f o r -­
m a t i o n a s s o c i a t e d w i t h t h e stem. The problem f o r
t h e SCRABBLE Crossword Game l e x i c o n i s , g i v e n a
s e t o f l e t t e r s , f i n d a l l t h e words t h a t can b e
made f r o m any c o m b i n a t i o n and p e r m u t a t i o n of
them. T h i s i s a v e r y d i f f e r e n t p r o b l e m .
We have d e s i g n e d a program t h a t p l a y s the
SCRABBLE Crossword Game and implemented two
v e r s i o n s . A t I n d i a n a U n i v e r s i t y , Howard Smith
implemented a program in SIMULA 67
[2,3]
on a
DECSystem-­10. A t S U N Y / B u f f a l o , M i c h a e l M o r r i s
and K a r l Schimpf implemented a v e r s i o n i n
P a s c a l [ 6 ] on a CDC CYBER 173 t h a t manages a
797
and computes t h e s c o r e o f l e g a l p l a y s . I f a
human proposes a word n o t in the l e x i c o n , t h e
r e f e r e e asks i f i t i s r e a l l y a w o r d . I f t h e
human says i t i s , i t i s added t o t h e l e x i c o n .
Otherwise the play i s considered i l l e g a l .
Each node i n th e l e x i c o n t r i e c o n t a i n s a l e t t e r ,
a l i s t o f w o r d s , a success p o i n t e r and a f a i l
p o i n t e r . I f L i s t h e sequence o f l e t t e r s c o n -­
t a i n e d i n a l l nodes o n t h e success p a t h f r om t h e
r o o t t o some n o d e , n , e x c l u d i n g the l e t t e r i n n ,
and l i s t h e l e t t e r i n n , t h e n t h e l i s t o f words
i n n i s a l l words t h a t a r e p e r m u t a t i o n s o f t h e
l e t t e r s i n L p l u s £ , t h e success p o i n t e r p o i n t s
to a s u b t r i e c o n t a i n i n g words formed f r o m L, £,
p l u s o t h e r l e t t e r s , and t h e f a i l p o i n t e r p o i n t s
t o a s u b t r i e c o n t a i n i n g words formed f r o m L ,
other l e t t e r s , but not
l.
The o r d e r o f l e t t e r s
o n b o t h success and f a i l p a t h s i s the c a n o n i c a l
o r d e r d i s c u s s e d b e l o w . F i g u r e 2 shows a s m a l l
lexicon t r i e .
Program p l a y e r s may use t h e r e f e r e e t o f i n d
l e g a l p l a y s and t o choose t h e b e s t among s e v e r a l
legal plays.
2.2
The Human Opponent
A human p l a y e r i n t e r a c t s w i t h the game manager
in a n o t a t i o n close to the o f f i c i a l n o t a t i o n of
the SCRABBLE Crossword Game P l a y e r s , I n c . [ l ] .
The word i s t y p e d w i t h each l e t t e r a l r e a d y o n
t h e b o a r d preceeded by a " $ " . When a b l a n k t i l e
is p l a y e d , a "@'' is typed f o l l o w e d by the l e t t e r
i t i s b e i n g used a s . The l o c a t i o n o f t h e word i s
i n d i c a t e d e i t h e r by a row and the b e g i n n i n g and
e n d i n g columns ( e . g . 8 K -­ N ) , or by a column and
t h e b e g i n n i n g and e n d i n g rows ( e . g . C 2 -­ 7 ) .
2.3
3.2
The L e x i c o n Manager
The
l e x i c o n manager i s used b y t h e r e f e r e e t o
check w h e t h e r words and c r o s s w o r d s a r e c o n t a i n e d
in t h e
l e x i c o n , and to add words w h i c h a human
p l a y e r c l a i m e d a r e w o r d s , b u t were n o t a l r e a d y i n
t h e l e x i c o n . The program p l a y e r s use t h e l e x i c o n
manager to f i n d words to p l a y . A human p l a y e r may
use t h e l e x i c o n manager t o i n t e r a c t w i t h t h e
l e x i c o n w i t h o u t e n d i n g t h e game.
3.
3.1
The Search A l g o r i t h m
B r i e f l y , the a l g o r i t h m f o r searching the
l e x i c o n i s t o t a k e a s e t o f t i l e s , o r d e r them
a c c o r d i n g to the c a n o n i c a l o r d e r i n g , search the
f a i l p a t h o f the r o o t u n t i l t h e l e t t e r o f t h e
f i r s t t i l e matches t h e l e t t e r i n a n o d e , t h e n
search t h e success s u b t r i e w i t h the r e m a i n i n g
t i l e s o f t h e s e t . T h e n , t o g e t words formed
from s u b s e t s o f t h e o r i g i n a l s e t , i g n o r e t h e
l e t t e r t h a t matched t h e node and a l s o s e a r c h
t h e f a i l s u b t r i e . So, i f t h e l e x i c o n o f F i g u r e
2 i s searched w i t h t h e l e t t e r s KCPTIE, "peck
w i l l b e f o u n d , and t h e n , i g n o r i n g " p " , " t i c k
w i l l be found.
11
f t
So t h a t words can be found in a p p r o x i m a t e l y t h e
o r d e r o f t h e i r v a l u e s i n t h e SCRABBLE Crossword
Game, the m a j o r s o r t used i n the c a n o n i c a l o r d e r
i n g i s t h e v a l u e o f t h e l e t t e r s i n t h a t game.
S i n c e when a l e t t e r o f a l e x i c o n node i s i n t h e
s e a r c h s e t b o t h i t s success and f a i l s u b t r i e s
are searched, but i f i t i s not only i t s f a i l
s u b t r i e i s s e a r c h e d , t h e secondary o r d e r
is
f r e q u e n c y o f t h e l e t t e r i n t h e game s e t , l e a s t
f r e q u e n t f i r s t . The t e r t i a r y s o r t i s f r e q u e n c y
i n E n g l i s h , most f r e q u e n t f i r s t , t o h e l p m i n i -­
mize the s i z e o f the l e x i c o n t r i e . Contrary t o
t h i s o r d e r i n g , " S " was p l a c e d l a s t , because o f
t h e l a r g e number o f words t h a t can have " S "
added and r e m a i n w o r d s . For e x a m p l e , F i g u r e 2
has 2 1 n o d e s , b u t i f " S " were p l a c e d b e f o r e
L » where i t b e l o n g s , the same l e x i c o n would
have 32 n o d e s . The f i n a l o r d e r i n g used was
' 'QZXJKHFYWVCMPBGDLUTNRIOAES''.
THE LEXICON
The Data S t r u c t u r e
Four d e t a i l s were i g n o r e d i n t h e above b r i e f
d e s c r i p t i o n o f the search a l g o r i t h m . F i r s t ,
l o n g e r w o r d s , l i k e " s m i l e s " a r e more v a l u a b l e
t h a n s h o r t e r words u s i n g t h e same l e t t e r s , l i k e
" m i l " , so words on t h e same success b r a n c h a r e
c o l l e c t e d i n the o r d e r o f l e a f t o w a r d r o o t ,
rather
t h a n r o o t toward l e a f . Second, t h e
s e a r c h s e t o f t i l e s may c o n t a i n b l a n k s , w h i c h
can match any l e t t e r . B l a n k s a r e i n i t i a l l y
p l a c e d f i r s t i n t h e s e a r c h s e t s o they can
match nodes h i g h i n t h e l e x i c o n t r i e , and l a t e r
798
b e s t o f the P w p l a y s ,
i l l e g a l , t h e y exchange
tries playing parallel
s i d e r i n g up to w2 words
rack.
s h u f f l e d down i n t h e s e a r c h s e t t o match lower
n o d e s . T h i r d , t h e s e a r ch s e t can c o n t a i n l e t t e r s
t h a t a r e r e q u i r e d t o b e i n found w o r d s , s o a r e
never ignored by the search a l g o r i t h m . F o u r t h ,
the s e a r c h t e r m i n a t e s a f t e r each word i s found
i n case t h a t word i s a c c e p t a b l e t o the p l a y e r and
n o f u r t h e r search i s r e q u i r e d . I f t h i s i s not the
c a s e , t h e s e a r c h can b e resumed from where i t
l e f t o f f . The SIMULA 67 v e r s i o n uses t h e d e t a c h
f a c i l i t y f o r t h i s p u r p o s e . Space p r e c l u d e s p r e -­
s e n t i n g the complete a l g o r i t h m h e r e . I t can b e
found as p r o c e d u r e f i n d w o r d s 2 in [9") .
3.3
Table 1 shows the r e s u l t s of i n t e r a c t i v e games of
SIMSCRB a g a i n s t humans DRF and S J , and b a t c h
games o f G e r ry a g a i n s t i t s e l f and Joanne a g a i n s t
itself.
The programs p l a y a p p r o x i m a t e l y a t human
Use of Secondary S t o r a g e
We have used two d i f f e r e n t schemes f o r m a i n t a i n -­
i n g t h e l e x i c o n on d i s k . The SIMULA 67 v e r s i o n
uses a s e q u e n t i a l f i l e o f c h a r a c t e r s and d i g i t s .
The l e t t e r o f each node i s f o l l o w e d b y the l i s t
o f w o r d s , i f a n y , and t h e n b y one o f the d i g i t s
0-­3 i n d i c a t i n g w h i c h k i n d s o f s u b t r i e s the node
h a s . I f b o t h a r e p r e s e n t , t h e success s u b t r i e
proceeds t h e f a i l s u b t r i e . T h i s o r g a n i z a t i o n
a l l o w s t h e l e x i c o n t o b e searched w h i l e the f i l e
i s r e a d i n t h e f o r w a r d d i r e c t i o n o n l y . However,
a n o d e ' s e n t i r e success s u b t r i e must b e read t o
f i n d i t s f a i l s u b t r i e . The SIMUIA v e r s i o n , w i t h
a l e x i c o n of abou t 1500 w o r d s , averaged about 42
CPU seconds per p a i r of moves (program and human).
l e v e l . The m a j o r d i f f e r e n c e i s t h e number o f
t i m e s the programs exchange t i l e s . T h i s i s p a r t -­
l y due t o s m a l l v o c a b u l a r i e s , and p a r t l y t o
w a s t i n g many o f t h e l i m i t e d p w t r i e s o n p o s i t i o n s
f o r w h i c h l e g a l moves c o u l d n o t b e f o u n d .
The P a s c a l v e r s i o n uses a segmented f i l e o f r e -­
c o r d s , where each r e c o r d i s a l e x i c o n node c o n -­
t a i n i n g a t most one w o r d . I f n e c e s s a r y , s u c c e s s i v e
r e c o r d s w i t h dummy l e t t e r s c o n t a i n a d d i t i o n a l
w o r d s . Except f o r t h i s , t h e r o o t o f t h e success
s u b t r i e o f a node i m m e d i a t e l y f o l l o w s t h e node.
A node w i t h no success s u b t r i e is f o l l o w e d by an
e n d -­ o f -­ s e g m e n t mark. The r o o t o f the f a i l s u b t r i e
o f a node i s t h e f i r s t node o f some segment l a t e r
i n t h e f i l e , s o t h a t t o f i n d a n o d e ' s f a i l s u b -­
t r i e , t h e e n t i r e success s u b t r i e n e e d n ' t b e r e a d ,
j u s t t h e f i r s t node o f each i n t e r v e n i n g segment.
C u r r e n t l y , t h i s l e x i c o n c o n t a i n s 2126 w o r d s , i n
4262 node r e c o r d s and 1666 segments. The P a s c a l
program averages abou t 29 CPU seconds per p a i r of
program v s . program moves when r u n i n b a t c h , b u t
has been t o o slow t o r u n i n t e r a c t i v e l y due t o
system o v e r h e a d .
4.
o r , i f these a r e a l l
t h e i r r a c k s . Joanne f i r s t
t o e x i s t i n g w o r d s , c o n -­
formed e n t i r e l y from t h e
PROGRAM PLAYERS
Three program p l a y e r s have been w r i t t e n , SIMSCRB,
w r i t t e n by Howard Smith in SIMULA 6 7 , and two
P a s c a l programs -­ -­ G e r r y , b y M i c h a e l M o r r i s , and
J o a n n e , by K a r l Schimpf. SIMSCRB o n l y p l a y s a c r o s s
e x i s t i n g w o r d s , n e v e r e x t e n d i n g a word o r p l a y i n g
p a r a l l e l t o a w o r d . Joanne a l s o c o n s i d e r s p a r a l l e l
p l a y s , b u t n o t e x t e n d i n g w o r d s . Gerry c o n s i d e r s
a l l t y p e s o f p l a y s . Each b o a r d a n a l y z e r m a i n t a i n s
a l i s t o f p l a y a b l e p o s i t i o n s o r d e r e d b y e x p e c t ed
v a l u e of a p l a y t h e r e . The programs c o n s i d e r t h e
b e s t P p o s i t i o n s and t h e f i r s t w words found i n
t h e l e x i c o n f o r each p o s i t i o n . The chosen p l a y i s
the f i r s t t h a t i s wort h a t l e a s t m p o i n t s , o r the
799
ACKNOWLEDGEMENTS
Ben Shneiderman, M a r g a r e t Ambrose and Barbara
Rasche c o o p e r a t e d o n a n e a r l y v e r s i o n o f the p r o -­
gram. Howard R. Smith implemented the SIMULA 67
v e r s i o n , and was p a r t i a l l y r e s p o n s i b l e f o r the
l e x i c o n s e a r ch a l g o r i t h m , M i c h a e l M o r r i s J r . and
K a r l Schimpf implemented the P a s c al v e r s i o n .
REFERENCES
[1].
C o n k l i n , D . K . ( E d . ) The O f f i c i a l SCRABBLE
P l a y e r s Handbook. Harmony Books D i v i s i o n , Crown
P u b l i s h e r s , I n c . , NY, 1976.
[2].
D a h l , O . -­ J . , Myhrhang, B . ; H y g a a r d , K . The
Simula 67 Common Base Language. Norwegian Com-­
p u t i n g C e n t r e , F o r s k n i n g s v e i e n I B , Oslo 3 , 1968.
[3].
D a h l , O . -­ J . , and N y g a a r d , K . S i m u l a -­ a n
A l g o l -­ b a s e d s i m u l a t i o n l a n g u a g e. Comm. ACM 9,
( S e p t . , 1 9 6 6 ) , 6 7 1 -­ 6 7 8 .
[4].
De La B r i a n d a i s , R. F i l e s e a r c h i n g u s i n g
v a r i a b l e l e n g t h k e y s . P r o c . WJCC, AFIPS P r e s s ,
M o n t v a l e , N J , 1959, 295-­298.
[5].
Hays, D.G. I n t r o d u c t i o n t o C o m p u t a t i o n a l
L i n g u i s t i c s . A m e r i c a n E l s e v i e r , NY, 1967, 9 2 -­ 9 4 .
[6~|. J e n s e n , K. and W i r t h , N. PASCAL User
Manual and R e p o r t . S p r i n g e r -­ V e r l a g , NY, 1976.
[7].
K n u t h , D.E. The A r t o f Computer Programming
V o l . 3 / S o r t i n g and S e a r c h i n g . A d d i s o n -­ W e s l e y ,
Reading, MA, 1973, 4 8 1 -­ 4 8 7 .
[ 8 1 . Lamb, S.M and J a c o b s e n , W . H . , J r . , A h i g h -­
speed l a r g e -­ c a p a c i t y d i c t i o n a r y s y s t e m . Mechani-­
c a l T r a n s l a t i o n , 6 ( N o v . , 1 9 6 1 ) , 7 6 -­ 1 0 7 .
[9].
S h a p i r o , S.C. and S m i t h , H . R . , A SCRABBLE
Crossword Game P l a y i n g Program. Tech R p t . No. 119,
Dept. of Computer S c i e n c e , S U N Y / B u f f a l o , NY, 1977,
Download