A SCRABBLE CROSSWORD GAME PLAYING PROGRAM S t u a r t C. S h a p i r o Department of Computer Science S t a t e U n i v e r s i t y o f New York a t B u f f a l o A m h e r s t , New York 14226 A program t h a t p l a y s the SCRABBLE Crossword Game has oeen d e s i g n e d and implemented in SIMULA 67 on a DECSystem-­10 and in Pascal on a CYBER 173. The h e a r t of t h e d e s i g n is the d a t a s t r u c -­ t u r e f o r th e l e x i c o n and the a l g o r i t h m f o r s e a r c h i n g i t . The l e x i c o n i s r e p r e s e n t e d a s a l e t t e r t a b l e , o r t r i e u s i n g a c a n o n i c a l o r d e r i n g o f the l e t t e r s i n t h e words r a t h e r t h a n t h e o r i g i n a l s p e l l i n g . The a l g o r i t h m takes t h e t r i e and a c o l l e c t i o n o f l e t t e r s , i n c l u d i n g b l a n k s , and f i n d s a l l words t h a t can b e formed from any c o m b i n a t i o n and p e r m u t a t i o n o f t h e l e t t e r s . Words a r e found i n a p p r o x i m a t e l y t h e o r d e r o f t h e i r v a l u e i n the game. 1 INTRODUCTION game between any number of p l a y e r s , and each w r o t e a program p l a y e r . The SCRABBLE Crossword Game is a w e l l known game c o n s i d e r e d to r e q u i r e a f a i r amount of i n t e l l i g e n c e , s t r a t e g i c and t a c t i c a l s k i l l , f a c i l i t y w i t h w o r d s , and l u c k . U n l i k e most games i n the a r t i f i c i a l i n t e l l i g e n c e l i t e r a t u r e , i t can be p l a y e d by two or more p l a y e r s and is n e i t h e r a zero sum game n o r a game of p e r f e c t i n f o r m a t i o n . For these r e a s o n s , minimax s e a r c h -­ i n g p r o c e d u r e s do n o t seem a p p l i c a b l e . When humans p l a y the game, a l a r g e e a s i l y a c c e s s i b l e v o c a b u l a r y seems to be the most i m p o r t a n t d e t e r -­ m i n e r o f v i c t o r y . One m i g h t , t h e r e f o r e , t h i n k t h a t i t would b e easy t o w r i t e a program t h a t p l a y s the SCRABBLE Crossword Game at the cham-­ p i o n s h i p l e v e l . We are examining t h i s question b y c o n c e n t r a t i n g our e f f o r t s o n the d e s i g n o f the l e x i c o n and t h e a l g o r i t h m f o r s e a r c h i n g i t and p a y i n g much l e s s a t t e n t i o n t o the s t r a t e g i e s and t a c t i c s o f a c t u a l p l a y . I n t he s e c t i o n s t h a t f o l l o w , w e w i l l b r i e f l y d e s c r i b e the game manager, b a s i n g the d e s c r i p -­ t i o n o n the P a s c al v e r s i o n , w h i c h i s more gen-­ e r a l t h a n the SIMULA v e r s i o n . We w i l l t h e n describe t h e l e x i c o n d a t a s t r u c t u r e and search a l g o r i t h m i n more d e t a i l . F i n a l l y , w e w i l l b r i e f l y d e s c r i b e t h e t h r e e program p l a y e r s t h a t have been w r i t t e n . 2. THE GAME MANAGER F i g u r e 1 shows the o v e r a l l o r g a n i z a t i o n o f the s y s t e m , assuming one human p l a y e r and one p r o -­ gram p l a y e r . I n r e a l i t y , t h e r e may b e any number of human p l a y e r s and any number of p r o -­ gram p l a y e r s as l o n g as t h e r e a r e at l e a s t two players. The game manager module h a n d l e s a l l i n t e r a c t i o n w i t h human p l a y e r s and e i t h e r d e a l s t i l e s o r a c c e p t s t i l e s p i c k e d b y human a s s i s t a n t s . A f t e r Our l e x i c o n d i f f e r s fro m those u s u a l l y used i n n a t u r a l language p r o c e s s i n g programs because o f t h e use t o w h i c h i t i s t o b e p u t . U s u a l l y one i s c o n f r o n t e d w i t h a p o s s i b l e w o r d . One must d e t e r -­ mine i f i t i s a w o r d , a n d , i f s o , segment i t i n -­ t o a f f i x e s and s t e m , and r e t r i e v e l e x i c a l i n f o r -­ m a t i o n a s s o c i a t e d w i t h t h e stem. The problem f o r t h e SCRABBLE Crossword Game l e x i c o n i s , g i v e n a s e t o f l e t t e r s , f i n d a l l t h e words t h a t can b e made f r o m any c o m b i n a t i o n and p e r m u t a t i o n of them. T h i s i s a v e r y d i f f e r e n t p r o b l e m . We have d e s i g n e d a program t h a t p l a y s the SCRABBLE Crossword Game and implemented two v e r s i o n s . A t I n d i a n a U n i v e r s i t y , Howard Smith implemented a program in SIMULA 67 [2,3] on a DECSystem-­10. A t S U N Y / B u f f a l o , M i c h a e l M o r r i s and K a r l Schimpf implemented a v e r s i o n i n P a s c a l [ 6 ] on a CDC CYBER 173 t h a t manages a 797 and computes t h e s c o r e o f l e g a l p l a y s . I f a human proposes a word n o t in the l e x i c o n , t h e r e f e r e e asks i f i t i s r e a l l y a w o r d . I f t h e human says i t i s , i t i s added t o t h e l e x i c o n . Otherwise the play i s considered i l l e g a l . Each node i n th e l e x i c o n t r i e c o n t a i n s a l e t t e r , a l i s t o f w o r d s , a success p o i n t e r and a f a i l p o i n t e r . I f L i s t h e sequence o f l e t t e r s c o n -­ t a i n e d i n a l l nodes o n t h e success p a t h f r om t h e r o o t t o some n o d e , n , e x c l u d i n g the l e t t e r i n n , and l i s t h e l e t t e r i n n , t h e n t h e l i s t o f words i n n i s a l l words t h a t a r e p e r m u t a t i o n s o f t h e l e t t e r s i n L p l u s £ , t h e success p o i n t e r p o i n t s to a s u b t r i e c o n t a i n i n g words formed f r o m L, £, p l u s o t h e r l e t t e r s , and t h e f a i l p o i n t e r p o i n t s t o a s u b t r i e c o n t a i n i n g words formed f r o m L , other l e t t e r s , but not l. The o r d e r o f l e t t e r s o n b o t h success and f a i l p a t h s i s the c a n o n i c a l o r d e r d i s c u s s e d b e l o w . F i g u r e 2 shows a s m a l l lexicon t r i e . Program p l a y e r s may use t h e r e f e r e e t o f i n d l e g a l p l a y s and t o choose t h e b e s t among s e v e r a l legal plays. 2.2 The Human Opponent A human p l a y e r i n t e r a c t s w i t h the game manager in a n o t a t i o n close to the o f f i c i a l n o t a t i o n of the SCRABBLE Crossword Game P l a y e r s , I n c . [ l ] . The word i s t y p e d w i t h each l e t t e r a l r e a d y o n t h e b o a r d preceeded by a " $ " . When a b l a n k t i l e is p l a y e d , a "@'' is typed f o l l o w e d by the l e t t e r i t i s b e i n g used a s . The l o c a t i o n o f t h e word i s i n d i c a t e d e i t h e r by a row and the b e g i n n i n g and e n d i n g columns ( e . g . 8 K -­ N ) , or by a column and t h e b e g i n n i n g and e n d i n g rows ( e . g . C 2 -­ 7 ) . 2.3 3.2 The L e x i c o n Manager The l e x i c o n manager i s used b y t h e r e f e r e e t o check w h e t h e r words and c r o s s w o r d s a r e c o n t a i n e d in t h e l e x i c o n , and to add words w h i c h a human p l a y e r c l a i m e d a r e w o r d s , b u t were n o t a l r e a d y i n t h e l e x i c o n . The program p l a y e r s use t h e l e x i c o n manager to f i n d words to p l a y . A human p l a y e r may use t h e l e x i c o n manager t o i n t e r a c t w i t h t h e l e x i c o n w i t h o u t e n d i n g t h e game. 3. 3.1 The Search A l g o r i t h m B r i e f l y , the a l g o r i t h m f o r searching the l e x i c o n i s t o t a k e a s e t o f t i l e s , o r d e r them a c c o r d i n g to the c a n o n i c a l o r d e r i n g , search the f a i l p a t h o f the r o o t u n t i l t h e l e t t e r o f t h e f i r s t t i l e matches t h e l e t t e r i n a n o d e , t h e n search t h e success s u b t r i e w i t h the r e m a i n i n g t i l e s o f t h e s e t . T h e n , t o g e t words formed from s u b s e t s o f t h e o r i g i n a l s e t , i g n o r e t h e l e t t e r t h a t matched t h e node and a l s o s e a r c h t h e f a i l s u b t r i e . So, i f t h e l e x i c o n o f F i g u r e 2 i s searched w i t h t h e l e t t e r s KCPTIE, "peck w i l l b e f o u n d , and t h e n , i g n o r i n g " p " , " t i c k w i l l be found. 11 f t So t h a t words can be found in a p p r o x i m a t e l y t h e o r d e r o f t h e i r v a l u e s i n t h e SCRABBLE Crossword Game, the m a j o r s o r t used i n the c a n o n i c a l o r d e r i n g i s t h e v a l u e o f t h e l e t t e r s i n t h a t game. S i n c e when a l e t t e r o f a l e x i c o n node i s i n t h e s e a r c h s e t b o t h i t s success and f a i l s u b t r i e s are searched, but i f i t i s not only i t s f a i l s u b t r i e i s s e a r c h e d , t h e secondary o r d e r is f r e q u e n c y o f t h e l e t t e r i n t h e game s e t , l e a s t f r e q u e n t f i r s t . The t e r t i a r y s o r t i s f r e q u e n c y i n E n g l i s h , most f r e q u e n t f i r s t , t o h e l p m i n i -­ mize the s i z e o f the l e x i c o n t r i e . Contrary t o t h i s o r d e r i n g , " S " was p l a c e d l a s t , because o f t h e l a r g e number o f words t h a t can have " S " added and r e m a i n w o r d s . For e x a m p l e , F i g u r e 2 has 2 1 n o d e s , b u t i f " S " were p l a c e d b e f o r e L » where i t b e l o n g s , the same l e x i c o n would have 32 n o d e s . The f i n a l o r d e r i n g used was ' 'QZXJKHFYWVCMPBGDLUTNRIOAES''. THE LEXICON The Data S t r u c t u r e Four d e t a i l s were i g n o r e d i n t h e above b r i e f d e s c r i p t i o n o f the search a l g o r i t h m . F i r s t , l o n g e r w o r d s , l i k e " s m i l e s " a r e more v a l u a b l e t h a n s h o r t e r words u s i n g t h e same l e t t e r s , l i k e " m i l " , so words on t h e same success b r a n c h a r e c o l l e c t e d i n the o r d e r o f l e a f t o w a r d r o o t , rather t h a n r o o t toward l e a f . Second, t h e s e a r c h s e t o f t i l e s may c o n t a i n b l a n k s , w h i c h can match any l e t t e r . B l a n k s a r e i n i t i a l l y p l a c e d f i r s t i n t h e s e a r c h s e t s o they can match nodes h i g h i n t h e l e x i c o n t r i e , and l a t e r 798 b e s t o f the P w p l a y s , i l l e g a l , t h e y exchange tries playing parallel s i d e r i n g up to w2 words rack. s h u f f l e d down i n t h e s e a r c h s e t t o match lower n o d e s . T h i r d , t h e s e a r ch s e t can c o n t a i n l e t t e r s t h a t a r e r e q u i r e d t o b e i n found w o r d s , s o a r e never ignored by the search a l g o r i t h m . F o u r t h , the s e a r c h t e r m i n a t e s a f t e r each word i s found i n case t h a t word i s a c c e p t a b l e t o the p l a y e r and n o f u r t h e r search i s r e q u i r e d . I f t h i s i s not the c a s e , t h e s e a r c h can b e resumed from where i t l e f t o f f . The SIMULA 67 v e r s i o n uses t h e d e t a c h f a c i l i t y f o r t h i s p u r p o s e . Space p r e c l u d e s p r e -­ s e n t i n g the complete a l g o r i t h m h e r e . I t can b e found as p r o c e d u r e f i n d w o r d s 2 in [9") . 3.3 Table 1 shows the r e s u l t s of i n t e r a c t i v e games of SIMSCRB a g a i n s t humans DRF and S J , and b a t c h games o f G e r ry a g a i n s t i t s e l f and Joanne a g a i n s t itself. The programs p l a y a p p r o x i m a t e l y a t human Use of Secondary S t o r a g e We have used two d i f f e r e n t schemes f o r m a i n t a i n -­ i n g t h e l e x i c o n on d i s k . The SIMULA 67 v e r s i o n uses a s e q u e n t i a l f i l e o f c h a r a c t e r s and d i g i t s . The l e t t e r o f each node i s f o l l o w e d b y the l i s t o f w o r d s , i f a n y , and t h e n b y one o f the d i g i t s 0-­3 i n d i c a t i n g w h i c h k i n d s o f s u b t r i e s the node h a s . I f b o t h a r e p r e s e n t , t h e success s u b t r i e proceeds t h e f a i l s u b t r i e . T h i s o r g a n i z a t i o n a l l o w s t h e l e x i c o n t o b e searched w h i l e the f i l e i s r e a d i n t h e f o r w a r d d i r e c t i o n o n l y . However, a n o d e ' s e n t i r e success s u b t r i e must b e read t o f i n d i t s f a i l s u b t r i e . The SIMUIA v e r s i o n , w i t h a l e x i c o n of abou t 1500 w o r d s , averaged about 42 CPU seconds per p a i r of moves (program and human). l e v e l . The m a j o r d i f f e r e n c e i s t h e number o f t i m e s the programs exchange t i l e s . T h i s i s p a r t -­ l y due t o s m a l l v o c a b u l a r i e s , and p a r t l y t o w a s t i n g many o f t h e l i m i t e d p w t r i e s o n p o s i t i o n s f o r w h i c h l e g a l moves c o u l d n o t b e f o u n d . The P a s c a l v e r s i o n uses a segmented f i l e o f r e -­ c o r d s , where each r e c o r d i s a l e x i c o n node c o n -­ t a i n i n g a t most one w o r d . I f n e c e s s a r y , s u c c e s s i v e r e c o r d s w i t h dummy l e t t e r s c o n t a i n a d d i t i o n a l w o r d s . Except f o r t h i s , t h e r o o t o f t h e success s u b t r i e o f a node i m m e d i a t e l y f o l l o w s t h e node. A node w i t h no success s u b t r i e is f o l l o w e d by an e n d -­ o f -­ s e g m e n t mark. The r o o t o f the f a i l s u b t r i e o f a node i s t h e f i r s t node o f some segment l a t e r i n t h e f i l e , s o t h a t t o f i n d a n o d e ' s f a i l s u b -­ t r i e , t h e e n t i r e success s u b t r i e n e e d n ' t b e r e a d , j u s t t h e f i r s t node o f each i n t e r v e n i n g segment. C u r r e n t l y , t h i s l e x i c o n c o n t a i n s 2126 w o r d s , i n 4262 node r e c o r d s and 1666 segments. The P a s c a l program averages abou t 29 CPU seconds per p a i r of program v s . program moves when r u n i n b a t c h , b u t has been t o o slow t o r u n i n t e r a c t i v e l y due t o system o v e r h e a d . 4. o r , i f these a r e a l l t h e i r r a c k s . Joanne f i r s t t o e x i s t i n g w o r d s , c o n -­ formed e n t i r e l y from t h e PROGRAM PLAYERS Three program p l a y e r s have been w r i t t e n , SIMSCRB, w r i t t e n by Howard Smith in SIMULA 6 7 , and two P a s c a l programs -­ -­ G e r r y , b y M i c h a e l M o r r i s , and J o a n n e , by K a r l Schimpf. SIMSCRB o n l y p l a y s a c r o s s e x i s t i n g w o r d s , n e v e r e x t e n d i n g a word o r p l a y i n g p a r a l l e l t o a w o r d . Joanne a l s o c o n s i d e r s p a r a l l e l p l a y s , b u t n o t e x t e n d i n g w o r d s . Gerry c o n s i d e r s a l l t y p e s o f p l a y s . Each b o a r d a n a l y z e r m a i n t a i n s a l i s t o f p l a y a b l e p o s i t i o n s o r d e r e d b y e x p e c t ed v a l u e of a p l a y t h e r e . The programs c o n s i d e r t h e b e s t P p o s i t i o n s and t h e f i r s t w words found i n t h e l e x i c o n f o r each p o s i t i o n . The chosen p l a y i s the f i r s t t h a t i s wort h a t l e a s t m p o i n t s , o r the 799 ACKNOWLEDGEMENTS Ben Shneiderman, M a r g a r e t Ambrose and Barbara Rasche c o o p e r a t e d o n a n e a r l y v e r s i o n o f the p r o -­ gram. Howard R. Smith implemented the SIMULA 67 v e r s i o n , and was p a r t i a l l y r e s p o n s i b l e f o r the l e x i c o n s e a r ch a l g o r i t h m , M i c h a e l M o r r i s J r . and K a r l Schimpf implemented the P a s c al v e r s i o n . REFERENCES [1]. C o n k l i n , D . K . ( E d . ) The O f f i c i a l SCRABBLE P l a y e r s Handbook. Harmony Books D i v i s i o n , Crown P u b l i s h e r s , I n c . , NY, 1976. [2]. D a h l , O . -­ J . , Myhrhang, B . ; H y g a a r d , K . The Simula 67 Common Base Language. Norwegian Com-­ p u t i n g C e n t r e , F o r s k n i n g s v e i e n I B , Oslo 3 , 1968. [3]. D a h l , O . -­ J . , and N y g a a r d , K . S i m u l a -­ a n A l g o l -­ b a s e d s i m u l a t i o n l a n g u a g e. Comm. ACM 9, ( S e p t . , 1 9 6 6 ) , 6 7 1 -­ 6 7 8 . [4]. De La B r i a n d a i s , R. F i l e s e a r c h i n g u s i n g v a r i a b l e l e n g t h k e y s . P r o c . WJCC, AFIPS P r e s s , M o n t v a l e , N J , 1959, 295-­298. [5]. Hays, D.G. I n t r o d u c t i o n t o C o m p u t a t i o n a l L i n g u i s t i c s . A m e r i c a n E l s e v i e r , NY, 1967, 9 2 -­ 9 4 . [6~|. J e n s e n , K. and W i r t h , N. PASCAL User Manual and R e p o r t . S p r i n g e r -­ V e r l a g , NY, 1976. [7]. K n u t h , D.E. The A r t o f Computer Programming V o l . 3 / S o r t i n g and S e a r c h i n g . A d d i s o n -­ W e s l e y , Reading, MA, 1973, 4 8 1 -­ 4 8 7 . [ 8 1 . Lamb, S.M and J a c o b s e n , W . H . , J r . , A h i g h -­ speed l a r g e -­ c a p a c i t y d i c t i o n a r y s y s t e m . Mechani-­ c a l T r a n s l a t i o n , 6 ( N o v . , 1 9 6 1 ) , 7 6 -­ 1 0 7 . [9]. S h a p i r o , S.C. and S m i t h , H . R . , A SCRABBLE Crossword Game P l a y i n g Program. Tech R p t . No. 119, Dept. of Computer S c i e n c e , S U N Y / B u f f a l o , NY, 1977,