LEARNING RACQUETBALL BY CONSTRAINED EXAMPLE GENERATION*

By Leonard P. Wesley
Department of Computer and Information Science
University of Massachusetts
Amherst, Mass. 01003

ABSTRACT

A model of learning in a highly active and competitive environment is presented here. This paper posits that learning, under these conditions, can be characterized as accommodating examples produced by the Constrained Example Generation (CEG) process. The role of the CEG process in the LRCEG system is demonstrated by a typical learning scenario.

I INTRODUCTION

This paper introduces an emerging system designed to learn racquetball by a process of constrained example generation (LRCEG). Having the capacity to generate a set of constrained examples has been shown to be essential in many domains [1], [2]. Furthermore, learning as a form of accommodating and assimilating knowledge has been demonstrated to be useful to a number of systems [6]. The implication here is that some forms of learning can be modeled, in part, as a process of retrieval, modification, or construction of domain specific knowledge, represented as examples [3], [5]. Exploring a variety of learning issues within the RB domain is made possible by the generality and robustness of the CEG paradigm [4].

Racquetball (RB) as a learning domain has some advantages. The speed and competitive nature of the sport facilitate investigating learning under rapidly changing and strenuous conditions. Many important occupations require learning the appropriate behavior for unpredictable, fast-paced, and demanding environments (e.g., medicine, athletics, aircraft piloting). The learning required to perform tasks in these domains can be very different (although not mutually exclusive) from a typical classroom education.

II ACQUISITION AND REPRESENTATION OF DOMAIN SPECIFIC KNOWLEDGE

Some knowledge about RB must be obtained and represented before the LRCEG system can attempt learning. Knowledge for the initial set of six examples was extracted from interviews of an RB expert. The syntax of the expert's information was "IF A PLAYER ENCOUNTERS object-of-encounters THEN TRY AND MEET object-of-meet AND RESPOND object-of-respond." The context within which learning occurs is the 'object-of-encounters', and the appropriate behaviors for meeting and hitting the ball are 'object-of-meet' and 'object-of-respond', respectively. An example of the expert's advice is "IF A PLAYER ENCOUNTERS a fast cross-court shot THEN TRY AND MEET the ball waist high AND RESPOND with a ceiling shot." Both the context and behavior knowledge are translated into velocity and RB court X,Y,Z coordinates (i.e., 'coordinate-velocity' information). This knowledge is represented in a frame-like data structure [Fig 1a,b].

    (CONTEXT
      (SEMANTICS (X dX) (Y dY) (Z dZ) (V dV))
      (OBJECTIVE objective-function-definition)
      (SITUATION situation-function-definition)
      (RESPONSE response-info-&-function-definition))
    (a.)
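The advice syntax and the frame-like structure described in this paper can be illustrated with a short sketch. The code below is written in Python for readability, not in the LISP of the actual LRCEG system, and it is not the author's implementation. The names `ADVICE_PATTERN`, `parse_advice`, and `ExampleFrame` are assumptions introduced here, the function-definition slots are stood in for by plain strings, and the coordinate and velocity numbers are invented, since the paper does not give the translated values.

```python
import re
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical regular expression for the advice syntax quoted in the text:
# "IF A PLAYER ENCOUNTERS ... THEN TRY AND MEET ... AND RESPOND ...."
ADVICE_PATTERN = re.compile(
    r"^IF A PLAYER ENCOUNTERS (?P<encounters>.+?) "
    r"THEN TRY AND MEET (?P<meet>.+?) "
    r"AND RESPOND (?P<respond>.+?)\.?$"
)


def parse_advice(sentence: str) -> Dict[str, str]:
    """Split one line of expert advice into its three objects."""
    match = ADVICE_PATTERN.match(sentence.strip())
    if match is None:
        raise ValueError("advice does not follow the expected IF/THEN syntax")
    return match.groupdict()


@dataclass
class ExampleFrame:
    """Frame-like structure mirroring the slots of Fig 1a (an assumption)."""

    # SEMANTICS slot: each entry pairs a value with its tolerance, e.g. (X, dX).
    semantics: Dict[str, Tuple[float, float]]
    # The paper stores function definitions in these slots; plain strings
    # stand in for them in this sketch.
    objective: str
    situation: str
    response: str


advice = parse_advice(
    "IF A PLAYER ENCOUNTERS a fast cross-court shot "
    "THEN TRY AND MEET the ball waist high "
    "AND RESPOND with a ceiling shot."
)

# Invented coordinate-velocity values, for illustration only.
frame = ExampleFrame(
    semantics={"X": (10.0, 2.0), "Y": (5.0, 2.0), "Z": (3.0, 1.0), "V": (60.0, 10.0)},
    objective=advice["meet"],
    situation="inverse-distance-to-objective",
    response=advice["respond"],
)
print(advice)
```

Keeping each SEMANTICS entry as a (value, tolerance) pair mirrors the `(X dX)` notation of the frame, which makes a later "within tolerance" retrieval test a one-line comparison per coordinate.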
* This work was supported, in part, by free computer time from the COINS department at the University of Massachusetts, Amherst, Mass. 01003.

The LRCEG system is written in LISP and runs on a VAX 11/780. A flow diagram of the LRCEG system and how a user interacts with it appears in Fig 2. System operation can be described in thirteen steps (corresponding to box numbers in Fig 2) as follows:

(1) The initial ball X,Y,Z position and velocity V, the number of iterations, and the discrete time interval are specified by the user.

(2) Snapshot information (instantaneous ball position and velocity) is gathered.

(3) Retrieve example(s) (from box 14) with satisfiable semantic constraints (i.e., X,Y,Z,V from the snapshot are within dX,dY,dZ,dV of X,Y,Z,V from the semantics slot of the example). Retrieval output is a set of examples that are candidates for modification.

(4) Modification involves calling the function-definition programs defined in the OBJECTIVE, SITUATION, and RESPONSE slots (see Fig 1a). The objective program modifies the X,Y,Z coordinates of where a player meets the ball. The player computes the trajectory of the ball and modifies where, on the RB court, to be in order to satisfy the constraint of meeting the ball at the position specified by the "object-of-meet". The situation function quantifies the player's "situation", which is inversely proportional to the distance between the player and the objective. The response function modifies the X,Y,Z coordinates of where to hit the ball and computes the short-term-success-rate, which is a "goodness" measure of the example's response specification.
Examples are modified if improvement can be made in the short-term-success-rate. Each example is trying to obtain the highest "goodness" measure (i.e., short-term-success-rate value) possible, in effect competing with other examples to be selected as the one having the best response. A better situation, higher long-term-success-rate, slower ball, and faster player will tend to increase the short-term-success-rate. All examples are sorted on their short-term-success-rate before step 5. This allows subsequent retrieval, modification, and execution of the currently best examples first. Being able to retrieve and manipulate the best examples first is important if time and quantity constraints are placed on the learning process.

(5) Execute the action specified by the objective slot of the currently best example (i.e., try to meet the ball by updating the current X,Y,Z of the player).

(6) Check whether the iterations are completed or the player has hit the ball.

(7) Update the ball X,Y,Z and V for the next iteration.

(8) Execute the action specified in the response slot by returning the example with the currently highest short-term-response-rating.

(9) An example has failed if executing its response results in a missed or skipped ball (i.e., one that hits the floor before the front wall). Otherwise, threshold the response rating of the returned example to determine the success or failure of its response.

(10) Self explanatory.

(11) Change the initial context slightly (i.e., vary the initial ball position and/or velocity).
(12) Compute the statistical success rate of the example(s) and use a sigmoidal function to update the long-term-success-rate of the example from 9.

(13) Replace the old long-term-success-rate with the updated one from 12 and add the modified example to 14. This step is the accommodation process, as the new example is now available for competition and evaluation with old examples in subsequent environments.

The LRCEG system is learning successfully if it returns an example having the best response rating for a previously similar context (i.e., what might work best in the future is what previously worked best in similar contexts).

IV EXPERIMENTAL RESULTS

This section presents Figs 3 through 7 as a typical learning scenario in which the system ('player-1') learns the best response for a cross-court shot.

[Figure caption] After rotating Fig 3 +45 degrees about the y axis. '1' traces the path of 'player-1'. 'B' traces the trajectory of the ball starting at 'player-2', who does not move in this scenario. For display purposes the positions of the ball and players are not indicated for every iteration.

[Figure caption] Same as Fig 4 except rotated -90 degrees about the z axis. Line segment 1 illustrates 'player-1' executing the action specified by example number 3. Line segment 2 intersects line segment 1 at the point where example 1's short-term-success-rate is higher than example 3's. At the intersection point 'player-1' begins executing the action specified by example 1. When 'player-1' meets the racquetball, example 1 specifies where to hit it.

[Figure caption] A new and separate example 7 has been created and accommodated as a result of steps 12-14 in Fig 2. Except for the long-term-success-rate, information for example 7 is a copy of the latest modification(s) to example 1. Example 7's long-term-success-rate was computed to be .75. The parent of example 7 (i.e., example 1) now has a long-term-success-rate of .55. Example 3's long-term-success-rate was reduced to .45. (NOTE: the initial long-term-success-rate for all examples was .5.) This figure shows how example 7 begins influencing player-1's behavior earlier in a similar rally. Examples 2 and 4-6 did not influence player-1's behavior because their semantic constraints were not satisfied and/or their short-term-success-rate was not high enough.

[Figure caption] Finally, the long-term-success-rate of example 7 is high enough so that 'player-1' begins executing the action specified by that example from the first iteration.

V FUTURE IMPROVEMENTS AND EXTENSIONS

Future research will extend the limits of the LRCEG system to incorporate more functions currently performed by the user. Once the extensions have been implemented the following learning issues can be addressed: (1) Determine the limits of this paradigm for learning; (2) Ascertain the knowledge and processes necessary to learn a sequence of examples (i.e., a sequence of actions); (3) Evaluate accommodation techniques; and (4) Compare how well the system learns as a function of the expert supplying the knowledge.

VI CONCLUSION

This paper has demonstrated how some types of learning can be accomplished by applying the CEG paradigm to examples representing domain knowledge.

ACKNOWLEDGEMENTS

Many thanks and much gratitude go to Prof. Edwina L. Rissland for her invaluable guidance, critiques, suggestions, and research that provided the foundation for this paper; Daryl T. Lawton for smoothing out my thoughts; and to the COINS department for free computer time on weekends.

BIBLIOGRAPHY

[1] Collins, A. and A. Stevens (1979) "GOALS AND STRATEGIES OF EFFECTIVE TEACHERS" Bolt Beranek and Newman Inc., Cambridge, Mass.

[2] Rissland, E., and E. Soloway (1980) "OVERVIEW OF AN EXAMPLE GENERATION SYSTEM" Proceedings of First National Conference on Artificial Intelligence.

[3] Rissland, E. (1980) "EXAMPLE GENERATION" Proceedings of Third National Conference of the Canadian Society for Computational Studies of Intelligence.

[4] Rissland, E. L., and E. M. Soloway "CONSTRAINED EXAMPLE GENERATION: A TESTBED FOR STUDYING ISSUES IN LEARNING". In Proc IJCAI-81.

[5] Rissland, E., Soloway, E., O'Connor, S., Waisbrot, S., Wall, R., Wesley, L., and T. Weymouth (1980) "EXAMPLES OF EXAMPLE GENERATION USING THE CEG ARCHITECTURE" COINS Technical Report, in preparation.

[6] McDermott, J. (1978) "ANA: AN ASSIMILATING AND ACCOMMODATING PRODUCTION SYSTEM" DARPA Report 3597.
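As a supplementary illustration of the retrieval and competition steps of system operation (steps 3 and 4), the following Python sketch filters examples by their semantic constraints and ranks the survivors by short-term-success-rate. It is a minimal sketch under stated assumptions, not the LISP code of the LRCEG system: the names `Example`, `satisfies`, and `retrieve_and_rank` are hypothetical, "within" is read as an inclusive absolute difference, and every number is invented for the illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Example:
    """One stored example; field names are assumptions for this sketch."""

    name: str
    # SEMANTICS slot: coordinate -> (value, tolerance), e.g. "X" -> (X, dX).
    semantics: Dict[str, Tuple[float, float]]
    short_term_success_rate: float


def satisfies(example: Example, snapshot: Dict[str, float]) -> bool:
    """True if every snapshot value lies within the example's tolerance."""
    return all(
        abs(snapshot[key] - value) <= tolerance
        for key, (value, tolerance) in example.semantics.items()
    )


def retrieve_and_rank(examples: List[Example], snapshot: Dict[str, float]) -> List[Example]:
    """Keep examples whose constraints hold, best short-term rate first."""
    candidates = [e for e in examples if satisfies(e, snapshot)]
    return sorted(candidates, key=lambda e: e.short_term_success_rate, reverse=True)


# Invented snapshot of the ball (position and velocity).
snapshot = {"X": 10.0, "Y": 5.0, "Z": 3.0, "V": 60.0}

# Invented examples; only the first and third satisfy the snapshot.
examples = [
    Example("example-1", {"X": (12.0, 5.0), "Y": (5.0, 3.0), "Z": (3.0, 2.0), "V": (55.0, 10.0)}, 0.6),
    Example("example-2", {"X": (30.0, 5.0), "Y": (5.0, 3.0), "Z": (3.0, 2.0), "V": (60.0, 10.0)}, 0.9),
    Example("example-3", {"X": (9.0, 5.0), "Y": (6.0, 3.0), "Z": (4.0, 2.0), "V": (65.0, 10.0)}, 0.4),
]

ranked = retrieve_and_rank(examples, snapshot)
print([e.name for e in ranked])
```

In this invented data, example-2 has the highest rate but is excluded because its X constraint is not satisfied, which corresponds to the situation the paper describes for examples that did not influence the player's behavior.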