LEARNING RACQUETBALL BY CONSTRAINED EXAMPLE

advertisement
LEARNING RACQUETBALL BY CONSTRAINED EXAMPLE GENERATION*
By Leonard P. Wesley
Department of Computer and I n f o r m a t i o n Science
U n i v e r s i t y of Massachusetts
Amherst, Mass. 01003
ABSTRACT
The syntax of the e x p e r t ' s i n f o r m a t i o n was " I F A
PLAYER ENCOUNTERS o b j e c t - o f - e n c o u n t e r s THEN TRY AND
MEET
object-of-meet
AND
RESPOND
object-of-respond.".
The context w i t h i n which
l e a r n i n g occurs is the ' o b j e c t - o f - e n c o u n t e r s ' , and
the a p p r o p r i a t e behavior
f o r meeting and h i t t i n g
the b a l l i s ' o b j e c t - o f - m e e t ' a n d ' o b j e c t - o f - r e s p o n d '
respectively.
An example of the e x p e r t ' s advice is
" I F A PLAYER ENCOUNTERS a f a s t c r o s s - c o u r t shot
THEN TRY AND MEET the b a l l waist high AND RESPOND
w i t h a c e i l i n g s h o t . " Both the context and behavior
knowledge is t r a n s l a t e d i n t o v e l o c i t y and RB c o u r t
X,Y,2 coordinates
(i.e.,
'coordinate-velocity'
information).
This knowledge is represented in a
f r a m e - l i k e data s t r u c t u r e [ F i g 1a,b3.
A model of l e a r n i n g in a h i g h l y a c t i v e
and
c o m p e t i t i v e environment i s presented h e r e . This
paper p o s i t s t h a t l e a r n i n g , under these c o n d i t i o n s ,
can be c h a r a c t e r i z e d as accommodating examples
produced by the Constrained Example Generation
(CEG) process. The r o l e of the"CEG process in the
LRCEG system is demonstrated by a t y p i c a l
learning
scenario.
I INTRODUCTION
This paper Introduces an emerging
system
designed to l e a r n r a c q u e t b a l l by a process of
constrained example generation (LRCEG). Having the
c a p a c i t y to generate a set of c o n s t r a i n e d examples
has been shown to be e s s e n t i a l in many domains
[ 1 ] . [ 2 ] . Furthermore, l e a r n i n g a s a form o f
accommodating and a s s i m i l a t i n g knowledge has been
demonstrated to be u s e f u l to a number of systems
[6].
.The i m p l i c a t i o n here is t h a t some forms of
l e a r n i n g can be modeled, in p a r t , as a process of
r e t r i e v a l , m o d i f i c a t i o n , o r c o n s t r u c t i o n o f domain
specific
knowledge,
represented
as
examples
C33.C53.
E x p l o r i n g a v a r i e t y of l e a r n i r g
issues
w i t h i n the RB domain is made p o s s i b l e by the
g e n e r a l i t y and robustness of the CEG paradigm [ 4 ] ,
Racquetball (RB) as a l e a r n i n g domain has some
advantages.
The speed and c o m p e t i t i v e n a t u r e of
the aport f a c i l i t a t e s i n v e s t i g a t i n g l e a r n i n g
under
r a p i d l y changing and strenous c o n d i t i o n s . Many
important
occupations
require
learning
the
a p p r o p r i a t e behavior f o r u n p r e d i c t a b l e , f a s t - p a c e d ,
and
demanding
environments
(e.g.,
medicine,
athletics,
aircraft
piloting).
The l e a r n i n g
r e q u i r e d to perform tasks in these domains can be
very d i f f e r e n t
( a l t h o u g h not m u t u a l l y e x c l u s i v e )
from a t y p i c a l classroom e d u c a t i o n . The g e n e r a l i t y
and robustness of the CEG paradigm allows e x p l o r i n g
a v a r i e t y o f l e a r n i n g - R B issues [ 4 ] .
(CONTEXT (SEMANTICS (X dX) (Y dY) (Z dZ) (V dV))
(OBJECTIVE o b j e c t i v e - f u n c t i o n - d e f i n i t i o n )
(SITUATION s i t u a t i o n - f u n c t i o n - d e f i n i t i o n )
(RESPONSE r e s p o n s e - i n f o - 4 - f u r c t i o n - d e f i n i t i o n ) )
(a.)
II ACQUISITION AND REPRESENTATION OF
DOMAIN SPECIFIC KNOWLEDGE
Some knowledge about RB must be obtained and
represented b e f o r e the LRCEG system can attempt
l e a r n i n g . Knowledge f o r the i n i t i a l
set o f six
examples was e x t r a c t e d from i n t e r v i e w s of a RB
expert.
The LRCEG system is w r i t t e n in LISP and runs
on a VAX 11/780.
A f l o w diagram of the LRCEG
system and how a user i n t e r a c t s w i t h it appears
in
Fig 2.
System o p e r a t i o n can be described in
t h i r t e e n steps (corresponding to box numbers in Fig
2 . ) a s f o l l o w s : ( 1 ) l n i t i a l b a l l X,Y,Z p o s i t i o n and
• T h i s work s u p p o r t e d , in p a r t , by f r e e computer
time from the COINS department at the U n i v e r s i t y of
Massachusetts, Amherst, Mass. 01003.
144
v e l o c i t y V, number of i t e r a t i o n s , and d i s c r e t e time
interval
is s p e c i f i e d by the user.
(2)A snapshot
(instantaneous
ball
position
and
velocity)
information is gathered.
( 3 ) R e t r i e v e example(s)
(from
box
14)
with
satisfiable
semantic
constraints(i.e.
X,Y,Z,V from snapshot are w i t h i n
dX,dY,dZ,dV of X,Y,Z,V from semantics s l o t of the
example.
R e t r i e v a l output is a set of examples
that
are
candidates
for
modification.
(4)Modification
involves
calling
the
function-definition
programs
defined
in
the
OBJECTIVE, SITUATION, and RESPONSE s l o t s (see F i g
1 a . ) . The o b j e c t i v e program m o d i f i e s the X,Y,Z
c o o r d i n a t e s of where a player meets the b a l l , The
player computes the t r a j e c t o r y o f the b a l l and
m o d i f i e s where, on the RB c o u r t , to be in order to
s a t i s i f y the c o n s t r a i n t o f meeting the b a l l a t
the
p o s i t i o n s p e c i f i e d by the " o b j e c t - o f - m e e t " . The
situation
function
quantifies
the
players
"situation"
which i s i n v e r s e l y p r o p o r t i o n a l t o the
d i s t a n c e between the player and the o b j e c t i v e . The
response f u n c t i o n m o d i f i e s the X.Y.Z coordinates of
where to
hit
the
ball
and
computes
the
short-term-success-rate
which
is a "goodness"
measure of the example's response s p e c i f i c a t i o n .
Examples are m o d i f i e d if improvement can be made in
the s h o r t - t e r m - s u c c e s s - r a t e .
Each
example
is
t r y i n g t o o b t a i n the highest "goodness" ( i . e . ,
s h o r t - t e r m - s u c c e s s - r a t e value) measure p o s s i b l e , in
e f f e c t competing w i t h other examples to be selected
as the one having the best response.
A better
s i t u a t i o n , higher l o n g - t e r m - s u c c e s s - r a t e , slower
b a l l , and f a s t e r player w i l l tend t o increase the
short-term-success-rate.
A l l examples are sorted
on t h e i r s h o r t - t e r m - s u c c e s s - r a t e before step 5.
This a l l o w s subsequent r e t r i e v a l , m o d i f i c a t i o n , and
execution o f the c u r r e n t l y best examples f i r s t .
Being able to r e t r i e v e and manipulate the best
examples f i r s t i s important i f time and
quantity
c o n s t r a i n t s are placed on the l e a r n i n g process.
(5)Execute a c t i o n s p e c i f i e d b y o b j e c t i v e s l o t o f
c u r r e n t l y best example ( i . e . , t r y t o meet the b a l l
by
updating
current
X,Y,Z
of
player).
( 6 ) I t e r a t i o n s completed o r player has h i t the b a l l .
(7)Update b a l l X,Y,Z and V f o r next i t e r a t i o n .
(8)Execute the a c t i o n s p e c i f i e d in the response
s l o t by r e t u r n i n g the example w i t h the
currently
highest
short-term-response-rating.
(9)An
example has f a i l e d i f executing i t s response
r e s u l t s i n a missed o r skipped b a l l ( h i t s f l o o r
before
front wall).
Otherwise, t h r e s h o l d the
response r a t i n g of the
returned
example
to
determine the success o r f a i l u r e o f i t s response.
( 1 0 ) S e l f e x p l a n a t o r y (11)Change the i n i t i a l c o n t e x t
s l i g h t l y ( i . e . , vary the i n i t i a l
ball position
and/or v e l o c i t y ) .
(12)Compute the
statistical
success r a t e of the example(s) and use a sigmoidal
f u n c t i o n to update l o n g - t e r m - s u c c e s s - r a t e of the
example
from
9.
(13) Replace
old
l o n g - t e r m - s u c c e s s - r a t e w i t h updated one from 12 and
add the m o d i f i e d example to 14. This step is the
accommodation process as the new example is now
a v a i l a b l e f o r c o m p e t i t i o n and e v a l u a t i o n w i t h o l d
examples in subsequent environments.
The LRCEG system is l e a r n i n g
s u c c e s s f u l l y if
it r e t u r n s a example having the best response
r a t i n g f o r a p r e v i o u s l y s i m i l a r c o n t e x t ( i . e . , what
might work best in the f u t u r e is what p r e v i o u s l y
worked best i n s i m i l a r c o n t e x t s ) .
IV EXPERIMENTAL RESULTS
This s e c t i o n presents F i g s , 3 through 7 as a
typical
l e a r n i n g scenario i n which the system
('player-l')
l e a r n s the best response
for
a
cross-court shot.
145
semantic c o n s t r a i n t s were
not
satisfied
and/or
t h e i r s h o r t - t e r m - s u c c e s s - r a t e was not
high enough.
A f t e r r o t a t i n g F i g 3 +45 degrees about the y
axis.
'1'
t r a c e s the path o f ' p l a y e r - 1 1 .
*B' t r a c e s the t r a j e c t o r y o f the b a l l
start*
i n g at ' p l a y e r - 2 ' who does not move in t h i s
s c e n a r i o . For d i s p l a y purposes the p o s i t i o n
of the b a l l and p l a y e r s are not i n d i c a t e d f o r
every i t e r a t i o n .
F i n a l l y the l o n g - t e r m - s u c c e s s - r a t e of example
7 is high enough so t h a t ' p l a y e r - 1 ' begins
executing the a c t i o n s p e c i f i e d by t h a t example from the f i r s t i t e r a t i o n .
V FUTURE IMPROVEMENTS AND EXTENSIONS
Future research w i l l extend the l i m i t s o f the
LRCEC
system
to
incorporate
more f u n c t i o n s
c u r r e n t l y performed by the
user.
Once
the
extensions have been implemented the f o l l o w i n g
l e a r n i n g issues can be addressed: (1)Determine the
l i m i t s o f t h i s paradigm f o r l e a r n i n g ( 2 ) A s c e r t a i n
the knowledge and proceses necessary to l e a r n a
sequence of examples ( i . e .
sequence of a c t i o n s )
(3)Evaluate accommodation techniques and (4)Compare
how w e l l the system l e a r n s as a f u n c t i o n of the
expert supplying the knowledge.
VI CONCLUSION
This paper has demonstrated how some types of
l e a r n i n g can be accompolished by a p p l y i n g the CEG
paradigm to examples r e p r e s e n t i n g domain knowledge.
ACKNOWLEDGEMENTS
Many thanks and much g r a t i t u d e goes to P r o f .
Edwina L.
Rissland
f o r her i n v a l u a b l e guidance,
c r i t i q u e s , s u g g e s t i o n s , and research t h a t provided
the foundation f o r t h i s paper; Daryl T. Lawton
f o r smoothing out my t h o u g h t s ; and to the COINS
department f o r f r e e computer time on weekends.
BIBLIOGRAPHY
[ 1 ] C o l l i n s , A. and A.
Stevens, (1979)
"GOALS
AND
STRATAGIES
OF
EFFECTIVE
TEACHERS"
Bolt
Beranek
and
Newnan
Inc.,
Cambridge, Mass.
[ 2 ] R i s s l a n d , E . , and E. Soloway (1980) "OVERVIEW
OF
AN
EXAMPLE
GENERATION
SYSTEM"
Proceedings o f F i r s t
National
Conference
on
A r t i f i c a l Intelligence
C3) R i s s l a n d , E.
(1980)
"EXAMPLE GENERATION"
Proceedings of T h i r d National Conference of
the
Canadian
Society
for
Computational
Studies o f
Intelligence
[ 4 ] R i s s l a n d , E.
L . , and
E.
M,
Soloway
"CONSTRAINED EXAMPLE GENERATION:
A TESTBED
FOR STUDYING ISSUES IN LEARNING".
In Proc
IJCAI-81
[5] Rissland,
E., Soloway, E., O'Connor,
S.,
Waisbrot, S . , W a l l , R., Wesley, L . , T.
Weymouth (1980)
"EXAMPLES OF
EXAMPLE
GENERATION USING THE CEG
ARCHITECTURE"
COINS Technical Report, in p r e p e r a t i o n .
[ 6 ] McDERMOTT, J.
(1978)
"ANA:
AN
ASSIMILATING
AND
ACCOMMODATING PRODUCTION
SYSTEM"
DARPA R e p o r t 3597
Sane as F i g 4 except r o t a t e d -90 degrees
about the z a x i s . Line segment 1 i l l u s t r a t e s
' p l a y e r - r executing the a c t i o n s p e c i f i e d b y
example number 3. Line segment 2 i n t e r s e c t s
l i n e segment 1 at the p o i n t where example 1's
s h o r t - t e r m - s u c c e s s - r a t e is higher than example
3's.
At
the
intersection
point
* p l a y e r - l ' begins executing the a c t i o n s p e c i f i e d by example 1. When ' p l a y e r - 1 * meets the
r a c q u e t b a l l example 1 s p e c i f i e s where to h i t
it.
A new and seperate example 7 has been created
and accommodated as a r e s u l t of steps 12-14
in
Fig
2.
Except
for
the
l o n g - t e r m - s u c c e s s - r a t e , i n f o r m a t i o n f o r examp l e 7 is a copy of the l a t e s t m o d i f i c a t i o n s )
to
example
1.
Example
7fs
l o n g - t e r m - s u c c e s s - r a t e was computed to be .75
The parent of example 7 ( i . e . , example 1)
now has a l o n g - t e r m - s u c c e s s - r a t e of .55
Example 3's l o n g - t e r m - s u c c e s s - r a t e was r e duced
to
.45
.
(NOTE:
initial
long-term-success-rates
f o r a l l examples was
• 5 ) . T h i s f i g u r e shows how example 7 begins
i n f l u e n c i n g p l a y e r - T s behavior e a r l i e r i n a
s i m i l a r r a l l y . Examples 2 and 4-6 d i d not
i n f l u e n c e p l a y e r - T s behavior because t h e i r
146
Download