Towards Goal-Driven Reflective Learning Pierre E. Bonzon HEC-Inforge,

From: AAAI Technical Report SS-94-02. Compilation copyright © 1994, AAAI (www.aaai.org). All rights reserved.
Towards Goal-Driven Reflective
Learning
Pierre E. Bonzon
HEC-Inforge,
Universityof Lausanne,
1015Switzerland
pbonzon@hec.unil.ch
1. Introduction
The rest of this paper is organized as follows: section 2
reviews somefundamentalconcepts about reflection; section
3 introduces the one particular framework we have been
experimenting with so far; section 4 illustrates
the
preceeding discussions with a simple working example.
Morethan 35 years ago (which is quite a long time in AI
history ), John McCarthywrote:
"... in orderfor a program
to be capableof learningsomething,it
mustfirst be capableof beingtold it... Includedin the set of
imperativeswhichmaybe obeyedis the routine whichdeducesand
2.-Towards a reflective
architecture
obeys.
"[6]
Exceptfor various ad hoc (and sometimesquite successful)
systems, this de facto manifesto calling for the study of
introspective systems did not give rise to what maybe
called "a general architecture for declarative and/or
reflective machinelearning". Somerecent research taking
place under the label of "goal-driven learning" signals
howevera renewed interest in these very basic issues: "
learning in such systems is an active process.., to generate
learning goals, goal-driven learning systems must be
introspective"
[5].
learning tower
Following the pioneering work of B. Smith [9], it has
become customary to introduce reflective systems by
considering an infinite tower of metacircular language
processors (or interpreters) in which each processor
interprets the one below it, with the interpreter at the
bottom executing user input and the wholetower being run
by an "ultimate machine"at the top. As recalled by Jefferson
and Friedman[3] (which also provide a very readable and
enjoyable reconstruction of an equivalent finite tower), the
various levels in this tower are connected by a mechanism,
called reifieation, that permits a programrtmning at one
level to provide code to the next higher level. Thuswhena
user level re/tier is applied,"its bodyis run as ff it werecode
belonging to the interpreter running the application". In
another view of this same process, the contents of an
interpreter registers are passed above suitably packageor
reifed, such that the receiving interpreter can manipulate
them(this process is then described as converting program
into data). Conversely,reflection (giving rise to so-called
deifiers) activates a level downin the tower, andcan be seen
as the process by which some data values from an
interpreter are loaded into the lower interpreter registers.
Consequently, "this process maybe thought of as turning
data into program"[10].
Interestingly enough, researchers from outside of the AI
community have developped high hopes from such
systems: "..decision support systems can be provided by
facilitating andstimulating reflective learning"[1]. For most
people howeverreflective learning remains an unexplained
concept. A tentative intuitive description goes as follows:
after solving a problem,reflect on the solution, i.e. try and
express implicit solving processes and/or ideas as explicit
procedures and/or knowledge.
A more constructive approach leads to the following 3steps process:
after focusingon a problem,select a particular goal
look for waysto achieve this goal and then build a plan
leading to a solution
finally, ’Yeflect" on this plan and comeup with a
reusable framework allowing one to easily retrieve
and/or deduce pairs of goal/plan instances.
This last description bears numerous analogies with
previously advocated learning paradigms, such as
explanation-based learning (EBL) . in fact many EBL
problem solvers have been designed "to learn from
successful operator sequences" with the definite goal of
improving
the system’s overall problem solving
performance [7]. Our ownapproach basically follows the
samepath. To try and distinguish it from previous work,let
us mentionits use of
Level 2
l
Level 1
l
Level 0
a reflection schemeallowing the deliberative integration
of planning,executionand learning steps [4]
predeflned generic operators that permit one to improve
problemsolving performanceby switching from "search
guided only by weak methods to domain-dependent
focusedbehavior"[2].
wt~pet
122
reification
~I
lO^al-aUOo~ ~eq Sma. o J.qm snhq ’,(Joaql aseq ieu~po a~
;noq~noa~m.eloJ saop I! ’olelselom MouUMO
SI! OIU~alels
aseq ~au aq; so~npo.qm
Jalaaclaolu~iom aldUn.S .s~ aI!qM
(llU).41aa""I’o
uol
olmst6¢u
-$-o
0 lOaO’l
-~atu
.......
J.......
I
.............
I lOaWI
°
~ ..............
~
:SMOllOse pal~!dapaq plnoa a.,man~sSu~. lnsoz
j
aU.L "aaMol ~ .u!uaea
I Xueaapun paaqd ac! plnoo aaMo;
al/en~uq ,(ue ’AFosaa^uco"(plaOMleuaalxa aq; jo lapom
eolaa^m362eqM
a~uaaolaa Atre ;noq;!M)
uoBelaacholm.
aSm~uqlapOm oi ~ osodand OlOS oSOqM’auo sno!^aJd
oql a^oqepaaqdaq IIOMplnco ’plaoM(leaa ao) ie,,,a;xa aq;
ol pa;eIaa sa.n~!~e amoslapom ol s! asochnd asoqM’aaMm
Mau.s~ ;aqUl p"auo leuo.q!pe~ oq; jo MOtApaddolo^a
u~op-oplsdn
ue ;sn(Ious! uo.qeIuasaad s!.~ ;ehqaIou
oiluei~odu~.
~ I!’molIoqoq~leuqsatuo.~ II.qS Indm.
~asn
sv
¯ aan;aal!qvaelapOmSmu.rca
I ano
IO ;acaq orb le sptmIs suo.~!Idde aa.~aa }o u~ aqna.med
e Moqttola)lS aM a~qM£ uogaas MOlaC
1 aas ’aldmexa
ut~ qons.loj LtO:lO.ld~:l~.UOA.~
e .t.q~.t~ ulo.IJ xalaad~.u~am
Oaezlsqe aaom ~’!) aaMo[ e uru oi posn aq ,(era
:I~.~.
1 e ’,~l’asa’aauoD
¯ [ Faaa[Ie~OlSpOllop~ .ulpuodsoJ.Ioo
orb olin.(uo.q~u pue ~oa~ ’ole;s e "o’Duo!l~eeIam
pue ale~selam a~ moo e;ep 0 la^al peol ol spaau
¯ {i1Io ~a/~ape ’~alaacha;uIelama~ .ugB.t~ mo~j~alaadaoIuT
(~e~sqess~I "o’.t) aaddna~ aIe^~aeol aapaout ’aldmexaaO~l
¯ .~tetoi ato Lq. dn[~AOI
e saleal~aesaaBTap
al~Ixi ’Ia^Ol amo/lxau aq; ;e opoa um o~ sLnmadMou
( uowe).qdde
all;S~umu~ma~A,a;m.
a~ m B .u~uoI~
apoo a~a~ ~(a~ /I se Suluumsasso,~o~d a" D sao.~. ~
(uo.neaoI
molloq ~osso~d aleut, in aq~ jo aauanbasuoo e) II
moqoauo aq~ slaxi.talm, p^aI Ua^l~e le aossoooadqaea
tao~loqaq; le Mouaxe
(aalaachalu~ uP;Ilnq e ,(q palaoddnsI olaaaq paumsse
’salaada’olu~;am aq~ ~a.) aossaaoad aleut. In a~ /mu
(uo.~aee;am a~ pue a+else+atu aq; +’!) +nduT ~asn q;oq
J:oama!d[aaal o,~ ~ .u~ono
oq~Ii~u~qlaM’oan~olIq~JeoqI Jo ozo4:)n.tls o^l~o[Jo.l
paz!soP o~ ;unoaae olin. $ .up~£ .paSue~un;ja I oae s~aq~o
al!qM pa;epdn a~ smauodmoaaims aq~ jo amos MoqOION
0oladml~m
otlo/tmod
mau
qnomau"1"o
. oo~ mmm
ollo/omd
qmm
"t1’
a’~’’I’~)
(’(t
-
0 l’oaO’l
,~iI
:a.lr~Id I’aaaI auo ~U~OHO
§ oq:l I~ snhq ptm
gMt~smaue $’a) lnchno se IIOMse (Fuo~ pue ,~v~s ’uo~.~
ue ~J’a) lndm.lloudxa ~Imla^al tlOtra Fapotuo~ asooq"~M
’[o~luco
la^apmam~o uwmopaseq e ~a~W
Inoqea~palMom
i Su~uasaadaa Aaoaq~ amosuo 9m;.g~ado
~ela~aI~ u+e oI puodsozioo p[no~ ~oI ~au .s~
ul Ia^aI qaea ,(qaaaq~ ’aan~aaIltpae~uno;8u!uaral aa!$a;%ba
e aoj ~t.,DiooI uaoqoaojaaotD OaeqaM"JnOlaeq’oc
I ~t.quaea[
ao/pu~ a^Bdepe ot aoop aq; suado snql pue uo!i~aad~u!
ao/ptm u~m~/a~ ~ a!uamFt~ saaatqae ssaaoad sttD aloqM
aq~ u0 "a/q/.su~xa samct~I jl-OSlt a~tmguqa~ llns~ e sv
:so.~s.uopeaeq~~ .u~oIIO
t aql seq
ompn~saaasol asau .s~ ’saossa~acl aSen~uq aelnaapeiam
~u~.uamaldu~ ~OMO1Ie.U~ao aql oi podsa~ ~!M
¯ ano~eqaq8u!u~I ann
e tJupo .zmu mql ’yas~!~oo~as~ a~ a;~dn m paul!sap
aq ,(Usea plnoo Slapomaleaoqqa aaOl~l "a^oqe uaa~iapom
This final structure dearly distinguishes two fundamental
componmts,Le. the L-leveis and the K-levels (for languageand knowledge-level, respectively). While us~ inputs are
received at the bottom,they are ultimately interpreted at the
L-ultimate level, which is where "some concrete
computation really proceeds, say, by using underlying
hardwareresources"[8]. This ultimate L-level also coincides
with the base K-level, where the knowledgeinterpretation
proc~s is actually initiated. On-goingcomputations up to
the K-ultimate level eventually relate to the domainat the
top, whichmodelsthe external world.
When the number of possible events becomes high,
forward chaining is a better choice. Skilled students,
reflecting on their experience, will eventually switch to this
mode. Whenencountering successive instances of the same
kind of event they will further cut their search effort, a
numberof analogies gradually allowing them to "blindly"
follow the same procedure. Directly applicable operators
are thus implicitly put to work.
As hypothesized in section 3, MEAplans can be used as a
starting point to modelthis learning process. Theplans thus
obtained can then be used by the learning theory to
instantiate generic entry operators. Togetherwith modified
domainoperators bearing the nameof single preconditions,
these entry operators will constitute a newdomaintheory
that in essence allows one to forwardchain the right set of
journal entries.
3.-A particular learning scheme based
on predefined generic operators
For the p~ of our discussion it suffices to note in a
first approach that each processor in the learning tower
(except for the one which coincides with the L-ultimate
level) can be represented in a STRIPS-like formalism
involving the usual precondition, add and delete lists,
together with an additional execute list containing calls to
reiflers and/or deifiers. Besidesbase level knowledgetaking
the form of domain operators, a meta-level may well
embodya classical means-end analysis forming an MEA
planner, Le a typical weak search method. Our learning
schemeis then based on the following assumptions:
5.-
Acknowledgement
The author is greatly indebted to anonymousreferees for
valuable commentson a previous version of this paper.
6. References
to any plan returned by the MEA
planner (the result of
weaksearch methodanalogous to a backwardchaining
behavior possibly involving backtracking) corresponds
a learned program analogous to an optimized forward
chaining behavior involving the application of derived
domain operators
which can be used without
preconditions.
Among
the (meta)theories that are put to work via deifiers
to implementthis scheme,we single out:
a "generic" theory containing partially uninstantiated
operators whoseexecute list chains together successive
calls to reifiers allowing to activate a lower level
metatheory(such as metaapply), allowing one in turn to
activate derived upper level domainoperators
a learning meta-theory which, given a particular MEA
plan, is ~ to instantiate operators from the generic
theory and to derive directly applicable operators from
the original domaintheory, thus giving rise to a directly
applicable learned domaintheory.
4.-Application:
a dynamic student’s
model of skill acquisition
The learning schemejust introduced was inspired by and
intended to serve as the basis for modelling student’s
acquisition of financial accounting skills from textbook
knowledge.Suchstudent routinely face the goal of finding
the right journal entries correspondingto given accounting
situations. Applicable textbook knowledge,as expressed in
a formalbase-level theory, first providesthe kind of entries
associated with various possible basic transactions. Further
textbook knowledgeindicates the kind of events leading to
such transactions. Under the hypothesis that unskilled
students will "go by the book", the implicit solving process
allowing them to deduce the correct entries will be
equivalent to a backwardchaining search, and thus will
match the behavior of the MEA
planner.
[1]
Angehrn, A., Stimulus Agents: An alternative
Framework for Computer-aided Decision Making,
DC:,S-92Transactions, M.Silver (ed), TheInstitute
ManagementScience, 1992
[2]
Carbonnel, J. , introduction: Paradigmsfor Machine
Learning,Artificial Intelligence,40, 1989
[3]
Jefferson, s., and Friedman,D., A Simple Reflective
interpreter, in Yonezawa,Y. and Smith, B., (eds),
Reflection and Meta-LevelArchitecture, Proc . IMSA
Workshop, 1992
[4]
Kuokka,D. The Deliberative Integration of Planning,
Execution and Learning, PhDThesis, Carnegie Mellon
Univ., 1990
[5]
Leake, D, and Ram, A., Goal-Driven Learning:
FundamentalIssues, AI Magazine,14,4,1993
[6]
McCarthy, J., Programs with Common
Sense, 1958,
appeared in M. Minsky
(ed), Semantic Information
Processing,MITPress, 1968
[7]
Minton, S. et al., Explanation-Based Learning, A
ProblemSolvingPerspective,Artificial Intelligence, 40
1989
[8]
Nakajima, S., WhatMakesa LanguageReflective and
How(preliminary results), in Yonezawa, Y. and
Smith,B., (eds), Reflection and Meta-LevelArchitecture,
Proc. IMSAWorkshop,1992
[9]
Smith,B., Reflection and and Semanticsin Lisp, Pro(:.
ACMPOPL, 1984
[10] Wand, M., and Friedman, D., The Mystery of the
TowerRevealed: a Non-Reflective Description of the
Reflective Tower, Lisp and Symbolic Computation,
1,1, 1988
124