From: AAAI Technical Report SS-94-02. Compilation copyright © 1994, AAAI (www.aaai.org). All rights reserved. Towards Goal-Driven Reflective Learning Pierre E. Bonzon HEC-Inforge, Universityof Lausanne, 1015Switzerland pbonzon@hec.unil.ch 1. Introduction The rest of this paper is organized as follows: section 2 reviews somefundamentalconcepts about reflection; section 3 introduces the one particular framework we have been experimenting with so far; section 4 illustrates the preceeding discussions with a simple working example. Morethan 35 years ago (which is quite a long time in AI history ), John McCarthywrote: "... in orderfor a program to be capableof learningsomething,it mustfirst be capableof beingtold it... Includedin the set of imperativeswhichmaybe obeyedis the routine whichdeducesand 2.-Towards a reflective architecture obeys. "[6] Exceptfor various ad hoc (and sometimesquite successful) systems, this de facto manifesto calling for the study of introspective systems did not give rise to what maybe called "a general architecture for declarative and/or reflective machinelearning". Somerecent research taking place under the label of "goal-driven learning" signals howevera renewed interest in these very basic issues: " learning in such systems is an active process.., to generate learning goals, goal-driven learning systems must be introspective" [5]. learning tower Following the pioneering work of B. Smith [9], it has become customary to introduce reflective systems by considering an infinite tower of metacircular language processors (or interpreters) in which each processor interprets the one below it, with the interpreter at the bottom executing user input and the wholetower being run by an "ultimate machine"at the top. As recalled by Jefferson and Friedman[3] (which also provide a very readable and enjoyable reconstruction of an equivalent finite tower), the various levels in this tower are connected by a mechanism, called reifieation, that permits a programrtmning at one level to provide code to the next higher level. Thuswhena user level re/tier is applied,"its bodyis run as ff it werecode belonging to the interpreter running the application". In another view of this same process, the contents of an interpreter registers are passed above suitably packageor reifed, such that the receiving interpreter can manipulate them(this process is then described as converting program into data). Conversely,reflection (giving rise to so-called deifiers) activates a level downin the tower, andcan be seen as the process by which some data values from an interpreter are loaded into the lower interpreter registers. Consequently, "this process maybe thought of as turning data into program"[10]. Interestingly enough, researchers from outside of the AI community have developped high hopes from such systems: "..decision support systems can be provided by facilitating andstimulating reflective learning"[1]. For most people howeverreflective learning remains an unexplained concept. A tentative intuitive description goes as follows: after solving a problem,reflect on the solution, i.e. try and express implicit solving processes and/or ideas as explicit procedures and/or knowledge. A more constructive approach leads to the following 3steps process: after focusingon a problem,select a particular goal look for waysto achieve this goal and then build a plan leading to a solution finally, ’Yeflect" on this plan and comeup with a reusable framework allowing one to easily retrieve and/or deduce pairs of goal/plan instances. This last description bears numerous analogies with previously advocated learning paradigms, such as explanation-based learning (EBL) . in fact many EBL problem solvers have been designed "to learn from successful operator sequences" with the definite goal of improving the system’s overall problem solving performance [7]. Our ownapproach basically follows the samepath. To try and distinguish it from previous work,let us mentionits use of Level 2 l Level 1 l Level 0 a reflection schemeallowing the deliberative integration of planning,executionand learning steps [4] predeflned generic operators that permit one to improve problemsolving performanceby switching from "search guided only by weak methods to domain-dependent focusedbehavior"[2]. wt~pet 122 reification ~I lO^al-aUOo~ ~eq Sma. o J.qm snhq ’,(Joaql aseq ieu~po a~ ;noq~noa~m.eloJ saop I! ’olelselom MouUMO SI! OIU~alels aseq ~au aq; so~npo.qm Jalaaclaolu~iom aldUn.S .s~ aI!qM (llU).41aa""I’o uol olmst6¢u -$-o 0 lOaO’l -~atu ....... J....... I ............. I lOaWI ° ~ .............. ~ :SMOllOse pal~!dapaq plnoa a.,man~sSu~. lnsoz j aU.L "aaMol ~ .u!uaea I Xueaapun paaqd ac! plnoo aaMo; al/en~uq ,(ue ’AFosaa^uco"(plaOMleuaalxa aq; jo lapom eolaa^m362eqM a~uaaolaa Atre ;noq;!M) uoBelaacholm. aSm~uqlapOm oi ~ osodand OlOS oSOqM’auo sno!^aJd oql a^oqepaaqdaq IIOMplnco ’plaoM(leaa ao) ie,,,a;xa aq; ol pa;eIaa sa.n~!~e amoslapom ol s! asochnd asoqM’aaMm Mau.s~ ;aqUl p"auo leuo.q!pe~ oq; jo MOtApaddolo^a u~op-oplsdn ue ;sn(Ious! uo.qeIuasaad s!.~ ;ehqaIou oiluei~odu~. ~ I!’molIoqoq~leuqsatuo.~ II.qS Indm. ~asn sv ¯ aan;aal!qvaelapOmSmu.rca I ano IO ;acaq orb le sptmIs suo.~!Idde aa.~aa }o u~ aqna.med e Moqttola)lS aM a~qM£ uogaas MOlaC 1 aas ’aldmexa ut~ qons.loj LtO:lO.ld~:l~.UOA.~ e .t.q~.t~ ulo.IJ xalaad~.u~am Oaezlsqe aaom ~’!) aaMo[ e uru oi posn aq ,(era :I~.~. 1 e ’,~l’asa’aauoD ¯ [ Faaa[Ie~OlSpOllop~ .ulpuodsoJ.Ioo orb olin.(uo.q~u pue ~oa~ ’ole;s e "o’Duo!l~eeIam pue ale~selam a~ moo e;ep 0 la^al peol ol spaau ¯ {i1Io ~a/~ape ’~alaacha;uIelama~ .ugB.t~ mo~j~alaadaoIuT (~e~sqess~I "o’.t) aaddna~ aIe^~aeol aapaout ’aldmexaaO~l ¯ .~tetoi ato Lq. dn[~AOI e saleal~aesaaBTap al~Ixi ’Ia^Ol amo/lxau aq; ;e opoa um o~ sLnmadMou ( uowe).qdde all;S~umu~ma~A,a;m. a~ m B .u~uoI~ apoo a~a~ ~(a~ /I se Suluumsasso,~o~d a" D sao.~. ~ (uo.neaoI molloq ~osso~d aleut, in aq~ jo aauanbasuoo e) II moqoauo aq~ slaxi.talm, p^aI Ua^l~e le aossoooadqaea tao~loqaq; le Mouaxe (aalaachalu~ uP;Ilnq e ,(q palaoddnsI olaaaq paumsse ’salaada’olu~;am aq~ ~a.) aossaaoad aleut. In a~ /mu (uo.~aee;am a~ pue a+else+atu aq; +’!) +nduT ~asn q;oq J:oama!d[aaal o,~ ~ .u~ono oq~Ii~u~qlaM’oan~olIq~JeoqI Jo ozo4:)n.tls o^l~o[Jo.l paz!soP o~ ;unoaae olin. $ .up~£ .paSue~un;ja I oae s~aq~o al!qM pa;epdn a~ smauodmoaaims aq~ jo amos MoqOION 0oladml~m otlo/tmod mau qnomau"1"o . oo~ mmm ollo/omd qmm "t1’ a’~’’I’~) (’(t - 0 l’oaO’l ,~iI :a.lr~Id I’aaaI auo ~U~OHO § oq:l I~ snhq ptm gMt~smaue $’a) lnchno se IIOMse (Fuo~ pue ,~v~s ’uo~.~ ue ~J’a) lndm.lloudxa ~Imla^al tlOtra Fapotuo~ asooq"~M ’[o~luco la^apmam~o uwmopaseq e ~a~W Inoqea~palMom i Su~uasaadaa Aaoaq~ amosuo 9m;.g~ado ~ela~aI~ u+e oI puodsozioo p[no~ ~oI ~au .s~ ul Ia^aI qaea ,(qaaaq~ ’aan~aaIltpae~uno;8u!uaral aa!$a;%ba e aoj ~t.,DiooI uaoqoaojaaotD OaeqaM"JnOlaeq’oc I ~t.quaea[ ao/pu~ a^Bdepe ot aoop aq; suado snql pue uo!i~aad~u! ao/ptm u~m~/a~ ~ a!uamFt~ saaatqae ssaaoad sttD aloqM aq~ u0 "a/q/.su~xa samct~I jl-OSlt a~tmguqa~ llns~ e sv :so.~s.uopeaeq~~ .u~oIIO t aql seq ompn~saaasol asau .s~ ’saossa~acl aSen~uq aelnaapeiam ~u~.uamaldu~ ~OMO1Ie.U~ao aql oi podsa~ ~!M ¯ ano~eqaq8u!u~I ann e tJupo .zmu mql ’yas~!~oo~as~ a~ a;~dn m paul!sap aq ,(Usea plnoo Slapomaleaoqqa aaOl~l "a^oqe uaa~iapom This final structure dearly distinguishes two fundamental componmts,Le. the L-leveis and the K-levels (for languageand knowledge-level, respectively). While us~ inputs are received at the bottom,they are ultimately interpreted at the L-ultimate level, which is where "some concrete computation really proceeds, say, by using underlying hardwareresources"[8]. This ultimate L-level also coincides with the base K-level, where the knowledgeinterpretation proc~s is actually initiated. On-goingcomputations up to the K-ultimate level eventually relate to the domainat the top, whichmodelsthe external world. When the number of possible events becomes high, forward chaining is a better choice. Skilled students, reflecting on their experience, will eventually switch to this mode. Whenencountering successive instances of the same kind of event they will further cut their search effort, a numberof analogies gradually allowing them to "blindly" follow the same procedure. Directly applicable operators are thus implicitly put to work. As hypothesized in section 3, MEAplans can be used as a starting point to modelthis learning process. Theplans thus obtained can then be used by the learning theory to instantiate generic entry operators. Togetherwith modified domainoperators bearing the nameof single preconditions, these entry operators will constitute a newdomaintheory that in essence allows one to forwardchain the right set of journal entries. 3.-A particular learning scheme based on predefined generic operators For the p~ of our discussion it suffices to note in a first approach that each processor in the learning tower (except for the one which coincides with the L-ultimate level) can be represented in a STRIPS-like formalism involving the usual precondition, add and delete lists, together with an additional execute list containing calls to reiflers and/or deifiers. Besidesbase level knowledgetaking the form of domain operators, a meta-level may well embodya classical means-end analysis forming an MEA planner, Le a typical weak search method. Our learning schemeis then based on the following assumptions: 5.- Acknowledgement The author is greatly indebted to anonymousreferees for valuable commentson a previous version of this paper. 6. References to any plan returned by the MEA planner (the result of weaksearch methodanalogous to a backwardchaining behavior possibly involving backtracking) corresponds a learned program analogous to an optimized forward chaining behavior involving the application of derived domain operators which can be used without preconditions. Among the (meta)theories that are put to work via deifiers to implementthis scheme,we single out: a "generic" theory containing partially uninstantiated operators whoseexecute list chains together successive calls to reifiers allowing to activate a lower level metatheory(such as metaapply), allowing one in turn to activate derived upper level domainoperators a learning meta-theory which, given a particular MEA plan, is ~ to instantiate operators from the generic theory and to derive directly applicable operators from the original domaintheory, thus giving rise to a directly applicable learned domaintheory. 4.-Application: a dynamic student’s model of skill acquisition The learning schemejust introduced was inspired by and intended to serve as the basis for modelling student’s acquisition of financial accounting skills from textbook knowledge.Suchstudent routinely face the goal of finding the right journal entries correspondingto given accounting situations. Applicable textbook knowledge,as expressed in a formalbase-level theory, first providesthe kind of entries associated with various possible basic transactions. Further textbook knowledgeindicates the kind of events leading to such transactions. Under the hypothesis that unskilled students will "go by the book", the implicit solving process allowing them to deduce the correct entries will be equivalent to a backwardchaining search, and thus will match the behavior of the MEA planner. [1] Angehrn, A., Stimulus Agents: An alternative Framework for Computer-aided Decision Making, DC:,S-92Transactions, M.Silver (ed), TheInstitute ManagementScience, 1992 [2] Carbonnel, J. , introduction: Paradigmsfor Machine Learning,Artificial Intelligence,40, 1989 [3] Jefferson, s., and Friedman,D., A Simple Reflective interpreter, in Yonezawa,Y. and Smith, B., (eds), Reflection and Meta-LevelArchitecture, Proc . IMSA Workshop, 1992 [4] Kuokka,D. The Deliberative Integration of Planning, Execution and Learning, PhDThesis, Carnegie Mellon Univ., 1990 [5] Leake, D, and Ram, A., Goal-Driven Learning: FundamentalIssues, AI Magazine,14,4,1993 [6] McCarthy, J., Programs with Common Sense, 1958, appeared in M. Minsky (ed), Semantic Information Processing,MITPress, 1968 [7] Minton, S. et al., Explanation-Based Learning, A ProblemSolvingPerspective,Artificial Intelligence, 40 1989 [8] Nakajima, S., WhatMakesa LanguageReflective and How(preliminary results), in Yonezawa, Y. and Smith,B., (eds), Reflection and Meta-LevelArchitecture, Proc. IMSAWorkshop,1992 [9] Smith,B., Reflection and and Semanticsin Lisp, Pro(:. ACMPOPL, 1984 [10] Wand, M., and Friedman, D., The Mystery of the TowerRevealed: a Non-Reflective Description of the Reflective Tower, Lisp and Symbolic Computation, 1,1, 1988 124