From: AAAI-87 Proceedings. Copyright ©1987, AAAI (www.aaai.org). All rights reserved. ArEnaand E. P&ditis and Jack Mostow Department of Computea: Science Rutgers University, New Brunswick, NJ 08903 Abstract PROLEARN An adaptive inteTpTeter for a programming language adapts to particular applications by learning from execution experience. This paper describes PROLEARN, a prototype adaptive interpreter for a subset of Prolog. It uses two methods to speed up a given program: explanation-based generalization and partial evaluation. The generalization of computed results differentiates PROLEARN from programs that cache and reuse specific values. We illustrate PROLEARN on several simple programs and evaluate its capabilities and limitations. The effects of adding a learning component to Prolog can be summarized as follows: the more search and subroutine calling in the original query, the more speedup after learning; a learned subroutine may slow down queriesthat match its head but fail its body. to compute execution simplify trace. Partial techniques execution trace for more efficient The rest of this paper is organized II describes EBG in PRQLEARN partial evaluation in PROLEARN. to its execution preter for a subset of Prolog. a prototype sults. Section VII summarizes environ- handles all except for the cut symbol (used to control backand side-effects (primitives that cause input or output or change the database). PROLEARN learns new subroutines that represent justifiable gener&zations of example executions. (We will use “subroutine” to mean a user-defined Prolog rule, as opposed to a primitive or a is an important way to increase the performance of search-based problem-solving system [Mitchell et al., 1986, Mitchell et al., 1983, Minton, 1985, Langley, 1983, Laird et al., 198623, Fikes et al., 1972, Maha&v=, 1985, Korf, 19851. Because Prolog uses search as a basic mechanism and has built-in unification, it is a natural vehicle for developing adaptive interpreters. PROLEARN combines explanation-based generdiwation (EBC) [Mitchell et al., 19861 with partial evaluation [Kahn and Carlsson, 1984, Kahn, 19841 to learn from execution experience and thereby reduce future search. As ’TlGs work 1s Lupported by NSF under Grant Number DMC8810507, and by the Rutgers Center for Computer Aids to Industrial Productivity,as well as by DARPA under Contract Number N6001485-K-0116 Machine Learning 2%Knowledge Acquisitiorn Section VI discusses related the research work. Finally, contributions Section of this work. in crdaptive intep- PROLEARN as follows. and Section III describes PROLEARN is only a for a practical adaptive interpreter; Section IV some of its shortcomings and suggests possible Section V summarizes the empirical reimprovements. ment by learning from execution experience and thereby customize itself toward particular apphcations? T&a paper describes PROLEARN, are used to prototype discusses a program, PROLEARN routine call as a goal concept adapt call, it uses EBG evaluation the generalized As it executes How c0l.M an interpreter fact .) Search reduction each subroutine class of queries solved by the same execution. I of Prolog tracking) interprets the general terms of primitives and facts. trace as a template, the subroutine, primitive subroutines every sub- Using the specific it constructs as follows. treats for EBG to operationalize a customized PROLEARN and user-defined the course of executing the subroutine, remove the dependence on the particular to the subroutine, and conjoins records facts version of all calls to performed generalizes arguments the generalized in execution in them to passed calls. Thr resulting conjunction is used to define a new special case version of the original subroutine. To illustrate, consider the following database (from [Mitchell et al., 19861): on(boxf ,,tablei). voluwte(boxl,lO). isa(boxi Sbox) . iaa(tablei oendtable). color(box1 ,red) . color (table1 Bblue) . density(boxl, i0) S safe-to-stack(X,Y) :- lighter(X,Y);not(fragile(Y)). lighter(X,Y) :weight (X,Wl) S weight (Y ,W2), Wl < w2. weight(X,Y) :wollune~x,~~, density(X,D) Y iz V * D. EBG sometimes D be simplified subroutine. weight(K,50O) Given the :- isa(X,endtabae). query PROLEARN learns safe-to-stack(X,Y) safe-to-staek(boxl,tablel), the subroutine :- produces a conjunction by exploiting This simplification becomes below claus~(wollame(X,v),true), elause(density&X,B)otsue), IL4 ie B * D, evident solves from the Using The pole. formulation evaluation the Towers The program puzzle-where the From pole program evalua- of the generalization example. of Manoi of the solution: the learned in [Kahn and Carlsson, in the following disks are move’d about is similar to partial tion without execution as described 19841 and [Bloch, 19841. The need for partial definition of terms that can constraints N 2 to the To pole uses a standard 0 via recursive move N-l disks from the From pole to the Using pole, move one disk from the From pole That product is, object of X’s X is safe to stack volume Y is an endtable. and density The learned on object Y if the is less than 500 and uses clause to subroutine ensure that user-defined predicates like wolunw, density, othand iza are evaluated solely by matching stored facts; erwise the learned subroutine might invoke arbitrarily expensive subroutines for these predicates, thereby defeating its efficiency. PROLEARN also learns similar specialized versions of each subroutine invoked in the course cuting safe-to-&a& (e.g., lighter). EBG in PROLEARN tive calls effectively caches the result of searching safe-to-stach(box1 The learned subroutine, namely ,tablel) 9 isa(obj2 0endtable). which is unfolded in terms of prim- itive calls like iaa(Y ,endtabls), eliminates this search. While subroutine unfolding provides some speedup in procedural languages, greater be&use-of While original in Prolog definition, speedup can be much this search reduction. the learned trace generated the subroutine is a special it is a generalization for the specific query. case of the of the execution Therefore it will ap- ply - _ to queriesother than the one that led to itscreation, and should speed up interpret ation for the entire class of such queries. This class is restricted to queries whose execution would have followed the same execution path as the original query. Relaxing this restriction would require allowing learned subroutines to call user-defined subroutines, which would make learned subroutines more general but possibly less efficient. PROLEARN relies on EBG generalization is justifiable. to guarantee that tius An execution resulting to the BlOV@(O,-,- 9- , Cl). mower(l,FPo~,To,Using,plan) :N > 0, M is N-1, Izlovg(PI,From,UP3ing,To,Subplanl), mover(H,lJeing,To,From,Subplan2), append(Subplan1, CCFr~~~,T~ll,Frontplan), app5ndC ClJJ9. for the PROLEARN must determine which definition of weight to use. As it turns out, the first definition fails and it must use the second one instead, plan is bound The append(Frontplan,Subplan2,Plan). intermediate subroutines needed to answer the query. Par example, when the query weight (tablei ,W2) is evaluated in the course of executing disks from the Using pole to the To pole. variable plan. of exe- amounts to subroutine unfoldUnfolding a query into its primi- ing with generalization. to the To pole, and finally move N-l trace constitutes a proof of a query in the “theory” defined by the subroutines and facts in the program. EBG generalizes away only those details of the trace ignored by this “proof.99 The learned subroutine should therefore P ,,rz for any query it matches, that is, produce the sam1~,oehavior as the original program. (Section IV discusses the problem of preserving appsnd(CHIT1 ,L, [HIUI) :- append(T,L,U). Given the query move(3,lsft,right,cgnter,Plan), Plan is bound to: C[left,rightlo i&ft,centerl 9 [right Dcenter] s Cleft,ri [centsr,rightl, [left ,ri Since rsove LEARN learns contains recursive three move learns the following calls subroutines. to stove, PRO- In particular it subroutine: mowe(M,From,To,Using,C[From,TolD [From,UsinglB [To ,Usingl o CFrom,Tol DCUsing,Froml TO] ) [F~oRI,ToI 1) , :- i5 N-l, pI>O, L is M-f, L>O, 0 is L-l, L>O, 0 is L-l, M>O, K is M-l, K>O, 0 is K-f, K>O, 0 is K-I. The saove definition learned by EBG represents a spe- cialized subroutine for to the To pole via th ’ Worse still, future with move of N > 3 disks will 3 disks from the From pole pole. The newly learned move” z for two disks and one disk are similarly verbose. queries match the learned rule and not fail until they reach conjunct 0 is L-l. the The term 0 is K-l (in the above conjunction) could be eliminated by binding K to 1 instead. In general, if two of the unknowns in an expression like X is Y - Z are known then the third can be deduced. PROLEARN uses correctness of the original program.) Prieditis and Mostow 4% such rules to simplify primitive subroutine LEARN actually learns the above move as: calls. 33. PRO- move(3,From,To,Using, C[F~O~,TO]a [Frorn,Usingl, , [Using ,Frod , [To ,Usingl , CF~OIU,TOI [Using I To] , [From, To] 1) . PROLEARN partially evaluates the following between down are eliminated terms like 12 is X * 4 are eliminated 3. Terms like 4 is y/3 by binding kinds of following are simplified ((Y > 119, (Y < 1599to exploit PROLEARN’s division. Early cutoff occurs After partial PROLEARN inserts The integer is that it is tried before subroutines. a number Search of shortcomings architectures common PaobBem As PROLEARN learns more subroutines, search time As an example, consider member, a gradually increases. commonly-used Prolog subroutine that tests for member- [Xi- member(X, C-IT]) Giventhe length [b,c,d,al), PROLEARN third and second member(X, member(X, [TI,T2,XlCTl,XlI). V-IY,M) . . equiw(X A Y,M A I) :- equiv(X,M), equiv(X V Y,M V N) :- equiv(X,M),equiv(Y,N). equiv are equivalent is used to test whether boolean expressions of 1, A and V). operators V b) equiv(Y,N). A -(c V d) s(a its two (with the Consider the V b) A (c V d)) (A) to find relevant subroutines. Here the only match is the subroutine whose head is equiv(X A Y ,M A N) . In fact, there is only one relevant subroutine at each node of the entire search tree for this query, so its branching factor is 1. When branching factor is this low, there is no search to eliminate. Here, speedup derives only from elimination of subroutine calls in the learned rules, namely: A-lY,X equiw(llX A Y) . 9. a query is worth cost (search learning from and subroutine learning, the cost of the learning process, bution of subsequent queries. PROLEARN from every execution without considering depends calling) on without and the distrisimply learns these factors. positions 6. list. [Tl,TB,T3,Xl- AlY,M) equiv(lX its execution three member definitions (since zaexnber three times). They represent member- mmber(X, equiv(lX :- Whether xnember(X,T). ship of an item at the fourth, in an arbitrary :- logical query: the equiv(X,Y). A Y) ,M) equiv(llX,X querymember(a, learns the following is called recursively consider 19851): equiv(l(X standard following I). :- [Mahadevan, V Y) ,M) arguments ship of an item in a list: member(X, and remembering? learning, equiw(l(X equiw(ll(a that learn. Bottleneck :- at the cost of slowing learning worthless (after beand Prolog interpreters typically index on the subroutine’s name (equiv) and the first argument’s principal functor Limitations exhibits of possibly database queries is worth The subroutine subroutine version from which it was derived, to other problem-solving A. learned it above existing V PROLEARN the To ensure the less efficient original like when Y 5 11. evaluation, added to the database. X to to conjunctions rth Problem up some what equiv(X,X). equiw(llX,Y) X to Y, and by binding speeding others, As an example terms: terms like 3 < 5 (that are always true) are eliminated, terms like X == Y (a test on whether X and Y are the same variable) and X = Y (a test on whether X has the same value as Y) The Learning Given the tradeoffs between storing and computing, tween the cost of learning and the resulting speedup, The Correctness and I). 1). Is the set of programs new subroutines? still correct That after PROLEARN is, does the new program learns preserve Given the query mexnber(a, [bpc,d,e,al), PROLEARN must search through all the learned member defi- the user-intended behavior? Two problems, redundancy and over-generalization, make this question difficult to an- nitions to get to the general case for member. A query testing the membership of an object in the list of a hundred swer. items may generate ninety-nine new member definitionsclogging up the interpreter if most of them are executed to avoid again. ular query, A number of researchers have observed that uncon- trolled learning of macro-operators can cause search bottlenecks [Minton, 1985, Iba, 1985, Fikes et al., 19721. Minton’s programused the frequency of use and the heuristic usefulness of macro-operators as a filter to control learning. Iba’s program learned only those operator sequences leading from one state with peak heuristic value to another. 4% Machine Learning & Knowledge Acquisition PROLEARN disallows subroutines with side-effects redundancy. Since every learned subroutine in re p resents another way to execute a particPROLEARN the database of original subroutines describes how to execute the query (perhaps cient manner). The danger lies in side-effects repeated and thus change program behavior violates the intent of the original program. PROLEARN also disallows already in a less effithat may be in a way that the cut symbol (used to control backtracking and define negation-as-failure). If PROLEARN allowed the cut symbol, learning new subroutines would, in general, be impossible without considering how much the context of other subroutines (e.g., cut and fail combinations that were executed during a particular query). While applying PROLEARN to a program with side effects is not guaranteed to preserve its behavior, neither does it necessarily tive interpreters lead to disaster. preserve Demanding program behavior that adap- penalty, is unnecessar- to characterize and expand PROLEARN is also theoretically subject ecution rule that didn’t apply this problem diately unwieldy by placing above in Prolog) each learned the subroutine from because Although It ap- problem-solving. to overcome member equiv 2 2 Table 1: Speedup Table Factors While imme- 1 lists the speedup original query in Prolog it was derived For each example, 1.3 1 6 36 1 1 3 6 factors the table time for the original follows: The results alone, age branching greatest the example factor speedup. from eliminating The user subroutine speedup. routine with above. with branching speedup averthe resulted calls (member and equiv) arithmetic 1 and few had the least resulted from sub- not search reduction. operations were eliminated. , Since each learned subroutine applies to a whole class of queries, the speedup results apply to the entire class. However, the results presented above are incomplete, since they do not show the slowdown for queries outside this class. Such slowdown occurs when a query matches learned subroutine and then fails after executing the terms in its definition. We measured speedup does not values [Mostow and learned that cache specific chunks. et al., 19791, PROLEARN the execution frequency of different control paths in a program, they do not generalize as PROLEARN does. by comparing time before and after learning. This paper has shown how an interpreter can adapt to its execution environment and thus customize itself to a particular application. PROLEARN, a some of execution This comparison measures The generalization sults differentiates PROLEARN and reuse specific values. The effects of adding can be summarized proto- of computed from programs a learning component as follows: The more search and subroutine to Prolog - calls in the original after learning: and backtracking by search are eliminated, caused well as the overhead of subroutine re- that cache query, the more speedup the indexing as calls. The same speedup applies to the class of queries that match the learned subroutine, not just the query from which the subroutine was learned. This class includes queries that would have followed the same execution path as the original e A learned Prolog an implemented type of an adaptive Prolog interpreter, uses two methods to increase its performance: explanation-based generalization and partial evaluation. calls (move). factor from EBG as a The one example (move) where partial evaluation helped was sped up by a factor of 8.5 over EBG because costly which avoids this mechanism subroutines produced The next greatest Any speedup uses EBG, chunking to simplify occur- of the same variable, that are more specific 19841. While some optimizers use data about the execution environment [Cohen, 19861 or collect statistics about over the the largest (safe-to-stack) ca3l elimination, that multiple vironments more dynamically than typical program optimizers [Aho et al., 1986, Kahn and Carlsson, 1984, Bloch, can be summarized the user subroutine two examples Soar’s evaluation cached subroutines Soar assumes also lists the average branch- of calls to user-defined B For EBG 4 stores genBoth approaches pay for decreased execution time with increased space costs and lookup time. PROLEARN adapts programs to their execution en- presented ing factor and number query. exfrom 54 to 156 times slower PROLEARN’s PROLEARN Also, Cohen, 1985, Lenat eralized procedures. Query in Prolog in CPU for the examples ranging by the interpretation rences of a constant are instances which can cause it to learn chunks than necessary. Over Original by factors is outweighed resemble Soar’s chunks, Unlike systems 23 94 4 queries of Prolog’s up PROLEARN’s et a!., 1986b], PROLEARN is an s’ncbdental learning system because it learns as a side-effect of use partid 23 11 4 speeds Like Soar [Laird assumption. safe-to-staclt it loses the efficiency learning of the example The extra performance than Prolog. rather than at the top of the database. RlOVB if PROLEARN’s to the same subroutine which achieved penalty, which renders PRCLEARN changing when the rule was learned. (though be Prolog meta-interpreter. imposes a considerable to 32, this speedup over-generalization problem as SOAR [Laird et al, 1986a], in which a learned rule masks a pre-existing special-case pears possible largely indexing. the class of programs PROLEARN can interpret without program behavior in unacceptable ways. would simple but inefficient level of interpretation ily restrictive, since not all behavior changes are unacceptable [Mostow and Cohen, 1985, Cohen, 19861. Further work is needed speedup techniques were eficiently implemented in the Prolog interpreter itself. PROLEARN is actually implemented as a subroutine query. may slow down queries that match its head but fail its body. Prieditis and Mostow 497 [Langley, 19831 P. Ackmowlledgments Many thanks van, Prasad go to Tony Tadepalli, Bonner, Steve Smadar Kedar-Cabelli Thanks also go to Chun Minton, Sridhar Tom for their comments Mahade- Fawcett, and on this paper. Liew and Milind Deshpande for their help with I#!. Langley. Learning effective search In Proceedings IJCAI-8, International heuristics. Joint Conferences on Artificial Intelligence, Karlsruhe, West Germany, [Lenat et al., 19791 D.B. Lenat, Klahr. Cognitive economy systems. In Proceedings Conferences August [Aho et al., 1986] pilerx A. Aho, R. Sethi, and J. Ullman. ComTechniques, and TOOL. Addison- Principles, Wesley, Reading, Mass., [Bloch, 19841 6. Bloch. of Logic Programs. mann Institute 1986. S ource-to-Source Technical of Science, Report Tranaformationa CS84-22, Weiz- November 1984. 19861 D. C o h en. Automatic compilation of logical specifications into efficient programs. In Proceedings [Cohen, American gence, Seattle, WA, A AAI-87, Association for Artificial August 1986. [Fikes et al., 19721 R. Fikes, P. Hart, and N. J. Nilsson. Learning and executing generalized tificial Intelligence, 3(4):251-288,1972. ing8 in Artificial robot plans. ArAlso in Read- Webber, Intelligence, Intelli- B. L. and Nils- son, N. J., (Eds.). [Iba, 19851 G. Iba. problem-solving. Learning by discovering macros in InternaIn Proceedings IJCAI-9, tional Joint Conferences on Artificial Intelligence, Los Angeles, CA, August [Kahn, 19841 K. K a h n. 1985. Partial methodology and artificial 5(1):53-57, Spring 1984. evaluation, programming intelligence. AI Mugazine, [Kahn and Carlsson, 19841 K. Kahn and M. Carlsson. The compilation of prolog programs without the use In Proceedinga of the Internaof a prolog compiler. tional Conference on Fifth Generation Computer Systems, ICOT, Tokyo, Japan, 1984. [Kedar-Cabelli and McCarty, 19871 S. T. Kedar-Cabelli and L. T. McCarty. Explanation-based generalization as resolution theorem of the Fourth International proving. Machine In Proceeding8 Learning Work- ahop, Irvine, CA, June 1987. [Korf, 19851 R. Learning to Solve Problem by Korf. Searching for Mucro- Operators. Pitman, Marshfield, MA, 1985. [Laird et al., 1986a] J.E. Laird, P.S. Rosenbloom, and A. Newell. Overgeneralization during knowledge compilation in soar. In Worbhop on Knowledge Compilation, Oregon State University, Corvallis, OR, Septem- ber 1986. [L&d et al., 1986b] Newell. Soar: mechanism. 498 J. E. Laird, P. S. Rosenbloom, and A. the architecture of a general learning Machine Learning, l(l):ll-46, Machine learning & Knowledge Acquisition 1986. August 1983. F. Hayes-Roth, and P. in artificial intelligence IJCAI-6, on Artificial International Intelligence, Tokyo, Joint Japan, 1979. Verification-based [Mahadevan, 19851 S. Mahadevan. a generalization strategy for inferring learning: In Proceedings problem-decomposition methods. IJCAI-9, International Joint Conferences on Artificial Intelligence, Los Angeles, [Minton, 19851 S. Minton. for problem-solving. national Joint Los Angeles, [Mitchell Conferences et al., 19861 T. [Mitchell view. Learning chine Learning, and Cohen, tomating program R. Keller, Palo Alto, 19851 J. Mostow speedup and S. generalization: heuristics. CA, and acquirIn Ma- 1983. and D. Cohen. by deciding a 1986. P. E. Utgoff, by experimentation: problem-solving Tioga, Intelligence, Learning, 1(1):47-80, et al., 19831 T. M. Mitchell, ing and refining [Mostow on Artificial Mitchell, Machine plans Inter- IJCAI-9, 1985. Explanation-based R. B. Banerji. 1985. generalizing In Proceedinga CA, August Kedar-Cabelli. unifying CA, August Selectively Au- what to cache. International Joint ConferIn Proceedinga IJCAI-9, ences on Artificial Intelligence, Los Angeles, CA, August 1985.