Getting Order Independence in Incremental Learning

Antoine Cornuéjols
Equipe Inférence et Apprentissage
Laboratoire de Recherche en Informatique (LRI), UA 410 du CNRS
Université de Paris-Sud, Orsay
Bâtiment 490, 91405 ORSAY (France)
email (UUCP): antoine@lri.lri.fr

From: AAAI Technical Report SS-93-06. Compilation copyright © 1993, AAAI (www.aaai.org). All rights reserved.

Abstract*

It is empirically known that most incremental learning systems are order dependent, i.e. provide results that depend on the particular order of the data presentation. This paper aims at uncovering the reasons behind this, and at specifying the conditions that would guarantee order independence. It is shown that both an optimality and a storage criterion are sufficient for ensuring order independence. Given that these correspond to very strong requirements, however, it is interesting to study necessary, hopefully less stringent, conditions. The results obtained prove that these necessary conditions are equally difficult to meet in practice. Besides its main outcome, this paper provides an interesting method to transform a history dependent bias into a history independent one.

* This paper appeared in the Proc. of the European Conference on Machine Learning (ECML-93), Vienna, April 5-8, 1993.

1 Introduction

Ordering effects in Incremental Learning have been widely mentioned in the literature without, however, being the subject of much specific study, except for some rare pioneering works [1,4,9, and, incidentally, 2]¹. Ordering effects are observed when, given a collection of data (e.g. examples in inductive concept learning), different ordered sequences of these data lead to different learning results. In this respect, ordering of data therefore seems to be equivalent to a preference bias that makes a choice among all the models or hypotheses that the learning system could reach given the collection of data (that is, the models that would be obtained had the collection of data been presented in every possible order). Hence, ordering undoubtedly amounts to some additional knowledge supplied to the system. This is why teachers have some value: by selecting suitable pedagogical presentations of the material to be learned, they provide further knowledge that hopefully helps the learning process.

¹ See also the AAAI Technical Report corresponding to the recent AAAI Spring Symposium (March 23-25, 1993, Stanford University) devoted to "Training Issues in Incremental Learning".

Learning without some bias that allows the reduction of the search space for the target concept or model is impossible, except in the crudest form of rote learning. When looking more closely, it is usual to distinguish between:
- representation bias: where the search space is constrained because not all partitions of the example space can be expressed in the hypothesis space considered by the system (this is the basis for inductive generalization and is the main topic of current Machine Learning theory [12]), and
- preference bias: which dictates which subspace should be preferred in the search space (e.g. prefer short, simple hypotheses over more complex ones) (this type of bias has been much less studied because it touches on procedural aspects instead of declarative ones only).

Because ordering of inputs allows one to favor some models over others, it seems to amount to a preference bias that chooses between competing hypotheses. In spite of this resemblance, however, there is a deep difference with the biases generally discussed in Machine Learning. Indeed, with ordering effects, one observes the preference but cannot pinpoint directly where in the learning system it lies and how it works. This is in contrast with what is considered classically as a bias, where one can identify operational constraints, e.g. isolate representation constraints or procedures for choice between hypotheses.
Thus we use the term global preference bias to denote preference among models due to ordering effects after a sequence of inputs has been observed, and the term local preference bias to denote the local choice strategy followed by the system when, at each learning step, it must choose to follow some paths and discard others. Two questions then immediately come up:

1- What is the relationship between a global preference bias and a local one?
2- What is the relationship between a global preference bias that is observed or aimed at and a corresponding teaching strategy that specifies the order of inputs? In other words, how does one design a teaching strategy so as to get a certain global preference bias?

The second question is related to the recently introduced concept of teachability [5,8,15,16,17] and the not so recent concern for good training sequences [19,20]. However, it differs on a fundamental point: in the former, the problem is essentially the determination of good examples to speed up learning, whereas in our setting we assume that the collection of instances is given a priori and our only degree of freedom lies in the choice of a good ordering. Additionally, researchers interested in teachability have not been concerned with the idea of guiding the learner toward some preferred model and away from others; they seek to characterize the learnability of concept classes irrespective of particular preferences within the classes. Keeping these differences in mind, the emphasis on providing additional knowledge to the learner through an educational strategy is the same.

In order to answer these questions, and particularly the first one, it is necessary to determine the causes of the ordering effects.

2 Causes of ordering effects

It is instructive to look at incremental learners that are NOT order dependent, like the candidate elimination (CE) algorithm in Version Space [10,11], ID5R [18], or systems that are not usually considered as learning systems but could be, such as TMS [3] or some versions of the Bayesian inference nets of Pearl [13]. They all have in common that they do not forget any information present in the input data. Thus, even when they make a choice between alternative hypotheses, like ID5 or TMS and unlike the CE algorithm, they keep enough information to be able to compare all potential competing models so as to select the best one at any moment, and change their mind if needed. They are therefore equivalent to non-incremental learning systems that get all the data at once and focus on the best hypothesis given the information supplied.

To sum up, order independent incremental learners (i) are able to focus on an optimal hypothesis when they have to choose among the current potential ones; and (ii) keep enough information so as not to forget any potential hypothesis. If one or both of these properties is lacking, then incremental learning is prone to be order dependent. A closer look at each of these properties in turn will help to see why.

(i) Optimality vs. non optimality. Since the influential thesis and papers of Mitchell [10,11], it has become commonplace to consider learning as a search process in a concept or solution space. Whatever the form (generalization, explanation, ...) and the constraints on the hypothesis space (e.g. representation language bias), the learner is searching for the best solution in this space given the data at hand. In order to compare the solutions in the search space, it is possible to imagine that the learner evaluates and grades each one of them and then chooses the top one. Of course, the grade, and therefore the rank, of these solutions can be modified if the data are changed. Because incremental learners usually function by adapting a current solution (or a small solution set) to cope with new information, they proceed typically in a hill-climbing fashion (see figure 1), open to the drawback of missing the globally optimal solution in favor of local ones.

Fig. 1. At time n-1, the system is in a state corresponding to a given hypothesis. At time n, with a newly arriving piece of data, the value of each hypothesis is re-evaluated and the system follows the direction of greatest gradient to reach a new state.

As underlined above, one must realize that, in contrast to common optimization problems where the optima and the whole topology of the search space are set once and for all before the search starts, in incremental learning the topology is changed with each new input; the solutions in the hypothesis space are therefore open to re-evaluation, new optima replacing old ones. Because of this, incremental learning, wandering from one local and temporary optimum to the next, can be order dependent unless it reaches the current global optimum at each step, that is, with each arriving piece of data². In that case of course, at the end of the training period and given that all the training data have been observed, the system would reach the global optimum for this set of data regardless of its ordering during learning. In fact, it is easy to see that there is a second caveat, in addition to the requirement of finding the global optimum at each time.

² Finding the global optimum at each step requires either keeping in memory all possible hypotheses, re-evaluating them and taking the best one, or, short of this memory intensive method, being able to re-construct any possible hypothesis with enough accuracy and to keep track of the best one so far. This latter method is close in spirit to Simulated Annealing or to Genetic Algorithms. In mathematics, this corresponds to ergodic systems.

(ii) To forget or not to forget. Finding the optimal solution at each step is operative only to the extent that the set of all possible solutions is brought into consideration; if only part of it is available, then the whole optimization process is bound to be suboptimal. Now, besides adapting the current solution to new constraints expressed in the form of new data, incremental learners, because they entertain only some preferred hypothesis or model, can also discard in the process part of the information present in the past inputs. If this forgetting is dependent upon the current state of the system, then it follows that it may also be dependent upon the ordering of the inputs. Therefore the resulting learning may be order dependent.

It is important to note that these two reasons, even if often tied (because one keeps only a current solution for adaptation, one is tempted to forget information about other possibilities), are nonetheless independent in principle and can be observed and studied separately. For reasons that will be exposed shortly, we will concentrate on the forgetting aspect of incremental learning and will accordingly assume, by way of an adequate framework, that the best alternatives among the available ones can always be chosen by the learner.

Forgetting of information lies therefore at the heart of order dependence in incremental learning. But forgetting can take two faces. In the first one, information present in the input data is lost, meaning that the current hypothesis space considered by the learner is underconstrained. In the second one, by contrast, what is lost are potential alternatives to the current preferred hypotheses, which amounts to overconstraining the space of possibilities. This last form of forgetting is equivalent to a local preference bias which chooses among competing hypotheses which ones to pursue. This raises a question more specific than the aforementioned ones, but which contributes to the same overall goal:

3- In which cases can an incremental learner be order independent? Or, in other words, which information can be safely forgotten without altering the result of learning, whatever the ordering of inputs?

It is this last question that this paper focuses on.
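Before moving on, the order dependence of such hill-climbing learners can be made tangible with a small illustration of our own (it is not an example from the paper): a one-pass perceptron keeps only its current weight vector as its hypothesis, adapts it on each mistake, and reaches different final hypotheses when the very same data are presented in different orders.

```python
# A one-pass perceptron: an incremental learner that keeps a single
# current hypothesis (w, b) and adapts it greedily on each mistake.
def perceptron(sequence):
    w, b = [0.0, 0.0], 0.0
    for (x1, x2), y in sequence:                    # y is +1 or -1
        if y * (w[0] * x1 + w[1] * x2 + b) <= 0:    # mistake on this example
            w[0] += y * x1                          # adapt the current hypothesis
            w[1] += y * x2
            b += y
    return w, b

DATA = [((1, 0), +1), ((0, 1), -1), ((2, 2), +1)]
order_1 = perceptron(DATA)        # ([3.0, 1.0], 1.0)
order_2 = perceptron(DATA[::-1])  # ([2.0, 1.0], 0.0)
assert order_1 != order_2         # same data, different order, different result
```

Keeping the whole history of alternatives rather than a single adapted hypothesis is, in the paper's terms, what restores order independence, at the price of storage.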
It must be kept in mind that this last question is equivalent to the question: what local preference bias leads to a null global preference bias (i.e. to order independence)?

In the following of the paper, we will restrict ourselves to a simple concept learning model in which the learner attempts to infer an unknown target concept f, chosen from a known concept class F of {0,1}-valued functions over an instance space X. This framework allows us, in the next section, to define a measure of the information gained by the learning system and of the effect of a local bias on this information. This measure naturally suggests an equivalence between a local preference bias and additional instances, which is detailed in section 4. Then, in section 5, it becomes a relatively simple matter to answer question 3 above. The conclusion compares the framework adopted here with the emerging one of teachability and discusses the results obtained.

3 Information measure and local preference bias

In this section, we are interested in formalizing and quantifying the effect of a local preference bias on what is learned by the system. For this, we first define a characterization of the information maintained by a learner.

Let F be a concept class over the instance space X, and f ∈ F be a target concept. The teacher has a collection of examples EX = {xi, f(xi)} at his disposal, and makes a sequence x = x1, x2, ..., xm, xm+1, ... with xm ∈ EX for all m. The learner receives information about f incrementally via the label sequence f(x1), ..., f(xm), f(xm+1), ... For any m ≥ 1, we define (with respect to x and f) the m-th version space:

    Fm(x,f) = { f' ∈ F : f'(x1) = f(x1), ..., f'(xm) = f(xm) }

The version space at time m is simply the class of all concepts in F consistent with the first m labels of f (with respect to x). Fm(x,f) will serve as a characterization at time m of what is known to the learner about the target concept f.

We know from Mitchell [10] that the version space can be economically represented and stored using the boundary sets: the S-set (the set of the most specific hypotheses of the version space) and the G-set (the set of the most general hypotheses of the version space)³. Each new example (xm, f(xm)) provides new information if it allows the version space to be reduced by modifying, through the CE algorithm, either one of the boundary sets. Generally, the S-set and the G-set contain many elements, and in the worst cases they can grow exponentially over some sequences of examples [6].

A local preference bias is a choice strategy which, at any time m, discards parts of the current version space, generally in order to keep the boundary sets manageable. In this way, it reduces the version space and acts as if there had been some additional information that had allowed the space of hypotheses to be constrained. The next section takes a closer look at this equivalence.

4 Bias and additional instances

We assume that the incremental learner maintains a version space of potential concepts by keeping the boundary sets. We assume further that the local preference bias, if any, acts by removing elements of the S-set and/or of the G-set, thus reducing the version space. Indeed, in so doing, it removes from the version space all concepts or hypotheses that are no longer more general than some element of the S-set and more specific than some element of the G-set. Besides, the resulting version space keeps its consistency since, in this operation, no element of the resulting S-set becomes more general than other elements of the S-set or of the G-set, and, vice versa, no element of the G-set can become more specific than other elements of the G-set or of the S-set.
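The version space and its boundary sets can be computed exhaustively in a toy domain. The following sketch is our own illustration, not the paper's: the three-attribute conjunctive language and the labelled examples are assumptions chosen for compactness.

```python
from itertools import product

# Conjunctive hypotheses over 3 boolean attributes: each slot is 0, 1 or '?'
HYPOTHESES = list(product([0, 1, '?'], repeat=3))

def covers(h, x):
    return all(hi == '?' or hi == xi for hi, xi in zip(h, x))

def more_general(h1, h2):
    # h1 covers every instance that h2 covers
    return all(a == '?' or a == b for a, b in zip(h1, h2))

def version_space(examples):
    # all hypotheses consistent with the labelled examples
    return {h for h in HYPOTHESES
            if all(covers(h, x) == label for x, label in examples)}

def boundaries(vs):
    s_set = {h for h in vs      # maximally specific members
             if not any(g != h and more_general(h, g) for g in vs)}
    g_set = {h for h in vs      # maximally general members
             if not any(g != h and more_general(g, h) for g in vs)}
    return s_set, g_set

EX = [((1, 1, 1), True), ((1, 1, 0), True), ((0, 0, 1), False)]
vs = version_space(EX)
s_set, g_set = boundaries(vs)
# s_set == {(1, 1, '?')};  g_set == {(1, '?', '?'), ('?', 1, '?')}
```

Here the enumeration is affordable; the point of Mitchell's boundary sets is precisely to avoid it in realistic hypothesis spaces.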
³ To be exact, a subset of the concept space can be represented by its S-set and G-set if and only if it is closed under the partial order of generality of the description language and bounded. This is the case for most learning tasks and concept representations, and particularly when the description language consists of the set of all conjunctive expressions over a finite set of boolean features. See [7] for more details.

To sum up, we now have a learning system that forgets pieces of information during learning by discarding potential hypotheses, but at the same time is optimal since, in principle, by keeping the set of all the remaining hypotheses, it could select the best among them. In that way, we isolate the effect of forgetting without intermingling it with non-optimality effects.

We are interested in studying the action of a local preference bias (which has been shown to be equivalent to the forgetting of potential hypotheses) along all possible sequences of data. More specifically, we want to find what type of local bias (or forgetting strategy) leads to order independence, that is, for which all training sequences conduct to the same resulting state.

Fig. 2. For n instances, there are n! possible training sequences. We look for conditions under which all sequences would result in the same state.

What makes this problem difficult is that the action of the local preference bias depends on which state the system is in, and therefore depends on the training sequence followed. This implies in turn that all training sequences should be compared in order to find conditions on the local bias. A very simple but very important idea will allow us to circumvent this obstacle.
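The n! requirement pictured in figure 2 can be checked mechanically in a small enumerated domain. In this sketch of ours (the conjunctive toy language, the examples and the particular forgetting rule are our assumptions, not the paper's), a learner that retains the whole version space ends in the same state under every permutation of the examples, whereas a learner whose local bias deterministically forgets all but two hypotheses does not.

```python
from itertools import product, permutations

# Conjunctive hypotheses over 3 boolean attributes: each slot is 0, 1 or '?'
HYPOTHESES = list(product([0, 1, '?'], repeat=3))

def covers(h, x):
    return all(hi == '?' or hi == xi for hi, xi in zip(h, x))

def learn(sequence, keep=None):
    vs = set(HYPOTHESES)
    for x, label in sequence:
        vs = {h for h in vs if covers(h, x) == label}  # drop inconsistent hypotheses
        if keep is not None:
            # local bias: deterministically forget all but `keep` hypotheses,
            # chosen by an arbitrary but fixed ordering of the survivors
            vs = set(sorted(vs, key=str, reverse=True)[:keep])
    return frozenset(vs)

EX = [((1, 1, 1), True), ((1, 1, 0), True), ((0, 0, 1), False)]
no_bias = {learn(list(p)) for p in permutations(EX)}
biased = {learn(list(p), keep=2) for p in permutations(EX)}
assert len(no_bias) == 1   # all 3! orderings reach the same version space
assert len(biased) > 1     # the forgetting bias makes the result order dependent
```

Note that the forgetting rule here is deterministic and depends only on the current state, which is exactly the kind of local bias the paper analyzes.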
The idea has three parts:

(i) forgetting hypotheses amounts to overconstraining the search space;
(ii) supplying extra instances to an order independent learning algorithm would result in constraining the search space;
(iii) if an incremental learner using a local bias b1 (leading to forgetting of hypotheses) could be made equivalent to an order independent incremental learner using a bias b2 (leading to the consideration of extra instances), then finding conditions on the local bias b1 would be the same as finding conditions on the bias b2, only this time irrespective of the training sequence (since b2 is used by an order independent learner).

(i) and (ii) allow us to realize (iii) if it can be shown that the effect of the local bias b1 is the same as the effect of additional instances given by an oracle or bias b2 to an order independent learner. In other words, if it can be proved that any forgetting of hypotheses is equivalent to observing extra instances, then conditions on b1 will amount to conditions on the addition of fictitious instances to an order independent learning algorithm.

Fig. 3. The equivalence needed to allow the study of conditions on the local bias. The effect of the local bias (forgetting of some element of the S-set or G-set) is the same as the effect of additional instances made available to an order independent algorithm such as the CE algorithm.

Out of this powerful idea, we will make a theorem.

Theorem 1: With each choice it makes, the local preference bias acts as if additional examples had been known to an order independent learner.

Proof: (i) Case of the reduction of the S-set. For each element gi of the S-set it is possible to find additional fictitious examples which, if considered by the CE algorithm, would lead to its elimination from the S-set. It suffices to take positive instances covered by (or more specific than) all gj such that (gj ∈ S-set and j ≠ i), not covered by gi, and not excluded by the G-set. As a result, the CE algorithm would not have to modify the G-set nor the gj such that (gj ∈ S-set and j ≠ i), and would generalize gi just enough to cover the new instances. But since {gj : gj ∈ S-set and j ≠ i} is the S-set of all past instances plus the new fictitious ones, gi can only become more general than one or several gj, and hence will be eliminated from the new S-set.
(ii) Case of the reduction of the G-set. In the same way, in order to eliminate an element gi of the G-set through the CE algorithm, it suffices to provide negative instances covered by gi but not covered by the other elements of the G-set nor by the S-set. As for (i) above, the CE algorithm would then specialize gi just enough to exclude the negative instances, and this would result in an element of the G-set that would be more specific than others, hence eliminated. □

Fig. 4. In this figure, {g1, g2, g3} are assumed to be the S-set of the positive instances (E1, E2, E3). If E4 is observed, then neither g1 nor g2 need to be modified. In fact {g1, g2} is the S-set of all positive instances that belong to the gray area. g3 will need to be generalized just enough to cover E4, and this will make it more general than g1 and g2, hence it will be eliminated from the S-set.

The following example will help to understand this. Let us assume that we present positive and negative instances of scenes made of several objects, each one described by a conjunction of attribute-values, and that we use three families of attributes, each one organized as a tree with increasing order of generality toward the root.

[Figure: three attribute taxonomies. Sizes: large, small. Shapes: convex → {polygon, ellipsis}; polygon → {triangle, pentagon, quadrilateral}; ellipsis → circle; quadrilateral → parallelogram; parallelogram → {rectangle, lozenge}; rectangle, lozenge → square; triangle → equilateral, ... Colors: color → {col.-composed, col.-base}; col.-composed → {green, purple, ...}; col.-base → {B/W, R-Y-B}; B/W → {black, white}; R-Y-B → {red, yellow, blue}.]

In the following, we show how an element of the S-set could be removed by observing more instances. Let us suppose that we observe the following sequence of positive instances:

- E1 = {(large red square) & (small white lozenge)}
  → S-set-1 = {(large red square) & (small white lozenge)}
- E2 = {(large red parallelogram) & (small blue equilateral)}
  → S-set-2 = {(large red parallelogram) & (small col.-base polygon),
               (? R-Y-B polygon) & (? col.-base parallelogram)}
- E3 = {(large yellow right triangle) & (small blue rectangle)}
  → S-set-3 = {(large R-Y-B polygon) & (small col.-base polygon),    ← g1
               (? R-Y-B parallelogram) & (? col.-base polygon),      ← g2
               (? R-Y-B polygon) & (? col.-base parallelogram)}      ← g3

Let us assume that at this point the local bias, for some reason, decides to discard g3 from the S-set. This would result in:

S-set-4 = {(large R-Y-B polygon) & (small col.-base polygon),
           (? R-Y-B parallelogram) & (? col.-base polygon)}

The very same S-set would be obtained if the learning algorithm were the CE algorithm and, after E1, E2 and E3, observed a new instance covered by g1 and by g2 and not by g3, such as:

- E4 = + {(large yellow parallelogram) & (small black pentagon)}

(We use the notation FnLB to differentiate a learner implementing a local bias (LB) from one that does not and only implements the CE algorithm, noted Fm(x,f) in section 3. VSwb denotes the version space obtained with the bias.)

Theorem 2: An incremental learner implementing a local preference bias is order independent for a collection EX of instances if the action of this bias is equivalent, for all possible sequences x of elements of EX, to the supply of the same set of additional instances.

5 Bias and order independence

What we have seen so far is that a local preference bias (forgetting hypotheses) can be made equivalent to another bias that would throw in chosen extra instances for the learner to observe.
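This equivalence can be replayed concretely in a small enumerated domain. The sketch below is ours, not the paper's: the conjunctive language, the examples and the fictitious instance are assumptions chosen so that discarding one G-set element through a local bias yields exactly the version space an unbiased learner reaches when one extra negative instance is appended to the data, as in case (ii) of the proof of theorem 1.

```python
from itertools import product

# Conjunctive hypotheses over 3 boolean attributes: each slot is 0, 1 or '?'
HYPOTHESES = list(product([0, 1, '?'], repeat=3))

def covers(h, x):
    return all(hi == '?' or hi == xi for hi, xi in zip(h, x))

def version_space(examples):
    return {h for h in HYPOTHESES
            if all(covers(h, x) == label for x, label in examples)}

EX = [((1, 1, 1), True), ((1, 1, 0), True), ((0, 0, 1), False)]
vs = version_space(EX)              # {(1,1,'?'), (1,'?','?'), ('?',1,'?')}
biased_vs = vs - {('?', 1, '?')}    # the local bias discards a G-set element

# Fictitious negative instance covered by the discarded element
# but by no other member of the version space:
fictitious = ((0, 1, 0), False)
assert version_space(EX + [fictitious]) == biased_vs
```

The fictitious negative plays the role of the oracle-supplied instance: the unbiased learner, given it, forgets exactly what the biased learner chose to forget.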
Thanks to this we can now tackle the main topic of this paper, namely what kind of local preference bias a learner can implement so as to stay order independent. Indeed, instead of studying a strategy of forgetting of hypotheses that depends on the current version space, and therefore on the past history, we now study the addition of extra fictitious instances to the CE algorithm, which is order independent. In other words, we are now in a position to specify conditions on the local preference bias by stating which extra instances it should amount to.

We assume that the teacher has a set of n examples EX, and draws a sequence x of these according to her requirements. Furthermore, we assume order independence, i.e.:

    (1) ∀x, FnLB(x,f) = VSwb, where VSwb is constant.

Proof (of theorem 2): It follows immediately from theorem 1. □

Now, we want to enlarge theorem 2 and give conditions on the set of fictitious examples that the local bias, acting as an oracle, can provide to the learner so that the learner be order independent and reach a VSwb possibly different from VSnb. The CE algorithm without bias would reach the state VSnb (Version Space with no bias). If VSwb ≠ VSnb, then extra instances should be provided to the CE algorithm so that it reaches VSwb. Theorem 3 states which ones.

Fig. 5. Which fictitious instances should the oracle corresponding to the local bias provide to the CE algorithm so that it gets to the version space VSwb? Answering this question amounts to giving conditions on the local bias that would conduct a learner using it to VSwb irrespective of the training sequence followed. The answer is that fictitious positive instances should be drawn from the gray area, whereas fictitious negative instances should be drawn from the striped area. The '+' and '-' represent the real instances that conduct the CE algorithm to the VSnb version space.

Let Snb and Gnb be respectively the S-set and the G-set that the CE algorithm would obtain from the collection of instances EX, and let Swb and Gwb be respectively the S-set and the G-set of VSwb in (1) (the version space obtained on any sequence x of EX by the learner implementing the local preference bias).

Theorem 3: For an incremental learner implementing a local preference bias to be order independent for EX, leading to the version space VSwb, it is necessary that the action of this bias be equivalent to the supply of:
- a set of fictitious positive instances such that they, together with the real positive instances, are covered and bounded by Swb, and,
- a set of fictitious negative instances such that they, together with the real negative ones, are excluded and bounded by Gwb.

Proof: Self-evident. □

The next theorem is the application of theorem 3 to the case where VSwb = Fn(EX,f) = VSnb, that is, the case where the local preference bias leads to the null global preference bias, i.e. has no effect. Such a local bias can be seen as eliminating options judiciously since the result obtained after any sequence x of EX is the same as what the CE algorithm would get on EX. In this case Snb and Swb are one and the same, as are Gnb and Gwb.

Theorem 4: For an incremental learner implementing a local preference bias leading to the same result as an incremental learner without a local bias, it is necessary that the action of this bias be equivalent to the supply of:
- a set of positive instances such that each one is covered by all elements of Snb, and,
- a set of negative instances such that each one is excluded by all elements of Gnb.

It is as if this local preference bias knew "in advance" the collection EX of instances, and eliminated elements of the S-set and of the G-set judiciously. This leads to the final theorem.

Theorem 5: It is not possible for a local deterministic preference bias to lead to a null global preference bias for any arbitrary collection EX of examples.

Proof: Indeed, at each step, the action of the local bias can only depend on the past training instances and the current state. In order to lead to order independence, it would have to be equivalent to an oracle that provides instances drawn from an area of the instance space that can be defined only with respect to all the training instances. This would mean that the learner, through its preference bias, was always perfectly informed in advance of the collection EX of examples held by the teacher. □

6 Conclusion

In this research, we are interested in the following general question: given a collection of examples (or data in general), how can a teacher, a priori, best put them in sequence so that the learner, a deterministic incremental learning system that does not ask questions during learning, can acquire some target concept (or knowledge)? This question, which corresponds to situations where the teacher does not have the choice of the examples and cannot interpret the progress made by the student until the end of the learning period, leads to the study of incremental learning per se, independently of any particular system. The solution to this general interrogation could be of some use in several realistic settings where a teacher has an incremental learning system and a collection of data, or where data arrive sequentially but time allows keeping them in small buffers that can be ordered before being processed.

This framework is to be compared with the recent surge of interest in "teaching strategies" that allow one to optimally teach a concept to a learner, thus providing lower bounds on learnability complexity [5,17]. The difference is that in our case the teacher can only play on the order of the sequence of inputs, whereas in the other case the teacher chooses the most informative ideal examples but does not look for the best order (there are some exceptions such as [13]).
This paper has outlined some first results concerning order sensitivity in supervised conceptual incremental learning. The most important ones are: (i) that order dependence is due to non-optimality and/or to forgetting of possibilities, corresponding to a local preference bias that heuristically selects the most promising hypotheses; (ii) that this bias can be seen as the result of additional instances given to the learner (i.e. prior knowledge built into the system); (iii) that (ii) allows one to replace the difficult problem of determining the action of the local bias along different sequences of instances by a problem of addition (which is commutative, i.e. order independent) of instances to an order independent learner; which leads to (iv) that there are strong contingencies for an incremental learner to be order independent on some collections of instances (either the corresponding prior knowledge is well-tailored to the future potential collections of inputs, or there is no prior knowledge, and thus no reduction of storage and computational complexity).

In this study, we have given sufficient conditions only on the fictitious examples that should be provided by an oracle equivalent to a local preference bias. It would be nice to obtain necessary conditions that state which instances should necessarily be given by the oracle to get order independence. We are currently working on this question.

It should be clear that the problem of noisy data is of no concern to us here. We characterize through the version space what is known to the learner irrespective of the quality of the data. Noise would become an important issue if it were dependent on the ordering of the data (e.g. sensor equipment that would degrade with time).

Issues for future research include: are these results extensible to more general learning situations (e.g. unsupervised)? Given a local preference bias, how does one determine a good sequence ordering so as to best guide the system towards the target knowledge?

Acknowledgments: I thank all the members of the Equipe Inférence et Apprentissage, and particularly Yves Kodratoff, for the good humored and research conducive atmosphere so beneficial to intellectual work. Comments from the reviewers helped to make this paper clearer.

References

1. Cornuéjols A. (1989) "An Exploration in Incremental Learning: the INFLUENCE System", Proc. of the 6th Intl. Conf. on Machine Learning, Ithaca, June 29-July 1, 1989, pp.383-386.
2. Daley & Smith (1986) "On the Complexity of Inductive Inference", Information and Control, 69, pp.12-40, 1986.
3. Doyle J. (1979) "A truth maintenance system", Artificial Intelligence, 12, pp.231-272, 1979.
4. Fisher, Xu & Zard (1992) "Ordering Effects in COBWEB and an Order-Independent Method", Proc. of the 9th Int. Conf. on Machine Learning, Aberdeen, 1992.
5. Goldman & Kearns (1991) "On the Complexity of Teaching", Proc. of COLT-91 (Workshop on Computational Learning Theory), Santa Cruz, Aug. 5-7, 1991, pp.303-314.
6. Haussler D. (1988) "Quantifying Inductive Bias: AI learning algorithms and Valiant's learning framework", Artificial Intelligence, 36, pp.177-222, 1988.
7. Hirsh H. (1990) "Incremental Version-Space Merging", Proc. of the 7th Int. Conf. on Machine Learning, Univ. of Texas at Austin, June 21-23, 1990.
8. Ling (1991) "Inductive Learning from Good Examples", Proc. of the 12th Int. Joint Conf. on Artificial Intelligence (IJCAI-91), pp.751-756.
9. MacGregor J. (1988) "The Effects of Order on Learning Classifications by Example: Heuristics for Finding the Optimal Order", Artificial Intelligence, 34, pp.361-370, 1988.
10. Mitchell T. (1978) "Version Spaces: an Approach to Concept Learning", PhD Thesis, Stanford, December 1978.
11. Mitchell T. (1982) "Generalization as Search", Artificial Intelligence, 18, pp.203-226, 1982.
12. Natarajan B. (1991) "Machine Learning: A Theoretical Approach", Morgan Kaufmann, 1991.
13. Pearl J. (1988) "Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference", Morgan Kaufmann, 1988.
14. Porat & Feldman (1991) "Learning Automata from Ordered Examples", Machine Learning, 7, pp.109-138.
15. Rivest & Sloan (1988) "Learning Complicated Concepts Reliably and Usefully", Proc. of COLT-88, Cambridge, MA, 1988, pp.69-79.
16. Shinohara & Miyano (1990) "Teachability in Computational Learning", Proc. of the Workshop on Algorithmic Learning Theory, 1990, pp.247-255.
17. Salzberg, Delcher, Heath & Kasif (1991) "Learning with a Helpful Teacher", Proc. of the 12th Int. Joint Conf. on Artificial Intelligence (IJCAI-91), pp.705-711.
18. Utgoff P. (1989) "Incremental Induction of Decision Trees", Machine Learning, 4, pp.161-186, 1989.
19. Van Lehn K. (1987) "Learning one subprocedure per lesson", Artificial Intelligence, 31 (1), pp.1-40, January 1987.
20. Winston P. (1970) "Learning structural descriptions from examples", AI-TR-231, MIT Artificial Intelligence Laboratory, Cambridge, MA, 1970.