From: AAAI Technical Report SS-94-04. Compilation copyright © 1994, AAAI (www.aaai.org). All rights reserved.

Error Recovery in Automation - An Overview

Peter Loborg
Department of Computer and Information Science
Linköping University, S-581 83 Linköping, SWEDEN
Phone: (+46) 13 282494
E-mail: plo@ida.liu.se

Abstract

This paper attempts to provide an overview of techniques and approaches to error recovery, both in automation and in other fields (such as autonomous robotics) where analogous problems occur. The term 'error recovery' is often used as a common name for the three sub-areas error/fault detection, error/fault diagnosis, and recovery from the resulting failure. All three areas will be covered. Rather than presenting different systems and approaches in depth, different types of systems and approaches will be presented and compared.

Terminology

The first observation made is that there is a lack of common terminology, e.g. the word error is sometimes used to denote the original reason why something went wrong, whereas in other cases it is used to denote the effects in the computer system due to some unforeseen event. However, in the 'reliable computing' community there is an emerging standard [Laprie, 1992]. Although that community is primarily interested in problems internal to a computer and its software, the terminology is applicable when talking about problems in a 'real world' controlled by a computer, problems which may manifest themselves as discrepancies between the actual state in the real world and what the computer 'thinks' is the actual state. The following definitions will be used throughout this paper:

fault: the original source of any problem, such as a broken air pressure valve or an assembly part out of tolerance, which may lead to an error directly or indirectly.

error: a difference between what is specified and what is actually there. It may be latent (the system has not recognized it as such) or detected by a detection mechanism (often called a monitor).
Thus, in a control system, an error is an observable discrepancy between the actual state (the state in the controlled system) and the internal representation of the intended state.

failure: occurs when an error affects the service delivered from a system in any way, e.g. the robot is unable to continue the grasping operation since someone has removed the object in question.

Finally (completing the chain of effects), if the component that experiences a failure is a part of a system of components, the result is a fault in the system which contains the failing component.

Using this terminology, it is evident that what is commonly described as error recovery (such as catching a signal on division by zero) is really failure recovery, although it would be desirable to handle the problem before it manifests itself as a failure, i.e. to have true error recovery (detect the presence of a zero and do something about it before it is used in a division). Systems which are able to handle an error without affecting the service delivered are called fault tolerant systems.

The field of error recovery is often divided into three subfields:

detection: techniques for (or the process of) observing the actual state of the controlled system and comparing it with specifications in order to find discrepancies as early as possible.

diagnosis: techniques for finding the original fault which caused the error.

recovery: applying the proper corrective actions in order to prevent a possible future error or reach an error free state.

In each of these subfields there are several principles for how to achieve these objectives, as well as different methods for representing the information needed.

Classification

The following classification parameters will serve as a framework for the comparison of approaches and techniques. Although it is not always possible, subjective parameters have been avoided as far as possible. Thus no approach will be classified as being the 'best', since such an evaluation is dependent on the application and the motivation for using a specific approach in that application.
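The difference between failure recovery and true error recovery, drawn in the Terminology section with the division-by-zero example, can be made concrete with a minimal sketch. The function names and the fallback value 0.0 are assumptions chosen for illustration only; the source prescribes neither.

```python
def failure_recovery(total: float, count: int) -> float:
    """Recover after the failure: the division has already been attempted."""
    try:
        return total / count
    except ZeroDivisionError:
        # The error (count == 0) stayed latent until the service failed.
        return 0.0  # hypothetical default action


def error_recovery(total: float, count: int) -> float:
    """Recover from the error: detect the zero before it is used."""
    if count == 0:  # detection: the state deviates from the specification
        return 0.0  # hypothetical corrective action, no failure occurs
    return total / count


failure_result = failure_recovery(10.0, 0)
error_result = error_recovery(10.0, 0)
normal_result = error_recovery(10.0, 4)
```

Both functions deliver the same value to the caller, but only in the second one is the division never attempted, which is what the terminology above calls handling the error before it manifests itself as a failure.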
When presenting different approaches to error recovery, a basic categorisation is whether a forward or a backward error recovery approach is used. The definitions of these terms are ([Laprie, 1992; Noreils, 1990]):

backward error recovery: Find an earlier (previously passed) error free state of the system and return there, by 'undoing' what has been done since. Examples of general techniques using this approach are recovery blocks [Kim and Welch, 1989] and transactions [Harder and Reuter, 1983].

forward error recovery: Find an error free state that the system is supposed to eventually reach, and perform actions to reach that state. This is often done using predefined alternative actions or replanning of what actions to use to achieve the 'goal' [Chen and Trivedi, 1991; Noreils, 1990].

Other criteria which have implications for the system performance as well as its flexibility are:

Knowledge sources: The information (or knowledge) used may originate from several knowledge sources, such as the original code, a description of what to perform, or statistically acquired information about relations between error causes and effects in the system as well as the proper corrective action. This knowledge may also be manually specified after analysis of the system.

Application dependence: The knowledge as such may be more or less application dependent, which has implications for the reusability of the knowledge.

Knowledge representation: The knowledge used may be represented in several different formalisms, such as rules, alternative code or pieces of plans to use, graphs describing decision trees etc. Notable here is that several of these formalisms may be combined, or translated from one form into another.

Techniques for error recovery

In this section different aspects of error recovery will be presented and discussed. Firstly, techniques for the detection of a fault are discussed, followed by diagnosis and recovery techniques.
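Before turning to the individual techniques, the backward/forward distinction defined in the Classification section can be illustrated with a small sketch. Everything in it is hypothetical: the `Cell` class, its state variables and the `mount_part` action are invented for the example, and restoring a saved dictionary merely stands in for the physical 'undo' actions a real work cell would need.

```python
import copy


class Cell:
    """Hypothetical controlled system, reduced to a dict of state variables."""

    def __init__(self):
        self.state = {"parts_mounted": 0, "gripper": "open"}
        self._saved = copy.deepcopy(self.state)

    def checkpoint(self):
        # Remember the current (error free) state.
        self._saved = copy.deepcopy(self.state)

    def rollback(self):
        # Return to the most recently saved state.
        self.state = copy.deepcopy(self._saved)


def mount_part(cell, succeed):
    """Hypothetical action that may fail half-way through."""
    cell.state["gripper"] = "closed"
    if not succeed:
        raise RuntimeError("part slipped")
    cell.state["parts_mounted"] += 1
    cell.state["gripper"] = "open"


def backward_recovery(cell, succeed):
    cell.checkpoint()
    try:
        mount_part(cell, succeed)
    except RuntimeError:
        cell.rollback()  # undo: back to the previously passed state


def forward_recovery(cell, succeed):
    try:
        mount_part(cell, succeed)
    except RuntimeError:
        # Predefined alternative action that reaches the intended goal state.
        cell.state["parts_mounted"] += 1
        cell.state["gripper"] = "open"


back_cell, fwd_cell = Cell(), Cell()
backward_recovery(back_cell, succeed=False)
forward_recovery(fwd_cell, succeed=False)
```

After the failed action, `back_cell` is in the state it had before the attempt, while `fwd_cell` is in the state the action was supposed to produce.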
Detection

In order to avoid costly failures, it is desirable to detect the presence of faults as early as possible. The constant verification of absence of errors is costly and not always possible, since the degree of observability of a system may not correspond to the richness in the model of the system. The following techniques address this problem.

Sensory signature is a term used by [Lee et al., 1983] to denote a collection of parameters which specify the limits of acceptability of a sensed signal during a specific task phase. The result is that sensor monitoring is guided by the current action, not only in a binary fashion (should it be monitored or not) but also qualitatively (how tight are the limits, how often should they be verified).

Gini [Gini, 1983] uses a world model to represent the activity in a robot system. This world model and the operations on it are generated from the original robot program, and for each operation executed in the robot a corresponding semantic operation is executed in the world model. If the results differ (the world model containing the expected outcome compared to the sensed state of the world) there is an error. What sensors to use is guided by expected changes as well as expected lack of changes.

In [Hayes-Roth, 1990], Barbara Hayes-Roth uses a focus of attention principle in a life support and monitoring system to discriminate between parameters to monitor more closely and parameters that need less attention. This focus is guided by rules according to what task is performed and the current load in the system, and may also (in case of overflow of sensor data) impose filtering strategies on sensor channels. The knowledge used is currently 'hard-wired', application dependent a priori knowledge.

Observations: These three examples show that there exist feasible techniques for detection, provided that there is good observability of the system, that is, that there are enough sensors to monitor most of the tasks. Notable is that the knowledge used to guide the monitoring in Gini's approach is extracted from the program, knowledge which the user must supply explicitly in the other two approaches. In Lee's approach, the knowledge is input twice (although in somewhat different forms) - both in the 'original' program and in the monitoring system.

Guiding the monitoring task in planning systems is often a variant of Gini's approach, and is thus not included here. Also omitted in this presentation are techniques for how to implement monitoring of different types of signals, since this is outside the scope of this paper. However, an overview can be found in [Isermann, 1984] and [Frank, 1990].

Diagnosis

Diagnosis is the process of identifying the cause of the error (or failure). The identification starts by verifying the presence or absence of selected sensor data values, often called features. The result of the diagnosis is a set of explanations for the error, i.e. a set of possible faults.

Failure tree [Srinivas, 1977; Srinivas, 1978]: Early work by Srinivas presents a method where predefined knowledge about possible faults for each action (rules testing sensors and facts trying to explain why the action failed) and the current state of execution are used to build a decision tree of tests for 'features' upon a failure. This tree is then used to diagnose the situation. The method also uses the information available in the interrupted execution to trace situations where the error could be a result of a previous or parallel action. This is accomplished by examining the preconditions of all interrupted actions. Depending on the outcome of that examination, the corresponding error rules are also included.

Since Srinivas' method is computationally expensive, it would be favourable to 'remember' the errors that actually occurred in an application and their causes, and store them in a computationally more efficient form. The technique for doing so has been termed machine learning¹ [Chang and DiCesare, 1989; Chang et al., 1990a; Chang et al., 1990b]² and is used to produce heuristic rules which, given a failed action, test for the presence and absence of certain features known to discriminate between possible explanations.
The technique is based on an initial knowledge about possible errors for an action, and for each error both probable and definite features for that error. In the case when the system finds multiple explanations, it produces a tentative heuristic rule that may be confirmed or ruled out when more knowledge is acquired "from experiments, models or other sources".

1. Should not be confused with 'true' machine learning such as neural nets.
2. Extends and complements previous work such as [Chang et al., 1989] and [Pazzani, 1986; Pazzani, 1987].

Probability Vectors [Taylor and Taylor, 1988; Taylor et al., 1990] is an alternative technique where statistical a priori knowledge about the system is used. Each possible fault is associated with the probability of its occurrence, and stored in a vector. This is repeated for all possible effects (errors), all model parameters and all sensors, resulting in four vectors containing probability estimates. Connection matrices describe weighted relationships between faults and errors, errors and parameters, and parameters and sensors. Using this information, it is possible to compute a set of probable causes for a given error and which sensors to use to verify an error, as well as (at least in theory) to guide the process of instrumenting a work cell with sensors to achieve relevant observability.

Alternative approaches to diagnosis can be found in other domains (such as AI in Medicine, e.g. see [Zhang, 1993]). They are often called model based or consistency based, since they contain a qualitative description of the system. This description, or model, expressed in some logical formalism, describes each component's normal behaviour and how modules interact. If an observation of an actual system state is added to the description such that the description becomes inconsistent, there is an error present. By identifying which component (or components) in conjunction with the observation results in a contradiction, the set of explanations (faulty components) has been found.
The explanation is complete, and the diagnosis can handle multiple faults and may even reveal previously unknown faults [deKleer and Williams, 1987; Reiter, 1987]. However, it might sometimes propose awkward explanations since it lacks knowledge about possible errors, or rather, causal relations. Adding fault models [deKleer and Williams, 1989; Struss and Dressler, 1989] to the description improves the explanation, but the explanation is no longer guaranteed to be complete. The system may now label components as faulty, correct or either - meaning it cannot decide which.

Observations: The techniques for diagnosis presented here are just a small sample of what can be found in the literature. Characteristic of the first examples (failure trees with modifications and probability vectors) is that the knowledge is shallow, that is, it is composed of 'rules of thumb' or statistics about the system, and that it is not complete. If the system is reconfigured, large parts of the knowledge have to be updated. In the failure tree based approaches, this could be handled by specifying the diagnostic knowledge separately for each module in an assembly cell. In a specific configuration, most of the knowledge would then be collected from its components, and only a small part would have to be added describing the current configuration. However, specifying all diagnostic knowledge for each module is probably not feasible, and thus the knowledge will still be incomplete, i.e. it does not cover all failure situations.

In the model based approach, the knowledge stored is often called deep knowledge since it covers the normal structure and function of the system. This approach requires that the knowledge is complete, which emphasises the same problems of updating the system as with the shallow techniques. Although this is an even more computationally expensive approach, it has gained interest since it is better suited for modular description than the diagnostic rules above (e.g., see [Lee, 1986]).
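As an illustration of the shallow, statistical kind of knowledge discussed in this section, the probability vector idea can be sketched as follows. This is not the algorithm of [Taylor and Taylor, 1988]: the source only states that a fault probability vector and a weighted fault-to-error connection matrix exist. The combination rule used here (prior times weight, normalised) is an assumption, as are the function name and all fault names, error names and numbers.

```python
def rank_faults(fault_prob, fault_to_error, error):
    """Rank the faults that may explain `error`, most probable first.

    fault_prob:      fault name -> a priori probability of occurrence
    fault_to_error:  fault name -> {error name -> connection weight}
    Assumed rule: score = prior * weight, normalised over all faults.
    """
    scores = {
        fault: prior * fault_to_error.get(fault, {}).get(error, 0.0)
        for fault, prior in fault_prob.items()
    }
    total = sum(scores.values())
    if total == 0.0:
        return []  # no known fault explains this error
    ranked = [(f, s / total) for f, s in scores.items() if s > 0.0]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical work cell knowledge.
fault_prob = {
    "valve_broken": 0.02,
    "part_out_of_tolerance": 0.10,
    "sensor_dirty": 0.05,
}
fault_to_error = {
    "valve_broken": {"gripper_not_closed": 0.9},
    "part_out_of_tolerance": {"gripper_not_closed": 0.2,
                              "insertion_jammed": 0.8},
    "sensor_dirty": {"part_not_seen": 0.7},
}

ranking = rank_faults(fault_prob, fault_to_error, "gripper_not_closed")
```

The sketch also shows why such knowledge is incomplete: an error that was never entered in the connection matrix yields an empty list of explanations.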
Recovery

The term (error) recovery is used in a more versatile fashion than the two above. Essentially, it is the process of 'correcting an error', i.e. changing the state of the controlled system to be consistent with specifications. If a module accomplishes this without failure (i.e. the system which uses this module/service never notices any error), the module is said to be fault tolerant. If the module reports the error to some other part of the system responsible for taking corrective actions, the system as a whole achieves error recovery. The following sections review some of the approaches.

Programming language constructs: A first step to structure the handling of exceptional situations in a programming language is to introduce a new programming construct, often called an exception handler. Upon an error, the control is transferred to the proper exception handler, which will then execute corrective actions. However, in some languages not even this support exists, and the ordinary constructs of the language have to be used to trap errors and to implement dispatchers for corrective actions or default actions¹ [Cox, 1988; Cox, 1989].

1. Examples are 'longjmp' in C and structured use of methods in C++.

Attempts have been made to introduce a notion of recovery points (or blocks) in a program, specifying a legal and consistent state to return to in case of errors. This may be viewed as a weaker version of the concept of transactions as used in the data base community [Harder and Reuter, 1983], and there is ongoing work studying the usability of these concepts in the automation area [Schmidt, 1992].

In the data base community, the notion of SAGAS or nested SAGAS [Garcia-Molina and Salem, 1987; Garcia-Molina et al., 1991] has been developed and proposed for modelling parallel, nested activities in a corporation as a data base application. Examples of such activities are receiving orders, billing the customer while updating the inventory, and so on. In principle, it is a scheme for specifying compensating activities to be used in case of an abortion of an ongoing activity, and for specifying how the abortion of one activity that is a part of a nested structure of activities should affect the other activities. There are no means to describe that some alternative activity should be performed when an activity aborts.

Knowledge based systems: The term knowledge based systems is commonly used to denote any system where the knowledge used is separated from the program that uses it. These systems are often also called rule based or frame based systems, according to how the knowledge is represented.

In two early proposals [Lee et al., 1983; Gini, 1983] a conventional robot system was extended by a "knowledge and reasoning module". Error recovery was achieved by using the failure tree produced in the diagnosis process, and by augmenting the explanations with corrective actions/procedures. These 'corrections' were designed to restore the controlled system (the application) to either a previously visited state or a state further down the original program. In both these approaches the original robot program is interrupted and the recovery is performed by downloading new instructions to the robot.

Noreils et al. [Noreils and Chatila, 1989; Noreils, 1990] propose a variant of the solution above. Here the corrective action is inserted into the original program, either replacing the original (failed) part or as a corrective continuation of it, and the program execution is then continued by the interpreter in its current context. This approach is often called plan repair, and the idea is to 'remember' the correction by altering the plan/program.

Delchambre et al. present a backward error recovery approach [Delchambre and Coupez, 1988; Gaspart et al., 1989]. It uses a 'flat' (non-hierarchical) plan representation with frame-like structures representing parts and their features, and rules describing general assembly knowledge. The plan describes the assembly task and is given by the user. An example of a feature defined for a part is the 'handability coefficient', describing how 'easy' it is for the robot to handle that part. An example of an assembly knowledge rule is that if a sub-assembly is to be picked up, the system should focus the picking operation on the part of the sub-assembly with the best handability coefficient. The response to an error is to disassemble the faulty part (i.e. only errors caused by faulty parts are handled) by generating a sequence of actions (a plan) to accomplish this.

Meijer et al. present a hierarchical system in which the objects used in the (a priori given) plan are called 'knowledge areas' (KA's) [Meijer and Hertzberger, 1988; Meijer et al., 1991]. Each KA is responsible for some action such as calling a partially ordered set of other KA's or performing some primitive sensing or acting. For each KA there is an invocation specification, specifying what goals it will fulfill and/or what facts need to exist in order to use it. If it is a primitive KA, monitor conditions may be specified, as well as a set of exception handling strategies. Since these strategies are themselves represented as KA's, they contain invocation specifications guiding the system to what recovery strategy to use when the monitoring conditions signal an error.

A hierarchical, frame based approach is proposed by [Chen and Trivedi, 1991], where frames represent plan skeletons of different abstraction levels. The 'planning' consists of selecting appropriate 'frames' and 'instantiating' subplans (frames), refining the plan until all leaves are primitive actions or perceptions. In this approach, the major difference between generating the original plan for an assembly task and generating an error recovery plan is that the overall goal for the plan is only considered in the first case. During planning for error recovery, the main goal is to achieve any error free state.

The main difference between the approaches taken by Meijer and Chen is that in the first case exception handling strategies are predefined for each primitive action, albeit as more or less general plans, while in Chen's approach these plans are generated upon an error situation using available actions and subplans, but with a somewhat different goal state than during the original planning.

Graph based approaches: When the task description is based on some graph formalism such as a Petri Net or Finite State Machines, some restricted variants of the approaches described above apply. In the following, only Petri Net based approaches will be presented since alternative formalisms are often compilable into a Petri Net.

Petri Nets (PN) are a static description (based on places and transitions between them, where actions normally take place during a transition) without any exception handling constructs such as can be found in programming languages. Thus, if error handling is to be used in a PN based approach, it must be modelled as a part of the normal operation, and once the PN is constructed it cannot be arbitrarily modified. Using PN's, there are four constructs that can be used to encode exception handling [Zhou and DiCesare, 1989]:

Input Condition: When several transitions leave a place, it is customary to select arbitrarily between them. However, the transitions may be augmented with a condition, which if satisfied will favour the selection of that transition, and otherwise reject that transition.

Alternative Path: Using the Input Condition construct above, an alternative path through the PN may be defined.

Backward Error Recovery: The two constructs above are used to trap an anomaly and execute corrective actions (undoing the problem), returning to a previously visited place in the PN.

Forward Error Recovery: Analogous to Backward Error Recovery, but the corrective action will directly solve the problem and return control to the same place as where the error originated (or was detected).
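The first three of these constructs can be sketched with a very small Petri net interpreter. The sketch is a simplification under stated assumptions, not the formalism of [Zhou and DiCesare, 1989]: transitions are tried in list order instead of being selected arbitrarily, sensor readings are plain booleans, and the net itself (places `ready`, `grasping`, `holding` and the sensor `part_in_gripper`) is invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Sensors = Dict[str, bool]
Marking = Dict[str, int]


@dataclass
class Transition:
    name: str
    inputs: List[str]
    outputs: List[str]
    # Input Condition: an optional guard on the current sensor readings.
    guard: Optional[Callable[[Sensors], bool]] = None


def enabled(t: Transition, marking: Marking, sensors: Sensors) -> bool:
    has_tokens = all(marking.get(p, 0) > 0 for p in t.inputs)
    return has_tokens and (t.guard is None or t.guard(sensors))


def step(transitions: List[Transition], marking: Marking,
         sensors: Sensors) -> Optional[str]:
    """Fire the first enabled transition and return its name (or None)."""
    for t in transitions:
        if enabled(t, marking, sensors):
            for p in t.inputs:
                marking[p] -= 1
            for p in t.outputs:
                marking[p] = marking.get(p, 0) + 1
            return t.name
    return None


transitions = [
    Transition("grasp", ["ready"], ["grasping"]),
    # Normal path, taken only if the grasp is confirmed by the sensor.
    Transition("grasp_ok", ["grasping"], ["holding"],
               guard=lambda s: s["part_in_gripper"]),
    # Alternative path used as backward error recovery: the token
    # returns to the previously visited place 'ready'.
    Transition("release_and_retry", ["grasping"], ["ready"],
               guard=lambda s: not s["part_in_gripper"]),
]

marking: Marking = {"ready": 1}
readings = [{"part_in_gripper": False}] * 2 + [{"part_in_gripper": True}] * 2
trace = [step(transitions, marking, sensors) for sensors in readings]
```

In this run the first grasp fails, the guarded recovery transition moves the token back to `ready`, and the second attempt succeeds. A forward error recovery variant would instead return the token to `grasping`, the place where the error was detected.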
Petri Nets with backtracking: In [Cao and Sanderson, 1992] a system is presented which generates a PN controller from an assembly description represented as an AND/OR tree. At each place reached by a transition where the transition/operation may fail, a forward error recovery action is defined which will try to correct the failure. If the PN still fails, a backtracking approach is used based on the augmentation of the PN with inverse transitions (brother transitions in [Cao and Sanderson, 1992]), allowing the system to backtrack (disassembling parts) to a previous state. In the assembly context, depending on the parts disassembled, the system may choose to redo the assembly operation with partially new material as a strategy to correct the original error. This augmentation of the PN is done as a part of the generation of the PN from the AND/OR graph, and can only handle a subset of all faults that may occur in an assembly cell. Also, when expanding the PN with new places and transitions for handling anomalies, the resulting net grows large and complex.

Layered Petri Nets (LPN) are used to modularize a controller as several PN's responsible for different functionality or modes of operation. In an LPN a place may be defined to be a complete PN in itself, defining a complex action to be performed when the token reaches that place. In [Hasegawa et al., 1990] LPN's are used to define separate nets for normal operation (an auto-mode net), for exception handling, manual operation etc., and how to switch between these modes. This may be viewed as a more general and structured version of the exception handler construct mentioned above, since it also handles 'manual mode' and any other mode of operation desirable.

Modifiable Layered Petri Nets: In [Zhou and DiCesare, 1989; Zhou and DiCesare, 1993] an LPN¹ using the four error recovery methods described above in cooperation with an Error Diagnose and Recovery Planning module (EDRP) is presented. When an anomaly appears during 'execution' of the LPN which has previously not been planned for, the EDRP module extends the original LPN by transforming parts of it (using the four constructs described above) to handle the situation. This may be viewed as on-line 'patching' of the code, but the authors claim that all properties of the original net (such as avoiding deadlocks or buffer overflows) are preserved².

1. Here the authors view the place as an operation or ongoing activity and a transition as an instantaneous change of state (place).
2. Theoretical work supporting this is described in [Fielding et al., 1988].

Table 1: Techniques for recovery after an error

Approach          Type of error  Knowledge     Is knowledge   Knowledge
                  recovery       source        application    representation
                                               dependent?     technique
----------------  -------------  ------------  -------------  -----------------
Gini              any            program
Lee               any            user
Noreils, Chatila  any            user/system   yes/no         rules
Delchambre        backward       user/system   no             rules, frames
Meijer et al.     any            user          ?              sets of rules
Chen              any            system        no             plans
Cao et al.        backward       assembly      yes            rules, Petri net
Hasegawa et al.   any            user          yes            layered PN
Zhou, DiCesare    any            user/system   yes/no         layered PN, rules

Observations: Most of the techniques presented above provide both backward and forward error recovery. Some of the systems combine predefined system information with application dependent knowledge specified by the user (Delchambre/Zhou), some extract it from other available sources such as the original robot program or assembly description (Gini/Cao). Notable is that very few approaches use general error recovery knowledge only. Exceptions are the approaches proposed by Delchambre and Chen, which manage without any application specific error recovery knowledge. In all other approaches error recovery knowledge is associated with each action or each possible error explanation.
Techniques for recovery seem to evolve in two directions: techniques based on graph formalisms such as Petri Nets, which have some tractable properties such as provable liveness, and general, plan based formalisms. In both cases there are planners involved, which poses a problem since most planners are known to have problems with soundness¹ and/or safeness², and to have intractable complexity (i.e. they cannot handle large data). However, there are planners that are safe, and in some domains even sound. And by restricting the semantics of the planning problem, the complexity can be reduced. Promising work in this direction can be found in [Klein, 1993].

Conclusions

In this paper different techniques proposing solutions in the error recovery field have been reviewed. What they share is a completely different view of what to specify and how to do it compared with equipment traditionally used in industry, which makes them hard to integrate into existing production plants. Nevertheless, some attempts have been made to build industrial applications (e.g. Thorn EMI [Ashton et al., 1987] and Lockheed Aeronautical Systems Company [Kartak, 1988]), using only the most fundamental techniques. As limited as they are, they are still regarded as improvements over the alternatives available at the time.

As striving towards more flexible work cell equipment yields a more complex instruction task and a more complex behaviour, it is essential to find suitable high-level methods and languages for instructing the cell if the flexibility is not to be lost. Simply providing the 'high-level' languages used for programming computers is not enough. The goal must be to maximize the expressiveness while minimizing the specification needed, and thus leave to the application programmer only the specification of what is really application dependent knowledge, knowledge which cannot be generated from the design of the product.
This in turn implies that different parts of the system (abstraction levels) are to be specified/instructed by different categories of people, and thus that different languages/forms of descriptions may be needed - as opposed to what is suggested in [Cox, 1988]. Concluding from this, future research in the area ought to be heading towards multilayered knowledge based systems describing limited domains, systems which are easy to adapt to a new application.

1. Just because the planner does not find a plan, there is no guarantee that one does not exist.
2. The planner is not guaranteed to halt if a plan does not exist.

References

[Ashton et al., 1987] M. Ashton, D. A. Harding and M. I. Micklefield. A flexible assembly system controller. In Proceedings of the 2nd International Conference on Machine Control Systems - MACON-2, p.165-74, IFS Publications, Kempston, UK, 1987.

[Cao and Sanderson, 1992] T. Cao and A. C. Sanderson. Sensor-based Error Recovery for Robotic Task Sequences Using Fuzzy Petri Nets. In IEEE International Conference on Robotics and Automation, p.1063-9, 1992.

[Chang and DiCesare, 1989] S. J. Chang and F. DiCesare. The generation of diagnostic heuristics for automated error recovery in manufacturing workstations. In IEEE International Conference on Robotics and Automation, p.522-7, vol.1, 1989.

[Chang et al., 1989] S. J. Chang, F. DiCesare and G. Goldbogen. An algorithm for constructing a failure propagation tree in manufacturing systems. In IEEE International Symposium on Intelligent Control, p.38-43, 1989.

[Chang et al., 1990a] S. J. Chang, G. Goldbogen and F. DiCesare. Evaluation of diagnosability of failure knowledge in manufacturing systems. In IEEE International Conference on Robotics and Automation, p.696-701, vol.1, 1990.

[Chang et al., 1990b] S. J. Chang, G. Goldbogen and F. DiCesare. Aspects of diagnostic rules for manufacturing systems: generation, generalization and reduction. In IEEE International Conference on Systems, Man and Cybernetics, p.78-83, 1990.

[Chen and Trivedi, 1991] C. X. Chen and M. M. Trivedi. A task planner for sensor-based inspection and manipulation robots. In Proceedings of the SPIE - The International Society for Optical Engineering, vol. 1571, p.591-603, 1991.

[Cox, 1988] I. J. Cox. C++ language support for guaranteed initialization, safe termination and error recovery in robotics. In IEEE International Conference on Robotics and Automation, p.641-3, vol.1, 1988.

[Cox, 1989] I. J. Cox and N. H. Gehani. Exception handling in robotics. Computer, 22(3):43-9, March 1989.

[deKleer and Williams, 1987] J. deKleer and B. C. Williams. Diagnosing Multiple Faults. Artificial Intelligence, 32(1):97-130, 1987.

[deKleer and Williams, 1989] J. deKleer and B. C. Williams. Diagnosis with Behavioural Modes. In Proceedings of the 1989 International Joint Conference on Artificial Intelligence (IJCAI89), pp.1324-1330, 1989.

[Delchambre and Coupez, 1988] A. Delchambre and D. Coupez. Knowledge based error recovery in robotized assembly. In Proceedings of the 9th International Conference on Developments in Assembly Automation - Japan vs Europe; Product Design for Assembly; Assembly Automation, p.349-66, IFS Publications, Kempston, Bedford, UK, 1988.

[Fielding et al., 1988] P. J. Fielding, F. DiCesare and G. Goldbogen. Error Recovery in Automated Manufacturing through the Augmentation of Programmed Processes. Journal of Robotic Systems, 5(4):337-362, 1988.

[Frank, 1990] P. M. Frank. Fault Diagnosis in Dynamic Systems using Analytical and Knowledge Based Redundancy - A Survey and some new results. Automatica, 26(3):459-474, 1990.

[Garcia-Molina and Salem, 1987] H. Garcia-Molina and K. Salem. SAGAS. In Proc. SIGMOD Int. Conf. on Management of Data, pp.249-259, May 1987.

[Garcia-Molina et al., 1991] H. Garcia-Molina, D. Gawlick, J. Klein, K. Kleissner and K. Salem. Coordinating Multi-Transaction Activities. Data Engineering Bulletin, 1991.

[Gaspart et al., 1989] P. Gaspart, A. Delchambre, A. Coupez and P. Brouillard. Rule based procedures for diagnosis and error recovery. In Proceedings of MIV-89, International Workshop on Industrial Applications of Machine Intelligence and Vision (Seiken Symposium), p.88-93, 1989.

[Gini, 1983] M. Gini. Recovering from Failures: A New Challenge for Industrial Robotics. In Proceedings of the 25th IEEE Computer Society International Conference (COMPCON-83), p.220-227, Arlington, 1983.

[Harder and Reuter, 1983] T. Harder and A. Reuter. Principles of Transaction Oriented Database Recovery. ACM Computing Surveys, 15(4):287-317, 1983.

[Hasegawa et al., 1990] M. Hasegawa, M. Takata, T. Temmyo and H. Matsuka. Modelling of exception handling in manufacturing cell control and its application to PLC programming. In IEEE International Conference on Robotics and Automation, p.514-19, vol.1, 1990.

[Hayes-Roth, 1990] B. Hayes-Roth. Architectural Foundations for Real-Time Performance in Intelligent Agents. Journal of Real-Time Systems, no.2, p.99-125, 1990.

[Isermann, 1984] R. Isermann. Process Fault Detection based on Modeling and Estimation Methods - A Survey. Automatica, vol.20, p.387-404, 1984.

[Kartak, 1988] J. A. Kartak. Development of automated workcell control software: a case study. In Proceedings of the 18th International Symposium on Industrial Robots, p.467-91, 1988.

[Kim and Welch, 1989] K. H. Kim and O. H. Welch. Distributed Execution of Recovery Blocks: an approach for uniform treatment of hardware and software faults in real-time applications. IEEE Transactions on Computers, 38(5):626-636, 1989.

[Klein, 1993] I. Klein. Automatic Synthesis of Sequential Control Schemes. PhD thesis no. 305, Linköping University, 1993.

[Laprie, 1992] J. C. Laprie. Basic Concepts and Associated Terminology. In Dependable Computing and Fault Tolerant Systems, Vol. 5, Springer-Verlag, Wien New York, 1992.

[Lee et al., 1983] M. H. Lee, D. P. Barnes and N. W. Hardy. Knowledge Based Error Recovery in Industrial Robots. In International Joint Conference on Artificial Intelligence (IJCAI83), pp.824-26, 1983.

[Lee, 1986] M. H. Lee. Deep knowledge modelling in robotics. In Proceedings of the Alvey IKBS Research Theme: Expert Systems. Deep Knowledge. Workshop No.2, p.44-50, Alvey Directorate, London, UK, 1986.

[Meijer and Hertzberger, 1988] G. R. Meijer and L. O. Hertzberger. Off-Line Programming of Exception Handling Strategies. In Proceedings of IFAC Symposium on Robot Control, p.431-436, Karlsruhe, 1988.

[Meijer et al., 1991] G. R. Meijer, L. O. Hertzberger, T. L. Mai, E. Ganssens and F. Arlabosse. Exception Handling System for Autonomous Robots Based on PES. Journal of Robotics and Autonomous Systems, 7(2-3):197-209, 1991.

[Noreils, 1990] F. R. Noreils. Integrating error recovery in a mobile robot control system. In IEEE International Conference on Robotics and Automation, p.396-401, vol.1, 1990.

[Noreils and Chatila, 1989] F. R. Noreils and R. G. Chatila. Control of mobile robot actions. In IEEE International Conference on Robotics and Automation, p.701-7, vol.2, 1989.

[Pazzani, 1986] M. J. Pazzani. Refining the Knowledge Base of a Diagnostic Expert System: An Application of Failure Driven Learning. In 5th National Conference on Artificial Intelligence (AAAI-86), p.1029-35, 11-15 Aug. 1986.

[Pazzani, 1987] M. J. Pazzani. Failure-Driven Learning of Fault Diagnosis Heuristics. IEEE Transactions on Systems, Man and Cybernetics, SMC-17(3), p.380-394, May/June 1987.

[Reiter, 1987] R. Reiter. A Theory of Diagnosis from First Principles. Artificial Intelligence, 32(1):57-95, 1987.

[Schmidt, 1992] U. Schmidt. A Framework for Automated Error Recovery in FMS. In International Conference on Automation, Robotics and Computer Vision, p. IA.3.4.1-5, 1992.

[Srinivas, 1977] S. Srinivas. Error Recovery in Robot Systems. PhD thesis, California Inst. of Tech., Pasadena, 1977.

[Srinivas, 1978] S. Srinivas. Error Recovery in Robots Through Failure Reasoning Analysis. In Proceedings of AFIPS - National Computer Conference, p.275-282, 1978.

[Struss and Dressler, 1989] P. Struss and O. Dressler. "Physical Negation" - Integrating Fault Models into the General Diagnostic Engine. In International Joint Conference on Artificial Intelligence (IJCAI89), pp.1318-1324, 1989.

[Taylor and Taylor, 1988] G. E. Taylor and P. M. Taylor. Dynamic error probability vectors: a framework for sensory decision making. In IEEE International Conference on Robotics and Automation, p.1096-100, 1988.

[Taylor et al., 1990] P. M. Taylor, I. Halleron and X. K. Song. The application of a dynamic error framework to robotic assembly. In IEEE International Conference on Robotics and Automation, pp.170-5, 1990.

[Zhang, 1993] T. Zhang. A Study in Diagnosis Using Classification and Defaults. PhD thesis no. 302, Linköping University, 1993.

[Zhou and DiCesare, 1989] M. C. Zhou and F. DiCesare. Adaptive design of Petri net controllers for error recovery in automated manufacturing systems. IEEE Transactions on Systems, Man and Cybernetics, 19(5):963-73, Sept. 1989.

[Zhou and DiCesare, 1993] M. C. Zhou and F. DiCesare. Petri Net Synthesis for Discrete Event Control of Manufacturing Systems. ISBN 0-7923-9289-2, Kluwer Academic Pub., 1993.