From: AAAI Technical Report SS-94-04. Compilation copyright © 1994, AAAI (www.aaai.org). All rights reserved.

Manufacturing Control System Principles supporting Error Recovery

Peter Loborg and Anders Törne
Dept. of Computer and Information Science, Linköping University, S-58183 Linköping, SWEDEN
E-mail: petlo@ida.liu.se, andto@ida.liu.se
January 24, 1994

Abstract

This paper argues that manufacturing control systems should be structured to retain information used while specifying the system, so that it may be reused during error recovery situations. An example of such a system (the Aramis system) and its proposed error recovery capabilities is presented.

Introduction

Most commercially available Manufacturing Control Systems (MCS) lack support for error recovery. Even normal constructs for encoding exception handling, a natural part of many computer languages, are absent. Although several proposals have been suggested, there are still shortcomings:

• Although many proposals contain support for error recovery or fault tolerant operation, most of the support only concerns the assembly plan/activities of the manufacturing process, such as what to do when a grasp operation fails (e.g. [Cao and Sanderson, 1992; Delchambre and Coupez, 1988; Gaspart et al., 1989]). However, after a structural failure such as a broken air pressure hose, and subsequent repair, parts of the machinery may be in an abnormal state (i.e. one not passed during normal execution), or in a normal state that does not correspond to the 'state' of the program execution. This situation is seldom addressed.

• They are often tedious to instruct, not to mention the effort needed to modify the system. One problem is that the knowledge used when instructing an MCS is to a large extent lost at coding/compile time, i.e. it is not explicitly needed/represented in the system's normal instruction formalism, and thus not available for further usage. As a result, many earlier attempts to provide support for error recovery result in multiple representations of the knowledge (e.g. [Lee et al., 1983; Schmidt, 1992; Srinivas, 1977; Srinivas, 1978; Taylor and Taylor, 1988; Taylor et al., 1990]), implying extra overhead when the system changes and the knowledge base is to be updated. A prime example is the usage of an external expert system responsible for detecting anomalies and for producing recovery actions, hooked on to an existing controller. A small change of the program in the controller results in major rewritings of the expert system. However, there are approaches where knowledge is reused (e.g. [Gini, 1983]) or integrated as a part of the instruction formalism ([Chen and Trivedi, 1991; Delchambre and Coupez, 1988; Gaspart et al., 1989; Meijer and Hertzberger, 1988; Meijer et al., 1991]).

• The competence of the user of these systems, i.e. the operator at the shop floor, is seldom reused. Most proposals presented are designed to solve the problem automatically, thus limiting their applicability.

The area of error recovery may be subdivided into detection, diagnosis and recovery. We are primarily interested in providing support in the recovery phase, or more precisely, providing support to change the state of the machine to a legal one, and to synchronize the state of the current execution with the state of the machine. The goal is to minimize the time and material loss during the recovery process.

The means are twofold. Firstly, the usage of an explicitly represented model of the underlying system, providing robust service for a task level instruction system as well as a semantic framework for any planning or plan repair activity. Secondly, a task level where the activities, their causal relations and what object or type of object they use are logged in order to extract information about the normal behavior of the system.

The next section presents an overview of Aramis (A Robot And Manufacturing Instruction System), which is a specification, programming and execution environment for manufacturing applications. The following section describes the recovery support which will be provided by the system (see footnote 1).

Aramis - An Overview

The Aramis system has been designed with a layered architecture, based on different levels of abstraction. It consists of three different levels: the task programming level, the control level and the physical level (Fig. 1). It uses a world model (WM) as interface between the task level and the control level.

Footnote 1: The programming environment exists as described herein, used in a robot cell equipped with vision, tactile sensors etc. [Loborg et al., 1994; Loborg et al., 1993; Loborg and Törne, 1991; Törne, 1990]. The error recovery support as presented herein is under development, and will be presented in a forthcoming licentiate thesis.

At the task level the operator specifies what operations or actions should be performed upon the WM and under what conditions, using a graphical hybrid rule based language [Loborg and Törne, 1991]. The physical environment is modelled as a set of objects in the WM, where each object has a set of state variables. The task program executes by setting reference values for the objects (their state variables), thereby requesting the objects to change their state.

The control level acts as a servo mechanism and is responsible for keeping the real world in the state represented in the WM, as the WM reference values are changed by task program execution. The programming at the control level is typically done by control engineers, and consists mainly of specifying how devices in the real world should be viewed in the WM, and what control algorithms are needed to implement the servo mechanism.

The physical level is the actual connection to the real world, where explicit I/O, requested by control level algorithms, is performed with sensors and actuators.

Figure 1. The world model is viewed as a set of reference values which is set by the Aramis program and used by a servo controller (sensing/control) to invoke correct control algorithms in the 'real world'. The WM also supports the reading of actual values. (Diagram labels: Task Programming, Aramis Program, Set, Read, Control, Real World; arrows indicate information flow.)

The remainder of this section is devoted to the world model (how objects are modelled, etc.), the control level and its connection to the real world, and finally there is a brief presentation of the task level language.

World model based programming

In Aramis, we adopt an object centered view of the world, where each object is responsible for its own internal state and its consistency. The task level program is responsible for coordinating the operation of all objects involved. Each object contains an internal state and information about its proper usage. This will prevent the task level programmer from abusing the object unnoticed: an exception will occur.

Footnote: We use the word state to denote a specific combination of values for a fixed set of variables. Each variable takes its values from a finite (and thus discrete) domain.

This principle is best explained through the following example, where a two speed bidirectional electrical engine is modelled (Fig. 2). This device is represented in the WM by an object containing two state variables, named speed and direction. The domain of speed is a set of the three symbols stop, low and high. The domain of direction consists of forward and backward. This results in an object with six possible states. We use a set of constraints to group the state space into a set of interesting states (e.g. when the value of speed equals stop, the value of direction is uninteresting, and thus five states are adequate to describe the engine). A finite state machine (FSM) defined over the set of interesting states is used to describe how these variable values may be combined and changed in this object. To each transition defined in the FSM there might be an associated control algorithm responsible for achieving something in the real world which corresponds to the transition of states in the object. This functionality is part of the control level and will be detailed in the next subsection.

Figure 2. An electrical, bidirectional two speed engine is modeled by two variables, direction and speed. Its behaviour is modeled by a FSM with five interesting states, each representing a combination of variable values according to its constraint. The arcs between the states describe legal transitions, ruling out the possibility to reverse the engine without requesting it to stop in between. (State labels in the diagram: "speed = stop", "speed = slow AND direction = forward", "speed = fast AND direction = forward", and so on.)

The task level program controls the object by requesting changes of its variable values. The object will respond to a request with one of the following:

• error - there is no direct path from the current state to any state representing the requested values, i.e. a state where the requested values will satisfy the constraint (task level programming error). Or: there are several direct paths to states corresponding to the requested variable values (object modelling error, which should never occur).
• failed - the object did not succeed in reaching the requested variable values. This may be due to an engine overload situation, some fault in the engine or the electrical hardware from the I/O board and onwards, or a programming error in the algorithm responsible for achieving the state. This reply also indicates that any built in recovery actions have failed.
• succeed - the new state is reached and the world has a state which corresponds to the requested state.

Each request from the task level program will be suspended until the object returns the result, in which case the execution of the requesting 'process' continues as normal or an exception/error is raised.

This model of an object is well suited for specifying aspects of real time behaviour, such as transition completion time (see footnote) and algorithm periodicity. This is vital when designing controllers for any continuous physical process.

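The engine object and its three replies can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from Aramis: the state names, the `Engine` class, the `control` callback standing in for a control level algorithm, and in particular the set of arcs, which is only one reading of Figure 2 (no reversal without passing through stop). A "direct path" is assumed to mean a single transition.

```python
from enum import Enum


class Reply(Enum):
    ERROR = "error"
    FAILED = "failed"
    SUCCEED = "succeed"


# The five interesting states; direction is left out (don't care) when stopped.
STATES = {
    "stopped":  {"speed": "stop"},
    "fwd_low":  {"speed": "low", "direction": "forward"},
    "fwd_high": {"speed": "high", "direction": "forward"},
    "bwd_low":  {"speed": "low", "direction": "backward"},
    "bwd_high": {"speed": "high", "direction": "backward"},
}

# Assumed legal arcs: reversing requires passing through "stopped".
TRANSITIONS = {
    "stopped":  {"fwd_low", "bwd_low"},
    "fwd_low":  {"stopped", "fwd_high"},
    "fwd_high": {"fwd_low"},
    "bwd_low":  {"stopped", "bwd_high"},
    "bwd_high": {"bwd_low"},
}


def satisfies(state, request):
    """True if every requested value agrees with the state's constraint."""
    values = STATES[state]
    return all(var not in values or values[var] == val
               for var, val in request.items())


class Engine:
    def __init__(self, control=None):
        self.state = "stopped"
        # control(src, dst) -> bool is a stand-in for the control algorithm
        # attached to a transition; by default it always succeeds.
        self.control = control or (lambda src, dst: True)

    def request(self, **wanted):
        targets = [s for s in TRANSITIONS[self.state] if satisfies(s, wanted)]
        if len(targets) != 1:
            return Reply.ERROR       # no direct path, or an ambiguous one
        if not self.control(self.state, targets[0]):
            return Reply.FAILED      # the real world did not follow
        self.state = targets[0]
        return Reply.SUCCEED
```

With these assumptions, requesting `speed="low", direction="backward"` while running forward yields `Reply.ERROR`, and the same request succeeds once the engine has been asked to stop.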
Footnote (transition completion time): The maximal amount of time the transition will use once it is actually started.

Control layer - implements objects

The control layer hardware consists of a set of computing devices and peripheral hardware organized in a topology of communication channels, which should provide real time computational power and I/O to the world model. Although this has not been a part of the project, the modeling of this topology as well as the analysis of computational needs (based on the object models) and scheduling of the actual computation are topics that need to be addressed in order to build a complete system. Major contributions have already been made in the areas of real time communication systems and protocols (e.g. [Thomesse et al., 1991]), scheduling algorithms applicable both to communication and process scheduling (e.g. [Leinbaugh and Yamini, 1982; Rajkumar et al., 1988; Stancovic et al., 1990; Fohler, 1992; Sha and Sathaye, 1992]) as well as distribution of time with bounded accuracy (e.g. [Kopetz and Ochsenreiter, 1987; Kopetz, 1992]). The problem of allocating processor nodes for different sets of processes depending on both their computational and communicational needs and the need for processor specific services (such as an I/O device) has also been investigated (e.g. [Ma et al., 1982; Verhoosel et al., 1991]).

Our concern has been to find a suitable specification for the computational needs that supports both the scheduling algorithms and analytical methods used, and an abstraction/interface to a task level language.

The interface consists of a set of objects and their state variables, and a principle for how the task level interacts with objects. However, if the same abstraction of the physical world as presented by available sensors and actuators would be used at the task level, the resulting state space would be enormous, as would the amount of communication needed to keep the world model updated in a distributed system.

Thus, each object type is described in two parts. Firstly, an abstract model of the object is specified, as presented in the previous subsection. Secondly, the implementation of it is specified. This includes the specification of what types of actuators and sensors are used, which also implies what data types are used at this level, code to extract information and control the device, etc. These two parts are then connected by describing a mapping from selected variables or sensors of the object to the state variables describing the interface. Appropriate parts of the code defined in the object are associated with transitions and nodes of the FSM of the object, to be used to accomplish a transition or maintain a reached state.

Example: If the engine in Fig. 2 is equipped with a speedometer producing values as integers in the range of 0..3000, the cardinality of the state space at the control level is 3001 x 2. In the state space as viewed in the WM (presented in Fig. 2) the variable speed is reduced to range over 3 values, and thus the state space is reduced to a cardinality of 6. This state space is then grouped into 5 interesting states, over which a behaviour is defined in terms of legal transitions, constituting a FSM. A change of value of the speedometer sensor at the control level is not propagated to the world model until it implies a change in the corresponding WM-variable of the object.

Formally, the state space as viewed by the sensors and actuators available ($S_{RT}$) is reduced or mapped onto an exported state space ($S_E$, the one present in or exported to the WM) by a surjective function $f: S_{RT} \to S_E$. Therefore all low level states have a mapping. The same mapping also denotes an inverse, injective function $f^{-1}: S_E \to S_{RT}$, implying that state changing requests issued at the task level have a unique counterpart at the control level. Whether the control algorithms responsible for achieving a state change in $S_{RT}$ such that $f^{-1}(s_{E,request})$ equals $s_{RT,result}$, the resulting state, actually will do that or not is implementation dependent. What is required is that $f(s_{RT,result})$ equals $s_{E,request}$.

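The speedometer example can be made concrete with a small Python sketch of the two reductions. The threshold `LOW_MAX`, the function names and the set name `S_A` (for the set of interesting states) are assumptions introduced only for illustration; the paper does not say where the boundary between low and high lies.

```python
from itertools import product

DIRECTIONS = ("forward", "backward")
LOW_MAX = 1500  # hypothetical boundary between "low" and "high"


def export_speed(rpm):
    """Reduce a speedometer reading (0..3000) to one of three symbols."""
    if rpm == 0:
        return "stop"
    return "low" if rpm <= LOW_MAX else "high"


def f(rt_state):
    """Map a control level state (rpm, direction) to an exported WM state."""
    rpm, direction = rt_state
    return (export_speed(rpm), direction)


def abstract(exported):
    """Group exported states into interesting states (direction is
    a don't-care when the engine is stopped)."""
    speed, direction = exported
    return ("stop",) if speed == "stop" else (speed, direction)


S_RT = list(product(range(3001), DIRECTIONS))  # control level state space
S_E = {f(s) for s in S_RT}                     # exported state space
S_A = {abstract(e) for e in S_E}               # interesting states


def propagate(readings, direction="forward"):
    """Yield a WM update only when a sensor change alters the exported value."""
    last = None
    for rpm in readings:
        exported = f((rpm, direction))
        if exported != last:
            last = exported
            yield exported
```

Under these assumptions `len(S_RT)` is 6002, `len(S_E)` is 6 and `len(S_A)` is 5, matching the cardinalities in the example, and a run of readings such as 10, 20, 30 produces a single world model update.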
The constraints used to specify the mapping from the exported state space to the set of interesting states, also called the abstract state space, must denote a surjective function $f': S_E \to S_A$, thus specifying the constraints to be non overlapping.

In this two level abstraction, the former abstraction is used to decrease the description of the object to something useful at the task level, and the latter to define the semantics of the object and to provide a base for error recovery. Since the state of an object may now be abstracted to an abstract state for that object (a single variable), the state space used in any planning effort would not increase exponentially with the number of state variables used to control the system, only with the number of objects it contains.

Task level programming - coordinating object activities

The task level language has the role of coordinating object activities. The most primitive way to do this is to request a single object to change the value of one or several of its WM variables. This is a primitive action. A user defined action is a collection of calls to primitive and/or user defined actions, and a partial temporal order (PTO) over them. The PTO consists of a set of temporal restrictions over pairs of these action calls. A temporal restriction between two action calls specifies that they must execute in sequence and which of them should execute first.

When executing a user defined action, the default is to execute all its parts in parallel, except when the PTO prohibits this. Whether parallel actions actually will be executed in parallel or not depends on resource management. The control level may not be powerful enough to execute all requested algorithms in parallel, or some object may be occupied by other activities and the requested action will have to wait. The purpose of using this scheme is to promote the encoding of only those ordering restrictions that really exist in the application, and to leave it to the system to handle further restrictions imposed by resource management.

A set of processes (called workers) repeatedly execute actions to solve their tasks. These workers are activated and deactivated by a human operator, acting as a foreman, deciding what to perform at the shop floor.

The description above is not complete. There exist constructs used within an action to define alternatives, and recursive actions are used to express iterations. There are constructs to express data flow between actions and transformations of data through the use of functions. For a complete description, see [Törne, 1990; Loborg and Törne, 1991].

Error Recovery - exception handling and planning

Examples of commonplace corrective actions in a manufacturing context are to replace what is faulty or free what has accidentally got stuck. However, this may not always be accomplished automatically, and human intervention may leave the machinery in another state than a state appropriate for continued execution.

The following sections will briefly present the exception handling to be used in a layered system as described above, and present some ideas about how to support the user when the state of the machinery no longer corresponds to what is appropriate to continue execution. Since this support will mainly depend on the object specification, it is expected that many task level languages that resemble the Aramis language as presented above may use a similar approach.

Execution and error semantics

Based on previous sections, the following definitions are straightforward:

• The referential closure of a task level process p (RCp) is the set of all state variables from objects which are referenced by p. (Strictly, only a top level process, a worker, is meant. Although action calls inside the worker are performed in parallel and thus constitute processes, they are regarded as action calls, not processes.) The idea is to limit the world to be considered during recovery situations. The abstract referential closure (ARCp) is the set of objects referenced by p.
• A process state for a process p is a vector of values for the variables in the RCp. An abstract process state is a vector containing the current abstract state for each object in the ARCp.
• The process state space is the set of all possible states of a process, and is divided into three subsets:
  - the illegal state space is the subset where some of the objects are in an illegal state
  - the abnormal state space is the set of legal process states not normally passed during execution
  - the normal state space is the set of legal process states passed during some execution
  Analogously, there is an abstract counterpart.
• An execution path for a process p denotes any sequence of process states for p that may be passed during some execution (the term 'some execution' does not include execution of exception handlers). All possible execution paths for p form an execution graph for p (EGp). A normal execution path (NEPp) denotes the trace of an execution of an error free program, i.e. a program which succeeds in accomplishing its task. The subset of the EGp that is normally used is denoted a normal execution graph for p (NEGp). The abstract versions (AEGp and ANEGp) are defined analogously.
• An error occurs when an execution deviates from the EGp and there is no exception handler that succeeds in redirecting the execution back into the EGp. The error will manifest itself in any of the following three alternatives:
  - the execution enters the illegal state space
  - the execution halts prematurely
  - explicit monitoring of the process state detects the deviation
  This definition is based on the assumption that the model of the world is correct, but not complete (thus the second point above).
• An exception occurs when an execution deviates from the EGp but is redirected into the EGp (without entering the illegal state space) by some exception handler.

As a result of this definition, problems during the assembly process caused by damaged parts or parts slightly out of specification are not regarded as errors if there exists a procedure to recover from these situations.

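The three-way division of the process state space can be sketched in Python as follows. This is only an illustration under stated assumptions: an abstract process state is represented as a dict from object name to abstract state, the normal state space is approximated by the set of states recorded during error free runs, and the names `classify`, `LEGAL` and `NORMAL` as well as the example objects are hypothetical.

```python
from enum import Enum


class Region(Enum):
    ILLEGAL = "illegal"
    ABNORMAL = "abnormal"
    NORMAL = "normal"


def as_key(process_state):
    """Make an abstract process state (dict: object -> state) hashable."""
    return frozenset(process_state.items())


def classify(process_state, legal_states, normal_states):
    """Place an abstract process state in one of the three subsets.

    legal_states:  dict object -> set of legal abstract states of that object
    normal_states: set of as_key(...) values seen during error free execution
    """
    for obj, state in process_state.items():
        if state not in legal_states[obj]:
            return Region.ILLEGAL      # some object is in an illegal state
    if as_key(process_state) in normal_states:
        return Region.NORMAL           # passed during some logged execution
    return Region.ABNORMAL             # legal, but never passed so far


# Hypothetical two-object process: an engine and a gripper.
LEGAL = {
    "engine": {"stopped", "fwd_low", "fwd_high", "bwd_low", "bwd_high"},
    "gripper": {"open", "closed"},
}
NORMAL = {
    as_key({"engine": "stopped", "gripper": "open"}),
    as_key({"engine": "stopped", "gripper": "closed"}),
    as_key({"engine": "fwd_low", "gripper": "closed"}),
}
```

In this sketch a state with the engine at `fwd_high` and the gripper `open` is legal but was never logged, so it is classified as abnormal, which is the case the planning support later in the paper is aimed at.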
Generally, recovering from an error corresponds to changing the state of the machinery so that it corresponds to some process state in the EG. It is preferable to recover to a process state in the EG 'close' to the current process state, provided it is possible to associate a cost function with the transitions of objects. Using this approach, the following issues must be considered:

• Deciding whether a process state is in the EG or not is nontrivial, compared with deciding whether the state is inside a NEG, which is a matter of comparing with the execution log.
• Simply changing the state of the machinery will most likely conflict with physical or other restrictions on how to operate the machinery. The information about such restrictions is encoded in the task level program, explicitly or implicitly. That information must be extracted to guide the planning of how to change the state of the machinery.

The following subsections will present suggestions on how to implement exception handling and a restricted version of recovery from errors, based on the definitions and discussion above.

Exception handling in the control level

From the task level point of view, it is preferable if the control level could behave in a transactional way, i.e. either accomplish requested state changes or report a failure without changing anything. This is not always possible (a half done seam weld may not easily be undone) but should be desired whenever possible.

A straightforward approach is to label transitions of an object as undoable and/or restartable, either by specifying other algorithms to use in order to achieve that, or by specifying the object to use another existent transition and its algorithm as undo or recover transition. This is analogous to the notion of 'compensating actions' and has been proposed in manufacturing applications by, e.g., [Schmidt, 1992]. A further functionality is to specify the transition as interruptible, i.e. specifying that the task level is to execute some prepared but exceptional task before returning the control back to the control level. The definition of objects as proposed herein is easily adapted to such a solution, but experiments with different applications are needed to evaluate this approach.

Exception handling at the task level

Providing exception handling at the task level may be viewed as simply providing some syntactic construct to catch a signal and transfer control to some predefined piece of code. Examples of such exception handling constructs can be found in several different programming languages. Most of them have in common that they describe how to handle exceptions in a sequential set of instructions, not parallel activities nor higher level instructions/activities.

In the data base community, the notion of SAGAS or nested SAGAS [Garcia-Molina and Salem, 1987; Garcia-Molina et al., 1991] has been developed and proposed for modelling parallel, nested activities in a corporation, such as receiving orders, billing the customer while updating the inventory, and so on. In principle, it is a scheme for specifying compensating activities to be used in case of an abortion of an ongoing activity. It also specifies how the abortion of one activity that is a part of a nested structure of activities should affect the other activities. There are no means to describe that an alternative activity should be performed when an activity aborts.

In a more general setting, as described in previous sections, it might also be valuable to be able to specify additional and/or alternative actions to use, as well as how to return to normal execution. The latter is uncommon in normal programming languages such as ADA, where the exception handler 'takes over' the remaining execution in the module that experienced the error. If the exception handler succeeds, the module will return as normal, otherwise it will propagate the same or a modified error to its caller.

Error Recovery at the task level

When the exception handling fails, the error is propagated up to the user, the human operator.
He or she has to understand the original fault and correct it in order to continue execution. In order to correct the fault, it is not uncommon that the state of the machinery is changed merely in order to access the failing device, nor is it uncommon to remove some of the workpieces due to a defect, either causing the problem or a result of it. The effect of this activity is that the machinery is not always in the same state as it was when interrupted. As specified previously, the process state after the failed action may now be either illegal, abnormal or normal.

Using the current definition of an object there is no support for how to change an illegal process state into a legal one, i.e. the system cannot propose how to do it. It can merely present the fact and inform the user about the expected state.

If the process state is abnormal, it indicates that no execution has passed this state yet. Thus the state must be changed to a normal one in order to be certain that it is possible to continue execution. If this normal state does not correspond to the one where the fault occurred, it is likely that continued execution will fail. The reason is that the internal state of the controlling computer is not correct with respect to continued execution from this new, normal state.

The execution log - a source of information

It is a basic assumption in this presentation that the information needed to perform recovery planning is either explicitly available or may be extracted by observing the system executing under normal conditions. This section will discuss the second part of that assumption.

The referential closure of a process p (RCp) is an interesting notion since most problems tend to be local to the process (the case when process synchronization is a part of the problem is postponed as future work). A straightforward method to identify a RCp used in an application is to log the activities of process p. The RCp may also be analyzed from the source code for p.

As stated previously, the normal execution graph (NEG) is useful for deciding whether a legal process state is normal or abnormal, i.e. if there exists a program which can be used to continue execution from that process state. By logging the process state the NEG (and the ANEG) is obtained. Since logging is process local, the log will start to repeat itself, i.e. as processes continue to execute cyclically, the log of each process will contain repeating series of log records. When that happens, new log records will no longer be appended to the log. The existing log will be reused, expanded with branches if the path starts to deviate from the previously known cycle. Extra care must be taken when handling variables referring to material in flow, since they may point to different object instances at different points in time.

Apart from logging the state of the system, the state of the controller must be logged in order to support resynchronization of program execution with a state of the machinery.

The result of this logging scheme is a knowledge base with information about the normal behaviour of the application, i.e. knowledge about data and control flow. Thus implicit knowledge is extracted and made explicitly available for reuse.

Planning for error recovery

Planning is needed in order to support the user in situations when the machinery is in an abnormal state. Rather than creating a plan from the current (abnormal) state to the last known normal state, we suggest that a plan should be generated to the closest normal state. Normal execution is then restarted from that new state, using the logged controller information to synchronize the controller with the state.

The expected advantage of using this approach is that it will exclude the need to plan for a repetitive sequence of actions, thus reducing planning complexity. The motivation is postponed to the end of this subsection.

The cost function implied by the term closest used above is defined as follows:

• For each object there is a cost associated with each transition between abstract states of that object.
• Given a current abstract state for an object, the distance between two states in that object is defined as the minimal sum of transition costs between the states. Since transitions have a direction, the distance between state A and B of an object may differ from the distance between B and A. The distance is infinite if there is no sequence of transitions from A to B.
• The distance between two abstract process states is the sum of the distances for the components, i.e. the individual objects.

Using this definition, the distance between two process states is defined, using the abstract state space, as the shortest distance between the corresponding abstract process states.

Producing a plan to reach the selected goal state amounts to selecting the order in which the different transitions from the participating objects should be executed. If no ordering can be found, there are physical restrictions in the machinery that prohibit the current solution. In such a case the next cheapest goal state is selected and the process restarts (see footnote 2). When the ordering is found, the plan is refined from the abstract state space to the full state space as represented in the world model. This refinement is safe, i.e. it will not invalidate the plan achieved at the abstract level.

Normally, deciding when it is safe to use a planning operator implies verifying pre- and prevail-conditions for that operator. In this approach, each transition can be viewed as an operator. By analyzing the log, pre- and post-conditions as well as a pessimistic version of the prevail condition can be extracted for each transition. Since the planning is performed in the abstract state space, these conditions will express the current abstract states of other objects in the RCp, or a "don't-care" value.

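The distance definition above can be sketched in Python. The per-object distance is computed here with Dijkstra's algorithm over the directed, costed transitions of one object; that choice of algorithm, the function names and the dict-based representation of object models and process states are assumptions made for illustration, not part of the Aramis system.

```python
import heapq
import math


def object_distance(costs, src, dst):
    """Minimal sum of transition costs from src to dst within one object.

    costs: dict state -> dict successor state -> transition cost.
    Returns math.inf when no sequence of transitions leads from src to dst.
    """
    best = {src: 0}
    queue = [(0, src)]
    while queue:
        dist, state = heapq.heappop(queue)
        if state == dst:
            return dist
        if dist > best.get(state, math.inf):
            continue  # stale queue entry
        for nxt, cost in costs.get(state, {}).items():
            new_dist = dist + cost
            if new_dist < best.get(nxt, math.inf):
                best[nxt] = new_dist
                heapq.heappush(queue, (new_dist, nxt))
    return math.inf


def process_distance(models, current, goal):
    """Sum of the per-object distances between two abstract process states."""
    return sum(object_distance(models[obj], current[obj], goal[obj])
               for obj in current)


def closest_normal_state(models, current, normal_states):
    """Pick the logged normal state that is cheapest to reach from current."""
    return min(normal_states,
               key=lambda goal: process_distance(models, current, goal))
```

Because transitions are directed, `object_distance(costs, a, b)` and `object_distance(costs, b, a)` may differ, as the definition requires. The sketch only ranks candidate goal states; finding an executable ordering of the individual transitions is the separate planning step described above.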
This approach does not permit any object to re-enter a state it has already visited. This restriction is natural: if the application implies a partly repetitive behaviour, i.e. that an object cyclically returns to the same state, there will also exist programs used during normal execution to accomplish this behaviour. Thus these programs may be used from whatever state in the loop the planner chooses as goal state.

In [Klein, 1993; Bäckström, 1992; Bäckström and Klein, 1991] a planning formalism called simplified action structures (SAS) and an extended version of it (SAS+) is presented. Using this formalism, some restricted planning problems are shown to be tractable, i.e. solvable in polynomial time. The approach presented herein is close to but does not fully correspond to these restricted classes. The main advantage with this approach is that the actual planning problems as posed by a real application are expected to be limited, both in terms of the state space (depending on how processes are structured) and in the length of produced plans.

Footnote 2: In order to include an alternative, more expensive set of transitions leading to the same goal state, the definition of distance must be rephrased.

Resynchronization

A seldom addressed problem when dealing with an error recovery situation in the automation context is that, even if the human operator is capable of manually modifying the state of the machinery to be a legal one, it is almost impossible to continue execution from that state. The reason is that it is nontrivial to change the internal state of the controller to correspond to the new state of the machinery. However, if there is a well defined separation between different processes in a controller, and if interprocess communication is formalized, it is possible to support the change of the internal state for one such process.

As described above, logging of a process implies saving the values for the state of the process (all variables in the RCp) as well as information about the current activation records used by the process (the internal state), i.e. both data and control information. Using this information, it is straightforward to modify the internal state to be in accordance with any state present in the log: find the entry in the log corresponding to the state, and replace the process internal control information by the one found in that entry. This holds provided that the executing system is well defined, using modifiable data structures to hold such information. If the executing code is compiled to machine code, this may not be the case.

Conclusion

In this paper, a proposal is presented of how to structure a manufacturing control system in order to achieve an open, flexible and modular system which is capable of reusing the knowledge encoded for execution to provide support for error recovery. The knowledge is of the same type as frequently used in proposals for detection/monitoring and diagnosis, thus enabling its reuse for those purposes as well.

References

[Bäckström, 1992] C. Bäckström. Computational Complexity of Reasoning about Plans. PhD thesis no. 281, Linköping University, 1992.

[Bäckström and Klein, 1991] C. Bäckström, I. Klein. Planning in polynomial time, the SAS-PUBS class. Computational Intelligence, 7:181-197, August 1991.

[Cao and Sanderson, 1992] T. Cao and A.C. Sanderson. Sensor-based Error Recovery for Robotic Task Sequences Using Fuzzy Petri Nets. In Proceedings of IEEE International Conference on Robotics and Automation, p. 1063-9, 1992.

[Chen and Trivedi, 1991] C.-X. Chen and M.M. Trivedi. A task planner for sensor-based inspection and manipulation robots. In Proceedings of the SPIE - The International Society for Optical Engineering, vol. 1571, p. 591-603, 1991.

[Delchambre and Coupez, 1988] A. Delchambre and D. Coupez. Knowledge based error recovery in robotized assembly. In Proceedings of the 9th International Conference on Developments in Assembly Automation - Japan vs Europe; Product Design for Assembly; Assembly Automation, p. 349-66. IFS Publications, Kempston, Bedford, UK, 1988.

[Fohler, 1992] G. Fohler. Realizing Changes of Operational Modes with Pre Run-Time Scheduled Hard Real-Time Systems. In Proceedings of the Second International Workshop on Responsive Computer Systems, Saitama, Japan, October 1992.

[Garcia-Molina and Salem, 1987] H. Garcia-Molina, K. Salem. SAGAS. Proc. SIGMOD Int. Conf. on Management of Data, pp. 249-259, May 1987.

[Garcia-Molina et al., 1991] H. Garcia-Molina, D. Gawlick, J. Klein, K. Kleissner and K. Salem. Coordinating Multi-Transaction Activities. Data Engineering Bulletin, 1991 (also in COMPCON 91).

[Gaspart et al., 1989] P. Gaspart and A. Delchambre and A. Coupez and P. Brouillard. Rule based procedures for diagnosis and error recovery. In Proceedings of MIV-89 - International Workshop on Industrial Applications of Machine Intelligence and Vision (Seiken Symposium), p. 88-93, 1989.

[Gini, 1983] M. Gini. Recovering from Failures: A New Challenge for Industrial Robotics. In Proceedings of the 25th IEEE Computer Society International Conference (COMPCON-83), p. 220-227, Arlington, 1983.

[Klein, 1993] I. Klein. Automatic Synthesis of Sequential Control Schemes. PhD thesis no. 305, Linköping University, 1993.

[Kopetz, 1992] H. Kopetz. Sparse Time versus Dense Time in Distributed Real-Time Systems. Proc. 12th Int. Conference on Distributed Computing Systems, Japan, June 1992.

[Kopetz and Ochsenreiter, 1987] H. Kopetz, W. Ochsenreiter. Clock synchronization in Distributed Real-Time Systems. IEEE Transactions on Computers, Vol. C36, No. 8, pp. 933-940, Aug. 1987.

[Lee et al., 1983] M.H. Lee and D.P. Barnes and N.W. Hardy. Knowledge Based Error Recovery in Industrial Robots. In Proceedings of the International Joint Conference on Artificial Intelligence, pp. 824-826, 1983.

[Leinbaugh and Yamini, 1982] D.W. Leinbaugh and M. Yamini. Guaranteed response time in a distributed hard real-time environment. Proc. Real-Time Systems Symposium, Dec. 1982.

[Loborg et al., 1993] P. Loborg, T. Risch, M. Sköld and A. Törne. Active OO Databases in Control Applications. In Microprocessing and Microprogramming, Vol. 38, No. 1-5, p. 255-264, proceedings of the 19th Euromicro Conference, Barcelona, Sept. 1993.

[Loborg et al., 1994] P. Loborg, P. Holmbom, M. Sköld and A. Törne. A Model for the Execution of Task Specifications for Intelligent and Flexible Manufacturing Systems. Accepted for publication in Integrated Computer-Aided Engineering (special issue about AI in Manufacturing and Robotics, to appear sometime during 94).

[Loborg and Törne, 1991] P. Loborg and A. Törne. A Hybrid Language for the Control of Multimachine Environments. In Proceedings of EIA/AIE-91, Hawaii, June 1991.

[Ma et al., 1988] P.R. Ma, E.Y.S. Lee and M. Tsuchiya. A Task Allocation Model for Distributed Computing Systems. In Hard Real-Time Systems, Eds. Stankovic/Ramamritham, pp. 249-255, IEEE Comp. Soc. Press, 1988.

[Meijer and Hertzberger, 1988] G.R. Meijer and L.O. Hertzberger. Off-Line Programming of Exception Handling Strategies. In Proceedings of IFAC Symposium on Robot Control, p. 431-436, Karlsruhe, 1988.

[Meijer et al., 1991] G.R. Meijer and L.O. Hertzberger and T.L. Mai and E. Ganssens and F. Arlabosse. Exception Handling System for Autonomous Robots Based on PES. In Journal of Robotics and Autonomous Systems, p. 197-209, no. 7, 1991.

[Rajkumar et al., 1988] R. Rajkumar, L. Sha, J.P. Lehoczky. Real-Time Synchronization protocols for multi-processors. Proc. IEEE Real-Time Systems Symp., CS Press, Los Alamitos, Calif., pp. 259-269, 1988.

[Schmidt, 1992] U. Schmidt. A Framework for Automated Error Recovery in FMS. In Proceedings of the 2nd International Conference on Automation, Robotics and Computer Vision, p. IA.3.4.1-5, 1992.

[Sha and Sathaye, 1992] L. Sha and S. Sathaye. Distributed Real-Time System Design using Generalized Rate Monotonic theory, Second Int. Conf.
on Automation, Robotics and Computer Vission, SINGAPORE, 1992 [Srinivas, 1977] S. Srinivas. Error Recovery in Robot Systems PhDthesis California Inst. of Tech., Pasadena,1977 [Srinivas, 1978] S. Srinivas. Error Recovery in Robots Through Failure ReasoningAnalysis. In Proceedingsof AFIP. National ComputerConference, p.275-282, 1978 [Stankovic et. al., 1990] J.A. Stankovic, K. Ramamritham,and P.-E Shiah. Efficient Scheduling Algorithms for Reai-Ttme Muitiprocessor Systems. IEEE Transactions on Parallel and Distributed Systems, pages 184-194, April 1990 [Taylor and Taylor, 1988] G. E. Taylor and P. M. Taylor. Dynamicerror probability vectors: a framework for sensory decision making. In Proceedings of the 1988 IEEE International Conferenceon Robotics and Automation, p. 1096100 vol.2, 1988 [Taylor et. al., 1990] P. M. Taylor and I. Halleron and X. K. Song. The application of a dynamicerror frameworkto robotic assembly. IEEEInternational Conference on Robotics and Automation, pp170-5. IEEE Comput. Soc. Press, Los Alamitos, CA, USA 1990. [Thomesseet. al., 1991] J.-P. Thomesse,P. Lorenz, J. P. Bardinet and T.Valentin, Factory Instrumentation Protocol: Model, Products and Tools, Control Engineering, pp65-67, Sept 1991. [T6rne, 1990] A. T6me. The Instruction and Control of MultiMachine Environments, in Applications of Artificial Intelligence in EngineeringV, vol. 2, pp. 137-152,prec. of the 5th Int. Conf. in Boston July 90, Springer-Verlag,1990. [Verhoosel et. al., 1991] J. P. C. Verlaoosel, E. J. Luit, D. K. Hammer, E. Jansen. A Static Scheduling Alghorithm for Distributed Reai-TimeSystems, Journal of Real-?ime Systems, 3, pp.227-246, KluwerAcad. Pub. 1991. 108