tolerance itself was not addressed at all. On one hand, The family of synchronous languages [BB91], [BCE+ 03] it would need the use of completely different validahas been quite successful in offering formally defined lantion tools (taking into account stochastic aspects), and guages and programming environments for safety-critical reon the other hand no quantitative requirements were active systems. available for the case study. Lustre [CHPP87] has been defined and studied in the Ver– The real implementation should be distributed. In this imag laboratory. It is a dataflow language, well-suited for the study, we only considered the functional aspects of the description regulation systems. The industrial programming F. Maraninchi, et.ofal. specification, and we did not address the distribution. environment SCADE, developped by Esterel-Technologies1 , 1 Note that an ideal approach would be to validate is based upon Lustre. SCADE is now a de-facto standard, first a centralized version of such a specification, and worldwide, for the development of critical embedded sotware, to use automatic code distribution tools preserving especially in avionics, automotive, or energy production. On the functional properties. Some proposals for such an the academic side, several tools have been developed around automatic distribution of L USTRE programs have been Lustre at Verimag: a simulator called Luciole, a debugger made [CGP99], [CMSW99], [SC04]. called Ludic, two verification tools — Lesar is a modelchecker, NBac is a tool based on abstract interpretation and II. A N OVERVIEW OF L USTRE AND ITS PROGRAMMING dedicated to the verification of numerical properties — and a ENVIRONMENT tool for automatic testing, Lurette. In this paper, we illustrate the use of these tools in the A. The language design of a component of an avionic software. The example F. Maraninchi, N. Halbwachs, P. Raymond, C. Parent In a dataflow language for reactive systems, both the inputs , a Grenoble is a fault-tolerant system computing the gyroscopic data gfor Vérima France and outputsUniversity, of the system are described by their flows of values military aircraft. {Florence.Maraninchi,Nicolas.Halbwachs,Pascal.Raymond,C.Parent @imag.fr ** along time. Time is discrete and instants may be numbered by F. Maraninchi*, N. Halbwachs*, P. Raymond*, C. Parent* and R. }K. Shyamasundar † The authors thank IFCPAR (Indo-French Centre for Promotion of R.K.integers. Shyamasundar If x is a flow, we will note xn its value at the nth Advanced Research) under which part of the work was done. reaction (or nth instant) of the program. of Fundamental Research, Mumbai, India, 1 Verimag is a joint laboratory of Université Tata Joseph Institute Fourier, CNRS and *Verimag Grenoble University, France, {Florence.Maraninchi, Nicolas.Halbwachs, A program consumes input flows Pascal.Raymond, and computes output Grenoble-INP. shyam@tifr.res.in 1 see http://www.esterel-technologies.com/. flows, possibly using local flows which are not visible from the Catherine.Parent}@imag.fr 7 : 69 Specification and Validation of ofEmbedded Specification and Validation Embedded Systems: Systems: A Case Study a Fault-Tolerant Data A Caseof Study of a Fault-Tolerant DataAcquisition Acquisition System with Lustre Programming environment† System with Lustre Programming environment ** Tata Institute of Fundamental Research, Mumbai, India, shyam@tifr.res.in Specification and Validation of Embedded Systems: a. What is in the paper: Apart from demonstrating the use showStudy how to of specify and validate an embedded using Lustre programming L system tools, the goal ofthe the paper is twofold: AWeCase a Fault-Tolerant Data of–Acquisition it intends to show how the use of an executable, environment. The case-study considered is a fault-tolerant system for the acquisition of gyroscopic † formally clean, description language can help in unt System with Lustre Programming environmen data in a military aircraft. We illustrate the use of Lustre tools for describing, and derstanding an informal specification, insimulating, removing Abstract—We show how to specify and validate an embedded system using the Lustre programming environment. The caseUSTRE study considered is a fault-tolerant system for the acquisition of gyroscopic data in a military aircraft. We illustrate the use of Lustre tools for describing, simulating, and verifying the system. Beside, we show how the formalization of the requirements by meansF.ofMaraninchi, an executable allows to Parent be ambiguities,ofand in communicating, by showing N.language Halbwachs, P. ambiguities Raymond, C. verifying the system. Beside, we show how the formalization the requirements by early means of an removed, and how the system be developed step by step, while , can simulation, with the author of the specification. Vérima g Grenoble University, France simulation and validation take place at each step. We believe executable language allows ambiguities to be removed, and how the system can be developed step – it illustrates a progressive top-down design, with sim{Florence.Maraninchi,Nicolas.Halbwachs,Pascal.Raymond,C.Parent }@imag.fr that this example is representative of a wide class of embedded ulation and validation atbelieve each step. that this example is systems. by step, while simulation and validation take place at each step. We R.K. Shyamasundar b. What is not in the paper: Two important features of the case studies were not considered during this experiment: – The considered system aims at fault-tolerance. While verification,Executable specifications. the experiment shows that the approach is well-suited programming this kindSynchronous of fault-tolerance software, Keywords - Embedded real-time systems,Language design andfor implementation, languages, I. an I NTRODUCTION the problem of measuring Abstract—We show how to specify and validate embedded a. What is in the paper: Apart from demonstrating the use and validating the fault using the Lustre Formal methods, Formal verification, Executable specifications. system programming environment. The case+ tolerance itself was not addressed at all. On one hand, of L USTRE tools, The family of synchronous languages [BB91], [BCE 03]the goal of the paper is twofold: study considered is a fault-tolerant system for the acquisition of it would use of completely different valida– it intends tolanshow how the use ofneed an the executable, has been quite successful in offering formally defined gyroscopic data in a military aircraft. We illustrate the use of tools (taking stochastic aspects), and formally clean, language can helpinto in account unguages and programming environments for safety-critical re-descriptiontion Lustre tools for describing, simulating, and verifying the system. I.Introduction derstanding an informal specification, in removing on the other hand no quantitative requirements were Beside, we show how the formalization active systems. of the requirements by means of an executableLustre language allows ambiguities to be and studied ambiguities, and in communicating, by for showing the caseearly [CHPP87] has been defined in the Verfamily of synchronous languages [BB91], [BCE+03] –available –of the it specification. intends to study. show how the use of an executable, removed,The and how the system can be developed step by step, while simulation, with the author – The real implementation should be distributed. In this imag laboratory. It is a dataflow language, well-suited for the simulation and validation place at each We believeformally formally clean, description language can help has been quite take successful instep.offering defineda progressive top-down – it illustrates design, with simstudy, we only considered the functional aspects of the descriptionofofa regulation systems. The industrial programming that this example is representative wide class of embedded ulation and validation at each step. in understanding an informal specification, in languages and programming environments for safety-critical 1 specification, and we did not address the distribution. systems. environment SCADE, developped by Esterel-Technologies , b. What is not in the paper: Two important features ofapproach the Note that an ideal would be in to validate removing ambiguities, and communicating, by reactive systems. is basedsystems,Language upon Lustre. SCADE is now a de-facto standard, Keywords:Embedded real-time design and case studies were not considered this experiment: firstduring a centralized version of such a specification, and worldwide, for the development critical embedded showing early simulation, with the author of the implementation, languages,Formal methods,Formal LustreSynchronous [CHPP87] has been definedofand studied in sotware, the system aims – The considered fault-tolerance. While to atuse automatic code distribution tools preserving especially in avionics, automotive, or energy production. On verification,Executable specifications. the experiment shows that the approach is well-suited specification. Verimag laboratory. It is a dataflow language, well-suited the functional properties. the academic side, several tools have been developed around this kind of fault-tolerance software,Some proposals for such an for programming distribution of USTRE programs have been design, with –automatic – it illustrates a Lprogressive top-down for the description of regulation systems. industrial at Verimag: a simulator calledThe Luciole, a debugger I. Lustre I NTRODUCTION the problem of measuring and validating the faultmadesimulation [CGP99], [CMSW99], [SC04]. at each step. called Ludic, two verification tools — Lesar is a modeland validation programming environment SCADE, developped by Estereltolerance itself was not addressed at all. On one hand, The family of synchronous languages [BB91], [BCE+ 03] NBac is a tool based on abstract interpretation andthe use of completely different valida1 it would need , ischecker, based upon Lustre. SCADE is now a dehasTechnologies been quite successful in offering formally defined lanb. What is not OF in Lthe paper: Two important features of the USTRE A N OVERVIEW dedicated to the for verification of numerical properties — (taking and a intoII. tion tools account stochastic aspects), and AND ITS PROGRAMMING guages and programming environments safety-critical refacto standard, tool worldwide, for the Lurette. development of critical case studies were not considered during this experiment: ENVIRONMENT for automatic testing, on the other hand no quantitative requirements were active systems. embedded sotware, especially in avionics, orthe case In this paper, illustrate of these tools for in available The considered system aims at fault-tolerance. Lustre [CHPP87] has been defined and we studied in thethe Ver-useautomotive, A. study. The––language – tools TheThe real implementation should be distributed. In this oflanguage, a component of an avionic software. example energy production. On the academic side, have imag laboratory. It is a design dataflow well-suited for theseveral While theforexperiment shows In a dataflow language systems, both thethat inputsthe approach study, we only considered the functional aspects ofreactive the issystems. a fault-tolerant systemprogramming computing the gyroscopic data for a description of regulationaround The industrial been developed Lustre at Verimag: a simulator called and outputs of the system are described by their flows of values is well-suited for programming this kind of faultspecification, and we did not address the distribution. military aircraft. environment SCADE, developped by Esterel-Technologies1 , time. Time discrete and instants may numbered byof measuring Luciole, a debugger called Ludic, two verificationNote tools that — an idealalong approach wouldisbe to validate tolerance software, the beproblem is based upon Lustre. SCADE is now a de-facto standard, † The authors thank IFCPAR (Indo-French Centre first Promotion of version integers. x isa specification, a flow, we will x its value at the nth a centralized of If such andnote Lesar is NBac is a sotware, tool based on forabstract worldwide, for athemodelchecker, development critical embedded and validating the nfaulttolerance itself was not Advanced of Research) under which part of the work was done. reaction (or nth instant) of the program. to use automatic code distribution tools preserving especially in avionics,and automotive, production. On Joseph Verimag isora energy jointto laboratory of Université Fourier, CNRS and interpretation dedicated the verification of numerical all. On one it would A program consumes input and hand, computes output need the use the functional properties. Some addressed proposals for at such anflows Grenoble-INP. the academic side, several tools have been developed around properties — and1asee tool for automatic testing, Lurette. automatic distribution of Lpossibly USTRE have been ofprograms completely different validation flows, using local flows which are not visible fromtools the (taking into http://www.esterel-technologies.com/. Lustre at Verimag: a simulator called Luciole, a debugger madein [CGP99], called Ludic, twopaper, verification — Lesarthe is use a modelIn this we tools illustrate of these tools the [CMSW99], [SC04]. account stochastic aspects), and on the other hand checker, NBac a tool based onofabstract interpretation and The example is design of ais component an avionic software. no quantitative requirements were available for the II. A N OVERVIEW OF L USTRE AND ITS PROGRAMMING dedicated to the verification of numerical properties — and a case study. a fault-tolerant system computing the gyroscopic data for a ENVIRONMENT tool for automatic testing, Lurette. military aircraft. In this paper, we illustrate the use of these tools in the A. The language –– The real implementation should be distributed. design of a component of an avionic software. The example In this a. What is in the paper: Apart from demonstrating the use In a dataflow language for reactive systems, bothstudy, the inputswe only considered the functional is a fault-tolerant system computing the gyroscopic data for a and outputs of the system are described by their flows of values aspects of the specification, and we did not address of LUSTRE tools, the goal of the paper is twofold: military aircraft. Embedded Systems: along time. Time is discrete and instants may be numbered by nt Data Acquisition † The authors thank IFCPAR (Indo-French Centre for Promotion of integers. If x is a flow, we will note xn its value at the nth underthank which part of the work was done. t† TheResearch) ing environmenAdvanced authors IFCPAR (Indo-French Centrereaction for Promotion of Advanced Research) under which part of the work was done. (or nth instant) of the program. Verimag is a joint laboratory of Université Joseph Fourier, CNRS and d, C. Parent Verimag is a joint laboratory of Universit´e Joseph Fourier, CNRS and Grenoble-INP. A program consumes input flows and computes output Grenoble-INP. France 11seesee mond,C.Parent}@imag.fr flows, possibly using local flows which are not visible from the http://www.esterel-technologies.com/. http://www.esterel-technologies.com/. Tataa Institute of Fundamental Research, design Mumbai, Keywords:Embedded systems,Language andIndia, representative of widereal-time class of embedded systems. implementation, Synchronous shyam@tifr.res.in languages,Formal methods,Formal 1 umbai, India, the paper: Apart from demonstrating the use E tools, the goal of the paper is twofold: ds to show how the use of an executable, y clean, description language can help in unding an informal specification, in removing ties, and in communicating, by showing early on, with the author of the specification. ates a progressive top-down design, with simand validation at each step. ot in the paper: Two important features of the es were not considered during this experiment: nsidered system aims at fault-tolerance. While CSI Journal of Computing | Vol. 1 • No. 4, 2012 a “bounded counter”, etc. 7 : 70 B. An example Lustre program a very simple example ofSystems: program, A weCase give aStudy Lustreofnode Specification and As Validation of Embedded a that willSystem be usedwith later Lustre in the case study. Calledenvironment “maintain”, Fault-Tolerant Data Acquisition Programming this operator receives an integer n and a Boolean b as input parameters, and computes a Boolean output m, which is true the distribution. Note that an ideal approach would output m, which truemaintained whenever bhigh has during been maintained high whenever b has isbeen the last n cycles. during the last n cycles. The node uses a counter cpt, which is be to validate first a centralized version of such a The node uses a counter cpt, which is set to n whenever b set to n whenever b is false, and decremented to 0 otherwise. specification, and to use automatic code distribution is false, and decremented to 0 otherwise. The output m is true The output m is true when cpt is zero. tools preserving the functional properties. Some when cpt is zero. proposals for such an automatic distribution of LUSTRE programs have been made [CGP99], [CMSW99], [SC04]. II. An Overview of Lustre and its Programming Environment A. The language In a dataflow language for reactive systems, both the inputs and outputs of the system are described by their flows of values along time. Time is discrete and instants may be numbered by integers. If x is a flow, we will note xn its value at the nth reaction (or nth instant) of the program. A program consumes input flows and computes output flows, possibly using local flows which are not visible from the environment. Local and output flows are defined by equations. An equation “x = y + z” defines the flow x from the flows y and z in such a way that, at each instant n, xn = yn + zn. A set of such equations, using arithmetic, Boolean, etc. operators, describes a network of operators, and is similar to the description of a combinational circuit. The same constraints apply: one should not write sets of equations with instantaneous loops, like : {x = y + z; z = x + 1; ...}. This is a set of fixpoint equations that perhaps has solutions (see [SBT96], for more detailed discussion), but it is not accepted as a dataflow program. For referencing the past, the operator pre is introduced : "n > 0; (pre(x))n = xn–1. One typically writes T = pre(T) + i, where T is an output, and i is an input. It means that, at each instant, the value of the flow T is obtained by adding the current value of the input i to the previous value of T. Initialization of flows is provided by the –> operator. E –> F is an expression, the value of which is the one of E at the first instant (i.e., E0), and then the one of F forever (i.e., Fn:"n > 1). The equation X = 0 –> pre(X) + 1 defines the flow of integers; as a reactive program, it produces values on the basic clock. The conditional structure is a ternary combinational operator, and is strict: the two branches are always evaluated. One writes: X = if C then E else F, where C is a Boolean expression and E1, E2 are two expressions of the same type, meaning: "n > 0, Xn = if Cn then En else Fn. The language is structured by the definition of reusable nodes that can be called anywhere in expressions defining variables. Programs usually input a library of small wellidentified reactive behaviors, like a “two-states” with reset, a “bounded counter”, etc. B. An example Lustre program As a very simple example of program, we give a Lustre node that will be used later in the case study. The node named “maintain” denotes an operator that receives an integer n and a Boolean b as input parameters, and computes a Boolean CSI Journal of Computing | Vol. 1 • No. 4, 2012 node maintain (n : int ; b : bool) returns (m : bool) ; var cpt : int ; let cpt = n -> if b then if pre(cpt)>0 then pre(cpt) - 1 else pre(cpt) else n ; m = (cpt = 0) ; tel C. The programming environment In addition to the industrial environment SCADE — C. The programming environment which proposes a graphical interface, a simulator, and a In addition the industrial environment SCADE — which compiler —, antoacademic toolset has been developed around proposes a graphical interface, a simulator, and a compiler —, Lustre. The following tools or prototypes are available: a compiler into C. a simulator, called Luciole, which allows an early simulation of Lustre nodes. It displays an interactive board, where inputs can be entered and outputs are displayed. It is connected with the tool Sim2chro, which displays the history of inputs/outputs by means of timing diagrams. two verification tools: they are both restricted to the verification of safety properties, described by means of synchronous observers [HLR93]: a synchronous observer is a Lustre program, taking as inputs the input/output variables of the program under verification, and signaling whenever the property considered is violated. This technique is used both for describing required properties, and assertions about the environment which must be assumed for these properties to hold. The tools differ in the techniques applied for verification: –– Lesar [RHR91] is quite a standard symbolic modelchecker [BCM+90]. It checks the property by exploring (enumeratively or symbolically) a finite state model of the program, which abstracts away all its numerical aspects. As a consequence, it is not able to verify properties depending on the dynamic behavior of the numerical variables. It should be used for control dominated properties. –– NBac [JHR99] is able to handle simple numerical properties, thanks to the use of “Linear Relation Analysis” [HPR97], a special case of abstract interpretation [CC77]. However, NBac is much more expensive than Lesar, and should only be applied to fairly small programs. a prototype debugging tool [MG00]. • • ti th p An It re (tell and spec able real thes the a pr Industri Since t crepanci the lang hierarchi and mod version. also diffe features, III. A. Deve We sta of functi 7 : 71 F. Maraninchi, et. al. Roll Roll FaultFaulttolerant tolerant system system Pitch Pitch 3 “secure” 3values “secure” values Yaw Yaw 4 x 3 Faulty 4physical x 3 Faulty connections physical connections each of them made of 2 wires each made of 2 wires (4 x 3ofxthem 2 inputs) (4 x 3 x 2 inputs) 4 identical physical devices 4 identical physical devices (4 x 3 values, deg/s) (4 x 3 values, deg/s) Fig. 2. Fig. 2. Description of the physical system Description of the physical system Fig. 2 : Description of the physical system Roll Roll 4 identical physical devices 4 identical physical devices (4 x 1 values, deg/s) (4 x 1 values, deg/s) Fig. 3. Fig. 3. 3 3 FaultFaulttolerant tolerant system system 4 x 1 Faulty 4physical x 1 Faulty connections physical connections each of them made of 2 wires each made of 2 wires (4 x 1ofxthem 2 inputs) (4 x 1 x 2 inputs) 1 ”secure” 1value ”secure” value Description of the physical system for one axis Fig.for 3 :one Description of the physical system for one axis Description of the physical system axis are called performance requirements. For instance, several Industrial vs. academic tools are called performance For instance, several distinct working rates are requirements. required for the parts of the system: distinct working rates are forthere the parts the system: 0.05s, 0.0125s, etc. Since the release of required Scade-V6, are of significant 0.05s, 0.0125s, etc. the industrial and academic versions discrepancies between of the language. In particular, Scade-V6 contains a notion We first identified the system interface, i.e., the physical ofinputs hierarchic automata [CPP05], inspired both by Esterel We first identified the system interface, i.e., the physical and outputs, and some additional inputs that model [BS91] and mode automata [MR03], which is not in the inputs and outputs, and some additional inputs that faults. Then we wrote a single-clock Lustre program model mimacademic version. The mechanisms for defining and handling faults. Then we wrote a single-clock Lustre program mimicking the internal structure of the informal specification, but arrays are also different. In this paper, we willspecification, not use thesebut icking the internal structure of the informal forgetting about the multi-rate requirements. This is already incompatible features, thus conforming to both versions. forgetting about the multi-rate requirements. This is already an interpretation of the informal specification. For instance, an the informal specification. For instance, we interpretation translated the of English “a followed by b immediately” by we translated the English “a followed by b immediately” by “at the next step”. “at the next step”. At each step of the development, we ran manual simulations, At as each of the development, we ran manual and, farstep as possible, we used verification tools simulations, to establish and, as farproperties. as possible, we used verification tools to establish important important properties. B. Physical structure III.Informal description of the Gyroscopic B. Physical structure System Figure 2 describes the system and its physical environment. Figure 2 describes the system andgyroscopes, its physicaleach environment. The system is connected to four of them A. Development of the Case Study The system is connected to four gyroscopes, each of them measuring the angle variations along three axes named roll, measuring the angle variations along three axes named roll, pitch yaw. from The values obtained by one of in these physical Weand started an informal specification English, pitch and yaw. The values obtained by one of these physical devices, one axis, are transmitted to the some computer system made of for functional requirements and timing devices, one axis, transmitted to the requirements. system along twoforwires. Hence the system receives 4computer × 3 × 2 values. requirements, that areare called performance along two wires. Hence the system receives 4 × 3 × 2 values. From these 24 values, it has to compute only three, called For instance, several distinct working rates are required From these 24the values, it has to 0.0125s, computeetc. only three, called for the parts of system: 0.05s, secure values. secure The values. first step in modeling the system in Lustre, is to The first on step modeling the the system in Lustre, to concentrate oneinaxis only, since behaviour on all isaxes 2 concentrate on one axis only, since the behaviour on all axes are all the same . We shall be using the diagram depicted in 2 . We shall thethe diagram are all 3theassame Figure the reference in be theusing rest of paper. depicted in CSI Journal of Computing | of Vol. • No. 4, 2012 Figure 3 as the reference in the rest the1 paper. 2 Actually, 2 Actually, the behaviour of pitch is slightly different. the behaviour of pitch is slightly different. 7 : 72 Specification and Validation of Embedded Systems: A Case Study of a Fault-Tolerant Data Acquisition System with Lustre Programming environment We first identified the system interface, i.e., the physical inputs and outputs, and some additional inputs that model faults. Then we wrote a single-clock Lustre program mimicking the internal structure of the informal specification, but forgetting about the multirate requirements. This is already an interpretation of the informal specification. For instance, we translated the English “a followed by b immediately” by “at the next step”. At each step of the development, we ran manual simulations, and, as far as possible, we used verification tools to establish important properties. B. Physical structure Fig. 2 describes the system and its physical environment. The system is connected to four gyroscopes, each of them measuring the angle variations along three axes named roll, pitch and yaw. The values obtained by one of these physical devices, for one axis, are transmitted to the computer system along two wires. Hence the system receives 4 x 3 x 2 values. From these 24 values, it has to compute only three, called secure values. The first step in modeling the system in Lustre, is to concentrate on one axis only, since the behaviour on all axes are all the same2. We shall be using the diagram depicted in Fig. 3 as the reference in the rest of the paper. Luciole simulation Lesar Boolean verification NBac numeric verification Lurette automatic testing Lustre compiler Fig. 1. The difficult part is the detection of faults. First, we have to define what we call faults, and then to show how the redundant structure of the gyroscopic system can handle them. Definition of faults 4 between the actual measurement devices and the computer; Lustre program C code D. The faults The system able toorhandle two kinds of faults: themselves being isbroken not working properly for link some faults, that are due to some bad behavior of a physical link time. SCADE Lustre front-end (remember we concentrate on one axis only, say roll). Each channel delivers one value, and there is a vote to compute one single value out of four, depending on the current fault conditions. The behaviour is as follows: if all the channels are working, then take the Olympic average of the four values (i.e., the average of all the values, except the two extreme ones); if one channel has failed, take the median value, among the other three; if two channels have failed, take the average of the two remaining ones. Three or more channels having failed at the same time is supposed to have a very low probability, but the case is handled by the system emitting a so-called “safe value”, which is to be maintained for some delay, even when the bad situation disappears. Of course, the channels are intended to run on different processors, for the purpose of redundancy. For simplicity, we shall ignore these distributed aspects in our modelling. Fig. 1 : The Lustre programming environment The Lustre programming environment C. The voting principle The internal structure of the system is as follows: it made of four channels, each of them being in charge of C.is The voting principle the two wires that come from one of the four gyroscopes The internal structure of the system is as follows: it is made of2 four channels, each of them being in charge of the two wires Actually, the behaviour of pitch is slightly different. that come from one of the four gyroscopes (remember we concentrate on one axis only, say roll). Each channel delivers one and there is a vote to| compute one 4, single CSIvalue, Journal of Computing Vol. 1 • No. 2012value out of four, depending on the current fault conditions. The behaviour is as follows: if all the channels are working, then take the Olympic average of the four values (i.e., the average sensor faults, are dueoftofaults the measurement devices (the Modeling andthat Detection sensors) themselves being broken working Each channel compares the valuesor itnotreceives onproperly the two for some time. wires, and is able to detect local discrepancies. This double transmission of values fromofone gyroscope to the computer Modeling and Detection faults system is there to detect transmission faults. Note that, for a the values on more the fault Each to be channel reported,compares the two values haveittoreceives differ by two wires, and is able to detect local discrepancies. This than ∆v during consecutive ∆t units of time. double transmission of values from one gyroscope to the Moreover, in order to support sensor faults, channels talk computer system is there to detect transmission faults. toNote eachthat, other exchange values, sothe that each of them forand a fault to be reported, two values have can to compare its own value to the other three. If one of the gyrodiffer by more than Dv during consecutive Dt units of time. scopes is not working, the valuethe it delivers will probably Each channel compares values it receives ondiffer the from the values given by the three other devices. Channels two wires, and is able to detect local discrepancies. This also havetransmission to exchange their failure statuses, because each one double of values from one gyroscope to the should compare its is value to the values transmission of the other channels, computer system there to detect faults. but only of for those that todobenot declarethe themselves failed Note that, a fault reported, two values have(a to differdeclares by moreitself than failed v during consecutive of time. channel when it detectst units a transmission Moreover, in order to support sensor faults, channels talk fault). to each other and exchange values, so that each of them can compare its own value to the other three. If one of theLatching gyroscopes not resetting working, the value it delivers will E. faultsis and probably differ from the values given by the three other Some faults are considered serious than others, and devices. Channels also havemore to exchange their failure should therefore be latched: even if the cause of the fault statuses, because each one should compare its value to the disappears, theother channel continues to declare values of the channels, but only of those itself that dofailed. not Hence, channelfailed has an internal state, thatitself reflects the declareeach themselves (a channel declares failed whenitithas detects a transmission fault). faults encountered. Typically, transmission faults (discrepancies between the two values received on the two wires) are not latched, because they are considered to be physically temporary. Conversely, faults that are due to cross-channel comparisons are latched: one of the physical devices is supposed to be off, and it is unlikely to repair during the flight. Of course, there should be some way of resetting the latches. 7 : 73 F. Maraninchi, et. al. E. Latching faults and resetting IV.Describing the Architecture in Lustre Some faults are considered more serious than others, and should therefore be latched: even if the cause of the fault disappears, the channel continues to declare itself failed. Hence, each channel has an internal state, that reflects the faults it has encountered. Typically, transmission faults (discrepancies between the two values received on the two wires) are not latched, because they are considered to be physically temporary. Conversely, faults that are due to cross-channel comparisons are latched: one of the physical devices is supposed to be off, and it is unlikely to repair during the flight. Of course, there should be some way of resetting the latches. This is done in our system thanks to two kinds of resets, called OnGroundReset and InAirReset. OnGroundReset can be thought of as a general resetting mechanism that happens only when it is really safe to do so, namely on ground. InAirReset is more interesting. First, it is not automatic, but results from a pilot action. Hence the pilot should be given some information about the internal state of the fault-tolerant controller, in order to decide whether he/she should issue a reset. Moreover, it is to be taken into account only under some conditions. The internal state of a channel, regarding the latched faults, is one of the following: Everything is working properly, or there are transmission faults from time to time, but they are not latched There has been at least one serious fault in the past, and the failure is latched. The internal state can be driven to the normal one with any of the resets (on ground or in air), but for the InAirReset to work, some conditions on the measured values have to hold. There have been several serious faults in the past, or one serious fault followed immediately by a transmission one, then the fault is latched and reset-inhibited. In this case, only the OnGroundReset may restore the normal state. The system should never enter a global state in which more than two channels are reset-inhibited, since it cannot be repaired on board. A careful reading of the informal documentation gives all the details about the possible transitions among these three states. Writing down these transitions allowed us to make precise the priorities, and to remove some ambiguities. We give the automaton point of view on this part of the system, in section VI-C below. The architecture of the Lustre program is exactly the same as the one described in the informal specification – thanks to the dataflow style of the language. Figure 4 shows the main structure of the system (for one axis): The four channels will be implemented as four identical nodes. The voter is another node. An additional node, the global allocator, will deal with the problem of third faults. This direct translation of the specification into the program architecture highlights the advantages of some features of the language: the notion of concurrency corresponds to the logical concurrency of the specification; moreover, this concurrency can model a physical concurrency, as it is the case, here, if the four channels are to be distributed; the communication — here, it will be simply a delay — is an abstract model of the real communication between distributed processes (see [Cas01], [HB02], [JHR+07] for a more accurate modeling of physical concurrency); the connections between nodes clearly reflect the specification, thanks to the data-flow communication between nodes; the four channels will be essentially instantiations of a single node – thanks to the functional nature of the language. F. Special case for third faults Finally, the faults are not treated the same depending on their order of occurrence: the first and second one (among four channels) are treated the same, but it is said in the informal documentation that a third failure should not cause the channels to become reset-inhibited. Hence, we need to treat the third faults in a special way – which will be clear in the sequel. A. Global inputs and outputs The system receives: 4 pairs (or 2 4-tuples, Roll_a and Roll_b) of flows of type “real”, coming from the sensors; 2 Boolean flows, On_Ground_Reset and In_Air_Reset. It computes a single safe value for Roll. B. The component interfaces The channels Each channel receives a pair (roll_a, roll_b) from outside. The reset signals OnGroundReset and InAirReset (implemented as Boolean flows) also come from outside. It receives also, from each other channel, the computed value foreign_roll and the failure status foreign failure roll. Finally, it receives a Boolean flow Inhib_Roll_Allowed from the global allocator. The channel computes three outputs: its local value local_roll for roll, its transmission failure status transmit_ failure_roll (which will be broadcast both to other channels and to the voter), and a request for “reset inhibition” arbitration, Ask_inhib_roll to the global allocator. The exchange of values foreign_roll and foreign_failure_ roll among channels at once raises a problem, since local roll and transmit_failure_roll will be computed as combinational functions of the inputs foreign roll and foreign failure roll. As a consequence, delays should be introduced somewhere, CSI Journal of Computing | Vol. 1 • No. 4, 2012 7 : 74 Specification and Validation of Embedded Systems: A Case Study of a Fault-Tolerant Data Acquisition System with Lustre Programming environment 6 On Ground Reset Roll a In Air Reset Roll b local roll Channel 0 transmit failure Global ask Channel 1 Allocator allowed values Calculate Channel 3 Channel 2 (the voter) failures Roll Fig. 4. : Architecture of the Lustre program: connecting the four channels together ArchitectureFig. of the4 Lustre program: connecting the four channels together to prevent local_roll and transmit_failure_roll to depend instantaneously on themselvesFailDetect (remember thattalks combinational transmission discrepancies; to the three loops are forbidden in Lustre, and rejected by all tools). So,of wethe other channels and knows about the internal fail status delay broadcasting the value (using a “pre” operator) when channel. It also talks to the global allocator, which knows entering channel. Forofinstance, we call and foreign_rolli about thethe failure statuses the four ifchannels, is ablej to the value entering the channel j coming from channel i, and prevent a third failure from becoming reset-inhibited. local_rolli the value computed by channel i, we will have foreign_rollij = pre(local_rolli) The voter V. T HE VOTER The voter isreceives only a consumer values by other It simply the valuesoflocal rollproduced and transmit_ modules. Thus, it can be designed in isolation. Figure failure_roll from the channels, and computes the safe value 7 describes the Lustre for the complete Roll. However, it is notcode combinational (see §V).voter. The details of its parts are described below. Notice that the voter is not a The global allocator combinational node: it has to memorize a fragment of its past. components, with suitable connections. It is described3 in Fig. 5.B. Counting faults nodes noneof, D. The One channel oneoffour, twooffour, threeoffour are intended to count the faults, i.e., We can go one step further in the description of the the number of Boolean variables have value among architecture, by giving the internalthat structure of onetrue, channel f1, f2, f3, f4. They can be programmed with integers, (see Fig. 6). It is made of two parts: Monitor detects the of course, like in Fig. reftwooffour.a. is also transmission discrepancies; FailDetectHowever, talks to there the three a form that does not make use of numerical variables, other channels and knows about the internal fail status ofand thatchannel. can beItnecessary forthe decidability reasons when trying the also talks to global allocator, which knows to perform formal verification with Lesar. Hence, we about the failure status of all the four channels, and is able towill sometimes usefailure the node ofbecoming Fig. 8.b. reset-inhibited. prevent a third from Similar nodes for noneof, oneoffour, and V. The Voter threeoffour are easy to write. The voter is only a consumer of values produced by other modules. Thus, it can be designed in isolation. Fig. 7 describes the Lustre code foritself the complete voter. The details C. The voting mechanism of its parts are described below. Notice that the voter is not a Then comes theitvoting itself. Thea conditional expression combinational node: has to memorize fragment of its past. It is in charge of dealing with “reset inhibition”. It needs the outputs local_roll, transmit failure_roll and Ask_inhib_roll A. The timing aspects from the channels, and returns four Boolean allowed (one for each channel). Here again, to avoid combinational loops, the if mimics the informal documentation. The auxiliary nodes Remember that the “safe value” must me maintained The timing aspects input to each channel be the delayed three Inhib_Roll_Allowed of more channels have been faultywill “recently”. The first A. OlympicAverage, Median, and Average are straightforward. Remember that the “safe value” must be maintained if version of the corresponding output allowed of the global three lines define a counter cpt_roll, which is non zero three or more channels have been faulty “recently”. The first allocator. exactly when there was a case with three failures, in the recent three lines define a counter cpt_roll, which is non zero past. “recent” means within the last SAFE_COUNTER_TIME D. Validation C. Connecting the four channels together exactly when there was a case with three failures, in the recent units of time, where SAFE_COUNTER_TIME is a constant. It “recent” is difficult to within expresstheproperties of the voter, apart According to the general architecture described above, past. means last SAFE_COUNTER_TIME The writing in Lustre is quite simple: just restart the counter from the whole specification (which would be a rephraswe are able to write the main node, which invokes the main units of time, where SAFE_COUNTER_TIME is a constant. The with value SAFE_COUNTER_TIME each time there are three faults, and then decrement it at each step, until it reaches zero : 3 ing of the program). For this simple device, we can just Because of the very regular structure of this node, it would be more concisely and elegantly described by means arrays. try a simulation using Luciole. Fig. of9 Lustre-V4 shows such a simHowever, since these arrays differ significantly from those of Scade, we decidedwe not choose to make use of them. ulation: constant inputs for the values x1, x2, cpt_roll = 0 -> if three_roll then SAFE_COUNTER_TIME else if pre (cpt_roll)> 0 CSI Journal of Computing | Vol. 1 • No. 4, 2012 then pre(cpt_roll)-1 else 0 ; x3, x4 coming from the channels, and just play with the occurrences of transmission faults f1, f2, f3, f4, observing that the correct output is computed in each case. The simulation is done with SAFE COUNTER TIME = 3 and FAIL SAFE ROLL VALUE = 0. A. The timing aspects Remember that the “safe value” must me maintained if three of more channels have been faulty “recently”. The first three lines define a counter cpt_roll, which is non zero exactly when thereet. was a case with three failures, in the recent F. Maraninchi, al. past. “recent” means within the last SAFE_COUNTER_TIME units of time, where SAFE_COUNTER_TIME is a constant. writing in Lustre is quite simple: just restart the counter with The writing in Lustre is quite simple: just restart the counter value SAFE_COUNTER_TIME each time there are three faults, with value SAFE_COUNTER_TIME each time there are three and then it at each step,step, untiluntil it reaches zero zero : faults, and decrement then decrement it at each it reaches : cpt_roll = 0 -> if three_roll then SAFE_COUNTER_TIME else if pre (cpt_roll)> 0 then pre(cpt_roll)-1 else 0 ; Then comes the voting itself. The conditional expression mimics the informal documentation. The auxiliary nodes OlympicAverage, Median, and Average are straightforward. D. Validation 7 : 75 It in is Fig. difficult to express properties the avoter, apart like reftwooffour.a. However, thereofis also form that from the make wholeuse specification (which would a rephrasdoes not of numerical variables, andbethat can be ing of the for program). For reasons this simple can just necessary decidability whendevice, trying we to perform try a simulation Luciole. Fig. we 9 shows such a use simformal verificationusing with Lesar. Hence, will sometimes ulation: choose the nodewe of Fig. 8.b. constant inputs for the values x1, x2, x3, x4 comingnodes from the just play withand the Similar for channels, noneof,and oneoffour, occurrences of are transmission faults f1, f2, f3, f4, observthreeoffour easy to write. ing that the correct output is computed in each case. The C. The voting mechanism itself simulation is done with SAFE COUNTER TIME = 3 and comes the voting itself. FAIL Then SAFE ROLL VALUE = 0.The conditional expression B. Counting faults mimics the informal documentation. The auxiliary nodes OlympicAverage, Median, and Average are straightforward. The nodes noneof, oneoffour, twooffour, threeoffour are intended to count the faults, i.e., the number of Boolean variables that have value true, among f1, f2, f3, f4. They can be programmed with integers, of course, D.Validation It is difficult to express properties of the voter, apart from the whole specification (which would be a rephrasing of the 7 program). For this simple device, we can do a simulation using node GYRO ( Roll_a_1, Roll_b_1, Roll_a_2, Roll_b_2, Roll_a_3, Roll_b_3, Roll_a_4, Roll_b_4 : real; On_Ground_Reset, In_Air_Reset : bool) returns (Roll : real); var local_roll_1, local_roll_2, local_roll_3, local_roll_4 : real; transmit_failure_1, transmit_failure_2, transmit_failure_3, transmit_failure_4 : bool; ask_1, ask_2, ask_3, ask_4 : bool; allowed_1, allowed_2, allowed_3, allowed_4 : bool; let (local_roll_1, transmit_failure_1, ask_1) = Channel(Roll_a_1, Roll_b_1, On_Ground_Reset, In_Air_Reset, pre(local_roll_2), pre(transmit_failure_2), pre(local_roll_3), pre(transmit_failure_3), pre(local_roll_4), pre(transmit_failure_4), allowed_1); (local_roll_2, transmit_failure_2, ask_2) = Channel(Roll_a_2, Roll_b_2, On_Ground_Reset, In_Air_Reset, pre(local_roll_1), pre(transmit_failure_1), pre(local_roll_3), pre(transmit_failure_3), pre(local_roll_4), pre(transmit_failure_4), allowed_2); (local_roll_3, transmit_failure_3, ask_3) = Channel(Roll_a_3, Roll_b_3, On_Ground_Reset, In_Air_Reset, pre(local_roll_1), pre(transmit_failure_1), pre(local_roll_2), pre(transmit_failure_2), pre(local_roll_4), pre(transmit_failure_4), allowed_3); (local_roll_4, transmit_failure_4, ask_4) = Channel(Roll_a_4, Roll_b_4, On_Ground_Reset, In_Air_Reset, pre(local_roll_1), pre(transmit_failure_1), pre(local_roll_2), pre(transmit_failure_2), pre(local_roll_3), pre(transmit_failure_3), allowed_4); (allowed_1, allowed_2, allowed_3, allowed_4) = Allocator(ask_1, ask_2, ask_3, ask_4); Roll = Voter(local_roll_1, local_roll_2, local_roll_3, local_roll_4, transmit_failure_1, transmit_failure_2, transmit_failure_3, transmit_failure_4); tel Fig. 5. Fig. 5 : The main node The main node roll1 OnGroundReset CSI Journal of Computing | Vol. 1 • No. 4, 2012 roll2 Roll = Voter(local_roll_1, local_roll_2, local_roll_3, local_roll_4, transmit_failure_1, transmit_failure_2, transmit_failure_3, transmit_failure_4); 7Fig. : 765. tel Specification and Validation of Embedded Systems: A Case Study of a Fault-Tolerant Data Acquisition System with Lustre Programming environment The main node roll1 OnGroundReset InAirReset roll2 Local roll MONITOR transmit failure Foreign roll failure roll FAIL DETECT Foreign failure roll Ask inhib roll Inhib roll Allowed Fig. 6. Architecture of the Lustre program: a channel Fig. 6 : Architecture of the Lustre program: a channel 8 node Voter ( x1, x2, x3, x4 : real ; -- four values given by the channels f1, f2, f3, f4 : bool ; -- failure statuses seen by the four channels ) returns (x : real) var zero_roll, one_roll, two_roll, three_roll : bool ; -- numbers of failures cpt_roll : int ; -- a counter let cpt_roll = 0 -> if three_roll then SAFE_COUNTER_TIME else if pre (cpt_roll)>0 then pre(cpt_roll) - 1 else 0 ; zero_roll = noneof (f1, f2, f3, f4) ; one_roll = oneoffour (f1, f2, f3, f4) ; two_roll = twooffour (f1, f2, f3, f4) ; three_roll = threeoffour (f1, f2, f3, f4) ; x = if (zero_roll and cpt_roll = 0 ) then OlympicAverage (x1, x2, x3, x4) else if (one_roll and cpt_roll = 0 ) then Median (x1, x2, x3, x4, f1, f2, f3, f4 ) else if (two_roll and cpt_roll = 0 ) then Average (x1, x2, x3, x4, f1, f2, f3, f4 ) else FAIL_SAFE_ROLL_VALUE ; tel ; Fig. 7. The voter in Lustre Fig. 7 : The voter in Lustre node twooffour (f1, f2, f3, f4 : bool) returns (r : bool) let r = ((if f1 then 1 else 0) + (if f2 then 1 else 0) + (if f3 then 1 else 0) + (if f4 then 1 else 0)) = 2 ; tel CSI Journal of Computing | Vol. 1 •counter No. 4, 2012 (a) A version with node twooffour (f1, f2, f3, f4 : bool) returns (r : bool) 1 2 3 4 5 6 7 8 9 10 11 x1 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 x2 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 x3 4.00 4.00 4.00 4.00 4.00 4.00 4.00 4.00 4.00 4.00 4.00 x4 6.00 6.00 6.00 6.00 6.00 6.00 6.00 6.00 6.00 6.00 6.00 f1 tel ; Fig. 7. tel ; The Fig. voter7.in Lustre The voter in Lustre 7 : 77 F. Maraninchi, et. al. node twooffour node twooffour (f1, f2, (f1, f3, f4 f2,: f3, bool) f4 : bool) returns (r : bool) returns (r : bool) let let r = ((if rf1 = then ((if 1 f1else then0) 1 + else 0) + (if f2 then (if 1 f2else then0) 1 + else 0) + (if f3 then (if 1 f3else then0) 1 + else 0) + (if f4 then = 2 ; (if 1 f4else then0)) 1 else 0)) = 2 ; tel tel 1 (a) A version (a) with A version counter with counter 3 1 4 2 5 3 6 4 7 5 8 6 9 7 10 8 11 9 2 10 11 x1 1.00 1.00 1.001.00 1.001.00 1.001.00 1.001.00 1.001.00 1.001.00 1.001.00 1.001.00 1.001.00 1.00 1.00 x1 x2 2.00 2.00 2.002.00 2.002.00 2.002.00 2.002.00 2.002.00 2.002.00 2.002.00 2.002.00 2.002.00 2.00 2.00 x3 4.00 4.00 4.004.00 4.004.00 4.004.00 4.004.00 4.004.00 4.004.00 4.004.00 4.004.00 4.004.00 4.00 4.00 x4 6.00 6.00 6.006.00 6.006.00 6.006.00 6.006.00 6.006.00 6.006.00 6.006.00 6.006.00 6.006.00 6.00 6.00 x2 x3 x4 node twooffour node twooffour (f1, f2, (f1, f3, f4 f2,: f3, bool) f4 : bool) f1 f1 returns (r returns : bool) (r : bool) let let f2 f2 r = f1 and r = f1 and f3 f3 (f2 and not (f2 f3 andand notnot f3 f4 and or not f4 or f4 f4 f3 and not f3 f2 andand notnot f2 f4 and or not f4 or f4 and not f4 f2 andand notnot f2 f3) and or not f3) or 3.00 3.00 3.00 3.00 3.003.00 3.00 3.00 f2 and f2 and (f1 and not (f1 f3 andand notnot f3 f4 and or not f4 or 2.00 2.00 2.00 2.00 x x f3 and not f3 f1 andand notnot f1 f4 and or not f4 or 1.50 1.50 1.50 1.50 f4 and not f4 f1 andand notnot f1 f3) and or not f3) or f3 and f3 and 0.00 0.00 0.000.00 0.00 0.00 (f2 and not (f2 f1 andand notnot f1 f4 and or not f4 or f1 and not f1 f2 andand notnot f2 f4 and or not f4 or 3 1 4 2 5 3 6 4 7 5 8 6 9 7 10 8 11 9 10 11 1 2 f4 and not f4 f2 andand notnot f2 f1) and or not f1) or f4 and f4 and Fig. 9 : A Luciole simulation of the voter — Initially, (f2 and not (f2 f3 andand notnot f3 f1 and or not f1 or A Luciole Fig. simulation A Luciole ofsimulation the — of Initially, the — Initially, is no fault, theresois nooffault, so f3 and not f3 f2 andand notnot f2 f1 and or not f1 or Fig. 9. there is9.no fault, so thevoter output isvoter thethere olympic average the output is the output olympicis average the olympic of theaverage inputs of (i.e., thethe inputs average (i.e.,ofthe average of f1 and not f1 f2 andand notnot f2 f3) and ; not f3) ; the inputs (i.e., the average of 2 and 4). At step 2, a fault 2 and 4). At step 2 and 2, 4). a fault At step occurs 2, aon fault channel occurs 4, on so channel the outputs 4, so is the the outputs is the tel tel occurs onx1, channel 4, so the the median of x1, median of median x2, x3, of x1, which x2, is x3, 2. outputs At which step is 3, is 2. channel At step 3 3, becomes channel 3 becomes faulty, sowhich the faulty, result is sothe theaverage result 3, isofchannel the x1average and x2. ofbecomes At x1step and4,x2. afaulty, third At step so 4, a third x2, x3, is 2. At step 3 (b) A purely (b)Boolean A purelyversion Boolean version faultresult occurs, is fault so the theoccurs, output sotakes theof the output takesvalue the At “safe” 0 for 3value units for 3 units of the average x1“safe” and x2. step 4, a0ofthird time. time. Fig. 8. The Fig. node8.twooffour The node twooffour fault occurs, so the output takes the “safe” value 0 for 3 Fig. 8 : The node twooffour units of time. Luciole. Fig. 9 shows such a simulation: we choose constant inputs for the values x1, x2, x3, x4 coming from the channels, and just play with the occurrences of transmission faults f1, f2, f3, f4, observing that the correct output is computed in each case. The simulation is done with SAFE_COUNTER_TIME = 3 and FAIL_SAFE_ROLL_VALUE = 0. VI.Incremental implementation of functionality Once we have designed the global architecture of the program, and taken decisions about where to place Lustre pre’s, we can develop the whole program progressively. We can start by giving each node some trivial behavior (like output = constant), just to see whether the architecture indeed compiles. It allows typing and self-dependence problems to be detected. Then we can start writing more and more appropriate code for each of the nodes. We used four steps, each of which with simulation : We start (§VI-A) with the detection of transmission failures only (the channels do not talk to each other). This can be observed on one channel only, first, and then we can put together all the four channels. Then (§VI-B), we add the detection of faults that are due to cross-channels comparisons, but without latching them. Then we implement the latching of faults and the effect of resets (§VI-C). Finally, we implement the global allocator (§VI-D) that allows a special behavior to be given to third fault. A. Local detection of transmission faults only Lustre code In each channel, the node Monitor determines if the two values received by the channel differ too much for too long a time. It also transmits some combination of the two values received as its “local” value. Nothing is said about the combination function in the informal documentation.We have chosen to output the first value. Any other choice could be implemented in a very simple way. CSI Journal of Computing | Vol. 1 • No. 4, 2012 NE BY ONE ure of the ustre pre’s, y. We can e output ure indeed roblems to priate code which with ansmission ach other). t, and then hat are due hing them. d the effect VI-D) that fault. if the two for a too of the two d about the . We chose mplemented es var In each channel, the node Monitor determines if the two let diff1, diff2, diff3 : bool ; diff1 = abs (xi - pxother1) Lustre values code received by the channel differ too much for a too -- comparisons of xi with the three > CROSS_CH_TOL_ROLL ; -foreign long time. It also transmits some combination of the two diff2 = absvalues (xi - pxother2) let values as its value. Nothing is Specification saidifabout the and In each channel, the“local” node Monitor determines the two > CROSS_CH_TOL_ROLL received Validation Embedded Systems: A Case Study of a ; diff1 ==of abs -- pxother1) abs (xi (xi pxother3) combination function in the informal documentation. We chose values received by the channel differ too much for a too 7 : 78 Fault-Tolerant Data Acquisitiondiff3 System with Lustre Programming environment >> CROSS_CH_TOL_ROLL CROSS_CH_TOL_ROLL ;; to output Any other can be implemented long time.the It first alsovalue. transmits somechoice combination of the two diff2 = abs (xi pxother2) fault = in a very simpleasway. values received its “local” value. Nothing is said about the > CROSS_CH_TOL_ROLL ; maintain(TIME_CROSS_ROLL, diff3 =ifabs (xi - pxother3) combination function in the informal documentation. We chose pfother1 then Monitor ( > CROSS_CH_TOL_ROLL tonode output the first value. Any other choice can be implemented -- don’t take this one into account; xa, xb : real ; -- two input values fault = if pfother2 then -- the same in) a very simple way. maintain(TIME_CROSS_ROLL, returns ( node local_value Monitor ( : real ; xa, xb :value real seen ; --bytwo input values -- the this channel ) failure : bool ; returns ( -- detection of a transmission fault ) local_value : real ; let -- the value seen by this channel failure ; failure := bool maintain(TIME_ROLL, abs(xa - xb) -- detection of a transmission fault > DELTA_ROLL); ) local_value = xa ; let tel failure = maintain(TIME_ROLL, abs(xa - xb) > DELTA_ROLL); TIME_ROLL and= xa DELTA_ROLL twoconstants; constants; local_value ; TIME_ROLL and DELTA_ROLL areare two tel maintain is the Lustre presented in §II-B. maintain is the Lustre nodenode presented in §II-B. if pfother3 if pfother1 then else diff3 then false -don’t take thisthen one diff2 into account else if pfother3 if pfother2 then -the else (diff2 andsame diff3) elseififpfother3 pfother2 then then false else if pfother3 thendiff3 diff1 else if pfother3 then and diff2 else (diff1 diff3) else (diff2 else if pfother3 then and diff3) else if(diff1 pfother2 and then diff2) if pfother3 then and diff1 else (diff1 and diff2 diff3)) ; else (diff1 and diff3) tel else if pfother3 then (diff1 and diff2) In Section VII-A, this nodeand willdiff2 be formally checked;for else (diff1 and diff3)) equivalence with another version. tel Notice that the informal specification is unclear: we deIn Section VII-A, this node be formally“diff1 checked and for cided to report a failure whenwill the conjunction Simulations Simulations TIME_ROLL and DELTA_ROLL are two constants; 9 equivalence with another version. diff2 and diff3” holds for the given delay. We could In Section VII-A, this node will be formally checked for maintain the Lustre node presented in §II-B. Fig. 10 is shows a simulation of the node Monitor, with have chosen another solution, first detecting if each we foreign Notice that theanother informal specification is unclear: deequivalence with version. Fig. 10 shows a simulation of the node Monitor, with constants value differs from the local one for the given delay, and then cided to report a failure when the conjunction “diff1 and Simulations constants B. Adding non-latched comparisons DELTA_ROLL = 14 cross-channel and TIME_ROLL = 3 2 3 4 5 6 8 9 10 11 12 13 14 15 16 17 18 reportingand the1 conjunction of7 these conditions. diff2 diff3” holds for the given delay. We could DELTA_ROLL = 14 and TIME_ROLL = 3 have chosen another solution, first detecting if 27.50 each foreign 27.30 of 27.80the node Fig. 10 shows a simulation of the nodecomparisons Monitor, with Whatever be the correct choice, the body B. Adding non-latched cross-channel Lustre code value differs from the local one for the given delay, and then constants Lustre code reporting the conjunction of these conditions. For cross-channel thethe local value DELTA_ROLL =comparisons, 14 and TIME_ROLL =value 3xi is 20.30 20.50 20.70 For cross-channel comparisons, local xicomis Whatever be the correct choice,20.10the body of the node pared to those among the foreign ones that are not declared 18.20 compared to those among the foreign ones that are not xa 15.20 15.20 15.20 failed (that is why we is need failneed statuses of the threeofforeign declared failed (that whythewe the fail status the channels). Intuitively, theIntuitively, local valuethe is local faultyvalue if it differs too 12.10 12.20 three foreign channels). is faulty much (i.e., too more than a more constant if it differs much (i.e., thanCROSS_CH_TOL_ROLL) a constant CROSS_CH_ 7.50 TOL_ROLL) the other (supposedly correct) values. a from all thefrom otherall(supposedly correct) values. Moreover, 7.00 6.10 6.10 6.10 Moreover, a cross-channel failure only is reported only if this cross-channel failure is reported if this situation lasts 21.20 21.20 21.20 situation some delay (TIME_CROSS_ROLL). This is for some lasts delayfor(TIME_CROSS_ROLL). This detection detection performed by a node values_nok, by the performedisby a node values_nok, called by called the component component FailDetect of the channel and which is given 15.20 14.60 FailDetect of the channel, and which is given below, and xb 12.10 below, along with a description in the sequel.: 11.50 then discussed: 9.90 8.90 8.10 7.90 7.70 7.50 9.10 9.10 5.90 5.70 node values_nok ( 5.40 pfother1, pfother2, pfother3 : bool ; -- foreign values status: true if faulty 27.80 27.30 27.50 xi : real ; -- local value pxother1, pxother2, pxother3 : real ; -- foreign values 20.70 ) 20.10 20.30 20.50 returns ( 18.20 local value 15.20 15.20 15.20 fault : bool -- there is a cross channel fault 12.10 12.20 ) var 7.50 7.00 diff1, diff2, diff3 : bool ; 6.10 6.10 6.10 -- comparisons of xi with the three inline monitor failed -- foreign values let 9 3 5 6 8 10 11 12 13 14 15 16 17 18 7 1 2 4 diff1 = abs (xi - pxother1) Fig. 10 : A simulation of the Monitor > CROSS_CH_TOL_ROLL ; Fig. 10. A simulation of the Monitor diff2 = abs (xi - pxother2) > CROSS_CH_TOL_ROLL ; diff3 = abs (xi - pxother3) > CROSS_CH_TOL_ROLL ; FailDetect is simply: fault = maintain(TIME_CROSS_ROLL, CSI Journalifofpfother1 Computing | Vol. 1 • No. 4, 2012 then failure = transmit_failure or cross_failure; -- don’t take this one into account cross_failure = if pfother2 then -- the same values_nok(pfother1, pfother2, pfother3, if pfother3 xi, pxother1, pxother2, pxother3); debug localfa debug localfa debug localfa debug localfa debug cross fa debug cross fa debug cross fa debug cross fa Fig. 11. A 8.90 9.10 9.10 8.10 7.90 7.70 7.50 5.90 5.70 5.40 xb4 15.00 15.00 3.25 3.50 15.00 15.00 27.80 x F. Maraninchi, et. al. 20.10 20.30 20.50 16 17 18 local value 7.50 27.80 1 2 3 4 5 6 18.20 7 8 9 10 11 3.00 3.00 3.00 3.00 3.00 3.00 3.00 3.00 15.20 20.70 12 13 14 15 16 15.00 15.00 15.00 15.00 15.00 13.00 13.00 9.00 9.00 9.00 15.20 15.20 12.10 12.20 xa1 3.00 0.00 inline monitor failed 2 xa2 -2.00 -2.00 0.00 3.00 7.00 7.50 3.00 3.00 3.00 3.00 3.00 3 4 5 6 7 8 9 3.00 3.00 3.00 3.00 3.00 3.00 3.00 10 3.00 11 3.00 3 4 5 6 7 8 9 10 11 4.00 4.00 4.00 4.00 4.00 4.00 4.00 4.00 4.00 A simulation3.50 of3.50the3.50Monitor 3.50 3.50 3.50 3.50 1 xb2 Fig. 10. 3.00 0.00 1 -2.00 6.10 6.10 6.10 xb1 15.20 15.20 -2.00 2 12 13 3.00 12 4.00 3.00 14 15 3.00 16 3.00 17 3.00 27.80 13 27.30 15 16 14 27.50 4.00 4.00 4.00 18 3.00 17 4.00 18 4.00 0.00 0.00 20.10 20.30 20.50 xa3 18.20 xa 0.00 2.50 2.50 0.00 3.25 3.25 3.00 2.50 2.50 2.50 15.20 2.50 20.70 15.20 15.20 2.50 16.00 16.00 16.00 16.00 7.50 7.00 failure = transmit_failure or cross_failure; 6.10 6.10 6.10 cross_failure = 21.20 21.20 21.20 values_nok(pfother1, pfother2, pfother3, xi, pxother1, pxother2, pxother3); 5.00 3.00 xb4 15.00 15.00 15.00 1 xa1 1 0.00 xb 7.50 27.80 4.00 14.60 x 16 17 18 failure; other3, other3); ng Luciole, of Luciole while it is rresponding ei ), we first output. The for relevant duce first a 5), then a ross failure other cross ). 3.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 3.00 3.00 3.00 3.00 3.00 3.00 3.00 3.00 3.00 3.00 2 3 4 5 6 3.00 3.00 3.00 3.00 3.00 3.00 3.00 3.00 7 8 0.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 3.00 3.00 3.00 3.00 3.00 3.00 3.00 3.00 3.00 3.25 3.25 3.50 3.25 12.10 3.00 3.00 11.50 Simulations 9.90 debug localfailure1 9.10 9.10 8.90 7.90 7.70 7.50 We can run a simulation on this 8.10 program, using Luciole, debug localfailure2 5.90 5.70 the Lustre simulator. Since the current version of5.40Luciole only the values of input/output variables, while it is debug shows localfailure3 interesting to see also the internal variables corresponding 27.80 localfailure4 27.30 27.50 todebug failures (transmit failurei , cross failure i ), we first modify program so that these variables be output. The debug cross the failure1 simulation is performed with the following values for relevant debug cross failure2 20.70 constants: 20.10 20.30 20.50 18.20 DELTA_ROLL = 4.0 ; 15.20 15.20 15.20 debug cross failure4 CROSS_CH_TOL_ROLL = 10.0 ; 12.20 12.10 TIME_ROLL = TIME_CROSS_ROLL = 3 ; Fig. 11 : A simulation of transmission and cross failures 7.50 7.00 Fig. 11 shows an execution (where we introduce first a 1 Fig. 11. 9 10 11 12 13 14 15 16 15.00 15.00 15.00 15.00 15.00 12 13 14 15 16 13.00 13.00 9.00 9.00 9.00 3.00 6 3.00 7 3.00 8 3.00 9 3.00 10 3.00 11 3.00 3.00 3.00 3.00 3.00 3.00 3.00 3.00 3.00 3.00 3.00 3.00 3.00 3.00 3.00 3.00 3.00 3.00 3.00 3.00 4.00 4.00 4.00 4.00 4.00 4.00 4.00 4.00 4.00 4.00 4.00 4.00 4.00 3.50 3.50 3.50 3.50 3.50 3.50 2.50 2.50 2.50 2.50 2.50 2.50 2 3 4 5 -2.00 -2.00 -2.00 -2.00 3.00 3.00 3.00 xa2 3.00 3.00 4.00 4.00 3.50 2.50 0.00 xb2 1 0.00 2 3 0.00 xa3 -13.00 -13.00 -13.00 -13.00 -13.00 -13.00 -13.00 -13.00 0.00 xb3 12 :automaton The automaton driving Fig. Fig. 12. The driving failure latchingfailure latching -15.00 -15.00 -15.00 -15.00 -15.00 -15.00 -15.00 -15.00 15.20 debug cross failure3 local value 15 3.00 0.00 15.00 0.00 15.20 15.20 3.00 0.00 15.00 5.70 5.40 3.00 debug localfailure3 simulation is performed with the following values for relevant constants: debug localfailure4 DELTA_ROLL = 4.0 ; debug cross failure1 CROSS_CH_TOL_ROLL = 10.0 ; 10 debug cross failure2 TIME_ROLL = TIME_CROSS_ROLL = 3 ; debug Fig. cross failure3 11 shows an execution (where we introduce first a transmission debug cross failure4fault on the first channel (steps 2 to 5), then a cross failure on channel 4 (steps 3 to 7), then a cross failure on channel 3 alone (from step 9), on which an other cross failure, on 1, is combined (from step 12) failures ). Fig.channel 11. A simulation of transmission and cross xb1 -15.00 -15.00 -15.00 -15.00 -15.00 -15.00 -15.00 -15.00 9.10 9.10 3.00 7 : 79 xb3 16.00 3.00 debug localfailure2 -13.00 -13.00 -13.00 -13.00 -13.00 -13.00 -13.00 -13.00 FailDetect is simply: 12.10 12.20 xa4 3.00 debug localfailure1 10 15 3.00 15.00 4.00 0.00 27.30 27.50 3.00 0.00 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 A simulation and cross failures 6.10of 6.10transmission 6.10 transmission fault on the first channel (steps 2 to 5), then a inline monitor failed that the informal specification is unclear: we Notice cross failure on channel 4 (steps 3 to 7), then a cross failure decided to report the conjunction 3 a 4failure 5 6 8 10 11 16 17 18 7when 2 12 13 14 15 “diff1 on channel 31 alone (from step 9),9 on which an other cross and diff2 and diff3” holds for2 the given delay. We could failure, on channel 1, is combined (from step 12) ). Fig. 10. A simulation of the Monitor have chosen 1another solution, first detecting if each3foreign value differs from the local one for the given delay, and then reporting the conjunction of these conditions. Whatever be the correct choice, the body of the node FailDetect simply: Fig. 12. The automaton driving failure latching FailDetect is is simply: failure = transmit_failure or cross_failure; cross_failure = C. Latching the cross-channel faults values_nok(pfother1, pfother2, pfother3, xi, pxother1, pxother2, pxother3); Now, recall that some serious faults must be latched (i.e., sustained even when their cause has disapeared), until the occurence of some reset command. This latching is performed Simulations Simulations by FailDetect, and depends on the state of an Wethe cannode runrun a asimulation on program, using Luciole, We can simulation on this this program, using Luciole, automaton, as described by Fig. 12: the Lustre simulator. Since the current version of Luciole the Lustre simulator. Since the current version of Luciole only shows1 the values of input/output variables, while propit is • State is the normal state: everything is working only shows to thesee values ofthe input/output variables, while it is interesting also internal faults variables corresponding erly, or there are transmission from time to time, interesting to see also the internal variables corresponding to to failures (transmit failure i , cross failurei ), we first which are not latched; failures (transmit ; cross failure ), we first modify the program failure so that ithese variables be output. The i • In the state 2, theresohas been least one serious fault modify that these variables be output. Thein simulation isprogram performed with the at following values for relevant the past, which is latched, but may be reset either by constants: “OnGroundReset” or by “InAirReset”; = 4.0 • DELTA_ROLL In state 3, there have ;been several serious faults in the CROSS_CH_TOL_ROLL = 10.0 ; past, or one serious fault followed immediately by a TIME_ROLL = TIME_CROSS_ROLL = 3 ; transmission one, and the fault is latched and reset- Fig. 11 shows an execution (where we introduce first a 16.00 16.00 16.00 16.00 16.00 C. Latching the cross-channel faults xa4 5.00 3.00 3.00 3.00 3.00 3.00 3.00 3.00 3.00 3.00 0.00 It may bethe noted that some serious C. Latching cross-channel faults faults must be latched (i.e., sustained even when their cause has disappeared), Now, that some serious must beThis latched (i.e., until therecall occurence of some resetfaults command. latching sustained even when their cause has disapeared), until the is performed by the node FailDetect, and depends on the occurence of some resetascommand. This latching is performed state of an automaton, described by Fig. 12: debug localfailure1 node1 FailDetect, depends on theisstate of an by the State is the normal and state: everything working debug localfailure2 automaton, as or described bytransmission Fig. 12: properly, there are faults from time to debug localfailure3 which arenormal not latched; • time, State 1 is the state: everything is working properly, or there are transmission faultsone from time fault to time, debug In state 2, there has been at least serious in localfailure4 which arewhich not latched; the past, is latched, but may be reset either by debug cross failure1 by “InAirReset”; • “OnGroundReset” In state 2, there hasorbeen at least one serious fault in debug cross failure2 thestate past, 3,which latched, may be reset faults either in by In thereishave beenbut several serious the past, or one serious fault followed immediately by “OnGroundReset” or by “InAirReset”; debug cross failure3 and theseveral fault isserious latchedfaults and reset • aIntransmission state 3, thereone, have been in the debug cross failure4 inhibited. In this case, only the “OnGroundReset” cana past, or one serious fault followed immediately by restore the normal state. transmission one, and the fault is latched and resetThe failure output is true when in state 2 or 3, and is Fig. 11. A simulation of transmission and cross failures equal to transmit_failure when in state 1. The transitions between these states are driven by the following conditions: from state 1: if the locally computed variable involves a 2 cross-failure, it is considered more serious if its value is “in the nominal range” (since it can be later considered 1 3 as correct). So, in this case, one moves to state 3. If the erroneous value lies outside the nominal range, one moves to state 2. Fig. The automaton latching 12. from state 2: anydriving resetfailure signal restores the normal state (1); a cross-failure immediately followed by a foreign failure involves a move to state 3; a transmission failure occurring in state 2 also involves a move to state 3. C. Latching the cross-channel faults from state 3: only the “OnGroundReset” can restore the normal Now, state. recall that some serious faults must be latched (i.e., xb4 15.00 15.00 15.00 15.00 15.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 3.25 3.25 3.50 3.25 3.00 3.00 3.00 3.00 3.00 3.00 3.00 3.00 3.00 3.00 3.00 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 4.00 0.00 x 0.00 1 sustained even when their cause has disapeared), until the Lustre code occurence of some reset command. This latching is performed Programming such an automaton in Lustre tedious, by the node FailDetect, and depends on the is state of an automaton, as described by Fig. 12: • State 1 is the normal state: everything is working properly, or there are transmission faults from time to time, CSI Journal of Computing | Vol. 1 • No. 4, 2012 which are not latched; • In state 2, there has been at least one serious fault in the past, which is latched, but may be reset either by • from involves state 1: ifa the locally computed variable involves a r = false -> (pre(state) = 1 and try1to3) failure move to state 3; a transmission failure occur only when the authorization is given: cross-failure, it is2 considered more serioustoifstate its value occurring in state also involves a move 3. is • The actual moves or (pre(state) = 2 and try2to3); “in state the nominal (since it can be later considered • from 3: only range” the “OnGroundReset” can restore the from1to3 = try1to3 and a; where try1to3 and try2to3 obey the previous definitions as correct). normal state. So, in this case, one moves to state 3. If from2to3 = try2to3 and a; of from1to3 and from2to3: range, one and the erroneous value lies outside the nominal Specification Validation of Embedded Systems: A Case Study of a The global allocator is quite it receives requests try1to3 = cross_failure and simple: InNominalRange(xi); Lustre code moves to state 2. 7 : 80 Fault-Tolerant Data Acquisition System with Lustre Programming environment try2to3 = (pre(cross_failure) r from the units, together with the “OnGroundReset” signal, i • from state 2: any reset signal restores the normal state and foreign_failure) and returns authorizations. An internal counter nb is used Programming such an automaton in followed Lustre isby tedious, but (1); a cross-failure immediately a foreign or transmit_failure ; to count the number of units that are “reset-inhibited”, The actual moves occur only when the authorization isand but rather systematic. The state is encoded by an integer failure involves move to state 3;bya an transmission failure rather systematic. The astate is encoded integer variable, • The actual moves whento theprevent authorization is given:from given: variable, ranging from 1 to 3. Notice that the order in which authorizations are occur givenonly in order this counter 2 also involves move to 3. the rangingoccurring from 1 in to state 3. Notice that thea order in state which the transition conditions are tested for is relevant: for reaching 3. The “OnGroundReset” signal resets the allocator from1to3 = try1to3 and a; • from state 3: only “OnGroundReset” the from1to3 = try1to3 and a; transition conditions are the tested for is relevant:can forrestore instance, instance, it allows priority to be given to reset: from2to3 = try2to3 and a; normal state. in its initial configuration (because it causes all automata to from2to3 = try2to3 and a; it allows priority to be given to reset: 4 leave state 3) . The global allocator quite The global allocatoris is quitesimple: simple:it itreceives receivesrequests requestsr Lustre code i state = 1 -> if pre(state)=1 then Programming such an automaton in Lustre is tedious, but if pre(reset) then 1 by an integer variable, rather systematic. The state is encoded -- reset has priority ranging else from 1iftopre(from1to2) 3. Notice that thethen order2 in which the transitionelse conditions are tested for is then relevant: for instance, if pre(from1to3) 3 else 1 it allows priority to be given to reset: else if pre(state)=2 then if pre(from2to1) then 1 else state = 1if->pre(from2to3) then 3 else 2 else -- pre(state)=3 if pre(state)=1 then ififpre(from3to1) then pre(reset) then 1 1 else 3 ; -- reset has priority else if pre(from1to2) then 2 Thedefinitions definitions of transition conditions follow the The conditions follow the specifielse of if transition pre(from1to3) then 3 else 1 specification: cation: else if pre(state)=2 then cation: if pre(from2to1) then 1 else if pre(from2to3) from1to2 = cross_failure andthen 3 else 2 else -- pre(state)=3 not InNominalRange (xi) if pre(from3to1) then 1 else 3 ;; from1to3 = cross_failure and InNominalRange (xi) ; from2to3 = (pre(cross_failure) and the specifiThe definitions of transition conditions follow foreign_failure) cation: or transmit_failure ; from2to1 = inairreset or ongroundreset; from1to2 = cross_failure and from3to1 = ongroundreset; not InNominalRange (xi) ; from1to3 =whole cross_failure Fig. 1313 shows thethe code node Fig. : shows whole codefor forthe theand node FailDetect. FailDetect. InNominalRange (xi) ; from2to3 = (pre(cross_failure) and D. Taking into account the requirement on third foreign_failure) D. Taking on third fault fault into account theorrequirement transmit_failure ; thereal realsystem, system, specialor — from2to1 = inairreset ongroundreset; InInthe aa special device — called called“Global “Global from3to1 ongroundreset; Allocator” — ofof preventing more than two units Allocator” —isisinin=charge charge preventing more than two units from becoming reset-inhibited (i.e., state The from becoming reset-inhibited (i.e., from entering state3).3). The Fig. 13 shows the whole code forfrom the entering node FailDetect. node Fail-Detect must be changed as follows: whenever node Fail-Detect must be changed as follows: whenever the automaton should should move move to to state state3,3,ititsends sends arequest, request, the automaton say into account the and requirement on thirdathe fault sayD.r, Taking to the global allocator, only performs move if r, to the global allocator, and only performs the move if the the allocator sendssystem, it an authorization back,— saycalled a. The“Global node In the real a special device hasAllocator” an additional a, and sends additional — isBoolean in chargeinput of preventing morean than two units Boolean output r. The transitions(i.e., are then follows: from becoming reset-inhibited frommodified entering as state 3). The node TheFail-Detect request is the must disjunction of the which be changed as condition follows: whenever involved a move from state 1 to3,state 3, and the one the automaton should move to state it sends a request, say a move from stateperforms 2 to statethe 3. move if the r,which to theinvolved global allocator, and only r = false -> (pre(state) = 1 and try1to3) or (pre(state) = 2 and try2to3); where try1to3 and try2to3 obey the previous definitions of from1to3 and from2to3: try1to3 = cross_failure and InNominalRange(xi); try2to3 = (pre(cross_failure) and foreign_failure) or transmit_failure ; 4 from thethe units, signal, ri from units,together togetherwith withthe the “OnGroundReset” “OnGroundReset” signal, node allocator(r1,r2,r3,r4,reset: and returns authorizations. An internal nb and returns authorizations. An internal counter counterbool) nb isisused usedto returns (a1,a2,a3,a4: bool); count the number of units that are “reset-inhibited”, and to count the number of units that are “reset-inhibited”, and var nb_aut, already: int; authorizations are given in order to prevent this counter from authorizations are given in order to prevent this counter from let reaching 3. The signalresets resetsthe theallocator allocator reaching 3. The=“OnGroundReset” “OnGroundReset” signal already if (true -> reset) in configuration (because it causes causes all all automata automatatoto in its its initial initial configuration (because it then 0 else pre(nb_aut); leave state 3)44.. leavea1 state = 3) r1 and already <= 1; a2 = r2 and ((not r1 and already <= bool) 1) node allocator(r1,r2,r3,r4,reset: or (r1 and already returns (a1,a2,a3,a4: bool); = 0) ); already: int; var nb_aut, let a3 = r3 and already = if r1 (true ((not and -> notreset) r2 and already <= 1) 0 else pre(nb_aut); or (#(r1,r2) then and already = 0) a1 = r1 );and already <= 1; a2 a4 == r2 r4and and ((not 1) not r3 and ((notr1r1and andalready not r2 <= and or (r1 and already = 0) already <= 1) ); or (#(r1,r2,r3) and already = 0) a3 = r3 );and ((not r1 (true and not and already nb_aut = if -> r2 reset) then 0 <= 1) or (#(r1,r2) and already else pre(nb_aut) + = 0) ); (if a1 then 1 else 0) + a4 = r4 and (if a2 then 1 else 0) + ((not r1 and not r2 and not r3 and (if a3 then 1 else 0) + already <= 1) (ifalready a4 then 1 else 0) ; or (#(r1,r2,r3) and = 0) tel ); nb_aut = if (true -> reset) then 0 else pre(nb_aut) + Notice that there is an “instantaneous dialogue” between (if a1 then 1 else 0) + the a2same thenstep, 1 else 0) +asks units and the allocator: in the(if very the unit (if the a3 allocator then 1 replies, else 0) the allocator for an authorization, and+ the (if a4 then 1 else 0) ; unit takes the transition or not, according to this reply. tel 4 The “#” operator, in Lustre, is a n-ary Boolean operator, which returns 4 “true” if and only if atismost one of its operand is true. between the Notice that there an “instantaneous dialogue” Notice that there is an “instantaneous dialogue” between unitsunits and and the allocator: in theinvery same same step, step, the unit the the allocator: the very theasks unit the allocator for an authorization, the allocator replies, the asks the allocator for an authorization, the allocatorand replies, unit takes the transition not, according to according this reply. to this and the unit takes theortransition or not, reply. 4 The “#” operator, in Lustre, is a n-ary Boolean operator, which returns Section some properties “true”In if and only if atVII-B, most one of its operand is true.of this allocation mechanism will be formally verified. VII.Some Experiences in Formal Verification Ideally, the formal verification of such a program should consist of comparing it with a global, abstract specification. As it is often the case with real case-studies, this specification is not available. The problem even more serious, here, since the abstract specification of such a fault-tolerant system The “#” operator, in Lustre, is a n-ary Boolean operator, which returns “true” if and only if at most one of its operand is true. CSI Journal of Computing | Vol. 1 • No. 4, 2012 7 : 81 F. Maraninchi, et. al. 12 node FailDetect ( transmit_failure : bool ; xi : real ; ongroundreset, inairreset : bool ; choffi : bool ; pxother1, pxother2, pxother3 : real ; -- other values (pre) pfother1, pfother2, pfother3 : bool ; -- other failures (pre) ) returns ( failure : bool ; -- failure detected by this channel ) var cross_failure : bool ; state : int ; -- only 1, 2, 3 are relevant from1to2, from1to3, from2to3, from2to1, from3to1 : bool ; reset, foreign_failure : bool ; let -- the state ------------------------------------------------------state = 1 -> if pre(state)=1 then if pre(reset) then 1 -- reset has priority else if pre(from1to2) then 2 else if pre(from1to3) then 3 else 1 else if pre(state)=2 then if pre(from2to1) then 1 else if pre(from2to3) then 3 else 2 else -- pre(state)=3 if pre(from3to1) then 1 else 3 ; reset = ongroundreset or (inairreset and not cross_failure) ; foreign_failure = pfother1 or pfother2 or pfother3 ; -- The output -----------------------------------------------------failure = (state = 2) or (state = 3) or (state = 1 and transmit_failure) ; -- All the transitions --------------------------------------------from1to2 = cross_failure and not InNominalRange (xi) ; from1to3 = cross_failure and InNominalRange (xi) ; from2to3 = (pre(cross_failure) and foreign_failure) or transmit_failure ; from2to1 = reset ; from3to1 = ongroundreset ; -- Cross channel comparisons --------------------------------------cross_failure = values_nok (pfother1, pfother2, pfother3, xi, pxother1, pxother2, pxother3) ; tel Fig. 13. Latching Failures Fig. 13 : Latching Failures should probably involve probabilistic properties, which are A. Cross-channel fault detection stated nowhere, couldofnot handled mechanism by usual In Section VII-B, and somewhich properties thisbeallocation verification tools.verified. will be formally of the theversion node values_nok, which detects In of the node values_nok givencross-channel in Section faults. VI-B, the output fault is defined as maintain(TIME_ • When some consistency are clearly expressed CROSS_ROLL, cond), where properties cond is a Boolean expression in the requirements: for instance, we will try to prove that carefully detailing all the possible combinations of failures. at most channels can become “reset-inhibited”. One could looktwo for a more compact and symmetrical condition, say cond1, expressing that all the significant other measures A. Cross-channel fault are too different from thedetection local measure: )) diffi "i Î [1...3]; Incond1 the =version of the (Øpfother node values_nok given i or in Section VI-B, the output fault is defined as maintain(TIME_CROSS_ROLL, cond), where cond fault = maintain(TIME_CROSS_ROLL, is a Boolean expression carefully detailing all the possible pfother1 or diff1) combinations of failures. One could look for a compact and (pfother2 or more diff2) and symmetrical condition, say cond1, expressing that all and (pfother3 or diff3)); the significant other measures are too different from the local So, we can write two complete versions of the node measure: values_nok, with the same definition for the diffi and a So, we don’t know “what to verify” on the complete EXPERIENCES FORMAL VERIFICATION VII. S OME program. However, there areINtwo common cases where verification tools can beverification applied “locally”: Ideally, the formal of such a program should consist of comparing it ways with of a writing global, aabstract specification. When there are two node, one can write and the try to show that they are equivalent. If they are As itboth is often case with real case-studies, this specification a bug isThe detected, at even least, more in oneserious, of them. If they is notnot, available. problem here, since are, thisspecification increases the one can have in any of the abstract of confidence such a fault-tolerant system should them.involve We willprobabilistic illustrate this situation which on two are versions probably properties, stated of the node values_nok, detects cross-channel nowhere, and which could not bewhich handled by usual verification faults. tools. So,When properties are clearly we some don’t consistency know “what to verify” on theexpressed complete in theHowever, requirements: for instance, we will trywhere to prove program. there are two common cases verthattools at most can become “reset-inhibited”. ification cantwo be channels applied “locally”: • When there are two ways of writing a node, one can write both and try to show that they are equivalent. If they are not, a bug is detected, at least, in one of them. If they are, this increases the confidence one can have in any of them. We will illustrate this situation on two versions cond1 = ∀i ∈ [1..3], (¬pfotheri ) ⇒ diffi or CSI Journal of Computing | Vol. 1 • No. 4, 2012 fault = maintain(TIME_CROSS_ROLL, and (pfother1 or diff1) (pfother2 or diff2) 13 13 and (pfother3 or diff3)); • on the other hand, the determination of the suitable con Specification•and Validation of Embedded Systems: A Case Study and (pfother3 or diff3)); on other hand, the complex. determination of the suitable con-of a trolthe structure is quite Obviously, the automata 82 can write two complete versions Fault-Tolerant Data Acquisition System with Lustre Programming environment So,7 :we of the node trol structure is quite complex. involved in “FailDetect” shouldObviously, be taken the intoautomata account, So, we can with write the twosame complete versions thei and nodea values_nok, definition for theofdiff involved in “FailDetect” should be taken into but the tool takes a very long time to find it. account, and a values_nok, with the same definition for the diff i different definition for r. Then, we try, using our verification butlimitations the tool takes a very timecomplex. to find it.Obviously, control structure islong quite different definition r. Then, we using try, using verificationThese suggest some improvements to the tool,the different forwhatever r.for Then, we the try, our our verification tools, to definition show that, be input sequences, they automata involved in “FailDetect” should taken tools, to show that, whatever be the input sequences, they These limitations suggest some improvements tobethe tool,into which will be discussed in the conclusion. tools, to show that, whatever be the input sequences, they provide the same output sequence: account, but theintool takes a very long time to find it. provide the same output sequence: which will be discussed the conclusion. So, we decided to use Lesar for this verification. For that, provide thethe same output sequence: • with verification tool Lesar, the verification fails. These limitations suggest some to the with the verification tool Lesar, the verification fails.weSo, weto decided tothe useprogram, Lesar for verification. that, have modify so this that theimprovements propertyFor involves • with the verification tool Lesar, the verification fails. The The diagnosis returned by Lesar shows thatthat it isit due tool,towhich will be discussed in the conclusion. diagnosis returned by Lesar shows is duewe have modify the program, soitthat theuses property involves computations. Since only counters up to The diagnosis returned by Lesar shows that itonly is due to the of the tool: Lesar considers thetheonly Boolean to weakness the weakness of the tool: Lesar considers only So, we decided to use Lesarit for this verification. that, only Boolean computations. Since only uses counters For up to 4, it is quite easy: to the weakness of the tool: Lesar considers only the Boolean aspects of the program, andand abstracts away all all Boolean aspects of the program, abstracts away we have to modify the program, so that the property involves quite easy: must be changed to work only with Boolean Boolean aspects ofexpressions. the program, and abstractsit away all 4, it• is “FailDetect” the numerical expressions. As aAs consequence, produces the numerical a consequence, it produces only Boolean computations. Since it only uses counters up to • “FailDetect” must be changed to work only with Boolean the numerical expressions. As a consequence, it produces variables: instead a counter-example where the Boolean variables diff1, a counter-example where the Boolean variables diff1, 4, it is quite easy: of encoding states by integers, we use variables: instead ofand encoding states by integers, wesuch use adiff2, counter-example where the Boolean variables diff1, of Booleans,must wechanged define atonode compare diff2, diff3 (which defined numerical and and diff3 (which areare defined by by numerical pairs “FailDetect” be worktoonly with Boolean pairs of Booleans, and we define a node to compare such diff2, and diff3 (which are defined by numerical expressions) appearing in two the two versions of the node pairsvariables: for equality: expressions) appearing in the versions of the node instead of encoding states by integers, we use pairspairs for equality: expressions) appearing inIt of the node distinct values. is atwo case of false negative. have have distinct values. It is athe case of versions false negative. of Booleans, and we define a node to compare such const state1 = [false, true]; distinct values. It able, isisa able, case of some false negative. prototype the prototype to extend, to take • have the NBacNBac is to some extend, to take intointo const state1 == [false, true]; pairs for equality: state2 [true, false]; • the prototype NBac is able, to extend, to take into account numerical aspects inthe theverification. verification. It state2 == [true, false]; state3 [true, true]; const state1 = [false, true]; account numerical aspects in some It also alsofails state3 =(s1, [true, node EQState s2: true]; [bool,bool]) in proving the equivalence of the two nodes, but indicates account numerical aspects in the verification. It also state2 = [true, false]; fails in proving the equivalence of the two nodes, but node EQState (s1, s2: returns [bool,bool]) (eq: bool); that the output may may differdiffer when all two the pfother state3 = [true, true]; fails in proving equivalence of the nodes, but i indicates that thethe output when allinputs the inputs returns (eq: bool); let node EQState (s1, s2: are true. Thisoutput is an actual discrepancy: in the thisinputs case, the [bool,bool]) indicates that the may differ when all pfotheri are true. This is an actual discrepancy: in leteq =(s1[0]=s2[0]) returns and (s1[1]=s2[1]); (eq: bool); first version outputs = false (since none This fault is outputs an actual discrepancy: in of pfother this case,i are the true. first version fault = false eq =(s1[0]=s2[0]) and (s1[1]=s2[1]); tel let other values is significant, the local one is assumed to be this case, first version fault = false tel (since nonethe of other values isoutputs significant, the local one is eq =(s1[0]=s2[0]) and (s1[1]=s2[1]); good), while the second version outputs fault = true. (since none of other values is significant, the local one is Then, we change the definition of the state as follows: assumed be good), while thesecond second version outputs tel It is to probably a bug in the version, which can be Then, we change the definition of the state as follows: assumed to true. be good), while the second version outputs fault = It is probably a bug in the second fixed as follows: Then, we change the definition of the state as follows: fault = true. probably a bug in the second state= state1 -> version, which can Itbeisfixed as follows: fault = maintain(TIME_CROSS_ROLL, state= state1 -> if EQState(ps,state1) then version, which can be fixed as follows: pfother1 or diff1) fault maintain(TIME_CROSS_ROLL, =and (pfother2 or diff2) fault maintain(TIME_CROSS_ROLL, pfother1 or diff1) = and (pfother3 or diff3) pfother1 diff1) and (pfother2 or diff2) andnot(pfother1 andor pfother2 and (pfother2 or diff2) and (pfother3 or diff3) andpfother3)); and or diff3) and (pfother3 not(pfother1 and pfother2 and not(pfother1 and pfother2 and fixed pfother3)); Now, NBac shows that this version is equivalent to Now, NBac fixedpfother3)); version is equivalent to the first one.shows that this and Now, NBac the first one.shows that this fixed version is equivalent to B. first Allocation of “reset-inhibition” the one. if EQState(ps,state1) if pre(reset) then then state1 if pre(reset) then state1 then state2 else if pre(from1to2) else else if if pre(from1to2) pre(from1to3) then then state2 state3 else pre(from1to3) then state3 else if state1 else else if state1 EQState(ps,state2) then elseififpre(from2to1) EQState(ps,state2) then then state1 if pre(from2to1) then state1 else if pre(from2to3) then state3 else pre(from2to3) then state3 else if state2 else state2 else elseif pre(from3to1) then state1 if pre(from3to1) then state1 else state3 ; else state3 ; In section VI-D we designed the “Global Allocator”, B. Allocation of “reset-inhibition” B. Allocation “reset-inhibition” the role of which is to prevent more than two channels to In section VI-D we designed the “Global Allocator”, the become VI-D “reset-inhibited. Wethe can“Global try to verify the behavior • The property must be expressed only with Boolean variIn section we designed Allocator”, role of of the which is to prevent more than two channels the to property The property must be expressed only with Boolean allocator alone: we know that nb_aut, the number • The must be expressed with variables: the observer receives theonly states of Boolean the channels, role of which is to prevent more than two channels to become “reset-inhibited. We can try to verify the behavior variables: the observer receives the states of the channels, of authorizations, should stay smaller than 2. Of course, ables: the observer receives the states of the channels, and returns a single Boolean which is false when at become “reset-inhibited. We can try to verify the behavior of thebecause allocator alone: we variables, know thatLesar nb_aut, and returns a single Boolean which is false when least at least of numerical is notthe ablenumber to proveofthis and returns a single is false when at least three of them are are inBoolean state 3. which of the allocator alone: we know that nb_aut, the number of three of them in state 3. authorizations, should stay smaller than 2. Of course, because property (the range of the counter being small, the node could three of them are in state 3. authorizations, should smaller 2. course, because of numerical variables, Lesar is notthan able to Of prove this itproperty be rewritten onlystay with Boolean variables, but is neither node verif(st1, st2, st3, st4:[bool,bool]) of numerical variables, Lesar is not able to prove this property node returns verif(st1, st3, st4:[bool,bool]) (ok:st2, bool); (the range of the nor counter beingNBac small,proves the node could bevery very natural, efficient). the property (ok: bool); (the range of about the being small,but theit node couldvery be var returns three: bool; easily (in 0.5 sec.). rewritten only withcounter Boolean variables, is neither var three: inhib1,bool; inhib2, inhib3, inhib4: bool; rewritten only withambitious Boolean variables, it is neither very on natural, nor efficient). NBac verification proves thebut property very easily A more consists in proving, let inhib1, inhib2, inhib3, inhib4: bool; natural, nor efficient). NBac proves the property very easily the integrated (in about 0.5 sec.). system, that at most two units can become let -- at most 2 channels reset inhibited (in Aabout 0.5 sec.). i.e., resetinhibited, that notconsists only are the authorizations more ambitious verification in proving, on the -- ok at =most channels reset inhibited not 2three; A more ambitious verification consists in the by correctly delivered, butmost also two that they are correctly obeyed = not three; --okcounting the number of inhibited units integrated system, that at units canproving, becomeon resetthe units. --three counting the number of inhibited units integrated system, most can become reset=(inhib1 and inhib2 and inhib3) or inhibited, i.e., thatthat not atonly aretwo the units authorizations correctly three =(inhib1 and inhib2 and inhib3) or (inhib1 and inhib2 and inhib4) or inhibited, i.e., that not only are the authorizations correctly The current version of NBac is not able to perform this delivered, but also that they are correctly obeyed by the units. (inhib1 (inhib1 and and inhib2 inhib3 and and inhib4) inhib4) or or for two delivered, but also that they are correctly obeyed the units. Theverification, current version ofreasons: NBac is not able to by perform this (inhib1 (inhib2 and and inhib3 inhib3 and and inhib4) inhib4);or The current version of NBac ablenumerical to perform this on for one hand, there are istoonot many variables, verification, two reasons: and inhib3 and inhib4); inhib1 (inhib2 = EQState(st1,state3); which makes the symbolic computations very complex. verification, for two reasons: inhib1 inhib2 == EQState(st1,state3); EQState(st2,state3); • on one hand, there are too many numerical variables, An interesting remark that the variables corresponding inhib2 • on one hand, the there areistoo many numerical variables, inhib3 == EQState(st2,state3); EQState(st3,state3); which makes symbolic computations very complex. to inhib3 == EQState(st3,state3); “roll” measures, while being used to determine failures, inhib4 EQState(st4,state3); which makes the symbolic computations very complex. An interesting remark is that the variables corresponding inhib4 = EQState(st4,state3); tel have no real influence on the the variables property; corresponding the tool is not able An interesting remark is that to “roll” measures, while being used to determine failures, tel to detect this fact, and to “slice” these variables away. to “roll” measures, while used to determine have no real influence onbeing the property; the tool is failures, not able on the other hand, the determination of the suitable Lesar proves this property in 69 sec. have no real property; tool is not able to detect thisinfluence fact, andontothe “slice” thesethe variables away. Lesar proves this property in 69 sec. to detect this fact, and to “slice” these variables away. Lesar proves this property in 69 sec. CSI Journal of Computing | Vol. 1 • No. 4, 2012 14 7 : 83 several aspects: the language itself, the design methodology, andvalidated the validation tools. and with associated tools. The study throws light on The aspects: study demonstrates that case study methodology, is quite wellseveral the language itself, the design suited Lustre. tools. A similar experiment performed with Esterel and thefor validation [RS00], showed that athat data-flow language much The [SS03] study demonstrates case study is is quite more natural for most features of the system. Now, wellsuited for Lustre. A similar experiment performed with we restricted ourselves to showed using only thea kernel of Lustre which Esterel [RS00], [SS03] that data-flow language is is much more natural most features of the In system. Now, we compatible with theforindustrial tool Scade. the full academic restricted ourselves of Lustre is version of Lustre, to weusing alsoonly havethe a kernel powerful notionwhich of arrays, compatible with the industrial tool Scade. In the full academic the use of which would have significantly reduced the size version Lustre, weall also have a powerful notion of arrays, of the of description: data symmetrically processed in each the use of which would have significantly reduced the size of channel should be structured in arrays. This usefulness of the description: all data symmetrically processed in each arrays, both concerning the structure of the source program, channel should be structured in arrays. This usefulness of and the quality of the generated code, was confirmed by other arrays, both concerning the structure of the source program, case studies. The most imperative part of the system is the and the quality of the generated code, was confirmed by other automata used FailDetect reset-inhibition. case studies. Theinmost imperative for partdefining of the system is the While the description of such small automata is not difficult in node verif(ask1, ask2, ask3, ask4, reset: bool; automata used in FailDetect for defining reset-inhibition. st1, st2, st3, st4: [bool, bool]) While Lustre, would be obviously easier with anisextension based theit description of such small automata not difficult returns (ok: bool); on automata [MR03]. in Lustre, it would be obviously easier with an extension var morethantwoasks : bool; based on automata [MR03]. Concerning programming methodology, we have tried to req1, req2, req3, req4: bool; promote a progressive approach: the whole Concerning programming methodology, wearchitecture have tried is inhib1, inhib2, inhib3, inhib4: bool; to promote a progressive approach: the whole architecture designed first, and its components are progressively detailed let isin designed and isitsthat components are progressively -- if more than 2 requests, then turn. Thefirst, interest an integrated — yet still in-- at least 2 channels in state 3 detailed in turn. The interest that an integrated yet still for complete —, version of theisprogram is always — available ok = (morethantworeqs => incomplete —, version of theThis program is always available simulation and validation. approach completely differs not #(inhib1, inhib2, inhib3, inhib4)); for simulation and validation. This approach completely from the “progressive refinement” generally advocated (e.g., differs from the “progressive refinement” generally advocated in B [Abr95]), where a non-deterministic specification is inhib1 = EQState(st1,state3); (e.g., in B [Abr95]), where a non-deterministic specification inhib2 = EQState(st2,state3); progressively refined (i.e., made more deterministic) into a is progressively refined (i.e., made more deterministic) into inhib3 = EQState(st3,state3); aprogram. program.Here, Here,since since we we use use aa programming programming language, language, all inhib4 = EQState(st4,state3); our descriptions are deterministic. So the design all our descriptions are deterministic. So theproceeds design in the reverse direction: an initial, very simplified description, proceeds in the reverse direction: an initial, very simplified is req1 = if (true -> reset) then false progressively until itenriched matchesuntil the whole specification. else pre(req1) or ask1; description, is enriched progressively it matches the req2 = if (true -> reset) then false Of course, this pragmatic approach is less “formal”, butis we whole specification. Of course, this pragmatic approach else pre(req2) or ask2; less “formal”, believe thatsince it is more realistic, it believe that itbut is we more realistic, it better meetssince engineers req3 = if (true -> reset) then false better meets engineers uses. Also, because of the weaknesses uses. Also, because of the weaknesses of formal validation else pre(req3) or ask3; oftools, formal tools,state, in their present early by error in validation their present early error state, detection early req4 = if (true -> reset) then false detection by — early simulation — which allowed by simulation which is allowed by ourisapproach — our seems else pre(req4) or ask4; approach — seems to be more practical to be more practical than early proof.than early proof. morethantworeqs = Finally, it was an interesting ourour validation Finally, it was an interestingchallenge challengeforfor validation not #(req1, req2, req3, req4); tools. We showed, first, that the Luciole simulator is tools. We showed, first, that the Luciole simulator is extremely tel extremely useful, at each stage of the development, to quickly useful, at each stage of the development, to quickly check check either the whole program or a single node. Concerning either the whole program or a single node. Concerning veriwe faced the usual problems due to lack of global Lesarfinds findsthat that this property false. This due the Lesar this property is is false. This is isdue totothe fact verification, fication, we However, faced the some usual properties problems due lack of global specification. wereto available. A fact requests produce state changes at the next step. thatthat requests onlyonly produce state changes at the next step. The specification. However, some properties were available. A first remark is that proving properties directly on the source The correct property must be written: correct property must be written: first remark is thatinproving properties directly ontool thewith source program requires, most cases, a verification ok = true -> program requires, in most cases, a verification tool with some numerical capabilities. Such capabilities are provided (pre(morethantworeqs) => ok = true -> some numerical capabilities. Suchthat capabilities are provided by NBac, but the size of problems it can handle must (pre(morethantworeqs) => not #(inhib1, inhib2, inhib3, inhib4)); suggests that someit ways in which byincreased. NBac, butThe thecase sizestudy of problems can handle must not #(inhib1, inhib2, inhib3, inhib4)); be Now, Lesar finds, in 130 sec., that the property is the can help NBac: mentioned already, be user increased. The case as study suggests some there ways should in which satisfied. be a way of abstracting away some Boolean variables whose Now, Lesar finds, in 130 sec., that the property is satisfied. the user can help NBac: as mentioned already, there should dependences numericalaway values obviously influence be a way of on abstracting some Booleandon’t variables whose VIII. Conclusion the validity of the property; it was the case, in our example, dependences on numerical values obviously don’t influence C ONCLUSION We presented aVIII. real case study5 described in Lustre, of the failure detections. On the other hand, the user should the validity of the property; it was the case, in our example, 5 5 We presented a real case study described in Lustre, and For concision and confidentiality reasons, the case study was slightly simplified the paper, On but the it isother still representative the of the failureindetections. hand, the userof should validated with associated complexity of actual system.tools. The study throws light on be able to suggest an initial control structure; for instance the automata used to determine the reset-inhibition were obviously 5 For concision and confidentiality reasons, the case study was slightly simplified in the paper, but it is still representative of the complexity of actual relevant to the verification. We were not able to apply our autoCSI Journal Vol. 1because • No. 4, matic testing tooloftoComputing this example, |mainly of 2012 missing system. F. Maraninchi, et. al. An other interesting property is that the allocator allows a maximum number of channels An other interesting propertytois become that the reset-inhibited. allocator allows In words,number we want that, whenever atreset-inhibited. least two channels aother maximum of channels to become In have words, requested become since thechannels last “Onother we to want that, reset-inhibited whenever at least two GroundReset”, then at least two of them since (in fact, two, have requested to become reset-inhibited theexactly last “Onbecause of thethen previous property) are actually GroundReset”, at least two of them (in fact,reset-inhibited. exactly two, because of the previous property) are reset-inhibited. This property can be expressed by actually the following observer: it This property can be expressed by the following observer: it receives the requests from channels to become reset-inhibited, receives the requests from channels become the OnGroundReset” signal, and tothe state reset-inhibited, of the channels. the OnGroundReset” signal, and the state of signal. the channels. The requests are memorized from each reset Then the The requests are memorized from each reset signal. the output is true iff whenever there is at least two Then memorized output is true iff whenever there is at least two memorized requests, at least two channels are in state 3 (remember that, in requests, at least two channels are in state 3 (remember that, Lustre, #(x1,x2,...,xn) is a Boolean expression which in Lustre, #(x1,x2,...,xn) is a Boolean expression which is true iff at most 1 of its Boolean arguments xi is true; so, so, its is true iff at most 1 of its Boolean arguments xi is true; its negation expresses that at least two of them are true): negation expresses that at least two of them are true): 7 : 84 Specification and Validation of Embedded Systems: A Case Study of a Fault-Tolerant Data Acquisition System with Lustre Programming environment be able to suggest an initial control structure; for instance the automata used to determine the reset-inhibition were obviously relevant to the verification. We were not able to apply our automatic testing tool to this example, mainly because of missing knowledge, both about the assumptions on the environment and about the global properties to be checked on the whole system. This is a challenging area of pursuit. Springer Verlag, October 2002. [HLR93] N. Halbwachs, F. Lagnier, and P. Raymond. Synchronous observers and the verification of reactive systems. In M. Nivat, C. Rattray, T. Rus, and G. Scollo, editors, Third Int. Conf. on Algebraic Methodology and Software Technology, AMAST’93, Twente, June 1993. Workshops in Computing, Springer Verlag. [HPR97] N. Halbwachs, Y.E. Proy, and P. Roumanoff. Verification of realtime systems using linear relation analysis. Formal Methods in System Design, 11(2):157–185, August 1997. [JHR99] B. Jeannet, N. Halbwachs, and P. Raymond. Dynamic partitioning in analyses of numerical properties. In A. Cortesi and G. Fil´e, editors, Static Analysis Symposium, SAS’99, Venice (Italy), September 1999. LNCS 1694, Springer Verlag. References [Abr95] J.-R. Abrial. The B-Book. Cambridge University Press, [BB91] 1995. A. Benveniste and G. Berry. Another look at realtime programming. Special Section of the Proceedings of the IEEE, 79(9), September 1991. [BCE+03] A. Benveniste, P. Caspi, S.A. Edwards, N. Halbwachs, P. Le Guernic, and R. de Simone. The synchronous languages 12 years later. Proceedings of the IEEE, 91(1), January 2003. [BCM+90] J.R. Burch, E.M. Clarke, K.L. McMillan, D.L. Dill, and J. Hwang. Symbolic model checking: 1020 states and beyond. In Fifth IEEE Symposium on Logic in Computer Science, Philadelphia, pages 428–439, June 1990. [JHR+07] E. Jahier, N. Halbwachs, P. Raymond, X. Nicollin, and D. Lesens. Virtual execution of AADL models via a translation into synchronous programs. In EMSOFT 2007, Salzburg, Austria, 2007. [JRB04] E. Jahier, P. Raymond, and P. Baufreton. Case studies with Lurette V2. In First International Symposium on Leveraging Applications of Formal Method, ISoLa 2004, Paphos, Cyprus, October 2004. [MG00] F. Maraninchi and F. Gaucher. Step-wise + algorithmic debugging for reactive programs: Ludic, a debugger for lustre. In AADEBUG’2000 – Fourth International Workshop on Automated Debugging, Munich, aug 2000. [MR03] F. Maraninchi and Y. R´emond. Mode-automata: a new domainspecific construct for the development of safe critical systems. Science of Computer Programming, 46(3):219–254, 2003. [RHR91] [CGP99] P. Caspi, A. Girault, and D. Pilaud. Automatic distribution of reactive systems for asynchronous networks of processors. IEEE Transactions on Software Engineering, 25(3):416–427, 1999. Research report INRIA 3491. C. Ratel, N. Halbwachs, and P. Raymond. Programming and verifying critical systems by means of the synchronous dataflow programming language LUSTRE. In ACM-SIGSOFT’91 Conference on Software for Critical Systems, New Orleans, December 1991. [RS00] B. Rajan and R. K. Shyamasundar. A fault tolerant distributed system in Multiclock Esterel. In FORTE/ PSTV, pages 301–316. North Holland, October 2000. [CHPP87] P. Caspi, N. Halbwachs, D. Pilaud, and J. Plaice. LUSTRE, a declarative language for programming synchronous systems. In 14th Symposium on Principles of Programming Languages, Munich, January 1987. [RWNH98] P. Raymond, D. Weber, X. Nicollin, and N. Halbwachs. Automatic testing of reactive systems. In 19th IEEE Real-Time Systems Symposium, Madrid, Spain, December 1998. [CMSW99] P. Caspi, C. Mazuet, R. Salem, and D. Weber. Formal design of distributed control systems with Lustre. In Proc. Safecomp’99, volume 1698 of Lecture Notes in Computer Science. Springer Verlag, September 1999. [SBT96] T. R. Shiple, G. Berry, and H. Touati. Constructive analysis of cyclic circuits. In International Design and Testing Conference IDTC’96, Paris, France, 1996. [SC04] N. Scaife and P. Caspi. Integrating model-based design and preemptive scheduling in mixed time- and eventtriggered systems. In Euromicro conference on RealTime Systems (ECRTS’04), Catania, Italy, June 2004. [SS03] A.D. Shabbir and R. K. Shyamasundar. Specification of fault tolerant Gyroscopic controller in Esterel. In Internal Report of External Funded Project. TIFR, 2003. [BS91] F. Boussinot and R. de Simone. The Esterel language. Proceedings of the IEEE, 79(9):1293–1304, September 1991. [Cas01] P. Caspi. Embedded control: From asynchrony to synchrony and back. In 1st International Workshop on Embedded Software, EMSOFT2001, Lake Tahoe, October 2001. LNCS 2211. [CC77] [CPP05] [HB02] P. Cousot and R. Cousot. Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints. In 4th ACM Symposium on Principles of Programming Languages, POPL’77, Los Angeles, January 1977. Jean-Louis Colac¸o, Bruno Pagano, and Marc Pouzet. A Conservative Extension of Synchronous Data-flow with State Machines. In ACM International Conference on Embedded Software (EMSOFT’ 05), Jersey city, New Jersey, USA, September 2005. N. Halbwachs and S. Baghdadi. Synchronous modeling of asynchronous systems. In EMSOFT’02. LNCS 2491, CSI Journal of Computing | Vol. 1 • No. 4, 2012 7 : 85 F. Maraninchi, et. al. About the Authors Florence Maranichi, is a Professor, Grenoble INP / ENSIMAG (Director of International Relations and is in charge of the “Embedded Software and Systems” master curriculum) at VERIMAG laboratory (Head of the group Synchronous Languages and Reactive Systems) Dr. Nicolas Halbwachsis Directeur de Recherche at CNRS, and Director of Verimag Laboratory Dr. Pascal Raymond is Chargé de Recherche au CNRS. Dr. Catherine Parent-Vigouroux is an Associate Professor at University Joseph Fourier, Grenoble Prof. R. K. Shyamasundar, Fellow IEEE, Fellow ACM is a Senior Professor and JC Bose National Fellow at TIFR. CSI Journal of Computing | Vol. 1 • No. 4, 2012