A Case Study of a Fault-Tolerant Data Acquisition System

advertisement
tolerance itself was not addressed at all. On one hand,
The family of synchronous languages [BB91], [BCE+ 03]
it would need the use of completely different validahas been quite successful in offering formally defined lantion tools (taking into account stochastic aspects), and
guages and programming environments for safety-critical reon the other hand no quantitative requirements were
active systems.
available for the case study.
Lustre [CHPP87] has been defined and studied in the Ver– The real implementation should be distributed. In this
imag laboratory. It is a dataflow language, well-suited for the
study, we only considered the functional aspects of the
description
regulation systems. The industrial programming
F. Maraninchi,
et.ofal.
specification, and we did not address the distribution.
environment SCADE, developped by Esterel-Technologies1 ,
1
Note that an ideal approach would be to validate
is based upon Lustre. SCADE is now a de-facto standard,
first a centralized version of such a specification, and
worldwide, for the development of critical embedded sotware,
to use automatic code distribution tools preserving
especially in avionics, automotive, or energy production. On
the functional properties. Some proposals for such an
the academic side, several tools have been developed around
automatic distribution of L USTRE programs have been
Lustre at Verimag: a simulator called Luciole, a debugger
made [CGP99], [CMSW99], [SC04].
called Ludic, two verification tools — Lesar is a modelchecker, NBac is a tool based on abstract interpretation and
II. A N OVERVIEW OF L USTRE AND ITS PROGRAMMING
dedicated to the verification of numerical properties — and a
ENVIRONMENT
tool for automatic testing, Lurette.
In this paper, we illustrate the use of these tools in the A. The language
design of a component of an avionic software.
The example
F. Maraninchi,
N. Halbwachs, P. Raymond, C. Parent
In a dataflow language for reactive systems, both the inputs
, a Grenoble
is a fault-tolerant system computing the gyroscopic
data gfor
Vérima
France
and outputsUniversity,
of the system
are described by their flows of values
military aircraft.
{Florence.Maraninchi,Nicolas.Halbwachs,Pascal.Raymond,C.Parent
@imag.fr
**
along
time.
Time
is
discrete
and instants
may
be numbered by
F. Maraninchi*,
N.
Halbwachs*,
P.
Raymond*,
C.
Parent*
and
R. }K.
Shyamasundar
† The authors thank IFCPAR (Indo-French Centre for Promotion of
R.K.integers.
Shyamasundar
If x is a flow, we will note xn its value at the nth
Advanced Research) under which part of the work was done.
reaction (or
nth instant)
of the program.
of Fundamental
Research,
Mumbai,
India, 1
Verimag is a joint laboratory of Université Tata
Joseph Institute
Fourier, CNRS
and
*Verimag
Grenoble University, France,
{Florence.Maraninchi,
Nicolas.Halbwachs,
A program
consumes input flows Pascal.Raymond,
and computes output
Grenoble-INP.
shyam@tifr.res.in
1 see http://www.esterel-technologies.com/.
flows, possibly using local flows which are not visible from the
Catherine.Parent}@imag.fr
7 : 69
Specification
and Validation
of ofEmbedded
Specification
and Validation
Embedded Systems:
Systems:
A Case Study
a Fault-Tolerant
Data
A Caseof
Study
of a Fault-Tolerant
DataAcquisition
Acquisition
System
with Lustre
Programming
environment†
System with
Lustre
Programming
environment
** Tata Institute of Fundamental Research, Mumbai, India, shyam@tifr.res.in
Specification
and Validation of Embedded Systems:
a. What is in the paper: Apart from demonstrating the use
showStudy
how to of
specify
and validate an embedded
using
Lustre
programming
L system
tools, the
goal ofthe
the paper
is twofold:
AWeCase
a Fault-Tolerant
Data of–Acquisition
it intends to show how the use of an executable,
environment. The case-study considered is a fault-tolerant
system
for
the
acquisition
of gyroscopic
†
formally clean, description
language can help in unt
System
with
Lustre
Programming
environmen
data in a military aircraft. We illustrate the use of Lustre
tools
for describing,
and
derstanding
an informal
specification, insimulating,
removing
Abstract—We show how to specify and validate an embedded
system using the Lustre programming environment. The caseUSTRE
study considered is a fault-tolerant system for the acquisition of
gyroscopic data in a military aircraft. We illustrate the use of
Lustre tools for describing, simulating, and verifying the system.
Beside, we show how the formalization of the requirements
by meansF.ofMaraninchi,
an executable
allows
to Parent
be
ambiguities,ofand
in communicating,
by showing
N.language
Halbwachs,
P. ambiguities
Raymond,
C.
verifying the
system.
Beside,
we
show
how the
formalization
the
requirements
by early
means of an
removed, and how the system
be developed step by step, while
, can
simulation, with the author of the specification.
Vérima
g
Grenoble
University,
France
simulation
and validation
take
place at each step.
We
believe
executable
language
allows
ambiguities
to
be
removed,
and
how
the
system
can
be
developed
step
–
it
illustrates
a
progressive
top-down
design,
with
sim{Florence.Maraninchi,Nicolas.Halbwachs,Pascal.Raymond,C.Parent
}@imag.fr
that this example is representative of a wide class of embedded
ulation
and
validation
atbelieve
each step. that this example is
systems.
by step, while
simulation and
validation
take
place
at
each
step.
We
R.K. Shyamasundar
b. What is not in the paper: Two important features of the
case studies were not considered during this experiment:
– The considered system aims at fault-tolerance. While
verification,Executable specifications.
the experiment shows that the approach is well-suited
programming this kindSynchronous
of fault-tolerance software,
Keywords - Embedded real-time systems,Language design andfor
implementation,
languages,
I. an
I NTRODUCTION
the problem
of measuring
Abstract—We show how to specify and validate
embedded
a. What is in the paper: Apart from
demonstrating
the use and validating the fault using the Lustre
Formal
methods,
Formal
verification,
Executable
specifications.
system
programming
environment.
The
case+
tolerance
itself
was
not
addressed
at
all.
On
one
hand,
of L USTRE
tools,
The family of synchronous languages [BB91],
[BCE
03]the goal of the paper is twofold:
study considered is a fault-tolerant system for the acquisition of
it would
use of completely different valida– it intends
tolanshow how the
use ofneed
an the
executable,
has
been
quite
successful
in
offering
formally
defined
gyroscopic data in a military aircraft. We illustrate the use of
tools (taking
stochastic aspects), and
formally clean,
language
can helpinto
in account
unguages
and programming
environments
for safety-critical
re-descriptiontion
Lustre
tools for describing,
simulating,
and verifying the
system.
I.Introduction
derstanding an informal specification,
in removing
on the other hand
no quantitative requirements were
Beside, we show how the
formalization
active
systems. of the requirements
by means of an executableLustre
language
allows ambiguities
to be and studied
ambiguities,
and in communicating,
by
for showing
the caseearly
[CHPP87]
has
been defined
in the Verfamily
of synchronous
languages
[BB91], [BCE+03]
–available
–of the
it specification.
intends
to study.
show how the use of an executable,
removed,The
and how
the system
can be developed step
by step, while
simulation,
with
the author
– The
real implementation should be distributed. In this
imag
laboratory.
It
is
a
dataflow
language,
well-suited
for
the
simulation
and validation
place at each
We believeformally
formally
clean,
description
language
can help
has been
quite take
successful
instep.offering
defineda progressive top-down
– it illustrates
design,
with
simstudy, we only considered the
functional aspects
of the
descriptionofofa regulation
systems.
The industrial programming
that this example is representative
wide class of
embedded
ulation and validation
at each
step.
in
understanding
an
informal
specification,
in
languages and programming
environments
for
safety-critical
1
specification,
and
we
did
not
address
the
distribution.
systems.
environment SCADE, developped by Esterel-Technologies ,
b. What is not in the paper: Two
important
features
ofapproach
the
Note
that an
ideal
would
be in
to validate
removing
ambiguities,
and
communicating, by
reactive systems.
is basedsystems,Language
upon Lustre. SCADE
is now a de-facto standard,
Keywords:Embedded
real-time
design and
case studies were not considered
this experiment:
firstduring
a centralized
version of such a specification, and
worldwide,
for the
development
critical
embedded
showing
early
simulation, with the author of the
implementation,
languages,Formal
methods,Formal
LustreSynchronous
[CHPP87]
has
been
definedofand
studied
in sotware,
the system aims
– The considered
fault-tolerance.
While
to atuse
automatic code
distribution tools preserving
especially
in
avionics,
automotive,
or
energy
production.
On
verification,Executable specifications.
the
experiment
shows
that
the
approach
is
well-suited
specification.
Verimag laboratory.
It
is
a
dataflow
language,
well-suited
the
functional
properties.
the academic side, several tools have been developed
around this kind of fault-tolerance software,Some proposals for such an
for programming
distribution of
USTRE programs
have been design, with
–automatic
– it illustrates
a Lprogressive
top-down
for the description
of
regulation
systems.
industrial
at Verimag:
a simulator
calledThe
Luciole,
a debugger
I. Lustre
I NTRODUCTION
the problem
of measuring and validating the faultmadesimulation
[CGP99], [CMSW99],
[SC04]. at each step.
called Ludic, two
verification
tools
—
Lesar
is
a
modeland
validation
programming
environment
SCADE,
developped
by
Estereltolerance itself was not addressed at all. On one hand,
The family of synchronous languages [BB91], [BCE+ 03]
NBac is a tool based on abstract interpretation
andthe use of completely different valida1
it would
need
, ischecker,
based
upon
Lustre.
SCADE
is now
a dehasTechnologies
been quite successful
in offering
formally
defined
lanb. What
is not OF
in Lthe
paper: Two important features of the
USTRE
A N OVERVIEW
dedicated
to the for
verification
of numerical
properties
— (taking
and a intoII.
tion tools
account
stochastic aspects),
and AND ITS PROGRAMMING
guages
and
programming
environments
safety-critical
refacto standard, tool
worldwide,
for
the Lurette.
development of
critical
case studies
were
not considered during this experiment:
ENVIRONMENT
for
automatic
testing,
on
the
other
hand
no
quantitative
requirements
were
active systems.
embedded
sotware,
especially
in
avionics,
orthe case
In this
paper,
illustrate
of these
tools for
in
available
The considered system aims at fault-tolerance.
Lustre [CHPP87]
has been
defined
and we
studied
in thethe
Ver-useautomotive,
A. study.
The––language
– tools
TheThe
real
implementation should be distributed. In this
oflanguage,
a component
of an
avionic
software.
example
energy
production.
On
the
academic
side,
have
imag
laboratory.
It is a design
dataflow
well-suited
for
theseveral
While
theforexperiment
shows
In
a
dataflow
language
systems,
both thethat
inputsthe approach
study,
we
only
considered
the
functional
aspects
ofreactive
the
issystems.
a fault-tolerant
systemprogramming
computing the gyroscopic data for a
description
of regulationaround
The
industrial
been developed
Lustre
at Verimag:
a
simulator
called
and
outputs
of
the
system
are
described
by
their
flows
of
values
is
well-suited
for
programming
this
kind of faultspecification, and we did not address the distribution.
military aircraft.
environment SCADE, developped
by Esterel-Technologies1 ,
time. Time
discrete
and instants may
numbered byof measuring
Luciole, a debugger called Ludic, two verificationNote
tools
that —
an idealalong
approach
wouldisbe
to validate
tolerance
software,
the beproblem
is based upon Lustre. SCADE is now a de-facto standard,
† The authors thank IFCPAR (Indo-French Centre first
Promotion
of version
integers.
x isa specification,
a flow, we will
x its value at the nth
a centralized
of If
such
andnote
Lesar is
NBac
is a sotware,
tool
based on forabstract
worldwide,
for athemodelchecker,
development
critical
embedded
and
validating
the nfaulttolerance itself was not
Advanced of
Research)
under
which part
of the work was done.
reaction
(or nth instant)
of the program.
to
use
automatic
code
distribution
tools preserving
especially
in avionics,and
automotive,
production.
On Joseph
Verimag
isora energy
jointto
laboratory
of Université
Fourier, CNRS and
interpretation
dedicated
the
verification
of numerical
all.
On one
it would
A program
consumes
input
and hand,
computes
output need the use
the functional properties.
Some addressed
proposals
for at
such
anflows
Grenoble-INP.
the academic side, several
tools have been developed around
properties — and1asee
tool
for automatic testing, Lurette.
automatic distribution
of Lpossibly
USTRE
have
been
ofprograms
completely
different
validation
flows,
using
local
flows
which are not
visible fromtools
the (taking into
http://www.esterel-technologies.com/.
Lustre at Verimag: a simulator
called Luciole, a debugger
madein
[CGP99],
called Ludic,
twopaper,
verification
— Lesarthe
is use
a modelIn this
we tools
illustrate
of these tools
the [CMSW99], [SC04].
account stochastic aspects), and on the other hand
checker,
NBac
a tool based onofabstract
interpretation
and The example is
design
of ais component
an avionic
software.
no quantitative requirements were available for the
II. A N OVERVIEW OF L USTRE AND ITS PROGRAMMING
dedicated to the verification of numerical properties — and a
case study.
a
fault-tolerant
system
computing
the
gyroscopic
data
for
a
ENVIRONMENT
tool for automatic testing, Lurette.
military
aircraft.
In this paper,
we illustrate the use of these tools in the A. The language
–– The real implementation should be distributed.
design of a component of an avionic software. The example
In this
a. What is in the paper: Apart from demonstrating
the use
In a dataflow
language for reactive systems,
bothstudy,
the inputswe only considered the functional
is a fault-tolerant system computing the gyroscopic data for a
and
outputs of the system are described by
their flows
of
values
aspects
of
the
specification, and we did not address
of
LUSTRE
tools,
the
goal
of
the
paper
is
twofold:
military
aircraft.
Embedded Systems:
along time. Time is discrete and instants may be numbered by
nt Data Acquisition
† The authors thank IFCPAR (Indo-French Centre for Promotion of integers. If x is a flow, we will note xn its value at the nth
underthank
which part
of the work
was done.
t† TheResearch)
ing environmenAdvanced
authors
IFCPAR
(Indo-French
Centrereaction
for Promotion
of Advanced
Research) under which part of the work was done.
(or nth instant)
of the program.
Verimag is a joint laboratory of Université Joseph Fourier, CNRS and
d, C. Parent
Verimag
is
a
joint
laboratory
of
Universit´e
Joseph
Fourier,
CNRS
and
Grenoble-INP.
A
program
consumes
input
flows
and computes output
Grenoble-INP.
France
11seesee
mond,C.Parent}@imag.fr
flows, possibly using local flows which are not visible from the
http://www.esterel-technologies.com/.
http://www.esterel-technologies.com/.
Tataa Institute
of
Fundamental
Research, design
Mumbai,
Keywords:Embedded
systems,Language
andIndia,
representative
of
widereal-time
class
of
embedded
systems.
implementation, Synchronous shyam@tifr.res.in
languages,Formal methods,Formal
1
umbai, India,
the paper: Apart from demonstrating the use
E tools, the goal of the paper is twofold:
ds to show how the use of an executable,
y clean, description language can help in unding an informal specification, in removing
ties, and in communicating, by showing early
on, with the author of the specification.
ates a progressive top-down design, with simand validation at each step.
ot in the paper: Two important features of the
es were not considered during this experiment:
nsidered system aims at fault-tolerance. While
CSI Journal of Computing | Vol. 1 • No. 4, 2012
a “bounded counter”, etc.
7 : 70
B. An example Lustre program
a very simple
example ofSystems:
program, A
weCase
give aStudy
Lustreofnode
Specification and As
Validation
of Embedded
a
that willSystem
be usedwith
later Lustre
in the case
study. Calledenvironment
“maintain”,
Fault-Tolerant Data Acquisition
Programming
this operator receives an integer n and a Boolean b as input
parameters, and computes a Boolean output m, which is true
the distribution. Note that an ideal approach would
output
m, which
truemaintained
whenever bhigh
has during
been maintained
high
whenever
b has isbeen
the last n cycles.
during the last n cycles. The node uses a counter cpt, which is
be to validate first a centralized version of such a
The node uses a counter cpt, which is set to n whenever b
set to n whenever b is false, and decremented to 0 otherwise.
specification, and to use automatic code distribution
is false, and decremented to 0 otherwise. The output m is true
The output m is true when cpt is zero.
tools preserving the functional properties. Some
when
cpt is zero.
proposals for such an automatic distribution of
LUSTRE programs have been made [CGP99],
[CMSW99], [SC04].
II. An Overview of Lustre and its Programming
Environment
A. The language
In a dataflow language for reactive systems, both the
inputs and outputs of the system are described by their flows
of values along time. Time is discrete and instants may be
numbered by integers. If x is a flow, we will note xn its value
at the nth reaction (or nth instant) of the program.
A program consumes input flows and computes output
flows, possibly using local flows which are not visible from
the environment. Local and output flows are defined by
equations. An equation “x = y + z” defines the flow x from the
flows y and z in such a way that, at each instant n, xn = yn + zn.
A set of such equations, using arithmetic, Boolean, etc.
operators, describes a network of operators, and is similar
to the description of a combinational circuit. The same
constraints apply: one should not write sets of equations with
instantaneous loops, like : {x = y + z; z = x + 1; ...}. This is
a set of fixpoint equations that perhaps has solutions (see
[SBT96], for more detailed discussion), but it is not accepted
as a dataflow program. For referencing the past, the operator
pre is introduced : "n > 0; (pre(x))n = xn–1.
One typically writes T = pre(T) + i, where T is an
output, and i is an input. It means that, at each instant, the
value of the flow T is obtained by adding the current value of
the input i to the previous value of T. Initialization of flows
is provided by the –> operator. E –> F is an expression, the
value of which is the one of E at the first instant (i.e., E0), and
then the one of F forever (i.e., Fn:"n > 1). The equation X =
0 –> pre(X) + 1 defines the flow of integers; as a reactive
program, it produces values on the basic clock.
The conditional structure is a ternary combinational
operator, and is strict: the two branches are always evaluated.
One writes: X = if C then E else F, where C is a Boolean
expression and E1, E2 are two expressions of the same type,
meaning: "n > 0, Xn = if Cn then En else Fn.
The language is structured by the definition of reusable
nodes that can be called anywhere in expressions defining
variables. Programs usually input a library of small
wellidentified reactive behaviors, like a “two-states” with
reset, a “bounded counter”, etc.
B. An example Lustre program
As a very simple example of program, we give a Lustre
node that will be used later in the case study. The node named
“maintain” denotes an operator that receives an integer n
and a Boolean b as input parameters, and computes a Boolean
CSI Journal of Computing | Vol. 1 • No. 4, 2012
node maintain (n : int ; b : bool)
returns (m : bool) ;
var cpt : int ;
let
cpt = n -> if b then
if pre(cpt)>0 then
pre(cpt) - 1
else pre(cpt)
else n ;
m
= (cpt = 0) ;
tel
C. The programming environment
In addition to the industrial environment SCADE —
C. The
programming environment
which proposes a graphical interface, a simulator, and a
In
addition
the industrial
environment
SCADE —
which
compiler —, antoacademic
toolset
has been developed
around
proposes
a
graphical
interface,
a
simulator,
and
a
compiler
—,
Lustre. The following tools or prototypes are available:
ƒƒ a compiler into C.
ƒƒ a simulator, called Luciole, which allows an early
simulation of Lustre nodes. It displays an interactive
board, where inputs can be entered and outputs are
displayed. It is connected with the tool Sim2chro,
which displays the history of inputs/outputs by means
of timing diagrams.
ƒƒ two verification tools: they are both restricted to the
verification of safety properties, described by means
of synchronous observers [HLR93]: a synchronous
observer is a Lustre program, taking as inputs
the input/output variables of the program under
verification, and signaling whenever the property
considered is violated. This technique is used both for
describing required properties, and assertions about
the environment which must be assumed for these
properties to hold. The tools differ in the techniques
applied for verification:
–– Lesar [RHR91] is quite a standard symbolic
modelchecker [BCM+90]. It checks the property
by exploring (enumeratively or symbolically) a
finite state model of the program, which abstracts
away all its numerical aspects. As a consequence,
it is not able to verify properties depending on the
dynamic behavior of the numerical variables. It
should be used for control dominated properties.
–– NBac [JHR99] is able to handle simple numerical
properties, thanks to the use of “Linear Relation
Analysis” [HPR97], a special case of abstract
interpretation [CC77]. However, NBac is much
more expensive than Lesar, and should only be
applied to fairly small programs.
ƒƒ a prototype debugging tool [MG00].
•
•
ti
th
p
An
It re
(tell
and
spec
able
real
thes
the
a pr
Industri
Since t
crepanci
the lang
hierarchi
and mod
version.
also diffe
features,
III.
A. Deve
We sta
of functi
7 : 71
F. Maraninchi, et. al.
Roll
Roll
FaultFaulttolerant
tolerant
system
system
Pitch
Pitch
3 “secure”
3values
“secure”
values
Yaw
Yaw
4 x 3 Faulty
4physical
x 3 Faulty
connections
physical
connections
each of them
made of 2 wires
each
made of 2 wires
(4 x 3ofxthem
2 inputs)
(4 x 3 x 2 inputs)
4 identical physical devices
4 identical
physical
devices
(4 x 3 values,
deg/s)
(4 x 3 values, deg/s)
Fig. 2.
Fig. 2.
Description of the physical system
Description of the physical system
Fig. 2 : Description of the physical system
Roll
Roll
4 identical physical devices
4 identical
physical
devices
(4 x 1 values,
deg/s)
(4 x 1 values, deg/s)
Fig. 3.
Fig. 3.
3
3
FaultFaulttolerant
tolerant
system
system
4 x 1 Faulty
4physical
x 1 Faulty
connections
physical
connections
each of them
made of 2 wires
each
made of 2 wires
(4
x 1ofxthem
2 inputs)
(4 x 1 x 2 inputs)
1 ”secure”
1value
”secure”
value
Description of the physical system for one axis
Fig.for
3 :one
Description
of the physical system for one axis
Description of the physical system
axis
are called performance requirements. For instance, several
Industrial
vs.
academic
tools
are
called
performance
For
instance,
several
distinct
working
rates are requirements.
required for the
parts
of the system:
distinct
working
rates are
forthere
the parts
the system:
0.05s,
0.0125s,
etc.
Since
the release
of required
Scade-V6,
are of
significant
0.05s, 0.0125s,
etc. the industrial and academic versions
discrepancies
between
of the
language.
In particular,
Scade-V6
contains
a notion
We first identified
the system
interface,
i.e., the
physical
ofinputs
hierarchic
automata
[CPP05],
inspired
both
by
Esterel
We first
identified
the
system
interface,
i.e.,
the
physical
and outputs, and some additional inputs that
model
[BS91]
and
mode
automata
[MR03],
which
is
not
in
the
inputs
and
outputs,
and
some
additional
inputs
that
faults. Then we wrote a single-clock Lustre program model
mimacademic
version.
The
mechanisms
for
defining
and
handling
faults.
Then
we
wrote
a
single-clock
Lustre
program
mimicking the internal structure of the informal specification,
but
arrays
are
also
different.
In this
paper,
we willspecification,
not use thesebut
icking
the
internal
structure
of
the
informal
forgetting about the multi-rate requirements. This is already
incompatible
features,
thus conforming
to both versions.
forgetting about
the multi-rate
requirements.
This is already
an interpretation of the informal specification. For instance,
an
the informal
specification.
For instance,
we interpretation
translated the of
English
“a followed
by b immediately”
by
we
translated
the
English
“a
followed
by
b
immediately”
by
“at the next step”.
“at the next step”.
At each step of the development, we ran manual simulations,
At as
each
of the development,
we ran manual
and,
farstep
as possible,
we used verification
tools simulations,
to establish
and,
as farproperties.
as possible, we used verification tools to establish
important
important properties.
B. Physical structure
III.Informal
description of the Gyroscopic
B. Physical structure
System
Figure 2 describes the system and its physical environment.
Figure
2 describes
the system
andgyroscopes,
its physicaleach
environment.
The
system
is connected
to four
of them
A.
Development
of
the
Case
Study
The
system
is
connected
to
four
gyroscopes,
each
of them
measuring the angle variations along three axes named
roll,
measuring
the
angle
variations
along
three
axes
named
roll,
pitch
yaw. from
The values
obtained
by one of in
these
physical
Weand
started
an informal
specification
English,
pitch
and
yaw.
The
values
obtained
by
one
of
these
physical
devices,
one axis, are
transmitted to
the some
computer
system
made
of for
functional
requirements
and
timing
devices,
one
axis,
transmitted
to the requirements.
system
along twoforwires.
Hence
the
system
receives
4computer
× 3 × 2 values.
requirements,
that
areare
called
performance
along
two
wires.
Hence
the
system
receives
4
×
3
×
2
values.
From
these
24
values,
it
has
to
compute
only
three,
called
For instance, several distinct working rates are required
From
these
24the
values,
it has
to 0.0125s,
computeetc.
only three, called
for
the parts
of
system:
0.05s,
secure
values.
secure
The values.
first step in modeling the system in Lustre, is to
The first on
step
modeling
the the
system
in Lustre,
to
concentrate
oneinaxis
only, since
behaviour
on all isaxes
2
concentrate
on
one
axis
only,
since
the
behaviour
on
all
axes
are all the same . We shall be using the diagram depicted in
2
. We shall
thethe
diagram
are
all 3theassame
Figure
the reference
in be
theusing
rest of
paper. depicted in
CSI Journal
of Computing
| of
Vol.
• No. 4, 2012
Figure
3 as the reference
in the rest
the1 paper.
2 Actually,
2 Actually,
the behaviour of pitch is slightly different.
the behaviour of pitch is slightly different.
7 : 72
Specification and Validation of Embedded Systems: A Case Study of a
Fault-Tolerant Data Acquisition System with Lustre Programming environment
We first identified the system interface, i.e., the
physical inputs and outputs, and some additional
inputs that model faults. Then we wrote a single-clock
Lustre program mimicking the internal structure of the
informal specification, but forgetting about the multirate requirements. This is already an interpretation of
the informal specification. For instance, we translated the
English “a followed by b immediately” by “at the next step”.
At each step of the development, we ran manual
simulations, and, as far as possible, we used verification
tools to establish important properties.
B. Physical structure
Fig. 2 describes the system and its physical
environment. The system is connected to four gyroscopes,
each of them measuring the angle variations along three
axes named roll, pitch and yaw. The values obtained by
one of these physical devices, for one axis, are transmitted
to the computer system along two wires. Hence the system
receives 4 x 3 x 2 values. From these 24 values, it has to
compute only three, called secure values.
The first step in modeling the system in Lustre, is
to concentrate on one axis only, since the behaviour on
all axes are all the same2. We shall be using the diagram
depicted in Fig. 3 as the reference in the rest of the paper.
Luciole
simulation
Lesar
Boolean
verification
NBac
numeric
verification
Lurette
automatic
testing
Lustre
compiler
Fig. 1.
The difficult part is the detection of faults. First, we
have to define what we call faults, and then to show how
the redundant structure of the gyroscopic system can
handle them.
Definition of faults
4
between the actual measurement devices and the computer;
Lustre
program
C code
D. The faults
The system
able toorhandle
two kinds
of faults:
themselves
being isbroken
not working
properly
for link
some
faults, that are due to some bad behavior of a physical link
time.
SCADE
Lustre
front-end
(remember we concentrate on one axis only, say roll). Each
channel delivers one value, and there is a vote to compute
one single value out of four, depending on the current fault
conditions. The behaviour is as follows: if all the channels
are working, then take the Olympic average of the four
values (i.e., the average of all the values, except the two
extreme ones); if one channel has failed, take the median
value, among the other three; if two channels have failed,
take the average of the two remaining ones. Three or more
channels having failed at the same time is supposed to
have a very low probability, but the case is handled by
the system emitting a so-called “safe value”, which is to be
maintained for some delay, even when the bad situation
disappears.
Of course, the channels are intended to run on different
processors, for the purpose of redundancy. For simplicity,
we shall ignore these distributed aspects in our modelling.
Fig. 1 : The Lustre programming environment
The Lustre programming environment
C. The voting principle
The internal structure of the system is as follows: it
made
of four
channels, each of them being in charge of
C.is The
voting
principle
the two wires that come from one of the four gyroscopes
The internal structure of the system is as follows: it is made
of2 four
channels,
each of them
being
in charge
of the two wires
Actually,
the behaviour
of pitch
is slightly
different.
that come from one of the four gyroscopes (remember we
concentrate on one axis only, say roll). Each channel delivers
one
and there
is a vote to| compute
one 4,
single
CSIvalue,
Journal
of Computing
Vol. 1 • No.
2012value
out of four, depending on the current fault conditions. The
behaviour is as follows: if all the channels are working, then
take the Olympic average of the four values (i.e., the average
sensor faults,
are dueoftofaults
the measurement devices (the
Modeling
andthat
Detection
sensors)
themselves
being
broken
working
Each channel compares the valuesor itnotreceives
onproperly
the two
for
some
time.
wires, and is able to detect local discrepancies. This double
transmission
of values
fromofone
gyroscope to the computer
Modeling and
Detection
faults
system is there to detect transmission faults. Note that, for a
the values
on more
the
fault Each
to be channel
reported,compares
the two values
haveittoreceives
differ by
two
wires,
and
is
able
to
detect
local
discrepancies.
This
than ∆v during consecutive ∆t units of time.
double transmission of values from one gyroscope to the
Moreover, in order to support sensor faults, channels talk
computer system is there to detect transmission faults.
toNote
eachthat,
other
exchange
values, sothe
that
each
of them
forand
a fault
to be reported,
two
values
have can
to
compare
its
own
value
to
the
other
three.
If
one
of
the
gyrodiffer by more than Dv during consecutive Dt units of time.
scopes
is not
working,
the valuethe
it delivers
will
probably
Each
channel
compares
values it
receives
ondiffer
the
from
the
values
given
by
the
three
other
devices.
Channels
two wires, and is able to detect local discrepancies. This
also
havetransmission
to exchange their
failure
statuses,
because each
one
double
of values
from
one gyroscope
to the
should
compare
its is
value
to the
values transmission
of the other channels,
computer
system
there
to detect
faults.
but
only
of for
those
that todobenot
declarethe
themselves
failed
Note
that,
a fault
reported,
two values
have(a
to differdeclares
by moreitself
than failed
v during
consecutive
of time.
channel
when
it detectst units
a transmission
Moreover, in order to support sensor faults, channels talk
fault).
to each other and exchange values, so that each of them
can compare its own value to the other three. If one of
theLatching
gyroscopes
not resetting
working, the value it delivers will
E.
faultsis and
probably differ from the values given by the three other
Some faults
are considered
serious than
others,
and
devices.
Channels
also havemore
to exchange
their
failure
should
therefore
be
latched:
even
if
the
cause
of
the
fault
statuses, because each one should compare its value to the
disappears,
theother
channel
continues
to declare
values of the
channels,
but only
of those itself
that dofailed.
not
Hence,
channelfailed
has an
internal state,
thatitself
reflects
the
declareeach
themselves
(a channel
declares
failed
whenitithas
detects
a transmission fault).
faults
encountered.
Typically, transmission faults (discrepancies between the
two values received on the two wires) are not latched, because
they are considered to be physically temporary.
Conversely, faults that are due to cross-channel comparisons
are latched: one of the physical devices is supposed to be off,
and it is unlikely to repair during the flight.
Of course, there should be some way of resetting the latches.
7 : 73
F. Maraninchi, et. al.
E. Latching faults and resetting
IV.Describing the Architecture in Lustre
Some faults are considered more serious than others,
and should therefore be latched: even if the cause of the fault
disappears, the channel continues to declare itself failed.
Hence, each channel has an internal state, that reflects the
faults it has encountered.
Typically, transmission faults (discrepancies between
the two values received on the two wires) are not latched,
because they are considered to be physically temporary.
Conversely, faults that are due to cross-channel
comparisons are latched: one of the physical devices is
supposed to be off, and it is unlikely to repair during the flight.
Of course, there should be some way of resetting
the latches. This is done in our system thanks to two
kinds of resets, called OnGroundReset and InAirReset.
OnGroundReset can be thought of as a general resetting
mechanism that happens only when it is really safe to do so,
namely on ground. InAirReset is more interesting. First, it is
not automatic, but results from a pilot action. Hence the pilot
should be given some information about the internal state of
the fault-tolerant controller, in order to decide whether he/she
should issue a reset. Moreover, it is to be taken into account
only under some conditions.
The internal state of a channel, regarding the latched
faults, is one of the following:
ƒƒ Everything is working properly, or there are transmission
faults from time to time, but they are not latched
ƒƒ There has been at least one serious fault in the past, and
the failure is latched. The internal state can be driven to
the normal one with any of the resets (on ground or in
air), but for the InAirReset to work, some conditions on
the measured values have to hold.
ƒƒ There have been several serious faults in the past, or
one serious fault followed immediately by a transmission
one, then the fault is latched and reset-inhibited. In this
case, only the OnGroundReset may restore the normal
state. The system should never enter a global state in
which more than two channels are reset-inhibited, since
it cannot be repaired on board.
A careful reading of the informal documentation gives all
the details about the possible transitions among these three
states. Writing down these transitions allowed us to make
precise the priorities, and to remove some ambiguities. We
give the automaton point of view on this part of the system,
in section VI-C below.
The architecture of the Lustre program is exactly the
same as the one described in the informal specification –
thanks to the dataflow style of the language. Figure 4 shows
the main structure of the system (for one axis): The four
channels will be implemented as four identical nodes. The
voter is another node. An additional node, the global allocator,
will deal with the problem of third faults.
This direct translation of the specification into the
program architecture highlights the advantages of some
features of the language:
ƒƒ the notion of concurrency corresponds to the logical
concurrency of the specification;
ƒƒ moreover, this concurrency can model a physical
concurrency, as it is the case, here, if the four channels
are to be distributed; the communication — here, it
will be simply a delay — is an abstract model of the
real communication between distributed processes (see
[Cas01], [HB02], [JHR+07] for a more accurate modeling
of physical concurrency);
ƒƒ the connections between nodes clearly reflect the
specification, thanks to the data-flow communication
between nodes;
ƒƒ
the four channels will be essentially instantiations of
a single node – thanks to the functional nature of the
language.
F. Special case for third faults
Finally, the faults are not treated the same depending on
their order of occurrence: the first and second one (among four
channels) are treated the same, but it is said in the informal
documentation that a third failure should not cause the
channels to become reset-inhibited. Hence, we need to treat
the third faults in a special way – which will be clear in the
sequel.
A. Global inputs and outputs
ƒƒ
ƒƒ
The system receives:
4 pairs (or 2 4-tuples, Roll_a and Roll_b) of flows of type
“real”, coming from the sensors;
2 Boolean flows, On_Ground_Reset and In_Air_Reset.
It computes a single safe value for Roll.
B. The component interfaces
The channels
Each channel receives a pair (roll_a, roll_b) from
outside. The reset signals OnGroundReset and InAirReset
(implemented as Boolean flows) also come from outside. It
receives also, from each other channel, the computed value
foreign_roll and the failure status foreign failure roll. Finally,
it receives a Boolean flow Inhib_Roll_Allowed from the global
allocator.
The channel computes three outputs: its local value
local_roll for roll, its transmission failure status transmit_
failure_roll (which will be broadcast both to other channels and
to the voter), and a request for “reset inhibition” arbitration,
Ask_inhib_roll to the global allocator.
The exchange of values foreign_roll and foreign_failure_
roll among channels at once raises a problem, since local roll
and transmit_failure_roll will be computed as combinational
functions of the inputs foreign roll and foreign failure roll.
As a consequence, delays should be introduced somewhere,
CSI Journal of Computing | Vol. 1 • No. 4, 2012
7 : 74
Specification and Validation of Embedded Systems: A Case Study of a
Fault-Tolerant Data Acquisition System with Lustre Programming environment
6
On Ground Reset
Roll a
In Air Reset
Roll b
local roll
Channel 0
transmit failure
Global
ask
Channel 1
Allocator
allowed
values
Calculate
Channel 3
Channel 2
(the voter)
failures
Roll
Fig. 4.
: Architecture
of the Lustre
program:
connecting the four channels together
ArchitectureFig.
of the4 Lustre
program: connecting
the four channels
together
to prevent local_roll and transmit_failure_roll to depend
instantaneously
on themselvesFailDetect
(remember thattalks
combinational
transmission discrepancies;
to the three
loops
are
forbidden
in
Lustre,
and
rejected
by
all
tools).
So,of
wethe
other channels and knows about the internal fail
status
delay
broadcasting
the
value
(using
a
“pre”
operator)
when
channel. It also talks to the global allocator, which knows
entering
channel.
Forofinstance,
we call and
foreign_rolli
about thethe
failure
statuses
the four ifchannels,
is ablej to
the value entering the channel j coming from channel i, and
prevent a third failure from becoming reset-inhibited.
local_rolli the value computed by channel i, we will have
foreign_rollij = pre(local_rolli)
The voter
V. T HE VOTER
The
voter isreceives
only a consumer
values
by other
It simply
the valuesoflocal
rollproduced
and transmit_
modules.
Thus,
it
can
be
designed
in
isolation.
Figure
failure_roll from the channels, and computes the safe value 7
describes
the Lustre
for the complete
Roll.
However,
it is notcode
combinational
(see §V).voter. The details
of its parts are described below. Notice that the voter is not a
The
global allocator
combinational
node: it has to memorize a fragment of its past.
components, with suitable connections. It is described3 in Fig.
5.B. Counting faults
nodes noneof,
D. The
One channel
oneoffour, twooffour,
threeoffour
are
intended
to count the faults, i.e.,
We can go one step further in the description of the
the
number
of
Boolean
variables
have value
among
architecture, by giving the internalthat
structure
of onetrue,
channel
f1,
f2,
f3,
f4.
They
can
be
programmed
with
integers,
(see Fig. 6). It is made of two parts: Monitor detects the
of course, like
in Fig. reftwooffour.a.
is also
transmission
discrepancies;
FailDetectHowever,
talks to there
the three
a
form
that
does
not
make
use
of
numerical
variables,
other channels and knows about the internal fail status ofand
thatchannel.
can beItnecessary
forthe
decidability
reasons
when
trying
the
also talks to
global allocator,
which
knows
to
perform
formal
verification
with
Lesar.
Hence,
we
about the failure status of all the four channels, and is able towill
sometimes
usefailure
the node
ofbecoming
Fig. 8.b. reset-inhibited.
prevent
a third
from
Similar
nodes
for
noneof, oneoffour, and
V. The Voter
threeoffour are easy to write.
The voter is only a consumer of values produced by
other modules. Thus, it can be designed in isolation. Fig. 7
describes
the Lustre
code foritself
the complete voter. The details
C. The voting
mechanism
of its parts are described below. Notice that the voter is not a
Then comes
theitvoting
itself. Thea conditional
expression
combinational
node:
has to memorize
fragment of its
past.
It is in charge of dealing with “reset inhibition”. It needs
the outputs local_roll, transmit failure_roll and Ask_inhib_roll
A. The timing aspects
from the channels, and returns four Boolean allowed (one for
each
channel). Here
again,
to avoid
combinational
loops, the if mimics the informal documentation. The auxiliary nodes
Remember
that the
“safe
value”
must me maintained
The timing aspects
input
to each
channel
be the delayed
three Inhib_Roll_Allowed
of more channels have
been
faultywill
“recently”.
The first A.
OlympicAverage, Median, and Average are straightforward.
Remember that the “safe value” must be maintained if
version
of the
corresponding
output allowed
of the
global
three lines
define
a counter cpt_roll,
which
is non
zero
three
or more channels have been faulty “recently”. The first
allocator.
exactly when there was a case with three failures, in the recent
three
lines define a counter cpt_roll, which is non zero
past. “recent” means within the last SAFE_COUNTER_TIME D. Validation
C. Connecting the four channels together
exactly
when there was a case with three failures, in the recent
units of time, where SAFE_COUNTER_TIME is a constant.
It “recent”
is difficult
to within
expresstheproperties
of the voter, apart
According
to
the
general
architecture
described
above,
past.
means
last SAFE_COUNTER_TIME
The writing in Lustre is quite simple: just restart the counter
from
the
whole
specification
(which
would
be a rephraswe are able to write the main node, which invokes the main
units of time, where SAFE_COUNTER_TIME is a constant.
The
with value SAFE_COUNTER_TIME each time there are three
faults, and then decrement it at each step, until it reaches zero :
3
ing of the program). For this simple device, we can just
Because of the very regular structure of this node, it would be more concisely
and elegantly
described
by means
arrays.
try a simulation
using
Luciole.
Fig. of9 Lustre-V4
shows such
a simHowever, since these arrays differ significantly from those of Scade, we
decidedwe
not choose
to make use
of them.
ulation:
constant
inputs for the values x1, x2,
cpt_roll = 0 ->
if three_roll then SAFE_COUNTER_TIME
else if pre (cpt_roll)> 0
CSI Journal of
Computing
| Vol. 1 • No. 4, 2012
then
pre(cpt_roll)-1
else 0 ;
x3, x4 coming from the channels, and just play with the
occurrences of transmission faults f1, f2, f3, f4, observing that the correct output is computed in each case. The
simulation is done with SAFE COUNTER TIME = 3 and
FAIL SAFE ROLL VALUE = 0.
A. The timing aspects
Remember that the “safe value” must me maintained if
three of more channels have been faulty “recently”. The first
three lines define a counter cpt_roll, which is non zero
exactly
when thereet.
was
a case with three failures, in the recent
F. Maraninchi,
al.
past. “recent” means within the last SAFE_COUNTER_TIME
units of time, where SAFE_COUNTER_TIME is a constant.
writing
in Lustre
is quite
simple:
just restart
the counter
with
The
writing
in Lustre
is quite
simple:
just restart
the counter
value
SAFE_COUNTER_TIME
each time
there
are
three
faults,
with
value
SAFE_COUNTER_TIME
each
time
there
are
three
and then
it at each
step,step,
untiluntil
it reaches
zero zero
:
faults,
and decrement
then decrement
it at each
it reaches
:
cpt_roll = 0 ->
if three_roll then SAFE_COUNTER_TIME
else if pre (cpt_roll)> 0
then pre(cpt_roll)-1
else 0 ;
Then comes the voting itself. The conditional expression
mimics the informal documentation. The auxiliary nodes
OlympicAverage, Median, and Average are straightforward.
D. Validation
7 : 75
It in
is Fig.
difficult
to express
properties
the avoter,
apart
like
reftwooffour.a.
However,
thereofis also
form that
from
the make
wholeuse
specification
(which
would
a rephrasdoes not
of numerical
variables,
andbethat
can be
ing
of the for
program).
For reasons
this simple
can just
necessary
decidability
whendevice,
trying we
to perform
try
a simulation
Luciole.
Fig. we
9 shows
such a use
simformal
verificationusing
with Lesar.
Hence,
will sometimes
ulation:
choose
the nodewe
of Fig.
8.b. constant inputs for the values x1, x2,
x3, x4
comingnodes
from the
just play withand
the
Similar
for channels,
noneof,and oneoffour,
occurrences
of are
transmission
faults f1, f2, f3, f4, observthreeoffour
easy to write.
ing that the correct output is computed in each case. The
C. The voting mechanism itself
simulation is done with SAFE COUNTER TIME = 3 and
comes
the voting
itself.
FAIL Then
SAFE
ROLL
VALUE
= 0.The conditional expression
B. Counting faults
mimics the informal documentation. The auxiliary nodes
OlympicAverage, Median, and Average are straightforward.
The nodes noneof, oneoffour, twooffour,
threeoffour are intended to count the faults, i.e., the
number of Boolean variables that have value true, among f1,
f2, f3, f4. They can be programmed with integers, of course,
D.Validation
It is difficult to express properties of the voter, apart from
the whole specification (which would be a rephrasing of the 7
program). For this simple device, we can do a simulation using
node GYRO ( Roll_a_1, Roll_b_1, Roll_a_2, Roll_b_2,
Roll_a_3, Roll_b_3, Roll_a_4, Roll_b_4 : real;
On_Ground_Reset, In_Air_Reset : bool)
returns (Roll : real);
var
local_roll_1, local_roll_2, local_roll_3, local_roll_4 : real;
transmit_failure_1, transmit_failure_2,
transmit_failure_3, transmit_failure_4 : bool;
ask_1, ask_2, ask_3, ask_4 : bool;
allowed_1, allowed_2, allowed_3, allowed_4 : bool;
let
(local_roll_1, transmit_failure_1, ask_1) =
Channel(Roll_a_1, Roll_b_1, On_Ground_Reset, In_Air_Reset,
pre(local_roll_2), pre(transmit_failure_2),
pre(local_roll_3), pre(transmit_failure_3),
pre(local_roll_4), pre(transmit_failure_4),
allowed_1);
(local_roll_2, transmit_failure_2, ask_2) =
Channel(Roll_a_2, Roll_b_2, On_Ground_Reset, In_Air_Reset,
pre(local_roll_1), pre(transmit_failure_1),
pre(local_roll_3), pre(transmit_failure_3),
pre(local_roll_4), pre(transmit_failure_4),
allowed_2);
(local_roll_3, transmit_failure_3, ask_3) =
Channel(Roll_a_3, Roll_b_3, On_Ground_Reset, In_Air_Reset,
pre(local_roll_1), pre(transmit_failure_1),
pre(local_roll_2), pre(transmit_failure_2),
pre(local_roll_4), pre(transmit_failure_4),
allowed_3);
(local_roll_4, transmit_failure_4, ask_4) =
Channel(Roll_a_4, Roll_b_4, On_Ground_Reset, In_Air_Reset,
pre(local_roll_1), pre(transmit_failure_1),
pre(local_roll_2), pre(transmit_failure_2),
pre(local_roll_3), pre(transmit_failure_3),
allowed_4);
(allowed_1, allowed_2, allowed_3, allowed_4) =
Allocator(ask_1, ask_2, ask_3, ask_4);
Roll = Voter(local_roll_1, local_roll_2,
local_roll_3, local_roll_4,
transmit_failure_1, transmit_failure_2,
transmit_failure_3, transmit_failure_4);
tel
Fig. 5.
Fig. 5 : The main node
The main node
roll1
OnGroundReset
CSI Journal of Computing | Vol. 1 • No. 4, 2012
roll2
Roll = Voter(local_roll_1, local_roll_2,
local_roll_3, local_roll_4,
transmit_failure_1, transmit_failure_2,
transmit_failure_3, transmit_failure_4);
7Fig.
: 765.
tel
Specification and Validation of Embedded Systems: A Case Study of a
Fault-Tolerant Data Acquisition System with Lustre Programming environment
The main node
roll1
OnGroundReset
InAirReset
roll2
Local roll
MONITOR
transmit failure
Foreign roll
failure roll
FAIL DETECT
Foreign failure roll
Ask inhib roll
Inhib roll Allowed
Fig. 6.
Architecture of the Lustre program: a channel
Fig. 6 : Architecture of the Lustre program: a channel
8
node Voter ( x1, x2, x3, x4 : real ; -- four values given by the channels
f1, f2, f3, f4 : bool ; -- failure statuses seen by the four channels
)
returns (x : real)
var
zero_roll, one_roll, two_roll, three_roll : bool ; -- numbers of failures
cpt_roll : int ; -- a counter
let
cpt_roll = 0 -> if three_roll then SAFE_COUNTER_TIME
else if pre (cpt_roll)>0 then pre(cpt_roll) - 1
else 0 ;
zero_roll = noneof
(f1, f2, f3, f4) ;
one_roll
= oneoffour
(f1, f2, f3, f4) ;
two_roll
= twooffour
(f1, f2, f3, f4) ;
three_roll = threeoffour (f1, f2, f3, f4) ;
x = if (zero_roll and cpt_roll = 0 ) then
OlympicAverage (x1, x2, x3, x4)
else if (one_roll and cpt_roll = 0 ) then
Median (x1, x2, x3, x4, f1, f2, f3, f4 )
else if (two_roll and cpt_roll = 0 ) then
Average (x1, x2, x3, x4, f1, f2, f3, f4 )
else FAIL_SAFE_ROLL_VALUE ;
tel ;
Fig. 7.
The voter in Lustre
Fig. 7 : The voter in Lustre
node twooffour (f1, f2, f3, f4 : bool)
returns (r : bool)
let
r = ((if f1 then 1 else 0) +
(if f2 then 1 else 0) +
(if f3 then 1 else 0) +
(if f4 then 1 else 0)) = 2 ;
tel
CSI Journal of Computing
| Vol.
1 •counter
No. 4, 2012
(a) A version
with
node twooffour (f1, f2, f3, f4 : bool)
returns (r : bool)
1
2
3
4
5
6
7
8
9
10
11
x1
1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
x2
2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00
x3
4.00 4.00 4.00 4.00 4.00 4.00 4.00 4.00 4.00 4.00 4.00
x4
6.00 6.00 6.00 6.00 6.00 6.00 6.00 6.00 6.00 6.00 6.00
f1
tel ;
Fig. 7.
tel ;
The Fig.
voter7.in Lustre
The voter in Lustre
7 : 77
F. Maraninchi, et. al.
node twooffour
node twooffour
(f1, f2, (f1,
f3, f4
f2,: f3,
bool)
f4 : bool)
returns (r
: bool)
returns
(r : bool)
let
let
r = ((if rf1
= then
((if 1
f1else
then0)
1 +
else 0) +
(if f2 then
(if 1
f2else
then0)
1 +
else 0) +
(if f3 then
(if 1
f3else
then0)
1 +
else 0) +
(if f4 then
= 2 ;
(if 1
f4else
then0))
1 else
0)) = 2 ;
tel
tel
1
(a) A version
(a) with
A version
counter
with counter
3 1 4 2 5 3 6 4 7 5 8 6 9 7 10 8 11 9
2
10
11
x1
1.00 1.00 1.001.00
1.001.00
1.001.00
1.001.00
1.001.00
1.001.00
1.001.00
1.001.00
1.001.00 1.00 1.00
x1
x2
2.00 2.00 2.002.00
2.002.00
2.002.00
2.002.00
2.002.00
2.002.00
2.002.00
2.002.00
2.002.00 2.00 2.00
x3
4.00 4.00 4.004.00
4.004.00
4.004.00
4.004.00
4.004.00
4.004.00
4.004.00
4.004.00
4.004.00 4.00 4.00
x4
6.00 6.00 6.006.00
6.006.00
6.006.00
6.006.00
6.006.00
6.006.00
6.006.00
6.006.00
6.006.00 6.00 6.00
x2
x3
x4
node twooffour
node twooffour
(f1, f2, (f1,
f3, f4
f2,: f3,
bool)
f4 : bool)
f1
f1
returns (r
returns
: bool)
(r : bool)
let
let
f2
f2
r = f1 and
r = f1 and
f3
f3
(f2 and not
(f2 f3
andand
notnot
f3 f4
and or
not f4 or
f4
f4
f3 and not
f3 f2
andand
notnot
f2 f4
and or
not f4 or
f4 and not
f4 f2
andand
notnot
f2 f3)
and or
not f3) or
3.00
3.00
3.00 3.00 3.003.00 3.00 3.00
f2 and f2 and
(f1 and not
(f1 f3
andand
notnot
f3 f4
and or
not f4 or
2.00
2.00
2.00
2.00
x
x
f3 and not
f3 f1
andand
notnot
f1 f4
and or
not f4 or
1.50
1.50
1.50
1.50
f4 and not
f4 f1
andand
notnot
f1 f3)
and or
not f3) or
f3 and f3 and
0.00 0.00 0.000.00 0.00 0.00
(f2 and not
(f2 f1
andand
notnot
f1 f4
and or
not f4 or
f1 and not
f1 f2
andand
notnot
f2 f4
and or
not f4 or
3 1 4 2 5 3 6 4 7 5 8 6 9 7 10 8 11 9
10 11
1
2
f4 and not
f4 f2
andand
notnot
f2 f1)
and or
not f1) or
f4 and f4 and
Fig. 9 : A Luciole simulation of the voter — Initially,
(f2 and not
(f2 f3
andand
notnot
f3 f1
and or
not f1 or
A Luciole
Fig.
simulation
A
Luciole
ofsimulation
the
—
of Initially,
the
— Initially,
is no fault,
theresois nooffault, so
f3 and not
f3 f2
andand
notnot
f2 f1
and or
not f1 or Fig. 9. there
is9.no
fault,
so
thevoter
output
isvoter
thethere
olympic
average
the output is the output
olympicis average
the olympic
of theaverage
inputs of
(i.e.,
thethe
inputs
average
(i.e.,ofthe average of
f1 and not
f1 f2
andand
notnot
f2 f3)
and ;
not f3) ;
the
inputs
(i.e.,
the
average
of
2
and
4).
At
step
2,
a fault
2 and 4). At step
2 and
2, 4).
a fault
At step
occurs
2, aon
fault
channel
occurs
4, on
so channel
the outputs
4, so
is the
the
outputs is the
tel
tel
occurs
onx1,
channel
4,
so
the
the
median
of x1,
median of
median
x2, x3,
of x1,
which
x2,
is x3,
2. outputs
At
which
step is
3, is
2.
channel
At step
3 3,
becomes
channel
3 becomes
faulty,
sowhich
the faulty,
result
is
sothe
theaverage
result 3,
isofchannel
the
x1average
and x2.
ofbecomes
At
x1step
and4,x2.
afaulty,
third
At step so
4, a third
x2,
x3,
is
2.
At
step
3
(b) A purely
(b)Boolean
A purelyversion
Boolean version
faultresult
occurs, is
fault
so the
theoccurs,
output
sotakes
theof
the
output
takesvalue
the At
“safe”
0 for
3value
units
for 3 units of
the
average
x1“safe”
and
x2.
step
4, a0ofthird
time.
time.
Fig. 8. The Fig.
node8.twooffour
The node twooffour
fault occurs, so the output takes the “safe” value 0 for 3
Fig. 8 : The node twooffour
units of time.
Luciole. Fig. 9 shows such a simulation: we choose constant
inputs for the values x1, x2, x3, x4 coming from the channels,
and just play with the occurrences of transmission faults f1,
f2, f3, f4, observing that the correct output is computed in
each case. The simulation is done with SAFE_COUNTER_TIME
= 3 and FAIL_SAFE_ROLL_VALUE = 0.
VI.Incremental implementation of functionality
Once we have designed the global architecture of the
program, and taken decisions about where to place Lustre
pre’s, we can develop the whole program progressively. We can
start by giving each node some trivial behavior (like output
= constant), just to see whether the architecture indeed
compiles. It allows typing and self-dependence problems to
be detected.
Then we can start writing more and more appropriate
code for each of the nodes. We used four steps, each of which
with simulation :
ƒƒ We start (§VI-A) with the detection of transmission
failures only (the channels do not talk to each other).
ƒƒ
ƒƒ
ƒƒ
This can be observed on one channel only, first, and then
we can put together all the four channels.
Then (§VI-B), we add the detection of faults that are
due to cross-channels comparisons, but without latching
them.
Then we implement the latching of faults and the effect
of resets (§VI-C).
Finally, we implement the global allocator (§VI-D) that
allows a special behavior to be given to third fault.
A. Local detection of transmission faults only
Lustre code
In each channel, the node Monitor determines if the
two values received by the channel differ too much for too
long a time. It also transmits some combination of the two
values received as its “local” value. Nothing is said about the
combination function in the informal documentation.We have
chosen to output the first value. Any other choice could be
implemented in a very simple way.
CSI Journal of Computing | Vol. 1 • No. 4, 2012
NE BY ONE
ure of the
ustre pre’s,
y. We can
e output
ure indeed
roblems to
priate code
which with
ansmission
ach other).
t, and then
hat are due
hing them.
d the effect
VI-D) that
fault.
if the two
for a too
of the two
d about the
. We chose
mplemented
es
var
In each channel, the node Monitor determines if the two let diff1, diff2, diff3 : bool ;
diff1 = abs (xi - pxother1)
Lustre
values code
received by the channel differ too much for a too
-- comparisons of xi with
the three
> CROSS_CH_TOL_ROLL
;
-foreign
long time. It also transmits some combination of the two
diff2
= absvalues
(xi - pxother2)
let
values
as its
value. Nothing
is Specification
saidifabout
the and
In each
channel,
the“local”
node Monitor
determines
the two
> CROSS_CH_TOL_ROLL
received
Validation
Embedded
Systems:
A Case Study of a ;
diff1 ==of
abs
-- pxother1)
abs (xi
(xi
pxother3)
combination
function
in
the
informal
documentation.
We
chose
values
received
by
the
channel
differ
too
much
for
a
too
7 : 78
Fault-Tolerant Data Acquisitiondiff3
System with
Lustre
Programming
environment
>> CROSS_CH_TOL_ROLL
CROSS_CH_TOL_ROLL ;;
to output
Any other
can be implemented
long
time.the
It first
alsovalue.
transmits
somechoice
combination
of the two
diff2
=
abs
(xi
pxother2)
fault =
in a very
simpleasway.
values
received
its “local” value. Nothing is said about the
> CROSS_CH_TOL_ROLL ;
maintain(TIME_CROSS_ROLL,
diff3 =ifabs
(xi - pxother3)
combination function in the informal documentation. We chose
pfother1
then
Monitor
(
> CROSS_CH_TOL_ROLL
tonode
output
the first value.
Any other choice can be implemented
-- don’t take this
one into account;
xa, xb : real ; -- two input values
fault = if pfother2 then -- the same
in) a very simple way.
maintain(TIME_CROSS_ROLL,
returns (
node local_value
Monitor (
: real ;
xa,
xb :value
real seen
; --bytwo
input
values
-- the
this
channel
)
failure : bool ;
returns
(
-- detection
of a transmission fault
) local_value : real ;
let -- the value seen by this channel
failure
;
failure := bool
maintain(TIME_ROLL,
abs(xa - xb)
-- detection of
a transmission fault
> DELTA_ROLL);
)
local_value = xa ;
let
tel
failure = maintain(TIME_ROLL, abs(xa - xb)
> DELTA_ROLL);
TIME_ROLL
and= xa
DELTA_ROLL
twoconstants;
constants;
local_value
;
TIME_ROLL
and
DELTA_ROLL
areare two
tel
maintain
is the
Lustre
presented
in §II-B.
maintain
is the
Lustre
nodenode
presented
in §II-B.
if pfother3
if pfother1
then else diff3
then false
-don’t
take thisthen
one diff2
into account
else
if pfother3
if pfother2
then
-the
else (diff2 andsame
diff3)
elseififpfother3
pfother2 then
then
false else
if pfother3
thendiff3
diff1
else if pfother3
then and
diff2
else (diff1
diff3)
else (diff2
else if pfother3
then and diff3)
else if(diff1
pfother2
and then
diff2)
if
pfother3
then and
diff1
else (diff1 and diff2
diff3)) ;
else (diff1 and diff3)
tel
else if pfother3 then
(diff1 and diff2)
In Section VII-A,
this nodeand
willdiff2
be formally
checked;for
else (diff1
and diff3))
equivalence with another version.
tel
Notice that the informal specification is unclear: we deIn
Section
VII-A,
this node
be formally“diff1
checked and
for
cided
to report
a failure
whenwill
the conjunction
Simulations
Simulations
TIME_ROLL and DELTA_ROLL are two constants;
9 equivalence
with
another
version.
diff2
and
diff3”
holds
for
the
given
delay.
We
could
In Section VII-A, this node will be formally checked for
maintain
the Lustre
node presented
in §II-B.
Fig. 10 is
shows
a simulation
of the node
Monitor, with
have
chosen
another
solution,
first detecting
if each we
foreign
Notice
that
theanother
informal
specification
is unclear:
deequivalence
with
version.
Fig.
10
shows
a
simulation
of
the
node
Monitor,
with
constants
value
differs
from
the
local
one
for
the
given
delay,
and
then
cided
to
report
a
failure
when
the
conjunction
“diff1
and
Simulations
constants
B.
Adding
non-latched
comparisons
DELTA_ROLL
= 14 cross-channel
and TIME_ROLL
= 3
2
3
4
5
6
8
9
10 11 12 13 14 15 16 17 18
reportingand
the1 conjunction
of7 these
conditions.
diff2
diff3”
holds
for
the
given delay. We could
DELTA_ROLL = 14 and TIME_ROLL = 3
have
chosen
another
solution,
first
detecting
if 27.50
each
foreign
27.30
of 27.80the
node
Fig.
10
shows
a simulation
of the nodecomparisons
Monitor, with Whatever be the correct choice, the body
B.
Adding
non-latched
cross-channel
Lustre
code
value
differs
from
the
local
one
for
the
given
delay,
and
then
constants
Lustre code
reporting the conjunction of these conditions.
For
cross-channel
thethe
local
value
DELTA_ROLL
=comparisons,
14
and TIME_ROLL
=value
3xi is
20.30 20.50 20.70
For
cross-channel
comparisons,
local
xicomis
Whatever be the correct choice,20.10the
body of the node
pared
to
those
among
the
foreign
ones
that
are
not
declared
18.20
compared to those among the foreign ones that are not
xa
15.20
15.20 15.20
failed (that
is why
we is
need
failneed
statuses
of the
threeofforeign
declared
failed
(that
whythewe
the fail
status
the
channels).
Intuitively,
theIntuitively,
local valuethe
is local
faultyvalue
if it differs
too
12.10 12.20
three
foreign
channels).
is faulty
much
(i.e., too
more
than
a more
constant
if
it differs
much
(i.e.,
thanCROSS_CH_TOL_ROLL)
a constant CROSS_CH_
7.50
TOL_ROLL)
the other (supposedly
correct)
values. a
from all thefrom
otherall(supposedly
correct) values.
Moreover,
7.00
6.10 6.10 6.10
Moreover,
a cross-channel
failure only
is reported
only if this
cross-channel
failure is reported
if this situation
lasts
21.20 21.20 21.20
situation
some delay (TIME_CROSS_ROLL).
This is
for some lasts
delayfor(TIME_CROSS_ROLL).
This detection
detection
performed
by a node values_nok,
by the
performedisby
a node values_nok,
called by called
the component
component
FailDetect
of
the
channel
and
which
is
given
15.20
14.60
FailDetect of the channel, and which is given below,
and
xb
12.10
below, along with a description in the sequel.:
11.50
then discussed:
9.90
8.90
8.10 7.90 7.70
7.50
9.10 9.10
5.90 5.70
node values_nok (
5.40
pfother1, pfother2, pfother3 : bool ;
-- foreign values status: true if faulty
27.80
27.30 27.50
xi : real ; -- local value
pxother1, pxother2, pxother3 : real ;
-- foreign values
20.70
)
20.10 20.30 20.50
returns (
18.20
local value
15.20
15.20 15.20
fault : bool
-- there is a cross channel fault
12.10 12.20
)
var
7.50
7.00
diff1, diff2, diff3 : bool ;
6.10 6.10 6.10
-- comparisons of xi with the three
inline monitor failed
-- foreign values
let
9
3
5
6
8
10 11 12 13 14 15 16 17 18
7
1
2
4
diff1 = abs (xi - pxother1)
Fig. 10 : A simulation of the Monitor
> CROSS_CH_TOL_ROLL ; Fig. 10. A simulation of the Monitor
diff2 = abs (xi - pxother2)
> CROSS_CH_TOL_ROLL ;
diff3 = abs (xi - pxother3)
> CROSS_CH_TOL_ROLL ; FailDetect is simply:
fault =
maintain(TIME_CROSS_ROLL,
CSI Journalifofpfother1
Computing
| Vol. 1 • No. 4, 2012
then
failure = transmit_failure or cross_failure;
-- don’t take this one into account
cross_failure =
if pfother2 then -- the same
values_nok(pfother1, pfother2, pfother3,
if pfother3
xi, pxother1, pxother2, pxother3);
debug localfa
debug localfa
debug localfa
debug localfa
debug cross fa
debug cross fa
debug cross fa
debug cross fa
Fig. 11.
A
8.90
9.10 9.10
8.10 7.90 7.70
7.50
5.90 5.70
5.40
xb4
15.00
15.00
3.25
3.50
15.00
15.00
27.80
x
F. Maraninchi, et. al.
20.10 20.30 20.50
16
17
18
local value
7.50 27.80
1
2
3
4
5
6
18.20
7
8
9
10
11
3.00
3.00
3.00
3.00
3.00
3.00
3.00
3.00
15.20
20.70
12
13
14
15
16
15.00
15.00
15.00
15.00
15.00
13.00
13.00
9.00
9.00
9.00
15.20 15.20
12.10 12.20
xa1
3.00
0.00
inline monitor failed
2
xa2
-2.00
-2.00
0.00
3.00
7.00
7.50
3.00
3.00
3.00
3.00
3.00
3
4
5
6
7
8
9
3.00
3.00
3.00
3.00
3.00
3.00
3.00
10
3.00
11
3.00
3
4
5
6
7
8
9
10
11
4.00
4.00
4.00
4.00
4.00
4.00
4.00
4.00
4.00
A simulation3.50 of3.50the3.50Monitor
3.50
3.50
3.50
3.50
1
xb2
Fig. 10.
3.00
0.00
1
-2.00
6.10 6.10 6.10
xb1
15.20 15.20
-2.00
2
12
13
3.00
12
4.00
3.00
14
15
3.00
16
3.00
17
3.00
27.80
13 27.30
15 16
14 27.50
4.00
4.00
4.00
18
3.00
17
4.00
18
4.00
0.00
0.00
20.10 20.30 20.50
xa3
18.20
xa
0.00
2.50
2.50
0.00
3.25
3.25
3.00
2.50
2.50
2.50 15.20
2.50
20.70
15.20 15.20
2.50
16.00
16.00
16.00
16.00
7.50
7.00
failure = transmit_failure
or cross_failure;
6.10 6.10 6.10
cross_failure =
21.20 21.20 21.20
values_nok(pfother1, pfother2, pfother3,
xi, pxother1, pxother2, pxother3);
5.00
3.00
xb4
15.00
15.00
15.00
1
xa1
1
0.00
xb
7.50 27.80
4.00
14.60
x
16
17
18
failure;
other3,
other3);
ng Luciole,
of Luciole
while it is
rresponding
ei ), we first
output. The
for relevant
duce first a
5), then a
ross failure
other cross
).
3.00
2.00
2.00
2.00
2.00
2.00
2.00
2.00
2.00
2.00
3.00
3.00
3.00
3.00
3.00
3.00
3.00
3.00
3.00
3.00
2
3
4
5
6
3.00
3.00
3.00
3.00
3.00
3.00
3.00
3.00
7
8
0.00
2.00
2.00
2.00
2.00
2.00
2.00
2.00
2.00
2.00
3.00
3.00
3.00
3.00
3.00
3.00
3.00
3.00
3.00
3.25
3.25
3.50
3.25
12.10
3.00
3.00
11.50
Simulations
9.90
debug localfailure1
9.10 9.10
8.90
7.90 7.70 7.50
We can run a simulation on this 8.10
program,
using Luciole,
debug localfailure2
5.90 5.70
the Lustre simulator. Since the current version
of5.40Luciole
only
the values of input/output variables, while it is
debug shows
localfailure3
interesting to see also the internal variables corresponding
27.80
localfailure4
27.30 27.50
todebug
failures
(transmit failurei , cross failure
i ), we first
modify
program so that these variables be output. The
debug
cross the
failure1
simulation is performed with the following values for relevant
debug cross failure2
20.70
constants:
20.10 20.30 20.50
18.20
DELTA_ROLL = 4.0 ; 15.20
15.20 15.20
debug cross
failure4
CROSS_CH_TOL_ROLL
= 10.0 ;
12.20
12.10
TIME_ROLL = TIME_CROSS_ROLL = 3 ;
Fig. 11 : A simulation of transmission and cross
failures
7.50
7.00
Fig. 11 shows an execution
(where we introduce first a
1
Fig. 11.
9
10
11
12
13
14
15
16
15.00
15.00
15.00
15.00
15.00
12
13
14
15
16
13.00
13.00
9.00
9.00
9.00
3.00
6
3.00
7
3.00
8
3.00
9
3.00
10
3.00
11
3.00
3.00
3.00
3.00
3.00
3.00
3.00
3.00
3.00
3.00
3.00
3.00
3.00
3.00
3.00
3.00
3.00
3.00
3.00
3.00
4.00
4.00
4.00
4.00
4.00
4.00
4.00
4.00
4.00
4.00
4.00
4.00
4.00
3.50
3.50
3.50
3.50
3.50
3.50
2.50
2.50
2.50
2.50
2.50
2.50
2
3
4
5
-2.00
-2.00
-2.00
-2.00
3.00
3.00
3.00
xa2
3.00
3.00
4.00
4.00
3.50
2.50
0.00
xb2
1
0.00
2
3
0.00
xa3
-13.00 -13.00 -13.00 -13.00 -13.00 -13.00 -13.00 -13.00
0.00
xb3
12 :automaton
The automaton
driving
Fig. Fig.
12. The
driving failure
latchingfailure latching
-15.00 -15.00 -15.00 -15.00 -15.00 -15.00 -15.00 -15.00
15.20
debug cross failure3
local value
15
3.00
0.00
15.00
0.00
15.20 15.20
3.00
0.00
15.00
5.70 5.40
3.00
debug localfailure3
simulation
is performed with the following values for relevant
constants:
debug localfailure4
DELTA_ROLL = 4.0 ;
debug cross failure1
CROSS_CH_TOL_ROLL = 10.0 ;
10
debug cross failure2
TIME_ROLL = TIME_CROSS_ROLL = 3 ;
debug Fig.
cross failure3
11 shows an execution (where we introduce first a
transmission
debug
cross failure4fault on the first channel (steps 2 to 5), then a
cross failure on channel 4 (steps 3 to 7), then a cross failure on
channel 3 alone (from step 9), on which an other cross failure,
on
1, is combined
(from step
12) failures
).
Fig.channel
11. A simulation
of transmission
and cross
xb1
-15.00 -15.00 -15.00 -15.00 -15.00 -15.00 -15.00 -15.00
9.10 9.10
3.00
7 : 79
xb3
16.00
3.00
debug localfailure2
-13.00 -13.00 -13.00 -13.00 -13.00 -13.00 -13.00 -13.00
FailDetect
is simply:
12.10 12.20
xa4
3.00
debug localfailure1
10
15
3.00
15.00
4.00
0.00
27.30 27.50
3.00
0.00
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
A simulation
and cross failures
6.10of
6.10transmission
6.10
transmission
fault on the first channel (steps 2 to 5), then a
inline monitor
failed that the informal specification is unclear: we
Notice
cross failure on channel 4 (steps 3 to 7), then a cross failure
decided to report
the
conjunction
3 a 4failure
5
6
8
10 11
16 17 18
7when
2
12 13 14 15 “diff1
on channel 31 alone
(from
step
9),9 on
which
an other
cross
and diff2 and diff3” holds for2 the given delay. We could
failure,
on
channel
1,
is
combined
(from
step
12)
).
Fig.
10.
A
simulation
of
the
Monitor
have chosen 1another solution, first detecting if each3foreign
value differs from the local one for the given delay, and then
reporting the conjunction of these conditions.
Whatever be the correct choice, the body of the node
FailDetect
simply:
Fig. 12. The automaton
driving failure latching
FailDetect
is is
simply:
failure = transmit_failure or cross_failure;
cross_failure =
C. Latching
the cross-channel faults
values_nok(pfother1,
pfother2, pfother3,
xi, pxother1, pxother2, pxother3);
Now, recall that some serious faults must be latched (i.e.,
sustained even when their cause has disapeared), until the
occurence of some reset command. This latching is performed
Simulations
Simulations
by
FailDetect,
and depends
on the
state
of an
Wethe
cannode
runrun
a asimulation
on
program,
using
Luciole,
We
can
simulation
on this
this
program,
using
Luciole,
automaton,
as described
by Fig.
12:
the
Lustre
simulator.
Since
the
current
version
of
Luciole
the Lustre simulator. Since the current version of Luciole
only
shows1 the
values
of input/output
variables,
while propit is
• State
is the
normal
state: everything
is working
only shows to
thesee
values
ofthe
input/output
variables,
while it is
interesting
also
internal faults
variables
corresponding
erly,
or
there
are
transmission
from
time
to
time,
interesting
to see also the
internal
variables corresponding to
to
failures
(transmit
failure
i , cross failurei ), we first
which
are not latched;
failures
(transmit
; cross
failure
), we first
modify
the
program failure
so that ithese
variables
be output.
The
i
• In the
state
2, theresohas
been
least one
serious
fault
modify
that
these
variables
be
output.
Thein
simulation
isprogram
performed
with
the at
following
values
for relevant
the past, which is latched, but may be reset either by
constants:
“OnGroundReset” or by “InAirReset”;
= 4.0
• DELTA_ROLL
In state 3, there
have ;been several serious faults in the
CROSS_CH_TOL_ROLL
= 10.0
;
past, or one serious fault
followed
immediately by a
TIME_ROLL = TIME_CROSS_ROLL = 3 ;
transmission one, and the fault is latched and reset-
Fig. 11 shows an execution (where we introduce first a
16.00
16.00
16.00
16.00
16.00
C. Latching the cross-channel faults
xa4
5.00
3.00
3.00
3.00
3.00
3.00
3.00
3.00
3.00
3.00
0.00
It may bethe
noted
that some serious
C. Latching
cross-channel
faults faults must be latched
(i.e., sustained even when their cause has disappeared),
Now,
that some
serious
must beThis
latched
(i.e.,
until
therecall
occurence
of some
resetfaults
command.
latching
sustained
even
when
their
cause
has
disapeared),
until
the
is performed by the node FailDetect, and depends on the
occurence
of
some resetascommand.
This
latching
is performed
state
of
an
automaton,
described
by
Fig.
12:
debug localfailure1
node1 FailDetect,
depends
on theisstate
of an
ƒby
ƒ the
State
is the normal and
state:
everything
working
debug localfailure2
automaton,
as or
described
bytransmission
Fig. 12:
properly,
there are
faults from time to
debug
localfailure3
which
arenormal
not latched;
• time,
State
1 is the
state: everything is working properly,
or
there
are
transmission
faultsone
from
time fault
to time,
ƒdebug
ƒ In
state
2,
there
has
been
at least
serious
in
localfailure4
which
arewhich
not latched;
the
past,
is latched, but may be reset either by
debug cross failure1
by “InAirReset”;
• “OnGroundReset”
In state 2, there hasorbeen
at least one serious fault in
debug cross
failure2
thestate
past, 3,which
latched,
may be
reset faults
either in
by
ƒƒ In
thereishave
beenbut
several
serious
the
past, or one serious
fault followed immediately by
“OnGroundReset”
or by “InAirReset”;
debug cross
failure3
and
theseveral
fault isserious
latchedfaults
and reset
• aIntransmission
state 3, thereone,
have
been
in the
debug cross failure4
inhibited.
In
this
case,
only
the
“OnGroundReset”
cana
past, or one serious fault followed immediately by
restore
the
normal
state.
transmission one, and the fault is latched and resetThe failure output is true when in state 2 or 3, and is
Fig. 11. A simulation of transmission and cross failures
equal to transmit_failure when in state 1. The transitions
between these states are driven by the following conditions:
ƒƒ from state 1: if the locally computed variable involves a
2
cross-failure, it is considered more serious if its value is
“in the nominal
range” (since it can be later considered
1
3
as correct). So, in this case, one moves to state 3. If the
erroneous value lies outside the nominal range, one
moves to state 2.
Fig.
The
automaton
latching
ƒƒ 12.
from
state
2: anydriving
resetfailure
signal
restores the normal state
(1); a cross-failure immediately followed by a foreign
failure involves a move to state 3; a transmission failure
occurring in state 2 also involves a move to state 3.
C. Latching the cross-channel faults
from state 3: only the “OnGroundReset” can restore the
normal
Now, state.
recall that some serious faults must be latched (i.e.,
xb4
15.00
15.00
15.00
15.00
15.00
2.00
2.00
2.00
2.00
2.00
2.00
2.00
2.00
2.00
3.25
3.25
3.50
3.25
3.00
3.00
3.00
3.00
3.00
3.00
3.00
3.00
3.00
3.00
3.00
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
4.00
0.00
x
0.00
1
sustained even when their cause has disapeared), until the
Lustre code
occurence of some reset command. This latching is performed
Programming
such an automaton
in Lustre
tedious,
by the
node FailDetect,
and depends
on the is
state
of an
automaton, as described by Fig. 12:
• State 1 is the normal state: everything is working properly, or there are transmission faults from time to time,
CSI Journal of Computing | Vol. 1 • No. 4, 2012
which are not latched;
• In state 2, there has been at least one serious fault in
the past, which is latched, but may be reset either by
• from involves
state 1: ifa the
locally
computed
variable involves
a r = false -> (pre(state) = 1 and try1to3)
failure
move
to state
3; a transmission
failure
occur only when the authorization is given:
cross-failure,
it is2 considered
more
serioustoifstate
its value
occurring
in state
also involves
a move
3. is • The actual moves
or (pre(state) = 2 and try2to3);
“in state
the nominal
(since it can be later
considered
• from
3: only range”
the “OnGroundReset”
can restore
the from1to3 = try1to3 and a;
where try1to3 and try2to3 obey the previous definitions
as correct).
normal
state. So, in this case, one moves to state 3. If from2to3 = try2to3 and a;
of from1to3 and from2to3:
range, one and
the erroneous value lies outside the nominal
Specification
Validation of Embedded Systems: A Case Study of a
The global
allocator is quite
it receives requests
try1to3
= cross_failure
and simple:
InNominalRange(xi);
Lustre
code
moves
to
state
2.
7 : 80
Fault-Tolerant Data Acquisition
System
with Lustre Programming environment
try2to3
= (pre(cross_failure)
r
from
the
units,
together
with
the
“OnGroundReset”
signal,
i
• from state 2: any reset signal restores the normal state
and foreign_failure)
and
returns
authorizations.
An
internal
counter
nb
is
used
Programming
such an automaton
in followed
Lustre isby
tedious,
but
(1); a cross-failure
immediately
a foreign
or transmit_failure ;
to
count
the
number
of
units
that
are
“reset-inhibited”,
The actual moves occur only when the authorization isand
but
rather
systematic.
The
state
is encoded
by an integer
failure
involves
move
to state
3;bya an
transmission
failure
rather
systematic.
The astate
is
encoded
integer
variable,
•
The
actual moves
whento
theprevent
authorization
is given:from
given:
variable,
ranging
from
1
to
3.
Notice
that
the
order
in
which
authorizations
are occur
givenonly
in order
this counter
2 also involves
move to
3. the
rangingoccurring
from 1 in
to state
3. Notice
that thea order
in state
which
the
transition
conditions
are
tested
for
is
relevant:
for
reaching
3.
The
“OnGroundReset”
signal
resets
the
allocator
from1to3
=
try1to3
and
a;
• from
state 3: only
“OnGroundReset”
the from1to3 = try1to3 and a;
transition
conditions
are the
tested
for is relevant:can
forrestore
instance,
instance,
it
allows
priority
to
be
given
to
reset:
from2to3
=
try2to3
and
a;
normal
state.
in
its
initial
configuration
(because
it
causes
all
automata
to
from2to3 = try2to3 and a;
it allows priority to be given to reset:
4
leave
state
3)
.
The
global
allocator
quite
The
global
allocatoris is
quitesimple:
simple:it itreceives
receivesrequests
requestsr
Lustre code
i
state = 1 ->
if pre(state)=1
then
Programming
such an automaton
in Lustre is tedious, but
if
pre(reset)
then
1 by an integer variable,
rather systematic.
The
state
is
encoded
-- reset has priority
ranging else
from 1iftopre(from1to2)
3. Notice that thethen
order2 in which the
transitionelse
conditions
are tested for is then
relevant:
for instance,
if pre(from1to3)
3 else
1
it allows
priority
to be given to reset:
else
if pre(state)=2
then
if pre(from2to1) then 1
else
state
= 1if->pre(from2to3) then 3 else 2
else
-- pre(state)=3
if pre(state)=1
then
ififpre(from3to1)
then
pre(reset) then
1 1 else 3 ;
-- reset has priority
else if pre(from1to2) then 2
Thedefinitions
definitions
of transition
conditions
follow
the
The
conditions
follow
the specifielse of
if transition
pre(from1to3)
then
3 else
1
specification:
cation: else if pre(state)=2 then
cation:
if pre(from2to1) then 1
else
if pre(from2to3)
from1to2
= cross_failure
andthen 3 else 2
else -- pre(state)=3
not InNominalRange
(xi)
if pre(from3to1)
then 1 else
3 ;;
from1to3 = cross_failure and
InNominalRange (xi) ;
from2to3
= (pre(cross_failure)
and the specifiThe definitions
of transition conditions follow
foreign_failure)
cation:
or transmit_failure ;
from2to1
=
inairreset
or ongroundreset;
from1to2 = cross_failure
and
from3to1 = ongroundreset;
not InNominalRange (xi) ;
from1to3
=whole
cross_failure
Fig.
1313
shows
thethe
code
node
Fig.
: shows
whole
codefor
forthe
theand
node FailDetect.
FailDetect.
InNominalRange (xi) ;
from2to3 = (pre(cross_failure) and
D. Taking into account the requirement on third
foreign_failure)
D. Taking
on third fault
fault into account theorrequirement
transmit_failure ;
thereal
realsystem,
system,
specialor
—
from2to1
= inairreset
ongroundreset;
InInthe
aa special
device
— called
called“Global
“Global
from3to1
ongroundreset;
Allocator”
—
ofof
preventing
more
than
two
units
Allocator”
—isisinin=charge
charge
preventing
more
than
two
units
from
becoming
reset-inhibited
(i.e.,
state
The
from
becoming
reset-inhibited
(i.e.,
from
entering
state3).3).
The
Fig.
13 shows
the whole code
forfrom
the entering
node
FailDetect.
node
Fail-Detect
must
be
changed
as
follows:
whenever
node Fail-Detect must be changed as follows: whenever
the
automaton should
should move
move to
to state
state3,3,ititsends
sends arequest,
request,
the automaton
say
into account
the and
requirement
on thirdathe
fault
sayD.r, Taking
to the global
allocator,
only performs
move if
r, to the global allocator, and only performs the move if the
the allocator
sendssystem,
it an authorization
back,—
saycalled
a. The“Global
node
In the real
a special device
hasAllocator”
an additional
a, and sends
additional
— isBoolean
in chargeinput
of preventing
morean
than
two units
Boolean
output r. The
transitions(i.e.,
are then
follows:
from becoming
reset-inhibited
frommodified
entering as
state
3). The
ƒƒ node
TheFail-Detect
request is the must
disjunction
of the
which
be changed
as condition
follows: whenever
involved
a move
from
state
1 to3,state
3, and
the one
the
automaton
should
move
to state
it sends
a request,
say
a move from
stateperforms
2 to statethe
3. move if the
r,which
to theinvolved
global allocator,
and only
r = false -> (pre(state) = 1 and try1to3)
or (pre(state) = 2 and try2to3);
where try1to3 and try2to3 obey the previous definitions
of from1to3 and from2to3:
try1to3 = cross_failure and InNominalRange(xi);
try2to3 = (pre(cross_failure)
and foreign_failure)
or transmit_failure ;
4
from
thethe
units,
signal,
ri from
units,together
togetherwith
withthe
the “OnGroundReset”
“OnGroundReset” signal,
node
allocator(r1,r2,r3,r4,reset:
and
returns
authorizations.
An internal
nb
and
returns
authorizations. An
internal counter
counterbool)
nb isisused
usedto
returns
(a1,a2,a3,a4:
bool);
count
the
number
of
units
that
are
“reset-inhibited”,
and
to count the number of units that are “reset-inhibited”, and
var
nb_aut,
already:
int;
authorizations
are
given
in
order
to
prevent
this
counter
from
authorizations are given in order to prevent this counter from
let
reaching
3. The
signalresets
resetsthe
theallocator
allocator
reaching
3.
The=“OnGroundReset”
“OnGroundReset”
signal
already
if (true
-> reset)
in
configuration
(because
it causes
causes all
all automata
automatatoto
in its
its initial
initial configuration
(because
it
then 0 else pre(nb_aut);
leave
state
3)44..
leavea1
state
= 3)
r1 and already <= 1;
a2 = r2 and
((not r1 and already <= bool)
1)
node allocator(r1,r2,r3,r4,reset:
or (r1 and already
returns (a1,a2,a3,a4:
bool); = 0)
); already: int;
var nb_aut,
let a3 = r3 and
already
= if r1
(true
((not
and ->
notreset)
r2 and already <= 1)
0 else pre(nb_aut);
or (#(r1,r2) then
and already
= 0)
a1 = r1
);and already <= 1;
a2
a4 == r2
r4and
and
((not
1) not r3 and
((notr1r1and
andalready
not r2 <=
and
or (r1 and already = 0)
already <= 1)
);
or (#(r1,r2,r3) and already = 0)
a3 = r3
);and
((not
r1 (true
and not
and already
nb_aut
= if
-> r2
reset)
then 0 <= 1)
or (#(r1,r2)
and already
else pre(nb_aut)
+ = 0)
);
(if a1 then 1 else 0) +
a4 = r4 and
(if a2 then 1 else 0) +
((not r1 and not r2 and not r3 and
(if a3 then
1 else
0) +
already
<= 1)
(ifalready
a4 then
1 else 0) ;
or (#(r1,r2,r3) and
= 0)
tel
);
nb_aut = if (true -> reset) then 0
else pre(nb_aut) +
Notice that there is an “instantaneous
dialogue”
between
(if a1 then
1 else
0) + the
a2same
thenstep,
1 else
0) +asks
units and the allocator: in the(if
very
the unit
(if the
a3 allocator
then 1 replies,
else 0)
the allocator for an authorization,
and+ the
(if a4 then 1 else 0) ;
unit
takes
the
transition
or
not,
according
to
this
reply.
tel
4 The “#” operator, in Lustre, is a n-ary Boolean operator, which returns
4
“true”
if and
only
if atismost
one of its operand
is true. between the
Notice
that
there
an “instantaneous
dialogue”
Notice that there is an “instantaneous dialogue” between
unitsunits
and and
the allocator:
in theinvery
same same
step, step,
the unit
the
the allocator:
the very
theasks
unit
the
allocator
for
an
authorization,
the
allocator
replies,
the
asks the allocator for an authorization, the allocatorand
replies,
unit takes
the transition
not, according
to according
this reply. to this
and
the unit
takes theortransition
or not,
reply.
4 The “#” operator, in Lustre, is a n-ary Boolean operator, which returns
Section
some
properties
“true”In
if and
only if atVII-B,
most one
of its operand
is true.of this allocation
mechanism will be formally verified.
VII.Some Experiences in Formal Verification
Ideally, the formal verification of such a program should
consist of comparing it with a global, abstract specification.
As it is often the case with real case-studies, this specification
is not available. The problem even more serious, here, since
the abstract specification of such a fault-tolerant system
The “#” operator, in Lustre, is a n-ary Boolean operator, which returns “true” if and only if at most one of its operand is true.
CSI Journal of Computing | Vol. 1 • No. 4, 2012
7 : 81
F. Maraninchi, et. al.
12
node FailDetect (
transmit_failure : bool ;
xi : real ;
ongroundreset, inairreset : bool ;
choffi : bool ;
pxother1, pxother2, pxother3 : real ; -- other values (pre)
pfother1, pfother2, pfother3 : bool ; -- other failures (pre)
)
returns (
failure : bool ; -- failure detected by this channel
)
var
cross_failure : bool ;
state : int ; -- only 1, 2, 3 are relevant
from1to2, from1to3, from2to3, from2to1, from3to1 : bool ;
reset, foreign_failure : bool ;
let
-- the state ------------------------------------------------------state
= 1 ->
if pre(state)=1 then
if pre(reset) then 1
-- reset has priority
else if pre(from1to2) then 2
else if pre(from1to3) then 3
else 1
else if pre(state)=2 then
if pre(from2to1) then 1
else if pre(from2to3) then 3
else 2
else -- pre(state)=3
if pre(from3to1) then 1
else 3 ;
reset = ongroundreset or (inairreset and not cross_failure) ;
foreign_failure = pfother1 or pfother2 or pfother3 ;
-- The output -----------------------------------------------------failure = (state = 2) or (state = 3) or (state = 1 and transmit_failure) ;
-- All the transitions --------------------------------------------from1to2 = cross_failure and not InNominalRange (xi) ;
from1to3 = cross_failure and InNominalRange (xi) ;
from2to3 = (pre(cross_failure) and foreign_failure) or transmit_failure ;
from2to1 = reset ;
from3to1 = ongroundreset ;
-- Cross channel comparisons --------------------------------------cross_failure = values_nok (pfother1, pfother2, pfother3,
xi, pxother1, pxother2, pxother3) ;
tel
Fig. 13.
Latching Failures
Fig. 13 : Latching Failures
should probably involve probabilistic properties, which are
A. Cross-channel fault detection
stated
nowhere,
couldofnot
handled mechanism
by usual
In
Section
VII-B, and
somewhich
properties
thisbeallocation
verification
tools.verified.
will
be formally
of the
theversion
node values_nok,
which detects
In
of the node values_nok
givencross-channel
in Section
faults.
VI-B, the output fault is defined as maintain(TIME_
• When some
consistency
are clearly
expressed
CROSS_ROLL,
cond),
where properties
cond is a Boolean
expression
in
the
requirements:
for
instance,
we
will
try
to
prove
that
carefully detailing all the possible combinations of failures.
at most
channels
can become
“reset-inhibited”.
One could
looktwo
for a
more compact
and symmetrical
condition,
say cond1, expressing that all the significant other measures
A. Cross-channel
fault
are
too different from
thedetection
local measure:
)) diffi
"i Î [1...3];
Incond1
the =version
of the (Øpfother
node values_nok
given
i
or
in Section VI-B, the output fault is defined as
maintain(TIME_CROSS_ROLL,
cond), where cond
fault
= maintain(TIME_CROSS_ROLL,
is
a
Boolean
expression
carefully
detailing
all the possible
pfother1 or diff1)
combinations
of
failures.
One
could
look
for
a
compact
and (pfother2 or more
diff2)
and
symmetrical
condition,
say
cond1,
expressing
that all
and (pfother3 or diff3));
the significant
other
measures
are
too
different
from
the
local
So, we can write two complete versions of the node
measure:
values_nok, with the same definition for the diffi and a
So, we don’t know “what to verify” on the complete
EXPERIENCES
FORMAL
VERIFICATION
VII. S OME
program.
However,
there areINtwo
common
cases where
verification
tools
can beverification
applied “locally”:
Ideally, the
formal
of such a program should
consist
of comparing
it ways
with of
a writing
global, aabstract
specification.
ƒƒ When
there are two
node, one
can write
and the
try to
show
that
they
are equivalent.
If they are
As itboth
is often
case
with
real
case-studies,
this specification
a bug isThe
detected,
at even
least, more
in oneserious,
of them.
If they
is notnot,
available.
problem
here,
since
are, thisspecification
increases the
one can have
in any
of
the abstract
of confidence
such a fault-tolerant
system
should
them.involve
We willprobabilistic
illustrate this
situation which
on two are
versions
probably
properties,
stated
of the
node
values_nok,
detects
cross-channel
nowhere,
and
which
could not bewhich
handled
by usual
verification
faults.
tools.
ƒƒ So,When
properties
are clearly
we some
don’t consistency
know “what
to verify”
on theexpressed
complete
in theHowever,
requirements:
for instance,
we will
trywhere
to prove
program.
there are
two common
cases
verthattools
at most
can become “reset-inhibited”.
ification
cantwo
be channels
applied “locally”:
• When there are two ways of writing a node, one can write
both and try to show that they are equivalent. If they are
not, a bug is detected, at least, in one of them. If they
are, this increases the confidence one can have in any
of them. We will illustrate this situation on two versions
cond1 = ∀i ∈ [1..3], (¬pfotheri ) ⇒ diffi
or
CSI Journal
of Computing | Vol. 1 • No. 4, 2012
fault
= maintain(TIME_CROSS_ROLL,
and
(pfother1 or diff1)
(pfother2 or diff2)
13
13
and (pfother3 or diff3));
• on the other hand, the determination of the suitable con
Specification•and
Validation
of Embedded
Systems:
A Case
Study
and (pfother3 or diff3));
on
other hand,
the complex.
determination
of the
suitable
con-of a
trolthe
structure
is quite
Obviously,
the automata
82 can write two complete versions
Fault-Tolerant
Data Acquisition
System
with
Lustre
Programming
environment
So,7 :we
of the node
trol
structure
is quite complex.
involved
in “FailDetect”
shouldObviously,
be taken the
intoautomata
account,
So, we can with
write the
twosame
complete
versions
thei and
nodea
values_nok,
definition
for theofdiff
involved
in
“FailDetect”
should
be
taken
into
but the tool takes a very long time to find it. account,
and
a
values_nok,
with
the
same
definition
for
the
diff
i
different definition for r. Then, we try, using our verification
butlimitations
the
tool takes
a very
timecomplex.
to find it.Obviously,
control
structure
islong
quite
different definition
r. Then,
we using
try, using
verificationThese
suggest
some
improvements
to the tool,the
different
forwhatever
r.for
Then,
we the
try,
our our
verification
tools, to definition
show that,
be
input sequences,
they
automata
involved
in
“FailDetect”
should
taken
tools,
to
show
that,
whatever
be
the
input
sequences,
they
These
limitations
suggest
some
improvements
tobethe
tool,into
which
will
be
discussed
in
the
conclusion.
tools,
to
show
that,
whatever
be
the
input
sequences,
they
provide
the same
output
sequence:
account,
but theintool
takes
a very long time to find it.
provide
the same
output
sequence:
which
will
be
discussed
the
conclusion.
So, we decided to use Lesar for this verification. For that,
provide
thethe
same
output sequence:
• with
verification
tool Lesar, the verification fails.
These
limitations
suggest
some
to the
ƒƒ with the verification tool Lesar, the verification fails.weSo,
weto
decided
tothe
useprogram,
Lesar
for
verification.
that,
have
modify
so this
that
theimprovements
propertyFor
involves
• with
the
verification
tool
Lesar,
the
verification
fails.
The The
diagnosis
returned
by Lesar
shows
thatthat
it isit due
tool,towhich
will
be
discussed
in
the
conclusion.
diagnosis
returned
by Lesar
shows
is duewe have
modify
the program,
soitthat
theuses
property
involves
computations.
Since
only
counters
up to
The
diagnosis
returned
by
Lesar
shows
that itonly
is due
to the
of the
tool:
Lesar
considers
thetheonly Boolean
to weakness
the weakness
of the
tool:
Lesar
considers
only
So, we
decided to use
Lesarit for
this
verification.
that,
only
Boolean
computations.
Since
only
uses
counters For
up to
4,
it
is
quite
easy:
to
the
weakness
of
the
tool:
Lesar
considers
only
the
Boolean
aspects
of the
program,
andand
abstracts
away
all all
Boolean
aspects
of the
program,
abstracts
away
we
have
to
modify
the
program,
so
that
the
property
involves
quite easy: must be changed to work only with Boolean
Boolean
aspects
ofexpressions.
the program,
and
abstractsit away
all 4, it• is
“FailDetect”
the numerical
expressions.
As aAs
consequence,
produces
the numerical
a consequence,
it produces
only
Boolean computations. Since it only uses counters up to
•
“FailDetect”
must be changed to work only with Boolean
the
numerical
expressions.
As
a
consequence,
it
produces
variables:
instead
a counter-example
where
the Boolean
variables
diff1,
a counter-example
where
the Boolean
variables
diff1,
4,
it is quite
easy: of encoding states by integers, we use
variables:
instead
ofand
encoding
states
by integers,
wesuch
use
adiff2,
counter-example
where
the
Boolean
variables
diff1,
of Booleans,must
wechanged
define
atonode
compare
diff2,
diff3
(which
defined
numerical
and and
diff3
(which
areare
defined
by by
numerical
ƒpairs
ƒ “FailDetect”
be
worktoonly
with Boolean
pairs
of
Booleans,
and
we
define
a
node
to
compare
such
diff2,
and
diff3
(which
are
defined
by
numerical
expressions)
appearing
in two
the two
versions
of the
node
pairsvariables:
for equality:
expressions)
appearing
in the
versions
of the
node
instead of encoding states by integers, we use
pairspairs
for equality:
expressions)
appearing
inIt
of the node
distinct
values.
is
atwo
case
of false
negative.
have have
distinct
values.
It is
athe
case
of versions
false
negative.
of Booleans,
and we define
a node to compare such
const state1
= [false,
true];
distinct
values.
It able,
isisa able,
case
of some
false
negative.
ƒƒ prototype
the
prototype
to
extend,
to take
• have
the
NBacNBac
is
to some
extend,
to take
intointo
const
state1
== [false,
true];
pairs
for equality:
state2
[true, false];
• the
prototype
NBac is
able,
to
extend,
to take
into
account
numerical
aspects
inthe
theverification.
verification.
It
state2
== [true,
false];
state3
[true,
true];
const
state1
= [false,
true];
account
numerical
aspects
in some
It also
alsofails
state3
=(s1,
[true,
node EQState
s2: true];
[bool,bool])
in
proving
the
equivalence
of
the
two
nodes,
but
indicates
account
numerical
aspects
in
the
verification.
It
also
state2
= [true,
false];
fails in proving the equivalence of the two nodes, but
node EQState
(s1,
s2: returns
[bool,bool])
(eq: bool);
that
the output
may may
differdiffer
when
all two
the
pfother
state3
= [true,
true];
fails
in proving
equivalence
of the
nodes,
but i
indicates
that
thethe
output
when
allinputs
the inputs
returns
(eq: bool);
let node EQState (s1, s2:
are true.
Thisoutput
is an actual
discrepancy:
in the
thisinputs
case, the
[bool,bool])
indicates
that
the
may
differ
when
all
pfotheri are true. This is an actual discrepancy: in
leteq
=(s1[0]=s2[0]) returns
and (s1[1]=s2[1]);
(eq: bool);
first version
outputs
= false
(since none
This fault
is outputs
an actual
discrepancy:
in of
pfother
this case,i are
the true.
first version
fault
= false
eq =(s1[0]=s2[0]) and (s1[1]=s2[1]);
tel
let
other
values
is
significant,
the
local
one
is
assumed
to
be
this
case,
first version
fault
= false
tel
(since
nonethe
of other
values isoutputs
significant,
the local
one is
eq =(s1[0]=s2[0]) and (s1[1]=s2[1]);
good), while the second version outputs fault = true.
(since
none
of
other
values
is
significant,
the
local
one
is
Then,
we change the definition of the state as follows:
assumed
be good),
while
thesecond
second
version
outputs
tel
It is to
probably
a bug
in the
version,
which
can be
Then, we change the definition of the state as follows:
assumed
to true.
be good),
while
the second
version
outputs
fault
=
It
is
probably
a
bug
in
the
second
fixed as follows:
Then, we change the definition of the state as follows:
fault
=
true.
probably
a bug in the second
state= state1 ->
version,
which
can Itbeisfixed
as follows:
fault
= maintain(TIME_CROSS_ROLL,
state=
state1 ->
if EQState(ps,state1)
then
version,
which
can
be
fixed
as
follows:
pfother1 or diff1)
fault
maintain(TIME_CROSS_ROLL,
=and
(pfother2 or diff2)
fault
maintain(TIME_CROSS_ROLL,
pfother1
or diff1)
= and
(pfother3
or diff3)
pfother1
diff1)
and (pfother2
or
diff2)
andnot(pfother1
andor
pfother2
and
(pfother2
or
diff2)
and
(pfother3
or
diff3)
andpfother3));
and
or diff3)
and (pfother3
not(pfother1
and pfother2
and
not(pfother1
and pfother2
and fixed
pfother3));
Now, NBac
shows
that this
version
is equivalent to
Now,
NBac
fixedpfother3));
version is equivalent to
the first
one.shows that this and
Now,
NBac
the first
one.shows that this fixed version is equivalent to
B. first
Allocation
of “reset-inhibition”
the
one.
if EQState(ps,state1)
if pre(reset) then then
state1
if pre(reset)
then state1 then state2
else if pre(from1to2)
else
else if
if pre(from1to2)
pre(from1to3) then
then state2
state3
else
pre(from1to3) then state3
else if
state1
else
else
if state1
EQState(ps,state2) then
elseififpre(from2to1)
EQState(ps,state2)
then
then state1
if
pre(from2to1)
then state1
else
if pre(from2to3)
then state3
else
pre(from2to3) then state3
else if
state2
else state2
else
elseif pre(from3to1) then state1
if
pre(from3to1)
then state1
else
state3 ;
else state3 ;
In section
VI-D we designed the “Global Allocator”,
B. Allocation
of “reset-inhibition”
B. Allocation
“reset-inhibition”
the role of which
is to prevent more than two channels to
In section VI-D we designed the “Global Allocator”, the
become VI-D
“reset-inhibited.
Wethe
can“Global
try to verify
the behavior • The property must be expressed only with Boolean variIn
section
we
designed
Allocator”,
role of
of the
which
is to prevent more
than two
channels the
to
ƒƒ property
The property
must
be expressed
only with Boolean
allocator alone: we know that nb_aut, the number • The
must be
expressed
with
variables:
the observer
receives
theonly
states
of Boolean
the channels,
role
of
which
is
to
prevent
more
than
two
channels
to
become
“reset-inhibited.
We
can
try
to
verify
the
behavior
variables: the observer receives the states of the channels,
of authorizations, should stay smaller than 2. Of course,
ables:
the observer
receives
the states
of the
channels,
and
returns
a
single
Boolean
which
is
false
when
at
become
“reset-inhibited.
We
can
try
to
verify
the
behavior
of thebecause
allocator
alone: we variables,
know thatLesar
nb_aut,
and returns a single Boolean which is false when least
at least
of numerical
is notthe
ablenumber
to proveofthis
and
returns
a single
is false when at least
three
of them
are are
inBoolean
state
3. which
of
the
allocator
alone:
we
know
that
nb_aut,
the
number
of
three
of
them
in
state
3.
authorizations,
should
stay
smaller
than
2.
Of
course,
because
property (the range of the counter being small, the node could
three of them are in state 3.
authorizations,
should
smaller
2.
course,
because
of numerical
variables,
Lesar
is notthan
able
to Of
prove
this itproperty
be rewritten
onlystay
with
Boolean
variables,
but
is neither
node verif(st1, st2, st3, st4:[bool,bool])
of
numerical
variables,
Lesar
is
not
able
to
prove
this
property
node returns
verif(st1,
st3, st4:[bool,bool])
(ok:st2,
bool);
(the range
of the nor
counter
beingNBac
small,proves
the node
could bevery
very natural,
efficient).
the property
(ok: bool);
(the
range
of about
the
being
small,but
theit node
couldvery
be
var returns
three: bool;
easily
(in
0.5 sec.).
rewritten
only
withcounter
Boolean
variables,
is neither
var three:
inhib1,bool;
inhib2, inhib3, inhib4: bool;
rewritten
only
withambitious
Boolean
variables,
it is neither
very on
natural, nor
efficient).
NBac verification
proves thebut
property
very
easily
A more
consists
in
proving,
let inhib1, inhib2, inhib3, inhib4: bool;
natural,
nor
efficient).
NBac
proves
the
property
very
easily
the integrated
(in about
0.5 sec.). system, that at most two units can become
let
-- at most 2 channels reset inhibited
(in Aabout
0.5
sec.). i.e.,
resetinhibited,
that notconsists
only are
the authorizations
more
ambitious
verification
in proving,
on the
-- ok
at =most
channels reset inhibited
not 2three;
A more
ambitious
verification
consists
in
the by
correctly
delivered,
butmost
also two
that
they are
correctly
obeyed
= not three;
--okcounting
the number of inhibited units
integrated
system,
that
at
units
canproving,
becomeon
resetthe units.
--three
counting
the number
of inhibited
units
integrated
system,
most
can become
reset=(inhib1
and inhib2
and inhib3)
or
inhibited,
i.e.,
thatthat
not atonly
aretwo
the units
authorizations
correctly
three =(inhib1
and
inhib2
and
inhib3)
or
(inhib1
and
inhib2
and
inhib4)
or
inhibited,
i.e.,
that
not
only
are
the
authorizations
correctly
The
current
version
of
NBac
is
not
able
to
perform
this
delivered, but also that they are correctly obeyed by the units.
(inhib1
(inhib1 and
and inhib2
inhib3 and
and inhib4)
inhib4) or
or
for
two
delivered,
but also
that
they
are correctly
obeyed
the units.
Theverification,
current
version
ofreasons:
NBac
is not able
to by
perform
this
(inhib1
(inhib2 and
and inhib3
inhib3 and
and inhib4)
inhib4);or
Theƒƒ current
version
of
NBac
ablenumerical
to perform
this
on for
one
hand,
there
are istoonot
many
variables,
verification,
two
reasons:
and inhib3 and inhib4);
inhib1 (inhib2
= EQState(st1,state3);
which
makes
the
symbolic
computations
very
complex.
verification,
for
two
reasons:
inhib1
inhib2 == EQState(st1,state3);
EQState(st2,state3);
• on one hand, there are too many numerical variables, An
interesting
remark
that
the variables
corresponding
inhib2
• on
one
hand, the
there
areistoo
many
numerical
variables,
inhib3 == EQState(st2,state3);
EQState(st3,state3);
which
makes
symbolic
computations
very
complex. to
inhib3
== EQState(st3,state3);
“roll”
measures,
while being
used to determine
failures,
inhib4
EQState(st4,state3);
which
makes
the
symbolic
computations
very
complex.
An interesting remark is that the variables corresponding
inhib4 = EQState(st4,state3);
tel
have no real
influence
on the
the variables
property; corresponding
the tool is not able
An
interesting
remark
is
that
to “roll” measures, while being used to determine failures,
tel
to detect
this fact,
and to “slice”
these
variables away.
to
“roll”
measures,
while
used to
determine
have
no real
influence
onbeing
the property;
the tool is failures,
not able
ƒ
ƒ
on
the
other
hand,
the
determination
of
the
suitable
Lesar proves this property in 69 sec.
have
no real
property;
tool is not
able
to detect
thisinfluence
fact, andontothe
“slice”
thesethe
variables
away.
Lesar proves this property in 69 sec.
to detect this fact, and to “slice” these variables away.
Lesar proves this property in 69 sec.
CSI Journal of Computing | Vol. 1 • No. 4, 2012
14
7 : 83
several aspects: the language itself, the design methodology,
andvalidated
the validation
tools.
and
with associated
tools. The study throws light on
The aspects:
study demonstrates
that case
study methodology,
is quite wellseveral
the language itself,
the design
suited
Lustre. tools.
A similar experiment performed with Esterel
and
thefor
validation
[RS00],
showed that athat
data-flow
language
much
The [SS03]
study demonstrates
case study
is is
quite
more natural
for most
features
of the system.
Now,
wellsuited
for Lustre.
A similar
experiment
performed
with we
restricted
ourselves
to showed
using only
thea kernel
of Lustre
which
Esterel
[RS00],
[SS03]
that
data-flow
language
is is
much
more natural
most features
of the In
system.
Now,
we
compatible
with theforindustrial
tool Scade.
the full
academic
restricted
ourselves
of Lustre
is
version of
Lustre, to
weusing
alsoonly
havethe
a kernel
powerful
notionwhich
of arrays,
compatible
with
the
industrial
tool
Scade.
In
the
full
academic
the use of which would have significantly reduced the size
version
Lustre, weall
also
have
a powerful notion
of arrays,
of the of
description:
data
symmetrically
processed
in each
the
use
of
which
would
have
significantly
reduced
the size of
channel should be structured in arrays. This usefulness
of the description: all data symmetrically processed in each
arrays, both concerning the structure of the source program,
channel should be structured in arrays. This usefulness of
and the quality of the generated code, was confirmed by other
arrays, both concerning the structure of the source program,
case studies. The most imperative part of the system is the
and the quality of the generated code, was confirmed by other
automata
used
FailDetect
reset-inhibition.
case
studies.
Theinmost
imperative for
partdefining
of the system
is the
While
the
description
of
such
small
automata
is
not difficult in
node verif(ask1, ask2, ask3, ask4, reset: bool; automata used in FailDetect for defining reset-inhibition.
st1, st2, st3, st4: [bool, bool]) While
Lustre,
would be obviously
easier
with anisextension
based
theit description
of such small
automata
not difficult
returns (ok: bool);
on
automata
[MR03].
in Lustre, it would be obviously easier with an extension
var morethantwoasks : bool;
based
on automata
[MR03].
Concerning
programming
methodology, we have tried to
req1, req2, req3, req4: bool;
promote
a
progressive
approach:
the whole
Concerning programming methodology,
wearchitecture
have tried is
inhib1, inhib2, inhib3, inhib4: bool;
to
promote
a
progressive
approach:
the
whole
architecture
designed
first,
and
its
components
are
progressively
detailed
let
isin designed
and isitsthat
components
are progressively
-- if more than 2 requests, then
turn. Thefirst,
interest
an integrated
— yet still in-- at least 2 channels in state 3
detailed
in turn.
The interest
that an integrated
yet still for
complete
—, version
of theisprogram
is always —
available
ok = (morethantworeqs =>
incomplete
—,
version
of theThis
program
is always
available
simulation
and
validation.
approach
completely
differs
not #(inhib1, inhib2, inhib3, inhib4)); for simulation and validation. This approach completely
from the “progressive refinement” generally advocated (e.g.,
differs from the “progressive refinement” generally advocated
in B [Abr95]), where a non-deterministic specification is
inhib1 = EQState(st1,state3);
(e.g., in B [Abr95]), where a non-deterministic specification
inhib2 = EQState(st2,state3);
progressively refined (i.e., made more deterministic) into a
is progressively refined (i.e., made more deterministic) into
inhib3 = EQState(st3,state3);
aprogram.
program.Here,
Here,since
since we
we use
use aa programming
programming language,
language, all
inhib4 = EQState(st4,state3);
our
descriptions
are
deterministic.
So
the
design
all our descriptions are deterministic. So theproceeds
design in
the
reverse
direction:
an
initial,
very
simplified
description,
proceeds in the reverse direction: an initial, very simplified is
req1 = if (true -> reset) then false
progressively
until itenriched
matchesuntil
the whole
specification.
else pre(req1) or ask1;
description,
is enriched
progressively
it matches
the
req2 = if (true -> reset) then false
Of course,
this pragmatic
approach
is less “formal”,
butis we
whole
specification.
Of course,
this pragmatic
approach
else pre(req2) or ask2;
less
“formal”,
believe
thatsince
it is more
realistic,
it
believe
that itbut
is we
more
realistic,
it better
meetssince
engineers
req3 = if (true -> reset) then false
better
meets
engineers
uses.
Also,
because of
the
weaknesses
uses.
Also,
because
of
the
weaknesses
of
formal
validation
else pre(req3) or ask3;
oftools,
formal
tools,state,
in their
present
early by
error
in validation
their present
early
error state,
detection
early
req4 = if (true -> reset) then false
detection
by —
early
simulation
— which
allowed by
simulation
which
is allowed
by ourisapproach
— our
seems
else pre(req4) or ask4;
approach
— seems
to be
more
practical
to be more
practical
than
early
proof.than early proof.
morethantworeqs =
Finally,
it
was
an
interesting
ourour
validation
Finally, it was an interestingchallenge
challengeforfor
validation
not #(req1, req2, req3, req4); tools. We showed, first, that the Luciole simulator is
tools.
We
showed,
first,
that
the
Luciole
simulator
is
extremely
tel
extremely useful, at each stage of the development, to quickly
useful, at each stage of the development, to quickly check
check either the whole program or a single node. Concerning
either the whole program or a single node. Concerning veriwe faced the usual problems due to lack of global
Lesarfinds
findsthat
that
this
property
false.
This
due
the
Lesar
this
property
is is
false.
This
is isdue
totothe
fact verification,
fication, we However,
faced the some
usual properties
problems due
lack of global
specification.
wereto available.
A
fact
requests
produce
state
changes
at the
next
step.
thatthat
requests
onlyonly
produce
state
changes
at the
next
step.
The specification. However, some properties were available. A
first remark is that proving properties directly on the source
The correct property must be written:
correct property must be written:
first remark
is thatinproving
properties
directly ontool
thewith
source
program
requires,
most cases,
a verification
ok = true ->
program
requires,
in
most
cases,
a
verification
tool
with
some numerical capabilities. Such capabilities are provided
(pre(morethantworeqs)
=>
ok = true ->
some
numerical
capabilities.
Suchthat
capabilities
are provided
by
NBac,
but the size
of problems
it can handle
must
(pre(morethantworeqs)
=>
not
#(inhib1, inhib2, inhib3,
inhib4));
suggests that
someit ways
in which
byincreased.
NBac, butThe
thecase
sizestudy
of problems
can handle
must
not #(inhib1, inhib2, inhib3, inhib4)); be
Now, Lesar finds, in 130 sec., that the property is
the
can help
NBac:
mentioned
already,
be user
increased.
The
case as
study
suggests
some there
ways should
in which
satisfied.
be a way of abstracting away some Boolean variables whose
Now, Lesar finds, in 130 sec., that the property is satisfied. the user can help NBac: as mentioned already, there should
dependences
numericalaway
values
obviously
influence
be a way of on
abstracting
some
Booleandon’t
variables
whose
VIII. Conclusion
the
validity
of
the
property;
it
was
the
case,
in
our
example,
dependences
on
numerical
values
obviously
don’t
influence
C ONCLUSION
We presented aVIII.
real case
study5 described in Lustre,
of the failure detections. On the other hand, the user should
the validity of the property; it was the case, in our example,
5
5
We
presented
a
real
case
study
described
in
Lustre,
and
For concision and confidentiality reasons, the case study was slightly
simplified
the paper, On
but the
it isother
still representative
the
of the
failureindetections.
hand, the userof should
validated
with
associated
complexity
of actual
system.tools. The study throws light on be able to suggest an initial control structure; for instance the
automata used to determine the reset-inhibition were obviously
5 For concision and confidentiality reasons, the case study was slightly
simplified in the paper, but it is still representative of the complexity of actual relevant to the verification. We were not able to apply our autoCSI Journal
Vol. 1because
• No. 4,
matic
testing tooloftoComputing
this example, |mainly
of 2012
missing
system.
F. Maraninchi, et. al.
An other interesting property is that the allocator allows a
maximum
number
of channels
An other
interesting
propertytois become
that the reset-inhibited.
allocator allows In
words,number
we want
that, whenever
atreset-inhibited.
least two channels
aother
maximum
of channels
to become
In
have words,
requested
become
since
thechannels
last “Onother
we to
want
that, reset-inhibited
whenever at least
two
GroundReset”,
then
at least
two of them since
(in fact,
two,
have
requested to
become
reset-inhibited
theexactly
last “Onbecause of thethen
previous
property)
are actually
GroundReset”,
at least
two of them
(in fact,reset-inhibited.
exactly two,
because
of the previous
property) are
reset-inhibited.
This property
can be expressed
by actually
the following
observer: it
This
property
can
be
expressed
by
the
following
observer: it
receives the requests from channels to become reset-inhibited,
receives
the requests from
channels
become
the OnGroundReset”
signal,
and tothe
state reset-inhibited,
of the channels.
the
OnGroundReset”
signal,
and
the
state
of signal.
the channels.
The requests are memorized from each reset
Then the
The
requests
are
memorized
from
each
reset
signal.
the
output is true iff whenever there is at least two Then
memorized
output is true iff whenever there is at least two memorized
requests, at least two channels are in state 3 (remember that, in
requests, at least two channels are in state 3 (remember that,
Lustre, #(x1,x2,...,xn) is a Boolean expression which
in Lustre, #(x1,x2,...,xn) is a Boolean expression which
is true iff at most 1 of its Boolean arguments xi is true;
so,
so, its
is true iff at most 1 of its Boolean arguments xi is true;
its
negation
expresses
that
at
least
two
of
them
are
true):
negation expresses that at least two of them are true):
7 : 84
Specification and Validation of Embedded Systems: A Case Study of a
Fault-Tolerant Data Acquisition System with Lustre Programming environment
be able to suggest an initial control structure; for instance
the automata used to determine the reset-inhibition were
obviously relevant to the verification. We were not able to
apply our automatic testing tool to this example, mainly
because of missing knowledge, both about the assumptions
on the environment and about the global properties to be
checked on the whole system. This is a challenging area of
pursuit.
Springer Verlag, October 2002.
[HLR93] N. Halbwachs, F. Lagnier, and P. Raymond.
Synchronous observers and the verification of reactive
systems. In M. Nivat, C. Rattray, T. Rus, and G. Scollo,
editors, Third Int. Conf. on Algebraic Methodology and
Software Technology, AMAST’93, Twente, June 1993.
Workshops in Computing, Springer Verlag.
[HPR97]
N. Halbwachs, Y.E. Proy, and P. Roumanoff. Verification
of realtime systems using linear relation analysis.
Formal Methods in System Design, 11(2):157–185,
August 1997.
[JHR99]
B. Jeannet, N. Halbwachs, and P. Raymond. Dynamic
partitioning in analyses of numerical properties. In A.
Cortesi and G. Fil´e, editors, Static Analysis Symposium,
SAS’99, Venice (Italy), September 1999. LNCS 1694,
Springer Verlag.
References
[Abr95]
J.-R. Abrial. The B-Book. Cambridge University Press,
[BB91] 1995. A. Benveniste and G. Berry. Another look at realtime programming. Special Section of the Proceedings of
the IEEE, 79(9), September 1991.
[BCE+03]
A. Benveniste, P. Caspi, S.A. Edwards, N. Halbwachs,
P. Le Guernic, and R. de Simone. The synchronous
languages 12 years later. Proceedings of the IEEE, 91(1),
January 2003.
[BCM+90] J.R. Burch, E.M. Clarke, K.L. McMillan, D.L. Dill, and
J. Hwang. Symbolic model checking: 1020 states and
beyond. In Fifth IEEE Symposium on Logic in Computer
Science, Philadelphia, pages 428–439, June 1990.
[JHR+07] E. Jahier, N. Halbwachs, P. Raymond, X. Nicollin,
and D. Lesens. Virtual execution of AADL models via
a translation into synchronous programs. In EMSOFT
2007, Salzburg, Austria, 2007.
[JRB04]
E. Jahier, P. Raymond, and P. Baufreton. Case studies
with Lurette V2. In First International Symposium on
Leveraging Applications of Formal Method, ISoLa 2004,
Paphos, Cyprus, October 2004.
[MG00]
F. Maraninchi and F. Gaucher. Step-wise + algorithmic
debugging for reactive programs: Ludic, a debugger
for lustre. In AADEBUG’2000 – Fourth International
Workshop on Automated Debugging, Munich, aug 2000.
[MR03]
F. Maraninchi and Y. R´emond. Mode-automata: a new
domainspecific construct for the development of safe
critical systems. Science of Computer Programming,
46(3):219–254, 2003.
[RHR91]
[CGP99] P. Caspi, A. Girault, and D. Pilaud. Automatic
distribution of reactive systems for asynchronous
networks of processors. IEEE Transactions on Software
Engineering, 25(3):416–427, 1999. Research report
INRIA 3491.
C. Ratel, N. Halbwachs, and P. Raymond. Programming
and verifying critical systems by means of the
synchronous dataflow programming language LUSTRE.
In ACM-SIGSOFT’91 Conference on Software for
Critical Systems, New Orleans, December 1991.
[RS00]
B. Rajan and R. K. Shyamasundar. A fault tolerant
distributed system in Multiclock Esterel. In FORTE/
PSTV, pages 301–316. North Holland, October 2000.
[CHPP87] P. Caspi, N. Halbwachs, D. Pilaud, and J. Plaice.
LUSTRE, a declarative language for programming
synchronous systems. In 14th Symposium on Principles
of Programming Languages, Munich, January 1987.
[RWNH98] P. Raymond, D. Weber, X. Nicollin, and N. Halbwachs.
Automatic testing of reactive systems. In 19th IEEE
Real-Time Systems Symposium, Madrid, Spain,
December 1998.
[CMSW99] P. Caspi, C. Mazuet, R. Salem, and D. Weber. Formal
design of distributed control systems with Lustre. In
Proc. Safecomp’99, volume 1698 of Lecture Notes in
Computer Science. Springer Verlag, September 1999.
[SBT96]
T. R. Shiple, G. Berry, and H. Touati. Constructive
analysis of cyclic circuits. In International Design and
Testing Conference IDTC’96, Paris, France, 1996.
[SC04]
N. Scaife and P. Caspi. Integrating model-based design
and preemptive scheduling in mixed time- and eventtriggered systems. In Euromicro conference on RealTime Systems (ECRTS’04), Catania, Italy, June 2004.
[SS03]
A.D. Shabbir and R. K. Shyamasundar. Specification
of fault tolerant Gyroscopic controller in Esterel.
In Internal Report of External Funded Project. TIFR,
2003.
[BS91]
F. Boussinot and R. de Simone. The Esterel language.
Proceedings of the IEEE, 79(9):1293–1304, September
1991.
[Cas01]
P. Caspi. Embedded control: From asynchrony to
synchrony and back. In 1st International Workshop
on Embedded Software, EMSOFT2001, Lake Tahoe,
October 2001. LNCS 2211.
[CC77]
[CPP05]
[HB02]
P. Cousot and R. Cousot. Abstract interpretation: a
unified lattice model for static analysis of programs by
construction or approximation of fixpoints. In 4th ACM
Symposium on Principles of Programming Languages,
POPL’77, Los Angeles, January 1977.
Jean-Louis Colac¸o, Bruno Pagano, and Marc Pouzet. A
Conservative Extension of Synchronous Data-flow with
State Machines. In ACM International Conference on
Embedded Software (EMSOFT’ 05), Jersey city, New
Jersey, USA, September 2005.
N. Halbwachs and S. Baghdadi. Synchronous modeling
of asynchronous systems. In EMSOFT’02. LNCS 2491,
CSI Journal of Computing | Vol. 1 • No. 4, 2012
7 : 85
F. Maraninchi, et. al.
About the Authors
Florence Maranichi, is a Professor, Grenoble INP / ENSIMAG (Director of International
Relations and is in charge of the “Embedded Software and Systems” master curriculum) at
VERIMAG laboratory (Head of the group Synchronous Languages and Reactive Systems)
Dr. Nicolas Halbwachsis Directeur de Recherche at CNRS, and Director of Verimag
Laboratory
Dr. Pascal Raymond is Chargé de Recherche au CNRS.
Dr. Catherine Parent-Vigouroux is an Associate Professor at University Joseph Fourier,
Grenoble
Prof. R. K. Shyamasundar, Fellow IEEE, Fellow ACM is a Senior Professor and JC Bose
National Fellow at TIFR.
CSI Journal of Computing | Vol. 1 • No. 4, 2012
Download