10-11-pike

Copilot:
A Hard Real-Time Runtime Monitor
Lee Pike | Galois, Inc. | leepike@galois.com
joint work with
Alwyn Goodloe | National Institute of Aerospace
Robin Morisset | École Normale Supérieure
Sebastian Niller | Technische Universität Ilmenau
Need
How do you know your embedded SW won’t fail?
 Certification (e.g., DO-178B) is largely process-oriented
 Probably not formally verified
• Even if so, may make bad assumptions
• E.g., faults, hardware behavior, program semantics, timing, etc.
 Unanticipated faults lead to unexpected behavior
 Unanticipated behavior can put life at risk
Need to detect/respond at runtime
© 2010 Galois, Inc.
Just the FaCTS, Ma’am
The Constraints
A runtime monitoring solution must satisfy the FaCTS:
 Functionality: don’t change the target’s behavior
 Certifiability: don’t require re-certification (e.g., DO-178B)
of the target
 Timing: don’t interfere with the target’s timing
 SWaP: don’t exhaust size, weight, power reserves
State of Affairs
of Embedded SW Runtime Verification
 Most runtime monitoring approaches violate at least one of the
FaCTS constraints:
• Inlining monitors changes timing properties and makes a new
program (re-certification?)
• Few constant-time/space monitor synthesis tools
 Embedded C has not received the same attention as Java
Outline
1. A solution: the Copilot language and compiler
2. Pilot study [1]: injecting software faults in a fault-tolerant airspeed system
3. Monitor correctness
4. Conclusions
[1] Pun intended.
Copilot Design (1/2)
 Target: hard real-time embedded systems
• Common for guidance, navigation, and control systems
 Monitoring by sampling data at fixed time period
• Could be C variables/arrays, memory-mapped registers, etc.
• Adequate for hard real-time target programs, driven primarily by
time, not events
• Doesn’t require inlining monitors
Copilot Design (2/2)
 Stream / data-flow specification language
• Think: Haskell infinite lists
 Generates embedded ANSI C99 programs
• Can run “bare” on minimal microprocessors
 Does not modify or instrument target software
• Generates its own schedule: no RTOS needed
– Distributes monitor processing across the schedule
• But could run as one high-priority RTOS task
 Constant-time & constant-space code
• Very small data footprints
• Easier WCET estimation
 Side effects impossible
• A tenet of dataflow languages
Copilot Implementation:
the eDSL Approach
 A Haskell embedded domain-specific language (eDSL), e.g., a library
 Types, parser, lexer, macro language... for free
• ...And probably correct
 Atom eDSL is the backend
• http://hackage.haskell.org/package/atom
• Originally developed by Tom Hawkins (huge thanks to Tom!)
• Very useful itself for writing more procedural real-time C code
 “Compile” and “interpret” are functions
• Interpreter (almost) for free
[Figure labels: Haskell, macro language, Copilot, front-end, Atom, interpreter]
Stream Semantics
(Append)
 x .= [0, 1, 2] ++ (varW64 x + 3)    (in Copilot; all operators are lifted in Copilot)
 f [0, 1, 2]    (in Haskell)
  where f :: [Word64] -> [Word64]
        f x = x ++ f (map (+3) x)
 x = 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ...
 x = [0, 1, 2] --(+3)--> [3, 4, 5] --(+3)--> [6, 7, 8] ...
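One way to see why such a stream needs only constant space is a small C sketch. This is not Copilot's generated code; the names stream_x, x_init, and x_step are hypothetical, and the buffer size is simply assumed to equal the length of the initial list [0, 1, 2].

```c
#include <stdint.h>

/* Hypothetical sketch of x .= [0, 1, 2] ++ (x + 3); not Copilot output. */
#define X_HISTORY 3u /* assumed: length of the initial list [0, 1, 2] */

typedef struct {
    uint64_t buf[X_HISTORY]; /* the next X_HISTORY values of x */
    uint32_t head;           /* index of the current value */
} stream_x;

static void x_init(stream_x *s) {
    s->buf[0] = 0;
    s->buf[1] = 1;
    s->buf[2] = 2;
    s->head = 0;
}

/* Return the current value of x and advance the stream by one step. */
static uint64_t x_step(stream_x *s) {
    uint64_t cur = s->buf[s->head];
    /* x(n+3) = x(n) + 3: reuse the consumed slot for the value three steps ahead. */
    s->buf[s->head] = cur + 3;
    s->head = (s->head + 1u) % X_HISTORY;
    return cur;
}
```

Each call to x_step does a fixed amount of work on a fixed-size buffer, which is the property the append operator is meant to preserve.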
Stream Semantics
(Drop)
 x .= [0, 1, 2] ++ (var x + 3)
 y .= drop 2 (var x)
 x = 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ...
 y = 2, 3, 4, 5, 6, 7, 8, 9, 10 ...
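A drop can be read as an index offset into the history buffer of the stream it refers to. The C sketch below illustrates this under the assumption that x keeps exactly its three initial values as history; the names xy_state, xy_init, x_current, y_current, and xy_advance are hypothetical and are not taken from Copilot.

```c
#include <stdint.h>

/* Hypothetical sketch of x .= [0, 1, 2] ++ (x + 3) and y .= drop 2 x. */
#define HIST 3u /* assumed: x keeps three values of history */

typedef struct {
    uint64_t buf[HIST]; /* holds x(n), x(n+1), x(n+2) in ring order */
    uint32_t head;      /* index of x(n) */
} xy_state;

static void xy_init(xy_state *s) {
    s->buf[0] = 0;
    s->buf[1] = 1;
    s->buf[2] = 2;
    s->head = 0;
}

/* Current value of x. */
static uint64_t x_current(const xy_state *s) {
    return s->buf[s->head];
}

/* Current value of y = drop 2 x: the value of x two steps ahead. */
static uint64_t y_current(const xy_state *s) {
    return s->buf[(s->head + 2u) % HIST];
}

/* Advance both streams by one step. */
static void xy_advance(xy_state *s) {
    s->buf[s->head] += 3; /* x(n+3) = x(n) + 3 */
    s->head = (s->head + 1u) % HIST;
}
```

In this sketch y needs no storage of its own, and a drop larger than 2 could not be served from the three-element history.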
Timed Semantics
 Period: duration between discrete events
 Phase: offsets into the period
x1(); x2();
 Example:
• x: period 4 phase 1
• y: period 4 phase 3
 Copilot ensures synchronization between streams
• Assuming synchronization of phases in distributed systems: no
non-faulty processor reaches the start of phase p+1 until every
non-faulty processor has started phase p
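Period and phase can be illustrated with a minimal C sketch of a tick-driven schedule. It is an assumption-laden illustration rather than the schedule Copilot or Atom generates: the rule struct, the tick function, and the counters x_runs and y_runs are all hypothetical.

```c
#include <stdint.h>

/* Hypothetical rule descriptor: run when clock % period == phase. */
typedef struct {
    uint32_t period;   /* ticks between activations; must be > 0 */
    uint32_t phase;    /* offset into the period; must be < period */
    void (*run)(void); /* work to perform at that phase */
} rule;

/* Counters standing in for the real stream updates. */
static uint32_t x_runs, y_runs;
static void update_x(void) { x_runs++; }
static void update_y(void) { y_runs++; }

static const rule schedule[] = {
    { 4u, 1u, update_x }, /* x: period 4, phase 1 */
    { 4u, 3u, update_y }, /* y: period 4, phase 3 */
};

/* Call once per tick of the global clock. */
static void tick(uint64_t clock) {
    for (unsigned i = 0; i < sizeof schedule / sizeof schedule[0]; i++) {
        if (clock % schedule[i].period == schedule[i].phase) {
            schedule[i].run();
        }
    }
}
```

With this schedule, x updates at ticks 1, 5, 9, ... and y at ticks 3, 7, 11, ..., so the work is spread across the period rather than done in one burst.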
Example Copilot Specification
“If the temperature rises more than 2.3 degrees within 2
seconds, then the engine has been shut off.”
(period == 1 sec)
engine :: Streams
engine = do
  temps        .= [0, 0, 0] ++ extF temp 1
  overTempRise .= drop 2 (varF temps) > 2.3 + var temps
  trigger      .= var overTempRise ==> extB shutoff 2
Annotations:
• [0, 0, 0]: initial “don’t care” values
• temp, shutoff: external variables
• 1 and 2 (last argument of extF / extB): phase to sample in
• Variables are strings:
  x :: String
  x = "x"
Sample Code Generated
(Incomplete)
engine :: Streams
engine = do
  ...
  trigger .= (var overTempRise) ==> (extB shutoff 2)

External variable sample function:
/* engine.sample__shutoff_2 */
static void __r6() {
  bool __0 = true;
  bool __1 = shutoff;
  if (__0) {
  }
  engine.tmpSampleVal__shutoff_2 = __1;
}

State-update function for trigger stream:
/* engine.updateOutput__trigger */
static void __r0() {
  bool __0 = true;
  bool __1 = engine.tmpSampleVal__shutoff_2;
  bool __2 = ! __1;
  float __3 = 2.3F;
  uint64_t __4 = 0ULL;
  uint64_t __5 = engine.outputIndex__temps;
  uint64_t __6 = __4 + __5;
  uint64_t __7 = 4ULL;
  uint64_t __8 = __6 % __7;
  float __9 = engine.prophVal__temps[__8];
  float __10 = __3 + __9;
  uint64_t __11 = 2ULL;
  uint64_t __12 = __11 + __5;
  uint64_t __13 = __12 % __7;
  float __14 = engine.prophVal__temps[__13];
  bool __15 = __10 < __14;
  bool __16 = __2 && __15;
  bool __17 = ! __16;
  if (__0) {
  }
  engine.outputVal__trigger = __17;
}
Types
 Types: Int & Word (8, 32, 64), Float, Double
 Each stream has a unique type:
x .= ([0, 1] ++ (varW64 x + 3) :: Spec Word64)
 Type-inference to minimize “type hints”:
x .= extI32 a 1 > extI32 b 2
y .= [0, 1] ++ (varW64 y + 3 + var y)
 Casting
• Implicit casting is a type-error:
x .= varW64 y + varW32 z
• Explicit casting guarantees:
– signs never lost (no Int --> Word casts)
– No overflow (no cast to a smaller width)
x .= [True] ++ not (var x)
y .= castI32 (varB x) + castI32 (castW8 (varB x))
“Sir, it’s eDSLs all the way down!” (1/2)
 Don’t like the stream language interface? Then define
another DSL over top.
 We’ve done just that for
• LTL
• past-time LTL
• Statistical properties (in-progress)
 Think of these either as new DSLs or as Copilot libraries.
 Because they’re all in the same host-language, they can be
intermingled.
“Sir, it’s eDSLs all the way down!” (2/2)
Example: “If the engine temperature exceeds 250 degrees,
then the engine is shut off within one second, and in the
0.1 second following the shutoff, the cooler is engaged and
remains engaged.”
(period == 0.1 sec)
engine :: Streams
engine = do
  temp    `ptltl` alwaysBeen (extW8 engineTemp 1 > 250)
  cnt     .= [0] ++ mux (varB temp && varW8 cnt < 10)
                        (varW8 cnt + 1)
                        (varW8 cnt)
  off     .= (varW8 cnt >= 10 ==> extB engineOff 1)
  cooler  `ptltl` (extB coolerOn 1 `since` extB engineOff 1)
  monitor .= varB off && varB cooler
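Past-time LTL operators such as alwaysBeen and since can be evaluated with one bit of state each, which is why they fit a constant-space monitor. The C sketch below uses the standard recurrences (alwaysBeen p holds now iff it held before and p holds now; p since q holds now iff q holds now, or p holds now and p since q held before). The names ptltl_state, ptltl_init, always_been_step, and since_step are hypothetical, and the exact semantics of the Copilot ptLTL library are assumed rather than confirmed here.

```c
#include <stdbool.h>

/* One bit of state per past-time operator (hypothetical layout). */
typedef struct {
    bool always_been; /* previous value of alwaysBeen p */
    bool since;       /* previous value of p since q */
} ptltl_state;

static void ptltl_init(ptltl_state *s) {
    s->always_been = true; /* vacuously true before any sample */
    s->since = false;      /* q has not been observed yet */
}

/* alwaysBeen p: p has held at every step so far, including this one. */
static bool always_been_step(ptltl_state *s, bool p) {
    s->always_been = s->always_been && p;
    return s->always_been;
}

/* p since q: q held at some step, and p has held at every step after it. */
static bool since_step(ptltl_state *s, bool p, bool q) {
    s->since = q || (p && s->since);
    return s->since;
}
```

For the cooler property above, p would be the sampled coolerOn value and q the sampled engineOff value at each period.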
Copilot Language Restrictions
Design goal: make memory usage constant and “obvious” to
the programmer
 No anonymous streams
 No lazily-computed values
• E.g. x .= [0] ++ (varW16 x) + 1
       y .= drop 2 (varW16 x)
 Other restrictions (see paper)
 Upshot: “WYSIWYG memory usage”
• Memory constrained by number of streams
• Memory for each stream is essentially the LHS of ++
• Doesn’t include stack variables
Timing Info & Expression Counts
Timing info and expression count:
Period  Phase  Exprs  Rule
------  -----  -----  ----
3       0      18     engine.updateOutput__trigger
3       0      14     engine.updateOutput__overTempRise
3       0      3      engine.update__temps
3       1      7      engine.output__temps
3       1      2      engine.sample__temp_1
3       2      6      engine.incrUpdateIndex__temps
3       2      2      engine.sample__shutoff_2
               -----
               52
Hierarchical Expression Count (expression count helps with WCET analysis):
Total  Local  Rule
-----  -----  ----
52     0      engine
6      6        incrUpdateIndex__temps
7      7        output__temps
2      2        sample__shutoff_2
2      2        sample__temp_1
14     14       updateOutput__overTempRise
18     18       updateOutput__trigger
3      3        update__temps
Generated engine.c and engine.h
Moving engine.c and engine.h to ./
Calling the C compiler ...
gcc ./engine.c -o ./engine -Wall
...
Interpreter / Semantics
(in one slide)
interpret :: Streamable a => Vars -> Vars -> Spec a -> [a]
interpret inVs moVs s =
  case s of
    Const c          -> repeat c
    Var v            -> getElem v inVs
    PVar _ v _       -> getElem v moVs
    PArr _ (v,s') _  -> map (\i -> getElem v moVs !! fromIntegral i)
                            (interpret inVs moVs s')
    Append ls s'     -> ls ++ interpret inVs moVs s'
    Drop i s'        -> drop i $ interpret inVs moVs s'
    F f _ s'         -> map f $ interpret inVs moVs s'
    F2 f _ s0 s1     -> zipWith f (interpret inVs moVs s0)
                                  (interpret inVs moVs s1)
    F3 f _ s0 s1 s2  -> zipWith3 f (interpret inVs moVs s0)
                                   (interpret inVs moVs s1)
                                   (interpret inVs moVs s2)
Download, Develop, Use
http://leepike.github.com/Copilot/
BSD3
Just a cabal install away
Usage
 compile spec “c-name” [opts] baseOpts
 interpret spec rounds [opts] baseOpts
 test rounds [opts] baseOpts
• quickChecking the compiler/interpreter
 verify filepath int
• SAT solving on the generated C program
 help (commands and options)
 [spec] (parser)
 Opts (incomplete list):
• C trigger functions
• Ad-hoc C code (library included for writing this)
• Hardware clock
• Verbosity
• GCC options
Interlude: Pitot Failures
Failures cited in:
 Northwest Orient Airlines Flight 6231 (1974): 3 killed
• Increased climb/speed until uncontrollable stall
 Birgenair Flight 301, Boeing 757 (1996): 189 killed
• One of three pitot tubes blocked; faulty air speed indicator
 Aeroperú Flight 603, Boeing 757 (1996): 70 killed
• Tape left on the static port(!) gave erratic data
 Líneas Aéreas Flight 2553, Douglas DC-9 (1997): 74 killed
• Freezing caused spurious low reading, compounded with a
failed alarm system
• Speed increased beyond the plane’s capabilities
 Air France Flight 447, Airbus A330 (2009): 228 killed
• Airspeed “unclear” to pilots
• Still under investigation
Not an exhaustive list!
Test Bed
 Representative of fault-tolerant systems
 4 x STM microcontrollers
 ARM Cortex M3 cores clocked at 72 MHz
 MPXV5004DP differential pressure sensor
• Senses dynamic and static pitot tube pressure
• Pitot tubes measure airspeed
 Designed to fit UAS (unpiloted air system)
• Size, power, weight,...
Test Bed Architecture
Aircraft Configuration (1/2)
Edge 540T-R1
Aircraft Configuration (2/2)
Copilot Monitors
Introduced software faults to be caught by Copilot monitors:
 Abrupt airspeed change:
Δ airspeed > 100 m/s
 Majority assumption:
• Used the Boyer-Moore majority vote algorithm
• Finds majority in exactly one pass...
• ...If a majority exists, arbitrary value otherwise
• Monitor checks if returned value is majority
 Voting agreement:
• Check agreement between the voted values
• Uses coordinating distributed monitors
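The majority-assumption monitor can be illustrated with a plain C sketch of the Boyer-Moore majority vote followed by the check that the returned value really is a majority. This is not the flight code or a Copilot specification; the function names majority_candidate and is_majority and the use of int32_t readings are assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Boyer-Moore majority vote: one pass, constant extra space.
 * Returns the majority value if one exists, an arbitrary element otherwise
 * (0 for an empty array). */
static int32_t majority_candidate(const int32_t *v, size_t n) {
    int32_t cand = 0;
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (count == 0) {
            cand = v[i]; /* adopt a new candidate */
            count = 1;
        } else if (v[i] == cand) {
            count++;     /* one more vote for the candidate */
        } else {
            count--;     /* cancel one vote against a differing value */
        }
    }
    return cand;
}

/* Monitor check: does cand occur in strictly more than half of the entries? */
static bool is_majority(const int32_t *v, size_t n, int32_t cand) {
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        if (v[i] == cand) {
            hits++;
        }
    }
    return 2 * hits > n;
}
```

The second function is the part a monitor would evaluate: the vote itself never reports that no majority exists, so the assumption has to be checked separately.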
Now Execute the Test Suite
Monitoring Results
 Monitoring approach worked and did not disrupt the FaCTS
properties of the observed system
• Under 100 C expressions
• No more than 16 expressions per phase
• Binaries on the order of 10k
 Monitoring via sampling works for periodic tasks
 Off-by-one (phase) in one monitor, so spurious data
• Don’t write monitors at 2am before the flight test :)
Correctness (1/2)
Who watches the watchmen?
 -Wall (Haskell)
 -Wall (gcc)
• Of course, you can use lint, valgrind, etc. if you like
 Copilot is typed (for free!) by Haskell:
• Statically typed: type-checking at compile time
• Strongly typed: no implicit type conversions
– Makes hacks in the compiler hard---this is good
 “QuickCheck” equivalence checker between the Copilot
interpreter and generated C code.
• Caught a subtle dependency bug
Correctness (2/2)
Using CBMC “out of the box”
 Bounded Model Checker for ANSI-C
• Google: cbmc ansi-c
 Provide a depth to unroll the Copilot program
 Proves absence of
• buffer overflows (both lower and upper)
• dereferences of NULL pointers
• division by zero
• floating point computations resulting in not-a-number (NaN)
• use of uninitialized local variables
“Why not just use Esterel?”
x Not certified
 Open-source (BSD3), free
• A good platform for open research & education
• Extensible (e.g., LTL libraries)
 Focused on monitoring, not control
 Correctness:
• Copilot: ~3k LOCs, Atom: ~2k LOCs
• Working on full proof of correspondence
 Need to confirm:
• Smaller memory footprint
• More control over memory usage, scheduling, etc.
Future Work
 Equivalence checking: interpreter <--> C code
• Using SAT-solving on AIG structures, à la Cryptol (e.g., Erkök,
Carlsson, Wick. Hardware/software co-verification of cryptographic
algorithms using Cryptol, FMCAD, 2009)
 The steering problem
• Mode change
 Fault-tolerant monitor generation
• Monitors need 10^-9 failures/hour reliability, too!
 Programmer assistance with scheduling
 Sampling rate vs. data history
 MC/DC coverage (should be nearly trivial)
Conclusions
 Problem space: hard real-time embedded C
• Just the FaCTS, ma’am:
– Functionality, certifiability, timing, SWaP
• Monitoring by sampling
 Goal: build a language/compiler that
• Is expressive enough -- e.g., ptLTL properties
• But enforces austere memory usage requirements
 Use an eDSL approach
• Improve the probability of correctness
• Reduce the overhead of a new compiler
Shameless plug: We’re looking for a 2010 summer intern: email
<leepike@galois.com>
Acknowledgements: Thanks to LaRC’s Safety Critical Aeronautics
Systems Branch/D320 for allowing us to fly with them. This work is
supported by Contract NNL08AD13T and monitored by Dr. Ben Di Vito.
Appendix
Monitoring By Sampling
Without inlining monitors, we must sample:
 Property (011)*
 False positive (monitor misses a fault):
• Values are 0111011 but sampling 011011
 False negative (monitor signals a fault that didn’t occur):
• Values are 011011 but sampling 0111011
 Observation: with fixed periodic schedule and shared clock
• False negatives impossible
– We don’t want to re-steer an unbroken system
• False positives possible, but require constrained misbehavior
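The two sampling errors can be reproduced with a short C sketch. It assumes the property (011)* is checked as "the observed bits form a prefix of 011011...", and it models sampling as picking positions out of the true value sequence; conforms_011 and sample are hypothetical helper names.

```c
#include <stdbool.h>
#include <stddef.h>

/* True if bits[0..n-1] is a prefix of the repeating pattern 0,1,1. */
static bool conforms_011(const int *bits, size_t n) {
    for (size_t i = 0; i < n; i++) {
        int expected = (i % 3 == 0) ? 0 : 1;
        if (bits[i] != expected) {
            return false;
        }
    }
    return true;
}

/* What the monitor sees: out[k] = values[idx[k]] for each sampled position. */
static void sample(const int *values, const size_t *idx, size_t n_idx, int *out) {
    for (size_t k = 0; k < n_idx; k++) {
        out[k] = values[idx[k]];
    }
}
```

Skipping position 3 of 0111011 yields 011011, so a real violation goes unseen; sampling position 2 of 011011 twice yields 0111011, so a violation is reported that never occurred. With a fixed periodic schedule and a shared clock, the second case (a repeated sample) is the one the slide rules out.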
Pitot Data
Distributed Monitors