The Specification-Consistent Coordination Model (SCCM)
and its applications to Byzantine Failures
The Byzantine Failure Problem
• In a large multi-processor, internal breakdowns
are expected to be common events.
• Most of these breakdowns will result in complex
behavior where the processor will return incorrect
output.
• Given the source code of the program, we want to
be able to detect and recover from such failures.
• SCCM may be a way to do this.
The Goals of SCCM
• SCCM was originally intended as an aid to
programmers.
• It divides the programming task into two stages:
• A Specification that rigorously defines the algorithm.
• A Coordination, which defines the actual imperative
program and associates it with the Specification to
ensure correctness.
• SCCM employs a runtime checker to ensure that
the imperative program being executed matches
the specification.
SCCM – Specification
• SCCM defines the algorithm via a functional
language.
• For every piece of information that arises during
the algorithm's lifetime, there is a function with a
particular argument value to identify it.
• Example: sum of all the numbers in array A:
input A[]
sum(0) = 0
sum(i) = sum(i-1) + A[i]
output sum(A.length)
• Each intermediate value is identified by sum(i)
SCCM – Coordination
• An imperative version of this program:
var ImpSum = 0
Array A[] = {1, 2, 3, 4, 5, 6}
for i=1 to A.length
    ImpSum = ImpSum + A[i]
Output ImpSum
• We can ensure that this program matches the
Specification by associating with every value of
ImpSum a corresponding identifier sum(i).
Named Values
• SCCM works by naming each piece of mutable
storage with some f(x) from the specification.
• It maintains correctness by ensuring that when the
imperative program overwrites values, it
transforms their names in a way consistent with
the specification.
• Because all values have names and names may
only be transformed in consistent ways, SCCM
ensures that the implementation's control flow is
the same as in the specification.
Summation's Named Values
• In the command ImpSum = ImpSum + A[i],
we use the definition of sum() to transform
ImpSum's name from sum(i-1) to sum(i).
• sum(): sum(i) = sum(i-1) + A[i]
• ImpSum's values and names:
ImpSum's Value:  0       1       3       6       10      15      21
ImpSum's Name:   sum(0)  sum(1)  sum(2)  sum(3)  sum(4)  sum(5)  sum(6)
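The name tracking above can be sketched in Python. This is only an illustration, not SCCM syntax: the `Named` record, the tuple encoding of names such as `("sum", 3)` for sum(3), and the function names are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Named:
    """A value together with its specification name (hypothetical encoding)."""
    value: int
    name: tuple  # ("sum", 3) stands for sum(3)


def spec_sum(A, i):
    """Specification: sum(0) = 0, sum(i) = sum(i-1) + A[i], with A indexed from 1."""
    return 0 if i == 0 else spec_sum(A, i - 1) + A[i - 1]


def checked_sum(A):
    """Imperative loop whose accumulator carries a name that is checked each step."""
    imp_sum = Named(0, ("sum", 0))
    trace = [(imp_sum.value, imp_sum.name)]
    for i in range(1, len(A) + 1):
        # The rule sum(i) = sum(i-1) + A[i] may only be applied to a value
        # that is currently named sum(i-1).
        if imp_sum.name != ("sum", i - 1):
            raise RuntimeError(f"expected sum({i - 1}), found {imp_sum.name}")
        imp_sum = Named(imp_sum.value + A[i - 1], ("sum", i))
        trace.append((imp_sum.value, imp_sum.name))
    return imp_sum, trace


result, trace = checked_sum([1, 2, 3, 4, 5, 6])
```

For the array {1, 2, 3, 4, 5, 6}, `trace` lists each value of ImpSum next to the name it carried at that point.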
Fibonacci Sequence
• The algorithm specification for the Fibonacci
Sequence is simple:
fib(0) = 1
fib(1) = 1
fib(i) = fib(i-1) + fib(i-2)
• An implementation will have to name each of its
values with some fib(i) and only use the above
rule to transform values.
Fibonacci – Specification
• The source code of the specification of the
Fibonacci Sequence algorithm.
Fibonacci – Coordination
The Consistency Link
• Full Application: “A:=fib(0)”
• Here, fib() is actually called with 0 as the
argument and its return value, 1, is assigned to A.
fib(0) is now A's name.
• Fetch Application: “C:=fib(i) <- A”
• The value and name of A are copied into C.
SCCM makes sure that before the copy, A's name
is fib(i).
The Consistency Link 2
• Step Application:
“B:=fib(i+2)|(fib.l1 <-A,
fib.l2 <-C)”
• fib() is executed to obtain the value of B.
• Rather than wastefully calling fib(i+1) and fib(i)
recursively, SCCM pulls those values from A and C.
• It ensures that A's name is fib(i+1) and C's name
is fib(i).
• Thus, B gets its value and SCCM ensures that proper
control flow was maintained.
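The three kinds of application can be sketched in Python as follows. This is a rough model rather than SCCM itself: `Named`, `ConsistencyError`, the function names, and the tuple encoding of names are all assumptions introduced for the example.

```python
from dataclasses import dataclass


class ConsistencyError(Exception):
    """Raised when a value does not carry the name the program claims."""


@dataclass
class Named:
    value: int
    name: tuple  # ("fib", 3) stands for fib(3)


def fib_spec(i):
    """Specification from the slides: fib(0) = fib(1) = 1."""
    return 1 if i < 2 else fib_spec(i - 1) + fib_spec(i - 2)


def full_application(i):
    # "A := fib(0)": actually evaluate the specification function.
    return Named(fib_spec(i), ("fib", i))


def fetch_application(src, i):
    # "C := fib(i) <- A": copy value and name after checking the source's name.
    if src.name != ("fib", i):
        raise ConsistencyError(f"expected fib({i}), found {src.name}")
    return Named(src.value, src.name)


def step_application(i, l1, l2):
    # "B := fib(i+2) | (fib.l1 <- A, fib.l2 <- C)": apply the rule once,
    # taking fib(i+1) and fib(i) from storage instead of recursing.
    if l1.name != ("fib", i + 1) or l2.name != ("fib", i):
        raise ConsistencyError(f"bad operands for fib({i + 2})")
    return Named(l1.value + l2.value, ("fib", i + 2))


def checked_fib(n):
    """Compute fib(n) for n >= 1 iteratively, checking names at every step."""
    if n < 1:
        raise ValueError("n must be at least 1")
    c = full_application(0)
    a = full_application(1)
    for i in range(n - 1):
        b = step_application(i, a, c)
        c = fetch_application(a, i + 1)
        a = fetch_application(b, i + 2)
    return a
```

Passing the operands in the wrong order, for example `step_application(0, c, a)` with `c` named fib(0), raises `ConsistencyError` in this sketch.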
Potential Coding Errors
• Errors in the imperative program are caught.
• Example: Setting loop bounds to (0,n) rather than
(0, n-1) results in fib(n+1) being output rather
than fib(n). SCCM detects this error.
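A minimal Python sketch of how a name check could expose the loop-bound error. The function `run`, its `upper` parameter, and the tuple names are hypothetical. The loop uses inclusive bounds (0, upper) to mirror the slide.

```python
class ConsistencyError(Exception):
    """Raised when the output does not carry the expected name."""


def run(n, upper):
    """Iterate over the inclusive bounds (0, upper) and output a value
    that must be named fib(n)."""
    a, a_name = 1, ("fib", 0)
    b, b_name = 1, ("fib", 1)
    for _ in range(0, upper + 1):
        k = a_name[1]
        if b_name != ("fib", k + 1):
            raise ConsistencyError(f"expected fib({k + 1}), found {b_name}")
        # One step of fib(k+2) = fib(k+1) + fib(k); names move with the values.
        a, a_name, b, b_name = b, b_name, a + b, ("fib", k + 2)
    if a_name != ("fib", n):
        raise ConsistencyError(f"output should be fib({n}), found {a_name}")
    return a
```

With n = 5, `run(5, 4)` returns a value named fib(5), while `run(5, 5)` performs one extra iteration and is rejected because the output is named fib(6).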
• In general, it is hard to make errors in both the
specification and the coordination that match each
other.
SCCM Message Passing
• SCCM allows us to create parallel programs via
message passing.
• We can send and receive SCCM named values,
with SCCM ensuring global adherence to the
specification.
• Both the Send and the Receive are checked.
SCCM – Send
• Sample Send:
send n() <- N to <destination>
endsend
• The value of N, named n(), is sent out.
• We can send out single elements or lists of
elements.
• SCCM makes sure that the values sent out
actually have the names the Send command
claims them to have.
SCCM – Receive
• Sample Receive:
recv
match n() := N
endrecv
• The value of N that was sent on the previous slide is
received by the target processor.
• N's value must be named n(), just like in the send.
• All receives (as far as I can tell) are of the ReceiveAny kind.
Another Send Example
• Sample send command:
send a(i, 2*i) <- A(i, i)
for i in (1,3)
to <destination>
endsend
• The contents of three diagonal elements of A[][] are
sent to the destination processor, named a(1,2),
a(2,4) and a(3,6).
• SCCM checks that those are indeed the names in
those diagonal elements.
Another Receive Example
• Sample receive command:
recv
check a(i, 2*i) =: B(i)
match for i:int in (s,t)
endrecv
• The diagonal elements of A[][] are now received.
• Their names must be the same, but they may be
saved into some other structure at the target
processor (like the array B[]).
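The checked send and receive can be modelled roughly in Python. Everything here is an assumption made for illustration: the in-memory `deque` standing in for the network, the `send` and `recv` helpers, the tuple names such as `("a", 1, 2)` for a(1,2), and the sample values stored in A.

```python
from collections import deque


class ConsistencyError(Exception):
    """Raised when a sent or received value carries an unexpected name."""


def send(channel, store, claimed):
    """Send store[loc] for each (name, loc) pair, checking the stored name first."""
    for name, loc in claimed:
        value, actual = store[loc]
        if actual != name:
            raise ConsistencyError(f"{loc} holds {actual}, not {name}")
        channel.append((name, value))


def recv(channel, expected):
    """Receive len(expected) messages; `expected` maps each name to a local slot."""
    received = {}
    for _ in range(len(expected)):
        name, value = channel.popleft()
        if name not in expected:
            raise ConsistencyError(f"unexpected message named {name}")
        received[expected[name]] = (value, name)
    return received


# Sender: diagonal elements A(i, i), each named a(i, 2*i), for i in (1,3).
A = {(i, i): (10 * i, ("a", i, 2 * i)) for i in (1, 2, 3)}
channel = deque()
send(channel, A, [(("a", i, 2 * i), (i, i)) for i in (1, 2, 3)])

# Receiver: the same names are stored into a different structure, B(i).
B = recv(channel, {("a", i, 2 * i): i for i in (1, 2, 3)})
```

The names travel with the values, so the receiver can store them in B[] while still rejecting any message whose name it did not ask for.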
SCCM Performance
• When the same problem is implemented in C,
SCCM and SML:
• SCCM is usually 6-9 times slower than C because of
all the runtime checking overhead.
• SCCM is 50% faster than SML, because SCCM
produces imperative programs that do not have SML's
functional overheads.
Is SCCM useful for Programmers?
• The amount of time one spends writing an SCCM
program is much longer than for a normal
program.
• Arguably, this is less than the amount of time
spent on debugging but writing a specification for
a large system would be very hard.
• Most programmers would find it hard to express
their algorithms in purely functional notation.
• Programs in SCCM are several times longer than
their equivalents in C.
• Example: Bubble Sort.
SCCM for Byzantine Failures
• SCCM effectively captures a program's control
flow.
• The price for the programmer is having to write a
more complex program that is several times
longer.
• We are trying to design compiler techniques that
can verify whether a processor has faithfully
executed a program.
• Thus, the added difficulty is not a concern for our
purposes.
SCCM for Byzantine Failures
• We may be able to annotate a program so that
after execution it can prove to us that it
transformed all of its data according to the
original source code.
• SCCM can be thought of as a system for creating
problem-specific type systems. Can we create a
Linear-Algebra specific type system?
• Can Model Checking help us determine a
program's legal set of data transformations?
Related Fields
I. Certification Trails
• A Certification Trail is a trail of information a
program leaves behind, describing its work.
• After the first program completes, a second
program can use this trail to perform the same
computation much more quickly.
• Thus, the certification trail for a program acts
much like a checksum or parity bit for data.
• Little overhead is required.
• Problem: Currently this approach requires mostly
manual work. No techniques exist for compilers
to generate certification trails.
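As an illustration of the idea, sorting is a commonly used example: the first program can leave the sorting permutation behind as its trail, and a second program can redo the sort in linear time while validating the trail. The sketch below is an assumption-laden illustration with hypothetical function names, not a technique taken from the slides.

```python
def sort_with_trail(xs):
    """First program: sort and leave the permutation behind as the trail."""
    trail = sorted(range(len(xs)), key=lambda i: xs[i])
    return [xs[i] for i in trail], trail


def sort_using_trail(xs, trail):
    """Second program: rebuild the sorted output in O(n) from the trail,
    rejecting the trail if it is not a permutation or the result is unsorted."""
    n = len(xs)
    if len(trail) != n:
        raise ValueError("trail has the wrong length")
    seen = [False] * n
    for i in trail:
        if not (0 <= i < n) or seen[i]:
            raise ValueError("trail is not a permutation")
        seen[i] = True
    out = [xs[i] for i in trail]
    if any(out[k] > out[k + 1] for k in range(n - 1)):
        raise ValueError("trail does not sort the input")
    return out


xs = [3, 1, 2]
first_output, trail = sort_with_trail(xs)
second_output = sort_using_trail(xs, trail)
```

Comparing `first_output` with `second_output` then plays the role of the checksum comparison: a corrupted output or a corrupted trail shows up as a mismatch or a rejected trail.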
Related Fields
II. Result Checking
• A subfield of CS Theory dealing with ways to
probabilistically verify the correctness of an
algorithm's output.
• Related to Interactive Proofs.
• Problems:
• Though the focus is on checkers that are
asymptotically faster than the actual algorithm, most
solutions are too inefficient to be used in practice.
• There is no general methodology for generating
checkers for problems and most checkers in existence
are for obscure and specialized problems.
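One well-known checker of this kind, not mentioned in the slides and shown here only as an illustration, is Freivalds' algorithm for matrix multiplication. It tests a claimed product C = A·B using only matrix-vector products, so each round costs O(n²) instead of the cost of recomputing the product. The sketch assumes square integer matrices given as lists of lists.

```python
import random


def freivalds(A, B, C, rounds=20, rng=random):
    """Probabilistically check that A times B equals C for n-by-n matrices.

    A correct C always passes. A wrong C passes a single round with
    probability at most 1/2, so it survives all rounds with probability
    at most 2 ** -rounds.
    """
    n = len(A)
    for _ in range(rounds):
        r = [rng.randint(0, 1) for _ in range(n)]
        # Compute A(Br) and Cr with matrix-vector products only.
        Br = [sum(B[i][j] * r[j] for j in range(n)) for i in range(n)]
        ABr = [sum(A[i][j] * Br[j] for j in range(n)) for i in range(n)]
        Cr = [sum(C[i][j] * r[j] for j in range(n)) for i in range(n)]
        if ABr != Cr:
            return False
    return True


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C_good = [[19, 22], [43, 50]]
ok = freivalds(A, B, C_good)
```

The checker never rejects a correct product. It can only err by accepting a wrong one, and more rounds make that less likely.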
Related Fields
III. Replication
• Run the same program on multiple computers.
Compare their output to protect from corruption.
• The only available solution to Byzantine Failures.
• Very resource inefficient. Most replication-based
approaches require 3 times as many resources as
unprotected systems.
• ED4I – run the same program twice with different
data to detect permanent and transient faults.
• BFS – Replicated services. Processors vote on
results. Resilient to f faults by using 3f+1
replicas.
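The voting step can be sketched in Python as below. This is a deliberately simplified illustration, not the BFS protocol: it assumes all 3f+1 replies have already been collected and at most f of them are wrong, and the `vote` function is hypothetical.

```python
from collections import Counter


def vote(replies, f):
    """Pick the result reported by the replicas, tolerating up to f bad replies.

    With 3f + 1 replies and at most f faulty replicas, at least 2f + 1
    replies agree on the correct value, and any value with f + 1 or more
    votes was reported by at least one correct replica.
    """
    if len(replies) < 3 * f + 1:
        raise ValueError("need at least 3f + 1 replies")
    value, count = Counter(replies).most_common(1)[0]
    if count < f + 1:
        raise RuntimeError("no value has f + 1 matching replies")
    return value


result = vote([42, 42, 42, 7], f=1)
```

Here one of the four replicas returned a corrupted value, and the vote still yields the result reported by the other three.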
Gaussian Elimination