The Specification-Consistent Coordination Model (SCCM) and its applications to Byzantine Failures

The Byzantine Failure Problem
• In a large multi-processor, internal breakdowns are expected to be common events.
• Most of these breakdowns will result in complex behavior where the processor will return incorrect output.
• Given the source code of the program, we want to be able to detect and recover from such failures.
• SCCM may be a way to do this.

The Goals of SCCM
• SCCM was originally intended as an aid to programmers.
• It divides the programming task into two stages:
  • A Specification that rigorously defines the algorithm.
  • A Coordination, which defines the actual imperative program and associates it with the Specification to ensure correctness.
• SCCM employs a runtime checker to ensure that the imperative program being executed matches the specification.

SCCM – Specification
• SCCM defines the algorithm via a functional language.
• For every piece of information that arises during the algorithm's lifetime, there is a function with a particular argument value to identify it.
• Example: sum of all the numbers in array A:
    input A[]
    sum(i) = sum(i-1) + A[i]
    output sum(A.length)
• Each intermediate value is identified by sum(i).

SCCM – Coordination
• An imperative version of this program:
    var ImpSum = 0
    Array A[] = {1, 2, 3, 4, 5, 6}
    for i=1 to A.length
        ImpSum = ImpSum + A[i]
    Output ImpSum
• We can ensure that this program matches the Specification by associating with every value of ImpSum a corresponding identifier sum(i).

Named Values
• SCCM works by naming each piece of mutable storage with some f(x) from the specification.
• It maintains correctness by ensuring that when the imperative program overwrites values, it transforms their names in a way consistent with the specification.
• Because all values have names and names may only be transformed in consistent ways, SCCM ensures that the implementation's control flow is the same as in the specification.
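The named-value idea can be imitated in a few lines of Python. This is only a sketch under stated assumptions, not SCCM syntax or SCCM's actual checker: the names Named, step_sum, and checked_sum are hypothetical, and the base case sum(0) = 0 is assumed from ImpSum's initial value. Each value carries the index i of the sum(i) it claims to be, and the only way to produce a new value is a step that first checks the old name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Named:
    """A value paired with the specification name sum(index) it claims to hold."""
    value: int
    index: int  # the i in sum(i)


def step_sum(acc, a, i):
    """Apply sum(i) = sum(i-1) + A[i], refusing to run unless acc is named sum(i-1)."""
    if acc.index != i - 1:
        raise ValueError(f"expected sum({i - 1}), found sum({acc.index})")
    # The slides index A from 1; Python lists index from 0.
    return Named(acc.value + a[i - 1], i)


def checked_sum(a):
    """Imperative summation in which every intermediate value keeps its name."""
    acc = Named(0, 0)  # assumed base case: sum(0) = 0
    for i in range(1, len(a) + 1):
        acc = step_sum(acc, a, i)
    return acc


result = checked_sum([1, 2, 3, 4, 5, 6])
print(result)  # Named(value=21, index=6)
```

A loop that skipped or repeated an index would fail the name check in step_sum instead of silently producing a wrong total.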
Summation's Named Values
• In the command ImpSum = ImpSum + A[i], we use the definition of sum() to transform ImpSum's name from sum(i-1) to sum(i).
• sum(): sum(i) = sum(i-1) + A[i]
• ImpSum's values and names:
    ImpSum's Value    ImpSum's Name
    0                 sum(0)
    1                 sum(1)
    3                 sum(2)
    6                 sum(3)
    10                sum(4)
    15                sum(5)
    21                sum(6)

Fibonacci Sequence
• The algorithm specification for the Fibonacci Sequence is simple:
    fib(0) = 1
    fib(1) = 1
    fib(i) = fib(i-1) + fib(i-2)
• An implementation will have to name each of its values with some fib(i) and only use the above rule to transform values.

Fibonacci – Specification
• The source code of the specification of the Fibonacci Sequence algorithm.

Fibonacci – Coordination

The Consistency Link
• Full Application: “A:=fib(0)”
  • Here, fib() is actually called with 0 as the argument and its return value, 1, is assigned to A. fib(0) is now A's name.
• Fetch Application: “C:=fib(i) <- A”
  • The value and name of A are copied into C. SCCM makes sure that before the copy, A's name is fib(i).

The Consistency Link 2
• Step Application: “B:=fib(i+2)|(fib.l1 <-A, fib.l2 <-C)”
  • fib() is executed to obtain the value of B.
  • Rather than wastefully calling fib(i+1) and fib(i) recursively, SCCM pulls those values from A and C.
  • It ensures that A's name is fib(i+1) and C's name is fib(i).
  • Thus, B gets its value and SCCM ensures that proper control flow was maintained.

Potential Coding Errors
• Errors in the imperative program are caught.
• Example: Setting loop bounds to (0,n) rather than (0, n-1) results in fib(n+1) being output rather than fib(n). SCCM detects this error.
• In general, it is hard to make errors in both the specification and the coordination that match each other.

SCCM Message Passing
• SCCM allows us to create parallel programs via message passing.
• We can send and receive SCCM named values, with SCCM ensuring global adherence to the specification.
• Both the Send and the Receive are checked.
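The checked send and receive can be imitated in Python as follows. This is a rough sketch under assumptions, not SCCM's runtime: checked_send, checked_recv, and NameMismatch are hypothetical names, a deque stands in for the network, and a specification name such as n() or a(1,2) is modelled as a tuple like ("n",) or ("a", 1, 2).

```python
from collections import deque


class NameMismatch(Exception):
    """Raised when a value does not carry the name a send or receive claims."""


def checked_send(channel, storage, var, claimed_name):
    """Send storage[var] only after checking that it really holds claimed_name."""
    value, name = storage[var]
    if name != claimed_name:
        raise NameMismatch(f"{var} holds {name}, not {claimed_name}")
    channel.append((claimed_name, value))


def checked_recv(channel, storage, var, expected_name):
    """Receive the next message into storage[var] if its name matches."""
    name, value = channel[0]
    if name != expected_name:
        raise NameMismatch(f"received {name}, expected {expected_name}")
    channel.popleft()
    storage[var] = (value, name)


channel = deque()              # hypothetical stand-in for the network
sender = {"N": (42, ("n",))}   # N holds the value 42 under the name n()
receiver = {}

checked_send(channel, sender, "N", ("n",))
checked_recv(channel, receiver, "N", ("n",))
print(receiver["N"])  # (42, ('n',))
```

Checking on both ends means a corrupted sender cannot mislabel a value and a corrupted receiver cannot accept a value under the wrong name.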
SCCM – Send
• Sample Send:
    send n() <- N to <destination> endsend
• The value of N, named n(), is sent out.
• We can send out single elements or lists of elements.
• SCCM makes sure that the values sent out actually have the names the Send command claims them to have.

SCCM – Receive
• Sample Receive:
    recv match n() := N endrecv
• The value of N that was sent on the prior slide is received by the target processor.
• N's value must be named n(), just like in the send.
• All receives (as far as I can tell) are ReceiveAny operations.

Another Send Example
• Sample send command:
    send a(i, 2*i) <- A(i, i) for i in (1,3) to <destination> endsend
• The contents of three diagonal elements of A[][] are sent to the destination processor, named a(1,2), a(2,4), a(3,6).
• SCCM checks that those are indeed the names in those diagonal elements.

Another Receive Example
• Sample receive command:
    recv check a(i, 2*i) =: B(i) match for i:int in (s,t) endrecv
• The diagonal elements of A[][] are now received.
• Their names must be the same, but they may be saved into some other structure at the target processor (like the array B[]).

SCCM Performance
• When the same problem is implemented in C, SCCM and SML:
  • SCCM is usually 6-9 times slower than C because of all the runtime checking overhead.
  • SCCM is 50% faster than SML, because SCCM produces imperative programs that do not have SML's functional overheads.

Is SCCM useful for Programmers?
• The amount of time one spends writing an SCCM program is much larger than for a normal program.
• Arguably, this is less than the amount of time spent on debugging, but writing a specification for a large system would be very hard.
• Most programmers would find it hard to express their algorithms in purely functional notation.
• Programs in SCCM are several times longer than their equivalents in C.
• Example: Bubble Sort.

SCCM for Byzantine Failures
• SCCM effectively captures a program's control flow.
• The price for the programmer is having to write a more complex program that is several times longer.
• We are trying to design compiler techniques that can verify whether a processor has faithfully executed a program.
• Thus, the added difficulty is not a concern for our purposes.

SCCM for Byzantine Failures
• We may be able to annotate a program so that after execution it can prove to us that it transformed all of its data according to the original source code.
• SCCM can be thought of as a system for creating problem-specific type systems. Can we create a Linear-Algebra specific type system?
• Can Model Checking help us determine a program's legal set of data transformations?

Related Fields I. Certification Trails
• A Certification Trail is a trail of information a program leaves behind, describing its work.
• After the first program completes, a second program can use this trail to perform the same computation much more quickly.
• Thus, the certification trail for a program acts much like a checksum or parity bit for data.
• Little overhead is required.
• Problem: Currently this approach requires mostly manual work. No techniques exist for compilers to generate certification trails.

Related Fields II. Result Checking
• A subfield of CS Theory dealing with ways to probabilistically verify the correctness of an algorithm's output.
• Related to Interactive Proofs.
• Problems:
  • Though the focus is on checkers that are asymptotically faster than the actual algorithm, most solutions are too inefficient to be used in practice.
  • There is no general methodology for generating checkers for problems, and most checkers in existence are for obscure and specialized problems.

Related Fields III. Replication
• Run the same program on multiple computers. Compare their output to protect against corruption.
• The only available solution to Byzantine Failures.
• Very resource-inefficient. Most replication-based approaches require 3 times as many resources as unprotected systems.
• ED4I – run the same program twice with different data to detect permanent and transient faults.
• BFS – Replicated services. Processors vote on results. Resilient to f faults by using 3f+1 replicas.

Gaussian Elimination