Graybox Stabilization
Anish Arora, Murat Demirbas, Sandeep Kulkarni
The Ohio State University / Michigan State University
July 2001

Stabilization
• Traditionally, stabilization has been a whitebox (application-dependent) approach to dependability
• It assumes a complete system description and is proved using
    Closure, wrt its legitimate states
    Convergence, from arbitrary states to legitimate states
• The assumption raises basic questions about the approach:
    is it applicable for closed-source applications?
    is it feasible for large applications?
    is stabilization “reusable”?

Graybox Stabilization
Concept: Stabilization without knowledge of system implementation but with knowledge only of system specification
Approach: Given a specification A, design a wrapper W s.t. A wrapped with W is stabilizing to A
Goal: For an implementation C that satisfies the specification of A, C wrapped with W is stabilizing to A

Recent Case Studies in Graybox Stabilization
We have recently designed stabilizing systems without assuming knowledge of implementation
i. In the Aladdin home network at MSR [WRA00a, WRA00b, AJW01], model-based stabilization enabled a low-cost replication strategy for the (name-based and attribute-based) lookup server
    Rely: publishers “refresh” information periodically; publisher refreshes & subscriber queries are broadcast
    Guarantee: Always
      1. every query gets a unique response from service
      2. quality of response is high

Recent Case Studies … contd.
ii. Also in Aladdin, model-based stabilization dealt with hidden state and hidden transitions for dependable X10 powerline networking

Recent Case Studies … contd.
iii.
In resettable vector clocks [ADK00], stabilization was achieved assuming a client contract: in any window with M clock reset events at any node j, all nodes deliver a message from j & messages in transit at start of window are delivered (eventually reset events occur at every node)

Outline of Talk
This talk presents sufficient conditions for achieving graybox stabilization and illustrates them using several implementations of Timestamp-based Mutual Exclusion
1. Sufficient condition for implementations
2. Sufficient condition for specifications
3. Case study: Timestamp-based Mutual Exclusion
4. Graybox stabilization in Ricart-Agrawala & Lamport’s solutions

Impossibility Result
• Graybox stabilization is not achievable for every implementation C of A
[Figure: state-transition diagrams for A and C, with transitions labeled F and states S0, S1, S2, S3, ..., S*]
• A wrapper that renders A stabilizing may not suffice for stabilizing C

Sufficient Condition for Implementations
Convergence refinement:
    C is a refinement of A
    Every computation of C that starts from a noninitial state is a compression or expansion of some computation of A starting from the corresponding state

Special Cases of Convergence Refinement
Everywhere refinement: Every computation of C is a computation of A
Everywhere-eventually refinement: Every computation of C is an arbitrary finite prefix followed by a computation of A

Graybox Stabilization Theorem
If
• C is a convergence refinement of A
• A wrapped with W is stabilizing to A
then
C wrapped with W is stabilizing to A

Sufficient Condition for Specifications
• Verifying convergence refinement may be difficult for distributed applications, since
    instantaneous access to global state is lacking
    calculating global invariants may be hard
• Local specifications: Decompose A and C into several parallel components
    A = ( j :: Aj )
    C = ( j :: Cj )

Graybox Stabilization Theorem for Local Specifications
If
• Cj is a convergence refinement of Aj for all j
• A wrapped with W is stabilizing to A
then
C wrapped with W is stabilizing to A

Timestamp-based Distributed Mutual Exclusion
• Mutual exclusion: at most one node is in the critical section at any time
• Starvation freedom: each requesting node eventually enters the critical section
• First-come first-serve: requesting nodes enter the critical section in order of increasing timestamp

LocalSpec of A
Node j
• Client spec
• Program spec
    Request: each requesting node sends a REQUEST to all nodes
    Reply: each node that receives an earlier REQUEST replies to sender
    CS entry: node enters c.s. upon receiving a later message from all nodes
    CS release
• Environment spec

Graybox Stabilization Wrapper for A
Node j
• A wrapper that suffices is:
    node j is hungry → send(REQUESTj) to all nodes k
• A more efficient wrapper W is:
    node j is hungry → send(REQUESTj, k) to all nodes k s.t. j.REQUESTk is earlier than REQUESTj

Graybox Stabilization of TME
Result: If an implementation C is a convergence refinement of LocalSpec, then C wrapped with W is stabilizing to LocalSpec

Stabilizing Ricart-Agrawala’s and Lamport’s Solutions
• Ricart-Agrawala and Lamport ME are convergence refinements of LocalSpec
    we assume that their internal variables (e.g.
sets, queues) are self-cleaning
    self-cleaning is readily achieved by adding actions such as
      true → ensure that deferred_set is consistent with external variables of LocalSpec
• It follows that W makes Ricart-Agrawala & Lamport ME stabilizing to LocalSpec

Summary
• Convergence refinements and local specifications are sufficient for achieving stabilization without knowledge of implementation details
• Assuming knowledge of specification offers potential for lower-cost dependability than assuming no such knowledge

Future Directions
• Formal derivation of Dijkstra's 3-state stabilizing token ring programs as "convergence refinements" of an abstract token ring program
• Fault-tolerance preserving compilers
    Given fault-tolerant A, produce convergence refinements of A
    McGuire, Gouda: AP to APC compiler
• Case studies in graybox masking fault-tolerant systems
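
Appendix sketch: the two wrappers for Node j
The two wrappers on the "Graybox Stabilization Wrapper for A" slide can be illustrated with a minimal Python sketch. It only uses the external variables named in LocalSpec (REQUESTj and node j's local copies j.REQUESTk) and never looks inside an implementation. The names `Node`, `known`, `simple_wrapper` and `refined_wrapper` are hypothetical, timestamps are assumed to be (logical clock, node id) pairs, and resending when a local copy is missing is an assumption of this sketch, not something the slides specify.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Assumption: a timestamp is (logical clock, node id); tuple comparison
# orders by clock first and breaks ties by node id.
Timestamp = Tuple[int, int]
Message = Tuple[int, Timestamp]  # (destination node k, REQUEST_j)


@dataclass
class Node:
    """External (specification-level) state of node j; all names are hypothetical."""
    node_id: int
    hungry: bool
    request: Timestamp                                          # REQUEST_j
    known: Dict[int, Timestamp] = field(default_factory=dict)   # j.REQUEST_k


def simple_wrapper(node: Node, all_ids: List[int]) -> List[Message]:
    """Wrapper that suffices: a hungry node resends REQUEST_j to every other node."""
    if not node.hungry:
        return []
    return [(k, node.request) for k in all_ids if k != node.node_id]


def refined_wrapper(node: Node, all_ids: List[int]) -> List[Message]:
    """More efficient wrapper W: resend only to nodes k whose locally recorded
    request j.REQUEST_k is earlier than REQUEST_j.
    Assumption of this sketch: a missing local copy also triggers a resend."""
    if not node.hungry:
        return []
    return [
        (k, node.request)
        for k in all_ids
        if k != node.node_id
        and (k not in node.known or node.known[k] < node.request)
    ]


if __name__ == "__main__":
    j = Node(node_id=0, hungry=True, request=(5, 0),
             known={1: (3, 1), 2: (9, 2)})
    print(simple_wrapper(j, [0, 1, 2]))   # resends to nodes 1 and 2
    print(refined_wrapper(j, [0, 1, 2]))  # resends to node 1 only
```

In this sketch the refined wrapper sends to node 1 only, because the recorded request of node 2 is later than REQUESTj and so, as far as node j can tell, node 2 is not blocking its entry to the critical section.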