Dynamic Verification of Sequential Consistency
Albert Meixner and Daniel J. Sorin

Presented by Peter Gilbert
ECE259 Spring 2008

Introduction
• Problem: How can errors in multithreaded memory systems be detected?
  – Transient physical faults are increasingly problematic as memory systems become more complex
  – Errors are possible in many components: caches, memories, cache/memory controllers, interconnect

Potential approach
• Tailored detection mechanisms for each component and each type of error
  – Example: for a "single-bit-stuck-at-x" error model on the system bus, add a parity bit
  – Problems:
    • Lots of components…
    • Requires understanding the error models and how they interact
    • Cannot detect design bugs
    • Some errors are difficult to detect with localized mechanisms

Dynamic verification
• Monitor system execution
• Verify high-level invariants rather than considering individual components
• Can detect transient faults, design bugs, and fabrication defects
  – Any error that affects the high-level invariants
• End-to-end correctness

DVSC
• Verifying memory consistency is equivalent to verifying memory system correctness
• Goal of DVSC: verify that SC is enforced in a shared-memory multiprocessor
  – SC defines end-to-end correctness

DVSC error model
• Caches and memories
  – Bit corruption
• Cache/memory controllers
  – Corruption of state or outputs
• Interconnect
  – Messages corrupted, dropped, replicated, misrouted, or reordered

First idea: DVSC-Direct
• Dynamically construct a total order of loads and stores
• Verify that the total order satisfies SC
• Trigger system recovery when an error is detected

DVSC-Direct design
• For every load and store:
  – The processor informs the block's home memory node
• Inform message:
  – <address, load/store, data value, logical time>
• Logical time: if A causes B, A has a smaller logical time
• Replay accesses in logical time order at the home node
  – Uses a priority queue for Informs and shadow copies of memory blocks
  – Verify that each load gets the value of the most recent store

Cost of DVSC-Direct
• Inform bandwidth is proportional to the number of loads and stores
  – 8-53 times more bandwidth than an unprotected system
  – Uses bandwidth like a system without caches!

Alternative: DVSC-Indirect
• Verify sub-invariants that together have been proven to imply SC
  – Proof due to Plakal et al.
• Terminology
  – Coherence epoch: an interval of logical time during which a processor has Shared or Exclusive access to a block
  – A memory access is bound to a coherence transaction T if its permission was obtained via T

Constructing SC from sub-invariants
• Fact 1: a load of block B bound to T receives either:
  – The most recent store to B bound to T, or
  – The value of B received in response to T
• Lemmas
  1. Exclusive epochs for block B do not overlap with other Exclusive or Shared epochs for B
  2. Every load or store occurs in some epoch and is bound to that epoch's transaction
  3. Each word w of B received at the start of an epoch equals the most recent store to w

DVSC-Indirect approach
• Use DIVA to verify that memory operations occur in program order (Fact 1)
  – Recall from Architecture I: DIVA dynamically verifies a speculative core
  – DIVA presents an in-order abstraction to the memory system
• ECC added to each cache and memory line to detect silent corruptions
• Hardware for verifying the epoch invariants

Cache controller
• The cache controller maintains a Cache Epoch Table (CET)
  – CET entry (per block):
      Field                                        Size
      S/E (type of epoch: Shared or Exclusive)     1 bit
      DRB (data ready bit)                         1 bit
      Logical time at start of epoch               16 bits
      Hash of data block at start of epoch         16 bits
  – Check that every load and store is performed in an appropriate epoch (Lemma 2)
  – When an epoch ends, send its information to the home memory controller in an Inform-Epoch message (CET entry + end time and end data hash)

Memory controller
• The memory controller maintains a directory-like Memory Epoch Table (MET)
  – MET entry (per block):
      Field                                        Size
      Latest end time of any S epoch               16 bits
      Latest end time of any E epoch               16 bits
      Hash of data block from latest E epoch       16 bits
  – When an Inform-Epoch is received from a cache controller:
    • Sort it into a priority queue (VWB)
    • Process in logical time order, checking:
      – This epoch does not overlap with other epochs
      – The correct block data is transferred from epoch to epoch

DVSC-Indirect implementation
[Figure: each node has a CPU with DIVA and a cache with its CET; nodes connect over the interconnect to memory controllers, each with a VWB, a memory verifier, and an MET]

DVSC-Indirect snooping example
[Figure]

DVSC-Indirect summary
• Verifies SC through sub-invariants
• Costs:
  – DIVA
  – Storage structures
    • CET at each cache controller, MET and VWB at each memory controller, ECC on cache and memory lines
    • Not large or complicated
  – Bandwidth usage
    • Proportional to coherence traffic (as opposed to the number of loads and stores for DVSC-Direct)

Evaluation
• Can DVSC detect the errors from the error model?
  – Corrupted, dropped, misrouted, reordered, and duplicated messages
  – Corrupted cache and memory blocks
  – Errors in the processor core are not considered
    • DIVA handles these
• How much does DVSC increase bandwidth usage?
• How does it affect error-free performance?
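The DVSC-Direct home-node check from the design slide amounts to a replay: buffer Informs in a priority queue keyed by logical time, apply stores to a shadow copy of memory, and flag any load whose value differs from the most recent store. The Python below is a minimal sketch under stated assumptions, not the authors' hardware: the `Inform` record, the `seq` tie-breaker, and `replay_check` are hypothetical names, addresses stand in for blocks, and the whole batch of Informs is assumed to be present before replay (real hardware must also decide when no earlier-time Inform can still arrive).

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Inform:
    # Hypothetical Inform message: <address, load/store, data value, logical time>.
    # Only logical_time and the hypothetical seq tie-breaker affect ordering.
    logical_time: int
    seq: int
    address: int = field(compare=False)
    is_store: bool = field(compare=False)
    value: int = field(compare=False)


def replay_check(informs, initial_memory):
    """Replay Informs in logical-time order at the home node.

    Returns, in replay order, the load Informs that did not observe
    the most recent store to their address (i.e., detected errors).
    """
    queue = list(informs)
    heapq.heapify(queue)             # priority queue keyed by logical time
    shadow = dict(initial_memory)    # shadow copy of the memory blocks
    errors = []
    while queue:
        inform = heapq.heappop(queue)
        if inform.is_store:
            shadow[inform.address] = inform.value
        elif shadow.get(inform.address) != inform.value:
            errors.append(inform)
    return errors


# Example: the load at logical time 3 returns a stale value.
informs = [
    Inform(1, 0, address=0x40, is_store=True, value=7),
    Inform(2, 1, address=0x40, is_store=False, value=7),
    Inform(3, 2, address=0x40, is_store=False, value=0),
]
for err in replay_check(informs, {0x40: 0}):
    print("error: load at logical time", err.logical_time)
```

Because every load and store produces an Inform in this scheme, the replay work and the message traffic both scale with the number of memory operations, which is the cost the slides attribute to DVSC-Direct.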
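The memory-controller checks of DVSC-Indirect described on the Memory controller slide can be sketched as follows. This is a minimal sketch, assuming a hypothetical `InformEpoch` record (start and end logical time, block, S/E bit, start and end data hashes) and assuming the VWB releases Inform-Epochs in order of epoch start time; a truncated CRC-32 stands in for the unspecified 16-bit hash, and wraparound of 16-bit logical times is ignored. Each `METEntry` holds the three MET fields from the slide, and the two checks correspond to Lemma 1 (no overlap with Exclusive epochs) and Lemma 3 (data entering an epoch matches the data left by the latest Exclusive epoch).

```python
import heapq
import zlib
from dataclasses import dataclass, field


def hash16(data: bytes) -> int:
    # Stand-in 16-bit block hash (truncated CRC-32); the real hash is not
    # specified on the slides.
    return zlib.crc32(data) & 0xFFFF


@dataclass(order=True)
class InformEpoch:
    # Hypothetical Inform-Epoch record. Only start_time takes part in the
    # ordering, so the heap releases epochs in start-time order (assumed).
    start_time: int
    end_time: int = field(compare=False)
    block: int = field(compare=False)
    exclusive: bool = field(compare=False)  # S/E bit: True for Exclusive
    start_hash: int = field(compare=False)
    end_hash: int = field(compare=False)


@dataclass
class METEntry:
    last_s_end: int = 0   # latest end time of any Shared epoch
    last_e_end: int = 0   # latest end time of any Exclusive epoch
    last_e_hash: int = 0  # data hash at the end of the latest Exclusive epoch


def verify_epochs(messages, initial_blocks):
    """Check Inform-Epochs against the MET; return a list of (kind, epoch) errors."""
    met = {b: METEntry(last_e_hash=hash16(d)) for b, d in initial_blocks.items()}
    vwb = list(messages)
    heapq.heapify(vwb)  # stands in for the VWB priority queue
    errors = []
    while vwb:
        ep = heapq.heappop(vwb)
        entry = met.setdefault(ep.block, METEntry())
        # Lemma 1: no epoch may overlap an earlier Exclusive epoch, and an
        # Exclusive epoch may not overlap an earlier Shared epoch either.
        if ep.start_time < entry.last_e_end or (
            ep.exclusive and ep.start_time < entry.last_s_end
        ):
            errors.append(("overlap", ep))
        # Lemma 3: data at the start of the epoch must match the data left
        # by the latest Exclusive epoch (i.e., the most recent stores).
        if ep.start_hash != entry.last_e_hash:
            errors.append(("data", ep))
        if ep.exclusive:
            entry.last_e_end = max(entry.last_e_end, ep.end_time)
            entry.last_e_hash = ep.end_hash
        else:
            entry.last_s_end = max(entry.last_s_end, ep.end_time)
    return errors
```

In this sketch, processing epochs in start-time order is what lets each check compare against only three summary values per block rather than every past epoch; it also shows why an undersized VWB, which may release epochs out of order, can raise false positives.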
Methodology
• Full-system simulation with Simics
  – 8-node multiprocessor
    • Each processor implements SC and speculates for higher performance
    • Two levels of cache
    • Support for backward error recovery with SafetyNet at each node
  – DVSC-Indirect costs
    • Each CET: 68 KB
    • Each MET: 102 KB
    • VWB: 1024 entries
  – SPARC V9 running Solaris 8
  – Interconnect: 2.5 GB/s links
• Benchmarks
  – Four commercial workloads + barnes-hut from SPLASH-2

Error coverage
• DVSC-Direct and DVSC-Indirect detected all injected errors in simulation
• Small probability of false negatives in DVSC-Indirect
  – ECC fails to detect a bit error
  – Hash collisions
• False positives are also possible in DVSC-Indirect
  – When the VWB is not large enough to prevent out-of-order processing of Inform-Epochs

Bottleneck link bandwidth
[Figure: bandwidth on the most-utilized link, directory protocol]
• DVSC-Direct: uses 8-53 times more bandwidth than an unprotected system
• DVSC-Indirect, directory: 8-25% increase
• DVSC-Indirect, snooping: 0-15% increase

DVSC-Indirect error-free performance
[Figure: runtime with DVSC-Indirect compared to an unprotected system, directory protocol]
• Performance impact is minimal: usually equivalent to that of SafetyNet by itself
• Similar results for snooping

Conclusions
• DVSC-Indirect detected all memory system errors injected in the simulations
  – False negatives and false positives can still occur with small probability
• DVSC-Indirect imposes a small error-free performance overhead
• Bottleneck bandwidth usage with DVSC-Indirect (directory) is only 8-25% greater than in the unprotected case
• Is the hardware cost justified?
  – Probably depends on the application's reliability requirements
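As a rough consistency check on the methodology's storage figures, the per-block CET fields listed on the Cache controller slide add up to 34 bits. Assuming KB means 1024 bytes and that each CET entry holds only those fields (no separate tag), a 68 KB CET works out to exactly 16K entries, i.e., one entry per block of a 16K-block cache. This is an inference from the slide numbers, not a figure stated in the talk.

```latex
\begin{aligned}
\text{bits per CET entry} &= \underbrace{1}_{\text{S/E}} + \underbrace{1}_{\text{DRB}}
  + \underbrace{16}_{\text{start time}} + \underbrace{16}_{\text{start hash}} = 34 \\
\text{CET entries} &= \frac{68 \times 1024 \times 8\ \text{bits}}{34\ \text{bits/entry}}
  = \frac{557056}{34} = 16384 = 2^{14}
\end{aligned}
```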
