Dynamic Verification of Sequential Consistency Albert Meixner and Daniel J. Sorin

advertisement
Dynamic Verification of
Sequential Consistency
Albert Meixner and Daniel J. Sorin
Presented by Peter Gilbert
ECE259 Spring 2008
Introduction
• Problem: How to detect errors in
multithreaded memory systems?
– Transient physical faults increasingly
problematic as memory systems become
more complex
– Errors possible in many components: caches,
memories, cache/memory controllers,
interconnect
Potential approach
• Tailored detection mechanisms for each
component and each type of error
– Example: for “single-bit-stuck-at-x” model for system
bus, add a parity bit
– Problems:
•
•
•
•
Lots of components…
Requires understanding error models and how they interact
Cannot detect design bugs
Some errors difficult to detect with localized mechanisms
Dynamic verification
• Monitor system execution
• Verify high-level invariants rather than
considering individual components
• Can detect transient faults, design bugs,
fabrication defects
– Any error that affects the high-level invariants
• End-to-end correctness
DVSC
• Verifying memory consistency == verifying
memory system correctness
• Goal of DVSC: verify that SC is enforced
in a shared memory multiprocessor
– SC defines end-to-end correctness
DVSC error model
• Caches and memories
– Bit corruption
• Cache/memory controllers
– Corruption of state or outputs
• Interconnect
– Messages corrupted, dropped, replicated,
misrouted, reordered
First idea: DVSC-Direct
• Dynamically construct total order of loads
and stores
• Verify that total order satisfies SC
• Trigger system recovery when error is
detected
DVSC-Direct Design
• For every load and store:
– Processor informs block’s home memory node
• Inform message:
– <address, load/store, data value, logical time>
• Logical time: if A causes B, A has smaller logical time
• Replay accesses in logical time order at home
node
– Uses priority queue for Informs and shadow copies of
memory blocks
– Verify that load gets value from most recent store
Cost of DVSC-Direct
• Inform bandwidth proportional to the
number of loads and stores
– 8-53 times more bandwidth than unprotected
system
– Uses bandwidth like a system without caches!
Alternative: DVSC-Indirect
• Verify sub-invariants proven to be
equivalent to SC
– Proof due to Plakal et al.
• Terminology
– coherence epoch - interval of logical time
during which a processor has Shared or
Exclusive access to a block
– A memory access is bound to a coherence
transaction T if permission is obtained via T
Constructing SC from sub-invariants
•
Fact 1: load of block B bound to T receives
either:
–
–
•
Most recent store of B bound to T
Value of B received in response to T
Lemmas
1. Exclusive epochs for block B do not overlap with
other Exclusive or Shared epochs for B
2. Every load or store occurs in some epoch and is
bound to the transaction that epoch
3. Each word w of B received at the start of an epoch
equals the most recent store to w
DVSC-Indirect approach
• DIVA to verify that memory operations
occur in program order (Fact 1)
– Recall from Architecture I: DIVA dynamically
verifies a speculative core
– DIVA presents in-order abstraction to memory
system
• ECC added to each cache and memory
line to detect silent corruptions
• Hardware for verifying epoch invariants
Cache controller
• Cache controller maintains Cache Epoch Table
– CET entry (per-block):
S/E DRB
1 bit 1 bit
Logical time at start
16 bits
Hash of data block at
start of epoch
16 bits
S/E - type of epoch: shared or exclusive
DRB - data ready bit
– Check that every load and store is performed in
appropriate epoch (Lemma 2)
– When epoch ends, send info to home memory
controller in Inform-Epoch message (CET + end time
and end data hash)
Memory controller
• Memory controller maintains directory-like
Memory Epoch Table
– MET entry (per-block):
Latest end time
of any S epoch
Latest end time
of any E epoch
Hash of data block
from latest E epoch
16 bits
16 bits
16 bits
– When Inform-Epoch is received from cache
controller:
• Sort in priority queue (VWB)
• Process in logical time order, checking:
– This epoch does not overlap with other epochs
– Correct block data is transferred from epoch to epoch
DVSC-Indirect implementation
CPU
DIVA
CPU
DIVA
CPU
DIVA
Cache
CET
Cache
CET
Cache
CET
Interconnect
VWB
Memory
Verifier
MET
VWB
Memory
Verifier
MET
VWB
Memory
Verifier
MET
DVSC-Indirect snooping example
DVSC-Indirect summary
• Verifies SC through sub-invariants
• Costs:
– DIVA
– Storage structures
• CET at each cache controller, MET and VWB at each
memory controller, ECC on cache and memory lines
• Not large or complicated
• Bandwidth usage
– Proportional to coherence traffic (as opposed to
number of loads and stores for DVSC-Direct)
Evaluation
• Can DVSC detect the errors from the error
model?
– Corrupted, dropped, misrouted, reordered,
duplicated messages
– Corrupted cache and memory blocks
– Don’t consider errors in processor core
• DIVA handles this
• How much does DVSC increase
bandwidth usage?
• How does it affect error-free performance?
Methodology
• Full-system simulation with Simics
– 8-node multiprocessor
• Each processor implements SC, speculates for higher performance
• Two levels of cache
• Support for backward error recovery with SafetyNet at each node
– DVSC-Indirect costs
• Each CET 68 KB
• Each MET 102 KB
• VWB: 1024 entries
– SPARC V9 running Solaris 8
– Interconnect: 2.5 GB/s links
• Benchmarks
– Four commercial workloads + barnes-hut from SPLASH-2
Error coverage
• DVSC-Direct and DVSC-Indirect detected
all injected errors in simulation
• Small probability of false negatives in
DVSC-Indirect
– ECC fails to detect a bit error
– hash collisions
• False positives also possible in DVSCIndirect
– When VWB not large enough to prevent outof-order processing of Inform-Epochs
Bottleneck link bandwidth
Bandwidth on most-utilized link - directory
• DVSC-Direct: uses 8-53 times more bandwidth
• DVSC-Indirect directory: 8-25% increase
• DVSC-Indirect snooping: 0-15% increase
DVSC-Indirect error-free performance
Runtime with DVSC-Indirect compared to unprotected system - directory
•
•
Performance impact minimal: usually equivalent to that of SafetyNet by itself
Similar results for snooping
Conclusions
• DVSC-Indirect is effective at detecting all
memory system errors injected in the
simulations
– False negatives and false positives will occur with
small probability
• DVSC-Indirect imposes small error-free
performance overhead
• Bottleneck bandwidth usage with DVSC-Indirect
is only 8-25% greater than unprotected case
• Is the hardware cost justified?
– Probably depends on the application reliability
requirements
Download