Recording Inter-Thread Data Dependencies for Deterministic Replay
Tarun Goyal
Kevin Waugh
Arvind Gopalakrishnan
Debugging Multi-Threaded Programs
Debuggers: always helpful
Aim of discussion
• Deterministic replay of multiprocessor execution
• Recording nondeterministic events, especially memory races
Flight Data Recorder (FDR)
BugNet
Strata
FDR -- Approach
Deterministic replayers and data race detectors already exist
FDR also records operating system and I/O events
FDR -- Assumptions
Sequential Consistency
Directory-based coherence scheme
Cache size is the same as memory size
FDR -- Kinds of logs
Three kinds of logs, chosen to meet performance, space and complexity requirements:
• Restore a consistent state: log old memory values on updates (checkpoints plus logging)
• Record the outcome of races: assumes SC and records only a subset (implied races are omitted)
• Record system I/O: log interrupt timing and treat device interfaces as pseudo-processors
Low time and space overhead, so it can stay continuously enabled
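The first kind of log (checkpoints plus logging of old memory values) can be sketched in a few lines of Python. This is a simplified software model, not FDR's hardware: memory is a dictionary of block addresses, and the class name `UndoLoggedMemory` is hypothetical. Only the first update to a block after a checkpoint is logged, which is what keeps the log small.

```python
# Hypothetical sketch of FDR-style checkpoint logging: memory is a dict of
# block address -> value, and only the first store to each block after a
# checkpoint saves the block's old value in an undo log.

_ABSENT = object()  # marks blocks that did not exist at the checkpoint


class UndoLoggedMemory:
    def __init__(self, initial):
        self.mem = dict(initial)
        self.undo_log = []    # (address, old value) pairs for this interval
        self.logged = set()   # blocks already logged since the checkpoint

    def checkpoint(self):
        # The current memory state becomes the new restore point.
        self.undo_log.clear()
        self.logged.clear()

    def store(self, addr, value):
        # Save the old value only on the first update to this block.
        if addr not in self.logged:
            self.undo_log.append((addr, self.mem.get(addr, _ABSENT)))
            self.logged.add(addr)
        self.mem[addr] = value

    def restore(self):
        # Apply the undo log newest-first to recover the checkpointed state.
        for addr, old in reversed(self.undo_log):
            if old is _ABSENT:
                del self.mem[addr]
            else:
                self.mem[addr] = old
        self.checkpoint()


m = UndoLoggedMemory({0x10: 1, 0x20: 2})
m.checkpoint()
m.store(0x10, 5)
m.store(0x10, 6)      # second update to the same block: not logged
m.store(0x30, 7)      # new block: logged as absent
entries_logged = len(m.undo_log)
m.restore()
print(entries_logged, m.mem)  # 2 {16: 1, 32: 2}
```

Logging only the first update per block is sufficient, because restoring that single old value undoes every later store to the same block within the interval.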
Recording Races
It is necessary to log the nondeterministic thread interleaving, i.e. the outcomes of races
Question: how much must be logged? The answer lies in the memory model; here, SC
Record arcs (ordered pairs of dynamic instructions), but not all of them
Timestamps of cached blocks are stored; missing timestamps are approximated
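A minimal Python sketch of race-arc recording under SC. It assumes each thread keeps an instruction count (IC) and each block remembers the IC of its last write and of the latest reads since then, standing in for the timestamps kept with cached blocks. To log only some arcs, it applies a simplified form of Netzer's transitive reduction: an arc from thread u to thread t is skipped when an earlier logged arc from u to t already started at the same or a later IC in u, since program order then implies it. `RaceRecorder` and its methods are hypothetical names.

```python
from collections import defaultdict


class RaceRecorder:
    """Hypothetical sketch: logs RAW/WAR/WAW arcs between threads under SC,
    skipping arcs implied by program order plus an earlier logged arc."""

    def __init__(self, nthreads):
        self.ic = [0] * nthreads                  # per-thread instruction count
        self.last_write = {}                      # addr -> (thread, ic)
        self.readers = defaultdict(dict)          # addr -> {thread: ic} since last write
        # known[dst][src]: latest IC of src already ordered before dst
        self.known = [[0] * nthreads for _ in range(nthreads)]
        self.log = []                             # logged arcs: (src, src_ic) -> (dst, dst_ic)

    def _arc(self, src, src_ic, dst):
        if src == dst:
            return  # same-thread order is program order, never logged
        if self.known[dst][src] >= src_ic:
            return  # implied by an earlier logged arc from src to dst
        self.known[dst][src] = src_ic
        self.log.append(((src, src_ic), (dst, self.ic[dst])))

    def access(self, thread, addr, is_write):
        self.ic[thread] += 1
        if addr in self.last_write:
            w_thread, w_ic = self.last_write[addr]
            self._arc(w_thread, w_ic, thread)      # RAW or WAW
        if is_write:
            for r_thread, r_ic in self.readers[addr].items():
                self._arc(r_thread, r_ic, thread)  # WAR
            self.last_write[addr] = (thread, self.ic[thread])
            self.readers[addr] = {}
        else:
            self.readers[addr][thread] = self.ic[thread]


rec = RaceRecorder(2)
rec.access(0, 0xA, True)    # T0 writes A (IC 1)
rec.access(0, 0xB, True)    # T0 writes B (IC 2)
rec.access(1, 0xB, False)   # T1 reads B: arc (0,2)->(1,1) logged
rec.access(1, 0xA, False)   # T1 reads A: arc (0,1)->(1,2) implied, skipped
rec.access(0, 0xA, True)    # T0 writes A: WAR arc (1,2)->(0,3) logged
print(rec.log)  # [((0, 2), (1, 1)), ((1, 2), (0, 3))]
```

The skipped arc is the "implied race" from the kinds-of-logs slide: once T1 is ordered after T0's IC 2, it is automatically ordered after T0's IC 1.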
FDR Issues and Optimizations
Log size: Regulated Transitive Reduction (RTR) judiciously logs strict vector dependencies
Hardware cost: false races from approximations based on LRU in an associative set; 24 KB per core
Simpler design: take the timestamps out of the cache
TSO model: avoids the replay deadlocks of SC, but needs additional information about load values
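One way to read the hardware-cost item: when a block and its timestamp are evicted, the cache can return a conservative value instead of keeping timestamps for all of memory. The sketch below assumes a per-set maximum of evicted timestamps; the class `TimestampedSet` is hypothetical and is not taken from the FDR design. A too-late timestamp can only add ordering constraints (false races); it never drops a real one.

```python
from collections import OrderedDict


class TimestampedSet:
    """Hypothetical sketch of one associative cache set whose blocks carry
    instruction-count timestamps. An evicted block's timestamp is folded into
    a per-set maximum, which is never earlier than the block's true value."""

    def __init__(self, ways):
        self.ways = ways
        self.blocks = OrderedDict()   # addr -> timestamp, least recently used first
        self.evicted_max = 0          # conservative bound for evicted blocks

    def touch(self, addr, timestamp):
        if addr in self.blocks:
            self.blocks.move_to_end(addr)
        elif len(self.blocks) == self.ways:
            _, old_ts = self.blocks.popitem(last=False)   # evict the LRU block
            self.evicted_max = max(self.evicted_max, old_ts)
        self.blocks[addr] = timestamp

    def timestamp_of(self, addr):
        if addr in self.blocks:
            return self.blocks[addr]
        # Missing block: over-approximate. This may create a false race
        # (an extra ordering constraint) but never hides a real one.
        return self.evicted_max


s = TimestampedSet(ways=2)
s.touch(0xA, 10)
s.touch(0xB, 20)
s.touch(0xC, 30)   # set full: evicts A (timestamp 10)
s.touch(0xD, 40)   # evicts B (timestamp 20)
print(s.timestamp_of(0xA), s.timestamp_of(0xC))  # 20 30
```

Block A's true timestamp is 10, but the set reports 20: the recorder may then log an ordering that was not strictly needed, trading log precision for bounded hardware.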
BugNet: Net the Bug
Architectural support for deterministic replay debugging
Focus on replay of user code and shared libraries
Builds on and improves the ideas of FDR
Claims to be viable for use in (application) software development
Architecture Overview
Checkpoint-based recording
• Snapshots at each checkpoint interval
• Checkpoint (CP) buffer (PC + register map)
Observe the loads performed by each thread to trace its complete execution:
• Initial register values in a CP
• The trace of the loads
Tracking loads works in spite of interrupts, DMA transfers and other threads writing to shared memory.
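The checkpoint-plus-load-trace idea can be illustrated in Python: if the initial registers and the value of every load are logged, a thread can be re-executed without its shared memory or the other threads. All names here (`thread_body`, `record`, `replay`, and the `interference` hook that stands in for other threads, DMA or interrupts) are hypothetical, and the "thread" is a toy function rather than real machine code.

```python
from typing import Callable, Dict, List, Tuple

Regs = Dict[str, int]
Memory = Dict[int, int]


def thread_body(regs: Regs, load: Callable[[int], int],
                store: Callable[[int, int], None]) -> None:
    # Toy thread (hypothetical): sums three shared words, stores the result.
    for addr in (0x100, 0x104, 0x108):
        regs["acc"] += load(addr)
    store(0x200, regs["acc"])


def record(initial_regs: Regs, memory: Memory,
           interference: Callable[[Memory], None]) -> Tuple[Regs, List[int], Regs]:
    """Run the thread on shared memory, logging the value of every load.
    `interference` models other writers (threads, DMA, interrupts)."""
    regs = dict(initial_regs)          # checkpoint: initial register values
    load_log: List[int] = []

    def load(addr: int) -> int:
        interference(memory)           # another agent may write first
        value = memory[addr]
        load_log.append(value)
        return value

    def store(addr: int, value: int) -> None:
        memory[addr] = value

    thread_body(regs, load, store)
    return dict(initial_regs), load_log, regs


def replay(checkpoint: Regs, load_log: List[int]) -> Regs:
    """Re-execute from the checkpoint, feeding every load from the log,
    so stores can be ignored and no shared memory is needed."""
    regs = dict(checkpoint)
    values = iter(load_log)
    thread_body(regs, lambda addr: next(values), lambda addr, value: None)
    return regs


def other_thread(memory: Memory) -> None:
    memory[0x108] += 10                # concurrent writer (hypothetical)


mem = {0x100: 1, 0x104: 2, 0x108: 3}
cp, log, final = record({"acc": 0}, mem, other_thread)
print(log, final, replay(cp, log) == final)  # [1, 2, 33] {'acc': 36} True
```

The replay never sees `other_thread`, yet it reaches the same final state, because the concurrent writes are already reflected in the logged load values.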
Load bits in the cache
• Avoid logging repeated loads, reducing log size
• Updated (cleared) on stores from external events
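Load bits let the recorder skip most loads. The Python sketch below, with hypothetical class names, logs a load only when the word's load bit is clear; the thread's own stores set the bit, and external stores clear it. It assumes each log entry holds the load's sequence number since the checkpoint, a simplification of how a real log would locate entries. During replay, unlogged loads read a memory image rebuilt from earlier logged loads and the thread's own stores.

```python
class FirstLoadRecorder:
    """Hypothetical sketch: log a load only if its word's load bit is clear."""

    def __init__(self, memory):
        self.memory = memory
        self.load_bit = set()   # words whose value replay can already reproduce
        self.loads = 0          # loads executed since the checkpoint
        self.log = []           # (load number, value) for logged loads only

    def load(self, addr):
        self.loads += 1
        value = self.memory[addr]
        if addr not in self.load_bit:
            self.log.append((self.loads, value))
            self.load_bit.add(addr)
        return value

    def store(self, addr, value):
        # The thread's own store is re-executed during replay, so later
        # loads of this word need not be logged.
        self.memory[addr] = value
        self.load_bit.add(addr)

    def external_store(self, addr, value):
        # Another thread, DMA or interrupt: invisible to replay, so the
        # next load of this word must be logged again.
        self.memory[addr] = value
        self.load_bit.discard(addr)


class FirstLoadReplayer:
    """Replays the same thread, rebuilding memory from the log and own stores."""

    def __init__(self, log):
        self.log = list(log)
        self.pos = 0
        self.loads = 0
        self.memory = {}

    def load(self, addr):
        self.loads += 1
        if self.pos < len(self.log) and self.log[self.pos][0] == self.loads:
            self.memory[addr] = self.log[self.pos][1]   # logged load
            self.pos += 1
        return self.memory[addr]

    def store(self, addr, value):
        self.memory[addr] = value


rec = FirstLoadRecorder({0x10: 5, 0x14: 7})
recorded = [rec.load(0x10), rec.load(0x10), rec.load(0x14)]
rec.store(0x14, 8)
recorded.append(rec.load(0x14))     # after own store: not logged
rec.external_store(0x10, 42)        # e.g. another thread writes the word
recorded.append(rec.load(0x10))     # load bit cleared: logged again

rep = FirstLoadReplayer(rec.log)
replayed = [rep.load(0x10), rep.load(0x10), rep.load(0x14)]
rep.store(0x14, 8)
replayed.append(rep.load(0x14))
replayed.append(rep.load(0x10))
print(rec.log, replayed == recorded)  # [(1, 5), (3, 7), (5, 42)] True
```

Five loads execute but only three are logged, and the replayed values still match the recorded run.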
FLL (First-Load Log) and MRB (Memory Race Buffer)
Dictionary-based compression
• For log data
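Dictionary-based compression exploits the fact that a few values (such as 0 and 1) recur often in load logs. In this sketch a small table, updated identically by encoder and decoder, turns a repeated value into a short index; in hardware a hit would cost a flag bit plus log2(table size) bits instead of a full word. The LRU replacement policy, the table size of 4, and the names `ValueDictionary`, `compress` and `decompress` are assumptions for illustration, not BugNet's actual parameters.

```python
from collections import OrderedDict


class ValueDictionary:
    """Hypothetical small value table in LRU order. Encoder and decoder
    apply identical updates, so their indices stay in sync."""

    def __init__(self, size):
        self.size = size
        self.entries = OrderedDict()   # value -> None, most recent at the end

    def index_of(self, value):
        for i, v in enumerate(self.entries):
            if v == value:
                return i
        return None

    def value_at(self, index):
        return list(self.entries)[index]

    def update(self, value):
        if value in self.entries:
            self.entries.move_to_end(value)
        else:
            if len(self.entries) == self.size:
                self.entries.popitem(last=False)   # evict least recently used
            self.entries[value] = None


def compress(values, size=4):
    d = ValueDictionary(size)
    out = []
    for v in values:
        i = d.index_of(v)
        out.append(("hit", i) if i is not None else ("miss", v))
        d.update(v)
    return out


def decompress(tokens, size=4):
    d = ValueDictionary(size)
    out = []
    for kind, payload in tokens:
        v = d.value_at(payload) if kind == "hit" else payload
        out.append(v)
        d.update(v)
    return out


values = [0, 0, 1, 0, 1024, 1, 0]
tokens = compress(values)
hits = sum(kind == "hit" for kind, _ in tokens)
print(hits, decompress(tokens) == values)  # 4 True
```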
FDR vs BugNet
FDR
• Tracks I/O, interrupts and DMA accesses
• Extra hardware and log size overhead
BugNet
• Focuses on application-level software debugging; simpler scheme
• Smaller in terms of hardware and log size
Assumptions/Limitations
Assumes a sequentially consistent memory model
Won't help find bugs caused by interactions with the OS and other system code
Questionable usability in mainstream systems
For debugging user-level applications, is software-based recording more viable?
Strata: Logging Shared Memory Dependencies
Record per-thread memory operation counts (a stratum) when a dependency is detected
Hardware/cache-based scheme
• Assumes sequential consistency
• Directory-based and snoopy cache coherence
Drop-in replacement for Netzer’s scheme
• Smaller log size
• Less computation to create log
• More complicated replay
Narayanasamy et al., ASPLOS 2006
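A rough Python sketch of the Strata idea under SC: each thread counts its memory operations, and when an access conflicts with another thread's access made since the last stratum, the recorder logs a stratum, the vector of current counts (one word per thread). Everything counted before a stratum is ordered before everything after it, so replay can run threads region by region. The class `StrataRecorder`, the helper `replay_schedule`, and the per-block bookkeeping are hypothetical simplifications of the hardware schemes.

```python
class StrataRecorder:
    """Hypothetical sketch of Strata-style recording under SC."""

    def __init__(self, nthreads):
        self.count = [0] * nthreads   # memory ops executed per thread
        self.epoch = 0                # number of strata logged so far
        self.last_write = {}          # addr -> (thread, epoch of the write)
        self.reads = {}               # addr -> {thread: epoch of latest read}
        self.strata = []              # the log: one count vector per stratum

    def _conflict_in_current_region(self, thread, addr, is_write):
        w = self.last_write.get(addr)
        if w and w[0] != thread and w[1] == self.epoch:
            return True  # RAW or WAW with no stratum in between
        if is_write:     # WAR with no stratum in between
            return any(t != thread and e == self.epoch
                       for t, e in self.reads.get(addr, {}).items())
        return False

    def access(self, thread, addr, is_write):
        if self._conflict_in_current_region(thread, addr, is_write):
            # Every op counted so far is ordered before every later op.
            self.strata.append(list(self.count))
            self.epoch += 1
        self.count[thread] += 1
        if is_write:
            self.last_write[addr] = (thread, self.epoch)
            self.reads[addr] = {}
        else:
            self.reads.setdefault(addr, {})[thread] = self.epoch


def replay_schedule(strata, totals):
    """Run every thread up to each stratum in turn. Within a region there are
    no cross-thread conflicts, so threads simply run one after another."""
    done = [0] * len(totals)
    order = []
    for bound in strata + [totals]:
        for t, limit in enumerate(bound):
            while done[t] < limit:
                done[t] += 1
                order.append((t, done[t]))
    return order


rec = StrataRecorder(2)
rec.access(0, 0xA, True)    # T0 writes A
rec.access(0, 0xB, True)    # T0 writes B
rec.access(1, 0xB, False)   # T1 reads B: conflict, stratum [2, 0]
rec.access(1, 0xA, False)   # T1 reads A: already ordered, no stratum
rec.access(0, 0xB, True)    # T0 writes B after T1's read: stratum [2, 2]
print(rec.strata)                             # [[2, 0], [2, 2]]
print(replay_schedule(rec.strata, rec.count))
# [(0, 1), (0, 2), (1, 1), (1, 2), (0, 3)]
```

The replay helper shows why Strata replay is more involved than following individual arcs: every thread must stop at each stratum boundary before any thread may continue past it.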
Strata cont.
Low resource overhead
• 12% bandwidth on the directory scheme
• ~0% bandwidth on the snoopy scheme
Scales linearly with the number of threads
• Each stratum holds one word per thread
• Potentially worse than Netzer's scheme
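The scaling concern can be stated as a back-of-the-envelope comparison. Let T be the number of threads, w the size of one count word, N_s the number of strata logged, N_a the number of arcs a Netzer-style log would record, and a the size of one arc entry; these symbols are introduced here for illustration only.

```latex
S_{\text{Strata}} = N_s \, T \, w, \qquad S_{\text{Netzer}} = N_a \, a
\qquad\Longrightarrow\qquad
S_{\text{Strata}} < S_{\text{Netzer}} \iff \frac{N_a}{N_s} > \frac{T\,w}{a}
```

So each stratum must, on average, replace more than Tw/a arcs to pay off, and that threshold grows linearly with T.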
Concerns and Criticisms
All systems require hardware support
• Significant resource overhead
• Software would be slower, but still useful
Consistency models are restrictive
• Exclude commodity hardware (x86)
Encourages sloppy programming
• Users != Testers