PinPlay: A Framework for Deterministic Replay and Reproducible Analysis of Parallel Programs
Harish Patil, Cristiano Pereira, Mack Stallcup, Gregory Lueck, James Cownie
Intel Corporation
CGO 2010, Toronto, Canada
Software & Services Group

Non-Determinism
• Program execution is not repeatable across runs
  – Interactions with environment (single-threaded)
  – Shared-memory interleaving (multi-threaded)
• Source of many problems
  – Hard to predict and test behaviors -> leads to bugs
  – Very hard and unpleasant to debug
  – Breaks program analyses that rely on repeatability
• Obstacle for adoption of parallel programming

Dealing with Non-Determinism
• Eliminate it
  – Deterministic program execution enforced by runtime (e.g. constrained execution [ISCA’09])
• Deterministic Replay
  – Let it be but capture and reproduce execution if needed
  – Every instruction gets same input as in original run
• This paper: User-level Deterministic Replay
  – Implementation, challenges and usage examples

Requirements
• No OS or hardware changes
• No changes in user environment
• Manageable log sizes for long runs
• Reasonable run-time overhead
• Multi-threaded and multi-processed applications
• Integration with other existing analysis tools (e.g. dynamic analyzers, debuggers, profilers)
• No assumptions about synchronization APIs

Rest of the Talk
• Motivation & Requirements
• PinPlay Overview
• Usage Examples
• Results
• Summary

PinPlay
User-level deterministic replay and analysis
[Figure: capture and replay. Capture: Binary + Input, PinPlay, OS (Linux® or Windows®), Normal Program Output, Logs (pinballs). Replay: Logs (pinballs), PinPlay + Analysis Tools / Debuggers, OS (Linux® or Windows®).]
• Run in application’s native environment
• Replays user code
• OS independent: cross-OS replay!
• Easily integrates w/ other tools and debuggers

Replay Models
• Parallel-capture and parallel-replay
  [Figure: threads T0, T1, T2 under PinPlay; Logs (pinballs); threads T0, T1, T2 under PinPlay]
• Parallel-capture and isolated-replay
  [Figure: threads T0, T1, T2 under PinPlay; per-thread Logs (pinballs); T0, T1 and T2 each under its own PinPlay]

Information Captured For Replay
1. Subset of Memory Values
   • Shadow-memory to capture first reads without prior writes and OS side-effects automatically [Sigmetrics’06]
   • Values changed by remote threads
2. Initial registers and OS register side-effects:
   • Signals/Exceptions/APCs/system calls
3. Code executed (user and libraries)
4. Position of code and stack
5. Output of some instructions (e.g. RDTSC)
6. Subset of shared-memory access interleaving (transitive opt. - FDR [ISCA’03])
[Figure: All memory values: reads without prior writes; OS side-effects used by app; values from remote threads; all other values (not captured)]

PinPlay Architecture
[Figure: User Land: Application code and data; Your Pin-based Tool; PinPlay Lib with Logger (instrumentation and analysis to capture logs) and Replayer (instrumentation and analysis to inject side-effects); pinball; Intel’s Pin (JIT compiler and instrumentor) *; OS (Linux® or Windows®)]
• Capable of logging, replaying and relogging execution (recapture from a replaying run)
* http://www.pintool.org/

Cross-OS Replay and Challenges
• Log on one OS and replay on another
• System call translations
  – Most OS activity does not happen on replay (only side-effects restored)
  – Semantics is translated across OSes (e.g.
    create thread)
• Memory mapping
  – Problem: address space different across OSes
  – Solution: use Pin’s Fetch API to redirect code and memory operand rewriting to redirect data
[Figure: code and data in the address space on Windows® and in the address space on Linux®]

Usage Example: Program Analysis
• Sampling and checkpointing for simulation
  – One run for profiling and finding representative regions, another for checkpointing
  – Requirement: both runs must be identical
[Figure: Multi-process MPI program; PinPlay; Logs (pinballs); Per-Process pinball; PinPlay + Profiler; Representative Regions; PinPlay + Checkpointer; Checkpoints for simulation]
• Pinballs are used to share workloads for Pin-based analyses among architects

Usage Example: Replay for Debugging
• Capture a buggy run and replay under debugger
  – Guaranteed to reproduce the bug and helps root causing
  – Works w/ off-the-shelf unmodified debuggers (e.g. GDB)
  – PinPlay based tool extends GDB commands w/ your own
  – Limitation: debugger can’t change control-flow
• Used to debug various multi-threaded applications
• Also using it for in-house debugging of concurrency issues with a major database vendor
[Figure: Logs (pinballs); Binary; PinPlay Enabled Debugger Tool; Intel’s Pin; GDB (unmodified) over the GDB remote protocol]

Results
Benchmark/Application                    Average Icount (Billions)   Size (MB)
SPEC2006 (single-threaded)               924                         39
SPECOMP2001 (4-threaded openmp)          307                         91
McBench (4-threaded RMS)                 156                         396
MILC-8p (numerical simulator/MPI)        109                         2140
POP-8p (ocean circulator model/MPI)      952                         1116
WRF-8p (Weather Prediction/MPI)          755                         5222
EnergyApp-8p (Energy Exploration/MPI)    693                         1996
[Figure: bar chart, Slowdown relative to Native (axis 0 to 160); series: Logger Slowdown, Replayer Slowdown; isolated replay]

Sources of Slowdown
• Instrumentation of every memory operation to identify system call side-effects and log data
  – Could be done by OS at the cost of OS
    modification or OS-specific analysis (doesn’t work on Windows®)
• Locks for shadow-memory accesses
  – Could be eliminated by using a shadow-copy per thread at the cost of significant increase in log sizes
• Other optimizations possible (please look at the paper)

Summary
• User-level deterministic capture and replay
  – No OS changes, special hardware, or virtualization
  – Integrates w/ other Pin-tools for repeatable analysis and debugging
• Replay occurs on any machine and works across OSes (Windows to Linux)
• Pinballs are OS-independent and self-contained
  – Ideal for sharing workloads among researchers, for Pin-based analyses
• We will release PinPlay libraries in future

Q&A
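
Backup: Shadow-Memory Logging Sketch
The "Information Captured For Replay" slide says a shadow memory catches first reads without prior writes, OS side-effects, and values changed by remote threads. The toy Python sketch below illustrates one way that idea can work: a value is logged only when real memory disagrees with the shadow copy at a read. It is not PinPlay's implementation. The class `FirstReadLogger`, the functions `capture` and `replay`, the tuple-based operation format, and the "ext" operation (standing in for an OS or remote-thread write) are all hypothetical names made up for this illustration.

```python
_MISSING = object()  # sentinel: address never seen by this thread


class FirstReadLogger:
    """Toy logger (hypothetical, not PinPlay's API)."""

    def __init__(self):
        self.shadow = {}  # address -> value this thread last wrote or read
        self.log = []     # (event index, address, value) to inject on replay
        self.events = 0   # counts only the thread's own reads and writes

    def on_write(self, addr, value):
        self.events += 1
        self.shadow[addr] = value

    def on_read(self, addr, actual):
        self.events += 1
        # Mismatch means a first read, or a change made by the OS or by
        # another thread since this thread last touched the address.
        if self.shadow.get(addr, _MISSING) != actual:
            self.log.append((self.events, addr, actual))
            self.shadow[addr] = actual


def capture(ops, memory):
    """Run ops against real memory, returning (values read, log)."""
    logger, observed = FirstReadLogger(), []
    for op in ops:
        if op[0] == "w":
            memory[op[1]] = op[2]
            logger.on_write(op[1], op[2])
        elif op[0] == "r":
            value = memory[op[1]]
            logger.on_read(op[1], value)
            observed.append(value)
        else:  # "ext": assumed external write, invisible to the logger
            memory[op[1]] = op[2]
    return observed, logger.log


def replay(program_ops, log):
    """Re-run only the thread's own ops, injecting logged values."""
    memory, observed = {}, []
    pending = {idx: (addr, val) for idx, addr, val in log}
    for idx, op in enumerate(program_ops, start=1):
        if idx in pending:
            addr, val = pending[idx]
            memory[addr] = val  # restore the side-effect before the access
        if op[0] == "w":
            memory[op[1]] = op[2]
        else:
            observed.append(memory[op[1]])
    return observed


ops = [("r", 0x10), ("w", 0x20, 5), ("r", 0x20), ("ext", 0x10, 9), ("r", 0x10)]
observed, log = capture(ops, {0x10: 7})
replayed = replay([op for op in ops if op[0] != "ext"], log)
```

In this run only two of the three reads are logged: the read of the thread's own write to 0x20 needs no log entry, which is why the log holds a subset of memory values. Replay starts from empty memory and never sees the external write, yet reads the same values as the captured run.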