Chapter 1 - Problem Statement
• Given an arbitrary MPI program,
• Inject application-level checkpointing into the code, and
• Be competitive with hand-written checkpointing code

Chapter 2 - Codegen
• Sequential program + explicit checkpoint
• Variation 1 - checkpoint everything
  – "Simple analysis" and code generation
  – DOME, Beck's work
• Variation 2 - minimize checkpoint size
  – Use program text to reconstruct values
  – "x=f(...); y=g(x); z=h(x,y);chpt();"
  – "Simple analysis" and code generation
  – Jim

Chapter 3 - space/time optimization
• Variations 1 and 2 are the two ends of a spectrum
• Still a sequential program + explicit checkpoint
• The compiler chooses what to save and what to reconstruct

Chapter 4 - placing checkpoints
• Sequential programs only
• Place checkpoints so that
  – the overhead of checkpointing (using Chapter 3) is minimized, and
  – checkpoints occur "frequently enough"

Chapter 5 - general purpose system
• Arbitrary parallel (MPI) codes
• Sequential checkpointing using Chapter 4
• Which checkpointing protocol to use?
• Uncoordinated
  – A node checkpoints whenever it reaches a checkpoint
• Coordinated, Non-blocking
  – Sufficient - all nodes reach a checkpoint at roughly the same time.
  – Necessary - prove that deadlock cannot occur.

Chapter 5 - general purpose system (cont.)
• Coordinated, Blocking
  – All processes execute a collective such as MPI_BARRIER or MPI_REDUCE, which is a reasonable point to checkpoint.

Chapters 6, ... - Experiments, Related Work, Conclusions