Chapter 1 - Problem Statement

• Given an arbitrary piece of MPI code,
• Inject application-level checkpointing into the code, and
• Be competitive with hand-written code
Chapter 2 - Codegen
• Sequential program + explicit checkpoint
• Variation 1 - checkpoint everything
– "simple analysis" and code generation
– DOME, Beck's work
• Variation 2 - minimize checkpoint size
– Use program text to reconstruct values
– "x=f(...); y=g(x); z=h(x,y);chpt();"
– "simple analysis" and code generation
– Jim
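The contrast between the two variations can be sketched in a few lines. This is only an illustration under stated assumptions: the bodies of `f`, `g`, and `h` are invented stand-ins for the functions named on the slide, and `pickle` stands in for whatever serialization a real system would use. Variation 1 saves `x`, `y`, and `z`; Variation 2 saves only `x` and replays the program text `y=g(x); z=h(x,y)` on restart.

```python
import pickle

# Hypothetical stand-ins for the f, g, h named on the slide.
def f(seed):
    return seed * 2

def g(x):
    return x + 1

def h(x, y):
    return x * y

def checkpoint_everything(x, y, z):
    """Variation 1: save every live variable."""
    return pickle.dumps({"x": x, "y": y, "z": z})

def checkpoint_minimal(x):
    """Variation 2: save only x; y and z can be rebuilt from it."""
    return pickle.dumps({"x": x})

def restore_minimal(blob):
    """Reload x, then re-execute the program text to reconstruct y and z."""
    x = pickle.loads(blob)["x"]
    y = g(x)
    z = h(x, y)
    return x, y, z

# x=f(...); y=g(x); z=h(x,y); chpt();
x = f(3)
y = g(x)
z = h(x, y)
full = checkpoint_everything(x, y, z)
small = checkpoint_minimal(x)
restored = restore_minimal(small)
```

The smaller checkpoint is paid for at restart time, when `g` and `h` must run again.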
Chapter 3 - space/time optimization
• Variation 1 and 2 are two ends of a spectrum
• still sequential program + explicit checkpoint
• compiler chooses what to save and what to reconstruct
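One way to picture the spectrum is a cost model in which each variable has a size and a recomputation time, and the compiler trades one for the other. The sketch below is not the method from the slides; it is a hypothetical greedy heuristic, and the `Var` fields, the `recovery_budget_seconds` parameter, and the assumption that recomputation costs are independent of each other are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Var:
    name: str
    size_bytes: int
    # Time to rebuild the value at restart; None means it cannot be rebuilt.
    recompute_seconds: Optional[float]

def choose_saved(variables, recovery_budget_seconds):
    """Return the names to save; the rest are reconstructed at restart.

    Greedy: drop the variables that free the most bytes per second of
    recomputation, until the recovery-time budget is used up.
    """
    saved = {v.name for v in variables}
    remaining = recovery_budget_seconds
    candidates = [v for v in variables if v.recompute_seconds is not None]
    candidates.sort(
        key=lambda v: v.size_bytes / max(v.recompute_seconds, 1e-9),
        reverse=True,
    )
    for v in candidates:
        if v.recompute_seconds <= remaining:
            saved.discard(v.name)
            remaining -= v.recompute_seconds
    return saved

variables = [
    Var("x", 8, None),            # program input, must be saved
    Var("y", 8_000_000, 0.5),     # large and cheap to rebuild
    Var("z", 800, 2.0),           # small and expensive to rebuild
]
save_all = choose_saved(variables, 0.0)            # Variation 1 end
save_min = choose_saved(variables, float("inf"))   # Variation 2 end
save_mid = choose_saved(variables, 1.0)            # a point in between
```

A zero budget reproduces "checkpoint everything", an unlimited budget reproduces "minimize checkpoint size", and intermediate budgets land between the two.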
Chapter 4 - placing checkpoints
• just sequential programs
• Place checkpoints so that
– overhead of checkpointing using Chapter 3 is minimized.
– checkpoints occur "frequently enough"
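The two placement goals can be read as a small optimization problem: pick a subset of candidate program points so that no stretch of execution longer than some bound goes without a checkpoint, while the summed checkpoint cost is as low as possible. The sketch below assumes the candidate points, their estimated times, and their costs are already known (for example, from the Chapter 3 analysis); the function name `place_checkpoints` and the `max_gap` parameter are hypothetical.

```python
def place_checkpoints(candidates, end_time, max_gap):
    """Choose checkpoint locations by dynamic programming.

    candidates: list of (time, cost) pairs sorted by time.
    Returns (total_cost, chosen_times), or None if no placement keeps
    every gap (start to first, between neighbours, last to end) <= max_gap.
    """
    # Index 0 is a virtual, free "checkpoint" at program start.
    points = [(0.0, 0.0)] + list(candidates)
    inf = float("inf")
    best = [inf] * len(points)   # cheapest cost of a valid prefix ending at i
    prev = [None] * len(points)  # predecessor of i in that prefix
    best[0] = 0.0
    for i in range(1, len(points)):
        t_i, c_i = points[i]
        for j in range(i):
            if t_i - points[j][0] <= max_gap and best[j] + c_i < best[i]:
                best[i] = best[j] + c_i
                prev[i] = j
    feasible = [
        i for i in range(len(points))
        if best[i] < inf and end_time - points[i][0] <= max_gap
    ]
    if not feasible:
        return None
    last = min(feasible, key=lambda i: best[i])
    total = best[last]
    chosen = []
    while last:  # stop at the virtual start (index 0)
        chosen.append(points[last][0])
        last = prev[last]
    return total, chosen[::-1]

candidates = [(10, 5.0), (20, 1.0), (30, 4.0), (40, 1.0)]
plan = place_checkpoints(candidates, end_time=50, max_gap=20)
no_plan = place_checkpoints(candidates, end_time=50, max_gap=5)
```

Here `max_gap` is one possible reading of "frequently enough"; a real system would likely derive it from failure rates rather than take it as a constant.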
Chapter 5 - general purpose system
• arbitrary parallel (MPI) codes
• Sequential checkpointing using Chapter 4
• What checkpointing protocol to use?
• Uncoordinated
– whenever a node reaches a checkpoint
• Coordinated, Non-blocking
– Sufficient - all nodes reach a checkpoint at roughly the same time.
– Necessary - prove that deadlock cannot occur.
• Coordinated, Blocking
– All processes execute a collective (MPI_BARRIER, MPI_REDUCE), and this is a reasonable point to checkpoint.
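The blocking scheme can be sketched without an MPI installation by letting threads stand in for MPI processes and `threading.Barrier` stand in for MPI_BARRIER. Both substitutions are assumptions made so the example is self-contained; the function name, the toy state update, and the in-memory `checkpoints` dictionary are likewise hypothetical, and a real system would call the MPI library and write to stable storage.

```python
import threading

def run_blocking_checkpoints(num_procs=4, steps=3):
    """Simulate coordinated, blocking checkpoints taken at a barrier."""
    barrier = threading.Barrier(num_procs)  # stand-in for MPI_BARRIER
    lock = threading.Lock()
    checkpoints = {}  # step -> {rank: saved state}

    def worker(rank):
        state = 0
        for step in range(steps):
            state += rank + 1      # local computation for this step
            barrier.wait()         # every "process" blocks here
            # All ranks have reached the same program point, so each
            # one saves its local state as part of a consistent set.
            with lock:
                checkpoints.setdefault(step, {})[rank] = state

    threads = [
        threading.Thread(target=worker, args=(rank,))
        for rank in range(num_procs)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return checkpoints

result = run_blocking_checkpoints()
```

Because every rank passes through the same barrier before saving, the set of per-rank states for a given step corresponds to one program point across all ranks.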
Chapters 6, ... - Experiments, Related Work, Conclusions