Uniprocessor Checkpointing
CS 717 – Fall 2001
9/25/01

The Need to Save State
  - Many of the FT systems we have discussed need a way to restart processes from previous points in their computation
  - A checkpoint is just a ‘snapshot’ of a process (or system) at a certain point in time
  - A checkpointing system provides a way to take these snapshots, and to restart from them

Types of Ckpt Systems
  - Kernel Level
    - OS supports ckpt & recovery
    - Transparent to the application and developer
  - User Level
    - Application linked against (user) library
    - Library functions perform ckpt and recovery
    - Transparent to application
    - Limitations (cannot restore PID, PPID, etc.)
  - Application Level
    - Applications coded to ckpt themselves, and to restart from a checkpoint

Comparison of Levels
  - Kernel & User (System) Level
    - Easy to add checkpointing to existing code
    - Works with (almost) any program
    - General, ‘coarse’ approach
  - Application Level
    - Could require complete re-write, or extensive modifications
    - Specific, ‘fine-grained’ solutions

System Level Checkpointing

Libckpt (1994)
  - Plank, Beck, Kingsley (UTK), Li (Princeton)
  - User level library for UNIX

Libckpt
  - User Level Checkpoint Library
  - Goals
    - Transparent
      - Requires minimal modifications to code and re-linking
    - Low Overhead
      - Automatic optimizations to reduce ckpt file size
      - Allow user directed checkpointing

Libckpt Overview
  - Taking the ‘snapshot’
    - Suspend the process
    - Write process’ memory and registers to a file
  - Recovery
    - Reload executable from original file
    - Reconstruct memory and register state from checkpoint file

Libckpt Operation
  - Application main() is re-named ckpt_target()
  - Library main() checks if in restore mode (specified using command line option); otherwise reads checkpoint parameters from file

Libckpt Operation (2)
  - main() sets a timer to interrupt application every n seconds
  - On signal
    - Uses setjmp to record registers, pc, etc.
    - Writes the stack and heap segments to file
    - Resumes application code

Libckpt Operation
  - If application started with =recover as command line option
    - Application begins, recovering Text segments
    - Open checkpoint file
    - Recover heap from file
    - Recover stack from file
    - Restores register file (using longjmp)

Virtual Address Space
  (diagram, highest address first)
    Bottom of Stack
      Stack
    SP
    sbrk(0)
      Heap
    &edata
      Data (Static)
    &etext
      Text
    0

Checkpoint And Recovery Algorithms
    main()
      if (recovery)
        restore stack
        restore heap
        pos = top of stack
        longjmp(pos, 1)        // restore regs.
      else
        run usual code

    signal_handler()
      jmp_buf pos
      if (setjmp(pos) == 0)    // saved reg. in known position on stack
        write stack
        write heap
      else                     // process recovered
        return

Illustration
  - Checkpoint: main() -> user_main() -> fun1() -> fun2() -> signal
    - save regs on stack
    - save stack to file
    - save heap to file
    - resume
  - Recovery: main() -> restore()
    - restore stack
    - restore heap
    - take jump

Optimization: Incremental Checkpointing
  - Observation: between taking two checkpoints, only a portion of the memory has actually been changed
  - Optimization: save only what has been changed since last ckpt, the rest can be read from previous ckpts

Taking Incremental Ckpts.
  - After taking a ckpt (and after init.), set protection on all pages to ‘read-only’
  - Write to page will cause a protection violation
  - Libckpt library catches that signal, sets page protection to ‘read-write’, and marks the page as dirty
  - When writing checkpoint file, only write dirty pages

Drawbacks to Incremental Ckpt
  - Required to keep multiple copies of the checkpoint file
  - On recovery, will unnecessarily restore old copies of data

Optimization: Asynchronous Checkpointing
  - Observation: the process must be suspended while the checkpoint file is written
  - Optimization: a separate thread could write the checkpoint file while the main thread is allowed to continue

Asynchronous Checkpointing
  - Make a copy of the process space
  - 2nd thread writes copy to disk
  - 1st thread continues without halting

Asynchronous Checkpointing (2)
  - Unix fork() provides the necessary behavior
  - When about to take ckpt, process forks
  - OS makes a complete copy of the original process’ space
  - Clone writes ckpt file, then dies
  - Original continues computing

Copy-On-Write Checkpointing
  - Like asynchronous checkpointing, but only copy a page if the two versions are about to differ
  - Some (most?) OSes implement fork() in this manner, so the benefit is automatic

Checkpoint Compression
  - Use a standard data compression algorithm to shrink the size of the checkpoint file
  - Only improves overhead if the speed of compression is faster than the speed of disk writes, and compression is significant
  - “For uniprocessor checkpointing, this is not the case”
  - Not implemented in libckpt

User Directed Checkpointing
  - As described so far, libckpt is (almost) entirely transparent to the programmer
  - Compare to application level checkpointing, which requires extensive code changes
  - Is there a middle ground?
  - Libckpt allows programmers to annotate application code with directives that guide the checkpointing

Memory Exclusion
  - Certain areas of memory can be excluded from the checkpoint
    - Dead memory – will never be read or written
    - Clean memory – values have not changed since previous checkpoint
  - Incremental Ckpt provides clean memory opt. at a coarse level (page size)
  - Only writing the ‘active’ areas of the stack and heap provides dead memory opt.

User Directed Memory Exclusion
  - Libckpt provides the app. programmer with two functions
    - exclude_bytes(ptr, length, usage)
      - Specify an area of memory to exclude from future checkpoints
    - include_bytes(ptr, length)
      - Add a previously excluded area of memory to future checkpoints

Clean Memory
  - If mem is clean
    - exclude_bytes(mem, …, CKPT_READONLY)
    - Include mem in next checkpoint, but exclude in all subsequent
    - Cannot write to mem until after call to include_bytes(mem)
    - Restore last saved version of mem

Clean Memory: Example
    for (…) {
      A = init_A()
      exclude_bytes(A, …, CKPT_READONLY)
      do_stuff(A)            // assuming A does not change
      include_bytes(A, …)
    }

Dead Memory
  - If mem is dead
    - exclude_bytes(mem, …, CKPT_DEAD)
    - Do not checkpoint mem
    - Cannot read mem until after include_bytes(mem)
    - Will not restore mem

Dead Memory: Example
    for (…) {
      A = init_A()
      do_stuff(A)
      exclude_bytes(A, …, CKPT_DEAD)
      do_other_stuff()       // assumes will not read A
      include_bytes(A)
    }

Using Memory Exclusion
  - There can be a dramatic reduction in the size of the checkpoint file
  - Must be used very carefully
    - Inadvertently excluding a live region from a checkpoint could cause erroneous behavior on restart

Synchronous Checkpointing
  - At different points in the program’s execution the amount of ‘live’ state varies widely
    - The stack might be much smaller (shallower call graph)
    - Heap items might have been de-allocated
    - Regions of memory might be dead or clean

Synchronous Ckpt (2)
  - If checkpoints are taken at times where there is relatively little live state, the checkpoint file size (and overhead) will be smaller
  - Allow user to specify where in a program a checkpoint should be taken
  - Independent of timers (signals)

Sync. Ckpt. Example
    for (…) {
      checkpoint_here()
      A = malloc(…)
      do_stuff(A)
      free(A)
    }

Synchronous Ckpt (3)
  - To avoid checkpointing too frequently, the mintime parameter specifies the minimum amount of time between two checkpoints
  - If checkpoint_here() is called less than mintime seconds after the last checkpoint, the call is ignored

Synchronous Ckpt (4)
  - To ensure that checkpoints are taken frequently enough to be of use, the maxtime parameter specifies the maximum time allowed to elapse between two checkpoints
  - If maxtime passes, an asynchronous checkpoint is taken

Combining Mem. Exclusion and Sync. Checkpointing
  - Original program
    main() {
      D = malloc
      f = file
      while (!done) {
        D = read(f)
        perform_calc(D)
        output_result()
      }
    }
  - With libckpt directives
    ckpt_target() {
      D = malloc
      f = file
      while (!done) {
        D = read(f)
        perform_calc(D)
        output_result()
        exclude_bytes(D, …, CKPT_DEAD)
        checkpoint_here()
        include_bytes(D)
      }
    }
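
Sketch: mintime / maxtime policy
  The mintime and maxtime rules from Synchronous Ckpt (3) and (4) can be sketched as a small decision function in C. This is only an illustration under stated assumptions: the names ckpt_policy, ckpt_decision, DECIDE_SKIP, DECIDE_TAKE, on_checkpoint_here and on_timer are hypothetical, as is the choice to pass time as seconds in a double. The slides do not show how libckpt implements this internally.

```c
#include <assert.h>

/* Hypothetical types for illustration; not part of libckpt's API. */
typedef struct {
    double mintime;   /* minimum seconds between two checkpoints */
    double maxtime;   /* maximum seconds allowed between two checkpoints */
    double last_ckpt; /* time of the most recent checkpoint, in seconds */
} ckpt_policy;

typedef enum { DECIDE_SKIP, DECIDE_TAKE } ckpt_decision;

/* Program reached a checkpoint_here() call at time `now`.
 * The call is ignored if the last checkpoint is less than
 * mintime seconds old; otherwise a checkpoint is taken. */
ckpt_decision on_checkpoint_here(ckpt_policy *p, double now)
{
    assert(now >= p->last_ckpt);
    if (now - p->last_ckpt < p->mintime)
        return DECIDE_SKIP;
    p->last_ckpt = now;
    return DECIDE_TAKE;
}

/* Timer tick at time `now`. Once maxtime has elapsed since the
 * last checkpoint, a timer-driven checkpoint is forced. */
ckpt_decision on_timer(ckpt_policy *p, double now)
{
    assert(now >= p->last_ckpt);
    if (now - p->last_ckpt < p->maxtime)
        return DECIDE_SKIP;
    p->last_ckpt = now;
    return DECIDE_TAKE;
}
```

  With mintime = 10 and maxtime = 60, a checkpoint_here() call 5 seconds after the previous checkpoint is skipped, while a timer tick 60 seconds or more after it forces a checkpoint. Both paths update last_ckpt, so a timer-driven checkpoint also delays the next user-directed one.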