Condor - Computer Sciences Dept. - University of Wisconsin–Madison

DMTCP: A New Linux Checkpointing Mechanism For Vanilla Universe Jobs
Condor Project
Computer Sciences Department
University of Wisconsin-Madison
Why DMTCP?
› Why checkpoint at all?
› Problems with Condor’s Standard Universe
 Single process.
 No pthreads.
 No mmap() support.
 Forced re-link to form a static executable.
› DMTCP removes these restrictions!
www.cs.wisc.edu/Condor
What is DMTCP?
› Distributed Multi-Threaded CheckPointing.
› Works with Linux Kernel 2.6.9 and later.
› Supports sequential and multi-threaded computations across single/multiple hosts.
› Entirely in user space (no kernel modules or root privilege).
› Transparent (no recompiling, no re-linking).
› Written at Northeastern U. and MIT and under active development for 4+ years.
› LGPL’d and freely available.
› No Remote I/O.
Process Structure
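[Figure: a DMTCP coordinator connected by network sockets to Process 1 through Process N. Each process contains a checkpoint thread (CT) and one or more user threads (T1, T2); the CT signals the user threads with SIGUSR2.]
CT = DMTCP checkpoint thread
T = User Thread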
How Does It Work?
› ./dmtcp_checkpoint a.out # starts coordinator too
› ./dmtcp_command -c # talks to coordinator
› ./dmtcp_restart ckpt_a.out_*.dmtcp
› Coordinator is a stateless synchronization server for the distributed checkpointing algorithm.
› Checkpoint/Restart performance related to size of memory, disk write speed, and synchronization.
How Does It Work?
› LD_PRELOAD: Transparently preloads checkpoint libraries, which install libc wrappers and checkpointing code.
› SIGUSR2: Used internally from checkpoint thread to user threads.
› Wrappers: Only on less heavily used calls to libc
 fork, exec, system, pipe, bind, listen, setsockopt,
connect, accept, clone, close, ptsname, openlog, closelog,
signal, sigaction, sigvec, sigblock, sigsetmask,
sigprocmask, rt_sigprocmask, pthread_sigmask
 Overhead is negligible.
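The wrapper pattern described above can be illustrated with a small Python analogy. This is not how DMTCP itself is written (its wrappers are C functions interposed through LD_PRELOAD), and the names tracked_fds, pipe_wrapper, and close_wrapper are hypothetical. The sketch only shows the general shape: do some bookkeeping, then forward to the real call.

```python
import os

# Hypothetical bookkeeping: descriptors a checkpointer would need to know about.
tracked_fds = set()

# Keep references to the real functions, much as a preloaded wrapper keeps
# a pointer to the real libc symbol it shadows.
_real_pipe = os.pipe
_real_close = os.close


def pipe_wrapper():
    """Create a pipe through the real call and remember both ends."""
    read_fd, write_fd = _real_pipe()
    tracked_fds.update((read_fd, write_fd))
    return read_fd, write_fd


def close_wrapper(fd):
    """Forget the descriptor, then forward to the real close."""
    tracked_fds.discard(fd)
    return _real_close(fd)
```

Because each wrapper adds only a set update around the real call, and only rarely used calls are wrapped, the extra cost per call stays small, which is consistent with the "overhead is negligible" claim.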
How Does It Work?
› Additional wrappers when process id
& thread id virtualization is enabled
getpid, getppid, gettid, tcgetpgrp,
tcsetpgrp, getpgrp, setpgrp, getsid,
setsid, kill, tkill, tgkill, wait, waitpid,
waitid, wait3, wait4
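Why these calls need wrapping: after a restart the kernel assigns new process ids, yet the application may still hold the old ones. A minimal sketch of a translation table follows. PidTable and its methods are hypothetical names, and the choice that the virtual pid equals the real pid at first launch is an assumption made for illustration.

```python
class PidTable:
    """Hypothetical virtual-to-real pid translation table."""

    def __init__(self):
        self._virt_to_real = {}

    def register(self, real_pid):
        # Assumption: at first launch the virtual pid equals the real pid.
        self._virt_to_real[real_pid] = real_pid
        return real_pid

    def rebind(self, virtual_pid, new_real_pid):
        # After a restart the kernel hands out a new real pid; the
        # application keeps seeing the old (virtual) one.
        if virtual_pid not in self._virt_to_real:
            raise KeyError(virtual_pid)
        self._virt_to_real[virtual_pid] = new_real_pid

    def to_real(self, virtual_pid):
        # Used on the way into the kernel, e.g. by a kill() wrapper.
        return self._virt_to_real[virtual_pid]

    def to_virtual(self, real_pid):
        # Used on the way out of the kernel, e.g. by a getpid() wrapper.
        for virt, real in self._virt_to_real.items():
            if real == real_pid:
                return virt
        raise KeyError(real_pid)
```

A wrapper for a call such as kill would translate its pid argument with to_real before calling the real function, while a wrapper for getpid would translate the result with to_virtual.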
How Does It Work?
› Checkpoint image compression on-the-fly (default).
› Currently only supports dynamically
linking to libc.so. Support for static
libc.a is feasible, but not
implemented.
› Stays close to POSIX API standards.
A Checkpoint Under DMTCP
› dmtcphijack.so & mtcp.so present in
executable’s memory.
› Ask coordinator process for
checkpoint via dmtcp_command.
› Now what happens?
A Checkpoint Under DMTCP
› Suspend user threads with SIGUSR2.
› Elect shared file descriptor leaders.
› Drain kernel buffers and do network
handshake with peers.
› Write checkpoint to disk.
› Refill kernel buffers.
› Resume user threads.
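The steps above run in lockstep across all participating processes, which is why the coordinator only has to act as a synchronization server. The following is a toy simulation, not DMTCP code: threads stand in for processes, a threading.Barrier stands in for the coordinator, and the function name run_checkpoint is hypothetical.

```python
import threading

# Phase names taken from the steps listed above.
PHASES = [
    "suspend user threads",
    "elect shared fd leaders",
    "drain kernel buffers",
    "write checkpoint to disk",
    "refill kernel buffers",
    "resume user threads",
]


def run_checkpoint(num_processes):
    """Simulate num_processes peers walking through the phases together."""
    barrier = threading.Barrier(num_processes)  # stand-in for the coordinator
    log = []
    log_lock = threading.Lock()

    def worker(rank):
        for step, phase in enumerate(PHASES):
            with log_lock:
                log.append((step, rank, phase))
            # Nobody starts the next phase until every peer finished this one.
            barrier.wait()

    threads = [threading.Thread(target=worker, args=(rank,))
               for rank in range(num_processes)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return log
```

The barrier keeps no per-job state beyond a count of arrivals, which mirrors the description of the coordinator as stateless.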
Where Is the Checkpoint?
› In the cwd of the application.
A set of ckpt_<exec>_<id>.dmtcp files.
› In the cwd of the coordinator.
A dmtcp_restart_script.sh file.
The dmtcp_restart_script.sh may need
tweaking depending upon circumstance.
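A small helper can locate these files before a restart. This is a sketch under the naming shown above (ckpt_<exec>_<id>.dmtcp and dmtcp_restart_script.sh); the function names are hypothetical, and the dmtcp_restart argument list is only built, not executed.

```python
from pathlib import Path


def find_checkpoint_images(app_cwd):
    """Return the ckpt_*.dmtcp files in the application's working directory."""
    return sorted(Path(app_cwd).glob("ckpt_*.dmtcp"))


def find_restart_script(coordinator_cwd):
    """Return the restart script path if the coordinator has written one."""
    script = Path(coordinator_cwd) / "dmtcp_restart_script.sh"
    return script if script.is_file() else None


def restart_argv(app_cwd):
    """Build an argument list for dmtcp_restart; it is not executed here."""
    images = find_checkpoint_images(app_cwd)
    if not images:
        raise FileNotFoundError(f"no checkpoint images in {app_cwd}")
    return ["dmtcp_restart"] + [str(path) for path in images]
```

The returned list could be handed to subprocess.run on a machine where DMTCP is installed.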
A Restart Under DMTCP
› Restart Process loads in memory.
› Reopen files and recreate ptys.
› Recreate and reconnect sockets.
› Fork into user processes.
› Rearrange file descriptors to initial layout.
› Restore memory and threads.
› Refill kernel buffers.
› Resume user threads.
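The "reopen files" and "rearrange file descriptors" steps can be sketched with ordinary POSIX calls. This is an illustration rather than DMTCP's code: the function name restore_fd is hypothetical, and it assumes the checkpoint recorded each file's path, offset, and original descriptor number.

```python
import os


def restore_fd(path, offset, target_fd, flags=os.O_RDONLY):
    """Reopen path, seek to offset, and place it at descriptor target_fd."""
    fd = os.open(path, flags)
    os.lseek(fd, offset, os.SEEK_SET)
    if fd != target_fd:
        # dup2 atomically replaces whatever target_fd referred to before.
        os.dup2(fd, target_fd)
        os.close(fd)
    return target_fd
```

After the call, code that still holds the old descriptor number reads from the same file at the same position as before the checkpoint.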
Supported OS Features
› Threads, mutexes/semaphores, fork, exec
 Shared memory (via mmap), TCP/IP sockets,
UNIX domain sockets, pipes, ptys, terminal
modes, ownership of controlling terminals,
signal handlers, open and/or shared fds, I/O
(including the readline library), parent-child
process relationships, process id & thread id
virtualization, session and process group ids,
and more…
› Trying to keep the implementation small!
Supported Applications
› MPICH-2, OpenMPI, SciPy/IPython, Python
 cmsRun, Perl, Ruby, PHP, GHCi (Glasgow Haskell
Compiler), OCaml, Octave, Macaulay2, gnuplot,
slsh (S-Lang scripts), MZScheme, GST (GNU
Smalltalk virtual machine), tcsh, dash, csh, tclsh
(tcl-based interpreter), SQLite.
 And many others!
Planned Application Support
› Bash, gcl (GNU Common Lisp), maxima
(based on gcl), and the Sun JVM.
› These programs use sbrk() for their
own memory management and induce a
bug in DMTCP.
› A fix is planned and will go in soon.
Planned Application Support
› Matlab
Directly calling the binary without
graphics works, but matlab uses bash
which needs the sbrk() fix.
Condor/DMTCP Integration
› Experimental at this time.
 Determining scalability, stability, and extent of “weird edge cases” of DMTCP mixed with Condor.
› Completely outside of Condor source code.
 A vanilla job called “shim_dmtcp” that wraps
the user’s job and stdfiles with DMTCP.
 A submit description file which transfers
needed dmtcp files over to the remote side and
saves intermediate checkpoints.
 No remote I/O!
Shim Script Execution
[Figure: process hierarchy. condor_starter runs shim_dmtcp, which in turn runs the DMTCP Coordinator and the Job.]
Submit File Example
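# The shim runs as an ordinary vanilla universe job.
universe = vanilla
executable = shim_dmtcp
# Shim arguments: its log file, the job's stdin/stdout/stderr files,
# then the job itself and the job's own arguments.
arguments = logfile stdinf stdoutf stderrf a.out arg0 arg1…
should_transfer_files = YES
# Transfer files back on eviction as well as on exit, so that
# intermediate checkpoints are saved.
when_to_transfer_output = ON_EVICT_OR_EXIT
transfer_input_files = <dmtcp libraries and programs>,\
a.out, stdinf, stdoutf, stderrf
environment = DMTCP_TMPDIR=./;JALIB_STDERR_PATH=/dev/null
# Signal 2 is SIGINT on Linux.
kill_sig = 2
output = shim.$(Cluster).$(Process).out
error = shim.$(Cluster).$(Process).err
log = shim.log
queue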
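When many jobs are submitted this way, it can help to generate the submit description instead of editing it by hand. The helper below is a hypothetical sketch that reproduces the layout of the example above. The list of DMTCP libraries and programs is elided in the example, so the caller must supply it as dmtcp_files; the function name and parameter names are assumptions.

```python
def make_shim_submit(job, job_args, dmtcp_files,
                     logfile="logfile", stdin="stdinf",
                     stdout="stdoutf", stderr="stderrf"):
    """Render a submit description like the example above as text."""
    # Shim arguments: log file, the three stdio files, then the job and its args.
    arguments = " ".join([logfile, stdin, stdout, stderr, job] + list(job_args))
    # dmtcp_files is caller-supplied; the example does not enumerate it.
    inputs = ", ".join(list(dmtcp_files) + [job, stdin, stdout, stderr])
    lines = [
        "universe = vanilla",
        "executable = shim_dmtcp",
        f"arguments = {arguments}",
        "should_transfer_files = YES",
        "when_to_transfer_output = ON_EVICT_OR_EXIT",
        f"transfer_input_files = {inputs}",
        "environment = DMTCP_TMPDIR=./;JALIB_STDERR_PATH=/dev/null",
        "kill_sig = 2",
        "output = shim.$(Cluster).$(Process).out",
        "error = shim.$(Cluster).$(Process).err",
        "log = shim.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"
```

The returned text can be written to a file and passed to condor_submit.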
Condor/DMTCP Integration
› Early Results
 It works with our test case and thousands of
jobs.
 Problems
• Checkpointing between Physical Address Extension (PAE) kernels and
normal kernels is a challenge.
• DMTCP’s API needs some improvement.
• Coordinator failure means job failure.
• Shim script is clunky, e.g. no streaming I/O.
› Next: Integration into our stduniv test
suite for full regression testing.
Future Condor Integration
› Add WantCheckpoint = True and CheckpointMethod = DMTCP for a vanilla universe job.
› Condor takes care of the wrapping of the job with DMTCP and transferal of needed DMTCP files; no shim script voodoo.
› Condor should honor CheckpointPlatform for Vanilla universe jobs in case of pool segmentation.
› Parallel universe support with single coordinator.
› Doug Thain’s Parrot for remote I/O.
Challenges
› C/C++ runtime library compatibility issues.
 Recompile DMTCP on slot before job execution?
› Dynamic library incompatibilities.
› No Checkpoint Server.
 Condor file transfer protocol enhancement?
› Debugging methods and practices?
Further Reading
› “DMTCP: Transparent Checkpointing
for Cluster Computation and the
Desktop”
http://arxiv.org/abs/cs/0701037
› Source Code
http://dmtcp.sourceforge.net
Questions?
› DMTCP
http://dmtcp.sourceforge.net
Gene Cooperman: gene@ccs.neu.edu
› Condor/DMTCP Integration
Pete Keller: psilord@cs.wisc.edu
Ask me if you want to try the Alpha
Version out!
Thank you