• Overview • Capbility • Specifying policy through ClassAds • Condor-G & Glidein

advertisement
•
•
•
•
•
•
Dr Steven Newhouse
Technical Director
Overview
Capbility
Specifying policy through ClassAds
Condor-G & Glidein
Master & Worker
Condor DAG Man
London e-Science Centre
Department of Computing, Imperial College London
2
mlnopohqrs
"!#!%$'& (*)+& ,-
• High Throughput Computing (HTC)
Infrastructure: CPU Cycles/year
• Uses ClassAds to represent jobs & resources
during matchmaking
• Inherent fault-tolerance
• Available on W2000 (limited support) & UNIX
MONQPSRTNQUe V
VXW NQY[ZQ\^]_NQN Wa`cbd
MONQPSRTNQU
g VXk
fhgji
. )/& (1032547698;:<=<;> ?9@AB<;C76D4=E@D:FAG8H6DIGIGJH?9>8HECK>69?;L
3
‘ s’“”•
4
– “—˜o• ‘ ”
‡wˆsˆ Ž Š3‰Ša ‹ ŒyŠ3
z;uyw… x‚ tK{uyx ||ƒuw€w}vy†au ~yx uy€ }3~Kx  „ uy€
Ÿ ¯³ §¤¬¤´µ££9¶'·'¢±šª¤´'¦
ŸK a §¤¡ ¢¤¨ƒ£ £w¨3¥'©¤¦ ª« ¬'¢­®« §¤ª
¯ ¢­®° ±š¢²
• One user & one machine (no root access)
• Provides a persistent execution environment
that will run jobs when your policy allows
them to run
• Need jobs to run without any user interaction
‡ˆˆ Ž Š3‰Š3 ‹ ŒwŠa
… uy€y†auyxºwuwuy„
5
… ‚ € }3x ~K„š™5~K€ ~y› ‚ x
œ ~w|p} ‚ x
|}~Kx }a† € ‚ ›7u }3 ~y}3uyx
| yž ‚ †a†
 uy„ „ ‚  }3uyx
¸ ‚ |3{¤}au 
œ ~y|} ‚ x
|p}~Kx }3†
| yž ‚ †a†
… „ v |} ‚ x¹ uy† ‚
œ ~w|p} ‚ x
|}~Kx }a†
… „ v |} ‚ x¹ uy† ‚
œ ~w|p} ‚ x
|}~Kx }a†
6
1
“’
»s’’
• Master
• Vanilla
– Starts up, manages, and deploys new daemons
– No checkpointing, migration, restart or remote calls
• Startd (Represents a machine to the Condor system)
• Standard
– Responsible for starting, suspending, and stopping jobs
– Enforces the wishes of the machine owner (the owner’s
“policy”… more on this soon)
– Relink with Condor ‘C’ library to provide checkpointing,
migration & restart
– Redirect remote system calls to submitting machine
– No support for fork, shared memory, or IPC system calls
• Schedd (Represents users to the Condor system)
• Globus
– Maintains the persistent queue of jobs
– Responsible for contacting available machines and sending them
jobs
– Specify the resource job manager
• Scheduler (e.g. DAGman)
• Collector (Represents the state of the Condor pool)
– Runs meta-scheduler on submitting machine
– Each daemon periodically provides an update via a ClassAd
• Negotiator (Matches jobs to resources through ClassAd’s) 7
¼ ph½m— ’¾•“””¤’
¿½p“ “ »s’Àr”— Á
• MPI
• PVM
8
 ý“sÄ
• I/O System calls trapped and sent back to
submit machine
• Allows Transparent Migration Across
Administrative Domains
• Language Independent
• No Source Code changes required
• Condor can transfer files
Schedd
Startd
Starter
Shadow
– Automatically send back changed files
– Atomic transfer of multiple files
Submit
Customer Job
Condor
Syscall Lib
9
Å “o““’h½m”hƕ»qrÇÈ
• Looking for the solution to the
NUG30 quadratic assignment
problem
• An informal collaboration of
mathematicians and computer
scientists
• Condor-G delivered 3.46E8 CPU
seconds in 7 days (peak 1009
processors) in U.S. and Italy (8
sites)
10
Ĕ“êÄs“”””ë
ɃÊ˵Ì9˵ÍÎ9˵Í3Ê˵É5Ë'Ï5Ë'ÉÐ5Ë'É5Ì9Ë
É5Ñ9Ë Ò˵ÍÉ9˵Í9Ë ÊËpÍ3ÒË'ÍÌ5Ë'Í5Í9Ë
É5Ï9˵Í9Ð5ËpÉ9Ó5Ë'ÏÑ5Ë'Ð5Ë'ÍÑ9˵É3ÒË
Î5Ë'É5Î9˵Ó9ËpÍ5Ó9˵É5Í5Ë'É5É5˵Í5Ï
Ô;ÕaÖ×DØDÙyÚÛ Ü=Ý Þ ßƒàDáKá®ÕaâHã¤àaä*×aâ#Ø3àDÞ Öwå®ä ÕaæyÖÕDÞµá®âHçrè æyéwàDá®æƒè á
11
12
2
¼ ’h½mo“ ëosëo
Submit
Machine
Schedd
Schedd
»’½Ä ˜ ¼ ìs’í ¼ “ î
Collector
Collector
Collector
Collector
Collector
Collector
Negotiator
Negotiator
Negotiator
Negotiator
Negotiator
Negotiator
Central
Manager
Pool-Foo
Central
Manager
Pool-Bar
Central
Manager
(CONDOR_HOST)
saturn
viking-i00
(Solaris E6800) (Linux Xeon)
• A Requirement expression must evaluate to True
for a match to be made
• All matches which meet the requirements can be
sorted by preference with a Rank expression.
• Higher the Rank, the better the match
mir
(Linux x86)
13
Universe
= vanilla
Executable = my_job
Arguments = -arg1 –arg2
InitialDir = run_$(Process)
Requirements = Memory >= 256 && Disk > 10000
Rank = (KFLOPS*10000) + Memory
Queue 600
¼ ’ ‘ ssïðoñ ½m“sò ‘ ”—
ó ˜“’î pÄĔ—
START – When is this machine willing to start a job
RANK - Job Preferences
SUSPEND - When to suspend a job
CONTINUE - When to continue a suspended job
PREEMPT – When to nicely stop running a job
KILL - When to immediately kill a preempting job
• START jobs when their has been no activity
on the keyboard/mouse for 5 minutes and the
load average is low
• SUSPEND jobs as soon as activity is detected
• PREEMPT jobs if the activity continues for 5
minutes or more
• KILL jobs if they take more than 5 minutes to
preempt
15
16
Å “s’o"•˜ëõô”
NonCondorLoadAvg = (LoadAvg - CondorLoadAvg)
BackgroundLoad = 0.3
HighLoad = 0.5
KeyboardBusy = (KeyboardIdle < 10)
CPU_Busy = ($(NonCondorLoadAvg) >= $(HighLoad))
MachineBusy = ($(CPU_Busy) || $(KeyboardBusy))
ActivityTimer = (CurrentTime - EnteredCurrentActivity)
START = $(CPU_Idle) && KeyboardIdle > 300
SUSPEND = $(MachineBusy)
CONTINUE = $(CPU_Idle) && KeyboardIdle > 120
PREEMPT = (Activity == "Suspended") &&
$(ActivityTimer) > 300
KILL = $(ActivityTimer) > 300
14
ö o“÷•s”“’’÷’ø
• A ClassAd maps attributes to expressions
• Expressions
–
–
–
–
Constants: strings, numbers, etc.
Expressions: other.Memory > 600M
Lists: { “roy”, “pfc”, “melski” }
Other ClassAds
• Powerful tool for grid computing
– Semi-structured (you pick your structure)
– Matchmaking
• Open Source: http://www.cs.wisc.edu/condor/classad
17
18
3
”“’’÷ Å “o“îë
”“’’÷õ»•’“ ë
[
• Condor:
Type = “Job”;
Owner = “roy”;
Universe = “Standard”;
Requirements = (other.OpSys == “Linux” && other.DiskSpace > 140M);
Rank = (other.DiskSpace > 300M ? 10 : 1);
– Used to describe jobs & machines
– Used by the matchmaker to place jobs on machines
• European Data Grid:
]
[
– JDL: Class Ad derived schema for jobs & machines
– Used by the resource broker to match jobs to
machines (See later)
Type = “Machine”;
OpSys = “Linux”;
DiskSpace = 500M;
AllowedUsers = {“roy”, “melski”, “pfc”};
Requirements = (IsMember(other.Owner, AllowedUsers);
• Future:
]
– XML Representation
(To see a real ClassAd, try: condor_q –l or condor_status –l)
19
ö o“ê’wùqnø
20
ùqnïúl1ê ö sî’
Condor-G
• Use Condor to run jobs on Grid resources
• Use the Globus Toolkit to provide:
Grid Resource
Schedd
Schedd
– GRAM (submit a remote job)
– GASS (transfer job’s files)
LSF
LSF
• Use Condor to provide:
– Job management
– Fault tolerance
– Credential management
• Two components: Globus Universe & GlideIn
• Disadvantages
– No remote syscalls, checkpoint/migration, or dynamic
resource selection
21
ùqrïúl1ê ö î ’
‡wˆsˆŽ Š3û5ü Ša ¤ýµ
Condor-G
22
ùqnïúl1ê ö sî’
‡wˆsˆŽ Š3û5ü Š3 ýp
Grid Resource
Condor-G
Schedd
Schedd
Grid Resource
Schedd
Schedd
LSF
LSF
GridManager
GridManager
23
LSF
LSF
24
4
ùqrïúl1ê ö î ’
‡wˆsˆŽ Š3û5ü Ša ¤ýµ
ùqnïúl1ê ö sî’
‡wˆsˆŽ Š3û5ü Š3 ýp
Condor-G
Grid Resource
Condor-G
Grid Resource
Schedd
Schedd
JobManager
JobManager
Schedd
Schedd
JobManager
JobManager
LSF
LSF
GridManager
GridManager
GridManager
GridManager
LSF
LSF
User
User Job
Job
25
qnӐ
qr”¤Ã’h»•s’
Job Submission Machine
Job Execution Site
• Create your own personal Condor pool from
temporarily-acquired Grid resources
• Brings the full power of Condor to the Grid
• Run a Condor startd on a Grid resource
• Startd reports back to your machine and runs
Vanilla and Standard Universe jobs
Globus
GateKeeper
rk
Fo
rk
End User
Requests
Persistant
Job Queue
Fo
Condor -G
Scheduler
26
Globus
JobManager
Globus
JobManager
Submit
Submit
Fork
Site Job Scheduler
(PBS, Condor, LSF, LoadLeveler, NQE, etc. )
Condor -G
GridManager
GASS
Server
Job X
Job Y
28
ùqrïúl1ê ö î ’
‡wˆsˆ Ž Š3‰Ša ‹ ŒyŠ3
Condor-G
ùqnïúl1ê ö sî’
‡ˆˆ Ž Š3‰Š3 ‹ ŒwŠa
Grid Resource
Condor-G
Schedd
Schedd
Grid Resource
Schedd
Schedd
LSF
LSF
LSF
LSF
Collector
Collector
Collector
Collector
þ ü ÿ Œ®ÿ ‹ 
29
30
5
ùqrïúl1ê ö î ’
‡wˆsˆ Ž Š3‰Ša ‹ ŒyŠ3
Condor-G
ùqnïúl1ê ö sî’
‡ˆˆ Ž Š3‰Š3 ‹ ŒwŠa
Grid Resource
Schedd
Schedd
GridManager
GridManager
LSF
LSF
Condor-G
Grid Resource
Schedd
Schedd
JobManager
JobManager
LSF
LSF
GridManager
GridManager
Collector
Collector
Collector
Collector
þ ü ÿ Œ®ÿ ‹ 
þ ü ÿ Œ®ÿ ‹ 
31
ùqrïúl1ê ö î ’
‡wˆsˆ Ž Š3‰Ša ‹ ŒyŠ3
32
ùqnïúl1ê ö sî’
‡ˆˆ Ž Š3‰Š3 ‹ ŒwŠa
Condor-G
Grid Resource
Condor-G
Grid Resource
Schedd
Schedd
JobManager
JobManager
Schedd
Schedd
JobManager
JobManager
GridManager
GridManager
LSF
LSF
LSF
LSF
GridManager
GridManager
Startd
Startd
Startd
Startd
Collector
Collector
Collector
Collector
þ ü ÿ Œ®ÿ ‹ 
þ ü ÿ Œ®ÿ ‹ 
33
ùqrïúl1ê ö î ’
‡wˆsˆ Ž Š3‰Ša ‹ ŒyŠ3
Condor-G
Grid Resource
Schedd
Schedd
JobManager
JobManager
GridManager
GridManager
‡ˆˆ Ž Š3‰Š3 ‹ ŒwŠa
z;… uyuy…€yx‚ t7†a{¤uwx |p|ƒuy€yuy}vy†axu ~yx uy€ }3ºw~Kx uw „ uyuy€ „
þ ü ÿ Œ®ÿ ‹ 
þ ü ÿ Œ®ÿ ‹ Ž Š3
'ÿ ƒ‹ Œ7ü H‰Ša‹ ŒwŠa!pŠŠ3ü
LSF
LSF
Startd
Startd
Collector
Collector
34
User
User Job
Job
35
36
6
Åö#" Å “’ù ö sî
Job Submission Machine
Job Execution Site
• Master-Worker Style Parallel Applications
Persistant
Job Queue
End User
Requests
Condor -G
GridManager
Condor -G
Scheduler
Fork
– Large problem partitioned into small pieces (tasks);
– The master manages tasks and resources (worker pool);
– Each worker gets a task, execute it, sends the result back,
and repeat until all tasks are done;
– Examples: ray-tracing, optimization problems, etc.
Globus Daemons
+
Local Site Scheduler
[See Figure 1]
GASS
Server
Fork
• On Condor (PVM, Globus, … … )
Condor -G
Collector
– Many opportunities!
– Issues (in a Distributed Opportunistic Environment):
Condor
Daemons
Transfer Job X
• Resource management, communication, portability;
• Fault-tolerance, dealing with runtime pool changes.
Fork
Condor
Shadow
Process for
Job X
Job
Resource
Information
Redirected
System Call
Data
Job X
Condor System Call
Trapping & Checkpoint
Library
37
Åö ½Ä”¤˜p—ho ö sî%$
38
Åö ½’’h½ms’
• An OO framework with simple interfaces
• Nug30 solved in 7 days by MW-QAP
– Quadratic assignment problem outstanding for 30 years
– Utilized 2500 machines from 10 sites
– 3 classes to extend, a few virtual functions to fill;
– Scientists can focus on their algorithms.
• NCSA, ANL, UWisc, Gatech, INFN@Italy, … …
• 1009 workers at peak, 11 CPU years
• Lots of Functionality
– http://www-unix.mcs.anl.gov/metaneos/nug30/
– Handles all the issues in a meta-computing environment;
– Provides sufficient info. to make smart decisions.
• STORM (flight scheduling)
• Many Choices without Changing User Code
– Stochastic programming problem (1000M row X 13000M col)
– Multiple resource managers: Condor, PVM, …
– 2K times larger than the best sequential program can do
– 556 workers at peak, 1 CPU year
– Multiple communication interfaces: PVM, File, Socket, …
– http://www.cs.wisc.edu/~swright/stochastic/atr/
39
ó ÷•q Å “
40
&('*)!+-, .)0/-1-243
• A DAG is the data structure
used by DAGMan to represent
these dependencies.
• Directed Acyclic Graph Manager
• Specify dependencies between your Condor jobs,
so it can manage them automatically for you.
e.g. Don’t run “B” until “A” has completed successfully.
• Most real science involves complex sequences of
tasks – on many resources at many sites.
• Each job is a “node” in the
DAG.
• Each node can have any
number of “parent” or
“children” nodes – as long as
there are no loops!
e.g., move data, compute, check, move back, etc
• Failures are a certainty, so recoverability of the
sequence – not just the jobs – is crucial.
41
Job A
Job B
Job C
Job D
42
7
¼ ë“ ó ÷•q
¼ ¤ë“ ó ÷q1¿òÁ
• DAGMan acts as a “meta-scheduler”, managing the
submission of your jobs to Condor based on the
DAG dependencies.
• DAGMan holds & submits jobs to the Condor queue
at the appropriate times.
A
Condor A
Job
Queue
B
A
C
.dag
File
Condor B
Job
Queue C
DAGMan D
B
C
DAGMan D
43
¼ ë“ ó ÷•q ¿òÁ
44
¼ së“ ó ÷•q
• In case of a job failure, DAGMan continues until it can no
longer make progress, and then creates a “rescue” file with
the current state of the DAG.
• Once the failed job is ready to be re-run, the rescue
file can be used to restore the prior state of the DAG.
A
Condor
Job
Queue
B
A
X
Rescue
File
Condor
Job
Queue C
DAGMan D
Rescue
File
C
B
DAGMan D
45
ô’oë“ ó ÷•q
46
‘¼ í ‘ Àr½ð½Ä’
• Once the DAG is complete, the DAGMan job itself
is finished, and exits.
• Executes locally on the submit host before or
after job submission…
• Example:
PRE
Job A
# diamond.dag
PRE A prepare-A.sh
Job A a.sub
Job B b.sub
Job C c.sub
Job D d.sub
POST D double-check.sh
Parent A Child B C
Parent B C Child D
A
Condor
Job
Queue
B
DAGMan
C
D
Job B
Job C
Job D
POST
• PRE/POST scripts are part of node
47
48
8
Æ ¼ ð ¼65
½“s—
•
•
•
•
• Tells DAGMan to re-run a node multiple
times if necessary…
• Example:
Job A
# diamond.dag
Job A a.sub
Job B b.sub
RETRY B 5
Job C c.sub
RETRY C 5
Job D d.sub
Parent A Child B C
Parent B C Child D
Job B
Job C
Condor provides a reliable execution infrastructure
Enables the federation of resources
Provides a reliable Grid submission system
Distributed dependent workflow (no control flow)
with failure detection and recovery
Job D
49
50
9
Download