The Ethernet Approach to Grid Computing

Douglas Thain and Miron Livny
Condor Project, University of Wisconsin
http://www.cs.wisc.edu/condor/ftsh
The UW US-CMS Physics Grid
MCRunJob (python)
Impala (bash)
MOP (python)
Submit DAG (perl)
Condor-G (C++)
Gridmanager (C++)
GAHP Server (C++)
DAGMan (C++)
Wrapper
globus-url-copy (C)
Gatekeeper (C)
Jobmanager (C)
Batch Interface (bash)
Batch System (???)
MOP wrapper (bash)
Impala wrapper (bash)
Actual Job (Fortran)
Outline
• Two problems in real systems:
– Timing is uncontrollable.
– Failures lack detail.
• A solution:
– The Ethernet Approach.
– Carrier Sense, Collision Detect, Exponential Backoff, Limited Allocation.
• A language and a tool:
– The Fault Tolerant Shell.
– Time and failures are explicit: try for 30 minutes ... end
• Example Applications:
– Shared Job Queue.
– Shared Disk Buffer.
– Shared Data Servers.
1 - Timing is Uncontrollable
• Consider a distributed file system.
• Suppose that the network is down.
– “Soft mounted”: failure is reported after one minute.
– “Hard mounted”: failure is never exposed.
• Time is an unknown in nearly every
operating system activity:
– Process invocation.
– Memory access.
– Network communications.
2 - Failures Lack Detail
• Consider this trivial program:
% cp a b
• We would like to distinguish:
– “success.”
– “file not found.”
– “nfs server down, still trying.”
– “couldn’t find library libc.so.25.”
• Actual results:
– “success.” (exit code 0)
– “file not found.” (exit code 1)
– “nfs server down, still trying.” (exit code 1)
– “couldn’t find library libc.so.25.” (exit code 1)
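The collapse of distinct failures into one exit code is easy to reproduce. The sketch below is not from the talk: it replaces cp with three stand-in Python one-liners (the names SUCCESS, FILE_NOT_FOUND, and MISSING_LIBRARY and the paths inside them are assumptions for illustration) and reports only what a calling script would see.

```python
import subprocess
import sys

def exit_code(argv):
    """Run a command and return the only thing a calling script sees: its exit code."""
    result = subprocess.run(argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode

# Stand-in commands (assumptions for illustration, not the cp from the slide).
SUCCESS = [sys.executable, "-c", "pass"]
FILE_NOT_FOUND = [sys.executable, "-c", "open('/no/such/dir/a')"]
MISSING_LIBRARY = [sys.executable, "-c", "import no_such_library_25"]

if __name__ == "__main__":
    for name, argv in [("success", SUCCESS),
                       ("file not found", FILE_NOT_FOUND),
                       ("missing library", MISSING_LIBRARY)]:
        print(name, "-> exit code", exit_code(argv))
```

Both failing commands report exit code 1, so the caller cannot tell a missing file from a missing library without parsing error text.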
Examples Abound!
• TCP connect -> ECONNREFUSED
– Wrong port number.
– A loaded service is rejecting connections.
– The machine has just rebooted, has initialized
TCP/IP, but not yet started the service.
• FTP RETR -> code 550
– “550 File or directory not found.”
– “550 Erlaubnis hat verweigert.” (German: permission denied)
– “550 Archiveer systeem offline.” (Dutch: archive system offline)
– “550 Fuori di memoria.” (Italian: out of memory)
– “550 File staging in from tape.” (NCSA Unitree)
Not enough information or control.
• How do we design new systems that avoid these problems?
– “Error Scope,” HPDC 2002.
• Real systems have these problems. How can we learn to live with them?
– “Ethernet Approach,” HPDC 2003.
The Ethernet Approach
Network or Memory or Disk Space or OS Resources
Ethernet Rules
• Carrier Sense
– No Carrier Sense == Aloha Protocol
• Collision Detect
• Exponential Backoff
• Limited Allocation
The Fault Tolerant Shell
• A tool that encourages the Ethernet approach
in system integration.
– Similar to the Bourne or C-Shells.
– Process invocation and repetition are simple.
– Other elements are possible but ugly.
• Not meant to be general purpose, high
performance, or abstractly beautiful.
– Not OOP, AOP, SOP, GP, etc...
– Ethernet ideas could be used in such languages.
• Elements:
– Brittle property, try/catch, timed try, forany/forall.
The Brittle Property
wget http://host/file.tar.gz
gunzip file.tar.gz
tar xvf file.tar
Failure of any step causes an immediate halt of the entire group.
Untyped Exceptions
try
wget http://host/file.tar.gz
gunzip file.tar.gz
tar xvf file.tar
catch
echo "Zoiks!"
end
Failure of this group raises an exception. Exceptions have no type!
Timed Try Statements
try for 30 minutes
wget http://host/file.tar.gz
gunzip file.tar.gz
tar xvf file.tar
end
The enclosed statement will be cancelled after 30 minutes.
An exception in the enclosed statement causes it to be retried for up to 30 minutes, with exponential backoff.
Success after n attempts is as good as success after one. (Otherwise, failure.)
Timed Try Statements
• If group completes within time limit.
– Try block succeeds.
• If group fails within time limit.
– Automatically retried.
– Exponentially increasing delay.
– Random factor to avoid collisions.
• If group runs over time limit.
– Resources reclaimed, exception thrown.
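The retry rules above can be sketched in Python. This is an illustration of the semantics only, not ftsh's implementation: the name try_for, its parameters, and the jitter range are assumptions. Unlike ftsh, it does not cancel an attempt that is still running when the limit passes; it only stops retrying.

```python
import random
import time

def try_for(seconds, action, base_delay=1.0, max_delay=3600.0,
            clock=time.monotonic, sleep=time.sleep):
    """Retry action() until it succeeds or the time budget is used up.

    Returns the action's result on success and raises TimeoutError
    once the budget is exhausted. clock and sleep are injectable for testing.
    """
    deadline = clock() + seconds
    delay = base_delay
    while True:
        try:
            return action()          # success after n tries is still success
        except Exception as err:     # untyped: any failure triggers a retry
            last_error = err
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError("time limit exceeded") from last_error
        # Exponentially increasing delay with a random factor to avoid collisions.
        pause = min(delay * random.uniform(1.0, 2.0), remaining)
        sleep(pause)
        delay = min(delay * 2, max_delay)
```

A call such as try_for(30 * 60, fetch_and_unpack), where fetch_and_unpack is a hypothetical function that raises on failure, corresponds roughly to the "try for 30 minutes" block.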
forany and forall
forany host in xxx yyy zzz
wget http://${host}/file
end
Attempt to make this statement succeed for any random branch.
forall host in xxx yyy zzz
wget http://${host}/file
end
Attempt to make this statement succeed for all branches simultaneously.
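As a rough Python analogy for the two constructs (a sketch of the semantics, not how ftsh implements them): forany tries the branches in random order and stops at the first success, while forall runs every branch concurrently and fails if any branch fails. The function names mirror the ftsh keywords; the thread pool is an assumption, and unlike ftsh this forall waits for the other branches to finish rather than aborting them.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def forany(items, action):
    """Try action(item) on the items in random order; return the first success."""
    candidates = list(items)
    random.shuffle(candidates)
    last_error = None
    for item in candidates:
        try:
            return action(item)
        except Exception as err:
            last_error = err         # this branch failed, move to the next
    raise RuntimeError("no branch succeeded") from last_error

def forall(items, action):
    """Run action(item) for every item concurrently; raise if any branch fails."""
    items = list(items)
    with ThreadPoolExecutor(max_workers=max(1, len(items))) as pool:
        futures = [pool.submit(action, item) for item in items]
        # result() re-raises the exception of a failed branch.
        return [future.result() for future in futures]
```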
Ethernet Properties in the Example Applications

Property            Job Queue         Disk Buffer           Data Servers        Handled by
Collision Detect    failed cmd        failed cmd            failed cmd          ftsh
Exp. Backoff        “try” backoff     “try” backoff         “try” backoff       ftsh
Limited Allocation  “try” timeout     “try” timeout         “try” timeout       ftsh
Carrier Sense       File Descriptors  Estimated Free Space  Short Active Probe  coder
Shared Job Queue
Multiple clients connect to a job queue to manipulate jobs.
(Submit, query, remove, etc.) What’s the bottleneck?
[Figure: Clients and a Match Maker connect to the Condor schedd, which keeps the Job Queue and an Activity Log on the Local Filesystem; Jobs run on CPUs.]
Aloha Client
try for 5 minutes
condor_submit job.file
end
Ethernet Client
try for 5 minutes
if avail_fds() .lt. 1000
failure
end
condor_submit job.file
end
Measure free file descriptors.
Throw an exception and try again.
Shared Disk Buffer
Multiple batch jobs share an output buffer.
Jobs write output files, and a mover pushes them out.
Step A: Arbitrate
Step B: Write
Step C: Commit
Step D: Read
Step E: Send
Step F: Delete
[Figure: Jobs 8, 9, and 10; a Local File System holding d4.c, d5.c, d6.c, d7.c, d8.i, d9.i, d10.i; a Data Mover.]
Aloha Client
try for 30 minutes
try
run-job > d$n.i
mv d$n.i d$n.c
catch
rm -f d$n.i
end
end
Create the file, marked “incomplete.”
Atomically commit the file.
Remove the file on any failure.
Ethernet Client
try for 30 minutes
if overcommitted()
failure
end
try
run-job > d$n.i
mv d$n.i d$n.c
catch
rm -f d$n.i
end
end
Buffer is overcommitted if estimated needs exceed available space.
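The slides do not say how overcommitted() forms its estimate. One possible carrier-sense check, sketched in Python under stated assumptions: every incomplete file (the .i suffix used above) and the calling job are each assumed to need expected_bytes_per_job, a hypothetical parameter the coder would have to supply.

```python
import os
import shutil

def overcommitted(buffer_dir, expected_bytes_per_job):
    """Return True if estimated needs exceed the space available in buffer_dir."""
    incomplete = [name for name in os.listdir(buffer_dir) if name.endswith(".i")]
    # Assumed estimate: each job still writing, plus the caller, needs a full
    # output file. Bytes already written are counted both here and in used
    # space, so the estimate errs on the conservative side.
    estimated_needs = (len(incomplete) + 1) * expected_bytes_per_job
    available = shutil.disk_usage(buffer_dir).free
    return estimated_needs > available
```

The check is only an estimate: other jobs may start writing between the check and the write, which is why the surrounding try block still has to detect and retry failures.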
Shared Data Servers
[Figure: Clients, two WWW Servers each holding the dataset, and a Black Hole.]
Black Hole: accepts all connections and holds them idle indefinitely.
A healthy but loaded server might also have a high response time.
Each client wants one instance of the data set, but doesn’t care
which one. How to deal with delays and failures?
Aloha Client
try for 15 minutes
forany host in xxx yyy zzz
try for 1 minute
wget http://${host}/data
end
end
end
Ethernet Client
try for 15 minutes
forany host in xxx yyy zzz
try for 5 seconds
wget http://${host}/tiny
end
try for 1 minute
wget http://${host}/data
end
end
end
Test the server by fetching a tiny file.
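The probe-then-fetch pattern can also be sketched in Python. This is an analogy to the Ethernet client above, not part of ftsh: fetch_from_any and http_get are hypothetical names, and the /tiny and /data paths are taken from the slide. The outer "try for 15 minutes" retry loop is omitted. Note that the urlopen timeout bounds each blocking socket operation rather than the whole transfer.

```python
import random
import urllib.request

def http_get(url, timeout):
    """Fetch a URL; timeout bounds each blocking socket operation."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()

def fetch_from_any(hosts, get=http_get, probe_timeout=5, data_timeout=60):
    """Probe hosts in random order and fetch the data from the first healthy one."""
    candidates = list(hosts)
    random.shuffle(candidates)
    for host in candidates:
        try:
            # Carrier sense: a short probe exposes a black hole within seconds.
            get(f"http://{host}/tiny", probe_timeout)
            return get(f"http://{host}/data", data_timeout)
        except Exception:
            continue                 # treat any failure as "try the next host"
    raise OSError("no server delivered the data set")
```

With the short probe, a client loses about five seconds to a black hole instead of the full one-minute data timeout.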
All Clients Blocked on Black Hole
Some Thoughts
• This is a necessary technique for real problems.
– Timing is uncontrollable; failures lack detail.
– A simple technique has significant payoff.
• The Ethernet approach is not always ideal.
– Carefully chosen errnos are powerful.
– Designing errnos is tricky.
• Requires clients of good will.
– Some scenarios require external coordination.
– Admission control for admission control?
• Time and failure are first-class concerns.
– They should be first-class elements of languages!
– We get good mileage without complex constructions.
• More info at:
– http://www.cs.wisc.edu/condor/ftsh
Computing’s central challenge,
“How not to make a mess of it,”
has not yet been met.
-Edsger Dijkstra