Tips For Deploying Large Pools

Zach Miller
Computer Sciences Department
University of Wisconsin-Madison
zmiller@cs.wisc.edu
http://www.cs.wisc.edu/condor
Overview
When supporting pools of hundreds or
thousands of machines, there are
some potentially tricky issues that
can come up. Here I’ll address a few
of them and talk about some solutions
and workarounds.
Scalability Questions
› How many jobs can I submit at once?
› How many machines can I have in my
pool?
› Does it matter how long my jobs run?
› What other factors impact
scalability?
Job Queue
› The condor_schedd can be one of the
major bottlenecks in a Condor system.
› One schedd *can* hold 50000 jobs
(or perhaps more) but it becomes
painful to use, and can bring the
throughput of your pool way down.
Why?
› Besides consuming an enormous amount of memory and disk,
having lots of jobs in the queue impacts the time it takes to
match jobs.
› The condor_schedd is single-threaded.
› So, while running condor_q and waiting for 10000 jobs to be
listed, the schedd can’t be doing other things, like actually
starting jobs (spawning shadows).
› It also cannot talk to the negotiator to match new jobs…
which can cause the negotiator to time out while waiting!
Job Queue
› One option is to use DAGMan to throttle the number of
submitted jobs.
› Add all your jobs to a DAG (even if there are no
dependencies) and then do:
 condor_submit_dag -maxjobs 200
› DAGMan will then never allow more than 200 jobs from this
batch to be submitted at once. (A minimal DAG is sketched
below.)
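As a sketch, a DAG of independent jobs needs only JOB lines;
the file and node names here are hypothetical:

 # throttle.dag -- independent jobs, so no PARENT/CHILD lines are needed
 JOB job0 work.submit
 JOB job1 work.submit
 JOB job2 work.submit

Then submit it with the throttle applied:

 condor_submit_dag -maxjobs 200 throttle.dag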
DAGMan
› DAGMan also provides the nice ability to retry jobs that
fail (see the RETRY sketch below)
› Each DAGMan batch is independent of any others, i.e. the
maxjobs limit applies only to a particular batch of jobs
› Can add a delay between job submissions so the
condor_schedd isn’t swamped, using:
 DAGMAN_SUBMIT_DELAY = 5
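A sketch of per-node retries in the DAG file (node names are
hypothetical; the count is how many times DAGMan will resubmit
a failed node before giving up):

 # retry each node up to 3 times before marking it as failed
 RETRY job0 3
 RETRY job1 3

DAGMAN_SUBMIT_DELAY is given in seconds and is set in the
condor_config read on the submit machine.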
Other Small Time Savers When Submitting
› In the submit file:
COPY_TO_SPOOL = FALSE
› In the condor_config file:
SUBMIT_SKIP_FILECHECK = TRUE
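A minimal submit file sketch using the first setting; the
executable and file names are hypothetical:

 # skip copying the executable into the schedd's spool directory
 universe      = vanilla
 executable    = my_program
 log           = my_program.log
 COPY_TO_SPOOL = FALSE
 queue 100

SUBMIT_SKIP_FILECHECK = TRUE goes in condor_config, as above,
and tells condor_submit to skip its per-file checks at submit
time.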
File Transfer
› If you are using Condor’s file transfer mechanism and you
are also using encryption, the overhead can be significant.
› Condor 6.7 allows per-file specification of whether or not
to use encryption
Per-File Encryption
. . .
Transfer_input_files = big_tarball.tgz, sec.key
Encrypt_input_files = sec.key
Dont_encrypt_input_files = big_tarball.tgz
. . .
Per-File Encryption (Wildcards Work Too)
. . .
Transfer_input_files = big_tarball.tgz, sec.key
Encrypt_input_files = *.key
Dont_encrypt_input_files = *.tgz
. . .
Job Queue
› My machine is running 800 jobs and the load is too high!!
How can I throttle this?
› Use MAX_JOBS_RUNNING in the condor_config file. By default,
this is set to 300. (You may actually wish to increase this
if your submit machine can handle it.)
› This controls how many shadows the schedd will spawn
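A quick sketch of the knob in the submit machine's
condor_config (the value 200 here is just an example):

 # cap the number of condor_shadow processes this schedd will run at once
 MAX_JOBS_RUNNING = 200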
Pool Size
› Some of the largest known Condor
pools are over 4000 nodes
› Some have 1 VM per actual CPU, and
some have multiple VMs per CPU
Central Manager
› If you have a lot of machines sending
updates to your central manager, it is
possible you are losing some of the
periodic updates. You can determine
if this is the case using the
COLLECTOR_DAEMON_STATS
feature…
Keeping Update Stats
COLLECTOR_DAEMON_STATS = True
COLLECTOR_DAEMON_HISTORY_SIZE = 128
% condor_status -l | grep Updates
UpdatesTotal = 57200
UpdatesSequenced = 57199
UpdatesLost = 2
UpdatesHistory = "0x00000000800000000000000000000000"
If Your Network Is Swamped
› You can make many different intervals
longer (all of these values are in seconds):
 UPDATE_INTERVAL = 300
 SCHEDD_INTERVAL = 300
 MASTER_UPDATE_INTERVAL = 300
 ALIVE_INTERVAL = 300
Negotiation
› Normally, the negotiator considers
each job separately:
Can I run this job? No…
Can I run this job? No…
Can I run this job? No…
Etc…
Negotiation
10/12 09:08:45 Request 00463.00000:
10/12 09:08:45 Rejected 463.0 zmiller@cs.wisc.edu <128.105.166.24:37845>: no match found
10/12 09:08:45 Request 00464.00000:
10/12 09:08:45 Rejected 464.0 zmiller@cs.wisc.edu <128.105.166.24:37845>: no match found
10/12 09:08:45 Request 00465.00000:
10/12 09:08:45 Rejected 465.0 zmiller@cs.wisc.edu <128.105.166.24:37845>: no match found
Negotiation
› This process can be greatly sped up if you know which
attributes are important to each job:
 SIGNIFICANT_ATTRIBUTES = Owner,Cmd
› Then, once a job is rejected, any more jobs of the same
“class” can be skipped immediately.
› The less time the schedd spends talking to the negotiator,
the better
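To get a feel for how many distinct job “classes” are sitting
in your queue, a quick sketch using condor_q and plain shell:

 # count how many queued jobs share each executable
 condor_q -format "%s\n" Cmd | sort | uniq -c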
Job Length
› The length of your jobs matters!
› There is overhead in scheduling a job, moving the data,
starting shadows and starters, etc.
› Jobs that run just a few seconds incur way more overhead
than they do work!
Job Length
› So if your jobs are too short, the schedd basically cannot
keep up with keeping the pool busy
› There is of course no exact formula for how long they
should run, but longer-running jobs usually get better
overall throughput (assuming no evictions!)
Other factors
› If you have many jobs running on a single submit host, you
may want to increase some of your resource limits
› On Linux (and others, I’m sure) there are system-wide
limits and per-process limits on the number of file
descriptors (FDs).
Resource Limits
› System Wide:
Edit /etc/sysctl.conf:
# increase system fd limit
fs.file-max = 32768
Or:
echo 32768 > /proc/sys/fs/file-max
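To check the current limit, and to apply the /etc/sysctl.conf
change without rebooting, a quick sketch:

 # show the current system-wide file descriptor limit
 cat /proc/sys/fs/file-max
 # re-read /etc/sysctl.conf so the new value takes effect
 sysctl -p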
Resource Limits
› Per-Process:
 Edit /etc/security/limits.conf
 Or:
su - root
ulimit -n 16384 # for sh
limit descriptors 16384 # for csh
su - your_user_name
<run job here>
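A sketch of the corresponding /etc/security/limits.conf
entries, raising the per-process open-file limit for all
users (the "*" domain):

 # <domain>  <type>  <item>   <value>
 *           soft    nofile   16384
 *           hard    nofile   16384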
Port Ranges
› Default range is 1024 to 4999
› Again, in /etc/sysctl.conf:
# increase system IP port limits
net.ipv4.ip_local_port_range = 1024 65535
› Or:
echo 1024 65535 > /proc/sys/net/ipv4/ip_local_port_range
Complex Problem
› Exactly how much work a system can
do is a fairly complex problem since
you are dealing with many types of
resources (CPU, disk, network I/O)
› Some experimentation is necessary.
Questions?
Thank You!