Resource Management Systems

DCC/FCUP
Grid Computing
NQE
(Network Queuing Environment)
NQE
[Figure: NQE components, including NQS (Network Queuing System) and FTA (File Transfer Agent). Other labels in the figure: host "snow", job "./prog.out".]
NQE user commands
cevent: Posts, reads, and deletes job-dependency event information
cqdel: Deletes or signals a specified batch request
cqstatl: Provides a line-mode display of requests and queues on a specified host
cqsub: Submits a batch request to NQE
ftua: Transfers a file interactively (this command is issued on an NQE server only)
ilb: Executes a load-balanced interactive command
nqe: Provides a graphical user interface (GUI) to NQE functionality

Commands issued on an NQE server only:
qalter: Alters the attributes of one or more NQS requests
qchkpnt: Checkpoints an NQS request on a UNICOS, UNICOS/mk, or IRIX system
qdel: Deletes or signals NQS requests
qlimit: Displays NQS batch limits for the local host
qmsg: Writes messages to stderr, stdout, or the job log file of an NQS batch request
qping: Determines whether the local NQS daemon is running and responding to requests
qstat: Displays the status of NQS queues, requests, and queue complexes
qsub: Submits a batch request to NQS
rft: Transfers a file in a batch request
Source: http://techpubs.sgi.com/library/tpl/cgi-bin/getdoc.cgi?coll=0650&db=bks&fname=/SGI_Admin/NQE_AG/apa.html
SGE
(Sun Grid Engine)
A single resource can perform more than one activity
SGE
Commands are similar to the ones used by NQE.
- Example: g.job
#!/bin/csh
gaussian < testDFT.in
- To run:

qsub -pe smp 4 -M ines@dcc.fc.up.pt -m ae -r n g.job
Or, with the options embedded in the script:
SGE
File g.job
#!/bin/csh
#$ -pe smp 4              # parallel environment
#$ -M ines@dcc.fc.up.pt
#$ -m ae                  # mail sent at end/abort
#$ -r n                   # no rerun
gaussian < testDFT.in

To run: qsub g.job
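
The two ways of submitting g.job carry the same options. As a rough illustration only, the following Python sketch collects the options from the "#$" lines of a script and prints the command line they correspond to. It assumes that text after an unquoted "#" on a directive line is a comment, as the annotated script suggests; the function name embedded_options is hypothetical and is not part of SGE.

```python
import shlex

SCRIPT = """#!/bin/csh
#$ -pe smp 4              # parallel environment
#$ -M ines@dcc.fc.up.pt
#$ -m ae                  # mail sent at end/abort
#$ -r n                   # no rerun
gaussian < testDFT.in
"""

def embedded_options(script_text, prefix="#$"):
    """Collect the option words from lines that start with the directive prefix."""
    options = []
    for line in script_text.splitlines():
        if not line.startswith(prefix):
            continue
        # comments=True makes shlex drop everything after an unquoted '#'
        # (assumption: trailing text on a directive line is a comment).
        options.extend(shlex.split(line[len(prefix):], comments=True))
    return options

opts = embedded_options(SCRIPT)
print("qsub " + " ".join(opts) + " g.job")
```

The printed line has the same options as the explicit qsub command shown earlier, which is the point of embedding them: the script documents its own submission parameters.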
SGE: other example
#$ -pe openmpi* 32
#$ -q short*
#$ -l dedicated=4
SGE: another example
#$ -V                     # Inherit the submission environment
#$ -cwd                   # Start job in submission directory
#$ -N myMPI               # Job name
#$ -j y                   # Combine stderr and stdout
#$ -o $JOB_NAME.o$JOB_ID  # Name of the output file (e.g. myMPI.oJobID)
#$ -pe 12way 24           # Requests 12 tasks/node, 24 cores total
#$ -q normal              # Queue name: normal
#$ -l h_rt=01:30:00       # Run time (hh:mm:ss): 1.5 hours
#$ -M                     # Use email notification address
#$ -m be                  # Email at begin and end of job
SGE
- Users can specify requirements (CPU type, disk storage, memory, etc.)
- SGE registers the task, its requirements, and control information (owner, group, department, date/time of submission, etc.)
- A planner (scheduler) decides which tasks to execute
- As soon as a resource queue is available, SGE launches the execution of one of the waiting tasks
  - The task with the highest priority or the longest waiting time is chosen, according to the configuration of the planner
  - If several queues are available, the least loaded one is chosen
  - There can be more than one queue per cluster
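
The dispatch rule described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not SGE's implementation: the Task and Queue classes, their fields, and the policy names are hypothetical, and "load" stands for whatever load measure the planner is configured to use.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    priority: int        # larger means more important (assumed convention)
    submitted_at: float  # submission time in seconds

@dataclass
class Queue:
    name: str
    load: float          # assumed load measure of the host behind the queue
    free_slots: int

def pick_task(waiting, policy="priority"):
    """Choose the next task: highest priority, or longest waiting time."""
    if policy == "priority":
        # Ties are broken in favour of the task that has waited longest.
        return max(waiting, key=lambda t: (t.priority, -t.submitted_at))
    return min(waiting, key=lambda t: t.submitted_at)

def pick_queue(queues):
    """Among queues with a free slot, choose the least loaded one."""
    available = [q for q in queues if q.free_slots > 0]
    return min(available, key=lambda q: q.load) if available else None

waiting = [Task("a", 0, 10.0), Task("b", 5, 30.0), Task("c", 5, 20.0)]
queues = [Queue("q1", 0.9, 2), Queue("q2", 0.2, 1), Queue("q3", 0.1, 0)]

task = pick_task(waiting)
queue = pick_queue(queues)
print(task.name, "->", queue.name)
```

Here q3 has the lowest load but no free slot, so it is skipped; switching the policy argument to anything other than "priority" selects the task that has waited longest instead.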
SGE

Planning policies:
- Based on tickets (user)
  - The more tickets a user has, the higher the user's priority
  - Tickets are assigned statically according to the queueing policy and the priorities assigned to each user
- Based on urgency (tasks)
  - Time limit to finish the task (can be specified by the user)
  - Time spent in the waiting queue
  - Required resources
- Customized: allows arbitrary assignment of priorities to the tasks (similar to nice)
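
One way to picture how the three policies could be combined is a weighted sum of normalised contributions. The sketch below is only an illustration: the weights, the field names, and the normalisation are assumptions made for the example and are not taken from the slides or from SGE's configuration. The urgency value is given directly here; in practice it would be derived from the deadline, the waiting time, and the requested resources.

```python
# Hypothetical weights; a real planner would read them from its configuration.
W_TICKETS, W_URGENCY, W_CUSTOM = 0.5, 0.3, 0.2

def normalise(values):
    """Scale a list of numbers linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def rank(jobs):
    """Return job names ordered from highest to lowest combined priority."""
    tickets = normalise([j["tickets"] for j in jobs])
    urgency = normalise([j["urgency"] for j in jobs])
    custom = normalise([j["custom"] for j in jobs])
    scored = []
    for job, t, u, c in zip(jobs, tickets, urgency, custom):
        score = W_TICKETS * t + W_URGENCY * u + W_CUSTOM * c
        scored.append((score, job["name"]))
    scored.sort(reverse=True)
    return [name for _, name in scored]

jobs = [
    {"name": "j1", "tickets": 100, "urgency": 10, "custom": 0},
    {"name": "j2", "tickets": 50, "urgency": 40, "custom": 0},
    {"name": "j3", "tickets": 0, "urgency": 0, "custom": 10},
]
order = rank(jobs)
print(order)
```

With these weights the ticket share dominates, so j1 is ranked ahead of the more urgent j2; raising W_URGENCY would reverse that order.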
SGE

Life cycle of a task:
- Submission
- The master stores the task and informs the planner
- The planner inserts the task in the appropriate queue
- The master sends the task to a host
- Before executing the task, the execution daemon:
  - Changes to the directory of the task
  - Initializes the environment (env variables)
  - Initializes the set of processors
  - Changes the task uid to the owner's uid
  - Initializes resource limits for the process
  - Collects accounting info
- When it finishes these steps, the daemon stores the task in its database and waits for it to finish
- Once the task is finished, the daemon informs the master and deletes the task from the database
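
The life cycle above can be read as a simple linear state machine. The sketch below is one possible reading; the state names are labels invented for this illustration and are not SGE's own job states.

```python
from enum import Enum

class State(Enum):
    SUBMITTED = "submitted"
    QUEUED = "queued"
    SENT_TO_HOST = "sent to host"
    RUNNING = "running"
    FINISHED = "finished"

# Transitions in the order the steps are listed above.
NEXT = {
    State.SUBMITTED: State.QUEUED,      # master stores the task, planner picks a queue
    State.QUEUED: State.SENT_TO_HOST,   # master sends the task to a host
    State.SENT_TO_HOST: State.RUNNING,  # execution daemon prepares and starts the task
    State.RUNNING: State.FINISHED,      # daemon informs the master, deletes its entry
}

def run_life_cycle(start=State.SUBMITTED):
    """Follow the transitions from the start state until none is left."""
    state, history = start, [start]
    while state in NEXT:
        state = NEXT[state]
        history.append(state)
    return history

history = run_life_cycle()
print(" -> ".join(s.value for s in history))
```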
SGE

Some commands:
- qconf: cluster configuration
- qsub: task submission
- qdel: deletes tasks from the queue
- qacct: accounting information
- qhost: inspects host status
- qstat: inspects queue status
SGE GUI
[Figure: SGE graphical user interface.]
Condor

Condor is a specialized job and resource management system. It provides:
- Job management mechanism
- Scheduling
- Priority scheme
- Resource monitoring
- Resource management
Condor
- The user submits a job to an agent.
- The agent is responsible for remembering jobs in persistent storage while finding resources willing to run them.
- Agents and resources advertise themselves to a matchmaker, which is responsible for introducing potentially compatible agents and resources.
- At the agent, a shadow is responsible for providing all the details necessary to execute a job.
- At the resource, a sandbox is responsible for creating a safe execution environment for the job and protecting the resource from any mischief.
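
The matchmaker's role can be illustrated with a small Python sketch. It is a stand-in under stated assumptions: the advertisements are plain dictionaries with a "requirements" function, which only imitates the idea of a two-sided match, and the attribute names (memory_mb, os, owner) are hypothetical. It also stops at the introduction step; the subsequent claim between agent and resource is not modelled.

```python
def matches(job_ad, machine_ad):
    """Two ads match when each one's requirements accept the other."""
    return job_ad["requirements"](machine_ad) and machine_ad["requirements"](job_ad)

def matchmake(job_ads, machine_ads):
    """Pair each job with the first free machine that matches it."""
    free = list(machine_ads)
    pairs = []
    for job in job_ads:
        for machine in free:
            if matches(job, machine):
                pairs.append((job["name"], machine["name"]))
                free.remove(machine)  # safe: we leave the inner loop right away
                break
    return pairs

machines = [
    {"name": "m1", "memory_mb": 2048, "os": "LINUX",
     "requirements": lambda job: job["owner"] != "mallory"},
    {"name": "m2", "memory_mb": 8192, "os": "LINUX",
     "requirements": lambda job: True},
]
jobs = [
    {"name": "j1", "owner": "alice",
     "requirements": lambda m: m["os"] == "LINUX" and m["memory_mb"] >= 4096},
    {"name": "j2", "owner": "mallory",
     "requirements": lambda m: m["os"] == "LINUX"},
    {"name": "j3", "owner": "bob",
     "requirements": lambda m: m["os"] == "LINUX"},
]
pairs = matchmake(jobs, machines)
print(pairs)
```

Job j2 stays unmatched because the only remaining machine refuses its owner, which shows that the resource side also gets a say in the match.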
Condor
[Figure: Condor architecture. Components: User, Problem Solver, Agent, Matchmaker, Resource, Shadow, Sandbox, Job. Annotations: plan of jobs, job, ClassAds, claim, details of the job, environment.]
Condor: Gateway Flocking
- Gateways pass information about participants between pools
- M(A), the matchmaker of pool A, sends a request to M(B) through the gateways
- M(B) returns a match
Condor: Direct Flocking
- Agent A also advertises to the matchmaker of Condor Pool B
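
Direct flocking can be pictured as an agent that consults more than one matchmaker. The sketch below is a toy model: the Matchmaker class, its find_match method, and the rule "home pool first, then remote pools" are assumptions made for illustration and do not reproduce Condor's protocol.

```python
class Matchmaker:
    def __init__(self, pool, idle_machines):
        self.pool = pool
        self.idle = list(idle_machines)

    def find_match(self, job):
        """Return the name of an idle machine for the job, or None."""
        return self.idle.pop(0) if self.idle else None

def flock(job, matchmakers):
    """Try each matchmaker in order: home pool first, then remote pools."""
    for mm in matchmakers:
        machine = mm.find_match(job)
        if machine is not None:
            return mm.pool, machine
    return None

pool_a = Matchmaker("A", [])            # home pool, currently full
pool_b = Matchmaker("B", ["b1", "b2"])  # remote pool with idle machines
placement = flock("job-1", [pool_a, pool_b])
print(placement)
```

Unlike gateway flocking, no intermediary relays the request: the agent itself is known to both pools.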
RMS
- Each has its own interface
- They do not provide integration
- Lack of interoperability
- They require specific administrative skills
- They increase operational costs
- They generate over-provisioning and global load imbalance
