Resource Management
Reading:
“A Resource Management Architecture for
Metacomputing Systems”
What is Resource Management?

Mechanisms for locating and allocating
computational resources
Authentication
Process creation

Remote job submission

Scheduling

Other resources that can be managed:
Memory
Disk
Networks
Resource Management Issues
for Grid Computing

Site autonomy
Resources owned by different organizations,
in different administrative domains
Local policies for use, scheduling, security

Heterogeneous substrate
Different local resource management
systems

Policy extensibility
Local sites need ability to customize their
resource management policies
More Issues for Grid Computing

Co-allocation
May need resources at several sites
Mechanism for allocating multiple
resources, initiating computation,
monitoring and managing

On-line control
Adapt application requirements to resource
availability
Specifying Resource and Job
Requirements

Resource requirements:
Machine type
Number of nodes
Memory
Network

Job or scheduler parameters:
Directory
Executable
Arguments
Environment
Maximum time required
Resource and Job Specification

Globus: Resource Specification Language
(RSL)
&(executable=myprog)
(|(&(count=5)(memory>=64))
(&(count=10)(memory>=32)))

Condor: Classified advertisements (ClassAds)
Resource owners advertise abilities and
constraints
Applications advertise resource requests
Matchmaking: match offers & requests
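A minimal sketch of the matchmaking idea, assuming each advertisement is a set of attributes plus a constraint on the other party. This is not the ClassAd language; the `Ad` class, the `matchmake` function, and the attribute names are hypothetical and only illustrate that a match requires both sides to accept each other.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Attrs = Dict[str, object]


@dataclass
class Ad:
    """Toy advertisement (hypothetical): attributes plus a constraint on the other party."""
    name: str
    attrs: Attrs
    requirements: Callable[[Attrs], bool]


def matchmake(offers: List[Ad], requests: List[Ad]) -> List[Tuple[str, str]]:
    """Pair each request with the first unused offer where both constraints hold."""
    matches: List[Tuple[str, str]] = []
    taken = set()
    for req in requests:
        for off in offers:
            if off.name in taken:
                continue
            # Two-sided check: the job accepts the machine AND the owner accepts the job.
            if req.requirements(off.attrs) and off.requirements(req.attrs):
                matches.append((req.name, off.name))
                taken.add(off.name)
                break
    return matches


offers = [
    Ad("nodeA", {"memory": 32}, lambda job: True),
    Ad("nodeB", {"memory": 128}, lambda job: job["owner"] != "guest"),
]
requests = [Ad("job1", {"owner": "alice"}, lambda machine: machine["memory"] >= 64)]
print(matchmake(offers, requests))
```

Here job1 is rejected by nodeA (too little memory) and paired with nodeB, whose owner constraint it also satisfies.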
Components of Globus Resource
Management Architecture





Resource specification using RSL
Resource brokers: translate resource
requirements into specifications
Co-allocators: break down requests for
multiple sites
Local resource managers: apply local,
site-specific resource management policies
Information about available compute
resources and their characteristics
Resource Specification Language


Common notation for exchange of
information between components
API provided for manipulating RSL
RSL Syntax

Elementary form: parenthesized clauses
(attribute op value [ value … ] )

Operators Supported:
<, <=, =, >=, > , !=

Some supported attributes:
executable, arguments, environment, stdin,
stdout, stderr, resourceManagerContact,
resourceManagerName

Unknown attributes are passed through
May be handled by subsequent tools
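The elementary form can be illustrated with a small parser. This is a sketch, not the Globus RSL API: `Relation` and `parse_relations` are hypothetical names, and the regular expression handles only simple unquoted values. Attribute names are not validated, which mirrors the pass-through of unknown attributes.

```python
import re
from typing import List, NamedTuple


class Relation(NamedTuple):
    """One (attribute op value [value ...]) clause."""
    attribute: str
    op: str
    values: List[str]


# Two-character operators are listed first so ">=" is not read as ">" followed by "=".
_RELATION = re.compile(
    r"\(\s*([A-Za-z_]\w*)\s*(<=|>=|!=|<|>|=)\s*([^()]*?)\s*\)"
)


def parse_relations(text: str) -> List[Relation]:
    """Return every elementary relation found in text, in order of appearance."""
    return [
        Relation(m.group(1), m.group(2), m.group(3).split())
        for m in _RELATION.finditer(text)
    ]


for rel in parse_relations("(count>=5) (executable=myprog) (arguments=a b)"):
    print(rel.attribute, rel.op, rel.values)
```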
Constraints: “&”

For example:
& (count>=5) (count<=10)
(max_time=240) (memory>=64)
(executable=myprog)

“Create 5-10 instances of myprog, each
on a machine with at least 64 MB
memory that is available to me for 4
hours”
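A sketch of how a conjunction might be checked, assuming the request has already been parsed into (attribute, operator, value) triples and that a candidate allocation is described by a plain dictionary. Both representations, and the names `satisfies` and `REQUEST`, are assumptions made for illustration.

```python
import operator
from typing import Dict, Iterable, Tuple

# The six RSL relational operators mapped to Python comparisons.
OPS = {
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
    ">=": operator.ge,
    ">": operator.gt,
    "!=": operator.ne,
}

Constraint = Tuple[str, str, object]


def satisfies(candidate: Dict[str, object], constraints: Iterable[Constraint]) -> bool:
    """'&' means every constraint must hold; a missing attribute counts as a failure."""
    return all(
        attr in candidate and OPS[op](candidate[attr], value)
        for attr, op, value in constraints
    )


# The example request from the slide, written as triples.
REQUEST = [
    ("count", ">=", 5),
    ("count", "<=", 10),
    ("max_time", "=", 240),
    ("memory", ">=", 64),
    ("executable", "=", "myprog"),
]

candidate = {"count": 8, "max_time": 240, "memory": 128, "executable": "myprog"}
print(satisfies(candidate, REQUEST))
```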
Multirequest: “+”

A multirequest allows us to specify multiple
resource needs, for example
+ (& (count=5)(memory>=64)
(executable=p1))
(&(network=atm) (executable=p2))
Execute 5 instances of p1 on a machine
with at least 64 MB of memory
Execute p2 on a machine with an ATM
connection

Multirequests are central to co-allocation
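To show how a multirequest breaks down into per-resource requests, here is a sketch that splits the text after "+" into its top-level parenthesized components. The function name `split_multirequest` is hypothetical, and the sketch ignores quoting rules that a full RSL parser would need.

```python
from typing import List


def split_multirequest(rsl: str) -> List[str]:
    """Return the body of each top-level (...) group that follows the leading '+'."""
    text = rsl.strip()
    if not text.startswith("+"):
        raise ValueError("not a multirequest: expected leading '+'")
    parts: List[str] = []
    depth = 0
    start = 0
    for i, ch in enumerate(text[1:], start=1):
        if ch == "(":
            if depth == 0:
                start = i  # remember where this component opens
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced parentheses")
            if depth == 0:
                parts.append(text[start + 1:i].strip())
    if depth != 0:
        raise ValueError("unbalanced parentheses")
    return parts


request = "+ (& (count=5)(memory>=64)(executable=p1)) (&(network=atm) (executable=p2))"
for component in split_multirequest(request):
    print(component)
```

Each component printed here is an ordinary "&" request that could be handed to a single resource manager.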
Resource Broker


Takes high-level RSL specification
Transforms into concrete specifications
through “specialization” process

Locate resources that meet requirements

Multiple brokers may service single request


Application-specific brokers translate
application requirements
Output: complete specification of locations
of resources; given to co-allocator
Examples of Resource Brokers

Nimrod-G
Automates creation and management of
large parametric experiments
Run application under wide range of input
conditions and aggregate results
Queries MDS to find resources
Generates number of independent jobs
GRAM allocates jobs to computational nodes
Higher-level broker: allows user to specify
time and cost constraints
Examples of Resource Brokers (cont.)

AppLeS
Application Level Scheduler
Map large number of independent tasks to
dynamically varying pool of available
computers
Use GRAM to locate resources and initiate
and manage computation
Resource co-allocators

May request resources at multiple sites
Two or more computers and networks

Break multi-request into components

Pass each component to resource manager


Provide means for monitoring job status or
terminating job
Complex:
Two or more resource managers
Global state like availability of resources
difficult to determine
Different co-allocation services
1. Require all resources to be available
before job proceeds; fail globally if failure
occurs at any resource
2. Allocate at least N out of M resources and
return
3. Return immediately, but gradually return
more resources as they become available
Each useful for some class of applications
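The three services can be contrasted in a short sketch. It assumes the per-site allocation outcomes are already known as a site-to-boolean mapping; the function names `atomic`, `n_of_m`, and `incremental` are hypothetical labels, not Globus terminology.

```python
from typing import Dict, Iterable, Iterator, List


def atomic(outcomes: Dict[str, bool]) -> List[str]:
    """All-or-nothing: a single failed site fails the whole request."""
    return sorted(outcomes) if all(outcomes.values()) else []


def n_of_m(outcomes: Dict[str, bool], n: int) -> List[str]:
    """Succeed if at least n of the requested sites were allocated."""
    granted = sorted(site for site, ok in outcomes.items() if ok)
    return granted if len(granted) >= n else []


def incremental(arrivals: Iterable[str]) -> Iterator[List[str]]:
    """Return at once with nothing held, then report each site as it becomes available."""
    held: List[str] = []
    yield list(held)
    for site in arrivals:
        held.append(site)
        yield list(held)


outcomes = {"siteA": True, "siteB": False, "siteC": True}
print(atomic(outcomes))
print(n_of_m(outcomes, 2))
print(list(incremental(["siteA", "siteC"])))
```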
Concurrent Allocation

If advance reservations are available:
 Obtain list of available time slots from each
participating resource manager and choose timeslot

Without reservations:
 Optimistically allocate resources
 Hope desired set will be available at future time
 Use information service (MDS) to determine current
availability of resources
 Construct RSL request that is likely to succeed
 If allocation fails, all started jobs must be terminated
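Both cases can be sketched briefly. With reservations, the sketch intersects the slot lists reported by each manager and picks the earliest common slot. Without them, it starts jobs one site at a time and terminates everything already started if any site fails. Slots as integers and the `start` and `terminate` callbacks are assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, List, Optional


def choose_timeslot(slots_by_manager: Dict[str, Iterable[int]]) -> Optional[int]:
    """Earliest slot offered by every participating resource manager, or None."""
    slot_sets = [set(slots) for slots in slots_by_manager.values()]
    if not slot_sets:
        return None
    common = set.intersection(*slot_sets)
    return min(common) if common else None


def optimistic_allocate(
    sites: List[str],
    start: Callable[[str], bool],
    terminate: Callable[[str], None],
) -> List[str]:
    """Start a job at each site; on the first failure, terminate all jobs already started."""
    started: List[str] = []
    for site in sites:
        if start(site):
            started.append(site)
        else:
            for running in started:
                terminate(running)
            return []
    return started


print(choose_timeslot({"rm1": [9, 10, 14], "rm2": [10, 14, 16], "rm3": [14, 10]}))

terminated: List[str] = []
print(optimistic_allocate(["a", "b", "c"], lambda s: s != "c", terminated.append))
print(terminated)
```

The second call shows the cost of the optimistic approach: sites a and b were started and then had to be terminated because c failed.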
Disadvantages of
Concurrent Allocation Scheme



Computational resources wasted while
waiting for all requested resources to
become available
Application must be altered to perform
barrier to synchronize startup across
components
Detecting failure of a resource is difficult,
e.g. in queue-based local resource
managers
Local Resource Managers

Implemented with Globus Resource
Allocation Manager (GRAM)
1. Process RSL specifications representing
resource requests
 Deny request
 Create one or more processes (jobs) that satisfy
request
2. Enable remote monitoring and management
of jobs
3. Periodically update MDS information service
with current availability and capabilities of
resources
GRAM (cont.)

Interface between grid environment and
entity that can create processes
E.g., Parallel scheduler or Condor pool


GRAM may schedule resource itself
More commonly, maps resource
specification into a request to a local
resource allocation mechanism
E.g., Condor, LoadLeveler, LSF

Co-exists with local mechanisms
GRAM (cont.)

GRAM API has functions for:
Submitting a job request: produces
globally unique job handle
Canceling a job request
Asking when job request is expected to run
Upon submission, can request that progress
be signaled asynchronously to callback URL
GRAM Scheduling Model

A job is in one of four states:
Pending: resources have not yet been
allocated to the job
Active: resources allocated, job running
Done: when all processes have terminated
and resources have been deallocated
Failed: job terminates due to:
 explicit termination
 error in request format
 failure in resource management system
 denial of access to resource
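The scheduling model can be read as a small state machine. The sketch below assumes a job may fail from either the pending or the active state and that done and failed are terminal; the slide lists the states but not the transitions, so the `ALLOWED` table is an assumption.

```python
from enum import Enum


class JobState(Enum):
    PENDING = "pending"  # resources not yet allocated
    ACTIVE = "active"    # resources allocated, job running
    DONE = "done"        # all processes terminated, resources deallocated
    FAILED = "failed"    # terminated early (cancel, bad request, system failure, denial)


# Assumed transition table: done and failed are terminal states.
ALLOWED = {
    JobState.PENDING: {JobState.ACTIVE, JobState.FAILED},
    JobState.ACTIVE: {JobState.DONE, JobState.FAILED},
    JobState.DONE: set(),
    JobState.FAILED: set(),
}


def advance(current: JobState, new: JobState) -> JobState:
    """Return the new state, or raise ValueError if the transition is not allowed."""
    if new not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.name} -> {new.name}")
    return new


state = JobState.PENDING
state = advance(state, JobState.ACTIVE)
state = advance(state, JobState.DONE)
print(state.name)
```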
GRAM Components

Gatekeeper
Responds to a request:
1. Performs mutual authentication of user
and resource
2. Determines local user name for remote
user
3. Starts a job manager that executes as
local user and handles request
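The three gatekeeper steps can be sketched as one function. Authentication is reduced to a boolean supplied by the caller, the user mapping is a plain dictionary, and starting a job manager is a callback. All names here are hypothetical, and the real gatekeeper performs mutual authentication with security credentials, which this sketch does not attempt.

```python
from typing import Callable, Dict


class AccessDenied(Exception):
    """Raised when authentication or the local-user mapping fails."""


def handle_request(
    subject: str,
    authenticated: bool,
    user_map: Dict[str, str],
    start_job_manager: Callable[[str], str],
) -> str:
    # Step 1: authentication (stubbed as a flag in this sketch).
    if not authenticated:
        raise AccessDenied("authentication failed")
    # Step 2: determine the local user name for the remote user.
    local_user = user_map.get(subject)
    if local_user is None:
        raise AccessDenied(f"no local account mapped for {subject}")
    # Step 3: start a job manager that runs as that local user.
    return start_job_manager(local_user)


user_map = {"/O=Grid/CN=Alice": "alice"}
print(handle_request("/O=Grid/CN=Alice", True, user_map, lambda user: f"jobmanager:{user}"))
```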
GRAM Components (cont.)

Job manager
Creates processes requested by user
Submits resource allocation requests to
underlying resource management system
(or does fork)
Monitors state of created processes
Notifies callback contact of state transitions
Implements control operations like
termination
GRAM Components (cont.)

GRAM reporter
Responsible for storing into MDS
(information service) info about:
Scheduler structure
 Support reservations?
 Number of queues
Scheduler state
 Currently active jobs
 Expected wait time in queue
 Total number of nodes and available nodes
Resource Management Architecture
[Figure: Application -> (RSL) -> Broker, which performs RSL specialization using queries to and info from the Information Service -> (ground RSL) -> Co-allocator -> (simple ground RSL) -> local resource managers: three GRAM instances in front of LSF, EASY-LL, and NQE]
Job Submission Interfaces

Globus Toolkit includes several command
line programs for job submission
globus-job-run: Interactive jobs
globus-job-submit: Batch/offline jobs
globusrun: Flexible scripting infrastructure