
University Dortmund

Tutorial

Grid Resource

Management and Scheduling

EuroPar 2004

Pisa, Italy

Ramin Yahyapour

CEI, University of Dortmund

Agenda

 Background on Resource Management and Scheduling

 Job scheduling and resource management on HPC resources

 Transition to Grid RM and Grid Scheduling

 Current state-of-the-art

 Future of GRMS

 Requirements, outlook for upcoming GRMS

2

Introduction

We all know what “the Grid” is…

 one of the many definitions:

“Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations” (Ian Foster)

 however, the actual scope of “the Grid” is still quite controversial

 Many people consider High Performance Computing (HPC) as the main Grid application.

 today’s Grids are mostly Computational Grids or Data Grids with HPC resources as building blocks

 thus, Grid resource management is much related to resource management on HPC resources (our starting point).

 we will return to a broader Grid scope and its implications later

3

Resource Management on HPC Resources

HPC resources are usually parallel computers or large scale clusters

The local resource management systems (RMS) for such resources include:

 configuration management

 monitoring of machine state

 job management

There is no standard for this resource management.

Several different proprietary solutions are in use.

Examples for job management systems:

 PBS, LSF, NQS, LoadLeveler, Condor

4

HPC Management Architecture in General

[Diagram: a control service (job master / master server) provides resource and job monitoring and management services for the compute resources; each processing node runs its own resource/job monitor.]

5

Computational Job

 A job is a computational task

 that requires processing capabilities (e.g. 64 nodes) and

 is subject to constraints (e.g. a specific other job must finish before the start of this job)

 The job information is provided by the user

 resource requirements

 CPU architecture, number of nodes, speed

 memory size per CPU

 software libraries, licenses

 I/O capabilities

 job description

 additional constraints and preferences

 The format of job description is not standardized, but usually very similar

6

Example: PBS Job Description

 Simple job script (the whole job file is a shell script; information for the RMS is given as comments):

#!/bin/csh
# resource limits: allocate needed nodes
#PBS -l nodes=1
#
# resource limits: amount of memory and CPU time ([[h:]m:]s)
#PBS -l mem=256mb
#PBS -l cput=2:00:00
# path/filename for standard output
#PBS -o master:/mypath/myjob.out

# the actual job is started in the script
./my-task

7

Job Submission

The user “submits” the job to the RMS, e.g. by issuing “qsub jobscript.pbs”

 The user can control the job

 qsub: submit

 qstat: poll status information

 qdel: cancel job

It is the task of the resource management system to start a job on the required resources

Current system state is taken into account

8

PBS Structure

[Diagram: “qsub jobscript” performs the job submission to the management server; the scheduler decides on job execution; jobs are executed on the processing nodes, each running a job & resource monitor.]

9

Execution Alternatives

Time sharing:

The local scheduler starts multiple processes per physical CPU with the goal of increasing resource utilization.

 multi-tasking

The scheduler may also suspend jobs to keep the system load under control

 preemption

Space sharing:

 The job uses the requested resources exclusively; no other job is allocated to the same set of CPUs.

 The job has to be queued until sufficient resources are free.

10

Job Classifications

Batch Jobs vs interactive jobs

 batch jobs are queued until execution

 interactive jobs need immediate resource allocation

Parallel vs. sequential jobs

 a parallel job requires several processing nodes at the same time

The majority of HPC installations are used to run batch jobs in space-sharing mode!

 a job is not influenced by other co-allocated jobs

 the assigned processors, node memory, caches etc. are exclusively available for a single job

 overhead for context switches is minimized

 these are important aspects for parallel applications

11

Preemption

 A job is preempted by interrupting its current execution

 the job might be put on hold on its CPU set and later resumed; the job stays resident on those nodes (consuming memory)

 alternatively a checkpoint is written and the job is migrated to another resource where it is restarted later

Preemption can be useful to reallocate resources due to new job submissions (e.g. with higher priority) or if a job is running longer than expected.

12

Job Scheduling

 A job is assigned to resources through a scheduling process

 responsible for identifying available resources

 matching job requirements to resources

 making decision about job ordering and priorities

HPC resources are typically subject to high utilization; therefore, resources are not immediately available and jobs are queued for future execution

 time until execution is often quite long (many production systems have an average delay until execution of >1h)

 jobs may run for a long time (several hours, days or weeks)

13

Typical Scheduling Objectives

 Minimizing the Average Weighted Response Time

AWRT = ( Σ_{j ∈ Jobs} w_j · (t_j − r_j) ) / ( Σ_{j ∈ Jobs} w_j )

 r_j : submission time of job j

 t_j : completion time of job j

 w_j : weight/priority of job j

 Maximize machine utilization/minimize idle time

 conflicting objective criteria; the chosen criterion is usually static for an installation and implicitly given by the scheduling algorithm

14
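The AWRT formula above can be computed directly; a minimal sketch (the job tuples and their values are illustrative):

```python
def awrt(jobs):
    """Average Weighted Response Time over (r, t, w) tuples:
    r = submission time, t = completion time, w = weight/priority."""
    weighted_response = sum(w * (t - r) for r, t, w in jobs)
    total_weight = sum(w for _, _, w in jobs)
    return weighted_response / total_weight

# two equally weighted jobs with response times 10 and 20 -> AWRT 15.0
print(awrt([(0, 10, 1), (0, 20, 1)]))
```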

Job Steps

A user job enters a job queue; the scheduler (its strategy) decides on the start time and resource allocation of the job.

[Diagram: the Grid user hands a job description to the scheduler; the scheduler maintains the schedule over the local job queue of the HPC machine; the job execution management dispatches the job to the node job management on each node.]

15

Scheduling Algorithms:

FCFS

Well known and very simple: First-Come First-Serve

Jobs are started in order of submission

Ad-hoc scheduling when resources become free again

 no advance scheduling

Advantage:

 simple to implement

 easy to understand and fair for the users

(job queue represents execution order)

 does not require a priori knowledge about job lengths

 Problems:

 performance can degrade severely; the overall utilization of a machine can suffer if highly parallel jobs occur, that is, if a significant share of nodes is requested for a single job.

16
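The FCFS behaviour can be sketched as a small event-driven simulation (illustrative only; the job and node counts are made up). Note how a wide job at the head of the queue blocks later jobs, even ones that would fit:

```python
# Minimal FCFS space-sharing simulation (a sketch, not a production scheduler).
# Jobs are (name, nodes, runtime) tuples in submission order.
def fcfs(jobs, total_nodes):
    time, free, running, starts = 0, total_nodes, [], {}
    queue = list(jobs)
    while queue or running:
        # start queued jobs strictly in order while the head job fits
        while queue and queue[0][1] <= free:
            name, nodes, runtime = queue.pop(0)
            starts[name] = time
            free -= nodes
            running.append((time + runtime, nodes))
        if running:
            # advance to the next job completion and release its nodes
            running.sort()
            end, nodes = running.pop(0)
            time, free = end, free + nodes
    return starts
```

With 4 nodes and jobs A (2 nodes, 10 units), B (4 nodes, 5 units), C (1 node, 1 unit), job C must wait behind B although it would fit next to A — exactly the utilization problem noted above.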

FCFS Schedule

[Diagram: jobs from the job queue (1., 2., 3., 4., …) are placed by the scheduler into the schedule in submission order, over time, on the processing nodes of the compute resource.]

17

Scheduling Algorithms:

Backfilling

Improvement over FCFS

A job can be started before an earlier submitted job if it does not delay the first job in the queue

 may still cause delay of other jobs further down the queue

Some fairness is still maintained

Advantage:

 utilization is improved

 Information about the job execution length is needed

 sometimes difficult to provide; user estimates are not necessarily accurate

Jobs are usually terminated after exceeding their allocated execution time; otherwise users might deliberately underestimate the job length to get an earlier job start time

18
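The core backfilling test — may a candidate job jump ahead without delaying the reserved head job? — can be sketched as an EASY-style check (parameter names are illustrative):

```python
# EASY backfilling check (sketch): a queued job may start now if it fits
# in the currently free nodes AND either finishes before the reserved start
# time of the head job, or leaves the head job's reserved nodes untouched.
def may_backfill(job_nodes, job_runtime, now, free_nodes,
                 head_start, head_nodes, nodes_free_at_head_start):
    if job_nodes > free_nodes:
        return False                      # does not fit right now
    if now + job_runtime <= head_start:
        return True                       # done before the head job's reservation
    # runs past the head job's start: must not eat into its reservation
    return nodes_free_at_head_start - job_nodes >= head_nodes
```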

Backfill Scheduling

 Job 3 is started before Job 2 as it does not delay it

[Diagram: the scheduler backfills job 3 from the job queue (1., 2., 3., 4., …) into a gap of the schedule on the processing nodes of the compute resource.]

19

Backfill Scheduling

However, if a job finishes earlier than expected, backfilling causes delays that otherwise would not occur

 need for accurate job length information (difficult to obtain)

[Diagram: a running job finishes earlier than expected; the backfilled job now delays the remaining jobs in the queue (1., 2., 3., 4., …).]

20

Job Execution Manager

 After the scheduling process, the RMS is responsible for the job execution:

 sets up the execution environment for a job,

 starts a job,

 monitors job state, and

 cleans-up after execution (copying output-files etc.)

 notifies the user (e.g. by sending email)

21

Scheduling Options

 Parallel job scheduling algorithms are well studied; performance is usually acceptable

Real implementations often have additional practical requirements rather than a need for more complex theoretical algorithms:

 Prioritization of jobs, users, or groups while maintaining fairness

Partitioning of machines

 e.g.: interactive and development partition vs. production batch partitions

Combination of different queue characteristics

For instance, the Maui Scheduler is often deployed as it is quite flexible in terms of prioritization, backfilling, fairness etc.

22

University Dortmund

Transition to Grid Resource

Management and Scheduling

Current state of the art

Transition to the Grid

More resource types come into play:

 Resources are any kind of entity, service or capability to perform a specific task

 processing nodes, memory, storage, networks, experimental devices, instruments

 data, software, licenses, people

The task/job/activity can also be of a broader meaning

 a job may involve different resources and consists of several activities in a workflow with according dependencies

The resources are distributed and may belong to different administrative domains

 HPC is still the key application for Grids. Consequently, the main resources in a Grid are the previously considered HPC machines with their local RMS

24

Implications to Grid Resource

Management

 Several security-related issues have to be considered: authentication, authorization, accounting

 who has access to a certain resource?

 what information can be exposed to whom?

 There is a lack of global information:

 what resources are when available for an activity?

 The resources are quite heterogeneous:

 different RMS in use

 individual access and usage paradigms

 administrative policies have to be considered

25

Scope of Grids

Cluster Grid Enterprise Grid Global Grid

Source: Ian Foster

26

Resource Management Layer

A Grid Resource Management System consists of:

 Local resource management system (Resource Layer)

 basic resource management unit

 provides a standard interface for using remote resources, e.g. GRAM, etc.

 Global resource management system (Collective Layer)

 coordinates all local resource management systems within multiple or distributed Virtual Organizations (VOs)

 provides high-level functionalities to efficiently use all resources:

 Job Submission

Resource Discovery and Selection

Scheduling

Co-allocation

Job Monitoring, etc.

e.g. Meta-scheduler, Resource Broker, etc.

27

Grid Middleware

[Diagram: the Grid middleware layers shown alongside the Internet protocol stack (Application, Transport, Internet, Link) – Application; Collective (“coordination of several resources”: infrastructure services, application services); Resource (“add resource”: negotiate access, control access and utilization); Connectivity (“communication with internal resource functions and services”); Fabric (“control local execution”).]

Source: Ian Foster

28

Grid Middleware (2)

[Diagram: the user/application submits to a resource broker and other higher-level services; these rely on core Grid infrastructure services (information services, monitoring services, security services); Grid resource managers bridge to the local resource management systems (e.g. PBS, LSF) that control the resources.]

29

Globus Grid Middleware

 Globus Toolkit

 common source for Grid middleware

GT2

GT3 – Web/GridService-based

GT4 – WSRF-based

 GRAM is responsible for providing a service for a given job specification that can:

 Create an environment for a job

 Stage files to/from the environment

 Submit a job to a local scheduler

 Monitor a job

Send job state change notifications

Stream a job’s stdout/err during execution

30

Globus Job Execution

Job is described in the resource specification language

Discover a Job Service for execution

Job Manager in Globus 2.x (GT2)

Master Management Job Factory Service (MMJFS) in Globus 3.x (GT3)

Alternatively, choose a Grid Scheduler for job distribution

Grid scheduler selects a job service and forwards job to it

A Grid scheduler is not part of Globus

The Job Service prepares job for submission to local scheduling system

If necessary, file stage-in is performed

 e.g. using the GASS service

The job is submitted to the local scheduling system

If necessary, file stage-out is performed after job finishes.

31

Globus GT2 Execution

[Diagram: the user/application passes RSL to a resource broker; specialized RSL is handed to the resource allocation step; MDS provides resource information; GRAM instances front the PBS resource, the LSF resource, and further resources.]

32

RSL

Grid jobs are described in the resource specification language (RSL)

RSL Version 1 is used in GT2

It has an LDAP filter-like syntax that supports boolean expressions:

Example:

& (executable = a.out)
  (directory = /home/nobody)
  (arguments = arg1 "arg 2")
  (count = 1)

33

Globus Job States

States: pending, stage-in, suspended, active, stage-out, done, failed

34

Globus GT3

 With transition to Web/Grid-Services, the job management becomes

 the Master Managed Job Factory Service (MMJFS)

 the Managed Job Factory Service (MJFS)

 the Managed Job Service (MJS)

The client contacts the MMJFS

MMJFS informs the MJFS to create a MJS for the job

The MJS takes care of managing the job actions.

 interact with local scheduler

 file staging

 store job status

35

Globus GT3 Job Execution

[Diagram: the user/application contacts the Master Managed Job Factory Service; the Managed Job Factory Service creates a Managed Job Service, which interacts with the local scheduler, a resource information provider service, and a file streaming factory service with its file streaming services.]

 Globus as a toolkit does not perform scheduling and automatic resource selection

36

Example: Extending the Globus

Architecture at KAIST

[Diagram (numbered workflow and monitoring information flow): the client uses a job submission service and a job monitoring service; the scheduling service consults a resource selection service, which queries the resource information service – fed by the local resource monitoring service (RIPS) and by providers (resource preference provider, information provider); a resource reservation service reserves resources; the job manager service (MJS) runs the job on the local resource manager (PBS).]

37

Source: Jin-Soo Kim

Job Description with RSL2

The version 2 of RSL is XML-based

Two namespaces are used:

 rsl: for basic types as int, string, path, url

 gram: for the elements of a job

*GNS = “http://www.globus.org/namespaces“

<?xml version="1.0" encoding="UTF-8"?>
<rsl:rsl xmlns:rsl="GNS/2003/04/rsl"
         xmlns:gram="GNS/2003/04/rsl/gram"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="GNS/2003/04/rsl ./schema/base/gram/rsl.xsd
                             GNS/2003/04/rsl/gram ./schema/base/gram/gram_rsl.xsd">
  <gram:job>
    <gram:executable><rsl:path>
      <rsl:stringElement value="/bin/a.out"/>
    </rsl:path></gram:executable>
  </gram:job>
</rsl:rsl>

38

RSL2 Attributes

<count> (type = rsl:integerType)

 Number of processes to run (default is 1)

<hostCount> (type = rsl:integerType)

 On SMP multicomputers, number of nodes to distribute the “count” processes across

 count/hostCount = number of processes per host

<queue> (type = rsl:stringType)

 Queue into which to submit job

<maxWallTime> (type = rsl:longType)

 Maximum wall clock runtime in minutes

<maxCpuTime> (type = rsl:longType)

 Maximum CPU runtime in minutes

<maxTime> (type = rsl:longType)

 only applies if the two attributes above are not used

 maximum wall clock or CPU runtime (scheduler’s choice) in minutes

39
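Such an RSL2 document can also be generated programmatically; a sketch using Python’s standard ElementTree. The element names for `<count>` mirror the `<executable>` pattern shown earlier and are an assumption here, as is keeping the shortened “GNS” namespace placeholder from the slide:

```python
import xml.etree.ElementTree as ET

# namespace URIs as abbreviated on the slide ("GNS" is a placeholder)
RSL = "GNS/2003/04/rsl"
GRAM = "GNS/2003/04/rsl/gram"

def make_job(executable, count=1):
    """Build a minimal gram:job element with an executable and a count."""
    root = ET.Element(f"{{{RSL}}}rsl")
    job = ET.SubElement(root, f"{{{GRAM}}}job")
    exe = ET.SubElement(job, f"{{{GRAM}}}executable")
    path = ET.SubElement(exe, f"{{{RSL}}}path")
    ET.SubElement(path, f"{{{RSL}}}stringElement", value=executable)
    cnt = ET.SubElement(job, f"{{{GRAM}}}count")
    ET.SubElement(cnt, f"{{{RSL}}}integer", value=str(count))
    return root

doc = make_job("/bin/a.out", count=2)
```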

Job Submission Tools

GT 3 provides the Java class GramClient

GT 2.x: command line programs for job submission

 globus-job-run: interactive jobs

 globus-job-submit: batch jobs

 globusrun: takes RSL as input

40

Globus 2 Job Client Interface

A simple job submission requiring 2 nodes:

globus-job-run -np 2 -s myprog arg1 arg2

A multirequest specifies multiple resources for a job:

globus-job-run -dumprsl -: host1 /bin/uname -a \
                        -: host2 /bin/uname -a

+ ( &(resourceManagerContact="host1")
    (subjobStartType=strict-barrier) (label="subjob 0")
    (executable="/bin/uname") (arguments="-a") )
  ( &(resourceManagerContact="host2")
    (subjobStartType=strict-barrier) (label="subjob 1")
    (executable="/bin/uname") (arguments="-a") )

41

Globus 2 Job Client Interface

The full flexibility of RSL is available through the command line tool globusrun

Support for file staging: executable and stdin/stdout

Example:

globusrun -o -r hpc1.acme.com/jobmanager-pbs \
  '&(executable=$(HOME)/a.out) (jobtype=single) (queue=time-shared)'

42

Problem: Job Submission Descriptions Differ

The deliverables of the GGF Working Group JSDL:

A specification for an abstract standard Job Submission Description Language (JSDL) that is independent of language bindings, including:

 the JSDL feature set and attribute semantics,

 the definition of the relationship between attributes, and the range of attribute values.

A normative XML Schema corresponding to the JSDL specification.

 A document of translation tables to and from the scheduling languages of a set of popular batch systems for both the job requirements and resource description attributes of those languages, which are relevant to the JSDL.

43

JSDL Attribute Categories

 The job attribute categories will include:

Job Identity Attributes

 ID, owner, group, project, type, etc.

Job Resource Attributes

 hardware, software, including applications, Web and Grid Services, etc.

Job Environment Attributes

 environment variables, argument lists, etc.

Job Data Attributes

 databases, files, data formats, and staging, replication, caching, and disk requirements, etc.

Job Scheduling Attributes

 start and end times, duration, immediate dependencies etc.

Job Security Attributes

 authentication, authorisation, data encryption, etc.

44

Problem: Resource Management Systems Differ Across Each Component

[Diagram: the application hands a task definition to the scheduler (submit, task analysis), which stages the task to the executing host.]

Interface format – Execution environment – Platform mix:

LSF
 has an API plus batch utilities via “LSF scripts”
 user: local disk exported; system: remote initialized (option)
 Unix, Windows

Grid Engine
 GDI API interface plus command line interface
 system: remote initialized, with SGE local variables exported
 Unix only

PBS
 API (script option); batch utilities via “PBS scripts”
 system: remote initialized, with PBS local variables exported
 Unix only

Source: Hrabri Rajic

45

GGF-WG DRMAA

GGF Working Group “Distributed Resource Management

Application API”

From the charter:

Develop an API specification for the submission and control of jobs to one or more Distributed Resource

Management (DRM) systems.

The scope of this specification is all the high level functionality which is necessary for an application to consign a job to a DRM system including common operations on jobs like termination or suspension.

 The objective is to facilitate the direct interfacing of applications to today's DRM systems by application's builders, portal builders, and Independent Software

Vendors (ISVs).

46

Source: Hrabri Rajic

DRMAA State Diagram

The remote job can be in the following states:

 system hold
 user hold
 system and user hold simultaneously
 queued
 active
 system suspended
 user suspended
 system and user suspended simultaneously
 running
 finished (un)successfully

47

Example: Condor-G

 Condor-G is a Condor system enhanced to manage Globus jobs.

It provides two main features

Globus Universe: interface for submitting, queuing and monitoring jobs that use Globus resources

GlideIn: system for efficient execution of jobs on remote Globus resources

Condor-G runs as a “personal Condor” system

 daemons run as non-privileged user processes

 each user runs her/his Condor-G

48

Condor-G: GlideIn

Globus is used to run the Condor daemons on Grid resources

 Condor daemons run as a Globus-managed job

 the GRAM service starts the daemons rather than the Condor jobs

 when the resources run these GlideIn jobs, they join the personal Condor pool

These daemons can be used to launch a job from Condor-G to a

Globus resource

 Jobs are submitted as Condor jobs and they will be matched and run on the Grid resources

 the daemons receive jobs from the user’s Condor queue

 combines the benefits of Globus and Condor

49

Condor-G

Using GlideIn

[Diagram: the Condor-G Schedd and GridManager submit glide-ins through the Grid resource’s JobManager (e.g. on top of LSF); the glided-in Startd daemons register with the Collector and join the personal Condor pool.]

Source: ANL/USC ISI

50

Example: DAGMan

Directed Acyclic Graph Manager

DAGMan allows you to specify the dependencies between your Condor-G jobs, so it can manage them automatically for you.

(e.g., “Don’t run job “B” until job “A” has completed successfully.”)

A DAG is defined by a .dag file , listing each of its nodes and their dependencies:

# diamond.dag
Job A a.sub
Job B b.sub
Job C c.sub
Job D d.sub
Parent A Child B C
Parent B C Child D

[Diagram: diamond-shaped DAG – Job A at the top, Jobs B and C in the middle, Job D at the bottom.]

 each node will run the Condor-G job specified by its accompanying Condor submit file

Source: Miron Livny

51
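DAGMan’s ordering constraint — start a node only when all of its parents have completed successfully — amounts to a topological sort; a minimal sketch for the diamond DAG above, using Kahn’s algorithm (this is not DAGMan code):

```python
from collections import deque

def topo_order(parents):
    """Return a valid execution order; parents maps node -> set of parent nodes."""
    indegree = {n: len(p) for n, p in parents.items()}
    children = {n: [] for n in parents}
    for node, ps in parents.items():
        for p in ps:
            children[p].append(node)
    ready = deque(sorted(n for n, d in indegree.items() if d == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)                 # all parents of node have "run"
        for child in sorted(children[node]):
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)        # last parent done -> child may start
    return order

# diamond.dag: B and C depend on A; D depends on B and C
diamond = {"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B", "C"}}
```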

University Dortmund

Grid Scheduling

How to select resources in the Grid?

Different Level of Scheduling

Resource-level scheduler

 low-level scheduler, local scheduler, local resource manager

 scheduler close to the resource, controlling a supercomputer, cluster, or network of workstations, on the same local area network

 Examples: Open PBS, PBS Pro, LSF, SGE

Enterprise-level scheduler

 Scheduling across multiple local schedulers belonging to the same organization

 Examples: PBS Pro peer scheduling, LSF Multicluster

Grid-level scheduler

 also known as super-scheduler, broker, community scheduler

Discovers resources that can meet a job’s requirements

 Schedules across lower level schedulers

53

Grid-Level Scheduler

Discovers & selects the appropriate resource(s) for a job

If selected resources are under the control of several local schedulers, a meta-scheduling action is performed

 Architecture:

 Centralized: all lower level schedulers are under the control of a single

Grid scheduler

 not realistic in global Grids

 Distributed: lower level schedulers are under the control of several grid scheduler components; a local scheduler may receive jobs from several components of the grid scheduler

54

Grid Scheduling

[Diagram: the Grid user submits to the Grid-Scheduler, which dispatches jobs to the local schedulers of Machines 1–3; each local scheduler maintains its own schedule and job queue.]

55

Activities of a Grid Scheduler

 GGF Document: “10 Actions of Super Scheduling” (GFD-I.4)

Phase One-Resource Discovery

1. Authorization Filtering

2. Application Definition

3. Min. Requirement Filtering

Phase Two - System Selection

4. Information Gathering

5. System Selection

Phase Three- Job Execution

6. Advance Reservation

7. Job Submission

8. Preparation Tasks

9. Monitoring Progress

10. Job Completion

11. Clean-up Tasks

Source: Jennifer Schopf

56

Grid Scheduling

A Grid scheduler allows the user to specify the required resources and environment of the job without having to indicate the exact location of the resources

 A Grid scheduler answers the question: to which local resource manager(s) should this job be submitted?

Answering this question is hard:

 resources may dynamically join and leave a computational grid

 not all currently unused resources are available to grid jobs:

 resource owner policies such as “maximum number of grid jobs allowed”

 it is hard to predict how long jobs will wait in a queue

57

Select a Resource for Execution

 Most systems do not provide advance information about future job execution

 user information is not accurate, as mentioned before

 new jobs may arrive that surpass current queue entries due to higher priority

A Grid scheduler might consider the current queue situation; however, this does not give reliable information for future executions:

 A job may wait long in a short queue while it would have been executed earlier on another system.

Available information:

Grid information service gives the state of the resources and possibly authorization information

Prediction heuristics: estimate job’s wait time for a given resource, based on the current state and the job’s requirements.

58

Selection Criteria

 Distribute jobs in order to balance load across resources

 not suitable for large scale grids with different providers

Data affinity: run job on the resource where data is located

Use heuristics to estimate job execution time.

Best-fit: select the set of resources with the smallest capabilities and capacities that can meet job’s requirements

 Quality of Service of

 a resource or

 its local resource management system

 which features does the local RMS have?

 can they be controlled from the Grid scheduler?

59
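The best-fit criterion from the list above can be sketched as follows (the resource dictionaries and field names are illustrative):

```python
def best_fit(resources, needed_nodes, needed_mem):
    """Pick the resource with the smallest capacity that still meets
    the job's requirements; return None if nothing qualifies."""
    candidates = [r for r in resources
                  if r["nodes"] >= needed_nodes and r["mem"] >= needed_mem]
    # smallest sufficient capacity first, so large machines stay free
    return min(candidates, key=lambda r: (r["nodes"], r["mem"]), default=None)
```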

Scheduling Attributes

Working Group in the Global Grid Forum to

“Define the attributes of a lower-level scheduling instance that can be exploited by a higher-level scheduling instance.”

The following attributes have been defined:

 Attributes of allocation properties

Revocation of an allocation

 The local scheduler reserves the right to withdraw a given allocation.

Guaranteed completion time of allocation

 A deadline for job completion is provided by the local scheduler.

Guaranteed Number of Attempts to Complete a Job

 A local scheduler will retry a given job task, e.g. useful for data transfer actions.

Allocations run-to-completion

 A job is not preempted if it has been started

Exclusive Allocations

 A job has exclusive access to the given resources; e.g. no time-sharing is performed

Malleable Allocations

 The given resource set may change during runtime; e.g. a computational job will gain or lose processors (moldable job)

60

Scheduling Attributes (2)

 Attributes of available information

 Access to tentative schedule

The local scheduler exposes his schedule of future allocations

Option: Only the projected start time of a specified allocation is available

Option: Only partial information on the current schedule is available

Exclusive control

 The local scheduler is exclusively in charge of the resources; no other jobs can appear on the resources

Event Notification

 The local scheduler provides an event subscription service

 Attributes for manipulating allocation execution

Preemption

Checkpointing

Migration

Restart

61

Scheduling Attributes (3)

 Attributes for requesting resources

 Allocation Offers

 The local system can provide an interface to request offers for an allocation

Allocation Cost or Objective Information

 The local scheduler can provide cost or objective information

Advance Reservation

 Allocations can be reserved in advance

Requirement for Providing Maximum Allocation Length in Advance

 The higher-level scheduler must provide a maximum job execution length

Deallocation Policy

 A policy applies for the allocation that must be met to stay valid

Remote Co-Scheduling

 A schedule can be generated by a higher-level instance and imposed on the local scheduler

Consideration of Job Dependencies

 The local scheduler can deal with dependency information of jobs; e.g. for workflows

62

CSF – Community Scheduler

Framework

An open source implementation of an OGSA-based metascheduler for VOs.

 Supports emerging WS-Agreement spec

 Supports GT GRAM

Fills in gaps in existing resource management picture

Contributed by Platform Computing to the Globus Toolkit

Extensible, Open Source framework for implementing metaschedulers

Provides basic protocols and interfaces to help resources work together in heterogeneous environments

63

RIPS = Resource Information Provider Service

Source: Chris Smith

CSF Architecture

[Diagram: Platform LSF users (via a metascheduler plugin) and Globus Toolkit users access a Grid service hosting environment; it contains the meta-scheduler (job service, reservation service, queuing service) and a global information service; GRAM and an RM adapter front the local resource managers (SGE, PBS, Platform LSF), each reporting through RIPS.]

64

Global Information Service

Source: Chris Smith

[Diagram: the Global Information Service (index service with registry, aggregator, provider manager, and data storage/DB) serves resource-info, cluster-info, data-store and data-load requests; RIPS instances feed resource, job and reservation information into the data store.]

65

Support for Virtual Organizations

[Diagram: users of VO “A” and VO “B” each submit to their community scheduler; the community schedulers of the two virtual organizations draw on resources from Organizations 1, 2 and 3.]

Source: Chris Smith

66

CSF Grid Services

Job Service:

 creates, monitors and controls compute jobs

Reservation Service:

 guarantees resources are available for running a job

Queuing Service:

 provides a service where administrators can customize and define scheduling policies at the VO level and/or at the different resource manager level

 defines an API for plug-in schedulers

 RM Adaptor Service:

 provides a Grid service interface that bridges the Grid service protocol and resource managers (LSF, PBS, SGE, Condor and other RMs)

67

GT3 Job Submission /

Architecture

MMJFS = Master Managed Job Factory Service

MJS = Managed Job Service

Blue indicates a Grid Service hosted in a GT3 container

[Diagram: Site A (MMJFS on node1) runs an MJS for LSF, Site B (MMJFS on node2) for PBS, Site C (MMJFS on node3) for SGE; each site has a RIPS and registers with an index service; jobs are submitted via managed-job-globusrun.]

68

Source: Chris Smith

GT3 + CSF Architecture

[Diagram: the Virtual Organization hosts the meta-scheduler grid services (queuing service, job service, reservation service) and an index service; Site A runs an RM adapter for LSF, Site B an RM adapter for PBS plus MMJFS/MJS, Site C SGE; each site reports through RIPS.]

69

Source: Chris Smith

Queue Service

In CSF, Job Service instances are “submitted” to the

Queue Service for dispatch to a resource manager.

The Queue Service provides a plug in API for extending the scheduling algorithms provided by default with CSF.

The Queue Service is responsible for:

 loading and validating configuration information

 loading all configured scheduler plugins

 calling the plugin API functions:

 schedInit() after loading the plugin successfully

 schedOrder() when a new job is submitted

 schedMatch() during the scheduling cycle

 schedPost() before the scheduling cycle ends, and after scheduling decisions are sent to the job service instances

70
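The plugin life cycle above can be illustrated with a toy queue service driving the four hooks. This is a sketch only — the real CSF plugin API, its signatures and types, differ:

```python
class FirstFitPlugin:
    """Toy scheduler plugin exposing the four hook names from the slide."""
    def schedInit(self):
        self.jobs = []                       # pending jobs, in submission order
    def schedOrder(self, job):
        self.jobs.append(job)                # called when a new job is submitted
    def schedMatch(self, job, resources):
        # called during the scheduling cycle: first resource with enough free nodes
        return next((r for r in resources if r["free"] >= job["nodes"]), None)
    def schedPost(self, decisions):
        # called after decisions are sent out: drop the dispatched jobs
        dispatched = [job for job, _ in decisions]
        self.jobs = [j for j in self.jobs if j not in dispatched]

class ToyQueueService:
    def __init__(self, plugin, resources):
        self.plugin, self.resources = plugin, resources
        plugin.schedInit()
    def submit(self, job):
        self.plugin.schedOrder(job)
    def cycle(self):
        decisions = []
        for job in list(self.plugin.jobs):
            resource = self.plugin.schedMatch(job, self.resources)
            if resource is not None:
                resource["free"] -= job["nodes"]
                decisions.append((job, resource["name"]))
        self.plugin.schedPost(decisions)
        return decisions
```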

Example: Project GridLab - GRMS

[Diagram: the GRMS broker – job receiver, jobs queue, scheduler, execution unit, file transfer unit, workflow manager, SLA negotiation, resource reservation, prediction unit, adaptive resource discovery – interacts with an authorization system, information services, data management, monitoring and an application manager; it drives the local resources (managers) through Globus and other middleware.]

Source: Jarek Nabrzyski

71

Anticipated Features

Reliable and predictable delivery of a service

 Quality of service for a job service:

 Reliable job submission: two-phase commit

 Predictable start and end time of the job

Advance reservation assures start time and throughput

Fault tolerance/recovery:

 before the fault occurs: migrate the job to another resource so that it continues

 after the fault: the job is restarted

 Rerun the job on the same resource after repair

 Allocate multiple resources for a job

72

Co-allocation

 It is often requested that several resources are used for a single job.

 that is, a scheduler has to assure that all resources are available when needed.

 in parallel (e.g. visualization and processing)

 with time dependencies (e.g. a workflow)

 The task is especially difficult if the resources belong to different administrative domains.

 The actual allocation time must be known for co-allocation

 or the different local resource management systems must synchronize each other (wait for availability of all resources)

73

Example Multi-Site Job Execution

[Diagram: the Grid-Scheduler places a multi-site job across the local schedulers of Machines 1–3; parts of the job appear simultaneously in the schedules of all three machines.]

A job uses several resources at different sites in parallel.

Network communication is an issue.

74

Advance Reservation

Co-allocation and other applications require a priori information about precise resource availability.

With the concept of advance reservation, the resource provider guarantees a specified resource allocation.

 includes a two- or three-phase commit for agreeing on the reservation

 Implementations:

 GARA/DUROC/SNAP provide interfaces for Globus to create advance reservations

 implementations for network QoS are available,

 e.g. setup of dedicated bandwidth between endpoints
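The two-phase commit mentioned above might be sketched as follows (a hypothetical provider interface with `prepare`/`commit`/`release`; this is not the actual GARA, DUROC or SNAP API):

```python
class ReservationCoordinator:
    """Sketch of a two-phase commit over several resource providers:
    phase 1 asks every provider for a tentative hold; phase 2 confirms
    only if all holds succeeded, otherwise all partial holds are released."""

    def reserve(self, providers, request):
        holds = []
        for p in providers:                     # phase 1: prepare
            hold = p.prepare(request)
            if hold is None:
                for q, h in holds:              # abort: release partial holds
                    q.release(h)
                return False
            holds.append((p, hold))
        for p, h in holds:                      # phase 2: commit
            p.commit(h)
        return True
```

The same pattern extends to a three-phase variant by inserting a pre-commit round before the final commit.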

75

Limitations of current Grid RMS

 The interaction between local scheduling and higher-level Grid scheduling is currently a one-way communication:

 current local schedulers are not optimized for Grid use

 only limited information about future job execution is available

 a site is usually selected by the Grid scheduler and the job simply enters the remote queue

 This makes the decision about job placement inefficient.

 The actual job execution time is usually not known.

 Co-allocation is a problem, as many systems do not provide advance reservation.

76

Example of Grid Scheduling: Decision Making

Where to put the Grid job?

[Diagram: the Grid user submits to a Grid scheduler, which must choose among three machines whose local schedulers report, e.g., 15 jobs running / 20 queued, 5 running / 2 queued, and 40 running / 80 queued; each machine has its own schedule and job queue.]

77

Available Information from the Local Schedulers

 Decision making is difficult for the Grid scheduler:

 only limited information about the local schedulers is available

 the available information may not be reliable

 Possible information:

 queue length, running jobs

 detailed information about the queued jobs:

 execution length, process requirements, …

 a tentative schedule of future job executions

This information is often not technically provided by the local scheduler.

In addition, this information may be subject to privacy concerns!

78

Consequence

 Consider a workflow with 3 short steps (e.g. 1 minute each) that depend on each other.

Assume available machines with an average queue wait of 1 hour.

The Grid scheduler can only submit the subsequent step once the previous job step has finished.

 Result:

 The completion time of the workflow may exceed 3 hours

(compared to 3 minutes of execution time).

 Current Grids are suitable for simple jobs, but still quite inefficient in handling more complex applications.

 There is a need for better coordination of higher- and lower-level scheduling!
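The arithmetic behind this result can be checked directly:

```python
# The 3-step workflow from the slide: each step runs ~1 minute but waits
# ~60 minutes in a queue, and a step can only be submitted after its
# predecessor has finished, so the queue wait is paid once per step.
steps, queue_wait, run_time = 3, 60, 1
completion = steps * (queue_wait + run_time)   # in minutes
print(completion)  # 183 minutes, i.e. just over 3 hours for 3 minutes of work
```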

79

University Dortmund

GRMS in

Next Generation Grids

Outlook on future Grid Resource

Management and Scheduling

Example Grid Scenario

[Diagram: compute resources at a remote center read and generate TB of data via WAN transfer; results are visualized over a LAN/WAN connection.]

Assume a data-intensive simulation that should be visualized and steered during runtime!

Visualization

81

Resource Request of a Simple Grid Job

A specified architecture with

 48 processing nodes,

 1 GB of available memory, and

 a specified licensed software package,

for 1 hour between 8am and 6pm of the following day

 The time must be known in advance.

A specific visualization device during program execution

Minimum bandwidth between the VR device and the main computer during program execution

Input: a specified data set from a data repository at most 4

 preference for cheaper job execution over an earlier execution

 actually a pretty “simple” example (no complex workflows)
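Written out as a structured description, such a request might look like this (purely illustrative; real Grid job descriptions use formats such as Globus RSL, and all keys below are invented):

```python
# Hypothetical structured form of the "simple" Grid job request above.
request = {
    "architecture": "specified",
    "nodes": 48,
    "memory_gb": 1,
    "software_license": "specified package",
    "duration_hours": 1,
    "window": ("08:00", "18:00"),            # of the following day
    "visualization_device": True,            # during program execution
    "min_bandwidth": "VR device <-> main computer",
    "objective": "prefer cheaper over earlier execution",
}
```

Even this flat record already couples compute, software, network and data requirements — which is what makes the scheduling problem hard.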

82

Use-Case Example: Coordinated Simulation and Visualization

Expected output of a Grid scheduler: a timetable of resource allocations.

[Gantt chart over time for the resources Data, Network 1, Computer 1, Network 2, Computer 2, Software License, Storage, Network 3 and VR-Cave: data access, data transfers, loading data, parallel computation on both computers with communication between them, software usage, storing and providing data, data storage, and communication for visualization in the VR-Cave.]

Reservations are necessary!

83

Need for a Grid Scheduling Architecture

[Diagram: a Grid user/application submits to a Grid scheduler, which interacts through Grid middleware with the local RMS of each site. The scheduler relies on information, monitoring, security, accounting/billing and other Grid services, and produces schedules for compute, data, network and other resources/services.]

84

Required Services/Components

 Relevant to Grid scheduling

Information Service

Job/Workflow Description

Requirement Description

Resource Discovery

Reservation

Monitoring/Notification

Job Execution

Security

Accounting/Billing

Data Management

Local RMS

85

Service Oriented Architectures

Services are used to abstract all resources and functionalities.

Concepts of OGSI and WSRF:

 use WebServices, SOAP and XML to implement the services

 the OGSI idea of GridServices is implemented in GT3

 transition to WSRF with GT4

 Core services for building a Grid are discussed in the Open Grid Services Architecture (OGSA)

86

Open Grid Services Architecture

[Layer diagram: users and applications in a problem domain X sit on top of domain-specific application and integration technology, which rests on a generic virtual service access and integration layer. OGSA provides services such as job submission, brokering, workflow, registry, banking, data transport, resource usage, authorisation, transformation, and structured data access/integration over distributed structured data. Below that, OGSI interfaces to the Grid infrastructure, Web Services provide the basic functionality, and compute, data and storage resources form the base of this virtual integration architecture.]

87

OGSA Outlook

[Diagram of anticipated OGSA services: job and workflow management (workflow manager, workload manager, broker, execution planning service, job manager, job service, reservation service); resource management services (service container, deploy + configuration service, provisioning); data services (data catalog, data provision, data integration, data access); context services (virtual organization, policy + agreement); status + monitoring services (problem determination, event/information logging services, application content manager); security services (authentication, authorization, delegation, firewall transition); and infrastructure services based on WS-RF (OGSI) and WS management.]

88

[Diagram mapping “demand” to “supply”: on the demand side, a workload management framework (user/job proxies, policies, environment management, job factory, dependency management) feeds an OGSA execution planning / workload optimizing framework (queuing services, workload optimization, workload post-balancing, workload models based on history/prediction, workload orchestration, admission control and SLA management for workloads). In the middle, a scheduling/optimizing framework performs optimal resource–workload mapping and resource selection within a selection context (e.g. a VO). On the supply side, a resource management framework (CMM, reservation, information provider, resource provisioning, resource factory, resource allocation or binding) and a resource optimizing framework (capacity management, resource placement, admission control and quality of service for resources) complete the picture. Each box represents one or more OGSA services; primary and meta-interactions link the frameworks.]

89

Functional Requirements for Grid Scheduling

Functional Requirements:

 Cooperation between different resource providers

 Interaction with local resource management systems

 Support for reservations and service level agreements

 Orchestration of coordinated resource allocations

 Automatic handling of accounting and billing

 Distributed Monitoring

 Failure Transparency

90

What are Basic Blocks for a Grid Scheduling Architecture?

[Diagram: a Scheduling Service queries an Information Service — which maintains static and scheduled/forecasted information about compute, data and network resources — and cooperates with Reservation, Job Supervisor, Accounting and Billing, Data Management and Network Management Services. Compute managers (for compute/storage/visualization etc.), data managers and network managers in the local management systems maintain the information about their respective resources.]

Scheduling-relevant interfaces of these basic blocks are still to be defined!

91

Information Service

Resource Discovery

Relevant for Grid Scheduling:

Access to static and dynamic information

 Dynamic information includes data about planned or forecasted future events,

 e.g.: existing reservations, scheduled tasks, future availabilities

 need for anonymized and limited information (privacy concerns)

 Information about all resource types,

 including e.g. data and network:

 future reservations, data transfers etc.

92

Job/Workflow Description

Requirement Description

Information about the job specifics (what the job is) and job requirements (what is required for the job), including data access and creation

 Need for a common workflow description,

 e.g. a DAG formulation

 including static and dynamic dependencies

 Need for the ability to extract workflow information, to schedule a whole workflow in advance
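A DAG formulation of a workflow can be sketched with Python's standard `graphlib` (the workflow steps here are invented examples; a Grid scheduler would additionally attach resource requirements to each node):

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps it statically depends on.
workflow = {
    "preprocess": set(),
    "simulate": {"preprocess"},
    "visualize": {"simulate"},
    "archive": {"simulate"},
}

# A valid submission order releases a step only after its predecessors.
order = list(TopologicalSorter(workflow).static_order())
```

Extracting this structure up front is what allows a scheduler to reserve resources for the whole workflow in advance rather than step by step.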

93

Reservation Management

Agreement and Negotiation

Interaction between scheduling instances, i.e. between resource/agreement providers and agreement initiators (higher-level schedulers)

 access to tentative information is necessary

 negotiations might take very long

 individual scheduling objectives must be considered

 market-oriented and economic scheduling is probably needed

Need for combining agreements from different providers

 to coordinate complex resource requests or workflows

Need to maintain different negotiations at the same time

 probably several levels of negotiation, agreement commitment and reservation

94

Accounting and Billing

Interaction with budget information

Charging for allocations and reservations; preliminary allocation of budgets

Concepts for reliable authorization of Grid schedulers to spend money on behalf of the user

Refunding in case of resource/SLA failure, re-scheduling etc.

 Reliable monitoring and accounting

 required for tracing whether a party fulfilled an agreement

95

Monitoring Services

Monitoring of:

 resource conditions

 agreements and schedules

 program execution

 SLA conformance

 workflows

Monitoring must be reliable, as it is part of accountability:

 failure or fulfillment by a service/resource provider must be clearly identifiable

96

Conclusions for Grid Scheduling

Grids ultimately require coordinated scheduling services.

Support for different scheduling instances

 different local management systems

 different scheduling algorithms/strategies

For arbitrary resources

 not only computing resources, but also data, storage, network, software etc.

Support for co-allocation and reservation

 necessary for coordinated Grid usage

 (see data, network, software, storage)

Different scheduling objectives

 cost, quality, other

97

Scheduling Model

Using a brokerage/trading strategy:

[Cycle diagram:

 Higher-level scheduling (considering individual user policies): submit Grid job description → discover resources → query for allocation offers → collect offers → select offers → coordinate allocations

 Lower-level scheduling (considering individual owner policies): analyze query → generate allocation offer]

98

Properties of

Multi-Level Scheduling Model

Multi-level scheduling must support different RM systems and strategies.

Providers can enforce individual policies in generating resource offers.

 Users receive resource allocations optimized for their individual objectives.

Different higher-level scheduling strategies can be applied.

Multiple levels of scheduling instances are possible.

Support for fault-tolerant and load-balanced services.

99

Negotiation in Grids

Multilevel Grid scheduling architecture

 Lower level local scheduling instance

 Implementation of owner policies

 Higher level Grid scheduling instance

 Resource selection and coordination

(Static) Interface definition between both instances

 Different types of resources

 Different local scheduling systems with different properties

 Different owner policies

(Dynamic) Communication between both instances

 Resource discovery

 Job monitoring

100

Using Service Level Agreements

The mapping of jobs to resources can be abstracted using the concept of Service Level Agreements (SLAs) (Czajkowski, Foster, Kesselman & Tuecke).

SLA: Contract negotiated between

 resource provider, e.g. local scheduler

 resource consumer, e.g., grid scheduler, application

SLAs provide a uniform approach for the client to

 specify resource and QoS requirements, while

 hiding from the client details about the resources,

 such as queue names and current workload

101

Service Level Agreement Types

Resource SLA (RSLA)

 A promise of resource availability

 Client must utilize promise in subsequent SLAs

 Advance Reservation is an RSLA

Task SLA (TSLA)

 A promise to perform a task

 may have complex task requirements

 may reference an RSLA

Binding SLA (BSLA)

 Binds a resource capability to a TSLA

 May reference an RSLA (i.e. a reservation)

 May be created lazily to provision the task

 Allows complex resource arrangements
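The relationship between the three SLA types can be sketched as plain records (the field names and layout are illustrative, not from the cited work, which defines the concepts rather than a data model):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RSLA:                      # promise of resource availability
    resource: str
    start: int                   # availability window (illustrative units)
    end: int

@dataclass
class TSLA:                      # promise to perform a task
    task: str
    rsla: Optional[RSLA] = None  # may reference a reservation

@dataclass
class BSLA:                      # binds a resource capability to a TSLA
    tsla: TSLA
    rsla: RSLA                   # e.g. a reservation, possibly created lazily
```

A TSLA without an RSLA corresponds to best-effort execution; binding one in later (via a BSLA) is what enables lazy provisioning.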

102

Agreement-Based Negotiation

A client (application) submits a task to a Grid scheduler.

 The client negotiates a TSLA for the task with the Grid scheduler.

To provision the TSLA, the Grid scheduler may obtain an RSLA from the Grid resource, or may use a preexisting RSLA that it has negotiated speculatively.

 A TSLA that refers to an RSLA assures that the job gets the reserved resources at a specified time.

 A TSLA without an RSLA tells little about when the resources will be available to the job.

103

Agreement-Based Negotiation (2)

The job starts execution on the resource according to the TSLA and the RSLA.

 For an existing TSLA, the Grid scheduler may obtain additional RSLAs:

 an RSLA is negotiated by the Grid scheduler with the resource

 a BSLA binds this RSLA to the corresponding TSLA

BSLAs allow resources to be provisioned dynamically when they are either

 not needed for the whole duration of the task, or

 not completely known (e.g., the time at which a resource will be needed) before submitting the task.

104

Example of Agreement Mapping

[Diagram: two users/applications negotiate TSLA 1 and TSLA 2 with the Grid scheduler, which in turn negotiates RSLA 1 and RSLA 2 with Grid resource managers sitting in front of the actual resources.]

The Grid scheduler receives requests for two agreements. It negotiates RSLA1 and RSLA2 with the resources, and in parallel negotiates the corresponding TSLA1 and TSLA2 with the agreement initiators.

105

GGF: GRAAP-WG

Goal: Defining WebService-based protocols for negotiation and agreement management

WS-Agreement Protocol: [protocol figure not reproduced]

106

Towards Grid Scheduling

Grid Scheduling Methods:

 Support for individual scheduling objectives and policies

 Multi-criteria scheduling models

 Economic scheduling methods to Grids

Architectural requirements:

 Generic job description

 Negotiation interface between higher- and lower-level scheduler

 Economic management services

 Workflow management

 Integration of data and network management

107

Grid Scheduling Strategies

Current approach:

 Extension of job scheduling for parallel computers

 Resource discovery and load distribution to a remote resource

 Usually a batch job scheduling model on the remote machine

But what Grid scheduling actually requires is:

 Co-allocation and coordination of different resource allocations for a Grid job

 Instantaneous ad-hoc allocation is not always suitable

This complex task involves:

 Cooperation between different resource providers

 Interaction with local resource management systems

 Support for reservations and service level agreements

 Orchestration of coordinated resource allocations

108

User Objective

Local computing typically has:

 A given scheduling objective, such as minimization of response time

 Use of batch queuing strategies

 Simple scheduling algorithms: FCFS, backfilling

Grid computing requires:

 Individual scheduling objectives:

 better resources

 faster execution

 cheaper execution

 More complex objective functions apply for individual Grid jobs!

109

Provider/Owner Objective

Local computing typically has:

 Single scheduling objective for the whole system:

 e.g. minimization of average weighted response time or high utilization/job throughput

In Grid Computing :

 Individual policies must be considered:

 access policy, priority policy, accounting policy, and other

 More complex objective functions apply for individual resource allocations !

 User and owner policies/objectives may be subject to privacy considerations!

110

Grid Economics –

Different Business Models

Cost model

 Use of a resource

 Reservation of a resource

Individual scheduling objective functions

 User and owner objective functions

 Formulation of an objective function

 Integration of the function in a scheduling algorithm

 Resource selection

 The scheduling instances act as broker

 Collection and evaluation of resource offers

111

Scheduling Objectives in the Grid

In contrast to local computing, there is no single general scheduling objective anymore:

 minimizing response time

 minimizing cost

 tradeoffs between quality, cost, response time etc.

Cost and different service qualities come into play:

 the user will introduce individual objectives

 the Grid can be seen as a market where resources are competing alternatives

Similarly, the resource provider has individual scheduling policies.

Problem:

 the different policies and objectives must be integrated into the scheduling process

 different objectives require different scheduling strategies

 part of the policies may not be suitable for public exposure

(e.g. different pricing or quality for certain user groups) 112

Grid Scheduling Algorithms

 Due to the mentioned requirements in Grids, it is not to be expected that a single scheduling algorithm or strategy is suitable for all problems.

 Therefore, there is a need for an infrastructure in which

 different scheduling algorithms can be integrated,

 the individual objectives and policies can be included, and

 resource control stays with the participating service providers.

 This amounts to a transition into a market-oriented Grid scheduling model.

113

Economic Scheduling

 Market-oriented approaches are a suitable way to implement the interaction of different scheduling layers:

 agents in the Grid market can implement different policies and strategies

 negotiations and agreements link the different strategies together

 participating sites stay autonomous

Need for suitable scheduling algorithms and strategies for creating and selecting offers

 need for creating Pareto-optimal scheduling solutions

Performance relies highly on the available information

 negotiation can be a hard task if many potential providers are available.
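Filtering a set of offers down to the Pareto-optimal ones can be sketched as follows (a naive O(n²) dominance filter over two criteria; the offer tuples are invented examples):

```python
def pareto_optimal(offers):
    """Keep only offers not dominated on both criteria.

    Each offer is a (cost, completion_time) pair, lower is better for both.
    An offer is dominated if some other offer is at least as good in both
    criteria (and is a different offer).
    """
    front = []
    for o in offers:
        dominated = any(
            p[0] <= o[0] and p[1] <= o[1] and p != o for p in offers
        )
        if not dominated:
            front.append(o)
    return front
```

The scheduler (or the user's utility function) then picks from this front according to the individual objective.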

114

Economic Scheduling (2)

 Several possibilities for market models:

 auctions of resources/services

 auctions of jobs

 Offer-request mechanisms support:

 inclusion of different cost models, price determination

 individual objective/utility functions for optimization goals

 Market-oriented algorithms are considered:

 robust

 flexible in case of errors

 simple to adapt

 markets can have unforeseeable dynamics

115

Problem: Offer Creation

[Slides 116–119: schedule diagrams over resources R1–R8 and time. A job requesting four time slots (t1–t4) must be placed into the existing schedule; Offers 1–3 show alternative placements at or after the end of the current schedule, where Offer 3 extends the schedule to a new end time.]

119

Evaluate Offers

 Evaluation with utility functions

 A utility function is a mathematical representation of a user's preference.

 The utility function may be complex and

 contain several different criteria.

 Example using response time (or delay time) and price:

U_util = max( a_1 · latency + a_2 · price )

where the weights a_1 and a_2 encode the user's trade-off between latency and price.
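Offer selection against such a utility function might be sketched as follows (the weights and offer fields are illustrative; negative weights make the maximization prefer low latency and low price, matching the negated terms such as -EndTime and -JobCost in the NWIRE example later):

```python
def best_offer(offers, a1=-1.0, a2=-0.5):
    """Pick the offer maximizing U = a1*latency + a2*price.

    With negative weights, maximizing U penalizes high latency and high
    price; the weights encode the user's individual trade-off.
    """
    return max(offers, key=lambda o: a1 * o["latency"] + a2 * o["price"])
```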

120

Optimization Space

[Plot: allocation offers A1–A3 plotted by latency vs. price.]

The utility depends on the user and provider objectives.

121

Example: NWIRE

 The following slides illustrate the scheduling process in the NWIRE project.

NWIRE models a Grid infrastructure without central services:

 A P2P scheduling model is employed, in which the Grid schedulers/managers residing on each local site are queried for offers.

 A Grid scheduler asks other schedulers it knows.

 As in P2P systems, no Grid manager knows the whole Grid; each keeps a list of known other Grid managers.

 Those schedulers can forward the request to other schedulers.

 The scope and depth of the query can be parameterized.

Individual objective functions are used to evaluate the utility for the user and the resource provider.

 The scheduler selects the allocation offer with the highest utility value.

 The whole model represents a Grid market and auction system.

122

Evaluation of Economic Approach

Real workload data of Grids is not yet available.

However, workloads of single HPC installations are available.

Example of evaluation results:

 considered: traces from the Cornell Theory Center (430-node IBM RS/6000 SP)

 4 workloads adapted from the real traces, each with 10000 jobs

 Assumed synthetic Grid machine configurations:

Name    Configuration          Largest machine
m64     4*64 + 6*32 + 8*8      64
m128    4*128                  128
m256    2*256                  256
m384    1*384 + 1*64 + 4*16    384
m512    1*512                  512

123

Scheduling Process

1) The client sends a request.

[Diagram: the application sends a request (requirements, job attributes, objective function) to its Grid manager, which sits in front of several resource managers.]

124

Example Request

KEY "Hops" {VALUE "HOPS" {2}}

KEY "MaxOfferNumber" {VALUE "MaxOfferNumber" {10}}

KEY "MinMulti-Site" {VALUE "MinMulti-Site" {10}}

...

KEY "Utility" {

ELEMENT 1 {

CONDITION{((OperatingSystem EQ "Linux") && (NumberOfProcessors >= 8))}

VALUE "UtilityValue" {-EndTime}

VALUE "RunTime" {5}

...

}

ELEMENT 2 {

CONDITION{ ... }

VALUE "UtilityValue" {-JobCost}

...

}

}

125

Scheduling Process

2) Query for other offers; 3) limitations of direction and depth.

[Diagram: the application's Grid manager forwards the request (requirements, job attributes, objective function) to other Grid managers and to its resource managers; those Grid managers may forward the request further.]

The scheduler requests offers from other Grid managers/schedulers.

Search depth and scope can be parameterized.

126

Scheduling Process

4) Allocation; maximization of the objectives.

[Diagram: offers (attributes, objective function) flow back from resource managers and other Grid managers to the requesting Grid manager.]

Returned offers are collected, and the allocation is selected according to its utility value.

127

Scheduling Process

5) The user and the resource managers are informed about the allocations.

[Diagram: the Grid manager records Allocation_1, Allocation_2 and Allocation_3 in the schedules of the resource managers.]

The user can be informed directly about the schedule for his execution.

The Grid manager may re-schedule its allocations to optimize the result further.

128

Offer Creation

[Animation over slides 129–136: a schedule over resources 1–7 and time, with jobs A–D successively placed to construct allocation offers.]

136

AWRT for Economic Scheduling Compared to Conservative Backfilling

[Bar chart: changes in average completion time of the market-economic vs. the conventional scheduler for the resource configurations M128, M256 and M384; values range roughly from -15,000 to 10,000, with one region labeled "conventional algorithms better" and the other "market-oriented better".]

137

Utilization

[Bar chart: changes in utilization of the market-economic vs. the conventional schedulers for the resource configurations M128, M256 and M384; values range roughly from -3.00 to 3.00, with one region labeled "market-oriented better" and the other "conventional better".]

138

AWRT for Different Machine Utility Functions

[Bar chart: AWRT (0–300000) for the machine utility functions MF 1 through MF 6.]

139

Results on Economic Models

Economic models provide results in the range of conventional algorithms in terms of AWRT.

Economic methods leave much more flexibility in defining the desired resources.

Problems of site autonomy, heterogeneous resources, and individual owner and user preferences are solved.

Additional advantage: reservation of resources ahead of time.

Further analysis needed:

 usage of other utility functions

140

Extending Resource Types

Data is a key resource in the Grid

 remote job execution will in most cases include the consideration of data

 data is a resource that can be managed:

 replication, caching, pre-fetching, post-fetching

The requirements on network information and QoS features will increase

 scheduling and planning of data and resources requires information about available network bandwidth:

 how long does a reliable data transfer take?

 is sufficient bandwidth available between two resources (e.g. between a visualization device and the compute resource)?

 Both resource types, as well as others, will need suitable management and scheduling features

 negotiation and agreement protocols already provide an infrastructure

141

Data and Network Scheduling

Most new resource types can be included via individual lower-level resource management systems.

Additional considerations for

 Data management

 Select resources according to data availability

 But data can be moved if necessary!

 Network management

 Consider advance reservation of bandwidth or SLA

Network resources usually depend on the selection of other resources!

Problem: no general model for network SLAs.

 Coordinate data transfers and storage allocation

142

Data Management

Access to information about the location of data sets

Information about transfer costs

Scheduling of data transfers and data availability

 optimize data transfers in regards to available network bandwidth and storage space

Coordination with network or other resources

 Similarities with general Grid scheduling:

 access to similar services

 similar tasks to execute

 interaction necessary

143

Functional View of Grid Data Management

[Diagram: an application consults a planner (data location, replica selection, selection of compute and storage nodes), which uses a Metadata Service (location based on data attributes), a Replica Location Service (location of one or more physical replicas) and Information Services (state of grid resources, performance measurements and predictions), all under security and policy. An executor initiates data transfers and computations via data movement and data access against compute and storage resources.]

Source: Carl Kesselmann

144

Replica Location Service in Context

[Diagram: replica consistency management services sit above a reliable replication service, which combines the Metadata Service, the Replica Location Service and a reliable data transfer service built on GridFTP.]

 The Replica Location Service is one component in a layered data management architecture.

 It provides a simple, distributed registry of mappings.

 Consistency management is provided by higher-level services.

Source: Carl Kesselmann

145

Example of a Scheduling Process

The Scheduling Service:

1. receives the job description

2. queries the Information Service for static resource information

3. prioritizes and pre-selects resources

4. queries for dynamic information about resource availability

5. queries the Data and Network Management Services

6. generates a schedule for the job

7. reserves the allocation if possible, otherwise selects another allocation

8. delegates job monitoring to the Job Supervisor

Example:

40 resources of the requested type are found.

12 resources are selected.

8 resources are available.

Network and data dependencies are detected.

The utility function is evaluated.

The 6th tried allocation is confirmed.

Job Supervisor / Network and Data Management: service, monitor and initiate the allocation.

Data/network are provided and the job is started.
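The eight steps of the Scheduling Service can be sketched as a single function (all services are stubbed as passed-in callables; every name here is invented for illustration):

```python
# Sketch of the 8-step scheduling loop; each parameter stands in for a
# Grid service (Information Service, Data/Network Management, Reservation,
# Job Supervisor) and is deliberately just a callable.

def schedule_job(job, info, dyn, data_net, make_schedule, reserve, supervise):
    static = info(job)                    # 2. static resource information
    candidates = sorted(static)[:12]      # 3. prioritize / pre-select
    available = [r for r in candidates if dyn(r)]  # 4. dynamic availability
    deps = data_net(job, available)       # 5. data and network dependencies
    for allocation in make_schedule(job, available, deps):  # 6. schedule
        if reserve(allocation):           # 7. reserve, else try the next one
            supervise(job, allocation)    # 8. delegate monitoring
            return allocation
    return None
```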

146

Use-Case Example: Coordinated Simulation and Visualization

Expected output of a Grid scheduler: a timetable of resource allocations.

[Same Gantt chart as on slide 83: the resources Data, Network 1, Computer 1, Network 2, Computer 2, Software License, Storage, Network 3 and VR-Cave, allocated over time for data access, data transfers, loading data, parallel computation with communication, software usage, storing/providing data, data storage, and communication for visualization.]

147

Re-Scheduling

 Reconsidering a schedule that includes already-made agreements may be worthwhile from time to time,

 because the resource situation or the workload situation may have changed.

 Optimization of the schedule can only work within the bounds of the agreements and reservations already made:

 given guarantees must be observed.

 The schedulers can try to maximize the utility value of the overall schedule:

 a Grid scheduler may negotiate with other resource providers to get better agreements, and may cancel previous agreements

 a local scheduler may optimize the local allocations to improve the schedule.

148

Activities

 Core service infrastructure

 OGSI/WSRF

 OGSA

 GGF hosts several groups in the area of Grid scheduling and resource management.

Examples:

 WG Scheduling Attributes (finished)

 WG Distributed Resource Management Application API (active)

 WG Grid Resource Allocation Agreement Protocol (active)

 WG Grid Economic Services Architecture (active)

 RG Grid Scheduling Architecture (active)

149

Conclusion

Resource management and scheduling is a key service in a Next Generation Grid.

In a large Grid the user cannot handle this task.

Nor is the orchestration of resources a provider task.

System integration is complex but vital.

The local systems must be enabled to interact with the Grid

 by providing sufficient information and exposing services for negotiation.

 Basic research is still required in this area:

 No ready-to-implement solution is available.

 New concepts are necessary.

 Current efforts provide the basic Grid infrastructure; higher-level services such as Grid scheduling are still lacking.

 Future RMS systems will provide extensible negotiation interfaces.

 Grid scheduling will include the coordination of different resources.

150

Further Outlook

Commercial interest in the Grid may have a stronger focus on non-HPC-related aspects:

 eCommerce solutions

 dynamically migrating application server load

 inter-operation between companies

 supply-chain management requesting distributed resources under certain constraints

 Enterprise Application Integration (EAI):

 dynamically discover and use available services

The problems tackled by Grid RMS are common to many application fields.

Grid solutions may become a key technology infrastructure, or solutions from other areas (e.g. the WebService community) may surpass the Grid efforts.

151

References

Book: "Grid Resource Management: State of the Art and Future Trends", co-editors Jarek Nabrzyski, Jennifer M. Schopf, and Jan Weglarz, Kluwer Publishing, 2004

 PBS, PBS Pro: www.openpbs.org and www.pbspro.com

 LSF, CSF: www.platform.com

 Globus: www.globus.org

 Global Grid Forum: www.ggf.org (see SRM area)

152

Contact: ramin.yahyapour@udo.edu

Computer Engineering Institute

University of Dortmund

44221 Dortmund, Germany http://www-ds.e-technik.uni-dortmund.de

153
