Workflow Management for Grid Computing

GridFlow: Workflow Management for Grid Computing
Kavita Shinde
Outline
- Introduction
- Grid Resource Management
- Grid Workflow Management
- An Example Scenario
- Conclusion

Introduction

GridFlow
- Problem: given a set of workflow tasks and a set of Grid resources, how do we map the tasks to the resources?
- A workflow management system developed at the University of Warwick
- Built on top of ARMS, an agent-based resource management system for Grid computing
- Focuses on service-level scheduling and workflow management
Grid Resource Management

Three layers of resource management within the GridFlow system
- Grid resource
  - a high-end computing or storage resource
  - accessed remotely
  - multiprocessors, or clusters of workstations or PCs with large disk storage space
- Local Grid
  - multiple Grid resources that belong to one organization
  - resources are connected by high-speed networks
- Global Grid
  - consists of all local Grids
Grid Resource Management

PACE
- a toolset for resource performance and usage analysis
- takes separate resource and application models as inputs and predicts the execution time of a task prior to run time
- scalability (execution time vs. level of parallelism) can be determined
- helps prevent over-occupation of resources
- useful when trying to interleave sub-workflows as much as possible
Grid Resource Management

Titan
- a Grid resource manager
- locates a suitable resource set and passes the sub-workflow to a local scheduler
- uses free processors to minimize idle time and improve throughput
- supported by PACE performance prediction data
Grid Resource Management

ARMS
- main component: the agent
- an agent represents a local Grid at the global level of Grid resource management
- agents cooperate with each other to discover the available resources and their characteristics
- requests that cannot be satisfied locally are dispatched to neighboring agents
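The dispatch rule above can be illustrated with a small sketch. It is not the ARMS implementation: the `Agent` class, the `free_processors` field, and the depth-first forwarding order are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical stand-in for an agent that represents one local Grid."""
    name: str
    free_processors: int
    neighbors: list = field(default_factory=list)

    def discover(self, needed, visited=None):
        """Return the name of an agent that can satisfy the request, or None."""
        visited = set() if visited is None else visited
        visited.add(self.name)
        if self.free_processors >= needed:
            return self.name                  # request satisfied locally
        for neighbor in self.neighbors:       # otherwise dispatch to neighbors
            if neighbor.name not in visited:
                found = neighbor.discover(needed, visited)
                if found is not None:
                    return found
        return None


# Three local Grids in a chain: L1 - L2 - L3 (illustrative values)
a = Agent("L1", free_processors=4)
b = Agent("L2", free_processors=16)
c = Agent("L3", free_processors=64)
a.neighbors = [b]
b.neighbors = [a, c]
result = a.discover(32)
```

Here a request for 32 processors submitted to L1 is forwarded through L2 and ends at L3, the first agent with enough free capacity.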
Grid Workflow Management
Grid workflow management is implemented at multiple layers
- Tasks
  - the basic building blocks of an application
  - e.g. MPI (Message Passing Interface) and PVM (Parallel Virtual Machine) jobs running on multiple processors
- Sub-workflows
  - a flow of closely related tasks to be executed in a predefined sequence on the Grid resources of a local Grid
  - usually involve significant communication between tasks; resource conflicts may occur when multiple sub-workflows require the same resource simultaneously
- Workflows
  - a flow of several different sub-workflows
- GridFlow user portal: provides a graphical user interface for composing workflow elements and accessing additional Grid services
- LGSS: handles conflicts, since the sub-workflows being scheduled may belong to different workflows
- ARMS: represents a local Grid at the global level of Grid resource management and conducts local Grid sub-workflow scheduling
- Globus MDS: provides information about the available resources on the Grid and their status
- Titan: uses performance data obtained from PACE for resource scheduling
Grid Workflow Management

GGWM
- Simulation
  - takes place before a Grid workflow is actually executed and produces the workflow schedule
  - returns the simulation results to the GridFlow portal for user agreement
- Execution
  - the workflow is executed according to the simulated schedule
  - actual execution may differ from the schedule because of the dynamic nature of the Grid
  - delayed sub-workflows are sent back to the simulation engine and rescheduled
- Monitoring
  - provides access to real-time status reports of task or sub-workflow execution
Global Grid Workflow Management

Scheduling Algorithm
- initialize all properties of each sub-workflow to null
- look for a schedulable sub-workflow
  - ensure its pre-sub-workflows (predecessors) have all been scheduled
- set the start time of the chosen sub-workflow to the latest end time of its pre-sub-workflows
- submit the start time and the sub-workflow to a Grid-level agent (ARMS)
  - the agent finds a suitable local Grid using LGSS
Global Grid Workflow Management
- ARMS reschedules the less critical sub-workflows
- the algorithm relies heavily on the simulation results of LGSS
Notation
- Workflow W: a set of sub-workflows Si (i = 1, ..., n); S1 and Sn are the starting and ending points
- pi: number of pre-sub-workflows of Si
- qi: number of post-sub-workflows of Si
- G: the global Grid, a set of local Grids Lj (j = 1, ..., m)
- k: true if the sub-workflow is scheduled, false otherwise
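The global scheduling loop can be sketched as follows. This is a simplified illustration, not the GGWM implementation: it uses crisp numbers instead of fuzzy times, and `schedule_workflow`, the `predecessors` mapping, and the `submit` callback (a stand-in for handing the sub-workflow to an ARMS agent, which would return the estimated end time) are hypothetical names.

```python
def schedule_workflow(predecessors, submit):
    """Simplified global scheduling loop with crisp (non-fuzzy) times.

    predecessors: dict mapping each sub-workflow to a list of its pre-sub-workflows
    submit: hypothetical stand-in for the ARMS agent; takes (name, start_time)
            and returns the estimated end time on the chosen local Grid
    """
    start = {s: None for s in predecessors}       # initialize properties to null
    end = {s: None for s in predecessors}
    scheduled = {s: False for s in predecessors}  # the flag k from the notation

    while not all(scheduled.values()):
        # look for a sub-workflow whose pre-sub-workflows are all scheduled
        ready = [s for s in predecessors
                 if not scheduled[s] and all(scheduled[p] for p in predecessors[s])]
        if not ready:
            raise ValueError("cyclic dependency between sub-workflows")
        s = ready[0]
        # start time = latest end time of the pre-sub-workflows (0 if none)
        start[s] = max((end[p] for p in predecessors[s]), default=0)
        end[s] = submit(s, start[s])
        scheduled[s] = True
    return start, end


# Illustrative diamond-shaped workflow with made-up durations
durations = {"S1": 3, "S2": 5, "S3": 2, "S4": 4}
deps = {"S1": [], "S2": ["S1"], "S3": ["S1"], "S4": ["S2", "S3"]}
start, end = schedule_workflow(deps, lambda name, t: t + durations[name])
```

In this toy run S4 starts at 8, the later of the end times of S2 (8) and S3 (5), and ends at 12.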
Local Grid Sub-Workflow Scheduling

Scheduling Algorithm
- very similar to GGWM
- has to deal with multiple tasks that may belong to different workflows
- the start time of the chosen task cannot be set directly to the latest end time of its pre-tasks
  - resource conflicts have to be resolved first
- executes the task with the higher priority first
  - a task that is possibly enabled earlier gets the higher priority
Fuzzy Time Operations


- LGSS and GGWM algorithms are implemented using fuzzy timing techniques
- fuzzy time function π(τ)
  - gives a numerical estimate of the possibility that an event arrives at time τ
- advantages:
  - can be computed very fast
  - suitable for scheduling time-critical applications
- they do not necessarily provide the best scheduling solution

Example: π1(τ) = 0.5(0,2,6,7), π2(τ) = (2,4,4,6)

[Figure: fuzzy time operations on π1 and π2]
a: possibility distributions of π1 and π2
b: latest arrival distribution of π1 and π2
c: earliest enabling time
d: operator min, the intersection of π1 and π2
e: operator max, the union of π1 and π2
f: sum of π1 and π2
   min(0.5,1)(0+2, 2+4, 6+4, 7+6) = 0.5(2,6,10,13)
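The sum operation on the two example distributions 0.5(0,2,6,7) and (2,4,4,6) can be checked with a short sketch. The `Trapezoid` type and the `possibility` helper are illustrative names, not part of GridFlow; only the sum rule (minimum of the heights, corner points added one by one) is taken from the slide.

```python
from typing import NamedTuple


class Trapezoid(NamedTuple):
    """Trapezoidal possibility distribution h(a, b, c, d)."""
    h: float  # maximum possibility (height)
    a: float  # earliest possible time
    b: float  # start of the most likely interval
    c: float  # end of the most likely interval
    d: float  # latest possible time


def fuzzy_sum(p, q):
    """Sum as on the slide: min of the heights, corner points added."""
    return Trapezoid(min(p.h, q.h), p.a + q.a, p.b + q.b, p.c + q.c, p.d + q.d)


def possibility(p, t):
    """Evaluate the distribution at time t (hypothetical helper)."""
    if t < p.a or t > p.d:
        return 0.0
    if p.b <= t <= p.c:
        return p.h
    if t < p.b:
        return p.h * (t - p.a) / (p.b - p.a)   # rising edge
    return p.h * (p.d - t) / (p.d - p.c)       # falling edge


pi1 = Trapezoid(0.5, 0, 2, 6, 7)
pi2 = Trapezoid(1.0, 2, 4, 4, 6)
total = fuzzy_sum(pi1, pi2)
```

Here `total` is the distribution 0.5(2,6,10,13), matching the slide.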
An Example Scenario






- W1, W2: workflows
- L1, L2: local Grids
- task A2 of sub-workflow S3 from W1 is being executed
- S3 from W2 is to be scheduled
- there is a resource conflict between A3 and A4
- the schedule aims to find e5(τ)
An Example Scenario


- task enabling times: from pre-task end times
- task execution times: from the Titan system, supported by PACE functions
a3(τ) = (3,5,5,7); d3(τ) = (5,6,7,8);
a4(τ) = (0,3,3,5); d4(τ) = (10,12,14,16);
d5(τ) = (2,5,6,9);
An Example Scenario
using LGSS
s3(τ) = min{(3,5,5,7), earliest{(3,5,5,7),(0,3,3,5)}}
      = min{(3,5,5,7), (0,3,3,5)}
      = 0.5(3,4,4,5)
s4(τ) = min{(0,3,3,5), earliest{(3,5,5,7),(0,3,3,5)}}
      = min{(0,3,3,5), (0,3,3,5)}
      = (0,3,3,5)
e13(τ) = sum{0.5(3,4,4,5), (5,6,7,8)}
       = 0.5(8,10,11,13)
An Example Scenario
e14(τ) = sum{latest{0.5(8,10,11,13), (0,3,3,5)}, (10,12,14,16)}
       = sum{0.5(8,10,11,13), (10,12,14,16)}
       = 0.5(18,22,25,29)
e24(τ) = sum{(0,3,3,5), (10,12,14,16)}
       = (10,15,17,21)
e23(τ) = sum{latest{(10,15,17,21), 0.5(3,4,4,5)}, (5,6,7,8)}
       = sum{0.5(10,12.5,19,21), (5,6,7,8)}
       = 0.5(15,18.5,26,29)
e4(τ) = max{0.5(18,22,25,29), (10,15,17,21)}
      = (10,15,17,29)
An Example Scenario
e5(τ) = sum{(10,15,17,29), (2,5,6,9)}
      = (12,20,23,38)
- so S3 from W2 will most likely complete on local Grid L1 between 20 and 23
- this data is submitted to GGWM, which decides whether local Grid L1 should be allocated the sub-workflow S3 from W2
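The latest operation in the example can be reproduced with the sketch below. The slides do not spell out the rule, so the one used here is an assumption inferred from the example's numbers: both distributions are first cut down to the smaller height, then the corner points are compared one by one. Distributions are plain tuples (height, a, b, c, d), and the function names are illustrative.

```python
def truncate(t, h):
    """Cut trapezoid t = (height, a, b, c, d) down to height h <= t[0]."""
    height, a, b, c, d = t
    r = h / height
    # the slanted edges are cut at the fraction r of their horizontal extent
    return (h, a, a + (b - a) * r, d - (d - c) * r, d)


def latest(p, q):
    """Assumed rule: truncate both to the smaller height, then take corner-wise max."""
    h = min(p[0], q[0])
    p, q = truncate(p, h), truncate(q, h)
    return (h,) + tuple(max(x, y) for x, y in zip(p[1:], q[1:]))


def fuzzy_sum(p, q):
    """Sum: min of the heights, corner points added."""
    return (min(p[0], q[0]),) + tuple(x + y for x, y in zip(p[1:], q[1:]))


# Values taken from the example scenario
e24 = (1.0, 10, 15, 17, 21)
s3 = (0.5, 3, 4, 4, 5)
d3 = (1.0, 5, 6, 7, 8)
d4 = (1.0, 10, 12, 14, 16)
e13 = (0.5, 8, 10, 11, 13)
s4 = (1.0, 0, 3, 3, 5)

e23 = fuzzy_sum(latest(e24, s3), d3)
e14 = fuzzy_sum(latest(e13, s4), d4)
```

Under this assumed rule, `e23` comes out as 0.5(15,18.5,26,29) and `e14` as 0.5(18,22,25,29), the same values as in the example.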
Conclusion





- the fuzzy timing technique provides a good solution to the conflict-resolution problem that arises in Grid workflow management
- results indicate that local and global Grid workflow management can coordinate with each other to optimize workflow execution time and resolve conflicts of interest
- useful in highly dynamic Grid environments
  - large network latencies exist and application performance is difficult to predict accurately
- needs more flexible cooperation among different Grid services and components, which poses security challenges