GridFlow: Workflow Management for Grid Computing
Kavita Shinde

Outline
- Introduction
- Grid Resource Management
- Grid Workflow Management
- An Example Scenario
- Conclusion

Introduction
- GridFlow addresses the question: given a set of workflow tasks and a set of resources, how do we map the tasks to Grid resources?
- a workflow management system developed at the University of Warwick
- built on top of ARMS, an agent-based resource management system for Grid computing
- focus is on service-level scheduling and workflow management

Grid Resource Management
Three layers of resource management within the GridFlow system:
- Grid resource: a high-end computing or storage resource accessed remotely; multiprocessors, or clusters of workstations or PCs with large disk storage space
- Local Grid: multiple grid resources that belong to one organization; the resources are connected by high-speed networks
- Global Grid: consists of all local Grids

Grid Resource Management: PACE
- a toolset for resource performance and usage analysis
- takes separate resource and application models as inputs and predicts the execution time of a task prior to run time
- scalability (execution time vs. level of parallelism) can be determined
- helps prevent over-occupying resources
- useful when trying to interleave sub-workflows as much as possible

Grid Resource Management: Titan
- grid resource manager
- locates a suitable resource set and passes the sub-workflow to a local scheduler
- utilizes free processors to minimize idle time and improve throughput
- supported by PACE performance prediction data

Grid Resource Management: ARMS
- component: agent
- an agent is the representative of a local grid at the global level of grid resource management
- agents cooperate with each other to find the available resources and their characteristics
- agents dispatch requests that cannot be satisfied locally to neighboring agents

Grid Workflow Management
Grid workflow management is implemented at multiple layers:
- Tasks: the basic building blocks of an application, e.g. MPI (Message Passing Interface) and PVM (Parallel Virtual Machine) jobs running on multiple processors
- Sub-workflows: a flow of closely related tasks that is executed in a predefined sequence on the grid resources of a local grid; there is usually significant communication between tasks, and resource conflicts may occur when multiple sub-workflows require the same resource simultaneously
- Workflows: a flow of several different sub-workflows

Grid Workflow Management: Components
- GridFlow user portal: provides a graphical user interface to compose workflow elements and access additional grid services
- LGSS (Local Grid Sub-workflow Scheduling): handles conflicts, since the scheduled sub-workflows may belong to different workflows
- ARMS: represents a local Grid at the global level of Grid resource management and conducts local Grid sub-workflow scheduling
- Globus MDS: provides information about the available resources on the Grid and their status
- Titan: uses performance data obtained from PACE for resource scheduling

Grid Workflow Management: GGWM (Global Grid Workflow Management)
- Simulation: takes place before a grid workflow is actually executed; produces the workflow schedule and returns the simulation results to the GridFlow portal for user agreement
- Execution: the workflow is executed according to the simulated schedule; the actual execution may differ because of the dynamic nature of the grid; delayed sub-workflows are sent back to the simulation engine and rescheduled
- Monitoring: provides access to real-time status reports of task or sub-workflow execution

Global Grid Workflow Management: Scheduling Algorithm
- initialize: set all properties of each sub-workflow to null
- look for a schedulable sub-workflow
- ensure that its pre-sub-workflows have all been scheduled
- set the start time of the chosen sub-workflow to the latest end time of its pre-sub-workflows
- submit the start time and the sub-workflow to a grid-level agent (ARMS)
- ARMS finds a suitable local grid using LGSS
- ARMS reschedules the less critical sub-workflows
- the algorithm relies heavily on the simulation results of LGSS

Notation:
- Workflow W: a set of sub-workflows Si (i = 1, …, n)
- S1 and Sn: the starting and ending points
- pi: number of pre-sub-workflows of Si
- qi: number of post-sub-workflows of Si
- G: the global grid, a set of local grids Lj (j = 1, …, m)
- k: true if the sub-workflow is scheduled, false otherwise

Local Grid Sub-Workflow Scheduling: Scheduling Algorithm
- very similar to GGWM
- has to deal with multiple tasks that may belong to different workflows
- because of resource conflicts, the start time of the chosen task cannot be set directly to the latest end time of its pre-tasks
- executes the task with the higher priority first
- gives higher priority to the task that may be enabled earlier

Fuzzy Time Operations
- the LGSS and GGWM algorithms are implemented using fuzzy timing techniques
- fuzzy time function: gives a numerical estimate of the possibility that an event arrives at time τ
- a trapezoidal function is written h(a, b, c, d), with the height h omitted when it is 1
- advantages: can be computed very fast; suitable for scheduling time-critical applications
- limitation: they do not necessarily provide the best scheduling solution

Example functions: π1(τ) = 0.5(0,2,6,7), π2(τ) = (2,4,4,6)
Figure panels:
- a: possibility distributions π1 and π2
- b: latest arrival distribution of π1 and π2
- c: earliest enabling time of π1 and π2
- d: operator min, the intersection of π1 and π2
- e: operator max, the union of π1 and π2
- f: sum of π1 and π2: min(0.5, 1)(0+2, 2+4, 6+4, 7+6) = 0.5(2, 6, 10, 13)

An Example Scenario
- W1, W2: workflows
- L1, L2: local grids
- task A2 of sub-workflow S3 from W1 is being executed
- S3 from W2 is to be scheduled
- there is a resource conflict between A3 and A4
- the schedule aims to find e5(τ)

An Example Scenario: Inputs
- task enabling times a(τ): from pre-task end times
- task execution times d(τ): from the Titan system, supported by PACE functions

a3(τ) = (3,5,5,7)
d3(τ) = (5,6,7,8)
a4(τ) = (0,3,3,5)
d4(τ) = (10,12,14,16)
d5(τ) = (2,5,6,9)

An Example Scenario: Using LGSS
s3(τ) = min{(3,5,5,7), earliest{(3,5,5,7), (0,3,3,5)}}
      = min{(3,5,5,7), (0,3,3,5)}
      = 0.5(3,4,4,5)
s4(τ) = min{(0,3,3,5), earliest{(3,5,5,7), (0,3,3,5)}}
      = min{(0,3,3,5), (0,3,3,5)}
      = (0,3,3,5)
e13(τ) = sum{0.5(3,4,4,5), (5,6,7,8)}
       = 0.5(8,10,11,13)
e14(τ) = sum{latest{0.5(8,10,11,13), (0,3,3,5)}, (10,12,14,16)}
       = sum{0.5(8,10,11,13), (10,12,14,16)}
       = 0.5(18,22,25,29)
e24(τ) = sum{(0,3,3,5), (10,12,14,16)}
       = (10,15,17,21)
e23(τ) = sum{latest{(10,15,17,21), 0.5(3,4,4,5)}, (5,6,7,8)}
       = sum{0.5(10,12.5,19,21), (5,6,7,8)}
       = 0.5(15,18.5,26,29)
e4(τ) = max{0.5(18,22,25,29), (10,15,17,21)}
      = (10,15,17,29)
e5(τ) = sum{(10,15,17,29), (2,5,6,9)}
      = (12,20,23,38)

- so S3 from W2 will most likely complete on local grid L1 between 20 and 23
- this data is submitted to GGWM, which decides whether local grid L1 should be allocated sub-workflow S3 from W2

Conclusion
- the fuzzy timing technique provides a good solution to the conflict-solving problem that arises in grid workflow management
- results indicate that local and global grid workflow management can coordinate with each other to optimize workflow execution time and resolve conflicts of interest
- useful in highly dynamic grid environments, where large network latencies exist and application performance is difficult to predict accurately
- needs more flexible cooperation among different grid services and components, which challenges security
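
The sum operator from the Fuzzy Time Operations slide is simple enough to check by hand, and a short sketch may help when following the example scenario. The code below assumes a trapezoidal function is stored as its height plus four corner points; the names `Trapezoid`, `trap`, and `fuzzy_sum` are illustrative and do not come from GridFlow. Only the sum is sketched, since the slides give its formula explicitly; min, max, earliest, and latest are not covered here.

```python
from typing import NamedTuple


class Trapezoid(NamedTuple):
    """Trapezoidal fuzzy time function h(a, b, c, d) (hypothetical container)."""
    h: float  # height: the maximum possibility
    a: float  # earliest possible time
    b: float  # start of the most likely interval
    c: float  # end of the most likely interval
    d: float  # latest possible time


def trap(a, b, c, d, h=1.0):
    """Build a trapezoid; the height defaults to 1, as in the slides."""
    return Trapezoid(h, a, b, c, d)


def fuzzy_sum(x, y):
    """Sum as shown on the slide: min of the heights, corner points added."""
    return Trapezoid(min(x.h, y.h), x.a + y.a, x.b + y.b, x.c + y.c, x.d + y.d)


# The operator example: 0.5(0,2,6,7) plus (2,4,4,6).
pi1 = trap(0, 2, 6, 7, h=0.5)
pi2 = trap(2, 4, 4, 6)
pi_sum = fuzzy_sum(pi1, pi2)  # slide value: 0.5(2, 6, 10, 13)

# Two steps of the example scenario.
e13 = fuzzy_sum(trap(3, 4, 4, 5, h=0.5), trap(5, 6, 7, 8))  # s3 plus d3
e5 = fuzzy_sum(trap(10, 15, 17, 29), trap(2, 5, 6, 9))      # e4 plus d5

print(pi_sum, e13, e5, sep="\n")
```

The middle two values of `e5` give the "most likely between 20 and 23" completion window quoted in the example scenario.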
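
The global scheduling steps (initialize, pick a schedulable sub-workflow, start it at the latest end time of its pre-sub-workflows, hand it to an agent) can be sketched as a small loop. This is a simplified illustration under stated assumptions, not the GridFlow implementation: times are plain numbers rather than fuzzy functions, rescheduling of less critical sub-workflows is left out, and `estimate_end` is a hypothetical callback standing in for the ARMS agent and its LGSS simulation.

```python
from typing import Callable, Dict, List


def schedule_workflow(
    pre: Dict[str, List[str]],
    estimate_end: Callable[[str, float], float],
) -> Dict[str, Dict[str, object]]:
    """Schedule sub-workflows in dependency order.

    pre maps each sub-workflow name to the names of its pre-sub-workflows.
    estimate_end(name, start) is a hypothetical stand-in for ARMS/LGSS and
    returns the simulated end time of the sub-workflow.
    """
    # Initialize: every property of every sub-workflow starts out unset.
    props = {s: {"start": None, "end": None, "scheduled": False} for s in pre}
    remaining = set(pre)
    while remaining:
        # A sub-workflow is schedulable once all its pre-sub-workflows are scheduled.
        ready = sorted(
            s for s in remaining if all(props[p]["scheduled"] for p in pre[s])
        )
        if not ready:
            raise ValueError("dependencies contain a cycle")
        chosen = ready[0]
        # Start time is the latest end time of the pre-sub-workflows (0.0 if none).
        start = max((props[p]["end"] for p in pre[chosen]), default=0.0)
        end = estimate_end(chosen, start)
        props[chosen] = {"start": start, "end": end, "scheduled": True}
        remaining.remove(chosen)
    return props


# Made-up workflow: S1 is the starting point, S4 the ending point.
durations = {"S1": 2.0, "S2": 3.0, "S3": 1.0, "S4": 4.0}
pre = {"S1": [], "S2": ["S1"], "S3": ["S1"], "S4": ["S2", "S3"]}
plan = schedule_workflow(pre, lambda name, start: start + durations[name])
print(plan["S4"])
```

In this made-up workflow, S4 waits for the slower of S2 and S3, which is the "latest end time of its pre-sub-workflows" rule from the slides.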