Grid Resource Brokering and Cost-based Scheduling with Nimrod-G and Gridbus: Case Studies
Rajkumar Buyya
Cloud Computing and Distributed Systems (CLOUDS) Lab, The University of Melbourne, Melbourne, Australia
www.cloudbus.org

Agenda
- Introduction to Grid scheduling
- Application models and deployment approaches
- Economy-based "computational" Grid scheduling
  - Nimrod-G Grid Resource Broker
  - Scheduling algorithms and experiments on the World Wide Grid testbed
- Economy-based "data-intensive" Grid scheduling
  - Gridbus Grid Service Broker
  - Scheduling algorithms and experiments on the Australian Belle Data Grid testbed
(The topic sits at the intersection of Grid computing, scheduling, and economics: the "Grid economy".)

Grid Scheduling: Introduction

Grid Resources and Scheduling
[Figure: a user application submits work to a Grid Resource Broker, which consults a Grid Information Service and dispatches jobs to Local Resource Managers controlling a single CPU (time-shared allocation), an SMP (time-shared allocation), and clusters (space-shared allocation).]

Grid Scheduling
- Grid schedulers are global schedulers: the resources they use are distributed over multiple administrative domains.
- Scheduling involves selecting one or more suitable resources (which may involve co-scheduling), assigning tasks to the selected resources, and monitoring execution.
- Grid schedulers have no ownership or control over resources: jobs are submitted to Local Resource Managers (LRMs) on behalf of the user, and the LRMs take care of the actual execution of jobs.

Example Grid Schedulers
- Nimrod-G (Monash University): Computational Grid, economy-based
- Condor-G (University of Wisconsin): Computational Grid, system-centric
- AppLeS (University of California, San Diego): Computational Grid, system-centric
- Gridbus Broker (University of Melbourne): Data Grid, economy-based

Key Steps in Grid Scheduling
Phase I - Resource Discovery
  1. Authorization filtering
  2. Application definition
  3. Minimum requirement filtering
Phase II - Resource Selection
  4. Information gathering
  5. System selection
Phase III - Job Execution
  6. Advance reservation
  7. Job submission
  8. Preparation tasks
  9. Monitoring progress
  10. Job completion
  11. Clean-up tasks
Source: J. Schopf, Ten Actions When SuperScheduling, OGF Document, 2003.

Movement of Jobs Between the Scheduler and a Resource
- Push model: the manager pushes jobs from its queue to a resource. Used in clusters and Grids.
- Pull model: a P2P agent requests a job for processing from a job pool. Commonly used in P2P systems such as Alchemi and SETI@home.
- Hybrid model (both push and pull): the broker deploys an agent on each resource, and the agent pulls jobs from the broker; this may be used in Grids (e.g., the Nimrod-G system). The broker may also pull data from the user host or from separate data hosts holding distributed datasets (e.g., the Gridbus Broker).
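To make the pull and hybrid models concrete, here is a minimal, illustrative Python sketch. It is not the actual Alchemi, SETI@home, or Nimrod-G code; names such as JobPool and agent_loop are invented for this example. The broker keeps a job pool; an agent deployed on a resource repeatedly pulls a job, runs it, and comes back for more:

    import queue
    import time

    class JobPool:
        """Broker-side pool of jobs that agents pull work from."""
        def __init__(self, jobs):
            self._q = queue.Queue()
            for job in jobs:
                self._q.put(job)

        def request_job(self):
            """Return the next job, or None when the pool is drained."""
            try:
                return self._q.get_nowait()
            except queue.Empty:
                return None

    def agent_loop(pool, run_job):
        """Agent pushed out to a resource by the broker (hybrid model);
        it then pulls jobs until no work remains."""
        while True:
            job = pool.request_job()
            if job is None:
                break
            run_job(job)

    if __name__ == "__main__":
        pool = JobPool(jobs=[{"id": i, "cpu_minutes": 5} for i in range(10)])
        agent_loop(pool, run_job=lambda job: time.sleep(0.01))  # stand-in for real work

In a real deployment the pool would sit behind a network protocol, and the agent would also report results and resource-consumption data back to the broker.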
Example Systems (architecture vs. job dispatch model)
- Centralised, push: PBS, SGE, Condor, Alchemi (when in dedicated mode)
- Centralised, pull: Windmill from CERN (used in the ATLAS physics experiment)
- Centralised, hybrid: Condor (as it supports non-dedicated, owner-specified policies)
- Decentralised, push: Nimrod-G, AppLeS, Condor-G, Gridbus Broker
- Decentralised, pull: Alchemi, SETI@home, United Devices, P2P systems, Aneka
- Decentralised, hybrid: Nimrod-G (pushes a Grid Agent, which then pulls jobs)

Application Models and their Deployment on Global Grids

Grid Applications and Parametric Computing
- Bioinformatics: drug design / protein modelling
- Natural language engineering
- Sensitivity experiments on smog formation
- Computer graphics: ray tracing
- High energy physics: searching for rare events
- Ecological modelling: control strategies for cattle tick
- Data mining
- Electronic CAD: field programmable gate arrays
- VLSI design: SPICE simulations
- Finance: investment risk analysis
- Civil engineering: building design
- Automobile: crash simulation
- Network simulation
- Aerospace: wing design
- Astrophysics

How to Construct and Deploy Applications on Global Grids?
Three options/solutions:
1. Manual scheduling: use pure Globus commands.
2. Application-level scheduling: build your own distributed application and scheduler.
3. Application-independent scheduling: Grid brokers, which decouple application construction from scheduling.
Goal: perform a parameter sweep (bag of tasks) on distributed resources within "T" hours (or earlier) and at a cost not exceeding $M.

Using pure Globus commands: do everything yourself, manually. Total cost: $???

Build a distributed application and an application-level scheduler: build the application and scheduler on a case-by-case basis (e.g., the MPI approach). Total cost: $???

Compose and deploy using brokers - the Nimrod-G and Gridbus approach:
- Compose applications and submit them to the broker
- Define QoS requirements (deadline, budget)
- Get an aggregate view of progress
Compose, submit and play! (An illustrative sketch of this usage pattern follows the Nimrod-G architecture overview below.)

The Nimrod-G Grid Resource Broker and Economy-based Grid Scheduling
[Buyya, Abramson, Giddy, 1999-2001]
Deadline and budget constrained algorithms for scheduling applications on "computational" Grids.

Nimrod-G: A Grid Resource Broker
A resource broker (implemented in Python) for managing, steering, and executing task-farming (parameter sweep) applications on global Grids. It allows dynamic leasing of resources at runtime based on their quality, cost, and availability, and on users' QoS requirements (deadline, budget, etc.).
Key features:
- A declarative parameter programming language
- A single window to manage and control an experiment
- Persistent and programmable task-farming engine
- Resource discovery
- Resource trading
- (User-level) scheduling and predictions
- Generic dispatcher and Grid agents
- Transportation of data and results
- Steering and data management
- Accounting

A Glance at the Nimrod-G Broker
[Figure: Nimrod/G clients talk to the Nimrod/G engine, which uses a schedule advisor, trading manager, grid store, grid explorer, and grid dispatcher over Grid middleware (Globus, Legion, Condor, etc.) and Grid Information Server(s); each remote node (Globus-, Legion-, or Condor-enabled) runs a local Resource Manager (RM) and Trade Server (TS). See the HPC Asia 2000 paper.]

Nimrod/G Grid Broker Architecture
[Figure: layered architecture. Nimrod-G clients (legacy applications, customised apps such as ActiveSheet, P-Tools for GUI/scripting and parameter modelling, monitoring and steering portals) sit on top of the broker. The farming engine manages programmable entities (jobs, tasks, channels, agents, job server, database); the meta-scheduler (pluggable algorithms, schedule advisor, trading manager) and the dispatcher with middleware-specific actuators (Globus, Legion, Condor, P2P agents) drive the underlying middleware (Globus, Legion, Condor, GTS, GMD, G-Bank) and fabric: local schedulers such as Condor/LL/NQS on PCs, workstations and clusters, plus storage, networks, and instruments such as a radio telescope. The broker forms the narrow waist of an "IP hourglass"-style stack.]
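To make the "compose, define QoS, and submit to the broker" idea above concrete, here is a minimal, hypothetical Python sketch. It is not the Nimrod-G or Gridbus API; the class and method names (Experiment, Broker, submit) are invented for illustration:

    import itertools

    class Experiment:
        """A bag-of-tasks parameter sweep plus the user's QoS constraints."""
        def __init__(self, command, parameters, deadline_hours, budget_g_dollars,
                     optimisation="cost"):
            self.command = command
            self.parameters = parameters          # e.g. {"X": range(1, 166), "Y": [5]}
            self.deadline_hours = deadline_hours
            self.budget = budget_g_dollars
            self.optimisation = optimisation      # "cost" or "time"

        def jobs(self):
            """Expand the parameter space into one job per combination."""
            names = list(self.parameters)
            for values in itertools.product(*(self.parameters[n] for n in names)):
                yield dict(zip(names, values))

    class Broker:
        """Stand-in for a Grid resource broker: the user only describes the
        experiment and QoS; resource selection and scheduling stay hidden."""
        def submit(self, experiment):
            job_list = list(experiment.jobs())
            print(f"Scheduling {len(job_list)} jobs, deadline {experiment.deadline_hours} h, "
                  f"budget {experiment.budget} G$, optimise {experiment.optimisation}")
            # ... discover resources, establish prices, schedule, dispatch ...

    exp = Experiment(command="./calc", parameters={"X": range(1, 166), "Y": [5]},
                     deadline_hours=2, budget_g_dollars=396000, optimisation="cost")
    Broker().submit(exp)

The point of the broker approach is exactly this separation: the same experiment description can be run under cost- or time-optimisation without touching the application itself.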
A Nimrod/G Cost and Deadline Monitor
[Figure: Nimrod/G monitoring the cost and deadline of an experiment running on a testbed of Legion hosts and Globus hosts spread across Virginia, USA; the host "bezek" is in both the Globus and Legion domains.]

User Requirements: Deadline/Budget
[Figure: Nimrod/G client window for setting the user's deadline and budget.]

Nimrod/G Interactions
[Figure: Grid tools and applications ask the Nimrod-G broker to "do this in 30 min. for $10". The task-farming engine, grid scheduler and grid dispatcher interact with the Grid Information Server and Grid Trade Server; on each Grid/compute node a process server or local resource manager launches a Nimrod agent, which runs the user process and accesses files on the user node's file server.]

Adaptive Scheduling Steps
1. Discover resources.
2. Establish rates (resource trading).
3. Compose jobs and schedule them.
4. Distribute jobs.
5. Evaluate progress and reschedule: do the remaining jobs, deadline, and budget still meet the requirements?
6. Discover more resources if needed, and repeat until the experiment completes.

Deadline and Budget Constrained Scheduling Algorithms
- Cost optimisation: execution time limited by the deadline D; execution cost minimised.
- Cost-time optimisation: execution time minimised when possible; execution cost minimised.
- Time optimisation: execution time minimised; execution cost limited by the budget B.
- Conservative time optimisation: execution time minimised; execution cost limited by the budget B, but all unprocessed jobs have a guaranteed minimum budget.

Deadline and Budget-based Cost Minimisation Scheduling
1. Sort resources by increasing cost.
2. For each resource in that order, assign as many jobs as possible to the resource without exceeding the deadline.
3. Repeat all steps until all jobs are processed.
(A simplified sketch of this heuristic appears after the experiment setup below.)

Scheduling Algorithms and Experiments

World Wide Grid (WWG) Testbed
- Australia: Melbourne Uni. (cluster), VPAC (Alpha), Solaris workstations - Nimrod-G + Gridbus, Globus/Legion, GRACE_TS
- North America: ANL (SGI/Sun/SP2), USC-ISI (SGI), UVa (Linux cluster), UD (Linux cluster), UTK (Linux cluster), UCSD (Linux PCs), BU (SGI IRIX) - Globus + Legion, GRACE_TS
- Europe: ZIB (T3E/Onyx), AEI (Onyx), Paderborn (HPCLine), Lecce (Compaq SC), CNR (cluster), Calabria (cluster), CERN (cluster), CUNI/CZ (Onyx), Poznan (SGI/SP2), Vrije Uni. (cluster), Cardiff (Sun E6500), Portsmouth (Linux PC), Manchester (O3K) - Globus + GRACE_TS
- Asia: Tokyo I-Tech. (Ultra WS), AIST Japan (Solaris cluster), Kasetsart Thailand (cluster), NUS Singapore (O2K) - Globus + GRACE_TS
- South America: Chile (cluster) - Globus + GRACE_TS
All connected over the Internet.

Application Composition Using the Nimrod Parameter Specification Language
    #Parameters Declaration
    parameter X integer range from 1 to 165 step 1;
    parameter Y integer default 5;

    #Task Definition
    task main
        #Copy necessary executables depending on node type
        copy calc.$OS node:calc
        #Execute program with parameter values on remote node
        node:execute ./calc $X $Y
        #Copy results file to user home node with jobname as extension
        copy node:output ./output.$jobname
    endtask
This plan expands into 165 jobs:
    calc 1 5   -> output.j1
    calc 2 5   -> output.j2
    calc 3 5   -> output.j3
    ...
    calc 165 5 -> output.j165

Experiment Setup
- Workload: 165 jobs, each needing 5 minutes of CPU time.
- Deadline: 2 hours; budget: 396,000 G$.
- Strategies: (1) minimise cost, (2) minimise time.
- Execution results:
  - Optimise cost: 115,200 G$ (finished in 2 hours)
  - Optimise time: 237,000 G$ (finished in 1.25 hours)
In this experiment, the time-optimised scheduling run cost roughly double the cost-optimised run.
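The cost-minimisation heuristic used in this experiment can be sketched in a few lines of Python. This is an illustration under simplifying assumptions (a fixed, known price per job and a fixed job throughput per resource), not the actual Nimrod-G scheduler, which re-evaluates assignments at every scheduling event using measured consumption rates; the jobs_per_hour figures below are invented:

    def cost_min_schedule(num_jobs, resources, deadline_hours):
        """Assign jobs to the cheapest resources first, never exceeding the deadline.
        Each resource: {'name', 'price_per_job' (G$), 'jobs_per_hour' (estimate)}."""
        remaining = num_jobs
        assignment = {}
        # 1. Sort resources by increasing cost per job.
        for r in sorted(resources, key=lambda r: r["price_per_job"]):
            # 2. Give this resource as many jobs as it can finish by the deadline.
            capacity = int(r["jobs_per_hour"] * deadline_hours)
            take = min(capacity, remaining)
            assignment[r["name"]] = take
            remaining -= take
            if remaining == 0:
                break
        # 3. The real broker repeats these steps periodically; any jobs still
        #    unassigned here would simply wait for the next scheduling event.
        return assignment, remaining

    # Prices follow the G$/CPU-sec rates in the table that follows
    # (5 min of CPU = 300 s per job); throughput estimates are illustrative only.
    resources = [
        {"name": "Linux cluster, Monash",  "price_per_job": 2 * 300, "jobs_per_hour": 80},
        {"name": "Solaris/Ultra2, TITech", "price_per_job": 3 * 300, "jobs_per_hour": 12},
        {"name": "SGI, ISI",               "price_per_job": 8 * 300, "jobs_per_hour": 20},
    ]
    plan, left_over = cost_min_schedule(165, resources, deadline_hours=2)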
Users can now trade off time against cost.

Resources Selected and Price per CPU-second
- Linux cluster, Monash, Melbourne, Australia (Globus, GTS, Condor): 2 G$/CPU-sec; jobs executed: 64 (time-opt), 153 (cost-opt)
- Linux (Prosecco), CNR, Pisa, Italy (Globus, GTS, Fork): 3 G$/CPU-sec; jobs executed: 7 (time-opt), 1 (cost-opt)
- Linux (Barbera), CNR, Pisa, Italy (Globus, GTS, Fork): 4 G$/CPU-sec; jobs executed: 6 (time-opt), 1 (cost-opt)
- Solaris/Ultra2, TITech, Tokyo, Japan (Globus, GTS, Fork): 3 G$/CPU-sec; jobs executed: 9 (time-opt), 1 (cost-opt)
- SGI, ISI, Los Angeles, US (Globus, GTS, Fork): 8 G$/CPU-sec; jobs executed: 37 (time-opt), 5 (cost-opt)
- Sun, ANL, Chicago, US (Globus, GTS, Fork): 7 G$/CPU-sec; jobs executed: 42 (time-opt), 4 (cost-opt)
Totals: time optimisation - 237,000 G$, completed in 70 minutes; cost optimisation - 115,200 G$, completed in 119 minutes.

Deadline and Budget Constrained (DBC) Time Minimisation Scheduling
1. For each resource, calculate the next completion time for an assigned job, taking into account previously assigned jobs.
2. Sort resources by next completion time.
3. Assign one job to the first resource for which the cost per job is less than the remaining budget per job.
4. Repeat all steps until all jobs are processed. (This is performed periodically or at each scheduling event.)
(A simplified sketch of this heuristic follows the two scheduling plots below.)

Resource Scheduling for DBC Time Optimisation
[Figure: number of tasks in execution over time (minutes) on Condor-Monash, Linux-Prosecco-CNR, Linux-Barbera-CNR, Solaris/Ultra2-TITech, SGI-ISI and Sun-ANL under time optimisation; all six resources are kept busy to finish as early as possible.]

Resource Scheduling for DBC Cost Optimisation
[Figure: number of tasks in execution over time (minutes) on the same six resources under cost optimisation; most jobs run on the cheapest resource (Condor-Monash), with the others used only as much as the deadline requires.]
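Following the time-minimisation steps listed above, here is a simplified Python sketch. Again this is an illustration, not the real Nimrod-G code: it assumes a single "relative_speed" figure per resource (standing in for both node speed and available parallelism, with values invented here), whereas the broker measures actual job-consumption rates and re-runs the loop at every scheduling event:

    def time_min_schedule(num_jobs, resources, job_cpu_seconds, budget):
        """Greedy DBC time minimisation: give each job to the resource that
        would finish it earliest, provided its cost per job fits within the
        remaining budget per remaining job."""
        busy_until = {r["name"]: 0.0 for r in resources}   # queued work, in seconds
        assignment = {r["name"]: 0 for r in resources}
        spent = 0.0
        for done in range(num_jobs):
            budget_per_job = (budget - spent) / (num_jobs - done)
            # Steps 1-2: order resources by the completion time of one more job.
            by_completion = sorted(
                resources,
                key=lambda r: busy_until[r["name"]] + job_cpu_seconds / r["relative_speed"])
            # Step 3: pick the first affordable one.
            for r in by_completion:
                cost_per_job = r["price_per_cpu_sec"] * job_cpu_seconds
                if cost_per_job <= budget_per_job:
                    busy_until[r["name"]] += job_cpu_seconds / r["relative_speed"]
                    assignment[r["name"]] += 1
                    spent += cost_per_job
                    break
            else:
                # No resource is affordable for this job; a real scheduler would
                # defer it to a later scheduling event. The sketch just stops.
                break
        return assignment, spent

    resources = [   # prices taken from the table above; relative speeds are invented
        {"name": "Condor-Monash", "price_per_cpu_sec": 2, "relative_speed": 10.0},
        {"name": "SGI-ISI",       "price_per_cpu_sec": 8, "relative_speed": 4.0},
        {"name": "Sun-ANL",       "price_per_cpu_sec": 7, "relative_speed": 4.0},
    ]
    plan, cost = time_min_schedule(165, resources, job_cpu_seconds=300, budget=396000)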
Nimrod-G Summary
One of the first and most successful Grid resource brokers worldwide. The project continues to be active and is used in many e-Science applications. For recent developments, see: http://messagelab.monash.edu.au/Nimrod

Gridbus Broker: "Distributed" Data-Intensive Application Scheduling

Gridbus Grid Service Broker (GSB)
A Java-based resource broker for Data Grids (Nimrod-G focused on computational Grids). It uses the computational-economy paradigm for optimal selection of computational and data services depending on their quality, cost, and availability, and on users' QoS requirements (deadline, budget, and time/cost optimisation).
Key features:
- A single window to manage and control an experiment
- Programmable task-farming engine
- Resource discovery and resource trading
- Optimal data source discovery
- Scheduling and predictions
- Generic dispatcher and Grid agents
- Transportation of data and sharing of results
- Accounting

[Figure: Gridbus broker architecture. A user console, portal or application interface passes the workload/application, deadline (T), budget ($) and optimisation preference to the Gridbus broker; the Gridbus farming engine, schedule advisor, trading manager, record keeper, grid dispatcher and grid explorer operate over core middleware services (Grid Information Server, NWS), data catalogues and data nodes, and compute nodes enabled by Globus or by the Amazon EC2/S3 cloud.]

Gridbus Broker: Separating "applications" from "different" remote service access enablers and schedulers
[Figure: the application development interface and pluggable scheduling algorithms run on the home node/portal with single sign-on security; plug-in actuators drive Aneka, Globus (fork or batch submission via PBS, Condor, SGE, XGrid), Amazon EC2 (via AMIs), SSH-managed resources, and SRB/GridFTP data stores; remote nodes run the Gridbus agent.]

Gridbus Services for eScience Applications
Application development environment:
- XML-based language for composing (legacy) task-farming applications as parameter sweep applications
- Task-farming APIs for new applications
- Web APIs (e.g., portlets) for Grid portal development
- Threads-based programming interface
- Workflow interface and Gridbus-enabled workflow engine
- ...
- Grid Superscalar, in cooperation with BSC/UPC
Resource allocation and scheduling:
- Dynamic discovery of optimal computational and data nodes that meet user QoS requirements
- Hides low-level Grid middleware interfaces: Globus (v2, v4), SRB, Aneka, Unicore, and SSH-based access to local/remote resources managed by XGrid, PBS, Condor, SGE
[Demo: "Drug Design Made Easy" - http://www.gridbus.org]

A Sample List of Gridbus Broker Users
- Molecular docking for drug design on the Australian National Grid
- High energy physics: particle discovery
- Neuroscience: brain activity analysis - Melbourne University
- EU Data Mining Grid - DaimlerChrysler, Technion, U. Ljubljana, U. Ulster
- Kidney / Human Physiome modelling - Melbourne Medical Faculty; Université d'Evry, France
- Finance / investment risk studies on the Spanish stock market - Universidad Complutense de Madrid, Spain

Case Study: High Energy Physics and Data Grid
The Belle Experiment (KEK B-Factory, Japan)
- Investigates fundamental violation of symmetry in nature (charge-parity violation), which may help explain the imbalance of matter and antimatter in the universe.
- Collaboration of about 1000 people across 50 institutes, with hundreds of terabytes of data.

Case Study: Event Simulation and Analysis (B0 -> D*+ D*- Ks)
- Simulation and analysis package: Belle Analysis Software Framework (BASF).
- The experiment has two parts: generation of simulated data, and analysis of the distributed data.
- Analysed 100 data files (30 MB each) distributed among the five nodes of the Australian Belle Data Grid platform.

Australian Belle Data Grid Testbed
[Figure: the Grid Service Broker receives an analysis request and returns results, supported by a certificate authority, a virtual-organisation replica catalogue and an NWS name server. Testbed sites, each running a Globus gatekeeper, GRIS, NWS sensor and GridFTP and connected over AARNet: GRIDS Lab, University of Melbourne; Dept. of Physics, University of Sydney; Dept. of Physics, University of Melbourne; ANU, Canberra; VPAC, Melbourne; Dept. of Computer Science, University of Adelaide. Most nodes are dual Intel Xeon 2.8 GHz with 2 GB RAM; the Physics node at the University of Melbourne is an Intel Pentium 2.0 GHz with 512 MB RAM.]
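One of the broker features listed above, optimal data source discovery, can be illustrated with a small Python sketch. On the Belle testbed the replica catalogue records which hosts hold a copy of a data file, and NWS supplies bandwidth estimates between hosts; the broker can then pick the replica that minimises transfer time (or transfer cost). The data structures, file name, bandwidth figures and function names below are invented for illustration and are not the Gridbus API:

    def best_data_source(file_name, compute_node, replica_catalog, bandwidth_mbps,
                         file_size_mb, price_per_mb=None):
        """Pick the data host for file_name that minimises transfer time to
        compute_node, or transfer cost if per-MB prices are supplied."""
        candidates = replica_catalog[file_name]          # hosts holding a replica
        def transfer_time(host):
            return file_size_mb * 8 / bandwidth_mbps[(host, compute_node)]  # seconds
        def transfer_cost(host):
            return file_size_mb * price_per_mb[(host, compute_node)]        # G$
        key = transfer_cost if price_per_mb else transfer_time
        return min(candidates, key=key)

    replica_catalog = {"run42.mdst": ["belle.anu.edu.au", "belle.physics.usyd.edu.au"]}
    bandwidth_mbps = {("belle.anu.edu.au", "fleagle.ph.unimelb.edu.au"): 90.0,
                      ("belle.physics.usyd.edu.au", "fleagle.ph.unimelb.edu.au"): 40.0}
    source = best_data_source("run42.mdst", "fleagle.ph.unimelb.edu.au",
                              replica_catalog, bandwidth_mbps, file_size_mb=30)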
Belle Data Grid: Grid Service Provider (GSP) CPU Service Prices (G$/CPU-sec)
[Figure: the same testbed annotated with CPU service prices. Dept. of Physics, University of Sydney: G$4; GRIDS Lab, University of Melbourne: not used as a compute resource; Dept. of Physics, University of Melbourne: G$2; ANU, Canberra: G$4; VPAC, Melbourne: G$6; Dept. of Computer Science, University of Adelaide: data node, not used as a compute resource.]

Belle Data Grid: Bandwidth Prices (G$/MB)
[Figure: the same testbed annotated with the network price (G$/MB) charged for data transfers between the data hosts and the compute sites.]

Deploying the Application: Scenario
- A Data Grid scenario with 100 jobs, each accessing ~30 MB of remote data.
- Deadline: 3 hours; budget: 60,000 G$.
- Scheduling optimisation scenarios: (1) minimise time, (2) minimise cost.

Results: Summary of Evaluation
- Cost minimisation: total time 71.07 min; compute cost 26,865 G$; data cost 7,560 G$; total cost 34,425 G$.
- Time minimisation: total time 48.5 min; compute cost 50,938 G$; data cost 7,452 G$; total cost 58,390 G$.

Results: Time Minimisation in Data Grids
[Figure: cumulative number of jobs completed over time (about 42 minutes in total) on fleagle.ph.unimelb.edu.au, belle.anu.edu.au, belle.physics.usyd.edu.au and brecca-2.vpac.org under time minimisation.]

Results: Cost Minimisation in Data Grids
[Figure: cumulative number of jobs completed over time (about 63 minutes in total) on the same four nodes under cost minimisation; most jobs complete on the cheapest node, fleagle.ph.unimelb.edu.au.]

Observation (per-node details and jobs executed)
- CS, UniMelb (belle.cs.mu.oz.au): 4 CPUs, 2 GB RAM, 40 GB HD, Linux; not used as a compute resource.
- Physics, UniMelb (fleagle.ph.unimelb.edu.au): 1 CPU, 512 MB RAM, 40 GB HD, Linux; 2 G$/CPU-sec; jobs executed: 3 (time minimisation), 94 (cost minimisation).
- CS, University of Adelaide (belle.cs.adelaide.edu.au): 4 CPUs (only 1 available), 2 GB RAM, 40 GB HD, Linux; not used as a compute resource (data host).
- ANU, Canberra (belle.anu.edu.au): 4 CPUs, 2 GB RAM, 40 GB HD, Linux; 4 G$/CPU-sec; jobs executed: 2 (time minimisation), 2 (cost minimisation).
- Physics, USyd (belle.physics.usyd.edu.au): 4 CPUs (only 1 available), 2 GB RAM, 40 GB HD, Linux; 4 G$/CPU-sec; jobs executed: 72 (time minimisation), 2 (cost minimisation).
- VPAC, Melbourne (brecca-2.vpac.org): 180-node cluster (only the head node used), Linux; 6 G$/CPU-sec; jobs executed: 23 (time minimisation), 2 (cost minimisation).
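The trade-off visible in these tables comes from the broker weighing two prices at once: the CPU price of a candidate compute node (G$/CPU-sec) and the network price of moving the input data to it (G$/MB). A minimal Python sketch of that per-job evaluation follows; the prices and function names are illustrative only (see the figures above for the real testbed values), and this is not the actual Gridbus implementation:

    def job_cost(node, data_host, cpu_price, net_price, cpu_seconds, data_mb):
        """Total G$ cost of running one job on `node` with its input data pulled
        from `data_host` (zero transfer cost if the data is already local)."""
        compute = cpu_price[node] * cpu_seconds
        data = 0.0 if node == data_host else net_price[(data_host, node)] * data_mb
        return compute + data

    def cheapest_node(nodes, data_host, cpu_price, net_price, cpu_seconds, data_mb):
        """Cost-minimisation view: pick the node with the lowest total cost.
        (Time minimisation would instead rank nodes by estimated completion
        time and only check that the cost fits the remaining budget.)"""
        return min(nodes, key=lambda n: job_cost(n, data_host, cpu_price,
                                                 net_price, cpu_seconds, data_mb))

    cpu_price = {"fleagle.ph.unimelb.edu.au": 2, "belle.anu.edu.au": 4, "brecca-2.vpac.org": 6}
    net_price = {("belle.cs.adelaide.edu.au", "fleagle.ph.unimelb.edu.au"): 1.0,
                 ("belle.cs.adelaide.edu.au", "belle.anu.edu.au"): 1.2,
                 ("belle.cs.adelaide.edu.au", "brecca-2.vpac.org"): 1.1}
    node = cheapest_node(list(cpu_price), "belle.cs.adelaide.edu.au",
                         cpu_price, net_price, cpu_seconds=120, data_mb=30)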
Summary and Conclusion
- Application scheduling on global Grids is a complex undertaking, as systems need to be adaptive, scalable, competitive, and driven by QoS.
- Nimrod-G is one of the most popular Grid resource brokers for scheduling parameter sweep applications on global Grids.
- Scheduling experiments on the World Wide Grid demonstrate the Nimrod-G broker's ability to dynamically lease services at runtime based on their quality, cost, and availability, depending on consumers' QoS requirements.
- Easy-to-use tools for creating Grid applications are essential for the success of Grid computing.

References
- Rajkumar Buyya, David Abramson, and Jonathan Giddy, Nimrod/G: An Architecture for a Resource Management and Scheduling System in a Global Computational Grid, Proceedings of the 4th International Conference on High Performance Computing in Asia-Pacific Region (HPC Asia 2000), Beijing, China. IEEE Computer Society Press, USA, 2000.
- David Abramson, Rajkumar Buyya, and Jonathan Giddy, A Computational Economy for Grid Computing and its Implementation in the Nimrod-G Resource Broker, Future Generation Computer Systems (FGCS), Volume 18, Issue 8, Pages 1061-1074, Elsevier Science, The Netherlands, October 2002.
- Jennifer Schopf, Ten Actions When SuperScheduling, Global Grid Forum Document GFD.04, 2003.
- Srikumar Venugopal, Rajkumar Buyya, and Lyle Winton, A Grid Service Broker for Scheduling e-Science Applications on Global Data Grids, Concurrency and Computation: Practice and Experience, Volume 18, Issue 6, Pages 685-699, Wiley Press, New York, USA, May 2006.