A Case for Economy Grid Architecture for Service Oriented Grid Computing

Rajkumar Buyya, David Abramson, Jon Giddy
School of Computer Science and Software Engineering, Monash University, Melbourne, Australia
www.buyya.com/ecogrid
http://www.gridcomputing.com

Overview
• A brief introduction to Grid computing
• Resource management issues
• A glance at approaches to Grid computing
• Grid Architecture for Computational Economy
• Economy Grid = Globus + GRACE
• Nimrod-G: A Grid Resource Broker
• Grid scheduling experiments
• Conclusions
[Figure labels: Economy, Grid, Scheduling, Economics]

Scalable HPC: Breaking Administrative Barriers
[Figure: performance grows as administrative barriers are crossed.]
• Administrative barriers: Individual, Group, Department, Campus, State, National, Globe, Inter Planet, Universe
• Platforms: Desktop; SMPs or supercomputers; Local cluster; Enterprise cluster/Grid; Global cluster/Grid; Inter Planet cluster/Grid; ??

Why Grids?
Large-scale exploration needs them: killer applications.
• Solving grand challenge applications using computer modeling, simulation and analysis
• Aerospace
• Internet & Ecommerce
• Life Sciences
• CAD/CAM
• Digital Biology
• Military Applications

What is Grid?
An infrastructure that couples:
• Computers: PCs, workstations, clusters, supercomputers, laptops, notebooks, mobile devices, PDAs, etc.
• Software: e.g., ASPs renting expensive special-purpose applications on demand
• Catalogued data and databases: e.g., transparent access to the human genome database
• Special devices: e.g., a radio telescope (SETI@Home searching for life in the galaxy)
• People/collaborators
It potentially offers simple, consistent, dependable, and pervasive access across wide-area networks and presents users with an integrated global resource.

Grid Applications-Drivers
• Distributed HPC (supercomputing): computational science.
• High-throughput computing: large-scale simulation/chip design & parameter studies.
• Content sharing: sharing digital contents among peers (e.g., Napster).
• Remote software access/renting services: application service providers (ASPs).
• Data-intensive computing: data mining, particle physics (CERN), drug design.
• On-demand computing: medical instrumentation & network-enabled solvers.
• Collaborative: collaborative design, data exploration, education.

Building and Using Grids requires...
• Services that make our systems Grid ready!
• Security mechanisms that permit resources to be accessed only by authorized users.
• (New) programming tools that make our applications Grid ready!
• Tools that can translate the requirements of an application into requirements for computers, networks, and storage.
• Tools that perform resource discovery, trading, composition, scheduling, and distribution of jobs, and that collect results.

Players in Grid Computing

What users want? Users in Grid Economy & Strategy
Grid Consumers
• Execute jobs for solving problems of varying size and complexity
• Benefit by selecting and aggregating resources wisely
• Trade off timeframe and cost
• Strategy: minimise expenses
Grid Providers
• Contribute "idle" resources for executing consumer jobs
• Benefit by maximizing resource utilisation
• Trade off local requirements & market opportunity
• Strategy: maximise returns on services

Sources of Complexity in Resource Management for World Wide Computing
• Size (large number of nodes, providers, consumers)
• Heterogeneity of resources (PCs, workstations, clusters, and supercomputers)
• Heterogeneity of fabric management systems (single system image OS, queuing systems, etc.)
• Heterogeneity of fabric management policies
• Heterogeneity of applications (scientific, engineering, and commerce)
• Heterogeneity of application requirements (CPU, I/O, memory, and/or network intensive)
• Heterogeneity in demand patterns
• Geographic distribution and different time zones
• Differing goals (producers and consumers have different objectives and strategies)
• Insecure and unreliable environment

Traditional approaches to resource management are NOT useful for Grid?
• They use a centralised policy that needs complete state information and a common fabric management policy, or a decentralised consensus-based policy.
• Due to too many heterogeneous parameters in the Grid, it is impossible to define a system-wide performance matrix and a common fabric management policy that is acceptable to all.
• So, we propose using the "economics" paradigm for managing resources. It has proved successful in managing the decentralization and heterogeneity present in human economies!
  - We can easily leverage proven economic principles and techniques
  - Easy to regulate demand and supply
  - User-centric, scalable, adaptable, value-driven costing, etc.
  - Offers an incentive (money?) for being part of the grid!

[Figure labels: mix-and-match; Object-oriented; Internet-WWW; Problem Solving Approach; Market/Computational Economy]

Grid RMS to support
• Authentication (once).
• Specify (code, resources, etc.).
• Discover resources.
• Negotiate authorisation, acceptable use, cost, etc.
• Acquire resources.
• Schedule jobs.
• Initiate computation.
• Steer computation.
• Access remote data-sets.
• Collaborate with results.
• Account for usage.
[Diagram labels: Domain 1, Domain 2]
Ack: Globus..

Building an Economy Grid "brokerage" system…..
Foundation for the Grid Economy

Economic Models for Resource Trading
• Commodity Market Model
• Posted Prices Models
• Bargaining Model
• Tendering (Contract Net) Model
• Auction Model: English, first-price sealed-bid, second-price sealed-bid (Vickrey), and Dutch.
• Proportional Resource Sharing Model
• Shareholder Model
• Partnership Model

Grid Architecture for Computational Economy
[Figure: architecture diagram. Labels recovered from the figure:]
• Grid User: Application
• Grid Resource Broker: Job Control Agent, Grid Explorer, Schedule Advisor, Trade Manager, Deployment Agent
• Grid Middleware Services: Sign-on, Info ?, Secure, QoS, Trading, JobExec, Storage
• Grid Market Services, Information Server(s), Health Monitor
• Grid Service Providers (Grid Node 1 … Grid Node N): Trade Server, Pricing Algorithms, Accounting, Resource Reservation, Resource Allocation, Misc. services, resources R1, R2, … Rm

Economy Grid = Globus + GRACE
[Figure: layered architecture diagram. Labels recovered from the figure:]
• Layers: Applications; High-level Services and Tools; Core Services; Local Services
• Side labels: Grid Apps.; Grid Tools; Grid Middleware; Grid Fabric
• Components: Science, Engineering, Commerce, Portals, ActiveSheet, MPI-G, MPI-IO, CC++, Nimrod/G, globusrun, GlobusView, Grid Status, DUROC, MDS, Heartbeat Monitor, Nexus, GASS, GRAM, GARA, GMD, GBank, eCash, GRACE-TS, Globus Security Interface, Condor, LSF, GRD, PBS, QBank, JVM, TCP, UDP, Linux, Irix, Solaris, …

GRACE components
• A resource broker (e.g., Nimrod/G)
• Resource trading protocols
• A mediator for negotiating between users and grid service providers (Grid Market Directory)
• A deal template for specifying resource requirements and service offers
• A trade server
• A pricing policy specification
• Accounting (e.g., QBank) and payment management (GBank)

Grid Open Trading Protocols
[Figure: message exchange between Trade Manager and Trade Server through an API, governed by Pricing Rules. DT stands for Deal Template.]
• Get Connected
• Call for Bid(DT)
• Reply to Bid(DT)
• Negotiate Deal(DT)
• ….
• Confirm Deal(DT, Y/N)
• Cancel Deal(DT)
• Change Deal(DT)
• Get Disconnected

DT: Deal Template
  - resource requirements (BM)
  - resource profile (BS)
  - price (any one can set)
  - status
    - change the above values
    - negotiation can continue
    - accept/decline
  - validity period

Open Trading Finite State Machine
[Figure: state diagram. States: DT, Offer TS, Offer TM, DA, DN. Transition labels: <TM, Request for Resource>; <TM, Ask Price>; <<TS, Update>>; <TS, Final Offer>; <TM, Accept>; <TM, Rej.>; <<TM, Update>>; <TS, Bid>; <TM, Final Offer>; <TS, Reject>.]
Legend:
  - DT: Deal Template
  - TM: Trade Manager
  - TS: Trade Server
  - DA: Deal Accepted
  - DN: Deal Not accepted

Pricing, Accounting, Allocations and Job Scheduling Flow @ each site/Grid Level
[Figure: flow diagram with numbered arrows 0 to 8 connecting Pricing Policy, GRID Bank (digital transactions), Trade Server, QBank (DB @ each site), Resource Manager (IBM-LL/PBS/….), and Compute Resources (clusters/SGI/SP/...).]
0. Make deposits, transfers, refunds, queries/reports.
1. Client negotiates for access cost.
2. Negotiation is performed per owner-defined policies.
3. If the client is happy, TS informs QB about the access deal.
4. Job is submitted.
5. Check with QB for "go ahead".
6. Job starts.
7. Job completes.
8. Inform QB about resource utilization.

Service Items to be Charged
• CPU: user and system time
• Memory:
  - maximum resident set size, page size
  - amount of memory used
  - page faults: with/without physical I/O
• Storage: size, r/w/block I/O operations
• Network: msgs sent/received
• Signals received, context switches
• Software and libraries accessed
• Data sources (e.g., Protein Data Bank)

How to decide Price?
• Fixed price model (like today's Internet)
• Dynamic/demand and supply (like tomorrow's Internet)
• Usage period
• Loyalty of customers (like airlines favoring frequent flyers!)
• Historical data
• Advance agreement (high discount for corporations)
• Usage timing (peak, off-peak, lunch time)
• Calendar based (holiday/vacation period)
• Bulk purchase (register 100 .com domains at once!)
• Voting: trade unions decide the pricing structure
• Resource capability as benchmarked in the market!
• Academic R&D/public-good application users can be offered a cheaper rate than commercial users.
• Customer type: quality-sensitive or price-sensitive buyers.
• Can be prescribed by regulating (govt.) authorities

Payments: Options & Automation
• Buy credits in advance, or GSPs bill the user later ("pay as you go")
• Pay by electronic currency via Grid Bank: NetCash (anonymity), NetCheque, and PayPal
• NetCheque: http://www.isi.edu/gost/info/netcheque/
  - Users register with NC accounting servers and can write electronic cheques and send them (e.g., by email). When a cheque is deposited, the balance is transferred from the sender's account to the receiver's account.
• NetCash: http://www.isi.edu/gost/info/netcash/
  - It supports anonymity and uses the NetCheque system to clear payments between currency servers.
• Paypal.com: account + email is linked to a credit card.
  - Enter the recipient's email address and the amount you wish to request.
  - The recipient gets an email notification and pays you at www.PayPal.com

A Glance at Nimrod-G Broker
[Figure: broker overview. Several Nimrod/G Clients connect to the Nimrod/G Engine, which works with the Schedule Advisor, Trading Manager (TM), Grid Store, Grid Dispatcher, and Grid Explorer (GE). These sit on Grid Middleware (Globus, Legion, Condor-G, Ninf, etc.) and query the Grid Information Server(s) (GIS). Grid nodes each run RM & TS.]
Legend:
  - G: Globus enabled node
  - L: Legion enabled node
  - C: Condor enabled node
  - RM: Local Resource Manager
  - TS: Trade Server

Nimrod/G: A Grid Resource Broker
• A resource broker for managing and steering task farming (parametric sweep) applications on computational Grids based on deadline and computational economy.
• Key features
  - A single window to manage & control an experiment
  - Resource discovery
  - Trade for resources
  - Resource composition & scheduling
  - Steering & data management
• It allows the user to study the behaviour of some of the output variables against a range of different input scenarios.
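
The task-farming (parametric sweep) model mentioned above can be illustrated with a small sketch: one independent job is generated for every combination of input parameter values. This is only an illustration of the idea, not Nimrod/G's plan language or API; the function name `expand_sweep` and the parameters `angle` and `pressure` are hypothetical.

```python
from itertools import product


def expand_sweep(parameters):
    """Return one job (a dict of parameter values) per combination.

    `parameters` maps a parameter name to the list of values to explore.
    """
    names = list(parameters)
    value_lists = [parameters[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


# Hypothetical experiment: 3 angles x 2 pressures = 6 independent jobs.
sweep = {"angle": [0, 45, 90], "pressure": [1.0, 2.0]}
jobs = expand_sweep(sweep)

for job_id, job in enumerate(jobs):
    print(job_id, job)
```

Each generated job is independent of the others, which is what lets a broker farm them out to whichever Grid resources it has traded for.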
Nimrod/G Grid Broker Architecture
[Figure: layered architecture diagram. Labels recovered from the figure:]
• Nimrod Clients: Legacy Applications; Customised Apps (Active Sheet); P-Tools (GUI/Scripting) (parameter_modeling); Monitoring and Steering Portals
• Nimrod Broker: Farming Engine; Meta-Scheduler; Schedule Advisor (Algorithm1 ... AlgorithmN); Programmable Entities Management (Resource, Job, Task, Variables, ...); ResourceScheduler; JobServer; Grid Explorer; Trading Manager; Dispatcher (transport and execution management)
• Middleware: Globus, Legion, Condor-G, GRACE-TS, G-Bank, ...
• Fabric: Computers (PC/WS/Clusters); Local Schedulers (Condor/LL/Mosix/...); Storage; Networks; Database; Instruments (Radio Telescope); ...

A Nimrod/G Client
[Figure: client screenshot with Deadline and Cost controls. Legion hosts are shown on a map of Virginia; Globus hosts are shown separately.]
Bezek is in both Globus and Legion domains.

Nimrod/G Interactions
[Figure: interaction diagram. Labels recovered from the figure:]
• Root node: Farming Engine, Scheduler, Dispatcher
• Grid Info servers (resource discovery)
• Trade Server
• Gatekeeper node: Process server, I/O server, Resource allocation, (local) Queuing System
• Computational node: Job Wrapper, User process, File access

Adaptive Scheduling Algorithms

Algorithm            Execution Time (not beyond deadline)   Execution Cost (not beyond budget)
Time Minimisation    Minimise                               Limited by budget
Cost Minimisation    Limited by deadline                    Minimise
None Minimisation    Limited by deadline                    Limited by budget

[Figure: scheduling flowchart. Steps: Discover Resources; Establish Rates; Compose & Schedule; Distribute Jobs; Evaluate & Reschedule; Meet requirements? (Remaining Jobs, Deadline, & Budget?); Discover More Resources.]
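
The cost-minimisation strategy above (minimise cost, limited by the deadline) can be sketched as a simple greedy allocation: fill the cheapest resources first, giving each resource only as many jobs as it can finish before the deadline. This is a static simplification under stated assumptions, not the Nimrod/G scheduler itself, which adapts as jobs run. It assumes every CPU runs a job in the same time, and the names `Resource` and `cost_minimise` and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    cpus: int
    price: float  # G$ per CPU-second (illustrative values only)


def cost_minimise(resources, n_jobs, job_seconds, deadline_seconds, budget):
    """Greedy sketch: cheapest resources first, limited by the deadline.

    Returns (plan, total_cost, feasible), where plan maps a resource name
    to the number of jobs assigned to it.
    """
    # Assumption: all CPUs are equally fast, so each CPU can run this many
    # jobs back to back before the deadline.
    rounds = deadline_seconds // job_seconds
    plan, remaining, total_cost = {}, n_jobs, 0.0
    for res in sorted(resources, key=lambda r: r.price):
        if remaining == 0:
            break
        take = min(remaining, res.cpus * rounds)
        if take:
            plan[res.name] = take
            total_cost += take * job_seconds * res.price
            remaining -= take
    feasible = remaining == 0 and total_cost <= budget
    return plan, total_cost, feasible


# Hypothetical setup: 20 jobs of 300 CPU-seconds, 900 s deadline.
resources = [
    Resource("C", cpus=10, price=20.0),
    Resource("A", cpus=2, price=5.0),
    Resource("B", cpus=4, price=10.0),
]
plan, total_cost, feasible = cost_minimise(
    resources, n_jobs=20, job_seconds=300, deadline_seconds=900, budget=60000
)
print(plan, total_cost, feasible)
```

With these numbers the two cheaper resources are saturated first and only the overflow goes to the expensive one. An adaptive scheduler would repeat this evaluation periodically with measured job completion rates rather than assumed ones.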
Inter-Continental Grid
[Figure: world map of testbed sites connected over the Internet. Labels recovered from the figure:]
• Australia
  - Monash Uni.: Nimrod/G; Linux cluster; Solaris WS
• North America
  - ANL: SGI/Sun/SP2
  - USC-ISI: SGI
  - UVa: Linux cluster
• Middleware labels shown for Australia and North America: Globus+Legion+Condor/G; Globus/Legion; GRACE_TS
• Asia/Japan
  - Tokyo I-Tech.; ETL, Tsukuba: Linux cluster
  - Globus + GRACE_TS
• Europe
  - ZIB/FUB: T3E/Mosix
  - Cardiff: Sun E6500
  - Paderborn: HPCLine
  - Lecce: Compaq SC
  - CNR: Cluster
  - Calabria: Cluster
  - CERN: Cluster
  - Poznan: SGI/SP2
  - Globus + GRACE_TS

Experiment-1 Setup
• Workload: 165 jobs, each needing 5 minutes of CPU time
• Deadline: 1 hr; budget: 800,000 units
• Strategy: minimise cost and meet the deadline
• Execution cost with cost optimisation
  - AU peak time: 471205 (G$)
  - AU off-peak time: 427155 (G$)

Resources Selected & Price/CPU-sec.

Resource Type & Size       Owner and Location   Grid services     Peak-time cost (G$)   Off-peak cost (G$)
Linux cluster (60 nodes)   Monash, Australia    Globus/Condor     20                    5
IBM SP2 (80 nodes)         ANL, Chicago, US     Globus/LL         5                     10
Sun (8 nodes)              ANL, Chicago, US     Globus/Fork       5                     10
SGI (96 nodes)             ANL, Chicago, US     Globus/Condor-G   15                    15
SGI (10 nodes)             ISI, LA, US          Globus/Fork       10                    20

Execution @ AU Peak Time
[Figure: number of jobs (y-axis) against time in minutes (x-axis) for each resource. Series, with price in parentheses: Linux cluster - Monash (20); Sun - ANL (5); SP2 - ANL (5); SGI - ANL (15); SGI - ISI (10).]

Execution @ AU Offpeak Time
[Figure: number of jobs (y-axis) against time in minutes (x-axis) for each resource. Series, with price in parentheses: Linux cluster - Monash (5); Sun - ANL (10); SP2 - ANL (10); SGI - ANL (15); SGI - ISI (20).]

AU peak: Resources/Cost in Use
[Figure: two panels against time (in min.): Resources (No. of CPUs) in Use, and Cost of Resources in Use.]
After the calibration phase, note the difference in the pattern of the two graphs. This is when the scheduler stopped using expensive resources.

AU offpeak: Resources/Cost in Use
[Figure: two panels against time (in min.): Resources (No. of CPUs) in Use, and Cost of Resources in Use.]

DesignDrug@Home: Data Intensive Computing on Grid
• A Virtual Laboratory for "Molecular Modelling for Drug Design" on Peer-to-Peer Grid.
• It provides tools for examining millions of chemical compounds (molecules) in the Protein Data Bank (PDB) to identify those having potential use in drug design.
• In collaboration with: Kim Branson, Structural Biology, Walter and Eliza Hall Institute (WEHI)
• http://www.csse.monash.edu.au/~rajkumar/dd@home/

Active Sheet: Spreadsheet Processing on Grid
[Figure labels: Nimrod Proxy; Nimrod/G]

Related Works
• JaWS: a Java-based Web-computing system that offers market-oriented programming and computing mechanisms on the Web.
• Xenoservers: accounted execution of untrusted code
• D'Agents: agents and computational economy
• MOSIX: cost-based cluster load balancing
• A number of theoretical works on pricing.
• FIPA standard Agent Interaction Protocols (for trading): we plan to explore this!

Related Works (contd)
• Mariposa: distributed database system (UCB)
  - query with budget, creates sub-queries & divides the budget, trades with (remote) servers
• UCB Millennium clusters
  - rexec for clusters: a remote execution environment on clusters that supports computational economy
  - proportional resource sharing
• UNSW Mungi
  - Storage management: allocation of backing store and garbage collection of unwanted memory segments depending on available credit.
  - The amount of credit required to store data increases as the available storage space diminishes.

Can we Predict its Future?
"I think there is a world market for about five computers."
Thomas J. Watson Sr., IBM Founder, 1943

Conclusions
• HPC will be dominated by Peer-to-Peer Grids of clusters.
• Adaptive, scalable, and easy-to-use systems and end-user applications will be prominent.
• Access electricity, internet, entertainment (music, movies, …), etc. from the wall socket!
• Economics-based, service-oriented Grid computing is needed for the eventual success of Grids!
• The impact of the World-Wide Grid on the 21st-century economy will be the same as that of electricity on the 20th-century economy.

Thank You…
Any questions?