Introduction to Grid Computing
References: Grid Book, Chapters 1, 2, 22

1. What is Grid Computing?

A Computational Grid is a collection of distributed, possibly heterogeneous resources which can be used as an ensemble to execute large-scale applications. A Computational Grid is also called a "metacomputer".

The term "computational grid" comes from an analogy with the electric power grid:
o Electric power is ubiquitous
o You do not need to know the source of the power (transformer, generator) or the power company that serves it

There is an ever-present search for cycles in HPC, with two foci of research:
o "In the box" parallel computers, as evidenced by the PetaFLOPS initiative
o Increasing development of infrastructure and middleware to leverage the performance potential of distributed Computational Grids

Grid applications include:

Distributed Supercomputing
o Distributed supercomputing applications couple multiple computational resources: supercomputers and/or workstations
o Examples include SF Express (large-scale modeling of battle entities with complex interactive behavior for distributed interactive simulation) and climate modeling (modeling of climate behavior using complex models and long time scales)

High-Throughput Applications
o The Grid is used to schedule large numbers of independent or loosely coupled tasks, with the goal of putting unused cycles to work (a minimal sketch of this task-farming model follows this list)
o Examples include RSA key cracking and SETI@home (detection of extra-terrestrial communication)

Data-Intensive Applications
o The focus is on synthesizing new information from large amounts of physically distributed data
o Examples include NILE (a distributed system for high-energy physics experiments using data from CLEO), SAR/SRB applications, and digital library applications
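The task-farming idea behind high-throughput computing can be sketched in a few lines. The example below runs independent tasks on a local process pool as a stand-in for remote grid resources; the task function and the key range are made-up placeholders, and a real grid scheduler would dispatch work to idle remote machines rather than local processes.

    # Minimal sketch of the "bag of tasks" model behind high-throughput computing.
    # Each task is independent, so a scheduler can farm tasks out to whatever
    # resources happen to be idle. Here a local process pool stands in for
    # remote grid resources; task() and the key range are illustrative only.
    from concurrent.futures import ProcessPoolExecutor

    def task(candidate_key):
        # Placeholder for one independent unit of work, e.g. testing one
        # candidate key in a key-cracking search or analyzing one data chunk.
        return candidate_key, candidate_key % 7 == 0

    def main():
        bag_of_tasks = range(1000)           # independent, loosely coupled tasks
        with ProcessPoolExecutor() as pool:  # stand-in for idle grid resources
            for key, hit in pool.map(task, bag_of_tasks):
                if hit:
                    print("task", key, "reported a hit")

    if __name__ == "__main__":
        main()

Because the tasks never communicate with each other, throughput scales simply with the number of resources the scheduler can find, which is what makes this class of application such a natural fit for the Grid.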
2. Early Experiences with Grid Computing

Gigabit Testbeds Program
o In the late 80's and early 90's, the gigabit testbed program was developed as a joint NSF, DARPA, and CNRI (Corporation for National Research Initiatives, Bob Kahn) initiative
o The idea was to investigate potential architectures for a gigabit/sec network testbed and to explore their usefulness for end-users
o Six testbeds were formed: CASA (southwest), MAGIC and BLANCA (Midwest), AURORA and NECTAR (northeast), and VISTANET (southeast); each had a unique blend of research in applications and in networking and computer science:

Testbed: CASA
  Applications: Distributed supercomputing
  Network: HIPPI switches connected by HIPPI-over-SONET at OC-12

Testbed: BLANCA
  Applications: Virtual environments; remote visualization and steering; multimedia digital libraries
  Network: Experimental ATM switches running over experimental 622 Mb/s and 45 Mb/s circuits developed by AT&T and universities

Testbed: VISTANET
  Applications: Radiation treatment planning involving a supercomputer, a remote instrument (radiation beam), and visualization
  Network: ATM network at OC-12 (622 Mb/s) interconnecting HIPPI local area networks

Testbed: NECTAR
  Applications: Coupled supercomputers running chemical reaction dynamics; CS research
  Network: OC-48 (2.4 Gb/s) links between the PSC supercomputer facility and CMU (metropolitan area testbed)

Testbed: AURORA
  Applications: Telerobotics; distributed virtual memory and operating system research
  Network: OC-12 network interconnecting 4 research sites and supporting the development of ATM host interfaces, ATM switches, and network protocols

Testbed: MAGIC
  Applications: Remote vehicle control and high-speed access to databases for terrain visualization and battle simulation
  Network: OC-12 network interconnecting ATM-attached hosts

I-WAY
o First large-scale Grid experiment, put together for SC'95
o The I-WAY consisted of a Grid of 17 sites connected by the vBNS
o Over 60 applications ran on the I-WAY during SC'95
o Each I-WAY site was served by an I-POP (I-WAY Point of Presence), used for authentication of distributed applications, distribution of associated libraries and other software, and monitoring the connectivity of the I-WAY virtual network
o Users could rely on a single authentication and job submission environment across multiple sites, or they could work with individual sites directly
o Scheduling was done with a "human in the loop"

PACIs
o The two NSF supercomputer center partnerships (PACIs), SDSC/NPACI and NCSA/Alliance, are both committed to Grid computing, although the effort has been stronger at NCSA
o A vBNS backbone between NCSA and SDSC runs at OC-12, with connectivity to over 100 locations at speeds ranging from 45 Mb/s to 155 Mb/s or more
o Applications include data-intensive computing (NPACI) and visual supercomputing and tele-immersion (Alliance)
o NCSA's "Access Grid" connects sites for collaborative work in distributed environments and for group interactions

Other Efforts
o GUSTO is the Globus testbed; it supports Globus infrastructure and application development
o The Centurion cluster at UVA is the Legion testbed
o The IPG (Information Power Grid) is NASA's grid computing testbed; Globus is supported as its infrastructure, and application and middleware development efforts are underway

3. What is the difference between Grid Computing, Cluster Computing, and the Web?

Cluster computing focuses on platforms consisting of often homogeneous interconnected nodes in a single administrative domain
o Clusters often consist of PCs or workstations and relatively fast networks
o Cluster components can be shared or dedicated
o Application focus is on cycle-stealing computations, high-throughput computations, and distributed computations

The Web focuses on platforms consisting of any combination of resources and networks which support naming services, protocols, search engines, etc.
o The Web consists of a very diverse set of computational, storage, communication, and other resources shared by an immense number of users
o Application focus is on access to information, electronic commerce, etc.

The Grid focuses on ensembles of distributed, heterogeneous resources used as a platform for high-performance computing
o Some Grid resources may be shared; others may be dedicated or reserved
o Application focus is on high-performance, resource-intensive applications

4. State-of-the-art Grid Infrastructure: Globus and Legion

Legion and Globus are the two best-known infrastructure efforts.

Globus: an integrated toolkit of Grid services
Developed by Ian Foster (ANL/UC) and Carl Kesselman (USC/ISI)
"Bag of services" model: applications can use Grid services without having to adopt a particular programming model (a hedged usage sketch follows this list)
Globus services include:
o Resource allocation and process management (GRAM)
o Communication services (Nexus)
o Distributed access to structure and state information (MDS)
o Authentication and security services (GSI)
o System monitoring (HBM)
o Remote data access (GASS)
o Construction, caching, and location of executables (GEM)
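To make the "bag of services" model concrete, the sketch below shows how an application might describe a job and hand it to the resource-management service without committing to any particular programming model. This is a hypothetical illustration, not the actual Globus client API: the submit_to_gram helper, the gatekeeper contact string, and the exact RSL attribute names are all assumptions made for the example.

    # Hedged sketch of the "bag of services" model: the application picks only
    # the services it needs (here, resource management) without adopting a
    # particular programming model. submit_to_gram() and the contact string
    # are hypothetical placeholders, not the real Globus client API.

    # Job description in the attribute/value style of GRAM's RSL; the exact
    # attribute names shown here are an assumption for illustration.
    JOB_DESCRIPTION = """\
    & (executable = /home/user/simulate)
      (count = 16)
      (maxWallTime = 120)
    """

    def submit_to_gram(contact, description):
        # Placeholder: a real client would authenticate via GSI, contact the
        # GRAM gatekeeper at 'contact', and return a handle for the running job.
        print("submitting to", contact)
        print(description)
        return "job-handle-0"  # hypothetical job handle

    if __name__ == "__main__":
        handle = submit_to_gram("gatekeeper.example.edu:2119", JOB_DESCRIPTION)
        print("submitted:", handle)

The point of the model is visible even in this toy: the application only names the service it needs and describes its requirements; it does not have to be rewritten around a particular parallel programming framework.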
Legion
Developed by Andrew Grimshaw (UVA)
Provides a single, coherent virtual machine model that addresses Grid issues within a reflective, object-based metasystem
"Everything is an object" in the Legion model: hardware resources, software resources, etc.
Every Legion object is defined and managed by its class object; class objects act as managers and make policy, as well as define instances
Legion defines the interface and basic functionality of a set of core object types which support basic services
Users may also define and build their own class objects (a toy illustration of this object model follows)
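To make the class-object idea concrete, here is a toy sketch (not the Legion API) in which a class object both makes policy and creates its instances. The names ClassObject, HostClass, and MySimulationClass, and the placement policies, are invented for illustration only.

    # Toy illustration of Legion's reflective object model (not the Legion API):
    # every object is managed by a class object, and class objects both make
    # policy (e.g. where instances are placed) and create instances.

    class ClassObject:
        """Stands in for a Legion class object: manager and factory for its instances."""

        def __init__(self, name, placement_policy):
            self.name = name
            self.placement_policy = placement_policy  # policy lives in the class object
            self.instances = []

        def create_instance(self, *state):
            host = self.placement_policy()            # class object decides placement
            instance = {"class": self.name, "host": host, "state": state}
            self.instances.append(instance)
            return instance

    # Core class objects would cover hosts, files, and other basic services;
    # users may also define their own class objects with their own policies.
    HostClass = ClassObject("Host", placement_policy=lambda: "localhost")
    MySimulationClass = ClassObject("MySimulation", placement_policy=lambda: "fastest-idle-host")

    print(HostClass.create_instance())
    print(MySimulationClass.create_instance("run42.dat"))

Keeping management and policy in the class object is what makes the model reflective: the system's own behavior (placement, creation, instance bookkeeping) is itself expressed through objects that users can extend.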