The Globus Grid Programming Toolkit: A User-level Tutorial

The Globus Project Team
ANL and USC/ISI
http://www.globus.org

Abstract

This tutorial is a practical introduction to programming for high-performance distributed computing systems, or "computational grids," and to the capabilities of the Globus grid toolkit. Emerging high-performance networks promise to enable a wide range of new application concepts such as remote computing, distributed supercomputing, teleimmersion, smart instruments, and data mining. However, developing and using such applications is in practice very difficult and time consuming, because of the need to deal with complex and highly heterogeneous systems. The Globus grid programming toolkit is designed to help application developers and tool builders overcome these obstacles to the construction of "grid-enabled" scientific and engineering applications. It does this by providing a set of standard services for authentication, resource location, resource allocation, configuration, communication, file access, fault detection, and executable management. These services can be incorporated into applications and/or programming tools in a "mix-and-match" fashion to provide access to needed capabilities. Our goal in this tutorial is both to introduce the capabilities of the Globus toolkit and to show attendees how Globus services can be applied in specific applications. Hence, the tutorial covers a mixture of grid programming principles and detailed case studies of real applications.
Introduction 2

Tutorial Goals
• Provide an introduction
    - To the structure of the Globus computational grid
    - To the capabilities of the Globus toolkit
    - To pragmatic issues associated with using the toolkit
• Enable attendees
    - To start building & using Globus applications
    - To utilize Globus services

Overview
• Introduction to computational grids
• High-level overview of the Globus toolkit
• Four components:
    - Security and remote process creation
    - Running programs across multiple resources
    - Information services
    - Dynamic configuration and resource management
• Case studies
• Other Globus services, and future directions
• Globus installation & administration

Why “The Grid”?
• New applications based on high-speed coupling of people, computers, databases, instruments, etc.
    - Computer-enhanced instruments
    - Collaborative engineering
    - Browsing of remote datasets
    - Use of remote software
    - Data-intensive computing
    - Very large-scale simulation
    - Large-scale parameter studies

E.g.: Computer-Enhanced Instruments for Microtomography
• Coupling with supercomputers
• Interactive use of beamline
• Collaboration on results
• Parameter studies for experiment planning
• Coupling with mass store systems
[Figure labels: APS beamline @ Argonne; Los Angeles; Chicago; link rates 50 Mb/s -> 5 Gb/s -> 100 Gb/s and 5 Mb/s -> 1 Gb/s -> 10 Gb/s; “100 Gflop/sec, 50 Mb/sec, 5 minutes; rendering, 10 GB storage”]

E.g.: Tele-immersion
• “5 Gflop/sec, flowspecs, design db”
• Multiple access modalities
• Multiple flows: Control, Simulation, Text, Tracking, Video, Haptics, Audio, Rendering, Database
Leigh et al., UofI, Electronic Visualization Lab.

SF-Express: Distributed Interactive Simulation
• Issues:
    - Resource discovery, scheduling
    - Configuration
    - Multiple comm methods
    - Message passing (MPI)
    - Scalability
    - Fault tolerance
[Figure labels: NCSA Origin; Argonne SP; Caltech Exemplar; Maui SP; “200 GB memory, 100 BIPs”]
P. Messina et al., Caltech

The Grid
“Dependable, consistent, pervasive access to [high-end] resources”
• Dependable: Can provide performance and functionality guarantees
• Consistent: Uniform interfaces to a wide variety of resources
• Pervasive: Ability to “plug in” from anywhere

Evolution of a Concept
• Metacomputing: late 80s
    - Focus on distributed computation
• Gigabit testbeds: early 90s
    - Research, primarily on networking
• I-WAY: 1995
    - Demonstration of application feasibility
• PACIs (National Technology Grid): 1998
• NASA Information Power Grid: 1999
• ASCI DISCOM: 1999; SSI: 2000?

National and International Grid Testbeds
• I-WAY
• The Alliance National Technology Grid
• NASA’s Information Power Grid

Technical Challenges
• Complex application structures, combining aspects of parallel, multimedia, distributed, collaborative computing
• Dynamically varying resource characteristics, in time and space
• Need for high & guaranteed “end-to-end” performance, despite heterogeneity and lack of global control
• Interdomain issues of security, policy, payment

Issues
• Authenticate once
• Specify simulation (code, resources, etc.)
• Locate resources
• Negotiate authorization, acceptable use, etc.
• Acquire resources
• Initiate computation
• Steer computation
• Access remote datasets
• Collaborate on results
• Account for usage
[Figure labels: Domain 1; Domain 2]

Architectural Approaches
• Distributed systems: DCE, CORBA, Jini, etc.
    - Rich functionality eases app. development
    - Complexity hinders deployment, especially in absence of global control
    - Performance difficulties
• Internet Protocol, Web tools
    - Simple protocols facilitate deployment
    - Missing functionality hinders app.
      development
    - Performance difficulties

Standards & Commodity Tech
• Where appropriate, exploit standards and commodity technology in core infrastructure
    - LDAP, SSL, X.509, GSS-API, GAA-API, http, ftp, XML, etc.
    - Provides leverage
• Interface with other common standards
    - CORBA, Java/Jini, DCOM, Web, etc.
    - While our core infrastructure may not be built on one of these distributed architectures, we must cleanly interface with them

The Globus Project
• Basic research in grid-related technologies
    - Resource management, QoS, networking, storage, security, adaptation, policy, etc.
• Development of Globus toolkit
    - Core services for grid-enabled tools & applns
• Construction of large grid testbed: GUSTO
    - Largest grid testbed in terms of sites & apps
• Application experiments
    - Tele-immersion, distributed computing, etc.

Globus Approach
• A toolkit and collection of services addressing key technical problems
    - Bag of services model
    - Not a vertically integrated solution
• Inter-domain issues, rather than clustering
    - Integration of intra-domain solutions
• Distinguish between local and global services
    - “IP hourglass” model

Technical Focus & Approach
• Information-rich environment
• Enable incremental development of grid-enabled tools and applications
• Basis for configuration and adaptation
• Support many programming models, tools, applications
• Deploy toolkit on national-scale testbed to allow large-scale applications
• Evolve in response to user requirements

Globus Approach
• Focus on architecture issues
    - Propose set of core services as basic infrastructure
    - Use to construct high-level, domain-specific solutions
• Design principles
    - Keep participation cost low
    - Enable local control
    - Support for adaptation
[Figure labels, top to bottom: Applications; Diverse global services; Core Globus services; Local OS]

Layered Architecture
[Figure: layered architecture diagram; labels grouped by layer]
• Applications
• High-level Services and Tools: GlobusView, DUROC, MPI, MPI-IO, CC++, Testbed Status, Nimrod/G, globusrun
• Core Services: Nexus, Gloperf,
  Metacomputing Directory Service, Globus Security Interface, GRAM, Heartbeat Monitor, GASS
• Local Services: Condor, MPI, LSF, Easy, NQE, AIX, Irix, Solaris, TCP, UDP

Core Globus Services
• Communication infrastructure (Nexus, IO)
• Information services (MDS)
• Network performance monitoring (Gloperf)
• Process monitoring (HBM)
• Remote file and executable management (GASS and GEM)
• Resource management (GRAM)
• Security (GSI)

Sample of High-Level Services I
• Communication & I/O libraries
    - MPICH, PAWS, RIO (MPI-IO), PPFS, MOL
• Parallel languages
    - CC++, HPC++
• Collaborative environments
    - CAVERNsoft, ManyWorlds
• Others
    - MetaNEOS, NetSolve, LSA, AutoPilot, WebFlow

Sample High-Level Services II
• Resource brokers and co-allocators
    - DUROC: co-allocation of multiple systems
    - Nimrod: high-throughput computing
• Graphical system status display elements
    - GlobusView
    - MDS Browsers
    - Health & Status Monitors (HBM)
    - Network Monitors (Gloperf)

“GUSTO”: Globus Ubiquitous Supercomputing Testbed Organization
• A collection of organizations committed to creating a persistent computational grid infrastructure
• As of November 1998, 70 organizations in 3 continents and 8 countries

[Figure caption: 16 sites, 330 computers, 3600 nodes, 2 Teraflop/s, 10 application partners]

[Figure: GUSTO Testbed During SC’97]

[Figure: GUSTO Computational Grid Testbed: November 1998]

Where We Are (November 1998)
• New results in security, resource management, tools, fault detection, etc.
• Globus v1.0 completed
    - All core services complete, relatively robust, and documented
    - Available on most Unix platforms
• Many tool projects are leveraging this considerable investment in infrastructure
• Interesting applications are emerging, although mostly still in “demo” mode

Where We Are (June 1999)
• New results in QoS, security, resource management, data management, tools, etc.
• Globus v1.1 nearing completion
    - Available on most Unix platforms and Win32
• Many tool projects are leveraging this considerable investment in infrastructure
• Documentation and deployment underway at NCSA and NASA IPG
• Always looking for interesting applications

Changes from 1.0 to 1.1
• Tutorial changes for 1.1 are denoted by
• Name changes from Globus to Grid
    - Security and Information Service adopted as core Grid infrastructure by several organizations
    - Globus Security Infrastructure -> Grid Security Infrastructure
    - Metacomputing Directory Service -> Grid Information Service
    - Affects naming of APIs and tools
• Numerous small API fixes, additions, changes
• Cleanup of programs/tools
• A few new modules (I/O, error objects)

Example Application Projects
• Computed microtomography (ANL, ISI)
    - Real-time, collaborative analysis of data from X-Ray source (and electron microscope)
• Hydrology (ISI, UMD, UT; also NCSA, Wisc.)
    - Interactive modeling and data analysis
• Collaborative engineering (“tele-immersion”)
    - CAVERNsoft @ EVL, Metro @ ANL
• X-Ray crystallography (ANL, SUNY)
    - High-throughput computing for Shake ‘n Bake

Example Application Expts (contd)
• Distributed interactive simulation (CIT, ISI)
    - Record-setting SF-Express simulation
• Remote visualization and steering for astrophysics
    - Including trans-Atlantic experiments
• Data-intensive computing experiments (with LBNL and SLAC: “Clipper” project)

For More Information on Globus
• http://www.globus.org
    - Papers on all components
    - Tutorial and documents
    - Software
    - Application descriptions

The Grid: Blueprint for a New Computing Infrastructure
• I. Foster, C. Kesselman (Eds), Morgan Kaufmann, 1999
• Available July 1998; ISBN 1-55860-475-8
• 22 chapters by expert authors including Andrew Chien, Jack Dongarra, Tom DeFanti, Andrew Grimshaw, Roch Guerin, Ken Kennedy, Paul Messina, Cliff Neuman, Jon Postel, Larry Smarr, Rick Stevens, and many others
• “A source book for the history of the future” -- Vint Cerf
• http://www.mkp.com/grids

Tutorial Approach
• Four sections, each illustrating a basic Globus technique
• Laboratory material is available to allow practice with the use of each technique
• See http://www.globus.org/tutorial