Introduction to Grid Computing
References: Grid Book, Chapters 1, 2, 22
1. What is Grid Computing?
A Computational Grid is a collection of distributed, possibly heterogeneous
resources which can be used as an ensemble to execute large-scale applications
 A Computational Grid is also called a “metacomputer”
 Term “computational grid” comes from an analogy with the electric
power grid:
o Electric power is ubiquitous
o Don’t need to know the source (transformer, generator) of the
power or the power company that serves it
Ever-present search for cycles in HPC. Two foci of research:
 “In the box” parallel computers, as evidenced by the PetaFLOPS
 Increasing development of infrastructure and middleware to leverage
the performance potential of distributed Computational Grids
Grid applications include
 Distributed Supercomputing
o Distributed Supercomputing applications couple multiple
computational resources – supercomputers and/or workstations
o Distributed supercomputing applications include SFExpress
(large-scale modeling of battle entities with complex interactive
behavior for distributed interactive simulation) and Climate
Modeling (modeling of climate behavior using complex models
and long time-scales)
 High-Throughput Applications
o Grid used to schedule large numbers of independent or loosely
coupled tasks with the goal of putting unused cycles to work
o High-throughput applications include RSA key cracking and
SETI@home (detection of extra-terrestrial communication)
 Data-Intensive Applications
o Focus is on synthesizing new information from large amounts
of physically distributed data
o Examples include NILE (distributed system for high energy
physics experiments using data from CLEO), SAR/SRB
applications, digital library applications
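
The high-throughput pattern described above (many independent tasks handed to whichever resource is free) can be sketched as a simple task farm. This is a minimal single-machine sketch in Python, assuming a thread pool as a stand-in for idle Grid nodes. The toy key search, the names `search_chunk` and `run_task_farm`, and the values `SECRET_KEY` and `KEY_SPACE` are hypothetical illustrations, not code from any of the projects named.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

SECRET_KEY = 7919    # toy value standing in for the key being searched for (assumption)
KEY_SPACE = 10_000   # toy key space, small enough to search instantly


def search_chunk(start, stop):
    """Independent task: test every candidate key in [start, stop)."""
    for candidate in range(start, stop):
        if candidate == SECRET_KEY:
            return candidate
    return None


def run_task_farm(num_workers=4, chunk_size=1_000):
    """Split the key space into independent chunks and farm them out to workers."""
    chunks = [(lo, min(lo + chunk_size, KEY_SPACE))
              for lo in range(0, KEY_SPACE, chunk_size)]
    found = None
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = [pool.submit(search_chunk, lo, hi) for lo, hi in chunks]
        # Tasks never talk to each other, so results can be taken in any order.
        for future in as_completed(futures):
            result = future.result()
            if result is not None:
                found = result
    return found, len(chunks)


if __name__ == "__main__":
    key, num_tasks = run_task_farm()
    print(f"ran {num_tasks} independent tasks, found key {key}")
```

Because the chunks share no state, adding workers only changes how quickly the queue drains. This is the property that lets such applications soak up otherwise unused cycles.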
2. Early Experiences with Grid Computing
 Gigabit Testbeds Program
o In the late 80’s and early 90’s, the gigabit testbed program was
developed as a joint NSF, DARPA, and CNRI (Corporation for National
Research Initiatives, Bob Kahn) initiative
o Idea was to investigate potential architectures for a gigabit/sec
network testbed and to explore its usefulness for end-users
o 6 testbeds formed: CASA (southwest), MAGIC and BLANCA
(Midwest), AURORA and NECTAR (northeast), VISTANET
(southeast). Each had a unique blend of applications research
and networking and computer science research:
CASA
Applications: Distributed Supercomputing
Network: HIPPI switches connected at OC-12

BLANCA
Applications: Virtual Environments, Remote visualization and steering, multimedia digital
Network: Experimental ATM switches running over experimental 622 Mb/s and 45 Mb/s circuits developed by AT&T and

VISTANET
Applications: Radiation treatment planning applications involving supercomputer, remote instrument (radiation beam) and visualization
Network: ATM network at OC-12 (622 Mb/s) interconnecting HIPPI local area networks

NECTAR
Applications: Coupled supercomputers running chemical reaction dynamics and CS research
Network: OC-48 (2.4 Gb/s) links between PSC supercomputer facility and CMU (metropolitan area testbed)

AURORA
Applications: Telerobotics, distributed virtual memory and operating system research
Network: OC-12 network interconnecting 4 research sites and supporting the development of ATM host interfaces, ATM switches and network protocols

MAGIC
Applications: Remote vehicle control applications and high-speed access to databases for terrain visualization and battle
Network: OC-12 network to interconnect ATM-attached hosts
 I-WAY
o First large-scale Grid experiment
o Put together for SC’95
o I-WAY consisted of a Grid of 17 sites connected by vBNS
o Over 60 applications ran on the I-WAY during SC’95
o Each I-WAY site served by an I-POP (I-WAY Point of
Presence) used for authentication of distributed applications,
distribution of associated libraries and other software, and
monitoring the connectivity of the I-WAY virtual network
o Users could use single authentication and job submission across
multiple sites or they could work directly with end-users
o Scheduling done with a “human-in-the-loop”
 PACIs
o 2 NSF Supercomputer Centers (PACIs), SDSC/NPACI and
NCSA/Alliance, are both committed to Grid computing, although
the effort has been stronger at NCSA
o vBNS backbone between NCSA and SDSC running at OC-12
with connectivity to over 100 locations at speeds ranging from
45 Mb/s to 155 Mb/s or more
o Applications include data-intensive computing (NPACI), visual
supercomputing and teleimmersion (Alliance).
o “Access Grid” by NCSA serves to connect sites for
collaboration work in distributed environments and group
 Other Efforts
o Globus testbed = GUSTO which supports Globus infrastructure
and application development
o Centurion Cluster at UVA = Legion testbed
o IPG (Information Power Grid) = supported by NASA as a grid
computing testbed; Globus is supported as infrastructure, and
application and middleware development efforts are underway
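
The link speeds quoted in this section follow the SONET optical carrier hierarchy, in which an OC-n link runs at n times the 51.84 Mb/s OC-1 base rate. The "45 Mb/s" figure corresponds to a DS-3 circuit (44.736 Mb/s). The short Python sketch below reproduces those numbers and estimates an ideal transfer time for a dataset. The 1 GB dataset size is an assumption chosen for illustration, and the estimate ignores all protocol overhead, so real transfers would be slower.

```python
OC1_MBPS = 51.84    # SONET OC-1 base line rate, in Mb/s
DS3_MBPS = 44.736   # DS-3 line rate, the "45 Mb/s" circuits


def oc_rate_mbps(n):
    """Line rate of an OC-n link in Mb/s (n times the OC-1 rate)."""
    return n * OC1_MBPS


def transfer_seconds(num_bytes, rate_mbps):
    """Ideal time to move num_bytes over a link, ignoring protocol overhead."""
    return (num_bytes * 8) / (rate_mbps * 1_000_000)


if __name__ == "__main__":
    dataset_bytes = 1_000_000_000  # hypothetical 1 GB dataset (assumption)
    links = {
        "DS-3": DS3_MBPS,
        "OC-3": oc_rate_mbps(3),
        "OC-12": oc_rate_mbps(12),
        "OC-48": oc_rate_mbps(48),
    }
    for name, rate in links.items():
        seconds = transfer_seconds(dataset_bytes, rate)
        print(f"{name:6s} {rate:9.2f} Mb/s  {seconds:7.1f} s per GB")
```

The roughly fifty-fold gap between a DS-3 circuit and an OC-48 link shows why the testbeds treated the network as a first-class research topic for coupling remote supercomputers.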
3. What is the difference between Grid Computing, Cluster Computing
and the Web?
Cluster computing focuses on platforms consisting of often homogeneous
interconnected nodes in a single administrative domain.
 Clusters often consist of PCs or workstations and relatively fast networks
 Cluster components can be shared or dedicated
 Application focus is on cycle-stealing computations, high-throughput
computations, distributed computations
Web focuses on platforms consisting of any combination of resources and
networks which support naming services, protocols, search engines, etc.
 Web consists of a very diverse set of computational, storage,
communication, and other resources shared by an immense number of users
 Application focus is on access to information, electronic commerce, etc.
Grid computing focuses on ensembles of distributed heterogeneous resources
used as a platform for high performance computing
 Some grid resources may be shared, others may be dedicated or
 Application focus is on high-performance, resource-intensive applications
4. State-of-the-art Grid Infrastructure: Globus and Legion
Legion and Globus are the two best-known infrastructure efforts
Globus -- integrated toolkit of Grid services
 Developed by Ian Foster (ANL/UC) and Carl Kesselman
 “Bag of services” model – applications can use Grid services
without having to adopt a particular programming model
 Globus services include
o Resource allocation and process management (GRAM)
o Communication services (Nexus)
o Distributed access to structure and state information (MDS)
o Authentication and security services (GSI)
o System monitoring (HBM)
o Remote data access (GASS)
o Construction, caching and location of executables (GEM)
Legion -- reflective, object-based metasystem
 Developed by Andrew Grimshaw (UVA)
 Provides single, coherent virtual machine model that addresses grid
issues within a reflective, object-based metasystem
 “Everything is an object” in Legion model – HW resources, SW
resources, etc.
 Every Legion object is defined and managed by its class object; class
objects act as managers and make policy, as well as define instances
 Legion defines the interface and basic functionality of a set of core
object types which support basic services
 Users may also define and build their own class objects
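
The class-object idea above can be illustrated with a small sketch. The Python below is not Legion's actual API. It is a minimal model, under the assumption that a class object's job is to create its instances, keep track of them, and enforce a policy on them. All names (`LegionObject`, `ClassObject`, `HostObject`, `create_instance`, `max_instances`) are hypothetical.

```python
class LegionObject:
    """Base type in this sketch: hardware and software resources alike are objects."""

    def __init__(self, name, class_object):
        self.name = name
        self.class_object = class_object  # the manager that created this instance


class ClassObject:
    """Hypothetical class object: defines, creates and manages its instances."""

    def __init__(self, name, instance_type, max_instances):
        self.name = name
        self.instance_type = instance_type
        self.max_instances = max_instances  # simple stand-in for "policy"
        self.instances = {}

    def create_instance(self, name, **attrs):
        # The class object, not the caller, decides whether creation is allowed.
        if len(self.instances) >= self.max_instances:
            raise RuntimeError(
                f"{self.name}: policy allows at most {self.max_instances} instances")
        obj = self.instance_type(name, self, **attrs)
        self.instances[name] = obj
        return obj

    def destroy_instance(self, name):
        del self.instances[name]


class HostObject(LegionObject):
    """A hardware resource modeled as an object."""

    def __init__(self, name, class_object, cpus):
        super().__init__(name, class_object)
        self.cpus = cpus


if __name__ == "__main__":
    host_class = ClassObject("HostClass", HostObject, max_instances=2)
    node = host_class.create_instance("node1", cpus=4)
    print(node.name, "is managed by", node.class_object.name)
```

A user-defined class object would follow the same shape: supply a new instance type and a policy, and the rest of the system interacts with the instances only through their manager.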