
1.1 Grid Computing
There is general agreement among businesses that today's information infrastructure
costs too much and does not meet the need for flexibility and reliability that they demand
[4]. Companies face the paradox of increasing demand for compute cycles while
simultaneously possessing unused compute cycles on the servers and desktops across the
enterprise. This disconnect exists in almost all companies and the underutilization rate
often increases dramatically with the number of PCs owned by the business. Therefore,
there is tremendous opportunity to capture the latent value of Information Technology
(IT) investments just by the efficient utilization of deployed assets.
Grid Computing extends that concept of cluster computing to a scale that crosses
organizational boundaries. The Grid refers to an infrastructure that enables the integrated,
collaborative use of high-end computers, networks, databases, and other resources, such
as scientific instruments, that may be owned and managed by multiple organizations.
Grid applications often involve large volume data (> Terabyte) and often require secure
resource sharing across organizational boundaries that are not easily handled by today's
Internet and Web infrastructures [1]. This approach to computing can be
thought of as a massive "utility" grid, similar to the electric grid that provides power to
homes and businesses every day. In the same utility fashion, Grid Computing
aims to allow an unbounded number of computing devices to be added to a grid
environment, each adding to the computing capability and problem resolution capacity of the
operational grid environment [2]. A grid can also be described as a type of parallel and distributed
system that enables the sharing, selection, and aggregation of geographically distributed
resources dynamically at runtime depending on their availability,
capability, performance, cost, and users' quality-of-service requirements [3].
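The selection and aggregation step in this definition can be illustrated with a minimal sketch. The Resource fields, the select_resources function and the cost ceiling used as a stand-in for a quality-of-service requirement are all assumptions made for illustration; they are not taken from [3] or from any grid toolkit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    name: str                  # hypothetical resource identifier
    site: str                  # owning organization or location
    available: bool            # whether the owner currently offers it
    cpus: int                  # capability, reduced here to a CPU count
    cost_per_cpu_hour: float   # price charged by the owner


def select_resources(resources, cpus_needed, max_cost_per_cpu_hour):
    """Aggregate the cheapest available resources until the demand is met."""
    # Selection: keep only resources that are available and within the cost limit.
    candidates = sorted(
        (r for r in resources
         if r.available and r.cost_per_cpu_hour <= max_cost_per_cpu_hour),
        key=lambda r: r.cost_per_cpu_hour,
    )
    # Aggregation: add resources, cheapest first, until enough CPUs are collected.
    chosen, total = [], 0
    for resource in candidates:
        if total >= cpus_needed:
            break
        chosen.append(resource)
        total += resource.cpus
    if total < cpus_needed:
        raise ValueError("not enough matching resources are available")
    return chosen
```

Because the function is evaluated at request time against the current list, a resource that becomes unavailable simply drops out of the next selection, which is the runtime behavior the definition emphasizes.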
The evolution from a centralized architecture to a
distributed grid is represented in Figure 1.1 [4]. In this representation, the centralized
model leverages a relatively homogeneous environment and focused support specialists. In
contrast, the distributed architecture provides businesses with more flexibility and more
control over their IT environment. But it also leads to additional complexity, lower
asset utilization, a distributed support staff and additional facilities costs, which together virtually
eliminate the potential cost benefit of using mini and micro computers. The next jump on
this evolutionary trail is the virtualization of resources such as the dynamic, efficient
pooling of mixed compute, storage and network resources which is enabled by the rapidly
declining cost/performance ratio of network bandwidth and systems management
Figure 1.1: Architecture Evolution
The terms Utility Computing and Grid Computing are used interchangeably, largely due to
the marketing efforts of various companies, although the principles of the two
approaches differ [25]. Utility Computing is the ability to intelligently match IT resources to
business demand on a pay-for-use basis, which is an on-demand variation of the Grid
Computing concept. The benefits of Utility Computing include reduced cost, faster time to value
and alignment of IT spending with business activity.
The evolution of the grid is examined further in Chapter 2, along with the
current architectures available and the different vendor-based initiatives towards grid computing.
1.2 Origins of Grid Computing
To understand the current trends in grid computing, it helps to put the
concept in the perspective of its origins. The grid computing concept is not entirely new. It is
a variation on the notion of distributed computing, which takes an application off
one server and runs it more effectively by spreading it across several servers on a
network. In the early 1970s, when computers were first linked by networks, the idea of
harnessing unused CPU cycles was born; it leveraged the Internet's predecessor, the
ARPAnet, in the form of distributed computing. Distributed computing scaled to a global
level with the maturation of the Internet in the 1990s.
The Internet has had the powerful effect of connecting computers, but more importantly
people. However, only email and the World Wide Web systems have been widely used.
The problem with the Internet is that it connects dispersed computing resources but
doesn't provide a way to coordinate them to work on a common problem. New protocols
such as resource discovery, negotiation or coordination are necessary for integrating and
trading resources on the Internet. As more and more resources are converted from atoms
to bits and are connected to the Internet, a standard framework is required to approach
these resources on the net, no matter what protocols they use and where they are located,
and coordinate them together in order to help people collaborate and work on a common
and complicated problem [2].
The most successful and popular distributed computing project in history is the
SETI@home (Search for Extraterrestrial Intelligence) project. The project's focus is to
search for radio signal fluctuations that may indicate a sign of intelligent life from space.
Over two million people, the largest number of volunteers for any Internet distributed
computing project to date, have installed the SETI@home software agent since the
project started in May 1999 to search through signals collected by the Arecibo Radio
Telescope in Puerto Rico. The project originally received far more terabytes of data every
day than its assigned computers could process. This project conclusively proved that
distributed computing could accelerate computing project results while managing project
costs. The project invited volunteers to download the SETI@home software to donate the
idle processing time on their computers to the project. The project originated at the
University of California at Berkeley.
Grid computing is evolving into an important discipline within the computer industry by
extending distributed computing through an increased focus on resource sharing,
coordination, manageability, and high performance. Current distributed computing
technologies, such as Internet technologies, address communication and information
exchange among computers but do not provide an integrated approach to the
coordinated use of resources at multiple sites for computation. Even Business-to-Business
exchanges focus on information sharing and not necessarily computing resource
The Grid has many of the same characteristics as the electric grid, such as [5]:
• Complexity characterized by complicated connections across multi administrative
• Standards enabled through research, business drivers, and Government or association
  standard enforcement
• Distribution characterized by several distribution levels with different complexity and
  different capacity.
Some experts postulate that Grid technology will arrive in three waves: first in
academic research communities, then in corporations, which is beginning to happen
now. The ultimate goal, however, is the third wave, which will see the technology coalesce
to create a processing network analogous to the Web, called simply "the Grid" [15].
1.3 Market for Grid Computing
Grid Computing is slowly gaining momentum in the business world and its application is
being tested in various industrial segments around the world. Marketing efforts are
underway by the big technology vendors such as IBM, Sun, HP and Oracle, to name a
few, touting Grid-enabled solutions that promise better use of computing cycles and enormous
productivity gains.
Having the backing from the major industrial players has opened up the 'Great Grid
Rush' of this century. The rush is justified by the market opportunity that grid computing
promises. According to an IDC survey, the market opportunity for grid computing is
projected to be $12 billion by 2007, as shown in Figure 1.2 [6].
Some of the key findings in the IDC report include:
• Primary customer motivations, adoption scenarios, targeted workloads, and business
  needs suggest that the grid market is beginning to split into three distinct segments:
  compute, data, and optimization
• Early adoption has largely been in the high-performance computing market for large
  batch-oriented grids
• Emerging opportunity is focused primarily on the pooling and allocation of resources
  across a variety of business services
Figure 1.2: Revenue Forecast by Grid Category, 2002-2007 (categories: Compute, Data, Optimization)
IDC attributes the projected revenue increase to a number of factors, including the
maturation and standardization of Grid software, the drive for efficient use of IT
infrastructure by end users, the expanded awareness of Grids, and the expansion of the
market beyond traditional HPC applications and users.
Another recent IDC study [7] shows that grid computing adoption in Western Europe,
although still nascent, is expected to reach beyond the high performance computing
(HPC) space and will become more pervasive in commercial data centers over the next
few years. IDC's European Enterprise Server Group forecasts that grid computing will
approach $1.8 billion in server revenue by 2008 across HPC technical markets and
commercial applications. The study builds on IDC's server research group investigations
into emerging grid technology in an attempt to define and size the present and future grid
computing space in Western Europe, in particular the incremental server revenue attached
to grid computing.
Note: The grid categories shown in Figure 1.2 are explained in Chapter 2.
The increasing corporate demand for flexibility, reliability and resiliency at reduced
technology cost has increased the interest of corporations in studying the benefits
provided by grid computing. Grid computing can provide many benefits not available
with traditional computing models [26]:
• Better utilization of resources - Grid computing uses distributed resources more
  efficiently and delivers more usable computing power. This can decrease time-to-market,
  allow for innovation, or enable additional testing and simulation for improved
  product quality. By employing existing resources, grid computing helps protect IT
  investments, containing costs while providing more capacity
• Increased user productivity - By providing transparent access to resources, work can
  be completed more quickly. Users gain additional productivity as they can focus on
  design and development rather than wasting valuable time hunting for resources and
  manually scheduling and managing large numbers of jobs. End users also gain
  uninhibited access to the computing, data and storage resources they need (when they
  need them)
• Scalability - Grids can grow seamlessly over time, allowing many thousands of
  processors to be integrated into one cluster. Components can be updated
  independently and additional resources can be added as needed, reducing large
  one-time expenses
• Flexibility - Grid computing provides computing power where it is needed most,
  helping to better meet dynamically changing work loads. Grids can contain
  heterogeneous compute nodes, allowing resources to be added and removed as needs
Additional motivations for corporations to connect to the grid are represented in Figure 1.3
and some key drivers are discussed below.
IT Infrastructure Complexity
Over time, organizations have adopted new information technologies to support new
applications and placed them alongside older technology rather than abandoning their
previous investment in systems, software, and processes. This has often led to an IT
infrastructure that is very complex. The foundation often is 20 years old. Layered on top
of that foundation are layers of new technology. Complexity clearly can increase an
organization's staff-related costs.
It may also decrease an organization's ability to
respond to rapidly moving market forces [14] and thereby affect its agility and
Figure 1.3: Connecting Motivations to Grid Computing
Reduce Capital Costs
Declining or stagnant revenue has forced many organizations to seek ways to trim
costs. IT budgets either have been reduced while the IT organization is expected
to continue supporting all application systems and business initiatives or have
remained static while the IT organization is expected to provide expanded levels of
service to the organization. Thus, the IT staff faces the paradox of having to do more
with less. To address these issues, IT organizations have implemented various techniques
such as consolidating multiple workloads on fewer, larger system configurations in a
smaller number of datacenters, increasing focus on the use of
configurations (blade computing, clustering, or grids) constructed of high-volume,
low-cost systems, and reducing the number of system architectures utilized throughout the
network to make development, administration, operations, and support easier and,
thus, less costly.
Improved System Utilization
Organizations have installed a number of different computing solutions and are now
finding that these systems are not fully utilized, nor do they interoperate as a single
system. IT managers are often seeking ways to increase the overall
utilization of their systems and integrate application systems. Since
they are facing a rapidly changing environment, they are also concerned about agility. That
is, they are seeking ways to align their system resources to address today's need, knowing
that tomorrow they may need to realign to address a different set of needs.
While the potential opportunity is both broad and significant, numerous and varied
challenges and obstacles may hinder the adoption of grid computing, including:
• Cultural and organizational concerns associated with resource sharing, e.g., the
  comfort factor associated with virtualized resources for business units
• General lack of commercial applications running in a grid environment
• General lack of tools and industry standards, leading organizations to think of grids as
  requiring large people and services costs, which lessen any infrastructure cost savings
• Security concerns
Chapter 3 discusses the barriers to adoption of Grid Computing in corporations.
1.4 Grid Computing Competitive Landscape
The market opportunity of grid computing has been dominated by several major vendors
like IBM, SUN, Microsoft, Oracle and EDS to name a few (Figure 1.4). There are also
other providers at different segments of the grid computing offerings.
A Michael Porter Five Forces view of the Grid computing landscape shows that the
barrier to entry into this market space is low and that rivalry within the industry, although
civil, has recently trended towards clusters of alliances. The availability of several vendors has
increased buyer power, although vendor-specific grid enabler tools may limit the extensibility of
buyers' architectures. Such tools can also imply higher initial
switching costs for corporations that later move to alternatives. As the technology matures and
acceptance grows within business communities, the potential for more entrants is high.
Figure 1.4: Key Providers of Grid/Utility Programs (figure labels: Full-Service Orientation, with enterprise vision, solution set, and road map IT to BP; Hosting, Point Solutions and Managed Service Providers, including AT&T, SevenSpace, NaviSite, Coradiant, Digex, BroadSoft, Sector, and AimNet Solutions)
Figure 1.4 also depicts the confusing messages that are being marketed by different
vendors. For example, IBM has its On-Demand offering, whereas HP and EDS have their
Adaptive and Agile Enterprise offerings respectively.
Figure 1.5 shows the currently best-positioned suppliers of Grid Computing
services from the perspective of customers, according to IDC market research. IBM is
viewed as the market leader and its market positioning strategy has already started
reaping rewards. According to Ken King, the VP of Grid Computing at IBM, IBM's
Grid business had already almost hit the $1 billion mark [27].
A summary of the Go-to-Market strategy segmentation is provided below:
• Development of channel model (e.g. IBM Global Services, Sun Microsystems)
• Partnerships with ISVs to enable delivery of ASP (e.g. IBM GS)
• Stand-alone offerings (e.g. IBM GS, HP)
Figure 1.5: Best-Positioned Suppliers of Grid Computing Services (suppliers shown include EDS, Sun, Dell, CSC and Cisco)
Although the major players may market their initiatives differently for
differentiation, essentially they all relate to grid computing. The next chapter discusses
some of these different messages and how they relate to the vendors' Grid Computing
architecture application.
2.1 Grid Computing Architecture Introduction and IT Evolution
Grid Computing has been an integral part of the Information Technology evolution. In
Chapter 1 the architecture evolution was presented to define the concept of virtualization
and the enabling virtual organization (VO). Over the last several years the grid
community has produced protocols, services and various tools that address the notion of
building scalable virtual organizations. Grid Computing, in turn, provides highly scalable,
highly secure, and extremely high-performance mechanisms for discovering and
negotiating access to remote computing resources in a seamless manner. This makes it
possible for the sharing of computing resources, on an unprecedented scale, among an
infinite number of geographically distributed groups.
Before delving deeper into Grid Computing architecture, the evolution of IT is
evaluated in relation to the emergence of grid or utility computing. The consumption model
reflects how corporations purchase their IT, as shown in Figure 2.1 [4]. The evolution
started with the concept of 'Insourcing', where corporations owned all compute
resources and managed them internally. Over the last two decades IT has been used to
implement additional business functions, leading to higher complexity and costs. In order
to manage cost and complexity, companies started to outsource non-core business
functions to IT vendors while retaining decisions on architectures and
product selection. Although this model provided a better option than the 'Insourced' model,
corporations still lacked the agility and cost performance required to remain
competitive. The next wave represented in the figure is that of utility or grid
computing, where the virtualized organization allows the
corporation to purchase computation services or to utilize the idle CPU cycle
resources it already has. Open standards and
guidelines are needed to maintain the transparency of resource availability and
utilization across organizations.
Figure 2.1: Consumption Model (asset models shown: company-owned, -managed and -operated assets; company-managed and provider-operated assets; -managed and -operated assets)
Grid computing environments must be constructed upon the following foundations [4]:
• Coordinated resources. Avoid building grid systems with centralized control;
  instead, provide the necessary infrastructure for coordination among the resources,
  based on respective policies and service-level agreements
• Open standard protocols and frameworks. The use of open standards provides
  interoperability and integration facilities. These standards must be applied for
  resource discovery, resource access, and resource coordination.
Additionally a metric would be included for the measurement of Quality of Service
(QoS) requirements necessary for the end-user community.
2.2 Grid Computing Attributes
The requirements for grid computing infrastructure can be described by the following
attributes [16]:
Virtualization
Virtualization is the abstraction into a service of every physical and logical entity in a
grid. Virtualization is important because it enables grid components (such as storage,
processors, databases, application servers, and applications) to integrate tightly without
creating rigidity and brittleness in the system. Rather than making fixed ties that
determine, for example, which application server node will handle requests from a particular
application, or where a database physically locates its data, virtualization
enables each component of the grid to react to changing circumstances more quickly and
to adapt to component failures without compromising performance of the system as a
whole.
Dynamic Provisioning
Provisioning simply means distributing supplies where they are needed. In the context of
the grid, "supplies" may mean server requests that need to be handled, data that needs to
be accessed and used, or computations that need to be performed. Provisioning in the grid
environment means a grid service broker that knows the resource requirements of one
element of the grid and the resource availability of another element links the two together
automatically and dynamically to make efficient use of resources. Then it adjusts the
associations as circumstances change. Policies, such as response time thresholds or
anticipated peak demands, can be used to further optimize the associations of resource
requestors to resource providers.
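The broker behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions: the ServiceBroker class, its method names, the "most free capacity" placement rule and the response time policy are hypothetical and do not correspond to any vendor's provisioning product.

```python
class ServiceBroker:
    """Toy broker linking resource requestors to resource providers."""

    def __init__(self, free_units):
        self.free = dict(free_units)   # provider -> units still available
        self.assignments = {}          # requestor -> (provider, units)

    def provision(self, requestor, units):
        """Associate a requestor with a provider that can supply the units."""
        # Return any capacity the requestor already holds before re-placing it.
        previous = self.assignments.pop(requestor, None)
        if previous is not None:
            self.free[previous[0]] += previous[1]
        fits = [p for p, free in self.free.items() if free >= units]
        if not fits:
            # No provider fits: restore the earlier association unchanged.
            if previous is not None:
                self.free[previous[0]] -= previous[1]
                self.assignments[requestor] = previous
            return None
        # Assumed placement rule: prefer the provider with the most free capacity.
        provider = max(fits, key=lambda p: self.free[p])
        self.free[provider] -= units
        self.assignments[requestor] = (provider, units)
        return provider

    def adjust(self, requestor, observed_ms, threshold_ms, extra_units):
        """Policy hook: grow the allocation when response time is too high."""
        provider, units = self.assignments[requestor]
        if observed_ms <= threshold_ms:
            return provider
        return self.provision(requestor, units + extra_units)
```

A caller would invoke provision when a service first needs capacity and adjust whenever monitoring reports a new response time, so the association changes as circumstances change rather than being fixed at deployment.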
Resource Pooling
Consolidation and pooling of resources is required for grids to achieve better utilization
of resources, a key contributor to lower costs. By pooling individual disks into storage
arrays and individual servers into blade farms, the grid runtime processes that
dynamically couple service consumers to service providers have more flexibility to
optimize the associations. Resource sharing also happens purely in software. Web
services provide the model for applications to expose re-usable functionality for
discovery and invocation by unrelated applications.
Self-Adaptive Software
With labor being the most significant portion of IT costs, savings due to better hardware
utilization or more responsive systems become irrelevant if the everyday tasks of
administrators are not automated and simplified. A grid infrastructure would be
unworkable if every node required constant manual tuning and intervention. A critical
grid infrastructure requirement is systems that automate the bulk of maintenance and
tuning tasks traditionally performed by IT staff. More of the tasks that used to be
performed by administrators must now be handled by the systems themselves.
Unified Management
Even with self-managing systems, human beings will always be involved in managing an
enterprise grid, but the management tasks required by humans should be simplified with a
single tool that can provision, monitor, and administer every element in the grid. Such a
tool should evaluate availability and performance from the perspective of the user, such
that any bottleneck in the system or any unavailable component raises alerts. Most
importantly, with a grid infrastructure, IT professionals must be able to treat groups of
systems as a single logical entity so that tasks can be performed once and executed on
multiple machines.
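The idea of treating a group of systems as a single logical entity can be sketched as below. The Node and NodeGroup classes are hypothetical stand-ins for managed machines and for a management tool's grouping feature; a real tool would execute tasks remotely rather than append to an in-memory list.

```python
class Node:
    """Stand-in for one managed machine."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.completed = []   # tasks this node has executed

    def run(self, task):
        if not self.up:
            raise RuntimeError(f"{self.name} is unavailable")
        self.completed.append(task)
        return f"{self.name}: {task} ok"


class NodeGroup:
    """A set of nodes administered as one logical entity."""

    def __init__(self, name, nodes):
        self.name = name
        self.nodes = list(nodes)

    def run(self, task):
        """Issue the task once; execute it on every member; collect alerts."""
        results, alerts = {}, []
        for node in self.nodes:
            try:
                results[node.name] = node.run(task)
            except RuntimeError as exc:
                # An unavailable component raises an alert instead of
                # aborting the operation for the rest of the group.
                alerts.append(str(exc))
        return results, alerts
```

The administrator calls run on the group once, and the per-node results and alerts come back together, which mirrors the requirement that tasks be performed once and executed on multiple machines.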
Implement One from Many. Together, the attributes of virtualization, dynamic
provisioning, and resource pooling form the requirements for software that implements a
single logical entity using many services running on multiple servers and crossing
multiple disks, an entity which delivers high quality of service from low-cost
Manage Many as One. Together, the attributes of self-adaptive software and a unified
management model form the requirements for dramatically lowering management costs
by viewing the entire enterprise grid as one simple whole.
2.3 Grid Computing Standard
The Global Grid Forum (GGF) has the purpose of defining specifications for grid
computing. The Globus Alliance implements these standards through the Globus Toolkit,
which has become the de facto standard for grid middleware. As a middleware
component, it provides a standard platform for services to build upon, but grid computing
needs other components as well, and many other tools operate to support a successful
Grid environment. This situation resembles that of TCP/IP: the usefulness of the Internet
emerged both from the success of TCP/IP and the establishment of applications such as
newsgroups and web pages.
Globus has implementations of the GGF-defined protocols to provide:
1. Resource management: Grid Resource Allocation & Management Protocol (GRAM)
2. Information Services: Monitoring and Discovery Service (MDS)
3. Security Services: Grid Security Infrastructure (GSI)
4. Data Movement and Management: Global Access to Secondary Storage (GASS) and GridFTP
The emergence of Web Services and their increased popularity within industry has led
the Globus Toolkit strategy to be refocused to incorporate Web Services. More
details about Web Services are available in Appendix A. The Open Grid Services
Architecture (OGSA) represents an evolution towards a Grid system architecture based
on Web Services concepts and technologies. The Open Grid Services Infrastructure (OGSI) is
a formal specification of the concepts described by the OGSA. OGSA is a distributed
interaction and computing architecture that is based around the Grid service, assuring
interoperability on heterogeneous systems so that different types of systems can
communicate and share information (Figure 2.2). It leverages emerging Web services standards,
defining its interfaces in the Web Services Description Language (WSDL). Specifically, the
Grid service interface is described by WSDL, which defines how to use the service.
OGSI 1.0 specifies a set of service primitives that define a nucleus of behavior common
to all grid services. The Web Services Resource Framework (WSRF) is an evolution of
OGSI 1.0. Its goal is to evolve the grid architecture in a way that is more clearly aligned
with the general evolution of Web services. Instead of defining a new type of grid service,
these specifications will allow the services specified in the OGSA to be based completely
on standard Web services.
The Globus Toolkit (GT), the standard, as implemented in GT1 and GT2, was originally
formulated without reference to the industry standard of Web Services. However, the
most recent release (GT3) has moved fairly aggressively towards Web Services as the
underlying architecture. Although GT3 incorporates many recent standards, such as XML
and Web Services, it has a complicated structure that contains much legacy GT2 code.
The next version, GT4, is slated to be released in 2005.
Figure 2.2: The evolution of grid technologies and standards [8] (stages shown: Globus Toolkit, de facto standard, single implementation; Open Grid Services Architecture, Web services, etc., multiple implementations; managed virtual systems)
2.3.1 Globus Toolkit Standard
The Globus Toolkit (GT) [29] is the open source software base for building Grid
infrastructure and applications. It is a software system that addresses key technical problems in
the development of Grid-enabled tools, services, and applications, and it offers a modular
set of orthogonal services and middleware for building solutions.
The evolution of the Globus Toolkit started with GT1.0, which was released in 1998,
followed by GT2.0 in 2001 and the current production version, GT3.0, in June 2003. The
discussion here is limited to GT3.0, as the new version, GT4, has not been released.
The GT3 core architecture includes four core components that are represented by a white
background in Figure 2.3 [10]. The core components together make up the building
blocks of the Grid Services.
The OGSI Specification and Security implementations do not provide any run time services but serve purely as a base for other
Figure 2.3: GT3 Core Architecture (components shown: Grid Service Container, System-Level Services, OGSI Spec Implementation, Security Infrastructure, Hosting Environment)
OGSI Specification Implementation provides implementations for all OGSI specified
interfaces, as well as APIs and tools to make it easier to develop OGSI compliant services.
Security Infrastructure implementation provides SOAP as well as transport level
message protection, end-to-end mutual authentication, and single sign-on service
authorization; basically a rendering of the GSI implementation known from Globus
Toolkit 2 in an OGSI environment. This includes using X.509 identity certificates for
authentication and X.509 Proxy certificates to support delegation and single sign-on
which was updated to conform to latest IETF/GGF draft.
System-Level Services are general-purpose services that facilitate the use of Grid
Services in production environments and include the following services:
Administration Service
Logging Service
Management Service
Grid Service Container includes the above three core components and also shields the
application from environment specific run-time settings, such as what database is used to
persist service data. The container also controls the lifecycle of services, and the
dispatching of remote requests to service instances.
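The two container responsibilities named above, controlling service lifecycle and dispatching requests to service instances, can be sketched in a few lines. This is an in-memory analogy only: the ServiceContainer and CounterService classes and the handle format are hypothetical and are not the GT3 container API, which is Java-based and works over SOAP.

```python
class CounterService:
    """Example service whose instances each hold their own state."""

    def __init__(self):
        self.value = 0

    def add(self, amount):
        self.value += amount
        return self.value


class ServiceContainer:
    """Creates, addresses, and destroys service instances."""

    def __init__(self):
        self._factories = {}   # service type name -> callable creating an instance
        self._instances = {}   # handle -> live service instance
        self._next_id = 0

    def register(self, type_name, factory):
        self._factories[type_name] = factory

    def create(self, type_name):
        """Lifecycle: instantiate a service and return a handle to it."""
        self._next_id += 1
        handle = f"{type_name}/{self._next_id}"
        self._instances[handle] = self._factories[type_name]()
        return handle

    def dispatch(self, handle, operation, *args):
        """Route a request to the instance named by the handle."""
        instance = self._instances[handle]   # KeyError if already destroyed
        return getattr(instance, operation)(*args)

    def destroy(self, handle):
        """Lifecycle: end the instance so that later requests fail."""
        del self._instances[handle]
```

Callers only ever hold a handle, so the container remains free to decide where and how the instance state is kept, which is the shielding role the text attributes to it.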
Hosting Environment currently offers support for four Java environments:
1. Embedded: utility to be used in clients or lightweight servers to expose Grid services
2. Standalone: environment with an additional server mainline with startup options
3. Servlet: container inside of a standard Java Servlet Engine
4. EJB: container inside of an EJB application server
Base Services include higher-level services like Program Execution, Data Management,
and Information Services, which were discussed earlier.
User-Defined Services are higher-level services which are not included in the toolkit but
are built on top of any subset of GT3 components including Base Services.
2.3.2 Grid Architecture and Internet
Figure 2.4 compares the Grid architecture with the Internet architecture. In this
framework, each layer of the Grid provides the Application Programming Interface (API)
and Software Development Kit (SDK) to help with the application development. The
separation of each layer makes it possible to implement many protocols in each layer
without losing interoperability. The layered architecture makes it possible to easily
assemble resources on the grid for sharing and building Grid applications through all the
levels [3]. Descriptions of the layers are provided below.
Grid infrastructure shown alongside Internet infrastructure; the Grid layer annotations, in the order shown, are:
• "Coordinating multiple resources": ubiquitous infrastructure services, app-specific distributed services
• "Sharing single resources": negotiating access, controlling use
• "Talking to things": communication (Internet protocols) & security
• "Controlling things locally": access to, & control of, resources
Figure 2.4: Layered Grid architecture and its relationship to Internet Protocol architecture
Fabric Layer deals with local resources and is resource-specific. The resource can be
computing cycles, storage, network, code repositories, or catalogs like databases. Depending on the type
of resource, additional resource-specific operations can also be provided. However, the
minimum required operations include enquiry and resource management operations.
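The minimum Fabric Layer contract, an enquiry operation plus resource management operations, can be expressed as an abstract interface. The names FabricResource, StorageResource, enquire, allocate and release are assumptions chosen for this sketch; they are not Globus Toolkit identifiers.

```python
from abc import ABC, abstractmethod


class FabricResource(ABC):
    """Minimum operations a local resource exposes to the layers above."""

    @abstractmethod
    def enquire(self) -> dict:
        """Enquiry: report the structure and current state of the resource."""

    @abstractmethod
    def allocate(self, amount: int) -> bool:
        """Management: reserve part of the resource; False if it does not fit."""

    @abstractmethod
    def release(self, amount: int) -> None:
        """Management: return a previously reserved amount."""


class StorageResource(FabricResource):
    """Resource-specific implementation for a block of disk space."""

    def __init__(self, total_gb):
        self.total_gb = total_gb
        self.used_gb = 0

    def enquire(self):
        return {
            "kind": "storage",
            "total_gb": self.total_gb,
            "free_gb": self.total_gb - self.used_gb,
        }

    def allocate(self, amount):
        if amount > self.total_gb - self.used_gb:
            return False
        self.used_gb += amount
        return True

    def release(self, amount):
        # Never let the counter go below zero if a caller over-releases.
        self.used_gb = max(0, self.used_gb - amount)
```

A compute or network resource would implement the same three operations with its own units, which is how the layer stays resource-specific underneath while presenting a uniform surface to the Resource Layer.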
Connectivity Layer supports core communication and authentication. Since virtual
organization systems work in an environment of untrusted and dynamic relationships,
security is extremely important. The required functions are single sign-on, delegation,
integration with various local security solutions, and user-based trust relationships.
The related Globus protocol is the Grid Security Infrastructure protocol, which is based on
Resource Layer defines a suite of protocols for service negotiation, initiation, monitoring,
control, accounting, and payment. However, this layer still concerns only the local
resource. It deals with two classes of protocols: information protocols and management
protocols. Because management protocols handle the negotiation and sharing relationship
initiation, they are embedded with access policies. In Globus Toolkit, Grid Resource
Information Protocol (GRIP), Grid Resource Registration Protocol, Grid Resource
Access and Management, and GridFTP are defined in this layer.
Collective Layer is used to coordinate multiple distributed resources and capture
interactions across collections of resources. It is not tied to any individual resource.
Due to the variety of Grid applications, many protocols can be defined in this layer,
for example directory services, collaboration frameworks, and software discovery. The
Collective Layer can be general or domain-specific. In Globus Toolkit, there are many
protocols defined.
The Application Layer consists of Grid applications, which can be developed on services
defined at any layer. The applications within this layer may in practice call upon
sophisticated frameworks and libraries like the Common Component Architecture,
SciRun, CORBA, Cactus and workflow systems [9].
2.3.3 Grid and Web Services Standard Convergence (WSRF)
Since releasing Globus Toolkit 3.0 in July 2003, the GGF and the Globus Alliance have
been working closely to define enhancements to the standards. In January 2004, they
presented the WS-Resource Framework (WSRF), an open framework for modeling and
accessing stateful resources using Web services. WSRF marks the point at which Web
service standards are evolving to meet grid service elements and requirements; this
convergence of core technology standards provides a common base for business and
technology services (Figure 2.5). The framework consists of separate specifications,
each one focusing on a specific area.
Figure 2.5: Convergence of Grid and Web Service Standards [11]
The WS-Resource Framework (WSRF) is a set of six Web services specifications that
define what is termed the WS-Resource approach to modeling and managing state in a
Web services context. It can be viewed as a straightforward refactoring of the concepts
and interfaces developed in the OGSI version 1.0 specification, in a manner that exploits
recent developments in Web services architecture (e.g., WS-Addressing) to express these
concepts and interfaces in a manner that is fully aligned with current Web services
directions. WSRF retains essentially all of the OGSI concepts, and introduces only modest
changes to OGSI messages and their associated semantics. GT4.0 is expected to be
WSRF enabled.
2.4 Grid Computing Categories
Grid Computing applications vary with needs and utilization requirements. Three main
categories are currently assessed and implemented in both the research and business
communities: Compute, Data, and Optimization Grids. The details of these grid
categories are presented in the following sections.
2.4.1 Compute Grid
A computational grid is a hardware and software infrastructure that provides dependable,
consistent, pervasive, and inexpensive access to high-end computational capabilities [5].
There are many different implementations of computing Grids. For example, in the
Condor system, a computing Grid scheduling system developed by the University of
Wisconsin in Madison, there are three basic components: central manager, execution
hosts, and submission hosts. The central manager collects the status of all computing
servers in its cluster and matches computing resource requests with a server that can
meet the requirement. Execution hosts are servers executing the assigned jobs.
Submission hosts are servers that request computing resources. Every server in the cluster
can be configured as an execution host and a submission host at the same time.
There is always one central manager in a cluster, although it can co-exist with
submission hosts and execution hosts. These servers can be dedicated servers for the
computing Grid or idle machines.
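The matching role of the central manager described above can be illustrated with a minimal sketch. It is not Condor's actual matchmaking mechanism or API; the class names, fields, and the first-fit selection rule are assumptions chosen only to show the flow of status reports and resource requests.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Machine:
    """Status record an execution host reports (fields are illustrative)."""
    name: str
    free_cpus: int
    memory_mb: int
    idle: bool = True


@dataclass
class JobRequest:
    """Resource request sent by a submission host (fields are illustrative)."""
    owner: str
    cpus: int
    memory_mb: int


class CentralManager:
    """Collects machine status and matches requests to machines."""

    def __init__(self) -> None:
        self.machines: dict[str, Machine] = {}

    def report(self, machine: Machine) -> None:
        # Execution hosts (re)advertise their current state.
        self.machines[machine.name] = machine

    def match(self, job: JobRequest) -> Optional[str]:
        # Assumed policy: first idle machine (by name) that meets the request.
        for m in sorted(self.machines.values(), key=lambda m: m.name):
            if m.idle and m.free_cpus >= job.cpus and m.memory_mb >= job.memory_mb:
                m.idle = False  # the machine is now claimed for this job
                return m.name
        return None  # no machine can currently meet the requirement


manager = CentralManager()
manager.report(Machine("node-a", free_cpus=2, memory_mb=2048))
manager.report(Machine("node-b", free_cpus=8, memory_mb=16384))
first = manager.match(JobRequest("alice", cpus=4, memory_mb=4096))
second = manager.match(JobRequest("bob", cpus=4, memory_mb=4096))
```

In this sketch the first request is placed on node-b, and the second finds no match because node-a is too small and node-b is already claimed. A real scheduler would add priorities, preemption, and release of claimed machines.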
BASF, one of the world's largest chemical companies, uses Platform Computing Inc.'s
grid products to power a project that examines the effect of catalysts on accelerating
chemical reactions. With traditional computing approaches, the analysis required very
complex, compute-intensive simulations that could take days to complete, and the
process also required a large amount of intervention by the researchers, reducing their
productivity. BASF wanted a solution that would run the simulations faster, improve
researcher productivity, and use the existing IT infrastructure.
By deploying Platform's LSF grid offering, BASF was able to increase processor
utilization by 90%, automate manual scheduling and processing tasks, and reclaim 20%
of the researchers' work [17].
2.4.2 Data Grid
Data grids are grids that provide computing resources that allow for in-depth analysis of
shared large-scale (and often diverse) databases. Data Grid has overlapping goals with
heterogeneous distributed database systems, which deal with different kinds of database
management systems distributed in a heterogeneous environment with different hardware,
operating systems, network connections, data models, and even DBMS vendors. Both
aim to resolve distributed data management tasks, including integrated data
catalogs, data discovery, distributed queries, distributed transactions, and semantic
integration. However, they differ in two respects. First, most distributed database
management systems focus on research within a database setting, while Data Grid
targets an even more dynamic environment across administrative domains. Second, most
distributed database systems require a central information server, which is not possible in
the dynamic Data Grid environment. Therefore, notification and event mechanisms are
very important in Data Grid.
Pacific Life Insurance, the fifteenth largest life insurance company in the nation, has
deployed Entropia's DCGrid to accelerate financial modeling and simulation, using their
existing desktop PC infrastructure. DCGrid's ability to run these simulations more
quickly translated into a key competitive advantage for Pacific Life. DCGrid was able to
cost-effectively exploit the excess power of existing PCs, allowing Pacific Life to avoid
the purchase of additional hardware and related training [17].
2.4.3 Optimization Grid
Enterprise optimization grids help enterprises increase asset utilization through optimized
grid design. These grids focus on providing increased computing resources and better
storage systems utilization for enterprises that are trying to better leverage their
investments in computer systems and storage. This includes pooling of resources together
for better economy of scale and also scaling computing and storage resources to meet
spikes in computing demand.
Hewitt Associates, a large human resource consulting and outsourcing firm, operates one
of the best-known examples of an enterprise optimization grid. The company reconfigured
its existing information systems environment to offload complex, resource-intensive
pension calculation applications from its mainframe systems onto less expensive
grid-based Linux blade servers. By so doing, Hewitt Associates has been able to reduce costs
associated with processing pension calculations on the more expensive mainframe, while
improving the performance (by more than 90%) of the pension application [17].
2.5 Applications of Grid Computing
Grid Computing in the commercial market has been led by several major vendors in both
the software and hardware aspects of grid computing. Numerous Grid Computing
projects already exist in academia and in government initiatives. More recently, the
major IT vendors have declared key Grid Computing based
corporate strategic goals. IBM, Sun, Oracle, Microsoft and HP were chosen, in this
thesis, as commercial vendors for an analysis of their Grid Computing offerings. The Gridbus
project is also analyzed to provide the Open Source and research community perspective
on Grid Computing initiatives. GridGarden.NET, a Microsoft .NET based Grid
Framework developed at MIT, is also presented.
2.5.1 IBM On-Demand
IBM On-Demand is the company's marketing term for a company whose
business processes, integrated end-to-end across the company and with key partners,
suppliers and customers, can respond with flexibility and speed to any customer demand,
opportunity, or external threat. On-demand businesses are responsive, variable, focused
and resilient. IBM has established key business relationships with leading middleware
independent software vendors (ISVs), like Avaki, Platform Computing, DataSynapse and
United Devices, to provide customers with the most robust grid solutions in the industry.
IBM is also a strong supporter of open standards for Grid architectures and has
developed an extension of GT3 called the IBM Grid Toolbox V3 [24].
The IBM Grid Toolbox V3 for Multiplatforms implements the OGSI standard and
provides the tools to build a grid and to develop, deploy, and manage grid services. The
IBM Grid Toolbox consists of the following:
- A hosting environment that is capable of running grid services and sharing them with
  other grid participants, such as grid service providers and grid service consumers
- A set of tools to manage and administer grid services and the grid hosting
  environment, including a Web-based interface, the Grid Services Manager
- A set of APIs and development tools to create and deploy new grid services and grid
The IBM Grid Toolbox V3 for Multiplatforms includes core grid services and base grid
services as shown in Figure 2.6. Core grid services are always available in any instance
of the IBM Grid Toolbox. Core grid services cannot be deployed or un-deployed
separately from the installation of the IBM Grid Toolbox. Base grid services can be
deployed with the installation of the IBM Grid Toolbox, but they can also be deployed or
un-deployed separately.
[Figure 2.6 labels, partly illegible in the source: CMM, Policy, Service Group; OGSI core services; IBM WebSphere Application Server - Express]
Figure 2.6: IBM Grid Toolbox Base Services
The IBM Grid Toolbox includes a list of base grid services that can be deployed with the
installation or deployed or un-deployed separately. The following base grid services are
provided:
Program Management Services
Program Management Services manages jobs located on local or remote instances of the
IBM Grid Toolbox. Program Management Services simplifies the use of remote systems
by providing a standard interface for requesting and using remote system resources for
the submission and control of jobs on a grid. This implementation is typically used to
support distributed computing applications.
Information Services
Information Services provides information about grid resources for use in resource
discovery, selection, and optimization. Information Services also maintains knowledge
about resource availability, capacity, and current utilization. This information is critical to
the operation of the grid and development of applications. Within any grid, resources will
fluctuate, depending on their availability to process and share data. As resources become
free within the grid, they can update their status within the grid information services. The
client, broker, and grid resource manager use this information to make informed
decisions on resource assignments.
Data Management Services
Data Management Services provides data transfer capabilities throughout the grid.
Without this grid service, data is not able to move through the grid nodes. Grid
applications use this grid service to move data from one node to another.
Common Management Models (CMM) Services
The CMM implementation in the IBM Grid Toolbox provides the infrastructure required
to represent an instrumented resource as a grid service, so that it can be queried and
managed in a grid context.
Policy Services
The Policy Services in the IBM Grid Toolbox enable administrators to define a set of
business goals and to enforce a set of rules that allow their grid to meet those goals. In the
IBM Grid Toolbox, a policy identifies the desired outcome for the interactions between
different elements in the grid environment. The core set of services to define, manage,
and apply policies on a grid includes the Policy Service Manager (PSM), the Policy
Service Agent (PSA), and the Policy Repository.
Service Group Services
The Service Group Services implementation in the IBM Grid Toolbox includes
extensions to the service group services in the core Globus Toolkit, and is compliant with
the Service Group definition in the OGSI specification. Service group provides a
framework that allows grid applications to categorize (group) grid services and later
execute queries on the group of services to find specific types of grid services. Grid
application writers might use the service group framework to implement a registry
service for their grid.
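The registry use case mentioned above can be sketched as follows. This is only an illustration of the group-then-query idea, not the OGSI ServiceGroup interface or the IBM Grid Toolbox API; the class, its method names, the category strings, and the example.org handles are all hypothetical.

```python
from collections import defaultdict


class ServiceGroup:
    """Groups service handles by category so they can be queried later."""

    def __init__(self):
        self._by_category = defaultdict(set)  # category -> set of handles

    def add(self, category, handle):
        self._by_category[category].add(handle)

    def remove(self, handle):
        # A service leaving the grid is dropped from every category.
        for handles in self._by_category.values():
            handles.discard(handle)

    def query(self, category):
        # Sorted so that callers get a stable, repeatable ordering.
        return sorted(self._by_category.get(category, set()))


registry = ServiceGroup()
registry.add("data-transfer", "https://node1.example.org/services/transfer")
registry.add("data-transfer", "https://node2.example.org/services/transfer")
registry.add("job-management", "https://node1.example.org/services/jobs")
registry.remove("https://node2.example.org/services/transfer")
transfer_services = registry.query("data-transfer")
```

After the removal, the query for data-transfer services returns only the node1 handle.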
[Figure 2.7 labels: IBM WebSphere (HTTP server, SOAP engine); IBM Grid Toolbox; OGSI core (GT3)]
Figure 2.7: IBM Grid Toolbox Hosting Environment
The IBM Grid Toolbox is built with an embedded version of the IBM WebSphere
Application Server - Express V5.0.2. Although it functions nearly the same as a
full-product version of WebSphere Application Server, it is used exclusively for the IBM
Grid Toolbox. A hosting environment can be seen essentially as a grid container running
inside a Java engine (Web container) or an EJB application server (WebSphere
Application Server), as shown in Figure 2.7 [24].
2.5.2 Sun N1 Grid
Sun N1 Grid is Sun's architecture for the next-generation data center. The architecture is
designed to make the entire data center behave as a single, unified system. N1 is designed
to reduce management complexity and cost, increase data center resource utilization,
improve infrastructure responsiveness and agility, and ensure investment protection.
Sun's new Grid offerings come in four categories: access, data, computation, and
visualization [13].
Sun's "access software" enables efficient usage of resources regardless of location, and is
provided through a new Grid Portal solution that relies on the Sun Grid Engine Enterprise
Edition (SGEEE) (currently called Sun N1 Grid Engine) software and the
industry-standard Globus toolkit.
Sun's Data Grid solutions rely on the Sun StorEdge Open SAN Architecture, the Sun
StorEdge 3510 FC array, and Sun StorEdge SAM-FS and QFS software. Sun said its
Data Grid solutions allow data to be collected, managed and protected regardless of user
or data location.
The Sun Fire Compute Grid family couples Sun Fire systems with a choice of
interconnect technologies, which Sun says "provides excellent price-performance with
clusters of small systems as well as excellent price/productivity with superclusters" that
utilize large memory and simplified programming environments. Interconnect choices
include Gigabit Ethernet switches, Myrinet, Infiniband, Quadrics, or the Sun Fire Link
Visualization Grids let applications perform graphics operations using local or remote
graphics systems. Sun's Visual Grid platform is based on the Sun Fire V880z, the
XVR-4000 high-speed graphics subsystem, and specialized software based on the OpenGL
industry standard.
Sun says its Grid Reference Architecture provides a framework for the deployment of
these building blocks. Sun's Customer Ready Systems (CRS) program integrates the
building blocks with complementary third-party hardware and software into
"ready-to-deploy" solutions that are built in Sun factories based on a customer's specifications and
supported by a global professional services practice focused on Grid deployments.
[Figure 2.8 labels: Sun Grid Solution Stack; Web Interface; Sun Grid Portal; Solaris Operating Environment & Linux; Sun Enterprise and Sun Fire Servers; Sun StorEdge Systems and HPC SAN; Thin and Bladed Servers - SPARC & Intel]
Figure 2.8: Sun N1 Grid Computing Strategy
Sun N1 Grid Engine software is a distributed management product that optimizes
utilization of software and hardware resources. It can increase utilization of available
resources to as much as 98 percent. Sun N1 Grid Engine software is both a job manager
and a job scheduler for clusters of computers. The Sun N1 Grid Engine Enterprise
Edition software can harness computing power across multiple clusters (campus grids).
Sun's N1 Grid technology is designed to allow multiple software applications to share a
common pool of servers and storage resources. By opening up previously static
relationships between hardware, applications, and the operating system, N1 Grid
technology enables flexible provisioning of resources so that excess capacity can be
available to power other applications in the virtualized network.
Sun Grid Engine software works by enabling companies to submit and manage jobs from
just about any Linux or UNIX system on the network. It does this by monitoring the
availability of workstations, then deploying jobs to the available resources. Additionally,
the command line utility gives the company the flexibility to script and automate jobs as
well as build a custom front end. The GUI also provides the business with a convenient
management tool for administering the Sun ONE Grid Engine software.
Sun N1 Grid Engine software enabled hosts can be master hosts, execution hosts,
submission hosts, and administration hosts. These roles are not mutually exclusive; it is
possible for a host to perform all four functions. A typical cluster configuration is to have
one master host, running the sge_qmaster (manager) and sge_schedd (scheduler)
daemons, and the other hosts running sge_execd (execution) daemons. All Sun N1 Grid
Engine software hosts communicate through TCP/IP; for this purpose, there is a
special daemon, sge_commd, running on each host.
Computing resources are modeled by Sun N1 Grid Engine software as job execution
queues. Each queue can have specific attributes and can support multiple parallel
environments. The most frequently used parallel environments are Message Passing
Interface (MPI) and Parallel Virtual Machine (PVM).
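The queue model described above can be pictured with a small sketch. It does not reproduce Grid Engine's configuration format or scheduler; the class, the field names, the queue names, and the selection rule are assumptions used only to show how queue attributes and parallel environments could restrict where a job may run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutionQueue:
    """A job execution queue and its attributes (all fields are illustrative)."""
    name: str
    slots: int                               # how many job slots the queue offers
    parallel_envs: frozenset = frozenset()   # e.g. {"mpi", "pvm"}


def eligible_queues(queues, parallel_env, slots_needed):
    """Names of queues that support the environment and have enough slots."""
    return [
        q.name
        for q in queues
        if parallel_env in q.parallel_envs and q.slots >= slots_needed
    ]


queues = [
    ExecutionQueue("short.q", slots=4, parallel_envs=frozenset({"mpi"})),
    ExecutionQueue("long.q", slots=16, parallel_envs=frozenset({"mpi", "pvm"})),
    ExecutionQueue("serial.q", slots=32),  # no parallel environment attached
]
mpi_candidates = eligible_queues(queues, "mpi", slots_needed=8)
pvm_candidates = eligible_queues(queues, "pvm", slots_needed=2)
```

Here an 8-slot MPI job can only go to long.q, because short.q is too small and serial.q supports no parallel environment.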
2.5.3 Oracle 10g
Oracle offers organizations a comprehensive solution to manage information and run
enterprise applications on grids. Oracle Database 10g has been designed to manage
information on computing grids called database grids. Oracle Application Server 10g
(OracleAS 10g) has been designed to run enterprise applications on computing grids
called application server grids. Both Oracle Database 10g and Oracle Application Server
10g can be very efficiently managed in a grid computing environment using Oracle
Enterprise Manager 10g Grid Control. Figure 2.9 illustrates Oracle 10g products and
features mapping to grid computing requirements [21].
Figure 2.9: Oracle 10g Product Grid Mapping
Oracle Database 10g builds on the success of Oracle9i Database, and adds many new
grid-specific capabilities. Other vendors implement certain portions of a grid
infrastructure (for example, pools of virtualized storage are becoming common), but no one
else can provide a true grid database. Oracle Database 10g is based on Real Application
Clusters, introduced in Oracle9i.
Oracle Application Server 10g, the next generation of Oracle's integrated suite of
application infrastructure software, has been specifically designed to run enterprise
applications on computing grids. It is designed to run enterprise applications on pools of
low cost servers and storage with very high performance, scalability, and availability
while radically reducing the costs of systems and applications monitoring and
management. Further, customers can deploy all their existing Oracle9iAS applications on
Application Server 10g without any changes and take advantage of the new grid features.
OracleAS 10g is managed by Oracle Enterprise Manager 10g Grid Control, a web-based
management console that enables administrators to manage many application servers as
though they were one, thereby automating administrative tasks and reducing
administrative costs. Grid Control also provides facilities to enable many administrators
to work together to manage an application server grid. Finally, OracleAS 10g is also
integrated with Oracle Database 10g in many different ways to optimize quality of
service across a unified grid computing infrastructure for data management and enterprise
Oracle Enterprise Manager 10g Grid Control is the complete, integrated, central
management console and underlying framework that automates administrative tasks
across sets of systems in a grid environment. Grid Control helps reduce administration
costs through automation and policy-based standardization. With Oracle Grid Control, IT
professionals can group multiple hardware nodes, databases, application servers, and
other targets into single logical entities. By executing jobs, enforcing standard policies,
monitoring performance and automating many other tasks across a group of targets
instead of on many systems individually, Grid Control enables IT staff to scale with a
growing grid. Because of this feature, the existence of many small computers in a grid
infrastructure does not increase management complexity.
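The idea of managing a group of targets as one logical entity can be sketched in a few lines. This is not Oracle Enterprise Manager's interface; the class, the target records, the patch_level field, and the policy rule are hypothetical and serve only to show why per-group operations keep administrative effort roughly constant as targets are added.

```python
class TargetGroup:
    """A named set of targets managed as one logical entity (illustrative)."""

    def __init__(self, name, targets):
        self.name = name
        self.targets = list(targets)

    def run(self, job):
        """Apply one job to every member and collect per-target results."""
        return {t["name"]: job(t) for t in self.targets}

    def violations(self, policy):
        """Names of targets for which the policy check fails."""
        return [t["name"] for t in self.targets if not policy(t)]


group = TargetGroup("app-tier", [
    {"name": "as01", "patch_level": 3},
    {"name": "as02", "patch_level": 2},
    {"name": "as03", "patch_level": 3},
])
report = group.run(lambda t: f"checked {t['name']}")
out_of_policy = group.violations(lambda t: t["patch_level"] >= 3)
```

The administrator issues one job and one policy check for the group; adding a fourth server changes the group definition, not the number of operations.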
Oracle's product offerings are focused on Oracle-based solutions, not generalized
offerings. This being said, a great deal of what organizations are seeking when they
consider Grid Computing architectures can be supported quite well using Oracle's
2.5.4 Microsoft Grid
In Chapter 1, Microsoft was pointed out as a player in the Grid Computing space; in
reality, Microsoft does not seem to have any specific grid products in this space.
Microsoft currently offers High Performance Computing (HPC) cluster solutions through
versions of its Windows 2000 Server and Windows Server 2003 operating systems.
Microsoft has also announced a future Windows 2003 Server, HPC Edition [33].
According to Microsoft Distinguished Engineer Jim Gray, web services and Grid
Computing are synergistic. The data-centric nature of web services complements Grid
Computing's computational focus. ".NET is application-centric and is designed to make
it easy to build the Data Grid - the integration of data sources throughout the world to
produce a data library that provides a consistent, unified corpus that also is easy to query,"
Gray says [41]. Microsoft's strategy seems to be based on these Data Grids and on
implementing the Global XML Web Services Architecture (GXA) (Figure 2.10) [18].
A recent article points out that Microsoft is working on a skunk-works project related to
Grid Computing, code-named Bigtop, which is designed to allow developers to create a
set of loosely coupled, distributed operating-systems components in a relatively rapid
way. Microsoft is looking at the consequences of loosely coupling a larger number of
moderately powerful computers to achieve similar results as using several tightly coupled
high performance systems together [42].
[Figure 2.10 labels: Infrastructure Protocols; Future Messaging; Future Eventing; SOAP Modules; WS-Attachments]
Figure 2.10: Global XML Web Services Architecture
GXA is a layered architecture built upon baseline Web service specifications, such as
Simple Object Access Protocol (SOAP), Universal Description, Discovery and Integration
(UDDI), and Web Services Description Language (WSDL) (see Appendix B). The GXA
protocols above the baseline can be divided into two layers. The first layer consists of
SOAP modules, such as WS-Security and WS-Attachments. These protocols are primarily
concerned with the content and structure of an individual message. For example, the
Security specification describes how security information, such as a Kerberos ticket, is
embedded in a SOAP message. The second (top) layer is composed of infrastructure
protocols such as Transaction and reliable messaging. Infrastructure protocols describe
how sequences of messages are put together to solve business problems. The Transaction
specification, for example, describes the flow of SOAP messages sent between a set of
Web services that need to work together to coordinate a series of database updates.
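The first-layer idea, security information carried inside an individual SOAP message, can be sketched with Python's standard XML library. The envelope namespace is the SOAP 1.1 one; the security namespace, the Security and Token element names, the UpdateRequest body element, and the token string are placeholders assumed for illustration and do not follow the actual WS-Security schema.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope
SEC_NS = "urn:example:security"  # placeholder, NOT the real WS-Security namespace


def build_message(token: str, body_text: str) -> ET.Element:
    """Build an envelope whose header carries a security token."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    header = ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
    security = ET.SubElement(header, f"{{{SEC_NS}}}Security")
    # The token (for example an encoded ticket) travels in the header,
    # separate from the application payload in the body.
    ET.SubElement(security, f"{{{SEC_NS}}}Token").text = token
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    ET.SubElement(body, "UpdateRequest").text = body_text
    return envelope


message = build_message("base64-encoded-ticket", "debit account 42")
xml_text = ET.tostring(message, encoding="unicode")
token_path = f"{{{SOAP_NS}}}Header/{{{SEC_NS}}}Security/{{{SEC_NS}}}Token"
embedded_token = message.find(token_path).text
```

Keeping the token in the header is what lets a module such as security be layered onto a message without changing the body that the application defines.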
A framework built on the Microsoft .NET platform called GridGarden.NET, developed at
the Intelligent Engineering Systems Laboratory (IESL) at MIT, is presented later in this
chapter. Alchemi, the Open Source .NET initiative, is described in Appendix C.
2.5.5 HP Adaptive Enterprise
HP has been a pioneer in grid computing since the late 1980s when Joel Birnbaum, then
the director of HP Laboratories, envisioned the notion of "utility computing". Since then
HP has been putting its energies toward grid computing inside and outside the lab.
Company Chairman and Chief Executive Officer Carly Fiorina recently committed to use
industry standards to grid-enable all of HP's products, as illustrated in Figure 2.11, and
also to provide enterprises with the services needed to implement Grid Computing [30].
Products ranging from the smallest handhelds, printers and PCs to the most powerful
storage arrays and supercomputers will be able to connect with and serve as resources on
a grid. Already today HP provides customers with grid-enabled systems running HP-UX,
Linux, and Tru64 UNIX.
Figure 2.11: The HP grid solution stack diagram
HP believes that open, vendor-neutral standards are critical to the wide adoption of grid
technology and that the flexibility of an open standards-based approach is the only way to
bring together heterogeneous resources into an effective grid computing environment.
Grid computing enables HP's vision of the Adaptive Enterprise -- where information
technology is a highly efficient, flexible service that is agile enough to change in line
with a corporation's business and its business environment.
HP has partnered with Platform Computing, Avaki, United Devices and Altair
Engineering to provide grid software solutions with the HP Adaptive Enterprise vision.
2.5.6 Gridbus - Open Source Grid Initiative
The Grid Computing and Distributed Systems (GRIDS) Laboratory at the University of
Melbourne is actively engaged in the design and development of next-generation parallel
and distributed computing systems and applications. The Lab's flagship "Open Source"
project, called Gridbus, is developing technology that enables GRID computing and
BUSiness (Figure 2.12) [31].
[Figure 2.12 labels, partly illegible in the source: Grid Brokers; Workflow Engine; Gridbus Data Broker; Worldwide Grid]
Figure 2.12: Gridbus Middleware
The Gridbus Project is engaged in the design and development of grid middleware
technologies to support eScience and eBusiness applications. These include visual Grid
application development tools for rapid creation of distributed applications, a competitive
economy-based Grid scheduler, a cooperative economy-based cluster scheduler, a
Web-services-based Grid market directory (GMD), Grid accounting services, Gridscape for
creation of dynamic and interactive test bed portals, the G-monitor portal for web-based
management of Grid application execution, and the widely used GridSim toolkit for
performance evaluation. Recently, the Gridbus Project has developed Windows/.NET-based
desktop clustering software and Grid job web services to support the integration of
both Windows and Unix-class resources for Grid computing. A layered architecture for
realization of low-level and high-level Grid technologies is shown in Figure 2.12.
Some of the Gridbus technologies discussed below have been developed by making use
of Web Services technologies and services provided by low-level Grid middleware,
particularly Globus Toolkit and Alchemi.
2.5.7 GridGarden.NET - A Microsoft .NET based Grid Framework from MIT
The Intelligent Engineering Systems Laboratory (IESL) at MIT has developed a .NET
implementation of the middleware necessary for Grid Computing [1]. The IESL team has
focused on deploying computations over a farm of worker machines using the OGSA
standards but implemented on the .NET CLR. Appendix B provides a summary of the
Microsoft .NET architecture. The environment supports message passing between
processes (more precisely Application Domains) using a more programmer-friendly
interface than Message Passing Interface (MPI).
This work has been partially sponsored by the National Infrastructure Simulation and
Analysis Center of Sandia National Laboratories.
This is a master-slave architecture in which the master computer is called a "Proxy" and
the slave computer a "Worker". The master is called the Proxy because it provides
users with "proxy" objects for manipulating the Workers. For example, a user can, over
the Internet, command one or more of the Worker threads to "Start/Resume", "Pause" or
[Figure 2.13 labels, partly illegible in the source: Worker Process; Event Subsystem]
Figure 2.13: Components of GridGarden.NET
The computing is done through a number of "worker processes". Workers, the computers
that hold the worker processes, are distributed across the network and identified by URLs.
In the current design, two communication channels are provided to communicate with the
Proxy, namely, the .NET TCP Remoting channel and the HTTP Web Service channel. The TCP
Remoting channel is used for the local area network (typically the Workers) due to its
high performance, while the HTTP channel is used for communication with Internet
clients, since the SOAP based Web Service is an open standard.
Programs running as worker processes are coded for the Common Language Runtime
(CLR), which is platform independent (though presently limited to Windows, Linux and
BSD). A grid program is normally a CLR dynamic-link library (DLL) that contains the
minimum functionality needed by the GridGarden.NET system to Start/Resume, Pause and
Kill it. We call the DLL a "Seed". The GridGarden.NET framework stores the DLLs in
the Seed Pool for later distribution to the Workers. A number of abstract seed classes are
provided as templates for the programmer. Typically, the programmer inherits and
extends the abstract classes to their specific needs, such as particle simulation.
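The inherit-and-extend pattern for seeds can be illustrated with a short sketch. GridGarden.NET itself is written for the CLR and its actual abstract classes are not shown in this chapter, so the Python below is an analogy only; the class names, method names, state strings, and the toy particle model are assumptions.

```python
from abc import ABC, abstractmethod


class Seed(ABC):
    """Base class exposing the minimum controls a framework would need."""

    def __init__(self):
        self.state = "created"

    def start(self):
        # Covers both the initial start and resuming after a pause.
        if self.state != "killed":
            self.state = "running"

    def pause(self):
        if self.state == "running":
            self.state = "paused"

    def kill(self):
        self.state = "killed"

    def run_steps(self, count):
        """Advance the computation, but only while the seed is running."""
        for _ in range(count):
            if self.state != "running":
                break
            self.step()

    @abstractmethod
    def step(self):
        """One unit of application-specific work."""


class ParticleSeed(Seed):
    """Hypothetical particle simulation: constant-velocity motion in 1D."""

    def __init__(self, positions, velocity, dt=1.0):
        super().__init__()
        self.positions = list(positions)
        self.velocity = velocity
        self.dt = dt

    def step(self):
        self.positions = [p + self.velocity * self.dt for p in self.positions]


seed = ParticleSeed([0.0, 1.0], velocity=0.5)
seed.start()
seed.run_steps(2)   # two steps while running
seed.pause()
seed.run_steps(5)   # ignored while paused
paused_positions = list(seed.positions)
seed.start()        # resume
seed.run_steps(1)
seed.kill()
```

The base class owns the lifecycle, so the application programmer only supplies the step logic, which mirrors the template role the abstract seed classes play in the text.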
One important idea to be introduced is the "Net Application Domain" (NAD). Similar to
MPI's "Communication Group", the NAD facilitates identification and communication of
worker processes for a Grid Application. One difference between the NAD and the MPI
Communication Group is that the NAD is not just for communication. It also provides a
shared memory block, similar to a global COMMON, for worker processes, and its data
can be exported over the Internet. The shared memory block is called "Global". It is
critical for Grid Computing because it allows results from all the Workers to be
aggregated and fed on to other systems, such as a graphics server or a database server.
Grid users can access information about a simulation or control the application by
reading or writing the Global data. Data stored in Global can be arbitrary serializable
objects and can represent any data type.
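A rough picture of the Global block is a shared key-value store that accepts any serializable object and can be read back for aggregation. The sketch below uses threads in one process as stand-ins for Workers and pickle as the serializer; the class name, key scheme, and methods are assumptions and say nothing about how GridGarden.NET implements Global over .NET Remoting or Web Services.

```python
import pickle
import threading


class GlobalBlock:
    """Shared store holding serialized objects (an illustrative stand-in)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._blobs = {}

    def write(self, key, value):
        blob = pickle.dumps(value)  # raises if the value is not serializable
        with self._lock:
            self._blobs[key] = blob

    def read(self, key):
        with self._lock:
            blob = self._blobs[key]
        return pickle.loads(blob)

    def export(self):
        """Snapshot of all entries, e.g. for a graphics or database server."""
        with self._lock:
            snapshot = dict(self._blobs)
        return {key: pickle.loads(blob) for key, blob in snapshot.items()}


def worker(block, worker_id, chunk):
    # Each worker publishes its partial result under its own key.
    block.write(f"partial/{worker_id}", sum(chunk))


block = GlobalBlock()
chunks = [[1, 2, 3], [4, 5], [6]]
threads = [
    threading.Thread(target=worker, args=(block, i, chunk))
    for i, chunk in enumerate(chunks)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
total = sum(block.export().values())
```

Storing serialized bytes rather than live objects mimics the constraint in the text that Global data must be serializable so that it can be exported to other systems.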
Controlling the Grid Application is currently achieved through a Windows GUI that
allows the Grid Application administrator to Run, Pause, Kill, etc., which forms a mini
distributed operating environment that is quite similar to UNIX. One difference is that
GridGarden.NET commands manage the distributed application through Web
The framework is mostly coded in C#. However, C++ and VB for .NET can
also be used since they are supported and can be compiled into CLR code. There are
other language bindings for the CLR, mostly Open Source, such as Perl,
Python and even PHP. These are also potential languages for GridGarden.NET developers
but at present have not been tested. High performance programming libraries, such as
LAPACK, can be easily embedded into GridGarden.NET through interoperation. The current
system contains a matrix manipulation class "Matrix" in GridLib as a wrapper for a
native code function written in C. This Matrix class is a good illustration of how to write
a wrapper for legacy libraries.
3.1 Grid Computing Today - Current State of Affairs
The buzz around Grid Computing has picked up considerable momentum in recent years
within the commercial realm where huge amounts of information are available but not
properly understood. Grid Computing should not be regarded as the silver bullet but
rather a tool that will help businesses and entities to leverage their current computing
resources and be able to access others' when needed.
This momentum is validated by the adoption of Grid Computing technologies into the
corporate strategies of several major IT vendors. Although the trend shows the enormous
Grid Computing market opportunity [6], there are still some barriers that need to be
overcome. According to a market study report by Platform Computing [19], there is an
overwhelming consensus that non-technical barriers, including organizational politics,
played an important role in the implementation of Grid Computing. A further analysis
revealed four key organizational issues, as illustrated in Figure 3.1:
Loss of control or access to resources
Risks associated with enterprise-wide deployment
Loss or reduction of budget dollars
Reduced priority of projects
The key issues mentioned above were also highlighted by Sharon Green, Director of
Utility Computing at EDS, and Ed Reynolds, EDS Fellow [25]. In order to overcome
these barriers, corporate-wide education about the benefits and realities of Grid
Computing implementation is important. The study also found that corporate culture has
evolved such that employees perceive that they own the corporate assets that they
use for their jobs, which makes it difficult for them to share those resources (loss of
control) in exchange for access to a larger pool of resources. Depending on the type of
solution a corporation chooses, the resources can remain under the total control of the
organization.
Figure 3.1: Organizational issues ranked "Extremely High" and "High" in severity [19]
The risks of enterprise-wide deployment can be mitigated by initially grid-enabling
one or two applications and then scaling up across the enterprise. The fear of losing
budget, on the other hand, is offset by the savings from not purchasing additional
expensive hardware and by the increased computing capability that completes tasks
faster. Once the shared infrastructure is established, one would be able to move
resources and processing around, and migrate resources based on priorities. These
priorities need to be decided at an organizational level and implemented top-down.
Security and standards are also key factors relating to the adoption of Grid Computing by
various organizations. The broad range of standards initiatives currently focused on grid
computing is a reliable indicator of the acceptance, and ultimately widespread usage, of
this technology [22]. Several grid standards groups, such as the Global Grid Forum, the
Globus Alliance, and OASIS, have formed working groups to develop standards
addressing data transport, interoperability, security and integration with existing
technologies like web services.
The development of grid standards will also give customers the flexibility to take
advantage of open-source and commodity products where appropriate while avoiding
being locked into a proprietary vendor solution.
3.2 Industry Perspective - Hierarchy for Types of Grid Computing
In the previous chapter, several key players were identified and their 'Grid Computing'
implementations or initiatives were presented. In this section the hierarchy for types of
Grid Computing is introduced and industry perspectives are highlighted. This is followed
by several real world case study implementations of the major vendors' 'Grid
Computing' solutions in the next section.
At least three levels in the hierarchy of Grid Computing types have been
identified [28]. Each level implies a different mode of communication and
orchestration of resources among different entities. The levels are explained below:
Level 1 Grid Type:
A Level 1 Grid suits highly parallelizable, i.e. non-coupled, problems such as
Monte Carlo simulations, where the same code runs on every machine but operates on
different data. In general, a user distributes code to be executed across a network
of machines and coordinates the messaging between the machines so that some
computational goal is achieved. The metaphor of Master and Worker machines can be
used here for illustration. The coordination of the machines is achieved by using the
Master machine. Here it is assumed that the Master is a "special" trusted machine that has
coordination and other responsibilities not shared by the Workers [1].
The Master controls the workload on each Worker, handles message routing and
coordination, and queries or distributes agents to remote machines and gathers results.
At this level there is no Worker-to-Worker communication or messaging. This is
illustrated in Figure 3.2.
Figure 3.2: Master-Worker Level I Grid Scenario
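The Level 1 pattern described above can be sketched with a small Monte Carlo estimate of pi. This is an illustration under stated assumptions, not code from any grid product: the function names are hypothetical, and a local thread pool stands in for the network of Worker machines. Every Worker runs the same code on different random data, and only the Master collects results.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def worker_task(seed: int, samples: int) -> int:
    """Same code on every Worker; the seed gives each one different data."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def master_estimate_pi(n_workers: int = 4, samples_per_worker: int = 50_000) -> float:
    """The Master distributes the work and gathers the partial counts."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [
            pool.submit(worker_task, seed, samples_per_worker)
            for seed in range(n_workers)
        ]
        total_hits = sum(f.result() for f in futures)
    return 4.0 * total_hits / (n_workers * samples_per_worker)


estimate = master_estimate_pi()
```

Because the Workers never exchange messages, adding Workers only changes how the samples are split, which is what makes this class of problem straightforward to distribute.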
Level 2 Grid Type:
In a Level 2 Grid, Workers may execute different code and act on different data, and
the Master may or may not coordinate the Workers. The Master distributes the code to
the different Workers and starts the execution of the tasks on them. The Workers in
this scenario can communicate with each other, possibly via the Master. This is illustrated
in Figure 3.3. Issues such as large message sizes, buffers and timing need to be
considered due to the communication between the Workers and the Master.
Figure 3.3: Level 2 Grid Type
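A Level 2 arrangement, in which Workers run different code and exchange messages through the Master, might look like the sketch below. All class and function names are hypothetical, and the Workers run one after another in a single process purely to keep the example deterministic; a real grid would run them concurrently on separate machines.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple

Message = Tuple[str, object]  # (sender name, payload)


class Worker:
    def __init__(self, name: str, task: Callable[["Worker", "Master"], None]):
        self.name = name
        self.task = task                      # each Worker may run different code
        self.inbox: Deque[Message] = deque()
        self.results: List[object] = []


class Master:
    def __init__(self) -> None:
        self.workers: Dict[str, Worker] = {}
        self.routed = 0                       # number of messages relayed

    def register(self, worker: Worker) -> None:
        self.workers[worker.name] = worker

    def send(self, sender: str, recipient: str, payload: object) -> None:
        # Worker-to-Worker traffic is relayed through the Master.
        self.workers[recipient].inbox.append((sender, payload))
        self.routed += 1

    def run(self) -> None:
        # Start each Worker's task in registration order.
        for worker in list(self.workers.values()):
            worker.task(worker, self)


def produce_squares(worker: Worker, master: Master) -> None:
    for n in range(1, 5):
        master.send(worker.name, "reducer", n * n)


def reduce_sum(worker: Worker, master: Master) -> None:
    total = 0
    while worker.inbox:
        _sender, value = worker.inbox.popleft()
        total += value
    worker.results.append(total)


master = Master()
master.register(Worker("producer", produce_squares))
master.register(Worker("reducer", reduce_sum))
master.run()
```

The inbox queues are where the message size, buffering and timing concerns mentioned above would surface once the Workers run in parallel.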
Level 3 Grid Type:
Level 3 Grid Type represents true Internet grids with cross domain computing. This
involves multiple grids combining to form super grids where the grid nodes are web
addressable (Figure 3.4). This involves highly complex workflow and coordination where
communication not only exists directly between Workers but also between Masters
within the hierarchy of multiple grids.
Figure 3.4: Level 3 Grid Type
The primary concerns in a Level 3 Grid include security across multiple domains and
the complexity of communication, coordination and scaling.
The Industry View
Current Grid Computing implementations already fall within Level 1 and
Level 2. Most Level 2 implementations use the Message Passing Interface (MPI),
which is a de facto standard for communication among the nodes running a parallel
program on a distributed memory system. As mentioned earlier, security is of primary
concern in Level 3 implementations, and using multiple levels of security
abstraction also raises the concern of security holes for corporations, given the
sensitivity of their data. It is also recognized in the industry that building Grid
applications requires a paradigm shift among programmers, who must focus on developing
and reconfiguring applications to meet the requirements of grid computing, similar to
the way this was done on mainframes in the past [25]. Currently, the major vendors,
such as IBM, Sun and Oracle, and their partners are implementing unique solutions
for their corporate clients (see next section).
The challenge of implementing Grid Computing is further complicated by the variety of
operating systems that run in corporations; if open source is not adopted
wholeheartedly within the corporation, the Globus standard might be difficult to
implement [25]. Web Services could play a crucial role in connecting these disparate
resources if the standards are agreed upon. One consistent set of standards is being
developed by Microsoft under the title of Global XML Web Services Architecture (GXA).
However, OGSA is proposing others, such as WS-Resources, that are not compatible with
GXA, and therefore there is some confusion as to which standards will prevail [1].
3.3 IBM Case Study: Hewitt Associates
Hewitt, an IBM supplier and customer, is a global outsourcing and consulting firm that
offers a complete range of human resource services. High-volume printing, quick
reaction time to changing customer needs and around-the-clock availability of the print
process are important to Hewitt, which offers printed documents to customers as a
primary means of communication [24].
Hewitt needed a highly available print composition system capable of satisfying customer
demands while integrating transparently and cost effectively with its existing system. The
challenge was to partner with Hewitt Associates to build an enterprise print composition
solution that was cost-effective, resilient and scalable. Sefas Innovation's Open Print
Backstage module, running on IBM BladeCenter systems and Red Hat Linux, was
selected as the solution. To establish that Sefas' Open Print Backstage module
could be grid enabled to satisfy Hewitt's demanding print composition requirements, a
proof of concept (POC) exercise was staged at the IBM Innovation Center in Waltham,
Hewitt had successfully implemented a grid for another application and thought the Open
Print Backstage module was a good candidate for the next Hewitt grid implementation.
Backstage - part of the Sefas Open Print suite that offers complete document
management software infrastructure for enterprise publishing - is the high-performance
document production engine that drives the high-volume printing of transactional
documents. Backstage is optimized for volume batch printing throughput and features an
easy-to-use drag-and-drop interface that allows any application to be put into production
from any workstation.
Proof of Concept Results
The IBM grid team, in concert with teams from Sefas, Hewitt and DataSynapse,
conducted the POC grid enablement exercise. The test environment featured multiple
IBM BladeCenter and IBM xSeries systems running Red Hat Linux, as well as z/OS.
DataSynapse GridServer provided the grid infrastructure software layer. DataSynapse is
also an IBM Business Partner.
The end-to-end integration testing showed conclusively that Hewitt jobs submitted to the
Sefas grid were processed successfully, with all business and technical objectives achieved.
Most importantly, the test allowed Sefas to provide added value to Hewitt, its customer.
"The proof of concept grid enablement exercise was important to us for three reasons,"
said Jean-Philippe Sarraut, CEO of Sefas Innovation. "First, we've had a long and
successful relationship with IBM that we want to continue. Second, most of our
customers are using IBM infrastructure solutions, so ensuring our solutions work in an
IBM environment is a must. Lastly, we've been exploring ways of splitting and spreading
CPU-consuming tasks to low-cost commodity hardware platforms. IBM, Sefas and
Hewitt are at the forefront of support for new technologies such as grid computing,
Blades, open standards and Linux, so the timing of the grid enablement evaluation
couldn't have been better from our perspective."
Key Business Benefits:
* Reduced the time required to set up a new client from 12 months with their previous
system to 3 months
* The new architecture allowed Hewitt to absorb production sample reviews at a
fraction of the cost of using mainframe-based architecture
* The new system can be scaled up as needed
* System resilience avoids reprocessing very large files when data is corrupt or invalid
* Workflow is more fluid, leading to more predictable processing times
* Use of IT resources is maximized using grid technology
3.4 SUN Case Study - Axyz Animation, Inc. 2
The world of digital special effects is constantly concerned with the time and effort it
takes for animators and technical directors to create many versions of various shots and
farm these shots off to be rendered frame by frame for review. Since "rendering" is such a
compute-intensive task, finding the available computing resources and distributing the
work effectively can end up being tremendously time consuming. "Our scripts had to be
manually tweaked for each and every render to run on various machines," explains John
Coldrick, senior animator/V.P. at Axyz Animation, Inc. "The available machines and
CPUs were constantly changing and required monitoring. Our process was very
inefficient and error prone--animators and technical directors were constantly waiting for
render tests."
2 Sun Microsystems Case Studies [20]
Based in Toronto, Ontario, Canada, Axyz Animation is a small- to mid-sized company
that produces digital special effects for commercials, television series, and films. For
large animation shops, having a "render farm solution" is a necessity; for smaller
shops, a custom-made solution is not an economical option. Unfortunately,
off-the-shelf packages do not typically work acceptably in the animation environment,
so most large animation shops have solutions that are written from the ground up.
"What we needed was a flexible solution that didn't require re-inventing the wheel as far
as distributed processes went, and yet was flexible enough for us to implement things the
way we wanted," explains Coldrick. After extensive research, Coldrick read about Sun
ONE Grid Engine software developed by Sun Microsystems, Inc. He continues, "What
we discovered in Sun Grid Engine software was a remarkably robust, flexible, and
scalable product that fit our needs like a glove. In fact, due to the scalability of Sun Grid
Engine software, it would easily work as a solution for the larger animation shops as
Sun Grid Engine Software to the Rescue
After switching to Sun Grid Engine software, Axyz animators could submit any
process--animation or render--with the same command. "Animators don't have to be concerned
about what machine is available, or massaging their scripts to maximize speed," explains
Coldrick. "Every available CPU in our farm is put at their disposal, and jobs that typically
took a whole night in the past can often run in a fraction of the time. Bottlenecks are a
thing of the past."
Sun Grid Engine software works by enabling companies to submit and manage jobs from
just about any Linux or UNIX system on the network. It does this by monitoring the
availability of workstations, then deploying jobs to the available resources. Additionally,
the command line utility gives the company the flexibility to script and automate jobs as
well as build a custom front end. And the GUI provides the business a convenient
management tool for administering the Sun ONE Grid Engine software. "We've been able
to implement application-specific licensing scenarios, such as applications that will run
multi-threaded on one machine without an additional token penalty," says Coldrick. "This
helps utilize all of our CPUs to their maximum. By taking the significant technical task of
managing distributed processing off my plate, I was free to focus on an implementation
that worked for our needs."
Sun Grid Engine software can be set up in three different environments depending on the
company's requirements. In the Cluster Grid environment, a single group uses the grid,
and a company could have multiple Cluster Grids set up in different locations. Axyz
used the Cluster Grid environment.
"I was able to set up a single group Cluster Grid in approximately two weeks of spare
time after regular working hours--not very long at all," continues Coldrick. "I wrote script
wrappers that not only allowed our staff to use language that was familiar to them, but
allowed us to adapt to the specific requirements of the applications that we run." In
addition, Axyz was able to easily write utility scripts that were tailored to its everyday
tasks because of the open nature of the Sun Grid Engine software.
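A script wrapper of the kind Coldrick describes could be as small as the sketch below, which only builds a submission command and does not run it. It is not Axyz's actual wrapper: the function name, the job name and the script render_frame.sh are hypothetical. The qsub options used (-N for the job name, -cwd to run in the current directory, -t for an array of tasks) are standard Grid Engine options; each task would read its frame number from the SGE_TASK_ID environment variable.

```python
import shlex
from typing import List


def build_render_submission(job_name: str, script: str,
                            first_frame: int, last_frame: int) -> List[str]:
    """Build a qsub command that renders one frame per array task."""
    # Grid Engine array task IDs start at 1.
    if first_frame < 1 or last_frame < first_frame:
        raise ValueError("frame range must satisfy 1 <= first_frame <= last_frame")
    return [
        "qsub",
        "-N", job_name,                        # job name shown in the queue
        "-cwd",                                # run from the submission directory
        "-t", f"{first_frame}-{last_frame}",   # one array task per frame
        script,                                # hypothetical render script
    ]


command = build_render_submission("shot42", "render_frame.sh", 1, 120)
printable = " ".join(shlex.quote(part) for part in command)
```

Returning the command as a list rather than a single string lets the caller pass it to a process launcher without shell quoting problems, while the joined form is convenient for logging.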
Axyz picked up a short-term project to work on a TV pilot for ABC. For this pilot, they
were asked to generate a demanding 120 shots in three weeks. To accommodate this
project, Axyz easily developed a second Sun Grid Engine software group. "I set up a
second Sun Grid Engine software group for that short time, and combined with a
powerful production pipeline that we developed, we were able to bring in quality work,
on time, on budget, and without killing any of our staff," explains Coldrick. "Without Sun
Grid Engine software this could not have been possible, since in such an incredibly short
time having to haggle with managing something as mundane as render management
would have killed the project."
All in all, Sun Grid Engine software has freed up significant amounts of time for Axyz
animators, and let them focus on what they do best--animate. "In addition, turnaround
time for tests has dropped significantly, allowing for more refining in the same amount of
time," concludes Coldrick. "Human error has also dropped, as there are far fewer scripts
that need to be edited to get the job done. We couldn't be happier."
Key Business Benefits:
* Animation and render jobs submitted and managed quickly, efficiently, and reliably
by Sun ONE Grid Engine software
* Dramatically reduced time to do animation or render jobs from overnight to 1-2 hours
* Completely eliminated bottlenecks from animation process
* Significantly increased server utilization rates to almost 95 percent
* Helped enable small animation company to cost-effectively distribute jobs to and
from Linux systems
* Provided robust and flexible solution helping to allow company to easily grow with
future business by simply adding compute power
3.5 Oracle Case Study - Chicago Stock Exchange 3
In 2002, Chicago Stock Exchange (CHX) was exhausting the capacity of its legacy
database server in its previous configuration, which was cumbersome to manage.
Recovery from a hardware failure could take up to three hours, at worst, but outages of
15 to 20 minutes, which were more common, were also unacceptable. When replacing its
legacy system, one challenge CHX faced was estimating capacity for a new system, as
the stock market was entering a downturn at that time. If CHX bought more server power
than it needed, it would be paying for idle capacity. However, if it under-bought server
capacity, it risked throwing away the hardware investment when market activity picked
up and business needs required a larger machine.
CHX identified that its high-availability and hardware-economy requirements could be
met by implementing an enterprise grid solution using Oracle Database with Real
Application Clusters, Oracle Application Server, and Oracle Enterprise Manager. This
move allows CHX to limit its initial hardware investment to the server power it
currently needs, and provides the flexibility to incrementally scale out system capacity as
demand dictates. CHX is upgrading its system to Oracle 10g to further improve system
automation and centralized management. CHX increased the number of servers in its grid
configuration from two to four, further minimizing the risk of downtime.
Server Utilization
Prior to using Oracle Real Application Clusters, CHX ran on two HP Alpha Server GS60
servers. "We recorded a lot of trading information during the day, and then at night we
would process it," said David Milne, director of database technologies for the Chicago
Stock Exchange. "That meant one of these big machines was virtually idle while the
other was very busy. We could failover and run on one machine, but that was just to get
us through a crisis." CHX runs its online customer service, batch reporting, and data
mining systems along with a near-real-time decision support system on Oracle. The
system is managed through Oracle Enterprise Manager Grid Control and runs on
economical HP ES40 servers.
The resulting infrastructure has greater flexibility for allocating workloads to get the most
out of the system. "Since I now have jobs divided onto four machines, it's easier to see
where the problems are and better tune the systems," Milne added. "We can now quickly
identify and fix problems instead of just having to live with the problems because we
weren't able to see what was going on." CHX officials also note the system consistently
achieves more predictable CPU utilization rates with Oracle's grid environment and is
capable of handling mixed workloads, including batch loads and online transaction
processing systems. Over time, CHX can expand its 10g environment by scaling out with
additional server and storage modules--a strategy that Mainstay Partners, an independent
consulting firm, predicts will avoid costly hardware replacements.
Oracle 10g Solution Results
CHX officials expect significant productivity increases, operational benefits, and
improved customer satisfaction levels with its Oracle enterprise grid solution. These
benefits can be attributed to the built-in automation in Oracle Database 10g and the
self-tuning capabilities of Oracle Enterprise Manager 10g, such as Automatic Database
Diagnostic Monitor (ADDM). ADDM simplifies the query tuning process, eliminating
routine diagnostics and manual performance "fire drills."
Oracle Enterprise Manager Grid Control will also provide CHX greater visibility through
centralized system management. This approach will ease the challenge of managing
mixed workloads in an enterprise grid by dynamically monitoring and shifting resources,
as predetermined according to CHX's business rules. The increased automation allows
CHX to redeploy database administrators and systems administrators from routine
maintenance tasks to more strategic projects.
"One of my biggest goals is to continuously improve our computing environment,
making it seamless and transparent, so the customer never knows that there's a database
serving them," Milne said. "Oracle Real Application Clusters on HP keeps our database
up and running, allowing our customers to conduct business on their schedule. Oracle
helps us ensure maximum uptime so we're here whenever our customers need us."
Key Business Benefits:
* CHX's investment in Oracle technology will yield a return on investment of 171%
over five years and an internal rate of return of 47% (Mainstay)
* According to CHX executives, Oracle's enterprise grid computing solution has provided
opportunities for improved operations and customer service
* Cost savings stemming from headcount avoidance due to improved overall system
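For readers unfamiliar with the two financial measures quoted in the benefits above, one common way to define them is given below. The source does not state the methodology Mainstay used, so these formulas are an assumption for orientation only; B_t and C_t denote the benefits and costs in year t over a horizon of T years.

```latex
\mathrm{ROI} = \frac{\sum_{t=0}^{T} B_t - \sum_{t=0}^{T} C_t}{\sum_{t=0}^{T} C_t},
\qquad
\sum_{t=0}^{T} \frac{B_t - C_t}{(1 + \mathrm{IRR})^{t}} = 0
```

Under this reading, an ROI of 171% over T = 5 years would mean that cumulative benefits exceed cumulative costs by 1.71 times those costs, and the IRR is the discount rate at which the net present value of the yearly net cash flows is zero.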
The current state of Grid Computing has shown both its potential and the
barriers that exist, but the future perspective also needs to be taken into account. This
chapter discusses several drivers for the future of Grid Computing in industry as well
as in its existing areas.
4.1 Market Forecast
The potential for Grid Computing is huge and the projected investment by the major
vendors is sure to increase. The market drivers identified in Chapter 1 will play a
significant role in the next few years as the commercial grid computing technology
matures and cultural and political organizational acceptance increases. The market
opportunity of $12 billion and $1.8 billion in Western Europe is a driver in itself [6] [7].
IDC has been forecasting the virtual environment software markets to grow at a faster
rate than the operating system and subsystem market. This is, in part, due to its belief that
Grid Computing is going to be increasingly important to organizations [14].
One of the major drivers of the acceptance rate for Grid Computing
would be clarity in the different messages being marketed by the various industry
leaders such as IBM, Sun, Oracle and others. There seems to be no clear definition of Grid
Computing, and particularly of the standards that would lead to its ubiquity. Today there is a
mix of application-specific code, "off the shelf" tools and services from Globus, startups,
established IT vendors such as those mentioned earlier, and others in the Grid community.
These are all tied together by application development and system integration. But in
order for Grid Computing to reach the next level, there should be wider open source
implementations and more opportunities for newer small companies to participate, with
more investment from the major IT vendors. This could be achieved by building the right
'Eco-system' to nurture the organic growth and proliferation of Grid Computing. IBM,
for example, has announced its intention to build this Eco-system through partnering
with and supporting new business ventures in Grid Computing and creating new market
opportunities [27]. Similarly, EDS has its Agility Alliance with select key partners
such as Sun, HP, Microsoft and others, which is part of the EDS Agile Enterprise initiative to
promote Utility Computing services [25].
Utility Computing can also be a major driver in acceptance of the Grid Computing
concept. IT outsourcing in general has already played a major role in promoting the
evolution of Utility Computing. Evolutionary change through incremental adoption as IT
becomes more and more synchronized with business objectives is illustrated in Figure 4.1.
Figure 4.1: Evolutionary change through incremental IT adoption
There are already several models available for Utility Computing, such as Dedicated
(in-house), Shared (Outsourced) and Public (Data Center), along with pricing models based
on Compute (CPU cycles), Storage (GB) and Network (GB Bandwidth) that are already
implemented commercially [25].
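A usage-based charge along the three metered dimensions named above can be sketched as follows. The rates and usage figures are invented for illustration and do not come from any vendor's price list; CPU-hours are used here as a stand-in for metered compute, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class UtilityRates:
    """Hypothetical unit prices for the three metered resources."""
    per_cpu_hour: Decimal        # Compute
    per_gb_stored: Decimal       # Storage
    per_gb_transferred: Decimal  # Network


def monthly_charge(rates: UtilityRates, cpu_hours: int,
                   gb_stored: int, gb_transferred: int) -> Decimal:
    """Sum the metered usage of each resource at its unit price."""
    total = (rates.per_cpu_hour * cpu_hours
             + rates.per_gb_stored * gb_stored
             + rates.per_gb_transferred * gb_transferred)
    return total.quantize(Decimal("0.01"))


# Illustrative numbers only.
rates = UtilityRates(Decimal("0.10"), Decimal("0.05"), Decimal("0.02"))
bill = monthly_charge(rates, cpu_hours=1000, gb_stored=500, gb_transferred=200)
```

With these made-up rates the three components are 100.00, 25.00 and 4.00, so the charge is 129.00. Decimal arithmetic is used so that currency amounts are not affected by binary floating-point rounding.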
4.2 Technology and Standards Direction
The technology for Grid Computing has been evolving over several years and the pace
has picked up with the endorsements of the larger vendors. The Globus standard has been
at the forefront as the de facto open standard for the grid, but several other projects for
different platforms are already available, as mentioned in previous chapters. In the recent
past, tech companies have delivered products that make it relatively easy to set up grids,
although complexity increases if the same operating software is not used within the
hierarchy of Grid Computing types. Technology development based on open standards
will play a major role in enhancing the future of grid computing, as was mentioned earlier.
For now, as technology improves, so does the disparity among systems and their
implementations. A clear lack of standards will hinder the global Grid Computing
strategy. Figure 4.2 provides a list of Grid-related standards organizations that are
promoting, defining and evolving different aspects of Grid Computing [23].
* Research and Industry, use cases, architectures and specifications (OGSA, OGSI/WSRF)
* Promote and grow Enterprise grid computing
* Distributed Mgt. standards and models (CIM)
* eBusiness & Web Services Management (WS-RF, WS-Notification, WSDM, ...)
* Internet architectures & specifications (SNMP, SMI)
* Web Services architectures and specifications
* Advance the adoption of storage networks as complete and trusted solutions
Figure 4.2: Grid-related standards organizations
For example, the Enterprise Grid Alliance (EGA) is a
consortium of leading vendors and customers focused on developing Enterprise Grid
solutions. EGA is open, independent and vendor-neutral. Anyone can join by executing
relevant agreements and paying dues; there are no admission barriers. Its technical
scope includes grid activities within enterprise data centers, but not desktop grids; using
proven and standard enterprise components, but not vector supercomputers; within and
between trusted and secure enterprises, but not involving dynamically defined virtual
organizations, and for use with enterprise commercial and technical applications, but not
scientific computing or academic research grids. EGA's ultimate goal is to unify Grid
Computing within and between enterprises to support true cooperative processing and not
just message passing [23]. Appendix D provides summaries about the other standards
Collaboration among the various standards organizations will be a key enabler for
tackling the challenges of technology and standards for wide Grid Computing acceptance.
[Figure 4.3 labels: Silos, Shared, Federated; Network Fabric, Storage Fabric]
Figure 4.3: Vision of the Grid Computing journey
Figure 4.3 illustrates the journey ahead for Grid Computing and one should keep in mind
that the journey is not only about technology but also people, process and finance [23].
These are also the organizational issues identified as barriers to Grid Computing
implementation in Chapter 3.
4.3 Industry Perspectives
Corporations are taking a wait-and-see approach towards the adoption of Grid Computing
into their corporate infrastructures. Utility Computing service provider EDS is taking a
cautious approach to adopting Grid Computing as part of its major initiatives, while
remaining vigilant about future opportunities and disruptive technologies leading to wider
acceptance of Grid Computing [25]. In contrast, IBM, Sun, Oracle and others are touting
Grid Computing enabled software and hardware services and are investing heavily in
marketing their products and services. The case studies in the previous chapter provided
specific examples of Grid Computing based solutions within corporations, but they did not
reflect the ideal of true ubiquity of the grid. Academia and scientific foundations, as well
as government initiatives, are trying to push those boundaries since they are not restricted
by the business justification of cost and benefit that corporations have to take into
The paradigm shift among application programmers toward building deeper Grid Computing
support into applications, mentioned in Chapter 3, will play a key role in grid-enabling
various software. This cultural shift, although not new since it
already existed in the mainframe days, will be very important and is recognized by the
industry [25].
Finally, technology is driving the standards, and open source standards versus
homogenous standards 4 will have a global impact on the adoption of Grid Computing [27].
4 Term used here to illustrate alliances and partnerships among vendors to promote their joint Grid
Computing standards
5.1 Discussions
Grid Computing has existed within academia and the scientific community
for several years and there are many different projects currently active worldwide.
Commercial interest in Grid Computing has gained momentum and there are several
major vendors offering different initiatives in Grid Computing. What is interesting is
that there seems to be confusion about the definitions of Grid Computing and Utility
Computing, and the terms are often used interchangeably. The market opportunity predicted
for Grid Computing is very extensive and this will be a key driver in expanding Grid
Computing in the commercial space [6]. Chapter 2 introduced the Grid Computing
infrastructure and the de facto Globus standard. The convergence of Grid Computing and
Web Services was also highlighted with the emergence of WSRF and GXA. Some of the
major vendors', like IBM, Sun, Oracle and HP, Grid Computing initiatives were
Microsoft .NET Grid framework at MIT, were also presented to illustrate the use of the
Grid standard.
The current state of affairs of Grid Computing was discussed in Chapter 3 along with
three case studies of major vendor implementations of Grid Computing solutions. The
organizational barriers to Grid acceptance were listed and the hierarchy of Grid types was
introduced. There are already implementations of Level 1 and Level 2 Grid Types,
but Level 3 Grid Types raise issues of security across domains and of the
complexity of communication, coordination and scaling, among others. This is
further complicated by the use of different operating systems across
multiple domains and also within domains. A new paradigm of application development
supporting Grid Computing was also mentioned.
The future state of Grid Computing with regard to marketing, standards and technology
directions was discussed in Chapter 4. The market opportunity, as mentioned earlier,
seems to be driving some of the commercial movement in the Grid Computing space, along with
the formation of and participation in various standards organizations. Standardization and a
broader acceptance of open standards, along with identification of the key business
benefits of Grid Computing, will be essential for commercial acceptance.
5.2 Conclusions
The value of Grid Computing to a company will be measured by the business benefit it
provides. For that benefit to be realized, the standards need to be defined, and a new way
of thinking about using otherwise idle CPU cycles to achieve business benefits has to
become part of the organization's culture, supported by a long-term vision.
Corporations are currently evaluating Grid Computing solutions on an as-needed basis
and are not yet thinking in terms of how parallel processing could improve the
business. Globus is the de facto standard that most organizations are following, but
challenges exist in implementing this standard in a heterogeneous operating environment.
The flexibility, reliability and resiliency, together with the IT cost reduction, that
companies demand can be realized by implementing Grid Computing. Solutions that address
business and engineering computing needs are currently available from the major vendors
and their partners, as presented in the sample case studies. It can be concluded that
standards, security and the convergence of Grid and Web Services technologies in an Open
Source environment are some of the key enablers towards the ubiquity goal of Grid Computing.
XML Web Services
XML Web services are units of application logic that provide data and services to other
applications. Applications access XML Web services by means of industry standard Web
protocols and data formats, such as HTTP, XML, and Simple Object Access Protocol
(SOAP), regardless of how each XML Web service is implemented. Web service
architecture involves many layered and interrelated technologies (Figure A.1) [32]. One of
the primary advantages of the XML Web services architecture is that it allows programs
written in different languages on different platforms to communicate with each other in a
standards-based way.
Figure A.1: Web Services Architecture Stack
One of the core characteristics of an XML Web service is the high degree of abstraction
that exists between the implementation and the consumption of a service. By using
XML-based messaging as the mechanism by which the service is created and accessed, both
the XML Web service client and the XML Web service provider are freed from needing any
knowledge of each other beyond inputs, outputs, and location.
Simple Object Access Protocol (SOAP)
SOAP defines how messages are formatted, sent, and received when working with XML
Web services. SOAP is also an industry standard that is built on XML and HTTP. Any
platform that supports the SOAP standard can support XML Web services. In other
words, SOAP is the communications protocol for XML Web services. At its core, SOAP is a
specification that defines the XML format for messages. If you have a well-formed XML
fragment enclosed in a couple of SOAP elements, you have a SOAP message. There are
other parts of the SOAP specification that describe how to represent program data as
XML and how to use SOAP to do Remote Procedure Calls. These optional parts of the
specification are used to implement RPC-style applications where a SOAP message
containing a callable function, and the parameters to pass to the function, is sent from the
client, and the server returns a message with the results of the executed function. Most
current implementations of SOAP support RPC applications because programmers who
are used to doing COM or CORBA applications understand the RPC style. SOAP also
supports document style applications where the SOAP message is just a wrapper around
an XML document. Document-style SOAP applications are very flexible and many new
XML Web services take advantage of this flexibility to build services that would be
difficult to implement using RPC.
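The difference between the two styles can be illustrated with a minimal Python sketch. It
assumes the SOAP 1.1 envelope namespace; the operation name GetJobStatus, the PurchaseOrder
document and the urn:example namespaces are hypothetical and serve only as illustration.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
ET.register_namespace("soap", SOAP_ENV)


def wrap_in_soap(payload):
    """Enclose a well-formed XML element in SOAP Envelope and Body elements."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    body.append(payload)
    return envelope


# RPC style: the Body child names the function to call and its children
# carry the parameters (hypothetical operation and namespace).
rpc_call = ET.Element("{urn:example:grid}GetJobStatus")
ET.SubElement(rpc_call, "{urn:example:grid}jobId").text = "42"
rpc_message = wrap_in_soap(rpc_call)

# Document style: the Body simply wraps a complete XML document.
order = ET.fromstring(
    '<PurchaseOrder xmlns="urn:example:orders"><item sku="A1" qty="3"/></PurchaseOrder>'
)
doc_message = wrap_in_soap(order)

print(ET.tostring(rpc_message, encoding="unicode"))
```

In both cases the envelope is identical; only the interpretation of the Body content
differs between the RPC and document styles.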
The last optional part of the SOAP specification defines what an HTTP message that
contains a SOAP message looks like. This HTTP binding is important. The HTTP
binding is optional, but almost all SOAP implementations support it because it's the only
standardized protocol for SOAP. For this reason, there's a common misconception that
SOAP requires HTTP. Some implementations support MSMQ, MQ Series, SMTP, or
TCP/IP transports, but almost all current XML Web services use HTTP because it is
ubiquitous. Since HTTP is a core protocol of the Web, most organizations have a network
infrastructure that supports HTTP and people who understand how to manage it already.
The security, monitoring, and load-balancing infrastructure for HTTP are readily
available today.
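As an illustration of the HTTP binding, the following Python sketch prepares (but does not
send) an HTTP POST request that carries a SOAP 1.1 message. The endpoint URL, the
SOAPAction value and the GetJobStatus operation are hypothetical; only the standard
library's urllib is used.

```python
import urllib.request

# A SOAP 1.1 message serialized as bytes (hypothetical operation).
SOAP_BODY = (
    '<?xml version="1.0" encoding="utf-8"?>'
    '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
    '<soap:Body>'
    '<GetJobStatus xmlns="urn:example:grid"><jobId>42</jobId></GetJobStatus>'
    '</soap:Body>'
    '</soap:Envelope>'
).encode("utf-8")

# In the SOAP 1.1 HTTP binding the message is the body of a POST request,
# sent as text/xml together with a SOAPAction header.
request = urllib.request.Request(
    url="http://grid.example.com/services/jobs",  # hypothetical endpoint
    data=SOAP_BODY,
    method="POST",
    headers={
        "Content-Type": "text/xml; charset=utf-8",
        "SOAPAction": '"urn:example:grid#GetJobStatus"',
    },
)

# urllib.request.urlopen(request) would transmit the message; it is omitted
# here because the endpoint does not exist.
print(request.get_method(), request.full_url)
for name, value in request.header_items():
    # urllib normalizes the capitalization of header names;
    # HTTP header names are case-insensitive.
    print(f"{name}: {value}")
```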
Web Services Description Language (WSDL)
WSDL is an XML format for describing the network services that are offered by the
server. You use WSDL to create a file that identifies the services that are provided by the
server and the set of operations within each service that the server supports. For each of
the operations, the WSDL file also describes the format that the client must follow when
requesting an operation.
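A minimal sketch of how a client might read such a description is given below. The WSDL
fragment is a hypothetical, heavily abbreviated WSDL 1.1 document (bindings and service
endpoints are omitted), and the function list_operations is an illustrative helper rather
than part of any toolkit.

```python
import xml.etree.ElementTree as ET

# WSDL 1.1 namespace.
WSDL_NS = {"wsdl": "http://schemas.xmlsoap.org/wsdl/"}

# Hypothetical, abbreviated WSDL 1.1 description of one service operation.
SAMPLE_WSDL = """
<definitions name="JobService"
             targetNamespace="urn:example:grid"
             xmlns="http://schemas.xmlsoap.org/wsdl/"
             xmlns:tns="urn:example:grid"
             xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <message name="GetJobStatusRequest">
    <part name="jobId" type="xsd:string"/>
  </message>
  <message name="GetJobStatusResponse">
    <part name="status" type="xsd:string"/>
  </message>
  <portType name="JobPortType">
    <operation name="GetJobStatus">
      <input message="tns:GetJobStatusRequest"/>
      <output message="tns:GetJobStatusResponse"/>
    </operation>
  </portType>
</definitions>
""".strip()


def list_operations(wsdl_text):
    """Map each operation name to its (input message, output message) pair."""
    root = ET.fromstring(wsdl_text)
    operations = {}
    for port_type in root.findall("wsdl:portType", WSDL_NS):
        for op in port_type.findall("wsdl:operation", WSDL_NS):
            request = op.find("wsdl:input", WSDL_NS).get("message")
            response = op.find("wsdl:output", WSDL_NS).get("message")
            operations[op.get("name")] = (request, response)
    return operations


print(list_operations(SAMPLE_WSDL))
```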
Universal Description, Discovery and Integration (UDDI)
The Universal Description, Discovery and Integration (UDDI) specifications define a
registry service for Web services and for other electronic and non-electronic services. A
UDDI registry service is a Web service that manages information about service providers,
service implementations, and service metadata. Service providers can use UDDI to
advertise the services they offer. Service consumers can use UDDI to discover services
that suit their requirements and to obtain the service metadata needed to consume those
services.
The UDDI specifications define:
• SOAP APIs that applications use to query and to publish information to a UDDI
  registry
• XML Schema schemata of the registry data model and the SOAP message formats
• WSDL definitions of the SOAP APIs
• UDDI registry definitions of various identifier and category systems that may be used
  to identify and categorize UDDI registrations
Organizations building and managing secure XML Web services need to ensure that only
authorized parties are allowed to use the XML Web services and that the SOAP messages
sent and received by the XML Web services can only be modified or viewed by
appropriate parties. WS-Security describes how to use the existing W3C security
specifications, XML Signature and XML Encryption, to ensure the integrity and
confidentiality of SOAP messages. And together with WS-License, it describes how
existing digital credentials and their associated trust semantics can be securely associated
with SOAP messages.
Together, these specifications form the bottom layer of a comprehensive modular security
architecture for XML Web services. Future security
specifications will build on these basic capabilities to provide mechanisms for credential
exchange, trust management, revocation, and other higher-level capabilities.
The two initial security specifications provide the following capabilities:
• WS-Security is a simple, stateless SOAP extension that describes how digital
  credentials should be placed within SOAP messages, and how these credentials can be
  used to ensure a message's integrity and confidentiality. WS-Security describes how
  message integrity is maintained even for SOAP messages that use the WS-Routing
  specifications described below. Using WS-Security, XML Web services can examine
  incoming SOAP messages and, based on an evaluation of the credentials, determine
  whether or not to process the request. WS-Security supports a wide range of digital
  credentials and technologies including both public key and symmetric key cryptography.
• WS-License describes how several common license formats, including X.509
  certificates and Kerberos tickets, can be used as WS-Security credentials. WS-License
  includes extensibility mechanisms that enable new license formats to be easily
  incorporated into the specification.
Microsoft .NET Framework
The Microsoft .NET Framework is a platform for building, deploying, and running Web
Services and applications. It provides a highly productive, standards-based,
multi-language environment for integrating existing investments with next-generation
applications and services as well as the agility to solve the challenges of deployment and
operation of Internet-scale applications. The .NET Framework consists of two main parts:
the common language runtime (CLR) and a unified, hierarchical class library that
includes a revolutionary advance to Active Server Pages (ASP.NET), an environment for
building smart client applications (Windows Forms), and a loosely-coupled data access
subsystem (ADO.NET) as shown in Figure B.1 [33].
The .NET Framework is designed to fulfill the following objectives:
• To provide a consistent object-oriented programming environment whether object
  code is stored and executed locally, executed locally but Internet-distributed, or
  executed remotely.
• To provide a code-execution environment that minimizes software deployment and
  versioning conflicts.
• To provide a code-execution environment that guarantees safe execution of code,
  including code created by an unknown or semi-trusted third party.
• To provide a code-execution environment that eliminates the performance problems
  of scripted or interpreted environments.
• To make the developer experience consistent across widely varying types of
  applications, such as Windows-based applications and Web-based applications.
• To build all communication on industry standards to ensure that code based on
  the .NET Framework can integrate with any other code.
The .NET Framework can be hosted by unmanaged components that load the common
language runtime into their processes and initiate the execution of managed code, thereby
creating a software environment that can exploit both managed and unmanaged features.
The .NET Framework not only provides several runtime hosts, but also supports the
development of third-party runtime hosts.
Figure B.1: Microsoft .NET Architecture (layers, from top to bottom: languages such as
Visual Basic, C++ and C#; Common Language Specification; Web Forms and XML Web Services;
Base Classes; Common Language Runtime; Operating System)
The .NET Framework is an integral Windows component for building and running the
next generation of software applications and Web services.
Common Language Runtime
The common language runtime manages memory, thread execution, code execution, code
safety verification, compilation, and other system services (Figure B.2) [33]. These
features are intrinsic to the managed code that runs on the common language runtime.
With regard to security, managed components are awarded varying degrees of trust,
depending on a number of factors that include their origin (such as the Internet, enterprise
network, or local computer). This means that a managed component might or might not
be able to perform file-access operations, registry-access operations, or other sensitive
functions, even if it is being used in the same active application.
Figure B.2: Common Language Runtime Architecture
The common language runtime makes it easy to design components and applications
whose objects interact across languages. Objects written in different languages can
communicate with each other, and their behaviors can be tightly integrated. For example,
you can define a class and then use a different language to derive a class from your
original class or call a method on the original class. You can also pass an instance of a
class to a method of a class written in a different language. This cross-language
integration is possible because language compilers and tools that target the runtime use a
common type system defined by the runtime, and they follow the runtime's rules for
defining new types, as well as for creating, using, persisting, and binding to types.
Class Libraries
The .NET Framework class library is a collection of reusable types that tightly integrate
with the common language runtime. The class library is object oriented, providing types
from which your own managed code can derive functionality. This not only makes
the .NET Framework types easy to use, but also reduces the time associated with learning
new features of the .NET Framework. In addition, third-party components can integrate
seamlessly with classes in the .NET Framework. Base classes provide standard
functionality such as input/output, string manipulation, security management, network
communications, thread management, text management, and user interface design features.
The ADO.NET classes enable developers to interact with data accessed in the form of
XML through the OLE DB, ODBC, Oracle, and SQL Server interfaces. XML classes
enable XML manipulation, searching, and translations. The ASP.NET classes support the
development of Web-based applications and Web services. The Windows Forms classes
support the development of desktop-based smart client applications.
The .NET Framework also provides a collection of classes and tools to aid in
development and consumption of XML Web services applications. XML Web services
are built on standards such as SOAP (a remote procedure-call protocol), XML (an
extensible data format), and WSDL. The .NET Framework is built on these standards to
promote interoperability with non-Microsoft solutions. It is also possible to extend the
library by creating one's own classes and compiling them into libraries.
.NET Framework Security
The .NET Framework provides several mechanisms for protecting resources and code
from unauthorized code and users:
• ASP.NET Web Application Security provides a way to help limit access to a site by
  comparing authenticated credentials (or representations of them) to Microsoft
  Windows NT file system permissions or to an XML file that lists authorized users,
  authorized roles, or authorized HTTP verbs.
• Code access security uses permissions to help limit the access that code has to
  protected resources and operations. It helps protect computer systems from malicious
  mobile code and helps provide a way to allow mobile code to run safely. (Code
  access security, together with the policies that govern it, is referred to as
  evidence-based security.)
• Role-based security provides information needed to make decisions about what a user
  is allowed to do. These decisions can be based on either the user's identity or role
  membership, or both.
Alchemi: A .NET-based Grid Computing Framework [34]
A Microsoft Windows based grid computing infrastructure will play a critical role in the
industry-wide adoption of grids due to the large-scale deployment of Windows within
enterprises. This enables the harnessing of the unused computational power of desktop
PCs and workstations to create a virtual supercomputing resource at a fraction of the cost
of traditional supercomputers. However, there is a distinct lack of service oriented
architecture-based grid computing software in this space. To overcome this limitation, a
Windows-based grid computing framework called Alchemi was developed and implemented on
the Microsoft .NET Platform.
Figure C.1: Alchemi architecture and interaction between its components (diagram labels:
precompiled executables in any language; Parametric Modeling Environment; Gridbus Grid
Service Broker (GSB); Alchemi Actuator; Globus Actuator; Grid Threads; Alchemi Jobs;
Windows-based machines with .NET)
Alchemi follows the master-worker parallel programming paradigm in which a central
component dispatches independent units of parallel execution to workers and manages
them. This smallest unit of parallel execution is a grid thread, which is conceptually and
programmatically similar to a thread object (in the object-oriented sense) that wraps a
"normal" multitasking operating system thread. A grid application is defined simply as an
application that is to be executed on a grid and that consists of a number of grid threads.
Grid applications and grid threads are exposed to the grid application developer via the
object oriented Alchemi .NET API.
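The grid thread model described above can be sketched in Python as follows. This is an
illustration of the concept only and not the Alchemi .NET API: the class names GridThread,
SquareThread and GridApplication are hypothetical, and a local thread pool stands in for
the workers of a real grid.

```python
from concurrent.futures import ThreadPoolExecutor


class GridThread:
    """Smallest unit of parallel execution; subclasses override start()."""

    def start(self):
        raise NotImplementedError


class SquareThread(GridThread):
    """Example grid thread that squares a single number."""

    def __init__(self, value):
        self.value = value
        self.result = None

    def start(self):
        self.result = self.value ** 2


class GridApplication:
    """An application that consists of a number of independent grid threads."""

    def __init__(self):
        self.threads = []

    def add(self, thread):
        self.threads.append(thread)

    def run(self, workers=4):
        # A local pool stands in for the grid's workers in this sketch.
        def execute(thread):
            thread.start()
            return thread

        with ThreadPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(execute, self.threads))


app = GridApplication()
for n in range(1, 6):
    app.add(SquareThread(n))
print([t.result for t in app.run()])
```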
The Manager manages the execution of grid applications and provides services associated
with managing thread execution. The Executors register themselves with the Manager
which in turn keeps track of their availability. Threads received from the Owner are
placed in a pool and scheduled to be executed on the various available Executors. A
priority for each thread can be explicitly specified when it is created within the Owner;
if none is specified, the thread is assigned the highest priority by default. Threads are
scheduled on a Priority and First Come First Served (FCFS) basis, in that order. The
Executors return completed threads to the Manager, which subsequently passes them on to,
or has them collected by, the respective Owner.
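The scheduling rule (priority first, then FCFS among threads of equal priority) can be
sketched with a priority queue. The class ThreadPool below is hypothetical and is not
taken from the Alchemi source; it also assumes that a lower number denotes a higher
priority, a convention the text does not specify.

```python
import heapq
import itertools

# Assumption: lower numbers are scheduled first, so 0 is the highest priority.
HIGHEST_PRIORITY = 0


class ThreadPool:
    """Pool of pending grid threads ordered by priority, then arrival (FCFS)."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # monotonically increasing tie-breaker

    def submit(self, thread_id, priority=None):
        # Threads without an explicit priority receive the highest one.
        if priority is None:
            priority = HIGHEST_PRIORITY
        heapq.heappush(self._heap, (priority, next(self._arrival), thread_id))

    def next_thread(self):
        """Return the next thread to schedule, or None if the pool is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


pool = ThreadPool()
pool.submit("a", priority=2)
pool.submit("b")              # default: highest priority
pool.submit("c", priority=1)
pool.submit("d")              # same priority as "b", but arrived later
print([pool.next_thread() for _ in range(4)])
```

The arrival counter ensures that two threads with the same priority leave the pool in the
order in which they were submitted.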
The Executor accepts threads from the Manager and executes them. An Executor can be
configured to be dedicated, meaning the resource is centrally managed by the Manager,
or non-dedicated, meaning that the resource is managed on a volunteer basis via a screen
saver or by the user. For non-dedicated execution, there is one-way communication
between the Executor and the Manager. In this case, the resource that the Executor
resides on is managed on a volunteer basis since it requests threads to execute from the
Manager. Where two-way communication is possible and dedicated execution is desired
the Executor exposes an interface (Executor) so that the Manager may communicate
with it directly. In this case, the Manager explicitly instructs the Executor to execute
threads, resulting in centralized management of the resource where the Executor resides.
Thus, Alchemi's execution model provides the dual benefit of:
• Flexible resource management, i.e. centralized management with dedicated execution
  vs. decentralized management with non-dedicated execution; and
• Flexible deployment under network constraints, i.e. the component can be deployed
  as non-dedicated where two-way communication is not desired or not possible (e.g.
  when it is behind a firewall or NAT/proxy server).
Thus, dedicated execution is more suitable where the Manager and Executor are on the
same Local Area Network while non-dedicated execution is more appropriate when the
Manager and Executor are to be connected over the Internet.
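The two execution modes can be contrasted in a small Python sketch. The Manager and
Executor classes below are simplified, hypothetical stand-ins and not the Alchemi
implementation: in dedicated mode the Manager pushes threads to registered Executors,
while in non-dedicated mode the Executor initiates every exchange by polling the Manager.

```python
from collections import deque


class Executor:
    """Executes units of work handed to it."""

    def execute(self, work):
        return work()

    def poll(self, manager):
        """Non-dedicated mode: the Executor asks the Manager for work (pull)."""
        item = manager.request_thread()
        if item is None:
            return False
        thread_id, work = item
        manager.return_result(thread_id, self.execute(work))
        return True


class Manager:
    """Holds pending threads and collects completed results."""

    def __init__(self):
        self.pending = deque()
        self.completed = {}
        self.dedicated = []

    def submit(self, thread_id, work):
        self.pending.append((thread_id, work))

    def register_dedicated(self, executor):
        self.dedicated.append(executor)

    def request_thread(self):
        # Called by non-dedicated Executors; the Manager never contacts them.
        return self.pending.popleft() if self.pending else None

    def return_result(self, thread_id, result):
        self.completed[thread_id] = result

    def dispatch(self):
        """Dedicated mode: the Manager instructs Executors directly (push)."""
        turn = 0
        while self.pending and self.dedicated:
            thread_id, work = self.pending.popleft()
            executor = self.dedicated[turn % len(self.dedicated)]
            self.return_result(thread_id, executor.execute(work))
            turn += 1


manager = Manager()
manager.register_dedicated(Executor())
manager.submit("t1", lambda: 1 + 1)
manager.dispatch()                    # push: centrally managed
manager.submit("t2", lambda: 2 * 3)
Executor().poll(manager)              # pull: volunteer Executor
print(manager.completed)
```

Because a polling Executor only ever makes outbound requests, the pull mode still works
when the Executor sits behind a firewall or NAT, which is the deployment benefit noted
above.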
Grid applications created using the Alchemi API are executed on the Owner component.
The Owner provides an interface with respect to grid applications between the application
developer and the grid. Hence it "owns" the application and provides services associated
with the ownership of an application and its constituent threads. The Owner submits
threads to the Manager and collects completed threads on behalf of the application
developer via the Alchemi API.
Cross-Platform Manager
The Cross-Platform Manager, an optional sub-component of the Manager, is a generic
web services interface that exposes a portion of the functionality of the Manager in order
to enable Alchemi to manage the execution of platform independent grid jobs (as
opposed to grid applications utilizing the Alchemi grid thread model). Jobs submitted to
the Cross-Platform Manager are translated into a form that is accepted by the Manager
(i.e. grid threads), which are then scheduled and executed as normal in the fashion
described above. Thus, in addition to supporting the grid-enabling of existing
applications, the Cross-Platform Manager enables other grid middleware to interoperate
with and leverage Alchemi on any platform that supports web services (e.g. Gridbus Grid
Service Broker).
Global Grid Forum (GGF) [35]
The Global Grid Forum is a community-initiated forum of thousands of individuals from
industry and research leading the global standardization effort for grid computing. GGF's
primary objectives are to promote and support the development, deployment, and
implementation of Grid technologies and applications via the creation and documentation
of "best practices" - technical specifications, user experiences, and implementation
guidelines.
GGF efforts are also aimed at the development of a broadly based Integrated Grid
Architecture that can serve to guide the research, development, and deployment activities
of the emerging Grid communities. GGF goals include the following:
• To facilitate and support the creation and development of regional and global
  computational grids that will provide to the scientific community, industry,
  government and the public at large dependable, consistent, pervasive and inexpensive
  access to high-end computational capabilities
• To address architecture, infrastructure, standards and other technical requirements for
  computational grids and to facilitate and find solutions to obstacles inhibiting the
  creation of these grids
• To educate the scientific community, industry, government and the public regarding
  the technologies involved in, and potential uses and benefits of, computational grids
• To facilitate the application of grid technologies within educational, research,
  governmental, healthcare and other industries
• To provide a forum for exploration of computational grid technologies, applications
  and opportunities, and to stimulate collaboration among the scientific community,
  industry, government and the public regarding the creation, development and use of
  computational grids
• To exercise all powers conferred upon corporations formed under the Illinois General
  Not-For-Profit Corporation Act in order to accomplish its charitable, scientific and
  educational purposes and to take other actions necessary, advisable or convenient to
  carry out any or all of these purposes
Distributed Management Task Force, Inc. (DMTF) [36]
With more than 3,000 active participants, the Distributed Management Task Force, Inc.
(DMTF) is the industry organization leading the development of management standards
and integration technology for enterprise and Internet environments. DMTF standards
provide common management infrastructure components for instrumentation, control and
communication in a platform-independent and technology-neutral way. DMTF technologies
include information models (CIM), communication/control protocols (WBEM), and core
management services/utilities.
DMTF works closely with its Alliance Partners, including CompTIA, Consortium for
Service Innovation, Federation Against Software Theft (FAST), Global Grid Forum
(GGF), Interoperability Technology Association for Information Processing (INTAP), IT
Service Management Forum (itSMF), Network Applications Consortium (NAC),
Northwest Energy Efficiency Alliance, The Open Group, Storage Networking Industry
Association (SNIA) and TeleManagement Forum (TMF). These top industry standards
bodies are working with and participating in the development of DMTF's CIM - and its
semantically rich definitions of management information - as a common approach to
address the challenge of providing interoperable distributed management.
Organization for the Advancement of Structured Information Standards (OASIS)
OASIS was founded in 1993 under the name SGML Open as a consortium of vendors
and users devoted to developing guidelines for interoperability among products that
support the Standard Generalized Markup Language (SGML). OASIS changed its name
in 1998 to reflect an expanded scope of technical work. It is a not-for-profit, international
consortium that drives the development, convergence, and adoption of e-business
standards. The consortium produces more Web services standards than any other
organization along with standards for security, e-business, and standardization efforts in
the public sector and for application-specific markets. OASIS has more than 4,000
participants representing over 600 organizations and individual members in 100
countries.
OASIS is distinguished by its transparent governance and operating procedures.
Members themselves set the OASIS technical agenda, using a lightweight process
expressly designed to promote industry consensus and unite disparate efforts. Completed
work is ratified by open ballot. Governance is accountable and unrestricted. Officers of
both the OASIS Board of Directors and Technical Advisory Board are chosen by
democratic election to serve two-year terms. Consortium leadership is based on
individual merit and is not tied to financial contribution, corporate standing, or special
appointment.
The Internet Engineering Task Force (IETF) [38]
The Internet Engineering Task Force is a large open international community of network
designers, operators, vendors, and researchers concerned with the evolution of the
Internet architecture and the smooth operation of the Internet. It is open to any interested
individual. The actual technical work of the IETF is done in its working groups, which
are organized by topic into several areas (e.g., routing, transport, security, etc.). Much of
the work is handled via mailing lists. The IETF holds meetings three times per year.
The Internet Assigned Numbers Authority (IANA) is the central coordinator for the
assignment of unique parameter values for Internet protocols. The IANA is chartered by
the Internet Society (ISOC) to act as the clearinghouse to assign and coordinate the use of
numerous Internet protocol parameters.
World Wide Web Consortium (W3C) [39]
In October 1994, Tim Berners-Lee, inventor of the Web, founded the World Wide Web
Consortium (W3C) at the Massachusetts Institute of Technology, Laboratory for
Computer Science [MIT/LCS] in collaboration with CERN, where the Web originated,
with support from DARPA and the European Commission.
By promoting interoperability and encouraging an open forum for discussion, W3C
commits to leading the technical evolution of the Web. In just ten years, W3C has
developed more than eighty technical specifications for the Web's infrastructure.
However, the Web is still young and there is still a lot of work to do, especially as
computers, telecommunications, and multimedia technologies converge. To meet the
growing expectations of users and the increasing power of machines, W3C is already
laying the foundations for the next generation of the Web. W3C's technologies will help
make the Web a robust, scalable, and adaptive infrastructure for a world of information.
To understand how W3C pursues this mission, it is useful to understand the Consortium's
goals and driving principles.
Storage Networking Industry Association (SNIA) [40]
SNIA was incorporated in December 1997 and is a non-profit trade association. Its members
are dedicated to ensuring that storage networks become complete and trusted solutions
across the IT community. The SNIA works toward this goal by forming and sponsoring
technical work groups, producing (with its strategic partner Computerworld) the Storage
Networking World Conference series, building and maintaining a vendor neutral
Technology Center in Colorado Springs, CO, and promoting activities that expand the
breadth and quality of the storage networking market.