Grid Computing Systems
UMBC CMSC 621, Fall 07 Report
Fuesane Cheng

What Is a Grid Computing System
• Coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations (VOs).
• Virtualization of distributed computing and data resources, such as processing, network bandwidth, and storage capacity, to create a single system image.
• Individual users can access computers and data transparently, without having to consider location, operating system, account administration, and other details. Users essentially see a single, large virtual computer.
• Based on an open set of standards and protocols, e.g., Open Grid Services Architecture (OGSA), that enable communication across heterogeneous, geographically dispersed environments.
• With grid computing, organizations can optimize computing and data resources, pool them for large-capacity workloads, share them across networks, and enable collaboration.
• A "virtual supercomputer" built by using
  – spare computing resources within an organization
  – a network of geographically dispersed computers

Grid Computing System
[Figure labels: Grid Computing Layer, INTERNET, Grid Computing Layer]

Types of Grid Computing Systems
• Heavy-weight, feature-rich systems that concern themselves primarily with providing access to large-scale, intra- and inter-institutional resources such as clusters or multiprocessors. Grid systems developed using the Globus Toolkit are examples of this class.
• The Desktop Grid, in which cycles are scavenged from idle desktop computers. The Berkeley Open Infrastructure for Network Computing (BOINC), a descendant of the SETI@home project, is an example of middleware for public Desktop Grid computing, as it harnesses resources that exist outside of institutional control.
• Hybrid systems, in which BOINC- and Globus-based Grid systems interoperate, giving Globus-based computational Grids a means to incorporate a much greater range of resources.
  – By decreasing the startup cost for new Desktop Grid computing projects, the hybrid approach makes Desktop Grids a viable option for a broader range of projects and gives Desktop Grids features inherent in Globus (e.g., authentication, authorization, file transfer).

The Anatomy of the Grid
Layered Grid Architecture

Fabric: Interfaces to Local Control
• Provide shared access to resources (e.g., computational, storage, catalogs, network, repository).
• Implement the local, resource-specific operations on specific resources (physical or logical) to allow sharing operations at higher levels.
• There is a trade-off between requesting richer Fabric functionality and simplifying Grid infrastructure deployment. For example:
  – Advance reservation makes it possible for higher-level services to aggregate (co-schedule) resources in interesting ways that would otherwise be impossible to achieve.
  – However, because in practice few resources support advance reservation “out of the box,” a requirement for advance reservation increases the cost of incorporating new resources into a Grid.
• At minimum, resources should implement enquiry and resource management mechanisms.
• Sample resource capabilities:
  – Computational: starting, monitoring, and controlling the execution of processes; management, advance reservation, and enquiry functions
  – Storage: putting and getting files; remote data selection and reduction; management, advance reservation, and enquiry functions
  – Network: control over the resources for network transfers (e.g., prioritization, reservation); enquiry functions to determine network characteristics and load
  – Code repository: managing versioned source and object code
  – Catalogs: catalog query and update operations on databases

Connectivity: Communicating Easily and Securely
• Defines core communication and authentication protocols.
• Communication protocols enable the exchange of data between Fabric layer resources.
  – Internet (IP and ICMP), transport (TCP, UDP), and application (DNS, OSPF, RSVP, etc.) layers of the Internet layered protocol architecture
  – New protocols
• Authentication protocols build on communication services to provide cryptographically secure mechanisms for verifying the identity of users and resources.
  – Single sign-on: log on once and have access to multiple Grid resources defined in the Fabric layer.
  – Delegation: a program that a user authorizes to run on his or her behalf can access the resources the user is authorized to use. The program may also delegate a subset of its rights to another program.
  – Integration with various local security solutions: interoperate with each site's or resource's security solutions, such as Kerberos and Unix security.
  – User-based trust relationships: different sites or resources are not required to cooperate or interact with each other in order for an authorized user to use them at the same time.

Resource: Sharing Single Resources
• Builds on Connectivity layer communication and authentication protocols to define protocols (and APIs and SDKs) for the secure negotiation, initiation, monitoring, control, accounting, and payment of sharing operations on individual resources.
• Concerned entirely with individual resources and hence ignores issues of global state and atomic actions across distributed collections; such issues are the concern of the Collective layer.
• Two primary classes of protocols:
  – Information protocols are used to obtain information about the structure and state of a resource, for example, its configuration, current load, and usage policy (e.g., cost).
  – Management protocols are used to negotiate access to a shared resource:
    • Specifying
      – Resource requirements (including advance reservation and quality of service)
      – Operation(s) to be performed, such as process creation or data access
    • Enforcing resource sharing policy
    • Monitoring the status of an operation and controlling (for example, terminating) the operation
• The Resource and Connectivity protocol layers form the neck of the hourglass model and as such should be limited to a small, focused set that
  – captures the fundamental mechanisms of sharing across many different resource types (for example, different local resource management systems), but
  – does not overly constrain the types or performance of higher-level protocols that may be developed.

Collective: Coordinating Multiple Resources
• Contains protocols and services (and APIs and SDKs) that are not associated with any one specific resource but rather are global in nature and capture interactions across collections of resources.
• Because Collective components build on the narrow Resource and Connectivity layer “neck” in the protocol hourglass, they can implement a wide variety of sharing behaviors without placing new requirements on the resources being shared.
• Service examples:
  – Directory services
  – Co-allocation, scheduling, and brokering services
  – Monitoring and diagnostics services
  – Data replication services
  – Grid-enabled programming systems
  – Workload management systems and collaboration frameworks
  – Software discovery services
  – Community authorization servers
  – Community accounting and payment services
  – Collaboratory services

Collective Layer Example
Collective and Resource layer protocols, services, APIs, and SDKs can be combined in a variety of ways to deliver functionality to applications.

Applications
• Applications are constructed in terms of, and by calling upon, services defined at any layer.
• At each layer, well-defined protocols provide access to some useful service, such as resource management, data access, or resource discovery.
• At each layer, APIs may also be defined whose implementations (ideally provided by third-party SDKs) exchange protocol messages with the appropriate service(s) to perform desired actions.
• APIs are implemented by software development kits (SDKs), which in turn use Grid protocols to interact with network services that provide capabilities to the end user.
• Higher-level SDKs can provide functionality that is not directly mapped to a specific protocol but may combine protocol operations with calls to additional APIs as well as implement local functionality.

Application Programmer’s View of Grid Architecture
Solid lines represent direct calls; dashed lines represent protocol interactions.

“On the Grid”: The Need for Intergrid Protocols
• Currently, it is quite feasible to define multiple instantiations of key Grid architecture elements.
• Grids constructed with these different protocols are not interoperable and cannot share essential services.
• Long-term success of Grid
  computing requires selecting and widely deploying one set of protocols at the Connectivity and Resource layers and, to a lesser extent, at the Collective layer.
• These Intergrid protocols enable different organizations to interoperate and exchange or share resources.
• Resources that speak these protocols can be said to be “on the Grid.” Standard APIs are also highly useful if Grid code is to be shared.

Relationships with Other Technologies
• Web Services and SOA
  – The ubiquity of Web technologies (i.e., IETF and W3C standard protocols such as TCP/IP, HTTP, and SOAP, and languages such as HTML and XML) makes them attractive as a platform for constructing VO Grid systems and applications.
  – With the emergence of SOA standards for Web Services, Grids are just another, but important, service capability being provided.
  – Web technologies do an excellent job of supporting browser-client-to-web-server interactions but lack features required for the richer interaction models that occur in VOs, e.g., TLS vs.
    single sign-on.
  – Grid Security Infrastructure (GSI) extensions to TLS with delegation capabilities would permit a browser client to delegate capabilities to a Web server so that the server could act on the client’s behalf.
• Application and Storage Service Providers
  – Application service providers (ASPs), storage service providers (SSPs), and hosting companies offer outsourcing services for specific business and engineering applications and storage capabilities, under service level agreements that define access to a specific combination of hardware and software.
  – Security, dynamic reconfiguration of resources, and load sharing across providers are challenging and currently rarely attempted.

Relationships with Other Technologies (2)
• Enterprise Computing Systems
  – Technologies such as CORBA, EJB, J2EE, and DCOM are all systems designed to enable the construction of distributed applications.
  – Standard resource interfaces, remote invocation mechanisms, and service discovery make it easy to share resources within a single organization.
  – However, sharing arrangements are relatively static and restricted to a single organization, primarily in client-server form rather than the coordinated use of multiple resources.
  – Integration with Grid protocols provides enhanced capability and enables interoperability, for example:
    • A CORBA ORB that uses GSI mechanisms to address cross-organizational security issues
    • A Portable Object Adaptor that speaks the Grid resource management protocol to access resources spread across a VO
    • Grid-enabled Naming and Trading services that use Grid information service protocols to query information sources distributed across large VOs
• Internet and Peer-to-Peer Computing
  – Peer-to-peer computing and Internet computing are examples of the more general (“beyond client-server”) sharing modalities and computational structures, and they have much in common with Grid technologies.
  – However, their focus needs to shift from vertical solutions to shared infrastructure and interoperability.
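
Delegation sketch
The delegation idea that recurs in these slides (a program acts on a user's behalf and may pass on only a subset of its rights; a Web server acts for a browser client) can be illustrated with a small Python model. This is only a sketch under stated assumptions: the Credential class, the right names, and the holder names are hypothetical, and the code does not model certificates, signatures, or any real GSI or Globus API. It shows only the rule that each delegation step can narrow rights but never widen them.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class Credential:
    """Hypothetical stand-in for a delegated credential (not a real GSI object)."""
    holder: str
    rights: FrozenSet[str]
    issuer: Optional["Credential"] = None  # who delegated to this holder, if anyone

    def delegate(self, to: str, rights: Iterable[str]) -> "Credential":
        """Issue a new credential carrying a subset of this credential's rights."""
        requested = frozenset(rights)
        if not requested <= self.rights:
            extra = sorted(requested - self.rights)
            raise PermissionError(f"{self.holder} cannot delegate {extra}")
        return Credential(holder=to, rights=requested, issuer=self)

    def chain(self) -> List[str]:
        """Return the holders from the original user down to this credential."""
        names: List[str] = []
        cred: Optional[Credential] = self
        while cred is not None:
            names.append(cred.holder)
            cred = cred.issuer
        return list(reversed(names))


def is_authorized(cred: Credential, action: str) -> bool:
    """A resource checks only the rights carried by the presented credential."""
    return action in cred.rights


if __name__ == "__main__":
    # Assumed example: a user signs on once, then delegates step by step.
    user = Credential("alice", frozenset({"submit_job", "read_data", "write_data"}))
    server = user.delegate("web_server", {"submit_job", "read_data"})
    job = server.delegate("job_42", {"read_data"})

    print(job.chain())                       # ['alice', 'web_server', 'job_42']
    print(is_authorized(job, "read_data"))   # True
    print(is_authorized(job, "write_data"))  # False
```

Keeping a link to the issuer lets a resource trace a request back to the original user, and the subset check in delegate() is what prevents a delegated program from gaining rights its delegator never had.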

Protocols and Standards for Web Services
Copied from “Service Oriented Computing,” Munindar Singh and Michael Huhns

Globus Toolkit
• A Grid Computing Layer (middleware) development toolkit, under development since the late 1990s, that supports the development of service-oriented distributed computing applications and infrastructures.
• An open source software toolkit used for building grids, developed by the Globus Alliance and many others all over the world.
• Includes software for security, information infrastructure, resource management, data management, communication, fault detection, and portability.
• Packaged as a set of components that can be used either independently or together to develop applications.
• Grid Resource Allocation and Management (GRAM) protocol and its gatekeeper (factory) service, which provide for the secure and reliable creation and management of arbitrary computations, termed transient service instances.
• Grid Security Infrastructure (GSI), which supports single sign-on, delegation, and credential mapping.
  A two-phase commit protocol is used for reliable invocation.
• Meta Directory Service (MDS-2), which provides for information discovery through soft-state registration, data modeling, and a local registry.

Globus Toolkit Components
[Figure: Selected GT4 components and interactions. Shaded boxes are GT4 code and white boxes are user code.]

Globus Architecture
• As shown in the previous figure, the GT4 architecture comprises three sets of components:
  – A set of service implementations
    • Execution management (GRAM)
    • Data access and movement (GridFTP, RFT, OGSA-DAI)
    • Replica management (RLS, DRS)
    • Monitoring and discovery (Index, Trigger, WebMDS)
    • Credential management (MyProxy, Delegation, SimpleCA)
    • Instrument management (GTCP)
    • Most are Java Web Services, but some are in other languages and/or use other protocols
  – Three containers
    • Used to host user-developed services written in Java, Python, and C, respectively
    • Provide implementations of security, management, discovery, state management, and other mechanisms frequently required when building services
    • Extend open source service hosting environments with support for useful Web Service specifications
  – A set of client libraries
    • Allow client programs to invoke operations on both GT4 and user-developed services, with multiple interfaces providing different levels of control: WS-I SOAP, common security and messaging infrastructure, a powerful and extensible authorization framework, common WS interfaces and behaviors, and lifetime management of stateful components

Examples of Grid Services
http://lattice.umiacs.umd.edu/gridservices.php
http://www.gridforum.org/documents/GFD.29.pdf

UMBC Planned Grid Connectivity
[Figure labels: 1150 CPUs, including 80 x86 node cluster; College Park; 12 Institutions; SURAgrid; 224 XServe blades; Bowie; National Lambda Rail (NLR); Globus Toolkit/Condor; Websphere App. Server; Lattice Grid; Bluegrit; Rationale S/W; Lambda Ram; +900 CPUs; Fiber; Matisse; UMBC; HyperWall, 6 CPUs/12 screens]

SURAgrid Participants (as of April 2006)
[Figure: SURAgrid participants; the legend distinguishes SURA members and sites with resources on-grid.]
Bowie State, GMU, UMD, UMich, UKY, UVA, UArk, GPN, Vanderbilt, ODU, UAH, USC, NCState, OleMiss, TTU, SC, TACC, LSU, ULL, UFL, UAB, TAMU, GSU, Tulane, UNCC

Lattice Grid
• What it is:
  – The Lattice Project is an attempt to effectively share computational resources among departments and institutions, starting with those in the University System of Maryland.
  – The Grid is focused on computation; the project has not yet made efforts to enable large-scale data access, storage, or replication.
• Grid software:
  – Makes heavy use of the Globus Toolkit, which forms the backbone of the Grid system. It provides mechanisms for job submission, file transfer, and authentication and authorization of Grid entities, among other things.
  – Has also done extensive work with BOINC, which enables public participation in the Grid and represents a potentially huge resource. The project has developed software that allows Globus (and hence the Grid system) to submit jobs to a BOINC pool.
  – Works with scheduling software, such as Condor and PBS, that controls local resources. Such software is being deployed where it is most appropriate.

UMBC Near Term Bluegrit Design
Hardware:
  – 1 Intel-based head node
  – 1 Intel-based storage server
  – 33 2-processor JS20 blades (2.2 GHz + 0.5 GB)
  – 14 4-processor JS21 blades (2.5 GHz + 2 GB)
  – 5.4 TB of shared storage
  – 1.3 TB of node storage
Operating system: Red Hat Enterprise Linux 4
Network:
  – 10 Gb external connection to College Park
  – 1 Gb Ethernet interconnect
  – 100 Mb external connection
[Figure labels: College Park, UMBC Network, 10 Gb, Head node, JS21 Blades, JS20 Blades, Storage]

UMBC Future Bluegrit Potentials
• 5 available chassis with 70 blade slots
• Add Cell blade architecture for future computing
• Upgrade interconnects between chassis/blades
• Increase RAM availability
• Build out campus Grid

References
1. The Anatomy of the Grid: Enabling Scalable Virtual Organizations. I. Foster, C. Kesselman, S. Tuecke. International J. Supercomputer Applications, 15(3), 2001.
2. Service-Oriented Science. I. Foster. Science, vol. 308, May 6, 2005.
3. Globus Toolkit Version 4: Software for Service-Oriented Systems. I. Foster. IFIP International Conference on Network and Parallel Computing, Springer-Verlag LNCS 3779, pp. 2-13, 2006.
4. Lattice Project. http://lattice.umiacs.umd.edu/gridservices.php
5. SURA Grid. http://www1.sura.org/3000/3200_ITGridPlan.htm
6. UMBC Multicore Computing Center (MC2). http://www.umbc.edu/research/blog/2007/08/ibm_gift_to_bring_orchestra_of_1.html