Implementing Production Grids
Bina Ramamurthy
5/28/2016 B.Ramamurthy

Introduction
  - Based on “Implementing Production Grids” by William Johnston, of the NASA IPG Engineering Team and the DOE Science Grid Team: http://www-library.lbl.gov/docs/LBNL/511/92/PDF/LBNL-51192.pdf
  - Production grids are intended to provide identified user communities with a rich, stable, and standard distributed computing environment.
  - Standards come mainly from the Global Grid Forum (GGF).

Existing Production Grids
  - Grids provide a persistent infrastructure for scientific and business applications.
  - Existing grids:
      - UK e-Science Grid
      - NASA Information Power Grid
      - DOE Science Grid(s)
      - Asia Pacific Grid
  - These are infrastructure projects.

Grid Services
  - A number of projects aim to provide higher-level grid services that the community will use directly:
      - Ninf (a network-based information library for global worldwide computing infrastructure)
      - GridLab
      - Network Weather Service

Coverage in the Paper
  - Software suites related to the grid:
      - Globus (infrastructure)
      - Condor (infrastructure)
      - SRB/MCAT (federating and cataloguing tertiary storage)
      - PBSPro (job management system)
      - A PKI authentication substrate

Major Topics
  - Deploying and managing operational infrastructure
  - Establishing cross-site trust
  - Dealing with scaling issues
  - Listening to and interfacing with users

The Grid Context
  - Grids are an approach for building dynamically constructed problem-solving environments using geographically and organizationally dispersed high-performance computing and data handling resources.
  - Grids also provide important infrastructure supporting multi-institutional and multi-organizational collaboration.
  - Functionally, grids are tools, middleware, and services for a wide variety of applications (currently only scientific applications).

Important Features
  - A set of uniform software services that manage and provide access to heterogeneous, distributed resources.
  - Widely deployed infrastructure.
  - Is that all?

Grid Architecture
  - See the copy of the enclosed page.

Basic Functions (Hourglass Model)
  - The set of basic functions a grid must have is called the Common Grid Services.
  - These include:
      - Grid Information Service (GIS)
      - Grid Security Infrastructure (GSI)
      - Grid job initiator (Globus GRAM)
      - Grid scheduling function (NWS, Maui)
      - Basic data management mechanism (GridFTP)
      - Grid event monitoring (Grid Monitoring Architecture)

Usage Models
  - The anticipated usage model determines what gets deployed and when.
  - Usage models can be further divided into compute models and data models (compute grid and data grid).

Compute and Data Models
  - Compute models:
      - Export existing services
      - Loosely coupled processes
      - Workflow-managed processes
      - Distributed-pipelined/coupled processes
      - Tightly coupled processes
  - Data models:
      - Occasional access to multiple tertiary storage systems
      - Distributed analysis of massive datasets followed by cataloging and archiving
      - Large reference data sets
      - Grid metadata management

Grid Support for Collaboration
  - Grids support collaboration in the form of a virtual organization (VO).
  - A VO is a combination of human collaborators and the grid environment they share.
  - Security (GSISSH, GSIFTP, GridFTP): GSI provides authentication, communication, and trust management.
  - Persistent publication service: preserves organizational structure and shares community information (GIS).
  - Group-to-group audio and videoconferencing facility based on Internet IP multicast: Access Grid.

Building a Multi-site Computational and Data Grid Test Environment
  - The grid building team: system administrators play an important role, because grid software involves root-owned processes and a trust model for authorizing users that is not typical. Form a working group (WG).
  - Grid resources: identify the computing and storage resources to be incorporated into the grid.
  - Install batch schedulers to manage load. Use co-scheduling.
  - Co-scheduling for the grid involves scheduling multiple individual, potentially architecturally and administratively heterogeneous computing resources so that multiple processes are guaranteed to execute at the same time in order to communicate and coordinate with each other. Examples: PBSPro, Maui.
  - We will use Globus grid software for the test environment.

Initial Test Bed
  - Grid information service: locates resources based on the characteristics needed by the job (OS, CPU count, memory, etc.).
  - The Globus Monitoring and Discovery Service (MDS) provides GRIS and GIIS, which supply the registry and directory services, respectively.
  - Use PKI authentication, with certificates from the Globus Certificate Authority for the test environment.
  - Validate access to, and operation of, the GIS/GIISs at all sites, and test local and remote job submission using these certificates.

Trust Management
  - GSI provides a uniform grid entity naming and authentication mechanism.
  - The real issue is establishing “trust” in the process that each CA uses for issuing identity certificates to users and to other entities such as hosts and services.
  - Two steps are defined in the CA policy:
      1. Physical identification of the entities, verification of their association with the organization, and assignment of appropriate names.
      2. An X.509 certificate is issued for the entity.

Trust and Usage
  - Trust is confidence in or reliance on some quality or attribute of a person or thing, or the truth of a statement.
  - A grid identity token (in, say, X.509 form) is presented for remote authentication.
  - It is verified using appropriate cryptographic techniques.
  - The relying party should have some level of confidence that the entity that initiated the transaction is the entity it is expected to be.

Establishing an Operational CA
  - Set up or identify a CA to issue Grid X.509 certificates to users and hosts. You may use the Netscape CMS (Certificate Management System).
  - CA policies are encoded in formal statements called the Certificate Policy/Certification Practice Statement (CP/CPS). Templates are available for these.
  - Determine the space of entities for which you will have to issue certificates: humans, hosts, services, security domain gateways. Each must have a clear policy defined in the CP/CPS.

Naming
  - An important issue in developing the CP is the naming of principals.
  - The tendency is to pack a lot of information into a subject’s name (e.g., X.500 style). However, less information helps in certificate management.
  - For example, a certificate can use a flat namespace with the common name of the entity and a random string.
  - On the other hand, if the namespace is hierarchical, consider the full organizational hierarchy in naming.

The Certification Authority Model
  - A single CA provider is a common model.
  - A central CA has an overall CP and subordinate policies for a collection of VOs.
  - An independent party can be assigned the job of operating the CA infrastructure.
  - There is a Root CA that certifies the subordinate CAs that issue user certificates.
  - See the attached figure.
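
The Root CA / subordinate CA arrangement and the flat naming scheme described above can be illustrated with a small sketch. This is a toy model under stated assumptions, not the procedure from the paper or from any grid toolkit: it performs no cryptographic signature checks, and the names Cert, flat_subject, chain_to_root, "Root CA", "VO-A CA", and "Jane Doe" are all hypothetical, chosen only for illustration. It shows how a relying party might walk from a user certificate through the issuing CAs up to a trusted root.

```python
"""Toy model of a Root CA / subordinate CA hierarchy (no real cryptography)."""
import secrets
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Cert:
    subject: str  # name of the certified entity
    issuer: str   # subject name of the CA that issued this certificate
    is_ca: bool   # True if this entity is allowed to issue certificates


def flat_subject(common_name: str) -> str:
    """Build a flat name: the common name plus a random string (assumed format)."""
    return f"CN={common_name} {secrets.token_hex(4)}"


def chain_to_root(cert: Cert, ca_store: Dict[str, Cert], trusted_root: str) -> List[str]:
    """Return the subject names from cert up to the trusted root, or raise ValueError."""
    path = [cert.subject]
    current = cert
    while current.issuer != current.subject:  # a root certificate is self-issued
        issuer = ca_store.get(current.issuer)
        if issuer is None or not issuer.is_ca:
            raise ValueError(f"no known CA named {current.issuer!r}")
        if issuer.subject in path:
            raise ValueError("issuer loop detected")
        path.append(issuer.subject)
        current = issuer
    if current.subject != trusted_root or not current.is_ca:
        raise ValueError("chain does not end at the trusted root CA")
    return path


if __name__ == "__main__":
    # Hypothetical hierarchy: one root, one subordinate CA for a VO, one user.
    root = Cert("CN=Root CA", "CN=Root CA", True)
    vo_ca = Cert("CN=VO-A CA", "CN=Root CA", True)
    user = Cert(flat_subject("Jane Doe"), "CN=VO-A CA", False)
    store = {c.subject: c for c in (root, vo_ca)}
    print(" -> ".join(chain_to_root(user, store, "CN=Root CA")))
```

A real deployment would verify each certificate's signature, validity period, and revocation status at every step of the walk; the sketch only captures the naming and issuer relationships.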