Middleware services on the NGS
http://www.grid-support.ac.uk http://www.ngs.ac.uk http://www.nesc.ac.uk/ http://www.pparc.ac.uk/ http://www.eu-egee.org/

Acknowledgments
• Matt Ford, NGS Induction Workshop (Dec. 2004, NeSC)
• Neil Chue Hong, OGSA-DAI Tutorial, GGF13
• OGSA-DAI website, www.ogsadai.org

Induction to Grid Computing and the National Grid Service

NGS software
• Computation services based on Globus Toolkit 2
  – Use compute nodes for sequential or parallel jobs, primarily from batch queues
  – Can run multiple jobs concurrently
• Data services:
  – Storage Resource Broker:
    • Primarily for file storage and access
    • Virtual filesystem with replicated files
  – “OGSA-DAI”: Data Access and Integration
    • Primarily for grid-enabling databases (relational, XML)
  – NGS Oracle service
  – GridFTP
• Portal to support collaboration and ease use
• Authorisation and Authentication using GSI

Globus Toolkit illustration
• Command line interface to the tool for job submission
  – need to know name of a Compute Element (batch queue)

globus-job-submit grid-data.rl.ac.uk/jobmanager-pbs /bin/hostname -f
https://grid-data.rl.ac.uk:64001/1415/1110129853/
globus-job-status https://grid-data.rl.ac.uk:64001/1415/1110129853/
DONE
globus-job-get-output https://grid-data.rl.ac.uk:64001/1415/1110129853/
grid-data12.rl.ac.uk

The “UI” machine
• The user’s interface to the grid
  – Where you upload your certificate for your session
  – Where you create proxy certificates
  – Where you can run the various commands, including…
    • The clients and development tools from Globus Toolkit 2.4.3
    • GSI (Grid Security Infrastructure) enabled Secure Shell
    • Storage Resource Broker (more on this tomorrow)
    • OGSA-DAI (more on this tomorrow)

Our setup
Tutorial room machines: ssh pub-234.nesc.ed.ac.uk.
UI: gsissh and Globus commands -> Internet -> NGS head nodes -> Execute Nodes (grid-data.rl.ac.uk)

Job submission: CLI
User’s Interface to the grid
• Command-line interfaces: GLOBUS, etc.

Application-specific tools
User’s Interface to the grid
• Application-specific and/or higher-level generic tools, e.g. BRIDGES
• APIs: Java, C, …
• Command-line interfaces: GLOBUS, etc.

Secure shell access
UI -> NGS head node (code and data)
• gsiscp: copies files using a proxy certificate to allow AA

Open shell on NGS CN
UI (code and data) -> gsissh -> NGS node (code and data)
• Compile, edit, recompile, build
• Can be an X Windows client
• SHORT interactive runs are OK (sequential)
• TotalView debugger

Run jobs from the UI
UI (code and data) -> NGS execute node (code and data, executables)
• globus-job-run, or globus-job-submit / globus-job-get-output
• Can pass files with these commands, e.g. parameters for a job.
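The submit, poll and fetch sequence shown in the Globus Toolkit illustration could be scripted from the UI machine. The following is a minimal Python sketch under stated assumptions, not an NGS-supplied tool: the function names (`submit_cmd`, `status_cmd`, `output_cmd`, `run_cli`, `run_job`) are hypothetical, only the three command names and the `DONE` status come from the slides, and the polling interval and retry limit are arbitrary. The `runner` parameter is injectable so the logic can be tried out without Globus installed.

```python
import subprocess
import time


def submit_cmd(contact, executable, *args):
    """Build the argv for globus-job-submit (contact is e.g. host/jobmanager-pbs)."""
    return ["globus-job-submit", contact, executable, *args]


def status_cmd(job_url):
    """Build the argv for globus-job-status."""
    return ["globus-job-status", job_url]


def output_cmd(job_url):
    """Build the argv for globus-job-get-output."""
    return ["globus-job-get-output", job_url]


def run_cli(argv):
    """Run one command and return its stripped stdout.

    Assumes the Globus client tools are on PATH and a valid proxy
    certificate already exists on the UI machine.
    """
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return result.stdout.strip()


def run_job(contact, executable, *args, runner=run_cli,
            poll_seconds=5.0, max_polls=120):
    """Submit a job, poll until its status is DONE, then return its output.

    poll_seconds and max_polls are illustrative defaults, not NGS policy.
    Only the DONE status is taken from the slides; any other status is
    simply treated as "not finished yet".
    """
    job_url = runner(submit_cmd(contact, executable, *args))
    for _ in range(max_polls):
        if runner(status_cmd(job_url)) == "DONE":
            return runner(output_cmd(job_url))
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_url} not DONE after {max_polls} polls")
```

On a UI machine with a valid proxy, a call such as `run_job("grid-data.rl.ac.uk/jobmanager-pbs", "/bin/hostname", "-f")` would mirror the command sequence on the slide.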
Non-communicating Processes
UI -> globus-job-submit -> Internet -> Head processors of clusters -> Worker processors of clusters
• Processes run without any communication between them

Communicating Processes
UI -> globus-job-submit -> Internet -> Head processors of clusters -> MPI -> Worker processors of clusters
• Processes send messages to each other
  – Must run on same cluster

Available APIs
• C: http://www.globus.org/developer/apireference.html
• “Community Grid” CoG: http://www.cogkit.org/
  – Java, Python, Matlab

Data services
• OGSA-DAI: data access and integration
• GridFTP: a protocol for large file transfer
• The Storage Resource Broker
• But first…

Oracle and the NGS (1)
• The NGS core nodes have been partitioned into compute and data clusters from the outset.
• As the NGS matures, the requirement for data hosting will grow.
• Oracle database: for both users and services offered by the NGS.
• The RAL and Manchester sites are designated as the data clusters, with each site able to dedicate up to eight nodes to Oracle.

Oracle and the NGS (2)
Support
• Additional application needed after joining the NGS
• All enquiries and production support for the Oracle service are via the Grid Operations Support Centre (GOSC)
  – 9-5 operational support (monitoring, notification, maintenance); at other times, on a best-endeavours basis.

Data services on NGS
Simple data files
• Middleware supporting
  – Replica files
  – Logical filenames
  – Catalogue: maps logical name to physical storage device/file
  – Virtual filesystems, POSIX-like I/O
• Storage Resource Broker
Structured data
  – RDBMS, XML databases
• Require extendable middleware tools to support
  – Move computation near to data
  – easy access, controlled by AA
  – integration and federation
• OGSA-DAI

OGSA-DAI
www.ogsadai.org

What is OGSA-DAI?
• The Open Grid Services Architecture Data Access and Integration project is concerned with constructing middleware to assist with access and integration of data from separate data sources via the grid.
• The project was conceived by the UK Database Task Force and is working closely with the Global Grid Forum DAIS-WG and the Globus team.

OGSA-DAI Motivation
• OGSA-DAI is motivated by the need to:
  – Provide an extensible framework for easily integrating data resources on to Grids.
  – Provide for data discovery from previously unknown locations.
  – Allow different types of data models from distributed data resources to be easily integrated to Grid applications.
  – Allow data to be accessed through uniform interfaces.
  – Facilitate the integration of data from various sources to obtain the required information.
  – …

OGSA-DAI Provides
• Access to and updating of data resources
• Exposure of data resources to the Grid
• Additional data manipulation functionality at the service level
• Uniform access to disparate, heterogeneous data resources
  – Does not hide underlying data model
• Data resources exposed through services
  – Clients interact with these services

Interacting with Data Resources
• Activity: the data resource manipulation, data transformation and delivery actions that the client wants the service to perform.
  – Think of sending the job to the data, not the data to the job.
• Perform documents: used by clients to tell the services which activities they want executed.
• Response documents: used by the services to report the execution status of Perform documents and, often, to return data to the client.

OGSA-DAI Deck of Activities

OGSA-DAI and the NGS
• The OGSA-DAI deployment on the NGS is being actively developed
• Users should expect that procedures may change
  – this does not reflect the commitment NGS has to providing a service.
• Initially the Manchester JISC data cluster has been charged with deploying the OGSA-DAI service

Storage Resource Broker

SRB Projects
• Digital Libraries
  – UCB, Umich, UCSB, Stanford, CDL
  – NSF NSDL - UCAR / DLESE
• NASA Information Power Grid
• Astronomy
  – National Virtual Observatory
  – 2MASS Project (2 Micron All Sky Survey)
• Particle Physics
  – Particle Physics Data Grid (DOE)
  – GriPhyN
  – SLAC Synchrotron Data Repository
• Medicine
  – Digital Embryo (NLM)
• Earth Systems Sciences
  – ESIPS
  – LTER
• Persistent Archives
  – NARA
  – LOC
• Neuro Science & Molecular Science
  – TeleScience/NCMIR, BIRN
  – SLAC, AfCS, …
Over 90 Tera Bytes in 16 million files

What is SRB?
• Storage Resource Broker (SRB) is a software product developed by the San Diego Supercomputer Center (SDSC).
• Allows users to access files and database objects across a distributed environment.
• The actual physical location of the data, and the way it is stored, are abstracted from the user.
• Allows the user to add user-defined metadata describing the scientific content of the information.

How SRB Works
• 4 major components:
  – The Metadata Catalogue (MCAT)
  – The MCAT-Enabled SRB Server
  – The SRB Storage Server
  – The SRB Client
Diagram: SRB Client, SRB A Server, SRB B Server, MCAT Server and MCAT Database, with the steps of a request labelled a to g.

SRB Client Tools
• Provide a user interface to send requests to the SRB server.
• 4 main interfaces:
  – Command line (S-Commands)
  – MS Windows (InQ)
  – Web based (MySRB)
  – Java (JARGON)
• Web Services (MATRIX)

Planned Deployment on NGS
Diagram components:
• Database Servers @ Manchester: MCAT DB1, SRB Server, DB n
• MCAT Server @ Manchester
• Online Replication; Failover link
• User
• MCAT Server @ RAL
• Database Servers @ RAL: MCAT DB1, SRB Server, DB n
• SRB servers @ Leeds, RAL, Oxford and HPCX, each with a Resource Driver, backed by Disk Farms

GridFTP

What is GridFTP?
• A secure, robust, fast, efficient, standards-based, widely accepted data transfer protocol
• A Protocol
  – Multiple independent implementations can interoperate
    • This works. Both the Condor Project at the University of Wisconsin and Fermilab have home-grown servers that work with ours.
    • Lots of people have developed clients independent of the Globus Project.
• Globus also supply a reference implementation:
  – Server
  – Client tools (globus-url-copy)
  – Development Libraries

Summary
• Computation services
  – Globus Toolkit 2
• Data services
  – ORACLE
  – SRB
  – OGSA-DAI
  – GridFTP
• Collaboration services
  – the portal
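The data services slides describe a catalogue that maps a logical filename to one or more physical replicas, which is the idea behind SRB's virtual filesystem. The following toy Python sketch illustrates that mapping only. It is not the SRB or MCAT API: the class `ReplicaCatalogue`, its methods, and the example paths and host names are all hypothetical.

```python
class ReplicaCatalogue:
    """Toy catalogue: logical filename -> ordered list of physical replicas.

    Hypothetical illustration only; not the SRB or MCAT interface.
    """

    def __init__(self):
        self._replicas = {}

    def register(self, logical_name, physical_location):
        """Record one more physical copy of a logical file (duplicates ignored)."""
        locations = self._replicas.setdefault(logical_name, [])
        if physical_location not in locations:
            locations.append(physical_location)

    def locate(self, logical_name):
        """Return all known physical locations for a logical filename."""
        if logical_name not in self._replicas:
            raise KeyError(f"unknown logical file: {logical_name}")
        return list(self._replicas[logical_name])

    def resolve(self, logical_name, available):
        """Pick the first registered replica whose location is currently available."""
        for location in self.locate(logical_name):
            if location in available:
                return location
        raise LookupError(f"no available replica for {logical_name}")


# The user only ever refers to the logical name.
catalogue = ReplicaCatalogue()
catalogue.register("/demo/results.dat", "storage-a.example:/disk1/results.dat")
catalogue.register("/demo/results.dat", "storage-b.example:/disk7/results.dat")

# If the first store is offline, the second replica is chosen instead.
online = {"storage-b.example:/disk7/results.dat"}
print(catalogue.resolve("/demo/results.dat", online))
```

The point of the design is that client code never names a storage device: replicas can be added or moved by updating the catalogue alone.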