Grid Technology B: Different Flavors of Grids
CERN Geneva, April 1-3 2003
Geoffrey Fox, Community Grids Lab, Indiana University
gcf@indiana.edu

Different Types of Grids
• Compute and File-oriented Grids
• “Internet Computing” Grids (Desktop Grids)
• Peer-to-peer Grids
• Information Grids: to distinguish between File, database and “Perl Filter” based Grids
• Semantic Grids
• Integrated (Hybrid, Complexity) Grids
– Bio and Geocomplexity
• Campus Grids
• Enterprise Grids

Compute and File-oriented Grids
• Different Grids have different structures
• Compute/File oriented Grids are well represented by the “production part of particle physics”, either in
– Monte Carlo
– Production of Data Summary Tapes
• This is nearer the “Globus GT2” than the “Web Service” vision of the Grid
• Strongly supported of course by EDG (European Data Grid) and the Trillium project in the US (Virtual Data Toolkit)
• The Physics Analysis phase of particle physics requires more collaboration and is more dynamic

What do HEP experiments want to do on the GRID in the long term?
Production:
• Simulation (Monte Carlo generators).
• Reconstruction (including detector geometry …).
• Event Mixing (bit-wise superposition of Signal and Backgrounds).
• Reprocessing (refinement, improved reconstruction data production).
• Production (production of AODs and ESDs starting from Raw data).
• Very organized activity, generally centrally managed by production teams
Physics analysis:
• Searches for specific event signatures or particle types (data access can be very sparse, perhaps on the order of one event out of each million).
• Measurement of inclusive and exclusive cross sections for a given physics channel
– Measurement of relevant kinematical quantities
• I/O: it is not feasible to organize the input data in a convenient fashion unless one constructs new files containing the selected events.
• The activities are also uncoordinated (not planned in advance) and (often) iterative.
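The sparse-access point above (roughly one selected event per million, so new files of selected events have to be constructed) can be illustrated with a minimal Python sketch. The event record, the `has_signature` flag and all function names are hypothetical; they do not come from any HEP framework, and a real skim would read and write experiment data files rather than in-memory lists.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Event = Dict[str, int]  # hypothetical event record, not a real HEP format


def event_stream(n_events: int, signal_stride: int) -> Iterator[Event]:
    """Yield toy events; every `signal_stride`-th one carries the signature."""
    for event_id in range(n_events):
        yield {"id": event_id, "has_signature": int(event_id % signal_stride == 0)}


def skim(events: Iterable[Event], select: Callable[[Event], bool]) -> List[Event]:
    """Read the full stream once and keep only the selected events."""
    return [event for event in events if select(event)]


def has_signature(event: Event) -> bool:
    """Selection predicate standing in for a physics event signature."""
    return event["has_signature"] == 1


# One pass over the large sample produces a small derived "file" ...
skimmed = skim(event_stream(200_000, 100_000), has_signature)
# ... and later, iterative analysis passes touch only the skimmed events.
selected_ids = [event["id"] for event in skimmed]
print(len(skimmed), selected_ids)
```

The design point is that the expensive sequential read happens once; the uncoordinated, iterative analysis steps then work on the much smaller derived collection.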
EDG “Compute/File” Grid Work Packages
• WP1: Workload (Resource) Management System
• WP2: Data (Replication/Caching) Management
• WP3: Grid Monitoring / Grid Information Systems (general meta-data lookup)
• WP4: Fabric Management (software etc. on cluster)
• WP5: Storage Element (Grid Interface to mass storage)
• WP6: Security
• WP7: Network Monitoring

Compute/File Grid Requirements I
• Called Data Grid by Globus team
• Terabytes or petabytes of data
– Often read-only data, “published” by experiments
• Large data storage and computational resources shared by researchers around the world
– Distinct administrative domains
– Respect local and global policies governing how resources may be used
• Access raw experimental data
• Run simulations and analysis to create “derived” data products

Compute/File Grid Requirements II
• Locate data
– Record and query for existence of data
• Data access based on metadata
– High-level attributes of data
• Support high-speed, reliable data movement
– E.g., for efficient movement of large experimental data sets
• Support flexible data access
– E.g., databases, hierarchical data formats (HDF), aggregation of small objects
• Data Filtering
– Process data at storage system before transferring

Compute/File Grid Requirements III
• Planning, scheduling and monitoring execution of data requests and computations
• Management of data replication
– Register and query for replicas
– Select the best replica for a data transfer
• Security
– Protect data on storage systems
– Support secure data transfers
– Protect knowledge about existence of data
• Virtual data
– Desired data may be stored on a storage system (“materialized”) or created on demand

Functional View of Compute/File Grid
[Figure: functional components, with labels as follows]
• Application
• Metadata Service: Location based on data attributes
• Planner: Data location, Replica selection, Selection of compute and storage nodes
• Replica Location Service: Location of one or more physical replicas
• Information Services: State of grid resources, performance
measurements and predictions
• Security and Policy
• Executor: Initiates data transfers and computations
• Data Movement
• Data Access
• Compute Resources
• Storage Resources

Layered C/F Grid Architecture
[Figure: layered architecture, listed here from the top layer down]
• Collective 2: Services Specific to Application Domain or Virtual Org.
– Request Interpretation and Planning Services
– Workflow or Request Management Services
– Application-Specific Data Discovery Services
– Community Authorization Services
– Consistency Services (e.g., Update Subscription, Versioning, Master Copies)
• Collective 1: General Services for Coordinating Multiple Resources
– Data Transport Services
– Data Federation Services
– Data Filtering or Transformation Services
– General Data Discovery Services
– Storage Management (Brokering)
– Compute Scheduling (Brokering)
– Monitoring/Auditing Services
• Resource: Sharing Single Resources
– Data Access Protocol or Service
– Storage Resource Management
– Data Filtering or Transformation Services
– Database Management Services
– Compute Resource Management
– Resource Monitoring/Auditing
• Connectivity
– Communication Protocols (e.g., TCP/IP stack)
– Authentication and Authorization Protocols (e.g., GSI)
• Fabric
– Storage Systems
– Compute Systems
– Networks

C/F Grid Architecture I (from the bottom up)
• Fabric Layer
– Storage systems
– Compute systems
– Networks
• Connectivity Layer
– Communication protocols (e.g., TCP/IP protocol stack)
– Authentication and Authorization protocols (e.g., GSI)

C/F Grid Architecture II
• Resource Layer: sharing single resources
– Data Access Protocol or Service (e.g., Globus GridFTP)
– Storage Resource Management (e.g., SRM/DRM/HRM from Lawrence Berkeley Lab)
– Data Filtering or Transformation Services (e.g., DataCutter from Ohio State University)
– Database Management Services (e.g., local RDBMS)
– Compute Resource Management Services (e.g., local supercomputer scheduler)
– Resource Monitoring/Auditing Service

C/F Grid Architecture III
• Collective 1 Layer: General Services for Coordinating Multiple Resources
– Data Transport
Services (e.g., Globus Reliable File Transfer and Multiple File Transfer Service from LBNL)
– Data Federation Services
– Data Filtering or Transformation Service (e.g., Active ProxyG from Ohio State University)
– General Data Discovery Services (e.g., Globus Replica Location Service and Globus Metadata Catalog Service)
– Storage management/brokering
– Compute management/brokering (e.g., Condor from University of Wisconsin, Madison)
– Monitoring/auditing service

C/F Grid Architecture IV
• Collective 2 Layer: Services for Coordinating Multiple Resources that are Specific to an Application Domain or a Virtual Organization
– Request Interpretation and Planning Services (e.g., Globus Chimera and Pegasus for Physics Applications and Condor DAGMan)
– Workflow management service (e.g., Globus Pegasus)
– Application-Specific Data Discovery Services (e.g., Earth Systems Grid Metadata Catalog)
– Community Authorization service (e.g., Globus CAS)
– Consistency Services with varying levels of consistency, including data versioning, subscription, distributed file systems or distributed databases

Composing These Services To Provide Higher-Level Functionality
• For example, a Grid File System might compose:
– Fabric layer: storage components, compute elements
– Connectivity layer: security and communication protocols
– Resource layer: data access protocols or services and storage resource management
– Collective layers: transport and discovery services, collective storage management, monitoring and auditing, authorization and consistency services

Peer to Peer Network
[Figure: Peers, each combining User, Service, Resource and Routing]
• Peers are Jacks of all Trades linked to “all” peers in community
• Typically Integrated Clients, Servers and Resources

Peer to Peer (Hybrid) Grid
[Figure: peers, each combining User, Service, Resource and Routing, around NB Routing]
[Figure labels, continued: Services; Dynamic Message or Event Routing from Peers or Servers; A democratic organization]

Peer to Peer Grid
[Figure: Peers; User Facing Web Service Interfaces; Event/Message Brokers; Service Facing Web Service Interfaces; Databases]
• Chapters 18 and 19 of the Grid Book

Entropia: Desktop Grid
• Entropia (chapter 12 of book), United Devices, Parabon, SETI@Home etc. have demonstrated “Internet Computing” or Desktop Grid very successfully
• Used to be called peer-to-peer computing but that fell out of favor due to Napster’s bad name
• Condor has similar types of utility but Entropia is optimized for
– Huge number of clients
– Providing a secure “sandbox” for the application to run in, which guarantees that the application will not harm the client

Scaling of Entropia Application
[Figure]

Entropia Architecture
[Figure]

Application Execution on the Entropia System
The end-user submits a computation to Job Management (1). The Job Manager breaks up the computation into many independent “subjobs” (2) and submits the subjobs to the resource scheduler. In the meantime, the available resources of a client are periodically reported to the Node Manager (a), which informs the Subjob Scheduler (b) using the resource descriptions. The Subjob Scheduler matches the computation needs with the available resources (3) and schedules the computation to be executed by the clients (4,5,6). Results of the computation are sent to the Job Manager (7), put together, and handed back to the end-user (8).
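The numbered steps above can be sketched in a few lines of Python. This is an illustration only and not the Entropia API: the `Client` and `Subjob` classes, the memory-based matching rule, the round-robin placement and the sum-of-squares workload are all assumptions chosen to keep the example small.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Client:
    name: str
    free_memory_mb: int  # resource description reported to the node manager


@dataclass
class Subjob:
    index: int
    numbers: List[int]  # independent slice of the computation
    memory_mb: int      # what the subjob needs in order to run


def split_job(numbers: List[int], chunk: int, memory_mb: int) -> List[Subjob]:
    """Step 2: break the computation into independent subjobs."""
    return [Subjob(i // chunk, numbers[i:i + chunk], memory_mb)
            for i in range(0, len(numbers), chunk)]


def match(subjobs: List[Subjob], clients: List[Client]) -> List[Tuple[Subjob, Client]]:
    """Step 3: pair each subjob with a client whose reported resources suffice."""
    plan = []
    for subjob in subjobs:
        eligible = [c for c in clients if c.free_memory_mb >= subjob.memory_mb]
        if not eligible:
            raise RuntimeError(f"no client can run subjob {subjob.index}")
        # Round-robin over eligible clients (an assumed placement policy).
        plan.append((subjob, eligible[subjob.index % len(eligible)]))
    return plan


def run_on_client(subjob: Subjob, client: Client) -> int:
    """Steps 4-6: stand-in for sandboxed execution on the client machine."""
    return sum(x * x for x in subjob.numbers)


def run_job(numbers: List[int], clients: List[Client],
            chunk: int = 4, memory_mb: int = 64) -> int:
    subjobs = split_job(numbers, chunk, memory_mb)
    plan = match(subjobs, clients)
    partial = {s.index: run_on_client(s, c) for s, c in plan}  # step 7
    return sum(partial[i] for i in sorted(partial))            # step 8


clients = [Client("pc-a", 128), Client("pc-b", 32), Client("pc-c", 256)]
total = run_job(list(range(10)), clients)
print(total)
```

Because the subjobs are independent, any eligible client can run any subjob, which is what lets this style of system scale to a very large number of desktop machines.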
Information Grids I
• Actually nearly all Grids consist of composing access to data with processing of that data in some computer program
• In Compute/File Grids (Data Grids for Globus), one naturally allows database access from programs, although in some cases the dominant access is to files
• In Information Grids, we consider access to databases but of course view files as a special case of databases
• The real difference is what tier we are looking at:
– Compute/File Grids are looking at “backend resources”
– Information Grids are looking at the “middle tier”, because typically data volumes are not large enough to stress typical middle-tier mechanisms

Information Grids II
• Should use the middle tier where possible and adopt a hybrid model, with control always in the middle tier and using the backend only where needed
– This would require reworking a lot of tools, e.g. Condor should schedule services not jobs
• Most programming models specify either a “program view” or a “service view” and do not separate the two
– Developments like GT3 will allow changes but it will take a long time before key tools are implemented in hybrid mode
• Note Bioinformatics and many other Information Grids only require the service view
– In UK e-Science these applications have started with the “Web Service” and not the “Globus” view

[Figure labels: User Services; Portal Services; Grid Computing Environments; Application Service; Middleware; System Services; Service View; Program View; “Core” Grid; Raw (HPC) Resources; Database]

OGSA-DAI (Malcolm Atkinson, Edinburgh)
UK e-Science Grid Core Programme
Development of Data Access and Integration Services for OGSA
http://umbriel.dcs.gla.ac.uk/NeSC/general/projects/OGSA_DAI
- Access to XML Databases
- Access to Relational Databases
- Distributed Query Processing
- XML Schema Support for e-Science

DAI Key Services
• GridDataService (GDS): Access to data & DB operations
• GridDataServiceFactory (GDSF): Makes GDS & GDSF
• GridDataServiceRegistry (GDSR): Discovery of GDS(F)
& Data
• GridDataTranslationService (GDTS): Translates or Transforms Data
• GridDataTransportDepot (GDTD): Data transport with persistence
• Integrated Structured Data Transport
• Relational & XML models supported
• Role-based Authorisation
• Binary structured files (later)

[Figure: Client, Registry, Factory, Grid Data Service and XML / Relational database; SOAP/HTTP, service creation and API interactions]
1a. Request to Registry for sources of data about “x”
1b. Registry responds with Factory handle
2a. Request to Factory for access to database
2b. Factory creates GridDataService to manage access
2c. Factory returns handle of GDS to client
3a. Client queries GDS with XPath, SQL, etc.
3b. GDS interacts with database
3c. Results of query returned to client as XML

Interface transparency: one GDS supports multiple database types
[Figure: three Clients; Grid Data Service; Relational database; XML database; Directory / File system]

Software Availability
Available now
• Phase 1 prototype of GDS, GDSF & GDSR for XML
– Java implementations for the Axis/Tomcat platform and the Xindice database
• Globus-2 Relational database support
• BinX Schema v0.2 (www.epcc.ed.ac.uk/gridserve/WP5)
– An XML Schema for describing the structure of binary datafiles – the power of XML for terabyte files
Software Q1 2003
• Reference implementation 1
– Access & Update
– XML databases
– Relational databases
• To be released as Basic Services in Globus Toolkit 3
umbriel.dcs.gla.ac.uk/NeSC/general/projects/OGSA_DAI/products

Advanced Components
[Figure: Client; GDS:performScript; GDS; DB; Translation; GDT; Consumer]

Composed Components
[Figure: Client, GDS and GDT components chained by GDS:performScript calls with Translation steps, ending at a Consumer]

Futures of OGSA-DAI
• Allow querying of distributed databases – this is using Grid to federate multiple databases
• Grid is “intrinsically” federation technology – need to mimic classic database federation ideas in a Grid language
• Form composite Schema from integration of those of individual databases
(OGSA-DAI allows you to query each database web service to find schema)
• Decide how to deal with the very important case where the user view is a complex filter run on a database query
• Hardest when one needs to dynamically assign a resource to perform the filter
• Could view as a “simulation Web Service” outside OGSA-DAI
[Figure: WSDL of Filter; Filter; DB]

Semantic Grid starts with the Semantic Web which is a “dream” and a project of W3C
“The Semantic Web is an extension of the current Web in which information is given a well-defined meaning, better enabling computers and people to work in cooperation. It is the idea of having data on the Web defined and linked in a way that it can be used for more effective discovery, automation, integration and reuse across various applications. The Web can reach its full potential if it becomes a place where data can be processed by automated tools as well as people”
From the W3C Semantic Web Activity statement

Digital Brilliance is a phase transition coming from a “collective effect” in the Grid Spin Glass.
• The Hosting environment is the “Ether”
• The Resources are the Spins
• The forces are the meta-data linking resources
• Knowledge (The Higgs) will emerge when we get enough meta-data to force a phase transition

Resource Description Framework
[Figure: axis of richer semantics, from Classical Web to Semantic Web]

OWL Web Ontology Language
“The World Wide Web as it is currently constituted resembles a poorly mapped geography. Our insight into the documents and capabilities available are based on keyword searches, abetted by clever use of document connectivity and usage patterns. The sheer mass of this data is unmanageable without powerful tool support. In order to map this terrain more precisely, computational agents require machine-readable descriptions of the content and capabilities of web accessible resources. These descriptions must be in addition to the human-readable versions of that information.”
The OWL Guide

SW Tools: Good Tools for recording meta-data (OWL) but not so advanced in looking at their implications

• Semantic Web requires a metadata-enabled Web
• Where will the metadata come from?
• How about from the linked rich resources of a virtual organization?
• A Grid …….
[Figure: Classical Web and Classical Grid, with an axis of more computation]

Grid is metadata based middleware

Astronomy Sky Survey Data Grid
[Figure: layered diagram with numbered layers; labels as extracted]
1. Portals and Workbenches
2. Knowledge & Resource Management
3. Metadata View; Catalog Analysis; Bulk Data Analysis; Standard APIs and Protocols; Concept space
4. Grid Security; Caching; Replication; Backup; Scheduling; Data View; Information
5. Discovery; Metadata delivery; Data Discovery; Data Delivery; Standard Metadata format, Data model, Wire format
6. Catalog Mediator; Data mediator; Catalog/Image Specific Access
7. Compute Resources; Derived Collections; Catalogs; Data Archives

An Example of RDF and Dublin Core
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/metadata/dublin_core#">
  <rdf:Description about="http://www.dlib.org">
    <dc:Title>D-Lib Program - Research in Digital Libraries</dc:Title>
    <dc:Description>The D-Lib program supports the community of people with research interests in digital libraries and electronic publishing.
    </dc:Description>
    <dc:Publisher>Corporation For National Research Initiatives</dc:Publisher>
    <dc:Date>1995-01-07</dc:Date>
    <dc:Subject>
      <rdf:Bag>
        <rdf:li>Research; statistical methods</rdf:li>
        <rdf:li>Education, research, related topics</rdf:li>
        <rdf:li>Library use Studies</rdf:li>
      </rdf:Bag>
    </dc:Subject>
    <dc:Type>World Wide Web Home Page</dc:Type>
    <dc:Format>text/html</dc:Format>
    <dc:Language>en</dc:Language>
  </rdf:Description>
</rdf:RDF>

For example…
• Annotations of results, workflows and database entries could be represented by RDF graphs using controlled vocabularies described in RDF Schema and OWL
• Personal notes can be XML documents annotated with metadata or RDF graphs linked to results or experimental plans
• Exporting results as RDF makes them available to be reasoned over
• RDF graphs can be the “glue” that associates all the components (literature, notes, code, databases, intermediate results, sketches, images, workflows, the person doing the experiment, the lab they are in, the final paper)
• The provenance trails that keep a record of how a collection of services were orchestrated so they can be replicated or replayed, or act as evidence

More meta-data …
• Represent the syntactic data types of e-Science objects using XML Schema data types
• Represent domain ontologies for the semantic mediation between database schema, an application’s inputs and outputs, and workflow work items
• Represent domain ontologies and rules for parameters of machines or algorithms to reason over allowed configurations
• Use reasoning over execution plans, workflows and other combinations of services to ensure the semantic validity of the composition
• Use RDF as a common data model for merging results drawn from different resources or instruments
• Capture the structure of messages that are exchanged between components

And more meta-data …
• At the data/computation layer: classification of computational and data resources, performance metrics, job control,
management of physical and logical resources
• At the information layer: schema integration, workflow descriptions, provenance trail
• At the knowledge layer: problem solving selection, intelligent portals
• Governance of the Grid, for example access rights to databases, personal profiles and security groupings
• Charging infrastructure, computational economy, support for negotiation, e.g. through an auction model

[Figure: Classical Web, Semantic Web, Classical Grid and Semantic Grid on axes of richer semantics and more computation. http://www.semanticgrid.org. Source: Norman Paton]

Summary of Grid Types
• Compute/File Grid: The “Linux workstation view of distributed system” – need planning, scheduling of 10,000’s of jobs, efficient movement of data to processors
• Desktop Grid: as above but use huge numbers of “foreign” compute resources
• Information Grids: Web service access to meta-data rich data repositories
• Hybrid (complexity) Grids: Combination of Information and Compute/File Grids
• Peer-to-peer Grid: Unstructured general purpose access to other style grids
• Semantic Grid: Enables knowledge discovery in all Grids
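As a concrete reading of the Compute/File entry in the summary, efficient movement of data to processors depends on the replica management requirement stated earlier: register and query for replicas, then select the best one for a transfer. The Python sketch below is a minimal illustration under stated assumptions. The in-memory catalogue, the site names, the file name and the cost model (start-up latency plus size divided by bandwidth) are all hypothetical and are not the interface of the Globus Replica Location Service.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Replica:
    site: str
    bandwidth_mb_s: float  # measured or predicted throughput to the compute node
    latency_s: float       # fixed start-up cost of a transfer


# Hypothetical replica catalogue: logical file name -> physical replicas.
catalogue: Dict[str, List[Replica]] = {}


def register(logical_name: str, replica: Replica) -> None:
    """Record that a physical copy of the logical file exists at a site."""
    catalogue.setdefault(logical_name, []).append(replica)


def estimated_transfer_s(replica: Replica, size_mb: float) -> float:
    """Assumed cost model: start-up latency plus streaming time."""
    return replica.latency_s + size_mb / replica.bandwidth_mb_s


def best_replica(logical_name: str, size_mb: float) -> Replica:
    """Query the catalogue and pick the replica with the lowest estimated cost."""
    replicas = catalogue.get(logical_name, [])
    if not replicas:
        raise KeyError(f"no replica registered for {logical_name}")
    return min(replicas, key=lambda r: estimated_transfer_s(r, size_mb))


register("run42.esd", Replica("site-a", bandwidth_mb_s=10.0, latency_s=1.0))
register("run42.esd", Replica("site-b", bandwidth_mb_s=40.0, latency_s=5.0))

small = best_replica("run42.esd", size_mb=10.0)    # latency dominates
large = best_replica("run42.esd", size_mb=2000.0)  # bandwidth dominates
print(small.site, large.site)
```

The best replica depends on the request as well as on the sites, which is why selection belongs with the planner and uses performance measurements and predictions from the information services.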