The Globus® Toolkit V4.0: Current Status, Future Directions
Carl Kesselman

The Application-Infrastructure Gap
- Dynamic and/or Distributed Applications
- Shared Distributed Infrastructure

Bridging the Gap: Service-Oriented Infrastructure
- Service-oriented applications
  - Wrap applications as services
  - Compose applications into workflows
- Service-oriented infrastructure
  - Provision physical resources to support application workloads
Diagram labels: Users; Composition; Workflows; Invocation; Appln Service; Provisioning

Globus is Service-Oriented Infrastructure Technology
- Software for service-oriented infrastructure
  - Service-enable new & existing resources
  - E.g., GRAM on computer, GridFTP on storage system, custom application service
  - Uniform abstractions & mechanisms
- Tools to build applications that exploit service-oriented infrastructure
  - Registries, security, data management, …
- Open source & open standards
  - Each empowers the other
- Enabler of a rich tool & service ecosystem

Globus as Service-Oriented Infrastructure
- Uniform interfaces, security mechanisms, Web service transport, monitoring
Diagram labels: User Application; Tool; GRAM; Computers; Reliable File Transfer; MDS-Index; User Svc; Host Env; Specialized resource; MyProxy; GridFTP; Storage; DAIS; Database

A Typical eScience Use of Globus: Network for Earthquake Engineering
Simulation
- Links instruments, data, computers, people

An eBusiness Use of Globus: SAP Demonstration @ GlobusWorld
- 3 Globus-enabled applns:
  - CRM: Internet Pricing Configurator (IPC)
  - CRM: Workforce Management (WFM)
  - SCM: Advanced Planner & Optimizer (APO)
- Applications modified to:
  - Adjust to varying demand & resources
  - Use Globus to discover & provision resources

SAP AG R/3 Internet Pricing & Configurator (IPC)
Diagram labels: Web Browsers / Batch Processes (typically several thousand requests); IPC Dispatcher; IPC Server (two)
  1. Request: Price Query
  2. Delegation of Request
  3. Response: Pricelist, depending on: Time, Discount, Number of Items, …

Globus Toolkit V4.0
- Major release planned April 29th 2005
- Fifteen months of design, development and testing
- 1.8M lines of code
- Major contributions from five institutions
- Hundreds of millions of service calls executed over weeks of continuous operation
- Significant improvements over GT3 code base in all dimensions

Our Goals for GT4
- Usability, reliability, scalability, …
- Documentation at acceptable quality level
- Consistency with latest standards (WS-*, WSRF, WS-N, etc.)
  and Apache platform
- Web service components have quality equal or superior to pre-WS components
- WS-I Basic (Security) Profile compliant
- New components, platforms, languages
  - And links to larger Globus ecosystem

Globus Open Source Grid Software
Chart legend: GT2, GT3, GT4
Web Services Components
- Security: Community Authorization Service; Delegation Service; WS Authentication Authorization
- Data Management: OGSA-DAI [Tech Preview]; Reliable File Transfer
- Execution Management: Community Scheduler Framework [contribution]; Grid Resource Allocation Mgmt (WS GRAM)
- Information Services: Monitoring & Discovery System (MDS4)
- Common Runtime: Python WS Core [contribution]; C WS Core; Java WS Core
Non-WS Components
- Security: Pre-WS Authentication Authorization; Credential Management
- Data Management: GridFTP; Replica Location Service
- Execution Management: Grid Resource Allocation Mgmt (Pre-WS GRAM)
- Information Services: Monitoring & Discovery System (MDS2)
- Common Runtime: C Common Libraries; XIO

GT4 Components
Diagram labels:
- CLIENT: Your C Client; Your Java Client; Your Python Client
- SERVER:
  - Java Services in Apache Axis, plus GT Libraries and Handlers: Your Java Service; GRAM; RFT; Delegation; Index; Trigger; Archiver; CAS; OGSA-DAI; GTCP
  - Python hosting, GT Libraries (pyGlobus WS Core): Your Python Service
  - C Services using GT Libraries and Handlers (C WS Core): Your C Service
  - Pre-WS MDS; Pre-WS GRAM; RLS; MyProxy; SimpleCA; GridFTP
- Interoperable WS-I-compliant SOAP messaging
- X.509 credentials = common authentication

GT4 Web Services Core
- Supports both Globus services (GRAM, RFT, Delegation, etc.) & user-developed services
- Redesign to enhance scalability, modularity, performance, usability
- Leverages existing WS standards
  - WS-I Basic Profile: WSDL, SOAP, etc.
  - WS-Security, WS-Addressing
- Adds support for emerging WS standards
  - WS-Resource Framework, WS-Notification
- Java, Python, & C hosting environments

GT4 Web Services Core
Diagram labels: User Applications; Custom Web Services; Custom WSRF Web Services; GT4 WSRF Web Services; WS-Addressing, WSRF, WS-Notification; WSDL, SOAP, WS-Security; Registry; Administration; GT4 Container

Open Source/Open Standards
- WSRF developed in collaboration with IBM
  - Currently in OASIS process
- Contributions to Apache for
  - WS-Security
  - WS-Addressing
  - Axis
  - Apollo (WSRF)
  - Hermes (WS-Notification)

GT4 Security Highlights
- Standards based support for message level and transport level security
- Standards based authorization (SAML) via CAS or callout
- Stand-alone delegation service
- More authentication options
  - MyProxy, simpleCA, …

GT4’s Use of Security Standards

GT4 Security
Diagram labels: Users; SSL/WS-Security with Proxy Certificates; Services (running on user’s behalf); Authz Callout; Access; Compute Center; Rights; CAS or VOMS issuing SAML or X.509 ACs; Local Policy on VO identity or attribute authority; MyProxy; VO; Rights’; KCA

GT4 Data Management
- Stage large data to/from nodes
- Replicate data for performance & reliability
- Locate data of interest
- Provide access to diverse data sources
  - File systems, parallel file systems, hierarchical storage (GridFTP)
  - Databases (OGSA-DAI)

GT4 Data Functions
- Find your data: Replica Location Service
  - Managing ~40M files in production settings
- Move/access your data: GridFTP, RFT
  - High-performance striped data movement
  - 27 Gbit/s memory-to-memory on a 30 Gbit/s link (90% utilization) with 32 IBM TeraGrid nodes.
  - 17.5 Gbit/s disk-to-disk limited by the storage system
  - Reliable movement of 120,000 files (so far)
- Couple data & execution management
  - GRAM uses GridFTP and RFT for staging

GridFTP in GT4
- 100% Globus code
  - No licensing issues
  - Stable, extensible
- IPv6 Support
- XIO for different transports
- Striping: multi-Gb/sec wide area transport
- Pluggable
  - Front-end: e.g., future WS control channel
  - Back-end: e.g., HPSS, cluster file systems
  - Transfer: e.g., UDP, NetBLT transport
Figure: Bandwidth vs. Striping, disk-to-disk on TeraGrid. Y axis: Bandwidth (Mbps), 0 to 20000. X axis: Degree of Striping, 0 to 70. Series: # Stream = 1, 2, 4, 8, 16, 32.

Reliable File Transfer: Third Party Transfer
- Fire-and-forget transfer
- Web services interface
- Many files & directories
- Integrated failure recovery
Diagram labels: RFT Client; SOAP Messages; Notifications (Optional); RFT Service; two GridFTP Servers, each with Protocol Interpreter, Master DSI, IPC Link, IPC Receiver, Slave DSI, Data Channels

Replica Location Service
- Identify location of files via logical to physical name map
- Distributed indexing of names, fault tolerant update protocols
- GT4 version scalable & stable
- Managing ~40 million files across ~10 sites

  Local DB   Index update send (secs)   Index Bloom filter (secs)   Bloom filter (bits)
  10K        <1                         2                           1M
  1M         2                          24                          10 M
  5M         7                          175                         50 M

Data Replication Service (tech preview)
- Pull “missing” files to local site
Diagram labels: List of required Files; Site A and Site B, each with Data Replication Service, Local Replica Catalog, Replica Location Index, Reliable File Transfer Service, GridFTP

OGSA-DAI
- Flexible & Composable Middleware
- Data access
  - Relational & XML Databases, semi-structured files
- Data integration
  - Multiple data delivery mechanisms, data translation
- Extensible &
  Efficient framework
  - Request documents contain multiple tasks
  - A task = execution of an activity
  - Group work to enable efficient operation
- Extensible set of activities
  - > 30 predefined, framework for writing your own
- Moves computation to data
- Pipelined and streaming evaluation
- Concurrent task evaluation

Predefined Activities
- Developers encouraged to roll their own; many do
- fileAccess, fileManipulation, directoryAccess, fileWriting
- relationalResourceManager, sqlBulkLoadRowset, sqlUpdateStatement, sqlStoredProcedure, sqlQueryStatement
- xmlCollectionManagement, xmlResourceManagement, xQueryStatement, xUpdateStatement, xPathStatement
- DeliverFromFile, DeliverFromGDT, DeliverFromGFTP, DeliverFromURL
- DeliverToFile, DeliverToGDT, DeliverToGFTP, DeliverToURL, DeliverToStream
- outputStream, inputStream
- xslTransform, zipArchive, gzipCompression

OGSA-DAI
- Current Release = Release 5
  - Added Installation wizards & indexed files
  - >1100 registered users we know about
  - Running on 3 message passing infrastructures
- Release 6: May 2005
  - Improved client side API
  - Explicit control of sequential & parallel tasks
  - Dynamic reconfigurability
- Release 7: September 2005
  - Sessions and local transactions
  - More integration components, distributed relational query
  - WS-DAI reference implementation
- Talk by Neil Chue Hong 15:45 Today

Execution Management (GRAM)
- Common WS interface to schedulers
  - Unix, Condor, LSF, PBS, SGE, …
- More generally: interface for process execution management
  - Lay down execution environment
  - Stage data
  - Monitor & manage lifecycle
  - Kill it, clean up
- A basis for application-driven provisioning

GT4 GRAM
- 2nd-generation WS implementation optimized for performance, stability, scalability
- Streamlined critical path
  - Use only what you need
- Flexible credential management
  - Credential cache & delegation service
- GridFTP & RFT used for data operations
  - Data staging & streaming output
  - Eliminates redundant GASS code
- Single and multi-job support

GT4 GRAM Structure: WSRF/WSN Poster Child
Diagram labels: service host(s) and compute
element(s); Client; Delegate; Delegation; Transfer request; RFT File Transfer; Compute element; Local job control; sudo; GT4 Java Container; GRAM services; GRAM adapter; GridFTP; FTP control; Local scheduler; User job; FTP data; Remote storage element(s)

Initial Investigations into VM Deployment
Diagram labels: Client; request VM EPR; VM Factory; create new VM image; use existing VM image; Create VM image; inspect and manage; VM Repository; deploy & suspend; start program; VM Manager; Resource; VM

Monitoring and Discovery
- “Every service should be monitorable and discoverable using common mechanisms”
  - WSRF/WSN provides those mechanisms
- A common aggregator framework for collecting information from services, thus:
  - MDS-Index: XPath queries, with caching
  - MDS-Trigger: perform action on condition
- Deep integration with Globus containers & services: every GT4 service is discoverable
  - GRAM, RFT, GridFTP, CAS, …

GT4 Monitoring & Discovery
Diagram labels: Clients (e.g., WebMDS); WS-ServiceGroup; Registration & WSRF/WSN Access; GT4 Container (three); MDS-Index; Automated registration in container; adapter; Custom protocols for non-WSRF entities; GRAM; GridFTP; RFT; User

MDS4 Extensibility
- Aggregator framework provides
  - Registration management
  - Collection of information from Grid Resources
  - Plug-in interface for data access, collection, query, …
- WebMDS framework provides for customized display
  - XSLT transformations

GT4 Documentation is Much Improved!

The Globus Ecosystem
- Globus components address core issues relating to resource access, monitoring, discovery, security, data movement, etc.
  - GT4 being the latest version
- A larger Globus ecosystem of open source and proprietary components provides complementary components
  - A growing list of components
- These components can be combined to produce solutions to Grid problems
  - We’re building a list of such solutions

Many Tools Build on, or Can Contribute to, GT4-Based Grids
- Condor-G, DAGman
- VOMS
- MPICH-G2
- PERMIS
- GRMS
- GT4IDE
- Nimrod-G
- Sun Grid Engine
- Ninf-G
- PBS scheduler
- Open Grid Computing Env.
- LSF scheduler
- Commodity Grid Toolkit
- GridBus
- GriPhyN Virtual Data System
- TeraGrid CTSS
- Virtual Data Toolkit
- NEES
- GridXpert Synergy
- IBM Grid Toolbox
- Platform Globus Toolkit
- …

2005 and Beyond
- We have a solid Web services base
- We now want to build, on that base, an open source service-oriented infrastructure
  - Virtualization
  - New services for provisioning, data management, security, VO management
  - End-user tools for application development
  - Etc., etc.

How Globus Works
- Globus is a distributed open source community with many contributors & users
  - CVS, documentation, bugzilla, email lists
- Modular structure allows many to contribute
- Globus Alliance Board provides governance when needed
  - Meritocracy: individuals who demonstrate ongoing contributions & commitment
  - Primarily: what to include, when to release
- Globus Alliance is an informal partnership of organizations led by Board members

Evolution of the Globus Alliance
- Argonne/U.Chicago (Childers, Foster): 1995
- USC/ISI (Kesselman): 1995
- Edinburgh (Atkinson, Parsons): 2003
- Swedish PDC (Johnsson, Mulmo): 2003
- NCSA (Welch): 2004
- Univa (Czajkowski, Tuecke): 2004
- Other contributors will surely be added

From eScience to eBusiness
- Since ~2001, growing interest in Globus for commercial use
  - Enterprises, IT vendors, ISVs asking Globus leaders to address commercial needs
  - But hard to do in a research laboratory
- In response, we have created two new organizations
  - Globus Consortium
  - Univa

Globus Consortium (www.globusconsortium.com)
- Nonprofit organization funded by companies to advance Globus
  Toolkit for enterprise use
- Initial sponsor members: HP, IBM, Intel, Sun
- Initial contributors: Nortel, Univa
- First two projects already identified
  - Member-driven software quality improvements
  - Contributions to job submission standards
- Other projects to be defined, e.g.
  - Develop new features key to enterprise use
  - Education & outreach

Univa
- Provider of commercial support, services, & products around open source Globus
  - Commercial distribution of GT4 & beyond
  - Integration with enterprise systems
- Committed to open source & open standards
- Founded by Tuecke, Foster, Kesselman
  - Tuecke left Argonne to be CEO
  - Foster, Kesselman remain at Argonne, ISI
- Experienced management team
  - Rich Miller, Vas Vasiliadis, Paul Davé, Bob Mandel

Globus and its User Community
- How can “we” best support “you”?
  - We try to provide the best software we can
  - We use bugzilla & other community tools
  - We work to grow the set of contributors
- How can “you” best support “us”?
  - Become a contributor: of software, bug fixes, answers to questions, documentation
  - Provide us with success stories that can justify continued Globus development
  - Promote Globus within your communities

Working with GT4
- Download and use the software, and provide feedback
  - Join gt4friends@globus.org mail list
- Review, critique, add to documentation
  - Globus Doc Project: http://gdp.globus.org
- Tell us about your GT4-related tool, service, or application

So…
- GT4 is a significant step forward in the quality, functionality and standards compliance of GT.
- Beta release available for immediate use, final April 29th
- Downloads and docs at: www.globustoolkit.org
- 2nd Edition: www.mkp.com/grid2
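
Backup: Bloom filter sketch for the Replica Location Service slide

The Replica Location Service slide describes a logical-to-physical name map whose contents are summarized for the index as a Bloom filter. The Python below is a minimal sketch of that idea only, not RLS code: the class, the `lookup` helper, the filter size, the hash scheme and the example names and URLs are all hypothetical and chosen for illustration. It shows why a compact bit array is enough for an index to answer "might this site hold this logical name?" without copying the full catalog.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: a compact, lossy summary of a set of names.

    Illustrative only; sizes and hashing are assumptions, not RLS settings.
    """

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, name: str):
        # Derive several bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(name.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, name: str) -> None:
        for pos in self._positions(name):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, name: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(name)
        )


# Hypothetical local catalog: logical file name -> physical replica locations.
local_catalog = {
    "lfn://example/run42.dat": ["gsiftp://site-a.example.org/data/run42.dat"],
    "lfn://example/run43.dat": ["gsiftp://site-a.example.org/data/run43.dat"],
}

# The summary an index could hold instead of the full catalog.
summary = BloomFilter(num_bits=1024, num_hashes=3)
for lfn in local_catalog:
    summary.add(lfn)


def lookup(lfn: str) -> list:
    """Consult the summary first; query the catalog only on a possible hit."""
    if not summary.might_contain(lfn):
        return []
    return local_catalog.get(lfn, [])


if __name__ == "__main__":
    print(lookup("lfn://example/run42.dat"))
    print(lookup("lfn://example/absent.dat"))
```

A Bloom filter never reports a stored name as absent, but it can report an absent name as possibly present, so a hit is always confirmed against the catalog itself. Filter size trades memory and update cost against that false-positive rate.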