Introduction to eInfrastructure
Jennifer M. Schopf
UK National eScience Centre
Argonne National Lab

Talk Outline
- Definition of Grids, eInfrastructure, and eResearch
- JISC plans
- Globus Toolkit
  - Provider of basic infrastructure
  - Focus on data tools
- OMII – Open Middleware Infrastructure
  - UK repository and distribution of eResearch tools

What is a Grid?
- Many definitions – many differences, especially between academics and industry
  - Both use the buzzword to get funding
- My definition
  - Resource sharing
  - Coordinated problem solving
  - Dynamic, multi-institutional virtual orgs

Resource Sharing
- Resources can be anything
  - Computers
  - Storage/repositories
  - Sensors and networks
  - People and software
- Local control of the resources, and local policies for their use
- Sharing is always conditional
  - Issues of trust, policy
  - Negotiation and payment

Coordinated Problem Solving
- Beyond client-server
  - Client-server defines a small set of well-understood interactions as the only ones that can take place
- Actions in this space can include
  - Distributed data analysis
  - Computation and visualization of results
  - Collaboration

Virtual Organization (VO) Concept
[Figure: Virtual Community C overlaid on Organization A and Organization B. The same people appear with one role in the community and another in their home organization: Person A (Principal Investigator / Faculty), Person B (Administrator / Staff), Person D (Researcher / Staff), Person E (Researcher / Faculty); Person C (Student) and Person F (Faculty) appear only with an organizational role. Labelled resources: Compute Servers C1, C1', C2, C3; File server F1 (disk A); File server F1 (disks A and B).]
- VO for each application or workload
- Carve out and configure resources for a particular use and set of users

Dynamic, Multi-institutional Virtual Organizations
- Crossing administrative domains
  - No one has full control over the resources
- Local policy not global
  - Different local policy on different sites
- Community overlays on classic organizational structures
  - Large or small, static or dynamic

What is eScience or eResearch?
- Use of distributed resources, in a coordinated way, across multiple administrative domains to do science or further your research
- “Classic” eScience
  - Use compute and data resources at many sites to run large scale simulations for a physics or biology application
- Today’s use cases
  - Replicate data across multiple sites to increase reliability, redundancy and performance
  - Use one common interface to access a variety of data resources at multiple sites
  - Look at a number of available resources to select the one that best suits the application needs at this time

What is eInfrastructure?
- “A framework (political, technological and administrative) for the easy and cost-effective shared use of distributed electronic resources across a geographical area”
- “The combination of research infrastructure, grid, and broadband technologies projects”
- “Anything that enables eScience, collaborative research – distributed, persistent, reliable, accessible services”
- “Broader than Grids - includes things like digital libraries, networking, etc”
- “current Grid-based eInfrastructure model”

How does JISC define it?
- “Similar to NSF’s cyberinfrastructure work” (CI==Grids)
- Tony Hey (JCSR chair) says
  - “A national eInfrastructure to support collaborative and multidisciplinary research and innovation is the joint responsibility of RCUK (OST) and JISC (HEFCs)”
- 2006 eInfrastructure–Grid initiatives continue building advanced Grid-empowered infrastructures
  - Production quality & ready-to-use SW
  - Environments dynamically adaptable to user needs

Malcolm Read has said
E-infrastructure includes:
- Networks (internet, light paths…)
- Computers (workstations, servers, HPC…)
- Access controls (security, AAA…)
- Middleware (metadata…)
- Finding tools (portals, search engines…)
- Digital libraries (bibliographic, text, images, sound…)
- Research data (national and scientific databases, individual data…)

JISC funding for eInfrastructure
- July 27 ‘05 press release for additional funds
  - http://www.jisc.ac.uk/index.cfm?name=news_spendingreview
- Continued development of JANET
- Further digitisation of major scholarly collections
- Enhancement to e-learning programmes (e-assessment, e-portfolios, e-learning tools)
- Development of the e-infrastructure
  - Including development of collaborative environments
- Development of a shared infrastructure to support use of institutional repositories

Much Still To Be Defined
- I’ve been told ~ £11M specifically for eInfrastructure
- Starting in April 2006, 2 years of funding
- Programme manager being hired
- OST roadmap is the basis (due by March, no draft available yet)
- Areas are (no mapping to funding amount)
  1: Middleware/AA/DRM
  2: Networks and Computer Power (Hardware)
  3: Preservation and Curation
  4: Search and Navigation
  5: Data and Information Creation
  6: Virtual Research Communities

JISC cont.
- When this is better formulated, it will be broadcast widely
- There’s a JCSR meeting in mid-February where some of it should be solidified

Questions on Definitions or JISC?

Two Common eInfrastructure Approaches in the UK
- Globus Toolkit
- Open Middleware Infrastructure Institute (OMII) release

What functionality is needed to use a Grid?
- Basics:
  - Run a job
  - Transfer a file
  - Find out what’s going on (service and job monitoring)
  - All done securely
- Higher-level
  - Replication
  - Higher level data movement
  - Workflow-scheduling

Globus Toolkit Was Created To Help Applications
- The Globus Toolkit consists of collections of solutions to problems that frequently come up when trying to build collaborative distributed applications
- Heterogeneity
  - Focus on simplifying heterogeneity for application developers
  - Working towards more “vertical solutions”
- Standards
  - Capitalize on and encourage use of existing standards (IETF, W3C, OASIS, GGF)
  - Reference implementations of new/proposed standards in these organizations
- Open source, open contribution model

Globus is an Hour Glass
- Local sites have their own policies, installs – heterogeneity!
  - Queuing systems, monitors, network protocols, etc
- Globus unifies
  - Build on Web services
  - Use WS-RF, WS-Notification to represent/access state
  - Common management abstractions & interfaces
[Figure: hourglass with Higher-Level Services and Users at the top, Standard GT4 Interfaces at the neck, and Local heterogeneity at the base]

Globus Toolkit: Open Source Grid Infrastructure
Globus Toolkit v4, www.globus.org
- Security: Authentication Authorization, Delegation, Community Authorization, Credential Mgmt
- Data Mgmt: GridFTP, Reliable File Transfer, Replica Location, Data Access & Integration, Data Replication
- Execution Mgmt: Grid Resource Allocation & Management, Workspace Management, Community Scheduling Framework, Grid Telecontrol Protocol
- Info Services: Index, Trigger, WebMDS
- Common Runtime: Java Runtime, C Runtime, Python Runtime

GT4 Web Services Core
- Supports both GT (GRAM, RFT, Delegation, etc.) & user-developed services
- Redesign to enhance scalability, modularity, performance, usability
- Leverages existing WS standards
  - WS-I Basic Profile: WSDL, SOAP, etc.
  - WS-Security, WS-Addressing
- Adds support for emerging WS standards
  - WS-Resource Framework, WS-Notification
- Java, Python, & C hosting environments
  - Java is standard Apache

WSRF & WS-Notification
- Naming and bindings (basis for virtualization)
  - Every resource can be uniquely referenced and has one or more associated services for interacting
- Lifecycle (basis for resilient state management)
  - Resources created by services following a factory pattern
  - Resource destroyed immediately or scheduled
- Information model (basis for monitoring & discovery)
  - Resource properties associated with resources
  - Operations for querying and setting this info
  - Asynchronous notification of changes to properties
- Service groups (basis for registries & collective services)
  - Group membership rules and membership management
- Base fault type

WSRF vs XML/SOAP
- The definition of WSRF means that the Grid and Web services communities can move forward on a common base
- Why not just use XML/SOAP?
  - WSRF and WS-N are just XML and SOAP
  - WSRF and WS-N are just Web services
- Benefits of following the specs:
  - These patterns represent best practices that have been learned in many Grid applications
  - There is a community behind them
  - Why reinvent the wheel?
  - Standards facilitate interoperability

Basic Globus Security Mechanisms
- Grid-wide identities implemented as PKI certificates
- Transport-level and message-level authentication
- Ability to delegate credentials to agents
- Ability to map between Grid & local identities
- Local security administration & enforcement
- Single sign-on support implemented as “proxies”
- A “plug in” framework for authorization decisions

The Challenge of Grid Resource Management
- Enabling secure, controlled remote access to heterogeneous computational resources and management of remote computation
  - Authentication and authorization
  - Resource discovery & characterization
  - Reservation and allocation
  - Computation monitoring and control
- Addressed by a set of protocols & services
  - GRAM protocol as a basic building block
  - Resource brokering & co-allocation services
  - GSI for security, MDS for discovery

GT4 Execution Management (GRAM)
- Common WS interface to schedulers
  - Unix, Condor, LSF, PBS, SGE, …
- More generally: interface for process execution management
  - Lay down execution environment
  - Stage data
  - Monitor & manage lifecycle
  - Kill it, clean up
- A basis for application-driven provisioning

GT4 Data Functions
- Find your data: Replica Location Service
  - Managing ~40M files in production settings
- Move/access your data: GridFTP, Reliable File Transfer (RFT)
  - High-performance striped data movement
- Couple data & execution management
  - GRAM uses GridFTP and RFT for staging
- Access databases through standard Grid interfaces: OGSA-DAI

GridFTP in GT4
- Basic file transfer support, and memory-to-memory copies
- High-performance, secure, reliable data transfer
  - Optimized for high-bandwidth wide-area networks
- FTP with well-defined extensions
  - Uses basic Grid security (control and data channels)
  - Multiple data channels for parallel transfers
  - Partial file transfers
  - Third-party (direct server-to-server) transfers
- Performance tuning
  - Greatly improves performance over most FTP implementations
  - On the TeraGrid network, achieved 27 Gb/s on a 30 Gb/s link (90% utilization) with 32 nodes

Reliable File Transfer: Third Party Transfer
- Fire-and-forget transfer
- Web services interface
- Many files & directories
- Integrated failure recovery
[Figure: an RFT Client exchanges SOAP Messages, and optionally Notifications, with the RFT Service, which drives a transfer between two GridFTP Servers. Each server is drawn with a Protocol Interpreter, a Master DSI, Slave DSIs, an IPC Link, an IPC Receiver, and Data Channels.]

Monitoring and Discovery System (MDS4)
- Grid-level monitoring system used most often for resource selection
  - Aid user/agent to identify host(s) on which to run an application
- Uses standard interfaces to provide publishing of data, discovery, and data access, including subscription/notification
  - WS-ResourceProperties, WS-BaseNotification, WS-ServiceGroup
- Functions as an hourglass to provide a common interface to lower-level monitoring tools

MDS4 Components
- Information providers
  - Basic data sources – queue data, cluster data, etc
  - Can be from web services, executables, files
- Index Service
  - Caching registry of data
- Trigger Service
  - Warnings when conditions are met
- WebMDS
  - Visualization of data

Tested Platforms
- Debian
- Fedora Core
- FreeBSD
- HP/UX
- IBM AIX
- Red Hat
- Sun Solaris
- SGI Altix (IA64 running Red Hat)
- SuSE Linux
- Tru64 Unix
- Apple MacOS X (no binaries)
- Windows – Java components only
List of binaries and known platform-specific install bugs at http://www.globus.org/toolkit/docs/4.0/admin/docbook/ch03.html

Many Tools Build on, or Can Contribute to, GT4-Based Grids
- Condor-G, DAGman
- MPICH-G2
- GRMS
- Nimrod-G
- Ninf-G
- Open Grid Computing Env.
- Commodity Grid Toolkit
- GriPhyN Virtual Data System
- Virtual Data Toolkit
- GridXpert Synergy
- Platform Globus Toolkit
- VOMS
- PERMIS
- GT4IDE
- Sun Grid Engine
- PBS scheduler
- LSF scheduler
- GridBus
- TeraGrid CTSS
- NEES
- IBM Grid Toolbox
- …

Any questions about Globus?
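
The MDS4 slides describe monitoring data being used for resource selection and for warnings when conditions are met. A minimal Python sketch of that idea follows. It is not the MDS4 API: the `HostInfo` record, its fields, the host names, and both functions are hypothetical, and the "index" is just an in-memory list standing in for whatever an Index Service would cache.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class HostInfo:
    """One record as an information provider might report it (hypothetical fields)."""
    host: str
    free_cpus: int
    queued_jobs: int


def select_host(index: Iterable[HostInfo], cpus_needed: int) -> Optional[HostInfo]:
    """Resource selection: pick a host with enough free CPUs and the shortest queue."""
    candidates = [h for h in index if h.free_cpus >= cpus_needed]
    if not candidates:
        return None
    # Shortest queue first; ties go to the host with more free CPUs.
    return min(candidates, key=lambda h: (h.queued_jobs, -h.free_cpus))


def fired_triggers(index: Iterable[HostInfo],
                   condition: Callable[[HostInfo], bool]) -> List[str]:
    """Trigger-style check: names of the hosts for which the condition holds."""
    return [h.host for h in index if condition(h)]


# Hypothetical snapshot of cached monitoring data.
INDEX = [
    HostInfo("cluster-a.example.org", free_cpus=4, queued_jobs=12),
    HostInfo("cluster-b.example.org", free_cpus=16, queued_jobs=3),
    HostInfo("cluster-c.example.org", free_cpus=32, queued_jobs=3),
]

if __name__ == "__main__":
    best = select_host(INDEX, cpus_needed=8)
    print(best.host if best else "no suitable host")
    print(fired_triggers(INDEX, lambda h: h.queued_jobs > 10))
```

The selection rule (shortest queue, then most free CPUs) is only one possible policy; a real agent would choose whatever criteria suit the application at the time.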

Open Middleware Infrastructure Institute
- To be a leading provider of reliable, interoperable and open-source Grid middleware components, services and tools to support advanced Grid-enabled solutions in academia and industry
- Formed at the University of Southampton (2004)
  - Focus on an easy-to-install e-Infrastructure solution
  - Utilise existing software & standards
- Expanding with new partners in 2006
  - OGSA-DAI team at Edinburgh
  - myGrid team at Manchester
Slides compliments of Steven Newhouse

OMII Functions
- Provide a software repository of Grid components and tools from e-science projects
- Re-engineer software, harden it, and provide support for components sourced from the community
- Contract the development of “missing” software components necessary in grid middleware (managed programme)
- Provide an integrated grid middleware release of the sourced software components

The Managed Programme: Distribution and Repository
- OGSA-DAI (Data Access service)
- GridSAM (Job Submission & Monitoring service)
- Grimoires (Registry service based on UDDI)
- GeodiseLab (Matlab & Jython environments)
- FINS (Notification services using WS-Eventing)
- BPEL (Workflow service)
- MANGO (Managing workflows with BPEL)
- FIRMS (Reliable messaging)

So…
- eInfrastructure has many definitions – but basically it’s Grid computing
- JISC has funding for this – but has not yet defined where it will be spent
- Globus Toolkit provides many basic tools, and is incorporated in many projects, especially those focused on data movement
- In the UK, OMII is another useful source of eInfrastructure software

Additional Information
- Contact: Jennifer M. Schopf
  - jms@mcs.anl.gov
  - http://www.mcs.anl.gov/~jms
- Globus Alliance: http://www.globus.org
- Information about OMII:
  - http://www.omii.ac.uk
  - s.newhouse@omii.ac.uk