NeSC Workshop on Applications and Testbeds on the Grid Prior to GGF5 and HPDC-11 In collaboration with the Applications and Testbeds Research Group and the Grid User Services Research Group of the Global Grid Forum Workshop Presentations collected by John Towns, NCSA and NLANR jtowns@nlanr.net NeSC Workshop on Applications and Testbeds on the Grid Date: Venue: 20 July 2002 Marriott Hotel Glasgow, 500 Argyle Street, Glasgow The GGF’s Applications Research Group (ARG) and Grid Users Services Research Group (GUS), in seeking to provide a bridge between the wider application community and the developers and directors of grid policies, standards and infrastructures, held a workshop in Glasgow, Scotland preceding GGF5. The goals of this workshop were: • • • • to provide a forum for (prospective) applications utilizing Grids to spread information about both the current state-of-the-art and future directions of tools, toolkits and other instruments for users and application programmers to encourage users and application programmers to make use of existing Grid infrastructure to gather user and developer requirements for the effective use of Grid technologies This document is the collected presentation of the workshop following the agenda below. All presentation abstracts and presentations will be available through the workshop web page at: http://umbriel.dcs.gla.ac.uk/Nesc/general/esi/events/apps/ Agenda for the NeSC Workshop on Applications and Testbeds on the Grid Saturday 20 July 2002 0800 0845 Registration 0845 0900 Welcome 0900 1000 Session: Applications 1 0900 Grid Enabled Optimisation and Design Search for Engineering (GEODISE) - Simon Cox 0920 Taming of the Grid: Lessons Learned and Solutions Found in the National Fusion Collaboratory - Kate Keahey 0940 Chemical Reactor Performance Simulation: A Grid-Enabled Application - Ken Bishop 1000 1030 BREAK 1030 1130 Session: Applications 2 1030 Experiences with Applications on the Grid using PACX-MPI Matthias Mueller 1050 DAME Presentation Overview - Tom Jackson 1110 CFD Grid Research in N*Grid Project – Chun-ho Sung 1130 1230 OPEN DISCUSSION "Grid Experiences: Good, Bad and Ugly" 1230 1400 LUNCH 1400 1520 Session: Infrastructure 1 1400 GridTools: "Customizable command line tools for using Grids" Ian Kelley 1420 Using pyGlobus to Expose Legacy Applications as OGSA Components - Keith Jackson 1440 An Overview of the GAT API - Tom Goodale 1500 Collaborative Tools for Grid Support - Laura McGinnis 1520 1550 BREAK 1550 1710 Session: Infrastructure 2 1550 Application Web Service Tool Kit - Geoffrey Fox 1610 Grid Portals: Bridging the gap between Grids and application scientists - Michael Russell 1630 A Data Miner for the Information Power Grid - Thomas Hinke 1650 Grid Programming Frameworks at ICASE - Thomas Eidson 1710 1800 OPEN DISCUSSION "If we build it, will they come?" 
1800 CLOSING Grid Enabled Optimisation and Design Search for Engineering (GEODISE) Prof Simon Cox Southampton University http://www.geodise.org Academic and Industrial Partners Southampton, Oxford and Manchester Simon Cox- Grid/ W3C Technologies and High Performance Computing Global Grid Forum Apps Working Group Andy Keane- Director of Rolls Royce/ BAE Systems University Technology Partnership in Design Search and Optimisation Mike Giles- Director of Rolls Royce University Technology Centre for Computational Fluid Dynamics Carole Goble- Ontologies and DARPA Agent Markup Language (DAML) / Ontology Inference Language (OIL) BAE Systems- Engineering Rolls-Royce- Engineering Fluent- Computational Fluid Dynamics Microsoft- Software/ Web Services Intel- Hardware Compusys- Systems Integration Epistemics- Knowledge Technologies Condor- Grid Middleware Nigel Shadbolt- Director of Advanced Knowledge Technologies (AKT) IRC 1 The GEODISE Team ... ) ) ) ) ) ) ) ) ) ) ) ) Richard Boardman Sergio Campobasso Liming Chen Mike Chrystall Simon Cox Mihai Duta Clive Emberey Hakki Eres Matt Fairman Carole Goble Mike Giles Zhuoan Jiao ) ) ) ) ) ) ) ) ) ) ) ) ) Andy Keane Juri Papay Graeme Pound Nicola Reader Angus Roberts Mark Scott Tony Scurr Nigel Shadbolt Paul Smart Barry Tao Jasmin Wason Gang “Luke” Xue Fenglian Xu Design 2 Design Challenges Modern engineering firms are global and distributed How to … ? … improve design environments … cope with legacy code / systems … produce optimized designs CAD and analysis tools, user interfaces, PSEs, and Visualization Optimisation methods … integrate large-scale systems in a flexible way Management of distributed compute and data resources … archive and re-use design history Data archives (e.g. design/ system usage) … capture and re-use knowledge Knowledge repositories & knowledge capture and reuse tools. “Not just a problem of using HPC” NASA Satellite Structure Optimized satellite designs have been found with enhanced vibration isolation performance using parallel GA’s running on Intel workstation clusters. 3 4 5 Baseline 3D-boom on test 6 Gas Turbine Engine: Initial Design Base Geometry Secondary Kinetic Energy Collaboration with Rolls-Royce Design of Experiment & Response Surface Modelling Initial Geometry RSM Construct DoE RSM Evaluate CFD CFD … CFD CFD CFD … CFD CFD CFD … CFD CFD CFD … CFD Cluster Parallel Analysis RSM Tuning Search Using RSM CFD Build Data-Base Adequate ? Best Design 7 Optimised Design Geometry Secondary Kinetic Energy The Grid Problem “Flexible and secure sharing of resources among dynamic collections of individuals within and across organisations” ) Resources = assets, capabilities, and knowledge Capabilities (e.g. 
application codes, analysis & design tools) Compute Grids (PC cycles, commodity clusters, HPC) Data Grids Experimental Instruments Knowledge Services Virtual Organisations Utility Services Grid middleware mediates between these resources 8 GEODISE Engineer GEODISE PORTAL Knowledge repository Ontology for Engineering, Computation, & Optimisation and Design Search Visualization Session database Traceability OPTIMISATION OPTIONS System APPLICATION SERVICE PROVIDER Intelligent Application Manager Reliability Security QoS CAD System CADDS IDEAS ProE CATIA, ICAD Globus, Condor, SRB Optimisation archive COMPUTATION Licenses and code Analysis CFD FEM CEM Design archive Parallel machines Clusters Internet Resource Providers Pay-per-use Intelligent Resource Provider Geodise will provide grid-based seamless access to an intelligent knowledge repository, a state-of-the-art collection of optimisation and search tools, industrial strength analysis codes, and distributed computing & data resources GEODISE Demo (1) Security Infrastructure Authentication & Authorisation (2) Define Geometry to optimise Nacelle Design 3D Axisymmetric (2D) (3) Sample Objective function to build Response Surface Model Grid Computing 9 GEODISE Demo (4) Optimise over Response Surface Model Target velocity function evaluation Search using Response Surface Model Width of curve on lower nacelle surface Position of curve on upper nacelle surface (5) Grid Database Query and Postprocessing of Results Automated Data Archiving Auto-Generate Database Problem Solving Environment XML Schema from PSE Insert Files into Database XML files Evolve Database New XML Schema from PSE Process Schema Create Database Data Repository Insert Files Reconcile Schema Update Database Data Archive Knowledge Discovery Updated XML Schema Web/Agent access to Repository GEODISE Home Movie 10 Knowledge Technologies Knowledge Capture for Design Process Ontology Driven Service Composition Workflow Management ) Knowledge Driven Acquire Model Re-use Retrieve Publish Maintain 11 The future of design optimisation Design Optimisation needs integrated services ) Design improvements driven by CAD tools coupled to advanced analysis codes (CFD, FEA, CEM etc.) ) On demand heterogeneous distributed computing and data spread across companies and time zones. ) Optimization “for the masses” alongside manual search as part of a problem solving environment. ) Knowledge based tools for advice and control of process as well as product. 
Geodise will provide grid-based seamless access to an intelligent knowledge repository, a state-of-the-art collection of optimisation and search tools, industrial strength analysis codes, and distributed computing and data resources 12 The Taming of the Grid: Lessons Learned in the National Fusion Collaboratory Kate Keahey Overview z Goals and Vision z Challenges and Solutions z Deployment War Stories z Team z Summary NESC Workshop, Glasgow 07/20/02 National Fusion Collaboratory 2 Goals “ enabling more efficient use of experimental facilities more effective integration of experiment, theory and modelling” z Fusion Experiments – Pulses every 15-20 minutes – Time-critical execution z We want: – More people running more simulation/analysis codes in that critical time window – Reduce the time/cost to maintain the software – Make the best possible use of facilities > Share them > Use them efficiently z Better collaborative visualization (outside of scope) NESC Workshop, Glasgow 07/20/02 National Fusion Collaboratory 3 Overview of the Project z Funded by DOE as part of the SciDAC initiative – 3 year project – Currently in its first year z First phase – SC demonstration of a prototype z Second phase – More realistic scenario at Fusion conferences – First shot at research issues z z Planning an initial release for November timeframe Work so far: – Honing existing infrastructure – Initial work on design and development of new capabilities NESC Workshop, Glasgow 07/20/02 National Fusion Collaboratory 4 Vision z Vision of the Grid as a set of “network services” – Characteristics of the software (problems) > Software is hard to port and maintain (large, complex) > Needs updating frequently and consistently (physics changes) > Maintenance of portability is expensive > “Software Grid” as important as “Hardware Grid” > Reliable between pulse execution for certain codes > Prioritization, pre-emption, etc. – Solution: > provide the application (along with hardware and maintenance) as a remotely available “service to community” > Provide the infrastructure enabling this mode of operation NESC Workshop, Glasgow 07/20/02 National Fusion Collaboratory 5 What prevents us? z Issues of control and trust – – – – z How do How do Will my How do I enter into contract with resource owner? I ensure that this contract is observed? code get priority when it is needed? I deal with a dynamic set of users? Issues of reliability and performance – Time-critical applications > How do we handle reservations and ensure performance? – Shared environment is more susceptible to failure – No control over resources – But a lot of redundancy NESC Workshop, Glasgow 07/20/02 National Fusion Collaboratory 6 Other Challenges z z z Service Monitoring Resource Monitoring Good understanding of quality of service – Application-level – Composition of different QoS z z Accounting Abstractions – How do “network services” relate to OGSA Grid Services? 
z Implementational and deployment issues – firewalls NESC Workshop, Glasgow 07/20/02 National Fusion Collaboratory 7 Issues of Trust: Use policies z Requirements – Policies coming from different sources: > A center should be able to dedicate a percentage of its resources to a community > Community may want to grant different rights to different groups of users – A group within a VO may be given management rights for certain groups of jobs – Managers should be able to use their higher privileges (if any) to manage jobs – Shared/dynamic accounts dealing with dynamic user community problem NESC Workshop, Glasgow 07/20/02 National Fusion Collaboratory 8 Issues of Trust (cntd.) resource … owner virtual organization policy specification and management client Request Gird-wide client credential -credential -policy target -policy action NESC Workshop, Glasgow 07/20/02 Akenti (authorization system) policy evaluation GRAM (JM) (resource management) enforcement module PEP Local enforcer credential National Fusion Collaboratory local resource management system 9 Issues of Trust (cntd.) z Policy language – Based on RSL – Additions > Policy tags, ownership, actions, etc. z Experimenting with different enforcement strategies – Gateway – Sandboxing – Services z z z Joint work with Von Welch (ANL), Bo Liu Work based on GT2 Collaborating with Mary Thompson (LBNL) NESC Workshop, Glasgow 07/20/02 National Fusion Collaboratory 10 Issues of Reliable Performance z Scenario: – A GA scientist needs to run TRANSP (at PPPL) between experimental pulses in less than 10 mins – TRANSP inputs can be experimentally configured beforehand to determine how its execution time relates to them > Loss of complexity (“physics”) to gain time – The scientist reserves the PPPL cluster for the time of the experiment – Multiple executions of TRANSP, initiated by different clients and requiring different QoS guarantees can co-exist on the cluster at any time, but when a reservation is claimed, the corresponding TRANSP execution claims full CPU power NESC Workshop, Glasgow 07/20/02 National Fusion Collaboratory 11 Issues of Reliable Performance (cntd) Use policies (administrator) Meta-data Information (servers) z service interface multiple clients, different requirements TRANSP Service Interface TRANSP QoS requirements (client) execution Execution broker Broker multiple service installations Status: an OGSA-based prototype – Uses DSRT and other GARA-inspired solutions to implement pre-emption, reservations, etc. 
z Joint work with Kal Motawi NESC Workshop, Glasgow 07/20/02 National Fusion Collaboratory 12 Deployment (Firewall Problems) z The single most serious problem: firewalls – Globus requires > Opening specific ports for the services (GRAM, MDS) > Opening a range of non-deterministic ports for both client and server > Those requirements are necessitated by the design – Site policies and configurations > > > > > z Blocking outgoing ports Opening a port only for traffic from a specific IP Authenticating through the firewall using SecureID card NAT (private network) “opening a firewall is an extremely unrealistic request” An extremely serious problem: makes us unable to use the Fusion Grid NESC Workshop, Glasgow 07/20/02 National Fusion Collaboratory 13 Firewalls (Proposed Solutions) z z Inherently difficult problem Administrative Solutions – Explain why it is OK to open certain ports > Document explaining Globus security (Von Welch) – Agree on acceptable firewall practices to use with Globus > Document outlining those practices (Von Welch) – Talk to potential “influential bodies” > ESCC: August meeting, Lew Randerson, Von Welch > DOE Science Grid: firewall practices under discussion z Technical Solutions – OGSA work: Von Welch, Frank Siebenlist – Example: route interactions through one port z Do you have similar problems? Use cases? NESC Workshop, Glasgow 07/20/02 National Fusion Collaboratory 14 Firewalls (Resources) z New: updated firewall web page – http://www.globus.org/security/v2.0/firewalls.html z Portsmouth, UK – http://esc.dl.ac.uk/Papers/firewalls/globus-firewall-experiences.pdf z DOE SG Firewall Policy Draft (Von Welch) z DOE SG firewall testbed z Globus Security Primer for Site Administrators (Von Welch) NESC Workshop, Glasgow 07/20/02 National Fusion Collaboratory 15 The NFC Team z Fusion – David Schissel, PI, General Atomics (applications) – Doug McCune, PPPL (applications) – Martin Greenwald, MIT (MDSplus) z Secure Grid Infrastructure – Mary Thompson, LBNL (Akenti) – Kate Keahey, ANL, (Globus, network services) z Visualization – ANL – University of Utah – Princeton University z More information at www.fusiongrid.org NESC Workshop, Glasgow 07/20/02 National Fusion Collaboratory 16 Summary z Existing infrastructure – A lot in relatively little time – Caveat: firewalls z Building infrastructure – Network services > A view of a “software grid” > Goal: to provide execution reliable in terms of an application-level QoS > To accomplish this goal we need: z Authorization and use policies z Resource management strategies NESC Workshop, Glasgow 07/20/02 National Fusion Collaboratory 17 Simulation of Chemical Reactor Performance – A Grid-Enabled Application – Kenneth A. Bishop Li Cheng Karen D. 
Camarda The University of Kansas kbishop@ku.edu NeSC Workshop July 20, 2002 Presentation Organization • Application Background • • Grid Assets In Play • • • Chemical Reactor Performance Evaluation Hardware Assets Software Assets Contemporary Research • • NCSA Chemical Engineering Portal Application Cactus Environment Application NeSC Workshop July 20, 2002 1 Chemical Reactor Description Reaction Conditions: Temperature: 640 ~ 770 K Pressure: 2 atm Coolant Molten Salt Feed Products O-Xylene : Air Mixture Phthalic Anhydride V2O5 Catalyst in Tubes NeSC Workshop July 20, 2002 Simulator Capabilities • • • • Reaction Mechanism: Heterogeneous Or Pseudo-homogeneous Reaction Path: Three Specie Or Five Specie Paths Flow Phenomena: Diffusive vs Bulk And Radial vs Axial Excitation: Composition And/Or Temperature NeSC Workshop July 20, 2002 2 Chemical Reactor Start-up CENTER TEMPERATURE RADIUS TUBE ENTRANCE AXIAL POSITION INITIAL CONDITION: FEED NITROGEN FEED TEMP. 640 K COOLANT TEMP. 640 K FINAL CONDITION: FEED 1% ORTHO-XYLENE FEED TEMP. 683 K COOLANT TEMP. 683 K EXIT TEMPERATURE K 640 770 NeSC Workshop July 20, 2002 Reactor StartStart-up: t = 60 LOW HIGH + TEMPERATURE ORTHOORTHO-XYLENE PHTHALIC ANHYDRIDE TOLUALDEHYDE PHTHALIDE COx NeSC Workshop July 20, 2002 3 Reactor StartStart-up: t = ∞ LOW HIGH + TEMPERATURE ORTHOORTHO-XYLENE PHTHALIC ANHYDRIDE TOLUALDEHYDE PHTHALIDE COx NeSC Workshop July 20, 2002 Grid Assets In Play - Hardware • The University of Kansas • • • • JADE O2K [6] 250MHz, R10000, 512M RAM PILTDOWN Indy [1] 175MHz, R4400, 64M RAM Linux Workstations Windows Workstations • University of Illinois (NCSA) • MODI4 O2K [48] 195MHz, R10000, 12G RAM • Linux ( IA32 [968] & IA64 [256] Clusters) • Boston University • LEGO O2K [32] 195MHz, R10000, 8G RAM NeSC Workshop July 20, 2002 4 Grid Assets In Play - Software • The University of Kansas • IRIX 6.5: Globus 2.0 (host); COG 0.9.13 [Java] (client); Cactus • Linux: Globus 2.0 (host); COG 0.9.13 [Java] (client); Cactus • Windows 2K: COG 0.9.13 (client); Cactus • University of Illinois (NCSA) • IRIX 6.5: Globus 2.0 (host); COG 0.9.13 (client); Cactus • Linux: Cactus • Boston University • IRIX 6.5: Globus 1.1.3 (host); COG 0.9.13 (client); Cactus NeSC Workshop July 20, 2002 Research Projects • Problem Complexity: Initial (Target) • Pseudo-homogeneous (Heterogeneous) Kinetics • Temperature And Feed Composition Excitation • 1,500 (70,000) grid nodes & 200 (1,000) time steps • Applications • Alliance Chemical Engineering Portal; Li Cheng – Thrust: Distributed Computation Assets – Infrastructure: Method of Lines, XCAT Portal, DDASSL • Cactus Environment; Karen Camarda – Thrust: Parallel Computation Algorithms – Infrastructure: Crank-Nicholson, Cactus, PETSc NeSC Workshop July 20, 2002 5 ChE Portal Project Plan • Grid Asset Deployment • Client: KU • Host: KU or NCSA or BU • Grid Services Used • Globus Resource Allocation Manager • Grid FTP • Computation Distribution (File Xfer Load) • • • • Direct to Host Job Submission (Null) Client- Job Submission; Host- Simulation (Negligible) Client- Simulation; Host- ODE Solver (Light) Client- Solver; Host- Derivative Evaluation (Heavy) NeSC Workshop July 20, 2002 ChE Portal Project Results • Run Times (Wall Clock Minutes) Load\Host PILTDOWN Null 76.33 Negligible NA Light NA Heavy 2540* JADE 22.08 27.76 35.08 NA MODI4 7.75 8.25 13.49 15.00** • 211,121 Derivative Evaluations • ** Exceeded Interactive Queue Limit After 3 Time Steps (10,362 Derivative Evaluations) NeSC Workshop July 20, 2002 6 ChE Portal Project 
Conclusions • Conclusions • The Cost For The Benefits Associated With The Use Of Grid Enabled Assets Appears Negligible. • The Portal Provides Robust Mechanisms For Managing Grid Distributed Computations. • The Cost Of File Transfer Standard Procedures As A Message Passing Mechanism Is Extremely High. • Recommendation • A High Priority Must Be Assigned To Development Of High Performance Alternatives To Standard File Transfer Protocols. NeSC Workshop July 20, 2002 Cactus Project Plan • Grid Asset Deployment • Client: KU • Host: NCSA (O2K, IA32 Cluster, IA64 Cluster) • Grid Services Used • MPICH-G • Cactus Environment Evaluation • Shared Memory : Message Passing • Problem Size: 5x105 – 1x108 Algebraic Equations • Grid Assets: 0.5 – 8.0 O2K Processor Minutes 0.1 – 4.0 IA32 Cluster Processor Minutes • Application Script Use NeSC Workshop July 20, 2002 7 Cactus Project Results Parallel Speedup on IA32 Linux Cluster 2D Dynamic simulation, 1 Processor/Cluster Node 35 Speedup (t1/tN) 30 25 9x33 20 18x66 15 36x132 10 5 0 0 5 10 15 20 25 30 Number of Cluster Nodes NeSC Workshop July 20, 2002 Cactus Project Conclusions • Conclusions • The IA32 Cluster Outperforms O2K On The Small Problems Run To Date. (IA32 Faster Than O2K; IA32 Speedup Exceeds O2K Speedup.) • The Cluster Computations Appear To Be Somewhat Fragile. (Convergence Problems Encountered Above 28 Cluster Node Configuration; Similar (?) Problems With The IA64 Cluster.) • The Grid Service (MPICH-G) Evaluation Has Only Begun. • Recommendations • Continue The Planned Evaluation of Grid Services. • Continue The Planned IA64 Cluster Evaluation. NeSC Workshop July 20, 2002 8 Overall Conclusions • The University Of Kansas Is Actively Involved In Developing The Grid Enabled Computation Culture Appropriate To Its Research & Teaching Missions. • Local Computation Assets Appropriate To Topical Application Development And Use Are Necessary. • Understanding Of And Access To Grid Enabled Assets Are Necessary. 
NeSC Workshop July 20, 2002 9 Experiences with Applications on the Grid using PACX-MPI Matthias Mueller mueller@hlrs.de HLRS www.hlrs.de University Stuttgart Germany 23.07.2002 Matthias Müller Höchstleistungsrechenzentrum Stuttgart Outline • • • Definition, Scenarios and Success Stories Middleware and Tools: The DAMIEN project Case Study 23.07.2002 Matthias Müller Höchstleistungsrechenzentrum Stuttgart 1 Grid Scenario • Standard approach: one big supercomputer • Grid approach: distributed resources 23.07.2002 Matthias Müller Höchstleistungsrechenzentrum Stuttgart Example of Distributed Resources: Supercomputers • • • • • • HLRS: CSAR: PSC: TACC: NCHC: JAERI: 23.07.2002 Cray T3E 512/900, 460 GFlops Cray T3E 576/1200, 691 GFlops Cray T3E 512/900, 460 GFlops Hitachi SR8000 512CPU/64 Nodes, 512 GFlops IBM SP3 Winter Hawk2,168CPU/42 Nodes, 252 GFlops NEC SX-4/4 Nodes, 8 GFlops ========= 2.383 TFlops Matthias Müller Höchstleistungsrechenzentrum Stuttgart 2 Applications 23.07.2002 • CFD (HLRS) – re-entry simulation of space craft • Processing of radio astronomy data (MC) – pulsar search code • DSMC (HLRS) – simulation of granular media Matthias Müller Höchstleistungsrechenzentrum Stuttgart GWAAT: Global Wide Area Application Testbed NSF Award at SC’ 99 23.07.2002 Matthias Müller Höchstleistungsrechenzentrum Stuttgart 3 Network Topology STAR-TAP Chicago TANET2 APAN IMnet Tsukuba/Tokyo Tokyo Hsinchu vBNS Abilene PSC SCInet Pittsburgh TEN 155 DFN New York Dante Frankfurt DFN DFN JANET X.X.X.X Belwü RUS Dallas Stuttgart JAERI TACC NCHC PSC Hitachi SR 8000 NEC SX-4 IBM SP3 Cray T3E sr8k.aist.go.jp frente.koma.jaeri ivory.nchc.gov.tjaromir.psc.edu .go.jp 150.29.228.82 128.182.73.68 w 202.241.61.92 140.110.7.x 23.07.2002 ATM PVC 2 Mbit/s Shared connections HLRS Cray T3E hwwt3e-at.hww.de 129.69.200.195 Manchester MCC Cray T3E turing.cfs.ac.uk 130.88.212.1 Matthias Müller Höchstleistungsrechenzentrum Stuttgart Middleware for Scientific Grid Computing: PACX-MPI • • • PACX-MPI is a Grid enabled MPI implementation no difference between parallel computing and Grid computing higher latencies for external messages (70ms compared to 20µs) Co-operation with JAERI regarding Communication Library (stampi) 23.07.2002 Matthias Müller Höchstleistungsrechenzentrum Stuttgart 4 Status of the Implementation (I) • • • • Full MPI 1.2 implemented MPI-2 functionality – Extended collective operations – Language interoperability routines – Canonical Pack/Unpack Functions MPI 2 JoD Cluster attributes Other implemented features: – data conversion – data compression – data encryption 23.07.2002 Matthias Müller Höchstleistungsrechenzentrum Stuttgart Status of the implementation (II) • PACX-MPI ported and tested on – Cray T3E – SGI Origin/Onyx – Hitachi SR2201 and SR8000 – NEC SX4 and SX5 – IBM RS6000/SP – SUN platforms – Alpha platforms – LINUX: IA32 and IA64 23.07.2002 Matthias Müller Höchstleistungsrechenzentrum Stuttgart 5 Challenge: What is the application performance? 
• • • • • • • 23.07.2002 CFD (HLRS) – re-entry simulation of space craft Processing of radio astronomy data (MC) – pulsar search code DSMC (HLRS) – simulation of granular media Electronic structure simulation (PSC) Risk management for environment crisis (JAERI) GFMC (NCHC): – high-tc superconductor simulation Coupled vibro-acoustic simulation Matthias Müller Höchstleistungsrechenzentrum Stuttgart Comparision DSMC <-> MD • • Domain decomposition Speed up DSMC 23.07.2002 MD Matthias Müller Höchstleistungsrechenzentrum Stuttgart 6 DSMC - Direct Simulation Monte Carlo on Transatlantic Grid P a r tic le s/C P U 1953 w ith o u t P A C X 1 x 60 N odes 0 .0 5 se c w ith P A C X 2 x 30 N odes 0 .2 8 se c 3906 0 .1 0 se c 0 .3 1 se c 7812 0 .2 0 se c 0 .3 1 se c 15625 0 .4 0 se c 0 .4 0 se c 31250 0 .8 1 se c 0 .8 1 se c 125000 3 .2 7 se c 3 .3 0 se c 500000 1 3 .0 4 se c 1 3 .4 1 se c 23.07.2002 Matthias Müller Höchstleistungsrechenzentrum Stuttgart Necessary Tools: The DAMIEN project 23.07.2002 Matthias Müller Höchstleistungsrechenzentrum Stuttgart 7 The development phase • Sequential code(s) • Parallel (MPI) code(s) • parallelization • code coupling • optimisation MPI MpCCI • compiling • linking with libraries MpCCI PACX-MPI • debugging • performance analysis Marmot MetaVampir • testing Results ok ? yes 23.07.2002 no Matthias Müller Höchstleistungsrechenzentrum Stuttgart MpCCI: Basic Functionality • Communication – Based on MPI – Coupling of Sequential and Parallel Codes – Communicators for Codes (internal) and Coupling (external) • Neighborhood Search – Bucket Search Algorithm – Interface for User-defined Neighborhood Search • Interpolation – Linear Surface Interpolation for Standard Elements – Volume Interpolation – User-defined Interpolation 23.07.2002 Matthias Müller Höchstleistungsrechenzentrum Stuttgart 8 DAMIEN End-User application • • • EADS distributed over numerous sites all over Europe Computing resources are distributed ⇒ “natural” need for Gridsoftware to couple the resources Coupled vibro-acoustic simulations – structure of rockets during the launch – noise reduction in airplanes-cabins 23.07.2002 Matthias Müller Höchstleistungsrechenzentrum Stuttgart MetaVampir - Application Level Analysis 23.07.2002 Matthias Müller Höchstleistungsrechenzentrum Stuttgart 9 The production phase Given problem Grid-enabled code Experience with small problem sizes MetaVampirtrace Dimemastrace Dimemas Determine optimal number of processors and combination of machines MetaVampir Execute job Configuration Manager QoS Manager 23.07.2002 Matthias Müller Höchstleistungsrechenzentrum Stuttgart Dimemas: Tuning Methodology Sequential machine Tracing MP library - MPI Tracing - PVM facilities - etc... Message Passing Code Parallel machine Dimemas Trace File Code modification DIMEMAS Paraver Visualization Trace File Visualization and analysis Simulation Parameters modification 23.07.2002 Matthias Müller Höchstleistungsrechenzentrum Stuttgart 10 DAMIEN tools in the production phase Dimemas-Tracefile Edit new configuration Execute Dimemas simulator Check results with Vampir All configurations tested ? 
no yes Specify best configuration Specify QoS parameter Launch job with Configuration Manager 23.07.2002 Matthias Müller Höchstleistungsrechenzentrum Stuttgart Case Study: PCM 23.07.2002 Matthias Müller Höchstleistungsrechenzentrum Stuttgart 11 PCM Direct numerical simulation of turbulent reactive flows • 2-D flow fields • detailed chemical reactions • spatial discretization: 6th order central derivatives • integration in time: 4th order explicit Runge-Kutta Challenging Applications 23.07.2002 Matthias Müller Höchstleistungsrechenzentrum Stuttgart Requirements for a 3D simulation • • • • Components – density, velocity and energy – mass fractions for chemical species (9 - 100) Spatial discretization – 100 µm (typical flame-front), 1 cm for computational domain – 100 grid-points into each direction Discretization in time – 10-8 for some important intermediate radicals – 1 s for slowly produced pollutants (e.g. NO) Summary – 100 variables, 106 grid-points – 1 ms simulation time with time steps of about 10-8 – 105 iterations with 108 unknowns 23.07.2002 Matthias Müller Höchstleistungsrechenzentrum Stuttgart 12 Example of a production run Cray T3E/900 512 nodes 64 GByte memory Hitachi SR8000 16 nodes /128 CPUs 128 GByte memory Hippi: 100MBit, 4ms • • • • auto-ignition process fuel (10%H2 and 90%N2, T=298K) and heated oxidiser (air at T=1298K) distribution is superimposed with turbulent flow-field temporal evolution computed 23.07.2002 Matthias Müller Höchstleistungsrechenzentrum Stuttgart Performance Analysis with Meta-Vampir 23.07.2002 Matthias Müller Höchstleistungsrechenzentrum Stuttgart 13 Message statistics - process view 23.07.2002 Matthias Müller Höchstleistungsrechenzentrum Stuttgart Message statistics - cluster-view 23.07.2002 Matthias Müller Höchstleistungsrechenzentrum Stuttgart 14 Result of production run: Maximum heat-release 23.07.2002 Matthias Müller Höchstleistungsrechenzentrum Stuttgart Summary and Conclusion • • • • • • To make use of the Grid you need middleware and tools A Grid aware MPI implementation like PACX-MPI offers an incremental, optimized approach to the Grid Together with other standard tools this attracted a lot of scientific applications Applications are the driving force for PACX-MPI and DAMIEN This kind of scientific Grid computing is very demanding, but the requirements are common to other forms of Grid computing – performance, resource management, scheduling, security The network is the bottleneck, because: – Fat networks that are weakly interconnected – Political barriers between networks – Performance is not transparent – no responsibility for end-to-end performance 23.07.2002 Matthias Müller Höchstleistungsrechenzentrum Stuttgart 15 Overview of the DAME Project Distributed Aircraft Maintenance Environment University of York Martyn Fletcher Project Goal z z z z Build a GRID Application for Distributed Diagnostics. Application is generic, but application demonstrator will be for aircraft maintenance. Three year project - began in January 2002. One of six pilot projects funded by the EPSRC under the current UK e-Science initiative. 1 Outline Of Basic Operation z z z z On landing – DAME receives data from the engine. DAME looks for patterns, performs modelling etc. and provides a diagnosis / prognosis. DAME system made up of GRID based web services. DAME system also provides analysis tools etc. for use by Domain Experts etc. Benefits of Use z Allows problems to be detected early and common causes to be detected. 
z Ultimately will reduce flight delays, in-flight shutdowns and aborted takeoffs - due to engine problems. 2 DAME Collaborators z z z z z z z University of York. University of Leeds. University of Oxford. University of Sheffield. Rolls-Royce Plc. (RR). Data Systems & Solutions LLC. (DS&S). Cybula Ltd. Technologies Used z z z z AURA - Advanced Uncertain Reasoning Architecture – high performance pattern matcher developed by University of York and Cybula Ltd. QUOTE – “On The Engine” - intelligent engine signature collection / local diagnosis system developed by University of Oxford for Rolls Royce / DS&S. Decision Support – University of Sheffield. GRID architecture / web services – University of Leeds. 3 Current Work z z z z z Developing expertise in GRID architectures and web services. Developing AURA to make it available as a GRID web service. QUOTE work is ongoing. Developing Decision Support. Working with the “users” RR and DS&S to develop the use cases (following slides). Use Case Process 1. 2. 3. 4. 5. 6. 7. Define the DAME system scope and boundaries. Identify the list of primary actors. Identify the list of primary actor goals / use cases for the DAME system. Describe the outermost (overall) summary use cases. Revise the outermost summary use cases. Expand each DAME system use cases. Reconsider and readjust the set of uses cases. 4 Primary Actors z z z z z z z z z Maintenance team Engine releaser Maintenance scheduler Maintenance advisor Domain expert MR&O engine releaser MR&O Condition Recorder Knowledge engineer System administrator Outermost Use Cases z z z z z z z z z Release Engine. Dispossess the QUOTE anomaly. Plan Maintenance Schedule. Provide Maintenance Advice. Provide Maintenance Information. Provide Expert Information. Pass-Off Engine. Capture Knowledge. Maintain the System. 5 DAME Use Cases z z z z z z z z Perform Diagnosis. Perform Analysis. Model The System. Match The Pattern. Provide The Decision. Update Local Diagnostics. Provide Statistics Report Etc. DAME Use Case Diagram DAME Update Local Diagnostics Model The System «extends» Engine & GSS «extends» Match The Pattern «extends» Perform Analysis «extends» «uses» «uses» Perform Diagnosis Domain Expert (RR) Provide The Decision «uses» Engine Releaser (Airline) Perform Diagnosis not detected by QUOTE Get Pass-Off Information Maintenance Team Store Result MRO Engine Releaser Provide System Diagnostics Information Provide Statistics Report Perform System Software Maintenance System Administrator Maintenance Scheduler (Airline) Capture Knowledge Maintenance Advisor (DS&S or Airline Equivalent) Knowledge Engineer 6 Use Case – Perform Diagnosis (Main Success Scenario) 1. 2. 3. 4. 5. The DAME system analyses the data using e.g. Match the Pattern, Model the System use cases, etc. The DAME system assesses the results and determines the diagnoses / prognoses and confidence levels. The Domain Expert (DE) receives the proposed diagnosis / prognosis. The DE accepts it. The DE provides the result to the Maintenance Team. Use Case – Perform Diagnosis (Extensions) 4a. The Domain Expert does not accept the diagnosis / prognosis: requests more information from the Maintenance Team and / or Performs Analysis. Continues at step 5. 7 Future Use Case (UC) Work. z Expand existing UCs in discussion with the users - RR and DS&S. z Use these in planning the DAME demonstrations. z Update UCs as project progresses. z Consider other domains e.g. medical. Use Case Conclusions. z Use case diagrams – Tend to confuse people. 
– Useful as an overview only. z Use cases in text form – Easily understood. – Users like them. – Very little jargon. z Key point - know when to stop expansion of use cases. FOR MORE INFO... http://www.cs.york.ac.uk/dame 8 CFD Grid Research in N*Grid Project KISTI Supercomputing Center Chun-ho Sung Supercomputing Center Introduction to N*Grid z What is N*Grid? Korean Grid research initiative z Construction and Operation of the Korean National Grid z N*Grid includes National Computational Grid National Data Grid National Access Grid National Application Grid (Ex: Bio Grid, CFD Grid, Meteo Grid etc) z Funded by Korean government through Ministry of Information and Communication z KISTI supercomputing center is a primary contractor of N*Grid Supercomputing Center 1 Scope of N*Grid z High Performance Computational Grid Supercomputers High performance clusters z Advanced Access Grid Massive data distribution and processing (Data Grid) Collaborative access grid Immersive visualization z Grid Middleware Information service, Security, Scheduling, … z Search and support of Grid Application Project (Seed Project) Grid application testbed Grid application portals Grid applications Supercomputing Center CFD & Grid Research z CFD – computational fluid dynamics Nonlinear partial differential equations – Navier-Stokes equations Requires huge amount of computing resource The most limiting factor is computing power! z CFD in Grid research It can fully exploit the power of computing grid resources. Parallel/Distributed computing algorithm in CFD shows high level of maturity. Grand Challege problem can be solved through grid research (direct numerical simulation of turbulent flow). Grid research can receive feedback from real application. Supercomputing Center 2 CFD in N*Grid z Virtual Wind Tunnel on Grid infrastructure Flow analysis Module Mesh Generation Module CAD System Optimization Module Supercomputing Center Components of Virtual Wind Tunnel z CAD system Define geometry, integrated in grid portal z Mesh Generator Multi-block and/or Chimera grid system Semi-automated mesh generation z Flow Solver 3-dimensional Navier-Stokes code parallelized with MPI z Optimization Module Sensitivity analysis, response surface etc z Database Repository for geometries and flow solutions Communicate with other discipline code (CSD, CEM) Supercomputing Center 3 High Throughput Computing Environment z Improved throughput for Parametric study such as flutter analysis Construction of response surface Computing Grid Unstable Flutter boundary? Stable Supercomputing Center Preliminary Results z Supercomputer Grid Experiment Chonbuk N. Univ.: IBM SP2 Chonan/Soongsil Univ. Cluster Taejon KISTI Compaq GS320 KREONet2 Globus/MPICH-G KISTI: Compaq GS320 Chonbuk N. Univ. IBM SP2 Pusan Dong-Myoung. Univ. IBM SP2 Supercomputing Center 4 Preliminary Results – Cont. z Cluster Grid Experiment 2 Linux PC cluster systems over WAN duy.kaist.ac.kr : 1.8GHz P4 4 nodes, RAM 512M cluster.hpcnet.ne.kr : 450MHz P2 4 nodes, RAM 256M F90, PBS, MPICH-G2, GT2.0 duy.kaist.ac.kr/jobmanager-pbs cluster.hpcnet.ne.kr/jobmanager-pbs MPICH-G2 scheduler-pbs scheduler-pbs Execution nodes Execution nodes Supercomputing Center Preliminary Results – Cont. z Simulation of a parallel multi-stage rocket 400 thousand grid points & 6 processors Chimera methodology Supercomputing Center 5 Preliminary Results – Cont. 
z Aerodynamic Design Optimization RAE2822 airfoil design in 2D turbulent flow field 10 design variables & 4 processors Adjoint sensitivity analysis Supercomputing Center Preliminary Results – Cont. z Obtained parallel efficiency on supercomputer grid 16 Ideal Case 14 GS320-SP2(Onera M6) Speed-up 12 10 8 6 4 2 5 10 Processors 15 Supercomputing Center 6 Ongoing Efforts z CFD portal PHP based web interface GPDK for next version Integrated PRE/POST processing interface z High throughput computing Environment Generate parameter set Distribute/Submit jobs Collect results z Improved parallel algorithm z Adequate for WAN Supercomputing Center Remarks z Most application engineers are reluctant to use grid, since they believe that it is just a WAN version of parallel computing z We need to prove power of grid environment to application engineers, in order to encourage to use a new grid technology z Therefore, it is very important to show the capabilities of grid services and what can be done with those services Supercomputing Center 7 Thank you for your attention! Supercomputing Center 8 GridTools: Customizable command line tools for Grids Ian Kelley + Gabrielle Allen Max Planck Institute for Gravitational Physics Golm, Germany ikelley@aei.mpg.de NeSC Apps Workshop July 20th, 2002 Introduction • Simple command line tools (in Perl) for testing and performing operations across TestBeds. • Motivation: – Working with 26 machines on the SC2001 testbed – Tools to help us get our physics users onto the Grid – Playground for easily testing different scenarios before building them into portals/applications. • Have been useful for us, so put them together and wrote some documentation. • See also TeraGrid pages: – http://www.ncsa.uiuc.edu/~jbasney/teragrid-setup-test.html NeSC Apps Workshop July 20th, 2002 1 TestBeds • What do we mean by “TestBed”? – My definition of a TestBed: • “a collection of machines with some sort of coordinated infrastructure, that is used for a common purpose or by a specific group of users – We want to develop, deploy and test portal and application software – Ultimately: want real “users” to view TestBed as a single resource • For me: – – – – SC2001 (GGF Apps) TestBed GridLab TestBed AEI Relativity Group production machines My personal TestBed NeSC Apps Workshop July 20th, 2002 SC2001 TestBed • 26 Machines • Very heterogeneous • All sites worked to build towards a common setup (GRAM, GSI, Cactus, Portal, GIIS) • At SC2001 showed a Cactus simulation dynamically spawning individual analysis tasks to all machines • http://www.aei.mpg.de/~allen/TestBedWeb/ NeSC Apps Workshop July 20th, 2002 2 NumRel Production TestBed • This is what we really want! • For physicists to do physics! • Hard work! Blue Horizon Lemieux Psi Seaborg Globus Titan Origin Platinum Los Lobos sr8000 NeSC Apps Workshop July 20th, 2002 (Some) TestBed Gripes … • • • • • • • • Software deployment not yet standard/stable Information not easy-to-find or up-to-date Different security/account policies (firewalls!!) Priorities mean things not always fixed quickly. Hard to get a global view of current state. Have trouble keeping track of changes Not everything works as expected ☺ Basically we need to work in a “research-like” environment, but the more we use it, the more “production-like” it will become … NeSC Apps Workshop July 20th, 2002 3 What We Want To Do • Run different tests (gsi, gram, etc) on our TestBed to verify that things are working correctly. • Easily get up-to-date global views of our testbeds. 
• Log files for tracking history, stability, etc. • Easily add and configure machines and tests. • Construct and test more complex scenarios for applications • Something that our end-users can also use! NeSC Apps Workshop July 20th, 2002 Higher Level Scenarios • For example for our portal/applications we want to test feasibility/usefulness etc of – – – – – – – – – Remote code assembly and compilation Repositories of executables Things specific for Cactus: parameter files, thornlists Data description archiving, selection, transfer Visualisation Design of user-orientated interfaces User customisations Collaborative/Group issues Simulation announcing/steering/tracking • These also require work on the applications !! NeSC Apps Workshop July 20th, 2002 4 GridTools Aims • Give a wrapper around Globus tools that enables scripting capability to perform multiple tasks. • Provide additional functionality such as a pseudo database for storing machine and configuration specific information. • Modularization of functionality to allow for easy development of more complex programs. NeSC Apps Workshop July 20th, 2002 What You Get • Basic scripts: – TestAuth – TestResources • Report making – CreateTEXT – CreateHTML – CreateMAP • A Library: – GridTools.pm • A Pseudo-database – grid.dat – (all the stuff we really want to get from e.g. MDS) • Other stuff NeSC Apps Workshop July 20th, 2002 5 TestAuth Output NeSC Apps Workshop July 20th, 2002 Current Tests for TestResources • • • • • • • • Authorize to Globus Gatekeeper Simple GRAM job submission Using GSIFTP to copy files Using GSISCP to copy files Testing GSISSH in batchmode Simple job run using GASS server Simple MPI job run using GASS server Using machine specific predefined RSLs to execute a simple job • Very simple to add new tests. NeSC Apps Workshop July 20th, 2002 6 TestResources Output NeSC Apps Workshop July 20th, 2002 TestResources Output NeSC Apps Workshop July 20th, 2002 7 Extensibility • • GridTools.pm, a Perl module, contains many common functions that allow you to easily write additional scripts or modify the existing ones. – Such as execution of commands via fork() or using timeouts – Reading of machine configuration – User (text based) interfaces Could implement other useful functionality – Timing how long things take to complete – More advanced monitoring • How often do different services go down on different machines • – Querying of information servers to update local database, or visa-versa GridTools can be extended to perform more complicated tasks. – Such as real job submission • Using RSL templates and compilation specific information – Distribution or aggregation of files and processes NeSC Apps Workshop July 20th, 2002 Conclusion • GridTools can help you to run tests on a group of computers to provide you with a general overview of the status of your TestBed. • Can be extended to include more complicated tasks such as job distribution and compilation. • Obtain from CVS: – cvs –d :pserver:cvs_anon@cvs.aei.mpg.de:/numrelcvs login • password: anon – cvs –d :pserver:cvs_anon@cvs.aei.mpg.de:/numrelcvs co GridTools • Contact me for help/comments: ikelley@aei.mpg.de. NeSC Apps Workshop July 20th, 2002 8 Exposing Legacy Applications as OGSI Components using pyGlobus Keith R. Jackson Distributed Systems Department Lawrence Berkeley National Lab NeSC Grid Apps Workshop Overview • • • • • • • • • The Problem? Proposed Solution Why Python? 
Tools for generating Python interfaces to C/C++/Fortran pyGlobus Overview Current support in pyGlobus for Web Services OGSI plans for pyGlobus Steps for Exposing a Legacy Application Contacts & Acknowledgements NeSC Grid Apps Workshop 1 The Problem? • Many existing codes in multiple languages, e.g., C, C++, Fortran — Would like to make these accessible on the Grid • Should be accessible from any language — Need to integrate standard Grid security mechanisms for authentication — Need a standard framework for doing authorization — Would like to avoid custom “one off” solutions for each code NeSC Grid Apps Workshop The Solution • Provide a framework that legacy applications can easily be plugged into — Must be easy to add applications written in many languages • Use the Python language as the “glue” • The framework should support: — Authentication using standard Grid mechanisms — Flexible authorization mechanisms — Lifecycle management • Including persistent state • Develop one container, and reuse it for many legacy applications • Use Web Services protocols to provide language neutral invocation and control — Use standard high-performance Grid protocols, e.g., GridFTP, for data transfer NeSC Grid Apps Workshop 2 Solution (cont.) Client GSI Authentication Python Container Authorization Adapter Dispatcher Lifecycle Management Operations Application Factory Python Shadow Class State Management Operations Legacy Application NeSC Grid Apps Workshop Why Python? • Easy to learn/read high-level scripting language — Very little syntax • A large collection of modules to support common operations, e.g., networking, http, smtp, ldap, XML, Web Services, etc. • Excellent for “gluing” together existing codes — Many automated tools for interfacing with C/C++/Fortran • Support for platform independent GUI components • Runs on all popular OS’s, e.g., UNIX, Win32, MacOS, etc. • Support for Grid programming with pyGlobus, PyNWS, etc. NeSC Grid Apps Workshop 3 Tools for Interface Generation • SWIG (Simple Wrapper Interface Generator) — Generates interfaces from C/C++ • Supports the full C++ type system — Can be used to generate interfaces for Python, Perl, Tcl, Ruby, Guile, Java, etc. — Automatic Python “shadow class” generation — http://www.swig.org/ • Boost.Python (Boost Python interface generator) — Generates interfaces from C++ — http://www.boost.org/libs/python/doc/ • PyFort (Python Fortran connection tool) — Generates interfaces from Fortran — http://pyfortran.sourceforge.net/ • F2PY (Fortran to Python Interface Generator) — Generates interfaces from Fortran — http://cens.ioc.ee/projects/f2py2e/ NeSC Grid Apps Workshop pyGlobus Overview • The Python CoG Kit provides a mapping between Python and the Globus Toolkit™. It extends the use of Globus by enabling access to advanced Python features such as events and objects for Grid programming. • Hides much of the complexity of Grid programming behind simple object-oriented interfaces. • The Python CoG Kit is implemented as a series of Python extension modules that wrap the Globus C code. • Provides a complete interface to GT2.0. • Uses SWIG (http://www.swig.org) to help generate the interfaces. 
NeSC Grid Apps Workshop 4 pyGlobus and Web Services • Provides a SOAP toolkit that supports SOAP/HTTP/GSI — Allows standard GSI delegation to web services — Interoperates with the GSI enabled Java SOAP • XSOAP from Indiana • Axis SOAP from ANL — Currently based on SOAP.py, but switching to ZSI • ZSI supports document-oriented SOAP • ZSI supports much more flexible encoding of complex types NeSC Grid Apps Workshop GSISOAP Client Example from pyGlobus import GSISOAP, ioc proxy = GSISOAP.SOAPProxy(“https://host.lbl.gov :8081”, namespace=“urn:gtg-Echo”) proxy.channel_mode = ioc.GLOBUS_IO_SECURE_CHANNEL_MODE proxy.delegation_mode = ioc.GLOBUS_IO_SECURE_DELEGATION_MODE_NO NE print proxy.echo(“spam, spam, spam, eggs, and spam”) NeSC Grid Apps Workshop 5 GSISOAP Server Example from pyGlobus import GSISOAP, ioc def echo(s, _SOAPContext): cred = _SOAPContext.delegated_cred # Do something useful with cred here return s server = GSISOAP.SOAPServer(host.lbl.gov, 8081) server.channel_mode = ioc.GLOBUS_IO_SECURE_CHANNEL_MODE_GSI_WRAP server.delegation_mode = ioc.GLOBUS_IO_SECURE_DELEGATION_MODE_FULL_PROXY server.registerFunction(SOAP.MethodSig(echo, keywords=0, context=1), “urn:gtg-Echo”) server.serve_forever() NeSC Grid Apps Workshop OGSI Plans for pyGlobus • Develop a full OGSI implementation in Python — Planned alpha release of an OGSI client by the end of August — OGSI hosting environment based on WebWare (http://webware.sourceforge.net/) • Dynamic web service invocation framework — Similar to WSIF (Web Services Invocation Framework) from IBM for Java • http://www.alphaworks.ibm.com/tech/wsif — Download and parse WSDL document, create request on the fly — Support for multiple protocol bindings to WSDL portTypes NeSC Grid Apps Workshop 6 Steps to Expose a Legacy App • Wrap the legacy application to create a series of Python classes or functions — Use one of the automated tools to help with this • Use pyGlobus to add any needed Grid support — GridFTP client to move data files — IO module for GSI authenticated network communication • Extend the GridServiceFactory class to implement any custom instantiation behavior • Add the Python shadow class to the container — XML descriptor file used to control properties of the class, e.g., security, lifecycle, etc. NeSC Grid Apps Workshop Contacts / Acknowledgements • http://www-itg.lbl.gov/ • krjackson@lbl.gov • This work was supported by the Mathematical, Information, and Computational Science Division subprogram of the Office of Advanced Scientific Computing Research, U.S. Department of Energy, under Contract DE-AC03-76SF00098 with the University of California. NeSC Grid Apps Workshop 7 www.gridlab.org An Overview of the GAT API Tom Goodale Max-Plank-Institut fuer Gravitationsphysik goodale@aei-potsdam.mpg.de www.gridlab.org The Problem Application developers want to use the Grid now. The Grid is, however, a rapidly changing environment What technology should they use ? How soon will this technology be obsolete ? NeSC Applications Workshop 20/7/2002 www.gridlab.org The Problem Much of the existing Grid technologies are very low level; they allow sophisticated programmers to do things, but leave some application developers cold. Grid technologies are not deployed at all sites to the same extent. NeSC Applications Workshop 20/7/2002 www.gridlab.org The Problem Something which works in one place may need to be tweaked to work elsewhere even if the technologies used are, in principle, the same. 
It can take a lot of effort to get services deployed, but developers want to be able to develop and test their applications while this is going on. NeSC Applications Workshop 20/7/2002 www.gridlab.org The Solution ? Need an API which insulates the application developer from the details of low-level grid operations. Need an API which is “future-proof”. Need to make sure that developers can write and test their applications irrespective of the underlying state of deployment of Grid technologies. NeSC Applications Workshop 20/7/2002 www.gridlab.org Gridlab The GridLab project, which started in January of this year, seeks to provide such an API. Three year project. Will provide the Grid Application Toolkit (GAT) API Will provide a set of services which can be accessed via the GAT- API NeSC Applications Workshop 20/7/2002 www.gridlab.org GAT What does it mean to provide the GAT-API ? An API specification. A sample implementation. A set of services (or equivalents) available through the sample implementation What else ? A testbed to prove that it works Applications using it ! NeSC Applications Workshop 20/7/2002 www.gridlab.org GAT-API Need to identify the Grid operations which people want to do. Needs to be high level, rather than just duplicating existing low-level APIs Must give user the choice of which sets of low-level functionaility are actually used. NeSC Applications Workshop 20/7/2002 www.gridlab.org GAT-API Must allow application developers to do everything they want to do through the API, rather than forcing them to access specific Grid technologies with specific calls. Must provide the capability to see what actually happened so that users or developers can diagnose problems Must have bindings for many languages. NeSC Applications Workshop 20/7/2002 www.gridlab.org Which Operations Spawn, migrate Checkpoint Find resource, allocate resource find process, find data Copy data, send/receive data Security Allow selective access to data, processes, etc Multiple VOs, etc NeSC Applications Workshop 20/7/2002 www.gridlab.org GAT Implementation Must be modular to allow access both to current Grid technologies and future Grid technologies Must be modular to be independent of any specific technology NeSC Applications Workshop 20/7/2002 www.gridlab.org GAT Implementation Core – the API bindings – must be deployable on all architectures Core must return sensible error codes in the absence of any other part Should be able to provide access to serviceequivalents so that the application can work even in the absence of externally-deployed services or behind a firewall. NeSC Applications Workshop 20/7/2002 www.gridlab.org GAT Implementation Must be able to react to changing Grid conditions and allow access to new services as they become available Must allow access to existing services, as well as the services being developed within the GridLab project. NeSC Applications Workshop 20/7/2002 www.gridlab.org Status Very rough prototype implementation of core. Continuing work to define the set of operations available through the API, and hence the precise API specification Work to develop a set of GridLab services to be accessed via the API. NeSC Applications Workshop 20/7/2002 NeSC Workshop on Applications and Testbeds on the Grid Collaborative Tools for Grid Support Laura F. 
McGinnis Pittsburgh Supercomputing Center Pittsburgh, PA, USA July 20, 2002 This Talk The Software Tools Collaboratory, an NIH funded project at the Pittsburgh Supercomputing Center, has recently completed a protein folding simulation, utilizing 4 major systems at 3 geographically distributed sites. This presentation will discuss the issues related to setting up and running the simulation, from the perspective of establishing and maintaining the infrastructure and communication among participants before, during and after the simulation. As many sites have found, coordinating the resources for grid computing is more than just a matter of synchronizing batch schedulers. This presentation will share the collaboratory’s experience in supporting and managing communication among participants, especially in the back channels, using common, publicly available tools. • The Experiment • The Tools • Alternatives lfm@psc.edu 20-July-2002 NeSC Applications Workshop Glasgow, Scotland 1 The Experiment: The Task • Simulate protein folding for a 400point surface using CHARMM • Using the 4 systems, the elapsed time for this experiment was estimated to take 30 hours lfm@psc.edu 20-July-2002 NeSC Applications Workshop Glasgow, Scotland The Experiment: The Cast • 1 Scientist at The Scripps Research Institute in San Diego, California • 1 Legion Administrator at the University of Virginia • 4 machines of different architectures • • • • TCS Alphacluster at Pittsburgh Supercomputing Center T3E at Pittsburgh Supercomputing Center IBM SP at San Diego Supercomputing Center Linux Cluster at University of Virginia • The Chorus – 4 Collaboratory support observers at 2 locations in Pittsburgh lfm@psc.edu 20-July-2002 NeSC Applications Workshop Glasgow, Scotland 2 The Experiment: Prep Work • Establish the protocol • Test the components • Including the collaborative tools • Coordinate dedicated time • On all platforms • With all necessary support staff lfm@psc.edu 20-July-2002 NeSC Applications Workshop Glasgow, Scotland The Experiment: The Run • Launch jobs to each machine via Legion • Monitor progress of jobs as they run • Collect results back to the scientist’s site for evaluation lfm@psc.edu 20-July-2002 NeSC Applications Workshop Glasgow, Scotland 3 The Tools: Prep Tools • Email: • Majordomo • Mhonarc • Document Management: • Enotes (US Department of Energy, Oak Ridge National Laboratory) lfm@psc.edu 20-July-2002 NeSC Applications Workshop Glasgow, Scotland The Tools: Run Tools • Application Sharing • Microsoft NetMeeting, SGI’s SGIMeeting • Communication: • AOL Instant Messenger • Email: • Majordomo lfm@psc.edu 20-July-2002 NeSC Applications Workshop Glasgow, Scotland 4 The Tools: AIM 100 Day 1 90 Chatroom Activity over time 80 70 60 50 Day 2 40 30 20 Day 3 lfm@psc.edu 20-July-2002 9:00 17:00 16:00 15:00 14:00 13:00 12:00 11:00 10:00 9:00 18:00 17:00 16:00 15:00 14:00 13:00 12:00 11:00 0 10:00 10 NeSC Applications Workshop Glasgow, Scotland The Tools: AIM Chorus 10% Participation by Cast Member Systems 20% Scientist 54% Legion Admin 16% lfm@psc.edu 20-July-2002 NeSC Applications Workshop Glasgow, Scotland 5 The Tools: AIM Statement Types UVa 1% Misc 2% NetMeeting 3% Admin 5% Jaromir 7% Lemieux 39% Blue Horizon 16% lfm@psc.edu 20-July-2002 Legion 27% NeSC Applications Workshop Glasgow, Scotland Alternative Tools: Document Management • DoE Electronic Notebooks: • Enote@LBNL • Enote@PNNL • Enote@ORNL • DocShare • UServ • CVS (Concurrent Versions Systems) • Shared Disk Space • PCs • Unix lfm@psc.edu 20-July-2002 NeSC 
Applications Workshop Glasgow, Scotland 6 Alternative Tools: Document Sharing Evaluation Criteria • • • • • • • • Available from Cost Ease of Use Sharable Document Types Web-Based Interface Web Client Support Cross-Platform Client System Requirements • Server System Requirements • Shared Editing • Editing Method • Object Handling Method • Source Code Availability • History Tracking • Other features/notes lfm@psc.edu 20-July-2002 NeSC Applications Workshop Glasgow, Scotland Alternative Tools: Communication • Instant Messenger Services • AOL Instant Messenger • Yahoo Instant Messenger • Microsoft Instant Messenger • • • • ICQ Internet Relay Chat (IRC) Zephyr Imici lfm@psc.edu 20-July-2002 NeSC Applications Workshop Glasgow, Scotland 7 Alternative Tools: Communications Evaluation Criteria • • • • • • • • Provider Cost Size of Download Ease of Signup File Transfer Ability Logging Chat Rooms SPAM Problems lfm@psc.edu 20-July-2002 • • • • • • • • People Locator Information Required Auto-ID Firewall/Proxy Usage Platform System Requirements Internet Email Notes NeSC Applications Workshop Glasgow, Scotland Lessons Learned • Running applications on a grid is still a peopleintensive activity • Participants require strong communication tools • Custom solutions are still the norm • Use cases should be analyzed to identify commonalities which can be addressed (and differentiators that cannot be) • Tools are available to support back-channel communications • Don’t overlook “popular” software • There are also low-overhead tools available • Know your risk and annoyance threshold 8 The Cast: Credits • The Scientist – Mike Crowley • The Legion Administrator – Glenn Wasson • The System Administrators • TCS Alphacluster and T3E @ PSC – Chad Vizino • IBM SP @SDSC – Kenneth Yoshimoto • Linux Cluster @Uva – Glenn Wasson • The Chorus • • • • Sergiu Sanielivici (PSC) Cindy Gadd (UPMC) Robb Wilson (UPMC) Laura McGinnis (PSC) lfm@psc.edu 20-July-2002 NeSC Applications Workshop Glasgow, Scotland Appendix 1: Contact Information for Communication Tools • • • • • • • AOL Instant Messenger (aol.com) Yahoo Instant Messenger (yahoo.com) ICQ (ICQ.com) Microsoft Instant Messenger (microsoft.com) Internet Relay Chat (mirc.com, ircle.com) Imici (imici.com) Zephyr (mit.edu) 9 Appendix 2: Contact Information for Document Management Tools • Enote@LBL (http://vision.lbl.gov/~ssachs/doe2000/lbnl.download.html) • Enote@PNNL (http://www.emsl.pnl.gov:2080/docs/collab/) • Enote@ORNL (http://www.epm.ornl.gov/~geist/java/applets/enote/) • DocShare (http://collaboratory.psc.edu/tools/docshare/faq.html) (must email nstone@psc.edu)) • Userv (http://userv.web.cmu.edu/userv/Download.jsp) • CVS (Concurrent Versions Systems) (http://collaboratory.psc.edu/tools/cvs/faq.html) 10 Application Web Services and Event / Messaging Systems NeSC Glasgow July 20 2002 PTLIU Laboratory for Community Grids Geoffrey Fox, Shrideep Pallickara, Marlon Pierce Computer Science, Informatics, Physics Indiana University, Bloomington IN 47404 http://www.naradabrokering.org http://grids.ucs.indiana.edu/ptliupages gcf@indiana.edu 7/23/2002 uri="http://www.naradabrokering.org" email="gcf@indiana.edu" 1 Application Portal in a Minute (box) Systems like Unicore, GPDK, Gridport (HotPage), Gateway, Legion provide “Grid or GCE Shell” interfaces to users (user portals) • Run a job; find its status; manipulate files • Basic UNIX Shell-like capabilities Application Portals (Problem Solving Environments) are often built on top of “Shell Portals” but this can be quite time 
confusing • Application Portal = Shell Portal Web Service + Application (factory) Web service 7/23/2002 uri="http://www.naradabrokering.org" email="gcf@indiana.edu" 2 1 Application Web service Application Web Service is ONLY metadata • Application is NOT touched Application Web service defined by two sets of schema: • First set defines the abstract state of the application What are my options for invoking myapp? Dub these to be “abstract descriptors” • Second set defines a specific instance of the application I want to use myapp with input1.dat on solar.uits.indiana.edu. Dub these to be “instance descriptors”. Each descriptor group consists of • Application descriptor schema • Host (resource) descriptor schema • Execution environment (queue or shell) descriptor schema 7/23/2002 uri="http://www.naradabrokering.org" email="gcf@indiana.edu" 3 7/23/2002 uri="http://www.naradabrokering.org" email="gcf@indiana.edu" 4 2 Engineering Application WS Schema wizard: given a Schema, creates (JSP) web page with form to specify XML Instances • Use for application metadata AntiSchema wizard: given an HTML form, creates a Schema • Captures input parameters of application Castor converts Schema to Java • Use Python if you prefer! Apache converts Java into Web Services Make this in a portlet for use in favorite portal Being used today in DoD …….. (with and without Globus) 7/23/2002 uri="http://www.naradabrokering.org" email="gcf@indiana.edu" 5 7/23/2002 uri="http://www.naradabrokering.org" email="gcf@indiana.edu" 6 3 7/23/2002 uri="http://www.naradabrokering.org" email="gcf@indiana.edu" 7 7/23/2002 uri="http://www.naradabrokering.org" email="gcf@indiana.edu" 8 4 Different Web Service Organizations Everything is a resource implemented as a Web Service, whether it be: • back end supercomputers and a petabyte data • Microsoft PowerPoint and this file Web Services communicate by messages ….. Grids and Peer to Peer (P2P) networks can be integrated by building both in terms of Web Services with different (or in fact sometimes the same) implementations of core services such as registration, discovery, life-cycle, collaboration and event or message transport ….. 
• Gives a Peer-to-Peer Grid Here we discuss Event or Message Service linking web services together 7/23/2002 uri="http://www.naradabrokering.org" email="gcf@indiana.edu" Database Peers 9 Database Resource Facing Web Service Interfaces Event/ Message Brokers Integrate P2P and Grid/WS Event/ Message Brokers Peer to Peer Grid Web Service Interfaces Peers User Facing Web Service Interfaces 7/23/2002 uri="http://www.naradabrokering.org" email="gcf@indiana.edu" 10 A democratic organization Peer to Peer Grid 5 XML Skin Message Or Event Soft Soft Based ware ware Inter Connection Resource XML Skin Resource Data base e-Science/Grid/P2P Networks are XML Specified Resources connected by XML specified messages Implementation ofuri="http://www.naradabrokering.org" resource and connection may or may not be XML 11 email="gcf@indiana.edu" 7/23/2002 Role of Event/Message Brokers We will use events and messages interchangeably • An event is a time stamped message Our systems are built from clients, servers and “event brokers” • These are logical functions – a given computer can have one or more of these functions • In P2P networks, computers typically multifunction; in Grids one tends to have separate function computers • Event Brokers “just” provide message/event services; servers provide traditional distributed object services as Web services There are functionalities that only depend on event itself and perhaps the data format; they do not depend on details of application and can be shared among several applications • NaradaBrokering is designed to provide these functionalities • MPI provided such functionalities for all parallel computing 7/23/2002 uri="http://www.naradabrokering.org" email="gcf@indiana.edu" 12 6 NaradaBrokering implements an Event Web Service Destination Source Matching Routing Web Service 1 WSDL Ports (Virtual) Queue Broker Filter workflow Web Service 2 WSDL Ports Filter is mapping to PDA or slow communication channel (universal access) – see our PDA adaptor Workflow implements message process Routing illustrated by JXTA Destination-Source matching illustrated by JMS using PublishSubscribe mechanism 7/23/2002 uri="http://www.naradabrokering.org" email="gcf@indiana.edu" 13 Engineering Issues Addressed by Event / Messaging Service Application level Quality of Service – give audio highest priority Tunnel through firewalls Filter messages to slow (collaborative or real time) clients Hardware multicast is erratically implemented (Event service can dynamically use software multicast) Scaling of software multicast Elegant implementation of Collaboration in a Groove Networks (done better) style Integrate synchronous and asynchronous collaboration 7/23/2002 uri="http://www.naradabrokering.org" email="gcf@indiana.edu" 14 7 Features of Event Service I MPI nowadays aims at a microsecond latency The Event Web Service aims at a millisecond latency • Typical distributed system travel times are many milliseconds (to seconds for Geosynchronous satellites) • Different performance/functionality trade-off Messages are not sent directly from P to S but rather from P to Broker B and from Broker B to subscriber S • Actually a network of brokers Synchronous systems: B acts as a real-time router/filterer • Messages can be archived and software multicast Asynchronous systems: B acts as an XML database and workflow engine Subscription is in each case, roughly equivalent to a database query 7/23/2002 uri="http://www.naradabrokering.org" email="gcf@indiana.edu" 15 Features of Event Web Service II In principle 
Message brokering can be virtual and compiled away in the same way that WSDL ports can be bound in real time to an optimal transport mechanism • All Web Services are specified in XML but can be implemented quite differently • Audio Video Conferencing sessions could be negotiated using SOAP (raw XML) messages, agreeing to use certain video codecs transmitted by UDP/RTP There is a collection of XML Schema – call it GXOS – specifying the event service and the requirements of message streams and their endpoints • One can sometimes compile message streams specified in GXOS to MPI or to a local method call Event Service must support dynamic heterogeneous protocols 7/23/2002 uri="http://www.naradabrokering.org" email="gcf@indiana.edu" 16 8 Features of Event Web Service III The event web service is naturally implemented as a dynamic distributed network • Required for fault tolerance and performance A new classroom joins my online lecture • A broker is created to handle the students – multicast my messages locally to the classroom; handle local messages between students with high performance Company X sets up a firewall • The event service sets up brokers on either side of the firewall to optimize transport through the firewall Note all message-based applications use the same message service • Web services imply ALL applications are (possibly virtual) message based 7/23/2002 uri="http://www.naradabrokering.org" email="gcf@indiana.edu" 17 Single Server P2P Illusion [Diagram: traditional collaboration architecture, e.g. commercial WebEx – clients connect through a single collaboration server backed by a database] 7/23/2002 uri="http://www.naradabrokering.org" email="gcf@indiana.edu" 18 9 Narada Broker Network [Diagram: a network of Narada brokers spanning several (P2P) communities provides the message/event service, software multicast, and access to resources and databases] 7/23/2002 uri="http://www.naradabrokering.org" email="gcf@indiana.edu" 19 NaradaBrokering and JMS (Java Message Service) [Chart: mean transit delay (milliseconds) for message samples in Narada and SonicMQ (commercial JMS) as a function of publish rate (messages/sec) and payload size (bytes), at low rates with small messages] 7/23/2002 uri="http://www.naradabrokering.org" email="gcf@indiana.edu" 20 10 JXTA just got slower [Chart: mean transit delay (milliseconds) versus message payload size (bytes) for Client ⇔ JXTA ⇔ JXTA ⇔ Client, Client ⇔ JXTA ⇔ Narada ⇔ JXTA ⇔ Client, Client ⇔ JXTA ⇔ JXTA ⇔ Client multicast, and pure Narada (2 hops)] 7/23/2002 uri="http://www.naradabrokering.org" email="gcf@indiana.edu" 21 Shared Input Port (Replicated WS) Collaboration [Diagram: collaboration as a Web Service – set up session; the master and the other participants each run a replicated Web Service with its own WS viewer and WS display, and the event (message) service delivers the shared input to every replica] 7/23/2002 uri="http://www.naradabrokering.org" email="gcf@indiana.edu" 22 11 Shared Output Port Collaboration [Diagram: collaboration as a Web Service – set up session; a message interceptor on the master's Web Service (the application or content source) sends its output through the event (message) service to the WS viewers and WS displays of the other participants] 7/23/2002 uri="http://www.naradabrokering.org" email="gcf@indiana.edu" 23 NaradaBrokering Futures Higher Performance – reduce minimum transit time to around one millisecond Substantial operational testing Security – allow Grid (Kerberos/PKI) security mechanisms Support of more protocols with dynamic switching as in JXTA – SOAP, RMI, RTP/UDP Integration of simple XML database model using JXTA Search to manage distributed archives More formal specification of "native mode" and dynamic instantiation of brokers General Collaborative Web services 7/23/2002 uri="http://www.naradabrokering.org" email="gcf@indiana.edu" 24 12
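As a toy illustration of the destination and source matching idea described in the talk above (publish/subscribe over time-stamped events, with a subscription acting roughly like a query), the following Python sketch routes events through a single broker. It is not the NaradaBrokering or JMS API; the topic patterns and callbacks are assumptions made for the example.

    # Toy topic-based broker; not the NaradaBrokering or JMS API.
    import fnmatch
    import time

    class Broker:
        def __init__(self):
            self.subscriptions = []          # list of (topic pattern, callback)

        def subscribe(self, pattern, callback):
            self.subscriptions.append((pattern, callback))

        def publish(self, topic, payload):
            # An event is a time-stamped message.
            event = {"topic": topic, "payload": payload, "time": time.time()}
            for pattern, callback in self.subscriptions:
                if fnmatch.fnmatch(topic, pattern):   # destination-source matching
                    callback(event)

    broker = Broker()
    broker.subscribe("audio/*", lambda e: print("client received", e["topic"]))
    broker.publish("audio/session42", b"codec frame")   # delivered to the subscriber
    broker.publish("video/session42", b"frame")         # no matching subscription; dropped

In a real deployment the broker would itself be a distributed network of brokers, with filtering, archiving, software multicast, and protocol switching happening along the route, as the slides describe.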
www.gridlab.org Grid Portals: Bridging the gap between scientists and the Grid Michael Russell, Jason Novotny, Gabrielle Allen Max-Planck-Institute fuer Gravitationphysik Golm, Germany www.gridlab.org The promises of Grid computing are grand Uniform access to heterogeneous resources The ability to pool distributed resources together on demand Resources are either transparently available to users or they simply don't have to worry about them Support virtual organizations of distributed researchers collaborating across institutional, geographic, and political boundaries… Support applications with enormous computing and/or data management requirements. 1 www.gridlab.org Grid computing is quickly evolving In the 2 years I've been here Globus 1.3 went 2.0 Grid Portals became "the next big thing" Luckily, portlets have come to the rescue Meanwhile, Globus went pro to bring in IBM and other heavyweights Now OGSA is stepping up to bat Global Grid Forum is already a hit Can't wait for the video games! www.gridlab.org So where is the Grid? How can we use it? Or at least that's what so-called users are probably asking. The Grid is a work in progress, one of the biggest undertakings in the history of humankind. Most of you are in Scotland to do your part in building the Grid. And most of you have reasons why you need the Grid; you want to use it, right? Or perhaps you want to help others to use it. The point is, a large gap exists between the Grid and its would-be users. 2 www.gridlab.org Grid Portals So you want to build a Grid portal to bridge that gap. But what does it take to build a Grid portal? Well, it's going to take a lot… It takes a solid understanding of the state-of-the-art in Web portal development. It takes a solid understanding of the state-of-the-art in Grid computing. And much more… www.gridlab.org The Astrophysics Simulation Collaboratory The ASC seeks to: Enable astrophysicists to develop better and more powerful simulations with the Cactus Computational Toolkit. Enable astrophysicists to run and analyze simulations on Grids. Build a Grid portal to support these activities and all the rest that comes with Web portals. 3 www.gridlab.org The ASC and Cactus So we worked with the Cactus Project to develop support for working with Cactus from the ASC Portal. We developed application Web components to: Install Cactus software from multiple cvs repositories onto target resources. Build executables with those installations using the autoconf and make capabilities in Cactus. Upload and edit parameter files. Run simulations on target resources, as well as connect to simulations and monitor their progress. Launch the appropriate visualization applications on Cactus data. In most cases, the Cactus Project developed extensions to Cactus software to support these components. www.gridlab.org The ASC and Globus We worked with the Globus Project to develop support for working with Globus from the ASC Portal. We developed Grid Web components to: Enable logging on with one or more Grid proxy certificates stored in MyProxy. Submit and monitor jobs with the Globus Gatekeeper and maintain job history.
Manipulate files (listing, copying, moving, deleting files) with GSIFTP. We tried to use MDS, but at that time MDS did not meet our needs, so we developed our own components for storing static information about resources and polling for whatever dynamic information could be reliably retrieved from MDS. We asked Globus to build GSI-CVS and now we’re building Web components and services to use and extend GSI-CVS. We’ve added support for GSI-authentication in the MindTerm Java SSH implementation. 4 www.gridlab.org The ASC and ASC We realized we needed to support the people within ASC directly from our portal, so we built administrative Web components to: Manage user accounts. Assign security roles to users and to which pages users have access. Manage the proxy certificates users require to authenticate to other services and determine which certificates authenticate to which services. Manage the resources and services to which we provide access from the ASC Portal. www.gridlab.org Problems we faced The ASC is a Virtual Organization in every sense of the word with users, developers, administrators, software, and resources distributed across the U.S. and Europe. With our developer(s) in the University of Chicago and our users in Washington University, St. Louis and Max Planck Institute in Golm, Germany, there wasn’t nearly enough direct contact between our users and developers. This made it difficult to meet their needs. It was easy enough to prototype Web components, but we needed to build a Grid portal framework that would support future development and sustain a production-quality Grid portal, and that took several months to develop. We were closely associated with Globus, but this made us too reliant upon Globus as our interface to resources… For while we had identified the resources our users required, the Grid as we knew it then and today just wasn’t enough. 5 www.gridlab.org It takes more than just a portal You need to build a virtual organization or otherwise join a virtual organization and plug into their work! www.gridlab.org Other lessons we learned Put the needs of scientists at the very top of your list. Be familiar with their research and the day-to-day problems they face in using computing technology to conduct that research. Next, consider what it is you really need in the form of: Enhancements to the applications scientists are developing. For new applications, consider how you can buildin support for Grid operations. For legacy applications, consider how you can provide better support for their applications with external services. Enhancements to the Grid infrastructure with respect to your applications. These enhancements should build off other… High-level services that coordinate the use of resources. We’re beginning to see Globus as the “system” layer with respect to Grids. 6 www.gridlab.org Don’t forget your infrastructure Create a vialable testbed that includes both the resources your users need and use in their everyday reserch as well as resources with which you can experiment. In a VO like the ASC, you will not have administrative control over these resources. In fact, this is a very difficult problem to overcome, it takes a lot of effort to see changes applied where and when you need them. 
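The resource bookkeeping just described, static information stored by the portal and dynamic information polled only when it can be retrieved reliably, can be summarised in a small sketch. This is hypothetical illustration code, not ASC Portal code: the probe callable stands in for whatever MDS query or other check a portal actually performs.

    # Hypothetical sketch of a portal-side resource registry; not ASC Portal code.
    import time

    class ResourceRegistry:
        def __init__(self, probe, max_age=300):
            self.static = {}        # hostname -> fixed facts (architecture, site, contact)
            self.status = {}        # hostname -> (timestamp, last observed status)
            self.probe = probe      # callable(hostname) -> status; stand-in for an MDS query
            self.max_age = max_age  # seconds before a cached status is considered stale

        def register(self, hostname, **facts):
            self.static[hostname] = facts

        def get_status(self, hostname):
            stamp, value = self.status.get(hostname, (0.0, "unknown"))
            if time.time() - stamp > self.max_age:
                try:
                    value = self.probe(hostname)
                except Exception:
                    value = "unreachable"    # tolerate flaky information services
                self.status[hostname] = (time.time(), value)
            return value

    registry = ResourceRegistry(probe=lambda host: "up")   # hypothetical probe
    registry.register("origin.example.org", architecture="SGI Origin", site="Example Centre")
    print(registry.get_status("origin.example.org"))
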
You need tools for testing things out and you’re going to need to keep track of all your resources and providers, the change requests you make, the problems your users experience, and so forth… www.gridlab.org Get into production mode In order to build a viable Grid portal, realize you need to build a production system, something your users can rely upon to work every time. For instance, what happens to your production database when your software and data model change? What makes a production system? Solid engineering practices and attention to user requirements, security issues, persistence management, quality assurance, release management, performance issues… Project management at the VO level is complex, make sure you understand Grid-level management issues before you start writing those cool Grid portal proposals! 7 www.gridlab.org Build a team It’s important to communicate with partners in the Global Grid Forum, but try to cover all the bases within your project. If you don’t have the funds to build a large team, then allocate funds towards developing explicit links between yours and other projects. Because you need application experts, Grid portal experts, Grid service experts, Grid testbed administrators, and so on… division of labor is a cornerstone of project management. www.gridlab.org GridLab Well, so we’re taking the lessons we learned in the ASC and elsewhere (GPDK, for example!) through the collective experience of everyone involved in GridLab and we’re applying them towards building a… Production Grid across Europe (and the U.S.) Grid Application Toolkit for developing applications with built-in support for Grid operations, like m igrateApplication() Grid Portal to support GridLab 8 www.gridlab.org Key points We’re going to work as closely as possible with scientists and their applications throughout the project. We’ve created a testbed of resources that our scientists really want to use. We’re working to build higher-level services to better coordinate the use of those resources, and the requirements for these services will come either directly from our application groups or indirectly through using a Grid portal. We’re developing application frameworks to enable scientists to make use of these higher-level services as basic function/method calls within the applications. And we’re coordinating these activities through the development and use of the GridLab Portal. www.gridlab.org The GridLab Portal We’re using the ASC Portal software to allow us to focus on the needs of scientists from the very beginning of our project development. We’re simultaneously building a new framework that takes the best of current practices in Web and Grid computing, and we’re documenting just about every bit of its requirements, design, and development. As we develop this framework, we’ll be preparing the Web interfaces we develop with the ASC Portal for migration to the new framework. 9 www.gridlab.org Bridging the gap Before you begin your Grid portal efforts, look at what’s already out there. For example: GPDK - JSP/Servlets, Great way for getting an introduction to Grid Portal development GridPort - Perl-CGI, well-managed project and they have real application groups using NPACI resources. Bear in mind that Portlets are where it’s all heading, but there is no Portlet API quite yet. A lot of people looking at building Grid portals with the JetSpeed codebase (but not everyone!). 
Or, you should consider working with us… www.gridlab.org A possible collaboration with GridLab GridLab Application Toolkits •Cactus •Triana GridLab Portal Run my application! GridLab Application Manager GridLab Resource Manager GridLab Information Service Grid CVS Grid Make Compute Resources Help us design better Grid technologies. Plug in your own application and supporting services.Then develop your own Web pages with our Grid portal… 10 www.gridlab.org Some online references Astrophysics Simulation Collaboratory: http://www.ascportal.org Cactus Project: http://www.cactuscode.org Globus Project: http://ww.globus.org GridLab: http://www.gridlab.org GridPort: http://www.gridport.org Grid Portal Development Toolkit: http://www.doesciencegrid.org/Projects/GPDK Jakarta JetSpeed: jakarta.apache.org/jetspeed OGSA: http://www.globus.org/ogsa Portlet Specification: http://www.jcp.org/jsr/detail/168.jsp 11 A Data Miner for the Information Power Grid Thomas H. Hinke NASA Ames Research Center Moffett Field, California, USA Ames Research Center 1 NAS Division Data Mining on the Grid What is data mining? Why use the grid for data mining? Grid miner overview Grid miner architecture Grid miner implementation Current status Ames Research Center 2 NAS Division 1 What Is Data Mining? “Data mining is the process by which information and knowledge are extracted from a potentially large volume of data using techniques that go beyond a simple search though the data.” [NASA Workshop on Issues in the Application of Data Mining to Scientific Data, Oct 1999, http://www.cs.uah.edu/NASA_Mining/] Ames Research Center 3 NAS Division Example: Mining for Mesoscale Convective Systems Image shows results from mining SSM/I data Ames Research Center 4 NAS Division 2 Data Mining on the Grid What is data mining? Why use the grid for data mining? Grid miner overview Grid miner architecture Grid miner implementation Current status Ames Research Center 5 NAS Division Grid Provides Computational Power • Grid couples needed computational power to data – NASA has a large volume of data stored in its distributed archives • E.g., In the Earth Science area, the Earth Observing System Data and Information System (EOSDIS) holds large volume of data at multiple archives – Data archives are not designed to support user processing – Grids, coupled to archives, could provide such a computational capability for users Ames Research Center 6 NAS Division 3 Grid Provides Re-Usable Functions • Grid-provided functions do not have to be re-implemented for each new mining system – – – – – Single sign-on security Ability to execute jobs at multiple remote sites Ability to securely move data between sites Broker to determine best place to execute mining job Job manager to control mining jobs • Mining system developers do not have to re-implement common grid services • Mining system developers can focus on the mining applications and not the issues associated with distributed processing Ames Research Center 7 NAS Division Data Mining on the Grid What is data mining? Why use the grid for data mining? 
Grid miner overview Grid miner architecture Grid miner implementation Current status Ames Research Center 8 NAS Division 4 Grid Miner • Developed as one of the early applications on the IPG – Helped debug the IPG – Provided basis for satisfying a major IPG milestones • IPG is NASA implementation of Globus-based Grid • Provides basis for what could be an on-going Grid Mining Service Ames Research Center NAS Division 9 Grid Miner Operations Results Results Translated Data Data Data Preprocessed Preprocessed Data Data Patterns/ Patterns/ Models Models Input Preprocessing Analysis Output HDF HDF-EOS GIF PIP-2 SSM/I Pathfinder SSM/I TDR SSM/I NESDIS Lvl 1B SSM/I MSFC Brightness Temp US Rain Landsat ASCII Grass Vectors (ASCII Text) Selection and Sampling Subsetting Subsampling Select by Value Coincidence Search Grid Manipulation Grid Creation Bin Aggregate Bin Select Grid Aggregate Grid Select Find Holes Image Processing Cropping Inversion Thresholding Others... Clustering K Means Isodata Maximum Pattern Recognition Bayes Classifier Min. Dist. Classifier Image Analysis Boundary Detection Cooccurrence Matrix Dilation and Erosion Histogram Operations Polygon Circumscript Spatial Filtering Texture Operations Genetic Algorithms Neural Networks Others... GIF Images HDF-EOS HDF Raster Images HDF SDS Polygons (ASCII, DXF) SSM/I MSFC Brightness Temp TIFF Images Others... Intergraph Raster Others... Figure thanks to Information and Technology Laboratory at the University of Alabama in Huntsville 5 Data Mining on the Grid What is data mining? Why use the grid for data mining? Grid miner overview Grid miner architecture Grid miner implementation Current status Ames Research Center 11 NAS Division Mining on the Grid Satellite Data Grid Mining Agent Archive X IPG Processor Grid Mining Agent IPG Processor Satellite Data Archive Y Ames Research Center Grid Mining Agent IPG Processor 12 NAS Division 6 Grid Miner Architecture Grid Mining Agent Data Archive X IPG Processor IPG Processor Mining Database Daemon IPG Processor Mining Operations Repository Miner Confiig Server Control Database IPG Processor Satellite Data Archive Y Ames Research Center Grid Mining Agent IPG Processor 13 NAS Division Data Mining on the Grid What is data mining? Why use the grid for data mining? Grid miner overview Grid miner architecture Grid miner implementation Current status Ames Research Center 14 NAS Division 7 Starting Point for Grid Miner • Grid Miner reused code from object-oriented ADaM data mining system – Developed under NASA grant at the University of Alabama in Huntsville, USA – Implemented in C++ as stand-alone, objected-oriented mining system • Runs on NT, IRIX, Linux – Has been used to support research personnel at the Global Hydrology and Climate Center and a few other sites. • Object-oriented nature of ADaM provided excellent base for enhancements to transform ADaM into Grid Miner Ames Research Center 15 NAS Division Transforming Stand-Alone Data Miner into Grid Miner • Original stand-alone miner had 459 C++ classes. 
• Had to make small modifications to ADaM – Modified 5 existing classes – Added 3 new classes • Grid commands added for – Staging miner agent to remote sites – Moving data to mining processor Ames Research Center 16 NAS Division 8 Staging Data Mining Agent to Remote Processor globusrun -w -r target_processor '&(executable=$(GLOBUSRUN_GASS_URL)# path_to_agent)(arguments=arg1 arg2 … argN)(minMemory=500)' Ames Research Center 17 NAS Division Moving Data to be Mined gsincftpget remote_processor local_directory remote_file (a scripted sketch of these two steps appears below) Ames Research Center 18 NAS Division 9 Data Mining on the Grid What is data mining? Why use the grid for data mining? Grid miner overview Grid miner architecture Grid miner implementation Current status Ames Research Center 19 NAS Division Current Status • Currently works on the IPG as a prototype system • User documentation underway • Data archives need to be grid-enabled – Connected to the grid – Provide controlled access to data on tertiary storage • E.g., by using a system such as the Storage Resource Broker that was developed at the San Diego Supercomputer Center • Some early-adopter users need to be found to begin using the Grid Miner – Willing to code any new operations needed for their applications – Willing to work with a system with prototype-level documentation Ames Research Center 20 NAS Division 10 Backup Slides Ames Research Center 22 NAS Division 11 Example of Data Being Mined 75 MB for one day of global data - Special Sensor Microwave/Imager (SSM/I). Much higher resolution data exists with significantly higher volume. Ames Research Center 23 NAS Division Grid Will Provide Re-usable Services • In the future, Grid/Web services will provide the ability to create reusable services that can facilitate the development of data mining systems – Builds on the web services work from the e-commerce area • Service interface is defined through WSDL (Web Services Description Language) • Standard access protocol is SOAP (Simple Object Access Protocol) – Mining applications can be built by re-using capabilities provided by existing grid-enabled Web services. Ames Research Center 24 NAS Division 12 Mining on the IPG • Now user must – Develop mining plan – Identify data files to be mined and check file URLs into Control Database – Create mining ticket that has information on • Miner Configuration Server - Currently LDAP server but future GIS • Executable type - e.g., SGI • Sending-host contact information - Source of mining plan and agent • Mining-database contact information - Location of URLs of files to be mined. • Future – User could use the current capability or a Grid Mining Portal for all of the above Ames Research Center 25 NAS Division
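As noted above, the two Grid commands shown in this talk (staging the mining agent with globusrun and fetching data with gsincftpget) might be scripted roughly as follows. This is a sketch under the assumption that the Globus 2 command-line tools are installed and a valid proxy exists; the hostnames and paths in the usage comments are placeholders, not real IPG resources.

    # Sketch only: wraps the two commands shown above; assumes Globus 2 CLI tools and a valid proxy.
    import subprocess

    def stage_mining_agent(target_processor, path_to_agent, arguments, min_memory=500):
        # Build the same RSL string used on the slide and hand it to globusrun.
        rsl = ("&(executable=$(GLOBUSRUN_GASS_URL)# %s)"
               "(arguments=%s)(minMemory=%d)" % (path_to_agent, " ".join(arguments), min_memory))
        return subprocess.call(["globusrun", "-w", "-r", target_processor, rsl])

    def fetch_input(remote_processor, local_directory, remote_file):
        # Just-in-time data acquisition via the GSI-enabled ncftp client.
        return subprocess.call(["gsincftpget", remote_processor, local_directory, remote_file])

    # Placeholder usage (hosts and paths are illustrative):
    # stage_mining_agent("ipg-node.example.gov", "/u/miner/agent", ["plan.xml"])
    # fetch_input("archive.example.gov", "/scratch/miner", "/data/ssmi/day001.hdf")
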
Mining on the IPG • Mining agent – Acquires configuration information from Miner Configuration Server – Acquires mining plan from sending host (future Mining Portal) – Acquires mining operations needed to support mining plan from Mining Operations Repository – Acquires URLs of data to be mined from Control Database – Transfers data using just-in-time acquisition – Mines data – Produces mining output Ames Research Center 26 NAS Division 13 Mining Operator Acquisition One possibility for the future is a number of source directories for – Public mining operations contributed by practitioners – For-fee mining operations from a future mining.com – Private mining operations available to a particular mining team Ames Research Center 27 NAS Division 14
Methodology for Building Grid Applications Thomas M. Eidson ICASE at NASA Langley URL: www.icase.edu/~teidson email: teidson@icase.edu Outline: 1. Grid Programming Overview 2. Modular Grid Programming 3. Component/Framework Project 4. Summary
Modern Scientific Programming Features Scientific Programming: 1. Requirements 2. Programmers and Users 3. Issues Composite Applications: 1. non-trivial collection of element applications (heterogeneous physics, graphics, databases) 2. including large, data-parallel element applications 3. task-parallel execution with message passing and event signals 4. data located in files and databases distributed over network Computing Environment: 1. a heterogeneous network of computers (workstations to supercomputers) 2. a variety of OS architectures, languages 3. a variety of sites with different administrations & policies Users: 1. nature: mixture of designers, programmers, and users 2. programming teams 3. trend toward code sharing
Current Research - Opinion Too much emphasis: 1. fancy interfaces 2. access to "existing" services Not enough emphasis: 1. design of grid applications - component approach recommended 2. side effects of complex applications - port metadata 3. application characterization standards
Modular Grid Programming 1. Modular organization: single-focus programming modules 2. Coupling: simple interfaces to complex, interactive dialogs 3. Task-Oriented Programming 4. Programming Entities 5. Programming Components definition 6. Software Components & Ports 7. Aspect-oriented Programming (Filters) 8. Multi-language Support 9. Component/Instance Programming 10. Workflow Program
Programming Component A Programming Component is an abstraction representing a well-defined programming entity along with metadata that defines the following properties of the entity. The metadata is referred to as a Shared Programming Definition (SPD) to emphasize that Programming Components are independent of any specific framework. Identity is necessary to ensure that a program expresses the programmer's intent. An interface (port) is needed to allow specific behavior to be accessed. State is important to allow a range of functionality so that only a modest number of Programming Components are needed. Relationships between Programming Components allow complex behavior to be defined in a hierarchical manner and to support dynamic modification of behavior. Behavior describes the computational characteristics of a programming entity. Programming Component = Programming Entity + SPD (a toy sketch of this idea appears at the end of this document) [Several accompanying diagram pages labelled Programming Model Slide, Component Programming Slide, and Composite Application Slide: figures not reproduced in this text version]
Nautilus Programming Framework 1. Building Blocks 2. Current funding: partial funding via NSF, joint with U. Tenn (Dongarra and Eijkhout) 3. Nautilus Programming Model 4. SANS Features
Nautilus Framework: Building Blocks 1. Large Application Working Environment (LAWE), NASA SBIR - programming model: programming component + metadata (interface specs) within framework 2. Self-Adaptive Numerical Software (SANS) and NetSolve, U. Tenn - behavioral specifications 3. Common Component Architecture (CCA) Specification - modular programming standards 4. Globus Toolkit - Grid services and security 5. Relevant Grid Forum Specifications - compatibility with other frameworks and Grid interfaces 6. Relevant Web Standards: SOAP, XML - interoperability with other frameworks and Grid interfaces
SANS Service Component 1. Problem: Communication gap between numerical terminology and application terminology. 2. Numerics: matrix properties (spectrum, norm) 3. Application: PDE, discretisation (ex: elliptic problem & linear elements <=> M-matrix, hence Alg. Multigrid) 4. Research: Bridge gap by Intelligent Agent in Self-Adaptive System 5. Approach: use Behavioural metadata to describe characteristics of problem data and of software components (e.g., elements of linear system solvers) to enable smart service components to match solver components to user data. Example: user specifies information about systems; Intelligent Agent uses heuristic determination of properties in absence of user info; Intelligent Agent database is enhanced by information from previous runs.
Summary of ICASE Grid Research 1. Support programmers/users in developing Grid Applications Infrastructure: Tidewater Regional Grid Partnership - local (WM, ODU, HU, JLab, NASA Langley, military bases), distant (IPG, U. Utah, U. Complutense/Spain, U. Va.); features: PGP user management and Globus application services Target applications: NASA: MultiDisciplinary Optimization (MDO), Probabilistic structures (task farming), impact code, reusable launch vehicles, symmetric web; U. Va.: battlefield simulation; U. Utah: genetic algorithms; Brown: heterogeneous mathematical algorithms; JLab: distributed access to data 2. Build Programming Framework to support efficient application development - Nautilus Project Targets: linear systems, eigenvalue solvers, information retrieval