What makes Grid computing difficult?
Peter Coveney (P.V.Coveney@ucl.ac.uk)
Centre for Computational Science, University College London
EPSRC Annual e-Science Meeting, 22 April 2005

Talk contents
• What is Grid computing?
• How to do it
• Problems
• Making grids more usable

Grid Computing
My preferred definition: Grid computing is distributed computing performed transparently across multiple administrative domains.
Notes:
• Computing means any activity involving digital information -- no distinction between numeric/symbolic, or numeric/data/viz
• Transparency implies minimal complexity for users of the technology
See: Phil Trans R Soc London A (2005)

Grid computing is NOT…
• …launching isolated jobs onto medium-sized or big iron, as is the case for
  - most work being done on TeraGrid machines
  - most work being done on the National Grid Service
  What added value is there to “grid-enabling” NGS and TeraGrid machines? A common file-store system is an important and valuable feature.
• …talking about, or merely complying with, middleware specifications and standards
• Note: we have interests in Grid-based “informatics” projects, OGSA-DAI and all that, for the £1.1M EPSRC-funded “Discovery of Novel Functional Oxides” project

Is “worse is better”?
• The Global Grid Forum
• The WSRF debacle, 2004
• Credibility problem -- an expensive talking shop?
• Angels dancing on a pinhead, or the European plug revisited?
• De facto standards
• We need workable, usable solutions
• There must be continual engagement between users and grid techies

Transferring binary data
Web services applications need effective, standard methods for handling binary data. On 25 January 2005 the World Wide Web Consortium (W3C, http://www.w3.org/) published three new Web Services Recommendations:
• XML-binary Optimized Packaging (XOP),
• SOAP Message Transmission Optimization Mechanism (MTOM), and
• Resource Representation SOAP Header Block (RRSHB).
These Recommendations provide ways to efficiently package and transmit binary data included in or referenced by a SOAP 1.2 message.

Grid Computing: How?
• To do grid computing we need to find or build a grid which is:
  - Stable
  - Persistent
  - Usable
  from which we can pick and choose the resources we need.
• Where do we find such a grid that lasts longer than a demo?
  - UK: National Grid Service (since mid 2004)
  - US: TeraGrid
• Note: all of the above use elements of Globus Toolkit 2, so jobs are launched through the GT2 command-line tools (a sketch follows below)
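As a minimal sketch of what this looks like in practice: grid-proxy-init and globus-job-run are the standard Globus Toolkit 2 command-line tools, while the gatekeeper contact string below is hypothetical.

import subprocess

# Hypothetical GT2 gatekeeper contact string for an NGS/TeraGrid-style site.
GATEKEEPER = "headnode.example.ac.uk/jobmanager-pbs"

# Create a short-lived proxy credential from the user's X.509 certificate
# (this prompts for the private-key passphrase).
subprocess.run(["grid-proxy-init"], check=True)

# Launch a trivial job via the site's jobmanager and print its output.
result = subprocess.run(
    ["globus-job-run", GATEKEEPER, "/bin/hostname"],
    capture_output=True, text=True, check=True,
)
print(result.stdout)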
Demos versus real science
A tension exists:
• Demos can help us make progress
versus
• Here today, gone tomorrow
Grid infrastructure must be made persistent in order to perform real science.

RealityGrid
A £4 million project funded by EPSRC.
[Figure: RealityGrid architecture -- Grid middleware connecting instruments (XMT devices, LUSI, …), HPC resources, storage devices, visualization engines, and VR and/or AG nodes to a user with a laptop or PDA; supporting scalable MD, MC and mesoscale modelling, steering, and performance control/monitoring.]

Building services on GT2 grids
• Globus Toolkit 2 has limited usable functionality, so we:
  - Track specs & standards
  - Provide functionality as easily as possible
  - Put this on top of GT2 grid middleware
• We do NOT wait for heavyweight generic solutions provided by others:
  - GT3 (obsolescent)
  - GT4 (yes, but when?)
  - It’s a recipe for being sidelined indefinitely…
• Lightweight middleware makes provision of a service-oriented architecture a pleasant experience for all

Grid computing headaches
• Deployment on existing grids:
  - It takes a long time and much effort by many people to get applications properly deployed
  - Often requires extensive re-working of existing application code
  - Lots of things can go wrong
  - Many people have given up -- the return on investment is too low
• Lack of persistent grid infrastructure and capabilities -- steering, viz, bandwidth provision; need for interactive access
• Security issues
• Clunky, not very usable
• Existing model not taken seriously by people who care about it

TeraGyroid Grid, 2003
[Figure: the TeraGyroid grid -- UK sites (Manchester, Daresbury and UCL, on the SJ4 production network and MB-NG) linked to US TeraGrid sites (ANL, NCSA, SDSC, PSC, Caltech, Phoenix) via Starlight (Chicago) and Netherlight (Amsterdam) over BT-provisioned 2 x 1 Gbps and 10 Gbps connections; nodes marked as visualization, computation, Access Grid node, network PoP, and service registry.]

Dual-homed system: “STIMD Grid 2004”
[Figure: UK NGS infrastructure (Leeds, Manchester, Oxford, RAL, UCL) connected over UKLight, via Starlight (Chicago) and Netherlight (Amsterdam), to the US TeraGrid (SDSC, NCSA, PSC); shown at AHM 2004 with local laptops, PDAs, and a vncserver at Manchester; nodes marked as computation, steering clients, network PoP, and service registry.]
• Both the US TeraGrid and UK NGS use GT2 middleware
• All sites connected by production network (not all shown)

RealityGrid demos @ NeSC
• Lattice-Boltzmann study of complex fluid flow through porous media (oilfield application)
• Molecular dynamics/thermodynamic integration to compute SH2-protein/peptide and HIV-protease/drug binding affinities
• Now achieving flexible multi-modal means to control/steer these applications using the Qt steerer, a PDA, and a web portal
• Grid infrastructure only comes together “around the demo” -- one has to work very hard to get that, and even harder to see it persist beyond the one-week time frame

Problems for users
• Lack of a common API for usable core functionality (e.g. file transfer) across distinct grid applications and domains (see the sketch after this list)
• Heterogeneous software stacks make grid-application portability a nightmare for users
• Security: a high barrier to getting certificates accepted beyond the issuing domain -- some improvements in the past year for US/UK projects
• Non-uniform scheduling and job-launching across resources, and often incompatible policies in different administrative domains
• Complex grid middleware is detrimental to scientific research, and contrary to the stipulated goals of grid computing
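To make the first of these problems concrete, here is a sketch, assuming two hypothetical sites, of the per-domain shim users end up writing because there is no common file-transfer API: GridFTP’s globus-url-copy on one grid, plain scp on another.

import subprocess

def push_file(local_path, site):
    """Copy local_path to a remote site using whichever tool it supports."""
    if site["transfer"] == "gridftp":
        # GT2's GridFTP client; requires a valid proxy from grid-proxy-init.
        dest = "gsiftp://" + site["host"] + site["path"]
        subprocess.run(["globus-url-copy", "file://" + local_path, dest], check=True)
    elif site["transfer"] == "scp":
        subprocess.run(["scp", local_path, site["host"] + ":" + site["path"]], check=True)
    else:
        raise ValueError("no known transfer method for " + site["host"])

# Hypothetical site descriptions for a dual-grid run.
push_file("/tmp/input.dat", {"host": "tg.example.org", "path": "/scratch/input.dat", "transfer": "gridftp"})
push_file("/tmp/input.dat", {"host": "ngs.example.ac.uk", "path": "/scratch/input.dat", "transfer": "scp"})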
X.509 digital certificates in grid computing
• Users share certificates because “it’s too hard to get my own”, “it’s too hard to get my certificate authorised for that site, but my colleague managed to get his done”, “my certificate doesn’t work properly”, etc.
• Users store private keys in multiple locations
• Users protect private keys with no passphrase or with trivial passphrases
• Users re-use certificates obtained for one specific purpose for another “because it is too difficult to get another one”
(Adapted from Bruce Beckles)

Security and usability
• Usability considerations alone lead one to conclude that X.509 certificates, as deployed, are unsatisfactory:
  - Extremely difficult to use, particularly as implemented in current grid environments
  - Security solutions which are difficult to use are inherently insecure -- users inadvertently use them in an insecure way, or deliberately subvert them in an attempt to “just get my work done without all this stuff getting in the way”
(Adapted from Bruce Beckles)

In other words…
…it is too difficult to use this hopeless mess properly, and, anyway, I’ve got better things to do with my time, so… “Can’t use it. Won’t use it.”
(Adapted from Bruce Beckles)

Lightweight middleware
• OGSI::Lite/WSRF::Lite, by Mark McKeown of Manchester University
  - Lightweight OGSI/WSRF implementations, written in Perl
  - Use existing software (e.g. for SSL) where possible; simple installation
• Using OGSI::Lite (2003):
  - Grid-based job submission and steering retrofitted onto the LB2D workstation-class simulation code within a week
  - Standards compliance: we were able to steer simulations from a web browser, with no custom client software needed
  - Necessary for all RealityGrid grid work, e.g. TeraGyroid
• Now developing extended capabilities using WSRF::Lite on the US TeraGrid & UK NGS
• We have developed WEDS -- a web-services environment for distributed simulation

Scientists developing middleware!
• Rapid prototyping of usable grid middleware (EPSRC funded)
• Robust application hosting under WSRF::Lite (OMII funded)
• Total value > £500K
OMII = Open Middleware Infrastructure Institute (UK), www.omii.ac.uk

WEDS Architecture
[Figure: WEDS architecture -- a client contacts a broker, which selects a machine service on a managed resource; a factory service creates a wrapper service around the invoked application.]
• Each resource runs a WSRF::Lite container holding a WEDS machine service and factory services for each hosted application
• Each machine that a user wishes to use is registered with a broker service
• The user contacts the broker with the details of the job to run
• The broker match-makes the job details against the capabilities advertised by each machine service and decides where to invoke the service
• The broker passes the contact details of the service instance back to the client
(A toy sketch of the match-making step follows.)
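The following toy Python sketch illustrates only the broker’s match-making step; the data structures are invented for illustration, and the real WEDS broker is a WSRF::Lite service whose interfaces are not reproduced here.

from dataclasses import dataclass

@dataclass
class MachineService:
    endpoint: str        # contact address advertised by the machine service
    applications: set    # applications for which this resource hosts factories
    free_cpus: int

@dataclass
class JobRequest:
    application: str
    cpus: int

def broker_match(job, machines):
    """Return the endpoint of the first machine advertising the required
    application with enough free CPUs; the client then receives this
    contact address and talks to the service instance directly."""
    for m in machines:
        if job.application in m.applications and m.free_cpus >= job.cpus:
            return m.endpoint
    raise RuntimeError("no registered machine can run this job")

# A hypothetical registry of two machine services.
registry = [
    MachineService("https://viz.example.org/weds", {"lb2d"}, 4),
    MachineService("https://hpc.example.ac.uk/weds", {"lb2d", "namd"}, 64),
]
print(broker_match(JobRequest("namd", 32), registry))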
Robust application hosting
• Developing our lightweight hosting tools to meet the needs of application scientists
• No preconceptions about the “right way” to do things, nor pre-determined adherence to particular specifications or “workflows”
• Gain experience by working with real-world problems, refactoring the design as required
• Projects/people we are collaborating with as “end-users”:
  - Daniel Mason (Imperial) -- polystyrene-surface interactions (see demo)
  - CCP5’s DL-MESO project (Rongshan Qin, Daresbury Laboratory) -- mesoscale modelling/simulation
  - Jonathan Essex (Southampton) -- NAMD for protein modelling
  - The Integrative Biology EPSRC e-Science project
  - IBiS (Integrative Biological Simulation), a BBSRC Bioinformatics & e-Science project
• Close collaboration with OMII and its middleware

Summary
• We are using the Grid to do real science
• When successful, it leads to a step jump in our capabilities
• We are working with the US TeraGrid and the UK National Grid Service to try to ensure compatibility between the two grids into the future (GT4, …)
• We’re being held back by the state of existing “grid infrastructure”

Summary
• There remain large barriers to routine use of flexible computational grids
• Lightweight middleware greatly facilitates deployment of users’ applications on grids
• We’re working with several “computational user communities”, from physics through to biology, to try to attract them onto grids in this manner

Acknowledgements
• Many colleagues, post-graduates and post-docs
• EPSRC
• OMII
• NSF