What makes Grid computing difficult?
Peter Coveney (P.V.Coveney@ucl.ac.uk)
Centre for Computational Science
University College London
EPSRC Annual e-Science Meeting, 22 April 2005
Talk contents
• What is Grid computing?
• How to do it
• Problems
• Making grids more usable
Grid Computing
My preferred definition:
Grid computing is distributed computing performed transparently across multiple administrative domains.
Notes:
• Computing means any activity involving digital information -- no distinction between numeric/symbolic or numeric/data/viz
• Transparency implies minimal complexity for users of the technology
See: Phil Trans R Soc London A (2005)
Grid computing is NOT…
• …launching isolated jobs onto medium-sized or big iron, as is the case for
  • most work being done on TeraGrid machines
  • most work being done on the National Grid Service
• What added value is there in “grid-enabling” NGS and TeraGrid machines?
  • A common file-store system is an important and valuable feature
• …talking about or merely complying with middleware specifications and standards
• Note: We have interests in Grid-based “informatics” projects, OGSA-DAI and all that, for the £1.1M EPSRC-funded “Discovery of Novel Functional Oxides” project
Is worse better?
• The Global Grid Forum
  • The WSRF debacle of 2004
  • Credibility problem -- an expensive talking shop?
  • Angels dancing on a pinhead, or the European plug revisited?
• De facto standards
• We need workable, usable solutions
• There must be continual engagement between users and grid techies
Transferring binary data
• “Web Services Applications Need Effective, Standard Methods for Handling Binary Data: World Wide Web Consortium Issues Three Web Services Recommendations”
• http://www.w3.org/ -- 25 January 2005 -- The World Wide Web Consortium (W3C) has published three new Web Services Recommendations:
  • XML-binary Optimized Packaging (XOP),
  • SOAP Message Transmission Optimization Mechanism (MTOM), and
  • Resource Representation SOAP Header Block (RRSHB).
  These recommendations provide ways to efficiently package and transmit binary data included in or referenced by a SOAP 1.2 message.
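To make the optimization concrete, here is a minimal sketch, using only the Python standard library, of the packaging idea behind XOP/MTOM: the binary payload travels as a MIME part in a multipart/related package, and the SOAP 1.2 envelope points at it with an xop:Include reference instead of inlining it as base64. The payload and Content-ID are invented for illustration.

```python
from email import encoders
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart

# Stand-in for a binary payload, e.g. raw simulation output.
binary_payload = b"\x00\x01\x02\x03" * 256

# SOAP 1.2 envelope that references the payload by Content-ID instead of
# inlining it as base64 (base64 inflates the data by roughly a third).
envelope = """<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
  <soap:Body>
    <data>
      <xop:Include xmlns:xop="http://www.w3.org/2004/08/xop/include"
                   href="cid:payload@example.org"/>
    </data>
  </soap:Body>
</soap:Envelope>"""

# The package is multipart/related: the envelope is the root part, and the
# binary is a sibling part identified by the Content-ID it references.
package = MIMEMultipart("related", type="application/xop+xml")
package.attach(MIMEApplication(envelope, "xop+xml",
                               _encoder=encoders.encode_7or8bit))

part = MIMEApplication(binary_payload, "octet-stream")
part.add_header("Content-ID", "<payload@example.org>")
package.attach(part)

# Note: the stdlib still base64-encodes the binary part for transport; a
# real MTOM stack sends the raw octets, which is the point of the mechanism.
print(package.as_string()[:400])
```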
Grid Computing: How?
• To do grid computing, we need to find or build a grid which is:
  • Stable
  • Persistent
  • Usable
  from which we can pick and choose the resources we need.
• Where do we find such a grid that lasts longer than a demo?
  • UK: National Grid Service (since mid-2004)
  • US: TeraGrid
Note: all of the above use elements of Globus Toolkit 2 (the basic usage pattern is sketched below)
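As a rough illustration of that GT2 usage pattern, this sketch drives the two standard Globus Toolkit 2 command-line tools from Python: grid-proxy-init to derive a short-lived proxy credential, and globus-job-run to fork a job through a remote gatekeeper. The host name is hypothetical.

```python
import subprocess

# Create a short-lived proxy credential from the user's X.509 certificate
# (grid-proxy-init prompts for the private-key passphrase).
subprocess.run(["grid-proxy-init"], check=True)

# Fork a trivial job on a remote GRAM gatekeeper and collect its output.
# The gatekeeper host is hypothetical.
result = subprocess.run(
    ["globus-job-run", "grid-compute.example.ac.uk", "/bin/hostname"],
    check=True, capture_output=True, text=True,
)
print(result.stdout)
```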
Demos versus real science
A tension exists:
• Demos can help us make progress
versus
• Here today, gone tomorrow
Grid infrastructure must be made persistent in order to perform real science.
RealityGrid
A £4 million project funded by EPSRC
[Architecture diagram: a user with a laptop or PDA connects through Grid middleware to HPC resources, storage devices, visualization engines, VR and/or AG nodes, and instruments (XMT devices, LUSI, …); applications include scalable MD, MC and mesoscale modelling, with steering and performance control/monitoring.]
Building services on GT2 grids
• Globus Toolkit 2 has limited usable functionality, so we:
  • Track specs & standards
  • Provide functionality as easily as possible
  • Put this on top of GT2 grid middleware
• We do NOT wait for heavyweight generic solutions provided by others:
  • GT3 (obsolescent)
  • GT4 (yes, but when?)
  • Waiting is a recipe for being sidelined indefinitely…
• Lightweight middleware makes provision of a service-oriented architecture a pleasant experience for all
Grid computing headaches
• Deployment on existing grids
  • It takes a long time and much effort by many people to get applications properly deployed
  • Often requires extensive re-working of existing application code
  • Lots of things can go wrong
  • Many people have given up -- ROI too low
• Lack of persistent grid infrastructure and capabilities -- steering, viz, bandwidth provision; need interactive access
• Security issues
  • Clunky, not very usable
  • Existing model not taken seriously by people who care about it
TeraGyroid Grid 2003
[Network diagram: UK sites (Manchester, Daresbury, UCL) linked over MB-NG and the SJ4 production network, with BT-provisioned 2 x 1 Gbps links and a 10 Gbps path via Netherlight (Amsterdam) and Starlight (Chicago) to US sites (ANL, NCSA, PSC, SDSC, Caltech, Phoenix); nodes marked as visualization, computation, Access Grid node, network PoP and service registry.]
“STIMD Grid 2004”: a dual-homed system
[Network diagram: UK NGS sites (Leeds, Manchester, Oxford, RAL, UCL) and US TeraGrid sites (SDSC, NCSA, PSC) linked via UKLight through Netherlight (Amsterdam) and Starlight (Chicago); nodes marked as computation, steering clients, network PoP and service registry; steering from local laptops, PDAs and a Manchester vncserver. Shown at AHM 2004.]
• Both the US TeraGrid and the UK NGS use GT2 middleware
• All sites are connected by the production network (not all shown)
RealityGrid demos @ NeSC
• Lattice-Boltzmann study of complex fluid flow through porous media (oilfield application)
• Molecular dynamics/thermodynamic integration to compute SH2-protein/peptide and HIV-protease/drug binding affinities
• Now achieving flexible multi-modal means to control/steer these applications using the Qt-steerer, a PDA and a web portal (the steer cycle is sketched below)
• Grid infrastructure only comes together “around the demo” -- one has to work very hard to get that, and even harder to see it persist beyond the one-week time frame
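The steer cycle itself is simple; what follows is a self-contained schematic in Python of the pattern those clients implement: poll the simulation's exposed parameters, decide on changes, and push new values back. All class and method names are hypothetical stand-ins, not the RealityGrid steering API.

```python
import random
import time

class SteerableSim:
    """Stand-in for a running steerable simulation (hypothetical API)."""
    def __init__(self):
        self.params = {"temperature": 1.0, "coupling": 0.5}
        self.step = 0

    def running(self):
        return self.step < 10          # a short run for illustration

    def advance(self):
        self.step += 1                 # one simulation timestep

    def get_params(self):
        return dict(self.params)       # expose current steerable parameters

    def set_params(self, changes):
        self.params.update(changes)    # applied from the next timestep

def steering_client(sim):
    # The steer cycle: poll exposed parameters, decide on changes (a random
    # nudge stands in for user input from the Qt-steerer, web portal or PDA),
    # and push the new values back into the running simulation.
    while sim.running():
        t = sim.get_params()["temperature"]
        sim.set_params({"temperature": t * random.uniform(0.9, 1.1)})
        sim.advance()
        time.sleep(0.1)

sim = SteerableSim()
steering_client(sim)
print(sim.get_params())
```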
Problems for users
• Lack of a common API for usable core functionality (e.g. file transfer) across distinct grid applications and domains -- see the sketch after this list
• Heterogeneous software stacks make grid-application portability a nightmare for users
• Security: a high barrier to getting certificates accepted beyond the issuing domain -- some improvements in the past year for US/UK projects
• Non-uniform scheduling and job-launching resources, and often incompatible policies, in different admin domains
• Complex grid middleware is detrimental to scientific research, and contrary to the stipulated goals of grid computing
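The first point is worth making concrete. In the absence of a common API, users end up hand-rolling a thin facade like the sketch below, which hides behind one call whichever transfer mechanism each site happens to support. globus-url-copy and scp are the real tools; the site table and host names are hypothetical.

```python
import subprocess

# Per-site transfer mechanism, typically discovered the hard way.
SITE_TRANSFER = {
    "ngs.example.ac.uk": "gridftp",
    "tg.example.org": "scp",
}

def put_file(local_path, site, remote_path):
    """Copy local_path to remote_path on site, hiding the mechanism.

    local_path must be absolute, since GridFTP takes a file:// URL.
    """
    if SITE_TRANSFER[site] == "gridftp":
        # GT2 GridFTP transfer, authenticated with the user's proxy credential.
        subprocess.run(["globus-url-copy",
                        f"file://{local_path}",
                        f"gsiftp://{site}{remote_path}"], check=True)
    else:
        # Fall back to plain ssh-based copy where GridFTP is not deployed.
        subprocess.run(["scp", local_path, f"{site}:{remote_path}"], check=True)
```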
X.509 digital certificates in grid computing
• Users share certificates because “it’s too hard to get my own”, “it’s too hard to get my certificate authorised for that site, but my colleague managed to get his done”, “my certificate doesn’t work properly”, etc.
• Users store private keys in multiple locations
• Users protect private keys with no passphrase or with trivial passphrases
• Users re-use certificates obtained for one specific purpose for another “because it is too difficult to get another one”
Adapted from Bruce Beckles
Security and usability
• Usability considerations alone lead one to conclude that X.509 certificates are unsatisfactory:
  • Extremely difficult to use, particularly as implemented in current grid environments
  • Security solutions which are difficult to use are inherently insecure -- users inadvertently use them in an insecure way, or deliberately subvert them in an attempt to “just get my work done without all this stuff getting in the way”
Adapted from Bruce Beckles
In other words…
…it is too difficult to use this hopeless mess properly,
and, anyway, I’ve got better things to do with my
time, so…
“Can’t use it. Won’t use it.”
Adapted from Bruce Beckles
Lightweight middleware
• OGSI::Lite/WSRF::Lite
  • By Mark McKeown of the University of Manchester
  • A lightweight OGSI/WSRF implementation, written in Perl
  • Uses existing software (e.g. for SSL) where possible; simple installation
• Using OGSI::Lite (2003)
  • Grid-based job submission and steering retrofitted onto the LB2D workstation-class simulation code within a week
  • Standards compliance: we were able to steer simulations from a web browser, with no custom client software needed (see the sketch below)
  • Necessary for all RealityGrid grid work, e.g. TeraGyroid
• Now developing extended capabilities using WSRF::Lite on the US TeraGrid & UK NGS
• We have developed WEDS -- a web services environment for distributed simulation
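A sketch of what that standards compliance buys: steering a simulation needs no custom client stack, only the ability to POST a SOAP message. Below, a bare WS-ResourceProperties GetResourceProperty request is sent with the Python standard library. The endpoint, the property QName and its namespace are hypothetical, and the wsrp namespace URI shown is the 2004 draft, so treat the exact strings as assumptions to be checked against the deployed WSRF::Lite version.

```python
import urllib.request

ENDPOINT = "http://steering.example.ac.uk:50000/WSRF/Sim/LB2D"  # hypothetical

# A minimal WS-ResourceProperties GetResourceProperty request: the element
# content is the QName of the resource property we want to read.
soap = """<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
  <soap:Body>
    <wsrp:GetResourceProperty
        xmlns:wsrp="http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-ResourceProperties-1.2-draft-01.xsd"
        xmlns:sim="http://example.ac.uk/lb2d">sim:CurrentTimestep</wsrp:GetResourceProperty>
  </soap:Body>
</soap:Envelope>"""

req = urllib.request.Request(
    ENDPOINT,
    data=soap.encode("utf-8"),
    headers={"Content-Type": "application/soap+xml; charset=utf-8"},
)
# The response is a plain SOAP envelope, readable by any standards-aware
# client -- which is why a web browser can act as a steering client.
print(urllib.request.urlopen(req).read().decode())
```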
Scientists developing middleware!
• Rapid prototyping of usable grid middleware (EPSRC funded)
• Robust application hosting under WSRF::Lite (OMII funded)
• Total value > £500K
OMII = Open Middleware Infrastructure Institute (UK)
www.omii.ac.uk
WEDS Architecture
[Architecture diagram: a client contacts a broker; the broker talks to a machine service on each managed resource, whose service factory creates a wrapper service around the invoked application.]
• Each resource runs a WSRF::Lite container holding a WEDS machine service and factory services for each hosted application
• Each machine that a user wishes to use is registered with a broker service
• The user contacts the broker with the details of the job to run
• The broker match-makes the job details against the capabilities advertised by each machine service and decides where to invoke the service (sketched below)
• The broker passes the contact details of the service instance back to the client
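A sketch of the broker's match-making step as just described: machine services register and advertise capabilities, the broker matches a job's requirements against them and hands the client the contact details of a service instance. The data structures and endpoints are illustrative only, not the actual WEDS interfaces.

```python
# Machine services registered with the broker, each advertising capabilities.
registry = []

def register(endpoint, capabilities):
    registry.append({"endpoint": endpoint, "caps": capabilities})

def broker(job):
    """Pick a machine whose advertised capabilities satisfy the job."""
    for machine in registry:
        caps = machine["caps"]
        if job["app"] in caps["apps"] and job["cpus"] <= caps["cpus"]:
            # In WEDS the broker would now ask this machine's factory
            # service to create a wrapper-service instance for the job,
            # and return that instance's contact details to the client.
            return machine["endpoint"] + "/instances/" + job["app"]
    raise RuntimeError("no machine advertises the required capabilities")

register("http://hpc1.example.ac.uk/weds", {"apps": {"lb3d", "namd"}, "cpus": 128})
register("http://ws1.example.ac.uk/weds", {"apps": {"lb2d"}, "cpus": 2})

print(broker({"app": "lb2d", "cpus": 1}))
```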
Robust application hosting
• Developing our lightweight hosting tools to meet the needs of applications scientists
• No preconceptions about the “right way” to do things, or pre-determined adherence to particular specifications or “workflows”
• Gain experience by working with real-world problems, refactoring the design as required
• Projects/people we are collaborating with as “end-users”:
  -- Daniel Mason (Imperial) -- polystyrene-surface interactions (see demo)
  -- CCP5’s DL_MESO project (Rongshan Qin, DL) -- mesoscale modelling/simulation
  -- Jonathan Essex (Southampton) -- NAMD for protein modelling
  -- Integrative Biology EPSRC e-Science project
  -- IBiS (Integrative Biological Simulation) BBSRC Bioinformatics & e-Science project
• Close collaboration with OMII and its middleware
Summary
• We are using the Grid to do real science
• When successful, this leads to a step change in our capabilities
• We are working with the US TeraGrid and UK National Grid Service to try to ensure compatibility between the two grids into the future (GT4, …)
• We’re being held back by the state of existing “grid infrastructure”
Summary
• There remain large barriers to routine use of flexible computational grids
• Lightweight middleware greatly facilitates the deployment of users’ applications on grids
• We’re working with several “computational user communities”, from physics through to biology, to try to attract them onto grids in this manner
Acknowledgements
• Many colleagues, post-graduates and post-docs
• EPSRC
• OMII
• NSF