RealityGrid
A testbed for computational condensed matter science
Peter Coveney
Paris, 31 March 2003
Academic Partners
University College London
Queen Mary, University of London
Imperial College
University of Manchester
University of Edinburgh
University of Oxford
Loughborough University
www.realitygrid.org
RealityGrid
Collaborating institutions:
Schlumberger
Edward Jenner Institute for Vaccine Research
Silicon Graphics Inc
Computation for Science Consortium
Advanced Visual Systems
Fujitsu
British Telecommunications
RealityGrid—the Concept
• A RealityGrid generalises the concept of a Reality Centre across a network of computational resources managed by Grid middleware.
• It optimises the scientific discovery process by integrating simulation, visualization and data from experimental facilities – real-time data mining.
• It builds on and extends the functionality of a DataGrid.
• New middleware issues arise because a RealityGrid must address the synchronicity of resources and their interaction.
http://www.realitygrid.org
UK and US Grid Technologies
[Diagram: UK and US Grid technologies – instruments (XMT devices, LUSI, …), Grid middleware, HPC resources, storage devices and a user with a laptop, supporting scalable MD, MC and mesoscale modelling, performance control/monitoring, visualization engines, steering, and VR and/or Access Grid nodes.]
Moving the bottleneck out of the hardware and into the human mind…
The TeraGyroid Project
• Funded by EPSRC & NSF (USA) to join the UK e-Science Grid and the US TeraGrid.
• Main objective was to deliver high-impact science which it would not be possible to perform without the combined resources of the US and UK grids.
• Study of defect dynamics in liquid crystalline surfactant systems using lattice-Boltzmann methods.
• Three-month project including work exhibited at Supercomputing 2003 and SC Global.
• TRICEPS was the HPC Challenge aspect of this work.
Collaborating Institutions
...and hundreds of individuals at:
Argonne National Laboratory (ANL)
Boston University
BT
BT Exact
Caltech
CSC
Computing Services for Academic Research (CSAR)
Daresbury Laboratory
Department of Trade and Industry (DTI)
Edinburgh Parallel Computing Centre
Engineering and Physical Sciences Research Council (EPSRC)
Forschungszentrum Jülich
HLRS (Stuttgart)
HPCx
IBM
Imperial College London
National Center for Supercomputing Applications (NCSA)
Pittsburgh Supercomputing Center
San Diego Supercomputer Center
SCinet
SGI
SURFnet
TeraGrid
Tufts University, Boston
UKERNA
UK Grid Support Centre
University College London
University of Edinburgh
University of Manchester
LB3D: Three-dimensional lattice-Boltzmann simulations
• The LB3D code is written in Fortran 90 and parallelized using MPI (a minimal sketch of the collide-and-stream update follows this slide).
• Scales linearly on all available resources (CSAR, HPCx, Lemieux, Linux/Itanium).
• Fully steerable.
• Uses the parallel data format PHDF5.
• Data produced during a single large-scale simulation can range from hundreds of gigabytes to terabytes.
• Simulations require supercomputers.
• High-end visualization hardware and parallel rendering software (e.g. VTK) are needed for data analysis.
3D datasets showing snapshots from a simulation of spinodal decomposition: a binary mixture of water and oil phase separates. Blue areas denote high water densities and red denotes the interface between the two fluids.
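To make the method concrete, here is a hedged, minimal single-component D2Q9 lattice-BGK sketch in Python/NumPy. It is not LB3D (a multicomponent, 3D, Fortran 90/MPI code); the lattice size, relaxation time and initial perturbation are illustrative assumptions, and the snippet only shows the structure of the collide-and-stream update.

# Minimal single-component D2Q9 lattice-BGK sketch (illustrative only).
import numpy as np

NX, NY = 64, 64   # toy lattice dimensions (assumption)
TAU = 0.8         # BGK relaxation time (assumption)

# D2Q9 velocity set and weights
c = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])
w = np.array([4/9] + [1/9]*4 + [1/36]*4)

def equilibrium(rho, ux, uy):
    """Second-order truncated Maxwell-Boltzmann equilibrium distribution."""
    cu = c[:, 0, None, None]*ux + c[:, 1, None, None]*uy
    usq = ux**2 + uy**2
    return w[:, None, None] * rho * (1 + 3*cu + 4.5*cu**2 - 1.5*usq)

# Start at rest with a small random density perturbation
rho = 1.0 + 0.01*np.random.rand(NX, NY)
ux = np.zeros((NX, NY))
uy = np.zeros((NX, NY))
f = equilibrium(rho, ux, uy)

for step in range(100):
    # Collision: relax each population towards its local equilibrium
    f += -(f - equilibrium(rho, ux, uy)) / TAU
    # Streaming: shift populations along their lattice velocities (periodic)
    for i in range(9):
        f[i] = np.roll(np.roll(f[i], c[i, 0], axis=0), c[i, 1], axis=1)
    # Macroscopic moments used for output and the next collision step
    rho = f.sum(axis=0)
    ux = (f * c[:, 0, None, None]).sum(axis=0) / rho
    uy = (f * c[:, 1, None, None]).sum(axis=0) / rho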
Computational Steering with LB3D
• All simulation parameters are steerable using the RealityGrid steering library (a generic sketch of a steerable main loop is given after this slide).
• Checkpointing/restarting functionality allows 'rewinding' of simulations and run-time job migration across architectures.
• Steering reduces storage requirements because the user can adapt data-dumping frequencies.
• CPU time can be saved because users do not have to wait for jobs to finish if they can already see that nothing relevant is happening.
• Instead of "task farming", i.e. launching many simulations at the same time, parameter searches can be done by "steering" through the parameter space of a bigger simulation.
• Analysis time is significantly reduced because less irrelevant data is produced.
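Below is a hedged sketch of what such a steerable main loop can look like. It does not use the actual RealityGrid steering library API; check_steering_messages, write_checkpoint, advance_one_timestep and dump_output are hypothetical stand-ins that only mark where the steering hooks sit in the time loop.

# Hedged sketch of a steerable simulation main loop (hypothetical API).
import pickle

def advance_one_timestep(state):
    """Placeholder for the science code, e.g. one lattice-Boltzmann update."""
    state["t"] += 1

def dump_output(state, step):
    """Placeholder for the (steerable-frequency) data dump."""
    print(f"step {step}: t = {state['t']}")

def check_steering_messages():
    """Poll the steering client for new commands; hypothetical stub."""
    return {}   # e.g. {"output_interval": 500, "checkpoint": True, "stop": False}

def write_checkpoint(state, path="sim.chk"):
    """Dump the full state so the run can be rewound or migrated."""
    with open(path, "wb") as fh:
        pickle.dump(state, fh)

def run(state, n_steps=10_000, output_interval=100):
    for step in range(n_steps):
        advance_one_timestep(state)

        msgs = check_steering_messages()     # steering hook, polled every step
        if "output_interval" in msgs:        # user adapts data-dumping frequency
            output_interval = msgs["output_interval"]
        if msgs.get("checkpoint"):           # user requests a restart point
            write_checkpoint(state)
        if msgs.get("stop"):                 # user sees nothing relevant happening
            break

        if step % output_interval == 0:
            dump_output(state, step)

run({"t": 0}, n_steps=500)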
Parameter Space Exploration
[Diagram: steering through parameter space. Initial condition: a random water/surfactant mixture from which self-assembly starts. Outcomes shown include a cubic micellar phase at high surfactant density gradient, a cubic micellar phase at low surfactant density gradient, and, after rewinding and restarting from a checkpoint, a lamellar phase of surfactant bilayers between water layers.]
Distributed knowledge discovery
The visualised output is streamed to a distributed set of collaborators located at Access Grid nodes across the USA and UK, who also interact with the simulations.
One of RealityGrid's central aims – to provide Grid-enabled collaboratories – is now being realised.
This distributed, collaborative activity further accelerates the discovery of new scientific phenomena hidden in terabytes of data.
[Image: liquid crystalline gyroid]
N. González-Segredo and P. V. Coveney, "Self-assembly of the gyroid cubic mesophase: lattice-Boltzmann simulations", Europhys. Lett. 65 (6), 795–801 (2004);
"…the most important contribution to computational condensed matter science in 2003…"
SC Global Demonstration
Access Grid Session as seen at BT
TRICEPS Grid
[Network diagram: UK sites (Manchester, Daresbury, UCL) connected through the SJ4 production network, MB-NG and BT-provisioned 2 x 1 Gbps links, via Netherlight (Amsterdam) and a 10 Gbps link to Starlight (Chicago), reaching the US sites (ANL, NCSA, PSC, SDSC, Caltech) and Phoenix. The legend distinguishes visualization resources, computation resources, Access Grid nodes, network PoPs, the service registry and dual-homed systems.]
Hardware Infrastructure
Computation (using more than 6000 processors), including:
• HPCx (Daresbury), 1280 procs IBM Power4 Regatta, 6.6 Tflops peak, 1.024 TB memory
• Lemieux (PSC), 3000 procs HP/Compaq, 3 TB memory, 6 Tflops peak
• TeraGrid Itanium2 cluster (NCSA), 256 procs, 1.3 Tflops peak
• TeraGrid Itanium2 cluster (SDSC), 256 procs, 1.3 Tflops peak
• Green (CSAR), SGI Origin 3800, 512 procs, 0.512 TB memory (shared)
• Newton (CSAR), SGI Altix 3700, 256 Itanium2 procs, 384 GB memory (shared)
Visualization:
• Bezier (Manchester), SGI Onyx 300, 6 x IR3, 32 procs
• Dirac (UCL), SGI Onyx2, 2 x IR3, 16 procs
• SGI loan machine (Phoenix), SGI Onyx, 1 x IR4, 1 x IR3, commissioned on site
• TeraGrid visualization cluster (ANL), Intel Xeon
• SGI Onyx (NCSA)
Service Registry:
• Frik (Manchester), Sony PlayStation 2
Storage:
• 20 TB of science data generated in the project; Atlas Petabyte Storage System (RAL)
Access Grid nodes at Boston University, UCL, Manchester, Martlesham and Phoenix (4)
Steering in the OGSA
[Architecture diagram: the simulation and the visualization each link the steering library and bind to their own Steering Grid Service (GS); the Steering GSs publish themselves to a Registry; a steering client finds them in the Registry, connects, and steers both components; simulation data is transferred to the visualization via Globus-IO, and the rendered output is sent to displays. Components start independently and attach/detach dynamically.]
(A toy sketch of this publish/find/bind pattern is given after the bullets below.)
• Computations run at HPCx, CSAR, SDSC, PSC and NCSA
• Visualizations run at Manchester, UCL, Argonne, NCSA and Phoenix
• Scientists steer calculations from UCL, BT and Boston, collaborating via Access Grid
• Visualizations viewed remotely
• Grid services run anywhere
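The publish/find/bind pattern in the diagram can be illustrated with a toy in-process registry. This is only a hedged Python sketch: the real system publishes OGSI Grid Services (via OGSI::Lite) into the registry, and every class and parameter name below is hypothetical.

# Toy illustration of the publish/find/bind pattern used for steering services.
class Registry:
    """Minimal in-process stand-in for the service registry."""
    def __init__(self):
        self._services = {}

    def publish(self, name, endpoint):
        self._services[name] = endpoint

    def find(self, name):
        return self._services.get(name)

class SteeringService:
    """Stand-in for a per-component Steering Grid Service."""
    def __init__(self, component_name):
        self.component_name = component_name
        self.params = {}

    def set_param(self, key, value):
        self.params[key] = value   # would be forwarded to the running component

# Components start independently and publish themselves...
registry = Registry()
registry.publish("sim", SteeringService("LB3D simulation"))
registry.publish("viz", SteeringService("visualization"))

# ...and a steering client later finds and binds to them.
sim = registry.find("sim")
sim.set_param("surfactant_coupling", 0.005)   # hypothetical parameter
print(sim.component_name, sim.params)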
Software Infrastructure
• Globus Toolkit, versions 2.2.3, 2.2.4, 2.4.3, 3.1
  – We use GRAM (job management), GridFTP (migration of checkpoint files) and Globus-IO (inter-component communication)
• 6 Certificate Authorities involved
• OGSI::Lite
  – Perl implementation of OGSI used for the Steering Grid Services, Registry and Checkpoint Metadata Tree
• RealityGrid steering library and toolkit
• Visualization
  – VTK 4.0 + patches, SGI OpenGL VizServer, Chromium, VIC, ...
• Malleable checkpoint/restart (see the HDF5 sketch below)
  – HDF5, XDR, RealityGrid performance control s/w
• Port-forwarding s/w from EPCC
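As an illustration of the malleable checkpoint idea, here is a hedged Python sketch using HDF5 via h5py. It is not LB3D's actual checkpoint format (LB3D writes parallel HDF5 from Fortran 90/MPI); the dataset names, attributes and array sizes are assumptions.

# Hedged sketch of writing and reading a checkpoint with HDF5 (via h5py).
import numpy as np
import h5py

def write_checkpoint(filename, step, density, velocity, params):
    with h5py.File(filename, "w") as f:
        f.attrs["step"] = step
        for key, value in params.items():   # steerable parameters saved as attributes
            f.attrs[key] = value
        # gzip-compressed datasets keep multi-gigabyte checkpoints manageable
        f.create_dataset("density", data=density, compression="gzip")
        f.create_dataset("velocity", data=velocity, compression="gzip")

def read_checkpoint(filename):
    with h5py.File(filename, "r") as f:
        density = f["density"][...]
        velocity = f["velocity"][...]
        step = int(f.attrs["step"])
    return step, density, velocity

# Example: checkpoint a toy 3D field, then 'rewind' by reading it back.
rho = np.ones((32, 32, 32))
u = np.zeros((3, 32, 32, 32))
write_checkpoint("toy.chk.h5", 1000, rho, u, {"tau": 0.8})
step, rho2, u2 = read_checkpoint("toy.chk.h5")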
TeraGyroid Summary
• Distributed, collaborative, bleeding-edge technology
• Applied to deliver new results in condensed matter science
  – Gyroid mesophase of amphiphilic liquid crystals
  – Unprecedented space and time scales enable us to investigate phenomena which have hitherto been totally out of reach
Hybrid Modelling: Overview
• Objective:
  – To construct a computational scheme to couple two descriptions of matter with very different characteristic length and time scales.
• Applications: dynamical processes near interfaces governed by the interplay between micro- and macro-dynamics.
  – Examples: complex fluids near surfaces (polymers, colloids, surfactants, etc.), lipid membranes, complex boundary conditions in nano-channels (fluidics), wetting, crystal growth from the vapour phase, melting, critical fluids near surfaces, etc.
• Key to integrative/systems modelling approaches
• The Grid is the natural architecture for such modelling
Oscillatory-wall shear flow
Application: rheology in nano-slots, complex boundary conditions.
[Set-up diagram: a slot with one fixed solid wall and one oscillating wall, with x and y axes and regions labelled C and P.]
Velocity profile: u = u(x,t) ĵ
Oscillating wall: u_wall(t) = u_max sin(2πωt)
Penetration length: δ_u = (2πν/ω)^(1/2)
(A worked example of the penetration length follows below.)
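A hedged worked example of the penetration-length formula above; the viscosity and driving frequency are illustrative assumptions, not values from the actual study.

# Worked example: delta_u = (2*pi*nu/omega)**0.5 for the oscillatory-wall set-up.
import math

nu = 1.0e-6      # kinematic viscosity in m^2/s (water-like; assumption)
omega = 1.0e3    # driving frequency (assumption)

delta_u = math.sqrt(2 * math.pi * nu / omega)
print(f"penetration length = {delta_u:.3e} m")   # ~7.9e-5 m for these values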
General coupling framework aims
• Portability of models via conformance rules
– Support exchange between developers
– Promote ease of composition
– Ability to export models to other frameworks
– Ability to import models into a composition
• Flexibility in deployment
– Neutral to actual communication mechanism
– Flexible mapping of models to machines
http://www.cs.man.ac.uk/cnc/projects/gcf
Examples of deployment: coupling an MD code with a CFD code
Single machine:
– MD and CFD run under a sequential control code; communication through a shared buffer.
Multiple machines:
– MD and CFD deployed concurrently (two executables); communications via distributed MPI, Web services, GT3, etc.
(A minimal sketch of the single-machine, shared-buffer case is given below.)
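A hedged sketch of the single-machine, shared-buffer deployment: an MD-like and a CFD-like solver advance alternately under one control loop and exchange boundary data through a shared buffer. Both solvers are trivial stand-ins, not real codes, and the exchanged quantities are hypothetical.

# Hedged sketch of MD-CFD coupling through a shared buffer (single machine).
class SharedBuffer:
    """In-memory stand-in for the shared communication buffer."""
    def __init__(self):
        self.md_to_cfd = 0.0   # e.g. stress measured in the particle region
        self.cfd_to_md = 0.0   # e.g. velocity imposed on the particle region

class ToyMD:
    def step(self, imposed_velocity):
        # A real MD code would constrain boundary particles to this velocity
        # and return the measured stress; here we just echo a number.
        return 0.1 * imposed_velocity

class ToyCFD:
    def step(self, boundary_stress):
        # A real CFD solver would apply the stress as a boundary condition
        # and return the updated velocity at the coupling interface.
        return 1.0 + boundary_stress

buffer, md, cfd = SharedBuffer(), ToyMD(), ToyCFD()

for coupling_step in range(5):                     # sequential control code
    buffer.md_to_cfd = md.step(buffer.cfd_to_md)   # MD advances, writes to buffer
    buffer.cfd_to_md = cfd.step(buffer.md_to_cfd)  # CFD reads buffer, advances
    print(coupling_step, buffer.cfd_to_md)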
Loughborough University PDA Thin Steering Client for Grid Applications
Wired/wireless connection and ADSL.
Point-and-click interaction.
Handwriting recognition via the PDA OS.
Frees the application scientist from the desktop.
The registry screen provides access to multiple registries of simulations.
The jobs screen allows access to any job in a Grid registry.
The parameter screen allows the simulation to be steered via a wireless network.
Loughborough Mathematical Sciences
• Terascaled Loughborough's MD and ab initio codes.
• Grid-based real-time visualization and computational steering developed for the MD code.
• Visualization suite developed for the MD code.
• Nanoindentation simulations performed, illustrating the link between dislocation emission and "pop-ins".
• Used routinely in research work.
Edinburgh Physics & EPCC
• Development of lattice-Boltzmann codes
  – Colloidal particles in binary fluids
  – Lattice-Boltzmann code with thermal fluctuations
• Terascaling
  – LB3D, Plato, Ludwig (with HPCx)
• Portal
  – Web-based RealityGrid steering (see demo)
  – Checkpoint management (under design)
• Visualisation / animation / ReG integration
LB algorithmic developments
• Colloids in binary fluids
• LB with thermal fluctuations
Terascaling ReG applications
• LB3D
• Plato
• Ludwig
Web-based Steering Portal
Visualisation / integration
• Integration of Ludwig in the ReG environment
• Visualisation and animations
See live demo!
ICENI: An Integrated Grid Middleware to support e-Science
Point & click install through Java Web Start to build the infrastructure and to populate it with resources.
Visual composition to define and configure an abstract simulation for execution.
Best LB3D instance selected from the Grid, considering predicted performance and resource capability.
Later, build an extended workflow around the running application to dynamically steer, view data, and interact with collaborators.
[Diagram: ICENI services (scheduler, performance database) managing LB3D instances at IC, UCL and MC, with output streamed to the Access Grid and to two visualisation clients.]
Summary
• We've tried to use the Grid for serious scientific work.
• While we've had some notable successes, we've been held back in many ways by the state of existing middleware
  – Globus obstructs access to & use of resources (cf. ssh, etc.)
• This has been discouraging to people who could otherwise make very good use of Grid technologies.
• A simpler, more lightweight system would be more useful to us.
J. Chin and P. V. Coveney, "Towards tractable toolkits for the Grid: a plea for lightweight, usable middleware", UK e-Science Technical Report UKeS-2004-01.
Our requirements
• Computer-centric: simulations range from workstation-class
codes to high-end HPC systems
• Data-centric: we must keep track of and mine terabytes of
geographically distributed simulation and experimental data
• Community-centric: building cutting-edge visualization and
collaboration systems
Middleware sysadmin problems
• Extremely tedious installation procedure
• Uses customized versions of open-source libraries such as OpenSSL, instead of detecting and using installed versions
• Uses its own package manager (GPT), which can conflict with existing systems
• Too many prerequisites: relational database, servlet container, ...
Middleware user problems
• The Level 2 Grid just about offers a job submission service, but far from transparently: it requires significant customization on each machine
• Very poorly documented
• Little to no support for scientists with existing Fortran/C/C++ codes who wish to deploy them across an often Java-oriented Grid
• Significant functionality in the OGSI spec which hasn't been used by scientists
• Others have similar comments (e.g. the Grid2003 Production Grid):
  – significant job failure rates due to the (Globus) middleware itself
  – "We spent inordinate amounts of time on configuration issues and fixing the same or similar problems" over and over again
Experience with a lightweight toolkit
• OGSI::Lite by Mark McKeown of Manchester University
• Lightweight OGSI implementation: uses existing software (e.g. for SSL) where possible
• Many of the book-keeping Grid services implemented using it
• Grid-based job submission and steering retrofitted onto the LB2D simulation code within a week
• Standards compliance: we were able to steer simulations from a web browser, with no custom client software needed! (A minimal sketch of the idea follows below.)
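As a hedged illustration of the lightweight, browser-steerable idea (this is not OGSI::Lite or the RealityGrid toolkit), the sketch below exposes a few steerable parameters over plain HTTP using only the Python standard library; the parameter names are hypothetical.

# Hedged sketch: steer a simulation from any web browser via plain HTTP.
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

PARAMS = {"output_interval": 100, "temperature": 1.0}   # shared, steerable state
LOCK = threading.Lock()

class SteeringHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        url = urlparse(self.path)
        with LOCK:
            if url.path == "/set":                       # e.g. /set?temperature=1.2
                for key, values in parse_qs(url.query).items():
                    if key in PARAMS:
                        PARAMS[key] = float(values[0])
            body = json.dumps(PARAMS).encode()           # current parameter values
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # The simulation's main loop would read PARAMS (under LOCK) each timestep.
    HTTPServer(("localhost", 8080), SteeringHandler).serve_forever()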
A possible way forward?
• Better collaboration with application scientists, please
• Documentation: HOWTOs detailing how to take an existing code and Grid-enable it
• A useful toolkit would allow a user to install only the components (job submission, resource discovery and allocation, RPC, …) which are actually required.
• We hope the UK e-Science programme will fund work on lightweight toolkits: it needs to be "bottom up" rather than "top down"