Experience of the SRB in support of collaborative grid computing Martin Dove

University of Cambridge
www.eminerals.org
A voyage of discovery
‣ We aimed to focus on grid computing to support
molecular-scale simulations ...
‣ ... but discovered the important role of data and
information delivery
‣ We thought that the SRB would provide a
means to archive data ...
‣ ... but discovered that it could be much more
useful than that
The SRB has radically changed our view of
how we should carry out the scientific
process
My view of eScience
Computing grids
Data grids
Collaborative grids
Science beyond the lab-book
‣ Management of too
many tasks
‣ Management of the
resultant data deluge
‣ Sharing the information
content with
collaborators
‣ Maintaining accuracy
and verification
Expansion of calcite
Neutron diffraction
experiments
5% increase in c
small decrease in a
BaCO3: lattice parameters
[Figure: unit cell lengths a, b and c (Å, axis 5.0 to 8.0) plotted against temperature (K, axis 0 to 2500), with phase regions labelled R3c, R3m and Pm3m. Data from molecular dynamics simulations on the NGS.]
Challenge for the researcher
‣ Short-term collation of the data
‣ Longer-term management of the data
‣ Sharing the data with collaborators
SRB and grid computing
‣ It was important to build the data grid – in
our case the SRB – into the heart of the
computing grid environment
‣ Then we needed tools that make the
integration of the data and compute grids
seamless, and that are easy to use and non-intrusive
Profile of our users
‣ They want maximum control over
their work processes: they don’t
want to access them through portals
or GUIs
‣ They also don’t want their
applications pre-wrapped as
services: they want to have
complete control over their
applications, e.g. to add capability
‣ They know what they are doing ...
‣ ... and they don’t want to be told how
to do things!
[Diagram: grid architecture. Components shown: parallel (HPC) clusters, compute clusters, desktop pools, campus grids, access to external facilities and grids, data vaults, the application server and the researcher, linked over the Internet. Compute resources are labelled with Globus and a job manager (Condor or a cluster JobMgr).]
‣ Globus is used (a) to provide user authentication
via digital certificates and (b) as job submission middleware
‣ Our data grid is based on the San Diego
Storage Resource Broker
‣ The application server provides databases and
server capabilities for the SRB, metadata tools,
and the job submission tool
Job submission process
‣ We have developed RMCS to run the job
submission process
‣ It integrates with the use of the data grid,
specifically with the SRB
‣ RMCS can be run from the user’s desktop
via a shell-command client tool
[Diagram: job workflow linking the researcher, the data vault, the application server and the grid compute resources]
1. Upload data files and application to data vault
2. Submit job to grid via RMCS
3. Data files and application are transferred to the grid resource
4. Job runs on grid compute resources
5. Metadata is sent to the application server
6. Output files are transferred to the data vault
7. Researcher interacts with the metadata database to extract core output values
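The seven workflow steps can be sketched as a small in-memory simulation. This is only an illustration of the order of operations: the classes `DataVault` and `ApplicationServer`, the function `run_job` and the toy application are all hypothetical names invented here, and nothing in the sketch calls the real SRB, RMCS or Globus interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class DataVault:
    """Hypothetical in-memory stand-in for a data vault (not the SRB API)."""
    files: dict = field(default_factory=dict)

    def put(self, name, content):
        self.files[name] = content

    def get(self, name):
        return self.files[name]


@dataclass
class ApplicationServer:
    """Hypothetical stand-in for the metadata database on the application server."""
    metadata: dict = field(default_factory=dict)

    def record(self, job_id, values):
        self.metadata[job_id] = dict(values)

    def query(self, job_id, key):
        return self.metadata[job_id][key]


def run_job(job_id, vault, server, app_name, input_names):
    """Hypothetical job runner covering steps 3 to 6 of the workflow."""
    # Step 3: data files and application are transferred to the grid resource.
    application = vault.get(app_name)
    inputs = {name: vault.get(name) for name in input_names}
    # Step 4: the job runs on the grid compute resource.
    outputs, core_values = application(inputs)
    # Step 5: metadata is sent to the application server.
    server.record(job_id, core_values)
    # Step 6: output files are transferred to the data vault.
    for name, content in outputs.items():
        vault.put(f"{job_id}/{name}", content)


def toy_application(inputs):
    """Toy 'simulation': sums every number found in the input files."""
    total = sum(float(x) for text in inputs.values() for x in text.split())
    return {"output.txt": f"total = {total}"}, {"total": total}


vault = DataVault()
server = ApplicationServer()

# Step 1: upload data files and application to the data vault.
vault.put("toy_app", toy_application)
vault.put("input.dat", "1.0 2.5 3.5")

# Step 2: submit the job (here a direct function call stands in for RMCS).
run_job("job-001", vault, server, "toy_app", ["input.dat"])

# Step 7: the researcher queries the metadata for a core output value.
print(server.query("job-001", "total"))
```

The point of the design is that the researcher only touches the data vault and the metadata store; the compute resource is never accessed directly.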
Parameter sweeps
We have perl programs that
‣ implement bulk file upload to the
SRB or other data grid
‣ generate set of RMCS input files
‣ submit all the RMCS jobs
Bulk job creation and submission
is a one-command procedure
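A parameter sweep of this kind can be sketched as follows. The original tools are perl programs; this Python version is a stand-in written for illustration only. The job-file template and its keys are hypothetical and do not reproduce the real RMCS input syntax, the executable name `md_code` is invented, and the `submit` callback is a placeholder for whatever command actually submits a job.

```python
import tempfile
from pathlib import Path

# Hypothetical job description template; NOT the real RMCS input format.
TEMPLATE = """\
# illustrative job description (hypothetical keys)
executable  = {executable}
temperature = {temperature}
input_file  = {input_file}
output_dir  = {output_dir}
"""


def make_sweep(root, executable, temperatures):
    """Create one directory, input file and job file per temperature."""
    root = Path(root)
    job_files = []
    for t in temperatures:
        job_dir = root / f"T{t:04d}"
        job_dir.mkdir(parents=True, exist_ok=True)
        input_file = job_dir / "input.dat"
        input_file.write_text(f"temperature {t}\n")
        job_file = job_dir / "job.txt"
        job_file.write_text(
            TEMPLATE.format(
                executable=executable,
                temperature=t,
                input_file=input_file.name,
                output_dir=job_dir.name,
            )
        )
        job_files.append(job_file)
    return job_files


def submit_all(job_files, submit):
    """Submit every job file through the supplied (placeholder) callback."""
    return [submit(path) for path in job_files]


with tempfile.TemporaryDirectory() as tmp:
    # Temperatures in kelvin, chosen here only as an example grid.
    jobs = make_sweep(tmp, "md_code", range(500, 2501, 500))
    # The lambda just reports the job directory name instead of submitting.
    submitted = submit_all(jobs, submit=lambda p: p.parent.name)
    print(submitted)
```

Wrapping `make_sweep` and `submit_all` in a single script is what turns bulk job creation and submission into a one-command procedure.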
Data and information
XML data representation instead
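[Diagram: Researcher A and Researcher B sharing data through the data vault]
‣ Researcher A uploads XML data files to the data vault
for sharing with a collaborator
‣ Researcher B views the information content of the data
files using ccViz
‣ Communication channels shown: SciSpace.net, instant
messaging, Access Grid with JMAST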
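To illustrate why an XML representation makes the information content of a shared file easy to reach, here is a minimal sketch that pulls named values out of an XML document with Python's standard library. The element names, attributes and numbers are invented placeholders, not the project's actual schema or results, and the sketch is not how ccViz works internally.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML layout with made-up placeholder values.
XML_DATA = """\
<simulation>
  <parameter name="temperature" units="K">1000</parameter>
  <property name="length_a" units="angstrom">1.0</property>
  <property name="length_c" units="angstrom">2.0</property>
</simulation>
"""


def extract_values(xml_text):
    """Return {name: (value, units)} for every element carrying a 'name' attribute."""
    root = ET.fromstring(xml_text)
    values = {}
    for element in root.iter():
        name = element.get("name")
        if name is not None:
            values[name] = (float(element.text), element.get("units"))
    return values


values = extract_values(XML_DATA)
for name, (value, units) in values.items():
    print(f"{name}: {value} {units}")
```

Because each value is labelled in the file itself, a collaborator can read it without knowing the layout of the simulation code's raw output.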
SRB: some early positives
‣ When we started, it was the only show in
town to facilitate easy data sharing
‣ It was affordable in terms of both capital and
staff costs
‣ It is easily extended through addition of new
vaults
‣ It proved easy to use
Anecdote: Lucy’s project
Lucy was a third-year project
student, and we let her perform
her project using all our grid
infrastructure with no
compromises
‣ Lucy learned to use the SRB-based data
grid very easily
‣ Using our data tools, she was able to
provide me with remote access to the
information content of her data very easily
Some caveats
‣ We didn’t actually need to federate or
distribute different data sources ...
‣ ... and by distributing our data we
discovered that such an approach introduces an
unnecessary weak link and raises issues of
ownership
‣ We didn’t need the access-control tools, nor
the data replication tools, so
some of the infrastructure was heavier than
we needed
So what is different now?
‣ We now expect to be able to share our data
with collaborators ...
‣ ... and we expect this to be easy (i.e. not via a
multi-stage process)
‣ We now routinely produce complete archives
of all files associated with a study easily and
automatically, rather than have stuff dumped
to our desktops
‣ And we now expect a single place to deposit
data, and for this process to be easy and
automatic
Summary
‣ The SRB was critical to the successes of the
eMinerals project
‣ The SRB was easy to use, and affordable
‣ We have developed some tools on top of the
SRB to make access, display of data, and
access control easier (e.g. WebDAV access,
a web interface)
‣ The SRB has radically changed the way we
think about managing data, but I don’t think
that this was an easy change to make
Credits
Cambridge: Kat Austen, Richard Bruin, Mark
Calleja, Gen-Tao Chiang, Ian Frame, Peter
Murray-Rust, Toby White, Andrew Walker
STFC: Kerstin Kleese van Dam, Phil Couch,
Tom Mortimer-Jones, Rik Tyer
Funded by NERC