Using HTC grid infrastructures: practical experiences from the eminerals project

Mark Calleja (proxy for Martin Dove)
University of Cambridge
www.eminerals.org
Our view of eScience
‣ Computing grids
‣ Data grids
‣ Collaborative grids
Science beyond the lab book
‣ Management of too many tasks
‣ Management of the resultant data deluge
‣ Sharing the information content with collaborators
‣ Maintaining accuracy and verification
Rock-salt structure of BaCO3
Note disordered positions of oxygen atoms
BaCO3: lattice parameters
[Figure: unit cell length (Å), from 5.0 to 8.0, for the a, b and c axes plotted against temperature (K), from 0 to 2500, with regions labelled R3c, R3m and Pm3m. Molecular dynamics simulations on the NGS.]
Usable HTC grid tools
‣ Easy-to-use tools
‣ Easy access to resources and data
‣ Enabling me to achieve much more than before
“Can I run my jobs before breakfast?”
Useful tools for HTC grids
‣ Use standard tools and interfaces, e.g. Globus, Condor
‣ Heterogeneous resources for heterogeneous applications
‣ Metascheduling
‣ Integrated data grid
‣ Give as much control as possible to the user
‣ The key is in the user interface
[Diagram: grid architecture. The researcher and the application server connect over the Internet to parallel (HPC) clusters, compute clusters, desktop pools, campus grids, and external facilities and grids. Resources are labelled with Globus and a Cluster or Condor job manager (JobMgr), and data vaults appear alongside them.]
‣ Globus is used (a) to provide user authentication via digital certificates and (b) as job submission middleware
‣ Our data grid is based on the San Diego Storage Resource Broker (SRB)
‣ The application server provides databases and server capabilities for the SRB, metadata tools, and job submission tool
Job submission process
‣ Central role of the data grid for data staging and data archiving
‣ Desktop job submission
‣ Automatic metadata collection
‣ Wrapped up in our RMCS tool
[Diagram: job workflow linking the researcher, the data vault, the application server and the grid compute resources]
1. Upload data files and application to data vault
2. Submit job to minigrid via RMCS
3. Data files and application are transferred to the grid resource
4. Job runs on grid compute resources
5. Metadata is sent to the application server
6. Output files are transferred to the data vault
7. Researcher interacts with the metadata database to extract core output values
RMCS input file
Executable           = ossia2004
pathToExe            = /home/bob.eminerals/OSSIA2004
preferredMachineList = lv1.nw-grid.ac.uk-serial dl1.nw-grid.ac.uk-serial
jobType              = performance
numOfProcs           = 1
Output               = trans.out
Sdir                 = /home/bob.eminerals/RMCSdemo
Sget                 = *
Sput                 = *
GetEnvMetadata       = true
RDesc                = Test sweep of temperature using ossia
RDatasetID           = 263
AgentXdefault        = trans.xml
AgentX               = Energy,trans.xml:PropertyList[$].Property[title='Energy'].value
AgentX               = OrderParameter,trans.xml:Module[$].Property[title='Order parameter'].value
AgentX               = HeatCapacity,trans.xml:Module[$].Property[title='Heat capacity'].value
AgentX               = Susceptibility,trans.xml:Module[$].Property[title='Susceptibility'].value
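The input file above is a flat list of `key = value` lines in which some keys (here AgentX) repeat. A minimal Python sketch of how such a file could be read follows. It is not the RMCS parser: the function names `parse_rmcs_input` and `split_agentx` are hypothetical, and reading an AgentX value as "metadata name, then file, then path" is an assumption drawn only from the shape of the lines on the slide.

```python
from collections import defaultdict

# A shortened copy of the input file shown on the slide.
SAMPLE = """\
Executable     = ossia2004
numOfProcs     = 1
Sget           = *
AgentXdefault  = trans.xml
AgentX         = Energy,trans.xml:PropertyList[$].Property[title='Energy'].value
AgentX         = HeatCapacity,trans.xml:Module[$].Property[title='Heat capacity'].value
"""


def parse_rmcs_input(text):
    """Return {key: [values, ...]}; a list is used because keys may repeat."""
    entries = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first '=' only: AgentX values contain '=' themselves.
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected 'key = value', got: {line!r}")
        entries[key.strip()].append(value.strip())
    return dict(entries)


def split_agentx(value):
    """Split 'Name,file.xml:path' into (name, file, path). Assumed layout."""
    name, _, locator = value.partition(",")
    filename, _, path = locator.partition(":")
    return name, filename, path


parsed = parse_rmcs_input(SAMPLE)
for value in parsed["AgentX"]:
    print(split_agentx(value))
```

Keeping every value in a list avoids silently dropping all but the last AgentX line, which a plain dictionary would do.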
RMCS architecture
Client layer: shell tools, GUI
Server layer: API, database, job control
Grid resources for computing and data
RMCS shell interface
RMCS shell commands interact with the RMCS server via web services, which removes the need for complicated middleware installation and is ‘firewall friendly’
Examples of commands:
‣ rmcs_submit: submit a job
‣ rmcs_status: how is the job doing?
‣ rmcs_cancel: kill the job
‣ rmcs_remove: remove from status listing
RMCS GUI interface
Parameter sweeps
We have Perl programs that
‣ implement bulk file upload to the SRB or other data grid
‣ generate a set of RMCS input files
‣ submit all the RMCS jobs
Bulk job creation and submission is a one-command procedure
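The project's sweep tools are Perl programs and are not shown on the slides. As an illustration of the "generate a set of RMCS input files" step only, here is a hedged Python sketch. The template keys are copied from the RMCS input file slide; the per-run `Sdir` layout, the `.rmcs` file extension and the function name `write_sweep` are assumptions made for this example, and the upload and submission steps are left out because their command arguments are not given in the source.

```python
import tempfile
from pathlib import Path

# Keys taken from the RMCS input file slide; {sdir} and {description}
# are the only fields varied per run in this sketch.
TEMPLATE = """\
Executable     = ossia2004
pathToExe      = /home/bob.eminerals/OSSIA2004
jobType        = performance
numOfProcs     = 1
Output         = trans.out
Sdir           = {sdir}
Sget           = *
Sput           = *
RDesc          = {description}
"""


def write_sweep(out_dir, base_sdir, temperatures):
    """Write one input file per temperature and return the file paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for t in temperatures:
        label = f"T{t:04d}"  # hypothetical naming scheme for each run
        text = TEMPLATE.format(
            sdir=f"{base_sdir}/{label}",
            description=f"Temperature sweep, T = {t}",
        )
        path = out_dir / f"{label}.rmcs"
        path.write_text(text)
        paths.append(path)
    return paths


with tempfile.TemporaryDirectory() as tmp:
    files = write_sweep(tmp, "/home/bob.eminerals/RMCSdemo", range(100, 501, 100))
    names = [p.name for p in files]
    first = files[0].read_text()

print(names)
```

Each generated file would then be handed to the job submission command, one job per file.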
Data and information
Data representation: XML
Chemical Markup Language
<?xml version="1.0" encoding="UTF-8"?>
<cml convention="FoX_wcml-2.0" fileId="cis1.cml"
version="2.4" xmlns="http://www.xml-cml.org/schema">
<metadataList name="Metadata">
<metadata name="Code name" content="ossia"/>
<metadata name="Code version date" content="January 8, 2007, v2007.3"/>
...
</metadataList>
<module title="Initial System" dictRef="emin:initialModule">
<parameterList>
<parameter dictRef="ossia:temperature" name="Temperature">
<scalar dataType="xsd:double" units="cmlUnits:eV">1.000000000000e-1</scalar>
</parameter>
<parameter dictRef="ossia:NumberOfSteps" name="Number of steps">
<scalar dataType="xsd:integer" units="units:countable">10000000</scalar>
</parameter>
...
</parameterList>
</module>
...
<module title="Finalization" dictRef="emin:finalModule">
<propertyList>
<property dictRef="ossia:Energy" title="Energy">
<scalar dataType="xsd:double" units="cmlUnits:eV">2.052516362912e-1</scalar>
</property>
...
</propertyList>
</module>
</cml>
Capturing audit metadata
Capturing initial parameters
Capturing computed properties
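To show why this markup makes values easy to pull out, here is a small Python sketch that reads the three kinds of content named above (audit metadata, initial parameters, computed properties) from a trimmed copy of the example file. It uses only the standard library; the helper name `scalars` is hypothetical, the elided (`...`) parts of the slide are simply omitted, and this is not one of the eMinerals tools.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the CML example; elided sections are left out.
CML = """<cml convention="FoX_wcml-2.0" fileId="cis1.cml"
     version="2.4" xmlns="http://www.xml-cml.org/schema">
  <metadataList name="Metadata">
    <metadata name="Code name" content="ossia"/>
  </metadataList>
  <module title="Initial System" dictRef="emin:initialModule">
    <parameterList>
      <parameter dictRef="ossia:temperature" name="Temperature">
        <scalar dataType="xsd:double" units="cmlUnits:eV">1.000000000000e-1</scalar>
      </parameter>
      <parameter dictRef="ossia:NumberOfSteps" name="Number of steps">
        <scalar dataType="xsd:integer" units="units:countable">10000000</scalar>
      </parameter>
    </parameterList>
  </module>
  <module title="Finalization" dictRef="emin:finalModule">
    <propertyList>
      <property dictRef="ossia:Energy" title="Energy">
        <scalar dataType="xsd:double" units="cmlUnits:eV">2.052516362912e-1</scalar>
      </property>
    </propertyList>
  </module>
</cml>"""

NS = {"cml": "http://www.xml-cml.org/schema"}


def scalars(root, list_tag, item_tag, label_attr):
    """Map each item's label to (typed value, units) from its <scalar> child."""
    found = {}
    for item in root.iterfind(f".//cml:{list_tag}/cml:{item_tag}", NS):
        scalar = item.find("cml:scalar", NS)
        if scalar is None:
            continue
        text = scalar.text.strip()
        dtype = scalar.get("dataType")
        if dtype == "xsd:double":
            value = float(text)
        elif dtype == "xsd:integer":
            value = int(text)
        else:
            value = text
        found[item.get(label_attr)] = (value, scalar.get("units"))
    return found


root = ET.fromstring(CML)
metadata = {m.get("name"): m.get("content")
            for m in root.iterfind(".//cml:metadata", NS)}
parameters = scalars(root, "parameterList", "parameter", "name")
properties = scalars(root, "propertyList", "property", "title")

print(metadata)
print(parameters)
print(properties)
```

Because every value carries its own name, data type and units, the same few lines work for any output file that follows the same layout.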
XML and Fortran
‣ Most of our simulation codes are written in Fortran, which has little support for XML
‣ Thus we have written a set of XML libraries for Fortran, called FoX, to make writing XML easy
‣ We have XML-ised a number of simulation codes, including SIESTA, CASTEP, DL_POLY and GULP
‣ We have also developed an XML-aware interface to the SRB called TobysSRB
What XML gives us
‣ Simulation code output that is self-describing (no more mere lists of numbers!)
‣ Data files can be transformed to give user-centric and information-centric representations, including plotted data
‣ Easy to extract key information, essential for large combinatorial studies
‣ Enables automatic capture of metadata, and metadata is essential for managing data
XML → metadata
‣ RMCS automatically harvests metadata from our output XML files
‣ We have developed a new set of tools to access the metadata database (“RCommands”)
‣ We use metadata for locating data and datasets created by our colleagues
‣ We also use metadata for extracting core information from data, useful for analysing combinatorial studies
RCommands and metadata
Metadata are associated with a hierarchy of studies, datasets and data objects, both as descriptions and as name/value pairs
Examples of commands:
‣ Rls: list metadata items
‣ Rget: get metadata
‣ Rannotate: add metadata
‣ Rgem: extract metadata from all data objects within a dataset
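The hierarchy described above (studies, datasets, data objects, each with a description and name/value pairs) can be pictured with a small in-memory model. The Python sketch below is purely illustrative: the class names and the `gather` function are hypothetical, the real RCommands talk to a metadata database on the application server, and the values for `run2` are placeholders rather than results.

```python
from dataclasses import dataclass, field


@dataclass
class DataObject:
    name: str
    metadata: dict = field(default_factory=dict)  # name/value pairs


@dataclass
class Dataset:
    name: str
    description: str = ""
    objects: list = field(default_factory=list)


@dataclass
class Study:
    name: str
    description: str = ""
    datasets: list = field(default_factory=list)


def gather(dataset, *names):
    """Collect the named metadata values from every data object in a dataset.

    Loosely mirrors what Rgem is described as doing; missing values are None.
    """
    rows = []
    for obj in dataset.objects:
        rows.append((obj.name, *[obj.metadata.get(n) for n in names]))
    return rows


sweep = Dataset("sweep", "Test sweep of temperature using ossia")
# run1 reuses the values from the CML example; run2 holds placeholders.
sweep.objects.append(DataObject("run1", {"Temperature": 0.1, "Energy": 0.2052516362912}))
sweep.objects.append(DataObject("run2", {"Temperature": 0.2}))
study = Study("ossia tests", datasets=[sweep])

table = gather(sweep, "Temperature", "Energy")
for row in table:
    print(row)
```

Collecting one row per data object gives a table that can be plotted or compared directly, which is the use the slides describe for combinatorial studies.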
Using Rgem to share simulation outputs
[Diagram: Researcher A and Researcher B linked through the data vault, the application server and collaboration tools]
‣ Upload XML data files to data vault for sharing with collaborator
‣ Annotate data with metadata
‣ Locate data from metadata
‣ View information content of data files using ccViz
Collaboration tools: project wiki, SciSpace.net, instant messaging, eMail, Access Grid with JMAST
Summary
‣ The eMinerals toolset empowers scientist users in their use of HTC grid resources
‣ Tools work from our personal computers with easy installation
‣ The toolset integrates compute, data and collaborative components
Credits
Cambridge: Kat Austen, Richard Bruin, Mark Calleja, Gen-Tau Chiang, Ian Frame, Peter Murray-Rust, Toby White, Andrew Walker
STFC: Kerstin Kleese van Dam, Phil Couch, Tom Mortimer-Jones, Rik Tyer
Bath: Corrine Arrouvel, Arnaud Marmier, Steve Parker
Funded by NERC