Using HTC grid infrastructures: practical experiences from the eMinerals project
Mark Calleja (proxy for Martin Dove)
University of Cambridge
www.eminerals.org

Our view of eScience
‣ Computing grids
‣ Data grids
‣ Collaborative grids

Science beyond the lab book
‣ Management of too many tasks
‣ Management of the resultant data deluge
‣ Sharing the information content with collaborators
‣ Maintaining accuracy and verification

Rock-salt structure of BaCO3
Note disordered positions of oxygen atoms

BaCO3: lattice parameters
[Plot: unit cell length (Å), 5.0 to 8.0, against temperature (K), 0 to 2500; curves a, b, c; labels R3c, R3m, Pm3m]
Molecular dynamics simulations on the NGS

Usable HTC grid tools
‣ Easy-to-use tools
‣ Easy access to resources and data
‣ Enabling me to achieve much more than before
“Can I run my jobs before breakfast?”

Useful tools for HTC grids
‣ Use standard tools and interfaces, e.g. Globus, Condor
‣ Heterogeneous resources for heterogeneous applications
‣ Metascheduling
‣ Integrated data grid
‣ Give as much control as possible to the user
‣ The key is in the user interface

[Diagram labels: Researcher; Application server; Internet; Parallel (HPC) clusters; Compute clusters; Campus grids; Desktop pools; Access to external facilities and grids; Data vault; Globus; Condor; Cluster JobMgr]
‣ Globus is used to provide a) user authentication via digital certificates, b) job submission middleware
‣ Our data grid is based on the San Diego Storage Resource Broker (SRB)
‣ The application server provides databases and server capabilities for the SRB, metadata tools, and job submission tool

Job submission process
‣ Central role of the data grid for data staging and data archiving
‣ Desktop job submission
‣ Automatic metadata collection
‣ Wrapped up in our RMCS tool

[Diagram labels: Data vault; Researcher]
7.
Researcher interacts with the metadata database to extract core output values
1. Upload data files and application to data vault
2. Submit job to minigrid via RMCS
3. Data files and application are transferred to the grid resource
4. Job runs on grid compute resources
5. Metadata is sent to the application server
6. Output files are transferred to the data vault
[Diagram label: Application server]

RMCS input file
Executable = ossia2004
pathToExe = /home/bob.eminerals/OSSIA2004
preferredMachineList = lv1.nw-grid.ac.uk-serial dl1.nw-grid.ac.uk-serial
jobType = performance
numOfProcs = 1
Output = trans.out
Sdir = /home/bob.eminerals/RMCSdemo
Sget = *
Sput = *
GetEnvMetadata = true
RDesc = Test sweep of temperature using ossia
RDatasetID = 263
AgentXdefault = trans.xml
AgentX = Energy,trans.xml:PropertyList[$].Property[title='Energy'].value
AgentX = OrderParameter,trans.xml:Module[$].Property[title='Order parameter'].value
AgentX = HeatCapacity,trans.xml:Module[$].Property[title='Heat capacity'].value
AgentX = Susceptibility,trans.xml:Module[$].Property[title='Susceptibility'].value

RMCS architecture
‣ Client layer: shell tools, GUI
‣ Server layer: API, database, job control
‣ Grid resources for computing and data

RMCS shell interface
RMCS shell commands interact with the RMCS server via web services. This removes the need for complicated middleware installation and is ‘firewall friendly’.
Examples of commands:
‣ rmcs_submit: submit a job
‣ rmcs_status: how is the job doing?
‣ rmcs_cancel: kill the job
‣ rmcs_remove: remove from status listing

RMCS GUI interface

Parameter sweeps
We have Perl programs that
‣ implement bulk file upload to the SRB or other data grid
‣ generate a set of RMCS input files
‣ submit all the RMCS jobs
Bulk job creation and submission is a one-command procedure

Data and information

Data representation: XML
Chemical Markup Language
<?xml version="1.0" encoding="UTF-8"?>
<cml convention="FoX_wcml-2.0" fileId="cis1.cml" version="2.4" xmlns="http://www.xml-cml.org/schema">
  <metadataList name="Metadata">
    <metadata name="Code name" content="ossia"/>
    <metadata name="Code version date" content="January 8, 2007, v2007.3"/>
    ...
  </metadataList>
  <module title="Initial System" dictRef="emin:initialModule">
    <parameterList>
      <parameter dictRef="ossia:temperature" name="Temperature">
        <scalar dataType="xsd:double" units="cmlUnits:eV">1.000000000000e-1</scalar>
      </parameter>
      <parameter dictRef="ossia:NumberOfSteps" name="Number of steps">
        <scalar dataType="xsd:integer" units="units:countable">10000000</scalar>
      </parameter>
      ...
    </parameterList>
  </module>
  ...
  <module title="Finalization" dictRef="emin:finalModule">
    <propertyList>
      <property dictRef="ossia:Energy" title="Energy">
        <scalar dataType="xsd:double" units="cmlUnits:eV">2.052516362912e-1</scalar>
      </property>
      ...
    </propertyList>
  </module>
</cml>
‣ Capturing audit metadata
‣ Capturing initial parameters
‣ Capturing computed properties

XML and Fortran
‣ Most of our simulation codes are written in Fortran, which has little support for XML
‣ Thus we have written a set of XML libraries for Fortran, called FoX, to make writing XML easy
‣ We have XML-ised a number of simulation codes, including SIESTA, CASTEP, DL_POLY and GULP
‣ We have also developed an XML-aware interface to the SRB called TobysSRB

What XML gives us
‣ Simulation code output that is self-describing (no more mere lists of numbers!)
‣ Data files can be transformed to give user-centric and information-centric representations, including plotted data
‣ Easy to extract key information, which is essential for large combinatorial studies
‣ Enables automatic capture of metadata, and metadata is essential for managing data

XML → metadata
‣ RMCS automatically harvests metadata from our output XML files
‣ We have developed a new set of tools to access the metadata database (“RCommands”)
‣ We use metadata for locating data and datasets created by our colleagues
‣ We also use metadata for extracting core information from data, which is useful for analysing combinatorial studies

RCommands and metadata
Metadata are associated with a hierarchy of studies, datasets and data objects, both as descriptions and as name/value pairs
Examples of commands:
‣ Rls: list metadata items
‣ Rget: get metadata
‣ Rannotate: add metadata
‣ Rgem: extract metadata from all data objects within a dataset

[Diagram labels: Researcher A; Researcher B; Data vault; Application server]
‣ Upload XML data files to data vault for sharing with collaborator
‣ Annotate data with metadata
‣ Locate data from metadata
‣ View information content of data files using ccViz
‣ Using Rgem to share simulation outputs
‣ Project wiki, SciSpace.net, instant messaging, email, Access Grid with JMAST

Summary
‣ eMinerals toolset empowers
the scientist users in their use of HTC grid resources
‣ Tools work from our personal computers with easy installation
‣ Integrates compute, data and collaborative components

Credits
Cambridge: Kat Austen, Richard Bruin, Mark Calleja, Gen-Tau Chiang, Ian Frame, Peter Murray-Rust, Toby White, Andrew Walker
STFC: Kerstin Kleese van Dam, Phil Couch, Tom Mortimer-Jones, Rik Tyer
Bath: Corrine Arrouvel, Arnaud Marmier, Steve Parker
Funded by NERC
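
Example: pulling core values out of a CML file
The slides on XML say that self-describing output makes it easy to extract key information, and the RMCS input file names properties such as Energy by their title. As an illustration only, the sketch below does a comparable lookup with Python's standard xml.etree.ElementTree module. It is not how AgentX, RMCS or Rgem are implemented. SAMPLE is a cut-down copy of the CML shown in the slides with the elided parts left out, and the names read_scalar and collect_scalars are hypothetical helpers introduced here.

```python
import xml.etree.ElementTree as ET

# Namespace declared on the <cml> root element in the slides.
CML_NS = "http://www.xml-cml.org/schema"

# Cut-down copy of the CML from the slides. The parts elided there ("...")
# are left out, and the XML declaration is omitted for brevity.
SAMPLE = """\
<cml convention="FoX_wcml-2.0" fileId="cis1.cml" version="2.4"
     xmlns="http://www.xml-cml.org/schema">
  <module title="Initial System" dictRef="emin:initialModule">
    <parameterList>
      <parameter dictRef="ossia:temperature" name="Temperature">
        <scalar dataType="xsd:double" units="cmlUnits:eV">1.000000000000e-1</scalar>
      </parameter>
      <parameter dictRef="ossia:NumberOfSteps" name="Number of steps">
        <scalar dataType="xsd:integer" units="units:countable">10000000</scalar>
      </parameter>
    </parameterList>
  </module>
  <module title="Finalization" dictRef="emin:finalModule">
    <propertyList>
      <property dictRef="ossia:Energy" title="Energy">
        <scalar dataType="xsd:double" units="cmlUnits:eV">2.052516362912e-1</scalar>
      </property>
    </propertyList>
  </module>
</cml>
"""


def read_scalar(scalar):
    """Convert a <scalar> element's text according to its dataType attribute."""
    text = (scalar.text or "").strip()
    data_type = scalar.get("dataType")
    if data_type == "xsd:double":
        return float(text)
    if data_type == "xsd:integer":
        return int(text)
    return text  # unknown types are returned as plain strings


def collect_scalars(root, element, label_attr):
    """Map each element's label attribute to (value, units) from its <scalar> child."""
    found = {}
    for node in root.iter(f"{{{CML_NS}}}{element}"):
        scalar = node.find(f"{{{CML_NS}}}scalar")
        label = node.get(label_attr)
        if scalar is not None and label is not None:
            found[label] = (read_scalar(scalar), scalar.get("units"))
    return found


root = ET.fromstring(SAMPLE)
parameters = collect_scalars(root, "parameter", "name")   # initial parameters
properties = collect_scalars(root, "property", "title")   # computed properties
print(parameters)
print(properties)
```

Keying the result on the name or title attribute mirrors the way the AgentX lines in the RMCS input file select a property by title, so the same lookup can be repeated over every output file in a sweep.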