Experience of the SRB in support of collaborative grid computing
Martin Dove
University of Cambridge
www.eminerals.org

A voyage of discovery
‣ We aimed to focus on grid computing to support molecular-scale simulations ...
‣ ... but discovered the important role of data and information delivery
‣ We thought that the SRB would provide a means to archive data ...
‣ ... but discovered that it could be much more useful than that
The SRB has radically changed our view of how we should carry out the scientific process

My view of eScience
[Diagram: computing grids, data grids and collaborative grids]

Science beyond the lab-book
‣ Management of too many tasks
‣ Management of the resultant data deluge
‣ Sharing the information content with collaborators
‣ Maintaining accuracy and verification

Expansion of calcite
Neutron diffraction experiments: 5% increase in c, small decrease in a

BaCO3: lattice parameters
[Figure: unit cell lengths a, b and c (Å) of BaCO3 versus temperature (0 to 2500 K) from molecular dynamics simulations on the NGS, with the R3c, R3m and Pm3m phases marked]

Challenge for the researcher
‣ Short-term collation of the data
‣ Longer-term management of the data
‣ Sharing the data with collaborators

SRB and grid computing
‣ It was important to build the data grid (in our case the SRB) into the heart of the computing grid environment
‣ Then we needed tools that make the integration of the data and compute grids seamless, easy to use and nonintrusive

Profile of our users
‣ They want maximum control over their work processes; they don’t want to access them through portals or GUIs
‣ They also don’t want their applications pre-wrapped as services: they want complete control over their applications, e.g. to add capability
‣ They know what they are doing ...
‣ ... and they don’t want to be told how to do things!

Grid architecture
[Diagram: the researcher and the application server connect over the Internet to compute resources (parallel (HPC) clusters, compute clusters, desktop pools, campus grids, and access to external facilities and grids), each fronted by Globus with a Cluster or Condor job manager, together with several data vaults]
‣ Globus is used (a) to provide user authentication via digital certificates and (b) as job submission middleware
‣ Our data grid is based on the San Diego Storage Resource Broker
‣ The application server provides databases and server capabilities for the SRB, metadata tools, and the job submission tool

Job submission process
‣ We have developed RMCS to run the job submission process
‣ It integrates with the use of the data grid, specifically with the SRB
‣ RMCS can be run from the user’s desktop via a shell-command client tool

[Diagram: job workflow between the researcher, the application server, the data vault and the grid compute resources]
1. Upload data files and application to the data vault
2. Submit the job to the grid via RMCS
3. Data files and application are transferred to the grid resource
4. The job runs on grid compute resources
5. Metadata is sent to the application server
6. Output files are transferred to the data vault
7. The researcher interacts with the metadata database to extract core output values

Parameter sweeps
We have Perl programs that
‣ implement bulk file upload to the SRB or other data grid
‣ generate a set of RMCS input files
‣ submit all the RMCS jobs
Bulk job creation and submission is a one-command procedure

Data and information
XML data representation instead

[Diagram: collaboration between Researcher A and Researcher B]
‣ Researcher A uploads XML data files to the data vault for sharing with a collaborator
‣ The information content of the data files is viewed using ccViz
‣ Communication via SciSpace.net, instant messaging, and Access Grid with JMAST

SRB: some early positives
‣ When we started, it was the only show in town for easy data sharing
‣ It was affordable in terms of both capital and person costs (££££)
‣ It is easily extended through the addition of new vaults
‣ It proved easy to use

Anecdote: Lucy’s project
Lucy was a third-year project student, and we let her carry out her project using all our grid infrastructure, with no compromises
‣ Lucy learned to use the SRB-based data grid very easily
‣ Using our data tools, she was able to give me remote access to the information content of her data very easily

Some caveats
‣ We didn’t actually need to federate or distribute different data sources ...
‣ ... and by distributing our data we discovered that such an approach introduces an unnecessary weak link and raises issues of ownership
‣ We didn’t need the access-control tools or the data replication tools, so some of the infrastructure was heavier than we needed

So what is different now?
‣ We now expect to be able to share our data with collaborators ...
‣ ... and we expect this to be easy (i.e. not via a multi-stage process)
‣ We now routinely produce complete archives of all files associated with a study, easily and automatically, rather than having files dumped on our desktops
‣ And we now expect a single place to deposit data, and for this process to be easy and automatic

Summary
‣ The SRB was critical to the successes of the eMinerals project
‣ The SRB was easy to use, and affordable
‣ We have developed some tools on top of the SRB to make data access, data display and access control easier (e.g. WebDAV access, web interface)
‣ The SRB has radically changed the way we think about managing data, but I don’t think that this was an easy change to acquire

Credits
Cambridge: Kat Austen, Richard Bruin, Mark Calleja, Gen-Tao Chiang, Ian Frame, Peter Murray-Rust, Toby White, Andrew Walker
STFC: Kerstin Kleese van Dam, Phil Couch, Tom Mortimer-Jones, Rik Tyer
Funded by NERC