SRB and iRODS work at the STFC and National Grid Service
Dr Thomas Mortimer-Jones
eScience Centre
Science and Technology Facilities Council

Overview
● Atlas Data Store (ADS)
● Diamond Light Source (DLS)
● Daresbury Synchrotron Radiation Source (SRS)
● Biotechnology and Biological Sciences Research Council (BBSRC)
● Arts and Humanities Data Service (AHDS)
● National Grid Service (NGS)
● iRODS Work at STFC

Atlas Data Store
• Petabyte-scale mass storage system
• In use for over 20 years
• Data storage backend for the UK CERN Tier 1 centre, which will receive data from the Large Hadron Collider (LHC)
• Allows fire-safe and off-site backups to ensure maximum data safety
• SRB driver allows seamless integration of tape resources into an SRB system
• SRB containers group small files together for efficient storage to tape
• Containers are first created on a cache disk (parallel transfer is possible)
• When the user has finished writing to the container, it can be synced to tape

Diamond Light Source
• Largest UK-funded scientific facility to be built for over 40 years
• Requirement to store data produced by visiting scientists
● Imploding star topology
● SRB clients on the beamlines register the data with SRB
● Data is transferred to a central server where it is containerised
● Data is synced to tape; the system turns sporadic data into a regular stream

Daresbury SRS
• World's first dedicated synchrotron
• Operating since 1981
• Due to start decommissioning at the end of this year
• Data catalogue must be preserved so that data isn't lost after the facility has been decommissioned (backed up to ADS)
• Data must remain accessible so that users can retrieve their own data
• User interface based on Toby's SRB from the University of Cambridge
• Modified to the requirements of the facility
• Metadata has been extracted from the headers of the synchrotron output files
• Bespoke metadata
query functionality lets users locate data based on any information they can remember about their visit, e.g. dates, stations, run numbers

Other Facilities
• ISIS neutron and muon source
• Central Laser Facility
• New Light Source?

BBSRC
• Archive research material from funded projects
• Data needs to be stored off-site to meet regulatory, project and funder requirements
• Bespoke user interface has been developed to allow users to access the system
• Like Diamond, uses an imploding star topology because of the constraints of their private network
• Each BBSRC institute site has its own SRB server
• Data is stored from institute computers into the local SRB
• A scheduled transfer physically moves the data onto a central BBSRC SRB server using Sphymove
• Another scheduled transfer replicates the data onto a resource at the Atlas Data Store
• A final scheduled process transfers the data onto tape
• The steps involved in transferring data between the institutes and the tape store (and the reverse process when data is required back) are monitored by “Request Tracker”
• If a fault occurs at any step, that step can easily be repeated so that the data is stored to tape as quickly as possible

Arts and Humanities Data Service
• AHDS required a 'dark archive' from which data is retrieved only in case of disaster
• Uses SRB to store all collections in the tape store
• Required to MD5-checksum data
• Had to contend with poor network connectivity
• Collections contained a wide variety of characters (accents, multiple spaces, etc.)
• Lots of big collections, but small files in each collection
• System consists of client-side scripts to transfer data and query status, and a server-side daemon to checksum and sync data to tape
  − Design minimized exposure to network connectivity problems
  − Design also handles MD5 checksumming (not built into SRB)
• Client-side transfer script essentially wrapped
the Sput and Srsync commands
  − Split large collections up into smaller ones for transfer
  − Run Srsync if the collection already exists
  − Submit a job to the queue by creating a special SRB collection containing the information needed to calculate MD5s and sync the container to tape
● Server-side daemon processes the “queue” by reading the special collections, calculating MD5s and, if successful, syncing to tape
● Client can query the queue to check the status of jobs
● Checksumming and syncing to tape are asynchronous to data transfers from the client
● Production service started in March 2008 and has more than 4 TB of data stored on tape

National Grid Service
• Grid computing and data resource for use by UK academics
• Four core sites at Manchester, Leeds, Oxford and RAL, with a number of partner sites across the country
• Every user has an SRB account created when they register
• SRB allows easy staging of data from scientists' desktops through to compute resources
• Allows experimental data generated at facilities to be post-processed on grid compute resources (Federation)
• Allowed the development of submission frameworks like (R)MCS; see Martin Dove's talk

iRODS Work at STFC
• Like IN2P3, have been involved with iRODS since the beginning
• Involved in a number of projects:
  − PhD student work on handling small files; see Andrea Weise's talk
  − ASPIS (iRODS+Shibboleth+Provenance); see Mark Hedges' talk
  − Looking at iRODS with a view to migrating from SRB to iRODS in the medium term
• Have come up with a list of requirements for iRODS (some highlights):
  − Interoperation with SRB to help with migration
  − GUI and web front-end support
  − Mountable iRODS for legacy application access on both Windows and Linux
  − GSI support
  − Interoperation with SRM
• Have a test iRODS server at STFC that we are currently trying out
• Plan to allow more users once we are more familiar with it
• Please let us know if you are interested in using such a system

Acknowledgements
SRB Services Group
Adil Hasan
Roger Downing
srbservices@rl.ac.uk

Database Services Group
Data Storage Group
Data Management Group
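
Supplementary sketch: MD5 checksumming of a collection
The AHDS slides note that MD5 checksumming was not built into SRB, so the server-side daemon calculated the checksums itself before syncing to tape. The Python below is only a rough illustration of that step, assuming the collection has already been staged to a local directory. The function names (md5_of_file, checksum_collection, verify_collection) are hypothetical and are not taken from the AHDS scripts or from SRB. Files are read in chunks so that large files do not need to fit in memory, and paths are handled as-is so that names with accents or multiple spaces are not altered.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of one file, read in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_collection(root):
    """Map each file under root (relative POSIX-style path) to its MD5 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): md5_of_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_collection(root, expected):
    """Return the sorted paths that are missing, extra, or whose digest changed."""
    actual = checksum_collection(root)
    return sorted(
        name
        for name in set(actual) | set(expected)
        if actual.get(name) != expected.get(name)
    )
```

In a workflow like the one on the slides, a manifest from checksum_collection could be recorded when the data is transferred, and verify_collection run against it before the container is synced to tape; an empty result would mean every file matched.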