SRB and iRODS work at the STFC and National Grid Service

Dr Thomas Mortimer-Jones
eScience Centre
Science and Technology Facilities Council
Overview
● Atlas Data Store (ADS)
● Diamond Light Source (DLS)
● Daresbury Synchrotron Radiation Source (SRS)
● Biotechnology and Biological Sciences Research Council (BBSRC)
● Arts and Humanities Data Service (AHDS)
● National Grid Service (NGS)
● iRODS Work at STFC
Atlas Data Store
• Petabyte-scale mass storage system
• In use for over 20 years
• Data storage backend for the UK CERN Tier 1 centre, which will receive data from the Large Hadron Collider (LHC)
• Provides fire-safe and off-site backups for maximum data safety
Atlas Data Store
• SRB driver allows seamless integration of tape resources into an SRB system
• SRB containers group small files together for efficient storage to tape
• Containers are first created on a cache disk (parallel transfer is possible)
• When the user has finished writing to a container, it can be synced to tape
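The container behaviour described above can be sketched as follows. This is a minimal in-memory model under our own assumptions: the `Container` class, its byte-offset index and the `tape` list are hypothetical stand-ins, not the SRB API. Small files are appended to one blob (standing in for the container on the cache disk), and the whole blob is written to tape in a single operation once writing is finished.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Container:
    """Hypothetical model of an SRB-style container (not the SRB API)."""
    name: str
    # Stands in for the container file on the cache disk.
    data: bytearray = field(default_factory=bytearray)
    # Member file name -> (offset, length) within the container.
    index: Dict[str, Tuple[int, int]] = field(default_factory=dict)
    synced: bool = False

    def add(self, member: str, payload: bytes) -> None:
        """Append one small file to the container."""
        if self.synced:
            raise RuntimeError(f"container {self.name} is already on tape")
        self.index[member] = (len(self.data), len(payload))
        self.data.extend(payload)

    def read(self, member: str) -> bytes:
        """Extract one member using the offset index."""
        offset, length = self.index[member]
        return bytes(self.data[offset:offset + length])

    def sync_to_tape(self, tape: List[Tuple[str, bytes]]) -> None:
        """Write the whole container to tape as one large sequential write."""
        tape.append((self.name, bytes(self.data)))
        self.synced = True
```

Writing one large object per container, rather than one per file, is what makes small files efficient on tape. In this sketch the container is simply treated as closed once it has been synced.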
Diamond Light Source
• Largest UK-funded scientific facility to be built in over 40 years
• Requirement to store data produced by visiting scientists
Diamond Light Source
● Imploding star topology
● SRB clients on the beamlines register the data with SRB
● Data is transferred to a central server, where it is containerised
● Data is synced to tape; the system turns sporadic data into a regular stream
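To illustrate how a central server can turn sporadic arrivals into a regular stream, here is a minimal sketch assuming a simple size-threshold policy. The class, the threshold rule and the beamline names are our own illustration, not a description of the actual Diamond configuration.

```python
from typing import List, Tuple


class CentralAggregator:
    """Hypothetical central server in an imploding star topology.

    Beamline clients report files whenever they arrive; the server packs
    them into a container and flushes that container to tape only once it
    reaches a size threshold, so tape sees regular, similarly sized writes.
    """

    def __init__(self, threshold_bytes: int) -> None:
        self.threshold = threshold_bytes
        self.pending: List[Tuple[str, int]] = []  # (beamline/file, size)
        self.pending_bytes = 0
        self.flushed: List[List[str]] = []  # one entry per container on tape

    def register(self, beamline: str, filename: str, size: int) -> None:
        """Called as data arrives, at irregular times, from any beamline."""
        self.pending.append((f"{beamline}/{filename}", size))
        self.pending_bytes += size
        if self.pending_bytes >= self.threshold:
            self.flush()

    def flush(self) -> None:
        """Close the current container and record it as written to tape."""
        if self.pending:
            self.flushed.append([name for name, _ in self.pending])
        self.pending = []
        self.pending_bytes = 0
```

`flush` is public so that a caller could also trigger it on a timer; whether the real system does anything like this is an assumption, not something stated on the slide.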
Daresbury SRS
• World's first dedicated synchrotron
• Operating since 1981
• Due to start decommissioning at the end of this year
• Data catalogue must be preserved, with a backup to the ADS, so that data isn't lost after the facility has been decommissioned
• Users must still be able to access their data
Daresbury SRS
• User interface based on Toby's SRB from the University of Cambridge
• Modified to meet the requirements of the facility
• Metadata has been extracted from the headers of the synchrotron output files
• Bespoke metadata query functionality allows users to locate data based on any information they can remember about their visit, e.g. dates, stations, run numbers
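A query in which any remembered detail narrows the search can be sketched as a filter where every criterion is optional. The `RunRecord` fields are hypothetical stand-ins for values extracted from the file headers; a production system would run the equivalent query against the SRB metadata catalogue rather than an in-memory list.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass
class RunRecord:
    """Hypothetical fields standing in for header-extracted metadata."""
    path: str
    run_date: date
    station: str
    run_number: int


def find_runs(records: Iterable[RunRecord],
              start: Optional[date] = None,
              end: Optional[date] = None,
              station: Optional[str] = None,
              run_number: Optional[int] = None) -> List[str]:
    """Return paths matching every criterion the user supplied.
    Criteria left as None are ignored, so users can search on whatever
    they happen to remember."""
    hits = []
    for r in records:
        if start is not None and r.run_date < start:
            continue
        if end is not None and r.run_date > end:
            continue
        if station is not None and r.station != station:
            continue
        if run_number is not None and r.run_number != run_number:
            continue
        hits.append(r.path)
    return hits
```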
Other Facilities
• ISIS neutron and muon source
• Central Laser Facility
• New Light Source?
BBSRC
• Archive research material from funded projects
• Data needs to be stored off-site to meet
regulatory, project and funder requirements
• Bespoke user interface has been developed to
allow users to access the system
• Like Diamond using an imploding star topology
due to the constraints of their private network
BBSRC
• Each BBSRC institute site has its own SRB server
• Data is stored from institute computers into the local SRB
• A scheduled transfer physically moves the data onto a central BBSRC SRB server using Sphymove
• Another scheduled transfer replicates the data onto a resource at the Atlas Data Store
• A final scheduled process transfers the data onto tape
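The chain of scheduled transfers above can be modelled as an ordered sequence of stages. In this sketch the stage names and the one-stage-per-pass rule are our assumptions; each call to the hypothetical `run_scheduled_pass` advances every file not yet on tape by exactly one step, mirroring the separate scheduled jobs.

```python
from enum import IntEnum
from typing import Dict


class Stage(IntEnum):
    """Ordered locations a file passes through (names are illustrative)."""
    INSTITUTE_SRB = 0  # stored from institute computers into the local SRB
    CENTRAL_SRB = 1    # physically moved to the central BBSRC SRB server
    ADS_REPLICA = 2    # replicated onto a resource at the Atlas Data Store
    TAPE = 3           # transferred onto tape


def next_stage(current: Stage) -> Stage:
    """Stage the next scheduled transfer should move a file to."""
    if current is Stage.TAPE:
        raise ValueError("file is already on tape")
    return Stage(current + 1)


def run_scheduled_pass(files: Dict[str, Stage]) -> None:
    """Advance every file that is not yet on tape by exactly one stage.
    Only values are reassigned, so iterating over the dict is safe."""
    for name, stage in files.items():
        if stage is not Stage.TAPE:
            files[name] = next_stage(stage)
```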
BBSRC
• The steps involved in transferring data between the institutes and the tape store (and the reverse process when data is required back) are monitored by “Request Tracker”
• If a fault occurs in any of the steps, that step can easily be repeated so that the data is stored to tape as quickly as possible
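Repeating a failed step can be sketched as a runner that records completed steps and stops at the first fault, so that rerunning it resumes from the failed step instead of starting over. The step names and the `log` list (standing in for tickets in Request Tracker) are hypothetical; this does not use any Request Tracker API.

```python
from typing import Callable, Dict, List


def run_steps(steps: Dict[str, Callable[[], None]],
              done: List[str],
              log: List[str]) -> bool:
    """Run the named steps in order, skipping any already in `done`.

    On a fault, record it in `log` and stop, so a later call repeats only
    the failed step and those after it. Returns True once every step has
    completed.
    """
    for name, step in steps.items():
        if name in done:
            continue  # completed on an earlier run; do not repeat it
        try:
            step()
        except Exception as exc:
            log.append(f"{name} failed: {exc}")
            return False
        done.append(name)
        log.append(f"{name} ok")
    return True
```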
Arts and Humanities Data Service
● AHDS required a 'dark archive', where data is retrieved only in case of disaster
● Uses SRB to store all collections in the tape store
● Required to MD5 checksum data
● Had to contend with poor network connectivity
● Collections contained a wide variety of characters (accents, multiple spaces, etc.)
● Lots of big collections, but each made up of small files
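The MD5 requirement can be met without loading whole files into memory by hashing in chunks. A minimal sketch using Python's standard `hashlib`; the function names and the 1 MiB chunk size are our own choices, not part of the AHDS system.

```python
import hashlib
from pathlib import Path
from typing import Dict


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """MD5 hex digest of a file, read in fixed-size chunks so large files
    never need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_manifest(collection: Path) -> Dict[str, str]:
    """Map each file's path (relative to `collection`) to its MD5.
    Names are used verbatim as keys, so unusual characters such as
    repeated spaces are preserved rather than normalised."""
    return {
        str(p.relative_to(collection)): md5_of_file(p)
        for p in sorted(collection.rglob("*"))
        if p.is_file()
    }
```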
Arts and Humanities Data Service
● System consists of client-side scripts that transfer data and query status, and a server-side daemon that checksums data and syncs it to tape
− Design minimized exposure to network connectivity problems
− Also needed to handle MD5 checksumming (not built into SRB)
Arts and Humanities Data Service
● Client-side transfer script essentially wrapped the Sput and Srsync commands
− Split large collections into smaller ones for transfer
− Ran Srsync if the collection already existed
− Submitted a job to the queue by creating a special SRB collection containing the information needed to calculate MD5s and sync the container to tape
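The splitting and tool-selection logic of the client-side script can be sketched as follows. The batch limits are hypothetical parameters, and we assume Sput was used when the collection did not yet exist; the actual command lines are deliberately left out rather than guessed.

```python
from typing import List, Sequence, Tuple


def split_collection(files: Sequence[Tuple[str, int]],
                     max_files: int,
                     max_bytes: int) -> List[List[str]]:
    """Split (name, size) pairs into sub-collections of at most `max_files`
    files and, where possible, at most `max_bytes` bytes. A single file
    larger than `max_bytes` ends up in a batch of its own."""
    batches: List[List[str]] = []
    current: List[str] = []
    current_bytes = 0
    for name, size in files:
        full = len(current) >= max_files or current_bytes + size > max_bytes
        if current and full:
            # Close the current batch before it grows past either limit.
            batches.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += size
    if current:
        batches.append(current)
    return batches


def transfer_tool(collection_exists: bool) -> str:
    """Srsync when the collection already exists, otherwise Sput
    (the Sput fallback is our assumption). Arguments are omitted."""
    return "Srsync" if collection_exists else "Sput"
```

Smaller sub-collections limit how much has to be re-sent when a poor connection drops mid-transfer, which is presumably the motivation for splitting.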
Arts and Humanities Data Service
● Server-side daemon processes the “queue” by reading the special collections, calculating MD5s and then, if successful, syncing to tape
● Client can query the queue to check the status of jobs
● Checksumming and syncing to tape are asynchronous to data transfers from the client
● Production service started in March 2008 and has more than 4 TB of data stored on tape
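A single pass of the server-side daemon might look like the sketch below. The `Job` record stands in for one of the special SRB collections, file contents are held in memory for simplicity, and `sync_to_tape` is a hypothetical callback; none of these names come from SRB itself.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Job:
    """Hypothetical stand-in for one special collection in the queue."""
    container: str
    files: Dict[str, bytes]  # file name -> contents as stored
    status: str = "queued"   # queued -> checksummed -> synced, or failed
    md5s: Dict[str, str] = field(default_factory=dict)


def process_queue(queue: List[Job], sync_to_tape: Callable[[str], None]) -> None:
    """Checksum each queued job, then sync its container to tape only if
    checksumming succeeded. Any exception marks the job as failed."""
    for job in queue:
        if job.status != "queued":
            continue
        try:
            job.md5s = {name: hashlib.md5(data).hexdigest()
                        for name, data in job.files.items()}
            job.status = "checksummed"
            sync_to_tape(job.container)
            job.status = "synced"
        except Exception:
            job.status = "failed"


def query_status(queue: List[Job]) -> Dict[str, str]:
    """What a client sees when it asks how its jobs are doing."""
    return {job.container: job.status for job in queue}
```

Because the client only creates the job and later calls `query_status`, it never waits on checksumming or tape I/O, which is the asynchrony described above.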
National Grid Service
• Grid computing and data resource for use by UK academics
• Four core sites at Manchester, Leeds, Oxford and RAL, with a number of partner sites across the country
• Every user has an SRB account created when they register
National Grid Service
• SRB allows easy staging of data from scientists' desktops through to compute resources
• Allows experimental data generated at facilities to be post-processed on grid compute resources (Federation)
• Enabled the development of submission frameworks such as (R)MCS (see Martin Dove's talk)
iRODS Work at STFC
● Like IN2P3, we have been involved with iRODS since the beginning
● Involved in a number of projects:
− PhD student work on handling small files (see Andrea Weise's talk)
− ASPIS (iRODS + Shibboleth + Provenance) (see Mark Hedges' talk)
− Looking at iRODS with a view to migrating from SRB to iRODS in the medium term
iRODS Work at STFC
● Have come up with a list of requirements for iRODS (some highlights):
− Interoperation with SRB to help with migration
− GUI and web front-end support
− Mountable iRODS for legacy application access on both Windows and Linux
− GSI support
− Interoperation with SRM
iRODS Work at STFC
● We have a test iRODS server at STFC that we are currently trying out
● Plan to open it to more users once we are more familiar with it
● Please let us know if you are interested in using such a system
Acknowledgements
SRB Services Group
Adil Hasan
Roger Downing
srbservices@rl.ac.uk
Acknowledgements
Database Services Group
Data Storage Group
Data Management Group