An Open Standards-based Scalable Heavy Lifting Data Transfer Service for e-Research

David Meredith, Peter Turner, Alex Arana, Gerson Galang, David Wallom, Phil Kershaw, Weijing Fang, Ally Hume, Mario Antonioletti, Steve Crouch

Problem
• Moving data is a growing problem
• Data increasing in size
  – Difficult to move about
  – Storage
  – Network
• Initiating data transfers across different protocols (data onto/off grids) from a range of clients
  – Remote user: desktop, portal
  – Grid + Web
• e.g. copy from beam-line data resource to my home storage lab
• Can’t do the transfer through clients: not scalable
• Need something lightweight for users

Users/Use Cases
• For users from e.g.:
  – Diamond Synchrotron, STFC
  – Australian Synchrotron Facility
• Use Cases:
  – Hermes (e.g. Oxford Anatomy Institute of Biology: they do not want to deploy a whole other machine for this, have 100s of GB of data, and want a desktop client to do the transfer)
  – NGS Portal
  – Any Commons VFS-style client
  – SAGA client?

High-level Requirements
• Properties:
  – Scalable
  – Durable/Reliable
  – Asynchronous
  – Support protocols: ftp/sftp/http/https/gsiftp/SRB/iRODS/SRM
• Core requirement: third-party transfer needs to be cross-protocol (e.g. SRB -> gsiftp)
• Construct XML that specifies requirements, send to 3rd party service for asynchronous transfer

Realisations
• Need to discuss at a high level, separated into particular layers
  – Top-level service, scheduling/movement
  – Interfaces to individual data protocols (i.e. through VFS)
• Could go to data service providers and ask them to support 3rd party transfer
  – But process could take too long
  – The tech is already out there
• Would this go into UMD (Unified Middleware Distribution)? They want all projects using EU-funded e-Infrastructure

Current Cross Protocol File Transfer
• Data is buffered through the client; this does not scale and is synchronous!
• File operations (list, upload, download, delete, rename)
• Client provides a single interface to different remote file systems (SRB, GsiFTP, FTP, SFTP).
• Bit pipe (byte IO stream)
• VFS/SAGA client, e.g. Portal/Hermes
• Authentication tokens (un/pw, x509?)
• Auth tokens only in memory on one server. Self-contained.
• SRB / FTP / SFTP / GSIFTP
• Piping bytes via the client is a bottleneck and a single point of failure, and causes concurrency issues.

Required / Suggested Architecture
• VFS/SAGA client: asynchronous, no concurrency issues, no data buffered via client!
• File operations (list, upload, download, delete, rename)
• JMS queue behind WS-I interface
• VFS workers
• Bit pipe (byte IO stream)
• Authentication tokens (un/pw, x509?)
• Move file transfers to a different server (farm) to increase bandwidth and concurrency.
• Passing auth tokens around in messages (strong security required)
• Development / testing.
• SRB / FTP / SFTP / GSIFTP

Work to date
• Data transfer currently done via e.g. Hermes client
• Commons VFS provides ftp/sftp/HTTP/HTTPS/webdav/gsiftp
• Will always need clients behind an interface (e.g. Portal, Hermes, VFS client), but the transfer itself should go via a scalable third-party service
  – Asynchronous, poll for progress
  – Architecture: underlying VFS code exists, to be deployed in a service-oriented, scalable manner
• Standards-driven?
  – OGSA-DMI
  – JSDL
• GridSAM compute-focused

DataMINX DTS: ‘Heavy Lifting’ Data Transfer Service
• This is just one possible implementation of this, GridSAM another?
• Under discussion last 4 days
• JMS-based scalability for moving data asynchronously and in parallel
  – DTS web service submits to JMS queue
  – DTS worker nodes (VFS clients) pick up JMS transfer messages
  – Can direct specific machines to perform a transfer via the JMS queue
• Within J2EE environment
• Abstractions with target URIs
  – Through shared connection pool per machine
  – One connection to target URI

Other Possible Solution Paths
• GridSAM does some but not all
• gLite File Transfer Service: does this on a large scale
• Stork
  – Supports ftp/http/gsiftp/nest/srb/srm/csrm/unitree
  – But not a web service: suitable?
• Alan W: Vbrowser. Hermes-esque?
• DW: Cloud-based (e.g. Amazon solution?)
• AH: Parallelisation in OGSA-DAI is for compute; here it is parallelisation for data
  – GridSAM’s data transfer is not parallelised
  – Could have a job that just moves data, but cannot guarantee network availability on worker nodes, and not architecturally ok
• If one web service supports a single protocol, just extend it

Issues
• It’s a big problem with a big suggested solution: lots of developer work
• Need to think about failure use cases
  – Worker node fails
  – JMS gives you isolation from service failure through tested, transaction-based durability
  – Need to discuss and uncover other failure cases
• Specs: do they cover all the use cases?
  – JSDL/HPC File Staging Profile, OGSA-DMI?
  – Interfaces limited?

Next Steps (Within CW)
• Recommend further session (Mario, Steve C, Ally, David M, Peter T, Alex A, Gerson G, David W, Weijian F):
  – Have others critique the design work over last 4 days
  – Possible subdivision for detailed issues
  – High-level requirements discussion
  – Implementation/specification
• Go over issues with schema specs, possible ways forward
• Possible architectures that can assist the problem now
  – Stork!

Next Steps (Out of CW)
• Spec issues:
  – Schedule discussion within OGSA-DMI WG (Mario to organise)
  – HPC File Staging Profile/JSDL WGs (David M/Steve C to organise)
  – DW: attend the OGF PGI sessions; they will be observing & championing necessary changes to JSDL/HPC Profile (Steve C)
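
The submit / queue / worker / poll pattern described in the DTS slides can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the names `TransferService`, `TransferJob` and `fake_copy` are hypothetical, an in-process `queue.Queue` stands in for the JMS queue (so it has none of JMS's durability), threads stand in for the worker farm, and no real protocol I/O is performed.

```python
import queue
import threading
import uuid
from dataclasses import dataclass, field
from urllib.parse import urlparse

# Protocols named in the high-level requirements slide.
SUPPORTED = {"ftp", "sftp", "http", "https", "gsiftp", "srb", "irods", "srm"}


@dataclass
class TransferJob:
    """Hypothetical transfer request: copy source_uri to target_uri."""
    source_uri: str
    target_uri: str
    job_id: str = field(default_factory=lambda: uuid.uuid4().hex)


def fake_copy(source_uri, target_uri):
    """Placeholder for a worker's copy step; only checks the URI schemes."""
    for uri in (source_uri, target_uri):
        scheme = urlparse(uri).scheme
        if scheme not in SUPPORTED:
            raise ValueError(f"unsupported protocol: {scheme!r}")


class TransferService:
    """Illustrative stand-in for the service front end plus its workers."""

    def __init__(self, copy_fn, workers=2):
        self._copy_fn = copy_fn
        self._jobs = queue.Queue()  # stand-in for the JMS queue (not durable)
        self._status = {}
        self._lock = threading.Lock()
        for _ in range(workers):
            threading.Thread(target=self._work, daemon=True).start()

    def submit(self, source_uri, target_uri):
        """Enqueue a transfer and return immediately with a job id."""
        job = TransferJob(source_uri, target_uri)
        self._set(job.job_id, "QUEUED")
        self._jobs.put(job)
        return job.job_id

    def poll(self, job_id):
        """Return the current state of a job; the client polls for progress."""
        with self._lock:
            return self._status[job_id]

    def wait_all(self):
        """Block until every queued job has been processed."""
        self._jobs.join()

    def _set(self, job_id, state):
        with self._lock:
            self._status[job_id] = state

    def _work(self):
        # Each worker pulls jobs off the queue; no bytes pass through the client.
        while True:
            job = self._jobs.get()
            self._set(job.job_id, "RUNNING")
            try:
                self._copy_fn(job.source_uri, job.target_uri)
                self._set(job.job_id, "DONE")
            except Exception as exc:
                self._set(job.job_id, f"FAILED: {exc}")
            finally:
                self._jobs.task_done()


if __name__ == "__main__":
    service = TransferService(fake_copy, workers=2)
    job_id = service.submit("srb://beamline.example.org/run42.dat",
                            "gsiftp://home.example.org/run42.dat")
    service.wait_all()
    print(job_id, service.poll(job_id))
```

The client only holds a job id and polls for state, which is the property the slides ask for. A real deployment would additionally need a persistent, transactional queue and secure handling of the authentication tokens carried in each message, neither of which this sketch attempts.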