An Open Standards-based Scalable Heavy Lifting Data Transfer Service for e-Research

David Meredith, Peter Turner, Alex
Arana, Gerson Galang, David Wallom,
Phil Kershaw, Weijing Fang, Ally Hume,
Mario Antonioletti, Steve Crouch
Problem
• Moving data is a growing problem
• Data increasing in size – difficult to move about
– Storage
– Network
• Initiating data transfers across different protocols (data
onto/off grids) from a range of clients
– Remote user - desktop, portal
– Grid + Web
• e.g. copy from a beam-line data resource to my home lab
storage
• Can’t do transfer through clients – not scalable
• Need something lightweight for users
Users/Use Cases
• For users from e.g.:
– Diamond Synchrotron, STFC
– Australian Synchrotron Facility
• Use Cases:
– Hermes (e.g. Oxford Anatomy Institute of Biology: 100s of GB
of data, not wanting to deploy a whole other machine to move it;
they want a desktop client to do this)
– NGS Portal
– Any Commons VFS-style Client
– SAGA client?
High-level Requirements
• Properties:
– Scalable
– Durable/Reliable
– Asynchronous
– Support protocols: ftp/sftp/http/https/gsiftp/SRB/iRODS/SRM
• Core requirement: third-party transfer needs to
be cross-protocol (e.g. SRB -> gsiftp)
• Construct XML that specifies requirements, send
to 3rd party service for asynchronous transfer
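The "construct XML, send to a 3rd party service" requirement could look roughly like the sketch below. The element and attribute names (`TransferRequest`, `Source`, `Target`, `Overwrite`) and the example hosts are hypothetical; they are not taken from OGSA-DMI, JSDL or any DTS schema.

```python
import xml.etree.ElementTree as ET


def build_transfer_request(source_uri, target_uri, overwrite=False):
    """Build a transfer-request document as an XML string.

    All element names here are illustrative assumptions, not a real schema.
    """
    root = ET.Element("TransferRequest")
    ET.SubElement(root, "Source").set("uri", source_uri)
    ET.SubElement(root, "Target").set("uri", target_uri)
    ET.SubElement(root, "Overwrite").text = "true" if overwrite else "false"
    return ET.tostring(root, encoding="unicode")


# Cross-protocol example from the slide: SRB -> gsiftp (hosts are made up).
request_xml = build_transfer_request(
    "srb://srb.example.org/home/user/scan.dat",
    "gsiftp://gridftp.example.org/data/scan.dat",
)
```

The client would only hand this small document to the service; the data itself would not pass through the client.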
Realisations
• Need to discuss at a high-level – separate into
particular layers
– Top-level service, scheduling/movement
– Interfaces to individual data protocols (i.e. through VFS)
• Could go to data service providers and ask them
to support 3rd party
– But process could take too long
– The tech is already out there
• Would this go into UMD (Unified Middleware
Distribution)? They want all projects using EU-funded e-Infrastructure
Current Cross Protocol File Transfer – data is buffered through the
client; this does not scale and is synchronous!
[Diagram: VFS/SAGA client (e.g. Portal/Hermes) connected to SRB, FTP, SFTP and GSIFTP servers]
• Client provides a single interface to different remote file systems (SRB, GsiFTP, FTP, SFTP)
• File operations (list, upload, download, delete, rename)
• Bit pipe (byte IO stream)
• Authentication tokens (un/pw, x509?): auth tokens only in memory on one server; self-contained
• Piping bytes via the client is a bottleneck, a single point of failure, and raises concurrency issues
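A minimal sketch of why the client-buffered approach does not scale: every byte is read into the client and written out again, and the call blocks until the copy finishes. In-memory streams stand in for the remote file systems, and `copy_via_client` is a hypothetical name, not a Commons VFS or Hermes function.

```python
import io


def copy_via_client(source, target, chunk_size=64 * 1024):
    """Relay bytes from source to target through this process.

    Returns the number of bytes copied. The caller is blocked for the
    whole transfer, and all data crosses the client's own network link.
    """
    copied = 0
    while True:
        chunk = source.read(chunk_size)  # bytes arrive at the client first
        if not chunk:
            break
        target.write(chunk)              # then leave again towards the target
        copied += len(chunk)
    return copied


# In-memory stand-ins for two remote files.
source = io.BytesIO(b"x" * 200_000)
target = io.BytesIO()
copied = copy_via_client(source, target)
```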
Required / Suggested
Architecture
[Diagram: VFS/SAGA client, JMS queue behind WS-I interface, VFS workers, and SRB/FTP/SFTP/GSIFTP servers]
• Asynchronous, no concurrency issues, no data buffered via client!
• File operations (list, upload, download, delete, rename)
• Bit pipe (byte IO stream)
• Authentication tokens (un/pw, x509?)
• Move file transfers to a different server (farm) to increase bandwidth and concurrency
• Passing auth tokens around in messages (strong security required)
• Development / testing
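The suggested architecture can be sketched in-process, assuming a thread-safe queue as a stand-in for the JMS queue and threads as stand-ins for the VFS worker farm. `TransferService`, its method names and the status strings are hypothetical; a real deployment would sit behind a web service interface and use a real message broker.

```python
import queue
import threading
import uuid


class TransferService:
    """In-process stand-in for a queue-backed transfer service."""

    def __init__(self, copy_fn, workers=2):
        self._jobs = queue.Queue()    # stands in for the JMS queue
        self._status = {}
        self._lock = threading.Lock()
        self._copy_fn = copy_fn       # performs the actual transfer
        for _ in range(workers):
            threading.Thread(target=self._work, daemon=True).start()

    def submit(self, source_uri, target_uri):
        """Queue a transfer and return immediately with a job id."""
        job_id = str(uuid.uuid4())
        self._set(job_id, "QUEUED")
        self._jobs.put((job_id, source_uri, target_uri))
        return job_id

    def status(self, job_id):
        """Poll for progress."""
        with self._lock:
            return self._status[job_id]

    def wait_all(self):
        """Block until every queued job has been processed."""
        self._jobs.join()

    def _set(self, job_id, state):
        with self._lock:
            self._status[job_id] = state

    def _work(self):
        while True:
            job_id, src, dst = self._jobs.get()
            self._set(job_id, "RUNNING")
            try:
                self._copy_fn(src, dst)
                self._set(job_id, "DONE")
            except Exception:
                self._set(job_id, "FAILED")
            finally:
                self._jobs.task_done()


# The copy function here only records the request; no data is moved.
moved = []
service = TransferService(lambda src, dst: moved.append((src, dst)))
job = service.submit("srb://a.example.org/f", "gsiftp://b.example.org/f")
service.wait_all()
```

The client only submits and polls; the bytes would move between the workers and the storage servers.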
Work to date
• Data transfer currently done via e.g. Hermes Client
• Commons VFS provides
ftp/sftp/HTTP/HTTPS/webdav/gsiftp
• Will always need clients via interface e.g. Portal, Hermes,
VFS client but have transfer via scalable third party service
– Asynchronous, poll for progress
– Architecture: underlying VFS code exists, to be deployed in a service-oriented, scalable manner
• Standards-driven?
– OGSA-DMI
– JSDL
• GridSAM compute-focused
DataMINX DTS – ‘Heavy Lifting’ Data
Transfer Service
• This is just one possible implementation of this, GridSAM
another?
• Under discussion last 4 days
• JMS-based scalability for asynchronously/in parallel moving
data
– DTS web service submits to JMS queue
– DTS worker nodes (VFS clients) pick up JMS transfer msgs
– Can specify in the JMS queue which machines perform a transfer
• Within J2EE environment
• Abstractions with target URIs
– Through shared connection pool per machine
– One connection to target URI
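One reading of "shared connection pool per machine, one connection to target URI" is a per-process cache keyed by the target endpoint. The sketch below assumes that reading. `ConnectionPool` and the `connect` factory are hypothetical, and treating scheme, host and port as the key is an assumption, not something the slides specify.

```python
import threading
from urllib.parse import urlsplit


class ConnectionPool:
    """Keeps at most one connection per target endpoint on this machine."""

    def __init__(self, connect):
        self._connect = connect      # hypothetical factory: endpoint -> connection
        self._connections = {}
        self._lock = threading.Lock()

    @staticmethod
    def endpoint(uri):
        """Reduce a URI to the (scheme, host, port) it would connect to."""
        parts = urlsplit(uri)
        return (parts.scheme, parts.hostname, parts.port)

    def get(self, uri):
        """Return the shared connection for this URI's endpoint, opening it once."""
        key = self.endpoint(uri)
        with self._lock:
            if key not in self._connections:
                self._connections[key] = self._connect(key)
            return self._connections[key]


opened = []


def fake_connect(key):
    # Stand-in for opening a real protocol connection.
    opened.append(key)
    return object()


pool = ConnectionPool(fake_connect)
a = pool.get("gsiftp://gridftp.example.org:2811/data/a.dat")
b = pool.get("gsiftp://gridftp.example.org:2811/data/b.dat")
c = pool.get("sftp://store.example.org/home/c.dat")
```

Here `a` and `b` share one connection because they name the same endpoint, while `c` opens a second one.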
Other Possible Solution Paths
• GridSAM does some but not all
• gLite File Transfer Service – does this on a large scale
• Stork
– Supports ftp/http/gsiftp/nest/srb/srm/csrm/unitree
– But not web service – suitable?
• Alan W – Vbrowser – Hermes-esque?
• DW: Cloud-based (e.g. Amazon solution?)
• AH: Parallelisation in OGSA-DAI for compute, here is parallelisation
for data
– GridSAM’s data transfer is not parallelised
– Could have job that just moves data – but cannot guarantee network
availability on worker nodes, and not architecturally ok
• If one web service supports a single protocol, just extend it
Issues
• It’s a big problem with a big suggested solution –
lots of developer work
• Need to think about failure use cases
– Worker node fails – JMS gives you isolation from
service failure through tested, transaction-based
durability
– Need to discuss and uncover other failure cases
• Specs – do they cover all the use cases?
– JSDL/HPC File Staging Profile, OGSA-DMI?
– Interfaces limited?
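The worker-failure case above can be illustrated with a toy queue in which a message is removed only when its handler completes, which is roughly the guarantee a transacted message session is expected to give. `DurableQueue`, `max_attempts` and the dead-letter list are hypothetical additions for illustration; they are not part of the proposed design, and nothing here is persisted to disk.

```python
from collections import deque


class DurableQueue:
    """Toy queue: a message is removed only once its handler succeeds."""

    def __init__(self, max_attempts=3):
        self._pending = deque()
        self.dead_letters = []            # messages that kept failing
        self._max_attempts = max_attempts

    def send(self, message):
        self._pending.append((message, 0))

    def deliver_next(self, handler):
        """Deliver one message; on handler failure, roll back and requeue it.

        Returns False when there was nothing to deliver.
        """
        if not self._pending:
            return False
        message, attempts = self._pending.popleft()
        try:
            handler(message)              # "commit" only if this returns
        except Exception:
            attempts += 1
            if attempts >= self._max_attempts:
                self.dead_letters.append(message)
            else:
                self._pending.append((message, attempts))
        return True


calls = []


def flaky_worker(message):
    # Fails on the first delivery to mimic a worker node dying mid-transfer.
    calls.append(message)
    if len(calls) == 1:
        raise RuntimeError("worker node died mid-transfer")


q = DurableQueue()
q.send("transfer-1")
while q.deliver_next(flaky_worker):
    pass
```

The transfer request survives the first failure and is handled on redelivery. Other failure cases, such as a partially written target file, would still need separate treatment.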
Next Steps (Within CW)
• Recommend further session (Mario, Steve C, Ally,
David M, Peter T, Alex A, Gerson G, David W,
Weijian F):
– Have others critique the design work over last 4 days
– Possible subdivision for detailed issues
– High-level requirements discussion
– Implementation/specification
• Go over issues with schema specs, possible ways forward
• Possible architectures that can assist the problem now –
Stork!
Next Steps (Out of CW)
• Spec issues:
– Schedule discussion within OGSA-DMI WG (Mario
to organise)
– HPC File Staging Profile/JSDL WGs (David M/Steve
C to organise)
– DW: attend the OGF PGI sessions – they will be
observing & championing necessary changes to
JSDL/HPC Profile (Steve C)