Rosa Filgueira - University of Edinburgh
Iraklis Klampanos - University of Edinburgh
Yusuke Tanimura - AIST, Tsukuba
Malcolm Atkinson - University of Edinburgh

Introduction
◦ Problem description
◦ Hypothesis
◦ Rock physics laboratory experiments
◦ Objective
◦ Proposal

Related developments
◦ Data transfer protocols
◦ Data transport systems

FAST
◦ Selecting the best data transfer protocol
◦ Data transfer experiments
◦ Implementation and evaluation

Future work and questions

Large number of rock physics (RP) laboratories
◦ Run many experiments (experimentalists)

Large number of rock physicists
◦ Develop computational codes (code builders)

Sharing experimental data among this community is still in its early days
◦ No facilities to transfer experimental data automatically in real time together with its associated description (metadata)

Several tools provide reliable, high-performance data transfer capabilities
◦ Dropbox or Globus Online

They are not optimized for the RP requirements

The RP community will benefit from a tool that
◦ Transfers data and metadata in near-real time
◦ Provides a repository and database accessible from a website

For experimentalists
◦ Collection and comparison of experiments from
many labs

For code builders
◦ Find test data for running their models




Laboratory rock property measurements
◦ Properties of the rock sample are studied under different conditions
◦ High-pressure vessels apply pore pressures and stresses to a cylindrical rock sample
◦ Until the sample fails, different features (e.g. stress, porosity, temperature) are recorded at several time intervals
◦ In each interval, data are transferred to a local computer (one channel per rock sample)
[Figure: pressure vessel and rock samples, UCL RP laboratory]
Initial target: 30 months
Deployed under the sea (Mediterranean)
8 rock samples with different features
Different time intervals and data sizes

Each experiment can record data differently
◦ Events can be written to a new file or appended to an existing one
◦ Files can be stored in the same directory or in different ones
◦ Intervals for writing data can be short or long
◦ The number of rock samples can be one or several
◦ The duration of an experiment can be short or long

Transferring the data is a data-intensive problem

To transfer RP experimental data from one location to another
◦ Automated data transfer until the end of the experiment
  ▪ Transfer of experimental data: near-real time and non-real time
  ▪ Synchronization: incremental (file) and directory
◦ Possible interruptions and failures
◦ Record and transfer the metadata
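
The incremental (file) synchronization named in the objective can be sketched as follows. This is only an illustration under stated assumptions, not the FAST implementation: the class name `IncrementalSync` and its `poll` method are hypothetical, a local directory stands in for the remote repository, and files are assumed to grow by appending only (truncation is not handled).

```python
import os


class IncrementalSync:
    """Copy only the bytes appended to each source file since the last poll."""

    def __init__(self, src_dir, dst_dir):
        self.src_dir = src_dir
        self.dst_dir = dst_dir
        self.offsets = {}  # file name -> number of bytes already transferred

    def poll(self):
        """Transfer new data once and return the number of bytes sent."""
        sent = 0
        for name in sorted(os.listdir(self.src_dir)):
            src = os.path.join(self.src_dir, name)
            if not os.path.isfile(src):
                continue
            start = self.offsets.get(name, 0)  # a new file starts at offset 0
            size = os.path.getsize(src)
            if size <= start:
                continue  # nothing new was appended
            with open(src, "rb") as fin:
                fin.seek(start)
                chunk = fin.read(size - start)
            # Stand-in for the network transfer: append to the local copy.
            with open(os.path.join(self.dst_dir, name), "ab") as fout:
                fout.write(chunk)
            self.offsets[name] = start + len(chunk)
            sent += len(chunk)
        return sent
```

Calling `poll()` at the experiment's recording interval covers both cases described earlier: events appended to an existing file and events written to a new file.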

FAST: Flexible Automated Synchronization Transfer
◦ Data and metadata in real time and non-real time
◦ Incremental (file) and directory synchronization
◦ Selection of the data-transfer protocol
◦ Compatible with all operating systems
◦ Simple to set up and manage
◦ Monitors the transmission, detects errors and recovers from them
◦ Data collected in a repository, metadata in a database, and a website for accessing them
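
As an illustration of keeping metadata in a database alongside the transferred data, here is a minimal sketch. The field names (`experiment`, `channel`, `sha256`, and so on), the table name `transfers`, and the helper functions are all assumptions made for this example, and SQLite from the Python standard library is used only as a stand-in for the project's database.

```python
import hashlib
import os
import sqlite3


def file_metadata(path, experiment, channel):
    """Collect a minimal (hypothetical) metadata record for one data file."""
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return {
        "experiment": experiment,
        "channel": channel,
        "file_name": os.path.basename(path),
        "size_bytes": os.path.getsize(path),
        "sha256": digest,  # lets the receiver verify the transferred file
        "modified": os.path.getmtime(path),
    }


def open_metadata_db(db_path=":memory:"):
    """Open the metadata database and create the table if it is missing."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transfers ("
        "experiment TEXT, channel INTEGER, file_name TEXT, "
        "size_bytes INTEGER, sha256 TEXT, modified REAL)"
    )
    return conn


def record_transfer(conn, meta):
    """Store one metadata record using named placeholders."""
    conn.execute(
        "INSERT INTO transfers VALUES (:experiment, :channel, :file_name, "
        ":size_bytes, :sha256, :modified)",
        meta,
    )
    conn.commit()
```

A website could then answer queries such as "all files of one experiment" with a plain `SELECT` on this table.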

The proposal was triggered by our work
◦ EFFORT project
◦ Using data provided by the Creep-2 project

File Transfer Protocol (FTP)
◦ Control and data are unencrypted
◦ Easy to use, lack of security

FTP security extension (FTPS)
◦ Control encrypted (TLS or SSL), but data might not be

Secure Copy (SCP)
◦ SSH for transferring data and authentication (more secure than the previous ones)
◦ File transfer only
◦ Ideal for quick transfer of single files

SSH File Transfer Protocol (SFTP)
◦ Based on SSH-2: best for secure access (packet confirmation)
◦ File transfer, creating and deleting remote directories and files
◦ Directory synchronization

Rsync
◦ Incremental file transfer (delta algorithm)
◦ File and directory synchronization
◦ Can provide encrypted transfer by using SSH
◦ On-the-fly compression option
◦ Ideal for backups
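
The idea behind Rsync's incremental (delta) transfer can be illustrated with a simplified sketch: the receiver sends checksums of the blocks it already holds, and the sender transmits only references to matching blocks plus the literal bytes that changed. This is not Rsync's actual implementation. Real Rsync updates its weak checksum incrementally as the window slides, whereas this sketch recomputes it at every position for clarity, and the function names and the choice of Adler-32 plus SHA-256 are assumptions of the example.

```python
import hashlib
import zlib


def block_signatures(old, block_size):
    """Receiver side: weak and strong checksum for every block of the old file."""
    sigs = {}
    for index, start in enumerate(range(0, len(old), block_size)):
        block = old[start:start + block_size]
        strong = hashlib.sha256(block).digest()
        sigs.setdefault(zlib.adler32(block), []).append((strong, index))
    return sigs


def compute_delta(new, sigs, block_size):
    """Sender side: build ("copy", block_index) and ("data", bytes) operations."""
    ops, literal, pos = [], bytearray(), 0
    while pos < len(new):
        window = new[pos:pos + block_size]
        match = None
        # The cheap weak checksum filters candidates; the strong one confirms.
        for strong, index in sigs.get(zlib.adler32(window), []):
            if hashlib.sha256(window).digest() == strong:
                match = index
                break
        if match is None:
            literal.append(new[pos])  # no match here: send this byte literally
            pos += 1
        else:
            if literal:
                ops.append(("data", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", match))
            pos += len(window)
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def apply_delta(old, ops, block_size):
    """Receiver side: rebuild the new file from the old file and the delta."""
    out = bytearray()
    for kind, value in ops:
        if kind == "copy":
            out += old[value * block_size:(value + 1) * block_size]
        else:
            out += value
    return bytes(out)
```

For a file that only grows by appended events, nearly every operation is a `copy`, so only the new bytes travel over the network.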

UDP-based Data Transfer (UDT)
◦ UDP-based protocol for data-intensive applications
◦ UDT can transfer data at higher speed than TCP-based protocols

UDT Enabled Rsync (UDR)
◦ Uses Rsync for the transport mechanism (delta)
◦ Sends data over the UDT protocol
◦ Ideal for large data over long distance

GridFTP
◦ High-performance, secure, reliable data transfer over high-bandwidth networks
◦ Many-to-many
◦ Difficult to use

Globus Online
◦ Uses the GridFTP protocol
◦ Automates the management of file transfers: monitoring performance, retrying failed transfers, recovering from failures
◦ Does not support file synchronization

Dropbox
◦ Centralized cloud storage, file and directory synchronization
◦ Rsync-delta protocol
◦ Data stored on Amazon S3 (third party)
◦ One-to-one file transfer

BTSync
◦ Decentralized cloud storage, P2P file synchronization (no third party)
◦ Devices communicate over UDP
◦ Many-to-many file transfers

WinSCP
◦ SFTP and FTP client for Windows

Email from Globus Online Support
We recently noticed that you are creating many CLI sessions to
cli.globusonline.org, each with a single blocking transfer. This is a
suboptimal way to use Globus Online and in fact is causing us some
resource usage issues.

Previous tools
◦ Different data-transfer protocols
◦ Some automated data synchronization

None of them
◦ Selects the best protocol depending on requirements
◦ Provides methods for tracking metadata and transferring it

Our work automatically
◦ Selects a protocol among FTPS, SFTP, Rsync and UDR
◦ Injects a minimum of metadata

◦ GridFTP and P2P discarded: our communications are one-to-one
◦ FTPS instead of FTP: minimum security level
◦ SFTP derives from SCP
Candidates: FTPS, SFTP, Rsync and UDR

Two machines located in Edinburgh
◦ VLAN Network 100MB/s




Synthetic program to generate events
Data size written to files: 50KB, 500KB, 1MB,
10MB, 100MB, 500MB, 1GB and 10GB.
Measures: transfer rate and elapsed time
Repetition: 10 times
SFTP fastest < 500MB
Rsync fastest >= 500MB
** without compression
Elapsed time
File size   Rsync   UDR    SFTP   FTPS   Rsync-c   UDR-c
50KB        0       0      0      0      0.1       0.1
500KB       0.2     0.3    0.1    0.2    0.3       0.2
1MB         0.7     0.5    0.3    0.7    0.8       0.8
50MB        4       4      3      4      7         1.05
500MB       39      42     40     43     78        1.05
1GB         78      79     79     82     147       180
10GB        814     845    850    1012   1495      1712
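
The second measure, transfer rate, follows from the elapsed times by dividing file size by time. The short sketch below does this for the 10GB measurements. It assumes the elapsed times are in seconds and that 1 GB = 1000 MB; neither is stated in the slides, and the function name is hypothetical.

```python
def throughput_mb_per_s(size_mb, elapsed_s):
    """Average transfer rate in MB/s; the time unit (seconds) is an assumption."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return size_mb / elapsed_s


# Elapsed times reported for the 10GB file in the local-network experiment.
ten_gb_times = {
    "Rsync": 814, "UDR": 845, "SFTP": 850,
    "FTPS": 1012, "Rsync-c": 1495, "UDR-c": 1712,
}

rates = {name: throughput_mb_per_s(10_000, t) for name, t in ten_gb_times.items()}
fastest = max(rates, key=rates.get)

for name, rate in sorted(rates.items(), key=lambda item: -item[1]):
    print(f"{name:8s} {rate:6.2f} MB/s")
```

The guard against non-positive times matters here because the smallest files are reported with an elapsed time of 0, for which no rate can be computed.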

UDR has been specially designed for
◦ Large data transfers over long distances

UDR vs Rsync using two machines
◦ Located in different local networks
  ▪ University of Edinburgh: 1GbE
  ▪ AIST-Tsukuba: 10GbE

Generated files: 1MB, 500MB, 1GB, 10GB and 30GB.
UDR fastest
** without compression
Elapsed time
File size   Rsync   UDR    Rsync-c   UDR-c
1MB         0       0      0         0
500MB       365     20     154       56
1GB         730     37     79        120
10GB        6722    364    3000      1140
30GB        1630    1080   7560      3360



Front-end: GUI using Java SWING
Back-end: Decision tree
Data and Metadata
◦ Data stored in a remote repository (NAS)
◦ Metadata collected in remote database (MySQL)
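
The slides do not show the decision tree itself, so the following is only a sketch of what such a back end could look like if it encoded the headline results of the transfer experiments: SFTP fastest below 500MB and Rsync fastest from 500MB upwards on the local network, UDR fastest over long distance. The function name, its parameters and the use of 1 MB = 10^6 bytes are assumptions.

```python
SIZE_THRESHOLD_BYTES = 500 * 10**6  # 500MB boundary from the local experiment


def choose_protocol(size_bytes, long_distance):
    """Pick a transfer protocol from the reported experimental results.

    size_bytes    -- amount of data to transfer
    long_distance -- True when the machines are in different, distant networks
    """
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if long_distance:
        return "UDR"    # fastest in the Edinburgh to Tsukuba experiment
    if size_bytes < SIZE_THRESHOLD_BYTES:
        return "SFTP"   # fastest below 500MB on the local network
    return "Rsync"      # fastest from 500MB upwards on the local network


print(choose_protocol(50 * 10**3, long_distance=False))
print(choose_protocol(10**9, long_distance=False))
print(choose_protocol(10**9, long_distance=True))
```

Keeping the thresholds in data rather than in code would make it easier to regenerate the tree from new transfer measurements.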

Science gateway (web tool) connected with the repository and database
◦ Searching
◦ Visualizing
◦ Analyzing
◦ Downloading

FAST has been evaluated:
◦ By using synthetic programs for generating data
  ▪ Real time and non-real time
  ▪ For each type of synchronization
  ▪ Different data sizes and different types of network locations
  ▪ Short- and long-term experiments
  ▪ Stop and restart
◦ By transferring data from a real rock physics experiment
  ▪ Laboratories: UCL (London) and Edinburgh
  ▪ Duration: 45 days
  ▪ Interval: every minute
  ▪ Rock samples: 1


Use FAST in the Creep-2 experiment
Implement FAST policies
◦ Data available in the repository for specific users during a reasonable period

Sharing data from many-to-many locations
Decision tree
◦ Automating its generation and maintenance
◦ Keeping it up to date by measuring transfers


Use FAST in more rock physics laboratories
Use FAST in other disciplines

email: rosa.filgueira@ed.ac.uk