Efficient Access to Many Small Files in a Grid Filesystem

Douglas Thain and Christopher Moretti
University of Notre Dame
Abstract
Many grid data tools focus on transferring,
storing, and managing large (GB-TB) files.
But, many users need to manage, transfer, and
process lots (1000s) of small (KB-MB) files.
We describe protocols and interfaces for
manipulating many small files over wide area
networks. (Doesn’t hurt large files, either.)
Implemented in the Chirp file system.
Performance:
– Best case: order of magnitude improvement.
– Worst case: no slower than before.
The Small File Problem
Who has lots of small files?
Anyone using a batch system.
– One file for submit, input, output, error, log...
Anyone using a large software package.
– Executables, libraries, config files...
Anyone using a filesystem like a database.
– Genomics, astronomy, physics...
Anyone who likes to write shell scripts.
– foreach host in list ssh $host > $host.output
Why is this a problem?
Users do the “sensible” thing:
– foreach file in (list) do transfer done
The “sensible” thing performs miserably:
– New TCP Connection
– SSL Authentication
– Configuration Operations
– Slow Start Again
Result is KB/s on a GB/s link.
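A rough back-of-envelope model suggests why the per-file setup cost dominates. The file size, round-trip count, RTT, and link speed below are illustrative assumptions, not measurements from the talk, and the model ignores TCP slow start (which would only make matters worse).

```python
def effective_throughput(file_bytes, setup_round_trips, rtt_s, link_bytes_per_s):
    """Bytes per second achieved when every file pays a fixed setup cost."""
    setup_time = setup_round_trips * rtt_s        # connection, auth, configuration
    wire_time = file_bytes / link_bytes_per_s     # time actually spent moving data
    return file_bytes / (setup_time + wire_time)


# Assumed values: 10 KB files, 10 setup round trips, 50 ms RTT,
# and a 1 Gb/s link (125,000,000 bytes/s).
rate = effective_throughput(10_000, 10, 0.050, 125_000_000)
print(f"{rate / 1000:.1f} KB/s")
```

Under these assumptions nearly all of the time per file is spent waiting on round trips, so the link speed barely matters.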
Why not just use tar?
If you can, you should!
Sometimes you cannot:
– The system semantics demand multiple files.
– Packing and unpacking can be very slow.
– Not enough disk space to unpack.
– Different apps select different data subsets.
– Using an existing script or program.
Users don’t know or care that it’s a distributed
system; why should they change?
The Challenge:
How to design interfaces
so that users get the expected
performance and behavior?
Chirp and Parrot:
A Grid Filesystem
Requirements for a Grid Filesystem
Transparent access to files in the same
manner as a local Unix filesystem.
Non-privileged deployment at both client
and server. (Root access is not possible on the grid.)
User control over policies for naming,
caching, consistency, and fault tolerance.
Flexible access controls for sharing.
Good performance on both small and
large files.
Chirp/Parrot – A Grid Filesystem
[Diagram: an ordinary Unix program makes Unix system calls, which Parrot traps with ptrace and forwards over a single TCP stream (with automatic recovery) to a Chirp server backed by an ordinary Unix filesystem. No privileges are needed at either end.]
Authentication:
– Kerberos / Globus / Hostname / Unix
Protocol:
– open / pread / pwrite / close
– stat / mkdir / rmdir / unlink
– getfile / putfile / movefile
Authorization:
– kerberos:joe@nd.edu RWLDA
– globus:/O=ND/CN=Joe RWLDA
– hostname:*.nd.edu RL
– group:server.nd.edu/team RWL
Ordinary Unix Commands
> parrot tcsh
> ls /chirp
alpha.nd.edu
beta.nd.edu
...
> cd /chirp/alpha.nd.edu/mydir
> cp /tmp/bigdata .
> emacs mydata.txt
Parrot Specific Commands
> parrot tcsh
> parrot_whoami
globus:/O=ND/CN=Joe
> parrot_getacl /chirp/alpha.nd.edu/
kerberos:joe@nd.edu RWLDA
globus:/O=ND/CN=Joe RWL
hostname:*.nd.edu RL
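As an illustration of how an access list like the one printed by parrot_getacl could be evaluated, here is a minimal sketch. It assumes that subject patterns are shell-style globs, that each rights letter is independent, and that a subject receives the union of all matching entries; none of these semantics are stated in the slides, and the function names are hypothetical.

```python
from fnmatch import fnmatchcase

# ACL entries copied from the parrot_getacl output: (subject pattern, rights letters).
ACL = [
    ("kerberos:joe@nd.edu", "RWLDA"),
    ("globus:/O=ND/CN=Joe", "RWL"),
    ("hostname:*.nd.edu", "RL"),
]


def rights_for(subject, acl=ACL):
    """Union of the rights letters of every entry whose pattern matches subject."""
    granted = set()
    for pattern, rights in acl:
        if fnmatchcase(subject, pattern):  # assumed: glob-style matching
            granted |= set(rights)
    return granted


def allowed(subject, needed, acl=ACL):
    """True if subject holds every rights letter in needed."""
    return set(needed) <= rights_for(subject, acl)


print(allowed("hostname:laptop.nd.edu", "RL"))  # matches the wildcard entry
print(allowed("hostname:laptop.nd.edu", "W"))   # the wildcard entry grants no W
```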
Chirp as Remote Filesystem
[Diagram: applications at Grid Site A and Grid Site B, each running under Parrot, access one Chirp server and its Unix filesystem. The connections are secured by GSI. Also labeled: Grid Middleware, Cert.]
Chirp as Cluster Filesystem
[Diagram: applications at Grid Site A and Grid Site B, each running under Parrot, access four Chirp servers, each backed by its own Unix filesystem. Also labeled: dir server, aux db.]
http://www.cse.nd.edu/~ccl/viz
Sample Applications
Image Processing for Biometrics
– Moretti et al, PCGRID 2007
Bioinformatics on EGEE
– Blanchet et al, Grid 2006
High Energy Physics on LCG
– Sfiligoi et al, CHEP 2005
Molecular Dynamics Repository
– Wozniak et al, HPDC 2005
Remote DB Access on EDG
– Klous et al, CCPE 2005
Protocols for Small Files
What About FTP?
FTP is a great data transfer system, but it
was never designed to be a file system:
– New TCP stream per data transfer.
– New TCP stream for each directory list.
– Lots of connections can overwhelm net devices.
– Coarse errors: 550 for all file system errors.
– Semantic problems: e.g. empty directory.
– Unix access controls. (But see SecPAL.)
– Wildly varying implementations and support.
FTP Protocol Reminder
[Diagram: message exchange between FTP client and FTP server]
Control Connection:
– AUTH GSSAPI
– MIC
– MIC
– PORT
– RETR
Data Connection:
– AUTH GSSAPI
– MIC
– MIC
– Data Transfer
Minimum of four round trips (plus auth overhead) to fetch a file, plus loss of the TCP window.
Common practice is a new control connection for every data transfer!
What About NFS?
NFS was designed for a local area
network among (relatively) trusted hosts.
– Fine-grained file access very slow on WAN.
– Kernel support and root assistance needed to
start server, mount client, change target.
– Unix UID for ownership, access control.
– Need to bind to privileged port, often filtered.
– Use of “file handles” to refer to files makes it
very difficult to build a user-level server.
– Lots of lookup operations over the WAN.
NFS Protocol Reminder
[Diagram: message exchange between NFS client and NFS server]
– lookup(00,a)
– lookup(10,b)
– lookup(20,c)
– ...
– read 4KB
– read 4KB
– read 4KB
– ...
On a WAN, throughput is limited to 4KB/latency:
– 10ms = 400 KB/s
– 100ms = 40 KB/s
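The throughput figures follow directly from moving one block per round trip. The short check below reproduces them, taking the slide's 4KB as 4000 bytes so that the result matches its round numbers (an assumption about the intended units).

```python
BLOCK_BYTES = 4_000  # the slide's "4KB", assumed here to mean 4000 bytes


def rpc_throughput_kb_per_s(latency_ms, block_bytes=BLOCK_BYTES):
    """One block per round trip: throughput = block size / latency.

    Bytes per millisecond is numerically equal to KB/s (with 1 KB = 1000 bytes).
    """
    return block_bytes / latency_ms


for latency_ms in (10, 100):
    print(f"{latency_ms} ms -> {rpc_throughput_kb_per_s(latency_ms):.0f} KB/s")
# prints:
# 10 ms -> 400 KB/s
# 100 ms -> 40 KB/s
```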
Chirp Hybrid Protocol Overview
[Diagram: message exchange between Chirp client and Chirp server]
– auth globus (8 RTT)
– open
– read
– write
– close
– ...
– getfile(“mydata”), answered with size and data
– putfile(“otherdata”,size), followed by data
Protocol Comparison
FTP – Stream per File
– Latency = 4+ RTT for each file
– Throughput = TCP limit after slow start
NFS – Remote Procedure Call
– Latency = 1 RTT for each file
– Throughput = block size / latency
Chirp – Hybrid
– Latency = 1 RTT for each file
– Throughput = TCP limit in steady state
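The comparison above can be turned into a simple cost model. This is a sketch only: the workload, RTT, bandwidth, and 4096-byte block size are assumed values, slow start is ignored (which flatters the stream-per-file case), and the function is not part of any of the three protocols.

```python
def total_time(n_files, file_bytes, rtt_s, bandwidth, per_file_rtts, block_bytes=None):
    """Seconds to fetch n_files under a simple latency-plus-transfer model.

    per_file_rtts: round trips of latency paid once per file.
    block_bytes:   if set, data moves one block per round trip (RPC style);
                   otherwise data streams at `bandwidth` bytes per second.
    """
    if block_bytes is None:
        data_time = file_bytes / bandwidth
    else:
        blocks = -(-file_bytes // block_bytes)  # ceiling division
        data_time = blocks * rtt_s
    return n_files * (per_file_rtts * rtt_s + data_time)


# Assumed workload: 1000 files of 100 KB, 50 ms RTT, 100 MB/s link.
workload = dict(n_files=1000, file_bytes=100_000, rtt_s=0.050, bandwidth=100e6)
models = {
    "FTP (stream per file)": total_time(per_file_rtts=4, **workload),
    "NFS (RPC, 4096-byte blocks)": total_time(per_file_rtts=1, block_bytes=4096, **workload),
    "Chirp (hybrid)": total_time(per_file_rtts=1, **workload),
}
for name, seconds in models.items():
    print(f"{name}: {seconds:.0f} s")
```

The model keeps the two costs separate on purpose: FTP loses on per-file latency, NFS loses on per-block latency, and the hybrid pays one round trip per file and then streams.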
[Figure: Local Area Performance]
[Figure: Wide Area Performance]
[Figure: Real WAN Performance]
Interfaces for Small Files
Standard Unix Copy
cp /tmp/source /chirp/B/target
cp:
– open(source)
– open(target)
– loop: read/write
[Diagram: Parrot sends open and read to the local disk, and open and write to the Chirp server.]
Problem:
The system does not know the
context of the operation!
Solution:
Introduce a higher-level operation
copyfile that exploits the context.
Improved Copy with Copyfile
cp /tmp/source /chirp/B/target
new cp:
– copyfile(source,target)
[Diagram: Parrot sends open(source) to the local disk and putfile(target) to the Chirp server.]
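To make the difference concrete, the sketch below counts request/response exchanges for the two styles of copy against a simulated server. CountingServer is a hypothetical stand-in, not the Chirp server, and the 4096-byte block size is an assumption.

```python
class CountingServer:
    """Stand-in for a remote file server that counts request/response exchanges."""

    def __init__(self):
        self.files = {}
        self.round_trips = 0

    def open(self, path):
        self.round_trips += 1
        self.files[path] = bytearray()

    def write(self, path, chunk):
        self.round_trips += 1
        self.files[path] += chunk

    def close(self, path):
        self.round_trips += 1

    def putfile(self, path, data):
        # One request carries the name, the size, and all of the data.
        self.round_trips += 1
        self.files[path] = bytearray(data)


def copy_with_loop(server, data, target, block=4096):
    """Ordinary cp: open the target, write block by block, close."""
    server.open(target)
    for offset in range(0, len(data), block):
        server.write(target, data[offset:offset + block])
    server.close(target)


def copy_with_putfile(server, data, target):
    """Copyfile-style cp: hand the whole file to the server in one operation."""
    server.putfile(target, data)


data = bytes(100_000)
loop_server, put_server = CountingServer(), CountingServer()
copy_with_loop(loop_server, data, "target")
copy_with_putfile(put_server, data, "target")
print(loop_server.round_trips, put_server.round_trips)  # prints: 27 1
```

Both servers end up with identical contents; only the number of exchanges differs.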
Is it reasonable to modify cp?
Installation:
– Cannot modify /bin/cp.
– Install a new parrot_cp.
– Alias cp or link named “cp” in PATH.
Backwards compatibility:
– parrot_cp without Parrot falls back to normal.
– Ordinary cp on Parrot behaves as before.
– parrot_cp on a different filesystem falls back.
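The fallback behavior can be sketched as a wrapper that tries the high-level operation first and otherwise copies the ordinary way. The fast_copyfile hook is hypothetical: it stands in for whatever the filesystem layer offers and is not the actual parrot_cp mechanism.

```python
import shutil


def copy_with_fallback(source, target, fast_copyfile=None):
    """Copy source to target, preferring a single high-level copyfile call.

    fast_copyfile is a hypothetical hook; it returns False or raises OSError
    when it cannot handle this pair of paths.
    Returns the name of the path that was taken.
    """
    if fast_copyfile is not None:
        try:
            if fast_copyfile(source, target):
                return "copyfile"
        except OSError:
            pass  # unsupported here: fall through to the ordinary copy
    shutil.copyfile(source, target)  # ordinary local copy
    return "fallback"
```

Because the fast path is optional and failure is silent, the same program behaves normally when the hook is absent or when the paths live on a filesystem that does not support it.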
Improved Copy with Copyfile
cp /chirp/A/source /chirp/B/target
new cp:
– copyfile(source,target)
[Diagram: Parrot sends thirdput(source,B,target) to Chirp server A, which sends putfile(target) to Chirp server B.]
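Directory Copy
cp -r /chirp/A/mydir /chirp/B/mydir
[Diagram: cp runs under Parrot. mydir on Chirp server A holds an ACL and files X, Y, Z; the copy creates the same on Chirp server B.]
Parrot to Chirp server B:
– mkdir(mydir)
– setacl(mydir)
Parrot to Chirp server A:
– thirdput(/mydir/X,B,/mydir/X)
– thirdput(/mydir/Y,B,/mydir/Y)
– thirdput(/mydir/Z,B,/mydir/Z)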
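Improved Directory Copy
cp -r /chirp/A/mydir /chirp/B/mydir
[Diagram: cp runs under Parrot. mydir on Chirp server A holds an ACL and files X, Y, Z; the copy creates the same on Chirp server B.]
Parrot to Chirp server A:
– thirdput(/mydir,B,/mydir)
Chirp server A to Chirp server B:
– mkdir
– putfile*3
– setacl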
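A small sketch of how the client's workload changes between the two directory copies. The nested-dictionary tree and both function names are illustrative assumptions; the counts cover only requests issued by the client, not the traffic between the two servers.

```python
def client_requests_per_file(tree):
    """Requests the client issues when it walks the tree itself:
    mkdir and setacl for each directory, one thirdput for each file."""
    count = 2  # mkdir and setacl for this directory
    for child in tree.values():
        if isinstance(child, dict):
            count += client_requests_per_file(child)  # subdirectory
        else:
            count += 1  # thirdput for one file
    return count


def client_requests_recursive(tree):
    """With a recursive thirdput the client issues a single request;
    server A performs the mkdir, putfile, and setacl steps against server B."""
    return 1


mydir = {"X": b"", "Y": b"", "Z": b""}
print(client_requests_per_file(mydir), client_requests_recursive(mydir))  # prints: 5 1
```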
[Figure: Third Party Performance]
You get the idea...
ls -la D
– Original: getdir D + N*stat
– Improved: getlongdir D
rm -rf D
– Original: getdir D + N*unlink (recursive)
– Improved: rmall D
md5sum F
– Original: open F + N*read + close
– Improved: md5 F
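The operation counts listed above can be tabulated for a given directory size and file size. This sketch assumes each protocol operation costs one round trip and that reads use 4096-byte blocks; it also ignores recursion into subdirectories for rm -rf.

```python
import math


def round_trips(n_entries=0, file_bytes=0, block_bytes=4096):
    """Client round trips for each command, original vs. improved."""
    n_reads = math.ceil(file_bytes / block_bytes)
    return {
        "ls -la": {"original": 1 + n_entries, "improved": 1},   # getdir + N*stat vs. getlongdir
        "rm -rf": {"original": 1 + n_entries, "improved": 1},   # getdir + N*unlink vs. rmall
        "md5sum": {"original": 1 + n_reads + 1, "improved": 1}, # open + N*read + close vs. md5
    }


# Assumed example: a directory of 1000 entries and a 1 MB file.
for command, counts in round_trips(n_entries=1000, file_bytes=1_000_000).items():
    print(command, counts)
```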
Final Example
ls -la /chirp/alpha/data
md5sum /chirp/alpha/data/*
cp -r /chirp/alpha/data /chirp/beta/data
md5sum /chirp/beta/data/*
rm -rf /chirp/alpha/data
Original Implementation
[Diagram: the app runs under Parrot, which sends ls -la, md5, cp, and rm to Chirp server A and cp and md5 to Chirp server B.]
Improved Implementation
[Diagram: the app runs under Parrot, which sends ls -la, md5, cp, and rm to Chirp server A and md5 to Chirp server B.]
Performance on Script
[Figure: bar chart of time (seconds), axis from 0 to 180, for each step of the script (list, checksum, move, checksum, delete), comparing Original and Improved.]
The Challenge:
How to design interfaces
so that users get the expected
performance and behavior?
Summary
Good small file performance requires
attention to low level network protocols.
– getfile, putfile, thirdput, rmall, checksum
Exploiting protocols requires minor
changes to the Unix I/O interface.
– copyfile, rmall, checksum, others?
Easy to apply those changes in a way
transparent to the user.
– cp, rm, md5sum all operate as normal
Usable performance in a wide-area FS.
For more information...
Douglas Thain
– dthain@nd.edu
Chris Moretti
– cmoretti@nd.edu
Parrot and Chirp
– http://www.cctools.org