Distributed Storage And WAN Transport

Peter Kunszt
SyBIT Tech Day
Nov. 23 2011, Bern
Distributed Storage Systems
Distributed FS
Make it look like local FS
User sees one space
Remote user sees same local space
Policies on sharing, access should be
available
Caching FS
Data lives somewhere else
But looks local due to smart WAN cache
2011.11.23
Gluster (bought by Red Hat)
www.gluster.org GlusterFS. Many commercial users.
The software is open source; they sell an
appliance and support (just like Red Hat)
Single global namespace
Block storage clustering, no central metadata
Works over 1GbE, 10GbE, Infiniband
Replication
‘NFS–like’ native
No kernel dependencies, simple installation
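One way a single namespace can work without central metadata is to derive a file's location from a hash of its path, so every client computes the same answer on its own. The sketch below illustrates that general idea only; the brick names and the `locate` helper are hypothetical, and this is not GlusterFS's actual placement code.

```python
import hashlib

# Hypothetical storage bricks; the names are assumptions for illustration.
BRICKS = ["server1:/export/brick", "server2:/export/brick", "server3:/export/brick"]

def locate(path, bricks=BRICKS, replicas=2):
    """Return the bricks that would hold `path`, derived only from its hash.

    No lookup service is consulted: any client with the same brick list
    computes the same result.
    """
    if replicas > len(bricks):
        raise ValueError("more replicas requested than bricks available")
    digest = hashlib.sha1(path.encode("utf-8")).digest()
    start = int.from_bytes(digest[:4], "big") % len(bricks)
    # Place the replicas on consecutive bricks, wrapping around the list.
    return [bricks[(start + i) % len(bricks)] for i in range(replicas)]
```

Calling `locate("/projects/run42/data.h5")` on two different clients gives the same pair of bricks, which is why no metadata server has to be asked.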
XtreemFS
Part of the XtreemOS project (EU FP7). The latest
version is used in production only by the German
MosGrid.
Object-based design. Global FS namespace.
Metadata and Replica Service stores info. Data on
Object Storage Servers. Linked through Replica
Management Service.
Written in Java – using native Memblocking. Key-value
database used: BabuDB
Uses Linux FUSE kernel module, MIT Vivaldi
algorithm for replica automation and selection
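The Vivaldi algorithm gives every node a synthetic coordinate so that the distance between two coordinates predicts their round-trip time; a client can then pick the replica whose coordinate is closest. The following is a simplified sketch with a constant step size `delta`, not the XtreemFS implementation; the function names and the replica dictionary are assumptions.

```python
import math

def vivaldi_update(own, peer, rtt_ms, delta=0.25):
    """Move `own` so its distance to `peer` better matches the measured RTT."""
    diff = [a - b for a, b in zip(own, peer)]
    dist = math.sqrt(sum(d * d for d in diff))
    if dist == 0.0:
        # Coordinates coincide: pick an arbitrary direction (first axis).
        unit = [1.0] + [0.0] * (len(own) - 1)
    else:
        unit = [d / dist for d in diff]
    error = rtt_ms - dist  # positive: we are predicted too close to the peer
    return [a + delta * error * u for a, u in zip(own, unit)]

def nearest_replica(client, replicas):
    """Return the name of the replica whose coordinate is closest to `client`."""
    return min(replicas, key=lambda name: math.dist(client, replicas[name]))
```

A client at `[0, 0]` that measures 20 ms to a peer at `[10, 0]` is predicted to be too close, so the update pushes it away from the peer along the line joining them.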
DDN WOS
www.ddn.com/industry/life-sciences
Storage appliance, sold with several
interfaces including S3 and REST. GPFS
based. Highly resilient to failure.
Policy-based replication
Data protection mechanism – several
copies stored
Break data into fragments, store those x times
Can be combined with replication
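The fragment idea above can be sketched as follows: split an object into fixed-size fragments and store each fragment on several nodes, so the object can still be read while at least one copy of every fragment survives. This is only an illustration under assumed names (`place_fragments`, `reassemble`, the node labels); it does not describe how WOS actually places or protects data.

```python
def place_fragments(data, fragment_size, copies, nodes):
    """Split `data` into fragments and assign each one to `copies` distinct nodes."""
    if copies > len(nodes):
        raise ValueError("not enough nodes for the requested number of copies")
    placement = {}
    for index, offset in enumerate(range(0, len(data), fragment_size)):
        fragment = data[offset:offset + fragment_size]
        # Round-robin placement: consecutive nodes, wrapping around the list.
        targets = [nodes[(index + k) % len(nodes)] for k in range(copies)]
        placement[index] = (fragment, targets)
    return placement

def reassemble(placement, failed_nodes):
    """Rebuild the object, failing only if every copy of some fragment is lost."""
    parts = []
    for index in sorted(placement):
        fragment, targets = placement[index]
        if all(node in failed_nodes for node in targets):
            raise IOError("fragment %d has no surviving copy" % index)
        parts.append(fragment)
    return b"".join(parts)
```

With two copies per fragment on three nodes, any single node failure is survivable; losing two nodes at once loses the fragments stored only on that pair.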
IBM Panache aka Active Cloud Engine
www.almaden.ibm.com/storagesystems/projects/panache/
Clustered Filesystem CACHE for parallel I/O
Can cache from multiple nodes
GPFS for local FS, pNFS for remote access also
using parallel I/O
No proprietary HW or SW needed for installation
Very resilient to failures, late sync if necessary
IBM Active Cloud Engine™ – WAN Caching capabilities
Statement of Direction

If data is modified at home
– Revalidation done at a configurable timeout
– Close to NFS style close-to-open consistency across
sites
– POSIX strong consistency within cache site

Fileset on home cluster is associated with a
fileset on one or more cache clusters

If data is in cache …
– Cache hit at local disk speeds
– Client sees local GPFS performance if file or
directory is in cache

If data not in cache …
– Data and metadata (files and directories)
pulled on-demand at network line speed
and written to GPFS
– Uses NFS for WAN data transfer
If data is modified at cache
– Writes see no WAN latency
– Writes are done to the cache (i.e. local GPFS), then
asynchronously pushed home

If network is disconnected …
– cached data can still be read, and writes to cache are
written back after reconnection
[Figure: Home Cluster Site and two Cache Cluster Sites (SONAS systems with IO nodes and a SONAS layer), connected by NFS over the WAN; pull on cache miss, push on write.]
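The caching behaviour on this slide (hit served locally, miss pulled from home, revalidation after a timeout, writes queued and pushed back later, reads still possible while disconnected) can be sketched in a few lines. This is a toy model, not IBM's implementation: the `WanCache` class, its parameters, and the dictionary standing in for the home cluster are all assumptions, and a stale entry is simply refetched rather than revalidated by attributes.

```python
import time

class WanCache:
    def __init__(self, home, revalidate_after=30.0, clock=time.monotonic):
        self.home = home                    # dict standing in for the home cluster
        self.revalidate_after = revalidate_after
        self.clock = clock
        self.local = {}                     # name -> (data, time fetched or written)
        self.dirty = []                     # names waiting to be pushed home
        self.connected = True

    def read(self, name):
        now = self.clock()
        entry = self.local.get(name)
        if entry is not None:
            data, stamp = entry
            stale = now - stamp >= self.revalidate_after
            # Serve locally if fresh, if home is unreachable, or if we hold
            # unpushed local changes for this file.
            if not stale or not self.connected or name in self.dirty:
                return data
        if not self.connected:
            raise IOError("not in cache and home is unreachable")
        data = self.home[name]              # pull on cache miss or stale entry
        self.local[name] = (data, now)
        return data

    def write(self, name, data):
        # Writes go to the local cache only, so they see no WAN latency.
        self.local[name] = (data, self.clock())
        if name not in self.dirty:
            self.dirty.append(name)

    def flush(self):
        """Push queued writes home; returns how many files were pushed."""
        if not self.connected:
            return 0
        pushed = 0
        while self.dirty:
            name = self.dirty.pop(0)
            self.home[name] = self.local[name][0]
            pushed += 1
        return pushed
```

In a real deployment `flush` would run in the background; here it is called explicitly so the late synchronisation after a reconnect is easy to follow.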
IBM Active Cloud Engine™
What is IBM Active Cloud Engine?
• Policy-driven engine that helps improve storage efficiency by automatically
– Distributing files, images, and application updates to multiple locations *
– Identifying files for backup or replication to a DR location
– Moving desired files to the right tier of storage including tape in a TSM hierarchy
– Deleting expired or unwanted files
• High-performance: can scan billions of files in minutes
What client value does Active Cloud Engine deliver?
• Enables ubiquitous access to files from across the globe *
• Reduces network costs and helps improve application performance by distributing files closer to users *
• Improves data protection by identifying candidates for backup or DR
• Lowers storage cost by moving files transparently to the most appropriate tier of storage
• Controls storage growth by moving older files to tape and deleting unwanted or expired files
• Enhances administrator productivity by automating file management
What capabilities are supported by Active Cloud Engine in SONAS?
• Active Cloud Engine on SONAS supports all the functions described above
What capabilities are supported by Active Cloud Engine in Storwize V7000 Unified?
• Active Cloud Engine on Storwize V7000 Unified supports all the functions described above except distribution to multiple locations
* Active Cloud Engine Statement of Direction
Fast Transport
Network bandwidth maximization
Fair share
Congestion control
Scheduling
TCP based: GridFTP and similar
FTP blocksize adjustment
Many parallel threads
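Why parallel streams help on a WAN: a single TCP stream cannot move more than one window of data per round trip, so its throughput is capped at window / RTT regardless of link speed. The sketch below works through that arithmetic; the function names and the example figures (64 KiB window, 100 ms RTT, 1 Gbit/s link) are illustrative assumptions, not measurements.

```python
import math

def max_tcp_throughput_mbps(window_bytes, rtt_ms):
    """Upper bound for one TCP stream: one window per round trip."""
    return window_bytes * 8 / (rtt_ms / 1000.0) / 1e6

def bandwidth_delay_product_bytes(link_mbps, rtt_ms):
    """Bytes that must be in flight to keep the link full."""
    return link_mbps * 1e6 / 8 * (rtt_ms / 1000.0)

def streams_needed(link_mbps, window_bytes, rtt_ms):
    """Parallel streams required to fill the link at a fixed window size."""
    return math.ceil(link_mbps / max_tcp_throughput_mbps(window_bytes, rtt_ms))

if __name__ == "__main__":
    print(max_tcp_throughput_mbps(65536, 100))        # about 5.24 Mbit/s per stream
    print(bandwidth_delay_product_bytes(1000, 100))   # 12.5 MB in flight
    print(streams_needed(1000, 65536, 100))           # 191 streams
```

The same numbers show the alternative to many streams: raise the window (or transfer block size) towards the bandwidth-delay product so that fewer streams can fill the link.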
Aspera
www.asperasoft.com
Built into other appliances, many users
UDP based transport
Swarming – can look like a DoS
Also has an FTP connection for control information
Configurable, has server and client UI for
transport control
Congestion control
Fair share control
FileCatalyst
www.filecatalyst.com
Similar to Aspera: UDP based transport
Signiant
www.signiant.com
One more option. It is not cheap, but I did not
find out more.