
pNFS extension for NFSv4
IETF-62 March 2005
Brent Welch
welch@panasas.com
Abstract
pNFS extends NFSv4
Scalable I/O problem
Brief review of the proposal
Status and next steps
Object Storage and pNFS
Object security scheme
pNFS
Extension to NFSv4
NFS is THE file system standard
Fewest additions that enable parallel I/O from clients
Layouts are the key additional abstraction
Describe where data is located and how it is organized
Clients access storage directly, in parallel
Generalized support for asymmetric solutions
Files: Clean way to do filer virtualization, eliminate bottleneck
Objects: Standard way to do object-based file systems
Blocks: Standard way to do block-based SAN file systems
Scalable I/O Problem
Storage for 1000’s of active clients => lots of bandwidth
Scaling capacity through 100’s of TB and into PB
File Server model has good sharing and manageability, but it is hard to scale
Many other proprietary solutions
GPFS, CXFS, StorNext, Panasas, Sistina, SANergy, …
Everyone has their own client
Would like to have a standards-based solution => pNFS
Scaling and the Client
Gary Grider’s rule of thumb for HPC
1 Gbyte/sec for each Teraflop of computing power
2000 3.2 GHz processors => ~6 TF => ~6 GB/sec (worked example after this list)
One file server with 48 GE NICs? I don’t think so.
100 GB/sec I/O system in ’08 or ’09 for 100 TF cluster
Making movies
1000 node rendering farm, plus 100’s of desktops at night
Oil and Gas
100’s to 1000’s of clients
Lots of large files (10’s of GB to TB each)
EDA, Compile Farms, Life Sciences …
Everyone has a Linux cluster these days
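A minimal worked example of the rule of thumb above, assuming roughly one flop per cycle (which is what the slide's 2000 x 3.2 GHz => ~6 TF figure implies); the numbers are the slide's, not a benchmark:

```python
# Gary Grider's rule of thumb: ~1 GB/sec of I/O per teraflop of compute.
GB_PER_SEC_PER_TFLOP = 1.0

def required_io_gb_s(num_cpus, gflops_per_cpu):
    """Estimate aggregate I/O bandwidth (GB/sec) for a compute cluster."""
    tflops = num_cpus * gflops_per_cpu / 1000.0
    return tflops * GB_PER_SEC_PER_TFLOP

# 2000 processors at ~3.2 GFLOPS each => ~6.4 TF => ~6 GB/sec of I/O,
# far beyond a single file server, even one with dozens of GE NICs.
print(required_io_gb_s(2000, 3.2))   # ~6.4

# A 100 TF cluster by the same rule needs ~100 GB/sec.
print(100 * GB_PER_SEC_PER_TFLOP)    # 100.0
```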
Asymmetric File Systems
Control Path vs. Data Path (“out-of-band” control)
Metadata servers (Control)
File name space
Access control (ACL checking)
Sharing and coordination
Protocol: NFSv4 + pNFS ops
Storage Devices (Data)
Clients access storage directly
SAN Filesystems
CXFS, EMC HighRoad, SANergy, SAN FS
Object Storage File Systems
Panasas, Lustre
[Diagram: clients speak NFSv4 + pNFS ops to the pNFS (metadata) server on the control path and access the storage devices directly on the data path; the pNFS server manages the storage devices over a storage management protocol]
NFS and Cluster File Systems
[Diagram: NFS clients reach NFS “heads”; each NFS head is a client of the cluster file system alongside the native clients]
Currently, the NFS head can be a client of a cluster file system (instead of a client of the local file system)
NFSv4 adds more state, makes integration with the cluster file system more interesting
pNFS and Cluster File Systems
[Diagram: pNFS clients and native clients all access a set of “Cluster FS + NFSv4” server nodes]
Replace proprietary cluster file system protocols with pNFS
Lots of room for innovation inside the cluster file system
pNFS Ops Summary
GETDEVINFO
Maps from opaque device ID used in layout data structures to the storage protocol type and necessary addressing information for that device
LAYOUTGET
Fetch location and access control information (i.e., capabilities)
LAYOUTCOMMIT
Commit write activity. New file size and attributes visible on storage.
LAYOUTRELEASE
Give up lease on the layout
CB_LAYOUTRETURN
Server callback to recall layout lease
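A minimal sketch of how a client might string these operations together for a write; the method names mirror the ops above, but the argument and result shapes are simplified placeholders rather than the draft's actual XDR:

```python
# Hypothetical client-side flow over the pNFS ops listed above.
# "server", "layout", and "devices" are placeholder objects, not real APIs.

def parallel_write(server, filehandle, data, offset):
    # LAYOUTGET: fetch location and access-control info for this byte range
    layout = server.layoutget(filehandle, iomode="rw",
                              offset=offset, length=len(data))

    # GETDEVINFO: resolve each opaque device ID in the layout to a storage
    # protocol type and the addressing information needed to reach it
    devices = {dev_id: server.getdevinfo(dev_id)
               for dev_id in layout.device_ids}

    # Direct, parallel I/O from the client to the storage devices, using
    # whatever data protocol the layout type calls for (blocks/objects/files)
    for stripe in layout.stripes(offset, len(data)):
        devices[stripe.device_id].write(stripe.address, data[stripe.slice])

    # LAYOUTCOMMIT: make the new file size and attributes visible on storage
    server.layoutcommit(filehandle, layout, new_size=offset + len(data))

    # LAYOUTRELEASE: give up the lease on the layout; the server can also
    # recall it earlier with the CB_LAYOUTRETURN callback
    server.layoutrelease(filehandle, layout)
```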
Multiple Data Server Protocols
BE INCLUSIVE !!
Broaden the market reach
Three (or more) flavors of out-of-band metadata attributes:
BLOCKS: SBC/FCP/FC or SBC/iSCSI… for files built on blocks
OBJECTS: OSD/iSCSI/TCP/IP/GE for files built on objects
FILES: NFS/ONCRPC/TCP/IP/GE for files built on subfiles
Inode-level encapsulation in server and client code
[Diagram: Client Apps sit above a pNFS IFS with a layout driver per layout type (1. SBC (blocks), 2. OSD (objects), 3. NFS (files)); NFSv4 is extended with orthogonal layout metadata attributes; the pNFS server grants and revokes layout metadata on top of a local filesystem]
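One way to picture the "inode-level encapsulation" point above: generic client code dispatches each file's I/O to a layout driver selected by the layout type. The class and function names below are invented for illustration:

```python
# Hypothetical layout-driver dispatch; only the driver chosen by the layout
# type knows how to talk to its flavor of data server.

class BlockLayoutDriver:        # SBC/FCP/FC or SBC/iSCSI: files built on blocks
    def read(self, layout, offset, length): ...

class ObjectLayoutDriver:       # OSD/iSCSI/TCP/IP/GE: files built on objects
    def read(self, layout, offset, length): ...

class FileLayoutDriver:         # NFS/ONCRPC/TCP/IP/GE: files built on subfiles
    def read(self, layout, offset, length): ...

LAYOUT_DRIVERS = {
    "SBC": BlockLayoutDriver(),
    "OSD": ObjectLayoutDriver(),
    "NFS": FileLayoutDriver(),
}

def read_through_layout(layout, offset, length):
    # Everything above this call is layout-type agnostic; everything below
    # it is specific to one data-server protocol.
    return LAYOUT_DRIVERS[layout.type].read(layout, offset, length)
```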
March 2005 Connectathon
NetApp server prototype by Garth Goodson
NFSv4 for the data protocol
Layouts are an array of filehandles and stateIDs
Storage Management protocol is also NFSv4
Client prototype by Sun
Solaris 10 variant
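A rough illustration of the layout shape described above (an array of filehandles and stateIDs); the field names are hypothetical and not the prototype's wire format:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class FileLayoutEntry:
    data_server: str      # address of one NFSv4 data server
    filehandle: bytes     # filehandle for this file's data on that server
    stateid: bytes        # stateID the client uses in READ/WRITE there

@dataclass
class FileLayout:
    stripe_unit: int                 # bytes sent to one entry before moving on
    entries: List[FileLayoutEntry]   # the array of filehandles and stateIDs
```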
Object Storage
Object interface is midway between files and blocks (think inode)
Create, Delete, Read, Write, GetAttr, SetAttr, …
Objects have numeric ID, not pathnames
Objects have a capability based security scheme (details in a moment)
Objects have extensible attributes to hold high-level FS information
Based on NASD and OBSD research out of CMU (Gibson et al.)
SNIA T10 standards based. V1 complete, V2 in progress.
Objects as Building Blocks
Object comprised of: User Data, Attributes, Layout
Interface: ID <dev,grp,obj>, Read/Write, Create/Delete, Getattr/Setattr, Capability-based
File Component: stripe files across storage nodes
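A minimal sketch of the object abstraction above; the method names follow the slide's verbs, while the signatures are illustrative and omit the capability checks that guard every real OSD command:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class ObjectId:
    device: int    # OSD device
    group: int     # object group/partition on that device
    obj: int       # numeric object ID; objects have no pathnames

@dataclass
class StorageObject:
    oid: ObjectId
    data: bytearray = field(default_factory=bytearray)
    attrs: dict = field(default_factory=dict)   # extensible attributes

    def write(self, offset, buf):
        if len(self.data) < offset + len(buf):
            self.data.extend(b"\0" * (offset + len(buf) - len(self.data)))
        self.data[offset:offset + len(buf)] = buf

    def read(self, offset, length):
        return bytes(self.data[offset:offset + length])

    def setattr(self, key, value):      # e.g., high-level FS metadata
        self.attrs[key] = value

    def getattr(self, key):
        return self.attrs.get(key)
```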
Objects and pNFS
Clients speak pNFS to metadata server
Or they will, eventually
Clients speak iSCSI/OSD to the data servers (OSD)
Files are striped across multiple objects (see the striping sketch after this list)
Metadata manager speaks iSCSI/OSD to data server
Metadata manager uses extended attributes on objects to store metadata
High level file system attributes like owners and ACLs
Storage maps
Directories are just files stored in objects
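A sketch of the striping arithmetic referenced above: round-robin mapping of a file offset to a component object and an offset within it. Real storage maps carry more (RAID parameters, component growth), so treat this as the basic idea only:

```python
# Hypothetical round-robin striping of a file across its component objects.

def locate(file_offset, stripe_unit, num_objects):
    stripe_number = file_offset // stripe_unit
    object_index = stripe_number % num_objects               # which object
    object_offset = (stripe_number // num_objects) * stripe_unit \
                    + file_offset % stripe_unit               # offset inside it
    return object_index, object_offset

# With a 64 KB stripe unit across 4 objects, file byte 300000 lands in
# component object 0 at offset 103392.
print(locate(300000, 64 * 1024, 4))   # (0, 103392)
```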
Object Security (1)
Clean security model based on shared, secret device keys
Part of T10 standard
Metadata manager knows device keys
Metadata manager signs caps, OSD verifies them (explained next slide)
Metadata server returns capability to the client
Capability encodes: object ID, operation, expire time, data range, cap version, and a signature
Data range allows serialization over file data, as well as different rights to different attributes
Cap version allows revocation by changing it on the object
May want to protect capability with privacy (encryption) to avoid snooping the path between metadata manager and client
Object Security (2)
Capability Request (client to metadata manager)
CapRequest = object ID, operation, data range
Capability signed with device key known by metadata manager
CapSignature = {CapRequest, Expires, CapVersion} KeyDevice
CapSignature used as a signing key (!)
OSD Command (client to OSD)
Request contains {CapRequest, Expires, CapVersion} + other details + nonce to prevent replay attacks
RequestSignature = {Request} CapSignature
OSD can compute CapSignature by signing CapRequest with its own key
OSD can then verify RequestSignature
Caches and other tricks used to make this go fast
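A compact sketch of the signing chain above, using HMAC-SHA1 as a stand-in for the T10 algorithm; the byte encodings and field handling are simplified placeholders, not the standard's formats:

```python
import hmac, hashlib, os, time

def mac(key, message):
    return hmac.new(key, message, hashlib.sha1).digest()

# Metadata manager: signs the capability with the shared, secret device key.
def issue_capability(device_key, cap_request, lifetime=300, cap_version=1):
    expires = str(int(time.time()) + lifetime).encode()
    cap = b"|".join([cap_request, expires, str(cap_version).encode()])
    cap_signature = mac(device_key, cap)   # {CapRequest, Expires, CapVersion} K_device
    return cap, cap_signature              # both handed back to the client

# Client: uses CapSignature as the signing key for each OSD command.
def sign_osd_command(cap, cap_signature, command_body):
    nonce = os.urandom(16)                 # prevents replay
    request = b"|".join([cap, command_body, nonce])
    return nonce, mac(cap_signature, request)

# OSD: recomputes CapSignature from its own copy of the device key and
# verifies the request signature with no metadata-server round trip.
# (A real OSD also checks Expires and the object's current CapVersion.)
def verify_osd_command(device_key, cap, command_body, nonce, request_signature):
    cap_signature = mac(device_key, cap)
    request = b"|".join([cap, command_body, nonce])
    return hmac.compare_digest(mac(cap_signature, request), request_signature)
```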
Scalable Bandwidth
[Chart: Panasas Bandwidth vs. OSDs; aggregate bandwidth in GB/sec (0 to 12) against the number of OSDs (0 to 350). Each Panasas OSD has 2 disks, a processor, and a GE NIC]
Per Shelf Bandwidth
[Chart: Bandwidth vs. Clients, 10 OSDs; MB/sec (0 to 450) against the number of clients (0 to 25), with Read (64K) and Write (64K) curves]
Scalable NFS
[Chart: NFS Throughput (SPEC SFS); response time in msec (0 to 20) against SPEC SFS ops/sec (0 to 350,000) for 60-Director and 10-Director configurations, with their overall response times (ORT) marked]
Rendering load from 1000 clients
Status
pNFS ad-hoc working group
Dec ’03 Ann Arbor, April ’04 FAST, Aug ’04 IETF, Sept ’04 Pittsburgh
Internet Drafts
draft-gibson-pnfs-problem-statement-01.txt July 2004
draft-gibson-pnfs-reqs-00.txt October 2004
draft-welch-pnfs-ops-00.txt October 2004
draft-welch-pnfs-ops-01.txt March 2005
Next Steps
Add NFSv4 layout and storage protocol to current draft
Add text for additional error and recovery cases
Add separate drafts for object storage and block storage
Backup
Object Storage File System
Out of band control path
Direct data path
“Out-of-band” Value Proposition
Out-of-band allows a client to use more than one storage address for a given file, directory or closely linked set of files
Parallel I/O direct from client to multiple storage devices
Scalable capacity: file/dir uses space on all storage: can get big
Capacity balancing: file/dir uses space on all storage: evenly
Load balancing: dynamic access to file/dir over all storage: evenly
Scalable bandwidth: dynamic access to file/dir over all storage: big
Lower latency under load: no bottleneck developing deep queues
Cost-effectiveness at scale: use streamlined storage servers
pNFS standard leads to standard client SW: share client support $$$
Scaling and the Server
Tension between sharing and throughput
File server provides semantics, including sharing
Direct attach I/O provides throughput, no sharing
File server is a bottleneck between clients and storage
Pressure to make the server ever faster and more expensive
Clustered NAS solutions, e.g., Spinnaker
SAN filesystems provide sharing and direct access
Asymmetric, out-of-band system with distinct control and data paths
Proprietary solutions, vendor-specific clients
Physical security model, which we’d like to improve
[Diagram: many clients funnel through a single file server to reach storage]
Symmetric File Systems
Distribute storage among all the clients
GPFS (AIX), GFS, PVFS (User Level)
Reliability Issues
Compute nodes less reliable because of the disk
Storage less reliable, unless replication schemes employed
Scalability Issues
Stealing cycles from clients, which have other work to do
Coupling of computing and storage
Like early days of engineering workstations, private storage