Active Storage Nets

David Nagle
Computer Engineering and Computer Science, Carnegie Mellon University

Network-attached storage is an excellent application for Active Storage Nets
[Figure: clients on Fast/Gigabit Ethernet, servers, and FibreChannel "SCSI" drives]

Carnegie Mellon Parallel Data Laboratory | http://www.pdl.cs.cmu.edu | D. Nagle, July 16, 1998 | DARPA/ITO Active Nets Meeting

Network-Attached Secure Disk (NASD)
- Enable direct transfer between client & storage device
- Object-oriented interface (SCSI-4) w/ persistent objects
- Objects supported by an in-drive file system
- File manager provides policy (e.g., names, ACLs, consistency)
- Device understands cryptographic capabilities
[Figure: client reads/writes directly to NASD drives through a switch; the file manager (cluster) handles access control]

Current NASD Prototype
- Current prototype supports NFS, AFS, and Striped NFS
- Object interface (SCSI-4) moving toward standard
- Storage management middleware in development
- NASD/NSIC Working Group: CMU, HP, Quantum, IBM, StorageTek, & Seagate
[Figure: aggregate bandwidth (MB/s) and percent idle vs. number of clients, 1-10]

Beyond Bandwidth: All the Other Goals
- Storage self-management: 3-7X capital cost/year
- Object-oriented disks: performance optimizations
- Quality of storage: object attributes and scheduling
- Securable devices: reduce trust scope
- Scalable cluster servers: multi-device serializability
- Avoid OS stranglehold: user-level file/storage systems
- Enable nimble vendors: extensions unknown to the FS
- Active disks: application exploitation of device cycles
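The cryptographic capabilities above (the file manager grants, the drive verifies) can be sketched as a MAC over an object ID and rights string. The key name, the choice of HMAC-SHA-256, and the message encoding are illustrative assumptions, not the NASD wire format.

```python
import hashlib
import hmac

# Hypothetical per-drive secret shared between the drive and its file manager.
DRIVE_KEY = b"per-drive secret"

def make_capability(obj_id: str, rights: str) -> bytes:
    """File manager signs (object, rights); HMAC-SHA-256 is an illustrative
    stand-in for whatever MAC the drive ASIC would implement."""
    return hmac.new(DRIVE_KEY, f"{obj_id}:{rights}".encode(), hashlib.sha256).digest()

def drive_check(obj_id: str, rights: str, cap: bytes) -> bool:
    """Drive recomputes the MAC locally: no round trip to the file manager."""
    return hmac.compare_digest(cap, make_capability(obj_id, rights))

cap = make_capability("objX", "read")
assert drive_check("objX", "read", cap)          # capability accepted
assert not drive_check("objX", "write", cap)     # wrong rights: rejected
assert not drive_check("objY", "read", cap)      # wrong object: rejected
```

The point of the design is visible in the sketch: once the manager has issued `cap`, every subsequent read of `objX` is verified by the drive alone.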
Potential Issues with NASD
- Network protocols from SAN, LAN, and WAN
  - SAN traffic is fast ... WAN traffic is slower ... what about LAN traffic?
  - Drive supports multiple protocols or uses a protocol converter (e.g., a server)
- Storage function (RAID, striping)
  - The central coordination point has been the server or RAID controller
- Security
  - Costly ... why repeat it at multiple levels in the system?
  - What happens to security when NASD objects are cached?
- On-drive resources
  - Limited on-drive resources: limited RAM ... no extra SIMM slots on a drive
  - Future network-attached storage: legacy systems not as powerful

Active Nets + NASD
What can Active Nets do for network-attached storage?
- Dynamically composed and adaptive storage protocols
- Network components aware of NASD objects
  - Caching, reliability, security, & prefetching via NASD objects
- Quality of Service meets Quality of Storage
- Network-embedded storage functionality
- Active NW components provide NW traffic info to endpoints
[Figure: application/file system uses dynamic protocol composition across active switches and routers; an active switch & NASDs reconfigure to create a RAID array or striped storage server; high-BW, low-latency protocol on the SAN, robust adaptive protocol on the LAN/WAN]

Protocol Support
- SANs enable lightweight protocols ...
- WAN needs TCP/IP
- LAN traffic protocol depends on location, physical NW, & application
- NASD support for TCP/IP can be difficult
  - CTCP-like protocol pushes more function to the client [Johnson98]
- Active NW components provide NW traffic info to endpoints
[Figure: an active router supports protocol conversion from LAN to SAN; high-BW, low-latency protocol on the SAN, robust adaptive protocol on the LAN/WAN]

NASD Object Aware Networks
- Drive object model provides standard access to data
- Network nodes cache/manage complete objects ... or groups of objects
- Network nodes can initiate cache and prefetching algorithms
- Drives (and clients) can expand RAM by using network nodes
[Figure: an active router provides a cache of NASD objects and network-initiated prefetch; drives directly deposit data in network nodes]

Security
- NASD-aware networks can support NASD security
- Net caches can enforce NASD capabilities
- Allows storage to transparently extend its security model into the network
[Figure: an active router enforces NASD security capabilities on its cache of NASD objects; drives directly deposit data in network nodes]
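The object-aware caching and network-initiated prefetch described above can be sketched as a small LRU cache of whole NASD objects at a network node. The class name, the LRU policy, and the "prefetch the next object ID" heuristic are assumptions made for illustration, not the prototype's actual algorithms.

```python
from collections import OrderedDict

class ObjectCache:
    """Toy cache of whole NASD objects at a network node. On a miss it also
    prefetches the next object ID, mimicking network-initiated prefetch."""
    def __init__(self, drive_read, capacity=4):
        self.drive_read = drive_read            # callback that fetches from the drive
        self.capacity = capacity
        self.cache = OrderedDict()              # obj_id -> object data, LRU order

    def read(self, obj_id: int) -> bytes:
        if obj_id in self.cache:
            self.cache.move_to_end(obj_id)      # LRU hit
            return self.cache[obj_id]
        data = self._fill(obj_id)
        self._fill(obj_id + 1)                  # network-initiated prefetch
        return data

    def _fill(self, obj_id: int) -> bytes:
        data = self.drive_read(obj_id)
        self.cache[obj_id] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)      # evict least recently used
        return data

fetches = []
cache = ObjectCache(lambda oid: fetches.append(oid) or bytes([oid]))
cache.read(7)
assert fetches == [7, 8]          # object 7 fetched, object 8 prefetched
assert cache.read(8) == b"\x08"   # the prefetch turned this into a hit
```

Because the node caches complete objects rather than anonymous blocks, it can make decisions like this without understanding the filesystem above it.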
Net-based Storage Functionality
- RAID and striping are object-based, not drive-based
- Client middleware provides storage function
- Active Nets can accelerate storage function
- Network coordinates among independent devices to provide QoS
[Figure: an active switch & NASDs reconfigure to create a RAID array or striped storage server; an active hub and NASDs manage striped-storage scheduling for a video server]

Dynamic Replica Management
- Create dynamic replicas to minimize WAN traffic
- Dynamically copy or move data in the LAN and SAN based on application behavior
- Network and storage coordinate to deliver QoS
[Figure: local clients repeatedly request a remote object; the active router observes this and helps dynamically replicate the data onto local NASDs]

Summary
- Infrastructural computing via Active Nets & NASD
  - Active networks enable storage access from diverse locations
  - NASD-aware networks extend storage capabilities/functionality
  - Active nets programmability enables flexible storage management
- Current work
  - Integrating the Active Nets and NASD testbeds
  - Designing adaptive protocols and a storage API
  - Storage function in the net
- Tech transfer
  - NASD/NSIC working group: IBM, Seagate, Quantum, StorageTek, HP
  - Parallel Data Consortium: HP, Seagate, Quantum, StorageTek, Intel, Compaq, 3Com, Symbios, Clariion, & WindRiver
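The dynamic replica management idea above, an active router noticing repeated local requests for a remote object and replicating it locally, might look like the following sketch. The threshold, class, and method names are invented for this example.

```python
from collections import Counter

class ActiveRouter:
    """Illustrative sketch: count local accesses to remote objects and
    trigger replication to a local NASD past a threshold."""
    def __init__(self, threshold=3):
        self.threshold = threshold
        self.hits = Counter()
        self.local_replicas = set()

    def observe(self, obj_id):
        if obj_id in self.local_replicas:
            return "served-locally"
        self.hits[obj_id] += 1
        if self.hits[obj_id] >= self.threshold:
            self.local_replicas.add(obj_id)     # copy the object to local storage
            return "replicated"
        return "forwarded-to-WAN"

r = ActiveRouter()
assert [r.observe("green") for _ in range(4)] == [
    "forwarded-to-WAN", "forwarded-to-WAN", "replicated", "served-locally"]
```

Once the object crosses the threshold, WAN traffic for it stops entirely: exactly the effect the slide's "green block" scenario describes.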
CMU NASD is an Object-Oriented Disk
- Avoid the file manager unless a policy decision is needed
  - access control once (1,2) for all accesses (3,4) to a drive object
  - spread access computation over all drives under a manager
- Scalable BW, off-loaded manager, "file" knowledge
[Figure: the file manager handles access control, namespace, and consistency; the client obtains a capability from the manager (1,2), then reads/writes directly to the NASD drive (3,4)]

CMU's Functional Definition of NASD
- Direct client/drive transfer in a networked environment
- Asynchronous filesystem oversight of rights, semantics
- Cryptographic capabilities ensure command integrity
- Self-management via more abstraction, independence
- Extensible features for applications, not just the client OS
[Figure: client reads/writes directly to NASD drives through a switch; the manager (cluster) handles access control]
NFS Motivation for NASD Design
- NASD common code path: read bytes from file FOO
  - send <read, bytes, capability(FOO=objX)> to the NASD
  - NASD: look up objX, verify capability, fetch bytes from objX
  - reply <read-reply, data> directly to the client

  NFS Operation   Count in top 2%      SAD cycles         NetSCSI cycles     NASD cycles
                  by work (thousands)  (billions, % SAD)  (billions, % SAD)  (billions, % SAD)
  Attr Read          792.7             26.4   11.8%       26.4   11.8%        0.0   0.0%
  Attr Write          10.0              0.6    0.3%        0.6    0.3%        0.6   0.3%
  Block Read         803.2             70.4   31.6%       26.8   12.0%        0.0   0.0%
  Block Write        228.4             43.2   19.4%        7.6    3.4%        0.0   0.0%
  Dir Read          1577.2             79.1   35.5%       79.1   35.5%        0.0   0.0%
  Dir RW              28.7              2.3    1.0%        2.3    1.0%        2.3   1.0%
  Delete Write         7.0              0.9    0.4%        0.9    0.4%        0.9   0.4%
  Open                95.2              0.0    0.0%        0.0    0.0%       12.2   5.5%
  Total             3542.4            223.1  100.0%      143.9   64.5%       16.1   7.2%

- Berkeley NFS traces [Dahlin94] (230 clients, 6.6M requests)

Security Implications of NASD
- Not tied to any specific higher-level security system
  - i.e., not Kerberos, authenticated RPC, X.509
- Authenticates a command as file-manager approved
  - rests on the secrecy of the file manager key (hierarchy)
  - anticipates hardware support only in the drive ASIC
[Figure: in a SAD/SID system, crypto lives in a secure server environment (clients, authentication server, server running the SAD/SID FS, disks); with NASD, the secure (crypto) boundary moves into the storage system itself (file manager plus NASD drives)]

NASD Object Interface: Where's the Metadata?
- Not at the client: don't rest integrity on a trusted client
- Data layout in the storage device?
  - avoid distributing per-drive block lists to the drive
  - enables on-drive, drive-subsystem optimization
  - e.g., interposed/stackable NASD: striping, RAID
  - e.g.,
remote Transparent Informed Prefetching (TIP)
[Figure: NFS with hinting; the application passes R/W hints through its cache to a NASD with remote TIP. 3D visualization rendering 25 random planes: 120 seconds without hints, 68 seconds with hints]

NASD Interface Design: Storage Objects
- Layout is best (actually) done below SCSI-4
  - real-time support possible; accurate geometry
  - simplifies, strengthens transparent performance optimization
- Drive serves storage objects on behalf of the manager
- Objects have attributes
  - inodes: (name), size, protection, type, timestamps, layout
- Object layout guidance from the higher level
  - sequential within an object; requests contiguous (can preallocate)
  - related objects can be clustered with a "nearby-to" attribute
- Attributes are extensions of the object name
  - give the higher level (filesystem) an uninterpreted attribute
  - big enough to contain another object name (soft link)

Supporting File Managers: Soft Partitions, Well-known Objects
- Split capacity among different managers: partitions
  - managers can use attributes differently; no need to integrate
  - the boundary should be soft: resizing a partition should be fast
  - partition key hierarchy: (partition key, working keys)
- Well-known objects replace SCSI mode sense/select
  - published format and interpretation
  - per-drive and per-partition separated
  - e.g., a partition table of contents (mini disk directory)
  - e.g., assist simple boot code with an easily found "first object"
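The storage-object attributes listed above (size, protection, type, timestamps, a "nearby-to" clustering hint, and an uninterpreted filesystem attribute) could be modeled roughly as below. All field names are invented for illustration; they are not the SCSI-4 draft's.

```python
import time
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class NASDObjectAttrs:
    """Rough model of a NASD object's attribute set; names are hypothetical."""
    size: int = 0
    protection: int = 0o600
    obj_type: str = "file"
    create_time: float = field(default_factory=time.time)
    modify_time: float = field(default_factory=time.time)
    nearby_to: Optional[int] = None   # clustering hint: lay out near this object
    fs_attr: bytes = b""              # uninterpreted by the drive; large enough
                                      # to hold another object ID (soft link)

a = NASDObjectAttrs(size=4096, nearby_to=17)
assert (a.size, a.nearby_to, a.fs_attr) == (4096, 17, b"")
```

The key design point survives even in this toy: `fs_attr` means the drive never interprets filesystem metadata, while `nearby_to` lets the filesystem guide layout without dictating it.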
Adapting Filesystems to NASD
- Reorganize the decomposition of function (aka port)
- Primitives become drive responsibility
  - data transfer, synchronous/automatic metadata updates
- Policy remains manager responsibility
  - namespace definition/navigation
  - access control policy
  - client cache management
  - multi-access atomicity
- Managers retain control through capabilities
  - exploiting attributes for naming and revocation
  - restricting client operations to protect "set attribute"

Mapping a Filesystem to NASD Objects
- Objects: attributes, access control, clustering
- Simple model
  - each file and directory bound to a separate NASD object
  - file attributes inherit object attributes (times, logical size)
- Multiple objects per file?
  - internal structure: database pages, MPEG group-of-pictures
  - NASD striping, redundancy
- Multiple files/directories per object?
  - probable contiguity, prefetching; shared metadata overhead
  - capabilities can be restricted to a region of an object
- NFS and AFS use the simple model; Striped NFS uses multiple objects per file
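The "simple model" above, binding each file and directory to its own NASD object, can be sketched as a file-manager-side namespace map. The class and the object-ID numbering scheme are hypothetical.

```python
class FileManagerNamespace:
    """Sketch of the simple model: one fresh NASD object per file/directory.
    Class name and ID scheme are invented for illustration."""
    def __init__(self, first_obj_id: int = 100):
        self.next_id = first_obj_id
        self.binding = {}            # path -> NASD object ID

    def create(self, path: str) -> int:
        obj_id, self.next_id = self.next_id, self.next_id + 1
        self.binding[path] = obj_id  # file attributes would then inherit the
        return obj_id                # object's (times, logical size)

ns = FileManagerNamespace()
ns.create("/")            # a directory gets its own object
ns.create("/data")
ns.create("/data/a.dat")  # a file gets its own object
assert ns.binding == {"/": 100, "/data": 101, "/data/a.dat": 102}
```

The manager keeps only this binding and the policy around it; everything below the object ID (allocation, layout) belongs to the drive.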
Prototyping NASD: NFS & AFS on NASD
- File -> NASD object; directory -> NASD object
- NASD object: private metadata, exposed attributes
  - allocation: length, blocks used; times: create, data modify
  - FS-specific (NFS): owner, group, mode
  - FS-specific (AFS): the above plus modify time
- Operation disposition
  - NFS, to drive: get attribute, read, write
  - AFS, to drive: FetchStatus, BulkStatus, FetchData (w/cap), StoreData (w/cap)
  - AFS read w/o cap: GetCap (callback, attributes), (GetAttr from drive), FetchData
  - AFS write w/o cap: GetWCap, StoreData, ReturnCap
- AFS cache coherence protocol is independent of NASD
- AFS quota enforced by capability escrow (using write range)

Cheops: Striping Storage Middleware
- Asynchronous storage management oversight
- The client asking for service pays for it (synchronizer)
  - striping, RAID, incremental growth, consistent caches
  - fast (optimistic) synchronization support in NASDs
  - anticipates user-level network access (Intel VIA)
[Figure: client application with PFS clerk and Cheops clerk over the NASD interface; Cheops managers handle naming and authorization; clerks handle capability translation, striping, and RAID against the NASDs]

Storage Interface Evolution & Taxonomy
- Computers in the data/control path:
  - Local Filesystem: 3
  - Distributed FS (DFS): 3
  - DFS + RAID: 4
  - DFS + HPSS bypass: 3/4
  - DFS w/ NASD: 2/2+
  - DFS w/ NASD, Cheops: 2/2++
[Figure: each configuration's path from display/app through RPC, file manager, object store, device, and disk, spanning PAN/LAN/SAN; the NASD configurations move read/write and bulk data transfer onto the SAN]
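A round-robin stripe mapping of the kind a Cheops clerk would perform, translating a logical file offset into a component object on one drive, can be sketched as follows. The RAID-0-style layout and the parameter values are assumptions for illustration, not Cheops's actual algorithm.

```python
def stripe_map(offset: int, stripe_unit: int, ndrives: int):
    """Map a logical byte offset in a striped file to (drive index,
    offset within that drive's component object), round-robin style."""
    stripe_no = offset // stripe_unit
    drive = stripe_no % ndrives
    drive_offset = (stripe_no // ndrives) * stripe_unit + offset % stripe_unit
    return (drive, drive_offset)

# 64 KB stripe units over 4 drives:
assert stripe_map(0, 65536, 4) == (0, 0)                  # first unit, drive 0
assert stripe_map(65536, 65536, 4) == (1, 0)              # second unit, drive 1
assert stripe_map(4 * 65536 + 100, 65536, 4) == (0, 65636)  # wraps back to drive 0
```

Because this translation lives in client middleware rather than in a RAID controller, each client computes its own mapping and issues requests to the drives directly, which is what lets the bandwidth scale.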
Scalable Bandwidth for Applications
- Client library implementation enables aggressiveness
  - parallel file system accommodates the parallel nature of scaling
  - NASD middleware fetches large blocks in parallel
- Parallel data mining setup: 13 NASDs (133 MHz), 1-10 clients (233 MHz), MPI + SIO LLAPI, Cheops + NASD, switched ATM LAN
- 7.4 MB/s per client until drive limits
[Figure: parallel data mining bandwidth (MB/s) vs. number of clients (1-10); NASD scales with client count while NFS stays flat]

Recap: SCSI-4 Based on NASD
- Direct transfer for wire-once, scalable bandwidth
- NASD for object bandwidth and server off-loading
  - e.g., data mining at 65 MB/s in a distributed filesystem
- NASD filesystems serve policy (async oversight)
  - namespace, access control, consistency, atomicity
  - capabilities encode policy, metadata; crypto integrity
- NASD is an attribute-based, variable-length object
- Storage management middleware
  - clients pay for the synchronizing semantics they request
  - striping, RAID, incremental capacity, migration
  - optimistic synchronization using fs-specific attributes

Are Device Cycles Really Available?
- Quantum Trident drive
  - current .68-micron chip is 74 sq. mm
  - control: M68020; datapath ASIC
  - 4 independent clock domains, each 40 MHz: SCSI processor, disk R/W channel, uP control port, DRAM port
  - ~110 Kgates + 22 Kb
- HW support for crypto & SANs
- Integrated 200-MIPS microprocessor
  - .35-micron next generation enables integration of the control uP onto the ASIC
  - Siemens TriCore architecture
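The scaling behavior reported on the bandwidth slide (7.4 MB/s per client until the drives saturate, with the recap citing ~65 MB/s aggregate) fits a simple min() model. The saturation value is taken from the recap slide; the model itself is an illustration, not a measurement.

```python
def aggregate_bw(nclients: int, per_client: float = 7.4,
                 drive_limit: float = 65.0) -> float:
    """Clients add bandwidth linearly until the drive array saturates.
    per_client and drive_limit are the slide's reported figures."""
    return min(nclients * per_client, drive_limit)

assert round(aggregate_bw(4), 1) == 29.6   # linear region: 4 clients
assert aggregate_bw(10) == 65.0            # drive-limited region
```

The knee of the curve, where `nclients * per_client` crosses `drive_limit` at roughly 9 clients, is the point the plot shows the NASD line flattening.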