1. Surfing Technology Curves
   Steve Kleiman, CTO, Network Appliance Inc.

2. Book Plug
   The Innovator's Dilemma: When New Technologies Cause Great Firms to Fail, by Clayton M. Christensen

3. About NetApp
   - Two product lines:
     - Network-attached file servers (a.k.a. filers)
     - Web proxy caches: NetCache
   - Founded in 1992
   - >$1B revenue run rate
   - >70% CAGR since founding; >120% last year

4. Filers: Fast, Simple, Reliable and Multi-protocol

   System             Sun E3500/4500   HP-9000 N4000   NetApp 840
   CPUs               2                4               1
   Result (ops/sec)   8,165            15,270          15,235
   Overall resp. (ms) 3.04             1.91            1.54
   Resp. @ max (ms)   23.8             3.7             3.6
   Ops per SpecRate   20.4             10.4            46
   Ops/FS             340              318             15,235
   RAID               no               yes             yes

5. Filers: Fast, Simple, Reliable and Multi-protocol
   - Disk management
     - Filer finds disks and organizes them into RAID groups and spares automatically
     - Simple addition of storage
     - Automatic RAID reconstruction
   - Data management: Snapshots, SnapRestore, SnapMirror
   - Simple upgrade
   - Small command set

6. Filers: Fast, Simple, Reliable and Multi-protocol
   - Built-in RAID
   - Easy hardware maintenance: hot-plug disks, power supplies, fans
   - Low MTTR
   - Cluster failover
   - Autosupport
   - >99.995% measured field availability

7. Filers: Fast, Simple, Reliable and Multi-protocol
   - NFS
   - CIFS (CIFS and NFS attributes)
   - HTTP
   - FTP
   - DAFS
   - Internet cache: FTP, streaming media

8. Wave 1: Networks, Appliances and Software

9. Network and Storage Bandwidth

   Year   Storage       Network     Penalty
   1992   10 MB/s       0.1 MB/s    100-to-1
   1994   20 MB/s       1 MB/s      20-to-1
   1996   40 MB/s       10 MB/s     4-to-1
   1998   100 MB/s      100 MB/s    1-to-1
   2001   200-400 MB/s  1000 MB/s   0.2-to-1

10. The Appliance Revolution
    - 1980s (general purpose): one UNIX or Windows NT system ran applications, print, file service, routing, ...
    - 1990s (appliance based): UNIX/Windows NT for applications, printers for print, filers for file service, routers/switches for routing, ...

11. Appliance Philosophy
    - Appliance philosophy breeds focus
    - External simplicity, internal simplicity: the RISC argument
    - Don't have to be all things to all people
    - Limited compatibility constraints: interfaces are bits on the wire
    - Think different!
    - Can innovate with both software and hardware

12. Filer Architecture
    - Commercial off-the-shelf chips; any appropriate architecture
      - CPUs over time: i486, Pentium, Alpha '064, Alpha '164, PIII
    - Board-level integration: 1 or more CPUs (4), 1 or more PCI buses (4), high-bandwidth switches, multiple memory banks, integrated I/O, NVRAM
    [diagram: CPUs, memory banks, NVRAM, and I/O connected over PCI]

13. Roads Not Taken
    - No "unobtainium"
    - Minimalist infrastructure
    - No special-purpose buses
    - No big MPs; motherboards only: no cache-coherent backplanes
    - No functionally distributed computers
    - No special-purpose networks (e.g. HIPPI)
    - No block-access protocols

14. DataOnTap Architecture
    [diagram: daemons, shells, and commands plus a Java Virtual Machine and library above the SK kernel; NFS, CIFS, and HTTP over TCP/IP, and DAFS over VIPL/VI; network interfaces: ATM, GbE, FDDI, 100BT, VI NIC*; WAFL over RAID over FC-AL/SCSI disks]
    * VI supported on FC (future: GbE, InfiniBand)

15. DataOnTap Simple Kernel
    - Message passing
    - Non-preemptive
    - Sample optimizations: checksum caching, suspend/resume, cache-hit pass-through

16. WAFL: Write Anywhere File Layout
    - Log-like write throughput, with no segment cleaning (unlike LFS)
    - Write data allocated to optimize RAID performance; delayed write allocation
    - Active data is never overwritten (shadow paging)
    - On-disk data is always consistent: file system state is changed atomically, every 10 sec by default
    - Client modification requests are logged to NVRAM; the NVRAM log is replayed only on reboot

17. Wave 2: Memory-to-Memory Interconnects (a.k.a. NUMA, NORMA)

18. Problem: Remove Single Points of Failure
    - Without doubling hardware
    - While minimizing performance overhead
    - Without decreasing reliability

19. Clustered Failover Architecture
    [diagram: two filers, each with NVRAM, attached to the network, linked by ServerNet, and connected to shared Fibre Channel disk loops]

20. Memory-to-Memory Interconnects
    - Efficient transfer model: allows minimal overhead on the receiver
    - Scalable bandwidth: high-speed ASIC-based switching, gigabit technology
    - Open architecture: PCI rather than a coherent-bus interface; incorporates multiple technologies
    - Relatively inexpensive

21. Mirroring NVRAM
    - NVRAM is split into local and partner regions
    - Data is assembled in
NVRAM, then DMAed (over the PCI bus and ServerNet) to the equivalent offset in the partner node's NVRAM
    - The client reply is sent when the log-entry DMA completes

22. Leveraged Components
    - Memory-to-memory interconnects: low overhead, high bandwidth, cheap
    - WAFL: always-consistent file system with built-in NVRAM logging/replay
    - Fibre Channel disks: two independent ports
    - Single-function appliance software: simple, low-overhead failover

23. Wave 3: The Internet

24. The Consequences of Higher-Speed Internet Access
    - Home: a 200K-400K-home cable head-end requires 1.5-3 Gbps access capability (30% subscription rate, 20% online, minimum 128 Kbps per subscriber)
    - Enterprise: remote sites still connected by slow links require high-quality access to content; overloaded web servers
    - ISP: requires distribution and caching of large media files

25. Yet Another Appliance (Cisco, NetApp)

26. NetCache
    - HTTP/FTP proxy cache appliance
    - Highly deployable
    - Forward and reverse proxy; transparency; filtering
    - ICAP enables value-added services: virus scanning, transcoding, ad insertion, ...
    - Stream splitting, stream caching, content distribution

27. Cacheable Content
    [chart: cacheable content over time, split into static content, dynamic content, and streaming media]

28. Wave 4: The Death of Tapes

29. Using Tapes for Disaster Recovery

    Year   Drive      # Tapes   Capacity   Drives     # Tape drives to
           capacity                        required   restore in 8 hours
    1999   36 GB      168       6 TB       172        21
    2000   72 GB      216       16 TB      160        28
    2001   144 GB     500*      72 TB      360        63
    * with SAN

30. SnapMirror
    - Remote asynchronous mirroring
    - Continuous incremental update; only allocated blocks are transmitted
    - Automatic resynchronization after a disconnect
    - The destination is always a consistent "snapshot" of the source
    [diagram: filer, WAN, filer]

31. Creating a Snapshot
    [figure: before the snapshot, the active file system points to disk blocks A-D; taking the snapshot adds a snapshot root pointing to the same blocks; after a block update, the active file system points to new block C' while the snapshot still points to C]

32. WAFL: Block Map File
    - Multiple bits per 4 KB block: a column for blocks allocated in the active file system, plus columns for blocks allocated in each snapshot
    - Taking a snapshot: copy the root inode
    [figure: block map with columns S1, S2, S3, and FS over disk blocks 1-8]

33. Consistent Image Propagation
    [figure: with a fast network or a slow modification rate, the destination receives each consistent image shortly after the source creates it; with a slow network or a high modification rate, the destination skips intermediate images, but each image it does receive is consistent]

34. Wave 5: Local File Sharing and the Virtual Interface Architecture

35. ISPs: Scalable Services
    - Scalability: scale compute power and storage independently
    - Resiliency
    - Cost: commodity hardware and Open Systems standards
    [diagram: Internet or intranet, load-balancing switch, application servers, gigabit switch, F760 file servers in the data center]

36. Database
    - Better manageability: offline backup with snapshots, replication, recovery from snapshots, easy storage management
    - Equal or better performance, with less retuning

37. Local File Sharing
    - Geographically constrained: 1 or 2 machine rooms
    - Mostly homogeneous clients
    - Can be large or small: 1-100 machines
    - Single administrative control
    - High-performance applications: web service, cache, email, news, database, GIS

38. Local File Sharing: Architecture Characteristics
    - Applications tend to avoid the OS (e.g. no virtual memory)
    - Applications tend to have an OS adaptation layer
    - Different access-protocol requirements (e.g. high-performance locking, recovery, streaming)

39. What is VI?
    - Virtual Interface (VI) Architecture: promoted by Intel, Compaq, and Microsoft; VI Developer's Forum
    - Standard capabilities: send/receive messages, remote DMA read/write
    - Multiple channels with send/completion queues
    - Data transfer bypasses the kernel
    - Memory pre-registration

40. VI Architecture
    [diagram: user-level application over the VIPL library; kernel clients over KVIPL; control flows through the kernel KVIPL module and the VI-compliant NIC driver, while data moves directly between application buffers and the VI-compliant NIC]

41. VI-Compliant Implementations
    - Fibre Channel (FC-VI draft standard), e.g. Troika, Emulex
    - Giganet
    - ServerNet II
    - InfiniBand: enables 1U MP heads
    - Future: VI over TCP/IP

42. How VI Improves Data Transfer
    - No data copies for fragmentation, reassembly, or realignment
    - No user/kernel boundary crossing
    - No user/kernel data copies
    - Data is transferred directly to application buffers

43. Direct Access File System
    [diagram: application buffers and the file-access API in user space, over DAFS and the VIPL* API; control passes through the kernel NIC driver, while data moves directly between the NIC and application memory]
    * The VI Provider Layer specification is maintained by the VI Developers Forum

44. DAFS Benefits
    - File access protocol with implicit data sharing
    - Direct application access: file data transfers directly to application buffers, bypassing the operating system
    - File semantics
    - Optimized for high throughput and low latency
    - Consistent high-speed locking
    - Graceful recovery/failover of clients and servers: fencing, enhanced data recovery
    - Leverages VI for transport independence

45. DAFS vs. SAN

    Protocols \ Wires   Local attached   Network (TCP/IP)   Direct (direct transfer to memory)
    Block               SAN              SCSI over IP
    File                                 NAS                DAFS

46. Summary
    - Wave 1: Filers
      - Technology: fast networks, commodity servers
      - Environment: appliance-ization
    - Wave 2: Failover
      - Technology: memory-to-memory interconnects, dual-ported FC disks
      - Environment: 24x7 requirements
    - Wave 3: NetCache
      - Technology: Internet, HTTP
      - Environment: high-bandwidth requirements, POP deployability

47. Summary
    - Wave 4: SnapMirror
      - Technology: disk areal density, Fibre Channel, fast networks
      - Environment: cost of downtime for recovery
    - Wave 5: DAFS
      - Technology: VI architecture
      - Environment: local file sharing
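
Appendix: the copy-on-write mechanics behind slides 30-32 (never overwrite active data, a snapshot is a frozen copy of the root, and SnapMirror transmits only the allocated blocks that changed between snapshots) can be sketched in a few lines of Python. This is a minimal toy for illustration only, not WAFL's implementation; the Volume class and all names in it are hypothetical.

```python
class Volume:
    """Toy copy-on-write volume with snapshots and incremental diffs."""

    def __init__(self):
        self.disk = {}          # block number -> data
        self.active = {}        # file offset -> block number (the active FS "tree")
        self.snapshots = {}     # snapshot name -> frozen copy of the block tree
        self.next_block = 0

    def write(self, offset, data):
        # Shadow paging: never overwrite in place, always allocate a fresh block.
        blk = self.next_block
        self.next_block += 1
        self.disk[blk] = data
        self.active[offset] = blk

    def snapshot(self, name):
        # Taking a snapshot is just copying the "root": freeze the block tree.
        self.snapshots[name] = dict(self.active)

    def block_map(self, blk):
        # One column per user of the block: the active FS plus each snapshot.
        users = ['FS'] if blk in self.active.values() else []
        users += [s for s, tree in self.snapshots.items() if blk in tree.values()]
        return users

    def diff(self, old_snap, new_snap):
        # SnapMirror-style incremental update: send only blocks that are
        # allocated in the new snapshot but were not in the old one.
        old = set(self.snapshots[old_snap].values())
        return {off: blk for off, blk in self.snapshots[new_snap].items()
                if blk not in old}
```

For example, writing block B, snapshotting, then rewriting the same offset allocates a new block: the active file system sees the new data while the snapshot still references the old block, exactly as in the slide-31 figure.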
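
Appendix: the NVRAM logging and mirroring sequence of slides 16 and 21 (log the modification locally, DMA the log entry to the equivalent offset in the partner's NVRAM, reply to the client only after the mirror completes, and replay the mirrored log on takeover) can be sketched as follows. Again a hypothetical toy, with Python lists standing in for NVRAM regions and the ServerNet DMA; it is not DataOnTap's actual mechanism.

```python
class Filer:
    """Toy clustered-failover node with mirrored NVRAM logging."""

    def __init__(self, name):
        self.name = name
        self.fs = {}              # in-memory file system state
        self.nvram_local = []     # local NVRAM region: our own log
        self.nvram_partner = []   # partner NVRAM region: mirror of partner's log
        self.partner = None

    def modify(self, key, value):
        entry = (key, value)
        self.nvram_local.append(entry)             # 1. log to local NVRAM
        if self.partner is not None:
            # 2. "DMA" the entry into the partner region of the other node.
            self.partner.nvram_partner.append(entry)
        self.fs[key] = value                       # 3. apply the change
        return 'ok'                                # reply only after the mirror DMA

    def checkpoint(self):
        # At each consistency point the on-disk image advances atomically,
        # so the logged entries can be discarded.
        self.nvram_local.clear()

    def takeover(self):
        # On partner failure, replay the mirrored log into our own state.
        for key, value in self.nvram_partner:
            self.fs[key] = value
        self.nvram_partner.clear()
```

Because the log is replayed rather than the data structures being mirrored, failover needs no second copy of the file system, which matches the slide-18 goal of removing single points of failure without doubling hardware.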