Storage Architecture
CE202
December 2, 2003
David Pease

Hierarchy of Storage
[Figure: storage hierarchy pyramid. Top: faster speed, smaller capacity, higher cost. Bottom: slower speed, larger capacity, lower cost. Levels, top to bottom: Cache, RAM, Disk, Optical, Tape.]

Storage System Components
• Application
• I/O Library
• File System
• Device Driver
• Host Bus Adapter
• Interconnect
• Storage Controller
• Devices
[Figure label: I/O Context]

Disks

Disk Drives
• “Workhorse” of modern storage systems
• Capacity increasing, raw price dropping
  – can buy 1TB for only $1000!
  – bandwidth not keeping pace
  – reliability is actually decreasing
    • massive systems can mean even lower availability
• Majority of cost of ownership in administration, not purchase price
  – backup, configuration, failure recovery

Disk Architecture
[Figure: disk drive anatomy. Labels: spindle, platters, track, sector, cylinder, arms with read/write heads, rotation.]

Disk Storage Density
[Figure]

Disk Capacity Growth
[Figure]

IBM Disk Storage Roadmap
[Figure]

Storage Costs
[Figure]

RAID
• Redundant Arrays of Inexpensive Disks
• Two orthogonal concepts:
  – data striping for performance
  – redundancy for reliability
• Striped arrays can increase performance, but at the cost of reliability (next page)
  – redundancy can give arrays better reliability than an individual disk

Reliability of Striped Array
[Figure: System Reliability (0 to 1) plotted against Number of Disks (1 to 16).]

One-month Trace of Hardware Failures
[Figure: pie chart of failure causes]
• disk failure: 42%
• others: 26%
• disk subsystem: 10%
• disk error: 10%
• power supply: 6%
• FS error: 6%
Trace collected from the Internet Archive (March 2003) (thanks Kelly Gottlib)
-- Over 100 terabytes of compressed data
-- 30 disk failures out of total 70 hardware problems

RAID Levels
Level  Description                            Additional Disks  Failures Tolerated
0      Non-redundant striping                 0                 0
1      Mirrored                               n                 1
2      Memory-style ECC                       1+lg n            1
3      Bit-Interleaved Parity                 1                 1
4      Block-Interleaved Parity               1                 1
5      Block-Interleaved, Distributed Parity  1                 1
6      P+Q Redundancy                         2                 2

RAID Levels
[Figure: RAID levels 0 through 6.]

RAID: 4x Small Write Penalty
[Figure: a small data write with an xor step; numbered labels 1 to 5.]

Log-Structured File Systems
• Based on the assumption that
disk traffic will become dominated by writes
• Always writes disk data sequentially, into next available location on disk
  – no seeks on write
• Eliminates problem of 4x write penalty
  – all writes are “new”, no need to read old data or parity
• However, almost no examples in industry file systems

Tape

Tape Media
• Inherently sequential
  – long time to first byte
  – no random I/O
• Subject to mechanical stress
  – number of read-write cycles lower than disk
• Problems as an archival medium:
  – readers go away after some years
    • most rapidly in recent years
  – tapes (with data) remain in a salt mine

Tape Media
• Density will always trail that of disk
  – Tape stretches, more difficult to get higher density
• Alignment also an issue
  – once it’s past the head, it’s gone
  – more conservative techniques required
• Bottom line: mechanical engineering issues for tape are the difficult ones

Optical
• CD, CD-R/RW, DVD, DVD-R/RW
  – Capacities:
    • CD: ~700MB (huge 20 years ago!)
    • DVD:
      – single sided, single layer: ~4.7GB
      – single sided, double layer: ~8.5GB
      – double sided, single layer: ~9.4GB
      – double sided, double layer: ~17GB
• Size of cell limited by wavelength of light
  – current lasers are red
  – blue lasers are under development, then UV, ...
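
The point that cell size is limited by the wavelength of light can be made concrete with a rough scaling estimate. The sketch below is illustrative only and rests on assumptions that are not in the slides: the focused spot diameter is taken to scale as wavelength divided by numerical aperture (NA), so areal density scales as (NA / wavelength) squared. The function name and the wavelength and NA figures are typical published values chosen for illustration, and the model ignores coding and format changes.

```python
# Rough sketch: how laser wavelength bounds optical areal density.
# Assumption (not from the slides): spot diameter ~ wavelength / NA,
# so areal density ~ (NA / wavelength) ** 2.

def relative_density(wavelength_nm, na, ref_wavelength_nm, ref_na):
    """Areal density of one optical system relative to a reference system."""
    spot = wavelength_nm / na              # proportional to spot diameter
    ref_spot = ref_wavelength_nm / ref_na  # same quantity for the reference
    return (ref_spot / spot) ** 2          # smaller spot means more cells per area

# Illustrative (wavelength in nm, numerical aperture) pairs; assumed values.
CD = (780, 0.45)    # infrared laser
DVD = (650, 0.60)   # red laser
BLUE = (405, 0.85)  # blue-violet laser

dvd_vs_cd = relative_density(*DVD, *CD)
blue_vs_dvd = relative_density(*BLUE, *DVD)

print(f"DVD vs CD density gain from optics alone:  {dvd_vs_cd:.2f}x")
print(f"Blue vs DVD density gain from optics alone: {blue_vs_dvd:.2f}x")
```

Under these assumptions the optics alone give a gain of about 2.6x from CD to DVD and about 5.2x from red to blue lasers, which is why shorter wavelengths are the main route to denser optical media.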
Optical
• Magneto-optical (HAMR)
  – heat from laser makes changing direction of magnetization easier (so cell is smaller)

MEMS
• MicroElectroMechanical Systems
  – 6-10 times faster than disk
  – cost and capacity issues

Magnetic RAM (MRAM)
• Stores each bit in a magnetic cell rather than a capacitor or flip-flop
  – data is persistent
• Can be read and written very quickly
  – Read and write times 0.5 – 10 µs or less
  – Individual bits are writeable (no block erase)
• Density & cost comparable to DRAM
  – may require density/speed tradeoffs
  – denser MRAM may have to run slower because of heat dissipation on writes

Magnetic RAM (MRAM)
• Several companies have announced partnerships to produce products ~2003
• Ideas for use of MRAM in storage:
  – Persistent cache
    • Hot data in MRAM, cold data to disk
    • No need to flush write cache to avoid data loss
  – HeRMES
    • all metadata in MRAM
    • enough file data in MRAM to hide disk latency for first access to a file

Peripheral Buses
• SCSI
• IDE/ATA
• HIPPI (High Performance Parallel Intf.)
• IEEE 1394 (FireWire)
• Fibre Channel (FCP)
• IP (e.g., iSCSI)
• InfiniBand
• Serial ATA

Peripheral Buses
• Parallel
  – SCSI, most printers, IBM Channels
  – 1 or more bytes per clock
  – Skew problems at high speeds
• Serial
  – FC, RS232, IEEE 1394 (FireWire)
  – 1 bit per clock, self clocking
  – can be run at much higher speeds than parallel bus

Networked Storage
• Storage attached by general-purpose or dedicated network (e.g., Fibre Channel)
• Motivations:
  – homogeneous and heterogeneous file sharing
  – centralized administration
  – better resource utilization (shared storage resources, pooling)
• Dedicated Networks:
  – Fibre Channel: FCP (SCSI over FC)
  – iSCSI: SCSI over IP
  – InfiniBand

Networked Storage
• Can mean many things:
  – NAS (Network-Attached Storage): file server appliances serving NFS and/or CIFS (for example, Network Appliance)
  – NASD (Network-Attached Secure Disk): intelligent, network-attached drives w/ security features (also, Network-Attached Storage Device)
  – SAN (Storage Area Network): network for attaching disks and computers, usually dedicated only to storage operations
  – OBSD (Object-Based Storage Device): similar to NASD

A SAN File System
[Figure: SAN file system architecture. Labels: NFS, CIFS, FTP, HTTP; Control Network (IP); clients Win2K, AIX, Solaris, Linux, each with IFS w/cache; three Meta-data Servers; SAN; Security assists; Data; Metadata; HSM & Backup; Storage Management Server.]

Additional Reading
• Hennessy & Patterson: Chapter 6
• Chen, Lee, Gibson, Katz, & Patterson: RAID: high-performance, reliable secondary storage. ACM Computing Surveys 26, June 1994, 145-185
• Rosenblum & Ousterhout: The design and implementation of a log-structured file system. ACM Transactions on Computer Systems, Feb. 1992, 26-52
• Gibson, Nagle, et al.: A cost-effective, high-bandwidth storage architecture. Proceedings of the Eighth Conference on Architectural Support for Programming Languages and Operating Systems, 1998
• http://www.almaden.ibm.com/cs/storagesystems/stortank/
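
As a supplement to the RAID slides, here is a minimal Python sketch of two ideas mentioned there: the reliability of a striped array and the 4x small-write penalty. It is an illustration under stated assumptions, not the method of any particular RAID implementation. Disk failures are assumed independent, the per-disk reliability of 0.95 is an arbitrary example value, and all function names are hypothetical.

```python
from functools import reduce

def striped_reliability(disk_reliability, n_disks):
    """Probability that every disk in a non-redundant stripe survives.

    Assumes independent failures: the array works only if all n disks work.
    """
    return disk_reliability ** n_disks

def xor_blocks(*blocks):
    """Bytewise XOR of equal-length blocks (the parity operation)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def small_write_parity(old_data, old_parity, new_data):
    """New parity for a one-block update: P' = P xor D_old xor D_new.

    The update costs four I/Os: read old data, read old parity,
    write new data, write new parity. That is the 4x small-write penalty.
    """
    return xor_blocks(old_parity, old_data, new_data)

# Striping: with an assumed per-disk reliability of 0.95, 16 disks are
# far less reliable as a group than any single disk.
r16 = striped_reliability(0.95, 16)
print(f"16-disk striped array reliability: {r16:.3f}")

# Small write on a three-data-block stripe with one parity block.
stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_blocks(*stripe)

new_block = b"\xaa\xbb"
new_parity = small_write_parity(stripe[1], parity, new_block)
stripe[1] = new_block

# The incrementally updated parity matches a full recomputation,
# and a lost block can be rebuilt from parity plus the survivors.
full_parity = xor_blocks(*stripe)
rebuilt = xor_blocks(new_parity, stripe[0], stripe[2])
print(new_parity == full_parity, rebuilt == stripe[1])
```

The incremental update touches only the data block and the parity block, which is why a small write needs two reads and two writes rather than a read of the whole stripe. A log-structured layout avoids this by writing only new data.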