Storage Architecture
CE202
December 2, 2003
David Pease
Hierarchy of Storage
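[Figure: storage hierarchy of Cache, RAM, Disk, Optical, and Tape, ranked by speed, capacity, and cost; levels toward the top are faster, smaller, and higher cost, levels toward the bottom are slower, larger, and lower cost]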
Storage System Components
• Application
• I/O Library
• File System
• Device Driver
• Host Bus Adapter
• Interconnect
• Storage Controller
• Devices
[Figure: I/O context]
Disks
Disk Drives
• “Workhorse” of modern storage systems
• Capacity increasing, raw price dropping
– can buy 1TB for only $1000!
– bandwidth not keeping pace
– reliability is actually decreasing
• massive systems can mean even lower availability
• The majority of the cost of ownership is in
administration, not purchase price
– backup, configuration, failure recovery
Disk Architecture
[Figure: disk architecture, labeling the spindle, platters, tracks, sectors, a cylinder, arms with read/write heads, and the direction of rotation]
Disk Storage Density
Disk Capacity Growth
IBM Disk Storage Roadmap
Storage Costs
RAID
• Redundant Arrays of Inexpensive Disks
• Two orthogonal concepts:
– data striping for performance
– redundancy for reliability
• Striped arrays can increase performance, but
at the cost of reliability (next page)
– redundancy can give arrays better reliability than an
individual disk
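Striping for performance comes down to an address mapping. A minimal RAID 0 sketch, assuming a fixed stripe unit measured in blocks and round-robin placement across disks; the name `stripe_map` and its parameters are illustrative:

```python
def stripe_map(logical_block, num_disks, stripe_unit_blocks):
    """RAID 0: map a logical block to (disk index, block offset on that disk)."""
    unit = logical_block // stripe_unit_blocks    # which stripe unit, counting across the array
    offset = logical_block % stripe_unit_blocks   # position inside that stripe unit
    disk = unit % num_disks                       # stripe units go round-robin over the disks
    stripe = unit // num_disks                    # which full stripe (row) the unit belongs to
    return disk, stripe * stripe_unit_blocks + offset


# 4 disks, 2-block stripe units: blocks 0-7 fill one full stripe across all disks.
for block in range(10):
    print(block, stripe_map(block, num_disks=4, stripe_unit_blocks=2))
```

A request for blocks 0 through 7 touches all four disks in parallel, which is the source of the bandwidth gain; the same spreading means that losing any one disk damages every large file.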
Reliability of Striped Array
[Figure: system reliability (0 to 1) versus number of disks (1 to 16) for a striped array]
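The falling curve follows from simple probability. A sketch, assuming independent disk failures and a hypothetical per-disk reliability r over the period of interest: with no redundancy the array survives only if all n disks survive, so R = r^n. Under the further assumption of constant failure rates, the array MTTF is the single-disk MTTF divided by n.

```python
def striped_reliability(disk_reliability, num_disks):
    """Probability that a non-redundant striped array survives a period.

    Assumes disks fail independently and the array is lost if any disk fails.
    """
    return disk_reliability ** num_disks


def array_mttf_hours(disk_mttf_hours, num_disks):
    """Array MTTF assuming constant (exponential) failure rates, which add up."""
    return disk_mttf_hours / num_disks


for n in (1, 2, 4, 8, 16):
    print(n, round(striped_reliability(0.9, n), 3))  # r = 0.9 is a hypothetical value
print(array_mttf_hours(500_000, 100))                 # 5000.0 hours
```

With r = 0.9, a 16-disk stripe survives with probability of only about 0.185.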
One-month Trace of Hardware Failures
• disk failure: 42%
• others: 26%
• disk error: 10%
• disk subsystem: 10%
• power supply: 6%
• FS error: 6%
Trace collected from the Internet Archive (March 2003)
(thanks Kelly Gottlib)
– Over 100 terabytes of compressed data
– 30 disk failures out of a total of 70 hardware problems
RAID Levels
Level  Description                            Additional Disks  Failures Tolerated
0      Non-redundant striping                 0                 0
1      Mirrored                               n                 1
2      Memory-style ECC                       1 + lg n          1
3      Bit-Interleaved Parity                 1                 1
4      Block-Interleaved Parity               1                 1
5      Block-Interleaved, Distributed Parity  1                 1
6      P+Q Redundancy                         2                 2
(n = number of data disks)
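For levels 3 through 5, the single redundant block in each stripe is the XOR (parity) of the data blocks, so any one lost block can be rebuilt from the others. A minimal sketch with byte strings standing in for disk blocks; the function names are illustrative, and `raid5_parity_disk` shows just one assumed way to rotate parity across disks for level 5 (real layouts vary). Level 6's second, independent check block (Q) is not shown.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks (at least one block required)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(surviving_blocks, parity):
    """Rebuild one lost data block: XOR the parity with every surviving block."""
    return xor_blocks(list(surviving_blocks) + [parity])


def raid5_parity_disk(stripe, num_disks):
    """One simple parity rotation for RAID 5 (an assumption; real layouts vary)."""
    return (num_disks - 1) - (stripe % num_disks)


data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]     # three data blocks in one stripe
parity = xor_blocks(data)                           # b"\xee\x22"
assert reconstruct([data[0], data[2]], parity) == data[1]
print([raid5_parity_disk(s, 4) for s in range(6)])  # [3, 2, 1, 0, 3, 2]
```

Rotating parity (level 5) rather than dedicating one disk to it (level 4) keeps the parity disk from becoming the bottleneck for writes.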
RAID Levels
[Figure: data and redundancy layouts for RAID levels 0 through 6]
RAID: 4x Small Write Penalty
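[Figure: a small data write to one block of a parity stripe takes four disk I/Os: read the old data, read the old parity, XOR both with the new data to form the new parity, then write the new data and the new parity]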
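The four I/Os come from keeping parity consistent without reading the rest of the stripe: new parity = old parity XOR old data XOR new data. A toy sketch, assuming integers stand in for blocks and a single parity disk (as in level 4); the names are illustrative:

```python
def small_write(disks, data_disk, parity_disk, block, new_data):
    """Update one block in a parity stripe; returns the number of disk I/Os used.

    `disks` is a toy model: a list of disks, each a list of integer "blocks".
    """
    ios = 0
    old_data = disks[data_disk][block]      # read old data
    ios += 1
    old_parity = disks[parity_disk][block]  # read old parity
    ios += 1
    # XOR out the old data and XOR in the new data; the other disks are untouched.
    new_parity = old_parity ^ old_data ^ new_data
    disks[data_disk][block] = new_data      # write new data
    ios += 1
    disks[parity_disk][block] = new_parity  # write new parity
    ios += 1
    return ios


disks = [[3], [5], [6], [3 ^ 5 ^ 6]]        # three data disks plus a parity disk
print(small_write(disks, data_disk=1, parity_disk=3, block=0, new_data=9))  # 4
assert disks[3][0] == disks[0][0] ^ disks[1][0] ^ disks[2][0]
```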
Log-Structured File Systems
• Based on assumption that disk traffic will
become dominated by writes
• Always writes disk data sequentially, into next
available location on disk
– no seeks on write
• Eliminates problem of 4x write penalty
– all writes are “new”, no need to read old data or
parity
• However, there are almost no examples among industry
file systems
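A toy sketch of the log-structured idea, assuming an in-memory list stands in for the sequential disk and a dictionary stands in for the map that locates the newest copy of each block (roughly the role of the inode map in Rosenblum and Ousterhout's design). Segments, checkpoints, and the cleaner are omitted; all names are hypothetical:

```python
class ToyLog:
    """Append-only log with a map from (file, block) to the newest copy's position."""

    def __init__(self):
        self.log = []        # the "disk": entries are appended, never overwritten
        self.block_map = {}  # (file_id, block_no) -> index of the newest copy in log

    def write(self, file_id, block_no, data):
        # Every write goes to the tail: no read of old data or parity is needed.
        self.log.append(data)
        self.block_map[(file_id, block_no)] = len(self.log) - 1

    def read(self, file_id, block_no):
        return self.log[self.block_map[(file_id, block_no)]]

    def live_fraction(self):
        """Fraction of log entries still referenced; the rest is garbage to clean."""
        if not self.log:
            return 1.0
        return len(self.block_map) / len(self.log)


fs = ToyLog()
fs.write("a", 0, b"v1")
fs.write("a", 1, b"x")
fs.write("a", 0, b"v2")    # an overwrite becomes a new append
print(fs.read("a", 0))     # b'v2'
print(fs.live_fraction())  # 2 of 3 entries are live
```

Rewriting block 0 leaves its old copy in the log as garbage; reclaiming that space (segment cleaning) is a major cost an LFS pays for its sequential writes.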
Tape
Tape Media
• Inherently sequential
– long time to first byte
– no random I/O
• Subject to mechanical stress
– number of read-write cycles lower than
disk
• Problems as an archival medium:
– drives that can read the tapes become unavailable after some years
• most rapidly in recent years
– meanwhile, tapes (with data) remain in a salt mine
Tape Media
• Density will always trail that of disk
– Tape stretches, making higher density more
difficult to achieve
• Alignment also an issue
– once it’s past the head, it’s gone
– more conservative techniques required
• Bottom line: mechanical engineering
issues for tape are the difficult ones
Optical
• CD, CD-R/RW, DVD, DVD-R/RW
– Capacities:
• CD: ~700MB (huge 20 years ago!)
• DVD:
– single sided, single layer: 5GB
– single sided, double layer: 9GB
– double sided, single layer: 10GB
– double sided, double layer: 18GB
• Size of cell limited by wavelength of light
– current lasers are red
– blue lasers are under development, then UV, ...
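A first-order way to see why shorter wavelengths help: the focused spot diameter scales with λ/NA (wavelength over the lens's numerical aperture), so areal density scales roughly with (NA/λ)². A sketch using commonly quoted CD and DVD optics; the blue-laser NA is a hypothetical value and the function name is illustrative:

```python
def relative_areal_density(wavelength_nm, numerical_aperture, ref_wavelength_nm, ref_na):
    """First-order areal density relative to a reference optical system.

    The focused spot diameter scales with wavelength / NA, so the number of
    cells per unit area scales roughly with (NA / wavelength) ** 2.
    """
    spot_ratio = (ref_wavelength_nm / ref_na) / (wavelength_nm / numerical_aperture)
    return spot_ratio ** 2


# Commonly quoted optics: CD 780 nm / NA 0.45, DVD 650 nm / NA 0.60 (both red/infrared).
print(round(relative_areal_density(650, 0.60, 780, 0.45), 2))  # 2.56
# A 405 nm blue laser with a hypothetical NA of 0.85:
print(round(relative_areal_density(405, 0.85, 780, 0.45), 1))
```

The optics alone predict about 2.6× for DVD over CD, less than the capacity ratio on the slide (~700MB to ~5GB), since track pitch, modulation coding, and error-correction overhead also changed.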
Optical
• Magneto-optical / HAMR (Heat-Assisted Magnetic Recording)
– heat from a laser makes it easier to change the direction
of magnetization (so cells can be smaller)
MEMS
• MicroElectroMechanical Systems
– 6-10 times faster than disk
– cost and capacity issues
Magnetic RAM (MRAM)
• Stores each bit in a magnetic cell rather than
a capacitor or flip-flop
– data is persistent
• Can be read and written very quickly
– Read and write times 0.5 – 10 µs or less
– Individual bits are writeable (no block erase)
• Density & cost comparable to DRAM
– may require density/speed tradeoffs
– denser MRAM may have to run slower because of
heat dissipation on writes
Magnetic RAM (MRAM)
• Several companies have announced
partnerships to produce products ~2003
• Ideas for use of MRAM in storage:
– Persistent cache
• Hot data in MRAM, cold data to disk
• No need to flush write cache to avoid data loss
– HeRMES
• all metadata in MRAM
• enough file data in MRAM to hide disk latency for first
access to a file
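Keeping "enough file data in MRAM to hide disk latency" can be sized with one multiplication: bytes needed = read rate × time to first byte from disk. A back-of-the-envelope sketch, assuming a steady sequential reader and that the disk streams the rest of the file once positioned; the function name and the numbers are hypothetical:

```python
import math


def prefix_bytes_to_hide_latency(disk_latency_ms, read_rate_mb_s):
    """Bytes of a file's beginning to keep in MRAM so a reader never waits on disk.

    While the disk seeks and rotates to the rest of the file, the application
    consumes the MRAM-resident prefix at read_rate_mb_s; the prefix must last
    at least disk_latency_ms.
    """
    return math.ceil(read_rate_mb_s * 1e6 * disk_latency_ms / 1000)


# Hypothetical numbers: 10 ms to first byte from disk, reader consuming 50 MB/s.
per_file = prefix_bytes_to_hide_latency(10, 50)
print(per_file)                 # 500000 bytes
print(per_file * 10_000 / 1e9)  # 5.0 GB of MRAM for the prefixes of 10,000 files
```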
Peripheral Buses
• SCSI
• IDE/ATA
• HIPPI (High Performance Parallel Interface)
• IEEE 1394 (FireWire)
• Fibre Channel (FCP)
• IP (e.g., iSCSI)
• InfiniBand
• Serial ATA
Peripheral Buses
• Parallel
– SCSI, most printers, IBM Channels
– 1 or more bytes per clock
– Skew problems at high speeds
• Serial
– FC, RS232, IEEE1394 (FireWire)
– 1 bit per clock, self clocking
– can be run at much higher speeds than
parallel bus
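Peak payload bandwidth is width × clock for a parallel bus and line rate × coding efficiency for a serial link. A sketch with illustrative parameters that resemble a 16-bit SCSI bus at 40 MHz transferring on both clock edges, and a 1.0625 Gbaud Fibre Channel link using 8b/10b coding; the function names are hypothetical:

```python
def parallel_bus_mb_s(width_bits, clock_mhz, transfers_per_clock=1):
    """Peak payload rate of a parallel bus: all data lines transfer together."""
    return width_bits / 8 * clock_mhz * transfers_per_clock


def serial_link_mb_s(line_rate_gbaud, payload_bits=8, coded_bits=10):
    """Peak payload rate of a self-clocking serial link, one bit per symbol.

    The default 8-of-10 ratio models 8b/10b line coding (an assumption here);
    the extra bits guarantee enough transitions for the receiver to recover the clock.
    """
    line_mbit_s = line_rate_gbaud * 1000
    return line_mbit_s * payload_bits / coded_bits / 8


print(parallel_bus_mb_s(16, 40, transfers_per_clock=2))  # 160.0
print(serial_link_mb_s(1.0625))                          # 106.25
```

The parallel bus gets its rate from width, but all 16 lines must stay in step (skew); the serial link gets its rate from clock speed alone, which is why serial links can be pushed to higher speeds.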
Networked Storage
• Storage attached by a general-purpose or
dedicated network (e.g., Fibre Channel)
• Motivations:
– homogeneous and heterogeneous file sharing
– centralized administration
– better resource utilization (shared storage
resources, pooling)
• Dedicated Networks:
– Fibre Channel: FCP (SCSI over FC)
– iSCSI: SCSI over IP
– InfiniBand
Networked Storage
• Can mean many things:
– NAS (Network-Attached Storage): file server
appliances serving NFS and/or CIFS (for example,
Network Appliance)
– NASD (Network-Attached Secure Disk): intelligent,
network-attached drives w/ security features (also,
Network-Attached Storage Device)
– SAN (Storage Area Network): network for attaching
disks and computers, usually dedicated only to
storage operations
– OBSD (Object-Based Storage Device): similar to NASD
A SAN File System
[Figure: SAN file system architecture. Labels: NFS, CIFS, FTP, HTTP; Win2K, AIX, Solaris, and Linux clients, each running IFS w/cache; an IP control network connecting clients to a group of meta-data servers; a SAN carrying data; security assists; data HSM & backup; storage management server]
Additional Reading
• Hennessy & Patterson: Chapter 6
• Chen, Lee, Gibson, Katz, & Patterson: RAID: high
performance, reliable secondary storage. ACM Computing
Surveys 26, June 1994, 145-185
• Rosenblum & Ousterhout: The design and implementation
of a log-structured file system. ACM Transactions on
Computer Systems, Feb. 1992, 26-52
• Gibson, Nagle, et al.: A cost-effective, high-bandwidth
storage architecture. Proceedings of the Eighth Conference
on Architectural Support for Programming Languages and
Operating Systems, 1998
• http://www.almaden.ibm.com/cs/storagesystems/stortank/