SSDs: advantages
• deliver higher speed than hard disks
• consume less power
• offer the same standard interfaces as HDDs
SSDs: critical technical constraints
• no in-place updates
• no random writes to pages
• erasure limit: cells wear out after a certain number of program/erase cycles
Erasure limit: SLC vs MLC
• SLC: 100,000 cycles
• MLC: 10,000 cycles
Erasure limit: RBER (raw bit error rate) vs UBER (uncorrectable bit error rate)
Solution: SSD-RAID
• RAID offers device-level redundancy
• RAID is an effective method of constructing large-scale, high-performance, and high-reliability storage systems
• SSD-RAID combines the advantages of classic RAID and state-of-the-art SSDs
Two parity-based SSD-RAID systems
• Differential RAID
• CSWL-RAID: Cross-SSD Wear-Leveling
• Both share the same assumption: parity blocks are updated more often than data blocks, so devices holding more parity receive more writes and consequently age faster
Differential RAID
• The problem with RAID for SSDs:
• conventional RAID levels cause multiple SSDs to wear out at approximately the same rate, so several devices can reach their erasure limit close together
Differential RAID: RAID5 case
Differential RAID: features
• Uneven Parity Distribution
• Parity-Shifting Drive Replacement
Uneven Parity Distribution: example
• RAID-4: ( 100, 0, 0, 0, 0)
• RAID-5: ( 20, 20, 20, 20, 20)
• Diff-RAID: ( 40, 15, 15, 15, 15)
Uneven Parity Distribution: aging rate
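The relative aging rates behind the three parity assignments above can be sketched as follows. This is a minimal illustration, assuming a purely random small-write workload in which every write updates one data block and the parity block of the same stripe; the function names `relative_write_load` and `aging_ratio` are hypothetical and not taken from the slides.

```python
from fractions import Fraction


def relative_write_load(parity_percent):
    """Relative write load per device for a parity assignment given in percent.

    Assumption: a random small write picks a stripe uniformly, rewrites one of
    its n-1 data blocks and its parity block. Device i is the parity device
    with probability p_i/100 and the data device with probability
    (1 - p_i/100) / (n - 1). Scaling by 100 * (n - 1) gives integer weights.
    """
    n = len(parity_percent)
    return [p * (n - 1) + (100 - p) for p in parity_percent]


def aging_ratio(parity_percent, i, j):
    """How much faster device i ages than device j under the assumption above."""
    load = relative_write_load(parity_percent)
    return Fraction(load[i], load[j])


if __name__ == "__main__":
    layouts = [
        ("RAID-4", (100, 0, 0, 0, 0)),
        ("RAID-5", (20, 20, 20, 20, 20)),
        ("Diff-RAID", (40, 15, 15, 15, 15)),
    ]
    for name, parity in layouts:
        print(name, float(aging_ratio(parity, 0, 1)))
```

Under these assumptions the first device ages 4 times as fast as the others for RAID-4, at the same rate for RAID-5, and about 1.5 times as fast for the (40, 15, 15, 15, 15) assignment.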
Parity-Shifting Drive Replacement: example
Analysis of Age Distribution Convergence
• Distribution of device ages at replacement time for (80,5,5,5,5) parity assignment
Analysis of Age Distribution Convergence
• Convergent distribution of ages at replacement time for different parity assignments
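The convergence of device ages at replacement time can be explored with a small simulation. This is a sketch under stated assumptions, not the evaluation used in the slides: writes are random small writes (one data block plus one parity block per write), ages are expressed as a percentage of the erasure limit, a device is replaced exactly when it reaches 100%, and after each replacement the oldest remaining device receives the largest parity share. The function name `ages_at_replacement` is hypothetical.

```python
def ages_at_replacement(parity_percent, rounds):
    """Simulate repeated drive replacements and record ages at each replacement.

    Returns a list with one entry per replacement; each entry lists device
    ages (percent of erasure limit used), oldest device first.
    """
    n = len(parity_percent)
    # Largest parity share first; it is always given to the oldest device.
    shares = sorted(parity_percent, reverse=True)
    # Relative write load per position under the random small-write assumption.
    weights = [p * (n - 1) + (100 - p) for p in shares]

    ages = [0.0] * n
    history = []
    for _ in range(rounds):
        ages.sort(reverse=True)                      # oldest device at position 0
        step = (100.0 - ages[0]) / weights[0]        # time until it hits the limit
        ages = [a + w * step for a, w in zip(ages, weights)]
        history.append(list(ages))
        ages[0] = 0.0                                # swap in a brand-new device
    return history


if __name__ == "__main__":
    history = ages_at_replacement((80, 5, 5, 5, 5), rounds=100)
    for label, ages in (("first", history[0]), ("last", history[-1])):
        print(label, [round(a, 2) for a in ages])
```

In this simplified model the ages for the (80, 5, 5, 5, 5) assignment settle at roughly (100, 57.5, 43.1, 28.8, 14.4) percent of the erasure limit, so only one device is near its limit at any replacement. With an even assignment such as (20, 20, 20, 20, 20) every device reaches 100% at the same moment.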
Trade-off between reliability and throughput
• the more skewed the parity distribution towards a single device
• the higher the age differential
• the higher the reliability
• the lower the throughput
Diff-RAID Reliability Evaluation
• Reliability of Diff-RAID
• Reliability of Diff-RAID Configurations
• Reliability with Different Flash Types
• Reliability with Different ECC Levels
• Reliability Beyond Erasure Limit
• Reliability on Real Workloads
Reliability of Diff-RAID
• Diff-RAID reliability changes over time and converges to a steady state
Reliability of Diff-RAID Configurations
Reliability with Different Flash Types
Reliability with Different ECC Levels
Reliability Beyond Erasure Limit
Reliability on Real Workloads
Diff-RAID Performance Evaluation
• Diff-RAID Throughput
• Performance Under Real Workloads
• Recovery Time
Diff-RAID Throughput
Performance Under Real Workloads
Recovery Time
Differential RAID: disadvantages
• Assumes a perfectly random workload: the actual age of the devices is not taken into account
• Parity-Shifting Drive Replacement: reconstructing data and redistributing parity is complex and very time-consuming
• Trade-off between reliability and throughput: a suitable trade-off point is hard to determine
CSWL-RAID: Why is CSWL needed
• RAID5 and RAID6 cannot ensure wear leveling among devices under an imperfectly random workload
CSWL-RAID: Basic Principle
• change the wear rate of individual SSDs by dynamically adjusting the fraction of parity they hold
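The idea of steering wear through the parity fraction can be illustrated with a toy policy. The slides do not state the exact rule CSWL-RAID uses, so the policy below (parity share inversely proportional to device age) is purely an assumption for illustration, and `parity_shares_from_ages` is a hypothetical name. The age tuples are the ones that appear in the data layout slides.

```python
def parity_shares_from_ages(ages):
    """Hypothetical policy: give younger devices a larger share of parity.

    `ages` are positive relative wear values, one per device. The returned
    list holds parity shares in percent and sums to 100. This is NOT the
    rule from the CSWL-RAID design, only an illustrative stand-in.
    """
    if any(a <= 0 for a in ages):
        raise ValueError("ages must be positive")
    inverse = [1.0 / a for a in ages]
    total = sum(inverse)
    return [100.0 * x / total for x in inverse]


if __name__ == "__main__":
    for ages in [(1, 1, 1, 1), (3, 3, 3, 1), (2, 2, 1, 1)]:
        shares = parity_shares_from_ages(ages)
        print(ages, [round(s, 1) for s in shares])
```

With equal ages the toy policy falls back to an even, RAID5-like parity spread; with uneven ages the less worn devices take more parity and therefore more writes, which pulls the ages back together.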
CSWL-RAID: Practical Architecture
CSWL-RAID: Basic data layout
Age distribution (1,1,1,1)
Age distribution (3,3,3,1)
Age distribution (2,2,1,1)
CSWL-RAID: Improved data layout
Age distribution (1,1,1,1)
Age distribution (3,3,3,1)
Age distribution (2,2,1,1)
CSWL-RAID: Addressing Method
• RAID4 case
• RAID5 case
• Basic CSWL-RAID5 case
CSWL-RAID: Addressing Method
• Improved CSWL-RAID5 case
CSWL-RAID: Average latency
CSWL-RAID: Redistribution time
• CSWL-RAID5 case
• CSWL-RAID6 case
CSWL-RAID: Age difference
CSWL-RAID: Reliability
CSWL-RAID: disadvantages
• All SSDs wear out at approximately the same rate: lower reliability and shorter lifetime
• The addressing method is too complex: the addressing algorithm runs in O(t), where t is the number of redistributions performed so far