Bridging the Information Gap in Storage Protocol Stacks
Timothy E. Denehy, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau
University of Wisconsin, Madison

State of Affairs
• File system: namespace, files, metadata, layout, free space
• Interface: block-based, read/write
• Storage system: parallelism, redundancy

Problem
• The information gap between file system and storage system may cause problems
  – Poor performance
      • Partial-stripe write operations
  – Duplicated functionality
      • Logging in both the file system and the storage system
  – Reduced functionality
      • Storage system lacks knowledge of files
• Time to re-examine the division of labor

Our Approach
• Enhance the storage interface
  – Expose performance and failure information
• Use this information to provide new functionality
  – On-line expansion
  – Dynamic parallelism
  – Flexible redundancy
(Diagram: Informed LFS (I·LFS) running on Exposed RAID (ERAID))

Outline
• ERAID Overview
• I·LFS Overview
• Functionality and Evaluation
  – On-line expansion
  – Dynamic parallelism
  – Flexible redundancy
  – Lazy redundancy
• Conclusion

ERAID Goals
• Backwards compatibility
  – Block-based interface
  – Linear, concatenated address space
• Expose information to the file system above
  – Regions
  – Performance
  – Failure
• Allow the file system to utilize semantic knowledge

ERAID Regions
• Region: a contiguous portion of the address space
• Regions can be added to expand the address space
• Region composition
  – RAID: one region for all disks
  – Exposed: a separate region for each disk
  – Hybrid

ERAID Performance Information
• Exposed on a per-region basis
• Queue length and throughput
• Reveals
  – Static disk heterogeneity
  – Dynamic performance and load fluctuations

ERAID Failure Information
• Exposed on a per-region basis
• Number of tolerable failures
• Reveals
  – Static differences in failure characteristics
  – Dynamic failures, to the file system above

I·LFS Overview
• Log-structured file system
  – Transforms all writes into large sequential writes
  – All data and metadata are written to a log
  – The log is a collection of segments
  – A segment table describes each segment
  – A cleaner process produces empty segments
• Why use LFS for an informed file system?
  – Write-anywhere design provides flexibility
  – Ideas are applicable to other file systems

I·LFS Overview (continued)
• Goals
  – Improve performance, functionality, and manageability
  – Minimize system complexity
• Exploits ERAID information to provide
  – On-line expansion
  – Dynamic parallelism
  – Flexible redundancy
  – Lazy redundancy

I·LFS Experimental Platform
• NetBSD 1.5
• 1 GHz Intel Pentium III Xeon
• 128 MB RAM
• Four fast disks
  – Seagate Cheetah 36XL, 21.6 MB/s
• Four slow disks
  – Seagate Barracuda 4XL, 7.5 MB/s

I·LFS Baseline Performance
• Four slow disks: 30 MB/s
• Four fast disks: 80 MB/s

I·LFS On-line Expansion
• Goal: expand storage incrementally
  – Capacity
  – Performance
• Ideal: instant disk addition
  – Minimize downtime
  – Simplify administration
• I·LFS supports on-line addition of new disks

I·LFS On-line Expansion Details
• ERAID: expandable address space
• Expansion is equivalent to adding empty segments
• Start with an oversized segment table
• Activate the new portion of the segment table when disks are added

I·LFS On-line Expansion Experiment
• I·LFS immediately takes advantage of each extra disk

I·LFS Dynamic Parallelism
• Goal: perform well on heterogeneous storage
  – Static performance differences
  – Dynamic performance fluctuations
• Ideal: maximize throughput of the storage system
• I·LFS writes data in proportion to each region's performance

I·LFS Dynamic Parallelism Details
• ERAID: dynamic performance information
• Most file system routines are not changed
  – Aware of only the ERAID linear address space
  – Reduces file system complexity
• Segment selection routine
  – Aware of ERAID regions and performance
  – Chooses the next segment based on current performance

I·LFS Static Parallelism Experiment
• Simple striping is limited by the rate of the slowest disk
• I·LFS provides the full throughput of the system

I·LFS Dynamic Parallelism Experiment
• I·LFS adjusts to the performance fluctuation

I·LFS Flexible Redundancy
• Goal: offer new redundancy options to users
• Ideal: a range of mechanisms and granularities
• I·LFS provides mirrored per-file redundancy

I·LFS Flexible Redundancy Details
• ERAID: region failure characteristics
• Use separate files for redundancy
  – Even inode N for original files
  – Odd inode N+1 for redundant files
  – Original and redundant data in different sets of regions
• Flexible data placement within the regions
• Use recursive vnode operations for redundant files
  – Leverage existing routines to reduce complexity

I·LFS Flexible Redundancy Experiment
• I·LFS provides a tradeoff between throughput and reliability

I·LFS Lazy Redundancy
• Goal: avoid the replication performance penalty
• Ideal: replicate data immediately before a failure
• I·LFS offers redundancy with delayed replication
• Avoids the replication penalty for short-lived files

I·LFS Lazy Redundancy Details
• ERAID: region failure characteristics
• Segments needing replication are flagged
• Cleaner acts as replicator
  – Locates flagged segments
  – Checks data liveness and lifetime
  – Generates redundant copies of files

I·LFS Lazy Redundancy Experiment
• I·LFS avoids the performance penalty for short-lived files

Comparison with Traditional Systems
Can traditional systems provide this functionality?
• On-line expansion: yes
• Dynamic parallelism (heterogeneous storage): yes, but with duplicated functionality
• Flexible redundancy: no, the storage system is not aware of file composition
• Lazy redundancy: no, the storage system is not aware of file deletions

Conclusion
• Introduced ERAID and I·LFS
• Extra information enables new functionality
  – Difficult or impossible in traditional systems
• Minimal complexity
  – 19% increase in code size
• Time to re-examine the division of labor

Questions?
http://www.cs.wisc.edu/wind/
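The ERAID Regions and I·LFS On-line Expansion Details slides describe expansion as two cooperating steps: ERAID appends a region to its linear, concatenated address space, and I·LFS activates the matching slice of a segment table that was oversized at format time, so new space simply appears as empty segments. The Python sketch below models both steps under stated assumptions: the `ERAID` and `SegmentTable` classes, their method names, and the region and segment sizes are hypothetical illustrations, not the authors' NetBSD code. Only the per-disk throughputs (21.6 and 7.5 MB/s) come from the experimental platform slide.

```python
class ERAID:
    """Hypothetical model of an ERAID volume: regions concatenated into a
    single linear block address space, each carrying exposed information."""

    def __init__(self):
        self.regions = []  # each entry: base, size, throughput, tolerable_failures

    def capacity(self):
        """Total size of the linear address space, in blocks."""
        return sum(r["size"] for r in self.regions)

    def add_region(self, size, throughput, tolerable_failures):
        """Append a region at the end of the address space (on-line expansion)."""
        base = self.capacity()
        self.regions.append({"base": base, "size": size,
                             "throughput": throughput,
                             "tolerable_failures": tolerable_failures})
        return base

    def locate(self, block):
        """Map a linear block address to (region index, offset within region)."""
        for i, r in enumerate(self.regions):
            if r["base"] <= block < r["base"] + r["size"]:
                return i, block - r["base"]
        raise IndexError(f"block {block} is beyond the end of the volume")


class SegmentTable:
    """Hypothetical segment table sized for the largest expected volume;
    only the prefix backed by ERAID regions is active."""

    def __init__(self, max_segments, blocks_per_segment):
        self.max_segments = max_segments
        self.blocks_per_segment = blocks_per_segment
        self.active = 0                      # entries backed by real disks
        self.clean = [False] * max_segments  # True means empty and writable

    def expand(self, capacity_blocks):
        """Activate entries for newly added space; new segments start clean."""
        target = capacity_blocks // self.blocks_per_segment
        if target > self.max_segments:
            raise ValueError("segment table too small for this capacity")
        for seg in range(self.active, target):
            self.clean[seg] = True
        self.active = max(self.active, target)
        return self.active


# Usage: grow a volume one disk (region) at a time while it stays on-line.
vol = ERAID()
table = SegmentTable(max_segments=64, blocks_per_segment=128)
vol.add_region(size=1024, throughput=21.6, tolerable_failures=0)
table.expand(vol.capacity())   # 8 active segments
vol.add_region(size=1024, throughput=7.5, tolerable_failures=0)
table.expand(vol.capacity())   # 16 active segments
print(vol.locate(1500))        # (1, 476)
```

Because the rest of the file system sees only the linear address space, nothing except the segment table has to change when a disk is added; the cleaner and writer just find more clean segments.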
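The I·LFS Dynamic Parallelism Details slide says only the segment selection routine knows about regions and their current performance. The slides do not give the selection rule, so the sketch below substitutes a simple greedy policy as an assumption: send the next segment to the region that would finish it soonest, estimated as (segments already assigned + 1) / throughput, assuming equal-sized segments. Over many segments this splits writes in proportion to throughput. The function names are hypothetical.

```python
def choose_region(assigned, throughput):
    """Return the region whose next segment would complete soonest.

    assigned[r]:   segments already directed to region r in this window
    throughput[r]: current throughput estimate for region r (e.g. MB/s),
                   assumed to be refreshed from what ERAID exposes
    """
    return min(range(len(throughput)),
               key=lambda r: (assigned[r] + 1) / throughput[r])


def distribute(n_segments, throughput):
    """Assign n_segments one at a time, re-evaluating after each choice."""
    assigned = [0] * len(throughput)
    for _ in range(n_segments):
        r = choose_region(assigned, throughput)
        assigned[r] += 1
    return assigned


# Usage: a region that is twice as fast receives twice as many segments.
print(distribute(300, [2.0, 1.0]))   # [200, 100]
# Four fast and four slow disks, per-disk rates from the platform slide.
print(distribute(800, [21.6] * 4 + [7.5] * 4))
```

Plain round-robin striping would hand every region the same number of segments, so the slowest disk would set the pace, which is the limitation noted on the Static Parallelism Experiment slide. Calling the same routine with refreshed throughput values is how this policy would follow dynamic fluctuations.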
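The I·LFS Flexible Redundancy Details slide pairs each original file at even inode N with a redundant file at odd inode N+1, and keeps the two copies in different sets of regions. A minimal sketch of that bookkeeping follows; the helper names and the disjointness check are illustrative assumptions, not the actual I·LFS routines, which the slides say reuse existing vnode operations recursively.

```python
def redundant_inode(ino):
    """Inode number of the mirror copy of the original file at even inode ino."""
    if ino % 2 != 0:
        raise ValueError(f"inode {ino} is odd, so it is not an original file")
    return ino + 1


def original_inode(ino):
    """Inode number of the original file for the mirror at odd inode ino."""
    if ino % 2 != 1:
        raise ValueError(f"inode {ino} is even, so it is not a redundant file")
    return ino - 1


def regions_for(ino, original_regions, mirror_regions):
    """Pick the region set that may hold this file's blocks.

    The two sets must be disjoint, so that losing any single region leaves
    at least one copy of a mirrored file intact.
    """
    if set(original_regions) & set(mirror_regions):
        raise ValueError("original and mirror region sets overlap")
    return mirror_regions if ino % 2 == 1 else original_regions


# Usage: file 42 lives in regions 0 and 1, its mirror (inode 43) in 2 and 3.
mirror = redundant_inode(42)
print(mirror, original_inode(mirror))          # 43 42
print(regions_for(42, [0, 1], [2, 3]))         # [0, 1]
print(regions_for(mirror, [0, 1], [2, 3]))     # [2, 3]
```

Within each region set, placement stays flexible: any clean segment in an allowed region will do, which is why the per-file choice costs little extra logic.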
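The I·LFS Lazy Redundancy Details slide turns the cleaner into a replicator: it visits flagged segments, checks whether data is still live and how long it has lived, and only then makes redundant copies. The Python sketch below models one such pass under assumptions of its own: the `Block` and `Segment` types, the `min_age` lifetime threshold, and the `replicate` callback are hypothetical, and copying is shown per block for brevity, whereas the slide describes copies of files.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    inode: int
    live: bool              # False once the file is deleted or overwritten
    written_at: float       # time the block was written, in seconds
    replicated: bool = False


@dataclass
class Segment:
    blocks: list = field(default_factory=list)
    needs_replication: bool = False  # flag set when the segment is written


def replicate_pass(segments, now, min_age, replicate):
    """One cleaner pass acting as replicator.

    min_age:   hypothetical lifetime threshold; younger data is left for a
               later pass, giving short-lived files a chance to die first
    replicate: callback that writes a redundant copy of one block
    Returns the number of blocks copied in this pass.
    """
    copied = 0
    for seg in segments:
        if not seg.needs_replication:
            continue
        still_pending = False
        for blk in seg.blocks:
            if not blk.live or blk.replicated:
                continue  # dead data is never copied; copied data is done
            if now - blk.written_at < min_age:
                still_pending = True  # too young; revisit on a later pass
                continue
            replicate(blk)
            blk.replicated = True
            copied += 1
        seg.needs_replication = still_pending
    return copied


# Usage: a dead block is skipped, and a young block waits for the next pass.
seg = Segment([Block(2, True, 0.0), Block(4, False, 0.0), Block(6, True, 95.0)],
              needs_replication=True)
copies = []
print(replicate_pass([seg], now=100.0, min_age=30.0, replicate=copies.append))  # 1
print(replicate_pass([seg], now=200.0, min_age=30.0, replicate=copies.append))  # 1
print([b.inode for b in copies], seg.needs_replication)  # [2, 6] False
```

The deleted block at inode 4 is never copied, which is the saving the Lazy Redundancy Experiment slide reports for short-lived files; the cost is a window in which young data has only one copy.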