University of Wisconsin - Madison

RELIABILITY ANALYSIS OF ZFS
CS 736 Project

Reliability Analysis of ZFS
Summary
• To perform a reliability analysis of ZFS
• Test existing reliability claims
• Layered driver interface: simulating transient block corruptions at various levels of the ZFS on-disk hierarchy
Results
• Classes of faults handled by ZFS
• Measure of the robustness of ZFS
• Lessons on building a reliable, robust file system

Coming Up
Outline of the talk
• ZFS organization
• ZFS on-disk format
• ZFS features and specs regarding reliability
• Experimental setup and experiments
• Results and conclusions
• Future work

ZFS Organization
Pooled Storage Model
[Figure: several ZFS file systems on top of a single ZFS pool]
• The disk is a ZFS pool comprising many file systems.

ZFS Organization
Object based
• Transaction-based object file system
• Every structure is an object.
• An operation on one or more objects is a transaction.
• Transactions are grouped into transaction groups.
• All data and metadata blocks are checksummed: no silent corruption.
• Modifications are always copy-on-write: always consistent on disk.
• All metadata is compressed; data compression is optional.

ZFS Structures
The entire file system is represented as
• Objects: dnode_phys_t
• Object sets: dnode_phys_t[ ]
• P/L analogy: each object is a template. The bonus buffer describes specific attributes.

ZFS Structures
Blocks and block pointers
• Data is transferred to disk in units of blocks.
• Block pointers (blkptr_t) are used to locate, verify and describe blocks.
• A block pointer contains checksum and compression information.
• The physical size of a block can differ from its logical size.
• Gang blocks

ZFS Structures
Block pointers
• Data Virtual Address (DVA): the combination of fields in blkptr_t that locates a block on disk.
• Wideness: a blkptr_t can store up to three copies of the data, each pointed to by a unique DVA.
  These copies are called "ditto blocks":
  • Three for pool-wide metadata
  • Two for file-system-wide metadata
  • One for data (configurable)
[Figure: blkptr_t layout: three DVAs (vdev1, asize, offset1; vdev2, asize, offset2; vdev3, asize, offset3) followed by lvl, type, cksum, comp, psize and lsize fields]

ZFS Structures
[Figure: Wideness]

ZFS Structures
Attributes on disk
• ZAP (ZFS Attribute Processor)
• ZAP objects handle arbitrary (name, object) associations within an object set (objset).
• Most commonly used to implement directories
• Also used extensively throughout the DSL

Putting it all together
Objects
• Everything in ZFS is an object.
• A dnode describes and organizes the collection of blocks that make up an object.
[Figure: Objects]

Putting it all together
Object Sets
• Group related objects to form objsets.
• File systems, volumes, clones and snapshots are objsets.
[Figure: Object set containing Objects]

Putting it all together
DataSets
• Encapsulates an objset and provides
  • Space usage
  • Snapshot information
[Figure: DataSet containing Object set, Objects, Space map and Snapshot Information]

Putting it all together
Dataset directories
• Groups datasets
• Properties such as quotas and compression
• Dataset relationships
[Figure: DataSet Directory containing DataSet (Object set, Objects, Space map, Snapshot Information), Properties and Child Map]

A road less travelled
[Figure: From vdev label to data]

To sum up
Moving forward
• Layers of indirection
• End-to-end checksums, stored separately from the data
• Wideness (ditto blocks) (3-2-1)
• Compression
• Copy-on-write
• Scrub facility

Experimental Setup
Corruption Framework
• Corrupter driver: modifies physical disk blocks
• Analyzer app: understands on-disk ZFS structures
• Consumer app: monitors ZFS responses and error codes

Experimental Setup - Simplification
• Set up on a Solaris 10 VM
• Only one physical vdev (disk): no striping, mirror, RAID…
• Initial target: pointer corruption
  • Reduced sample space
  • Interesting cases
• Disable compression as much as possible

Initial Finding
• All metadata is compressed.
• Metadata compression cannot be disabled.
• Pointer corruption is not feasible.
• Perform corruptions on compressed objects instead.
• Representative of the effects of disk faults on ZFS

Corruption Experiments
TYPE: Type-aware object corruptions
TARGET (targeted on-disk objects)
• Vdev labels [@Pool]
• Uberblocks [@Pool]
• Object sets
  • Meta Object Set [@Pool]
    • objset_phys_t (describing the object set)
    • Object array
  • Myfs Object Set [@FS]
    • objset_phys_t
    • Indirect blkptr objects
    • Object array
• ZIL [@FS]
• File data [@FS]
• Directory data [@FS]

Results
Object                Detection       Recovery          Correction
vdev label            YES/Checksum    YES/Replica       NO/COW
uberblock             YES/Checksum    YES/Replica       NO/COW
MOS Object            YES/Checksum    YES/Replica       NO/COW
MOS Object Set        YES/Checksum    YES/Replica       NO/COW
FS Object             YES/Checksum    YES/Replica       NO/COW
FS Indirect Objects   YES/Checksum    YES/Replica       NO/COW
FS Object Set         YES/Checksum    YES/Replica       NO/COW
ZIL                   YES/Checksum    NO                NO
Directory Data        YES/Checksum    NO/Configurable   NO/Configurable
File Data             YES/Checksum    NO/Configurable   NO/Configurable

Summary (using IRON Taxonomy)
• Detection: checksums in parent blkptrs
• Recovery: replication in parent blkptrs (ditto blocks)

Conclusion
• Integration of file system and volume manager
  • Saves an additional translation
• Use of one generic pointer block
  for checksums and replication
• Merkle tree provides robustness
• Use of replication/compression in a commodity file system is viable
• COW can be used effectively

Observations/Questions
• No correction of ditto blocks: relies on COW
  • Consecutive (n = wideness) failures without a transaction group commit ??
  • Snapshot corruption ??
• Explicit scrubbing corrects ditto blocks in place
  • Potential for corruption ??
• Space/performance hit due to redundancy/compression
  • 2% hit in terms of space/IO ?? (Banham & Nash)
• No page cache, uses ARC

Future Work
• Snapshot corruptions
• Multiple device configurations
  • Striping
  • Mirror
  • RAID-Z
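
Sketch: detection and recovery through the parent block pointer

The detection and recovery behaviour reported in the Results and Summary slides (checksum kept in the parent blkptr, recovery from a ditto copy, no in-place correction on a normal read) can be illustrated with a small toy model. This is a minimal sketch, not ZFS code: ToyDisk, BlockPointer, write_block and read_block are hypothetical names, the "disk" is an in-memory dictionary, and SHA-256 stands in for whatever checksum the pool is actually configured to use.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class BlockPointer:
    """Toy stand-in for blkptr_t (hypothetical, heavily simplified)."""
    dvas: list       # up to three (vdev, offset) addresses, one per copy
    checksum: bytes  # checksum of the child block, kept in the parent


class ToyDisk:
    """In-memory 'disk' keyed by (vdev, offset); an assumption for illustration."""

    def __init__(self):
        self.blocks = {}

    def write(self, dva, data):
        self.blocks[dva] = data

    def read(self, dva):
        return self.blocks[dva]


def write_block(disk, data, dvas):
    """Write one copy per DVA and return the pointer the parent would store."""
    for dva in dvas:
        disk.write(dva, data)
    # SHA-256 is an assumption here; the point is only that the checksum
    # lives in the parent pointer, not next to the data it protects.
    return BlockPointer(dvas=list(dvas), checksum=hashlib.sha256(data).digest())


def read_block(disk, bp):
    """Detection: verify each copy against the parent's checksum.
    Recovery: fall back to the next ditto copy on a mismatch.
    No correction: a bad copy is left untouched, as in the Results table."""
    for dva in bp.dvas:
        data = disk.read(dva)
        if hashlib.sha256(data).digest() == bp.checksum:
            return data
    raise IOError("checksum mismatch on every copy")


disk = ToyDisk()

# Metadata with wideness 2: one corrupted copy is detected and bypassed.
meta = write_block(disk, b"dnode contents", [(0, 100), (0, 900)])
disk.write((0, 100), b"garbage")
print(read_block(disk, meta))

# File data with a single copy: corruption is detected but not recoverable.
data = write_block(disk, b"file contents", [(0, 500)])
disk.write((0, 500), b"garbage")
try:
    read_block(disk, data)
except IOError as err:
    print("read failed:", err)
```

The two cases at the end mirror the table rows: replicated metadata reads succeed after a single corruption, while single-copy file data is reported as an error rather than returned silently.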