Ext4 – Linux Filesystem
Rich Mazzolini, Jeff Thompson, Vic Lawson, Kyle Wisniewski

Beginnings and Ext2
• EXT: Extended File System
• First one created in 1992 for Linux
• EXT2 created one year after EXT
• XIAFS was also created at the time, but lost to EXT2 due to EXT2's better longevity and flexibility.
• EXT2 expanded the maximum filesystem size from 2 GB to 32 TB.

EXT3
• In 2001 EXT3 was created to add journaling to the filesystem.
• Journaling: writing all filesystem changes to a temporary location, or journal, before writing them permanently to the filesystem.
  – Allows for better recovery after a crash.
• EXT2 can be converted to EXT3 without having to back up and restore files.

EXT4
• Not an entirely new filesystem, but rather a fork of EXT3.
• Main improvements: journal checksums and delayed allocation of disk blocks
• With delayed allocation, the system waits until right before it writes the file permanently to allocate blocks. By then more is known about the file, which allows for better allocation decisions.
• EXT4 is backwards compatible with EXT2 and EXT3: both can be mounted as EXT4.
• EXT4 accepts volume sizes up to 1 exbibyte (2^60 bytes)

Features of Ext4
• Compatibility
  – Existing Ext3 systems can be updated to Ext4 by running only a few commands. However, only new data will be stored in the new data structures.
• Larger filesystems and larger file sizes
  – Filesystems can be a maximum of 1 EiB
  – Files can be as large as 16 TiB
• More subdirectories
  – Ext4 allows for 64000 subdirectories per directory (Ext3 was limited to 32000)
• Extents
  – An extent describes a run of contiguous physical blocks used by a file
  – This lets the filesystem map the file with one extent of that size, rather than mapping each individual block
  – For example, a 100 MB file can be mapped by a single extent instead of 25600 individual blocks at a block size of 4 KB
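The 25600 figure in the 100 MB example is simply the file size divided by the block size. A minimal sketch of that arithmetic, using the sizes from the slide (the variable names are made up for illustration):

```shell
#!/bin/sh
# Sizes come from the slide's example; the variable names are illustrative only.
file_size_bytes=$((100 * 1024 * 1024))   # 100 MB file (binary megabytes)
block_size_bytes=$((4 * 1024))           # 4 KB block size

# Round up so that a partial final block still counts as a whole block.
blocks=$(( (file_size_bytes + block_size_bytes - 1) / block_size_bytes ))

echo "blocks to map one by one: $blocks"
echo "extents needed if the file is contiguous: 1"
```

The same count is the number of per-block allocator calls and per-block mappings that a single contiguous extent replaces.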
• Multiblock allocation
  – Ext3 would call the allocator once for each block
  – Ext4 can allocate many blocks in a single allocator call
  – Returning to the previous example:
    • In Ext3, a 100 MB file would need 25600 allocator calls, one for each individual block
    • In Ext4, the allocator is called only once to allocate the 25600 blocks
• Delayed Allocation
  – Allocation is delayed until the file is actually being written to the disk
  – In Ext3 the allocation happened as soon as possible, even if the file was to sit in cache for some time
• Fast fsck
  – In Ext4, fsck does not check unused inodes, speeding up this command greatly
• Journal Checksumming
  – Ext4 uses checksumming to make sure that the journal blocks have not failed or become corrupted.
  – The journal blocks are some of the most used on the disk, which means that they are more prone to hardware failures.
• "No Journaling" Mode
  – Ext4 allows the journal to be disabled to remove the small overhead it adds
• Online Defragmentation
  – Allows for defragmentation while a filesystem is still in use
• Inodes
  – Ext4 has a larger default inode size, allowing for more information about each file
  – Ext4 will automatically reserve several inodes when a directory is created, in anticipation of the directory holding files
  – Ext4 uses nanosecond-resolution timestamps instead of Ext3's second-resolution timestamps
• Persistent Preallocation
  – Ext4 and later kernel versions of Ext3 allow applications to reserve space for tasks without having to write the data immediately
  – An example would be a file download that may take hours. The space is already allocated as the data comes in
• Barriers
  – On by default.
  – A barrier forbids any writing of data past the barrier until all data before the barrier has been committed.
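Persistent preallocation can be tried from the shell. The sketch below assumes the fallocate(1) tool from util-linux and GNU stat are installed, and that the scratch file lands on a filesystem that supports preallocation; the slides do not name a tool, so this is only one possible way to see the effect.

```shell
#!/bin/sh
set -eu

# Scratch file for the demonstration (hypothetical; any writable path works).
target=$(mktemp)

# Reserve 10 MiB up front without writing any data into the file.
fallocate -l 10MiB "$target"

# %s is the apparent size in bytes, %b the number of blocks allocated on disk.
size=$(stat -c %s "$target")
allocated=$(stat -c %b "$target")
echo "apparent size: $size bytes, allocated blocks: $allocated"

rm -f "$target"
```

A long-running download could reserve its full size this way before the first byte arrives, so the space is already set aside as the data comes in.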

Ext4 Design
• Design for ext4 (on-disk format)
  – Ext3 default w/ large file system scalability
  – On-disk format change corruption
  – Multi-block allocation
  – Delayed allocation
• Ext4 extents
  – Set of logically contiguous blocks within a file
• External references
  – http://ext2.sourceforge.net/2005-ols/2005-olsext3-presentation.pdf
  – http://lwn.net/Articles/187321/

Ext4 Disk Layout (Overview)
• Overview
  – Array of logical blocks: reduces overhead, increases throughput
  – Each file's blocks are stored within the same block group where possible
• Layout (redundant superblock and group descriptor blocks)
    Group 0 padding        1024 bytes
    ext4 super block       1 block
    Group descriptors      many blocks
    Reserved GDT blocks    many blocks
    Data block bitmap      1 block
    inode bitmap           1 block
    inode table            many blocks
    Data blocks            many more blocks
• Flexible block groups (multiple block groups tied together)
• Meta block groups (clusters of block groups whose descriptors fit in a single disk block)
• Lazy block groups (uninitialized inode bitmap/table)
• Special i-nodes (defective blocks, root, boot loader, …)
• Block and inode allocation (locality, delayed allocation; keep inode, directory and data blocks in the same block group)
• Checksums (detect/isolate errors)
• Bigalloc (allocate disk space in clusters of multiple blocks to reduce fragmentation)

Ext4 Disk Layout (Block)
• Super block (block counts, inode counts, supported features, maintenance features, …)
  – https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout
• Group descriptors (contains full copy of block group descriptor table)
• Block bitmap (tracks usage of data blocks within the group)
• i-node bitmap (records which inode table entries are being used)
• i-node table (file metadata (FAT) + file type info)

Ext4 Disk Layout (directory, extended)
• i-block contents (file block indexes and special purposes)
  – Symbolic links: target stored inline if < 60 bytes, otherwise in data blocks mapped by extents
  – Direct/indirect block addressing (ext2/3; not efficient)
  – Extent tree (allows very large file allocation with a single extent)
• Directory entries
  – Linear directories (series of data blocks with a linear array of directory entries)
  – Hash tree directories: ext3 performance
    improved with a tree keyed off the directory entry name hash
• Extended attributes
  – File ACLs, security data, user info, …

Ext4 Disk Layout (mount protection, journal)
• Multiple mount protection
  – Protects against simultaneous access by multiple hosts
• Journal (blocks)
  – Protects the file system against corruption from a system crash
  – Layout: Superblock [(descriptor_block data_blocks|revocation_block) [more data or revocations] commit_block] [more transactions...]
  – Block header: 12-byte header with magic number, description of what the block contains, transaction id
  – Super block: size of journal and start of transaction log
  – Descriptor block: array of journal block tags giving the final locations of the data blocks
  – Data block: if its first bytes match the journal magic number, they are replaced with 0's
  – Revocation block: lists older journaled blocks that have been superseded and must not be replayed
  – Commit block: indicates the transaction was completely written

Ext4 Metadata Checksums
• Overview
  – TL;DR ("too long; didn't read") instructions
  – Protect against corruption
  – Store checksums of metadata objects
  – Prevents broken metadata from shredding the file system
  – 32-bit file systems run out of space for checksums, thus format as 64-bit
• Algorithm: CRC32 polynomial, stronger error detection than CRC16
  – CRC stuffing: use the CRC32 function + 16-bit checksum
• Benchmarking (see chart)
  – https://ext4.wiki.kernel.org/index.php/Ext4_Metadata_Checksums#Stuff_Darrick_Hasn.27t_Thought_Hard_Enough_About
• Metadata check-summing
  – Block group descriptor protected by CRC16
  – Journal checksum ensures integrity of journal entries

Ext4 Metadata Checksums (cont.)
• On-disk structure modifications
  – Superblock
  – i-nodes (patch required)
  – i-node/block bitmap, 64-bit: each bitmap has a crc32c checksum
  – i-node/block bitmap, 32-bit: may use a 16-bit checksum
  – Extent tree: enough space to store a crc32c checksum at the end
  – Directory block: most have a 12-byte entry at the end for the checksum
  – H-tree: crc32c checksum stored at the end of the hash tree block
  – Extended attributes: separate disk block and inode have 12 bytes for the checksum
  – Metadata not upgraded: direct, indirect and triple indirect
    block maps do not allow for checksums
• Tool updates: user should be able to:
  – Turn on checksums with a simple -o parameter
  – Convert file systems simply
  – Disable metadata check-summing
  – Display checksums in debug

Ext4 Performance
• Chart taken from: https://ext4.wiki.kernel.org/index.php/Testing_Results

TODO List for Ext4
• Improve recovery from bad transaction checksums in the journal
• Test journal checksums under power failures
• Add inode checksums
• Improve the ability to resize while online
• Implement SSD Trim support
• Improve merging of extents
• Improve defragmentation

Compatibility with Windows
• Software exists that allows certain operations on an Ext4 filesystem from Windows. However, there are no drivers available yet that allow all features of Ext4 to be used
  – Ext2Fsd is a driver that will allow write operations
    • Extents must be turned off
  – Ext2Read will allow read operations in Windows with extents enabled

Compatibility with OS X
• OS X has full compatibility with Ext4 filesystems through the use of Paragon ExtFS.
  – This is commercial software and must be purchased.
• Free solutions are extremely limited
  – ext4fuse is a free solution but is limited to read-only access

Getting Ext4
• Once you have upgraded to e2fsprogs 1.41 or later, simply type:
    # mke2fs -t ext4 /dev/DEV
  or
    # mkfs.ext4 /dev/DEV
• Once the filesystem is created, it can be mounted as follows:
    # mount -t ext4 /dev/DEV /wherever
• To enable the ext4 features on an existing ext3 filesystem, use the command:
    # tune2fs -O extents,uninit_bg,dir_index /dev/DEV
• WARNING: Once you run this command, the filesystem will no longer be mountable using the ext3 filesystem!
• After running this command, you MUST run fsck to fix up some on-disk structures that tune2fs has modified:
    # e2fsck -fDC0 /dev/DEV

Sources
• http://kernelnewbies.org/Ext4
• https://ext4.wiki.kernel.org/index.php/Ext4_Howto