Ext4

Ext4 – Linux Filesystem
Rich Mazzolini
Jeff Thompson
Vic Lawson
Kyle Wisniewski
Beginnings and Ext2
• EXT: Extended File System
• The first filesystem created specifically for Linux, in 1992
• EXT2 was created one year after EXT
• XIAFS was also created at the time, but lost out to EXT2 because of EXT2's better longevity and flexibility.
• EXT2 expanded the maximum filesystem size
from 2 GB to 32 TB.
EXT3
• In 2001 EXT3 was created to add journaling to the filesystem.
• Journaling: writing filesystem changes (by default, only metadata) to a temporary location, the journal, before writing them permanently to the filesystem.
– Allows faster, more reliable recovery after a crash, because the journal can be replayed instead of checking the whole filesystem.
• EXT2 can be converted to EXT3 without having to back up and restore files.
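The journaling idea can be illustrated with a toy write-ahead log. This is a minimal sketch under simplifying assumptions, not EXT3's actual journal format: the `ToyJournal` class and its methods are hypothetical, blocks are modeled as dictionary entries, and an update only reaches the simulated "disk" when the journal is replayed. The property it shows is that a transaction without a commit record is ignored during recovery.

```python
class ToyJournal:
    """Hypothetical write-ahead journal (illustration only, not ext3's format)."""

    def __init__(self):
        self.log = []    # ("data", txid, block, value) or ("commit", txid)
        self.disk = {}   # block number -> value at its permanent location

    def write_transaction(self, txid, updates, crash_before_commit=False):
        # 1. Log every change of this transaction in the journal first.
        for block, value in updates.items():
            self.log.append(("data", txid, block, value))
        # 2. Only a fully logged transaction gets a commit record.
        if not crash_before_commit:
            self.log.append(("commit", txid))

    def replay(self):
        # 3. Recovery: copy only committed transactions to their final location.
        committed = {rec[1] for rec in self.log if rec[0] == "commit"}
        for rec in self.log:
            if rec[0] == "data" and rec[1] in committed:
                _, _, block, value = rec
                self.disk[block] = value
        self.log.clear()  # the journal space can now be reused
        return self.disk


if __name__ == "__main__":
    j = ToyJournal()
    j.write_transaction(1, {10: "inode A", 11: "bitmap A"})
    j.write_transaction(2, {12: "inode B"}, crash_before_commit=True)
    print(j.replay())  # {10: 'inode A', 11: 'bitmap A'}
```

Transaction 2 "crashed" before its commit record was written, so replay leaves block 12 untouched instead of applying a half-finished update.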
EXT4
• Not an entirely new filesystem, but rather a fork of EXT3.
• Main improvements: journal checksums and delayed allocation of disk blocks
• This means the filesystem waits until right before data is written permanently to disk to allocate blocks for it, which allows better allocation decisions.
• EXT4 is backward compatible with EXT3 and EXT2: existing EXT3 and EXT2 filesystems can be mounted as EXT4.
• EXT4 supports volume sizes of up to 1 exbibyte (2^60 bytes)
Features of Ext4
• Compatibility
– Existing Ext3 systems can be converted to Ext4 by running only a few commands; however, only newly written data will be stored in the new data structures (such as extents)
• Larger filesystems and larger file sizes
– Filesystems can be a maximum of 1 EiB
– Files can be as large as 16 TiB (with 4 KiB blocks)
• More subdirectories
– Ext4 allows up to 64000 subdirectories in a directory (Ext3: 32000)
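The 1 EiB and 16 TiB limits follow from block-number widths. The sketch below assumes 4 KiB blocks, 48-bit physical block numbers for the filesystem (ext4's 64bit feature) and 32-bit logical block numbers within a file; the function names are illustrative, and the results are the nominal round figures.

```python
BLOCK_SIZE = 4096          # assumed 4 KiB blocks (2**12 bytes)
TIB = 1024 ** 4
EIB = 1024 ** 6

def max_filesystem_bytes(block_size=BLOCK_SIZE, block_number_bits=48):
    """Nominal filesystem limit: number of addressable blocks * block size."""
    return (2 ** block_number_bits) * block_size

def max_file_bytes(block_size=BLOCK_SIZE, logical_block_bits=32):
    """Nominal per-file limit: logical blocks addressable in one file * block size."""
    return (2 ** logical_block_bits) * block_size

if __name__ == "__main__":
    print(max_filesystem_bytes() // EIB)   # 1   (2**48 * 2**12 = 2**60 bytes)
    print(max_file_bytes() // TIB)         # 16  (2**32 * 2**12 = 2**44 bytes)
```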
Features of Ext4(cont.)
• Extents
– An extent tells the filesystem that a run of contiguous physical blocks belongs to the file, by recording the starting block and the number of blocks
– This lets the filesystem map the file as one extent of that size, rather than mapping each individual block
– E.g., a 100 MiB file can be a single extent, or a mapping of 25600 blocks at a block size of 4 KiB
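To make the extent idea concrete, here is a small sketch that compresses a file's physical block numbers (listed in logical order) into `(logical_start, physical_start, length)` extents. The function name is hypothetical; the 32768-block cap mirrors ext4's limit on the length of one initialized extent, so a 100 MiB file (25600 blocks of 4 KiB) still fits in a single extent.

```python
MAX_EXTENT_LEN = 32768  # max blocks in one initialized ext4 extent

def blocks_to_extents(blocks, max_len=MAX_EXTENT_LEN):
    """Compress physical block numbers (listed in logical order) into
    (logical_start, physical_start, length) tuples."""
    extents = []
    for logical, physical in enumerate(blocks):
        if extents:
            lstart, pstart, length = extents[-1]
            # Extend the current extent if this block is physically adjacent.
            if physical == pstart + length and length < max_len:
                extents[-1] = (lstart, pstart, length + 1)
                continue
        extents.append((logical, physical, 1))  # start a new extent
    return extents

if __name__ == "__main__":
    contiguous = list(range(1000, 1000 + 25600))   # 100 MiB at 4 KiB blocks
    print(len(contiguous), blocks_to_extents(contiguous))
    # 25600 [(0, 1000, 25600)]
    print(blocks_to_extents([5, 6, 7, 20, 21]))
    # [(0, 5, 3), (3, 20, 2)]
```

A block map needs one entry per block (25600 here), while the extent list needs one entry per contiguous run.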
Features of Ext4(cont.)
• Multiblock allocation
– Ext3 calls the block allocator once for each block
– Ext4 can allocate many blocks, often a whole file, in a single call
– Returning to the previous example:
• A 100 MiB file needs 25600 allocator calls in Ext3, one for each block
• In Ext4, the allocator can be called only once to allocate all 25600 blocks
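The difference in allocator calls can be sketched with a toy free-block bitmap. Both functions are hypothetical simplifications (ext4's real multiblock allocator, mballoc, is considerably more elaborate): `alloc_one` claims a single block per call, as in the Ext3 description, while `alloc_many` claims a contiguous run of n blocks in one call.

```python
def alloc_one(bitmap, goal=0):
    """Per-block allocation: claim the first free block at or after `goal`."""
    for i in range(goal, len(bitmap)):
        if not bitmap[i]:
            bitmap[i] = True
            return i
    raise OSError("no free blocks")

def alloc_many(bitmap, n):
    """Multiblock allocation: claim the first run of n free blocks in one call."""
    run_start, run_len = 0, 0
    for i, used in enumerate(bitmap):
        if used:
            run_start, run_len = i + 1, 0   # run broken; restart after block i
            continue
        run_len += 1
        if run_len == n:
            for j in range(run_start, run_start + n):
                bitmap[j] = True
            return run_start, n
    raise OSError(f"no free run of {n} blocks")

if __name__ == "__main__":
    n = 25600                               # 100 MiB of 4 KiB blocks
    per_block = [False] * 65536
    calls, goal = 0, 0
    for _ in range(n):
        goal = alloc_one(per_block, goal) + 1
        calls += 1
    print("per-block calls:", calls)        # per-block calls: 25600

    multi = [False] * 65536
    print("multiblock result:", alloc_many(multi, n))  # multiblock result: (0, 25600)
    print(per_block == multi)               # True
```

Both approaches end with the same blocks in use; the single call simply lets the allocator see the whole request at once.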
Features of Ext4(cont.)
• Delayed Allocation
– Block allocation is delayed until the file's data is actually written (flushed) to disk
– In Ext3, allocation happened as soon as possible, even if the data was going to sit in the cache for some time
• Fast fsck
– In Ext4, fsck skips unused inodes, speeding up the check greatly
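Why delaying allocation helps can be shown with a toy model of two files written in interleaved chunks. In this hypothetical sketch, each list entry is one block of data for the named file; immediate allocation hands out the next free block on every write, while delayed allocation only counts dirty blocks in the cache and gives each file one contiguous run at flush time.

```python
def immediate_allocation(writes):
    """Ext3-style: give each block a disk location as soon as it is written."""
    next_free, layout = 0, {}
    for name in writes:                       # each entry = one block for `name`
        layout.setdefault(name, []).append(next_free)
        next_free += 1
    return layout

def delayed_allocation(writes):
    """Ext4-style: keep dirty blocks in the cache, allocate when flushing."""
    dirty = {}
    for name in writes:                       # no disk blocks assigned yet
        dirty[name] = dirty.get(name, 0) + 1
    next_free, layout = 0, {}
    for name, nblocks in dirty.items():       # flush: one contiguous run per file
        layout[name] = list(range(next_free, next_free + nblocks))
        next_free += nblocks
    return layout

if __name__ == "__main__":
    writes = ["a", "b", "a", "b", "a", "b"]   # two files written in turns
    print(immediate_allocation(writes))  # {'a': [0, 2, 4], 'b': [1, 3, 5]}
    print(delayed_allocation(writes))    # {'a': [0, 1, 2], 'b': [3, 4, 5]}
```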
Features of Ext4(cont.)
• Journal Checksumming
– Ext4 checksums journal blocks so that failing or corrupted blocks can be detected.
– The journal blocks are among the most heavily used on the disk, which makes them more prone to hardware failures.
• "No Journaling" Mode
– Ext4 allows the journal to be disabled, removing the small overhead it adds
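A journal checksum lets recovery detect a damaged transaction before replaying it. The sketch below is loosely modeled on that idea rather than on jbd2's exact on-disk fields: it uses Python's `zlib.crc32` as the checksum, chains it over every block in a transaction, and treats the result as the value stored in the commit record. The function names are hypothetical.

```python
import zlib

def transaction_checksum(blocks):
    """Chain a CRC32 over every block logged in one transaction."""
    crc = 0
    for block in blocks:
        crc = zlib.crc32(block, crc)
    return crc

def should_replay(blocks, stored_checksum):
    """During recovery, replay the transaction only if its checksum matches."""
    return transaction_checksum(blocks) == stored_checksum

if __name__ == "__main__":
    blocks = [bytes(4096), b"inode table update".ljust(4096, b"\0")]
    stored = transaction_checksum(blocks)      # written in the commit record

    damaged = bytearray(blocks[1])
    damaged[0] ^= 0x01                          # a single flipped bit
    print(should_replay(blocks, stored))                # True
    print(should_replay([blocks[0], damaged], stored))  # False
```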
Features of Ext4(cont.)
• Online Defragmentation
– Allows defragmentation while a filesystem is still mounted and in use
• Inodes
– Ext4 has a larger default inode size (256 bytes instead of 128), allowing more information to be stored about each file
– Ext4 automatically reserves several inodes when a directory is created, in anticipation of the directory holding files
– Ext4 uses nanosecond-resolution timestamps, whereas Ext3 uses one-second resolution
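Ext4 keeps nanoseconds in an extra 32-bit field per timestamp, which only fits in the larger inodes. According to the kernel's ext4 disk-layout documentation, the low 2 bits of that field extend the seconds counter beyond 32 bits and the upper 30 bits hold nanoseconds. The sketch below follows that description; the function names are hypothetical.

```python
EPOCH_BITS = 2                       # low bits of the extra field: epoch extension
EPOCH_MASK = (1 << EPOCH_BITS) - 1

def to_s32(x):
    """Interpret the low 32 bits of x as a signed 32-bit integer."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

def encode_time(sec, nsec):
    """Split (seconds, nanoseconds) into the 32-bit base and 'extra' fields."""
    base = sec & 0xFFFFFFFF                               # legacy seconds field
    epoch = ((sec - to_s32(sec)) >> 32) & EPOCH_MASK      # extends seconds past 32 bits
    extra = epoch | (nsec << EPOCH_BITS)                  # nsec < 2**30, so it fits
    return base, extra

def decode_time(base, extra):
    """Recover (seconds, nanoseconds) from the two on-disk fields."""
    sec = to_s32(base) + ((extra & EPOCH_MASK) << 32)
    return sec, extra >> EPOCH_BITS

if __name__ == "__main__":
    for t in [(1_700_000_000, 123_456_789), (2**31 + 5, 1), (-1, 999_999_999)]:
        assert decode_time(*encode_time(*t)) == t
    print(encode_time(2**31 + 5, 0))   # (2147483653, 1): epoch bit set after 2038
```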
Features of Ext4(cont.)
• Persistent Preallocation
– Ext4 (and Ext3 in later kernel versions) allows applications to reserve space for a file without having to write the data immediately
– For example, a file download may take hours; the space is already allocated as the data comes in
• Barriers
– On by default.
– A barrier forbids writing any data past the barrier until all data before the barrier has been committed to disk.
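The download example can be sketched with Python's `os.posix_fallocate`, which is Unix-only; on Linux, glibc typically implements it with the fallocate() system call when the filesystem supports it, as ext4 does. The helper names and the `download.part` file are hypothetical: space is reserved once, then chunks are written into their final positions as they arrive.

```python
import os
import tempfile

def reserve_file(path, size):
    """Allocate `size` bytes for a file up front (Unix-only os.posix_fallocate)."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        os.posix_fallocate(fd, 0, size)   # reserve blocks without writing data
    except BaseException:
        os.close(fd)
        raise
    return fd

def store_chunk(fd, offset, data):
    """Write one downloaded chunk into space that is already reserved."""
    return os.pwrite(fd, data, offset)

def demo(size=1 << 20):
    with tempfile.TemporaryDirectory() as d:
        fd = reserve_file(os.path.join(d, "download.part"), size)
        try:
            store_chunk(fd, 4096, b"hello")
            return os.fstat(fd).st_size, os.pread(fd, 5, 4096)
        finally:
            os.close(fd)

if __name__ == "__main__":
    print(demo())   # (1048576, b'hello')
```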
Ext4 Design
• Design for ext4 (on-disk format)
– Ext3 default w/ large file system scalability
– On-disk format change
– Corruption
– Multi-block allocation
– Delayed allocation
• Ext4 Extents
– A run of logically contiguous blocks within a file that is also stored contiguously on disk
• External References
– http://ext2.sourceforge.net/2005-ols/2005-olsext3-presentation.pdf
– http://lwn.net/Articles/187321/
Ext4 Disk Layout (Overview)
• Overview
– Array of logical blocks: reduces overhead, increases throughput
– Each file's blocks are stored within the same block group where possible
– Layout (redundant superblock and group blocks), Group 0:
  Group 0 Padding: 1024 bytes
  ext4 Super Block: 1 block
  Group Descriptors: many blocks
  Reserved GDT Blocks: many blocks
  Data Block Bitmap: 1 block
  inode Bitmap: 1 block
  inode Table: many blocks
  Data Blocks: many more blocks
– Flexible block groups (multiple block groups tied together)
– Meta block groups (a cluster of block groups described by a single group descriptor block)
– Lazy block groups (uninitialized inode bitmap/table)
– Special i-nodes (defective blocks, root, boot loader, …)
– Block and inode allocation (locality, delayed allocation; keep a file's inode, its directory, and its blocks in the same block group)
– Checksums (detect/isolate errors)
– Bigalloc (allocate disk blocks in clusters of multiple blocks to reduce fragmentation)
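The Group 0 layout above means the primary superblock always starts at byte offset 1024. The sketch below reads a few superblock fields from a raw image; offsets follow the ext4 disk-layout documentation (little-endian fields, magic number 0xEF53 at offset 0x38). The function name is hypothetical and only a handful of fields are decoded; `s_inode_size` is assumed to be meaningful, which holds for dynamic-revision filesystems such as ext4.

```python
import struct

SUPERBLOCK_OFFSET = 1024    # after the 1024 bytes of group 0 padding
EXT4_SUPER_MAGIC = 0xEF53

def parse_superblock(image: bytes) -> dict:
    """Decode selected little-endian fields of the primary superblock."""
    sb = image[SUPERBLOCK_OFFSET:SUPERBLOCK_OFFSET + 1024]
    if len(sb) < 1024:
        raise ValueError("image too small to hold a superblock")
    (magic,) = struct.unpack_from("<H", sb, 0x38)
    if magic != EXT4_SUPER_MAGIC:
        raise ValueError(f"bad magic {magic:#06x}")
    inodes_count, blocks_count_lo = struct.unpack_from("<II", sb, 0x00)
    (log_block_size,) = struct.unpack_from("<I", sb, 0x18)
    (blocks_per_group,) = struct.unpack_from("<I", sb, 0x20)
    (inodes_per_group,) = struct.unpack_from("<I", sb, 0x28)
    (inode_size,) = struct.unpack_from("<H", sb, 0x58)
    return {
        "inodes_count": inodes_count,
        "blocks_count_lo": blocks_count_lo,    # low 32 bits of the block count
        "block_size": 1024 << log_block_size,  # 2 ** (10 + s_log_block_size)
        "blocks_per_group": blocks_per_group,
        "inodes_per_group": inodes_per_group,
        "inode_size": inode_size,
    }
```

For a real device, something like `parse_superblock(open("/dev/DEV", "rb").read(2048))` would work, given read permission on the device.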
Ext4 Disk Layout (Block)
• Block
• Super blocks
– Record information about the filesystem (block counts, inode counts, supported features, maintenance features, …)
– https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout
• Group descriptors
– Contains a full copy of the block group descriptor table
• Block bitmap
– Tracks usage of data blocks within the group
• i-node bitmap
– Records which inode table entries are in use
• i-node table
– File metadata and file type information
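Putting the i-node bitmap and i-node table together: inodes are numbered from 1, so inode n lives in block group (n - 1) // inodes_per_group at index (n - 1) % inodes_per_group of that group's table, per the ext4 disk-layout documentation. The hypothetical helpers below compute that location and test the matching bit in a group's inode bitmap; the default inode size of 256 bytes is an assumption (the real value comes from the superblock).

```python
def locate_inode(inode_number, inodes_per_group, inode_size=256):
    """Return (block group, index within group, byte offset in that group's inode table)."""
    if inode_number < 1:
        raise ValueError("inode numbers start at 1")
    group, index = divmod(inode_number - 1, inodes_per_group)
    return group, index, index * inode_size

def inode_in_use(inode_bitmap: bytes, index):
    """Bitmaps are little-endian bit arrays: bit (index % 8) of byte (index // 8)."""
    return bool((inode_bitmap[index // 8] >> (index % 8)) & 1)

if __name__ == "__main__":
    print(locate_inode(2, 8192))      # (0, 1, 256): inode 2 is the root directory
    print(locate_inode(8193, 8192))   # (1, 0, 0): first inode of group 1
```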
Ext4 Disk Layout (directory, extended)
• i-block contents
– File block indexes and special purposes
– Symbolic links: the target is stored inline if it is shorter than 60 bytes; longer targets use extents (or block maps)
– Direct/indirect block addressing (ext2/3; not efficient for large files)
– Extent tree (a large contiguous allocation, up to 32768 blocks or 128 MiB with 4 KiB blocks, can be described by a single extent)
• Directory entries
– Linear directories (series of data blocks holding a linear array of directory entries)
– Hash tree directories: ext3 performance improved with a tree keyed on a hash of the directory entry name
• Extended attributes
– File ACLs, security data, user info, …
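For extent-mapped files, the 60-byte i-block field holds the root of the extent tree: a 12-byte header (magic 0xF30A) followed by up to four 12-byte entries. The sketch below decodes a depth-0 (leaf) root using the field layout from the ext4 disk-layout documentation; the function name is hypothetical, and index nodes of deeper trees are not followed.

```python
import struct

EXT4_EXT_MAGIC = 0xF30A
INIT_MAX_LEN = 32768   # ee_len above this marks an uninitialized extent

def parse_extent_root(i_block: bytes):
    """Decode a leaf extent tree root stored in an inode's 60-byte i_block."""
    if len(i_block) != 60:
        raise ValueError("i_block must be 60 bytes")
    magic, entries, max_entries, depth, _generation = struct.unpack_from("<HHHHI", i_block, 0)
    if magic != EXT4_EXT_MAGIC:
        raise ValueError("not an extent tree (maybe a block map or inline symlink)")
    if entries > min(max_entries, 4):
        raise ValueError("corrupt extent header")
    if depth != 0:
        raise NotImplementedError("index nodes (depth > 0) are not followed here")
    extents = []
    for i in range(entries):
        ee_block, ee_len, start_hi, start_lo = struct.unpack_from("<IHHI", i_block, 12 + 12 * i)
        uninit = ee_len > INIT_MAX_LEN
        extents.append({
            "logical_start": ee_block,
            "physical_start": (start_hi << 32) | start_lo,   # 48-bit block number
            "length": ee_len - INIT_MAX_LEN if uninit else ee_len,
            "uninitialized": uninit,                         # e.g. preallocated space
        })
    return extents
```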
Ext4 Disk Layout (mount protection,journal)
• Multiple mount protection
– Protects against simultaneous access by multiple hosts
• Journal (blocks)
– Protects the file system against corruption from a system crash
– Layout:
Superblock [(descriptor_block data_blocks|revocation_block) [more data or revocations] commit_block] [more transactions...]
– Block header: 12-byte header with magic number, description of what the block contains, and transaction ID
– Super block: size of the journal and start of the transaction log
– Descriptor block: array of journal block tags giving the final on-disk location of each data block that follows
– Data block: logged copy of a block; if it begins with the journal magic number, those bytes are replaced with 0's (escaped) and restored at replay
– Revocation block: lists blocks whose older journaled copies must not be replayed
– Commit block: indicates the transaction has been completely written
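The 12-byte block header can be decoded as three big-endian 32-bit fields (the journal, unlike the rest of ext4, is big-endian): magic number 0xC03B3998, block type, and transaction sequence number. The sketch below follows the jbd2 layout described in the ext4 disk-layout documentation; the function names are hypothetical. It also shows the escaping rule for data blocks that happen to start with the magic number.

```python
import struct

JBD2_MAGIC = 0xC03B3998
BLOCK_TYPES = {1: "descriptor", 2: "commit", 3: "superblock v1",
               4: "superblock v2", 5: "revocation"}

def parse_journal_header(block: bytes):
    """Return (block type, transaction id), or None if there is no header."""
    magic, blocktype, sequence = struct.unpack_from(">III", block, 0)
    if magic != JBD2_MAGIC:
        return None                       # e.g. a logged data block
    return BLOCK_TYPES.get(blocktype, f"unknown ({blocktype})"), sequence

def escape_data_block(block: bytes):
    """Zero the magic in a data block so replay cannot mistake it for a header.
    Returns (block as written to the journal, escaped flag)."""
    if block[:4] == struct.pack(">I", JBD2_MAGIC):
        return bytes(4) + block[4:], True
    return block, False
```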
Ext4 Metadata Checksums

Overview
– TL;DR (too long; didn't read) instructions
– Protect against corruption
– Store checksums of metadata objects
– Prevent broken metadata from shredding the file system
– 32-bit file systems lack room in the group descriptors for full-size checksums, so format as 64-bit
– Algorithm:
• CRC32c polynomial: stronger error detection than CRC16
• CRC stuffing: use the CRC32 function, storing only 16 bits where a field has room for just 16
• Benchmarking (see chart)
https://ext4.wiki.kernel.org/index.php/Ext4_Metadata_Checksums#Stuff_Darrick_Hasn.27t_Thought_Hard_Enough_About
Metadata check-summing
– Block group descriptors protected by CRC16
– Journal checksum ensures integrity of journal entries
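The CRC32c used for ext4 metadata is the standard CRC-32C (Castagnoli) polynomial. The bitwise sketch below computes it (slow but short; the kernel uses faster table-driven or hardware-accelerated versions) and shows truncation to the low 16 bits for fields that only have room for 16. Ext4's exact seeding of its checksums (derived from the filesystem UUID) is not reproduced here; `crc32c` and `low16` are illustrative names.

```python
def crc32c(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
    Passing a previous result as `crc` continues the checksum."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

def low16(checksum: int) -> int:
    """Keep only the low 16 bits for on-disk fields with room for 16 bits."""
    return checksum & 0xFFFF

if __name__ == "__main__":
    print(hex(crc32c(b"123456789")))         # 0xe3069283 (standard check value)
    print(hex(low16(crc32c(b"123456789"))))  # 0x9283
```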
Ext4 Metadata Checksums (cont)

On-disk structure modifications
– Superblock i-nodes (patch required)
– i-node/block bitmap, 64-bit: each bitmap has a crc32c checksum
– i-node/block bitmap, 32-bit: only the low 16 bits of the crc32c checksum are stored
– Extent tree: enough space to store a crc32c checksum at the end
– Directory block: most have a 12-byte entry at the end for the checksum
– H-tree: crc32c checksum stored at the end of the hash tree block
– Extended attributes: separate disk block and inode have 12 bytes for checksum
– Metadata not upgraded: direct, indirect, double-indirect and triple-indirect block maps do not allow for checksums
Tool updates: users should be able to:
– Turn on checksums with a simple -O parameter (-O metadata_csum)
– Convert file systems simply
– Disable metadata checksumming
– Display checksums in debug output
Ext4 Performance
Taken from: https://ext4.wiki.kernel.org/index.php/Testing_Results
TODO List for Ext4
• Improve recovery from bad transaction
checksums in the journal
• Test journal checksums under power failures
• Add inode checksums
• Improve the ability to resize while online
• Implement SSD Trim support
• Improve merging of extents
• Improve defragmentation
Compatibility with Windows
• It is possible to use software to allow certain operations on an Ext4 filesystem from Windows; however, no driver available yet allows all features of Ext4 to be used
– Ext2Fsd is a driver that will allow write operations
• Extents must be turned off
– Ext2Read will allow read operations in Windows
with extents enabled
Compatibility with OS X
• OS X has full compatibility with Ext4
filesystems through the use of Paragon ExtFS.
– This is commercial software and must be purchased.
• Free solutions are extremely limited
– ext4fuse is a free solution but is limited to read
only
Getting Ext4
• Once you have upgraded to e2fsprogs 1.41 or later, simply type:
# mke2fs -t ext4 /dev/DEV
or
# mkfs.ext4 /dev/DEV
• Once the filesystem is created, it can be
mounted as follows:
# mount -t ext4 /dev/DEV /wherever
• To enable the ext4 features on an existing ext3
filesystem, use the command:
# tune2fs -O extents,uninit_bg,dir_index /dev/DEV
• WARNING: Once you run this command, the filesystem
will no longer be mountable using the ext3 filesystem!
• After running this command, you MUST run fsck to fix
up some on-disk structures that tune2fs has modified:
# e2fsck -fDC0 /dev/DEV
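After the tune2fs step, the result can be checked by reading the feature flags from the superblock (`dumpe2fs -h /dev/DEV` also lists them under "Filesystem features"). The sketch below tests the three features enabled by that command; the field offsets and bit values come from the ext4 disk-layout documentation, and the function name is hypothetical.

```python
import struct

SUPERBLOCK_OFFSET = 1024
# feature name -> (offset of the flag word inside the superblock, bit)
FEATURE_BITS = {
    "dir_index": (0x5C, 0x0020),   # s_feature_compat
    "extents":   (0x60, 0x0040),   # s_feature_incompat
    "uninit_bg": (0x64, 0x0010),   # s_feature_ro_compat (GDT_CSUM)
}

def enabled_features(image: bytes) -> set:
    """Return which of FEATURE_BITS are set in the primary superblock."""
    found = set()
    for name, (offset, bit) in FEATURE_BITS.items():
        (flags,) = struct.unpack_from("<I", image, SUPERBLOCK_OFFSET + offset)
        if flags & bit:
            found.add(name)
    return found
```

To inspect a real filesystem, read the first 2048 bytes of the device (for example, `open("/dev/DEV", "rb").read(2048)`, which normally requires root) and pass them to `enabled_features`.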
Sources
• http://kernelnewbies.org/Ext4
• https://ext4.wiki.kernel.org/index.php/Ext4_Howto