File Systems and Disk Management
Andy Wang
Operating Systems
COP 4610 / CGS 5765
Design Goals of File Systems
Physical reality                           | File system abstraction
Block-oriented                             | Byte-oriented
Physical sectors                           | Named files
No protection                              | Users protected from one another
Data might be corrupted if machine crashes | Robust to machine failures
File System Components
• Disk management organizes disk blocks into files
• Naming provides file names and directories to users, instead of track and sector numbers (e.g., Takashita)
• Protection keeps information secure from other users
• Reliability protects against information loss due to system crashes
User vs. System View of a File
• User level: individual files
• System call level: collection of bytes
• Operating system level:
  – A block is a logical transfer unit
      – Even for getc() and putc(), which each transfer a single character
      – 4 Kbytes under UNIX
  – A sector is a physical transfer unit
      – 4-Kbyte sectors on disks
• File: a named collection of blocks
User vs. System View of a File
• A process
  – Read bytes 2 to 12
• OS
  – Fetch the block containing those bytes
  – Return those bytes to the process
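The read path above can be sketched as follows. This is a minimal illustration under stated assumptions, not how any particular kernel implements it. `BlockDevice` is a hypothetical toy disk that only transfers whole blocks, and `read_bytes` is a hypothetical helper. It assumes file block n is stored in disk block n, so no allocation policy is involved yet.

```python
BLOCK_SIZE = 4096  # logical block size: 4 Kbytes, as under UNIX


class BlockDevice:
    """Hypothetical toy disk that can only transfer whole blocks."""

    def __init__(self, num_blocks, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.blocks = [bytearray(block_size) for _ in range(num_blocks)]
        self.reads = 0  # counts whole-block transfers

    def read_block(self, n):
        self.reads += 1
        return bytes(self.blocks[n])


def read_bytes(dev, first, last):
    """Return bytes first..last (inclusive) of a file.

    Assumes file block n is stored in disk block n (hypothetical layout).
    """
    bs = dev.block_size
    out = bytearray()
    for b in range(first // bs, last // bs + 1):
        block = dev.read_block(b)             # fetch the whole block
        lo = max(first, b * bs) - b * bs      # requested range within this block
        hi = min(last, b * bs + bs - 1) - b * bs
        out += block[lo:hi + 1]               # keep only the requested bytes
    return bytes(out)


dev = BlockDevice(num_blocks=2)
dev.blocks[0][:16] = b"0123456789abcdef"
print(read_bytes(dev, 2, 12))  # b'23456789abc'
print(dev.reads)               # 1
```

Reading 11 bytes still moves one full 4-Kbyte block. A range that straddles a block boundary costs one transfer per block touched.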
User vs. System View of a File
• A process
  – Write bytes 2 to 12
• OS
  – Fetch the block containing those bytes
  – Modify those bytes
  – Write out the block
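The three OS steps form a read-modify-write cycle. The following minimal sketch makes that concrete. `BlockDevice` and `write_bytes` are hypothetical names, and, as before, file block n is assumed to live in disk block n. The toy device refuses partial-block writes, which is what forces the fetch before the modify.

```python
BLOCK_SIZE = 4096  # logical block size: 4 Kbytes, as under UNIX


class BlockDevice:
    """Hypothetical toy disk that can only transfer whole blocks."""

    def __init__(self, num_blocks, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.blocks = [bytearray(block_size) for _ in range(num_blocks)]

    def read_block(self, n):
        return bytearray(self.blocks[n])      # copy into a memory buffer

    def write_block(self, n, data):
        if len(data) != self.block_size:
            raise ValueError("can only write a whole block")
        self.blocks[n] = bytearray(data)


def write_bytes(dev, first, data):
    """Write data at file offset `first` using read-modify-write.

    Assumes file block n is stored in disk block n (hypothetical layout).
    """
    bs = dev.block_size
    last = first + len(data) - 1
    for b in range(first // bs, last // bs + 1):
        block = dev.read_block(b)                  # 1. fetch the block
        lo = max(first, b * bs)                    # file offsets this block covers
        hi = min(last, b * bs + bs - 1)
        block[lo - b * bs:hi - b * bs + 1] = data[lo - first:hi - first + 1]  # 2. modify
        dev.write_block(b, block)                  # 3. write out the whole block


dev = BlockDevice(num_blocks=2)
write_bytes(dev, 2, b"hello world")   # bytes 2 to 12
print(bytes(dev.blocks[0][:14]))      # b'\x00\x00hello world\x00'
```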
Ways to Access a File
• People use file systems
• Design of file systems involves understanding how people use file systems
  – Sequential access: bytes are accessed in order
  – Random access (direct access): bytes are accessed in any order
  – Content-based access: bytes are accessed according to constraints on byte contents
      – e.g., return 100 bytes starting with “aye carumba”
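The three access patterns can be illustrated on an in-memory file using Python's `io.BytesIO`. The helper names are hypothetical. The content-based version is a naive linear scan chosen for illustration; a real system would more likely rely on an index.

```python
import io


def sequential_read(f, chunk_size):
    """Sequential access: visit the bytes in order, one chunk at a time."""
    f.seek(0)
    chunks = []
    while chunk := f.read(chunk_size):
        chunks.append(chunk)
    return chunks


def random_read(f, offset, length):
    """Random (direct) access: jump straight to any offset."""
    f.seek(offset)
    return f.read(length)


def content_read(f, pattern, length=100):
    """Content-based access: locate bytes by what they contain.

    Naive linear scan over the whole file; returns None if there is no match.
    """
    f.seek(0)
    data = f.read()
    start = data.find(pattern)
    return None if start < 0 else data[start:start + length]


f = io.BytesIO(b"Bart said: aye carumba! Then he left.")
print(random_read(f, 0, 4))               # b'Bart'
print(content_read(f, b"aye carumba"))    # b'aye carumba! Then he left.'
```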
File Usage Patterns
• Most files are small, and most references are to small files
  – e.g., .login and .c files
• Large files use up most of the disk space
  – e.g., mp4 files
• Large files account for most of the bytes transferred between memory and disk
• Bad news for file system designers
File System Design Constraints
• High performance
  – Efficient access of small files
      – Many small files
      – Used frequently
  – Efficient access of large files
      – Consume most disk space
      – Account for most of the data movement
Some Definitions
• A file contains a file header, which associates the file with its disk sectors
File header
  name
  data block location
  data block location
Some Definitions
• A file system needs an allocation bitmap to track free space on the disk, one bit per block
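An allocation bitmap packs one bit per block into bytes. The sketch below is minimal, and the class and method names are hypothetical. The free-block search is a simple linear scan; real file systems typically search smarter, for example near a file's existing blocks.

```python
class AllocationBitmap:
    """Free-space bitmap: one bit per disk block (1 = in use, 0 = free)."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.bits = bytearray((num_blocks + 7) // 8)   # round up to whole bytes

    def is_allocated(self, block):
        return bool(self.bits[block // 8] & (1 << (block % 8)))

    def allocate(self, block):
        self.bits[block // 8] |= 1 << (block % 8)

    def free(self, block):
        self.bits[block // 8] &= ~(1 << (block % 8)) & 0xFF

    def find_free(self):
        """Linear scan for the first free block; None if the disk is full."""
        for block in range(self.num_blocks):
            if not self.is_allocated(block):
                return block
        return None


bm = AllocationBitmap(num_blocks=10)   # 10 bits fit in 2 bytes
bm.allocate(0)
bm.allocate(1)
print(bm.find_free())   # 2
```

For scale, a 1 TiB disk with 4-Kbyte blocks has 2^40 / 2^12 = 2^28 blocks. Its bitmap therefore needs 2^28 bits, which is 32 MiB.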
Disk Allocation Policies
• Contiguous allocation
• Linked-list allocation
• Segment-based allocation
• Indexed allocation
• Multi-level indexed allocation
• Hashed allocation
Contiguous Allocation
• File blocks are stored contiguously on disk
• To allocate a file,
  – Specify the file size
  – Search the disk allocation bitmap for consecutive free blocks
File header
  data block location
  number of blocks
Pros and Cons of Contiguous Allocation
+ Fast sequential access
+ Ease of computing random file locations
  – Adding an offset to the first disk block location
- External fragmentation
- Difficulty in growing files
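The sketch below covers allocation, random-access lookup, and external fragmentation under contiguous allocation. It uses first-fit, which is only one possible policy for searching the bitmap. For readability, the bitmap is simplified to a list of booleans, and the function names are hypothetical. The file header is the pair (first block, number of blocks).

```python
def allocate_contiguous(bitmap, n):
    """First-fit search for n consecutive free blocks.

    bitmap: list of bools, True = block in use (a simplified stand-in for a
    bit-per-block bitmap). Returns the file header (first block, number of
    blocks), or None if no free run is long enough.
    """
    if n < 1:
        raise ValueError("a file needs at least one block")
    run_start, run_len = 0, 0
    for i, used in enumerate(bitmap):
        if used:
            run_len = 0                  # run broken by an allocated block
            continue
        if run_len == 0:
            run_start = i
        run_len += 1
        if run_len == n:
            for blk in range(run_start, run_start + n):
                bitmap[blk] = True
            return (run_start, n)
    return None


def disk_block(header, file_block):
    """Random access: add the offset to the first disk block location."""
    first, count = header
    if not 0 <= file_block < count:
        raise IndexError("offset past end of file")
    return first + file_block


bitmap = [False] * 8
file_a = allocate_contiguous(bitmap, 3)   # (0, 3)
file_b = allocate_contiguous(bitmap, 2)   # (3, 2)
file_c = allocate_contiguous(bitmap, 2)   # (5, 2)
print(disk_block(file_c, 1))              # 6
for blk in range(3, 5):                   # delete file_b
    bitmap[blk] = False
print(allocate_contiguous(bitmap, 3))     # None
```

After `file_b` is deleted, blocks 3, 4, and 7 are free, but no three of them are adjacent. The last request therefore fails even though enough total space exists. That is external fragmentation.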
Linked-List Allocation
• Each file block on a disk is associated with a pointer to the next block
  – A special marker indicates the end of the file
• e.g., MS-DOS file system
  – File Allocation Table (FAT)
File header
  data block location
  next block entry
Pros and Cons of Linked-List Allocation
+ Files can grow dynamically with incremental allocation of blocks
- Sequential access may suffer
  – Blocks may not be contiguous
- Horrible random accesses
  – Reaching the nth block requires following n pointers from the start of the file
- Unreliable
  – A corrupted pointer can lead to loss of the remaining file
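Next is a FAT-style chain. In this sketch, the table is a Python list indexed by block number, and each entry holds the next block of the file. The `EOF` and `FREE` markers and all function names are hypothetical; the real FAT format reserves its own special values, which are not reproduced here.

```python
EOF = -1     # hypothetical end-of-file marker
FREE = None  # hypothetical "unused entry" marker


def fat_chain(fat, first_block):
    """Sequential access: follow next-block entries until the EOF marker."""
    blocks = []
    b = first_block
    while b != EOF:
        blocks.append(b)
        b = fat[b]
    return blocks


def fat_lookup(fat, first_block, file_block):
    """Random access: reaching block n means following n pointers."""
    b = first_block
    for _ in range(file_block):
        b = fat[b]
        if b == EOF:
            raise IndexError("offset past end of file")
    return b


def fat_append(fat, first_block, new_block):
    """Grow the file by one block: link it after the current last block."""
    last = fat_chain(fat, first_block)[-1]
    fat[last] = new_block
    fat[new_block] = EOF


fat = [FREE] * 8
fat[4], fat[7], fat[2] = 7, 2, EOF     # a file stored in blocks 4 -> 7 -> 2
print(fat_chain(fat, 4))               # [4, 7, 2]
print(fat_lookup(fat, 4, 2))           # 2
fat_append(fat, 4, 5)
print(fat_chain(fat, 4))               # [4, 7, 2, 5]
```

Growing the file only touches two table entries. In contrast, `fat_lookup` must walk the chain from the start, so its cost grows linearly with the offset.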
Indexed Allocation
• Uses a preallocated index to directly track the file block locations
File header
  data block location
  data block location
Pros and Cons of Indexed Allocation
+ Fast lookups and random accesses
- File blocks may be scattered all over the disk
  – Poor sequential access
  – Needs defragmenter
- Needs to reallocate index as the file size increases
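This sketch models an indexed file header with a fixed-size, preallocated index. `IndexedFile` and its methods are hypothetical names. A lookup is a single array access, and a full index raises an error rather than being reallocated, which marks the point where a real file system would have to grow it.

```python
class IndexedFile:
    """File header with a preallocated index of data block locations."""

    def __init__(self, name, index_size):
        self.name = name
        self.index = [None] * index_size   # slots allocated up front
        self.num_blocks = 0

    def append_block(self, disk_block):
        if self.num_blocks == len(self.index):
            # A real file system would now have to reallocate a larger
            # index, one of the cons of indexed allocation.
            raise OverflowError("index full")
        self.index[self.num_blocks] = disk_block
        self.num_blocks += 1

    def disk_block(self, file_block):
        """Random access is a single index lookup, no pointer chasing."""
        if not 0 <= file_block < self.num_blocks:
            raise IndexError("offset past end of file")
        return self.index[file_block]


f = IndexedFile("notes.c", index_size=3)
for blk in (91, 4, 57):        # scattered disk blocks: poor sequential locality
    f.append_block(blk)
print(f.disk_block(2))         # 57
```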
Segment-Based Allocation
• Needs a segment table to allocate multiple, contiguous regions of blocks
File header
  begin, end blocks
  begin, end blocks
Pros and Cons of Segment-Based Allocation
+ Relaxes the requirement for large contiguous disk regions
- As fragmentation approaches 100%,
  – Segment-based allocation degenerates into indexed allocation (one block per segment)
- Random accesses not as fast as pure contiguous allocation
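The following sketch shows a segment-table lookup. It assumes each entry is an inclusive (begin, end) pair of disk blocks, stored in file order; `segment_lookup` is a hypothetical name. Finding a block means walking the table. That walk is why random access is slower than the single addition used in pure contiguous allocation.

```python
def segment_lookup(segments, file_block):
    """Map a file block number to a disk block via a segment table.

    segments: list of (begin, end) disk block numbers, inclusive, in file order.
    Cost is O(number of segments) per lookup.
    """
    if file_block < 0:
        raise IndexError("negative offset")
    remaining = file_block
    for begin, end in segments:
        length = end - begin + 1
        if remaining < length:
            return begin + remaining
        remaining -= length
    raise IndexError("offset past end of file")


table = [(10, 13), (40, 41), (7, 9)]   # three contiguous regions, 9 blocks total
print(segment_lookup(table, 4))         # 40
```

Consider the case where every segment is one block long, for example `[(91, 91), (4, 4), (57, 57)]`. The table is then just an index that stores each location twice, which is the fully fragmented case.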
Multilevel Indexed Allocation
• Certain index entries point to index blocks, as opposed to data blocks (e.g., Linux ext2/ext3; ext4 keeps this scheme for files that do not use extents)
File header
  data block location    (12 direct pointers)
  index block location   (single indirect)
  index block location   (double indirect)
  index block location   (triple indirect)
Multilevel Indexed Allocation
• A single indirect block contains pointers to data blocks
• A double indirect block contains pointers to single indirect blocks
• A triple indirect block contains pointers to double indirect blocks
Pros and Cons of Multilevel Indexed Allocation
+ Optimized for small and large files
  – Small files accessed via the first 12 pointers
  – Large files can grow incrementally
- Multiple disk accesses to fetch a data block reached through the triple indirect block
- File size capped by the number of pointers
- Arbitrary file size boundaries among levels
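This sketch maps a file block number onto the 12-direct plus single/double/triple-indirect layout. It also computes the resulting maximum file size. `locate` and `max_file_blocks` are hypothetical names. The example figures assume 4-Kbyte blocks and 4-byte block pointers, which gives 1024 pointers per index block.

```python
DIRECT = 12  # direct pointers in the file header, as on the slide


def locate(file_block, ptrs_per_block):
    """Return (level, slots) for a file block number.

    level 0 = direct pointer; 1, 2, 3 = single, double, triple indirect.
    slots = which pointer to follow at each step, top level first; reaching
    the data block takes `level` extra index-block reads.
    """
    if file_block < 0:
        raise ValueError("negative file block")
    n = file_block
    if n < DIRECT:
        return 0, [n]
    n -= DIRECT
    p = ptrs_per_block
    for level in (1, 2, 3):
        span = p ** level              # data blocks reachable at this level
        if n < span:
            slots = []
            for _ in range(level):     # write n in base p, one digit per level
                slots.append(n % p)
                n //= p
            return level, slots[::-1]
        n -= span
    raise ValueError("file block beyond the maximum file size")


def max_file_blocks(ptrs_per_block):
    p = ptrs_per_block
    return DIRECT + p + p ** 2 + p ** 3


print(locate(5, 1024))             # (0, [5])
print(locate(12, 1024))            # (1, [0])
print(locate(12 + 1024, 1024))     # (2, [0, 0])
print(max_file_blocks(1024))       # 1074791436
```

Under these assumptions, a file is capped at 1,074,791,436 blocks, or just over 4 TiB. The level boundaries fall at fixed, arbitrary block numbers: 12, then 12 + 1024, and so on.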
Hashed Allocation
• Allocates a disk block by hashing the block content to a disk location
Old file header
  data block location
  data block location
New file header
  data block location
  data block location
Pros and Cons of Hashed Allocation
+ File blocks with the same content can share the same disk block to save storage
  – e.g., empty blocks
+ Good for backups and archival
  – Small modifications to a large file require storing only the changed blocks
- Poor disk performance
  – Consecutive file blocks hash to scattered disk locations
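A content-addressed block store illustrates hashed allocation. This is a sketch under stated assumptions. A dict keyed by SHA-256 digest (`hashlib.sha256`) stands in for "hashing the block content to a disk location". Hash collisions are ignored, and `HashedStore`, `write_file`, and `read_file` are hypothetical names. A file header here is simply the list of its block digests.

```python
import hashlib


class HashedStore:
    """Content-addressed block store (dict stands in for disk locations)."""

    def __init__(self):
        self.blocks = {}                      # digest -> block content

    def put(self, block):
        key = hashlib.sha256(block).hexdigest()
        self.blocks.setdefault(key, block)    # identical content is stored once
        return key

    def get(self, key):
        return self.blocks[key]


def write_file(store, data, block_size=4096):
    """Store a file; its header is the list of block digests."""
    return [store.put(data[i:i + block_size])
            for i in range(0, len(data), block_size)]


def read_file(store, header):
    return b"".join(store.get(key) for key in header)


store = HashedStore()
old = write_file(store, b"A" * 4096 + b"B" * 4096 + b"C" * 4096)
new = write_file(store, b"A" * 4096 + b"b" * 4096 + b"C" * 4096)  # one block edited
print(len(store.blocks))    # 4
```

The old and new headers share two of their three blocks, so the edit costs one extra block. Nothing in the digests keeps a file's blocks near each other, though. On a real disk, that scattering is the source of the poor performance.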