File System Internals Sunny Gleason COM S 414 November 29, 2001

File System Internals
Sunny Gleason
COM S 414
November 29, 2001
In this Lecture
• The Hard Disk
– Architecture
– Performance
• File System Structures
• Local File systems
– Such as FAT, UFS, Ext2, Ext3
Where to Find More Info
• Hard Drive Manufacturers
– http://www.storage.ibm.com/hdd/index.htm
– http://www.seagate.com/newsinfo/technology/
– http://www.westerndigital.com/library/
• Windows File Systems
– http://www.microsoft.com/hwdev/download/hardware/fatgen103.pdf
– http://msdn.microsoft.com/library/default.asp?url=/library/en-us/fileio/fsys_10ku.asp
Where to Find More Info
• Unix File Systems
– http://www-106.ibm.com/developerworks/library/l-fs.html
• NFS Version 4
– http://nfsv4.org/
• The Actual Code
– Linux Kernel Source
• http://www.kernel.org/
• (look in the “fs” directory of any 2.4 kernel)
– BSD Kernel Source
• http://www.openbsd.org/
• (Look in the “sys/ufs” directory)
Where to Find More Info
• The Book
– Chapters 11, 12, …
• CS414 Spring 2001 Web Site
– http://www.cs.cornell.edu/Courses/cs414/2001sp/
– (from which these slides are mostly stolen…)
• CS414 Fall 2000 Web Site
– http://www.cs.cornell.edu/Courses/cs414/2000fa/
– (other useful slide sets available)
The Memory Hierarchy
• Memory is arranged as a hierarchy:
– Close to CPU:
• Registers, L1 cache
• L2 Cache
– RAM (primary memory)
– Disk Storage (secondary memory)
– Tape or Optical Storage (tertiary mem.)
• Higher in the hierarchy = higher speed, higher cost per byte
Hard Disk: Architecture
• A disk drive has several physical components
– spindle
– surface (one side in the pack)
– read/write arm and head
– track (cylinder is vertical set of tracks)
– Sector
Physical Disk Access
• Delays associated with accessing a
sector on the disk:
– Seek delay (biggest)
• Moving the read/write head
– Rotational delay
• Waiting for the sector to spin under the head
– Transfer delay (smallest)
• Transferring the bits from the disk
Physical Disks
• O/S goal: provide file system API
• Problems with disks:
– Read errors
– Bad blocks
– Missed seeks
• O/S Disk API may have many levels:
– Physical disk block <surf#, cyl#, sec#>
– Disk (volume) logical block <block#>
– File logical <file block, record, or byte#>
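The step from a physical disk block <surf#, cyl#, sec#> to a volume logical block <block#> can be sketched with the classic CHS-to-LBA formula. This is a minimal sketch, assuming a uniform geometry (the same number of sectors on every track) and 1-based sector numbers; the function names and the geometry parameters are illustrative and do not come from the lecture.

```python
def chs_to_block(cyl, surf, sec, surfaces, sectors_per_track):
    """Map <surf#, cyl#, sec#> to a logical block number.

    Assumes a uniform geometry and 1-based sector numbers.
    """
    if cyl < 0 or not (0 <= surf < surfaces) or not (1 <= sec <= sectors_per_track):
        raise ValueError("address outside the assumed geometry")
    return (cyl * surfaces + surf) * sectors_per_track + (sec - 1)


def block_to_chs(block, surfaces, sectors_per_track):
    """Inverse mapping: logical block number back to (cyl, surf, sec)."""
    cyl, rest = divmod(block, surfaces * sectors_per_track)
    surf, sec0 = divmod(rest, sectors_per_track)
    return cyl, surf, sec0 + 1
```

Numbering blocks this way keeps consecutive logical blocks on the same cylinder for as long as possible, so reading them in order needs few seeks.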
Logical Disks
• A single hard disk may contain
multiple file systems
Making the HD Usable
• The hard disk must be partitioned
• Partitions are formatted with specific
filesystems
• In some cases, can “quick format” instead
of full reformat
• Multiple partitions are useful
– (Limited) protection against crashes
– If one partition fills up, the rest are still usable
– “Dual-booting” - in general, ability to load
multiple operating systems
Some Typical Numbers
• Sector Size: 512 bytes
• Cylinders per disk: 6962
• Platters: 3 - 12
• Rotational Speed: 10,000 rpm
• Storage size: 12 - 120 GB
• Seek time: 5 - 12ms
• Latency: 3ms
• Transfer Rate: 14-20 MB/sec
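These numbers can be combined into a rough estimate of the time to read one sector: seek delay + rotational delay + transfer delay. The sketch below assumes that the average rotational delay is half a revolution and that 1 MB = 10^6 bytes; the function names are illustrative.

```python
def avg_rotational_delay_ms(rpm):
    # On average the target sector is half a revolution away from the head.
    return 0.5 * 60_000.0 / rpm


def transfer_time_ms(num_bytes, mb_per_sec):
    # Assumes 1 MB = 1,000,000 bytes.
    return num_bytes / (mb_per_sec * 1_000_000) * 1000.0


def access_time_ms(seek_ms, rpm, num_bytes, mb_per_sec):
    """Seek delay + average rotational delay + transfer delay, in ms."""
    return (seek_ms
            + avg_rotational_delay_ms(rpm)
            + transfer_time_ms(num_bytes, mb_per_sec))
```

At 10,000 rpm the average rotational delay works out to 3 ms, matching the latency figure above. With a 5 ms seek and a 14 MB/sec transfer rate, one 512-byte sector takes about 8.04 ms, of which the transfer itself is under 0.04 ms.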
Disk Structure
• Bare disk interface: cylinders, sectors
• O/S imposes structure on disks
• Disk contents:
– Data : user files
– Metadata: structural / administrative info
• Any ideas?
• Free list: structure indicating which blocks are
unused
• Typically maintained as a bitmap: an array of
bits, representing blocks
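A free list kept as a bitmap can be sketched as follows. This is a minimal in-memory sketch, assuming that bit i is 1 when block i is allocated; the class and method names are illustrative, and a real file system would also write the bitmap back to disk.

```python
class BlockBitmap:
    """Free list as a bitmap: bit i is 1 when block i is allocated."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.bits = bytearray((num_blocks + 7) // 8)  # all blocks start free

    def is_free(self, block):
        return ((self.bits[block // 8] >> (block % 8)) & 1) == 0

    def allocate(self):
        """Mark the first free block as used and return its number."""
        for block in range(self.num_blocks):
            if self.is_free(block):
                self.bits[block // 8] |= 1 << (block % 8)
                return block
        raise OSError("no free blocks")

    def free(self, block):
        """Return a block to the free list."""
        self.bits[block // 8] &= ~(1 << (block % 8)) & 0xFF
```

One bit per block keeps the structure small: a volume with one million blocks needs a bitmap of only about 122 KB.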
Dealing with Mechanical Latencies
• Caches
– Locality in file access
• RAM disk
– Reserve RAM as a [fast!] filesystem
• RAID
– Exploiting parallelism
• Clever layouts and scheduling algorithms
– Head scheduling
– Meta-information layout
Bad Blocks
• All disks have some bad blocks
• Blocks go bad as time goes on
• O/S removes these blocks from the
allocation map
• On some disks, some cylinders have
reserve blocks that can be remapped to
replace bad blocks
The File System
• File system supports the abstraction of file
objects
– Create, delete, read, write, rename
• File: a named collection of data
• Typical abstraction: a vector of bytes
• O/S knows about special file types:
– Directories, symlinks, executable files
• For data files, applications decide internal file
structure (data file format)
Accessing Files
• Files can be accessed in different ways:
– Sequential Access
• Read bytes one at a time, in order
– Direct access
• Random access, given block/byte number
– Record access
• Some higher-level structure, instead of byte
– Indexed access
• Uses map from index field to corresponding file record
Storing Files
• Files can be allocated in different ways:
– Contiguous allocation
• All bytes together, in order
– Linked Structure
• Each block points to the next block
– Indexed Structure
• An index block contains pointer to many other blocks
– Rhetorical Questions -- which is best?
• For sequential access? Random access?
• Large files? Small files? Mixed?
Linked-List Allocation
• Each data block contains pointer to the
next data block
• Advantages?
• Disadvantages?
Linked-List Allocation
• A single pointer is sufficient to locate all
the blocks of the file
• Seeking to a given offset takes O(n) time, where n is the
number of blocks in the file
• A single corrupt pointer can cause the
entire file to be lost
MS-DOS Filesystem
• MS-DOS uses a File Allocation Table
(FAT)
• Like a linked structure, except pointers
are kept in a separate table
– For every block, the FAT keeps track of
whether or not it is allocated, and if so,
which block it points to
– Two copies of the FAT on disk
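Following a file's chain through the table can be sketched as below. This is a simplified model rather than the on-disk FAT format: the table is a Python list indexed by block number, and the FREE and EOF markers are assumed encodings chosen for illustration.

```python
FREE = None   # assumed marker: block is not allocated
EOF = -1      # assumed marker: last block of a file


def file_blocks(fat, first_block):
    """Follow the chain in the table and return the file's blocks in order."""
    blocks = []
    block = first_block
    while block != EOF:
        # A chain that reaches a free entry, or that is longer than the
        # table itself (a cycle), means the table is corrupt.
        if block is FREE or len(blocks) >= len(fat):
            raise ValueError("corrupt FAT chain")
        blocks.append(block)
        block = fat[block]
    return blocks
```

Because the pointers live in one table instead of inside the data blocks, the chain can be followed without reading the data blocks, although finding the k-th block still takes k table lookups.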
Indexed Allocation
• Index block contains pointers to each
data block
• Pros?
• Cons?
Combined Scheme: UFS
• Unix File System
• An inode contains the metadata for
UNIX files
– Contains control and allocation information
– Each inode contains 15 block pointers
• 12 direct
• 1 single, 1 double, 1 triple indirect
– Kind of tricky -- see the diagram!
[Figure: UNIX inode diagram]
UNIX Inode
• If data blocks are 4K and block pointers are 4 bytes (1024 pointers per block) …
– First 48K reachable from the inode
– Next 4MB available from single-indirect
– Next 4GB available from double-indirect
– Next 4TB (!) available through the triple-indirect block
• Any block can be located with at most 3 disk accesses
(one per level of indirection), assuming the inode is already in memory
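The sizes above can be checked with a short calculation. The sketch assumes 4-byte block pointers, so a 4K block holds 1024 of them, which is what the 4MB / 4GB / 4TB figures imply; the constant and function names are illustrative.

```python
BLOCK_SIZE = 4096                            # 4K data blocks, as on the slide
POINTER_SIZE = 4                             # assumption: 4-byte block numbers
PTRS_PER_BLOCK = BLOCK_SIZE // POINTER_SIZE  # 1024 pointers per indirect block
NUM_DIRECT = 12                              # direct pointers in the inode


def reachable_bytes(level):
    """Bytes reachable through one pointer group (0 = the direct pointers)."""
    if level == 0:
        return NUM_DIRECT * BLOCK_SIZE
    return PTRS_PER_BLOCK ** level * BLOCK_SIZE


def indirection_level(file_block):
    """Number of indirect blocks to read before file_block can be located.

    0 = direct pointer, 1 = single, 2 = double, 3 = triple indirect.
    """
    if file_block < 0:
        raise ValueError("negative block number")
    limit = NUM_DIRECT
    if file_block < limit:
        return 0
    for level in (1, 2, 3):
        limit += PTRS_PER_BLOCK ** level
        if file_block < limit:
            return level
    raise ValueError("block number beyond the maximum file size")
```

Small files stay entirely within the direct pointers and pay no extra accesses. Only blocks far into a very large file need all three levels.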
UNIX Directories
• Directories are just like regular files
– They contain <filename, inode#> tuples
– The name is usually stored as the filename plus its
length (filename_length)
Example directory entries (<filename, inode#>):
usr 3
home 4
etc 5
inode 4 (the directory for home):
ken 7
hopkik 9
gleason 12
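Resolving a path is then a chain of directory lookups, one per path component. The sketch below uses the entries from the example and reads the first table as the root directory; the root's inode number (2 here) is an assumption, since the slide does not give it, and the function name is illustrative.

```python
ROOT_INODE = 2   # assumption: the slide does not give the root's inode number

# inode number of a directory -> its <filename, inode#> entries
DIRECTORIES = {
    ROOT_INODE: {"usr": 3, "home": 4, "etc": 5},
    4: {"ken": 7, "hopkik": 9, "gleason": 12},
}


def lookup(path):
    """Return the inode number for an absolute path."""
    inode = ROOT_INODE
    for name in path.strip("/").split("/"):
        if not name:
            continue  # "/" alone, or a doubled slash
        # Directories whose contents the slide does not show are absent
        # from DIRECTORIES, so lookups inside them fail here too.
        entries = DIRECTORIES.get(inode)
        if entries is None or name not in entries:
            raise FileNotFoundError(path)
        inode = entries[name]
    return inode
```

For example, lookup("/home/gleason") reads the root directory to find inode 4 for home, then reads that directory to find inode 12.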
UNIX Disk Layout
Boot Block | Superblock | Inodes | Data Blocks …
• Boot block provides information on how
to boot the computer (tiny “bootstrap”
program)
• Superblock contains the file system
layout: # of inodes, block size, location
of the free list
File System Problems
• Fragmentation
– When the blocks of a file are located all
over the physical disk
– Causes undesirable seeking
– Use defragmentation utility to compact the
filesystem, consolidate free space
– See the pictures!
[Figures: Fragmentation; Defragmentation]
File System Problems
• Unreliability
– Historically, disks have been among the most
unreliable components
• Develop “bad blocks”
• Modern disks detect such faults, and have replacement
blocks that can be remapped to replace bad blocks
• Filesystems still need to track bad blocks and avoid using
them
• Inode 1 is a special inode that keeps track of where all
the bad blocks are
File System Problems
• System crashes or power failures can
occur at any time
– Any disk operation can be interrupted at
any time
– Need to ensure that the filesystem is
consistent throughout updates
• Data that is being modified may be lost, but
that should not compromise entire file system
File System Problems
• Crashes can occur at any time
– A write in UNIX involves:
• Writing the new data
• Updating the inode
• Updating the free list
– Is there a correct order? What can go
wrong if the FS does not respect the
order?
Disk Scheduling
• To minimize mechanical delays, the O/S looks
at multiple pending disk requests
– FCFS (first come, first served)
• Ok when load is low
• Long waiting times for long request queues
– SSTF (shortest seek time first)
• Always minimizes arm movement, maximizes throughput
• Favors middle blocks
– SCAN (elevator)
• Continue in same direction until done, then reverse
direction and service in that order
– C-SCAN: like SCAN, but returns to cylinder 0 at the end of each sweep
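The four policies can be compared by computing the order in which each would service a queue of pending cylinder requests. This is a simplified sketch: the function names and the sample queue are illustrative, scan reverses at the last pending request instead of travelling to the edge of the disk (the variant usually called LOOK), and total_movement counts only the distance between consecutive serviced requests.

```python
def fcfs(start, requests):
    """First come, first served: service in arrival order."""
    return list(requests)


def sstf(start, requests):
    """Shortest seek time first: always pick the closest pending request."""
    pending, order, pos = list(requests), [], start
    while pending:
        nxt = min(pending, key=lambda c: abs(c - pos))
        pending.remove(nxt)
        order.append(nxt)
        pos = nxt
    return order


def scan(start, requests):
    """Elevator: sweep toward higher cylinders, then reverse direction."""
    up = sorted(c for c in requests if c >= start)
    down = sorted((c for c in requests if c < start), reverse=True)
    return up + down


def c_scan(start, requests):
    """Like scan, but after the upward sweep restart from the low end."""
    up = sorted(c for c in requests if c >= start)
    wrapped = sorted(c for c in requests if c < start)
    return up + wrapped


def total_movement(start, order):
    """Cylinders crossed when servicing requests in the given order."""
    total, pos = 0, start
    for c in order:
        total += abs(c - pos)
        pos = c
    return total
```

With the head at cylinder 53 and the queue [98, 183, 37, 122, 14, 124, 65, 67], FCFS moves the arm 640 cylinders while SSTF moves it 236.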
Disk Scheduling
• In general, unless requests queue up, the choice of
scheduling policy doesn’t matter
• The O/S may locate files strategically
for performance reasons
– The Organ Pipe distribution locates heavily used
files towards the center of the disk
– The Ext2 Filesystem places groups of
inodes around the disk, closer to the data
blocks that they reference
Conclusion
• Hard disks provide vast amounts of
slow, cheap storage
• Operating Systems layer file system
services on top of the raw disk API
• The O/S must find ways to work around
the slow performance and unreliability
of disk storage
Thanks!
• Any questions?
• Review session - Tuesday, 12/04
5:30 - 7:30pm