COMPUTER SYSTEMS
An Integrated Approach to Architecture and Operating Systems
Chapter 11 File System
©Copyright 2008 Umakishore Ramachandran and William D. Leahy Jr.

11.1 Attributes
• Attributes associated with a file are referred to as metadata
• Metadata represents space overhead
• Typical attributes
  – Name
  – Alias
  – Owner
  – Creation time
  – Last write time
  – Access rights
  – Privileges: Read, Write, Execute
  – Size

11.1 Attributes - Name
• Initially each file had a name, and the list of names was kept in a directory file
• As file system capacity increased, a second level was added
  – Top-level directory containing names of bottom-level directories
  – Bottom-level directories containing names of files
• Eventually, directories were nested to multiple levels, for example:
  Users (directory)
    Eche's music folder (directory)
      Bruce Springsteen (directory): Secret garden, Born to run, I'm on fire
      Billy Joel (directory): We didn't start the fire, Piano Man, Uptown girl
      Tupac Shakur (directory): Changes, California love
• Implemented as a tree structure
  [Figure: directory tree rooted at /, with nodes users, students, staff, faculty, rama, and foo]
• Filename extensions
  – Sometimes mandatory, e.g. DEC TOPS-10
  – Sometimes typical but not mandatory, e.g. Windows
  – Sometimes optional
• The system uses the extension to decide which application to launch to handle the file in the normal case

11.1 Attributes - Alias
• Aliases
  – May be at the actual file level (hard link)
    • Linux: ln foo bar
    • Creates a new directory entry (the alias, bar) with the same status as the original (foo)
      i-node   access rights  hard links  owner  size  creation time  name
      3193357  -rw-------     2           rama   80    Jan 23 18:30   bar
      3193357  -rw-------     2           rama   80    Jan 23 18:30   foo
  – May be at the level of names (same as shortcuts)
    • Linux: ln -s fox box
    • Creates a link entry named box which contains the name fox
      i-node   access rights  hard links  owner  size  creation time  name
      3193495  lrwxrwxrwx     1           rama   3     Jan 23 18:52   box -> fox
      3193357  -rw-------     1           rama   80    Jan 23 18:30   fox

11.1 Attributes - Links
• Hard links
  – Efficient: go right to the file
  – Do not contain other file name(s)
  – A hard link to a directory may lead to circular lists
  – For this reason, Linux does not allow creating hard links to directories
• Soft links
  – Improve usability by indicating the actual file name
  – Less efficient: the system must first read the link to obtain the target name, and then look that name up in its directory before it can reach the file

11.1 Attributes - Versions
• Versions
  – Today, in typical operating systems, writing to an existing file overwrites the file
  – Some operating systems allow versioning, e.g. test.data;4
    • Requires a purge mechanism
    • Very useful
    • Sometimes annoying

11.1 Attributes - Access Rights
• Access rights specify who can access a file and what they can do to the file
• Ideally one should be able to specify access rights per user, but this requires a lot of metadata
• As a compromise, Linux breaks down rights by user, group, and other
• Typical privileges include: read, write, execute, change ownership, change privileges
• Visible metadata for a Linux file
  rwxrw-r-- 1 rama fac 2364 Apr 18 19:13 foo
  – User permissions: rwx
  – Group permissions: rw-
  – Other permissions: r--
  – Hard links: 1
  – Owner: rama
  – Group: fac
  – Size: 2364
  – Creation date and time: Apr 18 19:13
  – File name: foo
Attribute | Meaning | Elaboration
Name | Name of the file | Attribute set at the time of creation or renaming
Alias | Other names that exist for the same physical file | Attribute gets set when an alias is created; systems such as Unix provide explicit commands for creating aliases for a given file; Unix supports aliasing at two different levels (physical or hard, and symbolic or soft)
Owner | Usually the user who created the file | Attribute gets set at the time of creation of a file; systems such as Unix provide a mechanism for the file's ownership to be changed by the superuser
Creation time | Time when the file was first created | Attribute gets set at the time a file is created or copied from some other place
Last write time | Time when the file was last written to | Attribute gets set at the time the file is written to or copied; in most file systems the creation time attribute is the same as the last write time attribute; note that moving a file from one location to another preserves the creation time of the file
Privileges (Read, Write, Execute) | The permissions or access rights to the file; specify who can do what to the file | Attribute gets set to default values at the time of creation of the file; usually, file systems provide commands that let the owner of the file modify the privileges; modern file systems such as NTFS provide an access control list (ACL) to give different levels of access to different users
Size | Total space occupied on the file system | Attribute gets set every time the size changes due to modification to the file

• Windows and some versions of UNIX have an access control list for each file
• Such a practice allows for more flexibility
• The tradeoff is an increase in the amount of metadata stored

Unix command | Semantics | Elaboration
touch <name> | Create a file with the name <name> | Creates a zero-byte file with the name <name> and a creation time equal to the current wall clock time
mkdir <sub-dir> | Create a sub-directory <sub-dir> | The user must have write privilege to the current working directory (if <sub-dir> is a relative name) to be able to successfully execute this command
rm <name> | Remove (or delete) the file named <name> | Only the owner of the file (and/or the superuser) can delete a file
rmdir <sub-dir> | Remove (or delete) the sub-directory named <sub-dir> | Only the owner of the <sub-dir> (and/or the superuser) can remove the named sub-directory
ln -s <orig> <new> | Create a name <new> and make it symbolically equivalent to the file <orig> | This is name equivalence only; so if the file <orig> is deleted, the storage associated with <orig> is reclaimed, and hence <new> will be a dangling reference to a non-existent file
ln <orig> <new> | Create a name <new> and make it physically equivalent to the file <orig> | Even if the file <orig> is deleted, the physical file remains accessible via the name <new>
chmod <rights> <name> | Change the access rights for the file <name> as specified in the mask <rights> | Only the owner of the file (and/or the superuser) can change the access rights
chown <user> <name> | Change the owner of the file <name> to be <user> | Only the superuser can change the ownership of a file
chgrp <group> <name> | Change the group associated with the file <name> to be <group> | Only the owner of the file (and/or the superuser) can change the group associated with a file
cp <orig> <new> | Create a new file <new> that is a copy of the file <orig> | The copy is created in the same directory if <new> is a file name; if <new> is a directory name, then a copy with the same name <orig> is created in the directory <new>
mv <orig> <new> | Rename the file <orig> with the name <new> | Renaming happens in the same directory if <new> is a file name; if <new> is a directory name, then the file <orig> is moved into the directory <new>, preserving its name <orig>
cat/more/less <name> | View the file contents |

11.2 Design Choices in implementing a File System on a Disk Subsystem
• Some design constraints
  – Four components of latency in doing I/O operations to and from disk
    • Seek time to a specific cylinder
    • Rotational latency to get the specific sector under the read/write head of the disk
    • Transfer time from/to the disk controller buffer
    • DMA transfer from/to the controller buffer to/from system memory
  – Files are of arbitrary size
  – Files may be accessed sequentially or randomly
  – Files need to be allocated initially
  – Files need to be able to grow
  – Space should be used efficiently

11.2.1 Contiguous Allocation
• At file creation time a set amount of space is allocated (may depend on file type)
• File cannot grow beyond that size
• Fragmentation is a problem
• Free list
  – Allocation may be by first fit or best fit
  – Requires periodic compaction

11.2.2 Contiguous Allocation with Overflow Area
• Modification of the previous scheme to allow files to expand into a designated overflow area
• Random access suffers due to the overflow area
• Despite its limitations, it has been used extensively due to fast file access times

11.2.3 Linked Allocation
[Figure: an in-memory free list linking the free blocks, and a directory with entries foo.txt, bar.jpg, and baz, each pointing to that file's chain of disk blocks; a 0 pointer ends each chain]
• Files not stored contiguously
• No compaction required
• Sequential and random access poor
• Susceptible to errors

11.2.4 File Allocation Table (FAT)
[Figure: a directory mapping each file name (/foo, /bar) to a FAT start index, and a FAT in which each entry holds a free/busy flag and a next index; -1 appears as a next value]
• Divide disk into partitions
• Each partition has a FAT
• For each file, the directory just has a pointer to the FAT entry for the file's starting sector.
• Less chance for errors than linked allocation
• The FAT becomes big, so clustering and partitioning may be necessary, leading to other problems

11.2.5 Indexed Allocation
• Essentially breaks up the FAT into one data structure per file
• Allocate an index disk block, called an i-node, for each file
• Directory entries now point to the i-node for that file
• Maintain the free list as a bit vector
[Figure: the directory maps /foo to the i-node at address 30 and /bar to the i-node at address 50; the i-node for /foo lists data blocks 100 and 201, and the i-node for /bar lists data block 99]
• Problem: the single index block has to point to every data block of the file, whatever the file's size
• Since the i-node is a fixed size, there is a maximum file size

11.2.6 Multilevel Indexed Allocation
• Make the i-node point to index blocks, which in turn point to the file's data blocks (first-level indirection)
• This concept may be extended to two-level (and beyond) indirection
• Problem: accessing even a small file requires a lot of indirection
[Figure: directory entry /foo points to the i-node at address 30, which points to first-level index blocks 40 and 45, which point to data blocks 100, 201, and 299]

11.2.7 Hybrid Indexed Allocation
• Combine the previous two concepts
  – Two direct pointers for small files
  – One single indirect
  – One double indirect
  – One triple indirect
[Figure: the i-node for /foo at address 30 holds direct pointers to data blocks 100 and 201, a single indirect pointer to index block 40, a double indirect pointer to index block 45, and a triple indirect pointer; index blocks 60 and 70 and data blocks 150, 160, 299, and 399 are reached through the indirect pointers]
• Given the following:
  – Size of index block = 512 bytes
  – Size of data block = 2048 bytes
  – Size of pointer = 8 bytes (to index or data blocks)
  a) What is the maximum size (in bytes) of a file that can be stored in this file system?
  b) How many data blocks are needed for storing a data file of 266 KB?
  c) How many index blocks are needed for storing a data file of size 266 KB?
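The three questions above can be worked through mechanically. The following is a minimal sketch, not the authors' solution: the function name hybrid_inode_stats and the helper ceil_div are hypothetical, and it assumes the i-node layout from the slide (two direct pointers plus one single, one double, and one triple indirect pointer), that 1 KB = 1024 bytes, and that the i-node itself is not counted as an index block.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for b > 0."""
    return -(-a // b)


def hybrid_inode_stats(index_block_size, data_block_size, pointer_size,
                       file_size, direct_pointers=2):
    """Return (max_file_size_bytes, data_blocks, index_blocks) for a file of
    file_size bytes under the assumed hybrid i-node layout."""
    ptrs = index_block_size // pointer_size  # pointers held by one index block

    # (a) Data blocks reachable through direct, single, double, and triple
    #     indirect pointers, converted to bytes.
    max_blocks = direct_pointers + ptrs + ptrs**2 + ptrs**3
    max_file_size = max_blocks * data_block_size

    # (b) Data blocks needed for this file.
    data_blocks = ceil_div(file_size, data_block_size)
    if data_blocks > max_blocks:
        raise ValueError("file exceeds the maximum size for this i-node layout")

    # (c) Index blocks needed, filling the pointers in order.
    index_blocks = 0
    remaining = max(0, data_blocks - direct_pointers)

    if remaining > 0:  # single indirect: one index block
        index_blocks += 1
        remaining -= min(remaining, ptrs)

    if remaining > 0:  # double indirect: one top block plus second-level blocks
        covered = min(remaining, ptrs**2)
        index_blocks += 1 + ceil_div(covered, ptrs)
        remaining -= covered

    if remaining > 0:  # triple indirect: top, middle-level, and leaf blocks
        leaves = ceil_div(remaining, ptrs)
        index_blocks += 1 + ceil_div(leaves, ptrs) + leaves

    return max_file_size, data_blocks, index_blocks


max_size, data_blocks, index_blocks = hybrid_inode_stats(512, 2048, 8, 266 * 1024)
print(max_size, data_blocks, index_blocks)
```

Under these assumptions each index block holds 512 / 8 = 64 pointers, so (a) the maximum file size is (2 + 64 + 64^2 + 64^3) × 2048 = 545,394,688 bytes, (b) a 266 KB file needs 133 data blocks, and (c) it needs 4 index blocks: one single indirect block, one double indirect block, and two second-level blocks under the double indirect block. If the i-node is counted as an index block as well, the answer to (c) becomes 5.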
11.2.8 Comparison of allocation strategies
Allocation Strategy | File representation | Free list maintenance | Sequential Access | Random Access | File growth | Allocation Overhead | Space Efficiency
Contiguous | Contiguous blocks | Complex | Very good | Very good | Messy | Medium to high | Internal and external fragmentation
Contiguous With Overflow | Contiguous blocks for small files | Complex | Very good for small files | Very good for small files | OK | Medium to high | Internal and external fragmentation
Linked List | Non-contiguous blocks | Bit vector | Good but dependent on seek time | Not good | Very good | Small to medium | Excellent
FAT | Non-contiguous blocks | FAT | Good but dependent on seek time | Good but dependent on seek time | Very good | Small | Excellent
Indexed | Non-contiguous blocks | Bit vector | Good but dependent on seek time | Good but dependent on seek time | Limited | Small | Excellent
Multilevel Indexed | Non-contiguous blocks | Bit vector | Good but dependent on seek time | Good but dependent on seek time | Good | Small | Excellent
Hybrid | Non-contiguous blocks | Bit vector | Good but dependent on seek time | Good but dependent on seek time | Good | Small | Excellent

11.3 Putting it all together
• UNIX uses a hybrid allocation approach with hierarchical naming, i.e. no central directory
• Each part of a file name corresponds to an i-node; the i-nodes form a tree-like structure in which all but the leaf nodes are directory files
• Each directory entry contains a type, which indicates whether it is a directory or a data file
[Figure; data blocks not shown]
• Given
  – Current directory: /tmp
  – I-node for /tmp: 20
  – The following Unix commands are executed in the current directory:
    touch foo
    ln foo bar
    ln -s /tmp/foo baz
    ln baz gag
• Note: the type of an i-node can be one of directory-file, data-file, sym-link
• Show the file structure

11.3.1 i-node
• Unix files have a unique number known as the i-node number
• Each file on a disk is represented by an i-node structure that occupies an entire disk block
• The i-node number is just the address of the block for that file
• The file system reserves a contiguous group of disk blocks for the i-nodes
• There is also a bit vector which indicates which i-nodes are in use; a similar bit vector may be used for free blocks

11.4 Components of the File System

11.4.1 Anatomy of creating and writing files
• Program makes an I/O call to create a file on hard disk
  – The API routine for creating a file validates the call by checking the permissions, access rights, and other related information for the call. After such validation, it calls the name resolver.
  – The name resolver contacts the storage allocation module to allocate an i-node for the new file.
  – The storage allocation module gets a disk block from the free list and returns it to the name resolver. The storage allocation module fills in the i-node commensurate with the allocation scheme.
  – The name resolver creates a directory entry and records the name to i-node mapping information for the new file in the directory.
• Program writes to the file just created.
  – The API routine for file write validates the request.
  – The name resolver passes the memory buffer to the storage allocation module along with the i-node information for the file.
  – The storage allocation module allocates data blocks from the free list commensurate with the size of the file write. It then creates a request for a disk write and hands the request to the device driver.
  – The device driver adds the request to its request queue. In concert with the disk-scheduling algorithm, the device driver completes the write of the file to disk.
  – Upon completion of the file write, the device driver gets an interrupt from the disk controller that is passed back up to the file system, which in turn communicates with the CPU scheduler to continue execution of your program from the point of the file write.

11.5 Interaction among the various subsystems

11.6 Layout of the file system on the physical media
Partition | Start address {platter, track, sector} | End address {platter, track, sector} | OS
1 | {1, 10, 0} | {1, 600, 0} | Linux
2 | {1, 601, 0} | {1, 2000, 0} | MS Vista
3 | {1, 2001, 0} | {1, 5000, 0} | None
4 | {2, 10, 0} | {2, 2000, 0} | None
5 | {2, 2001, 0} | {2, 3000, 0} | None

11.6.1 In memory data structures
• Typically, for performance reasons, critical data structures are kept in memory
• Eventually they will be written back to disk
• Why?
  – The file may have a short lifetime
  – Convenience and efficiency
• Some risk exists, especially with removable media

11.7 Dealing with System Crashes
• Systems sometimes crash due to bugs, deadlocks, or even power failure
• The file system is critical, so the OS takes care to keep the file system healthy
• Upon failure the system will try to write a crash image
• Upon boot the system checks for evidence of a crash image and checks the file system for consistency (on UNIX, fsck)

11.8 File systems for other physical media
• File system for a CD-ROM and CD-R
  – Files can never be erased on such media, which significantly reduces the complexity of the file system.
  – CD-ROM: no question of a free list
  – CD-R: all the free space is at the end of the CD, where new files may be appended to the media.
  – CD-RW (rewritable CD) is more complex in that the space of deleted files needs to be added to the free space on the media.
  – DVD: similar
• Solid State Drives
  – Seek time to disk blocks, a primary concern in disk-based file systems, is less of a concern in SSD-based file systems.
  – Allocation strategies can be simplified (e.g., there is no need to ensure that the data blocks of a file map to contiguous blocks on the drive).

11.9 A summary of modern file systems
• Linux
• Windows

11.9.1 Linux
• File system API still the same as UNIX
• Numerous internal changes to accommodate things like multiple file system partitions, longer file names, and larger files, and to hide the distinction between files present on local media vs. the network.
• Virtual File System (VFS): a system to allow multiple file systems to be used in a way that is transparent to the user

11.9.1.1 ext2
• Linux started with the Minix file system
• Quickly moved to ext (extended file system)
• This was enhanced and improved and is now ext2
• The layout of an ext2 file partition is close to what we have already described
• Some useful things to mention…

11.9.1.2 Journaling File Systems
• The overhead of writing small quantities of data is high
• The system may crash before changes in memory-resident disk data structures have been committed to the physical disk
• For these reasons Linux uses a journaled file system
• Instead of performing each operation on the actual disk right away, a record is kept of each transaction. These records may be kept in a simple sequential data structure
• The journal data structure has a finite size (e.g. 1 MB).
• Once the data structure is full, it is written to disk and the logs are deleted
• Journaling fixes the small-write problem
• When a system crashes, it may be possible to write information about the journals which will allow recovery

11.9.2 Microsoft Windows
• FAT-16: 2 GB per partition (still used for small removable media)
• FAT-32: 2 TB per partition (still used for interoperability with 95/98)
• NTFS: (64-bit addresses) fundamental unit of structuring is the Volume
• View of a file
  – UNIX: stream of bytes
  – NTFS: object composed of typed attributes
• Attributes
  – May be created or deleted at will
  – Examples
    • name
    • creation date
    • raw image
    • thumbnail
    • etc.
• NTFS
  – File names up to 255 Unicode characters
  – / replaced by \
  – Aliasing through hard and soft links is a recent addition
  – On-the-fly compression and decompression
  – Optional encryption feature
  – Journaling
• Similar to i-node: Master File Table (MFT)
• MFT contains
  – File name
  – Timestamp
  – Security information
  – Data or pointers to disk blocks containing data
  – Optional pointer to another MFT
• File system tries to maximize use of contiguous blocks

11.10 Summary
• Attributes associated with a file
• Allocation strategies and associated data structures for storage management on disk
• Metadata managed by the file system
• Implementation details of a file system
• Interaction among various subsystems of the operating system
• Layout of files on disk
• Data structures of the file system and their efficient management
• Dealing with system crashes
• File systems for other physical media
• Examples of modern file systems

Questions?