• Sharing files
  o Introduction
    ▪ Users often need to share files amongst themselves
    ▪ It is convenient for the shared file to appear simultaneously in different directories belonging to different users
    ▪ Example – one of C’s files is also present in B’s directory; this connection is called a link
    ▪ Complication – the file system turns from a tree into a Directed Acyclic Graph (DAG)
  o Problem – If directories themselves contain disk addresses, a copy of those addresses will have to be made in B’s directory when the file is linked
    ▪ If the file is later appended to, only one of the directories will have the updated values
  o Solution 1 – I-nodes
    ▪ Remove disk addresses from directories and put them in the I-nodes
      • Each directory will then contain a reference to the I-node when linked (called a hard link)
      • When a link is created, the owner does not change
      • The reference count in the I-node increases, so the FS will know when it can delete the I-node
    ▪ Problem – When C wants to delete the file, what do we do?
      • If the file is deleted and the I-node removed, B will point to an invalid I-node
        o The count doesn’t say which directories hold the references, and keeping such a list is undesirable because the number of directories is unbounded while I-nodes are fixed size
        o If the I-node is later reassigned, B will point to a totally different file
      • Solution – Delete the directory entry for C but leave the I-node intact as long as the count is positive
        o B is then the only user having a directory entry for a file owned by C.
          If accounting or quotas are used, C will be charged for the file until B decides to actually delete it
  o Solution 2 – Symbolic Links
    ▪ Create a special file (LINK) that simply contains the path name of the file to which it is linked
      • That file is kept in B’s directory and is called a symbolic link
      • Has no actual effect on C or on the I-node of the linked file
      • When B reads the linked file, the OS uses the path it holds to find the actual file to read
    ▪ Deletion of the file is much simpler
      • When C deletes the file, it is destroyed
      • Later attempts by B to read the file will fail because it can’t be located
      • Removing the symbolic link has no effect on the actual file
    ▪ Problem – Overhead
      • The file containing the path must be read, and then the path must be parsed and followed to find the actual I-node
        o Might require many additional disk reads
      • An extra I-node is needed for each symbolic link
        o Also need an extra disk block for storing the path
    ▪ Advantage – easy linking to other machines
      • Including a network address in the link lets a file that resides on a remote machine appear in a local directory without actually being stored there
  o General linking problem – file system traversal
    ▪ Links will cause a file to have multiple paths in the file system
    ▪ Programs that recursively examine the file system will discover the file multiple times
    ▪ A tape backup might make multiple copies of the same file
      • This gives results inconsistent with the original file system
• Journaling File Systems
  o Introduction
    ▪ Consider the operations necessary for removing a file
      • Remove the file from its directory
      • Release the I-node to the pool of free I-nodes
      • Return all disk blocks to the pool of free disk blocks
    ▪ What if the system crashes after the first step?
      • The file can no longer be used, but its I-node and blocks are in limbo forever
      • Undesirable because resources are wasted
    ▪ What if the system crashes after the second step?
      • We only lose blocks, but that’s still undesirable
    ▪ What if we try to release the I-node first?
      • The I-node may be reassigned
      • The directory entry will then point to the wrong file
    ▪ What if blocks are released first?
      • The I-node will point to blocks that are currently in the free pool
      • Once those blocks are reallocated, two or more files will share the same blocks
  o Proposed solution – journaling
    ▪ Keep a log of what the file system plans to do before it does it
      • In the event of a crash, the system can look at the log and try to make things consistent again
      • NTFS, ext3, and ReiserFS all use this technique
    ▪ Procedure for the three operations above
      • Create a log entry that includes the three operations
      • Write the log entry to the disk (and possibly read it back to verify)
      • After the log entry has been written and verified, the actual operations can take place
      • After all operations are completed successfully, the log entry is erased
      • If a crash occurs, upon recovery the log is read to see if operations were pending
        o If so, all of them can be rerun (possibly multiple times) until the file is correctly removed
  o Key to journaling – idempotent operations
    ▪ Idempotent operations are those that can be repeated as often as necessary without harm
      • Example – update the bitmap to mark I-node k or block n as free
      • Example – search a directory and remove any entry called foo
    ▪ Not idempotent – “Add newly freed blocks from I-node k to the end of the free list”
      • They might already have been added
    ▪ Change – “Search the list of free blocks and add block n from I-node k to it if not already present”
      • More expensive (because of the search)
    ▪ Implication
      • With idempotent operations, it does not matter how many of the logged operations had already completed before the crash
      • Repeated operations can’t cause harm, so rerunning all logged operations produces the same result
• Virtual File Systems
  o Introduction
    ▪ Many different file systems can be used on the same system
      • Even on the same disk (with partitions)
      • How do we deal with this?
    ▪ Windows
      • Example – Main NTFS drive, legacy FAT-32, CD-ROM, DVD, USB drive
      • Every file system is given a different drive letter (C:, D:, etc.)
      • The drive letter must be either explicitly or implicitly given for a process to open a file
      • Implication – Forces users to know the various file systems, which partition their files are on, etc.
    ▪ Unix/Linux
      • Example – ext2 as root file system, ext3 mounted on /usr, ReiserFS mounted on /home, CD-ROM mounted on /mnt
      • All file systems are integrated into a single structure
        o Single file system hierarchy with different file systems attached throughout
      • Implication – users (and processes) are not required to be aware that multiple (possibly different) file systems are being used
        o This requires the concept of a Virtual File System (VFS) to work
  o Virtual File System Structure
    ▪ Abstract out parts of the file system that are common to all implementations
      • Put that code in a separate layer and have it call the concrete implementation to actually manage the data
    ▪ Architecture
      • All system calls related to files are sent to the VFS
        o All standard POSIX calls related to files form this “upper interface” to user processes
        o Examples – open, read, write, lseek, etc.
      • VFS also has a “lower interface” of function calls that must be implemented by compliant concrete implementations
        o Called “VFS interface” in the figure
        o Example – read a specific block, put that block in the buffer cache, and return a pointer to it.
      • Implication – As long as the concrete file system supplies all “VFS interface” functions, it can be used as a file system in Unix/Linux
        o Network File System (NFS) takes advantage of this and was a main motivation behind developing VFS
    ▪ Many of the same data structures are used in VFS as in a concrete file system
      • V-node (analogous to I-node)
      • Directories (describes FS directory)
      • Internal data structures
        o Mount table
        o Array of file descriptors
  o VFS operation chronologically
    ▪ Boot or mount
      • Root file system and other subsequent file systems must be registered with the VFS
      • Each file system provides the list of function pointers the VFS interface requires
        o VFS uses these function pointers whenever an operation is necessary
    ▪ File open
      • VFS creates a V-node and copies all information from the concrete I-node into it (in RAM)
        o This includes a pointer to the table of concrete functions
      • VFS makes an entry in the file descriptor table and makes it point to the V-node
      • Returns the file descriptor to the process
    ▪ File read
      • Process calls read with the file descriptor returned by VFS
      • VFS uses that to find the V-node for the file
      • Function pointer to the relevant concrete read code is then executed
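
The register, open, and read sequence described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not how any real kernel implements its VFS: the names VFS, VNode, register, and make_memfs are hypothetical, the "concrete file system" is just an in-memory dictionary, and a dictionary of callables stands in for the table of function pointers.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class VNode:
    """In-RAM copy of a concrete I-node plus a pointer to the concrete functions."""
    inode: dict                 # information copied from the concrete I-node
    ops: Dict[str, Callable]    # table of concrete functions (the "VFS interface")
    offset: int = 0             # current read position for this open file


def make_memfs(files: Dict[str, bytes]) -> Dict[str, Callable]:
    """Hypothetical concrete file system: returns its table of functions."""
    def lookup(name: str) -> dict:
        return {"name": name, "size": len(files[name])}

    def read(inode: dict, offset: int, nbytes: int) -> bytes:
        return files[inode["name"]][offset:offset + nbytes]

    return {"lookup": lookup, "read": read}


class VFS:
    def __init__(self) -> None:
        self.mount_table: Dict[str, Dict[str, Callable]] = {}  # mount point -> ops
        self.fd_table: List[VNode] = []                         # fd -> V-node

    def register(self, mount_point: str, ops: Dict[str, Callable]) -> None:
        # Boot or mount: the file system hands over its function table.
        self.mount_table[mount_point] = ops

    def open(self, path: str) -> int:
        # Pick the longest mount point that is a prefix of the path.
        matches = [m for m in self.mount_table
                   if path.startswith(m.rstrip("/") + "/")]
        mount = max(matches, key=len)
        ops = self.mount_table[mount]
        rel = path[len(mount):].lstrip("/")
        # Create the V-node from the concrete I-node and record it in the fd table.
        self.fd_table.append(VNode(inode=dict(ops["lookup"](rel)), ops=ops))
        return len(self.fd_table) - 1

    def read(self, fd: int, nbytes: int) -> bytes:
        # File descriptor -> V-node -> concrete read function.
        vnode = self.fd_table[fd]
        data = vnode.ops["read"](vnode.inode, vnode.offset, nbytes)
        vnode.offset += len(data)
        return data


vfs = VFS()
vfs.register("/", make_memfs({"etc/motd": b"hello"}))
vfs.register("/home", make_memfs({"notes.txt": b"journal"}))
fd = vfs.open("/home/notes.txt")
print(vfs.read(fd, 4))  # b'jour'
```

The calling process only ever sees the file descriptor. Which concrete read function runs is decided entirely by the function table stored in the V-node, which is why a file system that supplies the full table can be mounted anywhere in the hierarchy.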