File Systems
A file system is a collection of directories and files.
Many operating systems support multiple file system organizations through a virtual file system (VFS).
• A VFS is an abstraction: a single set of system calls hides the details of each file system's organization from the developer
• The system call layer acts as a middle layer that transfers control to the correct low-level, object-oriented interface

File Record Structure
File: a collection of records. Record: a collection of fields.
• No structure: a sequence of bytes
• Record structure: lines of text, fixed length, variable length
• Complex record structures
  – Formatted documents with appropriate control characters
  – Relocatable load files
  – Database table rows
  – Combinations of binary fields
• Who decides the structure?
  – Operating system
  – Program

File Control Block (FCB)
An OS data structure holding information about a file.
• Name – human-readable
• Identifier – unique number that identifies each file
• Type – most systems support different types
• Location – pointer to the file's location on the device
• Size – current file size
• Protection – access rights and owner
• Time, date, and user identification – data for protection, security, and usage
• Where is file information maintained?
  – On a disk-resident directory structure

File System
An abstraction of a 'raw' partition as collections of files and directories.
• Partition: contains a file system on disk and consists of:
  – File control blocks (FCBs): define each file's attributes
  – Directory/folder: a collection of FCBs
  – Boot control block: OS load information
  – Partition control block: information about the partition

File System Operations
A file is an abstract data type with well-defined operations.
• Create
• Write and Read
• Reposition within file (Seek)
• Delete or Truncate
• Open – load the file information from the directory structure into memory
• Close – update the file information on disk and release resources

Open File Information
• File pointer: pointer to the last read/write location, kept per process that has the file open
• File-open count: allows removal from the open-file list on the last close
• Pointers: disk location and a data access cache
• Access rights: per-process access mode information
• Locking information: mediates access to a file
  – Mandatory – access denied based on record locks
  – Advisory – processes can inquire about lock status

Java File Exclusive Lock

    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.nio.channels.FileChannel;
    import java.nio.channels.FileLock;

    public class ExclusiveLockDemo {
        // lock(position, size, shared): shared == false requests an exclusive lock
        public static final boolean EXCLUSIVE = false;

        public static void main(String[] args) {
            try (RandomAccessFile raf = new RandomAccessFile("file.txt", "rw");
                 FileChannel ch = raf.getChannel()) {
                // exclusively lock the first half of the file
                FileLock exclusive = ch.lock(0, raf.length() / 2, EXCLUSIVE);
                /* Now modify the data . . . */
                exclusive.release();   // release lock
            } catch (IOException ioe) {
                System.err.println("I didn't like that");
            }
        }
    }

lock() blocks until the lock is available, or throws FileLockInterruptionException (the thread was interrupted) or AsynchronousCloseException (another thread closed the channel).

Java File Shared Lock

    // Same imports as the exclusive-lock example.
    public class SharedLockDemo {
        // shared == true requests a shared (read) lock
        public static final boolean SHARED = true;

        public static void main(String[] args) {
            try (RandomAccessFile raf = new RandomAccessFile("file.txt", "rw");
                 FileChannel ch = raf.getChannel()) {
                // shared lock on the second half: the arguments are (position, size)
                long len = raf.length();
                FileLock shared = ch.lock(len / 2, len - len / 2, SHARED);
                /* Now read the data . . . */
                shared.release();      // release lock
            } catch (IOException ioe) {
                System.err.println("I didn't like that");
            }
        }
    }

Direct and Sequential Access
• Sequential access: read, write, append, reset, rewrite (cannot read previously written records)
• Direct (random) access: seek, read, write

Indexed Access

File System Software Structure
• Virtual File System (VFS): wrapper between applications and different file systems
• Uniform application view
[Figure: layered approach; labels: File Types, Files/Folders, Read/Write]

Directory Structure
Directory: a collection of nodes containing file information
[Figure: a directory whose entries refer to files F1, F2, F3, F4, ..., Fn]

Typical File System Organization

Directory Design
Note: a directory is another abstract data type.
• Operations: Search, Create, Delete, List, Rename, Traverse
• Design criteria
  – Efficiency – locating a file quickly
  – Naming – convenient to users, aliases, unique fully qualified path names
  – Grouping – by extension or properties
  – Access control
• Design decisions
  – Should sub-directories be removed on a delete operation?
  – What kind of path names should be allowed?
  – Are absolute and relative paths supported?

Directory Structure
Goals:
(a) Convenient name space
(b) Quick to access and locate
(c) Ability to group related files
Definitions: path (absolute, relative), working directory
• Single level: fails goals b and c
• Two level: fails goal c
• Tree structured

Single and Two Level Directories
• Single level
  – Disadvantages: name conflicts, no sub-folders
• Two level
  – Can have the same names for different users
  – Efficient searching but no sub-folders

Tree-Structured Directories
• Efficient searching, can group by sub-folders, working directory, absolute/relative path names
• Problem to resolve: How should links (aliases) work?
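The absolute and relative path names listed for tree-structured directories can be illustrated with a minimal Java sketch. It relies on the standard java.nio.file.Path API; the class name PathDemo, the helper toAbsolute, and the directory names are hypothetical and chosen only for illustration. A relative name is interpreted against the working directory, while an absolute name is used as-is.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class PathDemo {
    // Resolve a possibly relative path name against a working directory.
    public static Path toAbsolute(Path workingDir, String name) {
        Path p = Paths.get(name);
        // An absolute name is kept; a relative one is appended to the working directory.
        Path full = p.isAbsolute() ? p : workingDir.resolve(p);
        // normalize() removes "." and resolves ".." components where possible.
        return full.normalize();
    }

    public static void main(String[] args) {
        Path wd = Paths.get("/home/alice");                       // hypothetical working directory
        System.out.println(toAbsolute(wd, "docs/notes.txt"));     // relative name
        System.out.println(toAbsolute(wd, "../bob/todo.txt"));    // relative name using the parent directory
        System.out.println(toAbsolute(wd, "/etc/hosts"));         // absolute name
    }
}
```

The sketch only manipulates names; it never touches the disk, so links and aliases are not followed.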
Acyclic-Graph Directories
Cycles can lead to infinite loops
• Problems sharing directories and files
  – Aliased names (links)
  – Multiple link levels
  – Dangling pointers
• Solutions
  – A linked list of back pointers
  – Lazy detection
  – Follow link chains
  – Remove data when entry count = 0

General Graph Directory Issues
• Cycle detection algorithms
• Garbage collection algorithms
• Only allow links to files, not directories

Mount Points
• Definitions
  – Mount: loading a remote file system for local access
  – Mount point: the path where a remote file system merges with the local structure
• Top figure: un-mounted file systems
• Bottom figure: the top-right file system mounted over the users directory of the top-left file system

File Sharing
Files are shared by users locally and over networks and grids.
• Sharing protection: user and group identifications and access codes
• Client-server network models: Network File System (NFS) or CIFS (Windows Common Internet File System), using remote procedure calls
• Consistency for simultaneous access
• Remote file transfer: FTP (WinSCP)
• Remote login: TELNET (PuTTY)
• Issues
  – Handling network and server failure
  – Transaction-based systems
  – Stateless protocols: easy recovery, but less security
  – State-based protocols: difficult recovery, better security
  – Establishing when updates become visible to other users

Access Control
• File owner/creator controls what can be done and by whom
• Types of access: Read, Write, Execute, Append, Delete, List
• Mode of access: read, write, execute
• Three classes of users and examples of access rights

                          octal   R W X
  a) owner access (u)       7     1 1 1   (RWX)
  b) group access (g)       6     1 1 0   (RW)
  c) public access (o)      1     0 0 1   (X)

• System administrator creates group names and adds lists of users to them.
• Owner defines access to a particular file (say game) or subdirectory

Command to set access rights to a file:

            owner (user)   group   public (other)
  chmod          7           6           1          game

Symbolic clauses are separated by commas, not spaces:
Example: chmod u+rwx,g+rw,o+x game
Example: chmod g-x,u-rw game
Example: chmod u=rwx,g=rw,o=x game
Associate file game with group staff: chgrp staff game

File System Transient Data in Memory
[Figure: (a) opening a file; (b) reading a file]

Directory Structure Alternatives
• Simple list: names & disk pointers
  – simple to program
  – O(n) search time
• Hashed
  – O(1) directory search time
  – collisions possible
  – fixed hash table size
• Other alternatives
  – Separate chaining
  – Sorted list: O(lg n) find; O(n) deletion

Directory (simple list structure, contiguous allocation):

  File    Start   Length
  count     0       2
  tr       14       3
  mail     19       6
  list     28       4
  f         6       2

Allocating Space for Files
Contiguous allocation
• Each file occupies a set of contiguous blocks on the disk
• Simple – only the starting block # and the number of blocks are required
• Both random and sequential access are possible
• External fragmentation (holes)
• Files cannot easily grow, because the adjacent space might already be allocated
• Some systems allocate in groups of blocks (extents or clusters).
  Files are linked lists of these contiguous allocations

Logical to physical translation of record R
  Block = start + (R * record size) / block size   (integer division)
  Offset = (R * record size) % block size

Linked Allocation of File Space
• Files are linked lists of blocks: blocks may be anywhere
• Simple – the directory entry needs only the starting address
• No external fragmentation, but no random access
• The file-allocation table (FAT) used by MS-DOS and OS/2 has one entry per cluster; the entries chain together the clusters of each file
• Caching reduces disk seeks

Location of record R
  Block = located by linked list traversal
  Offset = (R * record size) % block size
[Figure: FAT; label: free block count]

Indexed Allocation of File Space
• Index block contains block pointers
• Index table must be maintained and is linked
• Random access possible
• Allows dynamic access without external fragmentation
• Index table can be cached

Location of record R
  Block = located by index table lookup
  Offset = (R * record size) % block size

Multi-level Indexed Allocation of File Space
[Figure: outer index, index table, file; UNIX inode (4K bytes per block)]

Management of Free Space
• Bit vector (one bit per block; 0 = free)
  – Extra space needed
  – Example: block size = 4096 bytes, disk size = 1 gigabyte
    space = 2^30 / (2^12 * 2^3) = 2^15 bytes = 32 KB
  – Easy to find groups of free blocks
• Linked list (free list)
  – Finding contiguous space is hard
  – No waste of space
• Grouping: separate lists ordered by contiguous block size
• Counting: a linked list contains block #s + a count of adjacent free blocks
• Issue: maintaining consistency between memory structures and those on disk

Efficiency
• Efficiency depends on:
  – Allocation and access algorithms
  – FCBs and directory content
  – Caching
• Caching
  – By buffer: cache disk blocks in a separate section of memory
  – By page: cache pages using virtual memory techniques (memory-mapped I/O)
• Algorithm optimizations
  – Use free-behind (release previously read blocks) and read-ahead replacement to optimize sequential access
  – Dedicate a section of memory as a virtual disk (RAM disk).
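The bit-vector bookkeeping from the free-space slide can be sketched in Java as follows. This is a minimal illustration, not how any particular file system implements it: the class FreeSpaceBitmap and its methods are hypothetical names, and java.util.BitSet stands in for the on-disk bitmap. Following the slide's convention, a clear (0) bit means the block is free. The sketch also reproduces the slide's arithmetic: a 1 GB disk with 4096-byte blocks needs a 32 KB bitmap.

```java
import java.util.BitSet;

public class FreeSpaceBitmap {
    private final BitSet used;       // bit i set => block i allocated; clear (0) => free
    private final int blockCount;

    public FreeSpaceBitmap(long diskBytes, int blockBytes) {
        this.blockCount = (int) (diskBytes / blockBytes);
        this.used = new BitSet(blockCount);
    }

    // Bytes needed to hold one bit per block (rounded up to a whole byte).
    public long bitmapBytes() {
        return (blockCount + 7) / 8;
    }

    // Allocate the first run of n contiguous free blocks; returns its start block, or -1.
    public int allocate(int n) {
        int start = used.nextClearBit(0);
        while (start + n <= blockCount) {
            int nextUsed = used.nextSetBit(start);
            if (nextUsed < 0 || nextUsed >= start + n) {
                used.set(start, start + n);          // mark the run as allocated
                return start;
            }
            start = used.nextClearBit(nextUsed);     // skip past the allocated blocks
        }
        return -1;
    }

    // Return n blocks starting at 'start' to the free pool.
    public void free(int start, int n) {
        used.clear(start, start + n);
    }

    public static void main(String[] args) {
        FreeSpaceBitmap fs = new FreeSpaceBitmap(1L << 30, 4096);
        System.out.println(fs.bitmapBytes());        // 32768 bytes = 32 KB
        int a = fs.allocate(3);
        int b = fs.allocate(2);
        fs.free(a, 3);
        System.out.println(a + " " + b + " " + fs.allocate(4));
    }
}
```

Scanning for a run of clear bits is what makes groups of free blocks easy to find with this representation, in contrast to a linked free list.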
Various Disk-Caching Locations

Unified and Non-unified Buffer Cache
• Page cache: holds pages, rather than disk blocks
• Buffer cache: holds recently used disk blocks
• Unified buffer cache: same cache for both file I/O and memory-mapped files
• Non-unified buffer cache: separate caches for memory-mapped files (page cache) and for file I/O (buffer cache); requires extra copying
[Figures: unified buffer cache; non-unified buffer cache]

Reliability
• Consistent backup procedures
  – System programs perform full or incremental backups
  – Data recovery restores lost data from a backup device
• Consistency checking on reboot
  – Inconsistent directory/block allocations automatically repaired
• Log structured (or journaling) to minimize seeks
  – Write file system operations as a transaction to a circular buffer (or log)
  – Transaction committed after log write operations complete
  – A background task processes log transactions
    • Asynchronously updates the file system
    • Deletes appropriate log records after the update completes
  – After a crash, the system finishes any partial operations

The Sun Network File System (NFS)
Software specification for accessing remote files across a LAN or WAN
• Networked system view: independent and heterogeneous
• Sharing of file systems: transparent to users
• Mount operations:
  – Require specifying the host IP address
  – Remote directories are mounted over any local file system directory; they hide the directories and subdirectories over which they mount
  – Cascading mounts: locally mount over other mounted file systems.
    Users do not get access to subdirectories remotely mounted over remote directories
• Implementation:
  – Remote procedure calls (RPC) & External Data Representation (XDR) protocol
  – Servers are stateless but maintain client lists for server shutdowns

NFS Mounting
Purpose: establish connections
[Figure: three independent file systems]
Mount operation
• usr/shared mounts over usr/local
• User loses access to local
Cascaded mount operation
• usr/dir2 mounts over usr/local/dir1
• Now dir2 hides dir1
Pseudo code
  Establish connection with server
  Request name of remote directory to mount
  Server returns file handle, containing file-system identifier/inode number
  User view changes and the remote file system becomes available
  Remote procedure calls for file/directory operations available

NFS Protocol
NFS servers
• Use buffering (server side) and caching (client side). The local kernel checks whether the local cache is up to date
• All operations are synchronous
• Utilize RPC calls
• 1-1 API with UNIX system calls (except open, close)
• No concurrency control
• Requests are stateless, with a full set of arguments

NFS Path-Name Translation
• Performed by breaking the full path into component names and performing a separate NFS lookup call for every pair of path component name and directory virtual node (vnode)
• To make lookup faster, a directory name lookup cache on the client's side holds the vnodes for remote directory names
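The component-by-component translation and the client-side name cache described above can be sketched as a small simulation. Nothing here is real NFS client code: NfsLookupSketch, export, remoteLookup, and the integer "handles" are hypothetical stand-ins for vnodes and file handles, and a HashMap plays the role of the server. The counter only shows how the cache avoids repeated lookup calls; cache invalidation is deliberately left out.

```java
import java.util.HashMap;
import java.util.Map;

public class NfsLookupSketch {
    // Stand-in for the server: maps "dirHandle/name" to the child's handle.
    private final Map<String, Integer> serverTable = new HashMap<>();
    // Client-side directory name lookup cache, keyed the same way.
    private final Map<String, Integer> nameCache = new HashMap<>();
    public int rpcCount = 0;   // number of simulated lookup calls sent to the server

    // Register a directory entry on the simulated server.
    public void export(int dirHandle, String name, int childHandle) {
        serverTable.put(dirHandle + "/" + name, childHandle);
    }

    // One simulated lookup call: (directory handle, component name) -> handle or null.
    private Integer remoteLookup(int dirHandle, String name) {
        rpcCount++;
        return serverTable.get(dirHandle + "/" + name);
    }

    // Translate a path one component at a time, starting from the mounted root handle.
    public Integer translate(int rootHandle, String path) {
        int current = rootHandle;
        for (String name : path.split("/")) {
            if (name.isEmpty()) continue;              // skip leading or doubled slashes
            String key = current + "/" + name;
            Integer next = nameCache.get(key);         // try the client cache first
            if (next == null) {
                next = remoteLookup(current, name);    // otherwise ask the server
                if (next == null) return null;         // component does not exist
                nameCache.put(key, next);
            }
            current = next;
        }
        return current;
    }

    public static void main(String[] args) {
        NfsLookupSketch client = new NfsLookupSketch();
        client.export(1, "usr", 2);
        client.export(2, "local", 3);
        client.export(3, "dir1", 4);
        System.out.println(client.translate(1, "/usr/local/dir1") + " after " + client.rpcCount + " lookups");
        System.out.println(client.translate(1, "/usr/local") + " after " + client.rpcCount + " lookups");
    }
}
```

The first translation costs one lookup per component; the second reuses cached entries and sends no further calls.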