CSS490 Distributed File Systems
Textbook Ch7 (p421 - 440)
Instructor: Munehiro Fukuda
These slides were compiled from the course textbook and reference books.
Winter, 2004 CSS490 DFS

DFS Services
  Storage service
    Disk service: giving a transparent view of distributed disks.
    Block service: giving the same logical view of disk-accessing units.
  True file service
    File-accessing mechanism: deciding where to manage remote files and the unit of data transfer (at server or client? file, block, or byte?)
    File-sharing semantics: providing file update semantics similar to, but weaker than, Unix
    File-caching mechanism: improving performance/scalability
    File-replication mechanism: improving performance/availability
  Name service
    Mapping between text file names and references to files (i.e., file IDs)
    Directory service

DFS Desirable Features
  Transparency: should include structure, access, naming, and replication transparency.
  User mobility: should not force a user to work on a specific node.
  Performance: should be comparable to that of a centralized file system.
  Simplicity: should give the same semantics as a centralized file system.
  Scalability: should cope with growth in the number of nodes.
  Fault tolerance: should not stop on a failure and should maintain backup copies.
  Synchronization: should complete concurrent access requests consistently.
  Security: should protect files from network intruders.
  Heterogeneity: should allow a variety of nodes to share files in different storage media.

File Models
  Unstructured and Structured Files
    An uninterpreted sequence of bytes: UNIX and MSDOS
    Non-indexed records: IBM mainframe
    Indexed records such as B-tree: Research Storage System (RSS) and Oracle
  Mutable and Immutable Files
    Mutable: a single stored sequence altered by each update (ex. Unix and MSDOS)
    Immutable: a history of immutable versions, one created on every update (ex.
    Cedar File System)

File-Accessing Models
  Accessing Remote Files
    Remote service model
      File access: at a server
      Merits: a simple implementation
      Demerits: communication overhead
    Data caching model
      File access: at a client that cached a file copy
      Merits: reducing network traffic
      Demerits: cache consistency problem
  Unit of Data Transfer
    File
      Merits: simple, less communication overhead, and immune to server failures
      Demerits: a client required to have large storage space
    Block
      Merits: a client not required to have large storage space
      Demerits: more network traffic/overhead
    Byte
      Merits: flexibility maximized
      Demerits: difficult cache management to handle variable-length data
    Record
      Merits: handling structured and indexed files
      Demerits: more network traffic, more overhead to reconstruct a file

File-Sharing Semantics
  Define when modifications of the file data made by a user are observable by other users.
    1. Unix semantics
    2. Session semantics
    3. Immutable shared-files semantics
    4. Transaction-like semantics

File-Sharing Semantics: Unix Semantics
  Absolute ordering
  [Figure: timeline t1 to t6 of appends and reads by Client A and Client B on one file (initially a b) under absolute ordering; network delays may delay Append(d) and the read that follows it.]

File-Sharing Semantics: Session Semantics
  [Figure: Clients A, B, and C each open the same file (a b), append to it during their own sessions (A: c, d, e; B: x, y, z; C: m), and close it; the server's copies are shown after the closes.]

File-Sharing Semantics: Transaction-Like Semantics (Concurrency Control)
  Backward validation: compare the validating transaction's reads with former writes.
  Forward validation: compare the validating transaction's writes with later reads; abort itself or the conflicting active transactions.
  [Figure: for each scheme, Clients A to D run overlapping transactions (Trans_start, reads R and writes W, Trans_end) while one transaction goes through validation and commitment; in the forward-validation panel, conflicting transactions go through Trans_abort and Trans_restart.]
  Which validation is better?

File-Sharing Semantics: Immutable Shared-Files Semantics
  [Figure: Clients A and B each create a tentative version based on the server's Version 1.0. The first commit becomes Version 1.1; the later one meets a version conflict, resolved by Abort, Ignore conflict (Version 1.2), or Merge (Version 1.2).]
  Conflict handling depends on each file system.
  Abort is simple (later, client A can decide to overwrite the new version with its tentative version based on 1.0 by changing the corresponding directory).

File-Caching Schemes: Cache Location
  [Figure: client and server separated by a node boundary; copies may be cached in the server's main memory, the client's disk, or the client's main memory, with the original file on the server's disk.]
  No caching
    Merits: no modifications
    Demerits: frequent disk access, busy network traffic
  In server’s main memory
    Merits: one-time disk access, easy implementation, Unix-like file-sharing semantics
    Demerits: busy network traffic
  In client’s disk
    Merits: one-time network access, no size restriction
    Demerits: cache consistency problem, file access semantics, frequent disk access, no diskless workstation
  In client’s main memory
    Merits: maximum performance, diskless workstation, scalability
    Demerits: size restriction, cache consistency problem, file access semantics

File-Caching Schemes: Modification Propagation
  [Figure: Client 1 writes (W) to its main memory copy and the write reaches the server's disk file either immediately or after a delay; Client 2 then holds a new main memory copy.]
  Write-through scheme
    Pros: Unix-like semantics and high reliability
    Cons: poor write performance
  Delayed-write scheme
    Variants: write on cache displacement, periodic write, write on close
    Pros: Write accesses complete quickly. Some writes may be superseded by later writes and never need to be sent. Gathering all writes mitigates network overhead.
    Cons: Delaying write propagation results in fuzzier file-sharing semantics.

File-Caching Schemes: Cache Validation Schemes, Client-Initiated Approach
  [Figure: Client 1 writes (W) to its main memory copy and propagates to the disk file by write-through (or delayed write?) while Client 2 checks its copy before every access; with write-on-close, Client 2 uses check-on-open (check-on-close?).]
  Checking before every access (Unix-like semantics but too slow)
  Checking periodically (better performance but fuzzy file-sharing semantics)
  Checking on file open (simple, suitable for session semantics)
  Problem: high network traffic

File-Caching Schemes: Cache Validation Schemes, Server-Initiated Approach
  [Figure: Client 1 writes (W) to the disk file by write-through or delayed write; the server notifies (invalidates) the main memory copies of Clients 2 and 3 and denies a new open from Client 4.]
  Keeping track of clients having a copy
  Denying a new request, queuing it, and disabling caching
  Notifying all clients of any update on the original file
  Problems:
    Violating the client-server model
    Stateful servers
    Check-on-open still needed for the 2nd file opening

Sun NFS Structure
  [Figure: directory trees of the server and of Clients A and B, with the server's exported directories attached in the clients' trees. On each client a user process calls VFS, which uses the local FS or the NFS client and its RPC stub; on the server the RPC stub and NFS server sit above VFS and the local FS.]

Sun NFS Installation
  Server:
    Check if NFS is running: rpcinfo -p
    Start NFS: /etc/rc.d/init.d/nfs start
    Edit /etc/exports file: /dir/to/export client1(permissions), client2(…
    Export dirs in /etc/exports: exportfs -a
    Check exported directories: showmount -e
  Client:
    Import a server’s directory: mount -o options server_name:/dir /my_dir
      bg: keep retrying the import in the background after a failure.
      intr: a process can be interrupted while its I/O request to the server directory is pending.
      soft: allow a client to time out the connection after a number of retries.
      rw/ro: normal read/write or read only.
  Underlying Connections:
    [Figure: client connections to the server's portmapper (mount service port), rpc.mountd (permission), and nfs (port 2049).]

Sun NFS Overviews
  Communication
    RPC: a compound procedure (Lookup, Open, and Read)
  Server status
    Stateless: simple implementation in ver 3.
    Stateful: allowing clients to cache files in ver 4.
    RPC callback from a server to invalidate a client’s cache
  Synchronization
    Session semantics
    File locking in ver 4: lock, lockt, locku, and renew
      Ex. Emacs: tests with lockt when modifying a buffer, locks the file with lock, and unlocks with locku after writing the buffer contents to the file.
    Share reservation: specify how to share a file (with ro, wo, or r/w)

Sun NFS Overviews (Cont’d)
  Caching
    In client’s memory
    Session semantics
    Revalidation of client’s cache upon re-opening the same file
    Open delegation: A server delegates open decisions to a writing client, which can then handle open requests from other clients on the same machine. The server calls back the client when it receives an open request from another machine.
  Fault Tolerance
    RPC failure: use a duplicate-request cache
    File locking failure: provide a grace period during which clients reclaim locks previously granted and the server rebuilds its previous state.

Sun NFS Duplicate Request Cache
  [Figure: three cases of a retransmitted request with XID = 1234: (a) it arrives while the original is still being processed (too soon, ignore); (b) it arrives right after the transaction completed and the reply was sent (just replied, ignore); (c) it arrives some time after the transaction completed (send the reply again).]
  Then, when does the server delete this cached result?
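The three cases on the duplicate request cache slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual NFS server code: the class and method names, the `just_replied_window` threshold, and the `ttl` after which a cached result is deleted are all hypothetical. The `ttl` is one possible answer to the slide's closing question, not the only one.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Entry:
    done: bool = False            # False while the request is still executing
    reply: Optional[str] = None   # cached reply once the transaction completed
    finished_at: float = 0.0      # time at which the reply was first sent


class DuplicateRequestCache:
    """Hypothetical duplicate request cache keyed by (client, XID)."""

    def __init__(self, just_replied_window: float = 1.0, ttl: float = 60.0):
        self.just_replied_window = just_replied_window  # assumed threshold
        self.ttl = ttl                                  # assumed expiry policy
        self.entries: Dict[Tuple[str, int], Entry] = {}

    def receive(self, client: str, xid: int, now: float) -> str:
        """Classify a request as 'execute', 'ignore', or 'resend'."""
        self._expire(now)
        key = (client, xid)
        entry = self.entries.get(key)
        if entry is None:
            self.entries[key] = Entry()   # first time we see this XID
            return "execute"
        if not entry.done:
            return "ignore"               # too soon: original still in progress
        if now - entry.finished_at < self.just_replied_window:
            return "ignore"               # just replied: reply is likely in flight
        return "resend"                   # reply probably lost: send cached reply

    def complete(self, client: str, xid: int, reply: str, now: float) -> None:
        """Record the reply when the transaction completes."""
        entry = self.entries[(client, xid)]
        entry.done, entry.reply, entry.finished_at = True, reply, now

    def cached_reply(self, client: str, xid: int) -> Optional[str]:
        entry = self.entries.get((client, xid))
        return entry.reply if entry is not None and entry.done else None

    def _expire(self, now: float) -> None:
        """Delete completed entries older than ttl (assumed deletion rule)."""
        stale = [k for k, e in self.entries.items()
                 if e.done and now - e.finished_at > self.ttl]
        for k in stale:
            del self.entries[k]
```

A server loop would call `receive` for each incoming RPC, run the operation only on "execute", call `complete` when it finishes, and send `cached_reply` on "resend". Passing `now` explicitly keeps the sketch deterministic; a real server would read a clock.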
DFS Example: Andrew File System
  [Figure: two clients, each showing a local directory tree (/, tmp, usr, bin) with symbolic links, a user process, a Venus process, a Unix kernel (Unix FS), and a cache.]

DFS Example: XFS
  [Figure: clients, metadata managers, and storage servers connected by a LAN.]
  Write path
    1: Write requests
    2: Log them in a segment
    3: Fragment the segment and send the fragments to a stripe group of servers
  Read path
    1: Read request
    2: Query a manager
    3: Collaborative caching (read data from another client if possible)

DFS Example: Plan 9
  [Figure: a client builds a union directory from directories imported from File server 1 and File server 2; it also imports net from a computation server, which provides remote execution and, through its network interface, network access to the Internet.]

Paper Review by Students
  Sun NFS
  Andrew File System
  XFS
  Plan 9
  LFS
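As a recap of the XFS write path from the DFS Example slide (log write requests in a segment, then fragment the segment across a stripe group), here is a minimal Python sketch. The function names, the fixed `fragment_size`, and the round-robin placement are assumptions made for illustration; the sketch covers only the three steps named on the slide and leaves out everything else a real system would need.

```python
from typing import Dict, List


def build_segment(writes: List[bytes]) -> bytes:
    """Step 2: append the pending write requests to one log segment."""
    return b"".join(writes)


def stripe_segment(segment: bytes, servers: List[str],
                   fragment_size: int) -> Dict[str, List[bytes]]:
    """Step 3: cut the segment into fragments and deal them out
    round-robin to the storage servers of one stripe group."""
    placement: Dict[str, List[bytes]] = {s: [] for s in servers}
    for offset in range(0, len(segment), fragment_size):
        index = offset // fragment_size
        server = servers[index % len(servers)]
        placement[server].append(segment[offset:offset + fragment_size])
    return placement


# Step 1: write requests arriving from a client (made-up payloads).
writes = [b"aaaa", b"bbbb", b"cc"]
segment = build_segment(writes)
placement = stripe_segment(segment, ["s1", "s2", "s3"], fragment_size=4)
```

Batching the writes into a segment first means each storage server receives a few large fragments instead of many small updates.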