Outline • Distributed File Systems – continued 5/29/2016 COP5611 1 Distributed File Systems • A distributed file system is a resource management component in a distributed operating systems – It implements a common file system shared by all the computers in the systems • Two important goals – Network transparency – High availability 5/29/2016 COP5611 2 Architecture 5/29/2016 COP5611 3 Architecture – cont. • Normally for performance reasons distributed file systems are organized as a client-server architecture – File servers store files and perform storage and retrieval upon client’s requests – Two most important parts are • Name server • Cache manager 5/29/2016 COP5611 4 Architecture – cont. 5/29/2016 COP5611 5 Architecture – cont. 5/29/2016 COP5611 6 Mounting • Mounting is a way to bind together different file systems to form a single hierarchical structured name space – It is widely used in both local and distributed UNIX machines – In distributed file systems, file systems maintained by remote servers are mounted at the clients 5/29/2016 COP5611 7 Mounting – cont. 5/29/2016 COP5611 8 Mounting – cont. 5/29/2016 COP5611 9 Mounting – cont. 5/29/2016 COP5611 10 Mounting – cont. 5/29/2016 COP5611 11 Automounting 5/29/2016 COP5611 - cont. 12 Automounting – cont. 5/29/2016 COP5611 13 Caching • Caching is commonly used in distributed file systems to reduce delays in accessing the data – In file caching, a copy of the data stored at a remote file server is brought to the client, reducing access delays due to network latency – The effectiveness of caching is based on the temporal locality in programs – Files can also be cached at the server side 5/29/2016 COP5611 14 Client Caching 5/29/2016 COP5611 15 Client Caching – cont. 5/29/2016 COP5611 16 Cache Consistency 5/29/2016 COP5611 17 Hints • An alternative approach to caching – The cached data is treated as hints – The cached data is not guaranteed to be completely accurate • The cache consistency issue is ignored in this implementation – This is useful for applications which can recover from invalid cached data 5/29/2016 COP5611 18 Bulk Data Transfer • Bulk data transfer is to transfer multiple data blocks instead of just the block being referenced by the client – Temporal locality and the fact that most files are accessed in their entirety – Reduce the network communication overhead by reducing the cost of executing communication protocols 5/29/2016 COP5611 19 Security 5/29/2016 COP5611 20 Naming in Distributed File Systems • A name in file systems is a way to reference a file or a directory • Name resolution refers to the process of mapping a name to an object (or in the case of replication, to multiple objects) • A name space is a collection of names 5/29/2016 COP5611 21 Naming in a Local File System 5/29/2016 COP5611 22 Naming in a Local File System – cont. 5/29/2016 COP5611 23 Naming in Distributed File Systems – cont. • Three approaches to naming in distributed file systems – The simplest scheme is to concatenate the host name to the names of files • Not network transparent • Not location-independent – Mounting remote directories to local directories • Location transparent but not network transparent – A single global directory • Limited to a few cooperating computers 5/29/2016 COP5611 24 Naming in Distributed File Systems – cont. • Context – Content can be used to partition a file name space – Here a filename consists of a context and a name local to the context – Name resolution involves interpreting the name within a context, which may invoke other contexts recursively 5/29/2016 COP5611 25 Naming in Distributed File Systems – cont. • Name Servers are responsible for name resolution in distributed file systems – A name server is a process that maps names specified by clients to stored objects such as files and directories – A single name server vs. multiple name servers 5/29/2016 COP5611 26 Caches on Disk or Memory • Cache in main memory vs. cache on a local disk – Cache in main memory • Advantages • Disadvantages – Cache on a local disk • Advantages • Disadvantages 5/29/2016 COP5611 27 Writing Policy • This is related to the cache consistency – It decides what to do when a cache block at the client is modified – Several different policies • Write-through • Delayed writing policy for some time – Delayed writing policy when the file is closed 5/29/2016 COP5611 28 Cache Consistency • Schemes to guarantee consistency – Server-initiated approach • Servers inform the cache managers whenever the data in client caches become stale • Cache managers can retrieve the new data when needed – Client-initiated approach • Cache managers validate data with the server before returning it to the clients – Limited caching 5/29/2016 COP5611 29 Availability • Availability is an important issue in distributed file systems – Replication is the primary mechanism for enhancing the availability of files in distributed file systems • Replication – Unit of replication – Replica management 5/29/2016 COP5611 30 Scalability • Scalability deals with the suitability of the design to support more clients – – – – Caching helps reduce the client response time Server-initiated cache invalidation Some clients can be used as servers The structure of the server process also plays a major role in scalability 5/29/2016 COP5611 31 Semantics • Semantics of a file system characterize the effects of accesses on files – For example, a read operation should return the data (stored) due to the latest write operation – Guaranteeing the semantics when employing caching, is difficult and expensive 5/29/2016 COP5611 32 Case Studies • There are distributed file systems that have been developed – – – – – – – – Architecture Communication Processes and their organization Naming Consistency Caching and replication Fault tolerance Security 5/29/2016 COP5611 33 Case Studies – cont. • These examples are taken from Tanenbaum and van Steen’s book – It does not follow the main textbook because • The material there is not up-to-date • It did not provide enough details • You need to read Sect. 9.5 5/29/2016 COP5611 34 Sun Network File Systems • Developed by Sun – – – – The first version was developed and was kept to Sun The second version was incorporated in SunOS 2.0 Version 3 was released around 1994 Version 4 is under development to make the NFS a true wide-area file system across the Internet 5/29/2016 COP5611 35 Sun Network File Systems – cont. • NFS is not a true file system – It is a collection of protocols that together provide clients with a model of a distributed files system • In some sense, it is a middle ware (like CORBA) – NFS protocols are designed in such a way that different implementations should easily interoperate • It can run on a heterogeneous collection of computers 5/29/2016 COP5611 36 NFS Architecture (1) a) b) The remote access model. The upload/download model 5/29/2016 COP5611 37 NFS Architecture (2) 5/29/2016 COP5611 38 File System Model Operation v3 v4 Description Create Yes No Create a regular file Create No Yes Create a nonregular file Link Yes Yes Create a hard link to a file Symlink Yes No Create a symbolic link to a file Mkdir Yes No Create a subdirectory in a given directory Mknod Yes No Create a special file Rename Yes Yes Change the name of a file Rmdir Yes No Remove an empty subdirectory from a directory Open No Yes Open a file Close No Yes Close a file Lookup Yes Yes Look up a file by means of a file name Readdir Yes Yes Read the entries in a directory Readlink Yes Yes Read the path name stored in a symbolic link Getattr Yes Yes Read the attribute values for a file Setattr Yes Yes Set one or more attribute values for a file Read Yes Yes Read the data contained in a file Write Yes Yes Write data to a file 5/29/2016 COP5611 39 Communication a) b) Reading data from a file in NFS version 3. Reading data using a compound procedure in version 4. 5/29/2016 COP5611 40 Processes • NFS is a traditional client-server system – In versions 2 and 3, the NFS servers are stateless • in that they are not required to maintain any client state • The main advantage of the stateless approach is simplicity – However, some of operations are intrinsically stateful and they are implemented in a separate lock manager – Version 4 uses a stateful approach 5/29/2016 COP5611 41 Naming (1) 5/29/2016 COP5611 42 Naming (2) 5/29/2016 COP5611 43 File Attributes (1) Attribute Description TYPE The type of the file (regular, directory, symbolic link) SIZE The length of the file in bytes CHANGE Indicator for a client to see if and/or when the file has changed FSID Server-unique identifier of the file's file system 5/29/2016 COP5611 44 File Attributes (2) Attribute Description ACL an access control list associated with the file FILEHANDLE The server-provided file handle of this file FILEID A file-system unique identifier for this file FS_LOCATIONS Locations in the network where this file system may be found OWNER The character-string name of the file's owner TIME_ACCESS Time when the file data were last accessed TIME_MODIFY Time when the file data were last modified TIME_CREATE Time when the file was created 5/29/2016 COP5611 45 Semantics of File Sharing (1) a) b) On a single processor, when a read follows a write, the value returned by the read is the value just written. In a distributed system with caching, obsolete values may be returned. 5/29/2016 COP5611 46 Semantics of File Sharing (2) Method Comment UNIX semantics Every operation on a file is instantly visible to all processes Session semantics No changes are visible to other processes until the file is closed Immutable files No updates are possible; simplifies sharing and replication Transaction All changes occur atomically 5/29/2016 COP5611 47 File Locking in NFS (1) Operation Description Lock Creates a lock for a range of bytes Lockt Test whether a conflicting lock has been granted Locku Remove a lock from a range of bytes Renew Renew the leas on a specified lock 5/29/2016 COP5611 48 File Locking in NFS (2) Current file denial state Request access NONE READ WRITE BOTH READ Succeed Fail Succeed Succeed WRITE Succeed Succeed Fail Succeed BOTH Succeed Succeed Succeed Fail (a) Requested file denial state Current access state NONE READ WRITE BOTH READ Succeed Fail Succeed Succeed WRITE Succeed Succeed Fail Succeed BOTH Succeed Succeed Succeed Fail (b) 5/29/2016 COP5611 49 Client Caching (1) 5/29/2016 COP5611 50 Client Caching (2) 5/29/2016 COP5611 51 RPC Failures 5/29/2016 COP5611 52 Security 5/29/2016 COP5611 53 Secure RPCs 5/29/2016 COP5611 54 Access Control Operation Description Read_data Permission to read the data contained in a file Write_data Permission to to modify a file's data Append_data Permission to to append data to a file Execute Permission to to execute a file List_directory Permission to to list the contents of a directory Add_file Permission to to add a new file t5o a directory Add_subdirectory Permission to to create a subdirectory to a directory Delete Permission to to delete a file Delete_child Permission to to delete a file or directory within a directory Read_acl Permission to to read the ACL Write_acl Permission to to write the ACL Read_attributes The ability to read the other basic attributes of a file Write_attributes Permission to to change the other basic attributes of a file Read_named_attrs Permission to to read the named attributes of a file Write_named_attrs Permission to to write the named attributes of a file Write_owner Permission to to change the owner Synchronize Permission to to access a file locally at the server with synchronous reads and writes 5/29/2016 COP5611 55 Sun Network File Systems – cont. • Setting a Network File System on Unix systems – NFS software must have installed – Server side • Start NFS daemons • Export directories – /etc/exports – Client side • /etc/fstab • mount -a 5/29/2016 COP5611 56 Coda Distributed File System • Developed at CMU – Integrated now with a number of UNIX systems – Based on the Andrew File System (AFS) • Design goals – – – – Scalable Secure Highly available High degree of naming and location transparency • So that the system would appear to its users as a pure local file system 5/29/2016 COP5611 57 The Coda File System Type of user Description Owner The owner of a file Group The group of users associated with a file Everyone Any user of a process Interactive Any process accessing the file from an interactive terminal Network Any process accessing the file via the network Dialup Any process accessing the file through a dialup connection to the server Batch Any process accessing the file as part of a batch job Anonymous Anyone accessing the file without authentication Authenticated Any authenticated user of a process Service Any system-defined service process 5/29/2016 COP5611 58 Overview of Coda (1) 5/29/2016 COP5611 59 Overview of Coda (2) 5/29/2016 COP5611 60 Communication (1) 5/29/2016 COP5611 61 Communication (2) a) b) Sending an invalidation message one at a time. Sending invalidation messages in parallel. 5/29/2016 COP5611 62 Naming 5/29/2016 COP5611 63 File Identifiers 5/29/2016 COP5611 64 Sharing Files in Coda 5/29/2016 COP5611 65 Transactional Semantics File-associated data Read? Modified? File identifier Yes No Access rights Yes No Last modification time Yes Yes File length Yes Yes File contents Yes Yes • The metadata read and modified for a store session type in Coda. 5/29/2016 COP5611 66 Client Caching 5/29/2016 COP5611 67 Server Replication 5/29/2016 COP5611 68 Disconnected Operation 5/29/2016 COP5611 69 Secure Channels (1) • Mutual authentication in RPC2. 5/29/2016 COP5611 70 Secure Channels (2) • Setting up a secure channel between a (Venus) client and a Vice server in Coda. 5/29/2016 COP5611 71 Access Control Operation Description Read Read any file in the directory Write Modify any file in the directory Lookup Look up the status of any file Insert Add a new file to the directory Delete Delete an existing file Administer Modify the ACL of the directory • Classification of file and directory operations recognized by Coda with respect to access control. 5/29/2016 COP5611 72 Plan 9: Resources Unified to Files 5/29/2016 COP5611 73 Communication File Description ctl Used to write protocol-specific control commands data Used to read and write data listen Used to accept incoming connection setup requests local Provides information on the caller's side of the connection remote Provides information on the other side of the connection status Provides diagnostic information on the current status of the connection • Files associated with a single TCP connection in Plan 9. 5/29/2016 COP5611 74 Processes 5/29/2016 COP5611 75 Naming 5/29/2016 COP5611 76 Overview of xFS. 5/29/2016 COP5611 77 Processes (1) 5/29/2016 COP5611 78 Processes (2) 5/29/2016 COP5611 79 Naming • Main data structures used in xFS. Data structure Description Manager map Maps file ID to manager Imap Maps file ID to log address of file's inode Inode Maps block number (i.e., offset) to log address of block File identifier Reference used to index into manager map File directory Maps a file name to a file identifier Log addresses Triplet of stripe group, ID, segment ID, and segment offset Stripe group map Maps stripe group ID to list of storage servers 5/29/2016 COP5611 80 Overview of SFS • The organization of SFS. 5/29/2016 COP5611 81 Naming /sfs LOC HID Pathname /sfs/sfs.vu.sc.nl:ag62hty4wior450hdh63u623i4f0kqere/home/steen/mbox • A self-certifying pathname in SFS. 5/29/2016 COP5611 82 Summary • Issue NFS Coda Plan 9 xFS SFS Design goals Access transparency High availability Uniformity Serverless system Scalable security Access model Remote Up/Download Remote Log-based Remote Communication RPC RPC Special Active msgs RPC Client process Thin/Fat Fat Thin Fat Medium Server groups No Yes No Yes No Mount granularity Directory File system File system File system Directory Name space Per client Global Per process Global Global File ID scope File server Global Server Global File system Sharing sem. Session Transactional UNIX UNIX N/S Cache consist. write-back write-back write-through write-back write-back Replication Minimal ROWA None Striping None Fault tolerance Reliable comm. Replication and caching Reliable comm. Striping Reliable comm. Recovery Client-based Reintegration N/S Checkpoint & write logs N/S Secure channels Existing mechanisms Needham-Schroeder Needham-Schroeder No pathnames Self-cert. Access control Many operations Directory operations UNIX based UNIX based NFS BASED A comparison between NFS, Coda, Plan 9, xFS. N/S indicates that nothing has been specified. 5/29/2016 COP5611 83