Outline
• Announcements
• Lab 2
• Distributed File Systems

5/29/2016 COP5611

Announcements
• The midterm will be on March 20
  – It will cover up to Chapter 8
    • The emphasis will be on the material covered in class
    • You are not required to know the details of distributed deadlock detection algorithms
  – There will be a review on March 18
• Regarding Lab 1
  – Each group needs to set up an appointment with me to demonstrate its lab for grading
  – I want to remind you that I take cheating very seriously

Distributed File Systems
• A distributed file system is a resource management component of a distributed operating system
  – It implements a common file system shared by all the computers in the system
• Two important goals
  – Network transparency
  – High availability

Architecture
• For performance reasons, distributed file systems are normally organized as a client-server architecture
  – File servers store files and perform storage and retrieval upon clients' requests
  – The two most important components are
    • Name server
    • Cache manager

Mounting
• Mounting is a way to bind together different file systems to form a single, hierarchically structured name space
  – It is widely used on both local and distributed UNIX machines
  – In distributed file systems, file systems maintained by remote servers are mounted at the clients

Automounting
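The way mounting binds remote file systems into one name space can be sketched as a lookup in a mount table. This is a minimal illustration, not the mechanism of any particular system: the table contents, the server names (`fileserver1`, `fileserver2`), the exported directories, and the `resolve` function are all hypothetical. It assumes that the longest matching mount point decides which server holds a path.

```python
# Hypothetical mount table: local mount point -> (server, remote directory).
# All names below are made up for illustration.
MOUNT_TABLE = {
    "/": ("localhost", "/"),
    "/home": ("fileserver1", "/export/home"),
    "/home/projects": ("fileserver2", "/export/projects"),
}


def resolve(path, mount_table=MOUNT_TABLE):
    """Map a client path to (server, remote_path) by longest-prefix match."""
    best = None
    for mount_point in mount_table:
        prefix = mount_point.rstrip("/")
        # A mount point matches the path itself or anything below it.
        if path == mount_point or path.startswith(prefix + "/"):
            if best is None or len(mount_point) > len(best):
                best = mount_point
    if best is None:
        raise FileNotFoundError(path)
    server, remote_root = mount_table[best]
    # Replace the mount-point prefix with the directory exported by the server.
    remainder = path[len(best.rstrip("/")):]
    remote_path = remote_root.rstrip("/") + remainder
    return server, remote_path or "/"


if __name__ == "__main__":
    for p in ("/etc/passwd", "/home/alice/notes.txt", "/home/projects/dfs/a.c"):
        print(p, "->", resolve(p))
```

The client sees a single tree rooted at `/`, while the table decides which server actually stores each subtree. Nested mounts, such as `/home/projects` inside `/home`, are handled by preferring the longest match.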
Caching
• Caching is commonly used in distributed file systems to reduce delays in accessing data
  – In file caching, a copy of the data stored at a remote file server is brought to the client, reducing access delays due to network latency
  – Caching is effective because programs exhibit temporal locality
  – Files can also be cached at the server side

Client Caching

Cache Consistency

Hints
• An alternative approach to caching
  – The cached data is treated as hints
  – The cached data is not guaranteed to be completely accurate
    • The cache consistency issue is ignored in this implementation
  – This is useful for applications that can recover from invalid cached data

Bulk Data Transfer
• Bulk data transfer moves multiple data blocks at once instead of just the block being referenced by the client
  – Justified by locality of reference and the fact that most files are accessed in their entirety
  – Reduces network communication overhead by amortizing the cost of executing communication protocols over many blocks

Security

Naming in Distributed File Systems
• A name in a file system is a way to reference a file or a directory
• Name resolution refers to the process of mapping a name to an object (or, in the case of replication, to multiple objects)
• A name space is a collection of names

Naming in a Local File System

Naming in Distributed File Systems – cont.
• Three approaches to naming in distributed file systems
  – The simplest scheme is to concatenate the host name to the names of files
    • Not network transparent
    • Not location-independent
  – Mounting remote directories onto local directories
    • Location transparent but not network transparent
  – A single global directory
    • Limited to a few cooperating computers

Naming in Distributed File Systems – cont.
• Context
  – Contexts can be used to partition a file name space
  – Here a file name consists of a context and a name local to that context
  – Name resolution involves interpreting the name within a context, which may invoke other contexts recursively

Naming in Distributed File Systems – cont.
• Name servers are responsible for name resolution in distributed file systems
  – A name server is a process that maps names specified by clients to stored objects such as files and directories
  – A single name server vs. multiple name servers

Caches on Disk or Memory
• Cache in main memory vs. cache on a local disk
  – Cache in main memory
    • Advantages
    • Disadvantages
  – Cache on a local disk
    • Advantages
    • Disadvantages

Writing Policy
• This is related to cache consistency
  – It decides what to do when a cache block at the client is modified
  – Several different policies
    • Write-through
    • Delayed write: modifications are written to the server after some delay
    • Delayed write until the file is closed

Cache Consistency
• Schemes to guarantee consistency
  – Server-initiated approach
    • Servers inform the cache managers whenever the data in client caches becomes stale
    • Cache managers can retrieve the new data when needed
  – Client-initiated approach
    • Cache managers validate data with the server before returning it to the clients
  – Limited caching

Availability
• Availability is an important issue in distributed file systems
  – Replication is the primary mechanism for enhancing the availability of files in distributed file systems
• Replication
  – Unit of replication
  – Replica management

Scalability
• Scalability deals with the suitability of the design to support more clients
  – Caching helps reduce the client response time
  – Server-initiated cache invalidation
  – Some clients can be used as servers
  – The structure of the server process also plays a major role in scalability

Semantics
• Semantics of a file system characterize the effects of accesses on files
  – For example, a read operation should return the data stored by the most recent write operation
  – Guaranteeing these semantics when employing caching is difficult and expensive
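To make the cost of these semantics concrete, here is a minimal sketch that combines a write-through policy with the client-initiated consistency approach. It is a toy in-memory model, not the protocol of any real file system: the `FileServer` and `CachingClient` classes, the per-file version counter, and the `fetches` counter are assumptions introduced for illustration.

```python
class FileServer:
    """Toy in-memory file server that keeps a version number per file."""

    def __init__(self):
        self._files = {}  # name -> (version, data)

    def write(self, name, data):
        version = self._files.get(name, (0, b""))[0] + 1
        self._files[name] = (version, data)
        return version

    def version(self, name):
        return self._files[name][0]

    def read(self, name):
        return self._files[name]  # (version, data)


class CachingClient:
    """Client cache manager: write-through, validate before every read."""

    def __init__(self, server):
        self.server = server
        self.cache = {}   # name -> (version, data)
        self.fetches = 0  # number of full transfers from the server

    def read(self, name):
        # Client-initiated validation: ask the server for the current version.
        current = self.server.version(name)
        cached = self.cache.get(name)
        if cached is None or cached[0] != current:
            self.cache[name] = self.server.read(name)
            self.fetches += 1
        return self.cache[name][1]

    def write(self, name, data):
        # Write-through: the server is updated immediately.
        version = self.server.write(name, data)
        self.cache[name] = (version, data)


if __name__ == "__main__":
    server = FileServer()
    a, b = CachingClient(server), CachingClient(server)
    a.write("notes", b"v1")
    print(b.read("notes"), b.fetches)  # first read transfers the file
    print(b.read("notes"), b.fetches)  # validated cache hit, no new transfer
    a.write("notes", b"v2")
    print(b.read("notes"), b.fetches)  # stale copy detected and refetched
```

Every read returns the data of the most recent write, but only because each read still costs one round trip to the server for validation. That round trip is the expense the Semantics slide refers to; server-initiated invalidation or delayed writes reduce the traffic at the price of more server state or weaker guarantees.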