• Review of Quiz #1
• Distributed File Systems
4/10/2020 COP5611 1
• A distributed file system is a resource management component of a distributed operating system
– It implements a common file system shared by all the computers in the system
• Two important goals
– Network transparency
– High availability
• For performance reasons, distributed file systems are normally organized as a client-server architecture
– File servers store files and perform storage and retrieval operations upon clients' requests
– The two most important components are
• Name server
• Cache manager
• Mounting is a way to bind together different file systems to form a single hierarchically structured name space
– It is widely used on UNIX machines, both local and distributed
– In distributed file systems, file systems maintained by remote servers are mounted at the clients
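As a rough illustration of how a client might use mounts to find the server behind a path, here is a minimal sketch. The mount table contents, the server names, and the function `resolve` are all hypothetical; real systems keep this information in the kernel and handle many more cases.

```python
# Hypothetical mount table: local mount point -> (server, exported directory).
MOUNT_TABLE = {
    "/home": ("serverA", "/export/home"),
    "/usr/share": ("serverB", "/export/share"),
}


def resolve(path, mount_table=MOUNT_TABLE):
    """Return (server, remote_path), or (None, path) for a purely local file."""
    best = None
    for mount_point in mount_table:
        # A mount point matches if it equals the path or is a directory prefix of it.
        if path == mount_point or path.startswith(mount_point + "/"):
            if best is None or len(mount_point) > len(best):
                best = mount_point  # prefer the longest (most specific) match
    if best is None:
        return None, path
    server, exported = mount_table[best]
    # Replace the local mount point with the directory exported by the server.
    return server, exported + path[len(best):]
```

The client keeps using one hierarchical name such as `/home/alice/notes.txt`; only the mount table knows that the file is stored on a remote server.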
• Caching is commonly used in distributed file systems to reduce delays in accessing the data
– In file caching, a copy of the data stored at a remote file server is brought to the client, reducing access delays due to network latency
– The effectiveness of caching is based on the temporal locality in programs
– Files can also be cached at the server side
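The effect of client-side caching can be sketched with a small block cache. This is only an illustration under assumed names: `ClientBlockCache`, the `fetch_remote` callable, and the LRU replacement policy are choices made for the example, not something the slides prescribe.

```python
from collections import OrderedDict


class ClientBlockCache:
    """Tiny client-side block cache with least-recently-used replacement."""

    def __init__(self, fetch_remote, capacity=2):
        self.fetch_remote = fetch_remote  # hypothetical call to the file server
        self.capacity = capacity          # maximum number of cached blocks
        self.blocks = OrderedDict()       # (file, block_no) -> data, oldest first
        self.remote_fetches = 0           # counts accesses that paid network latency

    def read(self, file_name, block_no):
        key = (file_name, block_no)
        if key in self.blocks:
            # Cache hit: no network access; mark the block as recently used.
            self.blocks.move_to_end(key)
            return self.blocks[key]
        # Cache miss: bring a copy of the block from the remote server.
        data = self.fetch_remote(file_name, block_no)
        self.remote_fetches += 1
        self.blocks[key] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        return data
```

Repeated reads of the same block are served locally, which is where temporal locality pays off.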
• An alternative approach to caching
– The cached data is treated as hints
– The cached data is not guaranteed to be completely accurate
• The cache consistency issue is ignored in this implementation
– This is useful for applications which can recover from invalid cached data
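One way to picture hints is a cached file location that is tried first and checked on use. The sketch below is an assumption-laden example: `locate`, the `servers` dictionary, and the `lookup_authoritative` callable are hypothetical stand-ins for a real lookup path.

```python
def locate(file_name, hints, servers, lookup_authoritative):
    """Return the server holding file_name, trying the cached hint first.

    hints:   dict file name -> server name; may be stale, never trusted blindly
    servers: dict server name -> set of files it currently stores
    lookup_authoritative: slower call that always returns the right server
    """
    hinted = hints.get(file_name)
    if hinted is not None and file_name in servers.get(hinted, set()):
        return hinted  # the hint was still valid, so the slow lookup is skipped
    # The hint was missing or wrong: recover with the authoritative lookup
    # and remember the answer as a new hint.
    actual = lookup_authoritative(file_name)
    hints[file_name] = actual
    return actual
```

No consistency protocol keeps `hints` up to date; correctness comes from validating the hint when it is used and recovering when it turns out to be wrong.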
• Bulk data transfer moves multiple consecutive data blocks at once, instead of just the block being referenced by the client
– It is motivated by spatial locality and the fact that most files are accessed in their entirety
– It reduces network communication overhead by amortizing the cost of executing communication protocols over more data
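The saving from bulk transfer can be estimated with a simple cost model. The model and the numbers used with it are assumptions for illustration: each request pays a fixed protocol overhead, and each block pays a fixed transmission cost.

```python
import math


def transfer_cost(num_blocks, blocks_per_request, per_request_overhead, per_block_cost):
    """Total cost of reading num_blocks under a simple, assumed cost model."""
    # Every request pays the protocol overhead once, however many blocks it carries.
    requests = math.ceil(num_blocks / blocks_per_request)
    return requests * per_request_overhead + num_blocks * per_block_cost
```

For a 64-block file with an overhead of 5 units per request and 1 unit per block, fetching one block at a time costs 64 * 5 + 64 = 384 units, while fetching 8 blocks per request costs 8 * 5 + 64 = 104 units. The data volume is the same; only the protocol overhead shrinks.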
• A name in file systems is a way to reference a file or a directory
• Name resolution refers to the process of mapping a name to an object (or in the case of replication, to multiple objects)
• A name space is a collection of names
Naming in Distributed File Systems – cont.
• Three approaches to naming in distributed file systems
– The simplest scheme is to concatenate the host name to the names of files
• Not network transparent
• Not location-independent
– Mounting remote directories to local directories
• Location transparent but not network transparent
– A single global directory
• Limited to a few cooperating computers
Naming in Distributed File Systems – cont.
• Context
– A context can be used to partition a file name space
– Here a filename consists of a context and a name local to the context
– Name resolution involves interpreting the name within a context, which may invoke other contexts recursively
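Recursive resolution through contexts can be sketched as follows. The representation is an assumption made for the example: a context is a Python dictionary that binds a local name either to an object identifier or to another context, and `/` separates name components.

```python
def resolve_name(context, name):
    """Resolve a '/'-separated name starting from the given context."""
    first, _, rest = name.partition("/")
    entry = context[first]  # raises KeyError if the name is not bound here
    if rest == "":
        return entry  # last component: this is the object being named
    if not isinstance(entry, dict):
        raise NotADirectoryError(first)  # cannot interpret the rest of the name
    # The remaining name is interpreted in the context that `first` refers to.
    return resolve_name(entry, rest)
```

With `root = {"projects": {"dfs": {"notes.txt": "file#17"}}}`, the name `projects/dfs/notes.txt` is interpreted in `root`, then in the `projects` context, then in the `dfs` context, where `notes.txt` is bound to the hypothetical object identifier `file#17`.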
Naming in Distributed File Systems – cont.
• Name Servers are responsible for name resolution in distributed file systems
– A name server is a process that maps names specified by clients to stored objects such as files and directories
– A single name server vs. multiple name servers
• Cache in main memory vs. cache on a local disk
– Cache in main memory
• Advantages
• Disadvantages
– Cache on a local disk
• Advantages
• Disadvantages
• The writing policy is related to cache consistency
– It decides what to do when a cache block at the client is modified
– Several different policies
• Write-through
• Delayed writing: modified blocks are written to the server after some delay
• Delayed writing until close: modified blocks are written to the server when the file is closed
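The three policies differ only in when a modified cache block reaches the server. The sketch below makes that visible; the class `CachedFile`, the policy strings, and the `periodic_flush` method (standing in for a timer) are hypothetical names, and the server is modeled as a plain dictionary.

```python
class CachedFile:
    """Client-side cached copy of one file under a chosen writing policy."""

    POLICIES = ("write-through", "delayed", "write-on-close")

    def __init__(self, server_store, name, policy):
        if policy not in self.POLICIES:
            raise ValueError(f"unknown policy: {policy}")
        self.server_store = server_store  # dict standing in for the file server
        self.name = name
        self.policy = policy
        self.cached = server_store.get(name, "")
        self.dirty = False  # True while the cache holds unwritten modifications

    def write(self, data):
        self.cached = data
        self.dirty = True
        if self.policy == "write-through":
            self._flush()  # the server is updated on every write

    def periodic_flush(self):
        # Would be driven by a timer in a real client (assumed here).
        if self.policy == "delayed":
            self._flush()

    def close(self):
        if self.policy == "write-on-close":
            self._flush()

    def _flush(self):
        if self.dirty:
            self.server_store[self.name] = self.cached
            self.dirty = False
```

Write-through keeps the server current at the price of one server update per write. The delayed variants reduce traffic but leave a window in which the server copy is stale and data is lost if the client crashes.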
• Schemes to guarantee consistency
– Server-initiated approach
• Servers inform the cache managers whenever the data in client caches become stale
• Cache managers can retrieve the new data when needed
– Client-initiated approach
• Cache managers validate data with the server before returning it to the clients
– Limited caching
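The server-initiated and client-initiated approaches can be contrasted in one small sketch. Everything here is an assumption made for illustration: the classes `Server` and `ClientCache`, the use of version numbers for validation, and the in-process callback that stands in for an invalidation message.

```python
class Server:
    def __init__(self):
        self.data = {}       # name -> (version, contents)
        self.callbacks = {}  # name -> caches that asked to be notified

    def write(self, name, contents):
        version = self.data.get(name, (0, None))[0] + 1
        self.data[name] = (version, contents)
        # Server-initiated: tell registered caches that their copy is stale.
        for cache in self.callbacks.pop(name, set()):
            cache.invalidate(name)

    def version(self, name):
        return self.data[name][0]

    def read(self, name, cache=None):
        if cache is not None:
            self.callbacks.setdefault(name, set()).add(cache)
        return self.data[name]


class ClientCache:
    def __init__(self, server, server_initiated):
        self.server = server
        self.server_initiated = server_initiated
        self.entries = {}  # name -> (version, contents)

    def invalidate(self, name):
        self.entries.pop(name, None)

    def read(self, name):
        entry = self.entries.get(name)
        if entry is not None:
            if self.server_initiated:
                return entry[1]  # still cached, so no invalidation has arrived
            # Client-initiated: validate with the server before returning data.
            if self.server.version(name) == entry[0]:
                return entry[1]
        registrant = self if self.server_initiated else None
        entry = self.server.read(name, registrant)
        self.entries[name] = entry
        return entry[1]
```

In this sketch the client-initiated cache contacts the server on every read, while the server-initiated cache reads locally but requires the server to remember which clients hold copies.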
• Availability is an important issue in distributed file systems
– Replication is the primary mechanism for enhancing the availability of files in distributed file systems
• Replication
– Unit of replication
– Replica management
• Scalability deals with the suitability of the design to support more clients
– Caching helps reduce the client response time
– Server-initiated cache invalidation
– Some clients can be used as servers
– The structure of the server process also plays a major role in scalability
• Semantics of a file system characterize the effects of accesses on files
– For example, a read operation should return the data stored by the latest write operation
– Guaranteeing these semantics when employing caching is difficult and expensive