Outline • Review of Quiz #1 • Distributed File Systems 5/29/2016

Outline

• Review of Quiz #1

• Distributed File Systems

4/10/2020 COP5611 1

Distributed File Systems

• A distributed file system is a resource management component of a distributed operating system

– It implements a common file system shared by all the computers in the system

• Two important goals

– Network transparency

– High availability

Architecture

Architecture – cont.

• Normally, for performance reasons, distributed file systems are organized as a client-server architecture

– File servers store files and perform storage and retrieval operations upon clients' requests

– The two most important components are

• Name server

• Cache manager

Mounting

• Mounting is a way to bind together different file systems to form a single hierarchically structured name space

– It is widely used on UNIX machines, both local and distributed

– In distributed file systems, file systems maintained by remote servers are mounted at the clients
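The effect of mounting on name lookup can be sketched as a longest-prefix match against a mount table. This is only an illustrative sketch: the mount points, server names, and exported paths below are hypothetical and do not come from any particular system.

```python
import posixpath

# Hypothetical mount table: local mount point -> (server, path exported by that server).
MOUNT_TABLE = {
    "/": ("localhost", "/"),
    "/home": ("fileserver1", "/export/home"),
    "/home/projects": ("fileserver2", "/export/projects"),
}


def resolve(path, mount_table=MOUNT_TABLE):
    """Map a client path to (server, server-side path) via the longest matching mount point."""
    path = posixpath.normpath(path)
    best = None
    for mount_point in mount_table:
        prefix = mount_point.rstrip("/")  # "/" becomes "" so it matches every absolute path
        if path == mount_point or path.startswith(prefix + "/"):
            if best is None or len(mount_point) > len(best):
                best = mount_point
    if best is None:
        raise FileNotFoundError(path)
    server, exported = mount_table[best]
    # Strip the mount point and re-root the remainder under the exported directory.
    remainder = path[len(best.rstrip("/")):].lstrip("/")
    return server, (posixpath.join(exported, remainder) if remainder else exported)
```

With this table, a path under /home/projects is served by the second server even though /home is mounted from the first, because the deeper mount point takes precedence.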

Automounting

Caching

• Caching is commonly used in distributed file systems to reduce delays in accessing the data

– In file caching, a copy of the data stored at a remote file server is brought to the client, reducing access delays due to network latency

– The effectiveness of caching is based on the temporal locality in programs

– Files can also be cached at the server side
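Why temporal locality makes a client cache effective can be seen in a small sketch. It assumes a least-recently-used replacement policy and a stand-in `fake_server` function in place of a real network fetch; both are illustrative choices, not something the slides prescribe.

```python
from collections import OrderedDict


class ClientCache:
    """A tiny LRU block cache in front of a (hypothetical) remote fetch function."""

    def __init__(self, fetch_remote, capacity=2):
        self.fetch_remote = fetch_remote   # called only on a cache miss
        self.capacity = capacity
        self.blocks = OrderedDict()        # (file, block_no) -> data, oldest first
        self.hits = 0
        self.misses = 0

    def read(self, file, block_no):
        key = (file, block_no)
        if key in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(key)   # mark as most recently used
            return self.blocks[key]
        self.misses += 1
        data = self.fetch_remote(file, block_no)
        self.blocks[key] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        return data


def fake_server(file, block_no):
    # Stand-in for a network round trip to the file server.
    return f"{file}#{block_no}"


cache = ClientCache(fake_server)
for block in [0, 0, 1, 0, 2, 1]:
    cache.read("report.txt", block)
```

Repeated reads of block 0 are served locally, while blocks that fall out of the small cache must be fetched again, so the hit rate depends directly on how soon data is reused.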

Client Caching

Hints

• An alternative approach to caching

– The cached data is treated as hints

– The cached data is not guaranteed to be completely accurate

• The cache consistency issue is ignored in this approach

– This is useful for applications that can detect and recover from invalid cached data
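One common use of hints is caching where a file lives. The sketch below assumes a hypothetical setting in which the hint is a file-to-server mapping, the client can cheaply check whether the hinted server still holds the file, and an authoritative `lookup` function is available as a fallback. All names are illustrative.

```python
class HintedLocator:
    """Caches file -> server locations as hints that are verified on use."""

    def __init__(self, servers, lookup):
        self.servers = servers      # server name -> set of files it currently holds
        self.lookup = lookup        # authoritative (slow) lookup: file -> server name
        self.hints = {}             # possibly stale: file -> server name
        self.stale_hints = 0

    def locate(self, file):
        server = self.hints.get(file)
        if server is not None and file in self.servers[server]:
            return server                   # hint was still valid
        if server is not None:
            self.stale_hints += 1           # hint was wrong; recover from it
        server = self.lookup(file)          # fall back to the authoritative source
        self.hints[file] = server
        return server


servers = {"s1": {"a.txt"}, "s2": set()}


def lookup(file):
    return next(name for name, files in servers.items() if file in files)


locator = HintedLocator(servers, lookup)
first = locator.locate("a.txt")

# The file migrates; nobody tells the client, so its hint is now stale.
servers["s1"].remove("a.txt")
servers["s2"].add("a.txt")
second = locator.locate("a.txt")
```

No consistency protocol runs when the file moves; correctness comes from checking the hint at the point of use and recovering when it turns out to be wrong.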

Bulk Data Transfer

• Bulk data transfer moves multiple consecutive data blocks at once instead of just the block being referenced by the client

– It is justified by locality of reference and the fact that most files are accessed in their entirety

– It reduces network communication overhead because the cost of executing the communication protocols is paid once per transfer rather than once per block
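The saving can be made concrete with a simple cost model. The model and its numbers are assumptions for illustration only: each message incurs a fixed protocol overhead, and each block incurs a fixed transfer cost, both in arbitrary time units.

```python
import math


def transfer_cost(num_blocks, blocks_per_message, per_message_overhead, per_block_cost):
    """Total cost of moving num_blocks when each message carries blocks_per_message blocks."""
    messages = math.ceil(num_blocks / blocks_per_message)
    return messages * per_message_overhead + num_blocks * per_block_cost


# Hypothetical figures: 64 blocks, overhead of 5 units per message, 1 unit per block.
one_at_a_time = transfer_cost(64, 1, per_message_overhead=5, per_block_cost=1)
bulk = transfer_cost(64, 16, per_message_overhead=5, per_block_cost=1)
```

Under these assumed numbers the block-at-a-time transfer costs 384 units and the bulk transfer costs 84, since the per-message overhead is paid 4 times instead of 64 times while the per-block cost is unchanged.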

Security

Naming in Distributed File Systems

• A name in file systems is a way to reference a file or a directory

• Name resolution refers to the process of mapping a name to an object (or in the case of replication, to multiple objects)

• A name space is a collection of names

Naming in a Local File System

Naming in Distributed File Systems – cont.

• Three approaches to naming in distributed file systems

– The simplest scheme is to concatenate the host name to the names of files

• Not network transparent

• Not location-independent

– Mounting remote directories to local directories

• Location-transparent but not network-transparent

– A single global directory

• Limited to a few cooperating computers

Naming in Distributed File Systems – cont.

• Context

– Contexts can be used to partition a file name space

– Here a filename consists of a context and a name local to the context

– Name resolution involves interpreting the name within a context, which may invoke other contexts recursively
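Recursive resolution within contexts can be sketched as follows. The sketch assumes a context is modeled as a dictionary whose values are either objects or further contexts, and that names are '/'-separated; the example name space is hypothetical.

```python
def resolve(context, name):
    """Resolve a '/'-separated name starting in the given context.

    A context is modeled as a dict whose values are either objects
    (here, plain strings) or further contexts (nested dicts).
    """
    first, _, rest = name.partition("/")
    entry = context[first]            # KeyError if the name is unknown in this context
    if not rest:
        return entry
    if not isinstance(entry, dict):
        raise NotADirectoryError(first)
    return resolve(entry, rest)       # interpret the remainder in the next context


# Hypothetical name space used only for illustration.
root = {"usr": {"alice": {"notes.txt": "file-object-17"}}, "tmp": {}}
```

The same file can be named relative to different contexts: "usr/alice/notes.txt" in the root context and "alice/notes.txt" in the "usr" context resolve to the same object.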

Naming in Distributed File Systems – cont.

• Name servers are responsible for name resolution in distributed file systems

– A name server is a process that maps names specified by clients to stored objects such as files and directories

– A single name server vs. multiple name servers

Caches on Disk or Memory

• Cache in main memory vs. cache on a local disk

– Cache in main memory

• Advantages

• Disadvantages

– Cache on a local disk

• Advantages

• Disadvantages

Writing Policy

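• The writing policy is closely related to cache consistency

– It decides what to do when a cache block at the client is modified

– Several different policies

• Write-through: every modification is sent to the server immediately

• Delayed write: modifications are written back to the server after some delay

• Delayed write on close: modifications are written back when the file is closed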
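The three policies differ only in when a modified block reaches the server. The sketch below assumes a dictionary stands in for the file server and that a timer (not shown) would call `flush` under the delayed policy. The class and policy names are illustrative.

```python
class CachedFile:
    """Client-side cache for one file, parameterized by writing policy."""

    POLICIES = ("write-through", "delayed", "on-close")

    def __init__(self, server, name, policy):
        if policy not in self.POLICIES:
            raise ValueError(f"unknown policy: {policy}")
        self.server = server      # dict standing in for the server: (name, block_no) -> data
        self.name = name
        self.policy = policy
        self.cache = {}           # block_no -> data
        self.dirty = set()        # blocks modified locally but not yet at the server

    def write(self, block_no, data):
        self.cache[block_no] = data
        if self.policy == "write-through":
            self.server[(self.name, block_no)] = data   # server updated immediately
        else:
            self.dirty.add(block_no)                    # server updated later

    def flush(self):
        """Write back all dirty blocks (a timer would call this under 'delayed')."""
        for block_no in sorted(self.dirty):
            self.server[(self.name, block_no)] = self.cache[block_no]
        self.dirty.clear()

    def close(self):
        self.flush()   # in this sketch, closing always leaves the server up to date


server = {}
through = CachedFile(server, "a.txt", "write-through")
through.write(0, "x")
visible_before_close = ("a.txt", 0) in server

deferred = CachedFile(server, "b.txt", "on-close")
deferred.write(0, "y")
hidden_before_close = ("b.txt", 0) not in server
deferred.close()
```

Write-through keeps the server current at the price of one server update per write, while the delayed policies batch updates but leave a window in which other clients can read stale data or a client crash can lose modifications.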

Cache Consistency

• Schemes to guarantee consistency

– Server-initiated approach

• Servers inform the cache managers whenever the data in client caches becomes stale

• Cache managers can retrieve the new data when needed

– Client-initiated approach

• Cache managers validate data with the server before returning it to the clients

– Limited caching
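The server-initiated and client-initiated approaches can be contrasted in one small sketch. It assumes the server keeps a version number per file and a list of cache managers to notify. The classes and method names are hypothetical and greatly simplified (no networking, no failures).

```python
class Server:
    def __init__(self):
        self.files = {}        # name -> (version, data)
        self.callbacks = {}    # name -> set of cache managers holding a copy

    def write(self, name, data):
        version = self.files.get(name, (0, None))[0] + 1
        self.files[name] = (version, data)
        # Server-initiated: notify every cache manager that holds the old copy.
        for manager in self.callbacks.pop(name, set()):
            manager.invalidate(name)

    def fetch(self, name, manager=None):
        if manager is not None:
            self.callbacks.setdefault(name, set()).add(manager)
        return self.files[name]

    def version(self, name):
        return self.files[name][0]


class CacheManager:
    def __init__(self, server, server_initiated):
        self.server = server
        self.server_initiated = server_initiated
        self.entries = {}      # name -> (version, data)

    def invalidate(self, name):
        self.entries.pop(name, None)

    def read(self, name):
        entry = self.entries.get(name)
        if entry is not None and not self.server_initiated:
            # Client-initiated: validate the cached version before using it.
            if self.server.version(name) != entry[0]:
                entry = None
        if entry is None:
            entry = self.server.fetch(name, self if self.server_initiated else None)
            self.entries[name] = entry
        return entry[1]


server = Server()
server.write("f", "v1")
notified = CacheManager(server, server_initiated=True)
polling = CacheManager(server, server_initiated=False)
notified.read("f")
polling.read("f")
server.write("f", "v2")
```

After the second write, the notified manager has already dropped its copy, whereas the polling manager still holds the old data and only discovers the change when it validates on its next read. The trade-off is server state and notification traffic versus a validation round trip on every access.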

Availability

• Availability is an important issue in distributed file systems

– Replication is the primary mechanism for enhancing the availability of files in distributed file systems

• Replication

– Unit of replication

– Replica management

Scalability

• Scalability deals with the suitability of the design to support more clients

– Caching helps reduce the client response time

– Server-initiated cache invalidation

– Some clients can be used as servers

– The structure of the server process also plays a major role in scalability

Semantics

• Semantics of a file system characterize the effects of accesses on files

– For example, a read operation should return the data stored by the latest write operation

– Guaranteeing these semantics when caching is employed is difficult and expensive
