CS 519 -- Operating Systems -- Fall 2000

DISTRIBUTED FILE SYSTEM
Presentation group:
Lê Tuấn Anh
Nguyễn Hải Duy
Đặng Thanh Linh
Trần Trung Hiếu 50500892
Nguyễn Hoàng Nam
Computer Science
Distributed file system.
Contents:
I. Distributed file system design
II. Distributed file system implementation
III. Network File System (NFS)
IV. Trends in distributed file systems
What is a Distributed File System?
A Distributed File System (DFS) is a mechanism for sharing files.
A DFS makes files distributed across multiple servers appear to users as if they reside in one place on the network.
A DFS provides a mechanism to create logical views of folders and files regardless of where those files are physically located on the network.
File Service
Specifies what the file system offers its clients for manipulating shared files
ex: read, write, ... on files
Implemented by a user or kernel process called a file server
A system may have one or several file servers running at the same time
File Service (cont.)
Two models for file services:
upload/download: whole files move between server and clients; few operations (read file and write file); simple; requires storage at the client; good if the whole file is accessed
remote access: files stay at the server; rich interface with many operations; less space needed at the client; efficient for small accesses
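The two models can be contrasted with a minimal in-memory sketch. The `FileServer` class and its method names are hypothetical, chosen only to show the shape of each interface: the upload/download model moves whole files, while the remote access model reads and writes byte ranges that stay at the server.

```python
class FileServer:
    """Hypothetical in-memory server holding file contents by name."""

    def __init__(self) -> None:
        self.files: dict[str, bytearray] = {}

    # Upload/download model: whole files move between server and client.
    def download(self, name: str) -> bytes:
        return bytes(self.files[name])

    def upload(self, name: str, data: bytes) -> None:
        self.files[name] = bytearray(data)

    # Remote access model: the file stays at the server; clients touch byte ranges.
    def read(self, name: str, offset: int, count: int) -> bytes:
        return bytes(self.files[name][offset:offset + count])

    def write(self, name: str, offset: int, data: bytes) -> None:
        buf = self.files.setdefault(name, bytearray())
        if len(buf) < offset:
            buf.extend(b"\0" * (offset - len(buf)))  # fill the gap with zero bytes
        buf[offset:offset + len(data)] = data


srv = FileServer()
srv.upload("notes.txt", b"hello world")

# Upload/download: fetch the whole file, edit it locally, send it all back.
local = bytearray(srv.download("notes.txt"))
local[0:5] = b"HELLO"
srv.upload("notes.txt", bytes(local))

# Remote access: send only the bytes that change.
srv.write("notes.txt", 6, b"WORLD")
print(srv.read("notes.txt", 0, 11))  # b'HELLO WORLD'
```

Even in this toy, the upload/download path transfers all 11 bytes twice, while the remote write transfers only the 5 bytes that change.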
Directory Service
Provides operations for:
- creating and deleting directories
- naming and renaming files
- moving files from one directory to another
- entering, removing and looking up files in a directory
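A minimal sketch of such a directory service, assuming a flat table keyed by full directory path and opaque file identifiers (all names here are hypothetical):

```python
class DirectoryService:
    """Hypothetical directory service: a flat table keyed by full directory path."""

    def __init__(self) -> None:
        self.dirs: dict[str, dict[str, str]] = {"/": {}}  # path -> {name: file id}

    def mkdir(self, path: str) -> None:
        if path in self.dirs:
            raise FileExistsError(path)
        self.dirs[path] = {}

    def rmdir(self, path: str) -> None:
        if self.dirs[path]:
            raise OSError(f"directory not empty: {path}")
        del self.dirs[path]

    def enter(self, directory: str, name: str, file_id: str) -> None:
        self.dirs[directory][name] = file_id

    def remove(self, directory: str, name: str) -> None:
        del self.dirs[directory][name]

    def lookup(self, directory: str, name: str) -> str:
        return self.dirs[directory][name]

    def rename(self, directory: str, old: str, new: str) -> None:
        self.dirs[directory][new] = self.dirs[directory].pop(old)

    def move(self, src: str, dst: str, name: str) -> None:
        self.dirs[dst][name] = self.dirs[src].pop(name)


ds = DirectoryService()
ds.mkdir("/src")
ds.enter("/", "prog.c", "file-17")
ds.rename("/", "prog.c", "main.c")
ds.move("/", "/src", "main.c")
print(ds.lookup("/src", "main.c"))  # file-17
```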
Naming Transparency
Naming is the mapping between logical and physical objects.
- Ex: a user filename maps to <cylinder, sector>.
- In a conventional file system, it is understood where the file actually resides; the system and disk are known.
- In a transparent DFS, the location of a file, somewhere in the network, is hidden.
- File replication means multiple copies of a file; the mapping returns a SET of locations for the replicas.
Naming Transparency (cont.)
Location transparency: the path name gives no hint as to where the file (or other object) is located.
ex: /server1/dir1/x tells us that x is located on server1, but not where server1 is located -> server1 can move anywhere in the network without the path name having to change
Location independence: a file can be moved from one server to another without changing its path name.
Naming Schemes
- Machine + path naming, such as /machine/path
- Mounting a remote file system onto the local file hierarchy
- A single name space that looks the same on all machines
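The mounting scheme can be sketched as a mount table that maps local mount points to (server, exported directory) pairs; resolving a path picks the longest matching mount point and, internally, turns it back into machine + path form. The table contents and server names below are hypothetical:

```python
# Hypothetical mount table: local mount point -> (server, exported directory).
MOUNTS = {
    "/usr/local": ("server1", "/export/local"),
    "/home": ("server2", "/export/home"),
}


def resolve(path: str) -> tuple[str, str]:
    """Map a local path to (machine, path) via the longest matching mount point."""
    best = ""
    for point in MOUNTS:
        under = path == point or path.startswith(point + "/")
        if under and len(point) > len(best):
            best = point
    if not best:
        return ("localhost", path)  # not under any mount point: a local file
    server, export = MOUNTS[best]
    return (server, export + path[len(best):])


print(resolve("/home/ann/prog.c"))  # ('server2', '/export/home/ann/prog.c')
print(resolve("/etc/passwd"))       # ('localhost', '/etc/passwd')
```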
Two-level naming
Symbolic name (external), e.g. prog.c; binary name (internal), e.g. a local i-node number as in UNIX
Directories provide the translation from symbolic to binary names
Binary name formats:
i-node: no cross references among servers
(server, i-node): a directory on one server can refer to a file on a different server
{binary_name}: a set of binary names referring to the original file and all of its backups, any of which can be used on lookup
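A sketch of the (server, i-node) and set-of-binary-names formats, assuming the directory keeps an ordered list so the original is tried before its backups. The server names, i-node numbers and the `up` set of reachable servers are hypothetical:

```python
from typing import NamedTuple


class BinaryName(NamedTuple):
    server: str
    inode: int


# Hypothetical directory: symbolic name -> binary names (original first, then backups).
directory: dict[str, list[BinaryName]] = {
    "prog.c": [BinaryName("server1", 4711), BinaryName("server3", 102)],
}

up = {"server3"}  # hypothetical set of currently reachable servers


def lookup(symbolic: str) -> BinaryName:
    """Translate a symbolic name into the first binary name whose server is up."""
    for binary in directory[symbolic]:
        if binary.server in up:
            return binary
    raise FileNotFoundError(symbolic)


print(lookup("prog.c"))  # BinaryName(server='server3', inode=102)
```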
File Sharing Semantics
UNIX semantics: total ordering of R/W events; every read sees the effect of all previous writes
easy to achieve in a non-distributed system
in a distributed system with one server, multiple clients and no caching at the clients, total ordering is also easily achieved, since reads and writes are performed immediately at the server
Session semantics: writes are guaranteed to become visible to other processes only when the file is closed
if two or more clients write simultaneously, one file (the last one closed, or a nondeterministic choice) replaces the other
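Session semantics can be sketched by giving each open a private copy that is installed at the server only on close, so the last client to close wins. The classes below are a hypothetical illustration, not a real DFS interface:

```python
class SessionServer:
    """Hypothetical server with session semantics: changes become visible on close."""

    def __init__(self) -> None:
        self.files: dict[str, str] = {}

    def open(self, name: str) -> "Session":
        return Session(self, name, self.files.get(name, ""))


class Session:
    def __init__(self, server: SessionServer, name: str, data: str) -> None:
        self.server, self.name, self.data = server, name, data  # private copy

    def write(self, text: str) -> None:
        self.data += text  # visible only inside this session

    def read(self) -> str:
        return self.data

    def close(self) -> None:
        self.server.files[self.name] = self.data  # the last close wins


srv = SessionServer()
a, b = srv.open("f"), srv.open("f")
a.write("from A")
b.write("from B")
print(srv.files.get("f"))  # None: nothing is visible before close
a.close()
b.close()
print(srv.files["f"])  # from B
```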
File Sharing Semantics (cont.)
Immutable files: only create and read operations (no write)
writing a file means creating a new one and entering it into the directory, replacing the previous file with the same name; these are atomic operations
if two processes try to replace the same file at the same time, either the last copy wins or the outcome is nondeterministic
open question: what happens if a file is replaced while another process is busy reading it?
Transaction semantics: mutual exclusion on file accesses; either all file operations of a transaction are completed or none is. Good for banking systems
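A minimal sketch of transaction semantics in the banking setting, modeling each "file" as an account balance (an assumption made for brevity). A lock provides mutual exclusion, and a snapshot lets a failed transaction roll back, so either all updates happen or none do:

```python
import threading


class TransactionalStore:
    """Hypothetical store where each 'file' is an account balance."""

    def __init__(self, files: dict[str, int]) -> None:
        self.files = dict(files)
        self.lock = threading.Lock()  # mutual exclusion on file accesses

    def transaction(self, ops: list[tuple[str, int]]) -> bool:
        """Apply all (name, delta) updates, or none if any balance would go negative."""
        with self.lock:
            snapshot = dict(self.files)  # saved state for rollback
            for name, delta in ops:
                new_value = self.files.get(name, 0) + delta
                if new_value < 0:
                    self.files = snapshot  # undo the updates already applied
                    return False
                self.files[name] = new_value
            return True


store = TransactionalStore({"alice": 100, "bob": 0})
print(store.transaction([("alice", -30), ("bob", 30)]))    # True
print(store.transaction([("bob", 200), ("alice", -200)]))  # False: bob's credit is undone
print(store.files)  # {'alice': 70, 'bob': 30}
```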
II. DFS Implementation
File usage
- Measurements
- File usage pattern (observed in a study by Satyanarayanan)
System structure
- File-server and directory-server organization
- Special attention to alternative approaches
File usage- Measurements
- Static measurements:
* Represent a snapshot of the system at a certain
instant.
* Made by examining the disk to see what is on it.
- Dynamic measurements:
* Modifying the file system to record all operations to a
log for subsequent analysis
File usage- Measurements
- Static measurements:
The distribution of file sizes.
The distribution of file types.
The amount of storage occupied by files of various
types and size.
- Dynamic measurements:
The relative frequency of various operations
The number of files open at any moment
The amount of sharing that takes place
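Given a log produced by dynamic measurement, the first two of these quantities can be computed directly. The trace format (operation, file) is an assumption made for this sketch:

```python
from collections import Counter

# Hypothetical trace of (operation, file) records written by an instrumented file system.
trace = [
    ("open", "a"), ("read", "a"), ("open", "b"), ("read", "b"),
    ("read", "a"), ("write", "b"), ("close", "a"), ("close", "b"),
]


def analyze(records):
    """Return the relative frequency of each operation and the peak number of open files."""
    counts = Counter(op for op, _ in records)
    freq = {op: n / len(records) for op, n in counts.items()}
    open_now, peak = set(), 0
    for op, name in records:
        if op == "open":
            open_now.add(name)
        elif op == "close":
            open_now.discard(name)
        peak = max(peak, len(open_now))
    return freq, peak


freq, peak = analyze(trace)
print(freq)  # {'open': 0.25, 'read': 0.375, 'write': 0.125, 'close': 0.25}
print(peak)  # 2
```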
File Usage- Measurement Problems
- How typical is the observed user population?
Satyanarayanan's measurements were made at a university. Do they also apply to an industrial research lab, an office automation project or a banking system?
- Watch out for artifacts of the system being measured
Ex: in a distribution of file names on an MS-DOS system, names are never longer than 8 characters (plus an optional three-character extension), a limit imposed by the system rather than chosen by users
- The measurements were made on more-or-less traditional UNIX systems. It is not clear whether they can be transferred or extrapolated to distributed systems
File Usage- File Usage Pattern
Observed in a study by Satyanarayanan (1981)
- Most files are small (< 10K)
- Reading is much more frequent than writing
- Most R&W accesses are sequential (random access is rare)
- Most files have a short lifetime -> create the file on the
client
- File sharing is unusual -> caching at client
- The average process uses only a few files
Server System Structure
Are client and server different?
- In some systems, all machines run the same basic software
-> any machine can offer file service to the public; offering file service just means exporting the names of selected directories so that other machines can access them.
- In other systems, the file server and directory server are just user programs
-> a system can be configured to run client and server software on the same machines or not
Server System Structure
Are client and server different?
- At the other extreme are systems in which clients and servers are different machines.
Server System Structure
File + directory service: combined or not ?
- Combine the file service and directory service into a single server that handles all the directory and file calls.
- Keep the file service and directory service separate:
The directory server maps a symbolic name onto its binary name.
The file server uses the binary name to read or write the file.
Server System Structure
Separating File + directory service
-Advantage
Produces simpler software
-Disadvantage
Requires more communication
Server System Structure
Separating File + directory service Example: Look-up a/b/c
The client sends a symbolic name to the directory server
-> which returns the binary name understood by the file server
The directory hierarchy may be partitioned among multiple servers:
- The 1st directory, on server 1, contains an entry a for another directory on server 2.
- The 2nd directory, on server 2, contains an entry b for another directory on server 3.
- The 3rd directory, on server 3, contains an entry c for a file.
- The file is then accessed with its binary name.
Server System Structure
Separating File + directory service Example: Look-up a/b/c
- The client sends a message to server 1
- Server 1 finds a and sees that its binary name refers to another server -> (1) tell the client which server holds b
• Requires the client to know which server holds which directory -> requires more messages.
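The iterative scheme, where the client itself contacts each server in turn, can be sketched as follows. The server table is hypothetical, each server is assumed to hold just the one directory needed for the example, and each contact is counted as a request plus a reply:

```python
# Hypothetical partitioned hierarchy: each server maps a name to either
# ("server", next_server) for a subdirectory held elsewhere, or ("file", binary_name).
SERVERS = {
    "server1": {"a": ("server", "server2")},
    "server2": {"b": ("server", "server3")},
    "server3": {"c": ("file", 4711)},
}


def iterative_lookup(path: str, first: str = "server1") -> tuple[int, int]:
    """The client resolves each component itself; returns (binary name, messages)."""
    server, messages = first, 0
    for component in path.split("/"):
        messages += 2  # request from the client plus the server's reply
        kind, value = SERVERS[server][component]
        if kind == "file":
            return value, messages
        server = value  # the reply tells the client which server to ask next
    raise IsADirectoryError(path)


print(iterative_lookup("a/b/c"))  # (4711, 6)
```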
Server System Structure
Separating File + directory service Example: Look-up a/b/c
- The client sends a message to server 1
- Server 1 finds a and sees that its binary name refers to another server -> (2) forward the remainder of the request to server 2.
• Efficient
• Cannot use RPC (Remote Procedure Call), because the process to which the client sends the message is not the one that sends the reply
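The forwarding scheme on the same hypothetical server table: each server resolves one component and passes the rest along, and the last server replies to the client directly. Counting messages shows why it is more efficient, and the reply coming from server 3 instead of server 1 is exactly what breaks the plain RPC request/reply pattern.

```python
# Same hypothetical partitioned hierarchy as in the iterative case.
SERVERS = {
    "server1": {"a": ("server", "server2")},
    "server2": {"b": ("server", "server3")},
    "server3": {"c": ("file", 4711)},
}


def forwarded_lookup(path: str, server: str = "server1", messages: int = 1) -> tuple[int, int]:
    """Each server resolves one component and forwards the rest.

    `messages` starts at 1 for the client's request to the first server.
    """
    head, _, rest = path.partition("/")
    kind, value = SERVERS[server][head]
    if kind == "file":
        return value, messages + 1  # the last server replies to the client directly
    return forwarded_lookup(rest, value, messages + 1)  # one forwarded message per hop


print(forwarded_lookup("a/b/c"))  # (4711, 4)
```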
Server System Structure
Separating File + directory service
Problem
Path name lookup, especially with multiple directory servers, can be expensive.
Cache directory hints at the client to accelerate path name lookup; the directories and the hints must be kept coherent
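A sketch of a client-side hint cache, assuming a hypothetical `AUTHORITATIVE` table that stands in for the directory servers. A hint is used only after the hinted server confirms it still holds the directory; a stale hint costs one extra lookup but never a wrong answer.

```python
# Hypothetical ground truth: which directory server currently holds each directory.
AUTHORITATIVE = {"/a": "server1", "/a/b": "server2"}


class HintCache:
    """Client-side cache of directory -> server hints, which may become stale."""

    def __init__(self) -> None:
        self.hints: dict[str, str] = {}
        self.queries = 0  # full lookups that had to go to the directory servers

    def locate(self, directory: str) -> str:
        hinted = self.hints.get(directory)
        # Simulates contacting the hinted server, which confirms or rejects the hint;
        # that contact is needed for the real request anyway.
        if hinted is not None and AUTHORITATIVE.get(directory) == hinted:
            return hinted
        self.queries += 1  # miss or stale hint: do the expensive lookup
        server = AUTHORITATIVE[directory]
        self.hints[directory] = server
        return server


cache = HintCache()
cache.locate("/a/b")
cache.locate("/a/b")               # served from the hint
AUTHORITATIVE["/a/b"] = "server4"  # the directory moves to another server
print(cache.locate("/a/b"), cache.queries)  # server4 2
```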
Server System Structure
Another question
Should file, directory and other servers keep state information about clients?
- Yes: stateful server.
- No: stateless server.
Server System Structure
Stateless vs. Stateful
Stateless servers:
- requests are self-contained
- better fault tolerance
- open/close at the client (fewer messages)
- no space reserved for tables, thus no limit on open files
- no problem if a client crashes
Stateful servers:
- shorter messages
- better performance (info kept in memory until close)
- open/close at the server
- file locking possible
- read-ahead possible
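The fault-tolerance difference can be seen in a small sketch. Both servers below are hypothetical. The stateful one keeps an open-file table with the current position, so its read requests are shorter, but a crash wipes that table. The stateless one needs the file name and offset in every request, yet loses nothing when it crashes.

```python
class StatefulServer:
    """Keeps an open-file table: file descriptor -> [name, current position]."""

    def __init__(self, files: dict[str, bytes]) -> None:
        self.files, self.table, self.next_fd = files, {}, 0

    def open(self, name: str) -> int:
        self.next_fd += 1
        self.table[self.next_fd] = [name, 0]
        return self.next_fd

    def read(self, fd: int, count: int) -> bytes:  # short request: fd and count
        name, pos = self.table[fd]
        self.table[fd][1] = pos + count  # the server advances the position
        return self.files[name][pos:pos + count]

    def crash(self) -> None:
        self.table.clear()  # the in-memory table does not survive a crash


class StatelessServer:
    """Every request is self-contained: name, offset and count."""

    def __init__(self, files: dict[str, bytes]) -> None:
        self.files = files

    def read(self, name: str, offset: int, count: int) -> bytes:
        return self.files[name][offset:offset + count]

    def crash(self) -> None:
        pass  # nothing to lose


files = {"f": b"abcdefgh"}
stateful, stateless = StatefulServer(files), StatelessServer(files)

fd = stateful.open("f")
print(stateful.read(fd, 4))  # b'abcd'
stateful.crash()
print(fd in stateful.table)  # False: the client's next read(fd, 4) would fail

print(stateless.read("f", 0, 4))  # b'abcd'
stateless.crash()
print(stateless.read("f", 4, 4))  # b'efgh'
```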
Caching
Definition: A cache is a block of memory for temporary
storage of data likely to be used again.
Figure: main memory holds blocks at index 0: xyz, 1: pdq, 2: abc, 3: ght; a two-entry cache holds index 0: tag 2, data abc and index 1: tag 0, data xyz (the tag records which main-memory block each cache entry holds).
Caching
There are four potential places to store files, or parts of files:
- The server's disk.
- The server's main memory.
- The client's disk.
- The client's main memory.
These storage locations all have different properties.
Caching: storing all files on the server's disk
Advantages:
- Plenty of space.
- The files are accessible to all clients.
- There is only one copy of each file -> no consistency problems arise.
Problem:
- Performance: each file must be transferred from the server's disk to the server's main memory, and then again over the network to the client's main memory.
Caching files in the server's main memory
Advantages:
- Eliminates the disk transfer on a cache hit.
- Easy for the server to keep its memory and disk copies synchronized.
Problems:
- The network transfer still has to be done.
- What unit does the cache manage (whole files or disk blocks)?
- What to do when the cache fills up and something must be evicted (one algorithm: LRU).
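A sketch of a server block cache with LRU eviction, assuming the cache unit is the disk block and the disk is modeled as a dictionary of block numbers (both assumptions). `OrderedDict.move_to_end` and `popitem(last=False)` keep the least recently used block at the front:

```python
from collections import OrderedDict


class BlockCache:
    """Server main-memory cache of disk blocks with LRU eviction."""

    def __init__(self, disk: dict[int, bytes], capacity: int) -> None:
        self.disk, self.capacity = disk, capacity
        self.blocks: OrderedDict[int, bytes] = OrderedDict()
        self.disk_reads = 0

    def read(self, block_no: int) -> bytes:
        if block_no in self.blocks:
            self.blocks.move_to_end(block_no)  # now the most recently used
            return self.blocks[block_no]
        self.disk_reads += 1  # miss: fetch the block from the disk
        data = self.disk[block_no]
        self.blocks[block_no] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        return data


disk = {n: bytes([n]) * 4 for n in range(10)}
cache = BlockCache(disk, capacity=2)
for block_no in [1, 2, 1, 3, 2]:
    cache.read(block_no)
print(cache.disk_reads, list(cache.blocks))  # 4 [3, 2]
```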
Caching on the client's disk (if available):
-The disk holds more but is slower.
- If large amounts of data are being used, a client disk
cache may be better.
- This method isn’t used in practice.
- In any event, most systems that do client caching do
it in the client's main memory.
Cache in the client's main memory:
There are three options for where to put the cached files:
- Inside each process's address space: no sharing at the client; effective only if individual processes open and close files repeatedly
- In the kernel: the kernel is involved even on cache hits; a kernel call is needed in all cases
- In a separate user-level cache manager: flexible, and efficient if paging can be controlled from user level
Cache Consistency.
- Two clients simultaneously read the same file and then both modify it.
- When both copies are written back to the server, the one written last overwrites the other.
- So client caching has to be thought out fairly carefully.
- There are several ways to solve the consistency problem:
- Write-through; delayed write; write-on-close; centralized control
Cache Consistency - Write-through algorithm
- When a cache entry (file or block) is modified, the new value is kept in the cache but is also sent immediately to the server
-> high traffic; cache managers must also check the file's modification time with the server before providing cached content to any client, since another client may have changed the file
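A minimal write-through sketch, assuming a per-file version counter stands in for the modification time: every write goes straight to the server, and every read first checks the cached version against the server's. The classes are hypothetical:

```python
class Server:
    """Hypothetical file server; a version counter stands in for the modification time."""

    def __init__(self) -> None:
        self.data: dict[str, str] = {}
        self.version: dict[str, int] = {}

    def write(self, name: str, value: str) -> None:
        self.data[name] = value
        self.version[name] = self.version.get(name, 0) + 1

    def fetch(self, name: str) -> tuple[str, int]:
        return self.data[name], self.version[name]


class WriteThroughCache:
    def __init__(self, server: Server) -> None:
        self.server = server
        self.cache: dict[str, tuple[str, int]] = {}  # name -> (value, version)

    def write(self, name: str, value: str) -> None:
        self.server.write(name, value)  # sent to the server immediately
        self.cache[name] = (value, self.server.version[name])

    def read(self, name: str) -> str:
        cached = self.cache.get(name)
        # Validate with the server before using the cached copy.
        if cached is not None and cached[1] == self.server.version[name]:
            return cached[0]
        self.cache[name] = self.server.fetch(name)  # missing or stale: refetch
        return self.cache[name][0]


srv = Server()
c1, c2 = WriteThroughCache(srv), WriteThroughCache(srv)
c1.write("f", "v1")
print(c2.read("f"))  # v1
c1.write("f", "v2")
print(c2.read("f"))  # v2: the version check exposed the stale copy
```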
Cache Consistency -Delayed write
- Delayed write: coalesces multiple writes; better performance but ambiguous semantics.
* the client just makes a note that a file has been updated. Once every 30 seconds or so, all the file updates are gathered together and sent to the server all at once.
* a short-lived file (e.g., written, read back and deleted) may go through its entire sequence before it is time to send the modified files back to the server, so it never has to be sent at all
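A delayed-write sketch, assuming the server is modeled as a dictionary and `flush` is what a periodic timer would call. Repeated writes to one file coalesce into a single message, and a temporary file created and deleted between flushes never reaches the server:

```python
from typing import Dict, Optional


class DelayedWriteCache:
    """Buffers dirty files and sends them to the server in periodic batches."""

    def __init__(self, server: Dict[str, str]) -> None:
        self.server = server
        self.dirty: Dict[str, Optional[str]] = {}  # name -> latest value, None = delete
        self.messages = 0

    def write(self, name: str, value: str) -> None:
        self.dirty[name] = value  # a later write replaces an earlier one

    def delete(self, name: str) -> None:
        if name in self.server:
            self.dirty[name] = None  # the server's copy must go at the next flush
        else:
            self.dirty.pop(name, None)  # never reached the server: just forget it

    def flush(self) -> None:
        """What a periodic timer (e.g. every 30 seconds) would call."""
        for name, value in self.dirty.items():
            self.messages += 1
            if value is None:
                self.server.pop(name, None)
            else:
                self.server[name] = value
        self.dirty.clear()


server: Dict[str, str] = {}
cache = DelayedWriteCache(server)
cache.write("report", "draft 1")
cache.write("report", "draft 2")
cache.write("tmp", "scratch")
cache.delete("tmp")
print(server)  # {}: other clients see nothing yet
cache.flush()
print(server, cache.messages)  # {'report': 'draft 2'} 1
```

The empty `server` before the flush is the "ambiguous semantics" from the slide: other clients keep seeing old data until the timer fires.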
Cache Consistency -Write-on-close
- Write-on-close: implements session semantics; a file is written back to the server only after it has been closed.
Cache Consistency -Central control
- Central control: the file server keeps a directory of which files are open/cached at which clients -> UNIX semantics, but problems with robustness and scalability; invalidation messages are also a problem, because clients did not solicit them
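A sketch of central control with hypothetical classes: the server records which clients cache each file and, on a write, sends invalidations that those clients never asked for. For brevity, the sketch tracks only cached copies, not the read/write mode of each open:

```python
class CentralServer:
    """Tracks which clients cache each file and invalidates them on writes."""

    def __init__(self) -> None:
        self.data: dict[str, str] = {}
        self.cachers: dict[str, set] = {}  # file -> clients holding a cached copy

    def open_read(self, client: "Client", name: str) -> str:
        self.cachers.setdefault(name, set()).add(client)
        return self.data[name]

    def write(self, writer: "Client", name: str, value: str) -> None:
        self.data[name] = value
        for client in self.cachers.get(name, set()) - {writer}:
            client.invalidate(name)  # an unsolicited message to that client
        self.cachers[name] = {writer}


class Client:
    def __init__(self, server: CentralServer) -> None:
        self.server = server
        self.cache: dict[str, str] = {}

    def read(self, name: str) -> str:
        if name not in self.cache:
            self.cache[name] = self.server.open_read(self, name)
        return self.cache[name]

    def write(self, name: str, value: str) -> None:
        self.server.write(self, name, value)
        self.cache[name] = value

    def invalidate(self, name: str) -> None:
        self.cache.pop(name, None)


srv = CentralServer()
srv.data["f"] = "v1"
a, b = Client(srv), Client(srv)
a.read("f")
b.read("f")
a.write("f", "v2")  # b's copy is invalidated
print(b.read("f"))  # v2
```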
Replication:
- Multiple copies of selected files, for three reasons:
1. To increase reliability by having independent backups of each file.
2. To allow file access to occur even if one file server is down. A server crash should not bring the entire system down until the server can be rebooted.
3. To split the workload over multiple servers. By having files replicated on two or more servers, the least heavily loaded one can be used.
Replication transparency
- explicit file replication: the programmer controls replication
- lazy file replication: copies are made by the server in the background
- group communication: all copies are made at the same time, in the foreground
Replication-Update protocols:
Updating all replicas through a coordinator works but is not robust (if the coordinator is down, no updates can be performed) => Voting: updates (and reads) can be performed if some specified number of servers agree.
Voting Protocol:
A version number (incremented on each write) is associated with each file
To perform a read, a client has to assemble a read quorum of Nr servers; similarly, a write quorum of Nw servers for a write
If Nr + Nw > N, where N is the total number of servers holding the file, then any read quorum will contain at least one copy of the most recently updated file version
For reading, the client contacts Nr active servers and chooses the copy with the largest version number
For writing, the client contacts Nw active servers asking them to write. The write succeeds if they all say yes.
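A sketch of the voting protocol, assuming in-memory replicas with an `up` flag to simulate failures and replicas that always say yes. The write first reads a quorum to learn the current version number, then installs version + 1 on Nw replicas; the check enforces Nr + Nw > N:

```python
class Replica:
    def __init__(self) -> None:
        self.version, self.data, self.up = 0, None, True


def quorum_read(replicas: list, nr: int):
    """Contact Nr active replicas and return (data, version) of the newest copy."""
    active = [r for r in replicas if r.up]
    if len(active) < nr:
        raise RuntimeError("cannot assemble a read quorum")
    newest = max(active[:nr], key=lambda r: r.version)
    return newest.data, newest.version


def quorum_write(replicas: list, nr: int, nw: int, data) -> None:
    """Install a new version on Nw active replicas."""
    if nr + nw <= len(replicas):
        raise ValueError("need Nr + Nw > N")
    active = [r for r in replicas if r.up]
    if len(active) < nw:
        raise RuntimeError("cannot assemble a write quorum")
    _, current = quorum_read(replicas, nr)  # learn the latest version number
    for r in active[:nw]:
        r.version, r.data = current + 1, data


replicas = [Replica() for _ in range(5)]  # N = 5
quorum_write(replicas, nr=2, nw=4, data="v1")  # replica 4 keeps version 0
for r in replicas[:3]:
    r.up = False  # only replicas 3 (new) and 4 (stale) remain
print(quorum_read(replicas, nr=2))  # ('v1', 1)
```

With Nr = 2 and Nw = 4, the surviving read quorum still contains one up-to-date replica, so the stale copy on replica 4 is never returned.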
Replication-Update protocols:
Nr is usually small (reads are frequent), but Nw is usually close to N (to make sure all replicas are updated). Problem: achieving a write quorum in the presence of server failures
Voting with ghosts: allows a write quorum to be established when several servers are down, by temporarily creating dummy (ghost) servers (at least one member of the quorum must be real)
Ghost servers are not permitted in a read quorum (they don't have any files)
When a server comes back, it must first restore its copy by obtaining a read quorum
III. Network File System (NFS)
Three aspects of NFS:
- The architecture
- The protocol
- The implementation
NFS Architecture
Basic idea of NFS: an arbitrary collection of clients and servers.
Servers export one or more directories for access by remote clients.
The list of exported directories is maintained in the /etc/exports file.
NFS Architecture
Clients access exported directories by mounting them.
Diskless clients can mount remote directories anywhere, including on their root directory.
Programs running on a client see no difference between a file located on a remote server and one located on the local disk.
So the basic architectural characteristic of NFS is that servers export directories and clients mount them remotely.
NFS Protocol
The goal of NFS is to support heterogeneous systems.
To accomplish that, it defines two client-server protocols.
The first NFS protocol handles mounting.
The second NFS protocol is for directory and file access.
NFS Protocol: Mounting
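A client sends a path name to a server and requests permission to mount that directory.
If the request is legal, the server returns a file handle to the client; otherwise it returns an error.
The file handle contains fields that uniquely identify the file system type, the disk, the i-node number of the directory, and security information.
Many clients mount remote directories automatically at boot time, from the /etc/rc script, without manual intervention.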
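A sketch of the mount step, assuming a hypothetical export list keyed by directory and a simplified handle with only file-system and i-node fields (real NFS file handles are opaque to the client and carry more information):

```python
from typing import NamedTuple


class FileHandle(NamedTuple):
    """Hypothetical, simplified handle fields."""
    fs_id: int  # which exported file system (a single one in this sketch)
    inode: int  # i-node number of the directory


class MountServer:
    def __init__(self, exports: dict[str, set[str]], inodes: dict[str, int]) -> None:
        self.exports = exports  # exported directory -> clients allowed to mount it
        self.inodes = inodes    # path -> i-node number (a stand-in for the disk)

    def mount(self, client: str, path: str) -> FileHandle:
        allowed = self.exports.get(path)
        if allowed is None or client not in allowed:
            raise PermissionError(f"{path} is not exported to {client}")
        return FileHandle(fs_id=1, inode=self.inodes[path])


srv = MountServer({"/export/home": {"client1"}}, {"/export/home": 2})
print(srv.mount("client1", "/export/home"))  # FileHandle(fs_id=1, inode=2)
```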
NFS Protocol: Automounting
Allows a set of remote directories to be associated with a local directory.
None of them is mounted at boot; the first time a remote file is opened, the client sends a message to each of the servers, and the first one to reply wins.
Advantages:
- If a server is down, the client can still be brought up.
- Allows the client to try a set of servers in parallel.
Also, automounting is most often used for read-only files that rarely change.
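A sketch of the first-reply-wins rule using Python threads, where `contact` is a hypothetical stand-in for a mount request and the delays and failures are simulated. Note that leaving the `with` block still waits for the slower requests to finish:

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def contact(server: str, delay: float, alive: bool) -> str:
    """Hypothetical mount request; the sleep simulates network and server latency."""
    time.sleep(delay)
    if not alive:
        raise ConnectionError(server)
    return server


def automount(candidates: list[tuple[str, float, bool]]) -> str:
    """Ask every server in the set at once; the first successful reply wins."""
    with ThreadPoolExecutor(max_workers=len(candidates)) as pool:
        futures = [pool.submit(contact, *c) for c in candidates]
        for future in as_completed(futures):
            if future.exception() is None:
                return future.result()
    raise RuntimeError("no server answered")


print(automount([("server1", 0.2, True), ("server2", 0.0, False), ("server3", 0.05, True)]))
# server3
```

Here server2 fails at once and server1 is slow, so server3 answers first; a down server only costs its own thread, not the whole mount.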
NFS Protocol: Accessing
Clients send messages to the server to manipulate directories and to read and write files.
Most UNIX system calls are supported by NFS, except OPEN and CLOSE.
To access a file, a client first sends a LOOKUP message with the file name and receives a file handle.
To READ or WRITE, the client then only needs the file handle, the offset and the number of bytes desired (plus the data, for a write).
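A stateless, NFS-style sketch with hypothetical method names: LOOKUP turns a name into a handle, and every READ or WRITE carries the handle, the offset and the byte count or data, so the server keeps no open-file table:

```python
class NFSLikeServer:
    """Stateless sketch: handles name files, and no open-file table is kept."""

    def __init__(self) -> None:
        self.by_handle: dict[int, bytearray] = {}
        self.names: dict[str, int] = {}

    def create(self, name: str, data: bytes) -> None:  # setup helper for the example
        handle = len(self.by_handle) + 1
        self.by_handle[handle] = bytearray(data)
        self.names[name] = handle

    def lookup(self, name: str) -> int:
        return self.names[name]  # LOOKUP: name -> file handle

    def read(self, handle: int, offset: int, count: int) -> bytes:
        return bytes(self.by_handle[handle][offset:offset + count])

    def write(self, handle: int, offset: int, data: bytes) -> int:
        buf = self.by_handle[handle]
        if len(buf) < offset:
            buf.extend(b"\0" * (offset - len(buf)))  # fill any gap with zero bytes
        buf[offset:offset + len(data)] = data
        return len(data)


srv = NFSLikeServer()
srv.create("motd", b"hello, world")
h = srv.lookup("motd")
srv.write(h, 7, b"NFS!!")
print(srv.read(h, 0, 12))  # b'hello, NFS!!'
```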
NFS Protocol: Accessing
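Advantages
Servers don't remember any information between calls; there are no open connections
Stateless: when a server crashes and recovers, no information about open files is lost, because there isn't any
In contrast, a stateful server loses its table of open files in a crash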
NFS Protocol: Security
Problem: with a stateless server, locks cannot be associated with open files
NFS uses the UNIX protection mechanism with “rwx” bits
Alternatively, public-key cryptography can be used
Information about all of the keys is maintained by NIS (Network Information Service)
NIS's function is to store (key, value) pairs, such as mappings from user names to passwords and from machine names to network addresses
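The classic UNIX rwx check can be sketched with the standard `stat` module constants. The function below is a simplified illustration that ignores the superuser and access-control lists; in the basic NFS scheme, the user and group IDs used in the check come from the client's request.

```python
import stat

# Permission bits for each class of user, from the standard stat module.
BITS = {
    "owner": {"r": stat.S_IRUSR, "w": stat.S_IWUSR, "x": stat.S_IXUSR},
    "group": {"r": stat.S_IRGRP, "w": stat.S_IWGRP, "x": stat.S_IXGRP},
    "other": {"r": stat.S_IROTH, "w": stat.S_IWOTH, "x": stat.S_IXOTH},
}


def may_access(mode: int, owner: int, group: int,
               uid: int, gids: set[int], want: str) -> bool:
    """Pick the owner, group or other bits for this user, then test each wanted right."""
    if uid == owner:
        cls = "owner"
    elif group in gids:
        cls = "group"
    else:
        cls = "other"
    return all(mode & BITS[cls][right] for right in want)


# A file with mode rw-r----- owned by uid 1000, group 50.
print(may_access(0o640, 1000, 50, uid=1000, gids={50}, want="rw"))  # True
print(may_access(0o640, 1000, 50, uid=2000, gids={50}, want="w"))   # False
print(may_access(0o640, 1000, 50, uid=3000, gids=set(), want="r"))  # False
```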
NFS Implementation
System call layer
This handles calls like OPEN, READ and CLOSE.
Virtual file system (VFS) layer
Maintains a table with one entry for each open file
Each entry is a v-node (virtual i-node)
NFS Implementation: Using v-nodes
Mount
The system administrator calls the mount program
which makes a MOUNT system call
The kernel asks the NFS client code to create an r-node (remote i-node) in its internal tables to hold the file handle
The v-node points to the r-node
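The v-node/r-node relationship can be sketched with dataclasses. The types and the `mount_remote` helper are hypothetical simplifications: a v-node points either to a local i-node or to an r-node holding the server name and file handle.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Inode:  # a local file-system object
    number: int


@dataclass
class Rnode:  # a remote file: holds the NFS file handle
    server: str
    handle: int


@dataclass
class Vnode:  # VFS layer: points to either an i-node or an r-node
    target: Union[Inode, Rnode]

    def is_remote(self) -> bool:
        return isinstance(self.target, Rnode)


class VFS:
    def __init__(self) -> None:
        self.mounts: Dict[str, Vnode] = {}  # mount point -> v-node

    def mount_remote(self, point: str, server: str, handle: int) -> Vnode:
        """Roughly the MOUNT path: the NFS client makes an r-node, VFS a v-node for it."""
        vnode = Vnode(Rnode(server, handle))
        self.mounts[point] = vnode
        return vnode


vfs = VFS()
v = vfs.mount_remote("/mnt/home", "server2", handle=42)
print(v.is_remote(), Vnode(Inode(7)).is_remote())  # True False
print(vfs.mounts["/mnt/home"].target)  # Rnode(server='server2', handle=42)
```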
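NFS Implementation: Using v-nodes
OPEN
At some point while parsing the path name, the kernel reaches the directory on which the remote file system is mounted
The kernel asks the NFS client code to OPEN the file
The NFS client code looks up the remaining part of the path on the remote server, records the file in its tables (an r-node) and reports back to the VFS layer
The VFS layer puts in its table a v-node that points to the r-node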
NFS Implementation: Using v-nodes
READ
The caller is given a file descriptor for the remote file
The VFS layer locates the corresponding v-node
Transfers between client and server are made in large chunks, normally 8192 bytes
Caching is used to improve performance
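The effect of large transfers can be sketched by splitting a client read into 8192-byte requests. Aligning requests to chunk boundaries is an assumption of this sketch (it lets cached chunks be reused), not a documented NFS rule:

```python
CHUNK = 8192  # transfer size stated on the slide


def chunk_requests(offset: int, length: int, chunk: int = CHUNK) -> list[tuple[int, int]]:
    """Split a read of `length` bytes at `offset` into chunk-aligned (offset, count) requests."""
    requests = []
    start = (offset // chunk) * chunk  # round down to a chunk boundary
    end = offset + length
    while start < end:
        requests.append((start, chunk))
        start += chunk
    return requests


print(chunk_requests(100, 10000))  # [(0, 8192), (8192, 8192)]
print(chunk_requests(8192, 1))     # [(8192, 8192)]
```

A 10,000-byte read thus costs two server requests instead of many small ones, and the second chunk's extra bytes are available in the cache for the next sequential read.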
IV. Trends in Distributed File Systems
Some problems that are driving changes in file systems:
New Hardware
Scalability
WAN
Mobile Users
Fault Tolerance
Multimedia
New Hardware
Well-designed hardware can help solve these problems:
Scalability
Distributed file systems are growing larger. Old algorithms may not scale and may become bottlenecks.
A general way to solve this problem is to partition the system into smaller units that are relatively independent.
WAN
Most current work on distributed systems focuses on LAN-based systems, but these will be interconnected to form transparent distributed systems covering countries and continents. So what kind of file system would be needed to serve the whole world?
A larger system leads to greater heterogeneity; for example, what format should be used for files containing floating-point numbers?
Mobile Users
Laptops, pocket PCs and smartphones can be found everywhere these days, and they are multiplying like rabbits.
However, their network connection may be poor or intermittent.
The solution is based on caching.
Remote control
Fault Tolerance
If a system goes down for an hour, many serious problems can result, so the demand for systems that essentially never fail will grow.
File replication will become an essential requirement.
Multimedia
Real-time conferencing, video on demand and other multimedia applications will need completely different file systems.