Distributed File Systems: RPC, NFS, and AFS
Announcements
• Homework 6 available later tonight
– Due next Tuesday, December 2nd
• See me after class to pick up prelim
• Upcoming Agenda
– No class on Thursday—Happy Thanksgiving!
– Next week is the last week of classes—December 2nd and 4th
– Final—Thursday, December 18th at 2pm
• Room 131 Warren Hall
• Length is 2hrs
2
Goals for Today
• Distributed file systems (DFS)
• Network file system (NFS)
– Remote Procedure Calls (RPC)
• Andrew file system (AFS)
3
Distributed File Systems (DFS)
4
Distributed File Systems
• Goal: view a distributed system as a file system
– Storage is distributed
– Web tries to make world a collection of hyperlinked documents
• Issues not common to usual file systems
– Naming transparency
– Load balancing
– Scalability
– Location and network transparency
– Fault tolerance
• We will look at some of these today
5
Transfer Model
• Upload/download Model:
– Client downloads file, works on it, and writes it back on server
– Simple and good performance
• Remote Access Model:
– File only on server; client sends commands to get work done
6
Naming transparency
• Naming is a mapping from logical to physical objects
• Ideally client interface should be transparent
– Not distinguish between remote and local files
– /machine/path or mounting remote FS in local hierarchy are not
transparent
• A transparent DFS hides the location of files in system
• 2 forms of transparency:
– Location transparency: path gives no hint of file location
• /server1/dir1/dir2/x tells x is on server1, but not where server1 is
– Location independence: move files without changing names
• Separate naming hierarchy from storage devices hierarchy
7
File Sharing Semantics
• Sequential consistency: reads see previous writes
– Ordering on all system calls seen by all processors
– Maintained in single processor systems
– Can be achieved in DFS with one file server and no caching
8
Caching
• Keep repeatedly accessed blocks in cache
– Improves performance of further accesses
• How it works:
– If needed block not in cache, it is fetched and cached
– Accesses performed on local copy
– One master file copy on server, other copies distributed in DFS
– Cache consistency problem: how to keep cached copy consistent with master file copy (a small sketch follows this slide)
• Where to cache?
– Disk: Pros: more reliable, data present locally on recovery
– Memory: Pros: works for diskless workstations, quicker data access
– Servers maintain their caches in memory
9
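To make the fetch-on-miss scheme above concrete, here is a minimal Python sketch. The server block store, the block numbers, and the BlockCache class are all invented for illustration; the point is that accesses hit the local copy while the master copy on the server can drift out of date, which is exactly the cache consistency problem.

# A minimal sketch (not from the slides) of the fetch-on-miss scheme above.
SERVER_BLOCKS = {0: b"master block 0", 1: b"master block 1"}   # master copies on the server

class BlockCache:
    def __init__(self):
        self.blocks = {}                     # block number -> locally cached copy

    def read(self, n):
        if n not in self.blocks:             # miss: fetch the block and cache it
            self.blocks[n] = SERVER_BLOCKS[n]
        return self.blocks[n]                # hit: serve the access from the local copy

    def write(self, n, data):
        self.blocks[n] = data                # local copy now differs from the master copy

cache = BlockCache()
print(cache.read(0))                         # fetched from the "server", then cached
cache.write(0, b"locally modified")
print(cache.read(0), SERVER_BLOCKS[0])       # cached copy vs. now-stale master copy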
File Sharing Semantics
• Other approaches:
– Write through caches:
• immediately propagate changes in cache files to server
• Reliable but poor performance
– Delayed write:
• Writes are not propagated immediately; they are sent later (e.g., on file close)
• Session semantics (AFS): write file back on close
• Alternative (NFS): scan cache periodically and flush modified blocks
• Better performance but poor reliability
– File Locking:
• The upload/download model locks a downloaded file
• Other processes wait for file lock to be released
10
Network File System (NFS)
11
Network File System (NFS)
• Developed by Sun Microsystems in 1984
– Used to join FSes on multiple computers as one logical whole
• Used commonly today with UNIX systems
• Assumptions
– Allows an arbitrary collection of users to share a file system
– Clients and servers might be on different LANs
– Machines can be clients and servers at the same time
• Architecture:
– A server exports one or more of its directories to remote clients
– Clients access exported directories by mounting them
• The contents are then accessed as if they were local
12
Example
13
NFS Mount Protocol
• Client sends path name to server with request to mount
– Not required to specify where to mount
• If path is legal and exported, server returns file handle
– Contains FS type, disk, i-node number of directory, security info
– Subsequent accesses from client use file handle
• Mount can be either at boot or automount
– Using automount, directories are not mounted during boot
– OS sends a message to servers on first remote file access
– Automount is helpful since remote dir might not be used at all
• Mount only affects the client view!
14
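As a rough illustration of the file handle returned by the mount protocol, the sketch below packs the fields listed above into a fixed-size byte string. The field widths and layout are assumptions made for illustration only; a real NFS handle is an opaque blob whose layout only the server interprets.

import struct

# Hypothetical handle layout: fs type, disk id, i-node number of the directory,
# and a security word, each 32 bits, in network byte order.
def make_handle(fs_type, disk, inode, secinfo):
    return struct.pack("!IIII", fs_type, disk, inode, secinfo)

def parse_handle(handle):            # only the server ever interprets the handle
    return struct.unpack("!IIII", handle)

h = make_handle(fs_type=1, disk=0, inode=7341, secinfo=0x1234)
print(len(h), parse_handle(h))       # 16 bytes; (1, 0, 7341, 4660)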
NFS Protocol
• Supports directory and file access via remote procedure
calls (RPCs)
• All UNIX system calls supported other than open & close
• Open and close are intentionally not supported
– For a read, client sends lookup message to server
– Server looks up file and returns handle
– Unlike open, lookup does not copy info in internal system tables
– Subsequently, read contains file handle, offset and num bytes
– Each message is self-contained (see the sketch after this slide)
• Pros: server is stateless, i.e. no state about open files
• Cons: Locking is difficult, no concurrency control
15
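The following toy, in-memory sketch mimics the stateless lookup/read interaction described on this slide (it is not the real NFS wire protocol). The file contents, i-node numbers, and helper names are invented; the point is that every read carries the handle, offset, and byte count, so the server keeps no open-file state.

# Toy "server": an i-node table and a directory, both invented for illustration.
INODES = {7341: b"hello, distributed world\n"}   # i-node number -> file data
NAMES  = {"data.txt": 7341}                      # directory: name -> i-node number

def lookup(name):
    # Returns a file handle; here the handle is simply the i-node number.
    return NAMES[name]

def read(handle, offset, nbytes):
    # Self-contained request: handle + offset + count, with no notion of "open".
    return INODES[handle][offset:offset + nbytes]

h = lookup("data.txt")
print(read(h, 0, 5))       # b'hello'
print(read(h, 7, 11))      # b'distributed'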
NFS Implementation
• Three main layers:
• System call layer:
– Handles calls like open, read and close
• Virtual File System Layer:
– Maintains table with one entry (v-node) for each open file
– v-nodes indicate whether a file is local or remote
• If remote, the v-node has enough info to access the file
• For local files, the FS and i-node are recorded
• NFS Service Layer:
– This lowest layer implements the NFS protocol
16
NFS Layer Structure
17
How does NFS work?
• Mount:
– Sys admin calls the mount program with remote dir, local dir
– Mount program parses for name of NFS server
– Contacts server asking for file handle for remote dir
– If directory exists for remote mounting, server returns handle
– Client kernel constructs v-node for remote dir
– Asks NFS client code to construct r-node for file handle
• Open:
– Kernel realizes that file is on remotely mounted directory
– Finds r-node in v-node for the directory
– NFS client code then opens file, enters r-node for file in VFS,
and returns file descriptor for remote node
18
Cache coherency
• Clients cache file attributes and data
– If two clients cache the same data, cache coherency is lost
• Solutions:
– Each cache block has a timer (3 sec for data, 30 sec for dir)
• Entry is discarded when timer expires
– On open of cached file, its last modify time on server is checked
• If cached copy is old, it is discarded
– Every 30 sec, cache time expires
• All dirty blocks are written back to the server
19
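Here is a minimal sketch of the timer-based validation above, using the 3-second data and 30-second directory timeouts from the slide. The CacheEntry class, the cache dictionary, and the fetch_from_server callback are hypothetical; a real client would also compare the server's last-modify time on open and flush dirty blocks every 30 seconds.

import time

DATA_TTL, DIR_TTL = 3.0, 30.0          # timeouts from the slide, in seconds

class CacheEntry:
    def __init__(self, data, is_dir=False):
        self.data = data
        self.fetched = time.monotonic()
        self.ttl = DIR_TTL if is_dir else DATA_TTL

    def fresh(self):
        # The cached copy is trusted only while its timer has not expired.
        return time.monotonic() - self.fetched < self.ttl

cache = {}

def cached_read(path, fetch_from_server):
    entry = cache.get(path)
    if entry is None or not entry.fresh():               # expired: discard and refetch
        cache[path] = entry = CacheEntry(fetch_from_server(path))
    return entry.data

print(cached_read("/mnt/nfs/x", lambda p: b"server copy of " + p.encode()))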
Remote Procedure Call (RPC)
20
Procedure Call
• A more natural way to communicate is using procedure calls:
– every language supports it
– semantics are well defined and understood
– natural for programmers to use
• Basic idea: define server as a module that exports a set of
procedures callable by client programs.
• To use the server, the client just does a procedure call, as
if it were linked with the server
(Diagram: the client issues a call to the server; the server executes and returns to the client.)
21
(Remote) Procedure Call
• So, we would like to use procedure call as a model for
distributed communication.
• Lots of issues:
– how do we make this invisible to the programmer?
– what are the semantics of parameter passing?
– how is binding done (locating the server)?
– how do we support heterogeneity (OS, arch., language)?
– etc.
22
Remote Procedure Call
• The basic model for Remote Procedure Call (RPC) was
described by Birrell and Nelson (1984), based on work
done at Xerox PARC.
• Goal: make RPC as much like a local procedure call as possible.
• Uses compiler/language support.
• There are 3 components on each side:
– a user program (client or server)
– a set of stub procedures
– RPC runtime support
23
RPC
• Basic process for building a server:
– Server program defines the server’s interface using an interface
definition language (IDL)
– The IDL specifies the names, parameters, and types for all client-callable server procedures
– A stub compiler reads the IDL and produces two stub procedures for
each server procedure: a client-side stub and a server-side stub
– The server writer writes the server and links it with the server-side
stubs; the client writes her program and links it with the client-side
stubs.
– The stubs are responsible for managing all details of the remote
communication between client and server.
24
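For a feel of how little the application programmer writes when the stub machinery is provided for you, here is a small example using Python's standard-library XML-RPC modules. XML-RPC is not the RPC system described in these slides, but it follows the same pattern: register a procedure on the server, then call it through a proxy object that plays the role of the client stub.

# server.py -- exports one procedure; the library plays the server-stub role
from xmlrpc.server import SimpleXMLRPCServer

def add(x, y):
    return x + y

server = SimpleXMLRPCServer(("localhost", 8000), logRequests=False)
server.register_function(add, "add")
server.serve_forever()

# client.py -- the ServerProxy object acts as the client stub
import xmlrpc.client

proxy = xmlrpc.client.ServerProxy("http://localhost:8000/")
print(proxy.add(2, 3))   # looks like a local call; actually an RPC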
RPC Stubs
• Client-side stub is a procedure that looks to the client as
if it were a callable server procedure.
• Server-side stub looks like a calling client to the server
• The client program thinks it is calling the server;
– in fact, it’s calling the client stub.
• The server program thinks it’s called by the client;
– in fact, it’s called by the server stub.
• The stubs send messages to each other to make RPC
happen.
25
RPC Call Structure
• Client program calls foo(x,y); this is actually a local call to the client stub
• Client stub builds a message packet and inserts the parameters
• Client-side RPC runtime sends the message to the remote node
• Server-side RPC runtime receives the message and calls the server stub
• Server stub unpacks the parameters and makes the call
• Server program is called by its stub and begins executing foo(a,b)
26
RPC Return Structure
• Server procedure finishes foo(a,b) and returns to the server stub
• Server stub builds a result message containing the output arguments
• Server-side RPC runtime sends the message, responding to the original call message
• Client-side RPC runtime receives the message and calls the client stub
• Client stub unpacks the message and returns the results to the caller
• Client program continues after its call to foo(x,y)
27
RPC Information Flow
• Machine A: the client (caller) makes the call; the client stub marshals the args and the RPC runtime sends the message over the network (into mailbox mbox1 on Machine B)
• Machine B: the server-side RPC runtime receives the message; the server stub unmarshals the args and calls the server (callee)
• On return, the server stub marshals the return values and the runtime sends them back over the network (into mailbox mbox2 on Machine A)
• Machine A: the client-side runtime receives the reply; the client stub unmarshals the return values and the call returns to the caller
28
RPC Binding
• Binding is the process of connecting the client and server
• The server, when it starts up, exports its interface,
– identifying itself to a network name server and
– telling the local runtime its dispatcher address.
• The client, before issuing any calls, imports the server,
– which causes the RPC runtime to look up the server through the
name service and
– contact the requested server to set up a connection.
• The import and export are explicit calls in the code.
29
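A toy sketch of binding, with a dictionary standing in for the network name server (all names below are invented): export registers the server's dispatcher address, and import looks it up before the client makes any calls.

# Toy name service: interface name -> (host, port) of the server's dispatcher.
NAME_SERVICE = {}

def export_interface(name, host, port):
    # Called by the server at startup: advertise where its dispatcher listens.
    NAME_SERVICE[name] = (host, port)

def import_interface(name):
    # Called by the client before issuing calls: locate the server.
    try:
        return NAME_SERVICE[name]
    except KeyError:
        raise LookupError(f"no server has exported {name!r}")

export_interface("FileService", "server1.example.com", 2049)
print(import_interface("FileService"))   # ('server1.example.com', 2049)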
RPC Marshalling
• Marshalling is the packing of procedure parameters into a message
packet (a small sketch follows this slide).
• RPC stubs call type-specific procedures to marshall (or
unmarshall) all of the parameters to the call.
• On client side, client stub marshalls parameters into call
packet;
– On the server side the server stub unmarshalls the parameters to
call the server’s procedure.
• On return, server stub marshalls return parameters into
return packet;
– Client stub unmarshalls return params and returns to the client.
30
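As a concrete, hypothetical example of marshalling, the sketch below packs the parameters of a read(handle, offset, nbytes) call into a call packet with Python's struct module and unpacks them on the other side. The procedure number and the field widths are invented; in a real system the stub compiler generates this code from the IDL.

import struct

READ_PROC = 6                       # made-up procedure number for read

def marshal_read(handle, offset, nbytes):
    # Client-stub side: pack the procedure number and parameters in network byte order.
    return struct.pack("!IIII", READ_PROC, handle, offset, nbytes)

def unmarshal_read(packet):
    # Server-stub side: unpack the parameters before calling the server procedure.
    proc, handle, offset, nbytes = struct.unpack("!IIII", packet)
    assert proc == READ_PROC
    return handle, offset, nbytes

packet = marshal_read(handle=42, offset=8192, nbytes=4096)
print(len(packet), unmarshal_read(packet))    # 16 (42, 8192, 4096)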
Problems with RPC
• Non-Atomic failures
– Different failure modes in distributed system than on a single machine
– Consider many different types of failures
• User-level bug causes address space to crash
• Machine failure, kernel bug causes all processes on same machine to fail
• Some machine is compromised by malicious party
– Before RPC: whole system would crash/die
– After RPC: One machine crashes/compromised while others keep
working
– Can easily result in inconsistent view of the world
• Did my cached data get written back or not?
• Did server do what I requested or not?
– Answer? Distributed transactions/Byzantine Commit
• Performance
– Cost of procedure call << same-machine RPC << network RPC
– Means programmers must be aware that RPC is not free
• Caching can help, but may make failure handling complex
31
Cross-Domain Comm./Location
Transparency
• How do address spaces communicate with one another?
– Shared memory with semaphores, monitors, etc.
– File system
– Pipes (1-way communication)
– "Remote" procedure call (2-way communication)
• RPC’s can be used to communicate between address
spaces on different machines or the same machine
– Services can be run wherever it’s most appropriate
– Access to local and remote services looks the same
• Examples of modern RPC systems:
– CORBA (Common Object Request Broker Architecture)
– DCOM (Distributed COM)
– RMI (Java Remote Method Invocation)
32
Microkernel operating systems
• Example: split kernel into application-level servers.
– File system looks remote, even though on same machine
(Diagram: in the monolithic structure, apps run above a single kernel containing the file system, windowing, VM, networking, and threads; in the microkernel structure, apps, the file system, and the window system run as separate address spaces over a small kernel providing RPC, address spaces, and threads.)
• Why split the OS into separate domains?
– Fault isolation: bugs are more isolated (build a firewall)
– Enforces modularity: allows incremental upgrades of pieces of software
(client or server)
– Location transparent: service can be local or remote
• For example in the X windowing system: Each X client can be on a
separate machine from X server; Neither has to run on the machine with
the frame buffer.
33
Andrew File System (AFS)
34
Andrew File System (AFS)
• Named after Andrew Carnegie and Andrew Mellon
– Transarc Corp. and then IBM took over development of AFS
– In 2000 IBM made OpenAFS available as open source
• Features:
– Uniform name space
– Location independent file sharing
– Client side caching with cache consistency
– Secure authentication via Kerberos
– Server-side caching in form of replicas
– High availability through automatic switchover of replicas
– Scalability to span 5000 workstations
35
AFS Overview
• Based on the upload/download model
– Clients download and cache files
– Server keeps track of clients that cache the file
– Clients upload files at end of session
• Whole file caching is central idea behind AFS
– Later amended to block operations
– Simple, effective
• AFS servers are stateful
– Keep track of clients that have cached files
– Recall files that have been modified
36
AFS Details
• Has dedicated server machines
• Clients have partitioned name space:
– Local name space and shared name space
– Cluster of dedicated servers (Vice) present shared name space
– Clients run Virtue protocol to communicate with Vice
• Clients and servers are grouped into clusters
– Clusters connected through the WAN
• Other issues:
– Scalability, client mobility, security, protection, heterogeneity
37
AFS: Shared Name Space
• AFS’s storage is arranged in volumes
– Usually associated with files of a particular client
• AFS dir entry maps Vice files/dirs to a 96-bit fid
– Volume number
– Vnode number: index into i-node array of a volume
– Uniquifier: allows reuse of vnode numbers
• Fids are location transparent
– File movements do not invalidate fids
• Location information kept in volume-location database
– Volumes migrated to balance available disk space, utilization
– Volume movement is atomic; operation aborted on server crash
38
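Since the 96-bit fid is just three 32-bit fields, it can be illustrated in a few lines of Python. The particular numbers below are made up; only the volume/vnode/uniquifier structure comes from the slide.

import struct

# volume number, vnode number (index into the volume's i-node array), uniquifier
def make_fid(volume, vnode, uniquifier):
    return struct.pack("!III", volume, vnode, uniquifier)    # 12 bytes = 96 bits

def parse_fid(fid):
    return struct.unpack("!III", fid)

fid = make_fid(volume=536870913, vnode=42, uniquifier=7)
print(len(fid) * 8, parse_fid(fid))    # 96 (536870913, 42, 7)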
AFS: Operations and Consistency
• AFS caches entire files from servers
– Client interacts with servers only during open and close
• OS on the client intercepts file system calls and passes them to Venus
– Venus is a client process that caches files from servers
– Venus contacts Vice only on open and close
• Does not contact if file is already in the cache, and not
invalidated
– Reads and writes bypass Venus
• Works due to callback:
– Server updates state to record caching
– Server notifies client before allowing another client to modify
– Clients lose their callback when someone writes the file
• Venus caches dirs and symbolic links for path translation
39
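Below is a toy sketch of the callback mechanism, with an invented Server/Client pair: the stateful server records which clients cache a file and breaks their callbacks before another client's store is allowed to proceed.

class Server:
    def __init__(self):
        self.callbacks = {}                                   # path -> set of caching clients

    def fetch(self, path, client):
        self.callbacks.setdefault(path, set()).add(client)    # record that this client caches the file
        return f"<contents of {path}>"

    def store(self, path, writer):
        # Break callbacks: every other caching client must discard its copy.
        for client in self.callbacks.get(path, set()) - {writer}:
            client.invalidate(path)
        self.callbacks[path] = {writer}

class Client:
    def __init__(self, name):
        self.name, self.cache = name, {}

    def open(self, server, path):
        self.cache[path] = server.fetch(path, self)           # whole-file caching on open

    def invalidate(self, path):
        self.cache.pop(path, None)
        print(f"{self.name}: callback broken for {path}")

server, a, b = Server(), Client("A"), Client("B")
a.open(server, "/afs/proj/notes.txt")
b.open(server, "/afs/proj/notes.txt")
server.store("/afs/proj/notes.txt", writer=b)                 # client A loses its callback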
AFS Implementation
• Client cache is a local directory on UNIX FS
– Venus and server processes access file directly by UNIX i-node
• Venus has 2 caches, one for status & one for data
– Uses LRU to keep them bounded in size
40
Summary
• RPC
– Call procedure on remote machine
– Provides same interface as procedure
– Automatic packing and unpacking of arguments without user
programming (in stub)
• NFS:
– Simple distributed file system protocol. No open/close
– Stateless server
• Has problems with cache consistency, locking protocol
• AFS:
– More complicated distributed file system protocol
– Stateful server
• session semantics: consistency on close
41
Prelim II
• Prelims graded
– Mean 73 (Median 76), Stddev 13.3, High 98 out of 100!
– Good job!
• Re-grade policy
– Submit written re-grade request to Nazrul.
• Entire prelim will be re-graded.
• We were generous the first time…
– If still unhappy, submit another re-grade request.
• Nazrul will re-grade it herself
– If still unhappy, submit a third re-grade request.
• I will re-grade. Final grade is law.
42
Grade distribution
43
Question #3
• Hardlinks
– Need a count in inode/file header
– Remove decrements count and removes file if count is 0 (see the sketch after this slide)
– New syscall: Link(char *src, char *dst)
• Softlinks
– Need a file type indicated in inode/file header
– Also, need path to target file
– Open needs to change to perform recursive lookup
– New syscall: SymLink(char *src, char *dst)
44
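A toy sketch of the hard-link bookkeeping in the answer above: directory names map to a shared Inode object whose link count Link increments and Remove decrements, with the data freed only when the count reaches zero. The Inode class and directory dictionary are invented for illustration.

class Inode:
    def __init__(self, data):
        self.data, self.link_count = data, 1

directory = {"a.txt": Inode(b"exam answers")}

def link(src, dst):                      # hard link: same i-node, count goes up
    inode = directory[src]
    inode.link_count += 1
    directory[dst] = inode

def remove(name):
    inode = directory.pop(name)
    inode.link_count -= 1
    if inode.link_count == 0:            # last name is gone: free the data blocks
        inode.data = None

link("a.txt", "b.txt")
remove("a.txt")
print(directory["b.txt"].data)           # still b'exam answers'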
Question #4
• Concurrent writers
– RAID 0, 1+0, 5, 6
– Not 1 or 4 because cannot perform independent writes
• Concurrent readers (but not concurrent writers)
– RAID 1 and 4
• 2*k-1 disks and want concurrent readers
– Unavailable: RAID 1 — requires 2*k disks
– Undesirable: RAID 4, 5, and 6 — requires complex controllers
45
Happy Thanksgiving!!!
46