Other File Systems: LFS, NFS, and AFS

Goals for Today
• Discuss specific file systems
– both local and remote
• Log-structured file system (LFS)
• Distributed file systems (DFS)
– Network file system (NFS)
– Andrew file system (AFS)
2
Log-Structured File Systems
• The trend: CPUs are faster, RAM & caches are bigger
– So, a lot of reads do not require disk access
– Most disk accesses are writes → pre-fetching not very useful
– Worse, most writes are small → 10 ms overhead for 50 µs write
– Example: to create a new file:
• i-node of directory needs to be written
• Directory block needs to be written
• i-node for the file has to be written
• Need to write the file
– Delaying these writes could hamper consistency
• Solution: LFS to utilize full disk bandwidth
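As a rough check on the small-write claim above, the fraction of disk time spent transferring data can be computed directly. This is a minimal sketch that uses only the slide's figures (10 ms positioning overhead, 50 µs transfer). The function name and the batch size of 200 are illustrative assumptions.

```python
def transfer_efficiency(overhead_s: float, transfer_s: float) -> float:
    """Fraction of total disk time spent actually transferring data."""
    return transfer_s / (overhead_s + transfer_s)

# One small write: 10 ms of seek/rotation overhead for 50 µs of transfer.
small_write = transfer_efficiency(overhead_s=10e-3, transfer_s=50e-6)

# Hypothetical batch: 200 such writes sharing a single positioning overhead.
batched = transfer_efficiency(overhead_s=10e-3, transfer_s=200 * 50e-6)

print(f"single small write: {small_write:.2%}")
print(f"200 writes batched: {batched:.2%}")
```

A single small write keeps the disk busy transferring data for about 0.5% of the time, which is why LFS collects many writes and issues them together.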
3
LFS Basic Idea
• Structure the disk as a log
– Periodically, all pending writes buffered in memory are collected
in a single segment
– The entire segment is written contiguously at end of the log
• Segment may contain i-nodes, directory entries, data
– Start of each segment has a summary
– If segment around 1 MB, then full disk bandwidth can be utilized
• Note, i-nodes are now scattered on disk
– Maintain i-node map (entry i points to i-node i on disk)
– Part of it is cached, reducing the delay in accessing i-node
• This description works great for disks of infinite size
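The basic idea above can be sketched as a toy in-memory model: writes are buffered, flushed contiguously to the end of the log, and an i-node map records where the newest i-node of each file lives. The class name `Log`, the block layout, and the tiny segment size are assumptions for illustration. Segment summaries and real disk I/O are left out.

```python
class Log:
    """Toy log-structured store: an append-only list of blocks plus an i-node map."""

    def __init__(self, segment_size=4):
        self.disk = []        # the log: a list of (kind, payload) blocks
        self.inode_map = {}   # i-node number -> log address of the newest i-node
        self.pending = []     # writes buffered in memory: (inum, data)
        self.segment_size = segment_size

    def write_file(self, inum, data):
        """Buffer a write; flush once a full segment has accumulated."""
        self.pending.append((inum, data))
        if len(self.pending) >= self.segment_size:
            self.flush()

    def flush(self):
        """Write all pending blocks contiguously at the end of the log."""
        for inum, data in self.pending:
            data_addr = len(self.disk)
            self.disk.append(("data", data))
            # The i-node lands wherever the log currently ends, so record it.
            self.inode_map[inum] = len(self.disk)
            self.disk.append(("inode", {"inum": inum, "block": data_addr}))
        self.pending.clear()

    def read_file(self, inum):
        """Return the newest data for inum, checking the write buffer first."""
        for pending_inum, data in reversed(self.pending):
            if pending_inum == inum:
                return data
        _, inode = self.disk[self.inode_map[inum]]
        return self.disk[inode["block"]][1]
```

Overwriting a file appends new blocks and repoints the i-node map. The old blocks stay in the log, which is why the model only works unchanged for a disk of infinite size.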
4
LFS vs. UFS
[Figure: blocks written to create two 1-block files, dir1/file1 and dir2/file2, in UFS and LFS. Labels: inode, directory, data, inode map; Unix File System; Log-Structured File System (log).]
5
LFS Cleaning
• Finite disk space implies that the disk is eventually full
– Fortunately, some segments have stale information
– A file overwrite causes i-node to point to new blocks
• Old ones still occupy space
• Solution: LFS Cleaner thread compacts the log
– Read segment summary, and see if contents are current
• File blocks, i-nodes, etc.
– If nothing is current, the segment is marked free, and the cleaner moves forward
– Else, the cleaner writes the live content into a new segment at the end of the log
– The old segment is then marked as free
• Disk is a circular buffer, writer adds contents to the front,
cleaner cleans content from the back
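The cleaning steps above can be sketched as follows. This is a simplified model, not the actual LFS cleaner: a segment is a Python list of `(inum, data)` blocks, a freed segment is `None`, and a block counts as current when the i-node map still points at it. All names are hypothetical, and the log grows as a list instead of wrapping around like a circular buffer.

```python
def clean_oldest(segments, inode_map):
    """Clean the oldest non-free segment and return its index.

    segments:  list of segments; each is a list of (inum, data) blocks,
               or None once the segment has been freed
    inode_map: inum -> (segment index, offset) of the current copy
    """
    # Raises StopIteration if every segment is already free.
    victim = next(i for i, seg in enumerate(segments) if seg is not None)

    # A block is live only if the i-node map still points at this exact slot.
    live = [(inum, data)
            for offset, (inum, data) in enumerate(segments[victim])
            if inode_map.get(inum) == (victim, offset)]

    if live:
        # Copy live blocks into a new segment at the end of the log.
        new_index = len(segments)
        segments.append(live)
        for offset, (inum, _) in enumerate(live):
            inode_map[inum] = (new_index, offset)

    segments[victim] = None   # mark the old segment free
    return victim
```

For example, if file 1 was overwritten in a later segment, only file 2's block is copied forward when the first segment is cleaned.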
6
Distributed File Systems
• Goal: view a distributed system as a file system
– Storage is distributed
– Web tries to make world a collection of hyperlinked documents
• Issues not common to usual file systems
– Naming transparency
– Load balancing
– Scalability
– Location and network transparency
– Fault tolerance
• We will look at some of these today
7
Transfer Model
• Upload/download Model:
– Client downloads file, works on it, and writes it back on server
– Simple and good performance
• Remote Access Model:
– File only on server; client sends commands to get work done
8
Naming transparency
• Naming is a mapping from logical to physical objects
• Ideally client interface should be transparent
– Not distinguish between remote and local files
– /machine/path or mounting remote FS in local hierarchy are not
transparent
• A transparent DFS hides the location of files in system
• 2 forms of transparency:
– Location transparency: path gives no hint of file location
• /server1/dir1/dir2/x tells x is on server1, but not where server1 is
– Location independence: move files without changing names
• Separate naming hierarchy from storage devices hierarchy
9
File Sharing Semantics
• Sequential consistency: reads see previous writes
– Ordering on all system calls seen by all processors
– Maintained in single processor systems
– Can be achieved in DFS with one file server and no caching
10
Caching
• Keep repeatedly accessed blocks in cache
– Improves performance of further accesses
• How it works:
– If needed block not in cache, it is fetched and cached
– Accesses performed on local copy
– One master file copy on server, other copies distributed in DFS
– Cache consistency problem: how to keep cached copy consistent with master file copy
• Where to cache?
– Disk: Pros: more reliable, data present locally on recovery
– Memory: Pros: diskless workstations, quicker data access
– Servers maintain cache in memory
11
File Sharing Semantics
• Other approaches:
– Write through caches:
• immediately propagate changes in cache files to server
• Reliable but poor performance
– Delayed write:
• Writes are not propagated immediately, probably on file close
• Session semantics (AFS): write file back on close
• Alternative (NFS): scan cache periodically and flush modified blocks
• Better performance but poor reliability
– File Locking:
• The upload/download model locks a downloaded file
• Other processes wait for file lock to be released
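The difference between write-through and delayed write (here, write-on-close) can be sketched with two toy client caches. This is a minimal in-memory model under assumed names (`Server`, `WriteThroughCache`, `WriteOnCloseCache`). There is no network, no failure handling, and no locking.

```python
class Server:
    """Holds the master copy of every file."""
    def __init__(self):
        self.files = {}


class WriteThroughCache:
    """Every write updates the local copy and the server immediately."""
    def __init__(self, server):
        self.server = server
        self.cache = {}

    def write(self, name, data):
        self.cache[name] = data
        self.server.files[name] = data   # propagated right away


class WriteOnCloseCache:
    """Writes stay local until close (session semantics)."""
    def __init__(self, server):
        self.server = server
        self.cache = {}
        self.dirty = set()

    def write(self, name, data):
        self.cache[name] = data
        self.dirty.add(name)             # server copy is now stale

    def close(self, name):
        if name in self.dirty:
            self.server.files[name] = self.cache[name]
            self.dirty.discard(name)
```

With the delayed-write cache, a client crash between `write` and `close` loses the update, which is the reliability cost noted above.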
12
Network File System (NFS)
• Developed by Sun Microsystems in 1984
– Used to join FSes on multiple computers as one logical whole
• Used commonly today with UNIX systems
• Assumptions
– Allows arbitrary collection of users to share a file system
– Clients and servers might be on different LANs
– Machines can be clients and servers at the same time
• Architecture:
– A server exports one or more of its directories to remote clients
– Clients access exported directories by mounting them
• The contents are then accessed as if they were local
13
Example
14
NFS Mount Protocol
• Client sends path name to server with request to mount
– Not required to specify where to mount
• If path is legal and exported, server returns file handle
– Contains FS type, disk, i-node number of directory, security info
– Subsequent accesses from client use file handle
• Mount can be either at boot or automount
– Using automount, directories are not mounted during boot
– OS sends a message to servers on first remote file access
– Automount is helpful since remote dir might not be used at all
• Mount only affects the client view!
15
NFS Protocol
• Supports directory and file access via remote procedure
calls (RPCs)
• All UNIX system calls supported other than open & close
• Open and close are intentionally not supported
– For a read, client sends lookup message to server
– Server looks up file and returns handle
– Unlike open, lookup does not copy info in internal system tables
– Subsequently, read contains file handle, offset and num bytes
– Each message is self-contained
• Pros: server is stateless, i.e. no state about open files
• Cons: Locking is difficult, no concurrency control
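The stateless lookup/read exchange above can be sketched like this. It is an in-memory toy, not the NFS wire protocol: the class names, the integer handles, and the example path are assumptions. The point is that the server keeps no per-client record of open files or offsets, so the client must carry the offset in every request.

```python
class StatelessServer:
    """Serves reads without remembering which clients have which files open."""

    def __init__(self, files):
        self.files = files   # path -> bytes
        # Handles are derived from the file set, not from client activity.
        self.handles = {path: i for i, path in enumerate(sorted(files))}
        self.by_handle = {h: path for path, h in self.handles.items()}

    def lookup(self, path):
        """Return a file handle; nothing is recorded about the caller."""
        return self.handles[path]

    def read(self, handle, offset, count):
        """Self-contained request: handle, offset, and byte count."""
        data = self.files[self.by_handle[handle]]
        return data[offset:offset + count]


class ClientFile:
    """Client-side state: the handle and the current offset live here."""

    def __init__(self, server, path):
        self.server = server
        self.handle = server.lookup(path)
        self.offset = 0

    def read(self, count):
        data = self.server.read(self.handle, self.offset, count)
        self.offset += len(data)
        return data
```

Because each read names its own offset, a server that restarts between two reads can answer the second one without any recovery step.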
16
NFS Implementation
• Three main layers:
• System call layer:
– Handles calls like open, read and close
• Virtual File System Layer:
– Maintains table with one entry (v-node) for each open file
– v-nodes indicate if file is local or remote
• If remote it has enough info to access them
• For local files, FS and i-node are recorded
• NFS Service Layer:
– This lowest layer implements the NFS protocol
17
NFS Layer Structure
18
How NFS Works
• Mount:
– Sysadmin calls mount program with remote dir, local dir
– Mount program parses for name of NFS server
– Contacts server asking for file handle for remote dir
– If directory exists for remote mounting, server returns handle
– Client kernel constructs v-node for remote dir
– Asks NFS client code to construct r-node for file handle
• Open:
– Kernel realizes that file is on remotely mounted directory
– Finds r-node in v-node for the directory
– NFS client code then opens file, enters r-node for file in VFS,
and returns file descriptor for remote node
19
Cache coherency
• Clients cache file attributes and data
– If two clients cache the same data, cache coherency is lost
• Solutions:
– Each cache block has a timer (3 sec for data, 30 sec for dir)
• Entry is discarded when timer expires
– On open of cached file, its last modify time on server is checked
• If cached copy is old, it is discarded
– Every 30 sec, cache time expires
• All dirty blocks are written back to the server
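The timer-based approach above can be sketched as a cache whose entries expire after a fixed time. This is a simplified illustration, not NFS client code: the class name `TimedCache`, the `fetch` callback, and the injectable `clock` are assumptions (the clock is injectable only so the behavior can be exercised without sleeping).

```python
import time


class TimedCache:
    """Client cache whose entries are discarded once their timer expires."""

    def __init__(self, fetch, ttl=3.0, clock=time.monotonic):
        self.fetch = fetch      # function that gets the value from the server
        self.ttl = ttl          # seconds an entry stays valid (3 s for data)
        self.clock = clock
        self.entries = {}       # key -> (value, time it was cached)

    def get(self, key):
        now = self.clock()
        entry = self.entries.get(key)
        if entry is not None and now - entry[1] < self.ttl:
            return entry[0]     # still within the timer; may be stale
        value = self.fetch(key) # expired or missing: go back to the server
        self.entries[key] = (value, now)
        return value
```

Within the timer window a client can keep returning data that another client has already changed on the server, so this bounds staleness without eliminating it.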
20
Andrew File System (AFS)
• Named after Andrew Carnegie and Andrew Mellon
– Transarc Corp. and then IBM took over development of AFS
– In 2000 IBM made OpenAFS available as open source
• Features:
– Uniform name space
– Location independent file sharing
– Client side caching with cache consistency
– Secure authentication via Kerberos
– Server-side caching in form of replicas
– High availability through automatic switchover of replicas
– Scalability to span 5000 workstations
21
AFS Overview
• Based on the upload/download model
– Clients download and cache files
– Server keeps track of clients that cache the file
– Clients upload files at end of session
• Whole file caching is central idea behind AFS
– Later amended to block operations
– Simple, effective
• AFS servers are stateful
– Keep track of clients that have cached files
– Recall files that have been modified
22
AFS Details
• Has dedicated server machines
• Clients have partitioned name space:
– Local name space and shared name space
– Cluster of dedicated servers (Vice) presents shared name space
– Clients run Virtue protocol to communicate with Vice
• Clients and servers are grouped into clusters
– Clusters connected through the WAN
• Other issues:
– Scalability, client mobility, security, protection, heterogeneity
23
AFS: Shared Name Space
• AFS’s storage is arranged in volumes
– Usually associated with files of a particular client
• AFS dir entry maps vice files/dirs to a 96-bit fid
– Volume number
– Vnode number: index into i-node array of a volume
– Uniquifier: allows reuse of vnode numbers
• Fids are location transparent
– File movements do not invalidate fids
• Location information kept in volume-location database
– Volumes migrated to balance available disk space, utilization
– Volume movement is atomic; operation aborted on server crash
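A 96-bit fid with the three fields listed above can be sketched as a packed record. The slide gives only the total width and the field names; splitting it into three equal 32-bit fields and using big-endian byte order are assumptions of this sketch, as is the class name `Fid`.

```python
import struct
from typing import NamedTuple


class Fid(NamedTuple):
    """File identifier: names a file without naming the server that holds it."""
    volume: int       # volume number
    vnode: int        # index into the volume's i-node array
    uniquifier: int   # distinguishes reuses of the same vnode number

    def pack(self) -> bytes:
        # Assumed layout: three unsigned 32-bit integers, big-endian.
        return struct.pack(">III", self.volume, self.vnode, self.uniquifier)

    @classmethod
    def unpack(cls, raw: bytes) -> "Fid":
        return cls(*struct.unpack(">III", raw))
```

Nothing in the fid identifies a server. Finding the volume's current server is a separate lookup in the volume-location database, which is what lets volumes move without invalidating fids.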
24
AFS: Operations and Consistency
• AFS caches entire files from servers
– Client interacts with servers only during open and close
• OS on client intercepts calls, and passes them to Venus
– Venus is a client process that caches files from servers
– Venus contacts Vice only on open and close
• Does not contact if file is already in the cache, and not
invalidated
– Reads and writes bypass Venus
• Works due to callback:
– Server updates state to record caching
– Server notifies client before allowing another client to modify
– Clients lose their callback when someone writes the file
• Venus caches dirs and symbolic links for path translation
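The callback mechanism above can be sketched as a toy model. The class and method names are hypothetical, whole files are plain strings, and callbacks are broken when a modified file is stored on close, which is one simple reading of the slide and not a description of the real Vice/Venus implementation.

```python
class ViceServer:
    """Stateful server: remembers which clients cache each file."""

    def __init__(self):
        self.files = {}
        self.callbacks = {}   # file name -> set of clients holding a callback

    def fetch(self, client, name):
        self.callbacks.setdefault(name, set()).add(client)
        return self.files[name]

    def store(self, client, name, data):
        self.files[name] = data
        # Every other client caching this file loses its callback.
        for other in self.callbacks.pop(name, set()) - {client}:
            other.break_callback(name)
        self.callbacks[name] = {client}


class VenusClient:
    """Caches whole files; contacts the server only on open and close."""

    def __init__(self, server):
        self.server = server
        self.cache = {}
        self.contacts = 0     # number of fetches, for illustration

    def open(self, name):
        if name not in self.cache:   # no valid cached copy
            self.cache[name] = self.server.fetch(self, name)
            self.contacts += 1
        return self.cache[name]

    def close(self, name, data):
        self.cache[name] = data
        self.server.store(self, name, data)

    def break_callback(self, name):
        self.cache.pop(name, None)   # cached copy is no longer valid
```

Repeated opens of an unchanged file cost no server traffic. The server is contacted again only after a callback has been broken.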
25
AFS Implementation
• Client cache is a local directory on UNIX FS
– Venus and server processes access file directly by UNIX i-node
• Venus has 2 caches, one for status & one for data
– Uses LRU to keep them bounded in size
26
Summary
• LFS:
– Local file system
– Optimize writes
• NFS:
– Simple distributed file system protocol. No open/close
– Stateless server
• Has problems with cache consistency, locking protocol
• AFS:
– More complicated distributed file system protocol
– Stateful server
• session semantics: consistency on close
27
Enjoy Spring Break!!!
28
Storage Area Networks (SANs)
• New generation of architectures for managing storage in
massive data centers
– For example, Google is said to have 50,000-200,000 computers
in various centers
– Amazon is reaching a similar scale
• A SAN system is a collection of file systems with tools to
help humans administer the system
29
Examples of SAN issues
• Where should a file be stored
– Many of these systems have an indirection mechanism so that a
file can move from volume to volume
– Allows files to migrate, e.g. from a slow server to a fast one or
from long term storage onto an active disk system
• Eco-computing: systems that seek to minimize energy in
big data centers
30
Examples of SAN issues
• Disk-to-disk backup
– Might want to do very fast automated backups
– Ideally, can support this while the disk is actively in use
• Easiest if two disks are next to each other
• Challenge: back up entire data center in New York at site
in Kentucky
– US Dept of Treasury e-Cavern
31
File System Reliability
• 2 considerations: backups and consistency
• Why backup?
– Recover from disaster
– Recover from stupidity
• Where to backup? Tertiary storage
– Tape: holds 10s or 100s of GBs, costs pennies/GB
• sequential access → high random access time
• Backup takes time and space
32
Backup Issues
• Should the entire FS be backed up?
– Binaries, special I/O files usually not backed up
• Do not back up files unmodified since the last backup
– Incremental dumps: complete dump monthly, modified files daily
• Compress data before writing to tape
• How to backup an active FS?
– Not acceptable to take system offline during backup hours
• Security of backup media
33
Backup Strategies
• Physical Dump
– Start from block 0 of disk, write all blocks in order, stop after last
– Pros: Simple to implement, speed
– Cons: cannot skip directories, do incremental dumps, or restore individual files
• No point dumping unused blocks, but avoiding it is a big overhead
• How to dump bad blocks?
• Logical Dump
– Start at a directory
– Dump all directories and files changed since base date
– Base date could be of last incremental dump, last full dump, etc.
– Also dump all dirs (even unmodified) in path to a modified file
34
Logical Dumps
• Why dump unmodified directories?
– Restore files on a fresh FS
– To incrementally recover a single file
[Figure: directory tree; one label marks a file that has not changed]
35
A Dumping Algorithm
Algorithm:
• Mark all dirs & modified files
• Unmark dirs with no mod. files
• Dump dirs
• Dump modified files
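The four phases above can be sketched on a toy directory tree. The representation is an assumption made for illustration: a directory is a nested dict, a file is an integer modification time, and the function returns the dump order without writing anything.

```python
def plan_dump(tree, base_date, root="/"):
    """Return (dirs_to_dump, files_to_dump) for a logical incremental dump.

    tree: nested dict; a dict value is a directory, an int value is a
          file's modification time.
    """
    marked_dirs, marked_files = set(), set()

    # Phase 1: mark every directory and every file modified since base_date.
    def mark(node, path):
        marked_dirs.add(path)
        for name, child in node.items():
            child_path = path.rstrip("/") + "/" + name
            if isinstance(child, dict):
                mark(child, child_path)
            elif child > base_date:
                marked_files.add(child_path)

    mark(tree, root)

    # Phase 2: unmark directories with no modified file in or under them.
    def has_modified(path):
        prefix = path.rstrip("/") + "/"
        return any(f.startswith(prefix) for f in marked_files)

    marked_dirs = {d for d in marked_dirs if has_modified(d)}

    # Phases 3 and 4: directories are dumped first, then modified files.
    return sorted(marked_dirs), sorted(marked_files)
```

Unmodified directories on the path to a modified file stay marked, so a restore onto a fresh file system has somewhere to put the file.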
36
Logical Dumping Issues
• Reconstruct the free block list on restore
• Maintaining consistency across symbolic links
• UNIX files with holes
• Should never dump special files, e.g. named pipes
37