NFS & Distributed Systems Issues Vivek Pai Dec 12, 2002

The Next Project
• Behavioral spec
• Implementation up to you
• Can assume max of 32 procs/threads
• Use a simple counter to implement simple counts
• I may release a tool to make testing easier
  • But feel free to use ApacheBench, etc.
Behavioral Spec
The following behavioral spec is important:
• If there aren’t enough free processes/threads, the server should spawn one per second
• If there are too many free, one should be killed per second
• This should not depend on any other activity in the system
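
The once-per-second rule above can be sketched as a small decision function driven by a timer. This is only an illustration: the thresholds `min_free` and `max_free` and the helper `simulate` are hypothetical names, not part of the spec, and the only number taken from the slides is the limit of 32 workers.

```python
MAX_WORKERS = 32  # upper bound from the project description


def pool_adjustment(total, free, min_free, max_free, max_workers=MAX_WORKERS):
    """Return +1 to spawn one worker, -1 to kill one, or 0 to do nothing.

    Meant to be called once per second from a timer, so the pool changes by
    at most one worker per tick no matter how busy the server is.
    min_free and max_free are assumed thresholds, not values from the spec.
    """
    if free < min_free and total < max_workers:
        return 1
    if free > max_free and total > 0:
        return -1
    return 0


def simulate(total, busy, ticks, min_free=2, max_free=5):
    """Apply the rule for a number of one-second ticks under constant load."""
    history = []
    for _ in range(ticks):
        free = total - busy
        total += pool_adjustment(total, free, min_free, max_free)
        history.append(total)
    return history
```

Because the decision depends only on the counts sampled at each tick, it can be driven by `alarm` or a sleeping timer thread, independent of request arrivals.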
Caching Mmap
• Always use mmap
• Keep cache of active & inactive maps
• Total cache size in KB should be limited by command-line argument
• Can only exceed this limit if all mappings are active
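
One possible way to do the bookkeeping for such a cache is sketched below. It tracks sizes only and does not call `mmap` itself; the class name `MapCache`, the reference counts, and the least-recently-used eviction order are all assumptions made for illustration, not requirements from the project.

```python
from collections import OrderedDict


class MapCache:
    """Size accounting for a cache of active and inactive mappings."""

    def __init__(self, limit_kb):
        self.limit_kb = limit_kb       # would come from the command line
        self.total_kb = 0
        self.active = {}               # path -> [size_kb, refcount]
        self.inactive = OrderedDict()  # path -> size_kb, oldest first

    def _evict(self, needed_kb=0):
        # Drop the oldest inactive mappings until the cache fits,
        # or until only active mappings remain.
        while self.inactive and self.total_kb + needed_kb > self.limit_kb:
            _, size_kb = self.inactive.popitem(last=False)
            self.total_kb -= size_kb   # a real server would munmap() here

    def acquire(self, path, size_kb):
        if path in self.active:
            self.active[path][1] += 1
            return
        if path in self.inactive:
            # Cache hit: reuse the existing mapping, no size change.
            self.active[path] = [self.inactive.pop(path), 1]
            return
        self._evict(size_kb)
        self.active[path] = [size_kb, 1]   # a real server would mmap() here
        self.total_kb += size_kb

    def release(self, path):
        entry = self.active[path]
        entry[1] -= 1
        if entry[1] == 0:
            del self.active[path]
            self.inactive[path] = entry[0]
            self._evict()
```

In this sketch the total can exceed `limit_kb` only while every cached mapping is active; as soon as one is released, eviction brings the total back under the limit if it can.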

Man Pages You May Like
• mmap, munmap
• man -k pthread
• flock/lockf
• sleep
• signal
• alarm

Being A Good User
• Do not fork wildly
• Try to test on a non-shared system

Imagine The Following
• Everyone has a desktop machine
• Each machine has a user
• Each user has a home directory
• What problems arise?
  • Can’t move between machines
  • Can’t easily share files with others
  • How does this data get backed up?

Was It Always Like This?
• No
• Think mainframes:
  • Big, centralized box
  • All disks attached
  • Programs ran on box
  • Only terminals/monitors on each desk

How Did We Get Here?
• Mainframe killers advocated little boxes
• Lots of little boxes are a distributed system
• Distributed systems introduce new problems

Why Use Little Boxes?

• Little boxes are cheap
  • Easier to order a PC than a mainframe
• Little boxes are disposable
  • No need for a maintenance contract
• Economy of scale
  • Design cost amortized over more units
Were Minis Immune?
• Minicomputers were “department”-sized versus “company”-sized
• Most information not shared among everyone
• Administrator per department OK
• Shared resources only within department OK

Why Not Just Shared Disk?

• Centralized storage
  • Easier administration/backup
  • Better use of capacity
  • Easier to build large filesystem cache
  • Easier to provide AC/power
• Problem: compare bandwidth
  • 10 Mbit/sec Ethernet at the time
  • Switched versus shared irrelevant

New Problem

• Single point of failure
  • Means everything depends on this item
  • In other cases, duplication helps
• Common failures = reboot
  • But all information (state) lost
  • All clients would have to be told
  • We’d need to keep track of all clients
    • On stable storage!
Toward Statelessness
• Make server as dumb as possible
• Shift burdens to client-side
• Client failure only harms that client
• Each operation is self-contained
• Repeating operations permissible
  • Idempotent: repeating an operation causes no further change
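
A toy illustration of why self-contained requests make retries safe: the request below carries everything the server needs (handle, offset, count), so the client can simply resend it when a reply is lost. The names `FILES`, `serve_read`, `read_with_retry`, and `flaky` are hypothetical and do not correspond to the real NFS protocol messages.

```python
FILES = {"fh-1": b"hello, distributed world"}  # hypothetical handle -> contents


def serve_read(handle, offset, count):
    """Stateless read: the server keeps no per-client position or session."""
    return FILES[handle][offset:offset + count]


def read_with_retry(send, handle, offset, count, attempts=3):
    """Resend the identical request until a reply arrives."""
    for _ in range(attempts):
        try:
            return send(handle, offset, count)
        except TimeoutError:
            continue
    raise TimeoutError("server did not respond")


def flaky(server, failures):
    """Wrap a server so its first `failures` replies are lost in transit.

    The server still executes each request; only the reply disappears,
    which is the case where repeating the operation must be harmless.
    """
    remaining = [failures]

    def send(*request):
        reply = server(*request)
        if remaining[0] > 0:
            remaining[0] -= 1
            raise TimeoutError("reply lost")
        return reply

    return send
```

For example, `read_with_retry(flaky(serve_read, 2), "fh-1", 7, 11)` runs the read three times on the server and still returns the same bytes as a single successful call.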
Idempotency

• Regular Unix system call
  • write(fd, buf, size)
  • Writes size bytes at current position, moves position forward by size
• Idempotent version
  • pwrite(fd, buf, size, offset)
  • Idempotent operations in NFS hidden from user programs
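
The difference between the two calls can be seen with a small experiment. The sketch below uses Python's `os.write` and `os.pwrite` wrappers around the same system calls and assumes a POSIX system; the helper names are made up for the example.

```python
import os
import tempfile


def repeated_write(fd, data, times):
    """write(): each call advances the file position, so repeats pile up."""
    for _ in range(times):
        os.write(fd, data)


def repeated_pwrite(fd, data, offset, times):
    """pwrite(): the offset is explicit, so repeats land on the same bytes."""
    for _ in range(times):
        os.pwrite(fd, data, offset)


def demo():
    """Issue each operation twice on a fresh file and return the contents."""
    results = {}
    actions = (
        ("write", lambda fd: repeated_write(fd, b"abc", 2)),
        ("pwrite", lambda fd: repeated_pwrite(fd, b"abc", 0, 2)),
    )
    for name, action in actions:
        fd, path = tempfile.mkstemp()
        try:
            action(fd)
            results[name] = os.pread(fd, 100, 0)
        finally:
            os.close(fd)
            os.unlink(path)
    return results
```

Repeating the `write` leaves `abcabc` in the file, while repeating the `pwrite` leaves `abc`, which is why a retried request is harmless only in the second form.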

Distributed Caching
• Local filesystems have caches
• Use caches to offload network traffic
  • Same object replicated in many caches
  • No problem for reads
• What happens on write/update?
  • Multiple different copies of data?
  • What happens if it’s metadata?

Distributed Write Problem

• Possible approaches
  • Disallow caching on writes
    • What about emacs?
  • Disallow caching of shared files
    • What happens for really big files?
  • Disallow caching of metadata writes
    • What disk blocks does OS care about?
Sun’s Write Philosophy
• File block write sharing not an issue
• Very few programs do it
• Correctness depends on program
• Reduce window of opportunity
  • Flush dirty blocks periodically
  • Flush can be asynchronous
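
A minimal sketch of periodic flushing on the client side, to make the "window of opportunity" concrete. The class name, the 30-second default, and the `send_to_server` callback are assumptions for illustration; the flush here is synchronous for simplicity, whereas a real client could hand the dirty blocks to a background thread.

```python
class WriteBackCache:
    """Client-side block cache that pushes dirty blocks out on a timer."""

    def __init__(self, flush_interval=30.0):
        self.flush_interval = flush_interval  # seconds; assumed value
        self.blocks = {}                      # block number -> bytes
        self.dirty = set()                    # blocks not yet on the server
        self.last_flush = 0.0

    def write(self, block_no, data):
        # Writes complete locally; the server sees them at the next flush.
        self.blocks[block_no] = data
        self.dirty.add(block_no)

    def tick(self, now, send_to_server):
        """Flush every dirty block if the interval has elapsed.

        Returns the number of blocks sent to the server.
        """
        if now - self.last_flush < self.flush_interval:
            return 0
        flushed = sorted(self.dirty)
        for block_no in flushed:
            send_to_server(block_no, self.blocks[block_no])
        self.dirty.clear()
        self.last_flush = now
        return len(flushed)
```

Between two flushes other clients can still read the old block from the server, so a shorter interval narrows the window without closing it.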

Metadata Operations
• Performed synchronously at server
• Must be reflected to disk
  • Why: stability
  • Overhead: disk op + network
• Can we speed up synchronous ops?
New Statelessness Problems

• Stale file handle problem
  • cd ~vivek/temp1/temp in window A
  • rm -r ~vivek/temp1 in window B
  • “ls” in window A
• Stale inode problem
  • Machine A gets file for read
  • Filesystem reformatted by admin
  • Machine A modifies file, tries to write
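
One common way for a stateless server to detect such cases is to put a generation number in the file handle next to the inode number, and to reject handles whose generation no longer matches. The sketch below is a toy model of that idea; the `Server` class and its methods are hypothetical and not the actual NFS implementation.

```python
import errno


class Server:
    """Toy file server whose handles are (inode, generation) pairs."""

    def __init__(self):
        self.generation = {}  # inode number -> current generation
        self.data = {}        # inode number -> file contents

    def create(self, inode, contents):
        # Reusing an inode bumps its generation, invalidating old handles.
        self.generation[inode] = self.generation.get(inode, 0) + 1
        self.data[inode] = contents
        return (inode, self.generation[inode])

    def remove(self, inode):
        del self.data[inode]

    def read(self, handle):
        inode, gen = handle
        if inode not in self.data or self.generation[inode] != gen:
            raise OSError(errno.ESTALE, "stale file handle")
        return self.data[inode]
```

A client holding a handle from before the file was removed gets `ESTALE` instead of silently reading or overwriting whatever now occupies that inode.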

What Slows Down Servers

• Network overhead
  • Disk DMA in 4KB pieces
  • Network processing in 1500 byte packets + manipulation
  • Multiple CPUs
• Synchronous operations
  • Nonvolatile memory + recovery
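
As a rough illustration of the mismatch between disk and network units, the arithmetic below counts how many packets one 4KB block needs. The 28 bytes of per-packet header overhead (IPv4 plus UDP, no options) is an assumption for the example, not a figure from the slides.

```python
import math

BLOCK_BYTES = 4096   # one disk DMA unit, from the slide
MTU_BYTES = 1500     # Ethernet packet size, from the slide
HEADER_BYTES = 28    # assumed per-packet overhead (IPv4 + UDP)


def packets_per_block(block=BLOCK_BYTES, mtu=MTU_BYTES, headers=HEADER_BYTES):
    """Number of packets needed to carry one disk block."""
    payload = mtu - headers
    return math.ceil(block / payload)
```

Under these assumptions each block that the disk delivers in a single DMA transfer is split into several packets, each of which costs the CPU separate processing.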