NFS & Distributed Systems Issues
Vivek Pai
Dec 12, 2002

The Next Project
- Behavioral spec
- Implementation up to you
- Can assume a max of 32 procs/threads
- Use a simple counter to implement simple counts
- I may release a tool to make testing easier
- But feel free to use ApacheBench, etc.

Behavioral Spec
- The following behavioral spec is important
- If there aren’t enough free processes/threads, the server should spawn one per second
- If there are too many free, one should be killed per second
- This should not depend on any other activity in the system

Caching Mmap
- Always use mmap
- Keep a cache of active & inactive maps
- Total cache size in KB should be limited by a command-line argument
- Can only exceed this limit if all mappings are active

Man Pages You May Like
- mmap, munmap
- man -k pthread
- flock, lockf
- sleep
- signal
- alarm

Being A Good User
- Do not fork wildly
- Try to test on a non-shared system

Imagine The Following
- Everyone has a desktop machine
- Each machine has a user
- Each user has a home directory
- What problems arise?
  • Can’t move between machines
  • Can’t easily share files with others
  • How does this data get backed up?

Was It Always Like This?
- No
- Think mainframes:
  • Big, centralized box
  • All disks attached
  • Programs ran on the box
  • Only terminals/monitors on each desk

How Did We Get Here?
- Mainframe killers advocated little boxes
- Lots of little boxes are a distributed system
- Distributed systems introduce new problems

Why Use Little Boxes?
- Little boxes are cheap
- Little boxes are disposable
- Easier to order a PC than a mainframe
- No need for a maintenance contract
- Economy of scale: design cost amortized over more units

Were Minis Immune?
- Minicomputers were “department”-sized versus “company”-sized
- Most information not shared among everyone
- Administrator per department OK
- Shared resources only within department OK

Why Not Just Shared Disk?
- Centralized storage
  • Easier administration/backup
  • Better use of capacity
  • Easier to build a large filesystem cache
  • Easier to provide AC/power
- Problem: compare bandwidth
  • 10 Mbit/sec Ethernet at the time
  • Switched versus shared irrelevant

New Problem
- Single point of failure
- Means everything depends on this item
- In other cases, duplication helps
- Common failures = reboot
  • But all information (state) lost
  • All clients would have to be told
  • We’d need to keep track of all clients, on stable storage!

Toward Statelessness
- Make the server as dumb as possible
- Shift burdens to the client side
- Client failure only harms that client
- Each operation is self-contained
- Repeating operations is permissible
  • Idempotent: repeating causes no change

Idempotency
- Regular Unix system call: write(fd, buf, size)
  • Writes size bytes at the current position, moves the position forward by size
- Idempotent version: pwrite(fd, buf, size, offset) (see the sketch below)
- Idempotent operations in NFS are hidden from user programs

Distributed Caching
- Local filesystems have caches
- Use caches to offload network traffic
- Same object replicated in many caches
- No problem for reads
- What happens on write/update?
  • Multiple different copies of data?
  • What happens if it’s metadata?

Distributed Write Problem
- Possible approaches:
  • Disallow caching on writes
    - What about emacs?
  • Disallow caching of shared files
    - What happens for really big files?
  • Disallow caching of metadata writes
    - What disk blocks does the OS care about?
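The Idempotency slide above contrasts write() with pwrite(). A minimal sketch of that difference, not from the slides (the filename is made up for illustration): repeating pwrite() leaves the file unchanged because the offset is part of the request, while repeating write() changes the file again because it depends on the descriptor's implicit position, which is exactly the hidden state NFS tries to avoid.

    /* Sketch: write() is stateful, pwrite() is idempotent. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        const char *msg = "hello\n";
        int fd = open("demo.txt", O_RDWR | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); return 1; }

        /* Stateful: each call advances the implicit file offset. */
        write(fd, msg, strlen(msg));
        write(fd, msg, strlen(msg));      /* file now holds the line twice */

        /* Idempotent: offset is explicit, so a retry rewrites the same bytes. */
        pwrite(fd, msg, strlen(msg), 0);
        pwrite(fd, msg, strlen(msg), 0);  /* no further change to the file */

        close(fd);
        return 0;
    }

This is why an NFS client can simply resend a lost request: replaying an offset-carrying write produces the same file contents as executing it once.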
Sun’s Write Philosophy
- File block write sharing not an issue
- Very few programs do it
- Correctness depends on the program
- Reduce the window of opportunity
  • Flush dirty blocks periodically
  • Flush can be asynchronous

Metadata Operations
- Performed synchronously at the server
- Must be reflected to disk
- Why: stability
- Overhead: disk op + network
- Can we speed up synchronous ops? (see the sketch at the end)

New Statelessness Problems
- Stale file handle problem
  • cd ~vivek/temp1/temp in window A
  • rm -r ~vivek/temp1 in window B
  • “ls” in window A
- Stale inode problem
  • Machine A gets file for read
  • Filesystem reformatted by admin
  • Machine A modifies file, tries to write

What Slows Down Servers
- Network overhead
- Disk DMA in 4KB pieces
- Network processing in 1500-byte packets + manipulation
- Multiple CPUs
- Synchronous operations
- Nonvolatile memory + recovery
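To make the Metadata Operations slide concrete, here is a minimal sketch of what "metadata must be reflected to disk before the server answers" looks like with ordinary POSIX calls (the paths are made up, and the parent directory "dir" is assumed to already exist): after creating a file, the server would fsync() the file and also fsync() the parent directory, because the new directory entry is metadata too.

    /* Sketch: make a create durable, file contents and directory entry alike. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* Create the file and force its data/inode to stable storage. */
        int fd = open("dir/newfile", O_WRONLY | O_CREAT | O_EXCL, 0644);
        if (fd < 0) { perror("open"); return 1; }
        if (fsync(fd) < 0) { perror("fsync file"); return 1; }
        close(fd);

        /* The directory entry is metadata too: fsync the parent directory. */
        int dirfd = open("dir", O_RDONLY | O_DIRECTORY);
        if (dirfd < 0) { perror("open dir"); return 1; }
        if (fsync(dirfd) < 0) { perror("fsync dir"); return 1; }
        close(dirfd);

        return 0;   /* only now could a server safely reply "done" to the client */
    }

The cost visible here, one or more synchronous disk operations per metadata update on top of the network round trip, is exactly the overhead the slide asks whether we can speed up, for example with nonvolatile memory plus recovery.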