Disk Scheduling In Linux
It’s really quite complex!

My Goals
• Teach a little bit of Computer Science.
• Show that the easy stuff is hard in real life.
• BTW, Operating systems are cool!

How A Disk Works
• A disk is a bunch of data blocks.
• A disk has a disk head.
• Data can only be accessed when the head is on that block.
• Moving the head takes FOREVER.
• Data transfer is fast (once the disk head gets there).

The Problem Statement
• Suppose we have multiple requests for disk blocks. Which should we access first?
P.S. Yes, order does matter … a lot.

Formal Problem Statement
• Input: A set of requests.
• Input: The current disk head location.
• Input: Algorithm state (direction??)
• Output: The next disk block to get.
P.S. Not the whole ordering, just the next one.

The Goals
• Maximize throughput
  – Operations per second.
• Maximize fairness
  – Whatever that means.
• Avoid Starvation
  – And very, very long waits.
• Real Time Concerns
  – These can be life threatening (and bank accounts too!).

What’s Not Done
• Every Operating system assigns priorities to all sorts of things
  – Requests for RAM
  – Requests for the CPU
  – Requests for network access
• Few use disk request priority.

Why Talk About This Now?
• Because in the last year Linux has had three very different schedulers, and they’ve been tested against each other.

A Pessimal Algorithm
• Choose the disk request furthest from the current disk head.
• This is known to be as bad as any algorithm without idle periods.

An Optimal Algorithm
• Choose the disk request closest to the disk head.
• It’s Optimal!

Optimal Analyzed
• This is known to have the highest performance in operations per second.
• It’s unfair to requests toward the edges of the disk.
• It allows for starvation.

First Come First Serve
• Serve the requests in their arrival order.
  – It’s fair.
  – It avoids starvation.
  – It has medium lousy performance.
  – Some simple OSs use this.

Elevator
• Move back and forth, solving requests as you go.
• Performance
  – Good.
• Fairness
  – Files near the middle of the disk get 2x the attention.

Cyclic Elevator
• Move toward the bottom, solving requests as you go.
• When you’ve solved the lowest request, seek all the way to the highest request.
  – Performance penalty occurs here.

Cyclic Elevator Analyzed
• It’s fair.
• It’s starvation-proof.
• It has very good performance.
  – Almost as good as elevator.
• It’s used in real life (and every textbook).

Deadline Scheduler
• Each request has a deadline.
• Service requests using cyclic elevator.
• When a deadline is threatened, skip directly to that request.
• For Real Time (which means xmms)
Author: Jens Axboe

Deadline Analyzed
• Gives Priority to Real Time Processes.
• Fair otherwise.
• No starvation
  – Unless a real time process goes wild.

Anticipatory Scheduling (The Idea)
• Developed by several people.
• Coded by Nick Piggin.
• Assume that an I/O request will be closely followed by another nearby one.

Anticipatory Scheduling (The Algorithm)
• After servicing a request … WAIT.
  – Yes, this means do nothing even though there is work to be done.
• If a nearby request occurs soon, service it.
• If not … cyclic elevator.

Anticipatory Scheduling Analyzed
• Fair.
• No support for real time.
• No starvation.
• Makes assumptions about how processes work in real life.
  – That’s the idleness.
  – They’d better be right.

Benchmarking the Anticipatory Scheduler
Source: http://www.cs.rice.edu/~ssiyer/r/antsched/antsched.pdf

Completely Fair Queuing (also by Jens)
• Real Time needs always come first.
• Otherwise, no user should be able to hog the disk drive.
• Priorities are OK.

The CFQ Picture
[Diagram, labels only: RT, Q1, Q2, Q20, Dispatcher, 10ms Valve, Disk Queue, Disk]
Yes, Gabe’s art is better

Analyzing CFQ
• Complex!!!!!
• Has Real Time support (Jens likes that).
• Fair, and fights disk hogs!
  – A new kind of fairness!!
• No starvation is possible.
  – Real time craziness excepted.
• Allows for priorities
  – But no one knows how to assign them.
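
The simple selection rules from the earlier slides (furthest first, closest first, first come first serve, elevator, cyclic elevator) all fit the formal problem statement: given the pending requests, the head position, and maybe a little state, return the next block. The Python sketch below is one way to write them down. It is not kernel code: the function names are made up for illustration, requests are assumed to be plain block numbers, and the deadline, anticipatory, and CFQ schedulers are left out because they need timing information.

```python
from typing import List, Optional, Tuple


def next_fcfs(pending: List[int], head: int) -> Optional[int]:
    """First come first serve: `pending` is assumed to be in arrival order."""
    return pending[0] if pending else None


def next_nearest(pending: List[int], head: int) -> Optional[int]:
    """The slides' "optimal" rule: the request closest to the head."""
    return min(pending, key=lambda b: abs(b - head)) if pending else None


def next_furthest(pending: List[int], head: int) -> Optional[int]:
    """The slides' "pessimal" rule: the request furthest from the head."""
    return max(pending, key=lambda b: abs(b - head)) if pending else None


def next_elevator(pending: List[int], head: int,
                  direction: int) -> Tuple[Optional[int], int]:
    """Elevator: keep moving in `direction` (+1 up, -1 down), reversing
    only when nothing is left ahead. Returns (block, new_direction)."""
    if not pending:
        return None, direction
    ahead = [b for b in pending if (b - head) * direction >= 0]
    if not ahead:
        # Nothing left in this direction: turn around.
        direction = -direction
        ahead = pending
    return min(ahead, key=lambda b: abs(b - head)), direction


def next_cyclic_elevator(pending: List[int], head: int) -> Optional[int]:
    """Cyclic elevator: sweep toward block 0; once the lowest request
    is done, seek all the way to the highest request."""
    if not pending:
        return None
    at_or_below = [b for b in pending if b <= head]
    return max(at_or_below) if at_or_below else max(pending)
```

Note that only the elevator needs the "algorithm state (direction??)" input; the other rules depend on nothing but the request set and the head position.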
Benchmark #1
• time (find kernel-tree -type f | xargs cat > /dev/null)
Dead: 3 minutes 39 seconds
CFQ: 5 minutes 7 seconds
AS: 17 seconds

Benchmark #2
for i in 1 2 3 4 5 6
do
    time (find kernel-tree-$i -type f | xargs cat > /dev/null ) &
done
Dead: 3m56.791s
CFQ: 5m50.233s
AS: 0m53.087s

Benchmark #3
time (cp 1-gig-file foo ; sync)
AS: 1:22.36
CFQ: 1:25.54
Dead: 1:11.03

Benchmark #4
• time ssh testbox xterm -e true
Old: 62 seconds
Dead: 14 seconds
CFQ: 11 seconds
AS: 12 seconds

Benchmark #5
• While “cat 512M-file > /dev/null”
• Measure “write-and-fsync -f -m 100 outfile”
Old: 6.4 seconds
Dead: 7.7 seconds
CFQ: 8.4 seconds
AS: 11.9 seconds

The Winner is …
• Andrew Morton said, "the anticipatory scheduler is wiping the others off the map, and 2.4 is a disaster."

What You Learned
• In Real Operating Systems …
  – Performance is never obvious.
  – No one uses the textbook algorithm.
  – Benchmarking is everything.
• Theory is useful, if it helps you benchmark better.
• Linux is cool, since we can see the development process.
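
The "benchmarking is everything" point can be tried on a toy scale. The Python sketch below replays a fixed batch of requests under a given policy and adds up how far the head travels. Everything in it is an assumption for illustration: the names are hypothetical, the request list is an invented example rather than data from the talk, all requests are assumed to be queued up front, and total head travel is only a crude stand-in for real throughput (it ignores rotation, caching, and new arrivals).

```python
from typing import Callable, List

# A policy picks the next block from the non-empty pending list.
Policy = Callable[[List[int], int], int]


def fcfs(pending: List[int], head: int) -> int:
    """First come first serve: pending is kept in arrival order."""
    return pending[0]


def nearest_first(pending: List[int], head: int) -> int:
    """Closest request to the current head position."""
    return min(pending, key=lambda b: abs(b - head))


def total_head_travel(requests: List[int], start: int, policy: Policy) -> int:
    """Serve every request using `policy`; return total blocks travelled."""
    pending = list(requests)  # copy so the caller's list is untouched
    head = start
    travelled = 0
    while pending:
        block = policy(pending, head)
        travelled += abs(block - head)
        head = block
        pending.remove(block)
    return travelled


# Illustrative request batch (made up), head starting at block 53.
requests = [98, 183, 37, 122, 14, 124, 65, 67]
print(total_head_travel(requests, 53, fcfs))           # 640
print(total_head_travel(requests, 53, nearest_first))  # 236
```

On this one batch, closest-first travels far less than arrival order, which matches the "Optimal Analyzed" slide. The real benchmarks above show why a toy number like this is not the whole story.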