Homework #1
Computer Science 600.419 Storage Systems
Johns Hopkins University
Fall 2007

This assignment is due on Thursday, October 18, 2007 at 5:00 pm. It should be turned in electronically to both randal@cs.jhu.edu and abat@cs.jhu.edu. Alternatively, a hardcopy may be placed under the door of Randal Burns’ office, NEB 222. Solutions must be legibly written (or typeset) and turned in on standard letter-sized paper. The name of the student, the date, and the assignment should appear at the top of every page.

1. Media transfer rate on a zoned disk. You have a 7,200 rpm (CAV) disk with 4 platters (8 R/W heads). The disk has 3 zones: zone 0 with 800 sectors per track, zone 1 with 640 sectors per track, and zone 2 with 480 sectors per track.

(a) What is the ideal media transfer rate from each of these zones? In this case, ideal means that you can ignore overhead from head switching and cylinder switching, i.e. head switch time and cylinder switch time are both 0.

(b) If the disk has a head switch time of 1.8 ms and a cylinder switch time of 2.4 ms, what is the maximum media transfer rate from each zone? Consider transferring data from many cylinders sequentially. You may assume that the disk performs zero-latency reads or is skewed perfectly.

(c) What is the percent degradation in media transfer rate per zone between the ideal case and the more realistic case (parts a and b respectively)? Is it uniform across all zones? Why or why not?

(d) Assuming that sectors are laid out in a rotationally optimal fashion between tracks and cylinders, what is the minimum media transfer rate when reading a track's worth of data for each zone? Note that a track's worth of data is not necessarily aligned on a track boundary.

2. Buffer management for speed matching on a zoned disk. This problem uses the disk from problem 1. Additionally, this disk has 4000 KB of on-disk cache for speed matching between the media and an 80,000 KB per second bus.
Carefully consider which values of media transfer rate you choose when answering each part.

(a) During a read, what is the minimum bus delay that can result in a buffer overflow after media transfer has begun? Give a lower bound for each zone.

(b) When reading a track's worth of data, how much data does the disk need to buffer before it begins transfer in order to guarantee that the buffer does not empty? Which zones require more buffer space and why?

(c) In the best case, how much can overlapping bus transfer with media transfer improve performance when reading a track's worth of data? Give an upper bound for percentage improvement. What does this tell us about the relationship between overlapped bus transfer and disk zoning?

3. The effect of mirroring on disk performance. This problem analyzes the effects on I/O performance of mirroring data among several disks. In this thought experiment, we are implementing a reliable virtual disk by using two disks that are copies of each other. Each disk stores the same data at the same LBN using the following rules for read and write:

    • A read request for a sector completes when the sector is read from either disk.
    • A write request for a sector completes when the sector is written to both disks.

You should assume that the disk drives have the exact same geometry and physical properties. However, variations in rotational velocity cause the rotational positions of the disk drives to be entirely independent of each other. No effort is made to synchronize the rotation of the disk drives.

(a) What are the components of service time for a disk request?

(b) Describe (qualitatively) how this layout strategy affects performance when reading a randomly selected sector. What component(s) of service time are affected?

(c) For each component of service time that changes, write an expression that quantifies the read performance relative to read performance for a single disk.
(d) Describe (qualitatively) how this layout strategy affects write performance. What component(s) of service time are affected?

(e) For each component of service time that changes, write an expression that quantifies the write performance relative to write performance for a single disk.

(f) What do your answers to the previous questions indicate about the role of fast-write caching in multi-disk systems?

(g) Synchronizing the rotation of mirrored disk drives so that heads pass over the same sectors of the disk at the same time can mitigate performance degradation. Why? What are some of the technical obstacles to synchronizing disk rotation?

4. Snapshots and copy-on-write file systems. You are the system administrator for the back offices of a small textile company in Baltimore, MD. All office staff use a single time-sharing computer system with locally-attached disks as the storage medium. You’ve configured this computer to run the Linux operating system with the Second Extended (ext2) file system. The company president, Ms. Humphries, is concerned that viruses or other malware will delete important customer and accounting information from the computer system. She has asked you to implement a backup system that will ensure that this information is not permanently lost when unauthorized modifications are made. To accomplish this, you decide to modify the ext2 codebase to include the capability to take and restore snapshots.

(a) For each of the following operations, describe which disk blocks (including both metadata and data blocks) are modified and in what manner, giving the modifications in an order that preserves the integrity of the on-disk file system at all times. State any assumptions you make.

    i. The creation of a 128 B file.
    ii. The creation of a 128 TB file.
    iii. mv /home/randal/passwd /etc
    iv. rm -rf /
    v. mkfs.ext2 /dev/sda1
    vi. chmod 666 /etc/fstab

(b) One way to create snapshots is to keep an “undo” log of all changes made to the file system.
The undo log would contain unmodified copies of all data and metadata blocks as they were before a write occurs. This undo log could be kept in a special file or on a separate storage device.

    i. Is this a good approach? Explain your reasoning in the context of your answers to (a). Include a discussion of what you feel are the most important evaluation metrics for such a system.
    ii. Could log entries be compressed: e.g., if two writes occur to block #12, then only the most recent log entry would need to be kept in the log? Why or why not? Is there a difference between the ability to compress data blocks and to compress metadata blocks?

(c) Another way to create snapshots is to always allocate new inodes and data blocks whenever a change occurs. The old inodes and data blocks are not freed or overwritten; they are simply left in place in case they are needed at some point in the future.

    i. Propose a method for tracking and locating these old inodes and blocks. For example, how could the file system support such requests as “show me /etc/passwd as of 12:01 am Wednesday” or “identify at what times /etc/passwd changed”? Assume that this snapshot functionality should generally be hidden from the user; i.e., ls /etc/*passwd* by itself only shows one file.
    ii. A potential problem with this approach is that you could quickly run out of inodes. Discuss the pros and cons of allowing flexible allocation of inodes: i.e., instead of having them in fixed locations in an allocation group, allowing any data block to serve as an inode.

(d) Bonus question: most of the historic Baltimore textile factories have long since closed their doors. Apparently they did not use modern file systems to protect their proprietary information. Ms. Humphries’s idea of protecting the company’s intellectual assets is a good one, but snapshots probably aren’t enough to adequately hedge against business threats.
Given what you know about the ext2 file system, propose another change to ext2 that will afford your users better protection against data loss caused by malware and viruses.
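
As a units check for Problem 1(a): the ideal media transfer rate of a zone is the number of bytes on one track multiplied by the number of revolutions per second. The Python sketch below illustrates this on a hypothetical 6,000 rpm disk with 500 sectors per track, not on the disk from Problem 1. The 512-byte sector size is an assumption (the assignment does not state one), and the function name ideal_media_rate is illustrative only.

```python
def ideal_media_rate(sectors_per_track, rpm, sector_bytes=512):
    """Return the ideal media transfer rate in bytes per second.

    "Ideal" means head switch and cylinder switch times are taken as zero,
    so one full track passes under the head on every revolution.
    sector_bytes=512 is an assumption; the assignment gives no sector size.
    """
    revolutions_per_second = rpm / 60.0
    bytes_per_track = sectors_per_track * sector_bytes
    return bytes_per_track * revolutions_per_second


# Hypothetical disk (not the one in Problem 1): 6,000 rpm, 500 sectors per track.
rate = ideal_media_rate(500, 6000)
print(f"{rate:.0f} bytes/s")
```

Because the disk is CAV, the revolutions per second are the same in every zone, so under this model the ideal rate scales linearly with sectors per track.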