Homework #1 - Hopkins Storage Systems Lab

Homework #1:
Computer Science 600.419
Storage Systems
Johns Hopkins University
Fall 2007
This assignment is due on Thursday, October 18, 2007 at 5:00 pm. It should be turned in electronically to both
randal@cs.jhu.edu and abat@cs.jhu.edu. Alternatively, a hardcopy may be placed under the door of Randal Burns’ office,
NEB 222. Solutions need to be legibly written (or typeset) and turned in on standard letter-sized paper. The name of
the student, date, and assignment should appear at the top of every page.
1. Media transfer rate on a zoned disk.
You have a 7,200 rpm (CAV) disk with 4 platters (8 R/W heads). The disk has 3 zones: zone 0 with 800 sectors
per track, zone 1 with 640 sectors per track, and zone 2 with 480 sectors per track.
(a) What is the ideal media transfer rate from each of these zones? In this case, ideal means that you can
ignore overhead from head switching and cylinder switching, i.e. head switch time and cylinder switch
time are both 0.
(b) If the disk has a head switch time of 1.8 ms and a cylinder switch time of 2.4 ms, what is the maximum
media transfer rate from each zone? Consider transferring data from many cylinders sequentially. You may
assume that the disk performs zero-latency reads or is skewed perfectly.
(c) What is the percent degradation in media transfer rate per zone between the ideal case and the more realistic
case (parts a and b respectively)? Is it uniform across all zones? Why or why not?
(d) Assuming that sectors are laid out in a rotationally optimal fashion between tracks and cylinders, what is
the minimum media transfer rate when reading a track's worth of data for each zone? Note that a track's worth
of data is not necessarily aligned on a track boundary.
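The quantities in problem 1 can be explored with a short Python sketch. It is only one possible model, not a reference solution: it assumes 512-byte sectors (the assignment does not state a sector size), and it treats sequential transfer as a repeating cylinder pattern of one revolution per track, one head switch between consecutive tracks, and one cylinder switch per cylinder, with perfect skew so that no extra rotational delay is incurred. The function names are illustrative.

```python
SECTOR_BYTES = 512  # assumption: the assignment does not give a sector size


def revolution_time_s(rpm):
    """Time for one full revolution, in seconds."""
    return 60.0 / rpm


def ideal_rate(sectors_per_track, rpm, sector_bytes=SECTOR_BYTES):
    """Bytes per second when head and cylinder switch times are zero."""
    return sectors_per_track * sector_bytes / revolution_time_s(rpm)


def sustained_rate(sectors_per_track, rpm, heads, head_switch_s,
                   cyl_switch_s, sector_bytes=SECTOR_BYTES):
    """Bytes per second over many cylinders under the stated model.

    Per cylinder: `heads` revolutions, `heads - 1` head switches,
    and one cylinder switch to reach the next cylinder.
    """
    t_rev = revolution_time_s(rpm)
    cylinder_bytes = heads * sectors_per_track * sector_bytes
    cylinder_time = heads * t_rev + (heads - 1) * head_switch_s + cyl_switch_s
    return cylinder_bytes / cylinder_time


def percent_degradation(ideal, actual):
    """Relative loss of `actual` with respect to `ideal`, in percent."""
    return 100.0 * (ideal - actual) / ideal
```

Calling `ideal_rate` and `sustained_rate` once per zone (800, 640, and 480 sectors per track) with `rpm=7200`, `heads=8`, `head_switch_s=0.0018`, and `cyl_switch_s=0.0024` gives the figures to compare in parts (a) through (c).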
2. Buffer management for speed matching on a zoned disk.
This problem uses the disk from problem 1. Additionally, this disk has 4000 KB of on-disk cache for speed
matching between the media and an 80,000 KB per second bus. Carefully consider which values of media
transfer rate you choose when answering each part.
(a) During a read, what is the minimum bus delay that can result in a buffer overflow after media transfer has
begun? Give a lower bound for each zone.
(b) When reading a track's worth of data, how much data does the disk need to buffer before it begins transfer
in order to guarantee that the buffer does not empty? Which zones require more buffer space and why?
(c) In the best case, how much can overlapping bus transfer with media transfer improve performance when
reading a track's worth of data? Give an upper bound for percentage improvement. What does this tell us
about the relationship between overlapped bus transfer and disk zoning?
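The two buffer questions in problem 2 reduce to simple rate arithmetic, sketched below in Python. This is a hedged illustration rather than a reference solution: it assumes constant media and bus rates, an initially empty buffer for the overflow case, and a bus that starts draining only after the pre-buffered amount has been read. Which media rate to plug in for each part is left to the reader, as the problem statement asks. The function names are hypothetical.

```python
def overflow_delay_s(buffer_bytes, media_rate):
    """Seconds for the media to fill an empty buffer while the bus is stalled.

    With the bus idle, the buffer fills at the media rate alone.
    """
    return buffer_bytes / media_rate


def prebuffer_bytes(transfer_bytes, media_rate, bus_rate):
    """Bytes to buffer before starting the bus so the buffer never empties.

    The bus needs transfer_bytes / bus_rate seconds; in that time the media
    supplies media_rate * transfer_bytes / bus_rate bytes, and the remainder
    must already be in the buffer. If the media keeps up, nothing is needed.
    """
    if media_rate >= bus_rate:
        return 0.0
    return transfer_bytes * (1.0 - media_rate / bus_rate)
```

Both functions are unit-agnostic as long as bytes and bytes per second (or KB and KB per second) are used consistently.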
3. The effect of mirroring on disk performance.
This problem analyzes the effects on I/O performance of mirroring data among several disks. In this thought
experiment, we are implementing a reliable virtual disk by using two disks that are copies of each other. Each
disk stores the same data at the same LBN using the following rules for read and write:
• A read request for a sector completes when the sector is read from either disk.
• A write request for a sector completes when the sector is written to both disks.
You should assume that the disk drives have the exact same geometry and physical properties. However, variations in rotational velocity cause the rotational position of the disk drives to be entirely independent of each
other. No effort is made to synchronize the rotation of the disk drives.
(a) What are the components of service time for a disk request?
(b) Describe (qualitatively) how this layout strategy affects performance when reading a randomly selected
sector. What component(s) of service time are affected?
(c) For each component of service time that changes, write an expression that quantifies the read performance
relative to read performance for a single disk.
(d) Describe (qualitatively) how this layout strategy affects write performance. What component(s) of service
time are affected?
(e) For each component of service time that changes, write an expression that quantifies the write performance
relative to write performance for a single disk.
(f) What do your answers to the previous questions indicate about the role of fast-write caching in multi-disk
systems?
(g) Synchronizing the rotation of mirrored disk drives so that heads pass over the same sectors of the disk at
the same time can mitigate performance degradation. Why? What are some of the technical obstacles to
synchronizing disk rotation?
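The independence assumption in problem 3 can be checked empirically with a small Monte Carlo sketch in Python. It models only rotational latency, as a fraction of one revolution drawn uniformly and independently for each disk; seek and transfer time are ignored, which is a simplifying assumption of this sketch rather than something stated in the assignment. A read finishes with the earlier of the two disks and a write with the later one, following the rules above.

```python
import random


def simulate_rotational_latency(trials=200_000, seed=0):
    """Estimate mean rotational latency, in revolutions, for three cases.

    Returns (single_disk, mirrored_read, mirrored_write).
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    single = read = write = 0.0
    for _ in range(trials):
        a = rng.random()  # latency on disk A, as a fraction of a revolution
        b = rng.random()  # latency on disk B, independent of A
        single += a
        read += min(a, b)   # read completes when either disk delivers
        write += max(a, b)  # write completes when both disks finish
    return single / trials, read / trials, write / trials
```

The estimates can be compared against the closed-form expectations for the minimum and maximum of two independent uniform random variables, which are 1/3 and 2/3 of the interval respectively.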
4. Snapshots and copy-on-write file systems.
You are the system administrator for the back offices of a small textile company in Baltimore, MD. All office
staff use a single time-sharing computer system with locally-attached disks as the storage medium. You’ve
configured this computer to run the Linux operating system with the Second Extended (ext2) file system.
The company president, Ms. Humphries, is concerned that viruses or other malware will delete important customer and accounting information from the computer system. She has asked you to implement a backup system
that will ensure that this information is not permanently lost when unauthorized modifications are made.
To accomplish this, you decide to modify the ext2 codebase to include the capability to take and restore snapshots.
(a) Describe which disk blocks (including both metadata and data blocks) are modified, in what manner,
and in what order so that the integrity of the on-disk file system is preserved at all times, during the following
operations. State any assumptions you make.
i. The creation of a 128 B file.
ii. The creation of a 128 TB file.
iii. mv /home/randal/passwd /etc
iv. rm -rf /
v. mkfs.ext2 /dev/sda1
vi. chmod 666 /etc/fstab
(b) One way to create snapshots is to keep an “undo” log of all changes made to the file system. The undo log
would contain unmodified copies of all data and metadata blocks as they were before a write occurs. This
undo log could be kept in a special file or on a separate storage device.
i. Is this a good approach? Explain your reasoning in the context of your answers to (a). Include a
discussion of what you feel are the most important evaluation metrics for such a system.
ii. Could log entries be compressed: e.g., if two writes occur to block #12, then only the most recent log
entry would need to be kept in the log? Why or why not? Is there a difference between the ability to
compress data blocks and to compress metadata blocks?
(c) Another way to create snapshots is to always allocate new inodes and data blocks whenever a change
occurs. The old inodes and data blocks are not freed or overwritten; they are simply left in place in case
they are needed at some point in the future.
i. Propose a method for tracking and locating these old inodes and blocks. For example, how could
the file system support such requests as show me /etc/passwd as of 12:01am Wednesday or identify at
what times /etc/passwd changed? Assume that this snapshot functionality should generally be hidden
from the user; i.e., ls /etc/*passwd* by itself only shows one file.
ii. A potential problem with this approach is that you could quickly run out of inodes. Discuss the pros
and cons of allowing flexible allocation of inodes: i.e., instead of having them in fixed locations in an
allocation group, to allow any data block to serve as an inode.
(d) Bonus question: most of the historic Baltimore textile factories have long since closed their doors. Apparently they did not use modern file systems to protect their proprietary information. Ms. Humphries’s
idea of protecting the company’s intellectual assets is a good one, but snapshots probably aren’t enough
to adequately hedge against business threats. Given what you know about the ext2 file system, propose
another change to ext2 that will afford your users better protection against data loss caused by malware
and viruses.
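The undo-log mechanism described in part (b) of problem 4 can be illustrated with a toy Python model. This is a minimal sketch under loud assumptions: the "disk" is an in-memory list of blocks, the class and method names are hypothetical, and nothing here reflects how ext2 is actually implemented. It only shows the idea of saving a block's old contents before each write and rolling back by replaying those saved copies newest first.

```python
class UndoLoggedDisk:
    """Toy block device that saves a block's old contents before each write."""

    def __init__(self, num_blocks, block_size=4096):
        # All blocks start zero-filled.
        self.blocks = [bytes(block_size) for _ in range(num_blocks)]
        # Entries are (block_number, old_contents), oldest first.
        self.undo_log = []

    def write(self, block_no, data):
        """Log the unmodified copy of the block, then overwrite it."""
        self.undo_log.append((block_no, self.blocks[block_no]))
        self.blocks[block_no] = data

    def snapshot(self):
        """Return a marker for the current state: the log length right now."""
        return len(self.undo_log)

    def restore(self, marker):
        """Roll back to `marker` by undoing logged writes, newest first."""
        while len(self.undo_log) > marker:
            block_no, old = self.undo_log.pop()
            self.blocks[block_no] = old
```

A snapshot here costs nothing at creation time, while every later write pays for one extra log entry; that trade-off is one of the things part (b) asks about. The sketch does not check that `data` matches the block size.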