Operating System Concepts, Chapter 12
CS 355 Operating Systems
Dr. Matthew Wright

Background: Magnetic Disks
• Rotate 60 to 200 times per second
• Transfer rate: rate at which data flows between the drive and the computer
• Positioning time (random-access time): the time to move the disk arm to the desired cylinder (seek time) plus the time for the desired sector to rotate under the disk head (rotational latency)

Disk Address Structure
• Disks are addressed as a large 1-dimensional array of logical blocks (usually 512 bytes per logical block).
• This array is mapped onto the sectors of the disk, usually with sector 0 as the first sector on the outermost cylinder; the mapping proceeds through that track, then through the rest of that cylinder, and then through the remaining cylinders working toward the center of the disk.
• Converting logical block numbers to cylinder, track, and sector numbers is difficult because:
  – Most disks have some defective sectors, which are replaced by spare sectors elsewhere on the disk.
  – The number of sectors per track might not be constant.
• Constant linear velocity (CLV): tracks farther from the center hold more bits, so the disk rotates more slowly over outer tracks (and faster over inner tracks) to keep the data rate constant (CDs and DVDs commonly use this method)
• Constant angular velocity (CAV): rotational speed is constant, so bit density decreases from inner tracks to outer tracks to keep the data rate constant

Disk Scheduling: FCFS
• Simple, but generally doesn't provide the fastest service
• Example: suppose the read/write heads start on cylinder 53, and the disk queue has requests for I/O to blocks on the following cylinders: 98, 183, 37, 122, 14, 124, 65, 67
Diagram shows read/write head movement to service the requests FCFS. Total head movement spans 640 cylinders.

Disk Scheduling: SSTF
• Shortest Seek Time First (SSTF): service the request closest to the current position of the read/write heads
• This is similar to SJF scheduling, and could starve some requests.
• Example: heads at cylinder 53; disk request queue contains: 98, 183, 37, 122, 14, 124, 65, 67
Diagram shows read/write head movement to service the requests SSTF. Total head movement spans 236 cylinders.

Disk Scheduling: SCAN
• SCAN algorithm: the disk heads start at one end of the disk, move toward the other end, then reverse and return, servicing requests along the way in each direction
• Example: heads at cylinder 53 moving toward 0; request queue: 98, 183, 37, 122, 14, 124, 65, 67
Diagram shows read/write head movement to service the requests with the SCAN algorithm. Total head movement spans 236 cylinders.

Disk Scheduling: C-SCAN
• Circular SCAN (C-SCAN): the disk heads start at one end and move toward the other end, servicing requests along the way. The heads then return immediately to the first end without servicing requests, and the sweep repeats.
• Example: heads at cylinder 53; request queue: 98, 183, 37, 122, 14, 124, 65, 67
Diagram shows read/write head movement to service the requests with the C-SCAN algorithm. Total head movement spans 383 cylinders.

Disk Scheduling: LOOK and C-LOOK
• Like the SCAN and C-SCAN algorithms, but the heads go only as far as the last request in each direction.
• Example: heads at cylinder 53; request queue: 98, 183, 37, 122, 14, 124, 65, 67
Diagram shows read/write head movement to service the requests with the C-LOOK algorithm. Total head movement spans 322 cylinders.
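The head-movement totals above can be checked with a short simulation. The following Python sketch is illustrative only (the function names and structure are assumptions, not from the slides); it computes the service order and total head movement for FCFS, SSTF, LOOK, and C-LOOK on the example queue.

```python
# A minimal sketch of the scheduling policies above, using the example from
# the slides: heads at cylinder 53, queue 98, 183, 37, 122, 14, 124, 65, 67.
# SCAN and C-SCAN are omitted because their totals also depend on how far the
# arm sweeps toward the physical ends of the disk.

def total_movement(start, order):
    """Sum of head movements (in cylinders) needed to visit 'order' in sequence."""
    moves, pos = 0, start
    for cyl in order:
        moves += abs(cyl - pos)
        pos = cyl
    return moves

def fcfs(start, requests):
    """FCFS: service requests in arrival order."""
    return list(requests)

def sstf(start, requests):
    """SSTF: always service the pending request closest to the current head position."""
    pending, pos, order = list(requests), start, []
    while pending:
        nxt = min(pending, key=lambda c: abs(c - pos))
        pending.remove(nxt)
        order.append(nxt)
        pos = nxt
    return order

def look(start, requests):
    """LOOK: sweep toward cylinder 0 first, then reverse, stopping at the last request each way."""
    lower = sorted(c for c in requests if c <= start)
    upper = sorted(c for c in requests if c > start)
    return list(reversed(lower)) + upper

def c_look(start, requests):
    """C-LOOK: sweep upward, then jump back to the lowest pending request and sweep upward again."""
    upper = sorted(c for c in requests if c >= start)
    lower = sorted(c for c in requests if c < start)
    return upper + lower

if __name__ == "__main__":
    head, queue = 53, [98, 183, 37, 122, 14, 124, 65, 67]
    for name, order in [("FCFS", fcfs(head, queue)),
                        ("SSTF", sstf(head, queue)),
                        ("LOOK", look(head, queue)),
                        ("C-LOOK", c_look(head, queue))]:
        print(f"{name:7s} order = {order}  total = {total_movement(head, order)}")
```

For this queue the sketch reproduces the FCFS total of 640, the SSTF total of 236, and the C-LOOK total of 322 cylinders quoted above.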
Selecting a Disk-Scheduling Algorithm
• Which algorithm to choose?
  – SSTF is common and better than FCFS.
  – SCAN and C-SCAN perform better for systems that place a heavy load on the disk.
  – Performance depends on the number and types of requests, and on the file-allocation method.
  – In general, either SSTF or LOOK is a reasonable choice for the default algorithm.
• The disk-scheduling algorithm should be written as a separate module of the operating system, so that it can be replaced with a different algorithm if necessary.
• Why not let the controller built into the disk hardware manage the scheduling?
  – The disk hardware can take into account both seek time and rotational latency.
  – The OS may still want to control disk scheduling itself, to guarantee priority for certain types of I/O.

Disk Management
• The operating system may also be responsible for tasks such as disk formatting, booting from disk, and bad-block recovery.
• Low-level formatting divides a disk into sectors, and is usually performed when the disk is manufactured.
• Logical formatting creates a file system on the disk, and is done by the OS.
• The OS maintains the boot blocks (or boot partition) that contain the bootstrap loader.
• Bad blocks: disk blocks may fail.
  – An error-correcting code (ECC) stored with each block can detect and possibly correct an error (if so, it is called a soft error).
  – Disks contain spare sectors, which are substituted for bad sectors.
  – If the system cannot recover from the error, it is called a hard error, and manual intervention may be required.

Swap-Space Management
• Recall that virtual memory uses disk space as an extension of main memory; this disk space is called the swap space, even on systems that implement paging rather than pure swapping.
• Swap space can be:
  – A file in the normal file system: easy to implement, but slow in practice
  – A separate (raw) disk partition: requires a swap-space manager, but can be optimized for speed rather than storage efficiency
• Linux allows the administrator to choose whether the swap space is in a file or in a raw disk partition.

RAID Structure
• RAID: Redundant Array of Independent Disks or Redundant Array of Inexpensive Disks
• In systems with large numbers of disks, disk failures are common.
• Redundancy allows the recovery of data when disk(s) fail.
  – Mirroring: a logical disk consists of two physical disks, and every write is carried out on both disks.
  – Bit-level striping: splits the bits of each byte across multiple disks, which improves the transfer rate.
  – Block-level striping: splits the blocks of a file across multiple disks, which improves the access rate for large files and allows for concurrent reads of small files.
  – A nonvolatile RAM (NVRAM) cache can be used to protect data waiting to be written in case a power failure occurs.
• Are disk failures really independent?
• What if multiple disks fail simultaneously?

RAID Levels
• RAID level 0: non-redundant striping
  – Data striped at the block level, with no redundancy.
• RAID level 1: mirrored disks
  – Two copies of the data stored on different disks.
  – Data not striped.
  – Easy to recover data when one disk fails.
• RAID 0 + 1: combines RAID levels 0 and 1
  – Provides both performance and reliability.
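To make the striping and mirroring ideas concrete, here is a small in-memory Python sketch; the class names, the dict-backed "disks", and the block size are invented for this example and are not from the slides. RAID 0 places logical block i on disk i mod N at stripe i // N, while RAID 1 writes every block to both disks and can read from either copy.

```python
# A minimal, in-memory sketch of RAID 0 (block-level striping) and RAID 1
# (mirroring), assuming fixed-size blocks. Python dicts stand in for physical
# disks; a real array would issue device I/O instead.

BLOCK_SIZE = 512

class Raid0:
    """Block-level striping: logical block i lives on disk (i % n) at stripe (i // n)."""
    def __init__(self, n_disks):
        self.disks = [dict() for _ in range(n_disks)]

    def write(self, block_no, data):
        disk, stripe = block_no % len(self.disks), block_no // len(self.disks)
        self.disks[disk][stripe] = data

    def read(self, block_no):
        disk, stripe = block_no % len(self.disks), block_no // len(self.disks)
        return self.disks[disk][stripe]

class Raid1:
    """Mirroring: every write goes to both disks; a read may use either copy."""
    def __init__(self):
        self.disks = [dict(), dict()]

    def write(self, block_no, data):
        for disk in self.disks:          # carry out the write on both disks
            disk[block_no] = data

    def read(self, block_no, failed_disk=None):
        for i, disk in enumerate(self.disks):
            if i != failed_disk:         # survive the loss of one disk
                return disk[block_no]
        raise IOError("both copies unavailable")

if __name__ == "__main__":
    stripe_set = Raid0(n_disks=4)
    for i in range(8):
        stripe_set.write(i, f"block {i}".ljust(BLOCK_SIZE).encode())
    print(stripe_set.read(5)[:8])        # logical block 5 -> disk 1, stripe 1

    mirror = Raid1()
    mirror.write(0, b"important data")
    print(mirror.read(0, failed_disk=0)) # still readable after disk 0 fails
```

In the RAID 0 case, consecutive logical blocks land on different disks, which is where the transfer-rate benefit of striping comes from; the RAID 1 case shows why recovery after a single-disk failure is straightforward.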
RAID Levels
• RAID level 2: error-correcting codes
  – Data striped across disks at the bit level.
  – Disks labeled P store extra bits that can be used to reconstruct the data if one disk fails.
  – Requires fewer disks than RAID level 1.
  – Requires computation of the error-correction bits at every write, and failure recovery requires lots of reads and computation.

RAID Levels
• RAID level 3: bit-interleaved parity
  – Data striped across disks at the bit level.
  – Since disk controllers can detect whether a sector has been read correctly, a single parity bit suffices for error detection and correction.
  – As good as RAID level 2 in practice, but less expensive.
  – Still requires extra computation for the parity bits.
• RAID level 4: block-interleaved parity
  – Data striped across disks at the block level.
  – Stores parity blocks on a separate disk; these can be used to reconstruct the blocks of a single failed disk.

RAID Levels
• RAID level 5: block-interleaved distributed parity
  – Data striped across disks at the block level.
  – Spreads data and parity blocks across all disks.
  – Avoids possible overuse of a single parity disk, which could happen with RAID level 4.
• RAID level 6: P + Q redundancy scheme
  – Like RAID level 5, but stores extra redundant information to guard against simultaneous failures of multiple disks.
  – Uses error-correcting codes such as Reed-Solomon codes.

RAID Implementation
• RAID can be implemented at various levels:
  – At the kernel or system-software level
  – By the host bus-adapter hardware
  – By storage-array hardware
  – In the storage-area network (SAN), by disk-virtualization devices
• Some RAID implementations include a hot spare: an extra disk that is not used until another disk fails, at which time the system automatically rebuilds the data onto the spare disk.

Stable-Storage Implementation
• Stable storage: storage that never loses stored information.
• Write-ahead logging (used to implement atomic transactions) requires stable storage.
• To implement stable storage:
  – Replicate the information on more than one nonvolatile storage medium with independent failure modes.
  – Update the information in a controlled manner, so that a failure during an update cannot leave all copies in a damaged state and so that we can safely recover from a failure.
• Three possible outcomes of a disk write:
  1. Successful completion: all of the data written successfully
  2. Partial failure: only some of the data written successfully
  3. Total failure: the failure occurs before the write starts; the previous data remains intact

Stable-Storage Implementation
• Strategy: maintain two (identical) physical blocks for each logical block, on different disks, with error-detection bits for each block.
• A write operation proceeds as follows:
  – Write the information to the first physical block.
  – When the first write completes, write the same information to the second physical block.
  – When the second write completes, declare the operation successful.
• During failure recovery, examine each pair of physical blocks:
  – If both are the same and neither contains a detectable error, then do nothing.
  – If one block contains a detectable error, then replace its contents with the contents of the other block.
  – If neither block contains a detectable error but the values differ, then replace the contents of the first block with those of the second.
• As long as both copies don't fail simultaneously, a write operation is guaranteed either to succeed completely or to leave the data unchanged.
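The two-block write and recovery rules above translate almost directly into code. The following Python sketch is only an illustration of the protocol as stated on the slide: the CRC checksum, file layout, and function names are assumptions for this example, and ordinary files stand in for blocks on two independent disks.

```python
# A sketch of the stable-storage write/recovery protocol described above.
# Two ordinary files stand in for the two physical blocks; a CRC plays the
# role of each block's error-detection bits. Illustrative only.

import os
import zlib

def _pack(data: bytes) -> bytes:
    """Store the data together with a CRC so corruption is detectable."""
    return zlib.crc32(data).to_bytes(4, "big") + data

def _unpack(raw: bytes):
    """Return the stored data, or None if the error-detection check fails."""
    if len(raw) < 4:
        return None
    crc, data = int.from_bytes(raw[:4], "big"), raw[4:]
    return data if zlib.crc32(data) == crc else None

def stable_write(path1, path2, data: bytes):
    """Write to the first physical block, then (and only then) to the second."""
    with open(path1, "wb") as f:
        f.write(_pack(data))
        f.flush()
        os.fsync(f.fileno())          # the first write must complete first
    with open(path2, "wb") as f:
        f.write(_pack(data))
        f.flush()
        os.fsync(f.fileno())
    # only now is the logical write declared successful

def recover(path1, path2):
    """Apply the recovery rules above to one pair of physical blocks."""
    def read(path):
        try:
            with open(path, "rb") as f:
                return _unpack(f.read())
        except FileNotFoundError:
            return None

    b1, b2 = read(path1), read(path2)
    if b1 == b2 and b1 is not None:
        return                              # identical and error-free: do nothing
    if b1 is None and b2 is not None:
        with open(path1, "wb") as f:        # block 1 damaged: copy block 2 over it
            f.write(_pack(b2))
    elif b2 is None and b1 is not None:
        with open(path2, "wb") as f:        # block 2 damaged: copy block 1 over it
            f.write(_pack(b1))
    elif b1 is not None and b2 is not None:
        with open(path1, "wb") as f:        # both readable but different: per the
            f.write(_pack(b2))              # rule above, copy block 2 into block 1

if __name__ == "__main__":
    stable_write("copy1.blk", "copy2.blk", b"logical block contents")
    recover("copy1.blk", "copy2.blk")       # a no-op after a clean write
```

Because the second write begins only after the first completes, a crash at any point leaves at least one intact copy, which is exactly what the recovery rules rely on.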
Tertiary Storage
• Most operating systems handle removable disks almost exactly like fixed disks: a new cartridge is formatted, and an empty file system is generated on the disk.
• Tapes are presented as a raw storage medium; that is, an application does not open a file on the tape, it opens the whole tape drive as a raw device.
  – Usually the tape drive is then reserved for the exclusive use of that application.
  – Since the OS does not provide file-system services, the application must decide how to use the array of blocks (see the sketch at the end of this section).
  – Since every application makes up its own rules for how to organize a tape, a tape full of data can generally be used only by the program that created it.
• The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer.
• Contemporary operating systems generally leave the name-space problem unsolved for removable media, and depend on applications and users to figure out how to access and interpret the data.
• Some kinds of removable media (e.g., CDs) are so well standardized that all computers use them the same way.
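As an illustration of what "deciding how to use the array of blocks" means, here is a hedged Python sketch of an application imposing its own fixed-size record format on raw storage. The block size, record header, and file name are invented for this example; a regular file stands in for the raw device node that a real application would open.

```python
# A sketch of an application defining its own on-"tape" record format.
# BLOCK_SIZE, the header layout, and the backing path are assumptions made for
# this example; a regular file stands in for the raw device.

import struct

BLOCK_SIZE = 4096                     # the application's own choice of block size
HEADER = struct.Struct(">I")          # each block starts with a 4-byte payload length

def write_records(path, records):
    """Write each record into its own fixed-size block, in order."""
    with open(path, "wb") as dev:
        for payload in records:
            block = HEADER.pack(len(payload)) + payload
            dev.write(block.ljust(BLOCK_SIZE, b"\x00"))   # pad to a full block

def read_records(path):
    """Read the storage back block by block, using the application's own rules."""
    out = []
    with open(path, "rb") as dev:
        while True:
            block = dev.read(BLOCK_SIZE)
            if len(block) < BLOCK_SIZE:
                break
            (length,) = HEADER.unpack(block[:HEADER.size])
            out.append(block[HEADER.size:HEADER.size + length])
    return out

if __name__ == "__main__":
    write_records("fake_tape.img", [b"first record", b"second record"])
    print(read_records("fake_tape.img"))   # only this program knows the layout
```

Because only this program knows that each block begins with a 4-byte length header, data written this way is effectively unreadable to any other application, which is exactly the portability problem described above.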