CS 470 Operating Systems, Lecture 29 (Friday, March 25)

Reminders:
- Homework 7 is due on Monday at class time for Exam 2 review; no late work accepted.
- The memory management project is now due on Friday, April 1.
- Exam 2 is on Wednesday. The Exam 2 review sheet is posted.

Questions?

Outline
- Disk systems
- Disk scheduling
- Disk management
- RAID

Disk Drives

A disk is viewed logically as a linear array of blocks. How is this array mapped onto a circular disk drive?

A disk drive is one or more platters rotating on a spindle. Each side of a platter has a head that reads the data off that side of the platter. Each platter side has concentric grooves called tracks. The vertical extent of the same track position on each platter is a cylinder. Each track/cylinder is divided into sectors.

(Figure: disk drive geometry.)

Generally, block numbers are mapped with block 0 at cylinder/track 0 (the outermost track), head 0, sector 0. The next block is sector 1, and so on until the track is full; then the next block is head 1, sector 0, etc., until the cylinder is full; then the next block is cylinder/track 1, head 0, sector 0, and so forth.

Conceptually, it is possible for an OS to map logical block numbers to <cylinder, head, sector> addresses itself, but this does not happen any more; the mapping is handled by the disk controller.
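This mapping is easy to express in code. Below is a minimal sketch in C, assuming an idealized fixed geometry (the classic 16 heads and 63 sectors per track); real drives vary the number of sectors per track and remap defective sectors, which is exactly why the controller, not the OS, does this today.

    #include <stdio.h>

    /* Assumed, idealized geometry for illustration only. */
    #define HEADS             16
    #define SECTORS_PER_TRACK 63

    struct chs { unsigned cylinder, head, sector; };

    /* Map a logical block number to a <cylinder, head, sector> address:
     * fill each track, then each head of the cylinder, in order. */
    struct chs block_to_chs(unsigned long block)
    {
        struct chs addr;
        addr.sector   = block % SECTORS_PER_TRACK;
        addr.head     = (block / SECTORS_PER_TRACK) % HEADS;
        addr.cylinder = block / (SECTORS_PER_TRACK * HEADS);
        return addr;
    }

    int main(void)
    {
        struct chs a = block_to_chs(100000);
        printf("block 100000 -> cylinder %u, head %u, sector %u\n",
               a.cylinder, a.head, a.sector);
        return 0;
    }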
One reason the mapping is done in the disk controller is that disks have been getting larger. Density has increased in three dimensions:
- # sectors/track (higher rotation speed)
- # tracks/platter (shorter seek separation)
- # bits/space (vertical writes in groove)

The components of disk performance are:
- seek time: disk arm movement to the correct cylinder
- rotational delay (latency): waiting for the correct sector to rotate under the head

Taken together with these, data access time is determined by:
- bandwidth (bytes transferred per unit time): buffer to disk, and buffer to host
- buffer size

Disk drives come in various speeds and sizes optimized for various applications. Using Western Digital as a prototypical line:

Disk drive                  | Application                                   | Sizes     | RPM      | Cache | Buffer to host    | Notes, street price
WD Caviar Blue              | Standard, internal desktop                    | 80GB-1TB  | 7200     | 32MB  | SATA 6Gb/s        | 1TB ~$65
WD Caviar Black             | Maximum speed, internal desktop               | 80GB-2TB  | 7200     | 64MB  | SATA 6Gb/s        | 2TB ~$160
WD Caviar Green             | Maximum capacity, low power, internal desktop | 320GB-3TB | variable | 64MB  | SATA 3Gb/s        | 3TB ~$215
WD VelociRaptor             | Internal, enterprise server                   | 150-600GB | 10000    | 32MB  | SATA 6Gb/s        | 600GB ~$250
WD Scorpio Blue             | Standard, internal laptop                     | 80GB-1TB  | 5200     | 8MB   | SATA 3Gb/s        | 1TB ~$110
WD Scorpio Black            | Maximum power, internal laptop                | 160-750GB | 7200     | 16MB  | SATA 3Gb/s        | 750GB ~$110
WD AV-25                    | 24/7 surveillance                             | 160-500GB | 5400     | 32MB  | SATA 3Gb/s        | MTBF 1 million hours; 500GB ~$72
WD My Book Essential        | External desktop                              | 80GB-2TB  |          |       | USB 3.0 (5Gb/s)   | 2TB ~$120
WD My Passport Elite        | External portable                             | 250-640GB |          |       | USB 2.0 (480Mb/s) | 640GB ~$113
WD My Book World Edition II | Networked                                     | 2-4TB     |          |       | Ethernet          | RAID 1/0 (2 drives in box); 4TB ~$340

Toshiba makes a 240GB, 4200 RPM, 8MB cache disk drive that sells for ~$230. Why would anyone want to buy this small, slow, expensive drive?

What is the limit on the capacity of a disk drive using conventional magnetic media?
- Typical drives are ~250Gb/sq.in.; the Toshiba drive above is ~344Gb/sq.in., which is one answer to the question: it packs bits more densely than anything else on the market.
- The current limit is ~500Gb/sq.in.
- The theoretical limit is ~1Tb/sq.in.; with any smaller grains, heat will change the magnetization of the bits.
- Seagate is researching ways of packing more bits, theoretically up to 50Tb/sq.in.

Disk Scheduling

As with all resources, we can extract the best performance if we schedule disk accesses. Scheduling is now mostly done in the disk controller, because the OS no longer knows the true geometry:
- The original IDE interface could report a maximum of 16383 cylinders x 16 heads x 63 sectors = 8.4GB. All disks do this now, and the EIDE interface was added to find the actual capacity using LBA (logical block addressing).
- Most disks map out defective sectors to spare ones.
- The number of sectors per track is not constant: there are about 40% more sectors on outer tracks than on inner tracks.

The OS generally just makes requests to the controller. The controller has a queue and a scheduling algorithm to choose which request is serviced next. The algorithms are straightforward and have properties similar to the other scheduling algorithms we have studied; one is sketched below. OS's are now more concerned with disk management, i.e., how to make a disk usable to users.
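As an illustration, here is a minimal sketch in C of one classic algorithm, SCAN (the "elevator" algorithm): the arm services all requests in one direction, then reverses. The queue contents and starting head position are made-up values for illustration; real controllers do this internally. (Strictly, this sketch reverses at the last request rather than at the disk edge, which is the LOOK variant, but the service order is the same.)

    #include <stdio.h>
    #include <stdlib.h>

    /* qsort comparison helper: ascending cylinder order. */
    static int cmp(const void *a, const void *b)
    {
        return *(const int *)a - *(const int *)b;
    }

    /* SCAN scheduling: starting at head_pos and moving toward higher
     * cylinders, service every request in that direction, then reverse
     * and service the remaining requests on the way back. */
    void scan(int *req, int n, int head_pos)
    {
        qsort(req, n, sizeof req[0], cmp);

        int i = 0;                        /* first request at/above the head */
        while (i < n && req[i] < head_pos)
            i++;

        for (int j = i; j < n; j++)       /* sweep up */
            printf("service cylinder %d\n", req[j]);
        for (int j = i - 1; j >= 0; j--)  /* sweep back down */
            printf("service cylinder %d\n", req[j]);
    }

    int main(void)
    {
        int queue[] = { 98, 183, 37, 122, 14, 124, 65, 67 };  /* made-up queue */
        scan(queue, sizeof queue / sizeof queue[0], 53);      /* head at cyl 53 */
        return 0;
    }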
Formatting

Low-level (physical) formatting is done at the factory, but the OS can do this, too. File system formatting involves:
- Creating a partition table that groups cylinders into virtual disks, with tools like fdisk, sfdisk, or PartitionMagic.
- Creating the file system; in Unix, mkfs allocates the inodes (index blocks).
- Creating swap space.

Boot Block

How does a computer find the OS to boot? We cannot require that it be in a particular location on a particular disk, since we may want to choose among more than one. A bootstrap loader is a program that loads OS's. It could be stored in ROM, but then it would be hard to change. Usually a very small loader stored in ROM knows where the full loader program is in the boot block (aka the MBR, master boot record). Example loaders include grub, lilo, the Windows loader, ...

Boot loaders know how to initialize the CPU and bring up the file system. They are configured to know where the OS program code resides; e.g., grub knows the kernel images are in the file system, usually under /boot. The boot loader loads the kernel into memory, then jumps to the first instruction of the OS. From there, the OS takes over.

Bad Blocks

All disks have bad areas. The factory initially maps out the blocks that would have been allocated to these areas. (Too many of them causes the disk to be rejected.) Some disk controllers are "smart" (e.g., SCSI) and automatically remap bad blocks when they are encountered; spare sectors are reserved on each cylinder for this purpose. Other controllers rely on the OS to inform them; e.g., Windows marks FAT entries after a chkdsk scan.

Swap Space

Usage of swap space depends on the memory management algorithm and the OS. Some systems store the entire program and its data in swap space for the duration of execution; others store only the pages being used.

Swap space issues include:
- file vs. disk partition: usually a raw partition with a dedicated manager, for speed
- single vs. multiple spaces
- location: if single, usually in the center of the disk; multiple spaces make sense only with multiple disks
- size: running out means aborting processes, but more real memory means less need to swap

RAID

Disks have gotten physically smaller and much cheaper, so we want to combine multiple disks into one system, both to increase read/write performance and to improve reliability. Initially, RAID stood for Redundant Arrays of Inexpensive Disks, with a focus on providing large amounts of storage cheaply. The focus is now on reliability, so RAID is Redundant Arrays of Independent Disks.

Reliability is characterized by mean time to failure (MTF), e.g., 100,000 hours for a disk. For an array of 100 disks, the mean time until some disk fails is 100000/100 = 1000 hours, or about 41.7 days(!). If only one copy of each piece of data is stored, each failure is costly. To solve this problem, we introduce redundancy, i.e., we store extra information that can be used to rebuild lost information.

The simplest redundancy is to mirror a disk, i.e., create a duplicate. Every write goes to both disks, and a read can go to either one. The only way to lose data is if the second disk fails during the time it takes to repair the first. The MTF of the system therefore depends on the MTF of the disks and the mean time to repair (MTR).

If disk failures are independent and the MTR is 10 hours, the MTF for data loss is 100,000^2 / (2 x 10) hours = 500 x 10^6 hours, or about 57,000 years(!). Of course, many failures are not independent: power failures, natural disasters, manufacturing defects, etc.

Performance is increased through parallelism. E.g., for a mirrored disk, the transfer rate is the same as for a single disk, but the overall read rate doubles. The transfer rate can be improved by striping data across multiple disks. E.g., with 8 disks we can write one bit of each byte to each disk simultaneously; the number of accesses per unit time is the same, but each access reads 8 times as much data. Striping in larger units, such as blocks, is common.

RAID Levels

Striping does not help with reliability, and mirroring is expensive. Various schemes, called RAID levels, provide both with different tradeoffs. RAID 0 is simple striping; RAID 1 is simple mirroring. Higher levels are more complicated.

RAID 0+1 and RAID 1+0

Schemes can also be combined. RAID 0+1 is a mirrored pair of RAID 0 (striped) arrays; RAID 1+0 stripes data across RAID 1 mirrored pairs.
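To make block striping and mirroring concrete, here is a minimal sketch in C of how a RAID 1+0 array might map a logical block to a mirrored pair and an offset. The four-pair configuration and the one-block stripe unit are assumptions for illustration; real arrays use configurable stripe sizes.

    #include <stdio.h>

    #define NUM_PAIRS 4   /* assumed: 4 mirrored pairs = 8 physical disks */

    struct location {
        unsigned pair;    /* which mirrored pair holds the block */
        unsigned offset;  /* block offset within each disk of the pair */
    };

    /* RAID 1+0 with a stripe unit of one block: logical blocks are
     * striped round-robin across the mirrored pairs; both disks of the
     * chosen pair hold a copy at the same offset. */
    struct location map_block(unsigned long block)
    {
        struct location loc;
        loc.pair   = block % NUM_PAIRS;
        loc.offset = block / NUM_PAIRS;
        return loc;
    }

    int main(void)
    {
        for (unsigned long b = 0; b < 8; b++) {
            struct location loc = map_block(b);
            printf("logical block %lu -> pair %u, offset %u (both disks)\n",
                   b, loc.pair, loc.offset);
        }
        return 0;
    }

Each write goes to both disks of the selected pair, while a read can be served by either disk of the pair, which is the source of the doubled read rate noted above.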