Storage Systems
CSE 598d, Spring 2007
Lecture 3: Disk drive trends, modeling (contd.)
Feb 1, 2007

• Topics
  – Disk drive modeling
  – SCSI vs ATA
  – Rules of thumb in data engineering

Disk Drive Modeling
• Modeling is difficult because disk behavior is
  – Non-linear
  – State-dependent
• Not easy to model analytically
• Pitfalls
  – Assuming seek time is linear w.r.t. distance
  – Assuming a uniform distribution for rotational latency
  – Assuming constant transfer times
  – Ignoring bus contention

Comparing four different models
• (i) Constant fixed time for each I/O
• (ii) Simple model, in which
  – Seek time is linear with distance
  – There are no head settle/switch costs
  – Rotational delay is uniform
  – Controller costs are fixed
  – Transfer costs are linear
• (iii) Better seek and positioning model (d = seek distance in cylinders)
  – (a) 3.45 + 0.597*sqrt(d) ms, for d < 616 cylinders
  – (b) 10.8 + 0.012*d ms, for d >= 616 cylinders
  – 2.5 ms for a head/track switch
  – Keeps track of rotational position
• (iv) All of (iii) + cache model + read-ahead + bus speed + controller overheads
• Chosen metric for comparison: relative demerit

Results
[Figure: results for models (i), (ii), (iii) and (iv); annotation: "This disk does not have a cache!"]
[Figure: disk with a cache, models (iii) and (iv)]

Conclusions
• The following aspects are important:
  – Disk cache/buffer (112)
  – Data transfer model (20)
    • Overlaps with bus transfer, seek time, head switching
  – Rotational position, data layout (2)
• While these are not (that important)

How do we get the drive parameters for such modeling?
• Manuals/data sheets
  – Not everything is published
  – Actual values can still vary
• Interrogative extraction
  – Although the SCSI interface is extensive, a given drive may not support all of it
  – Several more parameters may be needed
• Empirical/experimental extraction
  – This is hard

Complications of empirical extraction
• Overlapping controller overheads, bus transfers, mechanical delays, etc.
• Contention for shared resources
• Cache segmentation
• Prefetching
• Non-uniformity in performance (e.g., seeks)
• Large, seemingly non-deterministic delays (e.g., thermal recalibration)
• Fluctuations in timing
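To make the model (iii) seek curve from "Comparing four different models" concrete, here is a minimal Python sketch that simply evaluates the two fitted formulas. The function and constant names are hypothetical, and treating a zero-cylinder seek as free is an assumption (the slide gives no value for d = 0).

```python
import math

# Model (iii) seek curve: distance d in cylinders, result in milliseconds.
SHORT_SEEK_LIMIT = 616  # cylinders; boundary between the two fitted regimes
HEAD_SWITCH_MS = 2.5    # head/track switch cost used by model (iii)


def seek_time_ms(d: int) -> float:
    """Return the model (iii) seek time for a seek of d cylinders."""
    if d < 0:
        raise ValueError("seek distance must be non-negative")
    if d == 0:
        # Assumption: no arm movement means no seek cost (not stated on the slide).
        return 0.0
    if d < SHORT_SEEK_LIMIT:
        # Short seeks: the arm is mostly accelerating/decelerating, so time grows with sqrt(d).
        return 3.45 + 0.597 * math.sqrt(d)
    # Long seeks: the arm coasts at top speed for most of the distance, so time grows linearly.
    return 10.8 + 0.012 * d


if __name__ == "__main__":
    for distance in (1, 100, 615, 616, 1000):
        print(distance, round(seek_time_ms(distance), 2))
```

The two pieces do not meet exactly: at 615 cylinders the short-seek formula gives about 18.26 ms, while at 616 the long-seek formula gives about 18.19 ms, which is typical of a piecewise curve fit.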
Parameters needed for modeling
• Data layout
• Seek, rotational latency and transfer costs
• Bus, controller and host processing costs
• Caching and prefetching parameters

Data Layout Parameters
• Where does a block actually reside on disk?
• The mapping may need to be re-acquired after each format (a reallocated defect may be converted to a slipped defect for better efficiency)
• The SCSI SEND DIAGNOSTIC/RECEIVE DIAGNOSTIC RESULTS commands can be used to query the actual location of a block
  – Doing this for every block would be very time-consuming

Storage Systems
CSE 598d, Spring 2007
Lecture 4: Disk drive trends, modeling (contd.)
Feb 8, 2007

Empirical Extraction
• Repeatedly send pairs of requests (a, b) to the disk and measure their Mean Time Between Request Completions, MTBRC(a, b)
• The rotational distance between the two requests is varied until the minimum MTBRC is reached

Extracting Head Switch Time
• MTBRC1 = MTBRC(1-sector write, 1-sector read on the same track) = Host1 + Cmd + Media + Bus + Comp
• MTBRC2 = MTBRC(1-sector write, 1-sector read on a different track of the same cylinder) = Host2 + Cmd + HdSw + Media + Bus + Comp
• HdSw = (MTBRC2 - Host2) - (MTBRC1 - Host1)

Extracting Seek Times
(i) For each seek distance, select 5 evenly spaced starting points. From each point, perform 10 inward and 10 outward seeks of that distance, and average the results.
(ii) Measure MTBRC(1-sector write, 1-sector read on the same track) and MTBRC(1-sector write, 1-sector read on the next cylinder). The difference between these is the mechanical time of a 1-cylinder seek.
(iii) Subtract (ii) from the 1-cylinder value of (i). The difference is the non-mechanical overhead of a seek.
Finally, subtract (iii) from each of the values obtained in (i).

[Figure: typical seek-time profile]

Extracting Rotation Speed
• Perform a series of 1-sector writes to the same location and calculate the mean time between completions; since each write must wait for the sector to come around again, this approximates one revolution.

Extracting Cache Segments, Size …
• Let N be the hypothesized number of segments.
• Perform 1-sector reads of the first logical block of each of the first N-1 cylinders
• Perform a 1-sector read of the first logical block of the last data cylinder
• Perform a 1-sector read of the first logical block of the first cylinder. If that is a hit (judged by response time), the number of segments is N or greater.

Extraction techniques for
• Segment size
• Do prefetched data replace requested data in the current segment?
• Are all requested data always thrown away?
• Does prefetching stop at track/cylinder boundaries?
• Is the prefetch size proportional to the request size?
• Does the drive implement read-on-arrival? Write-on-arrival?
• Is cache space allocated on a track or sector basis?
• Can READs hit on data placed in the cache by WRITEs?
• What is the segment replacement algorithm?

[Figure: the physical I/O path from CPU to the disk: CPUs and RAMs on the system bus; bridge chip to the host I/O bus (PCI, InfiniBand); SCSI, FC and iSCSI HBAs, graphics card and Ethernet NIC on the host I/O bus; SCSI, Fibre Channel and IP LAN links]

I/O buses
• System bus
  – Rapid data transfer between CPU and memory
• Host I/O bus
  – Common: PCI; emerging: InfiniBand
• Device drivers are responsible for control of and communication with peripheral devices of all types
  – For storage devices, part of the device driver is almost always realized as firmware running on special processors (ASICs)
    • ASICs are either partially integrated into the main circuit board (e.g., onboard SCSI controllers) or connected to it via add-on cards (PCI cards)
  – Storage devices are connected to the server via a host bus adapter (HBA)
  – The communication connection between the HBA and the peripheral device is called the I/O bus
• Similar I/O paths and techniques are used within a disk subsystem

I/O bus technologies
• SCSI
• ATA/IDE, Serial ATA (SATA)
• SCSI over IP (iSCSI)
• Fibre Channel
• USB
• … many more

SCSI basics
• Small Computer System Interface
• First version released in 1986
  – Many versions since
• The dominant technology for UNIX and PC servers
  – Assignment: find out what
    your laptop/desktop uses
• Both a communication protocol and a bus
• Parallel bus for data, plus additional lines for controlling communication

SCSI basics (more)
• A daisy chain can connect up to 16 devices
• The SCSI protocol defines
  – How devices reserve the bus
  – In what format data is transferred
  – Initial versions: message, then ACK, then the next message
  – Latest versions: asynchronous issuance, multiple messages in transit at once, increased data rate

SCSI vs ATA (enterprise storage, ES, vs personal storage, PS): Motivating Factors
• Cost (market demands)
• Form factor
• Configuration in groups
• Reliability
• Access patterns

Leading to differences in …
• Mechanics
• Materials
• Electronics
• Firmware
• Performance (RPM and seeks)
• Reliability
• Power consumption
• …

Differences in Mechanics
• ES head/disc assembly
  – Sustains higher disturbance
  – Higher rigidity
  – More mass
  – Higher-bandwidth servos
  – Avoids through holes
  – Filter for particles, desiccant for humidity, carbon absorbent for organic materials
  – Better air-flow hardware
  – O-ring seals for the spindle
  – Higher-quality sealing

Mechanics (contd.)
• Actuator
  – Larger magnets for faster seeks
  – Lower-resistance (thicker, fewer windings) actuator coils
  – The latch (which holds the actuator when powered off) can affect seek performance; ES drives compensate for this with a bi-stable latch
• Spindle
  – Higher RPM => windage and vibrations
  – PS drives hold the motor with a cantilever design (captured only at the base), while ES drives capture the motor at both ends

Differences in Electronics
• The electronics must take and process commands from the host and perform head positioning, servo processing, data transfers, cache management, etc.
  – PS drives may not have a separate servo processor (to handle repeatable or non-repeatable runouts)
  – ES ASIC gate count is 2X the PS gate count
  – ES firmware code size is 2X the PS firmware code size (to handle more concurrency)
  – ES cache space is 10X the PS cache space

Differences in Magnetics
• More or less similar (there is no reason the latest advances cannot be used in both)
• The main differences are in the electronics needed to provide an adequate signal-to-noise ratio (SNR) at the higher RPM of ES drives

Differences in Performance
• Capacity
  – Areal density is similar, since both use the same magnetics
  – Differences come from the number and size of platters
• Size of platters
  – Power is nearly cubic in platter size
  – To sustain higher RPM, ES drives use smaller platters (2.5” and lower), which also helps seeking
• Number of platters
  – The trend is towards de-populated drives, since more drives can be used to meet capacity demands in ES environments

Performance (contd.)
• Data rates
  – Although higher RPM favors ES, PS benefits from larger platter size and more frequent introduction of newer models

Performance (contd.)
• Seeks
  – Mechanical improvements and smaller platters favor ES
  – ES also allows larger queue depths of outstanding requests, which benefit from smarter scheduling

Rotational Vibration
• The environment and nearby drives can excite a drive and throw the actuator off-track
• This causes performance loss
• Need to understand how much vibration (in radians/s²) is present and design for it
• Some recent drives even have a vibration sensor for compensation in servo processing

Reliability
• Described in terms of power-on hours (8 hrs/day for PS and 24 hrs/day for ES)
• Depends on
  – Duty cycle (40% for ES vs. 75% for PS due to shorter seeks)
  – Temperature
  – Particles inside
  – Head crashes

Serial ATA (SATA)
• Serial implementation of ATA
• Higher data rates
  – 133-150 MB/s, compared to 320 MB/s for SCSI
• Easier to configure, cheaper, less reliable (?)
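To put the SATA interface rates in perspective, here is a small Python sketch of idealized bus transfer time. It assumes the quoted figures are decimal megabytes per second (1 MB = 10^6 bytes) and that the bus runs at its full nominal rate with no command, protocol or contention overhead; the dictionary labels, the helper name and the 64 KB example request are hypothetical.

```python
# Nominal interface rates from the SATA slide, in MB/s (assumed decimal: 1 MB = 10**6 bytes).
# Hypothetical labels; real buses also pay command/protocol overheads not modeled here.
INTERFACE_RATES_MB_S = {
    "ATA (133 MB/s)": 133.0,
    "SATA (150 MB/s)": 150.0,
    "SCSI (320 MB/s)": 320.0,
}


def transfer_time_ms(nbytes: int, rate_mb_s: float) -> float:
    """Idealized time in ms to move nbytes at rate_mb_s, assuming full nominal bandwidth."""
    if rate_mb_s <= 0:
        raise ValueError("rate must be positive")
    return nbytes / (rate_mb_s * 1e6) * 1e3


if __name__ == "__main__":
    request_bytes = 64 * 1024  # hypothetical example request size
    for name, rate in INTERFACE_RATES_MB_S.items():
        print(f"{name}: {transfer_time_ms(request_bytes, rate):.3f} ms")
```

This covers only the bus portion of a request; once seek and rotational delays of several milliseconds are added, the relative gap between interfaces shrinks for small random requests.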