EECS 221: INFORMATION STORAGE
LECTURE NOTES
Instructor: Zhiying Wang
Winter Quarter, Week 1
1/5/2016 - 1/7/2016

LECTURE: 1/5/2016

From 2012 until 2020, data will double every two years. Every person will have an average of 5,200 GB of data, which results in 40 zettabytes (1 ZB = 10^21 bytes) of data in 2020.

Challenges for Storage

Cloud, Public & Private
• subscription-based storage
• storage policies change faster

In-memory computing
• necessary for new storage technology
• example: server systems
• computing moves from the CPU to memory (the storage part)

SW/HW Integration
Need optimized platform storage interconnect
Need optimized software storage access methods

Object Storage
• file systems (manage data as a file hierarchy)
• block storage (manage data as blocks)
• object storage (manage data as objects, i.e. the content that is stored)
• data is distinguished by the content requested

Virtualization
• Pooling physical storage from multiple network storage devices into what appears to be a single storage device
• Challenges of shared storage structure

Big data and various data sources
New applications (mobiles, wearables)
Resource allocation and networking strategies

How to measure a storage system?
List of Considerations:
• $/GB (density)
• Throughput
  – (i) IOPS: number of I/O operations supported by the system per second
  – (ii) transfer rate: amount of data transferred per second
  How much money must be spent to achieve a certain IOPS?
  How much energy does it cost to achieve a certain IOPS?
• Latency
  – (i) response time: time elapsed from when a command is issued until it is serviced (command completed)
  – (ii) service time: time elapsed from when service of the command starts until the command is completed

Typically, latency and throughput are inversely proportional.
IOPS x average block size = transfer rate
IOPS is also the rate of service requests (arrival rate)
Utilization = arrival rate x service time
M/M/1 queue: Poisson arrivals, exponentially distributed service times
response time = service time / (1 - utilization)

Data Storage Hierarchy
• Tier 0: Ultra-low latency (cache)
• Tier 1: Main memory, low latency (in-memory computing, virtual server)
• Tier 2: Mixed use, read intensive (image retrieval, web search)
• Tier 3: Nearline storage. High capacity, low power, higher latency (tapes)

High-frequency trading: 1 ms improvement = $100M profit/year
$/transaction: a virtual desktop server provides display data through terminals. How much does it cost to complete one transaction?

Storage Media Considerations
SCM: Memristor, PCM, STT-RAM, Flash, HD (non-volatile)
Consider technology maturity
Important figures of merit: density, energy/bit, read time, write time, retention, endurance, 3D capability
Replace SSD every two years (new tech leads to new densities)
Increase density in 3D space

HDD
First disk drive: IBM RAMAC (Random Access Method of Accounting and Control), 1955
Before HDD: magnetic tapes developed by IBM (1953)
1950: first magnetic drum invented for the United States Navy (1 million bits of storage space)
IBM RAMAC: fifty 24" disks, 5 million characters (1 character = 7 bits)
Average access time = 1 second
Better than magnetic tapes, which allow only a one-dimensional search
Magnetic disk: two-dimensional, allows random access (vs. magnetic tapes)

Developments: miniaturization
1962: every disk has its own head (easy access to one surface of the disk)
1963: removable disks, not used anymore
These led to floppy disks
HDD diameter: 24 inches in 1955 -> 1 inch in 2000
Size decrease = density increase

Flash Storage
• NAND (feature size in nm decreases over the years, which is better from an areal-density perspective)
• Logic Flash Storage is enabled from software

Storage vs. Communication Systems
Communication systems can retransmit on error. A storage device cannot be asked to write again, so storage needs a very small error probability (achieved by signal processing, error correction, and calibration).

Class Topics
HD
Flash memory and other non-volatile memory
Storage architecture in data centers
Networking in data centers
Power consumption in data centers
Object storage
Virtualization, consistent storage

Grading
Scribing: 5%
HW: 75%
Final Project: 20%

Hard Disks: Physical Layer

Angular and Radial Coordinates
How to find information on disk? To locate a point on the disk:
• angle: angular position θ
• distance from the center to the point, r
Seek: the head moves to the desired radial position

Ferromagnetic materials
• separated into grains
• electron spin alignment (aligned magnetic field; magnetizing = writing info)

Hysteresis Loop
Applying current will magnetize the material up to a saturation point
M_r * H_c = strength of material (M_r: remanent magnetization, H_c: coercivity)

Track = circular area on the disk

Write (applying magnetization)
The direction of magnetization could indicate 0 or 1; this is theoretically correct but not used in practice.
Instead, look at transitions of the magnetic field: a transition (magnetic reversal) = 1
A write has to cover one whole block (to account for transitions); the block is known as a sector

Write Head
Components of the write head: core, coil, poles
Inducing current in the coil emits a field onto the disk, magnetizing the material in a direction that depends on the direction of the current

Properties of Write Head
Core material must be easy to magnetize
Head must be strong and durable
Flux must be strong enough to magnetize the material

Read Head
Magnetoresistive (MR) heads
2 directions: easy and hard direction
A change of magnetic field changes the resistance
Transitions cannot be too close together, or else the signal resulting from the transition pulses will be lower than the intended signal and may cause errors
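
Illustration (not from the lecture): the two points above, that a transition encodes a 1 and that closely spaced transitions weaken the readback signal, can be sketched numerically. The sketch assumes each transition produces an isolated Lorentzian-shaped pulse whose polarity alternates from one transition to the next; the pulse shape, the width parameter pw50, and all function names are assumptions chosen for illustration only.

```python
def magnetization_levels(bits):
    """Transition-based writing: a 1 reverses the magnetization, a 0 keeps it.

    Returns the magnetization direction (+1 or -1) after each bit cell,
    starting from an assumed initial direction of +1.
    """
    level = 1
    levels = []
    for b in bits:
        if b == 1:
            level = -level  # magnetic reversal encodes a 1
        levels.append(level)
    return levels


def lorentzian(t, pw50):
    """Assumed isolated-transition pulse: peak 1 at t = 0, width pw50 at half height."""
    return 1.0 / (1.0 + (2.0 * t / pw50) ** 2)


def readback(bits, spacing, pw50, t):
    """Superpose one pulse per transition; consecutive pulses alternate in polarity."""
    signal = 0.0
    polarity = 1
    for i, b in enumerate(bits):
        if b == 1:
            signal += polarity * lorentzian(t - i * spacing, pw50)
            polarity = -polarity
    return signal


def first_peak(spacing, pw50):
    """Readback amplitude at the first of two adjacent transitions."""
    return readback([1, 1], spacing, pw50, 0.0)


if __name__ == "__main__":
    print(magnetization_levels([1, 0, 1, 1]))
    for spacing in (10.0, 2.0, 1.0, 0.5):
        print(spacing, first_peak(spacing, pw50=1.0))
```

Under these assumptions, two transitions spaced far apart (relative to pw50) leave the first peak close to 1, while at a spacing of half of pw50 the neighboring opposite-polarity pulse cancels half of it, which is the amplitude loss the notes warn about.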