MEMORY HIERARCHY & EXTERNAL MEMORY
By Noordiana Kassim

The Memory Hierarchy
• Topics
  • Storage technologies and trends
  • Locality of reference
  • Caching in the memory hierarchy

Random-Access Memory (RAM)
• Key features
  • RAM is packaged as a chip
  • Basic storage unit is a cell (one bit per cell)
  • Multiple RAM chips form a memory
• Static RAM (SRAM)
  • Each cell stores a bit with a six-transistor circuit
  • Retains its value indefinitely, as long as it is kept powered
  • Relatively insensitive to disturbances such as electrical noise
  • Faster and more expensive than DRAM
• Dynamic RAM (DRAM)
  • Each cell stores a bit with a capacitor and a transistor
  • Value must be refreshed every 10-100 ms
  • Sensitive to disturbances
  • Slower and cheaper than SRAM

Non-Volatile RAM (NVRAM)
• Key feature: keeps data when power is lost
• Several types
• Most important is NAND flash
• Ongoing R&D

NAND flash
• Reading is similar to DRAM (though somewhat slower)
• Writing is packed with restrictions:
  • Can't change existing data in place
  • Must erase in large blocks (e.g., 64K)
  • A block dies after about 100K erases
  • Writing is slower than reading (mostly due to erase cost)
• Chips are often packaged with a Flash Translation Layer (FTL)
  • Spreads out writes ("wear leveling")
  • Makes the chip appear like a disk drive

Typical Bus Structure Connecting CPU and Memory
• A bus is a collection of parallel wires that carry address, data, and control signals
• Buses are typically shared by multiple devices
[Figure: the CPU chip (register file, ALU, bus interface) connects over the system bus to the I/O bridge, which connects over the memory bus to main memory]

Memory Read Transaction (1)
Load operation: movl A, %eax (main memory holds word x at address A)
• CPU places address A on the memory bus

Memory Read Transaction (2)
• Main memory reads A from the memory bus, retrieves word x, and places it on the bus

Memory Read Transaction (3)
• CPU reads word x from the bus and copies it into register %eax
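The three read steps can be traced in a toy simulation. This is a minimal sketch, not a model of real hardware timing: the `Bus`, `MainMemory`, and `load` names are hypothetical, the bus is assumed to carry one value (address or data) at a time, and main memory is assumed to be a plain Python list of words indexed by address.

```python
class Bus:
    """Hypothetical shared bus that carries one value (address or data) at a time."""

    def __init__(self):
        self.value = None


class MainMemory:
    """Hypothetical word-addressed main memory backed by a Python list."""

    def __init__(self, size):
        self.words = [0] * size

    def serve_read(self, bus):
        # Step 2: memory reads address A from the bus, retrieves word x,
        # and places x back on the bus.
        address = bus.value
        bus.value = self.words[address]


def load(bus, memory, registers, reg, address):
    """Model `movl A, %reg` as the three-step bus read transaction."""
    bus.value = address          # Step 1: CPU places address A on the bus
    memory.serve_read(bus)       # Step 2: memory responds with word x
    registers[reg] = bus.value   # Step 3: CPU copies x from the bus into the register


bus = Bus()
memory = MainMemory(size=16)
memory.words[5] = 0x2A           # word x stored at (hypothetical) address A = 5
registers = {"%eax": 0}
load(bus, memory, registers, "%eax", 5)
print(registers["%eax"])         # 42
```

Because the address and the data travel over the same shared bus one after the other, even a single load takes several bus steps.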
Memory Write Transaction (1)
Store operation: movl %eax, A (register %eax holds word y)
• CPU places address A on the bus; main memory reads it and waits for the corresponding data word to arrive

Memory Write Transaction (2)
• CPU places data word y on the bus

Memory Write Transaction (3)
• Main memory reads data word y from the bus and stores it at address A

Disk Access Time
• Average time to access a target sector is approximated by:
  Taccess = Tavg seek + Tavg rotation + Tavg transfer
• Seek time (Tavg seek)
  • Time to position the heads over the cylinder containing the target sector
  • Typical Tavg seek = 9 ms
• Rotational latency (Tavg rotation)
  • Time waiting for the first bit of the target sector to pass under the r/w head
  • Tavg rotation = 1/2 × 1/RPM × 60 sec/1 min
• Transfer time (Tavg transfer)
  • Time to read the bits in the target sector
  • Tavg transfer = 1/RPM × 1/(avg # sectors/track) × 60 sec/1 min

Disk Access Time Example
• Given:
  • Rotational rate = 7,200 RPM
  • Average seek time = 9 ms
  • Avg # sectors/track = 400
• Derived:
  • Tavg rotation = 1/2 × (60 sec / 7,200 RPM) × 1,000 ms/sec ≈ 4 ms
  • Tavg transfer = (60 sec / 7,200 RPM) × 1/400 × 1,000 ms/sec ≈ 0.02 ms
  • Taccess = 9 ms + 4 ms + 0.02 ms ≈ 13.02 ms
• Important points:
  • Access time is dominated by seek time and rotational latency
  • The first bit in a sector is the most expensive; the rest are essentially free
  • SRAM access time is about 4 ns/doubleword, DRAM about 60 ns
    • Disk is about 40,000 times slower than SRAM, and 2,500 times slower than DRAM

Logical Disk Blocks
• Modern disks present a simpler abstract view of the complex sector geometry:
  • The set of available sectors is modeled as a sequence of b-sized logical blocks (0, 1, 2, ...)
• Mapping between logical blocks and actual (physical) sectors
  • Maintained by a hardware/firmware device called the disk controller
  • Converts requests for logical blocks into (surface, track, sector) triples
• Allows the controller to set aside spare cylinders for each zone
  • Accounts for the difference between "formatted capacity" and "maximum capacity"

KEY CHARACTERISTICS OF COMPUTER MEMORY SYSTEMS
• Location: internal (e.g., registers, cache, main memory) or external (e.g., disks, tapes)
• Capacity: number of words, number of bytes
• Unit of transfer: word, block
• Access method: sequential, direct, random, associative
• Performance: access time, cycle time, transfer rate
• Physical type: semiconductor, magnetic, optical, magneto-optical
• Physical characteristics: volatile/nonvolatile, erasable/nonerasable
• Organization: memory modules

MEMORY HIERARCHY
• Design constraints on a computer's memory can be summed up by three questions:
  • How much? (capacity)
  • How fast? (access time)
  • How expensive? (cost)

MEMORY HIERARCHY
• There is a trade-off among these three characteristics of memory (capacity, access time, and cost), and the following relationships hold:
  • Faster access time, greater cost per bit
  • Greater capacity, smaller cost per bit
  • Greater capacity, slower access time

[Figure: memory hierarchy diagram]

Memory Hierarchies
• Some fundamental and enduring properties of hardware and software:
  • Fast storage technologies cost more per byte and have less capacity
  • The gap between CPU and main memory speed is widening
  • Well-written programs tend to exhibit good locality
• These fundamental properties complement each other beautifully
• They suggest an approach for organizing memory and storage systems known as a memory hierarchy

An Example Memory Hierarchy
Smaller, faster, and costlier (per byte) storage devices are at the top (L0); larger, slower, and cheaper (per byte) storage devices are at the bottom (L5).
• L0: CPU registers, which hold words retrieved from the L1 cache
• L1: on-chip L1 cache (SRAM), which holds cache lines retrieved from the L2 cache
• L2: off-chip L2 cache (SRAM), which holds cache lines retrieved from main memory
• L3: main memory (DRAM), which holds disk blocks retrieved from local disks
• L4: local secondary storage (local disks), which hold files retrieved from disks on remote network servers
• L5: remote secondary storage (distributed file systems, Web servers)

Caches
• Cache: a smaller, faster storage device that acts as a staging area for a subset of the data in a larger, slower device
• Fundamental idea of a memory hierarchy:
  • For each k, the faster, smaller device at level k serves as a cache for the larger, slower device at level k+1
• Why do memory hierarchies work?
  • Programs tend to access data at level k more often than they access data at level k+1
  • Thus, storage at level k+1 can be slower, and thus larger and cheaper per bit
• Net effect: a large pool of memory that costs as little as the cheap storage near the bottom, but that serves data to programs at approximately the rate of the fast storage near the top
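To make the "net effect" concrete, the minimal sketch below counts hits and misses for a small cache sitting in front of a slower level, then estimates the average access time. Several assumptions are not from the slides: the cache is fully associative with LRU replacement, addresses are word addresses grouped into 8-word blocks, a hit costs the fast-level time and a miss costs the full slow-level time, and the 4 ns (SRAM) and 60 ns (DRAM) figures are reused from the disk access time discussion. The names `simulate` and `average_access_time` are hypothetical.

```python
from collections import OrderedDict


def simulate(trace, capacity_blocks, block_size):
    """Count hits and misses for a fully associative LRU cache (hypothetical model)."""
    cache = OrderedDict()  # block number -> None, ordered least to most recently used
    hits = misses = 0
    for address in trace:
        block = address // block_size
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            misses += 1
            if len(cache) >= capacity_blocks:
                cache.popitem(last=False)  # evict the least recently used block
            cache[block] = None
    return hits, misses


def average_access_time(hit_ratio, t_fast_ns, t_slow_ns):
    """Assumes a hit costs t_fast and a miss costs t_slow (no extra miss penalty)."""
    return hit_ratio * t_fast_ns + (1 - hit_ratio) * t_slow_ns


# Sequential scan of 1,024 words, done twice, through a 16-block cache.
trace = list(range(1024)) * 2
hits, misses = simulate(trace, capacity_blocks=16, block_size=8)
hit_ratio = hits / (hits + misses)
print(hits, misses, hit_ratio)                # 1792 256 0.875
print(average_access_time(hit_ratio, 4, 60))  # 11.0
```

The scan touches 128 blocks but the cache holds only 16, so the second pass gets no reuse across blocks; every hit comes from spatial locality inside an 8-word block. Even so, the estimated 11 ns average sits much closer to the fast level (4 ns) than to the slow one (60 ns).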
Cache memory
• If the active portions of the program and data are placed in a fast, small memory, the average memory access time can be reduced
• This reduces the total execution time of the program
• Such a fast, small memory is referred to as cache memory
• The cache is the fastest component in the memory hierarchy and approaches the speed of the CPU components

Cache memory
• When the CPU needs to access memory, the cache is examined
• If the word is found in the cache, it is read from the fast memory
• If the word addressed by the CPU is not found in the cache, the main memory is accessed to read the word

Cache memory
• When the CPU refers to memory and finds the word in the cache, it is said to produce a hit
• Otherwise, it is a miss
• The performance of cache memory is frequently measured in terms of a quantity called the hit ratio
• Hit ratio = hits / (hits + misses)

Cache memory
• The basic characteristic of cache memory is its fast access time
• Therefore, very little or no time must be wasted when searching for words in the cache
• The transformation of data from main memory to cache memory is referred to as the mapping process; there are three types of mapping:
  • Associative mapping
  • Direct mapping
  • Set-associative mapping

Cache memory
• To help understand the mapping procedure, consider the following example:
[Figure: example cache and main memory organization used for the mapping examples]

Associative mapping
• The fastest and most flexible cache organization uses an associative memory
• The associative memory stores both the address and the data of the memory word
• This permits any location in the cache to store any word from main memory
• The 15-bit address value is shown as a five-digit octal number, and its corresponding 12-bit word is shown as a four-digit octal number
[Figure: associative mapping cache, addresses and data shown in octal]

Associative mapping
• A 15-bit CPU address is placed in the argument register, and the associative memory is searched for a matching address
• If the address is found, the corresponding 12-bit data word is read and sent to the CPU
• If not, the main memory is accessed for the word
• If the cache is full, an address-data pair must be displaced to make room for a pair that is needed and not presently in the cache

Direct Mapping
• Associative memory is expensive compared to RAM
• In the general case, there are 2^k words in cache memory and 2^n words in main memory (in our example, k = 9 and n = 15)
• The n-bit memory address is divided into two fields: k bits for the index field and n-k bits for the tag field
[Figure: direct mapping examples]

Set-Associative Mapping
• The disadvantage of direct mapping is that two words with the same index in their address but with different tag values cannot reside in cache memory at the same time
• Set-associative mapping is an improvement over direct mapping in that each word of cache can store two or more words of memory under the same index address
[Figure: set-associative mapping cache]
• In the figure, each index address refers to two data words and their associated tags
• Each tag requires six bits and each data word has 12 bits, so the word length is 2 × (6 + 12) = 36 bits

General Caching Concepts
• Types of cache misses:
  • Cold (compulsory) miss
    • Cold misses occur because the cache is empty
  • Conflict miss
    • Most caches limit blocks at level k+1 to a small subset (sometimes a singleton) of the block positions at level k
    • E.g., block i at level k+1 must be placed in block (i mod 4) at level k
    • Conflict misses occur when the level k cache is large enough, but multiple data objects all map to the same level k block
    • E.g., referencing blocks 0, 8, 0, 8, 0, 8, ... would miss every time
  • Capacity miss
    • Occurs when the set of active cache blocks (working set) is larger than the cache

EXTERNAL MEMORY

Types of External Memory
• Magnetic disk
  • RAID (Redundant Array of Independent Disks)
  • Removable
• Optical
  • CD-ROM
  • CD-Recordable (CD-R)
  • CD-R/W
  • DVD
  • DVD-R
  • DVD-RW
• Magnetic tape

Magnetic Disk
• Coated with magnetizable material for read and write purposes
• The substrate used to be aluminum; recently, glass has been used, which offers:
  • Better stiffness
  • Greater shock/damage resistance
  • Lower fly height
  • Improved uniformity of the surface, which helps to reduce read-write errors

Magnetic Write and Read Mechanism
• Head:
  • Fixed head
    • One read-write head per track
    • Heads built into a fixed rigid arm
  • Movable head
    • One read-write head per surface
    • Built into a movable arm
• During a read, when the recorded surface passes under the head, it generates a current of the same polarity as the one already recorded

Disk Data Layout
Contains:
1. Tracks (same width as the head)
2. Intertrack gaps
3. Sectors (fixed-length 512-byte sectors are commonly used in industry)
4. Intersector gaps
• Gaps are there to minimize errors due to misalignment of the head or interference of magnetic fields

Disk Layout Methods
• CAV: Constant Angular Velocity
• Multiple zone recording: to enhance density (capacity)

Characteristics
• Movable head or not
• Removability
  • Provides unlimited storage capacity
  • Easy data transfer between systems
• Multiple platters
• Single- or double-sided

Disk Performance Parameters
• Seek time: time to position the head at the track
• Rotational delay: the time it takes for the beginning of the sector to reach the head
• Transfer time: time required for the transfer
  T = b / (rN)
  where T = transfer time, b = number of bytes to be transferred, N = number of bytes on a track, r = rotation speed in revolutions per second
• Times are usually given in ms and refer to the average case

RAID
• Stands for Redundant Array of Independent Disks
• RAID is a set of physical disk drives viewed by the operating system as a single logical drive
• Data are distributed across the physical drives of an array in a scheme known as striping, described subsequently
• Redundant disk capacity is used to store parity information, which guarantees data recoverability in case of a disk failure
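Striping and parity can be sketched in a few lines. This is a minimal sketch under stated assumptions, not how any particular controller works: strips are assumed to be placed round-robin across the disks, and the parity strip is assumed to be the bytewise XOR of the data strips, as in parity-based levels such as RAID 4 and RAID 5. Because XOR is its own inverse, a single lost strip equals the XOR of all surviving strips plus the parity. The helpers `strip_location` and `xor_strips` and the sample data are hypothetical.

```python
from functools import reduce


def strip_location(logical_strip, n_disks):
    """Round-robin striping (assumed): strip i goes to disk i mod n, stripe i // n."""
    return logical_strip % n_disks, logical_strip // n_disks


def xor_strips(strips):
    """Bytewise XOR of equal-length strips (hypothetical helper)."""
    if len({len(s) for s in strips}) != 1:
        raise ValueError("strips must all have the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*strips))


# One stripe spread across three data disks, plus one parity strip.
data_strips = [b"DISK", b"DATA", b"RAID"]
parity = xor_strips(data_strips)

# Suppose the disk holding strip 1 fails: XOR the survivors with the parity.
survivors = [data_strips[0], data_strips[2], parity]
rebuilt = xor_strips(survivors)
print(rebuilt)                  # b'DATA'
print(strip_location(7, 3))     # (1, 2)
```

Recovery here needs every surviving strip in the stripe to be read, which is one reason rebuilding a failed disk in a parity array takes time.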
RAID is implemented with array management software and comes in levels 0 to 6 and more, such as RAID 10 (a combination of RAID 0 and RAID 1).

RAID Level 0
• Not a true member of the RAID family
  • No redundancy or fault tolerance
• High transfer capacity for large and small I/O data
• It is included because it distributes data across multiple disks
• No parity calculation is needed
• Easy to implement

RAID Level 0
• In a transaction environment, there may be hundreds of I/O requests per second
• A disk array can provide high I/O execution rates by balancing the I/O load across multiple disks
• Parallel processing
• Any error is uncorrectable
  • One disk's failure results in all data in the array being lost

RAID Level 1
• Redundancy is achieved by having a mirror disk
• Inefficient use of space
• Read requests are very efficient (served by the disk that involves the minimum seek time plus rotational latency)
• Write requests can be done in parallel (T = the larger of the two write times)
• Recovery is really simple: just replace the broken disk with a new one

Optical Storage CD-ROM
• Originally for audio
• 650 MB, giving over 70 minutes of audio
• Polycarbonate coated with a highly reflective coat, usually aluminium
• Data stored as pits
• Read by reflecting a laser
• Constant packing density
• Constant linear velocity

Random Access on CD-ROM
• Difficult:
  • Move head to rough position
  • Set correct speed
  • Read address
  • Adjust to required location

CD-ROM for & against
• For:
  • Large capacity (?)
  • Easy to mass produce
  • Removable
  • Robust
• Against:
  • Expensive for small runs
  • Slow
  • Read only

Other Optical Storage
• CD-Recordable (CD-R)
  • WORM (write once, read many)
  • Now affordable
  • Compatible with CD-ROM drives
• CD-RW
  • Erasable
  • Getting cheaper
  • Mostly CD-ROM drive compatible
  • Phase change
    • Material has two different reflectivities in different phase states

DVD: what's in a name?
• Digital Video Disk?
  • Used to indicate a player for movies
  • Only plays video disks
• Digital Versatile Disk?
  • Used to indicate a computer drive
  • Will read computer disks and play video disks

DVD technology
• Multi-layer
• Very high capacity (4.7 GB per layer)
• Full-length movie on a single disk
  • Using MPEG compression

CD vs DVD
[Figure: comparison of CD and DVD]

DVD+R
• The +R format pre-groove also uses a wobble frequency, but at a much higher frequency (817 kHz)
• Instead of pre-pits, the +R formats convey the sector addressing information by frequency modulation of the wobble frequency

Magnetic Tape
• Serial access
• Slow
• Very cheap
• Backup and archive