Chapter 4 Internal Memory Yonsei University Contents • • • • • 4-2 Computer Memory System Overview Semiconductor Main Memory Cache Memory Pentium and Power PC Advanced DRAM Organization Yonsei University Key Characteristics Location Processor Internal (main) External (secondary) Capacity Word size Number of words Unit of Transfer Word Block Access Method Sequential Direct Associative 4-3 Computer memory system overview Performance Access time Cycle time Transfer rate Physical Type Semiconductor Magnetic Optical Magneto-Optical Physical Characteristics Volatile/nonvolatile Erasable/nonerasable Organization Yonsei University Characteristics of Memory System Computer memory system overview • Location – Processor – Internal(main) – External(secondary) 4-4 Yonsei University Capacity Computer memory system overview • Internal memory capacity – Expressed in terms of bytes or words • External memory capacity – Expressed in terms of bytes 4-5 Yonsei University Unit of Transfer Computer memory system overview • Internal memory – The unit of transfer is equal to number of data into and out of the memory module – Often equal to the word length – Word • Natural unit of organization of memory • The size of the word is equal to number of bits used to represent a number and to instruction length – Addressable unit • In many systems, the addressable unit is word • Some systems allow addressing at the byte level – Unit of transfer • The number of bits read out of or written into memory at a time 4-6 Yonsei University Methods of Accessing Computer memory system overview • Sequential access – Start at the beginning and read through in order – Access time depends on location of data and a previous location – e.g. tape • Direct access – Individual blocks have unique address – Access is by jumping to vicinity plus sequential search – Access time depends on location and previous location – e.g. disk 4-7 Yonsei University Methods of Accessing Computer memory system overview • Random access – Individual addresses identify locations exactly – Access time is independent of location or previous access – e.g. RAM • Associative access – Data is located by a comparison with contents of a portion of the store – Access time is independent of location or previous access – e.g. cache 4-8 Yonsei University Performance Computer memory system overview • Access time – Time it takes to perform read or write operation (for random-access memory) – Time it takes to position the read-write mechanism at desired location (for non-random-access memory) • Memory cycle time – Applied to random-access memory – Consists of the access time plus any additional time required before next access 4-9 Yonsei University Computer memory system overview Performance • Transfer rate – Rate at which data can be transferred into or out of a memory unit (for random-access memory) – For non-random-access memory T N ? T A ? N R TN = Average time to read or write N bits TA = Average access time N = Number of bits R = Transfer rate, in bits per second(bps) 4-10 Yonsei University Physical Types Computer memory system overview • Semiconductor – RAM • Magnetic – Disk & Tape • Optical – CD & DVD • Others – Bubble – Hologram 4-11 Yonsei University Physical Characteristics Computer memory system overview • Decay • Volatility • Erasable • Power consumption 4-12 Yonsei University Organization Computer memory system overview • Physical arrangement of bits to form words • Not always obvious • e.g. interleaved 4-13 Yonsei University The Bottom Line Computer memory system overview • How much? – Capacity • How fast? – Time is money • How expensive? 4-14 Yonsei University Computer memory system overview The Memory Hierarchy • Relationships – The faster access time, the greater cost per bit – The greater capacity, the smaller cost per bit – The greater capacity, the slower access time • As one goes down the hierarchy, the following occur – – – – 4-15 Decreasing cost per bit Increasing capacity Increasing access time Decreasing frequency of access of the memory by the processor Yonsei University The Memory hierarchy 4-16 Computer memory system overview Yonsei University Computer memory system overview The Memory Hierarchy • Registers – In CPU • Internal or Main memory – May include one or more levels of cache – RAM • External memory – Backing store 4-17 Yonsei University Performance of A Two-Level Memory 4-18 Computer memory system overview Yonsei University Hierarchy List • • • • • • • • 4-19 Computer memory system overview Registers L1 Cache L2 Cache Main memory Disk cache Disk Optical Tape Yonsei University Memory Hierarchy Computer memory system overview • Locality of reference – During the course of the execution of a program, memory references tend to cluster 4-20 Yonsei University Memory Hierarchy Computer memory system overview • Additional levels can be effectively added to the hierarchy in software • Portion of main memory can be used as a buffer to hold data that is to be read out to disk • Such a technique, sometimes referred to as a disk cache 4-21 Yonsei University Computer memory system overview So You Want Fast? • It is possible to build a computer which uses only static RAM (see later) • This would be very fast • This would need no cache – How can you cache cache? • This would cost a very large amount 4-22 Yonsei University Semiconductor Memory Types Memory Type Category Random-access Read-write memory memory(RAM) Read-only memory(ROM) Programmable ROM(PROM) Electrically Erasable PROM(EEPROM) 4-23 Erasure Write Mechanism Volatility Electrically, byte level Electrically Volatile Mask Read-only memory Not possible UV light, chip level Erasable PROM(EPROM) Flash memory Semiconductor main memory Read-mostly memory Electrically, block level Nonvolatile Electrically Electrically, byte level Yonsei University Semiconductor main memory RAM • Misnamed as all semiconductor memory is random access • Read/Write • Volatile • Temporary storage • Static or dynamic – DRAM requires periodic charge refreshing to maintain data storage 4-24 Yonsei University DRAM • • • • • • • • • 4-25 Semiconductor main memory Bits stored as charge in capacitors Charges leak Need refreshing even when it is powered Simpler construction Smaller per bit Less expensive Need refresh circuits Slower Main memory Yonsei University SRAM • • • • • • • • • 4-26 Semiconductor main memory Bits stored as on/off switches No charges to leak No refreshing needed when it is powered More complex construction Larger per bit More expensive Does not need refresh circuits Faster Cache Yonsei University Read Only Memory (ROM ) Semiconductor main memory • Permanent storage • Applications – – – – Microprogramming Library subroutines Systems programs (BIOS) Function tables • Problems – The data insertion step includes a large fixed cost – No room for error • Written during manufacture – Very expensive for small runs 4-27 Yonsei University Programmable ROM (PROM) Semiconductor main memory • Nonvolatile & Written only once – Writing is performed electrically and may be performed by a supplier or customer at a time later than the original chip fabrication – Needs special equipment to program 4-28 Yonsei University Read “Mostly” Memory (RMM) Semiconductor main memory • Read operations are far more frequent than write operations • But for which nonvolatile storage is required • EPROM • EEPROM • Flash 4-29 Yonsei University EPROM Semiconductor main memory • Erasable Programmable • Storage cell must be erased by UV before written operation • Read and written electrically • More expensive than PROM • Advantage of multiple update capability 4-30 Yonsei University Electrically Erasable PROM(EEPROM) Semiconductor main memory • A read-mostly memory that can be written into at any time without erasing prior contents • Takes much longer to write than read • Advantage – Nonvolatility with the flexibility of being updatable in place using ordinary bus control, address, data lines 4-31 Yonsei University Flash Memory Semiconductor main memory • Erase whole memory electrically • Entire flash memory can be erased in one or few seconds (faster than EPROM) • Possible to erase just blocks • Impossible to erase byte-level • Use only one transistor per bit • Achieves high density 4-32 Yonsei University Memory Cell Operation 4-33 Semiconductor main memory Yonsei University Chip Logic Semiconductor main memory • A 16Mbit chip can be organized as 1M of 16 bit words • A bit per chip system has 16 lots of 1Mbit chip with bit 1 of each word in chip 1 and so on • A 16Mbit chip can be organized as a 2048 x 2048 x 4bit array – Reduces the number of address pins • Multiplex row address and column address • 11 pins to address (211=2048) • Adding one more pin doubles range of values • So the memory size grows by a factor of 4 4-34 Yonsei University Semiconductor main memory Refreshing • • • • • • 4-35 Refresh circuit included on chip Disable chip Count through rows Read & Write back Takes time Slows down apparent performance Yonsei University Typical 16 Mb DRAM (4M x 4) 4-36 Semiconductor main memory Yonsei University Semiconductor main memory Chip Packaging • Pins support following signal lines – Address of words being accessed • For 1M words, a total of 20(220=1M) pins needed(A0-A19) – Data to be read out, consisting of 8 lines(D0-D7) – Power supply to the chip(Vcc) – Ground pin(Vss) – Chip enable (CE)pin – Program voltage(Vpp) that is supplied during programming(writing operation) 4-37 Yonsei University Typical Memory Package 4-38 Semiconductor main memory Yonsei University Module Organization Semiconductor main memory • How a memory module consisting of 256K 8-bit words could be organized – For 256K word, 18-bit address is needed and is supplied to the module from some external sources 4-39 Yonsei University Semiconductor main memory Module Organization • Possible organization of a memory consisting of 1M word by 8 bits per word – Need four columns of chips, each column containing 256K words arranged 4-40 Yonsei University 1-Mbyte Memory Organisation 4-41 Semiconductor main memory Yonsei University Semiconductor main memory Error Correction • Hard Failure – Permanent physical defect – Memory cell or cells affected cannot reliably store data and become stuck at 0 or 1or switch erratically between 0 and 1 – Caused by harsh environmental abuse, manufacturing defects and wear • Soft Error – Random, non-destructive – No permanent damage to memory – Caused by power supply problems or alpha particles • Detected using Hamming error correcting code 4-42 Yonsei University Error - Correcting Code Semiconductor main memory • When data are to be read into memory, a function f is performed to produce a code • Both the data and the code are stored • If M-bit data word and K-bit code, the stored word is M+K bits 4-43 Yonsei University Error Correction Semiconductor main memory • If M-bit word of data is stored, and code is K bit, then actual size of stored word is M+K bits • New set of K code is generated from M data bits compared with fetched code bits • Comparison one of three results – No errors detected • The fetched data bits sent out – An error is detected and it is possible to correct error • The data bit plus error-correction bits fed into corrector which produces a corrected set of M bits to be sent out – An error is detected, but it is impossible to correct it • This condition reported 4-44 Yonsei University Hamming Error-Correcting Code 4-45 Semiconductor main memory Yonsei University Semiconductor main memory Hamming Code • Parity bits • By checking the parity bits, discrepancies found in circle A and circle C, but not in circle B • Only one of the seven compartments is in A and C but not B • The error can be corrected by changing that bit • Syndrome word : result of comparison of bit-by-bit • Each bit of syndrome is 0 or 1 according to if there is or is not a match in position for two input • The syndrome word is K bits wide and has a range between 0 and 2K-1 2K – 1 ≥ M + k 4-46 Yonsei University Semiconductor main memory Increase in Word Length Single-Error Single-ErrorCorrection/DoubleCorrection Error Detection Data Bits Check Bits %Increase Check Bits %Increase 4-47 8 4 50 5 62.5 16 5 31.25 6 37.5 32 6 18.75 7 21.875 64 8 10.94 8 12.5 128 256 8 9 6.25 3.52 9 10 7.03 3.91 Yonsei University Error Correction Semiconductor main memory • To generate 4-bit syndrome – If the syndrome contains all 0s, no error is detected – If the syndrome contains one and only one bit set to 1, an error occurs in one of 4 check bits • No correction needed – If the syndrome contains more than one bit set to 1, numerical value of syndrome indicates the position of data bit in error • This data bit is inverted for correction 4-48 Yonsei University Layout of Data Bits & Check Bits 4-49 Semiconductor main memory Yonsei University Layout of Data Bits & Check Bits Semiconductor main memory • To achieve these characteristics, data and check bits are arranged into a 12-bit word as depicted • Those bit positions whose position numbers are powers of 2 are designated as check bits 4-50 Yonsei University Error Correction Semiconductor main memory C1 ? M 1 ? M 2 ? M 4? M5? M7 C 2 ? M1 ? M3? M 4? M6? M7 C4 ? M 2? M3? M 4? M8 C8 ? M5? M6? M7 ? M8 • Each check bit operates on every data bit position whose position number contains 1 in corresponding column position 4-51 Yonsei University Check Bit Degeneration Semiconductor main memory • Data and check bits are positioned properly in the 12-bit word. • By laying out position number of each data bit in columns, the 1s in each row indicates data bits checked by the check bit for that row 4-52 Yonsei University Check Bit Degeneration 4-53 Semiconductor main memory Yonsei University Semiconductor main memory Error Correction • Single-error-correcting(SEC) • Single-error-correcting, double-error -detecting(SEC-DED) 4-54 Yonsei University Hamming SEC-DEC Code 4-55 Semiconductor main memory Yonsei University Error Correction Semiconductor main memory • IBM 30xx implementations use 8-bit SECDED code for each 64 bits of data in main memory • Size of main memory is 12% larger than apparent to the user • VAX computer use 7-bit SEC-DED for each 32 bits of memory, for 22% overhead • A number of contemporary DRAMs use 9 check bits for each 128 bits of data, for 7% overhead 4-56 Yonsei University Cache memory Cache & Main Memory • Small amount of fast memory • Sits between normal main memory and CPU • May be located on CPU chip or module 4-57 Yonsei University Cache memory Principle • • • • CPU requests contents of memory location Check cache for this data If present, get the word from cache (fast) If not present, main memory is read into cache and the word is delivered to the processor • The word is delivered from cache to CPU because of phenomenon of locality of reference • Cache includes tags to identify which block of main memory is in each cache slot 4-58 Yonsei University Cache memory Principle • For mapping purposes, the memory is considered to consist of number of fixedlength block of K words each – Number of block : M = 2n/K • Cache consists of C lines of K words each and the number of lines is considerably less than the number of main memory blocks • Each line includes a tag that identifies which particular block is currently being stored 4-59 Yonsei University Cache memory Cache/Main-Memory Structure 4-60 Yonsei University Cache memory Principle • Cache hit occurs – Data and address buffers are disabled and the communication is only between the processor and cache, with no system bus traffic • Cache miss occurs – Desired address is loaded onto the system bus and the data are returned through data buffer to both cache and main memory 4-61 Yonsei University Cache memory Cache Read Operation 4-62 Yonsei University Cache memory Elements of Cache Design Cache size Mapping Function Direct Associative Set Associative Replacement Algorithm Least recently used(LRU) First in first out(FIFO) Least frequently used(LFU) Random 4-63 Write Policy Write through Write back Write once Line Size Number of Caches Single or two level Unified or split Yonsei University Cache memory Cache Size • Small enough that overall average cost per bit is close to that of main memory • Large enough that overall average access time is close to that cache alone • The larger cache, the larger number of gates involved in addressing the cache • The larger cache tend to be slower than small caches • Cache size is limited by available chip and board area • Impossible to arrive “optimum” size because cache’s performance is very sensitive to the nature of workload 4-64 Yonsei University Cache memory Factors For Cache Size • Cost – More cache is expensive • Speed – More cache is faster (up to a point) – Checking cache for data takes time 4-65 Yonsei University Cache memory Typical Cache Organization 4-66 Yonsei University Cache memory Mapping Function • Needed for determining which main memory block currently occupies a cache line • Direct mapping – Cache of 64kByte – Cache block of 4 bytes • i.e. cache is 16k (214) lines of 4 bytes – 16MBytes main memory – 24 bit address • (224=16M) 4-67 Yonsei University Cache memory Direct Mapping • Maps each block of main memory into only one possible cache line i = j modulo m where i = cache line number j = main memory block number m= number of lines in the cache 4-68 Yonsei University Direct Mapping Cache Organization 4-69 Yonsei University Cache memory Direct Mapping Cache Line Table Cache line Main memory block assigned 0 1 . . . m -1 0, m, 2m, … , 2s- m 1, m +1, 2m +1, … , 2s- m + 1 . . . m –1, 2m –1, 3m –1,… , 2s-1 Cache memory – Least significant w bits identify a unique word or byte – Most significant s bits specify one memory block – The MSBs are split into a cache line field r and a tag of s-r (most significant) – This latter field identifies one of the m=2r lines of cache 4-70 Yonsei University Cache memory Direct Mapping Example 4-71 Yonsei University Cache memory Direct Mapping Address Structure Tag s-r 8 Line or Slot r Word w 14 2 • 24 bit address • 2 bit word identifier (4 byte block) • 22 bit block identifier – 8 bit tag (=22-14) – 14 bit slot or line • No two blocks in the same line have the same Tag field • Check contents of cache by finding line and checking Tag 4-72 Yonsei University Cache memory Direct Mapping • Advantage – Direct mapping technique is simple and inexpensive to implement • Disadvantage – A fixed cache location for any given block – The hit ratio will be low 4-73 Yonsei University Cache memory Associative Mapping • A main memory block can load into any line of cache • The tag field uniquely identifies a block of main memory • To determine whether a block is in the cache, the cache control logic must simultaneously examine every line’s tag for a match • Cache searching gets expensive 4-74 Yonsei University Fully Associative Cache Organization 4-75 Yonsei University Cache memory Cache memory Associative Mapping Example 4-76 Yonsei University Associative Mapping Address Structure Cache memory Word 2 bit Tag 22 bit • 22 bit tag stored with each 32 bit block of data • Compare tag field with tag entry in cache to check for hit • Least significant 2 bits of address identify which 16 bit word is required from 32 bit data block • e.g. – Address – FFFFFC 4-77 Tag FFFFFC Data 24682468 Cache line 3FFF Yonsei University Cache memory Associative Mapping • Replacement algorithms is designed to maximize the hit ratio • Advantage – Flexible replacement of blocks when a new block is read into the cache • Disadvantage – Complex circuitry is required to examine the tags of all cache lines in parallel 4-78 Yonsei University Cache memory Set Associative Mapping • Compromise that reduces disadvantage of direct & associative approaches m=v× k i = j modulo v Where i = cache set number j = main memory block number m= number of lines in the cache k-way set associative mapping – V=m, k=1, the set associative technique reduces to direct mapping – V=1, k=m, reduces to associative mapping 4-79 Yonsei University Cache memory Set Associative Mapping • Cache is divided into a number of sets • Each set contains a number of lines • A given block maps to any line in a given set – e.g. Block B can be in any line of set i • e.g. 2 lines per set – 2 way associative mapping – A given block can be in one of 2 lines in only one set 4-80 Yonsei University K- Way Set Associative Cache Organization 4-81 Yonsei University Cache memory Set Associative Mapping Address Structure Tag 9 bit Word 2 bit Set 13 bit • Use set field to determine cache set to look in • Compare the tag field to see if we have a hit • e.g – Address – 1FF 7FFC – 001 7FFC 4-82 Tag 1FF 001 Data 12345678 11223344 Cache memory Set number 1FFF 1FFF Yonsei University Set Associative Mapping Example • 13 bit set number • Block number in main memory is modulo 213 • 000000, 008000,… , FF8000map to same set 4-83 Yonsei University Cache memory Two Way Set Associative Mapping Example 4-84 Yonsei University Cache memory Cache memory Replacement Algorithms • A new block is brought into cache, one of the existing blocks must be replaced • Direct mapping – Because there is only one possible line for any particular block, the block is must be replaced • Associative & set associative techniques – Need replacement algorithm • Least recently used(LRU) – Most effective – Replace that block in the set that has been in the cache longest with no reference – When a line is referenced, its USE bit is set to 1 and the USE bit of the other line is set to 0 4-85 Yonsei University Cache memory Replacement Algorithms • First in first out (FIFO) – Replace block that has been in cache longest • Least frequently used(LFU) – Replace block which has experienced the fewest references • Random – Not based on usage 4-86 Yonsei University Cache memory Write Policy • Must not overwrite a cache block unless main memory is up to date • If it has not updated, old block in the cache may be overwritten • When multiple processors are attached to the same bus and each processor has its own local cache – A word altered in one cache, invalidates a word in other cache • Multiple CPUs may have individual caches • I/O may address main memory directly 4-87 Yonsei University Cache memory Write Through • All writes go to main memory as well as cache • Multiple CPUs can monitor main memory traffic to keep local (to CPU) cache up to date • Disadvantage – Generates substantial memory traffic and may create bottleneck 4-88 Yonsei University Cache memory Write Back • Minimizes memory writes • Updates are made only in the cache • An UPDATE bit for cache slot is set when an update occurs • If a block is to be replaced, it is written back to main memory if and only if the UPDATE bit is set • Other caches get out of sync • I/O must access main memory through cache – Complex circuitry and a potential bottleneck • 15% of memory references are writes 4-89 Yonsei University Cache memory Cache Coherency • If data in one cache is altered, this invalidates the corresponding word in main memory and the same word in other caches • Even if a write-through policy is used, the other caches may contain invalid data • A system that prevent this problem said to maintain cache coherency – Bus watching with write through – Hardware transparency – Noncachable memory 4-90 Yonsei University Bus Watching With Write Through Cache memory • Each cache controller monitors address lines to detect write operations to memory by other bus masters • If another master writes to a location in shared memory that also resides in cache memory, cache controller invalidates that cache entry • This strategy depends on the use of a writethrough policy by all cache controller 4-91 Yonsei University Cache memory Hardware Transparency • Additional hardware used to ensure that all updates to main memory via cache reflected in all cache • If one processor modifies a word in its cache, this update is written to main memory • Any matching words in other caches are similarly updated 4-92 Yonsei University Cache memory Noncachable Memory • Only a portion of main memory is shared by more than one processor, and this is designated as noncachable • All accesses to shared memory are cache misses because the shared memory is never copied into the cache • The noncachable memory can be identified using chip-select logic of high-address bits 4-93 Yonsei University Cache memory Line size • When a block of data is retrieved and placed in the cache, the desired word and some number of adjacent words are retrieved • Two specific effects come into play – Larger blocks reduce the number of block that fit into a cache. Because each block fetch overwrites older cache contents, a small number of blocks result in data being overwritten shortly after it is fetched – As a block becomes larger, each additional word is farther from the requested word, and therefore less likely to be needed in near future 4-94 Yonsei University Cache memory Line size • The relationship between block size and hit ratio is complex, depending on the locality characteristics of a particular program • No definitive optimum value has been found • A size of from 2 to 8 words seems reasonably close to optimum 4-95 Yonsei University Cache memory Number of Caches • On chip cache – Possible to have cache on the same chip as the processor – Reduces processor’s external bus activity and speed up execution times and increases overall system performance • Most contemporary designs include both on-chip and external caches • Two-level cache – Internal cache designed level 1(L1) and external cache designed level 2(L2) 4-96 Yonsei University Cache memory Number of Caches • Reason for including L2 cache – If there is no L2 cache and the processor makes access request for memory location not in L1 cache, the processor must access DRAM or ROM memory across the bus • Poor performance because of slow bus speed and slow memory access time – If L2 SRAM cache is used, missing information can be quickly retrieved – Effect of using L2 depends on hit rates in both L1 and L2 caches 4-97 Yonsei University Cache memory Number of Caches • To split the cache into two: one dedicated to instruction and one dedicated to data • Potential advantage of a unified cache – For a given cache size, a unified cache has higher hit rate than split caches because it balances the load between instruction and data fetches automatically – Only one cache needs to be designed and implemented 4-98 Yonsei University Cache memory Number of Caches • The trend toward split caches – Superscalar machines (Pentium II, PowerPC) which emphasize parallel instruction execution and prefetching of predicted future instructions – The key advantage of the split cache design is that it eliminates condition for cache between the instruction processor and the execution unit – Important in any design that relies on the pipelining of instructions 4-99 Yonsei University Pentium II Block Diagram 4-100 Pentium II and PowerPC Yonsei University Structure Of Pentium II Data Cache Pentium II and PowerPC • LRU replacement algorithm • Write-back policy 4-101 Yonsei University Data Cache Consistency Pentium II and PowerPC • To provide cache consistency, data cache supports a protocol MESI (modified/exclusive/shared/invalid) • Data cache includes two status bit per tag, so that each line can be in one of four states – – – – 4-102 Modified Exclusive Shared Invalid Yonsei University Pentium II and PowerPC MESI Cache Line States M E S Modified Exclusive Shared I Invalid This cache line valid? Yes Yes Yes No The memory copy is… Out of date Valid Valid - Copies exist in other caches? No No Maybe Maybe Does not go to bus Goes to bus and updates cache Goes directly to use A write to this line… 4-103 Does not go to bus Yonsei University Pentium II and PowerPC Cache Control • Internal cache controlled by two bits – CD (cache disable) – NW(not write – through) • Two Pentium II instruction that used to control cache – INVD invalidates(flushes) internal cache memory and signals external cache to invalidate – WBINVD writes back and invalidates internal cache, then writes back and invalidates external cache 4-104 Yonsei University Pentium II Cache Operation Modes Control Bits Pentium II and PowerPC Operating Mode CD NW Cache Fills Write Throughs 0 0 Enabled Enabled Enabled 1 1 0 1 Disabled Disabled Enabled Disabled Enabled Disabled 4-105 Invalidates Yonsei University PowerPC Internal Cache Pentium II and PowerPC Model Size Bytes/Line Organization PowePC 601 1 32-kbyte 32 8-way set associative PowePC 603 2 8-kbyte 32 2-way set associative PowePC 604 2 16-kbyte 32 4-way set associative PowePC 620 2 32-kbyte 64 8-way set associative 4-106 Yonsei University PowerPC G3 Block Diagram 4-107 Pentium II and PowerPC Yonsei University PowerPC Cache Organization Pentium II and PowerPC • The L1 caches are eight-way set associative and use a version of the MESI cache coherency protocol • The L2 cache is a two-way set associative cache with 256K, 512K, of 1 Mbyte of memory 4-108 Yonsei University Enhanced DRAM(EDRAM) Advanced DRAM organization • Integrates a small SRAM cache onto a generic DRAM chip • Refresh operations can be conducted in parallel with cache read operations, minimizing the time that the chip is unavailable due to refresh • The read path from the row cache to the output port is independent of the write path from the I/O module to the sense amplifiers • This enables a subsequent read access to the cache to be satisfied in parallel with the completion of the write operation 4-109 Yonsei University EDRAM 4-110 Advanced DRAM organization Yonsei University Cache DRAM(CDRAM) Advanced DRAM organization • Includes a larger SRAM cache than EDRAM • SRAM on the CDRAM can be used in two ways – Used as a true cache, consisting of a number of 64-bit lines • The cache mode is effective for ordinary random access to memory – Used as a buffer to support the serial access of a block of data 4-111 Yonsei University Synchronous DRAM(SDRAM) Advanced DRAM organization • Exchanges data with the processor synchronized to an external clock signal and running at the full speed of the processor/memory bus without imposing wait states • With synchronous access, the DRAM moves data in & out under control of the system clock • Employs burst mode – To eliminate the address setup time and row and column line precharge time after first access – In burst mode, a series of data bits can be clocked out rapidly after the first bit has been accessed – Useful when all bits are to be accessed in sequence and in the same row of the array as the initial access 4-112 Yonsei University SDRAM Advanced DRAM organization • Dual-bank architecture that improves opportunities for on-chip parallelism • The mode register and associated control logic is another key feature differentiating SDRAMs from conventional DRAMs • It provides a mechanism to customize the SDRAM to suit specific system needs • The mode register specifies the burst length • SDRAM performs best when it is transferring large blocks of data serially 4-113 Yonsei University SDRAM 4-114 Advanced DRAM organization Yonsei University Rambus DRAM(RDRAM) Advanced DRAM organization • A more revolutionary approach to the memory-bandwidth problem • The chip exchanges data with processor over 28 wires no more than 12cm long • The bus can address up to 320 RDRAM chip and is rated at 500 Mbps • The special RDRAM bus delivers address and control information using an asynchronous block-oriented protocol • An RDRAM gets a memory request over the high-speed bus 4-115 Yonsei University RamLink Advanced DRAM organization • A memory interface with point-to-point connections arranged in a ring – Traffic on the rings is managed by a memory controller that sends messages to the DRAM chips • Data is exchanged in the from of packets • Request packets initiate memory transaction • Strengths – Provides a scalable architecture that supports a small or large number of DRAMs – Does not dictate internal DRAM structure 4-116 Yonsei University RamLink Advanced DRAM organization • RamLink Architecture 4-117 Yonsei University RamLink Advanced DRAM organization • Packet format 4-118 Yonsei University Characteristics of Two-Level Memories Appendix 4A Main Memory Cache Virtual Memory (Paging) Disk Cache Typical access time ratios 5/1 1000/1 1000/1 Memory management system Implemented by special hardware Combination of hardware and System software system software 4 to 128 bytes 64 to 4096 bytes 64 to 4096 bytes Direct access Indirect access Typical block size Access of processor to second level 4-119 Indirect access Yonsei University Relative Dynamic Frequency Study Language [HUCK83] [KNUT71] Pascal Workload Scientific FORTRAN Student Appendix 4A [PATT82] Pascal C System System [TANE78] SAL System Assign Loop 74 4 67 3 45 5 38 3 42 4 Call 1 3 15 12 12 IF 20 11 29 43 36 GoTo Other 2 - 9 7 6 3 1 6 4-120 Yonsei University Locality Appendix 4A • Each call is represented by the line moving down and to the right • Each return is represented by the line moving up and to the right • A window with depth equal to 5 is defined – Only a sequence of calls and returns with a net movement of 6 in either direction causes the window to move 4-121 Yonsei University The Call/Return Behavior of Programs 4-122 Appendix 4A Yonsei University Locality Appendix 4A • Spatial locality – Refers to the tendency of execution to involve a number of memory locations that are clustered • Temporal locality – Refers to the tendency for a processor to access memory locations that have been used recently 4-123 Yonsei University Locality of Reference For Web Pages 4-124 Appendix 4A Yonsei University Operation of Two-Level Memory Appendix 4A Ts = H ×T1 + (1 - H) × (T1 + T2 ) = T1 + (1 – H) × T2 where Ts = average(system) access time T1 = access time of M1 (e.g., cache, disk cache) T2 = access time of M2 (e.g., main memory, disk) H = hit ratio (fraction of time reference is found in M1) 4-125 Yonsei University Performance C Appendix 4A s ? C S S 1 1 1 ? C S ? S 2 2 2 Where Cs = average cost per bit for the combined two-level memory C1 = average cost per bit of upper-level memory M1 C2 = average cost per bit of lower-level memory M2 S1 = size of M1 S2 = size of M2 We would like Cs ˜ C2 (Given C1 >> C2 , this requires S1 << S2 ) 4-126 Yonsei University Memory cost Vs. Memory Size 4-127 Appendix 4A Yonsei University Performance Appendix 4A • Consider the quantity T 2 / T1 , which referred to as the access efficiency T T 4-128 1 s ? 1 1 ? (1 ? H ) T T 2 1 Yonsei University Access Efficiency Vs. Hit Ratio(T2 / T1 ) 4-129 Appendix 4A Yonsei University Hit Ratio Vs. Memory Size 4-130 Appendix 4A Yonsei University Performance Appendix 4A • If there is strong locality, it is possible to achieve high values of hit ratio even with relatively small upper-level memory size – Small cache sizes will yield a hit ratio above 0.75 regardless of the size of main memory – A cache in the range of 1K to 128K words is generally adequate, whereas main memory is now typically in the multiple-megabyte range • If we need only a relatively small upper-level memory to achieve good performance, the average cost per bit of the two levels of memory will approach that of the cheaper memory 4-131 Yonsei University