EECS150 - Digital Design
Lecture 20 - Memory
April 4&9, 2002
John Wawrzynek
Spring 2002 EECS150 - Lec19-memory

Memory Basics
• Uses:
  – data & program storage
  – general purpose registers
  – buffering
  – table lookups
  – CL implementation
  – Whenever a large collection of state elements is required.
• Example RAM: Register file
    regid = register identifier
    sizeof(regid) = log2(# of reg)
    WE = write enable
• Types:
  – RAM - random access memory
  – ROM - read only memory
  – EPROM, FLASH - electrically programmable read only memory

Register File Internals
• Functionally the regfile is equivalent to a 2-D array of flip-flops:
• Cell with write logic:
How do we go from "regid" to "SEL"?

Regid (address) Decoding

Standard Internal Memory Organization
• Special circuit tricks are used for the cell array to improve storage density. (We will look at these later.)
• RAM/ROM naming convention:
  – examples: 32 X 8, "32 by 8" => 32 8-bit words
  – 1M X 1, "1 meg by 1" => 1M 1-bit words

Read Only Memory (ROM)
• Functional Equivalence:
• Of course, full tri-state buffers are not needed at each cell point.
• Single transistors are used to implement zero cells. Logic ones are derived through precharging or a bit-line pullup transistor.

Column MUX in ROMs and RAMs:
• Controls physical aspect ratio
• In DRAM, allows reuse of chip address pins

Cascading Memory Modules (or chips)
• example: 256 X 8 ROM using 256 X 4 parts:
• example: 1K X 8 ROM using 256 X 4 parts:
• each module has tri-state outputs:

Definitions
• Bandwidth: Total amount of data transferred out of a device or across an interface per unit time.
(usually bytes/sec)
• Latency: A measure of the time from a request for a data transfer until the data is received.
• Volatile: Loses its state when the power goes off.

Memory Interfaces for Accessing Data
• Asynchronous (unclocked): A change in the address results in data appearing.
• Synchronous (clocked): A change in address, followed by an edge on CLK, results in data appearing. Sometimes, multiple requests may be outstanding.

Example Memory Components:
• Volatile:
  – Random Access Memory (RAM):
    • DRAM "dynamic"
    • SRAM "static"
• Non-volatile:
  – Read Only Memory (ROM):
    • Mask ROM "mask programmable"
    • EPROM "erasable programmable"
    • EEPROM "electrically erasable programmable"
    • FLASH memory - similar to EEPROM with programmer integrated on chip

Volatile Memory Comparison
(figure: SRAM and DRAM cell schematics with word lines and bit lines)
• SRAM Cell
  – Larger cell => lower density, higher cost/bit
  – No refresh required
  – Simple read => faster access
  – Standard IC process => natural for integration with logic
• DRAM Cell
  – Smaller cell => higher density, lower cost/bit
  – Needs periodic refresh, and refresh after read
  – Complex read => longer access time
  – Special IC process => difficult to integrate with logic circuits

In Desktop Computer Systems:
• SRAM (lower density, higher speed) used in CPU register file, on- and off-chip caches.
• DRAM (higher density, lower speed) used in main memory
• Closing the GAP: Innovation targeted towards higher bandwidth for memory systems:
  – SDRAM - synchronous DRAM
  – RDRAM - Rambus DRAM
  – EDORAM - extended data out DRAM
  – Three-dimensional RAM
  – hyper-page mode DRAM video RAM
  – multibank DRAM

Important DRAM Examples:
• EDO - extended data out (similar to fast-page mode)
  – RAS cycle fetches rows of data from cell array blocks (long access time, around 100ns)
  – Subsequent CAS cycles quickly access data from row buffers if within an address page (page is around 256 Bytes)
• SDRAM - synchronous DRAM
  – clocked interface
  – uses dual banks internally. Start access in one bank then the next, then receive data from the first then the second.
• DDR - Double data rate SDRAM
  – Uses both rising (positive) and falling (negative) edges of the clock for data transfer (typically a 100 MHz clock with a 200 MHz transfer rate).
• RDRAM - Rambus DRAM
  – Entire data blocks are accessed and transferred out on a high-speed bus-like interface (500 MB/s, 1.6 GB/s)
  – Tricky system level design. More expensive memory chips.

Non-volatile Memory
Used to hold fixed code (ex. BIOS), tables of data (ex. FSM next state/output logic), slowly changing values (date/time on computer)
• Mask ROM
  – Used with logic circuits for tables etc.
  – Contents fixed at IC fab time (truly write once!)
• EPROM (erasable programmable) & FLASH
  – requires special IC process (floating gate technology)
  – writing is slower than RAM. EPROM uses a special programming system to provide special voltages and timing.
  – reading can be made fairly fast.
  – rewriting is very slow.
    • erasure is first required; for EPROM this means UV light exposure

FLASH Memory
• Electrically erasable
• In-system programmability and erasability (no special system or voltages needed)
• On-chip circuitry (FSM) to control erasure and programming (writing)
• Erasure happens in variable-sized "sectors" in a flash (16K - 64K Bytes)
See: http://developer.intel.com/design/flash/ for product descriptions, etc.

Relationship between Memory and CL
• Memory blocks can be (and often are) used to implement combinational logic functions:
• Examples:
  – LUTs in FPGAs
  – 1M x 8 EPROM can implement 8 independent functions each of log2(1M)=20 inputs.
• The decoder part of a memory block can be considered a “minterm generator”.
• The cell array part of a memory block can be considered an OR function over a subset of rows.
• The combination gives us a way to implement logic functions directly in sum of products form.
• Several variations on this theme exist in a set of devices called programmable logic devices (PLDs)

A ROM as AND/OR Logic Device
PLD Summary
PLA Example
PAL Example
(figure-only slides)

Memory Blocks in FPGAs
• LUTs can double as small RAM blocks:
  – 4-LUT is a 16x1 memory
  – achieves 16x density advantage over using CLB flip-flops
• Newer FPGA families include additional on-chip RAM blocks (usually dual ported)
  – Called “block-rams” in Xilinx Virtex series

Memory Specification in Verilog
• Memory modeled by an array of registers:
    reg [15:0] memword [0:1023];   // 1,024 registers of 16 bits each

    // Example Memory Block Specification
    // -----------------------------
    // Read and write operations of memory.
    // Memory size is 64 words of 4 bits each.
    module memory (Enable, ReadWrite, Address, DataIn, DataOut);
      input        Enable, ReadWrite;
      input  [3:0] DataIn;
      input  [5:0] Address;
      output [3:0] DataOut;
      reg    [3:0] DataOut;
      reg    [3:0] Mem [0:63];                     // 64 x 4 memory

      // Address and DataIn are in the sensitivity list so that a change
      // on either while Enable is high is not missed.
      always @ (Enable or ReadWrite or Address or DataIn)
        if (Enable) begin
          if (ReadWrite) DataOut = Mem[Address];   // Read
          else           Mem[Address] = DataIn;    // Write
        end
        else DataOut = 4'bz;                       // High impedance state
    endmodule

Error Correction Codes (ECC)
• Memory systems generate errors (accidentally flipped bits)
  – DRAMs store very little charge per bit
  – “Soft” errors occur occasionally when cells are struck by alpha particles or other environmental upsets.
  – Less frequently, “hard” errors can occur when chips permanently fail.
• Where “perfect” memory is required
  – servers, spacecraft/military computers, …
• Memories are protected against failures with ECCs
• Extra bits are added to each data-word
  – extra bits are used to detect and/or correct faults in the memory system
  – in general, each possible data word value is mapped to a unique “code word”. A fault changes a valid code word to an invalid one, which can be detected.

Simple Error Detection Coding
Parity Bit
• Each data value, before it is written to memory, is “tagged” with an extra bit to force the stored word to have even parity:
    b7 b6 b5 b4 b3 b2 b1 b0 p
• Each word, as it is read from memory, is “checked” by finding its parity (including the parity bit):
    c = b7 + b6 + b5 + b4 + b3 + b2 + b1 + b0 + p   (+ denotes XOR)
• A non-zero parity indicates an error occurred:
  – two errors (on different bits) are not detected (nor any even number of errors)
  – odd numbers of errors are detected.

Hamming Error Correcting Code
• Use more parity bits to pinpoint bit(s) in error, so they can be corrected.
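As a baseline for the multi-bit Hamming scheme, the single even-parity bit from the Simple Error Detection Coding slide can be sketched in Python. This is only an illustration: the function names (parity, add_even_parity, check) and the sample data value are assumptions made for this example, not part of the lecture.

```python
def parity(bits):
    """Return the XOR (mod-2 sum) of a sequence of 0/1 values."""
    p = 0
    for b in bits:
        p ^= b
    return p


def add_even_parity(data_bits):
    """Append a parity bit p so the stored word has an even number of 1s."""
    return list(data_bits) + [parity(data_bits)]


def check(word):
    """Return the check bit c: 0 if parity is consistent, 1 if an error is detected."""
    return parity(word)


# Hypothetical 8-bit data value b7..b0 (chosen arbitrarily for the demo).
data = [1, 0, 1, 1, 0, 0, 1, 0]
stored = add_even_parity(data)

one_error = stored.copy()
one_error[3] ^= 1            # flip one bit: detected

two_errors = one_error.copy()
two_errors[6] ^= 1           # flip a second, different bit: goes undetected

print(check(stored), check(one_error), check(two_errors))  # prints: 0 1 0
```

The last line shows the limitation stated above: an odd number of flipped bits gives c = 1, while an even number gives c = 0 and slips through.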
• Example: SEC on 4-bit data
  – use 3 parity bits; with 4 data bits this results in a 7-bit code word
  – 3 parity bits are sufficient to identify any one of 7 code word bits
  – overlap the assignment of parity bits so that a single error in the 7-bit word can be corrected
• Group parity bits so they correspond to subsets of the 7 bits:
  – p1 protects bits 1,3,5,7
  – p2 protects bits 2,3,6,7
  – p3 protects bits 4,5,6,7

    Bit position number:  1   2   3   4   5   6   7
    Code word bit:        p1  p2  d1  p3  d2  d3  d4

    p1: 001 = 1, 011 = 3, 101 = 5, 111 = 7   (binary = decimal)
    p2: 010 = 2, 011 = 3, 110 = 6, 111 = 7
    p3: 100 = 4, 101 = 5, 110 = 6, 111 = 7

Hamming Code Example
    1   2   3   4   5   6   7
    p1  p2  d1  p3  d2  d3  d4
• Note: parity bits occupy power-of-two bit positions in the code word.
• On writing, parity bits are assigned to force even parity over their respective groups.
• On reading, check bits (c1,c2,c3) are generated by finding the parity of each group along with its parity bit. If an error occurred in a group, the corresponding check bit will be 1; if no error, the check bit will be 0.
• Example: c = c1c2c3 = 101
  – error in 4,5,6, or 7 (by c3=1)
  – error in 1,3,5, or 7 (by c1=1)
  – no error in 2, 3, 6, or 7 (by c2=0)
  Therefore the error must be in bit 5.
• Note the check bits point to 5: read as the binary number c3c2c1, they give 101 = 5. By our clever positioning and assignment of parity bits, the check bits always address the position of the error!
• c=000 indicates no error

Hamming Error Correcting Code
• Overhead involved in single error correction code:
  – let p be the total number of parity bits and d the number of data bits in a p + d bit word.
  – If p error correction bits are to point to the error bit (p + d cases) plus indicate that no error exists (1 case), we need:
        2^p >= p + d + 1, thus p >= log2(p + d + 1)
    for large d, p approaches log2(d)
• Adding one extra parity bit covering the entire word can provide double error detection:
    1   2   3   4   5   6   7   8
    p1  p2  d1  p3  d2  d3  d4  p4
• On reading, the C bits are computed (as usual) plus the parity over the entire word, P:
    C=0   P=0   no error
    C!=0  P=1   correctable single error
    C!=0  P=0   a double error occurred
    C=0   P=1   an error occurred in p4 bit
Typical modern codes in DRAM memory systems: 64-bit data blocks (8 bytes) with 72-bit code words (9 bytes).
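The Hamming scheme above can be sketched in Python as a rough illustration. It uses the lecture's bit layout (p1 p2 d1 p3 d2 d3 d4, plus p4 for double error detection) and the overhead bound 2^p >= p + d + 1. All function names, the GROUPS table, and the sample data bits are assumptions introduced for this sketch.

```python
# Parity bit position -> bit positions it protects (positions are 1-based).
GROUPS = {1: (1, 3, 5, 7), 2: (2, 3, 6, 7), 4: (4, 5, 6, 7)}


def encode(d1, d2, d3, d4):
    """Build the 7-bit code word p1 p2 d1 p3 d2 d3 d4 with even parity per group."""
    word = [0] * 8                      # index 0 unused so indices match positions
    word[3], word[5], word[6], word[7] = d1, d2, d3, d4
    for p, group in GROUPS.items():
        # word[p] is still 0 here, so this is the parity of the group's data bits.
        word[p] = sum(word[i] for i in group) % 2
    return word[1:]


def syndrome(code):
    """Return the check bits as a number c3c2c1: 0 for no error, else the bad position."""
    word = [0] + list(code)
    c = 0
    for p, group in GROUPS.items():
        if sum(word[i] for i in group) % 2:
            c += p                      # c1 has weight 1, c2 weight 2, c3 weight 4
    return c


def correct(code):
    """Flip the bit addressed by the syndrome (single error correction)."""
    c = syndrome(code)
    fixed = list(code)
    if c:
        fixed[c - 1] ^= 1
    return fixed, c


def encode_secded(d1, d2, d3, d4):
    """Append p4 so that parity over all 8 bits is even."""
    code = encode(d1, d2, d3, d4)
    return code + [sum(code) % 2]


def classify(code8):
    """Apply the C / P decision table to an 8-bit SEC-DED word."""
    c = syndrome(code8[:7])
    overall = sum(code8) % 2
    if c == 0 and overall == 0:
        return "no error"
    if c != 0 and overall == 1:
        return "correctable single error"
    if c != 0 and overall == 0:
        return "double error"
    return "error in p4"


def min_parity_bits(d):
    """Smallest p satisfying 2**p >= p + d + 1."""
    p = 1
    while 2 ** p < p + d + 1:
        p += 1
    return p


code = encode(1, 0, 1, 1)               # hypothetical data bits d1..d4
code[4] ^= 1                            # corrupt bit position 5
print(syndrome(code))                   # prints: 5
print(min_parity_bits(4), min_parity_bits(64))  # prints: 3 7
```

For 64 data bits the bound gives 7 parity bits. One more bit for double error detection makes 8, which matches the 72-bit code words mentioned above.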