EE 5324 – VLSI Design II
Part V: Memory Design
Kia Bazargan, University of Minnesota
Spring 2006  EE 5324 - VLSI Design II - © Kia Bazargan

References and Copyright
• Textbooks referenced
  - [Rab96] J. M. Rabaey, “Digital Integrated Circuits: A Design Perspective,” Prentice Hall, 1996.
• Slides used (modified by Kia when necessary)
  - [©Hauck] © Scott A. Hauck, 1996-2000; G. Borriello, C. Ebeling, S. Burns, 1995, University of Washington
  - [©Prentice Hall] © Prentice Hall 1995, © UCB 1996. Slides for [Rab96]: http://bwrc.eecs.berkeley.edu/Classes/IcBook/instructors.html

Outline
• Registers (flip-flops), shift registers
• Memory interface
• Memory cells
  - Static memory cell
  - Dynamic memory cell
  - ROM cells
• Address decoders
• Content addressable memory (CAM)
• Non-volatile memory cells

Registers
• Used for storing data
• Structure
  - N bits wide
  - Parallel/serial read/write
  - Clocked
  - Static/dynamic implementation
• Register files
  - Multiple read/write ports possible
  - Example: 32 bits wide by 16 words deep, dual-port parallel read, single-port parallel write register file
[Figure: register file, 16 words × 32 bits] [©Hauck]

Implementing Registers Using Logic Gates
• Flip-flops
  - Simple SR latch:
      S  R | Q  Q’
      1  1 | Q  Q’
      1  0 | 0  1
      0  1 | 1  0
      0  0 | x  x
  - Flip-flops
      o JK, D, T
      o Clocked
      o Master-slave (edge-triggered)
[Figure: SR latch symbol and gate schematic with S, R, Q, Q’]

Implementing Registers in CMOS
• Direct gate implementation too costly
  - A master-slave JK flip-flop uses 38 CMOS transistors
• Directly implement in transistors
  - Example: clocked SR FF
  - Note: carefully size the S, R and φ transistors so that we can write
[Figure: clocked SR flip-flop with inputs S, R, clock φ and outputs Q, Q’] [Rab96] p.342

Implementing Registers in CMOS (cont.)
• Another example: D latch (register)
  - Uses transmission gate
  - When “WR” is asserted, a “write” operation takes place
  - Stack D latch structures to get an n-bit register
[Figure: transmission-gate D latch; signals D, Q, Q’, WR]

Shift Registers: Idea
• Shift registers are used for iteratively shifting data
  - Used in pipelining, bit-by-bit processing, etc.
[Figure: shift-register chain with stages D1, D2, D3, all clocked by φ]
• Problem?
  - When the clock goes high, the data traverses the whole shift-register chain in one clock cycle!
  - Solution: use non-overlapping clocks φ1 and φ2. φ1 is used by odd gates, φ2 by even gates (use transmission gates after D1’, D2’, D3’).

Memory (Array) Design
• Array of bits
• Area very important
  - Memory takes considerable area in processor chips
  - Compaction results in fewer memory chip modules, more on-chip cache
• Timing and power consumption of memory blocks have significant impact on the system
• Different types
  - RAM (SRAM, DRAM, CAM)
  - ROM (PROM, EEPROM, FLASH)

Memory Design (cont.)
• Static vs.
dynamic RAM
  - Dynamic needs refreshing
      o Refreshing: read, then write back to restore charge
      o Either periodically or after each read
• Static (SRAM)
  - Data stored as long as supply voltage is applied
  - Large (6 transistors/cell)
  - Fast
• Dynamic (DRAM)
  - Periodic refresh required
  - Small (1-3 transistors/cell)
  - Slower
  - Special fabrication process
[©Prentice Hall]

Memory Architecture: the Big Picture
• Address: which one of the M words to access
• Data: the N bits of the word are read/written
[Figure: M storage words (Word 0 ... Word M-1) of N bits each, selected by word select lines S0 ... SM-1. Left panel: one select line per word. Right panel: an address decoder drives the select lines from address bits A0 ... Ak-1, with k = log2(M)]

Memory Access Timing: the Big Picture
• Timing:
  - Send address on the address lines, wait for the word line to become stable
  - Read/write data on the data lines
[Figure: timing diagram of READ, WRITE and DATA, marking read cycle, read access, write cycle, write access, data valid and data written] [©Prentice Hall]

Memory Cell: Static RAM (8 transistors)
• 8-transistor cell
  - Bit_i is the data bus
  - Sj is the word line
  - Bit’ used to reduce delay
• Bus drivers
  - Sense amplifier (inverter with high gain) used for fast switching
  - Make sure inverters in cell are weaker than the combination of “write buffer” and pass transistor
[Figure: 8-transistor cell on lines biti, biti’ and Sj, with Rd/WR bus driver labelled “very big driver”] [©Hauck]

Memory Cell: Static RAM (6 transistors)
• 6-transistor cell
  - Must adjust inverters for input coming through n-type pass gate
[Figure: 6-transistor cell on bit lines biti (BL), biti’ and word line Sj (WL)]
• Bus drivers
  - Must adjust senseAmp for input coming through n-type pass gate
  - Harder to drive 1 than 0 through write buffer (high resistance via n-transistor)
  - One side is sending 0 anyway (bit or bit’) → written correctly
[Figure: Rd/WR bus driver] [©Hauck]

6-Transistor SRAM Cell: Layout
• WL is word line (select line Sj)
• BL is bit line (biti)
[Figure: 6T SRAM cell schematic and layout with transistors M1-M6, storage nodes Q and Q’, bit lines BL and BL’, word line WL, VDD and GND] [Rab96] p.578 [©Prentice Hall]

6-Transistor Memory Array
• 8-word-deep RAM, 2-bit-wide words
• To write to word j:
  - Set Sj=1, all other S lines to 0
  - Send data on the global bit0, bit0’, bit1, bit1’
• To read word k:
  - Set Sk=1, all other S lines to 0
  - Sense data on bit0 and bit1.
[Figure: 8 × 2 array of 6T cells with word lines S0 ... S7, bit lines bit0, bit0’, bit1, bit1’ and Rd/WR drivers]

Dynamic RAM 4-Transistor Cell
• 4-transistor cell
• Dynamic charge storage must be refreshed
• Dedicated busses for reading and writing
[Figure: 4T cell with data in, data out, WR and Rd lines; annotation: keeps its value for about 1 ms] [©Hauck]

Dynamic RAM 3-Transistor Cell
• 3-transistor cell
  - No p-type transistors yields a very compact layout for cell
  - No Vdd connection
  - Sense amplifier must be able to quickly detect dropping voltage
  - Precharge data_out’ to generate ‘1’ outputs
[Figure: 3T cell with precharge, data in, data out, WR and Rd lines] [©Hauck]

Dynamic RAM 3-Transistor Cell: Timing
[Figure: 3T cell schematic and waveforms for precharge, WR, data in, node X, Rd and data out. Levels marked: Vdd-VT on X, Vdd and a drop ΔV on data out]
• Value stored at node X when writing a “1” = VWR - VTn
[Rab96] p.586 [©Prentice Hall]
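As a rough numerical illustration of the note above (the level stored at node X when writing a “1” is VWR - VTn), the Python sketch below computes the stored level for two word-line voltages. The function name and all voltage values are illustrative assumptions, not values from the slides, and the body effect on VTn is ignored.

```python
def stored_high_level(v_wr: float, v_tn: float, v_dd: float) -> float:
    """Voltage left on node X after writing a '1' through an nMOS pass device.

    The pass transistor stops conducting once X rises to v_wr - v_tn,
    and X can never exceed the data-in level v_dd.
    """
    return min(v_dd, v_wr - v_tn)


# Assumed example values (hypothetical, for illustration only).
V_DD = 2.5   # supply and data-in level for a '1', in volts
V_TN = 0.5   # nMOS threshold voltage, in volts

# Write line driven to Vdd: one threshold voltage is lost on X.
plain = stored_high_level(v_wr=V_DD, v_tn=V_TN, v_dd=V_DD)
# Write line boosted one threshold above Vdd: X reaches the full level.
boosted = stored_high_level(v_wr=V_DD + V_TN, v_tn=V_TN, v_dd=V_DD)

print(f"WR driven to Vdd:       X = {plain:.2f} V")
print(f"WR boosted to Vdd+VTn:  X = {boosted:.2f} V")
```

With these assumed numbers, driving WR only to Vdd leaves X one threshold below the supply, which reduces the drive of the read transistor; boosting WR is one common way to recover the full level.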
Dynamic RAM 3-Transistor Cell: Layout
[Figure: 3T cell schematic and layout with transistors M1, M2, M3 and lines Din, Dout, WR, Rd, GND] [Rab96] p.586 [©Prentice Hall]

Dynamic RAM 1-Transistor Cell
• 1-transistor cell
  - Storage capacitor is source of cell transistor
  - Special processing steps to make the storage capacitor large
  - Charge sharing with bus capacitance
  - Extra demand on sense amplifier to detect small changes
  - Destructive read (must write immediately)
[Figure: 1T cell with bit line Bi (precharged to middle voltage level), word line Si (WL) and storage capacitor (Ccell << Cbus)] [©Hauck]

Dynamic RAM 1-Transistor Cell: Timing
• Write: Cs is charged/discharged
• Read
  - ΔV = VBL - VPRE = (VX - VPRE) · Cs / (Cs + CBL)
  - Voltage swing is small (~250 mV)
[Figure: 1T cell with storage node X, capacitor Cs and bit-line capacitance CBL; waveforms of WL, X and BL for write “1” and read “1”. Levels marked: GND, Vdd-VT, Vdd, Vdd/2, sensing] [Rab96] p.587 [©Prentice Hall]

Dynamic RAM 1-Transistor Cell: Observations
• DRAM memory cell is single-ended
• Read operation is destructive
• Unlike 3T cell, 1T cell requires presence of an extra capacitance that must be explicitly included in the design
  - Polysilicon-diffusion plate capacitor
  - Trench or stacked capacitor
• When writing a “1” into a DRAM cell, a threshold voltage is lost
  - Set WL to a higher value than Vdd
[©Prentice Hall]

Dynamic RAM 1-Transistor Cell: Layout
[Figure: (a) cross-section and (b) layout of a 1T cell. Labels: metal word line, M1 word line, polysilicon gate, polysilicon plate, diffused bit line, n+ regions, SiO2, field oxide, capacitor, inversion layer induced by plate bias]
• Uses polysilicon-diffusion capacitance
• Expensive in area
[©Prentice Hall]

Dynamic RAM 1-Transistor Cell: Layout
[©Prentice Hall]

Dynamic RAM 1-Transistor Cell: Layout
[Figure: cross-sections of 1T cell capacitor structures. Labels include word line, insulating layer, cell plate and capacitor dielectric layer]
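The read expression on the 1-transistor cell timing slide above, ΔV = (VX - VPRE) · Cs / (Cs + CBL), can be evaluated numerically. The Python sketch below is only an illustration: the function name and the capacitance and voltage values are assumptions, not values taken from the slides.

```python
def bitline_swing(v_x: float, v_pre: float, c_s: float, c_bl: float) -> float:
    """Charge-sharing swing: dV = (V_X - V_PRE) * C_S / (C_S + C_BL)."""
    return (v_x - v_pre) * c_s / (c_s + c_bl)


# Assumed example values (hypothetical, for illustration only).
V_DD = 2.5          # supply voltage, volts
V_T = 0.5           # threshold lost when writing a '1', volts
V_PRE = V_DD / 2    # bit line precharged to the middle level
C_S = 50e-15        # storage capacitance, farads
C_BL = 1e-12        # bit-line capacitance, farads (C_S << C_BL)

dv_one = bitline_swing(V_DD - V_T, V_PRE, C_S, C_BL)   # cell stored a '1'
dv_zero = bitline_swing(0.0, V_PRE, C_S, C_BL)         # cell stored a '0'

print(f"read '1': dV = {dv_one * 1e3:+.1f} mV")
print(f"read '0': dV = {dv_zero * 1e3:+.1f} mV")
```

With these assumed values the bit line moves by only a few tens of millivolts, up for a stored “1” and down for a stored “0”, which is why the sense amplifier has to detect small changes and why the cell must be rewritten after the read.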
[Figure panels: trench cell and stacked-capacitor cell. Further labels: cell plate Si, capacitor insulator, refilling poly, transfer gate, isolation, storage electrode, storage node poly, Si substrate, 2nd field oxide] [©Prentice Hall]

RAM Cells: Summary
• Static
  - Fastest (no refresh)
  - Simple design
  - Right solution for small memory arrays such as register files
• Dynamic
  - Densest: 1T is best and is the way to go for large memory arrays
  - Built-in circuitry to step through cells and refresh (can do more than one word at a time)
  - Sense amplifier needed for fast read operation
[©Hauck]

Multi-Port RAM Cells
• Idea: add more input and output transistors
• Can be applied to all variants
  - Usually not done for 1T cells
[Figure: dual-port cell with bit lines bus_A, bus_A’, bus_B, bus_B’ and word lines row-bus_A, row-bus_B] [©Hauck]

Multi-Port RAM Cells Array
• 8 words deep, 2-bit-wide words, dual-port memory
• To read from word j and write “d1d0” to word k simultaneously:
  - Set SAj=1, and all other SA’s=0
  - Set SBk=1, and all other SB’s=0
  - Sense the values on bus_A0 and bus_A1
  - Write d1d0 to bus_B1 and bus_B0
[Figure: dual-port array with select lines SA0 ... SA7 and SB0 ... SB7, and bit lines bus_A0, bus_A0’, bus_A1, bus_A1’, bus_B0, bus_B0’, bus_B1, bus_B1’]

Read Only Memory (ROM) Cells: MOS NOR
• To store constant data or invariant code
• Popular for control implementation
  - Store program or state machine
• Programmable logic array structure
• Can be precharged or pseudo-nMOS
[Figure: MOS NOR ROM with word lines read1, read2 and bit lines bit1, bit2, bit3; values 010 and 001 shown for read1 and read2] [©Hauck]

ROM Cell: MOS NOR Layout
[Figure: NOR ROM layout with word lines WL0-WL3 in polysilicon, GND in diffusion, and Metal1 on top of diffusion. Basic cell: 10λ × 7λ]
• Only 1 layer (metal-to-diffusion contact mask) is used to program the memory array
• Programming of the memory can be delayed to one of the last process steps
[©Prentice Hall]

ROM Cell: MOS NOR Alternative Layout
[Figure: NOR ROM layout with bit lines BL0-BL3 (Metal1 over diffusion), polysilicon word lines WL0-WL3, GND in diffusion, and threshold-raising implants. Basic cell: 8.5λ × 7λ]
• Threshold-raising implants disable transistors
[©Prentice Hall]

ROM Cell: MOS NAND
[Figure: NAND ROM with pull-up devices, bit lines BL0-BL3 and word lines WL0-WL3]
• All word lines high by default, with the exception of the selected row
[©Prentice Hall]

ROM Cell: MOS NAND: Layout
[Figure: NAND ROM layout in diffusion and polysilicon with threshold-lowering implants. Basic cell: 5λ × 6λ]
• No contact to Vdd or GND necessary → drastically reduced cell size
• Loss in performance compared to NOR ROM
  - Why?
[©Prentice Hall]

ROM Cells: Summary
• Mask programmability
• Precharged vs.
pseudo-nMOS
• NAND cell, NOR cell
  - Area
  - Speed
• Other types: EEPROM, etc.

Memory Cell Array Interface: Example
• Memory parameters:
  - 16 bits wide
  - 1024 words deep
• Accessing word 9
  - Address = 0000001001₂
[Figure: decoder with address inputs A0 ... A9 driving select lines S0 ... S1023; only S9 is 1, selecting Word 9 among Word 0 ... Word 1023. The 16-bit word connects to sense amplifiers / drivers]

Memory Cell Array Layout
• Memory performance (speed)
  - Storage cell speed (read, write)
  - Data bus capacitance
  - Periphery: address decoders, sense amplifiers, buffers
• Memory area
  - Cell array layout
• How to lay out the cell array?
  - Linear is bad:
      o Long data busses → large capacitance
      o A lot of cells connected to data bus
      o Decoder will have a lot of logic levels
[Figure: linear array of M words (Word 0 ... Word M-1) of N bits, with decoder inputs A0 ... Ak-1, select lines S0 ... SM-1 and sense amplifiers / drivers]

Memory Cell Array Layout (cont.)
• Group the M words into M/L rows, each containing L words
• Benefits?
[Figure: 2D array with M/L rows of L words. The row decoder takes A(log L) ... Ak-1 and drives row select lines S0..L-1, SL..2L-1, ..., SM-L..M-1. Each word column has N-bit sense amplifiers / drivers. The column decoder + MUX takes A0 ... A(log L - 1) and selects one N-bit word. Address fields: log2 L column bits and k - log2 L row bits]

Memory Cell Array Access Example
• word = 16 bits wide (N), row = 8 words (L), address = 10 bits (k)
• Accessing word 9 = 0000001001₂
[Figure: array of M/L = 1024/8 = 128 rows of L = 8 words, each word 16 bits. Row decoder inputs A3 ... A9 = 1, 0, ..., 0 assert row select S8..15 (Words 8-15); column decoder + MUX inputs A0, A1, A2 = 1, 0, 0 pick Word 9 from that row through the 16-bit sense amplifiers / drivers]

Hierarchical Memory Structure
• Taking the idea one step further
  - Shorter wires within each block
  - Enable only one block addr decoder → power savings
[Figure: memory split into blocks, each with its own enable (Blk EN), addressed by row address, column address and block address; a global bus connects to global drivers / sense amplifiers] [Rab96] p. 558

Decreasing Word Line Delay
• Word line delay comes into play!
  - We used to have long busses; made 2D array → shorter busses
  - But, longer word lines!
• How to decrease the delay on the word lines?
  - Break the word line by inserting buffers
  - Place the decoder in the middle
[Figure: (a) drive the word line from both sides (polysilicon word line, metal word line); (b) use metal bypass (polysilicon word line, metal bypass)] [©Prentice Hall]

Decreasing Word Line Delay (cont.)
• Place the decoder in the middle
• Add buffers to outputs of decoder
[Figure: decoder with k address lines placed between two memory cell arrays] [©Hauck]

Row Decoder Implementation
• Collection of 2^k high fan-in (k inputs) logic gates
• Regular and dense structure
• (N)AND decoder
  - WL0 = A0’·A1’·A2’·A3’·A4’·A5’·A6’·A7’·A8’·A9’
  - WL511 = A0·A1·A2·A3·A4·A5·A6·A7·A8·A9’
• NOR decoder
  - WL0 = (A0 + A1 + A2 + A3 + A4 + A5 + A6 + A7 + A8 + A9)’
  - WL511 = (A0’ + A1’ + A2’ + A3’ + A4’ + A5’ + A6’ + A7’ + A8’ + A9)’
[©Prentice Hall]

Row Decoder Implementation (cont.)
[Figure, two panels: dynamic 2-to-4 NOR decoder and 2-to-4 MOS dynamic NAND decoder, each with inputs A0, A0’, A1, A1’, clock φ and outputs WL0-WL3. Labels: precharge devices, Vdd, GND] [©Prentice Hall]

Row Decoder Implementation (cont.)
[Figure: two-level decoder. Predecoders form the four products of A0, A1 and the four products of A2, A3 (true and complemented inputs), which are combined to drive WL0, WL1, ...]
• Splitting decoder into two or more logic layers produces a faster and cheaper implementation
[©Prentice Hall]

Column Multiplexers: Tree-Based Decoder
• Route many inputs to a single output
  - Inputs come from different words, same bit position
• Series transistors are slow
  - On the critical path too
• Area?
  - One-bit very small, but have to repeat the “decoding” for all bit positions.
[Figure: pass-transistor tree controlled by A0’, A0, A1’, A1, A2’, A2, selecting bit position i from the L word columns w0 ... w7] [©Hauck]

Column Multiplexers: Faster Implementation
• Decode address into one-hot signals
• Each bit passes through single n-device or pass gate
• Column decoding done in parallel w/ row decoding
[Figure: decoder with inputs A0, A1, ..., A(log L - 1) producing one-hot column select signals for outputs bit 0, bit 1]
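The row/column organization and the one-hot column select described above can be sketched behaviorally in Python. This is an illustration only, using the parameters of the access example (k = 10 address bits, L = 8 words per row); the function names are hypothetical, and the row contents are stand-in values.

```python
def split_address(addr: int, k: int, words_per_row: int) -> tuple[int, int]:
    """Split a k-bit word address into (row, column).

    The low log2(L) bits feed the column decoder, the remaining
    k - log2(L) bits feed the row decoder.
    """
    col_bits = words_per_row.bit_length() - 1      # log2(L), L a power of two
    assert 1 << col_bits == words_per_row, "L must be a power of two"
    assert 0 <= addr < (1 << k), "address out of range"
    return addr >> col_bits, addr & (words_per_row - 1)


def one_hot(index: int, width: int) -> list[int]:
    """Decoder output: exactly one of `width` select lines is 1."""
    return [1 if i == index else 0 for i in range(width)]


def column_mux(row_words: list[int], select: list[int]) -> int:
    """Each word passes through a single pass device gated by its select line."""
    return next(word for word, sel in zip(row_words, select) if sel)


# Word 9 = 0000001001 in binary, as in the access example.
row, col = split_address(0b0000001001, k=10, words_per_row=8)
select = one_hot(col, 8)
print(row, col, select)

# Stand-in row contents: each word simply holds its own word number.
row_words = list(range(row * 8, row * 8 + 8))
print(column_mux(row_words, select))
```

Because the column select lines depend only on the low address bits, they can be computed while the row decoder is still resolving the high bits, which is the parallelism the slide points out.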
[©Hauck]

Content Addressable Memory (CAM)
• Instead of address, provide data → find a match
  - Applications: cache, physical particle collider
• Needs “Encoder”:
  - Inverse function of decoder
  - Take a one-hot collection of signals and encode them
[Figure: content addressable memory cell array with 2^n rows of m bits; m-bit data input, encoder producing an n-bit output] [©Hauck]

Content Addressable Memory Cell
• Read and write like normal 6T memory cell
• Match signal is precharged to 1, pulled to 0 if no match
  - Send data on bit’ and data’ on bit for matching
  - Match remains 1 iff all bits in word match
[Figure: CAM cell with row select, match, bit and bit’ lines] [©Hauck]

Encoders
[Figure: content addressable memory cell array feeding an encoder; cell lines row select, match, bit, bit’]

Content Addressable Memory (CAM)
• Writing is done as normal SRAM
  - Address decoder needed
  - Drive row select
• ½ n log n transistors on the address lines (in encoder)

Non-Volatile Memory Cells
• Programmable after fabrication
• Keep their configuration even after the supply voltage is disconnected
• Basic idea:
  - Use a floating strip of polysilicon between the substrate and the gate
  - Put charges on the floating gate
  - Increase threshold voltage → disable the device
• Different types based
on the erasure method

Floating-Gate Transistor (FAMOS)
[Figure: (a) device cross-section with source and drain (n+) in p substrate, floating gate and gate, oxide thicknesses tox; (b) schematic symbol with terminals G, D, S] [©Prentice Hall]

Floating-Gate Transistor: Programming
[Figure, three panels with terminals S and D and negative charge on the floating gate. Panel captions: “Avalanche injection”; “Removing programming voltage leaves charges trapped”; “Programming results in higher VT”. Voltage labels in the figure: 20 V, 10 V, 5 V, 0 V, -5 V, -2.5 V] [©Prentice Hall]

FLOTOX EEPROM
[Figure: (a) Flotox transistor with source, gate, drain, floating gate, n+ regions in p substrate; dimensions 20-30 nm and 10 nm marked. (b) Fowler-Nordheim I-V characteristics: current I versus VGD, with -10 V and 10 V marked. (c) EEPROM cell during a read operation, with BL, WL and VDD] [©Prentice Hall]

FLASH EEPROM
[Figure: cell cross-section with control gate, floating gate, thin tunneling oxide, n+ source (erasure), n+ drain (programming), p-substrate] [©Prentice Hall]

Cross-Section of NVM Cells
Courtesy Intel [©Prentice Hall]

Characteristics of Some NVM Cells
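The basic non-volatile idea from the slides above (charge trapped on the floating gate raises the threshold voltage, which disables the device) can be illustrated with a toy read model in Python. The class name, the two threshold values and the read voltage are assumptions for illustration only; the slides give no numeric thresholds.

```python
from dataclasses import dataclass


@dataclass
class FloatingGateCell:
    """Toy model: an nMOS device whose threshold depends on trapped charge."""
    vt_erased: float = 1.0       # assumed threshold with no trapped charge, volts
    vt_programmed: float = 7.0   # assumed threshold after programming, volts
    programmed: bool = False

    def program(self) -> None:
        """Trap charge on the floating gate, which raises the threshold."""
        self.programmed = True

    def erase(self) -> None:
        """Remove the trapped charge, restoring the low threshold."""
        self.programmed = False

    def conducts(self, v_wordline: float) -> bool:
        """The device turns on only if the word-line voltage exceeds its threshold."""
        vt = self.vt_programmed if self.programmed else self.vt_erased
        return v_wordline > vt


V_READ = 5.0   # assumed read voltage, chosen between the two thresholds

cell = FloatingGateCell()
print("erased cell conducts:    ", cell.conducts(V_READ))
cell.program()
print("programmed cell conducts:", cell.conducts(V_READ))
```

The stored bit is read by checking whether the cell conducts at the read voltage; since the state lives in trapped charge rather than in a powered latch or a leaky capacitor, it survives removal of the supply.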