Low Power SRAM
VLSI Final Presentation
Stephen Durant, Ryan Kruba, Matt Restivo, Voravit Vorapitat

High Level Architecture
[Figure: high-level architecture. Labels: ADDR, 5:32 block-enable decoder, SRAM Block, Block I/O (READ, WRITE, ADDR, DATA, BLK ENABLE), Sense Amp, Output Buffering, 2:4 decoder on ADDR[13:12] producing BLK_EN0 to BLK_EN3, ADDR[14], Out.]

Block Level Architecture
[Figure: block-level architecture. Labels: BLK_EN, CLK, Pulse Gen 1, Precharge, 6:64 decoder, ADDR, SRAM Block, Pulse Gen 2, Delay, sense amps (SA), Write, Input Gating, READ/WRITE Register, ADDR/DATA Buffer, ADDR[14:13], SRAM Block x8.]

Word Line Pulse
[Figure: WL and pulsed WL waveforms.]
• Pulse the WL to reduce the drop in bit-line voltage during a read.
• Size the inverters to create the minimum WL pulse length.
• The minimum WL pulse occurs before the point where the sense amp can no longer execute a read.

Sense Amp Enabling
• The sense amp is enabled after the WL pulse to maximize the differential current.
• The wordline pulse generator clocks a second pulse generator to ensure proper SA timing.
• The SAE and precharge signals are kept separate so the outputs can hold until the end of the clock cycle.

Sense Amp
[Figure: sense-amp schematic. Labels: VDD, SAE, BL, BLB.]
Size the three nMOS transistors to control:
• Bit-line voltage drop
• Delay

Gate Length vs. Bit Line Voltage Drop
[Figure: bit-line voltage drop (0 to 2) vs. gate length (0 to 7), using a 5 V Vdd and allowing OutB to drop to a minimum of 4 V.]

Delay from SAE to Out
[Figure: delay from SAE to Out, time (ns, 0 to 2.5) vs. gate length (0 to 7).]
• Measured from 50% of SAE high to 50% of Out low.
• Same parameters as the bit-line voltage graph.

Memory Partitioning
• 32 blocks × 256 rows × 128 columns.
• Balances idle-block power savings against peripheral circuitry.
• The resulting block aspect ratio is relatively square, which limits the maximum WL/BL capacitances.
• WL partitioning and four words per row to reduce power.

Simulation Model
• Multiple voltage sources to accurately measure energy: wordline, active column, inactive column, and peripheral.
• $E_{total} = E_{WL} + 32E_{act} + 96E_{inact} + E_{peripheral}$
[Figure: simulation setup. Labels: Bitcell, Dummy Cells, Sense Amp, separate VSS connections.]

Low Power Techniques

Optimal Signal Order for Energy
Goal: make the WL pulse as short as possible.
• Read: SAE must be asserted only after the WL pulse ends.
• Write: the WL pulse must start only after BL or BLB has completely discharged.
[Figure: timing diagram of CLK, BL, WL and SAE for Write '0', Read, Write '1', Read.]

Lower Vdd
• Rail to rail: $E = C_{eff} V_{DD}^2$, so we expect a quadratic energy reduction.
• BL/BLB during a read: $E = C_{eff} V_{DD} \Delta V$. ΔV should scale down, but perhaps not as fast as Vdd, so we expect an energy reduction between linear and quadratic.

Simulation Result for 1 Bit
[Figure: energy vs. Vdd for a 1-bit read/write; energy (pJ, 0 to 6) vs. Vdd (V, 0 to 3); curves WL, Write, Read, DRead.]
Note: the Read, Write and DRead curves show BL energy only.

Energy vs. Vdd for 32-Bit Read/Write
[Figure: energy (pJ, 0 to 250) vs. Vdd (V, 0 to 3); curves TotalWrite, TotalRead, Total Average.]
$E_{Read,32} = E_{WL} + 32E_{Read} + 96E_{DRead}$
$E_{Write,32} = E_{WL} + 32E_{Write} + 96E_{DRead}$
How far should we go?

Clock Gating
Try to reduce the capacitance that high-activity signals have to drive. Example: the WL pulse, which has to drive 256 two-input NAND gates!
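To make the load argument concrete, here is a minimal sketch that counts the gate inputs switched on the active path of the WL pulse. It assumes (our assumption, not stated on the slides) that each gating level adds one stage of 2-way gating and that a gating-gate input and a NAND input are equal loads; `effective_load` is a hypothetical helper. Under those assumptions it reproduces the 256, 128 + 2 and 64 + 4 given for levels 0 to 2.

```python
# Hypothetical model: each clock-gating level adds one stage of 2-way gating
# between the WL pulse and the 256 row NAND gates. Loads are counted in
# gate inputs, treating a gating-gate input and a NAND input as equal.

ROWS = 256  # NAND gates driven by the ungated WL pulse (from the slides)


def effective_load(level: int, rows: int = ROWS) -> int:
    """Gate inputs switched along the active path for a given gating level.

    At level k the pulse passes through k stages of 2-way gating: each stage
    switches 2 gate inputs, and the one enabled branch drives rows / 2**k NANDs.
    """
    if level < 0 or rows % (2 ** level) != 0:
        raise ValueError("level must be >= 0 and 2**level must divide rows")
    gating_inputs = 2 * level            # two gate inputs switched per stage
    nand_inputs = rows // (2 ** level)   # NANDs on the one enabled branch
    return nand_inputs + gating_inputs


if __name__ == "__main__":
    for k in range(4):
        print(f"level {k}: effective load = {effective_load(k)}")
```

Level 3 comes out as 32 + 6 = 38 under the binary-tree assumption; a flat 8-way split would instead give 32 + 8. The count also ignores the energy of the gating and decode logic itself, so the simulated energy curve, rather than this count, is the better guide for choosing a level.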
• Level 0: EffLoad = 256
• Level 1: EffLoad = 128 + 2
• Even further, Level 2: EffLoad = 64 + 4

Simulation Result
[Figure: energy vs. clock gating level; energy (pJ, 0 to 25) vs. level of clock gating (0 to 3.5).]

Some Notes about Clock Gating
Clock gating acts like a decoder. In our design we chose level-2 clock gating for the WL pulse, so we no longer needed an 8:256 decoder; a 6:64 decoder is enough.
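The note that clock gating acts like a decoder can be sketched in a few lines. This assumes the 8-bit row address (256 rows) is split into 6 high bits for the 6:64 decoder and 2 low bits that select which of the four gated WL pulses fires; the slides do not give the actual bit assignment, and `one_hot` and `wordlines` are hypothetical names.

```python
# Hypothetical sketch: level-2 clock gating supplies the last 2 bits of row decode.
# Assumed split of the 8-bit row address: 6 high bits -> 6:64 decoder,
# 2 low bits -> which of the 4 gated WL pulses fires.


def one_hot(index: int, width: int) -> list[int]:
    """Decoder output: a list with a single 1 at position index."""
    return [1 if i == index else 0 for i in range(width)]


def wordlines(row_addr: int, wl_pulse: int = 1) -> list[int]:
    """Return the 256 word-line levels for one access.

    dec[d] comes from the 6:64 decoder; gated[g] is the WL pulse gated down
    to one of four copies by the low two bits. Each word line is the AND of
    one decoder output and one gated pulse (the 2-input NAND plus driver).
    """
    if not 0 <= row_addr < 256:
        raise ValueError("row address must be 8 bits")
    dec = one_hot(row_addr >> 2, 64)                              # 6:64 decoder
    gated = [wl_pulse & g for g in one_hot(row_addr & 0b11, 4)]   # 4 gated pulses
    return [dec[r >> 2] & gated[r & 0b11] for r in range(256)]


if __name__ == "__main__":
    wl = wordlines(0b10110110)   # row 182
    print(sum(wl), wl.index(1))
```

Each gated pulse reaches only the 64 rows whose low two bits match, which is the 64 in the level-2 load of 64 + 4, while each decoder output fans out to 4 NAND gates.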