DRAM Memory System: Lecture 2, Spring 2003
Bruce Jacob, David Wang, University of Maryland

DRAM Circuit and Architecture Basics
• Overview
• Terminology
• Access Protocol
• Architecture

DRAM Circuit Basics: DRAM Cell
• A DRAM cell is a storage element (capacitor) connected to a bit line through a switching element (access transistor) gated by a word line.
• [Figure: the cell within the chip; memory array, row decoder driving the word lines, sense amps on the bit lines, column decoder, data in/out buffers.]

DRAM Circuit Basics: "Row" Defined
• A "row" is the set of cells on one word line, read out onto the bit lines together.
• Row size: 8 Kb at the 256 Mb SDRAM node; 4 Kb at the 256 Mb RDRAM node.

DRAM Circuit Basics: Sense Amplifier I
• [Figure: six rows crossing the bit lines, with a sense-and-amplify stage on each bit-line pair.]

DRAM Circuit Basics: Sense Amplifier II (Precharged)
• The bit lines are precharged to Vcc/2, halfway between Vcc (logic 1) and Gnd (logic 0), before the row is sensed and amplified.

DRAM Circuit Basics: Sense Amplifier III (Destructive Read)
• When the word line is driven, each cell shares its charge with its bit line, pulling it slightly above or below Vcc/2; the sense amp amplifies this to full logic levels. The read disturbs the stored charge, so the row must be restored from the sense amps.

DRAM Access Protocol: ROW ACCESS
• AKA: OPEN a DRAM page/row, or ACT (activate a DRAM page/row), or RAS (Row Address Strobe).
• [Figure: CPU, memory controller, memory bus, DRAM; the row address selects one word line in the array.]
• Once the data is valid on ALL of the bit lines, you can select a subset of the bits and send them to the output buffers; CAS picks which subset.
• Big point: you cannot issue another RAS or precharge the bit lines until you have finished reading the column data; the values on the bit lines and the sense-amp outputs must not change until the memory controller has read them.

DRAM Circuit Basics: "Column" Defined
• Column: the smallest addressable quantity of DRAM on a chip.
• SDRAM*: column size == chip data bus width (4, 8, 16, or 32 bits).
• RDRAM: column size != chip data bus width (fixed at 128 bits).
• SDRAM*: get "n" columns per access, n = 1, 2, 4, or 8.
• RDRAM: get 1 column per access.
• [Figure: one row of DRAM divided into 4-bit-wide columns #0 through #5.]
• * SDRAM here means SDRAM and its variants, i.e. DDR SDRAM.

DRAM Access Protocol: COLUMN ACCESS I
• READ command, or CAS: Column Address Strobe.

DRAM Access Protocol: COLUMN ACCESS II
• After the CAS, the data is valid on the data bus.
• Depending on what you use for the in/out buffers, you may be able to overlap a little or a lot of the data transfer with the next CAS to the same page (this is PAGE MODE): optional additional CAS commands to the already-open row.
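To make the row/column split concrete, here is a minimal sketch of how a byte address could be decoded into bank, row, and column fields for a hypothetical x16 SDRAM with 8 Kb rows. The field widths and bit ordering are assumptions chosen for illustration, not taken from any particular data sheet.

    # Hypothetical address decode for a x16 SDRAM-style chip (illustrative only).
    # Assumed geometry: 4 banks, 8192 rows per bank, 8 Kb per row
    # = 512 two-byte (x16) columns per row.
    COL_BITS  = 9    # 512 columns (assumed)
    ROW_BITS  = 13   # 8192 rows   (assumed)
    BANK_BITS = 2    # 4 banks     (assumed)

    def decode(addr: int):
        """Split a byte address into (bank, row, column) under the assumed geometry."""
        addr >>= 1                            # x16 chip: drop the byte-within-column bit
        col = addr & ((1 << COL_BITS) - 1)
        addr >>= COL_BITS
        row = addr & ((1 << ROW_BITS) - 1)
        addr >>= ROW_BITS
        bank = addr & ((1 << BANK_BITS) - 1)
        return bank, row, col

    # Two addresses in the same row differ only in the column field,
    # so the second access can reuse the already-open page.
    print(decode(0x1000))   # (0, 4, 0)
    print(decode(0x1002))   # (0, 4, 1)

Two consecutive column accesses like these are exactly the page-mode case described above: one row activation, multiple CAS commands.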
DRAM "Speed" Part I
• How fast can I move data from a DRAM cell to the sense amps?
• tRCD: Row Command Delay (the delay from the row command to the column command).

DRAM "Speed" Part II
• How fast can I get data out of the sense amps and back to the memory controller?
• tCAS, also written tCASL or tCL.
• CAS: Column Address Strobe; CASL / CL: Column Address Strobe Latency.

DRAM "Speed" Part III
• How fast can I move data from a DRAM cell into the memory controller?
• tRAC = tRCD + tCAS. RAC: Random Access Delay.

DRAM "Speed" Part IV
• How fast can I precharge the DRAM array so I can engage another RAS?
• tRP: Row Precharge Delay.

DRAM "Speed" Part V
• How fast can I read from different rows?
• tRC = tRAS + tRP. RC: Row Cycle Time.

DRAM "Speed" Summary I: What do I care about?
• tRCD and tCAS: seen in ads; easy to explain, easy to sell.
• tRP, tRC = tRAS + tRP, and tRAC = tRCD + tCAS: what embedded-systems designers, DRAM manufacturers, and computer architects care about, especially for latency-bound code, e.g. linked-list traversal.
• RAS: Row Address Strobe. CAS: Column Address Strobe. RCD: Row Command Delay. RAC: Random Access Delay. RP: Row Precharge Delay. RC: Row Cycle Time.

DRAM "Speed" Summary II
  DRAM Type      Frequency (MHz)   Data Bus Width   Peak Bandwidth   Random Access    Row Cycle
                                   (per chip)       (per chip)       Time (tRAC)      Time (tRC)
  PC133 SDRAM    133               16               266 MB/s         45 ns            60 ns
  DDR 266        133 * 2           16               532 MB/s         45 ns            60 ns
  PC800 RDRAM    400 * 2           16               1.6 GB/s         60 ns            70 ns
  FCRAM          200 * 2           16               0.8 GB/s         25 ns            25 ns
  RLDRAM         300 * 2           32               2.4 GB/s         25 ns            25 ns
• DRAM is "slow", but it doesn't have to be: tRC < 10 ns is achievable. The price is higher die cost and a non-commodity part that is not adopted in the standard, i.e. expensive.
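A minimal sketch of the timing arithmetic above, using assumed component values chosen to be consistent with the PC133 row of the table (they are illustrative, not vendor-exact):

    # Illustrative timing arithmetic; parameter values are assumptions
    # roughly consistent with the PC133 SDRAM row above.
    t_rcd = 20e-9    # row-to-column command delay (assumed)
    t_cas = 25e-9    # column access latency       (assumed)
    t_ras = 40e-9    # row active time             (assumed)
    t_rp  = 20e-9    # row precharge delay         (assumed)

    t_rac = t_rcd + t_cas     # random access time: cell to controller
    t_rc  = t_ras + t_rp      # row cycle time: successive accesses to different rows

    freq_hz    = 133e6        # data bus clock
    width_bits = 16           # per-chip data bus width
    peak_bw    = freq_hz * width_bits / 8   # bytes/s, single data rate

    print(f"tRAC = {t_rac*1e9:.0f} ns, tRC = {t_rc*1e9:.0f} ns, "
          f"peak BW = {peak_bw/1e6:.0f} MB/s")   # 45 ns, 60 ns, 266 MB/s

Note that the peak-bandwidth column in the table is just this clock-times-width product; the latency columns come from the sums above, not from the clock rate.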
"DRAM Latency"
• DRAM "latency" isn't deterministic: an access may need only a CAS or a full RAS + CAS, and there may be significant queuing delays within the CPU and the memory controller.
• Each transaction has some overhead, and some types of overhead cannot be pipelined. In general, longer bursts are therefore more efficient.
• [Figure: request path from CPU through the memory controller to the DRAM and back.]
  A: transaction request may be delayed in the CPU's queue
  B: transaction request sent to the memory controller
  C: transaction converted to command sequences (may be queued)
  D: command(s) sent to the DRAM
  E1: requires only a CAS, or
  E2: requires RAS + CAS, or
  E3: requires PRE + RAS + CAS
  F: transaction sent back to the CPU
• "DRAM Latency" = A + B + C + D + E + F

DRAM Architecture Basics: PHYSICAL ORGANIZATION
• [Figure: x2, x4, and x8 DRAM organizations; each is a memory array with row decoder, sense amps, column decoder, and data buffers, differing in how many data pins the column decoder feeds.]
• Note: this is per bank. Typical DRAMs have two or more banks.

DRAM Architecture Basics: Read Timing for Conventional DRAM
• Let's look at the interface another way: the way the data sheets portray it.
• Main point: the RAS\ and CAS\ signals directly control the latches that hold the row and column addresses.
• [Timing diagram: RAS, CAS, Address (row address, then column address), DQ (valid data out); phases are row access, column access, data transfer.]

DRAM Evolutionary Tree
• Since DRAM's inception there has been a stream of changes to the design, from FPM to EDO to Burst EDO to SDRAM. These are largely minor structural modifications that target THROUGHPUT.
• Everything up to and including SDRAM was relatively inexpensive, especially considering the pay-off: FPM was essentially free, EDO cost a latch, pipelined Burst EDO cost a counter, SDRAM cost a slight redesign.
• However, we have run out of "free" ideas, and now all changes are considered expensive; there is no consensus on new directions, and a myriad of choices has appeared.
• Conventional DRAM, then (mostly) structural modifications targeting throughput: FPM, EDO, P/BEDO, SDRAM.
• Structural modifications targeting latency: ESDRAM, VCDRAM, FCRAM, MoSys.
• Interface modifications targeting throughput: Rambus, DDR/2, future trends.

DRAM Evolution: Read Timing for Conventional DRAM
• [Timing diagram: RAS, CAS, Address, DQ; phases are row access, column access, transfer overlap, data transfer.]

DRAM Evolution: Read Timing for Fast Page Mode
• FPM allows you to keep the sense amps active for multiple CAS commands to the same row: one row address, several column addresses, several valid data-out periods.
• Much better throughput.
• Problem: you cannot latch a new value into the column address buffer until the read-out of the data is complete.
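A small sketch, under assumed timings, of why the E1/E2/E3 distinction and page mode matter: reads that hit an already-open row pay only the CAS component, while reads that conflict with another open row pay precharge plus activate plus CAS.

    # Toy model of per-access DRAM latency (ns); values are assumptions.
    # E1 = CAS only (row already open), E2 = RAS + CAS (bank idle/precharged),
    # E3 = PRE + RAS + CAS (a different row is currently open).
    T_CAS, T_RCD, T_RP = 25, 20, 20

    def access_latency(open_row, req_row):
        """Latency of one read, given which row (if any) is open in the bank."""
        if open_row == req_row:                 # E1: page hit
            return T_CAS
        if open_row is None:                    # E2: bank precharged
            return T_RCD + T_CAS
        return T_RP + T_RCD + T_CAS             # E3: page conflict

    def total_latency(row_stream):
        open_row, total = None, 0
        for r in row_stream:
            total += access_latency(open_row, r)
            open_row = r
        return total

    print(total_latency([7] * 8))      # eight reads to one row: mostly E1 hits
    print(total_latency([7, 3] * 4))   # alternating rows: all E3 after the first

The queuing components A through D from the latency slide are ignored here; the point is only the spread between the E1 and E3 cases.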
DRAM Evolution: Read Timing for Extended Data Out
• The solution to that problem: instead of a simple tri-state output buffer, use a latch as well. By putting a latch after the column mux, the next column-address command can begin sooner, overlapping with the data transfer.
• [Timing diagram: RAS, CAS, Address, DQ; phases are row access, column access, transfer overlap, data transfer.]

DRAM Evolution: Read Timing for Burst EDO
• By driving the column-address latch from an internal counter rather than an external signal, the minimum cycle time for driving the output bus was reduced by roughly 30%: one column address yields a burst of valid data.

DRAM Evolution: Read Timing for Pipelined Burst EDO
• "Pipeline" refers to setting up the read pipeline: the first CAS\ toggle latches the column address, and every following CAS\ toggle drives data out onto the bus. Data therefore stops coming when the memory controller stops toggling CAS\.

DRAM Evolution: Read Timing for Synchronous DRAM
• Main benefit: frees the CPU or memory controller from having to control the DRAM's internal latches directly; the controller/CPU can go do other things during the idle cycles instead of waiting.
• Even though the time-to-first-word latency actually gets worse, the scheme increases system throughput.
• [Timing diagram: clock; command bus (ACT, READ); address bus (row address, column address); DQ (burst of valid data). The RAS, CAS, OE, etc. pins together effectively form the command bus.]

DRAM Evolution: Inter-Row Read Timing for ESDRAM
• The output latch in EDO allowed you to start the next CAS sooner for another access to the same row. ESDRAM latches the whole row, which lets you start the precharge and the next RAS sooner for the next page access: it HIDES THE PRECHARGE OVERHEAD.
• [Timing diagrams: "regular" CAS-2 SDRAM, read/read to different rows of the same bank (ACT, READ, PRE, ACT, READ) versus ESDRAM, where the precharge and second ACT overlap the first burst.]

DRAM Evolution: Write-Around in ESDRAM
• A neat feature of this type of buffering: write-around.
• [Timing diagrams: "regular" CAS-2 SDRAM, read/write/read to the same bank, rows 0/1/0, requires ACT, READ, PRE, ACT, WRITE, PRE, ACT, READ; in ESDRAM the final read to row 0 is served from the latched copy, so the sequence is ACT, READ, PRE, ACT, WRITE, READ.]
• (Can the second READ really be this aggressive?)
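As a rough illustration of the precharge-hiding claim above, the sketch below counts bus clocks to the start of the second burst for two reads to different rows of the same bank, for plain CAS-2 SDRAM versus an ESDRAM-style part. The latencies and the amount of overlap are assumptions, not data-sheet values.

    # Rough cycle-count comparison; all parameters are assumptions (bus clocks).
    T_RCD, CL, T_RP, BURST = 2, 2, 2, 4

    def sdram_second_burst():
        # Plain SDRAM: the burst must drain from the sense amps before the bank
        # can be precharged and re-activated for the second row.
        first = T_RCD + CL
        return first + BURST + T_RP + T_RCD + CL

    def esdram_second_burst():
        # ESDRAM (assumed behavior): the first row streams from its latch, so
        # precharge + activate for the second row overlap the first burst.
        first = T_RCD + CL
        return first + max(BURST, T_RP + T_RCD) + CL

    print("SDRAM :", sdram_second_burst(), "clocks")   # 14
    print("ESDRAM:", esdram_second_burst(), "clocks")  # 10

The exact savings depend on the burst length and on how early the controller may legally issue the precharge; the model only shows the shape of the benefit.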
DRAM Evolution: Virtual Channel DRAM
• The main idea: it is like having a bunch of open row buffers (a la Rambus), but you must deal with the channel cache directly (move data into and out of it), not with the DRAM banks. That adds an extra couple of cycles of latency; however, you get good bandwidth if the data you want is in the cache, and you can "prefetch" into the cache ahead of when you want it.
• Originally targeted at reducing latency; now that SDRAM is CAS-2 and RCD-2, it makes sense only as a throughput improvement.
• [Figure: internal structure of Virtual Channel DRAM; two banks feeding 16 "channels" (2 Kb segments) through activate/prefetch/restore paths, with a select/decode stage and an input/output buffer on the DQs.]
• The segment cache is software-managed, and it reduces energy.

DRAM Evolution: Internal Structure of Fast Cycle RAM
• SDRAM: a 13-bit row address selects one of 8K rows in an 8M array (8K rows x 1 Kb), and the whole row goes to the sense amps; tRCD = 15 ns (two clocks).
• FCRAM opts to break up the data array and activate only a portion of the word line. 8K rows would require 13 bits to select; FCRAM uses 15 (assuming the array is 8K x 1K; the data sheet does not specify). tRCD = 5 ns (one clock). See the sketch at the end of this section.
• This reduces both access time and energy per access.

DRAM Evolution: Internal Structure of MoSys 1T-SRAM
• MoSys takes this one step further: a DRAM with an SRAM interface and SRAM-like speed, but DRAM energy. Physical partitioning into 72 banks.
• Auto refresh: how do you do this transparently? The refresh logic moves through the arrays, refreshing each bank when it is not being accessed.
• But what if one bank gets repeated accesses for a long duration? All the other banks will be refreshed, but that one will not.
• Solution: a bank-sized CACHE of lines, so repeatedly accessed lines can be served from the cache while their bank is refreshed; in theory you should never have a problem (magic).
• [Figure: address, bank select across 72 banks, auto-refresh logic, bank-sized cache ($), DQs.]
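The 13-bit versus 15-bit observation in the FCRAM slide can be read as follows: the two extra row-address bits select one of four sub-sections of the word line, so each activation moves only a quarter of the row into the sense amps. The arithmetic below is an interpretation of that observation (and assumes the 8K x 1K array geometry the slide itself calls a guess), not a data-sheet fact.

    import math

    rows            = 8 * 1024             # 8K rows
    row_bits        = int(math.log2(rows)) # 13 bits to select a row
    fcram_addr_bits = 15                   # per the slide
    extra_bits      = fcram_addr_bits - row_bits

    row_width_bits  = 1024                 # assumed 1 Kb full row (8K x 1K array)
    segment_bits    = row_width_bits // (2 ** extra_bits)

    print(f"row-select bits: {row_bits}, sub-array select bits: {extra_bits}")
    print(f"bits activated per access: {segment_bits} of {row_width_bits}")

Activating a quarter-length word line is what buys the shorter tRCD and the lower energy per access claimed above.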