DRAM - KTU - Kompiuterių katedra


COMPUTER ARCHITECTURE (P175B125)

Assoc. Prof. Stasys Maciulevičius

Computer Dept.

stasys.maciulevicius@ktu.lt

DRAM cell

[Figure: DRAM cell; the row (word) line is labeled]

• Storing one bit in a dynamic memory cell needs only one transistor (a static memory cell has 6-8 transistors).

• To reduce the number of chip contacts, the address has traditionally been transferred in two steps: first the higher bits (the row address), then the lower bits (the column address)

• This results in a greater number of cycles per access.

• Information is stored as the charge of a capacitor within the integrated circuit. Since real capacitors leak charge, the information eventually fades unless the capacitor charge is refreshed periodically

• DRAM is approximately 10 times slower than SRAM
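The two-step address transfer can be sketched in a few lines of Python. This is only an illustration: the 9-bit row and column widths are taken from the 512 × 512 array with address lines A0-A8 of the 1 M chip shown in these slides, and the function name `split_address` is hypothetical.

```python
from typing import Tuple

ROW_BITS = 9   # 512 rows    -> sent on A0-A8 first, latched by RAS#
COL_BITS = 9   # 512 columns -> sent on the same A0-A8 lines, latched by CAS#

def split_address(addr: int) -> Tuple[int, int]:
    """Return (row, column) for a flat address of ROW_BITS + COL_BITS bits."""
    if not 0 <= addr < (1 << (ROW_BITS + COL_BITS)):
        raise ValueError("address out of range")
    row = addr >> COL_BITS               # higher bits: row address
    col = addr & ((1 << COL_BITS) - 1)   # lower bits: column address
    return row, col

row, col = split_address((341 << 9) | 15)
print(row, col)   # 341 15
```

With this split only 9 address pins are needed instead of 18, at the cost of the extra transfer step mentioned above.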

2009-2013

©S.Maciulevičius 2

Structure of 1 M DRAM chip

[Figure: block diagram of the chip. A 512 × 512 × 4 DRAM array (matrix); row address buffer (A0-A8) controlled by RAS# and the No. 1 clock oscillator; column address buffer (A0-A8) controlled by CAS# and the No. 2 clock oscillator; refresh controller and refresh counter (A0-A8); column decoder; amplifiers and write control; I/O control and data buffers with OE#, WE# and data lines D0-D3]

DRAM roadmap

[Figure: DRAM roadmap. Time axis: 1987, then 1994 to 2000. DRAM types: ordinary, FPM, EDO, BEDO, SDRAM, then RDRAM, DDR, DDR2, … Chip capacities: 1M, 4M, 16M, 64M, 256M]

Ordinary DRAM

[Timing diagram: RAS#, CAS#, Addr (Row 1, Col. 1; Row 2, Col. 2) and Data (Data 1; Data 2)]

• Every access is individual: each one supplies its own row address and column address

Fast Page Mode (FPM) DRAM

[Timing diagram: RAS#, CAS#, Addr (Row 1, Col. 1, Col. 2, Col. 3) and Data (Data 1, Data 2, Data 3)]

• For successive reads or writes within the same row, only CAS# has to be repeated

• When CAS# goes high (H), the data output lines go to the high-impedance (Z) state

Extended Data Output RAM (EDO RAM)

[Timing diagram: RAS#, CAS#, Addr (Row 1, Col. 1, Col. 2, Col. 3) and Data (Data 1, Data 2, Data 3)]

• To transfer a burst, CAS# has to be repeated for each column

• It differs from FPM in one additional feature: a new access cycle can be started while the data output of the previous cycle is still active

• Therefore, it can achieve a shorter cycle period (higher frequency)

Burst Extended Data Output RAM (BEDO)

[Timing diagram: RAS#, CAS#, Addr (Row 1, Col. 1, Col. 2) and Data (D10, D11, D12, D13, D20)]

• A pipeline stage was added, allowing the page-access cycle to be divided into two components

• An on-chip address counter was added to keep track of the next address

• Access is quicker than with traditional EDO (by up to 50% for large blocks of data)

• It could process four memory addresses in one burst, for a best case of 5-1-1-1, whereas EDO RAM achieves 5-2-2-2
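The access patterns quoted above (5-2-2-2 for EDO, 5-1-1-1 for BEDO) give the bus cycles spent on each word of a four-word burst. A minimal sketch of the arithmetic follows; the helper name `burst_cycles` is hypothetical.

```python
def burst_cycles(pattern: str) -> int:
    """Total bus cycles for a burst written as e.g. '5-2-2-2'."""
    return sum(int(n) for n in pattern.split("-"))

edo = burst_cycles("5-2-2-2")    # 5 + 2 + 2 + 2
bedo = burst_cycles("5-1-1-1")   # 5 + 1 + 1 + 1
print(edo, bedo)                 # 11 8
```

For a single four-word burst the saving is 3 cycles out of 11. The figure of up to 50% refers to the words after the first one, where the cost drops from 2 cycles to 1.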

Synchronous Dynamic RAM (SDRAM)

• Traditional DRAM has an asynchronous interface, which means that it responds as quickly as possible to changes in its control inputs

• SDRAM has a synchronous interface: it waits for a clock signal before responding to control inputs and is therefore synchronized with the computer's system bus

• All SDRAMs are designed to work in burst mode, transferring one portion of data per clock. The burst length is programmable: 1, 2, 4, 8 or 256

• It could process four portions in one burst, for a best case of 5-1-1-1

Synchronous Dynamic RAM (SDRAM)

This is realized by adding registers (latches) that capture the address, data and control signals:

[Figure: DRAM with clocked registers on its address, data and control-signal lines]

SDRAM

• The clock drives an internal finite state machine that pipelines incoming commands

• Pipelining means that the chip can accept a new command before it has finished processing the previous one. E.g., in a pipelined read, the requested data appears a fixed number of clock pulses after the read command, and additional commands can be sent during this time

• Two conventions are used to indicate SDRAM speed:

  – Minimal interval between adjacent portions of a burst (8 ns, 7 ns, 6 ns, etc.)

  – Bus frequency (100 MHz corresponds to 8 ns, 133 MHz to 6 ns, etc.)

• Don’t forget that the first portion can have significant latency!

Timing of PC100 SDRAM

[Timing diagram: command line (Row, Col, Row, Col, Row, Col) and data line (bursts of writes W and reads R), with bubbles marked between bursts]

• 2-cycle addressing

• Bubbles increase latency and decrease bandwidth

When reading this diagram, note two SDRAM technology-driven issues:

• On the PC platform, unbuffered SDRAM DIMMs require so-called '2-cycle addressing': the row and column addresses are held on the bus for two cycles. This is necessary when there are several DIMM slots on the board. With only 1 DIMM, 1 cycle is sufficient

• Changing the address (selecting another column, for reading as well as for writing) requires a small pause (a 'bubble')

Comparison of classical DRAMs

Type       Standard bus speeds (MHz)   Access rate   DRAM access time
Ordinary   4.77 - 40                   5-5-5-5       80-150 ns
FPM        16 - 66                     5-3-3-3       60-80 ns
EDO        33 - 75                     5-2-2-2       50-60 ns
BEDO       60 - 100                    5-1-1-1       50-60 ns
SDRAM      60 - 100+                   5-1-1-1       7-15 ns

New DRAM types

While the DRAM types above may be considered relatively classic, newer types of DRAM have been developed in recent years and are used in computers:

DDR SDRAM - Double Data Rate SDRAM

DDR2 SDRAM – twice as fast as DDR

DDR3 SDRAM – four times as fast as DDR

DDR SDRAM

DDR (Double Data Rate) SDRAM achieves nearly twice the bandwidth of the preceding single data rate (SDR) SDRAM by transferring data on both the rising and falling edges of the clock signal

Bandwidth:

• 1st generation: with a bus frequency of 100 MHz, DDR SDRAM gives a maximum transfer rate of 1600 MB/s

• later: 3.2 GB/s (= 200 × 2 × 8 B; frequency of 200 MHz)
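The bandwidth figures on this slide follow from bus clock × transfers per clock × bus width. The sketch below restates that product; the 8-byte bus width is the one used on the slide, while the function and parameter names are illustrative only.

```python
def peak_bandwidth_mb_s(bus_clock_mhz, transfers_per_clock=2, bus_width_bytes=8):
    """Peak transfer rate in MB/s (10**6 bytes per second)."""
    return bus_clock_mhz * transfers_per_clock * bus_width_bytes

print(peak_bandwidth_mb_s(100))                          # 1600 (first-generation DDR)
print(peak_bandwidth_mb_s(200))                          # 3200 (later DDR, 3.2 GB/s)
print(peak_bandwidth_mb_s(100, transfers_per_clock=1))   # 800 (SDR SDRAM at the same clock)
```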

DDR SDRAM

DDR read operations can be explained using this simplified scheme:

[Figure: 2n bits are read from the memory array into two n-bit data registers; the register outputs feed inputs D0 and D1 of a multiplexer (MUX), whose output Q is n bits wide]

DDR modules

Some DDR modules are specified here:

Standard   Mem. clock   Cycle time   I/O bus clock   Data transfer rate   Module    Peak transfer
name       (MHz)        (ns)         (MHz)           (million/s)          name      rate (MB/s)
DDR-200    100          10           100             200                  PC-1600   1600
DDR-266    133          7.5          133             266                  PC-2100   2100
DDR-333    166          6            166             333                  PC-2700   2700
DDR-400    200          5            200             400                  PC-3200   3200

DDR2

The DDR2 core performs read and write operations at the same frequency as DDR or SDRAM

However:

• The I/O buffers operate at double that frequency

• The bus that connects the core and the buffers is twice as wide

Therefore the data are multiplexed and transmitted at double frequency over a bus of normal width

Thus, DDR2-533 works at the same core frequency as DDR-266 or PC133 SDRAM

DDR2 SDRAM read operation

[Figure: 4n bits are read from the memory array into four n-bit data registers; the register outputs feed the inputs of a multiplexer (MUX), whose output Q is n bits wide]
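The read schemes for DDR (2n bits from the array) and DDR2 (4n bits from the array) both split one wide word into n-bit pieces that leave the multiplexer one after another. The following sketch models only that splitting. The function name `serialize` and the lowest-bits-first output order are assumptions made for illustration, not something the slides specify.

```python
from typing import List

def serialize(word: int, n: int, k: int) -> List[int]:
    """Split a k*n-bit word fetched from the array into k n-bit pieces,
    in the (assumed) order in which the multiplexer would output them."""
    mask = (1 << n) - 1
    return [(word >> (i * n)) & mask for i in range(k)]

# DDR: two n-bit registers (k = 2); DDR2: four n-bit registers (k = 4); n = 8 here
print(serialize(0xBEEF, n=8, k=2))       # [239, 190]
print(serialize(0xDEADBEEF, n=8, k=4))   # [239, 190, 173, 222]
```

The array is accessed once per wide word, so the core can run at a lower frequency than the n-bit output stream.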

DDR2 modules

Some DDR2 modules are specified here:

Standard    Mem. clock   Cycle time   I/O bus clock   Data transfers per   Module     Peak transfer
name        (MHz)        (ns)         (MHz)           second (million)     name       rate (MB/s)
DDR2-400    100          10           200             400                  PC2-3200   3200
DDR2-533    133          7.5          266             533                  PC2-4300   4266
DDR2-667    166          6            333             667                  PC2-5300   5333
DDR2-800    200          5            400             800                  PC2-6400   6400
DDR2-1066   266          3.75         533             1066                 PC2-8500   8533

DDR and DDR 2

In DDR2 the delay measured in clock periods is larger than in DDR, but data are transferred faster

SDRAM, DDR, and DDR 2

In SDRAM, all parts operate at the basic (core) frequency, and data is transmitted once per clock

In DDR, all parts also operate at the basic (core) frequency, but data is transmitted twice per clock

In DDR2, the output buffers operate at double the core frequency, and data is transmitted twice per buffer clock (four times per core clock)
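The relation between core frequency and data rate for the three generations can be written as a small lookup. This is a sketch of the relation described on this slide; the dictionary layout and names are illustrative.

```python
# generation -> (buffer clock as a multiple of the core clock, transfers per buffer clock)
GENERATIONS = {
    "SDRAM": (1, 1),
    "DDR":   (1, 2),
    "DDR2":  (2, 2),
}

def data_rate(generation: str, core_mhz: int) -> int:
    """Data transfers per second (in millions) for a given core frequency in MHz."""
    clock_mult, per_clock = GENERATIONS[generation]
    return core_mhz * clock_mult * per_clock

for gen in GENERATIONS:
    print(gen, data_rate(gen, 100))   # SDRAM 100, DDR 200, DDR2 400
```

With a 100 MHz core this gives the 200 and 400 million transfers per second listed for DDR-200 and DDR2-400 in the module tables.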

DDR3

[Figure: memory core (cell array) at a core frequency of 100 MHz, data output buffers at a buffer frequency of 400 MHz, data output rate of 800 MHz]

Benefits of DDR3

• First of all, lower energy consumption (by 40%) compared with the popular DDR2. This is due to the reduced supply voltage: 1.5 V for DDR3, versus 1.8 V for DDR2 and 2.5 V for DDR

• Higher working speed: the DDR3 frequency range is 800 MHz – 1600 MHz (clock frequency 400 MHz – 800 MHz), while the DDR2 frequency range is 400 MHz – 1066 MHz (clock frequency 200 MHz – 533 MHz), and that of DDR is only 200 MHz – 600 MHz

• DDR3 drawback: increased latency (in clock periods)

DDR3 modules

Some DDR3 modules are specified here:

Standard    Mem. clock   Cycle time   I/O bus clock   Data transfers per   Module      Peak transfer
name        (MHz)        (ns)         (MHz)           second (million)     name        rate (MB/s)
DDR3-800    100          10           400             800                  PC3-6400    6400
DDR3-1066   133          7.5          533             1066                 PC3-8500    8533
DDR3-1333   166          6            667             1333                 PC3-10600   10667
DDR3-1600   200          5            800             1600                 PC3-12800   12800

[Figure: DDR, DDR2 and DDR3 on the market]

DDR4

• DDR4 is the next evolution in DRAM, bringing even higher performance and more robust control features while improving energy efficiency

Feature/Option            DDR3                                 DDR4
Voltage (core and I/O)    1.5 V                                1.2 V
Data rate (Mb/s)          800, 1066, 1333, 1600, 1866, 2133    1600, 1866, 2133, 2400, 2667, 3200
Densities                 512 Mb – 8 Gb                        2 Gb – 16 Gb
Internal banks            8                                    16

Increasing DRAM speed


DDR timing

Main DDR DRAM timing parameters are:

• tRCD - RAS to CAS delay: the number of clock cycles needed between a row address strobe and a column address strobe

• tCL - CAS latency: the number of clock cycles between the column address strobe and the moment the data of that column becomes available

• tRP - RAS precharge: the number of clock cycles needed to close one row of memory before another can be opened

• tRAS - active to precharge delay: the minimum number of clock cycles a row must stay open between its activation and its precharge

E.g., “DDR2-800 5-5-5-15” shows the values of these four parameters, conventionally in the order tCL-tRCD-tRP-tRAS
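Timing values are given in clock cycles, so their duration in nanoseconds depends on the clock. The sketch below converts the example DDR2-800 5-5-5-15 to nanoseconds. It assumes that the cycles are counted on the I/O bus clock, which for a double-data-rate part is half the data rate (400 MHz for DDR2-800); the function name is hypothetical.

```python
from typing import List, Sequence

def timings_ns(data_rate_millions: float, cycles: Sequence[int]) -> List[float]:
    """Convert timings in clock cycles to nanoseconds.

    Assumes a double-data-rate part: I/O clock (MHz) = data rate / 2."""
    clock_mhz = data_rate_millions / 2
    period_ns = 1000 / clock_mhz
    return [c * period_ns for c in cycles]

print(timings_ns(800, (5, 5, 5, 15)))   # [12.5, 12.5, 12.5, 37.5]
```

This is why a newer part with larger cycle counts does not necessarily have a longer absolute latency: its clock period is shorter.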

DDR timing

Typical values of these parameters for DDR chips:

• RAS to CAS Delay : 2, 3, 4;

• CAS Latency : 2.0, 2.5, 3.0;

• RAS Precharge : 2, 3, 4


SPD

In accordance with JEDEC standards, each module must carry a small special ROM chip called the SPD (Serial Presence Detect), which holds information about the memory module:

• configuration and type

• timing

• manufacturer (its code)

• serial number

• production date

• other information

Total ROM size is 128 bytes

SPD

E.g., the CPU-Z utility extracts such information from the SPD:

[Figure: CPU-Z output showing SPD data]

DRAM refresh

• Memory refresh is the process of periodically reading information from an area of computer memory, and immediately rewriting the read information to the same area with no modifications

• Each memory refresh cycle refreshes a succeeding area of memory

• Classic asynchronous DRAM is refreshed by opening each row in turn

• For convenience, the refresh counter is incorporated into RAM chips


DRAM refresh

• In CAS-before-RAS (CBR) refresh, the CAS# line is driven low before RAS#. The DRAM then ignores the address inputs and uses an internal counter to select the row to open (refresh)

• Hidden refresh allows refresh cycles to take place in memory banks that the CPU is not using at the time, instead of or together with the normal refresh cycles

• The refresh period T ref was 2 ms in the first DRAMs; now it is 64 ms or even 128 ms
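If refresh cycles are spread evenly over the refresh period, the time between two successive row refreshes is T ref divided by the number of rows. A minimal sketch follows: the 512-row case matches the 1 M chip shown earlier, while the 8192-row count is an assumed example for a modern chip, not a figure from these slides.

```python
def row_refresh_interval_us(t_ref_ms: float, rows: int) -> float:
    """Average time between refreshes of successive rows, in microseconds,
    assuming refresh cycles are distributed evenly over the refresh period."""
    return t_ref_ms * 1000 / rows

print(row_refresh_interval_us(2, 512))     # 3.90625  (early DRAM, T ref = 2 ms)
print(row_refresh_interval_us(64, 8192))   # 7.8125   (T ref = 64 ms, assumed 8192 rows)
```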

Memory controller

The memory controller is a digital circuit which manages the flow of data going to and from the main memory:

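[Figure: CPU, DRAM controller and DRAM. Data (D) and address (A) lines appear on both the CPU side and the DRAM side; the CPU issues Rd and Wr, and the DRAM controller drives RAS#, CAS#, WE# and OE#]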

Memory controller

• It can be a separate chip or integrated into another chip

• Computers using Intel microprocessors traditionally had the memory controller implemented in the motherboard's northbridge (the “northern” part of the chipset)

• AMD's Athlon 64 and Opteron processors and Intel's Core i7 have the memory controller on the microprocessor die to reduce memory latency. This also restricts which DRAM types can be used

Memory controller in chipset

• Computers using Intel Core 2 (Duo and Quad) microprocessors had the memory controller implemented in the motherboard's northbridge (e.g., in the P45 MCH, Memory Controller Hub):

Memory controller in Core i7

[Figure: Core i7 with the Integrated Memory Controller labeled]

DRAM modules

SIPP – Single In-Line Pin Package

• 30 pins

• used in some 286-based computers

• pins often bent or broke during installation

SIMM – Single In-Line Memory Module

• “short” (90 mm) – 30 pins, 8 bits of data

• “long” (108 mm) – 72 pins, 4 bytes of data

• 32, 36 (with parity), ECC-36 and ECC-40 (with an error-correcting code)

• some with PD (Presence Detect, indicates size 4, 8, 16, 32 MB)

DIMM – Dual In-Line Memory Module

• 133.35 mm – 168-244 pins, 8 bytes

• 64-bit word (ordinary), 72 or 80 bits (with parity or error-correcting code)

SIMM modules

SDRAM module

DDR modules

Comparison of memory modules for desktop PCs (DIMM)

Registered memory modules

Registered (also called buffered) memory modules have a register between the DRAM modules and the system's memory controller

• They place less electrical load on the memory controller and allow a single system to remain stable with more memory modules than it would otherwise support

• There is a performance penalty for using registered memory. Each read or write is buffered for one cycle between the memory bus and the DRAM, so registered RAM can be thought of as running one clock cycle behind the equivalent unregistered DRAM

FB-DIMM

Fully Buffered DIMM (or FB-DIMM) is a memory technology which can be used to increase the reliability and density of memory systems

• Conventionally, data lines from the memory controller have to be connected to data lines in every DRAM module

• The fully buffered DIMM architecture introduces an advanced memory buffer (AMB) between the memory controller and the memory module

FB-DIMM

• FB-DIMM uses 10 bit lanes carrying commands and data from the processor to memory and 14 bit lanes carrying data from memory to the processor

• Each bit is carried over a differential pair (signal and its inversion), clocked at 12 times the basic memory clock rate, i.e. 6 times the double-pumped data rate
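The lane clocking rule on this slide (12 times the basic memory clock, 6 times the double-pumped data rate) can be checked with simple arithmetic. In this sketch the "basic memory clock" is taken to be the I/O bus clock of a DDR2 module, and the 333 MHz and 400 MHz inputs are example values from the DDR2 module table; both choices are assumptions for illustration.

```python
def fbdimm_lane_rate_mbit_s(memory_clock_mhz: int) -> int:
    """Bit rate of one FB-DIMM lane in Mbit/s, per the 12x rule above."""
    data_rate = 2 * memory_clock_mhz    # double-pumped data rate
    lane_rate = 12 * memory_clock_mhz   # 12 times the memory clock
    assert lane_rate == 6 * data_rate   # the two formulations agree
    return lane_rate

print(fbdimm_lane_rate_mbit_s(333))   # 3996, about 4 Gbit/s per lane
print(fbdimm_lane_rate_mbit_s(400))   # 4800
```

Rates of several gigabits per second on every lane illustrate why the serial interface was hard to implement.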

FB-DIMM

• While Fully-Buffered DIMM was originally a good idea, the industry soon found that it had implementation problems

• First, the serial input frequency has to be 4 times higher than the memory clock frequency. This puts it into the microwave frequency range and brings a whole new set of technical difficulties

• The higher serial input frequency also increases heat generation to an unacceptable level

• Engineers soon announced an alternative approach, the LRDIMM

LRDIMM

• LRDIMM (Load Reduced Dual In-line Memory Module) uses a buffer chip in place of the register to help minimize loading. It can increase overall server memory capacity and speed

• It is pin-compatible with existing DDR3 DIMM sockets, and LRDIMM is a JEDEC standard

• An LRDIMM can contain 72 modern 40 nm 4-gigabit DDR3 SDRAM chips

• A dual server can hold at most 16 ordinary DIMMs, but as many as 24 LRDIMMs
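The chip count on this slide translates into module capacity as follows. The raw figure uses only the numbers given above (72 chips of 4 Gbit each). The usable figure additionally assumes a 72-bit ECC organization in which 64 of every 72 bits hold data, which the slide does not state.

```python
def lrdimm_capacity_gbytes(chips: int = 72, chip_gbit: int = 4):
    """Return (raw, usable) capacity in GB for a module built from DRAM chips."""
    raw = chips * chip_gbit / 8   # gigabits -> gigabytes
    usable = raw * 64 / 72        # assumption: 8 of every 72 bits are ECC bits
    return raw, usable

print(lrdimm_capacity_gbytes())   # (36.0, 32.0)
```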

LRDIMM and FBDIMM
