Lecture Set 9

Computer Systems Organization:
Lecture 9
Ankur Srivastava
University of Maryland, College Park
Based on Slides from Mary Jane Irwin ( www.cse.psu.edu/~mji )
www.cse.psu.edu/~cg431
[Adapted from Computer Organization and Design,
Patterson & Hennessy, © 2005, UCB]
ENEE350
Review: Major Components of a Computer
Processor
- Control
- Datapath
Memory
Devices
- Input
- Output
Processor-Memory Performance Gap
[Figure: Performance (log scale, 1 to 10000) versus Year, 1980 to 2004]
- µProc: 55%/year (2X/1.5yr), “Moore’s Law”
- DRAM: 7%/year (2X/10yrs)
- Processor-Memory Performance Gap (grows 50%/year)
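The growth rates on the plot can be checked with a few lines of arithmetic. The sketch below is an illustration only and is not part of the original slides; the function names are hypothetical. It converts each annual rate into a doubling time and computes how fast the ratio of the two curves grows.

```python
import math

def doubling_time(annual_rate):
    """Years needed to double at a given compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_rate)

def gap_growth(fast_rate, slow_rate):
    """Annual growth rate of the ratio between two compounding curves."""
    return (1 + fast_rate) / (1 + slow_rate) - 1

PROC_RATE = 0.55   # processor performance growth, from the slide
DRAM_RATE = 0.07   # DRAM performance growth, from the slide

print(f"processor doubles every {doubling_time(PROC_RATE):.2f} years")
print(f"DRAM doubles every {doubling_time(DRAM_RATE):.2f} years")
print(f"gap grows {gap_growth(PROC_RATE, DRAM_RATE):.1%} per year")
```

With these inputs the doubling times come out near 1.6 and 10 years, and the ratio of the two curves grows by roughly 45% per year, close to the rounded 50%/year quoted on the slide.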
The Memory Hierarchy Goal

Fact: Large memories are slow and fast memories are
small

How do we create a memory that gives the illusion of
being large, cheap and fast (most of the time)?
- With hierarchy
- With parallelism
A Typical Memory Hierarchy

By taking advantage of the principle of locality
- Can present the user with as much memory as is available in the cheapest technology
- at the speed offered by the fastest technology

[Figure: levels of the hierarchy, from closest to the processor to farthest]
Level                        Speed (%cycles)   Size (bytes)   Cost
RegFile                      ½’s               100’s          highest
Instr Cache, Data Cache      1’s               K’s
Second Level Cache (SRAM)    10’s              10K’s
Main Memory (DRAM)           100’s             M’s
Secondary Memory (Disk)      1,000’s           G’s to T’s     lowest
On-Chip Components: Control, Datapath, RegFile, ITLB, DTLB, Instr Cache, Data Cache, eDRAM
Characteristics of the Memory Hierarchy
[Figure: levels drawn in order of increasing distance from the processor in access time, with the (relative) size of the memory at each level]
Transfer unit between adjacent levels
- Processor and L1$: 4-8 bytes (word)
- L1$ and L2$: 8-32 bytes (block)
- L2$ and Main Memory: 1 to 4 blocks
- Main Memory and Secondary Memory: 1,024+ bytes (disk sector = page)
Inclusive: what is in L1$ is a subset of what is in L2$, which is a subset of what is in MM, which is a subset of what is in SM
Memory Hierarchy Technologies

Caches use SRAM for speed and technology compatibility
- Low density (6 transistor cells), high power, expensive, fast
- Static: content will last “forever” (until power turned off)
[Figure: SRAM 2M x 16 with a 21-bit Address input, Chip select, Output enable, Write enable, 16-bit Din[15-0] and 16-bit Dout[15-0]]

Main Memory uses DRAM for size (density)
- High density (1 transistor cells), low power, cheap, slow
- Dynamic: needs to be “refreshed” regularly (~ every 8 ms)
  - 1% to 2% of the active cycles of the DRAM
- Addresses divided into 2 halves (row and column)
  - RAS or Row Access Strobe triggering row decoder
  - CAS or Column Access Strobe triggering column selector
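The address widths above can be illustrated with a short sketch. It is not taken from the slides: the function names are hypothetical, and the split assumes an even number of address bits with the row half in the upper bits, which real DRAM parts need not follow.

```python
def address_bits(num_words):
    """Number of address bits needed to select one of num_words locations."""
    return (num_words - 1).bit_length()

def split_row_col(address, total_bits):
    """Split an address into (row, column) halves.

    Assumption for illustration: total_bits is even and the row half
    occupies the upper bits; real DRAM parts may map bits differently.
    """
    half = total_bits // 2
    row = address >> half               # sent first, latched with RAS
    col = address & ((1 << half) - 1)   # sent second, latched with CAS
    return row, col

# The 2M x 16 SRAM in the figure: 2M words need 21 address bits.
print(address_bits(2 * 2**20))

# A hypothetical DRAM with 2**20 cells: 10 row bits and 10 column bits.
row, col = split_row_col(0b1100000000_0000000101, 20)
print(row, col)
```

Sending the two halves one after the other lets the DRAM use half as many address pins, at the cost of two strobes per access.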
Classical RAM Organization (~Square)
[Figure: RAM Cell Array; the row address feeds the Row Decoder, which drives the word (row) lines; the bit (data) lines feed the Column Selector & I/O Circuits, which take the column address and output the data bit or word]
- Each intersection represents a 6-T SRAM cell or a 1-T DRAM cell
- One memory row holds a block of data, so the column address selects the requested bit or word from that block
Classical DRAM Organization (~Square Planes)
[Figure: several stacked RAM Cell Array planes; the row address feeds the Row Decoder, which drives the word (row) lines; the bit (data) lines feed the Column Selector & I/O Circuits, which take the column address and output one data bit per plane]
- Each intersection represents a 1-T DRAM cell
- The column address selects the requested bit from the row in each plane
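The plane organization can be modeled as a stack of two-dimensional bit arrays, where the same (row, column) pair picks one bit out of every plane. The sketch below is a toy model under assumed sizes; the names, the 8 x 8 dimensions and the choice of plane 0 as the least significant bit are all illustrative and not from the slides.

```python
def read_word(planes, row, col):
    """Read one word: every plane contributes the bit at (row, col).

    planes[p][r][c] is the bit stored in plane p; plane 0 is taken as
    the least significant bit (an arbitrary choice for this sketch).
    """
    word = 0
    for bit_position, plane in enumerate(planes):
        word |= plane[row][col] << bit_position
    return word

def write_word(planes, row, col, value):
    """Store value by writing one bit into each plane at (row, col)."""
    for bit_position, plane in enumerate(planes):
        plane[row][col] = (value >> bit_position) & 1

# Hypothetical tiny device: 4 planes of 8 x 8 cells, i.e. 64 words of 4 bits.
ROWS, COLS, WIDTH = 8, 8, 4
planes = [[[0] * COLS for _ in range(ROWS)] for _ in range(WIDTH)]

write_word(planes, row=3, col=5, value=0b1010)
print(read_word(planes, row=3, col=5))
```

The word width equals the number of planes, so widening the data output means adding planes rather than changing the row and column addressing.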
Memory Systems that Support Caches

The off-chip interconnect and memory architecture can affect overall system performance in dramatic ways
[Figure: on-chip CPU and Cache connected by a bus (32-bit data & 32-bit addr per cycle) to Memory]
One word wide organization (one word wide bus and one word wide memory)

Assume
1. 1 clock cycle to send the address
2. 25 clock cycles for DRAM to be read
3. 1 clock cycle to return a word of data

Memory-Bus to Cache bandwidth
- number of bytes accessed from memory and transferred to cache/CPU per clock cycle
One Word Wide Memory Organization
[Figure: on-chip CPU and Cache connected by a bus to Memory]
If the block size is one word, then for a memory access due to a cache miss, the pipeline will have to stall the number of cycles required to return one data word from memory
   1 cycle to send address
  25 cycles to read DRAM
   1 cycle to return data
  27 total clock cycles miss penalty

Number of bytes transferred per clock cycle (bandwidth) for a single miss is 4/27 = 0.148 bytes per clock
One Word Wide Memory Organization, con’t
[Figure: on-chip CPU and Cache connected by a bus to Memory; four 25-cycle DRAM reads performed one after another]
What if the block size is four words?
   1 cycle to send 1st address
 100 cycles to read DRAM (4 x 25 = 100)
   1 cycle to return last data word
 102 total clock cycles miss penalty

Number of bytes transferred per clock cycle (bandwidth) for a single miss is (4 x 4)/102 = 0.157 bytes per clock
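The miss-penalty arithmetic for the one word wide organization can be written as a small function of the block size. This is a sketch that follows the slides' accounting (1 cycle for the first address, one 25-cycle DRAM read per word, 1 cycle to return the last word); the function and constant names are hypothetical.

```python
WORD_BYTES = 4        # 32-bit data bus, from the slides
ADDR_CYCLES = 1       # cycles to send the address
DRAM_CYCLES = 25      # cycles for one DRAM read
TRANSFER_CYCLES = 1   # cycles to return one word

def one_word_wide_penalty(block_words):
    """Miss penalty when the DRAM reads for a block happen one after another.

    Follows the slides' accounting: one cycle for the first address,
    block_words sequential DRAM reads, one cycle to return the last word.
    """
    return ADDR_CYCLES + block_words * DRAM_CYCLES + TRANSFER_CYCLES

def bandwidth(block_words, penalty_cycles):
    """Bytes delivered to the cache per clock cycle for a single miss."""
    return block_words * WORD_BYTES / penalty_cycles

for words in (1, 4):
    cycles = one_word_wide_penalty(words)
    print(f"{words}-word block: {cycles} cycles, "
          f"{bandwidth(words, cycles):.3f} bytes per clock")
```

The two cases give 27 cycles at 0.148 bytes per clock and 102 cycles at 0.157 bytes per clock, so a larger block barely improves bandwidth while the DRAM reads stay serialized.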
Interleaved Memory Organization
[Figure: on-chip CPU and Cache connected by a bus to four memory banks (bank 0, bank 1, bank 2, bank 3); each bank performs one 25-cycle DRAM read]
For a block size of four words
   1 cycle to send 1st address
  25 cycles to read DRAM
   4 cycles to return the 4 words
  30 total clock cycles miss penalty

Number of bytes transferred per clock cycle (bandwidth) for a single miss is (4 x 4)/30 = 0.533 bytes per clock
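The interleaved timing can be traced word by word with a small timeline model. The sketch below rests on assumptions that the slides do not spell out: word i is stored in bank i % banks, every bank receives the address in the same cycle, a bank performs its reads back to back, and the shared bus returns one word per cycle. All names are hypothetical.

```python
WORD_BYTES = 4        # 32-bit data bus, from the slides
ADDR_CYCLES = 1       # cycles to send the address
DRAM_CYCLES = 25      # cycles for one DRAM read
TRANSFER_CYCLES = 1   # cycles to return one word

def interleaved_arrivals(block_words, banks):
    """Cycle at which each word of the block reaches the cache.

    Assumptions for this sketch: word i lives in bank i % banks, all
    banks receive the address together, a bank handles its reads back
    to back, and the shared bus returns one word per cycle.
    """
    arrivals = []
    bus_free = 0
    for i in range(block_words):
        reads_done = i // banks + 1            # reads this bank has finished
        ready = ADDR_CYCLES + reads_done * DRAM_CYCLES
        start = max(ready, bus_free)           # wait for the bus if it is busy
        bus_free = start + TRANSFER_CYCLES
        arrivals.append(bus_free)
    return arrivals

def bandwidth(block_words, penalty_cycles):
    """Bytes delivered to the cache per clock cycle for a single miss."""
    return block_words * WORD_BYTES / penalty_cycles

arrivals = interleaved_arrivals(block_words=4, banks=4)
penalty = arrivals[-1]
print(arrivals, penalty, f"{bandwidth(4, penalty):.3f}")
```

With four banks the words arrive in cycles 27 through 30, giving the 30-cycle penalty and 0.533 bytes per clock. Calling the same function with banks=1 gives a 102-cycle penalty, the figure for the one word wide organization.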
DRAM Memory System Summary

It’s important to match the cache characteristics
- caches access one block at a time (usually more than one word)

with the DRAM characteristics
- use DRAMs that support fast multiple word accesses, preferably ones that match the block size of the cache

with the memory-bus characteristics
- make sure the memory-bus can support the DRAM access rates and patterns

with the goal of increasing the Memory-Bus to Cache bandwidth