Lecture 12 - Storage Hierarchy

CS510 Computer Architecture
Who Cares about Memory Hierarchy?
• Processor Only Thus Far in Course
– CPU cost/performance, ISA, Pipelined Execution
[Figure: processor vs. DRAM performance, 1980-2000, on a log scale. CPU performance improves roughly 35%/year in the early 1980s and 55%/year thereafter, while DRAM improves only about 7%/year, so the CPU-DRAM performance gap widens steadily.]
• 1980: no cache in microprocessors
• 1995: 2-level cache; 60% of the transistors on the Alpha 21164 microprocessor devoted to caches
General Principles
• Locality (see the code sketch at the end of this slide)
– Temporal locality: a referenced item is likely to be referenced again soon
– Spatial locality: items near a referenced item are likely to be referenced soon
• Locality + "smaller hardware is faster" => memory hierarchy
– Levels: smaller, faster, and more expensive per byte than the level below
– Inclusive: data found in the top level is also found in the bottom
• Definitions
– Upper level: the level closer to the processor
– Block: the minimum unit of information that is present or not present in the upper level
– Address = block frame address + block offset address
– Hit time: time to access the upper level, including the time to determine hit or miss
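As a concrete illustration (a sketch, not from the lecture), the loop below sums an array: consecutive elements fall in the same or adjacent blocks (spatial locality), while the loop variables sum and i are reused on every iteration (temporal locality).

```c
#include <stdio.h>

/* Illustration of the two kinds of locality (example not from the slides). */
int main(void) {
    int a[1024];
    for (int i = 0; i < 1024; i++) a[i] = i;

    long sum = 0;
    for (int i = 0; i < 1024; i++) {
        sum += a[i];   /* spatial locality: a[i+1] sits in the same or the next block */
    }                  /* temporal locality: sum and i are reused every iteration */
    printf("%ld\n", sum);
    return 0;
}
```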
Measures
• Hit rate: fraction of accesses found in that level
– Usually so high that we talk about the Miss Rate (or Fault Rate) instead
– Miss-rate fallacy: miss rate can mislead about average memory access time, just as MIPS can mislead about CPU performance
• Average memory access time = Hit time + Miss rate x Miss penalty (in ns or clock cycles)
• Miss penalty: time to replace a block with one from the lower level, plus the time to deliver the block to the CPU
– Access time: time to access the lower level (a function of lower-level latency)
– Transfer time: time to transfer the block (a function of the bandwidth between the levels and the block size)
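A worked instance of the formula above, with made-up numbers (1-cycle hit time, 5% miss rate, 40-cycle miss penalty) chosen purely for illustration:

```c
#include <stdio.h>

int main(void) {
    double hit_time = 1.0;      /* clock cycles */
    double miss_rate = 0.05;    /* fraction of accesses that miss */
    double miss_penalty = 40.0; /* clock cycles (access + transfer) */

    /* AMAT = Hit time + Miss rate x Miss penalty */
    double amat = hit_time + miss_rate * miss_penalty;
    printf("AMAT = %.1f cycles\n", amat);  /* 1 + 0.05 * 40 = 3.0 cycles */
    return 0;
}
```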
Block Size vs. Measures
Increasing Block Size generally increases Miss Penalty and
decreases Miss Rate
[Figure: three sketches as a function of block size — miss penalty (access time + transfer time) grows with block size, miss rate shrinks with block size, and their product added to hit time gives the average memory access time (AMAT) curve.]
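The tradeoff sketched in the figure can be reproduced with a toy model: assume miss penalty = access time + block size / bandwidth, plus a hypothetical miss-rate table that shrinks with block size until very large blocks stop helping. All numbers below are invented for illustration.

```c
#include <stdio.h>

int main(void) {
    double hit_time = 1.0;       /* cycles */
    double access_time = 20.0;   /* cycles, lower-level latency */
    double bandwidth = 4.0;      /* bytes transferred per cycle */

    int    block_size[] = { 16,    32,    64,    128,   256   };
    double miss_rate[]  = { 0.070, 0.050, 0.040, 0.038, 0.042 }; /* assumed values */

    for (int i = 0; i < 5; i++) {
        double miss_penalty = access_time + block_size[i] / bandwidth;
        double amat = hit_time + miss_rate[i] * miss_penalty;
        printf("B=%3d bytes: miss penalty=%5.1f, AMAT=%5.2f cycles\n",
               block_size[i], miss_penalty, amat);
    }
    return 0;
}
```

With these particular numbers AMAT bottoms out at 32-byte blocks; the point is only that the product of a falling miss rate and a rising miss penalty has an interior minimum.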
Implications for CPU
• Fast hit check since every memory access needs it
– Hit is the common case
• Unpredictable memory access time
– 10s of clock cycles: wait
– 1000s of clock cycles:
• Interrupt & switch & do something else
• New style: multithreaded execution
• How to handle miss (10s => HW, 1000s => SW)?
4 Questions for
Memory Hierarchy Designers
• Q1: Where can a block be placed in the upper level? (Block placement)
• Q2: How is a block found if it is in the upper level? (Block identification)
• Q3: Which block should be replaced on a miss? (Block replacement)
• Q4: What happens on a write? (Write strategy)
Q1: Block Placement:
Where can a Block be Placed in the
Upper Level?
• Mapping alternatives: Fully Associative (FA), Direct Mapped (DM), 2-way Set Associative (SA)
– SA mapping: set = (Block #) modulo (# of sets)
• Example: placing memory block 12 in an 8-block cache
– Fully associative: block 12 can go anywhere
– Direct mapped: block 12 can go only into cache block 4, since (12 mod 8) = 4
– 2-way set associative: block 12 can go anywhere in set 0, since (12 mod 4) = 0
[Figure: the 8-block cache drawn for each organization (blocks 0-7; for 2-way SA, sets 0-3), and memory shown with block frame addresses 0-15, block 12 highlighted.]
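The placement arithmetic from the example above, as a small sketch:

```c
#include <stdio.h>

/* Memory block 12 mapped into an 8-block cache, as on the slide. */
int main(void) {
    int block = 12;
    int cache_blocks = 8;

    /* Direct mapped: exactly one possible cache block */
    int dm_slot = block % cache_blocks;        /* 12 mod 8 = 4 */

    /* 2-way set associative: 8 blocks / 2 ways = 4 sets */
    int ways = 2;
    int sets = cache_blocks / ways;
    int sa_set = block % sets;                 /* 12 mod 4 = 0 */

    printf("Direct mapped: cache block %d\n", dm_slot);
    printf("2-way SA: any of the %d blocks in set %d\n", ways, sa_set);
    /* Fully associative: block 12 may go into any of the 8 cache blocks. */
    return 0;
}
```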
Q2: Block Identification:
How to Find a Block in the Upper Level?
• Tag on each block
– No need to check index or block offset
• Increasing associativity shrinks index, expands tag
Address = | Tag | Index | Block Offset |   (Tag + Index = Block Address)
– FAM: no index
– DM: large index
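A sketch of how the fields might be extracted, assuming (for illustration only) 32-byte blocks and 256 sets:

```c
#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t addr = 0x12345678;

    const int offset_bits = 5;   /* log2(block size in bytes), assumed 32 B  */
    const int index_bits  = 8;   /* log2(number of sets), assumed 256 sets   */

    uint32_t offset = addr & ((1u << offset_bits) - 1);
    uint32_t index  = (addr >> offset_bits) & ((1u << index_bits) - 1);
    uint32_t tag    = addr >> (offset_bits + index_bits);

    /* A hit means the tag stored in set 'index' matches 'tag'. */
    printf("tag=0x%x index=%u offset=%u\n", tag, index, offset);
    return 0;
}
```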
Q3: Block Replacement:
Which Block Should be Replaced
on a Miss?
• Easy for Direct Mapped
• SAM or FAM:
– Random
– LRU
Miss Rates

Associativity:     2-way             4-way             8-way
Cache Size     LRU     Random    LRU     Random    LRU     Random
16 KB          5.18%   5.69%     4.67%   5.29%     4.39%   4.96%
64 KB          1.88%   2.01%     1.54%   1.66%     1.39%   1.53%
256 KB         1.15%   1.17%     1.13%   1.13%     1.12%   1.12%
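A minimal sketch of true-LRU bookkeeping for one set of a 4-way set-associative cache, using age counters (the structures and the cache_access() interface are invented for illustration; real hardware often uses cheaper approximations such as pseudo-LRU):

```c
#include <stdio.h>

#define WAYS 4

static int tags[WAYS];
static int age[WAYS];      /* larger age = less recently used */
static int valid[WAYS];

static void touch(int way) {
    for (int w = 0; w < WAYS; w++) age[w]++;
    age[way] = 0;          /* most recently used */
}

static int cache_access(int tag) {   /* returns 1 on hit, 0 on miss */
    int victim = 0;
    for (int w = 0; w < WAYS; w++) {
        if (valid[w] && tags[w] == tag) { touch(w); return 1; }
        if (age[w] > age[victim]) victim = w;
    }
    /* Miss: replace the least recently used way. */
    tags[victim] = tag;
    valid[victim] = 1;
    touch(victim);
    return 0;
}

int main(void) {
    int refs[] = { 1, 2, 3, 4, 1, 5, 2 };   /* block tags mapping to the same set */
    for (int i = 0; i < 7; i++)
        printf("tag %d -> %s\n", refs[i], cache_access(refs[i]) ? "hit" : "miss");
    return 0;
}
```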
Q4: Write Strategy:
What Happens on a Write?
• DLX: stores 9%, loads 26% of instructions in integer programs
– STORE traffic:
• 9% / (100% + 26% + 9%) ≈ 7% of the overall memory traffic (instruction fetches + loads + stores)
• 9% / (26% + 9%) ≈ 25% of the data cache traffic
– READ accesses are the majority, so making the common case fast means optimizing caches for reads
– High-performance designs still cannot neglect the speed of WRITEs
Q4: What Happens on a Write?
• Write Through: the information is written to both the block in the cache and the block in the lower-level memory.
• Write Back: the information is written only to the block in the cache. The modified cache block (dirty block) is written to main memory only when it is replaced.
– Requires tracking whether each block is clean or dirty (a dirty bit)
• Pros and cons of each:
– WT: read misses never cause writes to the lower level (replaced blocks need not be written back)
– WB: repeated writes to a block generate only one write to the lower level
• WT needs to be combined with a write buffer so the CPU does not wait for the lower-level memory
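A sketch of the two policies on a write hit, with invented structures and a tiny array standing in for the lower-level memory. Write back only touches memory when a dirty block is evicted; write through updates memory on every store (buffered by a write buffer in practice):

```c
#include <stdio.h>
#include <stdbool.h>

#define BLOCK_BYTES 4

static unsigned char memory[64];   /* stands in for the lower-level memory */

struct line { bool valid, dirty; unsigned tag; unsigned char data[BLOCK_BYTES]; };

/* Write hit, write-through: update the cache and the lower level. */
static void write_hit_wt(struct line *ln, unsigned addr, unsigned char v) {
    ln->data[addr % BLOCK_BYTES] = v;
    memory[addr] = v;              /* buffered by a write buffer in practice */
}

/* Write hit, write-back: update the cache only and set the dirty bit. */
static void write_hit_wb(struct line *ln, unsigned addr, unsigned char v) {
    ln->data[addr % BLOCK_BYTES] = v;
    ln->dirty = true;
}

/* Only a write-back cache must flush a dirty victim on replacement. */
static void evict_wb(struct line *ln, unsigned block_addr) {
    if (ln->valid && ln->dirty)
        for (unsigned i = 0; i < BLOCK_BYTES; i++)
            memory[block_addr + i] = ln->data[i];
    ln->valid = false;
    ln->dirty = false;
}

int main(void) {
    struct line wt = { .valid = true };
    write_hit_wt(&wt, 0, 7);       /* memory[0] updated immediately */

    struct line wb = { .valid = true };
    write_hit_wb(&wb, 8, 42);      /* memory[8] still stale */
    write_hit_wb(&wb, 9, 43);      /* repeated writes cause no memory traffic */
    evict_wb(&wb, 8);              /* a single write-back at replacement time */

    printf("memory[0]=%d memory[8]=%d memory[9]=%d\n",
           memory[0], memory[8], memory[9]);
    return 0;
}
```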
Q4: What Happens on a Write?
• Write Miss
– Write Allocate (fetch on write): the block is loaded into the cache, then the write proceeds as a write hit
– No-Write Allocate (write around): the block is modified only in the lower level and is not loaded into the cache
• WB caches generally use Write Allocate, while WT caches often use No-Write Allocate
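A matching sketch of the two write-miss policies (the structures and the fetch_block fill routine are invented for illustration):

```c
#include <stdio.h>
#include <stdbool.h>

#define BLOCK 4
static unsigned char memory[64];   /* stands in for the lower-level memory */

struct line { bool valid, dirty; unsigned tag; unsigned char data[BLOCK]; };

static void fetch_block(struct line *ln, unsigned block_addr) {
    for (unsigned i = 0; i < BLOCK; i++)       /* fill from the lower level */
        ln->data[i] = memory[block_addr + i];
    ln->valid = true;
    ln->dirty = false;
}

static void write_miss(bool write_allocate, struct line *ln,
                       unsigned addr, unsigned char v) {
    if (write_allocate) {                      /* fetch on write */
        fetch_block(ln, addr & ~(BLOCK - 1u));
        ln->data[addr % BLOCK] = v;            /* then write the cached copy */
        ln->dirty = true;                      /* typical with write back */
    } else {                                   /* write around */
        memory[addr] = v;                      /* lower level only; cache unchanged */
    }
}

int main(void) {
    struct line ln = { 0 };
    write_miss(true,  &ln, 5, 11);             /* allocate: block now cached and dirty */
    write_miss(false, &ln, 20, 22);            /* no-allocate: only memory[20] changes */
    printf("cached=%d dirty=%d memory[20]=%d\n", ln.data[1], ln.dirty, memory[20]);
    return 0;
}
```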
Summary
• The CPU-memory performance gap is the major obstacle to performance, for both HW and SW
• Take advantage of program behavior: locality
• Execution time of a program is still the only reliable performance measure
• The 4 questions of the memory hierarchy:
– Block Placement
– Block Identification
– Block Replacement
– Write Strategy