Using a Victim Buffer in an Application-Specific Memory Hierarchy
Chuanjun Zhang*, Frank Vahid**
*Dept. of Electrical Engineering
Dept. of Computer Science and Engineering
University of California, Riverside
**Also with the Center for Embedded Computer Systems at UC Irvine
This work was supported by the National Science Foundation and the
Semiconductor Research Corporation
Chuanjun Zhang, UC Riverside
Low Power/Energy Techniques are Essential

Skadron et al., 30th ISCA
Hot enough to cook an egg.

High-performance processors are going to be too hot to work
Low energy dissipation is imperative for battery-driven embedded systems
Low-power techniques are essential to both embedded systems and high-performance processors
Frank Vahid, UC Riverside
Caches Consume Much Power

Caches consume 50% of total processor system power
  ARM920T and M*CORE (Segars 01, Lee 99)
Caches are accessed often
  Consume dynamic power
Associativity reduces misses
  Less power off-chip, but more power per access
Victim buffer helps (Jouppi 90)
  Added to a direct-mapped cache
  Keeps recently evicted lines in a small buffer, checked on a miss
  Like higher associativity, but without the extra power per access
  10% energy savings, 4% performance improvement (Albera 99)
[Figure: processor, cache, victim buffer, and memory]
Victim Buffer

[Figure: processor, L1 cache, victim buffer, and off-chip memory, showing the hit and miss paths with and without a victim buffer]
With a victim buffer:
  One cycle on a cache hit
  Two cycles on a victim buffer hit
  Twenty-two cycles on a victim buffer miss
Without a victim buffer:
  One cycle on a cache hit
  Twenty-one cycles on a cache miss
  More accesses to off-chip memory
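The cycle counts above can be turned into an average access time. The sketch below is a minimal model, assuming every access costs exactly the slide's 1, 2, 22, or 21 cycles and ignoring write-backs and the cost of swapping lines between the cache and the victim buffer. The parameter names `h` (L1 hit rate) and `v` (fraction of L1 misses that hit the victim buffer) are hypothetical.

```python
def avg_cycles_with_vb(h: float, v: float) -> float:
    """Average cycles per access with a victim buffer.

    h: L1 cache hit rate; v: victim buffer hit rate (fraction of L1 misses).
    Costs from the slide: 1 cycle (cache hit), 2 cycles (VB hit),
    22 cycles (VB miss).
    """
    miss = 1.0 - h
    return h * 1 + miss * (v * 2 + (1.0 - v) * 22)


def avg_cycles_without_vb(h: float) -> float:
    """Average cycles per access without a VB: 1 cycle on a hit, 21 on a miss."""
    return h * 1 + (1.0 - h) * 21


def break_even_vb_hit_rate() -> float:
    """VB hit rate at which both designs are equally fast.

    Solves 2v + 22(1 - v) = 21, i.e. 22 - 20v = 21, so v = 1/20.
    """
    return (22 - 21) / (22 - 2)


if __name__ == "__main__":
    print(avg_cycles_with_vb(0.9, 0.5))  # VB catches half of the L1 misses
    print(avg_cycles_without_vb(0.9))
    print(break_even_vb_hit_rate())
```

Under these assumptions the VB pays off only when it catches more than 5% of L1 misses; below that, the extra cycle spent on every VB miss outweighs the saved off-chip accesses.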
Cache Architecture with a Configurable Victim Buffer

Is a victim buffer a useful configurable cache parameter?
  Helps for some applications
  For others, not useful
    VB misses, so the extra cycle is wasted
  Thus, want the ability to shut off the VB for a given application
Hardware overhead
  One-bit register (VB on/off)
  A switch
[Figure: four-line, fully-associative victim buffer (CAM holding 27-bit tags, SRAM holding 16-byte cache line data) beside the L1 cache data SRAM; victim lines come from the L1 cache, a mux selects the data sent to the processor, and control signals from the cache's control circuit go to the next-level memory; labels include the one-bit VB on/off register and a Vdd control circuit]
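The behavior on the preceding slides, a direct-mapped cache backed by a small fully-associative victim buffer that can be switched off, can be sketched as a tiny trace-driven simulator. This is a minimal illustration, not the authors' simulator: it assumes 16-byte lines as in the figure, swaps a line out of the VB on a VB hit (as in Jouppi's scheme), evicts the oldest VB entry when the VB is full, ignores writes, and reuses the 1/2/22/21-cycle costs from the Victim Buffer slide. The class and method names are hypothetical.

```python
from collections import OrderedDict

LINE_BYTES = 16  # 16-byte cache lines, as in the figure


class DirectMappedWithVB:
    """Direct-mapped cache with an optional fully-associative victim buffer."""

    def __init__(self, cache_bytes: int, vb_lines: int = 8, vb_on: bool = True):
        self.num_sets = cache_bytes // LINE_BYTES
        self.tags = [None] * self.num_sets  # one tag per set; None = invalid
        self.vb = OrderedDict()             # VB line addresses, oldest first
        self.vb_lines = vb_lines
        self.vb_on = vb_on                  # models the one-bit on/off register
        self.cache_hits = self.vb_hits = self.misses = 0

    def access(self, addr: int) -> None:
        line = addr // LINE_BYTES
        index, tag = line % self.num_sets, line // self.num_sets
        if self.tags[index] == tag:
            self.cache_hits += 1
            return
        # Line about to be displaced from the direct-mapped cache, if any.
        old_tag = self.tags[index]
        victim = None if old_tag is None else old_tag * self.num_sets + index
        self.tags[index] = tag  # the requested line now lives in the cache
        if self.vb_on and line in self.vb:
            self.vb_hits += 1
            del self.vb[line]   # swap: the line leaves the VB, the victim enters
        else:
            self.misses += 1    # fetched from next-level memory
        if self.vb_on and victim is not None:
            self.vb[victim] = None
            if len(self.vb) > self.vb_lines:
                self.vb.popitem(last=False)  # drop the oldest VB entry

    def cycles(self) -> int:
        """Total cycles with the slide's costs (1 / 2 / 22, or 21 without a VB)."""
        miss_cost = 22 if self.vb_on else 21
        return self.cache_hits + 2 * self.vb_hits + miss_cost * self.misses

    def vb_hit_rate(self) -> float:
        """Fraction of L1 misses served by the victim buffer."""
        l1_misses = self.vb_hits + self.misses
        return self.vb_hits / l1_misses if l1_misses else 0.0


if __name__ == "__main__":
    # Two addresses mapping to the same set of an 8 Kbyte cache keep evicting
    # each other; with the VB on they ping-pong between cache and VB instead.
    for vb_on in (False, True):
        sim = DirectMappedWithVB(8 * 1024, vb_lines=8, vb_on=vb_on)
        for i in range(100):
            sim.access(0 if i % 2 == 0 else 8 * 1024)
        print(vb_on, sim.misses, sim.vb_hits, sim.cycles())
```

Setting `vb_on=False` reproduces the plain direct-mapped cache, which is what the one-bit register is meant to select per application.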
Hit Rate of a Victim Buffer
[Figure: victim buffer hit rate (0% to 100%) per benchmark and on average, in separate panels for the data cache and the instruction cache]
Hit rate of victim buffer when added to an 8 Kbyte, 4 Kbyte, or 2 Kbyte direct-mapped cache
Benchmarks from Powerstone, MediaBench, and SPEC 2000.
Computing Total Memory-Related Energy

Consider CPU stall energy and off-chip memory energy
  Excludes CPU active energy
  Thus, represents all memory-related energy
energy_mem = energy_dynamic + energy_static
energy_dynamic = cache_hits * energy_hit + cache_misses * energy_miss
energy_miss = energy_offchip_access + energy_uP_stall + energy_cache_block_fill
energy_static = cycles * energy_static_per_cycle
energy_miss = k_miss_energy * energy_hit
energy_static_per_cycle = k_static * energy_total_per_cycle
(we varied the k’s to account for different system implementations)
Measured quantities: cache_hits, cache_misses, and cycles come from SimpleScalar; the others come from our layout or data sheets
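The equations above translate directly into code. The sketch below uses the simplified k-based forms of energy_miss and energy_static_per_cycle. The counts would come from a simulator such as SimpleScalar, while the per-access energy, the k values, and energy_total_per_cycle in the example are made-up placeholder numbers in arbitrary units, not values from the slides.

```python
def memory_energy(cache_hits: int, cache_misses: int, cycles: int,
                  energy_hit: float, k_miss_energy: float,
                  k_static: float, energy_total_per_cycle: float) -> float:
    """energy_mem = energy_dynamic + energy_static, using the k-based forms."""
    # Miss energy (off-chip access, CPU stall, block fill) as a multiple of hit energy.
    energy_miss = k_miss_energy * energy_hit
    energy_dynamic = cache_hits * energy_hit + cache_misses * energy_miss
    # Static energy per cycle as a fraction of the total per-cycle energy.
    energy_static_per_cycle = k_static * energy_total_per_cycle
    energy_static = cycles * energy_static_per_cycle
    return energy_dynamic + energy_static


if __name__ == "__main__":
    # Placeholder inputs for illustration only.
    print(memory_energy(cache_hits=900, cache_misses=100, cycles=3000,
                        energy_hit=1.0, k_miss_energy=50,
                        k_static=0.1, energy_total_per_cycle=2.0))
```

Varying the k's, as the slide describes, amounts to calling the function over a grid of k_miss_energy and k_static values for the same simulated counts.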

Performance and Energy Benefits of Victim Buffer
with a Direct-Mapped Cache
Substantial benefit
[Figure: performance and energy improvement per benchmark, from vpr to padpcm; an annotation marks the benchmarks for which the VB should be shut off]
An 8-line victim buffer with an 8 Kbyte direct-mapped cache (0% = direct-mapped cache without a victim buffer)
A configurable victim buffer is clearly useful: it can be shut off to avoid a performance penalty for certain applications
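The per-application on/off decision can be sketched as follows. The rule itself (keep the VB on only when it causes no slowdown and no energy increase relative to the direct-mapped baseline) is an assumed policy for illustration, and the numbers in the example are made up. Improvements are percentages relative to the baseline, matching the figure's convention that 0% means no victim buffer.

```python
def percent_improvement(baseline: float, value: float) -> float:
    """Improvement over the baseline in percent; positive means lower (better)."""
    return 100.0 * (baseline - value) / baseline


def choose_vb(cycles_off: float, cycles_on: float,
              energy_off: float, energy_on: float) -> tuple[bool, float, float]:
    """Return (vb_on, performance improvement %, energy improvement %).

    Hypothetical policy: keep the VB on only if it neither slows the
    application down nor increases its memory-related energy.
    """
    perf = percent_improvement(cycles_off, cycles_on)
    energy = percent_improvement(energy_off, energy_on)
    return perf >= 0 and energy >= 0, perf, energy


if __name__ == "__main__":
    print(choose_vb(1000, 960, 500, 200))   # faster and cheaper: keep the VB on
    print(choose_vb(1000, 1040, 500, 490))  # slower: shut the VB off
```

A designer could weight energy and performance differently; this sketch simply refuses any slowdown.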
Is a Configurable Victim Buffer Useful Even With a Configurable Cache?
We showed that a configurable cache can reduce memory access power by half on average (Zhang/Vahid/Najjar ISCA 03, ISVLSI 03)
Software-configurable cache
  Associativity: 1, 2, or 4 ways
  Size: 2, 4, or 8 Kbytes
  Line size
[Figure: configurable cache organized as 2 Kb ways]
Does that configurability subsume the usefulness of a configurable victim buffer?
[Figure: normalized energy (0.0 to 1.0) versus associativity (1, 2, 4) for epic and mpeg2]
Best Configurable Cache with VB Configurations





Optimal cache configuration when cache associativity, cache size, and victim buffer are all configurable:
  I and D stand for instruction cache and data cache, respectively
  V means the victim buffer is on
  nK means the cache size is n Kbytes
  The associativity is given by the last four characters; for example, for benchmark vpr, I2D1 stands for a two-way instruction cache and a direct-mapped data cache
Note that sometimes the victim buffer should be on, sometimes off
Benchmark   Best configuration
padpcm      I8KD4KI1D2
ucbqsort    I4KDV4KI1D1
crc         I2KDV4KI1D1
v42         I8KD8KI1D1
auto2       I4KD2KI1D1
adpcm       I2KDV2KI1D1
bcnt        I2KD2KI1D1
epic        IV4KDV8KI1D1
bilv        I4KD2KI1D1
jpeg        I8KD2KI4D1
binary      I4KD2KI1D1
mpeg2       I4KDV4KI1D1
blit        I2KDV2KI1D1
g721        I8KDV2KI2D1
brev        I4KD2KI1D1
art         I4KDV2KI1D1
g3fax       I4KDV2KI1D1
mcf         I4KD4KI1D1
fir         I4KD2KI1D1
parser      I8KDV4KI4D1
pjepg       I4KDV2KI1D1
vpr         I8KD2KI2D1
pegwit      I4KD4KI1D1
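The configuration codes in the table are regular enough to decode mechanically. The sketch below is a hypothetical helper, assuming every code has the form I[V]<size>KD[V]<size>KI<assoc>D<assoc>, as all entries in the table do.

```python
import re
from typing import NamedTuple


class CacheConfig(NamedTuple):
    size_kb: int  # cache size in Kbytes
    assoc: int    # number of ways (1 = direct-mapped)
    vb_on: bool   # victim buffer enabled


# I[V]<size>K D[V]<size>K I<assoc> D<assoc>
CONFIG_RE = re.compile(r"^I(V?)(\d+)KD(V?)(\d+)KI(\d+)D(\d+)$")


def parse_config(code: str) -> tuple[CacheConfig, CacheConfig]:
    """Decode a code such as 'IV4KDV8KI1D1' into (icache, dcache)."""
    m = CONFIG_RE.match(code)
    if m is None:
        raise ValueError(f"unrecognized configuration code: {code!r}")
    i_vb, i_kb, d_vb, d_kb, i_assoc, d_assoc = m.groups()
    icache = CacheConfig(int(i_kb), int(i_assoc), i_vb == "V")
    dcache = CacheConfig(int(d_kb), int(d_assoc), d_vb == "V")
    return icache, dcache


def format_config(icache: CacheConfig, dcache: CacheConfig) -> str:
    """Inverse of parse_config."""
    i_v = "V" if icache.vb_on else ""
    d_v = "V" if dcache.vb_on else ""
    return (f"I{i_v}{icache.size_kb}KD{d_v}{dcache.size_kb}K"
            f"I{icache.assoc}D{dcache.assoc}")


if __name__ == "__main__":
    print(parse_config("I8KD2KI2D1"))    # vpr
    print(parse_config("IV4KDV8KI1D1"))  # epic
```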
Performance and Energy Benefits of Victim Buffer
Added to a Configurable Cache
[Figure: performance and energy improvement per benchmark, from vpr to padpcm]
An 8-line victim buffer with a configurable cache whose associativity, size, and line size are configurable (0% = optimal configuration without a VB)
Still surprisingly effective
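Treating the victim buffer as one more configuration knob amounts to widening the search space. The sketch below is an illustrative exhaustive search over a single cache's size, associativity, and VB setting (line size is omitted). `energy` is a hypothetical table of measured or simulated energies; combinations missing from it are treated as unavailable. The returned savings are relative to the best configuration without a VB, matching the figure's 0% baseline.

```python
from itertools import product

SIZES_KB = (2, 4, 8)
ASSOCS = (1, 2, 4)

Config = tuple[int, int, bool]  # (size in Kbytes, associativity, VB on)


def best_config(energy: dict[Config, float], allow_vb: bool) -> Config:
    """Lowest-energy configuration among those present in `energy`."""
    vb_options = (False, True) if allow_vb else (False,)
    candidates = [cfg for cfg in product(SIZES_KB, ASSOCS, vb_options)
                  if cfg in energy]
    return min(candidates, key=energy.__getitem__)


def vb_benefit(energy: dict[Config, float]) -> tuple[Config, float]:
    """Best overall configuration and its energy savings (in percent)
    relative to the best configuration without a victim buffer."""
    baseline = best_config(energy, allow_vb=False)
    best = best_config(energy, allow_vb=True)
    return best, 100.0 * (energy[baseline] - energy[best]) / energy[baseline]


if __name__ == "__main__":
    # Made-up normalized energies for one application.
    energy = {(2, 1, False): 1.0, (4, 1, False): 0.8,
              (4, 1, True): 0.6, (8, 2, False): 0.9}
    print(vb_benefit(energy))
```

When no VB setting beats the best VB-less configuration, the search returns that configuration with 0% savings, i.e. the VB stays off.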
Conclusion

Configurable victim buffer useful with a direct-mapped cache
  As much as 60% energy and 4% performance improvement for some applications
  Can be shut off to avoid a performance penalty on other applications
Configurable victim buffer also useful with a configurable cache
  As much as 43% energy and 8% performance improvement for some applications
  Can be shut off to avoid a performance overhead on other applications
A configurable victim buffer should be included as a software-configurable parameter for direct-mapped as well as configurable caches in embedded system architectures