Energy-Efficient Hardware Data
Prefetching
Yao Guo, Mahmoud Abdullah Bennaser and Csaba Andras Moritz
CONTENTS
 Introduction
 Hardware prefetching
 Hardware data prefetching methods
 Performance speedup
 Energy-aware prefetching techniques
 PARE
 Conclusion
Introduction
Data prefetching is the process of fetching data that is
needed by the program in advance, before the instruction
that requires it is executed.
 It hides the apparent memory latency.

Two types
 Software prefetching
• Using the compiler
 Hardware prefetching
• Using additional circuitry

Hardware prefetching

 Uses additional circuitry
 Prefetch tables are used to store recent load instructions
and the relations between load instructions.
 Better performance
 Energy overhead comes from the prefetch hardware tables,
which are searched on every load
 Energy cost of unnecessary prefetches
Hardware Data Prefetching Methods
 Sequential prefetching
 Stride prefetching
 Pointer prefetching
 Combined stride and pointer prefetching
Sequential Prefetching

 One block lookahead (OBL) approach
• Initiate a prefetch for block b+1 when block b is accessed
 Prefetch-on-miss
• Prefetch b+1 whenever an access to block b results in a
cache miss
 Tagged prefetching
• Associates a tag bit with every memory block
• When a block is demand-fetched, or a prefetched block is
referenced for the first time, the next block is prefetched
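Below is a minimal C sketch of these two OBL policies (not from the slides); the direct-mapped cache structure, block size, and function names are illustrative assumptions.

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_BLOCKS 1024          /* illustrative cache size in blocks */

/* One tag bit per cached block, as described for tagged prefetching. */
typedef struct {
    uint64_t block_addr;
    bool     valid;
    bool     tag;     /* set when the block was brought in by a prefetch */
} CacheBlock;

static CacheBlock cache[NUM_BLOCKS];

static CacheBlock *lookup(uint64_t block) {
    CacheBlock *b = &cache[block % NUM_BLOCKS];
    return (b->valid && b->block_addr == block) ? b : NULL;
}

static void fill(uint64_t block, bool prefetched) {
    CacheBlock *b = &cache[block % NUM_BLOCKS];
    b->block_addr = block;
    b->valid = true;
    b->tag = prefetched;          /* remember how the block arrived */
}

/* Called on every demand access to block b. */
void access_block(uint64_t b) {
    CacheBlock *hit = lookup(b);
    if (hit == NULL) {
        /* demand miss: fetch b and prefetch b+1 (one block lookahead) */
        fill(b, false);
        if (lookup(b + 1) == NULL) fill(b + 1, true);
    } else if (hit->tag) {
        /* first reference to a prefetched block: clear tag, prefetch b+1 */
        hit->tag = false;
        if (lookup(b + 1) == NULL) fill(b + 1, true);
    }
    /* the prefetch-on-miss variant would only act on the miss path above */
}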

OBL Approaches
 Prefetch-on-miss
 Tagged prefetch
[Figure: comparison of prefetch-on-miss and tagged prefetch, showing which blocks are demand-fetched and which are prefetched as the tag bit (0/1) changes]
Stride Prefetching
 Employ special logic to monitor the processor's address
referencing pattern
 Detect constant-stride array references originating from
looping structures
 Compare successive addresses used by load or store
instructions
Reference Prediction Table (RPT)
 RPT
• 64 entries, 64 bits each
• Holds the most recently used memory instructions
 Each entry contains
• Address of the memory instruction
• Previous address accessed by the instruction
• Stride value
• State field
Organization of RPT
[Figure: the PC indexes an RPT entry (instruction tag, previous address, stride, state); the stride is added to the effective address to form the prefetch address]
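A minimal C sketch of an RPT update, assuming a direct-mapped 64-entry table and a simplified three-state machine (the classic RPT uses four states; the field and state names here are assumptions consistent with the slide's description):

#include <stdint.h>

#define RPT_ENTRIES 64   /* slide: 64 entries */

typedef enum { INITIAL, TRANSIENT, STEADY } RptState;

typedef struct {
    uint32_t pc_tag;        /* address of the memory instruction  */
    uint64_t prev_addr;     /* previous address accessed by it    */
    int64_t  stride;        /* last observed stride               */
    RptState state;         /* state field                        */
} RptEntry;

static RptEntry rpt[RPT_ENTRIES];

/* Returns a prefetch address, or 0 if no prefetch should be issued. */
uint64_t rpt_access(uint32_t pc, uint64_t eff_addr) {
    RptEntry *e = &rpt[(pc >> 2) % RPT_ENTRIES];
    uint64_t prefetch_addr = 0;

    if (e->pc_tag != pc) {
        /* new memory instruction: allocate the entry */
        e->pc_tag = pc; e->prev_addr = eff_addr;
        e->stride = 0;  e->state = INITIAL;
        return 0;
    }

    int64_t new_stride = (int64_t)(eff_addr - e->prev_addr);
    if (new_stride == e->stride && e->stride != 0) {
        e->state = STEADY;
        prefetch_addr = eff_addr + e->stride;  /* current address + stride */
    } else {
        e->state = (e->state == STEADY) ? INITIAL : TRANSIENT;
        e->stride = new_stride;
    }
    e->prev_addr = eff_addr;
    return prefetch_addr;
}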
Pointer Prefetching
 Effective for pointer-intensive programs
 No constant stride
 Dependence-based prefetching
• Detects dependence relationships between loads
• Uses two hardware tables
• Potential producer window (PPW)
• Correlation table (CT): stores the dependence information
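A rough C sketch of dependence-based prefetching, assuming the two tables behave as in Roth et al.: the PPW remembers recently loaded values and the loads that produced them, and the CT remembers producer-consumer pairs so that when a producer load completes, the consumer's likely address can be prefetched. Table sizes, the 64-byte matching window, and function names are illustrative assumptions.

#include <stdint.h>

#define PPW_SIZE 64
#define CT_SIZE  64

/* Potential producer window: recently loaded values and their producing loads. */
typedef struct { uint64_t value; uint32_t producer_pc; } PpwEntry;
/* Correlation table: producer load -> consumer load that used its value as a base. */
typedef struct { uint32_t producer_pc; uint32_t consumer_pc; int64_t offset; } CtEntry;

static PpwEntry ppw[PPW_SIZE];
static CtEntry  ct[CT_SIZE];

/* Called when a load at program counter `pc` loads `value` from `addr`. */
void on_load(uint32_t pc, uint64_t addr, uint64_t value) {
    /* 1. Did an earlier load produce this load's base address? */
    for (int i = 0; i < PPW_SIZE; i++) {
        if (ppw[i].value != 0 && addr >= ppw[i].value && addr - ppw[i].value < 64) {
            CtEntry *c = &ct[ppw[i].producer_pc % CT_SIZE];
            c->producer_pc = ppw[i].producer_pc;   /* record the dependence */
            c->consumer_pc = pc;
            c->offset = (int64_t)(addr - ppw[i].value);
            break;
        }
    }
    /* 2. Record this load as a potential producer of future addresses. */
    PpwEntry *p = &ppw[pc % PPW_SIZE];
    p->value = value;
    p->producer_pc = pc;

    /* 3. If this load is a known producer, prefetch its consumer's likely address. */
    CtEntry *c = &ct[pc % CT_SIZE];
    if (c->producer_pc == pc && value != 0) {
        uint64_t prefetch_addr = value + (uint64_t)c->offset;
        (void)prefetch_addr;   /* a prefetch for prefetch_addr would be issued here */
    }
}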
Combined Stride and Pointer Prefetching
 Objective: to evaluate a technique that works for all types
of memory access patterns
 Handles both array and pointer accesses
 Better performance
 Uses all three tables (RPT, PPW, CT)
Performance Speedup
 The combined (stride+dep) technique has the best speedup
for most benchmarks.
[Chart: speedup of sequential, tagged, stride, dependence, and combined (stride+dep) prefetching over no-prefetch for mcf, parser, art, bzip2, galgel, bh, em3d, health, mst, perim, and the average; y-axis from 0.8 to 2.4]
Energy-aware Prefetching Architecture
[Figure: load instructions (LDQ RA, RB, OFFSET) carry compiler hints; compiler-based selective filtering and compiler-assisted adaptive prefetching decide which accesses reach the stride and pointer prefetchers; prefetch filtering using a stride counter and hardware filtering using the prefetch filtering buffer (PFB) drop further prefetches before they probe the L1 D-cache tag/data arrays or prefetch from the L2 cache]
Energy-aware Prefetching Techniques
 Compiler-Based Selective Filtering (CBSF)
• Only searching the prefetch hardware tables for selected loads
 Compiler-Assisted Adaptive Prefetching (CAAP)
• Selecting different prefetching schemes per access
 Compiler-driven Filtering using a Stride Counter (SC)
• Reducing prefetching energy wasted on small strides
 Hardware-based Filtering using the PFB (PFB)
• Dropping redundant prefetches (see the sketch after this list)
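A rough C sketch of the PFB-based hardware filtering named in the last item above: a small buffer of recently prefetched block addresses is checked first, and a prefetch whose target is already there is dropped before it probes the L1 tag array. The buffer size, FIFO replacement, and function name are assumptions.

#include <stdbool.h>
#include <stdint.h>

#define PFB_SIZE 8             /* illustrative: a few recently prefetched addresses */

static uint64_t pfb[PFB_SIZE];
static int      pfb_next;      /* FIFO replacement pointer */

/* Returns true if the prefetch should proceed to the L1 tag/data arrays.
 * If the address is already in the PFB the prefetch is dropped, avoiding
 * the energy of a redundant L1 lookup and L2 access. */
bool pfb_filter(uint64_t prefetch_block_addr) {
    for (int i = 0; i < PFB_SIZE; i++)
        if (pfb[i] == prefetch_block_addr)
            return false;                      /* recently prefetched: filter it */
    pfb[pfb_next] = prefetch_block_addr;       /* remember this prefetch         */
    pfb_next = (pfb_next + 1) % PFB_SIZE;
    return true;
}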
Compiler-Based Selective Filtering
 Only searches the prefetch hardware tables for the memory
instructions selected by the compiler
 Energy is reduced by
• Using only loop or recursive-type memory accesses
• Using only array and linked-data-structure memory accesses
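A short C sketch of this filtering, assuming the compiler attaches a one-bit hint to each load marking it as an array or linked-data-structure access inside a loop or recursive call (the hint and function names are illustrative):

#include <stdbool.h>
#include <stdint.h>

/* Placeholder for the combined RPT / PPW / CT lookup described earlier. */
static void search_prefetch_tables(uint32_t pc, uint64_t addr) {
    (void)pc; (void)addr;
}

/* CBSF: only loads the compiler marked are allowed to search the prefetch
 * hardware tables; all other loads skip the lookup and its energy cost. */
void on_load_cbsf(uint32_t pc, uint64_t addr, bool compiler_hint) {
    if (!compiler_hint)
        return;               /* filtered: no table search, no lookup energy */
    search_prefetch_tables(pc, addr);
}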
Compiler-Assisted Adaptive Prefetching
 Selects a different prefetching scheme based on the access:
• Memory accesses to an array that does not belong to any
larger structure are fed only into the stride prefetcher.
• Memory accesses to an array that belongs to a larger
structure are fed into both the stride and pointer prefetchers.
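A C sketch of these selection rules, assuming the compiler classifies each load by the kind of object it accesses (the enum, stub prefetcher functions, and the handling of non-array accesses are assumptions):

#include <stdint.h>

/* Placeholders for the RPT-based and PPW/CT-based prefetchers. */
static void stride_prefetcher_access(uint32_t pc, uint64_t addr)  { (void)pc; (void)addr; }
static void pointer_prefetcher_access(uint32_t pc, uint64_t addr) { (void)pc; (void)addr; }

/* Hypothetical compiler classification carried with each load. */
typedef enum {
    ACC_ARRAY_STANDALONE,   /* array that is not part of any larger structure */
    ACC_ARRAY_IN_STRUCT,    /* array that belongs to a larger structure       */
    ACC_OTHER               /* everything else, e.g. linked-structure loads   */
} AccessKind;

/* CAAP: route each load to the prefetcher(s) most likely to help it. */
void on_load_caap(uint32_t pc, uint64_t addr, AccessKind kind) {
    switch (kind) {
    case ACC_ARRAY_STANDALONE:
        stride_prefetcher_access(pc, addr);        /* stride prefetcher only */
        break;
    case ACC_ARRAY_IN_STRUCT:
        stride_prefetcher_access(pc, addr);        /* both prefetchers       */
        pointer_prefetcher_access(pc, addr);
        break;
    default:
        /* Routing other accesses to the pointer prefetcher is an assumption;
         * the slide only spells out the two array rules above. */
        pointer_prefetcher_access(pc, addr);
        break;
    }
}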

Compiler-Hinted Filtering Using a Runtime Stride Counter
 Reduces the prefetching energy consumption wasted on memory
access patterns with very small strides.
 Small strides are not used for prefetching
 Prefetching helps only when the stride is larger than half the
cache line size
 Each counter entry contains
• Program counter (PC)
• Stride counter
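A C sketch of this filter, assuming a small per-PC counter table, a 32-byte L1 line, and a saturation threshold (all illustrative): prefetching is suppressed for a load once it repeatedly shows strides no larger than half a cache line, since such accesses stay within the line just fetched.

#include <stdbool.h>
#include <stdint.h>

#define LINE_SIZE    32      /* illustrative L1 line size in bytes        */
#define SC_ENTRIES   16      /* illustrative stride-counter table size    */
#define SC_THRESHOLD  4      /* suppress prefetching after this many
                                consecutive small strides                 */

typedef struct {
    uint32_t pc;             /* program counter of the load               */
    uint64_t prev_addr;
    uint8_t  small_strides;  /* stride counter                            */
} ScEntry;

static ScEntry sc[SC_ENTRIES];

/* Returns true if this load is still worth prefetching for. */
bool stride_counter_allow(uint32_t pc, uint64_t addr) {
    ScEntry *e = &sc[pc % SC_ENTRIES];
    if (e->pc != pc) {                        /* new load: reset the entry */
        e->pc = pc; e->prev_addr = addr; e->small_strides = 0;
        return true;
    }
    uint64_t stride = addr > e->prev_addr ? addr - e->prev_addr
                                          : e->prev_addr - addr;
    e->prev_addr = addr;
    if (stride <= LINE_SIZE / 2) {
        /* small stride: the next access likely hits the same cache line */
        if (e->small_strides < SC_THRESHOLD) e->small_strides++;
    } else {
        e->small_strides = 0;                 /* large stride: re-enable  */
    }
    return e->small_strides < SC_THRESHOLD;
}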
PARE: A Power-Aware Prefetch Engine
 Used for reducing power dissipation
 Two ways to reduce power
• Reduces the size of each entry, based on the spatial
locality of memory accesses
• Partitions the large table into multiple smaller tables
Hardware Prefetch Table
PARE Hardware Prefetch Table
 Breaks up the whole prefetch table into 16 smaller tables
 Each table contains 4 entries
 Each entry also contains a group number
 Only the lower 16 bits of the PC are used instead of 32 bits
PARE Table Design
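A C sketch of this partitioned design, assuming 16 groups of 4 entries where the group number supplied with each load selects which small table to power up and search; modeling the group as the table index, and keeping only 16-bit tags and low-order address bits per entry, are assumptions drawn from the description above.

#include <stdint.h>

#define PARE_GROUPS          16   /* slide: 16 smaller tables            */
#define PARE_ENTRIES_PER_GRP  4   /* slide: 4 entries per table          */

typedef struct {
    uint16_t pc_tag;       /* only the lower 16 bits of the PC are kept    */
    uint16_t prev_addr;    /* low-order bits of the last address (assumed) */
    int16_t  stride;
} PareEntry;

/* One small 4-entry table per group; only the group selected for the
 * current load is activated, so CAM lookup power is paid on 4 entries
 * instead of the whole 64-entry table. */
static PareEntry pare[PARE_GROUPS][PARE_ENTRIES_PER_GRP];

/* `group` is assumed to arrive with the load (e.g. a compiler-assigned
 * group number), selecting which small table to search. */
PareEntry *pare_lookup(uint8_t group, uint32_t pc) {
    uint16_t tag = (uint16_t)pc;                 /* lower 16 PC bits */
    PareEntry *tbl = pare[group % PARE_GROUPS];
    for (int i = 0; i < PARE_ENTRIES_PER_GRP; i++)
        if (tbl[i].pc_tag == tag)
            return &tbl[i];
    return NULL;                                 /* miss in this group */
}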
Advantages of the PARE Hardware Table
 Power consumption is reduced
 CAM cell power is reduced
 Small tables
 Reduced total power consumption
Conclusion
 Improves performance
 Reduces the energy overhead of hardware data prefetching
 Reduces total energy consumption
 Compiler-assisted and hardware-based energy-aware techniques
and a new power-aware prefetch engine are used.
References
 Y. Guo et al., "Energy-Efficient Hardware Data Prefetching," IEEE Trans.
Very Large Scale Integration (VLSI) Systems, vol. 19, no. 2, Feb. 2011.
 A. J. Smith, "Sequential program prefetching in memory hierarchies,"
IEEE Computer, vol. 11, no. 12, pp. 7–21, Dec. 1978.
 A. Roth, A. Moshovos, and G. S. Sohi, "Dependence based prefetching for
linked data structures," in Proc. ASPLOS-VIII, Oct. 1998, pp. 115–126.