Proposal for Power and Energy Analysis of Different Prefetching Mechanisms for Linked Data Structures
NIKHIL JAGDALE
KARAN KHANNA
AMAN KUMAR
I. PROBLEM DEFINITION AND MOTIVATION
II. BRIEF SURVEY OF RELATED WORK
III. SOME INITIAL IDEAS
IV. EXPERIMENTAL METHODOLOGY/SETUP
V. TIMETABLE
VI. REFERENCES
I. Problem definition and motivation
Memory latencies in today’s processors are on the order of hundreds of processor clock cycles. In
programs that contain linked data structures (pointer-intensive programs), the penalty of cache misses
can be costly. Several pre-fetching techniques for pointer-intensive programs improve
performance, but at the expense of increased energy consumption. It is unclear which of these
techniques gives the best trade-off between performance improvement and energy consumption.
Furthermore, knowing which pre-fetching technique is best suited to low energy consumption is
vital to domains such as embedded systems, which are striving to keep up with the increased demand
for computational performance while maintaining longer battery life in portable devices.
II. Brief survey of related work
Although considerable effort is going into data pre-fetching techniques in the research
community, few studies seem to have characterised existing techniques with respect to energy consumption.
To the best of our knowledge, the most closely related work is by Yao Guo et al. [1],
which evaluates a set of hardware-based data pre-fetching techniques from an energy standpoint. Their
study covers two sequential pre-fetching techniques, stride pre-fetching, and dependence-based pre-fetching, of which only the last deals with irregular applications and linked data structures.
We acknowledge that no single investigation can explore all proposed
techniques from an energy perspective, and we feel it would be interesting to analyse
further techniques not covered in that study.
In this project, we intend to focus exclusively on a comparative energy and performance evaluation of
pre-fetching approaches that deal with irregular applications and linked data structures, covering both
hardware and software approaches. We further want to use our results, along with any existing
results such as those from [1], to evaluate the suitability of one or more of these techniques for
hand-held embedded devices such as smartphones, where minimizing energy consumption assumes
relatively high importance. The approaches we have short-listed so far for
our analysis include, but are not limited to, the following: software-based greedy pre-fetching [2], memory-side correlation pre-fetching [3], address-value delta (AVD) prediction
[4], and content-directed pre-fetching [5].
III. Some initial ideas
Based on our current understanding, we intend to select, after consultation with the course faculty,
three pre-fetching techniques proposed for accelerating pointer-intensive applications, and
adapt a common minimal simulation environment to each of the three. We then aim to execute a pre-selected set of pointer-intensive benchmark applications in the baseline simulated environment and in the three
adapted versions with pre-fetching support built in. In each test, we would gather
the application execution time (which reflects the relative speedup) and the total energy
consumed in executing the application.
Depending on how our simulated environment is organised, it might be important to isolate energy
contributions from components that are not impacted by pre-fetching. However, there is a
counter-argument: pre-fetching reduces execution time, and therefore components not
related to pre-fetching, such as main memory, the LCD, the SD card etc., also enter the larger system-wide energy equation; if the application finishes earlier, these unrelated modules also have the
opportunity to go to sleep earlier. The related study [1] does not take any such effects into account,
since it considers the energy of only the processor's memory sub-system.
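The counter-argument above can be illustrated with a first-order energy model. The following is only a sketch under stated assumptions: the function name `system_energy`, the constant background power and all numbers are hypothetical placeholders chosen for illustration, not measurements or results from [1].

```python
def system_energy(exec_time_s, memsys_energy_j, other_power_w):
    """Total energy in joules: memory sub-system energy plus the energy of
    components unaffected by pre-fetching, which stay active for the whole run.
    Assumes (for illustration) that those components draw constant power."""
    return memsys_energy_j + other_power_w * exec_time_s


# Hypothetical numbers, chosen only to illustrate the argument.
baseline = {"time": 10.0, "memsys": 2.0}   # no pre-fetching
prefetch = {"time": 8.0, "memsys": 2.6}    # faster, but memory sub-system uses more energy
other_power = 0.5                          # watts drawn by LCD, SD card etc. (assumed)

# Energy change seen when only the memory sub-system is analysed.
mem_only_delta = prefetch["memsys"] - baseline["memsys"]

# Energy change seen when the whole system is analysed.
total_base = system_energy(baseline["time"], baseline["memsys"], other_power)
total_pref = system_energy(prefetch["time"], prefetch["memsys"], other_power)
system_delta = total_pref - total_base

print(f"memory sub-system only: {mem_only_delta:+.2f} J")  # prints +0.60 J
print(f"system-wide:            {system_delta:+.2f} J")    # prints -0.40 J
```

With these made-up numbers, pre-fetching costs extra energy when only the memory sub-system is counted, yet saves energy system-wide because the other components are active for less time. Whether real workloads behave this way is what the experiments would need to establish.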
Some interesting metrics related to pre-fetching may turn out to be correlated with energy
consumption trends during the course of this investigation. These are:
• Pre-fetch coverage: ratio of the number of misses eliminated by pre-fetches to the total
number of misses that would occur in the absence of pre-fetching.
• Pre-fetch precision: ratio of the number of distinct pre-fetched cache lines that are accessed
by at least one demand request after being pre-fetched and before being replaced, to the
number of pre-fetched cache lines.
• Pre-fetch pollution: ratio of the number of demand misses that are
caused by interference due to pre-fetching, and would not occur without it,
to the number of misses that would occur without pre-fetching [6].
• Energy cost of performance: performance improvement per unit increase in energy
consumption, where performance improvement may be defined as the percentage reduction in
the execution time of a benchmark.
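The four metrics above could be computed from simulator counters roughly as follows. This is a minimal sketch: the `RunStats` fields are hypothetical counter names (no simulator is assumed to export them under these names), the sample values are invented, and the energy cost of performance is assumed here to be the percentage reduction in execution time divided by the absolute increase in energy.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    """Counters from a baseline run and a pre-fetching run (hypothetical names)."""
    baseline_misses: int          # demand misses without pre-fetching
    misses_eliminated: int        # baseline misses removed by pre-fetches
    prefetched_lines: int         # distinct cache lines brought in by pre-fetches
    useful_prefetched_lines: int  # of those, lines hit by a demand access before eviction
    pollution_misses: int         # demand misses caused only by pre-fetch interference
    baseline_time: float          # execution time without pre-fetching
    prefetch_time: float          # execution time with pre-fetching
    baseline_energy: float        # energy without pre-fetching
    prefetch_energy: float        # energy with pre-fetching


def coverage(s):
    return s.misses_eliminated / s.baseline_misses


def precision(s):
    return s.useful_prefetched_lines / s.prefetched_lines


def pollution(s):
    return s.pollution_misses / s.baseline_misses


def energy_cost_of_performance(s):
    """Percentage reduction in execution time per unit of extra energy.
    Returns None when pre-fetching does not increase energy, since the
    ratio is not meaningful in that case (an assumption of this sketch)."""
    extra_energy = s.prefetch_energy - s.baseline_energy
    if extra_energy <= 0:
        return None
    time_reduction_pct = 100.0 * (s.baseline_time - s.prefetch_time) / s.baseline_time
    return time_reduction_pct / extra_energy


# Invented sample values, for illustration only.
stats = RunStats(baseline_misses=1000, misses_eliminated=400,
                 prefetched_lines=800, useful_prefetched_lines=500,
                 pollution_misses=50,
                 baseline_time=10.0, prefetch_time=8.0,
                 baseline_energy=5.0, prefetch_energy=6.0)

print(coverage(stats), precision(stats), pollution(stats),
      energy_cost_of_performance(stats))
```

Keeping the raw counters separate from the derived ratios means the same run data can be re-analysed if the metric definitions are revised later.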
IV. Experimental methodology/ setup
1. The best simulation environment for our purposes would be the one that most closely
represents the ubiquitous ARM-based architectures used for embedded mobile
applications, such as smartphones and audio players. We need to finalize this in
consultation with the course faculty.
2. Modifications for energy and performance instrumentation: it has been suggested that we
consider Wattch and SimpleScalar for our purposes. [Needs more investigation.]
3. Prospective modifications for implementing software-assisted greedy pre-fetching:
a. Add a mechanism to support a pre-fetch instruction in the simulator, if need be.
b. Add pre-fetch instructions to the benchmark suite, or alternatively explore the
possibility of using Todd Mowry’s version of the compiler, which automates this part.
4. Prospective modifications for implementing memory-side correlation pre-fetching:
a. Add a memory-side pre-fetch engine between the main memory and the L2 cache that
uses passive push pre-fetching.
b. Modify the L2 cache to receive lines that it has not requested.
5. Prospective modifications for implementing AVD:
a. Add AVD prediction logic consisting of adders, comparators and a prediction table.
6. Benchmark suites: from researching several related works, it appears that Olden is the most
commonly used benchmark suite for pointer-intensive applications. We intend to stick with
this for now.
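To make the AVD logic of item 5 more concrete, the sketch below models an address-value delta predictor in software, following our reading of [4]: the AVD of a load is its effective address minus the value it loads, and a load whose AVD stays constant can have its value predicted as address minus AVD. The class name `AVDPredictor`, the dictionary-based table, the confidence threshold and the maximum delta are assumptions made for illustration; the actual proposal is a hardware structure and its update policy may differ.

```python
class AVDPredictor:
    """Toy software model of an address-value delta (AVD) prediction table."""

    def __init__(self, max_avd=64, conf_threshold=2):
        self.max_avd = max_avd                # comparator: only small deltas are tracked (assumed value)
        self.conf_threshold = conf_threshold  # repeats needed before predicting (assumed value)
        self.table = {}                       # load PC -> [avd, confidence]

    def predict(self, pc, address):
        """Return the predicted loaded value, or None if not confident."""
        entry = self.table.get(pc)
        if entry is None or entry[1] < self.conf_threshold:
            return None
        return address - entry[0]             # adder: value = address - AVD

    def train(self, pc, address, value):
        """Update the table once the real loaded value is known."""
        avd = address - value
        if abs(avd) > self.max_avd:
            self.table.pop(pc, None)          # delta too large to be a stable pointer pattern
            return
        entry = self.table.get(pc)
        if entry is not None and entry[0] == avd:
            entry[1] += 1                     # same delta seen again: gain confidence
        else:
            self.table[pc] = [avd, 0]         # new or changed delta: restart confidence


# Hypothetical example: list nodes allocated back to back, 32 bytes apart,
# with the `next` pointer at offset 0, traversed by one static load.
predictor = AVDPredictor()
load_pc = 0x400
for node_addr in (0x1000, 0x1020, 0x1040):
    predictor.train(load_pc, node_addr, node_addr + 32)

predicted_next = predictor.predict(load_pc, 0x1060)
print(hex(predicted_next))  # prints 0x1080
```

A predictor like this only helps when the allocator places successive nodes at regular distances, which is one reason the choice of benchmarks matters for the comparison.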
V. Timetable
Milestone  Due    Description
M1         10/13  1. Investigate and freeze the suitability of Wattch and/or SimpleScalar.
                  2. Investigate and freeze the closest simulator; modify it to match a typical
                     ARM-based smartphone architecture.
                  3. Decide upon suitable metrics that can be collected and analysed by this
                     experiment.
                  4. Modify Olden to include pre-fetch instructions, or use Mowry's compiler
                     (in this case, see how it can be modified for ARM platforms).
M2         11/01  1. Modify and incorporate the different techniques into the simulated
                     environment.
                  2. Run a first level of simulation and collect metrics.
                  3. Review any shortfalls in the measurement process and take appropriate
                     action.
                  4. Review whether all identified metrics still make sense or whether there
                     needs to be a change at some point.
M3         11/19  1. Run the full set of simulations (different benchmarks and all selected
                     techniques) and collect data.
                  2. Analyse the data (comparative analysis of the various pre-fetch
                     techniques).
                  3. Use this information, and information from related work, to comment
                     upon the suitability of one or another approach for battery-powered mobile
                     computing platforms.
VI. References
[1] Yao Guo et al., "Energy Characterization of Hardware-Based Data Prefetching", 2004.
[2] Chi-Keung Luk and Todd C. Mowry, "Automatic Compiler-Inserted Prefetching for Pointer-Based Applications".
[3] Yan Solihin et al., "Using a User-Level Memory Thread for Correlation Prefetching".
[4] Onur Mutlu, Hyesoon Kim and Yale N. Patt, "Address-Value Delta (AVD) Prediction".
[5] Eiman Ebrahimi, Onur Mutlu and Yale N. Patt, "Techniques for Bandwidth-Efficient Prefetching of Linked Data Structures in Hybrid Prefetching Systems".
[6] Yong Chen et al., "An Adaptive Data Prefetcher for High-Performance Processors", http://www.cs.iit.edu/~scs/psfiles/ccgrid10-adaptpf.pdf
[7] Chia-Lin Yang and Alvin R. Lebeck, "Push vs. Pull: Data Movement for Linked Data Structures".