Evaluation of SimPoint for Specific Architectural Studies
Veynu Narasiman
Aater Suleman
May 3, 2005
Outline
  Motivation
  Project Goals
  Methodology
  Preliminary Results
  Analysis
  Conclusion
Motivation
  Architects need to know the performance improvement of a particular enhancement
  The sooner the better
  There is a need to reduce simulation time
  Accuracy should not be compromised
  SimPoint attempts to solve this problem
  Many architects are hesitant to use SimPoint
SimPoint
  Reduces the number of instructions to be simulated
  Divides the entire application into fixed-length slices and chooses the most representative slices
  Uses the basic block execution behavior of each slice as the selection criterion
  Detailed information can be found at: http://www.simpoint.com
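The selection step can be illustrated with a minimal C sketch. It assumes each slice has already been summarized as a basic block vector (per-block execution counts, normalized to sum to 1) and that a clustering step (SimPoint uses k-means) has assigned each slice a cluster label. Given those inputs, it picks the slice closest to a cluster's centroid as that cluster's representative and reports the fraction of slices the cluster covers as its weight. The names `pick_representative`, `bbv_dist2`, and `MAX_DIM` are hypothetical and are not part of the SimPoint tool.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DIM 64  /* hypothetical upper bound on vector length */

/* Squared Euclidean distance between two basic block vectors. */
static double bbv_dist2(const double *a, const double *b, size_t dim) {
    double d = 0.0;
    for (size_t j = 0; j < dim; j++)
        d += (a[j] - b[j]) * (a[j] - b[j]);
    return d;
}

/*
 * bbv:    n vectors of length dim, stored row-major (one row per slice)
 * label:  cluster id assigned to each slice by a prior clustering step
 * Returns the index of the slice in `cluster` closest to that cluster's
 * centroid, or -1 if the cluster is empty. *weight receives the fraction
 * of all slices that belong to the cluster.
 */
static int pick_representative(const double *bbv, const int *label,
                               size_t n, size_t dim, int cluster,
                               double *weight) {
    double centroid[MAX_DIM] = {0.0};
    size_t members = 0;
    assert(dim <= MAX_DIM);

    /* Accumulate the centroid of the requested cluster. */
    for (size_t i = 0; i < n; i++) {
        if (label[i] != cluster) continue;
        for (size_t j = 0; j < dim; j++)
            centroid[j] += bbv[i * dim + j];
        members++;
    }
    *weight = (n > 0) ? (double)members / (double)n : 0.0;
    if (members == 0) return -1;
    for (size_t j = 0; j < dim; j++)
        centroid[j] /= (double)members;

    /* The representative is the member slice nearest the centroid. */
    int best = -1;
    double best_d = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (label[i] != cluster) continue;
        double d = bbv_dist2(&bbv[i * dim], centroid, dim);
        if (best < 0 || d < best_d) { best = (int)i; best_d = d; }
    }
    return best;
}
```

Calling `pick_representative` once per cluster yields one slice to simulate per cluster, together with the weight to apply to its measurements.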
Project Goals
  Evaluate the accuracy of SimPoint for:
    Prefetching
      Compare the actual performance improvement to that estimated using SimPoint
    Branch Prediction
      Compare the overall actual prediction rates to those estimated using SimPoint
      Evaluate SimPoint’s ability to capture branches that exhibit a certain kind of phase behavior
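One way to form the SimPoint estimate used in these comparisons is a weighted average of the metric measured on each simulated slice. The sketch below is an assumption about that step, not something stated in the slides; `weighted_estimate` and its parameters are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Estimate a whole-program metric (for example a hit ratio or a
 * prediction accuracy) from per-slice measurements.
 * value[i]:  metric measured on simulated slice i
 * weight[i]: fraction of the program that slice i represents
 * The weights are renormalized, so they need not sum exactly to 1.
 */
static double weighted_estimate(const double *value, const double *weight,
                                size_t n) {
    double sum = 0.0, total = 0.0;
    for (size_t i = 0; i < n; i++) {
        assert(weight[i] >= 0.0);
        sum += value[i] * weight[i];
        total += weight[i];
    }
    return (total > 0.0) ? sum / total : 0.0;
}
```

The estimate returned here would then be compared against the same metric measured over the full run.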
Methodology
  Use PIN Instrumentation tool
  Simulate SPECINT suite with Reference input
  Prefetch Tool
    32KB L1-cache, 1MB L2-cache
    32-byte cache line size
    32-way associative caches with Round Robin Replacement
    Stream Prefetcher from PowerPC
    Measure L2-cache statistics
  Branch Prediction Tool
    GSHARE predictor with 8192-entry Pattern History Table
    Measure prediction statistics
  Slice Size: 100 million instructions
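The branch prediction tool's predictor can be sketched as follows. Only the 8192-entry Pattern History Table comes from the slides; the 13-bit global history, the 2-bit saturating counters, the weakly-not-taken initial state, and all identifiers (`gshare_t`, `gshare_access`, and so on) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PHT_ENTRIES 8192u   /* Pattern History Table size from the slides */
#define HIST_BITS   13      /* assumption: log2(8192) bits of global history */

typedef struct {
    uint8_t  pht[PHT_ENTRIES];   /* assumption: 2-bit saturating counters */
    uint32_t history;            /* global branch history register */
    uint64_t branches;
    uint64_t mispredicts;
} gshare_t;

static void gshare_init(gshare_t *g) {
    memset(g, 0, sizeof *g);
    memset(g->pht, 1, sizeof g->pht);   /* start weakly not-taken */
}

/* Predict one conditional branch, then train on its real outcome.
 * Returns 1 if the prediction was correct, 0 otherwise. */
static int gshare_access(gshare_t *g, uint64_t pc, int taken) {
    /* gshare index: branch address XOR global history. */
    uint32_t idx = ((uint32_t)pc ^ g->history) & (PHT_ENTRIES - 1u);
    int predicted_taken = g->pht[idx] >= 2;
    int actual_taken = (taken != 0);

    /* Train the 2-bit counter toward the actual outcome. */
    if (actual_taken) { if (g->pht[idx] < 3) g->pht[idx]++; }
    else              { if (g->pht[idx] > 0) g->pht[idx]--; }

    /* Shift the outcome into the history register. */
    g->history = ((g->history << 1) | (uint32_t)actual_taken)
                 & ((1u << HIST_BITS) - 1u);
    g->branches++;
    if (predicted_taken != actual_taken) g->mispredicts++;
    return predicted_taken == actual_taken;
}

/* Prediction accuracy in percent over all branches seen so far. */
static double gshare_accuracy(const gshare_t *g) {
    if (g->branches == 0) return 0.0;
    return 100.0 * (double)(g->branches - g->mispredicts)
                 / (double)g->branches;
}
```

An instrumentation tool would call `gshare_access` once per executed conditional branch and read `branches` and `mispredicts` at the end of each slice.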
Prefetching Data

L2 Hit Ratio (HR) Without Prefetch
Benchmark   Slices   SimSlices   Sim-HR    Real-HR   Percent Difference
bzip           208          10   46.62%    60.94%    -23.49%
gcc            199          10   71.99%    76.04%     -5.33%
gzip           541          10   99.26%    99.67%     -0.41%
perlbmk        329          10   94.38%    94.87%     -0.52%
vortex        1013          10   82.26%    85.65%     -3.96%
vpr           1094          10   95.31%    95.36%     -0.05%

L2 Hit Ratio (HR) With Prefetch
Benchmark   Slices   SimSlices   Sim-HR    Real-HR   Percent Difference
bzip           208          10   59.35%    67.21%    -11.69%
gcc            199          10   74.84%    77.96%     -4.00%
gzip           541          10   99.69%    99.83%     -0.14%
perlbmk        329          10   95.04%    95.54%     -0.53%
vortex        1013          10   83.58%    86.57%     -3.45%
vpr           1094          10   94.18%    94.20%     -0.03%

Improvement in L2 Hit Ratio
Benchmark   Slices   SimSlices   SimPoint   Real      Difference
bzip           208          10   27.30%     10.29%    17.00%
gcc            199          10    3.96%      2.52%     1.45%
gzip           541          10    0.43%      0.16%     0.27%
perlbmk        329          10    0.70%      0.71%    -0.01%
vortex        1013          10    1.60%      1.07%     0.53%
vpr           1094          10   -1.19%     -1.22%     0.02%
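The slides do not state how the derived columns of the prefetching data are computed. The formulas in the sketch below are an assumption that reproduces the tabulated values to within rounding: a relative difference between the SimPoint and full-run hit ratios, and a relative improvement in hit ratio when the prefetcher is enabled. The function names are hypothetical.

```c
#include <assert.h>

/* Relative difference of the SimPoint estimate from the full run, in percent. */
static double percent_difference(double sim, double real) {
    assert(real != 0.0);
    return 100.0 * (sim - real) / real;
}

/* Relative improvement in hit ratio from enabling the prefetcher, in percent. */
static double hit_ratio_improvement(double hr_without, double hr_with) {
    assert(hr_without != 0.0);
    return 100.0 * (hr_with - hr_without) / hr_without;
}
```

For bzip, `hit_ratio_improvement(46.62, 59.35)` is roughly 27.3 for the SimPoint run and `hit_ratio_improvement(60.94, 67.21)` is roughly 10.3 for the full run, so the SimPoint estimate overstates the benefit by about 17 percentage points.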
Analysis of Bzip
Branch Prediction Data

Branch Prediction Accuracy
Benchmark   Slices   SimSlices   Real-BPA   Sim-BPA   Percent Difference
bzip           208          10   91.58%     92.98%     1.53%
gcc            199          10   97.32%     96.92%    -0.41%
gzip           541          10   93.55%     93.01%    -0.58%
mcf            329          10   93.51%     93.31%    -0.21%
perlbmk       1013          10   94.68%     94.94%     0.28%
vpr           1094          10   89.09%     89.13%     0.05%

Branch Miss PKI
Benchmark   Slices   SimSlices   Real-BMPKI   Sim-BMPKI   Percent Difference
bzip           208          10   10.997        8.900      -19.07%
gcc            199          10    5.655        5.777        2.16%
gzip           541          10   12.482       12.905        3.39%
mcf            329          10   14.278       14.491        1.49%
perlbmk       1013          10    7.146        6.758       -5.43%
vpr           1094          10   14.324       14.283       -0.29%
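Branch Miss PKI counts mispredictions per thousand executed instructions. The sketch below shows that calculation and why a small relative difference in accuracy corresponds to a much larger relative difference in mispredictions. The function names and the raw counts in the usage note are hypothetical; only the accuracy figures come from the data above.

```c
#include <assert.h>
#include <stdint.h>

/* Branch mispredictions per 1000 executed instructions. */
static double bmpki(uint64_t mispredicts, uint64_t instructions) {
    assert(instructions > 0);
    return 1000.0 * (double)mispredicts / (double)instructions;
}

/* Relative change in misprediction rate implied by two accuracies,
 * both given in percent; the result is also in percent. */
static double miss_rate_change(double real_accuracy, double sim_accuracy) {
    double real_miss = 100.0 - real_accuracy;
    double sim_miss  = 100.0 - sim_accuracy;
    assert(real_miss > 0.0);
    return 100.0 * (sim_miss - real_miss) / real_miss;
}
```

As an illustration, a 100-million-instruction slice with 1,099,700 mispredictions has a BMPKI of 10.997. For bzip, `miss_rate_change(91.58, 92.98)` is about -16.6: the 1.53% difference in accuracy is a much larger relative difference in mispredictions, the same order as the -19.07% BMPKI difference.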
Branch Behavior
  do {
      c1 = block[i1];
      c2 = block[i2];
      if (c1 != c2) return (c1 > c2);
      s1 = quadrant[i1];
      s2 = quadrant[i2];
      if (s1 != s2) return (s1 > s2);
      i1++; i2++;
      ...
Conclusion
  SimPoint reduces simulation time
  Prefetching Accuracy
    Improvement captured for applications with high hit ratios
    Improvement overestimated for bzip
  Branch Prediction Accuracy
    Overall branch prediction accuracy captured
    Individual branch phase behavior to be determined
Questions?