Evaluation of SimPoint for Specific Architectural Studies

Veynu Narasiman
Aater Suleman
May 3, 2005

Outline
- Motivation
- Project Goals
- Methodology
- Preliminary Results
- Analysis
- Conclusion

Motivation
- Architects need to know the performance improvement of a particular enhancement
- The sooner the better
- There is a need to reduce simulation time
- Accuracy should not be compromised
- SimPoint attempts to solve this problem
- Many architects are hesitant to use SimPoint

SimPoint
- Reduces the number of instructions to be simulated
- Divides the entire application into fixed-length slices and chooses the most representative slices
- Uses the basic block execution behavior of each slice as the selection criterion
- Detailed information can be found at: http://www.simpoint.com

Project Goals
Evaluate the accuracy of SimPoint for:
- Prefetching
  - Compare the actual performance improvement to that estimated using SimPoint
- Branch Prediction
  - Compare the overall actual prediction rates to those estimated using SimPoint
  - Evaluate SimPoint's ability to capture branches that exhibit a certain kind of phase behavior

Methodology
- Use the PIN instrumentation tool
- Simulate the SPECINT suite with the Reference input
- Prefetch Tool
  - 32KB L1-cache, 1MB L2-cache
  - 32-byte cache line size
  - 32-way associative caches with Round Robin replacement
  - Stream prefetcher from PowerPC
  - Measure L2-cache statistics
- Branch Prediction Tool
  - GSHARE predictor with 8192-entry Pattern History Table
  - Measure prediction statistics
- Slice Size: 100 million instructions

Prefetching Data

L2 Hit Ratio (HR) Without Prefetch

Benchmark   Slices   SimSlices   Sim-HR    Real-HR   Percent Difference
bzip        208      10          46.62%    60.94%    -23.49%
gcc         199      10          71.99%    76.04%    -5.33%
gzip        541      10          99.26%    99.67%    -0.41%
perlbmk     329      10          94.38%    94.87%    -0.52%
vortex      1013     10          82.26%    85.65%    -3.96%
vpr         1094     10          95.31%    95.36%    -0.05%

L2 Hit Ratio (HR) With Prefetch

Benchmark   Slices   SimSlices   Sim-HR    Real-HR   Percent Difference
bzip        208      10          59.35%    67.21%    -11.69%
gcc         199      10          74.84%    77.96%    -4.00%
gzip        541      10          99.69%    99.83%    -0.14%
perlbmk     329      10          95.04%    95.54%    -0.53%
vortex      1013     10          83.58%    86.57%    -3.45%
vpr         1094     10          94.18%    94.20%    -0.03%

Improvement in L2 Hit Ratio

Benchmark   Slices   SimSlices   SimPoint   Real      Difference
bzip        208      10          27.30%     10.29%    17.00%
gcc         199      10          3.96%      2.52%     1.45%
gzip        541      10          0.43%      0.16%     0.27%
perlbmk     329      10          0.70%      0.71%     -0.01%
vortex      1013     10          1.60%      1.07%     0.53%
vpr         1094     10          -1.19%     -1.22%    0.02%

Analysis of Bzip

Branch Prediction Data

Branch Prediction Accuracy (BPA)

Benchmark   Slices   SimSlices   Real-BPA   Sim-BPA   Percent Difference
bzip        208      10          91.58%     92.98%    1.53%
gcc         199      10          97.32%     96.92%    -0.41%
gzip        541      10          93.55%     93.01%    -0.58%
mcf         329      10          93.51%     93.31%    -0.21%
perlbmk     1013     10          94.68%     94.94%    0.28%
vpr         1094     10          89.09%     89.13%    0.05%

Branch Misses Per Kilo-Instruction (BMPKI)

Benchmark   Slices   SimSlices   Real-BMPKI   Sim-BMPKI   Percent Difference
bzip        208      10          10.997       8.900       -19.07%
gcc         199      10          5.655        5.777       2.16%
gzip        541      10          12.482       12.905      3.39%
mcf         329      10          14.278       14.491      1.49%
perlbmk     1013     10          7.146        6.758       -5.43%
vpr         1094     10          14.324       14.283      -0.29%

Branch Behavior

    do {
        c1 = block[i1]; c2 = block[i2];
        if (c1 != c2) return (c1 > c2);
        s1 = quadrant[i1]; s2 = quadrant[i2];
        if (s1 != s2) return (s1 > s2);
        i1++; i2++;
        . . .

Conclusion
- SimPoint reduces simulation time
- Prefetching Accuracy
  - Improvement captured for applications with high hit ratios
  - Improvement overestimated for bzip
- Branch Prediction Accuracy
  - Overall branch prediction accuracy captured
  - Individual branch phase behavior to be determined

Questions?
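
Backup: how the derived columns in the prefetching data appear to be computed. The slides do not state the formulas, so the following is a minimal sketch based on an assumption that reproduces the reported numbers to within rounding: Percent Difference is the relative error of the SimPoint estimate against the full run, Improvement is the relative change in L2 hit ratio when prefetching is enabled, and the final Difference column is the gap between the two improvements in percentage points. The function names are hypothetical and chosen only for illustration.

```c
#include <assert.h>

/* Assumed formula: relative error of the SimPoint estimate (sim)
 * against the full-run value (real), expressed in percent. */
double percent_difference(double sim, double real)
{
    assert(real != 0.0);
    return (sim - real) / real * 100.0;
}

/* Assumed formula: relative change in L2 hit ratio when the
 * prefetcher is enabled, expressed in percent. */
double hit_ratio_improvement(double hr_without, double hr_with)
{
    assert(hr_without != 0.0);
    return (hr_with - hr_without) / hr_without * 100.0;
}

/* Assumed formula: SimPoint-estimated improvement minus the actual
 * improvement, expressed in percentage points. */
double improvement_gap(double sim_without, double sim_with,
                       double real_without, double real_with)
{
    return hit_ratio_improvement(sim_without, sim_with)
         - hit_ratio_improvement(real_without, real_with);
}
```

For bzip, calling hit_ratio_improvement(46.62, 59.35) gives roughly 27.3 and hit_ratio_improvement(60.94, 67.21) gives roughly 10.3, which matches the reported overestimate of about 17 percentage points.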