
L1 Data Cache Thrashing: Simple Detection, Correction
Demo on the C64x+ Simulator
Aim of the demo: Show use of the L1D conflict miss event on
C64x+ simulators to detect cache thrashing.
Steps to observe the data cache thrashing problem:
• Build project datacache_simple_thrash
• Select target configuration C6455 device cycle accurate simulator, Little Endian
• Load the program on the target
• Launch the profiling tool and select “profile all functions on a simulation device for total cycles”
• Then click Properties and browse and select the event L1D->Miss->Conflict event
• Click Properties again and browse and select the event L1D->Miss->Summary event
• Run the program to completion (takes a few seconds to run)
• View the function profiler output, select CPU.cycle event and observe that it reports an inclusive total of 3803467 CPU cycles.
• Select the L1D miss summary event to view, and observe that it reports 196763 occurrences of misses.
• Select the L1D conflict miss event to view, and observe that it reports 96896 occurrences of conflict misses.
We find that nearly 50% of the cache misses are due to cache conflicts and this suggests
cache thrashing.
Problem analysis, resolution
Steps to analyze closely:
• Set program breakpoint at line 33 of main.c
• Reset the target and re-run the test.
• Launch the Cache Tag RAM viewer
• At each iteration when the breakpoint is hit, note that all the 3 arrays x, y, and z map to the same set resulting in y always being evicted.
1. This is because L1D is 2-way set associative: at any point in time, each set can
hold only 2 lines. In our case, since entries of all 3 arrays map to the same set
each iteration, at least one eviction occurs every iteration. Since the loop body
is a z+=x*y statement, which touches all three arrays, this causes multiple
evictions per iteration.
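The mapping described above can be sketched in C. This is a minimal illustration, not the demo's code: the L1D geometry (32 KB, 64-byte lines, 2 ways, hence 256 sets), the LRU replacement model, and the names l1d_set_index and count_misses_one_set are assumptions introduced here. The second function models a single 2-way set and counts misses for a sequence of line identifiers that all map to it.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed L1D geometry (not stated in the demo): 32 KB, 2-way, 64-byte lines. */
#define L1D_LINE_BYTES  64u
#define L1D_WAYS        2u
#define L1D_CACHE_BYTES (32u * 1024u)
#define L1D_NUM_SETS    (L1D_CACHE_BYTES / (L1D_LINE_BYTES * L1D_WAYS))

/* Set that a byte address maps to: line number modulo the number of sets. */
unsigned l1d_set_index(uint32_t addr)
{
    return (unsigned)((addr / L1D_LINE_BYTES) % L1D_NUM_SETS);
}

/*
 * Simplified model of ONE 2-way set with LRU replacement (an assumption).
 * line_ids[] holds non-negative identifiers of lines that all map to this
 * set, in access order. Returns the number of misses.
 */
unsigned count_misses_one_set(const int *line_ids, unsigned n)
{
    int mru = -1;   /* most recently used way, -1 means empty */
    int lru = -1;   /* least recently used way, -1 means empty */
    unsigned misses = 0;
    unsigned i;

    assert(line_ids != 0 || n == 0);
    for (i = 0; i < n; i++) {
        int id = line_ids[i];
        if (id == mru) {
            continue;               /* hit in the MRU way */
        }
        if (id == lru) {            /* hit in the LRU way: promote it */
            lru = mru;
            mru = id;
            continue;
        }
        misses++;                   /* miss: the LRU line is evicted */
        lru = mru;
        mru = id;
    }
    return misses;
}
```

With three lines cycling through one set (x, y, z, x, y, z), every access in this model misses; with only two lines sharing the set, only the two initial cold misses remain.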
Steps to fix the problem:
Approach: We need to prevent the i-th entries of arrays x,y and z from
all mapping to the same cache set.
• Add a cache line sized pad between any 2 of these arrays. This ensures that each iteration, only two array entries map to the same set (which is okay since L1D is two way set associative).
• In our source code, uncomment line 7 of main.c to achieve desired effect.
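The effect of the pad can be sketched with a hypothetical layout. The actual contents of main.c are not shown here, so the array size (16 KB each), the 64-byte line, the 256 sets, and the names padded_layout and padded_sets are all assumptions. The struct base is assumed to map to set 0, so member offsets stand in for addresses.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed geometry and sizes (hypothetical, not taken from main.c). */
#define LINE_BYTES   64u
#define NUM_SETS     256u
#define ARRAY_BYTES  (16u * 1024u)   /* one cache way: LINE_BYTES * NUM_SETS */

/* x, y and z with a single cache-line-sized pad between x and y. */
struct padded_layout {
    char x[ARRAY_BYTES];
    char pad[LINE_BYTES];            /* shifts y and z by exactly one set */
    char y[ARRAY_BYTES];
    char z[ARRAY_BYTES];
};

/* Set index of a byte offset, relative to a base that maps to set 0. */
static unsigned set_of_offset(size_t offset)
{
    return (unsigned)((offset / LINE_BYTES) % NUM_SETS);
}

/* Fills out[0..2] with the sets that x[0], y[0] and z[0] map to. */
void padded_sets(unsigned out[3])
{
    assert(out != 0);
    out[0] = set_of_offset(offsetof(struct padded_layout, x));
    out[1] = set_of_offset(offsetof(struct padded_layout, y));
    out[2] = set_of_offset(offsetof(struct padded_layout, z));
}
```

Under these assumptions x starts in set 0 while y and z both start in set 1, so no set is asked to hold more than two of the three lines touched in an iteration.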
• Re-build, and re-run the test case to observe improvements:
Observe that now all three entries (x, y and z) are present in the cache, and that x is
mapped to set 0 while y and z are mapped to set 1.
1. CPU cycle count down to 2446371 cycles
2. L1D miss summary event down to 100247 misses
3. L1D conflict miss event down to just 380 events!
We achieved a roughly 255X reduction in conflict misses (96896 down to 380)!
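The improvement figures can be checked with a small helper. This is only a sketch of the arithmetic on the numbers reported above; the function name reduction_factor is introduced here for illustration.

```c
#include <assert.h>

/* Ratio of a "before" count to an "after" count, e.g. misses or cycles. */
double reduction_factor(unsigned long before, unsigned long after)
{
    assert(after > 0);
    return (double)before / (double)after;
}
```

Applied to the profiler readings, reduction_factor(96896, 380) is about 255 for conflict misses, reduction_factor(196763, 100247) is about 1.96 for total L1D misses, and reduction_factor(3803467, 2446371) is about 1.55 for CPU cycles.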