Using Information About Cache Evictions to Measure the Interactions of Application Data Structures
Bryan R. Buck
Jeffrey K. Hollingsworth
Department of Computer Science
University of Maryland
Introduction
Cache behavior information is important
– Processor speed increasing faster than memory
Should relate cache info to data structures
– More useful to programmer in tuning applications
Collect using hardware
– Software techniques, such as simulation, are slow
– In the past, limited hardware support
– Situation is changing, hardware support more common
Outline
Measuring cache misses
– Sampling
Information about evictions
– What is required
– Sampling
Simulation-based study
– The simulator and applications used
– Results
Conclusions and future work
Finding Objects With Most Cache Misses
Handling every cache miss is slow
– Use sampling, requirements:
• Periodic interrupt on cache miss
• Ability to determine miss address
Associate count with each object
– Variable or dynamically allocated memory
Interrupt after every n cache misses
– Obtain address of miss
– Find object containing it and increment count
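The sampling steps above can be sketched in Python. This is a minimal illustration, not the authors' implementation: `ObjectMap`, `MissSampler`, and the example addresses are hypothetical names and values, and the every-n-th-miss interrupt is modeled by a counter in software.

```python
import bisect
from collections import Counter


class ObjectMap:
    """Maps addresses to named objects using sorted, non-overlapping ranges."""

    def __init__(self, objects):
        # objects: iterable of (name, start_address, size_in_bytes)
        self._ranges = sorted((start, start + size, name)
                              for name, start, size in objects)
        self._starts = [r[0] for r in self._ranges]

    def find(self, addr):
        """Return the name of the object containing addr, or None."""
        i = bisect.bisect_right(self._starts, addr) - 1
        if i >= 0:
            start, end, name = self._ranges[i]
            if addr < end:
                return name
        return None


class MissSampler:
    """Counts every n-th cache miss against the object that contains it."""

    def __init__(self, object_map, n):
        self.object_map = object_map
        self.n = n
        self.misses_seen = 0
        self.counts = Counter()

    def on_miss(self, addr):
        # Called on every miss in this sketch; with hardware support only
        # the n-th miss would raise an interrupt and reach this code.
        self.misses_seen += 1
        if self.misses_seen % self.n == 0:
            name = self.object_map.find(addr)
            self.counts[name if name is not None else "<unknown>"] += 1
```

The binary search keeps each lookup logarithmic in the number of objects, which matters because the handler runs inside the measured program.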
Interactions Between Objects
Why does data leave the cache?
– What object caused it to be replaced?
Hardware could provide eviction information
– When miss occurs, save address of evicted data
Not difficult to provide physical address
– Can calculate from tag of evicted cache line
– Information in OS can map physical to virtual
• May be imprecise due to paging
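As an illustration of the tag calculation, the sketch below rebuilds a line's physical address from its tag and the set it occupied. It assumes a physically indexed, physically tagged cache whose line size and set count are powers of two; the function names and the geometry used in any example call are assumptions, not details from the slides.

```python
def _log2(value):
    """Return log2 of a power of two, rejecting other values."""
    if value <= 0 or value & (value - 1):
        raise ValueError("value must be a power of two")
    return value.bit_length() - 1


def split_address(addr, line_size, num_sets):
    """Split a physical address into (tag, set_index, offset)."""
    offset_bits = _log2(line_size)
    index_bits = _log2(num_sets)
    offset = addr & (line_size - 1)
    set_index = (addr >> offset_bits) & (num_sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, set_index, offset


def evicted_line_address(tag, set_index, line_size, num_sets):
    """Rebuild the physical address of the first byte of an evicted line."""
    offset_bits = _log2(line_size)
    index_bits = _log2(num_sets)
    return (tag << (index_bits + offset_bits)) | (set_index << offset_bits)
```

The set index needs no extra storage: the evicted line lived in the same set that the missing address maps to, so only the victim's tag has to be read out.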
Measuring Eviction Information
Use sampling, store more at each miss
– Object that caused the miss
– Object containing the data that was evicted
– Part of code it happened in
Questions
– “Buckets” much smaller, will sampling be accurate?
– Data structure more complicated, how efficient?
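One way to picture the larger record kept at each sampled miss is a counter keyed by the triple listed above. This is only a sketch under assumed names (`EvictionProfile`, `record`, `evictions_of`); the slides do not describe the actual data structure.

```python
from collections import Counter


class EvictionProfile:
    """Counts sampled misses by (miss object, evicted object, code region)."""

    def __init__(self):
        self.counts = Counter()

    def record(self, miss_object, evicted_object, code_region):
        # One call per sampled miss.
        self.counts[(miss_object, evicted_object, code_region)] += 1

    def evictions_of(self, evicted_object):
        """Percent of sampled evictions of one object, by (cause, region)."""
        selected = {(m, r): c
                    for (m, e, r), c in self.counts.items()
                    if e == evicted_object}
        total = sum(selected.values())
        if total == 0:
            return {}
        return {key: 100.0 * c / total for key, c in selected.items()}
```

Because each sample lands in one of many (object, object, region) buckets rather than one per-object bucket, every bucket receives fewer samples, which is the accuracy question raised on this slide.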
Experiments
Implemented in simulation
– Simulator uses ATOM binary rewriting tool
• Instrument load/stores for cache simulation
• Instrument basic blocks for virtual cycle count
• Simulates necessary hardware support
– Miss and eviction sampling runs under simulation
Tested using SPEC95/2000 applications
– su2cor, applu, equake, gzip, mgrid, swim, wupwise, …
– Sampled 1 in 25,000 misses
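To make the simulated hardware support concrete, here is a toy cache model that reports the evicted line on each miss and samples one miss in n. It is not the ATOM-based simulator from the study: the class and function names, the LRU replacement policy, and the default cache geometry are all assumptions chosen for brevity.

```python
from collections import OrderedDict


class Cache:
    """Tiny set-associative cache with LRU replacement (assumed policy)."""

    def __init__(self, line_size=64, num_sets=4, ways=2):
        self.line_size = line_size
        self.num_sets = num_sets
        self.ways = ways
        # One ordered dict per set; the oldest entry is least recently used.
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def access(self, addr):
        """Return (hit, evicted_line_address or None) for one load/store."""
        line = addr // self.line_size
        lines = self.sets[line % self.num_sets]
        if line in lines:
            lines.move_to_end(line)  # mark as most recently used
            return True, None
        evicted = None
        if len(lines) >= self.ways:
            victim, _ = lines.popitem(last=False)  # drop the LRU line
            evicted = victim * self.line_size
        lines[line] = True
        return False, evicted


def sample_misses(cache, addresses, n):
    """Return (miss_address, evicted_address) for every n-th miss."""
    samples = []
    misses = 0
    for addr in addresses:
        hit, evicted = cache.access(addr)
        if not hit:
            misses += 1
            if misses % n == 0:
                samples.append((addr, evicted))
    return samples
```

With `n = 25000` this mirrors the sampling rate quoted above; an evicted address of `None` means the miss filled an empty way.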
Accuracy of Sampling Cache Misses
Application  Variable   Actual Rank  Actual %  Sample Rank  Sample %
su2cor       U          1            60.5      1            61.1
su2cor       R-loops    2            5.3       4            4.6
su2cor       S          3            5.0       2            5.3
su2cor       W2-intact  4            4.2       3            5.3
su2cor       W2-sweep   5            4.1       6            4.0
swim         UNEW       1            10.3      1            10.6
swim         PNEW       2            10.3      3            9.8
swim         VNEW       3            10.3      2            10.0
swim         CU         4            7.0       6            7.1
swim         H          5            6.9       9            6.9
Eviction Results: mgrid
[Figure: chart of eviction results for the mgrid objects U, V, and R. Legend entries: U, V, R, other, U sampled, V sampled, R sampled. Axis ticks run from 0 to 100.]
Evictions By Code Region: mgrid
% of total evictions of U by U, V, and R in each line of code.
Variable  Function  Line  Actual Rank  Actual %  Sample Rank  Sample %
U         resid     214   1            40.7      1            42.1
U         interp    302   2            5.2       4            4.7
U         interp    312   3            5.2       3            5.0
U         interp    290   4            4.7       5            4.7
U         interp    281   5            4.7       2            5.0
V         resid     200   1            1.6       1            2.1
R         psinv     174   1            18.8      1            18.6
R         resid     200   2            2.3       2            2.1
Cache Misses Due to Instrumentation
[Figure: chart with axis ticks at 0.01, 0.1, 1, 10, and 100. Series: sample 1 in 250, sample 1 in 2,500, sample 1 in 25,000. Applications: su2cor, applu, equake, gzip, mgrid, swim, wupwise.]
Instrumentation Overhead
[Figure: chart with axis ticks at 0.01, 0.1, 1, 10, and 100. Series: sample 1 in 250, sample 1 in 2,500, sample 1 in 25,000. Applications: su2cor, applu, equake, gzip, mgrid, swim, wupwise.]
Simulation Overhead
[Figure: chart with axis ticks from 0 to 120. Series: cache simulation, load/store, cycle count. Applications: tomcatv, swim, su2cor, mgrid, applu, compress, ijpeg.]
Using Dyninst
Better knowledge about objects
– Local variables
– FORTRAN common blocks
Can instrument memory allocation routines
– Track objects created/destroyed
Measure by code using hardware counters
– Save counts at significant points, like Paradyn
• Function entries/exits/calls
– Turn counting on & off around areas of interest
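The idea of saving counts at function entries and exits can be sketched as counter snapshots around a region. This is a hedged illustration only: `RegionCounter` is a hypothetical helper, and `read_counter` stands in for whatever call reads a hardware miss counter on a given platform. It is not Dyninst or Paradyn code.

```python
from collections import defaultdict
from contextlib import contextmanager


class RegionCounter:
    """Attributes counter increments to named code regions."""

    def __init__(self, read_counter):
        # read_counter: callable returning the current counter value
        # (hypothetical stand-in for a hardware counter read).
        self.read_counter = read_counter
        self.totals = defaultdict(int)

    @contextmanager
    def region(self, name):
        start = self.read_counter()  # snapshot at region entry
        try:
            yield
        finally:
            # Snapshot at region exit; the difference belongs to the region.
            self.totals[name] += self.read_counter() - start
```

Nested regions are counted inclusively in this sketch: misses inside an inner region are also charged to the enclosing one.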
Instrumenting Loads and Stores
New BPatch_point type
– BPatch_loadStore
– New method, isStore(), returns true or false
New expression type
– BPatch_effectiveAddr
• Only valid at BPatch_loadStore points
• Returns the effective address being accessed
Future Work
Run miss sampling on real hardware
– IBM POWER3, POWER4
– Use Dyninst
Visualization tool
– Save all data in compact format tool understands
• For tested applications, largest file is 15MB
– Filter by objects, parts of code
– Compare data from different runs
Use results to optimize applications
Future Work Continued
More uses of eviction information
– For estimating portion of object in cache
• Use difference of misses and evictions
– For finding lost opportunities for reuse
• Track evicted data until next load
• Measure interval in time, cache misses, etc.
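Both proposed uses can be sketched briefly. The residency estimate assumes that each miss on an object brings one of its lines into the cache, that each eviction removes one, and that none of the object is cached at the start; the reuse tracker measures the interval in cache misses. All names and the scaling by the sampling period n are assumptions, since this is described only as future work.

```python
def estimated_resident_fraction(sampled_misses, sampled_evictions,
                                n, line_size, object_size):
    """Estimate the fraction of an object currently in the cache.

    Sampled counts are scaled by the sampling period n; the result is
    clamped to [0, 1] because sampling error can push it out of range.
    """
    lines = max(0, (sampled_misses - sampled_evictions) * n)
    return min(1.0, lines * line_size / object_size)


class ReuseTracker:
    """Records how many misses pass between a line's eviction and reload."""

    def __init__(self):
        self.miss_count = 0
        self.evicted_at = {}   # line address -> miss count at eviction
        self.intervals = []    # eviction-to-reload distances, in misses

    def on_miss(self, loaded_line, evicted_line):
        self.miss_count += 1
        if loaded_line in self.evicted_at:
            # The line was evicted earlier and is now being loaded again.
            evicted_when = self.evicted_at.pop(loaded_line)
            self.intervals.append(self.miss_count - evicted_when)
        if evicted_line is not None:
            self.evicted_at[evicted_line] = self.miss_count
```

Short intervals point to data that was evicted just before it was needed again, which is the kind of lost reuse opportunity the slide refers to.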
Conclusions
Features are appearing in new processors
– Possible to implement cache miss sampling now
– Much more efficient than software simulation
Eviction information in hardware practical
– Sampling is efficient and accurate
Could use Dyninst
– For simulation or for hardware