Using Information About Cache Evictions to Measure the Interactions of Application Data Structures

Bryan R. Buck

Jeffrey K. Hollingsworth

University of Maryland

Department of Computer Science

Introduction

Cache behavior information is important

– Processor speed increasing faster than memory

Should relate cache info to data structures

– More useful to programmer in tuning applications

Collect using hardware

– Software techniques, such as simulation, are slow

– In the past, limited hardware support

– Situation is changing, hardware support more common

Outline

Measuring cache misses

– Sampling

Information about evictions

– What is required

– Sampling

Simulation-based study

– The simulator and applications used

– Results

Conclusions and future work

Finding Objects With Most Cache Misses

Handling every cache miss is slow

– Use sampling, requirements:

• Periodic interrupt on cache miss

• Ability to determine miss address

Associate count with each object

– Variable or dynamically allocated memory

Interrupt after every n cache misses

– Obtain address of miss

– Find object containing it and increment count
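The lookup step above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `MissProfile` class, its method names, and the use of an ordered map keyed by base address are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical record for one program object (a variable or a heap block).
struct ObjectInfo {
    std::string name;
    std::uint64_t size;          // size in bytes
    std::uint64_t sampledMisses; // incremented once per sampled miss
};

// Hypothetical profile: maps sampled miss addresses to objects.
class MissProfile {
public:
    // Register an object occupying the address range [base, base + size).
    void addObject(std::uint64_t base, std::uint64_t size, const std::string& name) {
        assert(size > 0);
        objects_[base] = ObjectInfo{name, size, 0};
    }

    // Intended to be called from the interrupt raised after every n-th miss.
    // Returns true if the miss address fell inside a registered object.
    bool recordSampledMiss(std::uint64_t missAddr) {
        auto it = objects_.upper_bound(missAddr); // first object starting after missAddr
        if (it == objects_.begin()) return false; // no object starts at or before missAddr
        --it;                                     // last object starting at or before missAddr
        if (missAddr - it->first >= it->second.size) return false; // past the object's end
        ++it->second.sampledMisses;
        return true;
    }

    // Sampled miss count for the object starting at base; 0 if unknown.
    std::uint64_t count(std::uint64_t base) const {
        auto it = objects_.find(base);
        return it == objects_.end() ? 0 : it->second.sampledMisses;
    }

private:
    std::map<std::uint64_t, ObjectInfo> objects_; // keyed by base address
};
```

An ordered map keeps the lookup logarithmic in the number of objects, which matters because the handler runs inside the measured program.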

Interactions Between Objects

Why does data leave the cache?

– What object caused it to be replaced?

Hardware could provide eviction information

– When miss occurs, save address of evicted data

Not difficult to provide physical address

– Can calculate from tag of evicted cache line

– Information in OS can map physical to virtual

• May be imprecise due to paging
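As a rough illustration of the tag calculation, the evicted line's physical address can be rebuilt from its tag and the set it occupied (the same set the incoming line maps to). The `CacheGeometry` struct and function names below are hypothetical, and the sketch assumes a physically indexed cache with power-of-two line size and set count.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical cache geometry; both values must be powers of two.
struct CacheGeometry {
    unsigned lineBytes; // bytes per cache line
    unsigned numSets;   // number of sets
};

// Exact base-2 logarithm of a power of two.
inline unsigned log2Exact(unsigned v) {
    assert(v != 0 && (v & (v - 1)) == 0);
    unsigned bits = 0;
    while ((v >>= 1) != 0) ++bits;
    return bits;
}

// Physical address of the first byte of the evicted line.
// Address layout assumed here: [ tag | set index | line offset ].
inline std::uint64_t evictedLineAddress(const CacheGeometry& g,
                                        std::uint64_t tag,
                                        std::uint64_t setIndex) {
    assert(setIndex < g.numSets);
    const unsigned offsetBits = log2Exact(g.lineBytes);
    const unsigned indexBits = log2Exact(g.numSets);
    return (tag << (offsetBits + indexBits)) | (setIndex << offsetBits);
}
```

Translating the result back to a virtual address still needs the operating system's page mappings, which is where the imprecision due to paging comes in.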

Measuring Eviction Information

Use sampling, store more at each miss

– Object that caused the miss

– Object containing the data that was evicted

– Part of code it happened in

Questions

– “Buckets” much smaller, will sampling be accurate?

– Data structure more complicated, how efficient?
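One possible shape for the more complicated data structure is a table of counters keyed by the three items stored at each sample. This is only a sketch: the key layout, the use of strings for object names, and a source line number for the code region are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <tuple>

// One bucket: (object that missed, object that lost a line, code location).
// All three identifiers are hypothetical; a real tool might use object IDs
// and source lines resolved from the sampled addresses.
using EvictionKey = std::tuple<std::string, std::string, int>;

class EvictionProfile {
public:
    // Record one sampled miss together with the eviction it caused.
    void recordSample(const std::string& missObj,
                      const std::string& evictedObj,
                      int codeLine) {
        ++buckets_[EvictionKey{missObj, evictedObj, codeLine}];
        ++total_;
    }

    // Percentage of all samples that landed in one bucket.
    double percent(const std::string& missObj,
                   const std::string& evictedObj,
                   int codeLine) const {
        if (total_ == 0) return 0.0;
        auto it = buckets_.find(EvictionKey{missObj, evictedObj, codeLine});
        const std::uint64_t n = (it == buckets_.end()) ? 0 : it->second;
        return 100.0 * static_cast<double>(n) / static_cast<double>(total_);
    }

    std::uint64_t totalSamples() const { return total_; }

private:
    std::map<EvictionKey, std::uint64_t> buckets_;
    std::uint64_t total_ = 0;
};
```

Because each sample is split across three dimensions, individual buckets receive far fewer samples than per-object miss counts do, which is the accuracy concern raised in the first question.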

Experiments

Implemented in simulation

– Simulator uses ATOM binary rewriting tool

• Instrument load/stores for cache simulation

• Instrument basic blocks for virtual cycle count

• Simulates necessary hardware support

– Miss and eviction sampling runs under simulation

Tested using SPEC95/2000 applications

– su2cor, applu, equake, gzip, mgrid, swim, wupwise, …

– Sampled 1 in 25,000 misses
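To make the simulated hardware support concrete, here is a minimal sketch of a cache model that reports the evicted address on each miss, plus a counter that fires once every n misses. The direct-mapped organization and all class names are assumptions; the slides do not describe the simulated cache's organization, and this is not the ATOM-based simulator itself.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Result of one simulated memory access.
struct AccessResult {
    bool miss = false;             // true if the line was not in the cache
    bool evicted = false;          // true if a valid line was replaced
    std::uint64_t evictedAddr = 0; // address of the replaced line, if any
};

// Minimal direct-mapped cache model (an assumption made for this sketch).
class DirectMappedCache {
public:
    DirectMappedCache(std::uint64_t lineBytes, std::uint64_t numLines)
        : lineBytes_(lineBytes), lines_(numLines) {
        assert(lineBytes > 0 && numLines > 0);
    }

    AccessResult access(std::uint64_t addr) {
        const std::uint64_t lineNo = addr / lineBytes_;
        Line& slot = lines_[lineNo % lines_.size()];
        AccessResult r;
        if (slot.valid && slot.lineNo == lineNo) return r; // hit
        r.miss = true;
        if (slot.valid) {                                  // a resident line is replaced
            r.evicted = true;
            r.evictedAddr = slot.lineNo * lineBytes_;
        }
        slot.valid = true;
        slot.lineNo = lineNo;
        return r;
    }

private:
    struct Line {
        bool valid = false;
        std::uint64_t lineNo = 0;
    };
    std::uint64_t lineBytes_;
    std::vector<Line> lines_;
};

// Fires once every `interval` misses, like a miss counter that
// raises an interrupt on overflow.
class MissSampler {
public:
    explicit MissSampler(std::uint64_t interval) : interval_(interval) {
        assert(interval > 0);
    }
    bool onMiss() {
        if (++sinceLast_ < interval_) return false;
        sinceLast_ = 0;
        return true;
    }

private:
    std::uint64_t interval_;
    std::uint64_t sinceLast_ = 0;
};
```

In use, every instrumented load or store would call `access`, and only when `onMiss` returns true would the miss address and `evictedAddr` be handed to the profiling code; with an interval of 25,000 that handler runs rarely.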

Accuracy of Sampling Cache Misses

Application  Variable   Actual Rank  Actual %  Sample Rank  Sample %
su2cor       U          1            60.5      1            61.1
su2cor       R-loops    2            5.3       4            4.6
su2cor       S          3            5.0       2            5.3
su2cor       W2-intact  4            4.2       3            5.3
su2cor       W2-sweep   5            4.1       6            4.0
swim         UNEW       1            10.3      1            10.6
swim         PNEW       2            10.3      3            9.8
swim         VNEW       3            10.3      2            10.0
swim         CU         4            7.0       6            7.1
swim         H          5            6.9       9            6.9

Eviction Results: mgrid

[Figure: bar chart, vertical axis 0 to 100. Labels recovered: U, V, R, other. Legend: U, U sampled, V, V sampled, R, R sampled.]

Evictions By Code Region: mgrid

% of total evictions of U by U, V, and R in each line of code.

Variable  Function  Line  Actual Rank  Actual %  Sample Rank  Sample %
U         resid     214   1            40.7      1            42.1
U         interp    302   2            5.2       4            4.7
U         interp    312   3            5.2       3            5.0
U         interp    290   4            4.7       5            4.7
U         interp    281   5            4.7       2            5.0
V         resid     200   1            1.6       1            2.1
R         psinv     174   1            18.8      1            18.6
R         resid     200   2            2.3       2            2.1

Cache Misses Due to Instrumentation

[Figure: logarithmic vertical axis from 0.01 to 100. Series: sample 1 in 250, sample 1 in 2,500, sample 1 in 25,000. Applications: su2cor, applu, equake, gzip, mgrid, swim, wupwise.]

Instrumentation Overhead

[Figure: logarithmic vertical axis from 0.01 to 100. Series: sample 1 in 250, sample 1 in 2,500, sample 1 in 25,000. Applications: su2cor, applu, equake, gzip, mgrid, swim, wupwise.]

Simulation Overhead

[Figure: vertical axis 0 to 120. Series: cache simulation, load/store, cycle count. Applications: tomcatv, swim, su2cor, mgrid, applu, compress, ijpeg.]

Using Dyninst

Better knowledge about objects

– Local variables

– FORTRAN common blocks

Can instrument memory allocation routines

– Track objects created/destroyed

Measure by code using hardware counters

– Save counts at significant points, like Paradyn

• Function entries/exits/calls

– Turn counting on & off around areas of interest

Instrumenting Loads and Stores

New BPatch_point type

– BPatch_loadStore

– New method, isStore(), returns true or false

New expression type

– BPatch_effectiveAddr

• Only valid at BPatch_loadStore points

• Returns the effective address being accessed

Future Work

Run miss sampling on real hardware

– IBM POWER3, POWER4

– Use Dyninst

Visualization tool

– Save all data in compact format tool understands

• For tested applications, largest file is 15MB

– Filter by objects, parts of code

– Compare data from different runs

Use results to optimize applications

Future Work Continued

More uses of eviction information

– For estimating portion of object in cache

• Use difference of misses and evictions

– For finding lost opportunities for reuse

• Track evicted data until next load

• Measure interval in time, cache misses, etc.
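The first idea, estimating how much of an object is in the cache from the difference of misses and evictions, might be sketched as below. This is one interpretation of the slide, not a method the authors have specified: it assumes every miss on the object brings in exactly one line, every eviction removes exactly one, both are sampled at the same interval, and nothing else (prefetching, invalidation) moves lines in or out. The function name and parameters are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Estimated fraction of an object's cache lines currently resident,
// derived from sampled miss and eviction counts for that object.
inline double estimatedResidentFraction(std::uint64_t sampledMisses,
                                        std::uint64_t sampledEvictions,
                                        std::uint64_t sampleInterval,
                                        std::uint64_t objectBytes,
                                        std::uint64_t lineBytes) {
    assert(sampleInterval > 0 && objectBytes > 0 && lineBytes > 0);
    // Number of cache lines the object spans, rounded up.
    const std::uint64_t objectLines = (objectBytes + lineBytes - 1) / lineBytes;
    // Sampling noise can make evictions exceed misses; clamp to empty.
    if (sampledEvictions >= sampledMisses) return 0.0;
    // Scale the sampled difference back up by the sampling interval.
    const double residentLines =
        static_cast<double>(sampledMisses - sampledEvictions) *
        static_cast<double>(sampleInterval);
    // The estimate cannot exceed the whole object.
    if (residentLines >= static_cast<double>(objectLines)) return 1.0;
    return residentLines / static_cast<double>(objectLines);
}
```

The clamping at both ends reflects that the scaled sample counts are estimates and can overshoot the physically possible range.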

Conclusions

Features are appearing in new processors

– Possible to implement cache miss sampling now

– Much more efficient than software simulation

Eviction information in hardware practical

– Sampling is efficient and accurate

Could use Dyninst

– For simulation or for hardware
