The SIGMA Tools
Jeff Hollingsworth
(University of Maryland)
Luiz DeRose
K. Ekanadham
(IBM Research)
University of Maryland
SIGMA Goals

Family of tools to understand caches
– Focus on detailed statistics
– Complement existing hardware counters

Ability to handle real applications
– MPI and OpenMP programs
– Fortran and C

Provide hints about restructuring
– Padding (both inter and intra data structures)
– Blocking
Approach

Run instrumented program
– Capture full information about memory use
– Produce compact trace
• Extracts loops and memory strides

Post execution tools
– Memory profiler
• share of accesses due to each data structure
– Cache Prediction Tool
• Predict cache misses using symbolic equations
– Detailed simulator
• Full discrete event simulator
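
The memory profiler's "share of accesses due to each data structure" can be illustrated with a minimal sketch. It assumes each data structure occupies one contiguous address range; the function name, the layout, and the addresses are hypothetical and are not taken from SIGMA.

```python
from bisect import bisect_right
from collections import Counter

def access_shares(structures, addresses):
    """structures: list of (name, start, size); addresses: iterable of ints.
    Returns the fraction of accesses that fall inside each structure."""
    ranges = sorted((start, start + size, name) for name, start, size in structures)
    starts = [r[0] for r in ranges]
    counts = Counter()
    total = 0
    for addr in addresses:
        total += 1
        i = bisect_right(starts, addr) - 1          # last range starting at or before addr
        if i >= 0 and addr < ranges[i][1]:
            counts[ranges[i][2]] += 1
        else:
            counts["<other>"] += 1                  # address outside every known structure
    return {name: n / total for name, n in counts.items()} if total else {}

# Hypothetical layout and address trace, for illustration only.
structures = [("a", 1000, 400), ("b", 2000, 400)]
trace = [1000, 1004, 2000, 1008, 5000, 2004, 1012, 1016]
shares = access_shares(structures, trace)
```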
Structure of SIGMA Data Collection
[Figure: SIGMA data-collection flow. Boxes: source files; Sigma Compile/Link; instrumented binary; program execution; dumpMap; .addr; trace files; .lst files; Cache Simulator; Prediction Tool; Memory Ref Tool.]
New Dyninst Features for SIGMA

Fortran Common Blocks
– Class BPatch_cblock
• Represents a unique definition of a common block
• getComponents – returns members of the common block
• getFunctions – returns functions that define this block
– Class BPatch_type
• getCblocks – returns list of BPatch_cblock
– Global Variables
• Named common blocks now visible

Fortran specific Debug Symbols
– Now parsed and visible
Representing Program Execution

Capture full execution behavior
– Record all basic blocks and memory addresses
– Produces large traces (due to looping)

Trace compression
– Maintain pattern buffer
– Scan for repeating patterns
• Extract memory strides
– Repeat the algorithm for nested loops
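[Figure: example compressed trace record. An RPT (repeat) entry with Count and Length fields precedes the pattern BLK1 ADR ADR ADR BLK2 ADR ADR BLK3; the values 250 and 7 appear beside it. The five ADR entries have Base 100, 200, 300, 300, 500, each with Stride 4.]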
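
Stride extraction can be sketched for a single address stream. This is a simplified greedy version written for illustration, not SIGMA's pattern-buffer algorithm: it does not handle interleaved references or nested repeats, and the record layout (base, stride, count) is an assumption.

```python
def compress_strides(addresses):
    """Collapse runs of constant stride into (base, stride, count) records."""
    records = []
    i, n = 0, len(addresses)
    while i < n:
        base = addresses[i]
        if i + 1 == n:                      # lone trailing address
            records.append((base, 0, 1))
            break
        stride = addresses[i + 1] - base
        count = 2
        # Extend the run while consecutive differences keep the same stride.
        while i + count < n and addresses[i + count] - addresses[i + count - 1] == stride:
            count += 1
        records.append((base, stride, count))
        i += count
    return records

def expand(records):
    """Inverse of compress_strides: regenerate the full address list."""
    return [base + k * stride for base, stride, count in records for k in range(count)]

addrs = [100, 104, 108, 112, 300, 308, 316, 500]
records = compress_strides(addrs)
```

The round trip `expand(compress_strides(addrs)) == addrs` holds for any input, so the compact form loses no information; regular access patterns simply need fewer records.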
Trace Information
Application   Memory Refs    Full Trace      Compressed Trace   Slowdown
Swim_loops     10,770,481      50,454,784               3,692         57
Mgrid         308,927,975   1,291,107,400              53,944         21
Hydro2d       131,505,450     693,813,012             496,032         35

Compression ratio is a function of regularity
Slowdown depends on the fraction of instructions that load or store memory
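
The compression ratios implied by the Trace Information figures can be worked out directly. This sketch assumes the three rows of values belong to Swim_loops, Mgrid, and Hydro2d in that order, and that the Full Trace and Compressed Trace columns use the same unit; the slide states neither.

```python
# application: (full trace size, compressed trace size), as read from the slide
traces = {
    "Swim_loops": (50_454_784, 3_692),
    "Mgrid": (1_291_107_400, 53_944),
    "Hydro2d": (693_813_012, 496_032),
}

# Compression ratio = full trace size / compressed trace size.
ratios = {app: full / comp for app, (full, comp) in traces.items()}

for app, ratio in ratios.items():
    print(f"{app}: about {ratio:,.0f}x smaller")
```

Under those assumptions the ratios span roughly an order of magnitude, with Hydro2d compressing least.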
Using SIGMA Trace Generation

Compiling - modify makefile
– .f to .o rules
• prepend $(SIGMA)/bin/sigmaCompile $<
– Link step
• prepend $(SIGMA)/bin/sigmaLink

Running
– Two environment variables
• SIGMA_TRACELEVEL
• SIGMA_TRACEDIR

Selected instrumentation
– Only sigmaCompile selected files
• No overhead for uninstrumented files
– Explicit calls to enable/disable
• Some overhead remains
Cache Prediction Tool



Use compressed traces
Convert memory refs back to array refs
Compute Miss Equations
– re-use vectors (Ghosh & Martonosi)
– Direct set of linear constraints (Chatterjee et al.)

To Compute Misses
– define misses as a system of linear equations
– use Omega library to solve

Provides
– count of misses
– information about iterations that cause misses
Iteration Space
Re-use vectors
• define points in the iteration space that access the same data
Miss equations
• describe points in the iteration space that cause misses on conflicts
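
A re-use vector can be illustrated for two references to the same array whose subscripts are the loop indices plus constant offsets, such as A[i][j] and A[i-1][j]. The sketch below is a minimal example under that assumption; the function name is hypothetical, and it covers only re-use in a strictly later iteration.

```python
def reuse_vector(earlier_offsets, later_offsets):
    """Iteration-space distance after which the `later` reference touches the
    element that the `earlier` reference touched.

    A reference with offsets (a1, a2) accesses A[i1 + a1][i2 + a2].
    Returns None unless the distance is lexicographically positive,
    i.e. unless the re-use happens in a strictly later iteration.
    """
    r = tuple(a - b for a, b in zip(earlier_offsets, later_offsets))
    zero = tuple(0 for _ in r)
    return r if r > zero else None      # tuple comparison is lexicographic

# A[i][j] is re-used by A[i-1][j] one outer iteration later.
outer = reuse_vector((0, 0), (-1, 0))
# A[i][j] is re-used by A[i][j-1] one inner iteration later.
inner = reuse_vector((0, 0), (0, -1))
```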
Predicting cache misses

Operate on compact traces
– Only expand to full trace if needed

Use algorithms developed for compilers
– Re-use vectors
– Cache miss equations

Miss types are identified
– capacity, cold, and conflict
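
The three miss types can be illustrated with the conventional simulation-based classification, which is a different technique from the symbolic equations the tool uses. The sketch assumes LRU replacement and a modulo mapping from lines to sets. A miss is cold on the first touch of a line, capacity if a fully associative cache of the same total size would also miss, and conflict otherwise.

```python
from collections import OrderedDict

def lru_access(cache, key, capacity):
    """Touch `key` in an LRU cache held in an OrderedDict; return True on a hit."""
    if key in cache:
        cache.move_to_end(key)
        return True
    if len(cache) >= capacity:
        cache.popitem(last=False)       # evict the least recently used entry
    cache[key] = None
    return False

def classify_misses(lines, num_sets, ways):
    """Count cold, capacity, and conflict misses for a sequence of line numbers."""
    seen = set()
    full = OrderedDict()                                # fully associative, same capacity
    sets = [OrderedDict() for _ in range(num_sets)]     # the set-associative cache
    counts = {"cold": 0, "capacity": 0, "conflict": 0}
    for line in lines:
        hit = lru_access(sets[line % num_sets], line, ways)
        full_hit = lru_access(full, line, num_sets * ways)
        if not hit:
            if line not in seen:
                counts["cold"] += 1
            elif not full_hit:
                counts["capacity"] += 1
            else:
                counts["conflict"] += 1
        seen.add(line)
    return counts

conflict_case = classify_misses([0, 2, 0], num_sets=2, ways=1)
capacity_case = classify_misses([0, 1, 2, 3, 0], num_sets=2, ways=1)
```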
Cache Terminology
[Figure: memory and cache]
Cache
– consists of lines L
– is set-associative, with a fixed number of ways
– each line maps to a set S
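
The relation between addresses, lines, and sets can be sketched as follows. The modulo mapping and the example cache geometry (32-byte lines, 128 sets) are assumptions chosen for illustration; the slides do not give them.

```python
def line_and_set(address, line_size, num_sets):
    """Return (line number, set index) for a byte address."""
    line = address // line_size         # all bytes in one line share this number
    return line, line % num_sets        # consecutive lines go to consecutive sets

first = line_and_set(0x1234, line_size=32, num_sets=128)
# An address exactly line_size * num_sets bytes later lands in the same set
# but on a different line, so the two compete for that set's ways.
second = line_and_set(0x1234 + 32 * 128, line_size=32, num_sets=128)
```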
Array References

A reference Rv(i1,i2) refers to
– the v-th array reference in a loop
– the i1-th iteration of the outer loop
– the i2-th iteration of the inner loop

Rv(i1,i2) precedes Ru(j1,j2) if
– i1 < j1 or
– i1 = j1 and i2 < j2 or
– i1 = j1 and i2 = j2 and v < u
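
The three "precedes" conditions amount to a lexicographic comparison of (outer iteration, inner iteration, reference position). A minimal sketch, representing each reference as a hypothetical tuple (v, i1, i2):

```python
def precedes(ref_a, ref_b):
    """True if ref_a = (v, i1, i2) executes before ref_b = (u, j1, j2)."""
    v, i1, i2 = ref_a
    u, j1, j2 = ref_b
    # Python compares tuples lexicographically: outer loop first, then inner
    # loop, then the position of the reference within the loop body.
    return (i1, i2, v) < (j1, j2, u)
```

For example, `precedes((2, 0, 5), (1, 1, 0))` is true because the first reference runs in an earlier outer iteration, even though it comes later in the loop body.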
A Replacement Miss

There exists a reference Ra(i1,i2) such that
– Ra(i1,i2) refers to line L and maps to set S

There exists another Rb(j1,j2) such that
– Rb(j1,j2) refers to line L and maps to set S
– Rb(j1,j2) precedes Ra(i1,i2)

There exist at least as many references Rn(k1,k2) as the cache has ways such that
– Rn(k1,k2) maps to set S
– Rn(k1,k2) refers to line Ln where
• Ln is distinct from L and from every other Ln
– Rb(j1,j2) precedes Rn(k1,k2) precedes Ra(i1,i2)
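
The replacement-miss conditions can be checked by brute force on an explicit access sequence, which is the opposite of the symbolic approach but makes the definition concrete. The sketch assumes LRU replacement and a modulo line-to-set mapping, and takes the most recent earlier access to the same line as the preceding reference; the function name is hypothetical.

```python
def replacement_misses(accesses, num_sets, ways):
    """accesses: line numbers in execution order.
    Returns the positions whose access is a replacement miss."""
    misses = []
    last_seen = {}                      # line -> position of its most recent access
    for idx, line in enumerate(accesses):
        target_set = line % num_sets
        if line in last_seen:
            between = accesses[last_seen[line] + 1 : idx]
            # Distinct other lines that map to the same set in between.
            distinct = {other for other in between
                        if other % num_sets == target_set and other != line}
            if len(distinct) >= ways:
                misses.append(idx)
        last_seen[line] = idx
    return misses

# Lines 0, 4, and 8 all map to set 0 when there are 4 sets.
two_way = replacement_misses([0, 4, 8, 0, 4], num_sets=4, ways=2)
three_way = replacement_misses([0, 4, 8, 0, 4], num_sets=4, ways=3)
```

First-time accesses are never reported here, since those are cold misses rather than replacement misses.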
Using Miss Data

For each Reference get
– Set of iterations that produce cold misses
– Set of iterations that produce replacement misses

Counting Misses
– Can count misses at each reference
– Combined counts for a loop nest
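
Counting at each reference and combining per loop nest can be sketched as a simple aggregation over per-iteration miss records. The record layout and every value below are hypothetical, made up purely to show the aggregation; they are not SIGMA output.

```python
from collections import Counter

# Hypothetical records: (loop nest, reference, iteration, miss kind)
misses = [
    ("nest1", "R1", (0, 0), "cold"),
    ("nest1", "R1", (0, 8), "cold"),
    ("nest1", "R2", (0, 0), "cold"),
    ("nest1", "R2", (1, 0), "replacement"),
    ("nest1", "R2", (2, 0), "replacement"),
]

# Misses at each reference, split by kind.
per_reference = Counter((nest, ref, kind) for nest, ref, _, kind in misses)
# Combined count for each loop nest.
per_nest = Counter(nest for nest, *_ in misses)
```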
Status

Trace Generation Running
Cache Prediction Running for small loops

Future Work

– Multiple loop nests
– Multi-level caches
– Irregular programs