The SIGMA Tools

Jeff Hollingsworth (University of Maryland)
Luiz DeRose, K. Ekanadham (IBM Research)

University of Maryland


Sigma Goals

Family of tools to understand caches
  – Focus on detailed statistics
  – Complement existing hardware counters
Ability to handle real applications
  – MPI and OpenMP programs
  – Fortran and C
Provide hints about restructuring
  – Padding (both inter and intra data structures)
  – Blocking


Approach

Run instrumented program
  – Capture full information about memory use
  – Produce compact trace
    • Extracts loops and memory strides
Post execution tools
  – Memory profiler
    • Share of accesses due to each data structure
  – Cache Prediction Tool
    • Predict cache misses using symbolic equations
  – Detailed simulator
    • Full discrete event simulator


Structure of SIGMA

[Figure: block diagram with the labels Data Collection; source files; Sigma Compile/Link; instrumented binary; Program Execution; dumpMap; .addr, trace, and .lst files; Cache Simulator; Prediction Tool; Memory Ref Tool]


New Dyninst Features for SIGMA

Fortran Common Blocks
  – Class BPatch_cblock
    • Represents a unique definition of a common block
    • getComponents – returns members of the common block
    • getFunctions – returns functions that define this block
  – Class BPatch_type
    • getCblocks – returns list of BPatch_cblock
  – Global Variables
    • Named common blocks now visible
Fortran specific Debug Symbols
  – Now parsed and visible


Representing Program Execution

Capture full execution behavior
  – Record all basic blocks and memory addresses
  – Produces large traces (due to looping)
Trace compression
  – Maintain pattern buffer
  – Scan for repeating patterns
    • Extract memory strides
  – Repeat algorithms for nested loops

Example compressed trace record:

            RPT   BLK1  ADR  ADR  ADR  BLK2  ADR  ADR  BLK3
  Count     250
  Length      7
  Base                  100  200  300        300  500
  Stride                  4    4    4          4    4


Trace Information

  Application   Memory Refs    Full Trace       Compressed Trace   Slowdown
  Swim_loops     10,770,481       50,454,784          3,692            57
  Mgrid         308,927,975    1,291,107,400         53,944            21
  Hydro2d       131,505,450      693,813,012        496,032            35

Compression ratio is a function of regularity
Slowdown depends on the fraction of instructions that load/store memory


Using SIGMA Trace Generation

Compiling - modify makefile
  – .f to .o rules
    • prepend $(SIGMA)/bin/sigmaCompile $<
  – Link step
    • prepend $(SIGMA)/bin/sigmaLink
Running
  – Two environment variables
    • SIGMA_TRACELEVEL
    • SIGMA_TRACEDIR
Selected instrumentation
  – Only sigmaCompile selected files
    • No overhead for uninstrumented files
  – Explicit calls to enable/disable
    • Some overhead remains


Cache Prediction Tool

Use compressed traces
Convert memory refs back to array refs
Compute Miss Equations
  – Re-use vectors (Ghosh & Martonosi)
  – Direct set of linear constraints (Chatterjee et al.)
To Compute Misses
  – Define misses as a system of linear equations
  – Use Omega library to solve
Provides
  – Count of misses
  – Information about iterations that cause misses


Iteration Space

Re-use vectors
  • Define points in the iteration space that access the same data
Miss equations
  • Describe points in the iteration space that cause misses on conflicts


Predicting cache misses

Operate on compact traces
  – Only expand to full trace if needed
Use algorithms developed for compilers
  – Re-use vectors
  – Cache miss equations
Miss types are identified
  – Capacity, cold, and conflict


Memory Cache Terminology

Cache consists of lines L
Set associative, with a fixed number of ways per set
Each line maps to a set S


Array References

A reference Rv(i1,i2) refers to
  – the vth array reference in a loop
  – the i1th iteration of the outer loop
  – the i2th iteration of the inner loop
Rv(i1,i2) precedes Ru(j1,j2) if
  – i1 < j1 or
  – i1 = j1 and i2 < j2 or
  – i1 = j1 and i2 = j2 and v < u


A Replacement Miss

There exists a reference Ra(i1,i2) such that
  – Ra(i1,i2) refers to line L and maps to set S
There exists another Rb(j1,j2) such that
  – Rb(j1,j2) refers to line L and maps to set S
  – Rb(j1,j2) precedes Ra(i1,i2)
There exist at least as many references Rn(k1,k2) as the cache has ways, such that
  – Rn(k1,k2) maps to set S
  – Rn(k1,k2) refers to line Ln where
    • Ln is distinct from all other Ln's and L
  – Rb(j1,j2) precedes Rn(k1,k2) precedes Ra(i1,i2)


Using Miss Data

For each reference get
  – Set of iterations that produce cold misses
  – Set of iterations that produce replacement misses
Counting Misses
  – Can count misses at each reference
  – Combined counts for a loop nest


Status

Trace Generation
  – Running
Cache Prediction
  – Running for small loops
Future Work
  – Multiple loop nests
  – Multi-level caches
  – Irregular programs
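
Sketch: trace compression

The "Representing Program Execution" slide describes collapsing a loop's repeated basic blocks and strided addresses into one repeat record. The slides do not give SIGMA's algorithm or record format, so the following is only a minimal Python sketch of the idea. It assumes the trace is already split into loop iterations, which sidesteps the pattern-buffer scan; the names compress and expand and the record layout are hypothetical. The example values (bases 100, 200, 300, 300, 500, stride 4, count 250, length 7) are read from the slide's example record.

```python
def compress(iterations):
    """Collapse same-shaped loop iterations into one repeat record.

    Each iteration is a list of ("BLK", block_id) or ("ADR", address)
    events. Returns a dict with count, length, and (kind, base, stride)
    entries, or None when the iterations are not a regular pattern.
    Hypothetical layout; not SIGMA's actual trace format.
    """
    if not iterations:
        return None
    first = iterations[0]
    # Every iteration must contain the same sequence of event kinds.
    shape = [kind for kind, _ in first]
    for events in iterations:
        if [kind for kind, _ in events] != shape:
            return None
    # Take each address stride from the first two iterations (0 if only one).
    entries = []
    for pos, (kind, value) in enumerate(first):
        stride = 0
        if kind == "ADR" and len(iterations) > 1:
            stride = iterations[1][pos][1] - value
        entries.append((kind, value, stride))
    record = {"count": len(iterations), "length": len(first), "entries": entries}
    # Keep the record only if expanding it reproduces the input exactly.
    return record if expand(record) == iterations else None


def expand(record):
    """Rebuild the full list of iterations from a repeat record."""
    return [
        [(kind, base + n * stride) for kind, base, stride in record["entries"]]
        for n in range(record["count"])
    ]


# Example: one loop body of 7 events, executed 250 times.
BASES = [100, 200, 300, 300, 500]


def iteration(n):
    a = [base + 4 * n for base in BASES]
    return [("BLK", 1), ("ADR", a[0]), ("ADR", a[1]), ("ADR", a[2]),
            ("BLK", 2), ("ADR", a[3]), ("ADR", a[4])]


trace = [iteration(n) for n in range(250)]
record = compress(trace)
```

Here 1,750 trace events reduce to a single record of seven entries, and expand(record) recovers the original trace. An irregular access pattern makes compress return None, which matches the slide's remark that the compression ratio depends on regularity.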
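
Sketch: counting cold and replacement misses

The "A Replacement Miss" and "Using Miss Data" slides define misses in terms of lines, sets, and the order of references. The prediction tool solves these conditions symbolically; the sketch below instead checks them by brute force, walking every address in program order through a small cache model. It assumes LRU replacement, which the slides do not state, and all names and cache parameters (classify_misses, loop_addresses, 32-byte lines, 4 sets) are hypothetical choices for illustration.

```python
from collections import OrderedDict


def classify_misses(addresses, line_size, num_sets, ways):
    """Return (cold, replacement) miss counts for an address sequence.

    Models a set-associative cache with LRU replacement (an assumption).
    A miss is cold if the line was never referenced before; otherwise it
    is a replacement miss, meaning at least `ways` distinct other lines
    mapping to the same set were referenced since the line's last use.
    """
    sets = [OrderedDict() for _ in range(num_sets)]  # per-set LRU order
    seen_lines = set()
    cold = replacement = 0
    for addr in addresses:
        line = addr // line_size
        lru = sets[line % num_sets]
        if line in lru:
            lru.move_to_end(line)  # hit: mark as most recently used
            continue
        if line in seen_lines:
            replacement += 1
        else:
            cold += 1
            seen_lines.add(line)
        lru[line] = True
        if len(lru) > ways:
            lru.popitem(last=False)  # evict the least recently used line
    return cold, replacement


def loop_addresses(base_a, base_b, n, elem_size=4):
    """Addresses touched by a loop that reads A[i] then B[i] for i < n."""
    addrs = []
    for i in range(n):
        addrs.append(base_a + i * elem_size)
        addrs.append(base_b + i * elem_size)
    return addrs


# Direct-mapped cache: 4 sets of one 32-byte line (128 bytes in total).
conflicting = loop_addresses(0, 128, 8)  # A and B map to the same set
padded = loop_addresses(0, 160, 8)       # B shifted by one line of padding

print(classify_misses(conflicting, 32, 4, 1))  # (2, 14)
print(classify_misses(padded, 32, 4, 1))       # (2, 0)
```

In the first case the two arrays evict each other on every access, so all references after the first two are replacement misses. Shifting one array by a single line removes them, which is the kind of padding hint listed on the goals slide.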