CACHETOR: Detecting Cacheable Data to Remove Bloat
Khanh Nguyen, Guoqing Xu
UC Irvine, USA

Introduction
• Bloat: excessive work to accomplish simple tasks
• Modern software suffers from bloat [Xu et al., FoSER 2010]
• It is difficult for compilers to remove the penalty
• One pattern: repeated computations that have the same inputs and produce the same outputs
• 4 out of 18 best practices (IBM's)* are to reuse data
* www.ibm.com/software/webservers/appserv/ws_bestpractices.pdf
Khanh Nguyen - UC Irvine

Example
    float[] fValues = {0.0f, 2.3f, 1.0f, 3.4f, 1.0f, 1.0f, ..., 1.0f};
    int[] iValues = new int[fValues.length];

    // Original loop: converts every element, although 1.0 occurs repeatedly
    for (int i = 0; i < fValues.length; i++) {
        iValues[i] = Float.floatToIntBits(fValues[i]);
    }

    // Optimized loop: reuse the cached conversion of the frequent value 1.0
    int cached_result = Float.floatToIntBits(1.0f);
    for (int i = 0; i < fValues.length; i++) {
        if (fValues[i] == 1.0f)
            iValues[i] = cached_result;
        else
            iValues[i] = Float.floatToIntBits(fValues[i]);
    }
{adapted from sunflow, an open-source image rendering system}

The Big Picture
[Figure labels: Dynamic Dependence Analysis; Dependence Profile/Graph; I-Cachetor, D-Cachetor, M-Cachetor]

Cachetor
• Introduction
• Scalable algorithms for the dependence analysis
• 3 detectors
• Evaluations

In Theory / In Practice
[Figure labels: Full Value Profiling, Full Dynamic Slicing; Abstract Value Profiling, Abstract Dynamic Slicing; Cachetor]

Overview
• Combine value profiling and dynamic slicing in a mutually beneficial and scalable manner
    • Distinct values are used to abstract instruction instances
• Result: an abstract dependence graph
    • Nodes: abstract representations of runtime instances
    • Edges: dependence relationships between nodes

Equivalence Class
[Figure: the instances of an instruction i are mapped by f1 into equivalence classes e1 … en]

Equivalence Class
[Figure: instruction instances grouped by the values they create, with f1(inst. instance) = value created. Labels: f1, f2; options shown: Top-N?, Hashing?; final panel: Hashing]

Another Abstraction Level
• Context sensitive:
    • To distinguish entities based on the calling context
    • To improve the tool's precision
• Please refer to our paper for details

Cacheability
• Quantitative measurement indicating how likely a program entity is to keep producing/containing identical values
• Compute cacheability for 3 kinds of program entities:
    • Instruction
    • Data structure
    • Method call
• Rank and report top entities

Cachetor
• Introduction
• Scalable algorithms for the dependence analysis
• 3 detectors
• Evaluations

I-Cachetor
• Detect instructions that create identical values
• Compute cacheability for each static instruction (Inst.CM)
• Cacheability: [figure]

D-Cachetor: Overview
• 2 steps:
    • Step 1: detect cacheable individual objects
    • Step 2: detect cacheable data structures
• Compute cacheability for each allocation site node

D-Cachetor: Step 1
• Compute cacheability for each object (Obj.CM), not considering reference relationships
• Focus: instructions that write primitive-typed fields
[Figure: a = new A(); a.f = b <2,3>; a.g = c <3,3>; a.h = d <5,7>; a.… = …]

D-Cachetor: Step 2
• Group objects using the reference relationships
• Compute DataStructureCM
• Focus: instructions that write reference-typed fields
• Add only objects whose Obj.CM is within a range
[Figure: objects annotated with numbers: ds = new DS() (2), a = new A() (4), c = new C() (2), b = new B() (6), d = new D() (7)]

M-Cachetor
• Detect method calls that have the same inputs and produce the same outputs
• Compute CallSiteCM
• For each call site c: a = f( ), CallSiteCM is:
    • If a is primitive: CallSiteCM = Inst.CM of c
    • If a is reference: CallSiteCM = the average of DataStructureCM of all data structures rooted at a

Implementation
• Jikes RVM 3.1.1
• Optimizing-compiler-only mode
• Context-sensitive
• Evaluated on 14 benchmarks from DaCapo & Java Grande

Overheads
• Geometric mean: 201.96X (time), 1.98X (space)
[Figure: per-benchmark bar chart; time overhead axis from 0 to 600X, space overhead axis from 0 to 10X]

Case Studies
Program      Time Reduction   Space Reduction   GC Runs Reduction   GC Time Reduction
montecarlo   12.1%            98.7%             70.0%               89.2%
raytracer    19.1%            1.2%              33.3%               30.2%
euler        20.5%            0.4%              40.0%               44.8%
bloat        13.1%            12.6%             -7.3%               -4.0%
xalan        5.2%             0.1%              -0.7%               -1.1%

False Positives
Program      D-Cachetor   M-Cachetor
montecarlo   2            6
raytracer    3            4
euler        1            7
bloat        1            4
xalan        4            5
Number of false positives identified among the top 20 items in the reports of D-Cachetor and M-Cachetor.

False Positives Sources
• Handling of floating point values
• Context-sensitive reporting
• Missing the actual values
• Hashing-induced false positives

Conclusions
• Cachetor - novel tool, supports detection of cacheable data to improve performance
    • Scalable combination of value profiling and dynamic slicing
    • 3 detectors that can detect cacheable:
        o Instructions
        o Data structures
        o Method calls
• Large optimization opportunities can be found from Cachetor's reports

THANK YOU!
Questions - Comments?

What happened in montecarlo?
    public void runSerial() {
        results = new Vector(nRunsMC);
        // Now do the computation.
        PriceStock ps;
        for (int iRun = 0; iRun < nRunsMC; iRun++) {
            ps = new PriceStock();
            ps.setInitAllTasks(initAllTasks);
            ps.setTask(tasks.elementAt(iRun));      // original call: reads the prebuilt task
            ps.setTask(iRun, (long) iRun * 11);     // alternative overlaid on the slide: pass the task data directly
            ps.run();
            results.addElement(ps.getResult());
        }
    }

    {Calculate the result on the fly}
    private void processSerial() {
        processResults();
    }

    private void initTasks(int nRunsMC) {
        tasks = new Vector(nRunsMC);
        for (int i = 0; i < nRunsMC; i++) {
            String header = "MC run " + String.valueOf(i);
            ToTask task = new ToTask(header, (long) i * 11);
            tasks.addElement((Object) task);
        }
    }
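
Backup sketch: abstracting instruction instances by value

The Overview and Equivalence Class slides describe abstracting the instances of an instruction by the values they create, with hashing bounding the number of classes. The sketch below illustrates that idea only. The class name `ValueProfileSketch`, the slot count, and the `dominantShare` score are assumptions made for illustration; the slides do not give the Inst.CM formula, and this is not the tool's implementation.

```java
/** Hypothetical sketch: a bounded value profile for one static instruction. */
final class ValueProfileSketch {
    private final long[] slotCounts; // one counter per equivalence class (hash slot)
    private long total;              // number of instruction instances recorded

    ValueProfileSketch(int slots) {
        slotCounts = new long[slots];
    }

    /** Record one instruction instance by hashing the value it created into a slot. */
    void record(Object value) {
        int h = (value == null) ? 0 : value.hashCode();
        int slot = Math.floorMod(h, slotCounts.length);
        slotCounts[slot]++;
        total++;
    }

    /** Assumed score (not the slides' Inst.CM): share of instances in the fullest slot. */
    double dominantShare() {
        if (total == 0) {
            return 0.0;
        }
        long max = 0;
        for (long c : slotCounts) {
            max = Math.max(max, c);
        }
        return (double) max / total;
    }

    public static void main(String[] args) {
        ValueProfileSketch profile = new ValueProfileSketch(16);
        int[] produced = {5, 5, 5, 7}; // values created by four instances
        for (int v : produced) {
            profile.record(v);
        }
        System.out.println(profile.dominantShare());
    }
}
```

Because distinct values can land in the same slot (for example, 1 and 17 with 16 slots), a score computed this way can overstate how often a value repeats, which matches the "hashing-induced false positives" item on the False Positives Sources slide.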
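
Backup sketch: the CallSiteCM rule

The M-Cachetor slide states the rule for a call site c: a = f( ): if a is primitive, CallSiteCM is the Inst.CM of c; if a is a reference, it is the average DataStructureCM of all data structures rooted at a. A minimal sketch of that rule follows. The class and parameter names are hypothetical, and returning 0.0 when no data structure is rooted at a is an assumption, since the slide does not cover that case.

```java
/** Hypothetical sketch of the CallSiteCM rule from the M-Cachetor slide. */
final class CallSiteCmSketch {
    private CallSiteCmSketch() {
    }

    /**
     * @param returnsPrimitive whether the call's result a is primitive-typed
     * @param instCm           Inst.CM of the call site (used when a is primitive)
     * @param dataStructureCms DataStructureCM of each data structure rooted at a
     */
    static double callSiteCm(boolean returnsPrimitive, double instCm, double[] dataStructureCms) {
        if (returnsPrimitive) {
            return instCm;
        }
        if (dataStructureCms.length == 0) {
            return 0.0; // assumption: the slide does not define this case
        }
        double sum = 0.0;
        for (double cm : dataStructureCms) {
            sum += cm;
        }
        return sum / dataStructureCms.length;
    }

    public static void main(String[] args) {
        System.out.println(callSiteCm(true, 3.0, new double[0]));
        System.out.println(callSiteCm(false, 0.0, new double[] {2.0, 4.0}));
    }
}
```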
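
Backup sketch: acting on an M-Cachetor report

M-Cachetor reports method calls that receive the same inputs and produce the same outputs. One way a developer might act on such a report is to memoize the call. The sketch below assumes the reported method is side-effect free and takes a single int; `MemoizedCallSketch` and its members are hypothetical names, and the slides do not prescribe this particular fix.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntUnaryOperator;

/** Hypothetical sketch: memoize a side-effect-free call reported as cacheable. */
final class MemoizedCallSketch {
    private final Map<Integer, Integer> cache = new HashMap<>();
    private final IntUnaryOperator target; // the (assumed pure) method being cached
    private int misses;                    // how many times the target actually ran

    MemoizedCallSketch(IntUnaryOperator target) {
        this.target = target;
    }

    /** Return the cached result for this input, computing it only on first use. */
    int call(int input) {
        Integer cached = cache.get(input);
        if (cached != null) {
            return cached;
        }
        misses++;
        int result = target.applyAsInt(input);
        cache.put(input, result);
        return result;
    }

    int misses() {
        return misses;
    }

    public static void main(String[] args) {
        MemoizedCallSketch square = new MemoizedCallSketch(x -> x * x);
        square.call(3);
        square.call(3); // served from the cache
        square.call(4);
        System.out.println(square.misses());
    }
}
```

Whether the cache pays off depends on how often inputs repeat and on the memory the cache itself consumes, which is why a ranked report is useful before applying a change like this.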