Pentagons: A Weakly Relational Abstract Domain for the Efficient Validation of Array Accesses
Francesco Logozzo, Manuel Fahndrich
Microsoft Research, Redmond

The Background
  Efficient static checking of .NET assemblies
  Foxtrot: a language agnostic contract language
  Clousot: a language agnostic static analyzer
    Based on abstract interpretation
    Checks contracts, array bounds, memory accesses, nullness, …

Demo
  Ok: not null
  Wrong?

Demo
  Ok: not null
  Ok: index in bounds

The paper in a nutshell
  Is 0 ≤ y < x ?
  Figure: program executions
  Testing: try some points
    What for the others?
  Model checking: try all the points
    What if we have ∞ points?
  Abstract interpretation: approximation
    Intervals: No, in O(n)
    Pentagons: Yes! in O(n)
    Octagons: Yes! in Θ(n³)
    Polyhedra: Yes! in O(2ⁿ)

Pentagons?
  A lightweight numerical domain
  Keep relations in the form a ≤ x ≤ b && x < y
    a, b numerical constants
    x, y variables
  Enough to validate > 83% of the accesses of mscorlib.dll
    Mscorlib.dll is the main library in .NET
  Fast: analyze it in a couple of minutes

Abstract domain
  An abstract domain is a complete lattice endowed with
  Widening operator
    To ensure the convergence of the analysis
    Ex. The increasing chain [0,1] ⊑ [0,2] ⊑ [0,3] ⊑ [0,4] ⊑ ... is extrapolated by widening to [0, +∞]
  Transfer functions
    To capture the abstract semantics of statements
    Ex.
    ⟦x := y + 3⟧([y → [1, 2]]) = [y → [1, 2], x → [4, 5]]

Interval domain
  Elements: { [a, b] | a ∈ Z ∪ { -∞ }, b ∈ Z ∪ { +∞ } }
  Order: [a,b] ⊑ [c,d] iff c ≤ a and b ≤ d
  Join: [a,b] ⊔ [c,d] = [min(a,c), max(b,d)]
  Meet: [a,b] ⊓ [c,d] = [max(a,c), min(b,d)]
  Widening: keep the stable bounds
  Transfer functions: ordinary interval arithmetic

LT Domain
  Elements: ℘({ X < Y | X and Y are variables })
    Efficient representation with Hashtables
  Order: A ⊑ B iff B ⊆ A
  Join: A ⊔ B = A ∩ B
  Meet: A ⊓ B = A ∪ B
  Widening: just the join, as the lattice has finite height
  Transfer functions: ⟦y := x + 1⟧(A) = (A - {y}) ∪ { x < y }

Pentagons
  Reduced Cartesian product of Intervals and LT
  Reduced?
    Not just pairs: information flows from one element to the other
    Ex. (x → [1, 4], y → [3, 3], { x < y }) => (x → [1, 2], y → [3, 3], { x < y })
    May introduce cubic slowdown
  Reduction is applied
    In precise points of the analysis
    Lazily at join points

The (Naive) Join of Pentagons
  Left_P = (left_intv, left_lt), Right_P = (right_intv, right_lt)
  1. Close Left_P and Right_P
  2. Apply the join pairwise
  Closure(intv, lt) iterates this rule until saturation:
    if x → [a,b], y → [c,d] ∈ intv and b < c, then lt = lt ∪ { x < y }
  Problem: it introduces a quadratic slowdown

The smarter join on Pentagons
  Idea:
  1. Apply the pairwise join
  2. If a symbolic constraint x < y is dropped, check if the other branch implies it
  3. If it does, then keep the constraint
  Formal details in the paper
  Results:
    For mscorlib we moved from > 1h to a couple of minutes
    No access is lost!

Experiment: Array bounds analysis
  Dll                 # of meth.   Time   Accesses checked   Accesses validated   Prec.
  mscorlib.dll        21 073       6:42   17 057             14 218               83.38%
  System.dll          15 111       4:08   11 609              9 973               85.91%
  System.Design.dll   12 743       4:20   14 202             12 946               91.15%
  System.Web.dll      23 368       4:12   10 072              9 579               95.11%

  Assemblies as shipped
    No pre-processing
    No pre-selection
  Intra-procedural analysis only
    Contracts will improve the precision

Conclusions
  A lightweight abstract domain
    Efficient and scalable
  Used for array bounds validation
    Implemented in Clousot
  To be used
    as a first pass to drop most of the proof obligations
    in combination with other domains
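
Sketch: reduction and the smarter join
  The interval widening, the reduction example and the smarter join from the slides can be illustrated in a few lines of Python. This is a minimal sketch under stated assumptions, not Clousot's implementation. All names (widen_interval, reduce_pentagon, implies_lt, smart_join, TOP) are hypothetical. A pentagon is modelled as a pair (dict from variable to interval, set of (x, y) pairs meaning x < y), a variable missing from the dict is treated as unbounded, the reduction does a single pass rather than iterating to saturation, and empty intervals (bottom) are not handled.

```python
import math

INF = math.inf
TOP = (-INF, INF)  # assumption: a variable with no entry is unbounded


def join_interval(a, b):
    """[a,b] ⊔ [c,d] = [min(a,c), max(b,d)]."""
    return (min(a[0], b[0]), max(a[1], b[1]))


def widen_interval(old, new):
    """Keep the stable bounds, push the unstable ones to infinity."""
    lo = old[0] if old[0] <= new[0] else -INF
    hi = old[1] if old[1] >= new[1] else INF
    return (lo, hi)


def reduce_pentagon(intv, lt):
    """One reduction pass: each x < y tightens x's upper and y's lower bound."""
    out = dict(intv)
    for x, y in lt:
        x_lo, x_hi = out.get(x, TOP)
        y_lo, y_hi = out.get(y, TOP)
        out[x] = (x_lo, min(x_hi, y_hi - 1))
        out[y] = (max(y_lo, x_lo + 1), y_hi)
    return out, lt


def implies_lt(intv, x, y):
    """The intervals alone prove x < y when upper(x) < lower(y)."""
    return intv.get(x, TOP)[1] < intv.get(y, TOP)[0]


def smart_join(left, right):
    """Pairwise join, then recover x < y constraints the other branch implies."""
    l_intv, l_lt = left
    r_intv, r_lt = right
    intv = {
        v: join_interval(l_intv.get(v, TOP), r_intv.get(v, TOP))
        for v in set(l_intv) | set(r_intv)
    }
    lt = set(l_lt & r_lt)  # plain LT join: keep what both branches state
    lt |= {(x, y) for (x, y) in l_lt - r_lt if implies_lt(r_intv, x, y)}
    lt |= {(x, y) for (x, y) in r_lt - l_lt if implies_lt(l_intv, x, y)}
    return intv, lt
```

  With these definitions, reduce_pentagon({"x": (1, 4), "y": (3, 3)}, {("x", "y")}) tightens x to (1, 2) and leaves y at (3, 3), as in the reduction example on the Pentagons slide. The design point of smart_join is that it only inspects the constraints that the pairwise join would drop, instead of closing both operands first.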