Implicit Programming Viktor Kuncak Swiss Federal Institute of Technology Lausanne (EPFL) Joint work with: Tihomir Gvero, EPFL Ali Sinan Köksal, now grad student at UC Berkeley Ruzica Piskac, now at Max-Planck Inst. for Software Systems Philippe Suter, EPFL (graduating) http://lara.epfl.ch/ Programming Activities and Tools requirements Three related activities: • Development within an IDE (Eclipse, Visual Studio, emacs) • Compilation and static checking (optimizing compiler, static analyzer, contract checker) • Execution on a (virtual) machine def f(x : Int) = { y=2*x+1 } iload_0 iconst_1 iadd More compute power available for each step use it to improve programmer productivity 42 Implicit Programming Agenda requirements Advance techniques and tools for Development within an IDE What can we prove in 0.5 s? def f(x : Int) = { y=2*x+1 } Compilation and verification Eliminate unknowns from programs Constraint diagnostics iload_0 iconst_1 iadd Execution When is unfolding enough? to improve software productivity. 42 Background: Scala Programming Language Introduced by Martin Odersky, EPFL Unifies functional and object-oriented programming Runs on the Java Virtual Machine (and .NET) Now used by over 100’000 developers (incl. Twitter, UBS, LinkedIn) DSL embedding mechanisms. IDE support (e.g. Eclipse) sealed abstract class Tree case class Leaf() extends Tree case class Node(left:Tree, data:Int, right:Tree) extends Tree def elems(t:Tree) :Set[Int] = t match { case Leaf() ⇒ Set.empty case Node(l,d,r) ⇒ elems(l) ++ Set(d) ++ elems(r) } http://www.scala-lang.org Kaplan: Executable Spec for Sorting sealed abstract class List case class Nil() extends List case class Cons(head : Int, tail : List) extends List def elems(ls : List) : Set[Int] = … def isSorted(lst : List) : Boolean = lst match { case Nil() ⇒ true case Cons(_, Nil()) ⇒ true case Cons(x, Cons(y, ys)) ⇒ x < y && isSorted(Cons(y,ys)) } ((lst: List)⇒ isSorted(lst) && elems(lst) == Set(0, 1, -3)).find.get > Cons(-3, Cons(0, Cons(1, Nil()))) Koksal, Kuncak, Suter: Constraints as Control, POPL 2012 Minimizing Solutions val (a, b) = (12, 8) val (lcm, _, _) = ((m, fa, fb) ⇒ m > 0 && m == fa * a && m == fb * b) .minimizing((m,fa,fb) ⇒ m).find.get > 24 Enumerating Complex Values ((t:Tree)⇒ isBinarySearchTree(t) && elems(t)==Set(1,2,3)).findAll.toList > List(Node(Node(L, 1, L), 2, Node(L, 3, L)), Node(Node(Node(L, 1, L), 2, L), 3, L), Node(L, 1, Node(L, 2, Node(L, 3, L)))) Red-Black Trees Implementation: next 30 pages invariants Formalize Invariants (in Scala) sealed abstract class Tree case class Empty() extends Tree case class Node(color: Color, left: Tree, value: Int, right: Tree) extends Tree def bBalanced(t : Tree) : Boolean = t match { case Node(_,l,_,r) => bBalanced(l) && bBalanced(r) && bHeight(l) ==bHeight(r) case Empty() => true } def bHeight(t : Tree) : Int = t match { case Empty() => 1 case Node(Black(), l, _, _) => bHeight(l) + 1 case Node(Red(), l, _, _) => bHeight(l) } def isRBT(t : Tree) : Int = … Insertion in a few lines def insert(x : Int, t : Tree) = ((t1:Tree) => isRBT(t1) && elems(t1) = elems(t) ++ Set(x) ).find Are invariants together with declarative shorter than functional implementation? – not necessarily, – but we know the result satisfies the invariants – RBT only makes sense with invariants Extending the Program Suppose we wish to implement ‘remove’ as well def remove(x : Int, t : Tree) = ((t1:Tree) => isRBT(t1) && elems(t1) = elems(t) -- Set(x) ).find Declarative remove is again just a few lines (keep the same invariants!) declarative knowledge can be more reusable void RBDelete(rb_red_blk_tree* tree, rb_red_blk_node* z){ rb_red_blk_node* y; rb_red_blk_node* x; rb_red_blk_node* nil=tree->nil; rb_red_blk_node* root=tree->root; y= ((z->left == nil) || (z->right == nil)) ? z : TreeSuccessor(tree,z); x= (y->left == nil) ? y->right : y->left; if (root == (x->parent = y->parent)) { /* assignment of y->p to x->p is intentional */ root->left=x; } else { if (y == y->parent->left) { y->parent->left=x; } else { y->parent->right=x; } } if (y != z) { /* y should not be nil in this case */ #ifdef DEBUG_ASSERT Assert( (y!=tree->nil),"y is nil in RBDelete\n"); #endif /* y is the node to splice out and x is its child */ if (!(y->red)) RBDeleteFixUp(tree,x); tree->DestroyKey(z->key); tree->DestroyInfo(z->info); y->left=z->left; y->right=z->right; y->parent=z->parent; y->red=z->red; z->left->parent=z->right->parent=y; if (z == z->parent->left) { z->parent->left=y; } else { z->parent->right=y; } free(z); } else { tree->DestroyKey(y->key); tree->DestroyInfo(y->info); if (!(y->red)) RBDeleteFixUp(tree,x); free(y); } #ifdef DEBUG_ASSERT Assert(!tree->nil->red,"nil not black in RBDelete"); #endif void RBDeleteFixUp(rb_red_blk_tree* tree, rb_red_blk_node* x) { rb_red_blk_node* root=tree->root->left; rb_red_blk_node* w; while( (!x->red) && (root != x)) { if (x == x->parent->left) { w=x->parent->right; if (w->red) { w->red=0; x->parent->red=1; LeftRotate(tree,x->parent); w=x->parent->right; } if ( (!w->right->red) && (!w->left->red) ) { w->red=1; x=x->parent; } else { if (!w->right->red) { w->left->red=0; w->red=1; RightRotate(tree,w); w=x->parent->right; } w->red=x->parent->red; x->parent->red=0; w->right->red=0; LeftRotate(tree,x->parent); x=root; /* this is to exit while loop */ } } else { /* the code below is has left and right switched from above */ w=x->parent->left; if (w->red) { w->red=0; x->parent->red=1; RightRotate(tree,x->parent); w=x->parent->left; } if ( (!w->right->red) && (!w->left->red) ) { w->red=1; x=x->parent; } else { if (!w->left->red) { w->right->red=0; w->red=1; LeftRotate(tree,w); w=x->parent->left; } w->red=x->parent->red; x->parent->red=0; w->left->red=0; RightRotate(tree,x->parent); x=root; /* this is to exit while loop */ } } } x->red=0; #ifdef DEBUG_ASSERT Imperative remove 140 lines of tricky C, reusing existing functions Unreadable without pictures (from Emin Martinian) Key Question When will such declarative programming work? Which constraints can we solve predictably? ( ? ).find Need algorithms for solving constraints over integers, lists, trees, sets, maps, multisets, … Executing Constraints: Then and Now Long tradition – Prolog (inspired by Robinson’s resolution theorem proving) – CLP(R): added linear constraints – Functional Logic Programming (e.g.Curry): narrowing technique then came SAT and Kodkod – Aleksandar Milicevic, Derek Rayside, Kuat Yessenov, Daniel Jackson: Unifying execution of imperative and declarative code. ICSE 2011 – Hesam Samimi, Ei Darli Aung, Todd D. Millstein: Falling Back on Executable Specifications. ECOOP 2010 and SMT - Satisfiability Modulo Theories Implicit Programming : SMT = Prolog : Resolution SMT = SAT + Cooperating Decision Procedures SAT solver cleverly SMT enumerates conjunctions x < y+1 && y < x+1 && x’=f(x) && y’=f(y) && x’=y’+1 linear programing solver x < y+1 y < x+1 x’=y’+1 congruence closure implementation x=y x’=f(x) y’=f(y) x’=y’ 0=1 e.g. Z3 theorem prover (N. Bjorner, L. de Moura) Leon: Constraint Solver of Kaplan = + recursive function definitions • Supports recursive functions and data-types. • Originally developed for program verification with counter-examples. [SAS 2011] • Algorithm is a semi-decision procedure; always terminates when there is a counter-example. • Works as a decision procedure for a well-defined class of functions. [POPL 2010] – tree fold functions f : Tree Set such that |f-1(x)| is large enough “SMT modulo Recursive Functions” Algorithm to find solution for constraint with recursive functions: – Disable recursive call branches. If z3-sat, report SAT – Enable recursive branches, replace the result with fresh constant; if z3-unsat, report UNSAT – If none gives answer, replace call with function body (unroll) and also add function contracts (k-induction) Results for Computing with Constraints send + more = money 1.17 s All red-black trees up to size 7: 27.45 s (comparable to JPF) last list element: Note: domains not bounded upfront Example Results for Verification Unrolling depth Same property also esnures that enumeration often terminates. http://lara.epfl.ch/leon/ Implicit Programming Agenda requirements Advancing 3) Development within an IDE def f(x : Int) = { y=2*x+1 } 2) Compilation and static checking iload_0 iconst_1 iadd 1) Execution on a (virtual) machine 42 Automated reasoning is key enabling technology Synthesis for Arithmetic def secondsToTime(totalSeconds: Int) : (Int, Int, Int) = ((h: Int, m: Int, s: Int) ⇒ ( h * 3600 + m * 60 + s == totalSeconds && h ≥ 0 && m ≥ 0 && m < 60 && s ≥ 0 && s < 60)).find def secondsToTime(totalSeconds: Int) : (Int, Int, Int) = val t1 = val t2 = val t3 = val t4 = Some(t1, t3, t4) ? Choose Notation def secondsToTime(totalSeconds: Int) : (Int, Int, Int) = choose((h: Int, m: Int, s: Int) ⇒ ( h * 3600 + m * 60 + s == totalSeconds && h ≥ 0 && m ≥ 0 && m < 60 && s ≥ 0 && s < 60)) def secondsToTime(totalSeconds: Int) : (Int, Int, Int) = val t1 = val t2 = val t3 = val t4 = Some(t1, t3, t4) ? Starting point: quantifier elimination (QE) • A specification statement of the form r = choose(x ⇒ F( a, x )) “let r be x such that F(a, x) holds” • Corresponds to constructively solving the quantifier elimination (QE) problem ∃ x . F( a, x ) where a are parameters • Witness terms from QE are the synthesized code Methodology QE Synthesis Quantifier Elimination: x. S(x,a) P(a) Synthesis: x. S(x,a) S(t(a),a) • For all QE procedures we examined, we were able to find the corresponding witness terms • One-point rule immediately gives a term x. (x = t(a) && S(x,a)) S(t(a),a) Example for other domains: x. (a1 < x & x < a2) a1 < a1 + 1 & a1 + 1 < a2 t(a1,a2)=a1+1 Synthesis Procedure: Equalities Process equalities first: • compute parametric description of solution set • replace n variables with n-1 h * 3600 + m * 60 + s == totalSeconds s = totalSeconds - h * 3600 - m * 60 In general we obtain divisibility constraints – use Extended Euclid’s Algorithm, matrix pseudo inverse in Z Synthesis Procedure: Inequalities • Solve for one by one variable: – separate inequalities depending on polarity of x: Ai ≤ αix βjx ≤ Bj – define terms a = maxi⌈Ai/αi⌉ and b = minj⌈Bj/ βj⌉ • If b is defined, return x = b else return x = a • Further continue with the conjunction of all formulas ⌈Ai/αi⌉ ≤ ⌈Bj/ βj⌉ • Similar to Fourier-Motzkin elimination (remove floor and ceiling using divisibility) 26 2y − b ≤ 3x + a ∧ 2x − a ≤ 4y + b 2y − b − a ≤ 3x ∧ 2x ≤ 4y + a + b 4y − 2b − 2a ≤ 6x ≤ 12y + 3a + 3b (4y − 2b − 2a) / 6 ≤ x ≤ (12y + 3a + 3b) / 6 (4y − 2b − 2a) / 6 ≤ ⌊(12y + 3a + 3b) / 6⌋ two extra variables: (4y − 2b − 2a) / 6 ≤ l ∧ 12y + 3a + 3b = 6 ∗ l + k ∧0≤k≤5 4y − 2b − 2a ≤ 6 * l ∧ 12y + 3a + 3b = 6 ∗ l + k ∧0≤k≤5 pre: 6|3a + 3b − k 4y − 2b − 2a ≤ 12y + 3a + 3b – k − 5b − 5a +k ≤ 8y y = ⌈(k − 5a − 5b)/8⌉ 27 Generated Code Contains Loops val (x1, y1) = choose(x: Int, y: Int => 2*y − b =< 3*x + a && 2*x − a =< 4*y + b) val kFound = false for k = 0 to 5 do { val v1 = 3 * a + 3 * b − k if (v1 mod 6 == 0) { val alpha = ((k − 5 * a − 5 * b)/8).ceiling val l = (v1 / 6) + 2 * alpha val y = alpha val kFound = true break } } if (kFound) val x = ((4 * y + a + b)/2).floor else throw new Exception(”No solution exists”) 28 Result of Synthesis def secondsToTime(totalSeconds: Int) : (Int, Int, Int) = choose((h: Int, m: Int, s: Int) ⇒ ( h * 3600 + m * 60 + s == totalSeconds && h ≥ 0 && m ≥ 0 && m < 60 && s ≥ 0 && s < 60 )) def secondsToTime(totalSeconds: Int) : (Int, Int, Int) = val t1 = totalSeconds div 3600 val t2 = totalSeconds -3600 * t1 val t3 = t2 div 60 val t4 = totalSeconds - 3600 * t1 - 60 * t3 (t1, t3, t4) Implemented as an extension of the Scala compiler. Example: Date Conversion in C Knowing number of days since 1980, find current year and day BOOL ConvertDays(UINT32 days) { year = 1980; while (days > 365) { if (IsLeapYear(year)) { if (days > 366) { days -= 366; year += 1; } } else { days -= 365; year += 1; } ... } Enter December 31, 2008 All music players (of a major brand) froze in the boot sequence. Implicit Programming for Date Conversion Knowing number of days since 1980, find current year and day val origin = 1980 @spec def leapsTill(y : Int) = (y-1)/4 - (y-1)/100 + (y-1)/400 let (year1, day1)=choose( (year:Int, day:Int) => { days == (year-origin)*365 + leapsTill(year)-leapsTill(origin) + day && 0 < day && day <= 366 print(year1, day1) }) Analysis and termination simpler than with loop Properties of Synthesis Algorithm • For every formula in linear integer arithmetic – synthesis algorithm terminates – produces the most general precondition (assertion saying when result exists) – generated code gives correct values whenever correct values exist • If there are multiple or no solutions for some parameters, we get a warning Compile-time warnings def secondsToTime(totalSeconds: Int) : (Int, Int, Int) = choose((h: Int, m: Int, s: Int) ⇒ ( h * 3600 + m * 60 + s == totalSeconds && h ≥ 0 && h < 24 && m ≥ 0 && m < 60 && s ≥ 0 && s < 60 )) Warning: Synthesis predicate is not satisfiable for variable assignment: totalSeconds = 86400 Compile-time warnings def secondsToTime(totalSeconds: Int) : (Int, Int, Int) = choose((h: Int, m: Int, s: Int) ⇒ ( h * 3600 + m * 60 + s == totalSeconds && h ≥ 0 && m ≥ 0 && m ≤ 60 && s ≥ 0 && s < 60 )) Warning: Synthesis predicate has multiple solutions for variable assignment: totalSeconds = 60 Solution 1: h = 0, m = 0, s = 60 Solution 2: h = 0, m = 1, s = 0 Arithmetic pattern matching def fastExponentiation(base: Int, power: Int) : Int = { def fp(m: Int, b: Int, i: Int): Int = i match { case 0 ⇒ m case 2 * j ⇒ fp(m, b*b, j) case 2 * j + 1 ⇒ fp(m*b, b*b, j) } fp(1, base, p) } • Goes beyond Haskell’s (n+k) patterns • Compiler checks that all patterns are reachable and whether the matching is exhaustive Synthesis for parametrized arithmetic def decomposeOffset(offset: Int, dimension: Int) : (Int, Int) = choose((x: Int, y: Int) ⇒ ( offset == x + dimension * y && 0 ≤ x && x < dimension )) • The predicate becomes linear at run-time • Synthesized program must do case analysis on the sign of the input variables • Some coefficients are computed at run-time Synthesis for (multi)sets (BAPA) def splitBalanced[T](s: Set[T]) : (Set[T], Set[T]) = choose((a: Set[T], b: Set[T]) ⇒ ( a union b == s && a intersect b == empty && a.size – b.size ≤ 1 && b.size – a.size ≤ 1 )) def splitBalanced[T](s: Set[T]) : (Set[T], Set[T]) = val k = ((s.size + 1)/2).floor val t1 = k s val t2 = s.size – k val s1 = take(t1, s) val s2 = take(t2, s minus s1) (s1, s2) a b NP-Hard Constructs • Divisibility combined with inequalities: – corresponding to big disjunction in q.e. , we will generate a for loop with constant bounds (could be expanded if we wish) • Disjunctions – Synthesis of a formula computes program and exact precondition of when output exists – Given disjunctive normal form, use preconditions to generate if-then-else expressions (try one by one) Synthesis for Disjunctions More on this Synthesis Approach • Software Synthesis Procedures, Communications of the ACM, February 2012 • STTT 2012 • PLDI 2010 Ongoing work – Use it to compile certain Kaplan specifications – Synthesis for further theories Implicit Programming Agenda requirements Advancing 3) • Development within an IDE 2) • Compilation and static checking def f(x : Int) = { y=2*x+1 } iload_0 iconst_1 iadd 1) • Execution on a (virtual) machine Automated reasoning is key enabling technology 42 Type-Driven Synthesis within an IDE Type-Driven Synthesis within an IDE Synthesis Approach Outline ………………………… ………………………… ………………………… ………………………… ………………………… ………………………… ………………………… ………………………… …………… source code in editor Extract: - visible symbols - expected type Program point (cursor) pre-computed weights Create type environment: - encode subtypes - assign initial weights Search algorithm with weights (generates intermediate expressions and their types) Ranking w/ Tihomir Gvero and Ruzica Piskac, CAV’11 5 suggested expressions Search Algorithm • Semi-decidable problem (cf. Leon) • Related to proof search in intuitionistic logic • Weights are an important additional aspect: – in our application we find many solutions quickly – must select interesting ones – interesting = small weight – algorithm with weights preserves completeness • Identified decidable (even polynomial) cases, in the absence of generics and subtyping Results for Expression Suggestion Benchmark Length #Initial #Derived #Snip.Gen. Rank Time [ms] ByteArrayInputStreambytebufintoffsetintlength 4 22 4049 102 3 546 CharArrayReadercharbuf 3 26 782 343 1 546 HashSetiterator 2 60 1832 201 1 546 Hashtableelements 2 32 869 445 1 546 HashtableentrySet 2 31 874 441 1 546 HashtablekeySet 2 32 968 492 3 546 Hashtablekeys 2 30 818 477 2 515 PriorityQueuepoll 2 27 1208 363 1 562 Examples demonstrating API usage Remove 1-2 lines; ask tool to synthesize it Time budget for search: 0.5 seconds Generates up to 1000s of well-formed and well-typed expressions Displays top 5 of them The one that was removed appeared as rank 1, 2, or 3 Similarly good behavior in over 50% of the 120 examples evaluated Other Topics • Specification decision procedures, including MAPA, BAPA, extensions, term powers (Piskac, Yessenov) • Synthesis based on automata (Jobstmann, Spielmann) • Static analysis and verification – Jahob (Rinard, Zee) – Static analysis of PHP (Kneuss, Suter), Scala (Kneuss,Suter), C (Vujosevic-Janicic) – Predicate abstraction with interpolation (Hojjat, Ruemmer, Iosif) • Numerical computation analysis (Darulova) • Collaborations on distributed systems (Kostic), speculative linearizability (Guerraoui), testing (Marinov) Implicit Programming, Explicit Design Explicit = written down, machine readable Implicit = omitted, to be (re)discovered Current practice: – explicit program code – implicit design (key invariants, properties, contracts) A lot of hard work exists on verification and analysis: checking design against given code and recovering (reverse engineering) implicit invariants. Our goal: – explicit design – implicit program Total work of developer is not increased! Moreover: – can be decreased for certain types of specifications – confidence in correctness higher – program is spec Conclusions 4) more human-oriented programming, empowering user to program requirements Advancing 3) Development within an IDE: synthesize entire expressions 2) Compilation and static checking: transform spec into program 1) Execution on a (virtual) machine: use and extend SMT solvers 3) choose x,y,z,n=> xn + yn = zn 2) SMT solver 1) http://lara.epfl.ch/~kuncak 42