From Verification to Synthesis Sumit Gulwani sumitg@microsoft.com Microsoft Research, Redmond August 2013 Marktoberdorf Summer School Lectures: Part 1 Synthesis Goal: Synthesize a computational concept in some underlying language from user intent using some search technique. State of the art: We can synthesize programs of size 10-20. 1 Dimensions in Synthesis • Language (Application) – Programs • Straight-line programs – Automata – Queries • User Intent (Ambiguity) – Logic, Natural Language – Examples, Demonstrations/Traces – Program • Search Technique (Algorithm) – SAT/SMT solvers (Formal Methods) – A*-style goal-directed search (AI) – Version space algebras (Machine Learning) PPDP 2010: “Dimensions in Program Synthesis”, Gulwani. 2 Compilers vs. Synthesizers Dimension Compilers Synthesizers Concept Language Executable Program Variety of concepts: Program, Automata, Query, Sequence User Intent Structured language Variety/mixed form of constraints: logic, examples, traces Search Technique Syntax-directed Uses some kind of search translation (No new (Discovers new algorithmic algorithmic insights) insights) 3 • From verification to synthesis – – – – – Bitvector algorithms (PLDI 2011, ICSE 2012) General loopy programs (POPL 2010) SIMD algorithms (PPoPP 2013) Program inverses (PLDI 2011) Graph algorithms (OOPSLA 2010) • End-user Programming (Examples & Natural Language) – – – – Syntactic string transformations: Flash Fill (POPL 2011) Semantic string transformations (VLDB 2012) Table layout transformations (PLDI 2011) Smartphone scripts (MobiSys 2013) • Computer-aided Education – – – – Problem Synthesis (AAAI 2012, CHI 2013) Solution Synthesis (PLDI 2011, IJCAI 2013) Feedback Synthesis (PLDI 2013, IJCAI 2013) Content Authoring (CHI 2012) 4 From Verification to Synthesis Application Generating Synthesis Solving Synthesis Constraint Constraint Bitvector Loopy Alg. SIMD Location variables CEGIS + SMT Template-based SMT Relational verification CEGIS + Reachability value graph Template-based + SMT symbolic execution Inverses Graph Alg. Reference: Path-based Inductive Synthesis for Program Inversion, PLDI 2011, Srivastava, Gulwani, Chaudhuri, Foster 5 Dimensions in Synthesis • Language – Programs • Straight-line programs – Automata – Queries • User Intent – Logic, Natural Language – Examples, Demonstrations/Traces – Program • Search Technique – SAT/SMT solvers (Formal Methods) – A*-style goal-directed search (AI) – Version space algebras (Machine Learning) 6 Program Inversion: Example In-place run-length encoding: A = [1,1,1,0,0,2,2,2,2] Encoder A=[1,0,2] N=[3,2,4] Decoder A’=[1,1,1,0,0,2,2,2,2] IN(A,n); Assume (n >= 0) i, m := 0, 0; // parallel assignment while (i<n) r := 1; while (i+1<n && A[i]=A[i+1]) r, i := r+1, i+1; A[m], N[m], m, i := A[i], r, m+1, i+1; OUT(A,N,m); IN(A,N,m) i’, m’ := 0, 0; while (m’ < m) r’ := N[m’]; while (r’>0) r’,i’, A’[i’] := r’-1, i’+1, A[m’]; m’ := m’+1; OUT(A’,m’); assert(A’=A; m’=n); 7 Program Inversion as Synthesis Problem In-place run-length encoding: A = [1,1,1,0,0,2,2,2,2] Encoder A=[1,0,2] N=[3,2,4] Decoder A’=[1,1,1,0,0,2,2,2,2] E = { 0, 1, m’±1, r’±1, i’±1, A[i’], A[m’], N[m’] } P = { m’<m, r’>0, A’[i’]= A’[i’+1] } IN(A,n); Assume (n >= 0) i, m := 0, 0; // parallel assignment while (i<n) r := 1; while (i+1<n && A[i]=A[i+1]) r, i := r+1, i+1; A[m], N[m], m, i := A[i], r, m+1, i+1; OUT(A,N,m); IN(A,N,m) i’, m’ := e1, e2; // ei ∈ E while (p1) // pi ∈ P r’ := e3; while (p2) r’,i’, A’[e4] := e5, e6, e7; m’ := e8; OUT(A’,m’); Assert(A’=A; m’=n); 8 Synthesis Technique • Inductive invariant required to establish correctness are too sophisticated. • We use symbolic execution to generate verification condition for correctness on certain paths in the original and the inverted program. • This generates constraints of the form ∃ 𝑒𝑖 , 𝑝𝑖 ∀𝑉 (𝜙1 𝑉, 𝑒𝑖 , 𝑝𝑗 ∧ ⋯ ∧ 𝜙𝑘 𝑉, 𝑒𝑖 , 𝑝𝑗 ) 9 Related Work: Program Sketching Reference: Program Synthesis by Sketching, Phd Thesis 2008, Armando Solar-Lezama (Advisor: Ras Bodik @ UC-Berkeley) • Key Ideas: – Write an arbitrary program with holes, where each hole takes values from a finite domain. – Use CEGIS to generate SAT constraints on holes. • Cons: Not as efficient as domain-specific synthesizers. – (On bitvector benchmark, times out on 9/25 tasks, and on the remaining it is slower by 20x on average). • Pros: – A very powerful formalism that can be used to model a variety of synthesis problems. – Sees synthesis as an interactive process. 10