Dimensions in Program Synthesis
Sumit Gulwani (sumitg@microsoft.com)
Microsoft Research, Redmond
Invited Tutorial, FMCAD 2010, Lugano, Switzerland

Automated Program Synthesis
• What is Program Synthesis?
  – Synthesize an executable program from user intent expressed in the form of some constraints.
  – Compilers: structured language input; syntax-directed translation; no new algorithmic insights.
  – Synthesizers: can accept a variety/mixed form of constraints (e.g., logic, examples, traces, partial programs); use some kind of search; discover new algorithmic insights.
• Why today?
  – Natural goal given that computing has become accessible, but:
    • fundamental "how" programming models have not changed.
    • most people are not expert programmers.
  – Enabling technology is now available:
    • Better search/logical reasoning techniques (SAT/SMT solvers)
    • Faster machines (good application for multi-cores)

Program Synthesis: Recent Success
Our techniques can synthesize a wide variety of algorithms/programs from logic and examples.
• Undergraduate book algorithms (e.g., sorting, dynamic programming)
  – [Srivastava/Gulwani/Foster, POPL 2010]
• Program Inverses (e.g., deserializers from serializers)
  – [Srivastava/Gulwani/Chaudhuri/Foster, MSR-TR-2010-34]
• Graph Algorithms (e.g., bipartiteness check)
  – [Itzhaky/Gulwani/Immerman/Sagiv, OOPSLA 2010]
• Bit-vector algorithms (e.g., turn-off rightmost one bit)
  – [Jha/Gulwani/Seshia/Tiwari, ICSE 2010]
• String Manipulating macros (e.g., "Helmut Veith" -> "Veith, H.")
  – [Gulwani, POPL 2011]
• Geometry Constructions (e.g., construct a regular
hexagon given a side)
  – [Gulwani/Korthikanti/Tiwari, recent work]

Outline
• Dimension 1: User Intent
• Dimension 2: Search Space
• Dimension 3: Search Technique
• Applications
  – Bit-vector Algorithms
  – String Manipulation Macros
  – Geometry Constructions

Dimension 1: User Intent
• Logical specifications
  – Logical relations between inputs and outputs
• Natural language
• Input-output examples
• Traces
• Programs

Logical Specification: Example 1
Problem: Sorting
Logical relation between input array A and output array B of size n:
  ∀k: (0 ≤ k < n−1) ⇒ (B[k] ≤ B[k+1])
  ∧ ∀k ∃j: (0 ≤ k < n−1) ⇒ (0 ≤ j < n ∧ B[j] = A[k])

Natural Language
• Advances in NLP allow mapping natural language to logic.
  – NL interfaces have been designed for database queries.
• Natural language can be ambiguous.
  – This issue can be resolved by paraphrasing.

Input-Output Examples
• Advantages of input-output examples
  – Easy to provide; no need to remember syntax
  – Less chance of mistakes
• What prevents a trivial table-lookup solution on input-output pairs (xi, yi)?
      switch x
        case x1: return y1
        case x2: return y2
        ...
        case xn: return yn
  – Restriction on the search space!
• How to select examples?
  – In an interactive manner!

Interaction Model
1. User provides a few input-output examples I.
2. Synthesizer constructs a program P consistent with I.
3. Process may be repeated after adding new examples.
  – User-driven Interaction
    • User tests the program on other inputs I'. If a discrepancy is found, user provides a new input-output example.
  – Synthesizer-driven Interaction
    • If synthesizer finds another program P' consistent with I, it asks user to provide the output for a distinguishing input.
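The synthesizer-driven interaction above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four-entry `CANDIDATES` table stands in for a real search space, the function `user` plays the role of the person answering queries, and the names `consistent`, `distinguishing_input`, and `synthesize` are hypothetical, not taken from any tool described in these slides.

```python
WIDTH = 5
MASK = (1 << WIDTH) - 1

# Hypothetical search space: four hand-picked 5-bit candidate programs.
CANDIDATES = {
    "(x+1) & (x-1)": lambda x: ((x + 1) & (x - 1)) & MASK,
    "(x+1) & x": lambda x: ((x + 1) & x) & MASK,
    "x & (x-1)": lambda x: (x & (x - 1)) & MASK,
    "x & (1 + (x | (x-1)))": lambda x: (x & (1 + (x | (x - 1)))) & MASK,
}


def consistent(examples):
    """Names of candidates that agree with every (input, output) example."""
    return [name for name, f in CANDIDATES.items()
            if all(f(i) == o for i, o in examples)]


def distinguishing_input(names):
    """Smallest input on which the named candidates disagree, or None."""
    for x in range(1 << WIDTH):
        if len({CANDIDATES[n](x) for n in names}) > 1:
            return x
    return None


def synthesize(oracle, examples):
    """Query the oracle on distinguishing inputs until one behaviour remains."""
    examples = list(examples)
    while True:
        live = consistent(examples)
        if not live:
            return None, examples          # search space too small
        x = distinguishing_input(live)
        if x is None:
            return live[0], examples       # all survivors behave identically
        examples.append((x, oracle(x)))    # "What should x be mapped to?"


def user(x):
    """Stand-in for the user: clears the rightmost contiguous run of 1-bits."""
    lowest = x & -x                        # lowest set bit (0 when x == 0)
    return x & (x + lowest) & MASK


program, asked = synthesize(user, [(0b01011, 0b01000)])
```

Starting from the single example 01011 -> 01000, the loop only asks about inputs on which the surviving candidates disagree, so every answer eliminates at least one candidate.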
Typically few iterations are required:
  – small teaching dimension [Goldman, Kearns, JCSS '95]
  – low Kolmogorov (descriptive) complexity

Traces
• A detailed step-by-step description of how the program should behave on a given input.
• Easier for a synthesizer to deal with than input-output examples.
  – Some synthesizers that accept input-output examples first generate traces.
• A natural model in certain applications.
  – Programming-by-demonstration systems for end-users.
    • Intermediate states resulting from the user's successive actions on a user interface constitute a valid trace.
  – Reverse engineering.
• Convenient in certain scenarios.
  – E.g., consider describing Factorial(7).
    • Trace: 7*6*5*4*3*2, or Recursive Trace: 7*Factorial(6)
    • Final simplified output: 5040

Dimension 2: Search Space
• Programs
  – Operators
    • Comparison operators
    • Combination of arithmetic and bitwise operators
    • APIs exported from a given library
      – SortList = Array2List ◦ SortArray ◦ List2Array
  – Control-flow
    • Given looping template
    • Bounded number of statements
    • Partial program with holes
    • Straight-line or loop-free

Loop-free Programs
Parameterized by the set of components/operators used.
• Bitvector Algorithms
  – Components = arithmetic + bitwise operators
• String Manipulation (or Text Editing) Macros
  – Components = editing commands (insert, locate, select, delete)
• Geometrical Constructions
  – Components = ruler + compass
• Unbounded data type manipulation
  – Components = linear arithmetic/set operators
  – [PLDI '10, Viktor Kuncak et al, Complete Functional Synthesis]
• API call sequences [PLDI '05, Bodik et al, Jungloid Mining]
  – Components = API calls
Can be likened to putting together jigsaw puzzle pieces.
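To make "loop-free programs parameterized by a set of components" concrete, here is a minimal sketch that enumerates straight-line programs over a tiny component library and returns the first one consistent with the given input-output examples. The library `COMPONENTS`, the 8-bit width, and the helper names `run`, `programs`, and `synthesize` are assumptions made for illustration. Plain enumeration is used here only because it is short; it is not claimed to be how the cited tools search.

```python
from itertools import product

WIDTH = 8
MASK = (1 << WIDTH) - 1

# Hypothetical component library: name -> (arity, function on 8-bit values).
COMPONENTS = {
    "and": (2, lambda a, b: a & b),
    "or": (2, lambda a, b: a | b),
    "dec": (1, lambda a: (a - 1) & MASK),
    "inc": (1, lambda a: (a + 1) & MASK),
}


def run(program, x):
    """Execute a straight-line program; value 0 is the input, value i is
    the result of statement i. The last value is the output."""
    values = [x]
    for name, args in program:
        _, fn = COMPONENTS[name]
        values.append(fn(*(values[i] for i in args)) & MASK)
    return values[-1]


def programs(length):
    """Yield every straight-line program with exactly `length` statements."""
    def extend(prefix):
        if len(prefix) == length:
            yield list(prefix)
            return
        available = len(prefix) + 1        # input plus earlier results
        for name, (arity, _) in COMPONENTS.items():
            for args in product(range(available), repeat=arity):
                yield from extend(prefix + [(name, args)])
    yield from extend([])


def synthesize(examples, max_len=3):
    """Shortest program (in enumeration order) consistent with the examples."""
    for length in range(1, max_len + 1):
        for candidate in programs(length):
            if all(run(candidate, i) == o for i, o in examples):
                return candidate
    return None


# One example of "turn off the rightmost 1-bit".
found = synthesize([(0b10101100, 0b10101000)])
```

With this library, the program found is the two-statement jigsaw "decrement the input, then AND it with the input", i.e. x & (x-1).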
Dimension 2: Search Space
• Programs
  – Operators
  – Control-flow
• Grammars
  – Examples: Regular Expressions, DFAs, NFAs, Context-Free Grammars, Regular Transducers
  – Applications: robotics/control systems, pattern recognition, computational linguistics/biology, data compression/mining, etc.
• Logics
  – First-order logic + Fixed point
    • = PTIME algorithms over ordered structures such as strings, graphs
    • E.g., Graph Classifiers: Bipartite, Acyclic, Connected; Graph Computations: Shortest Path, Cycle, 2-coloring

Program Synthesis Techniques
• Exhaustive search
  – Usually requires non-trivial optimizations.
  – Geometry algorithms, Mutual Exclusion algorithms
• Reduction to SAT/SMT constraints
  – Can leverage engineering advances in recent off-the-shelf solvers.
  – Bit-vector algorithms, Graph algorithms, Program inverses
• Version space algebra
  – Data structures to efficiently represent and manipulate huge sets of programs consistent with given observations.
  – String Manipulation macros
• Machine Learning
  – Bayesian Learning, Belief Propagation
  – QBF Solving

Application 1: Bitvector Algorithms
• Search Space Dimension: Straight-line programs that use
  – Arithmetic Operators: +, -, *, /
  – Logical Operators: Bitwise and/or/not, Shift left/right
• User Intent Dimension
  – Logical specifications
  – Input-output examples
• Search Algorithm Dimension: SAT/SMT based techniques
• Significance Dimension: Algorithm Designers
  [Figure: Consumers of Program Synthesis Technology: Algorithm Designers]

Examples of Bitvector Algorithms
Turn-off rightmost 1-bit: Z & (Z-1)
    Z          = 10101100
    Z-1        = 10101011
    Z & (Z-1)  = 10101000
Turn-off rightmost contiguous sequence of 1-bits: Z & (1 + (Z | (Z-1)))
    Z = 10101100  ->  10100000
Ceil of average of two integers without overflowing: (Y|Z) - ((Y⊕Z) >> 1)

Examples of Bitvector Algorithms
P24: Round up to next highest power of 2
    o1 := sub(x,1);
    o2 := shr(o1,1);
    o3 := or(o1,o2);
    o4 := shr(o3,2);
    o5 := or(o3,o4);
    o6 := shr(o5,4);
    o7 := or(o5,o6);
    o8 := shr(o7,8);
    o9 := or(o7,o8);
    o10 := shr(o9,16);
    o11 := or(o9,o10);
    res := add(o11,1);
P25: Higher order half of product of x and y
    o1 := and(x,0xFFFF);
    o2 := shr(x,16);
    o3 := and(y,0xFFFF);
    o4 := shr(y,16);
    o5 := mul(o1,o3);
    o6 := mul(o2,o3);
    o7 := mul(o1,o4);
    o8 := mul(o2,o4);
    o9 := shr(o5,16);
    o10 := add(o6,o9);
    o11 := and(o10,0xFFFF);
    o12 := shr(o10,16);
    o13 := add(o7,o11);
    o14 := shr(o13,16);
    o15 := add(o14,o12);
    res := add(o15,o8);
[ICSE 2010] Joint work with Susmit Jha, Sanjit Seshia (UC-Berkeley), Ashish Tiwari (SRI) and Venkie (MSR Redmond)

Experiments: Comparison with Exhaustive Search
    Program   Lines   Brahma iters   Brahma time   AHA time
    P1        2       2              3             0.1
    P2        2       3              3             0.1
    P3        2       3              1             0.1
    P4        2       2              3             0.1
    P5        2       3              2             0.1
    P6        2       2              2             0.1
    P7        3       2              1             2
    P8        3       2              1             1
    P9        3       2              6             7
    P10       3       14             76            10
    P11       3       7              57            9
    P12       3       9              67            10
    P13       4       4              6             X
    P14       4       4              60            X
    P15       4       8              119           X
    P16       4       5              62            X
    P17       4       6              78            109
    P18       6       5              46            X
    P19       6       5              35            X
    P20       7       6              108           X
    P21       8       5              28            X
    P22       8       8              279           X
    P23       10      8              1668          X
    P24       12      9              224           X
    P25       16      11             2779          X

Functional Specification
• Choice 1: Logical relation between inputs and outputs
• Choice 2: Input-Output Examples

Functional Specification: Logical Relations
Problem: Turn off rightmost 1-bit
Functional specification of desired behavior (input I, output J, b bits):
    ⋀(p=1..b) [ ( I[p]=1 ∧ ⋀(j=p+1..b) I[j]=0 ) ⇒ ( J[p]=0 ∧ ⋀(j≠p) J[j]=I[j] ) ]
Tool Output: J = I & (I-1)

Functional Specification
Problem: Turn off rightmost contiguous string of 1-bits
• Logical Relations
  – A bit complicated
• Input-Output Examples
  – Key challenge is to resolve under-specification
  – Our solution: Interaction with user

Dialog: Interactive Synthesis
Problem: Turn-off rightmost contiguous string of 1's
User: I want a design that maps 01011 -> 01000
Oracle: I can think of two designs
    Design 1: (x+1) & (x-1)
    Design 2: (x+1) & x
  which differ on 00000 (Distinguishing Input).
  What should 00000 be mapped to?
User: 00000 -> 00000

Dialog: Interactive Synthesis (continued)
Problem: Turn-off rightmost contiguous string of 1's
User: 01011 -> 01000
Oracle: 00000 ?
User: 00000
Oracle: 01111 ?
User: 00000
Oracle: 00110 ?
User: 00000
Oracle: 01100 ?
User: 00000
Oracle: 01010 ?
User: 01000
Oracle: Your design is x & (1 + ((x-1)|x))

Application 2: String Manipulation Macros
• Search Space Dimension: Programs with conditionals/loops
  – String operations: Concatenate, Substring
  – Logical operations: comparison involving # of occurrences of a regular expression
• User Intent Dimension: Input-output examples
• Search Algorithm Dimension: Combination of
  – Version Space Algebras
  – Machine Learning
• Significance Dimension: End-users
  [Figure: Consumers of Program Synthesis Technology: Algorithm Designers, Software Developers, End-Users (most useful target)]

Methodology: Automating end-user programming
1. Identify tasks that end-users struggle with and identify how they can effectively communicate their intent.
  – Read help-forums and blogs.
  – Interview real users.
2. Design a language that satisfies the following trade-off.
  – Expressive enough to express a lot of tasks.
  – Small enough to allow efficient learning.
3. Design a learning algorithm with the following features.
  – Interactive with fast convergence (with success or failure).
  – Provides feedback.
  – Noise tolerant.
Joint work with: Bill Harris (UW, Madison), Rishabh Singh (MIT)

Synthesis Algorithm for String Programs
• Language L of programs contains regular expressions, conditionals and loops.
• Goal: Given input-output pairs (i1, o1), (i2, o2), (i3, o3), (i4, o4), compute the set of all programs P such that
  – P(i1)=o1, P(i2)=o2, P(i3)=o3, P(i4)=o4
  – Then, choose the simplest such P.

Synthesis Algorithm for String Programs
1. Compute the set D1 of all straight-line programs s.t. for any Q in D1, Q(i1) = o1. Similarly compute D2, D3, D4.
2. Let D = D1 ∩ D2 ∩ D3 ∩ D4. If D ≠ ∅ then done. Else:
  (a) Find a smallest partition, say {D1,D3}, {D2,D4}, s.t.
      D1 ∩ D3 ≠ ∅ and D2 ∩ D4 ≠ ∅
  (b) Learn a boolean classifier b that maps i1 and i3 to true and i2 and i4 to false.
3. Desired set of programs is: if (b) then D1 ∩ D3 else D2 ∩ D4

Application 3: Geometry Constructions
• Search Space Dimension: Straight-line programs
  – Operations: Ruler, Compass
• User Intent Dimension: Logical specifications
  – Can be obtained from natural language
  – Is further translated into a random input-output example
• Search Algorithm Dimension: Exhaustive Search
  – Property Testing
  – Goal-directed search
  – Commonly used library of constructions
• Significance Dimension: Students and Teachers
  [Figure: Consumers of Program Synthesis Technology: Algorithm Designers, Software Developers, End-Users (most useful target), Students and Teachers (most transformational target)]

Automating Education
Make education interactive and fun
• Automated problem solving (for students)
  – Provide hints
  – Point out mistakes and suggest fixes
• Creation of teaching material (for teachers)
  – Authoring tools
  – Problem construction
• Group interaction (for teachers/students)
  – Ask questions
  – Share annotations
Domains: Geometry, Algebra, Probability, Mechanics, Electrical Circuits, etc.

Geometry Constructions Domain
What is the role of PL + Logic + Synthesis?
• Programming Language for Geometry
  – Objects: Point, Line, Circle
  – Constructors
    • Ruler(Point, Point) -> Line
    • Compass(Point, Point) -> Circle
    • Intersect(Circle, Circle) -> Pair of Points
    • Intersect(Line, Circle) -> Pair of Points
    • Intersect(Line, Line) -> Point
• Logic for Geometry
  – Inequality predicates over arithmetic expressions
    • Distance(Point, Point), Angle(Line, Line), …
• Automated Problem Solving
  – Given pre/postcondition, synthesize a straight-line program

Geometry Domain: Automated Problem Solving
• Given pre/postcondition, synthesize a straight-line program
Example: Draw a line L' perpendicular to a given line L.
Precondition: true
Postcondition: Angle(L',L) = 90
Program
    Step 1: P1, P2 = ChoosePoint(L);
    Step 2: C1 = Circle(P1,P2);
    Step 3: C2 = Circle(P2,P1);
    Step 4: <P3, P4> = Intersect(C1,C2);
    Step 5: L' = Line(P3,P4);

Constructing line L' perpendicular to given line L
[Figure: the five-step construction, showing line L through P1 and P2, circles C1 and C2, their intersection points P3 and P4, and the line L' through P3 and P4]

Examples of Geometry Constructions
• Bisect a given line.
• Bisect an angle.
• Copy an angle.
• Draw a line parallel to a given line.
• Draw an equilateral triangle given two points.
• Draw a regular hexagon given a side.
• Given 4 points, draw a square with each of the sides passing through a different point.
Other Applications:
• New approximate geometric constructions
• 2D/3D planning problems

Synthesis Algorithm for Geometry Constructions
• Synthesis, in general, is harder than verification.
  – Synthesis Problem: Given pre/postcondition, synthesize a straight-line program.
  – Verification Problem: Given pre/postcondition and a straight-line program, determine whether the Hoare triple holds.
    Precondition: True
    Postcondition: Angle(L,L') = 90
    Step 1: P1, P2 = ChoosePoint(L);
    Step 2: C1 = Circle(P1,P2);
    Step 3: C2 = Circle(P2,P1);
    Step 4: <P3, P4> = Intersect(C1,C2);
    Step 5: L' = Line(P3,P4);
• Decision procedures for verification of geometry constructions are known, but are complex.
  – Because of symbolic reasoning.

A simpler strategy for verification of Constructions
• Symbolic reasoning based decision procedures are complex.
• How about property testing?
Theorem: A construction that works (i.e., satisfies the postcondition) for a randomly chosen model of the precondition also works for all models (w.h.p.).
Proof:
• Objects constructed using ruler/compass can be described using polynomial ops (+, -, *), square-root and division operators.
• The randomized polynomial identity testing algorithm lifts to square-root and division operators as well!

Randomized Polynomial Identity Testing
• Problem: Given two polynomials P1 and P2, determine whether they are equivalent.
• The naïve deterministic algorithm of expanding polynomials to compare them term-wise is exponential.
• A simple randomized test is probabilistically sufficient:
  – Choose random values r for polynomial variables x.
  – If P1(r) ≠ P2(r), then P1 is not equivalent to P2.
  – Otherwise P1 is equivalent to P2 with high probability.

Synthesis Algorithm for Geometry Constructions
Problem: Symbolic reasoning is hard.
Idea #1: Leverage Property Testing to reduce symbolic reasoning to concrete reasoning.
• Construct a random input-output example (I,O) for the problem and find a construction that can generate O from I.
• Example: Construct incenter of a triangle.
  – If I chose my input triangle to be an equilateral one, then the circumcenter construction also appears to work!
    • Since incenter = circumcenter for an equilateral triangle.
  – But what are the chances of a randomly chosen triangle being equilateral?
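The randomized identity test described above can be sketched as follows. This is a minimal illustration for plain polynomials only; it does not cover the lifting to square-root and division mentioned in the proof. The function name `probably_equal`, its parameters, and the example polynomials are assumptions made for this sketch. The guarantee it relies on is the standard Schwartz-Zippel bound: a nonzero polynomial of total degree d evaluates to zero at a uniformly random point drawn from a set S with probability at most d/|S|.

```python
import random


def probably_equal(p1, p2, num_vars, trials=20, bound=2**31, rng=None):
    """Randomized identity test for two polynomials given as callables.

    Returns False as soon as a random point separates them (a definite
    answer), and True if no such point was found in `trials` attempts
    (equivalent only with high probability).
    """
    rng = rng or random.Random()
    for _ in range(trials):
        point = [rng.randrange(bound) for _ in range(num_vars)]
        if p1(*point) != p2(*point):
            return False
    return True


def lhs(x, y):
    return (x + y) ** 2


def rhs(x, y):
    return x * x + 2 * x * y + y * y


def wrong(x, y):
    return x * x + y * y          # missing the 2xy cross term


same = probably_equal(lhs, rhs, num_vars=2, rng=random.Random(0))
different = probably_equal(lhs, wrong, num_vars=2, rng=random.Random(0))
```

Python integers are exact, so no rounding can mask a difference here. The equilateral-triangle example is the geometric analogue of an unlucky draw: a special input on which two different constructions happen to agree.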
Synthesis Algorithm for Geometry Constructions
Exhaustive Search Strategy: Given input objects I and desired objects O, keep constructing new objects from I using ruler and compass until objects O are obtained.
Problem: Search blows up, i.e., too many (useless) objects get constructed.
  – Example: n points lead to O(n^2) lines, which lead to O(n^4) points, and so on…

Synthesis Algorithm for Geometry Constructions
Problem: Search space is huge.
• Idea #2: Perform goal-directed reasoning.
  – Example: If an operation leads to construction of a line L that passes through a desired output point, it is worthwhile constructing line L.
  – Mimics human intelligence.
  – For this to be effective, we need solutions with small depth.
• Idea #3: Work with a richer library of primitives.
  – Common constructions picked up from chapters of textbooks.
  – A search space of (small width, large depth) is converted into one of (large width, small depth).
  – Mimics human experience/knowledge.

Search Space Exploration: With/without goal-directedness
[Figure: search space exploration with and without goal-directed search]

Problem Solving Engine with Natural Interfaces
Problem Description in English
  -> Natural Language Processing
  -> Problem Description as Logical Relation
  -> Synthesis Engine
  -> Solution as Functional Program
  -> Paraphrasing
  -> Solution in English
Joint work with: Kalika Bali, Monojit Chaudhuri (MSR Bangalore), Vijay Korthikanti (UIUC), Ashish Tiwari (SRI)

Useful modules powered by problem solving engine
The next step is to architect several useful modules on top of the problem-solving architecture, such as:
• Interactive feedback to students
  – Provide hints
  – Point out mistakes and suggest fixes
• Creation of teaching material (for teachers)
  – Problem construction
  – Authoring tools

Other Domains
What domains should we prioritize for automation?
• Mathematics
  – Algebra
  – Probability
• Physics
  – Mechanics
  – Electrical Circuits
  – Optics
• Chemistry
  – Quantitative Chemistry
  – Organic Chemistry

Electrical Circuits: Concept-specific solutions
• Consider the problem of computing effective resistance between two nodes in a graph of resistances.
• MATLAB implements a decision procedure based on Kirchhoff's laws:
  – Algebraic sum of the currents at any circuit junction = 0
  – Sum of changes in potential in any complete loop = 0
• A decision procedure based on Kirchhoff's laws is not useful for students who are expected to know only simpler concepts.
• Solutions need to be parameterized by specific concepts such as
  – Series/Parallel composition of resistances
  – Symmetry Reduction
  – Wheatstone Bridge
Joint work with: Swarat Chaudhuri (Penn State University)

Resistance Reduction Concepts
[Figure: Series Combination, Parallel Combination, Wheatstone Bridge]
Wheatstone Bridge: If R3/R1 = R4/R2, then VD = VB

Automating Education: Long-term Goals
• Ultra-intelligent computer
• Model of human mind
• Interstellar travel

(Interdisciplinary) Dimensions in Program Synthesis
• User Intent
  – Human Computer Interaction
  – Natural Language Processing
• Search Space (requires corresponding domain expertise)
  – Graphics (for image manipulation)
  – Mathematics/Physics (for classroom problem solving)
• Search Techniques
  – Logical Reasoning
  – Machine Learning

The Significance Dimension
[Figure: Consumers of Program Synthesis Technology: Algorithm Designers, Software Developers, End-Users (most useful target), Students and Teachers (most transformational target)]

Research Questions
• How to combine various forms of user intent in a unified programming interface?
  – logic, natural language, input/output examples, partial programs
• How to ensure a modular architecture that allows reuse of domain knowledge and search techniques across different synthesis tools/applications?
• How to combine the power of different search techniques?
  – Version space algebras
  – SAT/SMT based logical reasoning techniques
  – Machine learning techniques

References
• Dimensions in Program Synthesis
  – Invited paper at ACM PPDP 2010
• Bitvector Algorithms
  – "Oracle-guided component-based program synthesis", ICSE 2010, Jha/Gulwani/Seshia/Tiwari
• String Manipulation Macros
  – "Automating String Processing in Spreadsheets using Input-Output Examples", POPL 2011, Gulwani
• Geometry Constructions
  – "Synthesizing Geometry Constructions", Techreport 2011, Gulwani/Korthikanti/Tiwari