Dimensions in Synthesis
Sumit Gulwani
sumitg@microsoft.com
Microsoft Research, Redmond
May 2012

Program Synthesis
• Synthesize a program in some underlying language from user intent using some search technique.
• Why today?
  – Variety of (cheap) computational devices and platforms
    • Billions of non-experts have access to these devices!
  – Enabling technology is now available
    • Better search algorithms
    • Faster machines (good application for multi-cores)

Dimensions in Synthesis
• Concept Language (Application)
  – Programs
    • Straight-line programs
  – Automata
  – Queries
  – Sequences
• User Intent (Ambiguity)
  – Logic, Natural Language
  – Examples, Demonstrations/Traces
• Search Technique (Algorithm)
  – SAT/SMT solvers (Formal Methods)
  – A*-style goal-directed search (AI)
  – Version space algebras (Machine Learning)
PPDP 2010: “Dimensions in Program Synthesis”, Gulwani.

Compilers vs. Synthesizers
Dimension        | Compilers                                                  | Synthesizers
Concept Language | Executable program                                         | Variety of concepts: program, automaton, query, sequence
User Intent      | Structured language                                        | Variety/mixed form of constraints: logic, examples, traces
Search Technique | Syntax-directed translation (no new algorithmic insights)  | Uses some kind of search (discovers new algorithmic insights)

Potential Users of Synthesis Technology
• Algorithm Designers
• Software Developers
• End-Users (Most Useful Target)
• Students and Teachers (Most Transformational Target)
• Vision for End-users: Enable people to have (automated) personal assistants.
• Vision for Education: Enable every student to have access to free & high-quality education.
Organization
Lecture 1: Algorithms
• Synthesis of Straight-line Programs from Logic
  – Bit-vector Algorithms
  – Geometry Constructions
Lecture 2: Applications
• Intelligent Tutoring Systems
Lecture 3: Ambiguity
• Synthesis from Examples & Keywords

Lab: Intelligent Tutoring Systems
Technical Goals:
• Identify a useful task that can be formalized as a synthesis problem.
• Propose an appropriate user interaction model.
• Propose an appropriate search technique.

Synthesizing Bitvector Algorithms
PLDI 2011: Gulwani, Jha, Tiwari, Venkatesan

Bitvector Algorithms
Straight-line programs that use
  – Arithmetic Operators: +, -, *, /
  – Logical Operators: Bitwise and/or/not, Shift left/right

Examples of Bitvector Algorithms
Turn off rightmost 1-bit: Z & (Z-1)
  Z         = 10101100
  Z-1       = 10101011
  Z & (Z-1) = 10101000

Turn off rightmost contiguous sequence of 1-bits: Z & (1 + (Z | (Z-1)))
  Z      = 10101100
  result = 10100000

Ceil of average of two integers without overflowing: (Y|Z) - ((Y⊕Z) >> 1)

Round up to next highest power of 2
  o1 := sub(x,1);
  o2 := shr(o1,1);
  o3 := or(o1,o2);
  o4 := shr(o3,2);
  o5 := or(o3,o4);
  o6 := shr(o5,4);
  o7 := or(o5,o6);
  o8 := shr(o7,8);
  o9 := or(o7,o8);
  o10 := shr(o9,16);
  o11 := or(o9,o10);
  res := add(o11,1);

Higher order half of product of x and y
  o1 := and(x,0xFFFF);
  o2 := shr(x,16);
  o3 := and(y,0xFFFF);
  o4 := shr(y,16);
  o5 := mul(o1,o3);
  o6 := mul(o2,o3);
  o7 := mul(o1,o4);
  o8 := mul(o2,o4);
  o9 := shr(o5,16);
  o10 := add(o6,o9);
  o11 := and(o10,0xFFFF);
  o12 := shr(o10,16);
  o13 := add(o7,o11);
  o14 := shr(o13,16);
  o15 := add(o14,o12);
  res := add(o15,o8);

Problem Definition
Given:
• Specification of desired functionality
• Specification of library components
Synthesize a straight-line program that meets the desired specification, where
• Each argument variable on line j is either a program input or the output of some earlier line k, where k < j
• The order in which the components appear is a permutation of 1...n
[Formulas on this slide, including the verification constraint, were not preserved.]

Problem Definition: Turn-off rightmost 1 bit
• Specification of desired functionality
• Specification of library components
[Formulas not preserved.]

Synthesis Constraint
[Formulas not preserved: the verification constraint and the synthesis constraint derived from it.]

Idea #1: Reduce Second-order Quantification in Synthesis Constraint to First Order
The second-order unknown represents which component goes on which location (line number) and from which locations it gets its input arguments. We encode this with location variables L.

Example: Possible programs that use 2 components and their Representation using Location Variables
[Figure not preserved.]

Encoding Well-formedness of Programs
The following constraints ensure that assignments to L correspond to well-formed programs.
• Consistency Constraint: Every line in the program should have at most one component.
• Acyclicity Constraint: A variable should be initialized before being used.

Encoding data-flow
The following constraint describes connections between inputs and outputs of various components.
[Formula not preserved.]

Idea #2: Using a CEGIS-style procedure to solve the Synthesis Constraint
The synthesis constraint is of the form ∃L ∀Y F(L,Y).
1. Choose some values y1, ..., yn for Y.
2. Finite Synthesis Step: solve ∃L F(L,y1) ∧ … ∧ F(L,yn).
   – No solution: report failure.
   – Solution L = S: go to step 3.
3. Verification Step: does ∀Y F(S,Y) hold? Equivalently, check whether ∃Y ¬F(S,Y) has a solution.
   – Solution Y = yn+1: add yn+1 to the chosen values and go back to step 2.
   – No solution: return S.

Experiments: Comparison with Brute-force Search
Program  lines  Brahma iters  Brahma time  AHA time
P1       2      2             3            0.1
P2       2      3             3            0.1
P3       2      3             1            0.1
P4       2      2             3            0.1
P5       2      3             2            0.1
P6       2      2             2            0.1
P7       3      2             1            2
P8       3      2             1            1
P9       3      2             6            7
P10      3      14            76           10
P11      3      7             57           9
P12      3      9             67           10
P13      4      4             6            X
P14      4      4             60           X
P15      4      8             119          X
P16      4      5             62           X
P17      4      6             78           109
P18      6      5             46           X
P19      6      5             35           X
P20      7      6             108          X
P21      8      5             28           X
P22      8      8             279          X
P23      10     8             1668         X
P24      12     9             224          X
P25      16     11            2779         X

Synthesizing Geometry Constructions
PLDI 2011: Gulwani, Korthikanti, Tiwari.

Ruler/Compass based Geometry Constructions
Given a triangle XYZ, construct circle C such that C passes through X, Y, and Z.
[Figure: triangle XYZ, lines L1 and L2 meeting at N, and circle C.]

Other Examples of Geometry Constructions
• Draw a regular hexagon given a side.
• Given 3 parallel lines, draw an equilateral triangle whose vertices lie on the parallel lines.
• Given 4 points, draw a square whose sides contain those points.

Significance
• Good platform for teaching logical reasoning.
  – Visual Nature:
    • Makes it more accessible.
    • Exercises both logical/visual abilities of left/right brain.
  – Fun Aspect:
    • Ruler/compass restrictions make it fun, as in sports.
• Application in dynamic geometry or animations.
  – “Constructive” geometry macros (unlike numerical methods) enable fast re-computation of derived objects from free (moving) objects.
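Looking back at the bit-vector part of the lecture, the CEGIS loop of Idea #2 can be sketched in a few lines. This is only an illustration under strong simplifying assumptions: both solver calls are swapped for brute-force enumeration over 8-bit values, the program space is restricted to two-line programs of the shape o1 := u(x); res := b(x, o1), and every name here (spec, UNARY, BINARY, cegis) is made up for the sketch rather than taken from the Brahma tool.

```python
from itertools import product

MASK = 0xFF  # assumption: 8-bit vectors, so exhaustive verification is cheap


def spec(x):
    """Reference behaviour: clear the rightmost 1-bit of x."""
    for i in range(8):
        if x & (1 << i):
            return x & ~(1 << i) & MASK
    return 0


# Hypothetical component library (stand-in for the library specifications).
UNARY = {
    "inc": lambda a: (a + 1) & MASK,
    "not": lambda a: ~a & MASK,
    "neg": lambda a: -a & MASK,
    "dec": lambda a: (a - 1) & MASK,
}
BINARY = {
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
}


def run(prog, x):
    """Execute the two-line program o1 := u(x); res := b(x, o1)."""
    u, b = prog
    return BINARY[b](x, UNARY[u](x))


def finite_synthesis(examples):
    """Find a program that is correct on the chosen inputs only."""
    for prog in product(UNARY, BINARY):
        if all(run(prog, x) == spec(x) for x in examples):
            return prog
    return None


def verify(prog):
    """Return a counterexample input, or None if prog meets the spec."""
    for x in range(MASK + 1):
        if run(prog, x) != spec(x):
            return x
    return None


def cegis(initial=(0,)):
    examples = list(initial)
    while True:
        prog = finite_synthesis(examples)
        if prog is None:
            return None, examples      # failure: no program fits
        cex = verify(prog)
        if cex is None:
            return prog, examples      # verified on all inputs
        examples.append(cex)           # learn from the counterexample


if __name__ == "__main__":
    print(cegis())
```

With the component order above, the loop starts from the single input 0, picks up the counterexamples 2 and 3, and stops at ("dec", "and"), which is Z & (Z-1). A solver-based version would replace finite_synthesis and verify with queries over the location variables L.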
Programming Language for Geometry Constructions
Types: Point, Line, Circle
Methods:
• Ruler(Point, Point) -> Line
• Compass(Point, Point) -> Circle
• Intersect(Circle, Circle) -> Pair of Points
• Intersect(Line, Circle) -> Pair of Points
• Intersect(Line, Line) -> Point
Geometry Program: A straight-line composition of the above methods.

Example Problem: Program
Given a triangle XYZ, construct circle C such that C passes through X, Y, and Z.
1. C1 = Compass(X,Y);
2. C2 = Compass(Y,X);
3. <P1,P2> = Intersect(C1,C2);
4. L1 = Ruler(P1,P2);
5. D1 = Compass(Z,X);
6. D2 = Compass(X,Z);
7. <R1,R2> = Intersect(D1,D2);
8. L2 = Ruler(R1,R2);
9. N = Intersect(L1,L2);
10. C = Compass(N,X);
[Figure: the construction, showing circles C1, C2, D1, D2, points P1, P2, R1, R2, lines L1, L2, center N, and circle C.]

Specification Language for Geometry Programs
Conjunction of predicates over arithmetic expressions
Predicates: p := e1 = e2 | e1 ≠ e2 | e1 ≤ e2
Arithmetic Expressions: e := Distance(Point, Point) | Slope(Point, Point) | e1 ± e2 | c

Example Problem: Precondition/Postcondition
Given a triangle XYZ, construct circle C such that C passes through X, Y, and Z.
Precondition: Slope(X,Y) ≠ Slope(X,Z) ∧ Slope(X,Y) ≠ Slope(Z,X)
Postcondition: LiesOn(X,C) ∧ LiesOn(Y,C) ∧ LiesOn(Z,C)
where LiesOn(X,C) ≡ Distance(X,Center(C)) = Radius(C)

Verification/Synthesis Problem for Geometry Programs
• Let P be a geometry program that computes outputs O from inputs I.
• Verification Problem: Check the validity of the following Hoare triple.
    Assume Pre(I);
    P
    Assert Post(I,O);
• Synthesis Problem: Given Pre(I), Post(I,O), find P such that the above Hoare triple is valid.

Approaches to Verification Problem
Pre(I), P, Post(I,O)
a) Symbolic decision procedures are complex.

Randomized Polynomial Identity Testing
• Problem: Given two polynomials P1 and P2, determine whether they are equivalent.
• The naïve deterministic algorithm of expanding polynomials to compare them term-wise is exponential.
• A simple randomized test is probabilistically sufficient:
  – Choose random values r for polynomial variables x.
  – If P1(r) ≠ P2(r), then P1 is not equivalent to P2.
  – Otherwise P1 is equivalent to P2 with high probability.

Approaches to Verification Problem
Pre(I), P, Post(I,O)
a) Symbolic decision procedures are complex.
b) New efficient approach: Random Testing!
   1. Choose I’ randomly from the set { I | Pre(I) }.
   2. Compute O’ := P(I’).
   3. If O’ satisfies Post(I’,O’), output “Verified”.
Correctness Proof of (b):
• Objects constructed by P can be described using polynomial ops (+, -, *), square-root & division operators.
• The randomized polynomial identity testing algorithm lifts to square-root and division operators as well.

Idea 1 (from Theory): Symbolic Reasoning -> Concrete
Synthesis Algorithm:
// First obtain a random input-output example.
1. Choose I’ randomly from the set { I | Pre(I) }.
2. Compute O’ s.t. Post(I’,O’) using numerical methods.
// Now obtain a construction that can generate O’ from I’ (using exhaustive search).
3. S := I’;
4. While (S does not contain O’)
5.   S := S ∪ { M(O1,O2) | Oi ∈ S, M ∈ Methods }
6. Output construction steps for O’.

Error Probability of the algorithm is extremely low.
Given a triangle XYZ, construct circle C such that C passes through X, Y, and Z.
  …
  L1 = Ruler(P1,P2);
  …
  L2 = Ruler(R1,R2);
  N = Intersect(L1,L2);
  C = Compass(N,X);
• For an equilateral △XYZ, the incenter coincides with the circumcenter N, so a construction of the incenter would also pass the test on that input.
• But what are the chances of choosing a random △XYZ to be an equilateral one?

Idea 2 (from PL): High-level Abstractions
The synthesis algorithm times out because programs are large.
• Identify a library of commonly used patterns (pattern = “sequence of geometry methods”)
  – E.g., perpendicular/angular bisector, midpoint, tangent, etc.
• The search step changes
  from: S := S ∪ { M(O1,O2) | Oi ∈ S, M ∈ Methods }
  to:   S := S ∪ { M(O1,O2) | Oi ∈ S, M ∈ LibMethods }
• Two key advantages:
  – Search space: large depth -> small depth
  – Easier to explain solutions to students.

Use of high-level abstractions reduces program size
Given a triangle XYZ, construct circle C such that C passes through X, Y, and Z.
Without library methods:
1. C1 = Compass(X,Y);
2. C2 = Compass(Y,X);
3. <P1,P2> = Intersect(C1,C2);
4. L1 = Ruler(P1,P2);
5. D1 = Compass(Z,X);
6. D2 = Compass(X,Z);
7. <R1,R2> = Intersect(D1,D2);
8. L2 = Ruler(R1,R2);
9. N = Intersect(L1,L2);
10. C = Compass(N,X);
With library methods:
1. L1 = PBisector(X,Y);
2. L2 = PBisector(X,Z);
3. N = Intersect(L1,L2);
4. C = Compass(N,X);

Idea 3 (from AI): Goal Directed Search
The synthesis algorithm is still inefficient because the search space is too wide and hence still huge.
• Prune forward search by using A*-style heuristics. The search step changes
  from: S := S ∪ { M(O1,O2) | Oi ∈ S, M ∈ LibMethods }
  to:   S := S ∪ { M(O1,O2) | Oi ∈ S, M ∈ LibMethods, IsGood(M(O1,O2)) }
• Example: If a method constructs a line L that passes through a desired output point, then L is “good” (i.e., worth constructing).

Effectiveness of Goal-directed search
Given a triangle XYZ, construct circle C such that C passes through X, Y, and Z.
[Figure: triangle XYZ, lines L1 and L2 meeting at N, and circle C.]
• L1 and L2 are immediately constructed since they pass through output point N.
• On the other hand, other lines like angular bisectors are not eagerly constructed.

Experimental Results
• 25 benchmark problems, such as: Construct a square whose extended sides pass through 4 given points.
• 18 problems took less than 1 second, 4 problems took 1-3 seconds, and 3 problems took 13-82 seconds.
• Idea 2 (high-level abstractions) reduces programs of size 3-45 to size 3-13.
• Idea 3 (goal-directedness) improves performance by a factor of 10-1000 on most problems.
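The random-testing approach to verification described above can be tried out on the 10-step circumcircle program. The sketch below makes several assumptions that are not in the slides: Compass(P,Q) is read as the circle centered at P passing through Q, exact arithmetic is replaced by floating point with a relative tolerance, a handful of random inputs is used instead of one, and the precondition (X, Y, Z not collinear) is enforced by a minimum-area check. All function names are hypothetical.

```python
import math
import random


def ruler(p, q):
    """Line through p and q, represented by the two points."""
    return (p, q)


def compass(center, through):
    """Circle (center, radius) centered at `center`, passing through `through`."""
    return (center, math.dist(center, through))


def intersect_circles(c1, c2):
    """Both intersection points of two circles (assumed to intersect)."""
    (x0, y0), r0 = c1
    (x1, y1), r1 = c2
    d = math.hypot(x1 - x0, y1 - y0)
    a = (r0**2 - r1**2 + d**2) / (2 * d)   # distance from c1 to the chord
    h = math.sqrt(r0**2 - a**2)            # half the chord length
    mx, my = x0 + a * (x1 - x0) / d, y0 + a * (y1 - y0) / d
    ox, oy = h * (y1 - y0) / d, h * (x1 - x0) / d
    return (mx + ox, my - oy), (mx - ox, my + oy)


def intersect_lines(l1, l2):
    """Intersection point of two non-parallel lines."""
    (x1, y1), (x2, y2) = l1
    (x3, y3), (x4, y4) = l2
    den = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    a, b = x1 * y2 - y1 * x2, x3 * y4 - y3 * x4
    return ((a * (x3 - x4) - (x1 - x2) * b) / den,
            (a * (y3 - y4) - (y1 - y2) * b) / den)


def circumcircle(X, Y, Z):
    """The 10-step program from the slides."""
    p1, p2 = intersect_circles(compass(X, Y), compass(Y, X))
    l1 = ruler(p1, p2)
    r1, r2 = intersect_circles(compass(Z, X), compass(X, Z))
    l2 = ruler(r1, r2)
    n = intersect_lines(l1, l2)
    return compass(n, X)


def centroid_circle(X, Y, Z):
    """A deliberately wrong candidate: circle around the centroid through X."""
    n = ((X[0] + Y[0] + Z[0]) / 3, (X[1] + Y[1] + Z[1]) / 3)
    return compass(n, X)


def lies_on(p, circle, rel_tol=1e-6):
    center, radius = circle
    return math.isclose(math.dist(p, center), radius, rel_tol=rel_tol)


def random_input(rng):
    """Random triangle satisfying the precondition (clearly non-collinear)."""
    while True:
        X, Y, Z = [(rng.uniform(-10, 10), rng.uniform(-10, 10)) for _ in range(3)]
        cross = (Y[0] - X[0]) * (Z[1] - X[1]) - (Y[1] - X[1]) * (Z[0] - X[0])
        if abs(cross) > 1.0:
            return X, Y, Z


def verify(program, trials=5, seed=0):
    """Random testing: run the program on random inputs, check Post(I', O')."""
    rng = random.Random(seed)
    for _ in range(trials):
        X, Y, Z = random_input(rng)
        C = program(X, Y, Z)
        if not all(lies_on(p, C) for p in (X, Y, Z)):
            return False
    return True


if __name__ == "__main__":
    print(verify(circumcircle), verify(centroid_circle))
```

The wrong candidate mirrors the incenter remark on the error-probability slide: the centroid coincides with the circumcenter only for special triangles, so a random input is expected to expose it.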
Search space Exploration: With/without goal-directedness
[Figure not preserved.]

Optional Advance Preparation
• Lecture 2
  – Section 4 in WAMBSE 2012 keynote paper “Synthesis from Examples”, Gulwani.
• Lab
  – Section 4 in WAMBSE 2012 keynote paper.
  – NCERT Online Book Website. http://ncert.nic.in/NCERTS/textbook/textbook.htm
• Lecture 3
  – Sections 1-3 in WAMBSE 2012 keynote paper.

Intelligent Tutoring Systems
• Motivation
  – Online learning sites: Khan Academy, edX, Udacity, Coursera
    • Increasing class sizes with even less personal attention
  – New technologies: Tablets/Smartphones, NUI, Cloud
• Various Aspects
  – Solution Generation
  – Problem Generation
  – Automated Grading
  – Content Entry
• Various Domains
  – K-12: Mathematics, Physics, Chemistry
  – Undergraduate: Introductory Programming, Automata Theory
  – Language Learning