
Dimensions in Program Synthesis
Sumit Gulwani
(sumitg@microsoft.com)
Microsoft Research, Redmond
Invited Tutorial
FMCAD 2010
Lugano, Switzerland
Automated Program Synthesis
• What is Program Synthesis?
– Synthesize an executable program from user intent expressed
in the form of constraints.
Compilers | Synthesizers
Structured language input | Can accept a variety/mixed form of constraints (e.g., logic, examples, traces, partial programs)
Syntax-directed translation | Uses some kind of search
No new algorithmic insights | Discovers new algorithmic insights
• Why today?
– Natural goal given that computing has become accessible, but:
• fundamental “how” programming models have not changed.
• most people are not expert programmers.
– Enabling technology is now available
• Better search/logical reasoning techniques (SAT/SMT solvers)
• Faster machines (good application for multi-cores)
1
Program Synthesis: Recent Success
Our techniques can synthesize a wide variety of
algorithms/programs from logic and examples.
• Undergraduate textbook algorithms (e.g., sorting, dynamic programming)
– [Srivastava/Gulwani/Foster, POPL 2010]
• Program inverses (e.g., deserializers from serializers)
– [Srivastava/Gulwani/Chaudhuri/Foster, MSR-TR-2010-34]
• Graph algorithms (e.g., bipartiteness check)
– [Itzhaky/Gulwani/Immerman/Sagiv, OOPSLA 2010]
• Bit-vector algorithms (e.g., turn off rightmost 1-bit)
– [Jha/Gulwani/Seshia/Tiwari, ICSE 2010]
• String manipulation macros (e.g., “Helmut Veith” -> “Veith, H.”)
– [Gulwani, POPL 2011]
• Geometry constructions (e.g., construct a regular hexagon given a side)
– [Gulwani/Korthikanti/Tiwari, recent work]
2
Outline
• Dimension 1: User Intent
• Dimension 2: Search space
• Dimension 3: Search Technique
• Applications
– Bit-vector Algorithms
– String Manipulation Macros
– Geometry Constructions
3
Outline
➤ Dimension 1: User Intent
• Dimension 2: Search space
• Dimension 3: Search Technique
• Applications
– Bit-vector Algorithms
– String Manipulation Macros
– Geometry Constructions
4
Dimension 1: User Intent
• Logical specifications
– Logical relations between inputs and outputs
• Natural language
• Input-output examples
• Traces
• Programs
5
Logical Specification: Example 1
Problem: Sorting
Logical relation between input array A and
output array B of size n.
∀k: (0 ≤ k < n-1) ⇒ (B[k] ≤ B[k+1])
∧ ∀k ∃j: (0 ≤ k < n) ⇒ (0 ≤ j < n ∧ B[j] = A[k])
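As a hedged illustration, the relation above can be turned into an executable check on a candidate input/output pair. The function name `satisfies_sort_spec` is hypothetical, and the sketch quantifies the second clause over every index k of A (0 ≤ k < n), so that each element of A, including the last, must occur in B.

```python
def satisfies_sort_spec(a, b):
    """Check the logical sorting relation between input a and output b."""
    n = len(a)
    if len(b) != n:
        return False
    # Clause 1: for all k with 0 <= k < n-1, B[k] <= B[k+1].
    ordered = all(b[k] <= b[k + 1] for k in range(n - 1))
    # Clause 2: for all k there exists j with B[j] = A[k].
    covered = all(any(b[j] == a[k] for j in range(n)) for k in range(n))
    return ordered and covered
```

A synthesizer would search for a program whose output satisfies this check for every input; here the check is only applied to concrete arrays.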
6
Natural Language
• Advances in NLP allow mapping natural language to logic.
– NL interfaces have been designed for database queries.
• Natural language can be ambiguous.
– This issue can be resolved by paraphrasing.
7
Input-Output Examples
• Advantages of Input-Output examples
– Easy to provide; no need to remember syntax
– Less chance of mistakes
• What prevents a trivial table-lookup solution on input-output pairs (xi, yi)?
switch x
  case x1: return y1
  case x2: return y2
  ...
  case xn: return yn
– Restriction on search space!
• How to select examples?
– Interactive manner!
8
Interaction Model
1. The user provides a few input-output examples I.
2. The synthesizer constructs a program P consistent with I.
3. The process may be repeated after adding new examples.
– User-driven Interaction
• The user tests the program on other inputs I’. If a discrepancy
is found, the user provides a new input-output example.
– Synthesizer-driven Interaction
• If the synthesizer finds another program P’ consistent with I, it
asks the user to provide the output for a distinguishing input.
Typically few iterations are required
– small teaching dimension [Goldman, Kearns, JCSS ‘95]
– low Kolmogorov (descriptive) complexity
9
Traces
• A detailed step-by-step description of how the
program should behave on a given input.
• Easier for a synthesizer to deal with than input-output examples.
– Some synthesizers that accept input-output examples
first generate traces.
• A natural model in certain applications.
– Programming by demonstration systems for end-users.
• intermediate states resulting from the user’s successive
actions on a user interface constitute a valid trace.
– Reverse engineering.
• Convenient in certain scenarios.
– E.g., consider describing Factorial(7).
• Trace: 7*6*5*4*3*2 or Recursive Trace: 7*Factorial(6)
• Final simplified output: 5040
10
Outline
• Dimension 1: User Intent
➤ Dimension 2: Search space
• Dimension 3: Search Technique
• Applications
– Bit-vector Algorithms
– String Manipulation Macros
– Geometry Constructions
11
Dimension 2: Search Space
• Programs
– Operators
• Comparison operators
• Combination of arithmetic and bitwise operators
• APIs exported from a given library
– SortList = Array2List◦SortArray◦List2Array
– Control-flow
• Given looping template
• Bounded number of statements
• Partial program with holes
• Straight-line or loop-free
12
Loop-free Programs
Parameterized by set of components/operators used.
• Bitvector Algorithms
– Components = Arithmetic + Bitwise operators
• String Manipulation (or Text Editing) Macros
– Components = editing commands (insert, locate, select, delete)
• Geometrical Constructions
– Components = ruler + compass
• Unbounded data type manipulation
– Components = linear arithmetic/set operators
– [PLDI ‘10, Viktor Kuncak et al, Complete Functional Synthesis]
• API call sequences [PLDI ’05, Bodik et al, Jungloid Mining]
– Components = API calls
Can be likened to putting together Jigsaw puzzle pieces.
13
Dimension 2: Search Space
• Programs
– Operators
– Control-flow
• Grammars
– Examples: Regular Expressions, DFAs, NFAs, Context Free
Grammars, Regular Transducers
– Applications: robotics/control systems, pattern recognition,
computational linguistics/biology, data compression/mining etc.
• Logics
– First-order logic + Fixed point
• = PTIME algorithms over ordered structures such as strings, graphs
• E.g., Graph Classifiers: Bipartite, Acyclic, Connected
Graph Computations: Shortest Path, Cycle, 2 coloring
14
Outline
• Dimension 1: User Intent
• Dimension 2: Search space
➤ Dimension 3: Search Technique
• Applications
– Bit-vector Algorithms
– String Manipulation Macros
– Geometry Constructions
15
Program Synthesis Techniques
• Exhaustive search
– Usually requires non-trivial optimizations.
– Geometry algorithms, Mutual Exclusion algorithms
• Reduction to SAT/SMT constraints
– Can leverage engineering advances in recent off-the-shelf
solvers.
– Bit-vector algorithms, Graph algorithms, Program inverses
• Version space algebra
– Data-structures to efficiently represent and manipulate
huge sets of programs consistent with given observations.
– String Manipulation macros
• Machine Learning
– Bayesian Learning, Belief Propagation
– QBF Solving
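To make the first technique concrete, here is a minimal sketch of exhaustive search over loop-free bit-vector expressions, driven by input-output examples. It is an illustration under stated assumptions, not any of the cited tools: the 8-bit width, the four-entry component library, and the names `evaluate`, `expressions`, and `synthesize` are all hypothetical, and no pruning optimizations are applied.

```python
from itertools import product

MASK = 0xFF  # assumption: 8-bit vectors keep the search tiny

# Assumed component library; real tools use a richer one.
COMPONENTS = {
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "add": lambda a, b: (a + b) & MASK,
    "sub": lambda a, b: (a - b) & MASK,
}

def evaluate(expr, x):
    """Evaluate an expression tree on input x."""
    if expr == "x":
        return x
    if isinstance(expr, int):
        return expr
    op, left, right = expr
    return COMPONENTS[op](evaluate(left, x), evaluate(right, x))

def expressions(depth):
    """Yield every expression of nesting depth <= depth (with repeats)."""
    if depth == 0:
        yield "x"
        yield 1
        return
    yield from expressions(depth - 1)
    subs = list(expressions(depth - 1))
    for op, left, right in product(COMPONENTS, subs, subs):
        yield (op, left, right)

def synthesize(examples, max_depth=2):
    """Return the first enumerated expression consistent with all examples."""
    for expr in expressions(max_depth):
        if all(evaluate(expr, i) == o for i, o in examples):
            return expr
    return None

# Examples for "turn off the rightmost 1-bit".
examples = [(0b10101100, 0b10101000), (0b00000001, 0), (0b01100110, 0b01100100)]
found = synthesize(examples)
```

With these three examples the search returns the tree for x & (x - 1). The number of candidates grows quickly with depth, which is why practical exhaustive search needs the non-trivial optimizations mentioned above.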
16
Outline
• Dimension 1: User Intent
• Dimension 2: Search space
• Dimension 3: Search Technique
• Applications
➤ Bit-vector Algorithms
– String Manipulation Macros
– Geometry Constructions
17
Application 1: Bitvector Algorithms
• Search Space Dimension: Straight-line programs that use
– Arithmetic Operators: +,-,*,/
– Logical Operators: Bitwise and/or/not, Shift left/right
• User Intent Dimension
– Logical specifications
– Input-output examples
• Search Algorithm Dimension: SAT/SMT based techniques
18
Application 1: Bitvector Algorithms
• Significance Dimension: Algorithm Designers
[Figure: consumers of program synthesis technology; tier shown: Algorithm Designers]
19
Examples of Bitvector Algorithms
Turn-off rightmost 1-bit: Z & (Z-1)
Example: 10101100 -> 10101000
Z         = 10101100
Z-1       = 10101011
Z & (Z-1) = 10101000
20
Examples of Bitvector Algorithms
Turn-off rightmost contiguous sequence of 1-bits: Z & (1 + (Z | (Z-1)))
Example: 10101100 -> 10100000
Ceil of average of two integers without overflowing: (Y|Z) - ((Y⊕Z) >> 1)
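The formulas on these slides can be checked mechanically. The sketch below assumes 8-bit unsigned words (an assumption made so that the checks can run over every value); the function names are hypothetical.

```python
MASK = 0xFF  # assumption: 8-bit words, small enough to test exhaustively

def turn_off_rightmost_one(z):
    """Z & (Z-1): clears the lowest set bit."""
    return z & (z - 1) & MASK

def turn_off_rightmost_run(z):
    """Z & (1 + (Z | (Z-1))): clears the lowest contiguous run of 1-bits."""
    return z & (1 + (z | (z - 1))) & MASK

def ceil_average(y, z):
    """(Y|Z) - ((Y xor Z) >> 1): every intermediate value fits in the word."""
    return (y | z) - ((y ^ z) >> 1)
```

For instance, `turn_off_rightmost_run(0b10101100)` evaluates to `0b10100000`, matching the example above, and `ceil_average` agrees with the rounded-up mean without ever forming the sum Y + Z.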
21
Examples of Bitvector Algorithms
P24: Round up to next highest power of 2
o1 := sub(x,1);
o2 := shr(o1,1);
o3 := or(o1,o2);
o4 := shr(o3,2);
o5 := or(o3,o4);
o6 := shr(o5,4);
o7 := or(o5,o6);
o8 := shr(o7,8);
o9 := or(o7,o8);
o10 := shr(o9,16);
o11 := or(o9,o10);
res := add(o11,1);
P25: Higher order half of product of x and y
o1 := and(x,0xFFFF);
o2 := shr(x,16);
o3 := and(y,0xFFFF);
o4 := shr(y,16);
o5 := mul(o1,o3);
o6 := mul(o2,o3);
o7 := mul(o1,o4);
o8 := mul(o2,o4);
o9 := shr(o5,16);
o10 := add(o6,o9);
o11 := and(o10,0xFFFF);
o12 := shr(o10,16);
o13 := add(o7,o11);
o14 := shr(o13,16);
o15 := add(o14,o12);
res := add(o15,o8);
[ICSE 2010] Joint work with Susmit Jha, Sanjit Seshia (UC-Berkeley),
Ashish Tiwari (SRI) and Venkie (MSR Redmond)
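As a sanity check, the two straight-line programs can be transcribed into Python. This sketch assumes 32-bit unsigned words and logical right shifts; the function names are hypothetical, the shift-or pairs of P24 are folded into a loop, and the final P24 addition is applied to the fully smeared value (o11 in the listing's naming).

```python
M32 = 0xFFFFFFFF  # assumption: 32-bit unsigned words

def p24_round_up_pow2(x):
    """Round x up to the next power of 2 (x itself if already a power of 2)."""
    o = (x - 1) & M32
    for shift in (1, 2, 4, 8, 16):   # o2..o11: smear the top set bit rightwards
        o |= o >> shift
    return (o + 1) & M32

def p25_high_half(x, y):
    """Upper 32 bits of the 64-bit product x*y, using only 32-bit values."""
    o1, o2 = x & 0xFFFF, x >> 16
    o3, o4 = y & 0xFFFF, y >> 16
    o5, o6, o7, o8 = o1 * o3, o2 * o3, o1 * o4, o2 * o4
    o10 = o6 + (o5 >> 16)            # o9 folded in
    o13 = o7 + (o10 & 0xFFFF)        # o11 folded in
    return (o13 >> 16) + (o10 >> 16) + o8
```

Comparing `p25_high_half(x, y)` against `(x * y) >> 32` on sample inputs is a quick way to confirm the transcription.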
22
Experiments: Comparison with Exhaustive Search
Program   Brahma                  AHA
Name      lines   iters   time    time
P1        2       2       3       0.1
P2        2       3       3       0.1
P3        2       3       1       0.1
P4        2       2       3       0.1
P5        2       3       2       0.1
P6        2       2       2       0.1
P7        3       2       1       2
P8        3       2       1       1
P9        3       2       6       7
P10       3       14      76      10
P11       3       7       57      9
P12       3       9       67      10
P13       4       4       6       X
P14       4       4       60      X
P15       4       8       119     X
P16       4       5       62      X
P17       4       6       78      109
P18       6       5       46      X
P19       6       5       35      X
P20       7       6       108     X
P21       8       5       28      X
P22       8       8       279     X
P23       10      8       1668    X
P24       12      9       224     X
P25       16      11      2779    X
23
Functional Specification
• Choice 1: Logical relation between inputs and outputs
• Choice 2: Input-Output Examples
24
Functional Specification: Logical Relations
Problem: Turn off rightmost 1-bit
Functional Specification of desired behavior
⋀_{p=1..b} [ ( I[p]=1 ∧ ⋀_{j=p+1..b} (I[j]=0) ) ⇒ ( J[p]=0 ∧ ⋀_{j≠p} (J[j] = I[j]) ) ]
Tool Output:
J = I & (I-1)
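The tool output can be checked against the logical relation by brute force. This is a sketch under assumptions: 8-bit vectors, bit positions numbered from 1 (leftmost) to b (rightmost) so that "all bits after p are 0" means p is the rightmost 1-bit, and hypothetical names `bit`, `satisfies_spec`, and `tool_output`.

```python
B = 8  # assumption: 8-bit vectors; position 1 is leftmost, position B rightmost

def bit(v, p):
    """Bit at position p of v, with positions numbered 1..B from the left."""
    return (v >> (B - p)) & 1

def satisfies_spec(i, j):
    """For every p: if p is the rightmost 1-bit of I, J clears it and keeps the rest."""
    for p in range(1, B + 1):
        premise = bit(i, p) == 1 and all(bit(i, q) == 0 for q in range(p + 1, B + 1))
        conclusion = bit(j, p) == 0 and all(
            bit(j, q) == bit(i, q) for q in range(1, B + 1) if q != p
        )
        if premise and not conclusion:
            return False
    return True

def tool_output(i):
    """J = I & (I-1), truncated to B bits."""
    return i & (i - 1) & ((1 << B) - 1)
```

Exhaustive checking is feasible only because the width is tiny; the SAT/SMT-based approach replaces this enumeration with a solver query.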
25
Functional Specification
Problem: Turn off rightmost contiguous string of 1-bits
• Logical Relations
– A bit complicated
• Input-Output Examples
– Key challenge is to resolve under-specification
– Our solution: Interaction with user
26
Dialog: Interactive Synthesis
Problem: Turn-off rightmost contiguous string of 1’s
User: I want a design that maps 01011 -> 01000
Oracle: I can think of two designs
Design 1: (x+1) & (x-1)
Design 2: (x+1) & x
which differ on 00000 (Distinguishing Input)
What should 00000 be mapped to?
User: 00000 -> 00000


27
Dialog: Interactive Synthesis
Problem: Turn-off rightmost contiguous string of 1’s
User: 01011 -> 01000
Oracle: 00000 ?
User: 00000
Oracle: 01111 ?
User: 00000
Oracle: 00110 ?
User: 00000
Oracle: 01100 ?
User: 00000
Oracle: 01010 ?
User: 01000
Oracle: Your design is x & (1 + ((x-1)|x))
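The dialog can be replayed in a few lines. This sketch assumes 5-bit vectors, as in the examples, and uses brute-force enumeration in place of the solver-based search for a distinguishing input; the function names are hypothetical.

```python
WIDTH = 5
MASK = (1 << WIDTH) - 1

def design1(x):
    return (x + 1) & (x - 1) & MASK

def design2(x):
    return (x + 1) & x & MASK

def final_design(x):
    return x & (1 + ((x - 1) | x)) & MASK

def distinguishing_input(p, q):
    """Return the smallest input on which p and q disagree, or None."""
    for x in range(1 << WIDTH):
        if p(x) != q(x):
            return x
    return None

# Input-output pairs collected from the user during the dialog.
dialog = [(0b01011, 0b01000), (0b00000, 0b00000), (0b01111, 0b00000),
          (0b00110, 0b00000), (0b01100, 0b00000), (0b01010, 0b01000)]
```

Both candidate designs agree with the first example, and `distinguishing_input(design1, design2)` returns 0, the input 00000 that the oracle asks about. The final design is consistent with every answer in `dialog`.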
28
Outline
• Dimension 1: User Intent
• Dimension 2: Search space
• Dimension 3: Search Technique
• Applications
– Bit-vector Algorithms
➤ String Manipulation Macros
– Geometry Constructions
29
Application 2: String Manipulation Macros
• Search Space Dimension: Programs with conditionals/loops
– String operations: Concatenate, Substring
– Logical operations: comparison involving # of occurrences of a
regular expression
• User Intent Dimension: Input-output examples
• Search Algorithm Dimension: Combination of
– Version Space Algebras
– Machine Learning
30
Application 2: String Manipulation Macros
• Significance Dimension: End-users
[Figure: consumers of program synthesis technology: Algorithm Designers, Software Developers, End-Users (labeled “Most Useful Target”)]
31
Methodology: Automating end-user programming
1. Identify tasks that end-users struggle with and identify
how they can effectively communicate their intent.
– Read help-forums and blogs.
– Interview real users.
2. Design a language that satisfies the following trade-off.
– Expressive enough to express a lot of tasks.
– Small enough to allow efficient learning.
3. Design a learning algorithm with the following features.
– Interactive with fast convergence (with success or failure).
– Provide feedback.
– Noise tolerant.
Joint work with: Bill Harris (UW, Madison), Rishabh Singh (MIT)
32
Synthesis Algorithm for String Programs
• Language L of programs contains regular expressions,
conditionals and loops.
• Goal: Given input-output pairs: (i1, o1), (i2, o2), (i3, o3),
(i4, o4), compute set of all programs P such that
– P(i1)=o1, P(i2)=o2, P(i3)=o3, P(i4)=o4
– Then, choose the simplest such P.
33
Synthesis Algorithm for String Programs
1. Compute the set D1 of all straight-line programs s.t. for
any Q in D1, Q(i1) = o1. Similarly compute D2, D3, D4.
2. Let D = D1 ∩ D2 ∩ D3 ∩ D4. If D ≠ ∅ then done, else:
(a) Find a smallest partition, say {D1,D3}, {D2,D4}, s.t. D1 ∩ D3 ≠ ∅ and D2 ∩ D4 ≠ ∅.
(b) Learn a boolean classifier b that maps i1 and i3 to true
and i2 and i4 to false.
3. Desired set of programs is
if (b) then D1 ∩ D3
else D2 ∩ D4
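The steps above can be sketched on a toy scale. Everything concrete here is an assumption for illustration: the four candidate programs, the two candidate predicates, and the restriction to two-block partitions are hypothetical, and program sets are enumerated explicitly rather than represented with a version space algebra.

```python
from itertools import combinations

# Hypothetical toy space of straight-line string programs.
PROGRAMS = {
    "identity": lambda s: s,
    "upper": lambda s: s.upper(),
    "first_word": lambda s: (s.split() or [""])[0],
    "last_word": lambda s: (s.split() or [""])[-1],
}

# Hypothetical toy predicates from which the classifier b is picked.
PREDICATES = {
    "has_digit": lambda s: any(c.isdigit() for c in s),
    "has_comma": lambda s: "," in s,
}

def consistent(example):
    """Step 1: the set D_i of programs Q with Q(input) = output."""
    inp, out = example
    return {name for name, q in PROGRAMS.items() if q(inp) == out}

def common(group):
    """Intersection of the D_i over a group of examples."""
    return set.intersection(*(consistent(e) for e in group))

def synthesize(examples):
    d = common(examples)                      # Step 2: D = D1 ∩ ... ∩ Dn
    if d:
        return ("straight-line", d)
    n = len(examples)
    for size in range(1, n):                  # Step 2(a): try two-block splits
        for idx in combinations(range(n), size):
            then_group = [examples[k] for k in idx]
            else_group = [examples[k] for k in range(n) if k not in idx]
            d_then, d_else = common(then_group), common(else_group)
            if not (d_then and d_else):
                continue
            for name, b in PREDICATES.items():  # Step 2(b): pick a classifier
                if all(b(i) for i, _ in then_group) and not any(b(i) for i, _ in else_group):
                    return ("conditional", name, d_then, d_else)  # Step 3
    return None

examples = [("alan turing", "alan"), ("ada lovelace", "ada"),
            ("hopper, grace", "grace"), ("knuth, donald", "donald")]
result = synthesize(examples)
```

On these examples no single program fits all four pairs, so the sketch returns a conditional: if the input has a comma, take the last word, otherwise take the first word.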
34
Outline
• Dimension 1: User Intent
• Dimension 2: Search space
• Dimension 3: Search Technique
• Applications
– Bit-vector Algorithms
– String Manipulation Macros
➤ Geometry Constructions
35
Application 3: Geometry Constructions
• Search Space Dimension: Straight-line programs
– Operations: Ruler, Compass
• User Intent Dimension: Logical specifications
– Can be obtained from natural language
– Is further translated into a random input-output example
• Search Algorithm Dimension: Exhaustive Search
– Property Testing
– Goal-directed search
– Commonly used library of constructions
36
Application 3: Geometry Constructions
• Significance Dimension: Students and Teachers
[Figure: consumers of program synthesis technology: Algorithm Designers, Software Developers, End-Users (labeled “Most Useful Target”), Students and Teachers (labeled “Most Transformational Target”)]
37
Automating Education
Make education interactive and fun
• Automated problem solving (for students)
– Provide hints
– Point out mistakes and suggest fixes
• Creation of teaching material (for teachers)
– Authoring tools
– Problem construction
• Group interaction (for teachers/students)
– Ask questions
– Share annotations
Domains: Geometry, Algebra, Probability, Mechanics,
Electrical Circuits, etc.
38
Geometry Constructions Domain
What is the role of PL + Logic + Synthesis?
• Programming Language for Geometry
– Objects: Point, Line, Circle
– Constructors
• Ruler(Point, Point) -> Line
• Compass(Point, Point) -> Circle
• Intersect(Circle, Circle) -> Pair of Points
• Intersect(Line, Circle) -> Pair of Points
• Intersect(Line, Line) -> Point
• Logic for Geometry
– Inequality predicates over arithmetic expressions
• Distance(Point, Point), Angle(Line, Line), …
• Automated Problem Solving
– Given pre/postcondition, synthesize a straight-line program
39
Geometry Domain: Automated Problem Solving
Automated Problem Solving
• Given pre/postcondition, synthesize a straight-line program
Example: Draw a line L’ perpendicular to a given line L.
Precondition: true
Postcondition: Angle(L’,L) = 90
Program
Step 1: P1, P2 = ChoosePoint(L);
Step 2: C1 = Circle(P1,P2);
Step 3: C2 = Circle(P2,P1);
Step 4: <P3, P4> = Intersect(C1,C2);
Step 5: L’ = Line(P3,P4);
40
Constructing line L’ perpendicular to given line L
[Figure: the construction drawn out. Circles C1 and C2 are centered at P1 and P2 on L; they intersect at P3 and P4, and L’ is the line through P3 and P4.]
41
Examples of Geometry Constructions
• Bisect a given line.
• Bisect an angle.
• Copy an angle.
• Draw a line parallel to a given line.
• Draw an equilateral triangle given two points.
• Draw a regular hexagon given a side.
• Given 4 points, draw a square with each of the
sides passing through a different point.
Other Applications:
• New approximate geometric constructions
• 2D/3D planning problems
42
Synthesis Algorithm for Geometry Constructions
• Synthesis, in general, is harder than verification.
– Synthesis Problem: Given pre/postcondition, synthesize a
straight-line program
– Verification Problem: Given pre/postcondition and a straight-line
program, determine whether the Hoare triple holds.
Precondition: True
Postcondition: Angle(L,L’) = 90
Step 1: P1, P2 = ChoosePoint(L);
Step 2: C1 = Circle(P1,P2);
Step 3: C2 = Circle(P2,P1);
Step 4: <P3, P4> = Intersect(C1,C2);
Step 5: L’ = Line(P3,P4);
• Decision procedures for verification of geometry
constructions are known, but are complex.
– Because of symbolic reasoning.
43
A simpler strategy for verification of Constructions
• Symbolic reasoning based decision procedures are complex.
• How about property testing?
Theorem: A construction that works (i.e., satisfies the
postcondition) for a randomly chosen model of precondition
also works for all models (w.h.p.).
Proof:
• Objects constructed using ruler/compass can be described
using polynomial ops (+,-,*), square-root & division operator.
• The randomized polynomial identity testing algorithm lifts
to square-root and division operators as well.
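As a hedged sketch of property testing, the perpendicular-line construction from the earlier slides can be run on randomly chosen points and the postcondition checked numerically. The helper names, the coordinate range, and the floating-point tolerance are assumptions of this sketch, and floating point stands in for exact arithmetic.

```python
import math
import random

def intersect_circles(c1, r1, c2, r2):
    """Both intersection points of two circles (assumed to intersect)."""
    (x1, y1), (x2, y2) = c1, c2
    d = math.hypot(x2 - x1, y2 - y1)
    a = (r1 * r1 - r2 * r2 + d * d) / (2 * d)
    h = math.sqrt(max(r1 * r1 - a * a, 0.0))
    mx, my = x1 + a * (x2 - x1) / d, y1 + a * (y2 - y1) / d
    ox, oy = h * (y2 - y1) / d, h * (x2 - x1) / d
    return (mx + ox, my - oy), (mx - ox, my + oy)

def perpendicular(p1, p2):
    """Steps 2-5: circles C1, C2 through each other's centers, then P3, P4."""
    r = math.dist(p1, p2)
    return intersect_circles(p1, r, p2, r)

def postcondition_holds(rng, tol=1e-6):
    """Check Angle(L', L) = 90 on one randomly chosen model of the precondition."""
    p1 = (rng.uniform(-10, 10), rng.uniform(-10, 10))  # Step 1: points on L
    p2 = (rng.uniform(-10, 10), rng.uniform(-10, 10))
    p3, p4 = perpendicular(p1, p2)
    ux, uy = p2[0] - p1[0], p2[1] - p1[1]              # direction of L
    vx, vy = p4[0] - p3[0], p4[1] - p3[1]              # direction of L'
    dot = ux * vx + uy * vy                            # zero iff perpendicular
    return abs(dot) <= tol * math.hypot(ux, uy) * math.hypot(vx, vy)
```

Calling `postcondition_holds(random.Random(0))` tests a single random model; by the theorem above, one passing random model already gives high confidence that the construction is correct in general.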
44
Randomized Polynomial Identity Testing
• Problem: Given two polynomials P1 and P2, determine
whether they are equivalent.
• The naïve deterministic algorithm of expanding
polynomials to compare them term-wise is exponential.
• A simple randomized test is probabilistically sufficient:
– Choose random values r for polynomial variables x
– If P1(r) ≠ P2(r), then P1 is not equivalent to P2.
– Otherwise P1 is equivalent to P2 with high probability.
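A minimal sketch of the randomized test follows. Evaluating modulo a large prime and repeating the trial several times are assumptions of this sketch (a common way to realize the idea), and the names `probably_equal`, `lhs`, `rhs`, and `wrong` are hypothetical.

```python
import random

PRIME = (1 << 61) - 1  # assumption: evaluate over a large prime field

def probably_equal(p1, p2, num_vars, trials=10, rng=random):
    """Randomized identity test: False is definite, True holds w.h.p."""
    for _ in range(trials):
        r = [rng.randrange(PRIME) for _ in range(num_vars)]
        if p1(*r) % PRIME != p2(*r) % PRIME:
            return False  # a witness point proves the polynomials differ
    return True

lhs = lambda x, y: (x + y) ** 2
rhs = lambda x, y: x * x + 2 * x * y + y * y
wrong = lambda x, y: x * x + y * y
```

`probably_equal(lhs, rhs, 2)` always reports equivalence, since the two polynomials agree at every point. `probably_equal(lhs, wrong, 2)` can only be fooled when all sampled points happen to be roots of 2xy, which is vanishingly unlikely over a field of this size.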
45
Synthesis Algorithm for Geometry Constructions
Problem: Symbolic reasoning is hard.
Idea #1: Leverage Property Testing to reduce symbolic
reasoning to concrete reasoning.
• Construct a random input-output example (I,O) for the problem
and find a construction that can generate O from I.
• Example: Construct incenter of a triangle.
– If I chose my input triangle to be an equilateral one, then the
circumcenter construction also appears to work!
• Since incenter = circumcenter for an equilateral triangle.
– But what are the chances of choosing a random triangle to be
an equilateral one?
46
Synthesis Algorithm for Geometry Constructions
Exhaustive Search Strategy: Given input objects I and
desired objects O, keep constructing new objects from I
using ruler and compass until objects O are obtained.
Problem: Search blows up, i.e., too many (useless) objects get
constructed.
– Example: n points lead to O(n^2) lines, which leads to O(n^4)
points, and so on…
47
Synthesis Algorithm for Geometry Constructions
Problem: Search space is huge.
• Idea #2: Perform goal-directed reasoning.
– Example: If an operation leads to construction of a line L that
passes through a desired output point, it is worthwhile
constructing line L.
– Mimics human intelligence.
– For this to be effective, we need solutions with small depth.
• Idea #3: Work with a richer library of primitives.
– Common constructions picked up from chapters of text-books.
– A search space of (small width, large depth) is converted into
one of (large width, small depth).
– Mimics human experience/knowledge.
48
Search space Exploration: With/without goal-directedness
49
Problem Solving Engine with Natural Interfaces
Problem Description in English
  -> Natural Language Processing ->
Problem Description as Logical Relation
  -> Synthesis Engine ->
Solution as Functional Program
  -> Paraphrasing ->
Solution in English
Joint work with: Kalika Bali, Monojit Chaudhuri (MSR Bangalore)
Vijay Korthikanti (UIUC), Ashish Tiwari (SRI)
50
Useful modules powered by problem solving engine
The next step is to architect several useful modules on
top of the problem-solving architecture such as:
• Interactive feedback to students
– Provide hints
– Point out mistakes and suggest fixes
• Creation of teaching material (for teachers)
– Problem construction
– Authoring tools
51
Other Domains
What domains should we prioritize for automation?
• Mathematics
– Algebra
– Probability
• Physics
– Mechanics
– Electrical Circuits
– Optics
• Chemistry
– Quantitative Chemistry
– Organic Chemistry
52
Electrical Circuits: Concept-specific solutions
• Consider the problem of computing effective resistance
between two nodes in a graph of resistances.
• MATLAB implements Kirchhoff’s law based decision procedure
– Algebraic sum of the currents at any circuit junction = 0
– Sum of changes in potential in any complete loop = 0
Joint work with: Swarat Chaudhuri (Penn State University)
53
Electrical Circuits: Concept-specific solutions
• Consider the problem of computing effective resistance
between two nodes in a graph of resistances.
• Kirchhoff’s law based decision procedure is not useful for
students who are expected to know only simpler concepts.
• Solutions need to be parameterized by specific concepts such as
– Series/Parallel composition of resistances
– Symmetry Reduction
– Wheatstone Bridge
Joint work with: Swarat Chaudhuri (Penn State University)
54
Resistance Reduction Concepts
• Series Combination
• Parallel Combination
• Wheatstone Bridge: if R3/R1 = R4/R2, then VD = VB
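As a small worked illustration of these concepts, the standard series and parallel formulas and the balance condition stated above can be written as reusable reduction steps. The function names are hypothetical, and the assignment of R1 to R4 to particular bridge arms follows the figure, which is not reproduced here.

```python
def series(*rs):
    """Effective resistance of resistors in series: R = R1 + R2 + ..."""
    return sum(rs)

def parallel(*rs):
    """Effective resistance of resistors in parallel: 1/R = 1/R1 + 1/R2 + ..."""
    return 1.0 / sum(1.0 / r for r in rs)

def bridge_balanced(r1, r2, r3, r4, tol=1e-9):
    """Balance condition from the slide: R3/R1 = R4/R2 (then VD = VB)."""
    return abs(r3 / r1 - r4 / r2) <= tol
```

For example, `series(2, parallel(6, 3))` reduces a 2 ohm resistor followed by a 6 ohm and 3 ohm pair in parallel; a concept-specific solver would apply such steps repeatedly until a single resistance remains.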
55
Automating Education: Long-term Goals
• Ultra-intelligent computer
• Model of human mind
• Interstellar travel
(Inter-disciplinary) Dimensions in Program Synthesis
• User Intent
– Human Computer Interaction
– Natural Language Processing
• Search Space (requires corresponding domain expertise)
– Graphics (for image manipulation)
– Mathematics/Physics (for classroom problem solving)
• Search Techniques
– Logical Reasoning
– Machine Learning
57
The Significance Dimension
[Figure: consumers of program synthesis technology: Algorithm Designers, Software Developers, End-Users (labeled “Most Useful Target”), Students and Teachers (labeled “Most Transformational Target”)]
58
Research Questions
• How to combine various forms of user intent in a unified
programming interface?
– logic, natural language, input/output example, partial program
• How to ensure a modular architecture that allows reuse of
domain knowledge and search techniques across different
synthesis tools/applications?
• How to combine power of different search techniques?
– Version space algebras
– SAT/SMT based logical reasoning techniques
– Machine learning techniques
59
References
• Dimensions in Program Synthesis
– Invited paper at ACM PPDP 2010
• Bitvector Algorithms
– “Oracle guided component based program synthesis”,
ICSE 2010, Jha/Gulwani/Seshia/Tiwari
• String Manipulation Macros
– “Automating String Processing in Spreadsheets using Input-Output Examples”, POPL 2011, Gulwani
• Geometry Constructions
– “Synthesizing Geometry Constructions”,
Techreport 2011, Gulwani/Korthikanti/Tiwari
60