Dimensions in Synthesis Sumit Gulwani May 2012

Dimensions in Synthesis
Sumit Gulwani
sumitg@microsoft.com
Microsoft Research, Redmond
May 2012
Program Synthesis
• Synthesize a program in some underlying language from
user intent using some search technique.
• Why today?
– Variety of (cheap) computational devices and platforms
• Billions of non-experts have access to these devices!
– Enabling technology is now available
• Better search algorithms
• Faster machines (good application for multi-cores)
1
Dimensions in Synthesis
• Concept Language (Application)
– Programs
• Straight-line programs
– Automata
– Queries
– Sequences
• User Intent (Ambiguity)
– Logic, Natural Language
– Examples, Demonstrations/Traces
• Search Technique (Algorithm)
– SAT/SMT solvers (Formal Methods)
– A*-style goal-directed search (AI)
– Version space algebras (Machine Learning)
PPDP 2010: “Dimensions in Program Synthesis”, Gulwani.
3
Compilers vs. Synthesizers
• Concept Language
– Compilers: Executable program
– Synthesizers: Variety of concepts: Program, Automata, Query, Sequence
• User Intent
– Compilers: Structured language
– Synthesizers: Variety/mixed form of constraints: logic, examples, traces
• Search Technique
– Compilers: Syntax-directed translation (no new algorithmic insights)
– Synthesizers: Uses some kind of search (discovers new algorithmic insights)
4
Potential Users of Synthesis Technology
• Algorithm Designers
• Software Developers
• End-Users (Most Useful Target)
• Students and Teachers (Most Transformational Target)
• Vision for End-users: Enable people to have (automated)
personal assistants.
• Vision for Education: Enable every student to have
access to free & high-quality education.
5
Organization
Lecture 1: Algorithms
• Synthesis of Straight-line Programs from Logic
– Bit-vector Algorithms
– Geometry Constructions
Lecture 2: Applications
• Intelligent Tutoring Systems
Lecture 3: Ambiguity
• Synthesis from Examples & Keywords
6
Lab
Intelligent Tutoring Systems
Technical Goals:
• Identify a useful task that can be formalized as a
synthesis problem.
• Propose an appropriate user interaction model.
• Propose an appropriate search technique.
7
Synthesizing Bitvector Algorithms
PLDI 2011: Gulwani, Jha, Tiwari, Venkatesan
Bitvector Algorithms
Straight-line programs that use
– Arithmetic Operators: +,-,*,/
– Logical Operators: Bitwise and/or/not, Shift left/right
10
Examples of Bitvector Algorithms
Turn-off rightmost 1-bit: Z & (Z-1)
Z = 10101100
Z-1 = 10101011
Z & (Z-1) = 10101000
11
Examples of Bitvector Algorithms
Turn-off rightmost contiguous sequence of 1-bits: Z & (1 + (Z | (Z-1)))
Z = 10101100
Z & (1 + (Z | (Z-1))) = 10100000
Ceil of average of two integers without overflowing: (Y|Z) - ((Y⊕Z) >> 1)
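The identities on these slides can be checked mechanically. The following Python sketch is illustrative only: the 8-bit width, the mask, and the function names are assumptions made for the example and do not come from the slides.

```python
WIDTH = 8                    # assumed bit-vector width for this illustration
MASK = (1 << WIDTH) - 1      # keeps every result within WIDTH bits


def turn_off_rightmost_one(z: int) -> int:
    """Z & (Z-1): clears the lowest set bit."""
    return z & (z - 1) & MASK


def turn_off_rightmost_ones_run(z: int) -> int:
    """Z & (1 + (Z | (Z-1))): clears the lowest contiguous run of 1-bits."""
    return z & (1 + (z | ((z - 1) & MASK))) & MASK


def ceil_average(y: int, z: int) -> int:
    """(Y|Z) - ((Y^Z) >> 1): ceiling of the average without forming Y+Z."""
    return ((y | z) - ((y ^ z) >> 1)) & MASK


print(format(turn_off_rightmost_one(0b10101100), "08b"))       # 10101000
print(format(turn_off_rightmost_ones_run(0b10101100), "08b"))  # 10100000
```

The average identity follows from Y + Z = 2(Y|Z) - (Y⊕Z), so no intermediate value exceeds the word size.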
12
Examples of Bitvector Algorithms
Round up to next highest power of 2
o1 := sub(x,1);
o2 := shr(o1,1);
o3 := or(o1,o2);
o4 := shr(o3,2);
o5 := or(o3,o4);
o6 := shr(o5,4);
o7 := or(o5,o6);
o8 := shr(o7,8);
o9 := or(o7,o8);
o10 := shr(o9,16);
o11 := or(o9,o10);
res := add(o11,1);
Higher order half of product of x and y
o1 := and(x,0xFFFF);
o2 := shr(x,16);
o3 := and(y,0xFFFF);
o4 := shr(y,16);
o5 := mul(o1,o3);
o6 := mul(o2,o3);
o7 := mul(o1,o4);
o8 := mul(o2,o4);
o9 := shr(o5,16);
o10 := add(o6,o9);
o11 := and(o10,0xFFFF);
o12 := shr(o10,16);
o13 := add(o7,o11);
o14 := shr(o13,16);
o15 := add(o14,o12);
res := add(o15,o8);
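Both straight-line programs can be transcribed line by line and compared against a direct computation. This is a sketch under an assumed 32-bit word size; the function names and the mask `M32` are hypothetical, and the round-up transcription takes the final `or` result as the operand of the last `add`.

```python
M32 = 0xFFFFFFFF  # assumed 32-bit word size


def round_up_pow2(x: int) -> int:
    """Smear the highest set bit of x-1 to the right, then add 1."""
    o1 = (x - 1) & M32
    o3 = o1 | (o1 >> 1)
    o5 = o3 | (o3 >> 2)
    o7 = o5 | (o5 >> 4)
    o9 = o7 | (o7 >> 8)
    o11 = o9 | (o9 >> 16)
    return (o11 + 1) & M32


def high_half_product(x: int, y: int) -> int:
    """Upper 32 bits of the 64-bit product, using only 32-bit operations."""
    o1, o2 = x & 0xFFFF, x >> 16          # low and high halves of x
    o3, o4 = y & 0xFFFF, y >> 16          # low and high halves of y
    o5 = (o1 * o3) & M32
    o6 = (o2 * o3) & M32
    o7 = (o1 * o4) & M32
    o8 = (o2 * o4) & M32
    o9 = o5 >> 16
    o10 = (o6 + o9) & M32
    o11 = o10 & 0xFFFF
    o12 = o10 >> 16
    o13 = (o7 + o11) & M32
    o14 = o13 >> 16
    o15 = (o14 + o12) & M32
    return (o15 + o8) & M32


print(round_up_pow2(17))                             # 32
print(hex(high_half_product(0xFFFFFFFF, 0xFFFFFFFF)))  # 0xfffffffe
```

Python integers are unbounded, so the explicit masking stands in for the wrap-around a fixed-width machine would perform.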
13
Problem Definition
Given:
• Specification of desired functionality
• Specification of library components
Synthesize a straight-line program that meets the desired specification,
where
• Each input variable on line j is either a program input or the output of some line k, where k<j
• The order in which the n components appear is a permutation of 1...n
Verification Constraint
14
Problem Definition: Turn-off rightmost 1 bit
• Specification of desired functionality
• Specification of library components
15
Synthesis Constraint
Verification Constraint
Synthesis Constraint
16
Idea # 1: Reduce Second-order Quantification in
Synthesis Constraint to First Order
The unknown program structure determines which component goes on which
location (line #) and from which location each component gets its input
arguments. We encode this by location variables L.
17
Example: Possible programs that use 2 components
and their Representation using Location Variables
18
Encoding Well-formedness of Programs
The following constraint ensures that L assignments
correspond to well-formed programs.
• Consistency Constraint: Every line in the program
should have at most one component.
• Acyclicity Constraint: A variable should be initialized
before being used.
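The two well-formedness conditions can be illustrated with a small checker over a concrete assignment of location variables. This is only a sketch: the dictionary encoding (component name mapped to an output line and a tuple of argument lines), the convention that program inputs occupy the first lines, and the function name are all assumptions, not the encoding used in the paper.

```python
from typing import Dict, Tuple

# Hypothetical encoding: component name -> (output line, argument lines).
Locations = Dict[str, Tuple[int, Tuple[int, ...]]]


def is_well_formed(loc: Locations, num_inputs: int) -> bool:
    """Check an assignment of location variables for well-formedness."""
    n_lines = num_inputs + len(loc)
    outputs = [out for out, _ in loc.values()]
    # Range: component outputs occupy the lines after the program inputs.
    in_range = all(num_inputs <= out < n_lines for out in outputs)
    # Consistency: no two components share a line.
    consistent = len(set(outputs)) == len(outputs)
    # Acyclicity: every argument is defined on an earlier line.
    acyclic = all(0 <= arg < out for out, args in loc.values() for arg in args)
    return in_range and consistent and acyclic


# Turn-off rightmost 1-bit: line 0 is the input Z, line 1 is Z-1, line 2 is Z & (Z-1).
good = {"dec": (1, (0,)), "and": (2, (0, 1))}
cyclic = {"dec": (2, (0,)), "and": (1, (0, 2))}   # "and" reads line 2 too early
clash = {"dec": (1, (0,)), "and": (1, (0, 0))}    # two components on line 1
print(is_well_formed(good, 1), is_well_formed(cyclic, 1), is_well_formed(clash, 1))
```

In the synthesis constraint these conditions are stated as formulas over L for a solver rather than checked on a fixed assignment as done here.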
19
Encoding data-flow
The following constraint describes connections
between inputs and outputs of various components.
20
Idea # 1: Reduce Second-order Quantification in
Synthesis Constraint to First Order
21
Idea # 2: Using CEGIS style procedure to
solve the Synthesis Constraint
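Synthesis constraint is of the form: ∃L ∀Y F(L,Y)
1. Choose some values y1,..,yn for Y.
2. Finite Synthesis Step: solve ∃L F(L,y1) ∧ … ∧ F(L,yn).
– No solution: report failure.
– Solution L = S: go to the verification step.
3. Verification Step: does ∀Y F(S,Y) hold? Equivalently, solve ∃Y ¬F(S,Y).
– Solution Y = yn+1: add yn+1 to the chosen values and repeat the finite synthesis step.
– No solution: return S.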
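The loop can be sketched in a few lines of Python. This is a toy version under stated assumptions: both solver calls are replaced by brute-force enumeration over a finite candidate list and a finite input domain, F(L,Y) is taken to mean "candidate L agrees with a reference function on input Y", and all names (`cegis`, `spec`, `cands`) are hypothetical. The real procedure hands both steps to an SMT solver.

```python
from typing import Callable, Iterable, List, Optional

Program = Callable[[int], int]


def cegis(candidates: List[Program], spec: Program,
          domain: Iterable[int]) -> Optional[Program]:
    points = list(domain)
    examples = [points[0]]                       # initial choice of y1
    while True:
        # Finite synthesis step: find L consistent with all examples so far.
        s = next((c for c in candidates
                  if all(c(y) == spec(y) for y in examples)), None)
        if s is None:
            return None                          # no solution: failure
        # Verification step: search for Y with not F(S, Y).
        cex = next((y for y in points if s(y) != spec(y)), None)
        if cex is None:
            return s                             # no counterexample: return S
        examples.append(cex)                     # add y_{n+1} and repeat


def spec(z: int) -> int:
    """Reference behaviour: subtract the lowest set bit of z."""
    return z - (z & -z)


cands = [lambda z: z, lambda z: z >> 1, lambda z: z & (z + 1), lambda z: z & (z - 1)]
found = cegis(cands, spec, range(256))
print(format(found(0b10101100), "08b"))          # 10101000
```

Each counterexample rules out at least the candidate that produced it, so with a finite candidate set the loop terminates.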
22
Experiments: Comparison with Brute-force Search
Name  Lines  Iters  Brahma time  AHA time
P1    2      2      3            0.1
P2    2      3      3            0.1
P3    2      3      1            0.1
P4    2      2      3            0.1
P5    2      3      2            0.1
P6    2      2      2            0.1
P7    3      2      1            2
P8    3      2      1            1
P9    3      2      6            7
P10   3      14     76           10
P11   3      7      57           9
P12   3      9      67           10
P13   4      4      6            X
P14   4      4      60           X
P15   4      8      119          X
P16   4      5      62           X
P17   4      6      78           109
P18   6      5      46           X
P19   6      5      35           X
P20   7      6      108          X
P21   8      5      28           X
P22   8      8      279          X
P23   10     8      1668         X
P24   12     9      224          X
P25   16     11     2779         X
23
Synthesizing Geometry Constructions
PLDI 2011: Gulwani, Korthikanti, Tiwari.
Ruler/Compass based Geometry Constructions
Given a triangle XYZ,
construct circle C such that
C passes through X, Y, and Z.
[Figure: triangle XYZ with lines L1 and L2 meeting at N, and circle C through X, Y, and Z]
26
Other Examples of Geometry Constructions
• Draw a regular hexagon given a side.
• Given 3 parallel lines, draw an equilateral triangle
whose vertices lie on the parallel lines.
• Given 4 points, draw a square whose sides contain
those points.
27
Significance
• Good platform for teaching logical reasoning.
– Visual Nature:
• Makes it more accessible.
• Exercises both logical/visual abilities of left/right brain.
– Fun Aspect:
• Ruler/compass restrictions make it fun, as in sports.
• Application in dynamic geometry or animations.
– “Constructive” geometry macros (unlike numerical
methods) enable fast re-computation of derived objects
from free (moving) objects.
28
Programming Language for Geometry Constructions
Types: Point, Line, Circle
Methods:
• Ruler(Point, Point) -> Line
• Compass(Point, Point) -> Circle
• Intersect(Circle, Circle) -> Pair of Points
• Intersect(Line, Circle) -> Pair of Points
• Intersect(Line, Line) -> Point
Geometry Program: A straight-line composition of
the above methods.
29
Example Problem: Program
Given a triangle XYZ, construct circle C such that C
passes through X, Y, and Z.
1. C1 = Compass(X,Y);
2. C2 = Compass(Y,X);
3. <P1,P2> = Intersect(C1,C2);
4. L1 = Ruler(P1,P2);
5. D1 = Compass(Z,X);
6. D2 = Compass(X,Z);
7. <R1,R2> = Intersect(D1,D2);
8. L2 = Ruler(R1,R2);
9. N = Intersect(L1,L2);
10. C = Compass(N,X);
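[Figure: the construction, showing circles C1, C2, D1, D2, points P1, P2, R1, R2, lines L1, L2, center N, and circle C through X, Y, and Z]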
30
Specification Language for Geometry Programs
Conjunction of predicates over arithmetic expressions
Predicates
p := e1 = e2 | e1 ≠ e2 | e1 ≤ e2
Arithmetic Expressions
e := Distance(Point, Point) | Slope(Point, Point) | e1 ± e2 | c
31
Example Problem: Precondition/Postcondition
Given a triangle XYZ, construct circle C such that C
passes through X, Y, and Z.
Precondition:
Slope(X,Y) ≠ Slope(X,Z) ∧ Slope(X,Y) ≠ Slope(Z,X)
Postcondition:
LiesOn(X,C) ∧ LiesOn(Y,C) ∧ LiesOn(Z,C)
where LiesOn(X,C) ≡
Distance(X,Center(C)) = Radius(C)
32
Verification/Synthesis Problem for Geometry Programs
• Let P be a geometry program that computes
outputs O from inputs I.
• Verification Problem: Check the validity of the
following Hoare triple.
Assume Pre(I);
P
Assert Post(I,O);
• Synthesis Problem: Given Pre(I), Post(I,O), find P
such that the above Hoare triple is valid.
33
Approaches to Verification Problem
Pre(I), P, Post(I,O)
a) Symbolic decision procedures are complex.
34
Randomized Polynomial Identity Testing
• Problem: Given two polynomials P1 and P2, determine
whether they are equivalent.
• The naïve deterministic algorithm of expanding
polynomials to compare them term-wise is exponential.
• A simple randomized test is probabilistically sufficient:
– Choose random values r for polynomial variables x
– If P1(r) ≠ P2(r), then P1 is not equivalent to P2.
– Otherwise P1 is equivalent to P2 with high probability.
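A minimal Python sketch of the randomized test follows. The polynomials are passed as black-box callables and evaluated modulo a large prime so that values stay bounded; the modulus, the number of trials, and the function name are assumptions chosen for the illustration.

```python
import random

PRIME = 2_147_483_647  # assumed modulus (2^31 - 1); any large prime would do


def probably_equal(p1, p2, num_vars: int, trials: int = 20, seed: int = 0) -> bool:
    """Compare two polynomials (given as callables) at random points mod PRIME."""
    rng = random.Random(seed)
    for _ in range(trials):
        r = [rng.randrange(PRIME) for _ in range(num_vars)]
        if p1(*r) % PRIME != p2(*r) % PRIME:
            return False   # a witness point: definitely not equivalent
    return True            # no witness found: equivalent with high probability


def square_of_sum(x, y):
    return (x + y) ** 2


def expanded(x, y):
    return x * x + 2 * x * y + y * y


def missing_cross_term(x, y):
    return x * x + y * y


print(probably_equal(square_of_sum, expanded, 2))            # True
print(probably_equal(square_of_sum, missing_cross_term, 2))  # False
```

A "not equivalent" answer is always correct. An "equivalent" answer can be wrong only if every sampled point happens to be a root of P1 - P2, and a nonzero polynomial of low degree has few roots relative to the size of the field.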
35
Approaches to Verification Problem
Pre(I), P, Post(I,O)
a) Symbolic decision procedures are complex.
b) New efficient approach: Random Testing!
1. Choose I’ randomly from the set { I | Pre(I) }.
2. Compute O’ := P(I’).
3. If O’ satisfies Post(I’,O’) output “Verified”.
Correctness Proof of (b):
• Objects constructed by P can be described using
polynomial ops (+,-,*), square-root & division operator.
• The randomized polynomial identity testing algorithm
lifts to square-root and division operators as well!
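Random testing of a geometry program can be sketched in Python for the circumcircle construction. Everything here is an assumption made for the illustration: floating-point coordinates with a tolerance instead of exact arithmetic, tuple representations of points, lines, and circles, the helper names, and a cross-product check standing in for the slope-based precondition.

```python
import math
import random


def ruler(p, q):
    return (p, q)                      # line through p and q


def compass(center, through):
    return (center, math.hypot(through[0] - center[0], through[1] - center[1]))


def intersect_circles(c1, c2):
    (x1, y1), r1 = c1
    (x2, y2), r2 = c2
    d = math.hypot(x2 - x1, y2 - y1)
    a = (r1 * r1 - r2 * r2 + d * d) / (2 * d)   # center of c1 to the common chord
    h = math.sqrt(max(r1 * r1 - a * a, 0.0))    # half the chord length
    mx, my = x1 + a * (x2 - x1) / d, y1 + a * (y2 - y1) / d
    dx, dy = h * (y2 - y1) / d, h * (x2 - x1) / d
    return (mx + dx, my - dy), (mx - dx, my + dy)


def intersect_lines(l1, l2):
    (x1, y1), (x2, y2) = l1
    (x3, y3), (x4, y4) = l2
    den = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    a, b = x1 * y2 - y1 * x2, x3 * y4 - y3 * x4
    return ((a * (x3 - x4) - (x1 - x2) * b) / den,
            (a * (y3 - y4) - (y1 - y2) * b) / den)


def circumcircle(X, Y, Z):
    """The 10-step construction: two perpendicular bisectors, then the circle."""
    p1, p2 = intersect_circles(compass(X, Y), compass(Y, X))
    l1 = ruler(p1, p2)
    r1, r2 = intersect_circles(compass(Z, X), compass(X, Z))
    l2 = ruler(r1, r2)
    n = intersect_lines(l1, l2)
    return compass(n, X)


def lies_on(p, circle, tol=1e-6):
    (cx, cy), radius = circle
    return abs(math.hypot(p[0] - cx, p[1] - cy) - radius) <= tol * max(1.0, radius)


def random_test(program, seed=0):
    rng = random.Random(seed)
    while True:  # sample I' until Pre(I') holds (triangle clearly non-degenerate)
        X, Y, Z = [(rng.uniform(-10, 10), rng.uniform(-10, 10)) for _ in range(3)]
        cross = (Y[0] - X[0]) * (Z[1] - X[1]) - (Y[1] - X[1]) * (Z[0] - X[0])
        if abs(cross) > 1.0:
            break
    C = program(X, Y, Z)                           # O' := P(I')
    return all(lies_on(p, C) for p in (X, Y, Z))   # Post(I', O')


print(random_test(circumcircle))                       # True
print(random_test(lambda X, Y, Z: compass(X, Y)))      # False
```

The second call tests a deliberately wrong program, a circle centered at X, which fails the postcondition on the very first random triangle.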
36
Idea 1 (from Theory): Symbolic Reasoning -> Concrete
Synthesis Algorithm:
// First obtain a random input-output example.
1. Choose I’ randomly from the set { I | Pre(I) }.
2. Compute O’ s.t. Post(I’,O’) using numerical methods.
// Now obtain a construction that can generate O’ from I’
// (using exhaustive search).
3. S := I’;
4. While (S does not contain O’)
5.    S := S ∪ { M(O1,O2) | Oi ∈ S, M ∈ Methods }
6. Output construction steps for O’.
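Steps 3-6 amount to a forward closure that remembers how each object was built. The sketch below shows that search on a toy domain, integers with `add` and `mul` in place of geometric objects and ruler/compass methods, so that it stays short and runnable. The domain, the round limit, and every name in it are assumptions for illustration.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Tuple

Method = Tuple[str, Callable[[int, int], int]]


def forward_search(inputs: Dict[str, int], target: int,
                   methods: List[Method], max_rounds: int = 5) -> Optional[List[str]]:
    """Grow S with every M(O1, O2) until the target object appears."""
    steps: Dict[int, List[str]] = {v: [] for v in inputs.values()}  # object -> steps
    names: Dict[int, str] = {v: k for k, v in inputs.items()}
    for _ in range(max_rounds):
        if target in steps:
            break
        current = list(steps)                    # S at the start of this round
        for (mname, m), a, b in product(methods, current, current):
            out = m(a, b)
            if out not in steps:                 # first construction found wins
                names[out] = f"t{len(names)}"
                shared = [s for s in steps[b] if s not in steps[a]]
                steps[out] = steps[a] + shared + [
                    f"{names[out]} = {mname}({names[a]}, {names[b]})"]
    return steps.get(target)                     # None if O' was never reached


methods = [("add", lambda a, b: a + b), ("mul", lambda a, b: a * b)]
plan = forward_search({"x": 2, "y": 3}, 10, methods)
print("\n".join(plan))
```

The set S grows quickly with each round, which is the cost that the later ideas (library methods and goal-directed pruning) aim to reduce.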
37
Error Probability of the algorithm is extremely low.
Given a triangle XYZ, construct circle C such that C
passes through X, Y, and Z.
…
L1 = Ruler(P1,P2);
…
L2 = Ruler(R1,R2);
N = Intersect(L1,L2);
C = Compass(N,X);
[Figure: triangle XYZ with lines L1 and L2 meeting at N, and circle C through X, Y, and Z]
• For an equilateral △XYZ, the incenter
coincides with the circumcenter N.
• But what are the chances of choosing a
random △XYZ to be an equilateral one?
38
Idea 2 (from PL): High-level Abstractions
Synthesis algorithm times out because programs are large.
• Identify a library of commonly used patterns (pattern =
“sequence of geometry methods”)
– E.g., perpendicular/angular bisector, midpoint, tangent, etc.
Replace S := S ∪ { M(O1,O2) | Oi ∈ S, M ∈ Methods }
with S := S ∪ { M(O1,O2) | Oi ∈ S, M ∈ LibMethods }
• Two key advantages:
– Search space: large depth -> small depth
– Easier to explain solutions to students.
39
Use of high-level abstractions reduces program size
Given a triangle XYZ, construct circle C such that C
passes through X, Y, and Z.
1. C1 = Compass(X,Y);
2. C2 = Compass(Y,X);
3. <P1,P2> = Intersect(C1,C2);
4. L1 = Ruler(P1,P2);
5. D1 = Compass(Z,X);
6. D2 = Compass(X,Z);
7. <R1,R2> = Intersect(D1,D2);
8. L2 = Ruler(R1,R2);
9. N = Intersect(L1,L2);
10. C = Compass(N,X);
1. L1 = PBisector(X,Y);
2. L2 = PBisector(X,Z);
3. N = Intersect(L1,L2);
4. C = Compass(N,X);
40
Idea 3 (from AI): Goal Directed Search
Synthesis algorithm is inefficient because the search space
is too wide and hence still huge.
• Prune forward search by using A* style heuristics.
Replace S := S ∪ { M(O1,O2) | Oi ∈ S, M ∈ LibMethods }
with S := S ∪ { M(O1,O2) | Oi ∈ S, M ∈ LibMethods, IsGood(M(O1,O2)) }
• Example: If a method constructs a line L that passes
through a desired output point, then L is “good” (i.e.,
worth constructing).
41
Effectiveness of Goal-directed search
Given a triangle XYZ,
construct circle C such that
C passes through X, Y, and Z.
[Figure: triangle XYZ with lines L1 and L2 meeting at N, and circle C through X, Y, and Z]
• L1 and L2 are immediately constructed since
they pass through output point N.
• On the other hand, other lines like angular
bisectors are not eagerly constructed.
42
Experimental Results
25 benchmark problems.
• Example: Construct a square whose extended sides
pass through 4 given points.
• Running time: 18 problems take less than 1 second,
4 problems take 1-3 seconds, and 3 problems take 13-82 seconds.
• Idea 2 (high-level abstractions) reduces programs
of size 3-45 to size 3-13.
• Idea 3 (goal-directedness) improves performance
by a factor of 10-1000 on most problems.
43
Search Space Exploration: With/without goal-directedness
44
Optional Advance Preparation
• Lecture 2
– Section 4 in WAMBSE 2012 keynote paper
“Synthesis from Examples”, Gulwani.
• Lab
– Section 4 in WAMBSE 2012 keynote paper.
– NCERT Online Book Website.
http://ncert.nic.in/NCERTS/textbook/textbook.htm
• Lecture 3
– Sections 1-3 in WAMBSE 2012 keynote paper
46
Intelligent Tutoring Systems
• Motivation
– Online learning sites: Khan Academy, edX, Udacity, Coursera
• Increasing class sizes with even less personal attention
– New technologies: Tablets/Smartphones, NUI, Cloud
• Various Aspects
– Solution Generation
– Problem Generation
– Automated Grading
– Content Entry
• Various Domains
– K-12: Mathematics, Physics, Chemistry
– Undergraduate: Introductory Programming, Automata Theory
– Language Learning
47