Programming by Examples Sumit Gulwani Tutorial @ CBSoft Sept 2015

Programming by Examples
Sumit Gulwani
Tutorial @ CBSoft
Sept 2015
Outline
Example-based
specification
Program
Search Algorithm
• Demos
• Dealing with ambiguity in example-based specification
• Search methodology
• SDK
• Miscellaneous
The New Opportunity
• 2 orders of magnitude more end users
 99% of computer users don’t know programming.
• Struggle with simple repetitive tasks
Traditional customer for
PL technology
Software developer
End Users
Excel help forums
Typical help-forum interaction
300_w30_aniSh_c1_b  w30
300_w5_aniSh_c1_b  w5
=MID(B1,5,2)
=MID(B1,FIND("_",$B:$B)+1,FIND("_",REPLACE($B:$B,1,FIND("_",$B:$B),""))-1)
Flash Fill (Excel 2013 feature) demo
“Automating string processing in spreadsheets using input-output examples”;
POPL 2011; Sumit Gulwani
Data Wrangling
• Data locked up in silos in various formats
– Flexible organization for viewing but challenging to manipulate.
• Wrangling workflow: Extraction, Transformation, Formatting
• Data scientists spend 80% of their time wrangling data.
• PBE can enable an easier and faster data-wrangling experience.
Data Science Class Assignment
To get Started!
FlashExtract Demo
“FlashExtract: A Framework for data extraction by examples”;
PLDI 2014; Vu Le, Sumit Gulwani
FlashExtract
Table Re-formatting
Trifacta: small, guided steps
Start with:
End goal:
Trifacta provides a series of small transformations:
1. Split on “:” Delimiter
2. Delete Empty Rows
3. Fill Values Down
4. Pivot Number on Type
From: Skills of the Agile Data Wrangler (tutorial by Hellerstein and Heer)
FlashRelate
FlashRelate Demo
“FlashRelate: Extracting Relational Data from Semi-Structured
Spreadsheets Using Examples”;
PLDI 2015; Barowy, Gulwani, Hart, Zorn
PBE tools for Data Manipulation
Extraction
• FlashExtract: Extract data from text files, web pages [PLDI 2014; PowerShell ConvertFrom-String cmdlet]
Transformation
• Flash Fill: Excel feature for Syntactic String Transformations
[POPL 2011, CAV 2015]
• Semantic String Transformations [VLDB 2012]
• Number Transformations [CAV 2013]
• FlashNormalize: Text normalization [IJCAI 2015]
Formatting
• FlashRelate: Extract data from spreadsheets [PLDI 2015, PLDI 2011]
• FlashFormat: a PowerPoint add-in [AAAI 2014]
Outline
Example-based
specification
Program
Search Algorithm
• Demos
• Dealing with ambiguity in example-based spec
➢ Ranking
• Search methodology
• SDK
• Miscellaneous
Basic ranking scheme
Prefer programs with lower Kolmogorov complexity
• Prefer fewer constants.
• Prefer smaller constants.
Input            Output
Alex Polozov     Alex
Helmut Seidl     Helmut
Candidate programs:
• 1st Word
• If (input = “Alex Polozov”) then “Alex” else “Helmut”
• “Alex”
Challenges with Basic ranking scheme
Prefer programs with lower Kolmogorov complexity
• Prefer fewer constants.
• Prefer smaller constants.
Input            Output
Alex Polozov     Polozov, Alex
Helmut Seidl     Seidl, Helmut
Candidate programs:
• 2nd Word + “, ” + 1st Word
• “Polozov, Alex”
How to select between
fewer larger constants vs. more smaller constants?
Idea: Associate numeric weights with constants.
Challenges with Basic ranking scheme
Prefer programs with lower Kolmogorov complexity
• Prefer fewer constants.
• Prefer smaller constants.
Input                          Output
Missing page numbers, 1993     1993
64-67, 1995                    1995
Candidate programs:
• 1st Number from the beginning
• 1st Number from the end
How to select between programs with
the same number of same-sized constants?
Idea: Examine data features (in addition to program features)
Machine learning based ranking scheme
Rank score of a program: Weighted combination of
various features.
• Weights are learned using machine learning.
Program features
• Number of constants
• Size of constants
Features over user data: Similarity of generated output
(or even intermediate values) over various user inputs
• IsYear, Numeric Deviation, Number of characters
• IsPersonName
“Predicting a correct program in Programming by Example”;
[CAV 2015] Rishabh Singh, Sumit Gulwani
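A minimal sketch of the weighted-feature score described on this slide, applied to the page-number example. The feature set, the `Candidate` class, and the weights in `WEIGHTS` are illustrative assumptions chosen by hand; in the actual approach the weights are learned, and the real feature set is richer.

```python
import re
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Candidate:
    name: str
    run: Callable[[str], str]  # the candidate program itself
    num_constants: int         # program feature
    constant_size: int         # program feature: total size of constants

def is_year(s: str) -> bool:
    # Data feature: does an output look like a year? (hypothetical definition)
    return s.isdigit() and len(s) == 4 and 1000 <= int(s) <= 2100

def features(c: Candidate, inputs: List[str]) -> List[float]:
    outputs = [c.run(x) for x in inputs]
    year_fraction = sum(is_year(o) for o in outputs) / len(outputs)
    return [c.num_constants, c.constant_size, year_fraction]

# Hypothetical hand-picked weights: penalize constants, reward year-like outputs.
WEIGHTS = [-1.0, -0.1, 2.0]

def score(c: Candidate, inputs: List[str]) -> float:
    return sum(w * f for w, f in zip(WEIGHTS, features(c, inputs)))

inputs = ["Missing page numbers, 1993", "64-67, 1995"]
candidates = [
    Candidate("1st Number from the beginning",
              lambda x: re.findall(r"\d+", x)[0], 1, 1),
    Candidate("1st Number from the end",
              lambda x: re.findall(r"\d+", x)[-1], 1, 1),
]
best = max(candidates, key=lambda c: score(c, inputs))
print(best.name)
```

Both candidates have identical program features, so only the data feature over all user inputs separates them.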
Machine learning based ranking scheme
Rank score of a program: Weighted combination of
various features.
• Weights are learned using machine learning.
Training data for weight computation:
• Let T be a task, specified as a set of input-output pairs for
all inputs that the user cares about.
• Let (I,O) be any single input-output pair from task T.
• Let P be the set of all programs consistent with (I,O).
• Let Q ⊂ P be the set of all programs consistent with task T.
• Weights should be such that at least one program in Q is
ranked higher than all programs in P-Q.
“Predicting a correct program in Programming by Example”;
[CAV 2015] Rishabh Singh, Sumit Gulwani
Comparison of Ranking Strategies over FlashFill Benchmarks
Strategy     Average # of examples required
Basic        4.17
Learning     1.48
“Predicting a correct program in Programming by Example”; CAV 2015
Rishabh Singh, Sumit Gulwani
FlashFill Ranking Demo
Need for a fall-back mechanism
“It's a great concept, but it can also lead to
lots of bad data. I think many users will look
at a few "flash filled" cells, and just assume
that it worked. … Be very careful.”
“most of the extracted data will be fine. But
there might be exceptions that you don't notice
unless you examine the results very carefully.”
Outline
Example-based
specification
Program
Search Algorithm
• Demos
• Dealing with ambiguity in example-based spec
➢ Ranking, User Interaction Models
• Search methodology
• SDK
• Miscellaneous
User Interaction Models for Ambiguity Resolution
Communicate actionable information to the user.
• Make it easy to inspect output correctness
– User can accordingly provide more examples
• Show programs
– in any desired programming language; in English
– Enable effective navigation between programs
• Computer initiated interactivity (Active learning)
– Highlight less confident entries in the output.
– Ask directed questions based on distinguishing inputs.
“User Interaction Models for Disambiguation in Programming by Example”,
[UIST 2015] Mayer, Soares, Grechkin, Le, Marron, Polozov, Singh, Zorn, Gulwani
FlashExtract Demo
(User Interaction Models)
Outline
Example-based
specification
Program
Search Algorithm
• Demos
• Dealing with ambiguity in example-based spec
o Ranking, User Interaction Models
• Search methodology
➢ DSL
• SDK
• Miscellaneous
Domain-specific Language (DSL)
• Balanced Expressiveness
– Expressive enough to cover wide range of tasks
– Restricted enough to enable efficient search
• Restricted set of operators
– those with small inverse sets
• Restricted syntactic composition of those operators
• Natural computation patterns
– Increased user understanding/confidence
– Enables selection between programs, editing
FlashFill DSL
Tuple(String x1, …, String xn) → String
top-level expr T := if-then-else(B,C,T)
| C
condition-free expr C := Concatenate(A, C)
| A
atomic expression A := SubStr(X, …)
| ConstantString
input string X := x1 | x2 | …
boolean expression B := …
Substring Operator
What is a good choice for … in SubStr(X, …) ?
• Regular expression
– Not very expressive. Does not take context into account.
– For instance: content within parenthesis, second word
• Extended regular expression
– Too sophisticated for learning and readability.
Desired computational pattern:
• Should involve simple regexes.
• Take context into account.
Substring Operator
SubStr(X, P, P’)
position expr P := Pos(X, R1, R2, K)
Kth position in X whose left/right
side matches with R1/R2.
Substring Operator
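Evaluation of SubStr(x, p, p’), where p = Pos(x, r1, r2, k) and p’ = Pos(x, r1’, r2’, k’)
[Figure: string x with substring w between positions p and p’; the text to the left/right of p matches r1/r2, and the text to the left/right of p’ matches r1’/r2’.]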
Two special cases:
• r1 = r2’ = 𝜖 : This describes the substring
• r2 = r1’ = 𝜖 : This describes the context around the substring
General case is very expressive (describes substring & context)
Regular exprs are simple (they describe local properties).
Substring Operator
SubStr(X, P, P’)
position expr P := Pos(X, R, R, K)
| K
regular expr R := Seq(Token_1, …, Token_n), where n ≤ 3.
Token := Word | Number | Alphanumeric | ‘[’ | …
• 2nd Word: SubStr(x, Pos(x,𝜖,Word,2), Pos(x,Word,𝜖,2))
• Content within brackets: SubStr(x, Pos(x,‘[’,𝜖,1), Pos(x,𝜖,‘]’,1))
• First 7 characters: SubStr(x, 0, 7)
• Last 7 characters: SubStr(x, -8, -1)
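The four examples above can be checked with a small evaluator. This is a sketch under stated assumptions, not the FlashFill implementation: the token character classes in `TOKEN_CLASSES`, the rule that a class token matches a maximal run of characters, and the mapping of negative constant positions (so that -1 denotes the end of the string, as the "Last 7 characters" example suggests) are all choices made here for illustration.

```python
import re

# Assumed character classes for a few tokens; any other token is a literal.
TOKEN_CLASSES = {
    "Word": "[A-Za-z]",
    "Number": "[0-9]",
    "Alphanumeric": "[A-Za-z0-9]",
}

def token_regex(token: str) -> str:
    cls = TOKEN_CLASSES.get(token)
    if cls is None:
        return re.escape(token)  # literal token such as "["
    # Assumption: a class token matches a maximal run of its characters.
    return f"(?<!{cls}){cls}+(?!{cls})"

def seq_regex(tokens) -> str:
    # A regular expr R is a sequence of tokens; the empty list plays the role of epsilon.
    return "".join(token_regex(t) for t in tokens)

def pos(x: str, r1, r2, k: int) -> int:
    """k-th position t in x (1-based; negative k counts from the end) such that
    some text ending at t matches r1 and some text starting at t matches r2."""
    ends = {m.end(1) for m in re.finditer(f"(?=({seq_regex(r1)}))", x)}
    starts = {m.start(1) for m in re.finditer(f"(?=({seq_regex(r2)}))", x)}
    positions = sorted(ends & starts)
    return positions[k - 1] if k > 0 else positions[k]

def const_pos(x: str, k: int) -> int:
    # Assumed convention for constant positions: -1 is the end of the string.
    return k if k >= 0 else len(x) + 1 + k

def substr(x: str, p1: int, p2: int) -> str:
    return x[p1:p2]

name = "Alex Polozov"
second_word = substr(name, pos(name, [], ["Word"], 2), pos(name, ["Word"], [], 2))

cite = "see [ref 12] here"
in_brackets = substr(cite, pos(cite, ["["], [], 1), pos(cite, [], ["]"], 1))

text = "abcdefghij"
first7 = substr(text, const_pos(text, 0), const_pos(text, 7))
last7 = substr(text, const_pos(text, -8), const_pos(text, -1))
print(second_word, in_brackets, first7, last7)
```

Computing the match ends for r1 and the match starts for r2 on the full string, then intersecting them, keeps each regex simple and local, which is the property the slides emphasize.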
Substring Operator
SubStr(X, P, P)
position expr P := Pos(X, R, R, K) | K
Restriction
let x = X in
SubStr(x, P, P)
position expr P := Pos(x, R, R, K) | K
let x = X in
let p1 = P[x] in
let p2 = P[x] in
SubStr(x, p1, p2)
position expr P[y] := Pos(y, R, R, K) | K
Elegance
Substring Operator
let x = X in
let p1 = P[x] in
let p2 = P[x] in
SubStr(x, p1, p2)
position expr P[y] := Pos(y, R, R, K) | K
Increased
Expressiveness
let x = X in
let p1 = P[x] in
let p2 = P[x] | p1 + P[Suffix(x,p1)] in
SubStr(x, p1, p2)
position expr P[y] := Pos(y, R, R, K) | K
First 7 chars in 2nd Word: SubStr(x, p1=Pos(x,𝜖,Word,2), p1+7)
Suffix(x,p) ≡ SubStr(x,p,-1)
Substring Operator
let x = X in
let p1 = P[x] in
let p2 = P[x] | p1 + P[Suffix(x,p1)] in
SubStr(x, p1, p2)
position expr P[y] := Pos(y, R, R, K) | K
Increased
Expressiveness
let x = X in
let p1 = P[x] | let p0 = P[x] in (p0 + P[Suffix(x, p0)]) in
let p2 = P[x] | p1 + P[Suffix(x,p1)] in
SubStr(x, p1, p2)
position expr P[y] := Pos(y, R, R, K) | K
2nd word within brackets: let p0 = Pos(x,’[‘,𝜖,1) in
SubStr(x, p1 = p0+Pos(Suffix(x,p0), 𝜖, Word, 2),
p1+Pos(Suffix(x,p1), Word, 𝜖, 2))
FlashExtract DSL
String d → List(PosPair)
all lines L := Split(d,”\n”)
some lines N := Filter(L, 𝜆z: F[z]) | Filter(L, 𝜆z: F[prevLine(z)])
| FilterByPosition(L, init, iter)
line filter function F[y] := Contains(y,r,K) | startsWith(y,r)
substr expr S[X] := let x = X in
let p1=P[x] | let p0=P[x] in (p0+P[Suffix(x,p0)]) in
let p2 = P[x] | p1 + P[Suffix(x,p1)] in
SubStr(x, p1, p2)
position expr P[y] := Pos(y, R, R, K) | K
FlashExtract DSL
String d → List(PosPair)
Seq expr E := Map(N, 𝜆z: PP[z])
| Map(Pos(d,R,R), 𝜆z: PP[Suffix(d,z)])
| Merge(T1, T2)
all lines L := Split(d,”\n”)
some lines N := Filter(L, 𝜆z: F[z]) | Filter(L, 𝜆z: F[prevLine(z)])
| FilterByPosition(L, init, iter)
line filter function F[y] := Contains(y,r,K) | startsWith(y,r)
substr expr S[X] := let x = X in SubStr(x, PP[x])
position pair PP[x] := let p1=P[x] | let p0=P[x] in (p0+P[Suffix(x,p0)]) in
let p2 = P[x] | p1 + P[Suffix(x,p1)] in
(p1,p2)
position expr P[y] := Pos(y, R, R, K) | K
FlashExtract DSL
String d → List(PosPair_1, …, PosPair_n)
top-level expr T := Plan(𝜋, P, (k1,D1),…,(kn-1,Dn-1)), where 0≤ ki < i.
Plan(𝜋, P, (k1,D1),…,(kn-1,Dn-1)) ≡
Map(P, 𝜆z: R), where R[𝜋(0)] = z, R[𝜋(i)] -> Di(R[ki])
primary keys P := E
derived value D := 𝜆z: PP[Suffix(d,Snd(z))]
FlashRelate DSL
Table Re-formatting
Input: Semi-structured
spreadsheet
Output: Relational table
FlashRelate DSL
Spreadsheet d → List(Cell_1, …, Cell_n)
top-level expr T := Plan(𝜋, P, (k1,D1),…,(kn-1,Dn-1)), where 0≤ ki < i.
primary keys P := Filter(d.Cells, 𝜆z: F[z])
cell filter fn F[y] := Boolean constraint over
y.Coordinates, y.Content, y.Neighbors
derived value D := 𝜆z: Neighbor(z,K,K)
| 𝜆z: MoveUntil(z, Direction, 𝜆y: F[y])
Summary: Design of DSLs for Synthesis
• Balanced Expressiveness
– Expressive enough to cover wide range of tasks
• Build out iteratively from a simple core.
– Restricted enough to enable efficient search
• Use “let” construct for syntactic restrictions.
• Natural computation patterns
– Use function definitions for reusability.
Outline
Example-based
specification
Program
Search Algorithm
• Demos
• Dealing with ambiguity in example-based spec
o Ranking, User Interaction Models
• Search methodology
➢ DSL, Deductive divide-and-conquer strategy
• SDK
• Miscellaneous
Search Strategy
Goal: Set of expr of kind 𝑒 that satisfies spec 𝜙
[denoted 𝑒 ⊨ 𝜙 ]
𝑒: DSL (top-level) expression
𝜙: example-based inductive specification
Examples: Conjunction of (input state 𝜎 , output value 𝑣)
[denoted 𝜎 ⇝ 𝑣]
Inductive Spec: Conjunction of (input state, output property)
Output properties are easier to specify intent!
“FlashMeta: A Framework for Inductive Program Synthesis”
Alex Polozov, Sumit Gulwani; OOPSLA 2015
Output properties
Task
• Elements belonging to the output list
• Elements not belonging to the output list
• Contiguous subsequence of the output list
• Prefix of the output list
Output properties
Task
• Prefix of the output table (seq of records)
We do not require explicit (magenta) record
boundaries in which case the spec is:
• Prefixes of projections of the output table
Search Strategy
Goal: Set of expr of kind 𝑒 that satisfies spec 𝜙
[denoted 𝑒 ⊨ 𝜙 ]
𝑒: DSL (top-level) expression
𝜙: example-based inductive specification
Strategy: Based on divide-and-conquer
• 𝑒 ⊨ 𝜙 is reduced to simpler problems (over subexpressions of e or sub-constraints of 𝜙).
• Top-down (as opposed to bottom-up enumerative search).
Problem Reduction
list of strings T := Map(L, S)
substring fn S := 𝜆y: …
FlashExtract DSL
list of lines L := Filter(Split(d,”\n”), B)
boolean fn B := 𝜆y: …
Spec for T
Spec for L
⋈
Spec for S
∧
Problem Reduction
SubStr grammar
Spec for E
substring expr E := SubStr(y, P1, P2)
position expr P := K | Pos(y, R1, R2, K)
Spec for P1
Redmond, WA
⋈
Spec for P2
Redmond, WA
Search Strategy
Goal: Set of expr of kind 𝑒 that satisfies spec 𝜙
[denoted 𝑒 ⊨ 𝜙 ]
𝑒: DSL (top-level) expression
𝜙: example-based inductive specification
Methodology: Based on divide-and-conquer
• 𝑒 ⊨ 𝜙 is reduced to simpler problems (over subexpressions of e or sub-constraints of 𝜙).
• Top-down (as opposed to bottom-up enumerative search).
Key concepts in problem reduction: VSAs & Witness functions
Version Space Algebra (VSA)
AST based succinct representation for a set of programs
A graph with 3 kinds of nodes and a unique start node.
Each node 𝑁 represents a set of programs [𝑁].
• Leaf node: labelled with a set $\tilde{e}$ of program expressions
$[N] = \tilde{e}$
• Union node (with k children $N_1, \dots, N_k$)
$[N] = [N_1] \cup \dots \cup [N_k]$
• Join node (with k ordered children $N_1, \dots, N_k$): labelled with a k-ary operator F
$[N] = \{ F(e_1, \dots, e_k) \mid e_1 \in [N_1], \dots, e_k \in [N_k] \}$
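The three node kinds can be sketched as a small data structure. The class names `Leaf`, `Union`, and `Join` and the helpers `size` and `programs` are assumptions made for this illustration, not an existing API; programs are rendered as plain strings to keep the sketch short.

```python
from dataclasses import dataclass
from itertools import product
from math import prod

@dataclass(frozen=True)
class Leaf:
    programs: frozenset   # explicit set of program expressions

@dataclass(frozen=True)
class Union:
    children: tuple       # child VSA nodes

@dataclass(frozen=True)
class Join:
    op: str               # name of the k-ary operator F
    children: tuple       # one child VSA node per argument of F

def size(n) -> int:
    """Number of programs represented, without enumerating them.
    Assumes the children of a Union node represent disjoint sets."""
    if isinstance(n, Leaf):
        return len(n.programs)
    if isinstance(n, Union):
        return sum(size(c) for c in n.children)
    return prod(size(c) for c in n.children)

def programs(n):
    """Enumerate the represented programs as strings."""
    if isinstance(n, Leaf):
        yield from sorted(n.programs)
    elif isinstance(n, Union):
        for c in n.children:
            yield from programs(c)
    else:
        for args in product(*(list(programs(c)) for c in n.children)):
            yield f"{n.op}({', '.join(args)})"

vsa = Join("Concat", (
    Leaf(frozenset({"a", "b"})),
    Union((Leaf(frozenset({"c"})), Leaf(frozenset({"d", "e"})))),
))
print(size(vsa), list(programs(vsa)))
```

A Join node stores its argument sets separately, so the number of programs represented grows as a product while the structure itself grows only as a sum. That is what makes the representation succinct.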
VSA Operations
• Intersect: VSA × VSA → VSA
• TopRank: VSA × Ranking function × int k → Top-k programs
• Cluster: VSA × State 𝜎 → $2^{\mathrm{VSA}}$
– The output is the smallest partition of the input VSA such that all
programs in any output VSA produce the same output on 𝜎.
• Filter: VSA × Spec 𝜙 → VSA
– Filter the input VSA to the subset that satisfies spec 𝜙.
Problem Reduction Rules
$[e \models \phi] = \mathrm{Union}([e_1 \models \phi], [e_2 \models \phi])$
where e is a non-terminal defined as e := e1 | e2
$[e \models \phi_1 \wedge \phi_2] = \mathrm{Intersect}([e \models \phi_1], [e \models \phi_2])$
Intersect Operation
Intersect: 𝑉𝑆𝐴 × 𝑉𝑆𝐴 → 𝑉𝑆𝐴
The output VSA represents the intersection of the sets of
programs represented by the input VSAs.
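$\mathrm{Intersect}(\mathrm{Leaf}(\tilde{e}_1), \mathrm{Leaf}(\tilde{e}_2)) = \mathrm{Leaf}(\tilde{e}_1 \cap \tilde{e}_2)$
$\mathrm{Intersect}(\mathrm{Leaf}(\tilde{e}), N) = \mathrm{Leaf}(\{ e \in \tilde{e} \mid e \in [N] \})$
$\mathrm{Intersect}(\mathrm{Union}(N_1, N_2), N) = \mathrm{Union}(\mathrm{Intersect}(N_1, N), \mathrm{Intersect}(N_2, N))$
$\mathrm{Intersect}(F(N_1, N_2), F(N_1', N_2')) = F(\mathrm{Intersect}(N_1, N_1'), \mathrm{Intersect}(N_2, N_2'))$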
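The four rules translate almost line by line into a recursive function. The sketch below is self-contained and uses hypothetical node classes (`Leaf`, `Union`, `Join`) defined here for illustration; programs are represented as strings or `(op, arg1, ...)` tuples. Returning an empty leaf for two Join nodes with different operators is an added assumption, since the slide only covers the matching-operator case.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Leaf:
    programs: frozenset   # explicit set of programs

@dataclass(frozen=True)
class Union:
    children: tuple       # child VSA nodes

@dataclass(frozen=True)
class Join:
    op: str               # operator name F
    children: tuple       # one child VSA node per argument of F

def expand(n) -> set:
    """Enumerate the set [N]; used here only to check the result."""
    if isinstance(n, Leaf):
        return set(n.programs)
    if isinstance(n, Union):
        return set().union(*(expand(c) for c in n.children))
    return {(n.op, *args) for args in product(*(expand(c) for c in n.children))}

def contains(n, prog) -> bool:
    """Membership test e in [N] without enumerating [N]."""
    if isinstance(n, Leaf):
        return prog in n.programs
    if isinstance(n, Union):
        return any(contains(c, prog) for c in n.children)
    return (isinstance(prog, tuple)
            and len(prog) == len(n.children) + 1
            and prog[0] == n.op
            and all(contains(c, a) for c, a in zip(n.children, prog[1:])))

def intersect(a, b):
    if isinstance(a, Leaf):            # Leaf with anything (covers Leaf/Leaf)
        return Leaf(frozenset(e for e in a.programs if contains(b, e)))
    if isinstance(b, Leaf):
        return intersect(b, a)
    if isinstance(a, Union):           # distribute over Union children
        return Union(tuple(intersect(c, b) for c in a.children))
    if isinstance(b, Union):
        return intersect(b, a)
    # Both are Join nodes.
    if a.op != b.op or len(a.children) != len(b.children):
        return Leaf(frozenset())       # assumption: different operators share nothing
    return Join(a.op, tuple(intersect(x, y) for x, y in zip(a.children, b.children)))

a = Join("Concat", (Leaf(frozenset({"x", "y"})), Leaf(frozenset({"1", "2"}))))
b = Join("Concat", (
    Leaf(frozenset({"y", "z"})),
    Union((Leaf(frozenset({"2"})), Leaf(frozenset({"3"})))),
))
both = intersect(a, b)
print(expand(both))
```

The Join rule intersects argument by argument, so the result stays factored and the full program sets never need to be materialized.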
Problem Reduction Rules
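$[e \models \phi] = \mathrm{Union}([e_1 \models \phi], [e_2 \models \phi])$
where e is a non-terminal defined as e := e1 | e2
$[e \models \phi_1 \wedge \phi_2] = \mathrm{Intersect}([e \models \phi_1], [e \models \phi_2])$
$[e \models \phi_1 \wedge \phi_2] = \mathrm{Filter}([e \models \phi_1], \phi_2)$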
Cluster Operation
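Cluster: VSA × 𝜎 → $2^{\mathrm{VSA}}$
Smallest partitioning of the input VSA s.t. programs in any
output VSA are indistinguishable over the input state.
Notation: Let Cluster(N, 𝜎) be denoted by $N/\sigma$
$\mathrm{Union}(N_1, N_2)/\sigma =$
let $\tilde{N} = N_1/\sigma \cup N_2/\sigma$ in $\mathrm{Merge}(\tilde{N}, \sigma)$
$F(N_1, N_2)/\sigma =$
let $\tilde{N}_1 = N_1/\sigma$, $\tilde{N}_2 = N_2/\sigma$ in
let $\tilde{N}_3 = \{ F(N_1', N_2') \mid N_1' \in \tilde{N}_1, N_2' \in \tilde{N}_2 \}$ in $\mathrm{Merge}(\tilde{N}_3, \sigma)$
$\mathrm{Merge}(\tilde{N}, \sigma)$ unites VSAs in $\tilde{N}$ that have the same behavior on $\sigma$.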
Filter Operation
Filter: 𝑉𝑆𝐴 × 𝜙 → 𝑉𝑆𝐴
Filter the input VSA to the subset that satisfies 𝜙.
$\mathrm{Filter}(N, \phi) =$
let $\tilde{\sigma}$ = set of input states that occur in $\phi$ in
let $\tilde{N} = N/\tilde{\sigma}$ in
$\mathrm{Union}(\{ N' \in \tilde{N} \mid N' \text{ satisfies } \phi \})$
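The idea behind Cluster and Filter can be illustrated on an explicit set of programs. This is a deliberate simplification: the real operations work on the VSA structure without enumerating programs, whereas the sketch below treats a program set as a dictionary from hypothetical names to Python functions.

```python
import re
from collections import defaultdict

def cluster(programs: dict, state: str) -> dict:
    """Partition programs by their output on `state`.
    Returns {output: {name: program}}; programs in one cell are
    indistinguishable on this input."""
    groups = defaultdict(dict)
    for name, fn in programs.items():
        groups[fn(state)][name] = fn
    return dict(groups)

def filter_by_example(programs: dict, state: str, expected: str) -> dict:
    """Keep the cluster whose output on `state` equals `expected`."""
    return cluster(programs, state).get(expected, {})

# Three hypothetical candidates, all consistent with the first example.
programs = {
    "first number": lambda x: re.findall(r"\d+", x)[0],
    "last number": lambda x: re.findall(r"\d+", x)[-1],
    "constant 1993": lambda x: "1993",
}

same = cluster(programs, "Missing page numbers, 1993")
split = cluster(programs, "64-67, 1995")
kept = filter_by_example(programs, "64-67, 1995", "1995")
print(len(same), sorted(split), list(kept))
```

An input on which the candidates fall into several clusters is a distinguishing input, and a single additional example on it prunes every cluster but one.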
Problem Reduction Rules
Let F be a binary operator.
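Inverse set: $F^{-1}(v) = \{ (u, w) \mid F(u, w) = v \}$
$\mathrm{Concat}^{-1}(\text{"Abc"}) = \{ (\text{"Abc"}, \epsilon), (\text{"Ab"}, \text{"c"}), (\text{"A"}, \text{"bc"}), (\epsilon, \text{"Abc"}) \}$
$[F(e_1, e_2) \models \sigma \leadsto v] = \mathrm{Union}(\{ F([e_1 \models \sigma \leadsto u], [e_2 \models \sigma \leadsto w]) \mid (u, w) \in F^{-1}(v) \})$
$[\mathrm{Concat}(X, Y) \models \sigma \leadsto \text{"Abc"}] = \mathrm{Union}(\{$
$\quad \mathrm{Concat}([X \models \sigma \leadsto \text{"Abc"}], [Y \models \sigma \leadsto \epsilon]),$
$\quad \mathrm{Concat}([X \models \sigma \leadsto \text{"Ab"}], [Y \models \sigma \leadsto \text{"c"}]),$
$\quad \mathrm{Concat}([X \models \sigma \leadsto \text{"A"}], [Y \models \sigma \leadsto \text{"bc"}]),$
$\quad \mathrm{Concat}([X \models \sigma \leadsto \epsilon], [Y \models \sigma \leadsto \text{"Abc"}]) \})$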
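The inverse set of Concat and the resulting reduction can be sketched in a few lines. The sub-learner `learn_atom` and the textual program format are hypothetical stand-ins for the recursive synthesis calls; they return explicit sets of program strings rather than VSAs.

```python
def concat_inverse(v: str):
    """All (u, w) with u + w == v, i.e. the inverse set of Concat at v."""
    return [(v[:i], v[i:]) for i in range(len(v), -1, -1)]

def learn_atom(x: str, v: str) -> set:
    """Hypothetical sub-learner: atomic programs that output v on input x,
    either a constant string or a constant-position substring of x."""
    progs = {f"Const({v!r})"}
    start = x.find(v) if v else -1
    while start != -1:
        progs.add(f"SubStr(x, {start}, {start + len(v)})")
        start = x.find(v, start + 1)
    return progs

def learn_concat(x: str, v: str) -> set:
    """Reduce Concat(X, Y) |= (x ~> v) to one pair of subproblems per split."""
    return {
        f"Concat({a}, {b})"
        for u, w in concat_inverse(v)
        for a in learn_atom(x, u)
        for b in learn_atom(x, w)
    }

splits = concat_inverse("Abc")
learned = learn_concat("A1", "Ab")
print(splits)
print(sorted(learned))
```

Each split of the output yields an independent pair of smaller problems, which is why operators with small inverse sets keep the top-down search tractable.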
Problem Reduction Rules
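Let F be an n-ary operator.
Dependent inverse set: $F^{-1}(v \mid u_1) = \{ (u_2, \dots, u_n) \mid F(u_1, \dots, u_n) = v \}$
$\mathrm{SubStr}^{-1}(\text{"Ab"} \mid \text{"Ab cd Ab"}) = \{ (0, 2), (6, 8) \}$
$[F(e_0, e_1, e_2) \models \sigma \leadsto v] =$
let N = VSA of $e_0$ in
let $\{ N_1, \dots, N_k \} = N/\sigma$ in
let $y_j = \mathrm{Eval}(N_j, \sigma)$ in
$\mathrm{Union}(\{ F(N_j, M_1, M_2) \mid j = 1..k,\ (u, w) \in F^{-1}(v \mid y_j),\ M_1 = [e_1 \models \sigma \leadsto u],\ M_2 = [e_2 \models \sigma \leadsto w] \})$
Let 𝜎 be the state {x: “Ab cd Ab”}.
$[\mathrm{SubStr}(x, P_1, P_2) \models \sigma \leadsto \text{"cd"}] = \mathrm{SubStr}(x, [P_1 \models \sigma \leadsto 3], [P_2 \models \sigma \leadsto 5])$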
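For SubStr, the dependent inverse set is simply the list of occurrences of the output inside the known first argument. A minimal sketch, where the function name `substr_inverse` is an assumption made for illustration:

```python
def substr_inverse(v: str, x: str):
    """All (start, end) pairs with x[start:end] == v, i.e. the inverse
    set of SubStr at output v given that its first argument evaluates to x."""
    return [(i, i + len(v))
            for i in range(len(x) - len(v) + 1)
            if x.startswith(v, i)]

x = "Ab cd Ab"
print(substr_inverse("Ab", x))
print(substr_inverse("cd", x))
```

Each pair (u, w) then becomes two independent position-learning problems, one for P1 with target u and one for P2 with target w.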
Problem Reduction Rules
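Let F be an n-ary operator.
Witness function: $W_F(\phi) = \{ (\phi_1, \dots, \phi_n) \mid \forall g_i \models \phi_i : F(g_1, \dots, g_n) \models \phi \}$
$W_{\mathrm{ITE}}(\sigma_1 \leadsto v_1 \wedge \sigma_2 \leadsto v_2) = \{ (\sigma_1 \leadsto 1 \wedge \sigma_2 \leadsto 0,\ \sigma_1 \leadsto v_1,\ \sigma_2 \leadsto v_2),\ (\sigma_1 \leadsto 1 \wedge \sigma_2 \leadsto 1,\ \sigma_1 \leadsto v_1 \wedge \sigma_2 \leadsto v_2,\ 1) \}$
$[F(e_1, e_2) \models \phi] = \mathrm{Union}(\{ F([e_1 \models \phi_1], [e_2 \models \phi_2]) \mid (\phi_1, \phi_2) \in W_F(\phi) \})$
$[\mathrm{ITE}(B, E_1, E_2) \models \sigma_1 \leadsto v_1 \wedge \sigma_2 \leadsto v_2] = \mathrm{Union}($
$\quad \mathrm{ITE}([B \models \sigma_1 \leadsto 1 \wedge \sigma_2 \leadsto 0], [E_1 \models \sigma_1 \leadsto v_1], [E_2 \models \sigma_2 \leadsto v_2]),$
$\quad \mathrm{ITE}([B \models \sigma_1 \leadsto 1 \wedge \sigma_2 \leadsto 1], [E_1 \models \sigma_1 \leadsto v_1 \wedge \sigma_2 \leadsto v_2], [E_2 \models 1]))$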
FlashMeta Framework
• Provides efficient implementations of VSA operations
• Provides a library of witness functions
Role of synthesis designer
• Can add new operators and witness functions.
• Can provide ranking strategies.
• Can specify tactics to resolve non-determinism in search
– Which witness function to use?
– How to order search branches?
Comparison of FlashMeta with hand-tuned implementations
                    Lines of Code (K)       Development time (months)
PBE Technology      Original   FlashMeta    Original   FlashMeta
FlashFill           12         3            9          1
FlashExtractText    7          4            8          1
FlashRelate         5          2            8          1
FlashNormalize      17         2            7          2
FlashExtractWeb     N/A        2.5          N/A        1.5
Running time of FlashMeta implementations varies between 0.5x and 3x of the corresponding original implementation.
• Faster because of some free optimizations
• Slower because of larger feature sets & a generalized framework
Outline
Example-based
specification
Program
Search Algorithm
• Demos
• Dealing with ambiguity in example-based spec
o Ranking, User Interaction Models
• Search methodology
o DSL, Deductive divide-and-conquer strategy
➢ SDK
• Miscellaneous
Outline
Example-based
specification
Program
Search Algorithm
• Demos
• Dealing with ambiguity in example-based spec
o Ranking, User Interaction Models
• Search methodology
o DSL, Deductive divide-and-conquer strategy
• SDK
➢ Miscellaneous
Miscellaneous
• Dimensions in synthesis
• How I started working on synthesis using examples
• My favorite synthesis project
PBE vs Machine Learning
Traditional PBE
• Requires few examples.
• Generates human readable and editable models.
• Models are deterministic and intended to work correctly.
• Generally does not handle noise.
Traditional Machine Learning
• Requires too many examples.
• Generates black-box models.
• Models are probabilistic and aimed for high precision.
• Can handle noise in input data.
Opportunity: Combine complementary strengths of PBE & ML.
• generalization via probabilistic models.
• can be useful in data cleaning.
Deductive Synthesis vs Inductive Synthesis
Deductive Synthesis
• Refers to synthesis using deductive methods.
• Has traditionally been restricted to synthesis in the
presence of logical specifications.
Inductive Synthesis
• Refers to synthesis from inductive (example-based)
specifications.
• Various kinds of techniques have been applied including
constraint solving, stochastic, and enumerative search.
FlashMeta performs synthesis from inductive specifications
using deductive methods!
Programming by Examples
Earlier literature:
• Version space algebra and its application to programming by
demonstration; [Lau, Domingos, Weld: ICML 2000]
• Why PBD Systems Fail: Lessons Learned for Usable AI.
[Lau, CHI 2008]
Recent PL literature:
• Type-and-example-directed program synthesis;
[Osera, Zdancewic; PLDI 2015]
• Synthesizing data structure transformations from input-output
examples; [Feser, Chaudhuri, Dillig; PLDI 2015]
• Interactive Parser Synthesis by Example;
[Leung, Sarracino, Lerner; PLDI 2015]
Dimensions in Program Synthesis
• Domain-specific language
– User-provided sketch [Solar-Lezama, PhD Thesis ’08]
– SyGuS: parameterized DSL framework [Alur et al., FMCAD ’13]
• Search methodology
– Deductive
– Constraint solving
– Enumerative search [Udupa et al; PLDI 2013]
– Stochastic search [Schkufza, Sharma, Aiken; CACM RH ’15]
– Web/Repository based search [Yahav et al, Swarat et al]
• Specification
– Examples, Demonstrations
– Logical specifications
– Natural language
PPDP 2010; “Dimensions in Program Synthesis”; Gulwani
Miscellaneous
• Dimensions in synthesis
➢ How I started working on synthesis using examples
• My favorite synthesis project
Graduation Advice (2005)
You will have too many
problems to solve; you
can’t pursue them all.
Make thoughtful choices.
George Necula
UC-Berkeley
From Program Verification to Program Synthesis
Precondition P
Statement s
Postcondition Q
Forward dataflow analysis: From s, P, compute Q
Backward dataflow analysis: From s, Q, compute P
Program Synthesis:
From P, Q, compute s
Nebojsa Jojic
MSR Redmond
(2005)
Synthesis using SAT/SMT Constraint Solvers
Program synthesis
is an extremely
hard combinatorial
search task!
Try using SAT
solvers, which have
been engineered to
solve huge instances.
Venkie
MSR Redmond
(2006)
Initial results in program synthesis
Results: Managed to synthesize a wide variety of programs
from logic specs.
Approach: Reduce synthesis to solving SAT/SMT constraints.
• Bit-vector algorithms (e.g., turn-off rightmost one bit)
– [PLDI 2011, ICSE 2010]
• SIMD algorithms (e.g., vectorization of CountIf)
– [PPoPP 2013]
• Undergraduate book algorithms (e.g., sorting, dynamic prog)
– [POPL 2010]
• Program Inverses (e.g., deserializers from serializers)
– [PLDI 2011]
• Graph Algorithms (e.g., bi-partiteness check)
– [OOPSLA 2010]
Mid-life Awakening (2010)
Software developers
Two orders of magnitude more users
End users
Dimensions in Research
Cultivating research taste is a journey.
• Problem Definition
– Advisor’s interest/funding, Internship, Course project
– Intersection with your collaborator’s interest
– Next logical advance in your current portfolio
– Talk to potential customers, market surveys
• Solution Strategy
– Develop new techniques vs. Apply existing techniques
– Cross-disciplinary
• Impact
– Paper, Tool, Awards, Media
– Personal happiness
Once you develop it,
you start a new journey.
Miscellaneous
• Dimensions in synthesis
• How I started working on synthesis using examples
➢ My favorite synthesis project
Biological Synthesis
Project Sumay
Joint work with
Monika Gulwani
Slides from a presentation made
in April 2013
Template based approach to a WiSE name
Monika’s constraints:
• neither popular on Bing (Sorry uncle Ben, uncle Tom) nor Big
• Musical and easy to Pronounce
• elegant Semantics
Sumit’s approach:
• Template based strategy: C V C V C (where C=consonant, V=vowel)
• Pruning strategy: Pick a CV pair from each creator, including initials.
Solution space:
• MoSu_, MoSi_, MiSu_, MiSi_, MaSu_, MaSi_
• SuMo_, SuMi_, SuMa_, SiMo_, SiMi_, SiMa_,
Potential solutions:
✗ MaSum => Too gentle
✗ SuMit => Need a different name than creator
✗ SuMan => A bit feminine
✓ SuMay => meaning “wise” in Hindi.
– That’s how dad gets a Bonus for having mom’s initial in his name at right place!
Inductive Synthesis
Sumay was 12 days past the due date.
• Planned inductive procedure was used to get him out.
• Good news: 1 unit of induction was good enough.
– Monika wanted to compete with my other distraction (FlashFill)
that is also known to work well with minimum inductive units.
– But Monika wanted to more than “even” out:
• Sumay was born on O1/O2/O3 @ O4 hrs O5 min (where Oi = odd #)
Sumay_&_Mom.Sleep()
Well-deserved rest in hospital.
Dad.Watch_Sumay_&_Mom_fromHisCot();
Nikolaj: They won’t be quiet for long!
Safety Property: Event DiaperChange handled properly
Property: Handler of event DiaperChange should not be preempted by another occurrence of that event.
• @Clousot: Warning: “Property not satisfied”.
• @Monika: It is a false positive.
• Found a counterexample next day: Event DiaperChange
occurred twice before the handler could finish.
• @Francesco: Invoke garbage collector (GC) immediately
when DiaperChange occurs to mask off other invocations.
– Problem: GC is too small & sometimes gets pre-empted.
Solution:
• @Babyrus: Run handler within DiaperChangePad lock.
• @Target: Use wipe warmer to make Environment less adversarial.
Liveness Property: Crying stops eventually
If (DiaperChangeEvent) { ... }
Else {
If (RecentlyFed) {
Try { Burping() };
Catch (Timeout Exception) { DadSurfaceMagic(); }
}
Else {
Try { Feeding() };
Catch (NotInterested Exception) { DadSurfaceMagic(); }
}
}
Procedure DadSurfaceMagic()
// Summary: Sumay enjoys tummy time on Dad’s chest.
Step 1: Hold Sumay.
Assert(Sumay stops crying but is breathing fast);
Step 2: Sit down on couch. Open the recliner (to 22
degrees) and lie down.
Assert(Sumay is calm & his breathing speed starts matching
Dad’s relaxed breathing speed & soon goes off to sleep);
Results
Who does Sumay look like?
• Sumay looks like his dad: 70%
• Incomparable element: 15%
• Monika looks like Sumay: 15%
Some cute quotes:
• “Excels in cuteness test”
• Aditya Nori: “hello world to Sumay”
• Nikolaj Bjorner: ”Enjoy: ‘Baby Sleep Guide: Sleep Solutions for
You & Your Baby’. I hope your constraint system is going to be
feasible and that there not only exists a non-empty set of
solutions, but that they can also be found within reasonable time.”
• Shobana Balakrishnan: ”Sumay the wise one! Wonder how soon
before he will start disproving some of your theories!”
Programming by Examples
• Cross-disciplinary inspiration
– Theory/Logical Reasoning (Search algo)
– Language Design (DSL)
– Machine Learning (Ranking)
– HCI (User interaction models)
• Data wrangling is a timely application.
– 99% of end users are non-programmers.
– Data scientists spend 80% of their time cleaning data.
Some new opportunities:
• Tighter integration with machine learning.
• Integration with existing programming environments.
• Multi-modal intent specification (using both Examples and NL).
• New application domains such as robotics.