Cultivating Research Taste Programming Languages Mentoring Workshop 2015 Sumit Gulwani

Cultivating Research Taste
(illustrated via a journey in Program Synthesis research)
Programming Languages Mentoring Workshop
2015
Sumit Gulwani
Microsoft Research, Redmond
Dimensions in Research
• Problem Definition
– Advisor’s interest and funding, Internship, Course project
– Intersection with your collaborator’s interest
– Next logical advance in your current portfolio
– Talk to potential customers, market surveys
• Solution Strategy
– Develop new techniques vs. Apply existing techniques
– Cross-disciplinary
• Impact
– Paper, Tool, Awards, Media
– Personal happiness
Cultivating research taste is a journey!
Once you develop it, you start on another journey!
Program Synthesis
Goal: Synthesize a program in the underlying domain-specific
language (DSL) from user intent using some search algorithm.
An old problem, but more significant today.
• Diverse computational platforms & programming languages.
• Enabling technology: Better algorithms & faster machines
Synthesis can revolutionize end-user programming if we:
• target the right set of application domains
– such as Data manipulation
• allow the right intent specification mechanism
– Examples, Natural Language
• can tame the huge search space for real-time interaction
– Domain-specific search algorithms
PPDP 2010 [Invited talk paper]: “Dimensions in Program Synthesis”;
Graduation Advice (2005)
You will have too many problems to solve; you can’t pursue them all. Make thoughtful choices.
George Necula, UC Berkeley
From Program Verification to Program Synthesis
Precondition P
Statement s
Postcondition Q
Forward dataflow analysis: From s, P, compute Q
Backward dataflow analysis: From s, Q, compute P
Program Synthesis:
From P, Q, compute s
Nebojsa Jojic
MSR Redmond
(2005)
Synthesis using SAT/SMT Constraint Solvers
Program synthesis is an extremely hard combinatorial search task! Try using SAT solvers, which have been engineered to solve huge instances.
Venkie, MSR Bangalore (2006)
Initial results in program synthesis
Results: Managed to synthesize a wide variety of programs
from logic specs.
Approach: Reduce synthesis to solving SAT/SMT constraints.
• Bit-vector algorithms (e.g., turn-off rightmost one bit)
– [PLDI 2011, ICSE 2010]
• SIMD algorithms (e.g., vectorization of CountIf)
– [PPoPP 2013]
• Undergraduate textbook algorithms (e.g., sorting, dynamic programming)
– [POPL 2010]
• Program Inverses (e.g., deserializers from serializers)
– [PLDI 2011]
• Graph Algorithms (e.g., bipartiteness check)
– [OOPSLA 2010]
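To make "synthesis from a logic spec" concrete, here is a minimal sketch for the bit-vector example above (turn off the rightmost one bit). It does not use a SAT/SMT solver as the cited work does; it substitutes plain brute-force enumeration over small expressions and checks each candidate against the spec on every 8-bit input. The names `spec`, `LEAVES`, `OPS`, `candidates` and `synthesize` are all hypothetical.

```python
from itertools import product

MASK = 0xFF  # work over 8-bit values so the spec can be checked exhaustively


def spec(x):
    """Reference behaviour: clear the rightmost 1 bit of x (0 stays 0)."""
    for i in range(8):
        if (x >> i) & 1:
            return x & ~(1 << i) & MASK
    return 0


LEAVES = {"x": lambda x: x, "1": lambda x: 1}
OPS = {
    "&": lambda a, b: a & b,
    "|": lambda a, b: a | b,
    "^": lambda a, b: a ^ b,
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
}


def candidates(depth):
    """Yield (text, function) pairs for all expressions up to the given depth."""
    yield from LEAVES.items()
    if depth == 0:
        return
    subs = list(candidates(depth - 1))
    for sym, op in OPS.items():
        for (ta, fa), (tb, fb) in product(subs, subs):
            # Default arguments freeze op, fa, fb for this particular candidate.
            yield (f"({ta} {sym} {tb})",
                   lambda x, op=op, fa=fa, fb=fb: op(fa(x), fb(x)) & MASK)


def synthesize(max_depth=2):
    """Return the first enumerated expression that meets the spec on all inputs."""
    for depth in range(max_depth + 1):
        for text, fn in candidates(depth):
            if all(fn(x) == spec(x) for x in range(256)):
                return text, fn
    return None


result = synthesize()
```

The search finds an expression equivalent to the well-known `x & (x - 1)`. Enumeration only works here because the expression space is tiny; the point of the constraint-solver approach is to scale beyond that.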
Mid-life Awakening (2010)
Software developers → End users: two orders of magnitude more users
Dimensions in Research
➤ Problem Definition
– Advisor’s interest and funding, Internship, Course project
– Intersection with your collaborator’s interest
– Next logical advance in your current portfolio
– Talk to potential customers, market surveys
• Solution Strategy
– Develop new techniques vs. Apply existing techniques
– Cross-disciplinary
• Impact
– Paper, Tool, Media, Awards
– Personal happiness
Cultivating research taste is a journey!
Once you develop it, you start on another journey!
Problem Definition: Inspired by Excel help forums
Typical help-forum interaction
300_w30_aniSh_c1_b → w30
300_w5_aniSh_c1_b → w5
=MID(B1,5,2)
=MID(B1,FIND("_",$B:$B)+1,FIND("_",REPLACE($B:$B,1,FIND("_",$B:$B),""))-1)
Flash Fill (Excel 2013 feature)
Dimensions in Research
• Problem Definition
– Advisor’s interest and funding, Internship, Course project
– Intersection with your collaborator’s interest
– Next logical advance in your current portfolio
– Talk to potential customers, market surveys
➤ Solution Strategy
– Develop new techniques vs. Apply existing techniques
– Cross-disciplinary
• Impact
– Paper, Tool, Awards, Media
– Personal happiness
Cultivating research taste is a journey!
Once you develop it, you start on another journey!
Flash Fill: Domain Specific Language
Guarded Expression  G := Switch((b1,e1), …, (bn,en))
Boolean Expression  b := c1 ∧ … ∧ cn
Atomic Predicate    c := Match(vi,r,k)
Trace Expression    e := Concatenate(f1, …, fn)
Atomic Expression   f := s                   // Constant String
                       | SubStr(vi, p1, p2)
                       | Loop(w: e)
Index Expression    p := k                   // Constant Integer
                       | Pos(r1, r2, k)      // kth position in string whose left/right side matches with r1/r2
Regular Expression  r := TokenSequence(T1,…,Tn)
POPL 2011: “Automating String Processing in Spreadsheets using
Input-Output Examples”; Sumit Gulwani.
Substring Operator
Let w = SubString(s, p, p’)
where p = Pos(r1, r2, k) and p’ = Pos(r1’, r2’, k’)
[Figure: string s, with position p lying between w1 and w2, position p’ lying between w1’ and w2’, and w the substring from p to p’]
• r1 matches w1
• r2 matches w2
• r1’ matches w1’
• r2’ matches w2’
Two special cases:
• r1 = r2’ = 𝜖 : This describes the substring
• r2 = r1’ = 𝜖 : This describes boundaries around the substring
The general case allows for the combination of the two and is thus a powerful operator!
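A minimal sketch of how Pos and the substring operator could be evaluated, assuming that r1 and r2 are written as ordinary Python regular expressions rather than the DSL's token sequences, and that k is a positive count from the left. The function names `pos` and `sub_str` are hypothetical.

```python
import re


def pos(s, r1, r2, k):
    """Return the k-th position t (k >= 1, counted from the left) such that
    r1 matches a suffix of s[:t] and r2 matches a prefix of s[t:]."""
    hits = [
        t for t in range(len(s) + 1)
        if re.search(f"(?:{r1})\\Z", s[:t]) and re.match(f"(?:{r2})", s[t:])
    ]
    if k < 1 or k > len(hits):
        raise ValueError("no such position")
    return hits[k - 1]


def sub_str(s, p_start, p_end):
    """Substring of s between two positions computed by pos."""
    return s[p_start:p_end]


s = "(425)-706-7709"
# General case: constrain both sides of each boundary.
p = pos(s, r"\(", r"\d+", 1)       # just after "(" and just before digits
p_end = pos(s, r"\d+", r"\)", 1)   # just after digits and just before ")"
w = sub_str(s, p, p_end)           # the area code between the parentheses

# An empty regex plays the role of 𝜖: only one side is constrained.
first_dash = pos(s, "", "-", 1)
```

Here `w` is `"425"`. The example constrains the left and the right side of both positions, which is the combined general case described above.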
Syntactic String Transformations: Example
Format phone numbers
Input v1          Output
(425)-706-7709    425-706-7709
510.220.5586      510-220-5586
235 7654          425-235-7654
745-8139          425-745-8139
Switch((b1, e1), (b2, e2)), where
b1 ≡ Match(v1,NumTok,3),
b2 ≡ ¬Match(v1,NumTok,3),
e1 ≡ Concatenate(SubStr2(v1,NumTok,1), ConstStr(“-”),
                 SubStr2(v1,NumTok,2), ConstStr(“-”),
                 SubStr2(v1,NumTok,3))
e2 ≡ Concatenate(ConstStr(“425-”), SubStr2(v1,NumTok,1),
                 ConstStr(“-”), SubStr2(v1,NumTok,2))
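The program above can be transcribed into Python as a rough sketch. It assumes that NumTok corresponds to the regex `\d+`, that Match(v, r, k) means "v contains at least k matches of r", and that SubStr2(v, r, c) returns the c-th match of r in v. The helper names `match`, `substr2` and `format_phone` are hypothetical.

```python
import re

NUM_TOK = r"\d+"  # assumed Python equivalent of the NumTok token


def match(v, tok, k):
    """True if v contains at least k occurrences of tok."""
    return len(re.findall(tok, v)) >= k


def substr2(v, tok, c):
    """The c-th (1-based) occurrence of tok in v."""
    return re.findall(tok, v)[c - 1]


def format_phone(v1):
    # First branch of the Switch: three number tokens, so an area code is present.
    if match(v1, NUM_TOK, 3):
        return "-".join(substr2(v1, NUM_TOK, c) for c in (1, 2, 3))
    # Second branch: fewer than three number tokens, so prepend the constant "425-".
    return "425-" + substr2(v1, NUM_TOK, 1) + "-" + substr2(v1, NUM_TOK, 2)


rows = ["(425)-706-7709", "510.220.5586", "235 7654", "745-8139"]
outputs = [format_phone(v) for v in rows]
```

Running it on the four inputs reproduces the Output column of the table.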
Flash Fill: Search Algorithm
Goal: Given input-output pairs: (i1,o1), (i2,o2), (i3,o3), (i4,o4), find
P such that P(i1)=o1, P(i2)=o2, P(i3)=o3, P(i4)=o4.
Algorithm:
1. Learn set S1 of trace expressions s.t. ∀e in S1, [[e]] i1 = o1.
Similarly compute S2, S3, S4. Let S = S1 ∩ S2 ∩ S3 ∩ S4.
2(a). If S ≠ ∅ then result is S.
Challenge: Each Sj may have a huge number of expressions.
Key Idea: We have a DAG based data-structure that allows
for succinct representation and manipulation of Sj.
Flash Fill: Search Algorithm
Goal: Given input-output pairs: (i1,o1), (i2,o2), (i3,o3), (i4,o4), find
P such that P(i1)=o1, P(i2)=o2, P(i3)=o3, P(i4)=o4.
Algorithm:
1. Learn set S1 of trace expressions s.t. ∀e in S1, [[e]] i1 = o1.
Similarly compute S2, S3, S4. Let S = S1 ∩ S2 ∩ S3 ∩ S4.
2(a). If S ≠ ∅ then result is S.
2(b). Else find a smallest partition, say {S1,S2}, {S3,S4}, s.t.
S1 ∩ S2 ≠ ∅ and S3 ∩ S4 ≠ ∅.
3. Learn boolean formulas b1, b2 s.t.
b1 maps i1, i2 to true, and b2 maps i3, i4 to true.
4. Result is: Switch((b1, S1 ∩ S2), (b2, S3 ∩ S4))
Search Methodology: Reduce learning of an expression to
learning of sub-expressions (Divide-and-Conquer!)
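The steps above can be sketched on a toy scale. This sketch makes several simplifying assumptions: the program sets Sj are plain Python sets over a tiny, hand-written candidate pool (not the succinct DAG representation), the partition is found greedily and is not guaranteed to be the smallest, and each guard must also be false on the examples outside its group. All names (`CANDIDATES`, `PREDICATES`, `consistent`, `synthesize`) are hypothetical.

```python
import re


def nums(v):
    return re.findall(r"\d+", v)


# Tiny hypothesis space standing in for the set of all trace expressions.
CANDIDATES = {
    "join3": lambda v: "-".join(nums(v)[:3]) if len(nums(v)) >= 3 else None,
    "prefix425": lambda v: "425-" + "-".join(nums(v)[:2]) if len(nums(v)) >= 2 else None,
    "identity": lambda v: v,
}
PREDICATES = {
    "Match(v1,NumTok,3)": lambda v: len(nums(v)) >= 3,
    "not Match(v1,NumTok,3)": lambda v: len(nums(v)) < 3,
}


def consistent(i, o):
    """Step 1: the set Sj of candidate programs mapping input i to output o."""
    return {name for name, f in CANDIDATES.items() if f(i) == o}


def synthesize(examples):
    sets = [consistent(i, o) for i, o in examples]
    common = set.intersection(*sets)
    if common:                       # step 2(a): one program fits all examples
        return [(None, common)]
    groups = []                      # step 2(b): greedy partition of the examples
    for idx, s in enumerate(sets):
        for g in groups:
            if g[1] & s:             # joining keeps the group's intersection non-empty
                g[0].append(idx)
                g[1] &= s
                break
        else:
            groups.append([[idx], set(s)])
    result = []
    for members, progs in groups:    # step 3: one guard per group
        inside = [examples[j][0] for j in members]
        outside = [examples[j][0] for j in range(len(examples)) if j not in members]
        guard = next((name for name, b in PREDICATES.items()
                      if all(b(i) for i in inside) and not any(b(i) for i in outside)),
                     None)
        result.append((guard, progs))
    return result                    # step 4: the branches of the Switch


examples = [("(425)-706-7709", "425-706-7709"), ("510.220.5586", "510-220-5586"),
            ("235 7654", "425-235-7654"), ("745-8139", "425-745-8139")]
switch = synthesize(examples)
```

On the phone-number examples no single candidate fits all four rows, so the sketch returns two branches, one guarded by `Match(v1,NumTok,3)` and one by its negation.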
Ranking
General Principles
• Prefer shorter programs.
– Fewer conditionals.
– Shorter string expressions and regular expressions.
• Prefer programs with fewer constants.
Strategies
• Baseline: Pick any minimal-size program that uses
the fewest constants.
• Machine Learning: Programs are scored using
a weighted combination of program features.
– Weights are learned using training data.
Rishabh Singh
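A minimal sketch of the "weighted combination of program features" idea. The three features mirror the general principles above; the weight values are made-up placeholders rather than learned ones, and the `Program` record and the `score` and `rank` functions are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Program:
    name: str
    num_conditionals: int
    expr_length: int
    num_constants: int


# Hypothetical weights; in the learned strategy these would come from training data.
WEIGHTS = {"num_conditionals": -2.0, "expr_length": -0.5, "num_constants": -1.0}


def score(p):
    """Weighted sum of the program's features (higher is better)."""
    return sum(w * getattr(p, feature) for feature, w in WEIGHTS.items())


def rank(programs):
    """Order candidate programs from most to least preferred."""
    return sorted(programs, key=score, reverse=True)


candidates = [
    Program("constant_output", num_conditionals=0, expr_length=1, num_constants=2),
    Program("extract_token", num_conditionals=0, expr_length=3, num_constants=0),
]
best = rank(candidates)[0]
```

With these placeholder weights the constant-free `extract_token` program ranks first even though its expression is longer.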
Experimental Comparison of various Ranking Strategies
Strategy    Average # of examples required
Baseline    4.17
Learning    1.48
Technical Report: “Predicting a correct program in Programming by
Example”; Singh, Gulwani
User Interaction Model
Current Flash Fill Model
• Auto-prediction avoids the discoverability issue.
• User inspects output and may provide additional examples.
Show programs
• In any desired language (after conversion from DSL).
• Paraphrased in English.
Computer initiated interactivity
• Highlight less confident entries in the output.
• Ask directed questions based on distinguishing inputs.
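One way to read "distinguishing inputs" is: inputs on which the programs still consistent with the user's examples disagree. The sketch below assumes that reading. The two candidate programs are hypothetical Python stand-ins for a fixed-offset extraction and a delimiter-based extraction, and `distinguishing_inputs` is a hypothetical helper.

```python
def distinguishing_inputs(programs, inputs):
    """Inputs on which the candidate programs do not all agree."""
    return [x for x in inputs if len({p(x) for p in programs}) > 1]


# Two candidates that both fit the example "300_w5_aniSh_c1_b" -> "w5".
fixed_offset = lambda s: s[4:6]              # two characters starting at index 4
between_underscores = lambda s: s.split("_")[1]

column = ["300_w5_aniSh_c1_b", "300_w30_aniSh_c1_b"]
ask_about = distinguishing_inputs([fixed_offset, between_underscores], column)
```

Only the `w30` row separates the two candidates, so it is the row worth highlighting or asking the user about.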
Dimensions in Research
• Problem Definition
– Advisor’s interest and funding, Internship, Course project
– Intersection with your collaborator’s interest
– Next logical advance in your current portfolio
– Talk to potential customers, market surveys
• Solution Strategy
– Develop new techniques vs. Apply existing techniques
– Cross-disciplinary
➤ Impact
– Paper, Tool, Awards, Media
– Personal happiness
Cultivating research taste is a journey!
Once you develop it, you start on another journey!
Initial Success: Media articles & Blogposts
Broader Impact
Defined a new research trajectory, which keeps me busy
with a passionate sense of purpose.
• End-user Programming using Examples and Natural Language
• Intelligent Tutoring systems
Conclusion
Dimensions in Research
Problem definition, Solution strategy, Impact
Cultivating research taste is a journey
Mine involved:
“Program analysis”
-> “Program synthesis”
-> “Program synthesis for end-users using examples”
Once you develop it, you start a new journey
Mine involves: having fun with cross-disciplinary research in
• “Frameworks for end-user programming using examples & NL”
• “Intelligent Tutoring systems”
Backup Slides for Flash Fill Demo