Programming by Examples Sumit Gulwani Marktoberdorf Lectures August 2015

Lecture 5: Miscellaneous Content
• Dimensions in synthesis
• Programming using Natural Language
• Applications of search in computer-aided Education
• How I started working on synthesis using examples
• My favorite synthesis project
PBE vs Machine Learning
Traditional PBE
• Requires few examples.
• Generates human-readable and editable models.
• Models are deterministic and intended to work correctly.
• Generally does not handle noise.
Traditional Machine Learning
• Requires too many examples.
• Generates black-box models.
• Models are probabilistic and aimed at high precision.
• Can handle noise in input data.
Opportunity: Combine complementary strengths of PBE & ML.
• generalization via probabilistic models.
• can be useful in data cleaning.
Deductive Synthesis vs Inductive Synthesis
Deductive Synthesis
• Refers to synthesis using deductive methods.
• Has traditionally been restricted to synthesis in the
presence of logical specifications.
Inductive Synthesis
• Refers to synthesis from inductive (example-based)
specifications.
• Various kinds of techniques have been applied including
constraint solving, stochastic, and enumerative search.
FlashMeta performs synthesis from inductive specifications
using deductive methods!
Dimensions in Program Synthesis
• Domain-specific language
– User-provided sketch [Solar-Lezama, PhD Thesis ’08]
– SyGuS: parameterized DSL framework [Alur et al., FMCAD ’13]
• Search methodology
– Deductive
– Constraint solving
– Enumerative search [Udupa et al., PLDI 2013]
– Stochastic search [Schkufza, Sharma, Aiken; CACM RH ’15]
– Web/Repository based search [Yahav et al., Swarat et al.]
• Specification
– Examples, Demonstrations
PPDP 2010; “Dimensions in Program Synthesis”; Gulwani
Programming by Examples
Earlier literature:
• Version space algebra and its application to programming by
demonstration; [Lau, Domingos, Weld: ICML 2000]
• Why PBD Systems Fail: Lessons Learned for Usable AI.
[Lau, CHI 2008]
Recent PL literature:
• Type-and-example-directed program synthesis;
[Osera, Zdancewic; PLDI 2015]
• Synthesizing data structure transformations from input-output
examples; [Feser, Chaudhuri, Dillig; PLDI 2015]
• Interactive Parser synthesis from example;
[Leung, Sarracino, Lerner; PLDI 2015]
Dimensions in Program Synthesis
• Domain-specific language
– User-provided sketch [Solar-Lezama, PhD Thesis ’08]
– SyGuS: parameterized DSL framework [Alur et al., FMCAD ’13]
• Search methodology
– Deductive
– Constraint solving
– Enumerative search [Udupa et al., PLDI 2013]
– Stochastic search [Schkufza, Sharma, Aiken; CACM RH ’15]
– Web/Repository based search [Yahav et al., Swarat et al.]
• Specification
– Examples, Demonstrations
– Logical specifications
– Natural language
PPDP 2010; “Dimensions in Program Synthesis”; Gulwani
SmartPhone Programming using Natural Language
“When I receive a new SMS, if the phone is connected
to my car’s bluetooth, read the message content and
reply to the sender ‘I am driving’.”
Synthesis Methodology
• Similar to PBE, there is an underlying DSL & ranking fn.
• Candidate set of programs is produced using:
– Rule based NLP engine identifies operators and likely
relationships between them.
– Type-based synthesis is used to complete partial programs.
SmartSynth: Synthesizing Smartphone Automation Scripts from
Natural Language, MobiSys 2013, Le, Gulwani, Su
SmartSynth (TouchDevelop) Demo
Spreadsheet Programming using Natural Language
Challenge: Handle variability in English description
• “sum the hours for the capitol hill location chefs”
• “total hours capitol hill chefs”
• “get the hours where title = chef that work at capitol hill &
sum them up”
• “sum column D where column C is chef and A is capitol hill”
NLyze: Interactive Programming by Natural Language for SpreadSheet
Data Analysis and Manipulation; SIGMOD 2014; Gulwani, Marron
Outline
• Dimensions in synthesis
• Programming using Natural Language
→ Applications of search in computer-aided Education
• How I started working on synthesis using examples
• My favorite synthesis project
Application: Computer-aided Education
Various tasks
• Problem Generation
• Solution Generation
• Feedback Generation
• …
Various subject-domains
• Arithmetic, Algebra, Geometry
• Programming, Automata, Logic
• Language Learning
• ...
CACM 2014; “Example-based Learning in Computer-aided STEM Education”;
Gulwani
Problem Synthesis
Motivation
• Problems similar to a given problem.
– Avoid copyright issues
– Prevent cheating in MOOCs (Unsynchronized instruction)
• Problems of a given difficulty level and concept usage.
– Generate progressions
– Generate personalized workflows
Key Ideas
• Test input generation techniques [CHI 2013]
• Template-based generalization
Problem Synthesis: Algebra (Trigonometry)
Example Problem: $(\sec x + \cos x)(\sec x - \cos x) = \tan^2 x + \sin^2 x$
Query: $(T_1 x \pm T_2 x)(T_3 x \pm T_4 x) = T_5^2 x \pm T_6^2 x$, with $T_1 \neq T_5$
New problems generated:
$(\csc x + \cos x)(\csc x - \cos x) = \cot^2 x + \sin^2 x$
$(\csc x - \sin x)(\csc x + \sin x) = \cot^2 x + \cos^2 x$
$(\sec x + \sin x)(\sec x - \sin x) = \tan^2 x + \cos^2 x$
⋮
$(\tan x + \sin x)(\tan x - \sin x) = \tan^2 x - \sin^2 x$
$(\csc x + \cos x)(\csc x - \cos x) = \csc^2 x - \cos^2 x$
⋮
AAAI 2012: “Automatically generating algebra problems”;
Singh, Gulwani, Rajamani.
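A minimal sketch of how template-based generation with testing might look. It assumes a restricted form of the query in which T3 = T1 and T4 = T2, and it assumes identities are only checked numerically at a few fixed sample points rather than proved. The names TRIG, SAMPLES, holds, and generate are illustrative and are not taken from the paper.

```python
import itertools
import math

# Candidate building blocks for the template slots (hypothetical choice).
TRIG = {
    "sin": math.sin,
    "cos": math.cos,
    "tan": math.tan,
    "cot": lambda x: 1.0 / math.tan(x),
    "sec": lambda x: 1.0 / math.cos(x),
    "csc": lambda x: 1.0 / math.sin(x),
}

# Fixed sample points, chosen away from zeros and poles of the six functions.
SAMPLES = [0.3, 0.7, 1.1, 2.0, 2.6]


def holds(t1, t2, t5, t6, sign):
    """Numerically test (T1 + T2)(T1 - T2) = T5^2 + sign * T6^2 at every sample."""
    for x in SAMPLES:
        a, b = TRIG[t1](x), TRIG[t2](x)
        lhs = (a + b) * (a - b)
        rhs = TRIG[t5](x) ** 2 + sign * TRIG[t6](x) ** 2
        if not math.isclose(lhs, rhs, rel_tol=1e-9, abs_tol=1e-9):
            return False
    return True


def generate():
    """Enumerate instantiations of the restricted query that pass the test."""
    found = []
    for t1, t2, t5, t6 in itertools.product(TRIG, repeat=4):
        if t1 == t2 or t1 == t5:  # skip degenerate products; enforce T1 != T5
            continue
        for sign in (+1, -1):
            if holds(t1, t2, t5, t6, sign):
                op = "+" if sign > 0 else "-"
                found.append(
                    f"({t1} x + {t2} x)({t1} x - {t2} x) = {t5}^2 x {op} {t6}^2 x"
                )
    return found
```

Under these assumptions, generate() returns the example problem itself along with other instantiations of the same shape; candidates with T1 = T5 are filtered out by the side condition.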
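Problem Synthesis: Algebra (Limits)
Example Problem: $\lim_{n \to \infty} \sum_{i=0}^{n} \frac{2i^2 + i + 1}{5^i} = \frac{5}{2}$
Query: $\lim_{n \to \infty} \sum_{i=0}^{n} \frac{C_0 i^2 + C_1 i + C_2}{C_3^{\,i}} = \frac{C_4}{C_5}$
$C_0 \neq 0 \wedge \gcd(C_0, C_1, C_2) = \gcd(C_4, C_5) = 1$
New problems generated:
$\lim_{n \to \infty} \sum_{i=0}^{n} \frac{3i^2 + 2i + 1}{7^i} = \frac{7}{3}$
$\lim_{n \to \infty} \sum_{i=0}^{n} \frac{i^2}{3^i} = \frac{3}{2}$
$\lim_{n \to \infty} \sum_{i=0}^{n} \frac{3i^2 + 3i + 1}{4^i} = 4$
$\lim_{n \to \infty} \sum_{i=0}^{n} \frac{5i^2 + 3i + 3}{6^i} = 6$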
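A hedged sketch of how instances of the limit query could be enumerated and evaluated exactly. It relies on the standard closed forms for $\sum x^i$, $\sum i x^i$ and $\sum i^2 x^i$ with $x = 1/C_3$, and it uses a hypothetical max_denominator filter as a stand-in for whatever notion of a "nice" answer a real generator would apply. The function names are illustrative.

```python
from fractions import Fraction
from math import gcd


def series_limit(c0, c1, c2, c3):
    """Exact value of sum_{i>=0} (c0*i^2 + c1*i + c2) / c3^i for an integer c3 >= 2."""
    x = Fraction(1, c3)
    s0 = 1 / (1 - x)                 # sum of x^i
    s1 = x / (1 - x) ** 2            # sum of i * x^i
    s2 = x * (1 + x) / (1 - x) ** 3  # sum of i^2 * x^i
    return c0 * s2 + c1 * s1 + c2 * s0


def generate(max_coeff=5, max_base=7, max_denominator=3):
    """Enumerate (C0, C1, C2, C3) satisfying the query's side conditions."""
    problems = []
    for c3 in range(2, max_base + 1):
        for c0 in range(1, max_coeff + 1):  # C0 != 0
            for c1 in range(max_coeff + 1):
                for c2 in range(max_coeff + 1):
                    if gcd(gcd(c0, c1), c2) != 1:
                        continue
                    # Fraction is always in lowest terms, so gcd(C4, C5) = 1 holds.
                    value = series_limit(c0, c1, c2, c3)
                    if value.denominator <= max_denominator:
                        problems.append(((c0, c1, c2, c3), value))
    return problems
```

For instance, series_limit(2, 1, 1, 5) evaluates to Fraction(5, 2), matching the example problem on the slide.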
Problem Synthesis: Algebra (Integration)
Example Problem: $\int (\csc x)(\csc x - \cot x)\,dx = \csc x - \cot x$
Query: $\int T_0(x)\,(T_1(x) \pm T_2(x))\,dx = T_4(x) \pm T_5(x)$
$T_1 \neq T_2 \wedge T_4 \neq T_5$
New problems generated:
$\int (\tan x)(\cos x + \sec x)\,dx = \sec x - \cos x$
$\int (\sec x)(\tan x + \sec x)\,dx = \sec x + \tan x$
$\int (\cot x)(\sin x + \csc x)\,dx = \sin x - \csc x$
Problem Synthesis: Algebra (Determinant)
Ex. Problem:
$\begin{vmatrix} (x+y)^2 & zx & zy \\ zx & (y+z)^2 & xy \\ yz & xy & (z+x)^2 \end{vmatrix} = 2xyz(x+y+z)^3$
Query:
$\begin{vmatrix} F_0(x,y,z) & F_1(x,y,z) & F_2(x,y,z) \\ F_3(x,y,z) & F_4(x,y,z) & F_5(x,y,z) \\ F_6(x,y,z) & F_7(x,y,z) & F_8(x,y,z) \end{vmatrix} = C_{10}\,F_9(x,y,z)$
$F_i := F_j[x \to y;\ y \to z;\ z \to x]$ where $(i,j) \in \{(4,0), (8,4), (5,1), \ldots\}$
New problems generated:
$\begin{vmatrix} y^2 & x^2 & (y+x)^2 \\ (z+y)^2 & z^2 & y^2 \\ z^2 & (x+z)^2 & x^2 \end{vmatrix} = 2(xy+yz+zx)^3$
$\begin{vmatrix} yz+y^2 & xy & xy \\ yz & zx+z^2 & yz \\ zx & zx & xy+x^2 \end{vmatrix} = 4x^2y^2z^2$
Problem Synthesis: Sentence Completion
1. The principal characterized his pupils as _________
because they were pampered and spoiled by their indulgent parents.
2. The commentator characterized the electorate as _________
because it was unpredictable and given to constantly shifting moods.
(a) cosseted
(b) disingenuous
(c) corrosive
(d) laconic
(e) mercurial
One of the problems is a real problem from SAT (standardized US exam),
while the other one was automatically generated!
From problem 1, we generate template T1:
*1 characterized *2 as *3 because *4
We specialize T1 to template T2:
*1 characterized *2 as mercurial because *4
Problem 2 is an instance of T2 found using web search!
LaSEWeb: Automating search strategies over web data;
Polozov, Gulwani; KDD 2014
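The template steps above can be sketched roughly as follows. This is an illustration only: it assumes templates with fewer than ten holes, compiles them to regular expressions, and matches against a sentence supplied by hand as a stand-in for a web search result. The helper names template_to_regex and specialize are hypothetical and are not LaSEWeb APIs.

```python
import re

T1 = "*1 characterized *2 as *3 because *4"


def template_to_regex(template):
    """Compile a template into a regex; hole '*k' becomes the named group 'hk'."""
    parts = re.split(r"\*(\d+)", template)  # alternates literal text / hole number
    pattern = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            pattern += re.escape(part)        # literal text between holes
        else:
            pattern += f"(?P<h{part}>.+?)"    # a hole matches any non-empty text
    return re.compile("^" + pattern + "$")


def specialize(template, hole, value):
    """Fix one hole of a template to a concrete word (assumes fewer than 10 holes)."""
    return template.replace(f"*{hole}", value)


T2 = specialize(T1, 3, "mercurial")

# Hand-written candidate standing in for text found by web search.
candidate = (
    "The commentator characterized the electorate as mercurial because "
    "it was unpredictable and given to constantly shifting moods."
)
match = template_to_regex(T2).match(candidate)
```

Here match.group("h1") is "The commentator" and match.group("h2") is "the electorate"; blanking out the fixed word then yields a new sentence-completion problem.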
Problem Synthesis
Motivation
• Problems similar to a given problem.
– Avoid copyright issues
– Prevent cheating in MOOCs (Unsynchronized instruction)
• Problems of a given difficulty level and concept usage.
– Generate progressions
– Generate personalized workflows
Key Ideas
• Test input generation techniques [CHI 2013]
• Template-based generalization
• Reverse of solution generation
Natural Deduction
Prove that: x1 ∨ (x2 ∧ x3) and x1 → x4 and x4 → x5 implies x2 ∨ x5
Inference Rule                 Premises          Conclusion
Modus Ponens (MP)              p → q, p          q
Hypothetical Syllogism (HS)    p → q, q → r      p → r
Disjunctive Syllogism (DS)     p ∨ q, ¬p         q
Simplification (Simp)          p ∧ q             q
Replacement Rule               Proposition       Equiv. Proposition
Distribution                   p ∨ (q ∧ r)       (p ∨ q) ∧ (p ∨ r)
Double Negation                p                 ¬¬p
Implication                    p → q             ¬p ∨ q
Equivalence                    p ≡ q             (p → q) ∧ (q → p)
IJCAI 2013: “Automatically Generating Problems and Solutions for Natural Deduction”;
Umair Ahmed, Sumit Gulwani, Amey Karkare
Similar Problem Generation: Natural Deduction
Similar Problems = those that have a minimal proof with
the same sequence of inference rules as used by a
minimal proof of the given problem.
Premise 1          Premise 2          Premise 3          Conclusion
x1 ∨ (x2 ∧ x3)     x1 → x4            x4 → x5            x2 ∨ x5
Similar Problems
Premise 1          Premise 2          Premise 3          Conclusion
x1 ≡ x2            x3 → ¬x2           (x4 → x5) → x3     x1 → (x4 ∧ ¬x5)
x1 ∧ (x2 → x3)     x1 ∨ x4 → ¬x5      x2 ∨ x5            x1 ∨ x4 ∧ ¬x5
(x1 ∨ x2) → x3     x3 → x1 ∧ x4       x1 ∧ x4 → x5       x1 → x5
(x1 → x2) → x3     x3 → ¬x4           x1 ∨ x5 ∨ x4       x5 ∨ x2 → x1
x1 → x2 ∧ x3       x4 → ¬x2           x3 ≡ x5 → x4       x1 → x3 ≡ ¬x5
Parameterized Problem Generation: Natural Deduction
Parameters:
# of premises = 3, Size of propositions ≤ 4
# of variables = 3, # of inference steps = 2
Inference rules = { DS, HS }
Parameterized Problems
Premise 1              Premise 2          Premise 3     Conclusion
(x1 → x3) → x2         x2 → x3            ¬x3           x1 ∧ ¬x3
x3 ≡ x1 → x2           ¬x2                x1 ∧ ¬x3      x3 → x1
x1 ≡ x3 ∨ x1 ≡ x2      (x1 ≡ x2) → x3     ¬x3           x1 ≡ x3
x1 ≡ ¬x3               x2 ∨ x1            x3 → ¬x2      x1 ∧ ¬x3
x3 → x1                x1 → x2 ∧ x3       x3 → ¬x2      ¬x3
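A generated problem is only well posed if its premises actually entail its conclusion. As a small sketch, assuming a brute-force truth-table check (not necessarily how the cited tool validates problems), the first parameterized problem can be verified like this. The helper names implies and entails are illustrative.

```python
from itertools import product


def implies(p, q):
    """Material implication p -> q."""
    return (not p) or q


def entails(premises, conclusion, num_vars):
    """True if every assignment satisfying all premises also satisfies the conclusion."""
    for values in product([False, True], repeat=num_vars):
        if all(p(*values) for p in premises) and not conclusion(*values):
            return False  # found a counterexample assignment
    return True


# Premises: (x1 -> x3) -> x2, x2 -> x3, not x3. Conclusion: x1 and not x3.
PREMISES = [
    lambda x1, x2, x3: implies(implies(x1, x3), x2),
    lambda x1, x2, x3: implies(x2, x3),
    lambda x1, x2, x3: not x3,
]
CONCLUSION = lambda x1, x2, x3: x1 and not x3
```

With all three premises the check succeeds; dropping the third premise makes it fail, since x1 = x2 = x3 = true is then a counterexample.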
Feedback Synthesis
Motivation
• Makes teachers more effective.
– Saves them time.
– Provides immediate insights on where students are struggling.
• Can enable rich interactive experience for students.
– Generation of hints.
– Pointer to simpler problems depending on kind of mistake.
Key Ideas:
• Use PBE techniques to learn buggy procedure in student mind
• Nearest correct solution
Feedback Synthesis: Programming (Array Reverse)
[Figure: array-reverse code with annotations: i = 1, front <= back, i <= a.Length, --back]
PLDI 2013: “Automated Feedback Generation for Introductory
Programming Assignments”; Singh, Gulwani, Solar-Lezama
Experimental Results
13,365 incorrect attempts for 13 Python problems.
(obtained from Introductory Programming course at
MIT and its MOOC version on the EdX platform)
• Average time for feedback = 10 seconds
• Feedback generated for 64% of those attempts.
• Reasons for failure to generate feedback
– Completely incorrect solutions
– Big conceptual errors
– Timeout (4 min)
Tool accessible at: http://sketch1.csail.mit.edu/python-autofeedback/
Outline
• Dimensions in synthesis
• Programming using Natural Language
• Applications of search in computer-aided Education
→ How I started working on synthesis using examples
• My favorite synthesis project
Graduation Advice (2005)
You will have too many
problems to solve; you
can’t pursue them all.
Make thoughtful choices.
George Necula
UC-Berkeley
From Program Verification to Program Synthesis
Precondition P
Statement s
Postcondition Q
Forward dataflow analysis: From s, P, compute Q
Backward dataflow analysis: From s, Q, compute P
Program Synthesis:
From P, Q, compute s
Nebojsa Jojic
MSR Redmond
(2005)
Synthesis using SAT/SMT Constraint Solvers
Program synthesis
is an extremely
hard combinatorial
search task!
Try using SAT
solvers, which have
been engineered to
solve huge instances.
Venkie
MSR Redmond
(2006)
Initial results in program synthesis
Results: Managed to synthesize a wide variety of programs
from logic specs.
Approach: Reduce synthesis to solving SAT/SMT constraints.
• Bit-vector algorithms (e.g., turn-off rightmost one bit)
– [PLDI 2011, ICSE 2010]
• SIMD algorithms (e.g., vectorization of CountIf)
– [PPoPP 2013]
• Undergraduate book algorithms (e.g., sorting, dynamic programming)
– [POPL 2010]
• Program Inverses (e.g., deserializers from serializers)
– [PLDI 2011]
• Graph Algorithms (e.g., bi-partiteness check)
– [OOPSLA 2010]
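To make the search problem concrete, here is a toy sketch for the "turn off rightmost one bit" task. Note the swap: the work cited above reduces synthesis from logical specs to SAT/SMT solving, whereas this sketch uses plain enumerative search over a tiny, made-up 8-bit expression DSL and checks candidates against input-output examples only. All names (LEAVES, OPS, EXAMPLES, synthesize) are hypothetical.

```python
import itertools

# A tiny expression DSL over one 8-bit input x: leaves and binary operators.
LEAVES = {"x": lambda x: x, "1": lambda x: 1}
OPS = {
    "&": lambda a, b: a & b,
    "|": lambda a, b: a | b,
    "^": lambda a, b: a ^ b,
    "+": lambda a, b: (a + b) & 0xFF,
    "-": lambda a, b: (a - b) & 0xFF,
}


def enumerate_programs(depth):
    """Yield (text, function) pairs for all expressions up to the given depth."""
    if depth == 0:
        yield from LEAVES.items()
        return
    smaller = list(enumerate_programs(depth - 1))
    yield from smaller
    for (ta, fa), (tb, fb) in itertools.product(smaller, repeat=2):
        for name, op in OPS.items():
            # Default arguments freeze the current sub-programs and operator.
            yield (
                f"({ta} {name} {tb})",
                lambda x, fa=fa, fb=fb, op=op: op(fa(x), fb(x)),
            )


def synthesize(examples, max_depth=2):
    """Return the first (text, function) consistent with all examples, or None."""
    for depth in range(max_depth + 1):
        for text, fn in enumerate_programs(depth):
            if all(fn(i) == o for i, o in examples):
                return text, fn
    return None


# Input-output examples for "turn off the rightmost 1 bit".
EXAMPLES = [(0b1100, 0b1000), (0b0111, 0b0110), (0b1010, 0b1000), (0b0001, 0b0000)]
```

On these four examples the search returns the expression text "(x & (x - 1))". Even this tiny DSL has a few thousand candidates at depth 2, which hints at why larger instances call for solver-based techniques.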
Mid-life Awakening (2010)
Software developers
Two orders of magnitude more users
End users
Dimensions in Research
Cultivating research taste is a journey.
• Problem Definition
– Advisor’s interest/funding, Internship, Course project
– Intersection with your collaborator’s interest
– Next logical advance in your current portfolio
– Talk to potential customers, market surveys
• Solution Strategy
– Develop new techniques vs. Apply existing techniques
– Cross-disciplinary
• Impact
– Paper, Tool, Awards, Media
– Personal happiness
Once you develop it,
you start a new journey.
Outline
• Dimensions in synthesis
• Programming using Natural Language
• Applications of search in computer-aided Education
• How I started working on synthesis using examples
→ My favorite synthesis project
Biological Synthesis
Project Sumay
Joint work with
Monika Gulwani
Slides from a presentation made
in April 2013
Template based approach to a WiSE name
Monika’s constraints:
• neither popular on Bing (Sorry uncle Ben, uncle Tom) nor Big
• Musical and easy to Pronounce
• elegant Semantics
Sumit’s approach:
• Template based strategy: C V C V C (where C=consonant, V=vowel)
• Pruning strategy: Pick a CV pair from each creator, including initials.
Solution space:
• MoSu_, MoSi_, MiSu_, MiSi_, MaSu_, MaSi_
• SuMo_, SuMi_, SuMa_, SiMo_, SiMi_, SiMa_,
Potential solutions:
• MaSum => Too gentle
• SuMit => Need a different name than creator
• SuMan => A bit feminine
• SuMay => meaning “wise” in Hindi.
– That’s how dad gets a Bonus for having mom’s initial in his name at right place!
Inductive Synthesis
Sumay was 12 days past the due date.
• Planned inductive procedure was used to get him out.
• Good news: 1 unit of induction was good enough.
– Monika wanted to compete with my other distraction (FlashFill)
that is also known to work well with minimum inductive units.
– But Monika wanted to more than “even” out:
• Sumay was born on O1/O2/O3 @ O4 hrs O5 min (where Oi = odd #)
Sumay_&_Mom.Sleep()
Well-deserved rest in hospital.
Dad.Watch_Sumay_&_Mom_fromHisCot();
Nikolaj: They won’t be quiet for long!
Safety Property: Event DiaperChange handled properly
Property: Handler of event DiaperChange should not be preempted by another occurrence of that event.
• @Clousot: Warning: “Property not satisfied”.
• @Monika: It is a false positive.
• Found a counterexample next day: Event DiaperChange
occurred twice before the handler could finish.
• @Francesco: Invoke garbage collector (GC) immediately
when DiaperChange occurs to mask off other invocations.
– Problem: GC is too small & sometimes gets pre-empted.
Solution:
• @Babyrus: Run handler within DiaperChangePad lock.
• @Target: Use wipe warmer to make Environment less adversarial.
Liveness Property: Crying stops eventually
If (DiaperChangeEvent) { ... }
Else {
If (RecentlyFed) {
Try { Burping() };
Catch (Timeout Exception) { DadSurfaceMagic(); }
}
Else {
Try { Feeding() };
Catch (NotInterested Exception) { DadSurfaceMagic(); }
}
}
Procedure DadSurfaceMagic()
// Summary: Sumay enjoys tummy time on Dad’s chest.
Step 1: Hold Sumay.
Assert(Sumay stops crying but is breathing fast);
Step 2: Sit down on couch. Open the recliner (to 22
degrees) and lie down.
Assert(Sumay is calm & his breathing speed starts matching
Dad’s relaxed breathing speed & soon goes off to sleep);
Results
Who does Sumay look like?
• Sumay looks like his dad: 70%
• Incomparable element: 15%
• Monika looks like Sumay: 15%
Some cute quotes:
• “Excels in cuteness test”
• Aditya Nori: “hello world to Sumay”
• Nikolaj Bjorner: “Enjoy: ‘Baby Sleep Guide: Sleep Solutions for
You & Your Baby’. I hope your constraint system is going to be
feasible and that there not only exists a non-empty set of
solutions, but that they can also be found within reasonable time.”
• Shobana Balakrishnan: “Sumay the wise one! Wonder how soon
before he will start disproving some of your theories!”
Programming by Examples
• Cross-disciplinary inspiration
– Theory/Logical Reasoning (Search algo)
– Language Design (DSL)
– Machine Learning (Ranking)
– HCI (User interaction models)
• Data wrangling is a timely application.
– 99% of end users are non-programmers.
– Data scientists spend 80% time in cleaning data.
Some new opportunities:
• Tighter integration with machine learning.
• Integration with existing programming environments.
• Multi-modal intent specification (using both Examples and NL).
• New application domains such as robotics.