Int - Lab for Automated Reasoning and Analysis - LARA

advertisement
Implicit Programming
Viktor Kuncak
Swiss Federal Institute of Technology Lausanne
(EPFL)
Joint work with:
Tihomir Gvero, EPFL
Ali Sinan Köksal, now grad student at UC Berkeley
Ruzica Piskac, now at Max-Planck Inst. for Software Systems
Philippe Suter, EPFL (graduating)
http://lara.epfl.ch/
Programming Activities and Tools
requirements
Three related activities:
• Development within an IDE
(Eclipse, Visual Studio, emacs)
• Compilation and static checking
(optimizing compiler,
static analyzer, contract checker)
• Execution on a (virtual) machine
def f(x : Int) = {
y=2*x+1
}
iload_0
iconst_1
iadd
More compute power available for each step
 use it to improve programmer productivity
42
Implicit Programming Agenda
requirements
Advance techniques and tools for
Development within an IDE
What can we prove in 0.5 s?
def f(x : Int) = {
y=2*x+1
}
Compilation and verification
Eliminate unknowns from programs
Constraint diagnostics
iload_0
iconst_1
iadd
Execution
When is unfolding enough?
to improve software productivity.
42
Background: Scala Programming Language
Introduced by Martin Odersky, EPFL
Unifies functional and object-oriented programming
Runs on the Java Virtual Machine (and .NET)
Now used by over 100’000 developers
(incl. Twitter, UBS, LinkedIn)
DSL embedding mechanisms. IDE support (e.g. Eclipse)
sealed abstract class Tree
case class Leaf() extends Tree
case class Node(left:Tree, data:Int, right:Tree) extends Tree
def elems(t:Tree) :Set[Int] = t match {
case Leaf() ⇒ Set.empty
case Node(l,d,r) ⇒ elems(l) ++ Set(d) ++ elems(r)
}
http://www.scala-lang.org
Kaplan: Executable Spec for Sorting
sealed abstract class List
case class Nil() extends List
case class Cons(head : Int, tail : List) extends List
def elems(ls : List) : Set[Int] = …
def isSorted(lst : List) : Boolean = lst match {
case Nil()
⇒ true
case Cons(_, Nil()) ⇒ true
case Cons(x, Cons(y, ys)) ⇒ x < y && isSorted(Cons(y,ys))
}
((lst: List)⇒ isSorted(lst) && elems(lst) == Set(0, 1, -3)).find.get
> Cons(-3, Cons(0, Cons(1, Nil())))
Koksal, Kuncak, Suter: Constraints as Control, POPL 2012
Minimizing Solutions
val (a, b) = (12, 8)
val (lcm, _, _) = ((m, fa, fb) ⇒
m > 0 && m == fa * a && m == fb * b)
.minimizing((m,fa,fb) ⇒ m).find.get
> 24
Enumerating Complex Values
((t:Tree)⇒ isBinarySearchTree(t) &&
elems(t)==Set(1,2,3)).findAll.toList
> List(Node(Node(L, 1, L), 2, Node(L, 3, L)),
Node(Node(Node(L, 1, L), 2, L), 3, L),
Node(L, 1, Node(L, 2, Node(L, 3, L))))
Red-Black Trees
Implementation:
next 30 pages
invariants
Formalize Invariants (in Scala)
sealed abstract class Tree
case class Empty() extends Tree
case class Node(color: Color, left: Tree, value: Int, right: Tree) extends Tree
def bBalanced(t : Tree) : Boolean = t match {
case Node(_,l,_,r) => bBalanced(l) && bBalanced(r) &&
bHeight(l) ==bHeight(r)
case Empty() => true
}
def bHeight(t : Tree) : Int = t match {
case Empty() => 1
case Node(Black(), l, _, _) => bHeight(l) + 1
case Node(Red(), l, _, _) => bHeight(l)
}
def isRBT(t : Tree) : Int = …
Insertion in a few lines
def insert(x : Int, t : Tree) =
((t1:Tree) =>
isRBT(t1) && elems(t1) = elems(t) ++ Set(x)
).find
Are invariants together with declarative shorter
than functional implementation?
– not necessarily,
– but we know the result satisfies the invariants
– RBT only makes sense with invariants
Extending the Program
Suppose we wish to implement ‘remove’ as well
def remove(x : Int, t : Tree) =
((t1:Tree) =>
isRBT(t1) && elems(t1) = elems(t) -- Set(x)
).find
Declarative remove is again just a few lines
(keep the same invariants!)
declarative knowledge can be more reusable
void RBDelete(rb_red_blk_tree* tree, rb_red_blk_node* z){
rb_red_blk_node* y;
rb_red_blk_node* x;
rb_red_blk_node* nil=tree->nil;
rb_red_blk_node* root=tree->root;
y= ((z->left == nil) || (z->right == nil)) ? z : TreeSuccessor(tree,z);
x= (y->left == nil) ? y->right : y->left;
if (root == (x->parent = y->parent)) { /* assignment of y->p to x->p is intentional */
root->left=x;
} else {
if (y == y->parent->left) {
y->parent->left=x;
} else {
y->parent->right=x;
}
}
if (y != z) { /* y should not be nil in this case */
#ifdef DEBUG_ASSERT
Assert( (y!=tree->nil),"y is nil in RBDelete\n");
#endif
/* y is the node to splice out and x is its child */
if (!(y->red)) RBDeleteFixUp(tree,x);
tree->DestroyKey(z->key);
tree->DestroyInfo(z->info);
y->left=z->left;
y->right=z->right;
y->parent=z->parent;
y->red=z->red;
z->left->parent=z->right->parent=y;
if (z == z->parent->left) {
z->parent->left=y;
} else {
z->parent->right=y;
}
free(z);
} else {
tree->DestroyKey(y->key);
tree->DestroyInfo(y->info);
if (!(y->red)) RBDeleteFixUp(tree,x);
free(y);
}
#ifdef DEBUG_ASSERT
Assert(!tree->nil->red,"nil not black in RBDelete");
#endif
void RBDeleteFixUp(rb_red_blk_tree* tree, rb_red_blk_node* x) {
rb_red_blk_node* root=tree->root->left;
rb_red_blk_node* w;
while( (!x->red) && (root != x)) {
if (x == x->parent->left) {
w=x->parent->right;
if (w->red) {
w->red=0;
x->parent->red=1;
LeftRotate(tree,x->parent);
w=x->parent->right;
}
if ( (!w->right->red) && (!w->left->red) ) {
w->red=1;
x=x->parent;
} else {
if (!w->right->red) {
w->left->red=0;
w->red=1;
RightRotate(tree,w);
w=x->parent->right;
}
w->red=x->parent->red;
x->parent->red=0;
w->right->red=0;
LeftRotate(tree,x->parent);
x=root; /* this is to exit while loop */
}
} else { /* the code below is has left and right switched from above */
w=x->parent->left;
if (w->red) {
w->red=0;
x->parent->red=1;
RightRotate(tree,x->parent);
w=x->parent->left;
}
if ( (!w->right->red) && (!w->left->red) ) {
w->red=1;
x=x->parent;
} else {
if (!w->left->red) {
w->right->red=0;
w->red=1;
LeftRotate(tree,w);
w=x->parent->left;
}
w->red=x->parent->red;
x->parent->red=0;
w->left->red=0;
RightRotate(tree,x->parent);
x=root; /* this is to exit while loop */
}
}
}
x->red=0;
#ifdef DEBUG_ASSERT
Imperative
remove
140 lines of tricky C,
reusing existing functions
Unreadable without pictures
(from Emin Martinian)
Key Question
When will such declarative programming work?
Which constraints can we solve predictably?
( ? ).find
Need algorithms for solving constraints over
integers, lists, trees, sets, maps, multisets, …
Executing Constraints: Then and Now
Long tradition
– Prolog (inspired by Robinson’s resolution theorem proving)
– CLP(R): added linear constraints
– Functional Logic Programming (e.g.Curry): narrowing technique
then came SAT and Kodkod
– Aleksandar Milicevic, Derek Rayside, Kuat Yessenov, Daniel Jackson:
Unifying execution of imperative and declarative code. ICSE 2011
– Hesam Samimi, Ei Darli Aung, Todd D. Millstein: Falling Back on
Executable Specifications. ECOOP 2010
and SMT - Satisfiability Modulo Theories
Implicit Programming : SMT = Prolog : Resolution
SMT = SAT + Cooperating Decision Procedures
SAT solver
cleverly
SMT
enumerates conjunctions
x < y+1 && y < x+1 && x’=f(x) && y’=f(y) && x’=y’+1
linear programing
solver
x < y+1
y < x+1
x’=y’+1
congruence closure
implementation
x=y
x’=f(x)
y’=f(y)
x’=y’
0=1
e.g. Z3 theorem prover (N. Bjorner, L. de Moura)
Leon: Constraint Solver of Kaplan
=
+
recursive
function
definitions
• Supports recursive functions and data-types.
• Originally developed for program verification with
counter-examples. [SAS 2011]
• Algorithm is a semi-decision procedure; always
terminates when there is a counter-example.
• Works as a decision procedure for a well-defined
class of functions. [POPL 2010]
– tree fold functions f : Tree  Set such that |f-1(x)| is large enough
“SMT modulo Recursive Functions”
Algorithm to find solution for constraint with recursive
functions:
– Disable recursive call branches. If z3-sat, report SAT
– Enable recursive branches, replace the result with fresh
constant;
if z3-unsat, report UNSAT
– If none gives answer, replace call with function body (unroll)
and also add function contracts (k-induction)
Results for Computing with Constraints
send + more = money
1.17 s
All red-black trees up to size 7: 27.45 s (comparable to JPF)
last list element:
Note: domains
not bounded
upfront
Example Results for Verification
Unrolling depth
Same property also esnures that enumeration often terminates.
http://lara.epfl.ch/leon/
Implicit Programming Agenda
requirements
Advancing
3) Development within an IDE
def f(x : Int) = {
y=2*x+1
}
2) Compilation and static checking
iload_0
iconst_1
iadd
1) Execution on a (virtual) machine

42
Automated reasoning is key enabling technology
Synthesis for Arithmetic
def secondsToTime(totalSeconds: Int) : (Int, Int, Int) =
((h: Int, m: Int, s: Int) ⇒ (
h * 3600 + m * 60 + s == totalSeconds
&& h ≥ 0
&& m ≥ 0 && m < 60
&& s ≥ 0 && s < 60)).find
def secondsToTime(totalSeconds: Int) : (Int, Int, Int) =
val t1 =
val t2 =
val t3 =
val t4 =
Some(t1, t3, t4)
?
Choose Notation
def secondsToTime(totalSeconds: Int) : (Int, Int, Int) =
choose((h: Int, m: Int, s: Int) ⇒ (
h * 3600 + m * 60 + s == totalSeconds
&& h ≥ 0
&& m ≥ 0 && m < 60
&& s ≥ 0 && s < 60))
def secondsToTime(totalSeconds: Int) : (Int, Int, Int) =
val t1 =
val t2 =
val t3 =
val t4 =
Some(t1, t3, t4)
?
Starting point: quantifier elimination
(QE)
• A specification statement of the form
r = choose(x ⇒ F( a, x ))
“let r be x such that F(a, x) holds”
• Corresponds to constructively solving the
quantifier elimination (QE) problem
∃ x . F( a, x )
where a are parameters
• Witness terms from QE are the synthesized code
Methodology QE  Synthesis
Quantifier Elimination:
 x. S(x,a)  P(a)
Synthesis:
 x. S(x,a)  S(t(a),a)
• For all QE procedures we examined, we were
able to find the corresponding witness terms
• One-point rule immediately gives a term
x. (x = t(a) && S(x,a))  S(t(a),a)
Example for other domains:
 x. (a1 < x & x < a2)
 a1 < a1 + 1 & a1 + 1 < a2
t(a1,a2)=a1+1
Synthesis Procedure: Equalities
Process equalities first:
• compute parametric description of solution set
• replace n variables with n-1
h * 3600 + m * 60 + s == totalSeconds

s = totalSeconds - h * 3600 - m * 60
In general we obtain divisibility constraints
– use Extended Euclid’s Algorithm,
matrix pseudo inverse in Z
Synthesis Procedure: Inequalities
• Solve for one by one variable:
– separate inequalities depending on polarity of x:
Ai ≤ αix
βjx ≤ Bj
– define terms a = maxi⌈Ai/αi⌉ and b = minj⌈Bj/ βj⌉
• If b is defined, return x = b else return x = a
• Further continue with the conjunction of all
formulas ⌈Ai/αi⌉ ≤ ⌈Bj/ βj⌉
• Similar to Fourier-Motzkin elimination
(remove floor and ceiling using divisibility)
26
2y − b ≤ 3x + a ∧ 2x − a ≤ 4y + b
2y − b − a ≤ 3x ∧ 2x ≤ 4y + a + b
4y − 2b − 2a ≤ 6x ≤ 12y + 3a + 3b
(4y − 2b − 2a) / 6 ≤ x ≤ (12y + 3a + 3b) / 6
(4y − 2b − 2a) / 6 ≤ ⌊(12y + 3a + 3b) / 6⌋
two extra variables:
(4y − 2b − 2a) / 6 ≤ l ∧ 12y + 3a + 3b = 6 ∗ l + k
∧0≤k≤5
4y − 2b − 2a ≤ 6 * l ∧ 12y + 3a + 3b = 6 ∗ l + k
∧0≤k≤5
pre: 6|3a + 3b − k
4y − 2b − 2a ≤ 12y + 3a + 3b – k
− 5b − 5a +k ≤ 8y
 y = ⌈(k − 5a − 5b)/8⌉
27
Generated Code Contains Loops
val (x1, y1) = choose(x: Int, y: Int =>
2*y − b =< 3*x + a && 2*x − a =< 4*y + b)
val kFound = false
for k = 0 to 5 do {
val v1 = 3 * a + 3 * b − k
if (v1 mod 6 == 0) {
val alpha = ((k − 5 * a − 5 * b)/8).ceiling
val l = (v1 / 6) + 2 * alpha
val y = alpha
val kFound = true
break } }
if (kFound)
val x = ((4 * y + a + b)/2).floor
else throw new Exception(”No solution exists”)
28
Result of Synthesis
def secondsToTime(totalSeconds: Int) : (Int, Int, Int) =
choose((h: Int, m: Int, s: Int) ⇒ (
h * 3600 + m * 60 + s == totalSeconds
&& h ≥ 0
&& m ≥ 0 && m < 60
&& s ≥ 0 && s < 60 ))
def secondsToTime(totalSeconds: Int) : (Int, Int, Int) =
val t1 = totalSeconds div 3600
val t2 = totalSeconds -3600 * t1
val t3 = t2 div 60
val t4 = totalSeconds - 3600 * t1 - 60 * t3
(t1, t3, t4)
Implemented as an extension of the Scala compiler.
Example: Date Conversion in C
Knowing number of days since 1980, find current year and day
BOOL ConvertDays(UINT32 days) {
year = 1980;
while (days > 365) {
if (IsLeapYear(year)) {
if (days > 366) {
days -= 366;
year += 1;
}
} else {
days -= 365;
year += 1;
} ...
}
Enter December 31, 2008
All music players (of a major brand)
froze in the boot sequence.
Implicit Programming for Date Conversion
Knowing number of days since 1980, find current year and day
val origin = 1980
@spec def leapsTill(y : Int) = (y-1)/4 - (y-1)/100 + (y-1)/400
let (year1, day1)=choose( (year:Int, day:Int) => {
days == (year-origin)*365 + leapsTill(year)-leapsTill(origin) + day &&
0 < day && day <= 366
print(year1, day1)
})
Analysis and termination simpler than with loop
Properties of Synthesis Algorithm
• For every formula in linear integer arithmetic
– synthesis algorithm terminates
– produces the most general precondition
(assertion saying when result exists)
– generated code gives correct values whenever
correct values exist
• If there are multiple or no solutions for some
parameters, we get a warning
Compile-time warnings
def secondsToTime(totalSeconds: Int) : (Int, Int, Int) =
choose((h: Int, m: Int, s: Int) ⇒ (
h * 3600 + m * 60 + s == totalSeconds
&& h ≥ 0 && h < 24
&& m ≥ 0 && m < 60
&& s ≥ 0 && s < 60
))
Warning: Synthesis predicate is not
satisfiable for variable assignment:
totalSeconds = 86400
Compile-time warnings
def secondsToTime(totalSeconds: Int) : (Int, Int, Int) =
choose((h: Int, m: Int, s: Int) ⇒ (
h * 3600 + m * 60 + s == totalSeconds
&& h ≥ 0
&& m ≥ 0 && m ≤ 60
&& s ≥ 0 && s < 60
))
Warning: Synthesis predicate has multiple
solutions for variable assignment:
totalSeconds = 60
Solution 1: h = 0, m = 0, s = 60
Solution 2: h = 0, m = 1, s = 0
Arithmetic pattern matching
def fastExponentiation(base: Int, power: Int) : Int = {
def fp(m: Int, b: Int, i: Int): Int = i match {
case 0 ⇒ m
case 2 * j ⇒ fp(m, b*b, j)
case 2 * j + 1 ⇒ fp(m*b, b*b, j)
}
fp(1, base, p)
}
• Goes beyond Haskell’s (n+k) patterns
• Compiler checks that all patterns are reachable
and whether the matching is exhaustive
Synthesis for parametrized arithmetic
def decomposeOffset(offset: Int, dimension: Int) : (Int, Int) =
choose((x: Int, y: Int) ⇒ (
offset == x + dimension * y && 0 ≤ x && x < dimension
))
• The predicate becomes linear at run-time
• Synthesized program must do case analysis on
the sign of the input variables
• Some coefficients are computed at run-time
Synthesis for (multi)sets (BAPA)
def splitBalanced[T](s: Set[T]) : (Set[T], Set[T]) =
choose((a: Set[T], b: Set[T]) ⇒ (
a union b == s && a intersect b == empty
&& a.size – b.size ≤ 1
&& b.size – a.size ≤ 1
))
def splitBalanced[T](s: Set[T]) : (Set[T], Set[T]) =
val k = ((s.size + 1)/2).floor
val t1 = k
s
val t2 = s.size – k
val s1 = take(t1, s)
val s2 = take(t2, s minus s1)
(s1, s2)
a
b
NP-Hard Constructs
• Divisibility combined with inequalities:
– corresponding to big disjunction in q.e. ,
we will generate a for loop with constant bounds
(could be expanded if we wish)
• Disjunctions
– Synthesis of a formula computes program and exact
precondition of when output exists
– Given disjunctive normal form, use preconditions
to generate if-then-else expressions (try one by one)
Synthesis for Disjunctions
More on this Synthesis Approach
• Software Synthesis Procedures,
Communications of the ACM, February 2012
• STTT 2012
• PLDI 2010
Ongoing work
– Use it to compile certain Kaplan specifications
– Synthesis for further theories
Implicit Programming Agenda
requirements
Advancing
3) • Development within an IDE
2) • Compilation and static checking

def f(x : Int) = {
y=2*x+1
}
iload_0
iconst_1
iadd
1) • Execution on a (virtual) machine

Automated reasoning is key enabling technology
42
Type-Driven Synthesis within an IDE
Type-Driven Synthesis within an IDE
Synthesis Approach Outline
…………………………
…………………………
…………………………
…………………………
…………………………
…………………………
…………………………
…………………………
……………
source
code in editor
Extract:
- visible symbols
- expected type
Program point
(cursor)
pre-computed
weights
Create type environment:
- encode subtypes
- assign initial weights
Search algorithm
with weights
(generates intermediate
expressions and their types)
Ranking
w/ Tihomir Gvero and Ruzica Piskac, CAV’11
5 suggested
expressions
Search Algorithm
• Semi-decidable problem (cf. Leon)
• Related to proof search in intuitionistic logic
• Weights are an important additional aspect:
– in our application we find many solutions quickly
– must select interesting ones
– interesting = small weight
– algorithm with weights preserves completeness
• Identified decidable (even polynomial) cases,
in the absence of generics and subtyping
Results for Expression Suggestion
Benchmark
Length #Initial
#Derived
#Snip.Gen.
Rank
Time [ms]
ByteArrayInputStreambytebufintoffsetintlength
4
22
4049
102
3
546
CharArrayReadercharbuf
3
26
782
343
1
546
HashSetiterator
2
60
1832
201
1
546
Hashtableelements
2
32
869
445
1
546
HashtableentrySet
2
31
874
441
1
546
HashtablekeySet
2
32
968
492
3
546
Hashtablekeys
2
30
818
477
2
515
PriorityQueuepoll
2
27
1208
363
1
562
Examples demonstrating API usage
Remove 1-2 lines; ask tool to synthesize it
Time budget for search: 0.5 seconds
Generates up to 1000s of well-formed and well-typed expressions
Displays top 5 of them
The one that was removed appeared as rank 1, 2, or 3
Similarly good behavior in over 50% of the 120 examples evaluated
Other Topics
• Specification decision procedures, including MAPA,
BAPA, extensions, term powers (Piskac, Yessenov)
• Synthesis based on automata (Jobstmann, Spielmann)
• Static analysis and verification
– Jahob (Rinard, Zee)
– Static analysis of PHP (Kneuss, Suter),
Scala (Kneuss,Suter), C (Vujosevic-Janicic)
– Predicate abstraction with interpolation
(Hojjat, Ruemmer, Iosif)
• Numerical computation analysis (Darulova)
• Collaborations on distributed systems (Kostic),
speculative linearizability (Guerraoui), testing (Marinov)
Implicit Programming, Explicit Design
Explicit = written down, machine readable
Implicit = omitted, to be (re)discovered
Current practice:
– explicit program code
– implicit design (key invariants, properties, contracts)
A lot of hard work exists on verification and analysis: checking design
against given code and recovering (reverse engineering) implicit invariants.
Our goal:
– explicit design
– implicit program
Total work of developer is not increased! Moreover:
– can be decreased for certain types of specifications
– confidence in correctness higher – program is spec
Conclusions
4) more human-oriented
programming, empowering
user to program
requirements
Advancing
3) Development within an IDE:
synthesize entire expressions
2) Compilation and static checking:
transform spec into program
1) Execution on a (virtual) machine:
use and extend SMT solvers
3)
choose x,y,z,n=>
xn + yn = zn
2)
SMT
solver
1)
http://lara.epfl.ch/~kuncak
42
Download