Storyboard Programming of Data Structure Manipulations
A picture is worth 20 lines of code
by
Rishabh Singh
B.Tech(H), Indian Institute of Technology Kharagpur (2008)
Submitted to the Department of Electrical Engineering and Computer Science
in partial fulfillment of the requirements for the degree of
Master of Science
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
June 2010
© Massachusetts Institute of Technology 2010. All rights reserved.
Author
Department of Electrical Engineering and Computer Science
May 7, 2010
Certified by
Armando Solar-Lezama
Assistant Professor
Thesis Supervisor
Accepted by
Terry Orlando
Chairman, Department Committee on Graduate Students
Storyboard Programming of Data Structure Manipulations
A picture is worth 20 lines of code
by
Rishabh Singh
Submitted to the Department of Electrical Engineering and Computer Science
on May 7, 2010, in partial fulfillment of the
requirements for the degree of
Master of Science
Abstract
We introduce Storyboard Programming, a new programming model that harnesses
the programmer's visual intuition about a problem to synthesize a correct implementation. The motivation for our technique comes from the domain of data-structure
manipulations. In this domain, programmers often think in terms of abstract graphical visualizations but have a hard time translating that intuition into low-level pointer
manipulating code. We aim to bridge this gap and show that it is possible to derive the low-level implementation automatically from the graphical specifications with
little additional input from the programmer.
The storyboard in our programming model consists of a series of scenarios which
show how the data-structure evolves under different conditions. We present two novel
algorithms to synthesize the code from the storyboards. The algorithms derive an abstract domain and a set of correctness conditions automatically from the storyboards.
The synthesizer uses the abstract domain to perform abstraction-guided combinatorial
synthesis. The resulting program is guaranteed to satisfy the correctness conditions
derived from the storyboard, and to conform to the high-level structure specified by
the programmer.
We have implemented our framework on top of the SKETCH system.
Our implementation is capable of synthesizing several interesting data-structure manipulations, such as insertion, deletion, rotation and reversal, over linked list and binary search tree data structures.
Thesis Supervisor: Armando Solar-Lezama
Title: Assistant Professor
Acknowledgments
I would first like to thank my advisor, Prof. Armando Solar-Lezama, for putting so much effort into making this thesis what it is. He has always been an inspiration and a great motivator in helping me think about the big picture of our research. I am also grateful for the endless times I went to his office with questions about Sketch encodings and my understanding of them.
Prof. Daniel Jackson greatly influenced my thinking about working on big, important ideas.
My undergraduate advisor, Prof. Andrey Rybalchenko, made me believe that everything in life is achievable; we just need to work that bit harder. In his words: Believe to Achieve. I have been trying to do that ever since.
I was fortunate to work during my internships with great researchers such as Dimitra Giannakopoulou, Corina Pasareanu and Tom Henzinger. They helped me at every point with all kinds of issues.
My fellow CSAIL graduate students Eunsuk Kang, Sasa Misailovic, Aleksandar Milicevic, Joseph P. Near, Jean Yang, and Kuat Yessenov have never failed to provide inspiring discussions and helpful feedback. Also, my roommates Saurav Bandopadhyay and Rahul Rithe never let me miss home too much.
I would like to thank my father; it is because of him and his wishes that I am where I am today. I would also like to thank my mother, my sister Richa and my brother Rohit for all the hardships they endured for me. Finally, I would like to specially thank Deeti for helping me so much at every point of my life.
Contents
1 Introduction
1.1 Bridging the gap between graphical intuition and implementation
1.2 The promise of graphics aided programming
1.3 The Storyboard Programming Framework
1.4 From verification to synthesis
1.5 Contributions
1.6 Organization of the thesis
2 Storyboard Programming Overview
2.1 Scan and Modify manipulations: linked list insertion
2.1.1 Storyboard
2.1.2 Control flow sketch
2.1.3 Derivation of the abstract domain
2.1.4 Translation to constraint equations
2.2 Handling abstract node updates: linked list reversal
2.2.1 Storyboard
2.2.2 Control flow sketch
2.2.3 Derivation of the abstract domain
2.2.4 Translation to constraint equations
2.3 Solving the data flow equations
2.3.1 EXPSYN algorithm
2.3.2 IMPSYN algorithm
3 Preliminaries
3.1 The SKETCH Synthesizer
3.1.1 Example sketch
4 Storyboard Language
4.1 Syntax
4.2 Semantics
5 EXPSYN Algorithm
5.1 Notations
5.2 Construction of uninterpreted transition functions
5.3 SKETCH encoding of data flow constraints
5.4 Correctness guarantees of the synthesized implementation
6 IMPSYN Algorithm
6.1 Symbolic representation of states and transition functions
6.2 SKETCH encoding and constraints
6.3 Verification engine
6.4 IMPSYN algorithm description
6.5 Correctness proof of IMPSYN algorithm
6.6 Optimizations
6.6.1 Synthesizing multiple program paths together
6.6.2 Merging conditional statements in path program traces
6.6.3 Concretizing the choices for loop exit constraint
6.7 Correctness guarantees of the synthesized implementation
7 Experiments
7.1 EXPSYN algorithm results
7.1.1 Insertion/Deletion in a sorted linked list
7.1.2 Insertion in a binary search tree
7.2 IMPSYN algorithm results
7.2.1 Left/Right rotation in binary search tree
7.2.2 In-place linked list reversal
8 Related Work
8.1 Software synthesis
8.2 Visual Programming and Programming by Example
9 Conclusions and Future Work
List of Figures
1-1 Binary Search Tree rotation
1-2 Binary Search Tree rotation code
2-1 Storyboard for linked list insertion
2-2 Control flow sketch for linked list insertion
2-3 Control flow graph for linked list insertion
2-4 A Scenario in the storyboard for linked list reverse manipulation
2-5 Unfold and fold operation on the mid abstract node
2-6 Control flow sketch for in-place linked list reversal
2-7 Control flow graph for in-place linked list reversal
2-8 The IMPSYN algorithm
3-1 A simple sketch toy program
3-2 Swapping of two integers without a temporary variable
4-1 Grammar for the Storyboard Language
4-2 (a) Scenario constraint graph for the example scenario Si and (b) an example state edge cover (abstract state) for Si
4-3 Semantics of the Storyboard language
6-1 The IMPSYN algorithm
7-1 Storyboard for linked list insertion
7-2 Control flow sketch for sorted linked list insertion manipulation
7-3 Synthesized implementation for sorted linked list insertion manipulation
7-4 Synthesized implementation for sorted linked list deletion manipulation
7-5 Storyboard for binary search tree insertion
7-6 Control flow sketch for binary search tree insertion manipulation
7-7 Synthesized implementation for binary search tree insertion manipulation
7-8 Storyboard for binary search tree left rotation
7-9 Control flow sketch for binary search tree left rotation manipulation
7-10 Synthesized implementation for binary search tree left rotate manipulation
7-11 Synthesized implementation for binary search tree right rotate manipulation
7-12 Storyboard for in-place linked list reversal
7-13 Control flow sketch for in-place linked list reversal manipulation
7-14 Synthesized implementation for in-place linked list reversal manipulation
List of Tables
7.1 Experimental results for EXPSYN algorithm
7.2 Experimental results for IMPSYN algorithm
Chapter 1
Introduction
Storyboard programming is a new programming paradigm that lets programmers automatically synthesize efficient low-level code from intuitive high-level graphical specifications. The motivation for the storyboard programming model comes from the domain of data structure manipulations. In this domain, programmers have a very clear picture of how a data structure evolves and gets modified during execution. This visual intuition must then be translated into an implementation of the desired manipulation. Correctly implementing such abstract graphical intuitions in low-level programming languages requires a lot of precise reasoning and is often troublesome. Storyboard programming aims to reduce this burden on programmers by bridging the gap between high-level, intuitive graphical specifications and the corresponding low-level pointer-manipulating implementation.
To guarantee that the synthesized implementation is correct for data structures of unbounded size, the synthesizer requires an abstract domain. The abstract domain over-approximates the infinite set of concrete configurations of a data structure with a finite set of abstract configurations. A key technique that storyboard programming enables is the automated derivation of the abstract domain from the graphical specifications.
1.1 Bridging the gap between graphical intuition and implementation
Consider the example of binary search tree rotation in figure 1-1. As the figure shows, it is quite natural and intuitive to express the rotation manipulation on binary search trees graphically, in terms of a starting configuration and an ending configuration of the tree. In fact, this is the standard way in which computer science undergraduates learn about data structure manipulations in an algorithms class. On the other hand, consider the corresponding low-level code for the rotation example [6] shown in figure 1-2. Even though figures 1-1 and 1-2 capture essentially the same notion of a tree rotation, the correspondence between the two is not straightforward. The implementation requires a precise sequence of pointer assignments to carry out the rotation. Such low-level implementations are non-intuitive and in some cases can get very complicated, e.g. in the case of red-black tree manipulations.
[Figure: two binary search tree configurations related by Right-Rotate(T, y) and Left-Rotate(T, x)]
Figure 1-1: Binary Search Tree rotation
1.2 The promise of graphics aided programming
The use of graphics to aid in programming, debugging and program understanding has historically been an intriguing prospect. Myers [17] presents a broad survey of various attempts at using high-level graphical descriptions to alleviate some of the complexity of programming. Most of those techniques can be broadly classified into three classes: visual programming, programming by example and batch/interactive programming. AMBIT/G [5] was one of the earliest efforts to represent data and programs as predefined pictures and to execute them using a pattern matching language. Grail [8] could compile flow chart programs directly to executable code. Some systems, such as Shaw's [21], could learn a restricted set of programs from input/output pairs. Program visualization systems [16] [4] were also developed to display runtime data structure information to help in debugging programs. Several other systems, such as Pygmalion [22] and Thinglab [3], were developed to help programmers define computations pictorially. Programming languages like Visual Basic allowed programmers to create GUI applications from general graphical components using drag-and-drop techniques. Most of these systems either require programmers to know how the implementation works (visual programming) or can only infer programs from a restricted class (programming by example). Despite such early enthusiasm, graphics-aided programming has had very limited impact for most programming purposes. Most previous attempts did not exploit the semantic information present in the pictures; rather, they used the pictures only syntactically, as a means of communication between the programmer and the machine. With the recent advances in verification and SAT technology, we aim to push graphics-aided programming to the next level so that programmers can benefit from the large amount of computing resources available today.
Algorithm 1 LeftRotate(Node root, Node x)
    y = x.right
    x.right = y.left
    if y.left != null then
        y.left.p = x
    end if
    y.p = x.p
    if x.p == null then
        root = y
    else
        if x == x.p.left then
            x.p.left = y
        else
            x.p.right = y
        end if
    end if
    y.left = x
    x.p = y
Figure 1-2: Binary Search Tree rotation code
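To make the pointer-level reasoning of Figure 1-2 concrete, the following is a minimal Python transcription of the LeftRotate pseudocode. The Node class, its field names (key, left, right, p) and the inorder helper are our assumptions. Because Python cannot reassign the caller's root variable, the sketch returns the (possibly new) root instead, and like the pseudocode it assumes that x.right is not null.

```python
class Node:
    """Hypothetical BST node with a parent pointer p, as used in Figure 1-2."""

    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right, self.p = key, left, right, None
        for child in (left, right):
            if child is not None:
                child.p = self


def left_rotate(root, x):
    """Rotate x down to the left; assumes x.right is not None. Returns the tree root."""
    y = x.right
    x.right = y.left              # y's left subtree becomes x's right subtree
    if y.left is not None:
        y.left.p = x
    y.p = x.p                     # y takes x's place under x's parent
    if x.p is None:
        root = y
    elif x is x.p.left:
        x.p.left = y
    else:
        x.p.right = y
    y.left = x                    # x becomes y's left child
    x.p = y
    return root


def inorder(n):
    """In-order key sequence; a rotation must leave it unchanged."""
    return [] if n is None else inorder(n.left) + [n.key] + inorder(n.right)


# Usage: rotate the root 2 of the tree 2(1, 4(3, 5)).
x = Node(2, Node(1), Node(4, Node(3), Node(5)))
root = left_rotate(x, x)
print(root.key, inorder(root))  # 4 [1, 2, 3, 4, 5]
```

Even in this short routine, seven pointer fields must be updated in a consistent order, which is exactly the kind of detail the storyboard is meant to hide.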
1.3 The Storyboard Programming Framework
Storyboard programming allows programmers to represent the various cases of a data structure modification graphically using storyboards. Each case consists of abstract input and output (and possibly intermediate) data structure configurations that ideally represent a distinct behaviour of the manipulation. The synthesized low-level implementation conforms to these graphical input/output specifications. These visual specifications are mapped to constraints in the storyboard language. In addition to the storyboard, the programmer also provides a sketch of the approximate control flow of the required implementation. This control flow sketch structures the search space for the correct implementation and prevents the synthesizer from producing unstructured implementations. The framework derives an abstract domain and a set of correctness conditions from the storyboards, which, in conjunction with the control flow sketch, are reduced to a set of equations in the intermediate SKETCH language. The solution to these equations found by the synthesizer is then mapped back to the desired low-level implementation corresponding to the storyboards.
1.4 From verification to synthesis
The framework synthesizes the desired implementation in two steps. In the first step, an abstract domain and a set of correctness constraints are automatically derived from the storyboard. The abstract domain is required to guarantee that the resulting implementation is correct for data structures of unbounded size. The correctness constraints correspond to the input, intermediate and output state constraints obtained from the graphical specifications.
In the second step, a set of data flow equations of the form X = f(X) is derived from the control flow sketch provided by the programmer. This approach of translating the program into a system of O(n) equations, where n is the number of program points, is very similar to the approach that maps a program into a monotone framework [18]. For the synthesis problem, the program statements are unknown but the sets of input and output states are known. The task of the synthesizer is to efficiently search for a set of program statements that conform to the data flow, the input/output constraints and possibly intermediate state constraints. In this research, we explore two novel ways to search for the unknown functions in these equations given the known parts of their solution. The first approach, EXPSYN, constructs the states and transition functions explicitly, whereas the second approach, IMPSYN, models them symbolically.
We evaluate the performance of the two algorithms on case studies involving manipulations such as insertion, deletion, rotation and reversal for the linked list and binary search tree data structures. Our first goal is to explore how far this framework can be used to synthesize well-known data structure algorithms. The longer-term goal of this research is to use this framework to synthesize implementations of manipulations over arbitrary user-defined data structures.
1.5 Contributions
This thesis makes the following key contributions:
* Development of storyboard programming as a new programming paradigm.
* Automated derivation of an abstract domain from the storyboards to provide unbounded correctness guarantees.
* Two novel synthesis algorithms, EXPSYN and IMPSYN, to solve the data flow equations.
* A storyboard language to formally specify the meaning of storyboards.
* Evaluation of the algorithms on a handful of data structure manipulations over linked lists and binary search trees.
1.6 Organization of the thesis
In chapter 2, we give an overview of the storyboard programming framework through two running examples. Chapter 3 describes preliminaries about the SKETCH synthesis system. Chapter 4 presents the syntax and semantics of our storyboard language. Chapters 5 and 6 describe the EXPSYN and IMPSYN algorithms in detail, respectively. Chapter 7 evaluates the two algorithms on case studies including various manipulations on linked list and binary search tree data structures. Chapter 8 describes the relationship between our work and work on visual programming, programming by example and recent software synthesis. Finally, Chapter 9 presents conclusions and directions for future work.
Chapter 2
Storyboard Programming Overview
This chapter presents a brief overview of the Storyboard programming framework.
We present two interesting classes of data structure manipulations through two representative examples: insertion in a sorted linked list and in-place linked list reversal. We briefly describe how the programmer specifies the storyboards and how the framework derives an abstract domain and correctness conditions for synthesizing the required implementation.
2.1 Scan and Modify manipulations: linked list insertion
We present the first flavour of storyboard programming with a simple sorted linked list insertion example. This example is representative of a class of data structure manipulations that first scan over the data structure and then perform a local modification to achieve the desired goal. The insertion operation must insert a node x into a sorted linked list l, which may be empty.
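For reference, the scan-then-modify shape of this class can be seen in a hand-written Python version of sorted insertion; this is not the code the synthesizer produces. The Node class and the from_list/to_list helpers are hypothetical, and since Python cannot update the caller's head variable, the function returns the (possibly new) head.

```python
class Node:
    """Hypothetical Python stand-in for the sketch's struct Node."""

    def __init__(self, value, nxt=None):
        self.value = value
        self.next = nxt


def insert(head, x):
    """Insert node x into the sorted list at head; return the (possibly new) head."""
    prev, curr = None, head
    # Scan phase: advance until curr is the first node b with b.value >= x.value.
    while curr is not None and curr.value < x.value:
        prev, curr = curr, curr.next
    # Modify phase: a local splice between prev (node a) and curr (node b).
    x.next = curr
    if prev is None:  # empty list, or x belongs at the front
        return x
    prev.next = x
    return head


def from_list(values):
    head = None
    for v in reversed(values):
        head = Node(v, head)
    return head


def to_list(head):
    out = []
    while head is not None:
        out.append(head.value)
        head = head.next
    return out


# One call per storyboard scenario: middle, beginning, end, empty list.
print(to_list(insert(from_list([1, 3, 5]), Node(4))))  # [1, 3, 4, 5]
print(to_list(insert(from_list([1, 3, 5]), Node(0))))  # [0, 1, 3, 5]
print(to_list(insert(from_list([1, 3, 5]), Node(9))))  # [1, 3, 5, 9]
print(to_list(insert(from_list([]), Node(7))))         # [7]
```

The loop only reads pointers of the nodes it passes over; all writes happen in the short splice at the end, around the concrete nodes a and b.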
2.1.1 Storyboard
As a first step, the programmer provides visual insights through the storyboard shown in figure 2-1. The storyboard consists of a set of abstract input/output pairs, possibly with some intermediate states. The storyboard in figure 2-1 consists of four different scenarios, where a scenario describes how a data structure evolves and gets modified under a certain condition. The four scenarios for the example respectively describe: inserting a node in the middle of the list, inserting a node at the beginning of the list, inserting a node at the end of the list and inserting into an empty list.
The insertion manipulation requires a node x to be inserted in the list between two nodes a and b (either of which may be absent) with the invariant a.val < x.val < b.val. Therefore only the concrete nodes a and b are relevant for inserting x at the correct position, and the other nodes are abstracted away. The programmer visualizes the sorted linked list in four parts: front, a, b and back. The node front represents the set of nodes in front of the node a, and back represents the set of nodes after the node b. We call such nodes, which represent a set of concrete nodes, abstract nodes. Nodes a, x and b, on the other hand, are concrete nodes.
[Figure: four scenarios (scenario1 to scenario4), each shown as a Start frame and an End frame over head, front, a, x, b and back, with the constraint a.val < x.val < b.val]
Figure 2-1: Storyboard for linked list insertion
2.1.2 Control flow sketch
In addition to the storyboards, the programmer provides a control flow sketch to structure the search space for the desired implementation. The control flow sketch for the linked list insertion example is shown in figure 2-2. First, the programmer declares the struct Node of the linked list. In the control flow sketch, LOC describes all possible expressions over nodes that can be obtained with at most one dereference of the next pointer. The programmer also defines possible choices for the conditionals COND that might be useful for the program. In many cases, these conditionals can be derived automatically from the different scenarios, as they essentially represent different cases of the program execution. An assignment statement is described as STMT: LOC = LOC and a conditional statement as CSTMT: if(COND) STMT. Note that STMT covers all possible assignment statements over variable expressions with at most one next dereference. (STMT)* denotes zero or more STMT statements.
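To get a feel for the size of this search space, the following Python sketch enumerates LOC and STMT for the variables of the insertion sketch. The variable list and the decision to drop trivial self-assignments such as x = x are our assumptions; the actual framework may include other expressions (null, for instance) in LOC.

```python
from itertools import product


def locations(variables):
    """LOC: every variable, plus every variable dereferenced once through next."""
    return list(variables) + [f"{v}.next" for v in variables]


def statements(variables):
    """STMT: every assignment LOC = LOC, skipping trivial self-assignments."""
    locs = locations(variables)
    return [f"{lhs} = {rhs}" for lhs, rhs in product(locs, locs) if lhs != rhs]


variables = ["head", "x", "prev", "curr"]
stmts = statements(variables)
print(len(locations(variables)), len(stmts))  # 8 56
# A block of k unknown statements has len(stmts) ** k candidate fillings.
print(len(stmts) ** 2)  # 3136
```

The count grows exponentially with the block length, which is one reason each block is bounded and the choices are handed to a SAT solver rather than enumerated.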
The control flow sketch in figure 2-2 describes first a sequence of assignment statements (STMT)*, then a while loop whose body consists of a sequence of assignment statements, and finally a sequence of conditional statements. Much of the control flow sketch could be generated automatically, but our current framework requires sketches to be provided manually. Ideally, the framework should only require some measure of the algorithmic complexity of the required implementation to construct the control flow sketch automatically. The control flow sketch is then translated into a system of equations, which are solved in an abstract setting.
In order to provide unbounded correctness guarantees for the synthesized implementation, an abstract domain is derived from the storyboards. The abstraction reduces the infinite set of inputs to a small finite number of abstract inputs, which makes the synthesis algorithm efficient and feasible.
    struct Node {
        int value;
        Node next;
        invariant next.value >= value;
    }

    void insert(Node head, Node x) {
        /* @Start */
        Node prev, curr;
        (STMT)*
        while (COND) {
            (STMT)*
        }
        (CSTMT)*
        /* @End */
    }
Figure 2-2: Control flow sketch for linked list insertion
2.1.3 Derivation of the abstract domain
The synthesizer derives an abstract domain automatically from the storyboard provided by the programmer. For each scenario, the system defines an abstract domain focusing on the concrete and abstract objects in that scenario. To illustrate the derivation, we consider the first scenario in figure 2-1, inserting a node in the middle of the list. The graphical specification in the figure can be translated into a concise text-based notation in the storyboard language. All the algorithms we present later are defined in terms of this notation, but we emphasize that the translation from graphical representations to the text-based language is straightforward. The textual description consists of two parts: the environment and the frames. The environment describes the set of objects involved in the scenario, whereas the frames describe the data structure at particular program points of the corresponding implementation.
For the first scenario in figure 2-1, the environment is described as follows:
Env{
    Node head, prev, curr;
    [Node] a, b, x;
    [[Node]] front, back;
    assert front.next = {front, a};
    assert back.next = {back, null};
}
The environment first declares a set of variables of type Node that the programmer thinks might be useful. The local variables from the control flow sketch are included in this set. The environment above declares three variables: head (from the storyboard), and prev and curr (from the control flow sketch). [Class] name* defines concrete objects of type Class, whereas [[Class]] name* defines abstract objects of type Class. An abstract object refers to a collection of concrete objects sharing some common property. The environment above declares three concrete nodes a, b and x and two abstract nodes front and back. The environment also defines the global connectedness properties of the abstract nodes; specifically, front.next points to both front and a, and back.next points to back and null. These definitions are part of the environment because they hold for all scenarios and are not modified for this class of examples.
After defining the environment, the next step is to define the frames. Frames describe the data structure at different points in the execution. For the first scenario, we have two frames: one corresponding to the beginning of the execution and the other to its end. A programmer could add more frames, for example to describe invariants about the shape of the data structure inside loops. The Start and End frames describe, in textual form, the pointer assignments in the corresponding graphical representations:
Start{
head = front; a.next = b; b.next = back;
}
End{
head = front; a.next = x; x.next = b; b.next = back;
}
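One way to see what a scenario demands of the implementation is to compare its frames. The small Python sketch below uses a hypothetical parse_frame helper that handles only the simple "lhs = rhs;" form shown above, and lists the pointers whose values differ between the Start and End frames; a pointer absent from a frame is treated as unconstrained (None).

```python
def parse_frame(text):
    """Parse 'lhs = rhs;' assignments of a frame into a {pointer: location} dict."""
    frame = {}
    for stmt in text.split(";"):
        if stmt.strip():
            lhs, rhs = (part.strip() for part in stmt.split("="))
            frame[lhs] = rhs
    return frame


start = parse_frame("head = front; a.next = b; b.next = back;")
end = parse_frame("head = front; a.next = x; x.next = b; b.next = back;")

# Pointers the implementation must change to turn Start into End.
changed = {p: (start.get(p), loc) for p, loc in end.items() if start.get(p) != loc}
print(changed)  # {'a.next': ('b', 'x'), 'x.next': (None, 'b')}
```

The two changed pointers, a.next and x.next, correspond to the local splice that the modification phase has to perform.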
We need to compile the description of storyboards in the above language into an abstract domain. We use predicate abstraction (powerset) [1] as our abstract domain. The predicates for our abstract domain are derived algorithmically from the storyboards themselves. A predicate corresponds to an assignment of a variable to some concrete location in the storyboard. The synthesizer considers all updatable pointers and variables. In this example, head, prev and curr (local variables) and a.next, x.next and b.next are the potentially updatable variables. There are six locations in the storyboard that they can point to: front, a, x, b, back and null.
An abstract state is defined as a set of predicates, i.e. it represents a valuation of the variables to the locations. An example abstract state for the above scenario is s = { head = front, prev = front, curr = b, a.next = b, b.next = back, x.next = null }. Since we have 6 updatable variables and 6 different locations, there are 6^6 = 46,656 possible abstract states, which is quite large. The lattice for our abstract domain consists of sets of such abstract states, ordered by the subset relation. We call such a set of abstract states an abstract set-state, to distinguish it from a single abstract state. An abstract set-state in this domain can be conveniently represented as a bitvector over states, where a set bit means that the corresponding state is reachable at a program location. The synthesis problem is finally reduced to the problem of satisfying a set of constraint equations. The bitvector representation lets us express the abstract interpretation [7] over these equations as a SAT problem.
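The bitvector view of set-states can be illustrated with a Python sketch in which an arbitrary-precision int stands in for the bitvector. The base-6 numbering of states is our assumption; any fixed bijection between abstract states and bit positions would serve.

```python
VARS = ["head", "prev", "curr", "a.next", "x.next", "b.next"]
LOCS = ["front", "a", "x", "b", "back", "null"]
NUM_STATES = len(LOCS) ** len(VARS)  # 6**6 = 46656 abstract states


def state_index(state):
    """Number a valuation {var: loc} as a base-6 integer in [0, NUM_STATES)."""
    idx = 0
    for v in VARS:
        idx = idx * len(LOCS) + LOCS.index(state[v])
    return idx


def set_state(states):
    """Abstract set-state as a bitvector: bit i is set iff state i is reachable."""
    bits = 0
    for s in states:
        bits |= 1 << state_index(s)
    return bits


def join(t1, t2):
    """Least upper bound in the powerset lattice: set union is bitwise OR."""
    return t1 | t2


def leq(t1, t2):
    """Lattice order: t1 is a subset of t2."""
    return t1 & ~t2 == 0


s = {"head": "front", "prev": "front", "curr": "b",
     "a.next": "b", "b.next": "back", "x.next": "null"}
s2 = dict(s, curr="a")
t, t2 = set_state([s]), set_state([s2])
print(NUM_STATES, leq(t, join(t, t2)), leq(join(t, t2), t))  # 46656 True False
```

With this encoding, union and inclusion become single bitwise operations, which is what makes the set-states easy to express as Boolean constraints.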
2.1.4 Translation to constraint equations
We first briefly describe the main ideas behind constraint-based abstract interpretation [9]. We then take a similar approach, using the abstract domain derived from the storyboards to generate constraints on the possible implementations; the SAT solver then uses these constraints to synthesize the desired implementation.
In order to perform abstract interpretation, we require the control flow graph (CFG) of the program. We obtain the CFG of the possible implementations from the control flow sketch provided by the programmer. Figure 2-3 shows the control flow graph for the example. In the graph, the blocks f1, f2 and f3 correspond to the missing blocks of code in the sketch provided by the programmer. The abstract set-states of the program at program points tin, t1, t2, t3 and tout are defined by the following equations:
$$ t_1 = f_1(t_{in}) \mid f_2(t_2) \tag{2.1} $$
$$ t_2 = f_c(t_1) \tag{2.2} $$
$$ t_3 = f_{\bar{c}}(t_1) \tag{2.3} $$
$$ t_{out} = f_3(t_3) \tag{2.4} $$
where each function fi maps a set of program states to the corresponding set of output program states.
Let us assume for a moment that the sketch is a complete program without any unknown statements. In this case, the functions f1, f2 and f3 are the transition functions for each block of code. After substituting tin and the functions fi into the above equations, the least fixpoint solution of the equations gives the most accurate abstraction of the reachable set of concrete states at every program point [12].
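To illustrate the fixpoint view when every block is known, the following Python sketch solves equations (2.1)-(2.4) by Kleene iteration for a toy instantiation of the sketch: f1 is curr = head, the loop condition is curr != b, f2 is curr = curr.next, and f3 does nothing. The toy tracks only the location of curr and uses a hypothetical next relation over front, a, b and back in which the summary nodes front and back may point to themselves; these choices are ours and are not the thesis's encoding.

```python
# Hypothetical abstract next relation for scenario 1; front and back are summary
# nodes, so front.next may be front or a, and back.next may be back or null.
NEXT = {"front": {"front", "a"}, "a": {"b"}, "b": {"back"},
        "back": {"back", "null"}, "null": set()}


def f1(t):       # curr = head  (head = front in every input state)
    return {"front"} if t else set()


def f2(t):       # curr = curr.next
    return set().union(*(NEXT[s] for s in t))


def fc(t):       # states where the loop condition curr != b holds
    return {s for s in t if s != "b"}


def f_not_c(t):  # states where it fails, i.e. the loop exits
    return {s for s in t if s == "b"}


def f3(t):       # empty exit block in this toy
    return set(t)


def solve(t_in):
    """Kleene iteration from the bottom element until (2.1)-(2.4) are all stable."""
    t1 = t2 = t3 = t_out = frozenset()
    while True:
        n1 = frozenset(f1(t_in) | f2(t2))
        n2, n3 = frozenset(fc(n1)), frozenset(f_not_c(n1))
        n_out = frozenset(f3(n3))
        if (n1, n2, n3, n_out) == (t1, t2, t3, t_out):
            return t1, t2, t3, t_out
        t1, t2, t3, t_out = n1, n2, n3, n_out


t1, t2, t3, t_out = solve({"start"})
print(sorted(t1), sorted(t_out))  # ['a', 'b', 'front'] ['b']
```

Although front summarizes any number of concrete nodes, the iteration stabilizes after a few rounds because the set of abstract states is finite and every transfer function is monotone.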
For our synthesis problem, both the set of input states tin and the corresponding set of output states tout are known, but the functions fi are all unknown. The synthesizer searches for an assignment of these functions that satisfies the above set of equations after substituting the appropriate values for tin and tout. From the sketch of the implementation in section 2.1.2, we can see that the possible choices for f1 and f2 range over a finite number of STMT functions. We assume that the function blocks have a bounded length, which can be varied algorithmically to discover the required bounds. The possible choices for f3 range over a finite set of conditional statements CSTMT, and the possible choices for fc range over a finite set of conditions COND. We encode these finite choices for every function fi with an additional control parameter c?? in a similar spirit to the SKETCH system [23]. The value of a control parameter represents the choice made by the synthesizer for its corresponding function. The data flow equations now become
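$$ t_1 = f_1(t_{in}, c_{??}) \mid f_2(t_2, c_{??}) \tag{2.5} $$
$$ t_2 = f_c(t_1, c_{??}) \tag{2.6} $$
$$ t_3 = f_{\bar{c}}(t_1, c_{??}) \tag{2.7} $$
$$ t_{out} = f_3(t_3, c_{??}) \tag{2.8} $$
[Figure: control flow graph of the insertion sketch]
Figure 2-3: Control flow graph for linked list insertion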
Finally, we need to encode the correctness condition, which comes from the storyboard scenarios. We impose the constraint that tin and tout are the sets of input states (Sin) and output states (Sout), respectively, as presented in the storyboard:
$$ t_{in} = S_{in} \tag{2.9} $$
$$ t_{out} = S_{out} \tag{2.10} $$
Any solution to the set of equations obtained from the data flow constraints and
the input/output constraints is a potential candidate for the required implementation.
We do not necessarily require the least fixed point solution to the above set of
equations, because we are not interested in the valuation of the intermediate set-states
$t_i$; we are only interested in the values of the control parameters c??. The values of
the control parameters are sufficient to fill the missing blocks of code with the
corresponding statement choices. Giving the above constraints to a SAT solver yields a
solution to our synthesis problem. The generated implementation is
verified with respect to the specifications provided in the storyboard.
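To make the encoding concrete, the sketch below brute-forces the control parameters of equations (2.5)-(2.8) over a tiny toy domain instead of calling a SAT solver. Everything about the domain is a hypothetical assumption. Abstract states 0 to 3 are cursor positions in a three-node list, with 3 standing for null. The STMT and COND candidates are three made-up statements and three made-up conditions. For simplicity, the sketch solves the loop equation by least-fixpoint iteration, although, as noted above, the actual encoding accepts any solution of the equations.

```python
from itertools import product

# Hypothetical toy domain: abstract states 0..3 are cursor positions in a
# three-node list, where state 3 means the cursor has reached null.
STMTS = {                      # candidate STMT choices: state -> successor states
    "stay": lambda s: {s},
    "advance": lambda s: {min(s + 1, 3)},
    "reset": lambda s: {0},
}
CONDS = {                      # candidate COND choices
    "lt3": lambda s: s < 3,
    "eq0": lambda s: s == 0,
    "true": lambda s: True,
}


def lift(stmt, t):
    """Lift a state -> set-of-states function to a set-state function."""
    out = set()
    for s in t:
        out |= stmt(s)
    return frozenset(out)


def run(c1, c2, c_cond, c3, t_in):
    """Solve t1 = f1(t_in) | f2(t2), t2 = fc(t1) by fixpoint iteration,
    then return t_out = f3(t3) where t3 = f_notc(t1)."""
    f1, f2, f3, fc = STMTS[c1], STMTS[c2], STMTS[c3], CONDS[c_cond]
    t1 = lift(f1, t_in)
    while True:
        t2 = frozenset(s for s in t1 if fc(s))        # loop condition holds
        new_t1 = lift(f1, t_in) | lift(f2, t2)
        if new_t1 == t1:
            break
        t1 = new_t1
    t3 = frozenset(s for s in t1 if not fc(s))        # loop exit edge
    return lift(f3, t3)


def synthesize(s_in, s_out):
    """Return every control-parameter assignment with t_in = S_in and t_out = S_out."""
    return [
        choice
        for choice in product(STMTS, STMTS, CONDS, STMTS)
        if run(*choice, frozenset(s_in)) == frozenset(s_out)
    ]


solutions = synthesize({0}, {3})
print(("stay", "advance", "lt3", "stay") in solutions)  # True
```

Several assignments satisfy the equations (for example, starting with advance instead of stay). This mirrors the observation that the constraints only need to admit some correct choice of the control parameters.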
This class of data structure manipulations does not require abstract node updates;
i.e. abstract nodes are only traversed in the scanning phase and are never modified in the modification phase.
On the other hand, some classes of data structure manipulations do require abstract
node updates to perform the desired operation. We describe this class of
manipulations with a representative example in the following section.
2.2
Handling abstract node updates: linked list reversal
Some data structure manipulations require modifying the pointers of abstract nodes in
the storyboard input/output specification. In-place linked list reversal is a representative example of this class of data structure manipulations. For such cases, we
need additional machinery in our framework to handle abstract node updates,
which is described briefly in this section.
2.2.1
Storyboard
As in the previous example, the programmer provides an intuitive graphical specification of the linked list reversal operation. The storyboard in Figure 2-4 consists of
four scenarios: reversing a list with more than two elements, reversing
a list with two elements, reversing a list with one element, and reversing an empty
list. Scenario 1 represents an abstract list with two concrete nodes a and b and one
abstract node mid.
Figure 2-4: A Scenario in the storyboard for linked list reverse manipulation (panels: Scenario 1, Scenario 2, Scenario 3, Scenario 4)
In addition to the storyboard, the programmer provides a fold and an unfold
operation on the abstract node mid, since the reversal operation requires modifying
the next pointer of the abstract node mid. The fold and unfold operations are
shown in Figure 2-5. The unfold operation nondeterministically expands
the abstract node mid either into a concrete node x whose next pointer points to the abstract node mid, or into a single
concrete node x. The inverse operation fold collapses either the pattern of a concrete
node x pointing to the abstract node mid, or a lone concrete node x, back into the abstract node mid.
Figure 2-5: Unfold and fold operation on the mid abstract node
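Here is a minimal sketch of the two operations. It assumes that a list shape is written as a Python list of node names and that the abstract node mid stands for a non-empty segment, so unfolding it always exposes at least the concrete node x. The names unfold and fold mirror the text, but the representation is only an illustration and not the framework's internal encoding.

```python
def unfold(shape, i):
    """Nondeterministically expand the abstract node at position i.

    Returns every possible result: x followed by the rest of the segment
    (x.next = mid), or x alone (the segment had exactly one node)."""
    if shape[i] != "mid":
        return [shape]                       # nothing to unfold
    return [
        shape[:i] + ["x", "mid"] + shape[i + 1:],
        shape[:i] + ["x"] + shape[i + 1:],
    ]


def fold(shape, i):
    """Collapse x -> mid, or a lone x, at position i back into mid."""
    if shape[i] != "x":
        return shape                         # nothing to fold
    if i + 1 < len(shape) and shape[i + 1] == "mid":
        return shape[:i] + ["mid"] + shape[i + 2:]
    return shape[:i] + ["mid"] + shape[i + 1:]


list_shape = ["a", "mid", "b"]               # Scenario 1: a -> mid -> b
for unfolded in unfold(list_shape, 1):
    print(unfolded, "->", fold(unfolded, 1))
# ['a', 'x', 'mid', 'b'] -> ['a', 'mid', 'b']
# ['a', 'x', 'b'] -> ['a', 'mid', 'b']
```

Both unfolded shapes fold back to the original shape, which is the inverse relationship described above.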
2.2.2
Control flow sketch
The control flow sketch for the in-place linked list reversal example is shown in Figure 2-6. First, the programmer describes the struct Node of the linked list. In the control flow
sketch, LOC describes all possible nodes that can be obtained with at most one
dereference of the next pointer. An assignment statement is described as STMT:
LOC = LOC, so STMT defines all possible assignment statements in which each side has
at most one next dereference. (STMT)* denotes zero or more
STMT statements. COND represents the choices of conditions, which are inferred
from the different scenarios.
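The size of the STMT choice space follows directly from this definition. The sketch below enumerates it under several illustrative assumptions. It uses the pointer variables head, temp1 and temp2 from Figure 2-6. It allows null as a right-hand side and drops trivial self-assignments. None of this is the exact generator used by the framework.

```python
from itertools import product

POINTER_VARS = ["head", "temp1", "temp2"]      # assumed from the sketch


def locs(variables):
    """LOC: every node reachable with at most one next dereference."""
    return list(variables) + [v + ".next" for v in variables]


def stmts(variables):
    """STMT: LOC = LOC, with null allowed on the right (an assumption)."""
    lhs = locs(variables)
    rhs = locs(variables) + ["null"]
    return [f"{l} = {r};" for l, r in product(lhs, rhs) if l != r]


choices = stmts(POINTER_VARS)
print(len(choices))          # 36 = 6 left-hand sides x 7 right-hand sides - 6 self-assignments
print(len(choices) ** 3)     # 46656 candidate blocks of exactly three statements
```

Because (STMT)* is bounded, the number of candidate blocks grows as 36^k in the block length k. This is one reason to keep the length bound as small as possible.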
The control flow sketch in Figure 2-6 describes a sequence of assignment statements
(STMT)* followed by a while loop whose body consists of a set of assignment
statements. At the beginning of the loop body, an unfold method call
unfolds its argument node if it currently points to the mid node. Similarly, the loop body
ends with a fold statement, which folds its argument node if it currently points to the
concrete node x. For this example, the framework only requires that the
algorithm have linear (O(n)) complexity.
2.2.3
Derivation of the abstract domain
The graphical specification from the scenario is translated into the text-based notation
of the storyboard language, which, as described before, consists of an environment and
frames. The environment for this example is described as follows:
Env{
Node head, temp1, temp2, temp3;
[Node] a, x, b;
[[Node]] mid, mid';
unfold mid x {true, x.next = mid};
fold x mid' {true};
}
Node{
    int value;
    Node next;
}
void insert(Node head, Node x){
    /*@Start*/
    Node temp1, temp2;
    (STMT)*
    while (COND){
        unfold (LOC);
        (STMT)*
        fold (LOC);
    }
    /*@End*/
}
Figure 2-6: Control flow sketch for in-place linked list reversal
Since the abstract node mid is modified during the reversal operation,
it gets unfolded and folded several times during the execution. The
programmer provides a bound on the number of times it is expected to be unfolded
before reaching convergence. In the example above, the concrete node x denotes
the node that appears as a result of the unfold operation. The environment describes four
updatable variables head, temp1, temp2 and temp3, and seven pointer locations a.next,
x.next, mid.next1, mid.next2, mid'.next1, mid'.next2 and b.next. The abstract nodes
mid and mid' have two next pointers, next1 and next2, as specified in the graphical
specification. The multiple next pointers of abstract nodes correspond to the nondeterministic choice made by the next dereference. For this example, we have 11
updatable variables and 4 concrete locations, which gives us $11^4$ abstract states.
Note that the next pointer definitions of the abstract nodes are missing
from the environment. This is because, unlike in the previous class of manipulations, they do not hold the same values
throughout the execution. The environment
also defines the fold and unfold operations. The unfold declaration specifies
that if its argument node matches node mid, then node mid should be replaced with
node x and, nondeterministically, either no new constraint is enforced (true) or the
constraint x.next = mid is added. Similarly, the fold declaration specifies that
if its argument node matches node x, then x should be replaced with node mid' with
no added constraints.
2.2.4
Translation to constraint equations
The algorithm first builds the control flow graph (CFG) from the control flow sketch
as before. The CFG for the example is shown in Figure 2-7. The functions $f_{fold}$
and $f_{unfold}$ take an additional node argument (LOC) as an input parameter, which is matched
against the node specified in the environment definition. The data flow equations are
then derived from the CFG as shown below:
Figure 2-7: Control flow graph for in-place linked list reversal
$$
\begin{aligned}
t_1 &= f_1(t_{in}) \mid f_{fold}(LOC, t_4) && (2.11)\\
t_2 &= f_c(t_1) && (2.12)\\
t_3 &= f_{unfold}(LOC, t_2) && (2.13)\\
t_4 &= f_2(t_3) && (2.14)\\
t_{out} &= f_{\bar{c}}(t_1) && (2.15)
\end{aligned}
$$
Each of the unknown functions $f_i$, $f_{fold}$ and $f_{unfold}$ can take one of the finitely many
possible choices described by the sketch. We augment the functions with the
control choice parameter c?? as before to obtain the following data flow equations:
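$$
\begin{aligned}
t_1 &= f_1(t_{in}, c_{??}) \mid f_{fold}(LOC, t_4, c_{??}) && (2.16)\\
t_2 &= f_c(t_1, c_{??}) && (2.17)\\
t_3 &= f_{unfold}(LOC, t_2, c_{??}) && (2.18)\\
t_4 &= f_2(t_3, c_{??}) && (2.19)\\
t_{out} &= f_{\bar{c}}(t_1, c_{??}) && (2.20)
\end{aligned}
$$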
The correctness conditions are also encoded as in the previous case:
$$
\begin{aligned}
t_{in} &= S_{in} && (2.21)\\
t_{out} &= S_{out} && (2.22)
\end{aligned}
$$
2.3
Solving the data flow equations
In this section we briefly describe two novel ways for solving the data flow equations
obtained from the storyboard to synthesize the desired implementation. The first
algorithm EXPSYN enumerates the abstract states and transition functions explicitly
whereas the second algorithm IMPSYN represents the abstract states and transition
functions symbolically.
2.3.1
EXPSYN algorithm
The set of data flow equations is singly existentially quantified, and the algorithm
solves it by translating it into a SAT problem. The
algorithm first enumerates all the abstract states (S) and constructs the transition
functions mapping an abstract set-state to another abstract set-state. These transition functions can be nondeterministic due to the abstraction involved. The abstract
set-state $t_i$ at program location i is represented as a bitvector of size |S|, whose
jth bit is set if and only if the state $s_j$ is reachable at program
location i. The transition functions map a bitvector to another bitvector. The input
and output state bitvectors $t_{in}$ and $t_{out}$ are derived from the storyboard. The
resulting constraint is solved by an off-the-shelf SAT solver. This approach might not
scale well, as the encoding is doubly exponential in the
number of abstract states (|S|). In our experience, however, this simple algorithm is
still useful for simple manipulations.
2.3.2
IMPSYN algorithm
The IMPSYN algorithm represents large state spaces symbolically instead of enumerating
them explicitly. This avoids computing a large number of
unreachable states, since the reachable set of abstract states
is often considerably smaller than the full state space. This shifts the burden of symbolically computing
the transition functions to the solver, but in turn gives us the ability to
handle very large state spaces. The algorithm trades some time complexity for
memory complexity, as its memory requirements are much smaller in comparison.
The algorithm divides the task of solving the constraint equations into two phases,
a verification phase and a synthesis phase, as shown in Figure 2-8, in the same spirit
as CEGIS [23]. Instead of counterexample inputs, however, the algorithm uses
counterexample paths for inductive synthesis. The nondeterminism in the execution
comes from the presence of abstract nodes. The Verifier in the verification phase
is an abstract interpreter that searches the abstract state space to verify that
only correct final states are reachable at the return location. If not, it returns a
deterministic counterexample path to the synthesis phase. The synthesis phase
only uses these deterministic counterexample program paths as constraints and
therefore only needs to maintain one abstract state at each program point, in contrast
to maintaining set-states as before. These paths are also solved incrementally, thereby
reducing the memory and time overhead of solving them all at once. This loop between the
synthesis and verification phases continues until the Verifier proves that the synthesized
program works correctly.
Figure 2-8: The IMPSYN algorithm
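A toy loop can illustrate how the two phases interact. The sketch below is a hypothetical miniature of counterexample-trace-guided synthesis, not the IMPSYN implementation. The program has a single integer hole c, and one nondeterministic step stands in for the choice introduced by an abstract node. The verifier enumerates every choice path by depth-first search, while the synthesizer only needs to satisfy the recorded deterministic paths.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical program with one integer hole c. Step 1 stands in for the
# nondeterminism of an abstract node (unfolded or not); step 2 is the
# statement whose hole the synthesizer must fill.
PROGRAM: List[Callable[[int, int], List[int]]] = [
    lambda s, c: [s, s + 1],
    lambda s, c: [s + c],
]
INIT = 0


def is_good(s: int) -> bool:
    """Specification: only these final states are correct."""
    return s in (2, 3)


def run_path(c: int, path: List[int]) -> int:
    """Execute deterministically, resolving each step by the recorded choice."""
    s = INIT
    for step, choice in zip(PROGRAM, path):
        s = step(s, c)[choice]
    return s


def verify(c: int) -> Optional[List[int]]:
    """Explore all choice paths; return one that ends in a bad state, if any."""
    def dfs(i: int, s: int, path: List[int]) -> Optional[List[int]]:
        if i == len(PROGRAM):
            return None if is_good(s) else path
        for choice, nxt in enumerate(PROGRAM[i](s, c)):
            bad = dfs(i + 1, nxt, path + [choice])
            if bad is not None:
                return bad
        return None
    return dfs(0, INIT, [])


def synthesize(paths: List[List[int]], candidates) -> Optional[int]:
    """Return the first hole value that is correct on every recorded path."""
    for c in candidates:
        if all(is_good(run_path(c, p)) for p in paths):
            return c
    return None


def cetgis(candidates=range(5)) -> Tuple[Optional[int], List[List[int]]]:
    paths: List[List[int]] = []
    while True:
        c = synthesize(paths, candidates)
        if c is None:
            return None, paths            # no hole value fits the paths
        cex = verify(c)
        if cex is None:
            return c, paths               # verifier proves c correct
        paths.append(cex)                 # learn from the failing path


print(cetgis())  # (2, [[0, 0]])
```

Only one counterexample path is needed here. The synthesizer first proposes c = 0, and the verifier reports the path that takes choice 0 at both steps. The value c = 2 is then correct on every path.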
Chapter 3
Preliminaries
This chapter presents a brief description of the SKETCH synthesis system, which
is useful for understanding the later chapters on the EXPSYN and IMPSYN algorithms.
3.1
The SKETCH Synthesizer
The SKETCH synthesis framework was originally applied to bit-stream manipulations [26] and has recently been applied to scientific programs [24] and concurrent
data structures [25]. The SKETCH system is pushing the frontier towards making
automated program synthesis practical and useful for general programming
purposes by harnessing programmer insights about the problems.
The input language for SKETCH is a subset of C language with built-in support
for structs, integer and bitvector arrays, assert statements and some special sketching constructs which are briefly described below. It requires a sketch of the partial
program and its specification either in the form of some inefficient implementation or
in the form of some assert statements in the sketch itself. The sketch allows the programmer to provide valuable insights to guide the search for synthesizing the correct
implementation. The SKETCH system employs a Counterexample guided inductive
synthesis algorithm (CEGIS) which iteratively adds only the interesting inputs for
the program to be synthesized and ignores the ones which do not introduce any new
behaviours.
3.1.1
Example sketch
We use a very simple sketch of a toy program in Figure 3-1 to describe various useful
features of the SKETCH language. The implements keyword specifies the specification
implementation for the sketch program. The specification describes a function f(x) =
x + x = 2 * x. The sketch describes a function g(x) = x * ??, where ?? (an integer hole)
is a special keyword in the SKETCH language that can be filled with an integer value
(up to a bounded size). For this example, the integer hole gets filled with the value
2.
int spec(int x){
    return x + x;
}
int sketch(int x) implements spec{
    return x * ??;
}
Figure 3-1: A simple sketch toy program
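This toy sketch can illustrate the CEGIS loop mentioned above. The following is a hypothetical plain-Python rendering and not how the SKETCH solver works internally, since SKETCH uses SAT-based synthesis and verification. The synthesis step searches 4-bit hole values that agree with the spec on the inputs collected so far. The verification step checks a small bounded input range for a counterexample.

```python
def spec(x):
    return x + x


def sketch(x, hole):
    return x * hole


def cegis(hole_bits=4, inputs=range(-8, 8)):
    """Counterexample-guided search for the value of the integer hole."""
    examples = [0]                              # arbitrary first input
    while True:
        # Synthesis: first hole value consistent with all collected inputs.
        hole = next((h for h in range(2 ** hole_bits)
                     if all(sketch(x, h) == spec(x) for x in examples)), None)
        if hole is None:
            return None
        # Verification: look for an input on which the candidate is wrong.
        cex = next((x for x in inputs if sketch(x, hole) != spec(x)), None)
        if cex is None:
            return hole
        examples.append(cex)                    # keep only the interesting input


print(cegis())  # 2
```

The loop needs only two inputs. Input 0 admits the hole value 0, and the counterexample -8 then rules out everything except 2.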
Another important construct provided by the SKETCH language is the choice construct, which tells the synthesizer about the possible choices for a given location in
the sketch program. The choice construct is specified with the {| and |} symbols. Any
regular expression over the possible choices can be written inside the choice construct. Consider the problem of swapping two integer values x and y without using
a temporary variable. Figure 3-2 shows a sketch for this problem that uses the choice
construct.
int[2] swap(int x, int y){
    int[2] out;
    int temp;
    temp = x;
    x = y;
    y = temp;
    out[0] = x;
    out[1] = y;
    return out;
}

#define VAR {| x | y |}

int[2] sketch(int x, int y) implements swap{
    int[2] out;
    repeat (??){
        {| VAR = VAR + VAR | VAR = VAR - VAR |};
    }
    out[0] = x;
    out[1] = y;
    return out;
}
Figure 3-2: Swapping of two integers without a temporary variable
The macro VAR denotes the choice of variable, which can take the value x or y.
The construct repeat(n) is a special construct in SKETCH that lets the synthesizer
repeat the block of statements inside its body n times. In the sketch,
the argument is specified as an integer hole, and the synthesizer comes up with the
minimum integer value sufficient for synthesizing the sketch. The body of the repeat
construct contains another choice construct for the statement, which can be either VAR
= VAR + VAR or VAR = VAR - VAR. The synthesizer comes up with the following
sequence of operations
x = x - y;    (3.1)
y = x + y;    (3.2)
x = y - x;    (3.3)
which can be verified to correctly swap the two integer variables x and y. A
detailed tutorial and a comprehensive set of examples for using the SKETCH system
can be found on the SKETCH homepage.
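A brute-force search can mimic what the synthesizer does for Figure 3-2. This hypothetical Python sketch enumerates every sequence of statements of the form VAR = VAR + VAR or VAR = VAR - VAR, trying repeat counts in increasing order. It accepts the first sequence that swaps x and y on a few test inputs. SKETCH itself solves this problem symbolically rather than by enumeration.

```python
from itertools import product

VARS = ("x", "y")
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b}
# All 16 statements of the form VAR = VAR op VAR, as (lhs, left, op, right).
STATEMENTS = list(product(VARS, VARS, OPS, VARS))
TESTS = [(3, 7), (-2, 5), (10, -4)]


def execute(program, x, y):
    env = {"x": x, "y": y}
    for lhs, left, op, right in program:
        env[lhs] = OPS[op](env[left], env[right])
    return env["x"], env["y"]


def synthesize_swap(max_repeat=4):
    """Smallest repeat count first, mirroring the minimal value chosen for ??."""
    for n in range(max_repeat + 1):
        for program in product(STATEMENTS, repeat=n):
            if all(execute(program, x, y) == (y, x) for x, y in TESTS):
                return n, program
    return None


n, program = synthesize_swap()
print(n)  # 3
paper_solution = [("x", "x", "-", "y"), ("y", "x", "+", "y"), ("x", "y", "-", "x")]
print(execute(paper_solution, 3, 7))  # (7, 3)
```

Every candidate computes a linear function of x and y. Passing the two linearly independent tests (3, 7) and (-2, 5) therefore already implies a correct swap on all inputs. The sequence found first may differ from (3.1)-(3.3), but it also swaps x and y.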
Chapter 4
Storyboard Language
Storyboards need well-defined semantics to serve as an effective
means of communication between the programmer and the synthesizer. In this chapter, we provide a detailed description of the syntax of the storyboard language and
present the semantics of the language that we use to derive the specifications and
abstract domains from the storyboard efficiently.
4.1
Syntax
SBoard         ::= Scenario+
Scenario       ::= SLabel Env Frame+
Frame          ::= FLabel Constraint
Constraint     ::= Pred;*
Pred           ::= VarName = loc | true | false
Env            ::= Decl* Def;* FuncDecl*
Decl           ::= Type loc+
Def            ::= assert VarName == loc
                 | assert VarName == {loc1, loc2, ..., locn}
FuncDecl       ::= FoldFuncDecl | UnfoldFuncDecl
FoldFuncDecl   ::= fold VarName VarName Constraint
UnfoldFuncDecl ::= unfold VarName VarName Constraint
Type           ::= Class | [Class] | [[Class]]
VarName        ::= var | object.field
Figure 4-1: Grammar for the Storyboard Language
Figure 4-1 presents the grammar of the storyboard language. We will use the
storyboard language description of the linked list insertion example in section 2.1.3
to explain the syntax of the storyboard language. A storyboard SBoard consists of
a set of one or more scenarios Scenario. In the example, there are 4 scenarios in the
storyboard. Every scenario consists of a scenario label SLabel, an environment Env
and a set of one or more frames Frame. Let us consider the scenario S1 of inserting
an element in the middle of a sorted linked list. The environment for S1 is described
as:
Env{
Node head, prev, curr;
[Node] a, b, x;
[[Node]] front, back;
assert front.next == {front, a};
assert back.next == {back, null};
}
An environment Env consists of a series of variable declarations Decl, a set of
global definitions Def and a set of function declarations FuncDecl. The declarations
define class variables (Class), concrete objects ([Class]) and abstract objects ([[Class]])
of type Class. The environment of S1 declares head, prev and curr as variables of the Node
class; a, b and x as concrete objects; and front and back as abstract objects of type
Node. The declarations are followed by a series of global definitions, introduced by the keyword
assert, which hold for all scenarios. The environment for S1 defines the next pointer
locations for the abstract nodes front and back. The definition front.next == {front,
a} states that the next pointer of the abstract node front can point either to front
itself or to the concrete object a. The function declarations define the fold and unfold
operations over the abstract nodes. A function declaration starts with the keyword
fold (or unfold), followed by two variable names var and var' and a set of
constraints C. This syntax specifies that when the argument to the fold (or unfold)
method matches node var, it should be replaced with node var', and any one of the
constraints in C can be asserted nondeterministically.
After the environment declaration, a scenario defines a series of one or more frames
Frame. Each frame consists of a frame label FLabel, which matches the label of a
control location in the control flow sketch of the program. The frame also contains a
frame constraint Constraint, which represents the constraint satisfied by the abstract
state at the corresponding program control location in the scenario. The frames for
the example scenario S1 are described as:
Start {
head = front;
a.next = b; b.next = back;
}
End{
head = front; a.next = x; x.next = b; b.next = back;
}
The frame labels Start and End correspond to the starting and return program
locations in the control flow sketch of the program. The frame constraint consists
of a sequence of predicates of the form Expr = name, whose conjunction is
expected to be true at the corresponding program location. In the example, the
frame Start states that the conjunction of head = front, a.next
= b and b.next = back holds at the start location of the program in this scenario. It
is worth noting that the language allows frames to be specified at any arbitrary
control location of the program. This lets programmers express additional
insights about invariants of the data structure manipulation, which can
help scale the synthesis process.
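A frame constraint is just a conjunction of equality predicates, so extracting the (variable, location) pairs that the semantics below works with is straightforward. The parser here is a hypothetical helper and not part of the storyboard framework. It handles only the VarName = loc form of Pred and rejects anything else, including the true and false predicates of the grammar.

```python
import re

# VarName = loc, where VarName is var or object.field (primes allowed, as in mid').
PRED = re.compile(r"^\s*([A-Za-z_][\w']*(?:\.[A-Za-z_]\w*)?)\s*=\s*([A-Za-z_][\w']*)\s*$")


def parse_frame(text):
    """Parse 'Label{ p1; p2; ... }' into (label, [(var, loc), ...])."""
    label, body = text.split("{", 1)
    body = body.rsplit("}", 1)[0]
    preds = []
    for part in body.split(";"):
        if not part.strip():
            continue                          # empty piece after the last ';'
        match = PRED.match(part)
        if match is None:
            raise ValueError(f"unsupported predicate: {part.strip()!r}")
        preds.append((match.group(1), match.group(2)))
    return label.strip(), preds


print(parse_frame("End{ head = front; a.next = x; x.next = b; b.next = back; }"))
# ('End', [('head', 'front'), ('a.next', 'x'), ('x.next', 'b'), ('b.next', 'back')])
```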
4.2
Semantics
This section describes how our framework derives the essential abstract domain automatically from the storyboards. We define a function $Scenario_i \to ABS_i$ that
derives an abstract domain $ABS_i$ from a scenario $Scenario_i$. The abstract domain for the storyboard, $ABS_{SB}$, is constructed from the abstract domains of the
constituent scenarios, $ABS_{SB} = \{ABS_i \mid Scenario_i \to ABS_i\}$, with one abstract
domain per scenario. The abstraction for each scenario is defined as a four-tuple
$ABS_i = \{Preds, \{S\}, L \to \{S\}, Def\}$. In this definition, Preds is the set of predicates inferred from the environment (Env) and the frame constraints. S is the set
of abstract states; each abstract state corresponds to a valuation of the variables and
object fields in the environment. Each state is associated with the set of predicates
that are true in that state, $S \subseteq \mathcal{P}(Preds)$. Finally, $L \to \{S\}$ represents the mapping
from the labels L to their corresponding sets of states, and Def represents the
definitions from the environment.
We need some additional definitions to describe the automated derivation of the abstract domain
from the storyboards. Each scenario in the storyboard is modelled as
a bipartite graph G = (V, E), called the scenario constraint graph, where the two
partitions of the vertices correspond to the set of updatable variables ($V_V$) and the
set of concrete and abstract objects ($V_L$), respectively, in that scenario. We have
$V(G) = V_V \cup V_L$. For the example scenario S1, we construct the bipartite
graph shown in Figure 4-2(a). The set of updatable variables $V_V$ = {head, prev,
curr, a.next, b.next, x.next} and the set of abstract and concrete locations $V_L$ = {front,
a, x, b, back, null} constitute the vertex set V(G).
For every predicate of the form u = v in a frame constraint in that scenario,
we add an edge $e = (u, v) \in E(G)$ to the graph. For temporary variables from
the control flow sketch, edges to all the locations are added to E(G). This graph
captures all possible valuations of the variables to the concrete and abstract locations for
the corresponding scenario. In scenario S1, all the edges are included in E(G) for
the temporary variables prev and curr. In contrast, for other variables such as head,
only the edge (head, front) is added, as this is the only constraint on head present in the
frames of S1. Similarly, for a.next only the edges to the nodes b and x are present.
We define a state edge cover of this graph G to be a set of edges $E_{SEC} \subseteq E(G)$ such
that for every vertex $u \in V_V$, there exists an edge $(u, v) \in E_{SEC}$ for some $v \in V(G)$.
In other words,
$$
E_{SEC} \subseteq E(G) \;\text{ such that }\; \forall u \in V_V \;\exists v \in V(G) : (u, v) \in E_{SEC} \qquad (4.1)
$$
Furthermore, by restricting the size of the state edge cover set ($|E_{SEC}|$) to the number
of updatable variables ($|V_V|$), we guarantee that the edge cover set contains exactly one valuation of
every variable, i.e.
$$
\forall u \in V_V \;\exists!\, v \in V(G) : (u, v) \in E_{SEC} \qquad (4.2)
$$
A state edge cover set of size $|V_V|$ constitutes an abstract state for that scenario.
The edges in the state edge cover represent the valuations of the variables in that
particular state. Figure 4-2(b) presents an example state edge cover of G, which in
turn represents the abstract state s = { head = front, prev = front, curr = b, a.next =
b, b.next = back, x.next = null }.
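A short sketch can check the construction on scenario S1. It builds the constraint graph from the Start and End predicates and gives the temporaries prev and curr edges to every location. It then enumerates state edge covers by choosing exactly one outgoing edge per updatable variable. The sketch also makes one assumption that the text does not state: it adds the edge x.next = null (the valuation shown in Figure 4-2(b)), which is what makes the count match the 144 abstract states reported for this scenario.

```python
from itertools import product

LOCATIONS = ["front", "a", "x", "b", "back", "null"]
UPDATABLE = ["head", "prev", "curr", "a.next", "b.next", "x.next"]
TEMPORARIES = {"prev", "curr"}            # temporaries from the control flow sketch

# Predicates u = v from the Start and End frames of S1.
START = [("head", "front"), ("a.next", "b"), ("b.next", "back")]
END = [("head", "front"), ("a.next", "x"), ("x.next", "b"), ("b.next", "back")]
EXTRA = [("x.next", "null")]              # assumption: x is unlinked before insertion


def constraint_graph():
    """Map each updatable variable to the locations it has edges to."""
    edges = {u: set() for u in UPDATABLE}
    for u, v in START + END + EXTRA:
        edges[u].add(v)
    for t in TEMPORARIES:
        edges[t] = set(LOCATIONS)         # temporaries may point anywhere
    return edges


def state_edge_covers(edges):
    """Yield every choice of exactly one edge per variable (an abstract state)."""
    variables = sorted(edges)
    for picks in product(*(sorted(edges[u]) for u in variables)):
        yield dict(zip(variables, picks))


states = list(state_edge_covers(constraint_graph()))
print(len(states))                        # 144 = 1 * 6 * 6 * 2 * 1 * 2
print(len(LOCATIONS) ** len(UPDATABLE))   # 46656 without the optimization
```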
Figure 4-2: (a) Scenario constraint graph for the example scenario S1 and (b) an example state edge cover (abstract state) for S1
The state edge cover set computation in the scenario constraint graph performs
an important optimization: it merges states that are deemed unlikely to be
relevant for synthesizing the correct implementation. The idea is that we only need to
track the variable-location valuations that are present in the storyboards. For
the example scenario S1, a.next points to b in the Start frame and to x in
the End frame. Our abstraction assumes that only b and x are relevant locations
for a.next, so we only consider the two predicates a.next = b and a.next = x instead of
the 6 different predicates a.next = {front, a, x, b, back, null}. Note
that for temporary variables (from the control flow sketch) we still consider all 6 possible
valuations. After performing similar optimizations for the other updatable variables, we
get only 144 abstract states, considerably fewer than the $6^6 = 46656$ states of the unoptimized domain. This optimization
goes a long way in making the EXPSYN algorithm feasible. In a similar way, we derive
an abstract domain for each scenario of the storyboard.
The formal semantics of the Storyboard language for obtaining the abstract domains
is presented in Figure 4-3. A high-level description of the semantic rules follows:
• SB: The rule for storyboards constructs an abstract domain for a storyboard
by taking the union of the abstract domains of
the constituent scenarios.
• SCEN: The scenario rule constructs an abstract domain ABS for a scenario.
From the environment of the scenario, we derive the variable bindings and the
constituent definitions. From each frame $Frame_i$, using the variable bindings, we
construct a graph $G_i$ using the frame rule (FRAME). The graph G for the
scenario is then constructed by taking the union of the graphs $G_i$.
• FRAME: The frame rule returns a graph and the corresponding label from the
constraints.
• CONSTR: The constraint rule constructs a graph G = (V, E) from a set of
constituent predicates. Every predicate contributes an edge and two vertices to
the graph.
• PRED: The predicate rule converts a predicate of the form (u = v) into two
vertices u, v and an edge (u, v).
A storyboard abstraction $ABS_{SB}$ corresponds to a set of scenario abstractions
(SB). For a scenario Sc, its corresponding abstraction is defined as $ABS(Sc) = \{Preds(G), S(G), Label \to S(G), Def\}$.
For every frame $f_i \in Sc$, let $G_i$ denote its
constraint graph. The graph G for the scenario is constructed as the union of all frame
graphs, $G = \bigcup_i G_i$. Preds(G) is evaluated as the predicates corresponding to all the
edges in G. S(G) is the union of all possible state edge covers of G. $Label \to S(G)$
is evaluated as the union $\bigcup_i L_i \to S(G_i)$ over every frame i. Finally, the definitions Def
are evaluated from the environment of the scenario (SCEN). A predicate (u = v)
is converted to its corresponding edge (u, v) and vertices u and v (PRED). The
graph for a frame constraint is constructed from its constituent set of predicates
(CONSTR). Using this graph and the frame label, the frame graph is returned
(FRAME).
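The rules can be read almost directly as code. The sketch below is a hypothetical rendering under simplifying assumptions. The binding σ defaults to the identity, and graphs are (vertex set, edge set) pairs. The label map assigns each frame the states of S(G) that satisfy all of that frame's predicates, which is our reading of $Label_i \to S(G_i)$.

```python
from itertools import product


def pred_rule(sigma, u, v):
    """PRED: the predicate (u = v) becomes the edge (sigma(u), sigma(v))."""
    return sigma.get(u, u), sigma.get(v, v)


def constr_rule(sigma, preds):
    """CONSTR: a conjunction of predicates becomes a graph (V, E)."""
    edges = {pred_rule(sigma, u, v) for u, v in preds}
    vertices = {w for edge in edges for w in edge}
    return vertices, edges


def scen_rule(sigma, frames):
    """SCEN: union of the FRAME graphs; frames are (label, preds) pairs."""
    frame_graphs = {label: constr_rule(sigma, preds) for label, preds in frames}
    V = set().union(*(g[0] for g in frame_graphs.values()))
    E = set().union(*(g[1] for g in frame_graphs.values()))
    return (V, E), frame_graphs


def edge_covers(edges, variables):
    """S(G): pick exactly one outgoing edge for every updatable variable."""
    options = [sorted(v for u, v in edges if u == var) for var in variables]
    return [dict(zip(variables, pick)) for pick in product(*options)]


def label_map(states, frame_graphs):
    """Label_i -> states of S(G) that satisfy every predicate of frame i."""
    return {label: [s for s in states if all(s.get(u) == v for u, v in frame_edges)]
            for label, (_, frame_edges) in frame_graphs.items()}


frames = [("Start", [("head", "front"), ("a.next", "b")]),
          ("End", [("head", "front"), ("a.next", "x")])]
(V, E), graphs = scen_rule({}, frames)
states = edge_covers(E, ["head", "a.next"])
print(label_map(states, graphs))
# {'Start': [{'head': 'front', 'a.next': 'b'}], 'End': [{'head': 'front', 'a.next': 'x'}]}
```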
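$$
(\mathrm{SB})\quad \frac{Scenario_i \to ABS_i \qquad ABS = \bigcup_i ABS_i}{Scenario^* \to \{ABS\}}
$$
$$
(\mathrm{SCEN})\quad \frac{Env \to \sigma, Def \qquad \sigma \vdash Frame_i \to G_i, Label_i \qquad G = \bigcup_i G_i \qquad ABS = \{Preds(G), S(G), \bigcup_i Label_i \to S(G_i), Def\}}{Env\ Frame^* \to ABS}
$$
$$
(\mathrm{FRAME})\quad \frac{\sigma \vdash Constraint \to G}{\sigma \vdash Label\ Constraint \to G, Label}
$$
$$
(\mathrm{CONSTR})\quad \frac{\sigma \vdash Pred_i \to (u_i, v_i) \qquad V = \bigcup_i \{u_i, v_i\} \qquad E = \bigcup_i \{(u_i, v_i)\}}{\sigma \vdash Pred^* \to (V, E)}
$$
$$
(\mathrm{PRED})\quad \frac{}{\sigma \vdash (u = v) \to (\sigma(u), \sigma(v))}
$$
Figure 4-3: Semantics of the Storyboard language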
Chapter 5
EXPSYN Algorithm
In this chapter, we explain how the EXPSYN algorithm synthesizes the desired implementation from the control flow sketch and the storyboards. First, the data flow
constraints are obtained from the control flow sketch using a fairly standard technique [9]. Then an uninterpreted relation mapping an abstract state to a set of
abstract states is constructed for each transition function candidate present in
the control flow sketch. This relation is then lifted to a function mapping an abstract
set-state to another abstract set-state. Finally, the function choice at each program
point in the data flow constraints is encoded in the high-level SKETCH intermediate
language. The novelty of this algorithm lies in the way we construct these transition
functions from the abstract domain using abstract interpretation [7] and in the high-level
encoding of the function choices for every statement. We describe each of these
steps in more detail.
5.1
Notations
From chapter 4, we derive a set of abstract states for every scenario in a storyboard,
and we denote this set by S. Let the size of this set, |S|, be N. We add an error
state $s_{err}$ to this set to get $S' = S \cup \{s_{err}\}$. The transition function candidates in the
control flow sketch are denoted by τ, and their corresponding uninterpreted functions
are represented as $f_\tau$. The uninterpreted functions in the data flow equations are
represented as $f_i$. We represent a set of states at program point i as a bitvector $t_i$
of size N + 1, where the jth bit $t_i[j]$ represents the reachability of
the state $s_j \in S'$ at location i. The sketch control choice variables c?? represent holes
that can take any integer value.
5.2
Construction of uninterpreted transition functions
Algorithm 2 describes the construction of an uninterpreted transition relation $f_\tau$ for
a transition function τ. The algorithm iterates through all pairs of states $(s, s') \in S^2$
(belonging to the same scenario) such that $s \Rightarrow WP(s', \tau)$, i.e. state s implies the constraint obtained from the weakest
precondition of state s' with respect to transition τ. The weakest precondition implication query
is answered by an off-the-shelf CLP Prolog theorem prover [31]. All such state pairs
(s, s') are added to the transition relation $f_\tau$. Note that an abstract
state s can map to multiple abstract states s' under a transition relation $f_\tau$ due to
the nondeterminism introduced by the abstraction. For all statement transitions τ
(ignoring conditional transitions), the transition relation $f_\tau$ is completed with $(s, s_{err})$
for every state $s \in S$ for which there exists no $s' \in S$ such that $(s, s') \in f_\tau$. The pair $(s_{err}, s_{err})$ is also
added to $f_\tau$ (once the program reaches an error state, it stays in an error state).
Algorithm 2 BuildTransitions(τ)
Require: τ: the transition, S: set of abstract states
Ensure: the transition relation f_τ: S → 2^S
1: for s in S do
2:   for s' in S do
3:     if s ⇒ WP(s', τ) ∧ ScenarioId(s) = ScenarioId(s') then
4:       add (s, s') to f_τ
5:     end if
6:   end for
7: end for
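Here is a minimal sketch of Algorithm 2, together with the error completion described above. Instead of the CLP weakest-precondition query, it assumes a hypothetical post function that directly returns the possible successor abstract states. The implication test s ⇒ WP(s', τ) is thus replaced by membership of s' in post(s). Abstract states are (scenario id, value of curr) pairs for the transition curr = curr.next over an abstract list a → mid → b → null.

```python
ERR = ("err", None)                      # the error state s_err

# Hypothetical abstract list a -> mid -> b -> null; mid.next is
# nondeterministic because mid summarizes one or more nodes.
NEXT = {"a": {"mid"}, "mid": {"mid", "b"}, "b": {"null"}, "null": set()}


def post_curr_next(state):
    """Successors of curr = curr.next (stands in for the WP query)."""
    scenario, curr = state
    return {(scenario, nxt) for nxt in NEXT[curr]}


def build_transitions(states, post, is_statement=True):
    relation = set()
    for s in states:
        for s2 in states:
            # Algorithm 2, line 3, with the WP test replaced by post().
            if s2 in post(s) and s[0] == s2[0]:
                relation.add((s, s2))
    if is_statement:                     # conditional transitions are not completed
        for s in states:
            if not any(src == s for src, _ in relation):
                relation.add((s, ERR))   # e.g. dereferencing null
        relation.add((ERR, ERR))         # the error state is absorbing
    return relation


states = [(1, c) for c in ["a", "mid", "b", "null"]]
relation = build_transitions(states, post_curr_next)
print(len(relation))  # 6
```

The abstract state with curr = mid maps to two successors, which is the nondeterminism introduced by the abstraction. The state with curr = null has no successor and is therefore sent to the error state.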
Algorithm 3 f_cτ(f_τ, t_in)
Require: f_τ: the uninterpreted transition relation, t_in: input abstract states bitvector
Ensure: t_out: bitvector of output abstract states
1: t_out = 0
2: for i = 0 to N do
3:   if t_in[i] == 1 then
4:     for (s_i, s_j) ∈ f_τ do
5:       t_out[j] = 1
6:     end for
7:   end if
8: end for
The uninterpreted functions $f_\tau$ constructed above are then lifted to corresponding functions $f_{c\tau}$ that map an abstract set-state to another abstract set-state. Algorithm 3 describes how the transition functions $f_{c\tau}$ are encoded in the SKETCH
intermediate language. For every set bit i in the input bitvector $t_{in}$, it sets the
corresponding bit j in the output bitvector $t_{out}$ for every pair $(s_i, s_j) \in f_\tau$. The algorithm
constructs the function $f_{c\tau}$ implicitly, because it is expensive (and in some cases impossible) to
enumerate all possible input bitvectors of size N.
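The lifting can be sketched in a few lines if Python integers stand in for the bitvectors, with bit j set meaning that state s_j is reachable. This version is an illustrative assumption rather than the SKETCH encoding. It precomputes a successor mask per source state, which gives the same result as the loop in Algorithm 3. States 0 to N-1 are the abstract states, and index N is s_err.

```python
def lift(relation, n):
    """Turn a relation over states 0..n (n = s_err) into a bitvector function."""
    successors = [0] * (n + 1)
    for i, j in relation:
        successors[i] |= 1 << j          # mask of all s_j with (s_i, s_j) in f_tau

    def f_ctau(t_in):
        t_out = 0
        for i in range(n + 1):
            if (t_in >> i) & 1:          # s_i reachable in the input set-state
                t_out |= successors[i]
        return t_out

    return f_ctau


# N = 3 abstract states plus s_err at index 3.
f = lift({(0, 1), (1, 1), (1, 2), (2, 3), (3, 3)}, 3)
print(bin(f(0b001)), bin(f(0b011)), bin(f(0b100)))  # 0b10 0b110 0b1000
```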
5.3
SKETCH encoding of data flow constraints
The data flow constraints obtained from the control flow sketch of the program are
first translated into the SKETCH language. The translation for the example data flow
sketch in section 2.3.1 is shown below. The initial and final state constraints on the
bitvectors are asserted using the setInitialState and checkFinalStates methods, respectively.
void main(){
    bit[N] tin, t1, t2, t3, tout;
    int c1, c2, c3, c4, c5;
    c1 = ??; c2 = ??; c3 = ??; c4 = ??; c5 = ??;
    tin = setInitialState();
    t1 = f1(tin, c1) | f2(t2, c2);
    t2 = fc(t1, c3);
    t3 = fc(t1, c4);
    tout = f3(t3, c5);
    assert checkFinalStates(tout);
}
Each data flow constraint function $f_i$ is encoded in the SKETCH intermediate
language as follows. The choice parameter value delegates the function call to the
corresponding transition function $f_{c\tau}$ indexed by the choice variable.
bit[N] fi(bit[N] t, int c){
    if(c == 1) return fc1(t);
    if(c == 2) return fc2(t);
    if(c == 3) return fc3(t);
    if(c == 4) return fc4(t);
}
After encoding the constraints in the SKETCH language, the SKETCH solver is
used to search for a satisfying assignment to the choice parameters c?? which are then
mapped back to the corresponding statement functions in the original control flow
sketch of the program.
5.4
Correctness guarantees of the synthesized implementation
We only get partial correctness guarantees about the synthesized implementation
from this algorithm. We do not guarantee termination as the solution to the set of
data flow equations is not necessarily a least fixed point solution. If the program
terminates on an input, then it is guaranteed to return the correct result.
Chapter 6
IMPSYN Algorithm
The EXPSYN algorithm presented in chapter 5 works well for storyboards with up to
approximately 300 abstract states. Many manipulations require a much larger
number of abstract states; e.g. the storyboard for the binary search tree rotation manipulation gives rise to more than 1000 abstract states. The IMPSYN algorithm was
developed to handle such large abstract state spaces. The algorithm
maintains the abstract states and the transition functions symbolically and only constructs abstract states that are reachable during the execution. The main idea this
algorithm exploits is that most abstract states are not reachable during
program execution, and they need not be computed if they are not required by
the synthesizer. This representation trades time complexity for memory complexity: the solver now has to symbolically execute the transitions itself, but the
memory requirements are significantly reduced.
The EXPSYN algorithm performs synthesis and verification of the implementation
simultaneously. However, verification is a completely deterministic process, and the solver's expensive search for verification can be avoided if verification is handled separately. The IMPSYN algorithm consists
of separate synthesis and verification engines that communicate through counterexample traces, thereby restricting the synthesis solver to searching
for solutions that satisfy the constraints on path programs [2] instead of arbitrary control
flow. We call this approach counterexample trace guided inductive synthesis (CETGIS);
a detailed description of the algorithm follows.
6.1
Symbolic representation of states and transition functions
The abstract states obtained from the storyboards are represented symbolically using a struct type astate in SKETCH. All the concrete and abstract locations in the storyboard are defined as integer macros. Each class variable in the storyboard is represented as an integer field in astate. The updatable pointer locations are represented as two-dimensional integer arrays in astate, where the first index represents the location and the second index represents the nondeterministic choice introduced by the abstraction. These integer variables can take any of the values of the integer macros defined for the abstract and concrete locations. Additionally, astate contains two bits: isErr, which records whether the abstract state is an error state, and isEmpty, which records whether the state is an empty state.
The abstract state representation of the storyboard in Section 2.1.3 is shown below. The abstract and concrete node locations FRONT, A, X, B, BACK and null are defined as integer macros. The struct astate contains the variables head, prev and curr, together with the two-dimensional array next corresponding to the updatable pointer locations. This example has a nondeterminism branching factor of 2. For example, next[FRONT][0] stores the first nondeterministic choice for front.next, and next[FRONT][1] stores the second.
#define FRONT 0
#define A     1
#define X     2
#define B     3
#define BACK  4
#define EMP   5   // null

struct astate{
    int head, prev, curr;
    int[5][2] next;
    bit isErr;
    bit isEmpty;
}
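The same layout can be mirrored outside SKETCH. The following is a minimal Python sketch, not part of the original toolchain. It assumes a frozen dataclass so that states are hashable and can be stored in a visited set; the sample next table is hypothetical.

```python
from dataclasses import dataclass

# Location macros mirroring the #define constants above; EMP encodes null.
FRONT, A, X, B, BACK, EMP = range(6)

@dataclass(frozen=True)
class AState:
    head: int
    prev: int
    curr: int
    # next[loc][c]: the c-th nondeterministic choice for loc.next
    # (5 locations x branching factor 2, like int[5][2] next).
    next: tuple
    isErr: bool = False
    isEmpty: bool = False

# Hypothetical state: front.next may stay inside front (choice 0) or reach a (choice 1).
s = AState(head=FRONT, prev=EMP, curr=FRONT,
           next=((FRONT, A), (X, X), (B, B), (BACK, BACK), (BACK, EMP)))
print(s.next[FRONT][1] == A)  # True
```

Tuples rather than lists keep the state immutable, so two structurally equal states compare and hash identically.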
The transition functions are similarly represented symbolically. A transition function fci takes as arguments an input abstract state and a choice variable c that determines how the nondeterminism is resolved. It returns the correspondingly modified output abstract state. The transition functions also model error conditions when a null pointer is dereferenced. The transition functions for conditional statements set the isEmpty bit according to the conditional expression.
An example symbolic transition function for the example in section 2.1.3 is shown
below.
// curr = curr.next
astate fc12(astate fromState, bit c){
    astate toState = new astate(fromState);
    if(toState.curr == EMP)
        toState.isErr = 1;
    else
        toState.curr = fromState.next[fromState.curr][c];
    return toState;
}
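To make the semantics of fc12 concrete, here is a Python transcription under the same assumptions. AState is a hypothetical frozen dataclass mirroring astate, and dataclasses.replace plays the role of new astate(fromState).

```python
from dataclasses import dataclass, replace

FRONT, A, X, B, BACK, EMP = range(6)  # EMP encodes null

@dataclass(frozen=True)
class AState:
    head: int
    prev: int
    curr: int
    next: tuple            # next[loc][c] = c-th nondeterministic successor of loc
    isErr: bool = False
    isEmpty: bool = False

def fc12(from_state: AState, c: int) -> AState:
    """Abstract transition for the statement curr = curr.next."""
    if from_state.curr == EMP:
        # Dereferencing null yields an error state.
        return replace(from_state, isErr=True)
    # Follow the c-th nondeterministic choice for curr.next.
    return replace(from_state, curr=from_state.next[from_state.curr][c])
```

Because the input state is never mutated, the same fromState can be fed to fc12 with c = 0 and c = 1 to obtain both abstract successors.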
6.2
SKETCH encoding and constraints
The sketch encoding of the data flow constraints consists of four kinds of constraints:
• Program trace constraints (Trace): These traces correspond to an execution path in the control flow sketch of the program. A path program trace consists of a sequence of program statements in the control flow sketch, starting from the start location and ending at the return location. Note that these program traces are completely deterministic, as the nondeterministic choice is resolved by the concrete value of the second parameter. An example program trace constraint for the running example is shown below.
void trace1() implements spec{
    astate[10] t;
    t[0] = getInitialState();
    t[1] = f1(t[0], 0);
    t[2] = fc(t[1], 0);
    t[3] = f2(t[2], 1);
    t[4] = fc(t[3], 0);
    t[5] = f2(t[4], 1);
    t[6] = fc(t[5], 0);
    t[7] = f2(t[6], 1);
    t[8] = fc(t[7], 0);
    t[9] = f3(t[8], 0);
    assertFinalState(t[9]);
}
• Loop exiting constraints (Termination): For every loop in the control flow sketch, these constraints require that at least one state, in some nondeterministic trace execution, exits the loop after a bounded number of loop unrollings. The holes ?? can take any bit value, so the solver is free to pick the nondeterministic choices that witness the exit. These constraints ensure the termination of the synthesized implementation in the presence of loops. An example constraint for the running example with a loop bound of k = 4 is presented below.
void loop1Exit() implements spec{
    astate[10] t;
    t[0] = getInitialState();
    t[1] = f1(t[0], ??);
    t[2] = fc(t[1], ??);
    t[3] = f2(t[2], ??);
    t[4] = fc(t[3], ??);
    t[5] = f2(t[4], ??);
    t[6] = fc(t[5], ??);
    t[7] = f2(t[6], ??);
    t[8] = fc(t[7], ??);
    t[9] = f2(t[8], ??);
    assert t[2].isEmpty || t[4].isEmpty || t[6].isEmpty || t[8].isEmpty;
}
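• Loop fixpoint constraints (Fixpoint): For every loop in the control flow sketch, these constraints require that the loop computation reaches a fixpoint after a bounded number of loop unrollings. Unlike the holes ?? in the loop exit constraint, the choices c are inputs of the harness. The fixpoint must therefore be reached for every valuation of the nondeterministic choices. An example constraint for the running example is shown below, with the loop unrolling bound k = 4.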
void loop1Fixpoint(bit[9] c) implements spec{
    astate[10] t;
    t[0] = getInitialState();
    t[1] = f1(t[0], c[0]);
    t[2] = fc(t[1], c[1]);
    t[3] = f2(t[2], c[2]);
    t[4] = fc(t[3], c[3]);
    t[5] = f2(t[4], c[4]);
    t[6] = fc(t[5], c[5]);
    t[7] = f2(t[6], c[6]);
    t[8] = fc(t[7], c[7]);
    t[9] = f2(t[8], c[8]);
    assert equal(t[2], t[4]) || equal(t[2], t[6]) || equal(t[2], t[8])
        || equal(t[4], t[6]) || equal(t[4], t[8]) || equal(t[6], t[8]);
}
• Final state constraint (FinalState): The final state constraint (assertFinalState(t)) requires that the final state either is an empty state (t.isEmpty = 1) or satisfies the constraints specified by the End frame of the storyboard.
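The Termination and Fixpoint constraints can be prototyped by brute force. The sketch below uses a hypothetical toy abstraction, in which a summary node front may step to itself or on to a concrete node b. It also uses hypothetical transition functions f1, fc and f2, and unrolls the loop k = 4 times in the same way as loop1Exit and loop1Fixpoint. The all-zero choice vector reaches a fixpoint without ever leaving the loop. Under the abstraction, some choice sequences never exit, so the exit constraint can only demand that at least one does.

```python
from dataclasses import dataclass, replace
from itertools import combinations, product

@dataclass(frozen=True)
class S:
    curr: str
    isEmpty: bool = False

# Hypothetical toy abstraction: summary node "front" may step to itself
# (choice 0) or on to the concrete node "b" (choice 1); b.next is null.
NEXT = {"front": ("front", "b"), "b": ("null", "null")}

def f1(s, c):  # curr = head
    return replace(s, curr="front")

def fc(s, c):  # loop guard curr != null: the state entering the body is empty if it fails
    return replace(s, isEmpty=s.isEmpty or s.curr == "null")

def f2(s, c):  # curr = curr.next, nondeterminism resolved by c
    return s if s.isEmpty else replace(s, curr=NEXT[s.curr][c])

def unroll(choices):
    """States t[0..2k+1] for k = len(choices) unrollings, as in loop1Exit."""
    t = [S("start"), f1(S("start"), 0)]
    for c in choices:
        t.append(fc(t[-1], 0))
        t.append(f2(t[-1], c))
    return t

GUARDS = (2, 4, 6, 8)  # indices of the guard states for k = 4

def loop_exit(t):      # Termination: some unrolling leaves the loop
    return any(t[i].isEmpty for i in GUARDS)

def loop_fixpoint(t):  # Fixpoint: two unrollings reach the same abstract state
    return any(t[i] == t[j] for i, j in combinations(GUARDS, 2))

def exists_exit(k=4, b=2):
    """The ?? holes are existential: search for some choice vector that exits."""
    return any(loop_exit(unroll(cs)) for cs in product(range(b), repeat=k))

print(loop_exit(unroll((1, 0, 0, 0))), loop_exit(unroll((0, 0, 0, 0))))  # True False
```

SKETCH solves these constraints symbolically; the exhaustive product search here is only practical because the toy has 2^4 choice vectors.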
The uninterpreted function choices in the data flow constraints are encoded similarly to Section 5.3, as shown below.
bit[N] fi(bit[N] t, bit c){
    if(??) return fc1(t, c);
    if(??) return fc2(t, c);
    if(??) return fc3(t, c);
    if(??) return fc4(t, c);
}
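The if(??) cascade behaves like a selector over candidate transition functions: the first hole set to 1 decides which candidate runs. Below is a minimal Python analogue. The holes tuple is a hypothetical stand-in for the ?? values, and the candidates are arbitrary callables. Treating an assignment with no hole set as invalid is our own choice.

```python
def uninterpreted(candidates, holes):
    """Mirror of the if(??) cascade: the first candidate whose hole bit is 1 is used."""
    def f(state, c):
        for hole, fn in zip(holes, candidates):
            if hole:
                return fn(state, c)
        # Our convention for an assignment that selects nothing.
        raise ValueError("no candidate selected by the holes")
    return f

cands = [lambda s, c: s + 1, lambda s, c: s * 2, lambda s, c: s - c]
print(uninterpreted(cands, (0, 1, 1))(5, 0))  # 10: holes after the first set bit are ignored
```

Several hole assignments encode the same choice, since only the first set bit matters.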
At every program location, the sketch synthesizer needs to maintain only one abstract state. The traces are added to the constraints one at a time and solved incrementally, exploiting the incremental solving feature of SKETCH. Incremental solving is essential for making the large set of constraints feasible to solve.
6.3
Verification engine
The verification engine implements a standard deterministic abstract fixpoint verification algorithm, shown in Algorithm 4. The algorithm computes the set of all reachable states of the input program until a fixpoint of abstract states is reached. If the final state constraints are satisfied, the verifier reports that the program P is correct. Otherwise, it returns a deterministic counterexample trace (with concrete choice values) to the synthesis engine.
Algorithm 4 Verify(P)
Require: P : program implementation with concrete transition functions
Ensure: a deterministic counterexample trace cex if P is not correct, otherwise null
1: Queue<astate> stateQueue = empty
2: HashSet<astate> visitedStates = empty
3: stateQueue.enqueue(initState)
4: while stateQueue != empty do
5:     astate s = stateQueue.dequeue()
6:     visitedStates.add(s)
7:     for astate s' ∈ nextStates(s) do
8:         if s' ∉ visitedStates then
9:             stateQueue.enqueue(s')
10:        end if
11:    end for
12: end while
13: if checkFinalState(visitedStates) then
14:     return null   /* Program is verified */
15: else
16:     return getCexTrace(visitedStates)
17: end if
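The breadth-first exploration of Algorithm 4 can be sketched in Python as follows. The callbacks next_states, is_final and final_ok are hypothetical stand-ins for the transition functions and the End frame check. Unlike Algorithm 4, this sketch checks each final state as soon as it is reached. It also records one parent pointer per state, which is one simple way to realize getCexTrace.

```python
from collections import deque

def verify(init, next_states, is_final, final_ok):
    """Breadth-first exploration of reachable abstract states (cf. Algorithm 4).

    next_states(s) yields (choice, successor) pairs; is_final(s) tells whether
    s is at the return location and final_ok(s) checks the End frame.
    Returns None if every reachable final state passes, otherwise a
    deterministic counterexample as a list of (choice, state) steps."""
    parent = {init: None}          # doubles as the visited set
    queue = deque([init])
    while queue:
        s = queue.popleft()
        if is_final(s) and not final_ok(s):
            return counterexample(parent, s)
        for c, succ in next_states(s):
            if succ not in parent:
                parent[succ] = (s, c)
                queue.append(succ)
    return None

def counterexample(parent, s):
    """Walk parent links back to the initial state, recording concrete choices."""
    steps = []
    while parent[s] is not None:
        prev, c = parent[s]
        steps.append((c, s))
        s = prev
    return steps[::-1]

# Toy usage: integer states, choice 0 adds 1, choice 1 adds 2; states >= 4 are final.
nxt = lambda s: [(0, s + 1), (1, s + 2)] if s < 4 else []
print(verify(0, nxt, lambda s: s >= 4, lambda s: s != 5))  # [(0, 1), (1, 3), (1, 5)]
```

The returned list carries the concrete choice at every step, which is exactly what the synthesis engine needs to replay the path as a deterministic trace constraint.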
6.4
IMPSYN algorithm description
The IMPSYN algorithm alternates between the synthesis engine and the verification engine, as shown in Figure 6-1. The detailed algorithm is shown in Algorithm 5. First, the algorithm randomly assigns the uninterpreted transition functions in the data flow constraints and runs the verifier on the resulting program. If the program verifies, the current program is returned as the desired data structure manipulation implementation. Otherwise, a deterministic counterexample path from the start location to the return location of the program is returned to the synthesis engine. The synthesizer incrementally adds the counterexample trace constraints to the loop exit, loop fixpoint and final state constraints. It then searches for an assignment to the functions that eliminates this counterexample path. The synthesized program is then passed back to the verifier, which checks whether it is correct. If it is, the algorithm stops and returns the synthesized implementation; otherwise, the loop between the synthesis and verification engines continues.
Figure 6-1: The IMPSYN algorithm
Algorithm 5 IMPSYN
Require: the data flow constraints and correctness conditions from the storyboard
Ensure: the correct data structure implementation
1: Assign the uninterpreted functions in P randomly
2: boolean done = false
3: while !done do
4:     cex = Verify(P)
5:     if cex == null then
6:         done = true
7:     else
8:         P = Synthesize(P, cex)
9:     end if
10: end while
11: return P {The desired correct implementation}
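The shape of the Verify/Synthesize loop can be illustrated with a toy counterexample-guided search. In this sketch a program is a tuple of hole values, one per statement slot. The "verifier" tests concrete inputs instead of running the abstract fixpoint of Algorithm 4, and the "synthesizer" enumerates candidates instead of calling SKETCH. The run and spec functions are hypothetical.

```python
import random
from itertools import product

def impsyn(slots, run, inputs, spec, seed=0):
    """Toy Verify/Synthesize loop in the shape of Algorithm 5.

    slots[i] lists the candidates for statement slot i; run(P, x) executes
    program P on x and spec(x, y) checks the output. Returns (P, #cex)."""
    rng = random.Random(seed)
    P = tuple(rng.choice(options) for options in slots)  # random initial assignment
    cexs = []
    while True:
        # Verify: look for an input on which P is wrong.
        cex = next((x for x in inputs if not spec(x, run(P, x))), None)
        if cex is None:
            return P, len(cexs)
        cexs.append(cex)
        # Synthesize: any assignment correct on every counterexample so far.
        P = next((Q for Q in product(*slots)
                  if all(spec(x, run(Q, x)) for x in cexs)), None)
        if P is None:
            raise ValueError("the sketch has no completion")

# Find holes (a, b) so that a*x + b == 2*x + 3 on inputs 0..4.
print(impsyn([range(4), range(4)], lambda P, x: P[0] * x + P[1],
             range(5), lambda x, y: y == 2 * x + 3)[0])  # (2, 3)
```

The loop terminates because each new candidate must satisfy every counterexample collected so far. It therefore differs from all earlier candidates, and the candidate space is finite.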
6.5
Correctness proof of IMPSYN algorithm
In this section, we give a brief proof of the correctness of the IMPSYN algorithm, i.e., that the program P returned by the algorithm is correct. For simplicity and without loss of generality, assume that there is only a single scenario with one input state constraint (s_in) and one output state constraint (s_out). The final state constraint specifies that at the end of the trace the output state is either empty or the specified output state (s_out). All the states reachable at the return location of the program are guaranteed to satisfy the output state constraints (FinalState constraint). Therefore, to prove the correctness of the algorithm, it suffices to show that at least one non-empty state is reachable at the return location of the program. We need to consider the three kinds of basic control flow constructs present in the program:
• Straight line: If the initial state is non-empty (initState.isEmpty = 0), then the final state is also non-empty, as the only way to make a state empty is through a conditional statement.
• Conditional branch: Call the two branches of the conditional b_true and b_false. For contradiction, assume that both branch paths lead to an empty final state. For this to happen, the non-empty incoming state must pass neither the true nor the false branch of the conditional cond. That is, the state satisfies both !cond and cond, which is a contradiction. Therefore at least one of the paths leads to a non-empty final state.
• Loop: The loop exit constraint guarantees that at least one abstract state exits the loop within the loop unrolling bound.
These correctness arguments extend naturally, by induction, to arbitrary combinations of these building blocks, yielding correctness arguments for programs with arbitrary control flow.
6.6
Optimizations
In this section, we present some optimizations that accelerate the loop between the Verifier and Synthesizer engines to obtain faster convergence.
6.6.1
Synthesizing multiple program paths together
The Verifier returns a completely deterministic path program trace to the synthesis phase. Every statement in the path program has a nondeterministic choice parameter. Let the nondeterminism bound be b and the number of statements in the trace be n. There are b^n possible traces with the same program statements, and the Verify-Synthesize loop might take a long time correcting many similar paths one by one. As an optimization, the Synthesizer treats the concrete choices of the deterministic counterexample path trace as input variables of the program. The SKETCH system then synthesizes a program that satisfies the final state constraints for all possible valuations of the choice parameters, i.e., for all b^n possible paths simultaneously. SKETCH uses the CEGIS algorithm to automatically discover only the required nondeterministic paths from the set of all paths. For the example constraint in Section 6.2, the trace is updated as shown below.
void trace1(bit[9] c) implements spec{
    astate[10] t;
    t[0] = getInitialState();
    t[1] = f1(t[0], c[0]);
    t[2] = fc(t[1], c[1]);
    t[3] = f2(t[2], c[2]);
    t[4] = fc(t[3], c[3]);
    t[5] = f2(t[4], c[4]);
    t[6] = fc(t[5], c[5]);
    t[7] = f2(t[6], c[6]);
    t[8] = fc(t[7], c[7]);
    t[9] = f3(t[8], c[8]);
    assertFinalState(t[9]);
}
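To see why treating the choices as inputs matters, note that the trace above has n = 9 choice parameters with branching factor b = 2. A single such constraint therefore stands for 2^9 = 512 deterministic paths. The brute-force reading below quantifies over all of them explicitly; SKETCH avoids this enumeration through CEGIS. The run_trace and final_ok callbacks are hypothetical.

```python
from itertools import product

def choice_vectors(n, b):
    """All valuations of the choice parameters c[0..n-1], each in range(b)."""
    return list(product(range(b), repeat=n))

def holds_for_all_paths(run_trace, final_ok, n, b):
    """Brute-force reading of a harness with input bit[n] c: the final state
    must be acceptable for every valuation of the choices."""
    return all(final_ok(run_trace(cs)) for cs in choice_vectors(n, b))

print(len(choice_vectors(9, 2)))  # 512
```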
6.6.2
Merging conditional statements in path program traces
Conditional statements can similarly create an exponential number of paths, even without nondeterministic choices. Therefore, conditional statements are treated as part of the path program traces. The idea we exploit here is that, even in the presence of conditional statements in the program trace, the synthesizer still needs to keep track of only one program state at each program location. This is because at merge points only one path is feasible: for a given state the guard holds on exactly one branch, and the other branch carries an empty state.
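The merge-point observation can be written as a tiny join function. This is a sketch assuming a hypothetical state type with an isEmpty flag. Because a single deterministic state satisfies the guard on at most one branch, at most one of the two incoming states is non-empty, and that state is the one kept.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class S:
    curr: str
    isEmpty: bool = False

def merge(s_then, s_else):
    """Join the two branch states of a conditional at its merge point."""
    # At most one branch can be feasible for a single deterministic state.
    assert s_then.isEmpty or s_else.isEmpty, "both branches feasible"
    return s_else if s_then.isEmpty else s_then

print(merge(S("a", isEmpty=True), S("b")))  # S(curr='b', isEmpty=False)
```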
6.6.3
Concretizing the choices for loop exit constraint
The loop exit constraint requires that at least one state exits the loop after a bounded number of loop unrollings. The final state constraint allows the final state either to be empty or to satisfy the End frame constraint. It is therefore much easier for the synthesizer to find a satisfying assignment that drives the final state to an empty state. The loop exit constraint is existentially quantified over the nondeterministic choices in the program statements. Given some number of deterministic counterexample traces, the synthesizer can therefore fill in the existential holes of the loop exit constraint in a way that makes the final states empty much more easily. For most inductive definitions, we reserve the first nondeterministic choice for the base case and the other choices for the inductive cases. We want the loop exit constraint to hold on the base case, while the inductive case is used only for reaching a fixpoint. We apply this observation as an optimization by simply replacing the nondeterministic holes with the concrete value 0. The loop exit constraint from Section 6.2 is updated as shown below.
void loop1Exit() implements spec{
    astate[10] t;
    t[0] = getInitialState();
    t[1] = f1(t[0], 0);
    t[2] = fc(t[1], 0);
    t[3] = f2(t[2], 0);
    t[4] = fc(t[3], 0);
    t[5] = f2(t[4], 0);
    t[6] = fc(t[5], 0);
    t[7] = f2(t[6], 0);
    t[8] = fc(t[7], 0);
    t[9] = f2(t[8], 0);
    assert t[2].isEmpty || t[4].isEmpty || t[6].isEmpty || t[8].isEmpty;
}
6.7
Correctness guarantees of the synthesized implementation
This algorithm provides total correctness guarantees for the synthesized implementation, since it guarantees termination in addition to the final states satisfying the End frame constraints. The algorithm requires a bound k on the number of loop unrollings needed to reach a fixpoint, but in most cases we observe that this number is small in the abstract setting. The value of k can also be varied algorithmically if required.
Chapter 7
Experiments
This chapter presents preliminary results for the EXPSYN and IMPSYN algorithms
on some data structure manipulation case studies including linked lists and binary
search trees. The experiments were run on an Intel Core 2 Quad 3.0 GHz CPU with
4GB of RAM.
7.1
EXPSYN algorithm results
The results of the EXPSYN case studies are presented in Table 7.1. The table lists:
• the number of clauses in the corresponding SAT formula,
• the amount of memory used,
• the number of abstract states,
• the number of transition functions,
• the time taken to synthesize the desired implementation.
The States column denotes the number of abstract states obtained from the storyboard after the state edge cover optimization. The Transitions column gives the number of possible choices for a program statement in the control flow sketch. The table shows that the EXPSYN algorithm is very memory intensive. Since it encodes sets of abstract states as bitvectors, searching over bitvectors of such large sizes results in large memory requirements. A brief description of the case studies, their control flow sketches and the synthesized implementations follows.
Data Structure          #Clauses   Memory   States   Transitions   User Time
Linked list insertion   179K       5.5G     249      73            6m8s
Linked list deletion    180K       5.5G     249      73            5m46s
BST insertion           258K       8.1G     211      78            2m32s
Table 7.1: Experimental results for EXPSYN algorithm
7.1.1
Insertion/Deletion in a sorted linked list
We first describe the results on our running example. The storyboard for the sorted linked list insertion manipulation is shown in Figure 7-1. The four scenarios correspond to:
• inserting in the middle of the linked list,
• inserting at the beginning of the list,
• inserting at the end of the list,
• inserting into an empty list.
The control flow sketch of the program is shown in Figure 7-2.
[Storyboard figure: Start and End frames for scenario1 to scenario4 over head; constraint a.val < x.val < b.val]
Figure 7-1: Storyboard for linked list insertion
The storyboard for the sorted linked list deletion manipulation is obtained by flipping the input and output frames of the storyboard for the sorted linked list insertion manipulation shown in Figure 7-1. The deletion operation was given the same control flow sketch as the insertion operation, shown in Figure 7-2.
Note that for reversible manipulations, we can use this trick of swapping the input and output frame constraints to obtain the inverse manipulation. Another example of such an invertible pair of operations is left rotation and right rotation in a binary search tree.
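The frame-swapping trick is mechanical enough to state as a function. This is a minimal sketch in which a hypothetical Scenario record holds a name, a Start frame and an End frame. The frames are written in an invented textual notation purely for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Scenario:
    name: str
    start: str  # Start frame (input state constraint)
    end: str    # End frame (output state constraint)

def invert(storyboard):
    """Storyboard for the inverse manipulation: swap every Start and End frame."""
    return [Scenario(s.name, s.end, s.start) for s in storyboard]

# Hypothetical frames for two of the insertion scenarios.
insert_sb = [
    Scenario("scenario1", "head->front->a->b->back; x", "head->front->a->x->b->back"),
    Scenario("scenario4", "head=null; x", "head->x"),
]
delete_sb = invert(insert_sb)
print(delete_sb[1].start)  # head->x
```

Applying invert twice returns the original storyboard, matching the intuition that insertion and deletion undo each other.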
Interestingly, for this example we did not need to specify the inductive definitions of the abstract nodes front and back. These definitions were not required here because the abstract nodes are used only for scanning and for reaching the correct concrete nodes for local modifications. For scan-and-modify manipulations, the inductive definitions would lead to the same execution and would not add any new behaviour to the program. Without the inductive definitions, the storyboard specification might allow some bad inputs to be abstracted as the abstract input specification; e.g., the abstract node front could represent a set of nodes with an internal cycle. The synthesized implementation would still provide the partial correctness guarantee, i.e., it would either not terminate or terminate with the correct result. Adding inductive definitions to the storyboard would rule out such bad inputs.
#define LOC {| (head|prev|curr|x)(.next)? |}
#define STMT LOC = LOC;
#define COND LOC == null
#define LCOND LOC != null && LOC.val < x.val

void llInsert(Node head, Node x){
    /* @Start */
    Node prev, curr;
    STMT
    STMT
    while(LCOND){
        STMT
        STMT
    }
    if(COND){
        STMT
        STMT
    }
    else{
        STMT
        STMT
    }
    /* @End */
}
Figure 7-2: Control flow sketch for sorted linked list insertion manipulation
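The LOC and STMT generators define the finite space each statement hole ranges over. A minimal Python expansion is shown below. It follows our reading of the generator syntax, with (a|b) as alternatives and ? marking an optional part. The expand function is hypothetical and not part of SKETCH.

```python
from itertools import product

def expand(variables, fields):
    """Expand a generator such as {| (head|prev|curr|x)(.next)? |}."""
    suffixes = [""] + ["." + f for f in fields]   # the trailing ? makes the field optional
    return [v + s for v, s in product(variables, suffixes)]

LOC = expand(["head", "prev", "curr", "x"], ["next"])
STMT = [f"{lhs} = {rhs};" for lhs, rhs in product(LOC, LOC)]
print(len(LOC), len(STMT))  # 8 64
```

Every STMT hole in the sketch thus chooses among 64 candidate assignments before any semantic filtering.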
void llInsert(Node head, Node x){
    Node prev = null, curr = null;
    x.next = prev;
    curr = head;
    while(curr != null && curr.val < x.val){
        prev = curr;
        curr = prev.next;
    }
    if(curr == head){
        x.next = curr;
        head = x;
    }
    else{
        x.next = curr;
        prev.next = x;
    }
}
Figure 7-3: Synthesized implementation for sorted linked list insertion manipulation
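The synthesized insertion can be exercised on concrete lists. Below is a minimal Python transcription of Figure 7-3. It assumes a hypothetical Node class with val and next fields, and omits the redundant initial x.next = prev, which is overwritten on both branches. Python, like Java, passes the head reference by value, so the sketch returns the (possibly new) head instead of assigning to head.

```python
class Node:
    def __init__(self, val, next=None):
        self.val, self.next = val, next

def ll_insert(head, x):
    """Insert node x into the sorted list starting at head; return the new head."""
    prev, curr = None, head
    while curr is not None and curr.val < x.val:   # scan to the insertion point
        prev, curr = curr, curr.next
    x.next = curr
    if curr is head:      # x goes first (this also covers the empty list)
        return x
    prev.next = x
    return head

def to_list(head):
    out = []
    while head is not None:
        out.append(head.val)
        head = head.next
    return out

head = None
for v in [3, 1, 2, 5, 4]:
    head = ll_insert(head, Node(v))
print(to_list(head))  # [1, 2, 3, 4, 5]
```

The four calls after the first exercise the storyboard's scenarios: an empty list, insertion at the front, in the middle and at the end.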
void llDelete(Node head, Node x){
    Node prev = null, curr = null;
    curr = x;
    curr = head;
    while(curr != x){
        prev = curr;
        curr = curr.next;
    }
    if(curr == head){
        head = head;
        head = head.next;
    }
    else{
        prev.next = curr.next;
        prev.next = prev.next;
    }
}
Figure 7-4: Synthesized implementation for sorted linked list deletion manipulation
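Dropping the no-op assignments of Figure 7-4 (curr = x, head = head and prev.next = prev.next) leaves a short deletion routine. The sketch below transcribes it into Python under the same assumptions as for insertion: a hypothetical Node class, and a returned head instead of an assignment to head. It also assumes that x is in the list.

```python
class Node:
    def __init__(self, val, next=None):
        self.val, self.next = val, next

def ll_delete(head, x):
    """Unlink node x, assumed to be in the list, and return the new head."""
    prev, curr = None, head
    while curr is not x:           # scan until curr reaches x
        prev, curr = curr, curr.next
    if curr is head:               # x is the first node
        return head.next
    prev.next = curr.next          # bypass x
    return head

def to_list(head):
    out = []
    while head is not None:
        out.append(head.val)
        head = head.next
    return out

n3 = Node(3); n2 = Node(2, n3); n1 = Node(1, n2)
head = ll_delete(n1, n2)
print(to_list(head))  # [1, 3]
```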
7.1.2
Insertion in a binary search tree
[Storyboard figure: Scenarios 1 to 9 over root with nodes a, b and z, under conditions such as z < a, z > a and a < z < b]
Figure 7-5: Storyboard for binary search tree insertion
The storyboard for inserting a node z into a binary search tree is shown in Figure 7-5. The storyboard consists of 9 scenarios:
• inserting a node into an empty tree;
• two scenarios for inserting a node into a tree with one node (left and right);
• three scenarios for inserting a node with a.right = b and z < a, a < z < b and b < z;
• three scenarios with b.left = a and z < a, a < z < b and b < z.
The control flow sketch of the BST insertion operation is shown in Figure 7-6. The control flow structure has a while loop, as the manipulation is expected to be O(n). The loop body contains an assignment statement and a conditional statement. After the loop, the sketch contains two nested conditional statements. In this case, much more structure has to be provided by the programmer. However, we want to emphasize that exploring different control flow structures could be automated as an outer loop around our current framework. A naive exploration would certainly be very expensive, but an exploration guided by programmer insight would be much more useful and general.
Again, the storyboard for this case study does not include the inductive definitions for the abstract nodes, so it may allow some ill-formed inputs to be abstracted as the abstract specification. However, the combination of the different scenarios and the control flow sketch restricts the possibility of inserting at an incorrect position in the tree. We believe that for scan-and-modify manipulations, we can get away with some incomplete specifications that are not essential for synthesizing the correct implementations.
#define LOC {| (x|y|z|root)(.left|.right)? |}
#define STMT LOC = LOC;
#define LCOND {| (x|y)(.left|.right)? != null |}
#define NCOND {| (x|y|z).key < (x|y|z).key |}
#define COND {| LCOND | NCOND |}

void bstInsert(Node root, Node z){
    /* @Start */
    Node x, y;
    STMT
    STMT
    while(COND){
        STMT
        if(COND)
            STMT
        else
            STMT
    }
    if(COND){
        if(COND)
            STMT
        else
            STMT
    }
    else{
        STMT
    }
    /* @End */
}
Figure 7-6: Control flow sketch for binary search tree insertion manipulation
void bstInsert(Node root, Node z){
    Node x = null, y = null;
    x = root;
    x = root;
    while(x != null){
        y = x;
        if(x.key > z.key)
            x = y.left;
        else
            x = y.right;
    }
    if(y != null){
        if(y.key > z.key)
            y.left = z;
        else
            y.right = z;
    }
    else
        root = z;
}
Figure 7-7: Synthesized implementation for binary search tree insertion manipulation
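Figure 7-7 can be checked the same way. This is a Python transcription, assuming a hypothetical Node class with key, left and right fields and returning the root instead of assigning to it. An in-order traversal of the result should list the keys in sorted order.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = None

def bst_insert(root, z):
    """Insert node z and return the (possibly new) root, as in Figure 7-7."""
    y, x = None, root
    while x is not None:          # descend, remembering the parent in y
        y = x
        x = y.left if x.key > z.key else y.right
    if y is not None:
        if y.key > z.key:
            y.left = z
        else:
            y.right = z
    else:                         # empty tree
        root = z
    return root

def inorder(t):
    return inorder(t.left) + [t.key] + inorder(t.right) if t is not None else []

root = None
for k in [5, 3, 8, 1, 4, 9]:
    root = bst_insert(root, Node(k))
print(inorder(root))  # [1, 3, 4, 5, 8, 9]
```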
7.2
IMPSYN algorithm results
The results of the IMPSYN algorithm on the BST rotation and linked list reversal case studies are presented in Table 7.2. The table shows that the memory requirements of the IMPSYN algorithm are significantly lower than those of the EXPSYN algorithm. The number of abstract states in these benchmarks is much larger than the threshold (~300 states) of the EXPSYN algorithm. Note also that the SAT formulas are much larger. The incremental solving used by the IMPSYN algorithm makes it possible to solve such large SAT instances.
Data Structure          #Clauses   Memory   States   Transitions   Iterations   Total Time
BST left-rotation       1.36M      0.8G     94       79            1            113s
BST right-rotation      1.35M      0.7G     94       79            1            115s
Linked list reversal    611K       0.5G     114      76            4            110s
Table 7.2: Experimental results for IMPSYN algorithm
7.2.1
Left/Right rotation in binary search tree
The storyboard for the BST left rotation manipulation is shown in Figure 7-8. It is taken directly from the Introduction to Algorithms textbook [6]. The first scenario shows the common case with two concrete nodes x and y, which are to be rotated appropriately. The abstract nodes α, β, γ and px represent sets of nodes in the corresponding subtrees. As the book also points out, the algorithm needs to take care of whether the node x is the root node or the left/right child of its parent, and whether the left subtree of node y is empty. This gives 6 scenarios in total, as shown in the storyboard.
The control flow sketch is shown in Figure 7-9. Here STMT is defined to be all statements over the variables root, x and y with up to two pointer dereferences. Since the storyboard has 6 scenarios and the expected complexity of the manipulation is O(1), the programmer provides three if statements. The second, nested if corresponds to the three cases for the parent pointer of x: x has no parent (null), x is the left child of its parent, or x is the right child of its parent. This structure of the control flow sketch could easily be automated as well; for this example, we could try different combinations of nested if statements. The synthesized implementation for left rotation is presented in Figure 7-10.
As with the linked list deletion, to synthesize the BST right rotation implementation we simply reverse the input/output pairs in the scenarios of the storyboard in Figure 7-8. The control flow sketch remains essentially the same, with the conditional choices flipped symmetrically. The synthesized code is shown in Figure 7-11.
[Storyboard figure: six scenarios over root with concrete nodes x and y, their abstract subtrees and the parent of x]
Figure 7-8: Storyboard for binary search tree left rotation
#define LOC {| (x|y|root)(.left|.right|.parent)?(.left|.right|.parent)? |}
#define STMT LOC = LOC;
#define COND {| y.left != null | x.parent == null | x == x.parent.left |}

void bstLeftRotate(Node root, Node x, Node y){
    /* @Start */
    STMT
    STMT
    if(COND){
        STMT
    }
    else{
        STMT
    }
    STMT
    STMT
    if(COND)
        STMT
    else{
        if(COND)
            STMT
        else
            STMT
    }
    STMT
    STMT
    /* @End */
}
Figure 7-9: Control flow sketch for binary search tree left rotation manipulation
void bstLeftRotate(Node root, Node x, Node y){
    root = root.parent;
    y.parent.right = y.left;
    if(y.left != null)
        y.left.parent = y.parent;
    y.left = x.left.parent;
    if(x.parent == null)
        root = y.right.parent;
    else{
        if(x == x.parent.left)
            root.parent.left = y.right.parent;
        else
            root.parent.right = y.right.parent;
    }
    y.parent = y.left.parent;
    x.parent = y.right.parent;
}
Figure 7-10: Synthesized implementation for binary search tree left rotate manipulation
void bstRightRotate(Node root, Node x, Node y){
    y.left = y.left.right;
    root = root.parent;
    if(x.right != null)
        x.right.parent = x.parent;
    x.right = y.right.parent;
    if(y.parent == null)
        root = x.left.parent;
    else{
        if(y == y.parent.left)
            root.parent.left = x.left.parent;
        else
            root.parent.right = x.left.parent;
    }
    x.parent = root.parent.parent;
    y.parent = x;
}
Figure 7-11: Synthesized implementation for binary search tree right rotate manipulation
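For comparison with Figures 7-10 and 7-11, the textbook rotations from [6], on which the storyboard is based, can be written in Python as follows. This sketches the standard algorithm and is not a transcription of the synthesized code. The Tree wrapper holding root and the Node constructor that sets parent pointers are our assumptions. Running the left rotation and then the right rotation restores the original tree, which is the sense in which swapping the Start and End frames yields the inverse manipulation.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right, self.parent = key, left, right, None
        for child in (left, right):
            if child is not None:
                child.parent = self

class Tree:
    def __init__(self, root=None):
        self.root = root

def left_rotate(T, x):
    y = x.right                  # assumed non-null, as in every scenario
    x.right = y.left             # y's left subtree becomes x's right subtree
    if y.left is not None:
        y.left.parent = x
    y.parent = x.parent          # link x's parent to y
    if x.parent is None:
        T.root = y
    elif x is x.parent.left:
        x.parent.left = y
    else:
        x.parent.right = y
    y.left = x                   # put x on y's left
    x.parent = y

def right_rotate(T, y):
    x = y.left                   # mirror image of left_rotate
    y.left = x.right
    if x.right is not None:
        x.right.parent = y
    x.parent = y.parent
    if y.parent is None:
        T.root = x
    elif y is y.parent.left:
        y.parent.left = x
    else:
        y.parent.right = x
    x.right = y
    y.parent = x

alpha, beta, gamma = Node("alpha"), Node("beta"), Node("gamma")
y = Node("y", beta, gamma)
x = Node("x", alpha, y)
T = Tree(x)
left_rotate(T, x)
after_left = (T.root.key, T.root.left.key, x.right.key)
right_rotate(T, y)
after_right = (T.root.key, T.root.right.key, y.left.key)
print(after_left, after_right)  # ('y', 'x', 'beta') ('x', 'y', 'beta')
```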
7.2.2
In-place linked list reversal
[Storyboard figure: Scenario 1 over head with concrete nodes a and b, abstract node mid and the unfold/fold of mid; Scenarios 2 to 4 for shorter lists]
Figure 7-12: Storyboard for inplace linked list reversal
The storyboard for in-place linked list reversal is shown in Figure 7-12. The first scenario defines the common case. The first node of the list is the concrete node a, the last node is the concrete node b, and all the nodes between these two are represented by the abstract node mid. The inductive definition of the abstract node mid is also shown in the storyboard through the nondeterministic unfold and fold operations. The other scenarios cover linked lists of length 2, 1 and 0, respectively.
The control flow sketch of the manipulation is shown in Figure 7-13. The programmer conveys the expected complexity of O(n) by providing one while loop with an arbitrarily chosen number of statements (5). The number of statements inside the loop could be varied algorithmically. The programmer also expects that three temporary variables temp1, temp2 and temp3 will be sufficient. Every loop iteration begins with an unfold statement and ends with a fold statement. The synthesized implementation is shown in Figure 7-14. Interestingly, the synthesized implementation does not use the temporary variable temp2, and only 4 statements are required inside the loop body.
#define LOC {| (head|temp1|temp2|temp3)(.next)? |}
#define STMT LOC = LOC;
#define COND {| temp1 != null | temp2 != null | temp3 != null | head != null |}
#define UNFOLDSTMT {| unfold LOC |}
#define FOLDSTMT {| fold LOC |}

void llReverse(Node head){
    /* @Start */
    Node temp1 = null, temp2 = null, temp3 = null;
    STMT
    while(COND){
        UNFOLDSTMT
        STMT
        STMT
        STMT
        STMT
        STMT
        FOLDSTMT
    }
    /* @End */
}
Figure 7-13: Control flow sketch for inplace linked list reversal manipulation
void llReverse(Node head){
    Node temp1 = null, temp2 = null, temp3 = null;
    temp1 = head;
    while(temp1 != null){
        unfold temp1;
        head = temp1;
        temp1 = temp1.next;
        head.next = temp3;
        temp3 = head;
        fold head;
    }
}
Figure 7-14: Synthesized implementation for inplace linked list reversal manipulation
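The four-statement loop body of Figure 7-14 is the familiar pointer-reversal loop, with temp3 holding the already reversed prefix. Below is a Python transcription, assuming a hypothetical Node class. The unfold and fold statements only manipulate the abstraction, so we treat them as having no concrete effect and drop them (our assumption). As before, the new head is returned rather than assigned.

```python
class Node:
    def __init__(self, val, next=None):
        self.val, self.next = val, next

def ll_reverse(head):
    """In-place reversal following Figure 7-14 (temp2 is unused there too)."""
    temp1, temp3 = head, None      # temp3: reversed prefix built so far
    while temp1 is not None:
        head = temp1               # current node
        temp1 = temp1.next         # advance before the link is overwritten
        head.next = temp3          # reverse the current link
        temp3 = head
    return head                    # last node visited, or None for an empty list

def to_list(head):
    out = []
    while head is not None:
        out.append(head.val)
        head = head.next
    return out

print(to_list(ll_reverse(Node(1, Node(2, Node(3, Node(4)))))))  # [4, 3, 2, 1]
```

The three test lengths used below (4, 1 and 0) roughly mirror the common case and the short-list scenarios of the storyboard.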
Chapter 8
Related Work
8.1
Software synthesis
Software synthesis has been an active area of research at least since the early 80s
when the seminal work of Waldinger and Manna [15, 14] showed some early promise
in the concept of deductive synthesis. A more algorithmic approach to synthesis was
pioneered by Pnueli and Rosner in the context of finite state controllers [19]. More
recently, some of the ideas from the field of controller synthesis have been applied to
software, for example, to synthesize program repairs [11].
The idea of using abstract interpretation for synthesis was recently introduced by Vechev, Yahav and Yorsh [30], as a follow-up to earlier work on synthesis of concurrent data structures [29]. Their system is designed to synthesize efficient synchronization for concurrent programs, and it is very different from ours, both in its scope and in the algorithms it uses. Unlike our system, their synthesizer is not based on constraint solving. Instead, it uses a "generate-and-test" approach: the system generates candidate implementations and tries to verify them using abstract interpretation, relying on domain-specific properties to prune the search space. This approach is very effective for the problem of synthesizing synchronization, but the constraint-based approach is more general and allows us to handle extremely large search spaces with no apparent structure.
The idea of using a constraint-based approach for abstract interpretation was previously introduced by Gulwani et al. [9]. More recently, their group has used similar techniques to synthesize invariants [10] and even complete programs [27]. The most important distinction between their work and ours is the use of storyboards to capture insights from the programmer and make the synthesis process more efficient.
The idea of using a sketch to define the structure of the implementation was adapted from the original work on sketch-based synthesis [24]. The idea was originally applied to the domain of bit-stream manipulations [26], such as ciphers and error correction codes, and has been applied more recently to scientific programs [24] and concurrent data structures [25].
8.2
Visual Programming and Programming by Example
Systems that use graphics to aid programming, debugging and program understanding have been an intriguing research area for a very long time. Myers [17] classifies these systems into three broad categories: Visual Programming (with Program Visualization), Programming by Example, and interactive/batch systems. Visual Programming refers to systems that allow programmers to specify program computations graphically, whereas Program Visualization is used for graphically visualizing data structures at run time for debugging purposes. The Programming by Example approach uses a finite set of input-output pairs and tries to infer a program that conforms to those examples.
Grail [8] was one of the earliest systems that compiled flowcharts to executable code. The AMBIT/G [5] language represented both programs and data as graphs, and the pictorial program was executed by pattern matching. The framework was used to describe a list-structure garbage collection program and a reduction-analysis string parser. These approaches alleviate the problem of writing code by letting programmers use static predefined pictures, but the burden of figuring out the exact sequence of operations still lies with the programmer. Moreover, attempting to capture dynamic transformations through static diagrams makes the resulting programs much more difficult to work with.
Shaw et al. [21] developed a framework for learning restricted Lisp programs from a single
input/output pair. The framework does not handle general programs and is not guaranteed
to learn the correct program. Pygmalion [22] was one of the first successful programming-by-demonstration systems. The programmer demonstrated the execution of the
program on a concrete example with the help of icons, and the system inferred a
recursive program from the example. Tinker [13], aimed at beginning programmers,
let them write Lisp programs by providing Lisp expressions or mouse inputs to guide
the execution on concrete examples of input data. These concrete program executions
were then generalized to symbolic executions, and ambiguities arising in the process were resolved by asking the programmer for disambiguation. These systems relieve the
programmer of reasoning about abstract inputs, but they still require the
programmer to know how the program is supposed to execute on concrete inputs.
Sketchpad [28] is a seminal work that led to a whole new field of human-computer
interaction and is also considered a major breakthrough in computer graphics
research. In this framework, a programmer sketched geometric shapes
such as straight lines and arcs using a light pen. The programmer could also express
constraints on these shapes to obtain regular geometric objects on the screen, and
used graphical buttons for operations such as copying. Even though this was
revolutionary work, it did not target general-purpose programming.
The Thinkpad [20] system combined ideas from programming by example, constraint-based systems, and graphical programming frameworks. It used data abstractions to let the
user draw data structures graphically and used constraints to specify data structure
invariants. The programmer could then manipulate these graphical abstractions and
perform an execution of the program on an example input. This system provided
a platform for programming pictorially, but the problem of reasoning
about the precise execution of the program remained.
The Storyboard programming framework combines ideas from visual programming, programming by example, and software synthesis research. It lets the programmer
give graphical specifications of input-output pairs, each of which may stand for a potentially infinite set of concrete pairs because of abstraction. The Storyboard framework differs from previous work in programming by example in that it does not require the user to provide concrete executions on the example inputs.
Moreover, our framework requires the programmer to provide a control-flow sketch of the program,
which structures the program search space and prevents
the synthesizer from producing arbitrary programs. Providing these programmer
insights rules out a large set of undesirable programs. The framework
also differs from previous work in visual programming in that it does not require programmers to provide a pictorial execution of the desired program. Only the
input and output pictures are required, and these are much easier to reason about
than a complete program execution.
Chapter 9
Conclusions and Future Work
In this thesis, we present a framework for automatically synthesizing correct implementations of data structure manipulations from graphical
specifications, with unbounded correctness guarantees. Graphical specifications combined with a synthesis engine relieve
programmers of the complex reasoning required by the corresponding low-level
implementation. The programmer only needs to reason graphically about the manipulation,
which is much more intuitive than reasoning about low-level code. The thesis also presents a
novel algorithm to automatically derive from the storyboards the abstract domain
required to provide unbounded correctness guarantees for the synthesized implementation. We present preliminary results demonstrating the feasibility of our approach on
linked list and binary search tree manipulation examples.
Our immediate future work is to implement the graphical frontend for the storyboard language. We intend to provide basic graphical constructs similar to those of Thinkpad [20]
that let programmers draw abstract data structures graphically. We also intend to allow programmers to specify constraints over the data abstractions in the frontend.
We then plan to use our framework to synthesize challenge benchmarks such as red-black
tree manipulations. Ultimately, our goal is for programmers to use this
framework for a multitude of tasks involving manipulations of arbitrary user-defined
data structures.
We also plan to investigate modular synthesis using storyboards.
To synthesize large implementations, we intend to first synthesize small modules
using storyboards and then search over these modules to obtain the desired large implementations. For example, insertion into a red-black tree can be broken down into two
modules: a module for insertion into a binary search tree and a module for fixing the height
invariant. Furthermore, the binary search tree module can be used as a sub-module
of the height-invariant-fixing module.
Bibliography
[1] Thomas Ball, Rupak Majumdar, Todd D. Millstein, and Sriram K. Rajamani.
Automatic predicate abstraction of C programs. In PLDI, pages 203-213, 2001.
[2] Dirk Beyer, Thomas A. Henzinger, Rupak Majumdar, and Andrey Rybalchenko.
Path invariants. In PLDI, pages 300-309, 2007.
[3] Alan Borning. The programming language aspects of ThingLab, a constraint-oriented simulation laboratory. ACM Trans. Program. Lang. Syst., 3(4):353-387,
1981.
[4] Marc H. Brown and Robert Sedgewick. Techniques for algorithm animation.
IEEE Software, 2(1):28-39, 1985.
[5] Carlos Christensen. On the implementation of AMBIT, a language for symbol
manipulation. Commun. ACM, 9(8):570-573, 1966.
[6] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein.
Introduction to Algorithms, Second Edition. The MIT Press and McGraw-Hill
Book Company, 2001.
[7] Patrick Cousot and Radhia Cousot. Abstract interpretation: A unified lattice
model for static analysis of programs by construction or approximation of fixpoints. In POPL, pages 238-252, 1977.
[8] T. O. Ellis, J. F. Heafner, and W. F. Sibley. The GRAIL project: An experiment in
man-machine communications. Technical Report RM-5999-ARPA, RAND, 1969.
[9] Sumit Gulwani, Saurabh Srivastava, and Ramarathnam Venkatesan. Program
analysis as constraint solving. In PLDI '08: Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation, pages
281-292, New York, NY, USA, 2008. ACM.
[10] Sumit Gulwani, Saurabh Srivastava, and Ramarathnam Venkatesan. Constraint-based invariant inference over predicate abstraction. In VMCAI '09: Proceedings
of the 2009 Conference on Verification, Model Checking, and Abstract Interpretation, pages 120-135, 2009.
[11] Barbara Jobstmann, Andreas Griesmayer, and Roderick Bloem. Program repair
as a game. In CAV, pages 226-238, 2005.
[12] Gary A. Kildall. A unified approach to global program optimization. In POPL
'73: Proceedings of the 1st annual ACM SIGACT-SIGPLAN symposium on Principles of programming languages, pages 194-206, New York, NY, USA, 1973.
ACM.
[13] Henry Lieberman. Tinker: Example-based programming for artificial intelligence. In IJCAI, page 1060, 1981.
[14] Z. Manna and R. Waldinger. Synthesis: Dreams => programs. IEEE Transactions on Software Engineering, 5(4):294-328, 1979.
[15] Zohar Manna and Richard Waldinger. A deductive approach to program synthesis. ACM Trans. Program. Lang. Syst., 2(1):90-121, 1980.
[16] B. A. Myers. Displaying data structures for interactive debugging. Technical
Report CSL-80-7, Xerox PARC, 1980.
[17] B. A. Myers. Visual programming, programming by example, and program
visualization: a taxonomy. SIGCHI Bull., 17(4):59-66, 1986.
[18] Flemming Nielson, Hanne R. Nielson, and Chris Hankin. Principles of Program
Analysis. Springer-Verlag New York, Inc., Secaucus, NJ, USA, 1999.
[19] Amir Pnueli and Roni Rosner. On the synthesis of an asynchronous reactive
module. In ICALP '89: Proceedings of the 16th International Colloquium on
Automata, Languages and Programming, pages 652-671, London, UK, 1989.
Springer-Verlag.
[20] R. V. Rubin, E. J. Golin, and S. P. Reiss. ThinkPad: A graphical system for
programming by demonstration. IEEE Software, 2:73-79, 1985.
[21] David E. Shaw, William R. Swartout, and C. Cordell Green. Inferring lisp
programs from examples. In IJCAI'75: Proceedings of the 4th international
joint conference on Artificial intelligence, pages 260-267, San Francisco, CA,
USA, 1975. Morgan Kaufmann Publishers Inc.
[22] David Canfield Smith. Pygmalion: a creative programming environment. PhD
thesis, Stanford, CA, USA, 1975.
[23] Armando Solar-Lezama. Program Synthesis By Sketching. PhD thesis, EECS
Dept., UC Berkeley, 2008.
[24] Armando Solar-Lezama, Gilad Arnold, Liviu Tancau, Rastislav Bodik, Vijay
Saraswat, and Sanjit Seshia. Sketching stencils. In PLDI '07: Proceedings of the
2007 ACM SIGPLAN conference on Programming language design and implementation, volume 42, pages 167-178, New York, NY, USA, 2007. ACM.
[25] Armando Solar-Lezama, Chris Jones, Gilad Arnold, and Rastislav Bodik. Sketching concurrent data structures. In PLDI '08, 2008.
[26] Armando Solar-Lezama, Rodric Rabbah, Rastislav Bodik, and Kemal Ebcioglu.
Programming by sketching for bit-streaming programs. In PLDI '05: Proceedings
of the 2005 ACM SIGPLAN conference on Programming language design and
implementation, pages 281-294, New York, NY, USA, 2005. ACM Press.
[27] Saurabh Srivastava, Sumit Gulwani, and Jeffrey Foster. From program verification to program synthesis. In POPL, 2010.
[28] Ivan E. Sutherland. Sketchpad, A Man-Machine Graphical Communication System. Outstanding Dissertations in the Computer Sciences. Garland Publishing,
New York, 1963.
[29] Martin Vechev and Eran Yahav. Deriving linearizable fine-grained concurrent
objects. SIGPLAN Not., 43(6):125-135, 2008.
[30] Martin Vechev, Eran Yahav, and Greta Yorsh. Abstraction-guided synthesis of
synchronization. In POPL, New York, NY, USA, 2010. ACM.
[31] Jan Wielemaker. An overview of the SWI-Prolog programming environment. In
WLPE, pages 1-16, 2003.