Department of Computer Science
University of California, Santa Barbara
bultan@cs.ucsb.edu
http://www.cs.ucsb.edu/~bultan/composite
Concurrent programming is difficult and error prone
– Sequential programming: states of the variables
– Concurrent programming: states of the variables and the processes
Linked list manipulation is difficult and error prone
– States of the heap: possibly infinite
We would like to guarantee properties of a concurrent linked list implementation
There has been work on verification of concurrent systems with integer variables (and linear constraints)
– [Boigelot 98], [Bultan, Gerber and Pugh, TOPLAS 99], [Delzanno and Podelski, STTT 01]
There has been work on verification of (concurrent) linked lists
– [Sagiv,Reps, Wilhelm TOPLAS 98], [Yahav POPL 01]
What can we do for concurrent systems:
– where both integer and heap variables influence the control flow
– or the properties we wish to verify involve both integer and heap variables?
Use symbolic verification techniques
– Use polyhedra to represent the states of the integer variables
– Use BDDs to represent the states of the boolean and enumerated variables
– Use shape graphs to represent the states of the heap
– Use a composite representation to combine them
Use forward-fixpoint computations to compute reachable states
– Truncated fixpoint computations can be used to detect errors
– Over-approximation techniques can be used to prove properties
• Polyhedra widening
• Summarization in shape graphs
[Figure: tool chain. An Action Language specification, or guarded commands passed through a translator to Action Language, goes to the Action Language Parser and then to the Action Language Verifier.]
Students who work with me on this project:
– Tuba Yavuz-Kahveci
– Constantinos Bartzis
– Aysu Betin-Can
[Figure: tool components. Composite Symbolic Library; Code Generator, producing verified code (Java monitor classes); Omega Library (Presburger arithmetic manipulator); CUDD Package (BDD manipulator); MONA (automata manipulator).]
Composite Symbolic Library, integration of polyhedra representation with BDDs
– [Yavuz-Kahveci, Tuncer, Bultan, TACAS 01], [Yavuz-Kahveci, Bultan, STTT]
Action Language Verifier
– [Bultan ICSE 00], [Bultan, Yavuz-Kahveci ASE 01]
Verification of Concurrency Control Components using Action Language Verifier
– [Yavuz-Kahveci, Bultan ISSTA 02]
Using automata representation for Presburger arithmetic in Composite Symbolic Library
– [Bartzis and Bultan, CIAA 02], [Bartzis and Bultan, IJFCS], [Bartzis, Bultan CAV 03]
Specification of concurrent linked lists
– Action Language
Symbolic verification
– Composite representation
Approximation techniques
– Summarization
– Widening
Counting abstraction
Experimental results
Related Work
Conclusions
[Bultan ICSE 00] [Yavuz-Kahveci, Bultan ASE 01]
A state based language
– Actions correspond to state changes
States correspond to valuations of variables
– Integer (possibly unbounded), heap, boolean and enumerated variables
– Parameterized constants are allowed
Transition relation is defined using actions
– Atomic actions: Predicates on current and next state variables
– Action composition: synchronous (&) or asynchronous (|)
Modular
– Modules can have submodules
Properties to be verified
– Invariant(p) : p always holds
We use state formulas to express the properties we need to check
– No primed variables in state formulas
– State formulas are boolean combinations (¬, ∧, ∨, ⇒, ⇔) of integer, boolean and heap formulas
– Example: numItems > 2 => top.next != null, where numItems > 2 is an integer formula and top.next != null is a heap formula
Boolean formulas
– Boolean variables and constants (true, false)
– Relational operators: =, ≠
– Boolean connectives (¬, ∧, ∨, ⇒, ⇔)
Integer formulas (linear arithmetic)
– Integer variables and constants
– Arithmetic operators: +, −, and * with a constant
– Relational operators: =, ≠, >, <, ≥, ≤
– Boolean connectives (¬, ∧, ∨, ⇒, ⇔)
Heap formulas
– Heap variable, heap-variable.selector, heap constant null
– Relational operators: =, ≠
– Boolean connectives (¬, ∧, ∨, ⇒, ⇔)
We use transition formulas to express the actions
– In transition formulas primed variables denote the next-state values, unprimed variables denote the current-state values
– Example: pc=l2 and numItems=0 and top'=add and numItems'=1 and pc'=l3;
  • current state variables: pc, numItems, add
  • next state variables: top', numItems', pc'
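One way to read a transition formula is as a predicate over a pair of states. The following is a minimal Python sketch of that reading for the example action above, assuming states are plain dictionaries from variable names to values; the function name `push2` and the dictionary encoding are illustrative choices, not part of Action Language.

```python
def push2(cur, nxt):
    """pc=l2 and numItems=0 and top'=add and numItems'=1 and pc'=l3

    `cur` holds the unprimed (current-state) values,
    `nxt` holds the primed (next-state) values.
    """
    return (cur["pc"] == "l2"
            and cur["numItems"] == 0
            and nxt["top"] == cur["add"]
            and nxt["numItems"] == 1
            and nxt["pc"] == "l3")


# Hypothetical states: node names such as "n1" are placeholders.
current = {"pc": "l2", "numItems": 0, "top": None, "add": "n1"}
successor = {"pc": "l3", "numItems": 1, "top": "n1", "add": "n1"}
print(push2(current, successor))  # True
```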
Transition formulas are in the form:
– boolean-formula ∧ integer-formula ∧ heap-transition-formula
Heap transition formulas are in the form:
– guard-formula ∧ update-formula
A guard formula is a boolean combination of terms in the form:
– $id_1 = id_2$, $id_1 = id_2.f$, $id_1.f = id_2$, $id_1.f = id_2.f$, $id_1 = null$, $id_1.f = null$
– $id_1 \neq id_2$, $id_1 \neq id_2.f$, $id_1.f \neq id_2$, $id_1.f \neq id_2.f$, $id_1 \neq null$, $id_1.f \neq null$
An update formula is a term in the form:
– $id'_1 = id_2$, $id'_1.f = id_2$, $id'_1 = id_2.f$, $id'_1.f = id_2.f$
– $id'_1 = null$, $id'_1.f = null$
– $id'_1 = new$, $id'_1.f = new$
module main()
  // Variable declarations define the state space of the system
  heap {next} top, add, get, newTop;
  boolean mutex;
  integer numItems;
  // Initial states
  initial: top=null and mutex and numItems=0;

  module push()
    enumerated pc {l1, l2, l3, l4};
    initial: pc=l1 and add=null;
    // Atomic actions: primed variables denote the next state variables
    push1: pc=l1 and mutex and !mutex' and add'=new and pc'=l2;
    push2: pc=l2 and top=null and top'=add and numItems'=1 and pc'=l3;
    push3: pc=l3 and top'.next=null and mutex' and pc'=l1;
    push4: pc=l2 and top!=null and add'.next=top and pc'=l4;
    push5: pc=l4 and top'=add and numItems'=numItems+1 and mutex' and pc'=l1;
    // Transition relation of the push module is defined as the
    // asynchronous composition of its atomic actions
    push: push1 | push2 | push3 | push4 | push5;
  endmodule

  module pop()
    enumerated pc {l1, l2, l3};
    initial: pc=l1 and get=null and newTop=null;
    pop1: pc=l1 and mutex and top!=null and newTop'=top.next and !mutex' and pc'=l2;
    pop2: pc=l2 and get'=top and pc'=l3;
    pop3: pc=l3 and top'=newTop and mutex' and numItems'=numItems-1 and pc'=l1;
    pop: pop1 | pop2 | pop3;
  endmodule

  // Transition relation of main is defined as the asynchronous
  // composition of two pop and two push processes
  main: pop() | pop() | push() | push();
  // Invariants to be verified
  spec: invariant([mutex =>(numItems=0 <=> top=null)])
  spec: invariant([mutex =>(numItems>2 => top->next!=null)])
endmodule
module main()
  heap {next} top, add, get, newTop;
  boolean mutex;
  integer numItems;
  initial: top=null and mutex and numItems=0;

  module push()
    enumerated pc {l1, l2, l3, l4};
    initial: pc=l1 and add=null;
    push1: pc=l1 and mutex and !mutex' and add'=new and pc'=l2;
    push2: pc=l2 and numItems=0 and top'=add and numItems'=1 and pc'=l3;
    push3: pc=l3 and add'.next=null and mutex' and pc'=l1;
    push4: pc=l2 and numItems>0 and add'.next=top and pc'=l4;
    push5: pc=l4 and top'=add and numItems'=numItems+1 and mutex' and pc'=l1;
    push: push1 | push2 | push3 | push4 | push5;
  endmodule
Forward fixpoint for the reachable states can be computed by iteratively manipulating symbolic representations
– We need forward-image (post-condition), union, and equivalence check computations
ReachableStates(I: Set of initial states, T: Transition relation) {
  RS := I;
  repeat {
    RS_old := RS;
    RS := RS_old ∪ forwardImage(RS_old, T);
  } until (RS = RS_old);
}
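The loop above can be tried out on explicit sets before moving to symbolic representations. The following is a minimal Python sketch, assuming states are hashable values and the transition relation is a set of (state, successor) pairs; `forward_image` and `reachable_states` are illustrative names, not the verifier's API.

```python
def forward_image(states, transitions):
    """Return the successors of `states` under a set of (src, dst) pairs."""
    return {dst for (src, dst) in transitions if src in states}


def reachable_states(initial, transitions):
    """Iterate RS := RS_old | forwardImage(RS_old, T) until nothing changes."""
    rs = set(initial)
    while True:
        rs_old = rs
        rs = rs_old | forward_image(rs_old, transitions)
        if rs == rs_old:  # equivalence check: the fixpoint is reached
            return rs


# Toy system: a counter that wraps around modulo 4, plus a state 7
# that has a transition into the loop but is never reached from 0.
T = {(0, 1), (1, 2), (2, 3), (3, 0), (7, 0)}
print(sorted(reachable_states({0}, T)))  # [0, 1, 2, 3]
```

With explicit sets the loop always terminates on a finite state space; the symbolic version needs the same three operations (forward image, union, equivalence check) on the symbolic representation.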
We use symbolic representations for encoding sets of states
Boolean logic formulas (stored as BDDs) represent the sets of states of the boolean variables, e.g. pc=l1 and mutex
Presburger arithmetic formulas (stored as polyhedra) represent the sets of states of the integer variables, e.g. numItems > 0
Sets of shape graphs represent the states of the heap variables and the heap
– Example: heap variables add and top point to node n1; add.next and top.next are both node n2; add.next.next is null
Each node in the shape graph represents a dynamically allocated memory location
Heap variables point to nodes of the shape graph (if they are not null)
The edges between the nodes show the locations pointed by the fields of the nodes
Each variable type is mapped to a symbolic representation type
– Boolean and enumerated types → BDD representation
– Integer variables → Polyhedra
– Heap variables → Shape graphs
Each conjunct in a transition formula operates on a single symbolic representation
Composite representation: A disjunctive representation to combine different symbolic representations
Union, subsumption check and forward-image computations are performed on this disjunctive representation
A composite representation A is a disjunction
$$A = \bigvee_{i=1}^{n} \bigwedge_{k=1}^{t} a_{ik}$$
where
– n is the number of composite atoms in A
– t is the number of basic symbolic representations
Each composite atom is a conjunction
– Each conjunct corresponds to a different symbolic representation
[Figure: an example composite representation. Each composite atom combines a BDD (over pc and mutex), a set of polyhedra (numItems=2 or numItems=3) and a set of shape graphs (over add and top).]
[Yavuz-Kahveci, Tuncer, Bultan TACAS 01], [Yavuz-Kahveci, Bultan STTT]
Composite Library implements this approach using an object-oriented design
An abstract class defines the common interface for symbolic representations
– Easy to extend with new symbolic representations
– Enables polymorphic verification
As a BDD library we use the Colorado University Decision Diagram Package (CUDD) [Somenzi et al]
As an integer constraint manipulator we use the Omega Library [Pugh et al]
For encoding the states of the heap variables and the heap we use shape graphs encoded as BDDs (using CUDD)
[Class diagram]
Symbolic (abstract class): +union(), +isSatisfiable(), +isSubset(), +forwardImage()
– BoolSym: –representation: BDD; +union(), ...
– HeapSym: –representation: list of ShapeGraph; +union(), ...
– IntSym: –representation: list of Polyhedra; +union(), ...
– CompSym: –representation: list of compAtom; +union(), ...
ShapeGraph: –atom: *Symbolic
compAtom: –atom: *Symbolic
External libraries shown in the diagram: CUDD Library, OMEGA Library
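Given a composite representation
$$A = \bigvee_{i=1}^{n} \bigwedge_{k=1}^{t} a_{ik}$$
We can check satisfiability as follows:
$$\mathit{isSatisfiable}(A) = \bigvee_{i=1}^{n} \bigwedge_{k=1}^{t} \mathit{isSatisfiable}(a_{ik})$$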
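The satisfiability check decomposes because the conjuncts of a composite atom range over disjoint sets of variables. Below is a minimal Python sketch of that decomposition, assuming each basic representation is replaced by an explicit set of values (so a conjunct is satisfiable when its set is non-empty); the encoding and function names are hypothetical and stand in for the BDD, polyhedra and shape graph checks.

```python
def atom_is_satisfiable(atom):
    """A composite atom (a conjunction) is satisfiable iff every conjunct is."""
    return all(len(component) > 0 for component in atom)


def is_satisfiable(composite):
    """A composite representation (a disjunction) needs one satisfiable atom."""
    return any(atom_is_satisfiable(atom) for atom in composite)


# Each atom is (boolean part, integer part); each part is an explicit set.
A = [
    ({("l1", True)}, set()),       # integer part is empty: atom unsatisfiable
    ({("l2", False)}, {0, 1, 2}),  # both parts non-empty: atom satisfiable
]
print(is_satisfiable(A))  # True
```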
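Given composite representations for a set of states and a transition relation:
$$A = \bigvee_{i=1}^{n} \bigwedge_{k=1}^{t} a_{ik} \qquad R = \bigvee_{j=1}^{m} \bigwedge_{k=1}^{t} r_{jk}$$
We can compute the forward image as follows:
$$\mathit{forwardImage}(A, R) = \bigvee_{i=1}^{n} \bigvee_{j=1}^{m} \bigwedge_{k=1}^{t} \mathit{forwardImage}(a_{ik}, r_{jk})$$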
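The forward image formula can be exercised with a small Python sketch. It assumes each basic representation is an explicit set of values and each transition component is an explicit set of (current, next) pairs; all names are illustrative. Discarding atoms with an empty conjunct is an extra simplification in this sketch, not part of the formula.

```python
from itertools import product


def image(states, relation):
    """Forward image of an explicit set under a set of (current, next) pairs."""
    return frozenset(nxt for (cur, nxt) in relation if cur in states)


def forward_image(A, R):
    """Pairwise, component-wise forward image of composite A under composite R."""
    result = []
    for a, r in product(A, R):  # disjunction over i and j
        # conjunction over k: one image per basic representation
        atom = tuple(image(a_k, r_k) for a_k, r_k in zip(a, r))
        if all(atom):  # simplification: drop atoms with an empty conjunct
            result.append(atom)
    return result


# One composite atom: boolean part is a (pc, mutex) pair, integer part is numItems.
# Assumption of this example: mutex is false while the process is at l4.
A = [(frozenset({("l4", False)}), frozenset({2}))]
# One transition atom modelled on push5: pc=l4 and mutex' and pc'=l1,
# numItems'=numItems+1 (numItems limited to 0..9 to keep the relation finite).
R = [(
    frozenset({(("l4", m), ("l1", True)) for m in (False, True)}),
    frozenset({(n, n + 1) for n in range(10)}),
)]
print(forward_image(A, R))
```

The result contains a single atom with pc=l1, mutex true and numItems=3.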
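[Figure: forward image example. Set of states: pc=l4, numItems=2, with a shape graph over add and top. Transition relation: pc=l4 and mutex' and pc'=l1, numItems'=numItems+1, top'=add. Result: pc=l1 and mutex, numItems=3, with the updated shape graph over add and top.]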
[Figure: iterates of the forward-fixpoint computation. The composite atoms move through pc=l1, l2, l3 and l4 while numItems takes the values 0, 1, 2, 3, ... and the shape graphs over add and top change with it; the sequence continues indefinitely.]
We have two reasons for non-termination
– integer variables can increase without a bound
– the number of nodes in the shape graphs can increase without a bound
The state space is infinite
Even if we ignore the heap variables, reachability is undecidable when we have unbounded integer variables
So, we use conservative approximations
To verify or falsify a property p
Compute a lower ($RS^-$) or an upper ($RS^+$) approximation to the set of reachable states
There are three possibilities:
– $RS^+ \subseteq p$: “The property is satisfied”
– $RS^- \not\subseteq p$: “The property is false” (there are reachable states which violate the property)
– $RS^- \subseteq p$ and $RS^+ \not\subseteq p$: “I don’t know”
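The three outcomes can be sketched as a small decision procedure. This is a minimal Python illustration over explicit sets, assuming the property p is given as the set of states that satisfy it; `check_invariant` and its parameters are hypothetical names.

```python
def check_invariant(p, rs_lower=None, rs_upper=None):
    """Decide an invariant from a lower and/or an upper approximation.

    p        : set of states satisfying the property
    rs_lower : subset of the reachable states (e.g. truncated fixpoint)
    rs_upper : superset of the reachable states (e.g. after widening)
    """
    if rs_upper is not None and rs_upper <= p:
        return "The property is satisfied"
    if rs_lower is not None and not rs_lower <= p:
        return "The property is false"
    return "I don't know"


p = set(range(6))  # property: the state is between 0 and 5
print(check_invariant(p, rs_lower={0, 1}, rs_upper={0, 1, 2, 3}))
print(check_invariant(p, rs_lower={0, 9}, rs_upper=set(range(12))))
print(check_invariant(p, rs_lower={0, 1}, rs_upper=set(range(8))))
```

The three calls print the three verdicts in the order satisfied, false, unknown.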
Truncated fixpoint computation
– To compute a lower bound for a least-fixpoint computation
– Stops after a fixed number of iterations
Widening
– To compute an upper bound for the least-fixpoint computation
– We use a generalization of the polyhedra widening operator by [Cousot and Halbwachs POPL’78]
Summarization
– Generate summary nodes in the shape graphs which represent more than one concrete node
– Materialization: we need to generate concrete nodes from the summary nodes when needed
The nodes that form a chain are mapped to a summary node
...
No heap variable points to any concrete node that is mapped to a summary node
Each concrete node mapped to a summary node is pointed to only by a concrete node which is also mapped to the same summary node
During summarization, we also introduce an integer variable which counts the number of concrete nodes mapped to a summary node
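Example: pc=l1 and mutex, numItems=3 [Figure: shape graph over add and top with the summarized nodes marked]
After summarization, it becomes: pc=l1 and mutex, numItems=3 and summaryCount=2 [Figure: shape graph over add and top with a summary node]
– summaryCount is a new integer variable representing the number of concrete nodes encoded by the summary node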
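Summarization of a singly linked chain can be sketched as follows. This is a simplified Python illustration, assuming the heap is a dictionary from node names to their `next` successor and that the chain to collapse starts right after a given node; the function name, the node names and the "collapse only chains of at least two nodes" threshold are assumptions of the sketch.

```python
def summarize(next_of, variables, start):
    """Collapse the chain of nodes following `start` into one summary node.

    next_of   : dict node -> successor node or None
    variables : dict heap variable -> node or None
    Returns (new next_of, summary count); the count is 0 if nothing collapsed.
    """
    pointed = set(variables.values())
    indegree = {}
    for dst in next_of.values():
        if dst is not None:
            indegree[dst] = indegree.get(dst, 0) + 1

    chain, seen = [], {start}
    node = next_of[start]
    # A node joins the chain if no heap variable points to it and its only
    # incoming edge comes from the previous node on the chain.
    while (node is not None and node not in seen
           and node not in pointed and indegree.get(node, 0) == 1):
        chain.append(node)
        seen.add(node)
        node = next_of[node]

    if len(chain) < 2:
        return dict(next_of), 0
    summarized = {n: s for n, s in next_of.items() if n not in chain}
    summarized[start] = "summary"
    summarized["summary"] = node  # exit of the chain (a node or None)
    return summarized, len(chain)


heap = {"n1": "n2", "n2": "n3", "n3": None}   # numItems = 3
heap_vars = {"add": "n1", "top": "n1"}
print(summarize(heap, heap_vars, "n1"))
# ({'n1': 'summary', 'summary': None}, 2)
```

The returned count plays the role of the summary count variable: two concrete nodes are encoded by the one summary node.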
Summarization guarantees that the number of different shape graphs that can be generated is finite
However, the summary counts can still increase without bound
We use the polyhedral widening operation to force the fixpoint computation to converge
[Figure: further fixpoint iterates after summarization. Composite atoms with pc=l1, pc=l2 and pc=l4, each with numItems=3 and summaryCount=2, followed by an atom with pc=l1 and mutex, numItems=4 and summaryCount=2; each has a shape graph over add and top.]
We need to do summarization again
– Before: pc=l1 and mutex, numItems=4 and summaryCount=2
– After summarization, it becomes: pc=l1 and mutex, numItems=4 and summaryCount=3
After each fixpoint iteration we try to merge as many composite atoms as possible
For example, the following composite atoms can be merged:
– pc=l1 and mutex, numItems=3 and summaryCount=2
– pc=l1 and mutex, numItems=4 and summaryCount=3
Both atoms have the same BDD and the same shape graph over add and top, so they merge into:
– pc=l1 and mutex, (numItems=4 and summaryCount=3) or (numItems=3 and summaryCount=2)
which simplifies to:
– pc=l1 and mutex, numItems=summaryCount+1 and 3 <= numItems and numItems <= 4
Forward-fixpoint computation still will not converge since numItems and summaryCount keep increasing without a bound
We use the widening operation:
– Given two composite atoms $c_1$ and $c_2$ in consecutive fixpoint iterates, assume that $c_1 = b_1 \wedge i_1 \wedge h_1$ and $c_2 = b_2 \wedge i_2 \wedge h_2$, where $b_1 = b_2$ and $h_1 = h_2$ and $i_1 \subseteq i_2$
– Also assume that $i_1$ is a single polyhedron (i.e. a conjunction of arithmetic constraints) and $i_2$ is also a single polyhedron
Then
– $i_1 \nabla i_2$ is defined as: all the constraints in $i_1$ which are also satisfied by $i_2$
Replace $i_2$ with $i_1 \nabla i_2$ in $c_2$
This generates an upper approximation to the forward-fixpoint computation
Example:
– Previous iterate: pc=l1 and mutex, numItems=summaryCount+1 and 3 <= numItems and numItems <= 4
– Current iterate: pc=l1 and mutex, numItems=summaryCount+1 and 3 <= numItems and numItems <= 5
– After widening: pc=l1 and mutex, numItems=summaryCount+1 and 3 <= numItems
Now, the forward-fixpoint converges
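The widening step on the integer part can be sketched in Python. This sketch makes two simplifying assumptions that are not from the slides: each constraint is written as a linear inequality sum(coeff * var) <= bound, and the newer polyhedron is bounded and given by its vertices, so that a linear constraint holds on all of it exactly when it holds at every vertex. A real implementation would test entailment with a polyhedra library instead.

```python
def holds_at(point, constraint):
    """Check sum(coeff * value) <= bound at one point."""
    coeffs, bound = constraint
    return sum(c * point[var] for var, c in coeffs.items()) <= bound


def widen(i1_constraints, i2_vertices):
    """Keep the constraints of i1 that every vertex of i2 satisfies."""
    return [c for c in i1_constraints
            if all(holds_at(v, c) for v in i2_vertices)]


# i1: numItems = summaryCount + 1 and 3 <= numItems and numItems <= 4
i1 = [
    ({"numItems": 1, "summaryCount": -1}, 1),    # numItems - summaryCount <= 1
    ({"numItems": -1, "summaryCount": 1}, -1),   # numItems - summaryCount >= 1
    ({"numItems": -1}, -3),                      # numItems >= 3
    ({"numItems": 1}, 4),                        # numItems <= 4
]
# i2: the same line segment extended to numItems <= 5, given by its end points
i2 = [{"numItems": 3, "summaryCount": 2}, {"numItems": 5, "summaryCount": 4}]

widened = widen(i1, i2)
for constraint in widened:
    print(constraint)
```

Only the upper bound numItems <= 4 is dropped, leaving numItems = summaryCount + 1 and numItems >= 3, which matches the widened atom in the example.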
Use counting abstraction [Delzanno CAV’00]
– Create an integer variable for each local state of a process
– Each variable will count the number of processes in a particular state
Local states of the processes have to be finite
– Shared variables of the monitor can be unbounded
Counting abstraction can be automated
module main()
  heap top, add, get, newTop;
  boolean mutex;
  integer numItems;
  // Variables for counting the number of processes in each state
  integer l1C, l2C, l3C, l4C;
  // Parameterized constant representing the number of processes
  parameterized integer numProc;
  // Initialize the initial state counter to the number of processes.
  // Initialize the other states to 0.
  initial: top=null and mutex and numItems=0 and l1C=numProc and l2C=0 and l3C=0 and l4C=0;
  restrict: numProc>0;

  module push()
    //enumerated pc {l1, l2,l3,l4};
    initial: add=null;
    // When the local state changes, the local state counters are updated
    push1: l1C>0 and mutex and !mutex' and add'=new and l1C'=l1C-1 and l2C'=l2C+1;
    push2: l2C>0 and top=null and top'=add and numItems'=1 and l2C'=l2C-1 and l3C'=l3C+1;
    ...
  endmodule
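The counter updates can be mimicked with a short Python sketch. It assumes an abstract state is a pair of a counter dictionary and the value of mutex, and it leaves out the heap update add'=new; `move` and `push1` are illustrative names.

```python
def move(counters, src, dst):
    """One process leaves local state `src` and enters `dst`.

    Returns None when no process is in `src` (the guard srcC > 0 fails).
    """
    if counters[src] == 0:
        return None
    updated = dict(counters)
    updated[src] -= 1
    updated[dst] += 1
    return updated


def push1(state):
    """l1C>0 and mutex and !mutex' and l1C'=l1C-1 and l2C'=l2C+1 (heap part omitted)."""
    counters, mutex = state
    if not mutex:
        return None
    moved = move(counters, "l1", "l2")
    return None if moved is None else (moved, False)


num_proc = 3  # hypothetical value of the parameterized constant numProc
initial = ({"l1": num_proc, "l2": 0, "l3": 0, "l4": 0}, True)
after = push1(initial)
print(after)         # ({'l1': 2, 'l2': 1, 'l3': 0, 'l4': 0}, False)
print(push1(after))  # None: the lock is held, so push1 is disabled
```

The sum of the counters stays equal to the number of processes after every step, so the abstraction works for any numProc > 0 without naming individual processes.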
SPECIFICATION: Stack, Single Lock Queue, Two Lock Queue
VERIFIED INVARIANTS:
– top=null numItems=0
– top null numItems 0
– numItems=2 top.next null
– head=null numItems=0
– head null numItems 0
– (head=tail head null) numItems=1
– head tail numItems 0
– numItems>1 head tail
– numItems>2 head.next tail
Number of Processes   Queue HC   Queue IC   Stack HC   Stack IC
1P-1C                 10.19      12.95      4.57       5.21
2P-2C                 15.74      21.64      6.73       8.24
4P-4C                 31.55      46.5       12.71      15.11
1P-PC                 12.85      13.62      5.61       5.73
PP-1C                 18.24      19.43      6.48       6.82

2Lock Queue HC: 60.5, 88.26
2Lock Queue IC: 58.13, 122.47
We need a summarization operation that can be used to define more than just singly linked lists
We use restricted graph grammar rules to define summarization patterns
The nodes which match the summarization pattern are represented with a single node
We still keep a summary count for each summary node
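Example summarization patterns:
– $L\,x \rightarrow x.n = y,\; L\,y$ [Figure: list with edges labeled n]
– $L\,x \rightarrow x.n = y,\; y.p = x,\; L\,y$ [Figure: list with edges labeled n and p]
– $L\,x \rightarrow x.n = y,\; x.d = z,\; L\,y$ [Figure: list with edges labeled n and d]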
Find the (maximal) set of nodes that match the pattern
We are looking at linear linked lists
– there will be one entry node
– and one exit node (exit node is not included in the summary node)
Other than the entry and the exit nodes, the set of nodes which match the pattern do not have any incoming or outgoing edges to outside nodes
Using summarization patterns we can handle a larger class of linked lists
Summarization and materialization operations can be done automatically based on the summarization pattern
Verification is still completely automatic but the user has to give the summarization pattern
There is a lot of work on shape analysis; I will just mention the ones which directly influenced us:
– [Sagiv,Reps, Wilhelm TOPLAS’98]
– [Dor, Rodeh, Sagiv SAS’00]
Verification of concurrent linked lists with arbitrary number of processes in [Yahav POPL’01]
[Sagiv, Reps, Wilhelm TOPLAS], [Lev-Ami, Reps, Sagiv, Wilhelm ISSTA 00] use 3-valued logic and instrumentation predicates to verify properties that cannot be expressed directly in our framework, such as sorted linked lists; however, our approach does not require instrumentation predicates
[Sagiv, Reps, Wilhelm ESOP 03]: recent results on automatically generating instrumentation predicates
Deutsch used integer constraint lattices to compute aliasing information using symbolic access paths [Deutsch PLDI’94]
The idea of summarization patterns is based on the shape types introduced in [Fradet and Le Métayer POPL 97]
Liveness properties?
– We would like to do full CTL model checking
Backward image computation?