Randal E. Bryant
Carnegie Mellon University http://www.cs.cmu.edu/~bryant
Decision Procedures in Formal Verification
[Figure: verification flow. RTL / source code + specification → abstraction → formal model + specification → verification → OK or error.]
Decision Procedure for Decidable Fragment of First-Order Logic
Applications: Out-of-order, Pipelined Microprocessors; Cache
Coherence Protocols; Device Drivers; Compiler Validation; …
SAT-based Decision Procedures
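EAGER ENCODING
[Figure: input formula → satisfiability-preserving Boolean encoder → Boolean formula → SAT solver → satisfiable / unsatisfiable.]
LAZY ENCODING
[Figure: input formula → approximate Boolean encoder → Boolean formula → SAT solver. If the SAT solver reports unsatisfiable, the result is unsatisfiable. If it reports satisfiable, the satisfying assignment goes to a first-order conjunctions SAT checker: if that is satisfiable, the result is satisfiable; if unsatisfiable, an additional clause is fed back to the Boolean encoder.]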
Lazy Encoding Characteristics
[Figure: first-order conjunctions SAT checker built from a theory combiner over individual theories: uninterpreted functions, linear arithmetic, bit vectors, …, theory N.]
+ Can be extended to handle wide variety of theories
+ Clean & modular design
– Does not scale well
Number of calls to conjunction checker typically exponential in formula size
Each call independent: nothing learned in one call can be exploited by another
Eager Encoding Characteristics
– Must encode all information about domain properties into Boolean formula
– Some properties can give exponential blowup
+ Lets SAT solver do all of the work
[Figure: input formula → satisfiability-preserving Boolean encoder → Boolean formula → SAT solver → satisfiable / unsatisfiable.]
Good Approach for Some Domains
Modern SAT solvers have remarkable capacity
Good at extracting relevant portions out of very large formulas
Learns about formula properties as search proceeds
Focus of this talk
Data and Function Abstraction
Bit-vectors to (unbounded) Integers
[Figure: bit vector x_0, x_1, x_2, …, x_{n-1} abstracted to a single integer x.]
Functional units to Uninterpreted Functions
[Figure: ALU replaced by uninterpreted function f.]
a = x ∧ b = y ⇒ f(a,b) = f(x,y)
Common Operations
ITE(p, x, y): If-then-else (selects x when p = 1, y when p = 0)
x = y: Test for equality
Abstract Modeling of Microprocessor
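[Figure: abstract pipeline model. Components: PC, Op, Rd, Ra, Rb, Imm, Adat, register file, control logic, equality comparators, and pipeline registers IF/ID, ID/EX, EX/WB. Data-transforming blocks, including the ALU, are replaced by uninterpreted functions such as F1 and F2.]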
For any Block that Transforms or Evaluates Data:
Replace with generic, unspecified function
Also view instruction memory as function
EUF: Equality with Uninterpreted Functions
Decidable fragment of first-order logic
Formulas (F): Boolean expressions
  ¬F, F_1 ∧ F_2, F_1 ∨ F_2     Boolean connectives
  T_1 = T_2                    Equation
  P(T_1, …, T_k)               Predicate application
Terms (T): Integer expressions
  ITE(F, T_1, T_2)             If-then-else
  Fun(T_1, …, T_k)             Function application
Functions (Fun): Integer → Integer
  f                            Uninterpreted function symbol
  Read, Write                  Memory operations
Predicates (P): Integer → Boolean
  p                            Uninterpreted predicate symbol
EUF Decision Problem
Circuit Representation of Formula
[Figure: formula drawn as a circuit of uninterpreted function blocks f and equality comparators.]
Truth Values
Dashed lines
Model control
Logical connectives
Equations
Integer Values
Solid lines
Model data
Uninterpreted functions
If-then-else operation
Task
Determine whether formula F is universally valid
True for all interpretations of variables and function symbols
Often expressed as (un)satisfiability problem
» Prove that formula ¬F is not satisfiable
Finite Model Property for EUF
[Figure: the example circuit with its distinct terms labeled x_0, d_0, f(x_0), f(d_0).]
Observation
Any formula has limited number of distinct expressions
Only property that matters is whether or not different terms are equal
Boolean Encoding of Integer Values
Expression   Possible Values   Bit Encoding
x_0          {0}               0     0
d_0          {0,1}             0     b_10
f(x_0)       {0,1,2}           b_21  b_20
f(d_0)       {0,1,2,3}         b_31  b_30
For Each Expression
Either equal to or distinct from each preceding expression
Boolean Encoding
Use Boolean values to encode integers over small range
EUF formula can be translated into propositional logic
Logic circuit with multiplexors, comparators, logic gates
Tautology iff original formula valid
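The value ranges in the table above can be tabulated mechanically. The following is a minimal sketch, assuming expressions are processed in a fixed order and that the i-th expression (counting from 0) may take any value in {0, …, i}; the function name `value_ranges` is hypothetical and not part of any tool mentioned in the talk.

```python
def value_ranges(expressions):
    """For each expression, list its possible values and the bits needed.

    The i-th expression is either equal to one of the i preceding
    expressions or distinct from all of them, so values 0..i suffice.
    """
    table = []
    for i, name in enumerate(expressions):
        bits = i.bit_length()  # number of Boolean variables needed for 0..i
        table.append((name, list(range(i + 1)), bits))
    return table


rows = value_ranges(["x0", "d0", "f(x0)", "f(d0)"])
for name, values, bits in rows:
    print(name, values, bits)
```

For the four expressions on the slide this gives 0, 1, 2, and 2 Boolean variables, matching b_10, b_21 b_20, and b_31 b_30.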
Some History of EUF Decision Procedures
Ackermann, 1954
Quantifier-free decision problem can be decided based on finite instantiations
Burch & Dill, CAV ‘94
Automatic decision procedure
» Davis-Putnam enumeration
» Congruence closure to enforce functional consistency
Boolean approaches
Goel, et al, CAV ‘98
» Attempted with BDDs, but didn’t get good results
Bryant, German, Velev, CAV ‘99
» Could verify microprocessor using BDDs
Velev & Bryant, DAC 2001
» Demonstrated power of modern SAT procedures
Exploiting Positive Equality
Bryant, German, Velev CAV ‘99
First successful use of Boolean methods for EUF
Positive Equality
Equations that appear in unnegated form
Exploiting
Can greatly reduce number of cases required to show validity
Only need to consider maximally diverse interpretations
Reduce number of Boolean variables in bit-level encoding
Diverse Interpretations: Illustration
Task
Verify someone’s obscure code for 4X4 array transpose
    void trans(int a[4][4])
    {
        int t;
        for (t = 4; t < 15; t++)
            if (~t&2 || t&8 && ~t&1) {
                int r = t&0x3;
                int c = t>>2;
                /* Only operations on array elements */
                int val = a[r][c];
                a[r][c] = a[c][r];
                a[c][r] = val;
            }
    }
Observation
Array elements altered only by copying one to another
Just need to make sure right set of copies performed
Verifying Array Code
Test for trans4
a:
     0  1  2  3
     4  5  6  7
     8  9 10 11
    12 13 14 15
a’ (after trans4):
     0  4  8 12
     1  5  9 13
     2  6 10 14
     3  7 11 15
Single Test Adequate
Unique value for each possible source element
“Maximally Diverse”
If a’[r][c] = a[c][r] , then must have copied proper value
Characteristics of Array Verification
Correctness Condition
    a’[0][0] = a[0][0]
    a’[0][1] = a[1][0]
    a’[0][2] = a[2][0]
    …
    a’[3][2] = a[2][3]
    a’[3][3] = a[3][3]
Properties
All equations are in positive form
Worst case test is one that tends to make things unequal
Maximally diverse interpretation: use as many different values as possible
All maximally diverse interpretations isomorphic
Only need to try one to prove all handled correctly
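The single-test argument can be tried out directly. The sketch below is a Python port of the transpose routine from the earlier slide, together with a hypothetical helper `check_transpose` that runs it on one maximally diverse input (every element distinct) and checks all sixteen positive equations.

```python
def trans(a):
    """Python port of the obscure 4x4 transpose; swaps elements in place."""
    for t in range(4, 15):
        # Same condition as the C code: (~t & 2) || ((t & 8) && (~t & 1))
        if (~t & 2) or ((t & 8) and (~t & 1)):
            r, c = t & 0x3, t >> 2
            a[r][c], a[c][r] = a[c][r], a[r][c]
    return a


def check_transpose(transpose):
    # Maximally diverse input: each source element has a unique value 0..15.
    a = [[4 * r + c for c in range(4)] for r in range(4)]
    original = [row[:] for row in a]
    result = transpose(a)
    # Correctness condition: a'[r][c] = a[c][r] for all r, c.
    return all(result[r][c] == original[c][r]
               for r in range(4) for c in range(4))


print(check_transpose(trans))  # True
```

Because the routine only copies elements, passing this one test is enough; a routine that does nothing, for example, fails it.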
Equations in Processor Verification
[Figure: pipelined processor datapath (PC, +4, instruction memory, Op, Rd, Ra, Rb, Imm, Adat, register file, ALU, control, pipeline registers IF/ID, ID/EX, EX/WB) with its equality comparators.]
Data Types            Equations
Register Ids          Control stalling & forwarding
Instruction Address   Only top-level verification condition
Program Data          Only top-level verification condition
Exploiting Equation Structure
Positive Equations
In top-level verification condition
Can use maximally diverse interpretation
Negative Equations
Pipeline control logic
Between register IDs
Operation depends on whether or not two IDs are equal
Must use general encoding
Encode with Boolean variables
All possibilities of IDs that match and/or don’t match
Application of Positive Equality
[Figure: the example circuit evaluated under a single diverse interpretation: x_0 = 5, d_0 = 6, f(x_0) = 7, f(d_0) = 8.]
Observation
All equations are positive in this formula
Can consider single, diverse interpretation for terms
Function Elimination: Ackermann’s Method
Replace All Function Applications by Integer Variables
Introduce new domain variable
Enforce functional consistency by global constraints
[Figure: applications f(x_1), f(x_2) replaced by variables vf_1, vf_2; the constraint x_1 = x_2 ⇒ vf_1 = vf_2 is added as an antecedent to formula F.]
Unclear how to restrict evaluation to diverse interpretations
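A small worked example of this elimination, as a sketch only: the claim x_1 = x_2 ⇒ f(x_1) = f(x_2) is checked by brute force after replacing f(x_1) and f(x_2) by variables vf1 and vf2. The helper names (`valid`, `implies`) and the choice of a four-value domain for four terms are assumptions made for illustration.

```python
from itertools import product


def valid(formula, num_vars, domain_size):
    """Brute-force validity check over all assignments from a small domain."""
    return all(formula(*vals)
               for vals in product(range(domain_size), repeat=num_vars))


def implies(p, q):
    return (not p) or q


# Claim to prove: x1 = x2  =>  f(x1) = f(x2).
# After elimination, vf1 and vf2 stand for f(x1) and f(x2).
def without_constraint(x1, x2, vf1, vf2):
    return implies(x1 == x2, vf1 == vf2)


def with_constraint(x1, x2, vf1, vf2):
    # Global functional-consistency constraint added as an antecedent.
    consistency = implies(x1 == x2, vf1 == vf2)
    return implies(consistency, without_constraint(x1, x2, vf1, vf2))


print(valid(without_constraint, 4, 4))  # False: vf1, vf2 are unconstrained
print(valid(with_constraint, 4, 4))     # True
```

Without the consistency constraint the replaced formula is falsified by x1 = x2 with vf1 ≠ vf2, which is why the constraint is needed.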
Function Elimination: ITE Method
General Technique
Introduce new domain variable
Nested ITE structure maintains functional consistency
[Figure: f(x_1) → vf_1; f(x_2) → ITE(x_2 = x_1, vf_1, vf_2); f(x_3) → ITE(x_3 = x_1, vf_1, ITE(x_3 = x_2, vf_2, vf_3)).]
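The nested ITE structure can be evaluated under one concrete interpretation as follows. This is a minimal sketch: `eliminate_ite` is a hypothetical name, `args` holds the values of x_1, …, x_k, and `fresh` holds the values of the new variables vf_1, …, vf_k.

```python
def eliminate_ite(args, fresh):
    """Return the values standing for f(x_1), ..., f(x_k).

    The i-th application becomes
    ITE(x_i = x_1, vf_1, ITE(x_i = x_2, vf_2, ... vf_i)).
    """
    results = []
    for i, x in enumerate(args):
        value = fresh[i]                  # innermost else-branch: vf_i
        for j in range(i - 1, -1, -1):    # wrap ITEs from the inside out
            value = fresh[j] if x == args[j] else value
        results.append(value)
    return results


print(eliminate_ite([1, 2, 1], [10, 20, 30]))  # [10, 20, 10]
```

In the example, the third argument equals the first, so the third application reuses vf_1; equal arguments always produce equal results, which is the functional consistency property.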
Generating Diverse Encoding
Replacing Application
Use fixed values rather than variables
Application results equal iff arguments equal
[Figure: f(x_1) → 5; f(x_2) → ITE(x_2 = x_1, 5, 6); f(x_3) → ITE(x_3 = x_1, 5, ITE(x_3 = x_2, 6, 7)).]
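The "equal iff arguments equal" property can be checked exhaustively for three applications. The sketch below assumes the fixed values 5, 6, 7 from the figure; `diverse_results` is a hypothetical helper that picks the constant of the first matching earlier argument, which is what the nested ITE computes.

```python
from itertools import product


def diverse_results(args, constants=(5, 6, 7)):
    """Value of each application when fresh variables are fixed constants."""
    results = []
    for i, x in enumerate(args):
        # First earlier argument equal to x decides the result (nested ITE).
        first = next(j for j in range(i + 1) if args[j] == x)
        results.append(constants[first])
    return results


def equal_iff_args_equal(domain_size=3):
    for args in product(range(domain_size), repeat=3):
        res = diverse_results(args)
        for i in range(3):
            for j in range(3):
                if (res[i] == res[j]) != (args[i] == args[j]):
                    return False
    return True


print(diverse_results((1, 2, 3)))  # [5, 6, 7]
print(equal_iff_args_equal())      # True
```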
Benefits of Positive Equality
Microprocessor Benchmarks Velev & Bryant, JSC ‘02
1xDLX: Single issue, RISC processor
2xDLX-EX-BP: Dual issue processor with exception handling & branch prediction
9VLIW-BP: 9-way VLIW processor with branch prediction
Measurements
Using BerkMin SAT solver
Benchmark rows: 1xDLX, 2xDLX-EX-BP, 9VLIW-BP, each in a buggy and a good version
Columns: Using Pos. Eq., No Pos. Eq.
Values: 0.02, 2, 0.07, 4, 15, 10, 224, 229, 15, > 24hrs, > 24hrs, > 24hrs
Revisiting Encoding Techniques
Example: x = y ∧ y = z ∧ z ≠ x. Satisfiable?
Small Domain (SD)
x_1x_0 = y_1y_0 ∧ y_1y_0 = z_1z_0 ∧ z_1z_0 ≠ x_1x_0
Use bit-level encodings of bounded integers
Implicitly encode properties of equality logic
Per-Constraint Encoding (EIJ)
e_xy ∧ e_yz ∧ ¬e_xz
Transitivity Constraints
e_xy ∧ e_yz ⇒ e_xz
e_xy ∧ e_xz ⇒ e_yz
e_yz ∧ e_xz ⇒ e_xy
Introduce explicit Boolean variable for each equation
Additional transitivity constraints to express properties of equality logic
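The small-domain side of this comparison can be sketched by enumerating bit-level assignments. In the sketch below, the example asks whether x = y, y = z, and z ≠ x can hold together; each variable is encoded with two bits, and a plain enumeration stands in for the SAT solver. The function name and the two-bit width are assumptions for illustration.

```python
from itertools import product


def satisfiable_small_domain(formula, num_vars, bits):
    """Try every assignment to num_vars * bits Boolean variables."""
    for assignment in product((0, 1), repeat=num_vars * bits):
        values = []
        for v in range(num_vars):
            chunk = assignment[v * bits:(v + 1) * bits]
            # Decode the bit group of variable v into a bounded integer.
            values.append(sum(b << i for i, b in enumerate(chunk)))
        if formula(*values):
            return True
    return False


example = lambda x, y, z: x == y and y == z and z != x
print(satisfiable_small_domain(example, 3, 2))  # False
```

No transitivity constraints appear here: comparing bit vectors already respects the properties of equality, which is what "implicitly encode" refers to.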
Per-Constraint Encoding
Introduced by Goel et al., CAV ‘98
Exploiting sparse structure by Bryant & Velev, CAV 2000
Procedure
Initial formula F
Want to prove valid
Prove that ¬F is not satisfiable
Replace each equation x = y by Boolean variable e_xy
Gives formula F_sat
Generate formula expressing transitivity constraints
Gives formula F_trans
Use SAT solver to show that F_sat ∧ F_trans is not satisfiable
Motivation
Provides SAT solver with more direct representation of underlying problem
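The procedure can be illustrated on a three-variable example. This is a sketch under stated assumptions: brute-force enumeration replaces the SAT solver, transitivity constraints are generated for every triple rather than only for chord-free cycles, and the names `proves_valid` and `eq` are hypothetical. Here `f_sat` is the Boolean encoding of the negated formula.

```python
from itertools import combinations, permutations, product


def proves_valid(variables, f_sat):
    """True if F_sat AND F_trans has no satisfying assignment."""
    pairs = list(combinations(variables, 2))   # one e variable per pair
    for bits in product((False, True), repeat=len(pairs)):
        e = dict(zip(pairs, bits))

        def eq(a, b):
            return e[(a, b)] if (a, b) in e else e[(b, a)]

        # F_trans: in every triple, two equalities imply the third.
        f_trans = all(
            (not (eq(a, b) and eq(b, c))) or eq(a, c)
            for a, b, c in permutations(variables, 3)
        )
        if f_trans and f_sat(eq):
            return False                        # counterexample found
    return True


# F: x = y and y = z implies x = z.  F_sat encodes the negation of F.
negated = lambda eq: eq("x", "y") and eq("y", "z") and not eq("x", "z")
print(proves_valid(["x", "y", "z"], negated))  # True
```

Dropping `f_trans` from the check would make the negated formula satisfiable (e_xy and e_yz true, e_xz false), so the transitivity constraints are what rule out the spurious counterexample.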
Graph Interpretation of Transitivity
Transitivity Violation
Cycle in graph
Exactly one edge has e_{i,j} = false
[Figure: cycles of equality edges in which a single edge is false.]
Exploiting Chords
Chord
Edge connecting two nonadjacent vertices in cycle
Property
Sufficient to enforce transitivity constraints for all chord-free cycles
If transitivity holds for all chord-free cycles, then holds for arbitrary cycles
Enumerating Chord-Free Cycles
Strategy
Enumerate chord-free cycles in graph
Each cycle of length k yields k transitivity constraints
Problem
Potentially exponential number of chord-free cycles
[Figure: ring of k diamond-shaped faces, labeled 1, 2, …, k.]
2^k + k chord-free cycles
Adding Chords
Strategy
Add edges to graph to reduce number of chord-free cycles
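[Figure: the same ring of k diamond-shaped faces, labeled 1, 2, …, k, with a chord added across each face.]
2^k + k chord-free cycles before adding chords
2k + 1 chord-free cycles after adding chords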
Trade-Off
Reduces formula size
Increases number of relational variables
Chordal Graph
Definition
Every cycle of length > 3 has a chord
Goal
Add minimum number of edges to make graph chordal
Relation to Sparse Gaussian Elimination
Choose pivot ordering that minimizes fill-in
NP-hard
Simple heuristics effective
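One simple heuristic of this kind is greedy minimum-degree vertex elimination. The sketch below is an illustration under assumptions, not the procedure used in the cited work: it eliminates the vertex with the fewest remaining neighbors, adds fill-in edges between those neighbors, and records each resulting triangle, from which three transitivity constraints can be generated. Vertex labels are assumed to be sortable.

```python
from itertools import combinations


def make_chordal(vertices, edges):
    """Greedy min-degree elimination; returns (all_edges, triangles)."""
    adj = {v: set() for v in vertices}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    all_edges = {frozenset(e) for e in edges}
    triangles = []
    remaining = set(vertices)
    while remaining:
        # Heuristic pivot choice: fewest neighbors among remaining vertices.
        v = min(sorted(remaining), key=lambda u: len(adj[u] & remaining))
        nbrs = sorted(adj[v] & remaining)
        for a, b in combinations(nbrs, 2):
            if b not in adj[a]:               # fill-in edge (new chord)
                adj[a].add(b)
                adj[b].add(a)
                all_edges.add(frozenset((a, b)))
            triangles.append((v, a, b))
        remaining.remove(v)
    return all_edges, triangles


square = [(1, 2), (2, 3), (3, 4), (4, 1)]
all_edges, triangles = make_chordal([1, 2, 3, 4], square)
print(len(all_edges), triangles)  # 5 [(1, 2, 4), (2, 3, 4)]
```

For the 4-cycle, one chord is added and two triangles result, so six transitivity clauses replace the four clauses of the original cycle at the cost of one extra relational variable.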
1xDLX-C Equation Structure
Vertices
For each v_i
13 different register identifiers
Edges
For each equation
Control stalling and forwarding logic
27 relational variables
Out of 78 possible
Adding Chordal Edges to 1xDLX-C
Original
27 relational variables
286 cycles
858 clauses
Augmented
33 relational variables
40 cycles
120 clauses
2xDLX-CCt Equation Structure
Equations
Between 25 different register identifiers
143 relational variables
Out of 300 possible
Adding Chordal Edges to 2xDLX-CCt
Original
143 relational variables
2,136 cycles
8,364 clauses
Augmented
193 relational variables
858 cycles
2,574 clauses
Choosing Encoding Method
Comparison
Formula length n with m integer variables & function applications
Worst-case complexity
                    Small Domain         Per-Constraint
Boolean Variables   O(m log m)           O(m^2)
Formula Size        O(n + m^2 log m)     O(n + m^3)
Per-Constraint Encoding Works Well in Practice
Generates slightly larger formulas than small domain
Better performance by SAT solver
Encoding Comparison
Benchmarks
Superscalar, out-of-order datapath
2–6 instructions issued in parallel
Measurements
Using BerkMin SAT solver
Velev & Bryant, JSC ‘02

Issue    Per-Constraint                Small Domain
Width    Vars     Clauses    Time      Vars    Clauses   Time
2        139      8,213      1.6       81      1,294     1.7
3        308      33,270     15        127     3,780     19
4        553      96,480     65        194     8,362     99
5        857      240,892    154       249     15,647    255
6        1,243    528,962    1,957     304     26,738    3,206
Extensions
Difference logic
Predicates of form x ≤ y + C
Original logic of UCLID
Use integer variables to represent pointers into buffers
C = 1
Linear constraints
Predicates of form a_1 x_1 + a_2 x_2 + … + a_n x_n ≤ b
Used in applying UCLID to software verification and software security problems
Difference Logic
Predicates of form x ≤ y + C
C generally a small integer
Encoding Methods
Small domain
Range bound n · max |C|
Per constraint encoding
Variables of form e_{x,y}^C
Can have exponential blowup in number of variables
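The small-domain idea for difference logic can be sketched for the simplest case, a conjunction of difference constraints. The code below assumes the range bound n · max |C| from the slide, searches values 0 through that bound inclusive, and uses plain enumeration in place of a bit-level SAT encoding; the function name and the triple format (x, y, c) for x ≤ y + c are hypothetical.

```python
from itertools import product


def diff_sat_small_domain(num_vars, constraints):
    """constraints: list of (x, y, c) meaning v[x] <= v[y] + c.

    Returns a satisfying tuple of values, or None if none exists
    within the bounded range.
    """
    max_c = max((abs(c) for _, _, c in constraints), default=0)
    bound = num_vars * max_c       # range bound n * max|C| (from the slide)
    for values in product(range(bound + 1), repeat=num_vars):
        if all(values[x] <= values[y] + c for x, y, c in constraints):
            return values
    return None


chain = [(0, 1, -1), (1, 2, -1)]   # v0 < v1 < v2
cycle = chain + [(2, 0, -1)]       # adds v2 < v0: contradictory
print(diff_sat_small_domain(3, chain))  # (0, 1, 2)
print(diff_sat_small_domain(3, cycle))  # None
```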
Choosing Encoding Method
Per constraint better, as long as it doesn’t blow up
Predicting blowup
Successfully used classifier trained by machine learning (Seshia, Lahiri & Bryant, DAC ’03)
Linear Constraints
Predicates of form a_1 x_1 + a_2 x_2 + … + a_n x_n ≤ b
Common Case
All but k predicates are difference predicates
a_i = +1, a_j = –1, rest = 0
Rest are sparse
At most w coefficients nonzero
Coefficient values small
Parameters
n: #variables
w: max #non-zero terms
k: #non-difference constraints
b_max: max |constant|
a_max: max |coefficient|
Linear Constraints
Small Domain Encoding
(Seshia & Bryant, LICS ’04)
Find value D such that only need to consider solutions with 0 ≤ x_i < D, for all i
Bounds on D: (n + 2) · n · (b_max + 1) · (w · a_max)^k
n: #variables
w: max #non-zero terms
k: #non-difference constraints
b_max: max |constant|
a_max: max |coefficient|
Encode as SAT problem with log(D) bits / integer variable
Practical for real applications
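A quick way to see how the bound behaves is to plug in numbers. The sketch below evaluates the product as printed on the slide and the resulting bits per integer variable; the function name and the sample parameter values are made up for illustration.

```python
def small_domain_bound(n, w, k, b_max, a_max):
    """Evaluate D = (n+2) * n * (b_max+1) * (w*a_max)**k and bits per variable."""
    d = (n + 2) * n * (b_max + 1) * (w * a_max) ** k
    bits = max(1, (d - 1).bit_length())   # bits needed to represent 0..D-1
    return d, bits


# Hypothetical instance: 10 variables, 2 non-difference constraints.
d, bits = small_domain_bound(n=10, w=3, k=2, b_max=5, a_max=2)
print(d, bits)  # 25920 15
```

The bit count grows only logarithmically in n, b_max, and a_max and linearly in k, which is consistent with the encoding being practical when few constraints are non-difference constraints.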
Some Lessons We’ve Learned
Preserve Boolean Structure
Other approaches require collapsing to conjunctions of predicates
Exploit Problem Characteristics
Sparseness
Tighten bounds and/or reduce number of constraints
Polarity structure
Positive equality
Let SAT Solver Do the Work
Eager encoding: provide sufficient set of constraints to prove / disprove formula
SAT solvers are good at digesting large volumes of information