Decision Procedures Customized for Formal Verification Carnegie Mellon University

Decision Procedures
Customized for
Formal Verification
Randal E. Bryant
Carnegie Mellon University
http://www.cs.cmu.edu/~bryant
Contributions by former graduate students:
Sanjit Seshia, Shuvendu Lahiri
Outline
Context


Infinite state models of hardware systems
Verification techniques
Needs


Requirements for decision procedures
Dealing with quantifiers
Our Solution


SAT-based procedure
“Eager” Boolean encoding
CADE ‘05
Verification Example
Task

Verify that microprocessor correctly implements instruction set definition
Even though heavily pipelined
Figure: Alpha 21264 Microprocessor (Microprocessor Report, Oct. 28, 1996)
Existing Hardware Verification Methods

Simulators, equivalence checkers, model checkers, …
All Operate at Bit Level


View each register or memory bit as state variable
Behavior of each state variable defined by Boolean function
Strengths


Finite-state systems conceptually simple
BDDs & SAT procedures allow high degrees of automation
Limitations


State space can be very large
Only verify fixed instantiation of system
  - Specific memory sizes, number of processes, buffer lengths, …
Verification Challenges
Sources of Complexity

Lots of internal state
Complex control logic
Opportunities

Most of the logic serves to store, select, and communicate data
Figure: Alpha 21264 Microprocessor (Microprocessor Report, Oct. 28, 1996)
Applying Data Abstraction to Hardware Verification
Idea


Abstract details of data encodings and operations
Keep control logic precise
Applications


Verify overall correctness of system
Assuming individual functional units correct
Advantages of Abstraction


Abstract infinite-state system easier to verify than detailed finite-state one
Parametric representation allows verification of many different system variants
  - Arbitrary number of processes, buffer lengths, etc.
Word Abstraction
Figure: Control Logic; Data Path containing Com. Log. 1 and Com. Log. 2
Data: Abstract details of form & functions
Control: Keep at bit level
Timing: Keep at cycle level
Data Abstraction #1: Bits → Terms
Figure: bits x0, x1, x2, …, xn-1 viewed as a single word x
View Data as Symbolic Words

Arbitrary integers
  - No assumptions about size or encoding
  - Classic model for reasoning about software

Can store in memories & registers
Abstracting Data Bits
Figure: Control Logic; Data Path in which Com. Log. 1 and Com. Log. 2 are marked "?"
What do we do about logic functions?
Abstraction #2: Uninterpreted Functions
Figure: ALU replaced by function f
For any Block that Transforms or Evaluates Data:

Replace with generic, unspecified function
Only assumed property is functional consistency:
a = x ∧ b = y ⇒ f(a, b) = f(x, y)
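Functional consistency can be illustrated with a small Python sketch. The class below is a hypothetical stand-in and not part of UCLID: it models an unspecified function by picking an arbitrary fresh value the first time it sees an argument tuple, and returning that same value for every later call with equal arguments.

```python
import itertools


class UninterpretedFunction:
    """Hypothetical model of an unspecified function f.

    Nothing is assumed about f except functional consistency:
    equal arguments always produce equal results.
    """

    def __init__(self, name):
        self.name = name
        self._table = {}                 # argument tuple -> chosen result
        self._fresh = itertools.count()  # source of arbitrary fresh values

    def __call__(self, *args):
        # First use of these arguments: pick an arbitrary fresh value.
        if args not in self._table:
            self._table[args] = next(self._fresh)
        return self._table[args]


f = UninterpretedFunction("f")
a, b = 3, 7
x, y = 3, 7
same_result = f(a, b) == f(x, y)  # holds because a = x and b = y
```

This fixes only one possible interpretation of f (one where distinct arguments happen to give distinct results). A decision procedure has to account for every interpretation that respects consistency, so the sketch shows the property itself rather than a proof method.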
Abstracting Functions
Figure: Control Logic; Data Path with Com. Log. 1 and Com. Log. 2 replaced by functions F1 and F2
For Any Block that Transforms Data:

Replace by uninterpreted function
Ignore detailed functionality
Conservative approximation of actual system
Abstraction #3: Modeling Memories as Mutable Functions
Memory M Modeled as Function

M(a): Value at location a
Initially

Arbitrary state
Modeled by uninterpreted function m0
Effect of Memory Write Operation
Writing Transforms Memory

M′ = Write(M, wa, wd)
Express with Lambda Notation

M′ = λ a . ITE(a = wa, wd, M(a))
Figure: multiplexer returns wd when a = wa, otherwise M(a)
Reading from updated memory:
  - Address wa will get wd
  - Otherwise get what’s already in M
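The write operation maps directly onto a closure. The following is a minimal sketch: `write` mirrors the lambda expression for the updated memory, while `m0` here is only an illustrative stand-in for the arbitrary initial state (it tags each address so reads of unwritten locations are recognizable).

```python
def write(mem, wa, wd):
    """Return the updated memory: lambda a . ITE(a = wa, wd, mem(a))."""
    return lambda a: wd if a == wa else mem(a)


def m0(a):
    """Stand-in for the arbitrary initial memory state (an assumption)."""
    return ("m0", a)


m1 = write(m0, 5, 42)  # write 42 to address 5
m2 = write(m1, 9, 17)  # then write 17 to address 9

read_5 = m2(5)  # address written first, untouched by the second write
read_9 = m2(9)  # address written last
read_0 = m2(0)  # never written: falls through to the initial state
```

Each write wraps the previous memory rather than copying it, which is why a sequence of writes becomes a nested ITE expression when simulated symbolically.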
Systems with Buffers
Figure: Circular Queue (locations 0 to Max-1, "In Use" region between head and tail) and Unbounded Buffer ("In Use" region between head and tail)
Modeling Method

Mutable function to describe buffer contents
Integers to represent head & tail pointers
Parameterize buffer capacity with symbolic value Max
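The modeling method can be sketched in Python for the unbounded case. This is an illustrative assumption rather than UCLID's modeling language: a buffer is a triple of a contents function and two integer pointers, and the capacity parameter Max is left out for brevity.

```python
def empty_buffer(initial_contents):
    """Buffer state: (contents function, head pointer, tail pointer)."""
    return (initial_contents, 0, 0)


def enqueue(buf, value):
    """Store value at the tail location and advance the tail pointer."""
    contents, head, tail = buf
    new_contents = lambda i: value if i == tail else contents(i)
    return (new_contents, head, tail + 1)


def dequeue(buf):
    """Read the head location and advance the head pointer."""
    contents, head, tail = buf
    assert head < tail, "buffer is empty"
    return contents(head), (contents, head + 1, tail)


buf = enqueue(enqueue(empty_buffer(lambda i: None), "a"), "b")
first, buf_after = dequeue(buf)
second, _ = dequeue(buf_after)
```

The entries in use are exactly the indices from head up to, but not including, tail, so properties of the buffer can be stated as conditions on integer indices.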
Some History of Term-Level Modeling
Historically

Standard model used for program verification
  - Unbounded integer data types

Widely used with theorem-proving approaches to hardware verification
  - E.g., Hunt ’85
Automated Approaches to Hardware Verification

Burch & Dill, ’95
  - Tool for verifying pipelined microprocessors
  - Implemented by form of symbolic simulation

Continued application to pipelined processor verification
UCLID

Seshia, Lahiri, Bryant, CAV ‘02
Term-Level Verification System

Language for describing systems
  - Inspired by CMU SMV

Symbolic simulator
  - Generates integer expressions describing system state after sequence of steps

Decision procedure
  - Determines validity of formulas

Support for multiple verification techniques
Available by Download
http://www.cs.cmu.edu/~uclid
Required Logic
Scalar Data Types

Formulas (F): Boolean Expressions
  - Control signals

Terms (T): Integer Expressions
  - Data values
Functional Data Types

Functions (Fun): Integer → Integer
  - Immutable: Functional units
  - Mutable: Memories

Predicates (P): Integer → Boolean
  - Immutable: Data-dependent control
  - Mutable: Bit-level memories
CLU Logic

Counter Arithmetic, Lambda Expressions and Uninterpreted Functions
Terms (T): Integer Expressions
  ITE(F, T1, T2)          If-then-else
  Fun(T1, …, Tk)          Function application
  succ(T)                 Increment
  pred(T)                 Decrement
Formulas (F): Boolean Expressions
  ¬F, F1 ∧ F2, F1 ∨ F2    Boolean connectives
  T1 = T2                 Equation
  T1 < T2                 Inequality
  P(T1, …, Tk)            Predicate application
To support pointer operations
CLU Logic (Cont.)
Functions (Fun): Integer → Integer
  f                       Uninterpreted function symbol
  λ x1, …, xk . T         Function definition
Predicates (P): Integer → Boolean
  p                       Uninterpreted predicate symbol
  λ x1, …, xk . F         Predicate definition
Outline
Context


Infinite state models of hardware systems
Verification techniques
Needs


Requirements for decision procedures
Dealing with quantifiers
Our Solution


SAT-based procedure
“Eager” Boolean encoding
Verifying Safety Properties
Figure: state machine with Present State, Next State, and arbitrary Inputs; Reset States, Reachable States, and Bad States shown as regions of the state space
State Machine Model

State encoded as Booleans, integers, and functions
Next state function expresses how updated on each step
Prove: System will never reach bad state
Bounded Model Checking
Figure: Reset States, successive reachable sets R1, R2, …, Rn, and Bad States
Repeatedly Perform Image Computations

Set of all states reachable by one more state transition
Underapproximation of Reachable State Set

But, typically catch most bugs with 8–10 steps
Implementing BMC
Figure: system unrolled for n steps from Reset with inputs X1, X2, …, Xn; is Bad satisfiable?
Construct verification condition formula for step n by symbolically simulating system for n cycles
Check with decision procedure
Do as many cycles as tractable
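The shape of bounded model checking can be sketched on a toy finite system. The code below is an assumption-laden illustration: it enumerates input sequences explicitly, whereas the approach described here builds one formula per step by symbolic simulation and hands it to a decision procedure. The function name `bmc` and the counter example are hypothetical.

```python
from itertools import product


def bmc(reset_states, inputs, step, bad, n):
    """Search for a bad state reachable within n steps.

    Returns (reset state, input trace) for the shortest counterexample,
    or None if no bad state is reachable in at most n steps.
    """
    for k in range(n + 1):                       # unroll 0, 1, ..., n cycles
        for s0 in reset_states:
            for trace in product(inputs, repeat=k):
                s = s0
                for x in trace:                  # simulate k cycles
                    s = step(x, s)
                if bad(s):
                    return s0, trace
    return None


# Toy system: 3-bit counter that adds the input bit on each step.
step = lambda x, s: (s + x) % 8
bad = lambda s: s == 5

found_at_4 = bmc([0], [0, 1], step, bad, 4)
found_at_5 = bmc([0], [0, 1], step, bad, 5)
```

With a bound of 4 the search finds nothing, which says only that no bug shows up within 4 steps. Raising the bound to 5 exposes the counterexample trace, matching the underapproximation caveat above.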
True Model Checking

Figure: Reset States, R1, R2, …, Rn, Bad States
Reach Fixed-Point

Rn = Rn+1 = Reachable
Impractical for Term-Level Models

Many systems never reach fixed point
  - Can keep adding elements to buffer
Convergence test undecidable (Bryant, Lahiri, Seshia, CHARME ’03)
Inductive Invariant Checking

Figure: invariant I contains the Reset States and Reachable States and excludes the Bad States
Key Properties of System that Make it Operate Correctly

Formulate as formula I
Prove Inductive

Holds initially: I(s0)
Preserved by all state changes: I(s) ⇒ I(δ(i, s))
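The two proof obligations can be checked by brute force on a small finite system. This is a sketch under stated assumptions: the helper name `is_inductive` and the toy counter are hypothetical, and a real term-level check would pose both obligations to a decision procedure instead of enumerating states.

```python
def is_inductive(inv, states, inputs, resets, step):
    """Check that inv holds initially and is preserved by every transition."""
    holds_initially = all(inv(s0) for s0 in resets)
    preserved = all(
        inv(step(i, s))
        for s in states if inv(s)  # only states satisfying the invariant
        for i in inputs
    )
    return holds_initially and preserved


# Toy system: 3-bit counter that adds 0 or 2 on each step, reset to 0.
states = range(8)
inputs = (0, 1)
resets = (0,)
step = lambda i, s: (s + 2 * i) % 8

even = lambda s: s % 2 == 0
not_five = lambda s: s != 5

even_is_inductive = is_inductive(even, states, inputs, resets, step)
not_five_is_inductive = is_inductive(not_five, states, inputs, resets, step)
```

The property `not_five` is true of every reachable state yet is not inductive, because the unreachable state 3 satisfies it and steps to 5. Strengthening it to `even` makes the induction go through, which is the kind of manual effort invariant checking demands.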
Inductive Invariants
Formulas I1, …, In

Ij(s0) holds for any initial state s0, for 1 ≤ j ≤ n
I1(s) ∧ I2(s) ∧ … ∧ In(s) ⇒ Ij(s′) for any current state s and successor state s′, for 1 ≤ j ≤ n
Overall Correctness

Follows by induction on time
Restricted form of invariants

∀x1 ∀x2 … ∀xk φ(x1…xk)
φ(x1…xk) is a CLU formula without quantifiers
x1…xk are integer variables free in φ(x1…xk)
  - Express properties that hold for all buffer indices, register IDs, etc.
Proving Invariants
Proving invariants inductive requires quantifiers
|= [∀x1 ∀x2 … ∀xk φ(x1…xk)] ⇒ [∀y1 ∀y2 … ∀ym ψ(y1…ym)]
Prove unsatisfiability of formula
∀x1 ∀x2 … ∀xk φ(x1…xk) ∧ ¬ψ(y1…ym)
Undecidable Problem

In logic with uninterpreted functions and equality
Invariant Checking: Out-of-Order Processor Designs

                   base     exc      exc / br   exc / br / mem-simp   exc / br / mem
Total Invariants   39       67       71         13                    34
UCLID time         54 s     236 s    403 s      1594 s                2200 s
Person time        2 days   7 days   9 days     24 days               34 days

Generating invariants requires considerable human effort
Impractical for realistic designs
Constructing Invariants from Predicates
Predicates
  rob.head ≤ reg.tag(r)
  reg.valid(r)
  reg.tag(r) = t
  rob.dest(t) = r
Invariant
  ∀r,t. reg.valid(r) ∧ reg.tag(r) = t ⇒ (rob.head ≤ reg.tag(r) < rob.tail ∧ rob.dest(t) = r)
Result: Correctness
Automatic Predicate Abstraction

Graf & Saïdi, CAV ’97
Idea

Given set of predicates P1(s), …, Pk(s)
  - Boolean formulas describing properties of system state

View as abstraction mapping: States → {0,1}^k
Defines abstract FSM over state set {0,1}^k
  - Form of abstract interpretation
  - Do reachability analysis similar to symbolic model checking
Early Implementations Inefficient

Guess at possible next abstract states
Test with call to decision procedure
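Abstract reachability over predicate valuations can be sketched on a toy finite system. The names `alpha` and `abstract_reach` are hypothetical, and the concrete states are enumerated explicitly here, which is an assumption made for illustration: in the term-level setting each abstract step is posed to a decision procedure instead.

```python
def alpha(s, preds):
    """Abstraction mapping: concrete state -> tuple of predicate values."""
    return tuple(int(p(s)) for p in preds)


def abstract_reach(states, inputs, resets, step, preds):
    """Reachable abstract states under existential abstraction."""
    reached = {alpha(s, preds) for s in resets}
    frontier = set(reached)
    while frontier:                      # stops: at most 2^k abstract states
        new = set()
        for s in states:
            if alpha(s, preds) not in frontier:
                continue
            for i in inputs:
                b_next = alpha(step(i, s), preds)
                if b_next not in reached:
                    new.add(b_next)
        reached |= new
        frontier = new
    return reached


# Toy system: 3-bit counter that adds 0 or 2 on each step, reset to 0.
step = lambda i, s: (s + 2 * i) % 8
preds = [lambda s: s % 2 == 0, lambda s: s < 4]

reached = abstract_reach(range(8), (0, 1), [0], step, preds)
evenness_invariant = all(b[0] == 1 for b in reached)
```

The first predicate is 1 in every reachable abstract state, so "the counter is even" falls out as an invariant without anyone writing it down as a proof goal.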
P.A. as Invariant Generator
Figure: Concrete System (Reset States, invariant I) mapped by abstraction A to Abstract System (Reset States, R1, R2, …, Rn), then concretized by C
Reach Fixed-Point on Abstract System

Termination guaranteed, since finite state
Equivalent to Computing Invariant for Concrete System

Strongest possible invariant that can be expressed by formula over these predicates
Symbolic Formulation of Predicate Abstraction
Lahiri, Bryant, Cook, CAV ‘03
Basic Operation

Compute set of legal abstract next states (B) given current
abstract states (B)
B, B:
, :

Abstract current and next-state state variables
Boolean formulas
Create formula of form (S,B)
Possible combinations of current concrete state S and next
abstract state B
Formulate as Quantifier Elimination Problem

Generate formula of form
(B)   S (S,B)
S: Integer variables

For interpretation of B, formula  true iff (S,B) satisfiable
Outline
Context


Infinite state models of hardware systems
Verification techniques
Needs


Requirements for decision procedures
Dealing with quantifiers
Our Solution


SAT-based procedure
“Eager” Boolean encoding
Decision Procedure Needs
Bounded Model Checking


Satisfiability of quantifier-free CLU formula
Handled by decision procedure
Invariant Checking


Satisfiability of quantified CLU formula
Undecidable
Predicate Abstraction

Eliminate quantifiers from CLU formula
Role of Decision Procedure

Apply in sound, but incomplete way
UCLID Decision Procedure Operation
Flow: CLU Formula → Lambda Expansion → λ-free Formula → Function & Predicate Elimination → Term Formula → Finite Instantiation → Boolean Formula → Boolean Satisfiability

Series of transformations leading to propositional formula
Except for lambda expansion, each has polynomial complexity
SAT-based Decision Procedures
EAGER ENCODING
Input Formula → Satisfiability-preserving Boolean Encoder → Boolean Formula → SAT Solver → satisfiable / unsatisfiable

LAZY ENCODING
Input Formula → Approximate Boolean Encoder → Boolean Formula → SAT Solver → unsatisfiable, or satisfying assignment
Satisfying assignment → First-order Conjunctions SAT Checker → satisfiable, or unsatisfiable with additional clause returned to SAT Solver
Eager Encoding Characteristics
Figure: Input Formula → Satisfiability-preserving Boolean Encoder → Boolean Formula → SAT Solver → satisfiable / unsatisfiable
– Must encode all information about domain properties into Boolean formula
– Some properties can give exponential blowup
+ Lets SAT solver do all of the work
Good Approach for Some Domains

Modern SAT solvers have remarkable capacity
  - Good at extracting relevant portions out of very large formulas
  - Learns about formula properties as search proceeds
Encoding Methods
Figure: Difference Logic Formula → Small Domain Encoding (SD) or Per-Constraint Encoding (PC) → Boolean Formula → SAT Solver → satisfiable/unsatisfiable
Small Domain Encoding (SD)
[Bryant, Lahiri, Seshia, CAV’02]
x ≥ y ∧ y ≥ z ∧ z ≥ x+1
0x1x0 ≥ 0y1y0 ∧ 0y1y0 ≥ 0z1z0 ∧ 0z1z0 ≥ 0x1x0+1
Observation:
To check satisfiability, need to consider all possible relative orderings of finitely-many expressions
Figure: sample orderings of x, x+1, y, z along an axis of increasing values
Can use Boolean encoding of finite range of values
– 4 values in this case, so 2-bit encoding
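The small-domain idea can be sketched by exhaustive search over the finite range. This is an illustration only: the function `sat_small_domain` is hypothetical, the domain size of 4 is taken from the example on this slide, and a real encoder would emit a Boolean formula over 2-bit values for a SAT solver instead of enumerating assignments.

```python
from itertools import product


def sat_small_domain(constraints, variables, domain_size):
    """Return a satisfying assignment over {0..domain_size-1}, or None."""
    for values in product(range(domain_size), repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(c(env) for c in constraints):
            return env
    return None


variables = ["x", "y", "z"]

# x >= y, y >= z, z >= x+1: a cycle with positive total weight.
cyclic = [
    lambda e: e["x"] >= e["y"],
    lambda e: e["y"] >= e["z"],
    lambda e: e["z"] >= e["x"] + 1,
]

# Same shape with the "+1" dropped: all three variables may be equal.
relaxed = [
    lambda e: e["x"] >= e["y"],
    lambda e: e["y"] >= e["z"],
    lambda e: e["z"] >= e["x"],
]

cyclic_model = sat_small_domain(cyclic, variables, 4)
relaxed_model = sat_small_domain(relaxed, variables, 4)
```

The first constraint set has no model in the 4-value range, and the second is satisfied by setting all three variables equal.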
Per-Constraint Encoding (PC)
[Strichman, Seshia, Bryant, CAV’02]
x ≥ y ∧ y ≥ z ∧ z ≥ x+1
Difference Predicates
  e1: x ≥ y
  e2: y ≥ z
  e3: z ≥ x+1
New Difference Predicate
  e4: x ≥ z
Transitivity Constraints
  e1 ∧ e2 ⇒ e4
  e4 ⇒ ¬e3
Overall Boolean Encoding
  e1 ∧ e2 ∧ e3 ∧ (e1 ∧ e2 ⇒ e4) ∧ (e4 ⇒ ¬e3)
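The effect of the transitivity constraints can be checked by enumerating the Boolean variables. The sketch below is hand-written for this one example, with e1: x ≥ y, e2: y ≥ z, e3: z ≥ x+1 and the derived predicate e4: x ≥ z. It does not show how an encoder derives the constraints in general, and the helper names are hypothetical.

```python
from itertools import product


def skeleton(e1, e2, e3, e4):
    """Boolean structure of the formula alone: e1 AND e2 AND e3."""
    return e1 and e2 and e3


def transitivity(e1, e2, e3, e4):
    """Constraints relating the predicates: e1 AND e2 => e4, e4 => NOT e3."""
    first = (not (e1 and e2)) or e4
    second = (not e4) or (not e3)
    return first and second


assignments = list(product([False, True], repeat=4))

# Without the constraints the Boolean skeleton looks satisfiable.
sat_without = any(skeleton(*a) for a in assignments)

# With them, no assignment survives, matching the arithmetic meaning.
sat_with = any(skeleton(*a) and transitivity(*a) for a in assignments)
```

The gap between the two results is exactly the domain information that an eager encoding has to supply up front.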
Size of Boolean Encoding: SD better than PC
Let N be size of original difference logic formula

Size of a directed acyclic graph representation
SD encoding size is worst-case O(N^2)
PC encoding size is worst-case O(2^N)

Can generate O(2^N) transitivity constraints
Example: N = 6813

Method   Boolean Encoding Size
PC       > 1000000
SD       54465
Impact on SAT problem: SD vs PC
Experimentally compared zChaff performance on SD and PC encodings of several unsatisfiable formulas
Sample result:

Method   # Boolean variables   # CNF Clauses   # Conflict Clauses   zChaff Time (sec)
PC       57211                 169387          150                  0.56
SD       23112                 67699           15811                21.63

PC better than SD for zChaff
How to Choose Encoding
Hybrid Strategy

Partition variables into classes
  - Which ones are compared to each other

For each class, choose encoding method
  - PC except SD when PC blows up
How to Determine Whether PC Will Work

Try to predict based on formula characteristics
  - Number of constraints, density, …
  - Selection procedure trained by machine learning
Some Lessons We’ve Learned About
Decision Procedures
Preserve Boolean Structure

Other approaches require collapsing to conjunctions of predicates (or extracting them dynamically)
Exploit Problem Characteristics

Sparseness
Polarity structure
Let SAT Solver Do the Work

Eager encoding: provide sufficient set of constraints to prove / disprove formula
SAT solvers are good at digesting large volumes of information
Invariant Checking Revisited
Prove Unsatisfiability of Formula
∀x1 ∀x2 … ∀xk φ(x1…xk) ∧ ¬ψ(y1…ym)

General Form: ∀X φ(X) ∧ ¬ψ(Y)
Quantifier Instantiation

Generate expressions E1(Y), …, En(Y)
  - Using terms that appear in Q

Expand as φ(E1(Y)) ∧ … ∧ φ(En(Y)) ∧ ¬ψ(Y)
  - If unsatisfiable, then so is quantified formula
  - Sound, but incomplete
Trade-off

Be clever about instantiation, or
Instantiate many terms and rely on decision procedure capacity
Predicate Abstraction Revisited
Formulate as Quantifier Elimination Problem

Generate formula of form
(B)   S (S,B)
S: Integer variables
Use Eager SAT Encoding of 

Get formula ∃A P(A,B)
A: Boolean variables
 Satisfying solutions for P w.r.t. B same as those for 

Core problem of symbolic model checking
Quantifier Elimination for P.A.

Formula ∃A P(A,B)
A: Boolean variables
  - Typically: 200+ variables for A, ~20 for B
BDD-Based

Use partitioning techniques developed for symbolic model checking
  - Typically too many total Boolean variables
SAT Enumeration

Find satisfying solution α(A) ∧ β(B) to P
Enumerate solution β(B)
Reformulate P as P ∧ ¬β(B)

Performance: about 1000 solutions / second
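The enumeration loop can be sketched with a brute-force search standing in for the SAT solver. Everything named here (`find_solution`, `eliminate_a`, the example `p`) is a hypothetical illustration: the loop finds one solution, records its B part, blocks that B part, and repeats until nothing is left.

```python
from itertools import product


def find_solution(p, n_a, n_b, blocked):
    """Stand-in for a SAT call: a solution (a, b) of p with b not blocked."""
    for b in product([0, 1], repeat=n_b):
        if b in blocked:
            continue
        for a in product([0, 1], repeat=n_a):
            if p(a, b):
                return a, b
    return None


def eliminate_a(p, n_a, n_b):
    """Collect all B assignments for which some A assignment satisfies p."""
    blocked = set()
    while True:
        solution = find_solution(p, n_a, n_b, blocked)
        if solution is None:
            return blocked
        _, b = solution
        blocked.add(b)  # plays the role of adding a blocking clause on B


# Example: p holds when b[0] equals a[0] XOR a[1], and b[1] is 1.
p = lambda a, b: (a[0] ^ a[1]) == b[0] and b[1] == 1

b_solutions = eliminate_a(p, n_a=2, n_b=2)
```

The number of iterations equals the number of distinct B solutions, which is why the approach degrades as the predicate count grows.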
Why Verification Tasks Feasible
CLU Logic Fairly Simple


Equality, uninterpreted functions, difference constraints
Small model property
“Deep” Reasoning Not Required



Formulas large and messy, but straightforward
Verifying systems that are designed to have constrained behaviors
Only checking effect of a few cycles of system operation
Decision Procedures Revisited
SAT-Based Approaches Effective


Good performance as decision procedures
Key to implementing predicate abstraction
  - Quantifier elimination
Eager Encoding Gives Good Performance

Avoids many iterations of theory-specific checkers
Extends to linear integer arithmetic
  - Seshia & Bryant, LICS ‘04
  - Quantifier-free Presburger
  - Small domain encoding exploiting sparseness
Areas of Research
Bit-Vector Decision Procedures

True model for hardware & low-level software
  - Bit-field extraction
  - Bit-wise Boolean operations
  - Overflow effects

Automatically apply abstractions
  - Abstract to symbolic terms whenever possible
Boolean Quantifier Elimination

SAT enumeration still not good enough
  - Limits predicate abstraction to ~25 predicates

Core problem for symbolic model checking
More Research
Proof Generation

Hard to see how to generate unsatisfiability proof for CLU formula
Debugging Support

Bounded model checking: provide counterexample trace
Invariant checking: hard to determine why invariant fails
  - And may be due to weakness in quantifier instantiation

Predicate abstraction: Gets nowhere without right set of predicates
Proving Liveness

Current abstractions do not preserve liveness properties
Can help in proving progress invariant
Questions?