Modeling Data in Formal Verification Bits, Bit Vectors, or Words Randal E. Bryant

advertisement
Modeling Data in Formal
Verification
Bits, Bit Vectors, or Words
Randal E. Bryant
Carnegie Mellon University
http://www.cs.cmu.edu/~bryant
Overview
Issue


How should data be modeled in formal analysis?
Verification, test generation, security analysis, …
Approaches

Bits: Every bit is represented individually
 Basis for most CAD, model checking

Words: View each word as arbitrary value
 E.g., unbounded integers
 Historic program verification work

Bit Vectors: Finite precision words
 Captures true semantics of hardware and software
–2–
 More opportunities for abstraction than with bits
Bit-Level Modeling
Control Logic
Com.
Log.
1
Com.
Log.
2
Data Path
Represent Every Bit of State Individually


Behavior expressed as Boolean next-state over current state
Historic method for most CAD, testing, and verification tools
 E.g., model checkers
–3–
Bit-Level Modeling in Practice
Strengths


Allows precise modeling of system
Well developed technology
 BDDs & SAT for Boolean reasoning
Limitations

Every state bit introduces two Boolean variables
 Current state & next state

Overly detailed modeling of system functions
 Don’t want to capture full details of FPU
Making It Work


–4–
Use extensive abstraction to reduce bit count
Hard to abstract functionality
Word-Level Abstraction #1:
Bits → Integers
x0
x1
x2

xn-1
View Data as Symbolic Words

Arbitrary integers
 No assumptions about size or encoding
 Classic model for reasoning about software

–5–
Can store in memories & registers
x
Abstracting Data Bits
Control Logic
Com.
Log.
?
1
Com.
Log.
?
2
1
Data Path
What do we do about logic functions?
–6–
Word-Level Abstraction #2:
Uninterpreted Functions
A
Lf
U
For any Block that Transforms or Evaluates Data:


Replace with generic, unspecified function
Only assumed property is functional consistency:
a = x  b = y  f (a, b) = f (x, y)
–7–
Abstracting Functions
Control Logic
Com.
Log.
F1
1
Com.
Log.
F2
1
Data Path
For Any Block that Transforms Data:



–8–
Replace by uninterpreted function
Ignore detailed functionality
Conservative approximation of actual system
Word-Level Modeling: History
Historic

Used by theorem provers
More Recently

Burch & Dill, CAV ’94
 Verify that pipelined processor has same behavior as
unpipelined reference model
 Use word-level abstractions of data paths and memories
 Use decision procedure to determine equivalence

Bryant, Lahiri, Seshia, CAV ’02
 UCLID verifier
 Tool for describing & verifying systems at word level
–9–
Pipeline Verification Example
IF/ID
PC
Op
ID/EX
Control
EX/WB
Control
Rd
Ra
Instr
Mem
Pipelined Processor
=
Adat
Reg.
File
A
L
U
Imm
+4
=
Rb
Op
Control
Rd
Ra
Instr
Mem
Reference Model
Adat
Reg.
File
Imm
+4
Rb
– 10 –
Bdat
A
L
U
Abstracted Pipeline Verification
IF/ID
PC
Op
ID/EX
Control
EX/WB
Control
Rd
Ra
Instr
F3
Mem
Pipelined Processor
=
Adat
Reg.
File
A
FL2
U
Imm
F1
+4
=
Rb
Op
PC
Control
Rd
Ra
Instr
F3
Mem
Reference Model
Adat
Reg.
File
Imm
+4
F1
Rb
– 11 –
Bdat
A
FL2
U
Experience with Word-Level Modeling
Powerful Abstraction Tool


Allows focus on control of large-scale system
Can model systems with very large memories
Hard to Generate Abstract Model


Hand-generated: how to validate?
Automatic abstraction: limited success
 Andraus & Sakallah, DAC 2004
Realistic Features Break Abstraction

E.g., Set ALU function to A+0 to pass operand to output
Desire

– 12 –
Should be able to mix detailed bit-level representation with
abstracted word-level representation
Bit Vectors: Motivating Example #1
int abs(int x) {
int mask = x>>31;
return (x ^ mask) + ~mask + 1;
}
int test_abs(int x) {
return (x < 0) ? -x : x;
}
Do these functions produce identical results?
Strategy


– 13 –
Represent and reason about bit-level program behavior
Specific to machine word size, integer representations,
and operations
Motivating Example #2
void fun() {
char fmt[16];
fgets(fmt, 16, stdin);
fmt[15] = '\0';
printf(fmt);
}
Is there an input string that causes value 234 to be
written to address a4a3a2a1?
Answer

Yes: "a1a2a3a4%230g%n"
 Depends on details of compilation

– 14 –

But no exploit for buffer size less than 8
[Ganapathy, Seshia, Jha, Reps, Bryant, ICSE ’05]
Motivating Example #3
bit[W] popSpec(bit[W] x)
{
int cnt = 0;
for (int i=0; i<W; i++) {
if (x[i]) cnt++;
}
return cnt;
}
bit[W] popSketch(bit[W] x)
{
loop (??) {
x = (x&??) + ((x>>??)&??);
}
return x;
}
Is there a way to expand the program sketch to
make it match the spec?
Answer
– 15 –

W=16:

[Solar-Lezama, et al., ASPLOS ‘06]
x
x
x
x
=
=
=
=
(x&0x5555)
(x&0x3333)
(x&0x0077)
(x&0x000f)
+
+
+
+
((x>>1)&0x5555);
((x>>2)&0x3333);
((x>>8)&0x0077);
((x>>4)&0x000f);
Motivating Example #4
Sequential Reference Model
Pipelined Microprocessor
IF/ID
PC
Op
ID/EX
Control
EX/WB
Op
Control
Rd
Ra
Instr
Mem
Control
Rd
Ra
=
Adat
Reg.
File
Instr
Mem
A
L
U
Imm
Adat
Reg.
File
Imm
Bdat
A
L
U
+4
Rb
=
+4
Rb
Is pipelined microprocessor identical to sequential
reference model?
Strategy

Represent machine instructions, data, and state as bit vectors
 Compatible with hardware description language representation

– 16 –
Verifier finds abstractions automatically
Bit Vector Formulas
Fixed width data words
Arithmetic operations


Add/subtract/multiply/divide, …
Two’s complement, unsigned, …
Bit-wise logical operations

Bitwise and/or/xor, shift/extract, concatenate
Predicates

==, <=
Task


– 17 –
50000 * 50000 =
-1794967296
Is formula satisfiable?
E.g., a > 0 && a*a < 0
(on 32-bit machine)
Decision Procedures

Core technology for formal reasoning
Satisfying solution
Formula
Decision
Procedure
Boolean SAT

Pure Boolean formula
SAT Modulo Theories (SMT)


Support additional logic fragments
Example theories
 Linear arithmetic over reals or integers
 Functions with equality
 Bit vectors
 Combinations of theories
– 18 –
Unsatisfiable
(+ proof)
– 19 –
ra
sp
(2
00
2)
118
81
20
07
)
147
Rs
at
(
(2
00
304
Si
)
eg
e
(2
Sa
00
tE
4)
l it
eG
TI
(2
00
5)
zC
ha
ff
in
(2
00
1)
(2
00
0)
1,000
Be
rk
M
zC
ha
ff
G
Run-time (sec.)
Recent Progress in SAT Solving
3600
3,000
2,000
766
46
19
0
BV Decision Procedures:
Some History
B.C. (Before Chaff)



String operations (concatenate, field extraction)
Linear arithmetic with bounds checking
Modular arithmetic
Limitations

– 20 –
Cannot handle full range of bit-vector operations
BV Decision Procedures:
Using SAT
SAT-Based “Bit Blasting”



Generate Boolean circuit based on bit-level behavior of
operations
Convert to Conjunctive Normal Form (CNF) and check with
best available SAT checker
Handles arbitrary operations
Effective in Many Applications



– 21 –
CBMC [Clarke, Kroening, Lerda, TACAS ’04]
Microsoft Cogent + SLAM [Cook, Kroening, Sharygina, CAV
’05]
CVC-Lite [Dill, Barrett, Ganesh], Yices [deMoura, et al]
Bit-Vector Challenge
Is there a better way than bit blasting?
Requirements



Provide same functionality as with bit blasting
Find abstractions based on word-level structure
Improve on performance of bit blasting
Observation

Must have bit blasting at core
 Only approach that covers full functionality

Want to exploit special cases
 Formula satisfied by small values
 Simple algebraic properties imply unsatisfiability
 Small unsatisfiable core
 Solvable by modular arithmetic
 …
– 22 –
Some Recent Ideas
Iterative Approximation




UCLID: Bryant, Kroening, Ouaknine, Seshia, Strichman,
Brady, TACAS ’07
Use bit blasting as core technique
Apply to simplified versions of formula
Successive approximations until solve or show unsatisfiable
Using Modular Arithmetic


STP: Ganesh & Dill, CAV ’07
Algebraic techniques to solve special case forms
Layered Approach

– 23 –

MathSat: Bruttomesso, Cimatti, Franzen, Griggio, Hanna,
Nadel, Palti, Sebastiani, CAV ’07
Use successively more detailed solvers
Iterative Approach Background:
Approximating Formula
Overapproximation
Original Formula
Underapproximation
+
  +
More solutions:
If unsatisfiable,
then so is 
−  
Fewer solutions:
Satisfying solution
also satisfies 

−
Example Approximation Techniques

Underapproximating
 Restrict word-level variables to smaller ranges of values

Overapproximating
 Replace subformula with Boolean variable
– 24 –
Starting Iterations

 1−
Initial Underapproximation


– 25 –
(Greatly) restrict ranges of word-level variables
Intuition: Satisfiable formula often has small-domain
solution
First Half of Iteration
1+

 1−
UNSAT proof:
generate
overapproximation
If SAT, then done
SAT Result for 1−

Satisfiable
 Then have found solution for 

Unsatisfiable
 Use UNSAT proof to generate overapproximation 1+
– 26 –
 (Described later)
Second Half of Iteration
1+
SAT:
Use solution to generate
refined underapproximation
If UNSAT, then done

 2−
 1−
SAT Result for 1+

Unsatisfiable
 Then have shown  unsatisfiable

Satisfiable
 Solution indicates variable ranges that must be expanded
– 27 –
 Generate refined underapproximation
Iterative Behavior
 2+  +
1

k+

Underapproximations


k−
Overapproximations

 2−
1−
– 28 –
Successively more precise
abstractions of 
Allow wider variable ranges

No predictable relation
UNSAT proof not unique
Overall Effect
 2+  +
1

k+
Soundness
UNSAT



k−
Completeness
SAT
 2−
1−
Only terminate with solution
on underapproximation
Only terminate as UNSAT on
overapproximation


Successive
underapproximations
approach 
Finite variable ranges
guarantee termination
 In worst case, get k−  
– 29 –
Generating Overapproximation
Given



Underapproximation 1−
Bit-blasted translation of 1−
into Boolean formula
Proof that Boolean formula
unsatisfiable
Generate


Overapproximation 1+
If 1+ satisfiable, must lead to
refined underapproximation
 Generate 2− such that
 1−   2−  
– 30 –
 1+

 2−
 1−
UNSAT proof:
generate
overapproximation
Bit-Vector Formula Structure

DAG representation to allow shared subformulas
x=y
Ç
x+2z1
Æ
:
a
w & 0xFFFF = x
Æ
Ç
Ç
x % 26 = v
– 31 –

Structure of Underapproximation
w
x
y
z
x=y
Range
Constraints
Ç
x+2z1
Æ
:
a
w & 0xFFFF = x
Æ
Æ
−
Ç
Ç
x % 26 = v

Linear complexity translation to CNF
 Each word-level variable encoded as set of Boolean variables
 Additional Boolean variables represent subformula values
– 32 –
Encoding Range Constraints
Explicit

View as additional predicates in formula
w
Range
Constraints
x
0  w  8  −4  x  4
Implicit
– 33 –

Reduce number of variables in encoding
Constraint
Encoding
0w8
0 0 0 ··· 0 w2w1w0
−4  x  4
xsxsxs··· xsxsx1x0

Yields smaller SAT encodings
UNSAT Proof



Subset of clauses that is unsatisfiable
Clause variables define portion of DAG
Subgraph that cannot be satisfied with given range
constraints
w
x
y
z
Range
Constraints
x=y
Ç
x+2z1
Æ
:
a
w & 0xFFFF = x
Æ
Ç
Ç
x % 26 = v
– 34 –
Æ
Extracting Circuit from UNSAT Proof

Subgraph that cannot be satisfied with given range
constraints
 Even when replace rest of graph with unconstrained variables
w
x
y
z
Range
Constraints
x=y
Ç
x+2z1
Æ
UNSAT
:
a
Æ
Ç
b1
– 35 –
b2
Æ
Generated Overapproximation


Remove range constraints on word-level variables
Creates overapproximation
 Ignores correlations between values of subformulas
x=y
Ç
x+2z1
Æ
:
a
Æ
Ç
b1
b2
– 36 –
1+
Refinement Property
Claim

1+ has no solutions that satisfy 1−’s range constraints
 Because 1+ contains portion of 1− that was shown to be
unsatisfiable under range constraints
w
x
y
z
Range
Constraints
x=y
Ç
x+2z1
Æ
UNSAT
:
a
Æ
Ç
b1
– 37 –
b2
 1+
Æ
Refinement Property (Cont.)
Consequence


Solving 1+ will expand range of some variables
Leading to more exact underapproximation 2−
x=y
Ç
x+2z1
Æ
:
a
Æ
Ç
b1
b2
– 38 –
1+
Effect of Iteration
1+
SAT:
Use solution to generate
refined underapproximation

UNSAT proof:
generate
overapproximation
 2−
 1−
Each Complete Iteration


– 39 –
Expands ranges of some word-level variables
Creates refined underapproximation
Approximation Methods
So Far

Range constraints
 Underapproximate by constraining values of word-level
variables

Subformula elimination
 Overapproximate by assuming subformula value arbitrary
General Requirements


Systematic under- and over-approximations
Way to connect from one to another
Goal: Devise Additional Approximation Strategies
– 40 –
Function Approximation Example
x
x
0
1
else
0
0
0
0
1
0
1
x
else
0
y
§
*
y
y
Motivation

Multiplication (and division) are difficult cases for SAT
§: Prohibit Via Additional Range Constraints


Gives underapproximation
Restricts values of (possibly intermediate) terms
§: Abstract as f (x,y)
– 41 –

Overapproximate as uninterpreted function f

Value constrained only by functional consistency
Results: UCLID BV vs. Bit-blasting
[results on 2.8 GHz Xeon, 2 GB RAM]


– 42 –

UCLID always better than bit blasting
Generally better than other available procedures
SAT time is the dominating factor
Challenges with Iterative
Approximation
Formulating Overall Strategy


Which abstractions to apply, when and where
How quickly to relax constraints in iterations
 Which variables to expand and by how much?
 Too conservative: Each call to SAT solver incurs cost
 Too lenient: Devolves to complete bit blasting.
Predicting SAT Solver Performance


Hard to predict time required by call to SAT solver
Will particular abstraction simplify or complicate SAT?
Combination Especially Difficult

– 43 –
Multiple iterations with unpredictable inner loop
STP: Linear Equation Solving



Ganesh & Dill, CAV ’07
Solve linear equations over integers mod 2w
Capture range of solutions with Boolean Variables
Example Problem

Variables: 3-bit unsigned integers
x: [x2 x1 x0]


y: [y2 y1 y0]
Linear equations: conjunction of linear constraints
3x
+ 4y
2x
+ 2y
2x
+ 4y
+ 2z
+2
+ 2z
General Form
 A x = b mod 2w
– 44 –
z: [z2 z1 z0]
=0
mod 8
=0
mod 8
=0
mod 8
Solution Method
Equations
3x
+ 4y
2x
+ 2y
2x
+ 4y
+ 2z
+2
+ 2z
=0
mod 8
=0
mod 8
=0
mod 8
Some Number Theory

Odd number has multiplicative inverse mod 2w
 Mod 8: 3−1 = 3

Additive inverse mod 2w: −x = 2w − x
 Mod 8:

– 45 –
−4 = 4
−2 = 6
Solve first equation for x:
3·3x
=
3·4y
+ 3·6z
mod 8
x
=
4y
+ 2z
mod 8
Solution Method (cont.)
Substitutions
2x
+ 2y
+2
2x
+ 4y
+ 2z
x
=
4y
mod 8
=0
mod 8
+ 2z
2(4y+2z) + 2y
mod 8
+2
2(4y+2z) + 4y
– 46 –
=0
+ 2z
2y
+ 4z
4y
+ 6z
+2
=0
mod 8
=0
mod 8
=0
mod 8
=0
mod 8
What if All Coefficients Even?
Result of Substitutions

2y
+ 4z
4y
+ 6z
+2
=0
mod 8
=0
mod 8
Even numbers do not have multiplicative inverses mod 8
Observation

– 47 –
Can divide through and reduce modulus
y
+ 2z
+1
2y
+ 3z
y
=
2z
z
=
2
mod 4
y
=
3
mod 4
+3
=0
mod 4
=0
mod 4
mod 4
General Solutions

Original variables: 3-bit unsigned integers
x: [x2 x1 x0]


y: [y2 y1 y0]
z: [z2 z1 z0]
Solutions
y
=
3
mod 4
z
=
2
mod 4
Constrained variables
y: [y2 1 1]
z: [z2 1 0]
Back Substitution

x
=
4y
x
=
0
Constrained variables
x: [0 0 0]
– 48 –
+ 6z
mod 8
mod 8
Linear Equation Solutions
Equations
3x
+ 4y
2x
+ 2y
2x
+ 4y
+ 2z
+2
+ 2z
=0
mod 8
=0
mod 8
=0
mod 8
Encoding All Possible Solutions



x: [0 0 0]
y: [y2 1 1]
z: [z2 1 0]
y2, z2 arbitrary Boolean variables
4 possible solutions (out of original 512)
General Principle



– 49 – 
Form of LU decomposition
Polynomial time algorithm
Boolean variables in solution to express set of solutions
Only works when have conjunction of linear constraints
Layered Solver


Bruttomesso, et al, CAV ’07
Part of MathSAT project
DPLL(T) Framework


– 50 –
SAT solver coupled with solver for mathematical theory T
BV theory solver works with conjunctions of constraints
DPLL(T): Formula Structure
Atoms
x=y
Boolean Structure
Ç
x+2z1
Æ
:
x << 1 > y
w & 0xFFFF = x
Æ
Ç
Ç
x % 26 = v
Atoms


– 51 –
Predicates applied to bit-vector expressions
Boolean Variables

DPLL(T): Operation
1
Ç
Actions


DPLL engine satisfies
Boolean portion
Theory solver
determines whether
resulting conjunction
of atoms satifiable
Æ
0
:
1
Æ
Ç
1
Ç
0
x=y


x+2z>1

x << 1 > y

w & 0xFFFF ≠ x

x % 26 = v
Solver provides information to DPLL engine to aid search
 Nonchronological backtracking
 Conflict clause generation
– 52 –

Successful approach for other decision procedures
MathSAT Layers

Uses increasingly detailed solver layers
 Only progress if can’t find conflict using more abstract rules
Layers
1.
Equality with uninterpreted functions
 Treats all bit-level functions and operators as uninterpreted
2.
3.
– 53 –
Simple handling of concatenations, extractions, and
transitivity
Full solver using linear arithmetic + SAT
Summary: Modeling Levels
Bits


Limited ability to scale
Hard to apply functional abstractions
Words


Allows abstracting data while precisely representing control
Overlooks finite word-size effects
Bit Vectors


Realistic semantic model for hardware & software
Captures all details of actual operation
 Detects errors related to overflow and other artifacts of finite
representation

– 54 –
Can apply abstractions found at word-level
Areas of Agreement
SAT-Based Framework Is Only Logical Choice

SAT solvers are good & getting better
Want to Automatically Exploit Abstractions


Function structure
Arithmetic properties
 E.g., associativity, commutativity

Arithmetic reductions
 E.g., LU decomposition
Base Level Should Be SAT

– 55 –
Only semantically complete approach
Choices
Optimize for Special Formula Classes

E.g., STP optimized for conjunctions of constraints
 Common in software verification & testing
Iterative Abstraction


Natural framework for attempting different abstractions
Having SAT solver in inner loop makes performance tuning
difficult
DPLL(T) Framework


Theory solver only deals with conjunctions
May need to invoke SAT solver in inner loop
 Hard to coordinate outer and inner search procedures
Others?
– 56 –
Observations
Bit-Vector Modeling Gaining in Popularity


Recognition of importance
Benchmarks and competitions
Just Now Improving on Bit Blasting + SAT
Lots More Work to be Done
– 57 –
Download