Modeling and Verification of Out-of-Order Microprocessors in UCLID
Shuvendu K. Lahiri, Sanjit A. Seshia, Randal E. Bryant
Carnegie Mellon University, USA
Processor Verification
[Figure: two views of system operation. Instruction Set Architecture: each transition is one instruction execution. Microarchitecture: each transition is one clock cycle.]
Views of System Operation
• Instruction Set
  • Instructions executed in sequential order
  • Instruction modifies “programmer-visible” state
• Microarchitecture
  • At any given time, multiple instructions “in flight”
  • State held in hidden pipeline registers and buffers
Verification Task
• Prove that all instruction sequences execute as predicted by the instruction set model
FMCAD’02
Introduction and Related Work
In-Order Pipeline Verification
• Burch and Dill, CAV ’94
  • Relates implementation and specification by completing partially-executed instructions in the pipeline (flushing)
  • Infinite data words, memories
  • Bounded (fixed) resources only
    • Can’t model a reorder buffer (ROB) of arbitrary length
Out-of-Order Processor Verification
• Large (64-128 entry) reorder buffers, reservation stations and load-store queues
• Very large number of instructions in the pipeline
• No finite flushing function to drain the pipeline
Out-Of-Order Processor Verification
Theorem Proving approaches
• Hosabettu et al. (’00), Sawada et al. (’98), Arons et al. (’00)
• Write inductive invariants
• Manually guide the theorem provers in proving the invariants
• Large, complicated proof scripts (fragile)
• Seldom have good counterexample facilities
Compositional Model Checking [McMillan et al.]
• Uses compositional model checking with temporal case splitting, path splitting, symmetry and data-type reduction
• Does not require writing inductive invariants
• User needs to decompose the proof manually
• Has not been demonstrated to be effective for deep, superscalar pipelines
Other Approaches
• Finite State Model Checking [Berezin et al.], Incremental Flushing [Skakkebæk et al.], Decision Procedure [Velev]
Contributions
• Extends the work by Bryant & Velev
  • Which was restricted to in-order pipelines with bounded resources
• Application of UCLID
  • Modeling framework for out-of-order processors
  • Application of three verification approaches to an out-of-order processor
• Effective use of an automated decision procedure
  • For proving large formulas automatically
  • Simple heuristics for quantifier instantiation
CLU : Logic of UCLID
Terms (T)                     Integer expressions
  ITE(F, T1, T2)              If-then-else
  Fun(T1, …, Tk)              Function application
  succ(T)                     Increment
  pred(T)                     Decrement
Formulas (F)                  Boolean expressions
  ¬F, F1 ∧ F2, F1 ∨ F2        Boolean connectives
  T1 = T2                     Equation
  T1 < T2                     Inequality
  P(T1, …, Tk)                Predicate application
Functions (Fun)               Integers → Integer
  f                           Uninterpreted function symbol
  λ x1, …, xk . T             Function definition
Predicates (P)                Integers → Boolean
  p                           Uninterpreted predicate symbol
  λ x1, …, xk . F             Predicate definition
Decision Procedure
[Figure: CLU Formula → Lambda Expansion → λ-free Formula → Function & Predicate Elimination → Function-free Formula → Convert to Boolean Formula → Boolean Formula → Boolean Satisfiability]
Operation
• Series of transformations leading to a propositional formula
• Propositional formula checked with BDD or SAT tools
Bryant, Lahiri, Seshia [CAV02]
Modeling Memories with λ’s
Memory M modeled as a function
• M(a): value at location a
Initially
• Arbitrary state
• Modeled by uninterpreted function m0
Writing transforms memory
  next[M] = Write(M, wa, wd)
          = λ a . ITE(a = wa, wd, M(a))
• Future reads of address wa will get wd
[Figure: next[M] drawn as a multiplexer that returns wd when a = wa and M(a) otherwise]
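A minimal Python sketch of the same idea, assuming memories are represented as Python closures. The names `write` and `m0` are illustrative only (not UCLID syntax), and `m0` is an arbitrary concrete stand-in for the uninterpreted initial memory. Each write wraps the previous memory in a new function, just like the λ-expression above:

```python
from typing import Callable

Memory = Callable[[int], int]  # a memory maps addresses to data words


def write(m: Memory, wa: int, wd: int) -> Memory:
    """next[M] = Write(M, wa, wd) = λ a . ITE(a = wa, wd, M(a))."""
    return lambda a: wd if a == wa else m(a)


def m0(a: int) -> int:
    """Placeholder for the uninterpreted initial memory m0 (any fixed function will do)."""
    return 1000 + a


m1 = write(m0, 4, 42)
m2 = write(m1, 7, 99)
print(m2(4), m2(7), m2(5))  # 42 99 1005
```

No memory array is ever copied: a read walks back through the chain of writes until it hits a matching address or reaches `m0`, which is the term structure the decision procedure sees after lambda expansion.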
Modeling Unbounded FIFO Buffer
Queue is a subrange of an infinite sequence
• h : INT  Head of the queue
• t : INT  Tail of the queue
• q : INT → INT  Function mapping indices to values
  • q(i) valid only when h ≤ i < t
[Figure: infinite sequence … q(h-2), q(h-1), q(h) ← head, q(h+1), …, q(t-2), q(t-1), q(t) ← tail, q(t+1), …]
Modeling FIFO Buffer (cont.)
[Figure: effect of op = PUSH with Input = x: x is written into q(t), the tail advances to next[t] = t+1, and the head stays at next[h] = h]

next[t] := case
    (operation = PUSH) : succ(t) ;
    default : t ;
  esac

next[q] := lambda (i).
  case
    (operation = PUSH) & (i = t) : x ;
    default : q(i) ;
  esac

next[h] := case
    (operation = POP) : succ(h) ;
    default : h ;
  esac
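The three case expressions can be mirrored in a short Python sketch, assuming the queue contents are a closure `q` and the state is an immutable record; `Fifo`, `step` and `contents` are hypothetical names. Like the UCLID model, this sketch does not guard POP on an empty queue:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Fifo:
    h: int                    # head of the queue
    t: int                    # tail of the queue
    q: Callable[[int], int]   # q : INT -> INT, meaningful only for h <= i < t


def step(s: Fifo, operation: str, x: int = 0) -> Fifo:
    """One transition, mirroring the next[t], next[q] and next[h] case expressions."""
    next_t = s.t + 1 if operation == "PUSH" else s.t
    next_h = s.h + 1 if operation == "POP" else s.h
    old_q, t = s.q, s.t

    def next_q(i: int) -> int:
        return x if (operation == "PUSH" and i == t) else old_q(i)

    return Fifo(next_h, next_t, next_q)


def contents(s: Fifo) -> list:
    """The valid window q(h), ..., q(t-1)."""
    return [s.q(i) for i in range(s.h, s.t)]


s = Fifo(h=0, t=0, q=lambda i: -1)  # -1 stands in for arbitrary initial contents
for op, x in [("PUSH", 10), ("PUSH", 20), ("POP", 0), ("PUSH", 30)]:
    s = step(s, op, x)
print(contents(s))  # [20, 30]
```

Entries below the head are never erased; they simply fall outside the window h ≤ i < t, which is why the queue needs no bound on its length.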
Modeling Parallel Updates
• Update an arbitrary subset of entries in the same step
  next[M] := λ i. ITE(P(i), D(i), M(i))
  • Any entry i that satisfies the predicate P(i) is updated with D(i)
• Useful for modeling reorder buffers
  • Forwarding data to all dependent instructions
[Figure: Simultaneous-Update Memories. Entries where P holds (e.g. i+1, i+2, j+1, j+3) receive D(·) in the same step; all other entries keep M(·).]
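A sketch of the simultaneous update in Python, assuming closures for memories and predicates. The forwarding scenario (a value 7 on the result bus for tag 3) and all names are hypothetical, chosen only to show one step updating several entries at once:

```python
from typing import Callable

Mem = Callable[[int], int]


def parallel_update(m: Mem, p: Callable[[int], bool], d: Mem) -> Mem:
    """next[M] := λ i. ITE(P(i), D(i), M(i)): all entries satisfying P change in one step."""
    return lambda i: d(i) if p(i) else m(i)


# Hypothetical forwarding scenario: a result value 7 appears on the result bus with
# tag 3, and every ROB entry whose first operand waits on tag 3 captures it at once.
src1tag = {0: 3, 1: 2, 2: 3, 3: 1}


def src1val(i: int) -> int:
    return 0  # operand not yet available


def waits_on_tag3(i: int) -> bool:
    return src1tag[i] == 3


def bus_value(i: int) -> int:
    return 7


next_src1val = parallel_update(src1val, waits_on_tag3, bus_value)
print([next_src1val(i) for i in range(4)])  # [7, 0, 7, 0]
```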
[Figure: UCLID tool flow. A UCLID description feeds a term-level symbolic simulator supporting bounded property checking, correspondence checking and inductive invariant checking; the resulting formulas go to a decision procedure with BDD and SAT back ends, and a counterexample generator reports failures.]
• Systems are modeled in CLU logic
• Three verification techniques
  • Based on symbolic simulation
  • Use the decision procedure
• Counterexample traces generated for verification failures
Verification Techniques in UCLID
Bounded Property Checking
• Start in reset state
• Symbolically simulate for a fixed number of steps
• Verify a safety property for all states reachable within the fixed number of steps from the start state
Correspondence Checking
• Run 2 different simulations starting in the most general state
• Prove that the final states are equivalent
• e.g. Burch-Dill technique
Invariant Checking
• Start in a general state s
• Prove Inv(s) ⇒ Inv(next[s])
• Limited support for automatic quantifier instantiation
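UCLID discharges Inv(s) ⇒ Inv(next[s]) symbolically, for every integer-valued state. As a rough explicit-state analogy only, this Python sketch enumerates a finite window of states of the FIFO head/tail pointers, with the assumed invariant h ≤ t. POP is assumed to be guarded by non-emptiness, and the names `next_state`, `inv` and `is_inductive` are hypothetical:

```python
from itertools import product

OPS = ("PUSH", "POP", "NOP")


def next_state(h: int, t: int, op: str, guarded: bool = True) -> tuple:
    """FIFO pointer update; with guarded=True, POP is ignored on an empty queue."""
    if op == "PUSH":
        return h, t + 1
    if op == "POP" and (h < t or not guarded):
        return h + 1, t
    return h, t


def inv(h: int, t: int) -> bool:
    return h <= t


def is_inductive(bound: int, guarded: bool = True) -> bool:
    """Check Inv(s) => Inv(next[s]) for all states with -bound <= h, t < bound."""
    window = range(-bound, bound)
    return all(
        inv(*next_state(h, t, op, guarded))
        for h, t, op in product(window, window, OPS)
        if inv(h, t)
    )


print(inv(0, 0), is_inductive(8), is_inductive(8, guarded=False))  # True True False
```

The induction step starts from any state satisfying the invariant, not only reachable ones, so dropping the guard makes the check fail at h = t even though no reset-state simulation needs to reach that state first.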
An Out-of-order Processor (OOO)
[Figure: the PC (with incrementer) fetches from program memory into a decode stage; the register rename unit holds valid, tag and val for each register; instructions are dispatched into the reorder buffer (head, tail), executed by the ALU with results carried on the result bus, and retired from the head. Reorder buffer fields: valid, value, src1valid, src1val, src1tag (1st operand), src2valid, src2val, src2tag (2nd operand), dest, op.]
• Out-of-order execution engine
  • Register renaming
  • In-order retirement
• Unbounded reorder buffer
• Arithmetic instructions only
• Different components modeled in UCLID
Verification of OOO : Automation vs. Guarantee
Method                        Resources   Verification (# of steps)   Auxiliary variables   Invariants
Bounded Property Checking     Unbounded   Bounded                     None                  None
Burch-Dill Technique          Fixed       Unbounded                   None                  Very few
Inductive Invariant Checking  Unbounded   Unbounded                   Significant           Significant, including those for auxiliary variables

Presence of decision procedure
• Efficiency: allows improved bounded property checking and Burch-Dill method
• Automation: reduces manual guidance in proving invariants
  • Automatic instantiation of quantifiers
Technique 1 : Bounded Property Checking
Debugging OOO using Bounded Property Checking
• All the errors were discovered during this phase
• Counterexample traces were of great help
Debugging Motorola ELF™
• Superscalar out-of-order processor
• Reorder buffer, memory unit, load-store queues, etc.
• Applied during the early design exploration phase
Bounded Property Checking Results
Model      Steps   # of terms   Term formula size   Prop formula size   UCLID time (s)   SVC time (s)
OOO unit   10      59           2566                15290               10.8             233.18
           14      87           7480                62504               76.55            > 5 hrs
           20      129          19921               263413              1679.12          > 1 day
Elf™       6       33           218                 942                 1.2              10.9
           8       70           1085                4481                8.4              1851.6
           10      104          2467                16453               30.6             > 1 day
           12      149          4553                54288               111.0            > 1 day
SVC (Stanford): another decision procedure for solving CLU formulas
• Can decide a more expressive class
• CVC (successor of SVC) runs out of memory on the larger cases
Technique 2 : Burch-Dill Technique
[Figure: commuting diagram. δimpl takes the OOO state Qimpl to Q'impl; Abs maps each OOO state to an ISA state; δspec^k takes Qspec = Abs(Qimpl) to Q'spec.]
• δimpl = transition function of OOO
• δspec = transition function of ISA
• k = issue width of OOO
• Abs = relates an OOO state with an ISA state
• Correctness: the diagram commutes, i.e. Abs(δimpl(Qimpl)) = δspec^k(Abs(Qimpl))
Restrict the number of entries in the reorder buffer
• The number of ROB entries = r
Flushing as the abstraction function Abs
• Alternate between executing the instruction at the head of the reorder buffer and retiring the head
Inductive invariants required for the initial state Qimpl
• Critical for out-of-order processor verification
• Redundancy present in the OOO model
  • Because of out-of-order execution and register renaming
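Flushing as Abs and the commuting-diagram check can be illustrated with a small executable toy. This Python sketch is an assumption-laden analogy, not the UCLID model: a three-register ISA with add/sub only, the ROB as a Python list of (instruction, value) pairs, renaming modeled by reading the youngest older writer in the ROB, and randomized testing in place of UCLID's symbolic proof. Dispatch is compared against one ISA step (k = 1); execute and retire against zero ISA steps (k = 0). All function names are hypothetical:

```python
import random

# Hypothetical toy ISA: three registers, instructions (dest, src1, src2, op), op in {add, sub}.
REGS = ("r0", "r1", "r2")


def alu(op, a, b):
    return a + b if op == "add" else a - b


def isa_step(rf, ins):
    """δspec: execute one instruction on the programmer-visible register file."""
    dest, s1, s2, op = ins
    return {**rf, dest: alu(op, rf[s1], rf[s2])}


def operand(rf, rob, i, reg):
    """Value of reg seen by ROB entry i: youngest older writer in the ROB, else the RF."""
    for j in range(i - 1, -1, -1):
        if rob[j][0][0] == reg:
            return rob[j][1]  # None if that producer has not executed yet
    return rf[reg]


def dispatch(rf, rob, ins):
    return rf, rob + [(ins, None)]


def execute(rf, rob, i):
    """Execute entry i (possibly out of order) if its operands are ready."""
    ins, val = rob[i]
    a, b = operand(rf, rob, i, ins[1]), operand(rf, rob, i, ins[2])
    if val is not None or a is None or b is None:
        return rf, rob
    return rf, rob[:i] + [(ins, alu(ins[3], a, b))] + rob[i + 1:]


def retire(rf, rob):
    """Retire the head (in order) once it has executed."""
    if rob and rob[0][1] is not None:
        return {**rf, rob[0][0][0]: rob[0][1]}, rob[1:]
    return rf, rob


def flush(rf, rob):
    """Abs: alternately execute the head and retire it until the ROB is empty."""
    while rob:
        rf, rob = retire(*execute(rf, rob, 0))
    return rf


def check(trials=2000, seed=0):
    """Randomly test Abs(δimpl(Q)) == δspec^k(Abs(Q)) on reachable states."""
    rng = random.Random(seed)

    def rand_ins():
        return (rng.choice(REGS), rng.choice(REGS), rng.choice(REGS),
                rng.choice(("add", "sub")))

    for _ in range(trials):
        rf, rob = {r: rng.randint(0, 9) for r in REGS}, []
        for _ in range(rng.randint(0, 8)):  # random walk to a reachable OOO state
            c = rng.random()
            if c < 0.4:
                rf, rob = dispatch(rf, rob, rand_ins())
            elif c < 0.8 and rob:
                rf, rob = execute(rf, rob, rng.randrange(len(rob)))
            else:
                rf, rob = retire(rf, rob)
        spec = flush(rf, rob)
        ins = rand_ins()
        if flush(*dispatch(rf, rob, ins)) != isa_step(spec, ins):  # k = 1
            return False
        if flush(*retire(rf, rob)) != spec:  # k = 0
            return False
        if rob and flush(*execute(rf, rob, rng.randrange(len(rob)))) != spec:  # k = 0
            return False
    return True


print(check())  # True
```

Here, restricting the test to states reached from an empty ROB plays the role that the initial-state invariants play in the real proof: an arbitrary ROB with inconsistent precomputed values would break the commuting check.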
Technique 2 : Burch-Dill Technique
More automated than inductive invariant checking
• Does not require auxiliary structures
• Far fewer invariants than invariant checking: only 4 invariants, compared to about 12 for the inductive invariant checking approach
Burch-Dill Technique for OOO
• Exponential blowup with the number of ROB entries
  • Currently limited to r = 8 entries
  • r = 8 finished after case-splitting, in 2.5 hrs
# of ROB entries   # of terms   Term formula size   Prop formula size   UCLID time (s)
2                  63           398                 5325                6.83
3                  83           618                 10248               30.23
4                  103          886                 18175               157.41
6                  143          1534                41208               3051.79
8                  183          2342                82915               > 31 hrs
Technique 3 : Invariant Checking
• Deriving the inductive invariants
  • Requires additional (auxiliary) variables to express the invariants
  • Auxiliary variables do not affect system operation
• Proving that the invariants are inductive
  • Proof of invariants automated in UCLID
  • Eliminates the need for large (often fragile) proof scripts
Restricted Invariants and Proofs
• Restricted classes of invariants
  ∀x1∀x2…∀xk Φ(x1…xk)
  • Φ(x1…xk) is a CLU formula without quantifiers
  • x1…xk are integer variables free in Φ(x1…xk)
• Proving these invariants requires quantifiers
  |= (∀x1∀x2…∀xk Φ(x1…xk)) ⇒ ∀y1∀y2…∀ym Ψ(y1…ym)
• Automatic instantiation of x1…xk with concrete terms
  • Sound but incomplete method
  • Reduces the quantified formula to a CLU formula
  • Can use the decision procedure for CLU
Shadow Structures
• Auxiliary variables
  • Added to predict the correct value of state variables
  • 3 shadow variables for 3 state variables:
    rob.value → shdw.value
    rob.src1val → shdw.src1val
    rob.src2val → shdw.src2val
• Similar to McMillan’s approach and Arons et al.’s approach
Adding Shadow Structures
[Figure: the OOO processor with shadow fields shdw.value, shdw.src1val and shdw.src2val added to each reorder buffer entry]
Shadow fields are updated directly from the ISA model during dispatch:
  shdw.src1val[rob.tail] ← Rfisa(src1)
  shdw.src2val[rob.tail] ← Rfisa(src2)
  shdw.value[rob.tail] ← ALU(Rfisa(src1), Rfisa(src2), op)
Adding Shadow Structures
[Figure: the OOO processor with shadow fields shdw.value, shdw.src1val and shdw.src2val in the reorder buffer]
1. ∀t ∈ rob. rob.valid(t) ⇒ rob.value(t) = shdw.value(t)
2. ∀t ∈ rob. rob.src1valid(t) ⇒ rob.src1val(t) = shdw.src1val(t)
3. ∀t ∈ rob. rob.src2valid(t) ⇒ rob.src2val(t) = shdw.src2val(t)
Refinement Maps
[Figure: the OOO processor with shadow fields shdw.value, shdw.src1val and shdw.src2val in the reorder buffer]
• Correspondence with a sequential ISA model
  • OOO and ISA synchronized at dispatch
• For register file contents
  ∀r. reg.valid(r) ⇒ reg.val(r) = Rfisa(r)
• For program counter
  PCooo = PCisa
Invariants
• Tag consistency invariants (2)
  • Instructions only depend on instructions preceding them in program order
• Register renaming invariants (2)
  • A tag in the rename unit should be in the ROB, and the destination register should match:
    ∀r. ¬reg.valid(r) ⇒ (rob.head ≤ reg.tag(r) < rob.tail ∧ rob.dest(reg.tag(r)) = r)
  • For any entry, the destination should have reg.valid as false, and its tag should point to this or a later instruction:
    ∀t ∈ rob. (¬reg.valid(rob.dest(t)) ∧ t ≤ reg.tag(rob.dest(t)) < rob.tail)
Invariants (cont.)
• Executed instructions have operands ready
  ∀t ∈ rob. rob.valid(t) ⇒ rob.src1valid(t) ∧ rob.src2valid(t)
• Shadow-value-operands relationship
  ∀t ∈ rob. shdw.value(t) = Alu(shdw.src1val(t), shdw.src2val(t), rob.op(t))
• Producer-consumer values (2)
  ∀t ∈ rob. ¬rob.src1valid(t) ⇒ shdw.src1val(t) = shdw.value(rob.src1tag(t))
• Total 13 invariants
  • Includes refinement maps
  • Constraints on shadow variables
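To make the quantified invariants concrete, this Python sketch evaluates a subset of them (first-operand versions only) on one explicit state. It assumes that ∀t ∈ rob ranges over rob.head ≤ t < rob.tail, that each ROB entry is a dictionary of fields keyed by tag, and that the ALU is a concrete add/sub; the encoding and names are hypothetical. UCLID instead proves these formulas for all states symbolically:

```python
def alu(a: int, b: int, op: str) -> int:
    return a + b if op == "add" else a - b  # hypothetical concrete ALU


def check_invariants(rob: dict, head: int, tail: int) -> dict:
    """Evaluate several ROB invariants; ∀t ∈ rob is read as head <= t < tail."""
    def forall(pred):
        return all(pred(rob[t], t) for t in range(head, tail))

    return {
        # rob.valid(t) => rob.src1valid(t) ∧ rob.src2valid(t)
        "operands_ready": forall(lambda e, t: not e["valid"] or (e["src1valid"] and e["src2valid"])),
        # rob.valid(t) => rob.value(t) = shdw.value(t)
        "shadow_value": forall(lambda e, t: not e["valid"] or e["value"] == e["shdw_value"]),
        # rob.src1valid(t) => rob.src1val(t) = shdw.src1val(t)
        "shadow_src1val": forall(lambda e, t: not e["src1valid"] or e["src1val"] == e["shdw_src1val"]),
        # shdw.value(t) = Alu(shdw.src1val(t), shdw.src2val(t), rob.op(t))
        "shadow_alu": forall(lambda e, t: e["shdw_value"] == alu(e["shdw_src1val"], e["shdw_src2val"], e["op"])),
        # ¬rob.src1valid(t) => rob.src1tag(t) < t
        "tag_order": forall(lambda e, t: e["src1valid"] or e["src1tag"] < t),
        # ¬rob.src1valid(t) => shdw.src1val(t) = shdw.value(rob.src1tag(t)); missing producer counts as a violation
        "producer_consumer": forall(lambda e, t: e["src1valid"] or
                                    e["shdw_src1val"] == rob.get(e["src1tag"], {}).get("shdw_value")),
    }


# Entry 5 has executed 3 + 4; entry 6 (7 - 2) waits on entry 5 for its first operand.
rob = {
    5: dict(valid=True, value=7, src1valid=True, src1val=3, src1tag=0,
            src2valid=True, src2val=4, op="add",
            shdw_value=7, shdw_src1val=3, shdw_src2val=4),
    6: dict(valid=False, value=0, src1valid=False, src1val=0, src1tag=5,
            src2valid=True, src2val=2, op="sub",
            shdw_value=5, shdw_src1val=7, shdw_src2val=2),
}
print(all(check_invariants(rob, 5, 7).values()))  # True
rob[5]["value"] = 8  # corrupt an executed result
print(check_invariants(rob, 5, 7)["shadow_value"])  # False
```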
Proving Invariants
• Proved automatically
  • Quantifier instantiation was sufficient in these cases
  • Relieves the user of writing proof scripts to discharge the proofs
  • Time spent = 54 s on a 1.4 GHz machine
  • Total effort = 2 person-days
• Not possible to use SVC or CVC
  • Ordering between integer array indices, e.g.
    ∀t ∈ rob. ¬rob.src1valid(t) ⇒ rob.src1tag(t) < t
  • SVC/CVC interpret terms over the reals:
    (x < y+1) ⇒ (x ≤ y)
    • Valid when x, y are integers
    • Invalid when x, y are reals
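The integer/real gap can be checked concretely. This Python sketch tests the implication over a finite window of integers (a spot check, not a proof) and then evaluates it at the rational point x = 1/2, y = 0 using the standard `fractions` module; the function name `implication` is illustrative:

```python
from fractions import Fraction
from itertools import product


def implication(x, y) -> bool:
    """(x < y + 1) => (x <= y)."""
    return (not (x < y + 1)) or (x <= y)


# Over integers, x < y + 1 already means x <= y, so no counterexample exists.
print(all(implication(x, y) for x, y in product(range(-20, 21), repeat=2)))  # True
# Over the reals (rationals suffice), x = y + 1/2 is a counterexample.
print(implication(Fraction(1, 2), Fraction(0)))  # False
```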
Why Quantifier Instantiation works
Extensions to the base model
• Increase concurrency of the design
  • Infinite number of execution units
  • Any subset of {dispatch, execute, retire, nop} can be active
  • The same invariants were proved inductive without any changes
• Scalar → Superscalar
  • Incorporate issue width = 2 and retire width = 2
  • Data forwarding logic of the processor gets complicated
  • Same set of invariants proved automatically
  • No change in the proof script!!
  • Runtime increased from 54 s to 134 s
Adding circular reorder buffer
• ROB modeled as a finite but arbitrary-size circular FIFO
  • Tags are reused
  • No dispatch when the reorder buffer is full
• Changes in the model
  • Add a predicate rob.present() to indicate that a ROB entry contains a valid entry
  • Change the dispatch logic to stall when the ROB is full
  • Modify ‘<’ to incorporate wrap-around
• Changes in the proof script
  • Add 1 invariant about the relationship between rob.present and the active elements of the ROB
  • Again, the proof of invariants is automatic!!
Liveness Proof
• Liveness
  • Every dispatched instruction is eventually retired
• Assumes a “fair” scheduler
  • Attempts to execute the instruction at the head infinitely often
• Proceeds by a high-level induction
  • Not mechanical
  • Similar to the Hosabettu [CAV98] approach
  • Most lemmas required are already proved during the safety proof (in UCLID)
  • Concise proof
Current Status and Future Work
• Use of decision procedure in deductive verification
  • Automate proof of invariants in microarchitecture verification with speculation and memory instructions [CMU-TR]
  • Automate proof of invariants in verification of a directory-based cache coherence protocol with unbounded clients and unbounded channels
• Need ways to generate (some) invariants automatically
  • Pnueli et al.’s invisible invariant method [CAV01]
    • Difficult to handle unbounded data, uninterpreted functions and ordering
  • Detecting convergence of such term-level models
    • Would enable automatic proof of models with finite buffers
Questions
Introduction and Related Work
Microprocessor Verification
• Finite state symbolic model checking
  • Berezin et al.
• Compositional model checking
  • McMillan et al.
• Symbolic simulation + decision procedure based
  • Burch & Dill
  • Bryant & Velev
• Theorem proving techniques
  • Sawada & Hunt
  • Hosabettu et al.
  • Arons & Pnueli
Exploiting Positive Equality
• Decision procedure exploits “positive equality”
  • Bryant, German, Velev, CAV’99
  • Extended in the presence of succ, pred operations
    • Bryant, Lahiri, Seshia, CAV’02
• Positive equality
  • Number of interpretations can be greatly reduced
  • Equations appearing only under an even number of negations are assigned false
    • Except when restricted by functional consistency
  • Terms compared in these equations get distinct interpretations; such terms are called p-terms
    • Identifying p-terms is a pre-processing step
Instruction Set Architecture (ISA)
UCLID description
Modeling Circular Queues
[Figure: circular queue over indices H0 … T0 with head and tail pointers]

next[head] := case
    (operation = POP) : succ’(head) ;
    default : head ;
  esac

next[content] := Lambda i. case
    (operation = PUSH) & (i = tail) : D ;
    default : content(i) ;
  esac

next[tail] := case
    (operation = PUSH) : succ’(tail) ;
    default : tail ;
  esac

succ’ := Lambda (x).
  case
    x = T0 : H0 ;
    default : succ(x) ;
  esac;
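The wrap-around successor and the queue update can be sketched in Python as follows. The bounds H0 = 0 and T0 = 3 (a 4-slot buffer) are hypothetical, and like the UCLID fragment above the sketch has no full or empty check; the circular ROB adds dispatch stalling for that. Names are illustrative:

```python
H0, T0 = 0, 3  # hypothetical index bounds: a 4-slot circular buffer


def succ_wrap(x: int) -> int:
    """succ' := λx. case x = T0 : H0; default : succ(x); esac"""
    return H0 if x == T0 else x + 1


def step(state, operation, d=None):
    """One transition of (head, tail, content), mirroring the UCLID case expressions."""
    head, tail, content = state
    next_head = succ_wrap(head) if operation == "POP" else head
    next_tail = succ_wrap(tail) if operation == "PUSH" else tail

    def next_content(i):
        return d if (operation == "PUSH" and i == tail) else content(i)

    return next_head, next_tail, next_content


def items(state):
    """Walk from head to tail using succ'."""
    head, tail, content = state
    out, i = [], head
    while i != tail:
        out.append(content(i))
        i = succ_wrap(i)
    return out


state = (H0, H0, lambda i: None)  # empty queue, arbitrary initial contents
for op, d in [("PUSH", "a"), ("PUSH", "b"), ("PUSH", "c"), ("POP", None),
              ("POP", None), ("PUSH", "d"), ("PUSH", "e")]:
    state = step(state, op, d)
print(state[:2], items(state))  # (2, 1) ['c', 'd', 'e']
```

After the tail passes T0 it wraps to H0, so "e" reuses the slot that "a" once occupied, which is the tag reuse mentioned for the circular reorder buffer.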
Term-level modeling
• Abstract bit-vectors with integers (terms)
  • Allow a restricted set of operations:
    x = y, x < y, succ(x), pred(x)
• “Black-box” certain combinational blocks
  • Replace them by uninterpreted functions
  • Maintain functional consistency
[Figure: an ALU block replaced by an uninterpreted function f]
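A Python sketch of a black-boxed block, assuming fresh string symbols as results; the class name and the `name!n` symbol format are hypothetical. Equal arguments always yield the same result (functional consistency), while distinct arguments get distinct fresh symbols. That is only one interpretation: a decision procedure must account for every interpretation consistent with functional consistency, and this table-based view is the intuition behind eliminating function applications.

```python
import itertools


class UninterpretedFunction:
    """Term-level stand-in for a black-boxed block such as an ALU.

    Each new argument tuple gets a fresh symbolic result; repeated arguments return
    the same result, which is exactly functional consistency. No other property
    (e.g. commutativity of add) is assumed.
    """

    _fresh = itertools.count()

    def __init__(self, name: str):
        self.name = name
        self.table: dict = {}

    def __call__(self, *args):
        if args not in self.table:
            self.table[args] = f"{self.name}!{next(self._fresh)}"
        return self.table[args]


alu = UninterpretedFunction("alu")
r1 = alu("add", "x", "y")
r2 = alu("add", "x", "y")
r3 = alu("add", "y", "x")
print(r1 == r2, r1 == r3)  # True False
```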
Example : Motorola ELF™ Processor
• Features
  • 32-bit dual issue with 64 GPRs
  • 5-stage pipeline
  • Out-of-order issue, in-order completion of up to 2 instructions
• Load/Store unit
  • 3-cycle load latency
  • Fully pipelined
  • Load queue for loads that miss in the cache
  • Store queue for retiring store instructions
  • Other buffers to hide cache miss latency
• 1000 lines of UCLID model derived from 20K lines of RTL
Bounded Property Checking
• Compare the microarchitecture with a sequential ISA model w.r.t. register file, memory and PC
• ISA model synchronized at completion
[Figure: a sequence of impl steps compared against the ISA state, in two cases: the implementation state when 1 or 2 instructions complete, and when no instruction completes]
Quantifier Instantiation
Prove
  |= (∀x1∀x2…∀xk Φ(x1…xk)) ⇒ ∀y1∀y2…∀ym Ψ(y1…ym)
1. Introduce Skolem constants (y*1, …, y*m):
  |= (∀x1∀x2…∀xk Φ(x1, …, xk)) ⇒ Ψ(y*1, …, y*m)
2. Instantiate x1, …, xk with concrete terms
  • Assume single-arity functions and predicates
  • Let Fx = {f | f(x) is a sub-expression of Φ(x1…xk)}
  • Let Tf = {t | f(t) is a sub-expression of Ψ(y*1…y*m)}
  • For each bound variable x, Ax = {t | f ∈ Fx and t ∈ Tf}
  • Instantiate Φ over Ax1 × Ax2 × … × Axk
    • Formula size grows exponentially with the number of bound variables
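The instantiation heuristic can be sketched in Python over a hypothetical tuple encoding of formulas: strings are variables or constants, a single-arity application f(t) is the pair ("f", t), and connectives are tuples headed by a name in `CONNECTIVES`. The encoding, the example goal Ψ and all function names are assumptions for illustration, not UCLID internals:

```python
from itertools import product

CONNECTIVES = {"and", "or", "not", "implies", "=", "<"}


def subexprs(e):
    yield e
    if isinstance(e, tuple):
        for arg in e[1:]:
            yield from subexprs(arg)


def applications(e):
    """All pairs (f, t) such that f(t) is a sub-expression of e."""
    return {(s[0], s[1]) for s in subexprs(e)
            if isinstance(s, tuple) and s[0] not in CONNECTIVES and len(s) == 2}


def substitute(e, env):
    if isinstance(e, str):
        return env.get(e, e)
    return (e[0],) + tuple(substitute(a, env) for a in e[1:])


def instantiate(phi, bound_vars, psi):
    """Instantiate the ∀-bound variables of phi with terms from the Skolemized goal psi."""
    phi_apps, psi_apps = applications(phi), applications(psi)
    sets = {}
    for x in bound_vars:
        f_x = {f for f, arg in phi_apps if arg == x}            # F_x
        sets[x] = {t for f, t in psi_apps if f in f_x}          # A_x
    combos = product(*(sorted(sets[x], key=repr) for x in bound_vars))  # A_x1 × ... × A_xk
    return sets, [substitute(phi, dict(zip(bound_vars, c))) for c in combos]


# Φ(t) = rob.valid(t) ⇒ rob.value(t) = shdw.value(t)
phi = ("implies", ("valid", "t"), ("=", ("value", "t"), ("shdw_value", "t")))
# A made-up Skolemized goal mentioning the Skolem constant y* and the term tail
psi = ("and", ("valid", "y*"), ("=", ("value", "tail"), "d"))
sets, instances = instantiate(phi, ["t"], psi)
print(sorted(sets["t"]), len(instances))  # ['tail', 'y*'] 2
```

Each instance is quantifier-free, so their conjunction can replace ∀t Φ(t) in the goal and be handed to the CLU decision procedure. If the instances are too weak the proof may fail even though the formula is valid, which is why the method is sound but incomplete.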
Updating Shadow Structures
During the dispatch of a new instruction I = <src1, src2, dest, op>:

next[shdw.value] :=
  λt. (t = rob.tail ? Alu(Rfisa(src1), Rfisa(src2), op) : shdw.value(t));

next[shdw.src1val] :=
  λt. (t = rob.tail ? Rfisa(src1) : shdw.src1val(t));

next[shdw.src2val] :=
  λt. (t = rob.tail ? Rfisa(src2) : shdw.src2val(t));
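The three λ-updates can be rendered as a Python sketch, assuming closures for the shadow fields, a dictionary standing in for the ISA register file Rfisa, and a concrete add/sub ALU. In UCLID, Alu would typically stay uninterpreted. `Instr`, `dispatch_shadow` and the example values are hypothetical:

```python
from typing import Callable, NamedTuple

Field = Callable[[int], int]


class Instr(NamedTuple):
    src1: int
    src2: int
    dest: int
    op: str


def alu(a: int, b: int, op: str) -> int:
    return a + b if op == "add" else a - b  # hypothetical concrete ALU


def dispatch_shadow(shdw_value: Field, shdw_src1val: Field, shdw_src2val: Field,
                    rob_tail: int, rf_isa: Callable[[int], int], ins: Instr):
    """Apply the three λt. (t = rob.tail ? ... : ...) updates for one dispatched instruction."""
    a, b = rf_isa(ins.src1), rf_isa(ins.src2)  # operands read from the ISA register file

    def next_value(t: int) -> int:
        return alu(a, b, ins.op) if t == rob_tail else shdw_value(t)

    def next_src1val(t: int) -> int:
        return a if t == rob_tail else shdw_src1val(t)

    def next_src2val(t: int) -> int:
        return b if t == rob_tail else shdw_src2val(t)

    return next_value, next_src1val, next_src2val


def zero(t: int) -> int:
    return 0  # arbitrary initial shadow contents


rf_isa = {1: 10, 2: 3}.__getitem__
value, src1val, src2val = dispatch_shadow(zero, zero, zero, 5, rf_isa, Instr(1, 2, 0, "sub"))
print(value(5), src1val(5), src2val(5), value(4))  # 7 10 3 0
```

Because the shadow values come from the sequential ISA model rather than from the out-of-order datapath, they record what each ROB entry should eventually hold, which is what the shadow invariants compare against.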
Adding Shadow Structures
[Figure: the OOO processor with shadow fields shdw.value, shdw.src1val and shdw.src2val in each reorder buffer entry, shown next to a sequential ISA model with its own PC, program memory and decode stage]
Refinement Maps
• For register file contents
  ∀r. reg.valid(r) ⇒ reg.val(r) = Rfisa(r)
  • If a register is not being modified by any instruction in the ROB, then its value matches the ISA value
• For program counter
  PCooo = PCisa
Invariants
[Figure: reorder buffer entry fields: valid, value, src1valid, src1val, src1tag, src2valid, src2val, src2tag, dest, op]
Burch-Dill Technique
• More automated than inductive invariant checking
  • Does not require auxiliary structures
  • Far fewer invariants than invariant checking: only 4 invariants, compared to about 12 for the inductive invariant checking approach
• Invariants on the initial state Qooo
  • Instructions only depend on instructions preceding them in program order
  • A tag in the rename unit should be in the ROB, and the destination register should match
  • For any entry, the destination should have reg.valid as false, and its tag should point to this or a later instruction
  • rob.head ≤ rob.tail ≤ rob.head + r
Invariants
Total 13 invariants required
• Refinement map for RF and PC (2)
• Shadow structure constraints (3)
• Tag consistency invariants (2)
  • Instructions only depend on instructions preceding them in program order
• Circular register renaming invariants (2)
  • A tag in the rename unit should be in the ROB, and the destination register should match:
    ∀r. ¬reg.valid(r) ⇒ (rob.head ≤ reg.tag(r) < rob.tail ∧ rob.dest(reg.tag(r)) = r)
  • For any entry, the destination should have reg.valid as false, and its tag should point to this or a later instruction:
    ∀t ∈ rob. (¬reg.valid(rob.dest(t)) ∧ t ≤ reg.tag(rob.dest(t)) < rob.tail)