Deductive Verification of Advanced Out-of-Order Microprocessors

Shuvendu K. Lahiri
Randal E. Bryant
Carnegie Mellon University
OOO Processor Model
[Figure: block diagram of the OOO processor. Components: PC Unit (PC, epc), Instruction Mem, Decode, Branch Predictor, Register Rename Unit, Reorder Buffer (head, tail), Arithmetic Unit, Branch Unit, Memory Unit (Mem, lsq, stq), Result Bus. Labelled fields: src1, src2, dest, imm, type; valid, value; src1valid, src1val, src1tag, src2valid, src2val, src2tag, dest, type, pc, target, predict.]
Complexity of Out-of-Order Processor Verification
  Unbounded Data
    Integer data paths
  Parameterized Computation
    Uninterpreted functions and predicates
    ALU, ExceptionRaise?, Decoding Logic
  Unbounded Data structures
    Memory
    Ordered Data structures
  Highly concurrent
    Retire, execute, dispatch happen concurrently
  Proving Sequential Semantics
    With respect to an Instruction Set Architecture (ISA)
Related Work
  Deductive Methods
    Theorem prover based
    Hosabettu et al. and Sawada et al.
    Large proof scripts
    Manual intervention to discharge the proofs
    Uses “flushing” technique
  Compositional Model Checking based
    McMillan et al.
    Does not apply to deep or superscalar processors
    Exploits symmetry in the design
    User decomposes the proof
    Does not need auxiliary invariants
Earlier Work
  Lahiri, Seshia and Bryant, FMCAD’02
  Modeling and Verification of Out-of-Order Processors
    Simple out-of-order execution unit
    Only arithmetic instructions
    All proof obligations handled by the decision procedure for UCLID
This work
  Apply earlier work to more complex designs
    Handle speculation and exceptions
    Memory instructions, store forwarding etc.
    Superscalar out-of-order processors
  Can we model the new components in UCLID?
    Load store queues, exceptions
  Is refinement-based deductive verification feasible?
    Earlier deductive methods use the Burch-Dill technique
    Recursive “flushing” function
    Arons & Pnueli use “refinement” for simpler models
  Can we retain the automation of proofs?
    Relieve the user from interactively proving theorems
Access Modes for Reorder Buffer
  FIFO
    Insert when dispatch
    Remove when retire
  Content Addressable
    Broadcast result to all entries with matching source tag
  Directly Addressable
    Select particular entry for execution
    Retrieve result value from executed instruction
  Global
    Flush all queue entries when instruction at head causes exception
[Figure: reorder buffer between head and tail, with Dispatch, Retire, execute (ALU) and result bus accesses.]
CLU : Logic of UCLID
  Terms (T)                   Integer Expressions
    ITE(F, T1, T2)            If-then-else
    Fun(T1, …, Tk)            Function application
    succ(T)                   Increment
    pred(T)                   Decrement
  Formulas (F)                Boolean Expressions
    ¬F, F1 ∧ F2, F1 ∨ F2      Boolean connectives
    T1 = T2                   Equation
    T1 < T2                   Inequality
    P(T1, …, Tk)              Predicate application
  Functions (Fun)             Integers → Integer
    f                         Uninterpreted function symbol
    λ x1, …, xk . T           Function definition
  Predicates (P)              Integers → Boolean
    p                         Uninterpreted predicate symbol
    λ x1, …, xk . F           Predicate definition
Modeling Memories with λ’s
  Memory M Modeled as Function
    M(a): Value at location a
  Initially
    Arbitrary state
    Modeled by uninterpreted function m0
  Writing Transforms Memory
    M′ = Write(M, wa, wd)
      λ a . ITE(a = wa, wd, M(a))
    Future reads of address wa will get wd
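The memory-as-function idea can be sketched in Python, where a write returns a new function instead of mutating a table. This is only an illustration, not UCLID code: the body of `m0` is an arbitrary stand-in for the uninterpreted initial-state function, and the names `write`, `M1` and `M2` are hypothetical.

```python
from typing import Callable

Memory = Callable[[int], int]

def m0(a: int) -> int:
    # Stand-in for the uninterpreted function m0: some fixed but arbitrary contents.
    return (a * 2654435761) % 97

def write(M: Memory, wa: int, wd: int) -> Memory:
    # M' = lambda a . ITE(a = wa, wd, M(a))
    return lambda a: wd if a == wa else M(a)

M1 = write(m0, 5, 42)   # write 42 at address 5
M2 = write(M1, 7, 13)   # then write 13 at address 7

print(M2(5), M2(7), M2(6) == m0(6))
```

Reads of a written address return the written data, and every other address falls through to the earlier memory, down to the arbitrary initial state.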
Modeling Parallel Updates
  Simultaneous-Update Memories
    Update arbitrary subset of entries at the same step
    Useful for modeling Reorder Buffer
      Forwarding data to all dependent instructions
  next[M] := λ i. ITE(P(i), D(i), M(i))
    If entry i satisfies a predicate P(i), it is updated with D(i)
[Figure: memory before the update, entries …, M(i), M(i+1), M(i+2), …, M(j), M(j+1), M(j+2), M(j+3), …; P(i+1), P(i+2), P(j+1) and P(j+3) are true.]
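A simultaneous update can be sketched the same way: one new function replaces every entry that satisfies the predicate in a single step. The concrete `M`, `P` and `D` below are made-up examples chosen only to make the effect visible.

```python
def parallel_update(M, P, D):
    # next[M] := lambda i . ITE(P(i), D(i), M(i))
    return lambda i: D(i) if P(i) else M(i)

# Hypothetical contents: entry i holds 10*i, odd entries get overwritten with -i.
M = lambda i: 10 * i
P = lambda i: i % 2 == 1
D = lambda i: -i

next_M = parallel_update(M, P, D)
print([next_M(i) for i in range(6)])
```

No loop over the entries is needed, which is what lets the model cover an unbounded number of reorder buffer entries.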
Modeling Parallel Updates
[Figure: memory after the update: …, M(i), D(i+1), D(i+2), …, M(j), D(j+1), M(j+2), D(j+3), …; the entries where P is true now hold D.]
Modeling Unbounded FIFO Buffer
  Queue is Subrange of Infinite Sequence
    Q.head = h
      Index of oldest element
    Q.tail = t
      Index of insertion location
    Q.val = q
      Function mapping indices to values
      q(i) valid only when h ≤ i < t
  Initial State: Arbitrary Queue
    Q.head = h0, Q.tail = t0
      Impose constraint that h0 ≤ t0
    Q.val = q0
      Uninterpreted function
[Figure: infinite sequence in increasing index order: …, q(h–2), q(h–1) (already popped); q(h) at head, q(h+1), …, q(t–2), q(t–1); tail at q(t), followed by q(t+1), … (not yet inserted).]
Modeling FIFO Buffer (cont.)
  next[h] := ITE(operation = POP, succ(h), h)
  next[q] := λ (i). ITE((operation = PUSH & i = t), x, q(i))
  next[t] := ITE(operation = PUSH, succ(t), t)
[Figure: queue before and after a step with op = PUSH and Input = x: entries q(h–2) … q(t–1) are unchanged, x is written at index t, next[t] moves past it and next[h] stays at h.]
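The three next-state equations for the unbounded FIFO can be tried out with a small Python sketch. The class name `Queue`, the helper `contents` and the string operation codes are assumptions made for this illustration; like the equations on the slide, `step` does not guard against popping an empty queue.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Queue:
    head: int                  # index of the oldest element
    tail: int                  # index of the insertion location
    val: Callable[[int], int]  # function mapping indices to values

def step(Q: Queue, operation: str, x=None) -> Queue:
    h, t, q = Q.head, Q.tail, Q.val
    next_h = h + 1 if operation == "POP" else h
    next_t = t + 1 if operation == "PUSH" else t
    # next[q] := lambda i . ITE(operation = PUSH & i = t, x, q(i))
    next_q = lambda i: x if (operation == "PUSH" and i == t) else q(i)
    return Queue(next_h, next_t, next_q)

def contents(Q: Queue):
    # Only indices i with head <= i < tail hold valid entries.
    return [Q.val(i) for i in range(Q.head, Q.tail)]

Q = Queue(3, 3, lambda i: 0)   # arbitrary empty queue with h0 = t0 = 3
Q = step(Q, "PUSH", 7)
Q = step(Q, "PUSH", 8)
Q = step(Q, "POP")
print(Q.head, Q.tail, contents(Q))
```

A pop only advances the head; the popped value stays in the underlying function but falls outside the valid range.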
Modeling Components of Processors
  Reorder Buffer
    FIFO
      Instructions in Program Order
    Parallel Update memory
      Update from an executed instruction
    Content Addressable
  Load-Store Queue
    FIFO
  Store Queue
    FIFO
    Associative lookup by content
      Find the latest entry containing an address
    Flush part of the queue
      Do not flush retired instructions
Verification Approach
  Extending the approach in FMCAD’02
    Worked with a simple OOO execution unit
    No speculation or memory
  Deductive verification
Deductive Verification
  δ is the state transition relation
  Φ describes the initial states
  π is the property to be proved
  φ is an inductive invariant, which implies π
  Prove Φ ⇒ φ
  Prove φ ∧ δ ⇒ φ′
  Prove φ ⇒ π
  π is proved
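The three proof obligations can be illustrated on a toy finite-state system by brute-force enumeration. UCLID discharges them symbolically with a decision procedure, so this Python sketch only shows what is being checked; the counter system, the property and the invariant are all made up for the example.

```python
STATES = range(8)

def init(s):        # initial states
    return s == 0

def delta(s, s2):   # transition relation: add 2 modulo 8
    return s2 == (s + 2) % 8

def prop(s):        # property to be proved
    return s != 3

def inv(s):         # candidate inductive invariant
    return s % 2 == 0

def check(init, delta, inv, prop, states):
    base = all(inv(s) for s in states if init(s))            # initial states satisfy inv
    step = all(inv(s2) for s in states if inv(s)
               for s2 in states if delta(s, s2))             # inv is preserved by a step
    implies = all(prop(s) for s in states if inv(s))         # inv implies the property
    return base, step, implies

print(check(init, delta, inv, prop, STATES))
print(check(init, delta, prop, prop, STATES))
```

The second call uses the property itself as the invariant and fails the step obligation (state 1 satisfies it but steps to 3), which is why a stronger invariant is needed.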
Restricted Invariants and Proofs
  Invariants of the form ∀x1∀x2…∀xk Ψ(x1…xk)
    Ψ(x1…xk) is a CLU formula without quantifiers
    x1…xk are integer variables free in Ψ(x1…xk)
  Proving these invariants requires quantifiers
    |= (∀x1∀x2…∀xk Ψ(x1…xk)) ⇒ ∀y1∀y2…∀ym Φ(y1…ym)
  Automatic instantiation of x1…xk with concrete terms
    Sound but incomplete method
    Reduce the quantified formula to a CLU formula
    Can use the decision procedure for CLU
Proving correctness
  Refinement Maps
    Establish relation between OOO and sequential ISA model
    A refinement map for each ISA visible state element
      Register File
      Program Counter
      Data Memory
  Example
    “If a register is not being modified in OOO, then it should have the same value as in the ISA”
Description of Verification
Auxiliary Data Structures
  Shadow Fields
    “Predicts” correct value for OOO state elements
    Updated during DISPATCH by ISA machine
  Auxiliary Fields
    Need to define a consistent internal state of OOO
    Does not depend on ISA machine
    Usually additional maps
Adding Shadow State
  McMillan, ‘98
  Arons & Pnueli, ‘99
  Provides Link Between ISA & OOO Models
    Additional entries in ROB
      Do not affect OOO behavior
    Generated when instruction dispatched
    Predict values of operands and result
      From ISA model
[Figure: ISA model (Reg. File, PC) alongside the OOO model (Reg. File, PC, Reorder Buffer).]
Shadow States
  Operands and Result of an instruction
    Correct values
  Shadow Register Rename Unit
    Latest non-speculative instruction to modify a register
  Shadow Memory Address Map
    Latest non-speculative instruction to modify a memory address
Auxiliary Structures
  Restricted Invariant Structure
    ∀x1∀x2…∀xk Ψ(x1…xk)
  Adding complicated Invariants
    For every non-executed memory instruction I in ROB, there exists an entry in the Load-Store Queue (LSQ)
    Requires Existential (∃) Properties
  Add auxiliary structure as witness for ∃
    Add a map - rob_lsq_ptr : ROB → LSQ
    For every non-executed memory instruction I in ROB, rob_lsq_ptr (I) is present in LSQ
Auxiliary Structures
  rob_lsq_ptr : ROB → LSQ
    lsq_rob_ptr : LSQ → ROB already part of the model
  rob_stq_ptr : ROB → STQ, stq_rob_ptr : STQ → ROB
    Need reverse maps
  ld_stq_ptr : ROB → STQ
    For each Load instruction, the STQ entry that would forward data
Incremental Models
  1. Basic Out-of-order execution unit (base)
     Reorder Buffer, Register Rename Unit
  2. Exception Handling (exc)
     Arithmetic exceptions
  3. Branch Prediction (exc/br)
  4. Memory Instruction – Simple (exc/br/mem-simp)
     Stores commit during RETIRE
     Illegal Address exceptions
  5. Memory Instruction (exc/br/mem)
     Stores commit sometime after RETIRE
Counterexamples
  Strengthen Invariants
    Use counter-examples to (manually) strengthen the invariants
  Example
    Invariant : ∀t ∈ ROB. ¬reg.valid(rob.dest(t))
  Is the invariant inductive?
    Is it preserved by the transition function?
  Counterexample
    rob.hd = 1, rob.tl = 10
    rob.valid[1] = true
    t = 5
    rob.dest[5] = r10
    reg.tag[r10] = 1
    reg.valid[r10] = false
    operation = retire
  Strengthened invariant: ∀t ∈ ROB. t ≤ reg.tag(rob.dest(t))
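The counterexample can be replayed with a small Python sketch. It reads the original invariant as "the destination register of every ROB entry is not valid" and the strengthened one as "t ≤ reg.tag(rob.dest(t))". The `retire` function is a guessed simplification of the retire step, and `rob_dest[1] = "r10"` is an assumption added so that retiring the head actually touches r10; neither comes from the UCLID model.

```python
def inv(s, t):
    # Original candidate: the destination register of ROB entry t is not valid.
    return not s["reg_valid"][s["rob_dest"][t]]

def strengthened(s, t):
    # Added conjunct: the tag recorded for the register is not older than t.
    return t <= s["reg_tag"][s["rob_dest"][t]]

def retire(s):
    # Simplified retire (assumption): the head entry writes back, and its
    # destination register becomes valid if the register still waits on that tag.
    hd = s["hd"]
    d = s["rob_dest"][hd]
    reg_valid = dict(s["reg_valid"])
    if s["rob_valid"][hd] and s["reg_tag"][d] == hd:
        reg_valid[d] = True
    return {**s, "hd": hd + 1, "reg_valid": reg_valid}

pre = {
    "hd": 1, "tl": 10,
    "rob_valid": {1: True},
    "rob_dest": {1: "r10", 5: "r10"},   # rob_dest[1] is an assumption
    "reg_tag": {"r10": 1},
    "reg_valid": {"r10": False},
}
post = retire(pre)
t = 5
print(inv(pre, t), inv(post, t), strengthened(pre, t))
```

The candidate holds before the step and fails after it, so it is not inductive on its own; the starting state violates the strengthened invariant, so adding that conjunct rules the counterexample out.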
Misspeculation Invariants
  Predict the instruction that would cause misspeculation
    Result of branch misprediction or exception
  Shadow entry to keep track of this instruction
    shdw_exn_mpred_tag : tag in the ROB
    Gets updated from ISA machine during DISPATCH
    Reset during a “flush” of the OOO state
  Invariants
    Earliest misspeculated instruction
    Instruction at shdw_exn_mpred_tag should raise an exception or be mispredicted
    Others
Ordering Invariants
  Maintain Program Order in different data structures
    Reorder Buffer
    Load Store Queue
    Store Queue
  Often the source of complicated invariants
    For memory instructions I1, I2
      Instruction I1 precedes I2 in Reorder Buffer iff I1 precedes I2 in Load-Store Queue
    If instruction I1 depends on instruction I2, then I2 precedes I1 in program order
Load-Store Invariants
  Correct Value of a Load (r, A)
    If A present in STQ
      Value from STQ
    If shdw.mem_tag(A) in ROB and A not in STQ
      Value of the store
    Else
      Value from the memory
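The case split for the correct value of a load can be written as a small function. The data layout here is purely illustrative: `stq` is a list of (address, data) pairs ordered oldest to newest, `shdw_mem_tag` maps an address to the ROB tag of the latest store to it, `rob_store_val` maps the tags still in the ROB to their store data, and `mem` is a plain dictionary. None of these names or encodings are taken from the UCLID model.

```python
def correct_load_value(A, stq, shdw_mem_tag, rob_store_val, mem):
    # Case 1: A present in STQ, take the value of the latest matching entry.
    for addr, data in reversed(stq):
        if addr == A:
            return data
    # Case 2: the latest store to A is still in the ROB and A is not in STQ.
    tag = shdw_mem_tag.get(A)
    if tag is not None and tag in rob_store_val:
        return rob_store_val[tag]
    # Case 3: otherwise the value comes from memory.
    return mem[A]

stq = [(100, 1), (200, 2), (100, 3)]
shdw_mem_tag = {300: 7, 500: 2}
rob_store_val = {7: 55}            # tag 2 has already left the ROB
mem = {100: 0, 300: 0, 400: 9, 500: 4}

for A in (100, 300, 400, 500):
    print(A, correct_load_value(A, stq, shdw_mem_tag, rob_store_val, mem))
```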
Shadow Invariants
  Relate Shadow Variables to State Variables
    ∀t ∈ ROB. [rob.valid(t) ⇒ rob.value(t) = shdw.value(t)]
    ∀t ∈ ROB. [rob.src1valid(t) ⇒ rob.src1val(t) = shdw.src1val(t)]
    ∀t ∈ ROB. [rob.src2valid(t) ⇒ rob.src2val(t) = shdw.src2val(t)]
Comparative Verification Effort
                        base     exc      exc / br   exc / br / mem-simp   exc / br / mem
  Total Invariants      13       34       39         67                    71
  Manually instantiate  0        0        0          4                     8
  UCLID time            54 s     236 s    403 s      1594 s                2200 s
  Person time           2 days   5 days   2 days     15 days               10 days
  Proof script size substantially smaller
    67 KB as opposed to 1909 KB (Hosabettu et al.)
    Very little user intervention in discharging proofs
  Instantiation of quantifiers
    Mostly automatic, few manual for larger examples
Going Superscalar
  Superscalar
    Dispatch 0…d instructions at each step
    Retire 0…r instructions at each step
  Complex Control Logic
    Additional forwarding in DISPATCH window
    Additional forwarding in RETIRE window
  Extended the base model
Statistics for Superscalar Models
  Width (Dispatch)   Width (Retire)   #-instant   Time (sec)
  2                  1                12          86.63
  2                  2                28          137.43
  2                  4                88          308.55
  2                  8                304         1040.60
  Does not require any change to proof script
    Complicates control logic but the invariants still hold
  Scales well with increasing width
    Almost linear with the (Dispatch*Retire) width
    Instantiation considers terms in (Dispatch + Retire) window
Conclusion
  Case study of complex processors in UCLID
    CLU expressive enough to model advanced features
  Reasonable automation in discharging proofs
    Use of automatic decision procedures
    Quantification strategy robust
  Need to generate invariants
    Using Predicate Abstraction
    Automatically constructed invariant for OOO-base model given the predicates
  Improve desirability for deductive methods
Modeling Circular Queues
[Figure: circular buffer with bounds H0 and T0 and the head and tail pointers.]
next[head] := case
    (operation = POP) : succ’(head) ;
    default : head ;
esac
next[tail] := case
    (operation = PUSH) : succ’(tail) ;
    default : tail ;
esac
next[content] := Lambda i. case
    (operation = PUSH) & (i = tail) : D ;
    default : content(i) ;
esac
succ’ := Lambda x. case
    x = T0 : H0 ;
    default : succ(x) ;
esac;
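The wrap-around successor is the only change from the unbounded queue, which a short Python sketch makes visible. The bounds `H0 = 0` and `T0 = 3` and the function names are hypothetical values picked for the example.

```python
H0, T0 = 0, 3   # hypothetical lowest and highest index of the circular buffer

def succ_wrap(x):
    # succ' := Lambda x. case x = T0 : H0 ; default : succ(x) ; esac
    return H0 if x == T0 else x + 1

def step(head, tail, content, operation, D=None):
    next_head = succ_wrap(head) if operation == "POP" else head
    next_tail = succ_wrap(tail) if operation == "PUSH" else tail
    next_content = lambda i: D if (operation == "PUSH" and i == tail) else content(i)
    return next_head, next_tail, next_content

head, tail, content = 2, 3, (lambda i: None)
head, tail, content = step(head, tail, content, "PUSH", "a")   # tail wraps from T0 to H0
head, tail, content = step(head, tail, content, "PUSH", "b")
head, tail, content = step(head, tail, content, "POP")
print(head, tail, content(3), content(0))
```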
Store Queue
  Content Addressable
    Look for an address
    Same address at multiple indices
      Latest index that matches address
  Partial Flush
    Remove entries after an index
[Figure: store queue with Address and Data columns, entries A(h–2)/d(h–2) … A(t+1)/d(t+1) in increasing index order, with markers h, r and t, regions labelled retired and speculative, and the latest matching entry marked “Latest Match”.]
Store Queue
[Figure: store queue with Address and Data columns, entries A(h–2)/d(h–2) … A(t+1)/d(t+1), markers h, r and t, regions labelled retired and speculative, and example addresses A1, A2 and A3 marked.]
Quantifier Instantiation
  Prove
    |= (∀x1∀x2…∀xk Ψ(x1…xk)) ⇒ ∀y1∀y2…∀ym Φ(y1…ym)
  1. Introduce Skolem Constants (y*1,…,y*m)
    |= (∀x1∀x2…∀xk Ψ(x1,…,xk)) ⇒ Φ(y*1,…,y*m)
  2. Instantiate x1,…,xk with concrete terms
    Assume single-arity functions and predicates
    Let Fx = {f | f(x) is a sub-expression of Ψ(x1…xk)}
    Let Tf = {t | f(t) is a sub-expression of Φ(y*1…y*m)}
    For each bound variable x, Ax = {t | f ∈ Fx and t ∈ Tf}
    Instantiate Ψ over Ax1 × Ax2 × … × Axk
      Formula size grows exponentially with the number of bound variables
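The construction of the instantiation sets can be sketched in Python over symbolic term names. The dictionaries `F` and `T` are supplied by hand here; a real implementation would collect them by walking the sub-expressions of the two formulas. The function symbols and terms in the example are invented for illustration.

```python
from itertools import product

def instantiation_sets(F, T):
    # A_x = { t | f in F_x and t in T_f }
    return {x: sorted({t for f in fs for t in T.get(f, ())}) for x, fs in F.items()}

def instantiations(A, bound_vars):
    # One substitution per element of A_x1 x A_x2 x ... x A_xk.
    return [dict(zip(bound_vars, combo))
            for combo in product(*(A[x] for x in bound_vars))]

# F_x: function symbols applied to each bound variable in the antecedent.
F = {"x1": {"rob.valid", "rob.value"}, "x2": {"reg.tag"}}
# T_f: terms each function symbol is applied to in the Skolemized consequent.
T = {"rob.valid": {"t*"}, "rob.value": {"t*", "succ(t*)"}, "reg.tag": {"r*"}}

A = instantiation_sets(F, T)
subs = instantiations(A, ["x1", "x2"])
print(A)
print(len(subs))
```

The number of substitutions is the product of the set sizes, which is the exponential growth in the number of bound variables noted on the slide.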