Deductive Verification of Advanced Out-of-Order Microprocessors

Shuvendu K. Lahiri
Randal E. Bryant
Carnegie Mellon University

OOO Processor Model
  [Figure: block diagram of the out-of-order processor. Blocks: PC Unit (PC, epc), Instruction Mem, Branch Predictor, DECODE (src1, src2, dest, imm, type), Register Rename Unit, Reorder Buffer (head, tail), Result Bus, Arithmetic Unit, Branch Unit, Memory Unit (Mem, lsq, stq). Field labels: valid, value, src1valid, src1val, src1tag, src2valid, src2val, src2tag, dest, type, pc, target, predict.]

Complexity of Out-of-Order Processor Verification
  Unbounded Data
    Integer data paths
  Parameterized Computation
    Uninterpreted functions and predicates
    ALU, ExceptionRaise?, Decoding Logic
  Unbounded Data structures
    Memory
    Ordered Data structures
  Highly concurrent
    Retire, execute, dispatch happen concurrently
  Proving Sequential Semantics
    With respect to an Instruction Set Architecture (ISA)

Related Work
  Deductive Methods
    Theorem prover based
      Hosabettu et al. and Sawada et al.
      Large proof scripts
      Manual intervention to discharge the proofs
      Uses “flushing” technique
  Compositional Model Checking based
    McMillan et al.
    Does not apply to deep or superscalar processors
    Exploits symmetry in the design
    User decomposes the proof
    Does not need auxiliary invariants

Earlier Work
  Lahiri, Seshia and Bryant, FMCAD’02
    Modeling and Verification of Out-of-Order Processors
    Simple out-of-order execution unit
    Only arithmetic instructions
    All proof obligations handled by decision procedure for UCLID

This work
  Apply earlier work to more complex designs
    Handle speculation and exceptions
    Memory instructions, store forwarding etc.
    Superscalar out-of-order processors
  Can we model the new components in UCLID?
    Load-store queues, exceptions
  Is refinement-based deductive verification feasible?
    Earlier deductive methods use Burch-Dill technique
      Recursive “flushing” function
    Arons & Pnueli use “refinement” for simpler models
  Can we retain the automation of proofs?
    Relieve the user from interactively proving theorems

Access Modes for Reorder Buffer
  FIFO
    Insert when dispatch
    Remove when retire
  Content Addressable
    Broadcast result to all entries with matching source tag
  Directly Addressable
    Select particular entry for execution
    Retrieve result value from executed instruction
  Global
    Flush all queue entries when instruction at head causes exception
  [Figure: reorder buffer with head and tail pointers; labels: Dispatch, Retire, execute, ALU, result bus]

CLU : Logic of UCLID
  Terms (T)                      Integer Expressions
    ITE(F, T1, T2)                 If-then-else
    Fun(T1, …, Tk)                 Function application
    succ(T)                        Increment
    pred(T)                        Decrement
  Formulas (F)                   Boolean Expressions
    ¬F, F1 ∧ F2, F1 ∨ F2           Boolean connectives
    T1 = T2                        Equation
    T1 < T2                        Inequality
    P(T1, …, Tk)                   Predicate application
  Functions (Fun)                Integers → Integer
    f                              Uninterpreted function symbol
    λ x1, …, xk . T                Function definition
  Predicates (P)                 Integers → Boolean
    p                              Uninterpreted predicate symbol
    λ x1, …, xk . F                Predicate definition

Modeling Memories with λ’s
  Memory M Modeled as Function
    M(a): Value at location a
  Initially
    Arbitrary state
    Modeled by uninterpreted function m0
  Writing Transforms Memory
    M′ = Write(M, wa, wd)
    M′ = λ a . ITE(a = wa, wd, M(a))
    Future reads of address wa will get wd

Modeling Parallel Updates
  Simultaneous-Update Memories
    Update arbitrary subset of entries at the same step
  Useful for modeling Reorder Buffer
    Forwarding data to all dependent instructions
  [Figure: entries M(i) … M(i+2) and M(j) … M(j+3) before and after one step. P is true at i+1, i+2, j+1 and j+3; those entries become D(i+1), D(i+2), D(j+1) and D(j+3), while M(i), M(j) and M(j+2) are unchanged.]
  next[M] := λ i .
      ITE(P(i), D(i), M(i))
    If entry i satisfies a predicate P(i), it is updated with D(i)

Modeling Unbounded FIFO Buffer
  Queue is Subrange of Infinite Sequence
    Q.head = h
      Index of oldest element
    Q.tail = t
      Index of insertion location
    Q.val = q
      Function mapping indices to values
      q(i) valid only when h ≤ i < t
  Initial State: Arbitrary Queue
    Q.head = h0, Q.tail = t0
      Impose constraint that h0 ≤ t0
    Q.val = q0
      Uninterpreted function
  [Figure: the sequence … q(h–2), q(h–1), q(h), q(h+1), …, q(t–2), q(t–1), q(t), q(t+1), … drawn with increasing indices; head points at q(h), tail at q(t); entries before head are "Already Popped", entries from tail onward are "Not Yet Inserted".]

Modeling FIFO Buffer (cont.)
  next[h] := ITE(operation = POP, succ(h), h)
  next[t] := ITE(operation = PUSH, succ(t), t)
  next[q] := λ (i). ITE((operation = PUSH & i = t), x, q(i))
  [Figure: the queue before and after a step with op = PUSH, Input = x; slot q(t) receives x and the tail moves to next[t].]

Modeling Components of Processors
  Reorder Buffer
    FIFO
      Instructions in Program Order
    Parallel Update memory
      Update from an executed instruction
    Content Addressable
  Load-Store Queue
    FIFO
  Store Queue
    FIFO
    Associative lookup by content
      Find the latest entry containing an address
    Flush part of the queue
      Do not flush retired instructions

Verification Approach
  Extending the approach in FMCAD’02
    Worked with a simple OOO execution unit
    No speculation or memory
  Deductive verification

Deductive Verification
  δ is the state transition relation, Φ describes the initial states
  π is the property to be proved, φ is an inductive invariant, which implies π
  Prove Φ ⇒ φ
  Prove φ ∧ δ ⇒ φ′
  Prove φ ⇒ π
  π is proved

Restricted Invariants and Proofs
  Invariants of the form ∀x1∀x2…∀xk Ψ(x1…xk)
    Ψ(x1…xk) is a CLU formula without quantifiers
    x1…xk are integer variables free in Ψ(x1…xk)
  Proving these invariants requires quantifiers
    |= (∀x1∀x2…∀xk Ψ(x1…xk)) ⇒ ∀y1∀y2…∀ym Φ(y1…ym)
  Automatic instantiation of x1…xk with concrete terms
    Sound but incomplete method
    Reduce the quantified formula to a CLU formula
    Can use the decision procedure for CLU

Proving correctness
  Refinement Maps
    Establish relation between OOO and sequential ISA model
    A refinement map for each ISA visible state element
      Register File
      Program Counter
      Data Memory
  Example
    “If a register is not being modified in OOO, then it should have the same value as in the ISA”

Description of Verification
  [Figure]

Auxiliary Data Structures
  Shadow Fields
    “Predicts” correct value for OOO state elements
    Updated during DISPATCH by ISA machine
  Auxiliary Fields
    Need to define a consistent internal state of OOO
    Does not depend on ISA machine
    Usually additional maps

Adding Shadow State
  McMillan, ’98; Arons & Pnueli, ’99
  Provides Link Between ISA & OOO Models
  Additional entries in ROB
    Do not affect OOO behavior
    Generated when instruction dispatched
    Predict values of operands and result
      From ISA model
  [Figure: ISA model (Reg. File, PC) alongside OOO model (Reg. File, PC, Reorder Buffer)]

Shadow States
  Operands and Result of an instruction
    Correct values
  Shadow Register Rename Unit
    Latest non-speculative instruction to modify a register
  Shadow Memory Address Map
    Latest non-speculative instruction to modify a memory address

Auxiliary Structures
  Restricted Invariant Structure
    ∀x1∀x2…∀xk Ψ(x1…xk)
  Adding Complicated Invariants
    For every non-executed memory instruction I in ROB, there exists an entry in the Load-Store Queue (LSQ)
    Requires Existential (∃) Properties
  Add auxiliary structure as witness for ∃
    Add a map - rob_lsq_ptr : ROB → LSQ
    For every non-executed memory instruction I in ROB, rob_lsq_ptr(I)
      is present in LSQ

Auxiliary Structures
  rob_lsq_ptr : ROB → LSQ
    lsq_rob_ptr : LSQ → ROB already part of the model
  rob_stq_ptr : ROB → STQ, stq_rob_ptr : STQ → ROB
    Need reverse maps
  ld_stq_ptr : ROB → STQ
    For each Load instruction, the STQ entry that would forward data

Incremental Models
  1. Basic Out-of-order execution unit (base)
       1. Reorder Buffer, Register Rename Unit
  2. Exception Handling (exc)
       1. Arithmetic exceptions
  3. Branch Prediction (exc/br)
  4. Memory Instruction – Simple (exc/br/mem-simp)
       1. Stores commit during RETIRE
       2. Illegal Address exceptions
  5. Memory Instruction (exc/br/mem)
       1. Stores commit sometime after RETIRE

Counterexamples Strengthen Invariants
  Use counter-examples to (manually) strengthen the invariants
  Example
    Invariant: ∀t ∈ ROB. ¬reg.valid(rob.dest(t))
    Is the invariant inductive?
      Is it preserved by the transition function?
  Counterexample
    rob.hd = 1, rob.tl = 10
    rob.valid[1] = true
    t = 5
    rob.dest[5] = r10
    reg.tag[r10] = 1
    reg.valid[r10] = false
    operation = retire
  ∀t ∈ ROB.
    t ≤ reg.tag(rob.dest(t))

Misspeculation Invariants
  Predict the instruction that would cause misspeculation
    Result of branch misprediction or exception
  Shadow entry to keep track of this instruction
    shdw_exn_mpred_tag : tag in the ROB
    Gets updated from ISA machine during DISPATCH
    Reset during a “flush” of the OOO state
  Invariants
    Earliest misspeculated instruction
    Instruction at shdw_exn_mpred_tag should raise an exception or be mispredicted
    Others

Ordering Invariants
  Maintain Program Order in different data structures
    Reorder Buffer
    Load-Store Queue
    Store Queue
  Often the source of complicated invariants
    For memory instructions I1, I2
      Instruction I1 precedes I2 in Reorder Buffer iff I1 precedes I2 in Load-Store Queue
    If instruction I1 depends on instruction I2, then I2 precedes I1 in program order

Load-Store Invariants
  Correct Value of a Load (r, A)
    If A present in STQ
      Value from STQ
    If shdw.mem_tag(A) in ROB and A not in STQ
      Value of the store
    Else
      Value from the memory

Shadow Invariants
  Relate Shadow Variables to State Variables
    ∀t ∈ ROB. [rob.valid(t) ⇒ rob.value(t) = shdw.value(t)]
    ∀t ∈ ROB. [rob.src1valid(t) ⇒ rob.src1val(t) = shdw.src1val(t)]
    ∀t ∈ ROB. [rob.src2valid(t) ⇒ rob.src2val(t) = shdw.src2val(t)]

Comparative Verification Effort

                          base     exc      exc/br   exc/br/mem-simp   exc/br/mem
  Total Invariants        13       34       39       67                71
  Manually instantiate    0        0        0        4                 8
  UCLID time              54 s     236 s    403 s    1594 s            2200 s
  Person time             2 days   5 days   2 days   15 days           10 days

  Proof script size substantially smaller
    67 KB as opposed to 1909 KB (Hosabettu et al.)
  Very little user intervention in discharging proofs
    Instantiation of quantifiers
      Mostly automatic, few manual for larger examples

Going Superscalar
  Superscalar
    Dispatch 0…d instructions at each step
    Retire 0…r instructions at each step
  Complex Control Logic
    Additional forwarding in DISPATCH window
    Additional forwarding in RETIRE window
  Extended the base model

Statistics for Superscalar Models

  Width (Dispatch)   Width (Retire)   #-instant   Time (sec)
  2                  1                12          86.63
  2                  2                28          137.43
  2                  4                88          308.55
  2                  8                304         1040.60

  Does not require any change to proof script
    Complicates control logic but the invariants still hold
  Scales well with increasing width
    Almost linear with the (Dispatch*Retire) width
    Instantiation considers terms in (Dispatch + Retire) window

Conclusion
  Case study of complex processors in UCLID
    CLU expressive enough to model advanced features
  Reasonable automation in discharging proofs
    Use of automatic decision procedures
    Quantification strategy robust
  Need to generate invariants
    Using Predicate Abstraction
    Automatically constructed invariant for OOO-base model given the predicates
  Improve desirability for deductive methods

Modeling Circular Queues
  [Figure: circular buffer with bounds H0 and T0 and pointers head and tail]

  next[head] := case
      (operation = POP) : succ’(head) ;
      default : head ;
    esac

  next[tail] := case
      (operation = PUSH) : succ’(tail) ;
      default : tail ;
    esac

  next[content] := Lambda i. case
      (operation = PUSH) & (i = tail) : D ;
      default : content(i) ;
    esac

  succ’ := Lambda x.
    case x = T0 : H0 ; default : succ(x); esac;

Store Queue
  Content Addressable
    Look for an address
    Same address at multiple indices
      Latest index that matches address
  Partial Flush
    Remove entries after an index
  [Figure: store queue with Address and Data columns holding entries A(i), d(i); head h, tail t and an index r are marked, with regions labeled "retired" and "speculative" and one entry labeled "Latest Match". A second copy of the figure marks three entries A1, A2 and A3.]

Quantifier Instantiation
  Prove |= (∀x1∀x2…∀xk Ψ(x1…xk)) ⇒ ∀y1∀y2…∀ym Φ(y1…ym)
  1. Introduce Skolem Constants (y*1,…,y*m)
       |= (∀x1∀x2…∀xk Ψ(x1,…,xk)) ⇒ Φ(y*1,…,y*m)
  2. Instantiate x1,…,xk with concrete terms
       Assume single-arity functions and predicates
       Let Fx = {f | f(x) is a sub-expression of Ψ(x1…xk)}
       Let Tf = {t | f(t) is a sub-expression of Φ(y*1…y*m)}
       For each bound variable x, Ax = {t | f ∈ Fx and t ∈ Tf}
       Instantiate over Ax1 × Ax2 × … × Axk
  Formula size grows exponentially with the number of bound variables
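
A minimal Python sketch of the instantiation step on the Quantifier Instantiation slide, for illustration only. It is not UCLID's implementation. The nested-tuple encoding of formulas, the helper names (subterms, unary_applications, instantiation_sets, instantiations), the LOGICAL set, and the example goal are all assumptions made here. It collects Fx from the invariant body, Tf from the Skolemized goal, builds Ax for each bound variable, and enumerates the product of the Ax sets.

```python
from itertools import product

# Assumed encoding (not from the slides): a term or formula is either a string
# (variable or constant) or a tuple (symbol, arg1, ..., argN).
LOGICAL = {"not", "and", "or", "=>", "=", "<"}  # connectives and relations


def subterms(expr):
    """Yield expr and every sub-expression nested inside it."""
    yield expr
    if isinstance(expr, tuple):
        for arg in expr[1:]:
            yield from subterms(arg)


def unary_applications(expr):
    """Yield (f, t) for every single-argument application f(t) inside expr."""
    for e in subterms(expr):
        if isinstance(e, tuple) and len(e) == 2 and e[0] not in LOGICAL:
            yield e[0], e[1]


def instantiation_sets(bound_vars, body, goal):
    """Compute A_x for each bound variable x.

    F_x = {f | f(x) occurs in body}
    T_f = {t | f(t) occurs in goal}
    A_x = {t | f in F_x and t in T_f}
    """
    goal_apps = list(unary_applications(goal))
    sets = {}
    for x in bound_vars:
        f_x = {f for f, arg in unary_applications(body) if arg == x}
        sets[x] = {t for f, t in goal_apps if f in f_x}
    return sets


def instantiations(bound_vars, body, goal):
    """Enumerate A_x1 x ... x A_xk as a list of substitutions."""
    sets = instantiation_sets(bound_vars, body, goal)
    ordered = [sorted(sets[x], key=repr) for x in bound_vars]
    return [dict(zip(bound_vars, combo)) for combo in product(*ordered)]


# Body of a shadow invariant: rob.valid(t) => rob.value(t) = shdw.value(t)
body = ("=>", ("rob.valid", "t"),
        ("=", ("rob.value", "t"), ("shdw.value", "t")))

# Hypothetical Skolemized goal over the constant y1 (made up for this sketch)
goal = ("=>", ("rob.valid", ("succ", "y1")),
        ("=", ("rob.value", "y1"), ("shdw.value", "y1")))

for substitution in instantiations(["t"], body, goal):
    print(substitution)
```

The number of substitutions is the product of the sizes of the Ax sets, which is the exponential growth in the number of bound variables noted on the slide.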