Modeling and Verification of Out-of-Order Microprocessors in UCLID
Shuvendu K. Lahiri, Sanjit A. Seshia, Randal E. Bryant
Carnegie Mellon University, USA

Processor Verification
  Views of System Operation
    Instruction Set Architecture
      Transition: one instruction execution
      Instructions executed in sequential order
      Each instruction modifies “programmer-visible” state
    Microarchitecture
      Transition: one clock cycle
      At any given time, multiple instructions are “in flight”
      State held in hidden pipeline registers and buffers
  Verification Task
    Prove that all instruction sequences execute as predicted by the instruction set model

FMCAD’02

Introduction and Related Work
  In-order Pipeline Verification
    Burch and Dill, CAV ’94
      Relates implementation and specification by completing partially executed instructions in the pipeline (flushing)
      Infinite data words, memories
      Bounded (fixed) resources only
      Cannot model a reorder buffer (ROB) of arbitrary length
  Out-of-Order Processor Verification
    Arbitrarily large (64-128) reorder buffer, reservation stations and load-store queues
    Very large number of instructions in the pipeline
    No finite flushing function to drain the pipeline

Out-of-Order Processor Verification
  Theorem Proving approaches
    Hosabettu et al. (’00), Sawada et al. (’98), Arons et al. (’00)
    Write inductive invariants
    Manually guide the theorem provers to prove the invariants
    Large, complicated (fragile) proof scripts
    Seldom have good counterexample facilities
  Compositional Model Checking [McMillan et al.]
    Use compositional model checking with temporal case splitting, path splitting, symmetry and data-type reduction
    Does not require writing inductive invariants
    User needs to manually decompose the proof
    Has not been demonstrated to be effective for deep, superscalar pipelines
  Other Approaches
    Finite State Model Checking [Berezin et al.], Incremental Flushing [Skakkebæk et al.], Decision Procedure [Velev]

Contributions
  Extends the work by Bryant & Velev
    That work was restricted to in-order pipelines with bounded resources
  Application of the UCLID modeling framework to out-of-order processors
  Application of three verification approaches to an out-of-order processor
  Effective use of an automated decision procedure
    For proving large formulas automatically
    Simple heuristics for quantifier instantiation

CLU: Logic of UCLID
  Terms (T): integer expressions
    ITE(F, T1, T2)          if-then-else
    Fun(T1, …, Tk)          function application
    succ(T)                 increment
    pred(T)                 decrement
  Formulas (F): Boolean expressions
    ¬F, F1 ∧ F2, F1 ∨ F2    Boolean connectives
    T1 = T2                 equation
    T1 < T2                 inequality
    P(T1, …, Tk)            predicate application
  Functions (Fun): Integers → Integer
    f                       uninterpreted function symbol
    λ x1, …, xk . T         function definition
  Predicates (P): Integers → Boolean
    p                       uninterpreted predicate symbol
    λ x1, …, xk . F         predicate definition

Decision Procedure
  Operation
    Series of transformations leading to a propositional formula
    Propositional formula checked with BDD or SAT tools
    Bryant, Lahiri, Seshia [CAV02]
  [Figure: transformation flow. CLU Formula → Lambda Expansion → λ-free Formula → Function & Predicate Elimination → Function-free Formula → Convert to Boolean Formula → Boolean Formula → Boolean Satisfiability]

Modeling Memories with λ’s
  Memory M modeled as a function
    M(a): value at location a
  Initially
    Arbitrary state
    Modeled by uninterpreted function m0
  Writing transforms memory
    next[M] = Write(M, wa, wd)
    λ a . ITE(a = wa, wd, M(a))
    Future reads of address wa will get wd

Modeling Unbounded FIFO Buffer
  Queue is a subrange of an infinite sequence
    h : INT            head of the queue
    t : INT            tail of the queue
    q : INT → INT      function mapping indices to values
    q(i) valid only when h ≤ i < t
  [Figure: infinite sequence … q(h–2), q(h–1), q(h), q(h+1), …, q(t–2), q(t–1), q(t), q(t+1), … with head at q(h) and tail at q(t)]

Modeling FIFO Buffer (cont.)
  Example: op = PUSH, input = x
  next[h] := case (operation = POP) : succ(h) ; default : h ; esac
  next[t] := case (operation = PUSH) : succ(t) ; default : t ; esac
  next[q] := lambda (i). case (operation = PUSH) & (i = t) : x ; default : q(i) ; esac
  [Figure: the queue before and after the push; x is stored at index t and next[t] moves one position past it]

Modeling Parallel Updates
  Update an arbitrary subset of entries in the same step
    next[M] := λ i. ITE(P(i), D(i), M(i))
    Any entry i that satisfies the predicate P(i) is updated with D(i)
  Useful for modeling
    Reorder buffers
    Forwarding data to all dependent instructions
  [Figure: simultaneous-update memories; the entries where P holds (i+1, i+2, j+1, j+3) are replaced by D(i+1), D(i+2), D(j+1), D(j+3), all other entries keep M]

UCLID description
  [Figure: tool components. Term-level Symbolic Simulator; Bounded Property Checking, Correspondence Checking, Inductive Invariant Checking; Decision Procedure (BDD, SAT); Counterexample Generator]
  Systems are modeled in the CLU logic
  Three verification techniques
    Based on symbolic simulation
    Use the decision procedure
  Counterexample traces are generated for verification failures

Verification Techniques in UCLID
  Bounded Property Checking
    Start in the reset state
    Symbolically simulate for a fixed number of steps
    Verify a safety property for all states reachable within that number of steps from the start state
  Correspondence Checking
    Run 2 different simulations starting in the most general state
    Prove that the final states are equivalent, e.g.
    Burch-Dill Technique
  Invariant Checking
    Start in a general state s
    Prove Inv(s) ⇒ Inv(next[s])
    Limited support for automatic quantifier instantiation

An Out-of-order Processor (OOO)
  [Figure: OOO block diagram. Program memory, PC (incr), decode, register rename unit (valid, tag, val), reorder buffer (head, tail), ALU and result bus; operations dispatch, execute, retire]
  Reorder buffer fields
    Result: valid, value
    1st operand: src1valid, src1val, src1tag
    2nd operand: src2valid, src2val, src2tag
    dest, op
  Out-of-order execution engine
  Register renaming
  In-order retirement
  Unbounded reorder buffer
  Arithmetic instructions only
  Model the different components in UCLID

Verification of OOO: Automation vs. Guarantee
  Method                         Resources   Verification (# of steps)   Auxiliary variables   Invariants
  Bounded Property Checking      Unbounded   Bounded                     None                  None
  Burch-Dill Technique           Fixed       Unbounded                   None                  Very few
  Inductive Invariant Checking   Unbounded   Unbounded                   Significant           Significant, including those for auxiliary variables
  Presence of a decision procedure
    Efficiency: allows improved bounded property checking and Burch-Dill method
    Automation: reduces manual guidance in proving invariants
      Automatic instantiation of quantifiers

Technique 1: Bounded Property Checking
  Debugging OOO using Bounded Property Checking
    All the errors were discovered during this phase
    Counterexample traces were of great help
  Debugging the Motorola ELF™ superscalar out-of-order processor
    Reorder buffer, memory unit, load-store queues, etc.
    Applied during the early design exploration phase

Bounded Property Checking Results
  Model      Steps   Terms   Term formula size   Prop. formula size   UCLID time (s)   SVC time (s)
  OOO unit   10      59      2566                15290                10.8             233.18
  OOO unit   14      87      7480                62504                76.55            > 5 hrs
  OOO unit   20      129     19921               263413               1679.12          > 1 day
  Elf™       6       33      218                 942                  1.2              10.9
  Elf™       8       70      1085                4481                 8.4              1851.6
  Elf™       10      104     2467                16453                30.6             > 1 day
  Elf™       12      149     4553                54288                111.0            > 1 day
  SVC (Stanford): another decision procedure that can solve CLU formulas
    Can decide a more expressive class
    CVC (successor of SVC) runs out of memory on the larger cases

Technique 2: Burch-Dill Technique
  [Figure: commutative diagram. One step of δimpl between Qimpl states along the bottom, k steps of δspec between Qspec states along the top, with Abs mapping each Qimpl state to a Qspec state]
    k = issue width of OOO
    δimpl = transition function of OOO
    δspec = transition function of ISA
    Abs = relates an OOO state with an ISA state
  Restrict the number of entries in the reorder buffer
    Number of ROB entries = r
  Flushing as the abstraction function Abs
    Alternate between executing the instruction at the head of the reorder buffer and retiring the head
  Inductive invariants required for the initial state Qimpl
    Critical for out-of-order processor verification
    Redundancy is present in the OOO model
      Because of out-of-order execution and register renaming
  More automated than inductive invariant checking
    Does not require auxiliary structures
    Far fewer invariants than invariant checking
      Only 4 invariants, compared to about 12 for the inductive invariant checking approach

Burch-Dill Technique for OOO
  Exponential blowup with the number of ROB entries
    Limited to r = 8 entries currently
    r = 8 finished after case-splitting in 2.5 hrs
  # of ROB entries   # of terms   Term formula size   Prop. formula size   UCLID time (s)
  2                  63           398                 5325                 6.83
  3                  83           618                 10248                30.23
  4                  103          886                 18175                157.41
  6                  143          1534                41208                3051.79
  8                  183          2342                82915                > 31 hrs
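
The commutative diagram of the Burch-Dill technique can be illustrated on a much smaller machine than the OOO model. The sketch below is a hypothetical two-stage pipeline with one result latch (not the processor from the slides), with issue width k = 1. Exhaustive enumeration over a tiny finite domain stands in for UCLID's symbolic simulation and decision procedure, and the names `alu`, `impl_step`, `spec_step` and `flush` are illustrative assumptions.

```python
from itertools import product

NREGS = 2            # assumption: tiny register file so enumeration stays cheap
VALUES = (0, 1, 2)   # assumption: small data domain instead of unbounded integers


def alu(a, b):
    # Stand-in for an uninterpreted ALU: any deterministic function will do.
    return (a + 2 * b + 1) % 3


def write(regs, dest, val):
    new = list(regs)
    new[dest] = val
    return tuple(new)


def spec_step(regs, instr):
    """ISA model: execute one instruction atomically."""
    src1, src2, dest = instr
    return write(regs, dest, alu(regs[src1], regs[src2]))


def flush(state):
    """Abstraction function Abs: drain the latch into the register file."""
    regs, latch = state
    return regs if latch is None else write(regs, latch[0], latch[1])


def impl_step(state, instr, forwarding=True):
    """Pipelined model: retire the latched result while executing instr."""
    regs, latch = state
    src1, src2, dest = instr

    def read(r):
        # Forward the in-flight result when it targets register r.
        if forwarding and latch is not None and latch[0] == r:
            return latch[1]
        return regs[r]

    result = alu(read(src1), read(src2))
    return (flush(state), (dest, result))


def burch_dill_holds(forwarding=True):
    """Check Abs(impl(q)) == spec(Abs(q)) for every state q and instruction."""
    reg_files = product(VALUES, repeat=NREGS)
    latches = [None] + [(d, v) for d in range(NREGS) for v in VALUES]
    instrs = list(product(range(NREGS), repeat=3))
    for regs, latch, instr in product(reg_files, latches, instrs):
        q = (regs, latch)
        if flush(impl_step(q, instr, forwarding)) != spec_step(flush(q), instr):
            return False
    return True


print(burch_dill_holds(forwarding=True))
print(burch_dill_holds(forwarding=False))
```

With forwarding enabled the diagram commutes for every enumerated state; removing the forwarding path makes the check fail, which is the kind of bug a counterexample trace would expose.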
Technique 3: Invariant Checking
  Deriving the inductive invariants
    Requires additional (auxiliary) variables to express the invariants
    Auxiliary variables do not affect system operation
  Proving that the invariants are inductive
    Automate the proof of invariants in UCLID
    Eliminates the need for large (often fragile) proof scripts

Restricted Invariants and Proofs
  Restricted class of invariants
    ∀x1 ∀x2 … ∀xk Φ(x1…xk)
    Φ(x1…xk) is a CLU formula without quantifiers
    x1…xk are integer variables free in Φ(x1…xk)
  Proving these invariants requires quantifiers
    ⊨ (∀x1 ∀x2 … ∀xk Φ(x1…xk)) ⇒ ∀y1 ∀y2 … ∀ym Ψ(y1…ym)
  Automatic instantiation of x1…xk with concrete terms
    Sound but incomplete method
    Reduces the quantified formula to a CLU formula
    Can then use the decision procedure for CLU

Shadow Structures
  Auxiliary variables
    Added to predict the correct value of state variables
  3 shadow variables for 3 state variables
    rob.value   : shdw.value
    rob.src1val : shdw.src1val
    rob.src2val : shdw.src2val
  Similar to McMillan’s approach and Arons et al.’s approach

Adding Shadow Structures
  [Figure: OOO block diagram with the shadow fields shdw.value, shdw.src1val and shdw.src2val added next to the reorder buffer fields valid, value, src1valid, src1val, src1tag, src2valid, src2val, src2tag, dest, op]
  Updated directly from the ISA model during dispatch
    shdw.src1val[rob.tail] ← Rfisa(src1)
    shdw.src2val[rob.tail] ← Rfisa(src2)
    shdw.value[rob.tail] ← ALU(Rfisa(src1), Rfisa(src2), op)
  Constraints relating reorder buffer fields to shadow fields
    1. ∀t ∈ rob. rob.valid(t) ⇒ rob.value(t) = shdw.value(t)
    2. ∀t ∈ rob. rob.src1valid(t) ⇒ rob.src1val(t) = shdw.src1val(t)
    3. ∀t ∈ rob. rob.src2valid(t) ⇒ rob.src2val(t) = shdw.src2val(t)

Refinement Maps
  [Figure: OOO block diagram with shadow fields]
  Correspondence with a sequential ISA model
    OOO and ISA are synchronized at dispatch
  For register file contents
    ∀r. reg.valid(r) ⇒ reg.val(r) = Rfisa(r)
  For program counter
    PCooo = PCisa

Invariants
  Tag consistency invariants (2)
    Instructions only depend on instructions preceding them in program order
  Register renaming invariants (2)
    Tag in a rename unit should be in the ROB, and the destination register should match
      ∀r. ¬reg.valid(r) ⇒ (rob.head ≤ reg.tag(r) < rob.tail ∧ rob.dest(reg.tag(r)) = r)
    For any entry, the destination should have reg.valid as false and the tag should contain this or a later instruction
      ∀t ∈ rob. (¬reg.valid(rob.dest(t)) ∧ t ≤ reg.tag(rob.dest(t)) < rob.tail)

Invariants (cont.)
  Executed instructions have operands ready
    ∀t ∈ rob. rob.valid(t) ⇒ rob.src1valid(t) ∧ rob.src2valid(t)
  Shadow-value-operands relationship
    ∀t ∈ rob. shdw.value(t) = Alu(shdw.src1val(t), shdw.src2val(t), rob.op(t))
  Producer-consumer values (2)
    ∀t ∈ rob. ¬rob.src1valid(t) ⇒ shdw.src1val(t) = shdw.value(rob.src1tag(t))
  Total: 13 invariants
    Includes refinement maps
    Includes constraints on shadow variables

Proving Invariants
  Proved automatically
    Quantifier instantiation was sufficient in these cases
    Relieves the user of writing proof scripts to discharge the proofs
  Time spent = 54 s on a 1.4 GHz machine
  Total effort = 2 person-days
  Not possible to use SVC or CVC
    Ordering between integer array indices
      ∀t ∈ rob. ¬rob.src1valid(t) ⇒ rob.src1tag(t) < t
    SVC/CVC interpret terms over the reals
      (x < y+1) ⇒ (x ≤ y)
      Valid when x, y are integers
      Invalid when x, y are reals

Why Quantifier Instantiation works

Extensions to the base model
  Increase concurrency of the design
    Infinite number of execution units
    Any subset of {dispatch, execute, retire, nop} can be active
    The same invariants were proved inductive without any changes
  Scalar → Superscalar
    Incorporate issue width = 2 and retire width = 2
    Data forwarding logic of the processor becomes more complicated
    Same set of invariants proved automatically
    No change in the proof script !!
    Runtime increased from 54 s to 134 s

Adding circular reorder buffer
  ROB modeled as a finite but arbitrary-size circular FIFO
    Tags are reused
    No dispatch when the reorder buffer is full
  Changes in the model
    Add a predicate rob.present() to indicate that a ROB slot contains a valid entry
    Change the dispatch logic to stall when the ROB is full
    Modify ‘<’ to incorporate wrap-around
  Changes in the proof script
    Add 1 invariant about the relationship between rob.present and the active elements of the ROB
  Again, the proof of the invariants is automatic !!
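
The changes listed for the circular reorder buffer (tag reuse, stalling when full, a presence predicate, and a wrap-around ‘<’) can be sketched on a concrete buffer. This is a minimal illustration under stated assumptions, not the UCLID model: the class `CircularRob`, the `count` field used to tell a full buffer from an empty one, and the method names are all hypothetical.

```python
class CircularRob:
    """Finite circular reorder buffer; tags are slot indices and get reused."""

    def __init__(self, size):
        self.size = size
        self.head = 0    # slot of the oldest in-flight instruction
        self.count = 0   # assumption: explicit occupancy counter

    @property
    def tail(self):
        # Next free slot, wrapping around the end of the buffer.
        return (self.head + self.count) % self.size

    def full(self):
        return self.count == self.size

    def present(self, i):
        """True when slot i currently holds an in-flight instruction."""
        return (i - self.head) % self.size < self.count

    def dispatch(self):
        """Allocate the tail slot; return None (stall) when the buffer is full."""
        if self.full():
            return None
        tag = self.tail
        self.count += 1
        return tag

    def retire(self):
        """Free the head slot; return None when the buffer is empty."""
        if self.count == 0:
            return None
        tag = self.head
        self.head = (self.head + 1) % self.size
        self.count -= 1
        return tag

    def before(self, a, b):
        """Wrap-around '<': a is older than b when both are measured from head."""
        return (a - self.head) % self.size < (b - self.head) % self.size


rob = CircularRob(4)
tags = [rob.dispatch() for _ in range(4)]   # fills slots 0..3
stalled = rob.dispatch()                    # buffer full, dispatch stalls
rob.retire()
rob.retire()                                # head moves to slot 2
reused = [rob.dispatch(), rob.dispatch()]   # slots 0 and 1 are reused
print(tags, stalled, reused, rob.before(3, 0), rob.before(0, 3))
```

Measuring both tags from `head` keeps program order intact after wrap-around: once slots 0 and 1 are reused, slot 3 is still older than slot 0 even though 3 > 0 numerically.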
Liveness Proof
  Liveness
    Every dispatched instruction is eventually retired
  Assumes a “fair” scheduler
    Attempts to execute the instruction at the head infinitely often
  Proceeds by a high-level induction
    Not mechanical
    Similar to the Hosabettu [CAV98] approach
  Most lemmas required are already proved during the safety proof (in UCLID)
    Concise proof

Current Status and Future Work
  Use of the decision procedure in deductive verification
    Automate proof of invariants in microarchitecture verification with speculation and memory instructions [CMU-TR]
    Automate proof of invariants in verification of a directory-based cache coherence protocol with unbounded clients and unbounded channels
  Need ways to generate (some) invariants automatically
    Pnueli et al.’s invisible invariant method [CAV01]
      Difficult to handle unbounded data, uninterpreted functions and ordering
  Detecting convergence of such term-level models
    Would enable automatic proof of models with finite buffers

Questions

Introduction and Related Work
  Microprocessor Verification
    Finite-state symbolic model checking: Berezin et al.
    Compositional model checking: McMillan et al.
    Symbolic simulation + decision procedure based: Burch & Dill, Bryant & Velev
    Theorem proving techniques: Sawada & Hunt, Hosabettu et al., Arons & Pnueli

Exploiting Positive Equality
  Decision procedure exploits “positive equality”
    Bryant, German, Velev, CAV’99
    Extended in the presence of succ and pred operations: Bryant, Lahiri, Seshia, CAV’02
  Positive equality
    Number of interpretations can be greatly reduced
    Equations appearing only under an even number of negations are assigned false
      Except when restricted by functional consistency
    Terms compared in these equations get distinct interpretations; these terms are called p-terms
    Identifying p-terms is a pre-processing step

Instruction Set Architecture (ISA)

UCLID description

Modeling Circular Queues
  [Figure: circular queue occupying indices H0 through T0, with head and tail pointers]
  next[head] := case (operation = POP) : succ’(head) ; default : head ; esac
  next[tail] := case (operation = PUSH) : succ’(tail) ; default : tail ; esac
  next[content] := Lambda i. case (operation = PUSH) & (i = tail) : D ; default : content(i) ; esac
  succ’ := Lambda (x). case x = T0 : H0 ; default : succ(x) ; esac;

Term-level modeling
  Abstract bit-vectors with integers (terms)
    Allow a restricted set of operations
      x = y, x < y, succ(x), pred(x)
  “Black-box” certain combinational blocks
    Replace them with uninterpreted functions
    Maintain functional consistency
  [Figure: ALU block replaced by an uninterpreted function f]

Example: Motorola ELF™ Processor
  Features
    32-bit, dual issue, with 64 GPRs
    5-stage pipeline
    Out-of-order issue, in-order completion of up to 2 instructions
  Load/store unit
    3-cycle load latency
    Fully pipelined
    Load queue for loads that miss in the cache
    Store queue for retiring store instructions
    Other buffers to hide cache miss latency
  1000 lines of UCLID model derived from 20K lines of RTL

Bounded Property Checking
  Compare the micro-architecture with a sequential ISA model w.r.t.
  Register File, Memory and PC
  ISA model synchronized at completion
  [Figure: two traces of impl steps against the ISA state: impl state when 1 or 2 instruction(s) complete; impl state when no instruction(s) complete]

Quantifier Instantiation
  Prove ⊨ (∀x1 ∀x2 … ∀xk Φ(x1…xk)) ⇒ ∀y1 ∀y2 … ∀ym Ψ(y1…ym)
  1. Introduce Skolem constants (y*1, …, y*m)
       ⊨ (∀x1 ∀x2 … ∀xk Φ(x1, …, xk)) ⇒ Ψ(y*1, …, y*m)
  2. Instantiate x1, …, xk with concrete terms
       Assume single-arity functions and predicates
       Let Fx = {f | f(x) is a sub-expression of Φ(x1…xk)}
       Let Tf = {t | f(t) is a sub-expression of Ψ(y*1…y*m)}
       For each bound variable x, Ax = {t | f ∈ Fx and t ∈ Tf}
       Instantiate over Ax1 × Ax2 × … × Axk
  Formula size grows exponentially with the number of bound variables

Updating Shadow Structures
  During the dispatch of a new instruction I = <src1, src2, dest, op>
    next[shdw.value] := λ t. (t = rob.tail ? Alu(Rfisa(src1), Rfisa(src2), op) : shdw.value(t));
    next[shdw.src1val] := λ t. (t = rob.tail ? Rfisa(src1) : shdw.src1val(t));
    next[shdw.src2val] := λ t. (t = rob.tail ? Rfisa(src2) : shdw.src2val(t));

Adding Shadow Structures
  [Figure: OOO block diagram with the shadow fields shdw.value, shdw.src1val and shdw.src2val added next to the reorder buffer fields]

Refinement Maps
  For register file contents
    ∀r. reg.valid(r) ⇒ reg.val(r) = Rfisa(r)
    If a register is not being modified by any instruction in the ROB, then its value matches the ISA value
  For program counter
    PCooo = PCisa

Invariants
  [Figure: reorder buffer entry with fields valid, value, src1valid, src1val, src1tag, src2valid, src2val, src2tag, dest, op]

Burch-Dill Technique
  More automated than inductive invariant checking
    Does not require auxiliary structures
    Far fewer invariants than invariant checking
      Only 4 invariants, compared to about 12 for the inductive invariant checking approach
  Invariants on the initial state Qooo
    Instructions only depend on instructions preceding them in program order
    Tag in a rename unit should be in the ROB, and the destination register should match
    For any entry, the destination should have reg.valid as false and the tag should contain this or a later instruction
    rob.head ≤ rob.tail ≤ rob.head + r

Invariants
  Total of 13 invariants required
    Refinement map for RF and PC (2)
    Shadow structure constraints (3)
    Tag consistency invariants (2)
      Instructions only depend on instructions preceding them in program order
    Circular register renaming invariants (2)
      Tag in a rename unit should be in the ROB, and the destination register should match
        ∀r. ¬reg.valid(r) ⇒ (rob.head ≤ reg.tag(r) < rob.tail ∧ rob.dest(reg.tag(r)) = r)
      For any entry, the destination should have reg.valid as false and the tag should contain this or a later instruction
        ∀t ∈ rob. (¬reg.valid(rob.dest(t)) ∧ t ≤ reg.tag(rob.dest(t)) < rob.tail)
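
The instantiation heuristic from the Quantifier Instantiation slide (collect the functions applied to a bound variable in the invariant, then reuse the arguments those functions receive in the Skolemized goal) can be sketched in a few lines. The expression encoding below is an assumption made for illustration: an application f(t) is the tuple `("app", "f", t)`, any other tuple is an operator node, and plain strings are variables or constants. It is not UCLID's internal representation.

```python
from itertools import product


def applications(expr):
    """Yield (f, arg) for every single-arity application f(arg) inside expr."""
    if isinstance(expr, tuple):
        if expr[0] == "app":
            _, f, arg = expr
            yield f, arg
            yield from applications(arg)
        else:
            # Operator node such as ("=>", lhs, rhs) or ("=", lhs, rhs).
            for child in expr[1:]:
                yield from applications(child)


def instantiation_sets(phi, bound_vars, psi):
    """Compute A_x = {t | f in F_x and t in T_f} for each bound variable x."""
    phi_apps = list(applications(phi))
    psi_apps = list(applications(psi))
    sets = {}
    for x in bound_vars:
        f_x = {f for f, arg in phi_apps if arg == x}           # F_x
        sets[x] = {arg for f, arg in psi_apps if f in f_x}     # union of T_f
    return sets


def instantiations(phi, bound_vars, psi):
    """Enumerate the product A_x1 x ... x A_xk as substitution dictionaries."""
    sets = instantiation_sets(phi, bound_vars, psi)
    choices = [sorted(sets[x], key=repr) for x in bound_vars]
    return [dict(zip(bound_vars, combo)) for combo in product(*choices)]


# Invariant body: rob.valid(t) => rob.value(t) = shdw.value(t)
phi = ("=>", ("app", "rob.valid", "t"),
       ("=", ("app", "rob.value", "t"), ("app", "shdw.value", "t")))

# Hypothetical Skolemized goal mentioning rob.value(t*) and
# shdw.value(rob.src1tag(t*)).
producer = ("app", "rob.src1tag", "t*")
psi = ("=", ("app", "rob.value", "t*"), ("app", "shdw.value", producer))

for substitution in instantiations(phi, ["t"], psi):
    print(substitution)
```

For this example the bound variable t is instantiated with the Skolem constant t* and with the producer tag rob.src1tag(t*). With k bound variables the number of instances is the product of the set sizes, which is the exponential growth noted on the slide.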