Balancing Formal and Dynamic Techniques in Validation of Industrial Arithmetic Designs



Roope Kaivola



Inside Intel

Moore’s Law - 1965

Moore’s Law - 40 Years Later

Process Name      P854     P856     P858     Px60     P1262    P1264    P1266
1st Production    1995     1997     1999     2001     2003     2005     2007
Lithography       0.35 µm  0.25 µm  0.18 µm  0.13 µm  90 nm    65 nm    45 nm
Gate Length       0.35 µm  0.20 µm  0.13 µm  <70 nm   <50 nm   <35 nm   <25 nm
Wafer Size (mm)   200      200      200      200/300  300      300      300

A new process every two years

Moore’s Law - Implications

• Each new process generation doubles the number of transistors available to architects and designers

• Some of this increase is consumed by larger structures (caches, TLB, etc.)

• The rest goes to increased complexity:

• Out-of-order, speculative execution machines

• Deeper pipelines

• New technologies (Hyper-Threading, 64-bit extensions, virtualization, security, … )

• Multi-core designs

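[Figure: investment for a new process generation, with components including the pilot line and the R&D process team; amounts shown: $3 billion, $1-2 billion, $0.5-1 billion]

A $5 billion investment requires high volume to achieve reasonable unit cost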

The Validation Challenge

• Validation driven by the economics of Moore’s Law

• High initial investment requires high volume

• Increased complexity → increased validation effort and risk

High volumes magnify the cost of a validation escape

Microprocessor Design Scope

Typical lead CPU design requires:

• 500+ person design team:

– logic and circuit design

– physical design

– validation and verification

– design automation

• 2 to 2½ years from start of RTL coding to A0 tapeout

• 9-12 months from A0 tapeout to production qualification

(may take longer for workstation/server products)

One design cycle = 2 process generations

Pentium ® 4 RTL Development

[Figure: Pentium® 4 RTL development over time, plotting # files checked in, total # lines of RTL and # lines changed, with milestones First Full-Chip RTL Model, RTL Coding Complete and A0 tapeout; phases: functionality focused, timing focused]

• 3000 files, 1.3M lines total (including comments, white space)

• 250K lines changed in one week

Design Hierarchy

[Figure: Pentium® 4 Basic Block Diagram, showing the Front End, Out-of-order Engine, Execution Units and Memory Subsystem, with blocks for BTB/Branch Prediction, Trace Cache, Microcode ROM, Out-of-order execution logic, Branch History Update, Integer and FP Execution Units, Level 1 Data Cache, Level 2 Cache, Bus Unit and System Bus]

Design Hierarchy

[Figure: design hierarchy by design level, from 1k gate elements up to the full chip, contrasting well-defined interfaces (“what” functionality) with ad hoc interfaces (“how” functionality)]

• Pre-silicon

• Tape out a healthy product

• Stages

– Exercise

– Stress

– Coverage

• Post-silicon

• Identify functional issues pre-silicon validation missed

• Physical reality check

RTL Simulation

• Pre-silicon RTL simulation has advantages:

• Fine-grained (cycle-by-cycle) checking

• Complete visibility of internal state

• APIs to allow event injection

• BUT simulation is MUCH slower than real silicon

• A full-chip simulation with checkers runs at ~20 Hz on a Pentium 4 class machine

• A compute farm with ~6K CPUs running 24/7

The sum total of Pentium 4 RTL simulation cycles run prior to A0 tapeout amounts to less than 1 minute of execution on a single 2 GHz system
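A back-of-the-envelope check of the speed gap, using only the ~20 Hz and 2 GHz figures quoted above. The function name is ours, for illustration, and the calculation compares one full-chip simulator against one silicon system; it makes no claim about how the compute farm was actually used.

```python
# Figures quoted on the slide; everything else is derived from them.
SILICON_HZ = 2e9                     # "a single 2 GHz system"
RTL_SIM_HZ = 20.0                    # full-chip RTL simulation with checkers, ~20 Hz
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def simulation_gap(silicon_seconds: float = 60.0):
    """Return (cycles silicon executes in `silicon_seconds`, simulator slowdown
    factor, years a single ~20 Hz full-chip simulator needs for those cycles)."""
    cycles = SILICON_HZ * silicon_seconds
    slowdown = SILICON_HZ / RTL_SIM_HZ
    sim_years = cycles / RTL_SIM_HZ / SECONDS_PER_YEAR
    return cycles, slowdown, sim_years


cycles, slowdown, sim_years = simulation_gap()
print(f"{cycles:.1e} cycles, {slowdown:.0e}x slowdown, ~{sim_years:.0f} simulator-years")
```

One minute of silicon execution is 1.2×10^11 cycles; at a 10^8 slowdown that is roughly 190 years on a single full-chip simulator, which is why the work has to be spread across a large compute farm.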

RTL Simulation – Coverage

• Ideology

• List all interesting cases you can think of

• Hit these by random stimulus

• You will then likely also hit most interesting cases you did not think of

• THE mainstream validation technology

• Very powerful in practice, as long as the interesting scenarios are carefully identified
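A minimal sketch of the coverage ideology, assuming a hypothetical 8-bit adder and a hand-written list of coverage bins (the bin names and the `run_random_stimulus` helper are illustrative, not from any Intel flow): list interesting cases, drive random stimulus, and record which cases were hit.

```python
import random

WIDTH = 8
MASK = (1 << WIDTH) - 1

# Hypothetical coverage targets for an 8-bit adder: the "interesting cases"
# a validator might list up front.
COVERAGE_BINS = {
    "a_zero": lambda a, b: a == 0,
    "carry_out": lambda a, b: a + b > MASK,
    "result_zero": lambda a, b: ((a + b) & MASK) == 0,
    "both_max": lambda a, b: a == MASK and b == MASK,
}


def run_random_stimulus(n_tests: int, seed: int = 0) -> dict:
    """Drive uniformly random operands and count how often each coverage bin is hit."""
    rng = random.Random(seed)
    hits = {name: 0 for name in COVERAGE_BINS}
    for _ in range(n_tests):
        a, b = rng.randint(0, MASK), rng.randint(0, MASK)
        for name, predicate in COVERAGE_BINS.items():
            if predicate(a, b):
                hits[name] += 1
    return hits


hits = run_random_stimulus(10_000)
uncovered = [name for name, count in hits.items() if count == 0]
print(hits, "uncovered:", uncovered)
```

The frequent bins fill up quickly, but `both_max` has a 1/65,536 chance per test, so 10,000 random tests usually miss it. Rare bins like this are where directed tests or biased stimulus come in.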

RTL Simulation – Granularity

• Cluster Test Environment

• Simulate each cluster in isolation

• Better visibility and controllability

• Faster

• Full-Chip Test Environment

• Do all the pieces fit together?

• Have we implemented IA-32?

RTL Simulation – Limits

• No amount of dynamic validation provides certainty:

• A single dyadic extended-precision (80-bit) FP instruction has ~10^50 possible combinations

• Exhaustive testing is impossible, even on real silicon

• Getting coverage from 0% to 80% is easy; getting from 95% to 98% is painful

• Were all interesting scenarios considered when defining coverage targets?
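Why exhaustive testing is out of reach can be made concrete. The sketch below counts operand bits only (two 80-bit sources) and assumes a hypothetical, very generous test throughput; both the throughput and the helper name are illustrative assumptions.

```python
OPERAND_BITS = 80                        # x87 extended-precision operand width
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def exhaustive_years(tests_per_second: float, operands: int = 2) -> float:
    """Years needed to apply every operand combination of one FP instruction,
    counting operand bits only (rounding and precision controls are ignored)."""
    combinations = 2 ** (operands * OPERAND_BITS)   # exact int: 2**160 for dyadic ops
    return combinations / tests_per_second / SECONDS_PER_YEAR


# Hypothetical and very generous: a million machines, each applying 1e9 tests/s.
years = exhaustive_years(tests_per_second=1e15)
print(f"{2 ** 160:.2e} operand combinations, about {years:.1e} years")
```

Operand bits alone give 2^160 ≈ 1.5×10^48 combinations, the same ballpark as the ~10^50 quoted above; even at this throughput, enumeration would take on the order of 10^25 years.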

Formal Verification

Pentium ® 4 Formal Verification

• First large-scale effort at Intel (~60 person years) to apply formal verification techniques to CPU design

• Objective:

• Complement other validation activities

• Correctness, not bug hunting

• Tools:

• (SMV-like) Model checking

• Symbolic simulation

• Theorem proving to connect FP proofs to IEEE 754

Formal Verification – Organization

• An independent team within design/pre-silicon validation

• Benefits:

– Impartial design scrutiny

– Expertise - reusable proof frameworks

• Drawbacks:

– Designs not created for verification

– Reverse engineering

Pentium ® 4 Formal Verification

• More than 14,000 properties in key areas:

• FP Execution units

• Instruction decode

• Out-of-order control mechanisms

• Primarily safety properties or conformance to a specification reference model

• Found ~20 “high quality” bugs that would have been hard to detect by dynamic testing

• No silicon bugs found to date in areas proved by FV 

Domain of FV (FPV)






Design from FV Perspective

• RTL written by circuit design engineers

• Optimized using expected constraints: often “almost wrong”

• In general, FV has little influence over designs

• FV models are automatically compiled from RTL source code (gate level)

Unit-Level FV

Unit-Level Verification

• Our primary approach for Pentium 4 verification

• Bottom-up strategy

• First prove unit properties, then multiple-unit protocols and assumptions

• Maintain properties on the evolving design and reuse them on proliferations

• Leverage results on subsequent designs

• Tools & technologies:

• SMV-like “traditional” model checker

• LTL-inspired property description languages (e.g.


FSM Interactions Example


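[Figure: Unit A contains two interacting FSMs, Machine 1 and Machine 2 (~17 states each), together with WB and surrounding sequential logic; Unit B and Unit C are shown alongside Unit A]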

Property Example

• Top level property (real objective)

• A request from Unit A to Unit C always gets acknowledged

• Low-level specs proved (downsized/“traditional” objectives)

• Both machines cannot be in their fault states at the same time

• When one of the machines sends a request, the PE eventually acks the request, unless it is cancelled.

• In the presence of a clear during action a, the action is continued, but z is dropped when action a completes.

• A fault during action b should not result in action a.

• After the PE acknowledges a write request, it always initiates a read.

• The state machines never livelock.

• A cancel results in the state machines going idle eventually.

• When the state machine reaches state i, ii, or iii, it eventually receives an ack from X.

• No notion of completeness if we only prove low-level specs

• The approach allowed us to verify a large collection of critical local properties on the Pentium 4 design

• Capacity limitations require significant property decomposition:

• Reasoning at the bottom of the hierarchy

• Low-level decompositions break when design changes

• Local “flaws” corrected in the broader scheme of things

• Designs often work more “just because” than due to sound reasoning
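The low-level safety properties listed in the property example, such as "both machines cannot be in their fault states at the same time", are invariants over the reachable states. A minimal explicit-state sketch of checking such an invariant follows; the two-machine model, its IDLE/BUSY/FAULT states and the shared-owner arbitration are hypothetical, and an SMV-like checker works symbolically on RTL-derived models rather than enumerating states like this.

```python
from collections import deque

IDLE, BUSY, FAULT = "IDLE", "BUSY", "FAULT"


def _update(machines, i, new_state, owner):
    """Return the global state after machine i moves to new_state."""
    m = list(machines)
    m[i] = new_state
    return (m[0], m[1], owner)


def successors(state, check_owner=True):
    """Interleaved transitions of two hypothetical machines sharing one resource.
    A global state is (state of machine 0, state of machine 1, owner)."""
    machines, owner = state[:2], state[2]
    for i in (0, 1):
        released = None if owner == i else owner
        if machines[i] == IDLE and (owner is None or not check_owner):
            yield _update(machines, i, BUSY, i)          # acquire the resource
        elif machines[i] == BUSY:
            yield _update(machines, i, FAULT, owner)     # fault while holding it
            yield _update(machines, i, IDLE, released)   # finish and release
        elif machines[i] == FAULT:
            yield _update(machines, i, IDLE, released)   # recover and release


def check_invariant(invariant, init=(IDLE, IDLE, None), **options):
    """Breadth-first search over all reachable states. Returns None if the
    invariant holds everywhere, else a shortest trace to a violating state."""
    parent = {init: None}
    queue = deque([init])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            trace = []
            while state is not None:
                trace.append(state)
                state = parent[state]
            return trace[::-1]
        for nxt in successors(state, **options):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None


def not_both_faulty(state):
    """Safety property: both machines are never in their fault states at once."""
    return not (state[0] == FAULT and state[1] == FAULT)


assert check_invariant(not_both_faulty) is None              # property holds
bad_trace = check_invariant(not_both_faulty, check_owner=False)
print(bad_trace)   # counterexample when acquisition ignores the current owner
```

Each such check is local: it says nothing about whether a request from Unit A to Unit C is always acknowledged, which is exactly the completeness gap noted above.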

Formal Verification vs. Simulation

Simulation


• yields partial results quickly,

• progresses in a linear fashion,

• but reaching full coverage is very hard, and

• completeness is unattainable.

Formal Verification

• is woefully capacity-constrained,

• is slow to produce results,

• but has the promise of completeness.

The Synergy Problem

• Coverage-based validation requires one to identify the sets of interesting cases for all aspects of the design.

• Even if some aspects of the design are formally verified, we still need coverage for them, to make sure we are hitting the other design aspects we failed to identify when defining our coverage targets

• Therefore, formal verification gives little or no reduction in simulation effort

Two ways forward:

• FV as an “extra”

• FV replacing coverage

Replacing Simulation by Formal Verification

• First design exercise testing is unlikely to be replaced by formal verification

• Coverage-based validation can be replaced by FV, if:

• FV works at the same level of granularity as simulation, and

• FV addresses all the aspects of the design simulation does.

Can we do this?

Formal Verification – Execution Cluster

[Figure: Pentium® 4 Basic Block Diagram, repeated]

• Execution Cluster – all micro-operations executed here

• Validation task: functional correctness

• Huge state spaces (exceeding 2^160)

• Floating-point, integer arithmetic, etc.

EXEC – FMUL Data-Path

Generate Partial Products



Sticky bits: we only care whether any of them is high or none is

EXEC – FMUL Data-Path Optimization



Compress lower sticky bits to a single sticky and some carry bits

EXEC – FMUL Data-Path Optimization Bug

Drop low carry in addition!

Bug observable only when:

- Low carry = 1

- All higher sticky bits are 1s

Some natural data-path bugs are very hard to hit
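The observability condition can be reproduced with a simplified, hypothetical bit-level model. It is not the FMUL RTL: `upper` stands for the kept product bits, `lower` for m already-compressed sticky bits, and `low_carry` for the carry arriving from below; only the carry into the kept bits is modelled, not rounding.

```python
def kept_bits_reference(upper: int, lower: int, low_carry: int, m: int) -> int:
    """Reference: add the low carry into the full value, then keep only the bits
    above the m compressed sticky-bit positions."""
    return ((upper << m) + lower + low_carry) >> m


def kept_bits_buggy(upper: int, lower: int, low_carry: int, m: int) -> int:
    """Hypothetical buggy optimisation: the low carry is dropped from the addition,
    so it can never ripple through the sticky bits into the kept bits."""
    return ((upper << m) + lower) >> m


def bug_observable(upper: int, lower: int, low_carry: int, m: int) -> bool:
    return kept_bits_reference(upper, lower, low_carry, m) != kept_bits_buggy(upper, lower, low_carry, m)


def hit_probability(m: int) -> float:
    """Chance that one uniformly random (lower, low_carry) pair exposes the bug:
    the low carry must be 1 and all m higher sticky bits must be 1."""
    return 0.5 * 2.0 ** -m


# Exhaustive check at a tiny width confirms the exact trigger condition.
M = 4
for upper in range(4):
    for lower in range(1 << M):
        for low_carry in (0, 1):
            expected = low_carry == 1 and lower == (1 << M) - 1
            assert bug_observable(upper, lower, low_carry, M) == expected

print(f"{hit_probability(40):.1e}")   # per-test chance with 40 sticky bits
```

With a hypothetical 40 sticky bits, each random test exposes the bug with probability of about 4.5×10^-13, so random stimulus is very unlikely to find it, while a proof covers the case automatically.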

EXEC – a FV Success Story!

We can formally verify all micro-operations!

• Abstract specifications: clean, precise (IEEE for FP)

• Proofs from low-level RTL to IEEE specification

• Found many high quality bugs on many CPU designs

• Verification highly repeatable

EXEC – a FV Success Story!


• Direct symbolic simulation (STE)

• Theorem-proved decompositions for most complex micro-ops (div, sqrt, mul)

• Binary Decision Diagrams (BDDs)

• Parametric representations

• State constraints by inductive invariants
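A minimal sketch of the idea behind direct symbolic simulation: evaluate the gates once on symbolic values and compare every output against the specification for all inputs at the same time. Here each Boolean function is a full truth table packed into a Python integer, standing in for a BDD (truth tables grow as 2^n, which is the blow-up BDDs are designed to avoid in common cases). The 3-bit adder, the injected bug and all helper names are hypothetical.

```python
N_BITS = 3                      # hypothetical operand width
ROWS = 1 << (2 * N_BITS)        # one truth-table row per assignment of (a, b)


def var(i: int) -> int:
    """Truth table of input variable i: bit r is 1 iff bit i of the row index r is 1."""
    return sum(1 << r for r in range(ROWS) if (r >> i) & 1)


def ripple_adder(a, b, drop_carry_at=None):
    """Gate-level ripple-carry adder evaluated on whole truth tables at once,
    so one symbolic run covers every input assignment."""
    carry, out = 0, []          # 0 is the truth table of constant False
    for k, (ak, bk) in enumerate(zip(a, b)):
        if k == drop_carry_at:  # optional injected bug: lose the carry into bit k
            carry = 0
        out.append(ak ^ bk ^ carry)
        carry = (ak & bk) | (carry & (ak ^ bk))
    return out + [carry]


def spec_bit(k: int) -> int:
    """Truth table of bit k of the integer sum a + b (the reference specification)."""
    mask = (1 << N_BITS) - 1
    return sum(1 << r for r in range(ROWS) if (((r & mask) + (r >> N_BITS)) >> k) & 1)


def first_counterexample(result):
    """Return (a, b) for the lowest-numbered failing assignment, or None."""
    diff = 0
    for k, table in enumerate(result):
        diff |= table ^ spec_bit(k)
    if diff == 0:
        return None
    r = (diff & -diff).bit_length() - 1       # index of the lowest failing row
    return r & ((1 << N_BITS) - 1), r >> N_BITS


a = [var(i) for i in range(N_BITS)]
b = [var(N_BITS + i) for i in range(N_BITS)]
assert first_counterexample(ripple_adder(a, b)) is None
print(first_counterexample(ripple_adder(a, b, drop_carry_at=1)))
```

A single run either proves all outputs correct or yields a concrete counterexample, which is what makes the technique easy to communicate to designers.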

Symbolic Trajectory Evaluation

• Our primary approach for data-path dominated property model checking

• High capacity (n*10k state elements)

Symbolic Trajectory Evaluation

• STE is a built-in function in the reFLect functional programming environment.

• Implemented as a symbolic, 4-valued, event-driven simulator.

• Supports usage paradigms that significantly improve capacity:

• Symbolic indexing

• Parametric substitutions

• User-defined and/or dynamic weakening
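The four values of an STE-style simulator can be sketched with a dual-rail encoding: one rail records "asserted 1", the other "asserted 0", giving X (unknown), 0, 1 and an over-constrained top value. This encoding is a common textbook choice, assumed here rather than taken from the reFLect implementation, and in STE each rail would be a BDD over symbolic variables instead of a Python bool.

```python
from typing import NamedTuple


class Val(NamedTuple):
    """Dual-rail encoding of a 4-valued node.
    hi: the node is asserted to be 1; lo: the node is asserted to be 0."""
    hi: bool
    lo: bool


X = Val(False, False)      # unknown: no information about the node
ZERO = Val(False, True)
ONE = Val(True, False)
TOP = Val(True, True)      # over-constrained: asserted both 0 and 1


def v_not(a: Val) -> Val:
    return Val(a.lo, a.hi)


def v_and(a: Val, b: Val) -> Val:
    # Output is 1 only if both inputs are known 1; it is 0 if either input is known 0.
    return Val(a.hi and b.hi, a.lo or b.lo)


def v_or(a: Val, b: Val) -> Val:
    return Val(a.hi or b.hi, a.lo and b.lo)


def gated_data(enable: Val, data: Val) -> Val:
    """Hypothetical two-gate circuit: out = enable AND NOT data."""
    return v_and(enable, v_not(data))


print(gated_data(ZERO, X), gated_data(ONE, X))
```

When `enable` is 0 the output is 0 even though `data` is X, so the data input never needs a symbolic variable. Leaving irrelevant inputs at X in this way is the kind of capacity saving that weakening exploits.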

An open functional programming environment

• Supports development of libraries, scripting, rapid prototyping and development of formal tools, customization.

• BDDs are first class objects.

• Reflection gives programmatic access to source level syntax.

• Theorem prover to reason about reFLect programs: provides automation for first order and linear arithmetic goals.

• Hooks to SAT solvers, automated reasoning engines.

EXEC Verification Framework

• Methodology and tools built in reFLect.

• Support structure

• IEEE compliant floating-point library

• Customized verification strategies

• Interface level proof design environment

• Infrastructure designed with proliferation in mind.

• Theorems relating model checking to abstract specifications

Case: FP Accumulator

• Verification of most micro-operations handled directly by symbolic simulation: for example, floating point accumulator.

[Figure: verification flow from the IEEE spec, via theorem proving and STE model checking, down to the RTL design; an API adds design-specific information about signal names, timing, ...]

Case: FP Accumulator

• Effort involves verifying logic at the Execution cluster boundary

• Direct STE with case splitting and parametric representations for cases

• Verify data-path correctness

• all FP uops, all flavors, all modes, flags, faults

• ACC does x87/SSE/SSE2 ADD, SUB, COM, …

• Verify control correctness

• ACC takes an arbitrary sequence of uops

• interference between uops of different latency
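A minimal sketch of case splitting with a parametric representation, on a hypothetical toy "align and add" unit rather than the FP accumulator. The input space is split on the exponent difference d. Within each case, the exponents are driven from a free parameter p (eb = p, ea = p + d), so only inputs belonging to that case are generated. Explicit enumeration stands in for STE here, whereas in STE the parameterization is applied to symbolic variables.

```python
E, W = 6, 3          # hypothetical exponent range [0, E) and mantissa width


def align_add(ea, ma, eb, mb):
    """Toy 'design': shift the mantissa with the larger exponent left so both
    operands share the smaller exponent, then add. Returns (mantissa, exponent)."""
    if ea >= eb:
        return (ma << (ea - eb)) + mb, eb
    return ma + (mb << (eb - ea)), ea


def reference(ea, ma, eb, mb):
    """Specification: the exact value ma * 2**ea + mb * 2**eb."""
    return (ma << ea) + (mb << eb)


def param_range(d):
    """Values of the free parameter p for case ea - eb == d (eb = p, ea = p + d)."""
    return range(max(0, -d), E - max(0, d))


def case_inputs(d):
    """Parametric representation of one case: every generated point satisfies
    ea - eb == d by construction, so nothing has to be filtered out."""
    for p in param_range(d):
        for ma in range(1 << W):
            for mb in range(1 << W):
                yield p + d, ma, p, mb


def verify_case(d):
    for ea, ma, eb, mb in case_inputs(d):
        mant, exp = align_add(ea, ma, eb, mb)
        if (mant << exp) != reference(ea, ma, eb, mb):
            return False
    return True


CASES = range(-(E - 1), E)   # every possible exponent difference
covered = {(p + d, p) for d in CASES for p in param_range(d)}
assert covered == {(ea, eb) for ea in range(E) for eb in range(E)}   # cases are complete
assert all(verify_case(d) for d in CASES)                            # each case holds
```

The two assertions mirror the two obligations of any case split: the cases together must cover the whole input space, and each case must be verified on its own. Within a case the shift amount is fixed, which is what keeps each sub-problem small.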

Case: FP Multiplier

• Very low-level RTL

• Highly optimized

• Supports different operation flavors

• Shared control logic

• Little symmetry

• Direct STE not feasible








[Figure: FP multiplier structure: exponent datapath; mantissa datapath with partial products generator, Wallace tree and adder network; rounder logic; stage labels S2, S1]

Case: FP Multiplier

• Algorithmic decomposition to enable verification

• Verify partial product generation and addition separately

• Employ STE to verify sub-proofs individually

• Use the deductive engines in reFLect to tie the results and verify the I/O correctness claim

• Decomposition reusable on subsequent designs

• Verification with decompositions is easier than with a specialized, potentially fully automated approach (e.g. Binary Moment Diagrams)
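The shape of the algorithmic decomposition can be sketched on a hypothetical 4-bit multiplier. Simple Python functions stand in for the partial products generator and the Wallace tree/adder network. Each sub-proof is checked separately, here by exhaustive enumeration rather than STE. The deductive step that ties them together is ordinary arithmetic, noted in a comment; on the slide that role belongs to reFLect's deductive engines.

```python
from itertools import product

W = 4   # hypothetical operand width, small enough to check each lemma exhaustively


def partial_products(a: int, b: int):
    """Stand-in for the partial products generator: row i is a << i if bit i of b is set."""
    return [(a << i) if (b >> i) & 1 else 0 for i in range(W)]


def add_network(rows):
    """Stand-in for the Wallace tree plus adder: 3:2 carry-save compression of the
    rows into a sum and a carry vector, followed by one final addition."""
    s, c = 0, 0
    for row in rows:
        s, c = s ^ c ^ row, ((s & c) | (s & row) | (c & row)) << 1
    return s + c


def lemma_partial_products() -> bool:
    """Sub-proof 1: row i equals (a * b_i) << i, so it also has the shape
    x << i (x a W-bit value) that sub-proof 2 assumes."""
    return all(row == (a * ((b >> i) & 1)) << i
               for a, b in product(range(1 << W), repeat=2)
               for i, row in enumerate(partial_products(a, b)))


def lemma_adder() -> bool:
    """Sub-proof 2: for any rows of shape x_i << i, the network returns their exact sum."""
    for xs in product(range(1 << W), repeat=W):
        rows = [x << i for i, x in enumerate(xs)]
        if add_network(rows) != sum(rows):
            return False
    return True


# Deductive glue: sum_i (a * b_i) << i == a * sum_i b_i * 2**i == a * b, so the two
# lemmas together give add_network(partial_products(a, b)) == a * b.
assert lemma_partial_products() and lemma_adder()

# Toy-size end-to-end sanity check only; at real widths this enumeration is infeasible.
assert all(add_network(partial_products(a, b)) == a * b
           for a, b in product(range(1 << W), repeat=2))
```

The interface between the sub-proofs is the row-shape assumption, and this is the kind of intermediate claim that carries over when the same decomposition is reused on a subsequent design.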

Formal Verification vs. Simulation


Observations from EXEC FV

• Symbolic simulation

• Gives us sufficient capacity

• Can be learnt without a degree in FV (although it helps)

• Is easy to communicate to designers

• Allows us to work at the same level of granularity (cluster) as simulation

• Approach is robust and maintainable

Observations from EXEC FV

• For any large FV task targeting complete coverage, the verifier needs to understand in detail

• How the design works

• How the verification algorithm works

• The role of each computation step in the overall verification task in order to solve the inevitable complexity problems.

Industrial Applicability of FV – Intel

• Simulation is the default validation approach

• In a project setting, FV competes with simulation

• FV is competitive in the target areas where verifiers have sufficient prior expertise and collateral

• At Intel, FV is an established technology used in most recent CPU development projects

Industrial Application of FV – Bad News

• “Lack of capacity”

• Many FV approaches lack scalability on two fronts:

• up in design size

• down in result quality

• Barriers

• Technology

• Expertise

Industrial Application of FV – Good News

• In areas where a verifier can concentrate on verification, instead of solving verification research problems, the effort to carry out FV is comparable to thorough coverage-based validation

• Current Intel projects are replacing coverage-based validation by FV in select areas – stay tuned …

• Simulation cannot answer questions like: is a design change guaranteed not to break anything?

Position Statements

• The greatest advantage of FV is complete coverage!

• For certain areas of design, we have FV methods with a strong practical track record. Then, the choice to do or not to do FV is a risk tolerance question.

• In general, the question of robust, scalable FV methods is an open research problem

• I believe that much FV research attempts to fully automate a problem that is too hard to be automated.

We have been more successful with simpler methods that verifiers can guide with their own insight.

