Automatic Verification of Floating Point Units

Udo Krautz, Viresh Paruthi,
Anand Arunagiri, Sujeet Kumar
IBM™ Corporation
Authors
1. Udo Krautz, IBM Deutschland, Boeblingen
Germany, krautz@de.ibm.com, +49-7031-16-2347
2. Viresh Paruthi, IBM Corporation, Austin TX USA,
vparuthi@us.ibm.com, +1-512-286-7922
3. Anand B Arunagiri, IBM Corporation, Bangalore
India, aarunagi@in.ibm.com, +91-80-41777187
4. Sujeet Kumar, IBM Corporation, Bangalore India,
sujkumak@in.ibm.com, +91-80-41777283
Abstract
Verification of floating point units (FPUs) is one of the most successful applications of formal verification
methods. The large and complex data paths and intricate control structures of FPUs make verification with
coverage-driven simulation incomplete and error-prone. Formal verification (FV) has been successfully
leveraged to achieve the high level of quality required of this critical logic. Typically, FV-based
approaches to verifying FPUs rely on introducing higher-level abstractions to allow reasoning. This, however,
has to be done manually and quickly becomes tedious for the highly optimized bit-level implementations
found in high-performance microprocessors. Automated formal methods that work directly on the bit level
and provide a full end-to-end check for FPUs exist, but they are limited to single instructions (issued into
an empty pipeline) and hence do not check control aspects of the logic that relate to inter-instruction
interactions or pipeline control.
In this talk we present an approach based on equivalence checking that overcomes the single-instruction
limitation of automated bit-level proofs in the formal verification of FPUs. The sequential execution of
instructions is modeled by two instances of the design under test. One of these instances acts as a reference
model for the other. This allows a large number of internal equivalences to be leveraged by
equivalence checking techniques. We show that this method is capable of proving instruction sequences for
highly optimized industrial FPU designs. Together with a proof of correctness of individual instructions
by model checking, it guarantees correctness of the FPU design as a whole. In our experience, no other
approach provides the level of automation and ease of use of the proposed method.
Motivation
• Floating-Point Units (FPUs) are inherently difficult to verify:
• Data path challenges
– Complex floating-point algorithms and hardware
E.g. alignment shifter, leading zero anticipator (LZA), rounding, …
– Intricate corner cases
E.g. denormal inputs/outputs, cancellation, sticky bits, …
• Control complexity
– Pipelined out-of-order speculative execution, microcode ops, ...
• Various verification techniques are deployed to verify FPUs
• Incomplete methods to find bugs
– Random/manual/targeted testcase generation, coverage analysis, …
– Bugs may slip into silicon (e.g. the Pentium FP bug)
• Complete (formal) methods to establish correctness
– Model checking (automatic) techniques
• Restricted to a single instruction issued in an empty pipeline (datapath verification)
– Higher-level reasoning
• Manual, requiring creation of dedicated models (end-to-end verification)
Contribution
• We propose to enhance automated methods to enable
verification of control aspects in addition to the data path
• Automated end-to-end verification of bit level FPUs
• Inclusive of control and data path
– Data path verified with model checking (existing state-of-the-art)
• Submit a single instruction in an empty pipeline
• Checks for “numerical correctness” of different ops
– Control related aspects verified with sequential equivalence checking
• The design serves as its own reference
• Instruction sequence submitted to allow inter-instruction interactions
• Allows leveraging internal equivalence points to alleviate capacity issues
• Results bear out effectiveness of the approach
Data path Verification
• Checks numerical correctness of FPU data path
• IEEE754 standard
• Implementation constraints (timing, area, power, performance)
• Fused-multiply-add (FMA) instruction: A*B + C
• Example bugs:
– if two nearly equal numbers are subtracted (causing
cancellation), the wrong exponent is returned
– if the result is near underflow, the wrong guard-bit is chosen
• Restricted to a single instruction issued in an empty FPU
• Influence of other instructions not considered
• Provides complete datapath coverage; remaining verification
resources may focus on other aspects (e.g., inter-instruction)
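
To make the FMA and cancellation points concrete, here is a minimal Python sketch. It is not the reference model used in the talk: `fma_reference` is a hypothetical helper that computes A*B + C exactly with rationals and rounds once, and the operands are chosen purely for illustration.

```python
from fractions import Fraction


def fma_reference(a: float, b: float, c: float) -> float:
    """Exact A*B + C followed by a single rounding to double precision."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)  # conversion rounds the exact rational once


def unfused(a: float, b: float, c: float) -> float:
    """Multiply, round, then add and round again (two roundings)."""
    return a * b + c


# Operands picked so that A*B is just below 1.0 and C = -1.0 cancels it.
a = 1.0 + 2.0 ** -27
b = 1.0 - 2.0 ** -27
c = -1.0

fused_result = fma_reference(a, b, c)   # keeps the tiny residue -2**-54
unfused_result = unfused(a, b, c)       # product rounds to 1.0, residue is lost
print(fused_result, unfused_result)
```

The exact product is 1 - 2^-54, which is not representable in double precision, so the unfused version loses the entire result to cancellation. A data path check has to cover exactly this kind of corner case.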
Datapath Verification Testbench
• A “driver” issues an instruction into both the real and the reference FPU
• A “checker” compares the results of the two FPUs for equality
[Figure: the same operands feed the reference model and the real FPU; the two results are compared for equality (=)]
• FP operations may be bounded by the longest-latency operation
• Verification problem is thus a bounded model check
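
The driver/checker arrangement can be sketched in Python on a toy scale. Everything here is an assumption made for illustration: the 4-bit width, the ripple-carry adder standing in for the “real” bit-level design, and exhaustive enumeration standing in for the model checker.

```python
from itertools import product

WIDTH = 4  # toy operand width (assumption; real FPUs use 64/128-bit operands)


def reference_add(a, b):
    """High-level reference model: integer addition truncated to WIDTH bits."""
    return (a + b) & ((1 << WIDTH) - 1)


def real_add(a, b):
    """Bit-level 'implementation': ripple-carry adder built from XOR/AND/OR."""
    result, carry = 0, 0
    for i in range(WIDTH):
        x, y = (a >> i) & 1, (b >> i) & 1
        result |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return result


def check_equivalence(real, reference):
    """Driver and checker: try every operand pair, return the first mismatch."""
    for a, b in product(range(1 << WIDTH), repeat=2):
        if real(a, b) != reference(a, b):
            return (a, b)
    return None  # no counterexample: the two models agree on all operands


counterexample = check_equivalence(real_add, reference_add)
print(counterexample)
```

Passing a deliberately broken implementation, for example `lambda a, b: a ^ b` (carries dropped), makes `check_equivalence` return a concrete counterexample. This mirrors how a failed proof reports a failing operand pair.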
Control Verification
• Verifies pipeline control, complex micro-architectural features
• Speculative execution, functional clock-gating, blocking, …
• Example bugs:
– If a speculatively executed instruction stream should not be executed
(e.g. due to branch not taken), does a ‘kill’ generate any side-effects?
– Does the issue of overlapping instructions cause resource conflicts?
– Does forwarding of data to subsequent instruction yield wrong result?
• Requires submission of continuous stream of instructions
• Activate inter-instruction interactions/dependencies
• Irrespective of previously executed instructions, or initial state
Control Verification Testbench
• The design serves as its own “reference”
• A “driver” issues a single instruction into the “reference” FPU and,
in addition, a sequence of instructions into the real FPU
• A “checker” compares the results of the “followed” instruction
[Figure: an instruction sequence drives the (real) FPU and a single instruction drives the (reference) FPU; the two results are compared for equality (=)]
• Verification problem is a sequential equivalence check
• Internal equivalences can be effectively leveraged
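
The self-reference idea can be illustrated with a small Python sketch. `ToyFPU`, its three-stage pipeline, the `operand + 1` operation and the `leak_bug` switch are all hypothetical stand-ins. Simulation over a few irritator operands replaces the sequential equivalence check of the talk.

```python
class ToyFPU:
    """Hypothetical 3-stage pipeline; every op computes operand + 1."""
    DEPTH = 3

    def __init__(self, leak_bug=False):
        self.stages = [None] * self.DEPTH
        self.leak_bug = leak_bug      # models a residual-state bug
        self.last_operand = 0         # state left behind by the previous op

    def step(self, instr):
        """Advance one cycle. instr is (tag, operand) or None for a bubble."""
        done = self.stages[-1]
        self.stages = [instr] + self.stages[:-1]
        if done is None:
            return None
        tag, operand = done
        result = operand + 1
        if self.leak_bug:
            result += self.last_operand & 1   # previous op leaks into this one
        self.last_operand = operand
        return tag, result


def followed_result(fpu, stream):
    """Drive the stream, drain the pipeline, return the 'followed' result."""
    for instr in list(stream) + [None] * ToyFPU.DEPTH:
        out = fpu.step(instr)
        if out is not None and out[0] == "followed":
            return out[1]
    return None


def self_reference_check(leak_bug, irritator_operands, followed_operand=10):
    """Return the first irritator operand that changes the followed result."""
    followed = ("followed", followed_operand)
    for x in irritator_operands:
        real = followed_result(ToyFPU(leak_bug), [("irritator", x), followed])
        reference = followed_result(ToyFPU(leak_bug), [None, followed])
        if real != reference:
            return x
    return None


print(self_reference_check(False, range(4)), self_reference_check(True, range(4)))
```

Both instances are the same design, so the check says nothing about numerical correctness. It only detects that a preceding instruction changed the followed result, which is why it complements the single-instruction data path proof.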
Conditional Equivalence
• A single instruction of the sequence is executed in both FPUs
• Restricted to conditional equivalence (not general SEC)
• Pipeline stages in which the “followed” instruction is active
should be equivalent in a specific cycle
[Figure: the pipelines of both FPUs; legend: other instruction, inactive stage, active pipeline stage holding the followed instruction; stages holding the followed instruction are compared for equality (=)]
• Final check only on the result of the “followed” instruction
• Bounded checking allows the pipeline to be unfolded; only equivalent
pipeline stages should be in the result property's cone of influence (COI)
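
A minimal Python sketch of the conditional check follows. The three-stage pipeline, the `+ 1` per stage and the tag names are illustrative assumptions. Stage contents are compared only where the reference pipeline holds the followed instruction; all other stages carry no proof obligation.

```python
DEPTH = 3  # toy pipeline depth (assumption)


def simulate(stream):
    """Return one snapshot of all stages per cycle; a stage holds (tag, value) or None."""
    stages = [None] * DEPTH
    trace = []
    for instr in list(stream) + [None] * DEPTH:
        # Each op moves one stage forward and its value is incremented on the way.
        advanced = [None if s is None else (s[0], s[1] + 1) for s in stages[:-1]]
        stages = [instr] + advanced
        trace.append(list(stages))
    return trace


def conditional_equivalence(real_trace, ref_trace, followed="followed"):
    """Compare only the (cycle, stage) pairs where the followed op is active."""
    checked = 0
    for cycle, (real, ref) in enumerate(zip(real_trace, ref_trace)):
        for stage, (r, f) in enumerate(zip(real, ref)):
            if f is None or f[0] != followed:
                continue  # inactive stage: no equivalence required here
            if r != f:
                return False, (cycle, stage)
            checked += 1
    return True, checked


real_trace = simulate([("irritator", 5), ("followed", 7)])
ref_trace = simulate([None, ("followed", 7)])
result = conditional_equivalence(real_trace, ref_trace)
print(result, real_trace == ref_trace)
```

The two traces differ wherever the irritator occupies a stage, so a general equivalence check would fail here. The conditional check still passes because it follows only the one instruction common to both pipelines.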
Sequential Equivalence Tenets
• Several degrees of equivalence/correctness:
• Identical result of “followed” instruction regardless of initial state
‒ Possible with model checking if legal initial states are known
‒ Manual computation of initial states tedious for complex pipelines
• “Followed” instruction not influenced by “residual states”
‒ Both FPUs should be equivalent for the “followed” instruction
irrespective of a previously executed instruction
• All timing-windows need to be considered between instructions
‒ Requires an infinite sequence of instructions
‒ Infinite sequence made finite to allow bounded checking
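
The timing-window point can be sketched as follows. This is a toy model: `BUSY`, the shared-unit conflict and the `buggy` switch are assumptions for illustration. The sketch enumerates every issue gap between an irritator and the followed instruction up to a finite bound and reports the gaps in which the followed result changes.

```python
BUSY = 2  # cycles an irritator holds a shared unit after issue (toy assumption)


def run(stream, buggy):
    """Return the 'followed' result for a per-cycle issue stream of (tag, operand) or None."""
    busy_until = -1
    for cycle, instr in enumerate(stream):
        if instr is None:
            continue
        tag, operand = instr
        if tag == "irritator":
            busy_until = cycle + BUSY
        elif tag == "followed":
            # Hypothetical bug: issuing while the shared unit is busy corrupts the result.
            conflict = buggy and cycle <= busy_until
            return operand + 1 + (1 if conflict else 0)
    return None


def failing_windows(buggy, max_gap):
    """List the issue gaps for which the irritator changes the followed result."""
    reference = run([("followed", 10)], buggy)
    bad = []
    for gap in range(max_gap + 1):
        stream = [("irritator", 3)] + [None] * gap + [("followed", 10)]
        if run(stream, buggy) != reference:
            bad.append(gap)
    return bad


print(failing_windows(True, 4), failing_windows(False, 4))
```

In this toy model a back-to-back check (gap 0) alone would miss the failure at gap 1. Gaps larger than `BUSY` cannot interact, which is what allows the sequence to be made finite.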
Verification Technology
• SAT-based Bounded Model Check
• Performs a satisfiability check on a k-step unfolded netlist
• Hybrid SAT-engine
– Integrates structural netlist transformations, BDDs, simulation, CNF
clauses and SAT procedure in one framework
• Conditional equivalence checking
• Automatic checkers for pipeline stages getting activated
‒ Added for every stage – either proven or disproven
• Leveraged as “lighthouses” to enable end-to-end SAT check
• Encapsulated as engines in IBM’s semi-formal tool SixthSense
• Uses a Transformation Based Verification (TBV) paradigm that
maximally exploits synergy between algorithms
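
As a rough illustration of a bounded check over a k-step unfolding, the Python sketch below explores a small transition system explicitly. Breadth-first enumeration stands in for the SAT check on the unfolded netlist, and it is not how SixthSense works internally. The saturating counter, `bounded_check` and its parameters are hypothetical.

```python
def bounded_check(init, step, bad, inputs, k):
    """Return the shortest input trace (length <= k) reaching a bad state, else None."""
    frontier = [(init, ())]
    seen = {init}
    for _depth in range(k + 1):
        next_frontier = []
        for state, trace in frontier:
            if bad(state):
                return trace  # counterexample: inputs leading to the violation
            for i in inputs:
                nxt = step(state, i)
                if nxt not in seen:  # a state is kept at its minimal depth only
                    seen.add(nxt)
                    next_frontier.append((nxt, trace + (i,)))
        frontier = next_frontier
    return None  # property holds for all runs of at most k steps


def counter_step(state, inc):
    """Toy design: a counter that saturates at 5."""
    return min(state + inc, 5)


def reaches_three(state):
    return state == 3


within_two = bounded_check(0, counter_step, reaches_three, (0, 1), k=2)
within_three = bounded_check(0, counter_step, reaches_three, (0, 1), k=3)
print(within_two, within_three)
```

The bound matters: with k = 2 no violation is visible, while k = 3 exposes it. For the FPU proofs, the bound follows from the longest-latency operation.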
Verification Results – Setup
• Single instruction checks
• FPU vs high level reference model
• 45 instructions require case-splits
• 24 instructions covered by semi-formal
• 410 instructions fully covered
• Model: 10k variables/ 100k latches/ 3352k ANDs
• Instruction sequence checks
• FPU (sequence) vs FPU (with single followed op)
• Different types of instruction:
• Pipelined
• Fixed latency multicycle
• Variable latency multicycle
• 9 scenarios of sequence types defined
• Two models:
• B2B issue only
• Infinite sequences
• Model: 7.6k variables/ 254k latches/ 1398k ANDs
Results – Single Instruction
Instruction                                           | Runtime   | Memory
64bit Binary-FP ADD, overlap-case (369)               | 3min:50s  | 1.5GB
64bit Binary-FP ADD, cancellation-case (168)          | 7min:51s  | 1.5GB
128bit Decimal-FP ADD, overlap-case (26388)           | 4min:28s  | 1.5GB
128bit Decimal-FP shift, single test                  | 17min:04s | 1.5GB
128bit Hex-FP convert to 64bit Integer, single test   | 18min:15s | 1.3GB
64bit Binary-FP divide, semi formal only              | >24h      |
Running on Linux™ 2.6 64bit, Xeon™ E5-2680 2.7GHz
Results – Sequences
Followed instruction               | Irritator instruction                               | Runtime     | Memory
Pipelined (extract exponent)       | Pipelined (convert decimal integer to decimal fp)   | 1min:07s    | 1GB
                                   | Fixed latency (128bit decimal fp add)               | 1min:14s    | 0.94GB
                                   | Variable latency (convert binary fp to decimal fp)  | 21min:17s   | 1.1GB

                                   | Pipelined (convert decimal integer to decimal fp)   | 1min:52s    | 1.1GB
                                   | Fixed latency (128bit decimal fp add)               | 1min:22s    | 1GB
                                   | Variable latency (convert binary fp to decimal fp)  | 1:13min:22s | 3.6GB

                                   | Pipelined (convert decimal integer to decimal fp)   | 13min:29s   | 1.3GB
                                   | Fixed latency (128bit decimal fp add)               | 24min:37s   | 1.8GB
                                   | Variable latency (convert binary fp to decimal fp)  | 6h:6min:17s | 7GB

Fixed latency (compare decimal fp) |                                                     |             |
Conclusions and Future work
• Presented an end-to-end automated approach to verify FPUs
• Inclusive of dataflow and control
• Dataflow verified instruction-by-instruction against reference
• Control verified via a sequential equivalence check
• Future Work
– Extend B2B sequences to random sequences – cover all
possible sequences
• Random sequences with pipelined instructions solvable
• Random sequences with multicycle instructions unsolved in 24h
– Include forwarding of operands
• Internal equivalences do not hold due to latency differences
Related Work
• Intel™ uses a combination of automatic methods and STE
• Published in CAV 2009 and FMCAD 2012
• Results indicate that most defects are attributed to STE
• Likely requires manual, implementation-specific effort
• Full details for reproducibility not disclosed
• Most other works focus on data path verification
• Focus on specific instructions and design artifacts
• E.g. FMA instruction together with multiplier
• Largely manual, as they rely on methods such as theorem proving
• Tedious proofs which are implementation specific
• If automatic, they use special-purpose data structures
• E.g. Chen’98 uses PHDDs vs SAT/BDDs