ppt - Ronald F. DeMara - University of Central Florida

A Combinatorial Group Testing Method for FPGA Fault Location
Ronald F. DeMara, Carthik A. Sharma
University of Central Florida
Introduction
Field Programmable Gate Arrays
• Gate-array-based reconfigurable architecture
  - Matrix of Logic Cells (Look-Up Tables) surrounded by peripheral I/O cells
• Capabilities:
  - Runtime reconfiguration
  - On-chip processor core & millions of gate-equivalent logic elements
• Millions of FPGA devices produced annually: most SRAM-based
• Used in mission-critical applications
  - Remote systems & hazardous environments
  - Space applications: satellites, probes, and shuttles
Group Testing Algorithms
• Origin: World War II blood testing
  - Problem: Test samples from millions of new recruits
  - Solution: Test blocks of samples before testing individual samples
• Problem Definition
  - Identify subset Q of defectives from set P
  - Minimize number of tests
  - Test v-subsets of P
  - Form suitable blocks
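The blood-testing idea above can be sketched as a two-stage procedure: test each block once, then test individuals only inside blocks that came back positive. This is a minimal illustration, not the method used later in the deck; the population `P`, the defective subset `Q`, the block size, and the `pooled` test function are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def two_stage_group_test(
    items: Sequence[int],
    group_is_positive: Callable[[Sequence[int]], bool],
    block_size: int,
) -> Tuple[List[int], int]:
    """Return (defective items found, number of tests used)."""
    defectives: List[int] = []
    tests = 0
    for start in range(0, len(items), block_size):
        block = items[start:start + block_size]
        tests += 1                      # stage 1: one pooled test per block
        if not group_is_positive(block):
            continue                    # whole block cleared by a single test
        for item in block:              # stage 2: test members individually
            tests += 1
            if group_is_positive([item]):
                defectives.append(item)
    return defectives, tests


P = list(range(100))                    # hypothetical population
Q = {17, 63}                            # hypothetical defective subset
pooled = lambda group: any(x in Q for x in group)

found, tests = two_stage_group_test(P, pooled, block_size=10)
print(found, tests)                     # 30 tests instead of 100 individual ones
```

With two defectives in 100 items and blocks of 10, the sketch spends 10 block tests plus 20 individual tests.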
Previous Work
• Pre-compiled Column-Based Dual FPGA architecture [Mitra04]
  - Autonomous detection, repair by shifting pre-compiled columns
  - Isolation using distributed CED-checkers and “blind” reconfiguration attempts
• Overview of Combinatorial Group Testing and Applications [Du00]
  - Provides taxonomy and general algorithms for applying CGT
  - Examples of CGT applications: DNA clone library filtering, vaccine screening, computer fault diagnosis, etc.
• CGT Enhanced Circuit Diagnosis [Kahng04]
  - Presents doubling, halving, etc. for circuit fault diagnosis using BIST and CGT
  - Requires ability to test resources individually
• Chinese Remainder Sieve technique [Eppstein05]
  - Efficient non-adaptive and two-stage CGT based on prime-number-driven test formation
  - Improved algorithms for practical problem sizes (n < 1080) with small number of defectives (d < 4)
Fault-Handling Techniques
[Figure: taxonomy of fault-handling techniques, organized by device failure characteristics.]
• Duration: Transient (SEU); Permanent (SEL, Oxide Breakdown, Electron Migration, LPD)
• Target: Device Configuration; Processing Datapath
• Approach: TMR; BIST Methods; STARS; CED; Supplementary Testbench; CGT-Based Dueling; Evolutionary Algorithm using Intrinsic Fitness Evaluation
• Detection: Bitwise Comparison; Repetitive Readback; Majority Vote; Duplex Output Comparison
• Isolation: Cartesian Intersection; Repetitive Intersections; Fast Run-time Location; Worst-case Clock Period Dilation; unnecessary
• Diagnosis and Recovery: Invert Bit Value; Ignore Discrepancy; Replicate in Spare Resource; Select Spare Resource
Isolation Problem Outline
Objectives
• Locate faulty logic and/or interconnect resource: a single stuck-at fault model is assumed
• Online Fault Isolation: device not entirely removed from service
Features
• Runtime Reconfiguration: FPGA resources configured dynamically
• Utilize Runtime Inputs: avoid special test-vectors, improve availability
Constraints
• Use pre-designed configurations: defined by target application
• Subsets under test have constant resource utilization range for a given isolation problem
• Resource grouping influences fault articulation: resource-mapping and input vector might mask hardware faults
• Do not use specialized “block designs”
• Runtime reconfiguration limited to column-swapping
• “Non-reasonable” algorithm: “tests” may be repeated without gaining new isolation information
Fault Location Using Dueling
The set of all competing configurations is represented by S.
Set Ck represents the resources utilized by configuration k.
Each competing configuration k, 1 ≤ k ≤ |S|, has a unique binary Usage Matrix Uk, 1 ≤ k ≤ p.
Uk has elements Uk[i,j], 1 ≤ i ≤ m, 1 ≤ j ≤ n, where m and n are the numbers of rows and columns in the device layout respectively.
Elements Uk[i,j] = 1 denote the usage of resource (i, j) by Ck.
The History Matrix H, with elements H[i,j], 1 ≤ i ≤ m, 1 ≤ j ≤ n, is an integer matrix used to represent the relative fitness of individual resources.
H[i,j] provides instantaneous relative fitness values of resources.
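A small sketch may help make the two matrices concrete. The update rule used here (add 1 to every used resource when a configuration shows a discrepancy, subtract 1 otherwise) is an assumption for illustration, as are the 4 x 4 layout, the resource sets `C1` and `C2`, and the fault position; indices are 0-based in the code.

```python
from typing import List, Set, Tuple

Matrix = List[List[int]]


def zeros(m: int, n: int) -> Matrix:
    """History matrix H with every entry 0."""
    return [[0] * n for _ in range(m)]


def usage_matrix(m: int, n: int, used: Set[Tuple[int, int]]) -> Matrix:
    """Binary Uk: 1 where resource (i, j) is used by configuration k."""
    return [[1 if (i, j) in used else 0 for j in range(n)] for i in range(m)]


def update_history(H: Matrix, U: Matrix, discrepancy: bool) -> None:
    """Assumed rule: +1 for used resources on a discrepancy, -1 otherwise."""
    step = 1 if discrepancy else -1
    for i, row in enumerate(U):
        for j, used in enumerate(row):
            H[i][j] += step * used


def most_suspect(H: Matrix) -> List[Tuple[int, int]]:
    """All positions holding the maximum value of H."""
    peak = max(max(row) for row in H)
    return [(i, j) for i, row in enumerate(H)
            for j, v in enumerate(row) if v == peak]


m, n = 4, 4
fault = (2, 2)                               # hypothetical faulty resource
C1 = {(0, 1), (2, 2), (3, 0)}                # hypothetical resource sets
C2 = {(1, 3), (2, 2), (2, 0)}

H = zeros(m, n)
for C in (C1, C2):
    # Assumes the runtime inputs always articulate the fault when it is used.
    update_history(H, usage_matrix(m, n, C), discrepancy=fault in C)

print(most_suspect(H))
```

Only the resource shared by both discrepant configurations reaches the value 2, so it is the unique maximum of H.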
Dueling Example
[Figure: History Matrix H[i,j] at t=0 (all entries 0) and at t=2, shown alongside the Usage Matrices U1 and U2. Individual matrix entries are not reproduced here.]
• H [i,j] changes after C1 and C2 are loaded
• U1 and U2 are corresponding Usage Matrices
• (3,3) is identified as the faulty resource
Modified Halving
[Flowchart: Modified Halving]
• Initiate H Matrix: initially all H[i,j] = 0
• Select & Load Competing Configurations (selection process can be adaptive)
• Decision: Discrepancy? Depending on the outcome, increment or decrement the corresponding H matrix elements (fitness augmentation can be non-linear)
• Decision: Unique Max in H? If yes, return indices of faulty element
• Decision: Stasis after n Iterations? If yes, swap 50% of suspect columns (columns can be swapped with any other columns); if no, continue selecting and loading configurations
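The column-swapping step can be sketched as a halving search. This is a simplified illustration under stated assumptions: a single faulty column, spare columns that are fault-free, and runtime inputs that always articulate the fault. The function name, the `shows_discrepancy` callback, and the 36 suspect columns are hypothetical.

```python
from typing import Callable, List, Tuple


def isolate_column_by_halving(
    suspects: List[int],
    shows_discrepancy: Callable[[List[int]], bool],
) -> Tuple[int, int]:
    """Narrow a list of suspect columns to one; return (column, iterations)."""
    iterations = 0
    while len(suspects) > 1:
        half = len(suspects) // 2
        kept, swapped_out = suspects[:half], suspects[half:]
        iterations += 1
        # Run with only `kept` still in use; `swapped_out` columns are
        # replaced by unused columns, assumed here to be fault-free.
        suspects = kept if shows_discrepancy(kept) else swapped_out
    return suspects[0], iterations


faulty_column = 13                           # hypothetical fault position
oracle = lambda used: faulty_column in used  # always-articulating inputs

column, steps = isolate_column_by_halving(list(range(36)), oracle)
print(column, steps)
```

Each iteration discards roughly half of the remaining suspects, so the suspect set shrinks geometrically once stasis has been detected.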
FPGA Arrangement for Dueling
Configurations in Population
• C = CL ∪ CR
• CL = subset of left-half configurations
• CR = subset of right-half configurations
• |CL| = |CR| = |C|/2
[Diagram: SRAM-based FPGA holding an L Half-Configuration and an R Half-Configuration, each with its own Function Logic and Discrepancy Check. Other labels: INPUT DATA, DATA OUTPUT, CONTROL, FEEDBACK, Reconfiguration Algorithm, CONFIGURATION BIT STREAM, OFF-CHIP EEPROM.]
( NOTE: a non-volatile memory is already required to boot any SRAM FPGA from cold start ... this is not an additional chip )
Isolation Progress without Halving
[Figure: Number of Suspected Faulty Elements (log scale) vs. Number of Iterations, without halving. Annotation: temporary stasis in isolation due to insufficient design diversity.]
• Resource Utilization = 40%
• Initially |S| = 20,000
• Number of suspected faulty elements constant at 36 after 23 iterations
• No subsequent improvement due to lack of differentiating information between competing configurations
Dueling with Modified Halving
[Figure: Number of Suspected Faulty Elements (log scale) vs. Number of Iterations, dueling with halving. Annotation: symptoms of stasis invoke halving procedure for fast isolation.]
• Halving works by swapping half the used columns with unused ones
• Halving progressively reduces the size of the set of suspected faulty elements
• Isolation proceeds till a single faulty element is isolated
• Fault isolated after 19 iterations
Effect of Total Number of Elements
[Figure: Average Number of Iterations for Fault Isolation vs. Number of Rows and Columns in Device (increased problem size). Population Size = 40, Resource Utilization = 50%.]
• Number of Elements = (Number of Rows) x (Number of Columns)
• As the size of the array containing the fault increases, the increase in the required number of iterations is minimal
• For 1 million elements, only 27.4 iterations are required.
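For scale, the reported 27.4 iterations can be set beside the count an idealized search would need if every test cut the suspect set exactly in half. That idealized model is an assumption introduced here for comparison only, and the 1000 x 1000 layout is one hypothetical way to reach 1 million elements.

```python
import math

rows = cols = 1000                      # hypothetical square device layout
elements = rows * cols                  # Number of Rows x Number of Columns
ideal_halvings = math.log2(elements)    # perfect halving, one test per step

print(elements, round(ideal_halvings, 2))
```

The reference count grows logarithmically with the number of elements, in line with the minimal growth in iterations noted above.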
Effect of Population Size
[Figure: Average Number of Iterations for Fault Isolation vs. Population Size. Resource Utilization = 50%, Number of Resources = 40000. Annotation: increased population size provides minimal added benefit.]
• Single fault in S is assumed
• As population size increases, isolation is expected to be faster
• Increased population size implies more initial designs
• A population size of 30 seems to be an ideal tradeoff between ease of isolation and the difficulty of generating an increased number of individuals.
Effect of Resource Utilization
[Figure: Average Number of Iterations for Fault Isolation vs. Resource Utilization (%), with curves for Population Size = 20 and Population Size = 40. Number of Resources = 40000.]
• Moderate resource utilization is ideal for isolation
• Rate of isolation progress is low with extreme utilization characteristics
• Isolation takes longer when less than 20% or greater than 80% of the available resources are utilized.
Future Work
• Conducting Tests using Benchmark Circuits
  - ISCAS89 s38584 with 11448 gates: sequential logic
  - ISCAS85 circuits with max 3513 gates: combinational logic
  - Compression/Signal Processing algorithms, such as the Lempel-Ziv (LZ) compression scheme [Mitra04]
• Development of an architecture to enable column-swapping
  - Multi-layer Runtime Reconfigurable Architecture (MRRA) being prototyped
Backup Slides
• On following pages …
Online Dueling Evaluation
• Objective
  - Isolate faults by successive intersection between sets of FPGA resources used by configurations
  - Analyze complexity of isolation process
• Variables
  - Total resources available: measured in number of LUTs
  - Number of Competing Configurations: number of initial “Seed” designs in CRR process
  - Degree of Articulation: some inputs may not manifest faults, even if faulty resource is used by individual
  - Resource Utilization Factor: percentage of FPGA resources required by target application/design
  - Number of Iterations for Isolation: measure of complexity and time involved in isolating fault
Discrepancy Mirror Circuit
Fault Coverage

Component           Fault Scenarios                                                         Fault-Free
Function Output A   Fault          Correct        Correct               Correct               Correct
Function Output B   Correct        Fault          Correct               Correct               Correct
XNOR A              Disagree (0)   Disagree (0)   Fault: Disagree (0)   Agree (1)             Agree (1)
XNOR B              Disagree (0)   Disagree (0)   Agree (1)             Fault: Disagree (0)   Agree (1)
Buffer A            0              0              High-Z                0                     1
Buffer B            0              0              0                     High-Z                1
Match Output        0              0              0                     0                     1
Influence of LUT utilization
Perpetually Articulating Inputs with Equiprobable Distribution
• Expected number of pairings grows sub-linearly in the number of resources
• Utilization below 20% or above 80% implicates (or exonerates) a smaller subset of resources
• At 50% utilization, the expected numbers of pairings for 1,000, 10,000, and 100,000 resources are 11.1, 14.9, and 17.6
Intermittently Articulating Inputs with Equiprobable Distribution
• At 90% utilization, a mean of 258 pairings is required to isolate the faulty resource.
Accommodating Multi-bit Word Widths
• Proof of concept
  - The present circuit works efficiently
  - Demonstrates important Dueling-enabled isolation method
• Strategies
  - Use an array of detectors
    - Attempt to minimize points of failure as word-width increases
    - Number of logic resources used is acceptable for smaller circuits
  - Create new circuit or scheme, combining fault-tolerant coding-based methods with single-fault-secure circuit
  - Current research focused on improving detector by investigating codes and fault-secure circuits
Pull-down Resistor Considerations
• Proof of concept
  - The present circuit works in a verifiably correct manner
  - Can utilize synthesized (digital) pull-down resistors which simulate the behavior of analog resistors
  - Demonstrates Dueling-enabled isolation method
  - Can be utilized without implementation problems for Custom-VLSI designs
• Alternative Approach
  - Alternate detector circuits for FPGA implementation are under investigation
  - Avoid using tri-state buffers and pull-down resistors; use native digital components available on FPGAs
Competitive Runtime Reconfiguration (CRR)
Evolutionary Computation strategies are effective for more than just the repair phase: continually detect, rank, and isolate faults entirely within the underlying data throughput flow.
[Flowchart: PRIMARY LOOP of Initialization, Selection, Detection, and Fitness Adjustment, followed by the decision "is either L's or R's fitness < Repair Threshold?" (NO / YES: invoke Genetic Operators) and Adjust Controls.]
• Initialization: population partitioned into functionally-identical yet physically-distinct half-configurations; diverse alternatives working a priori
• Selection: choose FPGA configuration(s) labeled L and R
• Detection: apply functional inputs to compute FPGA outputs using L, R; fault detection by robust consensus (L=R) over time; no test vectors
• Fitness Adjustment: update fitness of only L and R based on detection results; graceful degradation via ranking of alternatives
• Genetic Operators: invoked only once and only on L or R; device remains online during repair; completely-repaired criteria can be ignored
• Adjust Controls: detection mode, overlap interval, ...; performance readily adjustable
• No reconfiguration when fault-free (discrepancy free)
• Conceptual innovation: novel fitness assessment via pairwise discrepancy without any pre-conceived oracle for correctness (emergent behavior)
• Fault isolation is model-free and self-calibrating
• Checking logic is part of the individual and hence also competes for correctness; failures in population memory are covered
[Diagram: SRAM-based FPGA holding an L Half-Configuration and an R Half-Configuration, each with its own Function Logic and Discrepancy Check. Other labels: INPUT DATA, DATA OUTPUT, CONTROL, FEEDBACK, Reconfiguration Algorithm, CONFIGURATION BIT STREAM, OFF-CHIP EEPROM.]
( NOTE: a non-volatile memory is already required to boot any SRAM FPGA from cold start ... this is not an additional chip )
Configuration Health States
State transitions during the lifetime of the i-th half-configuration
[Figure: state-transition diagram with states primordial, pristine, suspect, under repair, and refurbished (partial repair, complete repair). Transitions 1 through 11 are labeled with competition outcomes (L=R or L≠R) and with comparisons of the fitness fi against the thresholds fOT and fRT. EVOLUTION is integral with COMPETITION.]
Discrepancy Operator
• Baseline Discrepancy Operator is a dyadic operator with binary output:
  discrepancy(CiL, CiR) = 0 if Z(CiL) = Z(CiR), 1 otherwise
• Z(Ci) is the FPGA data throughput output of configuration Ci
• WTA (Equivalence): combines CiL[j] EOR CiR[j] over the output bits j into a single equivalence result
• RS (Hamming Distance): sum over the output bits j of CiL[j] EOR CiR[j]
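The two readings of the discrepancy operator, a binary agree/disagree result and a Hamming-distance count, can be sketched on output words represented as bit lists. The function names and the sample words are hypothetical, and the bit-list representation is an assumption made for illustration.

```python
from typing import Sequence


def baseline_discrepancy(z_left: Sequence[int], z_right: Sequence[int]) -> int:
    """Binary output: 0 if the two output words agree, 1 otherwise."""
    return 0 if list(z_left) == list(z_right) else 1


def hamming_discrepancy(z_left: Sequence[int], z_right: Sequence[int]) -> int:
    """Sum of the bitwise EOR of the two output words (Hamming distance)."""
    return sum(a ^ b for a, b in zip(z_left, z_right))


z_l = [1, 0, 1, 1]                      # hypothetical output Z of the L half
z_r = [1, 1, 1, 0]                      # hypothetical output Z of the R half

print(baseline_discrepancy(z_l, z_r), hamming_discrepancy(z_l, z_r))
print(baseline_discrepancy(z_l, z_l), hamming_discrepancy(z_l, z_l))
```

The binary form only records that L and R disagree, while the Hamming form also records by how many output bits.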
Procedural Flow under Consensus-Based Evaluation
[Flowchart: PRIMARY LOOP]
• Initialization: population partitioned into functionally-identical yet physically-distinct half-configurations
• Selection: choose FPGA configuration(s) labeled L and R
• Detection: apply functional inputs to compute FPGA outputs using L, R; outcome is L=R (discrepancy free) or L≠R
• Fitness Adjustment: update fitness of only L and R based on detection results
• Decision: is either L's or R's fitness < Repair Threshold? NO: continue the primary loop. YES: invoke Genetic Operators, only once and only on L or R
• Adjust Controls: detection mode, overlap interval, ... based on L, R results
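The primary loop can be sketched in a few lines. Several simplifications are assumptions made for this illustration and are not taken from the slides: pairs are selected round-robin rather than adaptively, a discrepancy appears whenever exactly one of L and R is faulty, fitness moves by plus or minus 1, and the genetic operators are reduced to recording which individual would be sent to repair. All names and values are hypothetical.

```python
from itertools import cycle, islice, product
from typing import Dict, List, Sequence, Set, Tuple


def crr_primary_loop(
    left: Sequence[str],
    right: Sequence[str],
    faulty: Set[str],
    steps: int,
    repair_threshold: int,
) -> Tuple[Dict[str, int], List[str]]:
    """Return (fitness per half-configuration, individuals sent to repair)."""
    fitness = {name: 0 for name in list(left) + list(right)}
    sent_to_repair: List[str] = []
    # Selection: round-robin over all (L, R) pairs, for determinism.
    for L, R in islice(cycle(product(left, right)), steps):
        # Detection: outputs disagree when exactly one half is faulty.
        discrepant = (L in faulty) != (R in faulty)
        # Fitness adjustment: update only the two individuals just used.
        delta = -1 if discrepant else 1
        fitness[L] += delta
        fitness[R] += delta
        # Repair decision: flag each individual at most once.
        for name in (L, R):
            if fitness[name] < repair_threshold and name not in sent_to_repair:
                sent_to_repair.append(name)
    return fitness, sent_to_repair


left = ["L0", "L1", "L2", "L3"]
right = ["R0", "R1", "R2", "R3"]
fitness, repairs = crr_primary_loop(left, right, faulty={"L0"},
                                    steps=48, repair_threshold=-5)
print(repairs, fitness["L0"], fitness["R0"], fitness["L1"])
```

Fault-free partners of the faulty individual lose fitness only occasionally and recover through their other pairings, so only the faulty individual falls below the threshold.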
Regeneration
• Initialization
  - Partition P into sub-populations of size |P|/2 to designate physical FPGA resource utilization: left-half CL, right-half CR
• Consensus Based Evaluation
  - Based on the Discrepancy Operator
  - Four Fitness States: Pristine, Suspect, Under Repair, Refurbished
• Genetic Operators
  - Based on Reintroduction Rate
  - Operators applied only once, then offspring returned to “service” without concern about increasing fitness
GA Parameters & Experiments
GA parameters
• Population size: 20 individuals
• Crossover rate: 5%
• Mutation rate: up to 80% per bit
GA operators
• External-Module-Crossover
• Internal-Module-Crossover
• Internal-Module-Mutation
  - Two-point crossover between individuals from same sub-group
  - Crossover points chosen to prevent intra-CLB crossover
• Speciation
  - Breeding occurs exclusively among members of sub-populations
  - Maintains non-interfering resource use among L, R
Experiments …
• Fault Isolation Characteristics
• Regenerative Experiments
Demonstrate …
• Objective fitness function replaced by the Consensus-based Evaluation Approach and Relative Fitness
• Elimination of additional test vectors
Impact of Fault on Viable Individuals
• Existence of Positive Test Vector
  - Input Ip is a positive test vector iff the outputs Cv(Ip) and Cf(Ip) are discrepant (discrepancy = 1), where Cv denotes a viable configuration and Cf denotes a faulty configuration
  - So if a discrepancy is visible, then some Ip exists which manifests the fault
• Minimal Case when Ip is Unique
  - Ip is unique if the fault is observable under exactly one test vector
• Probability Mass Function for Encountering Ip in Minimal Case
  - Consider Ew = 600 yielding 99.5% coverage for a module with input space W = 64
  - The number of input occurrences i, 0 ≤ i ≤ 600, that randomly encounter Ip to identify the fault is governed by the probability mass function:
    p.m.f.(i) = \binom{D}{i} (n/W)^{i} ((W-n)/W)^{D-i}
    where D = 600, W = 64, n = 1, 0 ≤ i ≤ 600, and D is the length of Ew
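As a numerical check, the distribution can be evaluated under the assumption that each of the D inputs in the evaluation window independently hits Ip with probability n/W, which gives a binomial model. That independence assumption and the function name `pmf` are introduced here for illustration; D, W, and n take the values stated above.

```python
from math import comb


def pmf(i: int, D: int = 600, W: int = 64, n: int = 1) -> float:
    """Binomial probability that exactly i of D inputs encounter Ip."""
    p = n / W                           # chance that one input equals Ip
    return comb(D, i) * p**i * (1 - p)**(D - i)


D, W, n = 600, 64, 1
total = sum(pmf(i) for i in range(D + 1))
mean = sum(i * pmf(i) for i in range(D + 1))

print(round(total, 6), round(mean, 3))
```

Under this model the mean number of encounters is D * n / W = 600 / 64 = 9.375.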
Isolation of a single faulty individual with 1-out-of-64 impact
• Outliers are identified after EW iterations have elapsed
• Expected DV = (1/64)*600 = 9.375 from individual impacted by fault
• Isolated individual’s DV differs from the average DV by 3 after 1 or more observation intervals of length EW
Isolation of a single faulty L individual with 10-out-of-64 impact
Compare with 1-out-of-64 fault impact
• Expected DV of (10/64)*600 = 93.75 for faulty configuration
• One isolation will be complete approx. once in every 93.75/5 = 19 Sliding Windows
• Fault Isolation achieved is 100%
Isolation of 8 faulty individuals L4&R4 with 1-out-of-64 impact
• Expected isolations do not occur approx. 40% of the time
  - Average discrepancy value of the population is higher
  - Outlier isolation difficult
  - Multiple faulty individuals, discrepancies scattered
Regeneration Performance
3x3 Multiplier

Experiment  Fault Location    Failure     Correctness  Total       Discrepant  Repair      Final        Effective
Number                        Type        after Fault  Iterations  Iterations  Iterations  Correctness  Throughput
1           CLB3,LUT0,Input1  Stuck-at-1  52 / 64      17920100    421123      1194        64 / 64      97.65
2           CLB6,LUT0,Input1  Stuck-at-0  33 / 64      802050      17034       47          64 / 64      97.87
3           CLB5,LUT2,Input0  Stuck-at-1  22 / 64      3134660     68027       193         64 / 64      97.83
4           CLB7,LUT2,Input0  Stuck-at-0  38 / 64      8158280     185193      513         64 / 64      97.73
5           CLB9,LUT0,Input1  Stuck-at-0  40 / 64      2332670     71613       219         64 / 64      96.93
Average                                   32.6 / 64    6469550     152598      433         64 / 64      97.6

Parameters:
• Difference (vs. Hamming Distance)
• Evaluation Window, Ew = 600
• Suspect Threshold: DVS = 1 - 6/600 = 99%
• Repair Threshold: DVR = 1 - 4/600 = 99.3%
• Re-introduction rate: r = 0.1
Repairs evolved in-situ, in real-time, without additional test vectors, while allowing device to remain partially online.
Multilayer Runtime Reconfiguration Architecture (MRRA)
[Diagram: Fault-Repair Genetic Algorithm, Control System, Microprocessor, Reconfiguration Engine, RAM, and Virtex-II Pro FPGA connected by a System Bus.]
• Develop MRRA fast reconfiguration paradigm for the CRR approach
• Validate with real hardware platform along with detailed performance analysis
• First general-purpose framework for a wide variety of applications requiring dynamic reconfiguration
• Extend existing theories on reconfiguration
Loosely Coupled Solution
[Diagram: Avnet FPGA Development Board carrying the Virtex-II Pro FPGA and Off-Chip RAM, connected through a PCI Interface to Control hosted on a PC. Signals: Input Data, Bit file, FPGA Output.]
The entire system operates on a 32-bit basis.
The Virtex-II Pro is mounted on a development board which can then be interfaced with a workstation running Xilinx EDK and ISE.
For further info … EH Website
http://cal.ucf.edu