FPGA interconnect planning

Amit Singh
Xilinx Inc.
San Jose, CA
amit.singh@xilinx.com
Malgorzata Marek-Sadowska
University of California, Santa Barbara
Santa Barbara, CA
mms@ece.ucsb.edu
Outline
¾ Introduction
¾ Previous Work
¾ Architecture Model
• Architecture & Design Rent’s exponent
¾ Clustering
• Spatial regularity
¾ Fanout distribution
• Area minimization
• Area-delay minimization
• Performance
¾ Conclusions
Introduction
¾ Clustered FPGAs
• System-on-Chip (SOC)
¾ Performance, Power, Area
• Matching design and architecture complexities
• Circuit packing/clustering
¾ Fanout distribution (Zarkesh-Ha et al. 2000)
• Rent’s rule
¾ Segment length planning in FPGAs
• Area reduction
• Area-Delay product minimization
Previous Work
¾ Rent’s rule
• Landman, Russo (1971), Donath (1979)
¾ Interconnect distribution model
• Davis et al. (1998), Zarkesh-Ha et al. (2000)
• Local, semi-global, global wiring requirement
ƒ Homogeneous, heterogeneous systems
¾ FPGA segment length distribution
• Betz et al. (1999)
• Type of routing switches
• Impact of segment length distribution on area-delay product
ƒ Best area-delay product: mix of length 4 and length 8 segments
Clustered FPGAs
¾ Clusters of LEs, connection boxes and switch-boxes
¾ Regular 2-D mesh array
¾ Example: Xilinx Virtex series, Altera APEX & Stratix
[Figure: a logic element (LE) and a cluster of LEs]
Routing Model
[Figure: clusters connected through C-Boxes and S-Boxes; labels: Segment Length, W]
Routing Model
Buffered routing switches
Buffer chain delay
Pass-transistor chain delay
How much interconnect?
¾ ~80% of FPGA area = interconnects
¾ Routing resource utilization (RRU) is low
• 100% logic utilization
¾ Depopulating logic clusters
• Regularity
¾ Interconnect complexity guided fanout distribution
• Rent’s rule
• Average fanout
ƒ Segment lengths (shorter segments or longer segments?)
ƒ Switch type (tri-state buffers or pass-transistors?)
Rent’s Rule
Rent’s rule: Landman and Russo, 1971.
Relation between the average number of terminals T and the number of blocks B per module in a partitioned design:
\[ T = t B^{p} \]
p = Rent exponent
t ≅ average number of terminals per block
p is a measure of the complexity of the interconnection topology:
(simple) 0 ≤ p ≤ 1 (complex)
Typical values: 0.5 ≤ p ≤ 0.75
[Plot: average T versus B on log-log axes (B from 1 to 1000, T from 1 to 100), with the Rent’s rule fit]
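As an illustration of how the Rent exponent is read off a log-log plot like the one on this slide, the sketch below fits t and p to (B, T) samples by least squares on log T = log t + p log B. The function names and the sample data are hypothetical; the samples are generated synthetically from t = 4 and p = 0.64 and are not measurements from the talk.

```python
import math

def rent_terminals(t, blocks, p):
    """Rent's rule: average terminal count T = t * B**p."""
    return t * blocks ** p

def fit_rent(samples):
    """Least-squares fit of log T = log t + p * log B over (B, T) samples."""
    xs = [math.log(b) for b, _ in samples]
    ys = [math.log(terms) for _, terms in samples]
    n = len(samples)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    p = sxy / sxx                      # slope of the log-log line
    t = math.exp(mean_y - p * mean_x)  # intercept gives t
    return t, p

# Synthetic, noise-free samples (illustrative values only).
samples = [(b, rent_terminals(4.0, b, 0.64)) for b in (1, 10, 100, 1000)]
t_fit, p_fit = fit_rent(samples)
print(f"t = {t_fit:.2f}, p = {p_fit:.2f}")
```

On real partitioning data the points scatter around the line, so the fit is usually restricted to the region where the log-log relation is close to linear.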
Rent’s Rule
¾ Definitions
• Pd – Rent’s parameter for Design
• Pa – Rent’s parameter for Architecture
[Plot: routing resource utilization versus Rent’s parameter; Pa = 0.64 marked]
Logic Clustering
• Separation: sum of all terminals of nets incident to LE
• Degree: number of nets incident to LE
\[ c = \frac{\text{separation}}{d^{2}} \], where d is the degree
• Net weight: \[ w(x) = \frac{2}{r} \]
G ( X ) = [2nw( x) • (1 + α )] k
\[ IO \le (k + 1)\, n^{P_a} \]
[Figure: example netlist; labels: t, x, y, z, C, X, Y]
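The pin bound on this slide has the same form as Rent's rule, with (k + 1) playing the role of t and the cluster size n the role of B. The sketch below evaluates it, together with the net weight. It rests on assumptions not stated on the slide: that k is the number of LE inputs (so an LE has k + 1 terminals), that n is the number of LEs per cluster, and that r is the terminal count of net x. The function names and the example values k = 4, n = 8 are hypothetical; Pa = 0.64 is the value quoted earlier in the deck.

```python
def net_weight(r):
    """Net weight w(x) = 2 / r, reading r as the terminal count of net x (assumption)."""
    if r < 2:
        raise ValueError("a net needs at least two terminals")
    return 2.0 / r

def cluster_io_bound(k, n, pa):
    """Upper bound on cluster pins from the slide: IO <= (k + 1) * n**Pa."""
    return (k + 1) * n ** pa

# Hypothetical cluster: 4-input LEs, 8 LEs per cluster, Pa = 0.64.
bound = cluster_io_bound(k=4, n=8, pa=0.64)
print(f"pin budget per cluster: {bound:.2f}")
print(f"weight of a 2-terminal net: {net_weight(2)}")
print(f"weight of a 4-terminal net: {net_weight(4)}")
```

Under this reading, two-terminal nets carry the largest weight, so absorbing them into a cluster is favored over absorbing part of a high-fanout net.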
Clustering: Seed selection
A: degree = 4; separation = 18, c = 1.25; nets absorbed = 1
B: degree = 4; separation = 8, c = 0.5; nets absorbed = 4
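A minimal sketch of seed selection follows, assuming c = separation / degree² and assuming the LE with the smallest c is preferred as the seed, which is what the slide's example suggests (the LE with the lower c absorbs more nets). The class, the function names, and the second candidate are hypothetical; only B's degree and separation are taken from the slide.

```python
from dataclasses import dataclass

@dataclass
class LogicElement:
    name: str
    degree: int      # number of nets incident to the LE
    separation: int  # sum of all terminals of those nets

def connectivity_factor(le):
    """c = separation / degree**2; lower values indicate tighter connectivity."""
    return le.separation / le.degree ** 2

def pick_seed(candidates):
    """Pick the candidate with the smallest c (assumed selection rule)."""
    return min(candidates, key=connectivity_factor)

candidates = [
    LogicElement("B", degree=4, separation=8),               # values from the slide
    LogicElement("hypothetical", degree=3, separation=12),   # made-up candidate
]
seed = pick_seed(candidates)
print(seed.name, connectivity_factor(seed))
```

With these inputs B has c = 0.5 and is chosen, matching the value shown for B on the slide.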
Rent’s Rule: Depopulation
¾ Case 1: Pd <= Pa
• Achieve spatial uniformity.
¾ Case 2: Pd > Pa
• Need more routing resources.
• Solution – Depopulate clusters
Regularity
¾ Better clustering for increased routability
Avg. fan-out 2.7
Ours: Avg. fan-out 3.7
Routing succeeded with a channel width factor of 21
Routing succeeded with a channel width factor of 12
Rent’s Rule: Depopulation
[Plot: Cluster Pin Utilization; number of clusters (0 to 80) versus number of cluster pins utilized (0 to 30); series: Ours, T-VPack]
Segment length Planning
¾ Typical segment distribution
What is the best mix of segments for:
i) Area minimization ii) Area-delay minimization?
Fanout Distribution
¾ Inter-cluster wire requirement
¾ Equivalent Rent’s parameter of system
\[ k_{eq} = \sqrt[N]{\prod_{i=1}^{n} k_i^{N_i}} \]
\[ p_{eq} = \frac{\sum_{i=1}^{n} p_i N_i}{N} \]
¾ Netlist profile
• Low fanout nets – smaller length segments
• Global nets – Longer segments (long lines)
¾ Global segments – buffered
¾ Shorter segments – not buffered (pass-transistor switches)
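The equivalent Rent's parameters of a heterogeneous system can be computed as a weighted geometric mean (for k) and a weighted arithmetic mean (for p) over its n sub-blocks. The sketch below assumes N is the sum of the block counts N_i, which the slide does not state explicitly; the function name and the two-block example values are hypothetical.

```python
import math

def equivalent_rent(blocks):
    """blocks: list of (k_i, p_i, N_i) tuples. Returns (k_eq, p_eq).

    k_eq = (prod k_i**N_i) ** (1/N), p_eq = sum(p_i * N_i) / N,
    with N = sum(N_i) (assumption).
    """
    total = sum(count for _, _, count in blocks)
    # Work in log space so the product does not overflow for large N_i.
    log_k = sum(count * math.log(k) for k, _, count in blocks) / total
    p_eq = sum(p * count for _, p, count in blocks) / total
    return math.exp(log_k), p_eq

# Two hypothetical sub-blocks: (k, p, number of blocks).
k_eq, p_eq = equivalent_rent([(4.0, 0.6, 300), (3.0, 0.7, 100)])
print(f"k_eq = {k_eq:.3f}, p_eq = {p_eq:.3f}")
```

The larger sub-block dominates both averages, so a small region of complex logic shifts the system-level exponent only slightly.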
Fanout distribution
¾ Array of clusters
\[ \mathrm{Net}(m) = \frac{kN\left((m-1)^{p-1} - m^{p-1}\right)}{m} \]
\[ Fo_{avg} = \frac{1 - (Fo_{Max}+1)^{p-1}}{1 - (Fo_{Max}+1)^{p-2} - \Omega(p, Fo_{Max})} - 1 \]
\[ \Omega(p, Fo_{Max}) = \sum_{n=1}^{Fo_{Max}} \frac{n^{p}}{n^{2}(n+1)} \]
[Plot: number of nets (0 to 350) versus pins per net (1 to 10); series: Actual, Predicted]
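The closed form for the average fanout follows from the net distribution: summing m · Net(m) over m = 2 … FoMax + 1 telescopes to kN(1 − (FoMax + 1)^(p−1)), and dividing by the total net count gives the average pins per net, from which 1 is subtracted to get fanout. The sketch below checks this numerically by comparing the closed form with a direct weighted average. The function names and the parameter values (k = 4, N = 1000, p = 0.64, FoMax = 9) are hypothetical.

```python
def nets_with_m_pins(m, k, n_blocks, p):
    """Net(m) = k * N * ((m-1)**(p-1) - m**(p-1)) / m, for m >= 2 pins."""
    return k * n_blocks * ((m - 1) ** (p - 1) - m ** (p - 1)) / m

def omega(p, fo_max):
    """Omega(p, FoMax) = sum over n = 1..FoMax of n**p / (n**2 * (n + 1))."""
    return sum(n ** p / (n ** 2 * (n + 1)) for n in range(1, fo_max + 1))

def average_fanout(p, fo_max):
    """Closed-form average fanout from the slide."""
    numerator = 1 - (fo_max + 1) ** (p - 1)
    denominator = 1 - (fo_max + 1) ** (p - 2) - omega(p, fo_max)
    return numerator / denominator - 1

def average_fanout_direct(k, n_blocks, p, fo_max):
    """Weighted average of fanout (m - 1) over nets with 2..FoMax+1 pins."""
    pins = range(2, fo_max + 2)
    counts = [nets_with_m_pins(m, k, n_blocks, p) for m in pins]
    return sum((m - 1) * c for m, c in zip(pins, counts)) / sum(counts)

closed = average_fanout(p=0.64, fo_max=9)
direct = average_fanout_direct(k=4.0, n_blocks=1000, p=0.64, fo_max=9)
print(f"closed form: {closed:.4f}, direct sum: {direct:.4f}")
```

Note that k and N cancel in the average, which is why the closed form depends only on p and FoMax.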
Fanout distribution: Area-Minimization
¾ Good Placement
Example
Fanout distribution: Area-Minimization
[Plot: area (transistors, 0 to 1.00E+07) per circuit (1 to 6); series: Ours, Length 1, Xilinx-like]
Fanout distribution: Area-Minimization
[Plot: critical path (axis labeled ns, 0 to 3.00E-07) per circuit (1 to 6); series: Ours, Length 1, Xilinx-like]
Fanout distribution: Area-Delay Product
¾ Performance!
¾ Average fanout: Critical path model
Fanout distribution: Area-Delay Product
¾ Timing-driven Placer and Router
[Plot: area-delay product (0 to 1.2) per circuit; series: Ours, Xilinx-like; circuits: tseng, seq, diffeq, dsip, ex5p, misex3, s298, des, alu4, apex2, apex4, bigkey]
Fanout distribution: Performance
¾ Using a Timing-driven Placer and Router
[Plot: normalized critical path (normalized delay, 0 to 1.4) per circuit, with average; series: Ours, Xilinx-like; circuits: tseng, seq, diffeq, dsip, ex5p, misex3, s298, des, alu4, apex2, apex4, bigkey, Avg.]
Conclusions & Future Work
¾ FPGA Clustering for Regularity
• Rent’s rule
¾ Fanout distribution based segment length planning
• Area minimization
ƒ Reduced wire-length
• Area-Delay minimization
ƒ Reduction of 29% over the state of the art
¾ Fixed FPGA Architecture
• 20% better area-delay product
¾ Future work: Metal layer assignment
• Applying technique to a pipelined FPGA