ECE 506
Reconfigurable Computing http://www.ece.arizona.edu/~ece506
Lecture 6
Clustering
° Intra-cluster connections: fast
° Inter-cluster connections: slow
Need to pack BLEs
° Goals:
• Reduce stress on routing
• Take advantage of local fast interconnect
• Reduce inter-cluster wiring
• Minimize critical path (timing-driven)
° How do we do this?
• Take advantage of cluster architecture
° Tradeoffs
° How many distinct inputs should be provided to a cluster of N 4-LUTs?
° How many 4-LUTs should be included in a cluster to create the most area-efficient logic block?
VPACK
° Flow
• Iterate until all BLEs are consumed
• Start a new cluster by selecting a seed BLE: the currently unclustered BLE with the most used inputs
• Add the BLE that shares the most inputs with the current cluster, to minimize the number of inputs that must be routed to each cluster
• Keep adding until either the cluster is full or its input pins are used up
• Hill climbing: if some of the cluster's BLE slots are still unused
- Add another BLE even if the cluster's input count temporarily overflows
- If the input count is not eventually reduced, revert to the best choice found before hill climbing
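A minimal sketch of the greedy flow, assuming each BLE is given as (set of input nets, output net) and that a cluster input is any net used inside the cluster but not produced there. The netlist format and the names `cluster_inputs` and `vpack` are hypothetical, and the hill-climbing step is omitted.

```python
# Hypothetical netlist format: BLE name -> (set of input nets, output net).
Netlist = dict[str, tuple[set[str], str]]


def cluster_inputs(members: list[str], bles: Netlist) -> set[str]:
    """Distinct nets that must enter the cluster through its input pins."""
    used = set().union(*(bles[b][0] for b in members))
    produced = {bles[b][1] for b in members}  # outputs fed back locally
    return used - produced


def vpack(bles: Netlist, n: int, i: int) -> list[list[str]]:
    """Greedy VPack-style packing into clusters of at most n BLEs and i inputs."""
    unclustered = set(bles)
    clusters = []
    while unclustered:  # iterate until all BLEs are consumed
        # Seed: unclustered BLE with the most used inputs (ties: name order).
        seed = max(sorted(unclustered), key=lambda b: len(bles[b][0]))
        members = [seed]
        unclustered.remove(seed)
        while len(members) < n:
            nets = set().union(*(bles[b][0] | {bles[b][1]} for b in members))
            feasible = [b for b in sorted(unclustered)
                        if len(cluster_inputs(members + [b], bles)) <= i]
            if not feasible:  # input pins used up
                break
            # Attraction: how many of the candidate's inputs the cluster already has.
            best = max(feasible, key=lambda b: len(bles[b][0] & nets))
            members.append(best)
            unclustered.remove(best)
        clusters.append(members)
    return clusters


bles = {
    "A": ({"x1", "x2", "x3", "x4"}, "a"),
    "B": ({"a", "x1", "x2"}, "b"),
    "C": ({"y1", "y2"}, "c"),
    "D": ({"b", "x3"}, "d"),
}
print(vpack(bles, n=2, i=4))  # [['A', 'B'], ['C', 'D']]
```

B joins A's cluster because it shares x1 and x2 and consumes A's output a locally; pairing D with A would need five cluster inputs, so D is left for the next cluster.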
Logic Utilization
Number of Inputs per Cluster
• Lots of opportunities for input sharing in large clusters
(Betz – CICC’99)
• Reducing inputs reduces the size of the device and makes it faster.
• Most FPGA devices (Xilinx, Lucent) have 4 BLEs per cluster, with more inputs than actually needed.
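To put numbers on the input-sharing argument: without any sharing, a cluster of N K-LUTs would need N·K inputs. A rule of thumb commonly attributed to Betz and Rose's cluster studies is that about K/2·(N+1) inputs give near-complete logic utilization; the sketch below treats that formula as an assumption and simply compares the two counts.

```python
import math


def inputs_no_sharing(k: int, n: int) -> int:
    """Cluster inputs needed if no two BLEs share an input net."""
    return k * n


def inputs_rule_of_thumb(k: int, n: int) -> int:
    """Assumed rule of thumb I = K/2 * (N + 1), rounded up."""
    return math.ceil(k * (n + 1) / 2)


for n in (1, 4, 8, 10):
    print(n, inputs_no_sharing(4, n), inputs_rule_of_thumb(4, n))
```

For a 4-BLE cluster of 4-LUTs this suggests 10 inputs rather than 16, and the gap widens as N grows.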
T-VPACK
Architecture Modeling
• Tri-state buffer and pass-transistor distribution
• Cluster size vs. routing resources (tile size)
• Transistor and buffer scaling based on segment length
• Flexibility of switches (is Fc = W a waste for large cluster sizes?)
Logic Cluster Structure
Timing-Driven Clustering – T-VPACK
• Pack each cluster to its capacity
- Minimize number of clusters
• Minimize number of inputs per cluster
- Reduce the number of external connections
• Minimize number of external connections on critical path
• Why?
- External connections have higher delay than internal connections
- Reducing number of external nets on critical path will reduce delay
• Identify connections that are on the critical path
• Pack BLEs sequentially along the critical path
• Recompute criticality of remaining BLEs
Slack and Criticality Calculation
° Example timing graph: three chains of BLEs with some cross-connections between chains; the numbers are BLE delays
• PI1 → 1 → 4 → 6 → 5 → PO1
• PI2 → 3 → 6 → 6 → 7 → PO2
• PI3 → 1 → 4 → 5 → 4 → PO3
° Arrival times (forward pass): primary inputs arrive at time 0; a BLE's output arrives at (latest input arrival) + (BLE delay)
° Required times (backward pass): every primary output is required at the maximum arrival time, 22; a node's required time is the minimum over its fanouts of (fanout required time − fanout delay)
° Slack = required time − arrival time
° Results, as arrival / required (slack), from primary input to primary output:
• PI1 chain: 0/4 (4), 1/5 (4), 7/9 (2), 13/15 (2), 18/22 (4)
• PI2 chain: 0/0 (0), 3/3 (0), 9/9 (0), 15/15 (0), 22/22 (0)
• PI3 chain: 0/8 (8), 1/9 (8), 7/13 (6), 14/18 (4), 18/22 (4)
• One cross-connection is labeled 7 / 15 (slack 8)
° Critical path: PI2 → 3 → 6 → 6 → 7 → PO2 (zero slack, total delay 22)
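The forward and backward passes of the slack example can be sketched as a small static timing analysis. The node names (a1-a4, b1-b4, c1-c4 for the BLEs of the PI1, PI2 and PI3 chains) are hypothetical, and the cross-connections (b1→a2, b1→c2, b2→c3, a3→b4, c2→b4) are an assumption inferred from the arrival and required times on the slides rather than read from the original figure.

```python
from collections import defaultdict
from graphlib import TopologicalSorter  # Python 3.9+

# node -> (delay, fanins). Names are hypothetical; cross edges are inferred.
GRAPH = {
    "PI1": (0, []), "PI2": (0, []), "PI3": (0, []),
    "a1": (1, ["PI1"]), "a2": (4, ["a1", "b1"]), "a3": (6, ["a2"]), "a4": (5, ["a3"]),
    "b1": (3, ["PI2"]), "b2": (6, ["b1"]), "b3": (6, ["b2"]),
    "b4": (7, ["b3", "a3", "c2"]),
    "c1": (1, ["PI3"]), "c2": (4, ["c1", "b1"]), "c3": (5, ["c2", "b2"]), "c4": (4, ["c3"]),
}
OUTPUTS = ["a4", "b4", "c4"]  # drive PO1, PO2, PO3


def timing(graph, outputs):
    """Return (arrival, required, slack) dicts, all measured at node outputs."""
    order = list(TopologicalSorter({n: fi for n, (_, fi) in graph.items()}).static_order())
    arrival = {}
    for n in order:  # forward pass: latest input arrival + own delay
        delay, fanins = graph[n]
        arrival[n] = max((arrival[f] for f in fanins), default=0) + delay
    d_max = max(arrival[o] for o in outputs)  # every PO is required at d_max
    fanouts = defaultdict(list)
    for n, (_, fanins) in graph.items():
        for f in fanins:
            fanouts[f].append(n)
    required = {}
    for n in reversed(order):  # backward pass: tightest fanout constraint
        limits = [required[v] - graph[v][0] for v in fanouts[n]]
        if n in outputs:
            limits.append(d_max)
        required[n] = min(limits, default=d_max)
    slack = {n: required[n] - arrival[n] for n in graph}
    return arrival, required, slack


arrival, required, slack = timing(GRAPH, OUTPUTS)
print([n for n in GRAPH if slack[n] == 0])  # ['PI2', 'b1', 'b2', 'b3', 'b4']
```

Under these assumed edges the sketch reproduces every value on the slides; for example, the c2→b4 connection has slack (22 − 7) − 7 = 8, matching the 7 / 15 label.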
Timing-Driven Clustering – T-VPACK
° Cost metric now considers both connectivity and timing criticality
° Perform a criticality (timing) analysis at the beginning, treating all connections as inter-cluster
° Determine “Base” BLE criticality
Base Criticality
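Base criticality turns slack into a weight between 0 and 1. The sketch below assumes criticality = 1 − slack / max_slack, so zero-slack BLEs get 1.0 and the BLE with the most slack gets 0.0. It applies this to the node slacks of the slack example using the hypothetical names a*, b*, c*; T-VPack itself reasons about connection criticalities, so treat this as a simplified illustration.

```python
# Node slacks from the slack example; a1-a4, b1-b4, c1-c4 are hypothetical
# names for the BLEs on the PI1, PI2 and PI3 chains, in order.
SLACK = {"PI1": 4, "a1": 4, "a2": 2, "a3": 2, "a4": 4,
         "PI2": 0, "b1": 0, "b2": 0, "b3": 0, "b4": 0,
         "PI3": 8, "c1": 8, "c2": 6, "c3": 4, "c4": 4}


def base_criticality(slack):
    """Map slack to [0, 1]: zero slack -> 1.0, largest slack -> 0.0 (assumed form)."""
    max_slack = max(slack.values())
    if max_slack == 0:  # every node is critical
        return {node: 1.0 for node in slack}
    return {node: 1.0 - s / max_slack for node, s in slack.items()}


crit = base_criticality(SLACK)
print(crit["b2"], crit["a2"], crit["c2"], crit["c1"])  # 1.0 0.75 0.25 0.0
```

Because the analysis assumes every connection is inter-cluster, the whole PI2 chain starts at criticality 1.0 and is the first candidate for packing along the critical path.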
How to break ties?
° Initially, many paths may have the same number of BLEs
° Include “tie-breaking” in performance cost function
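A sketch of a cost (attraction) function that combines timing criticality with connectivity and falls back to a tie-break score. The weighted-sum form α·criticality + (1 − α)·shared_nets / g follows the general shape described for T-VPack, but α = 0.75, the normalizer g = 5 (four LUT inputs plus one output), and the `tie_break` score (for example, a count of critical paths through the BLE) are assumptions; the functions `attraction` and `pick_next` are hypothetical.

```python
def attraction(ble, cluster_nets, ble_nets, crit, alpha=0.75, g=5):
    """alpha * criticality + (1 - alpha) * shared nets / g (assumed form)."""
    shared = len(ble_nets[ble] & cluster_nets)
    return alpha * crit[ble] + (1 - alpha) * shared / g


def pick_next(candidates, cluster_nets, ble_nets, crit, tie_break):
    """Highest attraction wins; equal attractions fall back to tie_break."""
    return max(candidates,
               key=lambda b: (attraction(b, cluster_nets, ble_nets, crit), tie_break[b]))


ble_nets = {"x": {"n1", "n2"}, "y": {"n1", "n3"}, "z": {"n1", "n2", "n4"}}
crit = {"x": 1.0, "y": 1.0, "z": 0.5}
tie_break = {"x": 3, "y": 5, "z": 0}
print(pick_next(["x", "y", "z"], {"n1"}, ble_nets, crit, tie_break))              # y
print(pick_next(["x", "y", "z"], {"n1", "n2", "n4"}, ble_nets, crit, tie_break))  # x
```

In the first call x and y are equally critical and share one net each, so the tie-break decides; in the second, x shares more nets with the cluster and wins outright.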
Results for T-VPACK versus VPACK
Why does the gap between VPack and T-VPack increase as N increases?
° T-VPack prefers to cluster a BLE with BLEs that are in its fan-in or fan-out
° VPack favors input sharing
° T-VPack completely absorbs many low-fanout nets
• Fewer nets to route!
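A net is completely absorbed when its driver and all of its sinks land in the same cluster; it then uses only the fast local interconnect and never reaches the general routing. The helper below is a hypothetical sketch that, given each net's terminals and a BLE-to-cluster assignment, separates absorbed nets from nets that still need routing.

```python
def absorbed_nets(nets, cluster_of):
    """Return nets whose driver and sinks all sit in the same cluster."""
    return {net for net, pins in nets.items()
            if len({cluster_of[p] for p in pins}) == 1}


nets = {"n1": ["A", "B"], "n2": ["B", "C"], "n3": ["C", "D"], "n4": ["A", "C", "D"]}
cluster_of = {"A": 0, "B": 0, "C": 1, "D": 1}
absorbed = absorbed_nets(nets, cluster_of)
print(sorted(absorbed))              # ['n1', 'n3']
print(sorted(set(nets) - absorbed))  # ['n2', 'n4']
```

Two-terminal nets such as n1 and n3 are absorbed by a single good pairing, which is why low-fanout nets are the ones a fan-in/fan-out-driven packer tends to swallow.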
Why does area-delay product show an increasing trend beyond cluster size of 10?
° Increased number of nets completely absorbed by T-VPack
° Area-delay product
• Cluster sizes 7-10 are the best choice (34-36% better than N=1)
° N=7 vs. N=1
• 30% less delay, 8% less area
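As a consistency check on the N=7 figures, assuming the area and delay reductions combine multiplicatively:

```latex
\frac{(A \cdot D)_{N=7}}{(A \cdot D)_{N=1}} = (1 - 0.08)(1 - 0.30) = 0.92 \times 0.70 = 0.644
```

That is roughly a 36% lower area-delay product, at the upper end of the 34-36% range quoted for cluster sizes 7-10.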
Results for T-VPACK, DELAY !!!
Why do we see a circuit speedup?
[Figure: T-VPack delay results, annotated 18% and 40%]
° Intra-cluster: fast, inter-cluster: slow!
° As N increases
• The number of internal connections on the critical path increases
• The number of external connections on the critical path decreases
Why are inter-cluster connections becoming faster?
• Reduction in the number of external connections (internal connections are faster)
• External connections on the critical path are becoming faster
• Reduction in routing requirements
Drawback of VPack and T-VPack