FPGA FPGA Interconnect Interconnect Planning Planning Amit Singh Malgorzata Marek-Sadowska Xilinx Inc. San Jose, CA amit.singh@xilinx.com University of California, Santa Barbara Santa Barbara, CA mms@ece.ucsb.edu Outline ¾ Introduction ¾ Previous Work ¾ Architecture Model • Architecture & Design Rent’s exponent ¾ Clustering • Spatial regularity ¾ Fanout distribution • Area minimization • Area-delay minimization • Performance ¾ Conclusions Introduction ¾ Clustered FPGAs • System-on-Chip (SOC) ¾ Performance, Power, Area • Matching design and architecture complexities • Circuit packing/clustering ¾ Fanout distribution (Zarkesh-Ha et al. 2000) • Rent’s rule ¾ Segment length planning in FPGAs • Area reduction • Area-Delay product minimization Previous Work ¾ Rent’s rule • Landman, Russo (1974), Donath (1979) ¾ Interconnect distribution model • Davis et al. (1998), Zarkesh-Ha et al. (2000) • Local, semi-global, global wiring requirement Homogeneous, heterogeneous systems ¾ FPGA segment length distribution • Betz et al. (1999) • Type of routing switches • Impact – segment length distribution on area-delay product Best area-delay product: Mix of length 4 and 8 segments Clustered FPGAs ¾ Clusters of LEs, connection boxes and switch-boxes ¾ Regular 2-D mesh array ¾ Example: Xilinx Virtex series, Altera APEX & Stratix A Logic Element Cluster of LEs Cluster C-Box wSegment Length Routing Model S-Box wW Cluster Cluster Routing Model Buffered routing switches Buffer chain delay Pass-transistor chain delay How much interconnect? ¾ ~80% of FPGA area = interconnects ¾ Routing resource utilization (RRU) is low • 100% logic utilization ¾ Depopulating logic clusters • Regularity ¾ Interconnect complexity guided fanout distributionRent’s rule • Average fanout Segment lengths (shorter segments or longer segments?) Switch type (tri-state buffers or pass-transistor?) Rent’s Rule Rent’s rule: Landman and Russo in 1971. Average number of terminals and blocks per module in a partitioned design: T=tB p p = Rent exponent 100 t ≅ average # term./block T Measure for the complexity of the interconnection topology. 10 (simple) 0 ≤ p ≤ 1 (complex) average Rent’s rule 1 1 10 B 100 Typical values: 0.5 ≤ p ≤ 0.75 1000 Rent’s Rule ¾ Definitions • Pd – Rent’s parameter for Design Routing resource utilization • Pa – Rent’s parameter for Architecture Pa = 0.64 Rent’s parameter Logic Clustering Example • Separation : Sum of all terminals of nets incident to LE t y • Degree : Number of nets incident to LE separation c= d2 2 •Net weight: w( x ) = r Y C z x G ( X ) = [2nw( x) • (1 + α )] k IO ≤ (k + 1)n Pa X Clustering: Seed selection A degree = 4; separation = 18, c = 1.25 A Nets absorbed = 1 B degree = 4; separation = 8, c = 0.5 B Nets absorbed = 4 Rent’s Rule: Depopulation ¾ Case 1: Pd <= Pa • Achieve spatial uniformity. ¾ Case 2: Pd > Pa • Need more routing resources. A • Solution – Depopulate clusters A A Regularity ¾ Better clustering for increased routability Avg. fan-out 2.7 Ours: Avg. fan-out 3.7 Routing succeeded a channel Routing succeeded with awith channel widthwidth factorfactor of 21 of 12 Rent’s Rule: Depopulation # Clusters Cluster Pin Utilization 80 70 60 50 40 30 20 10 0 0 5 10 15 20 # Cluster Pins utilized Ours T-VPack 25 30 Segment length Planning ¾ Typical segment distribution What is the best mix of segments for: i) Area minimization ii) Area-delay minimization? Fanout Distribution ¾ Inter-cluster wire requirement ¾ Equivalent Rent’s parameter of system n n keq = N (∏ k Ni i i =1 peq = ∑pN i =1 i i N ¾ Netlist profile • Low fanout nets – smaller length segments • Global nets – Longer segments (long lines) ¾ Global segments – buffered ¾ Shorter segments – not buffered(pass transistor switches) Fanout distribution ¾ Array of clusters 350 300 nets 250 200 150 100 50 0 1 kN ((m − 1) p −1 − m p −1 ) Net (m) = m Foavg 1 − ( FoMax + 1) p −1 = −1 p−2 1 − ( FoMax + 1) − Ω( p, FoMax ) 2 3 4 5 6 7 8 9 10 Pins per net Actual Predicted Ω( p, FoMax ) = FoMax ∑ n =1 np n 2 (n + 1) Fanout distribution: Area-Minimization ¾ Good Placement Example Fanout distribution: Area-Minimization Area (Transistors) 1.00E+07 8.00E+06 6.00E+06 4.00E+06 2.00E+06 0.00E+00 1 2 3 4 Circuit Ours Length 1 Xilinx-like 5 6 Fanout distribution: Area-Minimization Critical Path (ns) 3.00E-07 2.50E-07 2.00E-07 1.50E-07 1.00E-07 5.00E-08 0.00E+00 1 2 3 4 5 Circuit Ours Length 1 Xilinx-like 6 Fanout distribution:Area-Delay Product ¾ Performance! ¾ Average fanout: Critical path model Fanout distribution:Area-Delay Product ¾ Timing-driven Placer and Router 1 0.8 0.6 0.4 0.2 Circuits Ours Xilinx-like en g ts se q di ffe q ds ip ex 5p m is ex 3 s2 98 de s 0 al u4 ap ex 2 ap ex 4 bi gk ey Area-delay product 1.2 Fanout distribution:Performance ¾ Using a Timing-driven Placer and Router Normalized Critical Path 1.4 1 0.8 0.6 0.4 0.2 Circuits Ours Xilinx-like en g Av g. ts se q di ffe q ds ip ex 5p m is ex 3 s2 98 de s 0 al u4 ap ex 2 ap ex 4 bi gk ey Normalized delay 1.2 Conclusions & Future Work ¾ FPGA Clustering for Regularity • Rent’s rule ¾ Fanout distribution based segment length planning • Area minimization Reduced wire-length • Area-Delay minimization Reduction of 29% over state-of-art ¾ Fixed FPGA Architecture • 20% better area-delay product ¾ Future work: Metal layer assignment • Applying technique to a pipelined FPGA