Bachelor of Applied Science Thesis Defense
An Analysis of Network-on-Chip Implementations on Field Programmable Gate Arrays
Kevan Thompson
Computer Engineering, School of Engineering Science, SFU

Overview
•Introduction
•Background
•Methodology
•Results
•Conclusions and Future Work

Introduction
[Figure: Size of Xilinx FPGAs. Number of logic cells (0 to 2,500,000) vs. year (2001 to 2012) for the Virtex-2P, Virtex-4, Virtex-5, Virtex-6, and Virtex-7 families]

ASIC vs FPGA
ASIC:
•Completely custom design
•Large initial investment
•Need to carefully design the interconnect between nodes
FPGA:
•Reconfigurable
•Low cost for small-volume runs
•Wires already placed on the FPGA

Objective
•Improvements in the Xilinx tools that have significantly affected the performance of NoCs on FPGAs
•Improvements in NoC performance on FPGAs that are possible using manual place and route (PAR)
•The Star and Fully Connected topologies do not fit into current models

NoC Terminology
•Topology
•Node
•Degree
•Average Node Degree (AND)
[Figure: example Ring, Mesh, Star, and Fully Connected topologies]

Previous Work on NoCs on FPGAs
For Xilinx FPGAs:

Methodology
[Figure: node block diagram showing an Input FSL, the Multiplier Node, the Network Switch, and an Output FSL]
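In the methodology's node, data enters and leaves through FSL queues. The Python sketch below models one FSL as a bounded FIFO, using the parameters stated for this design (16-word-deep queues, 24-bit width), and moves one word through a multiplier node. Several parts of it are hypothetical assumptions: the names `FSL` and `multiplier_step`, and the packing of two 8-bit operands into bits [7:0] and [15:8] of a word. The network switch is not modelled. This is a behavioural illustration, not the author's hardware.

```python
from collections import deque

# Parameters stated in the methodology.
FSL_DEPTH = 16        # words per FSL queue
FSL_WIDTH_BITS = 24   # bits per FSL word
WORD_MASK = (1 << FSL_WIDTH_BITS) - 1


class FSL:
    """Behavioural stand-in for a Fast Simplex Link: a bounded FIFO of 24-bit words."""

    def __init__(self, depth: int = FSL_DEPTH) -> None:
        self.depth = depth
        self.words = deque()

    def full(self) -> bool:
        return len(self.words) >= self.depth

    def empty(self) -> bool:
        return len(self.words) == 0

    def write(self, word: int) -> bool:
        """Append a word truncated to 24 bits; return False if the queue is full."""
        if self.full():
            return False
        self.words.append(word & WORD_MASK)
        return True

    def read(self) -> int:
        """Remove and return the oldest word (caller must check empty() first)."""
        return self.words.popleft()


def multiplier_step(inp: FSL, out: FSL) -> bool:
    """Move one word through a hypothetical multiplier node.

    ASSUMPTION (not from the thesis): each input word carries two 8-bit
    operands in bits [7:0] and [15:8]; their 16-bit product is written out.
    Returns False when there is no input or the output queue is full.
    """
    if inp.empty() or out.full():
        return False
    word = inp.read()
    a = word & 0xFF
    b = (word >> 8) & 0xFF
    out.write(a * b)
    return True


# Usage: send one packed operand pair through the node.
inp, out = FSL(), FSL()
inp.write((12 << 8) | 10)
multiplier_step(inp, out)
print(out.read())  # 120
```

In this sketch, `write()` refuses to push into a full queue rather than overwrite data. The producer must therefore retry later, which is the back-pressure a bounded FIFO imposes on whatever feeds it.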
•8-bit multiplier node
•Two Fast Simplex Links (FSLs)
•Network topology communication switch
•FSLs: 16-word-deep queues, 24-bit width
•The multiplier uses 981 flip-flops and 653 LUTs
•FPGA: Xilinx Virtex-5 xc5vlx330

Results
•10.1 Tools vs 12.1 Tools: Star, Ring, and Fully Connected networks
•Predicted vs Measured Results: Star and Fully Connected networks
•Manual Implementation: Ring, Star, and Mesh networks

10.1 Tools vs 12.1 Tools for Star Networks
[Figure: maximum frequency (MHz, 0 to 250) vs. number of nodes (8 to 64) for the 10.1 and 12.1 tools]

10.1 Tools vs 12.1 Tools for Ring Networks
[Figure: maximum frequency (MHz, 192 to 210) vs. number of nodes (8 to 64) for the 10.1 and 12.1 tools]

10.1 Tools vs 12.1 Tools for Fully Connected Networks
[Figure: maximum frequency (MHz, 0 to 200) vs. number of nodes (8 to 48) for the 10.1 and 12.1 tools]

Percent Improvement of 12.1 Tools over 10.1 Tools
[Figure: percent increase (0% to 80%) vs. number of nodes (8 to 64) for Star, Ring, and Fully Connected networks]

Star Networks: Predicted vs Measured Results
[Figure: measured and predicted maximum frequency (MHz, 0 to 250) vs. number of nodes (8 to 96), annotated with the linear trend $y = -0.3090x + 203.8$ (x: number of nodes, y: maximum frequency in MHz)]

Results for Adjusted Model
[Figure: adjusted predicted vs. measured maximum frequency (MHz, 0 to 250) vs. number of nodes (8 to 96)]

Comparison of Models
[Figure: percent difference between predicted and measured results (-20% to 60%) vs. number of nodes (8 to 96) for the original and adjusted models]

Prediction of Adjusted Model for Random Networks
[Figure: percent error (0% to -70%) vs. average node degree (2 to 10) for networks Random_16, Random_32, and Random_48]

Fully Connected Networks
[Figure: predicted vs. measured maximum frequency (MHz, -300 to 250) vs. number of nodes (8 to 40)]

Interpolated Results for Fully Connected Networks
[Figure: maximum frequency (MHz, 0 to 200) vs. number of nodes (0 to 60)]

CAD Tool Synthesis Steps
1. Behavioural-level synthesis [14]: the HDL is parsed for recognizable constructs
2. Technology mapping [15]: the constructs are mapped to the specific FPGA's technology
3. Placement [16]: the components of the design are placed on the FPGA using simulated annealing
4. Routing [16]: wires are connected between the components using an algorithm called PathFinder

Automatic PAR of a 96-Node Ring Network
[Figure]

Manual PAR of a 96-Node Ring Network
[Figure]

Ring Network Pre- and Post-PlanAhead Results
[Figure: maximum frequency (MHz, 0 to 250) vs. number of nodes (8 to 128), pre- and post-PlanAhead]

Star Network Pre- and Post-PlanAhead Results
[Figure: maximum frequency (MHz, 0 to 250) vs. number of nodes (8 to 96), pre- and post-PlanAhead]
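The CAD tool synthesis steps describe placement by simulated annealing [16]. The Python sketch below is a toy annealing placer that makes that step concrete. It rests on a simplified model taken from neither the thesis nor the Xilinx tools: nodes occupy slots on a grid, the cost is total Manhattan wirelength, each move swaps a node into a random slot, and the temperature falls geometrically. The functions `wirelength` and `anneal_placement`, and all schedule parameters, are hypothetical.

```python
import math
import random


def wirelength(pos, edges):
    """Total Manhattan wirelength of two-point connections; pos maps node -> (x, y)."""
    return sum(abs(pos[a][0] - pos[b][0]) + abs(pos[a][1] - pos[b][1])
               for a, b in edges)


def anneal_placement(num_nodes, edges, grid_w, grid_h, seed=0,
                     t_start=10.0, t_end=0.01, alpha=0.95, moves_per_t=200):
    """Toy simulated-annealing placer (hypothetical; not the Xilinx algorithm).

    Each move picks a node and a random target slot and swaps the node with
    whatever occupies that slot (possibly nothing). A move that raises the
    wirelength by delta is accepted with probability exp(-delta / T), and T is
    multiplied by alpha after every moves_per_t moves.
    Returns (best_pos, best_cost, start_cost).
    """
    rng = random.Random(seed)
    all_slots = [(x, y) for x in range(grid_w) for y in range(grid_h)]
    if num_nodes > len(all_slots):
        raise ValueError("not enough slots for all nodes")
    start = rng.sample(all_slots, num_nodes)          # random initial placement
    pos = {n: start[n] for n in range(num_nodes)}     # node -> slot
    occupant = {s: n for n, s in pos.items()}         # slot -> node
    cost = start_cost = wirelength(pos, edges)
    best_pos, best_cost = dict(pos), cost

    t = t_start
    while t > t_end:
        for _ in range(moves_per_t):
            node = rng.randrange(num_nodes)
            old, new = pos[node], rng.choice(all_slots)
            if new == old:
                continue
            other = occupant.get(new)                  # node already in target slot, if any
            pos[node] = new
            if other is not None:
                pos[other] = old
            delta = wirelength(pos, edges) - cost
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost += delta                          # accept: update slot ownership
                occupant[new] = node
                if other is not None:
                    occupant[old] = other
                else:
                    del occupant[old]
                if cost < best_cost:
                    best_pos, best_cost = dict(pos), cost
            else:                                      # reject: undo the swap
                pos[node] = old
                if other is not None:
                    pos[other] = new
        t *= alpha
    return best_pos, best_cost, start_cost


# Usage: a 16-node ring placed on a 4 x 4 grid of slots.
ring = [(i, (i + 1) % 16) for i in range(16)]
placement, cost, start_cost = anneal_placement(16, ring, 4, 4)
print(start_cost, "->", cost)
```

For comparison, a hand placement that walks the ring around the 4 x 4 grid as a closed loop reaches the minimum possible cost of 16, one unit per link. How close the annealer gets depends on the seed and the cooling schedule.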
Mesh Network Pre- and Post-PlanAhead Results
[Figure: maximum frequency (MHz, 175 to 210) vs. number of nodes (8 to 48), pre- and post-PlanAhead]

Conclusions
•The Xilinx 12.1 tools offer significant improvements in the PAR of NoCs on FPGAs
•The analytical model proposed by Lee [1] does not accurately predict the performance of Star and Fully Connected networks
•Using manual PAR, it is possible to improve the performance of NoCs on FPGAs

Future Work
•Compare the performance of the Xilinx 10.1 and 12.1 tool suites for link widths of 16 and 32 bits
•Build Star and Fully Connected networks with link widths of 16 and 32 bits
•Create manual implementations for Torus and Hypercube topologies

Acknowledgements
Dr. Lesley Shannon
Dr. Ash Parameswaran
Michael Sjoerdsma
Viewers like you!

References
[1] J. Lee. “An Analytical Model Describing The Performance Of Application-Specific Networks-On-Chip On Field-Programmable Gate Arrays”. M.A.Sc. thesis, Simon Fraser University, Canada, 2007.
[2] Xilinx. “Virtex-II Pro and Virtex-II Pro X Platform FPGAs: Complete Data Sheet”. 2010. Available: http://www.xilinx.com/support/documentation/data_sheets/ds083.pdf
[3] Xilinx. “Virtex-4 Family Overview”. 2010. Available: http://www.xilinx.com/support/documentation/data_sheets/ds112.pdf
[4] Xilinx. “Virtex-5 Family Overview”. 2010. Available: http://www.xilinx.com/support/documentation/data_sheets/ds100.pdf
[5] Xilinx. “Virtex-6 Family Overview”. 2010. Available: http://www.xilinx.com/support/documentation/data_sheets/ds150.pdf
[6] Xilinx. “Virtex-7 Product Table”. 2010. Available: http://www.xilinx.com/publications/prod_mktg/Virtex7-Product-Table.pdf
[7] Xilinx. “What's New in Xilinx ISE Design Suite 12”. 2010. Available: http://www.xilinx.com/support/documentation/sw_manuals/xilinx12_1/whatsnew.htm#121
[8] Cisco Systems Inc. “Fiber Distributed Data Interface”. 2010.
Available: http://docwiki.cisco.com/wiki/Fiber_Distributed_Data_Interface
[9] Cisco Systems Inc. “Token Ring/IEEE 802.5”. 2010. Available: http://docwiki.cisco.com/wiki/Token_Ring/IEEE_802.5
[10] Cisco Systems Inc. “Ethernet Technologies”. 2010. Available: http://docwiki.cisco.com/wiki/Ethernet_Technologies
[11] Kompics. “Distributed System Launcher”. 2010. Available: http://kompics.sics.se/trac/wiki/DistributedSystemLauncher
[12] T. Kranenburg, R. van Leuken. “MB-LITE: A robust, light-weight soft-core implementation of the MicroBlaze architecture”. DATE, France, 2010.
[13] K. Eguro, S. Hauck, A. Sharma. “Architecture-Adaptive Range Limit Windowing for Simulated Annealing FPGA Placement”. DAC, United States, 2005.
[14] G. Grewal, M. O’Cleirigh, M. Wineberg. “An Evolutionary Approach to Behavioral-Level Synthesis”. CEC, Australia, 2003.
[15] C. Legl, B. Wurth, K. Eckl. “A Boolean Approach to Performance-Directed Technology Mapping for LUT-Based FPGA Designs”. DAC, United States, 1996.
[16] S. Chin, S. Wilton. “An Analytical Model Relating FPGA Architecture And Place And Route Runtime”. FPL, Czech Republic, 2009.
[17] R. Gindin, I. Cidon, I. Keidar. “NoC-Based FPGA: Architecture and Routing”. NOCS, United States, 2007.

Questions?