An Analysis of Network-on-Chip

Bachelor of Applied Science Thesis Defense
An Analysis of Network-on-Chip Implementations on Field Programmable Gate Arrays
Kevan Thompson
Computer Engineering
School of Engineering Science, SFU
Overview
Introduction
Background
Methodology
Results
Conclusions and Future Work
Introduction
Size of Xilinx FPGAs
[Figure: number of logic cells (axis 0 to 2,500,000) versus year (2001 to 2012) for the Virtex-2P, Virtex-4, Virtex-5, Virtex-6, and Virtex-7 families.]
ASIC Vs FPGA
ASIC:
• Completely custom design
• Large initial investment
• Need to carefully design interconnect between nodes
FPGA:
• Reconfigurable
• Low cost for small volume runs
• Wires already placed on the FPGA
Objective
• Improvements in the Xilinx tools that have significantly affected the performance of NoCs on FPGAs
• Improvements in NoC performance on FPGAs that are possible using manual PAR
• The Star and Fully Connected topologies do not fit into current models
NoC Terminology
• Topology
• Node
• Degree
• Average Node Degree (AND)
[Figure: example topologies: Ring, Mesh, Star, Fully Connected.]
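The terms above can be made concrete with a small sketch. It assumes the usual graph definitions, which the slide does not spell out: a node's degree is the number of links attached to it, each bidirectional link is counted once, and the average node degree (AND) is the mean degree over all nodes. The function names are hypothetical, and counting the Star hub as an ordinary node is an assumption.

```python
def ring_links(n):
    """Links of an n-node ring (n >= 3): node i connects to node i+1, wrapping around."""
    return {tuple(sorted((i, (i + 1) % n))) for i in range(n)}


def star_links(n):
    """Links of an n-node star; node 0 is assumed to be the hub."""
    return {(0, i) for i in range(1, n)}


def fully_connected_links(n):
    """Links of an n-node fully connected network: every pair of nodes is linked."""
    return {(i, j) for i in range(n) for j in range(i + 1, n)}


def mesh_links(rows, cols):
    """Links of a rows x cols mesh without wrap-around."""
    links = set()
    for r in range(rows):
        for c in range(cols):
            node = r * cols + c
            if c + 1 < cols:
                links.add((node, node + 1))      # link to right-hand neighbour
            if r + 1 < rows:
                links.add((node, node + cols))   # link to neighbour below
    return links


def average_node_degree(n, links):
    """Each link adds one to the degree of both endpoints, so AND = 2 * links / nodes."""
    return 2 * len(links) / n


print("ring-8  AND:", average_node_degree(8, ring_links(8)))
print("star-8  AND:", average_node_degree(8, star_links(8)))
print("full-8  AND:", average_node_degree(8, fully_connected_links(8)))
print("mesh-16 AND:", average_node_degree(16, mesh_links(4, 4)))
```

Under these assumptions the Ring stays at an AND of 2 regardless of size, while the Fully Connected network's AND grows with the node count.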
Previous Work on NoCs on FPGAs
For Xilinx FPGAs:
Methodology
[Figure: node block diagram: Input FSL, Multiplier Node, Network Switch, Output FSL.]
• 8-bit multiplier node
• Two Fast Simplex Links (FSLs)
• Network topology communication switch
• FSLs: 16-word-deep queues, 24-bit width
• Multiplier uses 981 flip-flops and 653 LUTs
• FPGA: Xilinx Virtex-5 xc5vlx330
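As a rough sanity check on how many multiplier nodes the device can hold, the per-node figures above can be scaled by the node count. This is only a sketch: the device totals (207,360 flip-flops and 207,360 LUTs for the xc5vlx330) are an assumption taken from the Virtex-5 family overview [4] rather than from these slides, and the resources used by the switches and FSLs are ignored.

```python
MULTIPLIER_FFS = 981      # flip-flops per multiplier node (from the slide)
MULTIPLIER_LUTS = 653     # LUTs per multiplier node (from the slide)

# Assumed device totals for the xc5vlx330; not stated on the slide.
DEVICE_FFS = 207_360
DEVICE_LUTS = 207_360


def multiplier_utilization(nodes):
    """Return (flip-flop fraction, LUT fraction) used by the multipliers alone."""
    ff_fraction = nodes * MULTIPLIER_FFS / DEVICE_FFS
    lut_fraction = nodes * MULTIPLIER_LUTS / DEVICE_LUTS
    return ff_fraction, lut_fraction


for nodes in (8, 16, 32, 48, 64, 96):
    ffs, luts = multiplier_utilization(nodes)
    print(f"{nodes:3d} nodes: {ffs:6.1%} of flip-flops, {luts:6.1%} of LUTs")
```

The real utilization is higher than this estimate because each node also needs its network switch and two FSL queues.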
Results
10.1 Tools Vs 12.1 Tools
• Star, Ring, and Fully Connected Networks
Predicted Vs Measured Results
• Star and Fully Connected Networks
Manual Implementation
• Ring, Star, and Mesh Networks
10.1 Tools Vs 12.1 Tools for Star Networks
[Figure: maximum frequency (MHz, axis 0 to 250) versus number of nodes (8, 16, 32, 48, 64); series: 10.1 Tools, 12.1 Tools.]
10.1 Tools Vs 12.1 Tools for Ring Networks
[Figure: maximum frequency (MHz, axis 192 to 210) versus number of nodes (8, 16, 32, 48, 64); series: 10.1 Tools, 12.1 Tools.]
10.1 Tools Vs 12.1 Tools for Fully Connected Networks
[Figure: maximum frequency (MHz, axis 0 to 200) versus number of nodes (8, 16, 24, 32, 40, 48); series: 10.1 Tools, 12.1 Tools.]
Percent Improvement of 12.1 Tools Over 10.1 Tools
[Figure: percent increase (%, axis 0.0% to 80.0%) versus number of nodes (8, 16, 32, 48, 64); series: Star, Ring, Fully Connected.]
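The percent improvement plotted here can be computed as a relative change in maximum frequency. This minimal sketch assumes the improvement is measured relative to the 10.1 result, which the slide does not state explicitly. The function name and the example frequencies are hypothetical and are not measurements from this work.

```python
def percent_improvement(f_old_mhz, f_new_mhz):
    """Relative increase of f_new_mhz over f_old_mhz, in percent."""
    if f_old_mhz <= 0:
        raise ValueError("baseline frequency must be positive")
    return (f_new_mhz - f_old_mhz) / f_old_mhz * 100.0


# Hypothetical example values, for illustration only.
baseline_10_1 = 100.0   # MHz with the 10.1 tools (made-up number)
result_12_1 = 150.0     # MHz with the 12.1 tools (made-up number)
print(percent_improvement(baseline_10_1, result_12_1))
```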
Star Networks
Predicted Vs Measured Results
[Figure: maximum frequency (MHz, axis 0 to 250) versus number of nodes (8, 16, 32, 48, 64, 80, 96); series: Measured Results, Predicted Results. Equation shown on the plot: y = -0.3090x + 203.8.]
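The linear equation shown on the plot can be evaluated at the plotted node counts. This sketch assumes x is the number of nodes and y is the maximum frequency in MHz, as the axes suggest. The slide does not say which data series the line belongs to, and the function name is hypothetical.

```python
def line_on_plot_mhz(nodes):
    """Evaluate y = -0.3090x + 203.8, with x assumed to be the node count."""
    return -0.3090 * nodes + 203.8


for nodes in (8, 16, 32, 48, 64, 80, 96):
    print(f"{nodes:3d} nodes: {line_on_plot_mhz(nodes):.1f} MHz")
```

Read this way, the line loses about 0.31 MHz per added node, or roughly 27 MHz between 8 and 96 nodes.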
Results
Results for Adjusted Model
[Figure: Adjusted Predicted Vs Measured Results. Maximum frequency (MHz, axis 0 to 250) versus number of nodes (8, 16, 32, 48, 64, 80, 96); series: Measured Results, Predicted Results.]
Comparison of Models
[Figure: Percent Difference Between Predicted and Measured Results. Percent difference (%, axis -20 to 60) versus number of nodes (8, 16, 32, 48, 64, 80, 96); series: Original Model, Adjusted Model.]
Prediction of Adjusted Model for Random Networks
[Figure: Percent Error for Random Networks. Percent error (%, axis -70 to 0) versus average node degree (axis 0 to 10); series: Random_16, Random_32, Random_48.]
Fully Connected Networks
[Figure: Predicted Vs Measured Results for Fully Connected Networks. Maximum frequency (MHz, axis -300 to 250) versus number of nodes (8, 16, 24, 32, 40); series: Measured, Predicted.]
Results
Interpolated Results for Fully Connected Networks
[Figure: maximum frequency (MHz, axis 0 to 200) versus number of nodes (axis 0 to 60).]
CAD Tool Synthesis Steps
1. Behavioural-level Synthesis [14]: HDL is parsed for recognizable constructs
2. Technology Mapping [15]: Constructs are mapped to the specific FPGA's technology
3. Placement [16]: Components of the design are placed on the FPGA using Simulated Annealing
4. Routing [16]: Wires are connected between the components, using an algorithm called Pathfinder
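The placement step can be illustrated with a toy simulated annealing placer. This is a sketch under stated assumptions, not the Xilinx placer: it places nodes on a small grid, uses total Manhattan wire length as the only cost, swaps two random sites per move, and cools geometrically. All names and parameter values are hypothetical.

```python
import math
import random


def wirelength(placement, links):
    """Total Manhattan distance over all links; placement maps node -> (x, y)."""
    return sum(abs(placement[a][0] - placement[b][0]) +
               abs(placement[a][1] - placement[b][1]) for a, b in links)


def anneal_placement(n_nodes, links, grid_w, grid_h,
                     steps=5000, t_start=5.0, t_end=0.01, seed=0):
    """Return (best placement, best cost, initial cost) for a toy grid placement."""
    if n_nodes > grid_w * grid_h:
        raise ValueError("grid is too small for the requested number of nodes")
    rng = random.Random(seed)
    sites = [(x, y) for y in range(grid_h) for x in range(grid_w)]
    # occupant[s] is the node sitting on site s, or None if the site is empty.
    occupant = list(range(n_nodes)) + [None] * (len(sites) - n_nodes)
    rng.shuffle(occupant)

    def placement_of(occ):
        return {node: sites[s] for s, node in enumerate(occ) if node is not None}

    cost = wirelength(placement_of(occupant), links)
    initial_cost = cost
    best_cost, best = cost, placement_of(occupant)
    for step in range(steps):
        # Geometric cooling from t_start towards t_end.
        temperature = t_start * (t_end / t_start) ** (step / steps)
        a, b = rng.sample(range(len(sites)), 2)
        occupant[a], occupant[b] = occupant[b], occupant[a]   # propose a swap
        new_cost = wirelength(placement_of(occupant), links)
        delta = new_cost - cost
        # Always accept improvements; accept worse moves with probability exp(-delta/T).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            cost = new_cost
            if cost < best_cost:
                best_cost, best = cost, placement_of(occupant)
        else:
            occupant[a], occupant[b] = occupant[b], occupant[a]   # undo the swap
    return best, best_cost, initial_cost


# A 16-node ring placed on a 4 x 4 grid.
ring = [(i, (i + 1) % 16) for i in range(16)]
best, best_cost, initial_cost = anneal_placement(16, ring, 4, 4)
print("wire length:", initial_cost, "->", best_cost)
```

Accepting some worsening moves at high temperature lets the search escape poor local arrangements. Production placers also account for timing and routability, which this sketch ignores.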
[Figure: Automatic PAR of a 96-Node Ring Network.]
[Figure: Manual PAR of a 96-Node Ring Network.]
Ring Network Pre and Post PlanAhead Results
[Figure: maximum frequency (MHz, axis 0 to 250) versus number of nodes (8, 16, 32, 48, 64, 80, 96, 128); series: Pre-PlanAhead, Post-PlanAhead.]
Star Network Pre and Post PlanAhead Results
[Figure: maximum frequency (MHz, axis 0 to 250) versus number of nodes (8, 16, 32, 48, 64, 80, 96); series: Pre-PlanAhead, Post-PlanAhead.]
Mesh Network Pre and Post PlanAhead Results
[Figure: maximum frequency (MHz, axis 175 to 210) versus number of nodes (8, 16, 32, 48); series: Pre-PlanAhead, Post-PlanAhead.]
Conclusions
• Xilinx 12.1 Tools offer significant improvements in the PAR of NoCs on FPGAs
• The analytical model proposed by Lee [1] does not accurately predict the performance of Star and Fully Connected Networks
• Using manual PAR, it is possible to improve the performance of NoCs on FPGAs
Future Work
• Compare the performance of the Xilinx 10.1 tool suite and the Xilinx 12.1 tool suite for link widths of 16 and 32 bits
• Build Star and Fully Connected networks with link widths of 16 and 32 bits
• Create manual implementations for Torus and Hypercube topologies
Acknowledgements
Dr. Lesley Shannon
Dr. Ash Parameswaran
Michael Sjoerdsma
Viewers Like you!
References
[1] J. Lee. “An Analytical Model Describing The Performance Of Application-Specific
Networks-On-Chip On Field-Programmable Gate Arrays” M.A.Sc. thesis, Simon
Fraser University, Canada, 2007.
[2] Xilinx. “Virtex-II Pro and Virtex-II Pro X Platform FPGAs: Complete Data Sheet”.
2010. Available:
http://www.xilinx.com/support/documentation/data_sheets/ds083.pdf
[3] Xilinx. “Virtex-4 Family Overview”. 2010. Available:
http://www.xilinx.com/support/documentation/data_sheets/ds112.pdf
[4] Xilinx. “Virtex-5 Family Overview”. 2010. Available:
http://www.xilinx.com/support/documentation/data_sheets/ds100.pdf
[5] Xilinx. “Virtex-6 Family Overview”. 2010. Available:
http://www.xilinx.com/support/documentation/data_sheets/ds150.pdf
[6] Xilinx. “Virtex-7 Product Table”. 2010. Available:
http://www.xilinx.com/publications/prod_mktg/Virtex7-Product-Table.pdf
[7] Xilinx. “What's New in Xilinx ISE Design Suite 12”. 2010. Available:
http://www.xilinx.com/support/documentation/sw_manuals/xilinx12_1/whatsnew.htm#121
[8] Cisco Systems Inc. “Fiber Distributed Data Interface”. 2010. Available:
http://docwiki.cisco.com/wiki/Fiber_Distributed_Data_Interface
[9] Cisco Systems Inc. “Token Ring/IEEE 802.5”. 2010. Available:
http://docwiki.cisco.com/wiki/Token_Ring/IEEE_802.5
[10] Cisco Systems Inc. “Ethernet Technologies”. 2010. Available:
http://docwiki.cisco.com/wiki/Ethernet_Technologies
[11] Kompics. “Distributed System Launcher”. 2010. Available:
http://kompics.sics.se/trac/wiki/DistributedSystemLauncher
[12] T. Kranenburg, R. van Leuken. “MB-LITE: A robust, light-weight soft-core
implementation of the MicroBlaze architecture”, DATE, France, 2010.
[13] K. Eguro, S. Hauck, A. Sharma. “Architecture-Adaptive Range Limit
Windowing for Simulated Annealing FPGA Placement”, DAC, United States,
2005.
[14] G. Grewal, M. O’Cleirigh, M. Wineberg. “An Evolutionary Approach to
Behavioral-Level Synthesis”, CEC, Australia, 2003.
[15] C. Legl, B. Wurth, K. Eckl. “A Boolean Approach to Performance-Directed
Technology Mapping for LUT-Based FPGA Designs”, DAC, United States,
1996.
[16] S. Chin, S. Wilton. “An Analytical Model Relating FPGA Architecture And
Place And Route Runtime”, FPL, Czech Republic, 2009.
[17] R. Gindin, I. Cidon, I. Keidar. “NoC-Based FPGA: Architecture and Routing”,
NOCS, United States, 2007.
Questions?