slides - Ran Manevich

Handling Global Traffic in Future CMP NoCs
Ran Manevich, Israel Cidon, and Avinoam Kolodny
Electrical Engineering Department
Technion – Israel Institute of Technology
Haifa, Israel
SLIP 2012
QNoC Research Group
Bandwidth Version of Rent’s Rule
B – Cluster external bandwidth.
k – Average bandwidth per module.
G – Number of modules in a cluster.
R – Rent’s exponent, 0<R<1.
$B = k \cdot G^{R}$
[Figure: example cluster with G = 16.]
Greenfield et al., “Implications of Rent’s Rule for NoC Design and Its Fault-Tolerance”, NOCS 2007
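
The relation between the four symbols above can be sketched numerically. This is a minimal illustration that assumes the usual power-law form of the rule, B = k · G^R; the function name and the sample values of k and G are hypothetical and not taken from the slides.

```python
def external_bandwidth(k: float, g: int, r: float) -> float:
    """Cluster external bandwidth B = k * G**R (bandwidth version of Rent's rule).

    k: average bandwidth per module
    g: number of modules in the cluster
    r: Rent's exponent, 0 < R < 1
    """
    if g < 1:
        raise ValueError("G must be at least 1")
    if not 0.0 < r < 1.0:
        raise ValueError("Rent's exponent must satisfy 0 < R < 1")
    return k * g ** r


if __name__ == "__main__":
    # Hypothetical values: k = 1.0 bandwidth unit per module, R = 0.7.
    for g in (1, 4, 16, 64):
        print(f"G = {g:3d}  ->  B = {external_bandwidth(1.0, g, 0.7):.3f}")
```

Because R < 1, B grows more slowly than G: a smaller exponent means a larger share of the traffic stays inside the cluster.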
Rent’s Exponent Reflects Traffic Locality
CMP NoC Traffic Follows Rent’s Rule
[Figure labels: 2D Mesh NoC; ~Average of CMP parallel programs*]
* Heirman et al., “Rent’s Rule and Parallel Programs: Characterizing Network Traffic Behaviour”, SLIP 2008
2D Mesh – Packets Classification by Distance

For illustration purposes, packets are classified according to distances between sources and destinations.
• Nearest Neighbor (NN) – Dist = 1
• Local – 1 < Dist < 2 + K/8
• Global – Dist ≥ 2 + K/8
[Figure: examples for K = 8 and K = 16.]
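
The three classes can be expressed as a short function. This is a sketch only: it assumes that Dist is the hop (Manhattan) distance between source and destination in a K×K mesh, and the function names are hypothetical.

```python
def manhattan_distance(src: tuple[int, int], dst: tuple[int, int]) -> int:
    """Hop distance between two mesh nodes (assumed meaning of Dist)."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def classify_packet(dist: int, k: int) -> str:
    """Classify a packet as NN, Local, or Global for a K x K mesh."""
    if dist < 1:
        raise ValueError("source and destination must differ")
    threshold = 2 + k / 8          # boundary between Local and Global
    if dist == 1:
        return "NN"                # Nearest Neighbor: Dist = 1
    if dist < threshold:
        return "Local"             # 1 < Dist < 2 + K/8
    return "Global"                # Dist >= 2 + K/8


if __name__ == "__main__":
    for k in (8, 16):
        classes = [classify_packet(d, k) for d in range(1, 6)]
        print(f"K = {k}: {classes}")
```

For K = 8 the threshold is 3, so only Dist = 2 is Local; for K = 16 the threshold is 4, so Dist = 2 and Dist = 3 are Local.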
Fraction of global packets decreases in large systems
Rent’s exponent (R) = 0.7
[Figure legend: Nearest Neighbor]
Dominance of Global Packets in BW/Router and Light Load Latency
Nearest Neighbor traffic is dominant in small systems.*
In large systems:
1. Global packets are a minority.
2. Global packets dominate BW/router and average latency.
* Zarkesh-Ha et al., “Hybrid Network on Chip (HNoC): local buses with global mesh architecture”, SLIP 2010
Problem!!!
In large systems, global packets (a minority):
• Consume most of the network’s BW.
• Significantly increase average light-load latency.
Solution - PyraMesh


• Hierarchical 2D mesh.
• Global packets are routed through higher hierarchy levels.
• Overall hop count is reduced.
• Average latency is reduced.
• Average BW per router is reduced.
[Figure: example route from Source to Dest. through a higher level, with hops numbered 1 to 8; annotations: “4 hops”, “instead of 14!”]
PyraMesh - Architecture
K – The size of the base mesh.
NL – Number of levels.
NP – Number of pyramids on top of the base mesh.
αi – Ratio between the sizes of levels i and i+1.
Ci – Number of routers in level i that are connected to a router in level i+1 along a single dimension.
Example configurations (figures):
• K = 8, NL = 2, NP = 1, αi = 4, Ci = 2
• K = 8, NL = 3, NP = 1, αi = 2, Ci = 1
• K = 8, NL = 2, NP = 4, αi = 4, Ci = 1
PyraMesh – Addressing and Routing

Addressing – On each level i, node (X,Y) of the base mesh is represented by the nearest router in the North-East quarter:
$$\mathrm{Address}_{i,(X,Y)} = \left( \left\lceil \frac{X}{at_i} \right\rceil , \left\lceil \frac{Y}{at_i} \right\rceil \right); \qquad at_i = \prod_{m=1}^{i-1} \alpha_m$$
Routing –
XY:
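
The addressing rule and XY routing can be sketched together. This is an illustration under stated assumptions, not the authors' implementation: base-mesh coordinates are taken to be 1-based, the address on level i is taken to be (⌈X/at_i⌉, ⌈Y/at_i⌉) with at_i the product of α_m for m = 1 … i−1, and XY routing is taken in its common dimension-order form (X first, then Y). All function names are hypothetical.

```python
from math import prod


def cumulative_ratio(alphas: list[int], level: int) -> int:
    """at_i = alpha_1 * ... * alpha_(i-1); at_1 = 1 for the base mesh."""
    if not 1 <= level <= len(alphas) + 1:
        raise ValueError("level out of range for the given ratios")
    return prod(alphas[: level - 1])


def level_address(x: int, y: int, level: int, alphas: list[int]) -> tuple[int, int]:
    """Address of base-mesh node (x, y) on the given level (1-based coordinates)."""
    at = cumulative_ratio(alphas, level)
    # -(-a // b) is integer ceiling division: rounds toward the North-East router.
    return (-(-x // at), -(-y // at))


def xy_route(src: tuple[int, int], dst: tuple[int, int]) -> list[tuple[int, int]]:
    """Dimension-order routing: travel along X until aligned, then along Y."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


if __name__ == "__main__":
    # K = 8, NL = 2, alpha_1 = 4: base node (5, 3) maps to level-2 router (2, 1).
    print(level_address(5, 3, 2, [4]))
    print(xy_route((1, 1), (3, 2)))
```

The number of hops on a level is the length of the returned path minus one.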
PyraMesh – Packets Classification

Packets are distributed among levels i according to their travel distance (D) in the base mesh.
DThi – Distance threshold of level i.
If D > DThi, the packet is directed to level i+1.
Example: DThi = 6, 12, 20

Travel Distance    Highest Level
D > 20             4
12 < D ≤ 20        3
6 < D ≤ 12         2
D ≤ 6              1 (Base Mesh)
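
The threshold rule amounts to counting how many thresholds the travel distance strictly exceeds. A minimal sketch, with a hypothetical function name and the thresholds assumed to be given in ascending order:

```python
from bisect import bisect_left


def highest_level(distance: int, thresholds: list[int]) -> int:
    """Highest level used by a packet with travel distance D (1 = base mesh).

    thresholds[i - 1] is DTh_i; a packet with D > DTh_i is directed to level i + 1.
    """
    if sorted(thresholds) != list(thresholds):
        raise ValueError("thresholds must be in ascending order")
    # bisect_left returns the number of thresholds strictly smaller than D.
    return bisect_left(thresholds, distance) + 1


if __name__ == "__main__":
    dth = [6, 12, 20]  # the example thresholds from the slide
    for d in (3, 6, 7, 12, 13, 20, 21):
        print(f"D = {d:2d}  ->  highest level {highest_level(d, dth)}")
```

With the thresholds 6, 12, 20 this reproduces the table: D ≤ 6 stays in the base mesh, and D > 20 reaches level 4.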
PyraMesh – Optimization
Area overhead, wiring overhead, maximum bandwidth per router*, average light-load latency* = F(K, NL, NP, αi, Ci, DThi*, R*)
[Diagram: these metrics serve as the CONSTRAINTS and OBJECTIVES of the OPTIMIZATION.]
Optimization Results Example of 16x16 System, R = 0.7

Light load latency optimized PyraMesh:
Packets distance thresholds: D>8, 5<D≤8, D≤5

Throughput optimized PyraMesh:
Packets distance thresholds: D>18, 6<D≤18, D≤6
Light Load Latency Performance
BMesh – The baseline mesh.
Scaled Mesh (SMesh) – Links wider than in BMesh by the PyraMesh area overhead factor.
HNoC – Hybrid Network on Chip (Zarkesh-Ha et al., SLIP 2010).
Throughput Results, R = 0.7
Our Contributions
• Characterization of Rentian traffic in large NoCs.
• The observation that global packets limit scalability of large systems.
• PyraMesh – A novel framework for hierarchical NoC design.
Conclusions



• Global packets limit performance in large (future) CMP systems.
• PyraMesh – A novel class of hierarchical 2D mesh topologies.
• PyraMesh handles global traffic in future CMP NoCs.
Thank You!
Related Work
• CMesh – J. Balfour and W. J. Dally. “Design tradeoffs for tiled CMP on-chip networks”. International Conference on Supercomputing, 2006.
• Hierarchical Rings on a 2D Mesh – S. Bourduas and Z. Zilic. “Latency reduction of global traffic in wormhole-routed meshes using hierarchical rings for global routing”. ASAP 2007.
• GigaNoC – D. Puttmann, J.-C. Niemann, M. Porrmann, and U. Rückert. “GigaNoC – A hierarchical network-on-chip for scalable chip-multiprocessors.” Euromicro DSD 2007.
• 2-Levels Hierarchical Mesh – Markus Winter, Steffen Prusseit, and Gerhard P. Fettweis. “Hierarchical routing architectures in clustered 2D-mesh networks-on-chip”. ISOCC 2010.
• Long Range Links – U. Y. Ogras and R. Marculescu. “‘It’s a small world after all’: NoC performance optimization via long-range link insertion”. IEEE Trans. on Very Large Scale Integr. (VLSI) Syst. 2006.