Do We Need Wide Flits in
Networks-On-Chip?
Junghee Lee, Chrysostomos Nicopoulos, Sung Joo Park,
Madhavan Swaminathan and Jongman Kim
Presented by Junghee Lee
Introduction
• Increasing number of cores
→ Communication-centric
→ Packet-based Networks-on-Chip (NoC)
• Units
– Packet: a meaningful unit of the upper-layer protocol
– Flit: the smallest unit of flow control maintained by the NoC
• If a packet is larger than a flit, it is split into multiple flits
• The flit size usually matches the physical channel width
Motivation
Flit sizes (in bits) of existing designs:
• Intel Sandy Bridge: 256
• Intel Single-Chip Cloud: 144
• Tilera: 160
• Research papers: 64 or 128 in some, 256 or 512 in others
What is the optimal flit size in Networks-on-Chip for general-purpose computing?
Multifaceted Factors
A first attempt at drawing a balanced conclusion
[Figure: flit size and the factors that bear on it: global wires, cost of router, latency, throughput, workload]
Assumed NoC Router Architecture
[Figure: router architecture annotated with the parameters d, v, p, and c]
Packet and Flit
[Figure: a packet and its flits, with Header and Payload labeled]
Simulation Environment
Parameter                   Default Value
Simulator                   Simics + GEMS (Garnet)
Benchmark                   PARSEC
Number of processors        64
Operating system            Linux Fedora
L1 cache size               32 KB
L1 cache number of ways     4
L1 cache line size          64 B
L2 cache (shared)           16 MB, 16-way, 128-B line
MSHR size                   32 for I-cache and 32 for D-cache
Main memory                 2 GB SDRAM
Cache coherence protocol    MOESI directory
Topology                    2D mesh
Default NoC Parameters
Parameter                     Default Value
Number of virtual channels    3
Buffer depth                  8 flits per virtual channel
Number of pipeline stages     4
Number of ports               5
Header overhead               16 bits
Key Questions
• Can we afford wide flits as technology scales?
• Is the cost of wide-flit routers justifiable?
• How much do wide flits contribute to overall performance?
• Do memory-intensive workloads need wide flits?
• Do we need wider flits as the number of processing elements increases?
#1) Global Wires
Can we afford wide flits as technology scales?
Item                        Unit           Value
Technology                  nm             65      45      32      22
Chip size*                  mm²            260     260     260     260
Transistors*                MTRs           1106    2212    4424    8848
Global wiring pitch*        nm             290     205     140     100
Power index*                W/(GHz·cm²)    1.6     1.8     2.2     2.7
Total chip power*           W              198     146     158     143
Normalized power portion    -              1.00    1.53    1.66    2.28
Technology scaling does not allow for a direct
widening of the flits because the power portion of the
global wires increases as technology scales
* International Technology Roadmap for Semiconductors (ITRS) 2009 and 2011
#2) Cost of Router
Is the cost of wide-flit routers justifiable?
Cost of buffers ∝ Flit size × Buffer depth × Number of virtual channels
Cost of switch ∝ (Flit size)² × (Number of ports)²
Flit size × 2 → cost of router × 2.97
Flit size × 4 → cost of router × 10.10
[Figure: buffer cost and switch cost versus flit size]
If the performance improvement does not compensate for the increase in cost, widening the flit is hard to justify
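The two proportionalities above can be turned into a rough scaling model. This is only an illustrative sketch: the function name `router_cost_scale` and the 50/50 baseline split between buffer and switch cost are assumptions, not values from the talk, and the 2.97 and 10.10 factors on the slide come from the authors' own evaluation rather than from this model.

```python
def router_cost_scale(width_factor, buffer_share=0.5):
    """Relative router cost after multiplying the flit size by width_factor.

    Assumption (hypothetical): the baseline router cost consists only of
    buffers and the switch, with buffer_share of it in the buffers.
    Buffer cost grows linearly with flit size and switch cost grows
    quadratically; buffer depth, virtual channels, and ports stay fixed.
    """
    if not 0.0 <= buffer_share <= 1.0:
        raise ValueError("buffer_share must be between 0 and 1")
    switch_share = 1.0 - buffer_share
    return buffer_share * width_factor + switch_share * width_factor ** 2


if __name__ == "__main__":
    for factor in (1, 2, 4):
        print(factor, router_cost_scale(factor))
```

With the assumed even split, doubling the flit size gives a factor of 3.0 and quadrupling gives 10.0, in the same range as the reported 2.97 and 10.10.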
#3) Latency
How much do wide flits contribute to overall performance?
• The network traffic usually consists of packets of different sizes
– l_s: the size of the shortest packet
– l_l: the size of the longest packet
[Figure: latency versus flit size, with l_s + h and l_l + h marked on the flit-size axis]
Suggested rule of thumb:
Flit size = shortest packet size + header overhead
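A small sketch of why latency stops improving for short packets once the flit holds the shortest packet plus its header. The packet sizes used here (a 64-bit shortest packet and a 512-bit longest packet, i.e. one 64-B cache line) and the choice to count the header once per packet are assumptions for illustration; only the 16-bit header overhead is taken from the default NoC parameters.

```python
HEADER_BITS = 16  # header overhead from the default NoC parameters


def flits_per_packet(payload_bits, flit_bits, header_bits=HEADER_BITS):
    """Number of flits needed to carry one packet.

    Assumption: the header is added once per packet, so the packet
    occupies payload_bits + header_bits on the channel.
    """
    if flit_bits <= 0:
        raise ValueError("flit_bits must be positive")
    total_bits = payload_bits + header_bits
    return (total_bits + flit_bits - 1) // flit_bits  # integer ceiling


if __name__ == "__main__":
    short_payload = 64   # hypothetical shortest packet
    long_payload = 512   # hypothetical longest packet (64-B cache line)
    for flit_bits in (40, 80, 160, 528):
        print(flit_bits,
              flits_per_packet(short_payload, flit_bits),
              flits_per_packet(long_payload, flit_bits))
```

Under these assumptions the short packet already fits in a single flit at 80 bits, so wider flits only shorten the long packet, which needs 7 flits at 80 bits, 4 at 160 bits, and 1 at 528 bits.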
#4) Workload Characteristics
Do memory-intensive workloads need wide flits?
Application      Cache misses / Kcycle / node    Injected packets / Kcycle / node
Blackscholes     0.41                            2.21
Freqmine         0.28                            1.48
Streamcluster    0.48                            2.42
Vips             0.23                            1.27
X264             0.28                            1.54
Bodytrack        0.67                            3.56
Ferret           0.26                            1.43
Fluidanimate     0.24                            1.35
Swaptions        0.38                            2.04
• The injection rate of real applications is far less than the typical point of NoC saturation
→ Self-throttling effect [34]
• Up to 64 cores, we can keep the rule of thumb because of the low injection rate
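To put the per-kilocycle unit in perspective, the sketch below converts an injection rate into the average gap between packets at one node. The function name `cycles_between_packets` is hypothetical; the two rates are the highest and lowest injected-packet values in the table.

```python
def cycles_between_packets(packets_per_kcycle):
    """Average number of cycles between two injections at one node."""
    if packets_per_kcycle <= 0:
        raise ValueError("packets_per_kcycle must be positive")
    return 1000.0 / packets_per_kcycle


if __name__ == "__main__":
    highest_rate = 3.56  # packets / Kcycle / node, largest value in the table
    lowest_rate = 1.27   # packets / Kcycle / node, smallest value in the table
    for rate in (highest_rate, lowest_rate):
        print(rate, round(cycles_between_packets(rate)))
```

Even at the highest rate in the table, a node injects a packet only about once every 280 cycles.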
#5) Throughput
Do we need wider flits as the number of processing elements increases?
• Widening the flit is not a cost-effective way to increase throughput because of fragmentation: a short packet leaves part of a wide flit unused
• If widening the physical channel is the only option for increasing the throughput, we suggest using physically separated networks
[Figure: latency of one 80-bit network, one 160-bit network, and two 80-bit networks]
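The fragmentation argument can be illustrated with a toy link-level model. This is a sketch under stated assumptions, not the authors' evaluation: packet sizes of 80 and 528 bits (header included) are hypothetical, each network is assumed to move one flit per cycle, each packet travels on exactly one network, and router pipelining and contention are ignored.

```python
def flits_per_packet(packet_bits, flit_bits):
    """Flits needed for a packet whose size already includes the header."""
    return (packet_bits + flit_bits - 1) // flit_bits  # integer ceiling


def link_utilization(packet_bits, flit_bits):
    """Fraction of the transmitted bits that actually belong to the packet."""
    flits = flits_per_packet(packet_bits, flit_bits)
    return packet_bits / (flits * flit_bits)


def packets_per_cycle(packet_bits, network_widths):
    """Peak packets per cycle over parallel networks, one flit per cycle each."""
    return sum(1.0 / flits_per_packet(packet_bits, width)
               for width in network_widths)


if __name__ == "__main__":
    configs = {"one 80-bit": [80], "one 160-bit": [160], "two 80-bit": [80, 80]}
    for packet_bits in (80, 528):  # hypothetical short and long packets
        for name, widths in configs.items():
            print(packet_bits, name, packets_per_cycle(packet_bits, widths))
```

In this model a 160-bit flit carrying an 80-bit packet is only half used, so one 160-bit network moves short packets no faster than one 80-bit network, while two 80-bit networks move twice as many.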
Conclusions
• Can we afford wide flits as technology scales?
– No, unless the power budget for the NoC increases
• Is the cost of wide-flit routers justifiable?
– No, the cost increases sharply with the flit size
• How much do wide flits contribute to overall performance?
– Only until the flit size reaches the shortest packet size
• Do memory-intensive workloads need wide flits?
– No, because of the self-throttling effect
• Do we need wider flits as the number of processing elements increases?
– No, because of fragmentation
Final Conclusion
• Suggested rule of thumb:
Flit size = shortest packet size + header overhead
• This paper provides a comprehensive discussion on all
key aspects pertaining to the NoC’s flit size
• This exploration could serve as a quick reference for the
designers/architects of general-purpose multi-core
microprocessors who need to decide on an appropriate
flit size for their design.
Thank you!
Questions?
Contact info
Junghee Lee
junghee.lee@gatech.edu
Electrical and Computer Engineering
Georgia Institute of Technology