Uploaded by wang Bruch

Garnet2.0-Tutorial gem5-workshop ARM-Rsh-Summit2017

advertisement
Garnet2.0:
A Detailed On-Chip
Network Model Inside a
Full-System Simulator
Tushar Krishna
gem5 workshop
ARM Research Summit
September 11, 2017
Assistant Professor
School of ECE and CS
Georgia Institute of Technology
tushar@ece.gatech.edu
http://synergy.ece.gatech.edu/tools/garnet
Overview
u
What are Networks-On-Chip (NoCs)?
u
Why model NoCs accurately
u
Garnet2.0
u
u
u
u
Configuration
u
Topology
u
Routing
u
Flow-Control
u
Router Microarchitecture
System Integration
u
Network Interface
u
Network Parameters
u
Running Garnet with Ruby
u
Ruby Garnet Standalone
Output Stats
Extensions and FAQs
Garnet2.0 Tutorial | ARM Research Summit 2017
Tushar Krishna | Georgia Institute of Technology
September 11, 2017
2
Overview
u
What are Networks-On-Chip (NoCs)?
u
Why model NoCs accurately
u
Garnet2.0
u
u
u
u
Configuration
u
Topology
u
Routing
u
Flow-Control
u
Router Microarchitecture
System Integration
u
Network Interface
u
Network Parameters
u
Running Garnet with Ruby
u
Ruby Garnet Standalone
Output Stats
Extensions and FAQs
Garnet2.0 Tutorial | ARM Research Summit 2017
Tushar Krishna | Georgia Institute of Technology
September 11, 2017
3
Networks-on-Chip
Garnet2.0 Tutorial | ARM Research Summit 2017
Tushar Krishna | Georgia Institute of Technology
September 11, 2017
4
Introduction to NoCs
Core
+ L1$
Core
+ L1$
Core
+ L1$
Core
+ L1$
A
Core
+ L1$
Core
+ L1$
LD (A) R1
Rsp
Req
On-Chip Network
Fwd
L2$
L2$
Garnet2.0 Tutorial | ARM Research Summit 2017
L2$
L2$
Tushar Krishna | Georgia Institute of Technology
L2$
L2$
September 11, 2017
5
Modern NoCs
Core
L1
D$
L1
I$
L2$
L3$/Directory
Network
Interface
Router
Network Interface converts cache
messages (ctrl or data) into packets.
Packets get broken down into one or
more flits depending on NoC link width
“Tile”
Garnet2.0 Tutorial | ARM Research Summit 2017
Tushar Krishna | Georgia Institute of Technology
September 11, 2017
6
Network Architecture
u Topology
u Routing
u Flow
Control
u Router
Microarchitecture
Garnet2.0 Tutorial | ARM Research Summit 2017
Tushar Krishna | Georgia Institute of Technology
September 11, 2017
7
Topology:
How to connect nodes with links
~Road Network
Garnet2.0 Tutorial | ARM Research Summit 2017
Tushar Krishna | Georgia Institute of Technology
September 11, 2017
8
Routing:
Which path should a message take
~Series of road segments from source to destination
Garnet2.0 Tutorial | ARM Research Summit 2017
Tushar Krishna | Georgia Institute of Technology
September 11, 2017
9
Flow Control:
When does a message stop/proceed
~Traffic Signals / Stop signs at end of each road segment
Garnet2.0 Tutorial | ARM Research Summit 2017
Tushar Krishna | Georgia Institute of Technology
September 11, 2017
10
Router Microarchitecture:
How to build the routers
~Design of traffic intersection (number of lanes,
algorithm for turning red/green)
Garnet2.0 Tutorial | ARM Research Summit 2017
Tushar Krishna | Georgia Institute of Technology
September 11, 2017
11
Overview
u
What are Networks-On-Chip (NoCs)?
u
Why model NoCs accurately
u
Garnet2.0
u
u
u
u
Configuration
u
Topology
u
Routing
u
Flow-Control
u
Router Microarchitecture
System Integration
u
Network Interface
u
Network Parameters
u
Running Garnet with Ruby
u
Ruby Garnet Standalone
Output Stats
Extensions and FAQs
Garnet2.0 Tutorial | ARM Research Summit 2017
Tushar Krishna | Georgia Institute of Technology
September 11, 2017
12
Why model NoCs accurately?
1.6
Full-state Directory
HyperTransport
Case Study II:
Private vs. Shared L2
Token Coherence
1.4
1.2
Normalized Runtime
Normalized Runtime
Case Study I:
Directory vs. Broadcast Protocols
1.2
1
0.8
0.6
0.4
0.2
0
Baseline NoC
(Intel SCC)
FANOUT + FANIN NoC
SMART NoC
(Krishna et al, MICRO 2011) (Krishna et al, HPCA 2013)
Shared L2
Private L2
1
0.8
0.6
0.4
0.2
0
64-core CMP with different NoC Microarchitectures
Baseline NoC
SMART NoC
(Intel SCC)
(Krishna et al, HPCA 2013)
64-core CMP with different NoC Microarchitectures
Different NoC Microarchitectures may lead
to different microarchitectural decisions
and new design optimization opportunities
64-core CMP running PARSEC workloads in full-system gem5. Average runtime plotted.
Garnet2.0 Tutorial | ARM Research Summit 2017
Tushar Krishna | Georgia Institute of Technology
September 11, 2017
13
Overview
u
What are Networks-On-Chip (NoCs)?
u
Why model NoCs accurately
u
Garnet2.0
u
u
u
u
Configuration
u
Topology
u
Routing
u
Flow-Control
u
Router Microarchitecture
System Integration
u
Network Interface
u
Network Parameters
u
Running Garnet with Ruby
u
Ruby Garnet Standalone
Output Stats
Extensions and FAQs
Garnet2.0 Tutorial | ARM Research Summit 2017
Tushar Krishna | Georgia Institute of Technology
September 11, 2017
14
Garnet2.0
u Detailed
NoC Model
u Currently
part of Ruby Memory System in gem5
u Original version (5-stage pipeline) released in 2009
u Developed
by Niket Agarwal (currently at Google) and myself
u New
version (1-stage pipeline, more configurability)
released in 2016
u Resources
u Source:
src/mem/ruby/network/garnet2.0
u gem5 wiki page: www.gem5.org/garnet2.0
u Dev patches + practice labs:
http://synergy.ece.gatech.edu/garnet
Garnet2.0 Tutorial | ARM Research Summit 2017
Tushar Krishna | Georgia Institute of Technology
September 11, 2017
15
Topology
Each topology is a python file
in configs/topologies/
Dir
Dir
Core
+ L1$
ExtLink
(bi-directional)
L2$
DMA
R
L2$
Core
+ L1$
L2$
L2$
IntLink
(uni-directional)
R
Core
+ L1$
Core
+ L1$
R
R
L2$
L2$
Dir
Garnet2.0 Tutorial | ARM Research Summit 2017
Core
+ L1$
Step 2: Instantiate “Int Links”
and connect to routers in
desired topology
This is an example of
MeshDirCorners_XY.py
R
R
Step 1: Instantiate routers and
connect to controllers
via “Ext Links”
Dir
Core
+ L1$
Tushar Krishna | Georgia Institute of Technology
September 11, 2017
16
Topology Configurable Parameters
u
Router
u
u
router_latency (in cycles)
u
Can be set per router
u
Defined in src/mem/ruby/network/BasicRouter.py
Link
u
u
link_latency (in cycles)
u
Can be set per link
u
Defined in src/mem/ruby/network/BasicLink.py
weight (i.e., link weight)
u
u
u
To bias routing algorithm [later slides]
src_outport (string) and dst_inport (string)
u
Port direction (e.g., “East”)
u
Helps with readability of Config file + Adaptive routing algorithms
bw_multiplier (value)
u
Used by Ruby’s simple network model, NOT by Garnet
u
Link bandwidth is set inside Garnet via the ni_flit_size parameter [later slides]
Garnet2.0 Tutorial | ARM Research Summit 2017
Tushar Krishna | Georgia Institute of Technology
September 11, 2017
17
Default Topologies Supported
u Pt2Pt
configs/topologies/YourTopology.py
u Crossbar
u Mesh
u Mesh_XY
u Mesh_westfirst
u MeshDirCorners_XY
u Cluster
Garnet2.0 Tutorial | ARM Research Summit 2017
Tushar Krishna | Georgia Institute of Technology
September 11, 2017
18
Routing
u Routing
table
u Automatically
populated based on topology
u All messages use shortest path
u In
case of multiple options, the path with the smaller
weight is chosen
u Deterministic Routing
A custom routing algorithm can be selected from the command line by setting --routing-algorithm=2.
See configs/network/Network.py and src/mem/ruby/network/garnet2.0/GarnetNetwork.py
u Custom
u Users
can leverage outport/inport direction names
associated with each port to implement custom
algorithms (say adaptive)
Garnet2.0 Tutorial | ARM Research Summit 2017
Tushar Krishna | Georgia Institute of Technology
September 11, 2017
19
Deadlock Avoidance
u
Deadlock: A condition in which a set of agents wait
indefinitely trying to acquire a set of resources
D
C
u
u
u
u
u
0
1
A
x
3
w
v
2
B
Packet A holds buffer u (in 1) and wants buffer v (in 2)
Packet B holds buffer v (in 2) and wants buffer w (in 3)
Packet C holds buffer w (in 3) and wants buffer x (in 0)
Packet D holds buffer x (in 0) and wants buffer u (in 1)
Garnet2.0 Tutorial | ARM Research Summit 2017
Tushar Krishna | Georgia Institute of Technology
September 11, 2017
20
Deadlock-free Routing Algorithms
u
Deadlocks may occur if the turns taken form a cycle
u
Removing some turns can make the routing algorithm
deadlock free
XY Model
Garnet2.0 Tutorial | ARM Research Summit 2017
West-First Turn Model
Tushar Krishna | Georgia Institute of Technology
September 11, 2017
21
Deadlock-free Routing Algorithms
u Assign
weights to bias which links used first
(to ensure no cyclic dependence)
XY Routing: Always go X first, then Y
DC
DA
DB
2
2
SA
SB
1
1
1
1
2
2
1
1
1
1
2
SC
2
2
2
2
2
1
1
1
1
2
2
Mesh_XY
Garnet2.0 Tutorial | ARM Research Summit 2017
Tushar Krishna | Georgia Institute of Technology
September 11, 2017
22
Deadlock-free Routing Algorithms
u Assign
weights to bias which links used first
(to ensure no cyclic dependence)
West-first Routing: Go W first, then N/S
DC
DA
DB
2
2
SA
SB
1
1
2
2
2
2
1
1
2
2
2
SC
2
2
2
1
1
2
2
2
2
2
2
Mesh_westfirst
Garnet2.0 Tutorial | ARM Research Summit 2017
Tushar Krishna | Georgia Institute of Technology
September 11, 2017
23
Flow Control
u
Virtual Channels
u
Coherence Protocol requires certain number of virtual
networks / message classes to avoid protocol deadlocks
u This
u
Within each vnet, there can be more than one VC for
boosting network performance
u In
u
is the minimum number of VCs required
Garnet, only one packet can use a VC inside a router at a time
u VCs
in vnets carrying control messages are 1-flit deep
u VCs
in vnets carrying data (cacheline) messages are 4-5 flit deep
Credits
u
Each VC conveys its buffer availability by sending credits
to its upstream router
Garnet2.0 Tutorial | ARM Research Summit 2017
Tushar Krishna | Georgia Institute of Technology
September 11, 2017
24
Conventional VC Router Microarch
Route
Compute
BW: Buffer Write
VC Allocator
RC: Route Compute
SW Allocator
V
N
0
FLIT
V
N
1
VC 0
VA: VC Allocation
VC 1
Input VCs arbitrate for “output”
VCs (Input VCs at next router)
VC n
Input Buffers
SA: Switch Allocation
Input ports arbitrate for
output ports
VC 1
VC 2
BR: Buffer Read
VC n
Input Buffers
BW
RC
VA
Garnet2.0 Tutorial | ARM Research Summit 2017
ST: Switch Traversal
Crossbar Switch
LT: Link Traversal
SA
BR
ST
LT
Tushar Krishna | Georgia Institute of Technology
September 11, 2017
25
Single-Cycle Router Implementation
Router.ccàwakeup()
1
Network
Link.cc
SwitchAllocator.cc
àwakeup()
InputUnit.cc
àwakeup()
VN0 VC 0
VC 1
VN1 VC 2
VC 3
* Buffer Write
* Route Compute
For multi-cycle router,
add delay in flit
before it can do SA
has_free_vc()
* Arbitrate outports
* VC Allocate
* send credit
select_free_vc()
* Buffer Read
4
OutputUnit.cc
àwakeup()
* Arbitrate inports
(schedule creditlink
wakeup next cycle)
VirtualChannel.cc
Credit
Link.cc
2
3
(send flit to switch)
OutVCState.cc
CreditLink.cc
* Rcv Credit
* Update State
CrossbarSwitch.ccàwakeup()
NetworkLink.cc
* Switch Traversal: Push winner flits on link queue
* Schedule output link wakeup for next cycle
Garnet2.0 Tutorial | ARM Research Summit 2017
Tushar Krishna | Georgia Institute of Technology
September 11, 2017
26
Overview
u
What are Networks-On-Chip (NoCs)?
u
Why model NoCs accurately
u
Garnet2.0
u
u
u
u
Configuration
u
Topology
u
Routing
u
Flow-Control
u
Router Microarchitecture
System Integration
u
Network Interface
u
Network Parameters
u
Running Garnet with Ruby
u
Ruby Garnet Standalone
Output Stats
Extensions and FAQs
Garnet2.0 Tutorial | ARM Research Summit 2017
Tushar Krishna | Georgia Institute of Technology
September 11, 2017
27
System Organization
Ruby Memory System
GarnetNetwork.cc
Coherence
Protocol
Interface is
“MessageBuffers
” from Ruby
Garnet2.0 Tutorial | ARM Research Summit 2017
Network
Interface
Router
Router
Network
Link
Credit
Link
Stats
Tushar Krishna | Georgia Institute of Technology
September 11, 2017
28
Msg_size decides
number of flits
Network Interface
Dir
L2$
<VN, msg>
Core
+ L1$
NI
NI
DMA
L2$
Core
+ L1$
NI
NI
R
R
R
R
NI
NI
L2$
NI
NI
NI
Dir
credit
Flitisize
Cache Controller
Core
+ L1$
Can add additional
info in flit
VN0
VN1
VC Select
VC 0
VC 1
VC 2
To/From
Router
VC 3
Ingress
NI
NI
Egress
L2$
Core
+ L1$
Garnet2.0 Tutorial | ARM Research Summit 2017
NetworkInterface.cc
àwakeup()
Tushar Krishna | Georgia Institute of Technology
September 11, 2017
29
Garnet Configurable Parameters
u
ni_flit_size
u
u
u
vcs_per_vnet
u
u
Default = 1
routing_algorithm
u
u
Default = 4
buffers_per_ctrl_vc
u
u
Total VCs in each inport = num_vnets * vcs_per_vnet
buffers_per_data_vc
u
u
Default = 16B (128b) à 1-flit ctrl, 5-flit data
This sets the bandwidth of each physical link
Weight-based table or custom
Defined in:
src/mem/ruby/network/garnet2.0/GarnetNetwork.py
Garnet2.0 Tutorial | ARM Research Summit 2017
Tushar Krishna | Georgia Institute of Technology
September 11, 2017
30
Command Line Parameters
src/mem/ruby/network/
BasicRouter.py
BasicLink.py
Network.py
garnet2.0/GarnetNetwork.py
Definitions and
Default Values
configs/
ruby/Ruby.py
common/Options.py
network/Network.py
example/garnet_synth_test.py
- Overrides default values in Ruby
- All parameters in in these .py files
can be specified from command line
Garnet2.0 Tutorial | ARM Research Summit 2017
Tushar Krishna | Georgia Institute of Technology
September 11, 2017
31
Running Garnet with Ruby
u Build
Ruby Coherence Protocol
scons build/X86_MOESI_hammer/gem5.opt PROTOCOL=MOESI_hammer
u Protocol
determines number of message
classes/virtual networks required
u Invoke
garnet2.0 from command line with
appropriate network parameters
./build/X86_MOESI_hammer/gem5.opt configs/example/fs.py
--network=garnet2.0 --topology=Mesh_XY ...
Garnet2.0 Tutorial | ARM Research Summit 2017
Tushar Krishna | Georgia Institute of Technology
September 11, 2017
32
Running Garnet Standalone
u
Build the Garnet_standalone protocol
scons build/NULL/gem5.debug PROTOCOL=Garnet_Standalone
Dummy protocol just for traffic injection via Garnet
Synthetic Traffic tester (next slide)
u 3 Virtual Networks: vnet 0 and vnet 1 inject ctrl (1-flit)
packets, vnet 2 injects data (5-flit) packets
u
u
Run user-specified synthetic traffic
./build/NULL/gem5.debug
configs/example/garnet_synth_traffic.py
--network=garnet2.0 --topology=... \
--synthetic=uniform_random \
--injectionrate=0.01 \
--num-packets-max=100 \
Garnet2.0 Tutorial | ARM Research Summit 2017
Tushar Krishna | Georgia Institute of Technology
September 11, 2017
33
Garnet Synthetic Traffic
u
Dummy CPU Model [only works with Garnet_standalone protocol]
u
src/cpu/testers/garnet_synthetic_traffic
u
Inject packets for user-specified traffic pattern at user-specified injection
rate
u uniform_random
Add your own synthetic traffic:
u tornado
configs/example/garnet_synth_traffic.py
u bit_complement
src/cpu/testers/garnet_synthetic_traffic/GarnetSyntheticTraffic.hh
u bit_reverse
src/cpu/testers/garnet_synthetic_traffic/GarnetSyntheticTraffic.cc
u
u
bit_rotation
u
neighbor
u
shuffle
u
transpose
Ability to inject continuously or fixed number of packets, and/or for fixed
time, and/or from fixed source and/or to fixed destination in one/more vnets
u
u
Heavy customization very useful for debugging
All parameters described here:
http://www.gem5.org/Garnet_Synthetic_Traffic
Garnet2.0 Tutorial | ARM Research Summit 2017
Tushar Krishna | Georgia Institute of Technology
September 11, 2017
34
Overview
u
What are Networks-On-Chip (NoCs)?
u
Why model NoCs accurately
u
Garnet2.0
u
u
u
u
Configuration
u
Topology
u
Routing
u
Flow-Control
u
Router Microarchitecture
System Integration
u
Network Interface
u
Network Parameters
u
Running Garnet with Ruby
u
Ruby Garnet Standalone
Output Stats
Extensions and FAQs
Garnet2.0 Tutorial | ARM Research Summit 2017
Tushar Krishna | Georgia Institute of Technology
September 11, 2017
35
Sample Simulation
./build/NULL/gem5.debug configs/example/garnet_synth_traffic.py
--topology=Mesh_XY --num-cpus=16 --num-dirs=16 --mesh-rows=4
m5out/config.ini
Garnet2.0 Tutorial | ARM Research Summit 2017
Tushar Krishna | Georgia Institute of Technology
September 11, 2017
36
Output Stats
m5out/stats.txt
Packet and Flit stats (#injected,
#received, queueing latency at NIC, and
network latency) per VN and overall
Garnet2.0 Tutorial | ARM Research Summit 2017
Tushar Krishna | Georgia Institute of Technology
More stats can be added
via GarnetNetwork.cc
September 11, 2017
37
Overview
u
What are Networks-On-Chip (NoCs)?
u
Why model NoCs accurately
u
Garnet2.0
u
u
u
u
Configuration
u
Topology
u
Routing
u
Flow-Control
u
Router Microarchitecture
System Integration
u
Network Interface
u
Network Parameters
u
Running Garnet with Ruby
u
Ruby Garnet Standalone
Output Stats
Extensions and FAQs
Garnet2.0 Tutorial | ARM Research Summit 2017
Tushar Krishna | Georgia Institute of Technology
September 11, 2017
38
Extensions and FAQs
u
System Level Modeling
u
How can I integrate Garnet into my own simulator (or with the gem5
classic memory system)?
u
u
How do I print a network trace?
u
u
You can run Garnet in a standalone mode, and have it inject traffic from a trace
instead of a fixed synthetic pattern. Alternately, try to use cpu/testers/traffic_gen
Does Garnet report NoC area and power numbers?
u
u
You can add some code to the NI. I have a patch on my website for reference.
Can garnet read a network trace?
u
u
This should not be too hard - would just require some changes to the NI code on how
it receives its inputs. If anyone wants to try it, I would love to give pointers.
The output of Garnet can be fed into DSENT (Sun et al, NOCS 2012) which is present
in ext/dsent. We will add an automatic stats.txt parser for it soon.
Can we model multiple clock domains?
u
The entire Ruby memory system (including Garnet inside it) is one clock domain. To
mimic a multi-clock domain design, you can schedule wakeups of slow routers
intelligently at some multiples of the clock rather than every cycle.
Garnet2.0 Tutorial | ARM Research Summit 2017
Tushar Krishna | Georgia Institute of Technology
September 11, 2017
39
Extensions and FAQs
u
Topology
u
How do we model multiple BW links in Garnet?
u
u
Can we model a heterogeneous CPU-GPU system?
u
u
Yes. The current AMD GPU model models a cluster of CPUs connected to a GPU.
Can we model indirect networks such as Clos?
u
u
Inherently, that would require support in the router to manage multiple flit
sizes. Instead, you can add multiple links between the same nodes if you want
to model higher bandwidth. If they have the same weight, Garnet will
randomly send over the two.
Yes, there can be additional routers that are not connected to any controller
and act purely as switches.
Can we model large-scale HPC networks?
u
Garnet can model any sized network. You can run 256-node standalone
simulations easily. However, beyond that gem5 cannot instantiate more
directories (which it uses as destination nodes). I have a patch to run 1024node synthetic sims on my website. But these run quite slowly.
Garnet2.0 Tutorial | ARM Research Summit 2017
Tushar Krishna | Georgia Institute of Technology
September 11, 2017
40
Extensions and FAQs
u
Routing
u
How do we model an adaptive routing algorithm?
u
u
Flow Control
u
Can we implement alternate deadlock avoidance schemes (such as
escape VCs or dateline)?
u
u
If you want to use internal NoC metrics (such as number of credits at an
output port) for making routing decisions, do not use table-based routing.
Instead, set the routing-algorithm to custom, and implement your own routing
function inside RoutingUnit.cc. See outportComputeXY() for reference.
You can update the vc selection scheme inside SwitchAllocator to control
which VCs get allocated.
Microarchitecture
u
Can we model variable number of VCs in each router?
u
Currently the codebase is very tied to the global vcs_per_vnet parameter. If
you want to model variable number of VCs, one hack could be to have
everyone instantiate the same number of VCs, but modify the VC select to
never allocate certain VCs
Garnet2.0 Tutorial | ARM Research Summit 2017
Tushar Krishna | Georgia Institute of Technology
September 11, 2017
41
Conclusions
u
Garnet2.0 is an open-source research vehicle
Use it and contribute to it!
u Being actively maintained by the following people
u
Georgia Tech: Tushar Krishna
u AMD Research: Brad Beckmann, Onur Kayiran, Jieming Yin, Matt
Porembas
u If you have any questions, email on gem5-users or gem5-dev mailing lists
u
u
u
Would love to have it integrated with the classic memory model
Thank you!
Useful Resources:
www.gem5.org/Interconnection_Network
u www.gem5.org/garnet2.0
u http://synergy.ece.gatech.edu/garnet
u
Garnet2.0 Tutorial | ARM Research Summit 2017
Tushar Krishna | Georgia Institute of Technology
September 11, 2017
42
Download