Fault-Tolerant Network-Interface for Spatial Division Multiplexing Based Network-on-Chip
By Anup Das
Contents
1. NoC Overview
   • TDM-Based
   • SDM-Based
2. Existing NI Architecture
3. New Area-Optimized Architecture
4. Need for Fault-Tolerance
5. Fault-Tolerant NI Architectures
   • Centralized Approach
   • Distributed Approach
6. Results
7. Conclusion
Network-on-Chip
• Increasing number of IPs/PEs per die
• Shared buses become a communication bottleneck
• Need for a scalable alternative
  – Use of networking concepts
  – NoC proposed by Benini et al.
[Figure: a tile-based NoC – each IP connects through a network interface (NI) to a switch, and the switches form the on-chip network.]
Network-on-Chip (contd.)
• Two techniques for sharing the communication links
  – Time Division Multiplexing (TDM)
  – Spatial Division Multiplexing (SDM)
[Figure: flows A, B and C time-share a single inter-switch link in a TDM-based NoC; in an SDM-based NoC each flow is allocated its own subset of wires.]
Network Interface Architecture
• n-to-1 bit serializers, one for each outgoing wire (sketched below)
• A data distributor sends data from the output queues to one of the serializers
• Each distributor can send data to each of the serializers
• Not all the distributors are loaded all the time
• A single distributor can therefore serve all the serializers
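A minimal Verilog sketch of one such n-to-1 serializer, assuming n = 32 (the deck's 32-bit packets) and LSB-first shifting, one bit per cycle. The module and port names are illustrative assumptions, not the actual design's interface.

module serializer #(parameter N = 32) (
    input              clk,
    input              rst,
    input              load,      // pulse: latch a new word from the distributor
    input  [N-1:0]     data_in,   // 32-bit word to transmit
    output             bit_out,   // the single outgoing wire toward the switch
    output             busy       // high while a word is still being shifted out
);
    reg [N-1:0]       shift;
    reg [$clog2(N):0] count;

    assign bit_out = shift[0];
    assign busy    = (count != 0);

    always @(posedge clk) begin
        if (rst) begin
            shift <= {N{1'b0}};
            count <= 0;
        end else if (load && !busy) begin
            shift <= data_in;     // accept a full word
            count <= N;
        end else if (busy) begin
            shift <= shift >> 1;  // emit one bit per cycle, LSB first
            count <= count - 1;
        end
    end
endmodule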
Network Interface Architecture
[Figure: existing NI – the PE fills Queues 1-3 with 32-bit words; Distributors 1-3 forward queue data to n-to-1 serializers driving out[0]..out[7] toward the switch.]
New Area-Optimized NI
• A single distributor serves all the serializers
• A new component, the "requester", is added to interface with the queues
• Two IDs are introduced: the serializer ID (sID) and the queue ID (qID)

  sID              qID
  000, 001, 010    001
  011, 100, 101    010
  110, 111         100

• At connection-setup time, each serializer is assigned to a queue (table above)
• A serializer requests data; the request is forwarded to the corresponding queue
• Data from the queue travels back to the requesting serializer (see the lookup sketch below)
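The setup-time mapping above can be read as a small combinational lookup. A hedged Verilog sketch, assuming the requester resolves a 3-bit sID to the one-hot qID of its assigned queue (module and port names are illustrative):

module sid_to_qid (
    input      [2:0] sID,  // ID of the requesting serializer
    output reg [2:0] qID   // one-hot ID of its assigned queue
);
    always @(*) begin
        case (sID)
            3'b000, 3'b001, 3'b010: qID = 3'b001;  // Queue 1
            3'b011, 3'b100, 3'b101: qID = 3'b010;  // Queue 2
            3'b110, 3'b111:         qID = 3'b100;  // Queue 3
            default:                qID = 3'b000;  // unreachable for a 3-bit sID
        endcase
    end
endmodule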
New Area-Optimized NI
[Figure: optimized NI – the PE fills Queues 1-3; a single Requester/Distributor pair connects the queues to 32-to-1 serializers driving out[0]..out[7] toward the switch.]
Need for Fault-Tolerance
• Transistor density on the rise
• Shrinking feature sizes
• Increasing number of faults manifesting post-fabrication
• Yield loss
• Need for fault-tolerance
  – IP/PE level
  – Interconnect level
• The idea is to degrade performance gracefully in the event of faults
NI Fault-Tolerance - Centralized
• A controller is introduced between the IP queues and the distributors
• When a fault occurs, it remaps the data dynamically while balancing load across the surviving distributors (see the sketch below)
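A sketch of the remapping idea, assuming a 3-distributor NI with one fault-status bit per distributor; with one distributor down, the three queues are spread 2-to-1 across the two survivors. Illustrative Verilog only – the actual centralized design is in VHDL and its interface is not shown in the deck.

module remap_controller (
    input      [2:0] fault,   // fault[i] = 1 -> distributor i is faulty
    output reg [1:0] map_q1,  // index of the distributor serving Queue 1
    output reg [1:0] map_q2,  // ... Queue 2
    output reg [1:0] map_q3   // ... Queue 3
);
    always @(*) begin
        case (fault)
            3'b000:  begin map_q1 = 0; map_q2 = 1; map_q3 = 2; end // fault-free: one-to-one
            3'b001:  begin map_q1 = 1; map_q2 = 2; map_q3 = 1; end // D0 down: share D1/D2
            3'b010:  begin map_q1 = 0; map_q2 = 2; map_q3 = 0; end // D1 down: share D0/D2
            3'b100:  begin map_q1 = 0; map_q2 = 1; map_q3 = 0; end // D2 down: share D0/D1
            3'b011:  begin map_q1 = 2; map_q2 = 2; map_q3 = 2; end // only D2 alive
            3'b101:  begin map_q1 = 1; map_q2 = 1; map_q3 = 1; end // only D1 alive
            3'b110:  begin map_q1 = 0; map_q2 = 0; map_q3 = 0; end // only D0 alive
            default: begin map_q1 = 0; map_q2 = 0; map_q3 = 0; end // all faulty: no service
        endcase
    end
endmodule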
[Figure: centralized fault-tolerant NI – the Controller sits between the PE's Queues 1-3 and Distributors 1-3, which feed the n-to-1 serializers on out[0]..out[7] toward the switch.]
Centralized NI Operation
[Figure: three snapshots of the controller remapping Queues 1-3 across distributors D1-D3 and serializers S1-S8 as distributors fail.]
NI Fault-Tolerance - Distributed
• Multiple distributors and requesters, each capable of fault recovery
• Two further IDs are introduced: the distributor ID (dID) and the requester ID (rID)
• When forwarding a request to a requester, the distributor sends its dID along with the sID and qID
• qID – used by the requester to forward the request to a queue
• dID – used by the requester to send the data from the queue back to the requesting distributor
• sID – used by the distributor to steer the data to the requesting serializer
(A sketch of this three-ID handshake follows.)
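A hedged sketch of one requester's side of the handshake, assuming 32-bit words, one-hot qIDs, and a 2-bit dID for two distributors; dID and sID are simply echoed back with the data so the distributor can finish the return path. All names and widths are assumptions.

module requester (
    input             clk,
    input      [1:0]  req_dID,   // which distributor issued the request
    input      [2:0]  req_sID,   // which serializer it is serving
    input      [2:0]  req_qID,   // one-hot: which queue to read
    input      [31:0] q1_data,   // head-of-queue words
    input      [31:0] q2_data,
    input      [31:0] q3_data,
    output reg [31:0] rsp_data,  // word travelling back toward the switch
    output reg [1:0]  rsp_dID,   // return tag: target distributor
    output reg [2:0]  rsp_sID    // forwarded so that distributor can pick the serializer
);
    always @(posedge clk) begin
        rsp_dID <= req_dID;             // echo the tags alongside the data
        rsp_sID <= req_sID;
        case (req_qID)                  // qID selects the queue to read
            3'b001:  rsp_data <= q1_data;
            3'b010:  rsp_data <= q2_data;
            3'b100:  rsp_data <= q3_data;
            default: rsp_data <= 32'd0;
        endcase
    end
endmodule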
Distributed NI Operation
[Figure: three snapshots of requester/distributor pairs R1/D1 and R2/D2 serving Queues 1-3 and serializers S1-S8, before and after faults.]
Results
Experimental Setup
• NoC with 8 links per node
• 32-bit data packets
• Centralized design coded in VHDL
• Distributed design coded in Verilog
• ASIC synthesis with Synopsys Design Compiler
• UMC 65 nm standard cells
• Area and power numbers taken from the synthesis tool
• Area converted to gate count for comparison across technologies (see the note below)
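The slides don't state the conversion, but gate count is conventionally reported as NAND2 equivalents, i.e. total cell area divided by the area of a minimum-size 2-input NAND in the same library:

  gate count = total cell area / NAND2 cell area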
Area Breakup

[Pie charts – centralized design: Distributor 46%, Controller 13%, other logic 41%; distributed design: Serializer 46%, Distributor 44%, Requester 10%.]

Components               Centralized Design   Distributed Design
Distributor              1.8K                 2.2K
Requester                -                    0.5K
Controller               1.5K                 -
Serializer + Other       5K                   4.5K
Total (2 Distributors)   10.1K                9.9K
Area and Power Comparison

Design                                   Gate Count   Power (mW)
Original NI (without fault-tolerance)    8798         3.20
Centralized NI (2 Distributors)          10074        4.11
New NI (without fault-tolerance)         6046         2.13
Distributed NI (2 Distributors)          9961         3.78
Increasing Fault-Tolerance
[Chart: gate count (0-24000) of the distributed and centralized designs as the degree of fault-tolerance increases.]
Throughput
[Chart: maximum throughput (Gbps, 0-35) versus number of outgoing wires (8-36) for the distributed and centralized designs, each with and without fault-tolerance.]
Summary
• The distributed design is more area- and power-efficient, but the centralized design becomes more efficient as the number of distributors grows
• A single fault in the controller renders the centralized design useless
• No single fault affects the behavior of the distributed NI
• Next steps
  – Increase the granularity of load balancing
  – Fault-tolerance for the serializer
Thank you