Fault-Tolerant Network Interface for Spatial Division Multiplexing Based Network-on-Chip

Anup Das

Content

1. NoC Overview
   • TDM-Based
   • SDM-Based
2. Existing NI Architecture
3. New Area-Optimized Architecture
4. Need for Fault-Tolerance
5. Fault-Tolerant NI Architectures
   • Centralized Approach
   • Distributed Approach
6. Results
7. Conclusion

Network-on-Chip

• Increasing number of IPs/PEs per die
• Communication bottleneck with a shared bus
• Need for a scalable alternative
  – Use of networking concepts
  – NoC proposed by Benini et al.

[Figure: grid of switches, each connected through a network interface (NI) to an IP]

Network-on-Chip (contd.)

• Two techniques for communication
  – Time Division Multiplexing (TDM)
  – Spatial Division Multiplexing (SDM)

[Figure: connections A, B and C carried through the switches of a TDM-based NoC and of an SDM-based NoC]

Network Interface Architecture

• N-to-1 bit serializers, one for each outgoing wire
• Data distributor sends data from the output queues to one of the serializers
• Each distributor can send data to each of the serializers
• Not all distributors are loaded all the time
• A single distributor can serve all the serializers

[Figure: PE feeding 32-bit Queues 1–3; Distributors 1–3 drive n-to-1 serializers onto the switch outputs out[0]–out[7]]

New Area-Optimized NI

• Single distributor for all the serializers
• New component, the "requester", added for interfacing with the queues
• Two IDs introduced: serializer ID (sID) and queue ID (qID)

  sID   qID
  001   000, 001, 010
  010   011, 100, 101
  100   110, 111

• At connection setup time, each serializer is assigned to a queue
• A serializer requests data; the request is forwarded to the corresponding queue
• Data from the queue travels back to the requesting serializer

[Figure: PE feeding 32-bit Queues 1–3; a single requester and distributor serve the 32-to-1 serializers driving out[0]–out[7]]

Need for Fault-Tolerance

• Transistor density on the rise
• Shrinking feature size
• Increasing number of faults manifesting post fabrication
• Yield loss
• Need for fault-tolerance at the
  – IP/PE level
  – Interconnect level
• Idea is to provide graceful degradation of performance in the event of faults

NI Fault-Tolerance - Centralized

• Controller introduced between the distributors and the IP queues
• Remaps data from queues to distributors dynamically when a fault occurs, with load balancing

[Figure: controller placed between Queues 1–3 and Distributors 1–3, whose n-to-1 serializers drive out[0]–out[7]]

Centralized NI Operation

[Figure: three snapshots of the controller's queue-to-distributor mapping onto serializers S1–S8; when a distributor fails, its queues are redistributed over the remaining distributors D1–D3]

NI Fault-Tolerance - Distributed

• Multiple distributors and requesters, each capable of fault recovery
• Two further IDs introduced: distributor ID (dID) and requester ID (rID)
• When forwarding a request to a requester, the distributor sends its dID along with the sID and qID
• qID: used by the requester to forward the request to a queue
• dID: used by the requester to send the data from the queue back to the requesting distributor
• sID: used by the distributor to send the data on to the requesting serializer

Distributed NI Operation

[Figure: three snapshots of requesters R1–R2 and distributors D1–D2 serving Queues 1–3 and serializers S1–S8; when a distributor or requester fails, the surviving one takes over its load]
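Before the results, a few behavioral sketches may help make the datapath concrete. First, the n-to-1 serializer, which accepts a parallel word and shifts it onto its single outgoing wire one bit per cycle. This is a minimal Verilog sketch (Verilog being the language of the distributed design); module, port, and parameter names are illustrative assumptions, not taken from the actual RTL.

```verilog
// Minimal sketch of an n-to-1 bit serializer (names are assumptions):
// a parallel word is captured on `load`, then shifted out one bit per
// cycle on the single outgoing wire, LSB first.
module serializer #(parameter N = 32) (
    input  wire         clk,
    input  wire         rst,
    input  wire         load,     // pulse: capture a new parallel word
    input  wire [N-1:0] data_in,  // word supplied by the distributor
    output wire         bit_out,  // the single outgoing wire to the switch
    output wire         ready     // high once the word has been sent
);
    reg [N-1:0]           shift_reg;
    reg [$clog2(N+1)-1:0] count;    // bits still to send

    always @(posedge clk) begin
        if (rst) begin
            shift_reg <= {N{1'b0}};
            count     <= 0;
        end else if (load) begin
            shift_reg <= data_in;
            count     <= N;
        end else if (count != 0) begin
            shift_reg <= shift_reg >> 1;  // expose the next bit
            count     <= count - 1;
        end
    end

    assign bit_out = shift_reg[0];
    assign ready   = (count == 0);
endmodule
```

With 8 outgoing wires per NI, eight such instances run in parallel, one per out[i].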
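The sID/qID mechanism of the area-optimized NI reduces to a small table written once at connection setup: a request carries the serializer's sID, and the table yields the qID of the queue that serializer was bound to. The deck does not say which component physically holds the table, so the sketch below factors it into its own module; all names are assumptions.

```verilog
// Minimal sketch of the sID -> qID binding set up at connection time
// (names are assumptions). Each serializer's request is translated to
// the ID of its assigned queue before being forwarded.
module sid_to_qid #(parameter SID_W = 3, QID_W = 3) (
    input  wire             clk,
    // connection setup: bind serializer cfg_sid to queue cfg_qid
    input  wire             cfg_we,
    input  wire [SID_W-1:0] cfg_sid,
    input  wire [QID_W-1:0] cfg_qid,
    // lookup for an incoming serializer request
    input  wire [SID_W-1:0] req_sid,
    output wire [QID_W-1:0] req_qid
);
    reg [QID_W-1:0] map [(1 << SID_W)-1:0];

    always @(posedge clk)
        if (cfg_we)
            map[cfg_sid] <= cfg_qid;  // rewritten only at setup time

    assign req_qid = map[req_sid];
endmodule
```

Because the binding is pure configuration, moving a serializer to another queue at setup time is a single table write.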
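For the centralized design, the controller's job on a fault is a remap with load balancing: spread the queues evenly over whichever distributors are still healthy. The deck does not specify the balancing policy, so the combinational sketch below assumes a simple round-robin over the fault-free distributors; module and signal names are likewise assumptions.

```verilog
// Hypothetical sketch of the centralized controller's remapping policy
// (the round-robin choice is an assumption, not taken from the deck):
// each queue is steered to the next fault-free distributor so the load
// stays balanced after a failure.
module remap_controller #(parameter N_Q = 3, N_D = 3, DID_W = 2) (
    input  wire [N_D-1:0]       dist_faulty,  // 1 = distributor has failed
    output reg  [N_Q*DID_W-1:0] queue_map     // dID assigned to each queue
);
    integer q, d, n_healthy, pick, seen;
    reg [DID_W-1:0] dist_of_queue;

    always @* begin
        // count the fault-free distributors
        n_healthy = 0;
        for (d = 0; d < N_D; d = d + 1)
            if (!dist_faulty[d])
                n_healthy = n_healthy + 1;

        // hand queue q to the (q mod n_healthy)-th healthy distributor
        for (q = 0; q < N_Q; q = q + 1) begin
            pick = (n_healthy != 0) ? (q % n_healthy) : 0;
            seen = 0;
            dist_of_queue = {DID_W{1'b0}};
            for (d = 0; d < N_D; d = d + 1)
                if (!dist_faulty[d]) begin
                    if (seen == pick)
                        dist_of_queue = d;  // integer truncated to DID_W bits
                    seen = seen + 1;
                end
            queue_map[q*DID_W +: DID_W] = dist_of_queue;
        end
    end
endmodule
```

Every queue-to-distributor assignment flows through this single block, so a fault inside it takes the whole NI down; that is the weakness the distributed design removes.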
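In the distributed design the dID closes the request/response loop. Below is a sketch of the distributor side: it tags each forwarded request with its own dID, and when a word comes back it uses the returned sID to strobe the requesting serializer. One outstanding request per distributor is assumed for brevity, and all names are again illustrative.

```verilog
// Hypothetical sketch of a distributor in the distributed NI (names
// are assumptions): outgoing requests carry {dID, sID, qID}; returning
// data is steered to the serializer named by the echoed sID.
module dist_distributor #(
    parameter DID_W = 1, SID_W = 3, QID_W = 3, DATA_W = 32, N_SER = 8
) (
    input  wire              clk,
    input  wire              rst,
    input  wire [DID_W-1:0]  my_did,        // this distributor's ID
    // request arriving from a serializer (sID) for queue qID
    input  wire              ser_req_valid,
    input  wire [SID_W-1:0]  ser_req_sid,
    input  wire [QID_W-1:0]  ser_req_qid,
    // request forwarded to a requester, tagged with dID
    output reg               req_valid,
    output reg  [DID_W-1:0]  req_did,
    output reg  [SID_W-1:0]  req_sid,
    output reg  [QID_W-1:0]  req_qid,
    // response routed back to this distributor by its dID
    input  wire              rsp_valid,
    input  wire [SID_W-1:0]  rsp_sid,
    input  wire [DATA_W-1:0] rsp_data,
    // per-serializer load strobes and shared data bus
    output reg  [N_SER-1:0]  ser_load,
    output reg  [DATA_W-1:0] ser_data
);
    always @(posedge clk) begin
        if (rst) begin
            req_valid <= 1'b0;
            ser_load  <= {N_SER{1'b0}};
        end else begin
            // forward the request with this distributor's dID attached
            req_valid <= ser_req_valid;
            req_did   <= my_did;
            req_sid   <= ser_req_sid;
            req_qid   <= ser_req_qid;
            // demultiplex the returning word onto the requesting serializer
            ser_load  <= {N_SER{1'b0}};
            if (rsp_valid)
                ser_load[rsp_sid] <= 1'b1;
            ser_data  <= rsp_data;
        end
    end
endmodule
```

The requester mirrors this: it uses the qID to address a queue and simply echoes the dID, so the data returns to whichever distributor asked; that is what lets a surviving distributor/requester pair absorb the load of a failed one.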
Results

Experimental Setup

• NoC considered with 8 links per node
• Data packets of size 32 bits
• Centralized design coded in VHDL, distributed design in Verilog
• Synopsys Design Compiler for ASIC synthesis, with UMC 65 nm standard cells
• Area and power numbers taken from the synthesis tool
• Area numbers converted to gate count for comparison across technologies

Area Breakup

• Centralized design: Distributor 46%, Controller 13%, Other logic 41%
• Distributed design: Distributor 44%, Requester 10%, Serializer 46%

  Component                Centralized Design   Distributed Design
  Distributor              1.8K                 2.2K
  Requester                -                    0.5K
  Controller               1.5K                 -
  Serializer + other       5K                   4.5K
  Total (2 distributors)   10.1K                9.9K

Area and Power Comparison

  Design                                  Gate Count   Power (mW)
  Original NI (without fault-tolerance)   8798         3.20
  New NI (without fault-tolerance)        6046         2.13
  Centralized NI (2 distributors)         10074        4.11
  Distributed NI (2 distributors)         9961         3.78

Increasing Fault-Tolerance

[Chart: gate count of the centralized and distributed designs as the degree of fault-tolerance increases]

Throughput

[Chart: maximum throughput (Gbps) versus the number of outgoing wires (8–36) for the centralized and distributed designs, each with and without fault-tolerance]

Summary

• Distributed design is more area- and power-efficient, but the centralized design becomes the more efficient one as the number of distributors increases
• A single fault in the controller of the centralized design renders it useless
• No single fault affects the behavior of the distributed NI
• Next steps
  – Increase the granularity of load balancing
  – Fault-tolerance for the serializer

Thank you