Intellectual Completion Tracking System microarchitecture for Advanced NoC Designs

Sudeep P, Naveen Kanumuri, Sreenath AK, Vadlamuri Venkata Sateesh
Intel India Technology Pvt Ltd, Bengaluru, India
1 sudeep.p@intel.com 2 naveen.kanumuri@intel.com 3 sreenath.ak@intel.com 4 vadlamuri.v.sateesh@intel.com
978-1-7281-1933-5/20/$31.00 ©2020 IEEE.

Abstract— The growing number of processing elements on a network-on-chip (NoC) demands high throughput and low latency under tight constraints on chip area. The Network Interface (NI) is one of the fundamental and performance-critical blocks in an NoC. The NI is responsible for handling acknowledgments for its requests, and to handle them the NoC must have a Completion Tracking System block. A Completion Tracking System built from static queues is prone to large area and low throughput. This paper presents an Intellectual Completion Tracking System (ICTS) that handles multiple outstanding transactions from different initiators and assists in generating the responses from network interfaces. It caters to the network interface between PCI interfaces and nonstandard PCI interfaces.

Keywords— Network-on-chip, PCI, Network Interface, High Bandwidth, Protocols.

I. INTRODUCTION

Network on Chip (NoC) is a communication infrastructure that interconnects all the intellectual properties (IPs) for intensive parallel communication with high bandwidth and low latency. The main building blocks of the NoC are the Network Interfaces, the Routers, and the Topology of the Network.

The IPs connected to the NoC natively follow external bus protocols such as AXI (Advanced eXtensible Interface), AHB (Advanced High-performance Bus), APB (Advanced Peripheral Bus), and OCP (Open Core Protocol). For one external IP to communicate with another external IP through the NoC [6], Network Interfaces are required at the source side as well as at the destination side. The Network Interface converts external protocols to the Network Link protocol and vice versa. Master and Target IPs deploy the NIs.

Because the Network on Chip supports highly intensive parallel communication, it may follow ordering rules such as the PCI producer and consumer ordering rules; otherwise, the chip may end up in deadlock or livelock. According to the PCI producer and consumer ordering rules, non-bufferable transactions should push the bufferable transactions [3][4], and Responses to the Requests should also push the bufferable transactions. Network Interfaces therefore play an important role in maintaining these ordering rules between the NoC and the IPs. If the Network Interface cannot handle these ordering rules at high bandwidth, it may back-pressure the Target/Initiator IPs. The throughput or bandwidth of a Network Interface depends on its completion management system.

Initiator IPs connected to the NoC can generate outstanding transactions, which means these Initiators generate multiple requests without an acknowledgment from the Targets. The Target IPs, in turn, may handle them at their own pace and in any order. However, the Network Interface should adhere to the ordering rules of the native protocol of that IP as well as the ordering rules of the NoC.

II. PREVIOUS RESEARCH WORK

Previous research tracks responses through static queue structures. One static queue is allocated per tag, with a depth equal to the total number of outstanding transactions. Request transactions are pushed into the queue corresponding to their tag. On arrival of a response, an entry is popped from the queue corresponding to that tag. Buffer space is thus allocated for each tag for the full number of outstanding transactions. If a particular tag is not used by the Initiator/Target IP, the buffer space in the queue corresponding to that tag goes unused. Conversely, if only one tag has outstanding transactions and its queue runs out of space, the large buffer space available in the other queues cannot be utilized. If there are “T” tags, each with “O” outstanding transactions, the total buffer space required is T * O. Response tracking with static queues is shown in Fig. 1.

Fig. 1. Response tracking using Static Queues

III. THE NOVELTY OF COMPLETION TRACKING SYSTEM

A. Intellectual completion tracking system microarchitecture

The architecture of the Intellectual Completion Tracking System (ICTS) is shown in Fig. 2. The ICTS block keeps track of all outstanding transactions and assists in generating the Response command back to the Bus Agent. The ICTS block provides multiple advantages that improve the overall performance of the Network on Chip. The functional and PPA advantages of ICTS are explained in detail in Sections B and C. The micro-architecture details of the ICTS block are as follows.

Fig. 2. Intellectual Completion Tracking System Block Level Diagram

ID Compression Module: Bus agents in an NoC system generally have an Initiator ID and a Target ID for each Initiator and each Target, respectively. There is also a tag associated with each transaction generated by an Initiator. If the Target has to follow the ordering rules, the Initiator Network Interface has to send the Target a new ID, which is a combination of the Initiator ID and the tag. The new ID lets the Target generate responses according to the ordering rules, but it consumes more width in a flit. The ID Compression module therefore generates a unique ID that is narrower than the Initiator ID + tag combination. The module holds preloaded unique tags in a FIFO, and one is issued for every new Initiator ID + tag combination.

Active ID Indicator: If an ID has outstanding transactions, it is considered an active ID. The ID should be inactivated once all of its outstanding transactions have been acknowledged. Inactivated IDs can be reused for other transactions with a new ID.

Head and Tail Pointers: ID-based ordering ensures that all transactions with the same tag get their responses in the same order. The Head pointer of an ID is updated on every response generation, and the Tail pointer of an ID is updated on every new transaction request for that ID. Together, the Head and Tail pointers assist in invalidating the corresponding ID. The Head pointer of an ID is also used to fetch the attributes of the transaction when generating the response command to the Initiator.

Attribute Memory: Every new Request generally carries Request attributes that are used to frame the Response command to the Initiator.

Linked List [5]: A linked list is used to dynamically allocate buffer space for the Request attributes, which are used when generating a response command. Each TAG ID has one linked list. The linked list [9] grows at the tail pointer as new requests arrive with the same TAG ID. When a response arrives for a TAG ID, the head of the linked list moves to the next head pointer. Once all the outstanding requests have received completions, the linked list of that TAG ID is inactivated and can be used for other transactions that have a new Source ID + TAG.

B. Deployment of ICTS in Target Network Interface

The Target Network Interface with ICTS is shown in Fig. 3. The Target Network Interface may receive multiple transactions from different Initiators and has to acknowledge all these Requests while satisfying the ordering rules of the NoC. The Bus Target can be a Bus agent or a Bus Fabric. If it is a Bus Fabric, the Request transaction has to be forwarded to the ultimate Bus agent connected to the Bus Fabric.

Fig. 3. Embedding ICTS in Target Network Interface
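The tracking structures of Section III.A (attribute memory, one linked list per ID, head and tail pointers, and the active-ID indicator) can be illustrated with a small software model. This is only a sketch under stated assumptions: the class name `LinkedListTracker`, its method names, and the pool size are hypothetical and are not taken from the paper, and the model ignores cycle timing and hardware ports.

```python
class LinkedListTracker:
    """Software model of per-ID linked lists over one shared attribute memory.

    All names and sizes here are illustrative assumptions, not the paper's RTL.
    """

    def __init__(self, num_ids, num_entries):
        self.attr_mem = [None] * num_entries   # attribute memory, one slot per entry
        self.next_ptr = [None] * num_entries   # link to the next entry of the same ID
        self.free = list(range(num_entries))   # free slots shared by all IDs
        self.head = [None] * num_ids           # oldest outstanding entry per ID
        self.tail = [None] * num_ids           # newest outstanding entry per ID
        self.active = [False] * num_ids        # active-ID indicator

    def push_request(self, id_, attrs):
        """Record a new request; return False (back-pressure) if the pool is full."""
        if not self.free:
            return False
        slot = self.free.pop(0)
        self.attr_mem[slot] = attrs
        self.next_ptr[slot] = None
        if self.active[id_]:
            self.next_ptr[self.tail[id_]] = slot   # grow the list at the tail
        else:
            self.head[id_] = slot                  # first entry activates the ID
            self.active[id_] = True
        self.tail[id_] = slot
        return True

    def pop_response(self, id_):
        """Return the attributes of the oldest request of this ID and free its slot."""
        if not self.active[id_]:
            raise KeyError(f"no outstanding request for ID {id_}")
        slot = self.head[id_]
        attrs = self.attr_mem[slot]
        self.head[id_] = self.next_ptr[slot]       # head moves to the next entry
        self.attr_mem[slot] = None
        self.free.append(slot)
        if self.head[id_] is None:                 # list drained: inactivate the ID
            self.tail[id_] = None
            self.active[id_] = False
        return attrs


# Usage: a single ID may occupy the whole shared pool.
tracker = LinkedListTracker(num_ids=4, num_entries=3)
accepted = [tracker.push_request(0, a) for a in ("req_a", "req_b", "req_c")]
rejected = tracker.push_request(1, "req_x")   # pool exhausted, request is refused
first = tracker.pop_response(0)               # oldest request of ID 0, frees a slot
retry = tracker.push_request(1, "req_x")      # now accepted
```

In this model the pool size is chosen independently of the number of IDs, whereas the static scheme of Section II reserves T * O entries regardless of which tags are actually in use.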
1) Unique TAG Generation
If there are multiple transactions from the same Initiator with the same tag, the Network Interface has to send the responses in the same order in which the requests arrived. If the requests are from different Initiators but carry the same tag, it can still send out-of-order responses, as the Requestors are different. In terms of the ordering rules: if the Initiator ID + TAG ID is different, the Target Network Interface can send out-of-order responses; otherwise, it has to send in-order responses for the requests. Hence, the Target Network Interface has to forward the transaction down to the Bus Fabric with a new ID, which is a combination of the Initiator ID and the Tag of the corresponding transaction. Carrying Initiator ID + TAG costs more wires to the Bus Fabric and all the way down to the destination. The Unique TAG generation feature in the Intellectual Completion Tracking System instead generates a unique ID starting from zero. With this feature, the Target NI is able to receive out-of-order responses and generate response commands back to the Initiators.

2) Enabling full Bandwidth for all Initiator IPs in NoC
If the Target Network Interface cannot handle outstanding transactions from different Initiators, it has to serialize all the transactions, which limits the bandwidth of the Initiators. Deploying ICTS in the Target Network Interface allows requests from multiple Initiators to be processed and sent to the Bus Target. Because it can handle out-of-order responses, all the Initiators can operate at their full bandwidth, as serialization of transactions at the Target Network Interface is avoided.
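The unique-ID generation of item 1), together with the ID Compression module of Section III.A, can be sketched as a lookup table in front of a preloaded FIFO of compact IDs. The sketch below is an assumption-laden illustration: `IdCompressor`, `allocate`, and `complete` are hypothetical names, and the per-ID outstanding counter is one plausible way to decide when an ID becomes inactive; the paper does not specify this mechanism.

```python
from collections import deque


class IdCompressor:
    """Maps (initiator_id, tag) pairs to narrow compact IDs (illustrative model)."""

    def __init__(self, num_ids):
        self.free_ids = deque(range(num_ids))  # FIFO preloaded with IDs from zero
        self.table = {}                        # (initiator_id, tag) -> compact ID
        self.outstanding = {}                  # compact ID -> pending request count

    def allocate(self, initiator_id, tag):
        """Return the compact ID for a request, or None if no ID is free."""
        key = (initiator_id, tag)
        if key not in self.table:
            if not self.free_ids:
                return None                    # caller must stall the request
            compact = self.free_ids.popleft()
            self.table[key] = compact
            self.outstanding[compact] = 0
        compact = self.table[key]
        self.outstanding[compact] += 1         # same pair reuses the same ID
        return compact

    def complete(self, initiator_id, tag):
        """Account for one response; recycle the ID when nothing is pending."""
        key = (initiator_id, tag)
        compact = self.table[key]
        self.outstanding[compact] -= 1
        if self.outstanding[compact] == 0:     # ID becomes inactive
            del self.table[key]
            del self.outstanding[compact]
            self.free_ids.append(compact)
        return compact


comp = IdCompressor(num_ids=2)
id_a1 = comp.allocate(initiator_id=5, tag=1)   # new pair, takes ID 0
id_a2 = comp.allocate(initiator_id=5, tag=1)   # same pair, must stay ordered
id_b = comp.allocate(initiator_id=7, tag=1)    # same tag, different initiator
id_c = comp.allocate(initiator_id=9, tag=0)    # no free ID left
freed = comp.complete(initiator_id=7, tag=1)   # releases the second ID
id_c_retry = comp.allocate(initiator_id=9, tag=0)
```

Requests that share a compact ID must be answered in order, while requests with different compact IDs may complete out of order, which matches the ordering rule stated in item 1).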
3) Handling split Transactions at Target NI
The Target NI has to process and convert transactions from the NoC Router to another bus protocol. During this process, the Target NI may have to split a request transaction and send it as multiple requests to the Bus Fabric, and responses come back for these split requests. To track the number of split requests and their responses, split counters are maintained in ICTS.

C. Deployment of ICTS in Initiator Network Interface

1) Acknowledgment to the Bus masters for bufferable requests on the network interface link
A Bus Master in the Network on Chip that uses an interface protocol such as AXI or OCP expects acknowledgment for its requests from the destination or from intermediate targets of the network topology. If a Bus Master's request becomes a bufferable transaction on the network interface link, the Network Interface (NI) is responsible for acknowledging the Request. The acknowledgment from the Network Interface to the Bus Master should follow the ordering rules of the native bus protocol. For example, if the Bus Master is an OCP agent, the Network Interface becomes a slave agent. An OCP Bus Master expects all the acknowledgments, with or without data, in order for a particular tag ID of its Requests. The Network Interface converts the Bus Master's request into a bufferable or non-bufferable transaction. The Network Interface gets a response from the target destination only for a non-bufferable transaction. It should not expect a response for bufferable transactions, but it still has to provide the acknowledgment to the Bus Master. Before acknowledging the Bus Master with the Response, it should wait for the responses to all non-bufferable requests to arrive from the target. ICTS ensures that all the bufferable transactions are acknowledged to the Bus Masters. The attribute memory has a flag that indicates the transaction type of a pending completion.
If the flag indicates that acknowledgment is pending for a non-bufferable transaction, the NI must wait for the Response from the NoC; otherwise, it can generate a response to the Bus Master.

2) Echoing back certain attributes in the Response command for the Bus Masters
Bus Masters expect certain attributes to be echoed back in the Response. If the Initiator Network Interface cannot return the required attributes to the Bus Master, all the attributes to be echoed have to travel to the final destination and back to the Bus Master, which increases the flit count of the transaction, the area, and the power consumption. Deploying ICTS for this purpose ensures that all the attributes required by the Bus Master are returned. While sending the transaction towards the Target, the Initiator Network Interface pushes the required attributes into the Completion Tracking System, so that the Completion Tracking System can provide them when the Response is generated.

3) Unique ID generation
Unique ID generation in the Initiator NI is also significant, as the Initiator IP can be plugged into a bus fabric, so the ID of the transaction can be wide. Hence there is a need to compress the ID of the transaction.

4) Out of order and interleaving at the split boundary
The Initiator Network Interface may not be capable of handling out-of-order responses from the Network link, even though the Bus Master is able to receive them. For example, if the Bus Master is an AXI Master and it generates outstanding transactions with different tags, it may get out-of-order responses for different tags. In this case, ICTS can assist in framing the completion to the AXI Master.

Fig. 4. Embedding ICTS in Initiator Network Interface

As soon as the NI gets an out-of-order response, it presents the tag to the ICTS, and the ICTS returns the attributes for that transaction in the same cycle.
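The initiator-side behaviour of Section III.C, items 1) and 2), can be sketched as a per-tag queue whose entries carry the attributes to echo and a flag derived from the transaction type. This is a hedged illustration only: `InitiatorAckTracker` and its methods are hypothetical names, and the model assumes that NoC responses for a given tag return in request order, which the paper does not state explicitly.

```python
from collections import deque


class InitiatorAckTracker:
    """Per-tag in-order acknowledgment model (illustrative, not the paper's design)."""

    def __init__(self):
        self.pending = {}   # tag -> deque of entries, oldest first

    def push_request(self, tag, echo_attrs, bufferable):
        # A bufferable request needs no NoC response, so it is ready at once;
        # a non-bufferable request becomes ready when its response arrives.
        entry = {"attrs": echo_attrs, "ready": bufferable}
        self.pending.setdefault(tag, deque()).append(entry)

    def noc_response(self, tag):
        """Mark the oldest waiting non-bufferable request of this tag as ready."""
        for entry in self.pending.get(tag, ()):
            if not entry["ready"]:
                entry["ready"] = True
                return
        raise KeyError(f"unexpected NoC response for tag {tag}")

    def drain(self, tag):
        """Return echoed attributes for every acknowledgment that can go out now."""
        queue = self.pending.get(tag)
        acks = []
        while queue and queue[0]["ready"]:
            acks.append(queue.popleft()["attrs"])   # in order within the tag
        if queue is not None and not queue:
            del self.pending[tag]                   # tag has nothing outstanding
        return acks


ni = InitiatorAckTracker()
ni.push_request(1, {"addr": 0x10}, bufferable=False)
ni.push_request(1, {"addr": 0x20}, bufferable=True)
ni.push_request(2, {"addr": 0x30}, bufferable=True)
before = ni.drain(1)    # blocked behind the non-bufferable request
other = ni.drain(2)     # a different tag is not blocked
ni.noc_response(1)
after = ni.drain(1)     # both acknowledgments of tag 1, in request order
```

Keeping the echoed attributes in the tracker means they never have to travel across the NoC, which is the saving described in item 2).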
Once the response is generated, the entry for that transaction in the ICTS can be removed.

IV. RESULTS

Fig. 5 shows the waveform for the Intellectual Completion Tracking System implemented in the five-port router, where data arriving from different ports is ordered according to the logic and sent to the output port.

Fig. 5. Simulation results of 5 port router

Synthesis Results: The proposed circuit is synthesized for an FPGA, and the implementation details are listed below. Fig. 6 provides the device utilization summary of the circuit implemented in a Spartan 3 FPGA. Fig. 7 and Fig. 8 give the timing report and the worst negative slack (WNS) information.

Fig. 6. Device Utilization Summary.

Fig. 7. Timing Report

Fig. 8. WNS Report

V. CONCLUSION

Performance: Multiple outstanding transactions are allowed instead of serialization at all the Target and Initiator Network Interfaces, which has a cumulative impact on the performance of the Network-on-chip.

Area: The gate count of the NoC reduces, as the area of the tracking system is reduced at all Network Interfaces.

Functional: The benefits mentioned in Sections III.A and III.B significantly improve the functionality of the entire NoC.

REFERENCES

[1] H. Zhang, K. Wang, Y. Dai, and L. Liu, “A multi-VC dynamically shared buffer with prefetch for network on chip,” in Proc. IEEE 7th Int. Conf. Netw., Archit., Storage, Xiamen, China, Jun. 2012, pp. 320–327.
[2] W. J. Dally and B. Towles, “Buffered flow control,” in Principles and Practices of Interconnection Networks. San Francisco, CA, USA: Morgan Kaufmann, 2003.
[3] M. Lai, Z. Wang, L. Gao, H. Lu, and K. Dai, “A dynamically-allocated virtual channel architecture with congestion awareness for on-chip routers,” in Proc. 45th ACM/IEEE DAC, Anaheim, CA, USA, Jun. 2008, pp. 630–633.
[4] M. Oveis-Gharan and G. N. Khan, “Efficient dynamic virtual channel organization and architecture for NoC systems,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Feb. 2016.
[5] H. Zhang, K. Wang, Y. Dai, and L. Liu, “A multi-VC dynamically shared buffer with prefetch for network on chip,” in Proc. IEEE 7th Int. Conf. Netw., Archit., Storage, Xiamen, China, Jun. 2012, pp. 320–327.
[6] J. Liu and J. G. Delgado-Frias, “DAMQ self-compacting buffer schemes for systems with network-on-chip,” in Proc. IEEE ICCD, Las Vegas, NV, USA, 2005, pp. 97–103.
[7] C. Nicopoulos, A. Yanamandra, S. Srinivasan, N. Vijaykrishnan, and M. J. Irwin, “Variation-aware low-power buffer design,” in Proc. Conf. Rec. 41st Asilomar Conf. Signals, Syst., Comput., Pacific Grove, CA, USA, 2007, pp. 1402–1406.
[8] M. O. Gharan and G. N. Khan, “A novel virtual channel implementation technique for multi-core on-chip communication,” in Proc. WAMCA, New York, NY, USA, 2012, pp. 36–41.
[9] J. Liu and J. G. Delgado-Frias, “A shared self-compacting buffer for network-on-chip systems,” in Proc. 49th IEEE Int. Midwest Symp. Circuits Syst., San Juan, Puerto Rico, Aug. 2006, pp. 26–30.
[10] J. Park, B. W. O’Krafka, S. Vassiliadis, and J. Delgado-Frias, “Design and evaluation of a DAMQ multiprocessor network with self-compacting buffers,” in Proc. Supercomputing, Washington, DC, USA, 1994, pp. 713–722.