Intellectual Completion Tracking System Microarchitecture for Advanced NoC Designs
Sudeep P, Naveen Kanumuri, Sreenath AK, Vadlamuri Venkata Sateesh
Intel India Technology Pvt Ltd, Bengaluru, India
1 sudeep.p@intel.com
2 naveen.kanumuri@intel.com
3 sreenath.ak@intel.com
4 vadlamuri.v.sateesh@intel.com
Abstract— The growing number of processing elements on a
network-on-chip (NoC) demands high throughput and low
latency under constraints on chip area. The Network
Interface (NI) is one of the fundamental and
performance-critical blocks in an NoC. The NI is responsible
for handling acknowledgments for its requests. To handle
acknowledgments in an NI, the NoC must have a Completion
Tracking System block. A Completion Tracking System built
from static queues incurs a large area and low throughput.
This paper presents an Intellectual Completion Tracking
System (ICTS) that can handle multiple outstanding
transactions from different initiators and assists in
generating the responses from network interfaces. It caters
to network interfaces between PCI interfaces and
nonstandard PCI interfaces.
Keywords— Network-on-chip, PCI, Network Interface, High
Bandwidth, Protocols.
I. INTRODUCTION
A Network on Chip (NoC) is a communication infrastructure
that interconnects all the intellectual property blocks (IPs)
for intensive parallel communication with high bandwidth and
low latency. The main building blocks of an NoC are the
Network Interfaces, the Routers, and the Topology of the
network.
The IPs connected to the NoC natively follow external bus
protocols such as AXI (Advanced eXtensible Interface), AHB
(Advanced High-performance Bus), APB (Advanced Peripheral
Bus), and OCP (Open Core Protocol). For one external IP to
communicate with another external IP through the NoC [6],
Network Interfaces are required at the source side as well as
at the destination side. A Network Interface converts
external protocols to the network link protocol and vice
versa. Master and Target IPs deploy the NIs.
Because the NoC supports highly parallel communication, it
may follow ordering rules such as the PCI producer and
consumer ordering rules; without such rules, the chip may end
up in deadlock or livelock. According to the PCI producer and
consumer ordering rules, non-bufferable transactions should
push the bufferable transactions [3][4], and responses to the
requests should push the bufferable transactions. Network
Interfaces therefore play an important role in maintaining
these ordering rules between the NoC and the IPs. If a
Network Interface is not able to handle these ordering rules
at high bandwidth, it may back-pressure the Target/Initiator
IPs. The throughput or bandwidth of a Network Interface
depends upon the completion management system in it.
Initiator IPs connected to the NoC can generate outstanding
transactions, which means these Initiators generate multiple
requests without waiting for an acknowledgment from the
Targets. The Target IPs may receive them at their own pace,
in any order. The Network Interface, however, should adhere
to the ordering rules of the native protocol of that IP as
well as the ordering rules of the NoC.
II. PREVIOUS RESEARCH WORK
Previous research performs response tracking through static
queue structures. One static queue is allocated per tag, with
a depth equal to the total number of outstanding
transactions. Request transactions are pushed into the queue
that corresponds to their tag. On arrival of a response, an
entry is popped from the queue that corresponds to that tag.
Buffer space is allocated for each tag for the full number of
outstanding transactions. If a particular tag is not used by
an Initiator/Target IP, the buffer space in the queue that
corresponds to that tag is unused. Conversely, if only one
tag has outstanding transactions, then once the buffer space
in its queue runs out, the buffer space in the other queues
cannot be utilized, however large it is.
If there are “T” tags with “O” outstanding transactions,
then the total buffer space required is T * O. Response
tracking with static queues is shown in Fig. 1.
Fig. 1. Response tracking using Static Queues
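The T * O cost of the static-queue scheme can be made concrete with a small calculation. The Python sketch below is only an illustration: the function names and the example values of T and O are assumptions, and the shared-pool figure assumes that a single pool of O entries would be enough for all tags together (pointer storage for a linked list is ignored).

```python
def static_queue_entries(num_tags: int, outstanding: int) -> int:
    """Entries needed when every tag owns a queue of full depth (T * O)."""
    return num_tags * outstanding


def shared_pool_entries(outstanding: int) -> int:
    """Entries needed if all tags share one pool (assumed alternative)."""
    return outstanding


def peak_static_utilisation(num_tags: int, outstanding: int) -> float:
    """Largest fraction of the static storage that can be occupied at once.

    At most `outstanding` requests are in flight in total, so at most
    `outstanding` of the T * O entries ever hold data.
    """
    return outstanding / static_queue_entries(num_tags, outstanding)


if __name__ == "__main__":
    T, O = 16, 32  # illustrative values, not taken from the paper
    print(static_queue_entries(T, O))     # 512
    print(shared_pool_entries(O))         # 32
    print(peak_static_utilisation(T, O))  # 0.0625
```

Under these assumptions the static scheme never uses more than 1/T of its storage, which is the waste that a dynamically allocated structure aims to remove.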
978-1-7281-1933-5/20/$31.00 ©2020 IEEE.
III. THE NOVELTY OF COMPLETION TRACKING SYSTEM
A. Intellectual completion tracking system microarchitecture
The architecture of the Intellectual Completion Tracking
System (ICTS) is shown in Fig. 2. The ICTS block keeps track
of all outstanding transactions and assists in generating the
Response command back to the Bus Agent. The ICTS block
provides multiple advantages with which the overall
performance of the Network on Chip can be improved. The
functional and PPA advantages of ICTS are explained in detail
in section B and section C. The micro-architecture details of
the ICTS block are as follows.
Fig. 2. Intellectual Completion Tracking System Block Level
Diagram
ID Compression Module: Bus agents in an NoC system generally
have an Initiator ID and a Target ID for each Initiator and
each Target, respectively. There is also a tag associated
with each transaction generated by an Initiator. If the
Target has to follow the ordering rules, then the Initiator
Network Interface has to send the Target a new tag, which is
a combination of the Initiator ID and the tag. This new ID is
useful for the Target to generate a response based on the
ordering rules, but it consumes more width in a flit. The ID
compression module therefore generates a unique ID that is
narrower than the Initiator ID + tag combination. The ID
Compression module is preloaded with unique tags in a FIFO,
and one is issued for every new Initiator ID + tag
combination.
Active ID Indicator: If an ID has outstanding transactions,
it is considered an active ID. The ID should be inactivated
when all of its outstanding transactions have received their
acknowledgments. Inactivated IDs can be reused for other
transactions with a new ID.
Head and Tail Pointers: ID-based ordering ensures that all
the transactions with the same tag get their responses in the
same order as the requests. The head pointer of an ID is
updated on every response generation, and the tail pointer of
an ID is updated on every new transaction request for that
ID. The head pointer and the tail pointer assist in
invalidating their corresponding ID. The head pointer of an
ID is also used to fetch the attributes of the transaction in
order to generate a response command to the Initiator.
Attribute Memory: Every new Request generally carries some
Request attributes, which are used to frame the Response
command to the Initiator.
Linked List [5]: A linked list is used to dynamically
allocate buffer space for the Request attributes, which are
used in the generation of a response command. Each TAG ID has
one linked list. The linked list [9] grows at the tail
pointer for new requests that arrive with the same TAG ID.
When a response arrives for a TAG ID, the head of its linked
list advances to the next entry.
If all the outstanding requests have received completions,
the linked list of the TAG ID should be inactivated. It can
then be used for other transactions that have a new Source
ID + TAG.
B. Deployment of ICTS in Target Network Interface
The Target Network Interface with ICTS is shown in Fig. 3.
The Target Network Interface may get multiple transactions
from different Initiators, and it has to acknowledge all
these Requests while satisfying the ordering rules of the
NoC. The Bus Target can be a Bus agent or a Bus Fabric. If it
is a Bus Fabric, the Request transaction has to be forwarded
to the ultimate Bus agent, which is connected to the Bus
Fabric.
Fig. 3. Embedding ICTS in Target Network Interface
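As a behavioural illustration of the tracking structures that the ICTS brings into the Network Interface (a shared attribute memory, one linked list per ID, head and tail pointers, and an active-ID indication), the following Python sketch may help. It is a software model under assumptions, not the authors' RTL: the class name `LinkedListTracker`, its method names, and the use of a free-slot FIFO to allocate attribute memory entries are hypothetical choices made for illustration.

```python
from collections import deque
from typing import Any, Deque, Dict, List, Optional


class LinkedListTracker:
    """Per-ID linked lists that share one attribute memory (software model)."""

    def __init__(self, depth: int) -> None:
        self.attr_mem: List[Any] = [None] * depth            # request attributes
        self.next_ptr: List[Optional[int]] = [None] * depth  # link to next slot
        self.free_slots: Deque[int] = deque(range(depth))    # unallocated slots
        self.head: Dict[int, int] = {}  # ID -> oldest outstanding slot
        self.tail: Dict[int, int] = {}  # ID -> newest outstanding slot

    def is_active(self, tid: int) -> bool:
        """An ID is active while it has at least one outstanding request."""
        return tid in self.head

    def push_request(self, tid: int, attrs: Any) -> bool:
        """Append a request at the tail of its ID's list; False means full."""
        if not self.free_slots:
            return False  # caller would back-pressure the requester
        slot = self.free_slots.popleft()
        self.attr_mem[slot] = attrs
        self.next_ptr[slot] = None
        if tid in self.head:
            self.next_ptr[self.tail[tid]] = slot  # grow from the tail
        else:
            self.head[tid] = slot                 # list becomes active
        self.tail[tid] = slot
        return True

    def pop_response(self, tid: int) -> Any:
        """Return the oldest request's attributes and advance the head."""
        slot = self.head[tid]  # raises KeyError if the ID is inactive
        attrs = self.attr_mem[slot]
        nxt = self.next_ptr[slot]
        if nxt is None:
            del self.head[tid]  # last completion: inactivate the ID
            del self.tail[tid]
        else:
            self.head[tid] = nxt
        self.attr_mem[slot] = None
        self.free_slots.append(slot)  # slot is reusable by any ID
        return attrs
```

Because every slot returns to a common free list, one busy ID can use the whole memory when the other IDs are idle, which is the property that the static queues lack.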
1) Unique TAG Generation
If there are multiple transactions from the same Initiator
with the same tag, then the Network Interface has to send the
responses in the same order in which the requests arrived. If
the requests are from different Initiators but carry the same
tag, it can still send out-of-order responses, as the
requestors are different. In summary, with respect to the
ordering rules: if the Initiator ID + TAG ID combination
differs, the Target Network Interface can send out-of-order
responses; otherwise, it has to send the responses in request
order.
Hence, the Target Network Interface has to forward the
transaction down to the Bus Fabric with a new ID, which is a
combination of the Initiator ID and the tag of the
corresponding transaction. Carrying Initiator ID + TAG
requires more wires to the Bus Fabric and all the way down to
the destination.
The Unique TAG generation feature in the Intellectual
Completion Tracking System generates a unique ID starting
from zero. With the Unique Tag generation feature, Target NI
is able to receive out of order responses and generate response
commands back to the Initiators.
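A minimal Python sketch of such unique ID generation is given below, assuming a FIFO preloaded with compact IDs starting from zero and a lookup from each Initiator ID + tag pair to its compact ID. The class `IdCompressor`, its methods, and the per-ID outstanding counter used to decide when an ID can be recycled are hypothetical; the paper does not give these details.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple

WideId = Tuple[int, int]  # (initiator_id, tag)


class IdCompressor:
    """Maps wide (initiator_id, tag) pairs onto a small pool of compact IDs."""

    def __init__(self, num_ids: int) -> None:
        self.free_ids: Deque[int] = deque(range(num_ids))  # preloaded FIFO
        self.to_compact: Dict[WideId, int] = {}
        self.to_wide: Dict[int, WideId] = {}
        self.outstanding: Dict[int, int] = {}

    def allocate(self, initiator_id: int, tag: int) -> Optional[int]:
        """Return the compact ID for a request, or None if the pool is empty."""
        key = (initiator_id, tag)
        if key not in self.to_compact:
            if not self.free_ids:
                return None  # no compact ID left: the request must wait
            cid = self.free_ids.popleft()
            self.to_compact[key] = cid
            self.to_wide[cid] = key
            self.outstanding[cid] = 0
        cid = self.to_compact[key]
        self.outstanding[cid] += 1  # same pair reuses the same compact ID
        return cid

    def complete(self, cid: int) -> WideId:
        """Account for one response; recycle the ID after the last one."""
        key = self.to_wide[cid]
        self.outstanding[cid] -= 1
        if self.outstanding[cid] == 0:
            del self.to_compact[key]
            del self.to_wide[cid]
            del self.outstanding[cid]
            self.free_ids.append(cid)  # ID becomes inactive and reusable
        return key
```

Requests that share an Initiator ID + tag pair get the same compact ID and so stay ordered with respect to each other, while requests with different pairs get different compact IDs and may complete in any order.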
2) Enabling full Bandwidth for all Initiator IPs in NoC
If the Target Network Interface does not have the capability
to handle outstanding transactions from different Initiators,
then it has to serialize all the transactions, which limits
the bandwidth of the Initiators. Deploying ICTS in the Target
Network Interface allows requests from multiple Initiators to
be processed and sent to the Bus Target. It can handle
out-of-order responses, so all the Initiators can operate at
their full bandwidth, as serialization of transactions is
avoided at the Target Network Interface.
3) Handling split Transactions at Target NI
The Target NI has to process and convert transactions from
the NoC Router to another bus protocol. During this process,
the Target NI may have to split a request transaction and
send it as multiple requests to the Bus Fabric. Responses
come back for each of these split requests. To track the
number of split requests and their responses, split counters
are maintained in ICTS.
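The role of a split counter can be sketched as follows. This is an illustrative Python model only: the helper `split_lengths`, the class `SplitCounter`, and the rule of splitting by a maximum burst length are assumptions, since the paper does not state how requests are split.

```python
from typing import Dict, List


def split_lengths(total_len: int, max_len: int) -> List[int]:
    """Cut one request of total_len beats into pieces of at most max_len."""
    if total_len < 1 or max_len < 1:
        raise ValueError("lengths must be positive")
    full, rest = divmod(total_len, max_len)
    return [max_len] * full + ([rest] if rest else [])


class SplitCounter:
    """Counts the responses still expected for each split transaction."""

    def __init__(self) -> None:
        self.pending: Dict[int, int] = {}  # transaction ID -> responses left

    def start(self, txn_id: int, num_splits: int) -> None:
        """Record how many split requests were sent for this transaction."""
        if num_splits < 1:
            raise ValueError("a transaction needs at least one request")
        self.pending[txn_id] = num_splits

    def on_split_response(self, txn_id: int) -> bool:
        """Count one response; True once the whole transaction is complete."""
        self.pending[txn_id] -= 1
        if self.pending[txn_id] == 0:
            del self.pending[txn_id]
            return True   # the NI can now frame a single response upstream
        return False
```

For example, a 10-beat request with an assumed limit of 4 beats is sent as three requests, and the response towards the Initiator is generated only when the third split response arrives.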
C. Deployment of ICTS in Initiator Network Interface
1) Acknowledgment to the Bus masters for bufferable
request on the network interface link.
A Bus Master in the Network on Chip that uses an interface
protocol such as AXI or OCP expects an acknowledgment for its
requests from the destination or from intermediate targets of
the network topology. If a Bus master’s request is going to
be a bufferable transaction on the Network Interface link,
then the Network Interface (NI) is responsible for
acknowledging the request. The acknowledgment from the
Network Interface to the Bus Master should follow the
ordering rules of the native bus protocol.
As an example, consider a Bus master that is an OCP agent;
the Network Interface then becomes a slave agent. The OCP Bus
master expects all the acknowledgments, with or without data,
in order for a particular tag ID of its requests. The Network
Interface converts the Bus master’s request into a bufferable
or non-bufferable transaction. The Network Interface gets a
response from the target destination for a non-bufferable
transaction only. It should not expect a response for
bufferable transactions, but it still has to provide the
acknowledgment to the bus master. Before acknowledging the
Bus master, it should wait for the responses to all
non-bufferable requests to arrive from the target.
ICTS ensures that all the bufferable transactions are
acknowledged to the Bus masters. The attribute memory has a
flag that gives the transaction type of a pending completion.
If the flag indicates that the acknowledgment is pending for
a non-bufferable transaction, then the NI must wait for the
Response from the NoC; otherwise, it can generate a response
to the Bus master.
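The flag-based decision can be illustrated with a short Python model. It assumes one in-order list of pending completions per tag, in which a bufferable entry can be acknowledged as soon as it reaches the head, while a non-bufferable entry at the head holds back everything behind it until its NoC response arrives. The names `Pending`, `InitiatorAckTracker`, and the request identifiers are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List


@dataclass
class Pending:
    req_id: int
    non_bufferable: bool           # the transaction-type flag
    response_arrived: bool = False


class InitiatorAckTracker:
    """Issues acknowledgments to the bus master in request order per tag."""

    def __init__(self) -> None:
        self.queues: Dict[int, Deque[Pending]] = {}

    def issue(self, tag: int, req_id: int, non_bufferable: bool) -> None:
        """Record a request that was sent on the network link."""
        self.queues.setdefault(tag, deque()).append(
            Pending(req_id, non_bufferable))

    def noc_response(self, tag: int, req_id: int) -> None:
        """Mark the NoC response of a non-bufferable request as received."""
        for entry in self.queues.get(tag, ()):
            if entry.req_id == req_id:
                entry.response_arrived = True
                return
        raise KeyError((tag, req_id))

    def drain(self, tag: int) -> List[int]:
        """Return the request IDs that can be acknowledged now, in order."""
        acked: List[int] = []
        queue = self.queues.get(tag, deque())
        while queue and (not queue[0].non_bufferable
                         or queue[0].response_arrived):
            acked.append(queue.popleft().req_id)
        return acked
```

In this model a bufferable request issued after a non-bufferable one on the same tag is acknowledged only after the earlier response has come back, which keeps the acknowledgments in order for that tag.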
2) To Echo back certain attributes in the Response
command for the Bus Masters.
Bus Masters expect certain attributes to be echoed back in
the Response. If the Initiator Network Interface cannot
return the required attributes to the Bus master, then all
the attributes that have to be echoed must travel up to the
final destination and back to the Bus master, which increases
the flit count of the transaction, the area, and the power
consumption. Deploying ICTS for this purpose ensures that all
the attributes required by the Bus Master are returned.
While sending the transaction towards the Target, the
Initiator Network Interface has to push the required
attributes into the Completion Tracking System, so that the
Completion Tracking System can provide those attributes
during the generation of the Response.
3) Unique ID generation
Unique ID generation in the Initiator NI is also significant,
as the Initiator IP can be plugged into a bus fabric, so the
ID of the transaction can be wide. Hence there is a need to
compress the ID of the transaction.
4) Out of order and interleaving at the split boundary
The Initiator Network Interface may not be capable of
handling out-of-order responses from the Network link, even
though the Bus Master has the capability to receive them. For
example, if the Bus Master is an AXI Master and it generates
outstanding transactions with different tags, then the Bus
Master may get out-of-order responses for different tags. In
this case, ICTS can assist in framing the completion to the
AXI Master. As soon as the NI gets an out-of-order response,
it presents the tag to the ICTS, and the ICTS can give the
attributes for that transaction in the same cycle. Once the
response is generated, the entry for that transaction in the
ICTS can be removed.
Fig. 4. Embedding ICTS in Initiator Network Interface
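The store-then-look-up behaviour at the Initiator NI, which also covers echoing attributes back to the Bus Master, can be sketched in Python as below. The class `EchoStore`, the attribute names in the usage example, and the restriction to one outstanding request per tag are simplifying assumptions made for illustration only.

```python
from typing import Any, Dict


class EchoStore:
    """Keeps request attributes locally until the matching response returns."""

    def __init__(self) -> None:
        self.entries: Dict[int, Dict[str, Any]] = {}

    def on_request(self, tag: int, attrs: Dict[str, Any]) -> None:
        """Push the attributes to be echoed when the request is sent."""
        if tag in self.entries:
            raise ValueError(f"tag {tag} is already outstanding")
        self.entries[tag] = dict(attrs)

    def on_response(self, tag: int, payload: Any) -> Dict[str, Any]:
        """Look up by tag, frame the completion, and remove the entry."""
        attrs = self.entries.pop(tag)  # raises KeyError for an unknown tag
        return {"tag": tag, **attrs, "data": payload}


if __name__ == "__main__":
    store = EchoStore()
    store.on_request(1, {"length": 4})  # hypothetical attribute
    store.on_request(2, {"length": 8})
    # Responses may return in a different order than the requests.
    print(store.on_response(2, "D2"))
    print(store.on_response(1, "D1"))
```

Because the lookup is keyed by tag, the order in which responses return does not matter, and the echoed attributes never have to travel through the NoC.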
IV. RESULTS
Fig. 5 shows the waveform for the Intellectual Completion
Tracking System implemented in a five-port router, where
data arriving from different ports is ordered according to
the logic and sent to the output port.
Fig. 5. Simulation results of 5 port router
Synthesis Results: The proposed circuit is synthesized for an
FPGA, and the implementation details are listed below. Fig. 6
provides the device utilization summary of the circuit
implemented in a Spartan-3 FPGA.
Fig. 6. Device Utilization Summary.
Fig. 7 and Fig. 8 show the timing report and the worst
negative slack (WNS) information.
Fig. 7. Timing Report
CONCLUSION
Performance: Multiple outstanding transactions are allowed
instead of serialization at all the Target and Initiator
Network Interfaces, which can have a cumulative impact on the
performance of the network-on-chip.
Area: The gate count of the NoC reduces, as the area of the
tracking system is reduced at all Network Interfaces.
The functional benefits mentioned in sections III.A and III.B
create a significant improvement in the functionality of the
entire NoC.
REFERENCES
[1] H. Zhang, K. Wang, Y. Dai, and L. Liu, “A multi-VC dynamically
shared buffer with prefetch for network on chip,” in Proc. IEEE 7th Int.
Conf. Netw., Archit., Storage, Xiamen, China, Jun. 2012, pp. 320–327.
[2] W. J. Dally and B. Towles, “Buffered flow control,” in Principles and
Practices of Interconnection Networks. San Francisco, CA, USA:
Morgan Kaufmann, 2003.
[3] M. Lai, Z. Wang, L. Gao, H. Lu, and K. Dai, “A dynamically-allocated
virtual channel architecture with congestion awareness for on-chip
routers,” in Proc. 45th ACM/IEEE DAC, Anaheim, CA, USA, Jun.
2008, pp. 630–633.
[4] M. Oveis-Gharan and G. N. Khan, “Efficient dynamic virtual
channel organization and architecture for NoC systems,” IEEE
Trans. Very Large Scale Integr. (VLSI) Syst., Feb. 2016.
[5] H. Zhang, K. Wang, Y. Dai, and L. Liu, “A multi-VC dynamically
shared buffer with prefetch for network on chip,” in Proc. IEEE 7th Int.
Conf. Netw., Archit., Storage, Xiamen, China, Jun. 2012, pp. 320–327.
[6] J. Liu and J. G. Delgado-Frias, “DAMQ self-compacting buffer schemes
for systems with network-on-chip,” in Proc. IEEE ICCD, Las Vegas,
NV, USA, 2005, pp. 97–103.
[7] C. Nicopoulos, A. Yanamandra, S. Srinivasan, N. Vijaykrishnan, and
M. J. Irwin, “Variation-aware low-power buffer design,” in Proc. Conf.
Rec. 41st Asilomar Conf. Signals, Syst., Comput., Pacific Grove, CA,
USA, 2007, pp. 1402–1406.
[8] M. O. Gharan and G. N. Khan, “A novel virtual channel implementation
technique for multi-core on-chip communication,” in Proc. WAMCA,
New York, NY, USA, 2012, pp. 36–41.
[9] J. Liu and J. G. Delgado-Frias, “A shared self-compacting buffer for
network-on-chip systems,” in Proc. 49th IEEE Int. Midwest Symp.
Circuits Syst., San Juan, Puerto Rico, Aug. 2006, pp. 26–30.
[10] J. Park, B. W. O’Krafka, S. Vassiliadis, and J. Delgado-Frias,
“Design and evaluation of a DAMQ multiprocessor network with
self-compacting buffers,” in Proc. Supercomputing, Washington, DC,
USA, 1994, pp. 713–722.
Fig. 8. WNS Report