Survey Exploration of Network-on-Chip Architecture

N. Ashokkumar (1), P. Nagarajan (2), S. Ravanaraja (3)
1,3 Department of ECE, R.V.S College of Engg & Tech, Dindigul. 2 Department of ECE, PSNA College of Engg & Tech, Dindigul. E-mail: kohsa.ayk@rediffmail.com

Abstract: Network-on-chip is a very active research field with many practical applications in industry. Based on this study, the following topics were identified as especially crucial for the continued development and success of the NoC paradigm: procedures and test cases for benchmarking, traffic characterization and modeling, design automation, latency and power minimization, fault tolerance, QoS policies, prototyping, and network interface design. Network-on-chip (NoC) architectures are emerging as a highly scalable, reliable, and modular on-chip communication infrastructure paradigm. A NoC architecture uses layered protocols and packet-switched networks consisting of on-chip routers, links, and network interfaces on a predefined topology. The major goal of communication-centric design and the NoC paradigm is to achieve greater design productivity and performance by handling the increasing parallelism, manufacturing complexity, wiring problems, and reliability issues. Key research areas are: (i) communication infrastructure: topology, link optimization, buffer sizing, floorplanning, clock domains, and power; (ii) communication paradigm: routing, switching, flow control, quality of service, and network interfaces; (iii) benchmarking and traffic characterization for design-time and run-time optimization; (iv) application mapping: task mapping/scheduling and IP component mapping.

I. Introduction: Network-on-Chip
A Network-on-Chip (NoC) consists of routers, links, and network interfaces. Routers direct data over several links (hops). The topology defines their logical layout (connections), whereas the floorplan defines the physical layout. The function of a network interface (adapter) is to decouple computation (the resources) from communication (the network).
Definition: A network-on-chip is a communication network targeted for on-chip use. The basic properties of the NoC paradigm are that it:
• separates communication from computation;
• avoids a global, centralized controller for communication;
• allows an arbitrary number of terminals;
• offers scalability: the topology allows the addition of links as the system size grows;
• supports customization (link width, buffer sizes, even topology);
• allows multiple voltage and frequency domains;
• delivers data in order, either naturally or via layered protocols;
• offers varying guarantees for transfers;
• supports system testing.
Fig 1. Network-on-Chip

A. NoC Topologies: Topology is a very important feature in the design of a NoC because the design of a router depends upon it. Different topologies have been proposed in the literature for the design of NoCs. Commonly used topologies are the mesh, ring, torus, binary tree, bus, and spidergon. Some researchers have also proposed topologies suited to a particular application or application area. The topology is statically known and usually very regular (e.g., a mesh).
(i) Mesh: A mesh-shaped network consists of m columns and n rows. The routers are situated at the intersections of two wires, and the computational resources are placed near the routers. Addresses of routers and resources are easily defined as x-y coordinates in a mesh. A regular mesh network is also called a Manhattan Street network.
(ii) Torus: A torus network is an improved version of the basic mesh network. A simple torus network is a mesh in which the heads of the columns are connected to the tails of the columns and the left sides of the rows are connected to the right sides of the rows. A torus network has better path diversity than a mesh network, and it also has more minimal routes.
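Since routers and resources in a mesh are addressed by x-y coordinates, minimal hop counts in the mesh and torus can be sketched in a few lines (the grid dimensions and row-major node numbering are illustrative assumptions, not from a specific NoC):

```python
# Sketch: x-y addressing and minimal hop counts in an m-column mesh,
# and in a torus whose wrap-around links shorten paths. Dimensions
# and the row-major node numbering are illustrative assumptions.

def mesh_coords(node_id, columns):
    """Map a linear node id to (x, y) coordinates, row-major."""
    return node_id % columns, node_id // columns

def mesh_hops(src, dst, columns):
    """Minimal hop count in a mesh is the Manhattan distance."""
    sx, sy = mesh_coords(src, columns)
    dx, dy = mesh_coords(dst, columns)
    return abs(sx - dx) + abs(sy - dy)

def torus_hops(src, dst, columns, rows):
    """In a torus, each axis may instead be crossed via the wrap link."""
    sx, sy = mesh_coords(src, columns)
    dx, dy = mesh_coords(dst, columns)
    return (min(abs(sx - dx), columns - abs(sx - dx))
            + min(abs(sy - dy), rows - abs(sy - dy)))

# Corner-to-corner traffic in a 4 x 4 network:
print(mesh_hops(0, 15, 4))      # 6 hops (3 along X, 3 along Y)
print(torus_hops(0, 15, 4, 4))  # 2 hops: wrap links shorten both axes
```

The corner-to-corner figures illustrate the shorter minimal routes and better path diversity claimed for the torus.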
(iii) Tree: In a tree topology the nodes are routers and the leaves are computational resources. The routers above a leaf are called the leaf's ancestors and, correspondingly, the leaves below an ancestor are its children. In a fat-tree topology each node has replicated ancestors, which means that there are many alternative routes between nodes.
(iv) Butterfly: A butterfly network is uni- or bidirectional, and a butterfly-shaped network typically uses deterministic routing. For example, a simple unidirectional butterfly network contains 8 input ports, 8 output ports, and 3 router levels, each of which contains 4 routers. Packets arriving at the inputs on the left side of the network are routed to the correct output on the right side of the network. In a bidirectional butterfly network, all the inputs and outputs are on the same side of the network. Packets coming to the inputs are first routed to the other side of the network, then turned around and routed back to the correct output.
(v) Polygon: The simplest polygon network is a circular network where packets travel in a loop from one router to another. The network becomes more diverse when chords are added between routers. When there are chords only between opposite routers, the topology is called spidergon; a polygon (hexagon) network may have all potential chords.
Fig 2. NoC topologies: mesh, torus, tree, butterfly (with 4 inputs, 4 outputs, and 2 router stages each containing 2 routers), and polygon.
(vi) Star: A star network consists of a central router in the middle of the star and computational resources or subnetworks at the spikes of the star. The capacity requirements of the central router are quite large, because all the traffic between the spikes goes through it. This causes a considerable possibility of congestion in the middle of the star.
(vii) Double-chain topology: The double-chain NoC topology is a new class of interconnection network topology.
Double-chain topologies consist of two disjoint but overlapping chains, each of which connects all network nodes. These topologies are well suited to both 2-D planar VLSI technology and the ABC router microarchitecture. Double-chain topologies provide an advantage over 2-D mesh networks for ABC routers by providing two paths composed solely of "straight-path" links between all source-destination pairs. Double-chain topologies also offer a higher degree of path diversity than a standard 2-D mesh: in contrast to a 2-D mesh, where all source and destination pairs have only two deadlock-free paths between them, double-chain topologies offer four such paths.

B. NoC Router: A router is a device that forwards data packets between networks. A router is connected to two or more data lines from different networks. When a data packet comes in on one of the lines, the router reads the address information in the packet to determine its ultimate destination. Then, using information in its routing table or routing policy, it directs the packet to the next network on its journey. Routers thus perform the traffic-directing function in the network. A data packet is typically forwarded from one router to another through the networks that constitute the internetwork until it reaches its destination node.
a. Routing: Arbitration and routing logic are designed for minimal complexity and low latency, because router stages typically must take no more than a few cycles.
Fig 3. NoC routing
Classification of routing in NoC:
(i) Deterministic vs. Adaptive Routing: One way to classify routing in NoC is as deterministic or adaptive. In deterministic routing, the path from the source to the destination is completely determined in advance by the source and destination addresses. In adaptive routing, multiple paths from the source to the destination are possible.
When a packet enters a router, the destination address is read from the header and, accordingly, the routing function computes all possible output ports to which the packet can be forwarded. A selection function then chooses one of the admissible output ports. The selection of the output port depends upon dynamic network conditions such as congestion and link faults. There also exist partially adaptive routing algorithms, which restrict certain paths for communication; they are simpler and easier to implement than fully adaptive routing algorithms.
(ii) Minimal and Non-Minimal Routing: Routing that uses the shortest possible paths for communication is known as minimal routing. It is also possible to use longer paths for data transfer from source to destination; this possibility results from the adaptivity offered by a routing algorithm. Routing that uses longer paths although shortest paths exist is known as non-minimal routing. Non-minimal routing has some advantages over minimal routing, including the possibility of balancing the network load and of fault tolerance.
(iii) Static and Dynamic Routing: In static routing, the path cannot be changed after a packet leaves the source. In dynamic routing, a path can be altered at any time depending upon the network conditions. Source routing is static, while distributed routing can be static or dynamic depending upon the routing algorithm used. It should be noted that even when adaptive routing algorithms are used to compute paths for source routing, the routing remains static unless some sophisticated selection technique is introduced in the network.
(iv) Application-Specific Routing: This type of routing is used for specialized applications or a set of concurrent applications. For a specific application of a NoC-based SoC in an embedded system, we can have a good profile of the communications among the different cores.
This means that it is possible to know which cores communicate with each other and which cores do not communicate at all. To get the best performance of the NoC for a specific application, a specialized application-specific routing algorithm can be used; APSRA is one such algorithm.
(v) Minimal Adaptive Routing: A minimal adaptive routing algorithm always routes packets along a shortest path. The algorithm is effective when more than one minimal (or as short as possible) route between sender and receiver exists; it uses the route that is least congested.
(vi) Fully Adaptive Routing: A fully adaptive routing algorithm always uses a route that is not congested, regardless of whether that route is the shortest path between sender and receiver. Typically, an adaptive routing algorithm ranks the alternative congestion-free routes in order of preference, with the shortest route being the best one.
(vii) Congestion Look-Ahead: A congestion look-ahead algorithm obtains congestion information from other routers. Based on this information, the routing algorithm can direct packets to bypass the congested areas.
(viii) Turnaround Routing: Turnaround routing is a routing algorithm for butterfly and fat-tree networks. The senders and receivers of packets are all on the same side of the network. Packets are first routed from the sender to some random intermediate node on the other side of the network. At this node the packets are turned around and routed back to the destination on the side of the network where the routing started. The routing from the intermediate node to the final receiver is done with destination-tag routing. Routers in turnaround routing are bidirectional, which means that packets can flow through a router in both the forward and backward directions.
The algorithm is deadlock-free because packets only turn around once, from a forward channel to a backward channel. SPIN (Scalable Programmable Interconnect Network) is a fat-tree-shaped network which uses the turnaround routing algorithm. In the fault-tolerant XGFT (eXtended Generalized Fat Tree) system the turnaround routing is called turn-back routing; the network topology in XGFT systems is also a fat tree. XGFT's turn-back routing differs slightly from the basic turnaround algorithm: while traditional turnaround routing chooses the intermediate node randomly, the XGFT turn-back algorithm can choose it deliberately, which is useful when the network is congested.
(ix) Turn-Back-When-Possible: Turn-back-when-possible (TBWP) is an algorithm for routing on tree networks. It is a slightly improved version of turnaround routing: when the turn-back channels are busy, the algorithm looks for a free routing path on a higher switch level. A turn-back channel is a channel between a forward and a backward channel; it is used to change the routing direction in the network.
(x) Odd-Even Routing: Odd-even routing is an adaptive algorithm used in the dynamically adaptive and deterministic (DyAD) network-on-chip system. Odd-even routing is a deadlock-free turn model which prohibits turns from east to north and from east to south at tiles located in even columns, and turns from north to west and from south to west at tiles located in odd columns. The DyAD system uses minimal odd-even routing, which reduces energy consumption and also removes the possibility of livelock.
(xi) XY Routing Algorithm: XY routing is one of the simplest and most commonly used routing algorithms in NoC. It is a static, deterministic, and deadlock-free routing algorithm. Of the eight possible turns in a mesh topology, the XY routing algorithm allows half and restricts the rest.
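The XY routing rule can be sketched in a few lines (a minimal illustration; the port names and coordinate conventions are our assumptions):

```python
# Sketch of deterministic XY routing on a mesh: route along the X axis
# until the packet is in the destination column, then along Y.
# Port names and coordinates are illustrative assumptions.

def xy_next_port(cur, dst):
    """cur, dst are (x, y) router coordinates; returns the output port."""
    cx, cy = cur
    dx, dy = dst
    if cx < dx:
        return "EAST"
    if cx > dx:
        return "WEST"
    if cy < dy:
        return "NORTH"
    if cy > dy:
        return "SOUTH"
    return "LOCAL"  # arrived: eject to the attached resource

def xy_path(src, dst):
    """Hop-by-hop route from src to dst under the XY rule."""
    step = {"EAST": (1, 0), "WEST": (-1, 0), "NORTH": (0, 1), "SOUTH": (0, -1)}
    path, cur = [src], src
    while cur != dst:
        dx, dy = step[xy_next_port(cur, dst)]
        cur = (cur[0] + dx, cur[1] + dy)
        path.append(cur)
    return path

print(xy_path((0, 0), (2, 1)))  # [(0, 0), (1, 0), (2, 0), (2, 1)]
```

Because a turn from the Y axis back to the X axis never occurs, the channel dependency graph has no cycles, which is the source of the deadlock freedom noted above.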
According to this algorithm, a packet must always be routed along the horizontal (X) axis of the mesh until it reaches the same column as the destination. It is then routed along the vertical (Y) axis, towards the location of the destination resource.

C. Switching Mechanism: NoC architectures employ several basic switching mechanisms.
a. Circuit switching: In the circuit-switched (CS) network, a real or virtual circuit establishes a direct connection between source and destination. It has a statically scheduled data path and no inherent control. Each output port is 64 bits wide, since no control data is necessary. To provide more flexibility, each 64-bit output port is split into four 16-bit-wide lanes. Given the 5-port design, 20 input and 20 output lanes therefore exist. A 16 x 20 crossbar provides full connectivity between every input and output lane, except that no U-turns are allowed. The crossbar allocation is a configurable memory of 20 entries (one for each output lane) with 5 bits per entry (4 address bits to identify an input lane and 1 valid bit). The splitting of a 64-bit flit into 16-bit units for transport over the network also means that a serializing and deserializing unit is necessary at the tile interface of the router. The completely static nature of the CS network means that a separate control network is necessary for configuration; all experiments therefore considered the circuit-switched and packet-switched routers together, to account for the necessary overhead of the packet-switched network.
b. Wormhole flow control: The wormhole (WH) router performs dynamic allocation, but not at the cost of highly complex allocation methods. The WH router uses a conventional input-queued architecture with 4-flit-deep buffers at each input. A two-stage pipeline is provided.
The use of look-ahead routing allows switch allocation to occur in the first stage, with crossbar and link traversal in the second. Control information is appended to each flit rather than being carried in an additional header flit. The 64-bit data path therefore combines with a one-hot-encoded 5-bit next-port identifier for look-ahead routing, two bits each for the destination and source addresses, and one bit to identify tail flits, resulting in a total flit size of 74 bits. A pipeline register is provided between the input first-in-first-out buffers (FIFOs) and the crossbar. For the crossbar traversal stage, the flit at the head of the FIFO is loaded into this register, which drives it across the rest of the data path. A stop-go flow control is also used for buffer management, where a buffer-nearly-full signal is output by each input FIFO to the corresponding upstream router to indicate that flit transmission should be stopped.
c. Virtual-channel flow control: The QoS-providing virtual-channel (GuarVC) router architecture allows a comparison to a design using an increasing amount of control. The router is designed to offer QoS for streaming applications, while also using source routing and semi-dynamic allocation of resources. The GuarVC router implements wormhole routing with virtual-channel flow control. A conventional input-queued architecture with 4 VCs per port and 4-flit-deep buffers for each VC is used. Each flit identifies its VC by using a 2-bit VC identifier. The use of separate head, body, and tail flits means that the flit type is encoded by an additional 2 bits. Combining these with the 64-bit data path results in a total flit size of 68 bits. Source routing is used to determine the packet's entire route at the originating node, and the route is then carried by one or more header flits. Per hop of the route, 6 bits are required: 2 bits for the next port, 2 bits for the VC, and a 2-bit identifier for VC allocation.
Fig 4. NoC switching mechanism
For a 64-bit data path, routing information for 10 hops can thus be merged into a single header flit. Input VC queues do not share a single crossbar port per input port; the crossbar is therefore asymmetric and has 20 inputs, i.e., one input for every input VC queue. This creates a single point of arbitration that is used to enable QoS. To provide for guaranteed-throughput traffic, a central controller allocates network VCs to at most a single QoS-requiring data stream. The round-robin arbiters used at each output port then give a predictable arbitration result, where each data stream is guaranteed a certain proportion of the network throughput, i.e., throughput-based QoS demands can be met. Best-effort flows are dealt with by assigning the same VC to multiple data streams. Conflict-free VC allocation is guaranteed.
d. Speculative virtual-channel flow control: The speculative, single-cycle, virtual-channel (SpecVC) router contains a large amount of allocation logic, which attempts to provide good resource sharing while minimizing latency. The SpecVC router provides single-cycle flit forwarding by utilizing look-ahead routing and speculative VC and crossbar allocation. A conventional input-queued architecture with 4 VCs per port and 4-flit-deep cyclic buffers for each VC is used. Each flit identifies its VC by using a one-hot-encoded 4-bit VC identifier. A 5-bit next-port identifier, 4 bits each for the destination and source addresses, and a bit to identify tail flits combine with the 64-bit data path to give a total flit size of 82 bits. Both the VC and switch allocators (based on matrix arbiters) can allocate VCs and crossbar ports speculatively for the next clock cycle if necessary. Since both crossbar and link traversal are performed in a single clock cycle, in the best case an incoming flit finds preallocated resources and can thus be forwarded to the next hop in a single clock cycle. A stop-go flow control method is utilized to prevent buffer overflow.
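The flit-size figures quoted for the wormhole, virtual-channel, and speculative routers follow directly from their field widths; the bookkeeping can be checked with a few lines (the breakdown of the address bits is our reading of the text):

```python
# Sketch: reproduce the flit-size bookkeeping of the three
# packet-switched routers. Field names are our reading of the text.
DATA = 64  # shared 64-bit data path

wh_flit  = DATA + 5 + 2 + 2 + 1      # next-port + dest + src + tail
guar_vc  = DATA + 2 + 2              # VC identifier + flit type
spec_vc  = DATA + 5 + 4 + 4 + 4 + 1  # next-port + dest + src + VC + tail
print(wh_flit, guar_vc, spec_vc)     # 74 68 82 bits

# Source routing in the VC router needs 6 bits per hop
# (2 next-port + 2 VC + 2 VC-allocation), so one 64-bit
# header flit carries routing information for:
print(64 // 6)                       # 10 hops
```

The 74-, 68-, and 82-bit totals match the figures in the text, as does the 10-hop capacity of a single header flit.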
Traffic in the SpecVC router is distinguished by the 2-bit identifier in the header flit, but no particular bandwidth or latency is guaranteed.

D. NoC performance analysis and NoC communication refinement
a. Research on wormhole-switched networks has traditionally emphasized the flit delivery phase while simplifying flit admission and ejection. We have initiated an investigation of these issues. It turns out that different flit-admission and flit-ejection models have quite different impacts on cost, performance, and power. For the classical wormhole switch architecture, we propose the coupled flit-admission and p-sink flit-ejection models. These optimizations are simple but effective. Coupled admission significantly reduces the crossbar complexity; since the crossbar consumes a large portion of the power in the switch, this adjustment is beneficial in both cost and power. The network performance, however, is not sensitive to the adjustment before the network reaches its saturation point. The p-sink model directly decreases buffering cost and has negligible impact on performance before network saturation. As support for one-to-many communication is necessary, we design a multicasting protocol and implement it in a wormhole-switched network. This multicast service is connection-oriented and QoS-aware. For the TDM virtual-circuit configuration, we utilize the generalized logical-network concept and develop theorems to guide the construction of contention-free virtual circuits. Moreover, we employ a back-tracking algorithm to explore the path diversity and systematically search for feasible configurations.
b. On-chip networks expose a much larger design space to explore than buses. The large number of design considerations at different layers makes design decisions difficult. As a consequence, it is desirable to explore these alternatives and to evaluate the resulting networks extensively.
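Such design-space exploration is commonly driven by synthetic workload patterns; the following minimal sketch shows three standard patterns (uniform random, transpose, and hotspot; the selection and parameters are illustrative, not the traffic models proposed in the work discussed here):

```python
import random

# Sketch: three standard synthetic traffic patterns for an n x n mesh.
# Each generator maps a source tile (x, y) to a destination tile.
# The pattern selection and parameters are illustrative only.

def uniform_random(src, n, rng=random):
    """Every tile sends to a uniformly chosen other tile."""
    while True:
        dst = (rng.randrange(n), rng.randrange(n))
        if dst != src:
            return dst

def transpose(src, n):
    """Tile (x, y) sends to (y, x); stresses paths across the diagonal."""
    return (src[1], src[0])

def hotspot(src, n, hot=(0, 0), p_hot=0.2, rng=random):
    """With probability p_hot, traffic targets a single hotspot tile."""
    if src != hot and rng.random() < p_hot:
        return hot
    return uniform_random(src, n, rng)

print(transpose((1, 3), 4))  # (3, 1)
```

Feeding such generators into a cycle-level simulator is the usual way to compare topologies and routing algorithms under controlled, repeatable load.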
We have proposed traffic representation methods to configure various workload patterns. Together with the choices of traffic configuration parameters, the exploration of the network design space can be conducted in our network simulation environment. We have suggested a contention-tree model which can be used to approximate network contention. Using this model and its associated scheduling method, we develop a feasibility analysis test in which the satisfaction of timing constraints for real-time messages can be evaluated through an estimation program. As communication takes the central role in a design flow, how to refine an abstract communication model onto an on-chip network-based communication platform is an open problem. Starting from a synchronous specification, we have formulated the problem and proposed a refinement approach. This refinement is oriented towards correctness, performance, and resource usage. Correct-by-construction behavior is achieved by maintaining synchronization consistency. We have also integrated the refinement of communication protocols into our approach, thus satisfying performance requirements. By composing and merging the communication tasks of processes to share the underlying implementation channels, the network utilization can be improved.

E. Networks on Chip: Challenges and Solutions
a. The network-on-chip (NoC) design paradigm is viewed as an enabling solution for the integration of an exceedingly high number of computational and storage blocks in a single chip. The practical implementation and adoption of the NoC design paradigm is faced with various unsolved issues related to design methodologies, test strategies, and dedicated CAD tools. As a result of the increasing degree of integration, several research groups are striving to develop efficient on-chip communication infrastructures.
Today, there exist many SoC designs that contain multiple processors, in applications such as set-top boxes, wireless base stations, HDTV, mobile handsets, and image processing. New trends in the design of communication architectures in multi-core SoCs have appeared in the research literature recently. In particular, researchers suggest that multi-core SoCs can be built around different regular interconnect structures originating from parallel computing architectures. Custom-built, application-specific interconnect architectures are another promising solution. Such communication-centric interconnect fabrics are characterized by different trade-offs with regard to latency, throughput, reliability, energy dissipation, and silicon area requirements. The nature of the application will dictate the selection of a specific template for the communication medium. A complex SoC can be viewed as a micro-network of multiple blocks, and hence models and techniques from networking and parallel processing can be borrowed and applied to an SoC design methodology. The micro-network must ensure quality-of-service requirements (such as reliability and guaranteed bandwidth/latency) and energy efficiency, under the limitation of intrinsically unreliable signal transmission media. Such limitations are due to the increased likelihood of timing and data errors, the variability of process parameters, crosstalk, and environmental factors such as electromagnetic interference (EMI) and soft errors.
b. Current simulation methods and tools can be ported to networked SoCs to validate functionality and performance at various abstraction levels, ranging from the electrical to the transaction level. NoC libraries, including switches/routers, links, and interfaces, will provide designers with flexible components to complement processor/storage cores.
Nevertheless, the usefulness of such libraries to designers will depend heavily on the level of maturity of the corresponding synthesis/optimization tools and flows. In other words, micro-network synthesis will enable NoC/SoC design in much the same way that logic synthesis made efficient semicustom design possible in the eighties.
c. Though the design process of NoC-based systems borrows some of its aspects from the parallel computing domain, it is driven by a significantly different set of constraints. From the performance perspective, high throughput and low latency are desirable characteristics of MP-SoC platforms. However, from a VLSI design perspective, the energy dissipation profile of the interconnect architecture is of prime importance, as it can represent a significant portion of the overall energy budget. The silicon area overhead due to the interconnect fabric is important too. The common characteristic of these kinds of architectures is that the processor/storage cores communicate with each other through high-performance links and intelligent switches, and that the communication design can be represented at a high abstraction level.
d. The exchange of data among the processor/storage cores is becoming an increasingly difficult task with growing system size and non-scalable global wire delay. To cope with these issues, the end-to-end communication medium needs to be divided into multiple pipelined stages, with the delay in each stage comparable to the clock-cycle budget. In NoC architectures, the inter-switch wire segments together with the switch blocks constitute a highly pipelined communication medium characterized by link pipelining, deeply pipelined switches, and latency-insensitive component design. A new design methodology can be widely adopted only if it is complemented by efficient test mechanisms and methodologies.
The development of test infrastructures and techniques supporting the network-on-chip design paradigm is a challenging problem. Specifically, the design of specialized Test Access Mechanisms (TAMs) for distributing test vectors and of novel Design for Testability (DFT) schemes is of major importance. Moreover, in a communication-centric design environment like that provided by NoCs, fault tolerance and reliability of the data transmission medium are two significant requirements in safety-critical applications.
e. The test strategy of NoC-based systems must address three problems: (i) testing of the functional/storage blocks and their corresponding network interfaces; (ii) testing of the interconnect infrastructure itself; and (iii) testing of the integrated system. For testing the functional/storage blocks and their corresponding network interfaces, a Test Access Mechanism (TAM) is needed to transport the test data. Such a TAM provides on-chip transport of test stimuli from a test pattern source to the core under test, and it also transmits test responses from the core under test to a test pattern sink. The principal advantage of using NoCs as TAMs is the resulting reuse of existing resources and the availability of several parallel paths to transmit test data to each core. Therefore, a reduction in system test time can be achieved through extensive test parallelization, i.e., more functional blocks can be tested in parallel as more test paths become available. The controllability/observability of NoC interconnects is relatively reduced, because they are deeply embedded and spread across the chip. Pin-count limitations restrict the use of I/O pins dedicated to the test of the different components of the data-transport medium; therefore, the NoC infrastructure should be progressively used for testing its own components in a recursive manner, i.e., the good, already-tested NoC components should be used to transport test patterns to the untested elements.
This test strategy minimizes the use of additional mechanisms for transporting data to the NoC elements under test, while allowing a reduction of test time through the use of parallel test paths and test data multicast.
f. Testing the functional/storage blocks and the interconnect infrastructure separately is not sufficient to ensure adequate test quality. The interaction between the functional/storage cores and the communication fabric has to undergo extensive functional testing. This functional system testing should encompass testing of the I/O functions of each processing element and of the data routing functions. Many SoCs are used within embedded systems, where reliability is an important figure of merit. At the same time, in deep-submicron technologies beyond the 65 nm node, failures of transistors and wires are more likely to happen due to a variety of effects, such as soft (cosmic) errors, crosstalk, process variations, electromigration, and material aging. In general, we can distinguish between transient and permanent failures; the design of reliable SoCs must encompass techniques that address both types of malfunction. From a reliability point of view, one of the advantages of packetized communication is the possibility of incorporating error control information into the transmitted data stream. Effective error detection and correction methods borrowed from the fault-tolerant computing and communications engineering domains can be applied to cope with uncertainty in on-chip data transmission. Such methods need to be evaluated and optimized in terms of area, delay, and power trade-offs. Permanent failures may be due to material aging (e.g., of the oxide), electromigration, and mechanical/thermal stress. Failures can incapacitate a processing/storage core and/or a communication link. Different fault-tolerant multiprocessor architectures and routing algorithms have been proposed in the parallel processing domain.
Some of these can be adapted to the NoC domain, but their effectiveness needs to be evaluated in terms of defect/error coverage versus throughput, delay, energy dissipation, and silicon area overhead metrics.
g. Network interfacing: The success of the NoC design paradigm relies greatly on the standardization of the interfaces between IP cores and the interconnection fabric. Using a standard interface should not impact the methodologies for IP core development. In fact, IP cores wrapped with a standard interface exhibit higher reusability and greatly simplify the task of system integration. The Open Core Protocol (OCP) is a plug-and-play interface standard receiving wide industrial and academic acceptance. As shown in the figure below, for a core having both master and slave interfaces, the OCP-compliant signals of the functional IP blocks are packetized by a second interface. The network interface has two functions: (1) injecting/absorbing the flits leaving/arriving at the functional/storage blocks; (2) packetizing/depacketizing the signals coming from/going to OCP-compatible cores in the form of messages/flits. All OCP signals are unidirectional and synchronous, simplifying core implementation, integration, and timing analysis. The OCP defines a point-to-point interface between two communicating entities, such as an IP core and the communication medium. One entity acts as the master of the OCP instance and the other as the slave. OCP unifies all inter-core communications, including dataflow, sideband control, and test-specific signals. The state of the art has reached the point where commercial designs readily integrate on the order of 10-100 embedded functional/storage blocks in a single SoC.
Fig 5. Interfacing of IP cores with the network fabric
This range is expected to increase significantly in the near future.
As a result of this enormous degree of integration, several industrial and academic research groups are striving to develop efficient communication architectures, in some cases optimized for specific applications. There is a converging trend within the research community towards agreement that Networks-on-Chip constitute an enabling solution for this level of integration. F. NoC Reliability issues A. On-chip networks are critical to the scaling of future multicore processors. Recent multicore processors have adopted ring topologies because of their simplicity and high bandwidth. This work first describes a bufferless router microarchitecture for an on-chip ring topology, then proposes extending the bufferless router with an extra buffer entry to create a lightweight router microarchitecture. The proposed microarchitecture approaches ideal latency by reducing complexity: it minimizes the amount of buffering and simplifies switch allocation, and it does not need additional virtual channels to break routing deadlock. The scalability of the ring topology is examined as the network size increases. Although the ring topology has a larger hop count and a larger network diameter, its lower per-hop router latency and absence of serialization latency result in lower overall latency than a 2D mesh topology. However, the wide channels in a ring topology create bandwidth fragmentation, which results in poor bandwidth utilization for short packets compared to a 2D mesh topology and can limit the ring topology's scalability. B. Providing quality-of-service (QoS) for concurrent tasks in many-core architectures is becoming important, especially for real-time applications. QoS support for on-chip shared resources (such as shared caches, buses, and memory controllers) in chip-multiprocessors has been investigated in recent years. 
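The ring-versus-mesh hop-count trade-off noted above can be quantified with a small script (an illustration under a uniform-random-traffic assumption, not code from any cited work):

```python
# Average minimal hop count: a bidirectional ring of N nodes versus an
# n x n 2D mesh with N = n*n, averaged over all ordered pairs of
# distinct nodes (uniform random traffic, minimal routing).

def ring_avg_hops(n):
    """Average shortest-path hops between distinct nodes on a bidirectional ring."""
    total = sum(min(d, n - d) for d in range(1, n))
    return total / (n - 1)

def mesh_avg_hops(n):
    """Average Manhattan distance between distinct nodes of an n x n mesh."""
    nodes = [(x, y) for x in range(n) for y in range(n)]
    total = sum(abs(ax - bx) + abs(ay - by)
                for ax, ay in nodes for bx, by in nodes)
    return total / (len(nodes) * (len(nodes) - 1))

# For 16 nodes: the ring averages ~4.27 hops, the 4x4 mesh ~2.67 hops,
# so the ring's latency advantage must come from cheaper per-hop routers.
```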
Unlike other shared resources, the network-on-chip (NoC) does not typically have central arbitration of accesses; instead, each router shares the responsibility of resource allocation. While this distributed nature benefits the scalable performance of the NoC, it also dramatically complicates the problem of providing QoS support for individual flows. Existing approaches to this problem suffer from various shortcomings such as low network utilization and weak QoS guarantees. This work proposes the LOFT NoC architecture, which features both high network utilization and strong QoS guarantees. LOFT is based on the combination of two mechanisms: a) locally-synchronized frames (LSF), a distributed frame-based scheduling mechanism that provides flexible QoS guarantees to different flows, and b) flit reservation (FRS), a flow-control mechanism integrated in LSF that improves network utilization. The experimental results show that LOFT delivers flexible and reliable QoS guarantees while sufficiently utilizing the available network capacity to achieve high overall throughput. C. Intel's 80-core Terascale Processor was the first generally programmable microprocessor to break the teraflops barrier. The primary goal for the chip was to study power management and on-die communication technologies. When announced in 2007, it received a great deal of attention for running a stencil kernel at 1.0 single-precision TFLOPS while using only 97 watts. The literature about the chip, however, focused on the hardware, saying little about the software environment or the kernels used to evaluate the chip. D. A strategy to handle multiple defects in the NoC links with almost no impact on the communication delay is presented. The fault-tolerant method can guarantee the functionality of the NoC with multiple defects in any link, and with multiple faulty links. 
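The frame-based scheduling idea behind LSF can be sketched generically as follows; this is a simplified illustration of frame-based bandwidth reservation, not the published LSF algorithm, and the frame length, interface, and flow names are assumptions:

```python
# Generic frame-based QoS scheduling sketch: time is divided into frames of
# fixed length, and each flow is granted a per-frame flit budget
# proportional to its reserved bandwidth share.

FRAME_LEN = 8  # flit slots per frame (assumed)

def schedule_frame(flows):
    """flows: {name: (reserved_share, queued_flits)}, shares summing to <= 1.
    Returns {name: flits_sent_this_frame}. Unused slots are left idle here;
    a real scheduler would redistribute them to best-effort traffic."""
    sent = {}
    for name, (share, queued) in flows.items():
        budget = int(share * FRAME_LEN)   # guaranteed slots for this flow
        sent[name] = min(budget, queued)  # cannot send more than is queued
    return sent
```

For example, a flow reserving half the bandwidth with a full queue receives 4 of the 8 slots in a frame, regardless of how aggressively other flows inject, which is the essence of a frame-based guarantee.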
The proposed technique uses information from the test phase to map the application and to configure fault-tolerant features along the NoC links. Results from an application remapped onto the NoC show that the communication delay is almost unaffected, with minimal impact and overhead compared to a fault-free system. It is also shown that the proposal has a variable impact on performance, whereas a traditional fault-tolerant solution such as Hamming code has a constant impact; moreover, the proposal can save between 15% and 100% of the energy compared to Hamming code. G. Future Work NNSE (Nostrum Network Simulation Environment) has been demonstrated in the University Booth EDA (Electronic Design Automation) Tool Program. After this publicity, it has been requested for research use by a number of NoC research groups in Europe, the U.S.A., and Asia. In the future, we plan to improve it in the following directions: • Parameterize more layers: Current tunable parameters include topology, routing, and switching schemes, and each of these may be extended with more options. These are all network-layer parameters; in NNSE, the layered structure allows us to consider other layers' parameters orthogonally. In the physical layer, we can build wire, noise, and signaling models to examine reliability and robustness issues. We may consider link-layer parameters such as link capacity, link-level flow control schemes, etc. The upper layers, such as the transport layer, allow us to investigate buffer dimensioning and buffer sharing schemes, as well as end-to-end flow control methods. • Configure dependent traffic: We have so far configured independent traffic, both synthetic and semi-synthetic, meaning that traffic from different channels is independent of each other. This is easy to control and generate, but realistic traffic exhibits dependency and correlation. The way to generate traffic with various dependencies (data, control, time, causality, etc.) is worth investigating. 
For example, traffic with the requirement of lip synchronization shows correlated delivery requirements on video and audio traffic streams. • Support Quality-of-Service (QoS): This requires the implementation of QoS in the communication platform, and accordingly QoS generators and sinks. A monitoring service may be necessary to collect statistics on whether the performance constraints of a traffic stream have been satisfied. • Integrate application mapping: A tool that only explores communication performance is not sufficient; system performance results from the interaction of both communication and computation. Therefore, supporting application mapping onto NoC platforms is clearly desirable. To this end, we need to build and/or integrate resource models for cores, memories, and I/O modules. • Incorporate power estimation: As power is as important as performance for a quality SoC/NoC product, NNSE should incorporate the estimation of power consumption so that performance/power trade-offs can be better investigated and understood. Extending the traffic generation for performance evaluation leads naturally to benchmarking different on-chip networks. The diverse NoC proposals necessitate standard sets of NoC benchmarks and associated evaluation methods to fairly compare them.