Survey Exploration of Network-on-Chip Architecture

N. Ashokkumar1, P. Nagarajan2, S. Ravanaraja3
1,3Department of ECE, R.V.S College of Engg & Tech, Dindigul.
2Department of ECE, PSNA College of Engg & Tech, Dindigul
E-mail: kohsa.ayk@rediffmail.com
Abstract
Network-on-chip is a very active research field with many practical applications in industry. Based on this study, the following topics were identified as especially crucial for the continued development and success of the NoC paradigm: procedures and test cases for benchmarking, traffic characterization and modeling, design automation, latency and power minimization, fault tolerance, QoS policies, prototyping, and network interface design. Network-on-chip (NoC) architectures are emerging as a highly scalable, reliable, and modular on-chip communication infrastructure paradigm. The NoC architecture uses layered protocols and packet-switched networks consisting of on-chip routers, links, and network interfaces on a predefined topology. The major goal of communication-centric design and the NoC paradigm is to achieve greater design productivity and performance by managing increasing parallelism, manufacturing complexity, wiring problems, and reliability concerns.
Key research areas:
(i) Communication infrastructure: topology, link optimization, buffer sizing, floorplanning, clock domains, and power.
(ii) Communication paradigm: routing, switching, flow control, quality-of-service, network interfaces.
(iii) Benchmarking and traffic characterization for design- and run-time optimization.
(iv) Application mapping: task mapping/scheduling and IP component mapping.
I. Introduction: Network-on-Chip
A Network-on-Chip (NoC) consists of routers, links, and network interfaces. Routers direct data over several links (hops). The topology defines their logical layout (connections), whereas the floorplan defines the physical layout. The function of a network interface (adapter) is to decouple computation (the resources) from communication (the network).
Definition: A network-on-chip is a communication network targeted for on-chip use.
The basic properties of the NoC paradigm are:
• Separates communication from computation.
• Avoids a global, centralized controller for communication.
• Allows an arbitrary number of terminals.
• The topology allows the addition of links as the system size grows (offers scalability).
• Customization (link width, buffer sizes, even topology).
• Allows multiple voltage and frequency domains.
• Delivers data in order, either naturally or via layered protocols.
• Offers varying guarantees for transfers.
• Supports system testing.
Fig 1. Network-on-Chip
A. NoC Topologies: Topology is a very important feature in the design of a NoC because the design of a router depends upon it. Different topologies have been proposed in the literature for the design of NoCs. Commonly used topologies are mesh, ring, torus, binary tree, bus, and spidergon. Some researchers have also proposed topologies suitable for a particular application or application area. The topology is statically known and usually very regular (e.g., a mesh).
(i) Mesh: A mesh-shaped network consists of m columns and n rows. The routers are situated at the intersections of two wires, and the computational resources are placed near the routers. Addresses of routers and resources can easily be defined as x-y coordinates in a mesh. A regular mesh network is also called a Manhattan Street network.
(ii) Torus: A torus network is an improved version of the basic mesh network. A simple torus network is a mesh in which the heads of the columns are connected to the tails of the columns, and the left sides of the rows are connected to the right sides of the rows. A torus network has better path diversity than a mesh network, and it also has more minimal routes.
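The difference between the two topologies can be sketched as a neighbor computation (a minimal illustration; the coordinate scheme, function names, and grid size are assumptions, not from the paper):

```python
# Neighbors of router (x, y) in an m-column, n-row mesh vs. torus.
def mesh_neighbors(x, y, m, n):
    """4-neighborhood, clipped at the mesh edges."""
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [(i, j) for (i, j) in candidates if 0 <= i < m and 0 <= j < n]

def torus_neighbors(x, y, m, n):
    """Same 4-neighborhood, but columns and rows wrap around."""
    return [((x - 1) % m, y), ((x + 1) % m, y),
            (x, (y - 1) % n), (x, (y + 1) % n)]

# A corner router has only 2 mesh neighbors but always 4 torus neighbors,
# which is where the extra path diversity comes from.
print(mesh_neighbors(0, 0, 4, 4))   # [(1, 0), (0, 1)]
print(torus_neighbors(0, 0, 4, 4))  # [(3, 0), (1, 0), (0, 3), (0, 1)]
```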
(iii) Tree: In a tree topology, the nodes are routers and the leaves are computational resources. The routers above a leaf are called the leaf's ancestors and, correspondingly, the leaves below an ancestor are its children. In a fat-tree topology, each node has replicated ancestors, which means that there are many alternative routes between nodes.
(iv) Butterfly: A butterfly network is unidirectional or bidirectional, and a butterfly-shaped network typically uses deterministic routing. For example, a simple unidirectional butterfly network contains 8 input ports, 8 output ports, and 3 router levels, each of which contains 4 routers. Packets arriving at the inputs on the left side of the network are routed to the correct output on the right side of the network. In a bidirectional butterfly network, all the inputs and outputs are on the same side of the network. Packets coming to the inputs are first routed to the other side of the network, then turned around and routed back to the correct output.
(v) Polygon: The simplest polygon network is a circular network where packets travel in a loop from one router to another.
Fig 2. NoC Topologies (mesh, torus, tree, butterfly, polygon; the butterfly shown has 4 inputs, 4 outputs, and 2 router stages, each containing 2 routers)
A network becomes more diverse when chords are added. When there are chords only between opposite routers, the topology is called a spidergon. A polygon (hexagon) network may contain all potential chords.
(vi) Star: A star network consists of a central router in the middle of the star and computational resources or subnetworks at the spokes of the star. The capacity requirements of the central router are quite large, because all the traffic between the spokes goes through the central router. This creates a considerable possibility of congestion in the middle of the star.
(vii) Double-chain topology: The double-chain NoC topology is a new class of interconnection network topology. Double-chain topologies are comprised of two disjoint but overlapping chains, each of which connects all network nodes. These topologies are well suited to both 2-D planar VLSI technology and the ABC router microarchitecture. Double-chain topologies provide an advantage over 2-D mesh networks for ABC routers by providing two paths comprised solely of "straight-path" links between all source-destination pairs. Double-chain topologies also offer a higher amount of path diversity compared to a standard 2-D mesh. In contrast to a 2-D mesh, where all source and destination pairs have only two deadlock-free paths between them, double-chain topologies offer four such paths.
B. NoC Router: A router is a device that forwards data packets between computer networks, creating an overlay internetwork. A router is connected to two or more data lines from different networks. When a data packet comes in on one of the lines, the router reads the address information in the packet to determine its ultimate destination. Then, using information in its routing table or routing policy, it directs the packet to the next network on its journey. Routers perform the "traffic directing" functions on the Internet. A data packet is typically forwarded from one router to another through the networks that constitute the internetwork until it reaches its destination node.
a. Routing: Arbitration and routing logic are designed for minimal complexity and low latency, because router stages typically must take no more than a few cycles.
Fig 3. NoC Routing
Classification of Routing in NoC:
(i) Deterministic vs. Adaptive Routing: There are many ways to classify routing in NoCs. One way is to classify routing as deterministic or adaptive. In deterministic routing, the path from the source to the destination is completely determined in advance by the source and destination addresses. In adaptive routing, multiple paths from the source to the destination are possible. When a packet enters a router, the destination address is read from the header and, accordingly, the routing function computes all possible output ports to which the packet can be forwarded. A selection function then selects one of the admissible output ports and forwards the packet. The selection of the output port depends upon dynamic network conditions such as congestion and link faults. There also exist partially adaptive routing algorithms, which restrict certain paths for communication. They are simpler and easier to implement compared to fully adaptive routing algorithms.
(ii) Minimal and Non-Minimal Routing:
Routing which uses the shortest possible paths for communication is known as minimal routing. It is also possible to use longer paths for data transfer from source to destination; this possibility results from the adaptivity offered by a routing algorithm. Routing which uses longer paths for communication although shortest paths exist is known as non-minimal routing. Non-minimal routing has some advantages over minimal routing, including the possibility of balancing network load and fault tolerance.
(iii)Static and Dynamic Routing:
In static routing, the path cannot be changed
after a packet leaves the source. In dynamic
routing, a path can be altered any time
depending upon the network conditions. Source
routing is static while distributed routing can be
static or dynamic depending upon the routing
algorithm used. It should be noted that even
when adaptive routing algorithms are used to
compute paths for source routing, it remains
static unless some sophisticated selection
technique is introduced in the network.
(iv) Application-Specific Routing:
This type of routing is used for specialized applications or a set of concurrent applications. For a specific application of a NoC-based SoC in embedded systems, we can obtain a good profile of the communications among different cores. This means that it is possible to know which cores communicate with each other and which cores do not communicate at all. In order to get the best performance of a NoC for a specific application, we can use a specialized application-specific routing algorithm. APSRA is one such algorithm.
(v) Minimal Adaptive Routing:
A minimal adaptive routing algorithm always routes packets along a shortest path. The algorithm is effective when more than one minimal (or as short as possible) route between sender and receiver exists. The algorithm uses the route which is least congested.
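This selection step can be sketched as follows (a hypothetical illustration: the port labels and the per-port occupancy model of congestion are assumptions, not a mechanism from the paper):

```python
def minimal_adaptive_select(cur, dst, congestion):
    """Among output ports lying on a shortest x-y path, pick the least congested.

    cur, dst: (x, y) router coordinates; congestion: dict port -> queue depth.
    Port labels 'E', 'W', 'N', 'S' are illustrative.
    """
    admissible = []
    if dst[0] > cur[0]: admissible.append('E')
    if dst[0] < cur[0]: admissible.append('W')
    if dst[1] > cur[1]: admissible.append('N')
    if dst[1] < cur[1]: admissible.append('S')
    if not admissible:
        return None  # already at the destination router
    return min(admissible, key=lambda p: congestion.get(p, 0))

# Two minimal ports (E and N) exist; the less occupied one is chosen.
print(minimal_adaptive_select((0, 0), (2, 3), {'E': 3, 'N': 1}))  # 'N'
```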
(vi) Fully Adaptive Routing:
A fully adaptive routing algorithm always uses a route which is not congested. The algorithm does not care whether the route is the shortest path between sender and receiver. Typically, an adaptive routing algorithm orders the alternative congestion-free routes by preference; the shortest route is the best one.
(vii) Congestion Look-Ahead: A congestion look-ahead algorithm obtains information about blockages from other routers. On the basis of this information, the routing algorithm can direct packets to bypass the congested areas.
(viii) Turnaround Routing:
Turnaround routing is a routing algorithm for butterfly and fat-tree networks. The senders and receivers of packets are all on the same side of the network. Packets are first routed from the sender to some random intermediate node on the other side of the network. At this node the packets are turned around and then routed to the destination on the same side of the network where the routing started. The routing from the intermediate node to the final receiver is done with destination-tag routing. Routers in turnaround routing are bidirectional, which means that packets can flow through a router in both the forward and backward directions. The algorithm is deadlock-free because packets only turn around once, from a forward channel to a backward channel. SPIN (Scalable Programmable Interconnect Network) is a fat-tree-shaped network which uses the turnaround routing algorithm. In the fault-tolerant XGFT (eXtended Generalized Fat Tree) system, turnaround routing is called turn-back routing. The network topology in XGFT systems is also a fat tree. XGFT's turn-back routing differs slightly from the basic turnaround algorithm: while traditional turnaround routing chooses the intermediate node randomly, XGFT's turn-back algorithm can choose it deliberately. This is useful when the network is congested.
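The up-then-down structure of turnaround routing can be sketched on a simple tree (a simplification: leaves are labeled with bit strings, and the turnaround point is taken deterministically at the nearest common ancestor, whereas true turnaround routing picks a random intermediate node):

```python
# Sketch of tree routing: climb until an ancestor common to source and
# destination is reached (the turnaround point), then follow the
# destination's address bits downward (destination-tag routing).
# The bit-string labeling is an assumption, not the exact SPIN/XGFT scheme.
def turnaround_path(src, dst):
    # Up phase: climb k levels until the remaining prefixes coincide.
    k = 0
    while src[:len(src) - k] != dst[:len(dst) - k]:
        k += 1
    up = [src[:len(src) - i] for i in range(1, k + 1)]    # ancestors visited
    # Down phase: descend one destination bit at a time.
    down = [dst[:len(dst) - k + i] for i in range(1, k + 1)]
    return up + down

# From leaf '010' to leaf '011' the packet turns around at ancestor '01';
# from '010' to '110' it must climb all the way to the root ('').
print(turnaround_path('010', '011'))  # ['01', '011']
print(turnaround_path('010', '110'))  # ['01', '0', '', '1', '11', '110']
```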
(ix) Turn-Back-When-Possible:
Turn-back-when-possible (TBWP) is an algorithm for routing on tree networks. It is a slightly improved version of turnaround routing. When the turn-back channels are busy, the algorithm looks for a free routing path at a higher switch level. A turn-back channel is a channel between a forward and a backward channel; it is used to change the routing direction in the network.
(x) Odd-Even Routing: Odd-even routing is an adaptive algorithm used in the dynamically adaptive and deterministic (DyAD) network-on-chip system. Odd-even routing is a deadlock-free turn model which prohibits turns from east to north and from east to south at tiles located in even columns, and turns from north to west and from south to west at tiles located in odd columns. The DyAD system uses minimal odd-even routing, which reduces energy consumption and also removes the possibility of livelock.
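The turn prohibitions described above can be sketched as a predicate (the column index and direction labels are illustrative assumptions; a direction denotes the packet's travel direction before and after the turn):

```python
# Odd-even turn rules as stated in the text: at tiles in even columns,
# east-to-north and east-to-south turns are forbidden; at tiles in odd
# columns, north-to-west and south-to-west turns are forbidden.
def turn_allowed(x, incoming, outgoing):
    """x: tile column index; incoming/outgoing: travel directions 'E','W','N','S'."""
    if x % 2 == 0 and incoming == 'E' and outgoing in ('N', 'S'):
        return False  # EN/ES turns banned in even columns
    if x % 2 == 1 and incoming in ('N', 'S') and outgoing == 'W':
        return False  # NW/SW turns banned in odd columns
    return True

print(turn_allowed(2, 'E', 'N'))  # False: east-to-north in an even column
print(turn_allowed(3, 'E', 'N'))  # True: allowed in an odd column
print(turn_allowed(3, 'N', 'W'))  # False: north-to-west in an odd column
```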
(xi) XY Routing Algorithm:
This is one of the simplest and most commonly used routing algorithms in NoCs. It is a static, deterministic, and deadlock-free routing algorithm. Out of the eight possible turns in a mesh topology, the XY routing algorithm allows half the turns and restricts the other half. According to this algorithm, a packet must always be routed along the horizontal (X) axis of the mesh until it reaches the same column as the destination. Then it is routed along the vertical (Y) axis towards the location of the destination resource.
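A minimal sketch of the XY next-port decision (the coordinate scheme and port names are illustrative, not from a particular implementation):

```python
# XY routing: correct the X coordinate first, then the Y coordinate.
def xy_route(cur, dst):
    """Return the output port at router `cur` for a packet headed to `dst`."""
    cx, cy = cur
    dx, dy = dst
    if dx > cx: return 'E'
    if dx < cx: return 'W'
    if dy > cy: return 'N'
    if dy < cy: return 'S'
    return 'LOCAL'  # arrived: eject to the attached resource

# Hop-by-hop decisions from (0, 0) towards (2, 1): two X hops, then a Y hop.
print([xy_route(p, (2, 1)) for p in [(0, 0), (1, 0), (2, 0)]])  # ['E', 'E', 'N']
```

Because every packet finishes all X movement before any Y movement, turns such as north-to-east can never occur, which is exactly why the algorithm is deadlock-free.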
C. Switching Mechanism:
The basic switching mechanisms of a NoC are as follows.
a. Circuit switching: In a circuit-switched (CS) network, a real or virtual circuit establishes a direct connection between source and destination. The network has a statically scheduled data path and no inherent control. Each output port is 64 bits wide, since no control data is necessary. To provide more flexibility, each 64-bit output port is split into four 16-bit-wide lanes. Given the 5-port design, 20 input and 20 output lanes therefore exist. A 16 x 20 crossbar provides full connectivity between every input and output lane, except that no U-turns are allowed. The crossbar allocation is a configurable memory of 20 entries (one for each output lane), with 5 bits per entry (4 address bits to identify an input lane and 1 valid bit). The splitting of a 64-bit flit into 16-bit units for transport over the network also means that a serializing and deserializing unit is necessary at the tile interface of the router. The completely static nature of the CS network means that a separate control network is necessary. All experiments therefore considered the circuit-switched and packet-switched routers together, to account for the necessary overhead of the packet-switched network.
b. Wormhole flow control: The wormhole (WH) router's switching mechanism performs dynamic allocation, but not at the cost of highly complex allocation methods. The WH router uses a conventional input-queued architecture with 4-flit-deep buffers at each input. A two-stage pipeline is provided. The use of look-ahead routing allows switch allocation to occur in the first stage, with crossbar and link traversal in the second. Control information is appended to each flit rather than being carried in an additional header flit. The 64-bit data path therefore combines with a one-hot-encoded 5-bit next-port identifier for look-ahead routing, two bits each for destination and address, and one bit to identify tail flits, resulting in a total flit size of 74 bits. A pipeline register is provided between the input first-in-first-out buffers (FIFOs) and the crossbar. For the crossbar traversal stage, the flit at the head of the FIFO is loaded into this register, which drives it across the rest of the data path. Stop-go flow control is also used for buffer management, where a buffer-nearly-full signal is output by each input FIFO to the corresponding upstream router to indicate that flit transmission should be stopped.
c. Virtual channel flow control: The QoS-providing virtual channel (VC) router architecture allows a comparison to a design using increasing amounts of control. The router is designed to offer QoS for streaming applications, while also using source routing and semi-dynamic allocation of resources. The GuarVC router implements wormhole routing with virtual channel flow control. A conventional input-queued architecture with 4 VCs per port and 4-flit-deep buffers for each VC is used. Each flit identifies its VC by using a 2-bit VC identifier. The use of separate head, body, and tail flits means that the flit type is encoded by an additional 2 bits. Combining these with the 64-bit data path results in a total flit size of 68 bits. Source routing is used to determine the packet's entire route at the originating node, which is then carried by one or more header flits.
Fig 4. NoC switching mechanism
Per hop of the route, 6 bits are required: 2 bits for the next port, 2 bits for the VC, and a 2-bit identifier for VC allocation. For a 64-bit data path, routing information for 10 hops is merged into a single header flit. The input VC queues do not share a single crossbar port per input port; hence the crossbar is asymmetric and has 20 inputs, i.e., one input for every input VC queue. This creates a single point of arbitration that is used to enable QoS. To provide for guaranteed-throughput traffic, a central controller allocates network VCs to at most a single QoS-requiring data stream. The round-robin arbiters used at each output port then give a predictable arbitration result, where each data stream is guaranteed a certain proportion of the network throughput, i.e., throughput-based QoS demands can be met. Best-effort flows are dealt with by assigning the same VC to multiple data streams. Conflict-free VC allocation is guaranteed.
d. Speculative virtual channel flow control: The speculative, single-cycle virtual channel router design contains a large amount of allocation logic, which attempts to provide good resource sharing while minimizing latencies. The SpecVC router provides single-cycle flit forwarding by utilizing look-ahead routing and speculative VC and crossbar allocation. A conventional input-queued architecture with 4 VCs per port and 4-flit-deep cyclic buffers for each VC is used. Each flit identifies its VC by using a one-hot-encoded 4-bit VC identifier. A 5-bit next-port identifier, 4 bits each for destination and address, and one bit to identify tail flits combine with the 64-bit data path to result in a total flit size of 82 bits. Both the VC and switch allocators (based on matrix arbiters) can allocate VCs and crossbar ports speculatively for the next clock cycle if necessary. Since both crossbar and link traversal are performed in a single clock cycle, in the best case an incoming flit finds preallocated resources and can thus be forwarded to the next hop in a single clock cycle. A stop-go flow control method is utilized to prevent buffer overflow. Traffic classes are identified in the SpecVC router by the 2-bit identifier in the header flit, but the router does not guarantee a particular bandwidth or latency.
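The flit sizes quoted for the three packet-switched routers follow directly from summing the field widths listed above; a quick arithmetic check (the field groupings are shorthand for the descriptions in the text):

```python
# Flit overhead per router, from the field widths given in the text.
DATA = 64  # common 64-bit data path

wh_flit   = DATA + 5 + 2 + 2 + 1      # WH: next-port + dest/addr bits + tail bit
vc_flit   = DATA + 2 + 2              # GuarVC: VC id + flit-type bits
spec_flit = DATA + 4 + 5 + 4 + 4 + 1  # SpecVC: one-hot VC + next-port + dest/addr + tail

print(wh_flit, vc_flit, spec_flit)  # 74 68 82

# GuarVC source routing: 6 bits per hop, so 10 hops fit in one 64-bit header flit.
assert 10 * 6 <= DATA
```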
D. NoC performance analysis and NoC communication refinement:
a. Research on wormhole-switched networks has traditionally emphasized the flit-delivery phase while simplifying flit admission and ejection. We have initiated an investigation of these issues. It turns out that different flit-admission and flit-ejection models have quite different impacts on cost, performance, and power. For the classical wormhole switch architecture, we propose the coupled flit-admission and p-sink flit-ejection models. These optimizations are simple but effective. The coupled admission significantly reduces the crossbar complexity. Since the crossbar consumes a large portion of the power in the switch, this adjustment is beneficial in both cost and power. The network performance, however, is not sensitive to the adjustment before the network reaches the saturation point. The p-sink model has a direct impact on decreasing buffering cost, and has negligible impact on performance before network saturation. As support for one-to-many communication is necessary, we design a multicasting protocol and implement it in a wormhole-switched network. This multicast service is connection-oriented and QoS-aware. For the TDM virtual-circuit configuration, we utilize the generalized logical-network concept and develop theorems to guide the construction of contention-free virtual circuits. Moreover, we employ a back-tracking algorithm to explore the path diversity and systematically search for feasible configurations.
b. On-chip networks expose a much larger design space to explore compared with buses. The existence of many design considerations at different layers makes design decisions difficult. As a consequence, it is desirable to explore these alternatives and to evaluate the resulting networks extensively. We have proposed traffic representation methods to configure various workload patterns. Together with the choices of the traffic configuration parameters, the exploration of the network design space can be conducted in our network simulation environment. We have suggested a contention-tree model which can be used to approximate network contention. Using this model and its associated scheduling method, we developed a feasibility analysis test in which the satisfaction of timing constraints for real-time messages can be evaluated through an estimation program. As communication takes the central role in a design flow, how to refine an abstract communication model onto an on-chip network-based communication platform is an open problem. Starting from a synchronous specification, we have formulated the problem and proposed a refinement approach. This refinement is oriented towards correctness, performance, and resource usage. Correctness-by-construction is achieved by maintaining synchronization consistency. We have also integrated the refinement of communication protocols into our approach, thus satisfying performance requirements. By composing and merging the communication tasks of processes to share the underlying implementation channels, the network utilization can be improved.
E. Networks on Chip: Challenges and Solutions
a. The network-on-chip (NoC) design paradigm is viewed as an enabling solution for the integration of an exceedingly high number of computational and storage blocks in a single chip. The practical implementation and adoption of the NoC design paradigm faces various unsolved issues related to design methodologies, test strategies, and dedicated CAD tools. As a result of the increasing degree of integration, several research groups are striving to develop efficient on-chip communication infrastructures. Today, there exist many SoC designs that contain multiple processors, in applications such as set-top boxes, wireless base stations, HDTV, mobile handsets, and image processing. New trends in the design of communication architectures in multi-core SoCs have appeared in the research literature recently. In particular, researchers suggest that multi-core SoCs can be built around different regular interconnect structures originating from parallel computing architectures. Custom-built application-specific interconnect architectures are another promising solution. Such communication-centric interconnect fabrics are characterized by different trade-offs with regard to latency, throughput, reliability, energy dissipation, and silicon area requirements. The nature of the application will dictate the selection of a specific template for the communication medium. A complex SoC can be viewed as a micro-network of multiple blocks, and hence models and techniques from networking and parallel processing can be borrowed and applied to an SoC design methodology. The micro-network must ensure quality-of-service requirements (such as reliability and guaranteed bandwidth/latency) and energy efficiency, under the limitation of intrinsically unreliable signal transmission media. Such limitations are due to the increased likelihood of timing and data errors, the variability of process parameters, crosstalk, and environmental factors such as electromagnetic interference (EMI) and soft errors.
b. Current simulation methods and tools can be ported to networked SoCs to validate functionality and performance at various abstraction levels, ranging from the electrical to the transaction level. NoC libraries, including switches/routers, links, and interfaces, will provide designers with flexible components to complement processor/storage cores. Nevertheless, the usefulness of such libraries to designers will depend heavily on the level of maturity of the corresponding synthesis/optimization tools and flows. In other words, micro-network synthesis will enable NoC/SoC design similarly to the way logic synthesis made efficient semicustom design possible in the eighties.
c. Though the design process of NoC-based systems borrows some of its aspects from the parallel computing domain, it is driven by a significantly different set of constraints. From the performance perspective, high throughput and low latency are desirable characteristics of MP-SoC platforms. However, from a VLSI design perspective, the energy dissipation profile of the interconnect architecture is of prime importance, as the latter can represent a significant portion of the overall energy budget. The silicon area overhead due to the interconnect fabric is important too. The common characteristic of these kinds of architectures is that the processor/storage cores communicate with each other through high-performance links and intelligent switches, and that the communication design can be represented at a high abstraction level.
d. The exchange of data among the processor/storage cores is becoming an increasingly difficult task with growing system size and non-scalable global wire delay. To cope with these issues, the end-to-end communication medium needs to be divided into multiple pipelined stages, with the delay of each stage comparable to the clock-cycle budget. In NoC architectures, the inter-switch wire segments together with the switch blocks constitute a highly pipelined communication medium characterized by link pipelining, deeply pipelined switches, and latency-insensitive component design. A new design methodology can only be widely adopted if it is complemented by efficient test mechanisms and methodologies. The development of test infrastructures and techniques supporting the network-on-chip design paradigm is a challenging problem. Specifically, the design of specialized Test Access Mechanisms (TAMs) for distributing test vectors and of novel Design for Testability (DFT) schemes are of major importance. Moreover, in a communication-centric design environment like that provided by NoCs, fault tolerance and reliability of the data transmission medium are two significant requirements in safety-critical applications.
e. The test strategy of NoC-based systems must address three problems: (i) testing of the functional/storage blocks and their corresponding network interfaces; (ii) testing of the interconnect infrastructure itself; and (iii) testing of the integrated system. For testing the functional/storage blocks and their corresponding network interfaces, a Test Access Mechanism (TAM) is needed to transport the test data. Such a TAM provides on-chip transport of test stimuli from a test pattern source to the core under test. It also transmits test responses from the core under test to a test pattern sink. The principal advantage of using NoCs as TAMs is the resulting "reuse" of the existing resources and the availability of several parallel paths to transmit test data to each core. Therefore, a reduction in system test time can be achieved through extensive use of test parallelization, i.e., more functional blocks can be tested in parallel as more test paths become available.
The controllability/observability of NoC interconnects is relatively reduced, due to the fact that they are deeply embedded and spread across the chip. Pin-count limitations restrict the use of I/O pins dedicated to the test of the different components of the data-transport medium; therefore, the NoC infrastructure should be progressively used for testing its own components in a recursive manner, i.e., the good, already-tested NoC components should be used to transport test patterns to the untested elements. This test strategy minimizes the use of additional mechanisms for transporting data to the NoC elements under test, while allowing a reduction of test time through the use of parallel test paths and test data multicast.
f. Testing the functional/storage blocks and the interconnect infrastructure separately is not sufficient to ensure adequate test quality. The interaction between the functional/storage cores and the communication fabric has to undergo extensive functional testing. This functional system testing should encompass testing of the I/O functions of each processing element and of the data routing functions.
Many SoCs are used within embedded systems, where reliability is an important figure of merit. At the same time, in deep-submicron technologies beyond the 65 nm node, failures of transistors and wires are more likely to happen due to a variety of effects, such as soft (cosmic) errors, crosstalk, process variations, electromigration, and material aging. In general, we can distinguish between transient and permanent failures. The design of reliable SoCs must encompass techniques that address both types of malfunction. From a reliability point of view, one of the advantages of packetized communication is the possibility of incorporating error control information into the transmitted data stream. Effective error detection and correction methods borrowed from the fault-tolerant computing and communications engineering domains can be applied to cope with uncertainty in on-chip data transmission. Such methods need to be evaluated and optimized in terms of area, delay, and power trade-offs. Permanent failures may be due to material aging (e.g., oxide), electromigration, and mechanical/thermal stress. Failures can incapacitate a processing/storage core and/or a communication link. Different fault-tolerant multiprocessor architectures and routing algorithms have been proposed in the parallel processing domain. Some of these can be adapted to the NoC domain, but their effectiveness needs to be evaluated in terms of defect/error coverage versus throughput, delay, energy dissipation, and silicon area overhead metrics.
G.Network interfacing: The success of the NoC
design paradigm relies greatly on the
standardization of the interfaces between IP
cores and the interconnection fabric. Using a
standard interface should not impact the
methodologies for IP core development. In fact,
IP cores wrapped with a standard interface will
exhibit a higher reusability and greatly simplify
the task of system integration. The Open Core
Protocol (OCP) is a plug and play interface
standard receiving a wide industrial and
academic acceptance. As shown in the figure
below, for a core having both master and slave
interfaces, the OCP compliant signals of the
functional IP blocks are packetized by a second
interface. The network interface has two
functions:
1. injecting/absorbing
the
flits
leaving/arriving at the functional/storage
blocks;
2. Packetizing /depacketizing the signals
coming
from/reaching
to
OCP
compatible
cores
in
form
of
messages/flits.
All OCP signals are unidirectional and
synchronous, simplifying core implementation,
integration and timing analysis. The OCP
defines a point-to-point interface between two
communicating entities, such as the IP core and
the communication medium. One entity acts as
the master of the OCP instance, and the other as
the slave. OCP unifies all inter-core
communications, including dataflow, sideband
control and test-specific signals. The state of the
art has reached the point where commercial
designs are readily integrating in the range of
10-100 embedded functional/storage blocks in a
single SoC.
Fig 5. Interfacing of IP cores with the network fabric.
This range is expected to increase significantly
in the near future. As a result of this enormous
degree of integration, several industrial and
academic research groups are striving to
develop efficient communication architectures,
in some cases specifically optimized for specific
applications. There is a converging trend within
the research community towards an agreement
that Networks on Chip constitute an enabling
solution for this level of integration.
F. NoC Reliability issues
A. On-chip networks are critical to the scaling of future multicore processors. Recent multicore processors have adopted ring topologies because of their simplicity and high bandwidth. This work first describes a bufferless router microarchitecture for an on-chip ring network topology, and then extends the bufferless router with an extra buffer entry to create a lightweight router microarchitecture. The proposed microarchitecture approaches ideal latency by reducing microarchitecture complexity, minimizing the amount of buffering, and simplifying switch allocation. The lightweight microarchitecture does not need additional virtual channels to break routing deadlock. The scalability of the ring topology is examined as the network size increases. Although the ring topology has a larger hop count and a larger network diameter, its lower per-hop router latency and lack of serialization latency result in lower overall latency than a 2D mesh topology. However, the wide channels of a ring topology create bandwidth fragmentation, which results in poor bandwidth utilization for short packets compared to a 2D mesh and can limit the scalability of the ring topology.
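The hop-count side of this trade-off can be checked with a quick back-of-envelope model. The code below computes average hop counts under uniform random traffic; it is our own illustration, not a cycle-accurate simulation from the cited work.

```python
# Back-of-envelope comparison of average hop count (a proxy for network
# latency) between a bidirectional ring and a 2D mesh under uniform
# random traffic. Illustrative model only, not a cycle-accurate simulator.

def ring_avg_hops(n):
    """Average shortest-path hops on a bidirectional ring of n nodes."""
    total = sum(min(d, n - d) for d in range(n))
    return total / (n - 1)            # average over all non-self destinations

def mesh_avg_hops(k):
    """Average hops on a k x k mesh with dimension-order (XY) routing."""
    n = k * k
    total = 0
    for sx in range(k):
        for sy in range(k):
            for dx in range(k):
                for dy in range(k):
                    total += abs(sx - dx) + abs(sy - dy)
    return total / (n * (n - 1))

# For 16 nodes the ring already needs more hops than a 4x4 mesh
# (about 4.27 vs. 2.67), illustrating why the ring's per-hop router
# latency must be much lower for it to win on end-to-end latency.
print(ring_avg_hops(16), mesh_avg_hops(4))
```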
B. Providing quality-of-service (QoS) for concurrent tasks in many-core architectures is becoming important, especially for real-time applications. QoS support for on-chip shared resources (such as shared caches, buses, and memory controllers) in chip multiprocessors has
been investigated in recent years. Unlike other
shared resources, network-on-chip (NoC) does
not typically have central arbitration of accesses
to the shared resource. Instead, each router
shares the responsibility of resource allocation.
While this distributed nature benefits the scalable performance of the NoC, it also dramatically complicates the problem of providing QoS support for individual flows. Existing approaches to this problem suffer from various shortcomings, such as low network utilization and weak QoS guarantees. In this work, we propose the LOFT NoC architecture, which features both high network utilization and strong QoS guarantees. LOFT is based on the combination of two mechanisms: a) locally-synchronized frames (LSF), which is a
distributed frame-based scheduling mechanism
that provides flexible QoS guarantees to
different flows and b) flit-reservation (FRS),
which is a flow-control mechanism integrated in
LSF that improves network utilization. The
experimental results show that LOFT delivers flexible and reliable QoS guarantees while efficiently utilizing the available network capacity to achieve high overall throughput.
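The frame-based scheduling idea behind such schemes can be illustrated with a toy model: within each frame of fixed length, every flow may inject at most its reserved flit budget, which bounds its share of link bandwidth. The class below is our own simplified single-router sketch; LOFT/LSF's actual distributed, locally-synchronized implementation is considerably more involved.

```python
# Toy sketch of frame-based QoS scheduling: per-flow flit budgets that
# reset every frame. Our illustration of the concept only.

FRAME_LEN = 100                       # cycles per frame (illustrative)

class FrameScheduler:
    def __init__(self, budgets):
        self.budgets = budgets        # flow id -> flits allowed per frame
        self.used = {f: 0 for f in budgets}
        self.frame_start = 0

    def try_inject(self, flow, cycle):
        """Return True if `flow` may inject a flit at `cycle`."""
        if cycle - self.frame_start >= FRAME_LEN:  # roll over to a new frame
            elapsed_frames = (cycle - self.frame_start) // FRAME_LEN
            self.frame_start += FRAME_LEN * elapsed_frames
            self.used = {f: 0 for f in self.budgets}
        if self.used[flow] < self.budgets[flow]:
            self.used[flow] += 1
            return True
        return False                   # budget exhausted: wait for next frame

sched = FrameScheduler({"A": 2, "B": 1})
grants = [sched.try_inject("A", c) for c in range(3)]  # A's third flit waits
```

The budget cap is what yields the bandwidth guarantee; the cost, as the text notes, is that unused budget can leave the link idle, which is the utilization gap flit reservation aims to close.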
C. Intel's 80-core Terascale Processor was the first generally programmable microprocessor to break the teraflops barrier. The primary goal of the chip was to study power management and on-die communication technologies. When announced in 2007, it received a great deal of attention for running a stencil kernel at 1.0 single-precision TFLOPS while using only 97 watts. The literature about the chip, however, focused on the hardware, saying little about the software environment or the kernels used to evaluate the chip.
D. A strategy to handle multiple defects in the NoC links with almost no impact on the communication delay is presented. The fault-tolerant method can guarantee the functionality of the NoC with multiple defects in any link, and with multiple faulty links. The proposed technique uses information from the test phase to map the application and to configure fault-tolerant features along the NoC links. Results from an application remapped onto the NoC show that the communication delay is almost unaffected, with minimal impact and overhead compared to a fault-free system. We also show that our proposal has a variable impact on performance, while a traditional fault-tolerant solution such as Hamming code has a constant impact. Moreover, our proposal can save between 15% and 100% of the energy compared to Hamming code.
G. Future Work
NNSE (Nostrum Network Simulation Environment) has been demonstrated in the University Booth EDA (Electronic Design Automation) Tool Program. Following this exposure, it has been requested for research use by a number of NoC research groups in Europe, the U.S.A., and Asia. In the future, we plan to improve it in the following directions:
• Parameterize more layers: Current tunable
parameters include topology, routing, and
switching schemes. Each of the parameters may
be extended with more options. These are all
network-layer parameters. In NNSE, the layered
structure allows us to orthogonally consider
other layers’ parameters. In the physical layer,
we can build wire, noise and signaling models to
examine the reliability and robustness issues.
We may also consider link-layer parameters such as link capacity and link-level flow-control schemes. Upper layers such as the transport layer allow us to investigate buffer dimensioning and buffer-sharing schemes, as well as end-to-end flow control methods.
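One way such layered parameters could be organized in a simulator configuration is sketched below. The layer split follows the description above, but the classes, fields, and default values are our own illustration, not actual NNSE code.

```python
# Sketch of a layered NoC simulator configuration, grouping the tunable
# parameters discussed above by protocol layer. Illustrative only.

from dataclasses import dataclass, field

@dataclass
class PhysicalLayer:
    wire_model: str = "lumped-RC"     # placeholder wire/noise model
    signaling: str = "full-swing"

@dataclass
class LinkLayer:
    link_capacity_bits: int = 32      # phit width
    flow_control: str = "credit"      # link-level flow-control scheme

@dataclass
class NetworkLayer:
    topology: str = "mesh"
    routing: str = "XY"
    switching: str = "wormhole"

@dataclass
class TransportLayer:
    buffer_depth_flits: int = 4       # buffer dimensioning
    end_to_end_flow_control: str = "none"

@dataclass
class NocConfig:
    physical: PhysicalLayer = field(default_factory=PhysicalLayer)
    link: LinkLayer = field(default_factory=LinkLayer)
    network: NetworkLayer = field(default_factory=NetworkLayer)
    transport: TransportLayer = field(default_factory=TransportLayer)

cfg = NocConfig()
cfg.network.topology = "torus"        # tune one layer without touching others
```

Keeping the layers orthogonal, as the text argues, means a sweep over routing schemes leaves the physical- and link-layer settings untouched.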
• Configure dependent traffic: We have so far
configured independent traffic, both synthetic
and semi-synthetic. This means that traffic from
different channels is independent from each
other. This is easy to control and generate, but
realistic traffic exhibits dependency and
correlation. The way to generate traffic with
various dependencies, such as data, control, time, and causality, is worth investigating. For example, traffic with a lip-synchronization requirement shows correlated delivery requirements on video and audio traffic streams.
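A minimal sketch of dependent traffic generation, assuming a video stream whose audio packets must trail each frame by a fixed offset (our illustrative stand-in for the synchronized video/audio example; the rates and jitter values are arbitrary):

```python
# Illustrative dependent traffic: audio injection times are slaved to
# video frame times, so the two streams are correlated rather than
# independent. Rates, jitter, and offset are arbitrary choices.

import random

def video_injections(n_frames, period=33, jitter=2, seed=0):
    """Video frames injected roughly every `period` cycles."""
    rng = random.Random(seed)
    return [i * period + rng.randint(-jitter, jitter) for i in range(n_frames)]

def audio_injections(video_times, offset=1):
    """Audio packets correlated with video: one shortly after each frame."""
    return [t + offset for t in video_times]

video = video_injections(5)
audio = audio_injections(video)
# Each audio packet trails its video frame by a fixed offset, unlike the
# independent per-channel generators described above.
```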
• Support Quality-of-Service (QoS): This requires the implementation of QoS in the communication platform, and accordingly QoS generators and sinks. A monitoring service may be
necessary to collect statistics on whether the
performance constraints of a traffic stream have
been satisfied or not.
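Such a monitoring service might, in its simplest form, record per-packet latencies and count deadline violations. The class below is a hypothetical sketch of that idea, not an NNSE component.

```python
# Hypothetical sketch of a QoS monitoring service: record per-packet
# latencies for a stream and report whether its constraint was met.

class QosMonitor:
    def __init__(self, deadline_cycles):
        self.deadline = deadline_cycles
        self.latencies = []

    def record(self, inject_cycle, eject_cycle):
        """Log one packet's observed network latency."""
        self.latencies.append(eject_cycle - inject_cycle)

    def violations(self):
        """Number of packets that missed the latency deadline."""
        return sum(1 for lat in self.latencies if lat > self.deadline)

    def satisfied(self):
        """True if every recorded packet met its deadline."""
        return self.violations() == 0

mon = QosMonitor(deadline_cycles=20)
for inj, ej in [(0, 12), (5, 30), (10, 25)]:
    mon.record(inj, ej)
# One violation: the packet injected at cycle 5 took 25 cycles.
```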
• Integrate application mapping: A tool that
only explores communication performance is not
sufficient. System performance is the result of
interactive involvement of both communication
and computation. Therefore, supporting
application-mapping onto NoC platforms is
surely desirable. To this end, we need to build and/or integrate resource models for cores, memories, and I/O modules.
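As a hypothetical illustration of the mapping step, the greedy heuristic below places heavily communicating task pairs on nearby mesh cores to reduce hop counts; the algorithm and names are our own sketch, not a method proposed by the cited tools.

```python
# Toy application-mapping sketch: assign tasks to k x k mesh cores,
# placing each task near its already-placed communication partner,
# heaviest-traffic pairs first. Illustrative heuristic only.

def greedy_map(comm, k):
    """comm: {(task_a, task_b): traffic volume}; returns {task: (x, y)}."""
    free = [(x, y) for x in range(k) for y in range(k)]
    placement = {}
    for (a, b), _vol in sorted(comm.items(), key=lambda kv: -kv[1]):
        for task, partner in ((a, b), (b, a)):
            if task in placement:
                continue
            if partner in placement:
                # Prefer the free core closest (Manhattan distance) to
                # this task's already-placed partner.
                px, py = placement[partner]
                free.sort(key=lambda c: abs(c[0] - px) + abs(c[1] - py))
            placement[task] = free.pop(0)
    return placement

# The heaviest pair A-B ends up on adjacent cores, and C lands next to B.
mapping = greedy_map({("A", "B"): 10, ("B", "C"): 5}, k=2)
```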
• Incorporate power estimation: As power is as important as performance for a quality SoC/NoC product, NNSE should incorporate the estimation of power consumption so that performance and power tradeoffs can be better investigated and understood. Extending further
the traffic generation for performance evaluation
ends up with benchmarking different on-chip
networks. The diverse NoC proposals
necessitate standard sets of NoC benchmarks
and associated evaluation methods to fairly
compare them.
REFERENCES
[1] Y. Jin, E. J. Kim, and T. M. Pinkston, "Communication-Aware Globally-Coordinated On-Chip Networks," IEEE Transactions on Parallel and Distributed Systems, vol. 23, no. 2, Feb. 2012.
[2] "Congestion Control for Scalability in Bufferless On-Chip Networks," SAFARI Technical Report No. 2011-003, Jul. 20, 2011.
[3] T. N. K. Jain, M. Ramakrishna, P. V. Gratz, A. Sprintson, and G. Choi, "Asynchronous Bypass Channels for Multi-Synchronous NoCs: A Router Microarchitecture, Topology, and Routing Algorithm," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 30, no. 11, Nov. 2011.
[4] F. A. Samman, T. Hollstein, and M. Glesner, "New Theory for Deadlock-Free Multicast Routing in Wormhole-Switched Virtual-Channelless Networks-on-Chip," IEEE Transactions on Parallel and Distributed Systems, vol. 22, no. 4, Apr. 2011.
[5] A. Banerjee, P. T. Wolkotte, R. D. Mullins, S. W. Moore, and G. J. M. Smit, "An Energy and Performance Exploration of Network-on-Chip Architectures," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 17, no. 3, Mar. 2009.
[6] A. Ganguly, P. P. Pande, B. Belzer, and C. Grecu, "Design of low-power and reliable networks on chip through joint crosstalk avoidance and multiple error correction coding," J. Electron. Test., vol. 24, pp. 67–81, Jun. 2008.
[7] F. Angiolini et al., "Survey of Network-on-Chip Proposals," white paper, OCP-IP, Mar. 2008.
[8] E. Salminen, A. Kulmala, and T. D. Hämäläinen, "Survey of Network-on-Chip Proposals," white paper, OCP-IP, Mar. 2008, pp. 1–13.
[9] A. P. Frantz, M. Cassel, F. L. Kastensmidt, E. Cota, and L. Carro, "Crosstalk- and SEU-Aware Networks on Chips," IEEE Design & Test of Computers, 2007, pp. 340–350.
[10] D. Wu, B. M. Al-Hashimi, and M. T. Schmitz, "Improving Routing Efficiency for Network-on-Chip through Contention-Aware Input Selection," in Proc. Asia and South Pacific Design Automation Conference (ASP-DAC '06), Yokohama, Japan, Jan. 24–27, 2006, pp. 36–41.
[11] D. Bertozzi et al., "NoC synthesis flow for customized domain-specific multiprocessor systems-on-chip," IEEE Transactions on Parallel and Distributed Systems, vol. 16, no. 2, pp. 113–129, Feb. 2005.
[12] G.-M. Chiu, "The Odd-Even Turn Model for Adaptive Routing," IEEE Transactions on Parallel and Distributed Systems, vol. 11, no. 7, pp. 729–738, 2000.
[13] L. Benini and G. De Micheli, "Networks on chips: A new SoC paradigm," IEEE Computer, vol. 35, no. 1, pp. 70–78, Jan. 2002.
[14] NNSE: Nostrum Network Simulation
Environment. http://www.imit.kth.se.