JOURNAL OF ELECTRONIC SCIENCE AND TECHNOLOGY OF CHINA, VOL. 7, NO. 2, JUNE 2009
Delay Optimized Architecture for On-Chip
Communication
Sheraz Anjum, Jie Chen, Pei-Pei Yue, and Jian Liu
Abstract⎯Networks-on-chip (NoC), a new system on chip (SoC) paradigm,
have become a major focus of research for many groups during the last
few years. Among all the NoC architectures that have been proposed until
now, 2D-Mesh has proved to be the best architecture for implementation
due to its regular and simple interconnection structure. In this paper,
we propose a new interconnect architecture called 2D-diagonal mesh
(2DDgl-Mesh) for on-chip communication. The 2DDgl-Mesh is very similar
to the traditional 2D-Mesh in terms of cost, area, and implementation,
but it can outperform the latter in delay. Both architectures are
compared by using NS-2 (a network simulator) and CINSIM (a component
based interconnection simulator) under the same traffic models and
parametric conditions. The comparison results show that under the
proposed architecture, packets can almost always be routed to their
destinations in less time. In addition, our architecture can sometimes
perform better than 2D-Mesh in drop ratio for special fixed traffic
models.
Index Terms⎯2D-Mesh, networks-on-chip, network
simulator 2, traffic models, system on chip.
Manuscript received February 27, 2008; revised April 20, 2008. This
work was supported by the National Natural Science Foundation of China
under Grant No. 60425413 and COMSATS Institute of Information
Technology, Pakistan.
S. Anjum is with COMSATS Institute of Information Technology,
Pakistan, and works as Assistant Professor at the Department of
Electrical Engineering, CIIT, Quaid Avenue, Wah Cantt Campus, Pakistan
(e-mail: sheraz1976@hotmail.com).
J. Chen, P.-P. Yue, and J. Liu are with the Institute of
Microelectronics, Chinese Academy of Sciences, Beijing, 100029, China
(e-mail: jchen@ime.ac.cn, yuepeipei@ime.ac.cn, and
liujian04@mails.gucas.ac.cn).
1. Introduction
According to the International Technology Roadmap for Semiconductors
(ITRS)[1], billions of transistors can be fabricated on a single chip by
using 45 nm process technology by the end of this decade. Current system
on chip (SoC) design methodologies are not scaling well with the
advancement of process technologies. The use of buses in today’s SoCs
for interconnecting heterogeneous resources is becoming a bottleneck due
to contention and congestion issues. Moreover, long wire delays and
limited reusability, modularity, and scalability have been added to the
problems of current bus-based SoCs. Consequently, more modular and
scalable design methodologies[2]-[4] have been proposed, known as
network on chip (NoC), a new SoC paradigm. The use of the globally
asynchronous locally synchronous (GALS) concept in NoCs has decoupled
the design of resources from the rest of the network, which could
enhance the scalability, modularity, and reusability of IP.
The design and selection of appropriate architectures for on-chip
communication play a key role in the design and implementation of the
complete platform for NoC. Different on-chip interconnect architectures
were proposed, evaluated, or analyzed in [5]-[8]. Among all the NoC
architectures presented until now, 2D-Mesh has proved to be the best
architecture in terms of implementation due to its regular and simple
interconnection structure. The 2D-Mesh architecture is also more
compatible with ultra deep submicron fabrication technologies.
In this paper, we propose a new interconnect architecture, named
2D-diagonal mesh (2DDgl-Mesh), which is similar to the traditional
2D-Mesh but can perform better than the latter in delay and sometimes in
drop ratio too.
2. Related Work
Many research teams have focused on the architectural aspects of NoCs.
Kumar et al. introduced a new methodology for designing a mesh
architecture for NoC[3]. Karim et al. proposed a novel communication
network architecture for 8-CPU distributed-memory systems that has the
potential to deliver the throughput required in next generation
routers[5]. Vahdatpour et al. proposed a new network on chip
architecture called hierarchical graph[6], where NS-2 (network
simulator 2) was applied for the simulation and analysis of their
proposed architecture. In [7], Pande et al. developed a consistent and
meaningful evaluation methodology to compare the performance and
characteristics of a variety of NoC architectures. Hossain et al.
introduced the extended butterfly FAT tree interconnection (EFTI) and
provided a routing algorithm for EFTI and its comparative analysis
through simulation results[8]. In [9], Sun et al. constructed a
proto-model using the public domain network simulator NS-2[10] and
evaluated design options for a specific NoC architecture which has a two
dimensional mesh of switches.
Fig. 1. 2DDgl-Mesh architecture.
Fig. 2. 2D-Mesh architecture.
Fig. 3. Worst case delay comparison.
Fig. 4. Discrete components of 2DDgl-Mesh: (a) and (b) are quarter
cross components and (c) to (f) are half cross components.
3. Architecture of 2DDgl-Mesh
Our proposed architecture, named 2DDgl-Mesh, and the traditional
2D-Mesh are shown in Fig. 1 and Fig. 2, respectively. It is evident from
the two figures that our architecture is derived by adding links along
the two diagonals of the traditional 2D-Mesh architecture. In Fig. 1 and
Fig. 2, ‘Rt’ represents a router and ‘r’ represents a heterogeneous
resource.
Let N = X^2 be the total number of resources required to be
interconnected on-chip. The architectures in Fig. 1 and Fig. 2 are shown
for the specific value X = 5. Both architectures are scalable and can
accommodate more resources N as X increases. Let L be the total number
of links between the routers and L_Dgl the total number of diagonal
links introduced in the proposed architecture; then L_Dgl = 2(X−1).
Therefore, the proposed architecture, 2DDgl-Mesh, is derived by adding
L_Dgl links to the traditional 2D-Mesh and approximately 2L_Dgl ports to
the routers that fall on the diagonal links. The added diagonal links
help reduce the average delay of packets routed between any
source/destination pair.
The worst case delay occurs between resources at opposite ends of the
chip. For a sender/receiver pair situated at opposite ends of the chip,
the number of hops using 2D-Mesh is 2(X−1), whereas that using
2DDgl-Mesh is X−1, as indicated in Fig. 3. Therefore, in terms of the
worst case delay measured in hops, 2DDgl-Mesh is two times faster than
the traditional 2D-Mesh architecture.
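The link count and the worst case hop counts stated above can be checked with a small script. The following is a minimal sketch, not the authors' simulation code: it assumes routers are numbered row by row (router r*X + c at row r, column c) and that the diagonal links run along the two main diagonals of the X-by-X mesh; the names `build_links` and `hops` are hypothetical.

```python
from collections import deque


def build_links(x, diagonal):
    """Return the set of undirected links of an x-by-x mesh.

    Routers are numbered row by row (an assumption of this sketch).
    With diagonal=True, links along both main diagonals are added.
    """
    links = set()
    for r in range(x):
        for c in range(x):
            node = r * x + c
            if c + 1 < x:
                links.add((node, node + 1))    # horizontal link
            if r + 1 < x:
                links.add((node, node + x))    # vertical link
    if diagonal:
        for i in range(x - 1):
            # main diagonal: (i, i) to (i + 1, i + 1)
            links.add((i * x + i, (i + 1) * x + i + 1))
            # anti-diagonal: (i, x - 1 - i) to (i + 1, x - 2 - i)
            links.add((i * x + (x - 1 - i), (i + 1) * x + (x - 2 - i)))
    return links


def hops(links, src, dst):
    """Minimum hop count between two routers (breadth-first search)."""
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    distance = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return distance[node]
        for nxt in adjacency[node]:
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    raise ValueError("destination unreachable")


X = 5
mesh = build_links(X, diagonal=False)
dgl = build_links(X, diagonal=True)
print(len(dgl) - len(mesh))        # added diagonal links, 2(X-1)
print(hops(mesh, 0, X * X - 1))    # corner to corner on 2D-Mesh, 2(X-1)
print(hops(dgl, 0, X * X - 1))     # corner to corner on 2DDgl-Mesh, X-1
```

For X = 5 the three printed values are 8, 8, and 4, matching L_Dgl = 2(X−1) and the hop counts 2(X−1) and X−1 for a corner-to-corner pair.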
3.1 Scalability
The proposed 2DDgl-Mesh is a scalable architecture and can therefore
accommodate as many resources as required for the implementation of
larger NoCs without performance degradation. The architecture shown in
Fig. 1 has only 25 nodes; the discrete components from which it is built
are shown in Fig. 4.
Fig. 4 (a) and (b) are the basic discrete components, known as quarter
cross components, and Fig. 4 (c) to (f) are known as half cross
components. Components (c) and (d) are derived by overlapping (b) to the
right side of (a) and (a) to the top side of (b), respectively.
Similarly, (e) and (f) are also derived from (a) and (b). In the same
way, the architecture of 2DDgl-Mesh is derived by overlapping (d) to the
left side of (e) or overlapping (c) to the top side of (f). The scaled
architecture of 2DDgl-Mesh for N = 81 is shown in Fig. 5. The
architecture of Fig. 5 is derived by overlapping the mirror sides of
four 2DDgl-Meshes. In this way the architecture of 2DDgl-Mesh can scale
to any number of nodes/routers. Any number of quarter cross and half
cross components can also be used in the same way to add any desired
number of nodes.
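The scaling step for N = 81 can be illustrated numerically. The sketch below is an interpretation, not a reproduction of Fig. 5: it assumes that "overlapping the mirror sides" means adjacent 5-by-5 2DDgl-Meshes share one boundary row or column of routers, and that routers are numbered row by row. The function name `tiled_diagonal_links` is hypothetical.

```python
def tiled_diagonal_links(tile_side=5, tiles_per_side=2):
    """Diagonal links of a mesh tiled from overlapping 2DDgl-Mesh blocks.

    Adjacent tiles are assumed to share one row or column of routers, so
    the whole mesh has side tiles_per_side * (tile_side - 1) + 1.
    """
    step = tile_side - 1
    side = tiles_per_side * step + 1
    links = set()
    for tr in range(tiles_per_side):
        for tc in range(tiles_per_side):
            r0, c0 = tr * step, tc * step    # top-left router of this tile
            for i in range(step):
                # main diagonal of the tile
                a = (r0 + i) * side + (c0 + i)
                b = (r0 + i + 1) * side + (c0 + i + 1)
                links.add((a, b))
                # anti-diagonal of the tile
                a = (r0 + i) * side + (c0 + step - i)
                b = (r0 + i + 1) * side + (c0 + step - i - 1)
                links.add((a, b))
    return side, links


side, links = tiled_diagonal_links()
print(side * side, len(links))
```

Under these assumptions, four tiles of side 5 give a 9-by-9 mesh (81 routers, as in Fig. 5) with 4 × 8 = 32 diagonal links.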
3.2 Routing
The routing algorithm determines the next hop for a
packet in an intermediate router to reach the destination node
or IP. The selection of an appropriate routing algorithm for
the NoC architecture plays an important role in the overall
performance of the NoC system. As our proposed
architecture contains many diagonal links in addition to
horizontal and vertical ones, the routing algorithms should be
able to utilize these diagonal links in an effective manner.
After careful analysis we selected Dijkstra’s shortest path first (SPF)
routing algorithm for our NoC architecture. SPF finds a shortest path
between each source/destination pair, so the added diagonal links can be
used effectively for routing packets. Routers decode the routing
information contained in the packet headers to route the packets to
their destinations.
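To make the effect of SPF on the diagonal links concrete, here is a minimal sketch of Dijkstra's algorithm on a 2DDgl-Mesh. It is illustrative only: it assumes row-major router numbering, diagonal links along the two main diagonals, and a cost of one per link; the helper names `neighbors` and `shortest_path` are hypothetical and do not come from NS-2.

```python
import heapq


def neighbors(node, x):
    """Neighbours of a router in an x-by-x 2DDgl-Mesh (row-major numbering)."""
    r, c = divmod(node, x)
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]    # mesh links
    if r == c:
        steps += [(-1, -1), (1, 1)]               # main diagonal links
    if r + c == x - 1:
        steps += [(-1, 1), (1, -1)]               # anti-diagonal links
    return [(r + dr) * x + (c + dc) for dr, dc in steps
            if 0 <= r + dr < x and 0 <= c + dc < x]


def shortest_path(src, dst, x):
    """Dijkstra's algorithm with unit link cost; returns the router sequence."""
    best = {src: 0}
    previous = {}
    heap = [(0, src)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == dst:
            break
        if cost > best[node]:
            continue                      # stale heap entry
        for nxt in neighbors(node, x):
            new_cost = cost + 1           # every link costs one hop (assumption)
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                previous[nxt] = node
                heapq.heappush(heap, (new_cost, nxt))
    path = [dst]
    while path[-1] != src:
        path.append(previous[path[-1]])
    return path[::-1]


print(shortest_path(0, 24, 5))    # [0, 6, 12, 18, 24]
```

For the corner pair (0, 24) with X = 5, the only four-hop route runs entirely along the main diagonal, so the shortest path uses the added links exclusively.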
On the one hand, the addition of diagonal links helps reduce the
average packet delay from source to destination; on the other hand, more
ports are required on the routers that fall on the diagonal, quarter
cross, or half cross components. The additional ports can be regarded as
the router cost paid to achieve a lower average packet delay. Adding one
2DDgl-Mesh requires 16 more ports than the traditional 2D-Mesh
architecture.
Similarly, adding one half cross or one quarter cross component
requires 8 or 4 more ports, respectively, on the corresponding routers
of 2DDgl-Mesh compared with the traditional 2D-Mesh architecture.
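The port figures quoted above are consistent with each added diagonal link needing one extra port on each of the two routers it joins. Under that assumption, and taking a half cross and a quarter cross component to contribute 4 and 2 diagonal links respectively (both are inferences from the numbers, not statements in the text), the arithmetic is:

```latex
\begin{align*}
\text{ports added} &= 2 \times (\text{diagonal links added}) \\
\text{2DDgl-Mesh } (X = 5):\quad & 2 L_{Dgl} = 2 \cdot 2(X-1) = 16 \\
\text{half cross component}:\quad & 2 \cdot 4 = 8 \\
\text{quarter cross component}:\quad & 2 \cdot 2 = 4
\end{align*}
```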
Fig. 5. 2DDgl-Mesh for N = 81 resources.
4. Simulation Environment
Due to the lack of tools available for NoC simulation, many
researchers, as in [6] and [9], have used NS-2 for the simulation of NoC
architectures and algorithms. In view of the facilities and
documentation[10],[11] available for NS-2, we also use NS-2 for the
simulation and comparison of the proposed 2DDgl-Mesh with the
traditional 2D-Mesh architecture under the same parametric conditions.
All simulation parameters are set as described in Section 4 of [6],
including N = 25 resource nodes, exponential traffic sources for
senders, a sender/receiver pair associated with each resource node, and
normalized values of bandwidth and delay. In order to cover the traffic
behavior of a large set of applications, three different
source/destination selection models are used; they are detailed in
Section 5.
4.1 Performance Metrics
Two performance metrics, average delay and drop ratio of packets, are
used to compare the efficiency of both architectures. Let D_a represent
the average delay, D_R the drop ratio, P the total number of packets
generated in one simulation, DL_i the end-to-end delay of packet i, and
P_D the number of packets dropped; then we have
D_a = \frac{1}{P} \sum_{i=0}^{P} DL_i    (1)
D_R = \frac{P_D}{P}    (2)
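The two metrics of Eq. (1) and Eq. (2) can be computed from a simulation trace as in the following minimal sketch. The function names and the sample numbers are hypothetical. The text does not say how dropped packets enter the delay sum, so the sketch follows Eq. (1) literally: it sums whatever delays it is given and divides by the number of generated packets P.

```python
def average_delay(delays, generated):
    """Eq. (1): sum of end-to-end delays divided by packets generated (P)."""
    if generated <= 0:
        raise ValueError("generated must be positive")
    return sum(delays) / generated


def drop_ratio(dropped, generated):
    """Eq. (2): packets dropped (P_D) divided by packets generated (P)."""
    if generated <= 0:
        raise ValueError("generated must be positive")
    return dropped / generated


# Illustrative numbers only, not results from the paper.
delays_us = [40.0, 50.0, 60.0, 50.0]    # delays of the delivered packets
print(average_delay(delays_us, generated=5))    # 40.0
print(drop_ratio(dropped=1, generated=5))       # 0.2
```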
5. Results of Comparison
The comparison of both the architectures is performed by
using three different traffic models. In the following
subsections we will briefly discuss these three traffic models
and the results on the architectures in consideration.
5.1 2D and 2DDgl Traffic Model
Fig. 6 shows the details of the model. The major difference between the
2D and 2DDgl traffic models is that neighbors and non-neighbors are
selected according to the 2D-Mesh and 2DDgl-Mesh architectures,
respectively; i.e., a resource at a corner has only two neighbors in the
2D traffic model but three neighbors in the 2DDgl model.
In both models, Range 1 is set between 0 and 1. If Range 1 is set to 0,
the algorithm always selects a random non-neighbor of Source i; if
Range 1 is set to 1, a random neighbor of Source i is always selected.
Intermediate values of Range 1 change the probability of selecting
between neighbors and non-neighbors of Source i.
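The selection rule described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' script: routers are numbered row by row, the 2DDgl neighbor set adds the links along the two main diagonals, and the names `mesh_neighbors` and `select_destination` are hypothetical.

```python
import random


def mesh_neighbors(node, x, diagonal):
    """Hop-1 neighbours in an x-by-x 2D-Mesh or, with diagonal=True, 2DDgl-Mesh."""
    r, c = divmod(node, x)
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if diagonal and r == c:
        steps += [(-1, -1), (1, 1)]
    if diagonal and r + c == x - 1:
        steps += [(-1, 1), (1, -1)]
    return [(r + dr) * x + (c + dc) for dr, dc in steps
            if 0 <= r + dr < x and 0 <= c + dc < x]


def select_destination(source, x, range1, diagonal, rng=random):
    """Pick a destination for `source` following the Range 1 rule."""
    list1 = mesh_neighbors(source, x, diagonal)       # neighbours at hop 1
    list2 = [n for n in range(x * x)
             if n != source and n not in list1]       # more than 1 hop away
    rd1 = rng.random()                                # Rd1 in [0, 1)
    if rd1 < range1:
        return rng.choice(list1)                      # random neighbour
    return rng.choice(list2)                          # random non-neighbour


rng = random.Random(0)
print(select_destination(0, 5, range1=0.75, diagonal=True, rng=rng))
```

With `range1=0.75` a neighbor is chosen about three times out of four, which is presumably what the "75% local" label in the captions of Fig. 9 and Fig. 10 refers to; the text does not state the value explicitly.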
Fig. 7 and Fig. 9 show the average packet delay vs. traffic
rate comparison for both the architectures using 2D and
2DDgl traffic models. It is evident from the graphs that
2DDgl-Mesh almost always has less average delay than
2D-Mesh. Similarly, Fig. 8 and Fig. 10 show that the two
architectures have almost the same drop ratio.
Fig. 6. 2D and 2DDgl traffic selection model. (Flowchart: start with
Source = Si; generate a random number Rd1 between 0 and 1; if Rd1 <
Range 1, generate List 1 containing the n neighbors of Resource i that
are at Hop 1 from i and select a random neighbor Di from List 1;
otherwise generate List 2 containing the n non-neighbors of Resource i
that are at more than 1 hop from i and select a random non-neighbor Di
from List 2; set Destination = Di and continue with the next source.
Note: Si and Di are integers ranging between 0 and N−1.)
Fig. 7. Delay vs. traffic rate using 2D random traffic model. (Plot:
average delay (μs) vs. traffic (Mb/s) for 2D-Mesh and 2DDgl-Mesh.)
Fig. 8. Drop vs. traffic rate using 2D random traffic model. (Plot:
drop ratio vs. traffic (Mb/s) for 2D-Mesh and 2DDgl-Mesh.)
Fig. 9. Delay vs. traffic using 2DDgl 75% local traffic model. (Plot:
average delay (μs) vs. traffic (Mb/s) for 2D-Mesh and 2DDgl-Mesh.)
5.2 Special Fixed Traffic Model
In this model, the destination is always selected according to the
following equation:
D_i = (N − 1) − S_i    (3)
where D_i and S_i are the ith destination and the ith source,
respectively, and N is the total number of resources used. This model is
specifically designed to check the worst case delay behavior of both
architectures. The equation always selects the source/destination pair
from opposite sides of the chip, as mirror images. For instance, if
N = 25 and S_i = 0, then D_i = 24 (opposite side), and if S_i = 4, then
D_i = 20 (again on the opposite side), etc.
Fig. 10. Drop vs. traffic rate using 2DDgl 75% local traffic model.
(Plot: drop ratio vs. traffic (Mb/s) for 2D-Mesh and 2DDgl-Mesh.)
Fig. 11 shows that 2DDgl-Mesh again has a lower average delay than
2D-Mesh under the special fixed traffic selection model. Similarly,
Fig. 12 shows that our proposed architecture has a lower drop ratio
under this type of fixed traffic model.
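The destination rule of Eq. (3) is simple enough to check directly. The sketch below reproduces the two worked values given in the text for N = 25; the function name `fixed_destination` is hypothetical.

```python
def fixed_destination(source, n):
    """Eq. (3): mirror-image destination for a source index in 0..n-1."""
    if not 0 <= source < n:
        raise ValueError("source must be in 0..n-1")
    return (n - 1) - source


for s in (0, 4, 12):
    print(s, fixed_destination(s, 25))
```

This prints 0 24, 4 20, and 12 12. The last line shows that for odd N the centre resource maps to itself under Eq. (3); the text does not say how that case was treated in the simulations.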
5.3 Random Traffic Model
Fig. 11. Delay vs. traffic rate using special fixed traffic model.
(Plot: average delay (μs) vs. traffic (Mb/s) for 2D-Mesh and
2DDgl-Mesh.)
Fig. 12. Drop vs. traffic rate using special fixed traffic model.
(Plot: drop ratio vs. traffic (Mb/s) for 2D-Mesh and 2DDgl-Mesh.)
Fig. 13. Average delay vs. simulation runs using random traffic. (Plot:
average delay (μs) vs. simulation runs for 2D-Mesh and 2DDgl-Mesh.)
To simulate IPs or resources of different sizes, we implement diverse
kinds of traffic distributions, namely geometric, periodic, and Pareto
distributions, on different nodes along with random traffic generation.
The geometric distribution is implemented on resources r0 to r4 and r20
to r24 (see Fig. 1), the periodic distribution on resources r5 to r9 and
r15 to r19, and the Pareto distribution on resources r10 to r14. In this
way, the 2D-Mesh and 2DDgl-Mesh architectures behave as if they had
nodes of different sizes. The component based interconnection network
simulator (CINSIM)[12] is used to simulate this scenario of different
node sizes under the same parametric conditions and random traffic
model. CINSIM can simulate both the steady state and the transient
behavior of any interconnection network. Therefore, the behaviors of
both architectures are analyzed for the steady state and for the first
50 clock cycles.
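The assignment of distributions to resources can be written down as a small lookup. The sketch below is not CINSIM configuration: it only encodes the r0 to r24 grouping given above and, as an assumption, treats each distribution as a generator of inter-packet gaps. The parameter values `p`, `period`, and `shape` and the function names are hypothetical.

```python
import math
import random


def distribution_for(resource):
    """Traffic distribution assigned to resource r0..r24 in this section."""
    if 0 <= resource <= 4 or 20 <= resource <= 24:
        return "geometric"
    if 5 <= resource <= 9 or 15 <= resource <= 19:
        return "periodic"
    if 10 <= resource <= 14:
        return "pareto"
    raise ValueError("resource index must be in 0..24")


def next_interval(resource, rng, p=0.2, period=5, shape=1.5):
    """Draw one inter-packet gap; p, period and shape are illustrative values."""
    kind = distribution_for(resource)
    if kind == "geometric":
        # inverse-transform sample of a geometric variable on 1, 2, 3, ...
        u = 1.0 - rng.random()                    # u lies in (0, 1]
        return int(math.log(u) / math.log(1.0 - p)) + 1
    if kind == "periodic":
        return period                             # constant gap
    return rng.paretovariate(shape)               # heavy-tailed gap, >= 1.0


rng = random.Random(1)
for r in (2, 7, 12):
    print(r, distribution_for(r), next_interval(r, rng))
```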
The steady state simulation is run ten times, and each time
approximately 10% traffic load is added to the previous one. Fig. 13
shows the estimated average delay of packets vs. simulation runs. It is
clear from Fig. 13 that 2DDgl-Mesh has a lower average delay than
2D-Mesh for every simulation run. Also, the transient behavior of
2DDgl-Mesh is superior to that of 2D-Mesh in terms of average delay for
the first 50 clock cycles, as shown in Fig. 14.
6. Conclusions
In this paper, we have proposed a new interconnect architecture for
on-chip communication named 2DDgl-Mesh. The architecture is very similar
to the traditional 2D-Mesh in terms of cost, area, and implementation
issues, but it can outperform 2D-Mesh in delay. We used NS-2 and CINSIM
to simulate both architectures under the same traffic models and
parametric conditions. The results of comparison using different traffic
selection models show that the delay of 2DDgl-Mesh is almost always less
than that of the 2D-Mesh architecture. In addition, 2DDgl-Mesh can also
have a lower drop ratio under the fixed traffic model. The optimization
in delay or drop is achieved by adding a few diagonal links and ports to
the routers falling on the diagonals of the meshes. Therefore, it is
suggested that the 2DDgl-Mesh architecture can safely be used instead of
2D-Mesh for on-chip communication without much overhead.
Fig. 14. Average delay vs. clock cycle using random traffic. (Plot:
average delay (μs) vs. clock cycle for 2D-Mesh and 2DDgl-Mesh.)
References
[1] International Technology Roadmap for Semiconductors, 2004
ed., Semiconductor Industry Association, World Semiconductor Council, 2004.
[2] L. Benini and G. D. Micheli, “Networks on chip: a new SoC
paradigm,” IEEE Computer, vol. 35, no. 1, pp. 70-78, 2002.
[3] S. Kumar, A. Jantsch, J. P. Soininen, M. Forsell, M. Millberg, J.
Öberg, K. Tiensyrjä, and A. Hemani, “A network on chip
architecture and design methodology,” in Proc. of IEEE
Computer Society Annual Symposium on VLSI, Pittsburgh,
Pennsylvania, USA, 2002, pp. 117-124.
[4] A. Jantsch and H. Tenhunen, Networks on Chip, Stockholm,
NY: Kluwer Academic Pub., 2003, pp. 85-106.
[5] F. Karim, A. Nguyen, and S. Dey, “On-chip communication
architecture for OC-768 network processors,” in Proc. 2001
DAC Conf., Las Vegas, 2001, pp. 678-683.
[6] A. Vahdatpour, A. Tavakoli, and M. H. Falaki, “Hierarchical
graph: a new cost effective architecture for network on chip,”
in Proc. 2005 EUC (IFIP) Conf., Nagasaki, Japan, 2005, pp.
311-320.
[7] P. P. Pande, C. Grecu, M. Jones, A. Ivanov, and R. Saleh,
“Performance evaluation and design trade-offs for network-on-chip
interconnect architectures,” IEEE Trans. Computers,
vol. 54, no. 8, pp. 1025-1040, 2005.
[8] H. Hossain, M. M. Akbar, and M. M. Islam, “Extended-butterfly
FAT tree interconnection (EFTI) architecture for
NoC,” in Proc. IEEE Pacific Rim Conf. on Communication,
Computers and Signal Processing, Victoria, B.C., Canada,
2005, pp. 613-616.
[9] Y. R. Sun, S. Kumar, and A. Jantsch, “Simulation and
evaluation for a network on chip architecture using NS-2,”
presented at The 20th NORCHIP conference, Copenhagen,
Denmark, November 11-12, 2002.
[10] The ns manual, The VINT Project, UC Berkeley, LBL,
USC/ISI and Xerox PARC, [Online]. http://www.isi.edu/
nsnam/ns/ns-documentation.html.
[11] NS Simulator for Beginners, Lecture Notes, Univ. de Los
Andes, Merida, Venezuela and ESSI, Sophia-Antipolis, France,
[Online]. http://www-sop.inria.fr/mistral/personnel/Eitan.Altman
/COURS-NS/n3.pdf.
[12] A. Walter, M. Kühm, D. Tutsch, D. Lüdtke, and C.
Zimmermann, CINSim Handbook: Installation and User's
Guide, [Online]. http://dontcry.cs.tu-berlin.de/cinsim/docbook/
html/handbook.html.
Sheraz Anjum was born in Okara, Pakistan in 1976. He graduated in
electronics from Quaid-i-Azam University, Islamabad, Pakistan, in 1999,
and received the M.S. degree in computer engineering from University of
Engineering and Technology, Taxila, Pakistan, in 2005 and the Ph.D.
degree in microelectronics and solid-state electronics from Institute of
Micro-Electronics, Chinese Academy of Sciences, Beijing, China, in 2008.
Currently he is working as assistant professor with the Department of
Electrical Engineering, COMSATS Wah Cantt Campus, Pakistan. His research
interests include design of advanced DSP architectures, multi-processor
SoC and network-on-chip.
Jie Chen received his M.S. and Ph.D. degrees
in electrical engineering from the University of
Electro-Communications, Tokyo, Japan in
1991 and 1994, respectively. He is presently a
director professor with Institute of Microelectronics, and a professor with the Graduate
School of Chinese Academy of Sciences.
Before joining Chinese Academy of Sciences in 2001, he was an
associate professor with the Graduate School of Information
Systems, UEC, a research project leader of Advanced IC
Development Center, YOZAN Inc., Tokyo, Japan, from 1995 to
1997, and a research associate with UEC from 1994 to 1995. He
received the fund for 100 Talent-Scientists of Chinese Academy of
Science in 2001, and the Chinese Prime Minister Fund for
Distinguished Chinese Young Scholars in 2004. His current
research interests include SoC design for wireless communications
and multimedia signal processing.
Pei-Pei Yue received the B.E. degree in
electronic information science & engineering
from Shandong University, China, in 2003.
Now she is a Ph.D. candidate with Institute of
Microelectronics, Chinese Academy of
Sciences. Her research interest includes
architecture design of networks-on-chip.
Jian Liu received the B.E. degree from
Department of Information Science and
Electronic Engineering, Zhejiang University,
in 2004. He is currently pursuing the Ph.D.
degree with Institute of Microelectronics,
Chinese Academy of Sciences. His research
interest includes modelling and simulation of
NoCs.