

Ninth National Conference with International Participation - ETAI 2009
Ohrid, Republic of Macedonia, 26-29 IX 2009
Danijela Jakimovska, Goce Dokoski, Aristotel Tentov and Marija Kalendar
Faculty of Electrical Engineering and Information Technologies, Dept. of Computer Science, Karposh 2, b.b.,
Skopje, Macedonia
{danijela, goce.dokoski, toto, marijaka}@feit.ukim.edu.mk
Abstract-- This paper provides an overview of network processor research and design, current trends, and propositions for future development. Firstly, we explain the need for this type of chip: network processors as software-controllable devices optimized for high performance communication traffic. After that, we describe the network processor's key aspects: data processing, chip design and software development possibilities. Afterwards, we explain the reasons for the existence of different architectures and organizations. In that sense, we give a description of a few different network processors, thus presenting current trends and giving possible ideas for future evolution. We analyze the approach of involving general purpose processors, in combination with specific software, in order to further augment the performance of network processing.

Index terms-- network processor, design, architecture, evolution, trends.
1. INTRODUCTION

Networks grow rapidly and include numerous complex applications, services and real-time data that need to be provided at very high speeds, up to multiple Gb/s. Therefore, there is a constant demand for ever increasing packet processing speed, while at the same time an increasing number of services need to be provided by the networking hardware (QoS, firewalls, scheduling, flow control, etc.).
Not to mention that data, voice and video networks are
converging and that users are looking for on-demand
services delivered in any network, on any platform.
Network devices must follow this evolution, and process
the data at these transmission rates.
Routers are traditionally designed as programmable integrated circuits specifically tailored to the tasks of routing and forwarding information. However, this approach has shown inflexibility when it comes to adding new capabilities. At the same time, the rapid development of System on Chip (SoC) technology, and the availability of field-programmable gate arrays (FPGA) as well as complex programmable logic devices (CPLD), has opened up many new possibilities in processor design.
This evolution resulted in the concept of a network processor, which is optimized for packet processing and routing. This turned out to be the best solution, as it provides the necessary flexibility while keeping a decent operating speed. In order to design an efficient network processor, a careful examination of the current solutions is necessary. Therefore, we will analyze a few of the most important ones.
Network processor design is an ongoing field of development and research. Many approaches have been applied and many new ideas are emerging, such as the NetFPGA architecture or software routers. The aim of this paper is to give an outline of the achievements in network processor design, discuss current trends, and propose ideas for further improvements.
This paper is organized as follows. Section 2 gives an overview of network processors, how network processing works and the different levels of operation. Section 3 describes current architectural trends in network processor design. Section 4 gives a few examples of commercial network processors, including one software solution for general purpose processors. Section 5 presents possible architectural changes in general purpose processors for achieving better processing speeds. The paper concludes in Section 6.
2. NETWORK PROCESSORS OVERVIEW

The development of network processors started in the late 1990s, when network devices were insufficient to handle complex network processing requirements. Network processors are chip-programmable devices, optimized for packet processing at very high speeds (multi-Gb/s), and are included in many different types of network equipment such as routers, switches or firewalls [1] [2].
A network processor is actually an application-specific instruction set processor (ASIP), similar to general purpose processors, and usually supports a simple RISC instruction set. Additionally, it implements parallel processing and pipelining at a low level, and it contains hardware blocks with specific purposes like traffic management, searching, high-speed memory and packet I/O [2] [3].
As a result, it can operate at wire speeds and achieve great performance. Additional requirements should be satisfied as well: it should be flexible, easily programmable, and have a short time-to-market [4].
Network processing devices usually perform packet
analysis, searching and classifying frames, modifying
packet contents, retrieving relevant information from the
frames and forwarding packets. In order to achieve this,
they are usually designed as a composition of four
functional blocks: physical interface, data plane, control
plane and switching interface.
- The physical layer interface performs the conversion of a signal when it is received, and then transmits it over the communication channel medium. The packet processing speed depends on the appropriate operations that need to be executed.
- Fast packet processing is usually referred to as the data plane, and is characterized by simple tasks performed at wire speeds.
- On the other hand, the slow packet processing, called the control plane, is responsible for packets that need more complex processing. This plane also performs operations for control, configuration and management of the network device. Therefore, the control plane is usually implemented as a general purpose processor, while the data plane is implemented as a network processor, which has greater processing power.
- The switching fabric is another part of network devices, and its basic function is forwarding the traffic from ingress to egress ports [3] [4] [5] [6].
Fig. 1 Functional elements of network devices
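The data-plane/fast-path and control-plane/slow-path split described above can be sketched behaviorally. The table contents, packet representation and port names below are illustrative assumptions, not taken from any particular device:

```python
# A minimal sketch of the fast-path / slow-path split: simple packets
# are forwarded directly, exceptional ones are punted to the slow path.
# FORWARDING_TABLE contents and the dict-based packet are assumptions.

FORWARDING_TABLE = {
    "10.0.0.0/8": "port1",
    "192.168.0.0/16": "port2",
}

def control_plane(packet):
    """Slow path: complex processing, control and management."""
    # e.g. resolve an unknown route, handle IP options, update tables
    return "cpu"

def data_plane(packet):
    """Fast path: simple per-packet tasks performed at wire speed."""
    route = FORWARDING_TABLE.get(packet["prefix"])
    if route is None or packet.get("options"):
        return control_plane(packet)   # punt exceptional packets upward
    return route                       # forward on the chosen egress port

assert data_plane({"prefix": "10.0.0.0/8"}) == "port1"
assert data_plane({"prefix": "172.16.0.0/12"}) == "cpu"
```

The design point the sketch illustrates is that the fast path only touches packets it can handle with a single table access; everything else escalates.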
There are three levels on which network processors
operate: entry-level or access network processors, mid-level
or edge network processors, and high-end or core network
processors [3]. Nowadays the access level processors
enable up to 2 Gb/s throughput. Their applications include
routers for customer branch offices, homes etc. Network
processors used in this network level equipment are:
EZchip’s NPA, Wintegra’s WinPath, Agere, PMC Sierra,
and Intel’s IXP2300. Edge network processors aggregate
traffic from more access routers and serve as ingress and
egress to the core. They run at speeds from 2 to 5 Gb/s. This group includes multipurpose network processors such as AMCC, Intel's IXP, C-Port, Agere, Vitesse, and IBM's network processors. Core network processors represent the fastest part of the network, operating at wire speeds between 10 and 100 Gb/s. In order to achieve these speeds, these network processors are usually constructed from high-speed components that are able to process huge amounts of data (around a million packets per second). Examples of such network processors are EZchip's network processors, Xelerated, Sandburst, Bay Microsystems, and the in-house Alcatel-Lucent SP2 [2].
Fig. 2 Different operation levels of network processors
3. ARCHITECTURAL TRENDS IN NETWORK PROCESSOR DESIGN

Network processor design is an ongoing field of development, and there are many different solutions available.
In general, network processor architectures include: a
processing engine (PE), dedicated hardware, network
interface, memory resources, and software support. It must
also exploit parallelism, by using various parallelization
techniques and pipelining.
PE is the basic programmable unit in network processors,
responsible for data processing. Depending on the
architecture, PEs can be grouped in blocks of multiple PEs
[8]. Usually, PEs are positioned close to the special
hardware accelerators called coprocessors. This dedicated
hardware is easily programmable, performs additional
computations, and consequently increases processing
power and speed. Memory organization includes packet
memory, instruction memory and routing table memory.
Since the processor frequently interacts with the memory,
this communication should be very fast. Improvements in this field are achieved by the use of content addressable memory. The network interface provides the communication points for the ingress and egress packet flow in the network processor. Another important aspect of network processor design, when it comes to flexibility, is software support. These days, companies are paying much more attention to programmability, allowing network software to be written in a high-level language such as C, with the core routines in microcode [3]. However, it is not easy to develop software for network processors, as they have different architectures, complex designs and performance constraints. The current trend is to achieve software uniformity and design portability.
In order to meet the performance and speed requirements, current architectures include parallel processing, and special on-chip bus and memory organizations.
Parallelism can be implemented at three different levels: instructions, threads and packets. Instruction level parallelism enables simultaneous execution of program instructions, similar to the pipelining approach. The idea of using multithreaded processors results in executing multiple threads on one or more processors. On the other hand, packet level parallelism refers to parallel packet processing. This approach is achieved by employing multiple PEs, so that each one is simultaneously responsible for processing a different packet. Currently, various network processors accomplish their performance by raising the number of processing engines. This trend is shown in Fig. 3 [2].
Fig. 3 Trade-offs between number of PEs and issue width.
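Packet-level parallelism, as described above, can be sketched in software as a pool of worker "processing engines" each handling a different packet at the same time. The worker count and the 16-bit checksum workload are assumed stand-ins for real PE logic:

```python
# Illustrative sketch of packet-level parallelism: a pool of worker
# "processing engines" (PEs), each concurrently handling a different
# packet. The checksum-style workload is an assumption for the example.
from concurrent.futures import ThreadPoolExecutor

NUM_PES = 4  # number of processing engines working in parallel

def process_packet(payload: bytes) -> int:
    # stand-in for per-packet work (parse, classify, modify)
    return sum(payload) & 0xFFFF

def packet_level_parallel(packets):
    # each PE picks up a different packet; result ordering is preserved
    with ThreadPoolExecutor(max_workers=NUM_PES) as pes:
        return list(pes.map(process_packet, packets))

checksums = packet_level_parallel([b"\x01\x02", b"\xff\xff"])
```

Note that, unlike instruction-level parallelism, no coordination between packets is needed here, which is why adding PEs scales throughput so directly.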
High performance on-chip communication architectures include buses and crossbars in order to achieve high speed processing. Bus-based communications are not able to satisfy the increasing performance needs, up to 40 Gb/s, so they are replaced by crossbar switches. On the other hand, the use of crossbars is limited by their high cost, difficult design and low scalability. As a result, some network processors use high bandwidth buses [4].
Memory in the network processors is usually organized on
several levels. It is very important, as it has many
responsibilities such as storing the program and registers’
content, buffering packets, keeping intermediate results,
storing data that is produced by the processors while
working, holding and maintaining potentially huge tables and trees for look-ups, maintaining statistical tables, and so forth. Therefore, memory should have low latency and fulfill the speed requirements. Consequently, memory size and speed are a trade-off [2].
Improvements in memory organization are achieved by
memory coprocessors and different caching mechanisms.
The memory coprocessor's function is to execute instructions on the data held in the memory, without involving the main processor. Memory search operations are optimized by the use of content addressable memory. Caching mechanisms can significantly improve route lookup performance, and hence packet forwarding. Although caching mechanisms were not widely used at the beginning of the network processor evolution, today both data and instruction caches are crucial [4].
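As a software analogy for the caching mechanism just described, the following sketch places a per-destination cache in front of a longest-prefix-match lookup; the routing table, addresses and port names are illustrative assumptions:

```python
# Sketch of a route cache in front of a longest-prefix-match lookup.
# ROUTES, the dotted-quad handling and the port names are assumptions;
# real devices use tries or TCAMs rather than Python dicts.

ROUTES = {  # prefix length in bits -> {network address: next hop}
    24: {"192.168.1.0": "portA"},
    8:  {"10.0.0.0": "portB"},
}

cache = {}  # destination address -> next hop (the "route cache")

def lpm_lookup(addr):
    """Slow path: longest-prefix match over the full table."""
    octets = addr.split(".")
    for plen in sorted(ROUTES, reverse=True):   # longest prefix first
        keep = plen // 8                        # whole octets in prefix
        network = ".".join(octets[:keep] + ["0"] * (4 - keep))
        hop = ROUTES[plen].get(network)
        if hop:
            return hop
    return "default"

def forward(addr):
    """Fast path: consult the cache before the full lookup."""
    if addr not in cache:
        cache[addr] = lpm_lookup(addr)          # fill on first miss
    return cache[addr]

assert forward("192.168.1.7") == "portA"
assert forward("10.9.9.9") == "portB"
```

Repeated packets to the same destination then cost one dictionary probe instead of a full table walk, which is exactly the effect the caching mechanisms above exploit.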
4. EXAMPLES OF NETWORK PROCESSOR ARCHITECTURES

So far, many network processor architectures have emerged, each characterized by its own advantages and disadvantages. In this section we describe three well-known architectures, two of which are very successful on the market. We will also consider a software solution based on general-purpose processors.
4.1 Intel IXP 2800/2850 Network Processor
The Intel IXP 2800/2850 network processor architecture
allows high performance network processing and routing,
with speeds ranging from OC-3 (155 Mb/s) to OC-192 (10
Gb/s) [9]. It can operate on very heavy traffic, at the Internet core as well as in edge routers/switches and Storage Area Network routers/switches. At the same time, it is capable of advanced traffic management routines (deep packet inspection, load balancing and high-speed packet forwarding).
According to [1], the IXP2850 is the first processor to integrate security functionality into the core. This is achieved by two security-specific functional units for handling cryptographic algorithms, as well as encrypting and decrypting IPsec packets at OC-192 speeds. It can achieve this on either one 10 Gb/s connection or several 1 Gb/s connections simultaneously.
Its hardware configuration consists of two IXP multiprocessors, one for each of the ingress and egress directions, as well as a switch fabric.
Each IXP processor is itself a complete multiprocessor system, consisting of one XScale core based on the RISC architecture and operating at 700 MHz, 16 programmable "micro engines" working at a 1.4 GHz frequency, and additional coprocessors for aiding packet reassembly, acceleration and control flow. The memory consists of general purpose registers, CAMs, SDRAM and RDRAM, distributed as global and local memory among the micro engines and the XScale core.
The XScale core handles the communication and coordination with the backplane, maintains the data, and processes the packets that the micro engines can't handle.
In [10], an interesting approach is proposed, where the XScale core is replaced by one of the micro engines in order to achieve higher speed (about 40% higher, in particular). This alternative allows high speed routing for the IPv6 protocol, as it does not require complex header processing in the XScale core.
The micro engines are programmable in a high-level
language - C, and their task is to do the high-speed packet
processing. Their configuration encompasses packet
processing stages, pipes and the memory model.
To conclude, using the 1 Gb/s configuration this processor can be used as an edge-class router, while with the 10 Gb/s configuration it can be used for line-speed packet processing (e.g. firewall, intrusion detection system, etc.).
4.2 EZchip Architecture
This architecture is based on a pipeline of parallel
heterogeneous processors, optimized for packet processing.
Each processor is equipped with specific functional units,
memory and internal data buses. There are many network
processor (NP) types and versions, based on the EZchip
architecture, that are suitable for various applications.
EZchip's NP-1 is a 10-Gb/s full duplex network processor,
providing seven-layer packet processing. This is achieved
with many specific hardware accelerators, capable of
processing packets at wire speeds. Unlike NP-1, NP-2
includes traffic management units. This architecture uses
DRAM for all lookup tables and frame buffers, as well as
external SRAM for statistical data. The use of DRAM
minimizes overall system cost and power dissipation.
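The division of labor among EZchip's Task-Optimized Processors (parsing, searching, resolving and modifying packets) can be sketched behaviorally. All stage logic, packet fields and the "dst|payload" frame format below are assumptions made for illustration, not EZchip's implementation:

```python
# Behavioral sketch of a parse -> search -> resolve -> modify pipeline
# in the spirit of the TOP engines. Every field name, the frame format
# and the "ttl-1:" rewrite are hypothetical illustrations.

def top_parse(raw):
    dst, _, payload = raw.partition("|")        # extract header fields
    return {"dst": dst, "payload": payload}

def top_search(pkt, table):
    pkt["port"] = table.get(pkt["dst"])         # table/tree lookup
    return pkt

def top_resolve(pkt):
    pkt["action"] = "forward" if pkt["port"] else "discard"
    return pkt

def top_modify(pkt):
    if pkt["action"] == "forward":              # e.g. rewrite a header
        pkt["payload"] = "ttl-1:" + pkt["payload"]
    return pkt

def np1_pipeline(raw, table):
    # in hardware the stages run concurrently on different frames;
    # here they are chained sequentially for clarity
    return top_modify(top_resolve(top_search(top_parse(raw), table)))

pkt = np1_pipeline("hostA|data", {"hostA": "p1"})
assert pkt["action"] == "forward" and pkt["port"] == "p1"
```

Because each stage needs only its own narrow instruction set and data path, many instances of each stage can run in parallel on different frames, which is the essence of the TOP approach.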
The NP-1 is based on a five-stage packet processing pipeline, each stage containing multiple parallel processors optimized to perform specific tasks. There are four types of such Task-Optimized Processors (TOP engines), employed to perform parsing, searching, resolving and modifying packets, respectively.
Each type of TOP processor has a function-specific data path, functional units and instruction set, required for the complex seven-layer packet processing. The instruction sets are similar, and only small adjustments are needed to operate the specific functional units. As a result, the overall architecture resembles a highly parallel super-scalar system.
One important characteristic is that the allocation of the TOP engines to the incoming frames, the inter-TOP communication and the maintenance of the ordering of the frames are performed in hardware and are completely transparent to the programmer [2].

4.3 Architecture of the NetFPGA Router

When it comes to achieving decent speeds while at the same time affording sufficient flexibility for implementing new protocols, Field Programmable Gate Array (FPGA) technology comes as a very handy tool. With this in mind, the NetFPGA architecture was designed as a sandbox for network hardware design. It allows researchers to experiment with new ways to process packets at line rate, by letting them program its functionality in a hardware description language (e.g. Verilog).
The NetFPGA is constructed as a PCI card that contains an FPGA, four 1GigE ports and buffer memory (SDRAM and DRAM). It consists of modules that are connected as a sequence of stages in a pipeline. They communicate using a simple packet-based synchronous FIFO push interface [11].
Although its primary intent is research, the NetFPGA points out the idea of combining FPGAs with network processors. This combination can significantly increase the flexibility of a network processor at higher speeds.

4.4 General-Purpose Processor as a Network Processor

Recently, increasing attention has been devoted to software routers based on off-the-shelf hardware and open-source operating systems running on personal computer (PC) architectures. Today's high-end PCI shared buses can be used for multi-gigabit-per-second routing, at a price much lower than that of commercial routers.
What remains as a bottleneck to this approach is the programmability of the NICs, although an interesting solution has already been proposed [14].
Although this category is not primarily intended for core-level routing, it can certainly serve very well at the edge level, allowing multi-gigabit connectivity [15].
There are two very popular open-source software solutions:
- QUAGGA, free software that manages TCP/IP based routing protocols, based on the GNU Zebra project and released as part of the GNU Project. According to its developers, it offers true modularity, by having a different process for each protocol. QUAGGA works independently of the operating system on which it is installed, and does not include non-routing functionalities such as DHCP, NTP or SSH access; these services can be added separately to the operating system. Another advantage of this software is its availability for different architectures.
- Vyatta, on the other hand, is built together with the operating system, and is easier to configure. It is also free software; however, it only supports the x86 processor architecture.
5. ARCHITECTURAL CHANGES IN GENERAL PURPOSE PROCESSORS

General purpose processors with multiple cores, combined with operating systems and application programs with multithreading capabilities, show very promising performance for use in routing. However, they cannot be employed directly in network applications like routing; instead, careful internal architectural hardware redesign is needed, which will ultimately result in crucial improvements of network packet processing performance. This implies an appropriate change in the internal software architecture, as well as in the instruction set.
We have started to investigate the necessary hardware and instruction set changes for the SUN T2 processor architecture. Initial results are very promising regarding the capability of this kind of processor, with appropriate internal hardware and software interventions, to be involved in multi-gigabit-per-second routing applications.
6. CONCLUSION

At first, either general purpose processors or ASICs were used as network processors, but today a combination of these two approaches is in use, in order to cope with the need for greater flexibility and speed.
Fig. 4 Comparison of general purpose processors, ASICs and network processors
To follow the networks' evolution, characterized by ever increasing bandwidth as well as the requirement for heavy packet processing, these processors were constantly equipped with an increasing number of hardware accelerators. That is how the concept of today's network processors has evolved [13].
Today, network processors are multi-engine or multi-core, multithreaded systems on chip, with varying degrees of programmability. The more recent network processors require relatively general purpose programming models.
Network processors are replacing ASICs and fixed-function chips in a variety of networking equipment, and are thus emerging as a dominant technology in new network system designs. According to [12], the market-share leader in 2007 was Intel, with its IXP2350 access NPU and the IXP28xx 10 Gb/s NPU. Nowadays, Intel is shipping an embedded x86 processor using some elements from the IXP2350. Netronome Systems, on the other hand, develops high-end derivatives of this processor under a license from Intel [6]. EZchip is shipping its highly integrated 20 Gb/s NP-2, which combines a network processor unit, a fine-grained traffic manager, and several Ethernet MACs. Other manufacturers ranking high on the market are LSI, Wintegra and Broadcom, as well as a few smaller companies.
It is expected that the NPU market will show strong growth
in the near future.
Software solutions such as the QUAGGA and Vyatta platforms, which add routing functionalities to general purpose processors, cannot reach the line speeds achieved by network processors at the core level. This might be achieved by some architectural changes.
One may conclude that a compromise between programmable processors and fixed-function circuits (possibly FPGA) is always necessary.
Another interesting area is to investigate possibilities for the inclusion of general purpose multicore processors, with the necessary internal hardware and software redesign and appropriate multithreaded routing software, to reach or even overcome the line speeds achieved by the special network processors. We are at the very beginning of these research efforts; initial results are very promising, but further investigations are necessary, and we will pursue them.
REFERENCES

[1] Nazar Zaidi: Network Processors: Evolution and Current Trends, RMI Corporation, USA, May 1, 2008
[2] Ran Giladi: Network Processors - Architecture, Programming and Implementation, Ben-Gurion University of the Negev and EZchip Technologies Ltd.
[3] Mahmood Ahmadi, Stephan Wong: Network Processors: Challenges and Trends, Computer Engineering Laboratory, EEMCS Department, Delft University of Technology, Netherlands
[4] Mohammad Shorfuzzaman, Rasit Eskicioglu, Peter Graham: Architectures for Network Processors: Key Features, Evaluation, and Trends, Department of Computer Science, University of Manitoba, Winnipeg
[5] Panos C. Lekkas: Network Processors: Architectures, Protocols and Platforms, McGraw-Hill Professional
[6] Netronome, Switching & Routing Solutions
[7] Patrick Crowley, Mark A. Franklin, Haldun Hadimioglu, Peter Z. Onufryk: Network Processor Design, volumes 1-2, Morgan Kaufmann, 2003
[8] Md. Ehtesamul Haque, Md. Humayun Kabir: A Survey on Network Processors, Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, April 3, 2007
[9] David Meng, Ravi Gunturi, Manohar Castelino: IXP2800 Intel Network Processor IP Forwarding Benchmark Full Disclosure Report for OC192-POS, Oct 30, 2003
[10] Ankush Garg, Prantik Bhattacharyya: Network Processor Architecture for Next Generation Packet Format, Department of Computer Science, University of California, Davis
[11] Jad Naous, Sara Bolouki, Glen Gibb, Nick McKeown: NetFPGA: Reusable Router Architecture for Experimental Research, Stanford University, California
[12] Bob Wheeler, Linley Gwennap: A Guide to Network Processors, Ninth Edition, California, January 2008
[13] Matthias Gries: Algorithm-Architecture Trade-offs in Network Processor Design, PhD thesis, May 21, 2001
[14] M. Petracca, R. Birke, A. Bianco: HERO: High-speed Enhanced Routing Operation in Software Routers' NICs, Politecnico di Torino, Turin, 13 Feb. 2008
[15] Robert Olsson, Hans Wassen, Emil Pedersen: Open Source Routing in High-Speed Production Use, Uppsala Universitet, 2008