Building Scalable Network Processing Platforms with Multicore Processors
With the huge demand being placed on today’s networks, thanks in part to the surge in smartphones
and tablets, more processing is needed in different roles across the network. To address
this while keeping power, heat and cost under control, multicore processors are finding
enthusiastic acceptance among developers.
by Paul Stevens, Advantech
As excited new users charge their latest mobile device for the first time, little thought is given to
the challenges these new devices bring to the infrastructure that must support them. Whether a
smartphone, iPad or Android tablet, they are all adding to the rapid growth in network traffic as
new devices and applications, especially those in the mobile space, place greater demands on the
infrastructure. Beyond the overall traffic volume, which Cisco’s Visual Networking
Index (VNI) predicts will approach the zettabyte-per-year threshold (1 zettabyte = 1 billion terabytes)
by 2015 (Figure 1), an increased burden is placed on all infrastructure support applications, such as
security and traffic management platforms. At all levels the evolving infrastructure needs
platforms that can handle this load while still keeping both physical and power footprints in
check. For all this, carriers must still watch the bottom line, so cost-optimized and efficient
solutions are a prerequisite. A range of multicore processors, implemented across a variety of
system platform architectures, is being used both individually and in combination to meet
these challenges.
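To put the zettabyte-per-year figure in perspective, the short sketch below checks the unit conversion quoted above and converts one zettabyte per year into an average line rate. It assumes decimal (SI) prefixes and a 365-day year; the function names are illustrative only.

```python
# Decimal (SI) prefixes, as used in the text.
TERABYTE = 10**12    # bytes
ZETTABYTE = 10**21   # bytes
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: non-leap year


def terabytes_per_zettabyte() -> int:
    """How many terabytes make up one zettabyte."""
    return ZETTABYTE // TERABYTE


def average_rate_tbit_per_s(bytes_per_year: int) -> float:
    """Average rate in Tbit/s needed to carry bytes_per_year in one year."""
    bits_per_second = bytes_per_year * 8 / SECONDS_PER_YEAR
    return bits_per_second / 1e12


print(terabytes_per_zettabyte())                      # 1000000000
print(round(average_rate_tbit_per_s(ZETTABYTE), 1))   # 253.7
```

In other words, a zettabyte per year averages out to roughly 250 Tbit/s sustained around the clock, before accounting for any peak-hour load.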
It does not seem that long ago that we experienced the telecom crash, when the huge amount of
excess capacity and dark fiber was crying out for the next “killer app.” Well, the tables have
turned, and one could say that a multitude of applications has contributed to the current growth
challenges of the infrastructure that supports them. There are now estimated to be in excess of
500,000 apps available for the iPhone, iPad and Android platforms alone. Much of the demand is
based around delivering new rich media and video. According to the Cisco VNI, an ongoing
initiative to track and forecast the impact of visual networking applications, there are a number
of both exciting and frightening trends that are fueling network growth and evolution. Here are
just four numbers to consider: 32, 26, 40 and 61. IP traffic is forecast to grow at a compound
annual growth rate (CAGR) of 32% over the next five years, and mobile data traffic is forecast to
increase 26-fold between 2010 and 2015. The numbers 40 and 61 represent the percentage of
consumer Internet traffic that is video content, today and in 2015 respectively. No longer can
there be complaints about unused capacity, and the challenge is how to corral the data traffic in
the most efficient ways possible. Network technologies and applications such as deep packet
inspection (DPI), traffic-based filtering, encryption, and packet and media processing must all
take on extra load.
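The growth figures quoted from the Cisco VNI can be related to each other with simple compound-growth arithmetic. The sketch below is only an illustration: it assumes the 2010 to 2015 span counts as five annual growth periods, and the helper names are hypothetical.

```python
def growth_factor(cagr: float, years: int) -> float:
    """Total growth multiple after compounding `cagr` for `years` years."""
    return (1.0 + cagr) ** years


def implied_cagr(factor: float, years: int) -> float:
    """Annual growth rate that yields a total growth multiple of `factor`."""
    return factor ** (1.0 / years) - 1.0


# 32% CAGR over five years: IP traffic roughly quadruples.
print(round(growth_factor(0.32, 5), 2))        # 4.01

# A 26-fold increase over five years implies about 92% growth per year.
print(round(implied_cagr(26, 5) * 100, 1))     # 91.9
```

Seen this way, mobile data is forecast to nearly double every year, while overall IP traffic roughly quadruples over the same period.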
One of the fundamental attributes common to all new network application platforms is the need
for “wire speed” processing as they must interact with the traffic flows without impeding them in
any way. As we can now see, the volumes are huge and the throughput and speed requirements
will in turn require a serious amount of compute and processor capability. Simply throwing more
processors, systems and racks full of equipment at the task just won’t do as one begins to
approach the logical limits of one’s resources whether those are power, real estate or cash.
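A rough sense of what “wire speed” demands can be had from a per-packet time budget. The sketch below assumes standard Ethernet framing (64-byte minimum frame plus 8 bytes of preamble and 12 bytes of inter-frame gap) on a 10 Gbit/s link; the 2.4 GHz clock and the core counts are illustrative parameters, not measurements of any particular platform.

```python
PREAMBLE_AND_SFD_BYTES = 8   # Ethernet preamble + start-of-frame delimiter
INTER_FRAME_GAP_BYTES = 12   # minimum idle time between frames, in byte times


def packets_per_second(link_bps: float, frame_bytes: int = 64) -> float:
    """Worst-case packet rate on a link for a given frame size."""
    wire_bits = (frame_bytes + PREAMBLE_AND_SFD_BYTES + INTER_FRAME_GAP_BYTES) * 8
    return link_bps / wire_bits


def cycles_per_packet(link_bps: float, core_hz: float, cores: int,
                      frame_bytes: int = 64) -> float:
    """CPU cycles available per packet if the load spreads evenly over cores."""
    return core_hz * cores / packets_per_second(link_bps, frame_bytes)


pps = packets_per_second(10e9)
print(round(pps))                                   # 14880952 packets per second
print(round(1e9 / pps, 1))                          # 67.2 ns per packet
print(round(cycles_per_packet(10e9, 2.4e9, 1), 2))  # 161.28 cycles on one core
print(round(cycles_per_packet(10e9, 2.4e9, 6), 2))  # 967.68 cycles across six cores
```

With minimum-size frames, a single core has only about 160 cycles per packet, which is why the work has to be spread across many cores or offloaded to hardware.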
There are many similarities with the challenges that faced processor developers as they fast
approached the limits of physics with the traditional performance-enhancing technique of
increasing clock frequencies. Higher clock frequencies dramatically increased power consumption and
heat output, making it more challenging to build system platforms with the necessary densities.
The resulting solution was the development of multicore processing technologies. Multicore
architectures enable processors to be created that have two or more identical CPU cores (now as
many as 32 or more) and typically share a common system memory. Each core can operate
independently on different processing elements and dataflows and can also easily interact with
other cores and processors.
The processing, bandwidth, power, scalability and cost requirements for platforms in next-generation
mobile (4G/LTE) telecom infrastructure and enterprise networking are well matched
with the capabilities of multicore technology. Many of the applications, such as those related to
DPI (security, filtering, content management), can easily be split into logical chunks, and the
heavy-lifting processes are highly repetitive, making them suitable for scaling across many cores.
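One common way to split packet work across cores is to hash each flow to a core, so that every packet of a connection is handled by the same core and no per-flow state has to be shared. The sketch below illustrates that general idea only; it is not taken from any of the vendors' SDKs, and the `Flow` type, the `core_for_flow` function and the sample addresses are hypothetical.

```python
import zlib
from collections import Counter
from typing import NamedTuple


class Flow(NamedTuple):
    """Classic 5-tuple identifying a connection (illustrative type)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int


def core_for_flow(flow: Flow, num_cores: int) -> int:
    """Map a flow to a core index; both directions land on the same core."""
    # Order the two endpoints so A->B and B->A produce the same hash key.
    low, high = sorted([(flow.src_ip, flow.src_port),
                        (flow.dst_ip, flow.dst_port)])
    key = f"{low[0]}:{low[1]}|{high[0]}:{high[1]}|{flow.protocol}".encode()
    # CRC32 is used here only as a cheap, deterministic hash.
    return zlib.crc32(key) % num_cores


# Spread 10,000 synthetic TCP flows over 32 cores and show the busiest cores.
flows = [Flow(f"10.0.{i // 250}.{i % 250}", "192.0.2.1", 1024 + i, 443, 6)
         for i in range(10_000)]
load = Counter(core_for_flow(f, 32) for f in flows)
print(load.most_common(3))
```

Because the mapping depends only on the flow identity, the same approach scales from a handful of cores to several hundred simply by changing `num_cores`; in hardware-assisted designs this classification step is typically done before packets reach the cores.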
Not all network applications have the same requirements; this has led to a “division of labor”
approach to network equipment architecture:
Control Plane and Device Management functions—such as call setup, connection control,
routing, signaling, device operation, administration and maintenance—were performed
on General Purpose Processors (GPP).
Data Plane functions—such as packet processing, encryption/decryption,
compression/decompression, traffic-based filtering, video transcoding and deep packet
inspection—were performed on Network Processing Units (NPU).
Digital Signal Processing functions—such as audio and speech processing, digital image
and video processing, sensor array and radar/sonar signal processing—were performed
on Digital Signal Processors (DSP).
Early generation NPU and DSP products used ASICs to provide the required performance and
functionality, sacrificing the flexibility provided by software programmability of GPP-based
solutions. The current generations of NPU and DSP products use multicore technology to gain
the benefits of programmability and scalability, typically using a less complex RISC processor
for each core. Backed by a full SDK, the multicore NPU and DSP products are now as flexible
(i.e. programmable) as general purpose processors.
The boundaries between these different types of solutions are being blurred. GPP multicore
processors have added hardware acceleration for certain packet processing and/or security
functions, and some NPU and DSP processors have added general purpose CPU cores to handle
control plane and device management functions. As always, products are adapted to meet market
needs and these “hybrid” architectures are a good example of that.
There are numerous examples of multicore GPP, NPU and DSP processors that can fit the bill
for telecom and enterprise networking applications. Although there is some crossover, each is
suited to a certain set of applications.
Intel Xeon Processors: The top end of the embedded Intel Xeon Processor 5000 Sequence
family, the E5645, is built on a 32 nm microarchitecture and designed for high-performance,
data-demanding applications. Its six 2.4 GHz cores support a total of 12 threads (two per core), making it a great
choice for use in networking platforms (Figure 2). For the 5000 family, there are specific low
power options that provide greater performance per watt, making them eminently suitable for
matching with the power envelope constraints of embedded standard form factors. Intel targets
this family of processors at a wide range of applications including storage area networks,
network attached storage, routers, IP‐PBX, converged/unified communications platforms,
sophisticated content firewalls, unified threat management systems, medical imaging equipment,
military signal and image processing, and telecommunications (wireless and wireline) servers.
Cavium Networks’ Octeon II Internet Application Processor Family: A flexible multicore design
using MIPS64 architecture, the Octeon family can support up to 32 cores and can be configured
with up to 75 application acceleration engines. A state-of-the-art network processor, it is
designed for the needs of next-generation networking applications. With specialized
functions for security and packet processing acceleration built directly into the hardware
(with supporting software) at very low power consumption, these processors are designed to maximize
throughput for a multitude of protocols all the way to layer 7. Key application uses for the
Octeon family are routers, switches, HD video over IP, deep packet inspection (DPI), unified
threat management (UTM) appliances, content‐aware switches, application‐aware gateways,
triple‐play gateways, WLAN and 3G/4G access and aggregation devices, storage arrays, storage
networking equipment, servers and intelligent NICs.
NetLogic Microsystems XLP Processor Family: The XLP832 processor supports 8 MIPS64 cores
and is designed for both control plane and data plane applications. Numerous autonomous
acceleration engines (AAEs) provide packet processing, security, compression/decompression,
load balancing and storage acceleration functions. NetLogic’s low-latency Fast Messaging
Network (FMN) allows for non-intrusive communication and control messaging among
VirtuCores, acceleration engines and I/O, enabling inter-unit communication without the need
for spin-locks or semaphores. NetLogic targets the XLP Processor at high-end communication
systems, including wired and wireless security, networking, storage and data center acceleration.
Texas Instruments Multicore DSPs: Texas Instruments offers a high-performance multimedia
solution based on its TMS320C6678 digital signal processor (DSP). Designed for applications
such as multimedia gateways, IMS media servers, video conferencing servers and video
broadcast equipment, the C6678 is a highly dense media solution that is both power and cost
efficient at the system level. Based on TI's newest generation of DSP devices, the TMS320C66x,
the C6678 features eight 1.25 GHz DSP cores with 320 GMACs and 160 GFLOPs of combined
fixed- and floating-point performance on a single device, enabling users to consolidate multiple
DSPs to save board space and cost, as well as reduce overall power requirements (Figure 3).
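The headline C6678 figures can be broken down per core and per clock cycle with simple division. The sketch below uses only the numbers quoted above (eight cores, 1.25 GHz, 320 GMACs, 160 GFLOPs); the helper name is illustrative.

```python
def per_core_per_cycle(total_giga_ops: float, cores: int, clock_ghz: float) -> float:
    """Operations each core must complete per clock cycle to reach the total."""
    return total_giga_ops / cores / clock_ghz


CORES = 8
CLOCK_GHZ = 1.25

print(per_core_per_cycle(320, CORES, CLOCK_GHZ))  # 32.0 fixed-point MACs per cycle per core
print(per_core_per_cycle(160, CORES, CLOCK_GHZ))  # 16.0 floating-point ops per cycle per core
```

These are peak device-level figures; sustained throughput in a real media pipeline depends on the workload and memory behavior.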
Multicore-Based Network Application Platforms
We have seen that multicore GPP, NPU and DSP platforms have healthy roles to play and
equipment designers have a multitude of choices from which to select the best possible solution
for their specific application needs. There may be a multitude of reasons why one development
organization chooses one architecture over another. It may be specific technical features, existing
software investments, power requirements or competitive economics. Examples of the two ends
of that spectrum of choice are Advantech’s AdvancedTCA and Packetarium product lines.
AdvancedTCA is a standards-based board and system platform architecture designed with
telecommunication solutions in mind. It is supported as part of the SCOPE Alliance’s profiles and
carrier grade base platform definition, and numerous network platforms have been built on it.
Advantech offers a number of multicore AdvancedTCA blades. For GPP
requirements the MIC-5322 is a dual processor Intel Xeon 5500/5600-based blade. The MIC-5322
supports one of the highest performing Intel Xeon processors in ATCA form factor with 12
cores and 24 threads of processing power, low DDR3 memory latency, fast PCI Express 2.0 and
accelerated virtualization.
Aimed at providing a large amount of video and media processing capability, the Advantech
DSPA-8901 is designed with 20 TI TMS320TCI6608 DSPs. That totals 160 cores of processing
power to reach the higher levels of performance density needed to build the highest capacity
wireless media gateways. The DSPA-8901 significantly reduces overall system power
dissipation and system cost, and frees up valuable slots in gateway elements for additional
subscriber capacity and throughput. The DSPA-8901 includes a high-performance Freescale
QorIQ P2020 processor and a Broadcom BCM56321 switch, which terminates the 10 Gigabit
Ethernet fabric connections and distributes traffic to the twenty DSPs.
Although they have an impressive array of carrier grade features, AdvancedTCA platforms can
be size, power and price prohibitive for some applications, especially those that are heavily
dedicated to network processing. This was one of the key reasons behind Advantech’s cost-optimized
Packetarium range. The goal was to pack as much network processing performance as
possible into the smallest package while keeping power consumption and cost efficiency
optimized for the targeted applications.
The NCP-5260 represents a new generation of hybrid system designs with Intel architecture
processing on the control plane, and Packetarium network processing boards featuring NetLogic
NPUs for the data plane. It integrates up to two powerful, multicore Packetarium network
processing boards for wire speed packet processing and accommodates up to 16 x 10 GbE
external interfaces. The main carrier board provides the high-speed switched interconnects
between Packetarium boards (Figure 4).
At the high-performance end of Advantech’s Packetarium product line, the NCP-7560 integrates
up to eight powerful NCPB-2320 multicore Packetarium Network Processing Boards. Utilizing
Cavium Networks’ CN6880 Octeon II processor, a fully configured NCP-7560 packs 256 cores
into a 4U server space to handle 80 Gbit/s of network traffic from multiple 10 Gigabit Ethernet
ports. Applications that reap the performance benefits of the new Octeon II processor family
include high-capacity radio network controllers, network acceleration platforms, as well as data
center and LTE gateways.
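The core counts and throughput quoted for these platforms follow from straightforward multiplication and division. The sketch below reproduces them; the per-unit core counts (8 per DSP, 32 per Octeon II board) are inferred from the totals given above, and the per-core throughput assumes traffic is spread evenly, which is an idealization.

```python
def total_cores(units: int, cores_per_unit: int) -> int:
    """Total cores in a system built from identical processing units."""
    return units * cores_per_unit


def mbit_per_core(total_gbit_s: float, cores: int) -> float:
    """Average traffic each core must handle, in Mbit/s."""
    return total_gbit_s * 1000 / cores


dspa_8901_cores = total_cores(20, 8)    # 20 DSPs x 8 cores
ncp_7560_cores = total_cores(8, 32)     # 8 boards x 32 cores

print(dspa_8901_cores)                    # 160
print(ncp_7560_cores)                     # 256
print(mbit_per_core(80, ncp_7560_cores))  # 312.5 Mbit/s per core
print(80 / 8)                             # 10.0 Gbit/s per board
```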
None of us have crystal balls but we can all be certain that the future of global networks will be
one requiring a huge increase in capacity and capability. As the various models of cloud
computing go from strength to strength, and network-capable mobile devices become even more
pervasive, the requirement for ever more powerful network systems platforms will increase.
Whichever high-level architectures solution developers choose, multicore silicon
combined with flexible and cost-optimized system platforms will provide a major
implementation advantage.
Advantech, Irvine, CA. (949) 789-7178. [www.advantech.com].
Cavium Networks, San Jose, CA. (650) 623-7000. [www.cavium.com].
Intel, Santa Clara, CA. (408) 765-8080. [www.intel.com].
NetLogic Microsystems, Santa Clara, CA. (408) 454-3000. [www.netlogicmicro.com].
Texas Instruments, Dallas, TX. [www.ti.com].