White Paper
Adrian Hoban
Software Engineer
Intel Corporation
Using Intel® AES New Instructions and PCLMULQDQ to Significantly Improve IPSec Performance on Linux*
August 2010
324238-001
Using Intel® AES-NI to Significantly Improve IPSec Performance on Linux*
Executive Summary
The Advanced Encryption Standard (AES) is a cipher defined in the
Federal Information Processing Standards Publication 197. The Intel®
microarchitecture formerly codenamed Westmere introduced the Intel® AES New
Instructions (AES-NI), an instruction set extension of six new instructions
developed specifically to facilitate optimized AES implementations. The
microarchitecture also added a carry-less multiplication instruction,
PCLMULQDQ, which is used to optimize GCM implementations. This paper
investigates the performance gains that are possible with an AES-NI-GCM
implementation, built on the new instructions, within the Linux kernel
cryptographic framework. (The assembly code implementation of AES-NI-GCM is
covered in Ref. [4].)
The data presented in this paper demonstrates that an AES-NI-enabled
IPSec stack on Linux, running on Intel® processors based on the new
Intel® microarchitecture, can deliver substantial IPSec performance
improvements over previous generations of silicon.
The performance measurements show that for a single IPSec connection
on Linux, an AES-GCM implementation based on the AES-NI and
PCLMULQDQ instructions delivered a 400% throughput performance gain
when compared to a non-AES-NI enabled software solution on the same
platform. In addition, the number of cycles required to perform the actual
cipher operation fell sharply, an improvement of approximately 900%.
Contents
Introduction
Intel® AES New Instructions (Intel® AES-NI)
AES-GCM
  Advanced Encryption Standard (AES)
  Galois Counter Mode (GCM)
IP Security (IPSec)
  IPSec Modes
    Tunnel Mode
    Transport Mode
  IPSec Protocols
    Encapsulating Security Payload
    Authentication Header
Linux Cryptographic Framework
Linux AES-NI-GCM Driver for AES-NI
  Assembly Code Implementation
  Linux AES-NI-GCM Crypto Plug-in Design
    Combining AES and GCM
    Threading Model
    Asynchronous Support
    Co-Existence with Other Implementations
    Performance Scalability
Testing Methodology
  Hardware Platform
  Software Configuration
  BIOS Configuration
    C-States
    Enhanced Intel SpeedStep® Technology
    Cache & Hardware Prefetchers
    Intel® Hyper-Threading Technology
  Traffic Generator Configuration
    IPSec Internet Packet Mix (IMIX)
Performance Results
  Single Tunnel Performance
  Six Tunnel Performance
  Twelve Tunnel Performance
  IPSec IMIX Performance
Conclusion
References
Figures
Figure 1. Linux Crypto Framework
Figure 2. Linux AES-NI-GCM Crypto Plug-in in the Linux Stack
Figure 3. Testing Topology
Figure 4. Single IPSec Tunnel Performance in Mbps
Figure 5. Single IPSec Tunnel Performance in cycles per packet
Figure 6. Single IPSec Tunnel - Percentage of time for crypto vs. non-crypto processing
Figure 7. Performance in Mbps for Six Simultaneous IPSec Tunnels
Figure 8. Performance in Mbps for 12 Simultaneous IPSec Tunnels
Figure 9. IPSec IMIX Performance in Mbps
Introduction
Network security is a specialized area of internet security that focuses on
protecting network communications from unauthorized access. In a world
with billions of connected devices, and with the number of intelligent
connected devices projected to soar to 15 billion by 2015, network
security has a very important role to play.
The IP Security (IPSec) suite is one of the most popular sets of protocols
used by network security professionals to provide authenticity,
integrity, and privacy for internet communications. An IPSec implementation
can employ a variety of cryptographic algorithms to provide the required
security characteristics.
Traditionally, system administrators were required to make a choice of
cryptographic algorithms based on a tradeoff between desired security levels
and the performance requirements in the network. Intel recognized the need
for increasing the security performance capabilities of the processor so that
network security applications could be configured to deliver the highest level
of security and still keep pace with the networking performance
requirements.
This paper focuses on the design and performance of an implementation of
IPSec in Linux that is configured to use the Advanced Encryption Standard
(AES) cipher in Galois Counter Mode (GCM), a combination commonly referred
to as AES-GCM. The implementation leverages new instructions in the Intel®
microarchitecture formerly codenamed Westmere, which is currently
available in certain Intel® Xeon® processors and Intel® Core™ processors.
Intel® AES New Instructions (Intel® AES-NI)
The Advanced Encryption Standard (AES) is a cipher defined in the Federal
Information Processing Standards Publication 197 (FIPS 197). The standard is
based on the Rijndael algorithm and specifies a symmetric block cipher with
128-, 192-, and 256-bit keys. AES was adopted by the U.S. government circa
2001. In 2003, the U.S. National Security Agency (NSA) approved AES for
securing classified information up to the Top Secret level.
In 2010, Intel® microarchitecture, formerly codenamed Westmere,
introduced Intel® AES New Instructions (Intel® AES-NI), which is a suite of
six new instructions specifically for facilitating higher performing and more
secure AES implementations. [1] The instructions AESENC, AESENCLAST,
AESDEC, and AESDECLAST support AES encryption and decryption operations.
The instructions AESIMC and AESKEYGENASSIST support AES key expansion.
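To show how the six instructions fit together, here is a minimal C sketch of single-block AES-128 encryption and decryption written with the compiler intrinsics that map to them (_mm_aesenc_si128, _mm_aesenclast_si128, _mm_aesdec_si128, _mm_aesdeclast_si128, _mm_aesimc_si128, _mm_aeskeygenassist_si128). It assumes an x86-64 target and GCC or Clang; the function names are illustrative and are not taken from the Linux driver, and a caller should check aesni_supported() first.

```c
#include <cpuid.h>      /* __get_cpuid (GCC/Clang) */
#include <emmintrin.h>  /* SSE2 loads, stores, shifts, shuffles */
#include <stdint.h>
#include <wmmintrin.h>  /* AES-NI intrinsics */

#define AESNI_TARGET __attribute__((target("aes,sse2")))

/* Returns nonzero if CPUID.01H:ECX bit 25 (AES-NI) is set. */
int aesni_supported(void) {
    unsigned eax, ebx, ecx, edx;
    return __get_cpuid(1, &eax, &ebx, &ecx, &edx) && (ecx & (1u << 25));
}

/* One AES-128 key-schedule step: combine the previous round key with the
 * AESKEYGENASSIST output (RotWord/SubWord of the last word, XOR Rcon). */
static AESNI_TARGET __m128i expand_step(__m128i key, __m128i assist) {
    __m128i t;
    assist = _mm_shuffle_epi32(assist, 0xff);  /* broadcast the transformed word */
    t = _mm_slli_si128(key, 4);                /* prefix XOR: word i = w0 ^ ... ^ wi */
    key = _mm_xor_si128(key, t);
    t = _mm_slli_si128(t, 4);
    key = _mm_xor_si128(key, t);
    t = _mm_slli_si128(t, 4);
    key = _mm_xor_si128(key, t);
    return _mm_xor_si128(key, assist);
}

/* AESKEYGENASSIST needs its round constant as a compile-time immediate. */
#define EXPAND(i, rcon) \
    rk[i] = expand_step(rk[(i) - 1], _mm_aeskeygenassist_si128(rk[(i) - 1], (rcon)))

static AESNI_TARGET void aes128_expand(const uint8_t key[16], __m128i rk[11]) {
    rk[0] = _mm_loadu_si128((const __m128i *)key);
    EXPAND(1, 0x01); EXPAND(2, 0x02); EXPAND(3, 0x04); EXPAND(4, 0x08);
    EXPAND(5, 0x10); EXPAND(6, 0x20); EXPAND(7, 0x40); EXPAND(8, 0x80);
    EXPAND(9, 0x1b); EXPAND(10, 0x36);
}

AESNI_TARGET void aes128_ni_encrypt(const uint8_t key[16], const uint8_t in[16], uint8_t out[16]) {
    __m128i rk[11];
    aes128_expand(key, rk);
    __m128i s = _mm_xor_si128(_mm_loadu_si128((const __m128i *)in), rk[0]);
    for (int i = 1; i < 10; i++)
        s = _mm_aesenc_si128(s, rk[i]);     /* AESENC: one full round */
    s = _mm_aesenclast_si128(s, rk[10]);    /* AESENCLAST: final round (no MixColumns) */
    _mm_storeu_si128((__m128i *)out, s);
}

AESNI_TARGET void aes128_ni_decrypt(const uint8_t key[16], const uint8_t in[16], uint8_t out[16]) {
    __m128i rk[11];
    aes128_expand(key, rk);
    __m128i s = _mm_xor_si128(_mm_loadu_si128((const __m128i *)in), rk[10]);
    for (int i = 9; i > 0; i--)  /* AESIMC converts encryption round keys for AESDEC */
        s = _mm_aesdec_si128(s, _mm_aesimc_si128(rk[i]));
    s = _mm_aesdeclast_si128(s, rk[0]);     /* AESDECLAST: final inverse round */
    _mm_storeu_si128((__m128i *)out, s);
}
```

With the FIPS 197 Appendix C.1 example (key 000102...0e0f, plaintext 00112233...eeff), aes128_ni_encrypt should produce 69c4e0d86a7b0430d8cdb78070b4c55a, and aes128_ni_decrypt should recover the plaintext. Note how short the routine is compared with a table-based implementation, which is the complexity point made in the next paragraph.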
An additional benefit of using Intel® AES-NI is that a more secure solution
may be developed. Intel® AES-NI based implementations are not vulnerable
to the cache-timing side-channel attacks that can be carried out against
table-based AES implementations. Also, the complexity of implementing AES
with the new instructions is considerably lower than implementing AES with a
table-based approach; therefore, the risk of implementer error is also
considerably reduced.
AES-GCM
In network security applications, messages vary in length, whereas block
cipher algorithms operate on fixed-length blocks of data. To use a block cipher
algorithm in a secure networking application, it is commonly combined with a
block cipher mode of operation. Among other things, block cipher modes help
to normalize the message size for processing. This section describes the
AES-GCM block cipher algorithm and mode combination.
Advanced Encryption Standard (AES)
Advanced Encryption Standard (AES) is a set of block ciphers taken from the
Rijndael [2] symmetric key block cipher specification. The standard defines a
block size of 128 bits and support for 128-bit, 192-bit, and 256-bit keys. The
U.S. National Institute of Standards and Technology (NIST) announced the
adoption of AES in 2001 with the publication of the Federal Information
Processing Standard (FIPS) 197 document [3].
Using the AES algorithm provides the user with the ability to add
confidentiality to data. Confidentiality is the property that ensures only a
person with a valid key can read the data.
Galois Counter Mode (GCM)
Galois Counter Mode is an authenticated encryption mode of operation for use
with symmetric key block ciphers such as AES. It operates on 128-bit blocks.
Using GCM provides the user with the ability to add integrity and
authentication to data. Integrity is the property that the data has not
been tampered with. Authentication is the property that assures the origin
of the data.
Combining AES and GCM provides the user with confidentiality, integrity, and
authentication properties.
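GCM's integrity and authentication come from GHASH, a chain of multiplications in the finite field GF(2^128), and that multiplication is what PCLMULQDQ accelerates. The sketch below is a deliberately slow, portable, bit-by-bit version that follows the right-shift multiplication algorithm in NIST SP 800-38D; the names are illustrative and this is not the optimized code used by the driver.

```c
#include <stddef.h>
#include <stdint.h>

/* A 128-bit field element, stored big-endian: hi = bytes 0..7, lo = bytes 8..15. */
typedef struct { uint64_t hi, lo; } gf128;

static gf128 gf128_load(const uint8_t b[16]) {
    gf128 r = {0, 0};
    for (int i = 0; i < 8; i++) {
        r.hi = (r.hi << 8) | b[i];
        r.lo = (r.lo << 8) | b[i + 8];
    }
    return r;
}

static void gf128_store(gf128 x, uint8_t b[16]) {
    for (int i = 7; i >= 0; i--) {
        b[i] = (uint8_t)x.hi;      x.hi >>= 8;
        b[i + 8] = (uint8_t)x.lo;  x.lo >>= 8;
    }
}

/* X * Y in GF(2^128) using GCM's bit order (SP 800-38D, Algorithm 1). */
static gf128 gf128_mul(gf128 x, gf128 y) {
    gf128 z = {0, 0}, v = y;
    for (int i = 0; i < 128; i++) {
        uint64_t bit = (i < 64) ? (x.hi >> (63 - i)) & 1 : (x.lo >> (127 - i)) & 1;
        if (bit) { z.hi ^= v.hi; z.lo ^= v.lo; }
        uint64_t carry = v.lo & 1;                 /* rightmost bit about to fall off */
        v.lo = (v.lo >> 1) | (v.hi << 63);
        v.hi >>= 1;
        if (carry) v.hi ^= 0xE100000000000000ULL;  /* reduce by R = 11100001 || 0^120 */
    }
    return z;
}

/* GHASH_H over whole 16-byte blocks; the caller supplies the padded AAD,
 * the padded ciphertext, and the final length block. */
void ghash(const uint8_t h[16], const uint8_t *blocks, size_t nblocks, uint8_t out[16]) {
    gf128 hk = gf128_load(h), y = {0, 0};
    for (size_t i = 0; i < nblocks; i++) {
        gf128 x = gf128_load(blocks + 16 * i);
        y.hi ^= x.hi;
        y.lo ^= x.lo;
        y = gf128_mul(y, hk);                      /* Y_i = (Y_{i-1} xor X_i) * H */
    }
    gf128_store(y, out);
}
```

With the hash subkey H = 66e94bd4ef8a2c3b884cfa59ca342b2e and the single ciphertext block 0388dace60b6a392f328c2b971b2fe78 from test case 2 of the original GCM specification, followed by its length block, this should yield f38cbb1ad69223dcc3457ae5b6b0f885. The 128 iterations per block are exactly the cost that a carry-less multiply instruction removes.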
Note: One of the other common cryptography properties is non-repudiation. Non-
repudiation is the property of ensuring both the integrity of the data and that
the sender really sent the data. AES-GCM does not have non-repudiation
properties. The use of a digital signature is typically required as the basis for
providing non-repudiation to a communication. AES-GCM can then be used as
part of the overall communication infrastructure.
IP Security (IPSec)
IP Security (IPSec) is a suite of security protocols that operates at layer 3
(the network layer). It provides security functionality in the form of
confidentiality and authentication for IPv4 and IPv6. Because IPSec
operates at layer 3, it can provide this protection to all higher-layer
traffic (including application traffic) that traverses the internet.
In Linux*, the native 2.6 kernel IPSec stack is called Netkey. It integrates
with the kernel's transformation framework (XFRM). Netkey accesses the
Security Policy Database (SPDB) and the Security Association Database
(SADB) to retrieve IPSec policies and IPSec security associations. A user
space application, typically an Internet Key Exchange (IKE) stack, is
responsible for loading the kernel SPDB and SADB with information necessary
for the kernel to establish an IPSec connection.
IPSec Modes
IPSec has two modes of operation, tunnel mode and transport mode.
Tunnel Mode
Tunnel mode is typically used to create a Virtual Private Network (VPN). An
IPSec VPN can support secure network-to-network, host-to-network, and
host-to-host configurations. Network-to-network VPNs are
typically used to secure communication between sites. Host-to-network VPNs
are often used by remote users that need to connect securely to a corporate
network. Tunnel mode VPNs can also be used to secure host-to-host
communication (although transport mode is more commonly used in this
scenario).
Tunnel mode secures the entire IP packet and encapsulates it in another IP
header specific to the IPSec tunnel endpoints.
Transport Mode
Transport mode is typically used to secure host-to-host communication. With
transport mode, only the IP packet payload is secured. The original IP source
and destination addresses remain unchanged.
IPSec Protocols
IPSec has two protocols: Encapsulating Security Payload (ESP) and
Authentication Header (AH).
Encapsulating Security Payload
The Encapsulating Security Payload (ESP) protocol in IPSec provides
confidentiality, authenticity, and integrity. Encryption-only and
authentication-only configurations are possible but not recommended. In
tunnel mode with ESP, the outer, encapsulating IP header is not afforded any
protection, but the inner IP header can be fully secured. ESP is identified as
protocol number 50 in the outer IP header.
Authentication Header
The Authentication Header (AH) protocol in IPSec provides authenticity and
integrity. It does not provide confidentiality. AH is identified as protocol
number 51 in the IP header.
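Because ESP and AH are distinguished only by the protocol number in the IP header, a receiver can tell them apart by reading a single byte. A minimal C sketch for IPv4, assuming a contiguous packet buffer; the enum and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* IANA-assigned IP protocol numbers for the two IPSec protocols. */
enum ipsec_proto { IPSEC_NONE = 0, IPSEC_ESP = 50, IPSEC_AH = 51 };

/* Classify an IPv4 packet by its Protocol field (byte offset 9 of the header). */
enum ipsec_proto ipsec_classify_ipv4(const uint8_t *pkt, size_t len) {
    if (len < 20 || (pkt[0] >> 4) != 4)  /* need a full minimal IPv4 header */
        return IPSEC_NONE;
    switch (pkt[9]) {
    case 50: return IPSEC_ESP;
    case 51: return IPSEC_AH;
    default: return IPSEC_NONE;
    }
}
```

In tunnel mode this check applies to the outer header; with ESP, the inner header becomes visible only after decryption.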
Linux Cryptographic Framework
The Linux kernel provides an Application Programming Interface (API) for
cryptographic functionality. This API supports a wide variety of cryptographic
capabilities such as ciphers, hashes, compression, and random number
generation. The API supports both synchronous and asynchronous calling
semantics and is available to kernel mode applications to use.
The actual implementations of the algorithms are registered with the
cryptographic framework via a plug-in model. The cryptographic
implementation makes a call to the crypto_register_alg() function and
passes a pointer to its definition of a crypto_alg structure. The contents of
the crypto_alg structure define the behavior of the cryptographic
implementation. For example, the cra_name member of the crypto_alg
structure specifies the algorithm supported.
Multiple plug-ins can co-exist with the same functionality. The application can
request access to a specific implementation by explicitly requesting the
implementation by name. The name must match the definition given in the
cra_driver_name member of the crypto_alg structure. Alternatively, the
application can just specify the cryptographic algorithm it is interested in
accessing. When multiple implementations exist with the same algorithm
name, the cryptographic framework selects the implementation based on the
cra_priority member of the crypto_alg structure.
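The selection rule described above can be modelled in a few lines of ordinary C. This is a hypothetical, user-space simplification for illustration only: struct alg_entry and alg_lookup() are not kernel APIs, and the real framework also matches algorithm type flags and handles module loading, among other things.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the crypto_alg fields discussed above. */
struct alg_entry {
    const char *name;         /* like cra_name: the algorithm offered */
    const char *driver_name;  /* like cra_driver_name: one specific implementation */
    int priority;             /* like cra_priority: higher wins */
};

/* An exact driver-name match selects that implementation; otherwise the
 * highest-priority entry whose algorithm name matches is chosen. */
const struct alg_entry *alg_lookup(const struct alg_entry *tbl, size_t n, const char *request) {
    const struct alg_entry *best = NULL;
    for (size_t i = 0; i < n; i++)
        if (strcmp(tbl[i].driver_name, request) == 0)
            return &tbl[i];
    for (size_t i = 0; i < n; i++)
        if (strcmp(tbl[i].name, request) == 0 &&
            (best == NULL || tbl[i].priority > best->priority))
            best = &tbl[i];
    return best;  /* NULL if nothing matches */
}
```

Given two entries that both offer "rfc4106(gcm(aes))" with made-up priorities of 100 and 300, a request for the algorithm name returns the priority-300 entry, while a request for the other entry's driver name pins that implementation regardless of priority.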
One of the more recent additions to the Linux cryptographic framework is the
ability to define an implementation as an Authenticated Encryption with
Associated Data (AEAD) type. This algorithm type is particularly suitable for
AES-GCM, which combines encryption and authentication in a single mode of
operation, because it allows the framework to efficiently handle “one-shot”
requests from the application. With the addition of the AEAD type, it is
efficient to implement a driver that can process an AES-GCM request in one
operation.
Figure 1. Linux Crypto Framework
Linux AES-NI-GCM Driver for AES-NI
Intel has a track record of consistently delivering performance enhancements
over subsequent generations of silicon. These performance enhancements are
achieved through micro-architectural advancements as well as advancements
in process technology. Typically, software that is moved to newer generations
of silicon just runs faster. To reap the potential benefits from new
instructions, it is necessary to recompile the software application with the
latest compiler or to code directly to the new instructions.
The Intel® microarchitecture formerly codenamed Westmere introduced six new
instructions specifically for facilitating an optimized AES implementation [1].
A carry-less multiplication instruction called PCLMULQDQ was also added. An
assessment of the AES-GCM authenticated-cipher suite suggested that
significant performance gains could be achieved on a platform that efficiently
utilized these instructions.
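Carry-less multiplication is long multiplication with XOR in place of addition, which is the polynomial product over GF(2) that GHASH needs. The following C sketch, assuming x86-64 with GCC or Clang, compares a portable 64x64-bit reference with the _mm_clmulepi64_si128 intrinsic that maps to PCLMULQDQ; the function names are illustrative, and the reduction step that a full GHASH implementation needs afterwards is omitted.

```c
#include <cpuid.h>      /* __get_cpuid */
#include <emmintrin.h>  /* SSE2 */
#include <stdint.h>
#include <wmmintrin.h>  /* _mm_clmulepi64_si128 */

/* Portable reference: 64x64 -> 128-bit carry-less product. */
void clmul64_portable(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo) {
    uint64_t h = 0, l = 0;
    for (int i = 0; i < 64; i++) {
        if ((b >> i) & 1) {
            l ^= a << i;                 /* XOR instead of add: no carries */
            if (i) h ^= a >> (64 - i);   /* bits shifted past the low word */
        }
    }
    *hi = h;
    *lo = l;
}

/* Same product in one PCLMULQDQ instruction. */
__attribute__((target("pclmul,sse2")))
void clmul64_hw(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo) {
    __m128i x = _mm_set_epi64x(0, (long long)a);
    __m128i y = _mm_set_epi64x(0, (long long)b);
    __m128i r = _mm_clmulepi64_si128(x, y, 0x00);  /* 0x00: low qword of each operand */
    uint64_t out[2];
    _mm_storeu_si128((__m128i *)out, r);
    *lo = out[0];
    *hi = out[1];
}

/* Returns nonzero if CPUID.01H:ECX bit 1 (PCLMULQDQ) is set. */
int pclmul_supported(void) {
    unsigned eax, ebx, ecx, edx;
    return __get_cpuid(1, &eax, &ebx, &ecx, &edx) && (ecx & (1u << 1));
}
```

For example, 3 multiplied by 3 carry-lessly is 5 (binary 11 XOR 110 = 101), not 9. The portable loop makes 64 passes, while the hardware version is a single instruction.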
AES implementations are typically written in C code and often implemented
with a table-based approach. As table-based implementations do not
translate readily (via a compiler) to the new instructions, the Linux
AES-NI-GCM crypto plug-in described in this section was created to efficiently
leverage the new instructions.
Assembly Code Implementation
The assembly code implementation of AES-NI-GCM is covered
comprehensively in the white paper titled: “Optimized Galois-Counter-Mode
Implementation on Intel® Architecture Processors” [4].
Linux AES-NI-GCM Crypto Plug-in Design
This section describes the Linux AES-NI-GCM Crypto Plug-in design. The
implementation conforms to RFC 4106, “The Use of Galois/Counter Mode
(GCM) in IPsec Encapsulating Security Payload (ESP)”.
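RFC 4106 forms GCM's 96-bit nonce from a 4-byte salt, taken from the last four bytes of the keying material, followed by the 8-byte explicit IV carried in each ESP packet. The sketch below shows how the initial counter block is built and incremented; it is illustrative C with hypothetical function names, not code from the driver.

```c
#include <stdint.h>
#include <string.h>

/* Initial counter block J0 for a 96-bit nonce: salt || explicit IV || 0x00000001. */
void rfc4106_j0(const uint8_t salt[4], const uint8_t iv[8], uint8_t j0[16]) {
    memcpy(j0, salt, 4);
    memcpy(j0 + 4, iv, 8);
    j0[12] = 0;
    j0[13] = 0;
    j0[14] = 0;
    j0[15] = 1;
}

/* GCM's inc32: add 1 to the rightmost 32 bits (big-endian), modulo 2^32. */
void gcm_inc32(uint8_t cb[16]) {
    for (int i = 15; i >= 12; i--)
        if (++cb[i] != 0)
            break;  /* stop once a byte did not wrap to zero */
}
```

In GCM, J0 itself is encrypted to mask the authentication tag, and encryption of the payload starts at inc32(J0), so the per-packet IV never repeats a counter block as long as IVs are unique under a key.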
The implementation does not conform to typical Linux driver implementations
such as those based on character or block drivers. Nonetheless, in this
context, the term “driver” may be used interchangeably with “plug-in” as the
Linux kernel crypto interface uses the term “driver” as part of the
nomenclature, for example the cra_driver_name member of the crypto_alg
structure.
The modular view of the driver is presented in Figure 2. The driver has two
parts. The first part is a patch to the existing AES-NI driver file, called
aesni-intel_glue.c. This patch contains the C code needed to register the
new AES-NI-GCM implementation with the Linux crypto framework. The
second part of the driver is a patch to the existing aesni-intel_asm.s file.
This patch contains the assembly code implementation of AES-NI-GCM using
the new AES-NI instructions.
Figure 2. Linux AES-NI-GCM Crypto Plug-in in the Linux Stack
[Diagram components. User space: IKE Protocol Engine (Key Mgmt.), Public Key
Library, Crypto Library, Certificate Library, iproute2/setkey. User/kernel
boundary: Netlink, PF_KEY. Kernel: SADB, SPDB, XFRM, IPSec, IP, Crypto, Intel
AES-GCM Crypto Plug-in, AES-GCM Assembly Code, Ethernet Driver (ixgbe).
Hardware: Intel 82599EB 10G Ethernet Controller.]
The driver was implemented on a standard 2.6.31.4 Linux kernel downloaded
from www.kernel.org. The code is only applicable for 64-bit configurations.
The driver files are located in the /usr/src/linux/arch/x86/crypto folder.
Combining AES and GCM
Implementations of AES-GCM are commonly split into two distinct operations:
an AES request and a GCM request. Combining the AES and GCM operations into
a single request allows more efficient utilization of the underlying hardware.
The Authenticated Encryption with Associated Data (AEAD) interface in the
Linux cryptographic API makes it efficient to register a combined AES-GCM
implementation in the kernel cryptographic framework. To register an
implementation with the AEAD infrastructure, the CRYPTO_ALG_TYPE_AEAD
flag must be set in the cra_flags member of the main algorithm structure,
crypto_alg. In addition, the cra_u.aead member of the crypto_alg
structure must be used to specify the function pointers and sizes of the
implementation.
Threading Model
The native Linux IPSec stack typically executes in the highest priority
bottom-half context, known as a SoftIRQ context. Code executing in a SoftIRQ context
must not block. The Linux AES-NI-GCM Crypto driver is an implementation of
RFC4106 and, as such, is specifically intended for use by an IPSec stack. The
common usage model for this driver is for it to be invoked by the IPSec stack
that is executing in a SoftIRQ context. The driver may also be invoked in a
thread context.
Asynchronous Support
The assembly code implementation of AES-NI-GCM makes use of the
Intel® 64 XMM registers associated with the Streaming SIMD Extensions
(SSE). The state of these registers is not automatically saved by
the Operating System (OS) during task switching. The kernel functions
kernel_fpu_begin and kernel_fpu_end are used to manage saving the
state of these registers.
The Linux AES-NI-GCM Crypto driver integrates with the Linux Crypto
asynchronous framework called cryptd. In the cryptd framework, there is
one worker thread per CPU core.
Saving XMM register state can be an expensive operation. If the code is
executing in a SoftIRQ context and the driver determines that the XMM
register state needs to be saved, then the request is offloaded to the cryptd
framework to be processed in a worker thread at a later time. This case can
occur when the driver running in a thread context (or some other
application/thread in the system) accesses the XMM registers and is
pre-empted by the SoftIRQ context.
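The decision described above, run the request inline or hand it to cryptd, can be sketched as a small user-space model. Everything here (choose_path, the per-CPU FIFO, MAX_CPUS, QUEUE_DEPTH) is hypothetical and only mirrors the rule stated in the text; the actual driver relies on the kernel's own context checks and the cryptd queues.

```c
#include <stdbool.h>
#include <stddef.h>

enum dispatch { RUN_INLINE, DEFER_TO_CRYPTD };

/* SoftIRQ code must not block and saving live XMM state is expensive, so such
 * requests are deferred; otherwise the SIMD code runs inline, bracketed by
 * kernel_fpu_begin()/kernel_fpu_end() in the real driver. */
enum dispatch choose_path(bool in_softirq, bool xmm_state_live) {
    return (in_softirq && xmm_state_live) ? DEFER_TO_CRYPTD : RUN_INLINE;
}

#define MAX_CPUS 16
#define QUEUE_DEPTH 64

/* Model of one deferred-request queue per CPU, drained by that CPU's worker. */
struct cryptd_queue_model { int req[QUEUE_DEPTH]; size_t head, count; };
static struct cryptd_queue_model percpu_queue[MAX_CPUS];

bool cryptd_enqueue(int cpu, int req_id) {
    if (cpu < 0 || cpu >= MAX_CPUS)
        return false;
    struct cryptd_queue_model *q = &percpu_queue[cpu];
    if (q->count == QUEUE_DEPTH)
        return false;  /* queue full */
    q->req[(q->head + q->count) % QUEUE_DEPTH] = req_id;
    q->count++;
    return true;
}

/* What the worker thread does later: take requests in FIFO order. */
bool cryptd_dequeue(int cpu, int *req_id) {
    if (cpu < 0 || cpu >= MAX_CPUS)
        return false;
    struct cryptd_queue_model *q = &percpu_queue[cpu];
    if (q->count == 0)
        return false;
    *req_id = q->req[q->head];
    q->head = (q->head + 1) % QUEUE_DEPTH;
    q->count--;
    return true;
}
```

Keeping one queue per CPU matches the one-worker-per-core arrangement and keeps a deferred request on the core that received it.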
Co-Existence with Other Implementations
The Linux crypto framework supports the simultaneous co-existence of
multiple drivers that implement the same crypto algorithms. If multiple
implementations of the same algorithm exist, then the Linux crypto
framework selects the implementation based on a priority setting. Each driver
sets its priority via the cra_priority member of the crypto_alg structure.
The kernel mode application that invokes the crypto API also has the option
to specifically request an implementation by specifying the driver name. The
driver name must match the name that was registered with the crypto
framework via the cra_driver_name member of the crypto_alg structure.
Performance Scalability
Before the ubiquitous availability of multi-core processors, clock speed
increases were one of the primary vectors that subsequent generations of
processors used to enhance performance. Modern multi-core processors have
the potential to deliver outstanding performance when the software workload
is sufficiently parallel. Amdahl’s law can be used to predict the performance
increase that a multi-core processor can deliver by looking at the proportion
of the workload that can be processed in parallel.
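Amdahl's law gives the bound directly: if a fraction p of the work can run in parallel on n cores, the speedup is 1 / ((1 - p) + p / n). A one-function C sketch; the values of p used below are illustrative, not measurements from this paper.

```c
/* Amdahl's law: speedup on n cores when a fraction p (0..1) of the work is parallel. */
double amdahl_speedup(double p, unsigned n) {
    return 1.0 / ((1.0 - p) + p / (double)n);
}
```

For example, p = 0.9 gives a speedup of 4.0 on six cores but only about 5.7 on twelve, and the speedup can never exceed 10 however many cores are added. This is why keeping each flow's processing independent, so that p is close to 1, matters for the multi-tunnel tests.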
Unidirectional packets on an IPSec VPN tunnel can be described as all
belonging to the same flow. Packets from the same flow can be distributed to
different cores in the system using an interrupt load balancing based scheme.
However, if more than one VPN tunnel exists, improved scaling can be
achieved by configuring flow affinity to a particular core.
Platforms equipped with the Intel® 82599 10 Gigabit Ethernet controller can
configure flow affinity using either Receive Side Scaling (RSS) or Flow
Director filtering. With RSS enabled, the Ethernet controller generates a hash
value based on IP header fields and uses this hash value to select a
hardware-based receive queue. Configuring the interrupt associated with the
receive queue to be serviced by a particular core effectively “affinitizes” the
flow to that core.
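RSS hashing is commonly the Toeplitz hash: for every set bit of the selected header fields, the 32-bit window of a secret key starting at that bit position is XORed into the result, and the low-order hash bits then typically index a redirection table to pick the queue. The C sketch below implements the Toeplitz hash over an arbitrary byte string; queue selection is simplified to hash % nqueues rather than a redirection table, and the function names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Toeplitz hash: XOR in the 32-bit key window for each set input bit (MSB first).
 * The key should be at least len + 4 bytes long; missing key bits read as 0. */
uint32_t toeplitz_hash(const uint8_t *key, size_t key_len, const uint8_t *in, size_t len) {
    uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
                      ((uint32_t)key[2] << 8) | key[3];
    uint32_t hash = 0;
    for (size_t i = 0; i < len; i++) {
        for (int b = 7; b >= 0; b--) {
            if ((in[i] >> b) & 1)
                hash ^= window;
            /* slide the window one bit: shift in the next key bit */
            size_t next = 32 + i * 8 + (size_t)(7 - b);
            uint32_t next_bit = (next / 8 < key_len) ? (key[next / 8] >> (7 - next % 8)) & 1u : 0u;
            window = (window << 1) | next_bit;
        }
    }
    return hash;
}

/* Simplified queue choice (real controllers consult a redirection table). */
unsigned rss_queue(uint32_t hash, unsigned nqueues) {
    return hash % nqueues;
}
```

If only the outer IPv4 source and destination addresses are hashed (ESP carries no TCP/UDP ports), every packet of a given tunnel produces the same hash and therefore lands on the same queue and core. The hash is also linear over XOR, which is a convenient property for checking an implementation.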
The Flow Director capability of the Intel® 82599 10 Gigabit Ethernet
controller offers even greater control to the system administrator. Whereas
RSS automatically determines the receive queue for a flow, Flow Director
provides the user with the capability to manually specify the queue for a flow.
In addition, Flow Director offers some control to the user to specify what
fields within the IP packet are used to determine the queue assignment.
The Linux AES-NI-GCM Crypto Driver supports simultaneous requests from
multiple contexts. This capability, combined with the ability to direct flows
to different cores, facilitates excellent multi-core scaling characteristics.
Testing Methodology
The testing results presented in this paper are based on a back-to-back test
configuration as depicted in Figure 3.
Hardware Platform
Two Intel platforms were each fitted with a single Intel® Xeon® Processor
E5645, with six cores at 2.4 GHz and a 12 MB Level 3 (L3) cache. In addition, 2 GB
of DDR3 RAM was installed on each platform. I/O connectivity was provided
by a dual port Intel® 82599 10 Gigabit Ethernet controller.
The two platforms were connected in a back-to-back configuration with an
optical cable. A port on the traffic generator was connected to the remaining
port on the Ethernet Controller.
Figure 3. Testing Topology
[Diagram: two platforms, each with one 6-core Westmere processor and 10G NIC
ports, connected by 1-12 VPNs; a traffic generator supplies 1-12 IP packet
streams over 10G. Subnet routing is used to steer traffic across the
different VPN connections.]
Software Configuration
The platform was initialized with a standard openSUSE* 11.1 distribution of
Linux and the native Linux 2.6.31.4 kernel was downloaded from
www.kernel.org and installed.
The strongSwan Pluto IKE stack version 4.3.5 was installed on the platform.
strongSwan was configured to use pre-shared keys and to set up twelve
ESP-based VPN connections in tunnel mode. The ESP security algorithm was
specified as AES-128-GCM.
Each receive queue had its interrupt affinity assigned to a single core and
RSS was used to load balance flows between receive queues. (Use of RSS
therefore balanced flows between cores as well.)
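Assigning a receive queue's interrupt to one core is done through the hexadecimal CPU bitmask in /proc/irq/<N>/smp_affinity, where N is the queue's interrupt number as listed in /proc/interrupts. A minimal C sketch, assuming fewer than 32 CPUs and root privileges for the write; the helper names are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Format the hexadecimal mask with only bit `cpu` set (cpu < 32 in this sketch). */
int affinity_mask_for_cpu(unsigned cpu, char *buf, size_t len) {
    if (cpu >= 32)
        return -1;
    int n = snprintf(buf, len, "%x", 1u << cpu);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write the mask for `cpu` to /proc/irq/<irq>/smp_affinity; returns 0 on success. */
int pin_irq_to_cpu(unsigned irq, unsigned cpu) {
    char path[64], mask[16];
    if (affinity_mask_for_cpu(cpu, mask, sizeof mask) != 0)
        return -1;
    snprintf(path, sizeof path, "/proc/irq/%u/smp_affinity", irq);
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fprintf(f, "%s\n", mask) > 0;
    return (fclose(f) == 0 && ok) ? 0 : -1;
}
```

For example, core 5 corresponds to the mask 20. On systems running the irqbalance daemon, it may rewrite these masks, so it is commonly stopped for tests like these.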
BIOS Configuration
The test configuration was focused on determining the maximum
performance capability of the Linux AES-NI-GCM Crypto Driver. Because the
power saving features integrated into the Intel® Xeon® core were not needed
in this test setup, they were disabled in the BIOS.
C-States
C-State is a term taken from the Advanced Configuration and Power Interface
(ACPI) specification¹ [5]. The C-State represents the processor power state of
the core and is more commonly known as the processor “idle” state of the core.
C-State values range from C0 to Cn, where n is dependent on the specific
processor. When the core is active and executing instructions, it is in the C0
state. Higher-numbered C-States indicate deeper idle states.
In this test, C-States were disabled in the BIOS to prevent the processor from
switching into a low power state, because the test was designed to maximize
core utilization.
Enhanced Intel SpeedStep® Technology
Enhanced Intel SpeedStep® Technology is an advanced method of altering
the processor operating frequency and voltage between high and low levels
based on the processor load [6]. This technology enables Embedded Intel®
Architecture Processors to provide very high performance computing
capability while also enabling low energy consumption. The voltage-frequency
pair is known as the Device and Processor Performance State (P-State). A
P-State of P0 is the highest voltage/frequency pairing. Higher-numbered
P-States have lower voltage and frequency levels. It takes the processor longer
to complete a task in a higher-numbered P-State, but less energy is consumed.
The operating system is responsible for managing when the P-State
transitions occur. In this test configuration, performance was the most
important factor, therefore Enhanced Intel SpeedStep® Technology was
disabled in the BIOS to take control away from the OS.
¹ The ACPI specification V3.0a defines the following states: Global system
power states (G-states, S0, S5), System sleeping states (S-states S1-S4),
Device power states (D-states), Processor power states (C-states), Device
and processor performance states (P-states). See the specification for details.
Cache & Hardware Prefetchers
A cache is a temporary storage location that is used to reduce the access
time to frequently accessed instructions or data. Intel® architecture
processors support multiple levels of cache. Level 1 (L1) cache is the smallest
in size and offers the lowest data access latency from the CPU. Level 2 (L2)
cache typically offers quite a bit more temporary storage than L1 cache, but
the access latencies increase. The Intel® Xeon® 5500 series processor also
has a very large Level 3 (L3) cache (up to 8 MB). The L3 cache has larger
access latencies than L1 or L2 caches, but it is still much faster than a
memory access.
Embedded Intel® Architecture Processors are capable of speculatively
predicting that data is probably going to be needed by the pipeline in the
near future and can read data into cache before the processor actually
requires it. This is known as prefetching and helps to reduce the pipeline
stalls that are attributable to waiting on memory accesses.
For this test, the hardware prefetchers were all enabled in the BIOS.
Intel® Hyper-Threading Technology
Intel® Hyper-Threading Technology (Intel® HT Technology) enables
parallelism at the thread level on each processor core. Two hardware threads
per core are supported. With Intel® HT Technology enabled, the Operating
System (OS) sees twice the number of cores. For example, a six-core Intel®
Xeon® processor with Intel® HT Technology enabled presents the OS with
twelve (logical) cores. The OS scheduler is aware of the logical cores that
share physical resources and will typically endeavor to schedule workloads
across physical cores before loading two threads onto the same core.
For the tests described in this paper, Intel® HT Technology was
enabled/disabled as follows:
• Single tunnel test: Intel® HT Technology was enabled; however, it did not
affect the result because all processing was performed on one core with all
other cores idle.
• Six tunnel test: each tunnel was allocated to a specific core. Intel® HT
Technology was disabled to ensure that two tunnels were not inadvertently
running on logical cores that mapped to the same physical core.
• Twelve tunnel test: Intel® HT Technology was enabled to present twelve
logical cores to the OS. Each tunnel was again allocated to a specific core. In
this instance, pairs of tunnels were sharing the same physical core.
Traffic Generator Configuration
The traffic generator was configured to create one, six, or twelve interleaving
plaintext flows with Ethernet frame sizes ranging from 64 to 1454 bytes. The
1454 byte maximum value was chosen to avoid IP packet fragmentation
occurring on the VPN connection.
When the traffic generator transmits an Ethernet frame of 1454 bytes
(excluding CRC), the resulting Ethernet frame on the VPN tunnel is
1508 bytes (again excluding CRC). The Ethernet frame on the VPN tunnel is
larger because it contains a new 20 byte encapsulating IP header, a 16 byte ESP
header and Initialization Vector (IV), zero bytes of ESP padding (for this
packet size), a two byte ESP trailer, and a 16 byte Integrity Check Value
(ICV), a total of 54 additional bytes. This particular frame contains 90
16-byte AES blocks, the maximum number of AES blocks that can fit in the frame
before IP fragmentation occurs (assuming the common 1500 byte MTU limit).
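The 54 byte expansion can be checked with a few lines of arithmetic. This minimal sketch uses only the header sizes quoted above, plus an assumed 14 byte untagged Ethernet header to derive the outer IP packet size; the function names are illustrative.

```python
# Byte counts quoted in the text for AES-GCM ESP tunnel mode.
OUTER_IP_HEADER = 20    # new encapsulating IPv4 header
ESP_HEADER_AND_IV = 16  # ESP header plus Initialization Vector
ESP_TRAILER = 2         # two byte ESP trailer
ICV = 16                # Integrity Check Value
ETH_HEADER = 14         # assumption: untagged Ethernet header (no VLAN tag)
MTU = 1500              # common Ethernet MTU


def tunnel_frame_size(plain_frame, esp_padding=0):
    """Ethernet frame size on the VPN tunnel for a given plaintext frame.

    Both sizes are measured the same way (with or without CRC). ESP padding
    depends on the packet size; the text gives zero for the 1454 byte case.
    """
    return (plain_frame + OUTER_IP_HEADER + ESP_HEADER_AND_IV
            + esp_padding + ESP_TRAILER + ICV)


def outer_ip_packet_size(plain_frame, esp_padding=0):
    """Outer IP packet size (plain_frame excludes CRC); must not exceed the MTU."""
    return tunnel_frame_size(plain_frame, esp_padding) - ETH_HEADER


print(tunnel_frame_size(1454))     # 1508
print(outer_ip_packet_size(1454))  # 1494, within the 1500 byte MTU
```

Applied to the 64 byte minimum frame (CRC included on both sides), the same function gives the 118 byte tunnel frame mentioned in the IMIX discussion.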
IPSec Internet Packet Mix (IMIX)
A common method for assessing packet processing performance is to
configure the test generator to transmit packets that fit a distribution pattern.
This pattern is known as an Internet Packet Mix (IMIX). The pattern usually
represents the expected packet distribution the device under test will be
exposed to in the production environment.
There are many definitions of an IPSec IMIX distribution. Spirent
Communications* has defined an IPSec IMIX distribution composed
of 58.67% 90 byte packets, 2% 92 byte packets, 23.66% 594 byte
packets, and 15.67% 1418 byte packets [7].
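A distribution like this is often summarized by its weighted mean packet size. The sketch below computes it from the Spirent percentages quoted above; it is plain arithmetic on those figures, and the helper name is illustrative.

```python
# Spirent IPSec IMIX as quoted in the text: (frame size in bytes, share in %).
SPIRENT_IPSEC_IMIX = [(90, 58.67), (92, 2.0), (594, 23.66), (1418, 15.67)]


def mean_packet_size(mix):
    """Weighted mean packet size; the weights need not sum to 100."""
    total_weight = sum(weight for _, weight in mix)
    return sum(size * weight for size, weight in mix) / total_weight


print(round(mean_packet_size(SPIRENT_IPSEC_IMIX), 1))  # 417.4
```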
If the Spirent* IPSec IMIX distribution describes packets in the VPN tunnel,
a traffic generator that is not an endpoint of the IPSec connection (as per
Figure 3) cannot generate this exact frame distribution. The smallest frame
that the traffic generator can transmit is 64 bytes (including Ethernet CRC).
Anything smaller is a runt Ethernet frame and is not standard-compliant. A 64
byte Ethernet frame transmitted from the traffic generator equates to a 118
byte Ethernet frame on the VPN tunnel, which is already larger than the 90 and
92 byte frames in the Spirent definition.
For completeness, this paper captures performance results with both the
Spirent IMIX definition and a custom definition that closely matches the
assumed intention of the Spirent IMIX definition.
Performance Results
This section examines the performance results measured on the platform
under various configurations. The performance data shown was captured by
the traffic generator and does not account for the higher throughput at
which the device under test operates on the VPN tunnel side: the Ethernet
frame in the VPN tunnel that encapsulates an IPSec ESP packet for AES-GCM is
at least 54 bytes larger than the plaintext Ethernet frame sent by the
traffic generator.
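Because the packet rate is identical on both sides of the device under test, a lower bound on the throughput it carries on the VPN tunnel can be estimated by scaling the reported figure by the frame-size ratio. A minimal sketch, assuming the 54 byte minimum overhead and ignoring ESP padding and the Ethernet preamble and inter-frame gap; the function name is hypothetical.

```python
MIN_ESP_GCM_OVERHEAD = 54  # bytes added per frame on the tunnel (lower bound from the text)


def tunnel_side_mbps(measured_mbps, plain_frame, overhead=MIN_ESP_GCM_OVERHEAD):
    """Scale generator-measured plaintext throughput to the tunnel side.

    The packet rate is the same on both sides; only the frame size differs.
    """
    return measured_mbps * (plain_frame + overhead) / plain_frame


print(tunnel_side_mbps(1000, 64))  # 1843.75
```

The correction is largest for small frames, where 54 bytes is a large fraction of the frame, and shrinks to a few percent for the largest frames.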
Single Tunnel Performance
For the single tunnel performance test, a unidirectional IPSec flow was
affinitized to a single core. Figure 4 shows the performance in Mbps of four
different software configurations handling the same load.
The top blue line called “1VPN – NULL Cipher” represents the performance
measured when the actual cipher operation portion of the IPSec VPN tunnel is
stubbed out to a NULL operation. This gives an indication of the theoretical
maximum IPSec packet processing performance if the cipher operation could
be completed in zero cycles. This line effectively represents the upper-bound
per-core packet processing capability that the operating system Ethernet, IP,
and IPSec stacks impose on the system.
The green line, second from the top, shows the performance achieved with
the new Linux AES-NI-GCM Crypto driver installed. For larger packets, the
throughput is over 2 Gbps for this core.
The red line, second from the bottom, shows the performance achieved when
the existing AES-NI based Linux crypto driver from the 2.6.31.4 kernel is
loaded on the platform. It delivers a modest gain in performance, with the
chart showing ~500 Mbps for large packets.
The bottom purple line on the chart shows the performance achieved when
running the test with no AES-NI software support. For the larger packets, it
maxes out at ~450 Mbps.
The chart shows that the new AES-NI-GCM crypto driver represents a
substantial 4x increase in performance over the existing AES-NI based code.
Figure 4. Single IPSec Tunnel Performance in Mbps
An alternative way to examine the performance data is to convert it from
throughput in Mbps versus packet size in bytes to cycles per packet versus
packet size. Figure 5 presents this alternative view of performance. The
trend lines on the figure are linear equations that describe the system: the
slope represents the per-byte cycle cost of the crypto operation and the
Y-intercept represents the fixed per-packet cycle cost. These equations
assume 100% CPU loading. The colors of the chart data are the same as for
Figure 4; however, the positions are reversed (no AES-NI on top, NULL cipher
on bottom).
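Assuming 100% CPU loading, as the text does, cycles per packet is the core clock frequency divided by the packet rate, and the packet rate is the throughput divided by the bits per packet. The sketch below converts a throughput reading in that way and fits a trend line by ordinary least squares (an assumption about how the trend lines were produced). The core frequency is a parameter because it is not stated in this section, and the sample points in the usage block are synthetic, generated from a hypothetical line rather than taken from Figure 5.

```python
def cycles_per_packet(throughput_mbps, frame_bytes, core_hz):
    """Cycles consumed per packet by one fully loaded core.

    Packets per second = bits per second / bits per packet; at 100% CPU
    loading every core cycle is attributed to packet processing.
    """
    packets_per_second = throughput_mbps * 1e6 / (frame_bytes * 8)
    return core_hz / packets_per_second


def fit_line(sizes, cycles):
    """Ordinary least-squares fit: cycles ~= slope * size + intercept."""
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(cycles) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, cycles))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    # Synthetic points from a hypothetical line, not measured data.
    sizes = [64, 256, 512, 1024, 1454]
    cycles = [4.6 * s + 6700 for s in sizes]
    print(fit_line(sizes, cycles))
```

The slope returned by the fit is the per-byte cost and the intercept is the fixed per-packet cost, matching the interpretation given above.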
Two salient data points emerge from this chart. The first is that the standard
Linux 2.6.31.4 kernel requires ~6700 cycles per packet to perform IPSec with
a NULL cipher routine. The second is the enormous per-byte cycle saving
delivered by the new Linux AES-NI-GCM crypto driver. With the existing AES-NI
based solution, the cost per byte was ~32 cycles. The new driver can perform
the same AES-GCM operation on a byte of data in a considerably lower ~4.6
cycles. This represents an approximately 900% reduction in the cycles-per-byte
required to perform the AES-GCM crypto operation when compared to a
non-AES-NI enabled software solution.
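A rough worked estimate using only the approximate figures quoted above: multiplying each slope by the 1454 byte maximum test frame gives the size-dependent crypto cost per packet, which can then be set against the ~6700 cycle NULL cipher overhead. This treats the slope as purely crypto cost, as the text does, and ignores any error in the trend lines.

```python
EXISTING_AESNI_CYCLES_PER_BYTE = 32.0  # ~32 cycles/byte, existing AES-NI driver
AESNI_GCM_CYCLES_PER_BYTE = 4.6        # ~4.6 cycles/byte, new AES-NI-GCM driver
NULL_CIPHER_FIXED_CYCLES = 6700        # ~6700 cycles/packet with a NULL cipher


def crypto_cycles(frame_bytes, cycles_per_byte):
    """Per-packet cycle cost of the size-dependent (crypto) part of the model."""
    return cycles_per_byte * frame_bytes


if __name__ == "__main__":
    for name, cpb in (("existing AES-NI", EXISTING_AESNI_CYCLES_PER_BYTE),
                      ("AES-NI-GCM", AESNI_GCM_CYCLES_PER_BYTE)):
        crypto = crypto_cycles(1454, cpb)
        print(f"{name}: {crypto:.0f} crypto cycles, "
              f"{crypto / NULL_CIPHER_FIXED_CYCLES:.1f}x the NULL cipher stack cost")
```

With the new driver the crypto term for a 1454 byte frame (~6690 cycles) is roughly equal to the fixed stack cost, whereas with the existing driver it is about seven times larger.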
Figure 5. Single IPSec Tunnel Performance in cycles per packet
Figure 5 shows that with the existing AES-NI driver, a large proportion of
the core's cycle budget is spent on the cipher operation for large packets.
With the new Linux AES-NI-GCM crypto driver this balance has changed, and the
majority of cycles are now spent in the non-crypto portion of the workload.
Figure 6 compares the cycles spent in the Ethernet, IP and IPSec stacks in
the native Linux kernel with the cycles spent in the crypto driver to perform
AES-GCM using the new instructions. The data in Figure 6 was derived from the
“1VPN – NULL Cipher” and “1 VPN AES-NI-GCM” linear equations in Figure 5. The
larger red bars show that the Linux kernel is now the most cycle-intensive
component in the software. This is most obvious for small packets, where 88%
of the cycles were spent running the Ethernet, IP and IPSec stacks.
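The split shown in Figure 6 can be reproduced from any two fitted trend lines: the NULL cipher line approximates the stack cost, and the difference between the two lines is attributed to the crypto driver. The sketch below implements that subtraction; the coefficients in the usage block are hypothetical placeholders, not the values behind Figure 6.

```python
def crypto_share(frame_bytes, null_line, gcm_line):
    """Fraction of per-packet cycles spent in the crypto driver.

    Each line is a (slope, intercept) pair from a cycles-per-packet fit. The
    NULL cipher line approximates the Ethernet, IP and IPSec stack cost; the
    difference between the two lines is attributed to the crypto driver.
    """
    null_cycles = null_line[0] * frame_bytes + null_line[1]
    gcm_cycles = gcm_line[0] * frame_bytes + gcm_line[1]
    return (gcm_cycles - null_cycles) / gcm_cycles


if __name__ == "__main__":
    # Hypothetical (slope, intercept) pairs for illustration only; substitute
    # the coefficients of the fitted trend lines from Figure 5.
    null_line = (0.5, 6700.0)
    gcm_line = (5.1, 7000.0)
    for size in (64, 512, 1454):
        share = crypto_share(size, null_line, gcm_line)
        print(f"{size} bytes: crypto {share:.0%}, stack {1 - share:.0%}")
```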
Figure 6. Single IPSec Tunnel - Percentage of time for crypto vs. non-crypto processing
Because of the significant number of cycles per packet required to process an
IPSec ESP packet, the Linux kernel does not make full use of the processor.
To push the performance envelope further, lower-overhead operating systems
and software stacks could be installed. These lower-overhead operating
systems are sometimes called micro-kernels or Run-Time Executives (RTE). An
investigation into the performance benefits of using RTEs is outside the
scope of this paper.
Six Tunnel Performance
For the six simultaneous tunnels performance test, a unidirectional IPSec flow
for each tunnel was affinitized to a single core. Figure 7 shows the
performance in Mbps of four different software configurations handling the
same load. In this configuration, Intel® HT Technology was disabled so that
the OS could only see six cores.
Figure 7 shows that the processor running the new Linux AES-NI-GCM crypto
driver reaches 10 Gbps line rate at larger packet sizes. The existing AES-NI
driver shows performance scaling to ~3 Gbps for the platform. Comparing this
with the single tunnel result for the same driver in Figure 4 (~500 Mbps)
shows close to linear scaling of performance from one to six cores.
With the new Linux AES-NI-GCM crypto driver installed in the platform, the
10 Gbps I/O performance ceiling is reached, with line rate achieved at
packet sizes of ~1280 bytes.
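Reaching line rate is a packet-rate problem as much as a bandwidth one. The sketch below computes the maximum frame rate on a 10 Gbps Ethernet link, counting the standard 20 bytes of preamble, start-of-frame delimiter and inter-frame gap per frame, and the resulting per-packet cycle budget when the load is spread across several cores. The core frequency is a caller-supplied parameter because it is not given in this section, and the function names are hypothetical; which side of the device under test saturates first is left open, since tunnel frames are at least 54 bytes larger.

```python
ETH_WIRE_OVERHEAD = 20  # preamble + start-of-frame delimiter (8) + inter-frame gap (12)


def line_rate_pps(frame_bytes, link_bps=10e9):
    """Maximum frames per second on an Ethernet link (frame size includes CRC)."""
    return link_bps / ((frame_bytes + ETH_WIRE_OVERHEAD) * 8)


def cycles_per_packet_budget(frame_bytes, cores, core_hz, link_bps=10e9):
    """Cycles available per packet when `cores` cores share line-rate traffic."""
    return cores * core_hz / line_rate_pps(frame_bytes, link_bps)


if __name__ == "__main__":
    print(f"{line_rate_pps(1280):.0f} frames/s at 1280 bytes")
```

Smaller frames raise the packet rate and shrink the cycle budget per packet, which is why line rate is reached only above a certain frame size when the fixed per-packet cost dominates.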
Figure 7. Performance in Mbps for Six Simultaneous IPSec Tunnels
Twelve Tunnel Performance
For the twelve simultaneous tunnels performance test, a unidirectional IPSec
flow for each tunnel was affinitized to a single logical core. Figure 8 shows the
performance in Mbps of four different software configurations handling the
same load. In this configuration, Intel® HT Technology was enabled so that
the OS could see twelve cores.
This data shows that with the new Linux AES-NI-GCM crypto driver, the
10 Gbps I/O performance ceiling is reached at much smaller packet sizes: line
rate is achieved at ~950 byte packets, compared with ~1280 bytes in the six
tunnel test. This clearly demonstrates that Intel® HT Technology has a
positive impact on packet processing performance.
Figure 8. Performance in Mbps for 12 Simultaneous IPSec Tunnels
IPSec IMIX Performance
For the IPSec IMIX performance test, two different IMIX distributions were
used. The first was the standard Spirent-defined IPSec IMIX distribution,
shown as the red bars in Figure 9. Setting up this distribution with a traffic
generator that is not an endpoint for the IPSec connection results in a
different packet distribution on the VPN tunnel than was probably intended by
its creators. See the IPSec Internet Packet Mix (IMIX) section for distribution
details.
The second IMIX distribution is customized to closely simulate the intended
Spirent-defined IPSec IMIX distribution on the VPN connection. The new
distribution is shown as “IPSec IMIX – Custom” blue bars in Figure 9. This
distribution used 64 byte packets with a relative weighting of 60, 540 byte
packets with a relative weighting of 23, and 1364 byte packets with a relative
weighting of 15.
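Adding the 54 byte minimum ESP overhead to the custom sizes shows how they line up with the Spirent definition on the tunnel: 540 and 1364 byte frames become 594 and 1418 byte frames, and the 64 byte frames become 118 bytes, the smallest possible tunnel frame, standing in for the 90 and 92 byte entries. A minimal sketch, with an illustrative helper name:

```python
ESP_GCM_OVERHEAD = 54  # bytes added on the tunnel, per the text

# Custom IMIX from the text: (plaintext frame size in bytes, relative weight).
CUSTOM_IMIX = [(64, 60), (540, 23), (1364, 15)]


def on_tunnel(mix, overhead=ESP_GCM_OVERHEAD):
    """Shift each frame size to its size on the VPN tunnel and convert the
    relative weights to percentages."""
    total = sum(weight for _, weight in mix)
    return [(size + overhead, 100.0 * weight / total) for size, weight in mix]


for size, pct in on_tunnel(CUSTOM_IMIX):
    print(f"{size:5d} bytes  {pct:5.2f}%")
```

The normalized shares (about 61.2%, 23.5% and 15.3%) are close to the Spirent shares of 60.67% (90 and 92 byte frames combined), 23.66% and 15.67%.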
The key points to take away from this chart are that the one to six tunnel
tests show near-linear scaling across the cores (5.6x versus a linear scaling
factor of 6x), and that enabling Intel® HT Technology in the platform and
running twelve tunnels increases platform performance by a further 26%.
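The scaling figures above reduce to two simple ratios. The sketch computes the parallel efficiency of the one to six tunnel result (5.6x on six cores, about 93% of linear) and provides a generic relative-gain helper of the kind behind the 26% Intel® HT Technology figure; the raw IMIX throughputs are not given in the text, so that helper is shown without them, and both function names are illustrative.

```python
def scaling_efficiency(speedup, cores):
    """Measured speedup as a fraction of ideal linear scaling."""
    return speedup / cores


def percent_gain(new, old):
    """Relative improvement of `new` over `old`, in percent."""
    return 100.0 * (new - old) / old


print(f"{scaling_efficiency(5.6, 6):.0%}")  # 93%
```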
Figure 9. IPSec IMIX Performance in Mbps
Conclusion
This paper focuses on the design and performance capabilities of an
implementation of IPSec in Linux that is configured to use the Advanced
Encryption Standard (AES) cipher in Galois Counter Mode (GCM). The
implementation leverages new instructions in the Intel® microarchitecture
formerly codenamed Westmere, which is currently available in certain Intel®
Xeon® processors and Intel® Core™ processors.
An analysis of the single tunnel, single-threaded, single core performance
results reveals that the combined AES-GCM driver based on Intel® AES-NI
delivers a ~400% increase in Linux IPSec large packet throughput when
compared to a non-AES-NI enabled software solution running on the same
platform. An even more salient point is the ~900% reduction in the
cycles-per-byte required to perform the AES-GCM crypto operation when
compared to a non-AES-NI enabled software solution.
An analysis of the multiple tunnel, multiple core performance results
reveals that the platform can scale to deliver 10 Gbps line rate for packet
sizes of approximately 900 bytes and upwards. This is a significant increase
over the non-AES-NI enabled solution, which could deliver approximately
2.5 Gbps.
In this configuration, Intel® Hyper-Threading Technology is shown to offer a
significant advantage for this packet processing workload. An analysis of the
IPSec IMIX performance shows that the solution with Intel® HT Technology
enabled provides up to 26% more throughput than the solution without Intel®
HT Technology.
Because of the significant number of cycles required to process an IPSec ESP
packet, the Linux kernel does not make full use of the processor. To explore
the full performance capabilities of the processor, the platform software
should be configured for optimal performance [8]. Alternatively, optimized
software stacks based on a micro-kernel or Run-Time Executive should be used.
Nonetheless, the data presented in this paper demonstrates that an AES-NI
enabled IPSec stack on Linux, running on a processor based on the new Intel®
microarchitecture, can deliver substantial IPSec performance increases over
previous generations of silicon.
References
1. Intel® AES New Instructions: http://software.intel.com/en-us/articles/intel-advanced-encryption-standard-instructions-aes-ni/
2. Rijndael Specification: http://csrc.nist.gov/archive/aes/rijndael/Rijndael-ammended.pdf
3. Advanced Encryption Standard. FIPS 197: http://csrc.nist.gov/publications/fips/fips197/fips-197.pdf
4. Optimized Galois-Counter-Mode Implementation on Intel® Architecture Processors: http://download.intel.com/design/intarch/PAPERS/324194.pdf
5. Advanced Configuration and Power Interface (ACPI) specification: http://www.acpi.info/spec.htm
6. Enhanced Intel® SpeedStep® Technology: http://www.intel.com/support/processors/sb/CS-028855.htm
7. IPSec IMIX defined by Spirent Communications*: http://spcprev.spirentcom.com/documents/4079.pdf
8. Design considerations for efficient network applications with Intel® multi-core processor-based systems on Linux: http://download.intel.com/design/intarch/papers/324176.pdf
The Intel® Embedded Design Center provides qualified developers with web-based
access to technical resources. Access Intel Confidential design materials,
step-by-step guidance, application reference solutions, training, Intel’s
tool loaner program, and connect with an e-help desk and the embedded
community. Design Fast. Design Smart. Get started today.
www.intel.com/embedded/edc.
Author
Adrian Hoban is a software engineer with the Embedded and
Communications Group at Intel Corporation.
Contributors
Tadeusz Struk, Gabriele Paoloni, and Aidan O’ Mahony are
software engineers with the Embedded and Communications
Group at Intel Corporation.
Wajdi Feghali, Erdinc Ozturk, James Guilford, and Vinodh Gopal
are architects with the Intel Architecture Group at Intel
Corporation.
Edward Clinton and Ken Reynolds are engineering managers with
the Embedded and Communications Group at Intel Corporation.
Acronyms
AEAD    Authenticated Encryption with Associated Data
AES     Advanced Encryption Standard
AES-NI  Advanced Encryption Standard New Instructions
AH      Authentication Header
API     Application Programming Interface
BIOS    Basic Input Output System
CRC     Cyclic Redundancy Check
DDR3    Double Data Rate 3
EDC     Embedded Design Center
EIST    Enhanced Intel® SpeedStep® Technology
ESP     Encapsulating Security Payload
FIPS    Federal Information Processing Standards Publication
GCM     Galois Counter Mode
HT      Intel® Hyper-Threading Technology
ICV     Integrity Check Value
IKE     Internet Key Exchange
IMIX    Internet (Packet) Mix
IP      Internet Protocol
IPSec   Internet Protocol Security
IV      Initialization Vector
NSA     National Security Agency
OS      Operating System
RAM     Random Access Memory
RTE     Run Time Executive
SADB    Security Association Database
SPDB    Security Policy Database
VPN     Virtual Private Network
XFRM    Transformer Module
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED,
BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS
PROVIDED IN INTEL’S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER,
AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS
INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR
INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. UNLESS OTHERWISE AGREED IN
WRITING BY INTEL, THE INTEL PRODUCTS ARE NOT DESIGNED NOR INTENDED FOR ANY APPLICATION IN WHICH THE FAILURE
OF THE INTEL PRODUCT COULD CREATE A SITUATION WHERE PERSONAL INJURY OR DEATH MAY OCCUR.
Intel may make changes to specifications and product descriptions at any time, without notice.
Intel® AES-NI requires a computer system with an AES-NI enabled processor, as well as non-Intel software to execute the
instructions in the correct sequence. AES-NI is available on Intel® Core™ i5-600 Desktop Processor Series, Intel® Core™ i7-600
Mobile Processor Series, and Intel® Core™ i5-500 Mobile Processor Series. For availability, consult your reseller or system
manufacturer. For more information, see http://softwarecommunity.intel.com/isn/downloads/intelavx/AES-InstructionsSet_WP.pdf
Requires an Intel® HT Technology enabled system, check with your PC manufacturer. Performance will vary depending on the
specific hardware and software used. Not available on Intel® Core™ i5-750. For more information including details on which
processors support HT Technology, visit http://www.intel.com/info/hyperthreading
Enhanced Intel SpeedStep® Technology: See the Processor Spec Finder at http://ark.intel.com/ or contact your Intel
representative for more information.
Intel, the Intel logo, Intel Core, Intel SpeedStep, and Xeon are trademarks of Intel Corporation in the U.S. and/or other
countries.
*Other names and brands may be claimed as the property of others.
Copyright © 2010 Intel Corporation. All rights reserved.