VMA Offloading
Quick Start Guide
Rev 1.0
www.mellanox.com
Mellanox Technologies Confidential
NOTE:
THIS HARDWARE, SOFTWARE OR TEST SUITE PRODUCT (“PRODUCT(S)”) AND ITS RELATED
DOCUMENTATION ARE PROVIDED BY MELLANOX TECHNOLOGIES “AS-IS” WITH ALL FAULTS OF ANY
KIND AND SOLELY FOR THE PURPOSE OF AIDING THE CUSTOMER IN TESTING APPLICATIONS THAT USE
THE PRODUCTS IN DESIGNATED SOLUTIONS. THE CUSTOMER'S MANUFACTURING TEST ENVIRONMENT
HAS NOT MET THE STANDARDS SET BY MELLANOX TECHNOLOGIES TO FULLY QUALIFY THE PRODUCT(S)
AND/OR THE SYSTEM USING IT. THEREFORE, MELLANOX TECHNOLOGIES CANNOT AND DOES NOT
GUARANTEE OR WARRANT THAT THE PRODUCTS WILL OPERATE WITH THE HIGHEST QUALITY. ANY
EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT ARE DISCLAIMED.
IN NO EVENT SHALL MELLANOX BE LIABLE TO CUSTOMER OR ANY THIRD PARTIES FOR ANY DIRECT,
INDIRECT, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES OF ANY KIND (INCLUDING, BUT NOT
LIMITED TO, PAYMENT FOR PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA,
OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY FROM THE USE OF THE PRODUCT(S) AND RELATED DOCUMENTATION EVEN IF
ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Mellanox Technologies
350 Oakmead Parkway Suite 100
Sunnyvale, CA 94085
U.S.A.
www.mellanox.com
Tel: (408) 970-3400
Fax: (408) 970-3403
© Copyright 2016. Mellanox Technologies Ltd. All Rights Reserved.
Mellanox®, Mellanox logo, Accelio®, BridgeX®, CloudX logo, CompustorX®, Connect-IB®, ConnectX®, CoolBox®,
CORE-Direct®, EZchip®, EZchip logo, EZappliance®, EZdesign®, EZdriver®, EZsystem®, GPUDirect®, InfiniHost®,
InfiniScale®, Kotura®, Kotura logo, Mellanox Federal Systems®, Mellanox Open Ethernet®, Mellanox ScalableHPC®,
Mellanox TuneX®, Mellanox Connect Accelerate Outperform logo, Mellanox Virtual Modular Switch®, MetroDX®,
MetroX®, MLNX-OS®, NP-1c®, NP-2®, NP-3®, Open Ethernet logo, PhyX®, PSIPHY®, SwitchX®, Tilera®, Tilera logo,
TestX®, TuneX®, The Generation of Open Ethernet logo, UFM®, Virtual Protocol Interconnect®, Voltaire® and Voltaire
logo are registered trademarks of Mellanox Technologies, Ltd.
All other trademarks are property of their respective owners.
For the most updated list of Mellanox trademarks, visit http://www.mellanox.com/page/trademarks
Document Number: MLNX-15-51330
Table of Contents
Document Revision History
1 Overview
1.1 Prerequisites
2 Installing VMA
3 Binding VMA to the Closest NUMA
3.1 VMA Tuning Parameters
3.2 Configuring the BIOS
4 Related Documentation
List of Tables
Table 1: Document Revision History
Table 2: VMA Tuning Parameters
Table 3: Related Documentation
Document Revision History
Table 1: Document Revision History
Revision    Date           Description
1.0         August 2016    Initial version of this document
1 Overview
This document describes, step by step, how to implement VMA offloading in your setup.
The setup used in this document consists of two x86_64 HP DL-380 servers running RH7.1,
connected to a Mellanox Ethernet switch.
1.1 Prerequisites
• Two machines: one serves as the server and the other as the client
• Management interfaces configured with IP addresses so that the machines can ping each other
• A Mellanox NIC physically installed in each machine
• Your system must recognize the Mellanox NIC. To verify this, run "lspci | grep Mellanox"
Example output:
[root@r-host141 0]# lspci |grep Mell
81:00.0 Network controller: Mellanox Technologies MT27520 Family [ConnectX-3]
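The NIC check above can be scripted so that a setup script stops early when no adapter is found. This is a minimal sketch, not part of the original guide: the function name has_mellanox_nic is hypothetical, and it only pattern-matches the text that lspci prints.

```shell
#!/bin/sh
# Succeeds (exit status 0) if the lspci listing read from stdin
# contains at least one Mellanox device, and fails otherwise.
# has_mellanox_nic is a hypothetical helper name.
has_mellanox_nic() {
    grep -qi 'mellanox'
}

# Typical use on a real host (lspci is part of the pciutils package):
#   lspci | has_mellanox_nic && echo "Mellanox NIC detected"
```

Reading the listing from stdin keeps the helper independent of the hardware, so it can also be tried against saved lspci output.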
2 Installing VMA
1. Download the latest MLNX_OFED from:
   http://www.mellanox.com/page/products_dyn?product_family=26&mtag=linux_sw_drivers
   Note: Download the MLNX_OFED<version>.tgz file (NOT the iso).
2. Unpack the file and install the driver.
   a. Copy the driver package to the test machines.
   b. In the directory to which you copied the package, unpack it:
      tar -xvf MLNX_OFED_LINUX-<version>.tgz
   c. Change to the install directory and install the software. This requires root access.
      cd MLNX_OFED_LINUX-<version>
      ./mlnxofedinstall --vma --force
   d. Type "yes" when asked, and reboot the server when the installation is completed.
3. Configure IP addresses for the Mellanox interfaces on both the server and the client, and verify that they can ping each other.
4. Verify that the libvma RPMs are installed.
   rpm -qa | grep libvma
[root@r-host144 ~]# rpm -qa |grep libvma
libvma-utils-7.0.14-1.x86_64
libvma-7.0.14-1.x86_64
libvma-devel-7.0.14-1.x86_64
5. Run sockperf without VMA.
   • On the first machine (server side): sockperf server
   • On the second machine (client side): sockperf ping-pong -i <IP of FIRST machine MLX interface> -t <test duration>
Example output:
Server side:
[root@r-host142 tmp]# sockperf sr
sockperf: == version #2.7-41.git241c4528ae75 ==
sockperf: [SERVER] listen on:
[ 0] IP = 0.0.0.0 PORT = 11111 # UDP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: [tid 17945] using recvfrom() to block on socket(s)
Client side:
[root@r-host144 tmp]# sockperf pp -i 11.209.13.142
sockperf: == version #2.7-41.git241c4528ae75 ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)
[ 0] IP = 11.209.13.142 PORT = 11111 # UDP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=1.100 sec; SentMessages=82494; ReceivedMessages=82493
sockperf: ========= Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=1.000 sec; SentMessages=75419; ReceivedMessages=75419
sockperf: ====> avg-lat= 6.612 (std-dev=2.864)
sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
sockperf: Summary: Latency is 6.612 usec
sockperf: Total 75419 observations; each percentile contains 754.19 observations
sockperf: ---> <MAX> observation = 133.894
sockperf: ---> percentile 99.99 =   31.852
sockperf: ---> percentile 99.90 =   23.490
sockperf: ---> percentile 99.50 =   20.164
sockperf: ---> percentile 99.00 =   19.666
sockperf: ---> percentile 95.00 =   12.456
sockperf: ---> percentile 90.00 =   10.296
sockperf: ---> percentile 75.00 =    6.013
sockperf: ---> percentile 50.00 =    5.761
sockperf: ---> percentile 25.00 =    5.463
sockperf: ---> <MIN> observation =    5.042
6. Run sockperf with VMA.
   • On the first machine: LD_PRELOAD=libvma.so sockperf server
   • On the second machine: LD_PRELOAD=libvma.so sockperf ping-pong -t <test duration> -i <IP of FIRST machine MLX interface>
Example output:
Server side:
[root@r-host142 ~]# LD_PRELOAD=libvma.so sockperf server
VMA INFO : ---------------------------------------------------------------------------
VMA INFO : VMA_VERSION: 7.0.14-0 Release built on Dec 7 2015 13:14:39
VMA INFO : Cmd Line: sockperf server
VMA INFO : OFED Version: MLNX_OFED_LINUX-3.2-0.0.3.0:
VMA INFO : Log Level 3 [VMA_TRACELEVEL]
VMA INFO : ---------------------------------------------------------------------------
sockperf: == version #2.7-41.git241c4528ae75 ==
sockperf: [SERVER] listen on:
[ 0] IP = 0.0.0.0 PORT = 11111 # UDP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: [tid 33634] using recvfrom() to block on socket(s)
Client side:
[root@r-host144 ~]# LD_PRELOAD=libvma.so sockperf ping-pong -i 11.209.13.142
VMA INFO : ---------------------------------------------------------------------------
VMA INFO : VMA_VERSION: 7.0.14-0 Release built on Dec 7 2015 13:14:39
VMA INFO : Cmd Line: sockperf ping-pong -i 11.209.13.142 -t 10
VMA INFO : OFED Version: MLNX_OFED_LINUX-3.2-0.0.3.0:
VMA INFO : Log Level 3 [VMA_TRACELEVEL]
VMA INFO : ---------------------------------------------------------------------------
sockperf: == version #2.7-41.git241c4528ae75 ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)
[ 0] IP = 11.209.13.142 PORT = 11111 # UDP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=1.100 sec; SentMessages=377651; ReceivedMessages=377650
sockperf: ========= Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=1.000 sec; SentMessages=348797; ReceivedMessages=348797
sockperf: ====> avg-lat= 1.420 (std-dev=0.164)
sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
sockperf: Summary: Latency is 1.420 usec
sockperf: Total 348797 observations; each percentile contains 3487.97 observations
sockperf: ---> <MAX> observation =    9.325
sockperf: ---> percentile 99.99 =    6.666
sockperf: ---> percentile 99.90 =    2.337
sockperf: ---> percentile 99.50 =    2.121
sockperf: ---> percentile 99.00 =    2.088
sockperf: ---> percentile 95.00 =    1.621
sockperf: ---> percentile 90.00 =    1.450
sockperf: ---> percentile 75.00 =    1.407
sockperf: ---> percentile 50.00 =    1.387
sockperf: ---> percentile 25.00 =    1.366
sockperf: ---> <MIN> observation =    1.309
Note the additional VMA header in the output on both the client and the server.
Average latency:
• Over the OS: 6.612 usec
• Over VMA: 1.420 usec
These machines are running a fresh OS install, with no hardware or OS tuning. With tuning,
these numbers will be reduced further.
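To compare the two runs without reading the numbers by hand, the latency can be pulled from the "Summary: Latency is ... usec" line of the sockperf client output. The following is a minimal sketch: the function names sockperf_latency and latency_ratio are hypothetical, and it assumes the summary line has the format shown in the example output above.

```shell
#!/bin/sh
# Print the average latency (usec) found in sockperf client output on stdin.
# Assumes a line of the form "sockperf: Summary: Latency is <value> usec".
sockperf_latency() {
    awk '/Summary: Latency is/ { print $(NF-1) }'
}

# Print the ratio between the OS latency ($1) and the VMA latency ($2).
latency_ratio() {
    awk -v os="$1" -v vma="$2" 'BEGIN { printf "%.2f\n", os / vma }'
}

# Example with the values measured above:
#   latency_ratio 6.612 1.420
```

With the values from the two runs above (6.612 usec and 1.420 usec), the ratio is 4.66, that is, the VMA run shows roughly 4.7 times lower average latency.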
3 Binding VMA to the Closest NUMA
1. Check which NUMA node your interface is attached to.
   cat /sys/class/net/<interface_name>/device/numa_node
   Example:
   [root@r-host142 ~]# cat /sys/class/net/ens5/device/numa_node
   1
   The output above shows that your device is attached to NUMA node 1.
2. Check which CPUs belong to that NUMA node.
   [root@r-host144 ~]# lscpu
   NUMA node0 CPU(s): 0-13,28-41
   NUMA node1 CPU(s): 14-27,42-55
   The output above shows that:
   • CPUs 0-13 and 28-41 belong to NUMA node 0
   • CPUs 14-27 and 42-55 belong to NUMA node 1
   Since we want to use NUMA node 1, one of the following CPUs should be used: 14-27 or 42-55
3. Use the taskset command to run the VMA process on a specific CPU.
   • Server side: LD_PRELOAD=libvma.so taskset -c 15 sockperf sr -i <MLX interface IP>
   • Client side: LD_PRELOAD=libvma.so taskset -c 15 sockperf pp -i <IP of FIRST machine MLX interface>
   In this example, we use CPU 15, which belongs to NUMA node 1. You can also run
   "numactl --hardware" to list the CPUs of each NUMA node.
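The three NUMA binding steps can be combined into a few helpers. This is a sketch under stated assumptions: the function names are hypothetical, the CPU list is read from the standard Linux sysfs file /sys/devices/system/node/node<N>/cpulist rather than from lscpu, and the first CPU of the node is picked, whereas the guide picks CPU 15 by hand.

```shell
#!/bin/sh
# Print the NUMA node of a network interface (-1 means no NUMA information).
nic_numa_node() {
    cat "/sys/class/net/$1/device/numa_node"
}

# Print the CPU list of a NUMA node, for example "14-27,42-55".
numa_cpulist() {
    cat "/sys/devices/system/node/node$1/cpulist"
}

# Print the first CPU of a list such as "14-27,42-55".
first_cpu() {
    printf '%s\n' "${1%%[-,]*}"
}

# Typical use (ens5 is the interface from the example above):
#   node=$(nic_numa_node ens5)
#   cpu=$(first_cpu "$(numa_cpulist "$node")")
#   LD_PRELOAD=libvma.so taskset -c "$cpu" sockperf sr -i <MLX interface IP>
```

Any CPU of the node is a valid choice; taking the first one is only the simplest rule to script.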
3.1 VMA Tuning Parameters
Table 2: VMA Tuning Parameters

Parameter: VMA_RX_POLL
Description: For blocking sockets only. Controls the number of times the RX path is polled for ready packets before the socket goes to sleep (waits for an interrupt in blocked mode).
• For best latency, use -1 for infinite polling (recommended)
• For low CPU usage, use 1 for a single poll
• Default value is 100000
Example:
• Server: VMA_RX_POLL=-1 LD_PRELOAD=libvma.so taskset -c 15 sockperf sr -i 17.209.13.142
• Client: VMA_RX_POLL=-1 LD_PRELOAD=libvma.so taskset -c 15 sockperf pp -i 17.209.13.142 -t 5

Parameter: VMA_INTERNAL_THREAD_AFFINITY
Description: Controls which CPU core(s) the VMA internal thread is serviced on. The recommended configuration is to run the VMA internal thread on a different core than the application, but on the same NUMA node.
Example:
• Server: VMA_INTERNAL_THREAD_AFFINITY=14 VMA_RX_POLL=-1 LD_PRELOAD=libvma.so taskset -c 15 sockperf sr -i 17.209.13.142
• Client: VMA_INTERNAL_THREAD_AFFINITY=14 VMA_RX_POLL=-1 LD_PRELOAD=libvma.so taskset -c 15 sockperf pp -i 17.209.13.142 -t 5
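The long command lines in Table 2 are easy to mistype, for example by leaving a space after "=". The sketch below builds the same prefix from two core numbers; vma_prefix is a hypothetical helper name. It only checks that the two cores differ, and does not verify that they are on the same NUMA node.

```shell
#!/bin/sh
# Print the environment and taskset prefix for launching a VMA application.
#   $1 = core for the application
#   $2 = core for the VMA internal thread (should be on the same NUMA node)
# vma_prefix is a hypothetical helper, not part of VMA.
vma_prefix() {
    if [ "$1" = "$2" ]; then
        echo "application and VMA internal thread must use different cores" >&2
        return 1
    fi
    printf 'VMA_INTERNAL_THREAD_AFFINITY=%s VMA_RX_POLL=-1 LD_PRELOAD=libvma.so taskset -c %s\n' "$2" "$1"
}

# Typical use, matching the server example in Table 2:
#   env $(vma_prefix 15 14) sockperf sr -i 17.209.13.142
```

The output is passed to env unquoted on purpose, so that the shell splits it into separate variable assignments followed by the taskset command.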
3.2 Configuring the BIOS
Each machine has its own BIOS parameters. For lowest latency, it is important to apply the
tuning recommendations of both your server manufacturer and your Linux distribution.
When configuring the BIOS, pay attention to the following:
1. Enable Max performance mode.
2. Enable Turbo mode.
3. Power modes: disable C-states and P-states so that the CPU does not sleep when idle.
4. Hyperthreading: there is no single right answer as to whether it should be ON or OFF.
   • ON means more CPUs are available to handle kernel tasks, so the amortized cost for each CPU is smaller
   • OFF means a core does not share its cache with other CPUs, so cache utilization is better
   If all of your system jitter is under control, it is recommended to turn it OFF; otherwise, keep it ON.
5. Disable SMI interrupts.
   Look for the "Processor Power and Utilization Monitoring" and "Memory Pre-Failure Notification" SMIs.
   The OS is not aware of these interrupts, so the only way you might be able to notice them
   is by reading the CPU MSR registers.
   Read your vendor's BIOS tuning guide carefully: vendors tend to hide these options, so you
   may not be able to access them without pressing a specific key combination while in the BIOS menu.
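As an illustration of reading the MSR registers, the sketch below samples an SMI counter twice and reports the difference. It rests on assumptions that the guide does not state: an Intel CPU that exposes MSR_SMI_COUNT at address 0x34, the msr kernel module loaded (modprobe msr), root access, and the rdmsr utility from the msr-tools package. The function names are hypothetical.

```shell
#!/bin/sh
# Print the SMI counter of CPU $1 in decimal.
# ASSUMPTION: Intel CPU with MSR_SMI_COUNT at 0x34; needs root, the msr
# kernel module, and rdmsr from msr-tools.
smi_count() {
    rdmsr -d -p "$1" 0x34
}

# Print the number of SMIs between two counter readings ($1 = before, $2 = after).
smi_delta() {
    echo $(( $2 - $1 ))
}

# Typical use on the core that runs the application:
#   before=$(smi_count 15)
#   sleep 10
#   after=$(smi_count 15)
#   echo "SMIs in 10 seconds: $(smi_delta "$before" "$after")"
```

A difference that stays at zero while the benchmark runs suggests that the SMI sources were disabled successfully; a growing counter suggests that some are still active.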
4 Related Documentation
For additional information, see the following documents/webpages:
Table 3: Related Documentation
Document/Webpage: VMA Release Notes
Description: Lists VMA's latest features and changes, and possible software issues.
http://www.mellanox.com/related-docs/prod_acceleration_software/VMA_8_0_4_Release_Notes_DOC-00329.pdf

Document/Webpage: VMA User Manual
Description: Describes installation, configuration and operation of the Mellanox VMA driver.
http://www.mellanox.com/related-docs/prod_acceleration_software/VMA_8_0_4_User_Manual_DOC-00393.pdf

Document/Webpage: VMA Installation Guide
Description: Provides an introduction to installing and running VMA for UDP/TCP latency and throughput performance.
http://www.mellanox.com/related-docs/prod_acceleration_software/VMA_8_0_4_Installation_Guide_DOC-10055.pdf

Document/Webpage: VMA Wiki
https://github.com/Mellanox/libvma/wiki

Document/Webpage: VMA GitHub
https://github.com/Mellanox/libvma/

Document/Webpage: HP Guide
http://h10032.www1.hp.com/ctg/Manual/c01804533.pdf

Document/Webpage: Red Hat Enterprise Linux Performance Tuning Guide
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/pdf/Performance_Tuning_Guide/Red_Hat_Enterprise_Linux-7-Performance_Tuning_Guide-en-US.pdf