VMA Offloading Quick Start Guide
Rev 1.0
www.mellanox.com
Mellanox Technologies Confidential

NOTE: THIS HARDWARE, SOFTWARE OR TEST SUITE PRODUCT (“PRODUCT(S)”) AND ITS RELATED DOCUMENTATION ARE PROVIDED BY MELLANOX TECHNOLOGIES “AS-IS” WITH ALL FAULTS OF ANY KIND AND SOLELY FOR THE PURPOSE OF AIDING THE CUSTOMER IN TESTING APPLICATIONS THAT USE THE PRODUCTS IN DESIGNATED SOLUTIONS. THE CUSTOMER'S MANUFACTURING TEST ENVIRONMENT HAS NOT MET THE STANDARDS SET BY MELLANOX TECHNOLOGIES TO FULLY QUALIFY THE PRODUCT(S) AND/OR THE SYSTEM USING IT. THEREFORE, MELLANOX TECHNOLOGIES CANNOT AND DOES NOT GUARANTEE OR WARRANT THAT THE PRODUCTS WILL OPERATE WITH THE HIGHEST QUALITY. ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT ARE DISCLAIMED. IN NO EVENT SHALL MELLANOX BE LIABLE TO CUSTOMER OR ANY THIRD PARTIES FOR ANY DIRECT, INDIRECT, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES OF ANY KIND (INCLUDING, BUT NOT LIMITED TO, PAYMENT FOR PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY FROM THE USE OF THE PRODUCT(S) AND RELATED DOCUMENTATION EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Mellanox Technologies
350 Oakmead Parkway Suite 100
Sunnyvale, CA 94085
U.S.A.
www.mellanox.com
Tel: (408) 970-3400
Fax: (408) 970-3403

© Copyright 2016. Mellanox Technologies Ltd. All Rights Reserved.
Mellanox®, Mellanox logo, Accelio®, BridgeX®, CloudX logo, CompustorX®, Connect-IB®, ConnectX®, CoolBox®, CORE-Direct®, EZchip®, EZchip logo, EZappliance®, EZdesign®, EZdriver®, EZsystem®, GPUDirect®, InfiniHost®, InfiniScale®, Kotura®, Kotura logo, Mellanox Federal Systems®, Mellanox Open Ethernet®, Mellanox ScalableHPC®, Mellanox TuneX®, Mellanox Connect Accelerate Outperform logo, Mellanox Virtual Modular Switch®, MetroDX®, MetroX®, MLNX-OS®, NP-1c®, NP-2®, NP-3®, Open Ethernet logo, PhyX®, PSIPHY®, SwitchX®, Tilera®, Tilera logo, TestX®, TuneX®, The Generation of Open Ethernet logo, UFM®, Virtual Protocol Interconnect®, Voltaire® and Voltaire logo are registered trademarks of Mellanox Technologies, Ltd. All other trademarks are property of their respective owners.

For the most updated list of Mellanox trademarks, visit http://www.mellanox.com/page/trademarks

Document Number: MLNX-15-51330

Table of Contents

Document Revision History
1 Overview
  1.1 Prerequisites
2 Installing VMA
3 Binding VMA to the Closest NUMA
  3.1 VMA Tuning Parameters
  3.2 Configuring the BIOS
4 Related Documentation

List of Tables

Table 1: Document Revision History
Table 2: VMA Tuning Parameters
Table 3: Related Documentation

Document Revision History

Table 1: Document Revision History
Revision   Date          Description
1.0        August 2016   Initial version of this document

1 Overview

This document describes, step by step, how to implement VMA offloading in your setup.

Note: The setup used in this document consists of two HP DL-380 x86_64 servers running RHEL 7.1, connected through a Mellanox Ethernet switch.

1.1 Prerequisites

• Two machines: one serves as the server and the other as the client
• Management interfaces configured with IP addresses so that the machines can ping each other
• A Mellanox NIC physically installed in each machine
• Your system must recognize the Mellanox NIC. To verify this, run:
  lspci | grep Mellanox
  Example output:
  [root@r-host141 0]# lspci |grep Mell
  81:00.0 Network controller: Mellanox Technologies MT27520 Family [ConnectX-3]

2 Installing VMA

1. Download the latest MLNX_OFED from:
   http://www.mellanox.com/page/products_dyn?product_family=26&mtag=linux_sw_drivers
   Note: Download the MLNX_OFED_LINUX-<version>.tgz file (NOT the ISO).
2. Unpack the file and install the driver.
   a. Copy the driver package to the test machines.
   b. In the directory where you copied the package, unpack it:
      tar -xvf MLNX_OFED_LINUX-<version>.tgz
   c. Change to the installation directory and install the software (root access is required):
      cd MLNX_OFED_LINUX-<version>
      ./mlnxofedinstall --vma --force
   d. Type "yes" when prompted, and reboot the server when the installation completes.
3. Configure IP addresses for the Mellanox interfaces on both sides (server and client), and verify connectivity with ping.
4. Verify that the libvma RPMs are installed:
   rpm -qa | grep libvma
   Example output:
   [root@r-host144 ~]# rpm -qa |grep libvma
   libvma-utils-7.0.14-1.x86_64
   libvma-7.0.14-1.x86_64
   libvma-devel-7.0.14-1.x86_64
5. Run sockperf without VMA ("sr" and "pp" in the example output are sockperf's short forms of "server" and "ping-pong").
   • On the first machine (server side):
     sockperf server
   • On the second machine (client side):
     sockperf ping-pong -i <IP of FIRST machine MLX interface> -t <test duration>
   Example output:
   Server side:
   [root@r-host142 tmp]# sockperf sr
   sockperf: == version #2.7-41.git241c4528ae75 ==
   sockperf: [SERVER] listen on:
   [ 0] IP = 0.0.0.0 PORT = 11111 # UDP
   sockperf: Warmup stage (sending a few dummy messages)...
   sockperf: [tid 17945] using recvfrom() to block on socket(s)
   Client side:
   [root@r-host144 tmp]# sockperf pp -i 11.209.13.142
   sockperf: == version #2.7-41.git241c4528ae75 ==
   sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)
   [ 0] IP = 11.209.13.142 PORT = 11111 # UDP
   sockperf: Warmup stage (sending a few dummy messages)...
   sockperf: Starting test...
   sockperf: Test end (interrupted by timer)
   sockperf: Test ended
   sockperf: [Total Run] RunTime=1.100 sec; SentMessages=82494; ReceivedMessages=82493
   sockperf: ========= Printing statistics for Server No: 0
   sockperf: [Valid Duration] RunTime=1.000 sec; SentMessages=75419; ReceivedMessages=75419
   sockperf: ====> avg-lat= 6.612 (std-dev=2.864)
   sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
   sockperf: Summary: Latency is 6.612 usec
   sockperf: Total 75419 observations; each percentile contains 754.19 observations
   sockperf: ---> <MAX> observation = 133.894
   sockperf: ---> percentile 99.99 = 31.852
   sockperf: ---> percentile 99.90 = 23.490
   sockperf: ---> percentile 99.50 = 20.164
   sockperf: ---> percentile 99.00 = 19.666
   sockperf: ---> percentile 95.00 = 12.456
   sockperf: ---> percentile 90.00 = 10.296
   sockperf: ---> percentile 75.00 = 6.013
   sockperf: ---> percentile 50.00 = 5.761
   sockperf: ---> percentile 25.00 = 5.463
   sockperf: ---> <MIN> observation = 5.042
6. Run sockperf with VMA.
   • On the first machine:
     LD_PRELOAD=libvma.so sockperf server
   • On the second machine:
     LD_PRELOAD=libvma.so sockperf ping-pong -t <test duration> -i <IP of FIRST machine MLX interface>
   Example output:
   Server side:
   [root@r-host142 ~]# LD_PRELOAD=libvma.so sockperf server
   VMA INFO : -------------------------------------------------------------------------
   VMA INFO : VMA_VERSION: 7.0.14-0 Release built on Dec 7 2015 13:14:39
   VMA INFO : Cmd Line: sockperf server
   VMA INFO : OFED Version: MLNX_OFED_LINUX-3.2-0.0.3.0:
   VMA INFO : Log Level 3 [VMA_TRACELEVEL]
   VMA INFO : -------------------------------------------------------------------------
   sockperf: == version #2.7-41.git241c4528ae75 ==
   sockperf: [SERVER] listen on:
   [ 0] IP = 0.0.0.0 PORT = 11111 # UDP
   sockperf: Warmup stage (sending a few dummy messages)...
   sockperf: [tid 33634] using recvfrom() to block on socket(s)
   Client side:
   [root@r-host144 ~]# LD_PRELOAD=libvma.so sockperf ping-pong -i 11.209.13.142
   VMA INFO : -------------------------------------------------------------------------
   VMA INFO : VMA_VERSION: 7.0.14-0 Release built on Dec 7 2015 13:14:39
   VMA INFO : Cmd Line: sockperf ping-pong -i 11.209.13.142 -t 10
   VMA INFO : OFED Version: MLNX_OFED_LINUX-3.2-0.0.3.0:
   VMA INFO : Log Level 3 [VMA_TRACELEVEL]
   VMA INFO : -------------------------------------------------------------------------
   sockperf: == version #2.7-41.git241c4528ae75 ==
   sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)
   [ 0] IP = 11.209.13.142 PORT = 11111 # UDP
   sockperf: Warmup stage (sending a few dummy messages)...
   sockperf: Starting test...
   sockperf: Test end (interrupted by timer)
   sockperf: Test ended
   sockperf: [Total Run] RunTime=1.100 sec; SentMessages=377651; ReceivedMessages=377650
   sockperf: ========= Printing statistics for Server No: 0
   sockperf: [Valid Duration] RunTime=1.000 sec; SentMessages=348797; ReceivedMessages=348797
   sockperf: ====> avg-lat= 1.420 (std-dev=0.164)
   sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
   sockperf: Summary: Latency is 1.420 usec
   sockperf: Total 348797 observations; each percentile contains 3487.97 observations
   sockperf: ---> <MAX> observation = 9.325
   sockperf: ---> percentile 99.99 = 6.666
   sockperf: ---> percentile 99.90 = 2.337
   sockperf: ---> percentile 99.50 = 2.121
   sockperf: ---> percentile 99.00 = 2.088
   sockperf: ---> percentile 95.00 = 1.621
   sockperf: ---> percentile 90.00 = 1.450
   sockperf: ---> percentile 75.00 = 1.407
   sockperf: ---> percentile 50.00 = 1.387
   sockperf: ---> percentile 25.00 = 1.366
   sockperf: ---> <MIN> observation = 1.309
   Note the additional VMA INFO header in the output on both the client and the server.
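The checks in the prerequisites and in steps 4 to 6 can also be scripted. The following is a minimal bash sketch, not a Mellanox-provided tool: the function names are hypothetical, and it assumes the output formats shown above (the "Mellanox Technologies" vendor string from lspci, the rpm -qa package names, and sockperf's "Summary: Latency is <N> usec" line). Each function reads standard input, so it works on live command output or on a saved log.

```shell
#!/usr/bin/env bash
# Sketch: the section 2 checks as stdin filters (hypothetical helper names).
# They assume the lspci, rpm -qa and sockperf output formats shown in this guide.

# Prerequisite check: succeeds if lspci output lists a Mellanox device.
# grep reads all input (no -q), so an upstream lspci is never cut off by SIGPIPE.
has_mellanox_nic() {
    grep -i 'Mellanox Technologies' > /dev/null
}

# Step 4 check: succeeds if rpm -qa output contains the core libvma package
# (libvma-<version>), not only libvma-utils or libvma-devel.
has_libvma() {
    grep -E '^libvma-[0-9]' > /dev/null
}

# Steps 5 and 6: print the average latency in usec from sockperf client output,
# taken from the last "Summary: Latency is <N> usec" line.
sockperf_latency() {
    sed -n 's/.*Summary: Latency is \([0-9.]*\) usec.*/\1/p' | tail -n 1
}

# Print how many times lower the VMA latency is than the kernel-path latency.
latency_ratio() {
    local kernel_usec=$1 vma_usec=$2
    awk -v k="$kernel_usec" -v v="$vma_usec" 'BEGIN { printf "%.2f\n", k / v }'
}
```

For example, lspci | has_mellanox_nic && echo "Mellanox NIC found" covers the last prerequisite. If the client output of steps 5 and 6 is saved to two files, passing the two sockperf_latency results to latency_ratio gives 4.66 for the numbers shown above (6.612 usec versus 1.420 usec).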
Average latency:
• Over the OS (kernel network stack): 6.612 usec
• Over VMA: 1.420 usec

These machines are running a fresh OS install with no hardware or OS tuning. With tuning, these numbers are expected to drop further.

3 Binding VMA to the Closest NUMA

1. Check which NUMA node your interface is attached to:
   cat /sys/class/net/<interface_name>/device/numa_node
   Example:
   [root@r-host142 ~]# cat /sys/class/net/ens5/device/numa_node
   1
   The output above shows that the device is attached to NUMA node 1.
2. Check which CPUs belong to that NUMA node:
   [root@r-host144 ~]# lscpu
   NUMA node0 CPU(s): 0-13,28-41
   NUMA node1 CPU(s): 14-27,42-55
   The output above shows that:
   • CPUs 0-13 and 28-41 belong to NUMA node 0
   • CPUs 14-27 and 42-55 belong to NUMA node 1
   Because the interface is attached to NUMA node 1, one of CPUs 14-27 or 42-55 should be used.
3. Use the taskset command to run the VMA process on a specific CPU.
   • Server side:
     LD_PRELOAD=libvma.so taskset -c 15 sockperf sr -i <MLX IP interface>
   • Client side:
     LD_PRELOAD=libvma.so taskset -c 15 sockperf pp -i <IP of FIRST machine MLX interface>
   In this example, we use CPU 15, which belongs to NUMA node 1.
   You can also use "numactl --hardware" to display which CPUs belong to each NUMA node.

3.1 VMA Tuning Parameters

Table 2: VMA Tuning Parameters

VMA_RX_POLL
  Description: For blocking sockets only. Controls the number of times VMA polls the RX path for ready packets before the thread goes to sleep (waits for an interrupt in blocking mode).
  • For best latency, use -1 for infinite polling (the recommended value for best latency)
  • For low CPU usage, use 1 for a single poll
  • Default value is 100000
  Example:
  • Server: VMA_RX_POLL=-1 LD_PRELOAD=libvma.so taskset -c 15 sockperf sr -i 17.209.13.142
  • Client: VMA_RX_POLL=-1 LD_PRELOAD=libvma.so taskset -c 15 sockperf pp -i 17.209.13.142 -t 5

VMA_INTERNAL_THREAD_AFFINITY
  Description: Controls which CPU core(s) the VMA internal thread is serviced on. The recommended configuration is to run the VMA internal thread on a different core than the application, but on the same NUMA node.
  Example:
  • Server: VMA_INTERNAL_THREAD_AFFINITY=14 VMA_RX_POLL=-1 LD_PRELOAD=libvma.so taskset -c 15 sockperf sr -i 17.209.13.142
  • Client: VMA_INTERNAL_THREAD_AFFINITY=14 VMA_RX_POLL=-1 LD_PRELOAD=libvma.so taskset -c 15 sockperf pp -i 17.209.13.142 -t 5

3.2 Configuring the BIOS

Each machine has its own BIOS parameters. It is important to apply the server manufacturer's and the Linux distribution's tuning recommendations for lowest latency.

When configuring the BIOS, pay attention to the following:
1. Enable Max Performance mode.
2. Enable Turbo mode.
3. Power modes: disable C-states and P-states, and do not let the CPU sleep when idle.
4. Hyperthreading: there is no single right answer as to whether it should be ON or OFF.
   • ON provides more logical CPUs to handle kernel tasks, so the amortized cost per CPU is lower.
   • OFF means a core's cache is not shared with a sibling hyperthread, so cache utilization is better.
   If your system jitter is fully under control, turning it OFF is recommended; otherwise, keep it ON.
5. Disable SMI interrupts. Look for the "Processor Power and Utilization Monitoring" and "Memory Pre-Failure Notification" SMIs. The OS is not aware of these interrupts, so the only way you might be able to detect them is by reading a CPU MSR (model-specific register).
   Read your vendor's BIOS tuning guide carefully: vendors tend to hide these options, and they may not be accessible unless you press a specific key combination while in the BIOS menu.

4 Related Documentation

For additional information, see the following documents and webpages:

Table 3: Related Documentation

VMA Release Notes
  Lists VMA's latest features and changes, and possible software issues.
  http://www.mellanox.com/related-docs/prod_acceleration_software/VMA_8_0_4_Release_Notes_DOC-00329.pdf
VMA User Manual
  Describes installation, configuration and operation of the Mellanox VMA driver.
  http://www.mellanox.com/related-docs/prod_acceleration_software/VMA_8_0_4_User_Manual_DOC-00393.pdf
VMA Installation Guide
  Provides an introduction to installing and running VMA for UDP/TCP latency and throughput performance.
  http://www.mellanox.com/related-docs/prod_acceleration_software/VMA_8_0_4_Installation_Guide_DOC-10055.pdf
VMA Wiki
  https://github.com/Mellanox/libvma/wiki
VMA GitHub
  https://github.com/Mellanox/libvma/
HP Guide
  http://h10032.www1.hp.com/ctg/Manual/c01804533.pdf
Red Hat Enterprise Linux Performance Tuning Guide
  https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/pdf/Performance_Tuning_Guide/Red_Hat_Enterprise_Linux-7-Performance_Tuning_Guide-en-US.pdf
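The NUMA binding steps in section 3 and the parameters in Table 2 can be combined into one helper script. The following is a minimal bash sketch under stated assumptions, not a Mellanox-provided tool: the function names and the SYSFS_ROOT and RUN variables are hypothetical, the lscpu parsing assumes the "NUMA nodeN CPU(s):" line format shown in step 2 of section 3, and VMA_INTERNAL_THREAD_AFFINITY is given a single CPU number exactly as in Table 2.

```shell
#!/usr/bin/env bash
# Sketch combining section 3 (NUMA binding) with the Table 2 parameters.
# Function names, SYSFS_ROOT and RUN are hypothetical helpers, not part of VMA.

# Step 1: print the NUMA node of a network interface (-1 means no NUMA info).
# SYSFS_ROOT defaults to /sys; overriding it allows testing against a fake tree.
iface_numa_node() {
    cat "${SYSFS_ROOT:-/sys}/class/net/$1/device/numa_node"
}

# Step 2: expand one node's "NUMA nodeN CPU(s):" list from lscpu output on
# stdin (for example "14-27,42-55") into space-separated logical CPU ids.
numa_cpus() {
    local node=$1 list range c
    local -a ranges out=()
    list=$(sed -n "s/^NUMA node${node} CPU(s):[[:space:]]*//p")
    IFS=',' read -ra ranges <<< "$list"
    for range in "${ranges[@]}"; do
        if [[ $range == *-* ]]; then
            for ((c = ${range%-*}; c <= ${range#*-}; c++)); do
                out+=("$c")
            done
        else
            out+=("$range")
        fi
    done
    echo "${out[*]}"
}

# Table 2 recommendation: the application CPU and the VMA internal-thread CPU
# are different logical CPUs, and both appear in the given node's CPU list.
check_affinity() {
    local app=$1 thread=$2
    shift 2
    [[ $app != "$thread" ]] || return 1
    [[ " $* " == *" $app "* && " $* " == *" $thread "* ]]
}

# Step 3 plus Table 2: build the tuned command line. It is printed by default;
# with RUN=1 it is executed instead.
vma_launch() {
    local app_cpu=$1 thread_cpu=$2
    shift 2
    local -a cmd=(env "VMA_RX_POLL=-1" "VMA_INTERNAL_THREAD_AFFINITY=$thread_cpu"
                  "LD_PRELOAD=libvma.so" taskset -c "$app_cpu" "$@")
    if [[ ${RUN:-0} == 1 ]]; then
        "${cmd[@]}"
    else
        echo "${cmd[*]}"
    fi
}
```

On the server used in this guide, iface_numa_node ens5 would print 1, lscpu | numa_cpus 1 would list CPUs 14-27 and 42-55, and RUN=1 vma_launch 15 14 sockperf sr -i 17.209.13.142 would run the tuned server command from Table 2. check_affinity compares logical CPU ids only; with Hyperthreading ON, two ids can share one physical core, and the sibling pairs are listed in /sys/devices/system/cpu/cpu<N>/topology/thread_siblings_list.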