RHEL6 Tuning Guide for Mellanox Ethernet Cards
October 2012
UniOne INC Co., Ltd.
IBM X3650 M4 Performance Tuning
1.1 X3650 BIOS Settings (General / Processor)
Category      Item                              Value
General       Power Profile/Operating Modes     Max Performance
Processor     C-States                          Disabled
              Turbo mode                        Enabled / Performance Optimized
              Hyper-Threading                   Disabled
              CPU frequency select              Max performance
C-states limit: the C value controls how aggressively the processor saves power.
At C0 the core consumes the most power and stays continuously in the working state.
The higher the C value, the lower the power consumption: the core enters a deeper sleep state and takes longer to return to the working state.
When C-States is set to Disabled, this power-saving control is not applied.
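As a cross-check from the OS side, the idle states the kernel is allowed to use can be listed through sysfs. A minimal sketch; the cpuidle directory only exists when a cpuidle driver is active:

# List the idle states exposed for core 0. With C-states disabled in the BIOS
# (and the kernel parameters from the OS-controlled power management section),
# only shallow states such as POLL/C1 should appear.
cat /sys/devices/system/cpu/cpu0/cpuidle/state*/name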
1.2 X3650 BIOS Settings (Memory)
Category      Item                         Value
Memory        Memory speed                 Max performance
              Memory channel mode          Independent
              Socket Interleaving          NUMA / Disabled
              Memory Node Interleaving     OFF
              Patrol Scrubbing             Disabled
              Demand Scrubbing             Enabled
              Thermal Mode                 Performance
1.3 RHEL6 OS Tuning (Networking)
Category: Network

Disable the TCP timestamps:
  sysctl -w net.ipv4.tcp_timestamps=0

Disable the TCP selective acks:
  sysctl -w net.ipv4.tcp_sack=0

Increase the length of the processor input queues:
  sysctl -w net.core.netdev_max_backlog=250000

Increase the TCP buffer sizes used by setsockopt():
  sysctl -w net.core.rmem_max=16777216
  sysctl -w net.core.wmem_max=16777216
  sysctl -w net.core.rmem_default=16777216
  sysctl -w net.core.wmem_default=16777216
  sysctl -w net.core.optmem_max=16777216

Increase memory thresholds to prevent packet dropping:
  sysctl -w net.ipv4.tcp_mem="16777216 16777216 16777216"

Enable auto-tuning of the TCP buffer limits:
  sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
  sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"

Enable low-latency mode for TCP:
  sysctl -w net.ipv4.tcp_low_latency=1
To keep the changed values after a reboot, add the corresponding entries to /etc/sysctl.conf, as in the sketch below.
ex) net.ipv4.tcp_timestamps = 0   # add the corresponding entry
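A minimal /etc/sysctl.conf sketch covering the settings above (the values are the ones recommended in this section; adjust the buffer sizes to your workload):

# /etc/sysctl.conf
net.ipv4.tcp_timestamps = 0
net.ipv4.tcp_sack = 0
net.core.netdev_max_backlog = 250000
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.core.rmem_default = 16777216
net.core.wmem_default = 16777216
net.core.optmem_max = 16777216
net.ipv4.tcp_mem = 16777216 16777216 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_low_latency = 1

The file can be applied without rebooting by running sysctl -p.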
1.4 RHEL6 OS Tuning (Power Management)
Check that the output CPU frequency for each core is equal to the maximum supported
and that all core frequencies are consistent.
Check the maximum supported CPU frequency using:
#cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_max_freq
Check that core frequencies are consistent using:
#cat /proc/cpuinfo | grep "cpu MHz"
Check that the output frequencies are the same as the maximum supported.
If the CPU frequency is not at the maximum, check the BIOS settings against the recommended values in sections 1.1 and 1.2.
Check the current CPU frequency of each core to verify that the maximum-performance setting is actually in effect:
# cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_cur_freq
In the OS, compare the current CPU frequency with the maximum value, for example with the sketch below.
If the current value differs from the maximum, change the CPU frequency settings in the BIOS.
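A minimal shell sketch, built on the sysfs paths above, that flags any core whose current frequency is below its maximum (reading cpuinfo_cur_freq typically requires root):

# Compare current vs. maximum frequency (in kHz) for every core.
for cpu in /sys/devices/system/cpu/cpu[0-9]*; do
    max=$(cat "$cpu/cpufreq/cpuinfo_max_freq")
    cur=$(cat "$cpu/cpufreq/cpuinfo_cur_freq")
    [ "$cur" -ne "$max" ] && echo "$(basename "$cpu"): cur=${cur} kHz, max=${max} kHz"
done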
1.5 RHEL6 OS Tuning (Kernel Idle Loop Tuning)
The mlx4_en kernel module has an optional parameter that can tune the kernel idle loop for better latency.
This will improve the CPU wakeup time but may result in higher power consumption.
To tune the kernel idle loop, set the following options in /etc/modprobe.d/mlx4.conf:
options mlx4_en enable_sys_tune=1
This improves CPU wakeup time, but power consumption increases.
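A minimal sketch of applying the option; reloading mlx4_en briefly takes the Mellanox interfaces down, so do this in a maintenance window (the option and file names are the ones given above):

# Persist the option and reload the driver so it takes effect.
echo "options mlx4_en enable_sys_tune=1" >> /etc/modprobe.d/mlx4.conf
modprobe -r mlx4_en && modprobe mlx4_en
# Verify, if the parameter is exported through sysfs.
cat /sys/module/mlx4_en/parameters/enable_sys_tune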
1.6 RHEL6 OS Tuning (OS-Controlled Power Management)
Some operating systems can override BIOS power management configuration and enable c-states by default,
which results in a higher latency.
To resolve the high latency issue, please follow the instructions below:
1. Edit the /boot/grub/grub.conf file or any other bootloader configuration file.
2. Add the following parameters to the kernel command line of the boot entry:
intel_idle.max_cstate=0 processor.max_cstate=1
3. Reboot the system.
Ex)
title RH6.2x64
root (hd0,0)
kernel /vmlinuz-RH6.2x64-2.6.32-220.el6.x86_64
root=UUID=817c207b-c0e8-4ed9-9c33-c589c0bb566f console=tty0
console=ttyS0,115200n8 rhgb intel_idle.max_cstate=0 processor.max_cstate=1
In some cases the OS power management settings take precedence over the BIOS settings.
In that case, apply the change by editing /boot/grub/grub.conf as shown above, and verify it after the reboot as sketched below.
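A quick post-reboot check of the running kernel's command line:

# The output should contain intel_idle.max_cstate=0 processor.max_cstate=1.
cat /proc/cmdline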
1.7 RHEL6 OS Tuning (Interrupt Moderation)
Interrupt moderation is used to decrease the frequency of network adapter interrupts to the CPU. Mellanox network
adapters use an adaptive interrupt moderation algorithm by default. The algorithm checks the transmission (Tx) and
receive (Rx) packet rates and modifies the Rx interrupt moderation settings accordingly.
To manually set Tx and/or Rx interrupt moderation, use the ethtool utility. For example, the following commands first
show the current (default) interrupt moderation settings on the interface eth1, then turn off Rx interrupt moderation,
and finally show the new settings.
# ethtool -c eth1
Coalesce parameters for eth1:
Adaptive RX: on TX: off
...
pkt-rate-low: 400000
pkt-rate-high: 450000
rx-usecs: 16
rx-frames: 88
rx-usecs-irq: 0
rx-frames-irq: 0
...
1.7.1 RHEL6 OS Tuning (Interrupt Moderation)
# ethtool -C eth1 adaptive-rx off rx-usecs 0 rx-frames 0
# ethtool -c eth1
Coalesce parameters for eth1:
Adaptive RX: off TX: off
...
pkt-rate-low: 400000
pkt-rate-high: 450000
rx-usecs: 0
rx-frames: 0
rx-usecs-irq: 0
rx-frames-irq: 0
...
Interrupt moderation is used to reduce the rate of network adapter interrupts delivered to the CPU.
To set Tx/Rx interrupt moderation manually, use the ethtool utility; the sketch below shows the two most common settings.
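A minimal sketch of the two typical choices (eth1 is an example interface name):

# Lowest latency: disable adaptive moderation and interrupt per packet,
# as in the example above.
ethtool -C eth1 adaptive-rx off rx-usecs 0 rx-frames 0
# Restore the adaptive default when throughput matters more than latency.
ethtool -C eth1 adaptive-rx on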
1.8 RHEL6 OS Tuning (Tuning for NUMA Architecture)
Tuning for Intel® Microarchitecture Code Name Sandy Bridge
The Intel Sandy Bridge processor has an integrated PCI Express controller, so every PCIe adapter is connected
directly to a NUMA node.
On a system with more than one NUMA node, performance is better when the application uses the local NUMA node to which
the PCIe adapter is connected.
To identify the adapter's NUMA node, the system BIOS must support the relevant ACPI feature.
To see whether your system supports PCIe adapter NUMA node detection, run one of the following commands:
# cat /sys/devices/[PCI root]/[PCIe function]/numa_node
Or
# cat /sys/class/net/[interface]/device/numa_node
Example for supported system:
# cat /sys/devices/pci0000\:00/0000\:00\:05.0/numa_node
0
Example for unsupported system:
# cat /sys/devices/pci0000\:00/0000\:00\:05.0/numa_node
-1
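Once the adapter's node is known, one way to keep traffic on the local node is to pin the application there. A minimal sketch, assuming the interface is eth1 and my_network_app is a placeholder for the actual workload (numactl is provided by the numactl package):

# Read the adapter's NUMA node and bind the application's CPUs and memory to it.
node=$(cat /sys/class/net/eth1/device/numa_node)
numactl --cpunodebind=$node --membind=$node ./my_network_app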
1.9 RHEL6 OS Tuning (IRQ Affinity)
The affinity of an interrupt is defined as the set of processor cores that service that interrupt. To improve application
scalability and latency, it is recommended to distribute interrupt requests (IRQs) between the available processor cores.
To prevent the Linux IRQ balancer application from interfering with the interrupt affinity scheme, the IRQ balancer must
be turned off.
The following command turns off the IRQ balancer:
> /etc/init.d/irqbalance stop
The following command assigns the affinity of a single interrupt vector:
> echo <hexadecimal bit mask> > /proc/irq/<irq vector>/smp_affinity
where bit i in <hexadecimal bit mask> indicates whether processor core i is in <irq vector>’s affinity or not.
To improve application scalability and latency, it is recommended to distribute IRQs across the available processor cores.
To prevent conflicts between the Linux IRQ balancer and the interrupt affinity scheme, the IRQ balancer must be turned off; a worked example of the bit mask follows below.
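A minimal sketch of the bit-mask arithmetic, pinning a hypothetical IRQ vector 90 to core 3 (both values are examples; find the real vector numbers in /proc/interrupts):

core=3
irq=90
# Bit i of the mask selects core i, so core 3 -> 1 << 3 -> 0x8.
mask=$(printf "%x" $((1 << core)))
echo $mask > /proc/irq/$irq/smp_affinity
cat /proc/irq/$irq/smp_affinity    # verify the new affinity mask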
1.9.1 RHEL6 OS Tuning (IRQ Affinity Configuration)
For Intel Sandy Bridge systems, set the IRQ affinity to the adapter's NUMA node:
For optimizing single-port traffic, run:
# set_irq_affinity_bynode.sh <numa node> <interface>
For optimizing dual-port traffic, run:
# set_irq_affinity_bynode.sh <numa node> <interface1> <interface2>
To show the current irq affinity settings, run:
# show_irq_affinity.sh <interface>
The scripts above can be downloaded from the Mellanox website (www.mellanox.com); a usage sketch follows.
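A usage sketch, assuming the adapter's interface is eth1 on NUMA node 0 (example values; use the numa_node query from the NUMA tuning section):

# Stop the IRQ balancer, bind eth1's interrupts to the cores of node 0, then verify.
/etc/init.d/irqbalance stop
set_irq_affinity_bynode.sh 0 eth1
show_irq_affinity.sh eth1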
© 2012 UniOne INC Co., Ltd. All Rights Reserved.