Veritas CFS Media Server Workloads
Sequential read I/O throughput test
Date: 18th September 2015
Colin Eldridge
Shrinivas Chandukar

What is the purpose of this document?
- This initial document is designed to help set up a CFS environment for use as a media server solution. The idea is to repeat the testing we have performed in this document using your own h/w environment.
- This report is specific to sequential read I/O; it includes best practices and configuration recommendations.
- This testing will identify the I/O bottlenecks in your h/w environment.
- The testing will identify the maximum read I/O throughput that can be achieved from one node, and the maximum read I/O throughput from all nodes combined, using your h/w environment.
- This testing will identify the best stripe-width and number of columns for your VxVM volume.
- This testing will identify the best file system read_ahead tuning for a sequential read I/O workload.

In summary:
- This document attempts to explain how to set up a media server solution, including:
  o how to perform the tests
  o how to measure the I/O throughput
  o how to choose the correct VxVM volume configuration and achieve balanced I/O
  o how to identify the bottlenecks in the I/O path using your h/w environment
  o how to tune the file system read_ahead to balance the read I/O throughput across processes
- You should then understand the capabilities of your h/w environment, including:
  o the maximum read I/O throughput that will be possible in the environment
  o the mechanism of balancing the I/O across the LUNs
  o the mechanism of balancing the read I/O throughput across active processes/threads

1. Configuration: Hardware, DMP paths and volume configuration

o HOST side: 2 CFS nodes
  Each node has a dual port HBA card (so 2 active DMP paths to each LUN); each HBA port is connected to a different FC switch.
  The theoretical maximum throughput per FC port on the HBA is 8Gbits/sec.
  The theoretical maximum throughput per node (two FC ports) is 16Gbits/sec. The theoretical maximum throughput for two nodes is therefore 32Gbits/sec.
  In reality, during our testing the maximum throughput we could reach from one node was 12Gbits/sec.
  In our 1-node testing the dual port HBA therefore bottlenecked at approximately 12Gbits/sec (1.5 Gbytes/sec), so this is our approximate maximum throughput from one node.

o FC Switch: 2 FC switches
  Each switch is capable of 32Gbits/sec; there are two switches, so the total theoretical maximum throughput for both switches is 64Gbits/sec.
  Each individual switch port is capable of 8Gbits/sec.
  We are using 4 switch ports connected to HBA FC ports on the host nodes – this limits the maximum throughput at the switches to 32Gbits/sec (through both switches).
  We are using 12 switch ports connected to the modular storage arrays.

o Storage Array: 6 modular arrays
  We have 6 modular storage arrays. We are using 2 ports from each storage array – each port has a theoretical maximum throughput of 4Gbits/sec.
  We therefore have a total of 12 storage array connections to the two FC switches (6 connections to each switch).
  The theoretical maximum throughput is therefore 48Gbits/sec for the storage arrays.
  In our 2-node testing the combination of 6 storage arrays bottlenecked at approximately 20Gbits/sec (2.5 Gbytes/sec), so this is our approximate maximum throughput from both nodes.
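As a quick cross-check of the figures above, the conversion used throughout this report between a link rate in Gbits/sec and a data rate in Gbytes/sec is simply a division by 8: for example, 12 Gbits/sec / 8 = 1.5 Gbytes/sec for the single-node HBA bottleneck, and 20 Gbits/sec / 8 = 2.5 Gbytes/sec for the two-node storage bottleneck. (This ignores the 8b/10b encoding overhead, which is discussed with the FC switch port measurements later in this report.)

The storage enclosures seen by DMP can be displayed using the "vxdmpadm listenclosure" command: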
# vxdmpadm listenclosure
ENCLR_NAME        ENCLR_TYPE     ENCLR_SNO          STATUS      ARRAY_TYPE     LUN_COUNT  FIRMWARE
=======================================================================================================
storagearray-0    STORAGEARRAY-  21000022a1035118   CONNECTED   A/A-A-STORAGE  4          1
storagearray-1    STORAGEARRAY-  21000022a1035119   CONNECTED   A/A-A-STORAGE  4          1
storagearray-2    STORAGEARRAY-  21000022a1035116   CONNECTED   A/A-A-STORAGE  4          1
storagearray-3    STORAGEARRAY-  21000022a1035117   CONNECTED   A/A-A-STORAGE  4          1
storagearray-4    STORAGEARRAY-  21000022a106c70a   CONNECTED   A/A-A-STORAGE  4          1
storagearray-5    STORAGEARRAY-  21000022a106c705   CONNECTED   A/A-A-STORAGE  4          1

o LUNs:
  Each modular array has 4 enclosures with 12 disks each; only 11 disks are used in each enclosure for a RAID-0 LUN.
  Each LUN is comprised of 11 disks (11-way stripe), 64Kb stripe width (one disk is kept as a failure disk).
  There are 4 LUNs per modular array, therefore we have a total of 24 LUNs. Each LUN is approximately 3TB.

All 24 LUNs can be displayed using the "vxdisk list" command:

# vxdisk list
DEVICE             TYPE          DISK               GROUP    STATUS
storagearray-0_16  auto:cdsdisk  storagearray-0_16  testdg   online shared
storagearray-0_17  auto:cdsdisk  storagearray-0_17  testdg   online shared
storagearray-0_18  auto:cdsdisk  storagearray-0_18  testdg   online shared
storagearray-0_20  auto:cdsdisk  storagearray-0_20  testdg   online shared
storagearray-1_6   auto:cdsdisk  storagearray-1_6   testdg   online shared
storagearray-1_7   auto:cdsdisk  storagearray-1_7   testdg   online shared
storagearray-1_8   auto:cdsdisk  storagearray-1_8   testdg   online shared
storagearray-1_9   auto:cdsdisk  storagearray-1_9   testdg   online shared
storagearray-2_5   auto:cdsdisk  storagearray-2_5   testdg   online shared
storagearray-2_6   auto:cdsdisk  storagearray-2_6   testdg   online shared
storagearray-2_7   auto:cdsdisk  storagearray-2_7   testdg   online shared
storagearray-2_8   auto:cdsdisk  storagearray-2_8   testdg   online shared
storagearray-3_4   auto:cdsdisk  storagearray-3_4   testdg   online shared
storagearray-3_6   auto:cdsdisk  storagearray-3_6   testdg   online shared
storagearray-3_7   auto:cdsdisk  storagearray-3_7   testdg   online shared
storagearray-3_8   auto:cdsdisk  storagearray-3_8   testdg   online shared
storagearray-4_8   auto:cdsdisk  storagearray-4_8   testdg   online shared
storagearray-4_9   auto:cdsdisk  storagearray-4_9   testdg   online shared
storagearray-4_10  auto:cdsdisk  storagearray-4_10  testdg   online shared
storagearray-4_11  auto:cdsdisk  storagearray-4_11  testdg   online shared
storagearray-5_8   auto:cdsdisk  storagearray-5_8   testdg   online shared
storagearray-5_9   auto:cdsdisk  storagearray-5_9   testdg   online shared
storagearray-5_10  auto:cdsdisk  storagearray-5_10  testdg   online shared
storagearray-5_11  auto:cdsdisk  storagearray-5_11  testdg   online shared

o DMP paths - 2 paths per LUN
  There are 2 paths per LUN (on each node). Both paths are active, therefore there are 48 active paths in total (on each node).
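A quick way to confirm the number of active paths on a node is to count the ENABLED subpaths reported by "vxdisk path"; a minimal check of this kind (the expected count of 48 assumes the 24-LUN, 2-path-per-LUN configuration described above) is:

# vxdisk path | grep -c ENABLED
48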
All 48 paths can be displayed using the “vxdisk path” command: # vxdisk path SUBPATH sdad sdo sdab sdm sdae sdp sdac sdn sdx sdan sdaa sdaq sdz sdap sdy sdao sdat sdw sdar sdu sdas sdv sdaz sday sdq sdau sds sdaw sdav sdr sdax sdt sdaf sdi sdag sdj sdl sdai sdk sdah sdh sdam sdg sdal sde sdaj sdf sdak DANAME storagearray-0_16 storagearray-0_16 storagearray-0_17 storagearray-0_17 storagearray-0_18 storagearray-0_18 storagearray-0_20 storagearray-0_20 storagearray-1_6 storagearray-1_6 storagearray-1_7 storagearray-1_7 storagearray-1_8 storagearray-1_8 storagearray-1_9 storagearray-1_9 storagearray-2_5 storagearray-2_5 storagearray-2_6 storagearray-2_6 storagearray-2_7 storagearray-2_7 storagearray-2_8 storagearray-2_8 storagearray-3_4 storagearray-3_4 storagearray-3_6 storagearray-3_6 storagearray-3_7 storagearray-3_7 storagearray-3_8 storagearray-3_8 storagearray-4_8 storagearray-4_8 storagearray-4_9 storagearray-4_9 storagearray-4_10 storagearray-4_10 storagearray-4_11 storagearray-4_11 storagearray-5_8 storagearray-5_8 storagearray-5_9 storagearray-5_9 storagearray-5_10 storagearray-5_10 storagearray-5_11 storagearray-5_11 DMNAME storagearray-0_16 storagearray-0_16 storagearray-0_17 storagearray-0_17 storagearray-0_18 storagearray-0_18 storagearray-0_20 storagearray-0_20 storagearray-1_6 storagearray-1_6 storagearray-1_7 storagearray-1_7 storagearray-1_8 storagearray-1_8 storagearray-1_9 storagearray-1_9 storagearray-2_5 storagearray-2_5 storagearray-2_6 storagearray-2_6 storagearray-2_7 storagearray-2_7 storagearray-2_8 storagearray-2_8 storagearray-3_4 storagearray-3_4 storagearray-3_6 storagearray-3_6 storagearray-3_7 storagearray-3_7 storagearray-3_8 storagearray-3_8 storagearray-4_8 storagearray-4_8 storagearray-4_9 storagearray-4_9 storagearray-4_10 storagearray-4_10 storagearray-4_11 storagearray-4_11 storagearray-5_8 storagearray-5_8 storagearray-5_9 storagearray-5_9 storagearray-5_10 storagearray-5_10 storagearray-5_11 storagearray-5_11 GROUP testdg testdg testdg testdg testdg testdg testdg testdg testdg testdg testdg testdg testdg testdg testdg testdg testdg testdg testdg testdg testdg testdg testdg testdg testdg testdg testdg testdg testdg testdg testdg testdg testdg testdg testdg testdg testdg testdg testdg testdg testdg testdg testdg testdg testdg testdg testdg testdg STATE ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED o VxVM volume The idea is to achieve balanced I/O across all the LUNs, and to maximise the h/w I/O bandwidth. As we have 24 LUNs available we created our VxVM volume with 24 columns to obtain the maximum possible throughput. We then tested using three different VxVM stripe unit widths, 64Kb, 512Kb, 1024Kb The “stripewidth” argument to the vxassist command is in units of 512byte sectors. 
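For reference, converting a stripe unit expressed in kilobytes into this sector-based stripewidth value is simply a multiplication by 2 (1KB = 2 x 512-byte sectors). For example, for the 512KB stripe width used below:

$ echo $((512 * 2))
1024

Hence stripewidth=128 gives a 64KB stripe unit, stripewidth=1024 gives 512KB, and stripewidth=2048 gives 1024KB, as used in the three volume configurations that follow.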
Volume configuration using 64k stripe width volume, 24 columns: # vxassist -g testdg make vol1 50T layout=striped stripewidth=128 `vxdisk list|grep storage|awk '{print $1}'` v pl sd sd sd sd sd sd sd sd sd sd sd sd sd sd sd sd sd sd sd sd sd sd sd sd vol1 ENABLED ACTIVE 107374182400 SELECT vol1-01 fsgen vol1-01 vol1 ENABLED ACTIVE 107374184448 STRIPE 24/128 RW storagearray-0_16-01 vol1-01 storagearray-0_16 0 4473924352 0/0 storagearray-0_16 storagearray-0_17-01 vol1-01 storagearray-0_17 0 4473924352 1/0 storagearray-0_17 storagearray-0_18-01 vol1-01 storagearray-0_18 0 4473924352 2/0 storagearray-0_18 storagearray-0_20-01 vol1-01 storagearray-0_20 0 4473924352 3/0 storagearray-0_20 storagearray-1_6-01 vol1-01 storagearray-1_6 0 4473924352 4/0 storagearray-1_6 storagearray-1_7-01 vol1-01 storagearray-1_7 0 4473924352 5/0 storagearray-1_7 storagearray-1_8-01 vol1-01 storagearray-1_8 0 4473924352 6/0 storagearray-1_8 storagearray-1_9-01 vol1-01 storagearray-1_9 0 4473924352 7/0 storagearray-1_9 storagearray-2_5-01 vol1-01 storagearray-2_5 0 4473924352 8/0 storagearray-2_5 storagearray-2_6-01 vol1-01 storagearray-2_6 0 4473924352 9/0 storagearray-2_6 storagearray-2_7-01 vol1-01 storagearray-2_7 0 4473924352 10/0 storagearray-2_7 storagearray-2_8-01 vol1-01 storagearray-2_8 0 4473924352 11/0 storagearray-2_8 storagearray-3_4-01 vol1-01 storagearray-3_4 0 4473924352 12/0 storagearray-3_4 storagearray-3_6-01 vol1-01 storagearray-3_6 0 4473924352 13/0 storagearray-3_6 storagearray-3_7-01 vol1-01 storagearray-3_7 0 4473924352 14/0 storagearray-3_7 storagearray-3_8-01 vol1-01 storagearray-3_8 0 4473924352 15/0 storagearray-3_8 storagearray-4_8-01 vol1-01 storagearray-4_8 0 4473924352 16/0 storagearray-4_8 storagearray-4_9-01 vol1-01 storagearray-4_9 0 4473924352 17/0 storagearray-4_9 storagearray-4_10-01 vol1-01 storagearray-4_10 0 4473924352 18/0 storagearray-4_10 storagearray-4_11-01 vol1-01 storagearray-4_11 0 4473924352 19/0 storagearray-4_11 storagearray-5_8-01 vol1-01 storagearray-5_8 0 4473924352 20/0 storagearray-5_8 storagearray-5_9-01 vol1-01 storagearray-5_9 0 4473924352 21/0 storagearray-5_9 storagearray-5_10-01 vol1-01 storagearray-5_10 0 4473924352 22/0 storagearray-5_10 storagearray-5_11-01 vol1-01 storagearray-5_11 0 4473924352 23/0 storagearray-5_11 ENA ENA ENA ENA ENA ENA ENA ENA ENA ENA ENA ENA ENA ENA ENA ENA ENA ENA ENA ENA ENA ENA ENA ENA Volume configuration using 512k stripe width volume, 24 columns: # vxassist -g testdg make vol1 50T layout=striped stripewidth=1024 `vxdisk list|grep storage|awk '{print $1}'` v pl sd sd sd sd sd sd sd sd sd sd sd sd sd sd sd sd sd sd sd sd sd sd sd sd vol1 ENABLED ACTIVE 107374182400 SELECT vol1-01 fsgen vol1-01 vol1 ENABLED ACTIVE 107374190592 STRIPE 24/1024 RW storagearray-0_16-01 vol1-01 storagearray-0_16 0 4473924608 0/0 storagearray-0_16 storagearray-0_17-01 vol1-01 storagearray-0_17 0 4473924608 1/0 storagearray-0_17 storagearray-0_18-01 vol1-01 storagearray-0_18 0 4473924608 2/0 storagearray-0_18 storagearray-0_20-01 vol1-01 storagearray-0_20 0 4473924608 3/0 storagearray-0_20 storagearray-1_6-01 vol1-01 storagearray-1_6 0 4473924608 4/0 storagearray-1_6 storagearray-1_7-01 vol1-01 storagearray-1_7 0 4473924608 5/0 storagearray-1_7 storagearray-1_8-01 vol1-01 storagearray-1_8 0 4473924608 6/0 storagearray-1_8 storagearray-1_9-01 vol1-01 storagearray-1_9 0 4473924608 7/0 storagearray-1_9 storagearray-2_5-01 vol1-01 storagearray-2_5 0 4473924608 8/0 storagearray-2_5 storagearray-2_6-01 vol1-01 storagearray-2_6 0 4473924608 9/0 
storagearray-2_6 storagearray-2_7-01 vol1-01 storagearray-2_7 0 4473924608 10/0 storagearray-2_7 storagearray-2_8-01 vol1-01 storagearray-2_8 0 4473924608 11/0 storagearray-2_8 storagearray-3_4-01 vol1-01 storagearray-3_4 0 4473924608 12/0 storagearray-3_4 storagearray-3_6-01 vol1-01 storagearray-3_6 0 4473924608 13/0 storagearray-3_6 storagearray-3_7-01 vol1-01 storagearray-3_7 0 4473924608 14/0 storagearray-3_7 storagearray-3_8-01 vol1-01 storagearray-3_8 0 4473924608 15/0 storagearray-3_8 storagearray-4_8-01 vol1-01 storagearray-4_8 0 4473924608 16/0 storagearray-4_8 storagearray-4_9-01 vol1-01 storagearray-4_9 0 4473924608 17/0 storagearray-4_9 storagearray-4_10-01 vol1-01 storagearray-4_10 0 4473924608 18/0 storagearray-4_10 storagearray-4_11-01 vol1-01 storagearray-4_11 0 4473924608 19/0 storagearray-4_11 storagearray-5_8-01 vol1-01 storagearray-5_8 0 4473924608 20/0 storagearray-5_8 storagearray-5_9-01 vol1-01 storagearray-5_9 0 4473924608 21/0 storagearray-5_9 storagearray-5_10-01 vol1-01 storagearray-5_10 0 4473924608 22/0 storagearray-5_10 storagearray-5_11-01 vol1-01 storagearray-5_11 0 4473924608 23/0 storagearray-5_11 ENA ENA ENA ENA ENA ENA ENA ENA ENA ENA ENA ENA ENA ENA ENA ENA ENA ENA ENA ENA ENA ENA ENA ENA Volume configuration using 1024k stripe width volume, 24 columns: # vxassist -g testdg make vol1 50T layout=striped stripewidth=2048 `vxdisk list|grep storage|awk '{print $1}'` v pl sd sd sd sd sd sd sd sd sd sd sd sd sd sd sd sd sd sd sd sd sd sd sd sd vol1 ENABLED ACTIVE 107374182400 SELECT vol1-01 fsgen vol1-01 vol1 ENABLED ACTIVE 107374215168 STRIPE 24/2048 RW storagearray-0_16-01 vol1-01 storagearray-0_16 0 4473925632 0/0 storagearray-0_16 storagearray-0_17-01 vol1-01 storagearray-0_17 0 4473925632 1/0 storagearray-0_17 storagearray-0_18-01 vol1-01 storagearray-0_18 0 4473925632 2/0 storagearray-0_18 storagearray-0_20-01 vol1-01 storagearray-0_20 0 4473925632 3/0 storagearray-0_20 storagearray-1_6-01 vol1-01 storagearray-1_6 0 4473925632 4/0 storagearray-1_6 storagearray-1_7-01 vol1-01 storagearray-1_7 0 4473925632 5/0 storagearray-1_7 storagearray-1_8-01 vol1-01 storagearray-1_8 0 4473925632 6/0 storagearray-1_8 storagearray-1_9-01 vol1-01 storagearray-1_9 0 4473925632 7/0 storagearray-1_9 storagearray-2_5-01 vol1-01 storagearray-2_5 0 4473925632 8/0 storagearray-2_5 storagearray-2_6-01 vol1-01 storagearray-2_6 0 4473925632 9/0 storagearray-2_6 storagearray-2_7-01 vol1-01 storagearray-2_7 0 4473925632 10/0 storagearray-2_7 storagearray-2_8-01 vol1-01 storagearray-2_8 0 4473925632 11/0 storagearray-2_8 storagearray-3_4-01 vol1-01 storagearray-3_4 0 4473925632 12/0 storagearray-3_4 storagearray-3_6-01 vol1-01 storagearray-3_6 0 4473925632 13/0 storagearray-3_6 storagearray-3_7-01 vol1-01 storagearray-3_7 0 4473925632 14/0 storagearray-3_7 storagearray-3_8-01 vol1-01 storagearray-3_8 0 4473925632 15/0 storagearray-3_8 storagearray-4_8-01 vol1-01 storagearray-4_8 0 4473925632 16/0 storagearray-4_8 storagearray-4_9-01 vol1-01 storagearray-4_9 0 4473925632 17/0 storagearray-4_9 storagearray-4_10-01 vol1-01 storagearray-4_10 0 4473925632 18/0 storagearray-4_10 storagearray-4_11-01 vol1-01 storagearray-4_11 0 4473925632 19/0 storagearray-4_11 storagearray-5_8-01 vol1-01 storagearray-5_8 0 4473925632 20/0 storagearray-5_8 storagearray-5_9-01 vol1-01 storagearray-5_9 0 4473925632 21/0 storagearray-5_9 storagearray-5_10-01 vol1-01 storagearray-5_10 0 4473925632 22/0 storagearray-5_10 storagearray-5_11-01 vol1-01 storagearray-5_11 0 4473925632 23/0 storagearray-5_11 ENA ENA 
ENA ENA ENA ENA ENA ENA ENA ENA ENA ENA ENA ENA ENA ENA ENA ENA ENA ENA ENA ENA ENA ENA 2. VxVM maximum disk I/O: Read throughput test execution Raw disk device read I/O test execution and collection of throughput results: vxbench sequential read test execution method and result collection An example of the vxbench command that we run on each node is below. This test executes 64 parallel processes, each process is reading from the same raw volume device, reading using a block size of 1MB. The output of the vxbench command provides the combined total throughput of all 64 parallel processes, we capture this information in our result table. The result in this example test was Therefore the result of this test was 1577033.29 KBytes/second 1.504 GBytes/second Test: vxbench IO : sequential read of raw volume IOsize=1024K VxVM volume stripe width 512KB, 24 columns Processes: 64 $ ./vxbench -w read -i iosize=1024k,iotime=300,maxfilesize=40T /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 user 1: 300.015 sec 24625.94 KB/s cpu: 0.75 user 2: 300.024 sec 24498.95 KB/s cpu: 0.75 user 3: 300.004 sec 24667.86 KB/s cpu: 0.75 user 4: 300.020 sec 24417.35 KB/s cpu: 0.75 user 5: 300.016 sec 24574.65 KB/s cpu: 0.74 user 6: 300.012 sec 24615.97 KB/s cpu: 0.74 user 7: 300.029 sec 24689.68 KB/s cpu: 0.76 user 8: 300.023 sec 24587.75 KB/s cpu: 0.75 user 9: 300.032 sec 24668.98 KB/s cpu: 0.76 user 10: 300.024 sec 24795.84 KB/s cpu: 0.76 user 11: 300.033 sec 24546.01 KB/s cpu: 0.75 user 12: 300.024 sec 24761.75 KB/s cpu: 0.76 user 13: 300.028 sec 24543.02 KB/s cpu: 0.76 user 14: 300.014 sec 24591.96 KB/s cpu: 0.75 user 15: 300.013 sec 24568.13 KB/s cpu: 0.75 user 16: 300.037 sec 24624.20 KB/s cpu: 0.75 user 17: 300.018 sec 24734.97 KB/s cpu: 0.76 user 18: 300.003 sec 24596.26 KB/s cpu: 0.76 user 19: 300.004 sec 24886.31 KB/s cpu: 0.77 user 20: 300.007 sec 24879.24 KB/s cpu: 0.76 user 21: 300.017 sec 24434.71 KB/s cpu: 0.75 user 22: 300.027 sec 24437.31 KB/s cpu: 0.76 user 23: 300.019 sec 24635.87 KB/s cpu: 0.75 user 24: 300.028 sec 24665.88 KB/s cpu: 0.76 user 25: 300.021 sec 24519.64 KB/s cpu: 0.75 user 26: 300.022 sec 24587.85 KB/s cpu: 0.76 user 27: 300.006 sec 24647.22 KB/s cpu: 0.77 user 28: 300.019 sec 24666.62 KB/s cpu: 0.76 user 29: 300.006 sec 24544.82 KB/s cpu: 0.76 user 30: 300.022 sec 24625.35 KB/s 
cpu: 0.75 user 31: 300.021 sec 24649.38 KB/s cpu: 0.75 user 32: 300.016 sec 24701.01 KB/s cpu: 0.76 user 33: 300.018 sec 24683.74 KB/s cpu: 0.75 user 34: 300.018 sec 24738.38 KB/s cpu: 0.77 user 35: 300.001 sec 24599.78 KB/s cpu: 0.75 user 36: 300.008 sec 24674.30 KB/s cpu: 0.76 user 37: 300.024 sec 24580.86 KB/s cpu: 0.75 user 38: 300.023 sec 24628.71 KB/s cpu: 0.75 user 39: 300.007 sec 24701.75 KB/s cpu: 0.77 user 40: 300.026 sec 24765.01 KB/s cpu: 0.76 user 41: 300.007 sec 24824.63 KB/s cpu: 0.76 user 42: 300.015 sec 24707.90 KB/s cpu: 0.78 user 43: 300.032 sec 24587.01 KB/s cpu: 0.76 user 44: 300.027 sec 24700.06 KB/s cpu: 0.78 user 45: 300.019 sec 24584.70 KB/s cpu: 0.77 user 46: 300.013 sec 24745.56 KB/s cpu: 0.78 user 47: 300.033 sec 24556.21 KB/s cpu: 0.77 user 48: 300.012 sec 24728.58 KB/s cpu: 0.77 user 49: 300.010 sec 24489.82 KB/s cpu: 0.76 user 50: 300.020 sec 24751.83 KB/s cpu: 0.76 user 51: 300.035 sec 24846.13 KB/s cpu: 0.77 user 52: 300.012 sec 24639.83 KB/s cpu: 0.75 user 53: 300.010 sec 24691.24 KB/s cpu: 0.77 user 54: 300.029 sec 24686.29 KB/s cpu: 0.77 user 55: 300.021 sec 24608.41 KB/s cpu: 0.77 user 56: 300.027 sec 24440.67 KB/s cpu: 0.77 user 57: 300.017 sec 24700.92 KB/s cpu: 0.77 user 58: 300.026 sec 24645.57 KB/s cpu: 0.77 user 59: 300.004 sec 24442.54 KB/s cpu: 0.76 user 60: 300.011 sec 24749.21 KB/s cpu: 0.77 user 61: 300.006 sec 24865.61 KB/s cpu: 0.77 user 62: 300.023 sec 24468.29 KB/s cpu: 0.75 user 63: 300.023 sec 24662.87 KB/s cpu: 0.77 user 64: 300.017 sec 24646.26 KB/s cpu: 0.76 total: 300.037 sec 1577033.29 KB/s /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 sys 0.00 user sys 0.00 user sys 0.00 user sys 0.00 user sys 0.01 user sys 0.01 user sys 0.00 user sys 0.00 user sys 0.00 user sys 0.00 user sys 0.00 user sys 0.00 user sys 0.00 user sys 0.01 user sys 0.00 user sys 0.00 user sys 0.00 user sys 0.00 user sys 0.00 user sys 0.00 user sys 0.00 user sys 0.00 user sys 0.00 user sys 0.00 user sys 0.00 user sys 0.00 user sys 0.00 user sys 0.00 user sys 0.00 user sys 0.00 user sys 0.00 user sys 0.00 user sys 0.00 user sys 0.00 user sys 0.00 user sys 0.00 user sys 0.00 user sys 0.00 user sys 0.00 user sys 0.00 user sys 0.00 user sys 0.00 user sys 0.00 user sys 0.00 user sys 0.00 user sys 0.00 user sys 0.00 user sys 0.01 user sys 0.00 user sys 0.01 user sys 0.00 user sys 0.00 user sys 0.00 user sys 0.00 user sys 0.00 user sys 0.00 user sys 0.00 user sys 0.00 user sys 0.00 user sys 0.00 user sys 0.00 user sys 0.00 user sys 0.00 user sys 0.00 user cpu: 48.63 sys 0.05 user /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 iostat throughput data method and result collection An example of the iostat command that we run on each node is below. The sector size is 512bytes. Note that the average request size avgrq-sz is 1024, this is 1024 sectors * 512bytes = 512KB read I/O size. 
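The rsec/s and total figures reported by iostat are in 512-byte sectors per second; if you want to convert the volume total to Gbytes/sec yourself, a small awk sketch such as the following can be used (3155251.20 is the VxVM volume total from the iostat sample shown next):

$ awk 'BEGIN { printf "%.2f GB/s\n", 3155251.20 * 512 / (1024 * 1024 * 1024) }'
1.50 GB/s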
The result in this example test is Therefore the result of the test is 3155251.20 sectors/second 1.504 GBytes/second $iostat –x 20 Device: sda sdc sdd sdb sdp sdo sdn sds sdt sdq sdr sdx sdz sdy sdaa sdm sdv sdu sdw sdg sdj sdi sdl sdk sdf sde sdh sdab sdac sdad sdae sdaf sdag sdah sdai sdaj sdak sdal sdam sdan sdao sdap sdaq sdar sdas sdat sdau sdav sdaw sdax sday sdaz VxVM59000 100.00 rrqm/s 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 wrqm/s r/s 4.30 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 61.95 0.00 64.95 0.00 64.90 0.00 64.90 0.00 64.75 0.00 63.20 0.00 64.75 0.00 62.40 0.00 66.60 0.00 64.65 0.00 62.15 0.00 63.90 0.00 62.75 0.00 64.15 0.00 66.45 0.00 62.80 0.00 64.45 0.00 65.45 0.00 64.05 0.00 64.95 0.00 65.35 0.00 62.45 0.00 65.25 0.00 64.30 0.00 63.55 0.00 63.20 0.00 66.40 0.00 62.90 0.00 63.95 0.00 63.25 0.00 64.20 0.00 65.70 0.00 62.75 0.00 65.40 0.00 62.90 0.00 66.00 0.00 64.05 0.00 62.05 0.00 66.30 0.00 64.40 0.00 65.80 0.00 62.30 0.00 65.25 0.00 63.75 0.00 63.55 0.00 63.80 0.00 63.55 0.00 64.95 0.00 3081.30 w/s 1.25 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 rsec/s wsec/s avgrq-sz avgqu-sz 0.00 44.40 35.52 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 63436.80 0.00 1024.00 1.87 66508.80 0.00 1024.00 1.82 66457.60 0.00 1024.00 2.03 66457.60 0.00 1024.00 2.05 66304.00 0.00 1024.00 2.16 64716.80 0.00 1024.00 1.88 66304.00 0.00 1024.00 2.05 63897.60 0.00 1024.00 2.05 68198.40 0.00 1024.00 2.37 66201.60 0.00 1024.00 2.37 63641.60 0.00 1024.00 2.17 65433.60 0.00 1024.00 1.96 64256.00 0.00 1024.00 2.26 65689.60 0.00 1024.00 2.32 68044.80 0.00 1024.00 2.25 64307.20 0.00 1024.00 2.39 65996.80 0.00 1024.00 2.14 67020.80 0.00 1024.00 2.02 65587.20 0.00 1024.00 2.10 66508.80 0.00 1024.00 2.28 66918.40 0.00 1024.00 2.59 63948.80 0.00 1024.00 2.38 66816.00 0.00 1024.00 2.43 65843.20 0.00 1024.00 2.15 65075.20 0.00 1024.00 2.23 64716.80 0.00 1024.00 2.01 67993.60 0.00 1024.00 2.25 64409.60 0.00 1024.00 1.89 65484.80 0.00 1024.00 2.08 64768.00 0.00 1024.00 2.18 65740.80 0.00 1024.00 2.07 67276.80 0.00 1024.00 2.44 64256.00 0.00 1024.00 2.44 66969.60 0.00 1024.00 2.41 64409.60 0.00 1024.00 2.26 67584.00 0.00 1024.00 2.13 65587.20 0.00 1024.00 2.28 63539.20 0.00 1024.00 2.15 67891.20 0.00 1024.00 2.27 65945.60 0.00 1024.00 2.17 67379.20 0.00 1024.00 2.24 63795.20 0.00 1024.00 1.98 66816.00 0.00 1024.00 2.00 65280.00 0.00 1024.00 2.05 65075.20 0.00 1024.00 2.11 65331.20 0.00 1024.00 2.21 65075.20 0.00 1024.00 2.39 66508.80 0.00 1024.00 2.35 3155251.20 0.00 1024.00 104.68 await svctm %util 0.12 0.04 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 30.22 12.73 78.84 27.99 12.35 80.21 31.32 12.92 83.84 31.60 12.76 82.79 33.38 12.47 80.75 29.71 12.37 78.20 31.67 12.28 79.48 32.81 13.16 82.10 35.60 12.27 81.70 36.69 12.89 83.35 34.92 13.56 84.25 30.63 13.05 83.36 36.05 12.86 80.72 36.25 13.08 83.93 33.89 12.47 82.86 38.07 13.43 84.37 33.04 13.31 85.81 30.76 12.47 81.65 32.67 12.64 80.97 34.90 13.01 84.53 39.63 12.90 84.31 38.16 12.90 80.53 37.21 12.73 83.05 33.46 13.61 87.53 35.19 13.48 85.69 31.87 13.19 83.39 33.89 12.58 83.50 30.10 12.73 80.09 32.68 
13.12 83.93 34.56 13.00 82.22 32.21 12.40 79.60 37.07 12.33 81.03 38.80 13.33 83.64 36.79 13.08 85.56 35.92 12.97 81.59 32.23 12.36 81.59 35.60 13.04 83.54 34.65 12.77 79.26 34.21 12.95 85.87 33.60 13.43 86.51 33.90 12.42 81.74 31.74 13.24 82.46 30.52 12.21 79.66 32.12 12.43 79.27 33.08 13.10 83.22 34.69 12.86 82.07 37.70 13.13 83.41 36.08 13.09 85.01 33.97 0.32 vxstat throughput data collection method and result collection An example of the vxstat command that we run on each node is below. The blocks in the vxstat output are in units of sectors, so the block size is 512bytes. Note that ‘blocks read / operations read’ gives the average I/O size: 2624512 BLOCKS READ / 2563 OPERATIONS READ / 2 = 512KB avg. read I/O size The result in this example test is Therefore the result of the test is 63109120 blocks (512 byte sectors) read every 20 seconds 1.504 GBytes/second $ vxstat -g testdg -vd –I 20 TYP Fri dm dm dm dm dm dm dm dm dm dm dm dm dm dm dm dm dm dm dm dm dm dm dm dm vol NAME 27 Feb 2015 12:49:49 PM IST storagearray-0_16 storagearray-0_17 storagearray-0_18 storagearray-0_20 storagearray-1_6 storagearray-1_7 storagearray-1_8 storagearray-1_9 storagearray-2_5 storagearray-2_6 storagearray-2_7 storagearray-2_8 storagearray-3_4 storagearray-3_6 storagearray-3_7 storagearray-3_8 storagearray-4_8 storagearray-4_9 storagearray-4_10 storagearray-4_11 storagearray-5_8 storagearray-5_9 storagearray-5_10 storagearray-5_11 vol1 OPERATIONS READ WRITE 2563 2564 2568 2568 2570 2569 2572 2573 2576 2572 2570 2569 2570 2568 2570 2572 2567 2567 2566 2564 2563 2563 2563 2563 61630 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 BLOCKS READ 2624512 2625536 2629632 2629632 2631680 2630656 2633728 2634752 2637824 2633728 2631680 2630656 2631680 2629632 2631680 2633728 2628608 2628608 2627584 2625536 2624512 2624512 2624512 2624512 63109120 WRITE 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 AVG TIME(ms) READ WRITE 29.86 32.01 32.08 33.20 32.47 34.50 35.12 36.11 32.81 34.88 34.93 36.84 30.09 32.30 31.84 33.96 30.40 32.82 32.41 34.69 36.54 37.37 37.57 39.20 33.92 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 portperfshow – FC switch port throughput data collection method and result collection An example of the command used to collect the throughput at the switch port is below. The portperfshow command reports the throughput for one switch, so two ‘portperfshow’ commands are executed, one for each FC switch. The ‘portperfshow’ total is no use here, as we only want to collect the data for the specific ports that are connected to the host HBA FC ports. In our test case this is port3 and port7. The other six ports are connected to the six modular storage arrays. FC_switch1:admin> portperfshow 0 1 2 3 4 5 6 7 8 9 10 11 12 13 ... Total ============================================================================================================== 234.4m 237.3m 238.5m 704.4m 231.4m 242.6m 239.0m 717.7m 0 0 0 0 0 0 ... 2.8g FC_switch2:admin> portperfshow 0 1 2 3 4 5 6 7 8 9 10 11 12 13 ... Total =============================================================================================================== 236.7m 236.0m 237.8m 708.0m 231.5m 238.2m 232.8m 715.5m 0 0 0 0 0 0 ... 
2.8g

Therefore we have to add:
  Switch1: port3 704.4 + port7 717.7 = 1422.1 MB/sec = 1.388769 Gbytes/sec
  Switch2: port3 708.0 + port7 715.5 = 1423.5 MB/sec = 1.390137 Gbytes/sec
  Total:                                             = 2.778906 Gbytes/sec

NOTE: Measuring the throughput at the switch port always shows a higher reading than the throughput measured by vxbench/vxstat/iostat. The measurement at the switch port is higher due to 8b/10b encoding overhead. The I/O throughput reading is therefore best taken from vxbench/vxstat/iostat and not at the FC switch port.

Referring to the "Fibre channel roadmap v1.8" table at http://fibrechannel.org/fibre-channelroadmaps.html: the 8GFC throughput is 1600MB/sec for full duplex, therefore the net throughput for each direction will be 800MB/sec. As the HBA is a dual port card, the maximum theoretical throughput for each direction will be 1600MB/sec.
However, http://en.wikipedia.org/wiki/Fibre_Channel shows 8GFC is actually 797MB/sec for each direction. Therefore, using our dual port card, the maximum theoretical throughput for each direction will be 1594MB/sec (1.5566 GB/sec).
Therefore, per the specification, the maximum theoretical throughput in our environment will be 1.5566 GB/sec per node.

Above we are measuring the throughput from both nodes at the FC switch ports whilst the test is running on both nodes; this bottlenecked at the storage arrays. The bottleneck at the storage array is approximately 2.5 Gbytes/sec, however measuring the throughput at the FC switch ports gives a higher reading of 2.77 Gbytes/sec, due to the reasons explained above.

3. VxVM maximum disk I/O: Test results and conclusions

Raw volume device disk I/O throughput test results summary in Gbits per second:

Test program: vxbench
IO          : sequential read of raw volume, IOsize=1024K
VxVM volume : stripe widths 64KB, 512KB and 1024KB; 24 columns
Processes   : 64

Summary of raw volume throughput (Gbits/sec)
Stripe width  Nodes  vxbench  iostat   Summary Gbits/sec  Recommended
64k           1      11.429   11.485   11.5
64k           2      19.457   19.543   19.5
512k          1      12.032   12.040   12.0               YES
512k          2      20.552   20.557   20.5               YES
1024k         1      12.029   12.037   12.0
1024k         2      20.341   20.331   20.3

Raw volume device disk I/O throughput detailed test results in GBytes per second
(1st/2nd refer to the nodes for vxbench/iostat/vxstat and to the switches for the FC Switch columns):

Stripe width  Nodes  vxbench (1st / 2nd / Total)   iostat (1st / 2nd / Total)    vxstat (1st / 2nd / Total)    FC Switch (Sw1 / Sw2 / Total)
64k           1      1.429 / -     / 1.429         1.436 / -     / 1.436         1.428 / -     / 1.428         0.782 / 0.795 / 1.577
64k           2      1.215 / 1.217 / 2.432         1.218 / 1.225 / 2.443         1.214 / 1.220 / 2.434         1.306 / 1.304 / 2.610
512k          1      1.504 / -     / 1.504         1.505 / -     / 1.505         1.504 / -     / 1.504         0.803 / 0.829 / 1.632
512k          2      1.285 / 1.284 / 2.569         1.284 / 1.286 / 2.570         1.286 / 1.287 / 2.573         1.372 / 1.370 / 2.741
1024k         1      1.504 / -     / 1.504         1.505 / -     / 1.505         1.505 / -     / 1.505         0.817 / 0.813 / 1.629
1024k         2      1.272 / 1.271 / 2.543         1.269 / 1.273 / 2.541         1.272 / 1.273 / 2.545         1.357 / 1.359 / 2.716

Conclusions and recommendations so far:
a. Maximum I/O size setting (RHEL6.5)
   The default operating system maximum I/O size is 512KB; there is no need to change the operating system's default maximum I/O size tunable values.
b. VxVM stripe width setting
   The optimal VxVM stripe width for media server solutions is also 512KB; Veritas therefore recommend using a VxVM stripe width of 512KB.
c. VxVM stripe columns setting
   The hardware was configured to achieve maximum throughput when accessing all the available LUNs. The number of LUNs available using our storage configuration was 24. We therefore used all 24 LUNs in our VxVM volume to maximize the storage I/O bandwidth.
d. Balanced I/O
   Using a VxVM stripe width of 512KB and 24 columns, and utilizing all paths, we were able to achieve balanced I/O across all the LUNs (see the iostat output). This then allowed us to easily identify the HBA bottleneck (using a single node) and the storage bottleneck (using both nodes).
e. Maximum achievable read I/O throughput using our hardware configuration
   o 12Gbits/sec (1.5Gbytes/sec) performing I/O from one node: using our hardware configuration, we identified that the dual FC port HBA had a throughput bottleneck of 12Gbits/sec (1.5Gbytes/sec) – this is the maximum throughput we can achieve from each node.
   o 20Gbits/sec (2.5Gbytes/sec) performing I/O from two nodes: using our hardware configuration, we identified the storage bottleneck of 20Gbits/sec (2.5Gbytes/sec).
f. Conclusion: From this point onwards we now know the maximum throughput achievable using our hardware configuration.

4. VxFS direct I/O maximum disk I/O: Read throughput test execution

This VxFS direct I/O test mimics the VxVM raw disk test by performing direct I/O to one file that contains a single contiguous extent. Thereby, all the vxbench processes begin reading from the same device offset. This VxFS direct I/O test is therefore equivalent to the VxVM raw device test; only the starting offset into the device is different.

Here are the details of the file we created for this test:

# ls -li file1
4 -rw-r--r-- 1 root root 34359738368 Mar 3 14:25 file1
# ls -lhi file1
4 -rw-r--r-- 1 root root 32G Mar 3 14:25 file1
# du -h file1
32G file1

One file with a single contiguous extent of size 32GB:

# fsmap -HA ./file1
Volume  Extent Type  File Offset  Dev Offset   Extent Size  Inode#
vol1    Data         0            34359738368  32.00 GB     4

Here is how we created this file and performed this test; note that we strongly recommend a file system block size of 8192:

// mkfs
$ mkfs -t vxfs /dev/vx/rdsk/testdg/vol1
    version 10 layout
    107374182400 sectors, 6710886400 blocks of size 8192, log size 32768 blocks
    rcq size 8192 blocks
    largefiles supported
    maxlink supported

Note that for optimal read performance we recommend using the mount option of "noatime". The 'noatime' mount option prevents the inode access time being updated for every read operation.
// mount
$ mount -t vxfs -o noatime,largefiles,cluster /dev/vx/dsk/testdg/vol1 /data1

// create a file with a single 32GB extent and write to it
$ touch /data1/file1
$ /opt/VRTS/bin/setext -r 4194304 -f contig /data1/file1
$ dd if=/dev/zero of=/data1/file1 bs=128k count=262144
262144+0 records in
262144+0 records out
34359738368 bytes (34 GB) copied, 24.0118 s, 1.4 GB/s
$ /opt/VRTS/bin/fsmap -A /data1/file1
Volume  Extent Type  File Offset  Dev Offset   Extent Size  Inode#
vol1    Data         0            34359738368  34359738368  4
$ ls -lh /data1/file1
-rw-r--r-- 1 root root 32G Mar 3 14:12 /data1/file1

// umount the file system to clear the file data from memory
$ umount /data1

// mount the file system from both nodes
$ mount -t vxfs -o noatime,largefiles,cluster /dev/vx/dsk/testdg/vol1 /data1

// vxbench command execution, 64 processes reading from the same file using direct I/O
$ ./vxbench -w read -c direct -i iosize=1024k,iotime=300,maxfilesize=32G
/data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1
/data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1
/data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1
/data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1
/data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1
/data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1
/data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1
/data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1

5. VxFS direct I/O maximum disk I/O: Test results

As expected, the results are the same as the VxVM raw disk read throughput test (all results in GBytes/second;
1st/2nd refer to the nodes for vxbench/iostat/vxstat and to the switches for the FC Switch columns):

VxFS direct I/O
Stripe width  Nodes  vxbench (1st / 2nd / Total)   iostat (1st / 2nd / Total)    vxstat (1st / 2nd / Total)    FC Switch (Sw1 / Sw2 / Total)
64k           1      1.423 / -     / 1.423         1.428 / -     / 1.428         1.428 / -     / 1.428         0.769 / 0.768 / 1.537
64k           2      1.213 / 1.209 / 2.423         1.217 / 1.208 / 2.425         1.217 / 1.208 / 2.425         1.294 / 1.302 / 2.596
512k          1      1.502 / -     / 1.502         1.504 / -     / 1.504         1.504 / -     / 1.504         0.801 / 0.802 / 1.603
512k          2      1.282 / 1.281 / 2.563         1.283 / 1.283 / 2.566         1.283 / 1.283 / 2.566         1.370 / 1.364 / 2.734
1024k         1      1.502 / -     / 1.502         1.502 / -     / 1.502         1.502 / -     / 1.502         0.802 / 0.802 / 1.604
1024k         2      1.271 / 1.268 / 2.539         1.271 / 1.271 / 2.541         1.271 / 1.271 / 2.541         1.352 / 1.361 / 2.713

Using a stripe-width of 512KB is recommended by VERITAS. The I/O is evenly balanced across all 24 LUNs.
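If you want to verify the balance yourself on a running system, the per-device read rate can be extracted from an iostat sample; a minimal sketch, assuming the RHEL6 "iostat -x" column layout shown in this report (column 1 is the device name, column 6 is rsec/s), is:

// capture one 20-second interval, then print the read rate per sd device;
// roughly equal values across the 48 sd devices indicate balanced I/O
// (the first report printed by iostat is the since-boot average, so only the second report is meaningful)
$ iostat -x 20 2 > /tmp/iostat.out
$ awk '/^sd/ { print $1, $6 }' /tmp/iostat.out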
Below is the iostat output showing all 48 paths (1-node test): IOstat Device: sde sdf sdg sdh sdi sdj sdk sdl sdm sdn sdo sdp sdq sdr sds sdt sdu sdv sdw sdx sdy sdz sdaa sdab sdac sdad sdae sdaf sdag sdah sdai sdak sdal sdaj sdam sdan sdaq sdao sdap sdar sdas sdat sdau sdav sdaw rrqm/s 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 wrqm/s 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 r/s 65.25 63.85 64.95 63.45 66.50 62.90 64.15 65.00 64.95 66.60 62.95 61.20 62.95 65.70 66.40 63.85 62.60 65.00 62.65 64.85 62.80 64.75 62.20 63.85 61.75 65.25 64.10 63.25 62.95 64.30 63.30 65.35 62.55 64.80 61.90 64.40 66.10 65.55 63.50 64.55 65.80 63.25 65.75 63.25 63.30 w/s 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 rsec/s 66816.00 65382.40 66508.80 64972.80 68096.00 64409.60 65689.60 66560.00 66508.80 68198.40 64460.80 62668.80 64460.80 67276.80 67993.60 65382.40 64102.40 66560.00 64153.60 66406.40 64307.20 66304.00 63692.80 65382.40 63232.00 66816.00 65638.40 64768.00 64460.80 65843.20 64819.20 66918.40 64051.20 66355.20 63385.60 65945.60 67686.40 67123.20 65024.00 66099.20 67379.20 64768.00 67328.00 64768.00 64819.20 wsec/s avgrq-sz avgqu-sz 0.00 1024.00 2.26 0.00 1024.00 2.32 0.00 1024.00 2.23 0.00 1024.00 2.09 0.00 1024.00 2.13 0.00 1024.00 2.19 0.00 1024.00 2.27 0.00 1024.00 2.23 0.00 1024.00 2.12 0.00 1024.00 2.21 0.00 1024.00 1.94 0.00 1024.00 1.93 0.00 1024.00 1.98 0.00 1024.00 2.23 0.00 1024.00 2.25 0.00 1024.00 2.20 0.00 1024.00 2.32 0.00 1024.00 2.36 0.00 1024.00 2.20 0.00 1024.00 2.50 0.00 1024.00 1.84 0.00 1024.00 2.13 0.00 1024.00 1.95 0.00 1024.00 2.00 0.00 1024.00 1.98 0.00 1024.00 2.18 0.00 1024.00 2.22 0.00 1024.00 2.08 0.00 1024.00 2.19 0.00 1024.00 2.31 0.00 1024.00 2.22 0.00 1024.00 2.13 0.00 1024.00 2.17 0.00 1024.00 2.15 0.00 1024.00 2.19 0.00 1024.00 2.31 0.00 1024.00 2.46 0.00 1024.00 2.28 0.00 1024.00 2.44 0.00 1024.00 2.40 0.00 1024.00 2.10 0.00 1024.00 2.01 0.00 1024.00 1.98 0.00 1024.00 2.10 0.00 1024.00 1.97 await 34.55 36.19 34.36 32.92 32.10 34.79 35.41 34.35 32.69 33.34 30.84 31.63 31.52 33.94 33.97 34.40 37.00 36.23 35.17 38.48 29.32 32.81 31.29 31.35 31.93 33.30 34.64 32.85 34.82 36.00 35.10 32.52 34.68 33.23 35.28 35.80 37.17 34.79 38.47 37.14 32.03 31.74 30.15 33.10 31.20 svctm 12.69 13.34 13.30 12.82 12.76 13.83 13.41 12.72 13.03 12.66 12.92 13.12 13.15 12.65 13.16 13.47 13.60 12.79 13.05 13.12 12.63 12.73 12.58 13.22 13.00 13.15 13.11 12.67 12.93 13.21 13.36 12.42 12.67 12.53 13.44 12.98 12.69 12.84 13.48 13.59 12.92 12.76 12.60 13.19 13.32 %util 82.80 85.18 86.39 81.34 84.83 86.98 86.04 82.65 84.61 84.32 81.36 80.28 82.80 83.10 87.41 85.97 85.17 83.13 81.75 85.11 79.30 82.45 78.24 84.43 80.28 85.79 84.05 80.12 81.40 84.95 84.54 81.17 79.22 81.22 83.18 83.58 83.87 84.16 85.58 87.73 85.01 80.72 82.86 83.44 84.32 sdax sday sdaz VxVM40000 0.00 0.00 0.00 0.00 0.00 61.85 0.00 65.35 0.00 67.15 0.00 3078.65 0.00 0.00 0.00 0.00 63334.40 66918.40 68761.60 3152537.60 0.00 0.00 0.00 0.00 1024.00 1024.00 1024.00 1024.00 2.01 1.92 2.07 103.77 32.52 
29.36 30.88 33.71 13.27 82.08 12.36 80.74 12.08 81.15 0.32 100.00

6. VxVM raw disk and VxFS direct I/O: Results comparison and conclusions

Raw volume device disk I/O throughput test results in GBytes/sec
(1st/2nd refer to the nodes for vxbench/iostat/vxstat and to the switches for the FC Switch columns):

VxVM raw I/O
Stripe width  Nodes  vxbench (1st / 2nd / Total)   iostat (1st / 2nd / Total)    vxstat (1st / 2nd / Total)    FC Switch (Sw1 / Sw2 / Total)
64k           1      1.429 / -     / 1.429         1.436 / -     / 1.436         1.428 / -     / 1.428         0.782 / 0.795 / 1.577
64k           2      1.215 / 1.217 / 2.432         1.218 / 1.225 / 2.443         1.214 / 1.220 / 2.434         1.306 / 1.304 / 2.610
512k          1      1.504 / -     / 1.504         1.505 / -     / 1.505         1.504 / -     / 1.504         0.803 / 0.829 / 1.632
512k          2      1.285 / 1.284 / 2.569         1.284 / 1.286 / 2.570         1.286 / 1.287 / 2.573         1.372 / 1.370 / 2.741
1024k         1      1.504 / -     / 1.504         1.505 / -     / 1.505         1.505 / -     / 1.505         0.817 / 0.813 / 1.629
1024k         2      1.272 / 1.271 / 2.543         1.269 / 1.273 / 2.541         1.272 / 1.273 / 2.545         1.357 / 1.359 / 2.716

VxFS direct I/O disk I/O throughput test results in GBytes/sec:

VxFS direct I/O
Stripe width  Nodes  vxbench (1st / 2nd / Total)   iostat (1st / 2nd / Total)    vxstat (1st / 2nd / Total)    FC Switch (Sw1 / Sw2 / Total)
64k           1      1.423 / -     / 1.423         1.428 / -     / 1.428         1.428 / -     / 1.428         0.769 / 0.768 / 1.537
64k           2      1.213 / 1.209 / 2.423         1.217 / 1.208 / 2.425         1.217 / 1.208 / 2.425         1.294 / 1.302 / 2.596
512k          1      1.502 / -     / 1.502         1.504 / -     / 1.504         1.504 / -     / 1.504         0.801 / 0.802 / 1.603
512k          2      1.282 / 1.281 / 2.563         1.283 / 1.283 / 2.566         1.283 / 1.283 / 2.566         1.370 / 1.364 / 2.734
1024k         1      1.502 / -     / 1.502         1.502 / -     / 1.502         1.502 / -     / 1.502         0.802 / 0.802 / 1.604
1024k         2      1.271 / 1.268 / 2.539         1.271 / 1.271 / 2.541         1.271 / 1.271 / 2.541         1.352 / 1.361 / 2.713

The CPU utilisation during both tests is very small:

VxVM raw I/O
Stripe width  Nodes  1st Node (%usr / %sys)  2nd Node (%usr / %sys)
64k           1      1.13 / 2.63             -
64k           2      1.41 / 2.39             0.63 / 2.24
512k          1      0.93 / 1.25             -
512k          2      0.92 / 1.13             0.53 / 1.06
1024k         1      0.96 / 1.26             -
1024k         2      0.92 / 1.12             0.50 / 1.03

VxFS direct I/O
Stripe width  Nodes  1st Node (%usr / %sys)  2nd Node (%usr / %sys)
64k           1      0.89 / 2.76             -
64k           2      0.63 / 2.34             0.63 / 2.34
512k          1      0.80 / 1.32             -
512k          2      0.65 / 1.17             0.65 / 1.17
1024k         1      0.82 / 1.31             -
1024k         2      0.64 / 1.11             0.64 / 1.11

Conclusions and recommendations so far:
a. Maximum I/O size setting
   The default operating system maximum I/O size is 512KB; there is no need to change the default maximum I/O size tunable values.
b. VxVM stripe width setting
   The optimal VxVM stripe width for media server solutions is also 512KB; Veritas therefore recommend using a VxVM stripe width of 512KB.
c. VxVM stripe columns setting
   The hardware was configured to achieve maximum throughput when accessing all the available LUNs. The number of LUNs available using our storage configuration was 24. We therefore used all 24 LUNs in our VxVM volume to maximize the storage I/O bandwidth.
d. Balanced I/O
   Using a VxVM stripe width of 512KB and 24 columns, and utilizing all paths, we were able to achieve balanced I/O across all the LUNs (see the iostat output). This then allowed us to easily identify the HBA bottleneck (using a single node) and the storage bottleneck (using both nodes).
e. Maximum achievable read I/O throughput using our hardware configuration
   o 12Gbits/sec (1.5Gbytes/sec) performing I/O from one node: using our hardware configuration, we identified that the dual FC port HBA had a throughput bottleneck of 12Gbits/sec (1.5Gbytes/sec) – this is the maximum throughput we can achieve from each node.
   o 20Gbits/sec (2.5Gbytes/sec) performing I/O from two nodes: using our hardware configuration, we identified the storage bottleneck of 20Gbits/sec (2.5Gbytes/sec).
f. Conclusion: From this point onwards we now know the maximum throughput achievable using our hardware configuration.
g. For improved file system read performance, mount the file system using the "noatime" mount option.
   i.   The 'noatime' mount option avoids the inode access time update for every read [access] operation.
   ii.  The inode atime (access time) updates are asynchronous and do not go through the file system intent-log.
   iii. For maximum performance benefits in read intensive workloads the "noatime" mount option is recommended.
   Example mount command:
   # mount -t vxfs -o noatime,largefiles,cluster /dev/vx/dsk/testdg/vol1 /data1
h. For improved file system write performance, mount the file system using the "nomtime" mount option.
   i.   The 'nomtime' mount option is a lazy update of the inode modification time; it is only available with CFS.
   ii.  The inode mtime (modification time) updates do go through the file system intent-log.
   iii. The 'nomtime' option does not remove the modification time update, it just delays it to improve CFS write performance.
   iv.  For maximum performance benefits in write intensive CFS workloads the "nomtime" mount option is recommended.
   Example mount command using noatime and nomtime:
   # mount -t vxfs -o noatime,nomtime,largefiles,cluster /dev/vx/dsk/testdg/vol1 /data1
i. Conclusion: The test results show that VxFS direct I/O does not degrade sequential read I/O throughput performance compared to raw disk.
   i.   By creating a file system and creating a file with a single contiguous extent, we could emulate the raw disk read throughput using VxFS direct I/O.
   ii.  Each direct I/O read will fetch data from disk, so no buffering is being performed using either direct I/O or raw disk I/O.
   iii. Using VxFS direct I/O and running an identical vxbench test, we hit the same maximum achievable read I/O throughput.
   iv.  Therefore the sequential read throughput was not impacted using VxFS direct I/O compared to reading from VxVM raw disk.

7. VxFS buffered I/O maximum disk I/O throughput test: Test execution

This VxFS buffered I/O test is different. For the buffered read I/O throughput test, each process needs to read from a different file.
To prepare the files for this test we pre-allocate 16GB of file system space to each file, then write to the files to increase their file size to 16GB.
To pre-create the 64 files for this test the following script can be used. The script assumes an 8192 byte file system block size is being used.

mkdir /data1/primary
mkdir /data1/secondary
for n in `seq 1 64`
do
  touch /data1/primary/file${n}
  /opt/VRTS/bin/setext -r 2097152 -f contig /data1/primary/file${n}
  dd if=/dev/zero of=/data1/primary/file${n} bs=128k count=131072 &
  touch /data1/secondary/file${n}
  /opt/VRTS/bin/setext -r 2097152 -f contig /data1/secondary/file${n}
  dd if=/dev/zero of=/data1/secondary/file${n} bs=128k count=131072 &
done

When this script has finished, some of the file data will remain in memory; before we run our buffered I/O test we need to remove the file data from memory.
Note that for improved read performance you can also use the "noatime" mount option. The 'noatime' mount option prevents the inode access time being updated for every read operation. We did not use the "noatime" mount option in our test.
To remove the file data from memory the file system can be umounted and mounted again. Alternatively, a simple trick can be used to remove the file data from memory before each test run by using the "remount" mount option, as follows:

// mount
$ mount -t vxfs -o remount,noatime,largefiles,cluster /dev/vx/dsk/testdg/vol1 /data1

Again we are using vxbench to perform our test.
This time however we need to explicitly stipulate the path to each separate file on the vxbench command line, as shown below. Note also that the iosize argument has been changed, we are no longer reading using a 1024KB block size; in our VxFS buffered I/O test we are reading using a 32KB block size, because a smaller read(2) iosize will be used in the media server solution implementation. # ./vxbench -w read -i iosize=32k,iotime=300,maxfilesize=16G /data1/primary/file1 /data1/primary/file2 /data1/primary/file3 /data1/primary/file4 /data1/primary/file5 /data1/primary/file6 /data1/primary/file7 /data1/primary/file8 /data1/primary/file9 /data1/primary/file10 /data1/primary/file11 /data1/primary/file12 /data1/primary/file13 /data1/primary/file14 /data1/primary/file15 /data1/primary/file16 /data1/primary/file17 /data1/primary/file18 /data1/primary/file19 /data1/primary/file20 /data1/primary/file21 /data1/primary/file22 /data1/primary/file23 /data1/primary/file24 /data1/primary/file25 /data1/primary/file26 /data1/primary/file27 /data1/primary/file28 /data1/primary/file29 /data1/primary/file30 /data1/primary/file31 /data1/primary/file34 /data1/primary/file37 /data1/primary/file40 /data1/primary/file43 /data1/primary/file46 /data1/primary/file49 /data1/primary/file52 /data1/primary/file55 /data1/primary/file58 /data1/primary/file61 /data1/primary/file64 /data1/primary/file32 /data1/primary/file35 /data1/primary/file38 /data1/primary/file41 /data1/primary/file44 /data1/primary/file47 /data1/primary/file50 /data1/primary/file53 /data1/primary/file56 /data1/primary/file59 /data1/primary/file62 /data1/primary/file33 /data1/primary/file36 /data1/primary/file39 /data1/primary/file42 /data1/primary/file45 /data1/primary/file48 /data1/primary/file51 /data1/primary/file54 /data1/primary/file57 /data1/primary/file60 /data1/primary/file63 8. VxFS buffered I/O max disk I/O throughput test: Tests and test results and individual test conclusions All the tests in this entire report read from disk using sequential read I/O. VxFS readahead is required The greatest impact to the performance of sequential reads from disk when using VxFS/CFS buffered I/O is readahead. File system readahead utilizes the file system page cache to asynchronously pre-fetch file data into memory, this logically benefits sequential read I/O performance. Our buffered I/O sequential read performance tests demonstrate the impact of readahead and highlight how tuning readahead can avoid a potential imbalance in throughput between processes. Readahead is tunable using the ‘read_pref_io’ and ‘read_nstream’ VxFS tunables. The VxVM volume configuration will impact readahead We have already determined, in our earlier testing above, that the optimal VxVM stripe-width to maximize the I/O throughput is 512KB running our test. In our storage configuration we created 24 LUNs across 6 modular arrays, by striping across all 24 LUNs we can balance the I/O across the LUNs and maximize the overall storage bandwidth. Using this optimal volume configuration we could easily identify two bottlenecks, one due to the FC HBA ports (a per-node bottleneck) and the other bottleneck in the storage itself. However the volume stripe width and the number of columns (LUNs) in the volume are also used to auto-tune the values for the ‘read_pref_io’ and ‘read_nstream’ VxFS tunables. VxFS readahead tunables – default values When mounting a VxFS file system it will auto-tune values for the ‘read_pref_io’ and ‘read_nstream’ VxFS tunables. 
These two tunables are used to tune VxFS readahead.
The value for read_pref_io will be set to the VxVM volume stripe width – therefore the default auto-tuned value is read_pref_io=524288 in our test.
The value for read_nstream will be the number of columns (LUNs) in the volume – therefore the default auto-tuned value is read_nstream=24 in our test.
VxFS picks the default values for these tunables from the VxVM volume configuration. This means read_pref_io=524288 and read_nstream=24 will be set by default by VxFS at mount time using our volume configuration.

VxFS readahead tunables – maximum amount of file data that will be pre-fetched
The maximum amount of file data that is pre-fetched from disk using readahead is determined by read_pref_io*read_nstream.
Therefore, by default, the maximum amount of readahead will be 512KB * 24 = 12MB using our volume configuration.
As we will see during the buffered I/O testing, pre-fetching 12MB of file data is too much readahead; we found this caused an imbalance in read I/O throughput between processes.

VxFS readahead tunable – read_pref_io
The VxFS read_pref_io tunable is set to the VxVM volume stripe-width by default. The tunable means the "preferred read I/O size".
VxFS readahead will be triggered by two sequential read I/Os. The amount of file data to pre-fetch from disk is increased as more sequential I/Os are performed.
As mentioned above, the maximum amount of readahead (the maximum amount of file data to pre-fetch from disk) is read_pref_io*read_nstream. However, the maximum I/O request size submitted by VxFS to VxVM will be read_pref_io. Therefore read_pref_io is the maximum read I/O request size submitted to VxVM.
What does it mean if read_pref_io is set to 512KB:
o If (for example) we read a file using the 'dd' command and use a dd block size of 8KB, then VxFS readahead will pre-fetch the file data using I/O requests of size 512KB to VxVM.
o Readahead can therefore result in a smaller number of I/Os and a larger I/O request size, thus improving read I/O performance.
Veritas do not recommend tuning read_pref_io away from its default auto-tuned value. If a different value for read_pref_io is desired, then Veritas recommend changing the volume stripe width instead.

VxFS readahead tunable – read_nstream
The read_nstream value defaults to the number of columns in the VxVM volume.
As mentioned above, the maximum amount of readahead (the maximum amount of file data to pre-fetch from disk) is read_pref_io*read_nstream.
To reduce the maximum amount of readahead, simply reduce the value of read_nstream; please see the results of our tests using different values for read_nstream below.
The best practice for tuning readahead is as follows:
o Do not change the auto-tuned value for read_pref_io; if you want to change read_pref_io, change the VxVM volume stripe-width instead.
o Reduce read_nstream to reduce the amount of readahead.
o You could disable readahead if necessary, but this will usually be a disadvantage (see test4).
o Use /etc/tunefstab to set read_nstream, so that the value persists across a reboot (an illustrative tunefstab entry is sketched below).

Summary: By performing sequential reads using VxFS buffered I/O and performing readahead, the application I/O size is effectively converted to read_pref_io sized requests to VxVM. So there are two performance benefits of readahead: one is to pre-fetch file data from disk, the other is to increase the I/O size of the read request from disk (so reducing the number of I/Os).
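For example, to make a reduced read_nstream value persistent for our volume, an /etc/tunefstab entry along the following lines could be used. This is an illustrative sketch only (the value 1 matches TEST2 below); check the tunefstab(4) manual page on your release for the exact syntax supported:

/dev/vx/dsk/testdg/vol1 read_nstream=1

The running value can still be changed immediately with "vxtunefs -o", as shown in the tests below.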
These buffered I/O throughput tests will therefore help you decide what stripe-width, number of columns and readahead tuning is best for your solution implementation. Also, these buffered I/O throughput tests will help you to determine how many running processes you will want to be reading from disk at the same time. Buffered I/O tests: We have chosen a volume configuration that was best for disk I/O performance, however this volume configuration also results in very aggressive read_ahead (12MB at maximum). With a stripe_width of 512KB and 24 LUNs (columns) the default maximum read_ahead is therefore too aggressive. TEST1: Use the default auto-tuned settings, using one node: <this is the baseline test> Baseline vxbench test – 64files/64processess/32KB block size Default auto-tuning – read_ahead enabled/read_nstream=24/read_pref_io=524288 # vxtunefs /data1 Filesystem I/O parameters for /data1 read_pref_io = 524288 read_nstream = 24 read_ahead = 1 # mount -t vxfs -o remount,largefiles,cluster /dev/vx/dsk/testdg/vol1 /data1 # ./vxbench -w read -i iosize=32k,iotime=300,maxfilesize=16G /data1/primary/file1 /data1/primary/file2 /data1/primary/file3 /data1/primary/file4 /data1/primary/file5 /data1/primary/file6 /data1/primary/file7 /data1/primary/file8 /data1/primary/file9 /data1/primary/file10 /data1/primary/file11 /data1/primary/file12 /data1/primary/file13 /data1/primary/file14 /data1/primary/file15 /data1/primary/file16 /data1/primary/file17 /data1/primary/file18 /data1/primary/file19 /data1/primary/file20 /data1/primary/file21 /data1/primary/file22 /data1/primary/file23 /data1/primary/file24 /data1/primary/file25 /data1/primary/file26 /data1/primary/file27 /data1/primary/file28 /data1/primary/file29 /data1/primary/file30 /data1/primary/file31 /data1/primary/file32 /data1/primary/file33 /data1/primary/file34 /data1/primary/file35 /data1/primary/file36 /data1/primary/file37 /data1/primary/file38 /data1/primary/file39 /data1/primary/file40 /data1/primary/file41 /data1/primary/file42 /data1/primary/file43 /data1/primary/file44 /data1/primary/file45 /data1/primary/file46 /data1/primary/file47 /data1/primary/file48 /data1/primary/file49 /data1/primary/file50 /data1/primary/file51 /data1/primary/file52 /data1/primary/file53 /data1/primary/file54 /data1/primary/file55 /data1/primary/file56 /data1/primary/file57 /data1/primary/file58 /data1/primary/file59 /data1/primary/file60 /data1/primary/file61 /data1/primary/file62 /data1/primary/file63 /data1/primary/file64 user 1: 300.062 sec 48868.77 KB/s cpu: 9.78 sys 0.08 user user 2: 300.102 sec 48370.93 KB/s cpu: 9.78 sys 0.06 user user 3: 300.042 sec 48094.01 KB/s cpu: 9.86 sys 0.08 user user 4: 300.176 sec 4461.92 KB/s cpu: 1.01 sys 0.00 user user 5: 300.105 sec 4584.10 KB/s cpu: 1.12 sys 0.00 user user 6: 300.102 sec 48125.32 KB/s cpu: 9.85 sys 0.08 user user 7: 300.031 sec 48341.50 KB/s cpu: 9.79 sys 0.07 user user 8: 300.201 sec 4583.81 KB/s cpu: 1.12 sys 0.01 user user 9: 300.194 sec 4582.32 KB/s cpu: 1.14 sys 0.00 user user 10: 300.203 sec 4755.40 KB/s cpu: 1.19 sys 0.00 user user 11: 300.126 sec 48121.38 KB/s cpu: 9.74 sys 0.08 user user 12: 300.220 sec 4500.70 KB/s cpu: 1.01 sys 0.00 user user 13: 300.201 sec 4665.25 KB/s cpu: 1.11 sys 0.00 user user 14: 300.086 sec 48291.58 KB/s cpu: 9.74 sys 0.07 user user 15: 300.165 sec 4501.41 KB/s cpu: 1.01 sys 0.01 user user 16: 300.203 sec 4633.57 KB/s cpu: 1.16 sys 0.00 user user 17: 300.147 sec 48159.06 KB/s cpu: 9.64 sys 0.08 user user 18: 300.035 sec 48504.56 KB/s cpu: 9.41 sys 0.08 user user 19: 
300.078 sec 48497.65 KB/s cpu: 9.73 sys 0.07 user user 20: 300.161 sec 48238.58 KB/s cpu: 9.66 sys 0.08 user user 21: 300.136 sec 48201.71 KB/s cpu: 9.74 sys 0.08 user user 22: 300.193 sec 4705.78 KB/s cpu: 1.21 sys 0.00 user user 23: 300.086 sec 48045.94 KB/s cpu: 9.86 sys 0.07 user user 24: 300.062 sec 47926.93 KB/s cpu: 9.69 sys 0.08 user user 25: 300.198 sec 4460.09 KB/s cpu: 1.11 sys 0.01 user user 26: 300.207 sec 4623.79 KB/s cpu: 1.09 sys 0.00 user user 27: 300.215 sec 4582.00 KB/s cpu: 1.01 sys 0.00 user user 28: 300.125 sec 48203.53 KB/s cpu: 9.70 sys 0.08 user user 29: 300.141 sec 48323.77 KB/s cpu: 9.65 sys 0.07 user user 30: 300.212 sec 4705.48 KB/s cpu: 1.20 sys 0.00 user user 31: 300.153 sec 48485.59 KB/s cpu: 9.72 sys 0.07 user user 32: 300.163 sec 48033.68 KB/s cpu: 9.66 sys 0.07 user user 33: 300.160 sec 48525.35 KB/s cpu: 9.82 sys 0.07 user user 34: 300.144 sec 4624.56 KB/s cpu: 1.09 sys 0.01 user user 35: 300.102 sec 48002.47 KB/s cpu: 9.60 sys 0.07 user user 36: 300.203 sec 4821.38 KB/s cpu: 1.18 sys 0.01 user user 37: 300.006 sec 48072.18 KB/s cpu: 9.64 sys 0.07 user user 38: 300.219 sec 4746.29 KB/s cpu: 1.15 sys 0.00 user user 39: 300.213 sec 4701.73 KB/s cpu: 1.18 sys 0.00 user user 40: 300.176 sec 4460.00 KB/s cpu: 1.13 sys 0.00 user user 41: 300.207 sec 4583.50 KB/s cpu: 1.05 sys 0.00 user user 42: 300.213 sec 4624.56 KB/s cpu: 1.03 sys 0.00 user user 43: 300.049 sec 48789.10 KB/s cpu: 9.87 sys 0.08 user user 44: 300.207 sec 4708.85 KB/s cpu: 1.18 sys 0.00 user user user user user user user user user user user user user user user user user user user user user 45: 46: 47: 48: 49: 50: 51: 52: 53: 54: 55: 56: 57: 58: 59: 60: 61: 62: 63: 64: total: 300.077 300.079 300.099 300.064 300.199 300.204 300.032 300.120 300.128 300.203 300.201 300.206 300.212 300.212 300.211 300.206 300.133 300.035 300.195 300.047 sec sec sec sec sec sec sec sec sec sec sec sec sec sec sec sec sec sec sec sec 300.220 sec 48129.27 48374.66 48494.28 48581.86 4705.78 4788.64 9044.38 47917.67 48407.76 4746.24 4460.37 4623.49 4664.43 4664.76 4623.52 4623.80 12111.95 9945.29 4583.47 48093.15 KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: 1578817.49 KB/s 9.59 9.74 9.64 9.47 1.10 1.20 1.94 9.69 9.56 1.07 1.02 1.11 1.09 1.06 1.04 1.08 2.64 2.15 1.13 9.80 sys sys sys sys sys sys sys sys sys sys sys sys sys sys sys sys sys sys sys sys 0.07 0.07 0.09 0.08 0.00 0.01 0.02 0.06 0.07 0.00 0.01 0.00 0.00 0.00 0.01 0.01 0.02 0.01 0.00 0.09 cpu: 323.53 sys user user user user user user user user user user user user user user user user user user user user 2.31 user Conclusion to TEST1: <this is our baseline test, using the default auto-tuned values, read_nstream is therefore set to its default value of 24> o This test ran for 300.220 seconds and read from disk at an average rate of 1578817.49 KB/sec, vxbench therefore read 452 GB of data from disk. o The throughput per process is very imbalanced, some processes achieved ~49000 KB/sec others processes only achieved ~4800 KB/sec o However the maximum possible read I/O throughput from one node is still being achieved 1578817.49 KB/sec = 1.506 GB/sec o The problem is not the total throughput, the problem is the maximum readahead per process is 12MB at a time o 12MB of readahead (read_pref_io*read_nstream) is too aggressive and is causing an imbalance of throughout between processes. 
o This readahead configuration is therefore a failure, too much readahead is causing an imbalance of throughput between the processes. o o o o o We do not want to change the value of read_pref_io because we want to request large I/O sizes for better performance. By default the VxFS read_pref_io tunable is set to the VxVM volume stripe-width, in our test this value is 512KB. By default the VxFS read_nstream tunable is set to the number of columns in the VxVM volume, in our test this value is 24 (we have 24 LUNs). Next, we therefore want to experiment by setting smaller values of read_nstream and also test with read_ahead disabled as well. Our goal is to maintain the maximum amount of total throughput (approx. 1.5Gbytes/sec) whilst also spreading this throughput evenly between all the active processes reading from disk. TEST2: change read_nstream to 1, keep everything else the same as the baseline test. vxbench – 64files/64processess/32KB block size Tuning – read_ahead enabled/read_nstream=1/read_pref_io=524288 # vxtunefs /data1 -o read_nstream=1 UX:vxfs vxtunefs: INFO: V-3-22525: Parameters successfully set for /data1 # vxtunefs /data1 Filesystem I/O parameters for /data1 read_pref_io = 524288 read_nstream = 1 read_ahead = 1 # mount -t vxfs -o remount,largefiles,cluster /dev/vx/dsk/testdg/vol1 /data1 #./vxbench -w read -i iosize=32k,iotime=300,maxfilesize=16G /data1/primary/file1 /data1/primary/file2 /data1/primary/file3 /data1/primary/file4 /data1/primary/file5 /data1/primary/file6 /data1/primary/file7 /data1/primary/file8 /data1/primary/file9 /data1/primary/file10 /data1/primary/file11 /data1/primary/file12 /data1/primary/file13 /data1/primary/file14 /data1/primary/file15 /data1/primary/file16 /data1/primary/file17 /data1/primary/file18 /data1/primary/file19 /data1/primary/file20 /data1/primary/file21 /data1/primary/file22 /data1/primary/file23 /data1/primary/file24 /data1/primary/file25 /data1/primary/file26 /data1/primary/file27 /data1/primary/file28 /data1/primary/file29 /data1/primary/file30 /data1/primary/file31 /data1/primary/file32 /data1/primary/file33 /data1/primary/file34 /data1/primary/file35 /data1/primary/file36 /data1/primary/file37 /data1/primary/file38 /data1/primary/file39 /data1/primary/file40 /data1/primary/file41 /data1/primary/file42 /data1/primary/file43 /data1/primary/file44 /data1/primary/file45 /data1/primary/file46 /data1/primary/file47 /data1/primary/file48 /data1/primary/file49 /data1/primary/file50 /data1/primary/file51 /data1/primary/file52 /data1/primary/file53 /data1/primary/file54 /data1/primary/file55 /data1/primary/file56 /data1/primary/file57 /data1/primary/file58 /data1/primary/file59 /data1/primary/file60 /data1/primary/file61 /data1/primary/file62 /data1/primary/file63 /data1/primary/file64 user 1: 300.013 sec 24639.76 KB/s cpu: 5.35 sys 0.05 user user 2: 300.044 sec 24748.27 KB/s cpu: 5.41 sys 0.06 user user 3: 300.010 sec 24706.66 KB/s cpu: 5.52 sys 0.06 user user 4: 300.021 sec 24872.94 KB/s cpu: 5.46 sys 0.05 user user 5: 300.023 sec 24724.40 KB/s cpu: 5.58 sys 0.05 user user 6: 300.060 sec 24683.79 KB/s cpu: 5.58 sys 0.06 user user 7: 300.021 sec 24744.96 KB/s cpu: 5.66 sys 0.06 user user 8: 300.016 sec 24680.46 KB/s cpu: 5.49 sys 0.06 user user 9: 300.017 sec 24784.51 KB/s cpu: 5.55 sys 0.06 user user 10: 300.021 sec 24744.97 KB/s cpu: 5.54 sys 0.05 user user 11: 300.015 sec 24747.12 KB/s cpu: 5.54 sys 0.06 user user 12: 300.017 sec 24830.60 KB/s cpu: 5.46 sys 0.05 user user 13: 300.013 sec 24824.11 KB/s cpu: 5.61 sys 0.05 user user 14: 300.028 
sec 24729.11 KB/s cpu: 5.57 sys 0.05 user user 15: 300.017 sec 24752.09 KB/s cpu: 5.42 sys 0.06 user user 16: 300.028 sec 24655.71 KB/s cpu: 5.53 sys 0.06 user user 17: 300.013 sec 24834.38 KB/s cpu: 5.68 sys 0.05 user user 18: 300.048 sec 24773.52 KB/s cpu: 5.52 sys 0.07 user user 19: 300.024 sec 24697.01 KB/s cpu: 5.50 sys 0.07 user user 20: 300.012 sec 24938.48 KB/s cpu: 5.61 sys 0.06 user user 21: 300.016 sec 24646.33 KB/s cpu: 5.54 sys 0.06 user user 22: 300.016 sec 24689.11 KB/s cpu: 5.57 sys 0.05 user user 23: 300.019 sec 24695.60 KB/s cpu: 5.50 sys 0.06 user user 24: 300.023 sec 24719.31 KB/s cpu: 5.59 sys 0.05 user user 25: 300.015 sec 24755.66 KB/s cpu: 5.58 sys 0.05 user user 26: 300.018 sec 24596.75 KB/s cpu: 5.59 sys 0.07 user user 27: 300.049 sec 24717.11 KB/s cpu: 5.54 sys 0.08 user user 28: 300.019 sec 24753.74 KB/s cpu: 5.59 sys 0.06 user user 29: 300.021 sec 24214.23 KB/s cpu: 5.44 sys 0.06 user user 30: 300.021 sec 24772.27 KB/s cpu: 5.61 sys 0.05 user user 31: 300.019 sec 24908.96 KB/s cpu: 5.68 sys 0.05 user user 32: 300.045 sec 24637.23 KB/s cpu: 5.53 sys 0.06 user user 33: 300.053 sec 24677.55 KB/s cpu: 5.59 sys 0.05 user user 34: 300.017 sec 24692.39 KB/s cpu: 5.60 sys 0.07 user user 35: 300.018 sec 24787.86 KB/s cpu: 5.55 sys 0.06 user user 36: 300.019 sec 24741.70 KB/s cpu: 5.57 sys 0.07 user user 37: 300.015 sec 24813.68 KB/s cpu: 5.52 sys 0.06 user user 38: 300.014 sec 24808.66 KB/s cpu: 5.40 sys 0.06 user user 39: 300.013 sec 24716.57 KB/s cpu: 5.53 sys 0.06 user user 40: 300.024 sec 24705.55 KB/s cpu: 5.54 sys 0.06 user user 41: 300.039 sec 24796.47 KB/s cpu: 5.50 sys 0.05 user user user user user user user user user user user user user user user user user user user user user user user user 42: 43: 44: 45: 46: 47: 48: 49: 50: 51: 52: 53: 54: 55: 56: 57: 58: 59: 60: 61: 62: 63: 64: total: 300.044 300.044 300.028 300.060 300.019 300.052 300.020 300.016 300.020 300.035 300.049 300.022 300.014 300.026 300.058 300.022 300.021 300.015 300.027 300.021 300.021 300.011 300.015 sec sec sec sec sec sec sec sec sec sec sec sec sec sec sec sec sec sec sec sec sec sec sec 300.061 sec 24852.33 24836.97 24735.94 24803.28 24830.57 24587.20 24750.25 24675.38 24704.09 24716.59 24700.04 24818.32 24725.01 24683.17 24786.37 24850.79 24702.35 24735.30 24840.10 24687.03 24799.55 24744.08 24655.08 KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: 1582992.52 KB/s 5.60 5.59 5.54 5.71 5.54 5.57 5.54 5.53 5.52 5.37 5.54 5.40 5.50 5.57 5.65 5.61 5.38 5.59 5.58 5.58 5.61 5.42 5.54 sys sys sys sys sys sys sys sys sys sys sys sys sys sys sys sys sys sys sys sys sys sys sys 0.05 0.06 0.05 0.05 0.07 0.05 0.06 0.04 0.06 0.06 0.05 0.06 0.06 0.05 0.04 0.06 0.06 0.06 0.05 0.06 0.06 0.05 0.06 cpu: 354.62 sys user user user user user user user user user user user user user user user user user user user user user user user 3.65 user Conclusion to TEST2: <read_nstream set to 1> o Using read_nstream=1 produces a perfect balance in throughput per process (~24700 KB/sec), so now all the process have the same consistent throughput during the test: The maximum total throughput from one node is still being achieved (1582992.52 KB/s), approx. 1.5 GB/sec The total throughput is now divided evenly across all 64 processes and remains consistent throughout the test. 
The average read I/O size is obviously still 512KB (avgrq-sz = 1024.00), this is because read_pref_io is set to 512KB. The I/O is obviously evenly balanced across all 24 LUNs ( see r/s and rsec/s in the iostat output below) Most importantly the I/O throughput is now evenly balanced across all 64 processes, yet the total throughput remains the same. o The maximum readahead per process is now 512KB o The throughput per process is now therefore balanced, all 64 processes are now consitently performing approx. 24700 KB/s – perfect!! o Please note: If the throughput per process had not been evenly distributed using read_nstream=1, then we would recommend reducing the stripe-width to 256KB or 128KB Reducing the stripe-width will reduce the default value of “read_pref_io”. We do not advise tuning read_pref_io to override its default value, we recommend tuning the VxVM volume stripe-width instead. # iostat –x 20 Device: sde sdf sdg sdh sdi sdj sdk sdl sdm sdn sdo sdp sdq sdr sds sdt sdu sdv sdw sdx sdy sdz sdaa sdab sdac sdad sdae sdaf sdag sdah sdai sdak sdal sdaj sdam sdan sdaq sdao sdap sdar sdas sdat sdau sdav sdaw sdax sday sdaz VxVM56000 rrqm/s 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 wrqm/s r/s 0.00 63.90 0.00 64.35 0.00 64.75 0.00 64.95 0.00 64.80 0.00 65.40 0.00 65.05 0.00 63.30 0.00 64.50 0.00 64.85 0.00 63.10 0.00 65.45 0.00 64.00 0.00 64.55 0.00 64.20 0.00 64.85 0.00 64.65 0.00 63.85 0.00 64.25 0.00 64.95 0.00 64.00 0.00 63.85 0.00 64.80 0.00 65.05 0.00 64.15 0.00 63.50 0.00 64.00 0.00 65.65 0.00 65.25 0.00 64.50 0.00 64.25 0.00 64.85 0.00 64.50 0.00 64.05 0.00 64.75 0.00 64.05 0.00 64.15 0.00 64.75 0.00 64.95 0.00 63.85 0.00 64.20 0.00 65.05 0.00 64.85 0.00 64.00 0.00 64.55 0.00 64.05 0.00 65.85 0.00 63.65 0.00 3094.90 w/s 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 rsec/s wsec/s avgrq-sz avgqu-sz 65433.60 0.00 1024.00 2.88 65894.40 0.00 1024.00 2.88 66304.00 0.00 1024.00 2.91 66508.80 0.00 1024.00 2.90 66355.20 0.00 1024.00 2.64 66969.60 0.00 1024.00 2.70 66611.20 0.00 1024.00 2.73 64819.20 0.00 1024.00 2.61 66048.00 0.00 1024.00 2.77 66406.40 0.00 1024.00 2.79 64614.40 0.00 1024.00 2.66 67020.80 0.00 1024.00 2.80 65536.00 0.00 1024.00 2.66 66099.20 0.00 1024.00 2.72 65740.80 0.00 1024.00 2.65 66406.40 0.00 1024.00 2.75 66201.60 0.00 1024.00 2.66 65382.40 0.00 1024.00 2.64 65792.00 0.00 1024.00 2.63 66508.80 0.00 1024.00 2.68 65536.00 0.00 1024.00 2.71 65382.40 0.00 1024.00 2.70 66355.20 0.00 1024.00 2.68 66611.20 0.00 1024.00 2.70 65689.60 0.00 1024.00 2.57 65024.00 0.00 1024.00 2.56 65536.00 0.00 1024.00 2.57 67225.60 0.00 1024.00 2.65 66816.00 0.00 1024.00 2.95 66048.00 0.00 1024.00 2.91 65792.00 0.00 1024.00 2.88 66406.40 0.00 1024.00 2.64 66048.00 0.00 1024.00 2.66 65587.20 0.00 1024.00 2.90 66304.00 0.00 1024.00 2.64 65587.20 0.00 1024.00 2.63 65689.60 0.00 1024.00 2.74 66304.00 0.00 1024.00 2.74 66508.80 0.00 1024.00 2.81 65382.40 0.00 1024.00 2.74 65740.80 0.00 1024.00 2.67 66611.20 0.00 1024.00 2.69 66406.40 0.00 1024.00 2.66 65536.00 0.00 1024.00 2.65 66099.20 0.00 1024.00 2.82 65587.20 0.00 1024.00 2.82 67430.40 0.00 1024.00 2.88 65177.60 0.00 1024.00 2.80 3169177.60 0.00 
1024.00 131.05 await 45.11 44.76 44.88 44.61 40.74 41.32 42.02 41.25 42.91 43.06 42.17 42.74 41.63 42.14 41.27 42.45 41.25 41.33 41.00 41.32 42.18 42.16 41.35 41.53 40.17 40.34 40.21 40.35 45.29 45.07 44.77 40.69 41.21 45.20 40.75 41.15 42.72 42.36 43.19 42.83 41.61 41.29 41.04 41.37 43.67 44.11 43.75 43.95 42.35 svctm %util 15.37 98.20 15.08 97.06 15.17 98.19 15.01 97.48 14.97 97.00 14.85 97.11 14.87 96.71 15.39 97.39 15.14 97.67 14.89 96.58 15.23 96.12 14.97 97.97 15.10 96.61 15.14 97.70 15.09 96.91 14.94 96.85 15.05 97.30 15.18 96.90 15.12 97.17 14.87 96.55 15.04 96.26 15.21 97.14 15.08 97.71 15.03 97.80 15.02 96.34 15.23 96.69 15.08 96.51 14.77 96.97 15.03 98.04 15.16 97.75 15.22 97.79 14.89 96.56 15.16 97.80 15.21 97.45 15.04 97.39 15.09 96.68 15.23 97.68 15.02 97.26 14.91 96.87 15.30 97.67 15.19 97.53 14.99 97.53 14.87 96.41 15.03 96.17 15.25 98.45 15.14 96.96 14.85 97.81 15.34 97.66 0.32 100.00 TEST3: change read_nstream to 1, read from 16 files using 16 processes, keep everything else the same as the baseline test. vxbench – 16files/16processess/32KB block size Tuning – read_ahead enabled/read_nstream=1/read_pref_io=524288 # vxtunefs /data1 -o read_nstream=1 UX:vxfs vxtunefs: INFO: V-3-22525: Parameters successfully set for /data1 # vxtunefs /data1 Filesystem I/O parameters for /data1 read_pref_io = 524288 read_nstream = 1 read_ahead = 1 # mount -t vxfs -o remount,largefiles,cluster /dev/vx/dsk/testdg/vol1 /data1 # ./vxbench -w read -i iosize=32k,iotime=120,maxfilesize=16G /data1/primary/file1 /data1/primary/file2 /data1/primary/file3 /data1/primary/file4 /data1/primary/file5 /data1/primary/file6 /data1/primary/file7 /data1/primary/file8 /data1/primary/file9 /data1/primary/file10 /data1/primary/file11 /data1/primary/file12 /data1/primary/file13 /data1/primary/file14 /data1/primary/file15 /data1/primary/file16 user 1: 120.030 sec 97417.75 KB/s cpu: 7.43 sys 0.07 user user 2: 120.037 sec 98452.82 KB/s cpu: 7.68 sys 0.09 user user 3: 120.033 sec 98302.87 KB/s cpu: 7.55 sys 0.08 user user 4: 120.031 sec 98227.29 KB/s cpu: 7.37 sys 0.08 user user 5: 120.030 sec 98381.88 KB/s cpu: 7.89 sys 0.05 user user 6: 120.033 sec 98272.61 KB/s cpu: 7.42 sys 0.07 user user 7: 120.032 sec 97744.74 KB/s cpu: 7.70 sys 0.08 user user 8: 120.037 sec 98069.12 KB/s cpu: 7.74 sys 0.10 user user 9: 120.030 sec 98603.74 KB/s cpu: 7.79 sys 0.06 user user 10: 120.036 sec 98756.87 KB/s cpu: 7.82 sys 0.07 user user 11: 120.037 sec 98513.11 KB/s cpu: 7.78 sys 0.10 user user 12: 120.040 sec 98360.81 KB/s cpu: 7.80 sys 0.08 user user 13: 120.030 sec 98488.47 KB/s cpu: 7.48 sys 0.09 user user 14: 120.030 sec 98241.64 KB/s cpu: 7.50 sys 0.09 user user 15: 120.039 sec 97824.57 KB/s cpu: 7.76 sys 0.09 user user 16: 120.032 sec 98700.71 KB/s cpu: 7.42 sys 0.09 user total: 120.041 sec 1572267.32 KB/s cpu: 122.13 sys 1.29 user Conclusion to TEST3: <read_nstream to 1, read from 16 files using 16 processes> o Using read_nstream=1 produces a perfect balance in throughput per process (98000 KB/sec), so all process still have an equal amount of throughput: The maximum total throughput from one node is still being achieved (1572267.32 KB/s) with 16 processes, this is approx. 1.5 GB/sec The total throughput is now divided evenly across all 16 processes, so the throughput perprocess is higher using less processes Most importantly the I/O throughput is now evenly balanced across all 16 processes, yet the total throughput remains the same. 
o The maximum readahead per process is still 512KB, this amount of readahead provides perfectly balanced throughput per process in our test. o The throughput per process is now therefore balanced, all 16 processes are now performing approx. 98000 KB/s – perfect!! o Please note: The throughput per process is now much higher using 16 processes rather than 64 processes. The number of processes reduced by a factor of 4 in test3, so the throughput per process increased by a factor of 4 in test3, but the total throughput is unchanged. It is therefore very important to consider the number of running processes that will be reading from disk at the same time, as the available throughput will be evenly distributed between these processes. TEST4: disable readahead, keep everything else the same as the baseline test. vxbench – 64files/64procs/32KB block size Tuning – read_ahead disabled/read_nstream=24/read_pref_io=524288 # vxtunefs /data1 -o read_nstream=24,read_ahead=0 UX:vxfs vxtunefs: INFO: V-3-22525: Parameters successfully set for /data1 # vxtunefs /data1 Filesystem I/O parameters for /data1 read_pref_io = 524288 read_nstream = 24 read_ahead = 0 # mount -t vxfs -o remount,largefiles,cluster /dev/vx/dsk/testdg/vol1 /data1 # ./vxbench -w read -i iosize=32k,iotime=300,maxfilesize=16G /data1/primary/file1 /data1/primary/file2 /data1/primary/file3 /data1/primary/file4 /data1/primary/file5 /data1/primary/file6 /data1/primary/file7 /data1/primary/file8 /data1/primary/file9 /data1/primary/file10 /data1/primary/file11 /data1/primary/file12 /data1/primary/file13 /data1/primary/file14 /data1/primary/file15 /data1/primary/file16 /data1/primary/file17 /data1/primary/file18 /data1/primary/file19 /data1/primary/file20 /data1/primary/file21 /data1/primary/file22 /data1/primary/file23 /data1/primary/file24 /data1/primary/file25 /data1/primary/file26 /data1/primary/file27 /data1/primary/file28 /data1/primary/file29 /data1/primary/file30 /data1/primary/file31 /data1/primary/file32 /data1/primary/file33 /data1/primary/file34 /data1/primary/file35 /data1/primary/file36 /data1/primary/file37 /data1/primary/file38 /data1/primary/file39 /data1/primary/file40 /data1/primary/file41 /data1/primary/file42 /data1/primary/file43 /data1/primary/file44 /data1/primary/file45 /data1/primary/file46 /data1/primary/file47 /data1/primary/file48 /data1/primary/file49 /data1/primary/file50 /data1/primary/file51 /data1/primary/file52 /data1/primary/file53 /data1/primary/file54 /data1/primary/file55 /data1/primary/file56 /data1/primary/file57 /data1/primary/file58 /data1/primary/file59 /data1/primary/file60 /data1/primary/file61 /data1/primary/file62 /data1/primary/file63 /data1/primary/file64 user 1: 300.011 sec 12246.06 KB/s cpu: 7.68 sys 0.07 user user 2: 300.009 sec 11192.53 KB/s cpu: 6.96 sys 0.07 user user 3: 300.010 sec 11619.25 KB/s cpu: 7.35 sys 0.06 user user 4: 300.014 sec 11551.35 KB/s cpu: 7.30 sys 0.07 user user 5: 300.015 sec 11563.46 KB/s cpu: 7.19 sys 0.08 user user 6: 300.007 sec 12257.53 KB/s cpu: 7.65 sys 0.10 user user 7: 300.008 sec 11638.53 KB/s cpu: 7.34 sys 0.09 user user 8: 300.007 sec 11449.44 KB/s cpu: 7.26 sys 0.09 user user 9: 300.014 sec 12062.17 KB/s cpu: 7.50 sys 0.08 user user 10: 300.008 sec 11544.21 KB/s cpu: 7.18 sys 0.08 user user 11: 300.012 sec 11442.10 KB/s cpu: 7.22 sys 0.10 user user 12: 300.012 sec 11666.33 KB/s cpu: 7.34 sys 0.07 user user 13: 300.007 sec 11740.63 KB/s cpu: 7.38 sys 0.07 user user 14: 300.015 sec 11528.29 KB/s cpu: 7.32 sys 0.07 user user 15: 300.009 sec 11616.83 KB/s cpu: 
7.31 sys 0.08 user user 16: 300.008 sec 12253.34 KB/s cpu: 7.54 sys 0.07 user user 17: 300.013 sec 11727.19 KB/s cpu: 7.36 sys 0.07 user user 18: 300.009 sec 11700.54 KB/s cpu: 7.36 sys 0.07 user user 19: 300.008 sec 12245.63 KB/s cpu: 7.70 sys 0.09 user user 20: 300.007 sec 11757.38 KB/s cpu: 7.42 sys 0.08 user user 21: 300.007 sec 11242.93 KB/s cpu: 7.10 sys 0.06 user user 22: 300.012 sec 11589.92 KB/s cpu: 7.23 sys 0.08 user user 23: 300.008 sec 12262.93 KB/s cpu: 7.56 sys 0.09 user user 24: 300.007 sec 11756.85 KB/s cpu: 7.41 sys 0.08 user user 25: 300.014 sec 12086.92 KB/s cpu: 7.48 sys 0.08 user user 26: 300.011 sec 12001.58 KB/s cpu: 7.54 sys 0.07 user user 27: 300.012 sec 12096.78 KB/s cpu: 7.60 sys 0.10 user user 28: 300.017 sec 11550.08 KB/s cpu: 7.27 sys 0.08 user user 29: 300.011 sec 11734.28 KB/s cpu: 7.24 sys 0.09 user user 30: 300.011 sec 11962.11 KB/s cpu: 7.51 sys 0.08 user user 31: 300.014 sec 12128.16 KB/s cpu: 7.50 sys 0.08 user user 32: 300.011 sec 11725.32 KB/s cpu: 7.38 sys 0.10 user user 33: 300.009 sec 11371.62 KB/s cpu: 7.18 sys 0.06 user user 34: 300.009 sec 12041.25 KB/s cpu: 7.62 sys 0.07 user user 35: 300.008 sec 11980.36 KB/s cpu: 7.48 sys 0.08 user user 36: 300.015 sec 11908.75 KB/s cpu: 7.51 sys 0.07 user user 37: 300.010 sec 11432.46 KB/s cpu: 7.12 sys 0.08 user user 38: 300.014 sec 11796.37 KB/s cpu: 7.48 sys 0.06 user user 39: 300.008 sec 11824.77 KB/s cpu: 7.43 sys 0.08 user user 40: 300.014 sec 12077.29 KB/s cpu: 7.57 sys 0.07 user user 41: 300.012 sec 11564.45 KB/s cpu: 7.29 sys 0.08 user user 42: 300.015 sec 11583.94 KB/s cpu: 7.28 sys 0.05 user user 43: 300.015 sec 11874.83 KB/s cpu: 7.45 sys 0.08 user user 44: 300.010 sec 12142.53 KB/s cpu: 7.54 sys 0.08 user user user user user user user user user user user user user user user user user user user user user 45: 46: 47: 48: 49: 50: 51: 52: 53: 54: 55: 56: 57: 58: 59: 60: 61: 62: 63: 64: total: iostat Device: sde sdf sdg sdh sdi sdj sdk sdl sdm sdn sdo sdp sdq sdr sds sdt sdu sdv sdw sdx sdy sdz sdaa sdab sdac sdad sdae sdaf sdag sdah sdai sdak sdal sdaj sdam sdan sdaq sdao sdap sdar sdas sdat sdau sdav sdaw sdax sday sdaz VxVM56000 300.015 300.011 300.014 300.010 300.010 300.014 300.010 300.010 300.008 300.016 300.017 300.011 300.011 300.008 300.010 300.011 300.010 300.011 300.008 300.009 sec sec sec sec sec sec sec sec sec sec sec sec sec sec sec sec sec sec sec sec 300.017 sec rrqm/s 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 11335.74 11915.63 12259.67 11405.71 11862.76 11556.89 12149.05 11384.38 11414.31 11336.45 12173.06 11808.63 12277.61 11529.39 12021.34 11499.74 12001.73 11978.65 11540.61 11221.21 KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: 753196.79 KB/s wrqm/s r/s 0.00 504.35 0.00 501.55 0.00 507.70 0.00 496.70 0.00 502.40 0.00 499.70 0.00 502.80 0.00 503.90 0.00 501.25 0.00 504.10 0.00 497.20 0.00 496.50 0.00 505.40 0.00 503.40 0.00 502.65 0.00 501.35 0.00 511.30 0.00 502.70 0.00 502.45 0.00 503.10 0.00 501.15 0.00 506.15 0.00 505.45 0.00 507.10 0.00 504.25 0.00 506.30 0.00 500.80 0.00 501.70 0.00 497.90 0.00 499.20 0.00 493.50 0.00 505.10 0.00 504.60 0.00 504.60 0.00 505.85 0.00 506.60 0.00 497.65 0.00 500.25 0.00 497.00 0.00 494.80 0.00 
497.45 0.00 507.25 0.00 503.20 0.00 506.55 0.00 498.65 0.00 497.60 0.00 503.15 0.00 504.45 0.00 24109.05 w/s 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 7.05 7.43 7.56 7.12 7.34 7.28 7.49 7.11 7.09 7.09 7.57 7.33 7.55 7.14 7.44 7.18 7.55 7.52 7.20 7.06 sys sys sys sys sys sys sys sys sys sys sys sys sys sys sys sys sys sys sys sys cpu: 471.23 sys 0.09 0.08 0.10 0.08 0.07 0.07 0.08 0.10 0.07 0.07 0.08 0.06 0.08 0.07 0.07 0.07 0.06 0.08 0.09 0.07 user user user user user user user user user user user user user user user user user user user user 4.95 user rsec/s wsec/s avgrq-sz avgqu-sz 32278.40 0.00 64.00 0.51 32099.20 0.00 64.00 0.44 32492.80 0.00 64.00 0.52 31788.80 0.00 64.00 0.55 32153.60 0.00 64.00 0.47 31980.80 0.00 64.00 0.62 32179.20 0.00 64.00 0.46 32249.60 0.00 64.00 0.47 32080.00 0.00 64.00 0.49 32262.40 0.00 64.00 0.91 31820.80 0.00 64.00 3.51 31776.00 0.00 64.00 0.44 32345.60 0.00 64.00 0.67 32217.60 0.00 64.00 0.60 32169.60 0.00 64.00 3.46 32086.40 0.00 64.00 0.46 32723.20 0.00 64.00 0.60 32172.80 0.00 64.00 0.76 32156.80 0.00 64.00 3.73 32198.40 0.00 64.00 3.70 32073.60 0.00 64.00 0.47 32393.60 0.00 64.00 3.47 32348.80 0.00 64.00 3.53 32454.40 0.00 64.00 0.51 32272.00 0.00 64.00 0.46 32403.20 0.00 64.00 0.61 32051.20 0.00 64.00 0.48 32108.80 0.00 64.00 0.51 31865.60 0.00 64.00 0.49 31948.80 0.00 64.00 0.47 31584.00 0.00 64.00 0.54 32326.40 0.00 64.00 0.66 32294.40 0.00 64.00 0.61 32294.40 0.00 64.00 0.53 32374.40 0.00 64.00 3.46 32422.40 0.00 64.00 0.47 31849.60 0.00 64.00 3.54 32016.00 0.00 64.00 0.43 31808.00 0.00 64.00 3.43 31667.20 0.00 64.00 0.52 31836.80 0.00 64.00 0.61 32464.00 0.00 64.00 0.75 32204.80 0.00 64.00 3.74 32419.20 0.00 64.00 3.69 31913.60 0.00 64.00 0.46 31846.40 0.00 64.00 0.90 32201.60 0.00 64.00 3.53 32284.80 0.00 64.00 0.44 1542979.20 0.00 64.00 62.84 await 1.02 0.88 1.02 1.10 0.95 1.24 0.91 0.93 0.99 1.80 7.07 0.90 1.32 1.19 6.88 0.92 1.18 1.52 7.42 7.36 0.93 6.85 6.99 1.00 0.92 1.21 0.97 1.01 0.98 0.95 1.09 1.30 1.21 1.06 6.85 0.92 7.11 0.87 6.91 1.05 1.23 1.49 7.44 7.28 0.93 1.80 7.02 0.88 2.61 svctm %util 0.79 39.97 0.70 35.15 0.79 40.20 0.81 40.10 0.76 38.42 0.91 45.48 0.72 36.22 0.76 38.05 0.78 39.16 1.11 55.75 1.96 97.30 0.73 36.06 0.93 47.04 0.85 42.86 1.92 96.49 0.75 37.40 0.85 43.55 1.07 53.56 1.96 98.52 1.92 96.67 0.73 36.65 1.92 97.28 1.95 98.45 0.78 39.43 0.74 37.47 0.89 45.24 0.75 37.80 0.81 40.70 0.76 37.72 0.75 37.24 0.83 40.93 0.93 47.05 0.86 43.54 0.79 40.05 1.91 96.67 0.73 36.80 1.97 97.96 0.70 34.90 1.96 97.41 0.81 40.03 0.90 44.75 1.03 52.22 1.96 98.61 1.92 97.06 0.75 37.21 1.14 56.59 1.94 97.52 0.72 36.44 0.04 100.00 Conclusion to TEST4: <read_ahead disabled> o The maximum read I/O throughput from one node is NOT being achieved, approx. 0.72 GBytes/sec. o The throughput for all 64 processes is balanced but is now much lower per process, they are now only performing approx. 12000 KB/s. o By disabling readahead the total throughput has halved. o All the read I/O is synchronous read I/O using a 32KB I/O request size. The iostat above shows 64 sectors (32KB) as the average I/O size for all LUN paths – avgrq-sz 64.00 Because readahead is disabled we are no longer submitting read_pref_io sized requests. Instead we are submitting a 32KB read request size, because this is the I/O size that vxbench is using. 
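One detail worth calling out when reading the iostat output in these tests: avgrq-sz is reported in 512-byte sectors, not bytes. The short sketch below is simply that unit conversion, using the avgrq-sz values observed in TEST2 (readahead enabled) and TEST4 (readahead disabled):

# echo "$((1024 * 512 / 1024)) KB per I/O when avgrq-sz is 1024 sectors (TEST2, readahead enabled)"
512 KB per I/O when avgrq-sz is 1024 sectors (TEST2, readahead enabled)
# echo "$((64 * 512 / 1024)) KB per I/O when avgrq-sz is 64 sectors (TEST4, readahead disabled)"
32 KB per I/O when avgrq-sz is 64 sectors (TEST4, readahead disabled)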
TEST5: change read_nstream to 6, keep everything else the same as the baseline test. vxbench – 64files/64procs/32KB block size Tuning – read_ahead enabled/read_nstream=6/read_pref_io=524288 # vxtunefs /data1 -o read_nstream=6 UX:vxfs vxtunefs: INFO: V-3-22525: Parameters successfully set for /data1 # vxtunefs /data1 Filesystem I/O parameters for /data1 read_pref_io = 524288 read_nstream = 6 read_ahead = 1 # mount -t vxfs -o remount,largefiles,cluster /dev/vx/dsk/testdg/vol1 /data1 # ./vxbench -w read -i iosize=32k,iotime=300,maxfilesize=16G /data1/primary/file1 /data1/primary/file2 /data1/primary/file3 /data1/primary/file4 /data1/primary/file5 /data1/primary/file6 /data1/primary/file7 /data1/primary/file8 /data1/primary/file9 /data1/primary/file10 /data1/primary/file11 /data1/primary/file12 /data1/primary/file13 /data1/primary/file14 /data1/primary/file15 /data1/primary/file16 /data1/primary/file17 /data1/primary/file18 /data1/primary/file19 /data1/primary/file20 /data1/primary/file21 /data1/primary/file22 /data1/primary/file23 /data1/primary/file24 /data1/primary/file25 /data1/primary/file26 /data1/primary/file27 /data1/primary/file28 /data1/primary/file29 /data1/primary/file30 /data1/primary/file31 /data1/primary/file32 /data1/primary/file33 /data1/primary/file34 /data1/primary/file35 /data1/primary/file36 /data1/primary/file37 /data1/primary/file38 /data1/primary/file39 /data1/primary/file40 /data1/primary/file41 /data1/primary/file42 /data1/primary/file43 /data1/primary/file44 /data1/primary/file45 /data1/primary/file46 /data1/primary/file47 /data1/primary/file48 /data1/primary/file49 /data1/primary/file50 /data1/primary/file51 /data1/primary/file52 /data1/primary/file53 /data1/primary/file54 /data1/primary/file55 /data1/primary/file56 /data1/primary/file57 /data1/primary/file58 /data1/primary/file59 /data1/primary/file60 /data1/primary/file61 /data1/primary/file62 /data1/primary/file63 /data1/primary/file64 user 1: 300.008 sec 26677.91 KB/s cpu: 5.16 sys 0.05 user user 2: 300.107 sec 26689.61 KB/s cpu: 5.25 sys 0.04 user user 3: 300.116 sec 26596.61 KB/s cpu: 4.97 sys 0.04 user user 4: 300.031 sec 26716.80 KB/s cpu: 4.98 sys 0.05 user user 5: 300.089 sec 26680.92 KB/s cpu: 5.19 sys 0.05 user user 6: 300.072 sec 26631.30 KB/s cpu: 5.01 sys 0.04 user user 7: 300.099 sec 26843.86 KB/s cpu: 5.21 sys 0.04 user user 8: 300.091 sec 26762.65 KB/s cpu: 5.20 sys 0.04 user user 9: 300.074 sec 26784.68 KB/s cpu: 5.17 sys 0.04 user user 10: 300.076 sec 26774.26 KB/s cpu: 5.07 sys 0.05 user user 11: 300.062 sec 26785.71 KB/s cpu: 4.97 sys 0.04 user user 12: 300.027 sec 14609.45 KB/s cpu: 2.90 sys 0.02 user user 13: 300.035 sec 26675.62 KB/s cpu: 5.21 sys 0.05 user user 14: 300.101 sec 9641.12 KB/s cpu: 1.95 sys 0.01 user user 15: 300.066 sec 26897.99 KB/s cpu: 4.93 sys 0.04 user user 16: 300.027 sec 26645.46 KB/s cpu: 5.09 sys 0.04 user user 17: 300.016 sec 26677.21 KB/s cpu: 5.19 sys 0.04 user user 18: 300.020 sec 26636.02 KB/s cpu: 5.25 sys 0.05 user user 19: 300.012 sec 26728.77 KB/s cpu: 4.98 sys 0.05 user user 20: 300.081 sec 18732.43 KB/s cpu: 3.46 sys 0.04 user user 21: 300.008 sec 26729.13 KB/s cpu: 5.22 sys 0.04 user user 22: 300.087 sec 26701.62 KB/s cpu: 5.16 sys 0.04 user user 23: 300.083 sec 14616.98 KB/s cpu: 2.86 sys 0.01 user user 24: 300.085 sec 26926.99 KB/s cpu: 5.02 sys 0.03 user user 25: 300.031 sec 26542.74 KB/s cpu: 5.16 sys 0.05 user user 26: 300.101 sec 26608.19 KB/s cpu: 5.02 sys 0.06 user user 27: 300.112 sec 26760.74 KB/s cpu: 5.28 sys 0.03 user user 28: 300.050 sec 
26674.13 KB/s cpu: 5.20 sys 0.04 user user 29: 300.058 sec 19430.05 KB/s cpu: 3.79 sys 0.03 user user 30: 300.062 sec 26703.79 KB/s cpu: 5.24 sys 0.04 user user 31: 300.079 sec 26692.03 KB/s cpu: 5.18 sys 0.05 user user 32: 300.078 sec 19572.11 KB/s cpu: 3.75 sys 0.03 user user 33: 300.014 sec 26872.00 KB/s cpu: 5.25 sys 0.05 user user 34: 300.035 sec 26593.60 KB/s cpu: 4.86 sys 0.04 user user 35: 300.011 sec 26554.73 KB/s cpu: 5.17 sys 0.04 user user 36: 300.065 sec 26713.74 KB/s cpu: 5.23 sys 0.05 user user 37: 300.011 sec 26687.96 KB/s cpu: 5.18 sys 0.04 user user 38: 300.034 sec 26696.03 KB/s cpu: 5.30 sys 0.04 user user 39: 300.046 sec 18888.19 KB/s cpu: 3.62 sys 0.03 user user 40: 300.019 sec 26656.42 KB/s cpu: 5.18 sys 0.04 user user 41: 300.039 sec 26685.39 KB/s cpu: 5.08 sys 0.05 user user 42: 300.041 sec 14332.34 KB/s cpu: 2.85 sys 0.02 user user 43: 300.112 sec 26863.12 KB/s cpu: 5.27 sys 0.04 user user 44: 300.008 sec 26667.66 KB/s cpu: 5.07 sys 0.05 user user user user user user user user user user user user user user user user user user user user user 45: 46: 47: 48: 49: 50: 51: 52: 53: 54: 55: 56: 57: 58: 59: 60: 61: 62: 63: 64: total: 300.060 300.021 300.052 300.110 300.096 300.116 300.026 300.027 300.044 300.017 300.024 300.102 300.043 300.055 300.047 300.070 300.055 300.097 300.093 300.024 sec sec sec sec sec sec sec sec sec sec sec sec sec sec sec sec sec sec sec sec 300.117 sec 26949.71 26635.77 26817.32 26760.94 7747.49 14676.80 26737.82 26737.80 26777.10 26769.30 26799.31 26720.72 26807.85 26868.24 26879.17 26907.83 26786.37 16684.09 14063.68 26635.57 KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: 1572798.88 KB/s 5.05 5.07 5.26 5.19 1.56 2.93 5.13 5.05 4.96 5.13 5.33 5.17 5.04 4.91 5.24 5.32 5.02 3.18 2.79 4.90 sys sys sys sys sys sys sys sys sys sys sys sys sys sys sys sys sys sys sys sys 0.04 0.05 0.03 0.04 0.00 0.03 0.06 0.05 0.04 0.04 0.05 0.05 0.03 0.04 0.05 0.04 0.05 0.03 0.01 0.04 cpu: 302.31 sys user user user user user user user user user user user user user user user user user user user user 2.53 user Conclusion to TEST5: <read_nstream set to 6> o The maximum read I/O throughput from one node is being achieved, approx. 1.5 GBytes/s o The throughput per process is still imbalanced. o The maximum amount of readahead per process is 3MB, this is too aggressive (i.e. too much readahead is causing a throughput imbalance between the reading processes). TEST6: change read_nstream to 12, keep everything else the same as the baseline test. 
vxbench – 64files/64procs/32KB block size Tuning – read_ahead enabled/read_nstream=12/read_pref_io=524288 # vxtunefs /data1 -o read_nstream=12 UX:vxfs vxtunefs: INFO: V-3-22525: Parameters successfully set for /data1 # vxtunefs /data1 Filesystem I/O parameters for /data1 read_pref_io = 524288 read_nstream = 12 read_ahead = 1 # mount -t vxfs -o remount,largefiles,cluster /dev/vx/dsk/testdg/vol1 /data1 #./vxbench -w read -i iosize=32k,iotime=300,maxfilesize=16G /data1/primary/file1 /data1/primary/file2 /data1/primary/file3 /data1/primary/file4 /data1/primary/file5 /data1/primary/file6 /data1/primary/file7 /data1/primary/file8 /data1/primary/file9 /data1/primary/file10 /data1/primary/file11 /data1/primary/file12 /data1/primary/file13 /data1/primary/file14 /data1/primary/file15 /data1/primary/file16 /data1/primary/file17 /data1/primary/file18 /data1/primary/file19 /data1/primary/file20 /data1/primary/file21 /data1/primary/file22 /data1/primary/file23 /data1/primary/file24 /data1/primary/file25 /data1/primary/file26 /data1/primary/file27 /data1/primary/file28 /data1/primary/file29 /data1/primary/file30 /data1/primary/file31 /data1/primary/file32 /data1/primary/file33 /data1/primary/file34 /data1/primary/file35 /data1/primary/file36 /data1/primary/file37 /data1/primary/file38 /data1/primary/file39 /data1/primary/file40 /data1/primary/file41 /data1/primary/file42 /data1/primary/file43 /data1/primary/file44 /data1/primary/file45 /data1/primary/file46 /data1/primary/file47 /data1/primary/file48 /data1/primary/file49 /data1/primary/file50 /data1/primary/file51 /data1/primary/file52 /data1/primary/file53 /data1/primary/file54 /data1/primary/file55 /data1/primary/file56 /data1/primary/file57 /data1/primary/file58 /data1/primary/file59 /data1/primary/file60 /data1/primary/file61 /data1/primary/file62 /data1/primary/file63 /data1/primary/file64 user 1: 300.152 sec 4957.28 KB/s cpu: 0.94 sys 0.00 user user 2: 300.133 sec 4896.28 KB/s cpu: 0.92 sys 0.00 user user 3: 300.068 sec 32767.44 KB/s cpu: 6.28 sys 0.05 user user 4: 300.090 sec 32949.22 KB/s cpu: 6.12 sys 0.05 user user 5: 300.067 sec 5041.22 KB/s cpu: 0.95 sys 0.01 user user 6: 300.139 sec 4855.57 KB/s cpu: 0.89 sys 0.00 user user 7: 300.048 sec 32872.02 KB/s cpu: 6.31 sys 0.05 user user 8: 300.129 sec 4610.50 KB/s cpu: 0.89 sys 0.00 user user 9: 300.146 sec 4855.02 KB/s cpu: 0.92 sys 0.00 user user 10: 300.015 sec 32855.14 KB/s cpu: 6.17 sys 0.05 user user 11: 300.040 sec 32811.44 KB/s cpu: 6.20 sys 0.06 user user 12: 300.069 sec 4897.65 KB/s cpu: 0.92 sys 0.00 user user 13: 300.013 sec 32793.85 KB/s cpu: 6.30 sys 0.06 user user 14: 300.082 sec 32806.79 KB/s cpu: 6.31 sys 0.04 user user 15: 300.033 sec 32914.55 KB/s cpu: 6.36 sys 0.04 user user 16: 300.067 sec 32726.51 KB/s cpu: 6.33 sys 0.05 user user 17: 300.057 sec 32604.74 KB/s cpu: 6.30 sys 0.05 user user 18: 300.140 sec 4753.19 KB/s cpu: 0.93 sys 0.00 user user 19: 300.090 sec 32703.57 KB/s cpu: 6.29 sys 0.06 user user 20: 300.030 sec 32914.89 KB/s cpu: 6.30 sys 0.06 user user 21: 300.005 sec 32835.71 KB/s cpu: 6.37 sys 0.05 user user 22: 300.103 sec 32845.42 KB/s cpu: 6.27 sys 0.05 user user 23: 300.061 sec 32993.42 KB/s cpu: 6.30 sys 0.06 user user 24: 300.152 sec 4732.43 KB/s cpu: 0.89 sys 0.01 user user 25: 300.067 sec 32501.34 KB/s cpu: 6.34 sys 0.05 user user 26: 300.162 sec 4794.20 KB/s cpu: 0.91 sys 0.00 user user 27: 300.006 sec 32651.33 KB/s cpu: 6.36 sys 0.05 user user 28: 300.067 sec 32767.47 KB/s cpu: 6.38 sys 0.05 user user 29: 300.147 sec 4791.47 KB/s cpu: 0.93 sys 0.01 user 
user 30: 300.020 sec 32711.16 KB/s cpu: 6.31 sys 0.04 user user 31: 300.151 sec 5113.69 KB/s cpu: 0.92 sys 0.00 user user 32: 300.017 sec 14987.11 KB/s cpu: 2.89 sys 0.02 user user 33: 300.028 sec 32689.88 KB/s cpu: 6.38 sys 0.06 user user 34: 300.136 sec 4856.04 KB/s cpu: 0.91 sys 0.00 user user 35: 300.146 sec 4794.78 KB/s cpu: 0.91 sys 0.00 user user 36: 300.005 sec 32712.86 KB/s cpu: 6.19 sys 0.05 user user 37: 300.100 sec 32927.68 KB/s cpu: 6.37 sys 0.04 user user 38: 300.048 sec 32994.80 KB/s cpu: 6.34 sys 0.04 user user 39: 300.010 sec 32630.41 KB/s cpu: 6.27 sys 0.04 user user 40: 300.054 sec 32768.91 KB/s cpu: 6.32 sys 0.05 user user 41: 300.019 sec 33100.39 KB/s cpu: 6.17 sys 0.04 user user 42: 300.066 sec 32726.68 KB/s cpu: 6.38 sys 0.05 user user 43: 300.035 sec 33221.54 KB/s cpu: 6.34 sys 0.06 user user 44: 300.008 sec 32692.06 KB/s cpu: 6.32 sys 0.06 user user user user user user user user user user user user user user user user user user user user user 45: 46: 47: 48: 49: 50: 51: 52: 53: 54: 55: 56: 57: 58: 59: 60: 61: 62: 63: 64: total: 300.025 300.146 300.108 300.073 300.007 300.050 300.026 300.087 300.022 300.091 300.075 300.075 300.059 300.062 300.064 300.079 300.044 300.049 300.142 300.039 sec sec sec sec sec sec sec sec sec sec sec sec sec sec sec sec sec sec sec sec 300.163 sec 33181.65 4773.57 33172.52 32766.86 32814.98 32933.22 33038.23 32970.03 32833.90 32990.10 32991.84 32909.93 4778.89 32993.34 33156.83 33011.92 33097.62 4774.69 4897.10 33221.04 KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s KB/s cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: cpu: 1581364.62 KB/s 6.38 0.90 6.27 6.30 6.36 6.35 6.35 6.38 6.09 6.32 6.34 6.37 0.90 6.12 6.38 6.38 6.36 0.90 0.92 6.33 sys sys sys sys sys sys sys sys sys sys sys sys sys sys sys sys sys sys sys sys 0.05 0.00 0.04 0.06 0.06 0.05 0.05 0.07 0.05 0.04 0.06 0.04 0.00 0.06 0.05 0.05 0.05 0.00 0.00 0.04 cpu: 302.20 sys user user user user user user user user user user user user user user user user user user user user 2.33 user Conclusion to TEST6: <read_nstream set to 12> o The maximum read I/O throughput from one node is being achieved, approx. 1.5 GBytes/s o The throughput per process is imbalanced. o The maximum amount of readahead per process is 6MB, this is too aggressive (i.e. too much readahead is causing a throughput imbalance between the reading processes). Graphics for buffered I/O tests The graphs below show the results of the tests running 64 processes (only Test3, which runs 16 processes, is excluded from the graphs). The second graph simply joins the dots for each process; each test uses a different colour. The graphs clearly show that only read_nstream=1 (Test2) and read_ahead off (Test4) provide an evenly balanced throughput across all 64 processes. However when read_ahead is disabled the throughput is much lower. Therefore, in our test, read_nstream=1 (dark blue in the graphics) is clearly the correct value because the throughput is evenly balanced across all 64 processes and the maximum throughput is still achieved. 9. 
Final conclusions and best practices for optimizing sequential read I/O workloads:

To maximize the sequential read I/O throughput, maintain evenly balanced I/O across all the LUNs and balance the throughput across the active reading processes, we identified the following configuration for our test environment:
o 512KB VxVM stripe width (for the optimum I/O size reading from disk)
o 24 LUNs and 24 columns in our VxVM volume (to use the maximum storage bandwidth)
o Leave read_pref_io set to the default value of 524288 (the maximum I/O size using readahead)
o Reduce read_nstream from its default value of 24 to a value of 1 (to reduce the maximum amount of data pre-fetched in one go by readahead)

The best practices for sequential read media server solution configurations are as follows:
o Set up your hardware so that the maximum I/O bandwidth can be achieved.
o We did not change the operating system maximum I/O size; we kept the default of 512KB.
o Ensure that your I/O is balanced evenly across all your LUNs by using VxVM striped volumes.
   We found a VxVM stripe-width of 512KB is optimal; different stripe-widths can be tested, but a stripe-width greater than 1024KB is not required.
   We created 24 LUNs that maximized access to the storage arrays, and we therefore created our VxVM volume with 24 columns to maximize the bandwidth to the storage arrays.
   During this process identify any bottlenecks in your HBA cards and storage, beginning with a single node; the bottlenecks will give you the maximum throughput you can achieve in your environment.
o If VxVM mirroring were required in our configuration then 12 LUNs would be used in each mirror. As reads can come from either mirror, the read I/O throughput should not be impacted by mirroring, because we are still reading from all 24 LUNs; writes, however, will be impacted.
o The value of read_pref_io is the read I/O request size that VxFS readahead will submit to VxVM; we want a larger I/O size for performance (read_pref_io is set to the stripe-width). Do not change the auto-tuned value for read_pref_io; if you want to change read_pref_io, change the VxVM volume stripe-width instead.
o Using higher read_nstream values produced an imbalance in throughput between the different processes performing disk read I/O; this is due to overly aggressive readahead.
   No matter what value of read_nstream we used, we always hit the FC HBA card throughput bottleneck of approximately 1.5GBytes/sec.
   The larger the value of read_nstream, the more aggressive readahead becomes, and the greater the imbalance in read throughput between the different processes.
   Reduce read_nstream to reduce the amount of readahead. We found read_nstream=1 provided a perfect balance in throughput between processes.
o Do not disable readahead unless absolutely necessary, as sequential read performance will be impacted.
o Use /etc/tunefstab to set read_nstream, so that the value persists across a reboot.
o Mount with the options noatime and nomtime if you can.

We will provide a second report for media server workload testing that explains sequential write I/O and some more best practices.
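To make the recommended tuning concrete, the sketch below shows one way it could be applied on our /data1 file system. The /etc/tunefstab entry format shown here is an assumption and should be confirmed against the tunefstab(4) manual page for your release; the mount options simply combine the options already used in the tests above with the noatime/nomtime recommendation:

# cat /etc/tunefstab
/dev/vx/dsk/testdg/vol1 read_nstream=1
# mount -t vxfs -o largefiles,cluster,noatime,nomtime /dev/vx/dsk/testdg/vol1 /data1
# vxtunefs /data1 | grep read_nstream
read_nstream = 1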
Best regards
Veritas IA Engineering team

Server h/w configuration information: <2 nodes>

System
# dmidecode -q -t 1|head -5
System Information
    Manufacturer: HP
    Product Name: ProLiant DL380p Gen8

CPU
# dmidecode -q -t 4|grep -e Processor -e Socket -e Manufacturer -e Version -e "Current Speed" -e Core -e Thread|grep -v Upgrade
Processor Information
    Socket Designation: Proc 1
    Type: Central Processor
    Manufacturer: Intel
    Version: Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
    Current Speed: 2200 MHz
    Core Count: 8
    Core Enabled: 8
    Thread Count: 16
Processor Information
    Socket Designation: Proc 2
    Type: Central Processor
    Manufacturer: Intel
    Version: Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
    Current Speed: 2200 MHz
    Core Count: 8
    Core Enabled: 8
    Thread Count: 16

Memory
# dmidecode -q -t 17|grep Size|grep -v "No Module Installed"|awk 'BEGIN{memsize=0}{memsize=memsize+$2}END{print memsize, $3}'
98304 MB
# dmidecode -q -t 17|grep -e Speed -e Type|grep -v Detail|sort|uniq|grep -v Unknown
    Configured Clock Speed: 1600 MHz
    Speed: 1600 MHz
    Type: DDR3