Veritas CFS Media Server Workloads
Sequential read I/O throughput test
Date: 18th September 2015
Colin Eldridge
Shrinivas Chandukar
What is the purpose of this document?
- This initial document is designed to help set up a CFS environment for use as a media server solution.
- The idea is to repeat the testing we have performed in this document using your own h/w environment.
- This report is specific to sequential read I/O; it includes best practices and configuration recommendations.
- This testing will identify the I/O bottlenecks in your h/w environment.
- The testing will identify the maximum read I/O throughput that can be achieved from one node and the maximum read I/O throughput from all nodes combined, using your h/w environment.
- This testing will identify the best stripe width and number of columns for your VxVM volume.
- This testing will identify the best file system read_ahead tuning for a sequential read I/O workload.
In summary:
- This document attempts to explain how to set up a media server solution, including:
  o how to perform the tests
  o how to measure the I/O throughput
  o how to choose the correct VxVM volume configuration and achieve balanced I/O
  o how to identify the bottlenecks in the I/O path using your h/w environment
  o how to tune the file system read_ahead to balance the read I/O throughput across processes
- You should then understand the capabilities of your h/w environment, including:
  o the maximum read I/O throughput that will be possible in the environment
  o the mechanism of balancing the I/O across the LUNs
  o the mechanism of balancing the read I/O throughput across active processes/threads
1. Configuration: Hardware, DMP paths and volume configuration

o HOST side: 2 CFS nodes
  - Each node has a dual-port HBA card (so 2 active DMP paths to each LUN); each HBA port is connected to a different FC switch.
  - The theoretical maximum throughput per FC port on the HBA is 8Gbits/sec.
  - The theoretical maximum throughput per node (two FC ports) is 16Gbits/sec.
  - The theoretical maximum throughput for two nodes is therefore 32Gbits/sec.
  - In reality, during our testing, the maximum throughput we could reach from one node was 12Gbits/sec.
  - In our 1-node testing the dual-port HBA therefore bottlenecked at approximately 12Gbits/sec (1.5 Gbytes/sec), so this is our approximate maximum throughput from one node.
o FC Switch: 2 FC switches
  - Each switch is capable of 32Gbits/sec; there are two switches, so the total theoretical maximum throughput for both switches is 64Gbits/sec.
  - Each individual switch port is capable of 8Gbits/sec.
  - We are using 4 switch ports connected to HBA FC ports on the host nodes – this limits the maximum throughput at the switches to 32Gbits/sec (through both switches).
  - We are using 12 switch ports connected to the modular storage arrays.
o Storage Array: 6 modular arrays
  - We have 6 modular storage arrays.
  - We are using 2 ports from each storage array – each port has a theoretical maximum throughput of 4Gbits/sec.
  - We therefore have a total of 12 storage array connections to the two FC switches (6 connections to each switch).
  - The theoretical maximum throughput for the storage arrays is therefore 48Gbits/sec.
  - In our 2-node testing the combination of 6 storage arrays bottlenecked at approximately 20Gbits/sec (2.5 Gbytes/sec), so this is our approximate maximum throughput from both nodes.
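This report switches between Gbits/sec and Gbytes/sec throughout. As a quick illustrative check of the conversion (8 bits per byte; not part of the test procedure itself):

$ awk 'BEGIN { printf "%.2f Gbytes/sec\n", 20 / 8 }'
2.50 Gbytes/sec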
# vxdmpadm listenclosure
ENCLR_NAME      ENCLR_TYPE     ENCLR_SNO         STATUS     ARRAY_TYPE     LUN_COUNT  FIRMWARE
===============================================================================================
storagearray-0  STORAGEARRAY-  21000022a1035118  CONNECTED  A/A-A-STORAGE  4          1
storagearray-1  STORAGEARRAY-  21000022a1035119  CONNECTED  A/A-A-STORAGE  4          1
storagearray-2  STORAGEARRAY-  21000022a1035116  CONNECTED  A/A-A-STORAGE  4          1
storagearray-3  STORAGEARRAY-  21000022a1035117  CONNECTED  A/A-A-STORAGE  4          1
storagearray-4  STORAGEARRAY-  21000022a106c70a  CONNECTED  A/A-A-STORAGE  4          1
storagearray-5  STORAGEARRAY-  21000022a106c705  CONNECTED  A/A-A-STORAGE  4          1
o LUNs:
  - Each modular array has 4 enclosures with 12 disks each; only 11 disks are used in each enclosure for a RAID-0 LUN.
  - Each LUN is comprised of 11 disks (an 11-way stripe) with a 64Kb stripe width (one disk is kept as a failure disk).
  - There are 4 LUNs per modular array, therefore we have a total of 24 LUNs.
  - Each LUN is approximately 3TB.

All 24 LUNs can be displayed using the “vxdisk list” command:
# vxdisk list
DEVICE             TYPE          DISK               GROUP   STATUS
storagearray-0_16  auto:cdsdisk  storagearray-0_16  testdg  online shared
storagearray-0_17  auto:cdsdisk  storagearray-0_17  testdg  online shared
storagearray-0_18  auto:cdsdisk  storagearray-0_18  testdg  online shared
storagearray-0_20  auto:cdsdisk  storagearray-0_20  testdg  online shared
storagearray-1_6   auto:cdsdisk  storagearray-1_6   testdg  online shared
storagearray-1_7   auto:cdsdisk  storagearray-1_7   testdg  online shared
storagearray-1_8   auto:cdsdisk  storagearray-1_8   testdg  online shared
storagearray-1_9   auto:cdsdisk  storagearray-1_9   testdg  online shared
storagearray-2_5   auto:cdsdisk  storagearray-2_5   testdg  online shared
storagearray-2_6   auto:cdsdisk  storagearray-2_6   testdg  online shared
storagearray-2_7   auto:cdsdisk  storagearray-2_7   testdg  online shared
storagearray-2_8   auto:cdsdisk  storagearray-2_8   testdg  online shared
storagearray-3_4   auto:cdsdisk  storagearray-3_4   testdg  online shared
storagearray-3_6   auto:cdsdisk  storagearray-3_6   testdg  online shared
storagearray-3_7   auto:cdsdisk  storagearray-3_7   testdg  online shared
storagearray-3_8   auto:cdsdisk  storagearray-3_8   testdg  online shared
storagearray-4_8   auto:cdsdisk  storagearray-4_8   testdg  online shared
storagearray-4_9   auto:cdsdisk  storagearray-4_9   testdg  online shared
storagearray-4_10  auto:cdsdisk  storagearray-4_10  testdg  online shared
storagearray-4_11  auto:cdsdisk  storagearray-4_11  testdg  online shared
storagearray-5_8   auto:cdsdisk  storagearray-5_8   testdg  online shared
storagearray-5_9   auto:cdsdisk  storagearray-5_9   testdg  online shared
storagearray-5_10  auto:cdsdisk  storagearray-5_10  testdg  online shared
storagearray-5_11  auto:cdsdisk  storagearray-5_11  testdg  online shared
o DMP paths - 2 paths per LUN
  - There are 2 paths per LUN (on each node).
  - Both paths are active, therefore there are 48 active paths in total (on each node).

All 48 paths can be displayed using the “vxdisk path” command:
# vxdisk path
SUBPATH  DANAME             DMNAME             GROUP   STATE
sdad     storagearray-0_16  storagearray-0_16  testdg  ENABLED
sdo      storagearray-0_16  storagearray-0_16  testdg  ENABLED
sdab     storagearray-0_17  storagearray-0_17  testdg  ENABLED
sdm      storagearray-0_17  storagearray-0_17  testdg  ENABLED
sdae     storagearray-0_18  storagearray-0_18  testdg  ENABLED
sdp      storagearray-0_18  storagearray-0_18  testdg  ENABLED
sdac     storagearray-0_20  storagearray-0_20  testdg  ENABLED
sdn      storagearray-0_20  storagearray-0_20  testdg  ENABLED
sdx      storagearray-1_6   storagearray-1_6   testdg  ENABLED
sdan     storagearray-1_6   storagearray-1_6   testdg  ENABLED
sdaa     storagearray-1_7   storagearray-1_7   testdg  ENABLED
sdaq     storagearray-1_7   storagearray-1_7   testdg  ENABLED
sdz      storagearray-1_8   storagearray-1_8   testdg  ENABLED
sdap     storagearray-1_8   storagearray-1_8   testdg  ENABLED
sdy      storagearray-1_9   storagearray-1_9   testdg  ENABLED
sdao     storagearray-1_9   storagearray-1_9   testdg  ENABLED
sdat     storagearray-2_5   storagearray-2_5   testdg  ENABLED
sdw      storagearray-2_5   storagearray-2_5   testdg  ENABLED
sdar     storagearray-2_6   storagearray-2_6   testdg  ENABLED
sdu      storagearray-2_6   storagearray-2_6   testdg  ENABLED
sdas     storagearray-2_7   storagearray-2_7   testdg  ENABLED
sdv      storagearray-2_7   storagearray-2_7   testdg  ENABLED
sdaz     storagearray-2_8   storagearray-2_8   testdg  ENABLED
sday     storagearray-2_8   storagearray-2_8   testdg  ENABLED
sdq      storagearray-3_4   storagearray-3_4   testdg  ENABLED
sdau     storagearray-3_4   storagearray-3_4   testdg  ENABLED
sds      storagearray-3_6   storagearray-3_6   testdg  ENABLED
sdaw     storagearray-3_6   storagearray-3_6   testdg  ENABLED
sdav     storagearray-3_7   storagearray-3_7   testdg  ENABLED
sdr      storagearray-3_7   storagearray-3_7   testdg  ENABLED
sdax     storagearray-3_8   storagearray-3_8   testdg  ENABLED
sdt      storagearray-3_8   storagearray-3_8   testdg  ENABLED
sdaf     storagearray-4_8   storagearray-4_8   testdg  ENABLED
sdi      storagearray-4_8   storagearray-4_8   testdg  ENABLED
sdag     storagearray-4_9   storagearray-4_9   testdg  ENABLED
sdj      storagearray-4_9   storagearray-4_9   testdg  ENABLED
sdl      storagearray-4_10  storagearray-4_10  testdg  ENABLED
sdai     storagearray-4_10  storagearray-4_10  testdg  ENABLED
sdk      storagearray-4_11  storagearray-4_11  testdg  ENABLED
sdah     storagearray-4_11  storagearray-4_11  testdg  ENABLED
sdh      storagearray-5_8   storagearray-5_8   testdg  ENABLED
sdam     storagearray-5_8   storagearray-5_8   testdg  ENABLED
sdg      storagearray-5_9   storagearray-5_9   testdg  ENABLED
sdal     storagearray-5_9   storagearray-5_9   testdg  ENABLED
sde      storagearray-5_10  storagearray-5_10  testdg  ENABLED
sdaj     storagearray-5_10  storagearray-5_10  testdg  ENABLED
sdf      storagearray-5_11  storagearray-5_11  testdg  ENABLED
sdak     storagearray-5_11  storagearray-5_11  testdg  ENABLED
o VxVM volume
  - The idea is to achieve balanced I/O across all the LUNs, and to maximise the h/w I/O bandwidth.
  - As we have 24 LUNs available, we created our VxVM volume with 24 columns to obtain the maximum possible throughput.
  - We then tested using three different VxVM stripe unit widths: 64Kb, 512Kb and 1024Kb.
  - The “stripewidth” argument to the vxassist command is in units of 512-byte sectors (see the sketch below).
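The sector conversion is easy to check with shell arithmetic; an illustrative sketch, not part of the original procedure:

# stripewidth is specified in 512-byte sectors:
#   64KB  = 65536 bytes   -> 65536/512   = 128
#   512KB = 524288 bytes  -> 524288/512  = 1024
#   1MB   = 1048576 bytes -> 1048576/512 = 2048
$ echo $(( 524288 / 512 ))
1024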
Volume configuration using 64k stripe width volume, 24 columns:
# vxassist -g testdg make vol1 50T layout=striped stripewidth=128 `vxdisk list|grep storage|awk '{print $1}'`
v  vol1                  ENABLED ACTIVE  107374182400 SELECT vol1-01 fsgen
pl vol1-01       vol1    ENABLED ACTIVE  107374184448 STRIPE 24/128  RW
sd storagearray-0_16-01 vol1-01 storagearray-0_16 0 4473924352 0/0  storagearray-0_16 ENA
sd storagearray-0_17-01 vol1-01 storagearray-0_17 0 4473924352 1/0  storagearray-0_17 ENA
sd storagearray-0_18-01 vol1-01 storagearray-0_18 0 4473924352 2/0  storagearray-0_18 ENA
sd storagearray-0_20-01 vol1-01 storagearray-0_20 0 4473924352 3/0  storagearray-0_20 ENA
sd storagearray-1_6-01  vol1-01 storagearray-1_6  0 4473924352 4/0  storagearray-1_6  ENA
sd storagearray-1_7-01  vol1-01 storagearray-1_7  0 4473924352 5/0  storagearray-1_7  ENA
sd storagearray-1_8-01  vol1-01 storagearray-1_8  0 4473924352 6/0  storagearray-1_8  ENA
sd storagearray-1_9-01  vol1-01 storagearray-1_9  0 4473924352 7/0  storagearray-1_9  ENA
sd storagearray-2_5-01  vol1-01 storagearray-2_5  0 4473924352 8/0  storagearray-2_5  ENA
sd storagearray-2_6-01  vol1-01 storagearray-2_6  0 4473924352 9/0  storagearray-2_6  ENA
sd storagearray-2_7-01  vol1-01 storagearray-2_7  0 4473924352 10/0 storagearray-2_7  ENA
sd storagearray-2_8-01  vol1-01 storagearray-2_8  0 4473924352 11/0 storagearray-2_8  ENA
sd storagearray-3_4-01  vol1-01 storagearray-3_4  0 4473924352 12/0 storagearray-3_4  ENA
sd storagearray-3_6-01  vol1-01 storagearray-3_6  0 4473924352 13/0 storagearray-3_6  ENA
sd storagearray-3_7-01  vol1-01 storagearray-3_7  0 4473924352 14/0 storagearray-3_7  ENA
sd storagearray-3_8-01  vol1-01 storagearray-3_8  0 4473924352 15/0 storagearray-3_8  ENA
sd storagearray-4_8-01  vol1-01 storagearray-4_8  0 4473924352 16/0 storagearray-4_8  ENA
sd storagearray-4_9-01  vol1-01 storagearray-4_9  0 4473924352 17/0 storagearray-4_9  ENA
sd storagearray-4_10-01 vol1-01 storagearray-4_10 0 4473924352 18/0 storagearray-4_10 ENA
sd storagearray-4_11-01 vol1-01 storagearray-4_11 0 4473924352 19/0 storagearray-4_11 ENA
sd storagearray-5_8-01  vol1-01 storagearray-5_8  0 4473924352 20/0 storagearray-5_8  ENA
sd storagearray-5_9-01  vol1-01 storagearray-5_9  0 4473924352 21/0 storagearray-5_9  ENA
sd storagearray-5_10-01 vol1-01 storagearray-5_10 0 4473924352 22/0 storagearray-5_10 ENA
sd storagearray-5_11-01 vol1-01 storagearray-5_11 0 4473924352 23/0 storagearray-5_11 ENA
Volume configuration using 512k stripe width volume, 24 columns:
# vxassist -g testdg make vol1 50T layout=striped stripewidth=1024 `vxdisk list|grep storage|awk '{print $1}'`
v  vol1                  ENABLED ACTIVE  107374182400 SELECT vol1-01 fsgen
pl vol1-01       vol1    ENABLED ACTIVE  107374190592 STRIPE 24/1024 RW
sd storagearray-0_16-01 vol1-01 storagearray-0_16 0 4473924608 0/0  storagearray-0_16 ENA
sd storagearray-0_17-01 vol1-01 storagearray-0_17 0 4473924608 1/0  storagearray-0_17 ENA
sd storagearray-0_18-01 vol1-01 storagearray-0_18 0 4473924608 2/0  storagearray-0_18 ENA
sd storagearray-0_20-01 vol1-01 storagearray-0_20 0 4473924608 3/0  storagearray-0_20 ENA
sd storagearray-1_6-01  vol1-01 storagearray-1_6  0 4473924608 4/0  storagearray-1_6  ENA
sd storagearray-1_7-01  vol1-01 storagearray-1_7  0 4473924608 5/0  storagearray-1_7  ENA
sd storagearray-1_8-01  vol1-01 storagearray-1_8  0 4473924608 6/0  storagearray-1_8  ENA
sd storagearray-1_9-01  vol1-01 storagearray-1_9  0 4473924608 7/0  storagearray-1_9  ENA
sd storagearray-2_5-01  vol1-01 storagearray-2_5  0 4473924608 8/0  storagearray-2_5  ENA
sd storagearray-2_6-01  vol1-01 storagearray-2_6  0 4473924608 9/0  storagearray-2_6  ENA
sd storagearray-2_7-01  vol1-01 storagearray-2_7  0 4473924608 10/0 storagearray-2_7  ENA
sd storagearray-2_8-01  vol1-01 storagearray-2_8  0 4473924608 11/0 storagearray-2_8  ENA
sd storagearray-3_4-01  vol1-01 storagearray-3_4  0 4473924608 12/0 storagearray-3_4  ENA
sd storagearray-3_6-01  vol1-01 storagearray-3_6  0 4473924608 13/0 storagearray-3_6  ENA
sd storagearray-3_7-01  vol1-01 storagearray-3_7  0 4473924608 14/0 storagearray-3_7  ENA
sd storagearray-3_8-01  vol1-01 storagearray-3_8  0 4473924608 15/0 storagearray-3_8  ENA
sd storagearray-4_8-01  vol1-01 storagearray-4_8  0 4473924608 16/0 storagearray-4_8  ENA
sd storagearray-4_9-01  vol1-01 storagearray-4_9  0 4473924608 17/0 storagearray-4_9  ENA
sd storagearray-4_10-01 vol1-01 storagearray-4_10 0 4473924608 18/0 storagearray-4_10 ENA
sd storagearray-4_11-01 vol1-01 storagearray-4_11 0 4473924608 19/0 storagearray-4_11 ENA
sd storagearray-5_8-01  vol1-01 storagearray-5_8  0 4473924608 20/0 storagearray-5_8  ENA
sd storagearray-5_9-01  vol1-01 storagearray-5_9  0 4473924608 21/0 storagearray-5_9  ENA
sd storagearray-5_10-01 vol1-01 storagearray-5_10 0 4473924608 22/0 storagearray-5_10 ENA
sd storagearray-5_11-01 vol1-01 storagearray-5_11 0 4473924608 23/0 storagearray-5_11 ENA
Volume configuration using 1024k stripe width volume, 24 columns:
# vxassist -g testdg make vol1 50T layout=striped stripewidth=2048 `vxdisk list|grep storage|awk '{print $1}'`
v  vol1                  ENABLED ACTIVE  107374182400 SELECT vol1-01 fsgen
pl vol1-01       vol1    ENABLED ACTIVE  107374215168 STRIPE 24/2048 RW
sd storagearray-0_16-01 vol1-01 storagearray-0_16 0 4473925632 0/0  storagearray-0_16 ENA
sd storagearray-0_17-01 vol1-01 storagearray-0_17 0 4473925632 1/0  storagearray-0_17 ENA
sd storagearray-0_18-01 vol1-01 storagearray-0_18 0 4473925632 2/0  storagearray-0_18 ENA
sd storagearray-0_20-01 vol1-01 storagearray-0_20 0 4473925632 3/0  storagearray-0_20 ENA
sd storagearray-1_6-01  vol1-01 storagearray-1_6  0 4473925632 4/0  storagearray-1_6  ENA
sd storagearray-1_7-01  vol1-01 storagearray-1_7  0 4473925632 5/0  storagearray-1_7  ENA
sd storagearray-1_8-01  vol1-01 storagearray-1_8  0 4473925632 6/0  storagearray-1_8  ENA
sd storagearray-1_9-01  vol1-01 storagearray-1_9  0 4473925632 7/0  storagearray-1_9  ENA
sd storagearray-2_5-01  vol1-01 storagearray-2_5  0 4473925632 8/0  storagearray-2_5  ENA
sd storagearray-2_6-01  vol1-01 storagearray-2_6  0 4473925632 9/0  storagearray-2_6  ENA
sd storagearray-2_7-01  vol1-01 storagearray-2_7  0 4473925632 10/0 storagearray-2_7  ENA
sd storagearray-2_8-01  vol1-01 storagearray-2_8  0 4473925632 11/0 storagearray-2_8  ENA
sd storagearray-3_4-01  vol1-01 storagearray-3_4  0 4473925632 12/0 storagearray-3_4  ENA
sd storagearray-3_6-01  vol1-01 storagearray-3_6  0 4473925632 13/0 storagearray-3_6  ENA
sd storagearray-3_7-01  vol1-01 storagearray-3_7  0 4473925632 14/0 storagearray-3_7  ENA
sd storagearray-3_8-01  vol1-01 storagearray-3_8  0 4473925632 15/0 storagearray-3_8  ENA
sd storagearray-4_8-01  vol1-01 storagearray-4_8  0 4473925632 16/0 storagearray-4_8  ENA
sd storagearray-4_9-01  vol1-01 storagearray-4_9  0 4473925632 17/0 storagearray-4_9  ENA
sd storagearray-4_10-01 vol1-01 storagearray-4_10 0 4473925632 18/0 storagearray-4_10 ENA
sd storagearray-4_11-01 vol1-01 storagearray-4_11 0 4473925632 19/0 storagearray-4_11 ENA
sd storagearray-5_8-01  vol1-01 storagearray-5_8  0 4473925632 20/0 storagearray-5_8  ENA
sd storagearray-5_9-01  vol1-01 storagearray-5_9  0 4473925632 21/0 storagearray-5_9  ENA
sd storagearray-5_10-01 vol1-01 storagearray-5_10 0 4473925632 22/0 storagearray-5_10 ENA
sd storagearray-5_11-01 vol1-01 storagearray-5_11 0 4473925632 23/0 storagearray-5_11 ENA
2. VxVM maximum disk I/O: Read throughput test execution

Raw disk device read I/O test execution and collection of throughput results.

vxbench sequential read test execution method and result collection
An example of the vxbench command that we run on each node is below.
This test executes 64 parallel processes; each process reads from the same raw volume device using a block size of 1MB.
The output of the vxbench command provides the combined total throughput of all 64 parallel processes; we capture this information in our result table.
The result in this example test was 1577033.29 KBytes/second; therefore the result of this test was 1.504 GBytes/second.
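For reference, the KBytes/sec to GBytes/sec conversion used in the result tables can be reproduced with a one-liner (illustrative only, not part of the test harness):

$ awk 'BEGIN { printf "%.3f GBytes/sec\n", 1577033.29 / (1024 * 1024) }'
1.504 GBytes/sec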
Test: vxbench
IO : sequential read of raw volume
IOsize=1024K
VxVM volume stripe width 512KB, 24 columns
Processes: 64
$ ./vxbench -w read -i iosize=1024k,iotime=300,maxfilesize=40T \
    /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 ...
    (the raw volume device path is given 64 times in total, one per process)
user  1: 300.015 sec 24625.94 KB/s cpu: 0.75 sys 0.00 user
user  2: 300.024 sec 24498.95 KB/s cpu: 0.75 sys 0.00 user
user  3: 300.004 sec 24667.86 KB/s cpu: 0.75 sys 0.00 user
user  4: 300.020 sec 24417.35 KB/s cpu: 0.75 sys 0.00 user
user  5: 300.016 sec 24574.65 KB/s cpu: 0.74 sys 0.01 user
user  6: 300.012 sec 24615.97 KB/s cpu: 0.74 sys 0.01 user
user  7: 300.029 sec 24689.68 KB/s cpu: 0.76 sys 0.00 user
user  8: 300.023 sec 24587.75 KB/s cpu: 0.75 sys 0.00 user
user  9: 300.032 sec 24668.98 KB/s cpu: 0.76 sys 0.00 user
user 10: 300.024 sec 24795.84 KB/s cpu: 0.76 sys 0.00 user
user 11: 300.033 sec 24546.01 KB/s cpu: 0.75 sys 0.00 user
user 12: 300.024 sec 24761.75 KB/s cpu: 0.76 sys 0.00 user
user 13: 300.028 sec 24543.02 KB/s cpu: 0.76 sys 0.00 user
user 14: 300.014 sec 24591.96 KB/s cpu: 0.75 sys 0.01 user
user 15: 300.013 sec 24568.13 KB/s cpu: 0.75 sys 0.00 user
user 16: 300.037 sec 24624.20 KB/s cpu: 0.75 sys 0.00 user
user 17: 300.018 sec 24734.97 KB/s cpu: 0.76 sys 0.00 user
user 18: 300.003 sec 24596.26 KB/s cpu: 0.76 sys 0.00 user
user 19: 300.004 sec 24886.31 KB/s cpu: 0.77 sys 0.00 user
user 20: 300.007 sec 24879.24 KB/s cpu: 0.76 sys 0.00 user
user 21: 300.017 sec 24434.71 KB/s cpu: 0.75 sys 0.00 user
user 22: 300.027 sec 24437.31 KB/s cpu: 0.76 sys 0.00 user
user 23: 300.019 sec 24635.87 KB/s cpu: 0.75 sys 0.00 user
user 24: 300.028 sec 24665.88 KB/s cpu: 0.76 sys 0.00 user
user 25: 300.021 sec 24519.64 KB/s cpu: 0.75 sys 0.00 user
user 26: 300.022 sec 24587.85 KB/s cpu: 0.76 sys 0.00 user
user 27: 300.006 sec 24647.22 KB/s cpu: 0.77 sys 0.00 user
user 28: 300.019 sec 24666.62 KB/s cpu: 0.76 sys 0.00 user
user 29: 300.006 sec 24544.82 KB/s cpu: 0.76 sys 0.00 user
user 30: 300.022 sec 24625.35 KB/s cpu: 0.75 sys 0.00 user
user 31: 300.021 sec 24649.38 KB/s cpu: 0.75 sys 0.00 user
user 32: 300.016 sec 24701.01 KB/s cpu: 0.76 sys 0.00 user
user 33: 300.018 sec 24683.74 KB/s cpu: 0.75 sys 0.00 user
user 34: 300.018 sec 24738.38 KB/s cpu: 0.77 sys 0.00 user
user 35: 300.001 sec 24599.78 KB/s cpu: 0.75 sys 0.00 user
user 36: 300.008 sec 24674.30 KB/s cpu: 0.76 sys 0.00 user
user 37: 300.024 sec 24580.86 KB/s cpu: 0.75 sys 0.00 user
user 38: 300.023 sec 24628.71 KB/s cpu: 0.75 sys 0.00 user
user 39: 300.007 sec 24701.75 KB/s cpu: 0.77 sys 0.00 user
user 40: 300.026 sec 24765.01 KB/s cpu: 0.76 sys 0.00 user
user 41: 300.007 sec 24824.63 KB/s cpu: 0.76 sys 0.00 user
user 42: 300.015 sec 24707.90 KB/s cpu: 0.78 sys 0.00 user
user 43: 300.032 sec 24587.01 KB/s cpu: 0.76 sys 0.00 user
user 44: 300.027 sec 24700.06 KB/s cpu: 0.78 sys 0.00 user
user 45: 300.019 sec 24584.70 KB/s cpu: 0.77 sys 0.00 user
user 46: 300.013 sec 24745.56 KB/s cpu: 0.78 sys 0.00 user
user 47: 300.033 sec 24556.21 KB/s cpu: 0.77 sys 0.00 user
user 48: 300.012 sec 24728.58 KB/s cpu: 0.77 sys 0.01 user
user 49: 300.010 sec 24489.82 KB/s cpu: 0.76 sys 0.00 user
user 50: 300.020 sec 24751.83 KB/s cpu: 0.76 sys 0.01 user
user 51: 300.035 sec 24846.13 KB/s cpu: 0.77 sys 0.00 user
user 52: 300.012 sec 24639.83 KB/s cpu: 0.75 sys 0.00 user
user 53: 300.010 sec 24691.24 KB/s cpu: 0.77 sys 0.00 user
user 54: 300.029 sec 24686.29 KB/s cpu: 0.77 sys 0.00 user
user 55: 300.021 sec 24608.41 KB/s cpu: 0.77 sys 0.00 user
user 56: 300.027 sec 24440.67 KB/s cpu: 0.77 sys 0.00 user
user 57: 300.017 sec 24700.92 KB/s cpu: 0.77 sys 0.00 user
user 58: 300.026 sec 24645.57 KB/s cpu: 0.77 sys 0.00 user
user 59: 300.004 sec 24442.54 KB/s cpu: 0.76 sys 0.00 user
user 60: 300.011 sec 24749.21 KB/s cpu: 0.77 sys 0.00 user
user 61: 300.006 sec 24865.61 KB/s cpu: 0.77 sys 0.00 user
user 62: 300.023 sec 24468.29 KB/s cpu: 0.75 sys 0.00 user
user 63: 300.023 sec 24662.87 KB/s cpu: 0.77 sys 0.00 user
user 64: 300.017 sec 24646.26 KB/s cpu: 0.76 sys 0.00 user
total:   300.037 sec 1577033.29 KB/s cpu: 48.63 sys 0.05 user
iostat throughput data collection method and result collection
An example of the iostat command that we run on each node is below.
The sector size is 512 bytes.
Note that the average request size avgrq-sz is 1024; this is 1024 sectors * 512 bytes = 512KB read I/O size.
The result in this example test is 3155251.20 sectors/second; therefore the result of the test is 1.504 GBytes/second.
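To compute the host-wide figure yourself, sum the rsec/s column over all the sd* subpaths for one 20-second sample; a hedged sketch (rsec/s is field 6 in this iostat -x output, but field positions vary between sysstat versions, so check the header line first):

$ iostat -x 20 | awk '/^sd/ { sum += $6 } END { printf "%.3f GBytes/sec\n", sum * 512 / (1024 ^ 3) }'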
$ iostat -x 20
Device:    rrqm/s wrqm/s     r/s    w/s    rsec/s  wsec/s avgrq-sz avgqu-sz  await svctm  %util
sda          0.00   4.30    0.00   1.25      0.00   44.40    35.52     0.00   0.12  0.04   0.01
sdc          0.00   0.00    0.00   0.00      0.00    0.00     0.00     0.00   0.00  0.00   0.00
sdd          0.00   0.00    0.00   0.00      0.00    0.00     0.00     0.00   0.00  0.00   0.00
sdb          0.00   0.00    0.00   0.00      0.00    0.00     0.00     0.00   0.00  0.00   0.00
sdp          0.00   0.00   61.95   0.00  63436.80    0.00  1024.00     1.87  30.22 12.73  78.84
sdo          0.00   0.00   64.95   0.00  66508.80    0.00  1024.00     1.82  27.99 12.35  80.21
sdn          0.00   0.00   64.90   0.00  66457.60    0.00  1024.00     2.03  31.32 12.92  83.84
sds          0.00   0.00   64.90   0.00  66457.60    0.00  1024.00     2.05  31.60 12.76  82.79
sdt          0.00   0.00   64.75   0.00  66304.00    0.00  1024.00     2.16  33.38 12.47  80.75
sdq          0.00   0.00   63.20   0.00  64716.80    0.00  1024.00     1.88  29.71 12.37  78.20
sdr          0.00   0.00   64.75   0.00  66304.00    0.00  1024.00     2.05  31.67 12.28  79.48
sdx          0.00   0.00   62.40   0.00  63897.60    0.00  1024.00     2.05  32.81 13.16  82.10
sdz          0.00   0.00   66.60   0.00  68198.40    0.00  1024.00     2.37  35.60 12.27  81.70
sdy          0.00   0.00   64.65   0.00  66201.60    0.00  1024.00     2.37  36.69 12.89  83.35
sdaa         0.00   0.00   62.15   0.00  63641.60    0.00  1024.00     2.17  34.92 13.56  84.25
sdm          0.00   0.00   63.90   0.00  65433.60    0.00  1024.00     1.96  30.63 13.05  83.36
sdv          0.00   0.00   62.75   0.00  64256.00    0.00  1024.00     2.26  36.05 12.86  80.72
sdu          0.00   0.00   64.15   0.00  65689.60    0.00  1024.00     2.32  36.25 13.08  83.93
sdw          0.00   0.00   66.45   0.00  68044.80    0.00  1024.00     2.25  33.89 12.47  82.86
sdg          0.00   0.00   62.80   0.00  64307.20    0.00  1024.00     2.39  38.07 13.43  84.37
sdj          0.00   0.00   64.45   0.00  65996.80    0.00  1024.00     2.14  33.04 13.31  85.81
sdi          0.00   0.00   65.45   0.00  67020.80    0.00  1024.00     2.02  30.76 12.47  81.65
sdl          0.00   0.00   64.05   0.00  65587.20    0.00  1024.00     2.10  32.67 12.64  80.97
sdk          0.00   0.00   64.95   0.00  66508.80    0.00  1024.00     2.28  34.90 13.01  84.53
sdf          0.00   0.00   65.35   0.00  66918.40    0.00  1024.00     2.59  39.63 12.90  84.31
sde          0.00   0.00   62.45   0.00  63948.80    0.00  1024.00     2.38  38.16 12.90  80.53
sdh          0.00   0.00   65.25   0.00  66816.00    0.00  1024.00     2.43  37.21 12.73  83.05
sdab         0.00   0.00   64.30   0.00  65843.20    0.00  1024.00     2.15  33.46 13.61  87.53
sdac         0.00   0.00   63.55   0.00  65075.20    0.00  1024.00     2.23  35.19 13.48  85.69
sdad         0.00   0.00   63.20   0.00  64716.80    0.00  1024.00     2.01  31.87 13.19  83.39
sdae         0.00   0.00   66.40   0.00  67993.60    0.00  1024.00     2.25  33.89 12.58  83.50
sdaf         0.00   0.00   62.90   0.00  64409.60    0.00  1024.00     1.89  30.10 12.73  80.09
sdag         0.00   0.00   63.95   0.00  65484.80    0.00  1024.00     2.08  32.68 13.12  83.93
sdah         0.00   0.00   63.25   0.00  64768.00    0.00  1024.00     2.18  34.56 13.00  82.22
sdai         0.00   0.00   64.20   0.00  65740.80    0.00  1024.00     2.07  32.21 12.40  79.60
sdaj         0.00   0.00   65.70   0.00  67276.80    0.00  1024.00     2.44  37.07 12.33  81.03
sdak         0.00   0.00   62.75   0.00  64256.00    0.00  1024.00     2.44  38.80 13.33  83.64
sdal         0.00   0.00   65.40   0.00  66969.60    0.00  1024.00     2.41  36.79 13.08  85.56
sdam         0.00   0.00   62.90   0.00  64409.60    0.00  1024.00     2.26  35.92 12.97  81.59
sdan         0.00   0.00   66.00   0.00  67584.00    0.00  1024.00     2.13  32.23 12.36  81.59
sdao         0.00   0.00   64.05   0.00  65587.20    0.00  1024.00     2.28  35.60 13.04  83.54
sdap         0.00   0.00   62.05   0.00  63539.20    0.00  1024.00     2.15  34.65 12.77  79.26
sdaq         0.00   0.00   66.30   0.00  67891.20    0.00  1024.00     2.27  34.21 12.95  85.87
sdar         0.00   0.00   64.40   0.00  65945.60    0.00  1024.00     2.17  33.60 13.43  86.51
sdas         0.00   0.00   65.80   0.00  67379.20    0.00  1024.00     2.24  33.90 12.42  81.74
sdat         0.00   0.00   62.30   0.00  63795.20    0.00  1024.00     1.98  31.74 13.24  82.46
sdau         0.00   0.00   65.25   0.00  66816.00    0.00  1024.00     2.00  30.52 12.21  79.66
sdav         0.00   0.00   63.75   0.00  65280.00    0.00  1024.00     2.05  32.12 12.43  79.27
sdaw         0.00   0.00   63.55   0.00  65075.20    0.00  1024.00     2.11  33.08 13.10  83.22
sdax         0.00   0.00   63.80   0.00  65331.20    0.00  1024.00     2.21  34.69 12.86  82.07
sday         0.00   0.00   63.55   0.00  65075.20    0.00  1024.00     2.39  37.70 13.13  83.41
sdaz         0.00   0.00   64.95   0.00  66508.80    0.00  1024.00     2.35  36.08 13.09  85.01
VxVM59000    0.00   0.00 3081.30   0.00 3155251.20   0.00  1024.00   104.68  33.97  0.32 100.00
vxstat throughput data collection method and result collection
An example of the vxstat command that we run on each node is below.
The blocks in the vxstat output are in units of sectors, so the block size is 512 bytes.
Note that ‘blocks read / operations read’ gives the average I/O size:
2624512 blocks read / 2563 operations read = 1024 sectors = 512KB avg. read I/O size
The result in this example test is 63109120 blocks (512-byte sectors) read every 20 seconds; therefore the result of the test is 1.504 GBytes/second.
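The blocks-to-GBytes/sec conversion, for reference (illustrative only):

$ awk 'BEGIN { printf "%.3f GBytes/sec\n", 63109120 * 512 / 20 / (1024 ^ 3) }'
1.504 GBytes/sec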
$ vxstat -g testdg -vd -i 20

Fri 27 Feb 2015 12:49:49 PM IST
                            OPERATIONS          BLOCKS        AVG TIME(ms)
TYP NAME                   READ   WRITE      READ   WRITE     READ  WRITE
dm  storagearray-0_16      2563       0   2624512       0    29.86   0.00
dm  storagearray-0_17      2564       0   2625536       0    32.01   0.00
dm  storagearray-0_18      2568       0   2629632       0    32.08   0.00
dm  storagearray-0_20      2568       0   2629632       0    33.20   0.00
dm  storagearray-1_6       2570       0   2631680       0    32.47   0.00
dm  storagearray-1_7       2569       0   2630656       0    34.50   0.00
dm  storagearray-1_8       2572       0   2633728       0    35.12   0.00
dm  storagearray-1_9       2573       0   2634752       0    36.11   0.00
dm  storagearray-2_5       2576       0   2637824       0    32.81   0.00
dm  storagearray-2_6       2572       0   2633728       0    34.88   0.00
dm  storagearray-2_7       2570       0   2631680       0    34.93   0.00
dm  storagearray-2_8       2569       0   2630656       0    36.84   0.00
dm  storagearray-3_4       2570       0   2631680       0    30.09   0.00
dm  storagearray-3_6       2568       0   2629632       0    32.30   0.00
dm  storagearray-3_7       2570       0   2631680       0    31.84   0.00
dm  storagearray-3_8       2572       0   2633728       0    33.96   0.00
dm  storagearray-4_8       2567       0   2628608       0    30.40   0.00
dm  storagearray-4_9       2567       0   2628608       0    32.82   0.00
dm  storagearray-4_10      2566       0   2627584       0    32.41   0.00
dm  storagearray-4_11      2564       0   2625536       0    34.69   0.00
dm  storagearray-5_8       2563       0   2624512       0    36.54   0.00
dm  storagearray-5_9       2563       0   2624512       0    37.37   0.00
dm  storagearray-5_10      2563       0   2624512       0    37.57   0.00
dm  storagearray-5_11      2563       0   2624512       0    39.20   0.00
vol vol1                  61630       0  63109120       0    33.92   0.00
portperfshow – FC switch port throughput data collection method and result collection
An example of the command used to collect the throughput at the switch port is below.
The portperfshow command reports the throughput for one switch, so two ‘portperfshow’ commands are executed, one for each FC switch.
The ‘portperfshow’ total is of no use here, as we only want to collect the data for the specific ports that are connected to the host HBA FC ports.
In our test case this is port3 and port7. The other six ports are connected to the six modular storage arrays.
FC_switch1:admin> portperfshow
   0      1      2      3      4      5      6      7     8   9   10  11  12  13  ...  Total
=============================================================================================
234.4m 237.3m 238.5m 704.4m 231.4m 242.6m 239.0m 717.7m   0   0   0   0   0   0   ...  2.8g
FC_switch2:admin> portperfshow
   0      1      2      3      4      5      6      7     8   9   10  11  12  13  ...  Total
=============================================================================================
236.7m 236.0m 237.8m 708.0m 231.5m 238.2m 232.8m 715.5m   0   0   0   0   0   0   ...  2.8g
Therefore we have to add:
Switch1 port3 704.4 + port7 717.7 = 1422.1 MB/sec = 1.388769 Gbytes/sec
Switch2 port3 708.0 + port7 715.5 = 1423.5 MB/sec = 1.390137 Gbytes/sec
Total:                                            = 2.778906 Gbytes/sec
NOTE:
Measuring the throughput at the switch port always shows a higher reading than the throughput measured by vxbench/vxstat/iostat.
The measurement at the switch port is higher due to 8b/10b encoding overhead.
The I/O throughput reading is therefore best measured by vxbench/vxstat/iostat and not at the FC switch port.
Referring to the “Fibre channel roadmap v1.8” table at http://fibrechannel.org/fibre-channelroadmaps.html:
the 8GFC throughput is 1600MB/sec for full duplex, therefore the net throughput for each direction will be 800MB/sec.
As the HBA is a dual-port card, the maximum theoretical throughput for each direction will be 1600MB/sec.
However, referring to http://en.wikipedia.org/wiki/Fibre_Channel shows 8GFC is actually 797MB/sec for each direction.
Therefore, using our dual-port card, the maximum theoretical throughput for each direction will be 1594MB/sec (1.5566 GB/sec).
Therefore, per the specification, the maximum theoretical throughput in our environment will be 1.5566 GB/sec per node.
Above, we are measuring the throughput from both nodes at the FC switch ports whilst the test is running on both nodes; this bottlenecked at the storage arrays. The bottleneck at the storage array is approximately 2.5 Gbytes/sec, however measuring the throughput at the FC switch ports gives a higher reading of 2.77 Gbytes/sec, due to the encoding overhead explained above.
3. VxVM maximum disk I/O: Test results and conclusions

Raw volume device disk I/O throughput test results summary in Gbits per second:

Test program: vxbench
IO: sequential read of raw volume
IOsize=1024K
VxVM volume stripe widths 64KB, 512KB and 1024KB
VxVM volume 24 columns
Processes: 64

Summary of raw volume throughput (Gbits/sec)
Stripe width  Nodes  vxbench  iostat  Summary Gbits/sec  Recommended
64k           1      11.429   11.485  11.5
64k           2      19.457   19.543  19.5
512k          1      12.032   12.040  12.0               YES
512k          2      20.552   20.557  20.5               YES
1024k         1      12.029   12.037  12.0
1024k         2      20.341   20.331  20.3
Raw volume device disk I/O throughput detailed test results in GBytes per second:

               vxbench GB/s            iostat GB/s             vxstat GB/s             FC Switch GB/s
Stripe  Nodes  1st     2nd     Total   1st     2nd     Total   1st     2nd     Total   1st      2nd      Total
width          Node    Node            Node    Node            Node    Node            Switch   Switch
64k     1      1.429   -       1.429   1.436   -       1.436   1.428   -       1.428   0.782    0.795    1.577
64k     2      1.215   1.217   2.432   1.218   1.225   2.443   1.214   1.220   2.434   1.306    1.304    2.610
512k    1      1.504   -       1.504   1.505   -       1.505   1.504   -       1.504   0.803    0.829    1.632
512k    2      1.285   1.284   2.569   1.284   1.286   2.570   1.286   1.287   2.573   1.372    1.370    2.741
1024k   1      1.504   -       1.504   1.505   -       1.505   1.505   -       1.505   0.817    0.813    1.629
1024k   2      1.272   1.271   2.543   1.269   1.273   2.541   1.272   1.273   2.545   1.357    1.359    2.716
Conclusions and recommendations so far:
a. Maximum I/O size setting (RHEL6.5)
   The default operating system maximum I/O size is 512KB; there is no need to change the operating system’s default maximum I/O size tunable values.
b. VxVM stripe width setting
   The optimal VxVM stripe width for media server solutions is also 512KB; Veritas therefore recommend using a VxVM stripe width of 512KB.
c. VxVM stripe columns setting
   The hardware was configured to achieve maximum throughput when accessing all the available LUNs.
   The number of LUNs available using our storage configuration was 24.
   We therefore used all 24 LUNs in our VxVM volume to maximize the storage I/O bandwidth.
d. Balanced I/O
   Using a VxVM stripe width of 512KB and 24 columns and utilizing all paths, we were able to achieve balanced I/O across all the LUNs (see the iostat output).
   This then allowed us to easily identify the HBA bottleneck (using a single node) and storage bottlenecks (using both nodes).
e. Maximum achievable read I/O throughput using our hardware configuration
   o 12Gbits/sec (1.5Gbytes/sec) performing I/O from one node:
     - using our hardware configuration, we identified that the dual FC port HBA had a throughput bottleneck of 12Gbits/sec (1.5Gbytes/sec) – this is the maximum throughput we can achieve from each node.
   o 20Gbits/sec (2.5Gbytes/sec) performing I/O from two nodes:
     - using our hardware configuration, we identified the storage bottleneck of 20Gbits/sec (2.5Gbytes/sec).
f. Conclusion: From this point onwards we now know the maximum throughput achievable using our hardware configuration.
4. VxFS direct I/O maximum disk I/O: Read throughput test execution

This VxFS direct I/O test mimics the VxVM raw disk test by performing direct I/O to one file that contains a single contiguous extent.
Thereby, all the vxbench processes begin reading from the same device offset.
This VxFS direct I/O test is therefore equivalent to the VxVM raw device test; only the starting offset into the device is different.
Here are the details of the file we created for this test:

# ls -li file1
4 -rw-r--r-- 1 root root 34359738368 Mar 3 14:25 file1
# ls -lhi file1
4 -rw-r--r-- 1 root root 32G Mar 3 14:25 file1
# du -h file1
32G     file1

One file with a single contiguous extent of size 32GB:

# fsmap -HA ./file1
Volume  Extent Type  File Offset  Dev Offset   Extent Size  Inode#
vol1    Data         0 Bytes      34359738368  32.00 GB     4
Here is how we created this file and performed this test; note that we strongly recommend a file system block size of 8192:
// mkfs
$ mkfs -t vxfs /dev/vx/rdsk/testdg/vol1
version 10 layout
107374182400 sectors, 6710886400 blocks of size 8192, log size 32768 blocks
rcq size 8192 blocks
largefiles supported
maxlink supported
Note that for optimal read performance we recommend using the mount option of “noatime”.
The ‘noatime’ mount option prevents the inode access time being updated for every read operation.
// mount
$ mount -t vxfs -o noatime,largefiles,cluster /dev/vx/dsk/testdg/vol1 /data1
// create a file with a single 32GB extent and write to it
$ touch /data1/file1
$ /opt/VRTS/bin/setext -r 4194304 -f contig /data1/file1
$ dd if=/dev/zero of=/data1/file1 bs=128k count=262144
262144+0 records in
262144+0 records out
34359738368 bytes (34 GB) copied, 24.0118 s, 1.4 GB/s
$ /opt/VRTS/bin/fsmap -A /data1/file1
Volume  Extent Type  File Offset  Dev Offset   Extent Size  Inode#
vol1    Data         0            34359738368  34359738368  4
$ ls -lh /data1/file1
-rw-r--r-- 1 root root 32G Mar 3 14:12 /data1/file1
// umount the file system to clear the file data from memory
$ umount /data1
// mount the file system from both nodes
$ mount -t vxfs -o noatime,largefiles,cluster /dev/vx/dsk/testdg/vol1 /data1
// vxbench command execution, 64 processes reading from the same file using direct I/O
$ ./vxbench -w read -c direct -i iosize=1024k,iotime=300,maxfilesize=32G \
    /data1/file1 /data1/file1 /data1/file1 /data1/file1 ...
    (the file path /data1/file1 is given 64 times in total, one per process)
5. VxFS direct I/O maximum disk I/O: Test results

As expected, the results are the same as the VxVM raw disk read throughput test (all results in GBytes/second):

VxFS direct IO
               vxbench GB/s            iostat GB/s             vxstat GB/s             FC Switch GB/s
Stripe  Nodes  1st     2nd     Total   1st     2nd     Total   1st     2nd     Total   1st      2nd      Total
width          Node    Node            Node    Node            Node    Node            Switch   Switch
64k     1      1.423   -       1.423   1.428   -       1.428   1.428   -       1.428   0.769    0.768    1.537
64k     2      1.213   1.209   2.423   1.217   1.208   2.425   1.217   1.208   2.425   1.294    1.302    2.596
512k    1      1.502   -       1.502   1.504   -       1.504   1.504   -       1.504   0.801    0.802    1.603
512k    2      1.282   1.281   2.563   1.283   1.283   2.566   1.283   1.283   2.566   1.370    1.364    2.734
1024k   1      1.502   -       1.502   1.502   -       1.502   1.502   -       1.502   0.802    0.802    1.604
1024k   2      1.271   1.268   2.539   1.271   1.271   2.541   1.271   1.271   2.541   1.352    1.361    2.713
Using a stripe width of 512KB is recommended by Veritas.
The I/O is evenly balanced across all 24 LUNs. Below is the iostat output showing all 48 paths (1-node test):
$ iostat -x 20
Device:    rrqm/s wrqm/s     r/s    w/s    rsec/s  wsec/s avgrq-sz avgqu-sz  await svctm  %util
sde          0.00   0.00   65.25   0.00  66816.00    0.00  1024.00     2.26  34.55 12.69  82.80
sdf          0.00   0.00   63.85   0.00  65382.40    0.00  1024.00     2.32  36.19 13.34  85.18
sdg          0.00   0.00   64.95   0.00  66508.80    0.00  1024.00     2.23  34.36 13.30  86.39
sdh          0.00   0.00   63.45   0.00  64972.80    0.00  1024.00     2.09  32.92 12.82  81.34
sdi          0.00   0.00   66.50   0.00  68096.00    0.00  1024.00     2.13  32.10 12.76  84.83
sdj          0.00   0.00   62.90   0.00  64409.60    0.00  1024.00     2.19  34.79 13.83  86.98
sdk          0.00   0.00   64.15   0.00  65689.60    0.00  1024.00     2.27  35.41 13.41  86.04
sdl          0.00   0.00   65.00   0.00  66560.00    0.00  1024.00     2.23  34.35 12.72  82.65
sdm          0.00   0.00   64.95   0.00  66508.80    0.00  1024.00     2.12  32.69 13.03  84.61
sdn          0.00   0.00   66.60   0.00  68198.40    0.00  1024.00     2.21  33.34 12.66  84.32
sdo          0.00   0.00   62.95   0.00  64460.80    0.00  1024.00     1.94  30.84 12.92  81.36
sdp          0.00   0.00   61.20   0.00  62668.80    0.00  1024.00     1.93  31.63 13.12  80.28
sdq          0.00   0.00   62.95   0.00  64460.80    0.00  1024.00     1.98  31.52 13.15  82.80
sdr          0.00   0.00   65.70   0.00  67276.80    0.00  1024.00     2.23  33.94 12.65  83.10
sds          0.00   0.00   66.40   0.00  67993.60    0.00  1024.00     2.25  33.97 13.16  87.41
sdt          0.00   0.00   63.85   0.00  65382.40    0.00  1024.00     2.20  34.40 13.47  85.97
sdu          0.00   0.00   62.60   0.00  64102.40    0.00  1024.00     2.32  37.00 13.60  85.17
sdv          0.00   0.00   65.00   0.00  66560.00    0.00  1024.00     2.36  36.23 12.79  83.13
sdw          0.00   0.00   62.65   0.00  64153.60    0.00  1024.00     2.20  35.17 13.05  81.75
sdx          0.00   0.00   64.85   0.00  66406.40    0.00  1024.00     2.50  38.48 13.12  85.11
sdy          0.00   0.00   62.80   0.00  64307.20    0.00  1024.00     1.84  29.32 12.63  79.30
sdz          0.00   0.00   64.75   0.00  66304.00    0.00  1024.00     2.13  32.81 12.73  82.45
sdaa         0.00   0.00   62.20   0.00  63692.80    0.00  1024.00     1.95  31.29 12.58  78.24
sdab         0.00   0.00   63.85   0.00  65382.40    0.00  1024.00     2.00  31.35 13.22  84.43
sdac         0.00   0.00   61.75   0.00  63232.00    0.00  1024.00     1.98  31.93 13.00  80.28
sdad         0.00   0.00   65.25   0.00  66816.00    0.00  1024.00     2.18  33.30 13.15  85.79
sdae         0.00   0.00   64.10   0.00  65638.40    0.00  1024.00     2.22  34.64 13.11  84.05
sdaf         0.00   0.00   63.25   0.00  64768.00    0.00  1024.00     2.08  32.85 12.67  80.12
sdag         0.00   0.00   62.95   0.00  64460.80    0.00  1024.00     2.19  34.82 12.93  81.40
sdah         0.00   0.00   64.30   0.00  65843.20    0.00  1024.00     2.31  36.00 13.21  84.95
sdai         0.00   0.00   63.30   0.00  64819.20    0.00  1024.00     2.22  35.10 13.36  84.54
sdak         0.00   0.00   65.35   0.00  66918.40    0.00  1024.00     2.13  32.52 12.42  81.17
sdal         0.00   0.00   62.55   0.00  64051.20    0.00  1024.00     2.17  34.68 12.67  79.22
sdaj         0.00   0.00   64.80   0.00  66355.20    0.00  1024.00     2.15  33.23 12.53  81.22
sdam         0.00   0.00   61.90   0.00  63385.60    0.00  1024.00     2.19  35.28 13.44  83.18
sdan         0.00   0.00   64.40   0.00  65945.60    0.00  1024.00     2.31  35.80 12.98  83.58
sdaq         0.00   0.00   66.10   0.00  67686.40    0.00  1024.00     2.46  37.17 12.69  83.87
sdao         0.00   0.00   65.55   0.00  67123.20    0.00  1024.00     2.28  34.79 12.84  84.16
sdap         0.00   0.00   63.50   0.00  65024.00    0.00  1024.00     2.44  38.47 13.48  85.58
sdar         0.00   0.00   64.55   0.00  66099.20    0.00  1024.00     2.40  37.14 13.59  87.73
sdas         0.00   0.00   65.80   0.00  67379.20    0.00  1024.00     2.10  32.03 12.92  85.01
sdat         0.00   0.00   63.25   0.00  64768.00    0.00  1024.00     2.01  31.74 12.76  80.72
sdau         0.00   0.00   65.75   0.00  67328.00    0.00  1024.00     1.98  30.15 12.60  82.86
sdav         0.00   0.00   63.25   0.00  64768.00    0.00  1024.00     2.10  33.10 13.19  83.44
sdaw         0.00   0.00   63.30   0.00  64819.20    0.00  1024.00     1.97  31.20 13.32  84.32
sdax         0.00   0.00   61.85   0.00  63334.40    0.00  1024.00     2.01  32.52 13.27  82.08
sday         0.00   0.00   65.35   0.00  66918.40    0.00  1024.00     1.92  29.36 12.36  80.74
sdaz         0.00   0.00   67.15   0.00  68761.60    0.00  1024.00     2.07  30.88 12.08  81.15
VxVM40000    0.00   0.00 3078.65   0.00 3152537.60   0.00  1024.00   103.77  33.71  0.32 100.00
6. VxVM raw disk and VxFS direct I/O: Results comparison and conclusions

Raw volume device disk I/O throughput test results in Gbytes/sec:
VxVM RAW IO
               vxbench GB/s            iostat GB/s             vxstat GB/s             FC Switch GB/s
Stripe  Nodes  1st     2nd     Total   1st     2nd     Total   1st     2nd     Total   1st      2nd      Total
width          Node    Node            Node    Node            Node    Node            Switch   Switch
64k     1      1.429   -       1.429   1.436   -       1.436   1.428   -       1.428   0.782    0.795    1.577
64k     2      1.215   1.217   2.432   1.218   1.225   2.443   1.214   1.220   2.434   1.306    1.304    2.610
512k    1      1.504   -       1.504   1.505   -       1.505   1.504   -       1.504   0.803    0.829    1.632
512k    2      1.285   1.284   2.569   1.284   1.286   2.570   1.286   1.287   2.573   1.372    1.370    2.741
1024k   1      1.504   -       1.504   1.505   -       1.505   1.505   -       1.505   0.817    0.813    1.629
1024k   2      1.272   1.271   2.543   1.269   1.273   2.541   1.272   1.273   2.545   1.357    1.359    2.716
VxFS direct I/O disk I/O throughput test results in Gbytes/sec:

VxFS direct IO
               vxbench GB/s            iostat GB/s             vxstat GB/s             FC Switch GB/s
Stripe  Nodes  1st     2nd     Total   1st     2nd     Total   1st     2nd     Total   1st      2nd      Total
width          Node    Node            Node    Node            Node    Node            Switch   Switch
64k     1      1.423   -       1.423   1.428   -       1.428   1.428   -       1.428   0.769    0.768    1.537
64k     2      1.213   1.209   2.423   1.217   1.208   2.425   1.217   1.208   2.425   1.294    1.302    2.596
512k    1      1.502   -       1.502   1.504   -       1.504   1.504   -       1.504   0.801    0.802    1.603
512k    2      1.282   1.281   2.563   1.283   1.283   2.566   1.283   1.283   2.566   1.370    1.364    2.734
1024k   1      1.502   -       1.502   1.502   -       1.502   1.502   -       1.502   0.802    0.802    1.604
1024k   2      1.271   1.268   2.539   1.271   1.271   2.541   1.271   1.271   2.541   1.352    1.361    2.713
The CPU utilisation during both tests is very small:

VxVM RAW IO
Stripe          1st Node         2nd Node
width   Nodes   %usr    %sys     %usr    %sys
64k     1       1.13    2.63     -       -
64k     2       1.41    2.39     0.63    2.24
512k    1       0.93    1.25     -       -
512k    2       0.92    1.13     0.53    1.06
1024k   1       0.96    1.26     -       -
1024k   2       0.92    1.12     0.5     1.03

VxFS Direct IO
Stripe          1st Node         2nd Node
width   Nodes   %usr    %sys     %usr    %sys
64k     1       0.89    2.76     -       -
64k     2       0.63    2.34     0.63    2.34
512k    1       0.8     1.32     -       -
512k    2       0.65    1.17     0.65    1.17
1024k   1       0.82    1.31     -       -
1024k   2       0.64    1.11     0.64    1.11
Conclusions and recommendations so far:
a. Maximum I/O size setting
   The default operating system maximum I/O size is 512KB; there is no need to change the default maximum I/O size tunable values.
b. VxVM stripe width setting
   The optimal VxVM stripe width for media server solutions is also 512KB; Veritas therefore recommend using a VxVM stripe width of 512KB.
c. VxVM stripe columns setting
   The hardware was configured to achieve maximum throughput when accessing all the available LUNs.
   The number of LUNs available using our storage configuration was 24.
   We therefore used all 24 LUNs in our VxVM volume to maximize the storage I/O bandwidth.
d. Balanced I/O
   Using a VxVM stripe width of 512KB and 24 columns and utilizing all paths, we were able to achieve balanced I/O across all the LUNs (see the iostat output).
   This then allowed us to easily identify the HBA bottleneck (using a single node) and storage bottlenecks (using both nodes).
e. Maximum achievable read I/O throughput using our hardware configuration
   o 12Gbits/sec (1.5Gbytes/sec) performing I/O from one node:
     - using our hardware configuration, we identified that the dual FC port HBA had a throughput bottleneck of 12Gbits/sec (1.5Gbytes/sec) – this is the maximum throughput we can achieve from each node.
   o 20Gbits/sec (2.5Gbytes/sec) performing I/O from two nodes:
     - using our hardware configuration, we identified the storage bottleneck of 20Gbits/sec (2.5Gbytes/sec).
f. Conclusion: From this point onwards we now know the maximum throughput achievable using our hardware configuration.
g. For improved file system read performance, mount the file system using the “noatime” mount option.
   i. The ‘noatime’ mount option avoids the inode access time update for every read [access] operation.
   ii. The inode atime (access time) updates are asynchronous and do not go through the file system intent-log.
   iii. However, for maximum performance benefits in read-intensive workloads the “noatime” mount option is recommended.
   Example mount command:
   # mount -t vxfs -o noatime,largefiles,cluster /dev/vx/dsk/testdg/vol1 /data1
h. For improved file system write performance, mount the file system using the “nomtime” mount option.
   i. The ‘nomtime’ mount option is a lazy update of the inode modification time; it is only available with CFS.
   ii. The inode mtime (modification time) updates do go through the file system intent-log.
   iii. The ‘nomtime’ option does not remove the modification time update; it just delays it to improve CFS write performance.
   iv. For maximum performance benefits in write-intensive CFS workloads the “nomtime” mount option is recommended.
   Example mount command using noatime and nomtime:
   # mount -t vxfs -o noatime,nomtime,largefiles,cluster /dev/vx/dsk/testdg/vol1 /data1
i. Conclusion: The test results show that VxFS direct I/O does not degrade sequential read I/O throughput performance compared to raw disk.
   i. By creating a file system and creating a file with a single contiguous extent we could emulate the raw disk read throughput using VxFS direct I/O.
   ii. Each direct I/O read will fetch data from disk, so no buffering is being performed using either direct I/O or raw disk I/O.
   iii. Using VxFS direct I/O and running an identical vxbench test, we hit the same maximum achievable read I/O throughput.
   iv. Therefore the sequential read throughput was not impacted using VxFS direct I/O compared to reading from VxVM raw disk.
7. VxFS buffered I/O maximum disk I/O throughput test: Test execution

This VxFS buffered I/O test is different: for the buffered read I/O throughput test, each process needs to read from a different file.
To prepare the files for this test we pre-allocate 16GB of file system space to each file, then write to the files to increase their file size to 16GB.
To pre-create the 64 files for this test the following script can be used. The script assumes an 8192-byte file system block size is being used.
mkdir /data1/primary
mkdir /data1/secondary
for n in `seq 1 64`
do
    touch /data1/primary/file${n}
    # reserve a contiguous extent: 2097152 blocks * 8KB block size = 16GB
    /opt/VRTS/bin/setext -r 2097152 -f contig /data1/primary/file${n}
    # fill the reservation: 131072 * 128KB = 16GB
    dd if=/dev/zero of=/data1/primary/file${n} bs=128k count=131072 &
    touch /data1/secondary/file${n}
    /opt/VRTS/bin/setext -r 2097152 -f contig /data1/secondary/file${n}
    dd if=/dev/zero of=/data1/secondary/file${n} bs=128k count=131072 &
done
wait    # wait for all the background dd processes to finish
When this script has finished, some of the file data will remain in memory; before we run our buffered I/O test we need to remove the file data from memory.
Note that for improved read performance you can also use the “noatime” mount option.
The ‘noatime’ mount option prevents the inode access time being updated for every read operation.
We did not use the “noatime” mount option in our test.
To remove the file data from memory the file system can be unmounted and mounted again.
Alternatively, a simple trick can be used to remove the file data from memory before each test run by using the “remount” mount option, as follows:

// mount
$ mount -t vxfs -o remount,noatime,largefiles,cluster /dev/vx/dsk/testdg/vol1 /data1
Again we are using vxbench to perform our test. This time, however, we need to explicitly stipulate the path to each separate file on the vxbench command line, as shown below.
Note also that the iosize argument has been changed: we are no longer reading using a 1024KB block size; in our VxFS buffered I/O test we are reading using a 32KB block size, because a smaller read(2) iosize will be used in the media server solution implementation.
# ./vxbench -w read -i iosize=32k,iotime=300,maxfilesize=16G \
    /data1/primary/file1 /data1/primary/file2 /data1/primary/file3 ... /data1/primary/file64
    (all 64 file paths, /data1/primary/file1 through /data1/primary/file64, are given on the
    command line, one per process; the full command is shown in TEST1 below)
8. VxFS buffered I/O max disk I/O throughput test: Tests, test results and individual test conclusions

All the tests in this entire report read from disk using sequential read I/O.

VxFS readahead is required
- The greatest impact on the performance of sequential reads from disk when using VxFS/CFS buffered I/O is readahead.
- File system readahead utilizes the file system page cache to asynchronously pre-fetch file data into memory; this naturally benefits sequential read I/O performance.
- Our buffered I/O sequential read performance tests demonstrate the impact of readahead and highlight how tuning readahead can avoid a potential imbalance in throughput between processes.
- Readahead is tunable using the ‘read_pref_io’ and ‘read_nstream’ VxFS tunables.

The VxVM volume configuration will impact readahead
- We have already determined, in our earlier testing above, that the optimal VxVM stripe width to maximize the I/O throughput is 512KB running our test.
- In our storage configuration we created 24 LUNs across 6 modular arrays; by striping across all 24 LUNs we can balance the I/O across the LUNs and maximize the overall storage bandwidth.
- Using this optimal volume configuration we could easily identify two bottlenecks, one due to the FC HBA ports (a per-node bottleneck) and the other in the storage itself.
- However, the volume stripe width and the number of columns (LUNs) in the volume are also used to auto-tune the values for the ‘read_pref_io’ and ‘read_nstream’ VxFS tunables.

VxFS readahead tunables – default values
- When mounting a VxFS file system it will auto-tune values for the ‘read_pref_io’ and ‘read_nstream’ VxFS tunables. These two tunables are used to tune VxFS readahead.
- The value for read_pref_io will be set to the VxVM volume stripe width – therefore the default auto-tuned value is read_pref_io=524288 in our test.
- The value for read_nstream will be the number of columns (LUNs) in the volume – therefore the default auto-tuned value is read_nstream=24 in our test.
- VxFS picks the default values for these tunables from the VxVM volume configuration. This means read_pref_io=524288 and read_nstream=24 will be set by default by VxFS at mount time using our volume configuration.

VxFS readahead tunables – maximum amount of file data that will be pre-fetched
- The maximum amount of file data that is pre-fetched from disk using read_ahead is determined by read_pref_io*read_nstream.
- Therefore, by default, the maximum amount of read_ahead will be “512KB * 24 = 12MB” using our volume configuration.
- As we will see during the buffered I/O testing, pre-fetching 12MB of file data is too much readahead; we found this caused an imbalance in read I/O throughput between processes.
VxFS readahead tunable – read_pref_io
- The VxFS read_pref_io tunable is set to the VxVM volume stripe width by default. The tunable means the “preferred read I/O size”.
- VxFS readahead will be triggered by two sequential read I/Os. The amount of file data to pre-fetch from disk is increased as more sequential I/Os are performed.
- As mentioned above, the maximum amount of readahead (the maximum amount of file data to pre-fetch from disk) is read_pref_io*read_nstream.
- However, the maximum I/O request size submitted by VxFS to VxVM will be ‘read_pref_io’. Therefore read_pref_io is the maximum read I/O request size submitted to VxVM.
- What it means if read_pref_io is set to 512KB (illustrated after this list):
  o If (for example) we read a file using the ‘dd’ command and use a dd block size of 8KB, then VxFS readahead will pre-fetch the file data using I/O requests of size 512KB to VxVM.
  o Readahead can therefore result in a smaller number of I/Os and a larger I/O request size, thus improving read I/O performance.
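A hedged illustration of this effect (the file path is from our test setup; this particular dd invocation is an example, not one of the formal tests in this report):

$ dd if=/data1/primary/file1 of=/dev/null bs=8k count=131072   # the application reads 8KB at a time
$ iostat -x 20    # run in another terminal: avgrq-sz of ~1024 sectors (512KB) shows that
                  # readahead has converted the 8KB application reads into 512KB disk reads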
- Veritas do not recommend tuning ‘read_pref_io’ away from its default auto-tuned value.
- If a different value (other than the default value) for ‘read_pref_io’ is desired, then Veritas recommend changing the volume stripe width instead.

VxFS readahead tunable – read_nstream
- The read_nstream value defaults to the number of columns in the VxVM volume.
- As mentioned above, the maximum amount of readahead (the maximum amount of file data to pre-fetch from disk) is read_pref_io*read_nstream.
- To reduce the maximum amount of read_ahead, simply reduce the value of read_nstream; please see the results of our tests using different values for read_nstream below.
- The best practice for tuning readahead is as follows (an example sketch follows this list):
  o Do not change the auto-tuned value for read_pref_io; if you want to change read_pref_io, change the VxVM volume stripe width instead.
  o Reduce read_nstream to reduce the amount of readahead.
  o You could disable readahead if necessary, but this will usually be a disadvantage (see test4).
  o Use /etc/tunefstab to set read_nstream; this means the value will persist across a reboot.
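An illustrative sketch of this best practice (read_nstream=4 is just an example value here, not the recommendation derived from the tests below; check the vxtunefs(1M) and tunefstab(4) manual pages for the exact syntax on your release):

// change the value for the current mount
$ vxtunefs -o read_nstream=4 /data1
// persist the value across reboots via /etc/tunefstab
$ cat /etc/tunefstab
/dev/vx/dsk/testdg/vol1 read_nstream=4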
Summary:
- By performing sequential reads using VxFS buffered I/O with readahead, the application I/O size is effectively converted to read_pref_io-sized requests to VxVM.
- So there are two performance benefits of readahead: one is to pre-fetch file data from disk, the other is to increase the I/O size of the read requests from disk (so reducing the number of I/Os).
- These buffered I/O throughput tests will therefore help you decide what stripe width, number of columns and readahead tuning is best for your solution implementation.
- Also, these buffered I/O throughput tests will help you to determine how many running processes you will want to be reading from disk at the same time.

Buffered I/O tests:
- We have chosen a volume configuration that was best for disk I/O performance; however, this volume configuration also results in very aggressive read_ahead (12MB at maximum).
- With a stripe width of 512KB and 24 LUNs (columns), the default maximum read_ahead is therefore too aggressive.
TEST1: Use the default auto-tuned settings, using one node <this is the baseline test>
Baseline vxbench test – 64 files / 64 processes / 32KB block size
Default auto-tuning – read_ahead enabled / read_nstream=24 / read_pref_io=524288
# vxtunefs /data1
Filesystem I/O parameters for /data1
read_pref_io = 524288
read_nstream = 24
read_ahead = 1
# mount -t vxfs -o remount,largefiles,cluster /dev/vx/dsk/testdg/vol1 /data1
# ./vxbench -w read -i iosize=32k,iotime=300,maxfilesize=16G /data1/primary/file1 /data1/primary/file2
/data1/primary/file3 /data1/primary/file4 /data1/primary/file5 /data1/primary/file6
/data1/primary/file7 /data1/primary/file8 /data1/primary/file9 /data1/primary/file10
/data1/primary/file11 /data1/primary/file12 /data1/primary/file13 /data1/primary/file14
/data1/primary/file15 /data1/primary/file16 /data1/primary/file17 /data1/primary/file18
/data1/primary/file19 /data1/primary/file20 /data1/primary/file21 /data1/primary/file22
/data1/primary/file23 /data1/primary/file24 /data1/primary/file25 /data1/primary/file26
/data1/primary/file27 /data1/primary/file28 /data1/primary/file29 /data1/primary/file30
/data1/primary/file31 /data1/primary/file32 /data1/primary/file33 /data1/primary/file34
/data1/primary/file35 /data1/primary/file36 /data1/primary/file37 /data1/primary/file38
/data1/primary/file39 /data1/primary/file40 /data1/primary/file41 /data1/primary/file42
/data1/primary/file43 /data1/primary/file44 /data1/primary/file45 /data1/primary/file46
/data1/primary/file47 /data1/primary/file48 /data1/primary/file49 /data1/primary/file50
/data1/primary/file51 /data1/primary/file52 /data1/primary/file53 /data1/primary/file54
/data1/primary/file55 /data1/primary/file56 /data1/primary/file57 /data1/primary/file58
/data1/primary/file59 /data1/primary/file60 /data1/primary/file61 /data1/primary/file62
/data1/primary/file63 /data1/primary/file64
user  1: 300.062 sec 48868.77 KB/s cpu: 9.78 sys 0.08 user
user  2: 300.102 sec 48370.93 KB/s cpu: 9.78 sys 0.06 user
user  3: 300.042 sec 48094.01 KB/s cpu: 9.86 sys 0.08 user
user  4: 300.176 sec  4461.92 KB/s cpu: 1.01 sys 0.00 user
user  5: 300.105 sec  4584.10 KB/s cpu: 1.12 sys 0.00 user
user  6: 300.102 sec 48125.32 KB/s cpu: 9.85 sys 0.08 user
user  7: 300.031 sec 48341.50 KB/s cpu: 9.79 sys 0.07 user
user  8: 300.201 sec  4583.81 KB/s cpu: 1.12 sys 0.01 user
user  9: 300.194 sec  4582.32 KB/s cpu: 1.14 sys 0.00 user
user 10: 300.203 sec  4755.40 KB/s cpu: 1.19 sys 0.00 user
user 11: 300.126 sec 48121.38 KB/s cpu: 9.74 sys 0.08 user
user 12: 300.220 sec  4500.70 KB/s cpu: 1.01 sys 0.00 user
user 13: 300.201 sec  4665.25 KB/s cpu: 1.11 sys 0.00 user
user 14: 300.086 sec 48291.58 KB/s cpu: 9.74 sys 0.07 user
user 15: 300.165 sec  4501.41 KB/s cpu: 1.01 sys 0.01 user
user 16: 300.203 sec  4633.57 KB/s cpu: 1.16 sys 0.00 user
user 17: 300.147 sec 48159.06 KB/s cpu: 9.64 sys 0.08 user
user 18: 300.035 sec 48504.56 KB/s cpu: 9.41 sys 0.08 user
user 19: 300.078 sec 48497.65 KB/s cpu: 9.73 sys 0.07 user
user 20: 300.161 sec 48238.58 KB/s cpu: 9.66 sys 0.08 user
user 21: 300.136 sec 48201.71 KB/s cpu: 9.74 sys 0.08 user
user 22: 300.193 sec  4705.78 KB/s cpu: 1.21 sys 0.00 user
user 23: 300.086 sec 48045.94 KB/s cpu: 9.86 sys 0.07 user
user 24: 300.062 sec 47926.93 KB/s cpu: 9.69 sys 0.08 user
user 25: 300.198 sec  4460.09 KB/s cpu: 1.11 sys 0.01 user
user 26: 300.207 sec  4623.79 KB/s cpu: 1.09 sys 0.00 user
user 27: 300.215 sec  4582.00 KB/s cpu: 1.01 sys 0.00 user
user 28: 300.125 sec 48203.53 KB/s cpu: 9.70 sys 0.08 user
user 29: 300.141 sec 48323.77 KB/s cpu: 9.65 sys 0.07 user
user 30: 300.212 sec  4705.48 KB/s cpu: 1.20 sys 0.00 user
user 31: 300.153 sec 48485.59 KB/s cpu: 9.72 sys 0.07 user
user 32: 300.163 sec 48033.68 KB/s cpu: 9.66 sys 0.07 user
user 33: 300.160 sec 48525.35 KB/s cpu: 9.82 sys 0.07 user
user 34: 300.144 sec  4624.56 KB/s cpu: 1.09 sys 0.01 user
user 35: 300.102 sec 48002.47 KB/s cpu: 9.60 sys 0.07 user
user 36: 300.203 sec  4821.38 KB/s cpu: 1.18 sys 0.01 user
user 37: 300.006 sec 48072.18 KB/s cpu: 9.64 sys 0.07 user
user 38: 300.219 sec  4746.29 KB/s cpu: 1.15 sys 0.00 user
user 39: 300.213 sec  4701.73 KB/s cpu: 1.18 sys 0.00 user
user 40: 300.176 sec  4460.00 KB/s cpu: 1.13 sys 0.00 user
user 41: 300.207 sec  4583.50 KB/s cpu: 1.05 sys 0.00 user
user 42: 300.213 sec  4624.56 KB/s cpu: 1.03 sys 0.00 user
user 43: 300.049 sec 48789.10 KB/s cpu: 9.87 sys 0.08 user
user 44: 300.207 sec  4708.85 KB/s cpu: 1.18 sys 0.00 user
user 45: 300.077 sec 48129.27 KB/s cpu: 9.59 sys 0.07 user
user 46: 300.079 sec 48374.66 KB/s cpu: 9.74 sys 0.07 user
user 47: 300.099 sec 48494.28 KB/s cpu: 9.64 sys 0.09 user
user 48: 300.064 sec 48581.86 KB/s cpu: 9.47 sys 0.08 user
user 49: 300.199 sec  4705.78 KB/s cpu: 1.10 sys 0.00 user
user 50: 300.204 sec  4788.64 KB/s cpu: 1.20 sys 0.01 user
user 51: 300.032 sec  9044.38 KB/s cpu: 1.94 sys 0.02 user
user 52: 300.120 sec 47917.67 KB/s cpu: 9.69 sys 0.06 user
user 53: 300.128 sec 48407.76 KB/s cpu: 9.56 sys 0.07 user
user 54: 300.203 sec  4746.24 KB/s cpu: 1.07 sys 0.00 user
user 55: 300.201 sec  4460.37 KB/s cpu: 1.02 sys 0.01 user
user 56: 300.206 sec  4623.49 KB/s cpu: 1.11 sys 0.00 user
user 57: 300.212 sec  4664.43 KB/s cpu: 1.09 sys 0.00 user
user 58: 300.212 sec  4664.76 KB/s cpu: 1.06 sys 0.00 user
user 59: 300.211 sec  4623.52 KB/s cpu: 1.04 sys 0.01 user
user 60: 300.206 sec  4623.80 KB/s cpu: 1.08 sys 0.01 user
user 61: 300.133 sec 12111.95 KB/s cpu: 2.64 sys 0.02 user
user 62: 300.035 sec  9945.29 KB/s cpu: 2.15 sys 0.01 user
user 63: 300.195 sec  4583.47 KB/s cpu: 1.13 sys 0.00 user
user 64: 300.047 sec 48093.15 KB/s cpu: 9.80 sys 0.09 user
total:   300.220 sec 1578817.49 KB/s cpu: 323.53 sys 2.31 user
Conclusion to TEST1: <this is our baseline test, using the default auto-tuned values, read_nstream is
therefore set to its default value of 24>
o This test ran for 300.220 seconds and read from disk at an average rate of 1578817.49 KB/sec;
vxbench therefore read approximately 452 GB of data from disk.
o The throughput per process is very imbalanced: some processes achieved ~49000 KB/sec while
other processes only achieved ~4800 KB/sec.
o However, the maximum possible read I/O throughput from one node is still being achieved:
1578817.49 KB/sec = 1.506 GB/sec.
o The problem is not the total throughput; the problem is that the maximum readahead per process is
12MB at a time.
o 12MB of readahead (read_pref_io*read_nstream = 512KB x 24) is too aggressive and is causing an
imbalance of throughput between processes.
o This readahead configuration is therefore a failure: too much readahead is causing an imbalance of
throughput between the processes.
o We do not want to change the value of read_pref_io because we want to request large I/O sizes for
better performance.
o By default the VxFS read_pref_io tunable is set to the VxVM volume stripe-width; in our test this
value is 512KB.
o By default the VxFS read_nstream tunable is set to the number of columns in the VxVM volume;
in our test this value is 24 (we have 24 LUNs).
o Next, we therefore want to experiment by setting smaller values of read_nstream, and also test
with read_ahead disabled.
o Our goal is to maintain the maximum amount of total throughput (approx. 1.5 Gbytes/sec) whilst
also spreading this throughput evenly between all the active processes reading from disk.
TEST2: change read_nstream to 1, keep everything else the same as the baseline test.
vxbench – 64files/64processes/32KB block size
Tuning – read_ahead enabled/read_nstream=1/read_pref_io=524288
# vxtunefs /data1 -o read_nstream=1
UX:vxfs vxtunefs: INFO: V-3-22525: Parameters successfully set for /data1
# vxtunefs /data1
Filesystem I/O parameters for /data1
read_pref_io = 524288
read_nstream = 1
read_ahead = 1
# mount -t vxfs -o remount,largefiles,cluster /dev/vx/dsk/testdg/vol1 /data1
# ./vxbench -w read -i iosize=32k,iotime=300,maxfilesize=16G /data1/primary/file1 /data1/primary/file2
/data1/primary/file3 /data1/primary/file4 /data1/primary/file5 /data1/primary/file6
/data1/primary/file7 /data1/primary/file8 /data1/primary/file9 /data1/primary/file10
/data1/primary/file11 /data1/primary/file12 /data1/primary/file13 /data1/primary/file14
/data1/primary/file15 /data1/primary/file16 /data1/primary/file17 /data1/primary/file18
/data1/primary/file19 /data1/primary/file20 /data1/primary/file21 /data1/primary/file22
/data1/primary/file23 /data1/primary/file24 /data1/primary/file25 /data1/primary/file26
/data1/primary/file27 /data1/primary/file28 /data1/primary/file29 /data1/primary/file30
/data1/primary/file31 /data1/primary/file32 /data1/primary/file33 /data1/primary/file34
/data1/primary/file35 /data1/primary/file36 /data1/primary/file37 /data1/primary/file38
/data1/primary/file39 /data1/primary/file40 /data1/primary/file41 /data1/primary/file42
/data1/primary/file43 /data1/primary/file44 /data1/primary/file45 /data1/primary/file46
/data1/primary/file47 /data1/primary/file48 /data1/primary/file49 /data1/primary/file50
/data1/primary/file51 /data1/primary/file52 /data1/primary/file53 /data1/primary/file54
/data1/primary/file55 /data1/primary/file56 /data1/primary/file57 /data1/primary/file58
/data1/primary/file59 /data1/primary/file60 /data1/primary/file61 /data1/primary/file62
/data1/primary/file63 /data1/primary/file64
user  1: 300.013 sec 24639.76 KB/s cpu: 5.35 sys 0.05 user
user  2: 300.044 sec 24748.27 KB/s cpu: 5.41 sys 0.06 user
user  3: 300.010 sec 24706.66 KB/s cpu: 5.52 sys 0.06 user
user  4: 300.021 sec 24872.94 KB/s cpu: 5.46 sys 0.05 user
user  5: 300.023 sec 24724.40 KB/s cpu: 5.58 sys 0.05 user
user  6: 300.060 sec 24683.79 KB/s cpu: 5.58 sys 0.06 user
user  7: 300.021 sec 24744.96 KB/s cpu: 5.66 sys 0.06 user
user  8: 300.016 sec 24680.46 KB/s cpu: 5.49 sys 0.06 user
user  9: 300.017 sec 24784.51 KB/s cpu: 5.55 sys 0.06 user
user 10: 300.021 sec 24744.97 KB/s cpu: 5.54 sys 0.05 user
user 11: 300.015 sec 24747.12 KB/s cpu: 5.54 sys 0.06 user
user 12: 300.017 sec 24830.60 KB/s cpu: 5.46 sys 0.05 user
user 13: 300.013 sec 24824.11 KB/s cpu: 5.61 sys 0.05 user
user 14: 300.028 sec 24729.11 KB/s cpu: 5.57 sys 0.05 user
user 15: 300.017 sec 24752.09 KB/s cpu: 5.42 sys 0.06 user
user 16: 300.028 sec 24655.71 KB/s cpu: 5.53 sys 0.06 user
user 17: 300.013 sec 24834.38 KB/s cpu: 5.68 sys 0.05 user
user 18: 300.048 sec 24773.52 KB/s cpu: 5.52 sys 0.07 user
user 19: 300.024 sec 24697.01 KB/s cpu: 5.50 sys 0.07 user
user 20: 300.012 sec 24938.48 KB/s cpu: 5.61 sys 0.06 user
user 21: 300.016 sec 24646.33 KB/s cpu: 5.54 sys 0.06 user
user 22: 300.016 sec 24689.11 KB/s cpu: 5.57 sys 0.05 user
user 23: 300.019 sec 24695.60 KB/s cpu: 5.50 sys 0.06 user
user 24: 300.023 sec 24719.31 KB/s cpu: 5.59 sys 0.05 user
user 25: 300.015 sec 24755.66 KB/s cpu: 5.58 sys 0.05 user
user 26: 300.018 sec 24596.75 KB/s cpu: 5.59 sys 0.07 user
user 27: 300.049 sec 24717.11 KB/s cpu: 5.54 sys 0.08 user
user 28: 300.019 sec 24753.74 KB/s cpu: 5.59 sys 0.06 user
user 29: 300.021 sec 24214.23 KB/s cpu: 5.44 sys 0.06 user
user 30: 300.021 sec 24772.27 KB/s cpu: 5.61 sys 0.05 user
user 31: 300.019 sec 24908.96 KB/s cpu: 5.68 sys 0.05 user
user 32: 300.045 sec 24637.23 KB/s cpu: 5.53 sys 0.06 user
user 33: 300.053 sec 24677.55 KB/s cpu: 5.59 sys 0.05 user
user 34: 300.017 sec 24692.39 KB/s cpu: 5.60 sys 0.07 user
user 35: 300.018 sec 24787.86 KB/s cpu: 5.55 sys 0.06 user
user 36: 300.019 sec 24741.70 KB/s cpu: 5.57 sys 0.07 user
user 37: 300.015 sec 24813.68 KB/s cpu: 5.52 sys 0.06 user
user 38: 300.014 sec 24808.66 KB/s cpu: 5.40 sys 0.06 user
user 39: 300.013 sec 24716.57 KB/s cpu: 5.53 sys 0.06 user
user 40: 300.024 sec 24705.55 KB/s cpu: 5.54 sys 0.06 user
user 41: 300.039 sec 24796.47 KB/s cpu: 5.50 sys 0.05 user
user 42: 300.044 sec 24852.33 KB/s cpu: 5.60 sys 0.05 user
user 43: 300.044 sec 24836.97 KB/s cpu: 5.59 sys 0.06 user
user 44: 300.028 sec 24735.94 KB/s cpu: 5.54 sys 0.05 user
user 45: 300.060 sec 24803.28 KB/s cpu: 5.71 sys 0.05 user
user 46: 300.019 sec 24830.57 KB/s cpu: 5.54 sys 0.07 user
user 47: 300.052 sec 24587.20 KB/s cpu: 5.57 sys 0.05 user
user 48: 300.020 sec 24750.25 KB/s cpu: 5.54 sys 0.06 user
user 49: 300.016 sec 24675.38 KB/s cpu: 5.53 sys 0.04 user
user 50: 300.020 sec 24704.09 KB/s cpu: 5.52 sys 0.06 user
user 51: 300.035 sec 24716.59 KB/s cpu: 5.37 sys 0.06 user
user 52: 300.049 sec 24700.04 KB/s cpu: 5.54 sys 0.05 user
user 53: 300.022 sec 24818.32 KB/s cpu: 5.40 sys 0.06 user
user 54: 300.014 sec 24725.01 KB/s cpu: 5.50 sys 0.06 user
user 55: 300.026 sec 24683.17 KB/s cpu: 5.57 sys 0.05 user
user 56: 300.058 sec 24786.37 KB/s cpu: 5.65 sys 0.04 user
user 57: 300.022 sec 24850.79 KB/s cpu: 5.61 sys 0.06 user
user 58: 300.021 sec 24702.35 KB/s cpu: 5.38 sys 0.06 user
user 59: 300.015 sec 24735.30 KB/s cpu: 5.59 sys 0.06 user
user 60: 300.027 sec 24840.10 KB/s cpu: 5.58 sys 0.05 user
user 61: 300.021 sec 24687.03 KB/s cpu: 5.58 sys 0.06 user
user 62: 300.021 sec 24799.55 KB/s cpu: 5.61 sys 0.06 user
user 63: 300.011 sec 24744.08 KB/s cpu: 5.42 sys 0.05 user
user 64: 300.015 sec 24655.08 KB/s cpu: 5.54 sys 0.06 user
total:   300.061 sec 1582992.52 KB/s cpu: 354.62 sys 3.65 user
Conclusion to TEST2: <read_nstream set to 1>
o Using read_nstream=1 produces a perfect balance in throughput per process (~24700 KB/sec), so
now all the processes have the same consistent throughput during the test:
 The maximum total throughput from one node is still being achieved (1582992.52 KB/s),
approx. 1.5 GB/sec.
 The total throughput is now divided evenly across all 64 processes and remains consistent
throughout the test.
 The average read I/O size is still 512KB (avgrq-sz = 1024.00 sectors, i.e. 512KB), because
read_pref_io is set to 512KB.
 The I/O is evenly balanced across all 24 LUNs (see r/s and rsec/s in the iostat
output below).
 Most importantly, the I/O throughput is now evenly balanced across all 64 processes, yet the
total throughput remains the same.
o The maximum readahead per process is now 512KB.
o The throughput per process is therefore balanced; all 64 processes are now consistently
performing approx. 24700 KB/s – perfect!!
o Please note:
 If the throughput per process had not been evenly distributed using read_nstream=1, then
we would recommend reducing the stripe-width to 256KB or 128KB (see the example below).
 Reducing the stripe-width will reduce the default value of "read_pref_io".
 We do not advise tuning read_pref_io to override its default value; we recommend tuning
the VxVM volume stripe-width instead.
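For example (an illustrative sketch only; the 4t volume size here is arbitrary and not the size used in
our tests), a striped volume with a smaller stripe-width could be created with vxassist as follows,
which in turn lowers the auto-tuned read_pref_io:
# vxassist -g testdg make vol1 4t layout=stripe ncol=24 stripeunit=256k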
# iostat -x 20
Device:      rrqm/s wrqm/s     r/s   w/s      rsec/s wsec/s avgrq-sz avgqu-sz  await  svctm  %util
sde            0.00   0.00   63.90  0.00    65433.60   0.00  1024.00     2.88  45.11  15.37  98.20
sdf            0.00   0.00   64.35  0.00    65894.40   0.00  1024.00     2.88  44.76  15.08  97.06
sdg            0.00   0.00   64.75  0.00    66304.00   0.00  1024.00     2.91  44.88  15.17  98.19
sdh            0.00   0.00   64.95  0.00    66508.80   0.00  1024.00     2.90  44.61  15.01  97.48
sdi            0.00   0.00   64.80  0.00    66355.20   0.00  1024.00     2.64  40.74  14.97  97.00
sdj            0.00   0.00   65.40  0.00    66969.60   0.00  1024.00     2.70  41.32  14.85  97.11
sdk            0.00   0.00   65.05  0.00    66611.20   0.00  1024.00     2.73  42.02  14.87  96.71
sdl            0.00   0.00   63.30  0.00    64819.20   0.00  1024.00     2.61  41.25  15.39  97.39
sdm            0.00   0.00   64.50  0.00    66048.00   0.00  1024.00     2.77  42.91  15.14  97.67
sdn            0.00   0.00   64.85  0.00    66406.40   0.00  1024.00     2.79  43.06  14.89  96.58
sdo            0.00   0.00   63.10  0.00    64614.40   0.00  1024.00     2.66  42.17  15.23  96.12
sdp            0.00   0.00   65.45  0.00    67020.80   0.00  1024.00     2.80  42.74  14.97  97.97
sdq            0.00   0.00   64.00  0.00    65536.00   0.00  1024.00     2.66  41.63  15.10  96.61
sdr            0.00   0.00   64.55  0.00    66099.20   0.00  1024.00     2.72  42.14  15.14  97.70
sds            0.00   0.00   64.20  0.00    65740.80   0.00  1024.00     2.65  41.27  15.09  96.91
sdt            0.00   0.00   64.85  0.00    66406.40   0.00  1024.00     2.75  42.45  14.94  96.85
sdu            0.00   0.00   64.65  0.00    66201.60   0.00  1024.00     2.66  41.25  15.05  97.30
sdv            0.00   0.00   63.85  0.00    65382.40   0.00  1024.00     2.64  41.33  15.18  96.90
sdw            0.00   0.00   64.25  0.00    65792.00   0.00  1024.00     2.63  41.00  15.12  97.17
sdx            0.00   0.00   64.95  0.00    66508.80   0.00  1024.00     2.68  41.32  14.87  96.55
sdy            0.00   0.00   64.00  0.00    65536.00   0.00  1024.00     2.71  42.18  15.04  96.26
sdz            0.00   0.00   63.85  0.00    65382.40   0.00  1024.00     2.70  42.16  15.21  97.14
sdaa           0.00   0.00   64.80  0.00    66355.20   0.00  1024.00     2.68  41.35  15.08  97.71
sdab           0.00   0.00   65.05  0.00    66611.20   0.00  1024.00     2.70  41.53  15.03  97.80
sdac           0.00   0.00   64.15  0.00    65689.60   0.00  1024.00     2.57  40.17  15.02  96.34
sdad           0.00   0.00   63.50  0.00    65024.00   0.00  1024.00     2.56  40.34  15.23  96.69
sdae           0.00   0.00   64.00  0.00    65536.00   0.00  1024.00     2.57  40.21  15.08  96.51
sdaf           0.00   0.00   65.65  0.00    67225.60   0.00  1024.00     2.65  40.35  14.77  96.97
sdag           0.00   0.00   65.25  0.00    66816.00   0.00  1024.00     2.95  45.29  15.03  98.04
sdah           0.00   0.00   64.50  0.00    66048.00   0.00  1024.00     2.91  45.07  15.16  97.75
sdai           0.00   0.00   64.25  0.00    65792.00   0.00  1024.00     2.88  44.77  15.22  97.79
sdak           0.00   0.00   64.85  0.00    66406.40   0.00  1024.00     2.64  40.69  14.89  96.56
sdal           0.00   0.00   64.50  0.00    66048.00   0.00  1024.00     2.66  41.21  15.16  97.80
sdaj           0.00   0.00   64.05  0.00    65587.20   0.00  1024.00     2.90  45.20  15.21  97.45
sdam           0.00   0.00   64.75  0.00    66304.00   0.00  1024.00     2.64  40.75  15.04  97.39
sdan           0.00   0.00   64.05  0.00    65587.20   0.00  1024.00     2.63  41.15  15.09  96.68
sdaq           0.00   0.00   64.15  0.00    65689.60   0.00  1024.00     2.74  42.72  15.23  97.68
sdao           0.00   0.00   64.75  0.00    66304.00   0.00  1024.00     2.74  42.36  15.02  97.26
sdap           0.00   0.00   64.95  0.00    66508.80   0.00  1024.00     2.81  43.19  14.91  96.87
sdar           0.00   0.00   63.85  0.00    65382.40   0.00  1024.00     2.74  42.83  15.30  97.67
sdas           0.00   0.00   64.20  0.00    65740.80   0.00  1024.00     2.67  41.61  15.19  97.53
sdat           0.00   0.00   65.05  0.00    66611.20   0.00  1024.00     2.69  41.29  14.99  97.53
sdau           0.00   0.00   64.85  0.00    66406.40   0.00  1024.00     2.66  41.04  14.87  96.41
sdav           0.00   0.00   64.00  0.00    65536.00   0.00  1024.00     2.65  41.37  15.03  96.17
sdaw           0.00   0.00   64.55  0.00    66099.20   0.00  1024.00     2.82  43.67  15.25  98.45
sdax           0.00   0.00   64.05  0.00    65587.20   0.00  1024.00     2.82  44.11  15.14  96.96
sday           0.00   0.00   65.85  0.00    67430.40   0.00  1024.00     2.88  43.75  14.85  97.81
sdaz           0.00   0.00   63.65  0.00    65177.60   0.00  1024.00     2.80  43.95  15.34  97.66
VxVM56000      0.00   0.00 3094.90  0.00  3169177.60   0.00  1024.00   131.05  42.35   0.32 100.00
TEST3: change read_nstream to 1, read from 16 files using 16 processes, keep everything else the same
as the baseline test.
vxbench – 16files/16processes/32KB block size
Tuning – read_ahead enabled/read_nstream=1/read_pref_io=524288
# vxtunefs /data1 -o read_nstream=1
UX:vxfs vxtunefs: INFO: V-3-22525: Parameters successfully set for /data1
# vxtunefs /data1
Filesystem I/O parameters for /data1
read_pref_io = 524288
read_nstream = 1
read_ahead = 1
# mount -t vxfs -o remount,largefiles,cluster /dev/vx/dsk/testdg/vol1 /data1
# ./vxbench -w read -i iosize=32k,iotime=120,maxfilesize=16G /data1/primary/file1 /data1/primary/file2
/data1/primary/file3 /data1/primary/file4 /data1/primary/file5 /data1/primary/file6
/data1/primary/file7 /data1/primary/file8 /data1/primary/file9 /data1/primary/file10
/data1/primary/file11 /data1/primary/file12 /data1/primary/file13 /data1/primary/file14
/data1/primary/file15 /data1/primary/file16
user  1: 120.030 sec 97417.75 KB/s cpu: 7.43 sys 0.07 user
user  2: 120.037 sec 98452.82 KB/s cpu: 7.68 sys 0.09 user
user  3: 120.033 sec 98302.87 KB/s cpu: 7.55 sys 0.08 user
user  4: 120.031 sec 98227.29 KB/s cpu: 7.37 sys 0.08 user
user  5: 120.030 sec 98381.88 KB/s cpu: 7.89 sys 0.05 user
user  6: 120.033 sec 98272.61 KB/s cpu: 7.42 sys 0.07 user
user  7: 120.032 sec 97744.74 KB/s cpu: 7.70 sys 0.08 user
user  8: 120.037 sec 98069.12 KB/s cpu: 7.74 sys 0.10 user
user  9: 120.030 sec 98603.74 KB/s cpu: 7.79 sys 0.06 user
user 10: 120.036 sec 98756.87 KB/s cpu: 7.82 sys 0.07 user
user 11: 120.037 sec 98513.11 KB/s cpu: 7.78 sys 0.10 user
user 12: 120.040 sec 98360.81 KB/s cpu: 7.80 sys 0.08 user
user 13: 120.030 sec 98488.47 KB/s cpu: 7.48 sys 0.09 user
user 14: 120.030 sec 98241.64 KB/s cpu: 7.50 sys 0.09 user
user 15: 120.039 sec 97824.57 KB/s cpu: 7.76 sys 0.09 user
user 16: 120.032 sec 98700.71 KB/s cpu: 7.42 sys 0.09 user
total:   120.041 sec 1572267.32 KB/s cpu: 122.13 sys 1.29 user
Conclusion to TEST3: <read_nstream set to 1, reading from 16 files using 16 processes>
o Using read_nstream=1 produces a perfect balance in throughput per process (~98000 KB/sec), so
all processes still have an equal share of the throughput:
 The maximum total throughput from one node is still being achieved (1572267.32 KB/s)
with 16 processes; this is approx. 1.5 GB/sec.
 The total throughput is now divided evenly across all 16 processes, so the throughput
per-process is higher using fewer processes.
 Most importantly, the I/O throughput is evenly balanced across all 16 processes, yet the
total throughput remains the same.
o The maximum readahead per process is still 512KB; this amount of readahead provides perfectly
balanced throughput per process in our test.
o The throughput per process is therefore balanced; all 16 processes are now performing approx.
98000 KB/s – perfect!!
o Please note:
 The throughput per process is now much higher using 16 processes rather than 64
processes.
 The number of processes was reduced by a factor of 4 in TEST3, so the throughput per
process increased by a factor of 4, while the total throughput is unchanged.
 It is therefore very important to consider the number of running processes that will be
reading from disk at the same time, as the available throughput will be evenly distributed
between these processes.
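As a quick sanity check using the totals above: 1582992.52 KB/sec / 64 processes = approx.
24734 KB/sec per process in TEST2, while 1572267.32 KB/sec / 16 processes = approx. 98267 KB/sec
per process in TEST3. The same approx. 1.5 GBytes/sec single-node bottleneck is simply divided
between however many processes are actively reading.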
TEST4: disable readahead, keep everything else the same as the baseline test.
vxbench – 64files/64procs/32KB block size
Tuning – read_ahead disabled/read_nstream=24/read_pref_io=524288
# vxtunefs /data1 -o read_nstream=24,read_ahead=0
UX:vxfs vxtunefs: INFO: V-3-22525: Parameters successfully set for /data1
# vxtunefs /data1
Filesystem I/O parameters for /data1
read_pref_io = 524288
read_nstream = 24
read_ahead = 0
# mount -t vxfs -o remount,largefiles,cluster /dev/vx/dsk/testdg/vol1 /data1
# ./vxbench -w read -i iosize=32k,iotime=300,maxfilesize=16G /data1/primary/file1 /data1/primary/file2
/data1/primary/file3 /data1/primary/file4 /data1/primary/file5 /data1/primary/file6
/data1/primary/file7 /data1/primary/file8 /data1/primary/file9 /data1/primary/file10
/data1/primary/file11 /data1/primary/file12 /data1/primary/file13 /data1/primary/file14
/data1/primary/file15 /data1/primary/file16 /data1/primary/file17 /data1/primary/file18
/data1/primary/file19 /data1/primary/file20 /data1/primary/file21 /data1/primary/file22
/data1/primary/file23 /data1/primary/file24 /data1/primary/file25 /data1/primary/file26
/data1/primary/file27 /data1/primary/file28 /data1/primary/file29 /data1/primary/file30
/data1/primary/file31 /data1/primary/file32 /data1/primary/file33 /data1/primary/file34
/data1/primary/file35 /data1/primary/file36 /data1/primary/file37 /data1/primary/file38
/data1/primary/file39 /data1/primary/file40 /data1/primary/file41 /data1/primary/file42
/data1/primary/file43 /data1/primary/file44 /data1/primary/file45 /data1/primary/file46
/data1/primary/file47 /data1/primary/file48 /data1/primary/file49 /data1/primary/file50
/data1/primary/file51 /data1/primary/file52 /data1/primary/file53 /data1/primary/file54
/data1/primary/file55 /data1/primary/file56 /data1/primary/file57 /data1/primary/file58
/data1/primary/file59 /data1/primary/file60 /data1/primary/file61 /data1/primary/file62
/data1/primary/file63 /data1/primary/file64
user  1: 300.011 sec 12246.06 KB/s cpu: 7.68 sys 0.07 user
user  2: 300.009 sec 11192.53 KB/s cpu: 6.96 sys 0.07 user
user  3: 300.010 sec 11619.25 KB/s cpu: 7.35 sys 0.06 user
user  4: 300.014 sec 11551.35 KB/s cpu: 7.30 sys 0.07 user
user  5: 300.015 sec 11563.46 KB/s cpu: 7.19 sys 0.08 user
user  6: 300.007 sec 12257.53 KB/s cpu: 7.65 sys 0.10 user
user  7: 300.008 sec 11638.53 KB/s cpu: 7.34 sys 0.09 user
user  8: 300.007 sec 11449.44 KB/s cpu: 7.26 sys 0.09 user
user  9: 300.014 sec 12062.17 KB/s cpu: 7.50 sys 0.08 user
user 10: 300.008 sec 11544.21 KB/s cpu: 7.18 sys 0.08 user
user 11: 300.012 sec 11442.10 KB/s cpu: 7.22 sys 0.10 user
user 12: 300.012 sec 11666.33 KB/s cpu: 7.34 sys 0.07 user
user 13: 300.007 sec 11740.63 KB/s cpu: 7.38 sys 0.07 user
user 14: 300.015 sec 11528.29 KB/s cpu: 7.32 sys 0.07 user
user 15: 300.009 sec 11616.83 KB/s cpu: 7.31 sys 0.08 user
user 16: 300.008 sec 12253.34 KB/s cpu: 7.54 sys 0.07 user
user 17: 300.013 sec 11727.19 KB/s cpu: 7.36 sys 0.07 user
user 18: 300.009 sec 11700.54 KB/s cpu: 7.36 sys 0.07 user
user 19: 300.008 sec 12245.63 KB/s cpu: 7.70 sys 0.09 user
user 20: 300.007 sec 11757.38 KB/s cpu: 7.42 sys 0.08 user
user 21: 300.007 sec 11242.93 KB/s cpu: 7.10 sys 0.06 user
user 22: 300.012 sec 11589.92 KB/s cpu: 7.23 sys 0.08 user
user 23: 300.008 sec 12262.93 KB/s cpu: 7.56 sys 0.09 user
user 24: 300.007 sec 11756.85 KB/s cpu: 7.41 sys 0.08 user
user 25: 300.014 sec 12086.92 KB/s cpu: 7.48 sys 0.08 user
user 26: 300.011 sec 12001.58 KB/s cpu: 7.54 sys 0.07 user
user 27: 300.012 sec 12096.78 KB/s cpu: 7.60 sys 0.10 user
user 28: 300.017 sec 11550.08 KB/s cpu: 7.27 sys 0.08 user
user 29: 300.011 sec 11734.28 KB/s cpu: 7.24 sys 0.09 user
user 30: 300.011 sec 11962.11 KB/s cpu: 7.51 sys 0.08 user
user 31: 300.014 sec 12128.16 KB/s cpu: 7.50 sys 0.08 user
user 32: 300.011 sec 11725.32 KB/s cpu: 7.38 sys 0.10 user
user 33: 300.009 sec 11371.62 KB/s cpu: 7.18 sys 0.06 user
user 34: 300.009 sec 12041.25 KB/s cpu: 7.62 sys 0.07 user
user 35: 300.008 sec 11980.36 KB/s cpu: 7.48 sys 0.08 user
user 36: 300.015 sec 11908.75 KB/s cpu: 7.51 sys 0.07 user
user 37: 300.010 sec 11432.46 KB/s cpu: 7.12 sys 0.08 user
user 38: 300.014 sec 11796.37 KB/s cpu: 7.48 sys 0.06 user
user 39: 300.008 sec 11824.77 KB/s cpu: 7.43 sys 0.08 user
user 40: 300.014 sec 12077.29 KB/s cpu: 7.57 sys 0.07 user
user 41: 300.012 sec 11564.45 KB/s cpu: 7.29 sys 0.08 user
user 42: 300.015 sec 11583.94 KB/s cpu: 7.28 sys 0.05 user
user 43: 300.015 sec 11874.83 KB/s cpu: 7.45 sys 0.08 user
user 44: 300.010 sec 12142.53 KB/s cpu: 7.54 sys 0.08 user
user 45: 300.015 sec 11335.74 KB/s cpu: 7.05 sys 0.09 user
user 46: 300.011 sec 11915.63 KB/s cpu: 7.43 sys 0.08 user
user 47: 300.014 sec 12259.67 KB/s cpu: 7.56 sys 0.10 user
user 48: 300.010 sec 11405.71 KB/s cpu: 7.12 sys 0.08 user
user 49: 300.010 sec 11862.76 KB/s cpu: 7.34 sys 0.07 user
user 50: 300.014 sec 11556.89 KB/s cpu: 7.28 sys 0.07 user
user 51: 300.010 sec 12149.05 KB/s cpu: 7.49 sys 0.08 user
user 52: 300.010 sec 11384.38 KB/s cpu: 7.11 sys 0.10 user
user 53: 300.008 sec 11414.31 KB/s cpu: 7.09 sys 0.07 user
user 54: 300.016 sec 11336.45 KB/s cpu: 7.09 sys 0.07 user
user 55: 300.017 sec 12173.06 KB/s cpu: 7.57 sys 0.08 user
user 56: 300.011 sec 11808.63 KB/s cpu: 7.33 sys 0.06 user
user 57: 300.011 sec 12277.61 KB/s cpu: 7.55 sys 0.08 user
user 58: 300.008 sec 11529.39 KB/s cpu: 7.14 sys 0.07 user
user 59: 300.010 sec 12021.34 KB/s cpu: 7.44 sys 0.07 user
user 60: 300.011 sec 11499.74 KB/s cpu: 7.18 sys 0.07 user
user 61: 300.010 sec 12001.73 KB/s cpu: 7.55 sys 0.06 user
user 62: 300.011 sec 11978.65 KB/s cpu: 7.52 sys 0.08 user
user 63: 300.008 sec 11540.61 KB/s cpu: 7.20 sys 0.09 user
user 64: 300.009 sec 11221.21 KB/s cpu: 7.06 sys 0.07 user
total:   300.017 sec 753196.79 KB/s cpu: 471.23 sys 4.95 user
iostat
Device:      rrqm/s wrqm/s      r/s   w/s      rsec/s wsec/s avgrq-sz avgqu-sz  await  svctm  %util
sde            0.00   0.00   504.35  0.00    32278.40   0.00    64.00     0.51   1.02   0.79  39.97
sdf            0.00   0.00   501.55  0.00    32099.20   0.00    64.00     0.44   0.88   0.70  35.15
sdg            0.00   0.00   507.70  0.00    32492.80   0.00    64.00     0.52   1.02   0.79  40.20
sdh            0.00   0.00   496.70  0.00    31788.80   0.00    64.00     0.55   1.10   0.81  40.10
sdi            0.00   0.00   502.40  0.00    32153.60   0.00    64.00     0.47   0.95   0.76  38.42
sdj            0.00   0.00   499.70  0.00    31980.80   0.00    64.00     0.62   1.24   0.91  45.48
sdk            0.00   0.00   502.80  0.00    32179.20   0.00    64.00     0.46   0.91   0.72  36.22
sdl            0.00   0.00   503.90  0.00    32249.60   0.00    64.00     0.47   0.93   0.76  38.05
sdm            0.00   0.00   501.25  0.00    32080.00   0.00    64.00     0.49   0.99   0.78  39.16
sdn            0.00   0.00   504.10  0.00    32262.40   0.00    64.00     0.91   1.80   1.11  55.75
sdo            0.00   0.00   497.20  0.00    31820.80   0.00    64.00     3.51   7.07   1.96  97.30
sdp            0.00   0.00   496.50  0.00    31776.00   0.00    64.00     0.44   0.90   0.73  36.06
sdq            0.00   0.00   505.40  0.00    32345.60   0.00    64.00     0.67   1.32   0.93  47.04
sdr            0.00   0.00   503.40  0.00    32217.60   0.00    64.00     0.60   1.19   0.85  42.86
sds            0.00   0.00   502.65  0.00    32169.60   0.00    64.00     3.46   6.88   1.92  96.49
sdt            0.00   0.00   501.35  0.00    32086.40   0.00    64.00     0.46   0.92   0.75  37.40
sdu            0.00   0.00   511.30  0.00    32723.20   0.00    64.00     0.60   1.18   0.85  43.55
sdv            0.00   0.00   502.70  0.00    32172.80   0.00    64.00     0.76   1.52   1.07  53.56
sdw            0.00   0.00   502.45  0.00    32156.80   0.00    64.00     3.73   7.42   1.96  98.52
sdx            0.00   0.00   503.10  0.00    32198.40   0.00    64.00     3.70   7.36   1.92  96.67
sdy            0.00   0.00   501.15  0.00    32073.60   0.00    64.00     0.47   0.93   0.73  36.65
sdz            0.00   0.00   506.15  0.00    32393.60   0.00    64.00     3.47   6.85   1.92  97.28
sdaa           0.00   0.00   505.45  0.00    32348.80   0.00    64.00     3.53   6.99   1.95  98.45
sdab           0.00   0.00   507.10  0.00    32454.40   0.00    64.00     0.51   1.00   0.78  39.43
sdac           0.00   0.00   504.25  0.00    32272.00   0.00    64.00     0.46   0.92   0.74  37.47
sdad           0.00   0.00   506.30  0.00    32403.20   0.00    64.00     0.61   1.21   0.89  45.24
sdae           0.00   0.00   500.80  0.00    32051.20   0.00    64.00     0.48   0.97   0.75  37.80
sdaf           0.00   0.00   501.70  0.00    32108.80   0.00    64.00     0.51   1.01   0.81  40.70
sdag           0.00   0.00   497.90  0.00    31865.60   0.00    64.00     0.49   0.98   0.76  37.72
sdah           0.00   0.00   499.20  0.00    31948.80   0.00    64.00     0.47   0.95   0.75  37.24
sdai           0.00   0.00   493.50  0.00    31584.00   0.00    64.00     0.54   1.09   0.83  40.93
sdak           0.00   0.00   505.10  0.00    32326.40   0.00    64.00     0.66   1.30   0.93  47.05
sdal           0.00   0.00   504.60  0.00    32294.40   0.00    64.00     0.61   1.21   0.86  43.54
sdaj           0.00   0.00   504.60  0.00    32294.40   0.00    64.00     0.53   1.06   0.79  40.05
sdam           0.00   0.00   505.85  0.00    32374.40   0.00    64.00     3.46   6.85   1.91  96.67
sdan           0.00   0.00   506.60  0.00    32422.40   0.00    64.00     0.47   0.92   0.73  36.80
sdaq           0.00   0.00   497.65  0.00    31849.60   0.00    64.00     3.54   7.11   1.97  97.96
sdao           0.00   0.00   500.25  0.00    32016.00   0.00    64.00     0.43   0.87   0.70  34.90
sdap           0.00   0.00   497.00  0.00    31808.00   0.00    64.00     3.43   6.91   1.96  97.41
sdar           0.00   0.00   494.80  0.00    31667.20   0.00    64.00     0.52   1.05   0.81  40.03
sdas           0.00   0.00   497.45  0.00    31836.80   0.00    64.00     0.61   1.23   0.90  44.75
sdat           0.00   0.00   507.25  0.00    32464.00   0.00    64.00     0.75   1.49   1.03  52.22
sdau           0.00   0.00   503.20  0.00    32204.80   0.00    64.00     3.74   7.44   1.96  98.61
sdav           0.00   0.00   506.55  0.00    32419.20   0.00    64.00     3.69   7.28   1.92  97.06
sdaw           0.00   0.00   498.65  0.00    31913.60   0.00    64.00     0.46   0.93   0.75  37.21
sdax           0.00   0.00   497.60  0.00    31846.40   0.00    64.00     0.90   1.80   1.14  56.59
sday           0.00   0.00   503.15  0.00    32201.60   0.00    64.00     3.53   7.02   1.94  97.52
sdaz           0.00   0.00   504.45  0.00    32284.80   0.00    64.00     0.44   0.88   0.72  36.44
VxVM56000      0.00   0.00 24109.05  0.00  1542979.20   0.00    64.00    62.84   2.61   0.04 100.00
Conclusion to TEST4: <read_ahead disabled>
o The maximum read I/O throughput from one node is NOT being achieved; we only reach approx.
0.72 GBytes/sec.
o The throughput for all 64 processes is balanced but is now much lower per process; each process is
only performing approx. 12000 KB/s.
o By disabling readahead the total throughput has halved.
o All the read I/O is synchronous read I/O using a 32KB I/O request size.
 The iostat above shows 64 sectors (64 x 512 bytes = 32KB) as the average I/O size for all
LUN paths – avgrq-sz 64.00.
 Because readahead is disabled we are no longer submitting read_pref_io sized requests.
 Instead we are submitting a 32KB read request size, because this is the I/O size that vxbench
is using.
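As a consistency check: 753196.79 KB/sec / 64 processes = approx. 11769 KB/sec per process, which
matches the per-process figures above. With readahead disabled each process has only one
synchronous 32KB read outstanding at a time, so per-process throughput is bounded by the
round-trip time of each individual 32KB I/O rather than by the bandwidth of the LUNs.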
TEST5: change read_nstream to 6, keep everything else the same as the baseline test.
vxbench – 64files/64procs/32KB block size
Tuning – read_ahead enabled/read_nstream=6/read_pref_io=524288
# vxtunefs /data1 -o read_nstream=6
UX:vxfs vxtunefs: INFO: V-3-22525: Parameters successfully set for /data1
# vxtunefs /data1
Filesystem I/O parameters for /data1
read_pref_io = 524288
read_nstream = 6
read_ahead = 1
# mount -t vxfs -o remount,largefiles,cluster /dev/vx/dsk/testdg/vol1 /data1
# ./vxbench -w read -i iosize=32k,iotime=300,maxfilesize=16G /data1/primary/file1 /data1/primary/file2
/data1/primary/file3 /data1/primary/file4 /data1/primary/file5 /data1/primary/file6
/data1/primary/file7 /data1/primary/file8 /data1/primary/file9 /data1/primary/file10
/data1/primary/file11 /data1/primary/file12 /data1/primary/file13 /data1/primary/file14
/data1/primary/file15 /data1/primary/file16 /data1/primary/file17 /data1/primary/file18
/data1/primary/file19 /data1/primary/file20 /data1/primary/file21 /data1/primary/file22
/data1/primary/file23 /data1/primary/file24 /data1/primary/file25 /data1/primary/file26
/data1/primary/file27 /data1/primary/file28 /data1/primary/file29 /data1/primary/file30
/data1/primary/file31 /data1/primary/file32 /data1/primary/file33 /data1/primary/file34
/data1/primary/file35 /data1/primary/file36 /data1/primary/file37 /data1/primary/file38
/data1/primary/file39 /data1/primary/file40 /data1/primary/file41 /data1/primary/file42
/data1/primary/file43 /data1/primary/file44 /data1/primary/file45 /data1/primary/file46
/data1/primary/file47 /data1/primary/file48 /data1/primary/file49 /data1/primary/file50
/data1/primary/file51 /data1/primary/file52 /data1/primary/file53 /data1/primary/file54
/data1/primary/file55 /data1/primary/file56 /data1/primary/file57 /data1/primary/file58
/data1/primary/file59 /data1/primary/file60 /data1/primary/file61 /data1/primary/file62
/data1/primary/file63 /data1/primary/file64
user  1: 300.008 sec 26677.91 KB/s cpu: 5.16 sys 0.05 user
user  2: 300.107 sec 26689.61 KB/s cpu: 5.25 sys 0.04 user
user  3: 300.116 sec 26596.61 KB/s cpu: 4.97 sys 0.04 user
user  4: 300.031 sec 26716.80 KB/s cpu: 4.98 sys 0.05 user
user  5: 300.089 sec 26680.92 KB/s cpu: 5.19 sys 0.05 user
user  6: 300.072 sec 26631.30 KB/s cpu: 5.01 sys 0.04 user
user  7: 300.099 sec 26843.86 KB/s cpu: 5.21 sys 0.04 user
user  8: 300.091 sec 26762.65 KB/s cpu: 5.20 sys 0.04 user
user  9: 300.074 sec 26784.68 KB/s cpu: 5.17 sys 0.04 user
user 10: 300.076 sec 26774.26 KB/s cpu: 5.07 sys 0.05 user
user 11: 300.062 sec 26785.71 KB/s cpu: 4.97 sys 0.04 user
user 12: 300.027 sec 14609.45 KB/s cpu: 2.90 sys 0.02 user
user 13: 300.035 sec 26675.62 KB/s cpu: 5.21 sys 0.05 user
user 14: 300.101 sec  9641.12 KB/s cpu: 1.95 sys 0.01 user
user 15: 300.066 sec 26897.99 KB/s cpu: 4.93 sys 0.04 user
user 16: 300.027 sec 26645.46 KB/s cpu: 5.09 sys 0.04 user
user 17: 300.016 sec 26677.21 KB/s cpu: 5.19 sys 0.04 user
user 18: 300.020 sec 26636.02 KB/s cpu: 5.25 sys 0.05 user
user 19: 300.012 sec 26728.77 KB/s cpu: 4.98 sys 0.05 user
user 20: 300.081 sec 18732.43 KB/s cpu: 3.46 sys 0.04 user
user 21: 300.008 sec 26729.13 KB/s cpu: 5.22 sys 0.04 user
user 22: 300.087 sec 26701.62 KB/s cpu: 5.16 sys 0.04 user
user 23: 300.083 sec 14616.98 KB/s cpu: 2.86 sys 0.01 user
user 24: 300.085 sec 26926.99 KB/s cpu: 5.02 sys 0.03 user
user 25: 300.031 sec 26542.74 KB/s cpu: 5.16 sys 0.05 user
user 26: 300.101 sec 26608.19 KB/s cpu: 5.02 sys 0.06 user
user 27: 300.112 sec 26760.74 KB/s cpu: 5.28 sys 0.03 user
user 28: 300.050 sec 26674.13 KB/s cpu: 5.20 sys 0.04 user
user 29: 300.058 sec 19430.05 KB/s cpu: 3.79 sys 0.03 user
user 30: 300.062 sec 26703.79 KB/s cpu: 5.24 sys 0.04 user
user 31: 300.079 sec 26692.03 KB/s cpu: 5.18 sys 0.05 user
user 32: 300.078 sec 19572.11 KB/s cpu: 3.75 sys 0.03 user
user 33: 300.014 sec 26872.00 KB/s cpu: 5.25 sys 0.05 user
user 34: 300.035 sec 26593.60 KB/s cpu: 4.86 sys 0.04 user
user 35: 300.011 sec 26554.73 KB/s cpu: 5.17 sys 0.04 user
user 36: 300.065 sec 26713.74 KB/s cpu: 5.23 sys 0.05 user
user 37: 300.011 sec 26687.96 KB/s cpu: 5.18 sys 0.04 user
user 38: 300.034 sec 26696.03 KB/s cpu: 5.30 sys 0.04 user
user 39: 300.046 sec 18888.19 KB/s cpu: 3.62 sys 0.03 user
user 40: 300.019 sec 26656.42 KB/s cpu: 5.18 sys 0.04 user
user 41: 300.039 sec 26685.39 KB/s cpu: 5.08 sys 0.05 user
user 42: 300.041 sec 14332.34 KB/s cpu: 2.85 sys 0.02 user
user 43: 300.112 sec 26863.12 KB/s cpu: 5.27 sys 0.04 user
user 44: 300.008 sec 26667.66 KB/s cpu: 5.07 sys 0.05 user
user 45: 300.060 sec 26949.71 KB/s cpu: 5.05 sys 0.04 user
user 46: 300.021 sec 26635.77 KB/s cpu: 5.07 sys 0.05 user
user 47: 300.052 sec 26817.32 KB/s cpu: 5.26 sys 0.03 user
user 48: 300.110 sec 26760.94 KB/s cpu: 5.19 sys 0.04 user
user 49: 300.096 sec  7747.49 KB/s cpu: 1.56 sys 0.00 user
user 50: 300.116 sec 14676.80 KB/s cpu: 2.93 sys 0.03 user
user 51: 300.026 sec 26737.82 KB/s cpu: 5.13 sys 0.06 user
user 52: 300.027 sec 26737.80 KB/s cpu: 5.05 sys 0.05 user
user 53: 300.044 sec 26777.10 KB/s cpu: 4.96 sys 0.04 user
user 54: 300.017 sec 26769.30 KB/s cpu: 5.13 sys 0.04 user
user 55: 300.024 sec 26799.31 KB/s cpu: 5.33 sys 0.05 user
user 56: 300.102 sec 26720.72 KB/s cpu: 5.17 sys 0.05 user
user 57: 300.043 sec 26807.85 KB/s cpu: 5.04 sys 0.03 user
user 58: 300.055 sec 26868.24 KB/s cpu: 4.91 sys 0.04 user
user 59: 300.047 sec 26879.17 KB/s cpu: 5.24 sys 0.05 user
user 60: 300.070 sec 26907.83 KB/s cpu: 5.32 sys 0.04 user
user 61: 300.055 sec 26786.37 KB/s cpu: 5.02 sys 0.05 user
user 62: 300.097 sec 16684.09 KB/s cpu: 3.18 sys 0.03 user
user 63: 300.093 sec 14063.68 KB/s cpu: 2.79 sys 0.01 user
user 64: 300.024 sec 26635.57 KB/s cpu: 4.90 sys 0.04 user
total:   300.117 sec 1572798.88 KB/s cpu: 302.31 sys 2.53 user
Conclusion to TEST5: <read_nstream set to 6>
o The maximum read I/O throughput from one node is being achieved, approx. 1.5 GBytes/sec.
o The throughput per process is still imbalanced.
o The maximum amount of readahead per process is 3MB (512KB x 6); this is too aggressive (i.e. too
much readahead is causing a throughput imbalance between the reading processes).
TEST6: change read_nstream to 12, keep everything else the same as the baseline test.
vxbench – 64files/64procs/32KB block size
Tuning – read_ahead enabled/read_nstream=12/read_pref_io=524288
# vxtunefs /data1 -o read_nstream=12
UX:vxfs vxtunefs: INFO: V-3-22525: Parameters successfully set for /data1
# vxtunefs /data1
Filesystem I/O parameters for /data1
read_pref_io = 524288
read_nstream = 12
read_ahead = 1
# mount -t vxfs -o remount,largefiles,cluster /dev/vx/dsk/testdg/vol1 /data1
# ./vxbench -w read -i iosize=32k,iotime=300,maxfilesize=16G /data1/primary/file1 /data1/primary/file2
/data1/primary/file3 /data1/primary/file4 /data1/primary/file5 /data1/primary/file6
/data1/primary/file7 /data1/primary/file8 /data1/primary/file9 /data1/primary/file10
/data1/primary/file11 /data1/primary/file12 /data1/primary/file13 /data1/primary/file14
/data1/primary/file15 /data1/primary/file16 /data1/primary/file17 /data1/primary/file18
/data1/primary/file19 /data1/primary/file20 /data1/primary/file21 /data1/primary/file22
/data1/primary/file23 /data1/primary/file24 /data1/primary/file25 /data1/primary/file26
/data1/primary/file27 /data1/primary/file28 /data1/primary/file29 /data1/primary/file30
/data1/primary/file31 /data1/primary/file32 /data1/primary/file33 /data1/primary/file34
/data1/primary/file35 /data1/primary/file36 /data1/primary/file37 /data1/primary/file38
/data1/primary/file39 /data1/primary/file40 /data1/primary/file41 /data1/primary/file42
/data1/primary/file43 /data1/primary/file44 /data1/primary/file45 /data1/primary/file46
/data1/primary/file47 /data1/primary/file48 /data1/primary/file49 /data1/primary/file50
/data1/primary/file51 /data1/primary/file52 /data1/primary/file53 /data1/primary/file54
/data1/primary/file55 /data1/primary/file56 /data1/primary/file57 /data1/primary/file58
/data1/primary/file59 /data1/primary/file60 /data1/primary/file61 /data1/primary/file62
/data1/primary/file63 /data1/primary/file64
user  1: 300.152 sec  4957.28 KB/s cpu: 0.94 sys 0.00 user
user  2: 300.133 sec  4896.28 KB/s cpu: 0.92 sys 0.00 user
user  3: 300.068 sec 32767.44 KB/s cpu: 6.28 sys 0.05 user
user  4: 300.090 sec 32949.22 KB/s cpu: 6.12 sys 0.05 user
user  5: 300.067 sec  5041.22 KB/s cpu: 0.95 sys 0.01 user
user  6: 300.139 sec  4855.57 KB/s cpu: 0.89 sys 0.00 user
user  7: 300.048 sec 32872.02 KB/s cpu: 6.31 sys 0.05 user
user  8: 300.129 sec  4610.50 KB/s cpu: 0.89 sys 0.00 user
user  9: 300.146 sec  4855.02 KB/s cpu: 0.92 sys 0.00 user
user 10: 300.015 sec 32855.14 KB/s cpu: 6.17 sys 0.05 user
user 11: 300.040 sec 32811.44 KB/s cpu: 6.20 sys 0.06 user
user 12: 300.069 sec  4897.65 KB/s cpu: 0.92 sys 0.00 user
user 13: 300.013 sec 32793.85 KB/s cpu: 6.30 sys 0.06 user
user 14: 300.082 sec 32806.79 KB/s cpu: 6.31 sys 0.04 user
user 15: 300.033 sec 32914.55 KB/s cpu: 6.36 sys 0.04 user
user 16: 300.067 sec 32726.51 KB/s cpu: 6.33 sys 0.05 user
user 17: 300.057 sec 32604.74 KB/s cpu: 6.30 sys 0.05 user
user 18: 300.140 sec  4753.19 KB/s cpu: 0.93 sys 0.00 user
user 19: 300.090 sec 32703.57 KB/s cpu: 6.29 sys 0.06 user
user 20: 300.030 sec 32914.89 KB/s cpu: 6.30 sys 0.06 user
user 21: 300.005 sec 32835.71 KB/s cpu: 6.37 sys 0.05 user
user 22: 300.103 sec 32845.42 KB/s cpu: 6.27 sys 0.05 user
user 23: 300.061 sec 32993.42 KB/s cpu: 6.30 sys 0.06 user
user 24: 300.152 sec  4732.43 KB/s cpu: 0.89 sys 0.01 user
user 25: 300.067 sec 32501.34 KB/s cpu: 6.34 sys 0.05 user
user 26: 300.162 sec  4794.20 KB/s cpu: 0.91 sys 0.00 user
user 27: 300.006 sec 32651.33 KB/s cpu: 6.36 sys 0.05 user
user 28: 300.067 sec 32767.47 KB/s cpu: 6.38 sys 0.05 user
user 29: 300.147 sec  4791.47 KB/s cpu: 0.93 sys 0.01 user
user 30: 300.020 sec 32711.16 KB/s cpu: 6.31 sys 0.04 user
user 31: 300.151 sec  5113.69 KB/s cpu: 0.92 sys 0.00 user
user 32: 300.017 sec 14987.11 KB/s cpu: 2.89 sys 0.02 user
user 33: 300.028 sec 32689.88 KB/s cpu: 6.38 sys 0.06 user
user 34: 300.136 sec  4856.04 KB/s cpu: 0.91 sys 0.00 user
user 35: 300.146 sec  4794.78 KB/s cpu: 0.91 sys 0.00 user
user 36: 300.005 sec 32712.86 KB/s cpu: 6.19 sys 0.05 user
user 37: 300.100 sec 32927.68 KB/s cpu: 6.37 sys 0.04 user
user 38: 300.048 sec 32994.80 KB/s cpu: 6.34 sys 0.04 user
user 39: 300.010 sec 32630.41 KB/s cpu: 6.27 sys 0.04 user
user 40: 300.054 sec 32768.91 KB/s cpu: 6.32 sys 0.05 user
user 41: 300.019 sec 33100.39 KB/s cpu: 6.17 sys 0.04 user
user 42: 300.066 sec 32726.68 KB/s cpu: 6.38 sys 0.05 user
user 43: 300.035 sec 33221.54 KB/s cpu: 6.34 sys 0.06 user
user 44: 300.008 sec 32692.06 KB/s cpu: 6.32 sys 0.06 user
user 45: 300.025 sec 33181.65 KB/s cpu: 6.38 sys 0.05 user
user 46: 300.146 sec  4773.57 KB/s cpu: 0.90 sys 0.00 user
user 47: 300.108 sec 33172.52 KB/s cpu: 6.27 sys 0.04 user
user 48: 300.073 sec 32766.86 KB/s cpu: 6.30 sys 0.06 user
user 49: 300.007 sec 32814.98 KB/s cpu: 6.36 sys 0.06 user
user 50: 300.050 sec 32933.22 KB/s cpu: 6.35 sys 0.05 user
user 51: 300.026 sec 33038.23 KB/s cpu: 6.35 sys 0.05 user
user 52: 300.087 sec 32970.03 KB/s cpu: 6.38 sys 0.07 user
user 53: 300.022 sec 32833.90 KB/s cpu: 6.09 sys 0.05 user
user 54: 300.091 sec 32990.10 KB/s cpu: 6.32 sys 0.04 user
user 55: 300.075 sec 32991.84 KB/s cpu: 6.34 sys 0.06 user
user 56: 300.075 sec 32909.93 KB/s cpu: 6.37 sys 0.04 user
user 57: 300.059 sec  4778.89 KB/s cpu: 0.90 sys 0.00 user
user 58: 300.062 sec 32993.34 KB/s cpu: 6.12 sys 0.06 user
user 59: 300.064 sec 33156.83 KB/s cpu: 6.38 sys 0.05 user
user 60: 300.079 sec 33011.92 KB/s cpu: 6.38 sys 0.05 user
user 61: 300.044 sec 33097.62 KB/s cpu: 6.36 sys 0.05 user
user 62: 300.049 sec  4774.69 KB/s cpu: 0.90 sys 0.00 user
user 63: 300.142 sec  4897.10 KB/s cpu: 0.92 sys 0.00 user
user 64: 300.039 sec 33221.04 KB/s cpu: 6.33 sys 0.04 user
total:   300.163 sec 1581364.62 KB/s cpu: 302.20 sys 2.33 user
Conclusion to TEST6: <read_nstream set to 12>
o The maximum read I/O throughput from one node is being achieved, approx. 1.5 GBytes/sec.
o The throughput per process is imbalanced.
o The maximum amount of readahead per process is 6MB (512KB x 12); this is too aggressive (i.e.
too much readahead is causing a throughput imbalance between the reading processes).
Graphics for buffered I/O tests
The graphs below show the results of the tests running 64 processes (only TEST3, which runs 16 processes, is
excluded from the graphs). The second graph simply joins the dots for each process; each test uses a
different colour. The graphs clearly show that only read_nstream=1 (TEST2) and readahead disabled (TEST4)
provide an evenly balanced throughput across all 64 processes. However, when readahead is disabled the
throughput is much lower.
Therefore, in our test, read_nstream=1 (dark blue in the graphics) is clearly the correct value, because the
throughput is evenly balanced across all 64 processes and the maximum throughput is still achieved.
9. Final conclusions and best practices for optimizing
sequential read I/O workloads:
To maximize the sequential read I/O throughput, maintain evenly balanced I/O across all
the LUNs and balance the throughput across the active reading processes, we identified
the following configuration for our test environment:
o 512KB VxVM stripe-width (for the optimum I/O size reading from disk)
o 24 LUNs and 24 columns in our VxVM volume (to use the maximum storage bandwidth)
o Leave read_pref_io set to the default value of 524288 (the max I/O size using readahead)
o Reduce read_nstream from its default value of 24 to a value of 1 (to reduce the
maximum amount of data to pre-fetch in one go using readahead)
The best practices for sequential read media server solution configurations are as follows:
o Set up your hardware so that the maximum I/O bandwidth can be achieved.
o We did not change the operating system maximum I/O size; we kept the default of 512KB.
o Ensure that your I/O is balanced evenly across all your LUNs by using VxVM striped volumes.
 We found a VxVM stripe-width of 512KB to be optimal; different stripe-widths can be
tested, but a stripe-width greater than 1024KB is not required.
 We created 24 LUNs to maximize access to the storage arrays; we therefore
created our VxVM volume with 24 columns to maximize the bandwidth to the storage
arrays.
 During this process, identify any bottlenecks in your HBA cards and storage. Begin with
a single node; the bottlenecks will give you the maximum throughput you can achieve
in your environment.
o If VxVM mirroring were required in our configuration then 12 LUNs would be used in each
mirror.
 As reads can come from either mirror, the read I/O throughput should not be impacted
by mirroring, because we are still reading from all 24 LUNs; writes, however, will be
impacted.
o The value of read_pref_io is the read I/O request size that VxFS readahead will submit to
VxVM; we want a larger I/O size for performance (read_pref_io is set to the stripe-width).
 Do not change the auto-tuned value for read_pref_io; if you want to change
read_pref_io, change the VxVM volume stripe-width instead.
o Using higher read_nstream values produced an imbalance in throughput between the
different processes performing disk read I/O; this is due to overly aggressive readahead.
 No matter what value of read_nstream we used, we always hit the FC HBA card
throughput bottleneck of approximately 1.5 GBytes/sec.
 The larger the value of read_nstream, the more aggressive readahead becomes, and
the greater the imbalance in read throughput between the different processes.
 Reduce read_nstream to reduce the amount of readahead. We found read_nstream=1
provided a perfect balance in throughput between processes.
o Do not disable readahead unless absolutely necessary, as sequential read performance will be
impacted.
o Use /etc/tunefstab to set read_nstream so that the value persists across a reboot.
o Mount with options noatime and nomtime if you can, as shown in the sketch below.
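For example, our remount command with these options added would look like this (a sketch; verify
that the noatime and nomtime options are supported on your platform and release):
# mount -t vxfs -o remount,largefiles,cluster,noatime,nomtime /dev/vx/dsk/testdg/vol1 /data1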
We will provide a second report for media server workload testing that explains
sequential write I/O and some more best practices.
Best regards
Veritas IA Engineering team
Server h/w configuration information: <2 nodes>
System
# dmidecode -q -t 1|head -5
System Information
Manufacturer: HP
Product Name: ProLiant DL380p Gen8
CPU
# dmidecode -q -t 4|grep -e Processor -e Socket -e Manufacturer -e Version -e
"Current Speed" -e Core -e Thread|grep -v Upgrade
Processor Information
Socket Designation: Proc 1
Type: Central Processor
Manufacturer: Intel
Version: Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
Current Speed: 2200 MHz
Core Count: 8
Core Enabled: 8
Thread Count: 16
Processor Information
Socket Designation: Proc 2
Type: Central Processor
Manufacturer: Intel
Version: Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
Current Speed: 2200 MHz
Core Count: 8
Core Enabled: 8
Thread Count: 16
Memory
# dmidecode -q -t 17|grep Size|grep -v "No Module Installed"|awk
'BEGIN{memsize=0}{memsize=memsize+$2}END{print memsize, $3}'
98304 MB
# dmidecode -q -t 17|grep -e Speed -e Type|grep -v Detail|sort|uniq|grep -v Unknown
Configured Clock Speed: 1600 MHz
Speed: 1600 MHz
Type: DDR3