Windows 2000 IO Performance
Leonard Chung & Jim Gray
4/5/2000

Study Goals
Repeat and extend the Riedel et al. paper. Many things have changed:
– Software: Windows 2000 instead of NT4SP3
– Hardware: new, faster drives and standards
3 main testing scenarios:
– old-old: “old” machine with NT4SP6
– old-new: “old” machine with Win2000
– new-new: “new” machine with Win2000

Hardware Configurations
“old” hardware:
– 333 MHz PII
– 4 x 7200 RPM UW SCSI drives
– 128 MB SDRAM
“new” hardware:
– 2 x 733 MHz PIII
– 4 x 10,000 RPM Ultra160 SCSI drives
– 256 MB RDRAM
– 4 x 5400 RPM UltraATA/66 IDE drives on a 3ware card

Primary Test Tools
SQLIO – the primary test tool
CacheFlush – buffered sequential
DiskCache – PCI/host adapter throughput
Memspeed – memory subsystem

Testing Methodology
Before each test:
– Drive formatted
– Test files copied in same order
– Test run
Sequential test files are placed on the outer edge of the disk, giving the disk’s maximum performance and consistent results.

Media Banding
Modern disks are zoned
– More bits stored on outer tracks + constant angular velocity = fast outer tracks
– We’ve measured inner tracks on some drives being up to 40% slower than the outer tracks
– A “normal” disk map…
[Figure: “normal” disk map]

Overall Findings
Changes in throughput performance are incremental rather than radical
– Trendlines have the same general shape
– Most of Riedel’s model still holds

Hardware Bandwidth (RAP)
System Bandwidth: What Riedel Saw, in megabytes per second (not to scale!)
Hard Disk (9 per disk) | SCSI (30) | PCI (72) | Memory (140) | Processor

Hardware Bandwidth (PAP)
System Bandwidth Yesterday, in megabytes per second (not to scale!)
Hard Disk (15 per disk) | SCSI (40) | PCI (133) | Memory (422) | Processor
The familiar bandwidth pyramid: the farther from the CPU, the less the bandwidth.

Hardware Bandwidth (PAP)
System Bandwidth Today, in megabytes per second (not to scale!)
Hard Disk (26 per disk) | SCSI (160) | PCI (133) | Memory (1,600) | Processor
The familiar pyramid is gone! PCI is now the bottleneck!
In practice, 3 disks can reach saturation using sequential IO.

Hardware Bandwidth (PAP)
System Bandwidth Today, in megabytes per second (not to scale!)
Possible solutions:
– A fatter, 64bit 66MHz PCI bus: Hard Disk (26 per disk) | SCSI (160) | PCI (532) | Memory (1,600) | Processor
– or multiple PCI busses: 2 x [Hard Disk (26 per disk) | SCSI (160) | PCI (133)] | Memory (1,600) | Processor

Hardware Bandwidth (RAP)
System Bandwidth Today (reads): numbers we’ve seen, in megabytes per second (not to scale!)
Hard Disk (24 each) | SCSI (98.5) | PCI (98.5) | Memory (975) | Processor

old-old: NT4SP3 vs.
NT4SP6
Unbuffered reads and WCE writes no longer show a decrease in throughput
Buffered read bug is gone
Overheads are different
[Figures: NT4SP3 and NT4SP6 unbuffered throughput, buffered throughput, and buffered overhead; 1 fast disk at various request depths. Axes: throughput (MB/s) or overhead (CPU ms/MB) vs. request size (KB); series: read, write, write (WCE).]

old-new: Windows 2000
Software: Major changes, minor differences
– Dmio: The volume manager for Win2K
• More fixed overhead than ftdisk due to longer code paths
• More features than ftdisk (dynamically size volumes, etc.)
– In the end, performance is the same.
• Processors are fast enough that there are more than enough cycles to spare.

new-new: Windows 2000
Hardware: The American Way
– Faster, bigger, cheaper
• Disks are now 4 times bigger and 3 times faster.
• SCSI bus bandwidth has surpassed the PC-standard 32bit, 33MHz PCI bus bandwidth.
• Random IO is unaffected by the PCI bottleneck.
• Additional SMP processor provided no additional throughput gains.
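The point that SCSI bandwidth has overtaken PCI can be checked with a short calculation. The sketch below is only an illustration: it uses the peak advertised (PAP) figures quoted on the Hardware Bandwidth slides, and the names CONFIGS, narrowest_stage, is_pyramid, and describe are hypothetical helpers, not part of the study's tools. It finds the narrowest shared stage between the disks and the CPU and checks whether bandwidth still grows toward the CPU.

```python
# Peak advertised bandwidth (MB/s) of each shared stage, ordered from the
# disks toward the CPU. Values are the PAP figures quoted on the slides;
# the configuration names are illustrative labels.
CONFIGS = {
    "yesterday": [("SCSI", 40), ("PCI", 133), ("Memory", 422)],
    "today": [("SCSI", 160), ("PCI", 133), ("Memory", 1600)],
    "today, 64bit 66MHz PCI": [("SCSI", 160), ("PCI", 532), ("Memory", 1600)],
}


def narrowest_stage(stages):
    """Return the (name, MB/s) pair with the lowest bandwidth."""
    return min(stages, key=lambda stage: stage[1])


def is_pyramid(stages):
    """True if bandwidth never decreases when moving toward the CPU."""
    rates = [rate for _, rate in stages]
    return all(a <= b for a, b in zip(rates, rates[1:]))


def describe(name):
    """Summarize one configuration as a single line of text."""
    stages = CONFIGS[name]
    stage, rate = narrowest_stage(stages)
    shape = "pyramid holds" if is_pyramid(stages) else "pyramid broken"
    return f"{name}: narrowest shared stage is {stage} at {rate} MB/s ({shape})"


if __name__ == "__main__":
    for config_name in CONFIGS:
        print(describe(config_name))
```

This models advertised peaks only. The measured (RAP) figures in the deck are lower, and how many disks it takes to saturate the path in practice depends on measured per-disk throughput rather than on these peaks.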
new-new: Windows 2000
Scalability: PCI Bottleneck
[Figures: Win2K Dynamic 1 disk, 2 disk, and 3 disk unbuffered throughput, and Win2K 3 disk Dynamic, 1 disk Basic (4 disk) unbuffered throughput. Axes: throughput (MB/s) vs. request size (KB); series: read, write, write (WCE).]

new-new: Windows 2000 & IDE
The real IO revolution: RAID priced for the masses!
The good news:
– IDE disks are cheap
• We bought 5400 RPM IDE 27GB drives for $209 ($7.75/GB) while our 10,000 RPM 18GB SCSI drive cost $534 ($30/GB)
• IDE costs $3.17 per Kaps while SCSI costs $5.09 per Kaps.
• Today, IDE is $6,500 per TB while SCSI costs $16,000 per TB

new-new: Windows 2000 & IDE
IDE Performance:
– Single disk random IO performance on a 5400 RPM IDE is much slower than a 10,000 RPM SCSI.
• However, multiple IDE disks can provide up to 60% more Kaps for the same price as a single SCSI disk.
[Figures: Atlas 10K SCSI IO/s vs. Depth and Fireball IDE IO/s vs. Depth. Axes: IO/s vs. request depth; series: read, write.]

new-new: Windows 2000 & IDE
IDE Performance:
– Single disk sequential IO throughput on a 5400 RPM IDE drive is 80% of the more expensive 10,000 RPM SCSI drive.
[Figures: Win2K 1 disk 3ware IDE unbuffered throughput and Win2K 1 disk SCSI unbuffered throughput. Axes: throughput (MB/s) vs. request size (KB); series: reads and writes at 1 deep and 2 deep.]

new-new: Windows 2000 & IDE
Price/Performance for IDE is hard to beat
– Performance
• For sequential and random IO, IDE is the price/performance leader
• Overhead for SCSI and 3ware/DMA IDE is the same.
– Capacity
• 69GB (~2.5 disks worth) of Quantum Fireball lct08s costs the same as one Quantum Atlas 10K 18GB disk.

new-new: Windows 2000 & IDE
The bad news about IDE
– The quality of IDE controllers varies
Revolutions are being missed due to a slow controller
[Figure: Western Digital Caviar 30GB 3ware unbuffered read throughput (MB/s) vs. request size (KB), 1 deep; annotations: “Missing every other revolution”, “Missing multiple revolutions”.]
High controller overhead is causing the disk to miss revolutions at small request sizes

new-new: Windows 2000 & IDE (3ware)
The bad news about IDE
– IDE RAID isn’t as mature as SCSI
• Driver bugs and incompatibilities
• Problems with multiple IDE drives
– IDE spec gives 18” as the max cable length: getting cables to drives can be a chore
– Avoid master/slave: reliability and possibly performance is lost
– No hot swap

new-new: Windows 2000 & IDE (3ware)
The bad news about IDE
– IDE RAID isn’t as mature as SCSI
• 3ware’s card peaks out at 55MBps for reads and 40MBps for writes; the peak is reached with 3 disks for reads and 2 for writes.
[Figures: Win2K 2 disk, 3 disk, and 4 disk 3ware hardware RAID0 unbuffered read and unbuffered write throughput. Axes: throughput (MB/s) vs. request size (KB); series: 1, 2, 4, and 8 deep.]

Where do we go from here?
Network IO over Gigabit
– OOB performance and slight tuning
Sqlio2: a complete rewrite of SQLIO

And in conclusion…
NT4SP6
– Unbuffered requests at 2KB, 4KB request sizes no longer have dip
– Buffered read request bug gone
– Buffered overhead appears to be lower
Windows 2000
– Despite dmio replacing ftdisk, throughput remains unaffected
new-new SCSI performance
– PCI is now the bottleneck with 3 drives able to reach saturation
new-new IDE
– IDE shows a lot of promise: cheap storage and good performance
– Difficulty lies with multiple disks
• IDE RAID cards not quite ready for prime time
• Physically wiring the drives
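The price/performance argument behind the IDE conclusion can be reproduced with simple arithmetic. The sketch below is a minimal illustration that uses only the drive prices, capacities, and dollars-per-Kaps figures quoted on the IDE slides; the dictionary layout and function names are hypothetical, chosen for this example.

```python
# Figures quoted in the deck: price (USD), capacity (GB), and cost per Kaps.
IDE = {"price": 209, "capacity_gb": 27, "usd_per_kaps": 3.17}
SCSI = {"price": 534, "capacity_gb": 18, "usd_per_kaps": 5.09}


def usd_per_gb(drive):
    """Purchase price divided by capacity."""
    return drive["price"] / drive["capacity_gb"]


def kaps_advantage(cheap, expensive):
    """Fractional extra Kaps per dollar that `cheap` delivers over `expensive`."""
    return expensive["usd_per_kaps"] / cheap["usd_per_kaps"] - 1


def capacity_for_same_money(cheap, expensive):
    """(drive count, GB) of `cheap` drives for the price of one `expensive` drive."""
    drives = expensive["price"] / cheap["price"]
    return drives, drives * cheap["capacity_gb"]


if __name__ == "__main__":
    print(f"IDE:  ${usd_per_gb(IDE):.2f}/GB")
    print(f"SCSI: ${usd_per_gb(SCSI):.2f}/GB")
    print(f"IDE Kaps per dollar advantage: {kaps_advantage(IDE, SCSI):.0%}")
    drives, gigabytes = capacity_for_same_money(IDE, SCSI)
    print(f"One SCSI drive buys {drives:.1f} IDE drives ({gigabytes:.0f} GB)")
```

The results land close to the deck's rounded figures: roughly $7.74/GB against $29.67/GB, about 60% more Kaps per dollar for IDE, and about 69 GB of IDE capacity for the price of one SCSI drive.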