Front cover

Ready to Access DB2 for z/OS Data on Solid-State Drives

Redguides for Business Leaders

Jeffrey Berger
Paolo Bruni

- Get to know solid-state drives in z/OS environments
- Compare performance when accessing DB2 for z/OS data
- Understand requirements and applicability

Executive overview

The dollar-to-data ratio of solid-state drives (SSDs) is still high when compared with that of traditional spinning disks in the enterprise storage market; however, the SSD ratio is expected to improve with continued investment in the not too distant future. By avoiding seeks and rotational delays, SSDs have the potential to improve the performance of disk storage. This IBM® Redguide™ publication provides a broad understanding of SSDs and shares several early experiences in a laboratory setting using SSDs to improve performance when accessing DB2® for z/OS® data.

Disclaimer

The performance data contained in this document was obtained in various controlled laboratory environments and is for reference purposes only. Do not adapt these performance numbers to your own environments as system performance standards. The results that you might obtain in other operating environments can vary significantly. Users of this information must verify the applicability for their specific environment.

© Copyright IBM Corp. 2009. All rights reserved.

Executive summary

The traditional storage technology for most large computer systems consists of spinning disks, known today as hard disk drives (HDDs), but solid-state drives (SSDs) have been gaining momentum lately. SSDs are much more expensive than HDDs, but the price is expected to drop and might in the future compete with that of spinning disks. In February 2009, IBM announced the IBM System Storage™ DS8000® Turbo series with the enhancements of full disk encryption with key management and a solid-state drive option. Laboratory measurements are confirming the high potential of SSDs to improve the performance of DB2 for z/OS. We present several of these measurements and the lessons learned from the study in this paper.

Rotating disks that spin at 15 000 rotations per minute typically achieve response times in the range of 4 to 8 milliseconds (for cache misses). In contrast, SSDs provide access times in tens of microseconds, rendering data access times that are more like dynamic random access memory (DRAM). However, the access time of the SSD drive itself ignores the functionality required of an enterprise storage subsystem and the effects of such function on response time. An enterprise storage subsystem provides layers of virtualization in order to shield the software from dependencies on hardware geometry. It needs to provide continuous availability, and it needs to enable multiple hosts to share the storage. This functionality, which is provided by the DS8000 storage server, adds to the response time of SSDs in an enterprise system, bringing it to the level of hundreds of microseconds. Furthermore, for predominantly sequential accesses, as well as for cache hits, HDDs and SSDs show similar performance. Nevertheless, we must not underestimate the value of eliminating the seek and rotational delays of HDDs.

Among the lessons learned in the measurement study of DB2 for z/OS is that solid-state drives appear to cause greater stress on the channel subsystem, because SSDs enable higher levels of throughput. Further improvements in the system as a whole will enable solid-state drives to realize more of their full potential.
IBM has delivered High Performance FICON® (zHPF) to the z/OS environment to help make this happen, and IBM recommends zHPF for an SSD environment. Even when the channel subsystem is not stressed, zHPF provides lower response times when accessing SSDs.

Another lesson is that disk drives are just one aspect of the I/O subsystem. I/O improvements, such as those made by IBM for the z/OS environment in recent years, continue to be important. Among these improvements are high-speed channels, the Modified Indirect Data Address Word (MIDAW) facility (described in How does the MIDAW facility improve the performance of FICON channels with DB2 and other workloads, REDP-4201), Hyper Parallel Access Volumes (HyperPAV), and Adaptive Multi-stream Prefetching (AMP). AMP, developed by IBM Research, is an autonomic, workload-responsive, self-optimizing prefetching technology that adapts both the amount and the timing of prefetch on a per-application basis in order to maximize the performance of the system.

DB2 buffer pools and cache in the storage server continue to be important. It is not uncommon for an enterprise system to experience average DB2 synchronous I/O times of two milliseconds or less, and that does not count the zero I/O time for DB2 buffer hits. So, it is simplistic to argue that the busiest data sets belong on expensive, fast-performing devices such as SSDs. However, data sets with low access rates do not belong on SSDs either. Rather, databases with a large working set and high access rates have the most to gain from SSD performance. But large databases require large disk capacity; consequently, you cannot purchase just a small amount of SSD storage and expect to realize a large performance benefit.

A special case to consider is one where the type of workload suddenly changes. A classic example is the daily stock market opening, following on the heels of a batch workload. If the nightly batch workload has caused important data from the previous day's work to be flushed from the cache, the stock market opening might result in a lower cache hit ratio than what will transpire later in the day. If the performance during the opening is critical enough, it is helpful to use solid-state drives, even though hard disk drives probably suffice to maintain good performance later in the day. SSDs are also useful to smooth out other "rough spots" during the day.

Solid-state drives

Recent trends in direct access storage devices have introduced the use of NAND flash semiconductor memory technology for solid-state drives (SSDs). Flash memory is a nonvolatile computer memory that can be electrically erased and reprogrammed. It is a technology that is primarily used in memory cards and USB flash drives for the general storage and transfer of data between computers and other digital products. Flash memory offers good read access times and better kinetic shock resistance than hard disks. Flash memory, once packaged in a memory card, is durable and can withstand intense pressure, temperature extremes, and even immersion in water.

Because SSDs have no moving parts, they offer better performance for random accesses than spinning disks (hard disk drives, or HDDs) and require less energy to operate. SSDs are currently priced higher per unit of storage than HDDs but lower than previous semiconductor storage.
The industry expects these relationships to remain that way for a number of years, although the price gap will likely narrow over time. As a result, both technologies are likely to coexist for a while.

In February 2009, IBM announced the IBM System Storage DS8000 Turbo series with solid-state drives (see IBM United States Announcement Letter 109-120). Earlier, in October 2008, IBM had announced a complementary feature called High Performance FICON for System z® (zHPF) (see IBM United States Hardware Announcement 108-870). zHPF exploits a new channel protocol especially designed for more efficient I/O operations, and IBM recommends zHPF for an SSD environment. zHPF requires a z10 processor and z/OS 1.10 (or a small programming enhancement, or SPE, retrofitted to z/OS 1.8 or 1.9), as well as a storage server, such as the DS8000, that supports it. (When new function is delivered as service, it is called an SPE. SPEs are delivered and tracked the same way as problems: an APAR is assigned to the SPE, and it is delivered as a program temporary fix, or PTF.) zHPF provides lower response times when accessing SSDs.

The DS8000 storage subsystem with SSDs

Solid-state drives are plug-compatible in a DS8000 and are configured using RAID in exactly the same fashion as hard disks. A RAID "rank" always consists of 8 disks. The measurements described in this study used RAID 5, which offers the best combination of space capacity, performance, and reliability.

Let us first review how the device adapters are configured in a DS8000. A DS8000 configuration normally consists of one device adapter pair for each set of 64 disks, or eight ranks. In other words, there is one device adapter for each set of four ranks. As more ranks are added to the DS8000 subsystem, the number of device adapters increases up to a maximum of sixteen device adapters (for 64 ranks).

Next, let us review the channel subsystem. A DS8100 supports up to 64 external ports, and a DS8300 supports up to 128 external ports, which can be used either for host channel connections (known as host adapters) or to connect to another DS8000 in order to replicate data on another storage server.

For a small number of SSD ranks, if the ranks are limited to one or two device adapters, the device adapters will limit the performance capacity of the subsystem unless the cache hit ratio is good. However, if a few SSD ranks are spread across four or more device adapters, the performance limitations of the channel subsystem will outweigh the limitations of the device adapters, no matter what the hit ratio is. Thus, if it is necessary to sustain very high I/O rates and there are more than just two or three SSD ranks, the performance capacity of the channel subsystem is critical to the performance capacity of the I/O subsystem as a whole.
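To make the rank-to-adapter relationship concrete, here is a minimal Python sketch of the arithmetic just described: one device adapter pair per eight ranks, capped at sixteen device adapters. The function name and the assumption that adapters are always added as pairs are illustrative only; this is not an IBM configuration tool.

# Minimal sketch of the device adapter arithmetic described above:
# one device adapter pair per 64 disks (8 ranks), that is, one device
# adapter per 4 ranks, up to a maximum of 16 device adapters.
import math

def device_adapters_for(ranks: int) -> int:
    """Estimate how many DS8000 device adapters serve a given rank count."""
    if ranks <= 0:
        return 0
    return min(16, 2 * math.ceil(ranks / 8))  # adapters added in pairs, capped at 16

for ranks in (2, 8, 16, 64):
    print(ranks, "ranks ->", device_adapters_for(ranks), "device adapters")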
DB2 accesses

We examine the performance characteristics of the following typical DB2 access types:

- DB2 synchronous I/Os
- DB2 sequential I/O
- DB2 list prefetch
- DB2 list prefetch with striping
- DB2 synchronous I/O using one or two RAID ranks

Note: Except where specifically stated otherwise, all SSD measurements were performed on a z10 processor with 8 FICON Express4 channels using a DS8000 storage server. The SSDs used for the study were manufactured by STEC. Both the z10 and the DS8000 contained pre-release microcode that supported SSDs and zHPF. HDDs were measured on a z9® processor and were not remeasured on a z10 due to time constraints, but seeks and rotational delays are unaffected by the processor.

DB2 synchronous I/Os

The best and most obvious benefit of SSDs in the DS8000 is the improvement for DB2 synchronous read I/Os. The metric used in Figure 1 is the response time for synchronous reads as measured by DB2 itself (which includes overhead in z/OS to process the I/O), plus the channel overhead as measured on the z10 processor. The z/OS processing overhead on the z10 was measured to be 12 microseconds of elapsed time.

As can be seen in Figure 1, zHPF affects the performance of cache hits as well as SSD accesses. For cache hits, zHPF lowers the response time from 290 microseconds to 229 microseconds. For SSD accesses, zHPF lowers the response time from 838 microseconds to 739 microseconds. HDD accesses were not re-evaluated using zHPF, because a delta of 100 microseconds is insignificant for HDDs.

In comparison, the response times for hard disks (spinning at 15 000 RPM) range from about 4 000 microseconds for "short seeks" to 8 000 microseconds for long seeks. "Short seeks" are indicative of an individual data set or an individual volume being a hot spot. "Long seeks" are what you might observe when seeking between the extreme inner and outer cylinders of a hard disk, but they are extremely atypical. When the I/O is uniformly spread across the disks, we can expect the average response time to be about 6 000 microseconds, which is about seven times higher than the DS8000 can achieve using SSDs without zHPF and about eight times higher than if zHPF is used.

Figure 1  DB2 synchronous I/O response time (microseconds): cache hit + zHPF 229, cache hit 290, SSD + zHPF 739, SSD 838, HDD short seek 3 860, HDD long seek 8 000
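The ratios quoted above follow directly from the measured values. The short Python sketch below simply redoes that arithmetic; the variable names are ours, and the 6 000 microsecond HDD average is the value stated in the text.

# Redo the response-time arithmetic quoted above (values in microseconds).
hdd_avg = 6000          # typical HDD average when I/O is spread uniformly
ssd_ficon = 838         # SSD synchronous read without zHPF
ssd_zhpf = 739          # SSD synchronous read with zHPF
cache_hit_zhpf = 229    # DS8000 cache hit with zHPF

print(f"HDD vs SSD (FICON): {hdd_avg / ssd_ficon:.1f}x")            # about 7x
print(f"HDD vs SSD (zHPF):  {hdd_avg / ssd_zhpf:.1f}x")             # about 8x
print(f"SSD (zHPF) vs cache hit (zHPF): {ssd_zhpf / cache_hit_zhpf:.1f}x")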
DB2 sequential I/O

Sequential I/O performance has been steadily improving for many years. High-speed FICON channels, with fast host adapters in the storage server, are crucial to providing good sequential performance. RAID 5 architecture eliminates the effect of hard disk drives on sequential performance, because the data is striped across eight drives, enabling the server to exploit the bandwidth of several drives in parallel. The sequential bandwidth of the disks in a RAID 5 rank exceeds the sequential bandwidth of a device adapter in the DS8000, but the channel subsystem is not yet fast enough to enable one sequential stream to absorb the bandwidth of a device adapter.

Besides high-speed channels, two other recent enhancements have contributed to faster sequential I/O. The Modified Indirect Data Address Word (MIDAW) facility was introduced in 2005 in the IBM z9 processor to improve FICON performance, especially when accessing IBM DB2 databases. This facility is a new method of gathering data into and scattering data from discontiguous storage locations during an I/O operation. MIDAW requires a z9 or z10 processor, with z/OS 1.7 or a PTF to z/OS 1.6. MIDAWs cut FICON channel utilization for DB2 sequential I/O streams by half or more and improve the sequential throughput of extended format data sets by about 30%. In a sense, zHPF is a complement to MIDAW: both address channel efficiency, but whereas MIDAW affects sequential I/O, zHPF affects random I/O.

The other new feature that improves sequential performance is Adaptive Multi-stream Prefetching (AMP). AMP was first introduced in Release 2.4G of the DS8000 and does not require any z/OS changes. AMP typically improves sequential read throughput by another 30%. It achieves this improvement by increasing the prefetch quantity sufficiently to meet the needs of the application. For example, if the application is CPU bound, there is no need to prefetch a lot of data from the disk. At the opposite extreme, if the application is I/O bound and the channels are extremely fast, AMP prefetches enough data to enable the "back-end" operations to keep up with the channel.

As more data is prefetched, more disks are employed in parallel. Therefore, high throughput is achieved by employing parallelism at the disk level. Besides enabling one sequential stream to be faster, AMP also reduces disk thrashing when there is disk contention. Disk thrashing is not an issue with SSDs, because an SSD does not have to seek from one cylinder to another, but parallelism is still important to SSDs.

Dynamic prefetch

Figure 2 and Figure 3 show the performance of DB2's dynamic prefetch as a function of DB2 page size when the pages are clustered. These measurements were done using the DS8000 storage server and were achievable because of high-speed channels, MIDAW, and AMP. Figure 2 shows the throughput in megabytes per second, and Figure 3 shows the reciprocal, which is the time per page.

Figure 2  Dynamic prefetch throughput (MB/sec by DB2 page size): 238 (4K), 250 (8K), 280 (16K), 280 (32K)

Figure 3  Dynamic prefetch time per page (microseconds by DB2 page size): 17.6 (4K), 32.7 (8K), 58.5 (16K), 58.5 (32K)

The throughput of dynamic prefetch ranges from 238 to 280 MB/sec as the page size increases, and the time per page ranges from 17.6 microseconds to 58.5 microseconds. The performance of dynamic prefetch is about the same for hard disks and solid-state drives, because sequential performance is gated not by the drives, but rather by the channels and host adapters.
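Because Figure 3 is the reciprocal of Figure 2, the time per page can be estimated as page size divided by throughput. The following Python sketch checks that arithmetic for the 4K, 8K, and 16K measurements, assuming 1 MB is 10**6 bytes; it closely reproduces the reported 17.6, 32.7, and 58.5 microsecond figures.

# Approximate the time-per-page figures from the measured throughput,
# assuming time_per_page = page_size / throughput (1 MB taken as 10**6 bytes).
measured_mb_per_sec = {4096: 238, 8192: 250, 16384: 280}  # values read from Figure 2

for page_size, mb_per_sec in measured_mb_per_sec.items():
    time_us = page_size / (mb_per_sec * 1e6) * 1e6  # microseconds per page
    print(f"{page_size // 1024}K page at {mb_per_sec} MB/sec -> {time_us:.1f} us/page")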
Sequential writes

Another question about SSDs is how they perform for sequential writes. This question is examined by considering the Basic Sequential Access Method (BSAM). Figure 4 shows the performance of BSAM reads and writes as the number of buffers varies. The results were the same for HDDs and SSDs. Extended Format (EF) data sets were used for the measurements, because EF is recommended for optimal BSAM performance.

Figure 4  BSAM throughput (MB/sec by buffers per I/O): reads 174 (3 tracks) and 226 (5 tracks); writes 136 (3 tracks) and 151 (5 tracks)

The DB2 utilities and high-level languages, such as COBOL and PL/I, all read and write three tracks per I/O, but the chart also shows that five tracks of buffers resulted in slightly better performance. With three tracks per I/O, we get 174 MB/sec for reads and 136 MB/sec for writes. Again, SSDs do not alter the performance of sequential I/O, because sequential performance is insensitive to disk performance. As with dynamic prefetch, you might not achieve this kind of BSAM performance without a DS8000, fast channels, MIDAWs, and AMP.

DB2 list prefetch

Now, let us look at poorly clustered or sparse data. When DB2 uses an index to select the pages, DB2 can elect to sort the record identifiers (RIDs) of the pages and fetch the pages using list prefetch. Because the pages are sorted, the seek times for HDDs are much lower than if the pages had not been sorted. As the fetched pages become denser, the time per page goes down, although DB2 has to read more pages. The DB2 buffer manager further cuts the time by scheduling up to two prefetch engines in parallel.

Figure 5 shows the list prefetch time per page as a function of the page density. With SSDs, the time per page is independent of the number of pages fetched. That time is only 327 microseconds, which is 40% of the time for random access. Part of the reason is the parallelism of list prefetch; the other reason is that list prefetch, reading 32 pages per I/O, reduces the number of I/O operations.

As we can see from Figure 5, hard disks actually do quite well when the density of pages is high. In fact, HDDs and SSDs perform exactly the same when DB2 fetches at least 14% of the pages. When DB2 fetches only 1% of the pages, SSDs are almost four times faster than HDDs, but even then SSDs are not as beneficial as they are for random I/Os. The reason is that even when DB2 fetches one page out of a hundred on HDDs, the seek distance is relatively small, and some of the pages require no physical seek.

Figure 5  List prefetch time per page (microseconds versus percentage of pages read) for HDD, SSD, and cache

Another observation from Figure 5 is that, if the data is entirely in cache, the time per page is only 47 microseconds, which is still seven times faster than SSD. Thus, the value of cache hits when using list prefetch is not as high as it is for synchronous I/Os. The reason for this difference is that list prefetch relieves part of the channel overhead incurred by synchronous I/Os.

Figure 6 illustrates what these times per page mean in terms of elapsed time. A table was created with five million pages, and an index was used to qualify a percentage of the pages. As the percentage increases, the elapsed time increases. The elapsed time increases linearly for SSD and for cache, but not so with hard disks.

Figure 6  List prefetch elapsed time (seconds versus percentage of pages read) for HDD, SSD, and cache
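Because the SSD and cache times per page are essentially constant, the elapsed times in Figure 6 can be approximated as pages read multiplied by time per page. The Python sketch below applies that simple linear model, which is our own approximation, to the five-million-page table; it does not apply to HDDs, because their per-page time varies with page density.

# Estimate list prefetch elapsed time as pages_read * time_per_page.
# This linear model matches the SSD and cache behavior described above;
# HDD does not follow it because its per-page time depends on page density.
TOTAL_PAGES = 5_000_000                      # size of the test table
TIME_PER_PAGE_US = {"SSD": 327, "cache": 47} # per-page times quoted above

for pct in (1, 5, 15):
    pages_read = TOTAL_PAGES * pct / 100
    for medium, us_per_page in TIME_PER_PAGE_US.items():
        elapsed_s = pages_read * us_per_page / 1e6
        print(f"{pct:>2}% of pages read on {medium}: about {elapsed_s:.0f} seconds")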
DB2 list prefetch with striping

DB2 uses two engines for list prefetch, and each engine executes one list prefetch I/O at a time. A list prefetch I/O serially processes a batch of 32 4K pages. Striping the table space is a way to increase the degree of parallelism. Because a RAID 5 rank contains eight disks, it is natural to expect that a RAID 5 rank can sustain up to eight degrees of parallelism. However, the downside of using striping to increase the parallelism of an HDD rank is that it might cause additional seeking, because striping disrupts the "skip sequential" nature of the page ordering that DB2 does. SSDs have no such problem; thus, SSDs make striping within a rank more effective. Striping with HDDs is more effective when the stripes are spread across ranks (with at most one stripe per rank), but except in a benchmark situation it is not practical to dedicate a rank to a stripe.

Figure 7 illustrates the effect of striping where 2% of the 4K pages are fetched. Because the outer part of the HDD disks was used for the non-striped (that is, one stripe) case, the HDD times represent the best case. The HDD time per page was 1041 microseconds, the same as shown in Figure 5. The next HDD measurement used eight stripes, allocated on the first eight consecutive volumes on the outer part of the disks; the result was 632 microseconds, a 41% reduction. However, when the first four volumes (outer) and the last four volumes (inner) were used, the seek times were so poor that the response time increased to 1019 microseconds, just 2% less than when striping was not used. Had the stripes been randomly placed, we can expect a response time in the neighborhood of 800 microseconds per page.

Figure 7  List prefetch time per page with striping (microseconds) for HDD short seeks, HDD long seeks, and SSD, with 1 stripe, 8 stripes in 1 rank, 8 stripes in 2 ranks, and 8 stripes in 8 ranks

Next, we can see from Figure 7 what happens when the eight HDD stripes are spread across eight ranks. Here, the response time drops significantly to just 122 microseconds, representing about an 8.5 times reduction. The reason the response time decreased by more than eight times is that spreading the stripes in this way reduced the seek distances. Yet, as impressive as this reduction might seem, it is impractical to dedicate a rank to each stripe.

Now, let us examine how solid-state drives improve the performance in an extremely consistent manner, without major concerns about how stripes compete with each other in a RAID rank. The response time without striping is 327 microseconds. Allocating eight stripes in one RAID rank reduces this time to 77 microseconds, a 76.5% reduction; thus, increasing the degree of I/O parallelism from two to sixteen provides a four-fold reduction in response time. Dividing the stripes among two RAID ranks only improves the performance to 66 microseconds, an 80% reduction compared to no striping, but not much better than when the stripes were concentrated on one rank.

The fact that solid-state drives make striping so effective for list prefetch is just the tip of the benefits. As we are about to see, the same reason that makes striping more effective also enables solid-state drives to sustain much higher throughput for DB2 synchronous I/Os.
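The percentage reductions quoted for SSD striping follow directly from the per-page times. The short Python sketch below redoes that arithmetic; the layout labels are ours.

# Redo the SSD striping arithmetic quoted above (times in microseconds per page).
ssd_times = {"1 stripe": 327, "8 stripes, 1 rank": 77, "8 stripes, 2 ranks": 66}
baseline = ssd_times["1 stripe"]

for layout, t in ssd_times.items():
    reduction = (baseline - t) / baseline * 100
    print(f"{layout}: {t} us/page, {reduction:.1f}% reduction, {baseline / t:.1f}x faster")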
DB2 synchronous I/O using one or two RAID ranks

The next question concerns how the performance of one RAID rank scales as its I/O rate increases. The measurement was conducted in a way that avoided cache hits. The workload used for this study is called DB2IO. Each thread in DB2IO accesses a separate table space consisting of one row per 4K page. An index is scanned to select qualifying pages, and the query was designed so that no two qualifying pages reside on the same track, to prevent track-level caching from producing cache hits. The cache was flushed prior to each run. For "short seeks", the table spaces and indexes were forced into a small number of contiguous volumes on the outer cylinders of the hard disks. For "long seeks", half of the table spaces and indexes were placed on the inner volumes, and the other half were placed on the outer volumes. When using SSDs, volume placement does not affect response time, but because SSD space was limited, there could only be 64 table spaces per SSD rank, thereby limiting the number of DB2IO threads to 64 per rank. Sixty-four threads were sufficient to determine the knee of the response time curve.

The results of these tests of DB2IO without zHPF are shown in Figure 8. (This set of tests was done on a z9 processor with only 4 FICON Express4 channels.) The performance for both short and long HDD seeks is shown along with SSD. You can interpolate the typical HDD response time somewhere between short and long seeks. As stated earlier, when the I/O is uniformly spread across the disks, we can expect the average response time to be about 6 000 microseconds. Figure 8 suggests that the knee of the curve is somewhere between 1 000 and 2 000 I/O operations per second (IO/sec).

Figure 8  One RAID rank, 100% cache miss: response time (milliseconds) versus I/O rate for HDD short seeks, HDD long seeks, and SSD

The SSD response time starts out at 838 microseconds for low rates, and it remains below 1 millisecond up to about 10 000 IO/sec. The response time continues to climb as the number of threads increases; with 64 threads, it was about 4 milliseconds at 18 600 IO/sec. One SSD rank can achieve about ten times as much I/O as one HDD rank. The limitation of the rank is not the SSDs, but rather the device adapter in the DS8000.

Considerations specific to SSDs

We now examine several of the key new aspects of utilizing SSDs for DB2 data:

- How does performance scale for a large number of drives
- What is the importance of HyperPAV
- SSD migration
- Does clustering still matter

How does performance scale for a large number of drives

Because SSDs provide extremely high throughput on the "back end", increased channel utilization can cause more queuing on the front end. When the channel utilization increases beyond 50%, queuing begins to affect response times significantly. To minimize channel utilization, it is good to configure many channels, and it is important to employ zHPF to achieve good channel efficiency. A good cache hit ratio avoids contention on the back end, but does nothing to relieve the contention on the front end. Thus, it is quite possible for the back-end performance capacity to far exceed the performance capabilities of the channel subsystem.

Channel efficiency is defined as the relationship between throughput and channel utilization. zHPF improves channel efficiency in much the same way that MIDAW does, except that where MIDAW affects sequential operations, such as DB2 dynamic and sequential prefetch, zHPF affects random I/Os, such as DB2 synchronous I/Os. zHPF is a new channel program architecture that replaces the old Channel Command Word (CCW) architecture that has existed since the 1950s. The same physical channels support a mixture of the new and old channel programs, and the Channel Activity Report in the Resource Measurement Facility (RMF) now shows a count of how many of each type of I/O were executed.

We examine a set of measurements in order to evaluate the constraints of the critical components. Figure 9 illustrates the case where two SSD ranks are configured behind one Device Adapter (DA) pair, that is, two device adapters. This example considers a workload without any cache hits using Extended Format (EF) data sets.

Figure 9  One DA pair, 16 SSDs, 100% cache miss: response time (milliseconds) versus throughput (thousand IO/sec) for 8 FICON, 8 zHPF, 4 FICON, 4 zHPF, and 2 zHPF channels

We start by comparing FICON to zHPF with eight channels. As the throughput increases beyond 18 000 IO/sec, the channel utilization without zHPF is such that queuing on the channel and host adapter begins to add to the I/O response time. Without zHPF, the I/O response time reaches 4 milliseconds at 31 000 IO/sec, but with zHPF, the I/O response time does not reach 4 milliseconds until the throughput reaches 38 000 IO/sec.
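The sensitivity to channel utilization can be illustrated with a textbook single-server queuing approximation, in which response time grows in proportion to 1/(1 - utilization). This is not a model of FICON, zHPF, or DS8000 internals, and the 0.5 millisecond service time below is an arbitrary illustrative value; the sketch only shows why response times climb quickly once utilization passes roughly 50%.

# Textbook M/M/1-style illustration of why queuing grows quickly once
# utilization passes about 50%.  This is NOT a model of FICON internals;
# the 0.5 ms service time is an arbitrary illustrative value.
service_time_ms = 0.5

for utilization in (0.3, 0.5, 0.7, 0.9):
    # Expected response time = service time / (1 - utilization) for M/M/1.
    response_ms = service_time_ms / (1 - utilization)
    print(f"utilization {utilization:.0%}: response about {response_ms:.2f} ms")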
To accentuate the effects of high channel utilization, we now examine what happens if the number of channels is reduced to four. With zHPF, there is extremely little reduction in performance, but with FICON, the maximum throughput drops from 31 000 to only 22 000 IO/sec. Finally, if the number of channels is reduced to two, even zHPF cannot achieve more than 24 000 IO/sec, which is still slightly better than FICON does with four channels. The conclusion is that zHPF is about twice as efficient as FICON.

We can see this improvement more clearly and in more detail in Figure 10, which plots the relationship between throughput and channel utilization for both EF and non-EF data sets. Because media manager appends a 32-byte suffix to the end of each VSAM Control Interval of an EF data set, EF reduces FICON efficiency by about 20%. However, EF has no effect on the efficiency of zHPF. Both FICON and zHPF achieve slightly higher efficiency with cache hits compared to cache misses.

Figure 10 shows that one channel using zHPF can achieve slightly over 23 000 cache misses per second. In theory, eight of these channels can achieve 184 000 cache misses per second. If the channel efficiency were really a constant, we would expect eight channels at 38 000 IO/sec to be 20.6% busy. However, the actual channel utilization at 38 000 IO/sec with eight channels is 25.6%. So, it is not clear how well we can extrapolate from one channel to many channels, or whether the channel utilization metric is precise enough to do this extrapolation.

Figure 10  Channel efficiency for DB2 synchronous reads: channel utilization (%) versus throughput (thousand IO/sec) for FICON non-EF misses, FICON EF misses, FICON non-EF hits, FICON EF hits, zHPF misses, and zHPF hits

Another factor to consider is the effect of cache hits. The relationship between the utilization of the channels and the utilization of the device adapters is determined by the cache hit ratio using the following formula:

DA IO/sec = cache miss rate x channel IO/sec

For example, if the channel IO/sec is 40 000 and the cache hit ratio is 50%, the DA IO/sec is only 20 000. The more cache hits there are, the less stress there is on the device adapters, and the more likely it is that channel performance characteristics will dominate and limit the throughput capacity of the I/O subsystem. That is why High Performance FICON is so important for solid-state drives.
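The formula above is easy to apply directly. The minimal Python sketch below encodes it; the function name is ours, and the sample values are the ones used in the example above.

# Back-end (device adapter) I/O rate implied by the front-end channel rate
# and the cache hit ratio, as described above.  The function name is ours.
def device_adapter_io_rate(channel_io_per_sec: float, cache_hit_ratio: float) -> float:
    """DA IO/sec = cache miss rate x channel IO/sec."""
    return (1.0 - cache_hit_ratio) * channel_io_per_sec

print(device_adapter_io_rate(40_000, 0.50))  # 20000.0, matching the example above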
Figure 11 shows SSD compared to HDD performance with a workload of 4 KB random reads without any cache hits in the following configurations:

- Sixteen zHPF channels, 96 x 146 GB HDDs on 6 DA pairs in 12 x 6+P RAID 5 arrays. Eighty-four HDDs are active and 12 HDDs are hot spares. The throughput of this configuration is limited to 20 000 IO/sec, at which point the HDDs have become a bottleneck.
- Thirty-two zHPF channels, 384 x 146 GB HDDs on 6 DA pairs in 48 x 6+P RAID 5 arrays. Three hundred sixty HDDs are active and 24 HDDs are hot spares. Quadrupling the HDDs and doubling the channels improve the HDD throughput to 90 000 IO/sec.
- Sixteen zHPF channels, 96 x 146 GB SSDs on 6 DA pairs in 12 x 6+P RAID 5 arrays. Eighty-four SSDs are active and 12 are hot spares. The SSDs dramatically improve the response times at lower IO/sec and raise the maximum throughput to 120 000 IO/sec.

These results show extremely good SSD response times with zHPF all the way to 120 000 IO/sec, where the DS8300 controllers become the performance bottleneck.

Figure 11  SSD compared to HDD in a large configuration, 4 KB random reads: response time (milliseconds) versus throughput (thousand IO/sec) for 16-HPF-96SSD, 16-HPF-96HDD, and 32-HPF-384HDD. (Chart extracted with permission from IBM System Storage DS8000 with SSDs: An In-Depth Look at SSD Performance in DS8000, an IBM white paper by Lee LaFrese, Leslie Sutton, and David Whitworth, Storage Systems Performance, Storage Systems Group, IBM.)

Looking ahead to the future, it is almost certain that both channels and device adapters will become faster, and solid-state drives provide the motivation for such improvements. However, by improving the efficiency of the channels, High Performance FICON can be used to reduce the number of channels needed to take full advantage of the performance capacity of solid-state drives, or to enable a given set of channels to exploit a larger disk configuration.

What is the importance of HyperPAV

HyperPAV has been available since Release 2.4 of the DS8000 and z/OS 1.8, with PTFs retrofitted to z/OS 1.6 and 1.7. HyperPAV is an enhancement to the old style of managing Parallel Access Volume (PAV) aliases. Without HyperPAV, alias unit control blocks (UCBs) are "bound" to a specific base address, and the overhead of rebinding an alias to a separate base address is not insignificant. Also, the concept of binding an alias has the potential to waste part of the addresses when two or more logical partitions (LPARs) share the storage. HyperPAV eliminates the binding of aliases to a base address. Instead, an alias is assigned to a base address only for the duration of an I/O. Furthermore, at a given point in time, various LPARs can use the same alias address for separate base addresses.

SSDs have the potential to increase the I/O rate for a system. As the I/O rate increases, there is a potential for faster changes in the I/O distribution with respect to volumes. Bigger swings can create more alias rebinds. Hence, HyperPAV is needed to combat the problems associated with alias rebinds.

HyperPAV helps eliminate IOSQ time by pushing the I/O queuing down to a lower level of the I/O subsystem, which is usually the disks. Assuming that disk performance is the biggest physical limitation of throughput, the number of UCB addresses must be chosen so that there can be enough I/O parallelism to enable high disk utilization. Allocating more UCB addresses than what is needed to achieve high disk utilization is useless (except for those cases where the physical disks are not the primary limitation of throughput). Doing so only shifts the queuing from the IOS queues to the disk queues; in other words, the value of further IOSQ reduction is largely offset by much higher disk contention.

How do SSDs change this phenomenon? First, consider that out of a typical HDD response time of 6 milliseconds, most of the time occupies both a UCB address and a disk. With SSDs, however, a much lower percentage of the response time is actually using a disk, and a greater percentage of the time is spent occupying the channels, which is also true of cache hits. Rather than needing a sufficient number of UCB addresses to drive the HDD utilization above a desired level, it is more interesting to consider how many UCB addresses are needed to drive the channel utilization to a certain desired level, such as 50%. In that case, the value of further IOSQ reduction is largely offset by higher channel contention. So, if you observe a little IOSQ time but you do not want to see higher channel utilizations, you might prefer to allow the IOSQ time to remain until you can do something to relieve the channel contention, such as employing zHPF or increasing the number of channels.
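One rough way to reason about how many addresses a workload keeps busy is Little's law: the number of concurrently active I/Os is the I/O rate multiplied by the response time. This is a generic queuing identity, not an IBM sizing rule, and the sample rates and response times below are illustrative only; the sketch simply shows why an SSD workload ties up far fewer UCB addresses per I/O than an HDD workload at the same rate.

# Rough Little's-law estimate of concurrently active I/Os, which is one way
# to reason about how many base plus alias (UCB) addresses a volume group
# keeps busy.  Generic queuing identity, not an IBM sizing rule; sample
# numbers are illustrative only.
def concurrent_ios(io_per_sec: float, response_time_ms: float) -> float:
    return io_per_sec * response_time_ms / 1000.0

print(concurrent_ios(10_000, 6.0))   # HDD-like response time: about 60 I/Os in flight
print(concurrent_ios(10_000, 0.8))   # SSD-like response time: about 8 I/Os in flight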
SSD migration

Identifying good data candidates to move to SSDs is beyond the scope of this paper, but we can discuss it briefly. In general, disconnect time is the component of I/O response time that is often a good indicator of disk seek and rotational delay. However, disconnect time is also an indicator of write delays due to synchronous remote data replication. For this reason, IBM is enhancing the SMF type 42 subtype 6 records to distinguish the disconnect time of reads and writes.

The DS8000 provides the ability to obtain cache statistics for every volume in the storage subsystem. These measurements include the count of the number of operations from DASD cache to the back-end storage, the number of random operations, the number of sequential reads and sequential writes, the time to execute those operations, and the number of bytes transferred. These statistics are placed in the SMF 74 subtype 5 record.

New z/OS SAS-based tooling analyzes the SMF 42-6 and SMF 74-5 records to produce a report that identifies data sets and volumes that can benefit from residing on SSD technology. These new tools are available for download from the z/OS downloads Web page:

http://www.ibm.com/systems/z/os/zos/downloads/flashda.html

You can obtain reports to:

- Identify page sets with high amounts of READ_ONLY_DISC time
- Analyze DASD cache statistics to identify volumes with high I/O rates
- Identify the data sets with the highest amount of write activity (number of write requests)

You can use IBM Data Mobility Services – Softek z/OS Dataset Mobility Facility (zDMF) host-based software to move allocated mainframe data sets. The DB2 REORG utility can move table spaces to volumes backed by SSD drives.
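To illustrate the kind of ranking that such a report supports, the following Python sketch sorts data sets by accumulated read disconnect time. It is only a sketch: it assumes the relevant SMF 42-6 fields have already been extracted into simple records, the field and data set names are hypothetical, and it is not the IBM FLASHDA tooling referenced above.

# Illustrative ranking of SSD candidates by read disconnect time.  This is a
# sketch only: it assumes the SMF 42-6 fields have already been extracted into
# simple records (the field names are hypothetical), and the sample data set
# names and values are invented.  It is not the IBM FLASHDA tooling.
from dataclasses import dataclass

@dataclass
class DataSetStats:
    name: str
    read_disconnect_sec: float   # accumulated read-only disconnect time
    io_count: int

def top_ssd_candidates(stats, n=3):
    """Return the n data sets with the most accumulated read disconnect time."""
    return sorted(stats, key=lambda s: s.read_disconnect_sec, reverse=True)[:n]

sample = [  # invented sample values for illustration only
    DataSetStats("DB2P.PAYROLL.TS1", 412.0, 180_000),
    DataSetStats("DB2P.HISTORY.TS7", 35.0, 400_000),
    DataSetStats("DB2P.ORDERS.TS2", 950.0, 310_000),
]
for s in top_ssd_candidates(sample):
    print(s.name, s.read_disconnect_sec)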
Does clustering still matter

There is still debate about whether clustering still matters to DB2 performance when solid-state drives are used. Certainly SSDs enable list prefetch to perform much better, but list prefetch still does not perform as well as dynamic prefetch. The reason is that, although the disk itself is insensitive to clustering, other parts of the I/O subsystem are still sensitive to it. DB2 for z/OS evolves to take into account hardware changes (CPU speed and disk drive characteristics) that affect access path selection, and the DB2 optimizer adjusts its cost modeling as needed. DB2 9 already has a statistic called DATAREPEATFACTOR, which indicates the density of data and helps in deciding whether RIDs need to be sorted for list prefetch. We think that clustering is likely to become less important as time goes on.

If clustering matters less, a corollary question is whether REORG still matters. The answer is yes. The reason most often given for doing REORGs against a DB2 table space is to reclaim space. Another reason is to rebuild compression dictionaries. A performance reason is to avoid page overflows, which occur when a row becomes larger than its original size, a situation that happens often with compressed table spaces. When such a row is selected, two synchronous I/Os can occur instead of one. Although SSDs might reduce the time for each I/O, two I/Os are still slower than one.

Conclusion

From the performance perspective, solid-state drives and High Performance FICON are both exciting new technologies that might change the landscape of enterprise storage in the future. The traditional concerns about seek and rotational delays might disappear. This study indicates that the response time for DB2 synchronous I/Os using SSDs is typically between about 250 microseconds (for a cache hit) and 800 microseconds, depending on the cache hit ratio, and that zHPF also contributes to lower response times. SSDs are another enhancement in a continuing series of major performance enhancements to the DS8000 I/O subsystem that have included high-speed channels, MIDAWs, HyperPAV, and AMP. To complement the higher throughput capability of the disk subsystem, IBM recommends High Performance FICON to increase the throughput capability of the channel subsystem. Because SSDs are a new storage technology, the industry will need more experience regarding their reliability and failure modes, but their performance for synchronous I/O shows an opportunity for all but the most sequentially dominated workloads.

References

For more information about accessing DB2 for z/OS data on solid-state drives:

- IBM United States Announcement 109-120: IBM System Storage DS8000 series (Machine types 2421, 2422, 2423, and 2424) delivers new security, scalability, and business continuity capabilities: http://www.ibm.com/common/ssi/cgi-bin/ssialias?infotype=AN&subtype=CA&htmlfid=897/ENUS109-120&appname=USN
- IBM United States Hardware Announcement 108-870: IBM System Storage DS8000 series (Machine types 2421, 2422, 2423, and 2424) delivers new functional capabilities (zHPF and RMZ resync): http://www.ibm.com/common/ssi/cgi-bin/ssialias?infotype=AN&subtype=CA&htmlfid=897/ENUS108-870&appname=USN
- Mark Mosheyedi and Patrick Wilkison, STEC, Enterprise SSDs, ACM Queue: http://mags.acm.org/queue/20080708/
- Jeffrey Berger and Paolo Bruni, How does the MIDAW facility improve the performance of FICON channels with DB2 and other workloads, IBM REDP-4201: http://www.redbooks.ibm.com/redpapers/pdfs/redp4201.pdf
- IBM System z and System Storage DS8000: Accelerating the SAP Deposits Management Workload With Solid State Drives, IBM white paper: http://www.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/WP101442
- Lee LaFrese, Leslie Sutton, and David Whitworth, Storage Systems Performance, Storage Systems Group, IBM, IBM System Storage DS8000 with SSDs: An In-Depth Look at SSD Performance in DS8000, IBM white paper (currently in draft).

Notices

This information was developed for products and services offered in the U.S.A.

IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service.

IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not give you any license to these patents.
You can send license inquiries, in writing, to: IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 U.S.A.

The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you.

This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.

Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk.

IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.

Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.

This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental.

COPYRIGHT LICENSE: This information contains sample application programs in source language, which illustrate programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs.

This document REDP-4537-00 was created or updated on December 7, 2009.

© Copyright International Business Machines Corporation 2009. All rights reserved. Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

Trademarks

IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both.
These and other IBM trademarked terms are marked on their first occurrence in this information with the appropriate symbol (® or ™), indicating US registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at http://www.ibm.com/legal/copytrade.shtml

The following terms are trademarks of the International Business Machines Corporation in the United States, other countries, or both: DB2®, DS8000®, FICON®, IBM®, Redbooks (logo)®, Solid®, System Storage™, System z®, z/OS®, z9®.

SAP and SAP logos are trademarks or registered trademarks of SAP AG in Germany and in several other countries.

Other company, product, or service names may be trademarks or service marks of others.