IBM Tivoli Storage Productivity Center V4.2.1 – Performance Management Best Practices Version 2.5 (March 7, 2011) Sergio Bonilla Xin Wang IBM Tivoli Storage Productivity Center development, San Jose, CA Second Edition (March 2011) This edition applies to Version 4, Release 2, of IBM Tivoli Storage Productivity Center © Copyright International Business Machines Corporation 2011. All rights reserved. Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp. 2 Table of Contents 1 Notices........................................................................................................................................ 4 1.1 Legal Notice........................................................................................................................ 4 1.2 Trademarks......................................................................................................................... 5 1.3 Acknowledgement .............................................................................................................. 6 1.4 Other IBM Tivoli Storage Productivity Center Publications ................................................ 6 2 IBM Tivoli Storage Productivity Center Performance Management........................................... 7 2.1 Overview ................................................................................................................................ 7 2.2 Disk Performance and Fabric Performance.......................................................................... 7 2.3 Performance Metrics............................................................................................................ 8 3 Setup and Configuration........................................................................................................... 10 3.1 Performance Data collection................................................................................................ 10 3.1.1 Adding a Device .......................................................................................................... 11 3.1.2 Create Threshold Alerts .............................................................................................. 12 3.1.3 Create Performance Monitor ................................................................................... 14 3.1.4 Check Performance Monitor Status......................................................................... 15 3.2 Retention for performance data........................................................................................ 15 3.3 Common Issues................................................................................................................ 16 3.3.1 General issues ......................................................................................................... 16 3.3.2 ESS and DS Related Issues.................................................................................... 16 3.3.3. DS4000/DS5000 Related Issues ............................................................................. 17 3.3.4 HDS Related Issues................................................................................................. 17 4. Top Reports and Graphs a Storage Administrator May Want to Run .............................. 17 4.1 Tabular Reports................................................................................................................ 
18 4.2 Drill up and Drill down....................................................................................................... 20 4.3 Historic Charts .................................................................................................................. 20 4.4 Batch Reports ................................................................................................................... 21 4.5 Constraint Violation Reports............................................................................................. 23 4.6 Top Hit Reports ................................................................................................................ 24 5. SAN Planner and Storage Optimizer................................................................................ 25 6. Summary .......................................................................................................................... 26 7. Reference ......................................................................................................................... 26 Appendix A Available Metrics ...................................................................................................... 28 Appendix B Available Thresholds................................................................................................. 57 Appendix C DS3000, DS4000 and DS5000 Metrics ................................................................... 65 3 1 Notices 1.1 Legal Notice This information was developed for products and services offered in the U.S.A. IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service. IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to: IBM Director of Licensing, IBM Corporation, North Castle Drive Armonk, NY 10504-1785 U.S.A. The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you. This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice. 
Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk. IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you. Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to nonIBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, 4 companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental. COPYRIGHT LICENSE: This information contains sample application programs in source language, which illustrates programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs. You may copy, modify, and distribute these sample programs in any form without payment to IBM for the purposes of developing, using, marketing, or distributing application programs conforming to IBM's application programming interfaces. 1.2 Trademarks The following terms are trademarks or registered trademarks of the International Business Machines Corporation in the United States or other countries or both: AIX® Passport Advantage® Tivoli Storage® DB2® pSeries® WebSphere® DS4000, DS6000, DS8000 Redbooks (logo) XIV® Enterprise Storage Server® Redbooks zSeries® server® iSeries Storwize® Tivoli® The following terms are trademarks or registered trademarks of other companies: Microsoft, Windows, Windows XP and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. UNIX is a registered trademark of the Open Group in the United States and other countries. Java, Solaris, and all Java-based trademarks and logos are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both. Intel is a registered trademark of the Intel Corporation or its subsidiaries in the United States and other countries. Linux is a trademark of Linus Torvalds in the United States, other countries, or both. CLARiiON and Symmetrix are registered trademarks of the EMC Corporation. 5 HiCommand is a registered trademark of Hitachi Data Systems Corporation. Brocade and the Brocade logo are trademarks or registered trademarks of Brocade Communications Systems, Inc., in the United States and/or in other countries. McDATA and Intrepid are registered trademarks of McDATA Corporation. Cisco is a registered trademark of Cisco Systems, Inc. 
and/or its affiliates in the U.S. and certain other countries. Engenio and the Engenio logo are trademarks or registered trademarks of LSI Logic Corporation. Other company, product, or service names may be trademarks or service marks of others.

1.3 Acknowledgement

The materials in this document have been collected from work in the IBM Tivoli Storage Productivity Center development lab and other labs within IBM, from experiences in the field at customer locations, and from contributions offered by people who have discovered valuable tips and documented the solutions. Many people have helped with the materials that are included in this document, too many to properly acknowledge here, but special thanks goes to Xin Wang, who compiled the original version of this document. It is a source of information for advanced configuration help and basic best practices for users wanting to get started quickly with Tivoli Storage Productivity Center.

1.4 Other IBM Tivoli Storage Productivity Center Publications

IBM Tivoli Storage Productivity Center 4.2.1 - Performance Management Best Practices is a supplement to the available Tivoli Storage Productivity Center publications, providing additional information to help implementers of Tivoli Storage Productivity Center with configuration questions and to provide guidance in the planning and implementation of Tivoli Storage Productivity Center. It is expected that an experienced Tivoli Storage Productivity Center installer will use this document as a supplement for installation and configuration, and use the official Tivoli Storage Productivity Center publications for overall knowledge of the installation process, configuration, and usage of the Tivoli Storage Productivity Center components. This document is not intended to replace the official Tivoli Storage Productivity Center publications, nor is it a self-standing guide to installation and configuration. You can find the entire set of Tivoli Storage Productivity Center publications at http://publib.boulder.ibm.com/infocenter/tivihelp/v4r1/index.jsp. These documents are essential to a successful implementation of Tivoli Storage Productivity Center, and should be used to make sure that you do all the required steps to install and configure Tivoli Storage Productivity Center. You should have the official publications available in either softcopy or printed form, read them, and be familiar with their content.

2 IBM Tivoli Storage Productivity Center Performance Management

2.1 Overview

There are three main functions for IBM Tivoli Storage Productivity Center performance management: performance data collection, performance thresholds/alerts, and performance reports. The product can collect performance data for devices - storage subsystems and fibre channel switches - and store the data in the database for a user-defined retention period. The product may collect performance data from either IBM devices using native-agent APIs, or IBM and non-IBM devices that are managed by CIM agents that are at least SMI-S 1.1 compliant. The product can set thresholds for important performance metrics and, when any boundary condition is crossed, can notify the user via email, SNMP, or other alerting mechanisms. Lastly, the product can generate reports and historic trend charts, and can help analyze performance bottlenecks by drilling down to the components that violated thresholds and to the affected hosts.
The combination of those functions can be used to monitor a complicated storage network environment, to predict warning signs of system fallout, and to do capacity planning as overall workload grows. The collected performance data may also be utilized by both the Storage Optimizer and SAN Planner functions. The IBM Tivoli Storage Productivity Center Standard Edition (5608-WC0) includes performance management for both subsystems and switches, while IBM Tivoli Storage Productivity Center for Disk (5608-WC4) is only for storage subsystems. IBM Tivoli Storage Productivity Center Basic Edition (5608-WB1) and IBM Tivoli Storage Productivity Center for Data (5608-WC3) do not include performance management function. 2.2 Disk Performance and Fabric Performance Performance management for subsystems is done via the disk manager. Data collection for subsystems can be scheduled under Disk Manager -> Monitoring -> Subsystem Performance Monitors, and subsystem performance reports are under Disk Manager -> Reporting -> Storage Subsystem Performance. Performance management for fibre channel switches is done via the fabric manager. Data collection for switches can be scheduled under Fabric Manager -> Monitoring -> Switch Performance Monitors, and switch performance reports are under Fabric Manager -> Reporting -> Switch Performance. Some disk performance and all fabric performance require a CTP certified CIMOM that is at least SMI-S 1.1 compliant. Devices that do not require a CIM agent for disk performance collection include some DS8000 subsystems, and all SVC, Storwize V7000, and XIV Storage subsystems. These devices use native-agent APIs introduced in IBM Tivoli Storage Productivity Center V4.2. 7 2.3 Performance Metrics IBM Tivoli Storage Productivity Center can report on various performance metrics, which indicate the particular performance characteristics of the monitored devices. Two very important metrics for storage subsystems are the throughput in I/Os per sec and the response time in milliseconds per I/O. Throughput is measured and reported in several different ways. There is throughput of an entire box (subsystem), or of each cluster (ESS) or controller (DS6000, DS8000), or of each I/O Group (SVC, Storwize V7000). There are throughputs measured for each volume (or LUN), throughputs measured at the Fibre Channel interfaces (ports) on some of the storage boxes and on fibre channel switches, and throughputs measured at the RAID array after cache hits have been filtered out. For storage subsystems, it is common to separate the available performance statistics into two separate domains, the front-end and the back-end of the subsystem. Front-end I/O metrics are a measure of the traffic between the servers and the storage subsystem, and are characterized by relatively fast hits in the cache, as well as occasional cache misses that go all the way to the RAID arrays on the back end. Back-end I/O metrics are a measure of all traffic between the subsystem cache and the disks in the RAID arrays in the backend of the subsystem. Most storage subsystems give metrics for both kinds of I/O operations, front- and back-end. We need to always be clear whether we are looking at throughput and response time at the front-end (very close to system level response time as measured from a server), or the throughput and response time at the back-end (just between cache and disk). 
The main front-end throughput metrics are: • Total I/O Rate (overall) • Read I/O Rate (overall) • Write I/O Rate (overall) The corresponding front-end response time metrics are: • Overall Response Time • Read Response Time • Write Response Time The main back-end throughput metrics are: • Total Backend I/O Rate (overall) • Backend Read I/O Rate (overall) • Backend Write I/O Rate (overall) The corresponding back-end response time metrics are: • Overall Backend Response Time • Backend Read Response Time • Backend Write Response Time It is important to remember that the response times taken in isolation of throughput rates are not terribly useful, because it is common for components which have negligible throughput rates to exhibit large (bad) response times. But in essence those bad response times are not significant to the overall operation of the storage environment if they occurred for only a handful of I/O operations. It is therefore necessary to have an understanding of which throughput and response time combinations are significant and which can be ignored. To help in this determination, IBM Tivoli Storage Productivity Center V4.1.1 introduced a metric called Volume Utilization Percentage. This metric is based on both I/O Rate and Response Time of a storage volume and 8 is an approximate measure of the amount of time the volume was busy reading and writing data. It is therefore safe to ignore bad average response time values for volumes with very low utilization percentages, and conversely, those volumes with the highest utilization percentages are the most important for the smooth operation of the storage environment and are most important to exhibit good response times. When implementing storage tiering using 10K, 15K, or even SSD drives, the most highly utilized volumes should be considered for being placed on the best performing underlying media. Furthermore, it will be advantageous to track any growth or change in the throughput rates and response times. It frequently happens that I/O rates grow over time, and that response times increase as the I/O rates increase. This relationship is what “capacity planning” is all about. As I/O rates and response times increase, you can use these trends to project when additional storage performance (as well as capacity) will be required. Depending on the particular storage environment, it may be that throughput or response times change drastically from hour to hour or day to day. There may be periods when the values fall outside the expected range of values. In that case, other performance metrics can be used to understand what is happening. Here are some additional metrics that can be used to make sense of throughput and response times. • Total Cache Hit percentage • Read Cache Hit Percentage • NVS Full Percentage • Read Transfer Size (KB/Op) • Write Transfer Size (KB/Op) Low cache hit percentages can drive up response times, since a cache miss requires access to the backend storage. Low hit percentages will also tend to increase the utilization percentage of the backend storage, which may adversely affect the back-end throughput and response times. High NVS Full Percentage (also known as Write-cache Delay Percentage) can drive up the write response times. High transfer sizes usually indicate more of a batch workload, in which case the overall data rates are more important than the I/O rates and the response times. 
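To make the point concrete, the short sketch below screens an exported By Volume report for volumes that are both busy and slow, which are the combinations worth investigating first. It is only an illustrative sketch, not part of the product: the CSV column headings, the input file name, and the numeric cut-offs are assumptions that you would adjust to match your own report export and workload.

    # hot_volumes.py - illustrative sketch only; not part of the product.
    # Assumes a CSV export of a "By Volume" performance report. The column
    # headings (including "Volume") and the cut-off values below are
    # assumptions; adjust them to match your own export.
    import csv

    IO_RATE_COL = "Total I/O Rate (overall)"       # ops/sec (assumed heading)
    RESP_COL    = "Overall Response Time"          # ms per op (assumed heading)
    UTIL_COL    = "Volume Utilization Percentage"  # percent (assumed heading)

    MIN_IO_RATE = 50.0   # ignore volumes doing fewer than 50 ops/sec
    MIN_UTIL    = 10.0   # ignore nearly idle volumes
    MAX_RESP_MS = 20.0   # flag volumes slower than 20 ms per operation

    def flag_hot_volumes(path):
        """Return volumes whose response time is high and whose load is non-trivial."""
        hot = []
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                try:
                    io_rate = float(row[IO_RATE_COL])
                    resp    = float(row[RESP_COL])
                    util    = float(row.get(UTIL_COL, "0") or 0)
                except (KeyError, ValueError):
                    continue  # skip rows with N/A or missing values
                if io_rate >= MIN_IO_RATE and util >= MIN_UTIL and resp >= MAX_RESP_MS:
                    hot.append((row.get("Volume", "?"), io_rate, resp, util))
        return sorted(hot, key=lambda v: v[2], reverse=True)

    if __name__ == "__main__":
        for name, io_rate, resp, util in flag_hot_volumes("volume_report.csv"):
            print("%s: %.0f ops/sec, %.1f ms, %.0f%% util" % (name, io_rate, resp, util))

Volumes that appear at the top of the resulting list combine significant utilization with poor response times, and so are the first candidates for deeper drill-down analysis or for placement on faster media.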
In addition to the front-end and back-end metrics, many storage subsystems provide additional metrics to measure the traffic between the subsystem and host computers, and to measure the traffic between the subsystem and other subsystems when linked in remote-copy relationships. Such fibre channel port-based metrics, primarily I/O rates, data rates, and response times are available for ESS, DS6000, DS8000, SVC, and Storwize V7000 subsystems. ESS, DS6000, and DS8000 subsystems provide additional break-down between FCP, FICON, and PPRC operations at each port. SVC and Storwize V7000 subsystems provide additional breakdown between communications with host computers, backend managed disks, and other nodes within the local cluster, as well as remote clusters at each subsystem port. XIV subsystems do not provide portbased metrics as of IBM Tivoli Storage Productivity Center V4.2.1. Similar to the Volume Utilization Percentage mentioned earlier, IBM Tivoli Storage Productivity Center V4.1.1 also introduced the Port Utilization Percentage metric (available for ESS, DS6000, and DS8000 storage subsystems). The Port Utilization Percentage is an approximate measure of the amount of time a port was busy, and can be used to identify over-utilized and under-utilized ports on the subsystem for potential port balancing. For subsystems where port utilizations are not available, the simpler Port Bandwidth Percentage metrics provide a measure of the approximate bandwidth utilization of a port, based on the port’s negotiated speed, and can be used in a similar fashion. However, beware that the Port Bandwidth Percentages can potentially provide misleading indicators of port under-utilization when ports are not under-utilized if there is a performance bottleneck elsewhere in the fabric or at the port’s communication partner. For fibre-channel switches, the important metrics are Total Port Packet Rate and Total Port Data Rate, which provide the traffic pattern over a particular switch port, as well as the Port 9 Bandwidth Percentage metrics providing indicators of bandwidth usage based on port speeds. When there are lost frames from the host to the switch port, or from the switch port to a storage device, the dumped frame rate on the port can be monitored. All these metrics can be monitored via reports or graphs in IBM Tivoli Storage Productivity Center. Also there are several metrics for which you can define thresholds and receive alerts when measured values do not fall within acceptable boundaries. Some examples of supported thresholds are: • Total I/O Rate and Total Data Rate Thresholds • Total Backend I/O Rate and Total Backend Data Rate Thresholds • Read Backend Response Time and Write Backend Response Time Thresholds • Total Port I/O Rate (Packet Rate) and Data Rate Thresholds • Overall Port Response Time Threshold • Port Send Utilization Percentage and Port Receive Utilization Percentage Thresholds • Port Send Bandwidth Percentage and Port Receive Bandwidth Percentage Thresholds Please see Appendix A for a complete list of performance metrics that IBM Tivoli Storage Productivity Center supports and Appendix B for a complete list of thresholds supported. The important thing is to monitor the throughput and response time patterns over time for your environment, to develop an understanding of normal and expected behaviors. Then you can set threshold boundaries to alert you when anomalies to the expected behavior are detected. 
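One practical way to establish such a base-line is to export a historic report for the metric of interest (see section 4.3) and derive candidate boundary values from the observed distribution. The following sketch shows one possible approach; it is not a product feature, and the column heading, file name, and percentile choices are assumptions rather than recommended values - appropriate boundaries still depend on your environment.

    # baseline_thresholds.py - illustrative sketch for deriving candidate threshold
    # boundaries from exported history; not a product feature. The column heading
    # and the percentile choices are assumptions, not recommended values.
    import csv
    import statistics

    METRIC_COL = "Total I/O Rate (overall)"  # assumed column heading in the export

    def percentile(values, pct):
        """Nearest-rank percentile (pct between 0 and 100) of a list of numbers."""
        ordered = sorted(values)
        rank = int(round(pct / 100.0 * len(ordered)))
        return ordered[max(0, min(len(ordered) - 1, rank - 1))]

    def suggest_boundaries(path):
        samples = []
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                try:
                    samples.append(float(row[METRIC_COL]))
                except (KeyError, ValueError):
                    continue  # skip N/A or missing values
        if not samples:
            return None
        return {
            "baseline mean":   statistics.mean(samples),
            "warning stress":  percentile(samples, 95),  # heuristic choice
            "critical stress": percentile(samples, 99),  # heuristic choice
            "warning idle":    percentile(samples, 5),   # heuristic choice
        }

    if __name__ == "__main__":
        print(suggest_boundaries("subsystem_history.csv"))

The resulting values are only starting points to enter on the Alert tab described in section 3.1.2; boundaries you are not interested in can simply be left blank.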
You can also use the performance reports and graphs to investigate any deviations from normal patterns and to chart trends in workload changes.

3 Setup and Configuration

3.1 Performance Data collection

Performance data may be collected either directly from select device types using native-agent APIs or from devices managed by a CIM agent (CIMOM) using SMI-S interfaces. Devices that do not require a managing CIM agent include DS8000, SVC, Storwize V7000, and XIV subsystems. Devices that require a CIM agent include non-IBM subsystems and switches, ESS subsystems, DS4000 and DS5000 subsystems, and DS6000 subsystems. XIV subsystems require version 10.1 or higher to collect performance data. For devices that require the use of a CIM agent, you need to make sure all of the following prerequisites are met before adding the CIM agent to TPC:

• The version of the CIMOM and the firmware for the device are supported.
• A CIMOM is installed in the environment, either as a proxy on another server or embedded on the device it manages.
• For subsystems or switches on a private network, be sure to have the CIMOM installed on a gateway machine so the IBM Tivoli Storage Productivity Center server on a different network can communicate with it.
• The CIMOM is configured to manage the intended device.

See IBM TotalStorage Productivity Center V3.2 Hints and Tips on how to install and configure a CIMOM:
http://www-1.ibm.com/support/docview.wss?rs=597&context=STCRLM4&context=SSMMUP&dc=DB500&uid=swg21236488&loc=en_US&cs=utf-8&lang=en

The following steps cover setup and configuration for both subsystems and switches.

3.1.1 Adding a Device

You may add a device from which to collect performance data to TPC in any of the following ways:
1. Launching the IBM Tivoli Storage Productivity Center -> Configure Devices panel.
2. Clicking the Add Storage Subsystem button on the Disk Manager -> Storage Subsystems panel.
3. Clicking the Add Fabric button on the Fabric Manager -> Fabrics panel.
4. Clicking the Add button on the Administrative Services -> Data Sources -> Storage Subsystems panel.
5. Clicking the Add CIMOM button on the Administrative Services -> Data Sources -> CIMOM Agents panel.

The first four options launch the "Configure Devices" wizard at different stages of the configuration process. Adding a device using the first option allows the user to configure either a storage subsystem or a fabric/switch to be used for future performance monitors. The second and fourth options automatically choose the "Storage Subsystem" option and advance to a panel that allows you to choose the subsystem type, while the third option automatically chooses the "Fabric/Switches" option. Going through the first panel also allows the user to choose to add and configure a new device or to configure a previously discovered device. Adding a storage subsystem allows you to choose between configuring an IBM DS8000, IBM XIV, IBM SAN Volume Controller/IBM Storwize V7000, or other storage subsystems managed by a CIM agent. Adding a fabric/switch allows you to choose between configuring a Brocade, McData, Cisco, QLogic, or Brocade/McData device. The "Configure Devices" panel updates depending on the device chosen. Fields with labels in bold are required in order to connect to the chosen device type. If you are configuring "other" devices, read the CIMOM documentation to get all the information required to connect to the CIMOM.
In order to collect performance data for fabrics, you must choose to configure a CIMOM agent for the device, instead of configuring an out of band fabric agent. Configuring a device using these wizards will perform the necessary discovery and initial probe of the device required in order to run a performance data collection against the device. 11 3.1.2 Create Threshold Alerts A performance threshold is a mechanism by which you can specify one or more boundary values for a performance metric, and can specify to be notified if the measured performance data for this metric violates these boundaries. Thresholds are applied during the processing of performance data collection, so a performance monitor for a device must be actively running for a threshold to be evaluated and a violation to be recognized The Tivoli Storage Productivity Center ships with several default thresholds enabled (see Appendix B for a full list of thresholds supported) that do not change much with the environment, but metrics such as throughput and response time can vary a lot depending on the type of workload, model of hardware, amount of cache memory etc. so that there are no recommended values to set. Boundary values for these thresholds have to be determined in each particular environment by establishing a base-line of the normal and expected performance behavior for the devices in the environment. After the base-line is determined, thresholds can then be defined to trigger if the measured performance behavior falls outside the normally expected range. Thresholds are device type and component type specific, meaning that each threshold may apply to only a subset of all supported device types and to only a subset of supported component types for each device type. Every threshold is associated with a particular metric; checking that metric’s value at each collection interval determines whether the threshold is violated or not. To create an alert for subsystem thresholds, go to Disk Manager -> Alerting -> Storage Subsystem Alerts, right-click to select create storage subsystem alert (see Figure 1): • Alert tab – In the triggering condition area, select from the drop down list a triggering condition (threshold alerts have names ending with “Threshold”), ensure that the threshold is enabled via the checkbox at the top of the panel, and then enter the threshold boundary values for the desired boundary conditions. Tivoli Storage Productivity Center V4.1.1 and above allows decimal values in the threshold boundary values, prior versions only allow integer values. • Alert tab – Some thresholds are associated with an optional filter condition, which is displayed in the triggering condition area. If displayed, you can enable or disable the filter, and if enabled, can set the filter boundary condition. If the filter condition is triggered, any violation of this threshold will be ignored when the filter is enabled. • Alert tab – In the alert suppression area, select whether to trigger alerts for both critical and warning conditions or only critical conditions or not to trigger any alerts. The suppressed alerts will not alert log entries or cause any action to be taken as defined in the triggered action area, but they will still be visible in the constraint violation reports. • Alert tab – In the alert suppression area, select whether to suppress repeating alerts. 
You may either suppress alerts until the triggering condition has been violated continuously for a specified length of time or to suppress subsequent violations for a length of time after the initial violation. Alerts suppressed will still be visible in the constraint violation reports. • Alert tab – In the triggered action area, select one of the following actions: SNMP trap, TEC/OMNIbus event, login notification, Window’s event log, run script, or email. • Storage subsystem tab – move the subsystem(s) you want to monitor into the righthand panel (Selected subsystems). Make sure these are the subsystems for which you will define performance monitors • Save the alert with a name 12 Figure 1. Threshold alert creation panel for storage subsystems. To create an alert for switch thresholds, go to Fabric Manager -> Alerting -> Switch Alerts, right-click to select create switch alert, and follow the same steps as for subsystems described above. There are a few points that need to be addressed in order to understand threshold settings: 1. There are two types of boundaries for each threshold, the upper boundary (stress) and lower boundary (idle). When a metric’s value exceeds the upper boundary or is below the lower boundary, it will trigger an alert. 2. There are two levels of alerts, warning and critical. The combination of boundary type and level type generates four different threshold settings: critical stress, warning stress, warning idle and critical idle. Most threshold values are in descending order (critical stress has the highest value that indicates high stress on the device, and critical idle has the lowest value) while Cache Holding Time is the only threshold in ascending order. 3. If the user is only interested to receive alerts for certain boundaries, the other boundaries should be left blank. The performance manager will only check boundary conditions with input values, therefore no alerts will be sent for the condition that is left blank. 4. The storage subsystem alerts will be displayed under IBM Tivoli Storage Productivity Center -> Alerting -> Alert logs -> All, as well as under Storage Subsystem. Another important way to look at the exception data is to look at constraint violation reports. This is described in section 4.4. 13 3.1.3 Create Performance Monitor A performance data collection is defined via a mechanism called a monitor job, and that can be run manually (immediately), can be scheduled for one-time execution, or can be scheduled for repeated execution, as desired. Only after the device has been probed successfully, can a monitor job be run successfully. To create a performance monitor on a subsystem, go to Disk Manager -> Monitoring -> Subsystem Performance Monitors, right-click to select create performance monitor. • Storage Subsystem tab - move the subsystem(s) you want to monitor into the right-hand panel (Selected subsystems) • Sampling and scheduling tab – enter how frequently the data should be collected and saved (the smaller the interval length, the more granular the performance data), when the monitor will be run and the duration for the collection • Save the monitor with a name To create a performance monitor for a switch, go to Fabric Manager -> Monitoring -> Switch Performance Monitors, follow the same steps as above, substituting storage subsystem with switch. The monitor will start at the scheduled time, but the performance sample data will be collected a few minutes later. 
For example, if your monitor is scheduled to start at 9 am to collect with an interval length of 5 minutes, the first performance data might be inserted into the database 10-15 minutes later, and the second performance data will be inserted after 5 more minutes. Only after the first sample data is inserted into the database, in this case, around 9:10 or 9:15 am, you will be able to view the performance reports. Because of this, there are some best practice information related how to set up the schedule and duration for a performance monitor: 1. Monitor duration – if a monitor is intended to run for a long time, choose to run it indefinitely. The performance monitor is optimized such that running indefinitely will be more efficient than running, stopping, and starting again. 2. You should only have one performance monitor defined per storage device. 3. Prior to v4.1.1, if you want to run the same monitor at different workload periods, set the duration to be 1 hour less than the difference between the two starting points. This gives the collection engine one hour to finish up the first collection and shutdown properly. For example, if you want to start a monitor at 12 am and 12 pm on the same day, the duration for the 12 am collection has to be 11 hours or less, so the monitor can start again at 12 pm successfully. The same is true for a repeated run. If you want to run the same monitor daily, be sure the duration of the monitor will be 23 hours or less. If you want to run the same monitor weekly, the duration of the monitor will need to be 7x24 -1 = 167 hours or less. 4. Later versions of IBM Tivoli Storage Productivity Center no longer have the limitation of running back-to-back performance monitors for the same device with a 1 hour reduction of the duration required. To avoid overlapping performance monitors for back-to-back monitors, v4.1.1 and higher will automatically reduce the duration of the preceding run by one interval period to allow sufficient time for the monitor to end. There is no data loss, as the subsequent monitor will retain the previous interval’s performance data for the next delta. During a performance sample collection, the hourly and daily summary for each performance metric are computed based on the sample data. The summary data reflects the performance characteristics of the component over certain time periods while the sample data shows the performance right at that moment. 14 One more thing to notice for a performance monitor and the sample data: the clock on the server might be different from that on the device. The performance monitor always uses device time on the sample it collects, then converts it into the time zone (if it’s different) of the IBM Tivoli Storage Productivity Center server. 3.1.4 Check Performance Monitor Status When the performance monitor job starts to run, you begin to collect performance data for the device. You should check the status of the monitor job, make sure it runs and continues running. Expand on Subsystem Performance Monitors, right click on the monitor, select Job History, and check the status of the job you want to view. Alternatively, you may navigate to IBM Tivoli Storage Productivity Center -> Job Management and find the performance job in the list of jobs. If the status is blue, the monitor is still running without issues. If the status is yellow, you can check out the warning messages. The monitor will continue to run with warning messages. 
For example, if there are "missing a sample data" warning messages, the monitor will continue to run; only if the monitor misses all of the data it should collect will the status turn red and the monitor fail. If the status is green, the monitor completed successfully. To view the job log, select the performance job from the list of scheduled jobs. The runs of the particular job will be listed in the bottom panel. Expand the run you are interested in, select the job, and click the View Log File(s) button. Normally the job log will have error messages logged for a failed collection. There are a few common issues that may lead to a failed data collection. See section 3.3 for details.

3.2 Retention for performance data

After the monitor is created, the user should configure the retention of performance data in the database. Expand Administrative Services -> Configuration -> Resource History Retention; under Performance Monitors there are three options: retention for collected sample data (labeled "per performance monitoring task"), retention for aggregated hourly data, and retention for daily data. Sample data is the data that is collected at the specified interval length of the monitor, for example data collected every 5 minutes. The default retention period for sample data is 14 days. This means that by default, IBM Tivoli Storage Productivity Center keeps the individual 5-minute samples for 14 days before they are purged. Individual samples are summarized into hourly and daily data; for example, 12 of the 5-minute samples are summarized into an hourly performance data record, and 288 such samples are summarized into a daily performance data record. The default retention periods for hourly and daily data are 30 days and 90 days, respectively. You can change all of those values based on your need to retain historical performance information, but be aware of the implication for the size of the IBM Tivoli Storage Productivity Center database if performance data is kept longer, especially the sample data.

Here are a few formulas the user can use to estimate the size of the performance data. For subsystems, the most numerous component is the volume, so the largest portion of the performance sample data will be that of volumes. For switches, the performance data is proportional to the number of ports in a switch. Assuming:

NumSS = number of subsystems
NumSw = number of switches
NumV = average number of volumes in a subsystem
NumPt = average number of ports in a switch
CR = number of samples collected per hour (for a sample interval of 5 minutes, this is 60/5 = 12 samples)
Rs = retention for sample data in days
Rh = retention for hourly summarized data in days
Rd = retention for daily summarized data in days

The estimated space required may be calculated using the following formulas:

Storage subsystem performance sample data size = NumSS * NumV * CR * 24 * Rs * 200 bytes
Storage subsystem performance aggregated data size = NumSS * NumV * (24 * Rh + Rd) * 200 bytes
Switch performance sample data size = NumSw * NumPt * CR * 24 * Rs * 150 bytes
Switch performance aggregated data size = NumSw * NumPt * (24 * Rh + Rd) * 150 bytes
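For convenience, the following sketch simply implements the estimation formulas above. The per-record sizes (200 bytes for subsystem records and 150 bytes for switch records) come from the formulas; the example input values at the bottom are made up and should be replaced with the counts and retention settings from your own environment.

    # perf_db_sizing.py - sketch implementing the space-estimation formulas above.
    # The example inputs are illustrative only; substitute your own values.

    def subsystem_sizes(num_ss, num_v, interval_min, rs, rh, rd):
        """Return (sample_bytes, aggregated_bytes) for storage subsystem data."""
        cr = 60 // interval_min                          # samples collected per hour
        sample = num_ss * num_v * cr * 24 * rs * 200     # 200 bytes per record
        aggregated = num_ss * num_v * (24 * rh + rd) * 200
        return sample, aggregated

    def switch_sizes(num_sw, num_pt, interval_min, rs, rh, rd):
        """Return (sample_bytes, aggregated_bytes) for switch data."""
        cr = 60 // interval_min
        sample = num_sw * num_pt * cr * 24 * rs * 150    # 150 bytes per record
        aggregated = num_sw * num_pt * (24 * rh + rd) * 150
        return sample, aggregated

    if __name__ == "__main__":
        gib = 1024.0 ** 3
        # Example only: 4 subsystems with 1000 volumes each, 2 switches with
        # 64 ports each, 5-minute samples, default retention of 14/30/90 days.
        ss_sample, ss_agg = subsystem_sizes(4, 1000, 5, 14, 30, 90)
        sw_sample, sw_agg = switch_sizes(2, 64, 5, 14, 30, 90)
        total = ss_sample + ss_agg + sw_sample + sw_agg
        print("Subsystem sample data:     %6.2f GiB" % (ss_sample / gib))
        print("Subsystem aggregated data: %6.2f GiB" % (ss_agg / gib))
        print("Switch sample data:        %6.2f GiB" % (sw_sample / gib))
        print("Switch aggregated data:    %6.2f GiB" % (sw_agg / gib))
        print("Estimated total:           %6.2f GiB" % (total / gib))

Because the sample-data term is multiplied by both the number of samples per hour and by 24, extending the sample retention period grows the repository much faster than extending the hourly or daily retention, which is why the sample retention value deserves the most scrutiny.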
3.3 Common Issues

There are a few known issues that may lead to failed performance data collection. Most of them are related to the configuration of devices or the environment. Here are a few hints and tips on those known issues:

3.3.1 General issues

Invalid data returned by either the firmware or the CIMOM of a managed device may cause a data spike during a polling interval. These spikes will skew the results of averaged and aggregated data, resulting in unreliable performance data that can also reduce the effectiveness of the Storage Optimizer. To alleviate this, code was introduced in version 3.1.3 to detect when a data spike has occurred, so that the performance data for that polling interval is not inserted into the database. When a spike is detected, a message indicating that the polling interval was skipped is written to the performance monitor's job log.

3.3.2 ESS and DS Related Issues

Any firewalls between the ESS CIMOM host server and the ESS subsystem should be configured to allow LIST PERFSTATS traffic through. If this is not possible, then both the ESS CIMOM host server and the ESS subsystem must be on the same side of any existing firewalls. In addition, all IP ports above 1023 on the CIMOM server must be open to receive performance data from the ESS.

The port bandwidth usage percentage for DS6000 subsystems may be displayed as "N/A" in the reports. This is due to the port speeds not being available from the device. The DS6000 CIMOM may be upgraded to version 5.4.0.99 to reduce the likelihood of this problem.

The storage pools of DS8000 subsystems containing space efficient volumes will have incomplete performance data collected. The performance manager is unable to determine whether the space efficient volumes are fully allocated, making it impossible to manage the performance for the ranks, the arrays associated with those ranks, and the device adapters associated with those arrays, since it cannot determine the performance impact of those volumes. Rather than present the user with inaccurate and misleading information, the performance manager will not aggregate the volumes' metrics to the higher level components.

3.3.3 DS4000/DS5000 Related Issues

Both clusters need to be defined to the Engenio CIMOM. If only one cluster of the DS4000 or DS5000 is defined to the Engenio CIMOM, performance data will be collected only for that one cluster, while volume and other component information is still collected for both clusters.

3.3.4 HDS Related Issues

Tivoli Storage Productivity Center 4.1.1 is capable of collecting some performance statistics from HDS subsystems with HDvM 6.2, but there are currently known limitations to the performance metrics being returned. As such, Tivoli Storage Productivity Center 4.1.1 does not claim support for monitoring HDS subsystems with HDvM 6.2. For more information regarding this limitation, please see: http://www-01.ibm.com/support/docview.wss?uid=swg21406692

4. Top Reports and Graphs a Storage Administrator May Want to Run

After performance data is collected for the subsystems and switches, there are a few ways to use the data and interpret key metrics: tabular reports, charts, drilling down to problematic components, reviewing constraint violation reports, or exporting the data for customized reports. The following table describes the performance reports supported for each device type in IBM Tivoli Storage Productivity Center 4.1.1. Note that not all SMI-S BSP subsystems support each report type. For example, only certain versions of DS3000, DS4000, and DS5000 return performance data that can be displayed in the By Controller and By Port reports. Please see Appendix C for a list of metrics and reports supported by DS3000, DS4000, and DS5000.
Device Type: ESS, DS6000, DS8000
Performance Report Types: By Subsystem, By Controller, By Array, By Volume, By Port

Device Type: SVC, Storwize V7000
Performance Report Types: By Subsystem, By Volume, By I/O Group, By Module/Node, By Managed Disk Group, By Managed Disk, By Port

Device Type: XIV
Performance Report Types: By Subsystem, By Volume, By Module/Node

Device Type: SMI-S BSP (e.g. DS4000, DS5000)
Performance Report Types: By Subsystem, By Controller, By Volume, By Port

Device Type: Switch
Performance Report Types: By Switch, By Port

4.1 Tabular Reports

The most straightforward way to view performance data is to go through the corresponding component manager's reporting function to view the most recent data. For subsystem performance reports, go to Disk Manager -> Reporting -> Storage Subsystem Performance, then select one of the options to view the data. The report type options, as shown in Figure 2, are:

• By Subsystem – for box-level aggregates/averages for ESS, DS, SVC, Storwize V7000, XIV, and SMI-S BSP
• By Controller – for ESS clusters, and DS and select SMI-S BSP controllers
• By Array – for ESS and DS arrays
• By Volume – for ESS, DS, XIV, and SMI-S BSP volumes/LUNs, and SVC and Storwize V7000 vdisks
• By Port – for ESS, DS, SVC, Storwize V7000, and SMI-S BSP FC ports on the storage box
• By I/O Group – for SVC and Storwize V7000 I/O groups
• By Node – for SVC and Storwize V7000 nodes
• By Managed Disk Group – for SVC and Storwize V7000 managed disk groups
• By Managed Disk – for SVC and Storwize V7000 managed disks

Figure 2. IBM Tivoli Storage Productivity Center 4.1.1 Performance Reports Options for Disk Manager.

On the right-hand panel, the available metrics for the particular type of device are in the included columns, and the user can remove any metrics that should not be included in the performance report. The user can also use the selection button to pick components specific to the device type, and use the filter button to define criteria of their choosing. Select the "display latest performance data" option to generate a report on the most recent data. Historic reports can be created by choosing either a date/time range or by defining how many days in the past to include in the report. You may display either the latest sample, hourly, or daily data for either the latest or historic reports. If the selection is saved, this customized report will show up under IBM Tivoli Storage Productivity Center -> Reporting -> My Reports -> [Admin]'s Reports. [Admin] here is the login name used to define the report. See more information regarding this topic in the IBM Tivoli Storage Productivity Center 4.1.1 User's Guide.

For switch performance reports, go to Fabric Manager -> Reporting -> Switch Performance. A report may be created in a similar fashion to a subsystem report. The supported report types are:
• By Switch
• By Port

4.2 Drill up and Drill down

Based on the latest sample data, drill up and drill down can be done between different components for ESS, DS8000, DS6000, SVC, Storwize V7000, and XIV. For ESS/DS8000/DS6000, a user can drill down through the reports along the path "By Controller" -> "By Array" -> "By Volume" by clicking the magnifying glass icon next to a row; drill up works in the reverse direction. For example, by looking at the performance of a volume, you can drill up to the performance of the underlying array to see if there is more information; and while you are at the performance report for an array, you can drill down to all the corresponding volumes to see which volume is imposing significant load on the array. For SVC and Storwize V7000, a user can drill down to reports in this path: "by Mdisk Group" -> "by Mdisk".
Historic Charts A historic chart can be created by clicking the chart icon at the top of the tabular report and choosing “history chart” on the panel that is displayed. The history chart may be created for all the rows from the previous tabular report or for a selected subset chosen prior to clicking the chart icon. One or more metrics may be chosen to display in the report, as well as whether to sort the chart by the metric(s) chosen or by component. Once the historic chart is generated, you may modify the date and time ranges included in the chart and click “generate chart” again. You may view the trend of the data by clicking the “show trends” button. The historic data in tabular form may not always be available in older versions of IBM Tivoli Storage Productivity Center; however, in those cases the report can be exported into different formats for analysis in other tools once the “by sample” history chart is displayed. Click on the File option, there are various print and export options. You can print the graph to a format such as html, pdf etc. and export the data into CSV file for archiving or input to a spreadsheet. It’s desirable to track growth in the I/O rate and response time for a particular subsystem or switch using the historic chart. Also, retention of the performance data in the IBM Tivoli Storage Productivity Center database is limited, and eventually it will be purged out of the database. It is important to develop a set of graphs and reports to summarize and visualize the data, and keep periodic snapshots of performance. The key is to monitor normal operations with key metrics, develop an understanding of expected behaviors, and then track the behavior for either performance anomalies or simple growth in the workload. See Figure 3 for an example of the throughput chart (Total I/O Rate) for a DS8000. This data is an hourly summary of I/O rate for this DS8000 for past day. This data can also easily be exported into other format for analysis. 20 Figure 3. Historic chart on Total I/O rate for a DS8000. Batch Reports Another way to backup the performance data is to use batch reports. This saves a report into a file on a regular scheduled basis. You can create a batch report by going to IBM Tivoli Storage Productivity Center -> Reporting -> My Reports, right-click on the Batch Reports node, and select Create Batch Report. In order to create batch reports, a data agent must be installed locally or remotely. Starting with IBM Tivoli Storage Productivity Center V4.1, fix pack 1, the data agent will only be available with the Data and Standard Edition. Prior to installing a data agent, an agent manager must be installed, and the device and data servers must have been registered with the agent manager. For additional information regarding installing the data agent or agent manager see: http://publib.boulder.ibm.com/infocenter/tivihelp/v4r1/index.jsp?topic=/com.ibm.tpc_V411.doc/fqz 0_t_installing_agents.html When creating a batch report, the Report tab allows you to choose the type of performance report for the batch job (see Figure 4). Select either a Storage Subsystem Performance report or Switch Performance report under their respective nodes in the Available list and click the “>>” button. Only one performance report may be chosen per batch report job. Once a performance report type has been chosen, the Selection tab will be populated with the available metric columns for that report type. 
The panel is similar to the tabular report panel in section 4.1 (see Figure 2) and features the same options. On the Options tab, select which agent to run the report on (this will determine the location of the output file), and choose the type of output file to generate (see Figure 5), such as a CSV file that may be imported into a spreadsheet or an HTML file. Then choose when and how often you want this job to run on the When to Run tab. Then save the batch report. When the batch report is run, the file location is described in the batch job’s log. 21 For additional information regarding batch reports, see the IBM Tivoli Storage Productivity Center V4.1.1 info center: http://publib.boulder.ibm.com/infocenter/tivihelp/v4r1/index.jsp?topic=/com.ibm.tpc_V411.doc/fqz 0_c_batch_reports.html Figure 4. Choose a performance report for the batch reporting job. 22 Figure 5. Choose the agent to run the report on and the type of output to generate. Constraint Violation Reports Another important way to view performance data is through constraint violation reports. For each device type, there are only certain metrics you can set thresholds. See Figure 5 for constraint violation options. Go to Disk Manager -> Reporting –> Storage Subsystem Performance -> Constraint Violations, all subsystems with thresholds violated will show up in the first general report. Similar constraint violation reports exist for switches. Go to Fabric Manager -> Reporting –> Switch Performance -> Constraint Violations, you will get switch constraint violation reports. It is very important you set meaningful threshold values that the constraint report can be used to diagnose problems in your environment. The first report shows the number of violations for each subsystem during the last 24 hours. If the normal behavior pattern is studied and the threshold values truly reflect an abnormal condition, the number of violations will indicate the severity of the problem on the subsystem. This can be used daily to monitor all the devices and to analyze the trend in your environment. You can also drill down from the subsystem level to get details on the violations on each component from this general report. In the detailed report panel, you can click on the Affected Volumes tab to generate a report showing details on the affected volumes. Under Volumes, select whether you want the report to show all volumes or only the most active volumes associated with the subsystem and component. Under Performance data, select whether you want the report to display performance data for the volumes. The user can also click on the Affected Hosts tab to generate a report showing details on the affected hosts for ESS/DS6000/DS8000. The volume report here will show the user which host is affected by this threshold violation. In the meantime, a historic graph can be generated based on constraint violation by clicking on the chart icon. All the options described in section 4.3 exist here too. 23 Figure 4. Constraint Violation Reports Options Top Hit Reports One more way to view performance reports for subsystem devices is to look at top 25 volumes with highest performance metrics (for cache hit, this will be the lowest). 
Here are the reports available for subsystems under IBM Tivoli Storage Productivity Center -> Reporting -> System Reports –> Disk: • Top Active Volumes Cache Hit Performance • Top Volumes Data Rate Performance • Top Volumes Disk Performance • Top Volumes I/O Rate Performance • Top Volumes Response Performance For example, the Top Volumes I/O Rate Performance report will show the 25 busiest volumes by I/O rate. The main metrics shown in this report are: • Overall read/write/total I/O rate • Overall read/write/total data rate • Read/write/overall response time Similar top hit reports are available for switches under IBM Tivoli Storage Productivity Center > Reporting –> System Reports –> Fabric: • Top Switch Ports Data Rate Performance • Top Switch Ports Packet Rate Performance 24 Figure 5. Top hits reporting choices. These reports will help the user to look up quickly the top hit volumes/ports for bottleneck analysis. One caveat is that these top reports are based on the latest sample data, and in some cases, may not reflect the problem on a component over a certain period. For example, if the daily average I/O rate is high for a volume but the last sample data is normal, this volume may not show up on the top 25 reports. Another complication in storage performance data is data wrap, that is, from one sample interval to the next, the metric value may appear extremely large. This will also skew these top reports. It is also possible to see some volumes in these reports with low or no I/O (0 or N/A values for their metrics) if fewer than 25 volumes have high I/O. There are also other predefined reports. Under the same node “System Reports -> Disk”, it has reports such as “Array Performance” and “Subsystem Performance”. Those predefined reports are provided with the product which shows also the latest sample data. 5. SAN Planner and Storage Optimizer The SAN Planner function is available for ESS, DS6000, DS8000, SVC and Storwize V7000 systems in IBM Tivoli Storage Productivity Center 4.2.1. This is an approach to the automation of storage provisioning decisions using an expert advisor designed to automate decisions that could be made by a storage administrator with time and information at hand. The goal is to give good advice using algorithms that consider many of the same factors that an administrator would when deciding where best to allocate storage. A performance advisor must take several factors into account when recommending volume allocations: 25 • Total amount of space required • Minimum number, maximum number, and sizes of volumes • Workload requirements • Contention from other workloads Only subsystems that have been discovered and probed will show up in the SAN Planner. To use the SAN Planner, the user needs to define the capacity and the workload profile of the new volumes to be allocated. A few standard workload profiles are provided. Once performance data has been collected, you can use historic performance data to define a profile to be used for new volumes whose workloads will be similar to some existing volumes. See the following link for more information: IBM Tivoli Storage Productivity Center User’s Guide, Chapter 4. 
Managing storage resources, under the section titled "Planning and modifying storage configurations":
http://publib.boulder.ibm.com/infocenter/tivihelp/v4r1/topic/com.ibm.tpc_V411.doc/fqz0_usersguide_v411.pdf

While the SAN Planner tries to identify the RAID arrays or pools with the least workload in order to recommend where to create new volumes, the Storage Optimizer examines existing volumes to determine whether there are any performance bottlenecks. The Storage Optimizer then goes through several scenarios to determine whether the bottlenecks can be eliminated by moving the problem volumes to other pools. The Storage Optimizer supports ESS, DS8000, DS6000, DS4000, SVC, and Storwize V7000 in IBM Tivoli Storage Productivity Center 4.2.1. Additional information regarding the Storage Optimizer may be found in the IBM Tivoli Storage Productivity Center User's Guide, Chapter 4. Managing storage resources, under the section titled "Optimizing storage configurations".

6. Summary
This paper gives an overview of the performance monitoring and management functions that can be achieved using IBM Tivoli Storage Productivity Center 4.2.1. It lays out the configuration steps necessary to start a performance monitor, set thresholds, and generate useful reports and charts for problem diagnostics. It also discusses how to interpret a small number of performance metrics. The reporting of those metrics can form the foundation for capacity planning and performance tuning.

7. Reference
IBM Tivoli Storage Productivity Center Support Site
http://www-01.ibm.com/software/sysmgmt/products/support/IBMTotalStorageProductivityCenterStandardEdition.html

IBM Tivoli Storage Productivity Center Information Center
http://publib.boulder.ibm.com/infocenter/tivihelp/v4r1/index.jsp

IBM Tivoli Storage Productivity Center Installation and Configuration Guide
http://publib.boulder.ibm.com/infocenter/tivihelp/v4r1/topic/com.ibm.tpc_V411.doc/fqz0_t_installing_main.html

IBM Tivoli Storage Productivity Center V4.1.1 User's Guide
http://publib.boulder.ibm.com/infocenter/tivihelp/v4r1/topic/com.ibm.tpc_V411.doc/fqz0_usersguide_v411.pdf

IBM Tivoli Storage Productivity Center V4.1.1 Messages
http://publib.boulder.ibm.com/infocenter/tivihelp/v4r1/index.jsp?topic=/com.ibm.tpc_V411.doc/tpcmsg41122.html

IBM TotalStorage Productivity Center V3.1 Problem Determination Guide
http://publib.boulder.ibm.com/infocenter/tivihelp/v4r1/topic/com.ibm.itpc.doc/tpcpdg31.htm

IBM TotalStorage Productivity Center V3.3.2/4.1 Hints and Tips Guide
http://www-01.ibm.com/support/docview.wss?rs=40&context=SSBSEX&context=SSMN28&context=SSMMUP&context=SS8JB5&context=SS8JFM&dc=DB700&dc=DA4A10&uid=swg27008254&loc=en_US&cs=utf-8&lang=en

IBM TotalStorage Productivity Center V3.3 SAN Storage Provisioning Planner White Paper
ftp://ftp.software.ibm.com/common/ssi/sa/wh/n/tsw03026usen/TSW03026USEN.PDF

IBM Tivoli Storage Productivity Center V4.1 Storage Optimizer White Paper
http://www-01.ibm.com/support/docview.wss?uid=swg21389271

SAN Storage Performance Management Using TotalStorage Productivity Center Redbook
http://www.redbooks.ibm.com/redpieces/abstracts/sg247364.html?Open

Supported Storage Products
http://www-01.ibm.com/support/docview.wss?rs=40&context=SSBSEX&q1=subsystem&uid=swg21384734&loc=en_US&cs=utf-8&lang=en

Supported Fabric Devices
http://www-01.ibm.com/support/docview.wss?rs=40&context=SSBSEX&dc=DA420&dc=DA480&dc=DA490&dc=DA430&dc=DA410&dc=DB600&dc=DA400&dc=D600&dc=D700&dc=DB520&dc=DB510&dc=DA500&dc=DA470&dc=DA4A20&dc=DA460&dc=DA440&dc=DB550&dc=DB560&dc=DB700&dc=DB530&dc=DA4A10&dc=DA4A30&dc=DB540&q1=switch&uid=swg21384219&loc=en_US&cs=utf-8&lang=en
Support for Agents, CLI, and GUI
http://www-01.ibm.com/support/docview.wss?rs=40&context=SSBSEX&uid=swg21384678&loc=en_US&cs=UTF-8&lang=en

Appendix A Available Metrics
This table lists the metric name, the types of components for which each metric is available, and a description. The SMI-S BSP device type mentioned in the table below refers to any storage subsystem that is managed via a CIMOM that supports SMI-S 1.1 with the Block Server Performance (BSP) subprofile. Metrics that require specific versions of IBM Tivoli Storage Productivity Center are noted in parentheses. IBM Tivoli Storage Productivity Center V4.2 and higher supports XIV version 10.2.2 and higher. Limited metrics are supported for XIV version 10.1, including total I/O rate, total data rate, overall response time, and overall transfer size. The read and write components of these metrics, as well as read, write, and total cache hits, are provided by XIV version 10.2.2.

Metric Metric Type Device/Component Type Description ESS/DS6K/DS8K Volume ESS/DS6K/DS8K Array ESS/DS6K/DS8K Controller ESS/DS6K/DS8K Subsystem ESS/DS6K/DS8K Volume ESS/DS6K/DS8K Array ESS/DS6K/DS8K Controller ESS/DS6K/DS8K Subsystem ESS/DS6K/DS8K Volume ESS/DS6K/DS8K Array ESS/DS6K/DS8K Controller ESS/DS6K/DS8K Subsystem SVC/Storwize VDisk SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize MDisk Group SVC/Storwize Subsystem SMI-S BSP Volume SMI-S BSP Controller SMI-S BSP Subsystem XIV Volume XIV Module XIV Subsystem ESS/DS6K/DS8K Volume ESS/DS6K/DS8K Array ESS/DS6K/DS8K Controller ESS/DS6K/DS8K Subsystem Average number of I/O operations per second for non-sequential read operations, for a particular component over a time interval. Average number of I/O operations per second for sequential read operations, for a particular component over a time interval. Average number of I/O operations per second for both sequential and non-sequential read operations, for a particular component over a time interval. Volume Based Metrics I/O Rates Read I/O Rate (normal) 801 Read I/O Rate (sequential) 802 Read I/O Rate (overall) 803 Write I/O Rate (normal) 804 Note: SVC Node and SVC Subsystem support requires v3.1.3 or above. Note: XIV metrics require XIV version 10.2.2 or higher. Average number of I/O operations per second for non-sequential write operations, for a particular component over a time interval. 
28 Write I/O Rate (sequential) 805 Write I/O Rate (overall) 806 Total I/O Rate (normal) 807 Total I/O Rate (sequential) 808 Total I/O Rate (overall) 809 Global Mirror Write I/O Rate (3.1.3) 937 Global Mirror Overlapping Write Percentage (3.1.3) 938 ESS/DS6K/DS8K Volume ESS/DS6K/DS8K Array ESS/DS6K/DS8K Controller ESS/DS6K/DS8K Subsystem ESS/DS6K/DS8K Volume ESS/DS6K/DS8K Array ESS/DS6K/DS8K Controller ESS/DS6K/DS8K Subsystem SVC/Storwize VDisk SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize MDisk Group SVC/Storwize Subsystem SMI-S BSP Volume SMI-S BSP Controller SMI-S BSP Subsystem XIV Volume XIV Module XIV Subsystem ESS/DS6K/DS8K Volume ESS/DS6K/DS8K Array ESS/DS6K/DS8K Controller ESS/DS6K/DS8K Subsystem ESS/DS6K/DS8K Volume ESS/DS6K/DS8K Array ESS/DS6K/DS8K Controller ESS/DS6K/DS8K Subsystem ESS/DS6K/DS8K Volume ESS/DS6K/DS8K Array ESS/DS6K/DS8K Controller ESS/DS6K/DS8K Subsystem SVC/Storwize VDisk SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize MDisk Group SVC/Storwize Subsystem SMI-S BSP Volume SMI-S BSP Controller SMI-S BSP Subsystem XIV Volume XIV Module XIV Subsystem SVC/Storwize VDisk SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem SVC/Storwize VDisk SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem Average number of I/O operations per second for sequential write operations, for a particular component over a time interval. Average number of I/O operations per second for both sequential and non-sequential write operations, for a particular component over a time interval. Note: SVC Node and SVC Subsystem support requires v3.1.3 or above. Note: XIV metrics require XIV version 10.2.2 or higher. Average number of I/O operations per second for nonsequential read and write operations, for a particular component over a time interval. Average number of I/O operations per second for sequential read and write operations, for a particular component over a time interval. Average number of I/O operations per second for both sequential and non-sequential read and write operations, for a particular component over a time interval. Note: SVC Node and SVC Subsystem support requires v3.1.3 or above. Note: XIV metrics require XIV version 10.1 or higher. Average number of write operations per second issued to the Global Mirror secondary site, for a particular component over a time interval. Average percentage of write operations issued by the Global Mirror primary site which were serialized overlapping writes, for 29 Global Mirror Overlapping Write I/O Rate (3.1.3) 939 SVC/Storwize VDisk SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem HPF Read I/O Rate (4.1.1) 943 ESS/DS6K/DS8K Volume ESS/DS6K/DS8K Array ESS/DS6K/DS8K Controller ESS/DS6K/DS8K Subsystem HPF Write I/O Rate (4.1.1) 944 ESS/DS6K/DS8K Volume ESS/DS6K/DS8K Array ESS/DS6K/DS8K Controller ESS/DS6K/DS8K Subsystem Total HPF I/O Rate (4.1.1) 945 ESS/DS6K/DS8K Volume ESS/DS6K/DS8K Array ESS/DS6K/DS8K Controller ESS/DS6K/DS8K Subsystem HPF I/O Percentage (4.1.1) 946 ESS/DS6K/DS8K Volume ESS/DS6K/DS8K Array ESS/DS6K/DS8K Controller ESS/DS6K/DS8K Subsystem PPRC Transfer Rate (4.1.1) 947 ESS/DS6K/DS8K Volume ESS/DS6K/DS8K Array ESS/DS6K/DS8K Controller ESS/DS6K/DS8K Subsystem a particular component over a time interval. For SVC 4.3.1 and later, some overlapping writes are processed in parallel (are not serialized), so are excluded. For earlier SVC versions, all overlapping writes were serialized. 
Average number of serialized overlapping write operations per second encountered by the Global Mirror primary site, for a particular component over a time interval. For SVC 4.3.1 and later, some overlapping writes are processed in parallel (are not serialized), so are excluded. For earlier SVC versions, all overlapping writes were serialized. Average number of read operations per second that were issued via the High Performance FICON (HPF) feature of the storage subsystem, for a particular component over a particular time interval. Average number of write operations per second that were issued via the High Performance FICON (HPF) feature of the storage subsystem, for a particular component over a particular time interval. Average number of read and write operations per second that were issued via the High Performance FICON (HPF) feature of the storage subsystem, for a particular component over a particular time interval. The percentage of all I/O operations that were issued via the High Performance FICON (HPF) feature of the storage subsystem for a particular component over a particular time interval. Average number of track transfer operations per second for Peer-to-Peer Remote Copy usage, for a particular component over a particular 30 Read Data Cache Hit Percentage 998 XIV Volume XIV Module XIV Subsystem time interval. Percentage of read data that was read from the cache, for a particular component over a particular time interval. Write Data Cache Hit Percentage 999 XIV Volume XIV Module XIV Subsystem Note: Available in v4.2.1.163. Percentage of write data that was written to the cache, for a particular component over a particular time interval. Total Data Cache Hit Percentage 1000 XIV Volume XIV Module XIV Subsystem Note: Available in v4.2.1.163. Percentage of all data that was written to the cache, for a particular component over a particular time interval. Small Transfers I/O Percentage 1007 XIV Volume XIV Module XIV Subsystem Medium Transfers I/O Percentage 1008 XIV Volume XIV Module XIV Subsystem Large Transfers I/O Percentage 1009 XIV Volume XIV Module XIV Subsystem Very Large Transfers I/O Percentage 1010 XIV Volume XIV Module XIV Subsystem Note: Available in v4.2.1.163. Percentage of all I/Os that were operations with small transfer sizes (<= 8 KB), for a particular component over a particular time interval. Note: Available in v4.2.1.163. Percentage of all I/Os that were operations with medium transfer sizes (> 8 KB and <= 64 KB), for a particular component over a particular time interval. Note: Available in v4.2.1.163. Percentage of all I/Os that were operations with large transfer sizes (> 64 KB and <= 512 KB), for a particular component over a particular time interval. Note: Available in v4.2.1.163. Percentage of all I/Os that were operations with very large transfer sizes (> 512 KB), for a particular component over a particular time interval. Note: Available in v4.2.1.163. Cache Hit Percentages Read Cache Hits 810 ESS/DS6K/DS8K Volume (normal) ESS/DS6K/DS8K Array ESS/DS6K/DS8K Controller ESS/DS6K/DS8K Subsystem Read Cache Hits 811 ESS/DS6K/DS8K Volume (sequential) ESS/DS6K/DS8K Array Percentage of cache hits for non-sequential read operations, for a particular component over a time interval. 
Percentage of cache hits for sequential read operations, for a 31 Read Cache Hits (overall) 812 Write Cache Hits (normal) 813 Write Cache Hits (sequential) 814 Write Cache Hits (overall) 815 Total Cache Hits (normal) 816 Total Cache Hits (sequential) 817 Total Cache Hits (overall) 818 ESS/DS6K/DS8K Controller ESS/DS6K/DS8K Subsystem ESS/DS6K/DS8K Volume ESS/DS6K/DS8K Array ESS/DS6K/DS8K Controller ESS/DS6K/DS8K Subsystem SVC/Storwize VDisk SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem SMI-S BSP Volume SMI-S BSP Controller SMI-S BSP Subsystem XIV Volume XIV Module XIV Subsystem ESS/DS6K/DS8K Volume ESS/DS6K/DS8K Array ESS/DS6K/DS8K Controller ESS/DS6K/DS8K Subsystem ESS/DS6K/DS8K Volume ESS/DS6K/DS8K Array ESS/DS6K/DS8K Controller ESS/DS6K/DS8K Subsystem ESS/DS6K/DS8K Volume ESS/DS6K/DS8K Array ESS/DS6K/DS8K Controller ESS/DS6K/DS8K Subsystem SVC/Storwize VDisk SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem SMI-S BSP Volume SMI-S BSP Controller SMI-S BSP Subsystem XIV Volume XIV Module XIV Subsystem ESS/DS6K/DS8K Volume ESS/DS6K/DS8K Array ESS/DS6K/DS8K Controller ESS/DS6K/DS8K Subsystem ESS/DS6K/DS8K Volume ESS/DS6K/DS8K Array ESS/DS6K/DS8K Controller ESS/DS6K/DS8K Subsystem ESS/DS6K/DS8K Volume ESS/DS6K/DS8K Array particular component over a time interval. Percentage of cache hits for both sequential and nonsequential read operations, for a particular component over a time interval. Note: SVC Node and SVC Subsystem support requires v3.1.3 or above. Note: XIV metrics require XIV version 10.2.2 or higher. Percentage of cache hits for non-sequential write operations, for a particular component over a time interval. Percentage of cache hits for sequential write operations, for a particular component over a time interval. Percentage of cache hits for both sequential and nonsequential write operations, for a particular component over a time interval. Note: SVC Node and SVC Subsystem support requires v3.1.3 or above. Note: XIV metrics require XIV version 10.2.2 or higher. Percentage of cache hits for non-sequential read and write operations, for a particular component over a time interval. Percentage of cache hits for sequential read and write operations, for a particular component over a time interval. Percentage of cache hits for both sequential and non- 32 Readahead Percentage of Cache Hits (3.1.3) Dirty-Write Percentage of Cache Hits (3.1.3) Data Rates Read Data Rate Write Data Rate Total Data Rate 890 891 819 820 821 ESS/DS6K/DS8K Controller ESS/DS6K/DS8K Subsystem SVC/Storwize VDisk SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem SMI-S BSP Volume SMI-S BSP Controller SMI-S BSP Subsystem XIV Volume XIV Module XIV Subsystem SVC/Storwize VDisk SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem SVC/Storwize VDisk SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem sequential read and write operations, for a particular component over a time interval. 
ESS/DS6K/DS8K Volume ESS/DS6K/DS8K Array ESS/DS6K/DS8K Controller ESS/DS6K/DS8K Subsystem SVC/Storwize VDisk SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize MDisk Group SVC/Storwize Subsystem SMI-S BSP Volume SMI-S BSP Controller SMI-S BSP Subsystem XIV Volume XIV Module XIV Subsystem ESS/DS6K/DS8K Volume ESS/DS6K/DS8K Array ESS/DS6K/DS8K Controller ESS/DS6K/DS8K Subsystem SVC/Storwize VDisk SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize MDisk Group SVC/Storwize Subsystem SMI-S BSP Volume SMI-S BSP Controller SMI-S BSP Subsystem XIV Volume XIV Module XIV Subsystem ESS/DS6K/DS8K Volume Average number of megabytes (2^20 bytes) per second that were transferred for read operations, for a particular component over a time interval. Note: SVC Node and SVC Subsystem support requires v3.1.3 or above. Note: XIV metrics require XIV version 10.2.2 or higher. Percentage of all read cache hits which occurred on prestaged data, for a particular component over a time interval. Percentage of all write cache hits which occurred on already dirty data in the cache, for a particular component over a time interval. Note: SVC Node and SVC Subsystem support requires v3.1.3 or above. Note: XIV metrics require XIV version 10.2.2 or higher. Average number of megabytes (2^20 bytes) per second that were transferred for write operations, for a particular component over a time interval. Note: SVC Node and SVC Subsystem support requires v3.1.3 or above. Note: XIV metrics require XIV version 10.2.2 or higher. Average number of megabytes 33 ESS/DS6K/DS8K Array ESS/DS6K/DS8K Controller ESS/DS6K/DS8K Subsystem SVC/Storwize VDisk SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize MDisk Group SVC/Storwize Subsystem SMI-S BSP Volume SMI-S BSP Controller SMI-S BSP Subsystem XIV Volume XIV Module XIV Subsystem XIV Volume XIV Module XIV Subsystem Small Transfers Data Percentage 1011 Medium Transfers Data Percentage 1012 XIV Volume XIV Module XIV Subsystem Large Transfers Data Percentage 1013 XIV Volume XIV Module XIV Subsystem Very Large Transfers Data Percentage 1014 XIV Volume XIV Module XIV Subsystem (2^20 bytes) per second that were transferred for read and write operations, for a particular component over a time interval. Note: SVC Node and SVC Subsystem support requires v3.1.3 or above. Note: XIV metrics require XIV version 10.1 or higher. Percentage of all data that was transferred via I/O operations with small transfer sizes (<= 8 KB), for a particular component over a particular time interval. Note: Available in v4.2.1.163. Percentage of all data that was transferred via I/O operations with medium transfer sizes (> 8 KB and <= 64 KB), for a particular component over a particular time interval. Note: Available in v4.2.1.163. Percentage of all data that was transferred via I/O operations with large transfer sizes (> 64 KB and <= 512 KB), for a particular component over a particular time interval. Note: Available in v4.2.1.163. Percentage of all data that was transferred via I/O operations with very large transfer sizes (> 512 KB), for a particular component over a particular time interval. Note: Available in v4.2.1.163. Response Times Read Response 822 Time ESS/DS6K/DS8K Volume ESS/DS6K/DS8K Array ESS/DS6K/DS8K Controller ESS/DS6K/DS8K Subsystem SVC/Storwize VDisk SVC/Storwize Node SVC/Storwize I/O Group Average number of milliseconds that it took to service each read operation, for a particular component over a time interval. 
Note: SVC VDisk, Node, I/O Group, MDisk Group, and Subsystem support requires 34 Write Response Time Overall Response Time 823 824 Peak Read Response Time (3.1.3) 940 Peak Write Response Time (3.1.3) 941 Global Mirror Secondary Write Lag (3.1.3) 942 SVC/Storwize MDisk Group SVC/Storwize Subsystem SMI-S BSP Volume SMI-S BSP Controller SMI-S BSP Subsystem XIV Volume XIV Module XIV Subsystem ESS/DS6K/DS8K Volume ESS/DS6K/DS8K Array ESS/DS6K/DS8K Controller ESS/DS6K/DS8K Subsystem SVC/Storwize VDisk SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize MDisk Group SVC/Storwize Subsystem SMI-S BSP Volume SMI-S BSP Controller SMI-S BSP Subsystem XIV Volume XIV Module XIV Subsystem ESS/DS6K/DS8K Volume ESS/DS6K/DS8K Array ESS/DS6K/DS8K Controller ESS/DS6K/DS8K Subsystem SVC/Storwize VDisk SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize MDisk Group SVC/Storwize Subsystem SMI-S BSP Volume SMI-S BSP Controller SMI-S BSP Subsystem XIV Volume XIV Module XIV Subsystem SVC/Storwize VDisk SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem SVC/Storwize VDisk SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem SVC/Storwize VDisk SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem v3.1.3 or above. Note: XIV metrics require XIV version 10.2.2 or higher. Average number of milliseconds that it took to service each write operation, for a particular component over a time interval. Note: SVC VDisk, Node, I/O Group, MDisk Group, and Subsystem support requires v3.1.3 or above. Note: XIV metrics require XIV version 10.2.2 or higher. Average number of milliseconds that it took to service each I/O operation (read and write), for a particular component over a time interval. Note: SVC VDisk, Node, I/O Group, MDisk Group, and Subsystem support requires v3.1.3 or above. Note: XIV metrics require XIV version 10.1 or higher. The peak (worst) response time among all read operations, for a particular component over a time interval. The peak (worst) response time among all write operations, for a particular component over a time interval. The average number of additional milliseconds it took to service each secondary write operation for Global Mirror, beyond the time needed to service the primary writes, for a particular component over a particular time interval. 35 Overall Host Attributed Response Time Percentage (4.1.1) 948 SVC/Storwize VDisk SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem Read Cache Hit Response Time 1001 XIV Volume XIV Module XIV Subsystem Write Cache Hit Response Time 1002 XIV Volume XIV Module XIV Subsystem Overall Cache Hit Response 1003 XIV Volume XIV Module XIV Subsystem Read Cache Miss Response Time 1004 XIV Volume XIV Module XIV Subsystem Write Cache Miss Response Time 1005 XIV Volume XIV Module XIV Subsystem Overall Cache Miss Response Time 1006 XIV Volume XIV Module XIV Subsystem This is the percentage of the average response time (read+write) which can be attributed to delays from the host systems. This is provided as an aid to diagnose slow hosts and poorly performing fabrics. This value is based on the time taken for hosts to respond to transfer-ready notifications from the SVC nodes (for read) and the time taken for hosts to send the write data after the node has responded to a transfer-ready notification (for write). Average number of milliseconds that it took to service each read cache hit operation, for a particular component over a particular time interval. Note: Available in v4.2.1.163. 
Average number of milliseconds that it took to service each write cache hit operation, for a particular component over a particular time interval. Note: Available in v4.2.1.163. Average number of milliseconds that it took to service each cache hit operation (reads and writes), for a particular component over a particular time interval. Note: Available in v4.2.1.163. Average number of milliseconds that it took to service each read cache miss operation, for a particular component over a particular time interval. Note: Available in v4.2.1.163. Average number of milliseconds that it took to service each write cache miss operation, for a particular component over a particular time interval. Note: Available in v4.2.1.163. Average number of milliseconds that it took to service each cache miss operation (reads and writes), for a particular 36 component over a particular time interval. Small Transfers Response Time 1015 XIV Volume XIV Module XIV Subsystem Medium Transfers Response Time 1016 XIV Volume XIV Module XIV Subsystem Large Transfers Response Time 1017 XIV Volume XIV Module XIV Subsystem Very Large Transfers Response Time 1018 XIV Volume XIV Module XIV Subsystem Note: Available in v4.2.1.163. Average number of milliseconds that it took to service each I/O with a small transfer size (<= 8 KB), for a particular component over a particular time interval. Note: Available in v4.2.1.163. Average number of milliseconds that it took to service each I/O with a medium transfer size (> 8 KB and <= 64 KB), for a particular component over a particular time interval. Note: Available in v4.2.1.163. Average number of milliseconds that it took to service each I/O with a large transfer size (> 64 KB and <= 512 KB), for a particular component over a particular time interval. Note: Available in v4.2.1.163. Average number of milliseconds that it took to service each I/O with a very large transfer size (> 512 KB), for a particular component over a particular time interval. Note: Available in v4.2.1.163. Transfer Sizes Read Transfer Size 825 Write Transfer Size 826 ESS/DS6K/DS8K Volume ESS/DS6K/DS8K Array ESS/DS6K/DS8K Controller ESS/DS6K/DS8K Subsystem SVC/Storwize VDisk SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize MDisk Group SVC/Storwize Subsystem SMI-S BSP Volume SMI-S BSP Controller SMI-S BSP Subsystem XIV Volume XIV Module XIV Subsystem ESS/DS6K/DS8K Volume ESS/DS6K/DS8K Array Average number of KB per I/O for read operations, for a particular component over a time interval. Note: SVC Node and SVC Subsystem support requires v3.1.3 or above. Note: XIV metrics require XIV version 10.2.2 or higher. Average number of KB per I/O for write operations, for a 37 Overall Transfer Size 827 Record Mode Reads Record Mode Read 828 I/O Rate Record Mode Read Cache Hits 829 Cache Transfers Disk to Cache I/O 830 Rate Cache to Disk I/O Rate 831 ESS/DS6K/DS8K Controller ESS/DS6K/DS8K Subsystem SVC/Storwize VDisk SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize MDisk Group SVC/Storwize Subsystem SMI-S BSP Volume SMI-S BSP Controller SMI-S BSP Subsystem XIV Volume XIV Module XIV Subsystem ESS/DS6K/DS8K Volume ESS/DS6K/DS8K Array ESS/DS6K/DS8K Controller ESS/DS6K/DS8K Subsystem SVC/Storwize VDisk SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize MDisk Group SVC/Storwize Subsystem SMI-S BSP Volume SMI-S BSP Controller SMI-S BSP Subsystem XIV Volume XIV Module XIV Subsystem particular component over a time interval. 
ESS/DS6K/DS8K Volume ESS/DS6K/DS8K Array ESS/DS6K/DS8K Controller Average number of I/O operations per second for record mode read operations, for a particular component over a time interval. Percentage of cache hits for record mode read operations, for a particular component over a time interval. ESS/DS6K/DS8K Volume ESS/DS6K/DS8K Array ESS/DS6K/DS8K Controller ESS/DS6K/DS8K Volume ESS/DS6K/DS8K Array ESS/DS6K/DS8K Controller SVC/Storwize VDisk SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem ESS/DS6K/DS8K Volume ESS/DS6K/DS8K Array ESS/DS6K/DS8K Controller SVC/Storwize VDisk SVC/Storwize Node Note: SVC Node and SVC Subsystem support requires v3.1.3 or above. Note: XIV metrics require XIV version 10.2.2 or higher. Average number of KB per I/O for read and write operations, for a particular component over a time interval. Note: SVC Node and SVC Subsystem support requires v3.1.3 or above. Note: XIV metrics require XIV version 10.1 or higher. Average number of I/O operations (track transfers) per second for disk to cache transfers, for a particular component over a time interval. Note: SVC VDisk, Node, I/O Group, and Subsystem support requires v3.1.3 or above. Average number of I/O operations (track transfers) per second for cache to disk transfers, for a particular component over a time interval. 38 SVC/Storwize I/O Group SVC/Storwize Subsystem Write-cache Constraints Write-cache Delay 832 ESS/DS6K/DS8K Volume Percentage ESS/DS6K/DS8K Array ESS/DS6K/DS8K Controller ESS/DS6K/DS8K Subsystem SVC/Storwize VDisk SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem Write-cache Delay I/O Rate 833 ESS/DS6K/DS8K Volume ESS/DS6K/DS8K Array ESS/DS6K/DS8K Controller ESS/DS6K/DS8K Subsystem SVC/Storwize VDisk SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem Write-cache Overflow Percentage (3.1.3) 894 SVC/Storwize VDisk SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem Write-cache Overflow I/O Rate (3.1.3) 895 SVC/Storwize VDisk SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem Write-cache Flushthrough Percentage (3.1.3) 896 SVC/Storwize VDisk SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem Write-cache Flushthrough I/O Rate (3.1.3) 897 SVC/Storwize VDisk SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem Write-cache Writethrough Percentage (3.1.3) 898 SVC/Storwize VDisk SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem Write-cache Writethrough I/O Rate (3.1.3) 899 SVC/Storwize VDisk SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem Note: SVC VDisk, Node, I/O Group, and Subsystem support requires v3.1.3 or above. Percentage of I/O operations that were delayed due to writecache space constraints or other conditions, for a particular component over a time interval. (The ratio of delayed operations to total I/Os.) Note: SVC VDisk, Node, I/O Group, and Subsystem support requires v3.1.3 or above. Average number of I/O operations per second that were delayed due to write-cache space constraints or other conditions, for a particular component over a time interval. Note: SVC VDisk, Node, I/O Group, and Subsystem support requires v3.1.3 or above. Percentage of write operations that were delayed due to lack of write-cache space, for a particular component over a time interval. Average number of tracks per second that were delayed due to lack of write-cache space, for a particular component over a time interval. 
Percentage of write operations that were processed in Flushthrough write mode, for a particular component over a time interval. Average number of tracks per second that were processed in Flush-through write mode, for a particular component over a time interval. Percentage of write operations that were processed in Writethrough write mode, for a particular component over a time interval. Average number of tracks per second that were processed in Write-through write mode, for a particular component over a 39 time interval. Miscellaneous Computed Values Cache Holding 834 ESS/DS6K/DS8K Controller Time ESS/DS6K/DS8K Subsystem CPU Utilization (3.1.3) 900 Non-Preferred Node Usage Percentage (4.1.1) 949 Volume Utilization (4.1.1) 978 SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem SVC/Storwize VDisk SVC/Storwize I/O Group ESS/DS6K/DS8K Volume SVC/Storwize VDisk Average cache holding time, in seconds, for I/O data in this subsystem controller (cluster). Shorter time periods indicate adverse performance. Average utilization percentage of the CPU(s) for a particular component over a time interval. The overall percentage of I/O performed or data transferred via the non-preferred nodes of the VDisks, for a particular component over a particular time interval. The approximate utilization percentage of a particular volume over a time interval, i.e. the average amount of time that the volume was busy reading or writing data. Backend Based Metrics I/O Rates Backend Read I/O Rate Backend Write I/O Rate Total Backend I/O Rate 835 836 837 ESS/DS6K/DS8K Rank ESS/DS6K/DS8K Array ESS/DS6K/DS8K Controller ESS/DS6K/DS8K Subsystem SVC/Storwize MDisk SVC/Storwize MDisk Group SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem ESS/DS6K/DS8K Rank ESS/DS6K/DS8K Array ESS/DS6K/DS8K Controller ESS/DS6K/DS8K Subsystem SVC/Storwize MDisk SVC/Storwize MDisk Group SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem ESS/DS6K/DS8K Rank ESS/DS6K/DS8K Array ESS/DS6K/DS8K Controller ESS/DS6K/DS8K Subsystem SVC/Storwize MDisk SVC/Storwize MDisk Group SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem Average number of I/O operations per second for read operations, for a particular component over a time interval. Note: SVC Node, I/O Group, and Subsystem support requires v3.1.3 or above. Average number of I/O operations per second for write operations, for a particular component over a time interval. Note: SVC Node, I/O Group, and Subsystem support requires v3.1.3 or above. Average number of I/O operations per second for read and write operations, for a particular component over a time interval. Note: SVC Node, I/O Group, and Subsystem support requires v3.1.3 or above. 
Data Rates 40 Backend Read Data Rate Backend Write Data Rate Total Backend Data Rate 838 839 840 Response Times Backend Read 841 Response Time Backend Write Response Time Overall Backend Response Time 842 843 ESS/DS6K/DS8K Rank ESS/DS6K/DS8K Array ESS/DS6K/DS8K Controller ESS/DS6K/DS8K Subsystem SVC/Storwize MDisk SVC/Storwize MDisk Group SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem ESS/DS6K/DS8K Rank ESS/DS6K/DS8K Array ESS/DS6K/DS8K Controller ESS/DS6K/DS8K Subsystem SVC/Storwize MDisk SVC/Storwize MDisk Group SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem ESS/DS6K/DS8K Rank ESS/DS6K/DS8K Array ESS/DS6K/DS8K Controller ESS/DS6K/DS8K Subsystem SVC/Storwize MDisk SVC/Storwize MDisk Group SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem Average number of megabytes (2^20 bytes) per second that were transferred for read operations, for a particular component over a time interval. ESS/DS6K/DS8K Rank ESS/DS6K/DS8K Array ESS/DS6K/DS8K Controller ESS/DS6K/DS8K Subsystem SVC/Storwize MDisk SVC/Storwize MDisk Group SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem Average number of milliseconds that it took to service each read operation, for a particular component over a time interval. For SVC, this is the external response time time of the MDisks. ESS/DS6K/DS8K Rank ESS/DS6K/DS8K Array ESS/DS6K/DS8K Controller ESS/DS6K/DS8K Subsystem SVC/Storwize MDisk SVC/Storwize MDisk Group SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem ESS/DS6K/DS8K Rank ESS/DS6K/DS8K Array ESS/DS6K/DS8K Controller Note: SVC Node, I/O Group, and Subsystem support requires v3.1.3 or above. Average number of megabytes (2^20 bytes) per second that were transferred for write operations, for a particular component over a time interval. Note: SVC Node, I/O Group, and Subsystem support requires v3.1.3 or above. Average number of megabytes (2^20 bytes) per second that were transferred for read and write operations, for a particular component over a time interval. Note: SVC Node, I/O Group, and Subsystem support requires v3.1.3 or above. Note: SVC Node, I/O Group, and Subsystem support requires v3.1.3 or above. Average number of milliseconds that it took to service each write operation, for a particular component over a time interval. For SVC, this is the external response time time of the MDisks. Note: SVC Node, I/O Group, and Subsystem support requires v3.1.3 or above. 
Average number of milliseconds that it took to service each I/O operation (read and write), for a 41 ESS/DS6K/DS8K Subsystem SVC/Storwize MDisk SVC/Storwize MDisk Group SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem Backend Read Queue Time Backend Write Queue Time Overall Backend Queue Time 844 845 846 SVC/Storwize MDisk SVC/Storwize MDisk Group SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem SVC/Storwize MDisk SVC/Storwize MDisk Group SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem SVC/Storwize MDisk SVC/Storwize MDisk Group SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem Peak Backend Read Response Time (4.1.1) 950 SVC/Storwize MDisk SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize MDisk Group SVC/Storwize Subsystem Peak Backend Write Response Time (4.1.1) 951 SVC/Storwize MDisk SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize MDisk Group SVC/Storwize Subsystem Peak Backend Read Queue Time (4.1.1) 952 SVC/Storwize MDisk SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize MDisk Group SVC/Storwize Subsystem particular component over a time interval. For SVC, this is the external response time time of the MDisks. Note: SVC Node, I/O Group, and Subsystem support requires v3.1.3 or above. Average number of milliseconds that each read operation spent on the queue before being issued to the backend device, for a particular MDisk or MDisk Group over a time interval. Note: SVC Node, I/O Group, and Subsystem support requires v3.1.3 or above. Average number of milliseconds that each read operation spent on the queue before being issued to the backend device, for a particular MDisk or MDisk Group over a time interval. Note: SVC Node, I/O Group, and Subsystem support requires v3.1.3 or above. Average number of milliseconds that each read operation spent on the queue before being issued to the backend device, for a particular MDisk or MDisk Group over a time interval. Note: SVC Node, I/O Group, and Subsystem support requires v3.1.3 or above. The peak (worst) response time among all read operations, for a particular component over a time interval. For SVC, this is the external response time of the MDisks. The peak (worst) response time among all write operations, for a particular component over a time interval. For SVC, this is the external response time of the MDisks. The lower bound on the peak (worst) queue time for read operations, for a particular component over a time interval. 
The queue time is the amount of 42 Peak Backend Write Queue Time (4.1.1) 953 Transfer Sizes Backend Read 847 Transfer Size Backend Write Transfer Size Overall BackendTransfer Size 848 849 Disk Utilization Disk Utilization 850 Percentage Sequential I/O Percentage 851 SVC/Storwize MDisk SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize MDisk Group SVC/Storwize Subsystem ESS/DS6K/DS8K Rank ESS/DS6K/DS8K Array ESS/DS6K/DS8K Controller ESS/DS6K/DS8K Subsystem SVC/Storwize MDisk SVC/Storwize MDisk Group SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem ESS/DS6K/DS8K Rank ESS/DS6K/DS8K Array ESS/DS6K/DS8K Controller ESS/DS6K/DS8K Subsystem SVC/Storwize MDisk SVC/Storwize MDisk Group SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem ESS/DS6K/DS8K Rank ESS/DS6K/DS8K Array ESS/DS6K/DS8K Controller ESS/DS6K/DS8K Subsystem SVC/Storwize MDisk SVC/Storwize MDisk Group SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem ESS/DS6K/DS8K Array ESS/DS6K/DS8K Array time that the read operation spent on the queue before being issued to the backend device. The lower bound on the peak (worst) queue time for write operations, for a particular component over a time interval. The queue time is the amount of time that the write operation spent on the queue before being issued to the backend device. Average number of KB per I/O for read operations, for a particular component over a time interval. Note: SVC Node, I/O Group, and Subsystem support requires v3.1.3 or above. Average number of KB per I/O for write operations, for a particular component over a time interval. Note: SVC Node, I/O Group, and Subsystem support requires v3.1.3 or above. Average number of KB per I/O for read and write operations, for a particular component over a time interval. Note: SVC Node, I/O Group, and Subsystem support requires v3.1.3 or above. The approximate utilization percentage of a particular rank over a time interval, i.e. the average percent of time that the disks associated with the array were busy Percent of all I/O operations performed for a particular array over a time interval that were sequential operations. 43 Front-end and Switch Based Metrics I/O or Packet Rates Port Send I/O Rate 852 Port Receive I/O Rate Total Port I/O Rate 853 854 ESS/DS6K/DS8K Port ESS/DS6K/DS8K Subsystem SVC/Storwize Port SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem SMI-S BSP Port XIV Port ESS/DS6K/DS8K Port ESS/DS6K/DS8K Subsystem SVC/Storwize Port SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem SMI-S BSP Port XIV Port ESS/DS6K/DS8K Port ESS/DS6K/DS8K Subsystem SVC/Storwize Port SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem SMI-S BSP Port XIV Port Port Send Packet Rate 855 Switch Port Switch Port Receive Packet Rate 856 Switch Port Switch Total Port Packet Rate 857 Switch Port Switch Port to Host Send I/O Rate (3.1.3) 901 SVC/Storwize Port SVC/Storwize Node SVC/Storwize I/O Group Average number of I/O operations per second for send operations, for a particular port over a time interval. Note: ESS/DS6K/DS8K Subsystem and SVC Port, Node, I/O Group, and Subsystem support requires v3.1.3 or above; SMI-S BSP Port support requires v3.3; XIV Port supported in v4.2.1.163. Average number of I/O operations per second for receive operations, for a particular port over a time interval. Note: ESS/DS6K/DS8K Subsystem and SVC Port, Node, I/O Group, and Subsystem support requires v3.1.3 or above; SMI-S BSP Port support requires v3.3; XIV Port supported in v4.2.1.163. 
Average number of I/O operations per second for send and receive operations, for a particular port over a time interval. Note: ESS/DS6K/DS8K Subsystem and SVC Port, Node, I/O Group, and Subsystem support requires v3.1.3 or above; SMI-S BSP Port support requires v3.3; XIV Port supported in v4.2.1.163. Average number of packets per second for send operations, for a particular port over a time interval. Average number of packets per second for receive operations, for a particular port over a time interval. Average number of packets per second for send and receive operations, for a particular port over a time interval. Average number of exchanges (I/Os) per second sent to host computers by a particular 44 SVC/Storwize Subsystem SVC/Storwize Port SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem SVC/Storwize Port SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem Port to Host Receive I/O Rate (3.1.3) 902 Total Port to Host I/O Rate (3.1.3) 903 Port to Disk Send I/O Rate (3.1.3) 904 SVC/Storwize Port SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem Port to Disk Receive I/O Rate (3.1.3) 905 SVC/Storwize Port SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem Total Port to Disk I/O Rate (3.1.3) 906 SVC/Storwize Port SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem Port to Local Node Send I/O Rate (3.1.3) 907 SVC/Storwize Port SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem Port to Local Node Receive I/O Rate (3.1.3) 908 SVC/Storwize Port SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem Total Port to Local Node I/O Rate (3.1.3) 909 SVC/Storwize Port SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem Port to Remote Node Send I/O Rate (3.1.3) 910 SVC/Storwize Port SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem Port to Remote Node Receive I/O Rate (3.1.3) 911 SVC/Storwize Port SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem Total Port to Remote Node I/O Rate (3.1.3) 912 SVC/Storwize Port SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem component over a time interval. Average number of exchanges (I/Os) per second received from host computers by a particular component over a time interval. Average number of exchanges (I/Os) per second transmitted between host computers and a particular component over a time interval. Average number of exchanges (I/Os) per second sent to storage subsystems by a particular component over a time interval. Average number of exchanges (I/Os) per second received from storage subsystems by a particular component over a time interval. Average number of exchanges (I/Os) per second transmitted between storage subsystems and a particular component over a time interval. Average number of exchanges (I/Os) per second sent to other nodes in the local SVC cluster by a particular component over a time interval. Average number of exchanges (I/Os) per second received from other nodes in the local SVC cluster by a particular component over a time interval. Average number of exchanges (I/Os) per second transmitted between other nodes in the local SVC cluster and a particular component over a time interval. Average number of exchanges (I/Os) per second sent to nodes in the remote SVC cluster by a particular component over a time interval. Average number of exchanges (I/Os) per second received from nodes in the remote SVC cluster by a particular component over a time interval. 
Average number of exchanges (I/Os) per second transmitted between nodes in the remote SVC cluster and a particular component over a time interval. 45 Port FCP Send I/O Rate (4.1.1) 979 ESS/DS6K/DS8K Port Port FCP Receive I/O Rate (4.1.1) 980 ESS/DS6K/DS8K Port Total Port FCP I/O Rate (4.1.1) 981 ESS/DS6K/DS8K Port Port FICON Send I/O Rate (4.1.1) 954 ESS/DS6K/DS8K Port Port FICON Receive I/O Rate (4.1.1) 955 ESS/DS6K/DS8K Port Total Port FICON I/O Rate (4.1.1) 956 ESS/DS6K/DS8K Port Port PPRC Send I/O Rate (4.1.1) 957 ESS/DS6K/DS8K Port ESS/DS6K/DS8K Subsystem Port PPRC Receive I/O Rate (4.1.1) 958 ESS/DS6K/DS8K Port ESS/DS6K/DS8K Subsystem Total Port PPRC I/O Rate (4.1.1) 959 ESS/DS6K/DS8K Port ESS/DS6K/DS8K Subsystem 858 ESS/DS6K/DS8K Port ESS/DS6K/DS8K Subsystem SVC/Storwize Port SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem SMI-S BSP Port Switch Port Switch XIV Port Data Rates Port Send Data Rate Average number of send operations per second using the FCP protocol, for a particular port over a time interval. Average number of receive operations per second using the FCP protocol, for a particular port over a time interval. Average number of send and receive operations per second using the FCP protocol, for a particular port over a time interval. Average number of send operations per second using the FICON protocol, for a particular port over a time interval. Average number of receive operations per second using the FICON protocol, for a particular port over a time interval. Average number of send and receive operations per second using the FICON protocol, for a particular port over a time interval. Average number of send operations per second for Peerto-Peer Remote Copy usage, for a particular port over a time interval. Average number of receive operations per second for Peerto-Peer Remote Copy usage, for a particular port over a time interval. Average number of send and receive operations per second for Peer-to-Peer Remote Copy usage for a particular port over a time interval. Average number of megabytes (2^20 bytes) per second that were transferred for send (read) operations, for a particular port over a time interval. Note: ESS/DS6K/DS8K Subsystem and SVC Port, Node, I/O Group, and Subsystem support requires v3.1.3 or above; SMI-S BSP Port support requires v3.3; XIV Port supported in v4.2.1.163. 46 Port Receive Data Rate Total Port Data Rate 859 860 ESS/DS6K/DS8K Port ESS/DS6K/DS8K Subsystem SVC/Storwize Port SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem SMI-S BSP Port Switch Port Switch XIV Port ESS/DS6K/DS8K Port ESS/DS6K/DS8K Subsystem SVC/Storwize Port SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem SMI-S BSP Port Switch Port Switch XIV Port Port Peak Send Data Rate 861 Switch Port Switch Port Peak Receive Data Rate 862 Switch Port Switch Port to Host Send Data Rate (3.1.3) 913 Port to Host Receive Data Rate (3.1.3) 914 SVC/Storwize Port SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem SVC/Storwize Port SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem Total Port to Host Data Rate (3.1.3) 915 SVC/Storwize Port SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem Port to Disk Send Data Rate (3.1.3) 916 SVC/Storwize Port SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem Average number of megabytes (2^20 bytes) per second that were transferred for receive (write) operations, for a particular port over a time interval. 
Note: ESS/DS6K/DS8K Subsystem and SVC Port, Node, I/O Group, and Subsystem support requires v3.1.3 or above; SMI-S BSP Port support requires v3.3; XIV Port supported in v4.2.1.163. Average number of megabytes (2^20 bytes) per second that were transferred for send and receive operations, for a particular port over a time interval. Note: ESS/DS6K/DS8K Subsystem and SVC Port, Node, I/O Group, and Subsystem support requires v3.1.3 or above; SMI-S BSP Port support requires v3.3; XIV Port supported in v4.2.1.163. Peak number of megabytes (2^20 bytes) per second that were sent by a particular port over a time interval. Peak number of megabytes (2^20 bytes) per second that were received by a particular port over a time interval. Average number of megabytes (2^20 bytes) per second sent to host computers by a particular component over a time interval. Average number of megabytes (2^20 bytes) per second received from host computers by a particular component over a time interval. Average number of megabytes (2^20 bytes) per second transmitted between host computers and a particular component over a time interval. Average number of megabytes (2^20 bytes) per second sent to storage subsystems by a particular component over a time interval. 47 Port to Disk Receive Data Rate (3.1.3) 917 SVC/Storwize Port SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem Total Port to Disk Data Rate (3.1.3) 918 SVC/Storwize Port SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem Port to Local Node Send Data Rate (3.1.3) 919 SVC/Storwize Port SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem Port to Local Node Receive Data Rate (3.1.3) 920 SVC/Storwize Port SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem Total Port to Local Node Data Rate (3.1.3) 921 SVC/Storwize Port SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem Port to Remote Node Send Data Rate (3.1.3) 922 SVC/Storwize Port SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem Port to Remote Node Receive Data Rate (3.1.3) 923 SVC/Storwize Port SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem Total Port to Remote Node Data Rate (3.1.3) 924 SVC/Storwize Port SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem Port FCP Send Data Rate (4.1.1) 982 ESS/DS6K/DS8K Port Port FCP Receive Data Rate (4.1.1) 983 ESS/DS6K/DS8K Port Total Port FCP Data Rate 984 ESS/DS6K/DS8K Port Average number of megabytes (2^20 bytes) per second received from storage subsystems by a particular component over a time interval. Average number of megabytes (2^20 bytes) per second transmitted between storage subsystems and a particular component over a time interval. Average number of megabytes (2^20 bytes) per second sent to other nodes in the local SVC cluster by a particular component over a time interval. Average number of megabytes (2^20 bytes) per second received from other nodes in the local SVC cluster by a particular component over a time interval. Average number of megabytes (2^20 bytes) per second transmitted between other nodes in the local SVC cluster and a particular component over a time interval. Average number of megabytes (2^20 bytes) per second sent to nodes in the remote SVC cluster by a particular component over a time interval. Average number of megabytes (2^20 bytes) per second received from nodes in the remote SVC cluster by a particular component over a time interval. 
Average number of megabytes (2^20 bytes) per second transmitted between nodes in the remote SVC cluster and a particular component over a time interval. Average number of megabytes (2^20 bytes) per second sent using the FCP protocol, for a particular port over a time interval. Average number of megabytes (2^20 bytes) per second received using the FCP protocol, for a particular port over a time interval. Average number of megabytes (2^20 bytes) per second sent or 48 (4.1.1) Port FICON Send Data Rate (4.1.1) 960 ESS/DS6K/DS8K Port Port FICON Receive Data Rate (4.1.1) 961 ESS/DS6K/DS8K Port Total Port FICON Data Rate (4.1.1) 962 ESS/DS6K/DS8K Port Port PPRC Send Data Rate (4.1.1) 963 ESS/DS6K/DS8K Port ESS/DS6K/DS8K Subsystem Port PPRC Receive Data Rate (4.1.1) 964 ESS/DS6K/DS8K Port ESS/DS6K/DS8K Subsystem Total Port PPRC Data Rate (4.1.1) 965 ESS/DS6K/DS8K Port ESS/DS6K/DS8K Subsystem Response Times Port Send 863 Response Time Port Receive Response Time 864 ESS/DS6K/DS8K Port ESS/DS6K/DS8K Subsystem XIV Port ESS/DS6K/DS8K Port ESS/DS6K/DS8K Subsystem XIV Port received using the FCP protocol, for a particular port over a time interval. Average number of megabytes (2^20 bytes) per second sent using the FICON protocol, for a particular port over a time interval. Average number of megabytes (2^20 bytes) per second received using the FICON protocol, for a particular port over a time interval. Average number of megabytes (2^20 bytes) per second sent or received using the FICON protocol, for a particular port over a time interval. Average number of megabytes (2^20 bytes) per second sent for Peer-to-Peer Remote Copy usage, for a particular port over a time interval. Average number of megabytes (2^20 bytes) per second received for Peer-to-Peer Remote Copy usage, for a particular port over a time interval. Average number of megabytes (2^20 bytes) per second transferred for Peer-to-Peer Remote Copy usage, for a particular port over a time interval. Average number of milliseconds that it took to service each send (read) operation, for a particular port over a time interval. Note: ESS/DS6K/DS8K Subsystem support requires v3.1.3 or above; XIV Port supported in v4.2.1.163. Average number of milliseconds that it took to service each receive (write) operation, for a particular port over a time interval. Note: ESS/DS6K/DS8K Subsystem support requires v3.1.3 or above; XIV Port supported in v4.2.1.163. 49 Overall Port Response Time 865 ESS/DS6K/DS8K Port ESS/DS6K/DS8K Subsystem XIV Port Port to Local Node Send Response Time (3.1.3) 925 SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem Port to Local Node Receive Response Time (3.1.3) 926 SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem Total Port to Local Node Response Time (3.1.3) 927 SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem Port to Local Node Send Queued Time (3.1.3) 928 SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem Port to Local Node Receive Queued Time (3.1.3) 929 SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem Total Port to Local Node Queued Time (3.1.3) 930 SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem Average number of milliseconds that it took to service each operation (send and receive), for a particular port over a time interval. Note: ESS/DS6K/DS8K Subsystem support requires v3.1.3 or above; XIV Port supported in v4.2.1.163. 
Average number of milliseconds it took to service each send operation to another node in the local SVC cluster, for a particular component over a time interval. For SVC, this is the external response time of the transfers. Average number of milliseconds it took to service each receive operation from another node in the local SVC cluster, for a particular component over a time interval. For SVC, this is the external response time of the transfers. Average number of milliseconds it took to service each send or receive operation between another node in the local SVC cluster and a particular component over a time interval. For SVC, this is the external response time of the transfers. Average number of milliseconds that each send operation issued to another node in the local SVC cluster spent on the queue before being issued, for a particular component over a time interval. Average number of milliseconds that each receive operation from another node in the local SVC cluster spent on the queue before being issued, for a particular component over a time interval. Average number of milliseconds that each operation issued to another node in the local SVC cluster spent on the queue before being issued, for a particular component over a time interval. 50 Port to Remote Node Send Response Time (3.1.3) 931 SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem Port to Remote Node Receive Response Time (3.1.3) 932 SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem Total Port to Remote Node Response Time (3.1.3) 933 SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem Port to Remote Node Send Queued Time (3.1.3) 934 SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem Port to Remote Node Receive Queued Time (3.1.3) 935 SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem Total Port to Remote Node Queued Time (3.1.3) 936 SVC/Storwize Node SVC/Storwize I/O Group SVC/Storwize Subsystem Port FCP Send Response Time (4.1.1) 985 ESS/DS6K/DS8K Port Port FCP Receive Response Time (4.1.1) 986 ESS/DS6K/DS8K Port Overall Port FCP 987 ESS/DS6K/DS8K Port Average number of milliseconds it took to service each send operation to a node in the remote SVC cluster, for a particular component over a time interval. For SVC, this is the external response time of the transfers. Average number of milliseconds it took to service each receive operation from a node in the remote SVC cluster, for a particular component over a time interval. For SVC, this is the external response time of the transfers. Average number of milliseconds it took to service each send or receive operation between a node in the remote SVC cluster and a particular component over a time interval. For SVC, this is the external response time of the transfers. Average number of milliseconds that each send operation issued to a node in the remote SVC cluster spent on the queue before being issued, for a particular component over a time interval. Average number of milliseconds that each receive operation from a node in the remote SVC cluster spent on the queue before being issued, for a particular component over a time interval. Average number of milliseconds that each operation issued to a node in the remote SVC cluster spent on the queue before being issued, for a particular component over a time interval. Average number of milliseconds it took to service all send operations using the FCP protocol, for a particular port over a time interval. 
Response Times

Port Send Response Time | 863 | ESS/DS6K/DS8K Port, ESS/DS6K/DS8K Subsystem, XIV Port | Average number of milliseconds that it took to service each send (read) operation, for a particular port over a time interval. Note: ESS/DS6K/DS8K Subsystem support requires v3.1.3 or above; XIV Port supported in v4.2.1.163.
Port Receive Response Time | 864 | ESS/DS6K/DS8K Port, ESS/DS6K/DS8K Subsystem, XIV Port | Average number of milliseconds that it took to service each receive (write) operation, for a particular port over a time interval. Note: ESS/DS6K/DS8K Subsystem support requires v3.1.3 or above; XIV Port supported in v4.2.1.163.
Overall Port Response Time | 865 | ESS/DS6K/DS8K Port, ESS/DS6K/DS8K Subsystem, XIV Port | Average number of milliseconds that it took to service each operation (send and receive), for a particular port over a time interval. Note: ESS/DS6K/DS8K Subsystem support requires v3.1.3 or above; XIV Port supported in v4.2.1.163.
Port to Local Node Send Response Time (3.1.3) | 925 | SVC/Storwize Node, SVC/Storwize I/O Group, SVC/Storwize Subsystem | Average number of milliseconds it took to service each send operation to another node in the local SVC cluster, for a particular component over a time interval. For SVC, this is the external response time of the transfers.
Port to Local Node Receive Response Time (3.1.3) | 926 | SVC/Storwize Node, SVC/Storwize I/O Group, SVC/Storwize Subsystem | Average number of milliseconds it took to service each receive operation from another node in the local SVC cluster, for a particular component over a time interval. For SVC, this is the external response time of the transfers.
Total Port to Local Node Response Time (3.1.3) | 927 | SVC/Storwize Node, SVC/Storwize I/O Group, SVC/Storwize Subsystem | Average number of milliseconds it took to service each send or receive operation between another node in the local SVC cluster and a particular component over a time interval. For SVC, this is the external response time of the transfers.
Port to Local Node Send Queued Time (3.1.3) | 928 | SVC/Storwize Node, SVC/Storwize I/O Group, SVC/Storwize Subsystem | Average number of milliseconds that each send operation issued to another node in the local SVC cluster spent on the queue before being issued, for a particular component over a time interval.
Port to Local Node Receive Queued Time (3.1.3) | 929 | SVC/Storwize Node, SVC/Storwize I/O Group, SVC/Storwize Subsystem | Average number of milliseconds that each receive operation from another node in the local SVC cluster spent on the queue before being issued, for a particular component over a time interval.
Total Port to Local Node Queued Time (3.1.3) | 930 | SVC/Storwize Node, SVC/Storwize I/O Group, SVC/Storwize Subsystem | Average number of milliseconds that each operation issued to another node in the local SVC cluster spent on the queue before being issued, for a particular component over a time interval.
Port to Remote Node Send Response Time (3.1.3) | 931 | SVC/Storwize Node, SVC/Storwize I/O Group, SVC/Storwize Subsystem | Average number of milliseconds it took to service each send operation to a node in the remote SVC cluster, for a particular component over a time interval. For SVC, this is the external response time of the transfers.
Port to Remote Node Receive Response Time (3.1.3) | 932 | SVC/Storwize Node, SVC/Storwize I/O Group, SVC/Storwize Subsystem | Average number of milliseconds it took to service each receive operation from a node in the remote SVC cluster, for a particular component over a time interval. For SVC, this is the external response time of the transfers.
Total Port to Remote Node Response Time (3.1.3) | 933 | SVC/Storwize Node, SVC/Storwize I/O Group, SVC/Storwize Subsystem | Average number of milliseconds it took to service each send or receive operation between a node in the remote SVC cluster and a particular component over a time interval. For SVC, this is the external response time of the transfers.
Port to Remote Node Send Queued Time (3.1.3) | 934 | SVC/Storwize Node, SVC/Storwize I/O Group, SVC/Storwize Subsystem | Average number of milliseconds that each send operation issued to a node in the remote SVC cluster spent on the queue before being issued, for a particular component over a time interval.
Port to Remote Node Receive Queued Time (3.1.3) | 935 | SVC/Storwize Node, SVC/Storwize I/O Group, SVC/Storwize Subsystem | Average number of milliseconds that each receive operation from a node in the remote SVC cluster spent on the queue before being issued, for a particular component over a time interval.
Total Port to Remote Node Queued Time (3.1.3) | 936 | SVC/Storwize Node, SVC/Storwize I/O Group, SVC/Storwize Subsystem | Average number of milliseconds that each operation issued to a node in the remote SVC cluster spent on the queue before being issued, for a particular component over a time interval.
Port FCP Send Response Time (4.1.1) | 985 | ESS/DS6K/DS8K Port | Average number of milliseconds it took to service all send operations using the FCP protocol, for a particular port over a time interval.
Port FCP Receive Response Time (4.1.1) | 986 | ESS/DS6K/DS8K Port | Average number of milliseconds it took to service all receive operations using the FCP protocol, for a particular port over a time interval.
Overall Port FCP Response Time (4.1.1) | 987 | ESS/DS6K/DS8K Port | Average number of milliseconds it took to service all I/O operations using the FCP protocol, for a particular port over a time interval.
Port FICON Send Response Time (4.1.1) | 966 | ESS/DS6K/DS8K Port | Average number of milliseconds it took to service all send operations using the FICON protocol, for a particular port over a time interval.
Port FICON Receive Response Time (4.1.1) | 967 | ESS/DS6K/DS8K Port | Average number of milliseconds it took to service all receive operations using the FICON protocol, for a particular port over a time interval.
Overall Port FICON Response Time (4.1.1) | 968 | ESS/DS6K/DS8K Port | Average number of milliseconds it took to service all I/O operations using the FICON protocol, for a particular port over a time interval.
Port PPRC Send Response Time (4.1.1) | 969 | ESS/DS6K/DS8K Port, ESS/DS6K/DS8K Subsystem | Average number of milliseconds it took to service all send operations for Peer-to-Peer Remote Copy usage, for a particular port over a time interval.
Port PPRC Receive Response Time (4.1.1) | 970 | ESS/DS6K/DS8K Port, ESS/DS6K/DS8K Subsystem | Average number of milliseconds it took to service all receive operations for Peer-to-Peer Remote Copy usage, for a particular port over a time interval.
Overall Port PPRC Response Time (4.1.1) | 971 | ESS/DS6K/DS8K Port, ESS/DS6K/DS8K Subsystem | Average number of milliseconds it took to service all I/O operations for Peer-to-Peer Remote Copy usage, for a particular port over a time interval.

Transfer Sizes

Port Send Transfer Size | 866 | ESS/DS6K/DS8K Port, ESS/DS6K/DS8K Subsystem, SMI-S BSP Port | Average number of KB sent per I/O by a particular port over a time interval. Note: ESS/DS6K/DS8K Subsystem support requires v3.1.3 or above; SMI-S BSP Port requires v3.3.
Port Receive Transfer Size | 867 | ESS/DS6K/DS8K Port, ESS/DS6K/DS8K Subsystem, SMI-S BSP Port | Average number of KB received per I/O by a particular port over a time interval. Note: ESS/DS6K/DS8K Subsystem support requires v3.1.3 or above; SMI-S BSP Port requires v3.3.
Overall Port Transfer Size | 868 | ESS/DS6K/DS8K Port, ESS/DS6K/DS8K Subsystem, SMI-S BSP Port | Average number of KB transferred per I/O by a particular port over a time interval. Note: ESS/DS6K/DS8K Subsystem support requires v3.1.3 or above.
Port Send Packet Size | 869 | Switch Port, Switch | Average number of KB sent per packet by a particular port over a time interval.
Port Receive Packet Size | 870 | Switch Port, Switch | Average number of KB received per packet by a particular port over a time interval.
Overall Port Packet Size | 871 | Switch Port, Switch | Average number of KB transferred per packet by a particular port over a time interval.

Special Computed Values

Port Send Utilization Percentage (4.1.1) | 972 | ESS/DS6K/DS8K Port | Average amount of time that the port was busy sending data, over a particular time interval.
Port Receive Utilization Percentage (4.1.1) | 973 | ESS/DS6K/DS8K Port | Average amount of time that the port was busy receiving data, over a particular time interval.
Overall Port Utilization Percentage (4.1.1) | 974 | ESS/DS6K/DS8K Port | Average amount of time that the port was busy sending or receiving data, over a particular time interval.
Port Send Bandwidth Percentage (4.1.1) | 975 | ESS/DS8K Port, SVC Port, Switch Port, XIV Port | The approximate bandwidth utilization percentage for send operations by this port, over a particular time interval, based on its current negotiated speed. Note: XIV support available in v4.2.1.163.
Port Receive Bandwidth Percentage (4.1.1) | 976 | ESS/DS8K Port, SVC Port, Switch Port, XIV Port | The approximate bandwidth utilization percentage for receive operations by this port, over a particular time interval, based on its current negotiated speed. Note: XIV support available in v4.2.1.163.
Overall Port Bandwidth Percentage (4.1.1) | 977 | ESS/DS8K Port, SVC Port, Switch Port, XIV Port | The approximate bandwidth utilization percentage for send and receive operations by this port, over a particular time interval. Note: XIV support available in v4.2.1.163.
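The transfer-size and bandwidth-percentage entries above are derived values rather than raw counters. As a rough, illustrative sketch only (not the formula used by the product), transfer size can be viewed as data rate divided by I/O rate, and bandwidth percentage as data rate relative to the port's negotiated link speed; all input values below are hypothetical.

```python
# Illustrative sketch of the derived port metrics described above.
# All input values are hypothetical samples for one collection interval.

send_io_rate = 2_000            # Port Send I/O Rate, operations per second
send_data_rate_mb = 50.0        # Port Send Data Rate, MB (2^20 bytes) per second
negotiated_speed_gbps = 4       # current negotiated port speed in Gbit/s

# Transfer size: average KB sent per I/O operation.
send_transfer_size_kb = send_data_rate_mb * 1024 / send_io_rate

# Approximate bandwidth utilization: data rate relative to the nominal line rate.
# A precise figure would also account for Fibre Channel encoding and framing overhead.
nominal_mb_per_sec = negotiated_speed_gbps * 1000 / 8     # rough MB/s for the link
send_bandwidth_pct = 100.0 * send_data_rate_mb / nominal_mb_per_sec

print(f"{send_transfer_size_kb:.1f} KB per I/O, {send_bandwidth_pct:.1f}% of link bandwidth")
# -> 25.6 KB per I/O, 10.0% of link bandwidth
```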
Error Rates

Error Frame Rate | 872 | DS8K Port, DS8K Subsystem, Switch Port, Switch | The number of frames per second that were received in error by a particular port over a time interval. Note: DS8K support requires TPC v4.2.1.
Dumped Frame Rate | 873 | Switch Port, Switch | The number of frames per second that were lost due to a lack of available host buffers, for a particular port over a time interval.
Link Failure Rate | 874 | DS8K Port, DS8K Subsystem, SVC/Storwize Port, SVC/Storwize Node, SVC/Storwize I/O Group, SVC/Storwize Subsystem, Switch Port, Switch | The number of link errors per second that were experienced by a particular port over a time interval. Note: DS8K and SVC/Storwize support requires TPC v4.2.1.
Loss of Sync Rate | 875 | DS8K Port, DS8K Subsystem, SVC/Storwize Port, SVC/Storwize Node, SVC/Storwize I/O Group, SVC/Storwize Subsystem, Switch Port, Switch | The average number of times per second that synchronization was lost, for a particular component over a particular time interval. Note: DS8K and SVC/Storwize support requires TPC v4.2.1.
Loss of Signal Rate | 876 | DS8K Port, DS8K Subsystem, SVC/Storwize Port, SVC/Storwize Node, SVC/Storwize I/O Group, SVC/Storwize Subsystem, Switch Port, Switch | The average number of times per second that the signal was lost, for a particular component over a particular time interval. Note: DS8K and SVC/Storwize support requires TPC v4.2.1.
CRC Error Rate | 877 | DS8K Port, DS8K Subsystem, SVC/Storwize Port, SVC/Storwize Node, SVC/Storwize I/O Group, SVC/Storwize Subsystem, Switch Port, Switch | The average number of frames received per second in which the CRC in the frame did not match the CRC computed by the receiver, for a particular component over a particular time interval. Note: DS8K and SVC/Storwize support requires TPC v4.2.1.
Short Frame Rate | 878 | Switch Port, Switch | The average number of frames received per second that were shorter than 28 octets (24 header + 4 CRC), not including any SOF/EOF bytes, for a particular component over a particular time interval.
Long Frame Rate | 879 | Switch Port, Switch | The average number of frames received per second that were longer than 2140 octets (24 header + 4 CRC + 2112 data), not including any SOF/EOF bytes, for a particular component over a particular time interval.
Encoding Disparity Error Rate | 880 | Switch Port, Switch | The average number of disparity errors received per second, for a particular component over a particular time interval.
Discarded Class3 Frame Rate | 881 | Switch Port, Switch | The average number of class-3 frames per second that were discarded by a particular component over a particular time interval.
F-BSY Frame Rate | 882 | Switch Port, Switch | The average number of F-BSY frames per second that were generated by a particular component over a particular time interval.
F-RJT Frame Rate | 883 | Switch Port, Switch | The average number of F-RJT frames per second that were generated by a particular component over a particular time interval.
Primitive Sequence Protocol Error Rate | 988 | DS8K Port, DS8K Subsystem, SVC/Storwize Port, SVC/Storwize Node, SVC/Storwize I/O Group, SVC/Storwize Subsystem, Switch Port, Switch | The average number of primitive sequence protocol errors detected for a particular component over a particular time interval. Note: Added in TPC v4.2.1.
Invalid Transmission Word Rate | 989 | DS8K Port, DS8K Subsystem, SVC/Storwize Port, SVC/Storwize Node, SVC/Storwize I/O Group, SVC/Storwize Subsystem, Switch Port, Switch | The average number of transmission words per second that had an 8b/10b code violation in one or more of its characters; had a K28.5 in its second, third, or fourth character positions; and/or was an ordered set that had an incorrect Beginning Running Disparity. Note: Added in TPC v4.2.1.
Zero Buffer-Buffer Credit Timer | 990 | SVC/Storwize Port, SVC/Storwize Node, SVC/Storwize I/O Group, SVC/Storwize Subsystem | The number of microseconds for which the port has been unable to send frames due to lack of buffer credit since the last node reset. Note: Added in TPC v4.2.1.
Link Reset Transmitted Rate | 991 | DS8K Port, DS8K Subsystem, Switch Port, Switch | The average number of times per second a port has transitioned from an active (AC) state to a Link Recovery (LR1) state over a particular time interval. Note: Added in TPC v4.2.1.
Link Reset Received Rate | 992 | DS8K Port, DS8K Subsystem, Switch Port, Switch | The average number of times per second a port has transitioned from an active (AC) state to a Link Recovery (LR2) state over a particular time interval. Note: Added in TPC v4.2.1.
Out of Order Data Rate | 993 | DS8K Port, DS8K Subsystem | The average number of times per second that an out of order frame was detected for a particular port over a particular time interval. Note: Added in TPC v4.2.1.
Out of Order ACK Rate | 994 | DS8K Port, DS8K Subsystem | The average number of times per second that an out of order ACK frame was detected for a particular port over a particular time interval. Note: Added in TPC v4.2.1.
Duplicate Frame Rate | 995 | DS8K Port, DS8K Subsystem | The average number of times per second that a frame was received that was detected as previously processed, for a particular port over a particular time interval. Note: Added in TPC v4.2.1.
Invalid Relative Offset Rate | 996 | DS8K Port, DS8K Subsystem | The average number of times per second that a frame was received with a bad relative offset in the frame header, for a particular port over a particular time interval. Note: Added in TPC v4.2.1.
Sequence Timeout Rate | 997 | DS8K Port, DS8K Subsystem | The average number of times per second that the port detected a timeout condition on receiving sequence initiative for a Fibre Channel exchange, for a particular port over a particular time interval. Note: Added in TPC v4.2.1.

Appendix B Available Thresholds

This table lists the threshold name, the types of components for which each threshold is available, and a description. The SMI-S BSP device type mentioned in the table below refers to any storage subsystem that is managed via a CIMOM which supports SMI-S 1.1 with the Block Server Performance (BSP) subprofile. Thresholds that require specific versions of IBM Tivoli Storage Productivity Center are noted in parentheses.
Threshold (Metric) | Type | Device/Component Type | Description

Array Thresholds

Disk Utilization Percentage | 850 | ESS/DS6K/DS8K Array | Sets thresholds on the approximate utilization percentage of the arrays in a particular subsystem, i.e. the average percent of time that the disks associated with the array were busy. The Disk Utilization metric for each array is checked against the threshold boundaries for each collection interval. This threshold is enabled by default for ESS subsystems, and disabled by default for others. The default threshold boundaries are 80%, 50%, -1, -1. In addition, a filter is available for this threshold which will ignore any boundary violations if the Sequential I/O Percentage is less than a specified filter value. The pre-populated filter value is 80.
Total Backend I/O Rate (4.1.1) | 837 | ESS/DS6K/DS8K Array, SVC/Storwize MDisk, SVC/Storwize MDisk Group | Sets thresholds on the average number of I/O operations per second for array and MDisk read and write operations. The Total I/O Rate metric for each array, MDisk, or MDisk Group is checked against the threshold boundaries for each collection interval. This threshold is disabled by default.
Total Backend Data Rate (4.1.1) | 840 | ESS/DS6K/DS8K Array, SVC/Storwize MDisk, SVC/Storwize MDisk Group | Sets thresholds on the average number of megabytes (2^20 bytes) per second that were transferred for array or MDisk read and write operations. The Total Data Rate metric for each array, MDisk, or MDisk Group is checked against the threshold boundaries for each collection interval. This threshold is disabled by default.
Backend Read Response Time (4.1.1) | 841 | ESS/DS6K/DS8K Array, SVC/Storwize MDisk | Sets thresholds on the average number of milliseconds that it took to service each array and MDisk read operation. The Backend Read Response Time metric for each array or MDisk is checked against the threshold boundaries for each collection interval. Though this threshold is disabled by default, suggested boundary values of 35,25,-1,-1 are pre-populated. In addition, a filter is available for this threshold which will ignore any boundary violations if the Backend Read I/O Rate is less than a specified filter value. The pre-populated filter value is 5.
Backend Write Response Time (4.1.1) | 842 | ESS/DS6K/DS8K Array, SVC/Storwize MDisk | Sets thresholds on the average number of milliseconds that it took to service each array and MDisk write operation. The Backend Write Response Time metric for each array or MDisk is checked against the threshold boundaries for each collection interval. Though this threshold is disabled by default, suggested boundary values of 120,80,-1,-1 are pre-populated. In addition, a filter is available for this threshold which will ignore any boundary violations if the Backend Write I/O Rate is less than a specified filter value. The pre-populated filter value is 5.
Overall Backend Response Time | 843 | SVC/Storwize MDisk | Sets thresholds on the average number of milliseconds that it took to service each MDisk I/O operation, measured at the MDisk level. The Total Response Time (external) metric for each MDisk is checked against the threshold boundaries for each collection interval. This threshold is disabled by default. In addition, a filter is available for this threshold which will ignore any boundary violations if the Total Backend I/O Rate is less than a specified filter value. The pre-populated filter value is 10.
Backend Read Queue Time (4.1.1) | 844 | SVC/Storwize MDisk | Sets thresholds on the average number of milliseconds that each read operation spent on the queue before being issued to the backend device. The Backend Read Queue Time metric for each MDisk is checked against the threshold boundaries for each collection interval. Though this threshold is disabled by default, suggested boundary values of 5,3,-1,-1 are pre-populated. In addition, a filter is available for this threshold which will ignore any boundary violations if the Backend Read I/O Rate is less than a specified filter value. The pre-populated filter value is 5. Violation of these threshold boundaries means that the SVC deems the MDisk to be overloaded. There is a queue algorithm that determines the number of concurrent I/O operations that the SVC will send to a given MDisk. If there is any queuing (other than perhaps during a backup process), this suggests that performance can be improved by resolving the queuing issue.
Backend Write Queue Time (4.1.1) | 845 | SVC/Storwize MDisk | Sets thresholds on the average number of milliseconds that each write operation spent on the queue before being issued to the backend device. The Backend Write Queue Time metric for each MDisk is checked against the threshold boundaries for each collection interval. Though this threshold is disabled by default, suggested boundary values of 5,3,-1,-1 are pre-populated. In addition, a filter is available for this threshold which will ignore any boundary violations if the Backend Read I/O Rate is less than a specified filter value. The pre-populated filter value is 5. Violation of these threshold boundaries means that the SVC deems the MDisk to be overloaded. There is a queue algorithm that determines the number of concurrent I/O operations that the SVC will send to a given MDisk. If there is any queuing (other than perhaps during a backup process), this suggests that performance can be improved by resolving the queuing issue.
Peak Backend Write Response Time (4.1.1) | 951 | SVC/Storwize Node | Sets thresholds on the peak (worst) response time among all MDisk write operations by a node. The Backend Peak Write Response Time metric for each node is checked against the threshold boundaries for each collection interval. This threshold is enabled by default, with default boundary values of 30000,10000,1,-1. Violation of these threshold boundaries means that the SVC cache is having to "partition limit" for a given MDisk group, that is, the destage data from the SVC cache for this MDisk group is causing the cache to fill up (writes are being received faster than they can be destaged to disk). If delays reach 30 seconds or more, the SVC will switch into "short term mode", where writes are no longer cached for the MDisk group.
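The array threshold entries above share a common pattern: four boundary values and, in some cases, a filter metric with a pre-populated filter value. The sketch below is illustrative only, assuming the four boundaries are interpreted as critical stress, warning stress, warning idle, and critical idle, with -1 disabling a boundary and the filter suppressing violations when the filter metric falls below the filter value; it is not the product's alerting implementation.

```python
# Illustrative sketch of boundary checking for one collection interval, assuming
# the boundaries are (critical stress, warning stress, warning idle, critical idle)
# and -1 means "boundary disabled". Not TPC's actual implementation.

def check_threshold(value, boundaries, filter_value=None, filter_minimum=None):
    crit_stress, warn_stress, warn_idle, crit_idle = boundaries

    # Optional filter: suppress violations when the filter metric is too low
    # (for example, ignore Backend Read Response Time violations when the
    # Backend Read I/O Rate is below the pre-populated filter value of 5).
    if filter_minimum is not None and filter_value is not None:
        if filter_value < filter_minimum:
            return "ignored (filter)"

    if crit_stress != -1 and value >= crit_stress:
        return "critical stress"
    if warn_stress != -1 and value >= warn_stress:
        return "warning stress"
    if crit_idle != -1 and value <= crit_idle:
        return "critical idle"
    if warn_idle != -1 and value <= warn_idle:
        return "warning idle"
    return "normal"

# Backend Read Response Time with the suggested boundaries 35,25,-1,-1 and a
# filter on the Backend Read I/O Rate (filter value 5):
print(check_threshold(42.0, (35, 25, -1, -1), filter_value=3.0, filter_minimum=5))
# -> ignored (filter): only 3 read ops/s, below the filter value
print(check_threshold(42.0, (35, 25, -1, -1), filter_value=250.0, filter_minimum=5))
# -> critical stress
```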
Controller Thresholds

Total I/O Rate (overall) | 809 | SMI-S BSP Subsystem, ESS/DS6K/DS8K Controller, SVC/Storwize I/O Group, XIV Subsystem | Sets thresholds on the average number of I/O operations per second for read and write operations, for the subsystem controllers (clusters) or I/O Groups. The Total I/O Rate metric for each controller or I/O Group is checked against the threshold boundaries for each collection interval. These thresholds are disabled by default.
Total Data Rate | 821 | SMI-S BSP Subsystem, ESS/DS6K/DS8K Controller, SVC/Storwize I/O Group, XIV Subsystem | Sets thresholds on the average number of MB per second for read and write operations, for the subsystem controllers (clusters) or I/O Groups. The Total Data Rate metric for each controller or I/O Group is checked against the threshold boundaries for each collection interval. These thresholds are disabled by default.
Write-cache Delay Percentage | 832 | ESS/DS6K/DS8K Controller, SVC/Storwize Node | Sets thresholds on the percentage of I/O operations that were delayed due to write-cache space constraints. The Write-cache Full Percentage metric for each controller or node is checked against the threshold boundaries for each collection interval. This threshold is enabled by default, with default boundaries of 10, 3, 1, -1. In addition, a filter is available for this threshold which will ignore any boundary violations if the Write-cache Delay I/O Rate is less than a specified filter value. The pre-populated filter value is 10 I/Os per second.
Cache Holding Time | 834 | ESS/DS6K/DS8K Controller | Sets thresholds on the average cache holding time, in seconds, for I/O data in the subsystem controllers (clusters). Shorter time periods indicate adverse performance. The Cache Holding Time metric for each controller is checked against the threshold boundaries for each collection interval. This threshold is enabled by default, with default boundaries of 30, 60, -1, -1.
CPU Utilization (3.1.3) | 900 | SVC/Storwize Node | Sets thresholds on the average utilization percentage of the CPU(s) in the SVC nodes. The CPU Utilization metric for each node is checked against the threshold boundaries for each collection interval. This threshold is enabled by default, with default boundaries of 90,75,1,-1.
Non-Preferred Node Usage Percentage (4.1.1) | 949 | SVC/Storwize I/O Group | Sets thresholds on the Non-Preferred Node Usage Percentage of an I/O Group. This metric for each I/O Group is checked against the threshold boundaries at each collection interval. This threshold is disabled by default. In addition, a filter is available for this threshold which will ignore any boundary violations if the Total I/O Rate of the I/O Group is less than a specified filter value.

Port Thresholds

Total Port I/O Rate | 854 | ESS/DS6K/DS8K Port, SVC/Storwize Port (3.1.3), SMI-S BSP Port, XIV Port | Sets thresholds on the average number of I/O operations per second for send and receive operations, for the ports. The Total I/O Rate metric for each port is checked against the threshold boundaries for each collection interval. This threshold is disabled by default. Note: XIV support available in v4.2.1.163.
Total Port Packet Rate | 857 | Switch Port | Sets thresholds on the average number of packets per second for send and receive operations, for the ports. The Total Packet Rate metric for each port is checked against the threshold boundaries for each collection interval. This threshold is disabled by default.
Total Port Data Rate | 860 | ESS/DS6K/DS8K Port, SVC/Storwize Port (3.1.3), SMI-S BSP Port, Switch Port, XIV Port | Sets thresholds on the average number of MB per second for send and receive operations, for the ports. The Total Data Rate metric for each port is checked against the threshold boundaries for each collection interval. This threshold is disabled by default. Note: XIV support available in v4.2.1.163.
Overall Port Response Time | 865 | ESS/DS6K/DS8K Port | Sets thresholds on the average number of milliseconds that it took to service each I/O operation (send and receive) for ports. The Total Response Time metric for each port is checked against the threshold boundaries for each collection interval. This threshold is disabled by default.
Port to Local Node Send Response Time (4.1.1) | 925 | SVC/Storwize Node | Sets thresholds on the average number of milliseconds it took to service each send operation to another node in the local SVC cluster. The Port to Local Node Send Response Time metric for each node is checked against the threshold boundaries for each collection interval. This threshold is enabled by default, with default boundary values of 3,1.5,-1,-1. Violation of these threshold boundaries means that it is taking too long to send data between nodes (on the fabric), and suggests either congestion around these FC ports or an internal SVC microcode problem.
Port to Local Node Receive Response Time (4.1.1) | 926 | SVC/Storwize Node | Sets thresholds on the average number of milliseconds it took to service each receive operation from another node in the local SVC cluster. The Port to Local Node Receive Response Time metric for each node is checked against the threshold boundaries for each collection interval. This threshold is enabled by default, with default boundary values of 1,0.5,-1,-1. Violation of these threshold boundaries means that it is taking too long to send data between nodes (on the fabric), and suggests either congestion around these FC ports or an internal SVC microcode problem.
Port to Local Node Send Queue Time (4.1.1) | 928 | SVC/Storwize Node | Sets thresholds on the average number of milliseconds that each send operation issued to another node in the local SVC cluster spent on the queue before being issued. The Port to Local Node Send Queued Time metric for each node is checked against the threshold boundaries for each collection interval. This threshold is enabled by default, with default boundary values of 2,1,-1,-1. Violation of these threshold boundaries means that the node has to wait too long to send data to other nodes (on the fabric), and suggests congestion on the fabric.
Port to Local Node Receive Queue Time (4.1.1) | 929 | SVC/Storwize Node | Sets thresholds on the average number of milliseconds that each receive operation issued to another node in the local SVC cluster spent on the queue before being issued. The Port to Local Node Receive Queued Time metric for each node is checked against the threshold boundaries for each collection interval. This threshold is enabled by default, with default boundary values of 1,0.5,-1,-1. Violation of these threshold boundaries means that the node has to wait too long to receive data from other nodes (on the fabric), and suggests congestion on the fabric.
Port Send Utilization Percentage (4.1.1) | 972 | ESS/DS6K/DS8K Port | Sets thresholds on the average amount of time that ports are busy sending data. The Overall Port Busy Percentage metric for each port is checked against the threshold boundaries for each collection interval. This threshold is disabled by default.
Port Receive Utilization Percentage (4.1.1) | 973 | ESS/DS6K/DS8K Port | Sets thresholds on the average amount of time that ports are busy receiving data. The Overall Port Busy Percentage metric for each port is checked against the threshold boundaries for each collection interval. This threshold is disabled by default.
Port Send Bandwidth Percentage (4.1.1) | 975 | ESS/DS8K Port, SVC/Storwize Port, Switch Port, XIV Port | Sets thresholds on the average port bandwidth utilization percentage for send operations. The Port Send Utilization Percentage metric is checked against the threshold boundaries for each collection interval. This threshold is enabled by default, with default boundaries 85,75,-1,-1. Note: XIV support available in v4.2.1.163.
Port Receive Bandwidth Percentage (4.1.1) | 976 | ESS/DS8K Port, SVC/Storwize Port, Switch Port, XIV Port | Sets thresholds on the average port bandwidth utilization percentage for receive operations. The Port Receive Utilization Percentage metric is checked against the threshold boundaries for each collection interval. This threshold is enabled by default, with default boundaries 85,75,-1,-1. Note: XIV support available in v4.2.1.163.
Error Frame Rate | 872 | DS8K Port, Switch Port | Sets thresholds on the average number of frames per second received in error for the switch ports. The Error Frame Rate metric for each port is checked against the threshold boundary for each collection interval. This threshold is disabled by default. Note: DS8K support requires TPC v4.2.1.
Link Failure Rate | 874 | DS8K Port, SVC/Storwize Port, Switch Port | Sets thresholds on the average number of link errors per second experienced by the switch ports. The Link Failure Rate metric for each port is checked against the threshold boundary for each collection interval. This threshold is disabled by default. Note: DS8K and SVC/Storwize support requires TPC v4.2.1.
CRC Error Rate | 877 | DS8K Port, SVC/Storwize Port, Switch Port | Sets thresholds on the average number of frames received in which the CRC in a frame does not match the CRC computed by the receiver. The CRC Error Rate metric for each port is checked against the threshold boundary for each collection interval. This threshold is disabled by default. Note: Added in TPC v4.2.1.
Invalid Transmission Word Rate | 989 | DS8K Port, SVC/Storwize Port, Switch Port | Sets thresholds on the average number of bit errors detected on a port. The Invalid Transmission Word Rate metric for each port is checked against the threshold boundary for each collection interval. This threshold is disabled by default. Note: Added in TPC v4.2.1.
Zero Buffer-Buffer Credit Timer | 990 | SVC/Storwize Port | Sets thresholds on the number of microseconds for which the port has been unable to send frames due to lack of buffer credit since the last node reset. The Zero Buffer-Buffer Credit Timer metric for each port is checked against the threshold boundary for each collection interval. This threshold is disabled by default. Note: Added in TPC v4.2.1.

Appendix C DS3000, DS4000 and DS5000 Metrics

This table lists the metrics supported by DS3000, DS4000 and DS5000 subsystems, a description of each metric, and the reports that include it. Older DS3000, DS4000, and DS5000 subsystems managed by Engenio providers, e.g. 10.50.G0.04, only support a subset of the following metrics in their reports. Later levels of DS3000, DS4000, and DS5000 subsystems managed by LSI SMI-S Provider 1.3 and above, e.g. 10.06.GG.33, support more metrics; these are denoted by an asterisk. For more information regarding supported DS3000, DS4000, and DS5000 subsystems and their related providers, please see: http://www-01.ibm.com/support/docview.wss?rs=40&context=SSBSEX&q1=subsystem&uid=swg21384734&loc=en_US&cs=utf-8&lang=en
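In the table that follows, asterisks mark metrics and report types that are only available when the subsystem is managed by the newer LSI SMI-S Provider. The fragment below is a hypothetical illustration of filtering a metric list by provider capability; the flag and metric names are invented for the example and do not reflect the product's data model.

```python
# Hypothetical illustration: filter the Appendix C metric list by provider capability.
# "extended" marks the asterisked metrics that need LSI SMI-S Provider 1.3 or above.

METRICS = [
    {"name": "Read I/O Rate (overall)",    "extended": False},
    {"name": "Total Cache Hits (overall)", "extended": True},
    {"name": "Port Send Data Rate",        "extended": True},
]

def available_metrics(provider_supports_extended: bool):
    """Return the metric names a given provider level can report."""
    return [m["name"] for m in METRICS
            if provider_supports_extended or not m["extended"]]

print(available_metrics(provider_supports_extended=False))  # older Engenio provider
print(available_metrics(provider_supports_extended=True))   # LSI SMI-S Provider 1.3+
```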
Metric | Description | Report Type

Read I/O Rate (overall) | Average number of I/O operations per second for both sequential and non-sequential read operations, for a particular component over a time interval. | By Volume, By Controller*, By Subsystem, Controller Cache Performance*, Controller Performance*, Subsystem Performance, Top Active Volumes Cache Hit Performance, Top Volumes Data Rate Performance, Top Volumes I/O Rate Performance
Write I/O Rate (overall) | Average number of I/O operations per second for both sequential and non-sequential write operations, for a particular component over a time interval. | By Volume, By Controller*, By Subsystem, Controller Cache Performance*, Controller Performance*, Subsystem Performance, Top Active Volumes Cache Hit Performance, Top Volumes Data Rate Performance, Top Volumes I/O Rate Performance
Total I/O Rate (overall) | Average number of I/O operations per second for both sequential and non-sequential read and write operations, for a particular component over a time interval. | By Volume, By Controller*, By Subsystem, Controller Cache Performance*, Controller Performance*, Subsystem Performance, Top Active Volumes Cache Hit Performance, Top Volumes Data Rate Performance, Top Volumes I/O Rate Performance
Read Cache Hits (overall) | Percentage of cache hits for both sequential and non-sequential read operations, for a particular component over a time interval. | By Volume, By Controller*, By Subsystem, Controller Cache Performance*, Top Active Volumes Cache Hit Performance
* Write Cache Hits (overall) | Percentage of cache hits for both sequential and non-sequential write operations, for a particular component over a time interval. | By Volume*, By Controller*, By Subsystem*, Controller Cache Performance*, Top Active Volumes Cache Hit Performance*
* Total Cache Hits (overall) | Percentage of cache hits for both sequential and non-sequential read and write operations, for a particular component over a time interval. | By Volume*, By Controller*, By Subsystem*, Controller Cache Performance*, Top Active Volumes Cache Hit Performance*
Read Data Rate | Average number of megabytes (2^20 bytes) per second that were transferred for read operations, for a particular component over a time interval. | By Volume, By Controller*, By Subsystem, Controller Performance*, Subsystem Performance, Top Active Volumes Cache Hit Performance, Top Volumes Data Rate Performance, Top Volumes I/O Rate Performance
Write Data Rate | Average number of megabytes (2^20 bytes) per second that were transferred for write operations, for a particular component over a time interval. | By Volume, By Controller*, By Subsystem, Controller Performance*, Subsystem Performance, Top Active Volumes Cache Hit Performance, Top Volumes Data Rate Performance, Top Volumes I/O Rate Performance
Total Data Rate | Average number of megabytes (2^20 bytes) per second that were transferred for read and write operations, for a particular component over a time interval. | By Volume, By Controller*, By Subsystem, Controller Performance*, Subsystem Performance, Top Active Volumes Cache Hit Performance, Top Volumes Data Rate Performance, Top Volumes I/O Rate Performance
Read Transfer Size | Average number of KB per I/O for read operations, for a particular component over a time interval. | By Volume, By Controller*, By Subsystem
Write Transfer Size | Average number of KB per I/O for write operations, for a particular component over a time interval. | By Volume, By Controller*, By Subsystem
Overall Transfer Size | Average number of KB per I/O for read and write operations, for a particular component over a time interval. | By Volume, By Controller*, By Subsystem
* Port Send I/O Rate | Average number of I/O operations per second for send operations, for a particular port over a time interval. | By Port*, Port Performance*
* Port Receive I/O Rate | Average number of I/O operations per second for receive operations, for a particular port over a time interval. | By Port*, Port Performance*
* Total Port I/O Rate | Average number of I/O operations per second for send and receive operations, for a particular port over a time interval. | By Port*, Port Performance*
* Port Send Data Rate | Average number of megabytes (2^20 bytes) per second that were transferred for send (read) operations, for a particular port over a time interval. | By Port*, Port Performance*
* Port Receive Data Rate | Average number of megabytes (2^20 bytes) per second that were transferred for receive (write) operations, for a particular port over a time interval. | By Port*, Port Performance*
* Total Port Data Rate | Average number of megabytes (2^20 bytes) per second that were transferred for send and receive operations, for a particular port over a time interval. | By Port*, Port Performance*
* Port Send Transfer Size | Average number of KB sent per I/O by a particular port over a time interval. | By Port*, Port Performance*
* Port Receive Transfer Size | Average number of KB received per I/O by a particular port over a time interval. | By Port*, Port Performance*
* Overall Port Transfer Size | Average number of KB transferred per I/O by a particular port over a time interval. | By Port*, Port Performance*
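As a final illustration, the overall cache-hit percentage reported for these subsystems can be thought of as the read and write hit percentages weighted by their respective I/O rates. The sketch below uses hypothetical values; the subsystem reports the combined value directly.

```python
# Illustrative sketch: combine read and write cache hit percentages into an
# overall percentage, weighting each by its I/O rate. All values are hypothetical.

read_io_rate = 1_800        # Read I/O Rate (overall), ops/s
write_io_rate = 600         # Write I/O Rate (overall), ops/s
read_hit_pct = 92.0         # Read Cache Hits (overall), percent
write_hit_pct = 99.0        # Write Cache Hits (overall), percent

total_io_rate = read_io_rate + write_io_rate
total_hit_pct = (read_hit_pct * read_io_rate + write_hit_pct * write_io_rate) / total_io_rate

print(f"Total I/O Rate:   {total_io_rate} ops/s")
print(f"Total Cache Hits: {total_hit_pct:.1f}%")   # -> 93.8%
```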