Key Metrics for Effective Storage Performance and Capacity Reporting

Abstract
Capacity management for storage can be difficult because of the many complex and varied technologies in use. Given all of the options available for a data storage strategy, a clear understanding of the architecture is important in identifying performance and capacity concerns. A technician looking at metrics on a server is often seeing only the tip of a storage iceberg. Which metrics are important will depend on your objectives and storage architecture, but response and space utilization will always be key to managing storage effectively.

Contents
• Storage Architecture
• Two distinct aspects of storage capacity
• Virtualization
• Key metrics from the host and backend storage view
• Reporting on what is most important

Space Capacity - History
Growth can result in increasing cost and complexity.

Two Distinct Aspects of Storage Capacity and Performance
• Storage throughput: response, IOPS
• Storage space

Space Capacity – Space Utilization
What does storage 'utilization' mean in your environment?
Factors include: RAID/DR, raw/configured, host/SAN, backups, compression, etc.

Space Capacity – Proactive Visibility
Alarm on key metric trends instead of current threshold breaches to get in front of problems before they happen.
Trending, forecasting, and exceptions.

Space Capacity – Trending
Understand the limitations of linear regression when trending and forecasting data.
[Chart above: high correlation]
[Chart below: low correlation]

Space Capacity – Showing Different Viewpoints
Business, Application, Host, Storage Array, Billing Tier

Space Capacity – Host Metrics
Metrics are typically available at the file system, volume, and logical disk views.
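As a rough illustration of the file system view, the sketch below reads total, used, and free space for one path with Python's standard `shutil.disk_usage`. The helper names `space_summary` and `filesystem_summary`, and the choice of the root path, are assumptions made for this example and do not come from the original slides.

```python
import os
import shutil


def space_summary(total_bytes, used_bytes, free_bytes):
    """Return the basic host-side space metrics for one file system."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return {
        "total_bytes": total_bytes,
        "used_bytes": used_bytes,
        "free_bytes": free_bytes,
        # Percent of total capacity currently in use.
        "percent_used": 100.0 * used_bytes / total_bytes,
    }


def filesystem_summary(path):
    """Collect the metrics for the file system that contains `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    # Free space is taken as reported rather than computed as total - used,
    # because some file systems reserve blocks and the two can differ.
    return space_summary(usage.total, usage.used, usage.free)


if __name__ == "__main__":
    # Hypothetical usage: summarize the root file system of this host.
    root = os.path.abspath(os.sep)
    print(root, filesystem_summary(root))
```

This only covers what the host can see. Whether the same space is thin provisioned, compressed, or protected by RAID on the backend is not visible from this view.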
Key metrics for space capacity from the host perspective are typically:
• Storage allocated to system (disks)
• Allocated but not configured (volumes)
• Space used or free (file systems)

Space Capacity – Array Metrics
[Diagram: NetApp Aggregate]
Key metrics for space capacity from the array perspective depend on the technology and how it is being used. However, as with the host view, total capacity and space available are key metrics:
• Storage installed in arrays (disks)
• Configured but not allocated (aggregates)
• Space used or free (volumes)
Storage arrays can have many space-related metrics at different levels.

Virtual Environments and Clusters
Managing storage in a clustered and/or virtual environment can be challenging because it is shared among all hosts and virtual machines running on it.
Image Source: VMware.com
• Thin provisioning
• Storage viewed at many levels
• Could be different tiers allocated to the same cluster
• Overhead at various points

Storage Virtualization
Pooling physical storage from multiple sources into logical groupings
• Can be a centralized source for collecting data
http://www.networkmagazineindia.com/200207/vendor.shtml
There is a wide variety of techniques for virtualizing storage; be aware of the implications for data collection and reporting.

Performance Capacity – Response Impacts
SAN or storage array performance problems can be identified at the host or in the backend storage environment. Response is the key metric for performance evaluation:
• Host I/O response
• Fabric or network response
• Virtualization device response
• Array response
High response times are typically caused by insufficient throughput capacity.

Performance Capacity – Host Metrics
Understand the limitations of certain host metrics:
• Measured response is the best metric for identifying trouble.
• Host utilization only shows busy time; it does not give capacity for the SAN.
• Physical I/O rate is an important measure of throughput; all disks have their limits.
• Queue length is a good indicator that a limitation has been reached somewhere.

Performance Capacity – Host Metrics
100% host disk utilization can indicate high throughput, but ample backend capacity might still be available, as was the case here.

Performance Capacity – Host Metrics
Queue lengths from the previous high utilization chart indicate that utilization may not currently be impacting response, but headroom is unknown.

Performance Capacity – Host Metrics
I/O generated from the previous high utilization chart is shown here, where combined throughput peaks are very high.

Performance Capacity – Host Metrics
Spikes in throughput typically correlate with queues and response for simple disk configurations, as seen in the chart. Most disk configurations are no longer simple, however, which means these metrics often do not correlate.

Performance Capacity – Array Architecture
• Front End Processors
• Shared Cache
• Back End Processors
• Disk Storage

Performance Capacity – Array Metrics
Front end processors are typically the first to bottleneck. This chart shows acceptable utilization.

Performance Capacity – Array Metrics
Find arrays doing the most work with throughput metrics.
[Chart: EMC-All Array IOPs. Series: EMC-000, EMC-001, EMC-002, EMC-003, EMC-004. Y-axis: Intellimagic EMC Volume IO/sec, 0 to 45000. X-axis: 10/19/2012 to 10/22/2012.]

Performance Capacity – Array Metrics
Aggregating and trending key metrics can be useful, as the EMC-Array Total IOPs Trend chart shows.
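The aggregate-and-trend idea on this slide can be sketched in a few lines. The example below sums per-array IOPS interval by interval, fits a least-squares line, and reports the correlation coefficient so that a weak fit is visible before anyone relies on the forecast. The sample figures, array names, and function names are hypothetical and chosen only for illustration; this is a minimal sketch, not the method used to produce the charts in this deck.

```python
from math import sqrt


def aggregate_iops(per_array_samples):
    """Sum IOPS across arrays for each sampling interval."""
    return [sum(values) for values in zip(*per_array_samples.values())]


def least_squares_fit(xs, ys):
    """Return (slope, intercept, r) for a simple linear regression of ys on xs."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # r near +1 or -1 means the line describes the data well;
    # r near 0 means a linear forecast should not be trusted.
    r = sxy / sqrt(sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r


def forecast(slope, intercept, x):
    """Extrapolate the fitted line to a future interval x."""
    return slope * x + intercept


# Hypothetical hourly IOPS samples for two arrays.
samples = {
    "array-a": [1000, 1100, 1200, 1300],
    "array-b": [500, 550, 600, 650],
}
total = aggregate_iops(samples)
hours = list(range(len(total)))
slope, intercept, r = least_squares_fit(hours, total)
projected = forecast(slope, intercept, 6)
```

Checking `r` before extrapolating is one way to respect the limitations of linear regression noted earlier: a forecast drawn from a low-correlation fit can be worse than no forecast.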
[Chart: EMC-Array Total IOPs Trend. Series: Max IOPs IO/sec, least squares fit. Y-axis: IO/sec, 0 to 30000. IO/sec for EMC-000 between 20/10/2012 and 22/10/2012, extrapolated until 27/10/2012; 72 raw data points.]

Performance Capacity – Array Metrics
Knowing what is generating the IOPS can also be important.
[Chart: EMC-Top 10 Volumes for All Array IOPs. Y-axis: Intellimagic EMC Volume IO/sec, 0 to 14000. X-axis: 10/19/2012 to 10/22/2012. Series:
EMC-000,rnk-0001,vol-00304
EMC-000,rnk-0001,vol-00321
EMC-001,rnk-0018,vol-03614
EMC-001,rnk-0020,vol-03437
EMC-001,rnk-0020,vol-04389
EMC-003,rnk-0033,vol-08738
EMC-003,rnk-0033,vol-08739
EMC-003,rnk-0033,vol-08744
EMC-004,rnk-0051,vol-10396
EMC-004,rnk-0051,vol-10409]

Performance Capacity – Storage Virtualization Metrics
Key metrics are also available from virtualization devices. This chart shows the top 10 IBM SVC volumes for throughput.
[Chart: IBM SVC Top 10 Volumes. Y-axis: Intellimagic Volume, SVC-006 Total op/sec, 0 to 3500. Series:
rnk-0217,vol-00926
rnk-0218,vol-00678
rnk-0218,vol-00691
rnk-0218,vol-00974
rnk-0229,vol-00451
rnk-0229,vol-00578
rnk-0229,vol-00648
rnk-0229,vol-00757
rnk-0229,vol-00910
rnk-0229,vol-01082]

Performance Capacity – Storage Virtualization Metrics
This is another example of aggregating and trending, although this particular SVC data sample is not a good real-world example.
[Chart: IBM SVC Volume IOPs Trend. Series: Total op/sec, least squares fit, 90% upper confidence limit, 90% lower confidence limit, alarm. Y-axis: 0 to 10000. Total op/sec for SVC-006,rnk-0229,vol-00451 between 18/10/2011 and 19/10/2011, extrapolated until 21/10/2011; y = 2010x + 914; 97 raw data points.]

Performance Capacity – Storage Virtualization Metrics
The key metric for performance evaluation is response. Other metrics are important too, but are typically used to avoid or troubleshoot high response times.
Storage devices can have many performance metrics at different levels.

Performance Capacity – Array Metrics
[Charts: response for NetApp and EMC arrays]

Performance Capacity – Component Breakdown
Service time versus response time: these are different metrics.
[Chart: IO Response]
The bar chart shows service times as blue and green, with queue time represented as red and yellow. Response is the combination of service and queue time.

Performance Capacity – Workload Profiles
Application type is important in estimating performance risk.

Performance Capacity – Scorecards and Exceptions

Performance Capacity – Dashboards
At-a-glance view of important metrics for critical resources

Storage Key Metrics – Conclusions
• Knowledge of your storage architecture is critical
• Understand both storage space and throughput
• Consider all factors that affect storage space utilization
• Be aware of virtualization and clustering complexities
• Know key metrics and their limitations
• Start with key report types and areas that are most important

Thank you for attending