Headroom: A Measure of a Server's Remaining Capacity

Prem S. Sinha, PhD
President and CEO, PerfCap Corporation
85 Perimeter Road, Nashua, NH 03063
www.PerfCap.com; Info@PerfCap.com; 603-594-0222

Challenges
• Proactive vs. reactive planning
• Large number of geographically dispersed systems
• Traditional capacity planning methodology takes too long
• Automated process, run on a daily basis:
  – Collect/consolidate data
  – Generate reports
  – Publish on a web site
  – Notify on a “need to know” basis

Capacity Planning
Definition: a process to determine how many computing resources are required to meet business growth, or how much the business can grow before some device runs out of capacity.
It answers “what if” questions such as:
  – Can my current configuration handle three times the current workload?
  – When will my current configuration saturate?
  – What will be the impact of a new application on current system performance?
  – What will be the impact of upgrading a current server or adding a new server?
  – Can I reduce the number of servers without violating my “Service Level Agreement” (a.k.a. server consolidation)?

Sizing Methods
[Figure: sizing methods compared by accuracy: real system benchmarks, simulation models, rules of thumb, analytic models, linear projections]

Capacity Planning via Modeling
Steps:
• Data collection
• Identifying peak interval(s)
• Workload characterization
• Model validation
• Saturation analysis
• “What if” analysis

Capacity Planning via Trending
[Figure: a performance metric (average or peak CPU utilization) plotted over time from January to December against a capacity limit; the gap between today's value and the limit is the remaining capacity]
• Simple to produce and follow
• Issues:
  – Defining the right capacity limit
  – Single vs. composite metric
  – End-user satisfaction

PAWZ Planner: Where Do You Want to Operate?
[Figure: response time versus workload (λ), where Response Time = Σ (Service Time + Queuing Time); the curve marks the current workload and the saturation point, with the headroom between them]

System Capacity: An Example
A trading system
[Figure: normalized response time (0 to 10) versus throughput (0 to 10,000 quotes/sec), where Response Time = Σ (Service Time + Queuing Time); today's operating point is marked and the curve is annotated “Significant performance degradation”]
• Headroom today is ~60%.
• System capacity is ~8,500 quotes/sec.

Each Day PAWZ Automatically Models Each System to Determine System Capacity
PAWZ: Saturation Analysis
[Figure: saturation analysis graph; the growth axis runs from −100% (no load) through 0% (the Nov. 25th load)]
• At 125% growth from the Nov. 25th load, the system will reach capacity.
• Headroom is 55% of capacity (capacity is 2.25 times the Nov. 25th load, so that load uses about 44% of capacity, leaving roughly 55% as headroom).

PAWZ Maintains a Trend of Daily Headroom to Forecast Its Decline to a Critical Level

Still Not Enough…
Problem: there is not enough time to examine response time curves or headroom trends for many systems.
Solution: risk analysis provides a high-level overview of the headroom trends of all systems.

Risk Analysis
• Need to know how soon headroom will reach a given level.
• The daily risk state is determined by the time left until headroom declines to user-defined thresholds.
• Risk states for multiple systems are displayed as counts by color status (red, amber, green).

Headroom Risk Analysis
[Figure: projected headroom versus time from the current state, marking where headroom crosses the headroom threshold and where it reaches 0, each with a lead time ahead of it]
• Amber status: the system is within the lead time of dropping below the headroom threshold.
• Red status: the system is within the lead time of exhausting its capacity.
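The headroom and risk-state rules on the preceding slides can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not PAWZ's implementation: it assumes headroom is the unused fraction of capacity (1 − current load / capacity), that the daily headroom trend is extrapolated with an ordinary least-squares straight line, that a single lead time applies to both crossings, and that “green” means neither amber nor red applies. All function names, the 20% threshold and the lead times are hypothetical.

```python
"""Hypothetical sketch of headroom and daily risk-state logic.

Assumptions (not taken from PAWZ): headroom = 1 - current load / capacity,
the daily headroom trend is a least-squares straight line, and one lead
time applies to both the threshold crossing and the zero-headroom crossing.
"""


def headroom_from_growth(growth_to_saturation):
    """Headroom as a fraction of capacity, given the workload growth
    (1.25 means 125%) at which the system reaches capacity."""
    capacity_multiple = 1.0 + growth_to_saturation  # capacity / current load
    return 1.0 - 1.0 / capacity_multiple


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b).
    Needs at least two distinct x values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def days_until(level, days, headroom):
    """Days after the last sample until the fitted trend falls to `level`.

    Returns 0.0 if the latest headroom is already at or below `level`,
    and None if the trend is flat or rising (it never gets there)."""
    if headroom[-1] <= level:
        return 0.0
    a, b = fit_line(days, headroom)
    if b >= 0:
        return None
    return max(0.0, (level - a) / b - days[-1])


def risk_state(days, headroom, threshold, lead_time_days):
    """Red: headroom projected to reach 0 within the lead time.
    Amber: headroom projected to cross `threshold` within the lead time.
    Green: neither (an assumption; the slides define only red and amber)."""
    to_zero = days_until(0.0, days, headroom)
    if to_zero is not None and to_zero <= lead_time_days:
        return "red"
    to_threshold = days_until(threshold, days, headroom)
    if to_threshold is not None and to_threshold <= lead_time_days:
        return "amber"
    return "green"


if __name__ == "__main__":
    # 125% growth to saturation, as on the saturation-analysis slide
    print(round(headroom_from_growth(1.25), 3))  # 0.556
    # Hypothetical history: headroom starts at 60% and loses 1 point per day
    days = list(range(10))
    headroom = [0.60 - 0.01 * d for d in days]
    for lead in (30, 35, 60):
        print(lead, risk_state(days, headroom, threshold=0.20, lead_time_days=lead))
```

Running it prints 0.556 (close to the 55% headroom quoted for 125% growth), then green, amber and red for lead times of 30, 35 and 60 days: the hypothetical trend crosses the 20% threshold about 31 days after the last sample and reaches zero about 51 days after it. A straight-line fit is the simplest trending choice; a real planner might weight recent days more heavily or use peak rather than average headroom.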
PAWZ Maintains a Trend of Daily Headroom to Forecast Its Decline to a Critical Level

Risk Analysis Summary
[Figure: risk analysis summary for a group of systems]

Headroom

[Figure: PAWZ deployment overview. Components: PAWZ Server, FindIT, intranet. Reports: asset reports (asset location, configuration change report, critical systems); daily and weekly health reports (NT/W2K); real-time and trending performance reports. Covered areas: applications, events, clusters, networks, storage. Platforms: HP-UX, Linux, Windows NT/2000/XP, Sun Solaris, Tru64 UNIX, IBM AIX, OpenVMS Cluster]

Automated Daily Performance Reports
• CPU usage by application
• Disk usage by drive
• Memory allocation by type
• Network traffic by protocol
• Dashboard-style summary by group
• Exceptions by type

Automated Daily Capacity Reports
• Historical CPU usage trend
• Daily capacity saturation graph
• Remaining headroom capacity trend
• Daily risk trend
• Web-based “what-if” analysis (before upgrade vs. after upgrade)

Summary
• Headroom provides a composite index to measure the remaining capacity of a server.
• By automating the process of
  – data collection
  – data consolidation
  – saturation analysis
  – risk analysis
  one can do capacity planning for a large number of servers in a timely manner.
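The risk analysis summary for a group of systems reduces each system's daily risk state to counts per color status. The Python sketch below shows that roll-up under assumed inputs: each system is a hypothetical (name, group, state) tuple whose state has already been computed, for example by trending its headroom. The `needs_attention` helper, which lists red systems before amber ones for “need to know” notification, is an illustration and not a documented PAWZ feature.

```python
from collections import Counter, defaultdict

STATES = ("red", "amber", "green")


def summarize_by_group(systems):
    """Count systems per color status within each group.

    `systems` is an iterable of (system_name, group_name, risk_state)
    tuples, where risk_state is one of STATES (an assumed input format).
    Groups appear in the order they are first seen."""
    counts = defaultdict(Counter)
    for _name, group, state in systems:
        counts[group][state] += 1
    # Counter returns 0 for missing keys, so every state is always reported
    return {group: {s: c[s] for s in STATES} for group, c in counts.items()}


def needs_attention(systems):
    """Names of red systems first, then amber ones, for "need to know" alerts."""
    rank = {"red": 0, "amber": 1}
    flagged = sorted(
        (rank[state], name) for name, _group, state in systems if state in rank
    )
    return [name for _rank, name in flagged]


if __name__ == "__main__":
    # Hypothetical fleet; system and group names are made up for illustration
    fleet = [
        ("web01", "Trading", "green"),
        ("db01", "Trading", "red"),
        ("app07", "Back office", "amber"),
        ("app08", "Back office", "green"),
    ]
    print(summarize_by_group(fleet))
    print(needs_attention(fleet))  # ['db01', 'app07']
```

For this fleet the summary is one red and one green system in “Trading” and one amber and one green in “Back office”, which is the kind of per-group count a dashboard can show without anyone opening individual response time curves.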