Caché Performance Troubleshooting
Part II: The System
Vik Nagjee
Product Manager, Kernel Technologies

System Performance: Limiting Factors

System: system-wide metrics

Caché: system-wide metrics

Significance: Caché system-wide metrics
• What are your users experiencing?
• How busy is your database?
• How well is your application using database cache?
• How well is your disk system responding?

Collecting: system-level metrics (system-wide)
• PERFDAT
• sar | glance | nmon
• T4
• Resource and Performance Monitor
• iostat | vmstat
• MONITOR
• logman
• top | topas
• Process Explorer

Collecting: system-wide Caché metrics

Collecting Caché metrics: GLOSTAT
• %SYS>DO ^GLOSTAT

Collecting Caché metrics: ^pButtons
• %SYS>DO ^pButtons
• Installed in %SYS since 2008.2, but the latest version (currently 1.15c) is available at ftp://ftp.intersystems.com/pub/performance/
• Can be automated via TASKMGR
• Low overhead – logging data that’s already available.
• Documented in the Caché Monitoring Guide

The performance “button” report (^pButtons)

Notes on using ^pButtons
• Profiles are configurable:
  • Create custom duration and interval combinations
  • Add or delete from the OS level metric collection
• Collect the logs into one easy-to-use .html file: %SYS>DO Collect^pButtons
• Preview a currently running profile’s data: %SYS>DO Preview^pButtons(runid)
  • Available at any point while profile is running.
  • May result in some truncated data.
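
Choosing a duration and interval combination for a profile comes down to simple arithmetic. The following is a minimal planning sketch in Python, not part of ^pButtons: `samples_for_profile` is a hypothetical helper, and it assumes a profile can be described by a total run time and a fixed sampling interval.

```python
def samples_for_profile(duration_minutes: int, interval_seconds: int) -> int:
    """Estimate how many samples a run of the given length collects.

    Hypothetical planning helper; it does not call ^pButtons.
    """
    if duration_minutes <= 0 or interval_seconds <= 0:
        raise ValueError("duration and interval must be positive")
    # Whole samples that fit in the run; a partial trailing interval is ignored.
    return (duration_minutes * 60) // interval_seconds


if __name__ == "__main__":
    # Compare a 24-hour run sampled every 10 seconds with one sampled every 60.
    for interval in (10, 60):
        print(f"{interval}s interval: {samples_for_profile(24 * 60, interval)} samples")
```

A shorter interval gives finer resolution at the cost of a larger report, so it may help to estimate the sample count before defining a long-running profile.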

Collecting Caché metrics: Monitors
• Caché History Monitor – SYS.History
  • Collect Caché metrics and User-defined metrics over time
  • Stored in your Caché database
  • Query or export the data using a variety of methods
• Caché System Monitor – %Monitor.Health
  • Monitor the system health of your database
  • Alerts on abnormal metrics based on configurable criteria
  • Alerts from the System Monitor in cconsole.log:
    04/01/13-13:55:55:847 (13897) 1 [SYSTEM MONITOR] CPUusage Warning: CPUusage = 82 ( Warnvalue is 75)....(repeated 1 times)

Collecting Caché metrics: SNMP/WMI
• SNMP, WMI, WSMON
  • Documented in the Caché Monitoring Guide
  • Caché metrics are exposed via SNMP, WMI, or Web services
  • NOTE: Future focus is on SNMP
• Add CUSTOM application-specific metrics to be exposed
• Use your EXISTING network management infrastructure to collect and alert on Caché metrics, your application metrics and operating system metrics

System-level clues to performance issues
• CPU
  • Lack of processing cycles (0% CPU Idle)
  • Blocked processes (run queue or device queuing)
• Disk
  • Abnormal disk IO rate
  • Queuing on devices
  • Higher than normal latency on busy disk
• Memory
  • Lack of free memory
  • Hard page faults

Caché-level clues to performance issues
• GloRefs and/or RouCmds
  • Higher than normal?
    • Your app will be using more CPU…
    • Are there extraneous processes or more users?
  • Lower than normal?
    • Your app may be struggling with another problem (slow disk)
    • Concurrency issues
    • Blocked users upstream on the network

Caché-level clues to performance issues
• PhysBlkRds
  • Higher than normal?
    • Cache size doesn’t match current load
    • Use of CACHETEMP is forcing more disk reads for other data
  • Lower than normal?
    • Maybe that’s ok
    • App is struggling elsewhere such as lack of CPU cycles
    • If coupled with abnormally low GloRefs maybe disk latency issue

Application clues!
• All the above coupled with application-level clues lead to solutions:
  • Are users complaining?
  • Is the rate of application activity the same?
  • Are batch-jobs/print jobs/screen refreshes completing in a timely manner?
  • Are your interfaces queuing?

Comparing metrics – Load measure
[Chart: Users (0 to 500) over time, 10:00:00 to 10:20:00 in 40-second steps]

Comparing metrics – add App Metric
[Chart: Users and Accts Logged over the same period; annotated rates of 0.7, 0.8, 0.8, 0.9, and 0.8 per minute per user]

Comparing metrics – add Caché metric
[Chart: Users, Accts Logged, and GloRefs over the same period; two vertical axes, 0 to 500 and 0 to 1200]

Key points
• Many important metrics available for capture
• Capture the metrics at all times
• Many tools/methods for capturing metrics
• Include application-level metrics in your capture
• Analysis for capacity or troubleshooting begins with understanding your application’s effects on the system.

You can reach me at: vik@intersystems.com
Thanks for attending!
Q&A
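
The "Comparing metrics" slides relate a load measure (Users), an application metric (Accts Logged), and a Caché metric (GloRefs). A minimal sketch of that comparison in Python follows. All names (`Sample`, `per_user_rate`, `glorefs_per_unit`, `deviates`) are hypothetical, the 25% tolerance is an arbitrary illustrative choice, and the figures in the usage example are invented, not taken from the charts.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One capture interval; every field name here is hypothetical."""
    minutes: float     # length of the interval in minutes
    users: int         # load measure
    accts_logged: int  # application-level metric
    glorefs: int       # Caché metric (global references)


def per_user_rate(s: Sample) -> float:
    """Application work completed per minute per user."""
    if s.minutes <= 0 or s.users <= 0:
        raise ValueError("interval length and user count must be positive")
    return s.accts_logged / (s.minutes * s.users)


def glorefs_per_unit(s: Sample) -> float:
    """Global references spent per unit of application work."""
    if s.accts_logged <= 0:
        raise ValueError("no application work recorded in this interval")
    return s.glorefs / s.accts_logged


def deviates(value: float, baseline: float, tolerance: float = 0.25) -> bool:
    """True when value differs from baseline by more than the tolerance fraction."""
    return abs(value - baseline) > tolerance * baseline


if __name__ == "__main__":
    # Invented numbers: a normal interval and a later, suspect one.
    normal = Sample(minutes=5, users=400, accts_logged=1600, glorefs=3_200_000)
    suspect = Sample(minutes=5, users=400, accts_logged=1000, glorefs=3_200_000)
    base_rate = per_user_rate(normal)
    base_cost = glorefs_per_unit(normal)
    print("rate deviates:", deviates(per_user_rate(suspect), base_rate))
    print("cost deviates:", deviates(glorefs_per_unit(suspect), base_cost))
```

Normalizing by users and by application work is one way to tell "more load" apart from "same load, more effort per unit of work"; other normalizations may suit a given application better.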