CMG 2011 Highlights
John Slobodnik
April 17, 2012
CMG Canada

CMG 2011 Highlights
• All items presented today are derived from presentations made at the CMG 2011 International conference.
• Key points from a variety of presentations will be highlighted.
• Focus is on mid-range systems.

Network Capacity Planning
• OpNet Modeler
  – Can analyze simulated networks to predict the impact on end-to-end (ETE) response times.
  – Allows what-if scenarios using different technology designs.

Managing Virtual Systems
• Full Virtualization
  – Full virtualization of the hardware infrastructure (e.g. VMWare ESX) has poor performance vs. Power virtualization.
  – This is due to the hypervisor intercepting guest OS privileged instructions.
  – However, VMWare has improved performance with the latest version.
• Para-Virtualization
  – The guest OS is configured to run privileged instructions.
  – Virtual I/O is handled on a separate VM.

Managing Virtual Systems
• VMWare Tools
  – Installed on each VM guest
  – Full and Para-virtual environments
  – Get to the latest version of VMWare Tools
  – Improved performance: communication through a private channel to the hypervisor
• Active Memory Expansion (available in IBM Power 7)
  – Watch out for overhead if you are planning to use this.
  – Check the impact on your application first.

Managing Virtual Systems
• VMWare Tools
  – Comes with a ballooning driver that communicates with the hypervisor
  – Allocates pinned (non-shared) memory to the balloon module on the guest OS (the guest OS cannot access it) and gives it to the hypervisor
  – Recommendation: Set the balloonsize threshold
• Performance Reporting (at host level)
  – Look at paging for each guest
  – Check if any hypervisor paging exists

Planning for the Cloud
• ESJ article from October 2011 by David S. Linthicum
  – Recommends steps to follow if you are considering a cloud implementation.
  – Performance and Capacity Management is one of the steps specified.
  – http://esj.com/Articles/2011/10/31/10-StepsEnterprise-Cloud-Strategy.aspx?Page=1

Planning for the Cloud
• Step 1: Ignore the hype.
• Step 2: Create your business case.
• Step 3: Define your domains.
• Step 4: Define the useful cloud computing technology patterns for each domain.
• Step 5: Define your core requirements for each domain.
• Step 6: Define your security requirements.
• Step 7: Define your governance requirements.
• Step 8: Create test plans.
• Step 9: Create performance models.
  – SaaS, IaaS, and PaaS ... private, public, or hybrid
  – Will the use of cloud computing, including public and/or private clouds, provide the performance required to drive the business?
  – Create performance models, and conduct performance tests.
  – Keep in mind that you’ll undergo organic data growth over time, so make sure to plan for that.
• Step 10: Create a final migration plan.

Planning for the Cloud
• CloudSleuth is a free, partner-driven community web portal for IT professionals who are building and managing cloud applications, and a destination for cloud service providers and consumers.
• It provides information, resources and hands-on solutions regarding best practices, blueprints and tools for the cloud.
  – https://cloudsleuth.net/
• Free widget measures cloud provider performance.
  – http://www.compuware.com/application-performance-management/performance-analytics-cloudsleuth-freewidgets.html

Planning for the Cloud
• Problems:
  – On-demand: per-transaction latency rising from 2 ms to 50 ms, across 10 transactions, turns a 20 ms delay into a 500 ms delay
  – Resource pooling has a geographic issue: self-service could be pointed to Frankfurt
  – At 60% CPU Busy, %CPURDY increases: work is ready to run but no CPU is available.
• Availability and ETE response time must be monitored
  – Horizontal monitoring across layers.
• Tools on the market
  – Optier, Compuware DynaTrace, Correlsense, Strobe (mainframe)

Managing Windows
• Useful (FREE!) tools
  – Windows Resource Monitor app
  – WPT (Windows Performance Toolkit)
  – Shows last minute of data
  – CPU Busy = time between a thread's context switch in (CSwitchIn) and its context switch out (CSwitchOut)
  – Uses event-driven data, which is more accurate than traditional perfmon and Task Manager.
  – Used by IBM internally.
  – Resource Monitor application (resmon.exe) is available beginning in Vista and Windows Server 2008.
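A minimal sketch of the event-based CPU busy idea above. It is not the ETW, Resource Monitor or xperf implementation: busy time is simply accumulated from the moment a thread is switched in until the next context switch on the same CPU. The event tuple layout, the function name cpu_busy_percent, and the use of thread id 0 for the idle thread are assumptions made for illustration.

```python
from collections import defaultdict

IDLE_TID = 0  # assumption: thread id 0 represents the idle thread


def cpu_busy_percent(events, window_start, window_end):
    """Estimate CPU busy % per CPU from context-switch events.

    events: iterable of (timestamp_ms, cpu, new_tid), sorted by timestamp,
            where new_tid is the thread switched in at that time.
    Time before the first event seen on a CPU is ignored, because the
    running thread is unknown for that period.
    """
    busy = defaultdict(int)
    running = {}  # cpu -> (tid, switched_in_at)

    for ts, cpu, new_tid in events:
        prev = running.get(cpu)
        # The previous thread is switched out now; count its run time
        # unless it was the idle thread.
        if prev is not None and prev[0] != IDLE_TID:
            busy[cpu] += ts - prev[1]
        running[cpu] = (new_tid, ts)

    # Close intervals that are still open at the end of the window.
    for cpu, (tid, since) in running.items():
        if tid != IDLE_TID:
            busy[cpu] += window_end - since

    span = window_end - window_start
    return {cpu: 100.0 * busy[cpu] / span for cpu in running}


# Hypothetical one-second window on CPU 0: thread 101 runs 0-300 ms,
# idle 300-500 ms, thread 102 runs 500-1000 ms.
sample = [(0, 0, 101), (300, 0, 0), (500, 0, 102), (1000, 0, 0)]
print(cpu_busy_percent(sample, 0, 1000))  # -> {0: 80.0}
```

Because every switch is counted, short bursts between sampling ticks are not missed, which is the accuracy advantage over sampled counters that the slide refers to.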
  – The same event-based CPU busy calculations, using event data from ETW, can be done with the xperf command.
• Concurrency Visualizer
  – Shows thread locking and blocking.
  – Strobe-like tool
  – Use it only in DEV
  – Comes with Visual Studio 2010
  – Described at http://msdn.microsoft.com/en-us/magazine/ee336027.aspx and in the paper.
• Scenario
  – The Scenario class is an application response time monitoring tool for both Windows C++ native and .NET applications that is integrated with ETW.
  – The Scenario instrumentation library is a free download available from the MSDN Code Gallery site at: http://archive.msdn.microsoft.com/Scenario

Managing Windows
• Important metrics to measure for VMWare
• CPU
  – “Ready Time”
    ◦ Measured in milliseconds
    ◦ Is reported every 20 seconds
    ◦ Is work that is “In & Ready” and not yet dispatched
  – “esxtop” gathers metrics from the ESX host
    ◦ %rdy: Ready to run but no CPU available. If 5-7% is sustained it is likely a CPU bottleneck.
    ◦ %cstp: Has challenges scheduling multi-CPU work
    ◦ Elapsed Time = WAIT + RDY + CSTP + RUN
    ◦ Often see high WAIT, don’t worry
  – “esxtop” v4: cores/sockets are not separated
  – “esxtop” v5: separates core vs. socket vs. hyperthread metrics

Managing Windows
• Important metrics to measure for VMWare
• Memory
  – “esxtop”
    ◦ “esxtop” v5: hit function key to get NUMA metrics
    ◦ NRMEM (remote memory), NLMEM (local memory)
    ◦ N%L: percentage of memory that is NUMA-local, i.e. how effective memory placement is
  – VMWare rules of thumb (R.O.T.)
    ◦ Look at SW* metrics for swapping
    ◦ Reserve memory for JVMs
    ◦ Otherwise ESX contends with Java
    ◦ Increases swapping if memory is constrained
    ◦ Increases CPU at the ESX level
    ◦ The net effect is slower end-user response time.
• Network
  – %drptx: measures the percentage of transmit packets dropped.

Managing Windows
• Recommendation
  – The default Windows blocksize when creating a VM is bad, change it!
  – The case studies in the paper are good.
  – Paper on www.demandtech.com: select “Downloads”, then download “Measuring Processor Utilization in Windows”.

SAN Management
• A couple of heterogeneous tools for SAN management:
  – Virtual Instruments: Virtual Wisdom
  – Intellimagic

HELP DEVELOPERS FIND THEIR OWN PERFORMANCE DEFECTS
Reporting
#1: Duplicate Request Reports for SQL and Other Resources
• Paper discusses the methods available to do the reporting, from APM tools to homegrown logging (and how to set it up).
• SQL requests to the RDBMS are the primary focus, but any resource that is accessed over a network is a concern.
• Count the number of invocations of each unique SQL statement.
• Here are some examples of the types of requests that you should report on:
  – RDBMS (Relational Database Management System)
  – SOA (Service Oriented Architecture)
  – CICS (Customer Information Control System)
  – JMS (Java Message Service)
  – LDAP (Lightweight Directory Access Protocol)
  – Ajax or web
• JDBC logging tool: P6Spy

HELP DEVELOPERS FIND THEIR OWN PERFORMANCE DEFECTS
Reporting
#2: SQL Statement Efficiency Reports
1. Dissect the text of each SQL statement, highlighting various practices that are prone to poor performance:
  – Large number of tables in the FROM clause.
  – Large number of columns in the SELECT list.
  – Large number of criteria in the WHERE and other clauses.
2. Flag RDBMS data types prone to performance defects
  – BLOBs: “Binary Large Object”
  – CLOBs: “Character Large Object”
  – BLOBs are prone to performance problems because they often require multiple round trips to access. These round trips can significantly hurt performance, especially when multiple BLOB rows are returned in a single result set.
  – Simple queries to the database’s “catalog” tables quickly display the names of all tables that use these data types.
  – The VARCHAR data type can sometimes be used instead of CLOB and BLOB.
  – Paper talks about the details of how to determine what to use.
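A hedged sketch of the catalog check described in item 2. The paper's own queries are not reproduced in these slides, so this example uses SQLite as a stand-in: its catalog is the sqlite_master table plus PRAGMA table_info, whereas DB2 or Oracle would use their own catalog views. The function name tables_with_lob_columns and the demo schema are hypothetical.

```python
import sqlite3

LOB_TYPES = ("BLOB", "CLOB")


def tables_with_lob_columns(conn):
    """Return {table: [(column, declared_type), ...]} for BLOB/CLOB columns."""
    flagged = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        for _cid, column, declared_type, *_rest in conn.execute(
            f'PRAGMA table_info("{table}")'
        ):
            if declared_type.upper() in LOB_TYPES:
                flagged.setdefault(table, []).append(
                    (column, declared_type.upper())
                )
    return flagged


def demo():
    # Hypothetical schema: only "document" uses large-object types.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER, name VARCHAR(80))")
    conn.execute("CREATE TABLE document (id INTEGER, body CLOB, scan BLOB)")
    return tables_with_lob_columns(conn)


print(demo())  # -> {'document': [('body', 'CLOB'), ('scan', 'BLOB')]}
```

A report like this could be run daily so that newly added BLOB or CLOB columns are noticed before they reach production.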
3. Minimize SELECTs to “static” data
  – With the proper caching in place, SELECTs to certain RDBMS tables can be nearly eliminated.
  – A simple report, automatically generated every day, can easily highlight tuning opportunities by showing SELECTs to this set of tables.
  – “Static Table” list: all table names whose data is INSERTed, UPDATEd or DELETEd at most 12 times a day
4. Fetchsize Analysis
  – Shows how chatty the JDBC client is with the database server.
  – e.g. If your fetch size is 1 and your code iterates through 11 rows, then your client will make 11 round trips to the database (often over the network) to retrieve all the data.
  – If the fetch size is 20 and the result set has 40 rows, then 2 round trips are made.

HELP DEVELOPERS FIND THEIR OWN PERFORMANCE DEFECTS
Reporting
#3: Report for Duplicate HTTP Requests
• The free tool Fiddler2 is just one of many tools that can be used to easily trace traffic from a browser.
• Implementation is more complex than for SQL statements; it is outlined in the paper.
Automated Unit Testing
#4: No-load Response Time Regression
• SLAs apply to production; performance in test has many variables that can cause it to change.
• Recommendation: Track response time in unit tests from the very first day the system utters “Hello world.”
• That way, response time leaps in data collected over weeks and months can easily be attributed to project milestones, source code commits and changes in test data.

HELP DEVELOPERS FIND THEIR OWN PERFORMANCE DEFECTS
Automated Unit Testing
#5: Longevity Testing
• Memory leak tests are often done just before the app goes to production. Why not do them earlier?
• Application test load must be steady.
• The following should also remain steady:
  – JVM heap consumption
  – Operating system RAM consumption
  – CPU consumption
  – Response time

HELP DEVELOPERS FIND THEIR OWN PERFORMANCE DEFECTS
Automated Unit Testing
#6: Functional Tests Required for Efficiency
• Obscure parts of the system that are frequently used in production are often not tested enough.
• Validate DB indexes.
  – For DB2 use the db2advis tool
  – Oracle performance report called “awrsqrpt.sql”
• Validate caching hits, misses and expiration.
  – Paper discusses an SQL caching testing technique.

Best Practices
• Best practices need to be regularly re-evaluated; this is the value of CMG.
• Can order your own CD-ROM (or printed 2-volume set) with annual membership ($175).
• Access to the last 10 years of papers online.

• VMWare Vsphere Performance
  ◦ Jonathan Paul – Paper #1501 – Sessions 313/314
• Measuring Processor Utilization in Windows and Windows Applications
  ◦ Mark Friedman – Paper attached – Session 255
• Instrumentation Strategies for the Cloud
  ◦ David Halbig – Paper #1042 – Session 352
• Migrating Applications to the Cloud
  ◦ Peter Johnson – Paper #1033 – Session 454
• Help Developers (Finally) Find Their Own Performance Defects
  ◦ Erik Ostermueller – Paper #1133 – Session 424

John Slobodnik
Performance & Capacity Planner
Information Technology
Toronto Hydro
(416) 542-3100 ext. 30212
Mobile: (416) 903-0196
jslobodnik@torontohydro.com
capacityguy@gmail.com