Fixing Intermittent Application Performance Problems

Turning Hours into Minutes

Intermittent performance problems are among the most frustrating issues IT administrators face. The apparently random nature of occurrences, the near-impossibility of reproducing all conditions, and the complexity of applications and their dependencies make it difficult to diagnose root causes and resolve an issue quickly. Administrators commonly resort to time-consuming manual event log searches and trial-and-error attempts to reproduce and diagnose problems. What they really need is a more efficient approach that puts an end to such a resource-intensive and frustrating process, one that provides the tools and the insight to identify and resolve intermittent performance problems in a hurry.

Combating Inefficiencies

System administrators are trained problem solvers, but it really helps to have some idea where to start. Unfortunately, users usually can’t give you much information to go on. They are likely to report only general symptoms, such as a slow application or an inexplicable disconnect, and they are unlikely to know whether an intermittent performance problem is the cause.

Intermittent problems are especially difficult to diagnose because they don’t last long enough to examine and solve. They appear, wreak havoc, and vanish. The only trace evidence is captured in event logs, often buried in large volumes of hard-to-connect data. Finding such evidence can be as difficult as finding a needle in a haystack. The root cause of the problem can be hidden almost anywhere because most business applications are complex entities that interact with multiple resources, such as databases, web servers, directory services, and the network itself. That complexity forces the administrator through a slow, labor-intensive investigative process that can delay other daily tasks and projects.
Without a clear lead to the cause of the problem, the system administrator is forced to review event logs from every part of the application environment. That process requires analyzing long lists of events in multiple logs, item by item, to find an outstanding event, error condition, or combination of conditions that correlates with the timeframe in which users began to complain. To isolate the problem, the system administrator may attempt to write test scripts, but complex applications can have such a wide variety of conditions that this trial-and-error process can take many hours, and those hours can turn into weeks.

In organizations large enough to employ specialized administrators for various functions, a database administrator (DBA) or network administrator might be called in to help isolate a problem, following the same onerous analytical process as the system administrator. Scheduling another administrator with a full workload to participate in this process, without proof that the problem lies in his or her area, can consume a lot of time unnecessarily. And if the issue requires the collaboration of multiple groups, additional time will be wasted before the issue is resolved.

Even after a fix, if the problem recurs, business operations will be disrupted again, user complaints will continue to roll in, and the pressure will build to find a more efficient solution, one that helps administrators overcome the challenges and expense of intermittent performance problems.

3 Common Underlying Causes That Complicate Intermittent Problem Resolution

The sporadic nature of a performance problem often makes diagnosis difficult. Keep your eyes open for these three common underlying causes of intermittent problems:

• Memory Leak: Microsoft Internet Information Services (IIS) for Windows® Server is a flexible, secure and manageable web server. Applications commonly pull information from this server into local memory to process. After processing, the application should release that memory, but that doesn’t always happen. The result is overloaded random access memory (RAM) that cannot accept more data. The system is then typically forced to page to disk, which leaves the central processing unit (CPU) waiting on memory rather than doing useful work and in turn reduces application performance.

• Long-Running Query: Seemingly innocent acts such as changing a schema or updating a stored procedure can introduce blocking that results in a very long-running SQL query. Such queries sap CPU and memory and slow application performance.

• Wireless Latency: Many applications, and especially older applications, are not tolerant of the inherent latency in wireless networks. When these applications interact with databases and web servers over a wireless network, they might not achieve the response time that the application requires. The result is a timed-out application and an annoyed user.

The most useful Application Performance Monitoring tools can help diagnose these causes, and other sources, of intermittent performance problems quickly and accurately.

Application Performance Monitoring Speeds Resolution

The greatest challenge in diagnosing an intermittent performance problem is knowing where to begin. Is the root cause in the application itself, or in a database or web server? Or is it a network issue? Without a valid starting point, there is no way to select the right diagnostic path and conduct an efficient analysis.

An Application Performance Monitoring (APM) solution that lets administrators link all application dependencies can overcome this impasse. In this environment, targeted, real-time monitoring immediately puts administrators on the right diagnostic path, and clear graphic displays make it easy to follow that path to the root cause of the performance problem.

Create Profiles

Figure 1 is an example of an IT environment for a customer relationship management (CRM) service. It includes multiple applications running on multiple servers.
Any issue that affects the performance of any application in this environment, such as a database problem or a Microsoft® Internet Information Services (IIS) server problem, can affect the performance of the CRM service. Consequently, diagnosing an intermittent performance problem in the CRM service requires monitoring all the elements that make up that service and presenting that information in a high-level view that provides a starting point for analysis.

Figure 1. The CRM Service IT infrastructure consists of multiple components.

An Application Performance Monitor uses application profiles to diagnose intermittent application performance problems. Application profiles define how an application is monitored and what actions should be taken when an application or one of its components fails. The most useful Application Performance Monitors not only include application profiles, they also let administrators define complex relationships and dependencies, from simple n-tier applications to large server farms and on to complete IT services.

For example, a SQL server farm is one of the components that requires monitoring in the CRM service shown in Figure 1. A typical SQL server configuration consists of an online transaction processing (OLTP) server, a staging server, and a data warehousing SQL Server. An administrator creates an application profile and assigns it to monitor each SQL server instance. Individual profiles are then embedded into a higher-level profile to monitor the entire SQL server farm (Figure 2). Once the server farm profile is created, it can be embedded into an even higher-level profile that encompasses the entire CRM service (Figure 3). Replicating this process for each IT service component, such as the IIS server farm, creates a comprehensive service profile that’s the key to a quick diagnosis of a problem’s root cause.
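The embedding of instance profiles into a farm profile, and of farm profiles into a service profile, can be pictured as a simple tree. The Python sketch below is only an illustration of that idea, not how any particular monitoring product stores its profiles. The `Profile` class, its fields, and the "IIS server 1" entry are hypothetical names, and one server is marked unhealthy purely to show how a failure rolls up to the service level and how a drill-down finds it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Profile:
    """Hypothetical profile: a single monitored component or a group of embedded profiles."""
    name: str
    healthy: bool = True  # only consulted for profiles with no embedded children
    children: List["Profile"] = field(default_factory=list)

    def is_up(self) -> bool:
        # A group profile is up only when every embedded profile is up.
        if self.children:
            return all(child.is_up() for child in self.children)
        return self.healthy

    def failing_path(self) -> List[str]:
        """Drill down from this profile to the first failing component."""
        if self.is_up():
            return []
        for child in self.children:
            if not child.is_up():
                return [self.name] + child.failing_path()
        return [self.name]


# Instance-level profiles (the failure here is made up for illustration).
oltp = Profile("OLTP server")
staging = Profile("Staging server")
warehouse = Profile("Data Warehouse server", healthy=False)

# Embed instance profiles into farm profiles, then farms into the service profile.
sql_farm = Profile("SQL server farm", children=[oltp, staging, warehouse])
iis_farm = Profile("IIS server farm", children=[Profile("IIS server 1")])
crm = Profile("CRM service", children=[sql_farm, iis_farm])

print(" > ".join(crm.failing_path()))
```

Because each level only asks its children whether they are up, the same two methods work unchanged whether a profile wraps one server, a farm, or an entire service.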
The profile ensures that the administrator can view the status of the entire service or drill down to any component within that service, down to a specific instance or component of an application. The resulting comprehensive service monitoring profile is the foundation for fast, accurate problem diagnoses.

Figure 2. The server application profile is embedded into the server farm profile.

Figure 3. The server farm profile is embedded into the CRM service profile.

Diagnose Problems

Completing a service profile such as the CRM example generally takes less than two hours. But after that small investment in time, the process of diagnosing an intermittent performance problem can be collapsed from hours, days and weeks of time and frustration into a straightforward process that takes just minutes. Multiply this by the number of application performance incidents you receive now, and the amount of time you’ll be gaining back in your day can be considerable.

For example, the Application Performance Monitor might issue a real-time alert that Microsoft Dynamics is down. The administrator can then quickly view a high-level profile of the CRM service and see that there is a problem in the supporting SQL server farm (Figure 4). The problem area is shown in red, making it easy to identify.

Figure 4. In this example, you can quickly determine that Microsoft Dynamics is down due to a problem in the supporting server farm.

The administrator can then drill down into the SQL server farm data and see that the problem lies in the Data Warehouse server (Figure 5). Isolating the problem to a single server illuminates the next step in the diagnostic process, which is a detailed look at the Data Warehouse server status. Click on that red square to learn more.

Figure 5. Drill-down analysis shows there’s a problem with the Data Warehouse Server.

Clicking on the red square in Figure 5 takes you to the Data Warehouse server status (Figure 6).
You can immediately see that the root cause of the problem experienced by the user as a Microsoft Dynamics failure is actually a memory issue in the Data Warehouse server. With only a few mouse clicks you’ve determined the root cause of the Microsoft Dynamics failure.

Figure 6. The Data Warehouse Server profile reveals the root cause of the problem.

A historical graph of the Memory Manager metrics (Figure 7) offers further evidence: the appearance of the memory issue correlates with user complaints of Microsoft Dynamics failures.

Figure 7. Intermittent Microsoft Dynamics failures correlate with Data Warehouse Server memory spikes over time.

But what happens if a user reports a Microsoft Dynamics performance problem, and a quick look at the high-level CRM service profile reveals that all the system components are working properly? In this case, the problem very likely lies in the network itself. Many applications were not designed with wireless networks in mind. These applications tend to be intolerant of the delays and latencies that wireless can introduce. As a result, wireless users can experience timeouts, disconnects, and data loss.

Expanding Application Performance Monitoring capabilities to the network can help administrators identify the root cause of a network problem as easily as in the example above. To determine whether the Microsoft Dynamics performance problem has a network root cause, for example, an administrator can start by determining whether the user was on a wired or wireless network. If the answer is a wireless network, one obvious culprit is an oversubscribed access point. A quick look at a graphic showing access point usage confirms that the access point is indeed oversubscribed (Figure 8). But knowing that doesn’t solve the problem. Why is the access point oversubscribed? If this is an intermittent problem, what is the cause?
Is network traffic from a monthly meeting in a nearby conference room temporarily overloading the access point? Or are employees disregarding IT policy and overloading the access point by streaming unauthorized applications such as Pandora or YouTube? Determining the root cause is critical to implementing a cost-effective solution. Solving the problem by more strictly enforcing bandwidth usage policies might make the purchase of additional access points unnecessary.

Figure 8. An oversubscribed access point is related to the performance problem.

Through the same drill-down process used to isolate the SQL server farm problem, an administrator can determine who was using the access point when Microsoft Dynamics experienced the performance problem, and how much bandwidth each person was using. Figure 9 shows that one person was clearly using an excessive amount of bandwidth. A final drill-down shows that the vast majority of the bandwidth consumed by that user was taken by an unclassified application, most likely one that is not allowed by IT policy (Figure 10).

Figure 9. The administrator can view access point usage by user and bandwidth.

Figure 10. An unclassified application is consuming an excessive amount of bandwidth.

Automated Remediation

Once the root causes of intermittent application performance problems are identified, system administrators can use the Application Performance Monitor to create multi-step action policies that address future occurrences more quickly. Action policies can include event logging, real-time alerts, and PowerShell self-healing scripts that perform actions such as a reboot or a service restart. Action policies can be assigned at the service, application, and component level. Dependency-aware application profiles enable coordinated multi-tier action policies to ensure optimal performance of complex applications and IT services.
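A multi-step action policy of the kind described above can be thought of as an ordered list of steps that runs when a monitored value crosses a threshold. The Python sketch below is a minimal illustration under that assumption. The `Event` and `ActionPolicy` classes, the step functions, the `memory_used_pct` metric name, and the 90 percent threshold are all hypothetical, and the restart step only records what it would do rather than invoking a real self-healing script.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Event:
    """Hypothetical monitoring sample for one component."""
    component: str
    metric: str
    value: float


# A step receives the triggering event and a journal it can append to.
Step = Callable[[Event, List[str]], None]


def log_event(event: Event, journal: List[str]) -> None:
    journal.append(f"LOG {event.component}: {event.metric}={event.value}")


def send_alert(event: Event, journal: List[str]) -> None:
    journal.append(f"ALERT {event.component} needs attention")


def restart_service(event: Event, journal: List[str]) -> None:
    # Stand-in for a self-healing script; a real policy would launch it here.
    journal.append(f"RESTART {event.component}")


@dataclass
class ActionPolicy:
    """Hypothetical policy: run every step, in order, when the threshold is met."""
    name: str
    threshold: float
    steps: List[Step]

    def handle(self, event: Event, journal: List[str]) -> bool:
        if event.value < self.threshold:
            return False  # nothing to do for a healthy reading
        for step in self.steps:
            step(event, journal)
        return True


journal: List[str] = []
policy = ActionPolicy(
    name="memory-pressure",
    threshold=90.0,  # illustrative value, not a recommendation
    steps=[log_event, send_alert, restart_service],
)

policy.handle(Event("Data Warehouse server", "memory_used_pct", 97.0), journal)
policy.handle(Event("OLTP server", "memory_used_pct", 41.0), journal)

print("\n".join(journal))
```

Keeping each step as a plain function means the same policy object could be attached at the service, application, or component level simply by choosing which events are passed to `handle`.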
Widespread Business Benefits

An Application Performance Monitoring tool can streamline the resolution of intermittent problems, turning many hours of exasperating work into a few highly productive minutes. The most successful solutions feature:

• A unified view of the complete IT infrastructure and applications
• Application profiles that can be customized to include all dependencies
• The option to create multi-step action policies to automatically address future instances of the same intermittent conditions

These capabilities allow administrators to diagnose intermittent performance problems quickly and accurately, whether they reside in a device or in the network itself.

The ability to customize profiles can also foster cooperation throughout IT. For example, a system administrator can easily ask a database administrator to recommend areas to monitor that have historically tended to cause problems. Working cooperatively with common priorities and shared information shortens the diagnostic cycle and streamlines the process to save administrator time and reduce user complaints.

About the Network Management Division of Ipswitch, Inc.

The Network Management Division of Ipswitch, Inc. is the force behind WhatsUp Gold, the integrated suite of IT management solutions. Over 150,000 networks at large, mid-sized and small enterprises depend on WhatsUp Gold for comprehensive network, system, application and end-user experience monitoring in physical, virtual and wireless infrastructure environments.

For 21 years, Ipswitch has developed easy-to-use and affordable products that help IT managers worldwide improve their ability to provide services that drive business success. Ipswitch is headquartered in Lexington, MA and has offices in Atlanta and Augusta, GA, Madison, WI, and American Fork, UT, as well as international offices in the Netherlands and Japan.

To learn more about WhatsUp Gold, please visit: http://www.whatsupgold.com/products/download/

Ipswitch, Inc.
83 Hartwell Avenue
Lexington, MA 02421 USA
(781) 676-5700

Kingsfordweg 151
1043 GR Amsterdam
The Netherlands

Ipswitch Japan K.K.
Minami-Azabu T&F Building 8F
4-11-22 Minami-Azabu, Minato-ku
Tokyo 106-0047, Japan

Copyright © 2013, Ipswitch, Inc. All rights reserved. WhatsUp is a registered trademark and Ipswitch is a trademark of Ipswitch, Inc. Other products or company names are or may be trademarks or registered trademarks and are the property of their respective holders.
