Managing Composite Applications: An Operator’s View

Composite application management issues and considerations
Operator management requirements identified
Includes Tivoli Enterprise Portal customization

Budi Darmawan
David Rintoul
Howard Anglin
Ronaldo Pires
Sathyabama S Kuppusamy

ibm.com/redbooks

Redpaper

International Technical Support Organization
May 2008
REDP-4319-00

Note: Before using this information and the product it supports, read the information in “Notices”.

First Edition (May 2008)

This edition applies to the following software products:
- IBM Tivoli Composite Application Manager for SOA (Distributed), 5724-M07
- IBM Tivoli Composite Application Manager for Response Time, 5724-C04
- IBM Tivoli Composite Application Manager for Web Resources, 5724-S32
- IBM Tivoli Monitoring V6.1

This document created or updated on May 1, 2008.

© Copyright International Business Machines Corporation 2008. All rights reserved.
Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

Contents

Notices
Trademarks
Preface
  The team that wrote this paper
  Become a published author
  Comments welcome
Chapter 1. Operation and composite application
  1.1 Composite application
  1.2 A typical operator
  1.3 Operation of composite application
  1.4 IBM Tivoli application management tools
Chapter 2. Designing the operator interface
  2.1 Early warning system for application
  2.2 Problem analysis design
  2.3 Problem resolution facilities
Chapter 3. Implementation of operator design
  3.1 Implementation overview
  3.2 Defining the workspace
    3.2.1 Building the navigation tree
    3.2.2 Defining the workspaces
    3.2.3 Defining the Trader main view
    3.2.4 Defining the laredo and bandung workspaces
  3.3 Working with situations
    3.3.1 Situation basics
    3.3.2 Creating situations
  3.4 Actions
Chapter 4. Solution walkthrough
  4.1 WebSphere failure
  4.2 Flood of call to Web Services
  4.3 Bad response time
  4.4 The next step
Related publications
  IBM Redbooks publications
  Other publications
  Online resources
  How to get Redbooks
  Help from IBM

Notices

This information was developed for products and services offered in the U.S.A.

IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to: IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 U.S.A. The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you. This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice. Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk. IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you. Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. 
Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.

This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental.

COPYRIGHT LICENSE:

This information contains sample application programs in source language, which illustrate programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs.

Trademarks

IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. These and other IBM trademarked terms are marked on their first occurrence in this information with the appropriate symbol (® or ™), indicating US registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries.
A current list of IBM trademarks is available on the Web at http://www.ibm.com/legal/copytrade.shtml

The following terms are trademarks of the International Business Machines Corporation in the United States, other countries, or both:

Redbooks (logo), Alerts®, CICS®, DB2®, IBM®, IMS™, OMEGAMON®, pSeries®, Redbooks®, Tivoli®, WebSphere®, z/OS®, zSeries®

The following terms are trademarks of other companies:

ITIL is a registered trademark, and a registered community trademark of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office.

J2EE, Java, and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.

Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.

UNIX is a registered trademark of The Open Group in the United States and other countries.

Other company, product, or service names may be trademarks or service marks of others.

Preface

A composite application is a distributed implementation of an application that spans several application servers and crosses platform boundaries. This distribution can create an operations challenge. The challenge has grown with the advent of Service-Oriented Architecture (SOA) because many applications have become loosely coupled, meaning that programs can find connections and services at run time, depending on the available environment.

Operations management for composite applications is a complex issue. Applications are generally designed based on functionality, not manageability. An operator has to rely on management tools to diagnose any problem in these applications and recover them.

This paper describes an approach for designing a management solution for operators to manage composite applications.
It also provides step-by-step instructions for implementing this solution for a sample application, the Trader application, which has been enhanced with Web Services calls and access to Enterprise Service Bus (ESB) mediation functions.

The team that wrote this paper

This paper was produced by a team of specialists from around the world working at the International Technical Support Organization, Austin Center.

Budi Darmawan is a Project Leader at the International Technical Support Organization, Austin Center. He writes extensively and teaches IBM® classes worldwide on all areas of systems management, primarily application management, business service management, and workload scheduling. Before joining the ITSO in 1999, Budi worked in IBM Indonesia as lead implementor and solution architect. His current interests are J2EE™ and SOA application management, z/OS® integration, and business service management.

David Rintoul is a Senior IT Specialist who works as part of the TechWorks group in AP SWG. He has over 20 years of experience in the IT field. He holds a degree in Mathematics from Newcastle University. His areas of expertise include the IBM Service Management products, the ITCAM family of products, and the Tivoli® zSeries® products.

Howard Anglin is a deployment expert for ITCAM for WebSphere®, Response Time Tracking, and IBM Tivoli Monitoring in the United States. He has worked with various large customers, and in his role as an IT Specialist he has resolved deployment, integration, and performance issues. He has 9 years of experience in the Software Test and Development field with emphasis on the WebSphere Application Server. He holds a Bachelor of Science in Electrical Engineering from Manhattan College, Riverdale, New York. Howard began his career at IBM in the pSeries® Hardware Group as a test engineer developing automation solutions for the production line, then transferred to the Software group.
Ronaldo Pires is an IBM IT Specialist. He joined IBM in 2004 and has been working on Global Technology Services Delivery in São Paulo, Brazil, supporting the systems management infrastructure for IBM outsourcing customers. His skills include IBM Tivoli Framework, IBM Tivoli Monitoring, IBM Tivoli Storage Manager, IBM Tivoli Identity Manager, Altiris Client Management Suite, BMC Control-M for z/OS, and BMC Control-D for z/OS. He holds the degree of Bachelor of Mathematics from Faculdade de Filosofia Ciências e Letras de Santo André. He is a Tivoli Certified Consultant for Tivoli Storage Manager and IBM Certified Deployment Professional for Tivoli Monitoring V5.1.2.

Sathyabama S Kuppusamy is a technical lead at the IBM Global Business Solution Center in India, and is currently working in the SOA Solution Center for SOA-based products. She has 6 years of experience in the SOA, testing, and middleware systems fields. She holds the degree of Bachelor of Engineering in Computer Science from University of Madras, India and also holds a degree in Management of Business Administration in Finance from University of Madras, India. Her areas of expertise include SOA, middleware systems, and automation testing.

Thanks to the following people for their contributions to this project:

Rebecca Poole, Adrian Mitu, Greg Bowman
IBM Software Group, Tivoli Systems

Become a published author

Join us for a two- to six-week residency program! Help write a book dealing with specific products or solutions, while getting hands-on experience with leading-edge technologies. You will have the opportunity to team with IBM technical professionals, Business Partners, and Clients.

Your efforts will help increase product acceptance and customer satisfaction. As a bonus, you will develop a network of contacts in IBM development labs, and increase your productivity and marketability.
Find out more about the residency program, browse the residency index, and apply online at:
ibm.com/redbooks/residencies.html

Comments welcome

Your comments are important to us!

We want our papers to be as helpful as possible. Send us your comments about this paper or other IBM Redbooks® in one of the following ways:
- Use the online Contact us review Redbooks form found at:
  ibm.com/redbooks
- Send your comments in an e-mail to:
  redbooks@us.ibm.com
- Mail your comments to:
  IBM Corporation, International Technical Support Organization
  Dept. HYTD Mail Station P099
  2455 South Road
  Poughkeepsie, NY 12601-5400

Chapter 1. Operation and composite application

This chapter provides an overview of the redpaper. Topics covered are:
- 1.1, “Composite application”
- 1.2, “A typical operator”
- 1.3, “Operation of composite application”

1.1 Composite application

In today’s application environment, applications may reside on one or more computer systems. These applications work together to achieve a business function. Distributed application use has also expanded since the advent of Service-Oriented Architecture (SOA) with its Web Services structure.

A typical distributed application spans multiple systems and may also use several types of communication mechanisms, different transport layers, and even different operating system platforms. We view this kind of application as a composite application. Figure 1-1 shows a sample composite application.
Figure 1-1 Composite application (diagram labels: J2EE servers, back-end server, messaging interface, mainframe server, DB2)

The challenges for managing a composite application are:
- Because the components reside on multiple machines, understanding the application and performing problem determination are harder.
- Managing several operating environments can require you to use different interfaces. For example, a mainframe running z/OS requires the use of a TSO environment, while UNIX® systems use a telnet or ssh interface.
- Differences in transport can require the ability to connect to different systems; JMS, SNA, and TCP/IP may all be participating in the application.
- Application server technology differences may require a variety of tuning approaches and may introduce different tools for performance management.

1.2 A typical operator

IT operations personnel or operators have the responsibility of ensuring that IT resources are available and performing correctly in order for the enterprise to perform its business function. In the ITIL® structure, the operator is tied closely to incident management. The operator may be level 2 support for the service desk function that performs initial service recovery.

Operators concern themselves with:
- System and application availability and performance
- Identification of potential problems and outages
- Recovering from outages and problems as quickly as possible

Operators mainly address issues that require them to use tools to understand IT health and potentially to recover from system problems. The operator does not need a deep understanding of the application structure or technical expertise on the IT resources. The operator works based on a pre-defined Standard Operation Procedure. Therefore problem identification and recovery must be performed using a known procedure and standard functions.
Operators in a mainframe data center are typically concerned with mainframe availability. A large number of best practices, as well as a common understanding of the role and responsibilities of mainframe operators, already exist in the IT community. Typically an operator manages a single machine or a cluster of similar machines (SYSPLEX), so the interface and tools are quite uniform.

1.3 Operation of composite application

When IT operations move into a distributed application environment, or a composite application environment, each component may be monitored and managed by different tools. Operators may not be able to learn and document all possible combinations of tools and outages. The objective of this redpaper is to suggest a standard operation approach for operators working with composite applications.

We identify the following major requirements for composite application management for operators:
- Ability to quickly identify problems or potential problems.
- Ability to quickly isolate or determine the cause of common performance problems. The term common performance problem is yet to be defined.
- Ability to act to resolve or rectify the problem source and ensure operation.

This redpaper provides an approach to design and build a solution to address operational issues in a composite application environment using IBM Tivoli application management tools.

1.4 IBM Tivoli application management tools

This section provides an overview of the available application management software products that we will use to build the operator tools. We are building the environment based on a composite application called “Trader.” The Trader application is primarily J2EE-based; the front end is Web-based. It uses Web Services extensively and data can be stored in DB2®, IMS™, or CICS®. The solution structure is shown in Figure 1-2.
Figure 1-2 Operation solution (diagram labels: J2EE servers, back-end server, messaging interface, mainframe server, DB2, IBM Tivoli Monitoring platform, Tivoli Enterprise Portal)

As shown in Figure 1-2, the solution includes monitoring of the application using:
- ITCAM for Response Time: This product measures and collects end user response time. It is primarily useful as an early warning to identify application problems.
- ITCAM for Web Resources: This product collects and provides application server statistics. The metrics on an application server help in identifying the source of a problem and providing action support that interacts with the application server.
- ITCAM for SOA: This product monitors and manages Web Services calls. Our composite application is based on Web Services, so the use of ITCAM for SOA is critical.

The monitoring information is collected by IBM Tivoli Monitoring with the Tivoli Enterprise Portal interface. The operator monitors a Tivoli Enterprise Portal for alerts and uses the workspaces to diagnose the cause of events. The operator then uses the Portal actions to act on and resolve the problems.

Other available application management solutions, which we will not use in this discussion, include:
- ITCAM for Response Time Tracking: The tracking function of ITCAM for Response Time Tracking may provide additional information for analyzing a problem. We are not using this because the breakdown information about a transaction is generally related to root-cause resolution by application experts, and requires an understanding of the underlying J2EE architecture.
- ITCAM for WebSphere and ITCAM for J2EE: Although these products share the data collector with ITCAM for Web Resources, the additional functions provided by the managing server interface are mainly for deep-dive and diagnostics of the application, and require deep understanding of the underlying J2EE program.
- OMEGAMON® XE for Messaging: WebSphere MQ and WebSphere Message Broker can assist if the application utilizes a messaging infrastructure. Our Trader application structure does not use messaging.

Chapter 2. Designing the operator interface

In this chapter we describe how we designed the operator management tools for our Trader application, including the following topics:
- 2.1, “Early warning system for application”
- 2.2, “Problem analysis design”
- 2.3, “Problem resolution facilities”

2.1 Early warning system for application

An early warning system for an application can be implemented using IBM Tivoli Monitoring situations. Situations are automated monitors that run at regular intervals and compare monitoring metrics against threshold values. There are hundreds of metrics that can be measured; we must select from among them those we want to measure to keep the system performance reasonable.

For this design, we first identify the common problems that operators are most likely to confront and have to resolve. This list would vary for different implementations, but the selection procedure would be the same. In this scenario, we anticipate dealing with the following problems:
- Bad response time
  In general, bad response time is the hardest problem to resolve because it can have many different causes. This is the primary concern of our design. Early warning data for response time can be collected from:
  – ITCAM for Response Time Web Response Time Agent
  – ITCAM for Response Time Robotic Response Time Agent
  – ITCAM for SOA Web Services performance
  – ITCAM for Web Resources Web application response time
- Application unavailability
  Monitoring application availability is required to ensure that business operations are not interrupted.
  Unavailability is indicated by:
  – ITCAM for Response Time Robotic Response Time Agent
  – ITCAM for Web Resources WebSphere Application Server status
- Application server problems
  Apart from the performance concerns, some application server resources must be monitored to identify potential problems. Application server metrics that can monitor the health of the server include:
  – ITCAM for Web Resources WebSphere Application Server status
  – ITCAM for Web Resources WebSphere Application Server heap size
  – ITCAM for Web Resources WebSphere Application Server CPU usage
- Web services problems
  For SOA-based applications, monitoring Web Services operation provides additional insight about problems in the application. Problems that require warnings include:
  – ITCAM for SOA Web Services faults
  – ITCAM for SOA message count
  – ITCAM for SOA message size

2.2 Problem analysis design

The analysis should be available from the IBM Tivoli Monitoring workspaces. For the attributes just described, we create the corresponding situation pairs that monitor each attribute for violation and recovery. The situation name must be meaningful so that it can be parsed to identify the affected resource. Some issues to consider are:
- The hostname of the originating agent is shown in the hostname field. However, if you are running the Web Response Time situation, the target HTTP host is stored in the server field. Also, some of the situations are distributed to several agents.
- The instance of an application server must be correctly identified. However, ITCAM for SOA does not contain the WebSphere profile name. If there are multiple application servers with different profiles but the same server name, it is not easy to distinguish among them.
- The monitored attribute should be correctly identified in the name.
  If several situations could exist for the same attribute but with different thresholds, the situation names should indicate this difference too.

We used the following naming convention:
- Application name prefix
- Attribute name
- Optional suffix of hostname or application server instance information

Table 2-1 lists the situations that we defined. Note that to actually monitor all the conditions mentioned previously, you would need to define several more situations. The situations in Table 2-1 are the ones that we used in the scenarios for this paper.

Table 2-1 Situation list

Name                Attribute group                       Condition
Trader_ApplSrvDown  WebSphere App Server                  Status=Disconnected
Trader_ApplSrvUp    WebSphere App Server                  Status=Connected
Trader_WebRsp       Web Response Time                     AverageResponseTime>5
Trader_ClnRsp       Client Response Time                  AverageResponseTime>5
Trader_RbtRsp       Robotic Response Time                 AverageResponseTime>5
Trader_WSMsgRate    Service Management Agent              Current Message Count>10
Trader_WSResp       Service Management Agent Environment  Average Elapsed Message Round Trip Time>5

2.3 Problem resolution facilities

Problem resolution can usually be achieved by the operator invoking IBM Tivoli Monitoring actions; however, not all alerts will indicate problems that the operator can resolve. Some problems must be referred to a subject matter expert (SME) or the next level of support.

The actions that the operator can take must be clearly defined and documented. Include information about when and how to invoke specific actions. Also cover expected outcomes and consequences, such as duration of system unavailability, work interruption, and potential backup and recovery impact.

We evaluated the existing action sets that are available with the product. The sets can be expanded to include additional necessary actions.
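As a minimal sketch of how the naming convention from 2.2 and the documentation requirement above could be captured as data, the following Python fragment parses a situation name and looks up a documented operator response. It is not part of IBM Tivoli Monitoring; the names SituationName, RunbookEntry, RUNBOOK, and lookup are hypothetical, and the runbook texts are placeholders rather than actual product action names. Only the situation names and the prefix, attribute, optional suffix layout come from this paper.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class SituationName:
    application: str               # application name prefix, for example "Trader"
    attribute: str                 # attribute name, for example "WebRsp"
    suffix: Optional[str] = None   # optional hostname or application server instance


def parse_situation_name(name: str) -> SituationName:
    """Split a name of the form <application>_<attribute>[_<suffix>]."""
    parts = name.split("_", 2)
    if len(parts) < 2 or not all(parts):
        raise ValueError(f"not a valid situation name: {name!r}")
    return SituationName(parts[0], parts[1], parts[2] if len(parts) == 3 else None)


@dataclass(frozen=True)
class RunbookEntry:
    operator_can_resolve: bool  # False: refer to an SME or the next level of support
    action: str                 # what to invoke (placeholder wording, not a product action name)
    when_to_invoke: str
    expected_outcome: str
    consequences: str           # for example unavailability, work interruption, recovery impact


# Hypothetical entries keyed by (application, attribute); the texts are placeholders.
RUNBOOK: Dict[Tuple[str, str], RunbookEntry] = {
    ("Trader", "ApplSrvDown"): RunbookEntry(
        operator_can_resolve=True,
        action="Start the application server",
        when_to_invoke="Server status is reported as Disconnected",
        expected_outcome="Trader_ApplSrvUp is raised once the server reconnects",
        consequences="Application unavailable until the server is running again",
    ),
    ("Trader", "WSResp"): RunbookEntry(
        operator_can_resolve=False,
        action="Refer to the Web Services subject matter expert",
        when_to_invoke="Round trip time stays above the threshold",
        expected_outcome="SME takes over the analysis",
        consequences="None for the operator",
    ),
}


def lookup(situation: str) -> Optional[RunbookEntry]:
    """Return the documented entry for a situation name, or None if undocumented."""
    parsed = parse_situation_name(situation)
    return RUNBOOK.get((parsed.application, parsed.attribute))
```

In this sketch the optional suffix is ignored during lookup, so a name such as Trader_ApplSrvDown_bandung1 resolves to the same entry as Trader_ApplSrvDown. A lookup that returns None marks a situation whose operator response has not yet been documented.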
Built-in action sets are:
- ITCAM for Web Resources actions
- ITCAM for SOA actions

Additional actions can be defined to satisfy our requirements. Specifically, we will define additional actions for interacting with WebSphere Service Registry and Repository.

Chapter 3. Implementation of operator design

This chapter describes how we implemented the operator interface that we designed in the previous chapter. The discussion includes:
- Implementation overview
- Defining the workspace
- Working with situations
- Actions

3.1 Implementation overview

The implementation of the operator interface for managing application performance is described in this chapter. The discussion includes the definition of the following resources:
- Workspaces: These are the displays that operators will see to perform their work every day. Functionally, the workspace structure should allow them to recognize, diagnose, and take action on application problems. This is explained in 3.2, “Defining the workspace”.
- Situations: These are the automated monitoring functions that are the primary means for operators to quickly get notified of problems. Operators do not have to traverse the workspaces to find problems. Events from the situations are shown in the Situation Event Console part of the Tivoli Enterprise Portal. This is illustrated in 3.3, “Working with situations”.
- Actions: These allow operators to correct problems and errors. Actions can be automated or invoked manually. Action definitions are discussed in 3.4, “Actions”.

3.2 Defining the workspace

The detailed steps to define the workspace are presented in this section.
At a high level, the steps are: Build the navigation tree Define the workspace Define the Trader main view 12 Managing Composite Applications: An Operator’s View 3.2.1 Building the navigation tree As management agents are configured, they are automatically added to the Physical view workspaces in the navigator panel. As you would expect in our environment, this view shows the various physical servers that we have worked with so far (Figure 3-1). Figure 3-1 Tivoli Enterprise Portal Physical view example Chapter 3. Implementation of operator design 13 In this procedure we will develop a new navigation tree called Trader for a user who must monitor the Trader application only. We can do this using a Logical view in a new navigator. We perform the following steps: 1. Click the Edit Navigator View icon window shown in Figure 3-2. to open the Edit Navigator View Figure 3-2 Edit Navigator View 2. Click the Create New Navigator View icon and enter a name and description, as we did in Figure 3-3. Click OK. Figure 3-3 Create New Logical View 3. In the new Trader navigator item, create two more navigator items. Right-click and select Create Child Item. The two child items represent the WebSphere Application Servers that we use: bandung1 and laredo1. 14 Managing Composite Applications: An Operator’s View 4. The managed systems represent the monitoring agents that provide the information display for the appropriate navigation tree. Select these carefully because they represent available data for your charts. Figure 3-4 shows the properties of the bandung1 navigator view. Figure 3-4 Bandung1 Navigator Managed Systems Chapter 3. Implementation of operator design 15 5. Figure 3-5 shows the display for the laredo navigator item. Figure 3-5 Laredo1 Navigator Managed Systems 6. 
In Figure 3-4 on page 15 and Figure 3-5, you can see that we use individual agents for ITCAM for SOA and ITCAM for WebSphere, but for ITCAM for Response Time Tracking, we use a shared object from the management server. The entries in the Assigned field have the following meanings:
– D4: ITCAM for SOA agent data
– ITCAMSOA: ITCAM for SOA agent status
– KYNA: ITCAM for WebSphere agent status
– KYNS: ITCAM for WebSphere agent data
– T2: ITCAM for Response Time Tracking agent data and status

7. You can also add views from the physical view into this new navigator view by dragging and dropping. Select a physical view object by clicking it (a box surrounds it when selected), then drag it directly onto the Trader object on the left, ensuring that the target shows the surrounding outline box before you drop. We do not perform this step.

Figure 3-6 shows the final navigator tree for our example. Now that the navigator tree is defined, we can close the navigator edit window by clicking Close.

Figure 3-6 Final Navigator window

3.2.2 Defining the workspaces

The initial workspace for the new navigator item is an empty workspace with a notepad and a browser pointing to a generic page showing Workspace not defined, as shown in Figure 3-7.

Figure 3-7 Workspace not defined

We have to define these workspaces for the navigator objects:
• The main Trader workspace must be the initial display window for each operator and must consume the least amount of processing. Keeping that in mind, we use the alert view for the individual objects as the initial display for the Trader workspace. Alerts are generated by background collectors called situations. The only additional processing incurred by alerts is what is involved in transporting the alert to and from the Tivoli Enterprise Portal Server.
When an operator sees an alert, drilling down to the appropriate application server exposes information used in analyzing the problem. Figure 3-8 shows the completed Trader workspace.

Figure 3-8 Trader workspace

• The Laredo workspace contains the information for laredo. This workspace contains:
– WebSphere Application Server statistics from ITCAM for WebSphere: CPU usage, memory usage, transaction rate, and transaction response time.
– Response time information from ITCAM for Response Time Tracking that shows the performance of the Trader servlets that run on the TraderClientWeb application.
– Web services client information from ITCAM for SOA: message rate and response time.
Figure 3-9 shows the completed Laredo workspace.

Figure 3-9 Laredo workspace

• The final Bandung workspace, shown in Figure 3-10, contains the following information for bandung:
– WebSphere Application Server statistics from ITCAM for WebSphere: CPU usage, memory usage, transaction rate, and transaction response time.
– Response time information from ITCAM for Response Time Tracking that shows the performance of the Trader servlets that run on the Trader*Services applications. This is typically called directly from the Java™ application because requests from laredo have been correlated to the calling servlets. Also, we monitor requests from Trader*Web applications.
– Web services client information from ITCAM for SOA: message rate and response time.

Figure 3-10 Bandung workspace

The workspace is defined by dividing the workspace area using the split vertical button or the split horizontal button. We then populate each area with the appropriate type of chart. Figure 3-11 indicates the available components.
Figure 3-11 Workspace chart components (labels in the figure: Tivoli Enterprise Console, Table, Browser, 3270 terminal, Pie chart, Take action, Bar chart, Graphic view, Plot chart, Universal message console, Circular gauge, Linear gauge, Situation event console, Message log, Notepad)

We describe the building of some of the workspace charts in the following sections.

3.2.3 Defining the Trader main view

The main view of the Trader workspace is similar to the Enterprise workspace in the physical view. It contains the Situation Event Console and Message Log. This display is adequate if you have already tuned your system and have the appropriate situations defined with the appropriate thresholds. When most of the definitions are valid, you should not get false alarms or undetected (silent) problems.

Both the Message Log and Situation Event Console are inserted into the area by clicking the appropriate icon and then clicking within the area that you want to assign each to. There is no real customization for these types of charts.

3.2.4 Defining the laredo and bandung workspaces

The laredo and bandung workspaces shown in Figure 3-9 on page 20 and Figure 3-10 on page 21 are similar, so we discuss them together.

Data charts are built from queries. However, the more queries a single page submits, the more processing load it places on the system. In designing the charts, we take into consideration the number of queries that we use and the possibility of using an IBM-supplied query.

Figure 3-12 shows our workspace with the areas identified.

Figure 3-12 Workspace areas (labeled areas: ITCAM for Web Resources, ITCAM for SOA, ITCAM for Response Time)

We used these queries for our laredo and bandung workspaces:
• ITCAM for WebSphere information
From the WebSphere Application Server queries, we use the following queries:
– Application_Server attribute group, with the existing Application Server query.
This query provides CPU usage percentage and memory usage information (total, used, and free memory). The CPU usage is shown as a circular gauge; the total memory and memory used are displayed in a bar chart.
– Request_Times_and_Rates attribute group, with the existing Request Times and Rates query that provides average response time and request rate information. These are displayed as linear gauges.
• ITCAM for Response Time Tracking information
We cannot use the available Response Time Tracking queries because the original workspaces are mostly accessed through links, and links pass information collected in previous stages to the target workspace. We create new queries, one for laredo and one for bandung, to present the information from the ITCAM_TT_Policy_Status attribute group. We copy the Response Time Agent Policy Status query to our own query. See “Creating a new query” on page 31.
• ITCAM for SOA information
From the Service Management Agent Environment, under the Services_Inventory attribute group, we can retrieve Web services information. We want to show the response time and invocation rate of Web services. We can either use two existing queries for the same attribute group, which means that data collection is performed twice, or create a new query that selects the information that we need. We decided to create our own query to collect the information that we use.

With this design, the workspace for bandung and laredo uses only four queries to retrieve information. One goes to the ITCAM for Response Time Tracking agent, and three go to the application server machine for execution by the ITCAM for SOA agent and the ITCAM for WebSphere agent.

Setting a query chart

This is the procedure for setting a query chart:

1. Select the appropriate chart type from the toolbar and click the workspace area that you want to customize.

2. A prompt asks whether to assign a query (Figure 3-13). Click Yes.
Figure 3-13 Query assignment confirmation

3. On the chart setting page, click Click here to assign a query (Figure 3-14).

Figure 3-14 Empty chart property page

4. When you reach the Query Editor page, select the query that you want to assign. Alternatively, you can create a new query, as discussed in “Creating a new query” on page 31. Figure 3-15 shows the example query for Request Times and Rates. Click OK to select the query.

Figure 3-15 Query Editor

5. Back on the chart property page, select the Filter tab, which enables you to select the columns (attributes) that you want to be displayed on the chart. Multiple columns can be displayed on some charts, such as table, bar, and plot charts, but gauges support only a single column. Select the column by selecting the check box, as shown in Figure 3-16. If your query is valid and there is an appropriate provider for data, you will see a snapshot of data for the query for your reference.

Figure 3-16 Chart filter

6. You can customize the appearance of the chart using the Style tab. First, change the heading text, which is provided on the initial page, as shown in Figure 3-17.

Figure 3-17 Heading text

7. Customize chart-specific attributes, such as:
– For the circular gauge, customize the shape and value range of the data, as shown in Figure 3-18.

Figure 3-18 Circular gauge setting

– For the linear gauge, customize the orientation and range of data, as shown in Figure 3-19.

Figure 3-19 Linear gauge settings

– For the bar chart, customize the orientation and axis labels, as shown in Figure 3-20.
Figure 3-20 Bar chart settings

You can also change the legend text and position, as shown in Figure 3-21.

Figure 3-21 Legend customization for bar chart

8. Click OK to save the chart properties and select File → Save Workspace to save the workspace.

Creating a new query

As discussed in 3.2.4, “Defining the laredo and bandung workspaces” on page 22, we can create a new query to optimize the workspace and provide data for our chart. From the query editor, you can either create a completely new query or copy an existing query. You must assign a name for the new query, as shown in Figure 3-22.

Figure 3-22 Name for new query

The query specification defines both the selected attributes and the row selection conditions. Some conditions are mandatory, and typically their values are taken from a variable. A variable is specified by enclosing its name in $ signs. You can substitute these variables with a fixed value. Conditions specified in the same row represent an AND operation, and conditions specified on different rows represent an OR operation. A sample specification is in Figure 3-23.

Figure 3-23 Query specification

The query that we create for ITCAM for Response Time Tracking is a copy of the existing query shown in Figure 3-23 on page 32. We modify the query because the PCYGRPID and PCYGRPNAME variables are not available from a simple workspace. They can only be retrieved from a linked workspace. For laredo, we retrieve the response time information for all policies inside the Trader_Web_appl policy group, as shown in Figure 3-24.

Figure 3-24 Laredo policies

Retrieve the policy group ID number from the ITCAM for Response Time Tracking dashboard report by hovering the cursor over the policy group to see the status bar, as shown in Figure 3-25.
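The AND/OR row semantics and the $ variable substitution described for the query specification can be illustrated with a short Python sketch. This only models the logic; it is not how Tivoli Enterprise Portal stores or executes queries. The function names, the record layout, and the attribute names and values (Origin_Node, Policy_Group_ID, Policy) are assumptions made up for the example.

```python
import re

# A variable is a name enclosed in $ signs, for example $NODE$.
VARIABLE = re.compile(r"^\$(\w+)\$$")


def substitute(value, variables):
    """Replace a $NAME$ placeholder with a fixed value, if one is supplied."""
    if isinstance(value, str):
        match = VARIABLE.match(value)
        if match and match.group(1) in variables:
            return variables[match.group(1)]
    return value


def row_matches(record, condition_rows, variables):
    """Cells in one condition row are ANDed; separate rows are ORed."""
    return any(
        all(record.get(attr) == substitute(expected, variables)
            for attr, expected in row.items())
        for row in condition_rows
    )


def run_query(records, condition_rows, variables):
    """Return the records selected by the condition grid."""
    return [r for r in records if row_matches(r, condition_rows, variables)]


# Hypothetical sample data, not taken from a real agent.
records = [
    {"Origin_Node": "laredo1", "Policy_Group_ID": 7, "Policy": "A"},
    {"Origin_Node": "bandung1", "Policy_Group_ID": 7, "Policy": "B"},
    {"Origin_Node": "bandung1", "Policy_Group_ID": 9, "Policy": "C"},
]

# Row 1: node AND group 7.  Row 2 (ORed with row 1): group 9.
conditions = [
    {"Origin_Node": "$NODE$", "Policy_Group_ID": 7},
    {"Policy_Group_ID": 9},
]

selected = run_query(records, conditions, {"NODE": "laredo1"})
print([r["Policy"] for r in selected])
```

In this sketch, supplying a fixed value for NODE plays the role of replacing a variable such as $NODE$ with a constant when the variable is not available in a simple workspace.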
Figure 3-25 Getting policy group ID

We create the ITCAM for SOA query from scratch; it selects only specific columns from the Services_Inventory attribute group. Use the following procedure:

1. Create a new query using the Create Query button. Assign the name and category of the query, as shown in Figure 3-26.

Figure 3-26 Query Name and Category

2. Select the attributes that you want to collect, as shown in Figure 3-27.

Figure 3-27 Query attributes

3. The query appears in the Query Editor window, where we provide the selection conditions. For the Services_Inventory attribute group, specify at least Origin Node to be $NODE$. Figure 3-28 shows the condition that we used.

Figure 3-28 Setting condition

4. You can further refine the attributes included in the query by selecting or deselecting them. To restore attributes that you deselected in a previous session, click the Add attributes button.

3.3 Working with situations

A situation is an automated monitoring function that checks the system against a specified condition. A situation runs in the background at a predefined interval. It is useful for getting basic health information from a Tivoli Enterprise Monitoring Agent.

This section discusses how to create custom situations and incorporate them into a workflow for monitoring our Trader application environment. Because the situation will be used in a workflow, it will not be auto-started. Only the workflow has to be auto-started, and this starts the situation.

3.3.1 Situation basics

A situation is a conditional expression that is evaluated at certain intervals. When the situation evaluates to true, this is considered a situation change event. A situation evaluates the attributes in an agent from an attribute group.
Because an attribute group is considered a table and attributes are its columns, the situation definition contains the following components:
• The name of the situation
• The category of the situation
• The attribute group that will be evaluated
• The row selection condition from the attribute group

The selection condition can contain multiple expressions:
• Filtering for certain types of data, such as servlet name, policy name, or other attributes.
• Checking data values for a selected row. This checking can be considered a threshold for the data in the table. Some checking uses an aggregation function such as count, maximum, or minimum. This type of checking is performed at the Tivoli Enterprise Monitoring Server.

3.3.2 Creating situations

A situation is created using the situation wizard. You can add a situation manually or create a situation from an existing one. Almost any existing situation can be used as a template to show the situation capabilities. The situation wizard is launched using the icon.

As illustrated in Figure 3-29, you define the situation name and the attribute group to use, then click OK. Select the attribute that you want to use for the condition and click OK to open the situation dialog.

Figure 3-29 Creating situation

The situation definition is shown in Figure 3-30. Define the condition, invocation interval, and the severity level of the alert in the Formula page.

Figure 3-30 Situation definition

We created the situations defined in 2.1, “Early warning system for application” on page 8. You can decide which agent to run the situations on.

A situation can have a predefined action attached to it as shown in Figure 3-31.
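The situation components listed in 3.3.1 (name, category, attribute group, and a row selection condition made of a filter plus a threshold) can be pictured with a small Python model. This is a sketch for illustration under stated assumptions, not the product's situation engine: the class, the function, the attribute names (Request_Name, Avg_Response_ms), the situation name, and the 2000 ms threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]  # one row of an attribute group (a table)


@dataclass
class Situation:
    """Hypothetical model of the situation components listed in 3.3.1."""
    name: str
    category: str
    attribute_group: str
    row_filter: Callable[[Row], bool]  # selects rows, e.g. by servlet name
    threshold: Callable[[Row], bool]   # value check on the selected rows

    def evaluate(self, rows: List[Row]) -> List[Row]:
        """Return the rows that pass both the filter and the threshold.

        A non-empty result means the situation is true for this interval.
        """
        return [r for r in rows if self.row_filter(r) and self.threshold(r)]


def count_exceeds(rows: List[Row], predicate: Callable[[Row], bool],
                  limit: int) -> bool:
    """Aggregate-style check: true when more than `limit` rows match."""
    return sum(1 for r in rows if predicate(r)) > limit


# Made-up sample rows; attribute names and values are assumptions.
sample_rows = [
    {"Request_Name": "TraderServlet", "Avg_Response_ms": 3500},
    {"Request_Name": "TraderServlet", "Avg_Response_ms": 400},
    {"Request_Name": "OtherServlet", "Avg_Response_ms": 9000},
]

slow_trader = Situation(
    name="Trader_Slow_Response",
    category="Application",
    attribute_group="Request_Times_and_Rates",
    row_filter=lambda r: r["Request_Name"] == "TraderServlet",
    threshold=lambda r: r["Avg_Response_ms"] > 2000,
)

hits = slow_trader.evaluate(sample_rows)
print("situation true:", bool(hits))
```

The separate count_exceeds helper mirrors the distinction made in the text between per-row checks and checks that use an aggregation function such as count, which the paper states are performed at the Tivoli Enterprise Monitoring Server.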
Figure 3-31 Action command definition

The action is a script that can be executed from the Tivoli Enterprise Monitoring Server or from the Tivoli Enterprise Monitoring Agent that detects the problem. The script can use arguments from the fired situation.

A situation must be associated with a node in the Navigator tree so that it appears in the Situation Event List. To make the association, right-click the node and select Situations. In the situation window, click the button and choose all possible associations. Select the situation that you created and select Associate from the context menu.

3.4 Actions

Actions can be created within a Tivoli Enterprise Portal workspace context. Right-click a workspace area and select Take action → Create or Edit. The dialog for selecting an existing action or creating a new one is displayed, as shown in Figure 3-32.

Figure 3-32 Selecting action

You can then define the action. A sample built-in action for restarting WebSphere Application Server is shown in Figure 3-33.

Figure 3-33 Sample action

Chapter 4. Solution walkthrough

This chapter takes you through several scenarios that demonstrate the practical use of the solution we developed in this paper, and suggests areas for further enhancements. The topics covered are:
• 4.1, “WebSphere failure” on page 44
• 4.2, “Flood of call to Web Services” on page 45
• 4.3, “Bad response time” on page 46
• 4.4, “The next step” on page 48

4.1 WebSphere failure

In this scenario, the WebSphere Application Server is unavailable. This is indicated by the situation shown in Figure 4-1.

Figure 4-1 WebSphere unavailable situation

The situation that monitors WebSphere problems has an automatic action to reroute the Web Services calls.
This is performed by modifying the metadata in the WebSphere Services Registry and Repository to flag the failed server as unavailable. Web Services calls are then still routed successfully to an available server.

The operator can then restart WebSphere using the ITCAM for Web Resources action to start the failed WebSphere Application Server, as shown in Figure 4-2.

Figure 4-2 Restarting WebSphere

The restart resets the alert for WebSphere unavailability. The routing is recovered automatically from the reset situation (informational situation).

4.2 Flood of call to Web Services

This scenario involves an unusually high number of Web Services calls to one of the providers. This also generates a high number of faults on the server. Figure 4-3 shows the alerts related to this condition.

Figure 4-3 Web Services call

Looking at ITCAM for Response Time, we notice that the calls are originating from a single client. The client may have entered a loop or been taken over by a hostile process (virus). The operator can decide to put a calling filter on the server. The invocation of the filter is shown in Figure 4-4.

Figure 4-4 Defining filter

After the filter is applied, the calling rate goes back to normal and the situation is cleared. Note that the filter is still in effect, as shown in Figure 4-5.

Figure 4-5 Checking filter

4.3 Bad response time

When response time gets worse, the operator is notified of poor Trader response time by the ITCAM for Response Time Web Response Time agent, as shown in Figure 4-6. Notice that there are other indications from ITCAM for Web Resources and ITCAM for SOA that also show bad response time.

Figure 4-6 Bad response time from Web Response Time agent

The operator can view the ITCAM for Response Time dashboard and analyze the underlying configuration, check the response time, and so on.
Figure 4-7 on page 47 shows failed requests for some of the Web Services.

Figure 4-7 Web Services call rate

Because the server must be supporting another application that is slowing down, the operator can decide to route the Web Services call to another server. This can be done by manipulating the Web Services calling route in WebSphere Services Registry and Repository.

4.4 The next step

We have shown some examples of composite application management with our Tivoli Enterprise Portal and IBM Tivoli Composite Application Management solution. The solution can still be enhanced. The possible enhancements that we did not get to test and implement include the following:
• Automation: Automation can be implemented in the solution to initiate some actions without major investigation. One example is the automatic restart of WebSphere discussed in 4.1, “WebSphere failure” on page 44.
• Business view: Integration of situation processing with Tivoli Business Services Manager allows a business view to be defined, for example, to show Service Level attainment to the executive or managerial level of the business.
• Provisioning: Some environments with fluctuating load would greatly benefit from automatic provisioning (and de-provisioning) of application servers to accommodate more processing capacity. The provisioning process can be integrated into the overall management architecture.

Related publications

The publications listed in this section are considered particularly suitable for a more detailed discussion of the topics covered in this paper.

IBM Redbooks publications

For information about ordering these publications, see “How to get Redbooks” on page 52. Note that some of the documents referenced here may be available in softcopy only.
• IBM Tivoli Composite Application Manager Family Installation Configuration and Basic Usage, SG24-7151
• Deployment Guide Series: IBM Tivoli Composite Application Manager for WebSphere V6.0, SG24-7252
• Getting Started with IBM Tivoli Monitoring 6.1 on Distributed Environments, SG24-7143
• IBM Tivoli OMEGAMON XE V3.1.0 Deep Dive on z/OS, SG24-7155
• Implementing OMEGAMON XE for Messaging V6.0, SG24-7357
• Installing WebSphere Studio Application Monitor V3.1, SG24-6491
• Large-Scale Implementation of IBM Tivoli Composite Application Manager, REDP-4162
• Migrating to Netcool/Precision for IP Networks -- Best Practices for Migrating from IBM Tivoli NetView, SG24-7375
• Solution Deployment Guide for IBM Tivoli Composite Application Manager for WebSphere, SG24-7293
• Unveil Your e-business Transaction Performance with IBM TMTP 5.1, SG24-6912
• WebSphere Studio Application Monitor V3.2 Advanced Usage Guide, SG24-6764
• Managing SOA Environment with Tivoli, REDP-4318

Other publications

These publications are also relevant as further information sources:
• IBM Tivoli Composite Application Manager for SOA
– Configuring IBM Tivoli Composite Application Manager for SOA on z/OS, SC32-9493
– IBM Tivoli Composite Application Manager for SOA Installation and User's Guide, GC32-9492
– IBM Tivoli Composite Application Manager for SOA Program Directory, GI11-4087
– IBM Tivoli Composite Application Manager for SOA Release Notes, GI11-4096
– IBM Tivoli Composite Application Manager for SOA: Installing and Troubleshooting IBM Web Services Navigator, GC32-9494
• IBM Tivoli Composite Application Manager for Response Time
– IBM Tivoli Composite Application Manager for Client Response Time User's Guide Version 6.2, SC23-6332
– IBM Tivoli Composite Application Manager for Web Response Time User's Guide Version 6.2, SC23-6333
– IBM Tivoli Composite Application Manager for Robotic Response Time User's Guide Version 6.2, SC23-6334
– IBM Tivoli Composite Application
Manager for End User Response Time Dashboard User's Guide Version 6.2, SC23-6335
– IBM Tivoli Composite Application Manager for Response Time Problem Determination Guide Version 6.2, GI11-8061
• IBM Tivoli Composite Application Manager for Web Resources
– IBM Tivoli Composite Application Manager for Web Resources: J2EE Data Collector Installation Guide, GC23-6179
– IBM Tivoli Composite Application Manager for Web Resources: WebSphere Distributed Data Collector Installation Guide, GC23-6180
– IBM Tivoli Composite Application Manager for Web Resources: J2EE Agent Installation Guide, GC23-6181
– IBM Tivoli Composite Application Manager for Web Resources: WebSphere Agent Installation Guide, GC23-6182
– IBM Tivoli Composite Application Manager for Web Resources: Web Servers Agent Installation Guide, GC23-6183
– IBM Tivoli Composite Application Manager for Web Resources: Community Edition Data Collector Installation Guide, GC23-6184
– IBM Tivoli Composite Application Manager for Web Resources: Quick Start Guide, GC23-6185
– IBM Tivoli Composite Application Manager for Web Resources: J2EE Agent Problem Determination Guide, GI11-8160
– IBM Tivoli Composite Application Manager for Web Resources: WebSphere Agent Problem Determination Guide, GI11-8161
– IBM Tivoli Composite Application Manager for Web Resources: Web Servers Agent Problem Determination Guide, GI11-8162
• IBM Tivoli Monitoring
– Exploring IBM Tivoli Monitoring, SC32-1803
– IBM Tivoli Monitoring Administrator's Guide, SC32-9408
– IBM Tivoli Monitoring: Configuring IBM Tivoli Enterprise Monitoring Server on z/OS, SC32-9463
– IBM Tivoli Monitoring Installation and Setup Guide, GC32-9407
– IBM Tivoli Monitoring Problem Determination Guide, GC32-9458
– IBM Tivoli Monitoring User's Guide, SC32-9409
– IBM Tivoli Monitoring: Upgrading from Tivoli Distributed Monitoring, GC32-9462
– IBM Tivoli Universal Agent API and Command Programming Reference Guide, SC32-9461
– IBM
Tivoli Monitoring Universal Agent User's Guide, SC32-9459
– Introducing IBM Tivoli Monitoring, GI11-4071

Online resources

These Web sites are also relevant as further information sources:
• IBM Tivoli
http://www.ibm.com/tivoli
• IBM Tivoli Composite Application Manager for WebSphere product page
http://www.ibm.com/software/tivoli/products/composite-application-mgr-websphere/
• IBM Tivoli Composite Application Manager for SOA product page
http://www.ibm.com/software/tivoli/products/composite-application-mgr-soa/
• DB2 UDB Version 8 FixPaks and clients
http://www.ibm.com/software/data/db2/udb/support/downloadv8.html
• ITCAM for Response Time Tracking Fix Pack 1
http://www3.software.ibm.com/ibmdl/pub/software/tivoli_support/patches/
• Open Group Web site for Application Response Management (ARM)
http://www.opengroup.org/arm
• Microsoft® link for InstallShield error
http://support.microsoft.com/default.aspx?scid=kb;en-us;295278
• Java specification for JAX-RPC: JSR-000109 Implementing Enterprise Web Services
http://www.jcp.org/aboutJava/communityprocess/final/jsr109/
• Eclipse Web site
http://www.eclipse.org

How to get Redbooks

You can search for, view, or download Redbooks, Redpapers, Technotes, draft publications, and Additional materials, as well as order hardcopy Redbooks, at this Web site:
ibm.com/redbooks

Help from IBM

IBM Support and downloads
ibm.com/support

IBM Global Services
ibm.com/services

Managing Composite Applications: An Operator’s View

A composite application is a distributed implementation of an application that spans several application servers and crosses platform boundaries. This circumstance can create an operations challenge.
The challenge has grown with the advent of Service-Oriented Architecture (SOA) because many applications have become loosely coupled, meaning that programs can find connections and services at run time, depending on the available environment.

Operations management for composite applications is a complex issue. Applications are generally designed based on functionality, not manageability. An operator has to rely on management tools to diagnose any problem in these applications and recover them.

This paper describes an approach for designing a management solution for operators to manage composite applications. It also provides step-by-step instructions for implementing this solution for a sample application, the Trader application, which has been enhanced with Web Services calls and access to Enterprise Service Bus (ESB) mediation functions.

Redpaper

INTERNATIONAL TECHNICAL SUPPORT ORGANIZATION

BUILDING TECHNICAL INFORMATION BASED ON PRACTICAL EXPERIENCE

IBM Redbooks are developed by the IBM International Technical Support Organization. Experts from IBM, Customers and Partners from around the world create timely technical information based on realistic scenarios. Specific recommendations are provided to help you implement IT solutions more effectively in your environment.

For more information: ibm.com/redbooks

REDP-4319-00