IBM Power Systems
Monitoring System Performance and Health of IBM i
Dawn May - dmmay@us.ibm.com
© 2012 IBM Corporation

Goals
Do you need to understand system performance in real time? Do you need to know who is consuming the system CPU? Do you want to notify an operator when a partition is not performing as expected? Do you want to know immediately when an inquiry message is sent or when a critical job ends?
This session will show you how to automate monitoring so you can focus on other aspects of your job.
By the end of this session, you will be able to:
– Create and use Management Central monitors
– Understand monitoring with IBM Systems Director
– Understand Health Indicators with IBM Systems Director Navigator

Agenda
Three Toolsets
Monitoring with Management Central
Monitoring with IBM Systems Director
Performance Health Indicators

Monitoring Interfaces
Three different management tools
– System i Navigator
  • Windows client application “iSeries Navigator”, “Operations Navigator”
  • Management Central Monitors
– IBM Systems Director
  • Cross-platform systems management solution
  • Platform management, HW alerts
  • Real-time performance monitoring
– IBM Systems Director Navigator
  • Browser-based interface to manage a single IBM i partition
  • IBM i Performance Tasks - Performance Data Investigator
  • Health Indicators

Notes: Monitoring Interfaces - where to get them?
System i Navigator is included with IBM i at no additional cost. The IBM i function is integrated into the base of the operating system. The client function is shipped as part of System i Access for Windows. Management Central is a technology integrated into System i Navigator and is not directly installed. When installing System i Access for Windows, choose 'Custom Install'.
Expand the System i Navigator option tree and select the appropriate components, such as Monitors, Commands, and so on.
The general rule for Management Central connectivity is that N-2 and N+2 releases are supported. However, for the best performance and the most functions, it is strongly recommended that your System i Navigator and your Central System be at the highest release you have available. Your endpoint systems can then be at a mix of previous releases. Also, note that the functions available to you are only as current as the client and Central System combination (for example, if System i Navigator is at 6.1 and the Central System is at 5.4, only 5.4 functions will be available). In almost all cases, a function that is new in a certain release cannot be run against endpoints at older releases.
IBM Systems Director Navigator is the Web console available beginning with IBM i 6.1. It is included with IBM i; no installation is necessary, all you need is a Web browser.
IBM Systems Director is a browser-based management tool. You need to download and install the management server (it doesn't run on i) and install the required agents on each endpoint. For monitoring on i, you need to install the 5770UME licensed program product and 5733-SC1, Option 1 (SSH) on each IBM i partition you want to monitor. IBM Systems Director can be downloaded from http://www-03.ibm.com/systems/software/director/downloads/index.html

Management Central Monitors

Know your Tools
[Diagram labels: Firewall and Internet; Web application server; Central System; Endpoint Systems; System Group; IBM Systems Director Navigator for i (Performance tasks); IBM System i Navigator for Windows (monitoring, historical trending)]

Notes - Terminology
IBM System i Navigator provides a graphical user interface to IBM i®. It comes in 2 complementary options: Windows and Web.
System i Navigator for Windows is installed on the PC so the user can have a rich graphical interface to interact with their systems.
System i Navigator tasks on the Web perform a subset of Navigator tasks through an Internet Web browser. These are URL-addressable links only.
IBM Systems Director Navigator for i is the browser-based console that has much of the function of System i Navigator; monitors, however, are not available through this interface. The Performance tasks are only available through this Web console.
IBM® i® integrated Web application server - The i integrated Web application server (5761-SS1) integrates an OSGi-based Web-servlet container with the i operating system (5.4 and later).
Central System - Helps to manage your other systems (called endpoints) and stores most management information (inventory, command, package, product, and user definitions, etc.).
Endpoints - The systems that your PC does not need to be in direct contact with in order to "manage".
Source System - The system from which objects, files and information are sent using Management Central's send tasks.
Model System - Has all and only the desired fixes installed, or has all system values set properly for the targets.
Target System - The destination to which objects, files and information are sent by Management Central's send tasks. Target Systems (and more generally, endpoint systems) are often grouped into System Groups.

Real-time performance graphs with System Monitors

System Monitors
Monitor system performance
– Predefined metrics
  ● CPU Utilization, Disk Arm Utilization...
Enabled with...
– Event logging
– Trigger/Reset notification
– CL Command Automation
– System Actions
  ● Job actions (hold, release...)

System Monitors
Select 'New Monitor...' and specify General properties

Notes: Select 'New Monitor...' & specify General properties
You need to name your monitor; specific names work well. You can select one or multiple metrics in a single monitor.
Question: How many metrics do you put in a single monitor?
Answer: It depends.
1. Do you like to see all monitors on a single screen?
2. Do you prefer to have more granular monitor notifications?
There is no limit on the number of endpoint systems that a monitor can be started on. However, you do run into usability issues when displaying the graph: with too many systems on a graph, it might become too difficult to view.
General - The General page for New Monitor or Monitor Properties allows you to view and change general information about the monitor. The general information includes the name of the monitor and a brief description of the monitor.
Name - The unique name of the monitor. You can change the name, using up to 64 characters for the new name. Do not use any of the following characters: asterisk (*), backslash (\), colon (:), greater than (>), less than (<), question mark (?), quotation mark ("), slash (/), or vertical bar (|).
Description - A brief description to help you identify this monitor in a list of monitors. You can change the description, using up to 64 characters for the new description.

System Monitors
Select the 'Metrics to monitor'; when done, press OK to create
[Screenshot callouts: What; How often; Vertical axis; Horizontal axis]

Notes: Define A Monitor
The Metrics page for New Monitor or Monitor Properties allows you to select the metrics that you want to monitor. You can view and change information about the collection interval, the maximum graphing value, and the display time for each metric.
You can also click Threshold 1 or Threshold 2 to specify information about the thresholds for each metric.
Metrics are the pieces of information to collect. Possible values are:
  CPU Utilization (Average)
  CPU Utilization (Interactive Jobs)
  CPU Utilization (Interactive Feature)
  CPU Utilization Basic (Average)
  CPU Utilization (Secondary Workloads)
  CPU Utilization (Database Capability)
  Interactive Response Time (Average)
  Interactive Response Time (Maximum)
  Transaction Rate (Average)
  Transaction Rate (Interactive)
  Batch Logical Database I/O
  Disk Arm Utilization (Average)
  Disk Arm Utilization (Maximum)
  Communications IOP Utilization (Average)
  Communications IOP Utilization (Maximum)
  Communications Line Utilization (Average)
  Communications Line Utilization (Maximum)
  LAN Utilization (Average)
  LAN Utilization (Maximum)
  Machine Pool Faults
  User Pool Faults (Average)
  User Pool Faults (Maximum)
  Disk Storage (Average)
  Disk Storage (Maximum)
  Disk IOP Utilization (Average)
  Disk IOP Utilization (Maximum)
Collection interval is the time to wait between each collection of data.
Maximum graphing value is the highest value to be displayed on the vertical axis of the graph.
Display time is how many minutes you want displayed on the horizontal axis of the graph.

System Monitors
Setting Thresholds
[Screenshot callouts: What; Condition; Automation]

Notes: Setting Thresholds
Threshold - A threshold is a setting for a metric that is being collected by a monitor. This setting allows you to specify actions to be taken when a specified value (called the trigger value) is reached. You can also specify actions to be taken when a second value (called the reset value) is reached. For example, you can specify a CL command that stops any new jobs from starting when CPU utilization reaches 90% and another command that allows new jobs to start when CPU utilization falls to less than 70%.
You can also choose to add an event to the Event Log whenever the trigger value or the reset value is reached. You can set up to two thresholds for each metric that the monitor is collecting.
Trigger - considered bad (usually high, but can be low). Reset - considered good (the opposite of trigger).
The two Thresholds tabs on the Metrics page provide a place for you to specify whether or not you want to monitor this metric for a particular threshold. You must check the Enable threshold box before you can specify the conditions to trigger and to reset this threshold. You can also specify the action to be taken when the threshold is triggered and when it is reset. The action that you specify must be a CL command. When you click OK, this metric will be actively monitored for this threshold if the monitor is currently running. If the monitor is not currently running, this metric will be monitored for this threshold the next time the monitor is started.
You can specify the following conditions and commands for Threshold trigger and for Threshold reset:
Value - Specifies the condition that must be met to trigger or to reset this threshold.
Duration - Specifies the number of consecutive collection intervals for which the value must meet the criterion to cause a threshold trigger or reset event. Specifying a higher number of collection intervals for Duration helps to avoid unnecessary threshold activity due to frequent spiking of values.
i command - Specifies the command to be run on the i endpoint system when the threshold is triggered or reset on that endpoint. This command can be as simple as sending a message, or as complex as submitting or calling a program.
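
The interplay of Value and Duration for trigger and reset can be sketched as a small state machine. This is only an illustration of the behavior described in the notes above, not Management Central's implementation: the class and function names are hypothetical, and the sketch assumes a "high is bad" threshold (trigger at or above the trigger value, reset below the reset value), as in the 90% / 70% CPU example.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    """Hypothetical model of one threshold on one metric."""
    trigger_value: float   # e.g. 90 (% CPU)
    trigger_duration: int  # consecutive intervals needed to trigger
    reset_value: float     # e.g. 70 (% CPU)
    reset_duration: int    # consecutive intervals needed to reset
    triggered: bool = False
    _streak: int = 0       # consecutive intervals meeting the active condition

    def observe(self, value):
        """Feed one collection-interval sample; return 'trigger', 'reset' or None."""
        if not self.triggered:
            meets, needed = value >= self.trigger_value, self.trigger_duration
        else:
            meets, needed = value < self.reset_value, self.reset_duration
        # A single sample that misses the condition restarts the count.
        self._streak = self._streak + 1 if meets else 0
        if self._streak >= needed:
            self.triggered = not self.triggered
            self._streak = 0
            return "trigger" if self.triggered else "reset"
        return None


def run(samples, threshold):
    """Return (interval index, event) pairs for a series of samples."""
    events = []
    for i, value in enumerate(samples):
        event = threshold.observe(value)
        if event:
            events.append((i, event))
    return events


if __name__ == "__main__":
    cpu = [95, 60, 92, 93, 80, 65, 68, 50]
    print(run(cpu, Threshold(90, 2, 70, 2)))
```

With a Duration of 2, the isolated spike to 95 in the first interval does not trigger anything; the trigger event fires only after two consecutive samples at or above 90, and the reset only after two consecutive samples below 70. This is the spike-damping effect that a higher Duration is meant to provide.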

Notes: Threshold variables
System Monitor Replacement Variables (Parameter - Passed Data):
&DATE - The date the monitor triggered or reset
&INTVL - Collection interval: how often the monitor collected data (in seconds)
&MON - The monitor name
&RDUR - Reset duration: how many intervals the reset value has to be met before the monitor resets
&RVAL - Reset value: the value that the metric was monitoring for when the monitor reset
&SEQ - Sequence number: a unique, incrementing number assigned to each collection interval. Can be used in a program to compare when triggers happened and in what sequence.
&TDUR - Trigger duration: how many intervals the trigger value has to be met before the monitor triggers
&TIME - The time the monitor triggered or reset
&TVAL - Trigger value: the value that the metric was monitoring for when the monitor triggered
&VAL - Current value: the actual value of the metric when the monitor triggered
Note: A couple of things to note about system monitor replacement parameters:
- The dollar sign ($) that was available in previous releases is still supported, for example, $TIME.
- The wording is a bit different on some metrics and values:
  - Batch I/O is shown as I/O operations rather than transactions per second.
  - Transaction rates are shown as transactions rather than transactions per second.
  - Interactive response times (both average and maximum) are shown in milliseconds rather than seconds.

System Monitors
Select the monitor, then the start button

Notes: Start A Monitor
The Start Monitor dialog allows you to select the endpoint systems and system groups on which you want to start the monitor (if no endpoint systems or system groups have been previously selected for this monitor). To add a system or group to the Selected systems and groups list, select it in the Available systems and groups list, and then click Add.
If a monitor is started and then a system is added, the monitor will be started on that endpoint system automatically. To remove a system or group from the Selected systems and groups list, select it in the list, and then click Remove. If a monitor is started and then a system is removed, the monitor will be stopped on that endpoint system automatically.
Available systems and groups - A list of endpoint systems and system groups from which you can select a system or group. Click the plus sign (+) next to any group to see the systems that are included in the group.
Monitor data is collected and stored on the endpoint system. A minimal amount of data is sent back to the client when viewing the graph; the more specific, detailed data is sent to the client only when the graphs are open.
The PC is not required to be connected once the monitor is started. The graph window can also be minimized and the monitor will still be active.
The data shown in the graph is obtained from Collection Services. Collection Services houses the data in management collection objects. This data is used by system monitors, job monitors and other performance tools.

System Monitors
View the status
[Screenshot callout: Overall status]

Notes: View Status for a Monitor
The Status dialog allows you to see the current status of each endpoint system and system group associated with a monitor. The status of each system and group is updated automatically as changes occur. You can expand any group in the System or Group list to see the status of individual systems in the group. By clicking the Restart button, you can restart the monitor on any systems on which it has failed.
Overall status - The current status of the monitor. Possible values are:
x thresholds triggered - The number of thresholds that are currently active for the monitor (that is, x represents the number of thresholds that have been triggered but have not been reset).
Started on x of y systems - The monitor is collecting data on x of y endpoint systems, where x represents the number of systems where the monitor is running and y represents the number of systems where you requested to start the monitor. The monitor is in the process of starting on the remaining systems.
Started - The monitor is collecting data on all endpoint systems where you requested to start the monitor.
Starting - The monitor is in the process of starting.
Stopping - The monitor is in the process of stopping.
Stopped - The monitor is no longer collecting data.
Failed - An attempt was made to start the monitor on the specified systems or groups, but the monitor was not started on any systems. The failure may have occurred because the systems were not running when you tried to start the monitor, or it may be because a connection was lost or a server was not started. Click Restart to try starting this monitor again.
Failed on x of y systems - The monitor has failed to start or unexpectedly stopped working on x of y systems (where x is the number of systems on which work has stopped and y is the total number of systems on which the monitor is to be run). The monitor is starting or started on the remaining systems. The failure may have occurred because the systems were not running when you tried to start the monitor, or it may be because a connection was lost or a server was not started. Click Restart to try starting this monitor again.
See the System or Group Status for a list of the endpoint systems and system groups associated with the monitor and the current status of each system and group.

System Monitors
Viewing the thresholds
[Screenshot callouts: Threshold Indicators; Drill down with Actions]

Changing Thresholds
[Screenshot callouts: Properties; Active Control; Menu]

Notes: Changing Thresholds
You can change the thresholds several ways.
– Properties
– Active graphical control
– Menu items
You can change thresholds while a monitor is started; that is, you do not need to stop the monitor to change the thresholds.
The general properties of the monitor can be accessed via the toolbar or menu items to make any changes or additions to the thresholds and values.
To change the trigger value or the reset value for a threshold using the active graphical control, place the mouse pointer on the threshold indicator. When the ToolTip indicates Trigger, hold the mouse button down and move up or down to change the trigger value. The changing values are shown in the ToolTip. When the ToolTip indicates Reset, hold the mouse button down and move up or down to change the reset value. Click any collection point on a Monitor graph line to see Details of the data associated with the collection point.
By accessing the menu items, you will be taken directly to the thresholds page in Properties to make any changes.
There are several visual indicators when a threshold is triggered:
– Status in the toolbar area.
– The upper left corner icon will change.
– The line in the graph will change to red.
– The metric graph title will change to red with an icon indicator.

Threshold Actions
[Screenshot callouts: IBM i; PC Client]

Notes: Threshold Actions
The Actions page for Monitor Properties allows you to specify the actions, which apply to all metrics, to occur when a threshold is triggered and when a threshold is reset.
Log event - Adds an entry to the Event Log on the central system indicating that the threshold was triggered. The entry also includes the date and time the event occurred, the endpoint system being monitored, the metric being collected, and the monitor that logged the event.
Open Event Log - Displays the Event Log, which is a list of threshold trigger and reset events that have occurred.
Open monitor - Displays a graphical view of the metrics as they are being collected.
Sound alarm - Sounds an alarm on the PC.
Threshold commands are run under the user profile of the monitor's owner.
When a threshold is triggered or reset, your PC client does not need to be up and running for the Operating System command to run. However, if the PC client is not up, the corresponding PC action will not happen.

Viewing Events From Thresholds

Notes: Event Logs
The Event Log window displays a list of threshold trigger and reset events for all of your monitors. You can specify on the Properties page for each monitor whether or not you want events added to the Event Log. To see the Properties page for any monitor, select the monitor in the Monitors window and then select Properties from the File menu.
The list of events is arranged in order by date and time by default, but you can change the order by clicking on any column heading. For example, to sort the list by the endpoint system where the event occurred, click on System.
An icon to the left of each event indicates the type of event:
A red circle with a white x - Indicates that this event is a trigger event for which you did not specify a host command to be run when the threshold was triggered.
A yellow circle with a red x - Indicates that this event is a trigger event for which you specified a host command to be run when the threshold was triggered.
A white icon with a check mark - Indicates that this event is a threshold reset event.
You can customize the list of events to include only those that meet specific criteria by selecting Options from the menu bar and then selecting Include.
You can have more than one Event Log window open at the same time, and you can work with other windows while the Event Log windows are open. Event Log windows are updated continuously as events occur.

Customize Event Log Information

Notes: Options – Include
Options menu choices
Click Options on the menu bar to display the actions you can perform to change what information is displayed. The possible choices are:
Include... - Displays the Include dialog, which allows you to specify which events you want to display in the list.
Columns... - Displays the Columns dialog, which allows you to specify which columns of information you want to display in the list. You can also specify the order in which you want the columns to be displayed.

Event Properties - Trigger (reset similar)

Notes: Event Properties
The Trigger/Reset page for Event Properties allows you to view additional information about the event. This information includes the value, the duration, the Operating System command and the sequence number of the event.
Trigger/Reset value - The value specified in the monitor properties.
Actual value - The actual value that exceeded the trigger value and caused the trigger event.
Duration - The number of collection intervals specified for the duration in the monitor properties.
Operating System command - The command that was run on the endpoint system when the event occurred.
The General page for Event Properties allows you to view general information about the event. The general information includes the type of event (trigger or reset), the date and time the event occurred, the endpoint system that the event occurred on, the metric that was being collected, and the name of the monitor that logged the event.
For more information, select the following: Event type, System, Date, Time, Monitor, Metric

Management Central Job Monitors
Monitor the system by selecting ...
– Criteria to subset data
  ● Job criteria - subsystem, job name, job type or user
  ● Server name - web server, ftp server...
– Predefined metrics
  ● Job Count, Thread count, CPU Utilization, …
Enabled with...
– Event logging
– Trigger/Reset notification
– CL Command Automation
– System Actions
  ● Job actions such as hold, release...

Job Monitors
Select 'New Monitor...' and specify General properties

Job Monitors
Select 'Metrics to monitor' and thresholds, then press OK to create
[Screenshot callouts: Different Metrics Types; Levels; Problem Condition; Automation]

Notes: Define A Monitor
The Metrics page for New Monitor or Monitor Properties allows you to select the metrics that you want to monitor. You can view and change information for each metric. You can also click Threshold 1 or Threshold 2 to specify information about the thresholds for each metric.
Metrics are the pieces of information to collect.
Possible values are:
  Job Count, Job Log Message and Job Status
  Job Numeric Values:
    CPU Percent Utilization
    Logical I/O Rate
    Disk I/O Rate
    Communications I/O Rate
    Transaction Rate
    Transaction Time
    Thread Count
    Page Fault Rate
  Summary Numeric Values (same as job level)

Notes: Threshold variables
Job Monitor Replacement Variables (Parameter - Passed Data):
&DATE - The date the monitor triggered or reset
&INTVL - Collection interval: how often the monitor collected data (in seconds)
&MON - The monitor name
&TIME - The time the monitor triggered or reset
&ENDPOINT - The endpoint system name
&EVENTTYPE - Event type: the type of trigger or reset that is happening, defined as follows:
    Triggered Event = 1
    Auto Reset Event = 2
    Manual Reset Event = 3
&JOBNAME - The job name of the job causing the trigger/reset
&JOBNUMBER - The job number of the job causing the trigger/reset
&JOBSTATUS - The job status causing a trigger/reset
&JOBTYPE - The job type of the job causing the trigger/reset
&JOBUSER - The job user of the job causing the trigger/reset

Notes: Threshold variables
Job Monitor Replacement Variables (continued):
&METRICTYPE - The category of the metric. For a Job monitor, the categories are as follows:
    Status Metric = 10010
    Message Metric = 10020
    Numeric Metric = 10030
&METRIC - Metric that has triggered/reset, defined as follows:
    Job CPU Utilization = 1010
    Job Logical I/O = 1020
    Job Disk I/O = 1030
    Job Comm I/O = 1040
    Job Transaction Rate = 1050
    Job Transaction Time = 1060
    Job Thread Count = 1070
    Job Page Faults = 1080
    Summary CPU Utilization = 2010
    Summary Logical I/O = 2020
    Summary Disk I/O = 2030
    Summary Comm I/O = 2040
    Summary Trans. Rate = 2050
    Summary Trans. Time = 2060
    Summary Thread Cnt = 2070
    Summary Page Faults = 2080
    Job Status = 3010
    Job Log Messages = 3020
    Summary Job Count = 4010

Notes: Threshold variables
Job Monitor Replacement Variables (continued):
&NUMCURRENT - Current numeric value
&NUMRESET - Threshold value to cause auto-reset of a numeric metric
&NUMTRIGGER - Threshold value to cause trigger of a numeric metric
&OWNER - Monitor owner
&RDUR - Reset duration, in intervals, as set in the threshold
&RESETTYPE - Reset type, defined as follows:
    Manual reset = 1
    Automatic reset = 2
&SBS - Subsystem of the job causing the trigger/reset
&SERVER - Server type of the job causing the trigger/reset. Note: Not supported for summary metrics.
&TDUR - Trigger duration, in intervals, as set in the threshold
&THRESHOLD - Threshold number causing the trigger
&MSGID - Message ID causing the trigger/reset
&MSGSEV - Message severity causing the trigger/reset
&MSGTYPE - Message type causing the trigger/reset

Notes: Threshold variables
Invalid Job Monitor Replacement Variable Combinations:
Job Monitor substitution parameter notes:
• If a monitor is triggered and the user performs a manual reset ("Reset with Commands" or "Reset Only"), there is no substitution value for the parameters &NUMRESET and &RDUR. They only have a value if the reset is automated.
• To use &MSGID, &MSGSEV, or &MSGTYPE, you need to be monitoring the 'Job Log Message' metric; otherwise there is no substitution value for these. Additionally, these are only valid in the trigger and reset commands of Job Log Messages thresholds.
• &RESETTYPE only has a valid substitution value on a reset command.
Constant values are used to determine whether the reset type is manual or automated.
• &EVENTTYPE is valid for all substitutions and has constant values that are used to determine the type of monitor event that occurred (automated trigger, automated reset, or manual reset). In a trigger command, the value is always the trigger constant; in a reset command, it can be either the automated reset or the manual reset constant.
• &TDUR, &NUMTRIGGER, and &NUMCURRENT only have valid substitution when a trigger occurs, in the trigger command.
• &NUMTRIGGER, &NUMCURRENT, and &NUMRESET only have valid substitution when a "numeric" metric is being monitored, in the trigger and reset commands of numeric metric thresholds.
• &JOBSTATUS only has valid substitution when the Job Status metric is monitored, in the trigger and reset commands of Job Status thresholds.
• Job Count metric not valid with: &JOBNAME, &JOBUSER, &JOBNUMBER, &JOBTYPE, &SBS, &SERVER, &MSGID, &MSGSEV, &MSGTYPE, and &JOBSTATUS
• Job Log Message metric not valid with: &RDUR, &NUMRESET, &TDUR, &NUMTRIGGER, &NUMCURRENT, and &JOBSTATUS
• Job Status metric not valid with: &NUMRESET, &NUMTRIGGER, &NUMCURRENT, &MSGID, &MSGSEV, and &MSGTYPE
• The 'Job Numeric Values' metrics of CPU Percent Utilization, Logical I/O Rate, Disk I/O Rate, Communications I/O Rate, Transaction Rate, Transaction Time, Thread Count, and Page Fault Rate are not valid with: &MSGID, &MSGSEV, &MSGTYPE, and &JOBSTATUS
• The 'Summary Numeric Values' metrics of CPU Percent Utilization, Logical I/O Rate, Disk I/O Rate, Communications I/O Rate, Transaction Rate, Transaction Time, Thread Count, and Page Fault Rate are not valid with: &JOBNAME, &JOBUSER, &JOBNUMBER, &JOBTYPE, &SBS, &SERVER, &MSGID, &MSGSEV, &MSGTYPE, and &JOBSTATUS

Job Monitors
Setting collection interval
[Screenshot callouts: How often? 5, 15, 30 minutes or 1 hour; Tuning options; Caution: uses system resources!]

Notes: Job Monitors and System Resources
Job monitors connect to a QZRCSRVS job for each job that is being monitored for the Job Log Messages and the Job Status metrics. QZRCSRVS jobs are not Management Central jobs. They are IBM i TCP Remote Command Server jobs that the Management Central Java server uses for calling commands and APIs. In order to process the API calls for the Job Log Messages and Job Status metrics in a timely fashion within the job monitor's interval length, the APIs are called for each job concurrently at interval time.
When both metrics are specified on the same monitor, two QZRCSRVS jobs are started for each job. For example, if 5 jobs are monitored for Job Log Messages, 5 QZRCSRVS jobs are started to support the monitor. If 5 jobs are monitored for Job Log Messages and Job Status, then 10 QZRCSRVS jobs are started.
Thus, when you are using the Job Log Message and Job Status metrics, it is recommended that you limit the number of jobs monitored on a small system to 40 or fewer.

Job Monitors
Actions
[Screenshot callouts: Server; PC Client]
Job, Message and File Monitors: data will collect without thresholds and actions

Job Monitors
Select the monitor, then the start button

Job Monitors
View the status
[Screenshot callouts: Overall status; Restart on failed systems]

Job Monitors
Viewing the system and job information
[Screenshot callouts: Status; Detailed Information; Actions]

Message Monitors
Message monitors can be used to view messages across systems that match monitor criteria. You can work with the messages listed in the monitor (display details, reply, and delete).
Monitor the system by selecting ...
– A single message queue
– Message criteria
  ● ID, severity or type
– Predefined metrics
  ● Count...
Enabled with...
– Event logging
– Trigger/Reset notification
– CL Command Automation
– System Actions
  ● Message reply, delete...

Message Monitors
Select 'New Monitor...' and specify General properties

Message Monitors
Select 'Metrics to monitor' and thresholds, then press OK to create
[Screenshot callouts: Messages; Condition; Automation]

Notes: Threshold variables
Message Monitor Replacement Variables (Parameter - Passed Data):
&DATE - Date
&MON - Monitor name
&INTVL - Collection interval length in seconds
&TIME - Time
&ENDPOINT - Endpoint system name
&EVENTTYPE - Event type, defined as follows:
    Triggered Event = 1
    Manual Reset Event = 3
&FRMJOBNUMBER - Job number for the job causing the triggering message
&FRMJOBNAME - Job name for the job causing the triggering message
&FRMUSER - User owning the job causing the triggering message
&FRMPROGRAM - Name of the program causing the triggering message
&MSGKEY - 4-byte message key for the message causing the trigger (as a hex string)
&MSGID - Message ID causing the trigger
&MSGSEV - Message severity causing the trigger
&MSGTYPE - Message type causing the trigger
&MSGCOUNT - Current message count (that caused the trigger)
&OWNER - Monitor owner
&THRESHOLD - Threshold number causing the trigger
&TOLIB - Message queue's library to which this message was sent (the library of the queue being monitored)
&TOMSGQ - Message queue name to which this message was sent (the queue being monitored)

Message Monitors
Viewing the system and message information
[Screenshot callouts: Status; Actions]

Monitoring with Watches
Watches can be used to automate the actions taken when the following occur:
– Message
– Licensed Internal Code Log (LIC Log)
– Product Activity Log Entry (PAL entry)
Start Watch (STRWCH) command or API (QSCSWCH)
End Watch (ENDWCH) command or API (QSCEWCH)
When the condition being watched occurs,
your program gets control and you can take any action you want http://ibmsystemsmag.blogs.com/i_can/2010/01/i-can-automate-monitoring-with-watches.html http://publib.boulder.ibm.com/infocenter/iseries/v7r1m0/topic/rzahb/rzahb_eventfunction.htm 52 © 2012 IBM Corporation IBM Power Systems Watches Low Overhead – Watches are an exit – Almost no overhead until the watched condition occurs – Your program gets control to determine what action to take – For message watches • Can watch for messages sent to any message queue, including QSYSOPR, History Log • Can watch for messages sent to any job log Can specify generic job name Can specify *ALL to watch for a message to all job logs 53 © 2012 IBM Corporation IBM Power Systems File Monitors You can use a file monitor to notify you whenever a selected file has changed, reached a specified size, or for specified text strings. Monitor the system by selecting ... – History Log (QHST) or specific files – File criteria ● File location – Predefined metrics ● Status, size and text Enabled with... – Event logging – Trigger/Reset notification – CL Command Automation 54 © 2012 IBM Corporation IBM Power Systems File Monitors You can select to monitor all system log files or selected files. 55 © 2012 IBM Corporation IBM Power Systems File Monitors Metrics – Text – File Status – Size 56 © 2012 IBM Corporation IBM Power Systems Automatic Reset for Message and File Monitor Triggers 57 Some metrics can be reset automatically after a trigger command runs – Only available for Message and File monitors © 2012 IBM Corporation IBM Power Systems Sharing of Monitors Monitors can be shared Owner – User that created the monitor None – No one else can see it Read-only – Others can see it ● i.e. view properties & copy it Controlled – Other can perform actions ● i.e. start and stop 58 © 2012 IBM Corporation IBM Power Systems Notes: Sharing The owner has specified one of the following levels of sharing: None Other users cannot view this item. 

Read-Only
Other users can view this item and use it (but cannot start or stop it). Other users can create a new item based on this one and make changes to the new one as needed. However, other users cannot delete or change this item in any way. If you are the owner of a monitor and have specified actions (such as opening the event log window or sounding an alarm on the PC), these actions occur for all users of the monitor whenever a threshold is triggered or reset. The other users cannot change these actions.

Controlled
Other users can start and stop this item. Only the owner can change the level of sharing. Other users can also view this item and use it to create a new item based on this one. If you are the owner of a monitor and have specified actions (such as opening the event log window or sounding an alarm on the PC), these actions occur for all users of the monitor whenever a threshold is triggered or reset. The other users cannot change these actions.

Actions are run under the profile of the owner!
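
The sharing levels in these notes can be read as a small permission table. The Python sketch below is purely illustrative and is not part of Management Central: the names Sharing, OTHER_USER_RIGHTS, and is_allowed, and the operation labels ("view", "copy", "start", "stop", and so on), are hypothetical. It only encodes the rules stated in the notes, assuming the owner may always perform any operation.

```python
from enum import Enum


class Sharing(Enum):
    """Sharing levels named in the notes."""
    NONE = "None"
    READ_ONLY = "Read-Only"
    CONTROLLED = "Controlled"


# Operations that users other than the owner may perform at each level.
# "copy" stands for creating a new item based on the shared one.
OTHER_USER_RIGHTS = {
    Sharing.NONE: frozenset(),
    Sharing.READ_ONLY: frozenset({"view", "copy"}),
    Sharing.CONTROLLED: frozenset({"view", "copy", "start", "stop"}),
}


def is_allowed(level, operation, is_owner=False):
    """Return True if the operation is permitted at the given sharing level."""
    if is_owner:
        # Assumption: the owner is never restricted by the sharing level.
        return True
    return operation in OTHER_USER_RIGHTS[level]


if __name__ == "__main__":
    for level in Sharing:
        allowed = sorted(OTHER_USER_RIGHTS[level])
        print(f"{level.value}: other users may {allowed or 'do nothing'}")
```

Operations such as changing the sharing level, editing actions, or deleting the item never appear in the table, so under this sketch they remain owner-only at every level.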

Monitoring with IBM Systems Director

Cross-platform Management
[Figure labels: Upward Integration modules supporting Tivoli, Computer Associates, Hewlett Packard, Microsoft; Windows, Linux, AIX, and IBM i administrators; Microsoft Windows, VMware ESX, VM, and VIO partitions; Managed Systems: Common agent, Platform agent, No agent]

IBM i Prerequisites for IBM Systems Director 6.3
Agentless - SSH is required to discover IBM i
– SSH, 5733-SC1 Option 1
Platform agent
– IBM Universal Management Enablement
– For CIM capabilities (for example, IBM i Monitors)
– 5770-UME V1R3
– Operating System and UME fixes are required
  http://www-912.ibm.com/s_dir/slkbase.NSF/DocNumber/618224537
Common Agent (CAS)
– Download from IBM Systems Director web site
– Install manually on i or through the IBM Systems Director UI
http://publib.boulder.ibm.com/infocenter/director/pubs/topic/com.ibm.director.tbs.helps.doc/fqm0_tbs_ibm_i_endpoints.html

Collection Services Prerequisites for Monitoring
Collection Services must be started
Consider the collection interval for frequency of updates through Systems Director
– Default is every 15 minutes
Data must be in Collection Services DB2 files to be displayed on IBM i Monitors in Systems Director
– CFGPFRCOL – CRTDBF parameter must be *YES
“No Data Available”
– Symptom of possible Collection Services problem when viewing IBM i Monitors

Performance Summary

Monitor Your i

Common CIM Monitors

IBM i Monitors
34 Metrics for common monitoring scenarios

Create Your Own Monitor

Director Agent Monitors (requires CAS agent)

CIM Monitors
There are many metrics that you can monitor with CIM, but it can be difficult to figure out what's available.

Monitor Your IOA Cache Batteries
http://ibmsystemsmag.blogs.com/i_can/2012/04/monitoring-cache-battery-status-with-ibm-systems-director-63.html

IOA Cache Battery Monitor Metrics

Manage Processes … aka Work with Active Jobs (requires CAS agent)

Monitoring IBM i Processes
Create a process monitor from the Manage Processes task
Once a process monitor is created, additional metrics can be monitored for that process using the Create monitor task
Requires CAS agent

Monitor Thresholds
The process of setting thresholds and enabling automation is the same regardless of the metric type
For any metric that you want to monitor, you can set thresholds
Once thresholds are set, you can then create an event filter for the event that occurs when the threshold setting is hit
Automation plans complete the setup to enable automatic notification that the threshold was hit

Create Event Filters

Event Automation Plans

Event Actions

Graphs and Dashboard

Health Summary

Monitoring Messages with IBM Systems Director Events

Event Filters for QSYSOPR messages

Monitoring QSYSOPR .... Event Automation Plan
After you establish what you want to monitor by creating the event filter for IBM i messages ...
You must also create an event automation plan for the action Director is to take when those events occur
Even though these events are generated by their respective operating systems (or an optional layer that is installed on the operating system), IBM Systems Director does not process these events unless you create an event automation plan to do so.

Monitoring System Performance with IBM Systems Director Navigator

IBM Systems Director Navigator for i
Performance Tasks

Health Indicators
Manually Monitor your System Performance

Health Indicators
Customize Health Indicator Thresholds

Continuing Education
A Few More Things....

Monitor Restart Options

Notes: Changing the Monitor Restart Options
Restarting Monitors
Monitor Restart was added to provide a way to automatically restart monitors when the Management Central servers have been interrupted. These interruptions could be as simple as the MC Central Server or MC Endpoint Server being restarted, or something more dramatic such as the temporary loss of communications between the Central Server and an Endpoint Server or a system being IPLed.
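
Automatic restart is governed by two settings, 'How long to attempt restart' and 'How often to attempt restart'. As a rough illustration of how they combine, the Python sketch below lists when attempts would fall for a 180-minute limit and a 5-minute interval. The function name restart_attempt_offsets is hypothetical, and the timing model is an assumption (first attempt one interval after the interruption, last attempt no later than the limit); the actual Management Central scheduling may differ.

```python
def restart_attempt_offsets(how_long_minutes, how_often_minutes):
    """Return the minutes after an interruption at which a restart is attempted.

    Assumed model: attempts are evenly spaced, starting one interval after
    the interruption and stopping once the overall time limit is reached.
    """
    if how_long_minutes <= 0 or how_often_minutes <= 0:
        raise ValueError("both settings must be positive")
    return list(range(how_often_minutes, how_long_minutes + 1, how_often_minutes))


if __name__ == "__main__":
    offsets = restart_attempt_offsets(180, 5)
    # Number of attempts, first attempt, last attempt (minutes after interruption)
    print(len(offsets), offsets[0], offsets[-1])  # prints "36 5 180"
```

Under this model, a 3-hour window with a 5-minute interval gives at most 36 attempts, which is a useful sanity check when choosing values that will not keep retrying a system that is down for planned maintenance.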
If you choose to have the system automatically attempt to restart your monitors, you may also specify how long you want the central system to keep trying to restart the monitors and how often you want the system to try during that time period. For example, if you want the system to try to restart monitors every five minutes for a period of three hours, select 'Automatically restart monitors on failed systems' and then specify 180 minutes for 'How long to attempt restart' and 5 minutes for 'How often to attempt restart'.
A change to this setting takes effect the next time the Management Central servers are restarted.
All monitors support the restart option.
The default behavior is OFF.
Documentation on how to automatically restart Management Central Monitors:
https://www-304.ibm.com/support/entdocview.wss?uid=nas16e5b0871315547a68625729e004737ce

References
IBM i Home Page
– http://www-03.ibm.com/systems/power/software/i/
Information Center
– http://publib.boulder.ibm.com/iseries/
IBM i Systems Management
– www.ibm.com/systems/i/solutions/management/
– www.ibm.com/systems/i/software/navigator/index.html
http://www.redbooks.ibm.com/redbooks/pdfs/sg246226.pdf

Special notices
This document was developed for IBM offerings in the United States as of the date of publication. IBM may not make these offerings available in other countries, and the information is subject to change without notice. Consult your local IBM business contact for information on the IBM offerings available in your area. Information in this document concerning non-IBM products was obtained from the suppliers of these products or other public sources. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. IBM may have patents or pending patent applications covering subject matter in this document.