An Expert System designed to evaluate IBM z/OS systems
©Copyright 1998-2010, Computer Management Sciences, Inc., Alexandria, VA   www.cpexpert.com

Product Overview
• Helps analyze the performance of z/OS systems.
• Written in SAS (only SAS/BASE is required).
• Runs as a batch job on the mainframe (or on a PC).
• Processes data in a standard performance data base (MXG, SAS/ITRM, or MICS).
• Produces narrative reports showing the results of the analysis!
• Product is updated every six months.
• 45-day trial is available (see license agreement for details).

Components Delivered
• SRM Component *     March 1991
• TSO Component *     April 1991
• MVS Component *     June 1991
  * These legacy components apply only to Compatibility Mode
• DASD Component      October 1991
• CICS Component      May 1992
• WLM Component       April 1995
• DB2 Component       October 1999
• WMQ Component       June 2004

Product Documentation
Each component has an extensive User Manual, available in hard copy, on CD, and web-enabled.
• Describes the likely impact of each finding
• Discusses the performance issues associated with each finding
• Suggests ways to improve performance and describes alternative solutions
• Provides specific references to IBM or other documents relating to the findings
• More than 4,000 pages for all components

WLM Component
• Checks for problems in the service definition
• Identifies reasons performance goals were missed
• Analyzes general system problems:
  • Coupling facility/XCF
  • Paging subsystem
  • System logger
  • WLM-managed initiators
  • Excessive CPU use by SYSTEM or SYSSTC
  • IFA/zAAP, zIIP, and IOP/SAP processors
  • PR/SM, LPAR, and HiperDispatch problems
  • Intelligent Resource Director (IRD) problems

WLM Component - sample report

RULE WLM103: SERVICE CLASS DID NOT ACHIEVE VELOCITY GOAL

DB2HIGH (Period 1): Service class did not achieve its velocity goal during the
measurement intervals shown below.  The velocity goal was 50% execution velocity,
with an importance level of 2.  The '% USING' and '% TOTAL DELAY' percentages are
computed as a function of the average address space ACTIVE time.  The
'PRIMARY,SECONDARY CAUSES OF DELAY' are computed as a function of the execution
delay samples on the local system.

                         --------LOCAL SYSTEM--------
                             %   % TOTAL   EXEC  PERF   PLEX  PRIMARY,SECONDARY
MEASUREMENT INTERVAL     USING     DELAY  VELOC  INDX     PI  CAUSES OF DELAY
21:15-21:30,08SEP1998     16.6      83.4    17%  3.02   2.36  DASD DELAY(99%)

RULE WLM361: NON-PAGING DASD I/O ACTIVITY CAUSED SIGNIFICANT DELAYS

DB2HIGH (Period 1): A significant part of the delay to the service class can be
attributed to non-paging DASD I/O delay.  The data below shows intervals when
non-paging DASD delay caused DB2HIGH to miss its performance goal:

                         AVG DASD   AVG DASD   ---AVERAGE DASD I/O TIMES---
MEASUREMENT INTERVAL     I/O RATE  USING/SEC    RESP    WAIT    DISC    CONN
21:15-21:30,08SEP1998          31      1.405   0.010   0.003   0.004   0.002
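The WLM103 report above reflects standard WLM arithmetic: execution velocity is the using samples divided by using-plus-delay samples, and the performance index for a velocity goal is the goal velocity divided by the achieved velocity. The sketch below is not CPExpert code; the data set and variable names are illustrative, and the input values are taken from the sample report.

  /* Hedged sketch: execution velocity and performance index from
     using/delay percentages (values from RULE WLM103 above).        */
  data work.velocity;
     input interval :$12. using_pct delay_pct velgoal;
     exec_velocity = 100 * using_pct / (using_pct + delay_pct);
     perf_index    = velgoal / exec_velocity;  /* PI > 1.0 means the goal was missed */
     datalines;
  21:15-21:30 16.6 83.4 50
  ;
  run;

  proc print data=work.velocity noobs;
     title 'Execution velocity and performance index (values from RULE WLM103)';
  run;

With 16.6% using and 83.4% delay, the computed velocity is 16.6% and the performance index is 50 / 16.6, approximately 3.0, consistent with the 17% and 3.02 shown in the report after internal rounding.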
WLM Component - sample report

RULE WLM601: TRANSPORT CLASS MAY NEED TO BE SPLIT

You should consider whether the DEFAULT transport class should be split.  A large
percentage of the messages were too small, while a significant percentage of the
messages were too large.  Storage is wasted when buffers are used by messages that
are too small, while unnecessary overhead is incurred when XCF must expand the
buffers to fit a message.  The CLASSLEN parameter establishes the size of each
message buffer, and CLASSLEN was specified as 16,316 for this transport class.
This finding applies to the following RMF measurement intervals:

                         SENT     SMALL   MESSAGES   MESSAGES      TOTAL
MEASUREMENT INTERVAL       TO  MESSAGES   THAT FIT    TOO BIG   MESSAGES
10:00-10:30,26MAR1996     JA0     4,296          0         57      4,353
12:00-12:30,26MAR1996     Z0      2,653          6        762      3,421
12:30-13:00,26MAR1996     Z0      2,017          0        109      2,126

RULE WLM316: PEAK BLOCKED WORK WAS MORE THAN GUIDANCE

The SMF statistics showed that blocked workload waited longer than specified by the
BLWLINTHD parameter in IEAOPTxx.  At the peak, more than 2 address spaces and
enclaves were concurrently blocked during the interval.

                         BLWLINTHD   BLWLTRPCT   --BLOCKED WORKLOAD--
MEASUREMENT INTERVAL      IN IEAOPT   IN IEAOPT    AVERAGE       PEAK
 7:14- 7:29,01OCT2010            20           5      0.002         63
 7:29- 7:44,01OCT2010            20           5      0.000         22
 7:44- 7:59,01OCT2010            20           5      0.001         49
 7:59- 8:14,01OCT2010            20           5      0.001         63
 8:14- 8:29,01OCT2010            20           5      0.002         62

WLM Component - sample report

RULE WLM893: LOGICAL PROCESSORS IN LPAR HAD SKEWED ACCESS TO CAPACITY

LPAR SYSC: HiperDispatch was specified for one or more LPARs in this CPC, and at
least one LPAR used one or more high polarity central processors.  LPAR SYSC was
not operating in HiperDispatch Management Mode, and experienced skewed access to
physical processors because of the high polarity and medium polarity processors
used by the LPARs running in HiperDispatch Management Mode.  The information below
shows the number of logical processors assigned to LPAR SYSC and each logical
processor's share of a physical processor.  The CPU activity skew is shown for each
RMF interval, giving the minimum, average, and maximum CPU busy for the logical
processors assigned to LPAR SYSC.

                         LOGICAL CPUS   % PHYSICAL   -CPU ACTIVITY SKEW-
MEASUREMENT INTERVAL         ASSIGNED    CPU SHARE    MIN    AVG    MAX
13:59-14:14,15SEP2009               2         45.5   28.2   43.3   58.4

RULE WLM537: ZAAP-ELIGIBLE WORK HAD HIGH GOAL IMPORTANCE

Rule WLM530 or Rule WLM535 was produced for this system, indicating that a
relatively large amount of zAAP-eligible work was processed on a central processor.
One possible cause of this situation is that the zAAP-eligible work was assigned a
relatively high Goal Importance (either Importance 1 or Importance 2).  Please see
the discussion in the WLM Component User Manual for an explanation of this issue.

DB2 Component
• Analyzes standard DB2 interval statistics
• Applies analysis from the DB2 Administration Guide and DB2 Performance Guide (with DB2 9.1)
• Analyzes DB2 Versions 3, 4, 5, 6, 7, 8, and 9
• Evaluates overall DB2 constraints, buffer pools, the EDM pool, RID list processing, the Lock Manager, the Log Manager, DDF, and data sharing
• All analysis can be tailored to your site!  (A sketch of the general analysis pattern follows below.)
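To illustrate the pattern the bullets above describe, here is a minimal, hypothetical sketch (not CPExpert source code) of how a SAS/BASE batch step can read DB2 interval statistics from a performance data base and compare a counter against a site-tailorable guidance value.  The library, data set, and variable names (PDB, DB2STATS, PREF_DISABLED, INTERVAL_START) are illustrative assumptions.

  %let prefguide = 0;                  /* site-tailorable guidance value              */

  libname pdb 'your.performance.data.base';   /* MXG, SAS/ITRM, or MICS PDB           */

  data _null_;
     set pdb.db2stats;                  /* assumed DB2 interval statistics data set    */
     if pref_disabled > &prefguide then /* counter exceeds the site guidance?          */
        put 'FINDING: sequential prefetch was disabled '
            pref_disabled 'time(s) in the interval beginning '
            interval_start datetime19.;
  run;

Changing the %LET value is the kind of site tailoring the bullet refers to; CPExpert's actual guidance variables and data set names differ.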
DB2 Component
Typical DB2 local buffer constraints
• There might be insufficient buffers for work files
• There were insufficient buffers for work files in merge passes
• Buffer pool was full
• Hiperpool read requests failed (pages stolen by system)
• Hiperpool write requests failed (expanded storage not available)
• Buffer pool page fault rate was high
• Data Management Threshold (DMTH) was reached
• DWQT and VDWQT might be too large
• DWQT, VDWQT, or VPSEQT might be too small

DB2 Component
Typical DB2 I/O prefetch constraints
• Sequential prefetch was disabled, buffer shortage
• Sequential prefetch was disabled, unavailable read engine
• Sequential prefetch not scheduled, prefetch quantity = 0
• Synchronous read I/O and sequential prefetch was high
• Dynamic sequential prefetch was high (before DB2 8.1)
• Synchronous read I/O was high

DB2 Component
Typical DB2 parallel processing constraints
• Parallel groups fell back to sequential mode
• Parallel groups reduced due to buffer shortage
• Prefetch quantity reduced to one-half of normal
• Prefetch quantity reduced to one-quarter of normal
• Prefetch I/O streams were denied, shortage of buffers
• Page requested for a parallel query was unavailable

DB2 Component
Typical DB2 EDM pool constraints
• Failures were caused by a full EDM pool
• Low percent of DBDs found in EDM pool
• Low percent of CT Sections found in EDM pool
• Low percent of PT Sections found in EDM pool
• Size of EDM pool could be reduced
• Excessive Class 24 (EDM LRU) latch contention

DB2 Component
Typical DB2 Lock Manager constraints
• Work was suspended because of lock conflict
• Locks were escalated to shared mode
• Locks were escalated to exclusive mode
• Lock escalation was not effective
• Work was suspended for longer than the time-out value
• Deadlocks were detected

DB2 Component
Typical DB2 Log Manager constraints
• Archive log read allocations exceeded guidance
• Archive log write allocations exceeded guidance
• Waits were caused by unavailable output log buffer
• Log reads were satisfied from the active log data set
• Log reads were satisfied from the archive log data set
• Failed look-ahead tape mounts

DB2 Component
Typical DB2 Data Sharing constraints
• Group buffer pool is too small
• Incorrect directory entry/data entry ratio
• Directory reclaims resulting in cross-invalidations
• Castout processing occurring in “spurts”
• Excessive lock contention or false lock contention
• GBPCACHE ALL inappropriately specified
• GBPCACHE CHANGED inappropriately specified
• Conflicts between applications
DB2 Component - sample report

RULE DB2-208: VIRTUAL BUFFER POOL WAS FULL

Buffer Pool 2: A usable buffer could not be located in virtual Buffer Pool 2
because the virtual buffer pool was full.  This condition should not normally
occur, as there should be ample buffers.  You should consider using the
-ALTER BUFFERPOOL command to increase the virtual buffer pool size (VPSIZE) for
the virtual buffer pool.  This situation occurred during the intervals shown below:

                            BUFFERS   NUMBER OF TIMES
MEASUREMENT INTERVAL      ALLOCATED     POOL WAS FULL
10:54-11:24, 15SEP1999          100                12
11:24-11:54, 15SEP1999          100                13

RULE DB2-216: BUFFER POOLS MIGHT BE TOO LARGE

Buffer Pool 1: The page fault rates for read and write I/O indicated that the
buffer pools might be too large for the available processor storage.  This
situation occurred for Buffer Pool 1 during the intervals shown below:

                            BUFFERS   PAGE-IN FOR   PAGE-IN FOR    PAGE
MEASUREMENT INTERVAL      ALLOCATED      READ I/O     WRITE I/O    RATE
11:15-11:45, 16SEP1999       25,000        36,904           195    41.2
11:45-12:15, 16SEP1999       25,000        30,892           563    35.0
12:45-13:15, 16SEP1999       25,000        23,890           170    26.7

DB2 Component - sample report

RULE DB2-230: SEQUENTIAL PREFETCH WAS DISABLED - BUFFER SHORTAGE

Buffer Pool BP1: Sequential prefetch is disabled when there is a buffer shortage,
as controlled by the Sequential Prefetch Threshold (SPTH).  Ideally, sequential
prefetch should not be disabled, since performance is adversely affected.  If
sequential prefetch is disabled a large number of times, the buffer pool size might
be too small.  The sequential prefetch threshold was reached for Buffer Pool BP1
during the intervals shown below.

                            BUFFERS   TIMES SEQUENTIAL PREFETCH
MEASUREMENT INTERVAL      ALLOCATED   DISABLED (BUFFER SHORTAGE)
 5:00- 5:15, 15MAY2009      268,000            125   BP1
 5:15- 5:30, 15MAY2009      268,000          1,533   BP1

RULE DB2-234: WRITE ENGINES WERE NOT AVAILABLE FOR ASYNCHRONOUS I/O

Buffer Pool BP13: DB2 has 600 deferred write engines available for asynchronous I/O
operations.  When all 600 write engines are used, synchronous writes are performed.
The application is suspended during synchronous writes, and performance is
adversely affected.  This situation occurred for Buffer Pool BP13 during the
intervals shown below:

                            BUFFERS   TIMES WRITE ENGINES
MEASUREMENT INTERVAL      ALLOCATED    WERE NOT AVAILABLE
 5:45- 6:00, 15MAY2009       12,800             44   BP13

DB2 Component - sample report

RULE DB2-423: DATABASE ACCESS THREAD WAS QUEUED, ZPARM LIMIT WAS REACHED

Database access threads were queued because the ZPARM maximum for active remote
threads was reached.  You should consider increasing the maximum number of database
access threads allowed.  This situation occurred during the intervals shown below:

                          DATABASE ACCESS THREADS QUEUED,
MEASUREMENT INTERVAL               ZPARM LIMIT REACHED
11:24-11:54, 01OCT2010                               9

RULE DB2-512: LOG READS WERE SATISFIED FROM ACTIVE LOG DATA SET

The DB2 Log Manager statistics revealed that more than 25% of the log reads were
satisfied from the active log data set.  It is preferable that the data be in the
output buffer, but this is not always possible in an active DB2 environment.
However, if a large percent of reads are satisfied from the active log, you should
ensure that the output buffer is as large as possible.  This finding occurred
during the intervals shown below:

                            TOTAL   LOG READS FROM ACTIVE
MEASUREMENT INTERVAL    LOG READS            LOG DATA SET   PERCENT
14:24-14:54, 01OCT2010      6,554                   4,678      71.4
14:54-15:24, 01OCT2010      7,274                   3,695      50.8
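The percentage in the DB2-512 report is simply the ratio of log reads satisfied from the active log to total log reads, compared against the 25% guidance.  A tiny sketch of that arithmetic, using the values from the first interval shown above (illustrative names, not CPExpert code):

  /* Worked check of the DB2-512 percentage (illustrative names and literals). */
  data _null_;
     total_reads  = 6554;                              /* total log reads       */
     active_reads = 4678;                              /* reads from active log */
     pct_active   = 100 * active_reads / total_reads;  /* = 71.4, as reported   */
     if pct_active > 25 then
        put 'Log reads from the active log exceed the 25% guidance: '
            pct_active 5.1 '%';
  run;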
DB2 Component - sample report

RULE DB2-601: COUPLING FACILITY READ REQUESTS COULD NOT COMPLETE

Group Buffer Pool 6: Coupling facility read requests could not be completed because
of a lack of coupling facility storage resources.  This situation occurred for
Group Buffer Pool 6 during the intervals shown below:

                          GROUP BUFFER POOL   TIMES CF READ REQUESTS
MEASUREMENT INTERVAL         ALLOCATED SIZE             NOT COMPLETE
11:01-11:31, 14OCT1999                  38M                      130

RULE DB2-610: GBPCACHE(NO) OR GBPCACHE NONE MIGHT BE APPROPRIATE

Group Buffer Pool 4: This buffer pool had a very small amount of read activity
relative to write activity.  Pages read were less than 1% of the pages written.
Since so few pages were read from this group buffer pool, you should consider
specifying GBPCACHE(NO) for the group buffer pool or specifying GBPCACHE NONE for
the page sets using the group buffer pool.  This situation occurred for Group
Buffer Pool 4 during the intervals shown below:

                          GROUP BUFFER POOL    PAGES     PAGES      READ
MEASUREMENT INTERVAL         ALLOCATED SIZE     READ   WRITTEN   PERCENT
10:34-11:04, 14OCT1999                  38M       14    18,268     0.07%

CICS Component
• Processes CICS Interval Statistics contained in the MXG Performance Data Base (standard SMF 110)
• Analyzes all releases of CICS (CICS/ESA, CICS/TS for OS/390, and CICS/TS for z/OS)
• Applies most analysis techniques contained in IBM's CICS Performance Guides
• Produces specific suggestions for improving CICS performance

CICS Component (Major areas analyzed)
• Virtual and real storage (MXT/AMXT/TCLASS)
• VSAM and File Control (NSR and LSR pools)
• Database management (DL/I, IMS, DB2)
• Journaling (System and User journals)
• Network and VTAM (RAPOOL, RAMAX)
• CICS Facilities (temp storage, transient data)
• ISC/IRC (MRO, LU6.1, LU6.2 modegroups)
• System logger
• Temporary Storage
• Coupling Facility Data Tables (CFDT)
• CICS-DB2 Interface
• Open TCB pools
• TCP/IP and SSL

CICS Component - sample report

RULE CIC101: CICS REACHED MAXIMUM TASKS TOO OFTEN

The CICS statistics revealed that the number of attached tasks was restricted by
the MXT operand, but storage did not appear to be constrained.  CPExpert suggests
that you consider increasing the MXT value in the System Initialization Table (SIT)
for this region.  This finding applies to the following CICS statistics intervals:

STATISTICS                                           TIMES      PEAK       TIME
COLLECTION                  MXT   -PEAK TASKS-     MAXTASK   MAXTASK    WAITING
TIME             APPLID   VALUE   TOTAL   USER     REACHED     QUEUE    MAXTASK
0:00,01OCT2010   CICSIDG     20      46     20          36         8  0:02:29.0

RULE CIC140: THE NUMBER OF TRANSACTION ERRORS IS HIGH

The CICS statistics revealed that more than 5 transaction errors were related to
terminals.  These transaction errors may indicate an attempted security breach,
there may be problems with the terminal, or perhaps additional operator training is
indicated.  This finding applies to the following CICS statistics intervals:

STATISTICS                                     NUMBER
COLLECTION TIME    APPLID      TERMINAL     OF ERRORS
0:00,01OCT2010     CICSPROD    T2M1               348
0:00,01OCT2010     CICSPROD    T2M2                60
0:00,01OCT2010     CICSPROD    T2M6               348
CICS Component - sample report

RULE CIC170: MORE THAN ONE STRING SPECIFIED FOR WRITE-ONLY ESDS FILE

More than one string was specified for a VSAM ESDS file that was used exclusively
for write operations.  Specifying more than one string can significantly affect
performance because of the exclusive control conflicts that can occur.  If this
finding occurs for all normal CICS processing, you should consider specifying only
one string in the ESDS file definition.

STATISTICS                             VSAM     NUMBER OF WRITE
COLLECTION TIME    APPLID              FILE          OPERATIONS
0:00,16MAR2010     CICSYA          LNTEMSTR             431,436

RULE CIC267: INSUFFICIENT SESSIONS MAY HAVE BEEN DEFINED

CPExpert believes that an insufficient number of sessions may have been defined for
the CICS DAL1 connection, or the application system could have been issuing
ALLOCATE requests too often.  The number of ALLOCATE requests returned was greater
than the value specified for the ALLOCQ guidance variable in USOURCE(CICGUIDE).
CPExpert suggests that you consider increasing the number of sessions defined for
the connection, or that you increase the ALLOCQ guidance variable so that CPExpert
signals a potential problem only when you view the problem as serious.  For APPC
modegroups, this finding applies only to generic ALLOCATE requests.  This finding
applies to the following CICS statistics intervals:

STATISTICS                        ALLOCATE REQUESTS
COLLECTION TIME    APPLID         RETURNED TO USERS
10:00,26MAR2008    CICSDTL1                     335
11:00,26MAR2008    CICSDTL1                      12
12:00,26MAR2008    CICSDTL1                      27

CICS Component - sample report

RULE CIC307: FREQUENT LOG STREAM DASD-SHIFTS OCCURRED

CICS75.A075CICS.DFHLOG: More than 1 log stream DASD-shift was initiated for this
log stream during the intervals shown below.  A DASD-shift event occurs when the
system logger determines that a log stream must stop writing to one log data set
and start writing to a different data set.  You normally should allocate
sufficiently large log data sets so that a DASD-shift occurs infrequently.

                   ------NUMBER OF DASD LOG SHIFTS------
SMF INTERVAL        DURING INTERVAL     DURING PAST HOUR
14:45,16MAR2010                   1                    2

RULE CIC650: CICS EVENT PROCESSING WAS DISABLED IN CICS EVENTBINDING

Event Processing was disabled in the EVENTBINDING, with the result that events
defined in the EVENTBINDING were not captured by CICS Event Processing.  You should
investigate the Event Binding to determine whether the Binding should be enabled or
disabled for the region.  This finding applies to the following CICS statistics
intervals:

STATISTICS COLLECTION TIME
0:00,12MAR2009
3:00,12MAR2009
6:00,12MAR2009
DASD Component
• Processes SMF Type 70 (series) records to automatically build a model of your I/O configuration.
• Identifies performance problems with the devices that have the most potential for improvement:
  • PEND delays
  • Disconnect delays
  • Connect delays
  • IOSQ delays
  • Shared DASD conflicts
• Analyzes SMF Type 42 (DS) and Type 64 records to identify VSAM performance problems.

DASD Component - sample report

RULE DAS100: VOLUME WITH WORST OVERALL PERFORMANCE

VOLSER DB2327 (device 2A1F) had the worst overall performance during the entire
measurement period (10:00, 16FEB2001 to 11:00, 16FEB2001).  This volume had an
overall average of 56.8 I/O operations per second, was busy processing I/O for an
average of 361% of the time, and had I/O operations queued for an average of 1% of
the time.  Please note that percentages greater than 100% and Average Per Second
Delays greater than 1 indicate that multiple I/O operations were concurrently
delayed.  This can happen, for example, if multiple I/O operations were queued or
if multiple I/O operations were PENDing.  The following summarizes significant
performance characteristics of VOLSER DB2327:

                          I/O    ----AVERAGE PER SECOND DELAYS----    MAJOR
MEASUREMENT INTERVAL     RATE     RESP   CONN   DISC   PEND   IOSQ    PROBLEM
10:00-10:30,16FEB2001    59.1    1.308  0.316  0.004  0.988  0.000    PEND TIME
10:30-11:00,16FEB2001    57.2    3.792  0.300  0.004  3.483  0.006    PEND TIME
11:00-11:30,16FEB2001    54.2    5.769  0.279  0.004  5.464  0.023    PEND TIME

DASD Component - sample report

RULE DAS130: PEND TIME WAS MAJOR CAUSE OF I/O DELAY

A major cause of the I/O delay with VOLSER DB2327 was PEND time.  The average
per-second PEND delay for I/O is shown below:

                          PEND   PEND DIR     PEND     PEND    PEND    TOTAL
MEASUREMENT INTERVAL      CHAN       PORT  CONTROL   DEVICE   OTHER     PEND
10:00-10:30,16FEB2001    0.492      0.000    0.000    0.000   0.495    0.988
10:30-11:00,16FEB2001    1.927      0.000    0.000    0.000   1.556    3.483
11:00-11:30,16FEB2001    2.840      0.000    0.000    0.000   2.624    5.464

RULE DAS160: DISCONNECT TIME WAS MAJOR CAUSE OF I/O DELAY

A major cause of the I/O delay with VOLSER DB26380 was DISCONNECT time.  DISC time
for modern systems is a result of cache read miss operations, potentially back-end
staging delay for cache write operations, peer-to-peer remote copy (PPRC)
operations, and other miscellaneous reasons.

                                         -PERCENT CACHE-
                                           READ    WRITE   DASD TO    CACHE
MEASUREMENT INTERVAL     READS   WRITES    HITS     HITS     CACHE   TO DASD   PPRC   BPCR   ICLR
 8:30- 8:45,22OCT2001    14615      932    19.2    100.0     11825       903      0      0      0
 8:45- 9:00,22OCT2001    14570      921    20.7    100.0     11567       907      0      0      0
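The DAS100 columns are related by simple addition: the average per-second response-time delay is the sum of the CONN, DISC, PEND, and IOSQ components, and the largest component is reported as the major problem.  The sketch below is illustrative only (not CPExpert code) and reproduces that arithmetic with the values from the sample report.

  /* Illustrative sketch: decompose DASD delay and flag the dominant component. */
  data work.dasd;
     length major $4;
     input interval :$12. conn disc pend iosq;
     resp  = conn + disc + pend + iosq;   /* e.g. 0.316+0.004+0.988+0.000 = 1.308 */
     major = 'CONN';
     if disc > conn                  then major = 'DISC';
     if pend > max(conn, disc)       then major = 'PEND';
     if iosq > max(conn, disc, pend) then major = 'IOSQ';
     datalines;
  10:00-10:30 0.316 0.004 0.988 0.000
  10:30-11:00 0.300 0.004 3.483 0.006
  11:00-11:30 0.279 0.004 5.464 0.023
  ;
  run;

  proc print data=work.dasd noobs;
     title 'Response-time delay components (values from RULE DAS100)';
  run;

For the first interval, 0.316 + 0.004 + 0.988 + 0.000 = 1.308, and PEND is the dominant component, matching the 'PEND TIME' major problem shown in the report.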
DASD Component - sample report

RULE DAS300: PERHAPS SHARED DASD CONFLICTS CAUSED PERFORMANCE PROBLEMS

Accessing conflicts caused by sharing VOLSER DB2700 between systems might have
caused performance problems for the device during the measurement intervals shown
below.  Conflicting systems had the indicated I/O rate, average CONN time per
second, average DISC time per second, average PEND time per second, and average
RESERVE time to the device.  Even moderate CONN, DISC, or RESERVE can cause delays
to shared devices.

                         I/O    MAJOR      OTHER    --------OTHER SYSTEM DATA--------
MEASUREMENT INTERVAL     RATE   PROBLEM    SYSTEM   I/O RATE    CONN    DISC    PEND    RESV
 8:30- 8:45,22OCT2001    31.3   QUEUING    SY1          35.0   0.041   0.001   0.455   0.000
                                           SY2          88.2   0.100   0.003   0.714   0.000
                                           SY3         109.0   0.123   0.003   0.723   0.000
                                           TOTAL       232.2   0.264   0.006   1.892   0.000
 8:45- 9:00,22OCT2001    25.7   QUEUING    SY1          46.4   0.054   0.001   0.565   0.000
                                           SY2          98.2   0.112   0.003   0.836   0.000
                                           SY3         119.0   0.136   0.003   0.846   0.000
                                           TOTAL       263.5   0.303   0.007   2.247   0.000

DASD Component - sample report

RULE DAS607: VSAM DATA SET IS CLOSE TO MAXIMUM NUMBER OF EXTENTS

VOLSER RLS003: More than 225 extents were allocated for the VSAM data sets listed
below.  The VSAM data sets are approaching the maximum number of extents allowed.
The data below shows the number of extents and the primary and secondary space
allocations:

SMF TIME                                                      TOTAL     EXTENTS   ---ALLOCATIONS---
STAMP             JOB NAME   VSAM DATA SET                  EXTENTS   THIS OPEN   PRIMARY  SECONDARY
10:30,11MAR2002   CICS2ABA   RLSADSW.VF01D.DATAENDB.DATA        229           4    30 CYL      1 CYL

RULE DAS625: NSR WAS USED, BUT LARGE PERCENT OF ACCESS WAS DIRECT

VOLSER MVS902: Non-Shared Resources (NSR) was specified as the buffering technique
for the VSAM data sets below, but more than 75% of the I/O activity was direct
access.  NSR is not designed for direct access, and many of the advantages of NSR
are not available for direct access.  You should consider Local Shared Resources
(LSR) for these VSAM data sets (perhaps using System Managed Buffers to facilitate
the use of LSR).  The I/O RATE is for the time the data set was open.  The SMF TIME
STAMP and JOB NAME are from the last record for the data set.

SMF TIME                                                         I/O       OPEN   -ACCESS TYPE (PCT)-
STAMP             JOB NAME   VSAM DATA SET                      RATE   DURATION   SEQUENTIAL   DIRECT
13:19,19SEP2002   NRXX807    SDPDPA.PK.MVSP.RT.NDMGIX.DATA       8.4    0:07:08          0.0    100.0
13:19,19SEP2002   NRXX807    SDPDPA.PR.MVSP.RT.NDMGIXD.DATA     11.2    0:06:42          0.0    100.0
13:33,19SEP2002   TSJHM      SDPDPA.PR.MVSP.RT.NDMRQFDA.DATA     0.3    2:21:58          0.0    100.0
13:33,19SEP2002   TSJHM      SDPDPA.PR.MVSP.RT.NDMRQF.DATA       2.8    3:37:53          0.0    100.0
13:33,19SEP2002   TSJHM      SDPDPA.PK.MVSP.RT.NDMTCF.DATA      11.1    6:24:10          0.1     99.9

DASD Component (Application Analysis)
• Requires a simple modification to MXG or MICS
• The modification collects job step data while processing SMF Type 30 (Interval) records
• Typically requires less than 10 cylinders
• Data is correlated with Type 74 information
• CPExpert associates performance problems with specific applications (jobs and job steps)
• CPExpert can perform “Loved one” analysis of DASD performance problems

WMQ Component
Analyzes SMF Type 115 statistics, as processed by MXG or MICS and placed into the
performance data base (a processing sketch follows below):
• MQMLOG   - Log Manager statistics
• MQMMSGDM - Message/Data Manager statistics
• MQMBUFER - Buffer Manager statistics
• MQMCFMGR - Coupling Facility Manager statistics
Type 115 records should be synchronized with the SMF interval recording interval.
IBM says the overhead to collect statistics data is negligible.
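As a hypothetical illustration of that flow (not CPExpert source code), the SAS/BASE steps below read the MQMBUFER buffer manager data set named on the slide above from the performance data base and summarize page-set I/O by buffer pool.  The library name and the variable names (BUFFPOOL, PAGES_WRITTEN, PAGES_READ) are assumptions; actual MXG/MICS variable names differ by version.

  /* Hypothetical summary of WebSphere MQ buffer manager (SMF 115) data.       */
  libname pdb 'your.performance.data.base';    /* MXG, SAS/ITRM, or MICS PDB   */

  proc means data=pdb.mqmbufer noprint nway;   /* MQMBUFER per the slide above */
     class buffpool;                           /* assumed buffer pool variable */
     var pages_written pages_read;             /* assumed I/O counters         */
     output out=work.bpool_summary sum=;
  run;

  proc print data=work.bpool_summary noobs;
     title 'WebSphere MQ page-set I/O summarized by buffer pool';
  run;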
WMQ Component
Optionally analyzes SMF Type 116 accounting data, as processed by MXG or MICS and
placed into the performance data base:
• MQMACCTQ - Thread-level accounting data
• MQMQUEUE - Queue-level accounting data
Type 116 records should be synchronized with the SMF interval recording interval.
IBM says the overhead to collect accounting data is 5-10%.

WebSphere MQ
Typical queue manager problems
• Assignment of queues to page sets
• Assignment of page sets to buffer pools
• Queue manager parameters
• Index characteristics of queues
• Characteristics of messages in queues
• Characteristics of MQ calls
CPExpert analysis uses SMF Type 116 records.

WebSphere MQ
Typical buffer manager problems
• Buffer thresholds exceeded for pool
• Buffers assigned per pool (too few/too many)
• Message traffic
• Message characteristics
• Application design
CPExpert analysis uses SMF Type 115 records.

WebSphere MQ
Typical log manager problems
• Log buffers assigned
• Active log use characteristics
• Archive log use characteristics
• Tasks backing out
• System paging of log buffers
• Excessive checkpoints taken
CPExpert analysis uses SMF Type 115 records.

WebSphere MQ
Typical DB2-interface problems
• Thread delays
• DB2 server processing delays
• Server requests queued
• Server tasks experienced ABENDs
• Deadlocks in DB2
• Maximum request queue depth was too large
CPExpert analysis uses SMF Type 115 records.

WebSphere MQ
Typical shared queue problems
• Structure was full
• Large number of application structures defined
• MINSIZE is less than SIZE for CSQ.ADMIN
• SIZE is more than double MINSIZE
• ALLOWAUTOALT(YES) not specified
• FULLTHRESHOLD value might be incorrect
CPExpert analysis uses SMF Type 115 records and Type 74 (Coupling Facility) records.

WebSphere MQ – sample report

RULE WMQ100: MESSAGES WERE WRITTEN TO PAGE SET ZERO

More than 0 messages were written to Page Set Zero during the intervals shown
below.  Messages should not be written to Page Set Zero, since serious WebSphere MQ
system problems could occur if Page Set Zero should become full.  This finding
relates to queue SYSTEM.COMMAND.INPUT.

                          MESSAGES WRITTEN TO
STATISTICS INTERVAL             PAGE SET ZERO
13:16-14:45, 28AUG2003                    624

RULE WMQ122: DEAD.LETTER QUEUE IS INAPPROPRIATE FOR PAGE SET ZERO

Buffer Pool 0: The DEAD.LETTER queue was assigned to Page Set Zero.  A dead-letter
queue stores messages that cannot be routed to their correct destinations.  If the
DEAD.LETTER queue grows large unexpectedly, Page Set Zero can become full, and
WebSphere MQ can enter a serious stress condition.  You should redefine the
DEAD.LETTER queue to a page set other than Page Set Zero.  This finding relates to
queue SYSTEM.DEAD.LETTER.QUEUE.
WebSphere MQ – sample report

RULE WMQ110: EXPYRINT VALUE IS OFF OR TOO SMALL

Buffer Pool 3: There were more than 25 expired messages skipped when scanning a
queue for a specific message.  Processing expired messages adds both CPU time and
elapsed time to the message processing.  With WebSphere MQ 5.3, the EXPYRINT
keyword was introduced to allow the queue manager to automatically determine
whether queues contained expired messages and to eliminate expired messages at the
interval specified by the EXPYRINT value.  This finding applies to queue
DPS.REPLYTO.RCB.IVR04.

                             GET       BROWSE   EXPIRED MESSAGES
STATISTICS INTERVAL     SPECIFIC     SPECIFIC          PROCESSED
13:41-13:41, 03JUL2003         0            0                313

RULE WMQ320: APPLICATIONS WERE SUSPENDED FOR LOG WRITE BUFFERS

Applications were suspended while in-storage log buffers were being written to the
active log.  This finding normally means that too few log buffers were assigned.
However, the finding could mean that there is an I/O configuration problem and the
log buffer writes to the active log are delayed for I/O reasons.  This finding
applies to the following statistics intervals.

                            NUMBER OF SUSPENSIONS
STATISTICS INTERVAL     WAITING ON OUTPUT BUFFERS
14:19-14:44, 12SEP2003                        139

WebSphere MQ – sample report

RULE WMQ201: BUFFER POOL ENCOUNTERED SYNCHRONOUS (5%) THRESHOLD

Buffer Pool 0: This buffer pool encountered the Synchronous Write threshold (less
than 5% of the pages in the buffer pool were "stealable", or more than 95% of the
pages were on the Deferred Write queue).  While the Synchronous Page Writer is
executing, updates to any page cause the page to be written immediately to the page
set (the page is not placed on the Deferred Write Queue, but is written immediately
to the page set as a synchronous write operation).  This situation harms the
performance of applications, and is an indicator that the buffer pool is in danger
of encountering a Short on Storage condition.

                           BUFFERS      TIMES AT    IMMEDIATE
STATISTICS INTERVAL       ASSIGNED  5% THRESHOLD       WRITES
17:08-17:09, 07OCT2003       1,050            19           19

RULE WMQ205: HIGH I/O RATE TO PAGE SETS WITH SHORT-LIVED MESSAGES

Buffer Pool 0: This buffer pool had short-lived messages assigned.  The total I/O
rate (read and write activity) to page sets for the short-lived messages was more
than 0.5 pages per second.  Writing pages to the page set and subsequently reading
the pages from the page set cause I/O overhead and delay to the application.  This
finding applies to the following intervals:

                           BUFFERS      PAGES     PAGES    I/O RATE
STATISTICS INTERVAL       ASSIGNED    WRITTEN      READ   WITH DASD
11:32-11:32, 24JUL2006      50,000        101         0        50.5

WebSphere MQ – sample report

RULE WMQ300: ARCHIVE LOGS WERE USED FOR BACKOUT

WebSphere MQ applications issued log reads to the archive log file for backout more
than 0 times during the WebSphere MQ statistics intervals shown below.  Most log
read requests should come from the output buffer or the active log.  Using archive
logs for backout purposes often indicates that either the active log files were too
small or long-running applications were backing out work.

                          NUMBER OF LOG READS
STATISTICS INTERVAL          FROM ARCHIVE LOG
 4:30- 5:00, 12SEP2003                    192

RULE WMQ611: LARGE NUMBER OF APPLICATION STRUCTURES WERE DEFINED

SMF Type 74 (Structure) statistics showed that more than 5 application structures
were defined to a coupling facility.  IBM suggests that you should have as few
application structures as possible.  Having multiple application structures in a
coupling facility can degrade performance.

COUPLING        WEBSPHERE MQ
FACILITY    STRUCTURES DEFINED
CF1                          8
CF2                          9
CF3                          8
CPExpert Release 18.1 (Issued April 2008)
Major enhancements with this update:
• Provided support for the z10 server
• Provided analysis of HiperDispatch problems
• Provided new reports to help analysis of DB2 buffer pool problems
• Expanded the CPExpert email feature to the DASD Component
• Provided additional analysis features for the WebSphere MQ Component

CPExpert Release 18.2 (Issued October 2008)
Major enhancements with this update:
• Provided support for z/OS Version 1, Release 10
• Provided additional analysis of z/OS performance problems (in the WLM Component), including reduced CPU speed caused by cooling unit failure
• Provided new reporting of rules based on History information kept by CPExpert (applies to all components except the DB2 Component)
• Added a masking technique to select CICS regions (by region Group), DASD volumes (including SMS Storage Groups), and WebSphere MQ subsystems

CPExpert Release 19.1 (Issued April 2009)
Major enhancements with this update:
• Enhanced the WLM Component with analysis of more z/OS performance problems, including Enqueue Promoted Dispatching Priority analysis
• Projected the amount of zAAP-eligible work that could be offloaded to a zAAP processor, if a zAAP processor were assigned to the LPAR
• Provided more analysis of CICS temporary storage in the CICS Component
• Added Resource Enqueue analysis to the DASD Component

CPExpert Release 19.2 (Issued October 2009)
Major enhancements with this update:
• Provided support for z/OS Version 1, Release 11
• Provided support for CICS/TS Release 4.1
• Added analysis of Resource Enqueue contention between different levels of Goal Importance to the WLM Component
• Added analysis of CICS Event Processing to the CICS Component (applicable to CICS/TS 4.1)
• Allowed users to specify narrative descriptions of individual DB2 buffer pools in CPExpert reports
CPExpert Release 20.1 (Issued April 2010)
Major enhancements with this update:
• Enhanced the WLM Component with analysis of SMF buffer specifications and other SMF performance constraints
• Supported analysis of VSAM performance problems when analyzing a MICS performance data base but using the MXG TYPE42DS and MXG TYPE64 files
• Allowed selection of up to 20 unique DB2 subsystems while analyzing performance problems with DB2 subsystems, and added logic to handle the case where an installation has multiple identical DB2 subsystem names defined in z/OS images

CPExpert Release 20.2 (Issued October 2010)
Major enhancements with this update:
• Provided support for z/OS Version 1, Release 12
• Provided support for the zEnterprise System (z196)
• Enhanced the WLM Component to provide analysis of dropped SMF records and of the SMF flood facility (available with z/OS V1R12)
• Enhanced the WLM Component to provide a Management Overview of CPExpert findings, with web-enabled documentation links
• Enhanced the WebSphere MQ Component to provide analysis of a non-indexed request/reply-to queue

License fees (Site license)

Component                     First year   Additional year
WLM Component                      7,500             5,000
DB2 Component                      7,500             5,000
CICS Component (see note)          5,000             3,000
WMQ Component                      5,000             3,000
DASD Component                     3,000             1,500

Note: Fees shown for the CICS Component are for analyzing no more than 50 CICS regions.

Summary
• The major objective is to share solutions and provide insight into new z/OS features.
• CPExpert is updated every six months; support for new versions of z/OS has been available within 30 days after General Availability of the new z/OS release.
• CPExpert is offered at a low cost (affordable by all z/OS shops).
• A 45-day no-obligation trial is available (see license agreement for details).
• A free no-obligation performance analysis is available.

For more information, please contact:

Don Deese
Computer Management Sciences, Inc.
634 Lakeview Drive
Hartfield, VA 23071-3113
Phone: (804) 776-7109
Fax:   (804) 776-7139
Email: Don_Deese@cpexpert.com

Visit www.cpexpert.com for more information, to review sample output, to review
documentation in SAS ODS “point-and-click” format, to download license agreements
in .pdf “form” mode, etc.

©Copyright 1998-2010, Computer Management Sciences, Inc., Alexandria, VA   www.cpexpert.com