Deep Understanding of DB2 Snapshot Monitoring and Administrative Views
Sharad D. Pawar
ACI Payment system

Agenda
1. How to capture DB2 snapshots
2. Database snapshot interpretation
3. Snapshot analysis
4. DB2 administrative views

How to Capture DB2 Snapshots
Get the snapshots for the scenario from production.
• Verify that the monitor switches are turned on:
• db2 "get monitor switches"
• db2 "get dbm cfg" | grep -i "DFT_MON"
• If any of the DFT_MON switches need to be changed, an instance restart may sometimes be required for the change to take effect.

How to Capture DB2 Snapshots
The database manager (DBM) configuration for the monitor switches should be as follows:
Buffer pool  (DFT_MON_BUFPOOL)   = ON
Lock         (DFT_MON_LOCK)      = ON
Sort         (DFT_MON_SORT)      = ON
Statement    (DFT_MON_STMT)      = ON
Table        (DFT_MON_TABLE)     = ON
Timestamp    (DFT_MON_TIMESTAMP) = ON
Unit of work (DFT_MON_UOW)       = ON

How to Capture DB2 Snapshots
Basic command to get the snapshots:
db2 "get snapshot for all on <DBNAME>" > outfile.txt

Full syntax:
GET SNAPSHOT FOR
  {DATABASE MANAGER
   | ALL [DCS] DATABASES
   | ALL [DCS] APPLICATIONS
   | ALL BUFFERPOOLS
   | [DCS] APPLICATION {APPLID appl-id | AGENTID appl-handle}
   | FCM FOR ALL DBPARTITIONNUMS
   | LOCKS FOR APPLICATION {APPLID appl-id | AGENTID appl-handle}
   | {ALL | [DCS] DATABASE | [DCS] APPLICATIONS | TABLES | TABLESPACES
      | LOCKS | BUFFERPOOLS | DYNAMIC SQL [WRITE TO FILE]} ON database-alias}
  [AT DBPARTITIONNUM db-partition-number | GLOBAL]

How to Capture DB2 Snapshots
Snapshots can be collected at the following levels:
• Database
• Tables
• Tablespaces
• Locks
• Bufferpools
• Applications
• Dynamic SQL

Database Snapshot Interpretation
Snapshots can be interpreted in the following ways:
• The general health of the database can be determined from the database snapshot.
• The same counters can also be examined at the application level to isolate the behaviour of individual applications.

Snapshot Analysis
The following metrics are used to measure database performance:
• ARSS (Average Result Set Size)
• IRE (Index Read Efficiency)
• SRP (Synchronous Read Percentage)
• TXCNT (Number of Transactions Completed)
• SELTX (Number of Selects per Transaction)
• DMLTX (DML per Transaction)
• SRTX (Sorts per Transaction)
• SOPT (Sort Overflows per Transaction)
• RRTX (Rows Read per Transaction)
• FETTX (Rows Fetched per Transaction)
• BPLRTX (Bufferpool Logical Reads per Transaction)
• BPLITX (Bufferpool Logical Index Reads per Transaction)

Snapshot Analysis
ARSS (Average Result Set Size)
Rule of thumb:
• If ARSS is less than or equal to 10, the database is behaving like an OLTP database.
• If ARSS is greater than 10, the database is behaving like a data warehouse database.
• If ARSS is only slightly greater than 10, you may have an OLTP database with some concurrent decision-support (DW) queries running.
How to calculate ARSS:
ARSS = ROWS_SELECTED / SELECT_SQL_STMTS

Snapshot Analysis
IRE (Index Read Efficiency)
Rule of thumb:
• If IRE is less than or equal to 10, this is desirable for an OLTP database.
• For an OLTP database, an Index Read Efficiency ratio of ten or higher is cause for concern. It indicates that indexes providing sufficient filtration quality are not available, and DB2 may be performing scans or using inefficient indexes as a poor substitute.
• The IRE for this database was 180, meaning that DB2 picks up and evaluates 180 rows, on average, to return just one row. This is indicative of scans, and scans in an OLTP database are bad.
How to calculate IRE:
IRE = Rows read / Rows selected
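ARSS and IRE can also be computed directly with SQL instead of reading the snapshot text. The sketch below assumes the SYSIBMADM.SNAPDB administrative view (DB2 9.x), which exposes the same monitor elements as the database snapshot; the column names (ROWS_READ, ROWS_SELECTED, SELECT_SQL_STMTS) are assumed here and should be verified against your release.

-- Sketch only: ARSS and IRE from the SYSIBMADM.SNAPDB snapshot view (column names assumed)
SELECT DB_NAME,
       -- ARSS: average rows returned per SELECT statement
       DECIMAL(ROWS_SELECTED, 18, 2) / NULLIF(SELECT_SQL_STMTS, 0) AS ARSS,
       -- IRE: rows read (evaluated) per row actually selected
       DECIMAL(ROWS_READ, 18, 2) / NULLIF(ROWS_SELECTED, 0)        AS IRE
FROM SYSIBMADM.SNAPDB

NULLIF guards against division by zero: on an idle database the result is NULL rather than an error.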
Snapshot Analysis
SRP (Synchronous Read Percentage)
SRP guidelines for OLTP databases (rule of thumb):
• SRP should be greater than 90% to make good use of high-quality synchronous I/O to retrieve the required result sets.
• If the SRP is in the range of 80-90%, this is good, but there may be tuning opportunities for improvement.
• If the SRP is in the range of 50-80%, the database's performance may be marginal at best; there are definitely physical design opportunities for improvement.
• If the SRP is less than 50%, this is highly undesirable; the DBA will have plenty of opportunity to improve performance.

Snapshot Analysis
SRP (Synchronous Read Percentage)
How to calculate SRP:
SRP = 100 - (((Asynchronous pool data page reads + Asynchronous pool index page reads) * 100) / (Buffer pool data physical reads + Buffer pool index physical reads))

Snapshot Analysis
SRP (Synchronous Read Percentage)
SRP guidelines for a DW database (rule of thumb):
• If the SRP is greater than 50%, the database is performing very well, as data warehouse queries tend to do a significant amount of data scanning and return larger result sets.
• If the SRP is in the range of 25-50%, this is good, but there may be tuning opportunities.
• If the SRP is less than 20%, the database has a lot of tuning opportunities.

Snapshot Analysis
TXCNT (Number of Transactions Completed)
TXCNT is the sum of the number of commit statements and rollback statements attempted.
How to calculate TXCNT:
TXCNT = Commit statements attempted + Rollback statements attempted

Snapshot Analysis
SELTX (Number of Selects per Transaction)
Rule of thumb: for an OLTP database, the typical range is 3-15. SELTX indicates how much data-retrieval work is being done for each transaction. A value less than 10 is common and desirable.
How to calculate SELTX:
SELTX = Select SQL statements executed / TXCNT

Snapshot Analysis
DMLTX (DML per Transaction)
Rule of thumb: values within the worldwide normal range of 0.5 to 4 are generally acceptable. Typically 3-4 select statements are accompanied by 1-2 insert/update/delete statements, on average; this is good because the units of work are small. DMLTX indicates how much data-change activity is being performed for each transaction. A value less than 4 is common and desirable. As DMLTX increases, so does the need to increase the DB CFG parameter LOGBUFSZ, and the risk of lock contention also increases.
How to calculate DMLTX:
DMLTX = Update/Insert/Delete statements executed / TXCNT

Snapshot Analysis
SRTX (Sorts per Transaction)
Rule of thumb: removing sorts from your transactions will measurably improve transaction response times and lower CPU consumption.
How to calculate SRTX:
SRTX = Total sorts / TXCNT

Snapshot Analysis
SOPT (Sort Overflows per Transaction)
Rule of thumb: sorts consume CPU cycles, and overflows should be kept low for each transaction. Ideally this value should be less than 1 or 2.
How to calculate SOPT:
SOPT = (Sort overflows * 100) / Total sorts
(As written, this formula expresses overflows as a percentage of total sorts; overflows per transaction can also be computed as Sort overflows / TXCNT.)

Snapshot Analysis
RRTX (Rows Read per Transaction)
Rule of thumb: rows read per database transaction should be low (less than about 10 for an OLTP workload). A high number of rows read per transaction causes high CPU consumption on the database server.
How to calculate RRTX:
RRTX = Rows read / TXCNT

Snapshot Analysis
FETTX (Rows Fetched per Transaction)
Rule of thumb: rows fetched per database transaction should likewise be low; fetching more rows per transaction causes higher CPU consumption on the database server.
How to calculate FETTX:
FETTX = Rows selected / TXCNT (where TXCNT = Commit statements attempted + Rollback statements attempted)
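All of the per-transaction metrics above share the same denominator, TXCNT. As a hedged illustration, the sketch below computes TXCNT, SELTX, DMLTX, SRTX, RRTX, FETTX, and SRP in a single pass against the SYSIBMADM.SNAPDB view; the column names (COMMIT_SQL_STMTS, ROLLBACK_SQL_STMTS, UID_SQL_STMTS, TOTAL_SORTS, POOL_ASYNC_*_READS, POOL_*_P_READS) are assumed and may differ by release.

-- Sketch only: per-transaction metrics from SYSIBMADM.SNAPDB (column names assumed)
SELECT DB_NAME,
       COMMIT_SQL_STMTS + ROLLBACK_SQL_STMTS AS TXCNT,
       DECIMAL(SELECT_SQL_STMTS, 18, 2)
         / NULLIF(COMMIT_SQL_STMTS + ROLLBACK_SQL_STMTS, 0) AS SELTX,
       DECIMAL(UID_SQL_STMTS, 18, 2)                          -- update/insert/delete statements
         / NULLIF(COMMIT_SQL_STMTS + ROLLBACK_SQL_STMTS, 0) AS DMLTX,
       DECIMAL(TOTAL_SORTS, 18, 2)
         / NULLIF(COMMIT_SQL_STMTS + ROLLBACK_SQL_STMTS, 0) AS SRTX,
       DECIMAL(ROWS_READ, 18, 2)
         / NULLIF(COMMIT_SQL_STMTS + ROLLBACK_SQL_STMTS, 0) AS RRTX,
       DECIMAL(ROWS_SELECTED, 18, 2)
         / NULLIF(COMMIT_SQL_STMTS + ROLLBACK_SQL_STMTS, 0) AS FETTX,
       -- SRP: share of physical reads that were synchronous
       100 - (DECIMAL(POOL_ASYNC_DATA_READS + POOL_ASYNC_INDEX_READS, 18, 2) * 100
              / NULLIF(POOL_DATA_P_READS + POOL_INDEX_P_READS, 0)) AS SRP
FROM SYSIBMADM.SNAPDB

Snapshot counters are cumulative since the monitors were last reset, so comparing two runs taken some minutes apart and differencing the values gives a more representative picture than a single point-in-time query.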
Snapshot Analysis
BPLRTX (Bufferpool Logical Reads per Transaction)
This is a cost metric and, along with Bufferpool Read I/O ms per Transaction, one of the best cost measurements for verifying your tuning success. Bufferpool logical page reads equate in direct proportion to CPU time consumed, so this value should be low. Just remember: logical reads = CPU consumption.
When DB2 wants to access data (either index pages or data pages), a logical read is performed against the bufferpool. If the data requested is not present in the bufferpool, a physical read must be performed against disk to retrieve the page that was logically requested. If the logically requested data is already in the bufferpool, the physical disk read is avoided. So a request for data typically begins with a logical request, which may, or may not, result in a physical request.

Snapshot Analysis
BPLRTX (Bufferpool Logical Reads per Transaction)
Formula for calculation:
BPLRTX = (Buffer pool data logical reads + Buffer pool index logical reads) / (Commit statements attempted + Rollback statements attempted)

Snapshot Analysis
Overall Read Time (ms) (ORMS)
ORMS tells us the average time for DB2 to complete a physical read. It should be computed for the database and for each tablespace, and the DBA should compare the ORMS for the database against the ORMS for each tablespace. If any tablespaces have read times significantly higher than the average for the database, it is important to determine why and attempt to improve the performance of the slowest tablespaces. In our example it is 0.

Snapshot Analysis
Overall Write Time (ms) (OWMS)
The average time to perform a physical write for this database was 1.84 ms. This is good. 97.29% of writes are being performed asynchronously, which is also good.
OWMS at the database level tells us the average time for the database to perform a write (any write, whether synchronous or asynchronous). OWMS at the tablespace level tells us the average write time for each tablespace. If the OWMS for a tablespace is significantly higher than the OWMS for the database, then you have likely uncovered a "performance opportunity for improvement" and the situation should be investigated. For the tablespaces with the slowest write times, carefully examine their definitions, containers, and placement of containers, and apply the relevant best practices.

Snapshot Analysis
Bufferpool Write I/O ms per Transaction (BPWIOTX)
Bufferpool write time is just one important component of understanding where transaction time goes; we also need to look at direct I/O times, lock times, sort times, and CPU times. Once we know where time is spent inside the database, we can focus on the resource that is the greatest bottleneck to optimized performance. We will also look at determining average transaction times, and how much time, and what percentage of time, is spent inside the database and outside it.
BPWIOTX = cast(TOTBPWRITETM as decimal(18,0)) / cast((COMMITSTMTATTMPTD + ROLLBCKSTMTATTMPTD) as decimal(18,0))
(that is, total bufferpool write time in ms divided by TXCNT)

Snapshot Analysis
Bufferpool Metrics
Database Bufferpool Index Hit Ratio (DB-BPIHR)
Database Bufferpool Overall Hit Ratio (DB-BPOHR)
The bufferpool index hit ratio was 98.58% and the overall bufferpool hit ratio was 86.52%. While these hit ratios look good, remember that bufferpool hit ratios can be very misleading (giving a false sense of security and tuning success) when scans are occurring in the bufferpools. To improve hit ratios, DBAs commonly throw more and more memory at the bufferpool sizes until they can't get the hit ratios to go any higher.
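The hit ratios and BPLRTX discussed above can be derived the same way. The sketch below again assumes the SYSIBMADM.SNAPDB view and its standard logical/physical read column names; SYSIBMADM.SNAPBP exposes the same elements per bufferpool if a per-pool breakdown is needed.

-- Sketch only: database-level bufferpool hit ratios and BPLRTX (column names assumed)
SELECT DB_NAME,
       -- DB-BPIHR: index hit ratio (%)
       100 - (DECIMAL(POOL_INDEX_P_READS, 18, 2) * 100
              / NULLIF(POOL_INDEX_L_READS, 0))                       AS BP_INDEX_HIT_RATIO,
       -- DB-BPOHR: overall (data + index) hit ratio (%)
       100 - (DECIMAL(POOL_DATA_P_READS + POOL_INDEX_P_READS, 18, 2) * 100
              / NULLIF(POOL_DATA_L_READS + POOL_INDEX_L_READS, 0))   AS BP_OVERALL_HIT_RATIO,
       -- BPLRTX: logical reads per transaction (a CPU-cost proxy)
       DECIMAL(POOL_DATA_L_READS + POOL_INDEX_L_READS, 18, 2)
         / NULLIF(COMMIT_SQL_STMTS + ROLLBACK_SQL_STMTS, 0)          AS BPLRTX
FROM SYSIBMADM.SNAPDB

As the slide above warns, a high hit ratio alongside a high BPLRTX usually means scans are being satisfied from memory: the hit ratio looks healthy while CPU is still being burned on logical reads.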
Snapshot Analysis
Various Bufferpool Metrics
• Buffer pool hit ratio per bufferpool: should be around 95% or above
• Buffer pool data reads (logical + physical)
• Buffer pool index reads (logical + physical)
• Buffer pool total read time
• Buffer pool total write time
• Number of victim buffers available
• Direct reads/writes and time spent on direct reads/writes

Snapshot Analysis
Lock-Related Metrics
LCKMS (the average lock wait time): not every lock times out; some locks just experience temporary delays while they wait for required resources to become available. The LCKMS formula tells you the average lock wait time. LCKMS should not be greater than LOCKTIMEOUT, but it could be equal to LOCKTIMEOUT if ALL of your locks time out. Remember, too, that LOCKTIMEOUT is configured in seconds while this formula computes milliseconds (another one of the wonderful consistencies within DB2).
LCKMS = Time database waited on locks (ms) / Lock waits
LCKTX (the average lock wait time per transaction):
LCKTX = Time database waited on locks (ms) / TXCNT

What the Different Lock Metrics Indicate
• Number of locks held
• Total lock wait time
• Snapshot timestamp
• Application ID
• Application status
• Status change time
Look for "Application status =" entries; a lock-wait status indicates the application is waiting on a lock.

Snapshot Analysis
Problem Analysis for a Query
• Number of rows read per execution
• Execution time per execution
• System CPU time per execution
• Rows returned per execution
• Look at buffer pool data reads (logical + physical)
• Look at buffer pool index reads (logical + physical)
• Look at temporary XML reads (logical + physical)
• Look at sort time / number of sorts / sort overflows

Snapshot Analysis
Catalog Cache Analysis
The Catalog Cache Hit Ratio (CATHR):
CATHR = 100 - (Catalog cache inserts * 100 / Catalog cache lookups)
The catalog cache hit ratio should generally be at least 95%, and most shops are able to achieve this rather easily. If you find that your CATHR is less than 95%, increase the DB CFG parameter CATALOGCACHE_SZ in gradual 5% increments, or 16 4K pages, whichever is greater, until you achieve the 95% goal.

Snapshot Analysis
What the Different Application Metrics Indicate
• Application status
• Status change time
• Application idle time = 2 minutes 22 seconds (example)
• Look at the lock section for the transaction:
  Locks held by application
  Lock waits since connect
  Time application waited on locks (ms)
  Deadlocks detected
  Lock escalations
  Exclusive lock escalations
  Number of lock timeouts since connected
  Total time UOW waited on locks (ms)
• Sort-related counters (total sorts / total sort time / sort overflows)
• Rows deleted / inserted / updated
• Rows selected (if this value is high, the application is reading too much data)
• Rows written

DB2 Administrative Views
The system-defined routines and views provide a primary, easy-to-use programmatic interface for administering and using DB2 through SQL. They encompass a collection of built-in views, table functions, procedures, and scalar functions for performing a variety of DB2 tasks.
Refer to http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.sql.rtn.doc/doc/c0022652.html
SYSIBMADM (administrative views) and SYSPROC (table functions)
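As a concrete example of this administrative interface, the sketch below uses the SYSPROC.MON_GET_PKG_CACHE_STMT table function (DB2 9.7 and later) to list the dynamic statements that read the most rows per execution, tying the per-query analysis from the earlier slides to the administrative routines. Column names such as NUM_EXECUTIONS, ROWS_READ, TOTAL_CPU_TIME, and STMT_TEXT are assumed from the 9.7 documentation; verify them against your release.

-- Sketch only: top dynamic statements by rows read per execution
-- MON_GET_PKG_CACHE_STMT arguments: section type ('D' = dynamic),
-- executable id, search args, member (-2 = all members)
SELECT SUBSTR(STMT_TEXT, 1, 80)                   AS STMT,
       NUM_EXECUTIONS,
       ROWS_READ / NULLIF(NUM_EXECUTIONS, 0)      AS ROWS_READ_PER_EXEC,
       TOTAL_CPU_TIME / NULLIF(NUM_EXECUTIONS, 0) AS CPU_PER_EXEC
FROM TABLE(SYSPROC.MON_GET_PKG_CACHE_STMT('D', NULL, NULL, -2)) AS T
WHERE NUM_EXECUTIONS > 0
ORDER BY ROWS_READ_PER_EXEC DESC
FETCH FIRST 10 ROWS ONLY

Statements whose rows read per execution is far larger than their rows returned are the same scan candidates that the IRE metric flags at the database level.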
DB2 Administrative Views
db2 "list tables for schema sysibmadm" | grep -i "mon_"

DB2 Administrative Views
• MON_DB_SUMMARY
• MON_GET_APPL_LOCKWAIT
• MON_GET_BUFFERPOOL
• MON_GET_INDEX
• MON_GET_LOCKS
• MON_GET_TABLE
• MON_GET_TABLESPACE
• MON_GET_PKG_CACHE_STMT

Summary
• How to use the database-level metrics to assess the overall performance of the system
• How to relate the same metrics to the administrative views
Questions