PUG Challenge Americas 2013 – Westford, MA

The Deepest Depths of promon
And how it may help in troubleshooting certain DB problems
Presented by: Gus Bjorklund & Dan Foreman

Gus Bjorklund
• Progress Wizard

Dan Foreman
• Progress user since 1984
• Author of several Progress-related publications
• News Flash – A new publication that is a superset of this presentation, titled promon debghb, will be available shortly
• Author of several cool and useful Progress DBA tools
  – ProMonitor & ProCheck & LockMon
  – Pro Dump&Load
  – Balanced Benchmark
• Basketball fanatic…which sometimes leads to unexpected trips to the ER

Brief History of the debghb option in promon
• Added to promon in V6.3 by Gus
• Purpose: the shared memory architecture introduced in V6.3 was quite a bit different from earlier versions, and a way to monitor shared memory activity at a detailed level was needed
• The debghb option was not a formally endorsed enhancement; Gus wrote it in his spare time
• deb = DEBug
• ghb = Gus’s initials…the middle initial is a top secret fanatically guarded by the Finnish government

Warnings
• The debghb option is not documented by Progress
• DO NOT call or email Gus or PTS for help in using this option
• Many of the screens and/or metrics have no value to a DBA
• The view of the data is not always transactionally consistent, sometimes even within a single screen; example to follow
• Some of the data is not accurate (data overflow, rounding errors, etc.)
• Some of the screens are broken (they don’t display any data)
• The debghb option can be altered, removed, or hidden by Progress any time they want to

Warnings #2
• DB activities in shared memory can be slowed down if certain options are enabled

05/08/13 19:10:44        OpenEdge Release 10 Monitor (R&D)
                         Adjust Latch Options

 1. Spins before timeout: 24000
 2. Enable latch activity data collection
 3. Enable latch timing data collection
 4. Initial latch sleep time: 10 milliseconds
 5. Maximum latch sleep time: 5000 milliseconds
 6. Record Free Chain Search Depth Factor: 5

How do I access debghb?
• Start promon
• Enter: R&D (also works in lower case starting in V10)
• Enter: debghb
• You have now entered the debghb “zone”
• Two main differences in the world of debghb
  – Extensions to some existing R&D screens
  – Enables access to the “This menu is not here Menu”
    • Enter: “6” even though there is no visible option 6
    • See next slide

This menu is not here Menu

06/06/13 15:23:01        OpenEdge Release 10 Monitor (R&D)
                         This menu is not here Menu

 1. Cache Entries
 2. Hash Chain
 3. Page Writer Queue
 4. Lru Chains
 5. Locked Buffers
 6. Buffer Locks
 7. Buffer Use Counts
 8. Resource Queues
 9. TXE Lock Activity
10. Adjust TXE Options
11. Latch Counts
12. Latch Times
13. I/O Wait Time by Type
14. I/O by File
15. Buffer Lock Queue
16. Semaphores
17. Shutdown

Operating Hints
• Allow at least 40-45 lines of screen data
• Allow at least 120-140 columns of screen width
• Zero out the stats (“z”) to get a clean starting place
  – This ‘zeroing’ does not wipe out the actual shared memory counters; it only affects the current promon session
• Update the stats periodically (“u”) to get snapshots
• All the above can be scripted

Operating Hints - User# -1
• Usect = use count, the number of concurrent processes accessing the block
• When initially examining the BLQ (Buffer Lock Queue) there were 5 Clients accessing the same DBKEY
• But before all 5 could be displayed:
  – One Client dropped off, i.e. released the Buffer Lock, before it could be displayed
  – Another one of the 5 is partially displayed; i.e.
the -1 User#

02/06/13 00:37:21        Status: Buffer Lock Queue

User     DBKEY  Area  T  Status  Type   Usect
  -1  70658368    34  I  LOCKED  SHARE      5
 746  70658368    34  I  LOCKED  SHARE      3
 762  70658368    34  I  LOCKED  SHARE      3
 826  70658368    34  I  LOCKED  SHARE      3

Useful Screens – Checkpoints
• Extensions to the ‘normal’ Checkpoint screen
• Columns of interest
  – Duration: the amount of time required to complete a Checkpoint; the entire Database is transactionally frozen during this time
    • _CheckPoint._CheckPoint-Duration (V10.2B SP5)
  – Sync Time: a subset of the ‘Duration’ column; the amount of time required to execute the fdatasync() system call
    • _CheckPoint._CheckPoint-Synctime (V10.2B SP5)
• See http://www.makelinux.net/alp/060 for an excellent description of fdatasync (don’t confuse it with fsync)
• Sample data on the next slide

Useful Screens – Checkpoints
• The ‘Duration’ of the Checkpoints (i.e. the total freeze time) is very high for most of the CPs displayed
• A ‘Duration’ of less than 1 second is a good goal
• The 10 sec ‘Duration’ (checkpoint 2494) is approximately 1/3 of the CP ‘Len’
• In other words, a CP is occurring approximately every 30 seconds, and for 10 seconds of that period NO transaction activity can take place

Ckpt                              ------ Database Writes ------
No.   Time      Len  Freq  Dirty  CPT Q  Scan  APW Q  Flushes  Duration  Sync Time
2499  00:56:04  246   265  14097  14586  3003     62        0      2.12       0.42
2498  00:52:16  212   228  33792  35191  2915    199        0      6.49       4.85
2497  00:51:40   33    36  34536  37114   229     85        0      7.15       5.96
2496  00:51:09   30    31  34385  36950   164    151        0      7.22       5.89
2495  00:50:34   34    35  37992  40285   408    341        0      8.33       7.15
2494  00:49:59   34    35  40933  43429   132    363        0     10.20       9.05
2493  00:49:29   29    30  41377  43427   690    281        0      5.39       3.84

Useful Screens – Resource Queues
• NHM (Not Here Menu) - Option #8
• Do not confuse Resources with Latches
• Link to Banville’s presentation
• In general the busiest locks will be:
  – DB Buf S Lock
  – DB Buf X Lock
  – Record Lock
• Waits that can be problematic:
  – DB Buf I Lock (I = Intent but these are for Index blocks)
  – Sample on the next slide

Useful Screens – Resource Queues

01/31/13 00:31:57        Activity: Resource Queues
01/31/13 00:26 to 01/31/13 00:31 (5 min)

                      - Requests -        ------- Waits -------
Queue                  Total     /Sec      Total    /Sec    Pct
Record Lock          1007903     3360          8       0   0.00
Trans Commit            1631        5          0       0   0.00
DB Buf I Lock        1006724     3356     139476     465   0.14
Record Get            724869     2416          0       0   0.00
DB Buf Read           305596     1019          0       0   0.00
DB Buf Write           62727      209          0       0   0.00
DB Buf S Lock       33370848   111236     159591     532   0.00
DB Buf X Lock        1092894     3643     157022     523   0.14
DB Buf S Lock LRU2  20934886    69783          0       0   0.00
DB Buf X Lock LRU2     11088       37          0       0   0.00
DB Buf Write LRU2       3367       11          0       0   0.00
BI Buf Read             4788       16          0       0   0.00
BI Buf Write           16075       54       1096       4   0.07
TXE Share Lock       1148821     3829          0       0   0.00
TXE Update Lock        10347       34        282       1   0.03
TXE Commit Lock        63540      212       1927       6   0.03

Useful Screens – Latch Counts
• NHM (Not Here Menu) - Option #11
• The R&D Blocked Clients screen doesn’t show Latch contention, so debghb is the only place in promon where detailed Latch activity is visible
• Definition of Naps: When –spin is ‘used up’ by a Progress Client, the process Naps (i.e.
does no useful work) for a while and tries again
• General Principle: Napping is bad
• Samples on the next few slides

Latch Counts – OM Latch
• OM (Object Cache) Latch activity can be totally eliminated by setting the -omsize parameter equal to or greater than the number of _StorageObject records

04/24/13 00:59:28        Activity: Latch Counts
04/24/13 00:54 to 04/24/13 00:59 (5 min 1 sec)

            ----- Locks -----  --- Busy ---  Naps  -------- Spins --------  -- Nap Max --
     Owner      Total    /Sec   /Sec    Pct  /Sec     /Sec   /Lock   /Busy  Total    HWM
MTX     --    1563935    5195     86    1.6    45  6461297    1243   74724    259    300
USR     --    1178290    3914      3    0.0     0        0       0       0      0      0
OM      --   21523860   71507   7322   10.2   139  9144588     127    1248      3     80

Latch Counts – USR Latch
• The small contention on the USR (DB Connection Table) Latch is because Statement Caching is enabled

04/25/13 00:33:17        Activity: Latch Counts
04/25/13 00:28 to 04/25/13 00:33 (5 min 0 sec)

            ----- Locks -----  --- Busy ---  Naps  -------- Spins --------  -- Nap Max --
     Owner      Total    /Sec   /Sec    Pct  /Sec     /Sec   /Lock   /Busy  Total    HWM
MTX     --    2402181    8007     17    0.2    45  2059022     257  120481    133    300
USR     --    1517252    5057      8    0.1     0        0       0       0      0      0
OM    4343   27667792   92225   1962    2.1    32  5781905      62    2946      0     80
BIB     --    2170630    7235      0    0.0     1    49241       6  165982     11    300
SCH     --        447       1      0    0.0     0        0       0       0      0      0
LKP     --      90680     302      0    0.0     0        6       0     923      0     10
GST     --        195       0      0    0.0     0      200     307       0      1     10
TXT     --     703633    2345      0    0.0     0      811       0   10149      0     10
SEQ     --     505781    1685      0    0.0     0    18283      10   66889      4     10
AIB     --    1947241    6490      0    0.0     0     4957       0   43745      1     20
TXQ     --    3304146   11013      5    0.0     1   116828      10   20788      0     20
EC      --          0       0      0    0.0     0        0       0       0      0      0
LKF     --    1834458    6114      1    0.0     0    16091       2   15673      7     10
BFP     --          0       0      0    0.0     0        0       0       0      0      0
BHT     --   63583335  211944     40    0.0     9  1511922       7   37429     13    300
PWQ     --        491       1      0    0.2     0       23      14    7184      0      0
CPQ     --     535397    1784      0    0.0     0    23253      13   47135      0    160
LRU     --     812009    2706      0    0.0     2   129408      47  160424      0     80
LRU     --          0       0      0    0.0     0        0       0       0      0      0
BUF     --   39136969  130456     13    0.0     1   191806       1   13868      3     40

Latch Counts – LRU Chain
• The total number of Locks for LRU is the second highest
of all the latches shown (BHT – Buffer Hash Table – is #1)
• The # of Naps per second for LRU is the highest of all latches (zero is the goal)

01/31/13 00:05:38        Activity: Latch Counts
01/31/13 00:00 to 01/31/13 00:05 (5 min 0 sec)

            ----- Locks -----  ---- Busy ----  Naps
     Owner      Total    /Sec   /Sec     Pct   /Sec
MTX     --    1654034    5513      0     0.0     15
OM      --    8216844   27389      0     0.0      1
BHT     --   62371320  207904      0     0.0      3
CPQ     --     197126     657      0     0.0      0
LRU    830   40395944  134653      0     0.0   1402
LRU     --         36       0      0     0.0      0
BUF     --   32676880  108922      0     0.0      0
BUF     --   39818994  132729      0     0.0    529
BUF     --   31278094  104260      0     0.0      8
BUF     --   33342130  111140      0     0.0      3

Latch Counts – LRU Chain
• The # of locks on the second LRU (Alternate Buffer Cache) is effectively nil (36 in 5 minutes) because all the ABC Objects completely fit in the amount of –B2 memory allocated

01/31/13 00:05:38        Activity: Latch Counts
01/31/13 00:00 to 01/31/13 00:05 (5 min 0 sec)

            ----- Locks -----  ---- Busy ----  Naps
     Owner      Total    /Sec   /Sec     Pct   /Sec
BHT     --   62371320  207904      0     0.0      3
CPQ     --     197126     657      0     0.0      0
LRU    830   40395944  134653      0     0.0   1402
LRU     --         36       0      0     0.0      0
BUF     --   32676880  108922      0     0.0      0
BUF     --   39818994  132729      0     0.0    529

Latch Counts – LRU Chain
• ‘Owner’ column: if the User# doesn’t change (in value or frequency), that can be a problem indicator, because Latches should be held for only a fraction of a second

01/31/13 00:05:38        Activity: Latch Counts
01/31/13 00:00 to 01/31/13 00:05 (5 min 0 sec)

            ----- Locks -----  ---- Busy ----  Naps
     Owner      Total    /Sec   /Sec     Pct   /Sec
BHT     --   62371320  207904      0     0.0      3
CPQ     --     197126     657      0     0.0      0
LRU    830   40395944  134653      0     0.0   1402
LRU     --         36       0      0     0.0      0
BUF     --   32676880  108922      0     0.0      0
BUF     --   39818994  132729      0     0.0    529

Using Latch Counts to set -spin
• Short answer – Forget It!
• If it were that easy, Progress would have done it already
• Past attempts have not been successful
• Also, the optimal value of –spin is not going to be the same for each Latch
• General guidelines:
  – Greater than 1000
  – Less than 100000
  – Current Default: 6000 * (# of CPU Cores)
    • Default not advised if you have more than 16 Cores
  – Dan’s (Patent Pending) Formula: (DBA-Birthday-Year * )
  – Gus’s formula: 5000

Useful Screens – Buffer Lock Queue
• NHM (Not Here Menu) - Option #15
• The ‘normal’ R&D Blocked Clients screen does not show the Area that the DBKEY belongs to
• The Buffer Lock Queue (BLQ) screen shows the Area as well as the Block Type
• Examples on the next two slides

R&D Blocked Clients
• The R&D Blocked Clients screen doesn’t show enough information to identify the Object involved in this contention storm for DBKEY 65987456
• There were 29 Clients all blocked on the same DBKEY

01/31/13 00:26:41        Status: Blocked Clients

Usr  Name     Type      Wait  Wait Info   Trans id  Login time
730  _AUTO-B  SELF/ABL  BKSH   65987456  601383708  01/30/13 23:22
735  _AUTO-B  SELF/ABL  BKSH   65987456  601383773  01/30/13 23:23
743  _AUTO-B  SELF/ABL  BKSH   65987456  601383921  01/30/13 23:22
747  _AUTO-B  SELF/ABL  BKSH   65987456  601384104  01/30/13 23:22
749  _AUTO-B  SELF/ABL  BKSH   65987456  601383895  01/30/13 23:23
755  _AUTO-B  SELF/ABL  BKSH   65987456  601384175  01/30/13 23:23
769  _AUTO-B  SELF/ABL  BKSH   65987456  601384161  01/30/13 23:22

Buffer Lock Queue
• IF there is a matching DBKEY on the BLQ screen, we can get the Area# and the Block Type (I = Index)
• There were 29 processes on the Blocked Clients screen with this DBKEY and only 4 on the BLQ screen with the same DBKEY

01/31/13 00:26:41        Status: Buffer Lock Queue

User     DBKEY  Area  T  Status  Type   Usect
  -1  65987456    34  I  LOCKED  SHARE      4
 722  65987456    34  I  LOCKED  SHARE      4
 772  65987456    34  I  LOCKED  SHARE      4
 856  65987456    34  I  LOCKED  SHARE      4
 859  65987456    34  I  LOCKED  SHARE      4
<lines unrelated to DBKEY 65987456 snipped>

Thank You! Questions?
• Gus: gus@progress.com
• Dan: dan@prodb.com
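
Appendix sketch: the checkpoint arithmetic from the Checkpoints slides (the 'Duration' compared with the CP 'Len', and the goal of a 'Duration' under 1 second) can be checked with a few lines of Python. This is only an illustration under stated assumptions: the rows are typed in by hand from the sample screen, and the names SAMPLE, frozen_fraction and over_goal are hypothetical helpers, not part of promon or any Progress tool.

```python
# Rows copied by hand from the sample Checkpoints screen:
# (checkpoint number, Len in seconds, Duration in seconds).
SAMPLE = [
    (2499, 246, 2.12),
    (2498, 212, 6.49),
    (2497, 33, 7.15),
    (2496, 30, 7.22),
    (2495, 34, 8.33),
    (2494, 34, 10.20),
    (2493, 29, 5.39),
]


def frozen_fraction(length_s, duration_s):
    """Return Duration / Len, the ratio the slide describes as 'approximately 1/3'."""
    if length_s <= 0:
        raise ValueError("checkpoint length must be positive")
    return duration_s / length_s


def over_goal(rows, goal_s=1.0):
    """Return the rows whose Duration exceeds the goal (the slide suggests under 1 second)."""
    return [row for row in rows if row[2] > goal_s]


if __name__ == "__main__":
    for ckpt, length_s, duration_s in SAMPLE:
        share = frozen_fraction(length_s, duration_s)
        print(f"ckpt {ckpt}: Duration {duration_s:.2f}s is {share:.0%} of Len {length_s}s")
    print(f"{len(over_goal(SAMPLE))} of {len(SAMPLE)} checkpoints exceed the 1 second goal")
```

For checkpoint 2494 the ratio is 10.20 / 34 = 0.30, which is the "approximately 1/3" quoted on the slide; every sample row is over the 1-second goal.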