PUG Challenge Americas
2013 – Westford, MA
The Deepest Depths of promon
And how it may help in troubleshooting certain DB problems
Presented by: Gus Bjorklund & Dan Foreman
PUG Challenge Americas 2013
Gus Bjorklund
• Progress Wizard
Dan Foreman
• Progress User since 1984
• Author of several Progress related Publications
• News Flash – A new publication which is a superset of this presentation, titled promon debghb, will be available shortly
• Author of several cool and useful Progress DBA Tools
  – ProMonitor & ProCheck & LockMon
  – Pro Dump&Load
  – Balanced Benchmark
• Basketball Fanatic…which sometimes leads to unexpected trips to the ER
Brief History of the debghb option in promon
• Added to promon V6.3 by Gus
• Purpose: The shared memory architecture introduced in
V6.3 was quite a bit different and a way to monitor
shared memory activity at a detailed level was needed
• The debghb option was not a formally endorsed
enhancement but written by Gus in his spare time
• deb = DEBug
• ghb = Gus’s initials….the middle initial is a Top Secret
fanatically guarded by the Finnish government
Warnings
• The debghb option is not documented by Progress
• DO NOT call or email Gus or PTS for help in using this
option
• Many of the screens and/or metrics have no value to a
DBA
• The view of the data is not transactionally consistent, sometimes not even within a single screen; example to follow
• Some of the data is not accurate (data overflow, rounding
errors, etc)
• Some of the screens are broken (don’t display any data)
• The debghb option can be altered, removed, or hidden
by Progress any time they want to
Warnings #2
• DB activities in shared memory can be slowed down if certain options are enabled

05/08/13        OpenEdge Release 10 Monitor (R&D)
19:10:44        Adjust Latch Options

 1. Spins before timeout:                   24000
 2. Enable latch activity data collection
 3. Enable latch timing data collection
 4. Initial latch sleep time:               10 milliseconds
 5. Maximum latch sleep time:               5000 milliseconds
 6. Record Free Chain Search Depth Factor:  5
How do I access debghb?
• Start promon
• Enter: R&D (also works in lower case starting in V10)
• Enter: debghb
• You have now entered the debghb “zone”
• Two main differences in the world of debghb
  – Extensions to some existing R&D screens
  – Enables access to “This menu is not here Menu”
    • Enter: “6” even though there is no visible option 6
    • See next slide
This menu is not here Menu
06/06/13        OpenEdge Release 10 Monitor (R&D)
15:23:01        This menu is not here Menu

 1. Cache Entries
 2. Hash Chain
 3. Page Writer Queue
 4. Lru Chains
 5. Locked Buffers
 6. Buffer Locks
 7. Buffer Use Counts
 8. Resource Queues
 9. TXE Lock Activity
10. Adjust TXE Options
11. Latch Counts
12. Latch Times
13. I/O Wait Time by Type
14. I/O by File
15. Buffer Lock Queue
16. Semaphores
17. Shutdown
Operating Hints
• Allow at least 40-45 lines of screen data
• Allow at least 120-140 columns of screen width
• Zero out the stats (“z”) to get a clean starting place
  – This ‘zeroing’ does not wipe out the actual shared memory counters but only affects the current promon session
• Update the stats periodically (“u”) to get snapshots
• All the above can be scripted
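The keystrokes described so far (R&D, debghb, the hidden option 6, a screen number, then “z” and “u”) can be generated by a script. The sketch below only builds the input text. It assumes promon reads one response per line from standard input, which should be verified interactively on your release. The helper name build_promon_keystrokes is hypothetical, and the keystrokes needed to leave promon cleanly are deliberately left out.

```python
def build_promon_keystrokes(menu_option, snapshots=1):
    """Return the text to feed to promon's standard input.

    Assumption: promon accepts one response per line, in the same
    order a DBA would type them at the keyboard.
    """
    lines = [
        "R&D",             # enter the R&D menus
        "debghb",          # unlock the debghb extensions
        "6",               # the invisible "This menu is not here" option
        str(menu_option),  # e.g. 11 = Latch Counts, 15 = Buffer Lock Queue
        "z",               # zero the stats for this promon session only
    ]
    lines += ["u"] * snapshots  # one "u" per snapshot wanted
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical usage: save this output to a file, then redirect the
    # file into promon for your database and capture what promon prints.
    print(build_promon_keystrokes(menu_option=11, snapshots=3), end="")
```

A real script would normally pause between “u” responses so that each snapshot covers a known interval; that timing logic is not shown here.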
Operating Hints - User# -1
• Usecnt = # of concurrent processes accessing the block
• When initially examining the BLQ there were 5 Clients accessing the same DBKEY
• But before all 5 could be displayed:
  – One Client dropped off, i.e. released the Buffer Lock, before it could be displayed
  – Another one of the 5 is partially displayed; i.e. the -1 User#

02/06/13        Status: Buffer Lock Queue
00:37:21

User     DBKEY  Area  T  Status  Type   Usect
  -1  70658368    34  I  LOCKED  SHARE      5
 746  70658368    34  I  LOCKED  SHARE      3
 762  70658368    34  I  LOCKED  SHARE      3
 826  70658368    34  I  LOCKED  SHARE      3
Useful Screens - Checkpoints
• Extensions to the ‘normal’ Checkpoint screen
• Columns of interest
  – Duration: the amount of time required to complete a Checkpoint; the entire Database is transactionally frozen during this time
    • _CheckPoint._CheckPoint-Duration (V10.2B SP5)
  – Sync Time: a subset of the ‘Duration’ column; the amount of time required to execute the fdatasync() system call
    • _CheckPoint._CheckPoint.Synctime (V10.2B SP5)
• See http://www.makelinux.net/alp/060 for an excellent description of fdatasync (don’t confuse it with fsync).
• Sample data on the next slide
Useful Screens - Checkpoints
• The ‘Duration’ of the Checkpoints (i.e. the total freeze time) is very high for most of the CPs displayed
• A ‘Duration’ of less than 1 second is a good goal
• The 10 sec ‘Duration’ is approximately 1/3 of the CP ‘Len’
• In other words, a CP is occurring approximately every 30 seconds and for 10 seconds of that period, NO transaction activity can take place.

Ckpt                                     ------ Database Writes ------
No.   Time      Len  Freq  Dirty  CPT Q   Scan  APW Q  Flushes  Duration  Sync Time
2499  00:56:04  246   265  14097  14586   3003     62        0      2.12       0.42
2498  00:52:16  212   228  33792  35191   2915    199        0      6.49       4.85
2497  00:51:40   33    36  34536  37114    229     85        0      7.15       5.96
2496  00:51:09   30    31  34385  36950    164    151        0      7.22       5.89
2495  00:50:34   34    35  37992  40285    408    341        0      8.33       7.15
2494  00:49:59   34    35  40933  43429    132    363        0     10.20       9.05
2493  00:49:29   29    30  41377  43427    690    281        0      5.39       3.84
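The “1/3 of the CP Len” observation is simple arithmetic and can be checked for every checkpoint in the sample. The sketch below is illustrative only: the function names are hypothetical, and the pairing of ‘Len’ and ‘Duration’ values to checkpoint numbers is read off the sample screen, so treat it as an assumption.

```python
def freeze_fraction(duration_secs, len_secs):
    """Fraction of a checkpoint's 'Len' during which the DB was frozen."""
    if len_secs <= 0:
        raise ValueError("checkpoint length must be positive")
    return duration_secs / len_secs


def checkpoints_over_goal(rows, goal_secs=1.0):
    """Checkpoint numbers whose Duration misses the 'less than 1 second' goal."""
    return [number for number, _length, duration in rows if duration >= goal_secs]


# (Ckpt No., Len in seconds, Duration in seconds) as read from the sample.
SAMPLE = [
    (2499, 246, 2.12), (2498, 212, 6.49), (2497, 33, 7.15), (2496, 30, 7.22),
    (2495, 34, 8.33), (2494, 34, 10.20), (2493, 29, 5.39),
]

if __name__ == "__main__":
    for number, length, duration in SAMPLE:
        share = freeze_fraction(duration, length)
        print(f"CP {number}: frozen for {share:.0%} of its length")
    print("Over the 1 second goal:", checkpoints_over_goal(SAMPLE))
```

For checkpoint 2494 this gives 10.20 / 34, about 30%, which is the “approximately 1/3” quoted on the slide.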
Useful Screens – Resource Queues
• NHM (Not Here Menu) - Option #8
• Do not confuse Resources with Latches
• Link to Banville’s presentation
• In general the busiest locks will be:
  – DB Buf S Lock
  – DB Buf X Lock
  – Record Lock
• Waits that can be problematic:
  – DB Buf I Lock (I = Intent but these are for Index blocks)
  – Sample on the next slide
Useful Screens – Resource Queues
01/31/13        Activity: Resource Queues
00:31:57        01/31/13 00:26 to 01/31/13 00:31 (5 min)

                     --- Requests ---    ------- Waits -------
Queue                   Total    /Sec     Total   /Sec    Pct
Record Lock           1007903    3360         8      0   0.00
Trans Commit             1631       5         0      0   0.00
DB Buf I Lock         1006724    3356    139476    465   0.14
Record Get             724869    2416         0      0   0.00
DB Buf Read            305596    1019         0      0   0.00
DB Buf Write            62727     209         0      0   0.00
DB Buf S Lock        33370848  111236    159591    532   0.00
DB Buf X Lock         1092894    3643    157022    523   0.14
DB Buf S Lock LRU2   20934886   69783         0      0   0.00
DB Buf X Lock LRU2      11088      37         0      0   0.00
DB Buf Write LRU2        3367      11         0      0   0.00
BI Buf Read              4788      16         0      0   0.00
BI Buf Write            16075      54      1096      4   0.07
TXE Share Lock        1148821    3829         0      0   0.00
TXE Update Lock         10347      34       282      1   0.03
TXE Commit Lock         63540     212      1927      6   0.03
Useful Screens – Latch Counts
• NHM (Not Here Menu) - Option #11
• The R&D Blocked Clients screen doesn’t show Latch contention so debghb is the only place in promon where detailed Latch activity is visible
• Definition of Naps: When –spin is ‘used up’ by a Progress Client, the process Naps (i.e. does no useful work) for a while and tries again
• General Principle: Napping is bad
• Samples on the next few slides
Latch Counts – OM Latch
• OM (Object Cache) Latch activity can be totally eliminated by setting the -omsize parameter equal to or greater than the number of _StorageObject records.

04/24/13        Activity: Latch Counts
00:59:28        04/24/13 00:54 to 04/24/13 00:59 (5 min 1 sec)

           ------ Locks ------ ---- Busy ---  Naps -------- Spins --------- - Nap Max --
     Owner      Total     /Sec   /Sec    Pct  /Sec      /Sec  /Lock   /Busy  Total   HWM
MTX     --    1563935     5195     86    1.6    45   6461297   1243   74724    259   300
USR     --    1178290     3914      3    0.0     0         0      0       0      0     0
OM      --   21523860    71507   7322   10.2   139   9144588    127    1248      3    80
Latch Counts – USR Latch
• The small contention on the USR (DB Connection Table) Latch is because Statement Caching is enabled

04/25/13        Activity: Latch Counts
00:33:17        04/25/13 00:28 to 04/25/13 00:33 (5 min 0 sec)

           ------ Locks ------ ---- Busy ---  Naps -------- Spins --------- - Nap Max --
     Owner      Total     /Sec   /Sec    Pct  /Sec      /Sec  /Lock   /Busy  Total   HWM
MTX     --    2402181     8007     17    0.2    45   2059022    257  120481    133   300
USR     --    1517252     5057      8    0.1     0         0      0       0      0     0
OM    4343   27667792    92225   1962    2.1    32   5781905     62    2946      0    80
BIB     --    2170630     7235      0    0.0     1     49241      6  165982     11   300
SCH     --        447        1      0    0.0     0         0      0       0      0     0
LKP     --      90680      302      0    0.0     0         6      0     923      0    10
GST     --        195        0      0    0.0     0       200    307       0      1    10
TXT     --     703633     2345      0    0.0     0       811      0   10149      0    10
SEQ     --     505781     1685      0    0.0     0     18283     10   66889      4    10
AIB     --    1947241     6490      0    0.0     0      4957      0   43745      1    20
TXQ     --    3304146    11013      5    0.0     1    116828     10   20788      0    20
EC      --          0        0      0    0.0     0         0      0       0      0     0
LKF     --    1834458     6114      1    0.0     0     16091      2   15673      7    10
BFP     --          0        0      0    0.0     0         0      0       0      0     0
BHT     --   63583335   211944     40    0.0     9   1511922      7   37429     13   300
PWQ     --        491        1      0    0.2     0        23     14    7184      0     0
CPQ     --     535397     1784      0    0.0     0     23253     13   47135      0   160
LRU     --     812009     2706      0    0.0     2    129408     47  160424      0    80
LRU     --          0        0      0    0.0     0         0      0       0      0     0
BUF     --   39136969   130456     13    0.0     1    191806      1   13868      3    40
Latch Counts – LRU Chain
• The total number of Locks for LRU is the second highest of all the Latches shown (BHT – Buffer Hash Table – is #1)
• The # of Naps per Second is the highest of all latches (Zero is the goal)

01/31/13        Activity: Latch Counts
00:05:38        01/31/13 00:00 to 01/31/13 00:05 (5 min 0 sec)

            ------ Locks ------  ---- Busy ---  Naps
     Owner       Total     /Sec   /Sec    Pct   /Sec
MTX     --     1654034     5513      0    0.0     15
OM      --     8216844    27389      0    0.0      1
BHT     --    62371320   207904      0    0.0      3
CPQ     --      197126      657      0    0.0      0
LRU    830    40395944   134653      0    0.0   1402
LRU     --          36        0      0    0.0      0
BUF     --    32676880   108922      0    0.0      0
BUF     --    39818994   132729      0    0.0    529
BUF     --    31278094   104260      0    0.0      8
BUF     --    33342130   111140      0    0.0      3
Latch Counts – LRU Chain
• The # of locks on the second LRU (Alternate Buffer Cache) is nil because all the ABC Objects completely fit in the amount of –B2 memory allocated

01/31/13        Activity: Latch Counts
00:05:38        01/31/13 00:00 to 01/31/13 00:05 (5 min 0 sec)

            ------ Locks ------  ---- Busy ---  Naps
     Owner       Total     /Sec   /Sec    Pct   /Sec
BHT     --    62371320   207904      0    0.0      3
CPQ     --      197126      657      0    0.0      0
LRU    830    40395944   134653      0    0.0   1402
LRU     --          36        0      0    0.0      0
BUF     --    32676880   108922      0    0.0      0
BUF     --    39818994   132729      0    0.0    529
Latch Counts – LRU Chain
• ‘Owner’ column: if the User# doesn’t change (in value or frequency) that can be a problem indicator because Latches should be held for only a fraction of a second

01/31/13        Activity: Latch Counts
00:05:38        01/31/13 00:00 to 01/31/13 00:05 (5 min 0 sec)

            ------ Locks ------  ---- Busy ---  Naps
     Owner       Total     /Sec   /Sec    Pct   /Sec
BHT     --    62371320   207904      0    0.0      3
CPQ     --      197126      657      0    0.0      0
LRU    830    40395944   134653      0    0.0   1402
LRU     --          36        0      0    0.0      0
BUF     --    32676880   108922      0    0.0      0
BUF     --    39818994   132729      0    0.0    529
Using Latch Counts to set -spin
• Short answer – Forget It!
• If it was that easy Progress would have done it already
• Past attempts have not been successful
• Also the optimal value of –spin is not going to be the same for each Latch
• General guidelines:
  – Greater than 1000
  – Less than 100000
  – Current Default: 6000 * (# of CPU Cores)
    • Default not advised if you have more than 16 Cores
  – Dan’s (Patent Pending) Formula: (DBA-Birthday-Year * )
  – Gus’s formula: 5000
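The general guidelines reduce to a little arithmetic, sketched below. The function names are hypothetical and the code only restates the numbers from the slide; it is not a tuning recommendation.

```python
def default_spin(cpu_cores):
    """Current default per the slide: 6000 * (# of CPU cores)."""
    return 6000 * cpu_cores


def within_guidelines(spin):
    """True when spin is greater than 1000 and less than 100000."""
    return 1000 < spin < 100000


if __name__ == "__main__":
    for cores in (4, 16, 17, 32):
        spin = default_spin(cores)
        verdict = "ok" if within_guidelines(spin) else "outside guidelines"
        print(f"{cores:>2} cores -> -spin {spin} ({verdict})")
```

At 16 cores the default is 96000, still under the upper guideline; at 17 cores it becomes 102000 and crosses it, which lines up with the advice about more than 16 cores. Gus’s 5000 sits comfortably inside the range.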
Useful Screens – Buffer Lock Queue
• NHM (Not Here Menu) - Option #15
• The ‘normal’ R&D Blocked Clients screen does not show the Area that the DBKEY belongs to
• The Buffer Lock Queue (BLQ) Screen shows the Area as well as the Block Type
• Examples on the next two slides
R&D Blocked Clients
• The R&D Blocked Clients screen doesn’t show enough information to identify the Object involved in this contention storm for DBKEY 65987456
• There were 29 Clients all blocked on the same DBKEY

01/31/13        Status: Blocked Clients
00:26:41

Usr  Name     Type      Wait  Wait Info  Trans id   Login time
730  _AUTO-B  SELF/ABL  BKSH   65987456  601383708  01/30/13 23:22
735  _AUTO-B  SELF/ABL  BKSH   65987456  601383773  01/30/13 23:23
743  _AUTO-B  SELF/ABL  BKSH   65987456  601383921  01/30/13 23:22
747  _AUTO-B  SELF/ABL  BKSH   65987456  601384104  01/30/13 23:22
749  _AUTO-B  SELF/ABL  BKSH   65987456  601383895  01/30/13 23:23
755  _AUTO-B  SELF/ABL  BKSH   65987456  601384175  01/30/13 23:23
769  _AUTO-B  SELF/ABL  BKSH   65987456  601384161  01/30/13 23:22
Buffer Lock Queue
• IF there is a matching DBKEY on the BLQ screen, we can get the Area# and the Block Type (I = Index)
• There were 29 processes on the Blocked Clients screen with this DBKEY and only 4 on the BLQ screen with the same DBKEY

01/31/13        Status: Buffer Lock Queue
00:26:41

User     DBKEY  Area  T  Status  Type   Usect
  -1  65987456    34  I  LOCKED  SHARE      4
 722  65987456    34  I  LOCKED  SHARE      4
 772  65987456    34  I  LOCKED  SHARE      4
 856  65987456    34  I  LOCKED  SHARE      4
 859  65987456    34  I  LOCKED  SHARE      4
<lines unrelated to DBKEY 65987456 snipped>
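Matching a DBKEY from the Blocked Clients screen against the BLQ screen can be automated once both screens have been captured to text. The sketch below assumes whitespace-separated BLQ rows in the column order shown on the slide (User, DBKEY, Area, T, Status, Type, Usect). The names BlqRow, parse_blq_line and area_and_type_for are hypothetical, and real promon output may need more careful parsing.

```python
from collections import namedtuple

BlqRow = namedtuple("BlqRow", "user dbkey area block_type status lock_type usect")


def parse_blq_line(line):
    """Parse one BLQ data row; return None for headers and other lines."""
    parts = line.split()
    if len(parts) != 7:
        return None
    try:
        return BlqRow(int(parts[0]), int(parts[1]), int(parts[2]),
                      parts[3], parts[4], parts[5], int(parts[6]))
    except ValueError:
        return None  # e.g. the column header line


def area_and_type_for(dbkey, blq_rows):
    """Return (area, block type) for the first row matching dbkey, else None."""
    for row in blq_rows:
        if row.dbkey == dbkey:
            return row.area, row.block_type
    return None


if __name__ == "__main__":
    captured = [
        "User     DBKEY  Area  T  Status  Type   Usect",
        "  -1  65987456    34  I  LOCKED  SHARE      4",
        " 722  65987456    34  I  LOCKED  SHARE      4",
    ]
    rows = [row for row in map(parse_blq_line, captured) if row]
    print(area_and_type_for(65987456, rows))
```

Because the BLQ view is not transactionally consistent, a lookup that returns None does not prove the DBKEY was uncontended; it may simply have dropped off the queue before the screen was drawn.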
Thank You!
Questions?
• Gus: gus@progress.com
• Dan: dan@prodb.com