Break Free with DB2

New Ways to Solve your Locking
Problems with DB2 9.7
David Kalmuk
IBM
Platform: DB2 for Linux, Unix, Windows
Objectives
• Learn ways to quickly identify when you have a locking
problem on your system using the new time spent metrics
• Learn about the new locking event monitor which provides a
single comprehensive solution for analyzing locking problems
• Learn about the new locking table functions and views
introduced in 9.7 Fixpack 1 that allow you to quickly analyze
“live” locking problems on your system
• Learn how to use these tools to diagnose the cause of
deadlocks and lock timeouts as well as how to identify
performance problems related to lock waits on your system
• Take away practical examples you can try out in your own
environment.
Agenda
• A quick review of the new Lock Monitoring capabilities
introduced in DB2 9.7
• Identifying locking problems on your system
• Using the Locking Event Monitor to capture lock events
• Using the Locking Functions to analyze “live” locking issues on
your system
• Final thoughts
A Quick Review of the New Lock Monitoring
Capabilities Introduced in DB2 9.7
New Monitoring Perspectives and Dimensions
• Starting in 9.7, DB2 allows monitoring metrics to be accessed through a number of
different dimensions
• Allows more effective drilldown, and different perspectives on the data to help
isolate problems
• Three main dimensions, each consisting of a number of reporting points with
corresponding UDFs
• System
• Provide total perspective of application work being done by database system
• Aggregated through the WLM infrastructure
• Data objects
• Provide perspective of impact of all activity occurring within the scope of data objects
• Aggregated through data storage infrastructure
• Activity
• Provide perspective of work being done by specific SQL statements
• Aggregated through the package cache infrastructure
• New SQL functions in both System and Activity dimensions include locking metrics
Access Points: System Perspective
• MON_GET_UNIT_OF_WORK
• MON_GET_WORKLOAD
• MON_GET_CONNECTION
• MON_GET_SERVICE_SUBCLASS
• Also provide interfaces that produce XML output:
• MON_GET_UNIT_OF_WORK_DETAILS
• MON_GET_WORKLOAD_DETAILS
• MON_GET_CONNECTION_DETAILS
• MON_GET_SERVICE_SUBCLASS_DETAILS
Access Points: Activity Perspective
• MON_GET_PKG_CACHE_STMT
• Both static and dynamic SQL
• MON_GET_PKG_CACHE_STMT_DETAILS
• XML based output
• MON_GET_ACTIVITY_DETAILS (XML)
• Details for an activity currently in progress
Locking Related Metrics
• Aforementioned table functions report several locking
related metrics
• lock_timeouts
• lock_escals
• deadlocks
• lock_wait_time
• lock_waits
• num_locks_held (for connection / unit of work)
• These metrics allow you to perform identification and
drilldown on lock related problems
• Provide initial indicator that further investigation is needed
Time Spent Metrics
• A new set of metrics are being introduced into DB2 that represent a
breakdown of where time is spent within DB2
• Represents sum of time spent by each agent thread in the system (foreground
processing)
• Provides user with a relative breakdown of time spent, showing which areas
are the most expensive during request / query processing
• Available in both system and activity perspectives
• Can be used for rapid identification and diagnosis of performance problems
• Times are divided into:
• Wait times
• Time agents spent blocking on I/O, network communications, etc
• Processing times (starting in 9.7FP1)
• Time spent in different component areas when the agent was not stuck on a
wait
• Summary / total times
• Total time spent in a particular component area including both processing +
wait times
• Lock wait time shows the proportion of overall DB2 effort spent within
lock waits
“Time Spent” Metrics: Breakdown of Wait +
Processing Times in DB2
Total Request Time in DB2
Direct I/O
Bufferpool I/O
Lock Wait Time
Compile Proc
Time
Section Proc Time
Commit / Rollback
Proc Time
Other Proc Time
Navigating the “time spent” hierarchy
• The row based formatting functions introduced in 9.7 FP1
offer an easy way to navigate the time spent hierarchy in a
generic fashion
• MON_FORMAT_XML_TIMES_BY_ROW
• Shows breakdown of waits + processing times
• MON_FORMAT_XML_WAIT_TIMES_BY_ROW
• Shows breakdown of just wait times
• MON_FORMAT_XML_COMPONENT_TIMES_BY_ROW
• Shows breakdown of processing time as well as overall time spent in each
“component” of DB2
• MON_FORMAT_XML_METRICS_BY_ROW
• Outputs all metrics in generic row based form
Example
Show me the full hierarchy of waits + processing times for my connection:

select r.metric_name, r.parent_metric_name,
       r.total_time_value as time, r.count, c.member
from table(mon_get_connection_details(125,-2)) as c,
     table(mon_format_xml_times_by_row(c.details)) as r
order by total_time_value desc

METRIC_NAME               PARENT_METRIC_NAME        TIME       COUNT      MEMBER
------------------------- ------------------------- ---------- ---------- ------
CLIENT_IDLE_WAIT_TIME                               709189                0
TOTAL_RQST_TIME                                     533264     484277     0
TOTAL_WAIT_TIME           TOTAL_RQST_TIME           329205                0
LOG_DISK_WAIT_TIME        TOTAL_WAIT_TIME           229656     15866      0
TOTAL_SECTION_PROC_TIME   TOTAL_RQST_TIME           144533     445549     0
IPC_SEND_WAIT_TIME        TOTAL_WAIT_TIME           54743      484278     0
POOL_READ_TIME            TOTAL_WAIT_TIME           32551      4144       0
LOCK_WAIT_TIME            TOTAL_WAIT_TIME           10789      102        0
TOTAL_COMMIT_PROC_TIME    TOTAL_RQST_TIME           8791       17268      0
IPC_RECV_WAIT_TIME        TOTAL_WAIT_TIME           1463       484278     0
WLM_QUEUE_TIME_TOTAL      TOTAL_WAIT_TIME           0          0          0
FCM_TQ_RECV_WAIT_TIME     FCM_RECV_WAIT_TIME        0          0          0
FCM_MESSAGE_RECV_WAIT_TIM FCM_RECV_WAIT_TIME        0          0          0
FCM_TQ_SEND_WAIT_TIME     FCM_SEND_WAIT_TIME        0          0          0
FCM_MESSAGE_SEND_WAIT_TIM FCM_SEND_WAIT_TIME        0          0          0
AGENT_WAIT_TIME           TOTAL_WAIT_TIME           0          0          0
DIRECT_READ_TIME          TOTAL_WAIT_TIME           0          0          0
…
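The parent/child relationship in this output can be turned into an indented tree once the rows have been fetched. The following is a minimal Python sketch, not part of DB2: the function names are hypothetical, the rows are a subset copied from the sample output, and a blank PARENT_METRIC_NAME is assumed to mean "no parent" (represented here as None).

```python
from collections import defaultdict

# (metric_name, parent_metric_name, total_time_value) copied from the sample
# output; None is an assumed stand-in for a blank parent.
ROWS = [
    ("CLIENT_IDLE_WAIT_TIME", None, 709189),
    ("TOTAL_RQST_TIME", None, 533264),
    ("TOTAL_WAIT_TIME", "TOTAL_RQST_TIME", 329205),
    ("LOG_DISK_WAIT_TIME", "TOTAL_WAIT_TIME", 229656),
    ("TOTAL_SECTION_PROC_TIME", "TOTAL_RQST_TIME", 144533),
    ("IPC_SEND_WAIT_TIME", "TOTAL_WAIT_TIME", 54743),
    ("POOL_READ_TIME", "TOTAL_WAIT_TIME", 32551),
    ("LOCK_WAIT_TIME", "TOTAL_WAIT_TIME", 10789),
    ("TOTAL_COMMIT_PROC_TIME", "TOTAL_RQST_TIME", 8791),
    ("IPC_RECV_WAIT_TIME", "TOTAL_WAIT_TIME", 1463),
]


def build_tree(rows):
    """Group metrics under their parent metric name."""
    children = defaultdict(list)
    for name, parent, time in rows:
        children[parent].append((name, time))
    return children


def render(children, parent=None, depth=0):
    """Return indented lines, most expensive metric first at each level."""
    lines = []
    for name, time in sorted(children.get(parent, []), key=lambda c: -c[1]):
        lines.append(f"{'  ' * depth}{name}: {time}")
        lines.extend(render(children, name, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(render(build_tree(ROWS))))
```

Printed this way, LOCK_WAIT_TIME appears under TOTAL_WAIT_TIME, which in turn appears under TOTAL_RQST_TIME, so its share of the wait time is easy to read off.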
Introducing the Locking Event Monitor
• New Event Monitor available starting in DB2 9.7
• Consolidated mechanism for capturing and performing in-depth
analysis of locking data
• Replaces existing deadlock event monitor and lock timeout report
• Support for capturing:
• Deadlocks
• Lock Timeouts
• Lock Waits
• Control granularity is at workload or database level
• Optional statement history
• Uses new UE Table target type
• Low overhead target type designed to minimize impact of
capturing events on a live system
Lock Notification Messages
• Lightweight mechanism that logs basic information about lock events
into the administrative log for:
• Deadlocks
• Lock timeouts
• Lock escalations
• Allows some basic data to be captured without requiring creation of
any event monitors
• Level of detail controlled by the mon_lck_msg_lvl configuration
parameter
• 0 – No events captured
• 1 – Lock escalation events captured (default)
• 2 – Lock escalations and deadlocks captured
• 3 – Lock timeouts, escalations and deadlocks captured
New Locking Functions and Views
• New functions and views were introduced in DB2 9.7 FP1 for adhoc
lock monitoring
• MON_GET_APPL_LOCKWAIT
• “Application-centric”
• Input arguments allow at source filtering of individual applications and members
• Displays all agents currently in lock wait state as well as information about:
• What that agent is currently processing
• What lock they are waiting on
• What application currently has that lock
• MON_GET_LOCKS
• “Lock-centric”
• Displays information on locks held on the database system
• Search arguments (and input member) allow filtering of the data at source rather than by
using query predicates
• application_handle
• lock_name / lock_object_type
• lock_mode / lock_type
• table_schema / table_name
New Locking Functions and Views
• MON_FORMAT_LOCK_NAME
• Interprets binary lock name to show applicable lock attributes in row-based format:
• Table name
• Table schema
• Lock object type
• Tablespace name
• Data partition id
• Rowid, pageid
• Others
• Lock name can be obtained from MON functions, or from db2notify/diag.log files
• MON_LOCKWAITS
• View that uses data from MON_GET_APPL_LOCKWAIT and other monitoring UDFs
to produce a view of lockwaits including additional data on applications
• Lock wait time elapsed
• Table name, schema, data partition id
• Application name
• Currently executing statement text for application (if available)
• Other miscellaneous lock information
• WLM_GET_SERVICE_CLASS_AGENTS_V97
• Displays individual agent threads working on behalf of a particular
service class, or a particular application
• EVENT_TYPE, EVENT_OBJECT, EVENT_OBJECT_NAME fields
can be used to identify which agents are waiting on locks on
which partitions, and which locks they are waiting on
• AGENT_STATE_LAST_UPDATE_TIME field allows you to identify
when the agent entered the lock wait
• EXECUTABLE_ID provides a unique identifier for the statement in
the package cache the agent is currently working on (if applicable)
Identifying Locking Problems on your
System
Identifying Locking Problems
• Before we start analyzing locking problems on our
database we first need to know how to identify that we are
experiencing locking problems in the first place.
• This section will cover some basic methods of identifying
and classifying locking problems on the database using
our general in-memory metrics
• Can be useful if we want to use a “dashboard” type
approach for identifying that locking problems are
occurring, or simple manual indicators
• Subsequent sections will discuss how to analyze locking
problems we’ve identified in more detail using the locking
event monitor and the locking functions
“What locking events have occurred on my
database?”
Show me the counts for locking events at the database level by summing up all the
defined workloads:

select sum(lock_timeouts) as lock_timeouts,
       sum(lock_escals) as lock_escals,
       sum(deadlocks) as deadlocks,
       sum(lock_waits) as lock_waits
from table(mon_get_workload(null,-2)) as t

LOCK_TIMEOUTS        LOCK_ESCALS          DEADLOCKS            LOCK_WAITS
-------------------- -------------------- -------------------- --------------------
2                    1                    3                    2513
“What locking events have occurred on my database
over a recent sampling period?”
A bit of scripting / setup to help obtain deltas (statements are terminated with @):

create view lockmetrics(lock_timeouts, lock_escals, deadlocks, lock_waits)
as select sum(lock_timeouts) as lock_timeouts,
          sum(lock_escals) as lock_escals,
          sum(deadlocks) as deadlocks,
          sum(lock_waits) as lock_waits
from table(mon_get_workload(null,-2)) as t@

create global temporary table locksamples as
(select * from lockmetrics) definition only on commit delete rows@

create view lockdelta (lock_timeouts, lock_escals, deadlocks, lock_waits)
as select t2.lock_timeouts - t1.lock_timeouts, t2.lock_escals - t1.lock_escals,
          t2.deadlocks - t1.deadlocks, t2.lock_waits - t1.lock_waits
from lockmetrics as t2, locksamples as t1@

Get lock metrics on database sampled over 1 minute
(be sure to run with auto-commit disabled when using CLP, since the sample
rows are deleted on commit):

insert into locksamples select * from lockmetrics@
<sleep for 60s sampling period>
select * from lockdelta@

LOCK_TIMEOUTS        LOCK_ESCALS          DEADLOCKS            LOCK_WAITS
-------------------- -------------------- -------------------- --------------------
0                    0                    0                    412
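The same delta logic can be applied client-side when the counters are collected by a script rather than through the views. This is a minimal Python sketch under stated assumptions: the function name is hypothetical, and the "before" snapshot values are invented purely so that the arithmetic reproduces the 412 lock waits shown in the sample output.

```python
COUNTERS = ("lock_timeouts", "lock_escals", "deadlocks", "lock_waits")


def counter_deltas(before, after):
    """Subtract an earlier snapshot of cumulative counters from a later one."""
    return {name: after[name] - before[name] for name in COUNTERS}


# Illustrative snapshots taken one sampling period apart (made-up values).
before = {"lock_timeouts": 2, "lock_escals": 1, "deadlocks": 3, "lock_waits": 2101}
after = {"lock_timeouts": 2, "lock_escals": 1, "deadlocks": 3, "lock_waits": 2513}

if __name__ == "__main__":
    print(counter_deltas(before, after))
```

Because the monitoring counters are cumulative, only the difference between two samples says anything about what happened during the sampling period.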
“How much time is my database spending in
lock waits?”
Compute the percentage of lock wait time:

select sum(total_rqst_time) as rqst_time,
       sum(lock_wait_time) as lock_wait_time,
       (case when sum(total_rqst_time) > 0
             then (sum(lock_wait_time) * 100) / sum(total_rqst_time)
             else 0 end) as lwt_pct
from table(mon_get_connection(null,-2)) as t

RQST_TIME         LOCK_WAIT_TIME    LWT_PCT
----------------- ----------------- -----------------
15111549          264780            1

About 1% of overall request time spent in lock waits
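Note that LWT_PCT is computed with integer arithmetic, so the fractional part is dropped. The following Python sketch (hypothetical helper names, using the two totals from the sample output) shows the same calculation with and without truncation.

```python
def lock_wait_pct(lock_wait_time, total_rqst_time):
    """Integer percentage, mirroring the truncating division in the query."""
    if total_rqst_time <= 0:
        return 0
    return lock_wait_time * 100 // total_rqst_time


def lock_wait_pct_precise(lock_wait_time, total_rqst_time, digits=2):
    """Same ratio kept as a rounded decimal value."""
    if total_rqst_time <= 0:
        return 0.0
    return round(lock_wait_time * 100 / total_rqst_time, digits)


if __name__ == "__main__":
    print(lock_wait_pct(264780, 15111549))
    print(lock_wait_pct_precise(264780, 15111549))
```

For small percentages the truncation matters: the sample totals work out to roughly 1.75%, which the integer form reports as 1.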
“Which connections are the most impacted by
lock waits?”
Lock wait percentage of request time, top 5 connections:

select application_name, total_rqst_time,
       total_wait_time, lock_wait_time,
       (case when (total_rqst_time > 0)
             then (lock_wait_time * 100) / total_rqst_time
             else 0 end) as lwt_pct
from table (mon_get_connection(null,-2)) as t
order by lwt_pct desc fetch first 5 rows only

APPLICATION_NAME TOTAL_RQST_TIME TOTAL_WAIT_TIME LOCK_WAIT_TIME LWT_PCT
---------------- --------------- --------------- -------------- -------
drvdtw           182250          131218          4015           2
reporting        186779          134906          4191           2
db2bp            181740          129870          4270           2
drvdtw           193160          139617          5749           2
stockupdate      189825          139756          5125           2

The top 5 most impacted connections are seeing relatively uniform lock waits
“Which of my statements are most impacted by
lock waits?”
This query shows us how much lock waits impacted individual statements in the
package cache (lwt_pct is the lock wait percentage):

select total_act_time, total_act_wait_time, lock_wait_time,
       (case when (total_act_time > 0)
             then (lock_wait_time * 100) / total_act_time
             else 0 end) as lwt_pct,
       stmt_text as stmt
from table(mon_get_pkg_cache_stmt(null,null,null,-2)) as t
order by lwt_pct desc
fetch first 5 rows only

TOTAL_ACT_TIME TOTAL_ACT_WAIT_TIME LOCK_WAIT_TIME LWT_PCT STMT
-------------- ------------------- -------------- ------- ------------------------------
320540         273029              255929         79      INSERT INTO new_order VALUES…
44219          8803                6405           14      SELECT MIN( no_o_id ) INTO …
11             1                   0              0       select total_act_time, total…
7              0                   0              0       select sum(lock_timeouts) as…
50             0                   0              0       select application_name, tot…
Examining Lock Notification Messages
• A final indicator that can be used to identify locking problems is the set of
lock notification messages written to the admin log
• Recall that the database can be configured to capture three levels of
data
• For example, to capture data on escalations, lock timeouts and deadlocks
to the admin log:
update db cfg using mon_lck_msg_lvl 3
• The admin log can then be examined to see a history of the events
that have occurred on the database, which may be indicative of
locking problems
Sample Admin Notification Message (Deadlock)
Each message identifies the application ID, event type, lock name, workload and role:

2010-07-20-20.27.11.938461   Instance:davek   Node:000
PID:15387(db2agent (LOCKDB))   TID:1816127808   Appid:*LOCAL.davek.100721002657
database monitor sqmLockEvents::collectLockEvent Probe:274   Database:LOCKDB

ADM5506I "Deadlock" event has occurred on lock "02000600000000000000000054" at
timestamp "2010-07-20-20.27.11.934059" with event ID "1". The affected
application is named "db2bp", and is associated with the workload name
"SYSDEFAULTUSERWORKLOAD" and application ID "*LOCAL.davek.100721002657" at
member "0". The role that this application plays with respect to this lock is:
"Victim".

2010-07-20-20.27.11.939706   Instance:davek   Node:000
PID:15387(db2agent (LOCKDB))   TID:1933568320   Appid:*LOCAL.davek.100721002641
database monitor sqmLockEvents::collectLockEvent Probe:274   Database:LOCKDB

ADM5506I "Deadlock" event has occurred on lock "02000500000000000000000054" at
timestamp "2010-07-20-20.27.11.934059" with event ID "1". The affected
application is named "db2bp", and is associated with the workload name
"SYSDEFAULTUSERWORKLOAD" and application ID "*LOCAL.davek.100721002641" at
member "0". The role that this application plays with respect to this lock is:
"Participant".
Extracting Notification Messages Using SQL
Extract message text for any locking notifications in the last hour:

select msg
from table(pd_log_get_msgs(current_timestamp - 1 hour)) as log
where msgnum=5506

MSG
--------------------------------------------------------------------------------
ADM5506I "Deadlock" event has occurred on lock "03000600050000000000000052" at
timestamp "2011-03-18-18.11.08.596061" with event ID "1". The affected
application is named "db2bp", and is associated with the workload name
"SYSDEFAULTUSERWORKLOAD" and application ID "*LOCAL.davek.110318220828" at
member "0". The role that this application plays with respect to this lock is:
"Participant".
…

As before, each message carries the application ID, event type, lock name, workload and role.
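Once the message text has been extracted, the individual fields can be pulled out with a regular expression. The sketch below is an illustration only: the pattern is written against the wording of the sample ADM5506I messages shown here (other releases or message variants may be worded differently), and the function and field names are hypothetical.

```python
import re

# Pattern written against the sample ADM5506I text; whitespace is normalised
# first so that line wrapping in the log does not matter.
PATTERN = re.compile(
    r'ADM5506I "(?P<event_type>[^"]+)" event has occurred on lock '
    r'"(?P<lock_name>[^"]+)" at timestamp "(?P<timestamp>[^"]+)" '
    r'with event ID "(?P<event_id>[^"]+)"\. The affected application is named '
    r'"(?P<application_name>[^"]+)", and is associated with the workload name '
    r'"(?P<workload>[^"]+)" and application ID "(?P<application_id>[^"]+)" at '
    r'member "(?P<member>[^"]+)"\. The role that this application plays with '
    r'respect to this lock is: "(?P<role>[^"]+)"'
)


def parse_lock_notification(msg):
    """Return the message fields as a dict, or None if the text does not match."""
    text = " ".join(msg.split())
    match = PATTERN.search(text)
    return match.groupdict() if match else None


SAMPLE = '''ADM5506I "Deadlock" event has occurred on lock "03000600050000000000000052" at
timestamp "2011-03-18-18.11.08.596061" with event ID "1". The affected
application is named "db2bp", and is associated with the workload name
"SYSDEFAULTUSERWORKLOAD" and application ID "*LOCAL.davek.110318220828" at
member "0". The role that this application plays with respect to this lock is:
"Participant".'''

if __name__ == "__main__":
    for key, value in parse_lock_notification(SAMPLE).items():
        print(f"{key:18} {value}")
```

The lock_name field extracted this way is the value that can be passed to MON_FORMAT_LOCK_NAME to find out which object the lock protects.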
Using the Locking Event Monitor to
Capture Lock Events
Using the Locking Event Monitor
• This section will show how to use the locking event
monitor to capture locking events of interest on your
system
• This mechanism allows the capture of detailed information
on locking events in order to help analyze locking
problems on the database
• The process of using the locking event monitor consists of
three basic steps:
• Creating and enabling the event monitor
• Setting the type and level of data collection
• Formatting and analyzing the data
Creating the Locking Event Monitor
Create the locking event monitor:

create event monitor lockEvmon for locking write to
  unformatted event table
  ( table lockdata in monitorTBS)
  autostart

Activate the event monitor:

set event monitor lockEvmon state 1

Configure and collect desired data (covered later):

( Enable collection of desired data… )
( Collect data… )
( Format and examine data… )

If I need to remove the event monitor at some point:

set event monitor lockEvmon state 0
drop event monitor lockEvmon
drop table lockdata
Capturing Deadlock Events
Enable collection of deadlock events without statement history for the entire
database. Deadlocks will span workloads, so collection must be enabled at the
database level:

update db cfg using mon_deadlock without_hist

(Other options are history, hist_and_values, none)

Enable collection of additional deadlock data for a specific workload only:

alter workload sysdefaultuserworkload collect deadlock
  data with history and values

(Other options are with history, without history, and none)
Capturing Lock Timeout Events
Enable collection of lock timeout events for the entire database:

update db cfg using mon_locktimeout without_hist

(Other options are history, hist_and_values, none)

Enable collection of lock timeout events for a specific workload only:

alter workload sysdefaultuserworkload collect lock timeout
  data with history and values

(Other options are with history, without history, and none)
Capturing Lock Wait Events
Enable collection of lock wait events for the entire database, capturing any lock
waits exceeding 5 seconds (mon_lw_thresh is in units of usec):

update db cfg using mon_lw_thresh 5000000
update db cfg using mon_lockwait without_hist

(Other options are history, hist_and_values, none)

Enable collection of lock wait events for a specific workload only, capturing any
lock waits exceeding 3 seconds (can also use microseconds):

alter workload customworkload collect lock wait data with
  history for locks waiting more than 3 seconds

(Other options are with history, without history, and none)
Formatting the Locking Event Monitor Output
• Once we have captured some locking data of interest,
the next step is to format that data for analysis
• We provide three different methods for formatting the data
captured in the unformatted event table
• The db2evmonfmt tool
• Produces a text based report from the UE table data
• The EVMON_FORMAT_UE_TO_XML UDF
• Produces an XML based report for each lock event
• The EVMON_FORMAT_UE_TO_TABLES procedure
• Creates a set of relational tables and populates these with the event
data from the UE table
• Let’s look at each of these in more detail
A Sample Locking Scenario
Application #1
update t1 set data = 1 where ident = 0
Application #2
(…)
update t2 set moredata=2 where moredata=1
select * from staff
update t1 set data=1 where ident=0
update t2 set moredata=2 where moredata=1 (Fails with -911)
Formatting the Locking Event Monitor Output
using the db2evmonfmt tool
Compile the formatter tool found in the java samples:

cd ~/sqllib/samples/java/jdbc
javac db2evmonfmt.java

Format the data and produce a text based report:

java db2evmonfmt -d sample -ue lockdata -ftext -hours 1 > lockrep.txt

• -d sample: database to connect to
• -ue lockdata: name of the UE table where our event data is stored
• -ftext: produce text based report
• -hours 1: format events within the last hour
Deadlock lock event:

-------------------------------------------------------
Event ID               : 3
Event Type             : DEADLOCK
Event Timestamp        : 2010-09-21-21.34.07.640849
Partition of detection : 0
-------------------------------------------------------

Deadlock graph data:

Deadlock Graph
--------------
Total number of deadlock participants : 2
Participant that was rolled back      : 2
Type of deadlock                      : local

Participant     Participant
Requesting Lock Holding Lock    Deadlock Member Application Handle
--------------- --------------- --------------- ------------------
1               2               0               09060
2               1               0               09070

Participant lock information:

Participant No 2 requesting lock
----------------------------------
Lock Name            : 0x03001A00050000000000000052
Lock wait start time : 2010-09-21-21.34.01.589946
Lock wait end time   : 2010-09-21-21.34.07.640849
Lock Type            : ROW
…
Table Schema         : DAVEK
Table Name           : T2

Participant No 1 requesting lock
----------------------------------
Lock Name            : 0x03001900040000000000000052
Lock wait start time : 2010-09-21-21.33.51.504053
Lock wait end time   : 2010-09-21-21.34.07.640849
Lock Type            : ROW
…
Table Schema         : DAVEK
Table Name           : T1
Participant statement history information:

Current Activities of Participant No 2
----------------------------------------
Activity ID    : 2
Uow ID         : 11
…
Stmt type      : Dynamic
Stmt operation : DML, Insert/Update/Delete
Stmt text      : update t2 set moredata=2 where moredata=1

Past Activities of Participant No 2
------------------------------------
Past Activities wrapped: no
Activity ID    : 1
Uow ID         : 11
…
Stmt type      : Dynamic
Stmt operation : DML, Insert/Update/Delete
Stmt text      : update t1 set data=1 where ident=0

Current Activities of Participant No 1
----------------------------------------
Activity ID    : 3
Uow ID         : 6
…
Stmt type      : Dynamic
Stmt operation : DML, Insert/Update/Delete
Stmt text      : update t1 set data=1 where ident=0

Past Activities of Participant No 1
------------------------------------
Past Activities wrapped: no
Activity ID    : 1
Uow ID         : 6
…
Stmt type      : Dynamic
Stmt operation : DML, Insert/Update/Delete
Stmt text      : update t2 set moredata=2 where moredata=1
Formatting the Locking Event Monitor Output
using EVMON_FORMAT_UE_TO_XML
Format the lock events in the UE table into XML based records. The raw UE table
data is fed in as input, and xmlparse is used to produce text based XML:

select xmlparse(document evmon.xmlreport)
from table ( evmon_format_ue_to_xml
             (null,
              for each row of
              ( select * from lockdata
                order by event_id, event_timestamp,
                         event_type, member)
             )
           ) as evmon

The report contains the event type, the lock graph and, for each participant, the
lock information, application information and statement history data:

<db2_lock_event
    xmlns="http://www.ibm.com/xmlns/prod/db2/mon" id="3" type="DEADLOCK"
    timestamp="2010-09-21T21:34:07.640849" member="0" release="9070200">
  <db2_deadlock_graph dl_conns="2" rolled_back_participant_no="2" type="local">
    <db2_participant no="1" deadlock_member="0" participant_no_holding_lk="2"
        application_handle="09060"/>
    <db2_participant no="2" deadlock_member="0" participant_no_holding_lk="1"
        application_handle="09070"/>
  </db2_deadlock_graph>
  <!-- First participant -->
  <db2_participant no="2" type="Requester" participant_no_holding_lk="1">
    <db2_object_requested type="lock">
      <lock_name>03001A00050000000000000052</lock_name>
      <lock_object_type id="2">ROW</lock_object_type>
      (…)
      <table_name id="26">T2</table_name>
      <table_schema>DAVEK</table_schema>
      (…)
    </db2_object_requested>
    <db2_app_details>
      <application_handle>09060</application_handle>
      <appl_id>*LOCAL.davek.100922011700</appl_id>
      (…)
    </db2_app_details>
    <db2_activity type="current">
      <db2_activity_details>
        (…)
        <stmt_text>update t2 set moredata=2 where moredata=1</stmt_text>
        (…)
      </db2_activity_details>
    </db2_activity>
    <db2_activity type="past">
      <db2_activity_details>
        (…)
        <stmt_text>update t1 set data=1 where ident=0</stmt_text>
        (…)
      </db2_activity_details>
    </db2_activity>
  </db2_participant>
  <!-- Second participant -->
  <db2_participant no="1" type="Requester" participant_no_holding_lk="2">
    (…)
    <db2_activity type="current">
      <db2_activity_details>
        (…)
        <stmt_text>update t1 set data=1 where ident=0</stmt_text>
        (…)
      </db2_activity_details>
    </db2_activity>
    (…)
    <db2_activity type="past">
      <db2_activity_details>
        (…)
        <stmt_text>update t2 set moredata=2 where moredata=1</stmt_text>
        (…)
      </db2_activity_details>
    </db2_activity>
  </db2_participant>
</db2_lock_event>
Formatting the Locking Event Monitor Output
using EVMON_FORMAT_UE_TO_TABLES
Format the lock events in the UE table into SQL tables. The query text that reads
the raw UE table is fed in as input:

call evmon_format_ue_to_tables (
  'LOCKING', null, null, null, null, null, '', -1,
  'select * from lockdata
   order by event_timestamp'
)

Populates data in the following tables:

Table/View                      Schema          Type  Creation time
------------------------------- --------------- ----- --------------------------
LOCKDATA                        DAVEK           T     2010-07-20-18.56.13.029144
LOCK_ACTIVITY_VALUES            DAVEK           T     2010-07-21-00.25.18.373751
LOCK_EVENT                      DAVEK           T     2010-07-21-00.25.16.306045
LOCK_PARTICIPANTS               DAVEK           T     2010-07-21-00.25.16.967038
LOCK_PARTICIPANT_ACTIVITIES     DAVEK           T     2010-07-21-00.25.17.310122
T1                              DAVEK           T     2010-07-20-22.28.21.204058
T2                              DAVEK           T     2010-07-20-20.22.11.474826
Examine lock event information:

select xmlid, event_id, event_type, event_timestamp,
       member, dl_conns, rolled_back_participant_no
from lock_event

XMLID             EVENT_TYPE EVENT_TIMESTAMP            MEMBER DL_CONNS ROLLED_BACK_PARTICIPANT_NO
----------------- ---------- -------------------------- ------ -------- --------------------------
db2LockEvent_1_…  DEADLOCK   2011-03-21-18.48.54.433717 0      2        2
Examine lock participant information and dependencies that make up the deadlock graph:

select xmlid, participant_no, participant_type,
       participant_no_holding_lk, application_handle
from lock_participants where xmlid like 'db2LockEvent_1_%'

XMLID            PARTICIPANT_NO PARTICIPANT_TYPE PARTICIPANT_NO_HOLDING_LK APPLICATION_HANDLE
---------------- -------------- ---------------- ------------------------- ------------------
db2LockEvent_1_… 2              Requester        1                         11446
db2LockEvent_1_… 1              Requester        2                         1528
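The PARTICIPANT_NO and PARTICIPANT_NO_HOLDING_LK columns describe the edges of the deadlock graph, so the cycle can be recovered by following them from any participant. The following is a minimal Python sketch with a hypothetical function name, using the two rows from the sample output.

```python
def deadlock_cycle(holding, start):
    """Follow 'waits on' edges from start; return the cycle found, or None."""
    path, current = [], start
    while current not in path:
        path.append(current)
        if current not in holding:
            return None  # chain ends without looping back: no cycle from here
        current = holding[current]
    return path[path.index(current):] + [current]


# participant_no -> participant_no_holding_lk, from the sample rows
HOLDING = {2: 1, 1: 2}

if __name__ == "__main__":
    print(" -> ".join(str(p) for p in deadlock_cycle(HOLDING, 2)))
```

For the two-participant deadlock above the walk returns to its starting point after two steps; the same walk works for deadlocks with more participants.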
Examine lock wait information from a specific participant:

select lock_name, lock_wait_start_time, lock_wait_end_time,
       lock_object_type, table_schema, table_name
from lock_participants
where participant_no=2 and xmlid like 'db2LockEvent_1_%'

LOCK_NAME       LOCK_WAIT_START_TIME LOCK_WAIT_END_TIME  LOCK_OBJECT_TYPE TABLE_SCHEMA TABLE_NAME
--------------- -------------------- ------------------- ---------------- ------------ ----------
0300060005000…  2011-03-21-18.48.48  2011-03-21-18.48.54 ROW              DAVEK        T2
Examine participant activity history:

select activity_id, activity_type, uow_id, stmt_type,
       stmt_text
from lock_participant_activities
where participant_no=2 and xmlid like 'db2LockEvent_1_%'

ACTIVITY_ID ACTIVITY_TYPE UOW_ID STMT_TYPE STMT_TEXT
----------- ------------- ------ --------- -----------------------------------------
2           current       1      2         update t2 set moredata=2 where moredata=1
1           past          1      2         update t1 set data = 1 where ident = 0
Notes on DPF and PureScale
• On DPF and PureScale lock events may occur on different members
• On DPF lock conflicts are always local, but occur on different members
• On PureScale lock conflicts can occur locally on different members, as
well as between different members due to global locks
• Deadlock dependencies on both PureScale and DPF may involve
multiple members
• When a lock event occurs, each participant in the event is notified of
the event and will cut an individual lock record to the UE table
• This occurs irrespective of whether participants are local to a single
member, or on different members
• When the lock events are formatted, the formatter will amalgamate
the individual records from each of the participants
Example: Global Deadlock in DPF or PureScale
-------------------------------------------------------
Event ID               : 3
Event Type             : DEADLOCK
Event Timestamp        : 2010-09-21-21.34.07.640849
Partition of detection : 0
-------------------------------------------------------

Deadlock Graph
--------------
Total number of deadlock participants : 2
Participant that was rolled back      : 2
Type of deadlock                      : global

Global deadlock with conflicts on different members:

Participant     Participant
Requesting Lock Holding Lock    Deadlock Member Application Handle
--------------- --------------- --------------- ------------------
1               2               0               09060
2               1               3               09070

Participant No 2 requesting lock
----------------------------------
Lock Name            : 0x03001A00050000000000000052
Lock wait start time : 2010-09-21-21.34.01.589946
Lock wait end time   : 2010-09-21-21.34.07.640849
Lock Type            : ROW
…
Table Schema         : DAVEK
Table Name           : T2

Participant No 1 requesting lock
----------------------------------
Lock Name            : 0x03001900040000000000000052
Lock wait start time : 2010-09-21-21.33.51.504053
Lock wait end time   : 2010-09-21-21.34.07.640849
Lock Type            : ROW
…
Table Schema         : DAVEK
Table Name           : T1
Using the Locking Functions to
Analyze “Live” Locking Issues on
your System
Using the Locking Functions
• The final tool we have to assist in the analysis of locking
problems is a set of new locking functions and views
introduced in DB2 9.7 FP1
• These functions allow you to inspect information on
current lock dependencies on your system from several
different perspectives
• This allows for adhoc analysis of “live” locking issues
without the need to capture and analyze historical data
• Let’s look at some examples of how these functions work
Another Sample Locking Scenario
Application #1
delete from sales where region='Quebec'
Application #2
delete from sales where region='Quebec'
Application #3
lock table sales in exclusive mode
Application #4
delete from sales where region='Manitoba'
Application #5
drop table sales
Identifying Applications Involved in Lock Waits
select lock_wait_start_time, lock_name, req_application_handle,
       hld_application_handle, hld_member
from table(mon_get_appl_lockwait(null,-2)) as t

REQ_APPL… identifies the lock waiters and HLD_APPL… the corresponding holders:

LOCK_WAIT_START_TIME       LOCK_NAME                  REQ_APPL… HLD_APPL… HLD_MEMBER
-------------------------- -------------------------- --------- --------- ----------
2010-07-22-14.41.20.428811 02001000000000000000000054 236       203       0
2010-07-22-14.26.31.028352 02001000000000000000000054 203       201       0
2010-07-22-14.26.18.772041 02001000070000000000000052 202       201       0
2010-07-22-14.26.51.747847 00000500081A00002157BE8543 204       203       0
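The requester/holder pairs in this output form a wait-for graph, and the applications at its root are usually the ones to look at first. The sketch below is a hedged Python illustration with hypothetical function names; the pairs are the four rows of the sample output.

```python
from collections import defaultdict

# (req_application_handle, hld_application_handle) from the sample output
WAITS = [(236, 203), (203, 201), (202, 201), (204, 203)]


def root_blockers(waits):
    """Holders that are not themselves waiting on anyone."""
    requesters = {req for req, _ in waits}
    holders = {hld for _, hld in waits}
    return sorted(holders - requesters)


def blocked_behind(waits, holder):
    """All applications waiting, directly or indirectly, on the given holder."""
    waiting_on = defaultdict(set)
    for req, hld in waits:
        waiting_on[hld].add(req)
    seen, stack = set(), [holder]
    while stack:
        for req in waiting_on.get(stack.pop(), ()):
            if req not in seen:
                seen.add(req)
                stack.append(req)
    return sorted(seen)


if __name__ == "__main__":
    for root in root_blockers(WAITS):
        print(root, "blocks", blocked_behind(WAITS, root))
```

In the sample scenario every waiting application is ultimately queued behind a single holder, so resolving that one application's lock would unblock the rest.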
Identifying the statements the applications are
currently running
List the distinct applications involved in lock waits, look up their SQL activities
on the application coordinator member, and retrieve the statement text:

select a.apphdl, p.stmt_text
from (select distinct apphdl from
       ((select l.req_application_handle as apphdl
         from table(mon_get_appl_lockwait(null,-2)) as l)
        union
        (select l.hld_application_handle as apphdl
         from table(mon_get_appl_lockwait(null,-2)) as l))) as a,
     table(mon_get_connection(a.apphdl,-1)) as c,
     table(wlm_get_workload_occurrence_activities_v97(a.apphdl,
           c.coord_member)) as s,
     table(mon_get_pkg_cache_stmt(null,s.executable_id,null,c.coord_member)) as p

APPHDL               STMT_TEXT
-------------------- -----------------------------------------
202                  delete from sales where region='Quebec'
203                  lock table sales in exclusive mode
204                  drop table sales
236                  delete from sales where region='Manitoba'
Using the MON_LOCKWAITS view
Show me applications involved in lock waits and their current SQL
(run this with isolation set to UR to ensure we don’t block on catalog locks):

select lock_wait_elapsed_time, lock_name, req_application_handle,
       hld_application_handle, req_stmt_text, hld_current_stmt_text
from sysibmadm.mon_lockwaits

LOCK_WAIT… LOCK_NAME                  REQ_APPL… HLD_APPL… REQ_STMT_TEXT      HLD_CURRENT_STMT_TEXT
---------- -------------------------- --------- --------- ------------------ -------------------------
7525       02001000000000000000000054 236       203       delete from sales… lock table sales in excl…
8414       02001000000000000000000054 203       201       lock table sales…
8427       02001000070000000000000052 202       201       delete from sales…
8394       00000500081A00002157BE8543 204       203       drop table sales   lock table sales in excl…
Finding lock holders using MON_GET_LOCKS
Show me the waiters and holders for a particular lock
select lock_name, member, lock_status, application_handle
from table(mon_get_locks(
       CLOB('<lock_name>02001000000000000000000054</lock_name>'),
       -2)) as l

LOCK_NAME                        MEMBER LOCK_STATUS APPLICATION_HANDLE
-------------------------------- ------ ----------- ------------------
02001000000000000000000054            0 G                          201
02001000000000000000000054            0 G                          202
02001000000000000000000054            0 W                          203
02001000000000000000000054            0 W                          236
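Once rows like these have been fetched, separating holders from waiters is a simple filter on LOCK_STATUS. The following is a small sketch in plain Python over the rows shown above (the helper name `split_by_status` is hypothetical, and it assumes `G` marks a granted lock and `W` a waiting request):

```python
# (lock_name, member, lock_status, application_handle) rows copied from the output
rows = [
    ("02001000000000000000000054", 0, "G", 201),
    ("02001000000000000000000054", 0, "G", 202),
    ("02001000000000000000000054", 0, "W", 203),
    ("02001000000000000000000054", 0, "W", 236),
]

def split_by_status(rows):
    """Return (holders, waiters) as lists of application handles."""
    holders = [handle for _, _, status, handle in rows if status == "G"]
    waiters = [handle for _, _, status, handle in rows if status == "W"]
    return holders, waiters

holders, waiters = split_by_status(rows)
print("holders:", holders, "waiters:", waiters)
```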
Formatting the Lock Name
Show me the attributes for a particular lock
select name, value
from table(mon_format_lock_name('02001000000000000000000054')) as f

NAME                 VALUE
-------------------- ----------
LOCK_OBJECT_TYPE     TABLE
TBSP_NAME            USERSPACE1
TABSCHEMA            DAVEK
TABNAME              SALES
Notes on DPF and PureScale
• In DPF, the locking functions behave essentially the same as on
serial instances, except that lock conflicts / waits may occur on
different members.
• To see the whole picture, queries should be issued on all
members
• In PureScale with global locks, a request executing on one member
may be blocked waiting for a lock held by a request executing on a
different member
• In this case MON_GET_APPL_LOCKWAIT will not be able to show which
application is the holder, only that the holder is located on a remote
member rather than locally
• When this occurs, the holder application can be identified in a subsequent
step via a call to MON_GET_LOCKS using the name of the lock being
waited on
Identifying Lock Waits in PureScale
select lock_wait_start_time, lock_name, req_application_handle,
       req_member, hld_application_handle, hld_member
from table(mon_get_appl_lockwait(null,-2)) as t

Lock waiters are listed, but HLD_APPL… is empty where the holder is remote: unknown lock holders (!)

LOCK_WAIT_START_TIME       LOCK_NAME                  REQ_APPL… REQ_MEMBER HLD_APPL… HLD_MEMBER
-------------------------- -------------------------- --------- ---------- --------- ----------
2011-03-22-20.38.56.715902 00000500031C0000C0F1BDE2C      65603          2    131125          2
2011-03-22-20.38.49.592468 02001000000000000000000054    131125          2                    0
2011-03-22-20.38.47.298632 02001000060000000000000052     65589          1                    0
2011-03-22-20.38.51.385681 02001000000000000000000054        67          0                    1
Locating the Missing Lock Holders
select lw.lock_wait_start_time, lw.lock_name, lw.req_application_handle,
       lw.req_member, lk.application_handle as hld_application_handle,
       lw.hld_member
from
  -- For each lock wait
  table(mon_get_appl_lockwait(null,-2)) as lw,
  -- Find me the first holder of the lock on the holder member
  lateral(
    select lock_name, member, lock_status, application_handle
    from table(
           mon_get_locks(CLOB('<lock_name>'||lw.lock_name||'</lock_name>'),
                         lw.hld_member))
         as locks where locks.lock_status='G'
    fetch first row only
  ) as lk

Lock waiters, with non-local lock holders now shown

LOCK_WAIT_START_TIME       LOCK_NAME                  REQ_APPL… REQ_MEMBER HLD_APPL… HLD_MEMBER
-------------------------- -------------------------- --------- ---------- --------- ----------
2011-03-22-20.38.56.715902 00000500031C0000C0F1BDE2C      65603          2    131125          2
2011-03-22-20.38.49.592468 02001000000000000000000054    131125          2        53          0
2011-03-22-20.38.47.298632 02001000060000000000000052     65589          1        53          0
2011-03-22-20.38.51.385681 02001000000000000000000054        67          0     65589          1
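The two-step lookup for remote holders can also be illustrated outside SQL. The sketch below is plain Python over hand-copied rows, not a DB2 API: the `locks` dictionary is an illustrative stand-in for what MON_GET_LOCKS would return on each holder member (its contents are assumptions inferred from the result above), and `resolve_holders` is a hypothetical helper. Unlike the lateral join, it only performs the lookup when the holder handle is missing.

```python
# Lock waits: (lock_name, req_handle, req_member, hld_handle or None, hld_member)
lock_waits = [
    ("00000500031C0000C0F1BDE2C", 65603, 2, 131125, 2),
    ("02001000000000000000000054", 131125, 2, None, 0),
    ("02001000060000000000000052", 65589, 1, None, 0),
    ("02001000000000000000000054", 67, 0, None, 1),
]

# Assumed per-member lock lists: (lock_name, member) -> [(lock_status, handle)]
locks = {
    ("02001000000000000000000054", 0): [("G", 53)],
    ("02001000060000000000000052", 0): [("G", 53)],
    ("02001000000000000000000054", 1): [("G", 65589)],
}

def resolve_holders(waits, locks):
    """Fill in missing holder handles with the first granted holder on hld_member."""
    resolved = []
    for name, req, req_member, hld, hld_member in waits:
        if hld is None:
            granted = [h for status, h in locks.get((name, hld_member), [])
                       if status == "G"]
            hld = granted[0] if granted else None   # stays None if nothing is found
        resolved.append((name, req, req_member, hld, hld_member))
    return resolved

for row in resolve_holders(lock_waits, locks):
    print(row)
```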
Final thoughts
In Closing
• We’ve introduced you to some of the new lock monitoring
capabilities in DB2 9.7
• These facilities are intended to provide a consolidated set of
diagnostics to enable you to identify and solve a wide variety
of lock-related issues on your system
• They replace various existing facilities such as the deadlock event
monitor and lock timeout report
• They provide a variety of approaches to identify and drill down into
locking problems, from ad hoc in-memory diagnostics to
historical capture mechanisms
• Hopefully the examples in this session have given you some ideas on
ways you can leverage the latest DB2 monitoring capabilities in your
own environment
Questions?
DB2 Monitoring Resources
• DB2 9.7 documentation:
• http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp
• Related IDUG NA 2011 presentations:
• Advanced Performance Diagnostics for SQL (D02)
David Kalmuk
IBM
dckalmuk@ca.ibm.com