New Ways to Solve your Locking Problems with DB2 9.7 David Kalmuk IBM Platform: DB2 for Linux, Unix, Windows Objectives • Learn ways to quickly identify when you have a locking problem on your system using the new time spent metrics • Learn about the new locking event monitor which provides a single comprehensive solution for analyzing locking problems • Learn about the new locking table functions and views introduced in 9.7 Fixpack 1 that allow you to quickly analyze “live” locking problems on your system • Learn how to use these tools to diagnose the cause of deadlocks and lock timeouts as well as how to identify performance problems related to lock waits on your system • Take away practical examples you can try out in your own environment. Agenda • A quick review of the new Lock Monitoring capabilities introduced in DB2 9.7 • Identifying locking problems on your system • Using the Locking Event Monitor to capture lock events • Using the Locking Functions to analyze “live” locking issues on your system • Final thoughts A Quick Review of the New Lock Monitoring Capabilities Introduced in DB2 9.7 New Monitoring Perspectives and Dimensions • Starting in 9.7, DB2 allows monitoring metrics to be accessed through a number of different dimensions • Allows more effective drilldown, and different perspectives on the data to help isolate problems • Three main dimensions, each consisting of a number of reporting points with corresponding UDFs • System • Provide total perspective of application work being done by database system • Aggregated through the WLM infrastructure • Data objects • Provide perspective of impact of all activity occurring within the scope of data objects • Aggregated through data storage infrastructure • Activity • Provide perspective of work being done by specific SQL statements • Aggregated through the package cache infrastructure • New SQL functions in both System and Activity dimensions include locking metrics Access Points: System Perspective • • • • MON_GET_UNIT_OF_WORK MON_GET_WORKLOAD MON_GET_CONNECTION MON_GET_SERVICE_SUBCLASS • Also provide interfaces that produce XML output: • • • • MON_GET_UNIT_OF_WORK_DETAILS MON_GET_WORKLOAD_DETAILS MON_GET_CONNECTION_DETAILS MON_GET_SERVICE_SUBCLASS_DETAILS Access Points: Activity Perspective • MON_GET_PKG_CACHE_STMT • Both static and dynamic SQL • MON_GET_PKG_CACHE_STMT_DETAILS • XML based output • MON_GET_ACTIVITY_DETAILS (XML) • Details for an activity currently in progress Locking Related Metrics • Aforementioned table functions report several locking related metrics • • • • • • lock_timeouts lock_escals deadlocks lock_wait_time lock_waits num_locks_held (for connection / unit of work) • These metrics allow you to perform identification and drilldown on lock related problems • Provide initial indicator that further investigation is needed Time Spent Metrics • A new set of metrics are being introduced into DB2 that represent a breakdown of where time is spent within DB2 • Represents sum of time spent by each agent thread in the system (foreground processing) • Provides user with a relative breakdown of time spent, showing which areas are the most expensive during request / query processing • • • Available in both system and activity perspectives Can be used for rapid identification and diagnosis of performance problems Times are divided into: • Wait times • Time agents spent blocking on I/O, network communications, etc • Processing times (starting in 9.7FP1) • Time spent in different component areas when the agent was not stuck on a wait • Summary / total times • Total time spent in a particular component area including both processing + wait times • Lock wait time shows the proportion of overall DB2 effort spent within lock waits “Time Spent” Metrics: Breakdown of Wait + Processing Times in DB2 Total Request Time in DB2 Direct I/O Bufferpool I/O Lock Wait Time Compile Proc Time Section Proc Time Commit / Rollback Proc Time Other Proc Time Navigating the “time spent” hierarchy • The row based formatting functions introduced in 9.7 FP1 offer an easy way to navigate the time spent hierarchy in a generic fashion • MON_FORMAT_XML_TIMES_BY_ROW • Shows breakdown of waits + processing times • MON_FORMAT_XML_WAIT_TIMES_BY_ROW • Shows breakdown of just wait times • MON_FORMAT_XML_COMPONENT_TIMES_BY_ROW • Shows breakdown of processing time as well as overall time spent in each “component” of DB2 • MON_FORMAT_XML_METRICS_BY_ROW • Outputs all metrics in generic row based form Example select r.metric_name, r.parent_metric_name, r.total_time_value as time, r.count, c.member from table(mon_get_connection_details(125,-2)) as c, table(mon_format_xml_times_by_row(c.details)) as r order by total_time_value desc METRIC_NAME ------------------------CLIENT_IDLE_WAIT_TIME TOTAL_RQST_TIME TOTAL_WAIT_TIME LOG_DISK_WAIT_TIME TOTAL_SECTION_PROC_TIME IPC_SEND_WAIT_TIME POOL_READ_TIME LOCK_WAIT_TIME TOTAL_COMMIT_PROC_TIME IPC_RECV_WAIT_TIME WLM_QUEUE_TIME_TOTAL FCM_TQ_RECV_WAIT_TIME FCM_MESSAGE_RECV_WAIT_TIM FCM_TQ_SEND_WAIT_TIME FCM_MESSAGE_SEND_WAIT_TIM AGENT_WAIT_TIME DIRECT_READ_TIME … Show me the full hierarchy of waits + processing times for my connection PARENT_METRIC_NAME TIME COUNT MEMBER ------------------------- -------------------- -------------------- -----709189 0 533264 484277 0 TOTAL_RQST_TIME 329205 0 TOTAL_WAIT_TIME 229656 15866 0 TOTAL_RQST_TIME 144533 445549 0 TOTAL_WAIT_TIME 54743 484278 0 TOTAL_WAIT_TIME 32551 4144 0 TOTAL_WAIT_TIME 10789 102 0 TOTAL_RQST_TIME 8791 17268 0 TOTAL_WAIT_TIME 1463 484278 0 TOTAL_WAIT_TIME 0 0 0 FCM_RECV_WAIT_TIME 0 0 0 FCM_RECV_WAIT_TIME 0 0 0 FCM_SEND_WAIT_TIME 0 0 0 FCM_SEND_WAIT_TIME 0 0 0 TOTAL_WAIT_TIME 0 0 0 TOTAL_WAIT_TIME 0 0 0 Introducing the Locking Event Monitor • New Event Monitor available starting in DB2 9.7 • Consolidated mechanism for capturing and performing in-depth analysis of locking data • Replaces existing deadlock event monitor and lock timeout report • Support for capturing: • Deadlocks • Lock Timeouts • Lock Waits • Control granularity is at workload or database level • Optional statement history • Uses new UE Table target type • Low overhead target type designed to minimize impact of capturing events on a live system Lock Notification Messages • Lightweight mechanism that logs basic information about lock events into the administrative log for: • Deadlocks • Lock timeouts • Lock escalations • Allows some basic data to be captured without requiring creation of any event monitors • Level of detail controlled by the mon_lck_msg_lvl configuration parameter • • • • 0 – No events captured 1 – Lock escalation events captured (default) 2 – Lock escalations and deadlocks captured 3 – Lock timeouts, escalations and deadlocks captured New Locking Functions and Views • New functions and views were introduced in DB2 9.7 FP1 for adhoc lock monitoring • MON_GET_APPL_LOCKWAIT • “Application-centric” • Input arguments allow at source filtering of individual applications and members • Displays all agents currently in lock wait state as well as information about: • • • What that agent is currently processing What lock they are waiting on What application currently has that lock • MON_GET_LOCKS • “Lock-centric” • Displays information on locks held on the database system • Search arguments (and input member) allows filtering on data extraction at source rather than by using query predicates • • • • application_handle lock_name / lock_object_type lock_mode / lock_type table_schema / table_name New Locking Functions and Views • MON_FORMAT_LOCK_NAME • Interprets binary lock name to show applicable lock attributes in row-based format: • • • • • • • Table name Table schema Lock object type Tablespace name Data partition id Rowid, pageid Others • Lock name can be obtained from MON functions, or from db2notify/diag.log files • MON_LOCKWAITS • View that uses data from MON_GET_APPL_LOCKWAIT and other monitoring UDFs to produce a view of lockwaits including additional data on applications • • • • Lock wait time elapsed Table name, schema, data partition id Application name Currently executing statement text for application (if available) Other miscellaneous lock information • WLM_GET_SERVICE_CLASS_AGENTS_V97 • Displays individual agent threads working on behalf of a particular service class, or a particular application • EVENT_TYPE, EVENT_OBJECT, EVENT_OBJECT_NAME fields can be used to identify which agents are waiting on locks on which partitions, and which locks they are waiting on • AGENT_STATE_LAST_UPDATE_TIME field allows you to identify when the agent entered the lock wait • EXECUTABLE_ID provides a unique identifier for the statement in the package cache the agent is currently working on (if applicable) Identifying Locking Problems on your System Identifying Locking Problems • Before we start analyzing locking problems on our database we first need to know how to identify that we are experiencing locking problems in the first place. • This section will cover some basic methods of identifying and classifying locking problems on the database using our general in-memory metrics • Can be useful if we want to use a “dashboard” type approach for identifying that locking problems are occurring, or simple manual indicators • Subsequent sections will discuss how to analyze locking problems we’ve identified in more detail using the locking event monitor and the locking functions “What locking events have occurred on my database?” select sum(lock_timeouts) as lock_timeouts, sum(lock_escals) as lock_escals, sum(deadlocks) as deadlocks, sum(lock_waits) as lock_waits from table(mon_get_workload(null,-2)) as t Show me the counts for locking events at the database level by summing up all the defined workloads LOCK_TIMEOUTS LOCK_ESCALS DEADLOCKS LOCK_WAITS -------------------- -------------------- -------------------- -------------------2 1 3 2513 “What locking events have occurred on my database over a recent sampling period?” create view lockmetrics(lock_timeouts, lock_escals, deadlocks, lock_waits) as select sum(lock_timeouts) as lock_timeouts, sum(lock_escals) as lock_escals, sum(deadlocks) as deadlocks, sum(lock_waits) as lock_waits from table(mon_get_workload(null,-2)) as t create global temporary table locksamples as (select * from lockmetrics) definition only on commit delete rows@ A bit of scripting / setup to help obtain deltas create view lockdelta (lock_timeouts, lock_escals, deadlocks, lock_waits) as select t2.lock_timeouts - t1.lock_timeouts, t2.lock_escals - t1.lock_escals, t2.deadlocks - t1.deadlocks, t2.lock_waits - t1.lock_waits from lockmetrics as t2, locksamples as t1 insert into locksamples select * from lockmetrics <sleep for 60s sampling period> select * from lockdelta Get lock metrics on database sampled over 1 minute (Be sure to run with auto-commit disabled when using CLP) LOCK_TIMEOUTS LOCK_ESCALS DEADLOCKS LOCK_WAITS -------------------- -------------------- -------------------- -------------------0 0 0 412 “How much time is my database spending in lock waits?” select sum(total_rqst_time) as rqst_time, sum(lock_wait_time) as lock_wait_time, (case when sum(total_rqst_time) > 0 then (sum(lock_wait_time) * 100) / sum(total_rqst_time) else 0 end) as lwt_pct from table(mon_get_connection(null,-2)) as t Compute the percentage of lock wait time About 1% of overall request time spent in lock waits RQST_TIME LOCK_WAIT_TIME LWT_PCT ----------------- ----------------- ----------------15111549 264780 1 “Which connections are the most impacted by lock waits?” select application_name, total_rqst_time, total_wait_time, lock_wait_time, (case when (total_rqst_time > 0) then (lock_wait_time * 100) / total_rqst_time else 0 end) as lwt_pct from table (mon_get_connection(null,-2)) as t order by lwt_pct desc fetch first 5 rows only Lock wait percentage of request time The top 5 most impacted connections are seeing relatively uniform lock waits APPLICATION_NAME ---------------drvdtw reporting db2bp drvdtw stockupdate TOTAL_RQST_TIME TOTAL_WAIT_TIME LOCK_WAIT_TIME LWT_PCT --------------- --------------- -------------- ------182250 131218 4015 2 186779 134906 4191 2 181740 129870 4270 2 193160 139617 5749 2 189825 139756 5125 2 “Which of my statements are most impacted by lock waits?” select total_act_time, total_act_wait_time, lock_wait_time, (case when (total_act_time > 0) Lock wait then (lock_wait_time * 100) / total_act_time percentage else 0 end) as lwt_pct, stmt_text as stmt from table(mon_get_pkg_cache_stmt(null,null,null,-2)) as t order by lwt_pct desc fetch first 5 rows only This query shows us how much lock waits impacted individual statements in the package cache TOTAL_ACT_TIME TOTAL_ACT_WAIT_TIME LOCK_WAIT_TIME LWT_PCT STMT -------------- ------------------- -------------- -------- -----------------------------320540 273029 255929 79 INSERT INTO new_order VALUES… 44219 8803 6405 14 SELECT MIN( no_o_id ) INTO … 11 1 0 0 select total_act_time, total… 7 0 0 0 select sum(lock_timeouts) as… 50 0 0 0 select application_name, tot… Examining Lock Notification Messages • A final indicator that can be used to identify locking problems are the lock notification messages written to the admin log • Recall that the database can be configured to capture 3 levels worth of data • For example: update db cfg using mon_lck_msg_lvl 3 Capture data on escalations, lock timeouts and deadlocks to the admin log • The admin log can then be examined to see a history of the events that have occurred on the database, which may be indicative of locking problems Sample Admin Notification Message (Deadlock) Application ID Event type Lock name 2010-07-20-20.27.11.938461 Instance:davek Node:000 PID:15387(db2agent (LOCKDB)) TID:1816127808 Appid:*LOCAL.davek.100721002657 database monitor sqmLockEvents::collectLockEvent Probe:274 Database:LOCKDB ADM5506I "Deadlock" event has occurred on lock "02000600000000000000000054" at timestamp "2010-07-20-20.27.11.934059" with event ID "1". The affected application is named "db2bp", and is associated with the workload name "SYSDEFAULTUSERWORKLOAD" and application ID "*LOCAL.davek.100721002657" at member "0". The role that this application plays with respect to this lock is: "Victim". 2010-07-20-20.27.11.939706 Instance:davek Node:000 PID:15387(db2agent (LOCKDB)) TID:1933568320 Appid:*LOCAL.davek.100721002641 database monitor sqmLockEvents::collectLockEvent Probe:274 Database:LOCKDB ADM5506I "Deadlock" event has occurred on lock "02000500000000000000000054" at timestamp "2010-07-20-20.27.11.934059" with event ID "1". The affected application is named "db2bp", and is associated with the workload name "SYSDEFAULTUSERWORKLOAD" and application ID "*LOCAL.davek.100721002641" at member "0". The role that this application plays with respect to this lock is: “Participant”. Role Workload Extracting Notification Messages Using SQL select msg from table(pd_log_get_msgs(current_timestamp – 1 hour)) as log where msgnum=5506 Application ID Event type Extract message text for any locking notifications in the last hour Lock name MSG -------------------------------------------------------------------------------- ADM5506I "Deadlock" event has occurred on lock "03000600050000000000000052" at timestamp "2011-03-18-18.11.08.596061" with event ID "1". The affected application is named "db2bp", and is associated with the workload name "SYSDEFAULTUSERWORKLOAD" and application ID "*LOCAL.davek.110318220828" at member "0". The role that this application plays with respect to this lock is: "Participant". … Role Workload Using the Locking Event Monitor to Capture Lock Events Using the Locking Event Monitor • This section will show how to use the locking event monitor to capture locking events of interest on your system • This mechanism allows the capture of detailed information on locking events in order to help analyze locking problems on the database • The process of using the locking event monitor consists of three basic steps: • Creating and enabling the event monitor • Setting the type and level of data collection • Formatting and analyzing the data Creating the Locking Event Monitor Create the locking event monitor create event monitor lockEvmon for locking write to unformatted event table ( table lockdata in monitorTBS) autostart Activate the event monitor set event monitor lockEvmon state 1 ( Enable collection of desired data…) ( Collect data… ) ( Format and examine data… ) set event monitor lockEvmon state 0 drop event monitor lockEvmon drop table lockdata Configure and collect desired data (covered later) If I need to remove the event monitor at some point Capturing Deadlock Events Enable collection of deadlock events without statement history for the entire database Deadlocks will spawn workloads so collection must be enabled at the database level update db cfg using mon_deadlock without_hist Other options are history, hist_and_values, none alter workload sysdefaultuserworkload collect deadlock data with history and values Other options are with history, without history, and none Enable collection of additional deadlock data for a specific workload only Capturing Lock Timeout Events Enable collection of lock timeout events for the entire database Other options are history, hist_and_values, none update db cfg using mon_locktimeout without_hist alter workload sysdefaultuserworkload collect lock timeout data with history and values Other options are with history, without history, and none Enable collection of lock timeout events a specific workload only Capturing Lock Wait Events Capture any lock waits exceeding 5 seconds (units of usec) Enable collection of lock wait events for the entire database update db cfg using mon_lw_thresh 5000000 update db cfg using mon_lockwait without_hist Other options are history, hist_and_values, none alter workload customworkload collect lock wait data with history for locks waiting more than 3 seconds Other options are with history, without history, and none Capture any lock waits exceeding 3 seconds (can also use microseconds) Enable collection of lock wait events a specific workload only Formatting the Locking Event Monitor Output • Once we have captured some locking data of interest analysis, the next step is to format that data for analysis • We provide three different methods for formatting the data captured in the unformatted event table • The db2evmonfmttool • Produces a text based report from the UE table data • The EVMON_FORMAT_UE_TO_XML UDF • Produces an XML based report for each lock event • The EVMON_FORMAT_UE_TO_TABLE procedure • Creates a set of relational tables and populates these with the event data from the UE table • Let’s look at each of these in more detail A Sample Locking Scenario Application #1 update t1 set data = 1 where ident = 0 Application #2 (…) update t2 set moredata=2 where moredata=1 select * from staff update t1 set data=1 where ident=0 update t2 set moredata=2 where moredata=1 (Fails with -911) Formatting the Locking Event Monitor Output using the db2evmonfmt tool Compile the formatter tool found in the java samples cd ~/sqllib/samples/java/jdbc javac db2evmonfmt.java Format the data and produce a text based report Name of the UE table where our event data is stored Produce text based report java db2evmonfmt –d sample –ue lockdata –ftext –hours 1 > lockrep.txt Database to connect to Format events within the last hour ------------------------------------------------------Event ID : 3 Event Type : DEADLOCK Event Timestamp : 2010-09-21-21.34.07.640849 Partition of detection : 0 ------------------------------------------------------Deadlock Graph -------------Total number of deadlock participants : 2 Participant that was rolled back : 2 Type of deadlock : local Participant Requesting Lock --------------1 2 Participant Holding Lock --------------2 1 Deadlock lock event Deadlock Member Application Handle Deadlock graph data --------------- -----------------0 09060 0 09070 Participant No 2 requesting lock ---------------------------------Lock Name : 0x03001A00050000000000000052 Lock wait start time : 2010-09-21-21.34.01.589946 Lock wait end time : 2010-09-21-21.34.07.640849 Lock Type : ROW … Table Schema : DAVEK Table Name : T2 Participant No 1 requesting lock ---------------------------------… (…omitted requestor attributes) Participant lock information Participant No 1 requesting lock ---------------------------------Lock Name : 0x03001900040000000000000052 Lock wait start time : 2010-09-21-21.33.51.504053 Lock wait end time : 2010-09-21-21.34.07.640849 Lock Type : ROW … Table Schema : DAVEK Table Name : T1 Current Activities of Participant No 2 ---------------------------------------Activity ID : 2 Uow ID : 11 … Stmt type : Dynamic Stmt operation : DML, Insert/Update/Delete Stmt text : update t2 set moredata=2 where moredata=1 Past Activities of Participant No 2 ------------------------------------Past Activities wrapped: no Participant statement history information Current Activities of Participant No 1 ---------------------------------------Activity ID : 3 Uow ID : 6 Activity ID : 1 … Uow ID : 11 Stmt type : Dynamic … Stmt operation : DML, Insert/Update/Delete Stmt type : Dynamic Stmt text : update t1 set data=1 where ident=0 Stmt operation : DML, Insert/Update/Delete Stmt text : update t1 set data=1 where ident=0 Past Activities of Participant No 1 ------------------------------------Current Activities of Participant No 1 ---------------------------------------- Past Activities wrapped: no … Activity ID : 3 Activity ID : 1 Uow ID : 6 Uow ID : 6 … … Stmt type : Dynamic Stmt type : Dynamic Stmt operation : DML, Insert/Update/Delete operation : DML, Insert/Update/Delete Stmt text : update t1 set data=1Stmt where ident=0 Stmt text : update t2 set moredata=2 where moredata=1 Past Activities of Participant No 1 ------------------------------------Past Activities wrapped: no Formatting the Locking Event Monitor Output using EVMON_FORMAT_UE_TO_XML Parse to produce text based XML select xmlparse(document evmon.xmlreport) from table ( evmon_format_ue_to_xml (null, for each row of ( select * from lockdata order by event_id, event_timestamp, event_type, member) ) ) as evmon Format the lock events in the UE table into XML based records Feed raw UE table data in as input <db2_lock_event Event type xmlns="http://www.ibm.com/xmlns/prod/db2/mon" id="3" type="DEADLOCK" Lock graph timestamp="2010-09-21T21:34:07.640849" member="0" release="9070200"> <db2_deadlock_graph dl_conns="2" rolled_back_participant_no="2" type="local"> <db2_participant no="1" deadlock_member="0" participant_no_holding_lk="2" application_handle="09060"/> <db2_participant no="2" deadlock_member="0" participant_no_holding_lk="1" application_handle="09070"/> </db2_deadlock_graph> First <db2_participant no="2" type="Requester" participant_no_holding_lk="1"> participant <db2_object_requested type="lock"> <lock_name>03001A00050000000000000052</lock_name> <lock_object_type id="2">ROW</lock_object_type> Lock (…) information <table_name id="26">T2</table_name> <table_schema>DAVEK </table_schema> (…) </db2_object_requested> <db2_app_details> Application <application_handle>09060</application_handle> information <appl_id>*LOCAL.davek.100922011700</appl_id> (…) </db2_app_details> <db2_activity type="current"> <db2_activity_details> (…) Statement <stmt_text>update t2 set moredata=2 where moredata=1</stmt_text> history data (…) </db2_activity_details> </db2_activity> <db2_activity type="past"> <db2_activity_details> (…) <stmt_text>update t1 set data=1 where ident=0</stmt_text> (…) </db2_activity_details> </db2_activity> </db2_participant> <db2_participant no="1" type="Requester" participant_no_holding_lk="2"> (…) <db2_activity type="current"> <db2_activity_details> (…) <stmt_text>update t1 set data=1 where ident=0</stmt_text> (…) </db2_activity_details> </db2_activity> (…) <db2_activity type="past"> <db2_activity_details> (…) <stmt_text>update t2 set moredata=2 where moredata=1</stmt_text> (…) </db2_activity_details> </db2_activity> </db2_participant> </db2_lock_event> Statement history data Second participant Statement history data Formatting the Locking Event Monitor Output using EVMON_FORMAT_UE_TO_TABLE call evmon_format_ue_to_tables ( ‘LOCKING’, null, null, null, null, null, ‘’, -1, ‘select * from lockdata Feed raw UE table query text as input order by event_timestamp’ ) Format the lock events in the UE table into SQL tables Populates data in the following tables Table/View ------------------------------LOCKDATA LOCK_ACTIVITY_VALUES LOCK_EVENT LOCK_PARTICIPANTS LOCK_PARTICIPANT_ACTIVITIES T1 T2 Schema --------------DAVEK DAVEK DAVEK DAVEK DAVEK DAVEK DAVEK Type ----T T T T T T T Creation time -------------------------2010-07-20-18.56.13.029144 2010-07-21-00.25.18.373751 2010-07-21-00.25.16.306045 2010-07-21-00.25.16.967038 2010-07-21-00.25.17.310122 2010-07-20-22.28.21.204058 2010-07-20-20.22.11.474826 select xmlid, event_id, event_type, event_timestamp, member, dl_conns, rolled_back_participant_no from lock_event Examine lock event information XMLID EVENT_TYPE EVENT_TIMESTAMP MEMBER DL_CONNS ROLLED_BACK_PARTICIPANT_NO ----------------- ---------- -------------------------- ------ -------- -------------------------db2LockEvent_1_… DEADLOCK 2011-03-21-18.48.54.433717 0 2 2 select xmlid, participant_no, participant_type, participant_no_holding_lk, application_handle from lock_participants where xmlid like ‘db2LockEvent_1_%’ Examine lock participant information and dependencies that make up the deadlock graph XMLID PARTICIPANT_NO PARTICIPANT_TYPE PARTICIPANT_NO_HOLDING_LK APPLICATION_HANDLE ---------------- -------------- ---------------- ------------------------- -----------------db2LockEvent_1_… 2 Requester 1 11446 db2LockEvent_1_… 1 Requester 2 1528 select lock_name, lock_wait_start_time, lock_wait_end_time, lock_object_type, table_schema, table_name from lock_participants where participant_no=2 and xmlid like ‘db2LockEvent_1_%’ Examine lock wait information from a specific participant LOCK_NAME LOCK_WAIT_START_TIME LOCK_WAIT_END_TIME LOCK_OBJECT_TYPE TABLE_SCHEMA TABLE_NAME --------------- -------------------- ------------------- ---------------- ------------ ---------0300060005000… 2011-03-21-18.48.48 2011-03-21-18.48.54 ROW DAVEK T2 select activity_id, activity_type, uow_id, stmt_type, stmt_text from lock_participant_activities where participant_no=2 and xmlid like ‘db2LockEvent_1_%’ ACTIVITY_ID ----------2 1 Examine participant activity history ACTIVITY_TYPE UOW_ID STMT_TYPE STMT_TEXT ------------- ------ --------- ----------------------------------------current 1 2 update t2 set moredata=2 where moredata=1 past 1 2 update t1 set data = 1 where ident = 0 Notes on DPF and PureScale • On DPF and PureScale lock events may occur on different members • On DPF lock conflicts are always local, but occur on different members • On PureScale lock conflicts can occur locally on different members, as well as between different members due to global locks • Deadlock dependencies on both PureScale and DPF may involve multiple members • When a lock event occurs, each participant in the event is notified of the event and will cut an individual lock record to the UE table • This occurs irrespective of whether participants are local to a single member, or on different members • When the lock events are formatted, the formatter will amalgamate the individual records from each of the participants Example: Global Deadlock in DPF or PureScale ------------------------------------------------------Event ID : 3 Event Type : DEADLOCK Event Timestamp : 2010-09-21-21.34.07.640849 Partition of detection : 0 ------------------------------------------------------Deadlock Graph -------------Total number of deadlock participants : 2 Participant that was rolled back : 2 Type of deadlock : global Participant Requesting Lock --------------1 2 Participant Holding Lock --------------2 1 Global deadlock with conflicts on different members Deadlock Member Application Handle --------------- -----------------0 09060 3 09070 Participant No 2 requesting lock ---------------------------------Lock Name : 0x03001A00050000000000000052 Lock wait start time : 2010-09-21-21.34.01.589946 Lock wait end time : 2010-09-21-21.34.07.640849 Lock Type : ROW … Table Schema : DAVEK Table Name : T2 Participant No 1 requesting lock ---------------------------------Lock Name : 0x03001900040000000000000052 Lock wait start time : 2010-09-21-21.33.51.504053 Lock wait end time : 2010-09-21-21.34.07.640849 Lock Type : ROW … Table Schema : DAVEK Table Name : T1 Using the Locking Functions to Analyze “Live” Locking Issues on your System Using the Locking Functions • The final tool we to assist in the analysis of locking problems are a set of new locking functions and views introduced in DB2 9.7 FP1 • These functions allow you to inspect information on current lock dependencies on your system from several different perspectives • This allows for adhoc analysis of “live” locking issues without the need to capture and analyze historical data • Let’s look at some examples of how these functions work Another Sample Locking Scenario Application #1 delete from sales where region=‘Quebec’ Application #2 delete from sales where region=‘Quebec’ Application #3 lock table sales in exclusive mode Application #4 delete from sales where region=‘Manitoba’ Application #5 drop table sales Identifying Applications Involved in Lock Waits select lock_wait_start_time, lock_name, req_application_handle, hld_application_handle, hld_member from table(mon_get_appl_lockwait(null,-2)) as t Lock waiters LOCK_WAIT_START_TIME --------------------2010-07-22-14.41.20.428811 2010-07-22-14.26.31.028352 2010-07-22-14.26.18.772041 2010-07-22-14.26.51.747847 Corresponding holders LOCK_NAME REQ_APPL… HLD_APPL… HLD_MEMBER -------------------------- --------- --------- ---------02001000000000000000000054 236 203 0 02001000000000000000000054 203 201 0 02001000070000000000000052 202 201 0 00000500081A00002157BE8543 204 203 0 Identifying the statements the applications are currently running List distinct select a.apphdl, p.stmt_text Retrieve applications from (select distinct apphdl from statement ((select l.req_application_handle as apphdl text from table(mon_get_appl_lockwait(null,-2)) as l) union Lookup SQL (select l.hld_application_handle as apphdl activities on from table(mon_get_appl_lockwait(null,-2)) as l))) as a, app coord table(mon_get_connection(a.apphdl,-1)) as c, table(wlm_get_workload_occurrence_activities_v97(a.apphdl, c.coord_member)) as s, table(mon_get_pkg_cache_stmt(null,s.executable_id,null,c.coord_member)) as p APPHDL -------------------202 203 204 236 STMT_TEXT ----------------------------------------delete from sales where region=‘Quebec’ lock table sales in exclusive mode drop table sales delete from sales where region=‘Manitoba’ Using the MON_LOCKWAITS view Show me applications involved in lock waits and their current SQL select lock_wait_elapsed_time, lock_name, req_application_handle, hld_application_handle, req_stmt_text, hld_current_stmt_text from sysibmadm.mon_lockwaits (Run this with isolation set to UR to ensure we don’t block on catalog locks) LOCK_WAIT… ---------7525 8414 8427 8394 LOCK_NAME REQ_APPL… HLD_APPL… REQ_STMT_TEXT HLD_CURRENT_STMT_TEXT -------------------------- ---------- --------- ------------------ ---------------------------02001000000000000000000054 236 203 delete from sales… lock table sales in excl… 02001000000000000000000054 203 201 lock table sales… 02001000070000000000000052 202 201 delete from sales… 00000500081A00002157BE8543 204 203 drop table sales lock table sales in excl… Finding lock holders using MON_GET_LOCKS Show me the waiters and holders for a particular lock select lock_name, member, lock_status, application_handle from table(mon_get_locks( CLOB('<lock_name>02001000000000000000000054</lock_name>'), -2)) as l LOCK_NAME MEMBER LOCK_STATUS APPLICATION_HANDLE -------------------------------- ------ ----------- -------------------02001000000000000000000054 0 G 201 02001000000000000000000054 0 G 202 02001000000000000000000054 0 W 203 02001000000000000000000054 0 W 236 Formatting the Lock Name Show me the attributes for a particular lock select name, value from table(mon_format_lock_name( '02001000000000000000000054')) as f NAME -------------------LOCK_OBJECT_TYPE TBSP_NAME TABSCHEMA TABNAME VALUE ---------TABLE USERSPACE1 DAVEK SALES Notes on DPF and PureScale • In DPF, the behavior of the locking functions is basically identical to serial instances except that lock conflicts / waits may occur on different members. • In order to see the whole picture queries should be issued on all members • In PureScale with global locks it’s possible that a lock wait by a request executing on one member is blocked waiting for a lock held by a request executing on a different member • In this case MON_GET_APPL_LOCKWAIT will not be able to show which application is the holder – only that the holder is located on a remote member rather than locally • When this occurs, the holder application can be identified in a subsequent step via a call to MON_GET_LOCKS using the name of the lock being waited on Identifying Lock Waits in PureScale select lock_wait_start_time, lock_name, req_application_handle, req_member, hld_application_handle, hld_member from table(mon_get_appl_lockwait(null,-2)) as t Lock waiters LOCK_WAIT_START_TIME -------------------------2011-03-22-20.38.56.715902 2011-03-22-20.38.49.592468 2011-03-22-20.38.47.298632 2011-03-22-20.38.51.385681 Unknown lock holders (!) LOCK_NAME REQ_APPL… REQ_MEMBER HLD_APPL… HLD_MEMBER -------------------------- --------- ---------- --------- ---------00000500031C0000C0F1BDE2C 65603 2 131125 2 02001000000000000000000054 131125 2 0 02001000060000000000000052 65589 1 0 02001000000000000000000054 67 0 1 Locating the Missing Lock Holders select lw.lock_wait_start_time, lw.lock_name, lw.req_application_handle, lw.req_member, lk.application_handle as hld_application_handle, lw.hld_member from table(mon_get_appl_lockwait(null,-2)) as lw, lateral( select lock_name, member, lock_status, application_handle from table( mon_get_locks(CLOB('<lock_name>'||lw.lock_name|| </lock_name>'), lw.hld_member)) as locks where locks.lock_status='G' fetch first row only ) as lk Lock waiters LOCK_WAIT_START_TIME -------------------------2011-03-22-20.38.56.715902 2011-03-22-20.38.49.592468 2011-03-22-20.38.47.298632 2011-03-22-20.38.51.385681 For each lock wait Find me the first holder of the lock on the holder member Non-local lock holders now shown LOCK_NAME REQ_APPL… REQ_MEMBER HLD_APPL… HLD_MEMBER -------------------------- --------- ---------- --------- ---------00000500031C0000C0F1BDE2C 65603 2 131125 2 02001000000000000000000054 131125 2 53 0 02001000060000000000000052 65589 1 53 0 02001000000000000000000054 67 0 65589 1 Final thoughts In Closing • We’ve introduced you to some of the lock new monitoring capabilities introduced in DB2 9.7 • These facilities are intended to provide a consolidated set of diagnostics to enable to you to identify and solve a wide variety of lock related issues on your system • Replaces various existing facilities such as the deadlock event monitor and lock timeout report • Provides a variety of approaches to identify and drill down into locking problems, both from adhoc in-memory diagnostics to historical capture mechanism • Hopefully the examples in this session have given you some ideas on ways you can leverage the latest DB2 Monitoring capabilities in your own environment Questions? DB2 Monitoring Resources DB2 Monitoring Resources • DB2 9.7 documentation: • http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp • Related IDUG NA 2011 presentations: • Advanced Performance Diagnostics for SQL (D02) David Kalmuk IBM dckalmuk@ca.ibm.com New Ways to Solve your Locking Problems with DB2 9.7