Rich Banville - PUG Challenge Americas

Some More Database Performance Knobs
North American PUG Challenge
Richard Banville
Software Fellow
OpenEdge Development
Agenda
1. LRU (again)
2. Networking: Message Capacity
3. Networking: Resource Usage
4. Index Rebuild
5. Summary
© 2012 Progress Software Corporation. All rights reserved.
LRU (again)
[Figure: LRU chain of buffers, from least recent to most recent: RM Block T1, IX Block I1, RM Block T3, IX Block I3]
▪ Replacement policy of the database buffer pool
  • Maintains the working set of data buffers
  • Just a linked list: a shared data structure
  • Changes are serialized by the LRU latch
▪ Replaces the buffer at the LRU end with the block newly read from disk
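To make these mechanics concrete, here is a toy model of an LRU-managed buffer pool. It is only a sketch under simplifying assumptions: a `collections.OrderedDict` stands in for the linked list, there is no real latching, and block IDs are strings. `LRUBufferPool` and its counters are hypothetical names, not OpenEdge internals; `lru_moves` marks each point where the real buffer pool would need the LRU latch.

```python
from collections import OrderedDict


class LRUBufferPool:
    """Toy LRU buffer pool; a sketch, not the OpenEdge implementation."""

    def __init__(self, num_buffers: int):
        self.num_buffers = num_buffers
        self.chain = OrderedDict()  # first entry = LRU end, last entry = MRU end
        self.disk_reads = 0
        self.lru_moves = 0  # each relink would need the LRU latch

    def access(self, block_id: str) -> None:
        if block_id in self.chain:
            # Hit: relink the buffer at the MRU end.
            self.chain.move_to_end(block_id)
        else:
            # Miss: reuse the buffer at the LRU end for the block read from disk.
            self.disk_reads += 1
            if len(self.chain) >= self.num_buffers:
                self.chain.popitem(last=False)
            self.chain[block_id] = None
        self.lru_moves += 1

    def order(self) -> list:
        """Blocks from least recent to most recent."""
        return list(self.chain)


# Load four blocks, then FIND FIRST T1 touches index block I1 and record block T1.
pool = LRUBufferPool(num_buffers=4)
for blk in ["RM T1", "IX I1", "RM T3", "IX I3"]:
    pool.access(blk)
pool.access("IX I1")
pool.access("RM T1")
print(pool.order())  # ['RM T3', 'IX I3', 'IX I1', 'RM T1']
```

Every access, hit or miss, ends in a relink, which is exactly the housekeeping cost discussed next.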
▪ Pros: a proficient predictor of block usage
  • Maintains a high buffer pool hit ratio
▪ Cons: housekeeping costs
  • Single-threads access to the buffer pool (even if only for an instant)
  • High activity, relatively high nap rate
▪ Managing the LRU:
  • Private read-only buffers: -Bp, -BpMax (not with -lruskips until 10.2b07)
  • Alternate buffer pool: -B2
  • New: -lruskips, -lru2skips
LRU (again): Find first T1.
[Figure: FIND FIRST T1 reads IX Block I1 and RM Block T1 and moves both to the most recent end of the chain. Before (least to most recent): RM Block T1, IX Block I1, RM Block T3, IX Block I3. After: RM Block T3, IX Block I3, IX Block I1, RM Block T1. Running FIND FIRST T1 again leaves the same order, with the same two blocks moved to the most recent end again.]
What about …
▪ FOR EACH T1: END.
▪ FOR EACH with many tables.
▪ FOR EACH with many tables and many users.
Location, location, location
▪ With -B 1,000,000:
  • What does it take to evict a block from the buffer pool?
  • What does it take for a block to go from MRU to LRU?
▪ Do we need to move a block to MRU on EACH access, then?
  • I think not.
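A quick way to answer the two questions above is to simulate a strict LRU chain. The sketch below (hypothetical function name) counts how many new blocks must be read before a block that was just moved to the MRU end falls off the LRU end, assuming every later access is a miss on a block not yet in the pool. Under that assumption the answer is exactly the number of buffers, so with -B 1,000,000 a block that was just touched is evicted only by the millionth subsequent read of a new block.

```python
from collections import OrderedDict


def misses_to_evict(pool_size: int) -> int:
    """Reads of brand-new blocks needed to evict a block just moved to MRU.

    Assumes a strict LRU chain and that every later access is a miss
    on a block not yet in the pool.
    """
    chain = OrderedDict()  # first entry = LRU end, last entry = MRU end
    for blk in range(pool_size):  # fill the pool; block pool_size-1 is MRU
        chain[blk] = None
    target = pool_size - 1
    next_blk = pool_size
    misses = 0
    while target in chain:
        chain.popitem(last=False)  # evict the buffer at the LRU end
        chain[next_blk] = None     # newly read block is linked at the MRU end
        next_blk += 1
        misses += 1
    return misses


print(misses_to_evict(1_000_000))  # 1000000
```

Hits on blocks that are older than the target also push it toward the LRU end, so real workloads evict somewhat sooner; the order of magnitude is what matters here.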
Improving Concurrency
[Figure: LRU chain, least recent to most recent]
-lruskips <n>
▪ LRU and LFU combined
▪ Small numbers make a BIG difference
▪ Monitor OS read I/Os and LRU latch contention
▪ Adjust online via the _Startup._Startup-LRU-Skips VST field
▪ Adjust online via promon:
  – R&D
  -> 4. Administrative Functions ...
  -> 4. Adjust LRU force skips
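The idea behind -lruskips can be sketched by giving each buffer in a toy LRU pool an access counter: a hit relinks the buffer at the MRU end only once it has been skipped `lru_skips` times. This counting rule is our reading of "LRU and LFU combined", stated as an assumption; the exact rule inside OpenEdge may differ, and `SkippingLRUPool` and `count_moves` are hypothetical names. What the sketch illustrates is that even a small skip value removes most relinks (and therefore most LRU latch acquisitions) for hot blocks.

```python
from collections import OrderedDict


class SkippingLRUPool:
    """Toy LRU pool with an -lruskips-style access counter (assumed semantics)."""

    def __init__(self, num_buffers: int, lru_skips: int):
        self.num_buffers = num_buffers
        self.lru_skips = lru_skips
        # block_id -> hits since the buffer was last linked at the MRU end
        self.chain = OrderedDict()
        self.lru_moves = 0  # operations that would need the LRU latch

    def access(self, block_id: str) -> None:
        if block_id in self.chain:
            if self.chain[block_id] >= self.lru_skips:
                self.chain.move_to_end(block_id)  # relink at the MRU end
                self.chain[block_id] = 0          # updating a value keeps position
                self.lru_moves += 1
            else:
                self.chain[block_id] += 1         # skip the relink this time
        else:
            if len(self.chain) >= self.num_buffers:
                self.chain.popitem(last=False)    # evict at the LRU end
            self.chain[block_id] = 0
            self.lru_moves += 1


def count_moves(lru_skips: int, accesses: int = 1000) -> int:
    """Cycle over four hot blocks that all fit in a four-buffer pool."""
    pool = SkippingLRUPool(num_buffers=4, lru_skips=lru_skips)
    for i in range(accesses):
        pool.access("abcd"[i % 4])
    return pool.lru_moves


print(count_moves(0), count_moves(10))  # 1000 92
```

With skips set to 0 the model degenerates to plain LRU (one relink per access); with 10 it relinks each hot block only on every eleventh hit.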
Performance – 10.2b06 & -lruskips
[Figure: Readprobe data access results. Records read (up to 300,000) vs. number of users (1 to 100) for lruskips 0, 10, 100 and 1000; annotated improvement ~39%.]
Performance – 10.2b06 & -lruskips (250 users)
Readprobe latch waits (per sec*)

Latch  lruskips 0  lruskips 10  lruskips 100  lruskips 1000
BHT           184          730           887            859
BF1           279        1,066         1,167          1,178
BF2             6           16            16             13
BF3            66          174           250            164
BF4             9           13            12             13
LRU         1,655          148             7              0

Note the change in LRU latch waits vs. buffer latch waits.
Performance – 10.2b06 & -lruskips (250 users, big db)
Readprobe latch waits (per sec*)

Latch  lruskips 0  lruskips 10  lruskips 100  lruskips 1000
BHT            14          173           217            322
BF1             0            1             1              2
BF2             1            2             1              2
BF3             1            1             1              2
BF4             3            2             1              1
LRU         1,949        1,035            19              0

Note the focus is now on LRU and BHT (not buffer) latches.
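Summing each column of the two latch-wait tables makes the "LRU isn't the last bottleneck" point explicit. The short script below only re-adds the published per-latch numbers (the dictionary and function names are ours). On the first run the total stays roughly flat as -lruskips rises, because waits move from the LRU latch to the BHT and BF1 latches; on the big database the total falls sharply.

```python
LRUSKIPS = [0, 10, 100, 1000]

# Latch waits per second, 250 users, copied from the two tables above.
FIRST_RUN = {
    "BHT": [184, 730, 887, 859],
    "BF1": [279, 1066, 1167, 1178],
    "BF2": [6, 16, 16, 13],
    "BF3": [66, 174, 250, 164],
    "BF4": [9, 13, 12, 13],
    "LRU": [1655, 148, 7, 0],
}
BIG_DB_RUN = {
    "BHT": [14, 173, 217, 322],
    "BF1": [0, 1, 1, 2],
    "BF2": [1, 2, 1, 2],
    "BF3": [1, 1, 1, 2],
    "BF4": [3, 2, 1, 1],
    "LRU": [1949, 1035, 19, 0],
}


def column_totals(table: dict) -> list:
    """Total latch waits for each -lruskips setting."""
    return [sum(col) for col in zip(*table.values())]


def non_lru_totals(table: dict) -> list:
    """Latch waits for each setting, excluding the LRU latch."""
    others = [row for name, row in table.items() if name != "LRU"]
    return [sum(col) for col in zip(*others)]


for label, table in [("250 users", FIRST_RUN), ("250 users, big db", BIG_DB_RUN)]:
    print(label, dict(zip(LRUSKIPS, column_totals(table))))
# 250 users {0: 2199, 10: 2147, 100: 2339, 1000: 2227}
# 250 users, big db {0: 1968, 10: 1214, 100: 240, 1000: 329}
```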
Performance – 10.2b06 & -lruskips (big db)
[Figure: Readprobe data access results. Records read (up to 400,000) vs. number of users (1 to 100) for lruskips 0, 10, 100 and 1000; annotated improvements ~15%, ~44% and ~52%.]
Conclusions
▪ -lruskips can eliminate the LRU bottleneck
▪ LRU isn't the last bottleneck
▪ Overall improvement is relative to other contention
  • Data access is limited by buffer-level contention
  • Table scans over small tables have more buffer contention than scans over large tables
    – Application changes can improve performance too!
Networking Control
Philosophy: maximize throughput by keeping the server busy without making remote clients wait!
▪ Process-based control
  • -Ma, -Mn, -Mi
    – Control the order in which users are assigned to servers
  • -PendCondTime
▪ Resource-based control
  • -Mm <n>
    – Maximum size of a network message
    – Set at client & server startup
▪ New tuning knobs: resource-based control
  • Alleviate excessive system CPU usage by the network layer
  • Control how much record data is stuffed into a network message
    – Applicable to "prefetch" queries
Networking – Prefetch Query
▪ NO-LOCK query with guaranteed forward motion or scrolling
  • Multiple records are stuffed into a single network message
  • Browsed static queries and preselected queries are scrolling by default

    FOR EACH customer NO-LOCK:
        /* … */
    END.

    DO PRESELECT EACH customer NO-LOCK:
        /* … */
    END.

    DEFINE QUERY cust-q FOR customer SCROLLING.
    OPEN QUERY cust-q FOR EACH customer NO-LOCK.
    REPEAT:
        GET NEXT cust-q.
        IF NOT AVAILABLE customer THEN LEAVE. /* stop at end of query */
    END.
    CLOSE QUERY cust-q.
Server Network Message Processing Loop
[Figure: flowchart of the server's network message loop, repeated over several slides with annotations]
▪ Start: is there an outstanding prefetch request?
  • No: check for a request, waiting up to 2 seconds (Poll(2)); process server events.
  • Yes: check for a request without waiting (Poll(0)).
▪ Got a new request?
  • Yes: process the waiting request.
  • No, with a prefetch request outstanding: add a record to the network message for the outstanding (prefetch) request.
▪ Go back to Start.
▪ What's new: -NmsgWait sets the 2-second wait used by Poll(2).
▪ Poll() is system-CPU intensive: 10 milliseconds to Poll(0)! By comparison, it takes 10 microseconds to copy 1 record.
▪ What's new: -prefetchPriority, which controls the Poll(0) check made while a prefetch request is outstanding.
  • Potential side effects: polling less often can delay new requests from other clients.
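The cost argument on these slides can be checked with a little arithmetic. The sketch below models only the "outstanding prefetch request" branch of the loop, assuming no other client requests arrive, and uses the slide's figures of 10 ms per Poll(0) and 10 µs per record copy. The `records_per_poll` parameter is a hypothetical stand-in for what -prefetchPriority controls; we assume a value of 0 or 1 means one record per poll, which is the behavior in the flowchart, and the real parameter's exact semantics may differ.

```python
POLL0_COST_US = 10_000  # slide figure: 10 milliseconds per Poll(0)
COPY_COST_US = 10       # slide figure: 10 microseconds to copy 1 record


def fill_prefetch_message(num_records: int, records_per_poll: int):
    """Model the 'outstanding prefetch request' branch of the server loop.

    Assumes no new client requests arrive while the message is filled.
    records_per_poll is a hypothetical stand-in for -prefetchPriority;
    values below 1 are treated as one record per poll.
    Returns (number_of_polls, estimated_cost_in_microseconds).
    """
    records_per_poll = max(1, records_per_poll)
    polls = 0
    added = 0
    while added < num_records:
        polls += 1  # Poll(0): check for a new request without waiting
        added += min(records_per_poll, num_records - added)  # copy record(s)
    return polls, polls * POLL0_COST_US + added * COPY_COST_US


for rpp in (1, 4, 16):
    print(rpp, fill_prefetch_message(16, rpp))
# 1 (16, 160160)
# 4 (4, 40160)
# 16 (1, 10160)
```

Under these figures the polls, not the record copies, dominate the cost of filling a 16-record message.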
Process Waiting Network Message
[Figure: flowchart for processing a waiting request, repeated over several slides with annotations]
▪ Process the waiting request: add a record to the message.
▪ Prefetch request?
  • No (non-prefetch query request): send the network message.
  • Yes: is this the 1st record request?
    – Yes (1st record of a prefetch query request): send the network message.
    – No (secondary records of a prefetch query request): is the threshold met? The default threshold is 16 records.
    – Threshold met (the client is waiting): send the message.
    – Threshold not met: the remote client continues to wait; go to Start.
▪ What's new:
  • Increase the network message fill rate to improve TCP throughput and overall server performance
  • Defaults have not changed; the new knobs provide control for you, because every deployment is different
  • -prefetchDelay: disregard the 1st record request check
  • Threshold control, # of records vs. % full: -prefetchNumRecs, -prefetchFactor
  • NOTE: improved TCP/system performance
▪ Potential side effects:
  • -Mm size determines the maximum: -Mm 4096 / 16 records = 256 bytes
  • Choppy behavior on the remote client?
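The send decision in the flowchart above can be summarized as a small predicate. This sketch treats -prefetchNumRecs and -prefetchFactor as two independent "threshold met" tests, either of which sends the message; that combination rule, the parameter names and the byte accounting are our assumptions, not documented behavior. The second helper reproduces the -Mm arithmetic from the slide.

```python
from typing import Optional


def should_send(is_prefetch: bool, records_in_msg: int, bytes_in_msg: int,
                max_msg_bytes: int, num_recs: int = 16,
                fill_pct: Optional[int] = None, delay_first: bool = False) -> bool:
    """Decide whether to send the current network message (hypothetical model)."""
    if not is_prefetch:
        return True  # non-prefetch request: reply right away
    if records_in_msg == 1 and not delay_first:
        return True  # first record of a prefetch query goes out at once
    if records_in_msg >= num_recs:
        return True  # record-count threshold (cf. -prefetchNumRecs)
    if fill_pct is not None and bytes_in_msg * 100 >= fill_pct * max_msg_bytes:
        return True  # fill-percentage threshold (cf. -prefetchFactor)
    return False  # remote client continues to wait


def avg_record_budget(max_msg_bytes: int, records_per_msg: int) -> int:
    """Average bytes per record if records_per_msg must fit in one -Mm message."""
    return max_msg_bytes // records_per_msg


print(avg_record_budget(4096, 16))  # 256
```

If records average more than that budget, the message fills before the record-count threshold is reached, which is why -Mm caps what the new knobs can achieve.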
Altering Network Message Behavior
▪ Promon support (_Startup VST too!)
  • Alter online
    – R&D …
    – 4. Administrative Functions …
    – 7. Server Options …

Server Options:
1. Server network message wait time: 2 seconds
2. Delay first prefetch message: Enabled
3. Prefetch message fill percentage: 90 %
4. Minimum records in prefetch message: 1000
5. Suspension queue poll priority: 0
7. Terminate a server
Performance – 10.2b06 & Networking changes
[Figure: Readprobe data access results. Records read (up to 650,000) vs. number of users (1 to 100) for baseline, NumRecs, priority, NumRecs_priority and NumRecs_priority_lruskips; annotated improvements ~32% and ~212%.]
Assumptions for best performance
▪ Index data is segregated from table data
  • Indexes & tables are in different storage areas
▪ You have enough disk space for sorting
▪ You understand the impact of CPU and memory consumption
▪ The process is allowed to use available system resources
Index Rebuild Parameters - Overview
-TB               sort block size (8K – 64K, note new limit)
-datascanthreads  # threads for data scan phase
-TMB              merge block size (default -TB)
-TF               merge pool fraction of system memory (in %)
-mergethreads     # threads per concurrent sort group merging
-threadnum        # concurrent sort group merging
-TM               # merge buffers to merge each merge pass
-rusage           report system usage statistics
-silent           a bit quieter than before
Phases of Index Rebuild ("non-recoverable")
Index Scan
  • Scan index data area start to finish
  • I/O bound with little CPU activity
  • Eliminated with area truncate
Data Scan/Key Build
  • Scan table data area start to finish (area at a time)
  • Read records, build keys, insert into temp sort buffer
  • Sort full temp file buffer blocks (write if > -TF)
  • I/O bound with CPU activity
Sort-Merge
  • Sort-merge -TF and/or temp sort file
  • CPU bound with I/O activity
  • I/O eliminated if -TF is large enough
Index Key Insertion
  • Read -TF or temp sort file
  • Insert keys into index
  • Formats new clusters; may raise HWM
  • I/O bound with little CPU activity
Phases of Index Rebuild: Index Scan
Area 9: Index scan (Type II) complete.
• Index area is scanned start to finish (single-threaded)
• Block at a time with cluster hops
• Index blocks are put on the free chain for the index
• Index object is not deleted (to fix corrupt cluster or block chains)
• Order of operation:
  • Blocks are read from disk
  • Blocks are re-formatted in memory
  • Blocks are written to disk as -B is exhausted
• Causes I/O in other phases for block re-format
• Can be eliminated with a manual area truncate where possible
Phases of Index Rebuild: Data Scan/Key Build
Processing area 8 : (11463)
Start 4 threads for the area. (14536)
Area 8: Multi-threaded record scan (Type II) complete.
• Table data area is scanned start to finish (multi-threaded if -datascanthreads is set)
• Each thread processes the next block in the area (with cluster hops)
• The database is re-opened by each thread in R/O mode
• Ensure file handle ulimits are set high enough
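"Each thread processes the next block" is a shared work cursor. The sketch below shows that pattern with Python's `threading` module under simple assumptions: an area is a list of blocks, each block is a list of records, and `build_key` is a caller-supplied function. All names are hypothetical, and Python threads here illustrate the coordination, not the parallel speed-up of the real utility.

```python
import threading
from typing import Callable, List


def scan_area(blocks: List[list], num_threads: int,
              build_key: Callable) -> list:
    """Multi-threaded scan: each worker claims the next unscanned block."""
    next_block = 0
    lock = threading.Lock()
    keys: list = []

    def worker() -> None:
        nonlocal next_block
        local_keys = []
        while True:
            with lock:  # claim the next block in the area
                if next_block >= len(blocks):
                    break
                i = next_block
                next_block += 1
            for record in blocks[i]:
                local_keys.append(build_key(record))  # build the index key
        with lock:  # publish this thread's keys
            keys.extend(local_keys)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return keys


area = [[3, 1], [4], [1, 5, 9]]  # three data blocks
print(sorted(scan_area(area, num_threads=4, build_key=lambda r: r * 10)))
# [10, 10, 30, 40, 50, 90]
```

Keys come back in no particular order, which is fine: ordering is the job of the sort phases that follow.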
Data Scan/Key Build
a) Thread reads the next data block in the data area
b) Extract the next record from the data block and build the index key (sort order)
c) Insert the key into a sort block (-TB 8K thru 64K)
d) Sort/merge the full sort block into a merge block (-TMB: -TB thru 64K)
e) Write the merge block to -TF, overflowing to temp files (-TMB sized I/O)
[Figure: DB → RM block → record → key → sort blocks → merge block → -TF, overflowing to .srt1, .srt2, …]
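Steps b) through d) amount to filling fixed-size sort blocks with keys and sorting each block once it fills. The sketch below measures a sort block by key count rather than by bytes (-TB), and treats each sorted sort block as a stand-in for a merge block; in the real utility, several sort blocks can be merged into one larger merge block (-TMB) and written to -TF or the .srt files. The function name is hypothetical.

```python
from typing import Callable, Iterable, List


def build_sorted_runs(records: Iterable, key_fn: Callable,
                      keys_per_sort_block: int) -> List[list]:
    """Fill sort blocks with keys and sort each block when it fills."""
    runs = []
    sort_block = []
    for record in records:
        sort_block.append(key_fn(record))      # c) insert key into sort block
        if len(sort_block) == keys_per_sort_block:
            runs.append(sorted(sort_block))     # d) sort the full block
            sort_block = []
    if sort_block:                              # flush a partly filled block
        runs.append(sorted(sort_block))
    return runs  # e) the real utility writes these out to -TF / .srt files


print(build_sorted_runs([5, 3, 9, 1, 7, 2, 8], key_fn=lambda r: r,
                        keys_per_sort_block=3))
# [[3, 5, 9], [1, 2, 7], [8]]
```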
Sort Groups: -SG 3 (note 8 is minimum)
Each index is assigned a particular sort group (hashed index #)
[Figure: keys built from a record are routed by index: Index 1 and Index 4 → SG 1 (.srt1); Index 2 → SG 2 (.srt2); Index 3 → SG 3 (.srt3)]
▪ Each group has its own sort file
▪ Sort file location:
    1) -T /usr1/richb/temp/
    2) <dbname>.srt
         0 /usr1/richb/temp/
    3) <dbname>.srt
         10240 /usr1/richb/temp/
         0 /usr1/richb/temp/
    4) <dbname>.srt
         0 /usr1/richb/temp/
         0 /usr2/richb/temp/
         0 /usr3/richb/temp/
  • 1: sort files in the same directory (I/O contention)
  • 4: sort files in different locations
▪ Ensure enough space
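The slide says only that the sort group is chosen by hashing the index number. The sketch below uses a simple modulo as an assumed hash, which happens to reproduce the figure's mapping for -SG 3; the real hash function is not given here, and `assign_sort_groups` is a hypothetical name.

```python
def assign_sort_groups(index_numbers, num_groups):
    """Map each index number to a sort group 1..num_groups (modulo as assumed hash)."""
    groups = {g: [] for g in range(1, num_groups + 1)}
    for idx in index_numbers:
        groups[(idx - 1) % num_groups + 1].append(idx)
    return groups


print(assign_sort_groups([1, 2, 3, 4], num_groups=3))
# {1: [1, 4], 2: [2], 3: [3]}
```

Because every index in a group shares one sort file, spreading the groups' files over separate disks (option 4) spreads the I/O.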
Phases of Index Rebuild: Sort-Merge
Sorting index group 3
Spawning 4 threads for merging of group 3.
Sorting index group 3 complete.
Sort-Merge Phase
Sorted!
▪ Sort blocks in each sort group have been sorted and merged into a linked list of individual merge blocks stored in -TF and temp files.
▪ These merge blocks are further merged -TM at a time to form new, larger "runs" of sorted merge blocks.
▪ -TM of these new "runs" are then merged to form even larger "runs" of sorted merge blocks.
▪ When only one very large "run" is left, all the key entries in the sort group are in sorted order.
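This multi-pass merge is a classic k-way merge with k = -TM. The sketch below uses `heapq.merge` from the Python standard library to merge `tm` sorted runs at a time until one run is left, and counts the passes. Runs are plain lists rather than chains of merge blocks, and the function name is hypothetical. With -TM 32, for example, 1,000 initial runs need two passes, since 32 × 32 = 1,024.

```python
import heapq


def merge_passes(runs, tm):
    """Merge sorted runs tm at a time until one run is left.

    Returns (final_sorted_run, number_of_merge_passes).
    """
    if tm < 2:
        raise ValueError("need to merge at least 2 runs per pass")
    passes = 0
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + tm]))
                for i in range(0, len(runs), tm)]
        passes += 1
    return (runs[0] if runs else []), passes


print(merge_passes([[3], [1], [2], [9], [7], [8], [5], [4], [6]], tm=3))
# ([1, 2, 3, 4, 5, 6, 7, 8, 9], 2)
```

A larger -TM means fewer passes over the data, at the cost of more merge buffers in memory.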
-threadnum vs -mergethreads
▪ -threadnum 2
  [Figure: Thread 1 runs the merge phase for group 1 (.srt1) and Thread 2 for group 2 (.srt2); each group uses -TF.]
  • B-tree insertion occurs as soon as a sort group's merge is completed: Thread 0 begins b-tree insertion for group 1 concurrently, while Thread 2 merges group 2 and Thread 1 moves on to group 3.
▪ -threadnum 2 -mergethreads 3
  • Merge threads merge successive "runs" of merge blocks concurrently.
  [Figure: Thread 1 with merge threads 3, 4 and 5 merges group 1; Thread 2 with merge threads 6, 7 and 8 merges group 2.]
  • Note: 8 actively running threads.
  [Figure: Thread 1 with merge threads 3, 4 and 5 moves on to group 3, while Thread 2 with 6, 7 and 8 continues on group 2.]
  • B-tree insertion occurs as soon as a sort group's merge is completed: Thread 0 begins b-tree insertion for group 1 concurrently.
  • Note: 9 actively running threads.
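The thread counts on these slides follow a simple pattern: each of the -threadnum merging threads is joined by -mergethreads helpers, and one more thread (Thread 0) runs b-tree insertion once a group's merge has finished. The helper below, a hypothetical name, just encodes that reading of the diagrams and reproduces the 8- and 9-thread notes.

```python
def active_threads(threadnum: int, mergethreads: int,
                   insertion_running: bool) -> int:
    """Active threads in the sort-merge phase, per the diagrams above."""
    merging = threadnum * (1 + mergethreads)  # group thread + its merge helpers
    return merging + (1 if insertion_running else 0)  # Thread 0: b-tree insertion


print(active_threads(2, 3, insertion_running=False))  # 8
print(active_threads(2, 3, insertion_running=True))   # 9
```

This is a useful sanity check against the number of CPUs before raising either parameter.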
Index Key Insertion Phase
Building index 11 (cust-num) of group 3 …
Building of indexes in group 3 completed.
Multi-threaded index sorting and building complete.
[Figure: index b-tree with a root over leaf blocks; each leaf is written to the DB when full]
▪ Key entries from sorted merge blocks are inserted into the b-tree
▪ Performed sequentially, an entry at a time and an index at a time
▪ Leaf-level insertion optimization (avoids a b-tree scan)
▪ Leaf level is written to disk as soon as it is full (since it is never revisited)
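Because keys arrive already sorted, the leaf level can be built by appending at the right edge and writing each leaf as soon as it fills. The sketch below, with hypothetical names, fills leaves by key count and builds one level of parent separator keys (the first key of each leaf). It ignores block formatting, variable-length keys and the higher b-tree levels.

```python
def bulk_load_leaves(sorted_keys, keys_per_leaf):
    """Build the leaf level from already-sorted keys.

    Returns (leaf_blocks_written, parent_separator_keys).
    """
    written = []  # stands in for leaf blocks written to the database
    leaf = []
    for key in sorted_keys:
        leaf.append(key)  # always the right-most leaf: no b-tree search
        if len(leaf) == keys_per_leaf:
            written.append(leaf)  # full leaf is never revisited: write it now
            leaf = []
    if leaf:
        written.append(leaf)
    parent = [blk[0] for blk in written]  # first key of each leaf
    return written, parent


print(bulk_load_leaves([10, 20, 30, 40, 50], keys_per_leaf=2))
# ([[10, 20], [30, 40], [50]], [10, 30, 50])
```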
2085 Indexes were rebuilt. (11465)
Index rebuild complete. 0 error(s) encountered.
Index Rebuild - Tuning
▪ Truncate the index-only area if possible
▪ .srt file
▪ Parameters
  • -mergethreads 2 or 4, and -threadnum 2 or 1
  • -datascanthreads: 1.5 * # CPUs
  • -B 1024
  • -TF 80 (monitor physical memory paging)
  • -TMB 64
  • -TB 64
  • -TM 32
  • -T: separate disk, or RAM disk if not using -TF (no change)
  • -rusage & -silent
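The suggested settings can be collected into an argument list for `proutil ... -C idxbuild`. The sketch below only builds the list and does not run anything. The database name and the -T directory are hypothetical placeholders, the -mergethreads/-threadnum pair is one of the combinations suggested above, and -datascanthreads follows the 1.5 × CPUs rule of thumb. Check each option against your OpenEdge release before use.

```python
import os


def idxbuild_args(dbname, cpus=None, mergethreads=2, threadnum=2,
                  sort_dir="/path/to/separate/disk"):
    """Build a proutil idxbuild command line from the tuning suggestions."""
    cpus = cpus or os.cpu_count() or 1
    return [
        "proutil", dbname, "-C", "idxbuild", "all",
        "-datascanthreads", str(int(1.5 * cpus)),  # 1.5 * # CPUs
        "-mergethreads", str(mergethreads),
        "-threadnum", str(threadnum),
        "-B", "1024",
        "-TF", "80",     # monitor physical memory paging
        "-TMB", "64",
        "-TB", "64",
        "-TM", "32",
        "-T", sort_dir,  # hypothetical path on a separate disk
        "-rusage", "-silent",
    ]


print(" ".join(idxbuild_args("mydb", cpus=8)))
```

Returning a list rather than a single string keeps it ready for a process-launching API without shell quoting issues.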
Performance Numbers
[Figure: Index rebuild elapsed time, cost of each phase (in secs, axis 0 to 120,000), for 10.2b06 baseline, 10.2b06 w/-TF 50, 10.2b06 no truncate and 10.2b06 best.]
Summary
▪ LRU
  • Potential for a big win
  • Always room for improvement
    – Us and you!
▪ Networking
  • You now have more control
  • With power comes responsibility
▪ Index Rebuild
  • Big improvements if
    – Your database is set up properly
    – You provide system resources to index rebuild
  • Hopefully you'll never need it
Questions