PPTX - PUG Challenge Americas

Numbers, We don’t need no
stinkin’ numbers
Adam Backman
Vice President
DBAppraise, LLC
About the Presenter
• Progress user from when dinosaurs roamed
the earth (nearly)
• President - White Star Software
– Consulting: performance, coding, problem solving
– Training: Programming, System and Database
Administration
• Vice President – DBAppraise
– Managed database services
Agenda
• Why is performance important?
• Components of performance
• Perception vs. reality
• Who is most important?
Why is performance important?
• Time is literally money
• Many idle hands cost real money
• A delayed customer is a lost customer
• Delayed support equals lost confidence
Put a value on performance
• Users wait 10 seconds
• Does not sound bad
• Users do the operation many times per day
• 10 seconds per transaction, 10 per hour, 8 hours a day: 800 seconds wasted per user per day
• About 13 minutes wasted per user per day, times the number of users (500 users)
• That is over 100 hours of wasted time per day
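The arithmetic on this slide can be checked with a short Python sketch. The inputs are the slide's own figures (10-second wait, 10 operations per hour, 8-hour day, 500 users); the variable and function names are illustrative only.

```python
# Worked version of the slide's arithmetic. All names are illustrative.
WAIT_SECONDS = 10     # delay per operation
OPS_PER_HOUR = 10     # operations per user per hour
HOURS_PER_DAY = 8     # working hours per day
USERS = 500           # number of users


def wasted_seconds_per_user(wait_s, ops_per_hour, hours):
    """Seconds one user spends waiting in a day."""
    return wait_s * ops_per_hour * hours


per_user_s = wasted_seconds_per_user(WAIT_SECONDS, OPS_PER_HOUR, HOURS_PER_DAY)
per_user_min = per_user_s / 60
total_hours = per_user_s * USERS / 3600

print(f"{per_user_s} seconds per user per day")
print(f"{per_user_min:.1f} minutes per user per day")
print(f"{total_hours:.0f} hours per day across {USERS} users")
```

Changing any one input (for example, a 5-second wait) shows how quickly small delays add up across a user base.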
Components of performance
• Network
• Disk
• Memory
• CPU
• Goal: Push the bottleneck to the fastest
resource
Network
• Slowest resource
• Temp files going to network drive
• Need to minimize traffic
– -Mm (Remember to increase frame size everywhere when increasing -Mm)
– -Mn, -Mpb, -Mi, -Ma
Disk
• Most frequent offender
• People focus on wrong metrics
• Queue depth and service time are generally
good indicators of congestion
Memory
• Move things off disk into memory
• -B (DB to shared memory)
• -Bt (temp disk files to temp buffers)
• OS and disk array caches
CPU
• The “right” type of CPU activity
– User: what you paid for
– System: system overhead
– Wait: waiting on I/O (what type of I/O?)
– Idle: you need idle, but having zero does not mean there is an issue
Numbers are good but …
• Performance stinks
• Performance is perception
• User experience is king
First, look for record locks or
other application issues
• ProTop has a screen for blocked sessions
• Record locks can completely stop activity
• The user sees “record in use by …”
• The administrator does not
• Additionally, I look for very high DB requests from a single connection
ProTop: Blocked Sessions
Blocked Sessions
  Usr Name         Note
----- ------------ --------------------------------------------
   24 tom          REC XQH 102 [Order] Adam
Promon: Block Access
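Block Access:
Type  Usr Name       DB Requests
Acc   999 TOTAL...    6415644367
Acc     0 adam            165004
Acc     5 adam                 1
Acc     6 adam                 1
Acc     7 dbapprai
[Remaining values, from the DB Reads\Writes and BI Reads\Writes columns and any later ones, could not be aligned to rows; in extraction order: 54341274, 423828, 6284, 521056, 1317, 1245, 0, 191480, 5657, 93125, 0, 6, 0, 184629, 0, 7, 1, 0, 0, 3549613, 0]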
Buffer Hit Percentage
• Generally a good metric
• But …
– A single scan of a small table can vastly skew results
– Buffer hit percentage at low volume is nearly meaningless
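A rough Python sketch of both caveats follows. It assumes the usual definition of buffer hit percentage, (logical reads - OS reads) / logical reads; the function name and all of the counts are invented for illustration and are not taken from any real system.

```python
def buffer_hit_pct(logical_reads, os_reads):
    """Percentage of logical reads satisfied from the buffer pool."""
    if logical_reads == 0:
        return 0.0  # no activity: report 0 rather than divide by zero
    return 100.0 * (logical_reads - os_reads) / logical_reads


# Hypothetical workload: 10,000 requests, 2,000 of which go to disk.
base = buffer_hit_pct(10_000, 2_000)

# Same workload plus a loop that rereads a small, fully cached table
# one million times. Disk activity is unchanged.
skewed = buffer_hit_pct(10_000 + 1_000_000, 2_000)

# Low volume: 4 requests, 1 disk read. A number comes out, but means little.
tiny = buffer_hit_pct(4, 1)

print(f"base:   {base:.2f}%")
print(f"skewed: {skewed:.2f}%")
print(f"tiny:   {tiny:.2f}%")
```

The disk does exactly the same work in the first two cases, yet the percentage moves from 80% to nearly 100%.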
How to Make Buffer Hit Rate
Useful
• Know which tables are being read
– Large tables
– Small tables
• Know what is “normal”
How to Make Buffer Hit Rate
Less Useful
• Bring up promon activity screen and only use
the first sample
• Use really small sample sizes (seconds vs.
minutes)
• Use really large sample sizes (hours vs.
minutes)
Benchmarks Lie
• They do not test real-world workloads
– All read (Readprobe)
– All write (ATM)
– Wrong mix of read and write
• Time slicing can make results more attractive
CPU – Wait
• The CPU always blames everyone else
• If you have wait and idle, it is generally not an issue
• If you have wait and no idle, you likely have an issue. Look at disk first
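The rule of thumb above could be written down as a small Python check. This is only a sketch: the function name and both thresholds are assumptions chosen for illustration, not recommended values.

```python
def cpu_wait_verdict(wait_pct, idle_pct, wait_floor=5.0, idle_floor=5.0):
    """Apply the slide's rule of thumb to one CPU sample.

    wait_floor and idle_floor are illustrative thresholds for what
    counts as "having" wait or idle; tune them to your own system.
    """
    has_wait = wait_pct >= wait_floor
    has_idle = idle_pct >= idle_floor
    if has_wait and has_idle:
        return "wait with idle: generally not an issue"
    if has_wait:
        return "wait with no idle: likely an issue, look at disk first"
    return "no significant wait"


print(cpu_wait_verdict(wait_pct=30, idle_pct=40))
print(cpu_wait_verdict(wait_pct=30, idle_pct=0))
```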
CPU - Idle
• If you have a single core then a single program
can use 100% of the CPU
• This is a good thing. The process will use its CPU and complete
The network is never more than
10% busy
• Every network admin in the world uses this
line
• They get this from the manufacturers
• They sample and provide a single sample for a
large time frame.
• How about 100% busy 10% of the time?
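A small Python sketch shows how a single averaged figure hides saturation. The per-minute samples are made up for illustration: the link is saturated for 6 minutes of the hour and idle for the rest.

```python
# Hypothetical per-minute utilization (%) for one hour:
# saturated for 6 minutes, idle for the other 54.
samples = [100.0] * 6 + [0.0] * 54

average = sum(samples) / len(samples)
peak = max(samples)
saturated_minutes = sum(1 for s in samples if s >= 95.0)

print(f"hourly average: {average:.0f}%")
print(f"peak minute: {peak:.0f}%")
print(f"minutes at or above 95%: {saturated_minutes}")
```

The hourly average comes out at 10%, which is exactly the number that gets quoted, while users were stuck for six full minutes.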
Setting –spin based on a
calculation
• Gus said that it should be …
• Unless Gus is at your site any calculation is
wrong
• Gus said this some time ago and was
misquoted at that time
• Generally stated as # * CPUs
• This is nearly always wrong (you could get
lucky by accident)
Percentage full on extents
• Is it 99% full or 327% full?
• Important to look at allocated (actual growth) versus percentage fill of the last extent
• Hint: it never shows 100%, because space is preallocated for future extends of the area
Now we know how people lie
but how do I determine if our
performance is acceptable?
Ask the users
Method: Measuring Performance
• Determine the 5-10 most time-critical portions of the application
• Time them in isolation
• Time them during the day when everyone says
performance is OK. They will never say it’s
good.
• These timings should be close if not exactly
the same
Method: Determine importance
• Customer visible
• Done many (thousands+) times a day
• Users “wait” for screen/output
Timings
• Need not be exact
• Wrist watch or cell phone timer is fine
• Keep track of these timings
• When people complain about performance, redo the timings
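A stopwatch is enough, but where an operation can be scripted, the same bookkeeping might look like the Python sketch below. The names `timed`, `timings`, and `slower_than_baseline`, the 1.5x tolerance, and the "order lookup" label are all hypothetical, and `time.sleep` stands in for the real operation.

```python
import time
from contextlib import contextmanager

timings = {}  # operation name -> list of elapsed seconds


@contextmanager
def timed(name):
    """Record how long the wrapped block takes under the given name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings.setdefault(name, []).append(time.perf_counter() - start)


def slower_than_baseline(name, baseline_s, tolerance=1.5):
    """True if the latest timing exceeds baseline_s * tolerance."""
    return timings[name][-1] > baseline_s * tolerance


# Time a stand-in for one time-critical operation.
with timed("order lookup"):
    time.sleep(0.01)  # placeholder for the real operation

print(f"order lookup: {timings['order lookup'][-1]:.3f} s")
```

Keeping the recorded values next to the "performance is OK" baseline gives something concrete to compare against when complaints come in.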
If the timings are bad
• Look for bottlenecks
– Network
– Disk
– Memory
– CPU
• It will likely be one of the first two, solved by using more of the last two
If the timings are good
• Smack the users around for wasting your time
or
• Reevaluate timings, no really just smack the
users
Conclusion
• Performance is perception
– Reason for “working …”
• Focus on user experience
• Know what is normal
– In stored statistics
– In response times
Still more Conclusion
• Know what is important
– Customer facing
• Benchmarks lie
• Buffer Hit Rate
– You can make it whatever you want
– Need to understand how to make it useful
Questions?
Adam Backman
adam@wss.com
Thank you for your
time!