Numbers, We don’t need no stinkin’ numbers
Adam Backman
Vice President
DBAppraise, LLC

About the Presenter
• Progress user from when dinosaurs roamed the earth (nearly)
• President – White Star Software
  – Consulting: performance, coding, problem solving
  – Training: Programming, System and Database Administration
• Vice President – DBAppraise
  – Managed database services

Agenda
• Why is performance important?
• Components of performance
• Perception vs. reality
• Who is most important?

Why is performance important?
• Time is literally money
• Many idle hands cost real money
• A delayed customer is a lost customer
• Delayed support equals lost confidence

Put a value on performance
• Users wait 10 seconds
• Does not sound bad
• Users do the operation many times per day
• 10 seconds per transaction, 10 per hour, 8 hours a day: 800 seconds wasted per user per day
• About 13 minutes wasted per user per day, times the number of users (500 users)
• That is over 100 hours of wasted time per day

Components of performance
• Network
• Disk
• Memory
• CPU
• Goal: Push the bottleneck to the fastest resource

Network
• Slowest resource
• Temp files going to network drive
• Need to minimize traffic
  – -Mm (Remember to increase frame size everywhere when increasing -Mm)
  – -Mn, -Mpb, -Mi, -Ma

Disk
• Most frequent offender
• People focus on the wrong metrics
• Queue depth and service time are generally good indicators of congestion

Memory
• Move things off disk into memory
• -B (DB to shared memory)
• -Bt (temp disk files to temp buffers)
• OS and disk array caches

CPU
• The “right” type of CPU activity
  – User: what you paid for
  – System: system overhead
  – Wait: waiting on I/O (what type of I/O?)
  – Idle: you need idle, but having zero does not mean there is an issue

Numbers are good but …
• Performance stinks
• Performance is perception
• User experience is king

First, look for record locks or other application issues
• ProTop has a screen for blocked sessions
• Record locks can
completely stop activity
• The user sees “record in use by …”
• The administrator does not
• Additionally, I look for very high DB requests from a single connection

ProTop: Blocked Sessions
Blocked Sessions
Usr   Name         Note
----- ------------ ---------------------------------------------
24    tom          REC XQH 102 [Order] Adam

Promon: Block Access
Block Access:
Type Usr Name
Acc 999 TOTAL... DB Requests 6415644367
Acc 0 adam 165004
Acc 5 adam 1
Acc 6 adam 1
Acc 7 dbapprai
DB Reads \Writes 54341274 423828
BI Reads \Writes 6284 521056
1317 1245 0 191480 5657 93125 0 6 0 184629 0 7 1 0 0 3549613 0

Buffer Hit Percentage
• Generally a good metric
• But …
  – A single scan of a small table can vastly skew results
  – At low volume, buffer hit percentage is nearly meaningless

How to Make Buffer Hit Rate Useful
• Know which tables are being read
  – Large tables
  – Small tables
• Know what is “normal”

How to Make Buffer Hit Rate Less Useful
• Bring up the promon activity screen and only use the first sample
• Use really small sample sizes (seconds vs. minutes)
• Use really large sample sizes (hours vs. minutes)

Benchmarks Lie
• They do not test real-world workloads
  – All read (Readprobe)
  – All write (ATM)
  – Wrong mix of read and write
• Time slicing can make results more attractive

CPU – Wait
• The CPU always blames everyone else
• If you have wait and idle, it is generally not an issue
• If you have wait and no idle, you likely have an issue. Look at disk first

CPU – Idle
• If you have a single core, then a single program can use 100% of the CPU
• This is a good thing. The process will use its CPU and complete

The network is never more than 10% busy
• Every network admin in the world uses this line
• They get this from the manufacturers
• They sample and provide a single sample for a large time frame.
• How about 100% busy 10% of the time?

Setting -spin based on a calculation
• Gus said that it should be …
• Unless Gus is at your site, any calculation is wrong
• Gus said this some time ago and was misquoted at that time
• Generally stated as # * CPUs
• This is nearly always wrong (you could get lucky by accident)

Percentage full on extents
• Is it 99% full or 327% full?
• Important to look at allocated space (actual growth) versus percentage fill of the last extent
• Hint: it never shows 100% because it preallocates space for future extends of the area

Now we know how people lie, but how do I determine if our performance is acceptable?
Ask the users

Method: Measuring Performance
• Determine the 5-10 most time-critical portions of your application
• Time them in isolation
• Time them during the day when everyone says performance is OK. They will never say it’s good.
• These timings should be close, if not exactly the same

Method: Determine importance
• Customer visible
• Done many (thousands+) times a day
• Users “wait” for screen/output

Timings
• Need not be exact
• Wrist watch or cell phone timer is fine
• Keep track of these timings
• When people complain about performance, redo the timings

If the timings are bad
• Look for bottlenecks
  – Network
  – Disk
  – Memory
  – CPU
• It will likely be one of the first two, solved by using more of the second two

If the timings are good
• Smack the users around for wasting your time or
• Reevaluate timings, no really just smack the users

Conclusion
• Performance is perception
  – Reason for “working …”
• Focus on user experience
• Know what is normal
  – In stored statistics
  – In response times

Still more Conclusion
• Know what is important
  – Customer facing
• Benchmarks lie
• Buffer Hit Rate
  – You can make it whatever you want
  – Need to understand how to make it useful

Questions?
Adam Backman
adam@wss.com
Thank you for your time!
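
A rough sketch of how the hand timings from the "Method: Measuring Performance" and "Timings" slides might be kept and compared. This is an illustration only, not something from the talk: the operation names, the sample numbers, and the 25% tolerance are all made-up assumptions, and a notebook or spreadsheet would serve just as well.

```python
from statistics import median

# Hypothetical baseline: operation name -> hand timings in seconds, taken
# when users said performance was OK. Names and numbers are invented.
BASELINE = {
    "order entry": [2.1, 2.4, 2.0],
    "customer lookup": [0.9, 1.1, 1.0],
}


def slow_operations(baseline, current, tolerance=0.25):
    """Return operations whose current median timing exceeds the baseline
    median by more than `tolerance` (0.25 = 25%, an arbitrary assumption).

    The result maps operation name -> (baseline median, current median).
    """
    slow = {}
    for name, samples in current.items():
        if name not in baseline:
            continue  # no baseline recorded, so nothing to compare against
        base = median(baseline[name])
        now = median(samples)
        if now > base * (1 + tolerance):
            slow[name] = (base, now)
    return slow


if __name__ == "__main__":
    # Timings redone after a complaint (again, invented numbers).
    complaint_day = {
        "order entry": [2.2, 2.3, 2.5],
        "customer lookup": [3.8, 4.1, 4.0],
    }
    for name, (base, now) in slow_operations(BASELINE, complaint_day).items():
        print(f"{name}: baseline {base:.1f}s, now {now:.1f}s")
```

The median is used so that one mistimed sample does not dominate, which fits the point that the timings need not be exact. Operations that come back clean are the "timings are good" case; anything flagged is where the bottleneck hunt starts.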