Database Performance
Topics:
- DB Design
- Optimization & Indexing
- Monitoring and Tuning

Joe Carola
Siemens Medical Solutions, Health Services

Joe Carola, Siemens HS Bio
• 30+ years in Information Technology
  – 26 of them dedicated to Relational Database, covering all areas of database design, implementation, support, performance, etc.
• Full History:
  – Development: Prog Trainee, Prog/Analyst, Sys Analyst
  – DBA trainee, DBA, Mgr-DBA (“Actor on the Scene”)
  – Lead DB Consultant for Codd and Date Consulting
  – Director-DBA
  – Technical Database Architect (Currently)
  – DB2, Microsoft SQL Server, Oracle, Sybase SQL Server
  – Mainframe, Unix, Wintel
  – 1993 recipient of an International Database User Group Award for Information Excellence, based on his contributions in the area of Relational Database technology; has presented locally and internationally on a variety of Relational Database topics
  – Chairman, Delaware Valley DB2 User Group

Agenda – “Practical Stuff”
DB Design
• In the simplest terms: Logical to Physical
Optimization and Indexing
• Optimizer
• Index Types and how they are used
Monitoring and Tuning
• Monitoring Process and what to monitor
• Tuning Steps
The above is where I see that the rubber meets the road…

DB Design

Logical Design
• Provides complete understanding of data and its usage
• Defines data entities and their attributes
• Provides primary and foreign key definitions
Physical Design
• Based on information gathered during Logical design
  – Data must be understood to do this correctly and efficiently
• Provides physical aspects to enhance data usage
  – Data types, data lengths, row sizes
• Provides precise access paths (Indexes) to rows of data
  – Support primary and foreign keys
  – Secondary indexes
Poor Logical and Physical Database design can be the largest reason for performance issues
• The price for poor DB Design must be paid at execution time

DB Design
Normalization: A synthesis of data design
1st Normal Form – Data is dependent on … The Key
2nd Normal Form – Data is dependent on … The whole Key
3rd Normal Form – Data is dependent on … And nothing but the Key, “so help me Codd”
• Edgar F. (Ted) Codd
  – Developed the Relational Model: “A Relational Model of Data for Large Shared Data Banks” (1970)
  – Solid, yet complex, mathematical foundation
    • Relational Algebra
    • Domains, Attributes, Tuples, and Relations (OMG!)
  – Re-stated in simpler terms…
    • Simple to understand tables, rows, and columns
• The Simplicity is partially the reason for the performance issues being addressed every day
  – Too many shortcuts are taken
  – Too many inexperienced data designers are designing and implementing database applications

DB Design
3rd Normal Form is basically 1st cut physical
• The next step after 3rd NF in Physical DB Design is a very important step for Performance, Concurrency, Operations, etc.
  – De-Normalization takes place here
    • Storing of data in summary or derived format
    • If it doesn’t happen, it takes place at execution time
      Result:
      High processing costs – Materialization of the result data
      Administration costs – Maintenance of the data
      Low Concurrency – Concurrent Access to the data
• However…
  – Anomalies are created as a result of De-normalization
    • Insert, Delete, Update
    • They all cost extra processing also
  – Must strike a balance based on requirements for performance, availability, storage, administration

Optimization & Indexing

Optimization and Indexing
Must understand the basics of indexes and performance statistics.
• As a general rule, indexes should be kept as narrow as possible, most likely following a business use requirement, to reduce the amount of processing overhead associated with each query.
Being familiar with how optimization works will improve the accuracy of your decision making when designing indexes
• Understanding how the optimizer works is the first step toward the establishment of a truly optimized database environment
As the sophistication of your database implementation increases…
• The need to optimize performance will also increase.

Optimization and Indexing
SQL Query tuning is one of the most important tasks to improve application performance
Biggest bang for the performance dollar over everything else (“IMO”):
• Network, Storage, Memory, Processor
Should be done in the design and testing phases
• However, no amount of Database tuning or SQL statement tuning can make up for inefficient application design/coding
  – 60% to 80% of Application Problems come from poorly written SQL or the code around it, i.e. Prog101 abuse can wreck an application too!!

Optimizer
Responsible for choosing the least costly way to execute SQL (DML). Creates an access path with its decision
• Performed at plan compilation time
• Determines Access Methods
  – Index Usage
  – Table Scan
  – Join Method
  – Sort
• Determines if Data and/or Index pages can be read in advance
  – Asynchronous Pre-fetch

The Importance of Statistics
Statistics provide the optimizer with the information to make decisions
[Diagram labels: Table, Indexspace, Tablespace, Generation, RDBMS Catalog or Dictionary]
As the data in a column changes, index and column statistics can become out-of-date and cause the query optimizer to make less than optimal decisions on how to process a query.

Statistical Terms/Concepts
Cardinality
• Measures how many unique values exist in the table
Density
• Measures the uniqueness of values within a table.
• Helps the optimizer determine how many rows will be returned for a given key value
• Indexes with high densities will likely be ignored by the optimizer
  – i.e. the index is highly non-unique
Selectivity
• Measures the number of rows that will be returned by a particular query.
• Needed by the Optimizer to calculate the relative cost of a query plan

From Request to Response
[Diagram labels: REQUEST, RESPONSE, RELATIONAL DATA SERVICES, STAGE 2 PREDICATES]
STAGE 2 (NONSARGABLE, Residual) – Evaluated after data retrieval by the Relational Data Services, which is more expensive than the Data Manager.
[Diagram labels: DATA MANAGER (index key predicate applies; any other predicate with index(es); non-indexed), BUFFER MANAGER, I/O, STAGE 1 PREDICATES, REQUESTED DATA]
STAGE 1 (SARGABLE) – Evaluated at the time the data rows are retrieved. There is a performance advantage in using STAGE 1 PREDICATES because this stage eliminates rows in the Data Manager before they are passed to STAGE 2.

Indexing
A very necessary part of Successful Database Implementation
• I wonder what queries will be run?
• What indexes will be needed?
• What columns will be used as predicates?
• What ORDER BY will be used most often?
• Why do some of my queries run so slow!

Indexes are a good thing to add, however there is something to avoid…
“Thanks for fixing my query, what did you do?”
“I added an index to one of the columns.”
“Great! Then add indexes to all the columns in my table.”
#!*#!!!

Types of Indexes
There are two types of indexes: clustered and non-clustered, each with unique advantages depending on the data set.
Clustered index
• Dictates the storage order of the data in a table. Because the data is sorted, clustered indexes are more efficient on columns that are most often searched for ranges of values. This index type also excels at finding a specific row when the indexed value is unique.
Non-clustered index
• Similar to an index in a textbook: the data is stored in one place and the index entries (key values with pointers to the data) in another. A query first searches the non-clustered index to find the location of the data value in the table and then retrieves the data directly from that location. The non-clustered index is useful for queries resulting in exact matches.
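The difference between the two index types can be seen in a small experiment. The sketch below is only an illustration, not taken from the slides: it uses Python's built-in sqlite3 module as a stand-in DBMS, and the table `orders`, its columns, and the index `ix_orders_customer` are hypothetical names. In SQLite, an INTEGER PRIMARY KEY table is stored in key order, which is roughly analogous to a clustered index, while an index made with CREATE INDEX behaves like a non-clustered one. Other products (DB2, SQL Server, Oracle, Sybase) use different syntax and report plans differently.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in DBMS (assumption).
conn = sqlite3.connect(":memory:")

# Hypothetical table: rows are stored in order_id order, which plays the
# role of the clustered index in this sketch.
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, i % 100, i * 1.5) for i in range(1, 1001)],
)

# Hypothetical secondary index: plays the role of the non-clustered index.
conn.execute("CREATE INDEX ix_orders_customer ON orders (customer_id)")


def plan(sql):
    """Return the optimizer's plan description for one statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the detail text


# Range search on the column that dictates storage order.
range_plan = plan("SELECT * FROM orders WHERE order_id BETWEEN 100 AND 200")
# Exact match on the secondary (non-clustered style) index.
exact_plan = plan("SELECT * FROM orders WHERE customer_id = 42")

print("range search:", range_plan)
print("exact match :", exact_plan)
```

The range query should be resolved through the primary key ordering and the exact match through the secondary index. The wording of the plan text varies between SQLite versions, so treat the printed strings as informational only.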
Basic Index Usage
Matching Index Scan
[Diagram: 1 Root Page, 2 NonLeaf Page, 3 Leaf Page, then the Data Pages]
Select * From TABLE1 Where INDEXED_COL1 = 12345

Basic Index Usage
Non-Matching Index Scan
[Diagram: Root Page, NonLeaf Page, Leaf Page, Data Pages, with numbered access steps 1 and 2]
Select * From TABLE1 Where INDEXED_COL1 > 00001

Basic Index Usage
Index Only
[Diagram: Root Page, Non-Leaf Page, Leaf Page (step 1); no Data Pages are read]
Select COL1 From TABLE1 Where INDEXED_COL1 > 00001

Join Methods
Nested Loop Join
SELECT A,B,X,Y FROM OUTER, INNER WHERE A=10 AND B=X

Tables:
OUTER        INNER
A   B        X  Y
10  3        5  A
10  1        3  B
10  2        2  C
10  6        1  D
10  1        2  E
             9  F
             7  G

1.) Scan the outer table. For each qualifying row…
2.) …find all matching rows in the inner table, via table space scan or index access.

The nested loop join produces this result:
COMPOSITE
A   B  X  Y
10  3  3  B
10  1  1  D
10  2  2  C
10  2  2  E
10  1  1  D

Join Methods
Merge Scan Join
SELECT A,B,X,Y FROM OUTER, INNER WHERE A=10 AND B=X

1.) Condense and sort the outer table, or access it through an index on column B. Condense and sort the inner table.

Tables:
OUTER        INNER
A   B        X  Y
10  1        1  D
10  1        2  C
10  2        2  E
10  3        3  B
10  6        5  A
             7  G
             9  F

2.) Scan the outer table. For each qualifying row…
3.) …scan a group of matching rows in the inner table.

The merge scan join produces this result:
COMPOSITE
A   B  X  Y
10  1  1  D
10  1  1  D
10  2  2  C
10  2  2  E
10  3  3  B

Join Methods
Hybrid / Hash Join
SELECT C2,C33 FROM OUTER, INNER WHERE C1 = A AND C2 = C22

OUTER              INNER
ROW  C1  C2        C22  C33  RID
1    A   1         1    D    P1
2    A   1         2    C    P2
3    A   2         2    E    P3
4    A   3         3    B    P4
5    A   6         5    A    P5
6    .   .         7    G    P6

1.) Apply local predicates and organize qualifying rows in join column sequence, by either sorting or accessing via a join column index.
2.) Obtain only inner table RIDs via index access, using sequenced join column key values.
    RID LIST: P1 P1 P2 P3 P4
3.) Create partial rows, and sort in RID sequence.
4.) List Prefetch inner table rows and complete partial rows.

PARTIAL ROWS       RESULT
C2  RID            C2  C33
1   P1             1   D
1   P1             1   D
2   P2             2   C
2   P3             2   E
3   P4             3   B

An Ounce of Prevention…
Make your queries simple and efficient, ensuring the least costly access path available.
• Try not to overload your tables with indexes
• Try not to overload your indexes
• Try not to overload your queries
Keep the Database healthy
• Reorganization
  – Eliminates empty space and fragmentation
  – Reduces I/O
Generate Statistics (if they are not automatic)
• The Optimizer is very smart, but data attributes are always changing
  – DB Size/Volume, Data Skewness, Data Content
Analyze SQL queries and access path selection before implementing them in a production environment.
• Execute the Explain Plan periodically to determine what method the Optimizer is selecting for an access path.

“Explain” Plan / SHOWPLAN
Phase of the optimizer that captures information used in selecting the query access plan
Why use an Explain Plan?
• Gives clues as to why the optimizer made access decisions
• Can be used in advance of execution
• Can be used to maintain a history of problem query access
  – Before/After new index additions
  – Before/After Statistics are Generated/Re-Generated
  – Before/After Data additions/changes/deletions
• Problem determination is easier by comparing reference plans

[Figure: Example Graphical SHOWPLAN]

Monitoring and Tuning

Monitoring and Tuning
A Constant Process
• A very necessary part of successful database implementation
• Must be there to guarantee ongoing, optimal Database Performance
1.) Collect Data
2.) Analyze Data
3.) Consider Fixes
4.) Apply Fixes
Repeat
[Other diagram labels: Design Data, Object Data, Activity Data; Redesign, Tune; Real time, Periodic, Historical]

Monitoring and Tuning
What to monitor
• Healthiness of Database Objects
  – Growth
  – Fragmentation
    • Exists when tablespaces (TS) and/or indexes have pages in which the logical ordering, based on key or link value, does not match the physical ordering of the pages inside the file
    • Causes additional I/O and additional storage
• Causes of Fragmentation
  – DML (Insert, Delete, Update)
  – Inserts/Updates cause Page Splits
  – Deletes/Updates cause holes

Monitoring and Tuning
What to monitor
• Fragmentation illustrated
  Uniform pages, in order:
    Index 1 Page 1, Index 1 Page 2, Index 1 Page 3, Index 1 Page 4, Index 1 Page 5, Index 1 Page 6, Index 1 Page 7, Index 1 Page 8
  Non-uniform pages, out of order:
    Index 1 Page 1, Index 1 Page 2, Index 2 Page 1, Index 2 Page 2, Index 1 Page 4, Index 1 Page 5, Index 3 Page 1, Index 1 Page 8
• Reorganization
  – Reorders pages, compresses entries on a page
• Always be sure to run new Statistics collection (for the Optimizer)

Monitoring and Tuning
What to monitor
• Object Usage
  – Access Patterns (Random, Sequential, Indexed, Non-Indexed)
  – I/O (Volume, Latency)
  – They tend to change over time as users learn the application
• Memory Usage
  – Buffer Hit Ratio
  – Data/Index pages in the Buffer will avoid an I/O
• Processing Activity
  – CPU utilization
    • Will indicate excessive searching and/or sorting
  – Parallel, Non-Parallel
    • Can speed up large searches
    • Can also monopolize all the processors
• Locking
  – Timeouts
  – Deadlocks

Monitoring and Tuning
How to monitor – Tool usage
[Diagram labels: SQL Request, DBMS, Result, Statistical Generation, Tool to Collect & Interpret, Performance DB, Alerts, Reports]

Monitoring and Tuning
Steps
• Find the statements that consume the most resources
  – “Heavy Hitters”
• Physical Reads will indicate SQL requiring disk access to satisfy queries
  – Most expensive part of a Query!!!
• Buffer Gets indicate the amount of searching going on within a query
  – High Buffer Gets = Lots of Searching = Lots of Processing
• Sorts information will indicate if SQL is doing an excessive amount of sorting
• Find the offending statements without adding to the performance problem
  – Use simple top down approach
    • Avoid heavy tracing
    • Know the Database Design and Usage
    • Run Explain Plan on SQL

Additional Questions?
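As a follow-up to the “Run Explain Plan on SQL” step, the sketch below shows one way to keep a small before/after history of access plans for a problem statement. It is a minimal illustration under stated assumptions, not a procedure from the slides: Python's built-in sqlite3 module stands in for the DBMS, `EXPLAIN QUERY PLAN` and `ANALYZE` are SQLite's own commands (other products use EXPLAIN, SHOWPLAN, RUNSTATS, and similar), and the table `visits`, the index `ix_visits_patient`, and the dictionary `history` are hypothetical names.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in DBMS (assumption).
conn = sqlite3.connect(":memory:")

# Hypothetical table and data for the demonstration.
conn.execute(
    "CREATE TABLE visits ("
    "visit_id INTEGER PRIMARY KEY, patient_id INTEGER, ward TEXT)"
)
conn.executemany(
    "INSERT INTO visits (patient_id, ward) VALUES (?, ?)",
    [(i % 500, "W" + str(i % 7)) for i in range(5000)],
)

# The hypothetical "problem" statement being investigated.
QUERY = "SELECT visit_id FROM visits WHERE patient_id = 123"


def explain(sql):
    """Return the plan detail text without executing the statement itself."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the detail text


# Keep reference plans so they can be compared later.
history = {}
history["before index"] = explain(QUERY)

# Candidate fix: add an index, then regenerate optimizer statistics.
conn.execute("CREATE INDEX ix_visits_patient ON visits (patient_id)")
conn.execute("ANALYZE")
history["after index + statistics"] = explain(QUERY)

for label, detail in history.items():
    print(f"{label}: {detail}")
```

Before the index exists the plan should describe a scan of the whole table; afterwards it should name the new index. Storing such reference plans alongside the date and the change that was made gives the comparison baseline described on the Explain Plan slide.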