Performance and Tuning: ORACLE SQL GUIDE for Developers and DBAs
Kanagaraj Velusamy

Workshop Objective

Who Tunes?

Introduction to SQL Tuning
An important facet of database system performance tuning is the tuning of SQL statements. SQL tuning involves three basic steps:
1. Identifying high-load or top SQL statements that are responsible for a large share of the application workload and system resources, by reviewing past SQL execution history available in the system.
2. Verifying that the execution plans produced by the query optimizer for these statements perform reasonably.
3. Implementing corrective actions to generate better execution plans for poorly performing SQL statements.
These three steps are repeated until the system performance reaches a satisfactory level or no more statements can be tuned.

Goals for Tuning
The objective of tuning a system is to reduce the response time for end users of the system and to reduce the resources used to process the same work. You can accomplish both of these objectives in several ways:
• Reduce the workload
• Balance the workload
• Parallelize the workload

Reduce the Workload
SQL tuning commonly involves finding more efficient ways to process the same workload. It is possible to change the execution plan of a statement, without altering its functionality, to reduce the resource consumption. Two examples of how resource usage can be reduced:
1. If a commonly executed query needs to access only a small percentage of the data in a table, then it can be executed more efficiently by using an index. By creating such an index, you reduce the amount of resources used.
2. If a user is looking at the first twenty rows of the 10,000 rows returned in a specific sort order, and if the query (and sort order) can be satisfied by an index, then the user does not need to access and sort all 10,000 rows to see the first 20.
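A minimal sketch of the second example, assuming an EMP table with a SALARY column (the table, column, and index names are illustrative, not from the deck):

    -- Hypothetical index supporting the sort order.
    CREATE INDEX emp_salary_idx ON emp (salary DESC);

    -- With the index, Oracle can walk the keys in sorted order and stop
    -- after 20 rows instead of fetching and sorting all 10,000 rows.
    SELECT *
      FROM (SELECT empno, ename, salary
              FROM emp
             ORDER BY salary DESC)
     WHERE ROWNUM <= 20;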
Balance the Workload
Systems often tend to have peak usage in the daytime, when real users are connected to the system, and low usage in the nighttime. If noncritical reports and batch jobs can be scheduled to run at night, reducing their concurrency during the day, then resources are freed up for the more critical programs in the day.

Parallelize the Workload
Queries that access large amounts of data (typical data warehouse queries) can often be parallelized. This is extremely useful for reducing response time in a low-concurrency data warehouse. However, for OLTP environments, which tend to be high concurrency, parallelizing can adversely impact other users by increasing the overall resource usage of the program.

Understanding the Query Optimizer
The query optimizer determines which execution plan is most efficient by considering available access paths and by factoring in information based on statistics for the schema objects (tables or indexes) accessed by the SQL statement. The query optimizer also considers hints, which are optimization instructions placed in a comment in the statement.

The query optimizer performs the following steps:
1. The optimizer generates a set of potential plans for the SQL statement based on available access paths and hints.
2. The optimizer estimates the cost of each plan based on statistics in the data dictionary for the data distribution and storage characteristics of the tables, indexes, and partitions accessed by the statement. The cost is an estimated value proportional to the expected resource use needed to execute the statement with a particular plan. The optimizer calculates the cost of access paths and join orders based on the estimated computer resources, which include I/O, CPU, and memory. Serial plans with higher costs take more time to execute than those with smaller costs; when using a parallel plan, however, resource use is not directly related to elapsed time.
3. The optimizer compares the costs of the plans and chooses the one with the lowest cost.

SQL Processing Architecture

Query Optimizer Components
The main objective of the query transformer is to determine whether it is advantageous to change the form of the query so that it enables generation of a better query plan. The optimizer's estimates rest on three measures:
• Selectivity
• Cardinality
• Cost

The SQL Optimizers
Whenever you execute a SQL statement, a component of the database known as the optimizer must decide how best to access the data operated on by that statement. Oracle supports two optimizers: the rule-based optimizer (which was the original) and the cost-based optimizer. To figure out the optimal execution path for a statement, the optimizers consider the following:
• The syntax you've specified for the statement
• Any conditions that the data must satisfy (the WHERE clauses)
• The database tables your statement will need to access
• All possible indexes that can be used in retrieving data from the table
• The Oracle RDBMS version
• The current optimizer mode
• SQL statement hints
• All available object statistics (generated via the ANALYZE command)
• The physical table location (distributed SQL)
• INIT.ORA settings (parallel query, async I/O, etc.)

Not So Good Things About the RBO
• Released with Oracle 6. Uses an ordered list of access methods and join methods ranked by the relative cost of each operation. Normally, it chooses the driving table from right to left in the FROM clause.
• Has a very limited input in determining access paths. The RBO supports only a small number of access methods (it does not recognize IOTs, bitmap indexes, hash joins, and so on).
• It processes the tables based on how they are ordered in the query, which can be good but most of the time is not.
• It always ranks execution plans based on the relative cost in its list, regardless of the data stored in the table: an index scan will always be ranked better than a table scan, which is not always true.
• Coding for the RBO has been halted; all new features require the CBO.

Understanding the Cost-Based Optimizer
The cost-based optimizer is a more sophisticated facility than the rule-based optimizer. To determine the best execution path for a statement, it uses database information such as table size, number of rows, key spread, and so forth, rather than rigid rules. The information required by the cost-based optimizer is available once a table has been analyzed via the ANALYZE command, or via the DBMS_STATS facility. If a table has not been analyzed, the cost-based optimizer can use only rule-based logic to select the best access path. The ANALYZE command and the DBMS_STATS functions collect statistics about tables, clusters, and indexes, and store those statistics in the data dictionary.

Understanding Access Paths for the Query Optimizer
Full Table Scans: Reads all rows from a table and filters out those that do not meet the selection criteria. All blocks in the table that are under the high water mark are scanned. The high water mark indicates the amount of used space, or space that has been formatted to receive data. Each row is examined to determine whether it satisfies the statement's WHERE clause.
Rowid Scans: The rowid of a row specifies the data file and data block containing the row and the location of the row in that block. Locating a row by specifying its rowid is the fastest way to retrieve a single row, because the exact location of the row in the database is specified. To access a table by rowid, Oracle first obtains the rowids of the selected rows, either from the statement's WHERE clause or through an index scan of one or more of the table's indexes. Oracle then locates each selected row in the table based on its rowid.
Understanding Access Paths for the Query Optimizer
Index Scans: In this method, a row is retrieved by traversing the index, using the indexed column values specified by the statement. An index scan retrieves data from an index based on the value of one or more columns in the index. To perform an index scan, Oracle searches the index for the indexed column values accessed by the statement. If the statement accesses only columns of the index, then Oracle reads the indexed column values directly from the index, rather than from the table. The index contains not only the indexed value, but also the rowids of rows in the table having that value. Therefore, if the statement accesses other columns in addition to the indexed columns, then Oracle can find the rows in the table by using either a table access by rowid or a cluster scan.
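A quick way to see which access path the optimizer actually picked is EXPLAIN PLAN. A minimal sketch, reusing the illustrative EMP table from earlier (it assumes a PLAN_TABLE exists; DBMS_XPLAN is available from Oracle9i onward):

    EXPLAIN PLAN FOR
      SELECT empno, ename
        FROM emp
       WHERE salary > 50000;

    -- The plan output shows whether a TABLE ACCESS FULL or an
    -- INDEX RANGE SCAN plus TABLE ACCESS BY ROWID was chosen.
    SELECT * FROM TABLE(dbms_xplan.display);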
Understanding Access Paths for the Query Optimizer
Cluster Access: A cluster scan is used to retrieve, from a table stored in an indexed cluster, all rows that have the same cluster key value. In an indexed cluster, all rows with the same cluster key value are stored in the same data block. To perform a cluster scan, Oracle first obtains the rowid of one of the selected rows by scanning the cluster index. Oracle then locates the rows based on this rowid.
Hash Access: A hash scan is used to locate rows in a hash cluster, based on a hash value. In a hash cluster, all rows with the same hash value are stored in the same data block. To perform a hash scan, Oracle first obtains the hash value by applying a hash function to a cluster key value specified by the statement. Oracle then scans the data blocks containing rows with that hash value.

Understanding Joins
Joins are statements that retrieve data from more than one table. A join is characterized by multiple tables in the FROM clause, and the relationship between the tables is defined through the existence of a join condition in the WHERE clause. In a join, one row set is called inner, and the other is called outer.
How the Query Optimizer Executes Join Statements
• Access paths: As for simple statements, the optimizer must choose an access path to retrieve data from each table in the join statement.
• Join method: To join each pair of row sources, Oracle must perform a join operation. Join methods include nested loop, sort merge, Cartesian, and hash joins.
• Join order: To execute a statement that joins more than two tables, Oracle joins two of the tables and then joins the resulting row source to the next table. This process is continued until all tables are joined into the result.

How the Query Optimizer Chooses Execution Plans for Joins
The query optimizer considers the following when choosing an execution plan:
• The optimizer first determines whether joining two or more tables definitely results in a row source containing at most one row. The optimizer recognizes such situations based on UNIQUE and PRIMARY KEY constraints on the tables. If such a situation exists, then the optimizer places these tables first in the join order, and then optimizes the join of the remaining set of tables.
• The optimizer generates a set of execution plans, according to possible join orders, join methods, and available access paths. The optimizer then estimates the cost of each plan and chooses the one with the lowest cost.

Understanding Joins: Nested Loop Joins
Nested loop joins are useful when small subsets of data are being joined and the join condition is an efficient way of accessing the second table. It is very important to ensure that the inner table is driven from (dependent on) the outer table. If the inner table's access path is independent of the outer table, then the same rows are retrieved for every iteration of the outer loop, degrading performance considerably. In such cases, hash joins joining the two independent row sources perform better.
A nested loop join involves the following steps:
1. The optimizer determines the driving table and designates it as the outer table.
2. The other table is designated as the inner table.
3. For every row in the outer table, Oracle accesses all the rows in the inner table. The outer loop is for every row in the outer table, and the inner loop is for every row in the inner table. The outer loop appears before the inner loop in the execution plan, as follows:
NESTED LOOPS
  outer_loop
  inner_loop
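A minimal sketch of requesting a nested loop explicitly, assuming the classic DEPT/EMP schema with an index on EMP.DEPTNO (the tables and hint placement here are illustrative, not from the deck):

    -- ORDERED makes DEPT (listed first) the driving/outer table;
    -- USE_NL(e) asks for EMP to be joined as the inner table of a
    -- nested loop, probed via its DEPTNO index for each outer row.
    SELECT /*+ ORDERED USE_NL(e) */
           d.dname, e.ename
      FROM dept d, emp e
     WHERE e.deptno = d.deptno
       AND d.loc = 'DALLAS';

Swapping USE_NL for USE_HASH would request a hash join of the same two row sources instead.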
Understanding Joins: Hash Joins
Hash joins are used for joining large data sets. The optimizer uses the smaller of the two tables or data sources to build a hash table on the join key in memory. It then scans the larger table, probing the hash table to find the joined rows. This method is best used when the smaller table fits in available memory. The cost is then limited to a single read pass over the data for the two tables.
When the Optimizer Uses Hash Joins
The optimizer uses a hash join to join two tables if they are joined using an equijoin and if either of the following conditions is true:
• A large amount of data needs to be joined.
• A large fraction of a small table needs to be joined.

Understanding Joins: Sort Merge Joins
Sort merge joins can be used to join rows from two independent sources. Hash joins generally perform better than sort merge joins. On the other hand, sort merge joins can perform better than hash joins if both of the following conditions exist:
1. The row sources are already sorted.
2. A sort operation does not have to be done.
However, if a sort merge join involves choosing a slower access method (an index scan as opposed to a full table scan), then the benefit of using a sort merge might be lost.
Sort merge joins are useful when the join condition between two tables is an inequality condition (but not a nonequality) like <, <=, >, or >=. Sort merge joins perform better than nested loop joins for large data sets. (You cannot use hash joins unless there is an equality condition.) In a merge join, there is no concept of a driving table. The join consists of two steps:
1. Sort join operation: both inputs are sorted on the join key.
2. Merge join operation: the sorted lists are merged together.
See http://oracle-online-help.blogspot.com/2007/03/nested-loops-hash-join-and-sort-merge.html

Index Management
B-tree indexes: this is the standard tree index that Oracle has been using since the earliest releases. An INDEX is one of the more complex structures in an Oracle database. Some important points about indexes:
• An index is like a table: it stores information and occupies space, and it is linked internally with its table. An index stores ROWIDs and the indexed column values from the table.
• Don't rush into creating an INDEX. Do an impact analysis first: is it possible to achieve the same goal without creating the index? Are you going to use the index permanently?
• Avoid creating an index "temporarily", because you will forget to drop the index if it is not really used, which leads to more storage occupation and decreased DML performance on your table. Please drop an index if it is not really used.
• Ordering the columns is very important when creating a composite (combined-column) index.
• Index rebuilding may be required periodically if the table has frequent DML.

Index Management: Ordering the Columns & Cardinality
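A minimal sketch of why column order matters in a composite index, assuming an illustrative ORDERS table (none of these names come from the deck). Place the column that is always supplied, and is more selective, first:

    -- CUSTOMER_ID is high cardinality and always in the predicate;
    -- STATUS is low cardinality.
    CREATE INDEX orders_cust_status_idx ON orders (customer_id, status);

    -- Uses the index efficiently: the leading column is supplied.
    SELECT order_id FROM orders
     WHERE customer_id = 1001 AND status = 'OPEN';

    -- The leading column is missing: this query would need an index
    -- skip scan at best, or a full table scan.
    SELECT order_id FROM orders WHERE status = 'OPEN';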
Bitmap Index
Bitmap indexes are used where an index column has a relatively small number of distinct values (low cardinality). These are super-fast for read-only databases, but are not suitable for systems with frequent updates. You will want a bitmap index when:
• The table column is low cardinality: as a ROUGH guide, consider a bitmap for any index column with fewer than 100 distinct values.
• The table has LOW DML: you must have low insert/update/delete activity. Updating bitmap indexes takes a lot of resources, and bitmap indexes are best for largely read-only tables and tables that are batch-updated nightly.
• Multiple columns: your SQL queries reference multiple low-cardinality values in their WHERE clauses. The Oracle cost-based SQL optimizer (CBO) will scream when you have bitmap indexes on multiple such columns.

Multiple-Column Bitmap Index Example
For example, assume there is a motor vehicle database with numerous low-cardinality columns such as car_color, car_make, car_model, and car_year. Each column contains fewer than 100 distinct values by itself, and a b-tree index would be fairly useless in a database of 20 million vehicles. However, combining these indexes together in a query can provide blistering response times, a lot faster than the traditional method of reading each one of the 20 million rows in the base table. For example, assume we wanted to find old blue Toyota Corollas manufactured in 1981:

SELECT license_plate_nbr
  FROM vehicle
 WHERE color = 'blue'
   AND make = 'toyota'
   AND year = 1981;

Oracle uses a specialized optimizer method called a BITMAP INDEX MERGE to service this query. In a bitmap index merge, each Row-ID (RID) list is built independently by using the bitmaps, and a special merge routine is used to compare the RID lists and find the intersecting values. Using this methodology, Oracle can provide sub-second response time when working against multiple low-cardinality columns.

Troubleshooting Oracle Bitmap Indexes
Some of the most common problems when implementing bitmap indexes include:
• Small table: the CBO may force a full-table scan if your table is small!
• Bad statistics: make sure you always analyze the bitmap index with dbms_stats right after creation:

CREATE BITMAP INDEX emp_bitmap_idx ON index_demo (gender);

EXEC dbms_stats.gather_index_stats(ownname => 'SCOTT', indname => 'EMP_BITMAP_IDX');

• Test with a hint: to force the use of your new bitmap index, just use an Oracle INDEX hint:

SELECT /*+ index(emp emp_bitmap_idx) */ COUNT(*)
  FROM emp, dept
 WHERE emp.deptno = dept.deptno;

Bitmap Join Index (BJI)
This is an index structure whereby data columns from other tables appear in a multicolumn index of a junction table. This is the only CREATE INDEX syntax to employ a SQL-like FROM clause and WHERE clause. Oracle9i added the bitmap join index to its mind-boggling array of table join methods. This new table access method requires that you create an index that performs the join at index creation time and that creates a bitmap index of the keys used in the join. But unlike most relational database indexes, the indexed columns don't reside in the table. Oracle has revolutionized index creation by allowing a WHERE clause to be included in the index creation syntax. This feature changes the way relational tables are accessed via SQL. The bitmap join index is extremely useful for table joins that involve low-cardinality columns (e.g., columns with fewer than 300 distinct values). However, bitmap join indexes aren't useful in all cases. You shouldn't use them for OLTP databases because of the high overhead associated with updating bitmap indexes. Let's take a closer look at how this type of index works.

How Bitmap Join Indexes Work
To illustrate bitmap join indexes, I'll use a simple example: a many-to-many relationship where we have parts and suppliers, with an inventory table serving as the junction for the many-to-many relationship. Each part has many suppliers and each supplier provides many parts. We create an index on the inventory table using columns contained in the supplier and part tables. The idea behind a bitmap join index is to pre-join the low-cardinality columns, making the overall join faster; this technique had never before been employed in cases where the low-cardinality columns reside in a foreign table.

To create a bitmap join index, issue the following Oracle DDL:

CREATE BITMAP INDEX part_suppliers_state
    ON inventory (p.part_type, s.state)
  FROM inventory i, parts p, supplier s
 WHERE i.part_id = p.part_id
   AND i.supplier_id = s.supplier_id;

Bitmap join indexes in action: to see how bitmap join indexes work, look at this example of a SQL query. Let's suppose you want a list of all suppliers of pistons in North Carolina. To get that list, you would use this query:

SELECT supplier_name
  FROM parts
       NATURAL JOIN inventory
       NATURAL JOIN suppliers
 WHERE part_type = 'piston'
   AND state = 'nc';
How Bitmap Join Indexes Work
Prior to Oracle9i, this SQL query would be serviced by a nested loop join or hash join of all three tables. With a bitmap join index, the index has pre-joined the tables, and the query can quickly retrieve a row ID list of matching table rows in all three tables. Note that this bitmap join index specified the join criteria for the three tables and created a bitmap index on the junction table (inventory) with the part_type and state keys (Figure A). Oracle benchmarks claim that bitmap join indexes can run a query more than eight times faster than traditional indexing methods. However, this speed improvement is dependent upon many factors, and the bitmap join index is not a panacea.

Restrictions on using the bitmap join index include:
• The indexed columns must be of low cardinality, usually with fewer than 300 distinct values.
• The query must not have any references in the WHERE clause to data columns that are not contained in the index.
• The overhead when updating bitmap join indexes is substantial. For practical use, bitmap join indexes are dropped and rebuilt each evening around the daily batch load jobs. This means that bitmap join indexes are useful only for Oracle data warehouses that remain read-only during the processing day.
Remember: bitmap join indexes can tremendously speed up specific data warehouse queries, but at the expense of pre-joining the tables at bitmap index creation time. You must also be concerned about high-volume updates. Bitmap indexes are notoriously slow to change when the table data changes, and this can severely slow down INSERT and UPDATE DML against the target tables.

There are also restrictions on when the SQL optimizer is allowed to invoke a bitmap join index. For queries that have additional criteria in the WHERE clause that don't appear in the bitmap join index, Oracle9i will be unable to use this index to service the query. For example, the following query will not use the bitmap join index:

SELECT supplier_name
  FROM parts
       NATURAL JOIN inventory
       NATURAL JOIN suppliers
 WHERE part_type = 'piston'
   AND state = 'nc'
   AND part_color = 'yellow';  -- part_color is not part of the BJI columns

FUNCTION-Based Index
• A function-based index allows you to match any WHERE clause in a SQL statement and remove unnecessary large-table full-table scans with super-fast index range scans.
• This capability allows you to have case-insensitive searches or sorts, search on complex equations, and extend the SQL language efficiently by implementing your own functions and operators and then searching on them.
• Why use this feature: it's easy and provides immediate value. It can be used to speed up existing applications without changing any of their logic or queries. It can be used to supply additional functionality to applications at very little cost.

Example 1:
CREATE INDEX emp_upper_idx ON emp (upper(ename));
SELECT ename, empno, sal FROM emp WHERE upper(ename) = 'KING';

Example 2:
CREATE INDEX sales_margin_inx ON sales (revenue - cost);
SELECT ordid FROM sales WHERE (revenue - cost) > 1000;

How to Enable Function-Based Indexes
The following is a list of what needs to be done to use function-based indexes:
• You must have the QUERY REWRITE system privilege to create function-based indexes on tables in your own schema.
• For the optimizer to use function-based indexes, the following variables must be set at the system or session level:
ALTER SESSION SET QUERY_REWRITE_ENABLED = TRUE;
ALTER SESSION SET QUERY_REWRITE_INTEGRITY = TRUSTED;
or by setting them in the init.ora parameter file. The meaning of QUERY_REWRITE_ENABLED is to allow the optimizer to rewrite the query so that it can use the function-based index. The meaning of QUERY_REWRITE_INTEGRITY is to tell the optimizer to "trust" that the code marked DETERMINISTIC by the programmer is in fact deterministic. If the code is in fact not deterministic (that is, it returns different output given the same inputs), the resulting rows from the index may be incorrect.
Function-based indexes are only visible to the cost-based optimizer and will never be used by the rule-based optimizer.
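A minimal sketch of indexing your own function, tying the DETERMINISTIC point above to concrete DDL. The function, table, and index names here are hypothetical:

    -- DETERMINISTIC is required before Oracle will accept a
    -- user-defined function inside a function-based index.
    CREATE OR REPLACE FUNCTION f_risk_score (p_salary NUMBER, p_grade NUMBER)
      RETURN NUMBER DETERMINISTIC
    IS
    BEGIN
      RETURN p_salary / GREATEST(p_grade, 1);
    END;
    /

    CREATE INDEX emp_risk_idx ON emp (f_risk_score(salary, grade));

    -- With the settings above in effect, this predicate can be
    -- satisfied by an index range scan instead of a full scan.
    SELECT empno FROM emp WHERE f_risk_score(salary, grade) > 5000;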
Tips: Using the ANALYZE Command
The way that you analyze your tables can have a dramatic effect on your SQL performance. If your DBA forgets to analyze tables or indexes after a table rebuild, the impact on performance can be devastating. If your DBA analyzes each weekend, a new threshold may be reached and Oracle may change its execution plan. If you do want to analyze frequently, use DBMS_STATS.EXPORT_SCHEMA_STATS to back up the existing statistics prior to re-analyzing. This gives you the ability to revert to the previous statistics if things screw up.
When you analyze, you can have Oracle look at all rows in a table (ANALYZE COMPUTE) or at a sampling of rows (ANALYZE ESTIMATE). Typically, use ANALYZE ESTIMATE for very large tables (1,000,000 rows or more) and ANALYZE COMPUTE for small to medium tables.
Oracle strongly recommends that you analyze FOR ALL INDEXED COLUMNS for any table that can have severe data skew. For example, if a large percentage of rows in a table have the same value in a given column, that represents skew. The FOR ALL INDEXED COLUMNS option makes the cost-based optimizer aware of the skew of a column's data in addition to the cardinality (number of distinct values) of that data.
When a table is analyzed using ANALYZE, all associated indexes are analyzed as well. If an index is subsequently dropped and recreated, it must be re-analyzed. Be aware that the procedures DBMS_STATS.GATHER_SCHEMA_STATS and GATHER_TABLE_STATS analyze only tables by default, not their indexes. When using those procedures, you must specify the CASCADE => TRUE option for indexes to be analyzed as well.
Following are some sample ANALYZE statements:

ANALYZE TABLE EMP ESTIMATE STATISTICS SAMPLE 5 PERCENT FOR ALL INDEXED COLUMNS;
ANALYZE INDEX EMP_NDX1 ESTIMATE STATISTICS SAMPLE 5 PERCENT FOR ALL INDEXED COLUMNS;
ANALYZE TABLE EMP COMPUTE STATISTICS FOR ALL INDEXED COLUMNS;

If you analyze a table by mistake, you can delete the statistics. For example:

ANALYZE TABLE EMP DELETE STATISTICS;

Analyzing can take an excessive amount of time if you use the COMPUTE option on large objects. We find that on almost every occasion, ANALYZE ESTIMATE 5 PERCENT on a large table forces the optimizer to make the same decision as ANALYZE COMPUTE.
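A minimal sketch of the backup-before-reanalyze advice above, assuming the SCOTT schema; the statistics-table name is illustrative:

    -- Create a holding table, back up current statistics, then re-gather.
    EXEC dbms_stats.create_stat_table(ownname => 'SCOTT', stattab => 'STATS_BACKUP');
    EXEC dbms_stats.export_schema_stats(ownname => 'SCOTT', stattab => 'STATS_BACKUP');
    EXEC dbms_stats.gather_schema_stats(ownname => 'SCOTT', cascade => TRUE);

    -- If plans regress after the new statistics, restore the saved set.
    EXEC dbms_stats.import_schema_stats(ownname => 'SCOTT', stattab => 'STATS_BACKUP');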
Index Rebuilding
There are many myths and legends surrounding the use of Oracle indexes, especially the ongoing passionate debate about rebuilding indexes to improve performance. Some experts claim that periodic rebuilding of Oracle b-tree indexes greatly improves space usage and access speed, while other experts maintain that Oracle indexes should "rarely" be rebuilt. Interestingly, Oracle reports that the new Oracle10g Automatic Maintenance Tasks (AMT) feature will automatically detect indexes that are in need of rebuilding. Here are the pros and cons of this highly emotional issue.
Arguments against index rebuilding: some Oracle in-house experts maintain that Oracle indexes are super-efficient at space reuse and access speed and that a b-tree index rarely needs rebuilding. They claim that a reduction in logical I/O (LIO) should be measurable, and that if there were any benefit to index rebuilding, someone would have come up with "provable" rules.
Arguments for index rebuilding: many Oracle shops schedule periodic index rebuilding and report measurable speed improvements after they rebuild their Oracle b-tree indexes. In an OracleWorld 2003 presentation titled "Oracle Database 10g: The Self-Managing Database", Sushil Kumar of Oracle Corporation states that the Automatic Maintenance Tasks (AMT) Oracle10g feature will automatically detect and rebuild sub-optimal indexes: "AWR provides the Oracle Database 10g a very good 'knowledge' of how it is being used. By analyzing the information stored in the Automatic Workload Repository (AWR), the database can identify the need of performing routine maintenance tasks, such as optimizer statistics refresh, rebuilding indexes, etc. The Automated Maintenance Tasks infrastructure enables the Oracle Database to automatically perform those operations."

Index Rebuilding: Where Are the Index Details?
Most Oracle professionals are aware of the dba_indexes view, which is populated with index statistics when indexes are analyzed. The dba_indexes view contains a great deal of important information for the SQL optimizer, but there is still more to see. Oracle provides an ANALYZE INDEX index_name VALIDATE STRUCTURE command that populates a temporary table called index_stats with additional statistics.
The important index statistics for the index rebuilding decision — the following INDEX_STATS columns are especially useful:
• HEIGHT refers to the maximum number of levels encountered within the index. An index could have 90 percent of its nodes at three levels, but excessive splitting and spawning in one area of the index under heavy DML operations could cause nodes in that area to have more than three levels. As an index accepts new rows, the index blocks split; once the index nodes have split to a predetermined maximum level, the index will "spawn" into a new level.
• LF_ROWS refers to the total number of leaf nodes in the index.
• DEL_LF_ROWS refers to the number of leaf rows that have been marked deleted as a result of table DELETEs.
• CLUSTERING_FACTOR is one of the most important index statistics because it indicates how well sequenced the index columns are relative to the table rows. If clustering_factor is low (about the same as the number of dba_segments.blocks in the table segment), then the index key is in the same order as the table rows and index range scans will be very efficient, with minimal disk I/O. As clustering_factor increases (up to dba_tables.num_rows), the index key is increasingly out of sequence with the table rows. Oracle's cost-based SQL optimizer relies heavily upon clustering_factor to decide whether to use the index to access the table.
• BLOCKS is the number of blocks consumed by the index. This is dependent on the db_block_size. In Oracle9i and beyond, many DBAs create b-tree indexes in very large block sizes (db_32k_cache_size) because the index will spawn less. Robin Schumacher notes in his book Oracle Performance Troubleshooting: "As you can see, the amount of logical reads has been reduced in half simply by using the new 16K tablespace and accompanying 16K data cache. Clearly, the benefits of properly using the new data caches and multi-block tablespace feature of Oracle9i and above are worth your investigation and trials in your own database."
Index Rebuilding: Method 1
(Shown in the deck as a script screenshot; not reproduced in this text.)

Index Rebuilding: Method 2
1. Create a table index_details with all the columns from dba_indexes plus all the columns from index_stats.
2. Populate it from the dictionary view:
   INSERT INTO index_details (<dba_indexes columns>)
   SELECT * FROM dba_indexes WHERE owner NOT LIKE 'SYS%';
   (or filter for just your own schema, for example 'SCOTT').
3. Now that we have gathered the index details from dba_indexes, we must loop through iterations of the ANALYZE INDEX index_name VALIDATE STRUCTURE command to populate our table with the other statistics. A sketch of such a script follows.
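The deck's original script is a screenshot and is not reproduced here. The following is a minimal sketch of the idea, assuming the hypothetical index_details table from step 1 carries the named INDEX_STATS columns (note that ANALYZE ... VALIDATE STRUCTURE locks the underlying table while it runs):

    -- For each index, VALIDATE STRUCTURE fills the session-level view
    -- INDEX_STATS (one row at a time), which we copy into index_details
    -- alongside the dba_indexes data already inserted.
    BEGIN
      FOR ix IN (SELECT owner, index_name
                   FROM dba_indexes
                  WHERE owner = 'SCOTT')
      LOOP
        EXECUTE IMMEDIATE
          'ANALYZE INDEX ' || ix.owner || '.' || ix.index_name ||
          ' VALIDATE STRUCTURE';

        UPDATE index_details d
           SET (d.height, d.lf_rows, d.del_lf_rows, d.btree_space, d.pct_used) =
               (SELECT height, lf_rows, del_lf_rows, btree_space, pct_used
                  FROM index_stats)
         WHERE d.owner = ix.owner
           AND d.index_name = ix.index_name;
      END LOOP;
      COMMIT;
    END;
    /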
Is There a Criterion for Index Rebuilding?
For example, here are the criteria used by a fellow Oracle DBA who swears that rebuilding indexes meeting them has a positive effect on his system performance:

btree_space > 8192                 -- only consider when space used is more than 1 block
and (   height > 3                 -- the number of index levels is > 3
     or pct_used < 75              -- the % being used is < 75%
     or (del_lf_rows / decode(lf_rows, 0, 1, lf_rows)) * 100 > 20   -- deleted > 20% of total
    )

Best Approach for Tuning a Query
Query text should be in good style and an understandable format. The deck contrasts a cramped version of a query with the same query laid out readably:

SELECT a.empno,a.empname ---,a.salary --Removed by SRZVE2 on 12/OCT/07
,b.deptname,c.salary FROM emp a,dept b,( SELECT a.empno,SUM(a.salary*b.tax) salary FROM wage a,tax b WHERE a.tax_code=b.tax_code GROUP BY a.empno) c WHERE a.empno =200 AND a.salary =40000 AND a.empname LIKE '%RAJ%' AND a.deptno =b.deptno AND a.empno = c.empno

versus:

SELECT a.empno
      ,a.empname
   ---,a.salary   -- Removed by SRZVE2 on 12/OCT/07
      ,b.deptname
      ,c.salary
  FROM emp a
      ,dept b
      ,( SELECT a.empno
               ,SUM(a.salary * b.tax) salary
           FROM wage a
               ,tax b
          WHERE a.tax_code = b.tax_code
          GROUP BY a.empno
       ) c
 WHERE a.empno   = 200
   AND a.salary  = 40000
   AND a.empname LIKE '%RAJ%'
   AND a.deptno  = b.deptno
   AND a.empno   = c.empno;

Tuning Tips
TIP 1 (best tip): SQL cannot be shared within Oracle unless it is absolutely identical. Statements must match exactly in case, whitespace, and underlying schema objects to be shared within Oracle's memory. Oracle avoids the parsing step for each subsequent use of an identical statement.
• Use SQL standards within an application. Rules like the following are easy to implement and will allow more sharing within Oracle's memory:
  - Use a single case for all SQL verbs.
  - Begin all SQL verbs on a new line.
  - Right- or left-align verbs within the initial SQL verb.
  - Separate all words with a single space.
• Use a standard approach to table aliases. If two otherwise identical SQL statements vary because a table has two different aliases, then the SQL is different and will not be shared.
• Use table aliases, and prefix all column names with their aliases when more than one table is involved in a query. This reduces parse time AND prevents future syntax errors if someone adds a column to one of the tables with the same name as a column in another table (ORA-00918: column ambiguously defined).

TIP 2: Beware of WHERE clause constructs that do not use indexes at all. For certain predicate forms, Oracle will ignore an index even if one exists on the referenced column; common offenders are sketched below.
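The deck does not list the offending forms; the following are well-known examples, assuming an index exists on each referenced column (a sketch, not an exhaustive list, with illustrative table names):

    -- A function or expression on the indexed column suppresses a
    -- b-tree index (unless a matching function-based index exists).
    SELECT * FROM emp WHERE SUBSTR(ename, 1, 3) = 'KIN';
    SELECT * FROM emp WHERE salary + 0 = 50000;

    -- A leading wildcard cannot use an index range scan.
    SELECT * FROM emp WHERE ename LIKE '%ING';

    -- NULL comparisons are not resolved from a b-tree index
    -- (entirely NULL keys are not stored in it).
    SELECT * FROM emp WHERE commission IS NULL;

    -- NOT / != predicates typically force a full scan.
    SELECT * FROM emp WHERE deptno != 10;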
TIP 3: Don't forget to tune views. Views are SELECT statements and can be tuned in just the same way as any other type of SELECT statement. All tuning applicable to any SQL statement is equally applicable to views.

TIP 4: Avoid including a HAVING clause in SELECT statements. The HAVING clause filters selected rows only after all rows have been fetched. Using a WHERE clause helps reduce overheads in sorting, summing, etc. HAVING clauses should only be used when columns with summary operations applied to them are restricted by the clause.
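A minimal sketch of the TIP 4 distinction, using an illustrative EMP table:

    -- Wasteful: every department is aggregated, then most groups discarded.
    SELECT deptno, AVG(salary)
      FROM emp
     GROUP BY deptno
    HAVING deptno IN (10, 20);

    -- Better: restrict rows with WHERE first; reserve HAVING for
    -- conditions on the aggregate itself.
    SELECT deptno, AVG(salary)
      FROM emp
     WHERE deptno IN (10, 20)
     GROUP BY deptno
    HAVING AVG(salary) > 40000;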
TIP 5: Minimize the number of table lookups (subquery blocks) in queries, particularly if your statements include subquery SELECTs or multicolumn UPDATEs.

TIP 6: Avoid joins that require the DISTINCT qualifier on the SELECT list in queries which are used to determine information at the owner end of a one-to-many relationship. The DISTINCT operator causes Oracle to fetch all rows satisfying the table join and then sort and filter out duplicate values. EXISTS is a faster alternative, because the Oracle optimizer realizes that when the subquery has been satisfied once, there is no need to proceed further, and the next matching row can be fetched. (Note: the query in question returns all department numbers and names which have at least one employee; a reconstruction follows.)
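The deck's example query is not reproduced in this text; a minimal reconstruction, assuming the classic DEPT/EMP schema:

    -- Avoid: DISTINCT joins every employee row, then de-duplicates.
    SELECT DISTINCT d.deptno, d.dname
      FROM dept d, emp e
     WHERE d.deptno = e.deptno;

    -- Prefer: EXISTS stops probing EMP as soon as one match is found.
    SELECT d.deptno, d.dname
      FROM dept d
     WHERE EXISTS (SELECT 1 FROM emp e WHERE e.deptno = d.deptno);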
TIP 7: Consider whether a UNION ALL will suffice in place of a UNION. The UNION clause forces all rows returned by each portion of the UNION to be sorted and merged and duplicates to be filtered before the first row is returned. A UNION ALL simply returns all rows, including duplicates, and does not have to perform any sort, merge, or filter. If your tables are mutually exclusive (include no duplicate records), or you don't care whether duplicates are returned, UNION ALL is much more efficient.

TIP 8: Consider using DECODE to avoid having to scan the same rows repetitively or join the same table repetitively. Note that DECODE is not necessarily faster, as it depends on your data and the complexity of the resulting query. Also, using DECODE requires you to change your code when new values are allowed in the field.
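A minimal sketch of the TIP 8 pattern, assuming an illustrative EMP table: one pass over the table replaces two separate counting queries.

    -- Instead of scanning EMP once per department...
    SELECT COUNT(*) FROM emp WHERE deptno = 10;
    SELECT COUNT(*) FROM emp WHERE deptno = 20;

    -- ...one scan classifies the rows as it goes. DECODE returns NULL
    -- when there is no match, and COUNT ignores NULLs.
    SELECT COUNT(DECODE(deptno, 10, 1)) dept10_cnt,
           COUNT(DECODE(deptno, 20, 1)) dept20_cnt
      FROM emp
     WHERE deptno IN (10, 20);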
TIP 9: Oracle automatically performs simple column type conversions (or casting) when it compares columns of different types. Depending on the type of conversion, indexes may not be used. Make sure you declare your program variables as the same type as your Oracle columns, if the type is supported in the programming language you are using.
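A minimal sketch of the pitfall, assuming a CUSTOMERS table where CUST_NO is an indexed VARCHAR2 column (names illustrative):

    -- The numeric literal makes Oracle rewrite the predicate as
    -- TO_NUMBER(cust_no) = 12345, which suppresses the index on CUST_NO.
    SELECT * FROM customers WHERE cust_no = 12345;

    -- A matching character literal allows a normal index range scan.
    SELECT * FROM customers WHERE cust_no = '12345';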
TIP 10: Specify the leading index columns in WHERE clauses. For a composite index, the query will use the index as long as the leading column of the index is specified in the WHERE clause. Assuming a composite index based on the primary key constraint on the PART_ID and PRODUCT_ID columns, the first of the following queries would use the index, while the second would not:

SELECT * FROM PARTS WHERE PART_ID = 100;
SELECT * FROM PARTS WHERE PRODUCT_ID = 1111;

The second request can be rewritten to take advantage of the index. In this query, it is assumed that the PART_ID column will always have a value greater than zero:

SELECT * FROM PARTS WHERE PART_ID > 0 AND PRODUCT_ID = 1111;

TIP 11: Evaluate index scan vs. full table scan. If you are selecting more than 20 percent of the rows from a table, a full table scan is usually faster than an index access path. In such cases, write your SQL so that it uses a full table scan. The following statements would not use index scans even if an index were created on the SALARY column: in the first SQL, the FULL hint forces Oracle to employ a full table scan; in the second, the expression on the column suppresses the index. When using an index does more harm than good, you can use these techniques to suppress its use:

SELECT /*+ FULL(EMP) */ * FROM EMP WHERE SALARY = 50000;
SELECT * FROM EMP WHERE SALARY + 0 = 50000;

TIP 12: Use ORDER BY for index scans. Oracle's optimizer will use an index scan if the ORDER BY clause is on an indexed column. The following query illustrates this point. It would use the index available on the EMPID column even though the column is not specified in the WHERE clause; the query retrieves the ROWID for each row from the index and accesses the table using that ROWID:

SELECT SALARY FROM EMP ORDER BY EMPID;

If this query performs poorly, you can try another alternative by rewriting the same query using the FULL hint.

TIP 13: Know the data. You have to know your data intimately. For example, say you have a table called BOXER containing two columns, BOXER_NAME and SEX, with a nonunique index on the SEX column. If there are equal numbers of male and female boxers (1,000 records each), the following query will run faster if Oracle performs a full table scan:

SELECT BOXER_NAME FROM BOXER WHERE SEX = 'F';

You can ensure the query performs a full table scan by rewriting it with a hint:

SELECT /*+ FULL(BOXER) */ BOXER_NAME FROM BOXER WHERE SEX = 'F';

If instead the table contains 980 male boxers, this query would be faster because it results in an index scan:

SELECT /*+ INDEX(BOXER BOXER_SEX) */ BOXER_NAME FROM BOXER WHERE SEX = 'F';

TIP 14: You can reach the same destination in different ways. In many cases, more than one SQL statement can get you the same desired results; each may use a different access path and may perform differently. For example, the MINUS operator can be much faster than using WHERE NOT IN (SELECT ...) or WHERE NOT EXISTS. Let's say we have an index on a STATE column and another index on an AREA_CODE column. Despite the availability of indexes, the following statement will require a full table scan due to the usage of the NOT IN predicate:

SELECT CUSTOMER_ID FROM CUSTOMERS
 WHERE STATE IN ('VA', 'DC', 'MD')
   AND AREA_CODE NOT IN (804, 410);

However, if the same query is rewritten as the following, it will result in index scans:

SELECT CUSTOMER_ID FROM CUSTOMERS WHERE STATE IN ('VA', 'DC', 'MD')
MINUS
SELECT CUSTOMER_ID FROM CUSTOMERS WHERE AREA_CODE IN (804, 410);

If a SQL statement involves OR in the WHERE clause, it can also be rewritten by substituting UNION for OR in the WHERE clause. You must carefully evaluate the execution plans of all SQL candidates before selecting one to satisfy the information request. You can use the Explain Plan and TKPROF tools for this process.

TIP 15: Use the special columns. Take advantage of the ROWID and ROWNUM pseudocolumns. Remember, a ROWID search is the fastest. Here's an example of an UPDATE using a ROWID scan (in PL/SQL, with TEMP_ROWID and TEMP_SALARY as previously declared variables):

SELECT ROWID, SALARY
  INTO TEMP_ROWID, TEMP_SALARY
  FROM EMPLOYEE
 WHERE EMP_ID = 100;   -- illustrative single-row predicate

UPDATE EMPLOYEE
   SET SALARY = TEMP_SALARY * 1.5
 WHERE ROWID = TEMP_ROWID;

A ROWID value is not constant in a database, so don't hard-code a ROWID value in your SQL and applications.
Use the ROWNUM pseudocolumn to limit the number of rows returned. If you're not sure how many rows a SELECT statement will return, use ROWNUM to restrict the number of rows returned. The following statement would not return 100 or more rows:

SELECT EMPLOYEE.SS#, DEPARTMENT.DEPT_NAME
  FROM EMPLOYEE, DEPARTMENT
 WHERE EMPLOYEE.DEPT_ID = DEPARTMENT.DEPT_ID
   AND ROWNUM < 100;

Additional Tips
• Do not use the set operator UNION if the objective can be achieved through a UNION ALL; UNION incurs an extra sort operation.
• Select ONLY those columns in a query which are required. Extra columns which are not actually used incur more I/O on the database and increase network traffic.
• Do not use the keyword DISTINCT if the objective can be achieved otherwise; DISTINCT incurs an extra sort operation.
• If it is required to use a composite index, try to use the "leading" column in the WHERE clause. Though an index skip scan is possible, it incurs extra cost in creating virtual indexes and may not always be possible, depending on the cardinality of the leading columns.
• There should not be any Cartesian product in the query unless there is a definite requirement for it.
• It is always better to write separate SQL statements for different tasks, but if you must use one SQL statement, then you can make a very complex statement slightly less complex by using the UNION ALL operator.
• Joins to complex views are not recommended, particularly joins from one complex view to another. Often this results in the entire view being instantiated, and then the query is run against the view data.
• Querying from a view requires all tables from the view to be accessed for the data to be returned. If that is not required, then do not use the view. Instead, use the base table(s), or if necessary, define a new view.
• While querying a partitioned table, try to use the partition key in the WHERE clause if possible. This will ensure partition pruning.
• Avoid doing an ORDER BY on a large data set, especially if the response time is important.
• Use CASE statements instead of DECODE (especially where nested DECODEs are involved) because they increase the readability of the query immensely.
• Do not use HINTS unless the performance gains are clear.
• Check whether the statistics for the objects used in the query are up to date. If not, use the DBMS_STATS package to collect them.
• It is always good to understand the data, both functionally and in its diversity and volume, in order to tune the query. Selectivity (predicate) and cardinality (skew) factors have a big impact on the query plan. Use of statistics and histograms can drive the query towards a better plan.
• Read the explain plan and try to make the largest restriction (filter) the driving site for the query, followed by the next largest; this will minimize the time spent on I/O and execution in subsequent phases of the plan.
• Queries tend to perform worse as they age, due to volume increases, structural changes in the database and application, upgrades, etc. Use the Automatic Workload Repository (AWR) and Automatic Database Diagnostic Monitor (ADDM) to better understand changes in the execution plan and throughput of top queries over a period of time.
• SQL Tuning Advisor and SQL Access Advisor can be used for system advice on tuning specific SQL and their join and access paths; however, the advice generated by these tools may not always be applicable.

Disclaimer: the points listed above are only pointers and may not work under every circumstance. This checklist can be used as a reference while fixing performance problems in the Oracle database.

Suggested further reading:
• Materialized Views
• Advanced Replication
• Change Data Capture (Asynchronous)
• Automatic Workload Repository (AWR) and Automatic Database Diagnostic Monitor (ADDM)
• Partitioning strategies

How to Use Explain Plan
(The deck shows the same query's execution plans as screenshots: before tuning, about two minutes; after tuning, a few seconds.)

Introduction to HINTS
Optimizer hints can be used with SQL statements to alter execution plans for better execution.
Types of Hints
Hints can be of the following general types:
• Single-table: single-table hints are specified on one table. INDEX and USE_NL are examples of single-table hints.
• Multi-table: multi-table hints are like single-table hints, except that the hint can specify one or more tables or views. LEADING is an example of a multi-table hint. Note that USE_NL(table1 table2) is not considered a multi-table hint because it is actually a shortcut for USE_NL(table1) and USE_NL(table2).
• Query block: query block hints operate on single query blocks. STAR_TRANSFORMATION and UNNEST are examples of query block hints.
• Statement: statement hints apply to the entire SQL statement. ALL_ROWS is an example of a statement hint.

Hints by Category
Optimizer hints are grouped into the following categories:
• Hints for optimization approaches and goals
• Hints for access paths
• Hints for query transformations
• Hints for join orders
• Hints for join operations
• Hints for parallel execution
• Additional hints

Hints for Optimization Approaches and Goals
The following hints let you choose between optimization approaches and goals: ALL_ROWS and FIRST_ROWS(n).
If a SQL statement has a hint specifying an optimization approach and goal, then the optimizer uses the specified approach regardless of the presence or absence of statistics, the value of the OPTIMIZER_MODE initialization parameter, and the OPTIMIZER_MODE parameter of the ALTER SESSION statement.
The optimizer goal applies only to queries submitted directly. Use hints to specify the access path for any SQL statements submitted from within PL/SQL; the ALTER SESSION ... SET OPTIMIZER_MODE statement does not affect SQL that is run from within PL/SQL.

ALL_ROWS, FIRST_ROWS(n)
If you specify either the ALL_ROWS or the FIRST_ROWS(n) hint in a SQL statement, and the data dictionary does not have statistics about the tables accessed by the statement, then the optimizer uses default statistical values, such as allocated storage for such tables, to estimate the missing statistics and to subsequently choose an execution plan. These estimates might not be as accurate as those gathered by the DBMS_STATS package, so you should use the DBMS_STATS package to gather statistics. If you specify hints for access paths or join operations along with either the ALL_ROWS or FIRST_ROWS(n) hint, then the optimizer gives precedence to the access paths and join operations specified by those hints.

Example: ALL_ROWS vs. FIRST_ROWS(n)
Assume that we have a simple query that selects 1,000,000 rows from the customer table and orders the result by customer name:

select cust_name from customer order by cust_name;

Let's also assume that we have an index on the cust_name column. The SQL optimizer has a choice of methods to produce the result set:
• Choice 1: the database can use the cust_name index to retrieve the customer table rows. This alleviates the need to sort the result set at the end of the query, but using the index has the downside of causing additional I/O within the database as the index nodes are accessed.
• Choice 2: the database can perform a parallel full table scan against the table and then sort the result set on disk. This execution plan will generally result in less overall disk I/O than using the index, but the downside of this optimization technique is that no rows from the query are available until the entire query has completed. For a giant query, this could take several minutes.

Hence, we see two general approaches to SQL query optimization. The use of indexes to avoid sorting has been codified within Oracle as the first_rows optimization technique. Under first_rows optimization, the optimizer's goal is to begin returning rows to the query as quickly as possible, even if it means extra disk I/O. It gives preference to index scans over full scans (even when an index scan is not good). It prefers nested loops over hash joins because a nested loop returns data as it is selected. The cost of the query is not the only criterion for choosing the execution plan; the optimizer chooses the plan that helps fetch the first rows fast. This mode may be good for the interactive client-server model. In most OLTP systems, where users want to see data fast on their screens, this optimizer mode is very handy.

The all_rows optimizer goal is designed to minimize overall machine resources. Under all_rows optimization the goal is to minimize the amount of machine resources and disk I/O for the query. Hence, the all_rows optimizer mode tends to favor full table scans, and it is generally used in large data warehouses where immediate response time is not required.
Important facts about ALL_ROWS:
• ALL_ROWS considers both index scans and full scans and uses them based on their contribution to the overall query. If the selectivity of a column is low (few rows match), the optimizer may use an index to fetch the data (for example, WHERE employee_code = 7712), but if the selectivity of the column is quite high (WHERE deptno = 10), the optimizer may consider doing a full table scan. With ALL_ROWS, the optimizer has more freedom to do its job at its best.
• Good for OLAP systems, where work happens in batches/procedures. (Some reports may still use FIRST_ROWS, depending upon the anxiety level of the report reviewers.)

Hints for Access Paths
Each of the following hints instructs the optimizer to use a specific access path for a table:
FULL, CLUSTER, HASH, INDEX, NO_INDEX, INDEX_ASC, INDEX_COMBINE, INDEX_JOIN, INDEX_DESC, INDEX_FFS, NO_INDEX_FFS, INDEX_SS, INDEX_SS_ASC, INDEX_SS_DESC, NO_INDEX_SS

FULL Hint
SELECT /*+ FULL(e) */ employee_id, last_name
  FROM hr.employees e
 WHERE last_name LIKE :b1;

Oracle Database performs a full table scan on the employees table to execute this statement, even if there is an index on the last_name column that is made available by the condition in the WHERE clause. The employees table has alias e in the FROM clause, so the hint must refer to the table by its alias rather than by its name. Do not specify schema names in the hint even if they are specified in the FROM clause.

NO_INDEX Hint
SELECT /*+ NO_INDEX(employees emp_empid) */ employee_id
  FROM employees
 WHERE employee_id > 200;

Each parameter serves the same purpose as in the INDEX hint, with the following modifications:
• If this hint specifies a single available index, then the optimizer does not consider a scan on this index. Other indexes not specified are still considered.
• If this hint specifies a list of available indexes, then the optimizer does not consider a scan on any of the specified indexes. Other indexes not specified in the list are still considered.
• If this hint specifies no indexes, then the optimizer does not consider a scan on any index on the table.

Hints for Query Transformations
• FACT: the FACT hint is used in the context of the star transformation to indicate to the transformation that the hinted table should be considered as a fact table.
• MERGE
• NO_EXPAND
• NO_EXPAND_GSET_TO_UNION
• NO_FACT
• NO_MERGE
• NOREWRITE
• REWRITE
• STAR_TRANSFORMATION
• USE_CONCAT

Hints for Join Orders
• LEADING
• ORDERED
The ORDERED hint instructs Oracle to join tables in the order in which they appear in the FROM clause. Oracle recommends that you use the LEADING hint, which is more versatile than the ORDERED hint. When you omit the ORDERED hint from a SQL statement requiring a join, the optimizer chooses the order in which to join the tables. You might want to use the ORDERED hint to specify a join order if you know something that the optimizer does not know about the number of rows selected from each table. Such information lets you choose an inner and outer table better than the optimizer could. The following query is an example of the use of the ORDERED hint:

SELECT /*+ ORDERED */ o.order_id, c.customer_id, l.unit_price * l.quantity
  FROM customers c, order_items l, orders o
 WHERE c.cust_last_name = :b1
   AND o.customer_id = c.customer_id
   AND o.order_id = l.order_id;

Hints for Parallel Execution
Large queries (SELECT statements) can be split into smaller tasks and executed in parallel by multiple slave processes in order to reduce the overall elapsed time. The task of scanning a large table, for example, can be performed in parallel by multiple slave processes: each process scans a part of the table, and the results are merged together at the end. Oracle's parallel query feature can significantly improve the performance of large queries and is very useful in decision support applications, as well as in other environments with large reporting requirements.
• NOPARALLEL
• PARALLEL
• NOPARALLEL_INDEX
• PARALLEL_INDEX
• PQ_DISTRIBUTE

The skeleton below shows how the PARALLEL hint was applied in a Weekly Business Review data provider query (the complex subqueries are elided in the original):

SELECT .........
  FROM ( SELECT *
           FROM ( SELECT /*+ PARALLEL(bo_daily_business_ctrl_fact) */ *
                    FROM bo_daily_business_ctrl_fact
                   WHERE ..............  -- complex subqueries
                ) cy_m
          WHERE ...............
         UNION ALL
         SELECT *
           FROM ( SELECT /*+ PARALLEL(bo_daily_business_ctrl_fact) */ *
                    FROM bo_daily_business_ctrl_fact
                   WHERE ..............  -- complex subqueries
                ) py_m
          WHERE .............. )

Weekly Business Review - Rolling 14 days (data provider query) - analysis:

S.No | Market      | Week id | Old version query, execution time (sec), without PARALLEL | New version query, execution time (sec), with PARALLEL
   1 | Netherlands | 200510  |  98.69 | 45.12
   2 | Netherlands | 200515  | 172.76 | 87.55
   3 | Netherlands | 200520  | 354.75 | 21.49
   4 | Netherlands | 200530  | 214.96 | 35.89
   5 | Netherlands | 200535  | 202.79 | 38.28

Hints for Join Operations
• USE_NL
• NO_USE_NL
• USE_NL_WITH_INDEX
• USE_MERGE
• NO_USE_MERGE
• USE_HASH
• NO_USE_HASH

Improving Query Performance with the WITH Clause
Oracle9i significantly enhances both the functionality and performance of SQL to address the requirements of business intelligence queries. The SELECT statement's WITH clause, introduced in Oracle9i, provides powerful new syntax for enhancing query performance. It optimizes query speed by eliminating redundant processing in complex queries.
• Consider a lengthy query which has multiple references to a single subquery block. Processing subquery blocks can be costly, so recomputing a block every time it is referenced in the SELECT statement is highly inefficient.
• The WITH clause enables a SELECT statement to define the subquery block at the start of the query, process the block just once, label the results, and then refer to the results multiple times.
• The WITH clause, formally known as the subquery factoring clause, is part of the SQL-99 standard. The clause precedes the SELECT statement of a query and starts with the keyword WITH, followed by the subquery definition and a label for the result set. The sketch below shows a basic example of the clause.
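The deck's original example is a screenshot; the following is a minimal reconstruction in its spirit, assuming the classic EMP/DEPT schema. The factored block is computed once and referenced twice:

    WITH dept_costs AS (
      SELECT d.dname, SUM(e.salary) dept_total
        FROM emp e, dept d
       WHERE e.deptno = d.deptno
       GROUP BY d.dname
    )
    SELECT dname, dept_total
      FROM dept_costs
     WHERE dept_total > (SELECT AVG(dept_total) FROM dept_costs)
     ORDER BY dname;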
CASE 1
Execution time: using the WITH clause, 2 minutes; without the WITH clause, 6 to 7 minutes. (The deck shows the WITH clause defined with the label vcexp and the main query referring to that label.)

CASE 2
Execution time: using the WITH clause, 2 minutes; without the WITH clause, 3.5 to 4 minutes. (The deck shows the WITH clause defined with the label wuser and the main query referring to that label.)

CASE 3
Execution time: without the WITH clause, 2 minutes; using the WITH clause, 1.07 minutes.
This query uses the WITH clause to calculate the sum of financials for each financial set and label the results as wfinan. It then checks each financial set's total to see whether any set's total value is greater than one fourth of the total financial value. By using the new clause, the wfinan data is calculated just once, avoiding an extra scan through the large financial table.
Although the primary purpose of the WITH clause is performance improvement, it also makes queries easier to read, write, and maintain. Rather than duplicating a large block repeatedly through a SELECT statement, the block is localized at the very start of the query. Note that the clause can define multiple subquery blocks at the start of a SELECT statement: when several blocks are defined at the start, the query text is greatly simplified and its speed vastly improved.
The SQL WITH clause in Oracle9i significantly improves performance for complex business intelligence queries. Together with the many other SQL enhancements in Oracle9i, the WITH clause extends Oracle's leadership in business intelligence.

Working with the MERGE Statement
Oracle9i introduces a new set of server functionality especially beneficial for the ETL (Extraction, Transformation, and Loading) part of any business intelligence process flow, addressing all the needs of highly scalable data transformation inside the database. One of the most exciting new features addressing the needs of ETL is the SQL statement MERGE. The new SQL combines the sequence of conditional INSERT and UPDATE commands in a single atomic statement, depending on the existence of a record. This operation is commonly known as upsert functionality.
In the deck's example, a simple PL/SQL block took 90 minutes, while the single MERGE statement that replaced it took 7 to 8 minutes.
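The deck shows the two versions as screenshots; here is a minimal sketch of the upsert pattern, with illustrative table and column names (Oracle9i requires both the MATCHED and NOT MATCHED clauses):

    -- Rows that match on employee_id are updated; the rest are inserted.
    MERGE INTO bonuses b
    USING (SELECT employee_id, salary
             FROM employees
            WHERE department_id = 80) e
       ON (b.employee_id = e.employee_id)
     WHEN MATCHED THEN
       UPDATE SET b.bonus = e.salary * 0.1
     WHEN NOT MATCHED THEN
       INSERT (employee_id, bonus)
       VALUES (e.employee_id, e.salary * 0.05);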
CASE 1
Execution time: 2 minutes using the WITH clause; 6 to 7 minutes without it.
(WITH clause defined with label vcexp; main query uses the label vcexp.)
15 December 2018 79
CASE 2
Execution time: 2 minutes using the WITH clause; 3.5 to 4 minutes without it.
(WITH clause defined with label wuser; main query uses the label wuser.)
15 December 2018 80
CASE 3
Execution time: 2 minutes without the WITH clause; 1.07 minutes using it.
This query uses the WITH clause to calculate the sum of financials for each financial set and label the results as wfinan. It then checks each financial set's total to see whether any set's total value is greater than one fourth of the overall financial total. By using the new clause, the wfinan data is calculated just once, avoiding an extra scan through the large financial table.
Although the primary purpose of the WITH clause is performance improvement, it also makes queries easier to read, write, and maintain. Rather than duplicating a large block repeatedly through a SELECT statement, the block is localized at the very start of the query. Note that the clause can define multiple subquery blocks at the start of a SELECT statement: when several blocks are defined at the start, the query text is greatly simplified and its speed vastly improved. The SQL WITH clause in Oracle9i significantly improves performance for complex business intelligence queries. Together with the many other SQL enhancements in Oracle9i, the WITH clause extends Oracle's leadership in business intelligence.
15 December 2018 81
Working with Merge Statement
Oracle9i introduces a new set of server functionality especially beneficial for the ETL (Extraction, Transformation, and Loading) part of any Business Intelligence process flow, addressing the needs of highly scalable data transformation inside the database. One of the most exciting new features addressing the needs of ETL is the SQL statement MERGE. It combines a sequence of conditional INSERT and UPDATE commands in a single atomic statement, depending on the existence of a record. This operation is commonly known as Upsert functionality.
Execution time with a simple PL/SQL block: 90 minutes
15 December 2018 82
Execution time with a single MERGE statement: 7 to 8 minutes
15 December 2018 83
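A minimal sketch of the Upsert pattern described above, assuming hypothetical sales_fact (target) and sales_stage (staging source) tables:
MERGE INTO sales_fact f
USING sales_stage s
ON (f.product_id = s.product_id AND f.sale_date = s.sale_date)
WHEN MATCHED THEN
  -- a matching row exists: update it in place
  UPDATE SET f.amount = f.amount + s.amount
WHEN NOT MATCHED THEN
  -- no matching row: insert a new one
  INSERT (product_id, sale_date, amount)
  VALUES (s.product_id, s.sale_date, s.amount);
Each source row is matched against the target exactly once, so the row-by-row IF/THEN logic of a PL/SQL loop collapses into a single set-based statement.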
How Oracle Analytic Functions Work
"Analytic functions are an important feature of the Oracle database that allows users to enhance SQL's analytical processing capabilities. These functions enable the user to calculate rankings and percentiles, moving window calculations, lag/lead analysis, top-bottom analysis, linear regression analytics and other similar calculation-intensive data processing."
Analytic functions compute an aggregate value based on a group of rows. They differ from aggregate functions in that they return multiple rows for each group. The group of rows is called a window and is defined by the analytic clause. For each row, a "sliding" window of rows is defined. The window determines the range of rows used to perform the calculations for the "current row". Window sizes can be based on either a physical number of rows or a logical interval such as time. Analytic functions are the last set of operations performed in a query except for the final ORDER BY clause. All joins and all WHERE, GROUP BY, and HAVING clauses are completed before the analytic functions are processed. Therefore, analytic functions can appear only in the select list or ORDER BY clause.
15 December 2018 84
Business case: Tuning a query with Analytic Functions
Business Requirement for BI report
Name: Current year vs. previous year comparison, Rolling 14 days (WBR)
Business Inputs: Year, week number, Country, Number of years
Business Output: For a given year, week number, and country, data (sales, number of units) should be captured along with the average of the previous 14 days for each day, from the first day (Monday) of the given week back to the last day (Sunday) of the previous week in the (year minus Number of years)th year.
Input Example: 2005, 14, 'Netherlands', 2
Output Example: Data range is from 2005 week 14 (first day) to 2004 week 13 (last day); data range is from 2004 week 14 (first day) to 2003 week 13 (last day).
15 December 2018 85
One part of the main query:
select to_date(cy_m.timebyday_id,'yyyymmdd') dat,
       'cy' current_year,
       substr(cy_m.timebyday_id,1,4),
       cy_m.cy_sales, cy_m.cy_py_sales, cy_m.cy_units, cy_m.cy_py_units,
       (select week_no from bo_time_by_day where timebyday_id = cy_m.timebyday_id) week_no
from (
  select cy.timebyday_id,
         avg(sum(cy.actual_net_sales_amt + cy.non_product_amt))
           over (order by cy.timebyday_id desc rows between 1 following and 14 following) cy_sales,
         avg(sum(cy.prev_year_actual_net_sales_amt))
           over (order by cy.timebyday_id desc rows between 1 following and 14 following) cy_py_sales,
         avg(sum(cy.actual_trans_count))
           over (order by cy.timebyday_id desc rows between 1 following and 14 following) cy_units,
         avg(sum(cy.prev_year_actual_trans_count))
           over (order by cy.timebyday_id desc rows between 1 following and 14 following) cy_py_units
  from (
    select /*+ parallel(bo_daily_business_ctrl_fact) */ *
    from bo_daily_business_ctrl_fact
    where timebyday_id >= (select to_number(to_char(max(add_months(time_by_day_date,-12))-14,'yyyymmdd'))
                           from bo_time_by_day
                           where year||lpad(week_no,2,0) in (200535))
      and timebyday_id <= (select to_number(to_char(max(time_by_day_date),'yyyymmdd'))
                           from bo_time_by_day
                           where year||lpad(week_no,2,0) in (200535))
      and market_id in (select distinct market_id from bo_region_dim where market_name in ('Netherlands'))
      and sales_comp_flag = 1
      and decode(nvl(bo_daily_business_ctrl_fact.actual_trans_count,0),0,0,
          decode(nvl(bo_daily_business_ctrl_fact.prev_year_actual_trans_count,0),0,0,1)) = 1
  ) cy
  group by timebyday_id
  order by 1 desc
) cy_m
Weekly Business Review - Rolling 14 days (Data Provider query) - Analysis
S.No | Market      | Parameter (Week id) | Old Version query – Without Parallel & Analytic function | Old Version query – With Analytic function | New Version query – With Parallel & Analytic function
1    | Netherlands | 200510              | 21.00 Minutes                                            | 98.69 Seconds                              | 45.12 Seconds
2    | Netherlands | 200515              | 18.41 Minutes                                            | 172.76 Seconds                             | 87.55 Seconds
3    | Netherlands | 200520              | 30.20 Minutes                                            | 354.75 Seconds                             | 21.49 Seconds
4    | Netherlands | 200530              | 25.48 Minutes                                            | 214.96 Seconds                             | 35.89 Seconds
5    | Netherlands | 200535              | 22.17 Minutes                                            | 202.79 Seconds                             | 38.28 Seconds
15 December 2018 86
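The windowing pattern used above is easier to see in isolation. A minimal sketch, assuming a hypothetical daily_sales table with exactly one row per day:
SELECT sale_date,
       sales_amt,
       -- rows are ordered by date descending, so FOLLOWING rows are earlier days:
       -- the frame covers the 14 days immediately before the current row,
       -- excluding the current day itself (the frame starts at 1 FOLLOWING)
       AVG(sales_amt) OVER (ORDER BY sale_date DESC
                            ROWS BETWEEN 1 FOLLOWING AND 14 FOLLOWING) AS prior_14_day_avg
FROM daily_sales;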
Reduce I/O with Oracle cluster tables
Disk I/O is expensive because when Oracle retrieves a block from a data file on disk, the reading process must wait for the physical I/O operation to complete. For queries that access common rows within a table (e.g. get all items in order 123), unordered tables can experience huge I/O as the index retrieves a separate data block for each row requested. If we group like rows together (as measured by the clustering_factor in dba_indexes), we can get all of the rows with a single block read because the rows are stored together. You can use 10g hash cluster tables, single table clusters, or manual row re-sequencing to achieve this goal.
15 December 2018 87
Using Clusters for Performance
Clusters are groups of one or more tables that are physically stored together because they share common columns and are usually used together. Because related rows are physically stored together, disk access time improves.
• Cluster tables that are accessed frequently by the application in join statements.
• Do not cluster tables if the application joins them only occasionally or modifies their common column values frequently. Modifying a row's cluster key value takes longer than modifying the value in an unclustered table, because Oracle might need to migrate the modified row to another block to maintain the cluster.
• Do not cluster tables if the application often performs full table scans of only one of the tables. A full table scan of a clustered table can take longer than a full table scan of an unclustered table. Oracle is likely to read more blocks, because the tables are stored together.
15 December 2018 88
Using Clusters for Performance
• Cluster master-detail tables if you often select a master record and then the corresponding detail records. Detail records are stored in the same data block(s) as the master record, so they are likely still to be in memory when you select them, requiring Oracle to perform less I/O.
• Store a detail table alone in a cluster if you often select many detail records of the same master. This measure improves the performance of queries that select detail records of the same master, but does not decrease the performance of a full table scan on the master table. An alternative is to use an index organized table.
• Do not cluster tables if the data from all tables with the same cluster key value exceeds more than one or two Oracle blocks. To access a row in a clustered table, Oracle reads all blocks containing rows with that value. If these rows take up multiple blocks, then accessing a single row could require more reads than accessing the same row in an unclustered table.
• Do not cluster tables when the number of rows for each cluster key value varies significantly. This wastes space for the low cardinality key values and causes collisions for the high cardinality key values. Collisions degrade performance.
15 December 2018 89
Using Hash Clusters for Performance
Hash clusters group table data by applying a hash function to each row's cluster key value. All rows with the same cluster key value are stored together on disk. Consider the benefits and drawbacks of hash clusters with respect to the needs of the application. You might want to experiment and compare processing times with a particular table as it is stored in a hash cluster, and as it is stored alone with an index. Follow these guidelines for choosing when to use hash clusters:
• Use hash clusters to store tables accessed frequently by SQL statements with WHERE clauses, if the WHERE clauses contain equality conditions that use the same column or combination of columns. Designate this column or combination of columns as the cluster key.
• Store a table in a hash cluster if you can determine how much space is required to hold all rows with a given cluster key value, including rows to be inserted immediately as well as rows to be inserted in the future.
15 December 2018 90
Using Hash Clusters for Performance
• Use sorted hash clusters, where rows corresponding to each value of the hash function are sorted on specific columns in ascending order, when response time can be improved on operations with this sorted clustered data.
• Do not store a table in a hash cluster if the application often performs full table scans and if you must allocate a great deal of space to the hash cluster in anticipation of the table growing. Such full table scans must read all blocks allocated to the hash cluster, even though some blocks might contain few rows. Storing the table alone reduces the number of blocks read by full table scans.
• Do not store a table in a hash cluster if the application frequently modifies the cluster key values. Modifying a row's cluster key value can take longer than modifying the value in an unclustered table, because Oracle might need to migrate the modified row to another block to maintain the cluster.
15 December 2018 91
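A minimal sketch of creating a hash cluster for the "all items in order 123" access pattern mentioned earlier (the names and sizing values are illustrative):
CREATE CLUSTER orders_cluster (order_id NUMBER)
  SIZE 512        -- estimated bytes needed per cluster key value
  HASHKEYS 10000; -- expected number of distinct cluster key values

CREATE TABLE order_lines (
  order_id NUMBER,
  line_no  NUMBER,
  item_id  NUMBER,
  quantity NUMBER
) CLUSTER orders_cluster (order_id);
An equality predicate on order_id then hashes directly to the block(s) holding that key, with no index lookup at all.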
Index Organized Table (IOT)
Index Organized Tables (IOTs) have their primary key data and non-key column data stored within the same B-tree structure. Effectively, the data is stored within the primary key index. There are several reasons to use this type of table:
Why Use Index Organized Tables
Faster Index Access
Index-organized tables provide faster access to table rows by the primary key. Also, since rows are stored in primary key order, range access by the primary key involves minimum block accesses. To allow even faster access to frequently accessed columns, the row overflow storage option can be used to push infrequently accessed non-key columns out of the B-tree leaf block to an optional overflow storage area. This limits the size and content of the row portion actually stored in the B-tree leaf block, resulting in a smaller B-tree and faster access.
15 December 2018 92
Index Organized Table (IOT)
Reduced Storage
Index-organized tables maintain a single storage structure: the B-tree index. Primary key column values are stored only in the B-tree index and not duplicated in the table and index, as happens in a conventional heap-organized table. Because rows of an index-organized table are stored in primary key order, a significant amount of additional storage space savings can be obtained through the use of key compression.
Increased 24x7 Availability
Index-organized tables identify rows using logical ROWIDs based on the primary key. The use of logical ROWIDs enables online reorganization and does not affect the secondary indexes, which remain valid and usable after the reorganization. This capability reduces or eliminates the downtime for reorganization of secondary indexes, making index-organized tables beneficial for 24x7 applications.
15 December 2018 93
Where are Index-Organized Tables Used?
Electronic order processing - An index-organized table is an ideal storage structure for the "orders" table when the query and DML workload is predominantly primary-key based. The heavy volume of DML operations occurring in this type of application usually fragments the table, requiring frequent table reorganization. An index-organized table can be reorganized without invalidating its secondary indexes, and the reorganization can be performed online, thus reducing or even eliminating downtime for the orders table.
Electronic catalogs - An index-organized table can be used to store all types of manufacturing and retail catalogs. Manufacturing catalogs are usually indexed by product attributes based on a primary key, and a retailer's catalog may have a multicolumn primary key matching the hierarchy of products offered. Both types benefit from using index-organized tables. Key compression can be used on these index-organized tables to avoid column value repetition, increasing performance and reducing storage.
Internet searches - These applications maintain lists of keywords, users, or URLs, suitable for storage in an index-organized table, where each row holds a primary key with some additional information. An index-organized table storing URLs and their associated links can considerably speed up access time.
15 December 2018 94
Web portals and auction sites - A prevailing feature of these application types is databases of user names with a subset of the available user information accessed more frequently than the rest. The flexible column placement within index-organized tables provides options for increasing the performance of these applications.
Data Warehousing - Index-organized tables support parallel features for loading, index creation, and scans required for handling large volumes of data. Partitioned index-organized tables are also supported, so that each partition can be loaded concurrently. Data warehousing applications using star schemas can also gain performance and scalability by implementing "fact" tables as index-organized tables for efficient execution of star queries. All these features make index-organized tables suitable for handling large-scale data.
Creation Of Index Organized Tables
• Specify the primary key using a column or table constraint.
• Use the ORGANIZATION INDEX clause.
CREATE TABLE locations
(id          NUMBER(10)   NOT NULL,
 description VARCHAR2(50) NOT NULL,
 map         BLOB,
 CONSTRAINT pk_locations PRIMARY KEY (id)
)
ORGANIZATION INDEX
TABLESPACE iot_tablespace
PCTTHRESHOLD 20
INCLUDING description
OVERFLOW TABLESPACE overflow_tablespace;
15 December 2018 95
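Secondary indexes on an IOT store logical ROWIDs derived from the primary key, as noted above. A minimal sketch on the locations table just created (the index name is illustrative):
CREATE INDEX locations_desc_idx ON locations (description);
-- Because the entries carry logical rather than physical ROWIDs,
-- this index remains valid and usable after the IOT is reorganized online.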
Working with Partitioned Tables and Indexes
Modern enterprises frequently run mission-critical databases containing upwards of several hundred gigabytes and, in many cases, several terabytes of data. These enterprises are challenged by the support and maintenance requirements of very large databases (VLDB), and must devise methods to meet those challenges. One way to meet VLDB demands is to create and use partitioned tables and indexes. Partitioned tables allow your data to be broken down into smaller, more manageable pieces called partitions, or even subpartitions. Indexes can be partitioned in similar fashion. Each partition is stored in its own segment and can be managed individually. It can function independently of the other partitions, thus providing a structure that can be better tuned for availability and performance.
15 December 2018 96
Working with Partitioned Tables and Indexes
If you are using parallel execution, partitions provide another means of parallelization. Operations on partitioned tables and indexes are performed in parallel by assigning different parallel execution servers to different partitions of the table or index.
Partitions and subpartitions of a table or index all share the same logical attributes. For example, all partitions (or subpartitions) in a table share the same column and constraint definitions, and all partitions (or subpartitions) of an index share the same index options. They can, however, have different physical attributes (such as TABLESPACE).
Although you are not required to keep each table or index partition (or subpartition) in a separate tablespace, it is to your advantage to do so. Storing partitions in separate tablespaces enables you to:
• Reduce the possibility of data corruption in multiple partitions
• Back up and recover each partition independently
• Control the mapping of partitions to disk drives (important for balancing I/O load)
• Improve manageability, availability, and performance
Partitioning is transparent to existing applications, and standard DML statements run against partitioned tables. However, an application can be programmed to take advantage of partitioning by using partition-extended table or index names in DML, as shown below.
15 December 2018 97
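A small sketch of a partition-extended table name, using the sales table created in the range-partitioning example later in this section:
SELECT * FROM sales PARTITION (sales_q1);  -- reads only the first-quarter partition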
Partitioning Methods
There are several partitioning methods offered by Oracle Database:
• Range partitioning
• Hash partitioning
• List partitioning
• Composite range-hash partitioning
• Composite range-list partitioning
15 December 2018 98
When to use Range partitioning
Use range partitioning to map rows to partitions based on ranges of column values. This type of partitioning is useful when dealing with data that has logical ranges into which it can be distributed; for example, months of the year. Performance is best when the data distributes evenly across the range. If partitioning by range causes partitions to vary dramatically in size because of unequal distribution, you may want to consider one of the other methods of partitioning.
The example below creates a table of four partitions, one for each quarter of sales. The columns sale_year, sale_month, and sale_day are the partitioning columns, while their values constitute the partitioning key of a specific row. The VALUES LESS THAN clause determines the partition bound: rows with partitioning key values that compare less than the ordered list of values specified by the clause are stored in the partition. Each partition is given a name (sales_q1, sales_q2, ...), and each partition is contained in a separate tablespace (tsa, tsb, ...).
15 December 2018 99
When to use Range partitioning
CREATE TABLE sales
( invoice_no NUMBER,
  sale_year  INT NOT NULL,
  sale_month INT NOT NULL,
  sale_day   INT NOT NULL )
PARTITION BY RANGE (sale_year, sale_month, sale_day)
( PARTITION sales_q1 VALUES LESS THAN (1999, 04, 01) TABLESPACE tsa,
  PARTITION sales_q2 VALUES LESS THAN (1999, 07, 01) TABLESPACE tsb,
  PARTITION sales_q3 VALUES LESS THAN (1999, 10, 01) TABLESPACE tsc,
  PARTITION sales_q4 VALUES LESS THAN (2000, 01, 01) TABLESPACE tsd );
15 December 2018 100
When to use Hash partitioning
Use hash partitioning if your data does not easily lend itself to range partitioning, but you would like to partition for performance and manageability reasons. Hash partitioning provides a method of evenly distributing data across a specified number of partitions. Rows are mapped into partitions based on a hash value of the partitioning key. Creating and using hash partitions gives you a highly tunable method of data placement, because you can influence availability and performance by spreading these evenly sized partitions across I/O devices (striping).
To create hash partitions you specify the following:
• Partitioning method: hash
• Partitioning column(s)
• Number of partitions or individual partition descriptions
The following example creates a hash-partitioned table. The partitioning column is id; four partitions are created and assigned system-generated names, and they are placed in four named tablespaces (gear1, gear2, ...).
CREATE TABLE scubagear
(id   NUMBER,
 name VARCHAR2 (60))
PARTITION BY HASH (id)
PARTITIONS 4
STORE IN (gear1, gear2, gear3, gear4);
15 December 2018 101
When to use list partitioning
Use list partitioning when you require explicit control over how rows map to partitions. You can specify a list of discrete values for the partitioning column in the description for each partition. This is different from range partitioning, where a range of values is associated with a partition, and from hash partitioning, where the user has no control of the row-to-partition mapping.
The list partitioning method is specifically designed for modeling data distributions that follow discrete values. This cannot be easily done by range or hash partitioning because:
• Range partitioning assumes a natural range of values for the partitioning column. It is not possible to group out-of-range values together in partitions.
• Hash partitioning allows no control over the distribution of data, because the data is distributed over the various partitions using the system hash function. Again, this makes it impossible to logically group discrete values for the partitioning columns into partitions.
Further, list partitioning allows unordered and unrelated sets of data to be grouped and organized together very naturally.
15 December 2018 102
When to use list partitioning
Unlike the range and hash partitioning methods, multicolumn partitioning is not supported for list partitioning. If a table is partitioned by list, the partitioning key can consist only of a single column of the table. Otherwise, all columns that can be partitioned by the range or hash methods can be partitioned by the list partitioning method.
The following example creates a list-partitioned table. It creates table q1_sales_by_region, which is partitioned by regions consisting of groups of states.
CREATE TABLE q1_sales_by_region
(deptno          number,
 deptname        varchar2(20),
 quarterly_sales number(10, 2),
 state           varchar2(2))
PARTITION BY LIST (state)
(PARTITION q1_northwest    VALUES ('OR', 'WA'),
 PARTITION q1_southwest    VALUES ('AZ', 'UT', 'NM'),
 PARTITION q1_northeast    VALUES ('NY', 'VM', 'NJ'),
 PARTITION q1_southeast    VALUES ('FL', 'GA'),
 PARTITION q1_northcentral VALUES ('SD', 'WI'),
 PARTITION q1_southcentral VALUES ('OK', 'TX'));
15 December 2018 103
When to use Composite Range-Hash Partitioning
Range-hash partitioning partitions data using the range method and, within each partition, subpartitions it using the hash method. These composite partitions are ideal for both historical data and striping, and provide the improved manageability of range partitioning together with the data placement and parallelism advantages of hash partitioning. The following statement creates a range-hash partitioned table.
In this example, three range partitions are created, each containing eight subpartitions. Because the subpartitions are not named, system-generated names are assigned, but the STORE IN clause distributes them across the four specified tablespaces (ts1, ..., ts4).
15 December 2018 104
When to use Composite Range-Hash Partitioning
CREATE TABLE scubagear
(equipno   NUMBER,
 equipname VARCHAR(32),
 price     NUMBER)
PARTITION BY RANGE (equipno)
SUBPARTITION BY HASH(equipname)
SUBPARTITIONS 8 STORE IN (ts1, ts2, ts3, ts4)
(PARTITION p1 VALUES LESS THAN (1000),
 PARTITION p2 VALUES LESS THAN (2000),
 PARTITION p3 VALUES LESS THAN (MAXVALUE));
The partitions of a range-hash partitioned table are logical structures only, as their data is stored in the segments of their subpartitions. As with partitions, these subpartitions share the same logical attributes. Unlike range partitions in a range-partitioned table, the subpartitions cannot have different physical attributes from the owning partition, although they are not required to reside in the same tablespace.
15 December 2018 105
When to use Composite Range-List Partitioning
Like the composite range-hash partitioning method, the composite range-list partitioning method provides for partitioning based on a two-level hierarchy. The first level of partitioning is based on a range of values, as for range partitioning; the second level is based on discrete values, as for list partitioning. This form of composite partitioning is well suited for historical data, but lets you further group the rows of data based on unordered or unrelated column values. The following example illustrates how range-list partitioning might be used. The example tracks sales data of products by quarters and, within each quarter, groups it by specified states.
15 December 2018 106
When to use Composite Range-List Partitioning
CREATE TABLE quarterly_regional_sales
(deptno     number,
 item_no    varchar2(20),
 txn_date   date,
 txn_amount number,
 state      varchar2(2))
TABLESPACE ts4
PARTITION BY RANGE (txn_date)
SUBPARTITION BY LIST (state)
(PARTITION q1_1999 VALUES LESS THAN (TO_DATE('1-APR-1999','DD-MON-YYYY'))
   (SUBPARTITION q1_1999_northwest VALUES ('OR', 'WA'),
    SUBPARTITION q1_1999_southwest VALUES ('AZ', 'UT', 'NM')),
 PARTITION q2_1999 VALUES LESS THAN (TO_DATE('1-JUL-1999','DD-MON-YYYY'))
   (SUBPARTITION q2_1999_northwest VALUES ('OR', 'WA'),
    SUBPARTITION q2_1999_southwest VALUES ('AZ', 'UT', 'NM')),
 PARTITION q3_1999 VALUES LESS THAN (TO_DATE('1-OCT-1999','DD-MON-YYYY'))
   (SUBPARTITION q3_1999_northwest VALUES ('OR', 'WA'),
    SUBPARTITION q3_1999_southwest VALUES ('AZ', 'UT', 'NM')),
 PARTITION q4_1999 VALUES LESS THAN (TO_DATE('1-JAN-2000','DD-MON-YYYY'))
   (SUBPARTITION q4_1999_northwest VALUES ('OR', 'WA'),
    SUBPARTITION q4_1999_southwest VALUES ('AZ', 'UT', 'NM')));
A row is mapped to a partition by checking whether the value of the partitioning column for the row falls within a specific partition range. The row is then mapped to a subpartition within that partition by identifying the subpartition whose descriptor value list contains a value matching the subpartition column value. For example, the sample row (10, 4532130, '23-Jan-1999', 8934.10, 'WA') maps to subpartition q1_1999_northwest.
15 December 2018 107
Performance and Tuning ORACLE SQL
Kanagaraj_velusamy@rcomext.com, Kanagaraj.Velusamy@yahoo.com
15 December 2018