Ch11 Database Performance Tuning and Query Optimization

Chapter 11 Database Performance Tuning and Query Optimization

Discussion Focus

This chapter focuses on the factors that directly affect database performance. Because performance-tuning techniques can be DBMS-specific, the material in this chapter may not be applicable under all circumstances, nor will it necessarily pertain to all DBMS types. This chapter is designed to build a foundation for the general understanding of database performance-tuning issues and to help you choose appropriate performance-tuning strategies. (For the most current information about tuning your database, consult the vendor's documentation.)

Start by covering the basic database performance-tuning concepts. Encourage students to use the web to search for information about the internal architecture (internal processes and database storage formats) of various database systems. Focus on the similarities to lay a common foundation. Explain how a DBMS processes SQL queries in general terms, and stress the importance of indexes in query processing. Emphasize the generation of database statistics for optimum query processing. Step through the query processing example in Section 11.4, Optimizer Choices. Discuss the common practices used to write more efficient SQL code, and emphasize that some practices are DBMS-specific. As technology advances, query optimization logic becomes increasingly sophisticated and effective. Therefore, some of the SQL practices illustrated in this chapter may not improve query performance as dramatically as they did in older systems. Finally, illustrate the chapter material using the query optimization example in Section 11.8.

Answers to Review Questions

1. What is SQL performance tuning?
SQL performance tuning describes a process – on the client side – that will generate an SQL query to return the correct answer in the least amount of time, using the minimum amount of resources at the server end.

2. What is database performance tuning?

DBMS performance tuning describes a process – on the server side – that will properly configure the DBMS environment to respond to clients' requests in the fastest way possible, while making optimum use of existing resources.

3. What is the focus of most performance tuning activities, and why does that focus exist?

Most performance-tuning activities focus on minimizing the number of I/O operations, because I/O operations are much slower than reading data from the data cache.

4. What are database statistics, and why are they important?

The term database statistics refers to a number of measurements gathered by the DBMS to describe a snapshot of the database objects' characteristics. The DBMS gathers statistics about objects such as tables, indexes, and available resources (such as the number of processors used, processor speed, and temporary space available). Such statistics are used to make critical decisions about improving query processing efficiency.

5. How are database statistics obtained?

Database statistics can be gathered manually by the DBA or automatically by the DBMS. For example, many DBMS vendors support SQL's ANALYZE command to gather statistics. In addition, many vendors have their own routines to gather statistics. For example, IBM's DB2 uses the RUNSTATS procedure, while Microsoft's SQL Server uses the UPDATE STATISTICS procedure and provides the Auto-Update and Auto-Create Statistics options in its initialization parameters.

6. What database statistics measurements are typical of tables, indexes, and resources?
For tables, typical measurements include the number of rows, the number of disk blocks used, row length, the number of columns in each row, the number of distinct values in each column, the maximum value in each column, the minimum value in each column, which columns have indexes, and so on.

For indexes, typical measurements include the number and names of the columns in the index key, the number of key values in the index, the number of distinct key values in the index key, the histogram of key values in the index, and so on.

For resources, typical measurements include the logical and physical disk block size, the location and size of data files, the number of extents per data file, and so on.

7. How is the processing of SQL DDL statements (such as CREATE TABLE) different from the processing required by DML statements?

A DDL statement actually updates the data dictionary tables or system catalog, while a DML statement (SELECT, INSERT, UPDATE, and DELETE) mostly manipulates end-user data.

8. In simple terms, the DBMS processes queries in three phases. What are those phases, and what is accomplished in each phase?

The three phases are:

1. Parsing. The DBMS parses the SQL query and chooses the most efficient access/execution plan.
2. Execution. The DBMS executes the SQL query using the chosen execution plan.
3. Fetching. The DBMS fetches the data and sends the result set back to the client.

Parsing involves breaking the query into smaller units and transforming the original SQL query into a slightly different version of the original SQL code – but one that is "fully equivalent" and more efficient. Fully equivalent means that the optimized query results are always the same as those of the original query. More efficient means that the optimized query will, almost always, execute faster than the original query. (Note that we say almost always because many factors affect the performance of a database.
These factors include the network, the client's computer resources, and even other queries running concurrently in the same database.)

After the parsing and execution phases are completed, all rows that match the specified condition(s) have been retrieved, sorted, grouped, and/or – if required – aggregated. During the fetching phase, the rows of the resulting query result set are returned to the client. During this phase, the DBMS may use temporary table space to store temporary data.

9. If indexes are so important, why not index every column in every table? (Include a brief discussion of the role played by data sparsity.)

Indexing every column in every table would tax the DBMS too much in terms of index-maintenance processing, especially if the table has many attributes, many rows, and/or requires many inserts, updates, and/or deletes. One measure used to determine the need for an index is the data sparsity of the column you want to index. Data sparsity refers to the number of different values a column could possibly have. For example, a STU_SEX column in a STUDENT table can have only two possible values, "M" or "F"; therefore, this column is said to have low sparsity. In contrast, the STU_DOB column that stores the student date of birth can have many different date values; therefore, this column is said to have high sparsity. Knowing the sparsity helps you decide whether or not the use of an index is appropriate. For example, when you perform a search in a column with low sparsity, you are very likely to read a high percentage of the table rows anyway; therefore, index processing may be unnecessary work.

10. What is the difference between a rule-based optimizer and a cost-based optimizer?

A rule-based optimizer uses a set of preset rules and points to determine the best approach to execute a query. The rules assign a "cost" to each SQL operation; the costs are then added to yield the cost of the execution plan.
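The rule-based idea can be sketched in a few lines of code. The operation names below echo Table 11.3, but the cost figures are invented purely for illustration (no real DBMS uses these numbers); the point is only that a rule-based optimizer assigns preset costs to operations and keeps the plan with the lowest total:

```python
# Toy rule-based optimizer: preset "costs" per access operation.
# Operation names follow Table 11.3; the numbers are invented.
RULE_COSTS = {
    "full table scan": 10,
    "index scan (range)": 4,
    "table access (row id)": 1,
    "sort": 6,
}

def plan_cost(plan):
    """Total cost of a plan = sum of the preset cost of each step."""
    return sum(RULE_COSTS[step] for step in plan)

plan_a = ["full table scan", "sort"]
plan_b = ["index scan (range)", "table access (row id)", "sort"]

# The optimizer keeps the cheaper plan.
best = min([plan_a, plan_b], key=plan_cost)
print(plan_cost(plan_a), plan_cost(plan_b), best is plan_b)  # → 16 11 True
```

A cost-based optimizer works the same way at this level of abstraction; the difference is that the per-step costs are computed from the gathered statistics rather than looked up from a fixed rule table.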
A cost-based optimizer uses sophisticated algorithms based on statistics about the objects being accessed to determine the best approach to execute a query. In this case, the optimizer process adds up the processing cost, the I/O costs, and the resource costs (RAM and temporary space) to come up with the total cost of a given execution plan.

11. What are optimizer hints and how are they used?

Hints are special instructions for the optimizer that are embedded inside the SQL command text. Although the optimizer generally performs very well under most circumstances, there are some circumstances in which the optimizer may not choose the best execution plan. Remember, the optimizer makes decisions based on the existing statistics. If the statistics are old, the optimizer may not do a good job in selecting the best execution plan. Even with current statistics, the optimizer's choice may not be the most efficient one. There are some occasions when the end user would like to change the optimizer mode for the current SQL statement. To accomplish this task, you have to use hints.

12. What are some general guidelines for creating and using indexes?

Create indexes for each single attribute used in a WHERE, HAVING, ORDER BY, or GROUP BY clause. If you create indexes for all single attributes used in search conditions, the DBMS will access the table using an index scan instead of a full table scan. For example, if you have an index for P_PRICE, the condition P_PRICE > 10.00 can be solved by accessing the index instead of sequentially scanning all table rows and evaluating P_PRICE for each row. Indexes are also used in join expressions, such as in CUSTOMER.CUS_CODE = INVOICE.CUS_CODE.

Do not use indexes in small tables or tables with low sparsity. Remember, small tables and low-sparsity tables are not the same thing.
A search condition in a table with low sparsity may return a high percentage of table rows anyway, making the index operation too costly and making the full table scan a viable option. Using the same logic, do not create indexes for tables with few rows and few attributes – unless you must ensure the existence of unique values in a column.

Declare primary and foreign keys so the optimizer can use the indexes in join operations. All natural joins and old-style joins will benefit if you declare primary keys and foreign keys, because the optimizer will use the available indexes at join time. (The declaration of a PK or FK will automatically create an index for the declared column.) Also, for the same reason, it is better to write joins using the SQL JOIN syntax. (See Chapter 8, "Advanced SQL.")

Declare indexes in join columns other than PK/FK. If you do join operations on columns other than the primary and foreign keys, you may be better off declaring indexes in such columns.

13. Most query optimization techniques are designed to make the optimizer's work easier. What factors should you keep in mind if you intend to write conditional expressions in SQL code?

Use simple columns or literals as operands in a conditional expression – avoid the use of conditional expressions with functions whenever possible. Comparing the contents of a single column to a literal is faster than comparing to expressions.

Numeric field comparisons are faster than character, date, and NULL comparisons. In search conditions, comparing a numeric attribute to a numeric literal is faster than comparing a character attribute to a character literal. In general, numeric comparisons (integer, decimal) are handled faster by the CPU than character and date comparisons. Because indexes do not store references to null values, NULL conditions involve additional processing and therefore tend to be the slowest of all conditional operands.
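You can watch the "avoid functions on indexed columns" rule in action with any DBMS that exposes its access plans. The sketch below uses SQLite (via Python's standard sqlite3 module) only because it is easy to run in class; the table and index names are made up for the demonstration, and a production DBMS will print its plans differently. Wrapping the indexed column in a function forces a full table scan, while the bare comparison uses the index:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE product (p_code TEXT PRIMARY KEY, p_price REAL)")
con.execute("CREATE INDEX prod_price_ndx ON product (p_price)")

# Bare column compared to a literal: the optimizer can use the index.
bare = con.execute(
    "EXPLAIN QUERY PLAN SELECT p_code FROM product WHERE p_price > 10.00"
).fetchall()

# Same column wrapped in a function: the index no longer applies.
wrapped = con.execute(
    "EXPLAIN QUERY PLAN SELECT p_code FROM product WHERE round(p_price) > 10.00"
).fetchall()

print(bare[0][-1])     # e.g. "SEARCH product USING INDEX prod_price_ndx (p_price>?)"
print(wrapped[0][-1])  # e.g. "SCAN product" (a full table scan)
```

The same experiment works in Oracle (EXPLAIN PLAN), SQL Server (SET SHOWPLAN), or DB2 (EXPLAIN); only the command and output format change.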
Equality comparisons are faster than inequality comparisons. As a general rule, equality comparisons are processed faster than inequality comparisons. For example, P_PRICE = 10.00 is processed faster because the DBMS can do a direct search using the index in the column. If there are no exact matches, the condition is evaluated as false. However, if you use an inequality symbol (>, >=, <, <=), the DBMS must perform additional processing to complete the request. This is because there will almost always be more "greater than" or "less than" values, and perhaps only a few exactly "equal" values, in the index. The slowest (with the exception of NULL) of all comparison operators is LIKE with wildcard symbols, as in V_CONTACT LIKE "%glo%". Also, using the "not equal" symbol (<>) yields slower searches, especially if the sparsity of the data is high; that is, if there are many more different values than equal values.

Whenever possible, transform conditional expressions to use literals. For example, if your condition is P_PRICE - 10 = 7, change it to read P_PRICE = 17. Also, if you have a composite condition such as:

    P_QOH < P_MIN AND P_MIN = P_REORDER AND P_QOH = 10

change it to read:

    P_QOH = 10 AND P_MIN = P_REORDER AND P_MIN > 10

When using multiple conditional expressions, write the equality conditions first. (Note that we did this in the previous example.) Remember, equality conditions are faster to process than inequality conditions. Although most RDBMSs will automatically do this for you, paying attention to this detail lightens the load for the query optimizer. (The optimizer won't have to do what you have already done.)

If you use multiple AND conditions, write the condition most likely to be false first. If you use this technique, the DBMS will stop evaluating the rest of the conditions as soon as it finds a conditional expression that is evaluated to be false.
Remember, for multiple AND conditions to be found true, all conditions must be evaluated as true. If one of the conditions evaluates to false, the entire composite condition evaluates to false. Therefore, if you use this technique, the DBMS won't waste time unnecessarily evaluating additional conditions. Naturally, the use of this technique implies an implicit knowledge of the sparsity of the data set.

Whenever possible, try to avoid the use of the NOT logical operator. It is best to transform a SQL expression containing a NOT logical operator into an equivalent expression. For example, NOT (P_PRICE > 10.00) can be written as P_PRICE <= 10.00. Also, NOT (EMP_SEX = 'M') can be written as EMP_SEX = 'F'.

14. What recommendations would you make for managing the data files in a DBMS with many tables and indexes?

First, create independent data files for the system, index, and user data table spaces. Put the data files on separate disks or RAID volumes. This ensures that index operations will not conflict with end-user data or data dictionary table access operations.

Second, put high-usage end-user tables in their own table spaces. By doing this, the database minimizes conflicts with other tables and maximizes storage utilization.

Third, evaluate the creation of indexes based on the access patterns. Identify common search criteria and isolate the most frequently used columns in search conditions. Create indexes on high-usage columns with high sparsity.

Fourth, evaluate the use of aggregate queries in your database. Identify columns used in aggregate functions and determine whether the creation of indexes on such columns will improve response time.

Finally, identify columns used in ORDER BY clauses and make sure there are indexes on such columns.

15. What does RAID stand for and what are some commonly used RAID levels?

RAID is the acronym for Redundant Array of Independent Disks.
RAID is used to provide a balance between performance and fault tolerance. RAID systems use multiple disks to create virtual disks (storage volumes) formed by several individual disks. RAID systems provide performance improvement and fault tolerance. Table 11.7 in the text shows the commonly used RAID levels. (We have reproduced the table for your convenience.)

TABLE 11.7 Common RAID Configurations

RAID Level 0: The data blocks are spread over separate drives. Also known as a striped array. Provides increased performance but no fault tolerance. (Fault tolerance means that in case of failure, data can be reconstructed and retrieved.) Requires a minimum of two drives.

RAID Level 1: The same data blocks are written (duplicated) to separate drives. Also referred to as mirroring or duplexing. Provides increased read performance and fault tolerance via data redundancy. Requires a minimum of two drives.

RAID Level 3: The data are striped across separate drives, and parity data are computed and stored in a dedicated drive. (Parity data are specially generated data that permit the reconstruction of corrupted or missing data.) Provides good read performance and fault tolerance via parity data. Requires a minimum of three drives.

RAID Level 5: The data and the parity are striped across separate drives. Provides good read performance and fault tolerance via parity data. Requires a minimum of three drives.

Problem Solutions

Problems 1 and 2 are based on the following query:

    SELECT EMP_LNAME, EMP_FNAME, EMP_AREACODE, EMP_SEX
    FROM EMPLOYEE
    WHERE EMP_SEX = 'F' AND EMP_AREACODE = '615'
    ORDER BY EMP_LNAME, EMP_FNAME;

1. What is the likely data sparsity of the EMP_SEX column?

Because this column has only two possible values ("M" and "F"), the EMP_SEX column has low sparsity.

2. What indexes should you create? Write the required SQL commands.

You should create an index on EMP_AREACODE and a composite index on EMP_LNAME, EMP_FNAME.
In the following solution, we have named the two indexes EMP_NDX1 and EMP_NDX2, respectively. The required SQL commands are:

    CREATE INDEX EMP_NDX1 ON EMPLOYEE(EMP_AREACODE);
    CREATE INDEX EMP_NDX2 ON EMPLOYEE(EMP_LNAME, EMP_FNAME);

3. Using Table 11.4 as an example, create two alternative access plans. Use the following assumptions:
a. There are 8,000 employees.
b. There are 4,150 female employees.
c. There are 370 employees in area code 615.
d. There are 190 female employees in area code 615.

The solution is shown in Table P11.3.

TABLE P11.3 Comparing Access Plans and I/O Costs

Plan A:
A1. Full table scan of EMPLOYEE; select only rows with EMP_SEX = 'F' and EMP_AREACODE = '615': 8,000 I/O operations, I/O cost 8,000, resulting set 190 rows, cumulative I/O cost 8,000.
A2. SORT operation: 190 I/O operations, I/O cost 190, resulting set 190 rows, cumulative I/O cost 8,190.

Plan B:
B1. Index scan range of EMP_NDX1: 370 I/O operations, I/O cost 370, resulting set 370 rows, cumulative I/O cost 370.
B2. Table access by RowID on EMPLOYEE: 370 I/O operations, I/O cost 370, resulting set 370 rows, cumulative I/O cost 740.
B3. Select only rows with EMP_SEX = 'F': 370 I/O operations, I/O cost 370, resulting set 190 rows, cumulative I/O cost 930.
B4. SORT operation: 190 I/O operations, I/O cost 190, resulting set 190 rows, cumulative I/O cost 1,120.

As you examine Table P11.3, note that in Plan A the DBMS uses a full table scan of EMPLOYEE. The SORT operation is done to order the output by employee last name and first name. In Plan B, the DBMS uses an Index Scan Range of the EMP_NDX1 index to get the EMPLOYEE RowIDs. After the EMPLOYEE RowIDs have been retrieved, the DBMS uses those RowIDs to get the EMPLOYEE rows. Next, the DBMS selects only those rows with EMP_SEX = 'F'. Finally, the DBMS sorts the result set by employee last name and first name.

Problems 4-6 are based on the following query:

    SELECT EMP_LNAME, EMP_FNAME, EMP_DOB, YEAR(EMP_DOB) AS YEAR
    FROM EMPLOYEE
    WHERE YEAR(EMP_DOB) = 1966;

4. What is the likely data sparsity of the EMP_DOB column?

Because the EMP_DOB column stores employees' birthdays, this column is very likely to have high data sparsity.

5. Should you create an index on EMP_DOB?
Why or why not?

Creating an index on the EMP_DOB column would not help this query, because the query uses the YEAR function. However, if the same column is used in other queries, you may want to re-evaluate the decision not to create the index.

6. What type of database I/O operations will likely be used by the query? (See Table 11.3.)

This query more than likely uses a full table scan to read all rows of the EMPLOYEE table and generate the required output. We have reproduced the table here to facilitate your discussion:

TABLE 11.3 Sample DBMS Access Plan I/O Operations

Table Scan (Full): Reads the entire table sequentially, from the first row to the last row, one row at a time (slowest).
Table Access (Row ID): Reads a table row directly, using the row ID value (fastest).
Index Scan (Range): Reads the index first to obtain the row IDs and then accesses the table rows directly (faster than a full table scan).
Index Access (Unique): Used when a table has a unique index in a column.
Nested Loop: Reads and compares a set of values to another set of values, using a nested loop style (slow).
Merge: Merges two data sets (slow).
Sort: Sorts a data set (slow).

Problems 7-10 are based on the ER model shown in Figure P11.7 and on the query shown after the figure.

Figure P11.7 The Ch11_SaleCo ER Model

Given the following query:

    SELECT P_CODE, P_PRICE
    FROM PRODUCT
    WHERE P_PRICE >= (SELECT AVG(P_PRICE) FROM PRODUCT);

7. Assuming that there are no table statistics, what type of optimization will the DBMS use?

The DBMS will use rule-based optimization.

8. What type of database I/O operations will likely be used by the query? (See Table 11.3.)

The DBMS will likely use a full table scan to compute the average price in the inner subquery. The DBMS is also very likely to use another full table scan of PRODUCT to execute the outer query. (We have reproduced the table for your convenience.)
TABLE 11.3 Sample DBMS Access Plan I/O Operations

Table Scan (Full): Reads the entire table sequentially, from the first row to the last row, one row at a time (slowest).
Table Access (Row ID): Reads a table row directly, using the row ID value (fastest).
Index Scan (Range): Reads the index first to obtain the row IDs and then accesses the table rows directly (faster than a full table scan).
Index Access (Unique): Used when a table has a unique index in a column.
Nested Loop: Reads and compares a set of values to another set of values, using a nested loop style (slow).
Merge: Merges two data sets (slow).
Sort: Sorts a data set (slow).

9. What is the likely data sparsity of the P_PRICE column?

Because each product is likely to have a different price, the P_PRICE column is likely to have high sparsity.

10. Should you create an index? Why or why not?

Yes, you should create an index, because the P_PRICE column has high sparsity and the column is very likely to be used in many different SQL queries as part of a conditional expression.

Problems 11-14 are based on the following query:

    SELECT P_CODE, SUM(LINE_UNITS)
    FROM LINE
    GROUP BY P_CODE
    HAVING SUM(LINE_UNITS) > (SELECT MAX(LINE_UNITS) FROM LINE);

11. What is the likely data sparsity of the LINE_UNITS column?

The LINE_UNITS column in the LINE table represents the quantity purchased of a given product in a given invoice. This column is likely to have many different values; therefore, the column is very likely to have high sparsity.

12. Should you create an index? If so, what would the index column(s) be and why would you create that index? If not, explain your reasoning.

Yes, you should create an index on LINE_UNITS. This index is likely to help in the execution of the inner query that computes the maximum value of LINE_UNITS.

13. Should you create an index on P_CODE?
If so, write the SQL command to create that index. If not, explain your reasoning.

Yes, creating an index on P_CODE will help in query execution. However, most DBMSs automatically index foreign key columns. If this is not the case in your DBMS, you can manually create an index using the CREATE INDEX LINE_NDX1 ON LINE(P_CODE) command. (Note that we have named the index LINE_NDX1.)

14. Write the command to create statistics for this table.

    ANALYZE TABLE LINE COMPUTE STATISTICS;

Problems 15-16 are based on the following query:

    SELECT P_CODE, P_QOH*P_PRICE
    FROM PRODUCT
    WHERE P_QOH*P_PRICE > (SELECT AVG(P_QOH*P_PRICE) FROM PRODUCT);

15. What is the likely data sparsity of the P_QOH and P_PRICE columns?

The P_QOH and P_PRICE columns are likely to have high data sparsity.

16. Should you create an index? If so, what would the index column(s) be, and why should you create that index?

In this case, creating an index on P_QOH or on P_PRICE will not help the query execute faster for two reasons: first, the WHERE condition in the outer query uses an expression, and second, the aggregate function also uses an expression. When expressions are used as the operands of a conditional expression, the DBMS will not use the indexes available on the columns that appear in the expression.

Problems 17-19 are based on the following query:

    SELECT V_CODE, V_NAME, V_CONTACT, V_STATE
    FROM VENDOR
    WHERE V_STATE = 'TN'
    ORDER BY V_NAME;

17. What indexes should you create and why? Write the SQL command to create the indexes.

You should create an index on the V_STATE column in the VENDOR table. This new index will help in the execution of this query because the conditional operation uses the V_STATE column in the conditional criteria. In addition, you should create an index on V_NAME, because it is used in the ORDER BY clause.
The commands to create the indexes are:

    CREATE INDEX VEND_NDX1 ON VENDOR(V_STATE);
    CREATE INDEX VEND_NDX2 ON VENDOR(V_NAME);

Note that we have used the index names VEND_NDX1 and VEND_NDX2, respectively.

18. Assume that 10,000 vendors are distributed as shown in Table P11.18. What percentage of rows will be returned by the query?

Table P11.18 Distribution of Vendors by State

    State  Vendors     State  Vendors
    AK     15          MS     47
    AL     55          NC     358
    AZ     100         NH     25
    CA     3244        NJ     645
    CO     345         NV     16
    FL     995         OH     821
    GA     75          OK     62
    HI     68          PA     425
    IL     89          RI     12
    IN     12          SC     65
    KS     19          SD     74
    KY     45          TN     113
    LA     29          TX     589
    MD     208         UT     36
    MI     745         VA     375
    MO     35          WA     258

Given the distribution of values in Table P11.18, the query will return 113 of the 10,000 rows, or 1.13% of the total table rows.

19. What type of I/O database operations would be most likely to be used to execute that query?

Assuming that you create the index on V_STATE and that you generate the statistics on the VENDOR table, the DBMS is very likely to use the index scan range to access the index data and then use the table access by row ID to get the VENDOR rows.

20. Using Table 11.4 as an example, create two alternative access plans.

The two access plans are shown in Table P11.20.

Table P11.20 Comparing Access Plans and I/O Costs

Plan A:
A1. Full table scan of VENDOR; select only rows with V_STATE = 'TN': 10,000 I/O operations, I/O cost 10,000, resulting set 113 rows, cumulative I/O cost 10,000.
A2. SORT operation: 113 I/O operations, I/O cost 113, resulting set 113 rows, cumulative I/O cost 10,113.

Plan B:
B1. Index scan range of VEND_NDX1: 113 I/O operations, I/O cost 113, resulting set 113 rows, cumulative I/O cost 113.
B2. Table access by RowID on VENDOR: 113 I/O operations, I/O cost 113, resulting set 113 rows, cumulative I/O cost 226.
B3. SORT operation: 113 I/O operations, I/O cost 113, resulting set 113 rows, cumulative I/O cost 339.

In Plan A, the DBMS uses a full table scan of VENDOR. The SORT operation is done to order the output by vendor name. In Plan B, the DBMS uses an Index Scan Range of the VEND_NDX1 index to get the VENDOR RowIDs.
Next, the DBMS uses the RowIDs to get the VENDOR rows. Finally, the DBMS sorts the result set by V_NAME.

21. Assume that you have 10,000 different products stored in the PRODUCT table and that you are writing a Web-based interface to list all products with a quantity on hand (P_QOH) that is less than or equal to the minimum quantity, P_MIN. What optimizer hint would you use to ensure that your query returns the result set to the Web interface in the least time possible? Write the SQL code.

You will write your query using the FIRST_ROWS hint to minimize the time it takes to return the first set of rows to the application. The query would be:

    SELECT /*+ FIRST_ROWS */ *
    FROM PRODUCT
    WHERE P_QOH <= P_MIN;

Problems 22-24 are based on the following query:

    SELECT P_CODE, P_DESCRIPT, P_PRICE, PRODUCT.V_CODE, V_STATE
    FROM PRODUCT P, VENDOR V
    WHERE P.V_CODE = V.V_CODE
    AND V_STATE = 'NY'
    AND V_AREACODE = '212'
    ORDER BY P_PRICE;

22. What indexes would you recommend?

This query uses the V_STATE and V_AREACODE attributes in its conditional criteria. Furthermore, the conditional criteria use equality comparisons. Given these conditions, an index on V_STATE and another index on V_AREACODE are highly recommended.

23. Write the commands required to create the indexes you recommended in Problem 22.

    CREATE INDEX VEND_NDX1 ON VENDOR(V_STATE);
    CREATE INDEX VEND_NDX2 ON VENDOR(V_AREACODE);

Note that we have used the index names VEND_NDX1 and VEND_NDX2, respectively.

24. Write the command(s) used to generate the statistics for the PRODUCT and VENDOR tables.

    ANALYZE TABLE PRODUCT COMPUTE STATISTICS;
    ANALYZE TABLE VENDOR COMPUTE STATISTICS;

Problems 25 and 26 are based on the following query:

    SELECT P_CODE, P_DESCRIPT, P_QOH, P_PRICE, V_CODE
    FROM PRODUCT
    WHERE V_CODE = '21344'
    ORDER BY P_CODE;

25. What index would you recommend, and what command would you use?

This query uses one WHERE condition and one ORDER BY clause.
The conditional expression uses the V_CODE column in an equality comparison. In this case, creating an index on the V_CODE attribute is recommended. If V_CODE is declared to be a foreign key, the DBMS may already have created such an index automatically. If the DBMS does not generate the index automatically, create one manually. The ORDER BY clause uses the P_CODE column. Creating an index on the columns used in an ORDER BY clause is recommended. However, because the P_CODE column is the primary key of the PRODUCT table, a unique index already exists for this column; therefore, it is not necessary to create another index on this column.

26. How should you rewrite the query to ensure that it uses the index you created in your solution to Problem 25?

In this case, the only index that should be created is the index on the V_CODE column. Assuming that such an index is called PROD_NDX1, you could use an optimizer hint as shown next:

    SELECT /*+ INDEX(PROD_NDX1) */ P_CODE, P_DESCRIPT, P_QOH, P_PRICE, V_CODE
    FROM PRODUCT
    WHERE V_CODE = '21344'
    ORDER BY P_CODE;

Problems 27 and 28 are based on the following query:

    SELECT P_CODE, P_DESCRIPT, P_QOH, P_PRICE, V_CODE
    FROM PRODUCT
    WHERE P_QOH < P_MIN
    AND P_MIN = P_REORDER
    AND P_REORDER = 50
    ORDER BY P_QOH;

27. Use the recommendations given in Section 11.5.2 to rewrite the query to produce the required results more efficiently.

    SELECT P_CODE, P_DESCRIPT, P_QOH, P_PRICE, V_CODE
    FROM PRODUCT
    WHERE P_REORDER = 50
    AND P_MIN = 50
    AND P_QOH < 50
    ORDER BY P_QOH;

This new query rewrites some conditions as follows: Because P_REORDER must be equal to 50, it replaces P_MIN = P_REORDER with P_MIN = 50. Because P_MIN must be 50, it replaces P_QOH < P_MIN with P_QOH < 50. Having literals in the query conditions makes queries more efficient. Note that you still need all three conditions in the query.

28. What indexes would you recommend? Write the commands to create those indexes.
Because the query uses equality comparisons on P_REORDER and P_MIN and a range comparison on P_QOH, you should have indexes on those columns. The commands to create such indexes are:

    CREATE INDEX PROD_NDX1 ON PRODUCT(P_REORDER);
    CREATE INDEX PROD_NDX2 ON PRODUCT(P_MIN);
    CREATE INDEX PROD_NDX3 ON PRODUCT(P_QOH);

Problems 29-32 are based on the following query:

    SELECT CUS_CODE, MAX(LINE_UNITS*LINE_PRICE)
    FROM CUSTOMER NATURAL JOIN INVOICE NATURAL JOIN LINE
    WHERE CUS_AREACODE = '615'
    GROUP BY CUS_CODE;

29. Assuming that you generate 15,000 invoices per month, what recommendation would you give the designer about the use of derived attributes?

This query uses the MAX aggregate function to compute the maximum invoice line value by customer. Because the LINE table grows by at least 15,000 rows per month (one or more lines per invoice), the query would take a considerable amount of time to run as the number of invoice rows increases. Furthermore, because the MAX aggregate function uses an expression (LINE_UNITS*LINE_PRICE) instead of a simple table column, the query optimizer is very likely to perform a full table scan in order to compute the maximum invoice line value. One way to speed up the query would be to store the derived attribute LINE_TOTAL in the LINE table and create an index on LINE_TOTAL. This way, the query would benefit from using the index.

30. Assuming that you follow the recommendations you gave in Problem 29, how would you rewrite the query?

    SELECT CUS_CODE, MAX(LINE_TOTAL)
    FROM CUSTOMER NATURAL JOIN INVOICE NATURAL JOIN LINE
    WHERE CUS_AREACODE = '615'
    GROUP BY CUS_CODE;

31. What indexes would you recommend for the query you wrote in Problem 30, and what SQL commands would you use?

The query will benefit from having an index on CUS_AREACODE and an index on CUS_CODE. Because CUS_CODE is a foreign key in INVOICE, it's very likely that an index already exists.
In any case, the query uses CUS_AREACODE in an equality comparison; therefore, an index on this column is highly recommended. The command to create this index would be:

    CREATE INDEX CUS_NDX1 ON CUSTOMER(CUS_AREACODE);

32. How would you rewrite the query to ensure that the index you created in Problem 31 is used?

You need to use the INDEX optimizer hint:

    SELECT /*+ INDEX(CUS_NDX1) */ CUS_CODE, MAX(LINE_TOTAL)
    FROM CUSTOMER NATURAL JOIN INVOICE NATURAL JOIN LINE
    WHERE CUS_AREACODE = '615'
    GROUP BY CUS_CODE;
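The derived-attribute recommendation from Problems 29 and 30 is easy to prototype in class. The sketch below uses SQLite from Python's standard sqlite3 module as a stand-in for a production DBMS; the trigger name and sample rows are invented for the demonstration. A trigger keeps LINE_TOTAL current on every insert, and the indexed derived column then serves MAX(LINE_TOTAL) without recomputing LINE_UNITS * LINE_PRICE for each row:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE line (
    inv_number INTEGER,
    p_code     TEXT,
    line_units INTEGER,
    line_price REAL,
    line_total REAL)""")

# Maintain the derived attribute automatically (trigger name is made up).
con.execute("""CREATE TRIGGER line_total_trg AFTER INSERT ON line
BEGIN
    UPDATE line
    SET line_total = NEW.line_units * NEW.line_price
    WHERE rowid = NEW.rowid;
END""")

# Index the derived column so aggregate lookups can use it.
con.execute("CREATE INDEX line_ndx2 ON line (line_total)")

con.executemany(
    "INSERT INTO line (inv_number, p_code, line_units, line_price) VALUES (?, ?, ?, ?)",
    [(1001, "23114", 2, 9.95), (1001, "54778", 5, 24.99), (1002, "23114", 1, 9.95)])

top = con.execute("SELECT MAX(line_total) FROM line").fetchone()[0]
print(top)  # the maximum line total, i.e. 5 * 24.99
```

The trade-off is the classic one for derived attributes: inserts and updates pay a small maintenance cost so that the frequent aggregate query becomes an index lookup instead of a per-row computation.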