Ch11 Database Performance Tuning-and Query Optimization Find the solutions to Problems 1 and 2 based on the following query: SELECT FROM WHERE ORDER BY EMP_LNAME, EMP_FNAME, EMP_AREACODE, EMP_SEX EMPLOYEE EMP_SEX = ‘F’ AND EMP_AREACODE = ‘615’ EMP_LNAME, EMP_FNAME; 1. What is the likely data sparsity of the EMP_SEX column? Because this column has only two possible values (“M” and “F”), the EMP_SEX column has low sparsity. 2. What indexes should you create? Write the required SQL commands. You should create an index in EMP_AREACODE and a composite index on EMP_LNAME, EMP_FNAME. In the following solution, we have named the two indexes EMP_NDX1 and EMP_NDX2, respectively. The required SQL commands are: CREATE INDEX EMP_NDX1 ON EMPLOYEE(EMP_AREACODE); CREATE INDEX EMP_NDX2 ON EMPLOYEE(EMP_LNAME, EMP_FNAME); 3. Using Table 11.4 as an example, create two alternative access plans. Use the following assumptions: a. There are 8,000 employees. b. There are 4,150 female employees. c. There are 370 employees in area code 615. d. There are 190 female employees in area code 615. The solution is shown in Table P11.3. TABLE P11.3 COMPARING ACCESS PLANS AND I/O COSTS Plan Step A A1 A A2 B B1 B B2 B B3 B B4 Operation I/O Operations Full table scan EMPLOYE Select only rows with EMP_SEX=’F’ and EMP_AREACODE=’615’ SORT Operation Index Scan Range of EMP_NDX1 Table Access by RowID EMPLOYEE Select only rows with EMP_SEX=’F’ SORT Operation 327 I/O Cost Resulting Set Rows Total I/O Cost 8,000 8,000 190 8,000 190 190 190 8,190 370 370 370 370 370 370 370 740 370 370 190 930 190 190 190 1,120 Ch11 Database Performance Tuning-and Query Optimization As you examine Table P11.3, note that in plan A the DBMS uses a full table scan of EMPLOYEE. The SORT operation is done to order the output by employee last name and first name. In Plan B, the DBMS uses an Index Scan Range of the EMP_NDX1 index to get the EMPLOYEE RowIDs. After the EMPLOYEE RowIDs have been retrieved, the DBMS uses those RowIDs to get the EMPLOYEE rows. Next, the DBMS selects only those rows with SEX = ‘F.’ Finally, the DBMS sorts the result set by employee last name and first name. Problems 22-24 are based on the following query: SELECT FROM WHERE AND AND ORDER BY P_CODE, P_DESCRIPT, P_PRICE, PRODUCT.V_CODE, V_STATE PRODUCT P, VENDOR V P.V_CODE = V.V_CODE V_STATE = ‘NY’ V_AREACODE = ‘212’; P_PRICE; 22. What indexes would you recommend? This query uses the V_STATE and V_AREACODE attributes in its conditional criteria. Furthermore, the conditional criteria use equality comparisons. Given these conditions, an index on V_STATE and another index on V_AREACODE are highly recommended. 23. Write the commands required to create the indexes you recommended in Problem 22. CREATE INDEX VEND_NDX1 ON VENDOR(V_STATE); CREATE INDEX VEND_NDX2 ON VENDOR(V_AREACODE); Note that we have used the index names VEND_NDX1 and VEND_NDX2, respectively. 24. Write the command(s) used to generate the statistics for the PRODUCT and VENDOR tables. ANALYZE TABLE PRODUCT COMPUTE STATISTICS; ANALYZE TABLE VENDOR COMPUTER STATISTICS; Problems 29-32 are based on the following query: SELECT FROM WHERE GROUP BY CUS_CODE, MAX(LINE_UNITS*LINE_PRICE) CUSTOMER NATURAL JOIN INVOICE NATURAL JOIN LINE CUS_AREACODE = ‘615’ CUS_CODE; 29. Assuming that you generate 15,000 invoices per month; what recommendation would you give the designer about the use of derived attributes? 328 Ch11 Database Performance Tuning-and Query Optimization This query uses the MAX aggregate function to compute the maximum invoice line value by customer. Because this table increases at a rate of 15,000 rows per month, the query would take considerable amount of time to run as the number of invoice rows increases. Furthermore, because the MAX aggregate function uses an expression (LINE_UNITS*LINE_PRICE) instead of a simple table column, the query optimizer is very likely to perform a full table scan in order to compute the maximum invoice line value. One way to speed up the query would be to store the derived attribute LINE_TOTAL in the LINE_TABLE and create an index on LINE_TOTAL. This way, the query would benefit by using the index to execute the query. 30. Assuming that you follow the recommendations you gave in Problem 29, how would you rewrite the query? SELECT FROM WHERE GROUP BY CUS_CODE, MAX(LINE_TOTAL) CUSTOMER NATURAL JOIN INVOICE NATURAL JOIN LINE CUS_AREACODE = ‘615’ CUS_CODE; 31. What indexes would you recommend for the query you wrote in Problem 30, and what SQL commands would you use? The query will benefit from having an index on CUS_AREACODE and an index on CUS_CODE. Because CUS_CODE is a foreign key on invoice, it’s very likely that an index already exists. In any case, the query uses the CUS_AREACODE in an equality comparison and therefore, an index on this column is highly recommended. The command to create this index would be: CREATE INDEX CUS_NDX1 ON CUSTOMER(CUS_AREACODE); 32. How would you rewrite the query to ensure that the index you created in Problem 31 is used? You need to use the INDEX optimizer hint: SELECT FROM WHERE GROUP BY /*+ INDEX(CUS_NDX1) */ CUS_CODE, MAX(LINE_TOTAL) CUSTOMER NATURAL JOIN INVOICE NATURAL JOIN LINE CUS_AREACODE = ‘615’ CUS_CODE; 329