Ch11-sol

```Ch11 Database Performance Tuning and Query Optimization
Find the solutions to Problems 1 and 2 based on the following query:
SELECT   EMP_LNAME, EMP_FNAME, EMP_AREACODE, EMP_SEX
FROM     EMPLOYEE
WHERE    EMP_SEX = 'F' AND EMP_AREACODE = '615'
ORDER BY EMP_LNAME, EMP_FNAME;
1. What is the likely data sparsity of the EMP_SEX column?
Because this column has only two possible values (“M” and “F”), the EMP_SEX column has low
sparsity.
2. What indexes should you create? Write the required SQL commands.
You should create an index on EMP_AREACODE and a composite index on EMP_LNAME,
EMP_FNAME. In the following solution, we have named the two indexes EMP_NDX1 and
EMP_NDX2, respectively. The required SQL commands are:
CREATE INDEX EMP_NDX1 ON EMPLOYEE(EMP_AREACODE);
CREATE INDEX EMP_NDX2 ON EMPLOYEE(EMP_LNAME, EMP_FNAME);
3. Using Table 11.4 as an example, create two alternative access plans. Use the following
assumptions:
a. There are 8,000 employees.
b. There are 4,150 female employees.
c. There are 370 employees in area code 615.
d. There are 190 female employees in area code 615.
The solution is shown in Table P11.3.
TABLE P11.3 COMPARING ACCESS PLANS AND I/O COSTS
Plan | Step | Operation                                  | I/O Operations | I/O Cost | Resulting Set Rows | Total I/O Cost
A    | A1   | Full table scan EMPLOYEE;                  | 8,000          | 8,000    | 190                | 8,000
     |      | select only rows with EMP_SEX='F'          |                |          |                    |
     |      | and EMP_AREACODE='615'                     |                |          |                    |
A    | A2   | SORT Operation                             | 190            | 190      | 190                | 8,190
B    | B1   | Index Scan Range of EMP_NDX1               | 370            | 370      | 370                | 370
B    | B2   | Table Access by RowID EMPLOYEE             | 370            | 370      | 370                | 740
B    | B3   | Select only rows with EMP_SEX='F'          | 370            | 370      | 190                | 930
B    | B4   | SORT Operation                             | 190            | 190      | 190                | 1,120
As you examine Table P11.3, note that in Plan A the DBMS uses a full table scan of EMPLOYEE.
The SORT operation is done to order the output by employee last name and first name. In Plan B,
the DBMS uses an Index Scan Range of the EMP_NDX1 index to get the EMPLOYEE RowIDs.
After the EMPLOYEE RowIDs have been retrieved, the DBMS uses those RowIDs to get the
EMPLOYEE rows. Next, the DBMS selects only those rows with EMP_SEX = 'F'. Finally, the DBMS
sorts the result set by employee last name and first name. The table lists a total I/O cost of
1,120 for Plan B, compared with 8,190 for Plan A.
Problems 22-24 are based on the following query:
SELECT   P_CODE, P_DESCRIPT, P_PRICE, P.V_CODE, V_STATE
FROM     PRODUCT P, VENDOR V
WHERE    P.V_CODE = V.V_CODE
  AND    V_STATE = 'NY'
  AND    V_AREACODE = '212'
ORDER BY P_PRICE;
22. What indexes would you recommend?
This query uses the V_STATE and V_AREACODE attributes in its conditional criteria.
Furthermore, the conditional criteria use equality comparisons. Given these conditions, an index on
V_STATE and another index on V_AREACODE are highly recommended.
23. Write the commands required to create the indexes you recommended in Problem 22.
CREATE INDEX VEND_NDX1 ON VENDOR(V_STATE);
CREATE INDEX VEND_NDX2 ON VENDOR(V_AREACODE);
Note that we have used the index names VEND_NDX1 and VEND_NDX2, respectively.
24. Write the command(s) used to generate the statistics for the PRODUCT and VENDOR tables.
ANALYZE TABLE PRODUCT COMPUTE STATISTICS;
ANALYZE TABLE VENDOR COMPUTE STATISTICS;
Problems 29-32 are based on the following query:
SELECT   CUS_CODE, MAX(LINE_UNITS*LINE_PRICE)
FROM     CUSTOMER NATURAL JOIN INVOICE NATURAL JOIN LINE
WHERE    CUS_AREACODE = '615'
GROUP BY CUS_CODE;
29. Assuming that you generate 15,000 invoices per month, what recommendation would you give
the designer about the use of derived attributes?
This query uses the MAX aggregate function to compute the maximum invoice line value by
customer. Because 15,000 invoices are added each month, the INVOICE and LINE tables grow
quickly, and the query will take a considerable amount of time to run as the number of rows
increases. Furthermore, because the MAX aggregate function uses an expression
(LINE_UNITS*LINE_PRICE) instead of a simple table column, the query optimizer is very likely to
perform a full table scan in order to compute the maximum invoice line value. One way to speed up
the query would be to store the derived attribute LINE_TOTAL in the LINE table and create an
index on LINE_TOTAL. The query could then use the index instead of evaluating the expression
for every row.
30. Assuming that you follow the recommendations you gave in Problem 29, how would you
rewrite the query?
SELECT   CUS_CODE, MAX(LINE_TOTAL)
FROM     CUSTOMER NATURAL JOIN INVOICE NATURAL JOIN LINE
WHERE    CUS_AREACODE = '615'
GROUP BY CUS_CODE;
31. What indexes would you recommend for the query you wrote in Problem 30, and what SQL
commands would you use?
The query will benefit from having an index on CUS_AREACODE and an index on CUS_CODE.
Because CUS_CODE is a foreign key in INVOICE, it is very likely that an index already exists. In
any case, the query uses CUS_AREACODE in an equality comparison; therefore, an index on
this column is highly recommended. The command to create this index would be:
CREATE INDEX CUS_NDX1 ON CUSTOMER(CUS_AREACODE);
32. How would you rewrite the query to ensure that the index you created in Problem 31 is used?
You need to use the INDEX optimizer hint:
SELECT   /*+ INDEX(CUSTOMER CUS_NDX1) */ CUS_CODE, MAX(LINE_TOTAL)
FROM     CUSTOMER NATURAL JOIN INVOICE NATURAL JOIN LINE
WHERE    CUS_AREACODE = '615'
GROUP BY CUS_CODE;
```
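
The index choice in Problem 2 can be checked against the row counts given in Problem 3. The following Python sketch is an illustration rather than part of the original solution: it computes the fraction of EMPLOYEE rows that each predicate matches. The term "selectivity" and the names `selectivity`, `MATCHING_ROWS`, and `selectivities` are assumptions introduced here; only the row counts come from the problem statement.

```python
# Row counts taken from assumptions a-d of Problem 3.
TOTAL_EMPLOYEES = 8_000
MATCHING_ROWS = {
    "EMP_SEX = 'F'": 4_150,
    "EMP_AREACODE = '615'": 370,
    "EMP_SEX = 'F' AND EMP_AREACODE = '615'": 190,
}


def selectivity(matching_rows, total_rows):
    """Fraction of the table that satisfies a predicate (lower = more selective)."""
    return matching_rows / total_rows


# Hypothetical helper structure: predicate text -> fraction of rows matched.
selectivities = {
    predicate: selectivity(rows, TOTAL_EMPLOYEES)
    for predicate, rows in MATCHING_ROWS.items()
}

for predicate, fraction in selectivities.items():
    print(f"{predicate:<40} {fraction:.4f}")
```

Under these counts, the EMP_SEX predicate alone matches more than half of the table, while the EMP_AREACODE predicate matches fewer than 5 percent of the rows. This is consistent with indexing EMP_AREACODE rather than the low-sparsity EMP_SEX column.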