Chapter 11
Database Performance Tuning and Query Optimization
Discussion Focus
This chapter focuses on the factors that directly affect database performance. Because performance-tuning techniques can be DBMS-specific, the material in this chapter may not be applicable under all circumstances, nor will it necessarily pertain to all DBMS types.
This chapter is designed to build a foundation for a general understanding of database performance-tuning issues and to help you choose appropriate performance-tuning strategies. (For the most current information about tuning your database, consult the vendor's documentation.)





• Start by covering the basic database performance-tuning concepts. Encourage students to use the web to search for information about the internal architecture (internal processes and database storage formats) of various database systems. Focus on the similarities to lay a common foundation.
• Explain how a DBMS processes SQL queries in general terms, and stress the importance of indexes in query processing. Emphasize the generation of database statistics for optimum query processing.
• Step through the query processing example in Section 11.4, Optimizer Choices.
• Discuss the common practices used to write more efficient SQL code. Emphasize that some practices are DBMS-specific. As technology advances, query optimization logic becomes increasingly sophisticated and effective; therefore, some of the SQL practices illustrated in this chapter may not improve query performance as dramatically as they did in older systems.
• Finally, illustrate the chapter material using the query optimization example in Section 11.8.
Answers to Review Questions
1. What is SQL performance tuning?
SQL performance tuning describes a process, carried out on the client side, that generates SQL queries that return the correct answer in the least amount of time, using the minimum amount of resources at the server end.
2. What is database performance tuning?
DBMS performance tuning describes a process, carried out on the server side, that properly configures the DBMS environment to respond to clients' requests as fast as possible while making optimum use of existing resources.
3. What is the focus of most performance tuning activities, and why does that focus exist?
Most performance-tuning activities focus on minimizing the number of I/O operations, because disk I/O operations are much slower than reading data from the data cache.
4. What are database statistics, and why are they important?
The term database statistics refers to a number of measurements gathered by the DBMS to describe a snapshot of the database objects' characteristics. The DBMS gathers statistics about objects such as tables and indexes, and about available resources such as the number of processors used, processor speed, and temporary space available. Such statistics are used to make critical decisions about improving query processing efficiency.
5. How are database statistics obtained?
Database statistics can be gathered manually by the DBA or automatically by the DBMS. For
example, many DBMS vendors support the ANALYZE command to gather statistics. In
addition, many vendors have their own routines to gather statistics. For example, IBM’s DB2 uses
the RUNSTATS procedure, while Microsoft’s SQL Server uses the UPDATE STATISTICS
procedure and provides the Auto-Update and Auto-Create Statistics options in its initialization
parameters.
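The mechanics can be demonstrated in miniature with SQLite, whose ANALYZE command plays the same role as the vendor routines named above. This is a hypothetical sketch (the table, index, and data are invented), not an Oracle or SQL Server session:

```python
import sqlite3

# Hypothetical sketch: SQLite's ANALYZE stands in for Oracle's ANALYZE and
# SQL Server's UPDATE STATISTICS; the table and index names are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (EMP_NUM INTEGER PRIMARY KEY, EMP_AREACODE TEXT)")
conn.execute("CREATE INDEX EMP_NDX1 ON EMPLOYEE(EMP_AREACODE)")
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?)",
                 [(i, "615" if i % 3 else "901") for i in range(1, 101)])

conn.execute("ANALYZE")  # gather statistics into the sqlite_stat1 catalog table

# Each row records, per index, the total entries and average entries per key.
stats = conn.execute("SELECT tbl, idx, stat FROM sqlite_stat1").fetchall()
print(stats)
```

In SQLite the gathered statistics land in the `sqlite_stat1` catalog table; in other DBMSs they would appear in the vendor's own data dictionary views instead.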
6. What database statistics measurements are typical of tables, indexes, and resources?
For tables, typical measurements include the number of rows, the number of disk blocks used, row
length, the number of columns in each row, the number of distinct values in each column, the
maximum value in each column, the minimum value in each column, what columns have indexes,
and so on.
For indexes, typical measurements include the number and name of columns in the index key, the
number of key values in the index, the number of distinct key values in the index key, histogram of
key values in an index, etc.
For resources, typical measurements include the logical and physical disk block size, the location and size of data files, the number of extents per data file, and so on.
7. How is the processing of SQL DDL statements (such as CREATE TABLE) different from the
processing required by DML statements?
A DDL statement actually updates the data dictionary tables or system catalog, while a DML
statement (SELECT, INSERT, UPDATE and DELETE) mostly manipulates end user data.
8. In simple terms, the DBMS processes queries in three phases. What are those phases, and what
is accomplished in each phase?
The three phases are:
1. Parsing. The DBMS parses the SQL query and chooses the most efficient access/execution plan.
2. Execution. The DBMS executes the SQL query using the chosen execution plan.
3. Fetching. The DBMS fetches the data and sends the result set back to the client.
Parsing involves breaking the query into smaller units and transforming the original SQL query into
a slightly different version of the original SQL code -- but one that is “fully equivalent” and more
efficient. Fully equivalent means that the optimized query results are always the same as the original
query. More efficient means that the optimized query will, almost always, execute faster than the
original query. (Note that we say almost always because many factors affect the performance of a
database. These factors include the network, the client’s computer resources, and even other queries
running concurrently in the same database.)
After the parsing and execution phases are completed, all rows that match the specified condition(s)
have been retrieved, sorted, grouped, and/or – if required – aggregated. During the fetching phase,
the rows of the resulting query result set are returned to the client. During this phase, the DBMS may
use temporary table space to store temporary data.
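The output of the parsing phase, the chosen access plan, can be inspected before any execution takes place. A minimal SQLite illustration (with an invented table) of asking the optimizer what it decided:

```python
import sqlite3

# Minimal sketch (invented table): EXPLAIN QUERY PLAN shows the access plan
# chosen during the parsing/optimization phase, without executing the query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (EMP_NUM INTEGER PRIMARY KEY, EMP_LNAME TEXT)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT EMP_LNAME FROM EMPLOYEE WHERE EMP_NUM = 100"
).fetchall()
detail = " ".join(row[-1] for row in plan)
print(detail)  # a direct search on the primary key, not a full table scan
```

Other DBMSs expose the same information through their own facilities (Oracle's EXPLAIN PLAN, SQL Server's estimated execution plan, and so on).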
9. If indexes are so important, why not index every column in every table? (Include a brief
discussion of the role played by data sparsity.)
Indexing every column in every table will tax the DBMS too much in terms of index-maintenance
processing, especially if the table has many attributes, many rows, and/or requires many inserts,
updates, and/or deletes.
One measure to determine the need for an index is the data sparsity of the column you want to index.
Data sparsity refers to the number of different values a column could possibly have. For example, a
STU_SEX column in a STUDENT table can have only two possible values, “M” or “F”; therefore
this column is said to have low sparsity. In contrast, the STU_DOB column that stores the student
date of birth can have many different date values; therefore, this column is said to have high sparsity.
Knowing the sparsity helps you decide whether or not the use of an index is appropriate. For
example, when you perform a search in a column with low sparsity, you are very likely to read a
high percentage of the table rows anyway; therefore index processing may be unnecessary work.
10. What is the difference between a rule-based optimizer and a cost-based optimizer?
A rule-based optimizer uses a set of preset rules and points to determine the best approach to execute
a query. The rules assign a “cost” to each SQL operation; the costs are then added to yield the cost of
the execution plan.
A cost-based optimizer uses sophisticated algorithms based on the statistics about the objects being
accessed to determine the best approach to execute a query. In this case, the optimizer process adds
up the processing cost, the I/O costs and the resource costs (RAM and temporary space) to come up
with the total cost of a given execution plan.
11. What are optimizer hints and how are they used?
Hints are special instructions for the optimizer that are embedded inside the SQL command text.
Although the optimizer generally performs very well under most circumstances, there are some
circumstances in which the optimizer may not choose the best execution plan. Remember, the
optimizer makes decisions based on the existing statistics. If the statistics are old, the optimizer may
not do a good job in selecting the best execution plan. Even with the current statistics, the optimizer
choice may not be the most efficient one. There are some occasions when the end-user would like to
change the optimizer mode for the current SQL statement. In order to accomplish this task, you have
to use hints.
12. What are some general guidelines for creating and using indexes?
Create indexes for each single attribute used in a WHERE, HAVING, ORDER BY, or GROUP BY clause. If you create indexes on all single attributes used in search conditions, the DBMS will access the table using an index scan instead of a full table scan. For example, if you have an index for
P_PRICE, the condition P_PRICE > 10.00 can be solved by accessing the index, instead of
sequentially scanning all table rows and evaluating P_PRICE for each row. Indexes are also used in
join expressions, such as in CUSTOMER.CUS_CODE = INVOICE.CUS_CODE.
Do not use indexes in small tables or tables with low sparsity. Remember, small tables and low
sparsity tables are not the same thing. A search condition in a table with low sparsity may return a
high percentage of table rows anyway, making the index operation too costly and making the full
table scan a viable option. Using the same logic, do not create indexes for tables with few rows and
few attributes—unless you must ensure the existence of unique values in a column.
Declare primary and foreign keys so the optimizer can use the indexes in join operations. All natural joins and old-style joins will benefit if you declare primary keys and foreign keys, because the optimizer will use the available indexes at join time. (The declaration of a PK or FK will automatically create an index for the declared column.) Also, for the same reason, it is better to write joins using the SQL JOIN syntax. (See Chapter 8, “Advanced SQL.”)
Declare indexes in join columns other than PK/FK. If you do join operations on columns other than
the primary and foreign key, you may be better off declaring indexes in such columns.
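The first guideline can be watched in action with SQLite's EXPLAIN QUERY PLAN (other DBMSs expose the same information through their own EXPLAIN facilities); the table and data here are invented:

```python
import sqlite3

# Invented example: watch the access plan change from a sequential scan to
# an index search once an index exists on the column in the WHERE clause.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PRODUCT (P_CODE TEXT PRIMARY KEY, P_PRICE REAL)")
conn.executemany("INSERT INTO PRODUCT VALUES (?, ?)",
                 [(f"P{i:04d}", i * 0.25) for i in range(500)])

def plan(sql):
    return " ".join(r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT P_PRICE FROM PRODUCT WHERE P_PRICE > 10.00"
before = plan(query)   # no index on P_PRICE yet: sequential scan
conn.execute("CREATE INDEX PROD_PRICE_NDX ON PRODUCT(P_PRICE)")
after = plan(query)    # the range predicate can now be answered via the index
print(before)
print(after)
```

The same index also serves equality predicates and, as noted above, join columns.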
13. Most query optimization techniques are designed to make the optimizer’s work easier. What
factors should you keep in mind if you intend to write conditional expressions in SQL code?
Use simple columns or literals as operands in a conditional expression—avoid the use of conditional
expressions with functions whenever possible. Comparing the contents of a single column to a literal
is faster than comparing to expressions.
Numeric field comparisons are faster than character, date, and NULL comparisons. In search
conditions, comparing a numeric attribute to a numeric literal is faster than comparing a character
attribute to a character literal. In general, numeric comparisons (integer, decimal) are handled faster
by the CPU than character and date comparisons. Because indexes do not store references to null
values, NULL conditions involve additional processing and therefore tend to be the slowest of all
conditional operands.
Equality comparisons are faster than inequality comparisons. As a general rule, equality
comparisons are processed faster than inequality comparisons. For example, P_PRICE = 10.00 is
processed faster because the DBMS can do a direct search using the index on the column. If there are
no exact matches, the condition is evaluated as false. However, if you use an inequality symbol (>,
>=, <, <=) the DBMS must perform additional processing to complete the request. This is because
there would almost always be more “greater than” or “less than” values and perhaps only a few
exactly “equal” values in the index. The slowest (with the exception of NULL) of all comparison
operators is LIKE with wildcard symbols, such as in V_CONTACT LIKE “%glo%”. Also, using the
“not equal” symbol (<>) yields slower searches, especially if the sparsity of the data is high; that is,
if there are many more different values than there are equal values.
Whenever possible, transform conditional expressions to use literals. For example, if your condition
is P_PRICE -10 = 7, change it to read P_PRICE = 17. Also, if you have a composite condition such
as:
P_QOH < P_MIN AND P_MIN = P_REORDER AND P_QOH = 10
change it to read:
P_QOH = 10 AND P_MIN = P_REORDER AND P_MIN > 10
When using multiple conditional expressions, write the equality conditions first. (Note that we did
this in the previous example.) Remember, equality conditions are faster to process than inequality
conditions. Although most RDBMSs will automatically do this for you, paying attention to this
detail lightens the load for the query optimizer. (The optimizer won’t have to do what you have
already done.)
If you use multiple AND conditions, write the condition most likely to be false first. If you use this
technique, the DBMS will stop evaluating the rest of the conditions as soon as it finds a conditional
expression that is evaluated to be false. Remember, for multiple AND conditions to be found true, all
conditions must be evaluated as true. If one of the conditions evaluates to false, everything else is
evaluated as false. Therefore, if you use this technique, the DBMS won’t waste time unnecessarily
evaluating additional conditions. Naturally, the use of this technique implies an implicit knowledge
of the sparsity of the data set.
Whenever possible, try to avoid the use of the NOT logical operator. It is best to transform a SQL
expression containing a NOT logical operator into an equivalent expression. For example:
NOT (P_PRICE > 10.00) can be written as P_PRICE <= 10.00.
Also, NOT (EMP_SEX = 'M') can be written as EMP_SEX = 'F'.
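The guideline about using simple columns or literals as operands has a measurable effect on the access plan. A SQLite sketch with invented data, contrasting the P_PRICE - 10 = 7 form with its literal equivalent P_PRICE = 17:

```python
import sqlite3

# Invented example: an arithmetic expression on the indexed column blocks
# index use, while the algebraically equivalent literal comparison allows it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PRODUCT (P_CODE TEXT, P_PRICE REAL)")
conn.execute("CREATE INDEX PROD_PRICE_NDX ON PRODUCT(P_PRICE)")
conn.executemany("INSERT INTO PRODUCT VALUES (?, ?)",
                 [(f"P{i}", float(i)) for i in range(100)])

def plan(sql):
    return " ".join(r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

expr    = plan("SELECT P_CODE, P_PRICE FROM PRODUCT WHERE P_PRICE - 10 = 7")
literal = plan("SELECT P_CODE, P_PRICE FROM PRODUCT WHERE P_PRICE = 17")
print(expr)     # full scan: the optimizer cannot use the index
print(literal)  # index search on P_PRICE
```

Both queries return the same rows; only the literal form lets the optimizer use the index.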
14. What recommendations would you make for managing the data files in a DBMS with many
tables and indexes?
First, create independent data files for the system, index, and user data table spaces. Put the data files on separate disks or RAID volumes. This ensures that index operations will not conflict with end-user data or data dictionary table access operations.
Second, put high-usage end-user tables in their own table spaces. By doing this, the database
minimizes conflicts with other tables and maximizes storage utilization.
Third, evaluate the creation of indexes based on the access patterns. Identify common search criteria
and isolate the most frequently used columns in search conditions. Create indexes on high usage
columns with high sparsity.
Fourth, evaluate the usage of aggregate queries in your database. Identify columns used in aggregate
functions and determine if the creation of indexes on such columns will improve response time.
Finally, identify columns used in ORDER BY statements and make sure there are indexes on such
columns.
15. What does RAID stand for and what are some commonly used RAID levels?
RAID is the acronym for Redundant Array of Independent Disks. RAID systems use multiple physical disks to create virtual disks (storage volumes), providing a balance between performance improvement and fault tolerance. Table 11.7 in the text shows the commonly used RAID levels. (We have reproduced the table for your convenience.)
TABLE 11.7 Common RAID Configurations

RAID Level  Description
0           The data blocks are spread over separate drives; also known as a
            striped array. Provides increased performance but no fault
            tolerance. (Fault tolerance means that, in case of failure, data
            can be reconstructed and retrieved.) Requires a minimum of two
            drives.
1           The same data blocks are written (duplicated) to separate drives.
            Also referred to as mirroring or duplexing. Provides increased
            read performance and fault tolerance via data redundancy.
            Requires a minimum of two drives.
3           The data are striped across separate drives, and parity data are
            computed and stored in a dedicated drive. (Parity data are
            specially generated data that permit the reconstruction of
            corrupted or missing data.) Provides good read performance and
            fault tolerance via parity data. Requires a minimum of three
            drives.
5           The data and the parity are striped across separate drives.
            Provides good read performance and fault tolerance via parity
            data. Requires a minimum of three drives.
Problem Solutions
Problems 1 and 2 are based on the following query:
SELECT    EMP_LNAME, EMP_FNAME, EMP_AREACODE, EMP_SEX
FROM      EMPLOYEE
WHERE     EMP_SEX = 'F' AND EMP_AREACODE = '615'
ORDER BY  EMP_LNAME, EMP_FNAME;
1. What is the likely data sparsity of the EMP_SEX column?
Because this column has only two possible values (“M” and “F”), the EMP_SEX column has low
sparsity.
2. What indexes should you create? Write the required SQL commands.
You should create an index on EMP_AREACODE and a composite index on EMP_LNAME and EMP_FNAME. In the following solution, we have named the two indexes EMP_NDX1 and
EMP_NDX2, respectively. The required SQL commands are:
CREATE INDEX EMP_NDX1 ON EMPLOYEE(EMP_AREACODE);
CREATE INDEX EMP_NDX2 ON EMPLOYEE(EMP_LNAME, EMP_FNAME);
3. Using Table 11.4 as an example, create two alternative access plans. Use the following
assumptions:
a. There are 8,000 employees.
b. There are 4,150 female employees.
c. There are 370 employees in area code 615.
d. There are 190 female employees in area code 615.
The solution is shown in Table P11.3.
TABLE P11.3 Comparing Access Plans and I/O Costs

Plan  Step  Operation                               I/O         I/O     Resulting  Total
                                                    Operations  Cost    Set Rows   I/O Cost
A     A1    Full table scan EMPLOYEE; select only
            rows with EMP_SEX = 'F' AND
            EMP_AREACODE = '615'                         8,000   8,000       190      8,000
A     A2    SORT operation                                 190     190       190      8,190
B     B1    Index scan range of EMP_NDX1                   370     370       370        370
B     B2    Table access by RowID EMPLOYEE                 370     370       370        740
B     B3    Select only rows with EMP_SEX = 'F'            370     370       190        930
B     B4    SORT operation                                 190     190       190      1,120
As you examine Table P11.3, note that in plan A the DBMS uses a full table scan of EMPLOYEE.
The SORT operation is done to order the output by employee last name and first name. In Plan B,
the DBMS uses an Index Scan Range of the EMP_NDX1 index to get the EMPLOYEE RowIDs.
After the EMPLOYEE RowIDs have been retrieved, the DBMS uses those RowIDs to get the
EMPLOYEE rows. Next, the DBMS selects only those rows with EMP_SEX = 'F'. Finally, the DBMS sorts the result set by employee last name and first name.
Problems 4-6 are based on the following query:
SELECT    EMP_LNAME, EMP_FNAME, EMP_DOB, YEAR(EMP_DOB) AS YEAR
FROM      EMPLOYEE
WHERE     YEAR(EMP_DOB) = 1966;
4. What is the likely data sparsity of the EMP_DOB column?
Because the EMP_DOB column stores employees' dates of birth, this column is very likely to have high data sparsity.
5. Should you create an index on EMP_DOB? Why or why not?
Creating an index on the EMP_DOB column would not help this query, because the query applies the YEAR function to the column. However, if the same column is used in other queries, you may want to re-evaluate the decision not to create the index.
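The point can be reproduced in SQLite (which has no YEAR function; strftime plays that role in this invented sketch), along with the usual workaround of rewriting the predicate as a range on the bare column:

```python
import sqlite3

# Invented sketch: a function wrapped around EMP_DOB (strftime standing in
# for YEAR) forces a full scan; an equivalent range predicate on the bare
# column can use the index instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (EMP_NUM INTEGER, EMP_DOB TEXT)")
conn.execute("CREATE INDEX EMP_DOB_NDX ON EMPLOYEE(EMP_DOB)")
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?)",
                 [(i, f"19{50 + i % 40}-06-15") for i in range(400)])

def plan(sql):
    return " ".join(r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

func = plan("SELECT EMP_NUM, EMP_DOB FROM EMPLOYEE "
            "WHERE strftime('%Y', EMP_DOB) = '1966'")
rng  = plan("SELECT EMP_NUM, EMP_DOB FROM EMPLOYEE "
            "WHERE EMP_DOB >= '1966-01-01' AND EMP_DOB < '1967-01-01'")
print(func)  # SCAN: the index on EMP_DOB is unusable
print(rng)   # SEARCH: the same rows found through the index
```

The range rewrite is how the index could be made useful if the query mattered enough to change.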
6. What type of database I/O operations will likely be used by the query? (See Table 11.3.)
This query will most likely use a full table scan to read all rows of the EMPLOYEE table and generate the required output. We have reproduced the table here to facilitate your discussion:
TABLE 11.3 Sample DBMS Access Plan I/O Operations

Operation               Description
Table Scan (Full)       Reads the entire table sequentially, from the first row
                        to the last row, one row at a time (slowest).
Table Access (Row ID)   Reads a table row directly, using the row ID value
                        (fastest).
Index Scan (Range)      Reads the index first to obtain the row IDs and then
                        accesses the table rows directly (faster than a full
                        table scan).
Index Access (Unique)   Used when a table has a unique index on a column.
Nested Loop             Reads and compares a set of values to another set of
                        values, using a nested-loop style (slow).
Merge                   Merges two data sets (slow).
Sort                    Sorts a data set (slow).
Problems 7-10 are based on the ER model shown in Figure P11.7 and on the query shown after the
figure.
Figure P11.7 The Ch11_SaleCo ER Model
Given the following query
SELECT    P_CODE, P_PRICE
FROM      PRODUCT
WHERE     P_PRICE >= (SELECT AVG(P_PRICE) FROM PRODUCT);
7. Assuming that there are no table statistics, what type of optimization will the DBMS use?
The DBMS will use rule-based optimization.
8. What type of database I/O operations will likely be used by the query? (See Table 11.3.)
The DBMS will likely use a full table scan to compute the average price in the inner subquery. The
DBMS is also very likely to use another full table scan of PRODUCT to execute the outer query.
(We have reproduced the table for your convenience.)
TABLE 11.3 Sample DBMS Access Plan I/O Operations

Operation               Description
Table Scan (Full)       Reads the entire table sequentially, from the first row
                        to the last row, one row at a time (slowest).
Table Access (Row ID)   Reads a table row directly, using the row ID value
                        (fastest).
Index Scan (Range)      Reads the index first to obtain the row IDs and then
                        accesses the table rows directly (faster than a full
                        table scan).
Index Access (Unique)   Used when a table has a unique index on a column.
Nested Loop             Reads and compares a set of values to another set of
                        values, using a nested-loop style (slow).
Merge                   Merges two data sets (slow).
Sort                    Sorts a data set (slow).
9. What is the likely data sparsity of the P_PRICE column?
Because each product is likely to have a different price, the P_PRICE column is likely to have high
sparsity.
10. Should you create an index? Why or why not?
Yes, you should create an index because the column P_PRICE has high sparsity and the column is
very likely to be used in many different SQL queries as part of a conditional expression.
Problems 11-14 are based on the following query:
SELECT    P_CODE, SUM(LINE_UNITS)
FROM      LINE
GROUP BY  P_CODE
HAVING    SUM(LINE_UNITS) > (SELECT MAX(LINE_UNITS) FROM LINE);
11. What is the likely data sparsity of the LINE_UNITS column?
The LINE_UNITS column in the LINE table represents the quantity purchased of a given product in
a given invoice. This column is likely to have many different values and therefore, the column is
very likely to have high sparsity.
12. Should you create an index? If so, what would the index column(s) be and why would you
create that index? If not, explain your reasoning.
Yes, you should create an index on LINE_UNITS. This index is likely to help in the execution of the
inner query that computes the maximum value of LINE_UNITS.
13. Should you create an index on P_CODE? If so, write the SQL command to create that index. If
not, explain your reasoning.
Yes, creating an index on P_CODE will help in query execution. Note, however, that some DBMSs automatically index foreign key columns. If that is not the case in your DBMS, you can manually create an index with the command CREATE INDEX LINE_NDX1 ON LINE(P_CODE). (Note that we have named the index LINE_NDX1.)
14. Write the command to create statistics for this table.
ANALYZE TABLE LINE COMPUTE STATISTICS;
Problems 15-16 are based on the following query:
SELECT    P_CODE, P_QOH*P_PRICE
FROM      PRODUCT
WHERE     P_QOH*P_PRICE > (SELECT AVG(P_QOH*P_PRICE) FROM PRODUCT);
15. What is the likely data sparsity of the P_QOH and P_PRICE columns?
The P_QOH and P_PRICE columns are likely to have high data sparsity.
16. Should you create an index, what would the index column(s) be, and why should you create
that index?
In this case, creating an index on P_QOH or on P_PRICE will not help the query execute faster for
two reasons: first, the WHERE condition on the outer query uses an expression and second, the
aggregate function also uses an expression. When using expressions in the operands of a conditional
expression, the DBMS will not use indexes available on the columns that are used in the expression.
Problems 17-19 are based on the following query:
SELECT    V_CODE, V_NAME, V_CONTACT, V_STATE
FROM      VENDOR
WHERE     V_STATE = 'TN'
ORDER BY  V_NAME;
17. What indexes should you create and why? Write the SQL command to create the indexes.
You should create an index on the V_STATE column of the VENDOR table, because the query's search condition uses that column. In addition, you should create an index on V_NAME, because it is used in the ORDER BY clause. The commands to create the indexes are:
CREATE INDEX VEND_NDX1 ON VENDOR(V_STATE);
CREATE INDEX VEND_NDX2 ON VENDOR(V_NAME);
Note that we have used the index names VEND_NDX1 and VEND_NDX2, respectively.
18. Assume that 10,000 vendors are distributed as shown in Table P11.18. What percentage of
rows will be returned by the query?
Table P11.18 Distribution of Vendors by State

State  Number of Vendors    State  Number of Vendors
AK                    15    MS                    47
AL                    55    NC                   358
AZ                   100    NH                    25
CA                 3,244    NJ                   645
CO                   345    NV                    16
FL                   995    OH                   821
GA                    75    OK                    62
HI                    68    PA                   425
IL                    89    RI                    12
IN                    12    SC                    65
KS                    19    SD                    74
KY                    45    TN                   113
LA                    29    TX                   589
MD                   208    UT                    36
MI                   745    VA                   375
MO                    35    WA                   258
Given the distribution of values in Table P11.18, the query will return 113 of the 10,000 rows, or
1.13% of the total table rows.
19. What type of I/O database operations would be most likely to be used to execute that query?
Assuming that you create the index on V_STATE and that you generate the statistics on the
VENDOR table, the DBMS is very likely to use the index scan range to access the index data and
then use the table access by row ID to get the VENDOR rows.
20. Using Table 11.4 as an example, create two alternative access plans.
The two access plans are shown in Table P11.20.
Table P11.20 Comparing Access Plans and I/O Costs

Plan  Step  Operation                               I/O         I/O     Resulting  Total
                                                    Operations  Cost    Set Rows   I/O Cost
A     A1    Full table scan VENDOR; select only
            rows with V_STATE = 'TN'                    10,000  10,000       113     10,000
A     A2    SORT operation                                 113     113       113     10,113
B     B1    Index scan range of VEND_NDX1                  113     113       113        113
B     B2    Table access by RowID VENDOR                   113     113       113        226
B     B3    SORT operation                                 113     113       113        339
In Plan A, the DBMS uses a full table scan of VENDOR. The SORT operation is done to order the output by vendor name. In Plan B, the DBMS uses an Index Scan Range of the VEND_NDX1 index to get the VENDOR RowIDs. Next, the DBMS uses the RowIDs to get the VENDOR rows. Finally, the DBMS sorts the result set by V_NAME.
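The cumulative Total I/O Cost column of Table P11.20 is simply a running sum of each step's own I/O cost, which a few lines can verify:

```python
# Recompute the cumulative "Total I/O Cost" column of Table P11.20 as a
# running sum of each step's own I/O cost (values from the problem statement).
plan_a = [10_000, 113]        # A1 full scan + filter, A2 sort
plan_b = [113, 113, 113]      # B1 index scan range, B2 row access, B3 sort

def running_totals(costs):
    totals, acc = [], 0
    for cost in costs:
        acc += cost
        totals.append(acc)
    return totals

print(running_totals(plan_a))  # [10000, 10113]
print(running_totals(plan_b))  # [113, 226, 339]
```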
21. Assume that you have 10,000 different products stored in the PRODUCT table and that you
are writing a Web-based interface to list all products with a quantity on hand (P_QOH) that is
less than or equal to the minimum quantity, P_MIN. What optimizer hint would you use to
ensure that your query returns the result set to the Web interface in the least time possible?
Write the SQL code.
You will write your query using the FIRST_ROWS hint to minimize the time it takes to return the
first set of rows to the application. The query would be
SELECT /*+ FIRST_ROWS */ * FROM PRODUCT WHERE P_QOH <= P_MIN;
Problems 22-24 are based on the following query:
SELECT    P_CODE, P_DESCRIPT, P_PRICE, P.V_CODE, V_STATE
FROM      PRODUCT P, VENDOR V
WHERE     P.V_CODE = V.V_CODE
AND       V_STATE = 'NY'
AND       V_AREACODE = '212'
ORDER BY  P_PRICE;
22. What indexes would you recommend?
This query uses the V_STATE and V_AREACODE attributes in its conditional criteria.
Furthermore, the conditional criteria use equality comparisons. Given these conditions, an index on
V_STATE and another index on V_AREACODE are highly recommended.
23. Write the commands required to create the indexes you recommended in Problem 22.
CREATE INDEX VEND_NDX1 ON VENDOR(V_STATE);
CREATE INDEX VEND_NDX2 ON VENDOR(V_AREACODE);
Note that we have used the index names VEND_NDX1 and VEND_NDX2, respectively.
24. Write the command(s) used to generate the statistics for the PRODUCT and VENDOR tables.
ANALYZE TABLE PRODUCT COMPUTE STATISTICS;
ANALYZE TABLE VENDOR COMPUTE STATISTICS;
Problems 25 and 26 are based on the following query:
SELECT    P_CODE, P_DESCRIPT, P_QOH, P_PRICE, V_CODE
FROM      PRODUCT
WHERE     V_CODE = '21344'
ORDER BY  P_CODE;
25. What index would you recommend, and what command would you use?
This query uses one WHERE condition and one ORDER BY clause. The conditional expression
uses the V_CODE column in an equality comparison. In this case, creating an index on the
V_CODE attribute is recommended. If V_CODE is declared to be a foreign key, the DBMS may
already have created such an index automatically. If the DBMS does not generate the index
automatically, create one manually.
The ORDER BY clause uses the P_CODE column. Creating an index on the columns used in an ORDER BY clause is generally recommended. However, because the P_CODE column is the primary key of the PRODUCT table, a unique index already exists for this column; therefore, it is not necessary to create another index on it.
26. How should you rewrite the query to ensure that it uses the index you created in your solution
to Problem 25?
In this case, the only index that should be created is the index on the V_CODE column. Assuming
that such an index is called PROD_NDX1, you could use an optimizer hint as shown next:
SELECT    /*+ INDEX(PROD_NDX1) */ P_CODE, P_DESCRIPT, P_QOH, P_PRICE, V_CODE
FROM      PRODUCT
WHERE     V_CODE = '21344'
ORDER BY  P_CODE;
Problems 27 and 28 are based on the following query:
SELECT    P_CODE, P_DESCRIPT, P_QOH, P_PRICE, V_CODE
FROM      PRODUCT
WHERE     P_QOH < P_MIN
AND       P_MIN = P_REORDER
AND       P_REORDER = 50
ORDER BY  P_QOH;
27. Use the recommendations given in Section 11.5.2 to rewrite the query to produce the required
results more efficiently.
SELECT    P_CODE, P_DESCRIPT, P_QOH, P_PRICE, V_CODE
FROM      PRODUCT
WHERE     P_REORDER = 50
AND       P_MIN = 50
AND       P_QOH < 50
ORDER BY  P_QOH;
This new query rewrites some conditions as follows:
• Because P_REORDER must be equal to 50, it replaces P_MIN = P_REORDER with P_MIN = 50.
• Because P_MIN must be 50, it replaces P_QOH < P_MIN with P_QOH < 50.
Using literals in the query conditions makes the query more efficient. Note that you still need all three conditions in the query.
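That the rewritten predicate really is equivalent to the original can be checked by brute force over a small grid of sample values:

```python
from itertools import product

# Brute-force equivalence check of the original and rewritten predicates
# over a small grid of sample values around the constant 50.
def original(p_qoh, p_min, p_reorder):
    return p_qoh < p_min and p_min == p_reorder and p_reorder == 50

def rewritten(p_qoh, p_min, p_reorder):
    return p_reorder == 50 and p_min == 50 and p_qoh < 50

grid = range(40, 61)
equivalent = all(original(q, m, r) == rewritten(q, m, r)
                 for q, m, r in product(grid, grid, grid))
print(equivalent)  # True
```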
28. What indexes would you recommend? Write the commands to create those indexes.
Because the query uses equality comparisons on P_REORDER and P_MIN, and a range comparison on P_QOH, you should have indexes on those columns. The commands to create the indexes are:
CREATE INDEX PROD_NDX1 ON PRODUCT(P_REORDER);
CREATE INDEX PROD_NDX2 ON PRODUCT(P_MIN);
CREATE INDEX PROD_NDX3 ON PRODUCT(P_QOH);
Problems 29-32 are based on the following query:
SELECT    CUS_CODE, MAX(LINE_UNITS*LINE_PRICE)
FROM      CUSTOMER NATURAL JOIN INVOICE NATURAL JOIN LINE
WHERE     CUS_AREACODE = '615'
GROUP BY  CUS_CODE;
29. Assuming that you generate 15,000 invoices per month, what recommendation would you give the designer about the use of derived attributes?
This query uses the MAX aggregate function to compute the maximum invoice line value by customer. Because the LINE table grows at a rate of 15,000 invoices per month, the query will take a considerable amount of time to run as the number of invoice rows increases. Furthermore, because the MAX aggregate function uses an expression (LINE_UNITS*LINE_PRICE) instead of a simple table column, the query optimizer is very likely to perform a full table scan in order to compute the maximum invoice line value. One way to speed up the query would be to store the derived attribute LINE_TOTAL in the LINE table and create an index on LINE_TOTAL. The query would then benefit from using that index.
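One way to keep such a stored derived attribute consistent is a trigger. A SQLite sketch (the trigger and index names are invented, and a complete implementation would also need an UPDATE trigger):

```python
import sqlite3

# Invented sketch: store the derived attribute LINE_TOTAL and keep it in
# sync with an INSERT trigger, so an index on it can serve the MAX() query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE LINE (
    INV_NUMBER  INTEGER,
    LINE_NUMBER INTEGER,
    LINE_UNITS  INTEGER,
    LINE_PRICE  REAL,
    LINE_TOTAL  REAL               -- derived: LINE_UNITS * LINE_PRICE
);
CREATE TRIGGER LINE_TOTAL_INS AFTER INSERT ON LINE
BEGIN
    UPDATE LINE
    SET    LINE_TOTAL = NEW.LINE_UNITS * NEW.LINE_PRICE
    WHERE  ROWID = NEW.ROWID;
END;
CREATE INDEX LINE_TOTAL_NDX ON LINE(LINE_TOTAL);
""")
conn.execute("INSERT INTO LINE (INV_NUMBER, LINE_NUMBER, LINE_UNITS, LINE_PRICE) "
             "VALUES (1001, 1, 4, 2.50)")
total = conn.execute("SELECT LINE_TOTAL FROM LINE").fetchone()[0]
print(total)
```

The design trade-off is the classic one for derived attributes: extra storage and insert/update overhead in exchange for faster aggregate queries.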
30. Assuming that you follow the recommendations you gave in Problem 29, how would you
rewrite the query?
SELECT    CUS_CODE, MAX(LINE_TOTAL)
FROM      CUSTOMER NATURAL JOIN INVOICE NATURAL JOIN LINE
WHERE     CUS_AREACODE = '615'
GROUP BY  CUS_CODE;
31. What indexes would you recommend for the query you wrote in Problem 30, and what SQL
commands would you use?
The query will benefit from an index on CUS_AREACODE and an index on CUS_CODE. Because CUS_CODE is a foreign key in INVOICE, it is very likely that an index on it already exists. In any case, the query uses CUS_AREACODE in an equality comparison; therefore, an index on this column is highly recommended. The command to create this index would be:
CREATE INDEX CUS_NDX1 ON CUSTOMER(CUS_AREACODE);
32. How would you rewrite the query to ensure that the index you created in Problem 31 is used?
You need to use the INDEX optimizer hint:
SELECT    /*+ INDEX(CUS_NDX1) */ CUS_CODE, MAX(LINE_TOTAL)
FROM      CUSTOMER NATURAL JOIN INVOICE NATURAL JOIN LINE
WHERE     CUS_AREACODE = '615'
GROUP BY  CUS_CODE;