Query Processing and Query Optimization CS 157B Dennis Le Weishan Wang What are we going to cover 1. 2. 3. 4. 5. 6. Processing Selection Queries Processing Projection Queries and Eliminating Duplicates Processing Join Queries Query Plans and Query Optimization for Complex Relational Expression Query Optimization and Database Analysis in Oracle8 Summary 1. How database processes queries 1. Processing Selection Queries A Query Plan is a particular strategy for reading the indexes and files to process a query Query plan for query 1 select * from Customer where accountId = 101 Query 5 selection based on a hash table index select * from Customer where zipcode = 32306 2. Processing Projection Queries and Eliminating Duplicates Suppose we use the following query to eliminate the duplicates select distinct lastName, firstName from Customer The DBMS has two ways of doing this which are based on sorting and hashing The sorting method sorts the projected records and writes the nonduplicates The hashing method partitions the file into buckets then sorts within buckets to eliminate duplicates 3. Processing Join Queries Suppose we have following three tables: Customer, Rental and Movie B: number of Blocks, R: number of rows The following example shows the same cost of data blocks reads for both plans Two plans have the same cost Improvement - block nested loops join Use the method readBlock() in the outer loop to read the outer table into an array The inner loop reads the inner file(array) Reduce the cost to Bc + Bc x Br Improvement - indexed nested loops join Reduce the reads by using index If one of the attributes is indexed, we should take advantage of it Use indexed read readByAccountId() Two plans have different costs because of the one-to-many relationship Improvement - indexed nested loops join Improvement - indexed nested loops join Each rental has a customer but not every customer has a rental Plan-1 (Rental inner)reads every customer and every rental once, but plan2 (Rental outer)reads some customers many times and some customer no times Which plan to use is depending on the size of the file(table). Chose plan-1 if the customer file is smaller Improvement - sort-merge join Sort both tables and merge the result The cost will depend on the cost of sorting(typically on the order of NlogN for N records) Improvement - hash join Create a hash table for each join table The cost of the hash join depends on the size of the two hash tables(Hc and Hr) The total cost is 2(Bc+Br+Hc+Hr) 4. Query Plans and Query Optimization for Complex Relational Expression Query Optimization Process: 1. Enumerate the query plans 2. Estimate the costs of the plans 3. Choose the best plan for the query 4. Query Plans and Query Optimization for Complex Relational Expression 4. Query Plans and Query Optimization for Complex Relational Expression 4. Query Plans and Query Optimization for Complex Relational Expression 4. Query Plans and Query Optimization for Complex Relational Expression Estimating costs (Query Optimizer) - number of records - size of records in each table - location and types of indexes - ordering of files - memory available for disk cache - etc… 5. Query Optimization and Database Analysis in Oracle8 Tables that keep information about the DataBase Analyze Statement 5. Query Optimization and Database Analysis in Oracle8 6. Summary Principles and Practices of Query Processing Techniques in optimizing Query Processes Process of Query Optimization Commercial products implements these principles 7. References “Principles of Database Systems with internet and JAVA applications,” by Greg Riccardi