Query Optimization Engineering - Computer Science

David Maier(*), Leonard Shapiro(**)
(*) Affiliation: Computer Science and Engineering Department, Oregon Graduate Institute
(**) Affiliation: Computer Science Department, Portland State University

Contact Information

(*) Computer Science and Engineering Department, Oregon Graduate Institute of Science and Technology
20000 NW Walker Road, Beaverton, Oregon 97006
Phone: (503) 690-1154, Fax: (503) 690-1553
Email: maier@cse.ogi.edu
Personal Web Page: http://www.cse.ogi.edu/~maier/
WWW Project Page: http://www.cse.ogi.edu/DISC/projects/ereq/columbia/columbia.html

(**) Computer Science Department, Portland State University
P.O. Box 751, Portland, OR 97207-0751
Phone: (503) 725-4208, Fax: (503) 725-3211
Email: len@cs.pdx.edu
Personal Web Page: http://www.cs.pdx.edu/~len
WWW Project Page: http://www.cs.pdx.edu/~len/Columbia

List of Supported Students and Staff

Quan Wang, Graduate Research Assistant
Kavita Hatwal, Graduate Research Assistant

Project Award Information

Award Number: NSF # IRI-9619977 to Oregon Graduate Institute and NSF # IRI-9610013 to Portland State University.
Duration: 9/01/1997 - 8/31/2000
Title: Query Optimization Engineering

Keywords

query, processing, optimization, benchmarks, performance, search, models, operators

Project Summary

This project will explore techniques for optimizing database queries, primarily queries from new application areas. It will focus on engineering issues such as: Which logical operators and rule sets work best for a particular query class? How well do different search strategies work with a given query model? What are the tradeoffs, relative to optimization time and plan quality, of search heuristics and limits on plan shape? How can optimizers handle variability in the runtime evaluation environment?

Publications and Products

• "Exploiting Upper and Lower Bounds in Top-Down Query Optimization", L. Shapiro, D. Maier et al., Proceedings of the International Database Engineering and Applications Symposium, Grenoble, France, July 16-20, 2001.
• "Practical Query Unnesting", Quan Wang, Cesar Galindo-Legaria, David Maier, Milind Joshi, Leonard Shapiro, in preparation.
• "Revisiting Reference Materialization Techniques for Object Query Processing", Q. Wang, D. Maier and L. Shapiro, Proceedings of the International Database Engineering and Applications Symposium, September 2000, Yokohama, Japan.
• "Efficiency in the Columbia Database Query Optimizer", Master's thesis by Yongwen Xu, Portland State University, 1998.
• "A TPC-D Model for Database Query Optimization in Cascades", Master's thesis by Keith Billings, Portland State University, 1997.
• Columbia Query Optimizer source code.

Project Impact

Human Resources

This grant has supported the work of four master's students (Mr. Yongwen Xu, Mrs. Yu Zhang, Mrs. Manjiri Mahajan, and Mrs. Kavita Hatwal) and one PhD student (Mr. Quan Wang). Mr. Xu and Mr. Wang are now working full time for Oracle's Portland development office, and a previous student on this project, Mr. Keith Billings, is working for Informix's Portland development office. Mrs. Yu Zhang is now working full time with the database benchmarking group at IBM/Sequent in Portland. All of them continue to work with our research group, thus building bridges to these nationally recognized database companies.

Education and curriculum development at all levels

Shapiro and Maier both teach graduate database courses, and Shapiro also teaches at the undergraduate level. The research activities of this project have been translated into lectures in these courses. In particular, students from local database companies (Gemstone, Oracle, Informix, IBM/Sequent) who attend these courses have reacted positively to the enrichment they have received through a better understanding of query optimization.
Furthermore, because of this grant, several speakers on query optimization have been brought to Portland, and their talks have been attended by students in our classes, further enriching their educational experience. These speakers have included Cesar Galindo-Legaria, Pedro Celis, Bill McKenna and Goetz Graefe.

Department/institution infrastructure

Because Shapiro and Maier are at different institutions, the project encourages collaboration between the institutions. The PIs and their students meet weekly at OGI, where Shapiro has an office. Through these interactions, Shapiro and Maier serve as channels of information between PSU and OGI, which helps the two institutions form more of a critical mass in computer science.

Industry -- collaborations, transfer of technology, patents.

To ensure that our work is relevant and does not duplicate current art, we have worked with engineers from database companies in the Portland area, specifically Goetz Graefe, Cesar Galindo-Legaria, Milind Joshi and Surajit Chaudhuri, Microsoft; Gary Kelly and Dave Clay, Oracle; Jay Almarode, GemStone Systems, Inc.; Seckin Unlu and Jeff Smits, Intel; and Bill McKenna, Red Brick Systems. These collaborators are valuable both in identifying critical problems in commercial optimization and in evaluating the techniques we develop.

Goals, Objectives, and Targeted Activities

During this final year of our project we have focused our attention on three topics: query optimization techniques for queries involving collection-valued attributes (CVA queries), bounds analysis and memory usage in Columbia, and collaboration with a group in Germany that is using our Columbia optimizer to investigate the optimization of parallel plans.

We are investigating object query optimization techniques, especially for CVA queries. This work may contribute to the current body of work in four respects.
First, we study an algebraic method of unnesting nested CVA queries that is complete for a larger subset of OQL queries than existing algebraic unnesting approaches. Second, we propose a novel model to characterize order properties for intermediate query results. This model enables the relational algebra to represent evaluation plans that are comparable to several efficient physical operators performed on nested data. Third, we present a new materialization technique that performs well for CVAs and shared attributes. Fourth, we develop and validate an appropriate cost model for CVA query optimization. The distinguishing features of the cost model are that it captures clustering relationships between an object and its CVAs, and that it captures more order properties for flattened CVA elements. Our experience demonstrates that these optimization techniques can be easily implemented in a rule-based relational optimizer framework. This work is presently being written up in the PhD thesis of Mr. Wang.

We are also exploring bounds analysis and memory usage in top-down optimizers. The bounds analysis work is reported in the IDEAS01 paper referenced above. Top-down optimizers are notorious for high memory consumption, and we have investigated several possible solutions to this problem. They include the use of interesting orders, deleting physical plans even when they may need to be regenerated, and deleting logical expressions when they will no longer be used. Of these solutions, the use of interesting orders seems to be the most effective.

With the help of an NSF travel grant we have been working with Professor Bernhard Mitschang of the University of Stuttgart on a number of projects. This year we have investigated a new class of physical operators, collectively called Stream Join. For certain combinations of input physical properties, these are the most efficient algorithms for universal quantification. Previous work has covered all other input property combinations.
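To make the idea of bounds in a top-down optimizer more concrete, here is a minimal Python sketch of a memoizing top-down join-order search that prunes with an upper bound (the cost of the best plan found so far) and a lower bound (the output size of the set being optimized). It is an illustration under stated assumptions only, not the Columbia implementation or the algorithm of the IDEAS01 paper. The table names, row counts, the one-in-ten join selectivity, the cost model (sum of intermediate result sizes), and every function and class name are hypothetical.

```python
from itertools import combinations

INF = float("inf")

# Hypothetical base-table row counts; every join keeps 1 row in 10 (toy model).
CARD = {"A": 1000, "B": 100, "C": 10, "D": 5000}
JOIN_DIVISOR = 10


def cardinality(tables):
    """Estimated row count of the join of `tables` under the toy model."""
    rows = 1
    for name in tables:
        rows *= CARD[name]
    return max(1, rows // JOIN_DIVISOR ** (len(tables) - 1))


def splits(tables):
    """Yield each unordered partition of `tables` into two non-empty sets."""
    first, *rest = sorted(tables)
    for size in range(len(rest)):
        for chosen in combinations(rest, size):
            left = frozenset((first, *chosen))
            yield left, tables - left


class Optimizer:
    """Top-down, memoizing join-order search with optional bound pruning."""

    def __init__(self, prune=True):
        self.prune = prune
        self.best = {}    # table set -> (cost, plan), proven optimal
        self.failed = {}  # table set -> largest limit with no plan below it

    def optimize(self, tables, limit=INF):
        """Return (cost, plan) for the cheapest plan costing < limit, or None."""
        if not self.prune:
            limit = INF  # exhaustive search: never cut off a subproblem
        if len(tables) == 1:
            return (0, next(iter(tables))) if limit > 0 else None
        if tables in self.best:
            cost, plan = self.best[tables]
            return (cost, plan) if cost < limit else None
        # Lower bound: every plan for this set pays at least its output size.
        out = cardinality(tables)
        if out >= limit or limit <= self.failed.get(tables, 0):
            return None
        bound, found = limit, None  # upper bound tightens as plans are found
        for left, right in splits(tables):
            lhs = self.optimize(left, bound - out)
            if lhs is None:
                continue
            rhs = self.optimize(right, bound - out - lhs[0])
            if rhs is None:
                continue
            total = out + lhs[0] + rhs[0]
            if total < bound:
                bound, found = total, (total, (lhs[1], rhs[1]))
        if found is None:
            self.failed[tables] = limit  # remember the failed limit
        else:
            self.best[tables] = found
        return found


if __name__ == "__main__":
    query = frozenset(CARD)
    for prune in (False, True):
        cost, plan = Optimizer(prune).optimize(query)
        print("prune =", prune, "cost =", cost, "plan =", plan)
```

In this sketch a search that fails under a given limit is recorded as a failed limit rather than as "no plan exists", so a later request with a looser limit can still succeed. Pruning is meant to change only how much of the search space is visited, not which plan cost is returned; the pruned and exhaustive runs should therefore agree on the cost.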
GPRA Outcome Goals

Discoveries at and across the frontier of science and engineering.

The research described above concerns new algorithms for collection-valued attributes, new approaches to parallel query optimization, and new heuristics for optimizers. All of these initiatives represent discoveries at the frontier of science and engineering. They represent frontiers of science because they explore paradigms not previously considered, and they represent frontiers of engineering because they are directly useful in building optimizers.

Connections between discoveries and their use in service to society.

Little is known about top-down optimization, and our research program is answering many basic questions in that area. While two companies (Microsoft and Tandem) are using top-down optimizers, neither has published details of its implementation. Our research will provide other database companies with fundamental guidance in this promising field.

Timely and relevant information on the national and international science and engineering enterprise.

Professors Shapiro and Mitschang, through their visits to each other's research groups, will deepen communication between researchers in Oregon and in Stuttgart, Germany.

Project References

Our work builds on lessons learned in the EREQ (DARPA) and Revelation (NSF) projects, which are included in the references below. Several papers are available on the project web page.

• S. Daniels, G. Graefe, T. Keller, D. Maier, D. Schmidt and B. Vance. Query Optimization in Revelation, an Overview. IEEE Data Engineering Bulletin, June 1991.
• L. Fegaras and D. Maier. Towards an Effective Calculus for Object Query Languages. ACM-SIGMOD International Conference on Management of Data, San Jose, May 1995.
• G. Graefe, A. Linville and L. D. Shapiro. Sort versus Hash Revisited. IEEE Trans. on Knowledge and Data Eng., December 1994.
• D. Maier, S. Daniels, T. Keller, B. Vance, G. Graefe and W. McKenna. Challenges for Query Processing in Object-Oriented Databases. In Query Processing for Advanced Database Applications, J. C. Freytag, G. Vossen and D. Maier, editors, Morgan Kaufmann, 1994.
• B. Vance and D. Maier. Rapid Bushy Join-order Optimization with Cartesian Products. Proc. ACM SIGMOD Conf., 1996.

Area Background

Query optimizers are one of the main means by which modern database systems achieve their performance advantages. Given a request for data manipulation or retrieval, an optimizer will choose an optimal plan for evaluating the request from among the manifold alternative strategies. Optimizers for commercial relational database systems appeared early in the last decade, and optimization for the basic relational model was considered a solved problem by many. However, new interest in query capabilities for knowledge discovery and on-line analytical processing, in large data warehouses, and against complex multimedia objects has kindled renewed research in optimization. Current optimizers have often proved inadequate to the needs of these new application areas.

Area References

• J. A. Blakeley, W. J. McKenna and G. Graefe. Experiences Building the Open OODB Query Optimizer. Proc. ACM SIGMOD Conf., Washington, DC, May 1993, 287.
• P. Celis. The Query Optimizer in Tandem's ServerWare SQL Product. Proceedings of VLDB '96, p. 592.
• S. Chaudhuri. An Overview of Query Processing in Relational Systems. Proc. ACM PODS, 1998.
• G. Graefe. The Cascades Framework for Query Optimization. Bulletin of the TC on Data Engineering, Vol. 18, No. 3, September 1995, pp. 19-29.
• G. Graefe. Query Evaluation Techniques for Large Databases. ACM Computing Surveys, June 1993.
• G. Graefe. Volcano, An Extensible and Parallel Dataflow Query Processing System. IEEE Trans. on Knowledge and Data Eng., February 1994.
• G. Graefe and W. J. McKenna. The Volcano Optimizer Generator: Extensibility and Efficient Search. Proc. IEEE Int'l. Conf. on Data Eng., Vienna, Austria, April 1993, 209.
• N. Kabra and D. DeWitt. OPT++: An Object-Oriented Implementation for Extensible Database Query Optimization. To appear, VLDB Journal.
• S. Zdonik and D. Maier. Readings in Object-Oriented Database Systems. Morgan Kaufmann, San Mateo, CA, 1990.
