Query Optimization Engineering
David Maier(*), Leonard Shapiro(**)
(*) Affiliation: Computer Science and Engineering Department, Oregon Graduate Institute
(**) Affiliation: Computer Science Department, Portland State University
Contact Information
(*) Computer Science and Engineering Department, Oregon Graduate Institute of Science and Technology
20000 NW Walker Road
Beaverton, Oregon 97006
Phone: (503) 690-1154, Fax : (503) 690-1553
Email: maier@cse.ogi.edu
Personal Web Page: http://www.cse.ogi.edu/~maier/
WWW PROJECT PAGE
http://www.cse.ogi.edu/DISC/projects/ereq/columbia/columbia.html
(**)Computer Science Department, Portland State University
P.O. Box 751
Portland, OR 97207-0751
Phone: (503) 725-4208, Fax : (503) 725-3211
Email: len@cs.pdx.edu
Personal Web Page: http://www.cs.pdx.edu/~len
WWW PROJECT PAGE
http://www.cs.pdx.edu/~len/Columbia
List of Supported Students and Staff
Quan Wang, Graduate Research Assistant, Kavita Hatwal, Graduate Research Assistant
Project Award Information
Award Number: NSF # IRI-9619977 to Oregon Graduate Institute and NSF # IRI-9610013 to Portland State University.
Duration: 9/01/1997 - 8/31/2000
Title: Query Optimization Engineering
Keywords
query, processing, optimization, benchmarks, performance, search, models, operators
Project Summary
This project will explore techniques for optimizing database queries, primarily queries from new application
areas. It will focus on engineering questions such as: Which logical operators and rule sets work best for a particular
query class? How well do different search strategies work with a given query model? What are the tradeoffs, relative
to optimization time and plan quality, of search heuristics and limits on plan shape? How can optimizers handle
variability in the runtime evaluation environment?
Publications and Products

“Exploiting Upper and Lower Bounds in Top-Down Query Optimization”, L. Shapiro, D. Maier et al.,
Proceedings of the International Database Engineering and Applications Symposium, Grenoble, France, July 16-20, 2001.

“Practical Query Unnesting”, Quan Wang, Cesar Galindo-Legaria, David Maier, Milind Joshi, Leonard
Shapiro, in preparation.

“Revisiting Reference Materialization Techniques for Object Query Processing”, Q. Wang, D. Maier and L.
Shapiro. Proceedings of the International Database Engineering and Applications Symposium, September
2000, Yokohama, Japan.

"Efficiency in the Columbia Database Query Optimizer", Master's thesis by Yongwen Xu, Portland State
University, 1998

"A TPC-D Model for Database Query Optimization in Cascades", Master's thesis by Keith Billings,
Portland State University, 1997.

Columbia Query Optimizer source code.
Project Impact
Human Resources. This grant has supported the work of four master's students, Mr. Yongwen Xu, Mrs. Yu
Zhang, Mrs. Manjiri Mahajan, and Mrs. Kavita Hatwal, and one PhD student, Mr. Quan Wang. Mr. Xu and Mr. Wang
are now working full time for Oracle's Portland development office, and a previous student on this project, Mr. Keith
Billings, is working for Informix's Portland development office. Mrs. Yu Zhang is now working full time with the
database benchmarking group at IBM/Sequent in Portland. All of them continue to work with our research group,
thus building bridges to these nationally recognized database companies.
Education and curriculum development at all levels. Shapiro and Maier both teach database courses at the graduate and (Shapiro) undergraduate
level. The research activities of this project have been translated into lectures in these courses. In particular,
students who attend these courses from local database companies (Gemstone, Oracle, Informix, IBM/Sequent) have
reacted positively to the enrichment they have received through a better understanding of query optimization.
Furthermore, because of this grant, several speakers in query optimization have been brought to Portland and their
talks have been attended by students in our classes, further enriching their educational experiences. These speakers
have included Cesar Galindo-Legaria, Pedro Celis, Bill McKenna and Goetz Graefe.
Department/institution infrastructure. Because Shapiro and Maier are at different institutions, the project
encourages collaboration between the institutions. The PIs and their students meet weekly at OGI. Shapiro has an
office at OGI. Because of these interactions, Shapiro and Maier serve as channels of information between PSU and
OGI. This results in the two institutions forming more of a critical mass in computer science.
Industry -- collaborations, transfer of technology, patents. To ensure that our work is relevant and does not
duplicate current art, we have worked with engineers from database companies in the Portland area, specifically
Goetz Graefe, Cesar Galindo-Legaria, Milind Joshi and Surajit Chaudhuri, Microsoft; Gary Kelly and Dave Clay,
Oracle; Jay Almarode, GemStone Systems, Inc.; Seckin Unlu and Jeff Smits, Intel; and Bill McKenna, Red Brick
Systems. These collaborators are valuable both in identifying critical problems in commercial optimization and for
evaluating the techniques we develop.
Goals, Objectives, and Targeted Activities
During this final year of our project we have focused our attention on three topics: query optimization
techniques for CVA queries, bounds analysis and memory usage in Columbia, and collaboration with a group
in Germany that is using our Columbia optimizer to investigate the optimization of parallel plans.
We are investigating object query optimization techniques, especially for queries involving collection-valued
attributes, or CVA queries. The work may contribute to the current body of work in four respects. First, we study an
algebraic method of unnesting nested CVA queries that is complete for a larger subset of OQL queries than existing
algebraic unnesting approaches. Second, we propose a novel model to characterize order properties for intermediate
query results. This model enables the relational algebra to represent evaluation plans that are comparable to several
efficient physical operators performed on nested data. Third, we present a new materialization technique that
performs well for CVAs and shared attributes. Fourth, we develop and validate an appropriate cost model for CVA
query optimization. The distinguishing features of the cost model are that it captures clustering relationships between
objects and their CVAs, and that it captures more order properties for flattened CVA elements. Our experience
demonstrates that these optimization techniques can be easily implemented in a rule-based relational optimizer framework. This
work is presently being written up in the PhD thesis of Mr. Wang.
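To make the idea of unnesting a CVA query concrete, the following is a minimal sketch in Python. It is not the algebraic method studied in this project. The data, the field names (name, employees, salary) and the function names are all hypothetical and chosen only for illustration. The nested form evaluates a subquery over the collection-valued attribute once per object, while the unnested form first flattens the attribute into (object, element) pairs and then applies ordinary selection and duplicate-eliminating projection.

```python
# Hypothetical objects with a collection-valued attribute "employees".
departments = [
    {"name": "R&D", "employees": [{"name": "Ann", "salary": 120},
                                  {"name": "Bo", "salary": 90}]},
    {"name": "Ops", "employees": [{"name": "Cy", "salary": 80}]},
    {"name": "Lab", "employees": []},
]


def nested(depts, threshold):
    """Nested form: a per-object subquery over the CVA."""
    return {d["name"] for d in depts
            if any(e["salary"] > threshold for e in d["employees"])}


def unnest(depts):
    """Flatten the CVA into (department name, employee) pairs."""
    for d in depts:
        for e in d["employees"]:
            yield d["name"], e


def unnested(depts, threshold):
    """Unnested form: flatten, select, then project with duplicate removal."""
    return {dept for dept, e in unnest(depts) if e["salary"] > threshold}
```

Both functions return the same set of department names for any threshold. The point of the flat form is that it uses only set-oriented operators that a relational optimizer already knows how to reorder and cost.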
We are also exploring bounds analysis and memory usage in top-down optimizers. The bounds analysis work
is reported in the IDEAS01 paper referenced above. Top-down optimizers are notorious for high memory
consumption, and we have investigated several possible solutions to this problem. They include the use of
interesting orders, deleting physical plans even when they may need to be regenerated, and deleting logical
expressions when they will no longer be used. Of these solutions, the use of interesting orders seems to be the most effective.
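The interplay between memoization and cost bounds in a top-down optimizer can be sketched as follows. This is a toy join-order search in Python, not the Columbia implementation. The table cardinalities, the single shared selectivity (DIVISOR), the cost model (sum of intermediate result sizes) and all function names are assumptions made for the illustration. Each call asks for the cheapest plan for a set of tables that costs less than a given limit. The limit tightens as better plans are found, and the memo records either a winner or the limit below which no plan exists.

```python
import math
from itertools import combinations

# Hypothetical cardinalities; every join divides the cross product by DIVISOR.
CARD = {"A": 1000, "B": 10, "C": 100, "D": 5}
DIVISOR = 10


def cardinality(tables):
    """Estimated row count of joining all tables in the set (integer model)."""
    rows = math.prod(CARD[t] for t in tables)
    return max(1, rows // DIVISOR ** (len(tables) - 1))


def splits(tables):
    """Yield each unordered two-way partition of a frozenset exactly once."""
    first, *rest = sorted(tables)
    for k in range(len(rest)):
        for combo in combinations(rest, k):
            left = frozenset((first,) + combo)
            yield left, tables - left


def optimize(tables, limit, memo):
    """Return (cost, plan) for the cheapest plan with cost < limit, else None."""
    entry = memo.get(tables)
    if entry is not None:
        kind, cost, plan = entry
        if kind == "winner":
            return (cost, plan) if cost < limit else None
        if limit <= cost:  # "failed": no plan cheaper than `cost` exists
            return None
    if len(tables) == 1:
        name = next(iter(tables))
        memo[tables] = ("winner", 0, name)
        return (0, name) if 0 < limit else None
    requested = limit
    own = cardinality(tables)  # cost charged for this join's output
    best = None
    for left, right in splits(tables):
        if own >= limit:
            break  # no split can beat the current bound
        lhs = optimize(left, limit - own, memo)
        if lhs is None:
            continue
        rhs = optimize(right, limit - own - lhs[0], memo)
        if rhs is None:
            continue
        best = (own + lhs[0] + rhs[0], (lhs[1], rhs[1]))
        limit = best[0]  # tighten the upper bound for remaining splits
    if best is None:
        memo[tables] = ("failed", requested, None)
    else:
        memo[tables] = ("winner",) + best
    return best
```

Calling optimize(frozenset(CARD), math.inf, {}) returns the cheapest plan under this toy cost model as nested pairs of table names. The memo dictionary is the structure whose growth the memory-reduction techniques above try to limit.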
With the help of an NSF travel grant we have been working with Professor Bernhard Mitschang of the
University of Stuttgart on a number of projects. This year we have investigated a new class of physical operators,
collectively called Stream Join. For certain combinations of input physical properties, these are the most efficient
algorithms for universal quantification. Previous work has covered all other input property combinations.
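For readers unfamiliar with universal quantification in query processing, the following Python sketch shows the operation itself, often called relational division. It is a simple hash-based formulation and is not the Stream Join operators described above. The function name and the sample data are hypothetical.

```python
def divide(dividend, divisor):
    """Return every x such that (x, y) is in dividend for all y in divisor."""
    required = set(divisor)
    if not required:
        # Universal quantification over an empty set holds for every x.
        return {x for x, _ in dividend}
    seen = {}
    for x, y in dividend:
        if y in required:
            seen.setdefault(x, set()).add(y)
    return {x for x, ys in seen.items() if len(ys) == len(required)}


# Which students are enrolled in all required courses?
enrolled = [("ann", "db"), ("ann", "os"), ("bob", "db"), ("bob", "ai")]
print(divide(enrolled, ["db", "os"]))  # prints {'ann'}
```

This formulation makes no assumption about the order or grouping of its inputs. Operators specialized for particular input physical properties, such as sorted or grouped inputs, can avoid building the per-x sets.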
GPRA Outcome Goals
Discoveries at and across the frontier of science and engineering. Previously described research of this project
concerns new algorithms for collection valued attributes, new approaches to parallel query optimization, and new
heuristics for optimizers. All of these initiatives represent discoveries at the frontier of science and engineering.
They represent frontiers of science because they explore paradigms not previously considered, and they represent
frontiers of engineering because they are directly useful in building optimizers.
Connections between discoveries and their use in service to society. Little is known about top-down
optimization, and our research program is answering many basic questions in that area. While two companies
(Microsoft and Tandem) are using top-down optimizers, neither has published details of their implementation. Our
research will provide other database companies with fundamental guidance in this promising field.
Timely and relevant information on the national and international science and engineering enterprise.
Professors Shapiro and Mitschang, through their visits to each other's research groups, will deepen communication
between researchers in Oregon and in Stuttgart, Germany.
Project References
Our work builds on lessons learned in the EREQ (DARPA) and Revelation (NSF) projects, which are included in
the references below. Several papers are available on the project web page.

S. Daniels, G. Graefe, T. Keller, D. Maier, D. Schmidt and B. Vance. Query Optimization in Revelation, an
Overview. IEEE Data Engineering Bulletin, June 1991.

L. Fegaras and D. Maier. Towards an Effective Calculus for Object Query Languages. ACM-SIGMOD
International Conference on Management of Data, San Jose, May 1995.

G. Graefe, A. Linville and L. D. Shapiro. Sort versus Hash Revisited. IEEE Trans. on Knowledge and Data
Eng. December 1994.

D. Maier, S. Daniels, T. Keller, B. Vance, G. Graefe and W. McKenna. Challenges for Query Processing in
Object-Oriented Databases. In Query Processing for Advanced Database Applications, J. C. Freytag, G.
Vossen and D. Maier, editors, Morgan Kaufmann, 1994.

B. Vance and D. Maier. Rapid Bushy Join-order Optimization with Cartesian Products. Proc. ACM
SIGMOD Conf., 1996.
Area Background
Query optimizers are one of the main means by which modern database systems achieve their performance
advantages. Given a request for data manipulation or retrieval, an optimizer will choose an optimal plan for
evaluating the request from among the manifold alternative strategies. Optimizers for commercial relational
database systems appeared early in the last decade, and optimization for the basic relational model was considered a
solved problem by many. However, new interest in query capabilities for knowledge discovery and on-line
analytical processing, in large data warehouses, and against complex multimedia objects has kindled renewed
research in optimization. Current optimizers have often proved inadequate to the needs of these new application
areas.
Area References

J. A. Blakeley, W. J. McKenna and G. Graefe, Experiences Building the Open OODB Query Optimizer,
Proc. ACM SIGMOD Conf., Washington, D C, May 1993, 287.

P. Celis, The Query Optimizer in Tandem's ServerWare SQL Product, Proceedings of VLDB '96, Page
592.

S. Chaudhuri, An Overview of Query Processing in Relational Systems, Proc. ACM PODS, 1998.

G. Graefe, The Cascades Framework for Query Optimization, Bulletin of the TC on Data Engineering, Vol
18 No. 3, September 1995, Pg 19-29

G. Graefe. Query Evaluation Techniques for Large Databases. ACM Computing Surveys June 1993.

G. Graefe. Volcano, An Extensible and Parallel Dataflow Query Processing System. IEEE Trans. on
Knowledge and Data Eng. February 1994.

G. Graefe and W. J. McKenna, The Volcano Optimizer Generator: Extensibility and Efficient Search, Proc.
IEEE Int'l. Conf. on Data Eng., Vienna, Austria, April 1993, 209.

N. Kabra, D. DeWitt, OPT++: An Object-Oriented Implementation for Extensible Database Query
Optimization, to appear, VLDB Journal.

S. Zdonik and D. Maier, Readings in Object-Oriented Database Systems, Morgan Kaufmann, San Mateo,
CA, 1990.