
Ingres Top “n” Changes
Feb. 13, 2008
1. Introduction
This document discusses changes required to allow Ingres to compile and execute top “n”
queries with reasonable efficiency. Included are changes to QEF to support the possible
need to restart a query if the required “n” rows are not returned by the optimal query plan
and to implement a “priority queue” operation in which only the first “n” rows according
to some ordering are returned. Changes to OPF are required to optimize certain top “n”
queries using the scheme described in the doctoral thesis of Donko Donjerkovic and to
identify the need for priority queues at appropriate locations in the compiled query plan.
Changes are required to optimizedb to augment histograms with the maximum error
estimate that is subsequently used by OPF to perform the probabilistic determination of
the best top “n” query plan, as described in Donko’s thesis.
2. Probabilistic Top “n” Query Optimization
The top “n” optimization approach espoused in Donko’s thesis is based on the technique
of compiling a range predicate (the so-called cutoff predicate) into the query that selects
only rows whose ranking attribute is less than (or greater than, depending on whether the
ORDER BY clause is ascending or descending) some constant value.
Any query plan built around a cutoff predicate must be prepared for the possibility that
the value of the cutoff constant (called κ in Donko’s thesis) will result in fewer than “n”
rows being selected, and the need to restart the query to retrieve the remaining rows. The
intuition behind Donko’s thesis is to develop plans for successive κ values, each of which
has an expected cost which incorporates not only the cost estimate of the optimal plan
(with cutoff predicate), but also the cost of a restart plan factored by the probability of
restart. The resulting query plan can be thought of as the union of the optimal top “n”
plan with the restart plan (which simply uses the complement of the top “n” cutoff
predicate to return the remaining rows). A "stop after n rows" operator at the top of the
query plan (already implemented in Ingres) will prevent the execution of the restart
component when the optimal plan returns at least "n" rows.
Creating this plan (or pair of complementary plans) involves multiple calls to the query
optimizer with successive κ values. Optimization proceeds as usual for each pair of queries,
computing the cost of the best plan in each call. However, in addition to plan costs, the
optimizer must also return a probability for the likelihood that the optimal plan (the
non-restart plan) will NOT return "n" rows. The expected cost of the plans for a given κ value
is the sum of the cost estimate for the optimal plan and the cost estimate for the restart
plan multiplied by the probability that it will be required. The chosen κ is the value that
minimizes this expected cost.
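To make the selection criterion concrete, the expected cost can be written as follows (the
symbol names are introduced here purely for illustration and are not taken from the thesis):

    ExpectedCost(κ) = Cost_optimal(κ) + P_restart(κ) * Cost_restart(κ)

where Cost_optimal(κ) is the estimated cost of the best plan containing the cutoff predicate
built from κ, Cost_restart(κ) is the estimated cost of the complementary restart plan, and
P_restart(κ) is the probability that the cutoff plan returns fewer than "n" rows. The chosen
κ is the value that minimizes ExpectedCost(κ).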
3. OPF Optimization of Top “n”
Based on discussions with Donko, it is proposed that the logic to drive the identification
of the optimal κ value for the cutoff predicate should be located in opj_joinop() as a loop
outside the loop that enumerates each of the subqueries identified by query rewrite.
Rewrite will identify queries to be optimized in this way, and will flag subqueries for
which the cutoff predicate is required. The opj_joinop() loop will then propose successive
values of κ and enumerate the subqueries with each such value. The expected cost of the top
"n" query will be returned for each κ, and the plan with the lowest cost will be chosen
once the expected costs for successive values of κ differ by a sufficiently small amount.
For each plan optimized with a κ
value in the cutoff predicate, a plan will be optimized with the complement of the cutoff
predicate to act as the restart query to produce the remaining rows in the event that the
optimized plan doesn’t produce the required “n” rows.
The rewrite changes will be straightforward, likely requiring a few new pieces of
information to be added to the global state structure of OPF (OPS_STATE) and to the
subquery structures (OPS_SUBQUERY). Even the loop in opj_joinop() should be
relatively simple to code. Some mechanism will be needed to generate the successive κ
values (Donko used the golden section search algorithm, though something even more
basic like binary search could be used). Then the loop needs only to keep track of the
expected cost values from each κ and the corresponding CO-trees, and apply the
termination condition appropriately.
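As a rough sketch of how such a loop might generate and evaluate successive κ values using
golden section search, consider the fragment below. The function expected_cost_for_kappa()
is a hypothetical placeholder for a call that enumerates the flagged subqueries with a given
cutoff value and returns the expected cost; it is not an existing OPF entry point.

/* Sketch of a golden section search over kappa, minimizing the
** expected cost returned by enumeration.  All names here are
** illustrative placeholders, not existing OPF code. */

#include <math.h>

/* Hypothetical callback: enumerate the flagged subqueries with the
** given cutoff value and return the expected cost (optimal plan cost
** plus restart plan cost weighted by the restart probability). */
extern double expected_cost_for_kappa(double kappa);

static double
choose_kappa(double lo, double hi, double tol)
{
    const double gr = (sqrt(5.0) - 1.0) / 2.0;   /* ~0.618 */
    double c = hi - gr * (hi - lo);
    double d = lo + gr * (hi - lo);
    double fc = expected_cost_for_kappa(c);
    double fd = expected_cost_for_kappa(d);

    /* Narrow the bracket until successive kappa values differ by
    ** less than the termination tolerance. */
    while (fabs(hi - lo) > tol)
    {
        if (fc < fd)
        {
            hi = d; d = c; fd = fc;
            c = hi - gr * (hi - lo);
            fc = expected_cost_for_kappa(c);
        }
        else
        {
            lo = c; c = d; fc = fd;
            d = lo + gr * (hi - lo);
            fd = expected_cost_for_kappa(d);
        }
    }
    return (lo + hi) / 2.0;
}

One attraction of golden section search here is that it requires only one new cost evaluation
per iteration, which matters when each evaluation is a full enumeration pass over the
subqueries.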
The difficult part of the OPF work will be to generate the probabilities that the compiled
plan will return fewer than the “n” rows requested. This won’t require a rewrite of OPF
cost analysis, but it will require new information to be tracked as plans are assembled.
Specifically, we will need to accumulate probabilities on whether the fragment being
compiled will produce the required “n” rows. New logic will be required in the selectivity
estimation functions to use the error bounds built by optimizedb to produce the
probabilities. This is the part of Donko’s algorithms that I don’t fully understand yet, but
I’m getting there.
Very rough estimates of implementation effort for the OPF changes are as follows:
- rewrite changes to identify top "n" optimization potential and update the OPS_STATE and OPS_SUBQUERY structures with appropriate information: 1 week,
- opj_joinop() changes to drive cutoff parameter estimation: 1-2 weeks,
- code generation changes to flag sorts requiring priority queue processing: 2 days,
- query optimizer changes to produce probability estimates: 1 month,
- adding referential relationship catalogs (see Implementation Issues, below): 2 weeks.
4. optimizedb Changes
Estimation of the expected costs of different top “n” query plans with different cutoff
values using Donko’s probabilistic optimization requires that histograms be augmented
with a maximum selectivity error value. It is effectively the maximum error that Ingres
would produce for range predicates involving all values of the column. optimizedb would
need to make a second pass over the data used to build the histogram, in which the
selectivity of a range predicate such as "x > v" is estimated for each value v in the
column (using the same estimation mechanism as OPF) and compared to the actual selectivity,
which can be computed because the real data is being processed. The largest difference
between estimated and actual selectivity is then recorded with the histogram. The error
estimate can be saved as either a new column in iistatistics (catalog change) or as a new
field at the end of the free form histogram data in iihistogram.
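A sketch of the second pass follows, assuming the column values are available in sorted
order and using a hypothetical estimate_gt_sel() stand-in for the OPF range-predicate
estimation code:

/* Sketch of the second optimizedb pass that computes the maximum
** selectivity estimation error for a histogram.  estimate_gt_sel()
** stands in for the OPF estimation code; vals[] is the sorted column
** data used to build the histogram.  Names are illustrative only. */

#include <math.h>

extern double estimate_gt_sel(double boundary);   /* hypothetical */

static double
max_selectivity_error(const double *vals, int nrows)
{
    double max_err = 0.0;
    int i;

    for (i = 0; i < nrows; i++)
    {
        int j = i;
        double actual, est, err;

        /* Skip forward over duplicates of this value. */
        while (j + 1 < nrows && vals[j + 1] == vals[i])
            j++;

        /* Actual selectivity of "x > vals[i]": fraction of rows
        ** strictly greater than this value (data is sorted). */
        actual = (double)(nrows - j - 1) / (double)nrows;
        est = estimate_gt_sel(vals[i]);
        err = fabs(est - actual);

        if (err > max_err)
            max_err = err;
        i = j;   /* resume after the duplicates */
    }
    return max_err;
}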
A rough estimate of the implementation effort of these changes is 2 weeks.
5. QEF Changes
Two changes of significance are required for QEF. The first will be the ability to restart a
query that hasn’t returned the requisite “n” rows from the optimized top “n” query plan.
As described earlier, a complementary plan will be compiled that returns the result rows
not produced by the optimized plan, effectively reversing the cutoff predicate. This
could be done in one of several ways. A single query plan could be produced which
UNIONs the optimized plan with the complementary plan, only executing the
complementary plan if the optimized plan fails to produce "n" rows. This is definitely the
easiest option, though it would require the overhead of two query plans (including
initialization in QEF) when, hopefully, only one would be executed. The other slightly
more efficient approach would be to detach the optimized and restart plans and only
initialize the latter if actually required. The UNION approach could be achieved with
almost no work at all in QEF, and a trivial amount of work in OPF. The detached plan
approach would require additional logic in qeq.c to trigger the initialization and execution
of the restart plan, when required. This would take 1-2 weeks of implementation work.
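A minimal sketch of what the detached-plan control flow could look like follows; the
structure fields and functions below (TOPN_CTX, execute_plan(), initialize_plan()) are
illustrative placeholders, not existing QEF definitions.

/* Sketch of the restart decision for a detached restart plan.
** All names are illustrative only. */

extern int  execute_plan(void *qp, int max_rows);    /* hypothetical */
extern void initialize_plan(void *qp);                /* hypothetical */

typedef struct
{
    int    topn;              /* requested "n" */
    int    rows_returned;     /* rows produced so far */
    void  *optimal_qp;        /* compiled plan with cutoff predicate */
    void  *restart_qp;        /* compiled plan with complemented cutoff */
} TOPN_CTX;

static int
run_topn(TOPN_CTX *ctx)
{
    ctx->rows_returned = execute_plan(ctx->optimal_qp, ctx->topn);

    /* Only pay the initialization cost of the restart plan when the
    ** optimal plan came up short. */
    if (ctx->rows_returned < ctx->topn)
    {
        initialize_plan(ctx->restart_qp);
        ctx->rows_returned +=
            execute_plan(ctx->restart_qp, ctx->topn - ctx->rows_returned);
    }
    return ctx->rows_returned;
}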
The second change required in QEF is the support of priority queues. This is an access
technique that returns only the top “n” rows according to some ordering scheme. One
way of implementing it is to augment the QEF sort to only load the top “n” rows into its
sort heap. The first “n” rows are loaded as usual, but then only rows whose ranking
attribute places them in the current top “n” will be subsequently loaded. And when this
happens, the new rows will simply replace existing rows that are no longer in the current
top “n”. All that is required is to track the current “nth” ranking attribute value to
determine if new rows are to be retained or discarded. This technique should be easily
introduced into the existing QEF sort. It would be unusual to encounter an "n" value too
large for the QEF memory sort, but for such cases the DMF sort could be extended
similarly. If “n” is large enough that even the DMF sort overflows to disk, the normal
DMF disk sort could be performed (with no optimization) and only the first “n” rows
would be returned.
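The replacement logic can be illustrated with a small bounded max-heap, shown here as a
standalone sketch with hypothetical names rather than actual QEF sort code, for an
ascending ORDER BY where the "n" smallest ranking values are retained (a descending
ORDER BY would use a min-heap and the reverse comparison):

/* Sketch of a bounded "top n" heap for an ascending ORDER BY:
** a max-heap of size n is kept, so heap[0] always holds the current
** "nth" ranking value.  Names are illustrative only. */

typedef struct
{
    double  key;      /* ranking attribute */
    void   *row;      /* rest of the row */
} TOPN_ENTRY;

typedef struct
{
    TOPN_ENTRY *heap;
    int         n;        /* requested "n" */
    int         count;    /* entries currently held */
} TOPN_HEAP;

static void
sift_down(TOPN_HEAP *h, int i)
{
    TOPN_ENTRY tmp;

    for (;;)
    {
        int l = 2 * i + 1, r = l + 1, largest = i;

        if (l < h->count && h->heap[l].key > h->heap[largest].key)
            largest = l;
        if (r < h->count && h->heap[r].key > h->heap[largest].key)
            largest = r;
        if (largest == i)
            break;
        tmp = h->heap[i];
        h->heap[i] = h->heap[largest];
        h->heap[largest] = tmp;
        i = largest;
    }
}

static void
sift_up(TOPN_HEAP *h, int i)
{
    TOPN_ENTRY tmp;

    while (i > 0 && h->heap[(i - 1) / 2].key < h->heap[i].key)
    {
        tmp = h->heap[i];
        h->heap[i] = h->heap[(i - 1) / 2];
        h->heap[(i - 1) / 2] = tmp;
        i = (i - 1) / 2;
    }
}

/* Load one row: the first n rows are kept unconditionally; after that
** a row is retained only if it displaces the current "nth" value. */
static void
topn_load(TOPN_HEAP *h, double key, void *row)
{
    if (h->count < h->n)
    {
        h->heap[h->count].key = key;
        h->heap[h->count].row = row;
        sift_up(h, h->count++);
    }
    else if (key < h->heap[0].key)
    {
        h->heap[0].key = key;
        h->heap[0].row = row;
        sift_down(h, 0);
    }
}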
Appropriate flags and the “n” value itself would have to be added to the QEN_TSORT
node structure. If priority queues are to be supported in the DMF sort, equivalent changes
would be required in the DMR_CB structure. The OPF changes to achieve this would be
1-2 days. Changes to the QEF sort shouldn’t take more than 1-2 weeks and changes to the
DMF sort should also be in the order of 1-2 weeks.
6. Implementation Issues
The probabilistic approach extends into reasonably complex queries. Donko’s thesis
supplies guidelines for dealing with range predicates, equality predicates, equijoins and
unions. However, the more complex the query, the less accurate the estimates will be
(e.g. in the presence of subselects) and the lower the quality of the resulting plans.
Moreover, it is effective only when the ranking attribute is a column in a table (or
possibly a scalar expression involving a column). Most importantly it does NOT address
ranking attributes that are the result of aggregate functions (sum, avg, etc.). In TPC H, for
example, 3 of the 5 top “n” queries rank on aggregates and cannot be handled by the
proposed technique. Most of the top “n” TPC DS queries likewise involve aggregate
ranking attributes. In TPC E, however, it appears that most top “n” queries rank on
columns and could use the proposed optimization.
A trivial change to QEF could be made to exclude priority queues from the QEF sort and
pass them straight through to the DMF sort. That would result in priority queues only
being implemented in the DMF sort.
While I understand most of the techniques presented in Donko’s thesis, I still don’t fully
understand the mechanisms used to generate the probabilities of different cardinalities in
the result set.
Using this approach for top “n” join queries depends on knowledge of equijoins that map
referential relationships. This information has never been available in a useful form in the
Ingres catalogs. I have long promoted the idea of new, simple catalogs that record the
definitions of referential relationships in a form that is easily useable by OPF. Moreover,
the “maximum error” statistic that needs to be computed for each histogram would be
best stored in a new column in the iistatistics catalog. So it seems likely that this feature
will require catalog changes.
7. Summary
This document describes a methodology for optimizing and executing top “n” queries,
along with the changes that will be required to implement it. As can be seen, this is not a
trivial project. The project is also not conducive to incremental implementation – most of
the work described here will be required before any benefit to top “n” processing will be
seen.
An encouraging sign is that John Galloway ran the TPC H top “n” query 17 with an
explicitly coded cutoff predicate and reduced the execution time from 88 to 7 seconds. So
it would seem that optimization of top “n” queries is definitely worth the effort.