Access Path Selection in a Relational Database Management System Selinger P. Griffiths M. M. Astrahan D. D. Chamberlin ‘,.::' It. A. Lorie T. G. Price 4: IBM Research Division, San Jose, . 95193 retrieval. Nor does a user specify in what order joins are to be performed. The System R optimizer .chooses both join order and an access path for each table in the SQL statement. Of the many possible choices, the optimizer the chooses one which minimizes "total access cost" for performing the entire statement. ABSTRACT: In a high level query and data manipulation language such as SQL, requests stated non-procedurally, without are reference to access paths. This paper describes how System R chooses access paths both simple (single relation) and for (such as joins), complex queries given a specification of desired data as a user of predicates. System R boolean expression database management is an experimental system developed to carry out research on the relational model of data. System R was designed and built by members of the IBM San Jose Research'Laboratory. 1. California This paper issues of will address the access path selection for queries. Retrieval for data manipulation (UPDATE, DELETE) is treated similarly. Section 2 will describe the place of the optimizer in processing the of a SQL statement, and section 3 will describe the storage component access paths that are available on a single physically stored table. In section optimizer cost intro4 the formulas are duced for single table queries, and section more 5 discusses the joining of two or and their corresponding costs. tables, Nested queries (queries in predicates) are covered in section 6. Introduction System' R is an experimental database based on the relational management system model of data which has been under development at the IBM San Jose Research Laborato1975 Cl>. The software was since ry developed as a research vehicle in relaand is tional database, not generally outside the IBM Research available Division. 2. processi.Bg & B.B u statement four A SQL statement is subjected to phases of Depending on 'the processing. origin and contents of the statement., these phases may be separated by arbitrary time. In System intervals. of RI these arbitrary time intervals are transparent to a SQL components which process the system These mechanisms and a descripstatement. the processing tion of of SQL statements terminals are both programs and from Only an overview further discussed in <2>. of those processing steps that are relevant to access path selection will be discussed here. assumes familiarity This with paper data model terminology as relational <a>. The described in Codd <7> and Date in System R is the unified user interface data definition, and manipulation query, Statements in SQL can be language SQL <5>. issued both from an on-line casual-user-oriented terminal interface and from programming languages such as PL/I and COBOL. In System R a user need not know how are physically stored and what the tuples are available (e.g. which access paths indexes). SQL statements do columns have the user not require to specify anything about the access path to be used for tuple The four phases of statement processing code generation. optimization. are-parsing, is sent and execution. Each SQL statement is checked for to , the parser. where it reprecorrect syntax. A guery block is sented by a SELECT list, a FROM list, and a the respectively containing, WHERE tree, list of .items to be retrieved, the table(s) boolean combination of referenced, and the simple predicates specified by the user. A may have many query single SQL statement one because a predicate may have blocks Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct couunercial advantage, the ACMcopyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee end/or specific permission. 01979 ACM0-89791-001-X/79/0500-0023 $00.75 23 operand which is itself a query. 3. the parser returns without If any the OPTIMIZER component is errors detected, accumulates the The OPTIMIZER called. tables and columns referenced in names of the query and looks them up in the System R their existence and to catalogs to verify retrieve information about them. _T'he Research Storaae System The Research Storage System (RSSI is the storage subsystem of System R. It is responsible for maintaining Physical storage of relations, access paths on these relations, locking (in a multi-user environment), and logging and recovery facilities. The RSS presents a tuple-oriented interface (RSII to its users. Although the RSS may be used R, independently of System we are concerned here with its use for executing the code generated by the processing of SQL statements in System R, as described in the previous section. For a complete description of the RSS, see <l>. lookup portion of the The catalog OPTIMIZER also obtains statistics about the referenced relations, and the access paths These will be ‘available on each of them. After used later in access path selection. lookup has obtained the datatype catalog OPTIMIZER of each column, the and length SELECT-list and WHERE-tree to rescans the check for semantic errors and type compatiexpressions and predicate in both bility comparisons. Relations are stored in the RSS as a collection of tuples whose columns are physically contiguous. These tuples are stored on 4K byte pages; no tuple spans a Pages are organized page. into logical units called segments. Segments may contain one or more relations, but no relation rn%y span a segment. Tuples from two or more relations may occur on the same Each tuple is page. tagged with the identification of the relation to which it belongs. access OPTIMIZER performs Finally the It first determines the path selection. evaluation order among the query blocks in the statement. Then for each query block, are relations in the FROM list the processed. If there is more than one a block, permutations of the relation in join order and of the method of joining are The access paths that minimize evaluated. are chosen from a total cost for the block alternate path This tree of choices. .solution is represented minimum cost by a structural modification of the parse tree. plan in the The result is an execution Access Specification Language (ASLI <lo>. The'primary way of accessing tuples in a relation is via an RSS scan. A scan returns a tuple at a time along a given access path. OPEN, NEXT, and CLOSE are the principal commands on a scan. Two available is type tuples of NEXTs on pages of from any belonging each query After a plan is chosen for the parse tree, block and represented in called. The CODE CODE GENERATOR is the GENERATOR is a table-driven program which into machine language translates ASL trees the plan chosen by the code to execute a relait uses OPTIMIZER. In doing this code templates, one tively small number of for each type of join method (including no join). Query blocks for nested queries are "subroutines" which return treated as which they values to the predicates in further The CODE GENERATOR is occur. described in <9>. of scans are currently SQL statements. The first find all the a segment scan to a given relation. A series of a segment scan simply examines all the segment which contain tuples, relation, and returns those tuples to the given relation. types for of scan is an index The second type An index may be created by a.Sy.stem scan. a relaon one or more columns of R user may have any number tion, and a relation of indexes on it. These (including zero1 stored on separate pages from indexes are tuples. containing the relation those <3>, implemented as B-trees Indexes are are pages containing sets of whose leaves of tuples .. which contain (key, identifiers of NEXTs on Therefore a series that key). an index scan does a sequential read along obtaining the the index, the leaf Pages of tuple identifiers matching a key, and using the data tuples to them to find and return Index leaf in key value order. the user so that NEXTs chained together pages are needinot reference any upper level Pages Of the i,ndex. the parse tree During code generation, machine code and is replaced by executable Either structures. data associated its to this immediately transfered control is the away in stored code is code or the depending on execution, database for later statement (program or the origin of the case, when the code terminal). In either upon the enecuted, it calls is ultimately system (RSS) via System R internal storage (RSII to scan the storage system interface stored relations in each of the physically the along scans are These query. the The access paths chosen by the OPTIMIZER. RSI commands that may be used by generated code are described in the next section. non-empty all the In a segment scan, page5 of a segment will be touched. regardfrom whether there are any tuples less of However, them. relation on desired the When an only once. each page is touched via an index is enamined entire relation touched index is each page of the scan, 24 considered to be in conjunctive normal form, and every conjunct is called a boolean factoy. Boolean factors are notable because every tuple returned to the user must satisfy every boolean factor. An index is said to match a boolean factor i,f the boolean factor is a sargable predicate whose referenced column is the index key; e.g., an index on SALARY matches the predicate 'SALARY = 20000'. More precisewe say that a predicate ly, or set of predicates matches an index access path when the predicates are sargable and the columns mentioned in the predicate(s) are an initial substring of the set of columns of the index key. For example. a NAME, LOCATION index matches NAME = 'SMITH' AND LOCATION = 'SAN JOSE'. If an index matches a boolean factor, an access using that index is an efficient way to satisfy the boolean factor. Sargable boolean factors can also be efficiently satisfied if they are expressed as search arguments. Note that a boolean factor may be an entire tree of predicates headed by an OR. page may be examined only once, but 'a data on it if it has two tuples more than once index orderin the not "close" which are into tuples inserted are If the ing. segment pages in the index 'ordering, and if physical proximity corresponding to this maintained, we say that index key value is A clustered index the index is clustered. property that not only each index has the a paw P but also each data page containing tuple from that relation will be touched ,.. only once in a scan on that index. .: An index scan need not scan the *entire Starting and stopping key values relation. to scan only in order may be specified those tuples which have a key in a range of Both index and segment scans index values. take a set of predicates, may optionally arguments (or SARGS), which called search applied to a tuple before it is are returned to the RSI caller. If the tuple predicates, it is returned; satisfies the the scan otherwise continues until it a tuple the either finds which satisfies the segment or the SARGS or exhausts This reduces specified index value range. overhead of making cost by eliminating the for tuples which can be effiRSI calls within the RSS. Not all ciently rejected form that can become predicates are of the SARGS. A sm predicate is one of the which can be put into the form) form (or value". SARGS ncolumn comparison-operator as a boolean expression of are expressed such predicates in disjunctive normal form. f 4. Costs fox sinqle relation access During catalog lookup, the OPTIIlIZER retrieves statistics on the relations in the query and on the access paths available on each relation. The statistics kept are the following: For each relation T. - NCARD(TIr the cardinality of relation T. - TCARDfT). the number of pages in the segment that hold tuples of relation T. - P(T), the fraction of data pages in the segment that hold tuples of relation T. P(T) = TCARD(T1 / (no. of non-empty pages in the segment). paths sections we will In the next 'several describe the process of choosing a plan for We will first describe evaluating a query. accessing a single simplest case, the extends and relation, and show how it to t-way joins of relations, generalizes multiple and finally joins, n-way query blocks (nested queries). For each index I on relation T, - ICARD( number of distinct keys in index I. the number of pages in index - NINDXfIlr These statistics are maintained in the System R catalogs, and come from several sources. Initial relation loading and index creation ihitialize these statistics. They are then updated periodically by an UPDATE STATISTICS command, which can be run R does not update System by any user. INSERT, DELETE, these statistics at every of the extra database or UPDATE because locking bottleneck th,is operations and the catalogs. system create at the would of statistics would tend Dynamic updating the accesses that modify to serialize relation contents. both the prediThe OPTIMIZER examines paths in the query and the access cates relations referenced by on the available and.formu1ate.s a cost prediction the queryI the following for each access plan, using cost formula: COST = PAGE -FETCHES + W * (RSI CALLS). I/O measure of cost is' a weighted This and fetched) CPU utilization (pages (instructions executed). W is an adjustaand CPU. factor between I/O ble weighting RSI CALLS is the predicted number 0.f tuples Since most of from the RSS. returned the RSS, spent in CPU time is System R's is a good approxithe number of RSI calls the utilization. Thus CPU for mation path to process a choice of a minimum cost total resources query attempts to minimize required. During bility and OPTIMIZER, predicates I. OPTIMIZER statistics, the Using these for each assigns a selectivity factor 'F' This boolean factor in the predicate list. selectivity factor very roughly corresponds which of tuples tb the expected fraction TABLE 1 gives wk 11 satisfy the predicate. the selectivity factors for different kinds We assume that a lack of of predicates. relation is implies that the statistics so an arbitrary factor is chosen. small, type-compatiof the execution semantic checking portion of the each query block's WHERE tree of The WHERE tree is is examined. 25 TABLE 1 column . SELECTIVITY FACTORS = value index) if there is an index on column F = 1 / ICARDfcolumn index of tuples among the an even distribution This assumes values. F = l/10 otherwise column1 key = column2 F = l/MAX(ICARD(columnl index), ICARD(column2 index)) if there are indexes on both column1 and column2 the smaller in the index with that each key value This assumes cardinality has a matching value in the other index. F = l/ICARD(col.umn-i index) if there is only an index on Column-i otherwise F = l/l0 column > value (or any other open-ended comparison) / (high key value - low key value)‘ F = (high key value - value) the range of key values of the value within Linear interpolation type and value is kno,wn at yields F if the column is an arithmetic access path selection time. column not arithmetic) F = l/3 otherwise (i.e. the fact that to this number, other than There is no significance for equal predicates guesses for than the it is l’ess selective than l/2. We is less and that it which there are no indexes, are satisfied by use predicates that hypothesize that few queries more than half the tuples. column BETWEEN value1 F = (value? AND value2 -. value11 / (high key value - low key value) A ratio of the BETWEEN value range to the entire key value range is and both is arithmetic factor if column used as the selectivity value1 and value2 are known at access path selection. F = l/4 otherwise Again there is no significance to this it is choice except that between the default selectivity factors for an equal predicate and a range p.redicate. column IN (list of values) F = (number of items in list) value 1 This is allowed to be no more columnA fpred * (selectivity than factor for column = l/2. IN subquery F = (expected cardinality of the subquery result) / cardinalities of all the relations in the (product of the subquery’s FROM-list). The computation of query cardinality will be discussed below. This formula is derived by the following argument: Consider the simplest case, where subquery is of the form “SELECT columnB FROM relationc . ..“. Assume that the set of all columnB values in relationc contains the set of all columnA values. If all are selected by the subquery, then the the tuples of relationc of the subquery predicate is always TRUE and F = 1. If the tuples are restricted by a selectivity factor F’, then assume that the set in the subquery result that match columnA values of unique values selectivity factor for the is proportionately restricted, i.e. the the product of all the subquery’s predicate should be F’. F’ is / (cardinality namely (subquery cardinality) selectivity factors. With a little optimism, we can of all possible subquery answers). include sifbqueries which are joins and extend this reasoning to subqueries in which columnB is replabed by an arithmetic expression This leads to the formula given above. involving column names. expression11 OR (pred expression2) F = F(pred1) + F(pred2) - Ffpredll 26 * F(pred21 (predll NOT AND (predtl F = Ffpredl) * F(pred27 Note that this assumes that column values are independent. pred F = 1 - Ffpredl Query cardinality (QCARD) is the product of the cardinalities of every relation in the query block's FROM list times the product the selectivity of all factors of that query block's boolean factors. The number of expected RSI calls (RSICARD) i* the product of the relation cardinalities times the of selectivity factors the s_arsablq boolean factors, since the sargable boolean factors will be put into search arguments which will filter out tuples without returning across the RSS interface. relation query, we need only to examine the cheapest access path which produces tuples in each "interesting" order and the cheapest "unordered" access path. Note that an "unordered" access path may in fact produce tuples in some order, but the order is not "interesting". If there are no GROUP BY or ORDER BY clauses on the query, then there will be no interesting orderings, and the cheapest access path is the one chosen. If there are GROUP BY or ORDER BY clauses, then the cost for producing that interesting ordering must be compared to the cost of the cheapest unordered the path J&U cost of sorting QCARD tuples into the proper order. The cheapest of these alternatives is chosen as the plan for the query block. Choosing,an optimal access path for a relation consists of using single these selectivity factors in formulas together statistics on available with the access paths. Before this process is described, a definition is needed. Using an index or sorting access path tuples produces tuples in the index value or sort key order. We say that a tuple order is an if that jnterestinq order order is one specified by the query block's GROUP BY or ORDER BY clauses. The cost formulas for single'relation access paths are given in TABLE 2. These formulas give index pages fetched plus data fetched plus the weighting pages factor times RSI tuple retrieval calls. W is the weighting factor between page fetches and RSI calls. Some situations give several alternative formulas depending on whether the set of tuples retrieved will fit entirely in the RSS buffer pool for effective buffer pool per user). We assume for clustered indexes that a page remains. in the buffer long enough for every tup1e to from it. be retrieved For non-clustered indexes, it is for those assumed that relations not fitting in the the buffer, .relation is sufficiently large with respect buffer size to the that a page fetch is for every tuple retrieval. required For single relations, the cheapest access path is obtained by evaluating the cost for each available access path (each index on the relation, plus a segment scan). The costs will be- described below. For each such access path, a predicted cost is computed along with the ordering of the tuples it will produce. Scanning along the SALARY index in ascending order, for will produce example, some cost C and a tuple order of SALARY (ascending). To find access the cheapest plan fo; a single TABLE 2 COST FORMULAS SITUATIOR lxGlc(inaases) Unique index matching an equal predicate 1+1+w Clustered index I matching one or more boolean factors F(predsI * (NINDX(Il + TCARD) + W x RSICARD Non-clustered index I matching one or more boolean factors F(predsl * (NINDXfI) + NCARD) + W * RSICARD or F(predsl this * (NINDXfI) number fits Clustered matching index I not any boolean + TCARD) + W * RSICARD if in the System R buffer (NINDX(Il + TCARDI + W * RSICARD (NINDX(Il + NCARDI + W * RSICARD factors Non-clustered index I not matching any boolean factors or (NINDX(II + TCARD) + W * RSICARD if this number fits in the System R buffer TCARD/P + W * RSICARD Segment scan 27 5. pccess & selecti- “Clustering” on a column means that tuples which have the same value in that column are physically stored close to each other so that one Page access will retrieve several tuples. .&E joins Eswaran <4> and Blasgen 1976, In examined a number of methods for performing The performance of each of 2-way joins. under a variety these methods was analyzed of relation cardinalities. Their evidence than very small for other indicates that were one of two join methods relations, always optimal or near optimal. The System two chooses between these R optimizer We first describe these methods, methods. for and then discuss how they are extended Finally we specify how the n-way joins. join order (the order in which the relachosen. tions are joined) is For joins involving two relations, the two relations are called the outer relation, from which a will be retrieved tuple first, and the m relation, from which tuples will be retrieved, possibly depending on the values obtained in the outer relation tuple. A predicate which relates columns of two tables to be joined is called a &&I The columns referenced in a Predicate. join predicate are called && golumnq. N-way joins can be visualized as a sequence of a-way joins. In this visualization, two relations are joined together, the resulting composite relation is joined with the third relation, etc. At each step of the n-way join it is possible to identify the outer relation (which in general is composite.1 and the inner relation (the relation being added to the join). Thus the methods described above for two way joins are easily generalized to n-way joins. However, it should be emphasized that the first 2-way join does not have to be completed before the second t-way join is started. As soon as we get a composite tuple for the first t-way join, it can be joined with tuples of the third relation to form result tuples for the 3-way join, etc. Nested ioop joins and merge scan joins may be mixed in the same query, e.g. the first two relations of a three-way may be join joined using merge scans and the composite result may be joined with the third relation using a nested loop join. The intermediate composite relations are physically stored only if a sort is required for the next join step. When a sort of the composite relation is not specified, the composite relation will be materialized one tuple at a time to participate in the next join. The first join method, called the loops. method, uses scans, in any nested order, on the outer and inner relations. The scan on the outer relation is opened and the first tuple is retrieved. For each outer relation tuple obtained, a scan is opened on the inner relation to retrieve, one at a time, all the tuples of the inner relation which satisfy the join predicate. The composite tuples formed by the outer-relation-tuple / inner-relation-tuple pairs comprise the result of this join. We now consider the order in which the relations are chosen to be joined. It should be noted that although the cardinality of the join of n relations is the same regardless of join order, the cost of joining in different orders can be substantially different. If a query block has n relations in its FROM list, then there are n factorial permutations of relation join orders. The search space can be reduced by observing that that once the first k relations are joined, the method to join the k+l-st relation is composite to the the independent of the order of joining the applicable predicates are first k; i.e. of interesting orderings the same, the set is the same, the possible join methods are same, etc. this property, an the Using the search is to efficient way to organize best join order for successively find the larger subsets of tables. The second join method, called mereinq the outer and inner relascans. requires tions to be scanned in join column order. This implies that, along with the columns mentioned in ORDER BY and GROUP BY, columns of equi-join predicates (those of the form Table1 .columnl = TableZ.column2) also define “interesting” orders. If there is more than one join predicate, one of them used as the join predicate and the is others are treated as ordinary predicates. The merging scans method is only applied to equi-joins, although in principle it could be applied to other types of joins. If one to be joined has or both of the relations no indexes on the join column, it must be sorted into a temporary list which is ordered by the join column. The more complex logic of the merging join method takes advantage of the scan avoid rescanordering on join columns to ning the entire inner relation (looking for each tuple a match 1 for of the outer It does this by synchronizing relation. and outer scans by reference the inner to matching join column values and by “remember ing” where matching join groups .are Further savings located. occur if the relation is inner clustered on the join (as would be true if it is column the column). output of a sort on the join A heuristic is used to reduce the join considered. permutations which are order Whe’h possible, is reduced by the search orders which consideration only of join predicates relating the inner join have already relation to the other relations participating in the join. This means that tl,tt,...,tn only joining relations in are orderings til,ti2,...,tin those (j=2,...,n) .in which all j examined for either predicate tij has at least one join (1) 28 any was specified. Note exists with the correct performed for ORDER BY the ordered solution is the cheapest unordered COSt of. sorting into the with some relation tik, where k < j, or (2) for all k > j, tik has no join predicate with til,tit,...,or ti(j-1). This means that all joins requiring Cartesian products are performed as late in the join sequence as possible. For example, if Tl.T2,T3 are the three relations in a query block’s FROM list, and there are join predicates between Tl and T2 and between T2 and T3 on different columns than the Tl-T2 join, then the following permutations are .’ not considered: *, T l-T3-T2 T3-T l-T2 that if a solution order, no sort is or GROUP BY, unless more expensive than solution plus the required order. The number of solutions which must be stored is at most 2X*n (the number of subsets of n tables) t i.mes the number of interesting result orders. The computation time to generate the tree is approximately proportional to the same number. This number is frequently reduced substantially join order heuristic. Our experiby the ence is that typical cases require only a of storage and a few few thousand bytes of a second of 3701158 CPU time. tenths Joins of 8 tables have been optimized in a few seconds. ' To find the optimal plan for joining n relations, a tree of possible solutions is constructed. As discussed above, the search is performed by finding the best way to join subsets of the relations. For each set of relations joined, the cardinality of composite the relation is estimated and saved. In addition, for the unordered join, and for each interesting order obtained by the join thus far, the cheapest solution for achieving that order and the cost of that solution are saved. A solution consists of an ordered list of the relations to be joined, the join method used for each join, and a plan indicating how each relation is If to be accessed. either the outer composite relation or the before needs to be sorted inner relation then that is also included in the the join, case, single relation in the As plan. those listed in orders are “interesting” GROUP BY or ORDER BY query block’s the column every join if any. Also clause. To miniorder. defines an “interesting” of different interesting nimize the number orders and hence the number of solutions in equivalence clns,ses for interestthe tree, the best are computed and only ing orders class is each equivalence for solution is a join if there For example, saved. join predicate E. DNO = D.DNO and another predicate D.DNO = F.DNOe then all three of same order the columns belong to these equivalence class. Gomputation DJ costs The costs for joins are computed from on each of the costs of the scans the The costs relations and the cardinalities. on each of the relations are of the scans computed using the cost formulas for single presented in section relation access paths b. cost of scanning Let C-outerfpath 1) be the and N be the the outer relation via pathl, outer relation tuples of the cardinality which satisfy the applicable predicates. N is computed by: of all the cardinalities of N = (product join so far) * relations T of the of (product of the selectivity factors al 1 applicable predicates). cost of scanning Let C-innercpatht) be the applying all applicable the inner relation, scan Note that in the merge predicates. contiguous the means scanning join this the inner relation which corresgroup of ponds to one join column value in the outer Then the cost of a nested loop relation. join is c-nested-loop-join(pathl.path2)E C-outerfpathll + N * C-inner(path21 constructed by tree is The search iteration ‘on the number of relations joined found to way is the best First, so far. each relation for single each access the and for ordering tuple interesting best way of Next, the case. unordered found, these is any relation to joining join order. for the heuristics subject to joining pairs solutions for This produces best way to join Then the of relations. sets of three relations is found by COnSidand of two relations all sets eration of joining in each third relation permitted by For each plan to the join order heuristic. of the of relations, the order join a set This composite result is kept in the tree. scan join of a merge allows consideration sorting the compowhich would not require After the complete solutions (all Of site. been together) have relations joined the the cheapest optimizer chooses found, the solution which gives the required order, if can be scan join of a merge The cost actually doing broken up into the cost of the of sorting cost the merge plus the The outer or inner relations, if required. cost of doing the merge is C-merge(pathlspath2)= C-outer(path1) + N * C-inner(path2) inner relation case where the For the is sorted into a temporary relation none of the single relation access path formulas in the inner In this case section 4 apply. scan is like a segment scan except that the merging scans method makes use of the fact so that inner relation is sorted that the entire scan the necessary to is not it For a match. looking for inner relation formula for this case we use the following the cost of the inner scan. C-innercsorted list) = TEHPPAGES/N + W*RSICARD number of pages TEMPPAGES is the where 29 . required to hold the inner relation. This formula assumes that during the merge each page of the inner relation is fetched once. interesting to observe that the It is cost formula for nested loop joins and the for merging scans are essencost formula same. The reason that merging tially the scans is sometimes better than nested loops cost of the inner scan may be is that the After sorting, the inner much less. clustered on the join column relation is which tends to minimize the number of pages not necessary to scan fetched, and it is inner relation the entire (looking for a each tuple of the outer relamatch) for tion. JoB pzqxr--JOB CLERK TYPIST SALES MECHANIC 5 6 9 12 sorting a relation, cost of The includes the cost of retrievC-sort(path), the data using the specified access ing the data, which may involve Path, sorting passes, and putting several the results into a temporary list. Note that prior to the inner table, sorting only the local predicates can be applied. Also, if it is necessary to sort a composite result, the entire composite relation must be stored in relation can be a temporary before it sorted. The cost of inserting the compointo site tuples a temporary relation before sorting is included in C-sortfpath). SELECT FROM WHERE AND AND AND NAME, TITLE, SAL, DNAME EMP, DEPT, JOB TITLE=‘CLERK’ LOCYDENVER EMP.DNO=DEPT.DNO EMP.JOB=JOB.JOB “Retrieve the name, salary, job title, and department name of employees who are clerks and work for departments in Denver.” Figure We now show how the search is done for the example join shown in Fig. 1. First we find all of the reasonable access paths for relations with only local single their for this predicates applied. The results are shown in Fig. 2. There are example paths for an three access the EHP table: on JOB, and a index on DNO, an index The interesting orders are segment .scan. the DNO and JOB. The index on DNO provides DNO order and the index on JOB tuples in the tuples in JOB order. The provides our access for segment 'scan path is, unordered. For this example we purposes, assume that the index on JOB is the cheappath, so the segment scan path is est the DEPT relation there are For pruned. two access paths, an index on DNO and a segment scan. We assume that the index on DNO is cheaper so the segment scan path is pruned. For the JOB relation there are two access paths, an index on JOB and a segment We assume that the segment scan path scan. The saved. is cheaper, so both paths are saved in the results just described are as shown in Fig. 3. In the search tree the notation C(EMP.DNOl or figures, C(E.DNOl means the cost of scanning EMP via DNO index, the predicates all applying which are applicable given that tuples from the specified set of relations have already been fetched. The notation Ni is used to represent the cardinalities of the different partial results. are Next, found solutions for pairs by joining a second of Access Path for Single Relations l l EM” Eligible Predicates: Local Predicates Only “Interesting” Orderings: DNO,JOB 1 %:DNO 1 :I., 1 Ftt $EMP.DNO, ’ N2 C(DEPT.DNO) CIEMP seg. scan) ’ N2 C(DEPT reg. scan) pruned X JOB: index JOBJOB segment scan on JOB I $JOB.JOB) the.$esults Fig. access relation I N doe Figure ‘:t 3. sag. scan) 2. for single relations shown in single relation, we find paths for joining in each second for which there exists a predicate For connecting we consider each it to the access first path relation. First selection nested loop example joins. In this assume that the EMP-JOB join is cheapest This accessing JOB on the JOB index. relations relation JOIN example 1. to 30 for we by is since likely it can fetch directly the tuples with matching JOB, (without having to scan the entire relation). In practice the cost of joining is estimated using the formulas given earlier and the cheapest path is chosen. For joining the EMP relation to the DEPT ,relation we assume that the DHO index is cheapest. The best access path for each second-level relation is combined with each of the plans in Fig. 3 to form the nested loop solutions ‘shown ,:' in Fig. 4. *, C(EMP.DNOI ON0 order Figure 3. Search tree for single Referring to Fig. 3, we see that the access path chosen for the' the DEPT relation is the DHO index. After accessing DEPT via this index, we can merge with EMP using the DHO index on EMP, again without any sorting. However, it might be cheaper to sort EMP first using the JOB index as input to the sort and then do the merge. Both of these cases are shown in Fig. 5. As each of the costs shown in Figs. 4 and 5 are computed they are compared with cheapest the equivalent solution (same tables and same result order) found so far, and the cheapest solution is saved. After this pruning. solutions for all three relations are found. For each pair of relations, we find access paths for joining in the remaining third relation. As before we will extend the tree using nested loop joins and merging scans to join the third relation. The search tree for three relations is shown in Fig. 6. Note that in one case both the composite relation‘ and the table being added (JOB1 are sorted. Note also that for some of the cases. no sorts are performed at all. In these cases, the composite result is materialized one tuple at a time and the intermediate composite relation As is never stored. 'before, as each of the costs are computed they are compared with the cheapest solu- CIJOB rsg. scanI unordered CIJOB.JOBi JOB ordef UDEPT.ONO~ DNO order C(EMP.JOBI JOB order For merging JOB with EMP, ue only consider the JOB index on EMP since it is the cheapest access path for EHP regardless of order. Using the JOB index on JOB, we can merge without any sorting. However, it might be cheapter to sort JOB using a relation scan as input to the sort and then do the merge. relations the solutions using we generate Next see on the As we scans method. merging left side of Fig. 3, there is a scan on the so it is possiEHP relation in DHO order, and the DHO scan on ble to use this scan scans to do a merging DEPT relation the Although it is join, without any sorting. without merging join to do the possible might be it sorting as just described, cheaper to use the JOB index on EMP, sort Note that we never on DHO. and then merge. consider sorting the DEPT table because the on that table is already in cheapest scan DHO order. 1 (DEPT. EMP) (EMP. OEPT) ‘Nt N4 n 1 N4 4. C(E.JOBl + N,C,(D.DNO) JOE order Extended C(E.DNO) CULJOBI N,&J.JOB) DNO order N&J.JOBI JOE order search tree for second 31 C(D.DNO) + N,C,(E.DNO) DNO order relation 3 N3 N3 index EMPJOB Index DEPT.DNO C(E.DNO) + N&(D.DNO) DNO order Figure L Nl Index DEPT.DNO segment Index JOB.JOB Index EMP.JOB h&x EMP.DNO (JOB, EMP) Index EMP.JOB % n NS IXJJOB) + N,C,(E.JOBI JOB order C(J seg scan) (nested NIL&JOB) unordered loop join) IEMP, JOB1 *JOB, EMP) f index E.JOB NI 4 son Index D.DNO \ N3 Sort JOB reg scan tq JOB into L2 Ll E.JO8 with J.JOB Ll with D.LlNO DNO order Figure M-C Merpa N, l ON0 order 5. N3 Sort JOI s.?g. scan hy JOE into L2 \ 0 EON0 with DDNO :i “JI E.JOB ly DNO into Merge Be!lmmt SG .JC Merge EJOB with Merge D.DNO with LZ Ll NS JOB order Extended N5 JOB order search ,EMP. DEW Merge J.JOB with E.JOB tree N5 JOB order for second relation (merge Mew LZ with E.JOB Ns JOB order join) fi b M-9 with L5 Merge I I with D.DNO Figure 6. Extended search tree 32 for third relation D.DNO 6. Nested Queries the query: SELECT NAME FROM EMPLOYEE X WHERE SALARY > (SELECT SALARY FROM EMPLOYEE WHERE EMPLOYEE-NUMBER= X.MANAGERl This selects names of EMPLOYEE's that earn more than their HANAGER. Here X identifies the query block and relation which furnishfor the correlation. es the candidate tuple For each candidate tuple of the top level query block, the MANAGER value is used for evaluation of the subquery. The subquery result is then returned to the "SALARY >" for predicate testing acceptance of the candidate tuple. A query may appear as an operand of a predicate of the form "expression operator query". Such a query is called a Nested Query or -a Subquery. If the operator is one of the six scalar comparisons (=, -1, <, <=I, then >t >=, the subquery must return a single .value. The following example using the 'I=" operator was given in section 2: Ii SELECT NAME 3, FROM EMPLOYEE WHERE SALARY = (SELECT AVG(SALARY) FROM EHPLOYEE) If the operator is IN or NOT subquery may return a set of For example: SELECT NAME FROM EMPLOYEE WHERE DEPARTMENT-NUMBER IN (SELECT DEPARTMENT-NUHBER FROM DEPARTMENT WHERE LOCATION='DENVER'l the IN then values. correlation If a subquery is not directly below the query block it references but is separated from that block by one intermediate or more blocks, then the correlation subquery evaluation will be done before evaluation of the highest of the intermediate blocks. For example: level 1 SELECT NAME FROM EMPLOYEE X WHERE SALARY > level 2 (SELECT SALARY FROM EMPLOYEE WHERE EMPLOYEE-NUMBER = level 3 (SELECT MANAGER FROM ERPLOYEE WHERE EMPLOYEE-NUMBER = X.MANAGERll This selects names of EMPLOYEE's that earn their MANAGER's MANAGER. As more than candidate tuple of the before, for each EMPLOYEE.MANAGER level-l query block, the value is used for evaluation of the level-3 the case, because block. In this query level 3 subquery references a level 1 value it level 2 values, but does not reference new level 1 for every once is evaluated for every level 2 candidate tuple. but not candidate tuple. In both examples, the subquery needs to be evaluated only once. The OPTIMIZER will arrange for the subquery to be evaluated before the top level query is evaluated. If a single value is returned, it is incorporated into the top level query as though it had been part of the original query statement; for example, if AVG(SAL1 above evaluates to 15000 at execution time, then the predicate becomes "SALARY = 15000". If the subquery can return a set of values, they are returned in a temporary list, an internal form which is more efficient than a relation but which can only be accessed sequentially. In the example above, if the subquery returns the list (17,241 then the predicate is evaluated in a manner similar to the way in which it would have been evaluated if the original predicate had,been DEPARTMENT-NUMBER IN (17,210. value referenced by a correlaIf the (X.MANAGER above) is not tion subquery tuples candidate set of the unique in the same managmany employees have (e.g., still procedure given above will er), the for subquery to be re-evaluated cause the value. of a replicated each occurrence relation is the referenced However, if the column. referenced ordered on the can be made conditional, re-evaluation depending. on a test of whether or not the current referenced value is the same as the candidate tuple. If the previous one in the previous evaluation they are the same, In some cases, result can be used again. the referenced even pay to sort it might column in order relation on the referenced subqueries unnecesto avoid re-evaluating whether or to determine sarily. In order are values column referenced not the like OPTIHIZER can use clues unique. the NCARD is the relation NCARD > ICARD. where index cardicardinality and ICARD.is the referenced on the nality of an index column. A subquery may also contain a predicate with a subquery. down to a (theoretically) arbitrary level of nesting. When such subqueries do not reference columns from tables in higher level query blocks, they the level are all evaluated before top In this case, the most query is evaluated. subqueries deeply nested are evaluated since any subquery must be evaluated first, before its parent query can be evaluated. A subquery may contain a reference to a candidate tuple of a value obtained from a example (see block level higher query is called a correlaSuch a query below). A correlation subquery must tion subquery. each for re-evaluated principle be in referenced query from the candidate tuple re-evaluation must be done This block. parent subquery's the correlation before level block can be predicate in the higher of the acceptance qr rejection tested for consider As an example, candidate tuple. 33 7. Conclusion the path selection has The System R access queries, single table for been described work Evaluation and nested queries. joins, choices made to the the on comparing and will be choice is in progress, "right" Prelimidescribed in a forthcoming paper. although the results indicate that, nary optimizer are often costs p.redicted by the true in absolute value, the not accurate optimal path is selected in a large majorithe ordering In many cases, cases. ty of the estimated costs for 'all paths among same as that considered is precisely the among the actual measured costs. current more procedural languages. Cited and General References R: Astrahan, M. M. et al. System Relational Approach to Database Management. ACM Transactions on Database Systems, Vol. pp. 97-137. 1, No. 2, June 1936, System R: A al. M. M. et <2> Astrahan, System. To Relational Database Management appear in Computer. E. OrganizaR. and McCreight, <3> Bayer, Ordered Large of tion and Maintenance Acta Infornatica, Vol. 1, 1972. Indices. <4> Blasgen, M.W. and Eswaran, K.P. On the a Relational Data Evaluation of Queries in RJl745. IBM Research Report Base System. April, 1976. <5> Chamberlin, D.D., et al. SEQUELZ: A Unified Approach to Data Definition, Manipulation, and Control. IBH Journal of Research and Development, Vol. 20, No. 6, Nov. 1976, pp. 560-575. <6> Chamberlin, D.D., Gray, J.N., and Traiger, 1.1. Views, Authorization and Locking in a Relational Data Base System. ACM National ProceedComputer Conference ings, 1975, pp. 425-430. <7> Codd, E.F. A Relational Model of Data for Large Shared Data Banks. ACM Communications, Vol. 13. No. 6, June, 1970, pp. 377-387. <8> Date, C.J. An Introduction to Data Base 1975. Systems, Addison-Wesley, CS> Lorie. R.A. and Wade, B.W. The Compilation of a Very High Level Data .Language. IBM Research Report RJ2008, May, 1977. <lO> Lorie, R.A. and Nilsson, J.F. An Access Specification Language for ,a Relational Data Base System. IBH Research Report RJ2218. April, 1978. (11) Stonebraker, M.R., Wang, E., Kreps. P., and Held, G-D. The Design and Implemenon Database INGRES. ACM Trans. tation of Systems, Vol. 3, September, 1976, 1, No. PP. 189-222. Implemen(12) Toad, s. PRTV: An Efficient tation for Large Relational Data Bases. Large Proc. International Conf. on. Very September, Data Bases, Framingham. Mass., <l> the cost of path selection Furthermore. overwhelming. For a two-way join, is not is approximately of optimization the cost equivalent to between 5 and 20 database This number retrie,vals. becomes even more insignificant when such a path selector is placed in an environment such as System R, application where programs are compiled many times. of once and run The cost optimization is amortized over many runs. of path The key contributions this selector over area are other work in this the expanded use of statistics (index for example), the inclusion of cardinality, CPU utilization into the cost formulas, and the method of determining join order. Many CPU-bound, particularly queries are merge which temporary are joins for relations created and sorts performed. The concept factor" of "selectivity permits the optimof as many of the izer to take advantage query's restriction predicates as possible RSS search in the arguments and access By remembering paths. "interesting orderequivalence classes joins and ing" for ORDER or GROUP specifications, the optimizer does more bookkeeping than most path selectors, but this additional work in many cases results in avoiding the storage and sorting of intermediate results. query Tree pruning and tree searching techniques allow this additional bookkeeping to be performed efficiently. 1975. <13> Wong, E., and Youssefi, K. Decomposifor Query Processing. ACH tion - A Strategy Transactions on Database Systems, Vol. 1, 3 (Sept. 1976) pp. 223-2’41. No. (19) ZlOOf, M.H. Query by Example. Proc. Vol. 94, AFIPS Press, AFIPS 1975 NCC, I) Montvale, N.J., pp. 431-437. More work on validation of the optimizer cost formulas needs to be done, but we work can conclude from this preliminary database that management systems can support non-procedural query languages with performance comparable to those supporting 34