Abstract - Computer Technology Institute

COMPUTER TECHNOLOGY INSTITUTE 1999
TECHNICAL REPORT No. TR99/06/03

The LS-Quintary Tree: A Universal Multidimensional Data Structure with Linear Space (Extended Abstract)

Georgia Panagopoulou, Spiros Sirmakessis, Athanasios Tsakalidis
(22 June 1999)

Abstract

In this paper we introduce a variation of the Quintary tree that requires linear storage space; this improves on the original form of the Quintary tree, introduced in [11]. The Quintary tree is a file structure for multidimensional database systems that answers all the known match queries (exact, partial, range, partial range). In the form in which it was first introduced by Lee and Wong, the Quintary tree can be built in $O(N\log^k N/(k-1)!)$ time and requires similar storage, for a file of $N$ records each consisting of $k$ keys. The worst-case time bounds for the search algorithms are $O(\log N + k)$, $O(3^{k-s}(s+\log N)+t)$, $O(\log^k N + t)$ and $O(3^{k-s}\log^s N + t)$ respectively, where $s$ is the number of keys specified in a partial match (or partial range) query and $t$ is the number of records retrieved by the query. In this paper we introduce the LS-Quintary tree, which can be built within the same time bounds but uses only $O(kN)$ space. The time bounds for answering the queries increase only by a $\log N$ factor for every tuple in the answer. This yields a substantial improvement in the product of the space of the tree and the time required to answer any one of the queries; we call this improvement the potential value added of the improved structure. Moreover, our structure answers range queries using linear space.

1. Introduction

During the last two decades considerable research effort has been devoted to the construction of an efficient data structure for information retrieval ([1], [4], [8], [20], [21], [22]). Such a data structure should be able to answer any retrieval request or search query on $k$-dimensional data (e.g. records with $k$ attributes). Let $S$ be a data set containing $N$ records, each of which is an ordered $k$-tuple $(r_0, r_1, \ldots, r_{k-1})$ of values. Each component of the $k$-tuple is called an attribute or a key (see [11]). A query specifies certain conditions to be satisfied by the keys of the records and can be classified into the following types:

* Exact match query: specifies a value for each key.
* Partial match query: specifies values for $s<k$ keys; the remaining $k-s$ keys are unspecified.
* Range query: specifies a range $(l_i, r_i)$ for each key $k_i$.
* Partial range query: specifies a range for each of $s<k$ keys.

A query is answered by initiating an appropriate search procedure and retrieving all the records requested by the query. Different data structures have been developed to support some of these queries (e.g. k-d trees [1], [10], range trees [20], k-ranges [2], Multiple Attribute Trees (MAT) [8], [15], etc.).

The Quintary tree was proposed by Lee and Wong ([11]) as a data structure for building an efficient information retrieval system. Consider the data domain as a $k$-dimensional space and each key as a coordinate axis. Let $d_i$ be the number of distinct values assumed by key $k_i$ over the records of $S$; clearly $d_i \le N$ for every key $k_i$. A $k$-dimensional Quintary tree storing $N$ records can support exact match queries in $O(\log N + k)$ time; partial match queries, where only $s<k$ coordinate values are specified, in $O(3^{k-s}(s+\log N)+t)$ time, where $t$ is the number of records satisfying the query; range queries in $O(\log^k N + t)$ time; and partial range queries, where a range is given for only $s<k$ coordinates, in $O(3^{k-s}\log^s N + t)$ time, using $O(N\log^k N/(k-1)!)$ storage.
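To make the four query types concrete, the following Python sketch answers each of them by a brute-force linear scan over $S$. It only pins down the query semantics and is not an index structure. The function names are hypothetical, unspecified keys are represented by `None`, and ranges are assumed to be closed intervals $[l_i, r_i]$ (the text leaves the endpoint convention open).

```python
from typing import Optional, Sequence, Tuple

Record = Tuple[object, ...]      # an ordered k-tuple (r_0, ..., r_{k-1})
Range = Tuple[object, object]    # (l_i, r_i), treated here as a closed interval


def exact_match(rec: Record, q: Sequence[object]) -> bool:
    # Every key is specified and must be equal.
    return all(ri == qi for ri, qi in zip(rec, q))


def partial_match(rec: Record, q: Sequence[Optional[object]]) -> bool:
    # s < k keys are specified; None marks an unspecified key.
    return all(qi is None or ri == qi for ri, qi in zip(rec, q))


def range_match(rec: Record, q: Sequence[Range]) -> bool:
    # A range (l_i, r_i) is given for every key k_i.
    return all(lo <= ri <= hi for ri, (lo, hi) in zip(rec, q))


def partial_range_match(rec: Record, q: Sequence[Optional[Range]]) -> bool:
    # Ranges for s < k keys; None marks an unspecified key.
    return all(rg is None or rg[0] <= ri <= rg[1] for ri, rg in zip(rec, q))


def answer(S, predicate, q):
    # O(kN) scan: returns exactly the t records an index structure must report.
    return [rec for rec in S if predicate(rec, q)]


S = [(1, "x", 10), (2, "y", 20), (3, "x", 30), (4, "z", 40)]
print(answer(S, exact_match, (3, "x", 30)))
print(answer(S, partial_match, (None, "x", None)))
print(answer(S, range_match, ((2, 4), ("a", "y"), (0, 35))))
print(answer(S, partial_range_match, (None, None, (15, 45))))
```

The data structures discussed below must return the same answers as this scan, but in time that depends on $\log N$ and $t$ rather than on $N$.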
Although the Quintary tree is a well-formed data structure that answers all possible database queries, its space requirements make it impractical for most databases. In this paper we apply ideas and techniques from the theory of persistence ([7], [13], [14]) to reduce the space requirements of the tree. More precisely, we apply ideas from the fat node technique described in [7] to the Quintary tree in order to eliminate the data redundancy of the original structure. This technique eliminates all duplicates of the stored data and reduces the space to $O(kN)$. The time required to answer any of the four queries increases by a $\log N$ factor for every tuple in the answer: exact match in $O(\log N + k\log N)$ time, partial match in $O(3^{k-s}(s+\log N)+t\log N)$ time (where $t$ is the number of records satisfying the query), range queries in $O(\log^k N + t\log N)$ time, and partial range queries in $O(3^{k-s}\log^s N + t\log N)$ time.

The contribution of this work is the significant reduction of the required space. To quantify it, we introduce the potential of a data structure: we define the product $P_{DQ} = S_D \times T_{DQ}$ to be the potential of a data structure $D$ for a query $Q$, where $S_D$ is the space occupied by $D$ and $T_{DQ}$ is the time required to answer $Q$. We use this function to measure the significance of the results of this paper, focusing on range searching; similar results can be computed for each one of the queries. The potential of the Quintary tree for range searching is:

$$P_{\mathrm{Quintary}}^{\mathrm{Range}} = S_{\mathrm{Quintary}} \times T_{\mathrm{Quintary}}^{\mathrm{Range}} = O\left(\frac{N\log^k N}{(k-1)!}\,(\log^k N + t)\right)$$

The same potential function for the LS-Quintary tree is:

$$P_{\mathrm{LSQuintary}}^{\mathrm{Range}} = S_{\mathrm{LSQuintary}} \times T_{\mathrm{LSQuintary}}^{\mathrm{Range}} = O(kN)\cdot O(\log^k N + t\log N) = O(kN\log^k N + tkN\log N)$$

So the potential value added of the LS-Quintary tree in the case of range searching is:
$$\frac{P_{\mathrm{Quintary}}^{\mathrm{Range}}}{P_{\mathrm{LSQuintary}}^{\mathrm{Range}}} = O\left(\frac{N\log^k N\,(\log^k N + t)}{(k-1)!\,(kN\log^k N + tkN\log N)}\right) = O\left(\frac{1}{k!}\cdot\frac{\log^{k-1} N\,(\log^k N + t)}{\log^{k-1} N + t}\right)$$

Since $(\log^k N + t)/(\log^{k-1} N + t) \ge 1$ for $N \ge 2$, the ratio is at least of order $\log^{k-1} N/k!$. This is a significant improvement in the general behaviour of the data structure. Similar results can be calculated for each one of the queries. The result for range searching satisfies the lower bounds introduced by Chazelle (in [5] and [6]). Chazelle in [5] has proved that if a data structure provides a query time of $O(t+\log^c N)$, for an arbitrary constant $c$, where $t$ is the number of points to be reported, then its size must be $\Omega(N(\log N/\log\log N)^{k-1})$. However, as stated in the same work, if instead of $O(t+\log^c N)$ we allow a more general query time, such as $O(t\log N+\log^c N)$, Chazelle's technique breaks down completely. Therefore, $\Omega(N)$ is the only lower bound on the storage requirement we can expect to derive. This paper presents, for the first time, an example of a data structure that approaches this observation of Chazelle.

This paper is organized as follows: Section 2 briefly presents the Quintary tree. The results on persistence are summarized in Section 3. The main results, that is, the LS-Quintary tree, its algorithms, and its space and time analysis, can be found in Section 4. A comparison of the LS-Quintary tree with the best-known data structures is made in Section 5.

2. The Quintary Tree

The Quintary tree was proposed by Lee and Wong ([11]) as a data structure for supporting an efficient information retrieval system. Consider the data domain as a $k$-dimensional space in which each key corresponds to a coordinate axis. Let $d_i$ be the number of distinct values assumed by key $k_i$ over the records of $S$; clearly $d_i \le N$ for every key $k_i$. The Quintary tree that stores the $N$ records can be viewed as a multidimensional tree with $k$ levels.
At each level we have a perfectly balanced binary tree as a skeleton, and attached to each node are three additional subtrees, the MIDDLE, the MIDLEFT and the MIDRIGHT, which are one level lower than the node itself. Suppose we are at the first level, namely level 0. The skeleton tree of level 0 is a binary tree with c(0) nodes. Each node w has a value which corresponds to a hyperplane $H_0$, defined by setting the 0-coordinate equal to value(w). If we take the node w to represent a subset of $S$ in $k$-dimensional space, then the two sons of w represent the two subsets containing the points that lie on the left and on the right side of $H_0$ respectively. The points of the subset represented by w that lie on $H_0$ form a subset in $(k-1)$-dimensional space (their 0-coordinate equals value(w)) and are stored in the MIDDLE(w) subtree attached to w. The points that lie to the left/right of $H_0$ form two subsets in $(k-1)$-dimensional space (these points are projected onto $H_0$); they are stored in the subtrees MIDLEFT(w) and MIDRIGHT(w) respectively. In general, each node at level $i$ represents a subset in $(k-i)$-dimensional space and has three subtrees storing points in $(k-i-1)$-dimensional space at level $i+1$, and two subtrees in $(k-i)$-dimensional space that lie to the left and to the right of the hyperplane $H_i$ defined by the node value. At the $(k-1)$th level (one-dimensional space), the MIDDLE, MIDLEFT and MIDRIGHT pointers point to lists of the page numbers that contain the corresponding data records. This makes the algorithms for answering the various queries more uniform and fully recursive. More details can be found in [11]. Figure 1 shows a recursive representation of a Quintary tree.
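The recursive definition above can be sketched in Python as follows. This is an in-memory illustration under simplifying assumptions, not Lee and Wong's file-level implementation: records are tuples of $k$ keys followed by a page number, the projection onto $H_i$ is implicit (level $i+1$ simply looks at the next coordinate), the skeleton at each level is a balanced binary search tree over the distinct values of the current coordinate, and at level $k-1$ the MIDDLE, MIDLEFT and MIDRIGHT fields hold sorted lists of page numbers. The names `Node`, `build` and `search` are hypothetical. `search` handles exact and partial match queries: a specified key descends the skeleton and follows MIDDLE, while an unspecified key follows MIDLEFT, MIDDLE and MIDRIGHT of the skeleton root, which together cover every point the root represents (the source of the $3^{k-s}$ factor).

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Sequence


@dataclass
class Node:
    value: Any
    left: Optional["Node"] = None    # coordinate < value, same level
    right: Optional["Node"] = None   # coordinate > value, same level
    middle: Any = None               # coordinate == value: next level or page list
    midleft: Any = None              # coordinate < value: next level or page list
    midright: Any = None             # coordinate > value: next level or page list


def build(records: List[tuple], level: int, k: int) -> Optional[Node]:
    """Build the skeleton of `level`; records are (key_0, ..., key_{k-1}, page)."""
    values = sorted({r[level] for r in records})
    return _skeleton(records, values, level, k)


def _skeleton(records, values, level, k):
    if not values:
        return None
    m = len(values) // 2                       # median value keeps the skeleton balanced
    v = values[m]
    less = [r for r in records if r[level] < v]
    equal = [r for r in records if r[level] == v]
    greater = [r for r in records if r[level] > v]
    node = Node(v)
    node.left = _skeleton(less, values[:m], level, k)
    node.right = _skeleton(greater, values[m + 1:], level, k)
    node.middle = _mid(equal, level, k)
    node.midleft = _mid(less, level, k)
    node.midright = _mid(greater, level, k)
    return node


def _mid(records, level, k):
    if level == k - 1:                         # last level: list of page numbers
        return sorted(r[k] for r in records)
    return build(records, level + 1, k) if records else None


def search(node, key: Sequence[Any], level: int, k: int) -> List[int]:
    """Exact/partial match; key[i] is None when coordinate i is unspecified."""
    if node is None:
        return []
    if key[level] is None:
        subtrees = [node.midleft, node.middle, node.midright]
    else:
        while node is not None and node.value != key[level]:
            node = node.left if key[level] < node.value else node.right
        if node is None:
            return []
        subtrees = [node.middle]
    if level == k - 1:
        return sorted(p for pages in subtrees for p in pages)
    return sorted(p for sub in subtrees for p in search(sub, key, level + 1, k))


# The example database of Table 1: (A, B, page number).
TABLE1 = [("a", "x", 1), ("b", "x", 2), ("c", "x", 3), ("d", "y", 4),
          ("e", "y", 5), ("f", "x", 6), ("g", "z", 7), ("h", "z", 8),
          ("i", "x", 9), ("j", "y", 10), ("k", "z", 11), ("l", "x", 12),
          ("m", "y", 13), ("n", "z", 14), ("o", "x", 15)]
tree = build(TABLE1, 0, 2)
print(search(tree, ("b", "x"), 0, 2))
print(search(tree, (None, "x"), 0, 2))
```

Each record is copied into every MIDLEFT, MIDDLE or MIDRIGHT subtree on its way down, which is exactly the redundancy behind the $O(N\log^k N/(k-1)!)$ storage bound.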
[Figure 1: Recursive representation of a Quintary tree. For S(N,k), the root r has LEFT subtree A, RIGHT subtree C, and MIDLEFT, MIDDLE and MIDRIGHT subtrees A', B and C', with |A| + |B| + |C| = N, |A| = |A'| and |C| = |C'|. For S(N,1), A' and C' are lists of page numbers that contain the records in A and C respectively, and B is the page number that contains the record r.]

Lee and Wong have proved the following theorem:

Theorem 1: A k-dimensional Quintary tree storing N records can support:
* an exact match query in $O(\log N + k)$ time,
* a partial match query, where only $s<k$ coordinate values are specified, in $O(3^{k-s}(s+\log N)+t)$ time ($t$ is the number of records satisfying the query),
* a range query in $O(\log^k N + t)$ time,
* a partial range query, where a range is given for only $s<k$ coordinates, in $O(3^{k-s}\log^s N + t)$ time,
* requires $O(N\log^k N/(k-1)!)$ storage, and
* can be built in $O(N\log^k N/(k-1)!)$ time.

The latter two bounds can be reduced by a $\log N$ factor if we do not use the MIDLEFT and MIDRIGHT subtrees for the nodes at the $(k-1)$th level and slightly change the algorithms given by Lee and Wong. Figure 2 presents an example of a Quintary tree for the database shown in Table 1.

A   B   Page No
a   x    1
b   x    2
c   x    3
d   y    4
e   y    5
f   x    6
g   z    7
h   z    8
i   x    9
j   y   10
k   z   11
l   x   12
m   y   13
n   z   14
o   x   15

Table 1. The tuples of an example database.

[Figure 2. The Quintary tree for the database shown in Table 1.]

The above time complexities suggest that the Quintary tree can answer partial match queries more efficiently than other known structures, at the cost of extra storage.

3. Persistence

A data structure is said to be ephemeral if any change to the structure destroys the old version.
Accesses and updates can be done only on the current version. We call a data structure persistent if it supports accesses to multiple versions. The structure is partially persistent if all versions can be accessed but only the newest version can be modified, and fully persistent if every version can be both accessed and modified ([7], [13], [14]). Driscoll et al. in [7] give efficient methods for transforming a pointer-based ephemeral data structure into one that is partially or fully persistent in a way that satisfies ideal resource bounds: a constant factor in query time over that of the ephemeral structure and a constant amount of space per change in the ephemeral structure. More precisely, they showed that if an ephemeral structure has nodes of constant bounded in-degree, then the structure can be made partially persistent at an amortized¹ space cost of O(1) per update step and a constant factor in the amortized time per operation. Two methods are introduced in [7]. The first and simpler one is the fat node method, which applies to any ephemeral linked structure and makes it partially persistent at a worst-case space cost of O(1) per update step and a worst-case time cost of O(log m) per access or update step (m is the total number of update operations). The idea is to record all changes made to node fields in the nodes themselves, without erasing old values of the fields. This requires that we allow nodes to become arbitrarily "fat", i.e., to hold an arbitrary number of values of each field. To be more precise, each fat node contains the same information and pointer fields as an ephemeral node, along with space for an arbitrary number of extra field values. Each extra field has an associated field name and a version stamp. The version stamp indicates the version in which the named field was changed and assigned the specified value. In addition, each fat node has its own version stamp, indicating the version in which the node was created.
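The fat node method can be sketched as follows, assuming partial persistence (only the newest version is modified, so the version stamps of a field arrive in non-decreasing order). The class name `FatNode` and its methods are hypothetical. Each field keeps the sorted list of version stamps at which it was assigned; an access to field f in version i returns the value with the maximum version stamp no greater than i, found by binary search, which matches the O(log m) access cost stated above.

```python
import bisect
from typing import Any, Dict, List, Tuple


class FatNode:
    """A fat node: every field keeps all of its values, each with a version stamp."""

    def __init__(self, version: int, **fields: Any) -> None:
        self.created = version                  # version in which the node was created
        self._fields: Dict[str, Tuple[List[int], List[Any]]] = {}
        for name, value in fields.items():
            self.set(name, value, version)

    def set(self, name: str, value: Any, version: int) -> None:
        stamps, values = self._fields.setdefault(name, ([], []))
        if stamps and version < stamps[-1]:
            raise ValueError("partial persistence: only the newest version is modified")
        if stamps and stamps[-1] == version:
            values[-1] = value                  # repeated change within the same version
        else:
            stamps.append(version)              # old values are never erased
            values.append(value)

    def get(self, name: str, version: int) -> Any:
        stamps, values = self._fields[name]
        # Index of the maximum version stamp no greater than `version`.
        i = bisect.bisect_right(stamps, version) - 1
        if i < 0:
            raise KeyError(f"field {name!r} has no value in version {version}")
        return values[i]


node = FatNode(0, key=5, left=None)
node.set("key", 7, 3)
print(node.get("key", 2), node.get("key", 3), node.get("key", 10))
```

Because every update appends one (stamp, value) pair, the space cost is O(1) per update step, as in the method described above.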
The resulting persistent structure has all the versions of the ephemeral structure embedded in it. We navigate through the persistent structure as follows: when an ephemeral access step applied to version i accesses field f of a node, we access the value in the corresponding fat node whose field name is f, choosing among several such values the one with the maximum version stamp no greater than i. The second method, the node-copying method, allows nodes in the persistent structure to hold only a fixed number of field values. When we run out of space in a node, we create a new copy of the node, containing only the newest value of each field. We must also store pointers to the new copy in all predecessors of the copied node in the newest version. If there is no space in a predecessor for such a pointer, the predecessor, too, must be copied. Nevertheless, if the underlying ephemeral structure has nodes of constant bounded in-degree, then we can derive an O(1) amortized bound on the number of nodes copied and the time required per update step.

¹ By amortized cost we mean the cost of an operation averaged over a worst-case sequence of operations. See the survey paper of Tarjan [17].

Moreover, using more powerful techniques applied to the fat node approach, we can make an ephemeral linked structure fully persistent at a worst-case space cost of O(1) per update step and an O(log m) worst-case time cost per access or update step. In this case, the versions of a fully persistent structure are only partially ordered, whereas the versions of a partially persistent structure have a natural linear ordering.
This partial ordering is defined by a rooted version tree, whose nodes are the versions (0 through m), with version i the parent of version j if version j is obtained by updating version i. The sequence of updates giving rise to version i corresponds to the path in the version tree from the root to i. With a variant of the node-copying method called node splitting, we can make an ephemeral linked structure of constant bounded in-degree fully persistent at an O(1) amortized time and space cost per update step and an O(1) worst-case time per access step. The major difference between node splitting and node copying is that in the former, when a node overflows, a new copy is created and roughly half of the extra pointers are moved from the old copy to the new one, thereby leaving space in both the old and the new copy for later updates. More details on the method are presented in [7]. Combining the fat node method with the delayed updating technique of Tsakalidis ([18], [19]), we can derive a fully persistent form of a balanced search tree with the same time and space bounds as in the partially persistent case, although insertion and deletion take O(log n) amortized rather than worst-case time.

4. The LS-Quintary Tree

4.1. Analysis of the basic idea

The Quintary tree has been designed to answer each one of the four match queries in a uniform way. The skeleton tree and the MIDDLE subtrees provide the structure with all the information needed to answer exact match queries. The MIDLEFT and MIDRIGHT trees are used whenever a key is not specified: their purpose is to provide the search algorithm with all the information needed to answer a partial, range or partial range query.
The study of the Quintary tree (see, for example, Figure 2) led to the observation that the tuples stored in the MIDLEFT (MIDRIGHT) subtree of the root node of the structure are the tuples stored in the MIDLEFT, MIDDLE and MIDRIGHT subtrees of every node of the skeleton tree in the LEFT (RIGHT) subtree of the root node. This duplication is deliberate: information located at a level d of the tree is also stored at an upper level, so that it can be searched when the respective key is unspecified. This yields the following lemma:

Lemma 1: The tuples contained in the MIDLEFT (MIDRIGHT) tree of a node r at a level d of a Quintary tree are the tuples contained in the MIDLEFT, MIDDLE and MIDRIGHT subtrees of the LEFT (RIGHT) son of r.

Proof: By the definition of the Quintary tree, the MIDLEFT tree of a node r at level d is obtained by projecting the tuples with value x < r in the d-th coordinate onto the hyperplane $H_{k-d} = r$ (see Figure 3). These tuples are the tuples having the value leftson(r) in the d-th coordinate (these are the tuples stored in the MIDDLE tree of leftson(r)), plus the tuples with a value y < leftson(r) in the d-th coordinate, plus the tuples with a value z such that leftson(r) < z < r. All of these are projected onto the same hyperplane. By the definition of the Quintary tree, the last two sets are the tuples stored in MIDLEFT(leftson(r)) and MIDRIGHT(leftson(r)) respectively. □

[Figure 3. A snapshot of a Quintary tree: a node r and its left son leftson(r), with their MIDLEFT, MIDDLE and MIDRIGHT subtrees.]
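Lemma 1 can be checked mechanically on small inputs. The following Python sketch is an in-memory model under stated assumptions: records are tuples of k keys followed by a page number (which keeps projected tuples distinct), and projection onto the hyperplane drops the coordinate of the current level. It builds one skeleton level with the MIDLEFT, MIDDLE and MIDRIGHT point sets of each node and verifies that MIDLEFT(r) (MIDRIGHT(r)) equals the union of the three MID sets of the LEFT (RIGHT) son of r. The function names are hypothetical, and only the point sets are compared, not the lower-level trees built on them.

```python
import random
from typing import Dict, List, Optional


def skeleton(points: List[tuple], dim: int) -> Optional[Dict]:
    """One skeleton level over coordinate `dim`, with projected MID point sets."""

    def project(p: tuple) -> tuple:
        return p[:dim] + p[dim + 1:]            # drop coordinate `dim`

    def make(pts: List[tuple], vals: List) -> Optional[Dict]:
        if not vals:
            return None
        m = len(vals) // 2
        v = vals[m]
        less = [p for p in pts if p[dim] < v]
        greater = [p for p in pts if p[dim] > v]
        return {
            "midleft": {project(p) for p in less},
            "middle": {project(p) for p in pts if p[dim] == v},
            "midright": {project(p) for p in greater},
            "left": make(less, vals[:m]),
            "right": make(greater, vals[m + 1:]),
        }

    return make(points, sorted({p[dim] for p in points}))


def lemma1_holds(node: Optional[Dict]) -> bool:
    """Check MIDLEFT(r) = MID sets of LEFT son (and symmetrically), recursively."""
    if node is None:
        return True
    for son_key, mid_key in (("left", "midleft"), ("right", "midright")):
        son = node[son_key]
        expected = set() if son is None else (
            son["midleft"] | son["middle"] | son["midright"])
        if node[mid_key] != expected:
            return False
    return lemma1_holds(node["left"]) and lemma1_holds(node["right"])


random.seed(1)
k = 3
pts = [tuple(random.randrange(6) for _ in range(k)) + (page,) for page in range(200)]
print(all(lemma1_holds(skeleton(pts, dim)) for dim in range(k)))
```

Applying `lemma1_holds` recursively from the leaves to the root is the same argument that extends Lemma 1 to the whole LEFT (RIGHT) subtree of the root.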
Theorem 2: The tuples stored in the MIDLEFT (MIDRIGHT) subtree of the root node of a Quintary tree are the tuples stored in the MIDLEFT, MIDDLE and MIDRIGHT subtrees of every node of the skeleton tree in the LEFT (RIGHT) subtree of the root node.

Proof: The proof follows immediately by applying Lemma 1 recursively to every node of the LEFT (RIGHT) subtree of the root node of a Quintary tree, from the bottom of the tree up to the top. □

Using this observation, we looked for a way to delete the duplicate tuples from the bottom of the tree and so reduce the redundancy that makes the tree space-inefficient. Since the tree should answer the four queries in a uniform way, this information cannot simply be omitted; instead, it is kept in the MIDLEFT and MIDRIGHT subtrees of the root of the tree. The same redundancy appears recursively in each one of the MIDLEFT, MIDRIGHT and MIDDLE subtrees of the root, since they are Quintary trees of dimension k-1, where k is the dimension of the Quintary tree. This applies recursively to every node in the tree. In every dimension, only a skeleton tree and the MIDLEFT, MIDDLE and MIDRIGHT subtrees of the root of the skeleton tree exist; all other MIDLEFT, MIDDLE and MIDRIGHT pointers point to nodes in these subtrees of the root. The nodes that lie in the LEFT (RIGHT) subtree of the root direct their MIDLEFT, MIDDLE and MIDRIGHT pointers to a node in the MIDLEFT (MIDRIGHT) subtree of the root. A schematic presentation of this can be seen in Figure 4. The procedure applies to each one of these trees; that is, since they are Quintary trees of dimension k-1, each of them consists only of a skeleton tree and the three subtrees MIDLEFT, MIDDLE and MIDRIGHT of its root. But how can the redundancy of the original Quintary tree be eliminated?
[Figure 4. Schematic presentation of the transformation.]

The idea is to traverse the MIDLEFT (MIDRIGHT) subtree of the root node of the Quintary tree (in postorder) and mark the occurrence of any information stored in the MIDLEFT, MIDDLE and MIDRIGHT subtrees of any node in the LEFT (RIGHT) subtree of the root. The same technique is applied to each one of the MIDLEFT, MIDRIGHT and MIDDLE subtrees of the root. This is a revised version of the fat node method from the theory of persistence ([7]). The difference from the work of Driscoll et al. lies in the fact that they store extra information in both the nodes and the pointers of their structures (keeping the space linear), while in our structure the extra information corresponds only to pointers. Our technique requires that we allow nodes to become arbitrarily "fat", i.e., to hold an arbitrary number of pointer fields. To be more precise, each fat node contains the same information and pointer fields as in the original Quintary tree, along with space for an arbitrary number of extra field values. Each extra field is a mark value. This mark value indicates the path in the revised subtree of the Quintary tree that we should follow, using its pointers, to recover the original form of the Quintary tree. The leaves of the tree, where the page numbers are located, are augmented by a list of mark values. These mark values identify the paths on which the leaves are valid, i.e., the paths that we should follow to reach each leaf. We simulate the operation of the original Quintary tree by a proper use of the mark value located in the node that corresponds to the access pointer to that node.
This procedure is similar to the one followed by Driscoll et al. in [7]. Whenever we want to follow a pointer to a MIDLEFT, MIDDLE or MIDRIGHT subtree of a node in the tree, we read the mark value that characterizes the pointer to that tree; this becomes our current mark. Instead of following the corresponding tree of that node (which no longer exists), we follow the MIDLEFT or MIDRIGHT pointer of the root (depending on whether we are in the LEFT or the RIGHT subtree of the root node of the structure), choosing the pointer whose mark value equals the current mark. A new mark can be read when we move to a new MIDLEFT, MIDDLE or MIDRIGHT subtree, and the same procedure applies. This idea differs slightly from the one in [7], since the nodes do not contain their own marks: in our case a node always lies on the path when there is an access pointer to it whose mark equals the current mark.²

The marks are generated by the build algorithm and, for simplicity, are integers. They start from the value 1 and are increased each time we follow a new pointer. Instead of integers, a combination of bits or colors can be used to represent these marks. The value 1 is assigned to the MIDLEFT pointer of the leftmost child of the left subtree of the root node. The tree derived from this technique is presented in Figure 5 for the original Quintary tree of Figure 2; for simplicity, only the left subtree of the root is shown in the figure.

Let us present an example of the simulation of a search in the LS-Quintary tree. Assume that we need to find the page number where the tuple with key (b,x) is stored, that is, page number 2 (refer to the example in Figure 2). We traverse the left path from the root to b in the basic skeleton tree (Figure 5). In order to locate the key with value x, we use the MIDDLE tree of node b by following the MIDDLE pointer.
From this operation the current mark is read, that is, 5. The pointer with mark 5 in the second tree gives us access to the second level of the structure, where x is located. Using the same mark, we follow the MIDDLE pointer of node x to the list of page numbers. There we report the page number whose list of marks contains the current mark, that is, page number 2.

² Applying Lemma 1 to every node r with MIDLEFT(LEFTSON(r)) = MIDRIGHT(LEFTSON(r)) = ∅ (respectively MIDLEFT(RIGHTSON(r)) = MIDRIGHT(RIGHTSON(r)) = ∅), that is, to every node at height log N - 1, we can further reduce the data stored in the LS-Quintary tree, since the MIDLEFT (MIDRIGHT) subtree of node r is then exactly the same as the MIDDLE subtree of LEFTSON(r) (RIGHTSON(r)).

[Figure 5. The LS-Quintary tree for the Quintary tree of Figure 2.]

At this point a few remarks on the definition and the behaviour of the LS-Quintary tree should be added, for a better understanding of the build algorithm and the performance of the tree.

1. Only the distinct key values in every dimension are stored; each of these key values is stored only once.
2. Mark values are generated only in the first level of the tree and are propagated to the next levels. No additional mark values are generated in the subsequent levels.

4.2. Build algorithms

The LS-Quintary tree is built using a two-step procedure. In the first step, we follow the same procedure as for the original Quintary tree; an extended description of this algorithm can be found in [11]. In this way an original Quintary tree is built. In the second step, a compression algorithm is used to reduce the space of the tree. This algorithm traverses the tree bottom-up and scans the MIDLEFT, MIDDLE and MIDRIGHT trees of each node in the skeleton tree.
For every node in these trees, we search the MIDLEFT or MIDRIGHT subtree of the root to locate the same information. Whenever we find the same node, we stamp the access pointer to that node in the MIDLEFT or MIDRIGHT subtree of the root with a mark. The mark number assigned to access pointers is increased whenever we move to a new subtree. In this way the LS-Quintary tree is created. Due to the space limits of the paper, the algorithms are presented in a pseudo-language in Appendix I at the end of this work.

4.3. Space and time analysis

Theorem 3. An LS-Quintary tree for a file of N records in k-dimensional space can be built in $O(N\log^k N/(k-1)!)$ time and uses $O(kN)$ space.

Proof: The building process involves two distinct steps: step 1, the construction of the original Quintary tree, and step 2, the compression of the tree that results in the LS-Quintary tree. The time needed to build the original Quintary tree in step 1 can be obtained as follows. Suppose that all key values are distinct. If we let P(N,k) denote the time required to build the tree for a file of N records in k-dimensional space, then we have the following recurrence relation:

$$P(1,k) = O(1), \qquad P(N,1) = O(N\log N), \qquad P(N,k) = 2P(N/2,k) + 2P(N/2,k-1) + O(N)$$

where the term O(N) is due to steps 3-5 of the BUILD algorithm. The solution for P(N,k) is $O(N\log^k N)$. As stated in [11], M. L. Monier has obtained the better solution $O(N\log^k N/(k-1)!)$ to this recurrence relation (see [12]). In step 2, we traverse each one of the nodes of the Quintary tree and mark the occurrence of the information in the MIDLEFT, MIDDLE and MIDRIGHT subtrees onto the MIDLEFT or MIDRIGHT subtree of the root node. The number of node visits is bounded by $O(3^k\log N + N)$.
So the total build time is bounded by $O(N\log^k N/(k-1)!)$.

The total amount of storage required can be derived from the space reduction procedure applied to the Quintary tree. First we prove the following lemma.

Lemma 2. A Quintary tree for a file of N records in k-dimensional space uses $O(N\log^k N/(k-1)!)$ space.

Proof: Let S(N,k) denote the space used by a Quintary tree for a file of N records in k-dimensional space. Then we have the following recurrence relation:

$$S(1,k) = O(k), \qquad S(N,1) = O(N\log N), \qquad S(N,k) = 2S(N/2,k) + 2S(N/2,k-1) + O(N) = O(N\log^k N)$$

As stated in [11], M. L. Monier has obtained the better solution $O(N\log^k N/(k-1)!)$ to this recurrence relation (see [12]). □

As stated in [11], the N records are represented $O(N\log^k N)$ times in total. Using the COMPRESS algorithm, every one of these repeated occurrences is eliminated and each tuple is stored only once. We will prove and use the following lemma:

Lemma 3. The space occupied by the LS-Quintary tree depends on the number of distinct key values in every dimension of the tree.

Proof: Assume that a key requires O(1) space for storage. The total space needed for the LS-Quintary tree consists of the space occupied by the key values and that of the additional mark values. As stated in [11], the information stored in the pages is not counted, since it is not part of the index structure. Assume that we have to store a file F of N records in k-dimensional space and that each dimension i has $d_i$ distinct values. By the definition of the LS-Quintary tree, each distinct value in any dimension is stored only once. Therefore, the space for the key values in each dimension i is proportional to the number of distinct values, namely $O(d_i)$, for $1 \le i \le k$. The total space $S_{\mathrm{key\_values}}$ for the storage of the key values in all dimensions is:

$$S_{\mathrm{key\_values}} = O(d_1) + O(d_2) + \cdots + O(d_k)$$

For simplicity and with no loss of generality, assume that $d_1 \approx d_2 \approx \cdots \approx d_k \approx d$.
In this case the space $S_{\mathrm{key\_values}}$ needed for the key values is bounded by kd, and the following equation holds:

$$S_{\mathrm{key\_values}} = O(kd) \tag{1}$$

Now we calculate the space overhead caused by the mark values. Mark values are generated only by the COMPRESS_LEFT (COMPRESS_RIGHT) algorithm. The numbers are generated by the execution of these algorithms in the first dimension and are propagated to the next dimensions; no new numbers are generated in the other dimensions of the LS-Quintary tree. For d distinct values in the first dimension we have at most λ mark values. The number λ is calculated as follows. In any balanced tree with d keys, we have $\lceil d/2 \rceil$ leaves and $\lfloor d/2 \rfloor$ internal nodes. In the LS-Quintary tree, one mark value is generated by each leaf and three mark values by each internal node. Therefore:

$$\lambda = \left\lceil \frac{d}{2} \right\rceil + 3\left\lfloor \frac{d}{2} \right\rfloor \le 2d = O(d) \tag{2}$$

This means that the space for the mark values depends on the number d of distinct key values, and Lemma 3 holds. □

In order to calculate the space of the LS-Quintary tree, we first examine separately one particular case, which corresponds to the worst case of the original Quintary tree: a file with N records composed of all possible combinations of the distinct values in each dimension. In this case the original Quintary tree occupies $O(N\log^k N)$ space. In the case of the LS-Quintary tree, since the file contains all the possible combinations of the d distinct values, we have:

$$N = d^k \iff d = N^{1/k} \tag{3}$$

We define p(j) as the probability that a pointer contains exactly j mark values. Assume that any pointer in the tree in dimension i, with $2 \le i \le k$, contains on average $\overline{MV}$ mark values.
In this case, since we examine only the left tree, it follows from (2) and Figure 5 that a pointer contains at most d mark values. Based on these observations, we have:

$$MV = 1 \cdot p(1) + 2 \cdot p(2) + \dots + d \cdot p(d) \tag{4}$$
$$p(1) + p(2) + \dots + p(d) = 1 \tag{5}$$

From the definition of the LS-Quintary tree and the remarks of Section 4.1, we can assume with no loss of generality that:

$$p(1) = p(2) = \dots = p(d) = p \tag{6}$$

Substituting (6) into (5):

$$p + p + \dots + p = 1 \implies dp = 1 \implies p = \frac{1}{d} \tag{7}$$

Therefore, by (7):

$$MV = p + 2p + \dots + dp = p(1 + 2 + \dots + d) = \frac{1}{d} \cdot \frac{d(d+1)}{2} = \frac{d+1}{2}$$

As this equation shows, the worst case of the Quintary tree turns into a case where the mark values are distributed uniformly from the second to the last level of the LS-Quintary tree. Based on this calculation, the number of mark values per pointer is O(d) in the worst case. Repeating the same calculation for all the pointers of the tree, we have at most $O(d^2)$ additional mark values in each dimension, that is, $O(kd^2)$ in total. Using (3), the total space $S_{LS\_Quintary}$ of the tree is:

$$S_{LS\_Quintary} = S_{key\_values} + S_{mark\_values} = O(kd) + O(kd^2) = O\left[k\left(N^{1/k} + N^{2/k}\right)\right] \tag{8}$$

From equation (8) it follows that the space of the LS-Quintary tree depends not on N alone (as in the original Quintary tree) but on the ratio N/d. For the worst case of the original Quintary tree, the LS-Quintary tree therefore has very good space performance: the space in this case is bounded by O(kN). Performance degrades as d approaches N; in that case, although more mark values are generated from the first dimension of the tree, these values appear fewer times in the other dimensions, since the records do not repeat each other (this case is the same as in [7]). It remains to prove that the space is bounded by O(kN) in every case. A formal proof of this claim follows.
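The counting behind (2), (4)-(7) and (8) for the worst-case file N = d^k can be checked numerically. The Python sketch below is illustrative only and rests on stated assumptions: every O(·) is given a unit constant, the "balanced tree with d keys" is modelled as a complete (heap-shaped) binary tree, and the Lemma 2 recurrence is evaluated with the assumed base cases S(1,k) = k and S(N,1) = N⌈log₂N⌉. All function names are hypothetical and not part of the report.

```python
from fractions import Fraction
from functools import lru_cache
from typing import Tuple


def first_dimension_marks(d: int) -> Tuple[int, int, int]:
    """Eq. (2): leaves, internal nodes and mark values for d keys in the first dimension."""
    # Complete (heap-shaped) tree: node i, 1-based, is a leaf exactly when 2*i > d.
    leaves = sum(1 for i in range(1, d + 1) if 2 * i > d)
    internal = d - leaves
    return leaves, internal, leaves + 3 * internal  # 1 mark per leaf, 3 per internal node


def expected_marks_per_pointer(d: int) -> Fraction:
    """Eqs. (4)-(7): MV when p(1) = ... = p(d) = 1/d."""
    p = Fraction(1, d)
    return sum(j * p for j in range(1, d + 1))


def ls_quintary_space(d: int, k: int) -> int:
    """Eq. (8) with unit constants: k*d key values plus k*d^2 mark values."""
    return k * d + k * d * d


@lru_cache(maxsize=None)
def quintary_space(n: int, k: int) -> int:
    """Lemma 2 recurrence with unit constants; base cases S(1,k)=k, S(N,1)=N*ceil(log2 N)."""
    if n <= 1:
        return k
    if k == 1:
        return n * (n - 1).bit_length()
    half = n // 2
    return 2 * quintary_space(half, k) + 2 * quintary_space(half, k - 1) + n


# Worst-case files of the original tree: all combinations of d values, so N = d**k.
for d, k in [(4, 3), (16, 2), (8, 4)]:
    n = d ** k
    print(f"N={n} k={k}: Quintary {quintary_space(n, k)}, "
          f"LS-Quintary {ls_quintary_space(d, k)}, kN {k * n}")
```

With exact fractions, `expected_marks_per_pointer(d)` equals (d+1)/2 for every d, and for k ≥ 2 the unit-constant value of (8) never exceeds 2kN. The printed figures only illustrate the trend under these assumptions; they are not measurements of either structure.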
For the general case, in order to calculate the total space overhead caused by the mark values, we examine each level of the LS-Quintary tree separately. If the size of the first level is $d_1$, the number of mark values at the first level is $O(d_1)$, since at most three mark values are generated at each node of the tree. To calculate the overhead caused by the mark values at the second level of the LS-Quintary tree, we distinguish three cases, based on the relation between the numbers of distinct keys in the first and second levels. Namely:

Case 1. Both levels contain many distinct keys. Let $d_1$ and $d_2$ be the numbers of distinct key values in the first and second level, respectively. This case corresponds to the case examined above, and the proof is the same.

Case 2. The first level contains many distinct keys, while the second level contains few distinct keys. In this case assume that $d_1 \approx N$ and $d_2 \ll N$. The space overhead $S_2$ generated by the mark values of the second level depends on the number P of pointers at this level and on the number MV of mark values that each pointer contains. Namely:

$$S_2 = P \cdot MV$$

From the definition of the tree, $P = O(d_2)$. Moreover, in the worst case each pointer contains at most $d_1$ mark values, so $MV = O(d_1)$. Therefore:

$$S_2 = O(d_1 d_2)$$

Since $d_2 \ll N$, we can take $d_2 \le \varepsilon$ for a small constant ε, and therefore (since $d_1 \approx N$):

$$S_2 = O(\varepsilon d_1) \implies S_2 = O(N)$$

Case 3. The first level contains very few distinct keys, while the second level contains many distinct keys. In this case assume that $d_1 \ll N$ and $d_2 \approx N$. The space overhead $S_2$ is again:

$$S_2 = P \cdot MV$$

From the definition of the tree, $P = O(d_2)$. Moreover, in the worst case each pointer contains $d_1$ mark values, so $MV = O(d_1)$.
Therefore:

$$S_2 = O(d_1 d_2)$$

Since $d_1 \ll N$, we can take $d_1 \le \varepsilon$ for a small constant ε, and therefore (since $d_2 \approx N$):

$$S_2 = O(\varepsilon d_2) \implies S_2 = O(N)$$

Thus, in every case, the space overhead generated by the mark values of the second level of the LS-Quintary tree is O(N), and since the space for the keys is also bounded by O(N), the total space of the level is O(N). In each level beyond the second, the total space is also O(N), since no additional mark values are generated (other than those generated at the first level), and the trees of these levels, in their relation to the previous level, always fall into one of the three cases described above. In conclusion, the LS-Quintary tree consists of k different levels, each stored in a tree of size O(N). Therefore:

$$S_{LS\_Quintary} = S_{key\_values} + S_{mark\_values} = O(kN) + O(kN) = O(kN)$$

From the above discussion we conclude that Theorem 3 holds. □

Remark. It remains to determine the search-time overhead of reducing the space of the original Quintary tree. The basic skeleton has not been changed in any way that affects its traversal, so no access-time overhead arises for operations on the skeleton. Whenever an access pointer to a fat node of the LS-Quintary tree has to be followed, the algorithm must decide which of the corresponding pointers to choose. This takes more than constant time. The choice is made using the marks stored in the fat node. These marks can be stored in their physical order in a binary search tree; then O(log m) time is needed to choose any pointer, where m is the number of times that we have to access the MIDLEFT or MIDRIGHT subtree of the root to locate a tuple. In the worst case this time is bounded by O(log N). The same organization can also be applied to the leaves (alternatively, we can use the already normalized order of the marks and create an array of access pointers with one entry per leaf; these access pointers can then be retrieved in constant time [3]). Using this representation, we have an O(log N) amortized cost per reported answer. For an extended mathematical proof of the whole construction, please refer to [7].

4.4. Search algorithms

The search algorithms that handle the four types of query, and the analysis of their worst-case performance, are presented in detail in Appendix II.

5. Comparison with other multidimensional data structures

The following table summarizes the complexity bounds of Quintary trees, LS-Quintary trees, k-d trees ([1], [10]), Range trees ([20]), k-ranges ([2]) and Multiple Attribute Trees (MAT) ([8], [15]).

Structure | Space | Build Time | Exact Match | Partial Match | Range Match | Partial Range
--- | --- | --- | --- | --- | --- | ---
Quintary tree | O(N log^{k-1} N/(k-1)!) | O(N log^{k-1} N/(k-1)!) | O(log N + k) | O(3^{k-s}(s + log N) + t) | O(log^k N + t) | O(3^{k-s} log^s N + t)
LS-Quintary tree | O(kN) | O(N log^{k-1} N/(k-1)!) | O(log N + k log N) | O(3^{k-s}(s + log N) log N + t log N) | O(log^k N + t log N) | O(3^{k-s} log^{s+1} N + t log N)
k-d trees | O(N) | O(N log N) | - | O(N^{(k-s)/k} + t) | O(kN^{1-1/k} + t) | O(sN^{1-1/k} + t)
Range tree | O(N log^{k-1} N) | O(N log^{k-1} N) | - | - | O(log^k N + t) | -
Nonoverlapping k-ranges | O(N) | O(N log N) | - | - | O(N^ε + t), ε > 0 | -
Overlapping k-ranges | O(N^{1+ε}), ε > 0 | O(N^{1+ε}), ε > 0 | - | - | O(log N + t) | -
MAT | O(N) | O(N log N) | O(kN^{1/k} + t) | O(sN^{1-(s-1)/k} + t) | O(N^{1-1/k} log(N^{1/k}) + t) | -

A dash (-) means that the respective data structure cannot efficiently support this kind of query.

6. Conclusion

As can easily be seen, the Quintary tree is a well-formed structure intended mainly for static files. As Lee and Wong stated in [11], the tree is inadequate for frequent updates, since these may destroy the properties of the structure and affect its performance.
However, for completeness, Lee and Wong outlined algorithms for insertion and deletion. The same algorithms apply to LS-Quintary trees; the only difference is the time required for insertion. The original Quintary tree requires O(k + (log N)^k) time, and this bound is multiplied by a log N factor for LS-Quintary trees. However, this is not a robust insert/delete procedure, since too many updates destroy the whole tree. It therefore remains an open problem to derive a variant of the LS-Quintary tree that preserves all its original properties and is dynamic. We believe that a more powerful construction algorithm for the data structure is needed for this. Our future work focuses mainly on the derivation of a simple and powerful construction algorithm and, secondly, on the dynamization of the structure.

References

[1] Bentley, J.L., "Multidimensional Binary Search Trees used for Associative Searching", Communications of the ACM, 18, (1975), pp. 509-517.
[2] Bentley, J.L. and H.A. Maurer, "Efficient worst-case data structures for range searching", Acta Informatica, 13, (1980), pp. 155-168.
[3] Boroujerdi, A.R. and B.M.E. Moret, "Persistence in Computational Geometry", to appear in the Proceedings of the Seventh Canadian Conference on Computational Geometry, (1995).
[4] Cardenas, F. and J.P. Sagamang, "Doubly-chained Tree Database Organization - Analysis and Design Strategies", Comput. J., 20, (1977), pp. 15-26.
[5] Chazelle, B., "Lower bounds for orthogonal range searching: I. The reporting case", Journal of the ACM, 37, 2, (1990), pp. 200-212.
[6] Chazelle, B., "Lower bounds for orthogonal range searching: II. The arithmetic model", Journal of the ACM, 37, 3, (1990), pp. 439-463.
[7] Driscoll, J.R., Sarnak, N., Sleator, D.D. and R.E. Tarjan, "Making data structures persistent", Journal of Computer and System Sciences, 38, (1989), pp. 86-124.
[8] Kashyap, L., Subas, S.K.C. and S.B. Yao, "Analysis of the Multi-Attribute-Tree Database organization", IEEE Trans. Software Eng., SE-6, (1977), pp. 451-467.
[9] Knuth, D.E., "Big omicron, big omega and big theta", SIGACT News, 8, (1976), pp. 18-24.
[10] Lee, D.T. and C. Wong, "Worst case analysis for region and partial region searches in multidimensional binary search trees and quad trees", Acta Inf., 9, 1, (1978), pp. 23-29.
[11] Lee, D.T. and C. Wong, "Quintary Trees: A File Structure for Multidimensional Database Systems", ACM Trans. on Database Systems, 5, 3, (1980), pp. 339-353.
[12] Monier, L., "Combinatorial solutions of multidimensional divide-and-conquer recurrences", J. Algorithms, 1, (1980), pp. 60-74.
[13] Sarnak, N., "Persistent Data Structures", Ph.D. thesis, Dept. of Computer Science, New York University, New York, (1986).
[14] Sarnak, N. and R.E. Tarjan, "Planar Point Location using Persistent Search Trees", Comm. ACM, 29, 7, (1986), pp. 669-679.
[15] Sitharama Iyengar, S., Rao, N.S.V., Kashyap, R.L. and V.K. Vaishnavi, "Multidimensional data structures: Review and outlook", Advances in Computers, 27, (1988), pp. 69-119.
[16] Tarjan, R.E., "Data structures and network algorithms", Society for Industrial and Applied Mathematics, Philadelphia, PA, (1983).
[17] Tarjan, R.E., "Amortized computational complexity", SIAM J. Algebraic Discrete Methods, 6, (1985), pp. 306-318.
[18] Tsakalidis, A.K., "An optimal implementation for localized search", A84/06, Fachbereich Angewandte Mathematik und Informatik, Universität des Saarlandes, Saarbrücken, West Germany, (1984).
[19] Tsakalidis, A.K., "AVL-trees for localized search", Inform. and Control, 67, (1985), pp. 173-194.
[20] Willard, D.E., "New Data Structure for Orthogonal Range Queries", SIAM J. Comput., 14, 1, (February 1985), pp. 232-253.
[21] Willard, D.E. and G. Lueker, "Adding Range Restriction capabilities to Dynamic Data Structures", Journal of the ACM, 32, (1985), pp. 597-617.
[22] Vaishnavi, K., "Multidimensional Balanced Binary Trees", IEEE Trans. on Comput., 38, 7, (July 1989), pp. 968-985.

Appendix I

BUILD(d, F)
/* This algorithm takes as input a file F of records and the dimension d of the associated file space. It returns a pointer to the root of the tree for the file F if d ≥ 1. Otherwise it returns a pointer to a list of page numbers where the records of F reside. */
begin
1. if d < 1 then return(pointer to the list of page numbers of F)
2. Create a new node NODE with five pointer fields LEFT, MIDLEFT, MIDDLE, MIDRIGHT and RIGHT and one value field VAL.
3. Find the median of the file F of records with respect to key K_{k-d}. Let it be r and set NODE.VAL = r.
4. Partition the file F into three subfiles F_left, F_mid and F_right containing the records whose K_{k-d} values are less than, equal to and greater than r, respectively.
5. Let F_midleft and F_midright denote the files of records obtained by projecting F_left and F_right onto the hyperplane H_{k-d} = r. If d = 1, the files F_midleft and F_midright will be the lists of page numbers where the records of F_left and F_right reside, respectively.
6. if F_left is empty then
      begin
      NODE.LEFT = ∅
      NODE.MIDLEFT = ∅
      end
   else
      begin
      NODE.LEFT = BUILD(d, F_left)
      NODE.MIDLEFT = BUILD(d-1, F_midleft)
      end
7. if F_right is empty then
      begin
      NODE.RIGHT = ∅
      NODE.MIDRIGHT = ∅
      end
   else
      begin
      NODE.RIGHT = BUILD(d, F_right)
      NODE.MIDRIGHT = BUILD(d-1, F_midright)
      end
8. NODE.MIDDLE = BUILD(d-1, F_mid)
9. return(addr(NODE))
end

COMPRESS(NODE, d)
/* The algorithm compresses the Quintary tree rooted at NODE by scanning the LEFT and RIGHT subtrees and comparing the MIDLEFT, MIDDLE and MIDRIGHT subtrees of each of their nodes with the MIDLEFT or MIDRIGHT subtree of NODE. It calls the COMPRESS_LEFT procedure to compress the LEFT subtree and the COMPRESS_RIGHT procedure to compress the RIGHT subtree. It is invoked by COMPRESS(ROOT, k), where ROOT is the root node of the Quintary tree. */
begin
1. if NODE ≠ ∅ then
      begin
      MARK = 1   /* This number will be used to mark the occurrences of the same information in the MIDLEFT and MIDRIGHT subtrees of NODE */
2.    if NODE.LEFT ≠ ∅ then
         begin
         COMPRESS_LEFT(NODE, NODE.LEFT)
         if d > 1 then COMPRESS(NODE.MIDLEFT, d-1)
         end
3.    if NODE.RIGHT ≠ ∅ then
         begin
         COMPRESS_RIGHT(NODE, NODE.RIGHT)
         if d > 1 then COMPRESS(NODE.MIDRIGHT, d-1)
         end
4.    if d > 1 then COMPRESS(NODE.MIDDLE, d-1)
      end
end

COMPRESS_LEFT(root, NODE)
/* The algorithm compresses the LEFT subtree of a Quintary tree. It compares the MIDLEFT, MIDDLE and MIDRIGHT subtrees of each node NODE with the MIDLEFT subtree of the root node of the tree. Whenever the same information is located, it is marked by the MARK_STRUCTURES procedure with the MARK value generated in the COMPRESS_LEFT procedure. */
begin
1. if NODE.LEFT ≠ ∅ then COMPRESS_LEFT(root, NODE.LEFT)
2. if NODE.MIDLEFT ≠ ∅ then
      begin
      MARK = MARK + 1
      MARK_STRUCTURES(root.MIDLEFT, NODE.MIDLEFT, MARK)
      end
3. if NODE.MIDRIGHT ≠ ∅ then
      begin
      MARK = MARK + 1
      MARK_STRUCTURES(root.MIDLEFT, NODE.MIDRIGHT, MARK)
      end
4. MARK = MARK + 1
5. MARK_STRUCTURES(root.MIDLEFT, NODE.MIDDLE, MARK)
6. if NODE.RIGHT ≠ ∅ then COMPRESS_LEFT(root, NODE.RIGHT)
end

COMPRESS_RIGHT(root, NODE)
/* The algorithm compresses the RIGHT subtree of a Quintary tree.
It compares the MIDLEFT, MIDDLE and MIDRIGHT subtrees of each node NODE with the MIDRIGHT subtree of the root node of the tree. Whenever the same information is located, it is marked by the MARK_STRUCTURES procedure with the MARK value generated in the COMPRESS_RIGHT procedure. */
begin
1. if NODE.RIGHT ≠ ∅ then COMPRESS_RIGHT(root, NODE.RIGHT)
2. if NODE.MIDLEFT ≠ ∅ then
      begin
      MARK = MARK + 1
      MARK_STRUCTURES(root.MIDRIGHT, NODE.MIDLEFT, MARK)
      end
3. if NODE.MIDRIGHT ≠ ∅ then
      begin
      MARK = MARK + 1
      MARK_STRUCTURES(root.MIDRIGHT, NODE.MIDRIGHT, MARK)
      end
4. MARK = MARK + 1
5. MARK_STRUCTURES(root.MIDRIGHT, NODE.MIDDLE, MARK)
6. if NODE.LEFT ≠ ∅ then COMPRESS_RIGHT(root, NODE.LEFT)
end

MARK_STRUCTURES(NODE1, NODE2, MARK_STAMP)
/* The algorithm checks whether the value stored in NODE1 is equal to the value stored in NODE2. If so, the node and the access pointer to that node are stamped with MARK_STAMP. Otherwise, depending on how the two values compare, the left or the right path of the tree in which NODE1 is stored is followed. */
begin
1. if NODE2.VAL == NODE1.VAL then
      begin
      Stamp NODE1 and the access pointer to NODE1 with MARK_STAMP
      Create 5 NIL pointers to NODE1 (that is, LEFT, MIDLEFT, MIDDLE, MIDRIGHT, RIGHT)
      /* These pointers are created so that they exist when there is no node next to their corresponding node, or no access pointer to the next node, so that the previous step can be performed in the next recursive execution of the algorithm. They add only a linear space overhead to the whole structure. */
      /* Perform the same check for every subtree (LEFT, RIGHT, MIDLEFT, MIDRIGHT and MIDDLE) of both nodes */
      MARK_STRUCTURES(NODE1.LEFT, NODE2.LEFT, MARK_STAMP)
      MARK_STRUCTURES(NODE1.RIGHT, NODE2.RIGHT, MARK_STAMP)
      MARK_STRUCTURES(NODE1.MIDLEFT, NODE2.MIDLEFT, MARK_STAMP)
      MARK_STRUCTURES(NODE1.MIDDLE, NODE2.MIDDLE, MARK_STAMP)
      MARK_STRUCTURES(NODE1.MIDRIGHT, NODE2.MIDRIGHT, MARK_STAMP)
      end
   else
      begin
2.    if NODE2.VAL == ∅ then leave the pointers NIL
      /* Find out the branch that the algorithm should follow */
3.    else if NODE2.VAL < NODE1.VAL then MARK_STRUCTURES(NODE1.LEFT, NODE2, MARK_STAMP)
4.    else if NODE2.VAL > NODE1.VAL then MARK_STRUCTURES(NODE1.RIGHT, NODE2, MARK_STAMP)
      end
   /* The case NODE1.VAL == ∅ is not possible if NODE2 ≠ ∅, according to Theorem 2 */
end

Appendix II

Search algorithms

In this section we present the algorithms that handle the four types of query and analyze their worst-case performance. For exact match and partial match queries, the query Q is assumed to be a k-tuple (r_0, r_1, …, r_{k-1}); in a partial match query, r_i may be * if the key k_i is unspecified. K_i(Q) denotes the value of key k_i in the given query Q. Similarly, for range and partial range queries Q is a 2×k array, and LK_i(Q) and UK_i(Q) denote, respectively, the lower and upper bound on the value of key k_i. In the following algorithms, T is a pointer to the tree to be searched and value(T) is the value stored at the root node of that tree. ROOT is assumed to be the pointer to the entire tree.

Exact Match

EXACT_MATCH(T, d, Q)
/* The algorithm is rather self-explanatory */
begin
1. if T == ∅ then return(∅)
2. if d == 0 then return(T)
3. if K_{k-d}(Q) < value(T) then EXACT_MATCH(LEFT(T), d, Q)
4. else if K_{k-d}(Q) > value(T) then EXACT_MATCH(RIGHT(T), d, Q)
5. else EXACT_MATCH(MIDDLE(T), d-1, Q)
end.

Since each node visit (a key comparison) either discards roughly half of the file represented by the node or moves one dimension down, the total number of node visits is at most O(k + log N), where k is the number of levels of the tree, i.e. the dimension of the file space. In the LS-Quintary tree, the cost of reporting the answers is multiplied by a factor of log N because of the fat nodes in the tree.
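The log N factor comes from the pointer choice at a fat node described in the Remark of Section 4. The sketch below illustrates that choice under stated assumptions: the class `FatNode` and its methods are hypothetical, the search is assumed to already know the MARK stamp it must follow, and a sorted array searched with Python's `bisect` module is swapped in for the binary search tree mentioned in the Remark. For a static structure both give O(log m) comparisons per choice.

```python
import bisect
from typing import Any, List, Optional


class FatNode:
    """Hypothetical fat node: access pointers kept sorted by their MARK stamps."""

    def __init__(self) -> None:
        self.marks: List[int] = []    # sorted MARK stamps
        self.targets: List[Any] = []  # targets[i] is the pointer stamped with marks[i]

    def add_pointer(self, mark: int, target: Any) -> None:
        # Done once while the tree is compressed; list insertion itself costs O(m).
        i = bisect.bisect_left(self.marks, mark)
        self.marks.insert(i, mark)
        self.targets.insert(i, target)

    def select(self, mark: int) -> Optional[Any]:
        # Binary search over the stamps: O(log m) comparisons per pointer choice.
        i = bisect.bisect_left(self.marks, mark)
        if i < len(self.marks) and self.marks[i] == mark:
            return self.targets[i]
        return None


node = FatNode()
for mark, target in [(7, "subtree of mark 7"), (2, "subtree of mark 2"), (5, "subtree of mark 5")]:
    node.add_pointer(mark, target)
print(node.select(5))  # subtree of mark 5
print(node.select(3))  # None
```

Because the LS-Quintary tree is built once and then only searched, paying O(m) per insertion during construction keeps the lookup path as simple as a binary search over an array.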
Partial Match

PARTIAL_MATCH(T, d, Q)
/* The algorithm for partial match queries is similar to the algorithm for exact match queries, except that when a key is unspecified we have to search through the three attached subtrees at one lower level (step 3'). If s is the number of specified keys in the query, the worst case, in which the algorithm makes the largest number of node visits, occurs when the specified keys are at the last s levels. The maximum number of node visits is bounded by O(3^{k-s}(s + log N) + t), where t is the number of records retrieved. This bound is multiplied by a factor of log N for every step that accesses the MIDLEFT, MIDDLE and MIDRIGHT subtrees of the tree. */
begin
1. if T == ∅ then return(∅)
2. if d == 0 then return(T)
3'. if K_{k-d}(Q) == "*" then
       begin
       PARTIAL_MATCH(MIDLEFT(T), d-1, Q)
       PARTIAL_MATCH(MIDDLE(T), d-1, Q)
       PARTIAL_MATCH(MIDRIGHT(T), d-1, Q)
       end
3. else if K_{k-d}(Q) < value(T) then PARTIAL_MATCH(LEFT(T), d, Q)
4. else if K_{k-d}(Q) > value(T) then PARTIAL_MATCH(RIGHT(T), d, Q)
5. else PARTIAL_MATCH(MIDDLE(T), d-1, Q)
end.

Range Search

RANGE_SEARCH(T, d, Q, L, R)
/* First, the root node of the entire tree at level 0 is visited. If the node value is out of bounds, i.e. not within the closed interval [LK_0(Q), UK_0(Q)], we follow either the LEFT or the RIGHT pointer, depending on whether the node value is greater than UK_0(Q) or less than LK_0(Q). Otherwise we invoke the algorithm recursively, visiting the MIDDLE subtree at one lower level and the LEFT and RIGHT subtrees at the same level, setting the control variables L and R, respectively, to 1. The control variables indicate that the interval [LK_0(Q), UK_0(Q)] has been partitioned into the two subintervals [LK_i(Q), x) and (x, UK_i(Q)] (see footnote 3), where x is the value stored at the current node; they remain 1 during the searches of the subtrees at the same level.
Since each interval [LK_i(Q), UK_i(Q)] can be partitioned into at most log N subintervals at each level i, the maximum number of node visits is bounded by O(log^k N) plus the number of records found in the specified region, i.e. O(log^k N + t), where t is the number of records in the region. In the LS-Quintary tree, the cost of every answer is increased by a factor of log N because of the fat nodes in the tree. */
begin
1. if T == ∅ then return(∅)
2. if d == 0 then return(T)
3. if value(T) < LK_{k-d}(Q) then RANGE_SEARCH(RIGHT(T), d, Q, L, R)
4. else if value(T) > UK_{k-d}(Q) then RANGE_SEARCH(LEFT(T), d, Q, L, R)
5. else
      begin
      RANGE_SEARCH(MIDDLE(T), d-1, Q, 0, 0)
      if R == 0 then RANGE_SEARCH(LEFT(T), d, Q, 1, R)
      else RANGE_SEARCH(MIDLEFT(T), d-1, Q, 0, 0)
      if L == 0 then RANGE_SEARCH(RIGHT(T), d, Q, L, 1)
      else RANGE_SEARCH(MIDRIGHT(T), d-1, Q, 0, 0)
      end
end.

Footnote 3: [x, y) denotes the interval containing every z such that x ≤ z < y.

Partial Range

PARTIAL_RANGE
The algorithm is similar to the RANGE_SEARCH algorithm, except that if a key is unspecified we have to search through the three attached subtrees at one lower level. We simply add the following step between steps 2 and 3:

3'. if key K_{k-d} is unspecified then
       begin
       RANGE_SEARCH(MIDLEFT(T), d-1, Q, 0, 0)
       RANGE_SEARCH(MIDDLE(T), d-1, Q, 0, 0)
       RANGE_SEARCH(MIDRIGHT(T), d-1, Q, 0, 0)
       end
    else ...

The maximum number of node visits is bounded by O(3^{k-s}(log N)^s + t), where s is the number of specified keys and t is the number of records in the region. This bound is multiplied by a factor of log N for every step that accesses the MIDLEFT, MIDDLE and MIDRIGHT subtrees of the tree.
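To make the five-way structure and the L/R control variables concrete, here is a minimal in-memory sketch of BUILD, EXACT_MATCH and RANGE_SEARCH for the original, uncompressed Quintary tree (no COMPRESS step and no fat nodes). It rests on assumptions: records are tuples of integers, `None` plays the role of ∅, the record lists returned at d = 0 stand in for the lists of page numbers, keys are indexed from 0 so that K_{k-d} is `key[k - d]`, and the names `Node`, `build`, `exact_match`, `range_search`, `left_flag` and `right_flag` are hypothetical renderings of the pseudocode above.

```python
from dataclasses import dataclass
from typing import Any, List, Sequence, Tuple

Record = Tuple[int, ...]


@dataclass
class Node:
    val: int
    left: Any = None      # same dimension, keys < val
    midleft: Any = None   # records of LEFT, next dimension
    middle: Any = None    # records with key == val, next dimension
    midright: Any = None  # records of RIGHT, next dimension
    right: Any = None     # same dimension, keys > val


def build(records: List[Record], d: int, k: int) -> Any:
    """BUILD(d, F): a Node, or (when d == 0) the record list standing in for page numbers."""
    if d < 1:
        return list(records)
    if not records:
        return None
    j = k - d                                   # index of key K_{k-d}
    keys = sorted(r[j] for r in records)
    node = Node(keys[len(keys) // 2])           # median value r
    f_left = [r for r in records if r[j] < node.val]
    f_mid = [r for r in records if r[j] == node.val]
    f_right = [r for r in records if r[j] > node.val]
    if f_left:
        node.left = build(f_left, d, k)
        node.midleft = build(f_left, d - 1, k)  # projection: key j is ignored below
    if f_right:
        node.right = build(f_right, d, k)
        node.midright = build(f_right, d - 1, k)
    node.middle = build(f_mid, d - 1, k)
    return node


def exact_match(t: Any, d: int, q: Sequence[int]) -> List[Record]:
    """EXACT_MATCH(T, d, Q), written iteratively."""
    k = len(q)
    while t is not None and d > 0:
        if q[k - d] < t.val:
            t = t.left
        elif q[k - d] > t.val:
            t = t.right
        else:
            t, d = t.middle, d - 1
    return list(t) if t is not None else []


def range_search(t: Any, d: int, lo: Sequence[int], hi: Sequence[int],
                 left_flag: int = 0, right_flag: int = 0) -> List[Record]:
    """RANGE_SEARCH(T, d, Q, L, R) for the closed box lo[i] <= key_i <= hi[i]."""
    if t is None:
        return []
    if d == 0:
        return list(t)
    j = len(lo) - d
    if t.val < lo[j]:
        return range_search(t.right, d, lo, hi, left_flag, right_flag)
    if t.val > hi[j]:
        return range_search(t.left, d, lo, hi, left_flag, right_flag)
    found = range_search(t.middle, d - 1, lo, hi)
    if right_flag == 0:  # keys in LEFT still need their lower bound checked
        found += range_search(t.left, d, lo, hi, 1, right_flag)
    else:                # all of LEFT lies inside [lo, hi] on key j
        found += range_search(t.midleft, d - 1, lo, hi)
    if left_flag == 0:   # keys in RIGHT still need their upper bound checked
        found += range_search(t.right, d, lo, hi, left_flag, 1)
    else:                # all of RIGHT lies inside [lo, hi] on key j
        found += range_search(t.midright, d - 1, lo, hi)
    return found


recs = [(1, 5, 2), (3, 1, 4), (3, 5, 2), (2, 2, 2), (4, 4, 1), (3, 5, 2)]
tree = build(recs, 3, 3)
print(exact_match(tree, 3, (3, 5, 2)))                   # [(3, 5, 2), (3, 5, 2)]
print(sorted(range_search(tree, 3, (2, 1, 1), (4, 5, 2))))
# [(2, 2, 2), (3, 5, 2), (3, 5, 2), (4, 4, 1)]
```

Once a node splits the query interval, its LEFT branch is searched with the L flag set and its RIGHT branch with the R flag set; below that point every in-range node reports one side wholesale through MIDLEFT or MIDRIGHT instead of descending further at the same level, which is what keeps the work per level to O(log N) subintervals.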
