COMPUTER TECHNOLOGY INSTITUTE
1999
________________________________________________________________________________
TECHNICAL REPORT No. ΤR99/06/03
The LS-Quintary Tree
A Universal Multidimensional Data Structure with Linear Space
(Extended Abstract)
Georgia Panagopoulou, Spiros Sirmakessis, Athanasios Tsakalidis
(22 June 1999)
Abstract
In this paper we introduce a variation of the Quintary tree that requires linear storage space, an
improvement on the original form of the Quintary tree introduced in [11]. The Quintary tree is a
file structure for multidimensional database systems that answers all the known match queries
(exact, partial, range, partial range). The Quintary tree, in the form in which it was first
introduced by Lee and Wong, can be built in $O(N\log^k N/(k-1)!)$ time and requires similar storage,
for a file of N records each consisting of k keys. The worst-case time bounds for the search
algorithms are, respectively, $O(\log N+k)$, $O(3^{k-s}(s+\log N)+t)$, $O(\log^k N+t)$ and
$O(3^{k-s}\log^s N+t)$, where s is the number of keys specified in a partial match (or partial range)
query and t is the number of records retrieved by the query. In this paper we introduce the
LS-Quintary tree, which can be built within the same time bound but uses only $O(kN)$ space. The
time bounds for answering the queries increase only by a $\log N$ factor for every tuple in the
answer. This yields a substantial improvement in the product of the space of the tree and the time
required to answer any one of the queries, which we define here as the potential of the structure.
Moreover, our structure can answer range queries using linear space.
1. Introduction
During the last two decades a great deal of research effort has been devoted to the construction of an efficient data
structure for information retrieval ([1], [4], [8], [20], [21], [22]). Such a data structure should be able to answer any
retrieval request or search query for k-dimensional data (e.g. records with k attributes).
Let S be a data set containing N records, each of which is an ordered k-tuple $(r_0, r_1, \ldots, r_{k-1})$ of values. Each
component of the k-tuple is called an attribute or a key (see [11]). A query specifies certain conditions to be satisfied
by the keys of the records and can be classified according to the following types:
* Exact match query: specifies a value for each key.
* Partial match query: specifies s<k keys and the remaining k-s are unspecified.
* Range query: specifies, for each key $k_i$, a range (e.g. $(l_i, r_i)$).
* Partial range query: specifies a range for each of the s<k keys.
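The four query types can be illustrated with a brute-force sketch over a small list of tuples. This is not the Quintary tree search, only a reference semantics for the queries. The function names, the use of None for an unspecified key, and the choice of closed intervals for ranges are assumptions of this sketch and do not come from the report.

```python
from typing import List, Optional, Sequence, Tuple

Record = Tuple[int, ...]
Range = Tuple[int, int]  # closed interval [lo, hi]; an assumption of this sketch


def exact_match(records: List[Record], key: Sequence[int]) -> List[Record]:
    # A value is specified for every key.
    return [r for r in records if r == tuple(key)]


def partial_match(records: List[Record], key: Sequence[Optional[int]]) -> List[Record]:
    # None marks one of the k-s unspecified keys.
    return [r for r in records
            if all(v is None or v == x for v, x in zip(key, r))]


def range_query(records: List[Record], ranges: Sequence[Range]) -> List[Record]:
    # A range is specified for every key.
    return [r for r in records
            if all(lo <= x <= hi for (lo, hi), x in zip(ranges, r))]


def partial_range(records: List[Record],
                  ranges: Sequence[Optional[Range]]) -> List[Record]:
    # None marks a key for which no range is given.
    return [r for r in records
            if all(rg is None or rg[0] <= x <= rg[1] for rg, x in zip(ranges, r))]


S = [(1, 5), (1, 7), (2, 5), (3, 9)]
```

Each function scans all N records, so it costs O(kN) per query; the point of the structures discussed in this report is to answer the same queries without a full scan.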
A response to a query is realized by initiating an appropriate search procedure and retrieving all the records requested
by the query. Different data structures have been developed to support some of these queries (e.g. the k-d trees [1],
[10], Range trees [20], k-ranges [2], Multiple Attribute Trees (MAT) [8], [15], etc.). The Quintary tree was proposed
by Lee and Wong ([11]) as a data structure for building an efficient information retrieval system. Consider the data
domain as a k-dimensional space and each key as a coordinate axis. Let $d_i$ be the number of distinct values assumed
by key $k_i$ over the records of S; for every key $k_i$, $d_i \le N$. A k-dimensional Quintary tree storing N records can
support exact match queries in $O(\log N+k)$ time, partial match queries, where only s<k coordinate values are
specified, in $O(3^{k-s}(s+\log N)+t)$ time (t is the number of records satisfying the query), range queries in
$O(\log^k N+t)$ time, and partial range queries, where a range is given for only s<k coordinates, in
$O(3^{k-s}\log^s N+t)$ time, using $O(N\log^k N/(k-1)!)$ storage.
Although the Quintary tree is a well-formed data structure that answers all possible database queries, its space
requirements make it space-inefficient for most databases. In this paper we apply ideas and techniques from the
theory of persistence ([7], [13], [14]) in order to reduce the space requirements of the tree.
More precisely, we use ideas from the fat node technique, described in [7], and apply them to the Quintary tree to
eliminate the data redundancy of the original structure. This technique eliminates any duplicates of the data
stored and reduces the space to $O(kN)$. The time required to answer any of the four queries is increased by adding a
$\log N$ factor for every tuple in the answer; namely, exact match in $O(\log N+k\log N)$ time, partial match in
$O(3^{k-s}(s+\log N)+t\log N)$ time (t is the number of records satisfying the query), range queries in
$O(\log^k N+t\log N)$ time, and partial range queries in $O(3^{k-s}\log^s N+t\log N)$ time.
The contribution of this work is the significant reduction of the space required. To quantify it, we introduce the
potential of a data structure. We define the product $S_D \times T_D^Q$ to be the potential $P_D^Q$ of a data structure D
for a query Q, where $S_D$ is the space occupied by the data structure and $T_D^Q$ the time required to answer the query Q.
We will use this function to assess the significance of the results of this paper. We focus on the results for range
searching; similar results can be computed for each one of the queries. The potential of the Quintary tree in the case
of range searching is:
$$P^{Range}_{Quintary} = S_{Quintary} \times T^{Range}_{Quintary} = O\!\left(\frac{N\log^k N}{(k-1)!}\,(\log^k N + t)\right)$$
The same potential function for the LS-Quintary tree is:
$$P^{Range}_{LSQuintary} = S_{LSQuintary} \times T^{Range}_{LSQuintary} = O(kN)\cdot O(\log^k N + t\log N) = O(kN\log^k N + tkN\log N)$$
So the potential value added of the LS-Quintary tree in the case of range searching is:
$$\frac{P^{Range}_{Quintary}}{P^{Range}_{LSQuintary}} = O\!\left(\frac{N\log^k N\,(\log^k N + t)}{(k-1)!\,(kN\log^k N + tkN\log N)}\right) = O\!\left(\frac{\log^{k-1} N}{k!}\cdot\frac{\log^k N + t}{\log^{k-1} N + t}\right)$$
This is a significant improvement in the general behaviour of the data structure. Similar results can be calculated for
each one of the queries.
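As a rough numerical illustration of the potential for range searching, the sketch below evaluates the two space-time products with all hidden constants set to 1 and logarithms taken to base 2. Both choices, and the function names, are assumptions made only for this example; the report states the bounds asymptotically.

```python
from math import factorial, log2


def potential_quintary_range(N: int, k: int, t: int) -> float:
    # Space N log^k N / (k-1)! times query time log^k N + t.
    space = N * log2(N) ** k / factorial(k - 1)
    time = log2(N) ** k + t
    return space * time


def potential_ls_quintary_range(N: int, k: int, t: int) -> float:
    # Space kN times query time log^k N + t log N.
    space = k * N
    time = log2(N) ** k + t * log2(N)
    return space * time


def potential_ratio(N: int, k: int, t: int) -> float:
    # Ratio of the two potentials; values above 1 favour the LS-Quintary tree.
    return potential_quintary_range(N, k, t) / potential_ls_quintary_range(N, k, t)
```

With these unit constants the ratio equals $\log^{k-1} N\,(\log^k N + t) / (k!\,(\log^{k-1} N + t))$, so it grows with N for fixed k and t.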
The result for range searching is consistent with the lower bounds introduced by Chazelle (in [5] and [6]). Chazelle
in [5] has proved that if a data structure provides a query time of $O(t+\log^c N)$, for an arbitrary constant c, where t
is the number of points to be reported, then its size must be $\Omega(N(\log N/\log\log N)^{k-1})$. However, as stated in
the same work, if instead of $O(t+\log^c N)$ we allow a more general function for the queries, such as
$O(t\log N+\log^c N)$, Chazelle's technique breaks down completely. Therefore, $\Omega(N)$ is the only lower bound on the
storage requirement we can expect to derive. This paper presents for the first time an example of a data structure that
approaches this observation of Chazelle.
This paper is organized as follows: section 2 briefly presents the Quintary tree. The results on persistence are
summarized in section 3. The main results, that is, the LS-Quintary tree, its algorithms, and its space and time
analysis, can be found in section 4. A comparison of the LS-Quintary tree with the best-known data structures is made
in section 5.
2. The Quintary Tree
The Quintary tree was proposed by Lee and Wong ([11]) as a data structure for supporting an efficient information
retrieval system.
Consider the data domain as a k-dimensional space in which each key corresponds to a coordinate axis. Let $d_i$ be the
number of distinct values assumed by key $k_i$ over the records of S; for every key $k_i$, $d_i \le N$. The Quintary tree
that stores the N records can be viewed as a multidimensional tree with k levels. At each level we have a perfectly
balanced binary tree as a skeleton and, attached to each node, three additional subtrees, the MIDDLE, the MIDLEFT and
the MIDRIGHT, which are one level lower than the node itself. Suppose we are at the first level, namely level 0. The
skeleton tree of level 0 is a binary tree with c(0) nodes. Each node w has a value which corresponds to a hyperplane
$H_0$ denoted by the equation $H_0$=value(w). If we take the node w to represent a subset of S in k-dimensional space, then
the two sons of w represent the two subsets containing the points which lie on either side of $H_0$ (left and right
respectively). The points of the subset represented by w which lie on $H_0$ form a subset in (k-1)-dimensional space
(their 0-coordinate equals value(w)) and are stored in the MIDDLE(w) subtree attached to w. The points which lie to the
left/right of $H_0$ form two subsets in (k-1)-dimensional space (these points are projected onto $H_0$); they are stored in
the subtrees MIDLEFT(w) and MIDRIGHT(w) respectively.
In general, each node at level i represents a subset in (k-i)-dimensional space and has three subtrees storing points in
(k-i-1)-dimensional space at level i+1 and two subtrees in (k-i)-dimensional space, which lie to the left and to the
right of the hyperplane $H_i$, denoted by the equation $H_i$=nodevalue. At the (k-1)th level (one-dimensional space), the
MIDDLE, MIDLEFT and MIDRIGHT pointers point to a list of page numbers that contain the corresponding data
records. This makes the algorithms for answering the various queries more uniform and totally recursive. More details
can be found in [11]. Figure 1 shows a recursive representation of a Quintary tree.
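The recursive structure described above can be sketched in a few lines. This is an illustrative sketch, not the BUILD algorithm of [11]: splitting on the median distinct value, the field names, and the helper pages_below are assumptions introduced here, and the sample data reproduces the two-key example database of Table 1.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

Record = Tuple[Tuple[str, ...], int]    # (key tuple, page number)
Lower = Union["Node", List[int], None]  # subtree one level lower, or a page list


@dataclass
class Node:
    value: str
    left: Optional["Node"]   # skeleton subtree, same level
    right: Optional["Node"]  # skeleton subtree, same level
    middle: Lower            # records whose key equals value
    midleft: Lower           # records of the left side, one level lower
    midright: Lower          # records of the right side, one level lower


def build(records: List[Record], level: int, k: int) -> Optional[Node]:
    if not records:
        return None
    values = sorted({keys[level] for keys, _ in records})
    v = values[len(values) // 2]  # median distinct value (assumption)
    A = [r for r in records if r[0][level] < v]
    B = [r for r in records if r[0][level] == v]
    C = [r for r in records if r[0][level] > v]

    def lower(subset: List[Record]) -> Lower:
        if level == k - 1:  # last level: store page numbers
            return [page for _, page in subset]
        return build(subset, level + 1, k)

    return Node(v, build(A, level, k), build(C, level, k),
                lower(B), lower(A), lower(C))


def exact_match(node: Optional[Node], key: Tuple[str, ...]) -> List[int]:
    level = 0
    while isinstance(node, Node):
        x = key[level]
        if x < node.value:
            node = node.left
        elif x > node.value:
            node = node.right
        else:
            below = node.middle
            if isinstance(below, list):
                return below
            node, level = below, level + 1
    return []


def pages_below(sub: Lower) -> List[int]:
    # All page numbers reachable through the skeleton and MIDDLE pointers.
    if sub is None:
        return []
    if isinstance(sub, list):
        return sorted(sub)
    return sorted(pages_below(sub.middle) + pages_below(sub.left)
                  + pages_below(sub.right))


TABLE_1 = [((a, b), page) for page, (a, b)
           in enumerate(zip("abcdefghijklmno", "xxxyyxzzxyzxyzx"), start=1)]
tree = build(TABLE_1, 0, 2)
```

In this sketch every record below a node is stored again in that node's MIDLEFT or MIDRIGHT subtree, which is the redundancy behind the superlinear storage of the original structure.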
[Figure 1: Recursive representation of a Quintary tree. In the general case S(N,k), the root r has skeleton subtrees
LEFT (A) and RIGHT (C) and lower-level subtrees MIDLEFT (A'), MIDDLE (B) and MIDRIGHT (C'), with |A|+|B|+|C|=N,
|A|=|A'| and |C|=|C'|. In the base case S(N,1), A' and C' are the lists of page numbers that contain the records in A
and C respectively, and B is the page number that contains the record r.]
Lee and Wong have proved the following theorem:
Theorem 1: A k-dimensional Quintary tree storing N records can support:
* an exact match query in $O(\log N+k)$ time,
* a partial match query, where only s<k coordinate values are specified, in $O(3^{k-s}(s+\log N)+t)$ time (t is the
number of records satisfying the query),
* a range query in $O(\log^k N+t)$ time,
* a partial range query, where a range is given for only s<k coordinates, in $O(3^{k-s}\log^s N+t)$ time,
* requires $O(N\log^k N/(k-1)!)$ storage and
* can be built in $O(N\log^k N/(k-1)!)$ time.
The latter two bounds can be reduced by a $\log N$ factor if we do not use the MIDLEFT and MIDRIGHT subtrees for
the nodes at the (k-1)th level and slightly change the algorithms given by Lee and Wong. Figure 2 presents an example of
a Quintary tree for the database shown in Table 1.
A    B    Page No
a    x    1
b    x    2
c    x    3
d    y    4
e    y    5
f    x    6
g    z    7
h    z    8
i    x    9
j    y    10
k    z    11
l    x    12
m    y    13
n    z    14
o    x    15
Table 1. The tuples of an example database.
Figure 2. The Quintary tree for the database shown.
The above time complexities suggest that the Quintary tree can answer partial match queries more efficiently than
other known structures, at the cost of extra storage.
3. Persistence
A data structure is said to be ephemeral if any changes to the structure destroy the old version. Accesses and updates
can be done only on the current version. We call a data structure persistent if it supports accesses to multiple versions.
The structure is partially persistent if all versions can be accessed but only the newest version can be modified and
fully persistent if every version can be both accessed and modified ([7], [13], [14]).
Driscoll et al. in [7] give efficient methods for transforming a pointer-based ephemeral data structure into one that is
partially or fully persistent in a way that satisfies ideal resource bounds: a constant factor in query time over that of
the ephemeral structure and a constant amount of space per change in the ephemeral structure. More precisely, they
showed that if an ephemeral structure has nodes of constant bounded in-degree, then the structure can be made
partially persistent at an amortized (see footnote 1) space cost of O(1) per update step and a constant factor in the
amortized time per operation. Two methods are introduced in [7]; the first and simpler is the fat node method, which
applies to any ephemeral linked structure and makes it partially persistent at a worst-case space cost of O(1) per
update step and a worst-case time cost of O(log m) per access or update step (m is the total number of update
operations). The idea is to
record all changes made to node fields in the nodes themselves, without erasing old values of the fields. This requires
that we allow nodes to become arbitrarily “fat”, i.e., to hold an arbitrary number of values of each field. To be more
precise, each fat node will contain the same information and pointer fields as an ephemeral node, along with space for
an arbitrary number of extra field values. Each extra field has an associated field name and a version stamp. The
version stamp indicates the version in which the named field was changed and assigned to the specified value. In
addition, each fat node has its own version stamp, indicating the version in which the node was created. The resulting
persistent structure has all the versions of the ephemeral structure embedded in it. We navigate through the persistent
structure as follows: When an ephemeral access step applied to version i accesses field f of a node, we access the
value in the corresponding fat node whose field name is f, choosing among several such values the one with maximum
version stamp no greater than i.
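The access rule of the fat node method can be sketched as follows. This is a minimal illustration for the partially persistent case, not the implementation of [7]; the class name FatNode, its methods, and the use of sorted lists with binary search are assumptions of this sketch.

```python
import bisect
from typing import Any, Dict, List


class FatNode:
    """A node that keeps every value ever written to each of its fields."""

    def __init__(self) -> None:
        # field name -> ascending version stamps and the values written at them
        self._stamps: Dict[str, List[int]] = {}
        self._values: Dict[str, List[Any]] = {}

    def write(self, field: str, version: int, value: Any) -> None:
        stamps = self._stamps.setdefault(field, [])
        values = self._values.setdefault(field, [])
        if stamps and version < stamps[-1]:
            # Partial persistence: only the newest version may be modified.
            raise ValueError("cannot modify an old version")
        if stamps and stamps[-1] == version:
            values[-1] = value  # overwrite within the same version
        else:
            stamps.append(version)  # old values are never erased
            values.append(value)

    def read(self, field: str, version: int) -> Any:
        stamps = self._stamps.get(field, [])
        # Choose the value with the maximum version stamp no greater than `version`.
        i = bisect.bisect_right(stamps, version)
        if i == 0:
            raise KeyError(f"field {field!r} has no value at version {version}")
        return self._values[field][i - 1]
```

The binary search over at most m stamps per field corresponds to the O(log m) access cost quoted above, and each write appends one entry, which corresponds to the O(1) space per update step.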
The second method, the node-copying method, allows nodes in the persistent structure to hold only a fixed number of
field values. When we run out of space in a node, we create a new copy of the node, containing only the newest value
of each field. We must also store pointers to the new copy in all predecessors of the copied node in the newest version.
If there is no space in a predecessor for such a pointer, the predecessor, too, must be copied. Nevertheless, if the
underlying ephemeral structure has nodes of constant bounded in-degree then we can derive an O(1) amortized bound
on the number of nodes copied and the time required per update step.
1 By amortized cost we mean the cost of an operation averaged over a worst-case sequence of
operations. See the survey paper of Tarjan [17].
Moreover, using more powerful techniques applied to the fat node approach, we can make an ephemeral linked
structure fully persistent at a worst-case space cost of O(1) per update step and an O(logm) worst-case time cost per
access or update step. In this case, the versions of a fully persistent structure are only partially ordered whereas the
various versions of a partially persistent structure have a natural linear ordering. This partial ordering is defined by a
rooted version tree, whose nodes are the versions (0 through m) with version i the parent of version j if version j is
obtained by updating version i. The sequence of updates giving rise to version i corresponds to the path in the version
tree from the root to i.
With a variant of the node-copying method called node splitting, we can make an ephemeral linked structure of
constant bounded in-degree fully persistent at an O(1) amortized time and space cost per update step and an O(1)
worst-case time per access step. The major difference between node splitting and node copying is that in the former,
when a node overflows, a new copy is created and roughly half the extra pointers are moved from the old copy to the
new one, thereby leaving space in both the old and new copies for later updates. More details for the method are
presented in [7]. Combining the fat node method with the delayed updating technique of Tsakalidis ([18], [19]) we can
derive a fully persistent form of a balanced search tree with the same time and space bounds as in the partially
persistent case, although the O(log n) bound for insertion and deletion is amortized rather than worst-case.
4. The LS-Quintary Tree
4.1. Analysis of the basic idea
The Quintary tree has been designed to answer each one of the four match queries in a uniform way. The skeleton tree
and the MIDDLE subtrees provide the structure with all the information needed to answer exact match queries. The
MIDLEFT and MIDRIGHT trees are used whenever a key is not specified. Their purpose is to provide the
search algorithm with all the information needed to answer a partial, range or partial range query. The study of the
Quintary tree (for an example refer to Figure 2) has led to the observation that the tuples stored in the MIDLEFT
(MIDRIGHT) subtree of the root node of the structure are the tuples stored in the MIDLEFT, MIDDLE and MIDRIGHT
subtrees of every node in the skeleton tree in the LEFT (RIGHT) subtree of the root node. This is done in order to
have the information located at a level d of the tree also stored at an upper level, so that we can search for it when
the respective key is unspecified. So the following lemma can be proved:
Lemma 1: The tuples contained in the MIDLEFT (MIDRIGHT) tree of a node r at a level d of a Quintary tree are the
tuples contained in the MIDLEFT, MIDDLE and MIDRIGHT subtrees of the LEFT (RIGHT) son of r.
Proof:
The MIDLEFT tree of a node r at level d is obtained by projecting the tuples with value x<r in dimension d
(this follows immediately from the definition of the Quintary tree) onto the hyperplane $H_{k-d}$=r (see Figure 3).
These tuples are the tuples having the value leftson(r) in dimension d [these are the tuples stored in the MIDDLE
tree of leftson(r)], plus the tuples with value y<leftson(r) in dimension d, plus the tuples with value
z>leftson(r) (and z<r). All of these are projected onto the same hyperplane. By the definition of the Quintary tree,
the last two sets are the tuples stored in MIDLEFT(leftson(r)) and MIDRIGHT(leftson(r)) respectively.
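The set identity behind Lemma 1 can be checked on a small example. The sketch below works on plain sets of tuples and ignores the projection onto the hyperplane; the split on the median distinct value and the function names are assumptions made for illustration only.

```python
from typing import Set, Tuple

Tup = Tuple[int, ...]


def split(records: Set[Tup], dim: int):
    # Partition the records around the median distinct value in dimension `dim`.
    values = sorted({r[dim] for r in records})
    v = values[len(values) // 2]
    below = {r for r in records if r[dim] < v}
    on = {r for r in records if r[dim] == v}
    above = {r for r in records if r[dim] > v}
    return v, below, on, above


def lemma1_holds(records: Set[Tup], dim: int = 0) -> bool:
    # Tuples stored in MIDLEFT(r): everything strictly below value(r).
    _, midleft_of_r, _, _ = split(records, dim)
    if not midleft_of_r:
        return True
    # leftson(r) is the skeleton root over those same tuples.
    _, midleft, middle, midright = split(midleft_of_r, dim)
    return midleft | middle | midright == midleft_of_r


grid = {(a, b) for a in range(8) for b in range(3)}
```

The check succeeds because the three subtrees of the left son partition exactly the tuples that lie to the left of r.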
[Figure 3. A snapshot of a Quintary tree, showing a node r, its left son leftson(r), and their MIDLEFT, MIDDLE and
MIDRIGHT subtrees.]

Theorem 2: The tuples that are stored in the MIDLEFT (MIDRIGHT) subtree of the root node of a Quintary tree are
the tuples stored in the MIDLEFT, MIDDLE and MIDRIGHT subtrees of every node in the skeleton tree in the LEFT
(RIGHT) subtree of the root node of the tree.
Proof:
The proof follows immediately by applying Lemma 1 to every node of the LEFT (RIGHT) subtree of the root
node of a Quintary tree, recursively from the bottom up to the top of the tree. □
Using this observation, we looked for a way of deleting the repeated tuples from the bottom of the tree and so reducing
the redundancy that makes the tree space-inefficient. Because the tree should answer the four queries in a
uniform way, this information cannot simply be omitted; it should be kept in the MIDLEFT and MIDRIGHT subtrees
of the root of the tree. The same problem of redundancy appears recursively in each one of the MIDLEFT,
MIDRIGHT and MIDDLE subtrees of the root, since they are Quintary trees of dimension k-1, where k is the
dimension of the Quintary tree. This applies recursively to every node in the tree.
In every dimension there exist a skeleton tree and the MIDLEFT, MIDDLE and MIDRIGHT subtrees of the root of the
skeleton tree. Every other pointer of every node points to a node in the subtrees of the root. The nodes that lie in the
LEFT (RIGHT) subtree of the root direct their MIDLEFT, MIDDLE and MIDRIGHT pointers to a node in the
MIDLEFT (MIDRIGHT) subtree of the root. A schematic presentation of this can be seen in Figure 4. This procedure
applies to each one of these trees; that is, since they are Quintary trees of dimension k-1, only a skeleton tree and
only three subtrees of dimension k-1, the MIDLEFT, MIDDLE and MIDRIGHT of the root, exist. But how can the
redundancy of the original Quintary tree be eliminated?
[Figure 4. Schematic presentation of the transformation: the MIDLEFT, MIDDLE and MIDRIGHT pointers of a node r in the
LEFT (RIGHT) subtree of the root lead into the MIDLEFT (MIDRIGHT) subtree of the root.]
The idea is to follow the MIDLEFT (MIDRIGHT) subtree of the root node of the Quintary tree (in postorder
traversal) and mark the occurrence of any information stored in the MIDLEFT, MIDDLE and MIDRIGHT subtrees of
any node in the LEFT (RIGHT) subtree of the root of the tree. The same technique is applied to each one of the
MIDLEFT, MIDRIGHT and MIDDLE subtrees of the root. This is a revised idea of the fat node method in the theory
of persistence ([7]). The difference from the work of Driscoll et al. lies in the fact that they store extra information in
both nodes and pointers of their structures (keeping the space linear) while in our structure extra information
corresponds only to pointers.
Our technique requires that we allow nodes to become arbitrarily “fat”, i.e., to hold an arbitrary number of pointer
fields. To be more precise, each fat node will contain the same information and pointer fields as the original Quintary
tree along with space for an arbitrary number of extra field values. Each field is a mark value. This mark value
indicates the path in the revised subtree of the Quintary tree that we should follow, using its pointers, to derive the
original form of the Quintary tree. The leaves of the tree, where the page numbers are located, are augmented by a list
of mark values. These mark values are the paths where the leaves are valid. They represent the path that we should
follow to get to that leaf.
We simulate the operation of the original Quintary tree by a proper use of the mark value attached to the access
pointer of each node. This procedure is similar to the procedure followed in the work of
Driscoll et al. in [7]. Whenever we want to follow a pointer to a MIDLEFT, MIDDLE or MIDRIGHT subtree of a
node in the tree, we read the mark value that characterizes the pointer to that tree; this will be our current mark.
Instead of following the corresponding tree in that node (which no longer exists), we follow the MIDLEFT or
MIDRIGHT pointer of the root (depending on whether we are following the LEFT or the RIGHT subtree of the root
node of the structure) by choosing the pointer with its mark value equal to the current mark. A new mark number can
be read when we move to a new MIDLEFT, MIDDLE or MIDRIGHT subtree, and the same procedure applies. This
idea is slightly different from the idea in [7], since the nodes do not contain their own marks: in our case a node
always exists on the path whenever there is an access pointer to it with its mark equal to the current mark (see
footnote 2).
The marks are generated from the build algorithm and for simplicity are integers. They start from the value of 1 and
are increased each time we follow a new pointer. Instead of integers, a combination of bits or colors can be used to
represent these marks. The value of 1 is assigned to the MIDLEFT pointer of the leftmost child of the left subtree of
the root node. The Quintary tree derived from this technique is presented in figure 5 for the case of the original
Quintary tree of figure 2. For simplicity reasons, only the left subtree of the root is shown on the figure.
Let us present an example of the simulation of a search in the LS-Quintary tree. Assume that we need to find the page
number where the tuple with key (b,x) is stored; that is page number 2 (refer to the example in figure 2). We traverse
the left path from the root to b on the basic-skeleton tree (Figure 5). In order to locate the key with value x, we use the
MIDDLE tree on node b by following the MIDDLE pointer. From this operation the current mark is read; that is 5.
The pointer 5 of the second tree gives us access to the second level of the structure where x is located. Using the same
mark, we follow the MIDDLE pointer on node x to the list of page numbers. There we report the page number having
in its list of marks the current mark; that is page number 2.
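The mark-driven lookup of this example can be imitated with a toy structure. Everything in the sketch below is hypothetical except the two facts taken from the example: the current mark read at node b is 5, and page number 2 carries mark 5. The class names, the other mark values (4 and 6) and the layout of the shared node are made up for illustration and do not reproduce Figure 5.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PageList:
    # page number -> marks (paths) under which that page is valid
    marks_by_page: Dict[int, List[int]]


@dataclass
class SharedNode:
    value: str
    middle: PageList


@dataclass
class FatPointer:
    # one pointer field of the root, holding a target per mark value
    targets: Dict[int, SharedNode]

    def follow(self, mark: int) -> Optional[SharedNode]:
        return self.targets.get(mark)


def report(pages: PageList, mark: int) -> List[int]:
    # Report only the pages whose list of marks contains the current mark.
    return [page for page, marks in pages.marks_by_page.items() if mark in marks]


# Hypothetical second-level node for key value x, shared by three paths.
x_node = SharedNode("x", PageList({1: [4], 2: [5], 3: [6]}))
root_midleft = FatPointer({4: x_node, 5: x_node, 6: x_node})


def search(current_mark: int) -> List[int]:
    node = root_midleft.follow(current_mark)
    return report(node.middle, current_mark) if node else []
```

Calling search(5) follows the pointer stamped with mark 5 and filters the shared page list by the same mark, so the single stored copy of node x serves all three paths.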
2 Using Lemma 1 for every node r such that MIDLEFT(LEFTSON(r))=MIDRIGHT(LEFTSON(r))=∅
(MIDLEFT(RIGHTSON(r))=MIDRIGHT(RIGHTSON(r))=∅) (that is, for every node at height logN-1), we
can further reduce the data stored in the LS-Quintary tree, since the MIDLEFT (MIDRIGHT)
subtree of node r is exactly the same as the MIDDLE subtree of LEFTSON(r) (RIGHTSON(r)).
Figure 5. The LS-Quintary tree for the Quintary tree of figure 2.
At this point a few remarks on the definition and the behaviour of the LS-Quintary tree should be added for a better
understanding of the build algorithm and the performance of the tree.
1. Only the distinct key values in every dimension are stored; these key values are stored only once.
2. Mark values are generated only in the first level of the tree and they are propagated to the next levels. No
additional mark values are generated in the subsequent levels.
4.2. Build algorithms
The LS-Quintary tree is built using a two step procedure. In the first step, we follow the same procedure as in the
original Quintary tree. An extended description of the first algorithm can be found in [11]. In this way an original
Quintary tree is built. In the second step, a compression algorithm is used to reduce the space of the tree. This
algorithm traverses the tree bottom-up and scans the MIDLEFT, MIDDLE and MIDRIGHT tree of each node in the
skeleton tree. For every node in these trees, we search in the MIDLEFT or MIDRIGHT subtree of the root to locate
the same information. Whenever we find the same node, we stamp the access pointer to that node in the MIDLEFT or
MIDRIGHT subtree of the root of the tree with a mark. The access pointer gets a mark number that is increased
whenever we move to a new subtree. In this way the LS-Quintary tree is created. Due to the space limit of the paper
the algorithms are presented in Appendix I at the end of this work using a pseudo-language.
4.3. Space and time analysis
Theorem 3. An LS-Quintary tree for a file of N records in k-dimensional space can be built in $O(N\log^k N/(k-1)!)$ time
and uses $O(kN)$ space.
Proof:
The building process involves two distinct steps: step 1, the construction of the original Quintary tree, and step 2, the
compression of the tree that results in the LS-Quintary tree.
The time needed to build the original Quintary tree in step 1 can be obtained as follows. Suppose that all key values
are distinct. If we let P(N,k) denote the time required to build the tree for a file of N records in k-dimensional space,
then we have the following recurrence relation:
P(1,k)=O(1)
P(N,1)=O(N log N)
P(N,k)=2P(N/2, k) + 2P(N/2, k-1) + O(N)
where the term O(N) is due to steps 3-5 of the BUILD algorithm. The solution for P(N,k) is $O(N\log^k N)$. As stated in
[11], M. L. Monier has obtained a better solution, $O(N\log^k N/(k-1)!)$, to the recurrence relation (see [12]).
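The build-time recurrence can be evaluated numerically to get a feel for its growth. In the sketch below every O(.) term is replaced by its argument with constant 1, logarithms are base 2, and N is taken to be a power of two; these are assumptions of the illustration, not statements from [11] or [12].

```python
from functools import lru_cache
from math import factorial, log2


@lru_cache(maxsize=None)
def P(N: int, k: int) -> float:
    # P(1,k) = 1, P(N,1) = N log N, P(N,k) = 2P(N/2,k) + 2P(N/2,k-1) + N
    if N <= 1:
        return 1.0
    if k == 1:
        return N * log2(N)
    return 2 * P(N // 2, k) + 2 * P(N // 2, k - 1) + N


def monier_bound(N: int, k: int) -> float:
    # N log^k N / (k-1)!, the improved solution quoted in the text.
    return N * log2(N) ** k / factorial(k - 1)
```

For k = 2 the recurrence unrolls to N(2 + L(L+1)/2) with L = log N under these unit constants, which stays below N log^2 N for the sizes tried here.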
In step 2, we traverse each one of the nodes of the Quintary tree and mark the occurrence of the information in the
MIDLEFT, MIDDLE and MIDRIGHT subtrees onto the MIDLEFT or MIDRIGHT subtree of the root node of the
node. The number of node visits is bounded by $O(3^k\log N+N)$. So, the total build time is bounded by
$O(N\log^k N/(k-1)!)$.
The total amount of storage required can be derived from the space reduction procedure of the Quintary tree. First we
will prove the following lemma.
Lemma 2. A Quintary tree for a file of N records in k-dimensional space uses $O(N\log^k N/(k-1)!)$ space.
Proof:
Let S(N,k) denote the space used by a Quintary tree for a file of N records in k-dimensional space. Then we have the
following recurrence relation:
S(1,k)=O(k)
S(N,1)=O(N log N)
S(N,k)=2S(N/2, k) + 2S(N/2, k-1) + O(N)
whose solution is $O(N\log^k N)$.
As stated in [11], M. L. Monier has obtained a better solution, $O(N\log^k N/(k-1)!)$, to the recurrence relation (see [12]).
□
As stated in [11], a record is represented $O(N\log^k N)$ times. Using the COMPRESS algorithm, every one
of these occurrences is eliminated and the tuple is stored only once.
We will prove and use the following lemma:
Lemma 3. The space occupied by the LS-Quintary tree depends on the number of distinct key values in every
dimension of the tree.
Proof:
Assume that a key requires O(1) space for storage. The total space needed for the LS-Quintary tree depends on the
space occupied by the key values and that of the additional mark values. As stated in [11], the information stored
in the pages is not counted, since it is not part of the index structure.
Assume that we have to store a file F of N records in k-dimensional space and that each dimension i has $d_i$ distinct
values. Based on the definition of the LS-Quintary tree, each distinct value, in any dimension, is stored only once.
Therefore, the space for the key values in each dimension i is proportional to the number of distinct values, namely
$O(d_i)$, for $i \le k$. The total space $S_{key\_values}$ for the storage of the key values over all dimensions is:
$$S_{key\_values} = O(d_1) + O(d_2) + \cdots + O(d_k)$$
For simplicity and with no loss of generality, assume that $d_1, d_2, \ldots, d_k \le d$. In this case the space
$S_{key\_values}$ needed for the key values is bounded by kd and the following equation applies:
$$S_{key\_values} = O(kd) \qquad (1)$$
Now, we will calculate the space overhead caused by the mark values. Mark values are only generated by the
COMPRESS_LEFT (COMPRESS_RIGHT) algorithm. The numbers are generated by the execution of these
algorithms in the first dimension and are propagated to the next dimensions. That means that no new numbers are
generated in the other dimensions of the LS-Quintary tree. For d distinct values in the first dimension we have at most
$\lambda$ mark values. The number $\lambda$ is calculated as follows:
In any balanced tree with d keys, we have $\lceil d/2 \rceil$ leaves and $\lfloor d/2 \rfloor$ internal nodes. In the
LS-Quintary tree, we have 1 mark value generated by each leaf and 3 mark values by each internal node. Therefore:
$$\lambda = \left\lceil \frac{d}{2} \right\rceil + 3\left\lfloor \frac{d}{2} \right\rfloor \le 2d = O(d) \qquad (2)$$
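The bound in (2) is easy to check directly. The short sketch below counts one mark value per leaf and three per internal node, with the leaf and internal-node counts taken from the text; the function name is an assumption of the example.

```python
def mark_values(d: int) -> int:
    # ceil(d/2) leaves contribute 1 mark each, floor(d/2) internal nodes 3 each.
    leaves = (d + 1) // 2
    internal = d // 2
    return leaves + 3 * internal
```

For even d the count is exactly 2d, and for odd d it is 2d-1, so the number of mark values never exceeds 2d.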
That means that the space for the mark values depends on the number d of distinct key values and Lemma 3 holds.
□
In order to calculate the space of the LS-Quintary tree, we first examine separately one particular case, which
corresponds to the worst case of the original Quintary tree: the file has N records and is composed of all possible
combinations of the distinct values in each dimension. In this case the original Quintary tree occupies
$O(N\log^k N)$ space.
In the case of the LS-Quintary tree, since the file contains all the possible combinations of the d distinct values, we
have:
$$N = d^k \;\Rightarrow\; d = N^{1/k} \qquad (3)$$
We define p(j) as the probability that a pointer contains exactly j mark values. Assume that any pointer in the tree in
dimension i, with $2 \le i \le k$, contains on average $\overline{MV}$ mark values. Since we examine only the left tree,
a pointer contains at most d mark values, as can be derived from (2) and Figure 5.
Based on these observations, we have:
$\overline{MV} = p(1)\cdot 1 + p(2)\cdot 2 + \dots + p(d)\cdot d$    (4)
It holds that:
$p(1) + p(2) + \dots + p(d) = 1$    (5)
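From the definition of the LS-Quintary tree and the remarks of section 4.1 we can claim, with no loss of generality,
that:
$p(1) = p(2) = \dots = p(d) = p$    (6)
Substituting (6) into (5):
$p + p + \dots + p = 1 \Rightarrow dp = 1 \Rightarrow p = \frac{1}{d}$    (7)
Therefore, using (7):
$\overline{MV} = p + 2p + \dots + dp = p(1 + 2 + \dots + d) = \frac{1}{d}(1 + 2 + \dots + d) = \frac{1}{d}\cdot\frac{d(d+1)}{2} = \frac{d+1}{2}$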

As can be seen from this equation, the worst case of the Quintary tree turns into a case where mark values are
distributed uniformly from the second to the last level of the LS-Quintary tree. Based on this calculation we can
claim that the worst case for the number of mark values per pointer is O(d). Making the same calculation for the whole tree, for
all the pointers, we have at most $O(d^2)$ additional mark values.
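The average number of mark values per pointer under the uniform assumption p(j) = 1/d can be reproduced with exact rational arithmetic. This is only an illustrative check of the closed form (d+1)/2; the function name is hypothetical.

```python
from fractions import Fraction


def average_mark_values(d: int) -> Fraction:
    """Expected number of mark values per pointer when p(1) = ... = p(d) = 1/d."""
    p = Fraction(1, d)
    return sum((p * j for j in range(1, d + 1)), Fraction(0))


# The sum p*(1 + 2 + ... + d) collapses to (d + 1) / 2 for every d checked.
closed_form_holds = all(
    average_mark_values(d) == Fraction(d + 1, 2) for d in range(1, 200)
)
```

Since the average grows linearly in d, summing over O(d) pointers gives the quadratic bound on additional mark values stated above.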
The total space $S_{LS\_Quintary}$ for the tree is, using (3):
$S_{LS\_Quintary} = S_{key\_values} + S_{mark\_values} = O(kd) + O(kd^2) = O[k(N^{1/k} + N^{2/k})]$    (8)
From equation (8) it can be derived that the space performance of the LS-Quintary tree depends not on N (as in the
original Quintary tree) but on the ratio N/d. For the worst case of the original Quintary tree, the LS-Quintary tree
has very good space performance: the space in this case is bounded by O(kN). The performance degrades
when $d \approx N$, but in this case, despite the fact that more mark values are generated from the first dimension of the
tree, these values appear fewer times in the other dimensions, since the records do not repeat each other (this case is the
same as in [7]). It remains to prove that in any case the space is bounded by O(kN). A formal proof of this claim
follows.
For the general case, in order to calculate the total space overhead caused by the mark values, we examine each
level of the LS-Quintary tree separately.
In the first level, the number of mark values is $O(d_1)$, where $d_1$ is the size of the first level, since at
most three mark values are generated at each node of the tree.
For the calculation of the overhead caused by the mark values at the second level of the LS-Quintary tree, we
distinguish three cases, based on the nature of the problem and the relation between the numbers of distinct
keys of the first and second level. Namely:
Case 1. Both levels contain many distinct keys.
Let $d_1$ and $d_2$ be the numbers of distinct key values in the first and second level respectively. This case
corresponds to the previously examined case and the proof is the same.
Case 2. The first level contains many distinct keys, while the second level is composed of few distinct keys.
In this case assume that $d_1 \approx N$ and $d_2 \ll N$.
The space overhead $S_2$ generated by the mark values of the second level depends on the number of pointers P
of this level and the number of mark values $\overline{MV}$ that each pointer contains. Namely:
$S_2 = P \cdot \overline{MV}$
From the definition of the tree it follows that $P = O(d_2)$. Moreover, in the worst case, each pointer contains at most
$d_1$ mark values, so we have that $\overline{MV} = O(d_1)$. Therefore:
$S_2 = O(d_1 d_2)$
We have that $d_2 \ll N \Rightarrow d_2 \le \varepsilon$ (small constant) and therefore (since $d_1 \approx N$):
$S_2 = O(\varepsilon d_1) \Rightarrow S_2 = O(N)$
Case 3. The first level contains very few distinct keys, while the second level is composed of many distinct keys.
In this case assume that $d_1 \ll N$ and $d_2 \approx N$.
The space overhead $S_2$ is again:
$S_2 = P \cdot \overline{MV}$
From the definition of the tree it follows that $P = O(d_2)$. Moreover, in the worst case, each pointer contains $d_1$
mark values, so we have that $\overline{MV} = O(d_1)$. Therefore:
$S_2 = O(d_1 d_2)$
We have that $d_1 \ll N \Rightarrow d_1 \le \varepsilon$ (small constant) and therefore (since $d_2 \approx N$):
$S_2 = O(\varepsilon d_2) \Rightarrow S_2 = O(N)$
As can be derived, in every case the space overhead generated by the mark values of the second level of the
LS-Quintary tree is O(N), and since the space for the keys is also bounded by O(N), the total space of the level is also O(N).
In the levels of the tree beyond the second one, the total space of each level is also O(N), since no additional mark
values are generated (other than those generated at the first level) and the trees of these levels, in their
relation to the previous level, always fall into one of the three cases described above.
In conclusion, the whole LS-Quintary tree consists of k different levels, and each level is
stored in a tree of size O(N). Therefore:
$S_{LS\_Quintary} = S_{key\_values} + S_{mark\_values} = O(kN) + O(kN) = O(kN)$
From the above discussion it is concluded that Theorem 3 applies.

Remark. It remains to determine the search time overhead incurred by reducing the space of the original Quintary tree. The basic
skeleton has not been changed (in a way that affects its traversal), so no access time overhead arises for any
operation on the skeleton. Whenever an access pointer to a fat node of the LS-Quintary tree has to be followed, the
algorithm has to decide which one of the corresponding pointers should be chosen. This takes more than constant time.
The choice is made using the marks stored in the fat node. These values can be stored using their physical order in a
binary search tree; then O(log m) time is needed to choose any pointer, where m is the number of times that we have to
access the MIDLEFT or MIDRIGHT subtree of the root to locate a tuple. This time can be bounded in the worst case
by O(logN). The same organization can also be applied to the leaves (alternatively, we can use the already normalized
order of the marks and create an array of access pointers with an entry for each leaf; these access pointers can then be
retrieved in constant time ([3])). Using this representation, we have a logN amortized cost per answer reported. For an
extended mathematical proof, please refer to [7].
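The pointer selection inside a fat node can be illustrated as follows. This is a minimal sketch, not the report's implementation: it swaps the binary search tree for a sorted array searched with Python's standard `bisect` module, which gives the same logarithmic lookup for a static set of marks. The class name `FatNode` and its methods are hypothetical.

```python
from bisect import bisect_left


class FatNode:
    """Holds several (mark, pointer) pairs; one pointer is chosen per mark."""

    def __init__(self, entries):
        # entries: iterable of (mark, target) pairs, marks assumed distinct.
        pairs = sorted(entries, key=lambda e: e[0])
        self.marks = [m for m, _ in pairs]
        self.targets = [t for _, t in pairs]

    def select(self, mark):
        """Return the pointer stamped with `mark`, or None; O(log m) comparisons."""
        i = bisect_left(self.marks, mark)
        if i < len(self.marks) and self.marks[i] == mark:
            return self.targets[i]
        return None


node = FatNode([(7, "subtree C"), (2, "subtree A"), (5, "subtree B")])
chosen = node.select(5)
```

With m marks in the node, each lookup performs O(log m) comparisons, which is the source of the additional logN factor per reported answer.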
4.4. Search algorithms
The search algorithms that handle the four types of query, and the analysis of their worst-case performance, are
presented in detail in Appendix II.
5. Comparison with other multidimensional data structures
The following table summarizes the complexity bounds of LS-Quintary trees, the k-d trees ([1], [10]), Range trees
([20]), k-ranges ([2]) and Multiple Attribute Trees (MAT) ([8], [15]).
| | Quintary Tree | LS-Quintary Tree | k-d Trees | Range Tree | Nonoverlapping k-ranges | Overlapping k-ranges | MAT |
|---|---|---|---|---|---|---|---|
| Space | $O(N\log^{k-1}N/(k-1)!)$ | $O(kN)$ | $O(N)$ | $O(N\log^{k-1}N)$ | $O(N)$ | $O(N^{1+\varepsilon})$, $\varepsilon>0$ | $O(N)$ |
| Build Time | $O(N\log^{k-1}N/(k-1)!)$ | $O(N\log^{k-1}N/(k-1)!)$ | $O(N\log N)$ | $O(N\log^{k-1}N)$ | $O(N\log N)$ | $O(N^{1+\varepsilon})$, $\varepsilon>0$ | $O(N\log N)$ |
| Exact Match | $O(\log N+k)$ | $O(\log N+k\log N)$ | - | - | - | - | $O(kN^{1/k}+t)$ |
| Partial Match | $O(3^{k-s}(s+\log N)+t)$ | $O(3^{k-s}(s+\log N)\log N+t\log N)$ | $O(N^{(k-s)/k}+t)$ | - | - | - | $O(sN^{1-(s-1)/k}+t)$ |
| Range Match | $O(\log^k N+t)$ | $O(\log^k N+t\log N)$ | $O(kN^{1-1/k}+t)$ | $O(\log^k N+t)$ | $O(N^{\varepsilon}+t)$, $\varepsilon>0$ | $O(\log N+t)$ | $O(N^{1-1/k}\log(N^{1/k})+t)$ |
| Partial Range | $O(3^{k-s}\log^s N+t)$ | $O(3^{k-s}\log^{s+1}N+t\log N)$ | $O(sN^{1-1/k}+t)$ | - | - | - | - |
A dash means that the respective data structure cannot efficiently support this kind of query.
6. Conclusion
As can easily be seen, the Quintary tree is a well-formed structure used mainly for static files. As Lee and Wong
stated in [11], the tree is inadequate for too many updates, since they may destroy the properties of the structure and
affect its performance. However, for completeness, Lee and Wong outlined algorithms for insertion and deletion. The
same algorithms apply to LS-Quintary trees. The only difference is the time required for insertion: the original
Quintary tree requires $O(k+(\log N)^k)$ time, and this bound is multiplied by a logN factor for LS-Quintary trees.
However, this is not a robust insert/delete procedure, since too many updates destroy the whole tree. It therefore remains an
open problem to derive a variant of the LS-Quintary tree that retains all its original properties and is also
dynamic. We believe that a more powerful construction algorithm for the data structure is needed for this. Our future
work focuses mainly on the derivation of a simple and powerful construction algorithm and secondly on the
dynamization of the structure.
References
[1] Bentley, L., "Multidimensional Binary Search Trees used for Associative Searching", Communications of the
ACM, 18, (1975), pp. 509-517.
[2] Bentley, L. and H.A. Maurer, "Efficient worst-case data structures for range searching", Acta Informatica, 13,
(1980), pp. 155-168.
[3] Boronjerdi, A.R. and B.M.E. Moret, "Persistence in Computational Geometry", to appear in the Proceedings
of the Seventh Canadian Conference on Computational Geometry, (1995).
[4] Cardenas, F. and J. P. Sagamang, "Doubly-chained Tree Database Organization - Analysis and Design
Strategies", Comput. J., 20, (1977), pp. 15-26.
[5] Chazelle, B., "Lower bounds for orthogonal range searching: I. The reporting Case", Journal of the ACM, 37,
2, (1990), pp. 200-212.
[6] Chazelle, B., "Lower bounds for orthogonal range searching: II. The arithmetic model", Journal of the ACM,
37, 3, (1990), pp. 439-463.
[7] Driscoll, J., Sarnak, N., Sleator, D. and R. Tarjan, "Making data structures persistent", Journal of Computer
and System Sciences, 38, (1989), pp. 86-124.
[8] Kashyap, L., Subas, S.K.C and S.B. Yao, "Analysis of the Multi-Attribute-Tree Database organization", IEEE
Trans. Software Eng., SE-6, (1977), pp. 451-467.
[9] Knuth, D. E., "Big omicron, big omega and big theta", SIGACT News, 8, (1976), pp. 18-24.
[10] Lee, D.T. and C. Wong, "Worst case analysis for region and partial region searches in multidimensional binary
search trees and quad trees", Acta Inf., 9, 1, (1978), pp. 23-29.
[11] Lee, D.T. and C. Wong, "Quintary Trees: A File Structure for Multidimensional Database Systems", ACM
Trans. on Database Systems, Vol 5, No 3, (1980), pp. 339-353.
[12] Monier, L., "Combinatorial solutions of multidimensional divide-and-conquer recurrences", J. Algorithms, 1,
(1980), pp. 60-74.
[13] Sarnak, N., "Persistent Data Structures", Ph. D. thesis, Dept. of Computer Science, New York University, New
York, 1986.
[14] Sarnak, N. and R.E. Tarjan, "Planar Point Location using Persistent Search Trees", Comm. ACM, Vol 29, No
7, (1986), pp. 669-679.
[15] Sitharama Iyengar, S., Rao, N.S.V., Kashyap, R.L., and V.K. Vaishnavi, "Multidimensional data structures:
Review and outlook", Advances in Computers, 27, (1988), pp. 69-119.
[16] Tarjan, R.E., "Data structures and network algorithms", Society for Industrial and Applied Mathematics,
Philadelphia, PA, (1983).
[17] Tarjan, R.E., "Amortized computational complexity", SIAM J. Algebraic Discrete Methods 6, (1985), pp.
306-318.
[18] Tsakalidis, A.K., "An optimal implementation for localized search", A84/06, Fachbereich Angewandte
Mathematik und Informatik, Universität des Saarlandes, Saarbrücken, West Germany, (1984).
[19] Tsakalidis, A.K., "AVL-trees for localized search", Inform. and Control, 67, (1985), pp. 173-194.
[20] Willard, D.E., "New Data Structure for Orthogonal Range Queries", SIAM J. Comput., Vol 14, No 1,
(February 1985), pp. 232-253.
[21] Willard, D.E., and G. Luecker, "Adding Range Restriction capabilities to Dynamic Data Structures", Journal of
ACM, 32, (1985), pp. 597-617.
[22] Vaishnavi, K., "Multidimensional Balanced Binary Trees", IEEE Trans. on Comput., Vol 38, No 7, (July
1989), pp. 968-985.
Appendix I
BUILD(d,F)
/* This algorithm takes as input a file F of records and the dimension d of the associated file space. It returns a pointer to the root of the tree
for the file F if d>=1. Otherwise it returns a pointer to a list of page numbers where the records of F reside. */
begin
1. if d<1 then return(T)   /* T points to the list of page numbers where the records of F reside */
2. Create a new node NODE with five pointer fields LEFT, MIDLEFT, MIDDLE, MIDRIGHT and RIGHT and one value field
   VAL.
3. Find the median of the file F of records with respect to key K_{k-d}. Let it be r and set NODE.VAL=r.
4. Partition the file F into three subfiles F_left, F_mid and F_right such that they contain records whose K_{k-d} values are less than, equal to and
   greater than r, respectively.
5. Let F_midleft and F_midright denote the files of records obtained by projecting F_left and F_right onto the hyperplane H_{k-d}=r. If d=1, the files
   F_midleft and F_midright will be the lists of page numbers where the records of F_left and F_right reside, respectively.
6. if F_left is empty then begin
      NODE.LEFT=NIL
      NODE.MIDLEFT=NIL
   end
   else begin
      NODE.LEFT=BUILD(d, F_left)
      NODE.MIDLEFT=BUILD(d-1, F_midleft)
   end
7. if F_right is empty then begin
      NODE.RIGHT=NIL
      NODE.MIDRIGHT=NIL
   end
   else begin
      NODE.RIGHT=BUILD(d, F_right)
      NODE.MIDRIGHT=BUILD(d-1, F_midright)
   end
8. NODE.MIDDLE=BUILD(d-1, F_mid)
9. return(addr(NODE))
end
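The BUILD procedure can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: records are in-memory tuples of k keys, the "list of page numbers" is replaced by a plain list of the records themselves, NIL is represented by `None`, and the names `Node` and `build` are hypothetical. The sketch builds only the uncompressed quintary skeleton; the COMPRESS phase is not included.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Node:
    val: Any               # median value r of key K_{k-d}
    left: Any = None       # records with key < r, same level
    midleft: Any = None    # the same records, indexed on the remaining keys
    middle: Any = None     # records with key == r, one level lower
    midright: Any = None   # records with key > r, indexed on the remaining keys
    right: Any = None      # records with key > r, same level


def build(d, records, k):
    """Build the quintary skeleton for `records` with d keys left to index."""
    if not records:
        return None                      # empty subfile: NIL pointer
    if d < 1:
        return list(records)             # stand-in for the list of page numbers
    key = k - d
    values = sorted(r[key] for r in records)
    median = values[len(values) // 2]
    f_left = [r for r in records if r[key] < median]
    f_mid = [r for r in records if r[key] == median]
    f_right = [r for r in records if r[key] > median]
    node = Node(val=median)
    node.left = build(d, f_left, k)
    node.midleft = build(d - 1, f_left, k)
    node.right = build(d, f_right, k)
    node.midright = build(d - 1, f_right, k)
    node.middle = build(d - 1, f_mid, k)
    return node


records = [(1, "a"), (2, "b"), (2, "c"), (3, "a")]
root = build(2, records, 2)
```

Because the median always belongs to the middle subfile, the left and right subfiles are strictly smaller than the input, so the recursion terminates.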
COMPRESS(NODE, d)
/* The algorithm compresses the quintary tree by scanning the LEFT and RIGHT subtrees and comparing the MIDLEFT, MIDDLE and
MIDRIGHT subtrees of each node NODE with the MIDLEFT or MIDRIGHT subtree of the root node of the tree. It calls the
COMPRESS_LEFT procedure to compress the LEFT subtree and the COMPRESS_RIGHT procedure to compress the RIGHT subtree. It is
invoked by COMPRESS(ROOT, k), where ROOT is the root node of the quintary tree. */
begin
1. if Tree not empty then
   begin
      MARK=1   /* This number will be used to mark the occurrence of the same information in the MIDLEFT and MIDRIGHT
                  subtree of the root */
2.    if NODE.LEFT != NIL then begin
         COMPRESS_LEFT(root, root)
         if d>=1 then COMPRESS(root.MIDLEFT, d-1)
      end
3.    if NODE.RIGHT != NIL then begin
         COMPRESS_RIGHT(root, root)
         if d>=1 then COMPRESS(root.MIDRIGHT, d-1)
      end
4.    if d>=1 then COMPRESS(root.MIDDLE, d-1)
   end
end
COMPRESS_LEFT(root, NODE)
/* The algorithm compresses the LEFT subtree of a quintary tree. It compares the MIDLEFT, MIDDLE and MIDRIGHT subtrees of each
node NODE with the MIDLEFT subtree of the root node of the tree. Whenever the same information is located, it is marked by the
MARK_STRUCTURES procedure using the MARK generated in the COMPRESS_LEFT procedure. */
begin
1. if NODE.LEFT != NIL then COMPRESS_LEFT(root, NODE.LEFT)
2. if NODE.MIDLEFT != NIL then begin
      MARK=MARK+1
      MARK_STRUCTURES(root.MIDLEFT, NODE.MIDLEFT, MARK)
   end
3. if NODE.MIDRIGHT != NIL then begin
      MARK=MARK+1
      MARK_STRUCTURES(root.MIDLEFT, NODE.MIDRIGHT, MARK)
   end
4. MARK=MARK+1
5. MARK_STRUCTURES(root.MIDLEFT, NODE.MIDDLE, MARK)
6. if NODE.RIGHT != NIL then COMPRESS_LEFT(root, NODE.RIGHT)
end
COMPRESS_RIGHT(root, NODE)
/* The algorithm compresses the RIGHT subtree of a Quintary tree. It compares the MIDLEFT, MIDDLE and MIDRIGHT subtrees of each
node NODE with the MIDRIGHT subtree of the root node of the tree. Whenever the same information is located, it is marked by the
MARK_STRUCTURES procedure using the MARK generated in the COMPRESS_RIGHT procedure. */
begin
1. if NODE.RIGHT != NIL then COMPRESS_RIGHT(root, NODE.RIGHT)
2. if NODE.MIDLEFT != NIL then begin
      MARK=MARK+1
      MARK_STRUCTURES(root.MIDRIGHT, NODE.MIDLEFT, MARK)
   end
3. if NODE.MIDRIGHT != NIL then begin
      MARK=MARK+1
      MARK_STRUCTURES(root.MIDRIGHT, NODE.MIDRIGHT, MARK)
   end
4. MARK=MARK+1
5. MARK_STRUCTURES(root.MIDRIGHT, NODE.MIDDLE, MARK)
6. if NODE.LEFT != NIL then COMPRESS_RIGHT(root, NODE.LEFT)
end
MARK_STRUCTURES(NODE1, NODE2, MARK_STAMP)
/* The algorithm checks whether the value stored in NODE1 is equal to the value stored in NODE2. If so, the node and the access
pointer to that node are stamped with the MARK_STAMP. Otherwise, depending on the relation between the two values, the left or the right path
of the tree where NODE1 is stored is followed. */
begin
1. if NODE2.VAL==NODE1.VAL
   then begin
      Stamp NODE1 and the access pointer to NODE1 with the MARK_STAMP
      Create 5 NIL pointers for NODE1 (that is LEFT, MIDLEFT, MIDDLE, MIDRIGHT, RIGHT)
      /* These pointers are created so that they exist when there is no node next to their corresponding node, or when there is no
         access pointer to the next node, so that the previous step can be carried out in the next recursive execution of the algorithm.
         They only add a linear space overhead to the whole structure */
      /* Perform the same check for every subtree (LEFT, RIGHT, MIDLEFT, MIDRIGHT and MIDDLE) of both nodes */
      MARK_STRUCTURES(NODE1.LEFT, NODE2.LEFT, MARK_STAMP)
      MARK_STRUCTURES(NODE1.RIGHT, NODE2.RIGHT, MARK_STAMP)
      MARK_STRUCTURES(NODE1.MIDLEFT, NODE2.MIDLEFT, MARK_STAMP)
      MARK_STRUCTURES(NODE1.MIDDLE, NODE2.MIDDLE, MARK_STAMP)
      MARK_STRUCTURES(NODE1.MIDRIGHT, NODE2.MIDRIGHT, MARK_STAMP)
   end
   else begin
2.    if NODE2.VAL==NIL then leave the pointers NIL
      /* Find out the branch that the algorithm should follow */
3.    if (NODE2.VAL<NODE1.VAL) then MARK_STRUCTURES(NODE1.LEFT, NODE2, MARK_STAMP)
4.    if (NODE2.VAL>NODE1.VAL) then MARK_STRUCTURES(NODE1.RIGHT, NODE2, MARK_STAMP)
   end
   /* The case NODE1.VAL==NIL is not possible if NODE2 != NIL, according to Theorem 2 */
end
Appendix II
Search algorithms
In this section we present the algorithms that handle the four types of query and analyze their worst-case performance.
The query Q is assumed to be a k-tuple $(r_0, r_1, \dots, r_{k-1})$ for exact match and partial match queries; $r_i$ may be * if the
key $k_i$ is unspecified in a partial match query. $K_i(Q)$ denotes the value of key $k_i$ in the given query Q. Similarly, Q is a
2×k array for range and partial range queries; $LK_i(Q)$ and $UK_i(Q)$ denote, respectively, the lower and upper bound on
the value of key $k_i$. In the following algorithms, T is a pointer to the tree to be searched and value(T) is the value stored
at the root node of the tree. ROOT is assumed to be the pointer to the entire tree.
Exact Match
EXACT_MATCH(T, d, Q)
/* The algorithm is rather self-explanatory */
begin
1. If T==NIL then return(NIL).
2. If d==0 then return(T).
3. If K_{k-d}(Q)<value(T)
      then EXACT_MATCH(LEFT(T), d, Q)
4. else if K_{k-d}(Q)>value(T)
      then EXACT_MATCH(RIGHT(T), d, Q)
5. else EXACT_MATCH(MIDDLE(T), d-1, Q)
end.
Since each node visit (a key comparison) discards roughly half of the file represented by the node, the total number of
visits is at most O(k+logN), where k is the number of levels of the tree, i.e. the dimension of the file space. The cost of
reporting the answers is multiplied by a factor of logN due to the existence of the fat nodes in the tree.
Partial Match
PARTIAL_MATCH(T, d, Q)
/* The algorithm for the partial match query is similar to the algorithm for the exact match query, except that when a key is unspecified we have to
search through the three attached subtrees at one lower level (step 3'). If s is the number of specified keys in the query, the worst case, in which
the algorithm makes the largest number of node visits, is when the specified keys are at the last s levels. The maximum number of node visits
is bounded by $O(3^{k-s}(s+\log N)+t)$, where t is the number of records retrieved. This bound is multiplied by a factor of logN for every
step that accesses the MIDLEFT, MIDDLE and MIDRIGHT subtrees of the tree. */
begin
1. If T==NIL then return(NIL).
2. If d==0 then return(T).
3'. If K_{k-d}(Q)=="*"
      then begin
         PARTIAL_MATCH(MIDLEFT(T), d-1, Q)
         PARTIAL_MATCH(MIDDLE(T), d-1, Q)
         PARTIAL_MATCH(MIDRIGHT(T), d-1, Q)
      end
3. else if K_{k-d}(Q)<value(T)
      then PARTIAL_MATCH(LEFT(T), d, Q)
4. else if K_{k-d}(Q)>value(T)
      then PARTIAL_MATCH(RIGHT(T), d, Q)
   else PARTIAL_MATCH(MIDDLE(T), d-1, Q)
end.
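The exact and partial match searches can be sketched together in Python, since an exact match is a partial match with no unspecified key. The sketch runs on the uncompressed quintary skeleton only (no fat nodes or marks), stores records in memory instead of page lists, and uses `None` for the "*" wildcard; `Node`, `build` and `partial_match` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Node:
    val: Any
    left: Any = None
    midleft: Any = None
    middle: Any = None
    midright: Any = None
    right: Any = None


def build(d, records, k):
    """Uncompressed quintary skeleton; level-0 'trees' are plain record lists."""
    if not records:
        return None
    if d < 1:
        return list(records)
    key = k - d
    median = sorted(r[key] for r in records)[len(records) // 2]
    f_left = [r for r in records if r[key] < median]
    f_mid = [r for r in records if r[key] == median]
    f_right = [r for r in records if r[key] > median]
    return Node(median,
                left=build(d, f_left, k), midleft=build(d - 1, f_left, k),
                middle=build(d - 1, f_mid, k),
                midright=build(d - 1, f_right, k), right=build(d, f_right, k))


def partial_match(t, d, q, k):
    """Return the records matching q; q[i] is None when key i is unspecified."""
    if t is None:
        return []
    if d == 0:
        return list(t)
    wanted = q[k - d]
    if wanted is None:                    # step 3': search all three subtrees
        return (partial_match(t.midleft, d - 1, q, k)
                + partial_match(t.middle, d - 1, q, k)
                + partial_match(t.midright, d - 1, q, k))
    if wanted < t.val:
        return partial_match(t.left, d, q, k)
    if wanted > t.val:
        return partial_match(t.right, d, q, k)
    return partial_match(t.middle, d - 1, q, k)


records = [(1, "a"), (2, "b"), (2, "c"), (3, "a")]
root = build(2, records, 2)
second_key_a = sorted(partial_match(root, 2, (None, "a"), 2))
first_key_2 = sorted(partial_match(root, 2, (2, None), 2))
```

An unspecified key triples the number of subtrees visited, which is where the $3^{k-s}$ factor in the bound comes from.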
Range Search
RANGE_SEARCH(T, d, Q, L, R)
/* First, the root node of the entire tree at level 0 is visited. If a node value is out of bounds, i.e., not within the closed interval [LK_0(Q),
UK_0(Q)], we follow either the LEFT or the RIGHT pointer depending on whether the node value is greater than UK_0(Q) or less than
LK_0(Q). Otherwise we invoke the algorithm recursively by visiting the MIDDLE subtree at one lower level and the LEFT and RIGHT
subtrees at the same level, setting the control variables L and R to 1. The control variables indicate that the interval [LK_i(Q),
UK_i(Q)] has been partitioned into two subintervals [LK_i(Q), x) and (x, UK_i(Q)], where x is the value stored at the current node, and they remain 1
during the searches of the subtrees at the same level. Here [x, y) denotes an interval that contains z such that x <= z < y. Since each interval
[LK_i(Q), UK_i(Q)] can be partitioned into at most logN subintervals at each level i, the maximum number of node visits is bounded by
$O(\log^k N)$ plus the number of records found in the specified region, i.e., $O(\log^k N+t)$, where t is the number of records in the region.
The bound is increased by a factor of logN for every answer due to the existence of the fat nodes in the tree. */
begin
1. If T==NIL then return(NIL).
2. If d==0 then return(T).
3. If value(T)<LK_{k-d}(Q)
      then RANGE_SEARCH(RIGHT(T), d, Q, L, R)
4. else if value(T)>UK_{k-d}(Q)
      then RANGE_SEARCH(LEFT(T), d, Q, L, R)
5. else begin
      RANGE_SEARCH(MIDDLE(T), d-1, Q, 0, 0)
      if R==0 then RANGE_SEARCH(LEFT(T), d, Q, 1, R)
      else RANGE_SEARCH(MIDLEFT(T), d-1, Q, 0, 0)
      if L==0 then RANGE_SEARCH(RIGHT(T), d, Q, L, 1)
      else RANGE_SEARCH(MIDRIGHT(T), d-1, Q, 0, 0)
   end
end.
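The role of the control variables L and R can be seen in a small Python sketch of the range search. As with the other sketches, it runs on the uncompressed skeleton with in-memory records, and `Node`, `build` and `range_search` are hypothetical names; the query is given as a list of (lower, upper) pairs, one per key.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Node:
    val: Any
    left: Any = None
    midleft: Any = None
    middle: Any = None
    midright: Any = None
    right: Any = None


def build(d, records, k):
    """Uncompressed quintary skeleton; level-0 'trees' are plain record lists."""
    if not records:
        return None
    if d < 1:
        return list(records)
    key = k - d
    median = sorted(r[key] for r in records)[len(records) // 2]
    f_left = [r for r in records if r[key] < median]
    f_mid = [r for r in records if r[key] == median]
    f_right = [r for r in records if r[key] > median]
    return Node(median,
                left=build(d, f_left, k), midleft=build(d - 1, f_left, k),
                middle=build(d - 1, f_mid, k),
                midright=build(d - 1, f_right, k), right=build(d, f_right, k))


def range_search(t, d, q, k, L=0, R=0):
    """L == 1: the upper bound already holds for the whole subtree (we came
    from a LEFT branch of an in-range node); R == 1: same for the lower bound."""
    if t is None:
        return []
    if d == 0:
        return list(t)
    lo, hi = q[k - d]
    if t.val < lo:
        return range_search(t.right, d, q, k, L, R)
    if t.val > hi:
        return range_search(t.left, d, q, k, L, R)
    out = range_search(t.middle, d - 1, q, k)
    if R == 0:
        out += range_search(t.left, d, q, k, 1, R)
    else:                                 # whole left subtree is inside the range
        out += range_search(t.midleft, d - 1, q, k)
    if L == 0:
        out += range_search(t.right, d, q, k, L, 1)
    else:                                 # whole right subtree is inside the range
        out += range_search(t.midright, d - 1, q, k)
    return out


def brute_force(records, q):
    return [r for r in records
            if all(lo <= r[i] <= hi for i, (lo, hi) in enumerate(q))]


grid = [(x, y) for x in range(7) for y in range(3)]
tree = build(2, grid, 2)
query = [(2, 5), (0, 1)]
found = sorted(range_search(tree, 2, query, 2))
```

Once a flag is set, an in-range node on that side hands its entire far subtree to the next level through MIDLEFT or MIDRIGHT instead of descending into it, which keeps the number of visited nodes per level logarithmic.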
Partial Range
PARTIAL_RANGE
The algorithm is similar to the RANGE_SEARCH algorithm except that if a key is unspecified, we have to search through the three attached
subtrees at one lower level. We simply add the following step between steps 2 and 3.
3'. If key K_{k-d} is unspecified
      then begin
         RANGE_SEARCH(MIDLEFT(T), d-1, Q, 0, 0)
         RANGE_SEARCH(MIDDLE(T), d-1, Q, 0, 0)
         RANGE_SEARCH(MIDRIGHT(T), d-1, Q, 0, 0)
      end
    else ...
The maximum number of node visits is bounded by $O(3^{k-s}(\log N)^s+t)$, where s is the number of specified keys and t the
number of records in the region. This bound is multiplied by a factor of logN for every step that accesses the
MIDLEFT, MIDDLE and MIDRIGHT subtrees of the tree.