New method in information processing for
maintaining an efficient dynamic ordered set
XIN ShiQing1 & WANG GuoJin1,2†
1 Institute of Computer Graphics and Image Processing, Zhejiang University, Hangzhou 310027, China;
2 State Key Laboratory of CAD & CG, Zhejiang University, Hangzhou 310027, China
This paper investigates how to maintain an efficient dynamic ordered set of bit strings, which is an important problem in the field of information search and information processing. Generally, a dynamic ordered set is required to support 5 essential operations: search, insertion, deletion, max-value retrieval and next-larger-value retrieval. Building on previous research, we present an advanced data structure named rich binary tree (RBT), which follows both the binary-search-tree property and the digital-search-tree property. In addition, every key K keeps the most significant difference bit (MSDB) between itself and the next larger value among K's ancestors, as well as that between itself and the next smaller one among its ancestors. With the new data structure, we can maintain a dynamic ordered set in O(L) time per operation, where L is the word length of the keys. Since computers represent objects in binary, our method has great potential in applications. In fact, RBT can be viewed as a general-purpose data structure for problems concerning order, such as search, sorting and maintaining a priority queue. For example, when RBT is applied to sorting, we get an algorithm whose running time is linear in the number of keys, and in our experiments it clearly outperforms quick-sort. Moreover, unlike quick-sort, RBT supports dynamic insertion and deletion in time that does not depend on the number of keys.
Keywords: information processing, dynamic ordered set, algorithms and data structures, rich binary tree
There are numerous circumstances where one needs to maintain a dynamic fully-ordered set[1] so that items, each with a key, can be handled according to their priorities[2,3]. Generally, an elaborate data structure is required to facilitate the frequent operations including 1) searching the given key; 2) inserting a new key; 3) deleting the given key; 4) retrieving the next-larger key; and 5) returning the maximum key[4,5]. The dynamic ordered set problem is therefore so general that search[6−11], sorting[1,6,12−17,20−23] and maintaining a priority queue[17−19,24] are only special sub-problems, which were mostly studied separately[1,14].
In this paper, we devise a rich binary tree (RBT) for maintaining an efficient dynamic ordered set in O(L) time, where L is the word length of keys. The RBT is required to follow both the binary-search-tree property and the digital-search-tree property. Furthermore, the RBT keeps the most significant difference bit (MSDB) between each key $K$ and its direct parent key $\bar{K}$, as well as that between $K$ and its indirect parent key $\bar{\bar{K}}$, where the indirect parent is defined as the last cornered key on the root-to-leaf path to $K$ itself, that is, the nearest ancestor at which the path turns in the direction opposite to its final step into $K$. We will see that RBT with such a definition has nice properties. For example, we have $K \in [\min(\bar{K}, \bar{\bar{K}}), \max(\bar{K}, \bar{\bar{K}})]$, and the intervals $[\min(\bar{K}, \bar{\bar{K}}), \max(\bar{K}, \bar{\bar{K}})]$ form a sequence of nested intervals as $K$ moves along any root-to-leaf path. Therefore, we can employ this data structure to solve problems about order, e.g., search, sorting and maintaining a priority queue. This will greatly raise the efficiency of processing information. Section 1 defines RBT and section 2 gives 5 essential RBT-based algorithms to support a dynamic ordered set. We provide some application examples in section 3 and arrive at a conclusion in section 4.
Received May 15, 2007; accepted August 20, 2008
doi: 10.1007/s11432-009-0074-0
† Corresponding author (email: wanggj@zju.edu.cn)
Supported by the National Natural Science Foundation of China (Grant No. 60873111), and the National Basic Research Program of China (Grant No. 2004CB719400)
Citation: Xin S Q, Wang G J. New method in information processing for maintaining an efficient dynamic ordered set. Sci China Ser F-Inf Sci, 2009, 52(8): 1292–1301, doi: 10.1007/s11432-009-0074-0
1 Data Structure: RBT
Trees, especially binary trees, are an important
data structure in computer science and have numerous applications in search and sorting[1,14] . In
this section, we propose a new data structure
named rich binary tree (RBT), which is defined by
three requirements. Here we might as well assume
that the keys are of the same word length L and
different from each other. Our conclusion can be
easily extended to general cases. We define RBT
as follows.
Definition 1. Suppose T is a binary tree with
n keys. We say T is an RBT if it satisfies:
1) For each key $K$ of the tree $T$, with left child $K_l$ and right child $K_r$, we have $K_l < K < K_r$.
2) Along the root-to-leaf path to $K$, we can obtain a binary string $S$ by taking down a 0 if moving to the left and a 1 if moving to the right. Then $S$ must be a prefix of $K$.
3) Besides pointers to the children, we need to keep another two pointers for each key $K$. One points to the direct parent key $\bar{K}$, and the other points to the indirect parent key $\bar{\bar{K}}$, which is defined as the last cornered key on the root-to-leaf path to $K$ itself. At the same time, the most significant difference bits (MSDB) between $K$ and each of $\bar{K}$ and $\bar{\bar{K}}$ should be kept.
Figure 1(a) gives an example of RBT. It contains 10 keys: 1, 2, 3, 4, 5, 6, 7, 12, 13 and 15.
We can easily verify that it meets the above three
requirements. For example, the root-to-leaf path
from H to G has one left turn and two right turns,
which induces a binary string 011, exactly the prefix of the key 7 (binary code: 0111). In addition,
the solid-line arrows point to direct parents, while
the dash-line arrows point to indirect parents. Figures 1(b)–1(d) show the building process of RBT: 1) creating a complete binary tree with its depth the same as the word length; 2) arranging all the keys on the bottom level according to the digital-search-tree property (see Figure 1(b)); and 3) in a bubble-like fashion, repeatedly filling each empty node with the maximum key of its left subtree or the minimum key of its right subtree (see Figure 1(c) and (d)). Hence an RBT always exists for a given key set, but it is not necessarily unique.
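Requirements 1) and 2) of Definition 1 can be checked mechanically. The following is a minimal sketch, not the authors' code: it stores keys as plain integers instead of bit strings, and the names `Node`, `isValid` and `buildExample` are hypothetical. The example tree follows the path H, D, E, G, F described for Figure 1(a); the placement of the keys 1, 2, 3, 13 and 15 is an assumption, chosen so that both requirements hold.

```cpp
#include <cstdint>
#include <memory>

constexpr int L = 4;  // word length of the keys in the Figure 1 example

struct Node {
    std::uint32_t key;
    std::unique_ptr<Node> left, right;
    explicit Node(std::uint32_t k) : key(k) {}
};

// Checks requirements 1) and 2) on the subtree rooted at n.
// (lo, hi) is the open key range left by the ancestors, `path` is the bit
// string written on the way down (0 = left, 1 = right), `depth` its length.
bool isValid(const Node* n, long long lo, long long hi,
             std::uint32_t path, int depth) {
    if (n == nullptr) return true;
    if (depth > L) return false;                        // a path cannot be longer than a key
    long long k = n->key;
    if (k <= lo || k >= hi) return false;               // requirement 1): search-tree order
    if ((n->key >> (L - depth)) != path) return false;  // requirement 2): path is a prefix
    return isValid(n->left.get(), lo, k, path << 1, depth + 1) &&
           isValid(n->right.get(), k, hi, (path << 1) | 1u, depth + 1);
}

// A tree over the keys 1, 2, 3, 4, 5, 6, 7, 12, 13, 15 (assumed layout).
std::unique_ptr<Node> buildExample() {
    auto root = std::make_unique<Node>(12);                            // H = 1100
    root->left = std::make_unique<Node>(4);                            // D = 0100
    root->left->left = std::make_unique<Node>(2);                      // 0010
    root->left->left->left = std::make_unique<Node>(1);                // 0001
    root->left->left->right = std::make_unique<Node>(3);               // 0011
    root->left->right = std::make_unique<Node>(5);                     // E = 0101
    root->left->right->right = std::make_unique<Node>(7);              // G = 0111
    root->left->right->right->left = std::make_unique<Node>(6);        // F = 0110
    root->right = std::make_unique<Node>(13);                          // 1101
    root->right->right = std::make_unique<Node>(15);                   // 1111
    return root;
}
```

A call such as `isValid(root.get(), -1, 1LL << L, 0, 0)` accepts this tree, while a tree that hangs the key 8 (1000) to the left of the root 12 is rejected: it is in search-tree order, but its path string 0 is not a prefix of 1000.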
RBT has more constraints than general binary
trees, following both the binary-search-tree property and the digital-search-tree property. Furthermore, it keeps the internal relation between keys.
Therefore nice properties could be expected.
Theorem 1. Let T be an RBT and L be the
word length. Then we have
1) T has a depth no more than L + 1.
2) In-order traversal gives a monotonic increasing sequence of keys.
3) Let $K$ be a key of $T$, and let $\bar{K}$ and $\bar{\bar{K}}$ be respectively its direct parent key and its indirect parent key. Then $K \in (\min(\bar{K}, \bar{\bar{K}}), \max(\bar{K}, \bar{\bar{K}}))$.
4) Assume that the keys $K_1, K_2, \cdots, K_m$ are distributed along a root-to-leaf path. Then their respective direct parent keys and indirect parent keys induce a sequence of nested intervals. That is,
$$(\min(\bar{K}_1, \bar{\bar{K}}_1), \max(\bar{K}_1, \bar{\bar{K}}_1)) \supset (\min(\bar{K}_2, \bar{\bar{K}}_2), \max(\bar{K}_2, \bar{\bar{K}}_2)) \supset \cdots \supset (\min(\bar{K}_m, \bar{\bar{K}}_m), \max(\bar{K}_m, \bar{\bar{K}}_m)),$$
where $\bar{K} = 0$ if $K$ has no direct parent; and
$$\bar{\bar{K}} = \begin{cases} 0, & \bar{K} > K; \\ \infty, & \text{else}, \end{cases}$$
if $K$ has no indirect parent.
XIN S Q et al. Sci China Ser F-Inf Sci | Aug. 2009 | vol. 52 | no. 8 | 1292-1301
Figure 1 An RBT from the keys 1, 2, 3, 4, 5, 6, 7, 12, 13 and 15. (a) An RBT: dash-line arrows point to direct parents, while solid-line arrows point to indirect parents; (b)–(d) in a bubble-like fashion, we can obtain an RBT different from (a). The rule is filling in the blank node with the minimum key in its right sub-tree or the maximum key in its left subtree.
In fact, the proof of Theorem 1 is straightforward. The first proposition follows from the digital-search-tree property, while the other three depend on the binary-search-tree property. Taking Figure 1(a) as an example, we observe that along the path H→D→E→G→F, the sequence of nested intervals is:
(0, ∞) ⊃ (0, 1100) ⊃ (0100, 1100) ⊃ (0101, 1100) ⊃ (0101, 0111).
This shows that the nearer to the bottom level the keys are located, the closer they are to each other. More precisely, we have the following lemma.
Lemma 1. The further to the right the most significant difference bit of two keys is, the closer the keys are; and vice versa. In detail,
1) if the three keys $K_1, K_2, K_3$ satisfy $K_1 < K_2 < K_3$ or $K_1 > K_2 > K_3$, then we have $d(K_1, K_2) \geq d(K_1, K_3)$, where $d(\cdot, \cdot)$ denotes the MSDB, being 1 at the leftmost bit, 2 at the next leftmost bit, and so on.
2) if $d(K_1, K_2) > d(K_1, K_3)$, then $K_1 > K_3$ (resp., $K_1 < K_3$) implies $K_2 > K_3$ (resp., $K_2 < K_3$); if $d(K_1, K_2) = d(K_1, K_3)$, then $K_1 > K_3$ (resp., $K_1 < K_3$) implies $K_1 > K_2$ (resp., $K_1 < K_2$).
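Because Lemma 1 is a statement about finitely many bit patterns, it can be checked exhaustively for a small word length. The sketch below does so for L = 4; the function names are hypothetical. It takes part 1) in its non-strict form and states the equal-MSDB case of part 2) as "K1 > K3 exactly when K1 > K2"; both readings are assumptions of this sketch.

```cpp
#include <cstdint>

constexpr int L = 4;  // word length; small enough to enumerate every key

// d(a, b): position of the most significant differing bit, 1 = leftmost.
// Returns L + 1 when a == b (no differing bit).
int msdb(std::uint32_t a, std::uint32_t b) {
    for (int pos = 1; pos <= L; ++pos) {
        std::uint32_t mask = 1u << (L - pos);
        if ((a & mask) != (b & mask)) return pos;
    }
    return L + 1;
}

// Tries every triple of pairwise distinct L-bit keys.
bool lemma1HoldsForAllKeys() {
    const std::uint32_t n = 1u << L;
    for (std::uint32_t k1 = 0; k1 < n; ++k1)
        for (std::uint32_t k2 = 0; k2 < n; ++k2)
            for (std::uint32_t k3 = 0; k3 < n; ++k3) {
                if (k1 == k2 || k1 == k3 || k2 == k3) continue;
                int d12 = msdb(k1, k2), d13 = msdb(k1, k3);
                bool ordered = (k1 < k2 && k2 < k3) || (k1 > k2 && k2 > k3);
                // Part 1): a key lying between K1 and K3 differs from K1 no earlier.
                if (ordered && d12 < d13) return false;
                // Part 2), strict case: K1 and K2 lie on the same side of K3.
                if (d12 > d13 && ((k1 > k3) != (k2 > k3))) return false;
                // Part 2), equal case: K2 and K3 lie on the same side of K1.
                if (d12 == d13 && ((k1 > k3) != (k1 > k2))) return false;
            }
    return true;
}
```

With the keys of Figure 1(a), `msdb(4, 12)` is 1 (0100 against 1100) and `msdb(5, 4)` is 4 (0101 against 0100).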
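Combining Lemma 1 and the last conclusion of Theorem 1 together, we get the following corollary.
Corollary 1. Suppose that $K_1, K_2, \cdots, K_m$ is a sequence of root-to-leaf keys. Then the MSDB sequence $\{\min(d(K_i, \bar{K}_i), d(K_i, \bar{\bar{K}}_i))\}$ is monotonically non-decreasing with regard to the subscript $i$. Here the minimum of the two MSDBs equals the MSDB between the two endpoints of the $i$-th interval, because $K_i$ lies strictly between them.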
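The nested intervals of Theorem 1 can be reproduced for the path H, D, E, G, F of Figure 1(a). The sketch below is illustrative only and its function names are hypothetical. It takes the keys of a root-to-leaf path as a list and tracks the nearest smaller and nearest larger ancestor, which by Theorem 1 are the direct and indirect parents in some order. It also checks that the MSDB between the two endpoints of each interval never decreases along the path, which is what part 1) of Lemma 1 implies for nested intervals.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

constexpr int L = 4;  // word length of the example keys
constexpr std::uint32_t kInfinity =
    std::numeric_limits<std::uint32_t>::max();  // stands in for the bound "∞"

int msdb(std::uint32_t a, std::uint32_t b) {
    for (int pos = 1; pos <= L; ++pos) {
        std::uint32_t mask = 1u << (L - pos);
        if ((a & mask) != (b & mask)) return pos;
    }
    return L + 1;
}

using Interval = std::pair<std::uint32_t, std::uint32_t>;

// For keys listed in root-to-leaf order, returns the open interval that the
// ancestors leave for each key (0 and kInfinity when a bound is missing).
std::vector<Interval> ancestorIntervals(const std::vector<std::uint32_t>& path) {
    std::vector<Interval> out;
    std::uint32_t lo = 0, hi = kInfinity;
    for (std::size_t i = 0; i < path.size(); ++i) {
        out.emplace_back(lo, hi);
        if (i + 1 < path.size()) {
            // The next key lies on one side of path[i], which tightens one bound.
            if (path[i + 1] < path[i]) hi = path[i]; else lo = path[i];
        }
    }
    return out;
}

// True if d(lo, hi) never decreases; intervals with an infinite bound are skipped.
bool endpointMsdbNonDecreasing(const std::vector<Interval>& intervals) {
    int previous = 0;
    for (const Interval& iv : intervals) {
        if (iv.second == kInfinity) continue;
        int d = msdb(iv.first, iv.second);
        if (d < previous) return false;
        previous = d;
    }
    return true;
}
```

For the path {12, 4, 5, 7, 6} the intervals come out as (0, ∞), (0, 12), (4, 12), (5, 12), (5, 7), matching the binary listing given for Figure 1(a).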
Corollary 1 reveals that, in the sense of MSDB, the keys of an RBT are well arranged. As we know, the maintenance of a dynamic ordered set requires frequent comparisons between two keys. Generally, each comparison begins at the leftmost bit and costs repeated bit operations. But we can greatly reduce the amount of computation for comparisons by using the MSDB information provided by RBT. This is the crux of improving the efficiency of maintaining an ordered set.
For the purpose of describing algorithms, we represent the node structure of RBT in a C++ fashion:
    struct RBTNode
    {
        bitset<L> key;            // L is the word length
        RBTNode* leftChild;       // the left child
        RBTNode* rightChild;      // the right child
        RBTNode* directParent;    // the direct parent
        RBTNode* indirectParent;  // the indirect parent
        int directCursor;         // MSDB from its direct parent
        int indirectCursor;       // MSDB from its indirect parent
    };
Note that the variables directCursor and indirectCursor behave differently from general integers: they only need to support 1-bit movement to the right. So there may be an implementation mechanism with good performance.
2 RBT-based implementation of dynamic ordered sets
In this section, we discuss the detailed implementation algorithms of RBT-based dynamic ordered sets, so that the following essential tasks can be done in O(L) time: 1) retrieving the maximum/minimum key; 2) returning the next-larger/next-smaller key; 3) searching the given key; 4) inserting a new key; and 5) deleting the given key. We will see that such operations have a time bound that depends linearly on the word length and is independent of the number of keys.
2.1 Retrieving the minimum key
Suppose that the RBT already exists. Here we consider how to retrieve the minimum (maximum) key. If the RBT is empty, we just do nothing. Otherwise, the minimum key is bound to exist. If we keep following the left child for as long as possible, we reach the minimum key. Taking Figure 1(a) and (d) as an example, node A has the minimum key 1, while node J has the maximum 15. We give a pseudo-code algorithm as follows. Since the depth of the tree is at most L + 1, Algorithm 1 runs in time O(L).
Algorithm 1 Retrieving the minimum key
    If (root == NULL)
        Return NULL; // an empty tree has no minimum key
    RBTNode* nodeOfMinKey = root; // initialized to be the root node
    While (nodeOfMinKey->leftChild != NULL)
    {
        nodeOfMinKey = nodeOfMinKey->leftChild;
    }
    Return nodeOfMinKey;
2.2 Returning the next-larger key
Theorem 1 tells us that an in-order traversal induces a monotonically increasing sequence of keys. So the problem of finding the next-larger key can be converted into a problem of finding the in-order successor. For a given key, if its right subtree is not empty, then the minimum key of the right subtree is the next-larger key. Otherwise, if the key itself is a left child, then its direct parent key is the answer; and if it is a right child, then its indirect parent key is exactly what we want. The pseudo-code algorithm is described as follows.
Algorithm 2 Returning the next-larger key
    RBTNode* nodeOfNextKey = curNode->rightChild; // initialized to be the right child
    If (nodeOfNextKey == NULL)
    {
        If (curNode->directParent == NULL)
            Return NULL; // a root without a right subtree holds the maximum key
        If (curNode->directParent->leftChild == curNode)
            Return curNode->directParent;
        Return curNode->indirectParent; // NULL if curNode holds the maximum key
    }
    Else
    {
        While (nodeOfNextKey->leftChild != NULL)
            nodeOfNextKey = nodeOfNextKey->leftChild;
        Return nodeOfNextKey;
    }
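Algorithms 1 and 2 can be turned into compilable C++ along the following lines. This is a sketch under stated assumptions rather than the authors' implementation: keys are plain integers instead of `bitset<L>`, the cursor fields are left out because neither algorithm reads them, and `attach`, `buildExample` and `keysInOrder` are hypothetical helpers. The helper `attach` fills in the indirect parent as the nearest ancestor reached by the opposite turn, and the example tree uses an assumed placement of the keys of Figure 1.

```cpp
#include <cstdint>
#include <deque>
#include <vector>

struct RBTNode {
    std::uint32_t key = 0;
    RBTNode* leftChild = nullptr;
    RBTNode* rightChild = nullptr;
    RBTNode* directParent = nullptr;
    RBTNode* indirectParent = nullptr;
};

// Hangs `child` under `parent` (which must already be attached) and sets both
// parent pointers of the child.
void attach(RBTNode* parent, RBTNode* child, bool asRight) {
    (asRight ? parent->rightChild : parent->leftChild) = child;
    child->directParent = parent;
    RBTNode* grand = parent->directParent;
    bool parentIsRight = grand != nullptr && grand->rightChild == parent;
    bool parentIsLeft = grand != nullptr && grand->leftChild == parent;
    if ((asRight && parentIsRight) || (!asRight && parentIsLeft))
        child->indirectParent = parent->indirectParent;  // same turn: inherit
    else
        child->indirectParent = grand;                   // corner (null below the root)
}

// Algorithm 1: follow left children as far as possible.
RBTNode* minimumNode(RBTNode* root) {
    if (root == nullptr) return nullptr;
    while (root->leftChild != nullptr) root = root->leftChild;
    return root;
}

// Algorithm 2: in-order successor using the two parent pointers.
RBTNode* nextLarger(RBTNode* cur) {
    if (cur->rightChild != nullptr) return minimumNode(cur->rightChild);
    if (cur->directParent == nullptr) return nullptr;
    if (cur->directParent->leftChild == cur) return cur->directParent;
    return cur->indirectParent;
}

// Example tree over 1, 2, 3, 4, 5, 6, 7, 12, 13, 15 (assumed layout).
RBTNode* buildExample(std::deque<RBTNode>& pool) {
    auto make = [&pool](std::uint32_t k) {
        pool.emplace_back();
        pool.back().key = k;
        return &pool.back();
    };
    RBTNode *n12 = make(12), *n4 = make(4), *n13 = make(13), *n2 = make(2),
            *n5 = make(5), *n1 = make(1), *n3 = make(3), *n7 = make(7),
            *n6 = make(6), *n15 = make(15);
    attach(n12, n4, false);  attach(n12, n13, true);
    attach(n4, n2, false);   attach(n4, n5, true);
    attach(n2, n1, false);   attach(n2, n3, true);
    attach(n5, n7, true);    attach(n7, n6, false);
    attach(n13, n15, true);
    return n12;
}

// Enumerates all keys by repeated next-larger queries.
std::vector<std::uint32_t> keysInOrder(RBTNode* root) {
    std::vector<std::uint32_t> out;
    for (RBTNode* n = minimumNode(root); n != nullptr; n = nextLarger(n))
        out.push_back(n->key);
    return out;
}
```

Starting from the minimum and calling `nextLarger` repeatedly visits the keys in increasing order without ever walking more than one parent link upward, which is the point of storing the indirect parent.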
2.3 Searching the given key
Binary search trees, such as red-black trees, serve only as a tool for searching a key by comparing whole keys, word by word, from root to leaf. For example, to search the key 0110 in Figure 1(a), we first compare it with the root key. Since 0110 < 1100, the search goes left. So we compare it with nodes D, E, G, F in turn. Node F contains key 0110, and therefore the search process is over.
Different from general binary trees, RBT also keeps MSDB information. Thus we can reduce the number of bit comparisons and improve search efficiency. Assume that $a$, $b$, $c$ are three keys with word length $L$, and $d(\cdot, \cdot)$ denotes the MSDB operation between two keys. Lemma 1 implies
$$\begin{cases} d(a, c) = \min(d(a, b), d(b, c)), & d(a, b) \neq d(b, c); \\ d(a, c) > d(a, b), & \text{else}. \end{cases}$$
That is to say, if $d(a, b)$ and $d(b, c)$ are different, $d(a, c)$ is already known; otherwise, the final cursor position of $d(a, c)$ is located to the right of $\min(d(a, b), d(b, c))$. So we may take $\min(d(a, b), d(b, c))$ as the starting cursor position and slide the cursor to the right until we find the MSDB between $a$ and $c$. Since RBT already keeps MSDB information, this can improve search efficiency.
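The cursor rule above can be written down and compared against a scan from the leftmost bit. The sketch below is a minimal illustration with hypothetical function names; it uses 1-based bit positions as in Lemma 1 and returns L + 1 for equal keys, which is a convention of this sketch.

```cpp
#include <algorithm>
#include <cstdint>

constexpr int L = 4;  // word length; small enough to try every key triple

// Scans for the first differing bit at or after position `start` (1 = leftmost).
// Returns L + 1 when no such bit exists.
int msdbFrom(std::uint32_t a, std::uint32_t b, int start) {
    for (int pos = start; pos <= L; ++pos) {
        std::uint32_t mask = 1u << (L - pos);
        if ((a & mask) != (b & mask)) return pos;
    }
    return L + 1;
}

// d(a, c) obtained from the already known d(a, b) and d(b, c).
int msdbVia(std::uint32_t a, std::uint32_t c, int dab, int dbc) {
    if (dab != dbc) return std::min(dab, dbc);  // no bit of a or c is inspected
    return msdbFrom(a, c, dab + 1);             // resume just right of the shared position
}

// Compares the shortcut with a full scan for every triple of L-bit keys.
bool ruleHoldsForAllKeys() {
    const std::uint32_t n = 1u << L;
    for (std::uint32_t a = 0; a < n; ++a)
        for (std::uint32_t b = 0; b < n; ++b)
            for (std::uint32_t c = 0; c < n; ++c) {
                int viaB = msdbVia(a, c, msdbFrom(a, b, 1), msdbFrom(b, c, 1));
                if (viaB != msdbFrom(a, c, 1)) return false;
            }
    return true;
}
```

For a = 0110, b = 1100 and c = 0100, both d(a, b) and d(b, c) are 1, so the scan resumes at position 2 and stops at position 3.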
As we know, an RBT is definitely a binary search
tree. So when searching a key c, we first compare
it with the root key. If they are equal, return;
and if c is smaller, continue to compare it with the
left child key; otherwise, compare it with the right
child. Repeat such a process until key c reaches
the bottom level.
For an in-depth study, we assume that the search path of key $c$ is as Figure 2(a) shows. We number the keys like this: denote the left-turn keys by $a_1, a_2, \cdots, a_{m_1}$, and the right-turn keys by $b_1, b_2, \cdots, b_{m_2}$. Then we have
$$a_1 < a_2 < \cdots < a_{m_1} < c < b_{m_2} < b_{m_2 - 1} < \cdots < b_1.$$
Before searching key $c$, we create a node $N_c$ for it to keep MSDB information between $c$ and the current key $N$.key on the search path. Let $\bar{N}$ denote the direct parent of $N$. Since $N$ already keeps the MSDB between $N$.key and $\bar{N}$.key, here we intend to compute the MSDB between $c$ and $N$.key with the MSDB between $c$ and $\bar{N}$.key being known. This is not difficult using the cursor-sliding rule just discussed.
By Lemma 1, the ordering of the keys along the root-to-leaf path implies
$$d(c, a_1) \leq d(c, a_2) \leq \cdots \leq d(c, a_{m_1})$$
and
$$d(c, b_1) \leq d(c, b_2) \leq \cdots \leq d(c, b_{m_2}).$$
The two inequalities show that the cursor should slide as in Figure 2(b), in which solid-line arrows show the real cursor sliding path, while dash-line arrows show the change of the starting cursor position. Obviously, the use of the field directCursor reduces the number of bit comparisons, and therefore improves search efficiency.
Figure 2 Search of the key c. (a) Search path; (b) only use the field directCursor to optimize the search process, where solid arrows show the real cursor sliding path, while dash-line arrows show the skips of the cursor; (c) use both the fields directCursor and indirectCursor to further optimize the search process.
However, the cursor may move backward rather than slide in one direction only. So we cannot bound the total sliding distance by O(L). To improve the search algorithm to O(L), we have to use another field, indirectCursor. Let the current node on the root-to-leaf path be $N$, and its direct parent and indirect parent be respectively $\bar{N}$ and $\bar{\bar{N}}$. Then both key $c$ and $N$.key are between $\bar{N}$.key and $\bar{\bar{N}}$.key. But we still do not know whether key $c$ is between $\bar{N}$.key and $N$.key, or between $N$.key and $\bar{\bar{N}}$.key. For this purpose, we need to determine which of the two cursors Nc.directCursor and Nc.indirectCursor is further to the right. If it is Nc.directCursor, we take
min(Nc.directCursor, N.directCursor)
as the starting sliding position when we compare $c$ and $N$.key. Otherwise, we take
min(Nc.indirectCursor, N.indirectCursor)
as the starting position. This can also be proved from Lemma 1. Taking
$$d(c, a_1) \leq d(c, a_2) \leq \cdots \leq d(c, a_{m_1})$$
and
$$d(c, b_1) \leq d(c, b_2) \leq \cdots \leq d(c, b_{m_2})$$
into account, we can prove that the total sliding distance is bounded by L. In addition, the digital-search-tree property is also very important. Once the search path does not match the prefix of key $c$, we can infer that the search key does not exist. The detailed pseudo-code algorithm is as follows.
Algorithm 3
Searching the given key
    RBTNode Nc; // create a node for key c
    Nc.key = c;
    Nc.directCursor = Nc.indirectCursor = 0;
    int depthCursor = 0; // check if the search path matches the prefix
    bool fGoRight(true); // keep track of the indirect parent node of node Nc
    RBTNode* N = root; // the current node on the root-to-leaf path
    While (N != NULL)
    {
        bool fGoRightOld(fGoRight); // save a copy of fGoRight
        int diffPos; // the MSDB between key c and the current key on the root-to-leaf path
        // compute the beginning sliding position
        If (Nc.directCursor > Nc.indirectCursor)
            diffPos = min(Nc.directCursor, N->directCursor);
        Else
            diffPos = min(Nc.indirectCursor, N->indirectCursor);
        // if the cursor needs to skip backward, the variable diffPos already gives the MSDB
        // at(i) is assumed to return bit i of the key, or -1 once i runs past the last bit
        While (Nc.key.at(diffPos) != -1 && Nc.key.at(diffPos) == N->key.at(diffPos))
            diffPos++; // search the MSDB between c and N->key
        If (Nc.key.at(diffPos) == -1)
            Break; // N->key is what we want to find
        // The following test has two meanings:
        // If Nc.key.at(diffPos) == 1, then c is larger, so we continue the search right-child way.
        // According to the digital-search-tree property, key c should match the search path.
        If (Nc.key.at(depthCursor) == Nc.key.at(diffPos))
        {
            fGoRight = Nc.key.at(diffPos); // if Nc.key.at(diffPos) is 1, go right-child way
            If (fGoRight != fGoRightOld)
                Nc.indirectCursor = Nc.directCursor; // at a turning corner, update the indirect parent
            Nc.directCursor = diffPos; // node N will become the direct parent of Nc in the next step
            If (fGoRight)
                N = N->rightChild;
            Else
                N = N->leftChild;
        }
        Else
        {
            N = NULL; // the tree doesn't contain key c, so terminate.
        }
        depthCursor++; // go to the deeper level
    }
    Return N; // if N is not NULL, we succeed in searching key c; or else, the tree doesn't contain c.
It is easy to find that all the properties of RBT
are used in Algorithm 3. The fields directCursor
and indirectCursor play an important role in obtaining an O(L)-time algorithm. As Figure 2(c)
shows, every sliding always begins at a larger position, while every backward skip directly answers
the MSDB. This ensures that Algorithm 3 runs in
time O(L).
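The effect of stored cursors can be illustrated with a reduced variant of the search. The sketch below is not Algorithm 3: it uses only the field directCursor, which corresponds to the situation of Figure 2(b), so it is correct but does not carry the O(L) bound that the indirect cursor provides. Keys are plain integers, bit positions are 1-based with 0 meaning "no parent", and all names are hypothetical. The example tree again uses an assumed placement of the keys of Figure 1.

```cpp
#include <algorithm>
#include <cstdint>
#include <deque>

constexpr int L = 4;  // word length of the example keys

int bitAt(std::uint32_t key, int pos) { return (key >> (L - pos)) & 1u; }

// First differing bit at or after `start` (1 = leftmost); L + 1 if none.
int msdbFrom(std::uint32_t a, std::uint32_t b, int start) {
    for (int pos = start; pos <= L; ++pos)
        if (bitAt(a, pos) != bitAt(b, pos)) return pos;
    return L + 1;
}

struct Node {
    std::uint32_t key = 0;
    Node* left = nullptr;
    Node* right = nullptr;
    int directCursor = 0;  // MSDB between key and the parent's key (0 at the root)
};

Node* addChild(std::deque<Node>& pool, Node* parent, std::uint32_t key, bool right) {
    pool.emplace_back();
    Node* n = &pool.back();
    n->key = key;
    if (parent != nullptr) {
        n->directCursor = msdbFrom(key, parent->key, 1);
        (right ? parent->right : parent->left) = n;
    }
    return n;
}

Node* buildExample(std::deque<Node>& pool) {
    Node* n12 = addChild(pool, nullptr, 12, false);
    Node* n4 = addChild(pool, n12, 4, false);
    Node* n2 = addChild(pool, n4, 2, false);
    addChild(pool, n2, 1, false);
    addChild(pool, n2, 3, true);
    Node* n5 = addChild(pool, n4, 5, true);
    Node* n7 = addChild(pool, n5, 7, true);
    addChild(pool, n7, 6, false);
    Node* n13 = addChild(pool, n12, 13, true);
    addChild(pool, n13, 15, true);
    return n12;
}

const Node* search(const Node* root, std::uint32_t c) {
    int cursor = 0;  // MSDB between c and the previous key on the path
    int depth = 0;
    for (const Node* n = root; n != nullptr; ++depth) {
        int d;
        if (cursor == 0) d = msdbFrom(c, n->key, 1);                // root: full scan
        else if (cursor != n->directCursor)
            d = std::min(cursor, n->directCursor);                  // known without scanning
        else d = msdbFrom(c, n->key, cursor + 1);                   // slide to the right
        if (d > L) return n;                                        // all bits agree: found
        if (depth >= L) return nullptr;                             // defensive bound
        bool goRight = bitAt(c, d) == 1;                            // c is larger iff its bit is 1
        if (bitAt(c, depth + 1) != (goRight ? 1 : 0)) return nullptr;  // path is no prefix of c
        cursor = d;
        n = goRight ? n->right : n->left;
    }
    return nullptr;
}
```

Searching 0110 in this tree visits 1100, 0100, 0101, 0111 and 0110; at node 0101 the two cursors differ (3 against 4), so the MSDB is read off without inspecting a single bit.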
2.4 Inserting a new key
In this section, we discuss how to insert a new key c without loss of the properties of RBT. Naturally, we use Algorithm 3 to locate key c. Then we push the affected keys down. The last step is to rebuild the fields directCursor and indirectCursor. The process is shown in Figure 3, where the first subfigure shows the simplest case and the other ones show a general insertion process.
Figure 3 Insert a new key c. (a) The simplest case is to insert the key as a leaf; (b) the key c is intended to be inserted at the location of the branch node a1; (c) push the node a1 down until the node a4 blocks the way; (d) the insertion process ends with the node a4 being inserted as a leaf.
First, we use Algorithm 3 to find the location where key c is to be inserted. Three cases may happen. If key c already exists in the tree, stop insertion; if the intended location is at the bottom level, then insert c as a leaf (see Figure 3(a)). Otherwise, there must be a node that blocks the way upon pushing c down. This case results from the mismatch of the search path and the prefix of c. Taking Figure 3(b) as an example, we assume c < a1. To preserve the binary-search-tree property, key c should go left. However, this would destroy the digital-search-tree property. At this point, we have to insert key c at the location of node a1 and push a1 down, such that a1 becomes the minimum key in the right subtree of c. Similarly, if a1 is in turn blocked by another node, we proceed in the same way, as Figure 3(c) and (d) show. Of course, we must maintain the fields directParent, indirectParent, directCursor and indirectCursor. The pseudo-code algorithm is as follows.
Algorithm 4 Inserting a new key
    RBTNode Nc; // create a node for the new key c
    Nc.key = c;
    Nc.directCursor = Nc.indirectCursor = 0;
    // use Algorithm 3 to locate key c
    Case 1: key c already exists, so exit.
    Case 2: Nc can be inserted as a leaf as Figure 3(a) shows.
    Case 3: Nc is blocked by another node Nb; then do as Figure 3(b)–(d) show:
        Step 1. Replace Nb with Nc, and push Nb down.
        Step 2. If Nb can be inserted as a leaf, just insert it and exit; otherwise, go to step 1.
Note that Algorithm 4 only sketches the insertion process. In practice, we implement the function in 167 lines of code. To analyze the time complexity, we observe that for the affected nodes, at most one of the fields directCursor and indirectCursor (accordingly, directParent and indirectParent) needs changing, and the resulting MSDBs are monotonically non-decreasing along the root-to-leaf path. For example, in Figure 3(b)–(d), we have
$$d(c, a_2) \leq d(c, a_3) \leq d(a_1, a_3) \leq d(a_1, a_5) \leq d(a_1, a_6) \leq d(a_4, a_6).$$
According to the discussion in the previous section, we conclude that the total cursor sliding distance for recomputing the affected MSDBs can be bounded by L. Therefore, Algorithm 4 runs in time O(L).
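The push-down step of the insertion can be sketched compactly if the parent pointers and cursors are left aside. The code below is such a simplification, not the authors' 167-line implementation: it maintains only requirements 1) and 2) of Definition 1, compares whole keys instead of sliding cursors, and uses hypothetical names throughout. Whenever the turn demanded by the next bit of the key disagrees with the turn demanded by the key order, the new key takes over the node and the old key continues downward.

```cpp
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

constexpr int L = 4;  // word length of the example keys

struct Node {
    std::uint32_t key;
    std::unique_ptr<Node> left, right;
    explicit Node(std::uint32_t k) : key(k) {}
};

int bitAt(std::uint32_t key, int pos) { return (key >> (L - pos)) & 1u; }

// Inserts `key`; returns false if it is already present.
bool insertKey(std::unique_ptr<Node>& root, std::uint32_t key) {
    std::unique_ptr<Node>* slot = &root;
    for (int depth = 0; *slot != nullptr; ++depth) {
        Node& n = **slot;
        if (key == n.key) return false;
        bool goRight = bitAt(key, depth + 1) == 1;  // turn required by the prefix property
        if ((key > n.key) != goRight)               // key order asks for the other turn
            std::swap(key, n.key);                  // new key stays here, old key is pushed down
        slot = goRight ? &n.right : &n.left;        // the displaced key shares this bit
    }
    *slot = std::make_unique<Node>(key);
    return true;
}

// Same validity check as Definition 1, requirements 1) and 2).
bool isValid(const Node* n, long long lo, long long hi,
             std::uint32_t path, int depth) {
    if (n == nullptr) return true;
    if (depth > L) return false;
    long long k = n->key;
    if (k <= lo || k >= hi) return false;
    if ((n->key >> (L - depth)) != path) return false;
    return isValid(n->left.get(), lo, k, path << 1, depth + 1) &&
           isValid(n->right.get(), k, hi, (path << 1) | 1u, depth + 1);
}

void inOrder(const Node* n, std::vector<std::uint32_t>& out) {
    if (n == nullptr) return;
    inOrder(n->left.get(), out);
    out.push_back(n->key);
    inOrder(n->right.get(), out);
}
```

The swap is safe because the old key and the new key share the path string of the node, so a new key that is smaller yet has next bit 1 forces the old key to have next bit 1 as well; the old key therefore belongs to the right subtree, as described for a1 above.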
2.5 Deleting a given key
Deleting a given key is the inverse process of inserting a new key. It has two steps, namely searching the given key and filling in the node, as is shown in Figure 4.
Figure 4 Delete a given key. (a) The easiest case is to delete a leaf; (b) delete a branch node; (c) in a bubble-like style, we fill in the blank node with the minimum key in the right subtree or the maximum key in the left subtree.
The easiest case is to delete a leaf node as Figure 4(a) shows. In this case, we only need to change its parent's field leftChild or rightChild. Otherwise, the to-be-deleted node is a branch node, at least one of whose left and right subtrees is not empty. Without loss of generality, suppose the to-be-deleted node a1 has a non-empty right subtree, as Figure 4(b) shows. After node a1 is removed, we fill in the blank node with the minimum key of the right subtree such that the binary-search-tree property is still satisfied. But this may yield another blank node. We need to repeat such a process until the blank node is a leaf. In coding, we create a stack in advance to collect the to-be-moved nodes. The pseudo-code algorithm is as follows.
Algorithm 5 Deleting a given key
    Step 1. Search the given key.
    // Suppose that the to-be-deleted node is a branch node.
    Step 2. Following the rule of filling in the blank node with either the minimum key of the right subtree or the maximum key of the left subtree, we collect the to-be-moved keys into a stack.
    Step 3. Repeatedly pop the top element and put it into the corresponding blank node.
    Step 4. Recompute the affected fields directCursor and indirectCursor.
As with Algorithm 3, the affected fields directCursor and indirectCursor are monotonically non-decreasing. Taking Figure 4(c) as an example, we observe that
$$d(a_4, a_2) \leq d(a_4, a_3) \leq d(a_6, a_3) \leq d(a_6, a_5) \leq d(a_8, a_5) \leq d(a_8, a_7).$$
So Algorithm 5 also runs in O(L) time.
We have now finished the 5 essential algorithms for maintaining a dynamic ordered set. They all run in time O(L), so we conclude that RBT-based dynamic ordered sets have an O(L) time complexity.
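The bubble-like filling of the blank node can be sketched as follows. As before, this is a simplification with hypothetical names, not the authors' code: it keeps only requirements 1) and 2) of Definition 1, ignores the parent pointers and cursors of Step 4, and moves keys upward in a loop instead of collecting them in a stack first. The example tree uses an assumed placement of the keys of Figure 1.

```cpp
#include <cstdint>
#include <memory>
#include <vector>

constexpr int L = 4;  // word length of the example keys

struct Node {
    std::uint32_t key;
    std::unique_ptr<Node> left, right;
    explicit Node(std::uint32_t k) : key(k) {}
};

// Empties the node at *slot: pulls up the minimum of the right subtree (or the
// maximum of the left subtree) until the blank position has become a leaf.
void fillBlank(std::unique_ptr<Node>* slot) {
    while ((*slot)->left || (*slot)->right) {
        Node& blank = **slot;
        std::unique_ptr<Node>* next;
        if (blank.right) {
            next = &blank.right;
            while ((*next)->left) next = &(*next)->left;
        } else {
            next = &blank.left;
            while ((*next)->right) next = &(*next)->right;
        }
        blank.key = (*next)->key;  // the moved key keeps the blank node's path as a prefix
        slot = next;               // its old position is the new blank node
    }
    slot->reset();                 // the blank node is a leaf now: unlink it
}

// Deletes `key`; returns false if it is not in the tree.
bool eraseKey(std::unique_ptr<Node>& root, std::uint32_t key) {
    std::unique_ptr<Node>* slot = &root;
    while (*slot && (*slot)->key != key)
        slot = key < (*slot)->key ? &(*slot)->left : &(*slot)->right;
    if (!*slot) return false;
    fillBlank(slot);
    return true;
}

bool isValid(const Node* n, long long lo, long long hi,
             std::uint32_t path, int depth) {
    if (n == nullptr) return true;
    if (depth > L) return false;
    long long k = n->key;
    if (k <= lo || k >= hi) return false;               // search-tree order
    if ((n->key >> (L - depth)) != path) return false;  // path is a prefix of the key
    return isValid(n->left.get(), lo, k, path << 1, depth + 1) &&
           isValid(n->right.get(), k, hi, (path << 1) | 1u, depth + 1);
}

void inOrder(const Node* n, std::vector<std::uint32_t>& out) {
    if (n == nullptr) return;
    inOrder(n->left.get(), out);
    out.push_back(n->key);
    inOrder(n->right.get(), out);
}

std::unique_ptr<Node> buildExample() {
    auto root = std::make_unique<Node>(12);
    root->left = std::make_unique<Node>(4);
    root->left->left = std::make_unique<Node>(2);
    root->left->left->left = std::make_unique<Node>(1);
    root->left->left->right = std::make_unique<Node>(3);
    root->left->right = std::make_unique<Node>(5);
    root->left->right->right = std::make_unique<Node>(7);
    root->left->right->right->left = std::make_unique<Node>(6);
    root->right = std::make_unique<Node>(13);
    root->right->right = std::make_unique<Node>(15);
    return root;
}
```

Unlike deletion in an ordinary binary search tree, a child subtree cannot simply be spliced into the place of the removed node, because every key in it would end up one level higher than its path string allows; moving one key per level keeps the prefix property intact.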
Figure 5 Comparison between quicksort and RBT-based sorting. The keys are respectively 64-bit long and 128-bit long, and the key numbers are respectively 1000, 2000, 4000, 8000, 16000, 32000, 64000 and 128000.
3 Applications
In section 2, we have described the essential algorithms for maintaining a dynamic ordered set. In fact, many classic problems, such as how to search, sort and maintain a priority queue, can be solved with these algorithms. Here, we test the sorting problem to show that our algorithm outperforms traditional sorting algorithms.
As far as average performance is concerned, quick-sort[12] is generally regarded as one of the best sorting algorithms. So we compare the performance of our algorithm with that of quick-sort here. The test is made on a computer with a 3.00 GHz Pentium(R) 4 CPU and 2.0 GB RAM. The keys are respectively 64 bits and 128 bits long, and the key numbers are respectively 1000, 2000, 4000, 8000, 16000, 32000, 64000 and 128000. The experimental results can be seen in Figure 5. For example, when the keys are 128 bits long and there are 128000 of them, quick-sort costs 3140 ms, while RBT-based sorting costs only 1376 ms. Furthermore, the more keys there are, the bigger the advantage of RBT-based sorting. And its running time increases linearly with the number of keys.
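Sorting with an RBT amounts to inserting every key and reading the tree in order. The sketch below shows this for 64-bit keys. It is an illustration under simplifying assumptions, not the program that was timed in the experiment: it compares whole machine words instead of sliding cursors, it drops duplicate keys because the paper assumes distinct keys, and the names `rbtSort`, `insertKey` and `inOrder` are hypothetical.

```cpp
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

constexpr int L = 64;  // word length of the keys

struct Node {
    std::uint64_t key;
    std::unique_ptr<Node> left, right;
    explicit Node(std::uint64_t k) : key(k) {}
};

int bitAt(std::uint64_t key, int pos) {
    return static_cast<int>((key >> (L - pos)) & 1u);
}

// Insertion that keeps the search-tree order and the prefix property.
void insertKey(std::unique_ptr<Node>& root, std::uint64_t key) {
    std::unique_ptr<Node>* slot = &root;
    for (int depth = 0; *slot != nullptr; ++depth) {
        Node& n = **slot;
        if (key == n.key) return;                   // duplicates are dropped
        bool goRight = bitAt(key, depth + 1) == 1;  // each level consumes one bit
        if ((key > n.key) != goRight) std::swap(key, n.key);
        slot = goRight ? &n.right : &n.left;
    }
    *slot = std::make_unique<Node>(key);
}

void inOrder(const Node* n, std::vector<std::uint64_t>& out) {
    if (n == nullptr) return;
    inOrder(n->left.get(), out);
    out.push_back(n->key);
    inOrder(n->right.get(), out);
}

// Returns the distinct keys in increasing order.
std::vector<std::uint64_t> rbtSort(const std::vector<std::uint64_t>& keys) {
    std::unique_ptr<Node> root;
    for (std::uint64_t k : keys) insertKey(root, k);
    std::vector<std::uint64_t> sorted;
    sorted.reserve(keys.size());
    inOrder(root.get(), sorted);
    return sorted;
}
```

Each insertion descends at most L levels, so the number of node visits grows linearly with the number of keys for a fixed word length, which is consistent with the linear trend reported for Figure 5.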
We also believe that data structures such as sets and maps in programming languages can be implemented based on RBT.
4 Conclusion
This paper presents a data structure named RBT, which has both the binary-search-tree property and the digital-search-tree property. Moreover, RBT keeps the internal relation between keys. We then give 5 essential algorithms for maintaining a dynamic ordered set, i.e., 1) retrieving the maximum/minimum key; 2) returning the next-larger/next-smaller key; 3) searching the given key; 4) inserting a new key; and 5) deleting the given key. They all run in time O(L), where L is the word length. Experimental results show that RBT-based algorithms perform well. We believe that the new data structure RBT and RBT-based algorithms will enable us to solve problems concerning order with high efficiency.
1 Cormen T, Leiserson C, Rivest R, et al. Introduction to Algorithms. 2nd ed. Cambridge: MIT Press, 1990. 123–320
2 Li X M, Garzarán M J, Padua D. A dynamically tuned sorting library. In: CGO '04: Proceedings of the International Symposium on Code Generation and Optimization. Palo Alto, California, 2004. 111
3 Graefe G. Implementing sorting in database systems. ACM Comput Surv, 2006, 38(3): 10
4 Andersson A, Thorup M. Dynamic ordered sets with exponential search trees. J ACM, 2007, 54(3): 13
5 Blandford D K, Blelloch G E. Compact representations of ordered sets. In: SODA '04: Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms, New Orleans, 2004. 11–19
6 Sedgewick R. Algorithms. 2nd ed. Massachusetts: Addison-Wesley, 1983. 91–170
7 Andersson A, Hagerup T, Håstad J, et al. The complexity of searching a sorted array of strings. In: STOC '94: Proceedings of the Twenty-sixth Annual ACM Symposium on Theory of Computing, Québec, 1994. 317–325
8 Bentley J L, Sedgewick R. Fast algorithms for sorting and searching strings. In: SODA '97: Proceedings of the Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, New Orleans, 1997. 360–369
9 Siegel D E. All searches are divided into three parts: string searches using ternary trees. In: APL '98: Proceedings of the APL98 Conference on Array Processing Language, Rome, 1998. 57–68
10 Brodal G S. Finger search trees with constant insertion time. In: SODA '98: Proceedings of the Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, San Francisco, 1998. 540–549
11 Andersson A, Thorup M. Dynamic string searching. In: SODA '01: Proceedings of the Twelfth Annual ACM-SIAM Symposium on Discrete Algorithms, Washington, 2001. 307–308
12 Hoare C A R. Algorithm 64: Quicksort. Commun ACM, 1961, 4(7): 321
13 Williams J W J. Algorithm 232: Heapsort. Commun ACM, 1964, 7: 347–348
14 Knuth D E. Fundamental Algorithms. 3rd ed. Massachusetts: Addison-Wesley, 1997. 1–650
15 Penttonen M, Katajainen J. Notes on the complexity of sorting in abstract machines. BIT, 1985, 25(4): 611–622
16 Han Y J. Deterministic sorting in O(n log log n) time and linear space. In: STOC '02: Proceedings of the Thirty-fourth Annual ACM Symposium on Theory of Computing, Quebec, 2002. 602–608
17 Thorup M. Integer priority queues with decrease key in constant time and the single source shortest paths problem. J Comput Syst Sci, 2004, 69(3): 330–353
18 Thorup M. On RAM priority queues. SIAM J Comput, 2000, 30(1): 86–109
19 Arge L, Bender M A, Demaine E D, et al. Cache-oblivious priority queue and graph algorithm applications. In: STOC '02: Proceedings of the Thirty-fourth Annual ACM Symposium on Theory of Computing, Québec, 2002. 268–276
20 Yang L, Song T. The array-based bucket sort algorithm. J Comput Res Devel, 2007, 44(2): 341–347
21 Yang J W, Liu J. Quick page sorting algorithm based on quick sorting. Comput Eng, 2005, 31(4): 82–84
22 Zhong H, Chen Q H, Liu G S. A byte-guick sorting algorithm. Comput Eng, 2002, 28(12): 39–40
23 Huo H W, Xu J. A study on quicksort algorithm. Microelectr Comput, 2002, 19(6): 6–9
24 Tang W T, Mong Goh R S, Thng I L. Ladder queue: An O(1) priority queue structure for large-scale discrete event simulation. ACM Trans Model Comput Simul, 2005, 15(3): 175–204