B-tree introduction

RAQUEL B-tree File Stack – B-tree Introduction
21 May 2007
Benoit Maréchal
Introduction
A B-tree is a tree data structure that keeps data sorted and allows searches, insertions and
deletions in time proportional to the logarithm of the file size. It is commonly used in
databases and filesystems.
In this document we will see what a data structure is, then what a tree data structure is, and
finally we will explain how a B-tree structure works.
Data structure
A data structure is a way of storing data in a computer so that it can be used efficiently. Often
a carefully chosen data structure will allow the most efficient algorithm to be used. The
choice of the data structure often begins from the choice of an abstract data structure. A well-designed data structure allows a variety of critical operations to be performed, using as few
resources, both execution time and memory space, as possible. Data structures are
implemented using the data types, references and operations on them provided by a
programming language.
Different kinds of data structures are suited to different kinds of applications, and some are
highly specialized to certain tasks. For example, B-trees are particularly well-suited for
implementation of databases, while networks of machines rely on routing tables to function.
In the design of many types of programs, the choice of data structures is a primary design
consideration, as experience in building large systems has shown that the difficulty of
implementation and the quality and performance of the final result depend heavily on
choosing the best data structure. After the data structures are chosen, the algorithms to be used
often become relatively obvious. Sometimes things work in the opposite direction - data
structures are chosen because certain key tasks have algorithms that work best with particular
data structures. In either case, the choice of appropriate data structures is crucial.
Tree data structure
A tree is a widely-used data structure that emulates a tree structure with a set of linked nodes.
A simple example binary tree [W06a]
A node may contain a value or a condition or represent a separate data structure or a tree of its
own. Each node in a tree has zero or more child nodes, which are below it in the tree (by
convention, trees grow down, not up as they do in nature). A node that has a child is called the
child's parent node (or ancestor node, or superior). A node has at most one parent. The height
of a node is the length of the longest path to a leaf from that node. The height of the root is the
height of the tree. The depth of a node is the length of the path to its root.
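The definitions of height and depth above can be sketched in a few lines of Python. This is only an illustration: the `Node` class and the `height` and `depth` helpers are hypothetical names introduced here, not part of the RAQUEL code, and path length is assumed to be counted in edges.

```python
class Node:
    """A tree node holding a value, a link to its parent and a list of children."""

    def __init__(self, value):
        self.value = value
        self.parent = None      # a node has at most one parent
        self.children = []      # zero or more child nodes

    def add_child(self, child):
        child.parent = self
        self.children.append(child)
        return child


def height(node):
    """Length (in edges) of the longest path from node down to a leaf."""
    if not node.children:
        return 0
    return 1 + max(height(child) for child in node.children)


def depth(node):
    """Length (in edges) of the path from node up to the root."""
    steps = 0
    while node.parent is not None:
        node = node.parent
        steps += 1
    return steps


# Build a small tree: root -> (left, right), left -> leaf
root = Node("root")
left = root.add_child(Node("left"))
right = root.add_child(Node("right"))
leaf = left.add_child(Node("leaf"))
```

With this small tree, the height of the tree is `height(root)`, and a leaf always has height 0 while the root always has depth 0.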
B-Tree
In B-trees, internal nodes can have a variable number of child nodes within some pre-defined
range. When data is inserted into or removed from a node, its number of child nodes changes. In
order to maintain the pre-defined range, internal nodes may be joined or split. Because a range
of child nodes is permitted, B-trees do not need re-balancing as frequently as other self-balancing search trees, but may waste some space, since nodes are not entirely full. The lower
and upper bounds on the number of child nodes are typically fixed for a particular
implementation.
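As a rough sketch of how splitting keeps nodes within the pre-defined range, the Python code below implements search and insertion only. It makes several assumptions that the text does not state: the range is expressed through a minimum degree `t` (every node except the root holds between `t - 1` and `2t - 1` keys), keys are distinct, full nodes are split on the way down, and deletion (which would join nodes) is left out. All class and function names are hypothetical.

```python
import bisect


class BTreeNode:
    def __init__(self, leaf=True):
        self.keys = []        # sorted keys stored in this node
        self.children = []    # child nodes; stays empty for a leaf
        self.leaf = leaf


class BTree:
    def __init__(self, t=2):
        self.t = t            # assumed minimum degree: at most 2t - 1 keys per node
        self.root = BTreeNode()

    def search(self, key, node=None):
        """Return True if key is stored in the subtree rooted at node."""
        if node is None:
            node = self.root
        i = bisect.bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return True
        if node.leaf:
            return False
        return self.search(key, node.children[i])

    def _split_child(self, parent, i):
        """Split the full child parent.children[i] around its median key."""
        t = self.t
        full = parent.children[i]
        right = BTreeNode(leaf=full.leaf)
        median = full.keys[t - 1]
        right.keys = full.keys[t:]          # upper t - 1 keys move right
        full.keys = full.keys[:t - 1]       # lower t - 1 keys stay
        if not full.leaf:
            right.children = full.children[t:]
            full.children = full.children[:t]
        parent.keys.insert(i, median)       # median moves up into the parent
        parent.children.insert(i + 1, right)

    def insert(self, key):
        """Insert a key (assumed not already present)."""
        if len(self.root.keys) == 2 * self.t - 1:
            # A full root is the only case where the tree grows in height.
            new_root = BTreeNode(leaf=False)
            new_root.children.append(self.root)
            self._split_child(new_root, 0)
            self.root = new_root
        node = self.root
        while not node.leaf:
            i = bisect.bisect_right(node.keys, key)
            if len(node.children[i].keys) == 2 * self.t - 1:
                self._split_child(node, i)
                if key > node.keys[i]:
                    i += 1
            node = node.children[i]
        bisect.insort(node.keys, key)


def leaf_depths(node, depth=0):
    """Return the set of depths at which leaves occur below node."""
    if node.leaf:
        return {depth}
    return set().union(*(leaf_depths(c, depth + 1) for c in node.children))


tree = BTree(t=2)
for k in [10, 20, 5, 6, 12, 30, 7, 17, 3, 25]:
    tree.insert(k)
```

Because a split only moves the median key up into the parent, and the height grows only when the root itself is split, `leaf_depths(tree.root)` should always contain a single value.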
A B-tree is kept balanced by requiring that all leaf nodes are at the same depth. This depth
will increase slowly as elements are added to the tree, but an increase in the overall depth is
infrequent, and results in all leaf nodes being one hop further from the root.
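How slowly the depth grows can be made precise with a short worked bound. The symbols are assumptions introduced here, following the usual textbook convention rather than anything defined in this document: $t \ge 2$ is the minimum number of children of a non-root internal node, $n \ge 1$ is the number of keys, and $h$ is the height of the tree.

```latex
% The root holds at least 1 key, so there are at least 2 nodes at depth 1
% and at least 2 t^{i-1} nodes at depth i, each holding at least t - 1 keys.
\begin{align*}
n &\ge 1 + (t - 1) \sum_{i=1}^{h} 2\,t^{\,i-1}
   = 1 + 2\,(t - 1)\,\frac{t^{h} - 1}{t - 1}
   = 2\,t^{h} - 1, \\
h &\le \log_{t} \frac{n + 1}{2}.
\end{align*}
```

For example, under these assumptions a tree with $t = 100$ holding one million keys has height at most $\log_{100}(500000.5) \approx 2.85$, so $h \le 2$.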
Schema showing the B-Tree structure [W06b]
References
[W06b] B-tree - Wikipedia, the free encyclopedia, http://en.wikipedia.org/wiki/Btree
[W06a] Tree (data structure) - Wikipedia, the free encyclopedia,
http://en.wikipedia.org/wiki/Tree_data_structure
Data structure - Wikipedia, the free encyclopedia, http://en.wikipedia.org/wiki/Data_structure
Donald Knuth. The Art of Computer Programming.
Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction
to Algorithms, Second Edition. MIT Press and McGraw-Hill, 2001.
Rudolf Bayer, Binary B-Trees for Virtual Memory.