RAQUEL B-tree File Stack – B-tree Introduction 21 May 2007 Benoit Maréchal

B-tree introduction

Introduction

A B-tree is a tree data structure that keeps data sorted and allows insertions and deletions in time that grows logarithmically with file size. It is commonly used in databases and filesystems. In this document we will see what a data structure is, then what a tree data structure is, and finally how a B-tree structure works.

Data structure

A data structure is a way of storing data in a computer so that it can be used efficiently. Often a carefully chosen data structure will allow the most efficient algorithm to be used. The choice of the data structure often begins with the choice of an abstract data type. A well-designed data structure allows a variety of critical operations to be performed using as few resources, both execution time and memory space, as possible. Data structures are implemented using the data types, references, and operations on them provided by a programming language.

Different kinds of data structures are suited to different kinds of applications, and some are highly specialized to certain tasks. For example, B-trees are particularly well-suited for implementation of databases, while routing tables rely on networks of machines to function.

In the design of many types of programs, the choice of data structures is a primary design consideration, as experience in building large systems has shown that the difficulty of implementation and the quality and performance of the final result depend heavily on choosing the best data structure. After the data structures are chosen, the algorithms to be used often become relatively obvious. Sometimes things work in the opposite direction: data structures are chosen because certain key tasks have algorithms that work best with particular data structures. In either case, the choice of appropriate data structures is crucial.

Tree data structure

A tree is a widely used data structure that emulates a tree structure with a set of linked nodes.

Figure: A simple example binary tree [W06a]

A node may contain a value or a condition, or represent a separate data structure or a tree of its own. Each node in a tree has zero or more child nodes, which are below it in the tree (by convention, trees grow down, not up as they do in nature). A node with no children is called a leaf. A node that has a child is called the child's parent node (or ancestor node, or superior). A node has at most one parent.

The height of a node is the length of the longest path to a leaf from that node. The height of the root is the height of the tree. The depth of a node is the length of the path to its root.

B-Tree

In B-trees, internal nodes can have a variable number of child nodes within some pre-defined range. When data is inserted into or removed from a node, its number of child nodes changes. In order to maintain the pre-defined range, internal nodes may be joined or split. Because a range of child nodes is permitted, B-trees do not need re-balancing as frequently as other self-balancing search trees, but may waste some space, since nodes are not entirely full. The lower and upper bounds on the number of child nodes are typically fixed for a particular implementation.

A B-tree is kept balanced by requiring that all leaf nodes are at the same depth. This depth increases slowly as elements are added to the tree. An increase in the overall depth is infrequent, and when it happens every leaf node ends up one hop further from the root.
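
The splitting behaviour described above can be illustrated with a minimal sketch. It assumes the minimum-degree formulation (a parameter t, so that a node holds at most 2t - 1 keys and a non-root internal node has between t and 2t children), which is one common way to fix the pre-defined range and is not prescribed by this document. The names Node, BTree, leaf_depths and in_order are hypothetical, and deletion (joining nodes) is left out.

```python
from bisect import bisect_left


class Node:
    """One B-tree node: sorted keys plus child nodes (no children for a leaf)."""

    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []

    @property
    def is_leaf(self):
        return not self.children


class BTree:
    def __init__(self, t=2):
        self.t = t  # assumed minimum degree: at most 2t - 1 keys per node
        self.root = Node()

    def _split_child(self, parent, i):
        """Split the full child parent.children[i] around its median key."""
        t = self.t
        child = parent.children[i]
        median = child.keys[t - 1]
        right = Node(child.keys[t:], child.children[t:])
        child.keys = child.keys[:t - 1]
        child.children = child.children[:t]
        parent.keys.insert(i, median)          # median moves up one level
        parent.children.insert(i + 1, right)   # new sibling sits to the right

    def insert(self, key):
        if len(self.root.keys) == 2 * self.t - 1:
            # Full root: add a new root above it, so every leaf gets
            # one level deeper at the same time.
            self.root = Node(children=[self.root])
            self._split_child(self.root, 0)
        node = self.root
        while not node.is_leaf:
            i = bisect_left(node.keys, key)
            if len(node.children[i].keys) == 2 * self.t - 1:
                # Split full children on the way down so the leaf has room.
                self._split_child(node, i)
                if key > node.keys[i]:
                    i += 1
            node = node.children[i]
        node.keys.insert(bisect_left(node.keys, key), key)


def leaf_depths(node, depth=0):
    """Return the set of depths at which leaves occur below node."""
    if node.is_leaf:
        return {depth}
    return set().union(*(leaf_depths(c, depth + 1) for c in node.children))


def in_order(node):
    """Return all keys below node in sorted order."""
    if node.is_leaf:
        return list(node.keys)
    out = []
    for i, child in enumerate(node.children):
        out.extend(in_order(child))
        if i < len(node.keys):
            out.append(node.keys[i])
    return out


tree = BTree(t=2)
for k in [10, 20, 5, 6, 12, 30, 7, 17]:
    tree.insert(k)
print(in_order(tree.root))     # keys come back in sorted order
print(leaf_depths(tree.root))  # a single depth value: all leaves are level
```

In this sketch the tree only ever grows at the root, which is why the set returned by leaf_depths never contains more than one value.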

Figure: Schema showing the B-Tree structure [W06b]

References

[W06a] Tree (data structure) - Wikipedia, the free encyclopedia, http://en.wikipedia.org/wiki/Tree_data_structure
[W06b] B-tree - Wikipedia, the free encyclopedia, http://en.wikipedia.org/wiki/Btree
Data structure - Wikipedia, the free encyclopedia, http://en.wikipedia.org/wiki/Data_structure
Donald Knuth. The Art of Computer Programming.
Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms, Second Edition. MIT Press and McGraw-Hill, 2001.
Rudolf Bayer. Binary B-Trees for Virtual Memory.