Parallel Buffer Trees and


Cory Fraser

School of Computer Science

Carleton University, Ottawa, Canada

COMP 5704 Project Presentation

• Computational Model

• Parallel Buffer Trees

• Implementation

• Results

Computational Model

• Sequential Buffer Tree operates in the External Memory (EM) model.

• Minimizes transfers from hard disk -> RAM.

• Parallel Buffer Tree operates in the Parallel External Memory (PEM) model.

• Minimizes transfers from RAM -> CPU cache.

Related Search Data Structures

• Binary Search Trees

• Usually analyzed in RAM/PRAM model.

• O(n log n) build time, O(log n) operation time.

• B-trees

• Analyzed in EM / PEM model.

• O(n log_B n) build time, O(log_B n) operation time.

• Buffer Tree has O((n/B) log_{M/B} (n/B)) build time, where M is the internal memory size.
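
To make the gap between these bounds concrete, the following sketch evaluates the B-tree and Buffer Tree build-time expressions with all constant factors dropped. The function names and the sample parameters are assumptions chosen for illustration; nothing here is measured.

```cpp
#include <cmath>

// Asymptotic I/O estimates, constants ignored.
// n = number of elements, B = block size, M = internal memory size
// (all counted in elements). Values passed in are illustrative only.

// B-tree built by n individual insertions: n * log_B(n).
double btree_build_ios(double n, double B) {
    return n * (std::log(n) / std::log(B));
}

// Buffer tree build: (n/B) * log_{M/B}(n/B).
double buffer_tree_build_ios(double n, double B, double M) {
    return (n / B) * (std::log(n / B) / std::log(M / B));
}
```

With n = 2^30, B = 2^10 and M = 2^20, the two helpers give about 3 * 2^30 and 2^21 respectively, a gap of roughly 1,500x in this constant-free count.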

What is a Parallel Buffer Tree?

• An offline data structure.

• An (a,b)-tree variant.

• Performs tree operations in batches to reduce I/Os.

• Well suited to a continuous, high-volume stream of operations.

Parallel Buffer Tree Complexity

• For sequences of N insert/delete/find(/range) operations:

• O(sort_P(N)) I/Os without range searches.

• O(sort_P(N) + K/PB) I/Os with range searches.

• sort_P(N) = O((N/PB) log_{M/B} (N/B)) I/Os.

• Parallel B-tree needs O((N/P) log_B N) I/Os.

Required Parallel Algorithms

• Parallel sorting for batch operations.

• Parallel merge sort used.

• Parallel prefix sums

• Needed for range query support.

• Distributes batched operations in buckets.
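
One way to see how prefix sums place batched operations into buckets: count the items per bucket, take an exclusive prefix sum of the counts to get each bucket's starting offset, then scatter. The sketch below is written sequentially for readability; in a parallel version each of the three steps would be divided among workers. The function names are hypothetical and are not taken from the project code.

```cpp
#include <cstddef>
#include <vector>

// Exclusive prefix sum: out[i] = counts[0] + ... + counts[i-1].
std::vector<std::size_t> exclusive_prefix_sum(const std::vector<std::size_t>& counts) {
    std::vector<std::size_t> out(counts.size(), 0);
    std::size_t running = 0;
    for (std::size_t i = 0; i < counts.size(); ++i) {
        out[i] = running;
        running += counts[i];
    }
    return out;
}

// Scatter items into one contiguous array grouped by bucket.
// bucket_of[i] is the precomputed bucket index of items[i].
std::vector<int> scatter_into_buckets(const std::vector<int>& items,
                                      const std::vector<std::size_t>& bucket_of,
                                      std::size_t num_buckets) {
    std::vector<std::size_t> counts(num_buckets, 0);
    for (std::size_t b : bucket_of) ++counts[b];

    // next[b] is the next free output slot for bucket b.
    std::vector<std::size_t> next = exclusive_prefix_sum(counts);

    std::vector<int> out(items.size());
    for (std::size_t i = 0; i < items.size(); ++i) {
        out[next[bucket_of[i]]++] = items[i];
    }
    return out;
}
```

Because the offsets are known before any item moves, every item has a unique destination slot, which is what lets the scatter step run without coordination between workers.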

Implementation Overview

• Implemented with the Intel Cilk++ SDK and GCC.

• Available at

• Uses the parallel merge sort covered in class.

• Range query extension not implemented.

Implementation Details

• Buffer tree is an (a,b)-tree with a = f/4, b = f, and f >= PB.

• Each leaf stores up to B elements.

• Each non-leaf has a buffer of size 2fB.

• An internal node with k children has k-1 routing elements that direct values to its children.
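
A minimal sketch of how k-1 routing elements could select one of k children, assuming integer keys and the convention that child i receives values no greater than routing element i (the slides do not state which convention is used). The struct and field names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical internal-node layout: k children, k-1 sorted routing elements.
struct InternalNode {
    std::vector<int> routing;          // size k-1, ascending
    std::vector<std::size_t> children; // size k, indices into some node pool
};

// Index of the child that should receive `value`.
// Assumed convention: child i covers values <= routing[i];
// the last child covers everything above routing.back().
std::size_t child_index(const InternalNode& node, int value) {
    auto it = std::lower_bound(node.routing.begin(), node.routing.end(), value);
    return static_cast<std::size_t>(it - node.routing.begin());
}
```

With routing elements {10, 20, 30}, the value 10 goes to child 0, 11 goes to child 1, and 31 goes to the fourth child (index 3).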

Implementation Details - Operations

• Tree builds up batches of PB operations before executing them.

• An operation consists of its type, value, and timestamp.

• Each batch of PB operations is split into P blocks of B operations, which are sent to the root in parallel.
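
The batching step can be sketched as follows, assuming integer values and a 64-bit counter as the timestamp. The type and function names are hypothetical, and the loop over blocks is sequential here; in the Cilk++ implementation it would presumably be the part that runs in parallel.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class OpType { Insert, Delete, Find };

// An operation as described above: type, value, timestamp.
struct Operation {
    OpType type;
    int value;
    std::uint64_t timestamp;
};

// Split a batch into P contiguous blocks of at most B operations each.
// For a full batch of P*B operations every block holds exactly B entries.
std::vector<std::vector<Operation>> split_batch(const std::vector<Operation>& batch,
                                                std::size_t P, std::size_t B) {
    std::vector<std::vector<Operation>> blocks(P);
    for (std::size_t p = 0; p < P; ++p) {
        std::size_t begin = std::min(p * B, batch.size());
        std::size_t end = std::min(begin + B, batch.size());
        blocks[p].assign(batch.begin() + static_cast<std::ptrdiff_t>(begin),
                         batch.begin() + static_cast<std::ptrdiff_t>(end));
    }
    return blocks;
}
```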

Emptying Non-fringe buffers

• Sort the buffer by value and timestamp.

• Answer Find operations with matching Insert/Delete operations.

• Cancel out matching Insert/Delete operations.

• Distribute buffer elements to children based on the routing elements.

• Recursively empty any child buffer that holds more than fB operations.
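
The first three steps (sort, answer Finds, cancel) can be sketched sequentially as below. Several things here are assumptions rather than details from the slides: values are integers, an Insert followed later by a Delete of the same value removes both (which presumes the value was not already stored further down), otherwise only the most recent update per value is kept, and a Find with no earlier update in the buffer is passed on to be answered lower in the tree. All names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <vector>

enum class OpType { Insert, Delete, Find };

struct Operation {
    OpType type;
    int value;
    std::uint64_t timestamp;
};

struct FindAnswer {
    std::uint64_t timestamp; // identifies which Find was answered
    bool found;
};

// Resolve what can be resolved inside one buffer. Returns the operations
// that still have to move down the tree; locally answered Finds are
// appended to `answers`.
std::vector<Operation> resolve_buffer(std::vector<Operation> buffer,
                                      std::vector<FindAnswer>& answers) {
    // Sort by value, breaking ties by timestamp.
    std::sort(buffer.begin(), buffer.end(),
              [](const Operation& a, const Operation& b) {
                  return std::tie(a.value, a.timestamp) < std::tie(b.value, b.timestamp);
              });

    std::vector<Operation> remaining;
    std::size_t i = 0;
    while (i < buffer.size()) {
        const int value = buffer[i].value;
        bool have_update = false; // is an Insert/Delete for `value` pending?
        Operation pending{};
        for (; i < buffer.size() && buffer[i].value == value; ++i) {
            const Operation& op = buffer[i];
            if (op.type == OpType::Find) {
                if (have_update) {
                    // The latest earlier update decides the answer.
                    answers.push_back({op.timestamp, pending.type == OpType::Insert});
                } else {
                    remaining.push_back(op); // must be answered further down
                }
            } else if (have_update && pending.type == OpType::Insert &&
                       op.type == OpType::Delete) {
                have_update = false; // Insert then Delete: both vanish (assumption)
            } else {
                pending = op; // keep only the most recent update
                have_update = true;
            }
        }
        if (have_update) remaining.push_back(pending);
    }
    return remaining;
}
```

The returned vector stays sorted by value and timestamp, so it can be handed directly to the distribution step that routes operations to children.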

Emptying Fringe Buffers

• Convert all values within children nodes into insert operations with negative infinity timestamp.

• Sort the buffer by value and timestamp.

• Answer Find operations and cancel out matching Insert/Delete operations.

• Based on remaining operations:

• If <= fB then remake child nodes.

• If > fB then create a new sibling for every fB/2 operations.

• Tree rebalancing may be required.
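
Remaking the child nodes amounts to packing the surviving sorted values into leaves of at most B elements. The sketch below shows only that packing step, assuming integer values and B > 0; the function name is hypothetical, and sibling creation and rebalancing are not covered.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Pack sorted values into leaves holding at most B elements each.
// Precondition (assumed): B > 0.
std::vector<std::vector<int>> remake_leaves(const std::vector<int>& sorted_values,
                                            std::size_t B) {
    std::vector<std::vector<int>> leaves;
    for (std::size_t i = 0; i < sorted_values.size(); i += B) {
        std::size_t end = std::min(i + B, sorted_values.size());
        leaves.emplace_back(sorted_values.begin() + static_cast<std::ptrdiff_t>(i),
                            sorted_values.begin() + static_cast<std::ptrdiff_t>(end));
    }
    return leaves;
}
```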

Node Rebalancing

• Test System Specs:

• Quad-core running Fedora 16.

• 12 GB of RAM.

• Sequential comparison structures:

• C++ std::set

• online structure

• Parallel Buffer Tree with 1 worker.


Results So Far – Build Times

• Parallel speedup over the sequential version is high once the input is large enough.

• Performance is not competitive against equivalent online data structures thus far.

• Would need about 12 cores to match std::set.

• May be practical for high volume external memory applications.

• What is an offline data structure?

• What kind of I/O operations is the Parallel External Memory (PEM) model concerned with?

• Why can a Buffer Tree be loaded with N elements faster than a B-tree according to big-O analysis?

• N. Sitchinava, N. Zeh, A Parallel Buffer Tree

• L. Arge, External Memory Data Structures



