2 3 4 Trees

advertisement
2-3-4 Trees
Chapter 10 – Part 1
2-3-4 Trees and Related Topics
1
Introduction
• Multi-way Trees are trees that can have up to four
children and three data items per node.
• 2-3-4 Trees: Very nice features
– Are always balanced. (What does balanced mean??)
– Reasonably easy to program .
–  Serve as an introduction to the understanding of B-Trees!!
• B-Trees: another kind of multi-way tree particularly
useful in organizing external storage, like files.
– B-Trees (later lectures) can have dozens or hundreds of
children with hundreds of thousands of records!
2
Introduction to 2-3-4 Trees
• Shape of nodes is a lozenge-shaped.
• In a 2-3-4 tree, all leaf nodes are at the same
level. (but data can appear in all nodes)
50
30
10 20
40
60
55
62 64
66
70
80
75
83 86
3
• The 2, 3, and 4 in the name refer to how many links to
child nodes can potentially be contained in a given node.
•
• For non-leaf nodes, three arrangements are possible:
– A node with only one data item always has two children
– A node with two data items always has three children
– A node with three data items always has four children.
• For non-leaf nodes with at least one data item ( a node
will not exist with zero data items), the number of links
may be 2, 3, or 4.
4
• Non-leaf nodes must/will always have one
more child (link) than it has data items (see
below);
– Equivalently, if the number of child links is L and the
number of data items is D, then L = D+1.
50
30
10 20
40
60
55
62 64
66
70
80
75
83 86
5
More Introductory stuff
50
30
10 20
40
60
55
62 64
66
70
80
75
83 86
• Critical relationships determine the structure of 2-3-4 trees:
• A leaf node has no children, but can still contain one, two, or three
data items ( 2, 3, or 4 links); cannot be empty. (See figure above)
•  Because a 2-3-4 tree can have nodes with up to four children, it’s
called a multiway tree of order 4.
6
Still More Introductory stuff
• Binary (and Red Black) trees may be referred to as
multiway trees of order 2 - each node can have up to two
children.
• But note: in a binary tree, a node may have up to two child
links ( but one or more may be null).
• In a 2-3-4 tree, nodes with a single link are NOT
permitted;
– a node with one data item must have two links (unless it’s a leaf);
– nodes with two data items must have three children;
– nodes with three data items must have four children.
• (You will see this more clearly once we talk about
how they are actually built.)
7
Even More Introductory stuff
• These numbers are important.
• For data items, a node with one data item:
points to (links to) lower level nodes that
have values less than the value of this item
and a pointer to a node that has values
greater than or equal to this value.
• For nodes with two links: a node with two
links is called a 2-node; a node with three
links is called a 3-node; with four links, a
4-node. (no such thing as a 1-node).
8
50
30
10 20
40
2-node
2-node
55
60
62 64
66
70
4-node
80
75
83 86
Do you see any 2-nodes? 3-nodes? 4-nodes?
Do you see: a node with one data item that has two links?
a node with two data items having three children;
a node with three data items having four children?
9
2-3-4 Tree Organization
• Very different organization than for a binary tree.
• First, we number the data items in a node 0,1,2
and number child links: 0,1,2,3. Very Important.
• Data items are always ascending: left to right in a
node.
• Relationships between data items and child links is
easy to understand but critical for processing.
10
More on 2-3-4 Tree Organization
A
B
Points to nodes w/keys < A ; Nodes with key between A and <B
C
Nodes w/keys between B and < C Nodes w/keys > C
See below: (Equal keys not permitted; leaves all on same level; upper level nodes often not full; tree balanced!
Its construction always maintains its balance, even if you add additional data items. (ahead)
50
30
10 20
40
2-node
2-node
55
60
62 64
66
70
80
75
4-node
83 86
11
Searching a 2-3-4 Tree
•
•
•
•
A very nice feature of these trees.
You have a search key; Go to root.
Retrieve node; search data items;
If hit:
– done.
• Else
– Select the link that leads to the appropriate subtree
with the appropriate range of values.
– If you don’t find your target here, go to next child.
(notice data items are sequential – VIP later)
– etc. Data will ultimately be ‘found’ or ‘not found.’
12
Try it: search for 64, 40, 65
50
30
10 20
40
2-node
2-node
55
60
62 64
66
70
80
75
4-node
83 86
Note: Nodes serve as holders of data and holders of ‘indexes’.
Note: can easily have a ‘no hit’ condition
Note: the sequential nature after indexing…sequential searching within node.
13
So, how do we Insert into this Structure?
• Can be quite easy; sometimes very complex.
– Can do a top-down or a bottom-up approach…
• Easy Approach:
– Start with searching to find a spot for data item.
• We like to insert at the leaf level, but we will take the
top-down approach to get there… So,
– Inserting may very likely involve moving a data item around to
maintain the sequential nature of the data in a leaf.
14
Node Split – a bit more difficult (1 of 2)
• Using a top-down 2-3-4 tree.
• If we encounter a full node in looking for
the insertion point.
– We must split the full nodes.
• You will see that this approach keeps the
tree balanced.
15
Node Split – Insertion: more difficult – 2 of 2
Upon encountering a full node (searching for a place to insert…)
1. split that node at that time.
2. move highest data item from the current (full) node into new node to the right.
3. move middle value of node undergoing the split up to parent node
(Know we can do all this because parent node was not full)
4. Retain lowest item in node.
5. New node (to the right) only has one data item (the highest value)
6. Original node (formerly full) node contains only the lowest of the three values.
7. Rightmost children of original full node are disconnected and connected to
new children as appropriate
(They must be disconnected, since their parent data is changed)
New connections conform to linkage conventions, as expected.
8. Insert new data item into the original leaf node.
Note: there can be multiple splits encountered en route to finding the insertion point.
16
Insert: Here: Split is NOT the root node.
Let’s say we want to add a 99 (from book)…
62
Want to add a data value of 99
Split this node…
… other stuff
83
74
87 89
92
97
99 to be inserted…
104
112
17
Case 1 Insert: Split is NOT the root node
(Let’s say we want to add a 99 (from book)…)
2. 92 moves up to parent node. (We know it was not full)
62
92
3. 83 stays put
… other stuff
83
74
87 89
1. 104 starts a new node
104
97
99
112
4. Two rightmost children
of split node are
reconnected to new node.
5. New data item moved in.
18
If Root itself is full: Split the Root
• Here, the procedure is the same.
• Root is full. Create a sibling
– Highest value data is moved into new sibling;
– first (smallest value) remains in node;
– middle value moves up and becomes data value
in new root.
• Here, two nodes are created:
– A new sibling and a new root.
19
Splitting on the Way Down
• Note: once we hit a node that must be split (on the way
down), we know that when we move a data value ‘up’ that
‘that’ node was not full.
– May be full ‘now,’ but it wasn’t on way down.
• Algorithm is reasonably straightforward.
• Do practice the splits on Figure 10.7.
– Will see later on next exam.
• I strongly recommend working the node splits on page
381. Ensure you understand how they work.
• Just remember:
– 1. You are splitting a 4-node. Node being split has three data
values. Data on the right goes to a new node. Data on the left
remains; data in middle is promoted upward; new data item
is inserted appropriately.
– 2. We do a node split any time we encounter a full node and
when we are trying to insert a new data value.
20
Objects of this Class Represent Data Items Actually Stored.
// tree234.java
import java.io.*;
////////////////////////////////////////////////////////////////
class DataItem
{
This is merely A data item stored at the
nodes. In practice, this might be an entire
record or object.
Here we are only showing the key, where
the key may represent the entire object.
public int dData;
// one data item
//-------------------------------------------------------------public DataItem(int dd) // constructor
{
dData = dd;
}// end constructor
//-------------------------------------------------------------public void displayItem() // display item, format "/27"
{
System.out.print("/"+dData);
} // end displayItem()
//-------------------------------------------------------------} // end class DataItem
21
Following code is for processing that goes on
inside the Nodes themselves – not the entire
tree.
Not trivial code. Try to understand totally.
We will dissect carefully and deliberately!
22
This is what a Node looks like:
Note: two arrays: a child array and an item array.
private static final int ORDER = 4;
Note their size: nodes = 4; item = 3.
private int numItems;
4
The child array is size 4: the links: maximum children.
private Node parent;
3
private Node childArray[] = new Node[ORDER];
The second array, itemArray is of size 3 – the
private DataItem itemArray[] = new DataItem[ORDER-1]; maximum number of data items in a node.
// ------------------------------------------------------------numItems is the number of items in the itemArray.
parent will be used as a reference when inserting.
class Node
{
public void connectChild(int childNum, Node child) {// connect child to this node
childArray[childNum] = child;
if(child != null)
child.parent = this;
}
// ------------------------------------------------------------public Node disconnectChild(int childNum) {// disconnect child from this node, return it
Node tempNode = childArray[childNum];
childArray[childNum] = null;
return tempNode;
Major work done by findItem(), insertItem() and
}
removeItem() (next slides) for a given node.
// ------------------------------------------------------------public Node getChild(int childNum)
These are complex routines and NOT to be confused
{ return childArray[childNum]; }
with find() and insert() for the Tree234 class itself.
// ------------------------------------------------------------These are find() and insert() within THIS node.
public Node getParent()
{ return parent; }
// ------------------------------------------------------------Recall: references are automatically initialized to null
public boolean isLeaf()
and numbers to 0 when their object is created.
{ return (childArray[0]==null) ? true : false; }
So, Node doesn’t need a Constructor.
// ------------------------------------------------------------public int getNumItems()
23
{ return numItems; }
One of three slides of code for class Node
public DataItem getItem(int index) // get DataItem at index
{ return itemArray[index]; }
// ------------------------------------------------------------public boolean isFull()
{ return (numItems==ORDER-1) ? true : false; }
// ------------------------------------------------------------public int findItem(int key) // return index of item (within node)
{
for(int j=0; j<ORDER-1; j++) // if found, otherwise return -1
{
if(itemArray[j] == null)
break;
else
Find routine:
if(itemArray[j].dData == key)
Looking for the data within
return j;
the node where we are located.
}// end for
return -1;
} // end findItem
// ------------------------------------------------------------ public DataItem removeItem() { // removes largest item
Delete Routine
// assumes node not empty
DataItem temp = itemArray[numItems-1]; // use index, save item
Saves the deleted item.
itemArray[numItems-1] = null;
// disconnect it
Sets the location contents to null.
numItems--;
// one less item
Decrements the number of items
return temp;
// return item
at the node.
}
Returns the deleted data item.
// ------------------------------------------------------------public void displayNode() {
// format "/24/56/74/"
for(int j=0; j<numItems; j++)
itemArray[j].displayItem(); // "/56"
System.out.println("/");
// final "/"
24
}
Class Node (continued)
Class Node (continued)
Insert Routine
Increments number of items in node.
Get key of new item.
Now loop.
// ------------------------------------------------------------ public int insertItem(DataItem newItem)
{
// assumes node is not full
numItems++;
int newKey = newItem.dData;
// will add new item
// key (int value) of new item
for(int j=ORDER-2; j>=0; j--) // start on right to examine data
{
// looking for spot to insert
if(itemArray[j] == null)
// if item null, go left one cell.
continue;
// Recall: what does ‘continue’ do?
Go through code and my comments.else
{
// if not null, get its key
Start looking for place to insert the
int itsKey = itemArray[j].dData;
// not necessary, but …
data item. Start on the right and
if(newKey < itsKey)
// if existing key is bigger,
proceed left looking for proper place.
itemArray[j+1] = itemArray[j]; // shift whole node right
else
{ // otherwise, insert new item and return index.+1
itemArray[j+1] = newItem; // copies over moved item…
return j+1;
}
} // end else (not null)
} // end for
// shifted all items,
itemArray[0] = newItem;
// insert new item
return 0;
} // end insertItem()
25
Code for the 2-3-4 Tree itself
26
class Tree234App {
This code is merely the interface for a client.
Pretty easy to follow.
All the complex processing is undertaken at the
tree level and at the node level.
Be certain to recognize this.
Note the ‘break’ in the case statements…
public static void main(String[] args) throws IOException {
long value;
Tree234 theTree = new Tree234();
theTree.insert(50);
theTree.insert(40);
theTree.insert(60);
theTree.insert(30);
theTree.insert(70);
while(true) {
System.out.print("Enter first letter of ");
System.out.print("show, insert, or find: ");
char choice = getChar();
switch(choice) {
case 's':
theTree.displayTree();
break;
case 'i':
System.out.print("Enter value to insert: ");
value = getInt();
theTree.insert(value);
break;
case 'f':
System.out.print("Enter value to find: ");
value = getInt();
int found = theTree.find(value);
if(found != -1)
System.out.println("Found "+value);
else
System.out.println("Could not find "+value);
break;
default:
System.out.print("Invalid entry\n");
} // end switch
27
} // end while
Class Tree234App
(continued)
//-------------------------------------------------------------public static String getString() throws IOException
{
InputStreamReader isr = new InputStreamReader
(System.in);
BufferedReader br = new BufferedReader(isr);
String s = br.readLine();
return s;
}
//-------------------------------------------------------------public static char getChar() throws IOException
{
String s = getString();
return s.charAt(0);
}
//------------------------------------------------------------public static int getInt() throws IOException
{
String s = getString();
return Integer.parseInt(s);
}
//------------------------------------------------------------28
} // end class Tree234App
class Tree234 {
The object of type Tree234 IS the entire tree.private Node root = new Node(); // make root node
Note: tree only has one attribute, its root. public int find(int key) {
This is all it needs.
Node curNode = root;
int childNumber;
while(true) {
if(( childNumber=curNode.findItem(key) ) != -1)
return childNumber; // found; recall findItem returns index v
else if( curNode.isLeaf() )
return -1;
// can't find it
else
// search deeper
Finding, Searching, and Splitting algorithms
curNode = getNextChild(curNode, key);
are all shown here.
} // end while
}// end find()
public void insert(int dValue) { // insert a DataItem
Node curNode = root;
DataItem tempItem = new DataItem(dValue);
while(true)
Call split routine
{
if( curNode.isFull() )
{
split(curNode);
// call to split node
See creation of additional two nodes
curNode = curNode.getParent(); // back up // search onc
curNode = getNextChild(curNode, dValue);
} // end if(node is full)
else if( curNode.isLeaf() ) // if node is leaf, insert data
break;
else // node is not full, not a leaf, so go to lower level
curNode = getNextChild(curNode, dValue);
29
} // end while
Class Tree234
(continued)
Be careful in here.
Remember, the middle value is
moved to parent and must be
disconnected (and made null).
Rightmost element is moved into
new node as leftmost data item.
Review the process and then
note the code that implements
the process.
public void split(Node thisNode) {
// split the node // assumes node is fu
DataItem itemB, itemC; // When you get here, you know you need to split…
Node parent, child2, child3;
int itemIndex;
itemC = thisNode.removeItem(); // remove items from this node.
itemB = thisNode.removeItem(); // Note: these are second and third items
child2 = thisNode.disconnectChild(2); // remove children
These are rightmost
child3 = thisNode.disconnectChild(3); // from this node
two children
Node newRight = new Node();
// make new node
if(thisNode==root)
// if the node we’re looking at is the root,
{
root = new Node();
// make new root
parent = root;
// root is our parent
root.connectChild(0, thisNode); // connect to parent
}
else
// this node to be split is not the root
parent = thisNode.getParent(); // get parent
// deal with parent
itemIndex = parent.insertItem(itemB); // item B to parent
int n = parent.getNumItems();
// total items?
for(int j=n-1; j>itemIndex; j--)
{
// move parent's connections
Node temp = parent.disconnectChild(j); // one child to the right
parent.connectChild(j+1, temp);
}
parent.connectChild(itemIndex+1, newRight); // connect newRight to parent
// deal with newRight
newRight.insertItem(itemC);
// item C to newRight
newRight.connectChild(0, child2); // connect to 0 and 1 30
newRight.connectChild(1, child3); // on newRight
Class Tree234
(continued)
// gets appropriate child of node during search for value
public Node getNextChild(Node theNode, int theValue) {
int j;
// assumes node is not empty, not full, not a leaf
int numItems = theNode.getNumItems();
for(j=0; j<numItems; j++)
// for each item in node
// are we less?
if( theValue < theNode.getItem(j).dData )
return theNode.getChild(j); // return left child
// end for
// we're greater, so
return theNode.getChild(j);
// return right child
}// end getNextChild()
// ------------------------------------------------------------public void displayTree()
{ recDisplayTree(root, 0, 0); }
// ------------------------------------------------------------private void recDisplayTree(Node thisNode, int level, int childNumber) {
System.out.print("level="+level+" child="+childNumber+" ");
thisNode.displayNode();
// display this node
// call ourselves for each child of this node
int numItems = thisNode.getNumItems();
for(int j=0; j<numItems+1; j++)
{
Node nextNode = thisNode.getChild(j);
if(nextNode != null)
recDisplayTree(nextNode, level+1, j);
else
return;
}
} // end recDisplayTree()
// -------------------------------------------------------------\
31
} // end class Tree234
Efficiency Considerations
32
Efficiency Considerations for 2-3-4 Trees
• Searching:
• 2-3-4 Trees: one node must be visited, but
– More data per node / level.
– Searches are fast.
• recognize all data items at node must be checked in a 2-3-4 tree,
• but this is very fast and is done sequentially.
• All nodes in the 2-3-4 tree are NOT always full.
•  Overall, for 2-3-4 trees, the increased number of items (which
increases processing / search times) per node processing tends to
cancel out the increases gained from the decreased height of the tree
and size of the nodes.
– Increased number of data items per node implies: fewer node retrievals.
•  So, the search times for a 2-3-4 tree and for a balanced binary
tree are approximately equal and both are O(log2n)
33
Efficiency Considerations for 2-3-4 Trees
• Storage
• 2-3-4 Trees: a node can have three data items and
up to four references.
• Can be an array of references or four specific
variables.
• IF not all of it is used, can be considerable waste.
•  In 2-3-4 trees, quite common to see many
nodes not full.
34
Download