Lecture Notes - Towson University

COSC 600: Advanced Data structures and Algorithm Analysis
Chapter 4-Lecture Notes
Trees
Trees are a special subset of graphs. A tree is a collection of nodes. A nonempty tree consists of
a) a distinguished node r, called the root
b) zero or more nonempty subtrees T1, T2, ..., Tk, each of whose roots is connected by a
directed edge from r
c) the definition is recursive: each subtree Ti is itself a tree
[Figure: root r with directed edges to the roots of the subtrees T1, T2, ..., Tk]
A tree is a directed graph without a cycle:
1. each node of a tree except the root has exactly one parent node
2. there is exactly one path from the root to each node, so there is no cycle
Path: a sequence of nodes n1, n2, ..., nk such that ni is the parent of ni+1
Length of a path: the number of edges on it
Depth of ni: length of the path from the root to ni
Height of ni: length of the longest path from ni to a leaf node
Leaf node: a node with no children, also called a terminal node
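Linked list representation of a tree
Each node stores a value and links to its children nodes.
[Figure: example tree with nodes A, B, C, D, E, F, G]
First child / next sibling form: each node stores a value, a child link (the root of its first subtree), and a next-sibling link.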
Implementation: using linked lists
[Figure: the example tree drawn as linked lists of child and next-sibling links; "/" marks a null link]
Tree Search (tree traversal)
1) Depth-first search (DFS): uses a stack
2) Breadth-first search (BFS): uses a queue
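The first child / next sibling representation and the two searches can be sketched as follows. This is a minimal sketch, not the course's implementation: the names `TreeNode`, `add_child`, `dfs`, `bfs`, and `build_example` are assumptions, and the example tree shape (A with children B, C, D; B with children E, F; D with child G) is hypothetical because the original figure did not survive.

```python
from collections import deque


class TreeNode:
    """Tree node in first child / next sibling form (names are illustrative)."""

    def __init__(self, value):
        self.value = value
        self.first_child = None   # root of the first subtree
        self.next_sibling = None  # next child of the same parent

    def add_child(self, child):
        """Append child as the last child of this node and return it."""
        if self.first_child is None:
            self.first_child = child
        else:
            last = self.first_child
            while last.next_sibling is not None:
                last = last.next_sibling
            last.next_sibling = child
        return child

    def children(self):
        """Yield the children from first to last by walking the sibling links."""
        node = self.first_child
        while node is not None:
            yield node
            node = node.next_sibling


def dfs(root):
    """Depth-first visit order, driven by an explicit stack."""
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node.value)
        # Push children in reverse so the first child is popped first.
        stack.extend(reversed(list(node.children())))
    return order


def bfs(root):
    """Breadth-first (level by level) visit order, driven by a queue."""
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node.value)
        queue.extend(node.children())
    return order


def build_example():
    """Hypothetical 7-node tree; the shape is assumed, not taken from the notes."""
    a = TreeNode("A")
    b = a.add_child(TreeNode("B"))
    a.add_child(TreeNode("C"))
    d = a.add_child(TreeNode("D"))
    b.add_child(TreeNode("E"))
    b.add_child(TreeNode("F"))
    d.add_child(TreeNode("G"))
    return a
```

The only difference between the two searches is the container: popping from the end of a list gives DFS, while removing from the front of a queue gives BFS.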
Binary tree: a special subset of trees; an ordered tree in which
1) each child of a vertex is distinguished as either a left child or a right child
2) no vertex has more than one left child or more than one right child
That is, a tree in which no node can have more than two children.
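Implementation: each node holds a left link, a value, and a right link (a linked structure pointing to the left and right child nodes).
Eg:
[Figure: example binary tree with nodes A, B, C, E, F stored as linked nodes; "/" marks a null link]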
Note: For N nodes, the average height over all possible binary trees is
O(√N).
Let F be the number of nodes with 2 children, H the number of nodes with 1 child, and
L the number of nodes with no children. Then
1) F + H + L = N (every node is counted once)
2) 2F + H = N - 1 (number of edges: every node except the root has exactly one parent)
From 1) and 2), F = L - 1
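Written out, the step from 1) and 2) to the result is a single subtraction. The symbols F, H, L, and N are the ones defined above:

```latex
\begin{align*}
F + H + L &= N \\
2F + H &= N - 1 \\
(F + H + L) - (2F + H) &= N - (N - 1) \\
L - F &= 1 \quad\Longrightarrow\quad F = L - 1
\end{align*}
```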
Traversal methods for a binary tree
1) Preorder: visit + traversal of left subtree + traversal of right subtree
2) Inorder: traversal of left subtree + visit + traversal of right subtree
3) Postorder: traversal of left subtree + traversal of right subtree + visit
Visit: print the label of the node
[Figure: binary tree with 13 nodes labeled A through M, rooted at A]
Preorder for above tree : ABDGJKCEHFILM
Inorder : DJGKBAHECLIMF
Postorder: JKGDBHELMIFCA
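The three traversals can be sketched recursively. The tree shape in `build_example` is reconstructed from the preorder and inorder sequences listed above (together they determine a binary tree uniquely); the names `Node`, `preorder`, `inorder`, `postorder`, and `build_example` are assumptions made for this sketch.

```python
class Node:
    def __init__(self, label, left=None, right=None):
        self.label = label
        self.left = left
        self.right = right


def preorder(t):
    """Visit, then left subtree, then right subtree."""
    return "" if t is None else t.label + preorder(t.left) + preorder(t.right)


def inorder(t):
    """Left subtree, then visit, then right subtree."""
    return "" if t is None else inorder(t.left) + t.label + inorder(t.right)


def postorder(t):
    """Left subtree, then right subtree, then visit."""
    return "" if t is None else postorder(t.left) + postorder(t.right) + t.label


def build_example():
    """Tree A..M, shape reconstructed from the traversal sequences in the notes."""
    g = Node("G", Node("J"), Node("K"))
    d = Node("D", None, g)          # D has only a right child
    b = Node("B", d, None)          # B has only a left child
    e = Node("E", Node("H"), None)
    i = Node("I", Node("L"), Node("M"))
    f = Node("F", i, None)
    c = Node("C", e, f)
    return Node("A", b, c)
```

Here "visit" appends the label to a string instead of printing it, which makes the three orders easy to compare with the sequences above.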
Expression tree example:
[Figure: expression tree with root +; the left subtree is 3 * (2 + 4) and the right subtree is (8 * 2) / 4]
Postorder: 3 2 4 + * 8 2 * 4 / +
Preorder : + * 3 + 2 4 / * 8 2 4
Inorder : ((3*(2+4))+((8*2)/4))
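Problem: write an algorithm to convert postorder to inorder (postfix to infix)
Eg: a b + c d e + * *
Algorithm: read the tokens one at a time, left to right
If token = operand
Create a one-node tree and push it onto a stack
If token = operator
Pop two trees T1, T2 from the stack and form a new tree whose root is the operator and whose
left and right children are T2 and T1; push the new tree onto the stack
When the input is exhausted, the single tree left on the stack is the expression tree; its fully parenthesized inorder traversal is the infix form.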
The first two symbols are operands, so they are pushed onto the stack.
[Stack, bottom to top: a, b]
Next is the operator +, so a and b are popped and a new tree is formed and pushed onto the stack.
[Stack: (a + b)]
Next c, d, e are read and pushed onto the stack.
[Stack: (a + b), c, d, e]
Next the operator + is read, so the top two trees are merged.
[Stack: (a + b), c, (d + e)]
Next the operator * is read, so we pop two trees and form a new tree with * as the root.
[Stack: (a + b), (c * (d + e))]
Finally the last symbol * is read, the two trees are merged, and the final tree is left on the stack.
[Stack: ((a + b) * (c * (d + e)))]
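The stack algorithm above can be sketched in a few lines. This is an illustrative sketch under stated assumptions: tokens are separated by spaces, only the four binary operators in `OPERATORS` are recognized, and the names `ExprNode`, `build_expression_tree`, and `to_infix` are not from the notes.

```python
OPERATORS = {"+", "-", "*", "/"}  # assumed operator set; all are binary


class ExprNode:
    def __init__(self, token, left=None, right=None):
        self.token = token
        self.left = left
        self.right = right


def build_expression_tree(postfix):
    """Build an expression tree from a space-separated postfix string."""
    stack = []
    for token in postfix.split():
        if token in OPERATORS:
            t1 = stack.pop()  # popped first, becomes the right child
            t2 = stack.pop()  # popped second, becomes the left child
            stack.append(ExprNode(token, t2, t1))
        else:
            stack.append(ExprNode(token))  # operand: one-node tree
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]


def to_infix(t):
    """Fully parenthesized inorder traversal of the expression tree."""
    if t.left is None and t.right is None:
        return t.token
    return "(" + to_infix(t.left) + t.token + to_infix(t.right) + ")"
```

For the example `a b + c d e + * *`, the infix form produced by this sketch is `((a+b)*(c*(d+e)))`.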
Binary search tree:
A subset of binary trees. A binary search tree for a set S is a labeled binary tree in which each
vertex v is labeled by an element l(v) ∈ S such that
1) for each vertex u in the left subtree of v, l(u) < l(v)
2) for each vertex u in the right subtree of v, l(u) > l(v)
3) for each element a ∈ S, there is exactly one vertex v such that l(v) = a
In short: at every node, all keys in the left subtree < the node < all keys in the right subtree, and no two
node values are the same.
Operations:
1)find(search)
2)insert
3)delete
4)findmin/findmax
5)print
Find Operation
• Time complexity: O(height of the binary search tree),
that is, O(N) in the worst case
Example: a degenerate tree (a chain of N nodes)
Height of the tree = N - 1
Thus, order of growth = O(N)
Find (Worst Case Example)
For a chain of N nodes the order of growth is O(N), whether the chain leans left, leans right, or zigzags.
Find Max and Find Min
For the Find Max and Find Min operations, the worst-case time complexity is
O(N).
• In the example BST, L is the smallest value (the leftmost node).
• In the example BST, I is the largest value (the rightmost node).
• For sorting in ascending order use the inorder (LVR) method.
• For sorting in descending order use the reverse inorder (RVL) method.
• Either traversal is O(N).
Traversal & Median Value
• An inorder traversal can be used to find the median value in the BST.
• It is O(N).
• A balanced binary search tree reduces the height-bound operations (find, insert, delete) from O(N) to
O(log N).
• A full traversal of a BST is O(N).
• Recursion can be used for the traversal operation.
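The find, find min / find max, sorting, and median operations described above can be sketched as follows. The node layout and every function name here are assumptions, and the tree in `build_example` is a hypothetical one rather than the tree from the lecture figure.

```python
class BSTNode:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def find(root, key):
    """Return the node holding key, or None; cost is O(height)."""
    node = root
    while node is not None and node.key != key:
        node = node.left if key < node.key else node.right
    return node


def find_min(root):
    """Smallest key: keep following left links (root must not be None)."""
    node = root
    while node.left is not None:
        node = node.left
    return node.key


def find_max(root):
    """Largest key: keep following right links (root must not be None)."""
    node = root
    while node.right is not None:
        node = node.right
    return node.key


def inorder(root, descending=False, out=None):
    """LVR traversal (RVL when descending=True); visits each node once, O(N)."""
    if out is None:
        out = []
    if root is not None:
        first, second = (root.right, root.left) if descending else (root.left, root.right)
        inorder(first, descending, out)
        out.append(root.key)
        inorder(second, descending, out)
    return out


def median(root):
    """Lower median, read off the sorted inorder sequence; O(N)."""
    keys = inorder(root)
    return keys[(len(keys) - 1) // 2]


def build_example():
    # Hypothetical tree, not the one from the lecture figure.
    return BSTNode(8,
                   BSTNode(3, BSTNode(1), BSTNode(6, BSTNode(4), BSTNode(7))),
                   BSTNode(10, None, BSTNode(14, BSTNode(13))))
```

The median here is found by a full traversal, which is why it stays O(N) even when the tree is balanced.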
Insertion operation
Always follow the BST rules while inserting a new node in the tree.
Case 1) The new node always becomes a terminal node.
Example (2 is the new node)
[Figure: BST with root 5 and children 4 and 7; 2 is inserted as the left child of 4]
Case 2) The insertion location is found by following the BST rules down from the root,
so in the worst case the complexity is O(N).
Example (6 is the new node)
[Figure: the same BST; 6 is inserted as the left child of 7]
Time complexity, worst case: O(height of the BST)
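A minimal recursive sketch of the insert operation, assuming duplicates are ignored (the definition above allows each value only once). `BSTNode` and `insert` are illustrative names.

```python
class BSTNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key and return the root of the updated subtree.

    The new node always ends up as a leaf; duplicates are ignored.
    """
    if root is None:
        return BSTNode(key)          # empty spot found: the new leaf goes here
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root
```

Inserting 5, 7, 4, 2 and then 6 with this sketch reproduces the two examples: 2 lands under 4, and 6 lands as the left child of 7.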
Delete Operation:
O(height of BST), which is O(N) in the worst case
There are three possible cases:
i) Case one: the node to delete has no child (a terminal node, or leaf) => just delete it.
ii) Case two: the node to delete has only one child => reconnect the child to the deleted node's parent.
iii) Case three: the node to delete has two children (two subtrees)
a) Find the smallest node in its right subtree and replace the node to delete with it.
Then delete the node that supplied the replacement from the right subtree;
it has no child or only one child, so case one or case two applies.
b) Alternatively, find the largest node in its left subtree and use it in the same way.
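The three delete cases can be sketched as below, using option a) (smallest node of the right subtree) for case three. The names and the example tree are assumptions made for illustration.

```python
class BSTNode:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def delete(root, key):
    """Delete key from the subtree rooted at root and return the new subtree root."""
    if root is None:
        return None                      # key not present: nothing to do
    if key < root.key:
        root.left = delete(root.left, key)
    elif key > root.key:
        root.right = delete(root.right, key)
    elif root.left is not None and root.right is not None:
        # Case three: copy the smallest key of the right subtree into this
        # node, then delete that key from the right subtree.
        successor = root.right
        while successor.left is not None:
            successor = successor.left
        root.key = successor.key
        root.right = delete(root.right, successor.key)
    else:
        # Case one (no child) or case two (one child): splice the node out.
        root = root.left if root.left is not None else root.right
    return root


def inorder(root):
    """Sorted list of keys, used here only to inspect the result."""
    return [] if root is None else inorder(root.left) + [root.key] + inorder(root.right)


def build_example():
    # Hypothetical tree, not taken from the notes.
    return BSTNode(8,
                   BSTNode(3, BSTNode(1), BSTNode(6, BSTNode(4), BSTNode(7))),
                   BSTNode(10, None, BSTNode(14, BSTNode(13))))
```

In the hypothetical tree, deleting 1 exercises case one, deleting 10 exercises case two, and deleting 3 exercises case three (4 replaces 3).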
Example:
Build (construct) a BST for a given N elements:
5, 10, 21, 32, 7
• Repeat insert operations
[Figure: BSTs built by repeated insertion of the elements 5, 7, 10, 21, 32 in different orders]
T(N) = 0 + 1 + 2 + 3 + … + (N-1) = N(N-1)/2, which is not O(N): in the worst case (for example, sorted input) building the tree is O(N^2).
In the average case each insertion is O(log N), so building the tree is O(N log N) (this is also the best case).
The average depth over all nodes in a tree is O(log N),
on the assumption that all insertion sequences are "equally likely".
Sum of the depths of all nodes ≡ internal path length
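Average case analysis:
Insert operation repeatedly
5 4 3 2 1: [Figure: a chain 5, 4, 3, 2, 1, each node the left child of the previous one]
2 5 1 3 4: [Figure: root 2 with children 1 and 5; 3 is the left child of 5; 4 is the right child of 3]
2 4 1 3 5: [Figure: root 2 with children 1 and 4; 4 has children 3 and 5]
Randomly generated BST
N elements have N! possible insertion sequences.
2 1 3 and 2 3 1: [Figure: both sequences produce the same tree, root 2 with children 1 and 3]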
After Θ(N^2) alternating insert/remove pairs a binary search tree is no longer random: its expected depth grows to Θ(√N).
Theorem: The expected number of comparisons (≡ the sum of the depths of the nodes)
needed to insert N random elements into an initially empty BST is O(N log N) for N ≥ 1 (on the
average, roughly).
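For small N the "all insertion sequences equally likely" assumption can be checked by brute force. The sketch below enumerates all N! insertion orders, counts the distinct trees they produce, and averages the internal path length. The nested-list tree encoding and the names `insert`, `build`, `shape`, `internal_path_length`, and `survey` are choices made for this sketch only.

```python
from itertools import permutations


def insert(tree, key):
    """Insert key into a BST stored as nested [key, left, right] lists."""
    if tree is None:
        return [key, None, None]
    index = 1 if key < tree[0] else 2   # 1 = left slot, 2 = right slot
    tree[index] = insert(tree[index], key)
    return tree


def build(order):
    """Build a BST by inserting the keys in the given order."""
    tree = None
    for key in order:
        tree = insert(tree, key)
    return tree


def internal_path_length(tree, depth=0):
    """Sum of the depths of all nodes."""
    if tree is None:
        return 0
    return (depth
            + internal_path_length(tree[1], depth + 1)
            + internal_path_length(tree[2], depth + 1))


def shape(tree):
    """Hashable copy of the tree, so equal trees can be detected."""
    if tree is None:
        return None
    return (tree[0], shape(tree[1]), shape(tree[2]))


def survey(n):
    """Return (insertion orders, distinct trees, average internal path length)."""
    shapes, total, count = set(), 0, 0
    for order in permutations(range(1, n + 1)):
        tree = build(order)
        shapes.add(shape(tree))
        total += internal_path_length(tree)
        count += 1
    return count, len(shapes), total / count
```

For N = 3 there are 3! = 6 insertion orders but only 5 distinct trees, because 2 1 3 and 2 3 1 build the same tree; that balanced tree is therefore twice as likely as each chain.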
Balanced Binary Search Tree (AVL Tree)

An AVL tree is a binary search tree with a balance condition.
The height of an empty subtree is defined to be -1.
• For "every" node in the tree, the heights of its left and right subtrees can differ by at most 1.
This is the balance condition.
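Example:
[Figure: binary search tree with the keys 5, 2, 8, 1, 4, 7, 3 and root 5]
It is a binary search tree and an AVL tree.
After inserting 10,
[Figure: the same tree with 10 added below 8]
It is still an AVL tree.
After inserting 9,
[Figure: the same tree with 9 added below 10]
Still, it is an AVL tree.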
Example:
The tree below is not an AVL tree.
[Figure: binary search tree with the keys 7, 2, 8, 1, 4, 11, 3, 5, 10 and root 7]
NOTE:
Height information is kept for each node.
After each insert operation, update the height information of all the nodes on the path from the newly inserted node
to the root node.
NOTE:
The minimum number of nodes S(h) in an AVL tree of height h is
S(h) = S(h-1) + S(h-2) + 1
[Figure: a root whose two subtrees are minimal AVL trees of heights h-1 and h-2]
S(0)=1
S(1)=2
S(2)=4
S(3)=7
S(4)=12
S(5)=20 [S(3)+S(4)+1]
S(6)=33 [S(4)+S(5)+1]
S(7)=54 [S(5)+S(6)+1]
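The table above can be regenerated directly from the recurrence. `min_avl_nodes` is a name chosen for this sketch; treating a negative height as the empty tree with 0 nodes is an assumption that matches the "height of an empty subtree is -1" convention.

```python
def min_avl_nodes(h):
    """Minimum number of nodes S(h) in an AVL tree of height h.

    Uses S(h) = S(h-1) + S(h-2) + 1 with S(0) = 1 and S(1) = 2.
    A negative height stands for the empty tree, which has 0 nodes.
    """
    if h < 0:
        return 0
    current, following = 1, 2          # S(0), S(1)
    for _ in range(h):
        current, following = following, current + following + 1
    return current
```

Because S(h) grows exponentially in h, a tree with N nodes has height O(log N), which is where the O(log N) bound for AVL operations comes from.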
All the tree operations are O(log N);
insertion, however, needs extra work called "rotations" to keep the tree balanced.
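Example:
Inserting '6'
[Figure: the AVL tree with keys 5, 2, 8, 1, 4, 7, 3 before the insert, and after 6 is inserted as the left child of 7, which unbalances node 8]
Height of each node = max(height of left subtree, height of right subtree) + 1
[Figure: the tree after a single rotation: 7 takes the place of 8, with children 6 and 8]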
NOTE:
After one insert operation, only nodes that are on the path from the insertion point to the root
might have their balance altered.
=> Follow the path from the inserted node up to the root and update the balancing information.
The node that violates the balancing condition of the AVL tree and must be "rebalanced"
is called α (alpha).
CASE 1 An insertion into the left subtree of the left child of α
CASE 2 An insertion into the right subtree of the left child of α
CASE 3 An insertion into the left subtree of the right child of α
CASE 4 An insertion into the right subtree of the right child of α
CASE 1 and CASE 4 are mirror images of each other, as are CASE 2 and CASE 3.
[Figure: the four insertion cases, CASE 1 through CASE 4]
NOTE: (SINGLE ROTATION)
After the rotation, the height of the entire subtree is exactly the same as it was before the insertion.
EXAMPLE
[Figure: binary search tree with the height written beside each node, after an insertion]
Node 6 violates the AVL condition, so node 6 is α.
AVL Tree after Rotation
[Figure: the same tree after the rotation at node 6, with updated heights]
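CASE 1: SINGLE ROTATION
[Figure: single rotation between the unbalanced node and its left child K1; subtrees X, Y, Z]
CASE 2: DOUBLE ROTATION
[Figure: double rotation on the left side among three nodes K; subtrees A, B, C, D]
CASE 3: DOUBLE ROTATION
[Figure: double rotation on the right side among three nodes K; subtrees A, B, C, D]
CASE 4: SINGLE ROTATION
[Figure: single rotation between the unbalanced node and its right child; subtrees X, Y, Z]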
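The four cases can be sketched as a recursive insert that updates heights on the way back up and rotates at the first unbalanced node. This is one common way to write it, not necessarily the version used in the course; all names (`AVLNode`, `height`, `update`, `rotate_with_left_child`, `rotate_with_right_child`, `insert`) are assumptions, and duplicates are ignored.

```python
class AVLNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.height = 0          # a single node has height 0


def height(node):
    """Height of a subtree; the empty subtree has height -1."""
    return node.height if node is not None else -1


def update(node):
    node.height = max(height(node.left), height(node.right)) + 1


def rotate_with_left_child(k2):
    """Single rotation for CASE 1: the left child becomes the subtree root."""
    k1 = k2.left
    k2.left = k1.right
    k1.right = k2
    update(k2)
    update(k1)
    return k1


def rotate_with_right_child(k1):
    """Single rotation for CASE 4: the right child becomes the subtree root."""
    k2 = k1.right
    k1.right = k2.left
    k2.left = k1
    update(k1)
    update(k2)
    return k2


def insert(node, key):
    """Insert key, rebalance on the way back up, and return the subtree root."""
    if node is None:
        return AVLNode(key)
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    else:
        return node                              # duplicate: nothing to do
    update(node)
    balance = height(node.left) - height(node.right)
    if balance > 1:                              # left side too tall
        if key > node.left.key:                  # CASE 2: double rotation
            node.left = rotate_with_right_child(node.left)
        return rotate_with_left_child(node)      # CASE 1 (or second half of CASE 2)
    if balance < -1:                             # right side too tall
        if key < node.right.key:                 # CASE 3: double rotation
            node.right = rotate_with_left_child(node.right)
        return rotate_with_right_child(node)     # CASE 4 (or second half of CASE 3)
    return node
```

A double rotation is written here as two single rotations: the first one turns CASE 2 into CASE 1 (or CASE 3 into CASE 4), and the second one fixes the unbalanced node.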
EXAMPLE
1) To insert 5
[Figure: AVL tree with subtrees X, Y, Z before and after 5 is inserted]
2) Single rotation
3) Single rotation
[Figure: the tree after each single rotation]
SPLAY TREES
* It is a binary search tree but not a balanced BST.
* Relatively simple.
* Guarantees that any M consecutive tree operations,
starting from an empty tree, take at most O(M log N) time.
* This does not mean O(log N) for a single operation; a single operation can still take O(N).
* O(log N) "amortized" (on the average) cost per operation.
IDEA
After a node is accessed, it is pushed to the root by a series of AVL rotations.
WHY?
i) It is likely to be accessed again in the near future. => Locality.
ii) It does not require the maintenance of height/balance information.
METHOD 1
A series of single rotations, bottom up.
[Figure: a node K on a deep access path is rotated with its parent repeatedly until it becomes the root; subtrees A, B, C, D, E, F]
EXAMPLE 1,2,3,4,5,6,7
[Figure: splay tree holding the keys 1 through 7]
PROBLEM
Another node might be pushed almost as deep as K used to be, so a sequence of M operations can take Ω(M·N) time.
METHOD 2
Let X be a (non-root) node on the access path at which we are rotating.
RULE
If the parent of X is the root of the tree,
=> merely rotate X and the root (a single rotation, the last one on the access path).
Otherwise, X has both a parent (P) and a grandparent (G).
CASE 1
ZIG-ZAG CASE (X is a right child and P is a left child, or vice versa): double rotation.
[Figure: zig-zag; G, P, X with subtrees A, B, C, D before, and X as the subtree root with children P and G after]
CASE 2
ZIG-ZIG CASE (X and P are both left children or both right children)
[Figure: zig-zig; the chain G, P, X with subtrees A, B, C, D before, and the reversed chain X, P, G after]
DELETE OPERATION
1) Access the node to be deleted. => It is pushed to the root.
2) Delete it (the root). => Two subtrees TL and TR remain.
3) Find the largest element in TL. => After the access it is the root of TL and has no right child.
4) Make TR the right subtree of that root.
EXAMPLE
To delete 12
[Figure: the splay tree before the delete, and after 12 is accessed and pushed to the root]
[Figure: the subtrees TL and TR after 12 is removed, and the final tree]
After deleting 12, element 10 should be the root.
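The Method 2 rules and the four delete steps can be sketched together. This sketch assumes each node keeps a parent link, which the notes do not specify; `Node`, `rotate_up`, `splay`, `access`, `delete`, `bst_insert`, and `inorder` are illustrative names, and `bst_insert` is a plain non-splaying insert used only to build example trees.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None


def rotate_up(x):
    """Single rotation that lifts x above its parent."""
    p, g = x.parent, x.parent.parent
    if p.left is x:
        p.left, x.right = x.right, p
        if p.left is not None:
            p.left.parent = p
    else:
        p.right, x.left = x.left, p
        if p.right is not None:
            p.right.parent = p
    p.parent, x.parent = x, g
    if g is not None:                     # hook x in where p used to hang
        if g.left is p:
            g.left = x
        else:
            g.right = x


def splay(x):
    """Move x to the root of its tree with zig, zig-zig and zig-zag steps."""
    while x.parent is not None:
        p, g = x.parent, x.parent.parent
        if g is None:
            rotate_up(x)                          # parent is the root: single rotation
        elif (g.left is p) == (p.left is x):
            rotate_up(p)                          # zig-zig: parent first, then x
            rotate_up(x)
        else:
            rotate_up(x)                          # zig-zag: x twice (double rotation)
            rotate_up(x)
    return x


def access(root, key):
    """Search for key and splay the last node reached; returns the new root."""
    node, last = root, None
    while node is not None:
        last = node
        if key == node.key:
            break
        node = node.left if key < node.key else node.right
    return splay(last) if last is not None else None


def delete(root, key):
    """Delete key following steps 1) to 4); returns the new root."""
    root = access(root, key)                      # 1) push the node to the root
    if root is None or root.key != key:
        return root                               # key not present
    tl, tr = root.left, root.right                # 2) remove the root
    if tl is not None:
        tl.parent = None
    if tr is not None:
        tr.parent = None
    if tl is None:
        return tr
    largest = tl
    while largest.right is not None:              # 3) largest element of TL ...
        largest = largest.right
    tl = splay(largest)                           #    ... becomes its root, no right child
    tl.right = tr                                 # 4) attach TR
    if tr is not None:
        tr.parent = tl
    return tl


def bst_insert(root, key):
    """Plain BST insert without splaying, only for building example trees."""
    new = Node(key)
    if root is None:
        return new
    node = root
    while True:
        side = "left" if key < node.key else "right"
        if getattr(node, side) is None:
            setattr(node, side, new)
            new.parent = node
            return root
        node = getattr(node, side)


def inorder(node):
    return [] if node is None else inorder(node.left) + [node.key] + inorder(node.right)
```

Whatever the shape of the tree, once the deleted key is at the root every smaller key is in TL, so the new root is always the largest key below the deleted one, as in the example where 10 replaces 12.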
B-Tree
• Main reason why it's needed?
o Reduces the disk access time (fewer disk accesses)
• M-ary search trees
o M-way branches:
• As M increases, the height decreases
o Each node holds M - 1 keys
o The M-ary search tree must be kept balanced
Properties of B-Tree
(M and L)
• M = number of branches
• L = number of records in each terminal node (block of data)
1. Data items are stored at the leaves
2. Nonleaf nodes store up to M - 1 keys
a. Key i represents the smallest key in subtree i + 1
3. The root is either a leaf or has between 2 and M children
4. All nonleaf nodes (except the root) have between ⌈M/2⌉ and M children
5. All leaves are at the same depth and have between ⌈L/2⌉ and L data items,
for some L
EXAMPLE 1
Assume one block = 8192 bytes (8 KB), each key = 32 bytes,
and each link = 4 bytes.
An M-ary B-tree node holds
M - 1 keys
M links
32(M - 1) + 4M <= 8192, so 36M <= 8224
M = (up to) 228
L = ?
L = 8192 / record size
In this case the record size = 256 bytes,
so
L = 8192 / 256 = 32 records per leaf.
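The arithmetic in Example 1 can be checked with a short sketch. The function names `max_branching` and `records_per_leaf` are assumptions made here; the numbers are the ones given in the example.

```python
def max_branching(block_size, key_size, link_size):
    """Largest M with key_size * (M - 1) + link_size * M <= block_size."""
    # Rearranged: (key_size + link_size) * M <= block_size + key_size
    return (block_size + key_size) // (key_size + link_size)


def records_per_leaf(block_size, record_size):
    """Number of whole records that fit in one block."""
    return block_size // record_size


M = max_branching(block_size=8192, key_size=32, link_size=4)
L = records_per_leaf(block_size=8192, record_size=256)
```

With M = 228 the node uses 32 * 227 + 4 * 228 = 8176 bytes, which fits in the block, while M = 229 would need 8212 bytes and would not.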
EXAMPLE 2
Suppose 10 million records
• Each leaf has between 16 and 32 records
• Each internal node (with the exception of the root) has at least 114
branches
o 10,000,000 / 16 = 625,000 leaves (worst case)
• To calculate the worst-case depth of the B-tree
o Use log functions
• log_114 10,000,000 ≈ 3.4, so about 4 levels
o The actual data is stored in the leaf nodes
o Time complexity is measured by the B-tree height
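The worst-case numbers in Example 2 can be reproduced as follows. This is only the estimate used in the notes (a logarithm with base 114), not an exact level count; the function names are assumptions.

```python
import math


def worst_case_leaves(records, min_records_per_leaf):
    """Most leaves occur when every leaf holds the minimum number of records."""
    return math.ceil(records / min_records_per_leaf)


def depth_estimate(records, min_branches):
    """Estimate of the worst-case depth: log of records to base min_branches."""
    return math.log(records, min_branches)


leaves = worst_case_leaves(10_000_000, 16)
estimate = depth_estimate(10_000_000, 114)
levels = math.ceil(estimate)
```

The estimate is a little over 3, so rounding up gives about 4 levels, and each level costs one disk access.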