CPSC 221: Data Structures Lecture #7 Branching Out

advertisement
CPSC 221: Data Structures
Lecture #5
Branching Out
Steve Wolfman
2014W1
1
Today’s Outline
•
•
•
•
•
Binary Trees
Dictionary ADT
Binary Search Trees
Deletion
Some troubling questions
2
Binary Trees
• Binary tree is either
– empty (NULL for us), or
– a datum, a left subtree, and a
right subtree
A
B
• Properties
– max # of leaves:
– max # of nodes:
C
D
E
F
G
H
• Representation:
Data
left
right
pointer pointer
I
J
3
Representation
struct Node {
KTYPE key;
DTYPE data;
Node * left;
Node * right;
};
A
left right
pointerpointer
B
C
left right
pointerpointer
left right
pointerpointer
D
E
F
left right
pointerpointer
left right
pointerpointer
left right
pointerpointer
A
B
D
C
E
F
4
Today’s Outline
•
•
•
•
•
Binary Trees
Dictionary ADT
Binary Search Trees
Deletion
Some troubling questions
5
What We Can Do So Far
• Stack
• List
– Push
– Pop
– Insert
– Remove
– Find
• Queue
– Enqueue
– Dequeue
• Priority Queue (skipped!)
– Insert
– DeleteMin
What’s wrong with Lists?
6
Dictionary ADT
•
• Dictionary operations
–
–
–
–
–
create
destroy
insert
find
delete
– would be tastier with
brownies
insert
• brownies
- tasty
midterm
•
prog-project
– so painful… who designed
this language?
•
find(wolf)
• wolf
- the perfect mix of oomph
wolf
– the perfect mix of oomph
and Scrabble value
and Scrabble value
• Stores values associated with user-specified keys
– values may be any (homogenous) type
– keys may be any (homogenous) comparable type
7
Search/Set ADT
• Dictionary operations
–
–
–
–
–
create
destroy
insert
find
delete
insert
• Min Pin
find(Wolf)
NOT FOUND
•
•
•
•
•
•
•
•
Berner
Whippet
Alsatian
Sarplaninac
Beardie
Sarloos
Malamute
Poodle
• Stores keys
– keys may be any (homogenous) comparable
– quickly tests for membership
8
A Modest Few Uses
•
•
•
•
•
•
•
Arrays and “Associative” Arrays
Sets
Dictionaries
Router tables
Page tables
Symbol tables
C++ Structures
9
Desiderata
• Fast insertion
– runtime:
• Fast searching
– runtime:
• Fast deletion
– runtime:
10
Naïve Implementations
insert
find
delete
• Linked list
• Unsorted array
• Sorted array
worst one… yet so close!
11
Today’s Outline
•
•
•
•
•
Binary Trees
Dictionary ADT
Binary Search Trees
Deletion
Some troubling questions
12
Binary Search Tree
Dictionary Data Structure
• Binary tree property
8
– each node has  2 children
– result:
• storage is small
• operations are simple
• average depth is small*
• Search tree property
– all keys in left subtree
smaller than root’s key
– all keys in right subtree
larger than root’s key
– result:
• easy to find any given key
5
2
11
6
4
10
7
9
12
14
13
13
*Technically:
a result of both properties.
Example and Counter-Example
5
8
4
1
8
7
5
11
3
BINARY SEARCH TREE
2
7
4
11
6
10
15
NOT A
BINARY SEARCH TREE
18
20
21
14
In Order Listing
struct Node {
KTYPE key;
DTYPE data;
Node * left;
Node * right;
};
10
5
15
2
9
7
20
17 30
In order listing:
25791015172030
15
Finding a Node
10
5
15
2
9
7
a.
b.
c.
runtime: d.
e.
20
17 30
O(1)
O(lg n)
O(n)
O(n lg n)
None of these
Node *& find(Comparable key,
Node *& root) {
if (root == NULL)
return root;
else if (key < root->key)
return find(key,
root->left);
else if (key > root->key)
return find(key,
root->right);
else
return root;
}
16
Finding a Node
10
5
15
2
9
7
20
17 30
WARNING: Much fancy footwork with
refs (&) coming. You can do all of this
without refs... just watch out for special
cases.
Node *& find(Comparable key,
Node *& root) {
if (root == NULL)
return root;
else if (key < root->key)
return find(key,
root->left);
else if (key > root->key)
return find(key,
root->right);
else
return root;
}
17
Iterative Find
Node * find(Comparable key,
Node * root) {
while (root != NULL &&
root->key != key) {
if (key < root->key)
root = root->left;
else
root = root->right;
}
10
5
15
2
9
7
20
17 30
return root;
}
Look familiar?
(It’s trickier to get the ref return to work here.)
18
Insert
10
5
15
2
9
7
20
17 30
// Precondition: key is not
// already in the tree!
void insert(Comparable key,
Node * root) {
Node *& target(find(key,
root));
assert(target == NULL);
target = new Node(key);
}
runtime:
19
Funky game we can play with the *& version.
Digression: Value vs. Reference
Parameters
• Value parameters (Object foo)
– copies parameter
– no side effects
• Reference parameters (Object & foo)
– shares parameter
– can affect actual value
– use when the value needs to be changed
• Const reference parameters (Object const & foo)
– shares parameter
– cannot affect actual value
20
– use when the value is too big for copying in pass-by-value
BuildTree for BSTs
• Suppose the data 1, 2, 3, 4, 5, 6, 7, 8, 9 is inserted
into an initially empty BST:
– in order
– in reverse order
– median first, then left median, right median, etc.
so: 5, 3, 8, 2, 4, 7, 9, 1, 6
21
Analysis of BuildTree
• Worst case: O(n2) as we’ve seen
• Average case assuming all orderings equally likely
turns out to be O(n lg n).
22
Bonus: FindMin/FindMax
• Find minimum
10
5
• Find maximum
15
2
9
7
20
17 30
23
Double Bonus: Successor
Find the next larger node
in this node’s subtree.
Node *& succ(Node *& root) {
if (root->right == NULL)
return root->right;
else
return min(root->right);
}
10
5
15
2
Node *& min(Node *& root) {
if (root->left == NULL) return root;
else return min(root->left);
}
9
7
20
17 30
24
More Double Bonus: Predecessor
Find the next smaller node
in this node’s subtree.
Node *& pred(Node *& root) {
if (root->left == NULL)
return root->left;
else
return max(root->left);
}
Node *& max(Node *& root) {
if (root->right == NULL) return root;
else return max(root->right);
}
10
5
15
2
9
7
20
17 30
25
Today’s Outline
• Some Tree Review
(here for reference, not discussed)
• Binary Trees
• Dictionary ADT
• Binary Search Trees
• Deletion
• Some troubling questions
26
Deletion
10
5
15
2
9
7
20
17 30
Why might deletion be harder than insertion?
27
Lazy Deletion
• Instead of physically deleting
nodes, just mark them as
deleted (with a “tombstone”)
+
+
+
–
simpler
physical deletions done in batches
some adds just flip deleted flag
2
small amount of extra memory
for deleted flag
– many tombstones slow finds
– some operations may have to be
modified (e.g., min and max)
10
5
15
9
7
20
17 30
28
Lazy Deletion
Delete(17)
10
Delete(15)
5
Delete(5)
Find(9)
Find(16)
15
2
9
7
20
17 30
Insert(5)
Find(17)
29
Deletion - Leaf Case
10
Delete(17)
5
15
2
9
7
20
17 30
30
Deletion - One Child Case
10
Delete(15)
5
15
2
9
7
20
30
31
Deletion - Two Child Case
10
Delete(5)
5
20
2
9
30
7
32
Finally…
10
7
2
20
9
30
33
Delete Code
void delete(Comparable key, Node *& root) {
Node *& handle(find(key, root));
Node * toDelete = handle;
if (handle != NULL) {
if (handle->left == NULL) {
// Leaf or one child
handle = handle->right;
} else if (handle->right == NULL) { // One child
handle = handle->left;
} else {
// Two child case
Node *& successor(succ(handle));
handle->data = successor->data;
toDelete = successor;
successor = successor->right;
// Succ has <= 1 child
}
delete toDelete;
}
Refs make this short and “elegant”…
}
34
but could be done without them with a bit more work.
Today’s Outline
• Some Tree Review
(here for reference, not discussed)
• Binary Trees
• Dictionary ADT
• Binary Search Trees
• Deletion
• Some troubling questions
35
Thinking about
Binary Search Trees
• Observations
– Each operation views two new elements at a time
– Elements (even siblings) may be scattered in memory
– Binary search trees are fast if they’re shallow
• Realities
– For large data sets, disk accesses dominate runtime
– Some deep and some shallow BSTs exist for any data
36
Solutions?
• Reduce disk accesses?
• Keep BSTs shallow?
37
To Do
• Continue readings on website!
38
Coming Up
Spawns parallel task.
Since we have only
one classroom, one of
these goes first!
•
•
•
•
cilk_spawn Parallelism and Concurrency
cilk_spawn Self-balancing Binary Search Trees
cilk_spawn Priority Queues
cilk_spawn Sorting (most likely!)
•
•
•
•
•
Huge Search Tree Data Structure
cilk_join
cilk_join
cilk_join
cilk_join
39
Download