COIS 3020: Data Structures &
Algorithms II
Alaadin Addas
Week 1
Lecture 1
Introducing Myself
Course Introduction
Overview
Graphs
• Introducing Graphs
• Adjacency Lists
• Demonstration
• Directed Graphs
• Weighted Graphs
• Summary
My name is Alaadin Addas and I will be your instructor for
COIS 3020 this semester.
Introducing
Myself
I hold a B.Sc. in Computer Science with a specialization in
Software Engineering, and an M.Sc. in Computer Science
with a focus on Cyber Security.
I also hold the SSCP and CISSP certifications from ISC².
Over the past 6 years I have been working as a software
engineer while teaching at several institutions, including
Trent University and UOIT (Ontario Tech).
• I worked for a logistics company for 5 years; last year I transitioned to
a Vancouver-based startup SaaS company that makes
employee recognition software.
Syllabus!
Course
Introduction
MS teams join
code: 0to4ndq
Graphs
Graphs
Graphs
• Many items in the world are connected, such as
computers on a network connected by wires (or
wirelessly), cities connected by roads, or people
connected by friendship.
Example Graph
Graphs
• A graph is a data structure for representing
connections among items and consists of vertices
connected by edges.
• A vertex (or node) represents an item in a
graph.
• An edge represents a connection between
two vertices in a graph.
Graphs
• How many vertices does this graph have?
• How many edges?
Graphs – Adjacency & Paths
Two vertices are adjacent if connected
by an edge.
A path is a sequence of edges leading
from a source (starting) vertex to a
destination (ending) vertex.
• The path length is the number of edges in the
path.
The distance between two vertices is the
number of edges on the shortest path
between those vertices.
Graphs - Adjacency & Paths - 1
• Two vertices are adjacent if connected by an edge.
• PC1 and Server1 are adjacent.
• PC1 and Server3 are not adjacent.
Graphs - Adjacency & Paths - 2
• A path is a sequence of edges from a source vertex to a destination
vertex.
Graphs - Adjacency & Paths - 3
• A path's length is the number of edges in the path.
• Vertex distance is the length of the shortest path:
• Distance from PC1 to PC2 is 2.
Graph Applications
• So why do we care? Where can this even be used?
• Graphs are often used to represent a geographic map, which can contain
information about places and travel routes.
• For example: Vertices in a graph can represent airports, with edges
representing available flights.
• Edge weights in such graphs often represent the length of a travel route,
either in total distance or expected time taken to navigate the route.
• For example: A map service with access to real-time traffic information
can assign travel times to road segments.
Graph Applications
• A graph can be used to represent relationships between products.
• Vertices in the graph corresponding to a customer's purchased products
have adjacent vertices representing products that can be recommended to
the customer.
Product Recommendation – Graph Example
• An online shopping service can represent relationships between
products being sold using a graph.
Product Recommendation – Graph Example
• Relationships may be based on the products alone. A game console
requires a TV and games, so the game console vertex connects to TV
and game products.
Product Recommendation – Graph Example
• Vertices representing common kitchen products, such as a blender,
dishwashing soap, kitchen towels, and oven mitts, are also connected.
Why? What is the connection? At first glance it makes no sense.
Product Recommendation – Graph Example
• Connections might also represent common purchases that occur by
chance. Maybe several customers who purchase a Blu-ray player also
purchase a blender.
Product Recommendation – Graph Example
• A customer's purchases can be linked to the product vertices in the
graph.
Product Recommendation – Graph Example
• Adjacent vertices can then be used to provide a list of
recommendations to the customer.
Product Recommendation – Graph Example
• Does this sound familiar?
• You have probably used an online shopping service that has used this
method to recommend things to you.
• Other applications include recommending friends or people to follow on
social media, and the list goes on.
Graph Representations
• Everything we saw looks great when visualized, but how do we represent
graphs in code as a data structure?
• Adjacency Lists
• Adjacency Matrices
Adjacency Lists
• Remember, two vertices are adjacent if they are connected by an edge.
• In an adjacency list graph representation, each vertex has a list of adjacent
vertices, each list item representing an edge.
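To make the idea concrete, here is a minimal Python sketch (the course demo is in C#; the names below are my own, not the demo's): each vertex maps to a list of adjacent vertices, and an undirected edge is recorded in both endpoints' lists.

```python
# Minimal adjacency-list sketch: each vertex maps to a list of neighbors.
def make_graph():
    return {}

def add_vertex(graph, v):
    graph.setdefault(v, [])

def add_edge(graph, u, v):
    # Undirected: the edge appears in u's list and in v's list.
    add_vertex(graph, u)
    add_vertex(graph, v)
    graph[u].append(v)
    graph[v].append(u)

g = make_graph()
add_edge(g, "PC1", "Server1")
add_edge(g, "Server1", "Server3")
print(g["Server1"])  # ['PC1', 'Server3']
```

Note how the edge PC1–Server1 shows up once in PC1's list and once in Server1's list, which is why the total size is O(V + E).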
Adjacency Lists – Visualized – 1
• Each vertex has a list of adjacent vertices for edges.
• The edge connecting A and B appears in A's list and in B's list (unless
the graph is directed, which we will not discuss now).
Adjacency Lists – Visualized – 2
• Each edge appears in the lists of the edge's two vertices.
Advantages of Adjacency Lists
• A key advantage of an adjacency list graph representation is a size of O(V +
E), because each vertex appears once, and each edge appears twice.
• V refers to the number of vertices, E the number of edges.
• However, a disadvantage is that determining whether two vertices are
adjacent is O(V), because one vertex's adjacency list must be traversed
looking for the other vertex, and that list could have V items.
• However, in most applications, a vertex is only adjacent to a small
fraction of the other vertices, yielding a sparse graph.
Sparse Graphs
• A sparse graph has far fewer edges than the maximum possible. Many
graphs are sparse, like those representing a computer network, flights
between cities, or friendships among people (every person isn't friends
with every other person).
• Thus, the adjacency list graph representation is very common.
Adjacency Lists Demonstration
• I am using C# 10, hence any syntax differences you may notice.
• After viewing the demo, how can you add a weight to an edge, try it out!
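One way to attempt the weight exercise (a Python sketch of my own, not the uploaded demo code): store each adjacency-list entry as a (neighbor, weight) pair instead of a bare neighbor.

```python
# Sketch: adjacency list where each entry is a (neighbor, weight) pair.
def add_weighted_edge(graph, u, v, weight):
    # Undirected: record the edge, with its weight, in both lists.
    graph.setdefault(u, []).append((v, weight))
    graph.setdefault(v, []).append((u, weight))

g = {}
add_weighted_edge(g, "A", "B", 5)
add_weighted_edge(g, "B", "C", 2)
print(g["B"])  # [('A', 5), ('C', 2)]
```

In the C# demo the equivalent change would be extending the AddEdge method's parameter list and the list-item type to carry the weight.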
Directed Graphs
• A directed graph, or digraph, consists of vertices connected by directed
edges.
• A directed edge is a connection between a starting vertex and a
terminating vertex.
• In a directed graph, a vertex Y is adjacent to a vertex X, if there is an
edge from X to Y.
• Many graphs are directed, like those representing links between web
pages, maps for navigation, or college course prerequisites.
Paths & Cycles
• In a directed graph:
• A path is a sequence of directed edges leading from a source (starting)
vertex to a destination (ending) vertex.
• A cycle is a path that starts and ends at the same vertex. A directed
graph is cyclic if the graph contains a cycle, and acyclic if the graph does
not contain a cycle.
Paths & Cycles – Visualized – 1
• A path is a sequence of directed edges leading from a source
(starting) vertex to a destination (ending) vertex.
Paths & Cycles – Visualized – 2
• A cycle is a path that starts and ends at the same vertex.
• A graph can have more than one cycle.
Paths & Cycles – Visualized – 3
• A cyclic graph contains a cycle.
• An acyclic graph contains no cycles. Acyclic graphs are sometimes
referred to as DAGs.
Weighted Graphs
• A weighted graph associates a weight with each edge.
• A graph edge's weight, or cost, represents some numerical value between
vertex items, such as flight cost between airports, connection speed
between computers, or travel time between cities.
• A weighted graph may be directed or undirected.
Applications
Path Length - Weighted Graphs
• In a weighted graph, the path length is the sum of the edge weights in
the path.
Negative Edge Weight Cycles
• The cycle length is the sum of the edge weights in a cycle.
• A negative edge weight cycle has a cycle length less than 0.
• A shortest path does not exist in a graph with a negative edge weight cycle,
because each loop around the negative edge weight cycle further decreases
the cycle length, so no minimum exists.
• Where might negative edge weights be applied?
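To make the definition concrete (a tiny sketch with made-up weights, not course code): sum the weights around a cycle and check the sign.

```python
# A cycle's length is the sum of its edge weights; a negative edge weight
# cycle has length < 0, so looping around it again always "shortens" a path.
def cycle_length(edge_weights):
    return sum(edge_weights)

cycle = [2, -5, 1]              # hypothetical cycle edge weights
print(cycle_length(cycle))      # -2
print(cycle_length(cycle) < 0)  # True: a negative edge weight cycle
```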
Summary
• Graphs have many applications:
• Social Networks
• Online Shopping
• Flight Networks
• Computer Networks
• The simplest implementation of a graph involves an array of
LinkedLists.
• Try adding weights and finding the shortest path between two vertices by
altering the uploaded demo code!!
COIS 3020: Data Structures &
Algorithms II
Alaadin Addas
Week 2
Lecture 1
Originally Week 1 – Lecture 2
Recap
Graphs
Overview
• Directed Graphs
• Weighted Graphs
• Path Length
• Adjacency Matrices
Summary
Recap
• Graphs have many applications:
• Social Networks
• Online Shopping
• Flight Networks
• Computer Networks
• The simplest implementation of a graph involves an array of
LinkedLists.
• Last lecture we left off with the demonstration, and I asked you to try
and alter the AddEdge method so that you could also add a weight.
Directed Graphs
• A directed graph, or digraph, consists of vertices connected by directed
edges.
• A directed edge is a connection between a starting vertex and a
terminating vertex.
• In a directed graph, a vertex Y is adjacent to a vertex X, if there is an
edge from X to Y.
• Many graphs are directed, like those representing links between web
pages, maps for navigation, or college course prerequisites.
Paths & Cycles
• In a directed graph:
• A path is a sequence of directed edges leading from a source (starting)
vertex to a destination (ending) vertex.
• A cycle is a path that starts and ends at the same vertex. A directed
graph is cyclic if the graph contains a cycle, and acyclic if the graph does
not contain a cycle.
Paths & Cycles – Visualized – 1
• A path is a sequence of directed edges leading from a source
(starting) vertex to a destination (ending) vertex.
Paths & Cycles – Visualized – 2
• A cycle is a path that starts and ends at the same vertex.
• A graph can have more than one cycle.
Paths & Cycles – Visualized – 3
• A cyclic graph contains a cycle.
• An acyclic graph contains no cycles. Acyclic graphs are sometimes
referred to as DAGs.
Weighted Graphs
• A weighted graph associates a weight with each edge.
• A graph edge's weight, or cost, represents some numerical value between
vertex items, such as flight cost between airports, connection speed
between computers, or travel time between cities.
• A weighted graph may be directed or undirected.
Applications
Path Length - Weighted Graphs
• In a weighted graph, the path length is the sum of the edge weights in
the path.
Negative Edge Weight Cycles
• The cycle length is the sum of the edge weights in a cycle.
• A negative edge weight cycle has a cycle length less than 0.
• A shortest path does not exist in a graph with a negative edge weight cycle,
because each loop around the negative edge weight cycle further decreases
the cycle length, so no minimum exists.
• Where might negative edge weights be applied?
Adjacency Matrices
• Adjacency lists are one way to represent a graph as a data structure.
• Another approach is to use an adjacency matrix.
• Remember, two vertices are adjacent if connected by an edge.
• In an adjacency matrix graph representation, each vertex is assigned to a
matrix row and column, and a matrix element is 1 if the corresponding two
vertices have an edge or is 0 otherwise.
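As a small illustration (a Python sketch under my own naming; the in-class implementation would be a C# 2D array), the matrix is symmetric for an undirected graph and adjacency checks are a single O(1) lookup.

```python
# Adjacency matrix for an undirected graph with vertices indexed 0..n-1.
def make_matrix(n):
    return [[0] * n for _ in range(n)]

def add_edge(m, i, j):
    # The edge is stored twice, keeping the matrix symmetric.
    m[i][j] = 1
    m[j][i] = 1

def adjacent(m, i, j):
    return m[i][j] == 1  # O(1) adjacency check

m = make_matrix(4)
add_edge(m, 0, 1)
print(adjacent(m, 1, 0))  # True
print(adjacent(m, 0, 2))  # False
```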
Adjacency Matrix – Representation – 1
• Each vertex is assigned to a row and column.
Adjacency Matrix – Representation – 2
• An edge connecting A and B is a 1 in A's row and B's column.
Adjacency Matrix – Representation – 3
• Similarly, that same edge is a 1 in B's row and A's column (for the
matrix to be symmetric).
Adjacency Matrix – Representation – 4
• Each edge similarly has two 1's in the matrix. Other matrix elements
are 0's (not shown). We don't need to explicitly store each zero; the
absence of a 1 indicates that an edge does not exist.
Analysis of Adjacency Matrices
• Assuming the common implementation as a two-dimensional array whose elements are
accessible in O(1), an adjacency matrix's key benefit is O(1) determination of
whether two vertices are adjacent:
• The corresponding element is simply checked for 1 or 0.
• A key drawback is O(V²) size.
• For example: A graph with 1000 vertices would require a 1000 x 1000 matrix,
meaning 1,000,000 elements.
• An adjacency matrix's large size is inefficient for a sparse graph, in which most
elements would be 0's.
• An adjacency matrix only represents edges among vertices; if each vertex has data, like a
person's name and address, then a separate list of vertices is needed.
Sparse Graph – Reminder
• Last lecture I mentioned that a sparse graph is one with far fewer edges
than the maximum possible.
• Most graphs are sparse; that is why adjacency lists are preferred.
• Recall the Facebook users example:
• Say the average user has 500 friends (this is a fake number that I
made up; it is probably less).
• There are over 1 billion Facebook users.
• An adjacency matrix would need 1 billion * 1 billion = 1 quintillion elements.
• Most of that matrix will be populated with zeros.
Demonstration
• An adjacency matrix is usually implemented as
a 2D array so it is not a very interesting data
structure to examine.
• We will be looking at a more advanced
variation of the adjacency list. This time we will
include weights!
Summary
• Graphs can be cyclic (contain a cycle) or acyclic
(do not contain a cycle).
• Edges on a graph may be weighted which can
indicate several different things depending on
the application the graph is being used for
(e.g., cost of a plane ticket, connected speed
between nodes, etc..)
• Adjacency lists are much better suited as a data
structure when graphs are sparse.
COIS 3020: Data Structures &
Algorithms II
Alaadin Addas
Week 2
Lecture 2
Recap
Overview
Breadth First Search on Graphs
Depth First Search on Graphs
Summary
Recap
• Graphs can be cyclic (contain a cycle) or acyclic (do not contain a cycle).
• Edges on a graph may be weighted which can indicate several different
things depending on the application the graph is being used for (e.g., cost
of a plane ticket, connection speed between nodes, etc..)
• Adjacency lists are much better suited as a data structure when graphs are
sparse.
Graph Traversal
• An algorithm commonly must visit every vertex in a graph in some order,
known as a graph traversal.
Breadth First Search
• A breadth-first search (BFS) is a traversal that visits a starting vertex, then
all vertices of distance 1 from that vertex, then of distance 2, and so on,
without revisiting a vertex.
Breadth First Search – Visualized – 1
• Breadth-first search starting at A visits vertices based on distance
from A.
• B and D are distance 1.
Breadth First Search – Visualized – 2
• E and F are distance 2 from A.
• Note: A path of length 3 also exists to F, but the distance metric uses
the shortest path.
Breadth First Search – Visualized – 3
• C is distance 3 from A.
Breadth First Search – Visualized – 4
• Breadth-first search from A visits A, then vertices of distance 1, then
2, then 3.
• Note: Visiting order of same-distance vertices doesn't matter (we can
do D or B first, or F first then E).
Breadth First Search – Exercise
• We are going to perform a breadth-first search of the graph below.
• Assume the starting vertex is E.
Breadth First Search – Exercise
• Which vertex is visited first?
• Which vertex is visited
second?
• Which vertex is visited third?
• What is C's distance?
• What is D's distance?
• Is the BFS traversal of a graph
unique (i.e., can we traverse
using a different path?)
Application: Social Network Recommenders
• Social networking sites like Facebook and LinkedIn use graphs to represent
connections among people.
• For a particular user, a site may wish to recommend new connections.
• One approach does a breadth-first search starting from the user,
recommending new connections starting at distance 2.
• Distance 1 users are already connected with the user.
Social Network Recommenders
Application: P2P
• Have you ever downloaded a torrent?
• In a peer-to-peer network, computers are connected via a network and may seek
and download file copies (such as songs or movies) via intermediary computers or
routers.
• For example, one computer may seek the movie "The Wizard of Oz", which
may exist on 10 computers in a network consisting of 100,000 computers.
• Finding the closest computer (having the fewest intermediaries) yields a
faster download speed.
• A BFS traversal of the network graph can find the closest computer with that
movie.
Application: P2P
Breadth First Search – Algorithm
• An algorithm for breadth-first search enqueues the starting vertex in a
queue. (does everyone remember what a queue is?)
• While the queue is not empty, the algorithm dequeues a vertex from the
queue, visits the dequeued vertex, enqueues that vertex's adjacent
vertices (if not already discovered), and repeats.
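The algorithm above can be sketched directly (a minimal Python version; the class demo is C#, and the graph layout below is my own small example mirroring the slides):

```python
from collections import deque

def bfs(graph, start):
    """Visit vertices in breadth-first order; graph maps vertex -> adjacency list."""
    discovered = {start}        # discoveredSet
    frontier = deque([start])   # frontierQueue
    visit_order = []
    while frontier:
        v = frontier.popleft()  # dequeue the next frontier vertex and visit it
        visit_order.append(v)
        for adj in graph[v]:
            if adj not in discovered:  # enqueue only undiscovered vertices
                discovered.add(adj)
                frontier.append(adj)
    return visit_order

g = {"A": ["B", "D"], "B": ["A", "E"], "D": ["A", "E"],
     "E": ["B", "D", "F"], "F": ["E"]}
print(bfs(g, "A"))  # ['A', 'B', 'D', 'E', 'F']
```

Vertices come out grouped by distance from the start: A (distance 0), then B and D (distance 1), then E, then F.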
Queue & Stack Reminder
• Image Credits: Dev community. I lost the link, let it go. Just let it go.
Breadth First Search – Algorithm
• BFS enqueues the start vertex (in this case A) in frontierQueue, and
adds A to discoveredSet.
Breadth First Search – Algorithm
• Vertex A is dequeued from frontierQueue and visited.
Breadth First Search – Algorithm
• Undiscovered vertices adjacent to A are enqueued in frontierQueue
and added to discoveredSet.
Breadth First Search – Algorithm
• Vertex A's visit is complete. The process continues on to next
vertices in the frontierQueue, D and B.
Breadth First Search – Algorithm
• E is visited. Vertices B and F are adjacent to E, but are already in
discoveredSet. So B and F are not again added to frontierQueue or
discoveredSet.
Breadth First Search – Algorithm
• The process continues until frontierQueue is empty. discoveredSet
shows the visit order. Note that each vertex's distance from start is
shown on the graph.
Breadth First Search – Algorithm
• When the BFS algorithm first encounters a vertex, that vertex is said to
have been discovered.
• In the BFS algorithm, the vertices in the queue are called the frontier,
being vertices thus far discovered but not yet visited.
• Because each vertex is visited at most once, an already-discovered
vertex is not enqueued again.
• A "visit" may mean to print the vertex, append the vertex to a list, compare
vertex data to a value and return the vertex if found, etc.
Depth First Search
• Another way to traverse a graph is Depth First Search (DFS).
• A depth-first search is a traversal that visits a starting vertex, then visits
every vertex along each path starting from that vertex to the path's end
before backtracking.
Depth First Search – Visualization – 1
• Depth-first search starting at A descends along a path to the path's
end before backtracking.
Depth First Search – Visualization – 2
• Reach path's end: Backtrack to F, visit F's other adjacent vertex (E).
• B already visited, backtrack again.
Depth First Search – Visualization – 3
• Backtracked all the way to A. Visit A's other adjacent vertex (D).
• No other adjacent vertices: Done.
Depth First Search - Exercise
• Perform a depth-first search of the graph below. Assume the starting
vertex is E.
Depth First Search - Exercise
• Which vertex is visited first?
• Assume DFS traverses the
following vertices: E, A, B.
Which vertex is visited next?
• Assume DFS traverses the
following vertices: E, A, B, C.
Which vertex is visited next?
• Is the following a valid DFS
traversal?
• E, D, F, A, B, C
Depth First Search Algorithm
• A recursive DFS can be implemented using the program stack instead of an
explicit stack.
• The recursive DFS algorithm is first called with the starting vertex.
• If the vertex has not already been visited, the recursive algorithm visits
the vertex and performs a recursive DFS call for each adjacent vertex.
Depth First Search Algorithm
RecursiveDFS(currentV) {
   if ( currentV is not in visitedSet ) {
      Add currentV to visitedSet
      "Visit" currentV
      for each vertex adjV adjacent to currentV {
         RecursiveDFS(adjV)
      }
   }
}
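The pseudocode translates almost line for line (a Python sketch; the graph below is my own small example, not a slide figure):

```python
def dfs(graph, start, visited=None, order=None):
    """Recursive depth-first traversal mirroring the RecursiveDFS pseudocode."""
    if visited is None:
        visited, order = set(), []
    if start not in visited:      # if currentV is not in visitedSet
        visited.add(start)        # add currentV to visitedSet
        order.append(start)       # "visit" currentV
        for adj in graph[start]:  # recurse on each adjacent vertex
            dfs(graph, adj, visited, order)
    return order

g = {"A": ["B", "D"], "B": ["A", "E"], "D": ["A", "E"],
     "E": ["B", "D", "F"], "F": ["E"]}
print(dfs(g, "A"))  # ['A', 'B', 'E', 'D', 'F']
```

Compare with BFS from the same start: DFS follows one path (A, B, E, D) to its end before backtracking to pick up F, rather than visiting by distance.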
Summary
• What is the main difference between BFS and DFS?
• Which is more efficient, how can we tell?
• What are some applications of BFS and DFS?
COIS 3020: Data Structures &
Algorithms II
Alaadin Addas
Week 3
Lecture 1
Overview
• Announcements
• Recap
• Depth First Search
• Demonstration
• Assignment 1
• Summary
Announcements
• As of next week, we will begin in-person learning.
• Please refer to the academic timetable for the location.
• We have to be very flexible (watch out for announcements).
• To my knowledge there will be no audio/video recording (I am not even
sure the room we are in is equipped).
• I pushed back the due date on Assignment 1 slightly.
• Graduate Teaching Assistant: Mohsen Hosseinibaghdadabadi
• mhosseinibaghdadabad@trentu.ca
• Thursday 5PM office hours for the rest of the semester on MS Teams.
Recap
• Last week we discussed the Breadth First Search Algorithm in some detail.
• What are the basic tenets of a BFS?
• Why are Breadth First Searches used?
Recap
Breadth First Search – Algorithm
• Vertex A is dequeued from frontierQueue and visited.
Depth First Search
• Another way to traverse a graph is Depth First Search (DFS).
• A depth-first search is a traversal that visits a starting vertex, then visits
every vertex along each path starting from that vertex to the path's end
before backtracking.
Depth First Search – Visualization – 1
• Depth-first search starting at A descends along a path to the path's
end before backtracking.
Depth First Search – Visualization – 2
• Reach path's end: Backtrack to F, visit F's other adjacent vertex (E).
• B already visited, backtrack again.
Depth First Search – Visualization – 3
• Backtracked all the way to A. Visit A's other adjacent vertex (D).
• No other adjacent vertices: Done.
Depth First Search - Exercise
• Perform a depth-first search of the graph below. Assume the starting
vertex is E.
Depth First Search - Exercise
• Which vertex is visited first?
• Assume DFS traverses the
following vertices: E, A, B.
Which vertex is visited next?
• Assume DFS traverses the
following vertices: E, A, B, C.
Which vertex is visited next?
• Is the following a valid DFS
traversal?
• E, D, F, A, B, C
Depth First Search Algorithm
• A recursive DFS can be implemented using the program stack instead of an
explicit stack.
• The recursive DFS algorithm is first called with the starting vertex.
• If the vertex has not already been visited, the recursive algorithm visits
the vertex and performs a recursive DFS call for each adjacent vertex.
Depth First Search Algorithm
RecursiveDFS(currentV) {
   if ( currentV is not in visitedSet ) {
      Add currentV to visitedSet
      "Visit" currentV
      for each vertex adjV adjacent to currentV {
         RecursiveDFS(adjV)
      }
   }
}
Demonstration
Assignment 1
Summary
• What is the main difference between BFS and DFS?
• Which is more efficient, how can we tell?
• What are some applications of BFS and DFS?
Treap
Cecilia Aragon and Raimund Seidel, 1989
Background
› A treap is a randomly built binary search tree
› The expected time complexity to insert and remove an
item from the treap is O(log n)
› Inspired as a combination of a binary search tree and a
binary heap (hence, treap)
Review
› Property of a binary search tree T
For each node p in T, all values in the
1. left subtree of p are less than the value at p.
2. right subtree of p are greater than the value at p.
› Property of a binary heap T
For each node p in T (except the root), the priority at p
is less than or equal to the priority of its parent.
Data structure
› Like a binary search tree except each node also contains
a random priority
public class Node<T> where T : IComparable
{
    private static Random R = new Random();

    public T       Item     { get; set; }
    public int     Priority { get; set; }
    public Node<T> Left     { get; set; }
    public Node<T> Right    { get; set; }
    ...
}
class Treap<T> : ISearchable<T> where T : IComparable
{
private Node<T> Root; // Reference to the root of the Treap
...
}
Key methods
› void Add (T item)
– inserts item into the treap (duplicate items are not permitted)
› bool Contains (T item)
– Returns true if item is found; false otherwise
– Same as the binary search tree
› void Remove (T item)
– Removes (if possible) item from the treap
Support methods
Node<T> LeftRotate (Node<T> root)
[Diagram: root X has left subtree T1 and right child Y; Y has subtrees T2 and T3.
After the left rotation, temp = Y becomes the new root with X as its left child;
X keeps T1, T2 moves over to become X's right subtree, and T3 stays under Y.]
Support methods
Node<T> RightRotate (Node<T> root)
[Diagram: the mirror image of LeftRotate. The root's left child is rotated up to
become the new root (temp), the old root becomes its right child, and the middle
subtree T2 changes sides to hang under the old root.]
Add
› Basic strategy
– Create a node that contains an item and a random priority.
– Add the node to the treap as you would a binary search tree
using its item to determine ordering.
– Now, using its random priority, rotate the node (either left or
right) up the treap until the parent node has a higher priority or
the root is reached.
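The Add strategy above can be sketched in a few lines (a minimal Python sketch of my own; the course code is C#, and the function names here are assumptions). A fixed priority parameter is allowed so the slide example is reproducible:

```python
import random

class TreapNode:
    def __init__(self, item, priority=None):
        self.item = item
        self.priority = random.random() if priority is None else priority
        self.left = None
        self.right = None

def rotate_right(root):
    temp = root.left
    root.left = temp.right
    temp.right = root
    return temp

def rotate_left(root):
    temp = root.right
    root.right = temp.left
    temp.left = root
    return temp

def add(root, item, priority=None):
    """BST insert, then rotate the new node up while its priority beats its parent's."""
    if root is None:
        return TreapNode(item, priority)
    if item < root.item:
        root.left = add(root.left, item, priority)
        if root.left.priority > root.priority:
            root = rotate_right(root)
    elif item > root.item:
        root.right = add(root.right, item, priority)
        if root.right.priority > root.priority:
            root = rotate_left(root)
    return root  # duplicate items are ignored

# Rebuild the slide example, then add 15 with priority 0.68.
root = None
for item, p in [(50, .83), (20, .47), (60, .25), (10, .34), (40, .42), (30, .14)]:
    root = add(root, item, p)
root = add(root, 15, 0.68)
print(root.item, root.left.item)  # 50 15
```

After the add, 15 has rotated up past 10 and 20 and stopped below 50, matching the slides.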
Consider the following treap (each node shown as Item / Priority)

            50 / 0.83
           /         \
     20 / 0.47     60 / 0.25
     /        \
10 / 0.34   40 / 0.42
            /
       30 / 0.14

The treap satisfies the property of a
1. binary search tree using Item
2. binary heap using Priority
Add 15 with priority 0.68
• BST insert: 15 < 50, 15 < 20, 15 > 10, so 15 becomes the right child of 10.
• 0.68 > 0.34 (parent 10): left rotate, so 15 moves up with 10 as its left child.
• 0.68 > 0.47 (parent 20): right rotate, so 15 moves up with 20 as its right child.
• 0.68 < 0.83 (parent 50): Stop!

            50 / 0.83
           /         \
     15 / 0.68     60 / 0.25
     /        \
10 / 0.34   20 / 0.47
                  \
               40 / 0.42
               /
          30 / 0.14

The treap still satisfies the property of a
1. binary search tree using Item
2. binary heap using Priority
Add 25 with priority 0.99
• BST insert: 25 < 50, 25 > 15, 25 > 20, 25 < 40, 25 < 30, so 25 becomes the
left child of 30.
• 0.99 > 0.14 (parent 30): right rotate, so 25 moves up with 30 as its right child.
• 0.99 > 0.42 (parent 40): right rotate, so 25 moves up with 40 as its right
child (30 becomes 40's left child).
• 0.99 > 0.47 (parent 20): left rotate, so 25 moves up with 20 as its left child.
• 0.99 > 0.68 (parent 15): left rotate, so 25 moves up with 15 as its left
child (20 becomes 15's right child).
• 0.99 > 0.83 (parent 50): right rotate, so 25 becomes the root with 50 as its
right child (the 40 subtree becomes 50's left child).

Final Treap:

            25 / 0.99
           /         \
     15 / 0.68     50 / 0.83
     /       \        /       \
10 / 0.34  20 / 0.47  40 / 0.42  60 / 0.25
                      /
                 30 / 0.14

Yes, the treap still satisfies the property of a
1. binary search tree using Item
2. binary heap using Priority
Exercises
› Give a sequence of n items/priorities that yields a treap with
a worst-case depth of n-1.
› Can any sequence of n items yield a treap of depth n-1
for some sequence of random priorities? Argue why or
why not.
› Add (35, 0.75) to the final treap.
Remove
› Basic strategy
– Find the node that contains the item to remove.
– Rotate the node with its child with the higher priority. If the left
child has the higher priority, rotate right. If the right child has
the higher priority, rotate left. Assume that an empty child has
an arbitrarily small priority.
– Continue to rotate the node downward in this manner until it is a
leaf node. Then simply remove the node (and item).
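The Remove strategy can be sketched the same way (a self-contained Python sketch with my own names; the slide's final treap is built by hand so the rotations are checkable):

```python
class TreapNode:
    def __init__(self, item, priority):
        self.item, self.priority = item, priority
        self.left = self.right = None

def rotate_right(root):
    temp = root.left
    root.left, temp.right = temp.right, root
    return temp

def rotate_left(root):
    temp = root.right
    root.right, temp.left = temp.left, root
    return temp

def remove(root, item):
    """Rotate the target toward its higher-priority child until it is a leaf, then snip it."""
    if root is None:
        return None
    if item < root.item:
        root.left = remove(root.left, item)
    elif item > root.item:
        root.right = remove(root.right, item)
    else:
        if root.left is None and root.right is None:
            return None  # leaf: "snip off" the node
        # An empty child counts as an arbitrarily small priority.
        lp = root.left.priority if root.left else float("-inf")
        rp = root.right.priority if root.right else float("-inf")
        if lp > rp:
            root = rotate_right(root)   # left child has higher priority
            root.right = remove(root.right, item)
        else:
            root = rotate_left(root)    # right child has higher priority
            root.left = remove(root.left, item)
    return root

def node(item, p, left=None, right=None):
    n = TreapNode(item, p)
    n.left, n.right = left, right
    return n

# The final treap from the Add example, then Remove 50.
root = node(25, .99,
            node(15, .68, node(10, .34), node(20, .47)),
            node(50, .83, node(40, .42, node(30, .14)), node(60, .25)))
root = remove(root, 50)
print(root.right.item)  # 40
```

Node 50 rotates right (40 wins), then left (60 wins), then is snipped off as a leaf, matching the slides.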
Remove 50
• Node 50 has children 40 (0.42) and 60 (0.25). The left child has the higher
priority, so rotate right: 40 moves up and 50 becomes 40's right child.
• Node 50 now has only the right child 60 (0.25), so rotate left: 60 moves up
and 50 becomes 60's left child.
• Node 50 is now a leaf: "snip off" the leaf node.

            25 / 0.99
           /         \
     15 / 0.68     40 / 0.42
     /       \        /       \
10 / 0.34  20 / 0.47  30 / 0.14  60 / 0.25
Remove 25
• Node 25 has children 15 (0.68) and 40 (0.42). The left child has the higher
priority, so rotate right: 15 moves up and 25 becomes 15's right child
(inheriting 20 as its left child).
• Node 25 now has children 20 (0.47) and 40 (0.42): rotate right again, so 20
moves up and 25 becomes 20's right child.
• Node 25 now has only the right child 40 (0.42): rotate left, so 40 moves up
and 25 becomes 40's left child (inheriting 30 as its right child).
• Node 25 now has only the right child 30 (0.14): rotate left, so 30 moves up.
• Node 25 is now a leaf: "snip off" the leaf node.
Final treap

       15 / 0.68
      /         \
10 / 0.34    20 / 0.47
                  \
               40 / 0.42
               /       \
          30 / 0.14  60 / 0.25

And the treap still satisfies the property of a
1. binary search tree using Item
2. binary heap using Priority
Hooray!
Exercises
› Remove 15 from the final treap.
› Could you use the traditional way of removing a value from a
binary search tree and apply it to a treap? Show how.
› Re-implement the Remove method without recursion.
› * Given a treap of height h, provide an example that requires
2h rotations to bring the root of the treap to a leaf position.
Expected time complexity
› The time complexity to Add or Remove an item is O(h) where
h is the height of the treap.
– The Add method rotates the inserted item (node) up one level at a
time until it reaches the root in the worst case.
– The Remove method may carry out 2h rotations in the worst case to
bring the root item (node) to a leaf position (see Exercise).
› Because the treap builds a random binary search tree, the
expected height of the tree is O(log n) and therefore, the
expected time complexity of Add and Remove is O(log n).
Skip List
William Pugh, 1989
Background
› A skip list is a randomized, layered linked list that behaves like a randomly built binary search tree
› The expected time complexity to insert and remove an
item from the skip list is O(log n)
› Inspired as a generalization of the singly-linked list
Consider the following linked list
[Diagram: head → 10 → 20 → 30 → 40 → 50 → 60; each node has an Item, a
Height of 1, and a single Next reference, with the last Next null]
In the worst case, it takes O(n) time to:
1) insert an item,
2) find an item, and
3) remove an item
What if we added a level to node 40?
[Diagram: node 40 now has Height 2; head and 40 gain a level-1 Next reference,
so a search can skip from head straight to 40 before dropping to level 0]
What if we added two levels to node 20?
[Diagram: node 20 now has Height 3; at level 1 the chain is head → 20 → 40,
and at level 2 it is head → 20]
What if we added two levels to node 60?
[Diagram: node 60 now has Height 3; at level 1 the chain is head → 20 → 40 → 60,
and at level 2 it is head → 20 → 60]
What if we added three levels to node 50?
[Diagram: node 50 now has Height 4; at level 1 the chain is head → 20 → 40 → 50 → 60,
at level 2 it is head → 20 → 50 → 60, and at level 3 it is head → 50]
Can you see the binary search tree?
[Diagram: the tallest node, 50, acts as the Root. A search descends from head's
top level, moving right while the next item is smaller and dropping down a level
otherwise, just as a binary search tree search branches left or right.]
Methods
› void Insert (T item)
– inserts item into the skip list (duplicate items are permitted)
› bool Contains (T item)
– Returns true if item is found; false otherwise
› void Remove (T item)
– Removes one occurrence of item (if possible)
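Before walking through the slides, here is a compact Python sketch of Insert and Contains (my own names and structure; the course code is C#). The random height distribution and the "capped at one more than maxHeight" rule are the ones described below:

```python
import random

class SkipNode:
    def __init__(self, item, height):
        self.item = item
        self.next = [None] * height  # one forward reference per level

class SkipList:
    def __init__(self):
        self.head = SkipNode(None, 32)  # sentinel head, taller than any node
        self.max_height = 0

    def random_height(self):
        # Height h with probability 1/2^h, capped at one more than max_height.
        h = 1
        while random.random() < 0.5 and h < self.max_height + 1:
            h += 1
        return h

    def insert(self, item):
        h = self.random_height()
        self.max_height = max(self.max_height, h)
        node = SkipNode(item, h)
        cur = self.head
        for i in range(self.max_height - 1, -1, -1):
            while cur.next[i] is not None and cur.next[i].item < item:
                cur = cur.next[i]        # move across
            if i < h:                    # level within the new node's height
                node.next[i] = cur.next[i]
                cur.next[i] = node       # adjust references
            # otherwise just drop down a level

    def contains(self, item):
        cur = self.head
        for i in range(self.max_height - 1, -1, -1):
            while cur.next[i] is not None and cur.next[i].item < item:
                cur = cur.next[i]
        nxt = cur.next[0]
        return nxt is not None and nxt.item == item

sl = SkipList()
for x in [30, 40, 60, 10, 50, 20]:
    sl.insert(x)
print(sl.contains(40), sl.contains(35))  # True False
```

The node heights are random, but level 0 always holds every item in sorted order, so Contains is deterministic regardless of the coin flips.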
Starting from scratch
[Diagram: an empty skip list; head has Next[0..4] references, all null]
Node class: T Value, int Height, Node Next[ ]
maxHeight = 0 // current maximum height of the skip list
The height of a node is randomly
set according to the following distribution
Insert 30

Height   Probability
1        1/2
2        1/4
3        1/8
4        1/16
h        1/2^h

but the height is capped at one more than the
current maxHeight of the skip list
[Diagram: new node 30 is created with Height 1; maxHeight = 0]
Insert 30 (continued)
• maxHeight becomes 1. cur starts at head; level 0 is empty, so the new
node is linked in directly after head:
newNode.Next[i] = cur.Next[i];
cur.Next[i] = newNode;
[Diagram: level 0 is now head → 30; maxHeight = 1]
Insert 40
• Node 40 is created with Height 2; maxHeight becomes 2.
• Starting at the top level, cur moves across while the next item is less than 40:
while (cur.Next[i] != null && cur.Next[i].Item.CompareTo(item) < 0)
cur = cur.Next[i];
• At level 1, head's reference is null, so 40 is linked in directly after head;
at level 0, cur moves across 30 and 40 is linked in after it.
[Diagram: level 1 is head → 40; level 0 is head → 30 → 40; maxHeight = 2]
Insert 60
• Node 60 is created with Height 3; maxHeight becomes 3.
• The search starts at the topmost level and works downward:
for (int i = maxHeight - 1; i >= 0; i--)
• At level 2, head's reference is null, so 60 is linked in directly after head.
• At levels 1 and 0, cur moves across to 40, and 60 is linked in after it.
[Diagram: level 2 is head → 60; level 1 is head → 40 → 60;
level 0 is head → 30 → 40 → 60; maxHeight = 3]
Insert 10
• Node 10 is created with Height 1; maxHeight stays 3.
• At level 2: 10 < 60, move down. At level 1: 10 < 40, move down.
• At level 0: 10 < 30 and the level is within the new node's height, so
adjust the references:
newNode.Next[i] = cur.Next[i];
cur.Next[i] = newNode;
[Diagram: level 0 is now head → 10 → 30 → 40 → 60; maxHeight = 3]
Insert 50
[Diagram: a node of height 4 holding 50 is created and maxHeight becomes 4.
At level 3, cur.Next[3] == null and the height is within range, so the
level-3 reference is adjusted. At levels 2 and 1, 50 < 60 and the height is
within range, so those references are adjusted as well. At level 0,
50 > 40 moves cur across to 40 before the final reference is adjusted,
splicing 50 in between 40 and 60.]
Insert 20
[Diagram: a node of height 3 holding 20 is created; maxHeight stays 4. The
search moves down from the top level past 50, moves across to 10 at the lower
levels, and adjusts the references at levels 2, 1 and 0 to splice 20 in
between 10 and 30.]
Final Skip List
[Diagram: head of height 4 followed by 10, 20, 30, 40, 50, 60 with node
heights 1, 3, 1, 2, 4, 3 respectively; maxHeight = 4]
Phew!
Exercises (using the Final Skip List)
› Insert 35 with a height of 4.
› Insert 45 with a height of 1.
› Insert 40 with a height of 3 (a duplicated item).
› What is the maximum height of the next insertion?
› What is the probability that the height of the next node is 5? Be
careful.
Contains 30
[Diagram: starting at head on level 3, 30 < 50 so cur moves down; on level 2,
30 > 20 so cur moves across to 20, then 30 < 50 moves down again; on level 1,
30 < 40 moves down; on level 0, cur.Next[0].Item == 30. Found!]
Exercises (using the Final Skip List)
› Show the sequence of references that are followed to find:
– 60
– 30
– 10
– 70
– 45
– 35
› Draw two instances of the skip list where the Contains method
takes linear time (i.e., O(n) steps).
Rules of thumb
› Drop down a level when the item you wish to insert, find,
or remove is less than cur.Next[i].Item
– In the case of Insert, references are adjusted first when the
height of the new node is within range
› Move across a level when the item you wish to insert, find,
or remove is greater than cur.Next[i].Item
› When the item is equal to cur.Next[i].Item:
– Insert: references are adjusted first when the height of the
new node is within range (i.e., duplicate items)
– Contains: return true
– Remove: cur.Next[i] = cur.Next[i].Next[i]
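The rules of thumb can be sketched as a compact class. This is a minimal,
hand-driven sketch: node heights are passed in explicitly so the example is
deterministic (the real class generates them randomly), and only int items
are supported.

```csharp
using System;

// Minimal skip list sketch following the rules of thumb above.
class Node
{
    public int Item;
    public Node[] Next;
    public Node(int item, int height) { Item = item; Next = new Node[height]; }
}

class SkipListSketch
{
    Node head = new Node(int.MinValue, 5);  // tall enough for this demo
    int maxHeight = 0;

    public void Insert(int item, int height)    // height supplied for the demo
    {
        if (height > maxHeight) maxHeight = height;
        Node newNode = new Node(item, height);
        Node cur = head;
        for (int i = maxHeight - 1; i >= 0; i--)            // down
        {
            while (cur.Next[i] != null && cur.Next[i].Item.CompareTo(item) < 0)
                cur = cur.Next[i];                          // across
            if (i < height)                                 // within range:
            {                                               // adjust references
                newNode.Next[i] = cur.Next[i];
                cur.Next[i] = newNode;
            }
        }
    }

    public bool Contains(int item)
    {
        Node cur = head;
        for (int i = maxHeight - 1; i >= 0; i--)            // down
        {
            while (cur.Next[i] != null && cur.Next[i].Item.CompareTo(item) < 0)
                cur = cur.Next[i];                          // across
            if (cur.Next[i] != null && cur.Next[i].Item.CompareTo(item) == 0)
                return true;                                // equal: found
        }
        return false;
    }

    static void Main()
    {
        var s = new SkipListSketch();
        s.Insert(30, 1); s.Insert(40, 2); s.Insert(60, 3);
        Console.WriteLine(s.Contains(30));  // True
        Console.WriteLine(s.Contains(45));  // False
    }
}
```

Note that cur is not reset between levels: the across position found on a
higher level carries down to the next level, which is what makes the search
fast.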
Starting with our final skip list
[Diagram: head of height 4 followed by 10, 20, 30, 40, 50, 60 with node
heights 1, 3, 1, 2, 4, 3; maxHeight = 4]
Remove 20
[Diagram: on level 3, 20 < 50 so cur moves down; on level 2, 20 is found, so
the reference is adjusted (cur.Next[2] = cur.Next[2].Next[2]) and cur moves
down; the same happens on level 1; on level 0, 20 > 10 moves cur across to 10
before the final reference is adjusted. The node holding 20 is now unlinked
at every level; maxHeight stays 4.]
Remove 50
[Diagram: on levels 3 and 2, 50 is found immediately from head, so head's
references are adjusted; resetting head.Next[3] to null decreases maxHeight
from 4 to 3.
Note: maxHeight decreases by 1 whenever head.Next[i] is reset to null.
On levels 1 and 0, 50 > 40 moves cur across to 40 before the references are
adjusted to skip to 60. The resulting list holds 10, 30, 40, 60 with node
heights 1, 1, 2, 3 and maxHeight = 3.]
Removing a duplicated item
› To remove one instance of a duplicate item from the skip
list, some interesting and subtle issues arise
› Consider the following skip list where the item 20 is
duplicated 4 times
To remove one instance of 20
[Diagram: the skip list holds 10, 20, 20, 20, 20, 60, where the four
duplicates of 20 have heights 3, 1, 2 and 4; maxHeight = 4]

Remove one instance of 20
[Diagram: instead of unlinking one whole node, the height of the first
duplicate found at each level is decreased by one as the reference at that
level is adjusted. At level 3 the height-4 duplicate drops to height 3, and
since cur.Next[3] == null afterwards, head.Next[3] is reset and maxHeight
decreases from 4 to 3. At levels 2, 1 and 0 the height of the first duplicate
(originally height 3) is decreased step by step until it reaches 0 and that
instance is gone.]

Skip list after removing one instance of 20
[Diagram: the list holds 10, 20, 20, 20, 60 with node heights 1, 1, 2, 3, 3;
maxHeight = 3]
Exercises (using the Final Skip List)
› Remove 60, 10, and 45.
› If maxHeight = 1 then show that Remove behaves exactly as
the Remove method of a singly-linked list.
› If an item is duplicated in nodes of
– increasing height
– decreasing height
show how one instance of the item is removed.
How to determine the random height?
› Strategy:
– Generate a positive random 32-bit integer R
– Let the height be the number of consecutive lower order bits that
are equal to one
– Set the height of a node equal to min(height+1, maxHeight+1)
Example
[Diagram: R is a random 32-bit integer shown in binary. Each step tests the
low-order bit with R & 1; while the bit is 1, height is incremented and R is
right-shifted one place (R >>= 1). Here the three lowest-order bits are 1,
giving height = 1, then 2, then 3. The loop ends when R & 1 == 0, and the
height of the new node is set at 4 (height + 1).]
Implementation
int height = 0;
int R = rand.Next();    // R is a random 32-bit positive integer
while ((height < maxHeight) && ((R & 1) == 1))
{
    height++;           // equal to the number of consecutive lower-order 1-bits
    R >>= 1;            // right shift one bit
}
if (height == maxHeight) maxHeight++;
Node newNode = new Node(item, height + 1);
Expected time complexity
› Determined by the following loop structure
for (int i = maxHeight - 1; i >= 0; i--)        // down
{
    while (cur.Next[i] != null && cur.Next[i].Item.CompareTo(item) < 0)
        cur = cur.Next[i];                      // across
    ...
}
› The expected height of the skip list is O(log n)
– The for loop is expected to iterate O(log n) times
› The number of nodes at level i+1 is expected to be half
the number of nodes at level i. Therefore, for each link at
level i+1, there are an expected 2 links at level i.
– The while loop is expected to iterate O(1) times
› The overall expected time complexity is O(log n) for
Insert, Contains, and Remove since all three methods use
this loop structure
Time Complexity
COIS 3020
• How do we calculate time complexity?
• I have seen many errors throughout the years when it
comes to calculating time complexity.
• I thought I would take some time today to clarify some
concepts when calculating time complexity.
Time
Complexity
• Why?
• Put simply, the time complexity of a certain operation
on a data structure will determine whether it is usable
or not for a particular use case.
• Searching often? Maybe don’t pick a structure
that has a bad search average case.
• You should probably know the time complexities of
common operations on frequently used data
structures by heart.
• I usually look up the more obscure ones.
Time
Complexity –
Great to Very
Bad
• O (1): Constant Time
• O(log n): Logarithmic Time
• O(n): Linear Time
• O(n^2): Quadratic Time
• O(n^3): Cubic Time
Time
Complexity
Visualized
Time
Complexity –
Side Note
• When you have n = 3, it will not matter what the time
complexity is when timing the code. The difference will be
in milliseconds between O(1) and O(n^3) .
• When you have n = 10000000000 it will matter!
Polynomial
Function
Example
• Let us take an example polynomial function and figure out
its time complexity.
• Two golden rules:
• Drop the constant multipliers (we don’t care about
them for time complexity purposes).
• Drop all the lower order terms.
Polynomial
Function
Example
• F(n) = 2n^2 + 3n +1
• What is the time complexity?
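Applying the two golden rules to this example:

```
F(n) = 2n^2 + 3n + 1
     → 2n^2        (drop the lower-order terms 3n and 1)
     → n^2         (drop the constant multiplier 2)
so F(n) is O(n^2)
```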
Simple Loop Example
for (i = 1; i <= n; i++) {
    x = y + b; // constant time
}
What is the time complexity?
Remember: neglect the constant!

Nested Loop Example
for (i = 1; i <= n; i++) {
    for (j = 1; j <= n; j++) {
        x = y + whatever; // constant
    }
}
What is the time complexity?
Sequential Statements
• What if we had sequential statements?
• The same rules apply!

If-Else Statements
if (some condition) {
    loop; // O(n)
} else {
    nested loop; // O(n^2)
}
What is the time complexity?
Time Complexity vs. Space Complexity
• Space complexity is also something that you should look into.
• In some cases algorithms take less time but more space, and sometimes
you have the time but not the space (e.g., limited-resource
environments).
Augmented Treap
Background
An augmented data structure (like an augmented treap or
an interval tree) is built upon an underlying data structure.
New information (data) is added which:
– Is easily maintained and modified by the existing operations of
the underlying data structure and
– Is used to efficiently support new operations.
Process of augmenting a data structure*
1. Choose an underlying data structure
2. Determine the additional information (data) to maintain
in the underlying data structure
3. Verify that the additional information can be easily
maintained by the existing operations (like Insert and
Remove)
4. Develop new operations
* Section 14.2 of CLRS
Rationale for an augmented treap
› Suppose we wish to add two methods to the treap: one
that returns the item of a given rank and the other that
returns the rank of a given item
public T Rank(int i) { ... }
public int Rank(T item) { ... }
› The rank of an item is defined as its position in the linear
order of items
Using the original treap
› To find the item of a given rank or to find the rank of a
given item in the original treap requires us to traverse the
underlying binary search tree in order. Why?
› The expected time complexity is O(n) since half the items
are visited on average
Using the augmented treap
› By adding one data member to the original
implementation of the treap, the expected time
complexity for the two Rank methods can be reduced to
O(log n)
› But what data member is added, where is it added, and
can it be easily maintained?
New data member: NumItems
› The Node class is augmented with the data member
NumItems which stores the size of the treap rooted at
each instance of Node
› Calculated as
p.NumItems = 1;
if (p.Left != null)
    p.NumItems += p.Left.NumItems;
if (p.Right != null)
    p.NumItems += p.Right.NumItems;
Augmented data structure
p u b l i c class Node<T> where T : IComparable
{
p r i v a t e s t a t i c Random R= new Random();
p u b l i c T Item
{ get; set; }
public i n t P r i o r i t y { get; s e t ; }
p u b l i c i n t NumItems { g e t ; s e t ; }
p u b l i c Node<T> L e f t
{ get; set; }
p u b l i c Node<T> Right { g e t ; s e t ; }
...
}
/ / Randomly generated
/ / Augmented information ( d a t a )
class Treap<T> : ISearchable<T> where T : IComparable
{
p r i v a t e Node<T> Root; / / Reference t o the r o o t o f the Treap
...
}
Can NumItems be easily maintained?
› Yes
› When an item is added to or removed from an
augmented treap, NumItems is updated in O(1) time for
each node along the path from the point of insertion or
removal to the root of the treap. Note that both an
insertion and removal take place at a leaf node.
› An additional update is also needed whenever a rotation
is performed on the way up the treap for the Add method.
LeftRotate
[Diagram: node X with right child Y rotates left so that Y becomes the root
of the subtree and X its left child; subtrees T1, T2, T3 are reattached.
NumItems is updated on the way back, and also after the LeftRotate itself
(Add only).]

RightRotate
[Diagram: the mirror image; Y with left child X rotates right so that X
becomes the root of the subtree and Y its right child. NumItems is updated
on the way back, and also after the RightRotate itself (Add only).]
Exercises
› Explain why NumItems is updated for the left (right) child
of the new root after a LeftRotate (RightRotate) is
performed in the Add method.
› Explain why NumItems need not be updated after a
LeftRotate or RightRotate is performed in the Remove
method.
How can NumItems be used to support Rank?
› Two versions of Rank:
– Return the item of a given rank (call this Rank I)
– Return the rank of a given item (call this Rank II)
Rank I
Return the item of a given rank
Consider the following treap
[Diagram: root 50 (NumItems 6) with children 20 (4) and 60 (1); 20's
children are 10 (1) and 40 (2); 40's left child is 30 (1). Item / NumItems
shown at each node; note that Priority is omitted.]
Find the item of rank 3
[Diagram: at 50, 3 < 5 (the size of the left subtree plus the root), go
left; at 20, 3 > 2, go right, and the rank is reduced by the size of the
left subtree of p plus 1 (for p itself), leaving 1; at 40, 1 < 2, go left;
at 30, 1 = 1, return 30.]
Find the item of rank 6
[Diagram: at 50, 6 > 5, go right with the rank reduced to 1; at 60, 1 = 1,
return 60.]
Find the item of rank 1
[Diagram: at 50, 1 < 5, go left; at 20, 1 < 2, go left; at 10, 1 = 1,
return 10.]
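The walkthroughs above can be sketched in code. This is a minimal sketch:
Node is a stripped-down version of the augmented treap node (Priority and
the treap machinery are omitted), and the example tree is built by hand to
match the slides.

```csharp
using System;

// Sketch of Rank I: return the item of a given (1-based) rank using NumItems.
class Node
{
    public int Item, NumItems;
    public Node Left, Right;
    public Node(int item) { Item = item; NumItems = 1; }
}

class RankDemo
{
    static int LeftSize(Node p) => p.Left == null ? 0 : p.Left.NumItems;

    public static int RankI(Node p, int i)
    {
        while (p != null)
        {
            int r = LeftSize(p) + 1;                 // rank of p in its subtree
            if (i < r) p = p.Left;                   // go left; rank unchanged
            else if (i > r) { i -= r; p = p.Right; } // go right; reduce rank by
            else return p.Item;                      //   left size + 1
        }
        throw new ArgumentException("rank out of range");
    }

    static void Main()
    {
        // The treap from the slides (NumItems in the initializers).
        var n40 = new Node(40) { Left = new Node(30), NumItems = 2 };
        var n20 = new Node(20) { Left = new Node(10), Right = n40, NumItems = 4 };
        var root = new Node(50) { Left = n20, Right = new Node(60), NumItems = 6 };
        Console.WriteLine(RankI(root, 3));  // 30
        Console.WriteLine(RankI(root, 6));  // 60
        Console.WriteLine(RankI(root, 1));  // 10
    }
}
```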
Exercises
› Add two children to each leaf node (10, 30 and 60) and
adjust NumItems for each Node in the treap.
› Find the items of rank 4, 8 and 11 in your treap above.
Rank II
Return the rank of a given item
Consider the following treap
[Diagram: the same treap as before; 50 (NumItems 6) at the root, 20 (4) and
60 (1) as its children, 10 (1) and 40 (2) below 20, and 30 (1) below 40.
Initially, r (the calculated rank) is initialized to 0.]
Find the rank of 40
[Diagram: at 50, 40 < 50, go left with r = 0 (going left, r remains the
same); at 20, 40 > 20, go right with r = 2 (going right, r is increased by
the size of the left subtree plus 1); at 40, 40 = 40, found. When found, r
is also increased by the size of the left subtree plus 1, so return r = 4.]
Find the rank of 60
[Diagram: at 50, 60 > 50, go right with r = 5; at 60, found, return r = 6.]
Find the rank of 10
[Diagram: at 50, 10 < 50, go left with r = 0; at 20, 10 < 20, go left with
r = 0; at 10, found, return r = 1.]
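The same idea in code. Again a minimal sketch with a hand-built,
stripped-down node (Priority omitted); returning -1 for a missing item is a
sketch choice, not necessarily what the real class does.

```csharp
using System;

// Sketch of Rank II: return the rank of a given item using NumItems.
class Node
{
    public int Item, NumItems;
    public Node Left, Right;
    public Node(int item) { Item = item; NumItems = 1; }
}

class RankIIDemo
{
    static int LeftSize(Node p) => p.Left == null ? 0 : p.Left.NumItems;

    public static int RankII(Node p, int item)
    {
        int r = 0;                              // calculated rank
        while (p != null)
        {
            if (item < p.Item)
                p = p.Left;                     // going left, r remains the same
            else if (item > p.Item)
            {
                r += LeftSize(p) + 1;           // going right, add left size + 1
                p = p.Right;
            }
            else
                return r + LeftSize(p) + 1;     // found: also add left size + 1
        }
        return -1;                              // not found (a sketch choice)
    }

    static void Main()
    {
        var n40 = new Node(40) { Left = new Node(30), NumItems = 2 };
        var n20 = new Node(20) { Left = new Node(10), Right = n40, NumItems = 4 };
        var root = new Node(50) { Left = n20, Right = new Node(60), NumItems = 6 };
        Console.WriteLine(RankII(root, 40));  // 4
        Console.WriteLine(RankII(root, 60));  // 6
        Console.WriteLine(RankII(root, 10));  // 1
    }
}
```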
Exercises
› Using the implementation of the augmented treap given
on Blackboard, initialize an augmented treap with 15
items.
› Select 4 items in the treap and find their ranks.
Time complexities of Rank I and Rank II
› In both cases, the maximum number of comparisons is
h+1 where h is the height of the treap
› Since the expected height of the treap is O(log n), the
expected time complexities of Rank I and Rank II are also
O(log n)
› The expected time complexity of O(log n) is far superior
to the average case time complexity of O(n) without
NumItems
Rope
Assignment 2
Definition
› A rope is an augmented binary tree that is used to
represent very long strings
› Each leaf node in the rope stores a small part of the
string
› Each node p is augmented with the length of the string
represented by the rope rooted at p
Rope Node *
class Node
{
    private string s;         // only stored in a leaf node
    private int length;       // augmented data
    private Node left, right;
    ...
}

class Rope
{
    private Node root;
    ...
}

* Properties can be defined instead
Methods
Constructor: Rope(string S)
› Let's start with the string
"This_is_an_easy_assignment."
› Create a balanced rope (binary tree)
› Assume that the maximum length of a leaf substring is 4
(on the assignment it is 10)
› Strategy
– Create a root node
– Recursively build the rope from the top down, using the two halves
of the given string to create the left and right ropes of the root
[Diagram: the balanced rope for "This_is_an_easy_assignment." — a root of
length 27 with children of lengths 13 and 14; the leaves hold the substrings
"This", "_is", "_an", "_ea", "sy_a", "ssi", "gnme", "nt." with lengths
4, 3, 3, 3, 4, 3, 4, 3.]
Rope Concatenate(Rope R1, Rope R2)
› Strategy
– Create a new rope R and set the left and right subtrees of its root
to the roots of R1 and R2 respectively
[Diagram: R1 (length 13, leaves "This", "_is", "_an") and R2 (length 14,
leaves "_ea", "sy_a", "ssi", "gnme", "nt.") are joined under a new root R of
length 27.]
void Split(int i, Rope R1, Rope R2)
› Strategy
– Find the node p with index i of the string
– If the index splits the substring at p into two, then insert the two
parts of the string between (new) left and right subtrees of p
– Break the rope into two parts, duplicating nodes whose right
children cross the split

Split(8, R1, R2)
[Diagram: assuming indices start at 0, index 8 falls inside the leaf "_an",
which is first split into leaves "_a" (length 2) and "n" (length 1). The
rope is then broken into R1 (length 9, representing "This_is_a") and R2
(length 18, representing the rest), duplicating the nodes on the path whose
right children cross the split. Further optimization steps then remove the
duplicated nodes and tidy the two ropes.]
void Insert(string S, int i)
› Strategy (one split and two concatenations)
– Create a rope R1 using S
– Split the current rope at index i into ropes R2 and R3
– Concatenate R1, R2 and R3

void Delete(int i, int j)
› Strategy (two splits and one concatenation)
– Split the current rope at indices i - 1 and j to give ropes R1, R2 and R3
– Concatenate R1 and R3

string Substring(int i, int j)
› Strategy
– Carry out a "pruned" traversal of the rope using the indices i, j
and the augmented data for length to curb unnecessary
explorations of the rope

char CharAt(int i)
› Strategy
– Using the augmented data for length, proceed either left or right
at each node of the rope to find the character of the given index
(cf. the Rank method of the Augmented Treap)
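The CharAt strategy can be sketched as follows. This is a minimal sketch with
a hypothetical two-leaf rope built by hand; the field and method names are
illustrative, not the assignment's required interface.

```csharp
using System;

// Sketch of CharAt: use the augmented length to steer left or right.
// Leaf nodes hold substrings; internal nodes hold only the total length.
class Node
{
    public string S;        // non-null only at a leaf
    public int Length;
    public Node Left, Right;
}

class RopeSketch
{
    public static char CharAt(Node p, int i)    // 0-based index
    {
        while (p.S == null)                     // descend until a leaf
        {
            if (i < p.Left.Length)
                p = p.Left;                     // index falls in the left rope
            else
            {
                i -= p.Left.Length;             // skip over the left rope
                p = p.Right;
            }
        }
        return p.S[i];
    }

    static void Main()
    {
        // A tiny two-leaf rope for "This_is"
        var left = new Node { S = "This", Length = 4 };
        var right = new Node { S = "_is", Length = 3 };
        var root = new Node { Left = left, Right = right, Length = 7 };
        Console.WriteLine(CharAt(root, 5));  // i
    }
}
```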
int IndexOf(char c)
› Strategy
– Carry out a traversal of the rope with a running count for the
index until the first occurrence of character c is found

void Reverse( )
› Strategy
– Modify the rope so that it represents the string in reverse
– Carry out a preorder traversal of the rope and swap the left and right
children at each node
– Reverse the substrings at each leaf node
[Diagram: starting from the rope for "This_is_an_easy_assignment.", the left
and right children are swapped at each node in preorder, first at the root
and then down each subtree, and so on, until the substrings at the leaf
nodes are reversed as well.]
int Length( )
› Strategy
– Easy ... where is the length of the full string stored?

string ToString( )
› Strategy
– Carry out a traversal of the rope and concatenate (using the
string method) the substrings at each leaf node
– Return the concatenated string

void PrintRope( )
› Strategy
– Print out the nodes of the rope in order (cf. Print of the Treap)
More Optimizations
Combining siblings with short strings
› Strategy
– Concatenate two (leaf) sibling strings to create a longer
substring and place it at the parent node
[Diagram: leaf siblings "Ti" (length 2) and "s" (length 1) are combined into
a single leaf "Tis" (length 3) at their parent node.]
Rebalancing
› (Simple) Strategy
– Concatenate the string represented by the rope and rebuild the
rope (as done in the constructor)
Rope II
Suggestions

Definition
› A rope is an augmented binary tree that is
used to represent very long strings
› Each leaf node in the rope stores a small
part of the string
› Each node p is augmented with the length of
the string represented by the rope rooted at p

Suggestion
Maintain a full binary tree
where each non-leaf node
has two non-empty children

Suggestion
Familiarize yourself with the string methods
of the library class

Suggestion
Several public methods require
a private, recursive counterpart,
including the Constructor

Suggestion
Tackle the three optimizations last

Suggestion
The Split method may be implemented as
Node Split(Node p, int i)
which returns the root of the Rope that represents the
right side of the split
p then becomes the root of the Rope that represents the
left side of the split
Split(p, 8)
[Diagram: assuming indices start at 0, the leaf "_an" is split into "_a" and
"n"; Split returns the root (length 18) of the rope representing the right
side of the split, while p becomes the root (length 9) of the rope
representing the left side.]
Observation
Splits only occur at right children

Suggestion
The Concatenate method may be implemented as:
Node Concatenate(Node p, Node q)
which returns a Node whose left and right
subtrees are p and q
Note: p or q could be null
[Diagram: the ropes rooted at p (length 13) and q (length 14) are joined
under a new root of length 27, which is returned from Concatenate.]
Suggestion
Do not concatenate with an
empty Rope (null)
The binary tree would not be full
otherwise

Suggestion
The Substring method can be implemented
using two Splits (cf. Delete) and two concatenations
to reconstruct the Rope
or
The Substring method can be implemented
by repeatedly using the CharAt method
(but fewer grades will be awarded)

Suggestion
To insert a string at the beginning
of a rope, pass an index of -1
and treat it as a special case

Suggestion
Draw up your test cases even before
you program
They help to guide your implementation

Suggestion
Test a method at a time

Suggestion
Document methods before
you implement them
It also helps to guide your implementation

Suggestion
This is a challenging assignment
Start early
Expect to spend days programming
Midterm (Test)
Week 7
Topics
Graphs
1. Adjacency Matrix
2. Adjacency List (Assignment 1)
3. Depth-First and Breadth-First Search
Adjacency Matrix
[Diagram: a directed, weighted graph with vertices yul, yyz, lax, lhr, yfb
(NumVertices 5, MaxNumVertices 6). The matrix stores the edge weight for
each <source, sink> pair and -1 where no edge exists; for example,
yyz→yul has weight 20 and lax→lhr has weight 70.]

Adjacency List
[Diagram: the same graph stored as an array of Vertex objects, each with a
visited flag (initially false) and a linked list of Edge objects holding the
adjacent vertex and weight; classes DirectedGraph, Vertex and Edge.]
Randomly-Built Binary Search Trees
1. Skip List
2. Treap

Skip List
[Diagram: the final skip list from earlier — head of height 4 followed by
10, 20, 30, 40, 50, 60 with node heights 1, 3, 1, 2, 4, 3; maxHeight = 4]
Treap
[Diagram: root 50 (Priority 0.83) with children 20 (0.47) and 60 (0.25);
20's children are 10 (0.34) and 40 (0.42); 40's left child is 30 (0.14).
Item / Priority shown at each node.]
The treap satisfies the property of a
1. binary search tree using Item
2. binary heap using Priority

Rotations
[Diagram: the left and right rotations between nodes X and Y with subtrees
T1, T2, T3.]
Augmenting Data Structures
1. Augmented Treap (Rank I and II)
2. Rope (Assignment 2)

Augmented Treap
[Diagram: root 50 (NumItems 6) with children 20 (4) and 60 (1); 20's
children are 10 (1) and 40 (2); 40's left child is 30 (1). Item / NumItems
shown at each node; note that Priority is omitted.]

Rope
[Diagram: root of length 27 with subtrees of lengths 13 and 14; each node
stores the length of its (sub)string, and the leaves hold "This", "_is",
"_an", "_ea", "sy_a", "ssi", "gnme", "nt.".]
Part A
› Available from 7:00 PM – 8:30 PM EST on Tuesday, March 1, 2022
› Hosted on Blackboard
› Once you start, you have the full time to complete Part A
Types of Questions (tentative)
› True/False (15 – 20 questions)
› Multiple Choice/Multiple Answer (5 – 10 questions)
› Short Answer (3 – 5 questions)
› 50 – 60 marks
In Part A...
› Questions are presented in random order
› Backtracking is not permitted
Part B
› Available from 3:00 PM – 4:30 PM EST on Thursday, March 3, 2022
› Under Assessments on Blackboard
› Once you start, you have the full time to complete Part B
Coding in C# (tentative)
› 3 – 4 coding problems
› Implement and analyze a new method for a given class
› 30 – 40 marks
In Part B...
› Questions are available up front
Guidelines
1. All questions are based on the implementations seen in
class or based on (partial) definitions provided on the test
2. Answers that do not relate to these implementations will be
graded harshly
3. Penalties may apply for incorrect True/False, Multiple
Choice, and Multiple Answer questions
4. Work through the Exercises provided on the PowerPoints
5. Appreciate the time complexity of different methods
6. Understand the behaviour of standard and augmented
methods for each class (e.g., Add, Remove, Overlap)
7. Review the two assignments
Availability
› I will be available during the times for Part A and Part B
› You may ask questions via MS Teams for any
clarifications (email will not be responded to)
Tries
Briandais, 1959
Fredkin, 1960
Tries
• A trie (AKA prefix tree) is a tree representing a set of strings.
• Each node represents a single character and has at most one child per distinct alphabet
character.
• A terminal node is a node that represents a terminating character, which is the end of a
string in the trie.
Trie – Visualized – 1
• The following trie represents a set of two strings. Each string can be
determined by traversing the path from the root to a leaf.
Trie – Visualized – 2
• Suppose "cats" is added to the trie. Adding another child for 'c' would
violate the trie requirements.
Trie – Visualized – 3
• Instead, the existing child for 'c' is reused.
Trie – Visualized – 4
• Similarly, nodes for 'a' and 't' are reused. Node 't' has a new child
added for 's'.
Trie – Visualized – 5
• Exactly one terminal node exists for each string. Other nodes may be
shared by multiple strings.
Trie – Insertion
• Given a string, a trie insert operation creates a path from the root to a terminal node that
visits all the string's characters in sequence.
• A current node pointer initially points to the root. A loop then iterates through the string's
characters. For each character C:
1. A new child node is added only if the current node does not have a child for C.
2. The current node pointer is assigned with the current node's child for C.
• After all characters are processed, a terminal node is added and returned.
Trie – Insertion – Algorithm
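The insertion steps above can be sketched as follows. This sketch is a
simplification: it marks terminal nodes with a boolean flag rather than a
separate terminating-character child, and the class and member names are
illustrative. A Search method is included for contrast with the search slides.

```csharp
using System;
using System.Collections.Generic;

// Minimal trie sketch: one child per distinct character, a flag for terminals.
class TrieNode
{
    public Dictionary<char, TrieNode> Children = new Dictionary<char, TrieNode>();
    public bool IsTerminal;
}

class Trie
{
    TrieNode root = new TrieNode();

    public void Insert(string s)
    {
        TrieNode cur = root;
        foreach (char c in s)
        {
            if (!cur.Children.ContainsKey(c))       // add a child only if missing
                cur.Children[c] = new TrieNode();
            cur = cur.Children[c];                  // advance to the child for c
        }
        cur.IsTerminal = true;                      // mark the end of the string
    }

    public bool Search(string s)
    {
        TrieNode cur = root;
        foreach (char c in s)
            if (!cur.Children.TryGetValue(c, out cur))
                return false;                       // no child for c: not present
        return cur.IsTerminal;                      // found only if terminal
    }

    static void Main()
    {
        var t = new Trie();
        t.Insert("APPLE"); t.Insert("APPLY"); t.Insert("APP");
        Console.WriteLine(t.Search("APPLE"));  // True
        Console.WriteLine(t.Search("APPL"));   // False (prefix, not terminal)
    }
}
```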
Trie – Insertion – Visualization – 1
• The trie starts with an empty root node. Adding "APPLE" first adds a
new child for 'A' to the root node.
Trie – Insertion – Visualization – 2
• New nodes are built for each remaining character in "APPLE"
Trie – Insertion – Visualization – 3
• The terminal node is added, completing insertion of "APPLE".
Trie – Insertion – Visualization – 4
• When adding "APPLY", the nodes for the first 4 characters are reused.
Trie – Insertion – Visualization – 5
• Node L has no child for 'Y', so a new node is added. The terminal node
is also added.
Trie – Insertion – Visualization – 6
• When adding "APP", only 1 new node is needed, the terminal node.
Trie – Search
• Given a string, a trie search operation returns the terminal node corresponding to that
string, or null if the string is not in the trie.
Trie – Search – Algorithm
Trie – Search – Visualization – 1
• The search for "PAPAYA" starts at the root and iterates through one
node per character. The terminal node is returned, indicating that the
string was found.
Trie – Search – Visualization – 2
• The search for "GRAPE" ends quickly, since the root has no child for 'G'.
Trie – Search – Visualization – 3
• The search for "PEA" gets to the node for 'A'. No terminal child node
exists after 'A', so null is returned.
Trie – Remove
• Given a string, a trie remove operation removes the string's corresponding terminal node
and all non-root ancestors with 0 children.
Trie – Remove – Algorithm
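A recursive sketch of the remove operation, under the same simplification as
before (a terminal flag instead of a terminating-character node); the names
are illustrative. Each recursive call reports whether its node can be pruned
by the caller, which removes the terminal marker and then every non-root
ancestor left with 0 children.

```csharp
using System;
using System.Collections.Generic;

class TrieNode
{
    public Dictionary<char, TrieNode> Children = new Dictionary<char, TrieNode>();
    public bool IsTerminal;
}

class TrieRemoveSketch
{
    TrieNode root = new TrieNode();

    public void Insert(string s)
    {
        TrieNode cur = root;
        foreach (char c in s)
        {
            if (!cur.Children.ContainsKey(c)) cur.Children[c] = new TrieNode();
            cur = cur.Children[c];
        }
        cur.IsTerminal = true;
    }

    public bool Contains(string s)
    {
        TrieNode cur = root;
        foreach (char c in s)
            if (!cur.Children.TryGetValue(c, out cur)) return false;
        return cur.IsTerminal;
    }

    public void Remove(string s) => Remove(root, s, 0);

    // Returns true if the caller should delete its child on this path.
    static bool Remove(TrieNode p, string s, int i)
    {
        if (i == s.Length)
            p.IsTerminal = false;                  // unmark the terminal node
        else if (p.Children.TryGetValue(s[i], out var child) &&
                 Remove(child, s, i + 1))
            p.Children.Remove(s[i]);               // prune the emptied child
        // p itself may be deleted if it is neither terminal nor has children
        return !p.IsTerminal && p.Children.Count == 0;
    }

    static void Main()
    {
        var t = new TrieRemoveSketch();
        t.Insert("APPLE"); t.Insert("APP");
        t.Remove("APPLE");
        Console.WriteLine(t.Contains("APPLE"));  // False
        Console.WriteLine(t.Contains("APP"));    // True (shared prefix kept)
    }
}
```

The root node's return value is ignored, so the root is never removed, which
matches the rule that only non-root ancestors are pruned.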
Trie – Remove – Visualizations
[Diagrams: step-by-step removal of a string, deleting the terminal node and
then each non-root ancestor left with 0 children.]
Time Complexities
• You are probably more familiar with time complexities that are tied to the number of
elements in the structure.
• A Trie is one of the data structures where the number of elements in the structure has no
consequence on the time complexity.
• If a lookup table is used to find child nodes, then that operation would take O(1).
• Consequently, because finding a child node takes O(1), an insertion,
removal, or search takes O(M), where M is the length of the string being
inserted, removed, or searched for.
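The O(M) insert, search, and remove operations described above can be sketched in Python. This is an illustrative sketch, not the course's C# implementation; the names and the use of a '$' marker for the terminal node are assumptions.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # char -> TrieNode; '$' marks a terminal child

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        # O(M): walk one node per character, creating nodes as needed
        p = self.root
        for ch in word:
            p = p.children.setdefault(ch, TrieNode())
        p.children.setdefault('$', TrieNode())  # terminal node

    def search(self, word):
        # Returns True iff the terminal node for word exists
        p = self.root
        for ch in word:
            if ch not in p.children:
                return False
            p = p.children[ch]
        return '$' in p.children

    def remove(self, word):
        # Remove the terminal node, then prune non-root ancestors
        # left with 0 children
        path = [self.root]
        for ch in word:
            if ch not in path[-1].children:
                return False
            path.append(path[-1].children[ch])
        if '$' not in path[-1].children:
            return False
        del path[-1].children['$']
        for i in range(len(path) - 1, 0, -1):
            if path[i].children:
                break
            del path[i - 1].children[word[i - 1]]
        return True
```

All three operations visit at most one node per character, so the number of strings stored has no effect on their cost.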
Trie – Activity (No Pun Intended)
• Which of the following Tries are valid?
R-Way Trie
de la Briandais (1959) and Fredkin (1960)
Introduction
›An r-way trie (retrieval) is an alternate way to implement a
hash table. Therefore, items are inserted, removed, and
retrieved from the trie based on <key,value> pairs.
›Each successive character of the key such as:
– a digit where the key is a number
– a character where the key is a string
determines which of r ways to proceed down the tree.
›Values are stored at nodes that are associated with a key.
Data structure
class Trie<T> : ITrie<T>
{
    private Node root;          // Root node of the Trie

    private class Node
    {
        public T value;         // Value associated with a key; otherwise, default
        public int numValues;   // Number of descendent values of a Node
        public Node[] child;    // Branch for each letter 'a' .. 'z'
        ...
    }
    ...
}
Key methods (pun intended)
bool Insert (string key, T value)
Insert a <key,value> pair and return true if successful; false
otherwise
Duplicate keys are not permitted
bool Remove (string key)
Remove the <key,value> and return true if successful; false
otherwise
T Value (string key)
Return the value associated with the given key
Constructor where <key, value> is <string, int>
Assume keys are made up of the letters ‘a’, ‘b’ and ‘c’ only
[Figure: the initial trie is a single root node with value -1 (default), numValues 0, and null children child[0..2].]
Note: The root node is never removed
(cf header node in a linked list)
Insert
› Basic strategy
– Follow the path laid out by the key, “breaking new ground” if
need be (i.e. creating new nodes) until the full key has been
explored
– The value is then inserted at the final destination (node) unless
the key has already been used (i.e. a value already exists for
that key)
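The basic strategy above can be sketched in Python (illustrative names and structure, not the course's C# code; -1 plays the role of the default value):

```python
DEFAULT = -1

class Node:
    def __init__(self):
        self.value = DEFAULT
        self.num_values = 0         # number of values in this subtree
        self.child = [None] * 26    # one branch per letter 'a'..'z'

def insert(root, key, value):
    # Follow the path laid out by the key, "breaking new ground"
    # (creating nodes) as needed
    path = [root]
    for ch in key:
        i = ord(ch) - ord('a')
        if path[-1].child[i] is None:
            path[-1].child[i] = Node()
        path.append(path[-1].child[i])
    if path[-1].value != DEFAULT:
        return False                # duplicate key: insertion fails
    path[-1].value = value
    for p in path:                  # adjust numValues back to the root
        p.num_values += 1
    return True
```

Note that the key itself is never stored; it only defines the path to the value.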
Insert <abba, 10>
[Figure: starting from the root, the insertion “breaks new ground”, creating a new node for each character along the path a-b-b-a.]
Since:
1) the key has been fully “explored” and
2) the final node has a default value (-1)
the value 10 is inserted and numValues is increased
by one along the path back to the root.
Note that the key itself is NOT stored,
but defines the path to a value
Insert <ab, 20>
[Figure: the insertion follows the existing a-b path; no new nodes are needed.]
The key has been fully explored and the node has a default value, so the
value 20 is inserted and numValues is adjusted along the path back to the root.
Insert <c, 30>
[Figure: the root has no child for ‘c’, so a new node is created and the value 30 is stored there; the root’s numValues becomes 3.]
Insert <cbc, 30>
[Figure: the insertion reuses the node for ‘c’ and creates new nodes for ‘b’ and ‘c’, where the value 30 is stored; numValues is adjusted along the path back to the root.]
Values need not be unique,
only keys
Insert <bba, 40>
Insert <bc, 50>
[Figure: the two insertions build a new subtree under ‘b’, storing 40 at the end of path b-b-a and 50 at the end of path b-c; the root’s numValues becomes 6.]
Insert <c, 60>
[Figure: the insertion follows the path to the node for ‘c’, which already holds the value 30.]
Node does not have a default value
=> Key is not unique, so the insertion fails
Final Trie
[Figure: the trie after inserting <abba, 10>, <ab, 20>, <c, 30>, <cbc, 30>, <bba, 40> and <bc, 50>; the root’s numValues is 6.]
Time complexity
› The path to insert a value is equal to the length of the
key
› Therefore, the time complexity of Insert is O(L) where L is
the length of the key
Exercises
› What happens if the key is the empty string? Can the Insert
method still work? Show how or why not.
› Is it necessary to examine an entire key before a value is
inserted? Show how or why not.
› Insert the following <key,value> pairs into the final trie
– <bbc, 70>
– <b, 80>
– <caa, 90>
Value
› Basic strategy
– Follow the given key from the root until:
› a null pointer is reached, and a default value is returned, or
› the key is fully examined, and the value at the final node is returned
(How can a null pointer be reached? Can a default value be returned?)
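The Value strategy can be sketched in Python (illustrative names; keys over ‘a’..‘c’ as in the running example, with -1 as the default value):

```python
DEFAULT = -1

def make_node():
    # A lightweight stand-in for the trie Node: value plus 3 child branches
    return {"value": DEFAULT, "child": [None, None, None]}

def value(root, key):
    # Follow the key from the root; a null child yields the default value
    p = root
    for ch in key:
        p = p["child"][ord(ch) - ord('a')]
        if p is None:
            return DEFAULT
    return p["value"]   # may itself be the default
```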
Value <bba, ?>
[Figure: the lookup follows the path b-b-a and the value 40 is returned.]
Value <ab, ?>
[Figure: the lookup follows the path a-b and the value 20 is returned.]
Value <a, ?>
[Figure: the key is fully examined at the node for ‘a’, which holds no value, so the default value is returned.]
Value <bbc, ?>
[Figure: the lookup follows b-b but the child for ‘c’ is null, so the default value is returned.]
Time complexity
› Let M be the maximum length of a key in the trie
› The time complexity of Value is O(min(L, M)) where L is
the length of the given key
Exercises
› Give a key that will cause the Value method to execute in
O(M) time on the final trie.
› According to the Value method posted online, what
happens if the key is not made up of letters ‘a’ .. ’z’ ?
How would you solve this problem (if there is one)?
Remove
› Basic strategy
– Follow the key to the node whose value is to be deleted.
– If the node is null or contains a default value then false is
returned; otherwise, the value at the node is set to default, true
is returned, and numValues is reduced by one for each node
along the path back to the root.
– For each child node whose numValues is reduced to 0 on the
way back, the link to the child node is set to null (cf Rope
compression).
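The Remove strategy can be sketched in Python (same illustrative node shape as before: 26-way child array, -1 default, per-node numValues; not the course's C# code):

```python
DEFAULT = -1

class Node:
    def __init__(self):
        self.value = DEFAULT
        self.num_values = 0
        self.child = [None] * 26

def insert(root, key, value):
    # Minimal insert helper so remove can be demonstrated
    path = [root]
    for ch in key:
        i = ord(ch) - ord('a')
        if path[-1].child[i] is None:
            path[-1].child[i] = Node()
        path.append(path[-1].child[i])
    if path[-1].value != DEFAULT:
        return False
    path[-1].value = value
    for p in path:
        p.num_values += 1
    return True

def remove(root, key):
    # Follow the key; fail on a null child or a default value
    path = [root]
    for ch in key:
        nxt = path[-1].child[ord(ch) - ord('a')]
        if nxt is None:
            return False
        path.append(nxt)
    if path[-1].value == DEFAULT:
        return False
    path[-1].value = DEFAULT
    for p in path:
        p.num_values -= 1              # adjust along the path to the root
    # Prune child links whose numValues dropped to 0 (cf Rope compression)
    for depth in range(len(path) - 1, 0, -1):
        if path[depth].num_values == 0:
            path[depth - 1].child[ord(key[depth - 1]) - ord('a')] = None
    return True
```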
Remove <ab>
[Figure: the removal follows the path a-b, sets the value 20 back to the default, reduces numValues along the path back to the root, and returns true.]
Remove <abba>
[Figure: the value 10 is set back to the default; numValues drops to 0 at each node along a-b-b-a, so the links to those nodes are set to null on the way back, and true is returned.]
Remove <cc>
[Figure: the node for ‘c’ has no child for ‘c’ (null), so false is returned.]
Remove <bb>
[Figure: the node at the end of path b-b holds the default value, so false is returned.]
Final Trie
[Figure: the trie after the removals, holding <bba, 40>, <bc, 50>, <c, 30> and <cbc, 30>; the root’s numValues is 4.]
Time complexity
› Like the method Value, the time complexity of Remove is
O(min(L,M)) where L is the length of the given key and M
is the maximum length of a key in the trie
Exercises
› Justify the time complexity of the method Remove.
› Continue to remove the remaining <key,value> pairs from
the final trie until the empty trie remains.
Print
› Unlike the closed hash table, the r-way trie can easily
print out keys in order
Implementation
private void Print(Node p, string key)
{
    int i;
    if (p != null)
    {
        if (!p.value.Equals(default(T)))
            Console.WriteLine(key + " " + p.value + " " + p.numValues);
        for (i = 0; i < 26; i++)
            Print(p.child[i], key + (char)(i + 'a'));
    }
}

public void Print()
{
    Print(root, "");
}

Keys are constructed along the way
Final Trie
After Insertions
[Figure: the trie after the six insertions.]
Result <key, value, numValues>
ab    20 2
abba  10 1
bba   40 1
bc    50 1
c     30 2
cbc   30 1
Final Trie
After Removals
[Figure: the trie after the removals.]
Result <key, value, numValues>
bba   40 1
bc    50 1
c     30 2
cbc   30 1
Next up ...
the ternary tree
Ternary Tree
Bentley and Sedgewick (1998)
Introduction
› A ternary tree is a second version of a trie which can also
be used to implement a hash table, among other
applications.
› As a hash table, items are inserted, removed, and
retrieved from the ternary tree based on <key,value>
pairs.
› The ternary tree has a branching factor of three and
therefore, can save some space over an r-way trie.
› Like an r-way trie, keys are examined one character at a
time. But there is not a one-to-one correspondence
between a character and a branch.
› Then how does one progress through the ternary tree?
General strategy to traverse the ternary tree *
Compare the current character k of the key with the
character c at the current node.
Then
› Move left if k < c
› Move right if k > c
› Move to the middle and onto the next character if k = c
* Similar to traversing a binary search tree
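The traversal rule above can be sketched as a recursive Python insert (illustrative names, not the course's C# code; None plays the role of the default value, and the slide's duplicate-key check is omitted for brevity):

```python
class TNode:
    def __init__(self, ch):
        self.ch = ch
        self.value = None
        self.low = self.middle = self.high = None

def insert(node, key, value, i=0):
    # Compare key[i] with node.ch: left if smaller, right if larger,
    # middle (and on to the next character) if equal
    ch = key[i]
    if node is None:
        node = TNode(ch)            # "break new ground"
    if ch < node.ch:
        node.low = insert(node.low, key, value, i)
    elif ch > node.ch:
        node.high = insert(node.high, key, value, i)
    elif i < len(key) - 1:
        node.middle = insert(node.middle, key, value, i + 1)
    else:
        node.value = value          # key fully examined: store the value
    return node
```

Only a move down a middle branch advances to the next character of the key.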
Data structure
class Trie<T> : ITrie<T>
{
    private Node root;                 // Root node of the Trie
    private int size;                  // Number of values in the Trie

    class Node
    {
        public char ch;                // Character of the key
        public T value;                // Value at Node; otherwise, default
        public Node low, middle, high; // Left, middle, and right subtrees
        ...
    }
    ...
}
Key methods
bool Insert (string key, T value)
Insert a <key,value> pair and return true if successful; false
otherwise
Duplicate keys are not permitted
bool Remove (string key)
Remove the <key,value> and return true if successful; false
otherwise
T Value (string key)
Return the value associated with the given key
Constructor
› The initial ternary tree is set to empty, i.e. the root is set
to null
[Figure: root → null pointer]
Insert
› Basic strategy (similar to the R-Way Trie)
– Follow the path laid out by the key (described earlier), “breaking
new ground and building a spine” if needed until the full key has
been explored.
– The value is then inserted at the final destination (node) unless
the key has already been used (i.e. a value already exists for
that key)
Insert <abba, 10>
Because the root is null, a node is created to store ‘a’. The key character
and the character in the node are the same, so the insertion proceeds down
the middle path with the next character of the key.
Note: Once a node is created for a character, all subsequent nodes will be
added along the middle branch, creating a spine.
[Figure: nodes for ‘b’, ‘b’ and ‘a’ are created along the middle branch, forming the spine.]
Now that all characters of the key have been examined, the value 10 is
added to the last node and true is returned.
Insert <ab, 20>
Since the first character of the key is equal to ‘a’, proceed down the
middle branch.
Since:
1) the second character of the key is equal to ‘b’,
2) all characters of the key have been examined, and
3) the node contains the default value,
the value 20 is added there and true is returned.
Note: If a value had already been found, then false would have been returned.
Insert <c, 30>
Since ‘c’ is greater than ‘a’, proceed to the right. Because the right
reference is null, a node is created to store ‘c’, and the value 30 is
added there.
Insert <cbc, 30>
‘c’ is greater than ‘a’ => proceed right; ‘c’ is equal to ‘c’ => proceed middle.
[Figure: new nodes for ‘b’ and ‘c’ are created along the middle branch, and the value 30 is stored at the last node.]
Insert <bc, 50>
[Figure: ‘b’ > ‘a’ => proceed right to ‘c’; ‘b’ < ‘c’ => proceed left, creating a node for ‘b’; a middle node for ‘c’ then stores the value 50.]
Exercise
Insert <bba, 40>
[Figure: the tree before and after the insertion; the search reaches the existing node for ‘b’, and new middle nodes for ‘b’ and ‘a’ store the value 40.]
Key observation
Only once you proceed down a middle path
do you move on to the next character in the key
Exercises
› Show how the worst-case time complexity of Insert is
O(n+L) where n is the number of nodes in the ternary tree
and L is the length of the given key.
› Choose six random four-letter words as keys and insert
six <key,value> pairs into an initially empty ternary tree.
What could be the maximum possible depth of the
resultant tree?
Value
› Basic strategy
– Beginning at the root and the first character in the key, traverse
the ternary tree (as described earlier) until all characters of the
key have been examined or a null pointer is reached
– Return the value if one is found; the default value otherwise
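The Value strategy can be sketched in Python (illustrative ternary-tree node with ch / value / low / middle / high fields; None plays the role of the default value):

```python
class TNode:
    def __init__(self, ch):
        self.ch = ch
        self.value = None
        self.low = self.middle = self.high = None

def value(node, key, i=0):
    # Traverse until a null pointer is reached or the key is exhausted
    if node is None:
        return None                     # null pointer: default value
    ch = key[i]
    if ch < node.ch:
        return value(node.low, key, i)
    if ch > node.ch:
        return value(node.high, key, i)
    if i == len(key) - 1:
        return node.value               # may itself be the default
    return value(node.middle, key, i + 1)
```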
Value <bba, ?>
[Figure: the traversal reaches the node for the final ‘a’, and the value 40 is returned.]
Value <ac, ?>
[Figure: a null pointer is reached, so the default value is returned.]
Value <cb, ?>
[Figure: the key is fully examined at a node holding no value, so the default value is returned.]
Time complexity
› The time complexity of Value is O(d) where d is the depth
of the ternary tree
› If the tree is balanced then d is O(log n); otherwise, the
depth is O(n) in the worst case where n is the number of
nodes
Exercise
› Implement a method that returns all keys that match a
given pattern. A pattern is defined as a string with
letters and asterisks where each asterisk represents a
wildcard character. Therefore, the pattern *a*d would
return all four-letter keys whose second and fourth
characters are ‘a’ and ‘d’ such as “card” and “band”.
Print
› Like the r-way trie, the keys can be output in order
› Also, like the r-way trie, the keys are constructed as the
inorder traversal proceeds through the ternary tree
Algorithm
public void Print()
{
    Print(root, "");
}

private void Print(Node p, string key)
{
    if (p != null)
    {
        Print(p.low, key);
        if (!p.value.Equals(default(T)))
            Console.WriteLine(key + p.ch + " " + p.value);
        Print(p.middle, key + p.ch);   // Key is being constructed
        Print(p.high, key);
    }
}
Exercise
› Write a method that returns all keys that begin with a
given prefix. For example, if the prefix is bag, then the
method returns all keys that begin with bag. Do you see
the similarity with the last exercise?
Very useful reference
from Robert Sedgewick and Kevin Wayne
of Princeton University
https://www.cs.princeton.edu/courses/archive/spring21/cos226/lectures/52Tries.pdf
Binomial Heap
Vuillemin (1978)
Quick Review
› A binary heap, implemented as a linear array, represents
a binary tree that satisfies two properties:
– The binary tree is nearly complete
– The priority of each node (except the root) is less than or equal
to the priority of its parent
› The operations to insert an item and to remove the item
with the highest priority take O(log n) time
Example
where higher-valued items have higher priorities
[Figure: a nearly complete binary tree with 94 at the root; each parent’s value is at least as large as its children’s.]
Motivation
› The binomial heap behaves in the same way as a binary
heap with one additional capability
› A binomial heap is an example of a mergeable data
structure whereby two binomial heaps can be efficiently
merged into one
› To merge two binary heaps takes O(n) time, but to merge
two binomial heaps takes only O(log n) time
Binomial Tree
› A binomial tree Bk is defined recursively in the following
way:
– The binomial tree B0 consists of a single node
– The binomial tree Bk consists of two binomial trees Bk-1 where
one is the leftmost child of the root of the other
Examples
[Figure: the binomial trees B0, B1, B2, B3 and B4; B4 consists of two copies of B3, and the number of nodes at each depth of B4 follows the binomial coefficients 1, 4, 6, 4, 1.]
For any binomial tree Bk
› There are 2^k nodes
› The height of the tree is k
› There are exactly C(k, i) (“k choose i”) nodes at depth i, i = 0, 1, 2, ..., k
› The root has degree k
How does a set of binomial trees represent n items?

        B4  B3  B2  B1  B0
n = 31   1   1   1   1   1
n = 14   0   1   1   1   0
n = 17   1   0   0   0   1
n = 25   1   1   0   0   1
n = 23   1   0   1   1   1
How many binomial trees are needed to represent n items?
At most ⌊log2 n⌋ + 1 (useful to know for time complexity analysis)

   n     Number of binomial trees
   3     2
  19     3
  32     1
 184     4
2525     8
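The correspondence between n and its binomial trees is just the binary representation of n: bit k set means one B_k tree. A quick Python check (illustrative helper name):

```python
def binomial_trees(n):
    # Returns the list of k values for which bit k of n is set,
    # i.e. the binomial trees B_k needed to represent n items
    return [k for k in range(n.bit_length()) if (n >> k) & 1]
```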
Data structure
› Leftmost-child, right-sibling representation (cf File Systems)
[Figure: a B4 tree drawn with leftmost-child and right-sibling links.]

public class BinomialNode<T>
{
    public T Item                        { get; set; }
    public int Degree                    { get; set; }
    public BinomialNode<T> LeftMostChild { get; set; }
    public BinomialNode<T> RightSibling  { get; set; }
    ...
}
Exercises
› Explain why the merger of two binary heaps requires O(n)
time.
› List the binomial trees Bk needed to represent n = 3, 19,
32, 184 and 2525 items.
Binomial Heap
› A binomial heap is a set of binomial trees that satisfies
two properties:
– There is at most one binomial tree Bk for each k
– Each binomial tree is heap-ordered; that is, the priority of the
item at each node (except the root) is less than or equal to the
priority of its parent (same as the binary heap)
Example of a binomial heap for n = 29
[Figure: a header node (head) precedes the root list containing B0, B2, B3 and B4.]
Notes:
1) The rightmost child of each root (except the last) refers to the next binomial tree
2) The binomial trees are ordered by increasing k
Example of a heap-ordered binomial tree
where higher-valued items have higher priorities
[Figure: a B3 tree with 59 at the root; every parent’s priority is at least that of each of its children.]
Note:
The priority of a parent must be greater than or equal to
the priority of its leftmost child and that child’s right siblings
Data structure
public class BinomialHeap<T> : IBinomialHeap<T> where T : IComparable
{
    private BinomialNode<T> head;   // Head of the root list
    private int size;               // Size of the binomial heap
    ...
}
Primary methods
// Adds an item to a binomial heap
void Add(T item)

// Removes the item with the highest priority
void Remove()

// Returns the item with the highest priority
T Front()

// Merges H with the current binomial heap (NEW)
void Merge(BinomialHeap<T> H)

Add, Remove, and Front behave the same as in a binary heap; Merge is new.
Supporting methods
// Returns the reference to the root preceding the highest priority item
private BinomialNode<T> FindHighest()
// Takes the union (without consolidation) of the current and given binomial heap H
private void Union(BinomialHeap<T> H)
// Consolidates (combines) binomial trees of the same degree
private void Consolidate()
// Merges the given binomial heap H into the current heap (used by Add and Remove)
public void Merge(BinomialHeap<T> H)
Let’s start with the private methods
FindHighest
› Basic strategy
– Starting at the header node, traverse the root list and keep track
of the item with the highest priority. Note: The item with the
highest priority in the binomial heap will be found at the root of
one of the binomial trees (Why?)
– Return the reference to the root node that precedes the one with
the highest priority
Union
› Basic strategy
– Given two binomial heaps, merge the root lists into one,
maintaining the order of the binomial trees. Note: The resultant
list will have at most two binomial trees with root degrees of k
(Why?)
[Figure: the root lists of heaps with 13 (1101) and 7 (0111) items are merged into a single root list.]
Consolidate
› Basic strategy
– Run through a root list and combine two binomial trees that have
the same root degree k into one binomial tree of degree k+1.
Note: There may be (at least temporarily) three binomial trees of
the same degree (Why?)
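The basic step Consolidate relies on is linking two heap-ordered B_k trees into one B_{k+1}: the lower-priority root becomes the leftmost child of the other. A Python sketch using the leftmost-child/right-sibling representation (names are illustrative, and a larger item is taken to mean higher priority):

```python
class BinomialNode:
    def __init__(self, item):
        self.item = item
        self.degree = 0
        self.leftmost_child = None
        self.right_sibling = None

def link(a, b):
    # Precondition: a.degree == b.degree
    if a.item < b.item:
        a, b = b, a                      # keep the higher-priority root on top
    b.right_sibling = a.leftmost_child   # b joins the front of a's child list
    a.leftmost_child = b
    a.degree += 1
    return a                             # root of the combined B_{k+1}
```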
[Figure: a walkthrough of Consolidate over the merged root list using prev/curr/next pointers, covering Cases 1–4 of combining trees of equal degree; note that three B2 trees may appear temporarily.]
Final Result
1101 + 0111 = 10100 (13 + 7 = 20 items)
What about Case 1?
[Figure: the roots at curr and next do not have the same degree, so the pointers simply advance — just like Case 2.]
Exercises
› Work through Union and Consolidate methods for the following pairs of
binomial heaps. Only show the structure of the resultant binomial heap.
– First heap: B0, B1, B2, B3; Second heap: B1, B3
– First heap: B0, B1, B2; Second heap: B0, B1, B2
› Add items to each node of the binomial trees above and work through
the Union and Consolidate methods.
› Suppose a binomial heap before consolidation has a maximum depth of
k. Argue that the maximum depth of a binomial heap after consolidation
is never more than k+1.
Now, on to the public methods
Merge
› Basic strategy
– Take the union of the given binomial heap H with the current
binomial heap and consolidate.
Union(H);
Consolidate( );
Insert
› Basic strategy
– Create a binomial heap H with the given item (only).
– Merge H with the current binomial heap and increase size by 1.
Remove
› Basic strategy
– Call the method FindHighest and remove the next binomial tree
T from the current root list.
– Insert the children of T (in reverse order) into a new binomial
heap H, effectively removing the item at the root.
– Merge H (Union + Consolidate) with the current binomial heap
and reduce size by 1.
[Figure: FindHighest returns q, the root preceding the item with the highest priority; that tree is unlinked from the root list, its children are inserted in reverse order into a new heap H, and H is merged (Union + Consolidation) back into the current heap.]
Front
› Basic strategy
– Call the method FindHighest and return the item at the root of
the next binomial tree.
Exercises
› Work through the Insert method for n = 15 where the item
to be inserted has the lowest priority. Where is the item
in the final binomial heap?
› Work through the Remove method for n = 3, 19 and 31.
When assigning items to the binomial heaps, place the
item with the highest priority at the root of the last
binomial tree.
Time complexities

Method        Binary Heap   Binomial Heap *
Insert        O(log n)      O(log n)
Remove        O(log n)      O(log n)
Front         O(1)          O(log n)
Merge         O(n)          O(log n)
FindHighest   —             O(log n)
Union         —             O(log n)
Consolidate   —             O(log n)

* All time complexities depend on the observation that
the maximum length of the root list is O(log n)
B-Trees (Chapter 18)
Bayer and McCreight (1972)
What does the “B” stand for?
Background
› A B-tree is a balanced search tree which is used to store
the keys of large file and database systems
› Because:
– only a part of a B-tree can be stored in primary (main) memory
at a time and
– secondary storage is much, much slower than primary memory
it is advantageous to read/write as much information as
possible from/to secondary storage and thereby
minimize disk accesses
› Each node of a B-tree therefore is usually as large as a
disk page and has a branching factor between 50 and
2000 depending on the size of the key relative to the size
of a page
› Hence, a B-tree is wide and shallow
Five properties of a B-tree
Property 1 *
Each internal node has
a minimum degree of t (except for the root) and a
maximum degree of 2t.
* The notation B-tree (t=k) is used to denote a B-tree with a minimum degree of k
Five properties of a B-tree
Property 2
Each node has
a minimum of t-1 keys (except the root) and a
maximum of 2t-1 keys.
If an internal node has degree d (t ≤ d ≤ 2t) then
it stores d-1 keys.
Properties 1 and 2 for B-tree (t=3)
[Figure: a node of minimum degree has 3 children and 2 keys (K1, K2); a node of maximum degree has 6 children and 5 keys (K1 .. K5).]
Five properties of a B-tree
Property 3
The keys for each node are stored in ascending order.
Therefore,
K1 < K2 < ... < Kd-1
for a node with degree d.
Five properties of a B-tree
Property 4
Key values in the ith subtree of an internal node with
degree d are
less than K1
fall between Ki-1 and Ki
greater than Kd-1
for i=1,
for 2 ≤ i ≤ d-1, and
for i=d.
Properties 3 and 4 for B-Tree (t=3)
[Figure: for a node with keys K1 .. K5, the six subtrees hold keys k with k < K1, K1 < k < K2, K2 < k < K3, K3 < k < K4, K4 < k < K5, and k > K5.]
Five properties of a B-tree
Property 5
All leaf nodes have the same depth.
Maximum height h of a B-tree on n keys
Theorem (p489)
The maximum height of a B-tree on n keys with minimum degree t is
h ≤ log_t((n+1)/2), which is O(log n).
Exercises
› Look up and review the proof on the previous slide.
› Calculate the minimum height on n keys for a B-tree with
minimum degree t.
› Show all the legal B-trees (t=2) that represent the keys {1,
2, 3, 4, 5}.
› For what values of t is the example tree on the next slide
a legal B-tree?
Example tree *
[Figure: root M; children D H and Q T X; leaves B C, F G, J K L and N P, R S, V W, Y Z.]
* From CLRS (3rd Edition)
Data structure
class Node<T>
{
    private int n;            // number of keys
    private bool leaf;        // true if a leaf node; false otherwise
    private T[] key;          // array of keys
    private Node<T>[] c;      // array of child references

    public Node (int t)
    {
        n = 0;
        leaf = true;
        key = new T[2*t - 1];
        c = new Node<T>[2*t];
    }
}
Data structure
class BTree<T> where T : IComparable
{
    private Node<T> root;     // root node
    private int t;            // minimum degree

    public BTree (int t)
    {
        this.t = t;
        root = new Node<T>(t);
    }
    ...
}
Primary methods
› public bool Search (T k)
– returns true if the key k is found in the B-tree; false otherwise
› public void Insert (T k)
– insert key k into the B-tree
– duplicate keys are not permitted
› public void Delete (T k)
– delete key k from the B-tree
Support method
› private void Split (Node<T> x, int i)
– splits the ith (full) child of x into 2 nodes
“One-pass” algorithms
› The primary methods Search, Insert, and Delete are
implemented as “one-pass” algorithms where each
algorithm proceeds down the B-tree from the root without
having to back up to a node *
› This minimizes the number of read and write operations
from secondary memory (disk)
* with one exception for deletion
Search
› Basic strategy
– Set p to the root node
– If the given key k is found at p then return true
– If the key k is not found and p is a leaf node then return false
else move to the subtree which would otherwise contain k
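The Search strategy can be sketched in Python (nodes represented as dicts with sorted "keys", "children", and a "leaf" flag; these names are assumptions for illustration, not the course's C# code):

```python
import bisect

def search(node, k):
    while True:
        i = bisect.bisect_left(node["keys"], k)   # first key >= k
        if i < len(node["keys"]) and node["keys"][i] == k:
            return True                           # key found at this node
        if node["leaf"]:
            return False                          # nowhere left to descend
        node = node["children"][i]                # subtree that could hold k
```

Within a node a binary search is used here; a linear scan, as the O(t) bound on the next slide assumes, works just as well.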
Search for key k = 35
[Figure: with keys K1=10 .. K5=50 at a node, 35 falls between K3=30 and K4=40, so the search descends into the subtree between them.]
Search for key k = 7
[Figure: 7 < K1=10, so the search descends into the first subtree.]
Search for key k = 80
[Figure: 80 > K5=50, so the search descends into the last subtree.]
Search for key k = 20
[Figure: 20 = K2, so the key is found at this node.]
Time complexity
› The length of the path from the root to a leaf node is
O(log n)
› At each node along the path, it takes O(t) time to
determine which path to follow
› Total time complexity is O(t log n)
Insert
› Basic strategy
– Descend along the path based on the given key k from the root to a
leaf node. Each node along the path though cannot be full i.e., have
2t-1 keys. Therefore, before moving down to a node p (including the
root), check first if p is full. If p is full then split p into two where the
last t-1 keys of p are placed in a new node q and the median (middle)
key of p is moved up to the parent node. Note: The parent node is
guaranteed to have enough space to store the extra key (Why?)
– Insert the given key k at the leaf node. If the key is found along the
path from the root to the leaf node, then no insertion takes place.
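The split step at the heart of this strategy can be sketched in Python (dict-based nodes with "keys", "children" and a "leaf" flag, as in the earlier search sketch; names are assumptions):

```python
def split_child(x, i, t):
    # The i-th child y of x is full (2t-1 keys): its last t-1 keys move
    # to a new node z, and its median key moves up into x
    y = x["children"][i]
    z = {"keys": y["keys"][t:], "leaf": y["leaf"],
         "children": y["children"][t:]}
    median = y["keys"][t - 1]
    y["keys"] = y["keys"][:t - 1]        # y keeps the first t-1 keys
    y["children"] = y["children"][:t]    # and its first t subtrees
    x["keys"].insert(i, median)          # median moves up to the parent
    x["children"].insert(i + 1, z)       # z becomes x's next child
```

Because a node is split before it is descended into, the parent x is guaranteed to have room for the median key.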
Insert V into a B-tree (t=4)
[Figure: the search path descends from a non-full node (... N Y ...) to a full child p with keys P Q R S U W X. Split method: the median key S moves up to the parent (... N S Y ...), the last t-1 keys U W X move to a new node q, and the insertion continues into the now non-full node containing U W X.]
Insert V into a B-tree (t=4) when the root is full
[Figure: the full root P Q R S U W X is split; the median key S becomes the new root and the height increases by 1.]
In general
[Figure: a full node (2t-1 keys) on the path is split into two nodes with t-1 keys each, with the median key moving up into the (not full) parent.]
Initial B-Tree (t=3) => Do not descend to nodes with 5 keys
[Tree: root G M P X; leaves A C D E | J K | N O | R S T U V | Y Z]
Insert B
[Tree: root G M P X; leaves A B C D E | J K | N O | R S T U V | Y Z]
Insert Q
[Tree: the full leaf R S T U V is split around T: root G M P T X; leaves A B C D E | J K | N O | Q R S | U V | Y Z]
Insert L
Height increases by 1
[Tree: the full root G M P T X is split around P: new root P; children G M and T X; leaves A B C D E | J K L | N O and Q R S | U V | Y Z]
Insert F
[Tree: the full leaf A B C D E is split around C: root P; children C G M and T X; leaves A B | D E F | J K L | N O and Q R S | U V | Y Z]
Observations
› Only descend to a node if it is not full
› The B-tree only grows in height when a key is inserted into
a B-tree with a full root i.e., when the root node is split
› When a node is split into two, each of the two resultant
nodes is assigned the minimum number of keys i.e., (t-1)
Time complexity
› The length of the path from the root to a leaf node is
O(log n)
› At each node along the path, it takes O(t) time:
– To determine which path to follow and
– To split a node (if required)
› Total time complexity is O(t log n)
Exercises
› Randomly insert a permutation of the alphabet into
initially empty B-trees (t=2, t=3). Call the resultant trees
B2 and B3.
› Calculate the minimum and maximum heights of B-trees
(t=2, t=3) for 26 keys.
› Explain how to find the predecessor of a given key in a B-tree.
Delete
› Basic strategy
– Descend along the path based on the given key k from the root
until the key is found. Each node along the path though must
have at least t keys. Therefore, before moving down to a node p
(excluding the root), check if p has at least t keys. If not, node p
either:
› borrows a key from a sibling (if possible) or
› merges with an adjacent node where the t-1 keys of each node plus one
from the parent node yield a single node with 2t-1 keys. Note: The parent
node is guaranteed to have an extra key (Why?)
› Basic strategy (cont’d)
– If the given key k is found at a leaf node, delete it. If the given
key k is found at an internal node p, then:
› if the child node q that precedes k has t keys, then recursively delete the
predecessor k’ of k in the subtree rooted at q and replace k with k’.
› if the child node r that succeeds k has t keys, then recursively delete the
successor k’ of k in the subtree rooted at r and replace k with k’.
› otherwise, merge q and r with k from the parent to yield a single node s
with 2t-1 keys. Recursively delete k from the subtree s.
– If key k is not found, no deletion takes place.
Initial B-Tree (t=3) => Do not descend to nodes with 2 keys or less
[Tree: root P; children C G M and T X; leaves A B | D E F | J K L | N O and Q R S | U V | Y Z]
Delete F
[Tree: F is deleted from the leaf D E F: root P; children C G M and T X; leaves A B | D E | J K L | N O and Q R S | U V | Y Z]
Delete M (internal node)
[Tree: M is replaced by its predecessor L, which is recursively deleted from the leaf J K L: children C G L and T X; leaves A B | D E | J K | N O and Q R S | U V | Y Z]
Delete G (internal node)
[Tree: both children adjacent to G have t-1 keys, so they merge with G into the leaf D E G J K, from which G is then deleted: children C L and T X; leaves A B | D E J K | N O and Q R S | U V | Y Z]
Delete D
Height decreases by 1
[Tree: the root P has a single key and its two children C L and T X have t-1 keys each, so they merge into the new root C L P T X; D is then deleted from its leaf: leaves A B | E J K | N O | Q R S | U V | Y Z]
Delete B
[Tree: the leaf A B has only t-1 keys, so it borrows via the parent (C moves down, E moves up): root E L P T X; leaves A C | J K | N O | Q R S | U V | Y Z]
Observations
› The height of a B-tree only decreases if:
– the root node has a single key, and
– its two child nodes have t-1 keys each.
In this case, the two child nodes along with the key at the root are
merged into one node with 2t-1 keys. This node now serves as the
new root of the B-tree.
Time complexity
› Although the Delete method is quite complicated, its time
complexity is the same as Search and Insert, O(t log n).
Exercises
› Successively delete in the same order the keys that were
inserted to construct B2 and B3 (from the last exercise).
› Show why a node may need to be revisited (reread from
disk) in the Delete method.
› Show that the number of read and write disk accesses is
O(h) for Search, Insert, and Delete where h is the height of
the B-tree.
Disjoint Sets (Chapter 21.4)
Tarjan (1975)
Background
› A disjoint-sets structure is a collection of sets, each initialized
with a single item that is unique across all the sets.
› Because the items are unique, the sets have no
intersection among themselves and are therefore disjoint.
Data structure
public class DisjointSets : IDisjointSets
{
    private int[] set;
    // If set[i] = j < 0 then set i has size |j|
    //   (i.e. set i exists with the given size)
    // If set[i] = j >= 0 then set i points to set j
    //   (i.e. set i no longer exists)
    private int numItems;       // Number of items
}
Primary methods
› public bool Union (int S1, int S2)
– takes the union of sets S1 and S2
– uses union-by-size, which merges the smaller set into the larger to
determine the resultant set
› public int Find (int x)
– returns the set that x belongs to
– uses path compression
Constructor
› Basic strategy
– Set each array position to -1
– The negative sign indicates that the set exists, and the value
indicates the size of the set
– Item i is assumed to be in set i
Initially, set i contains item i (e.g. item 4 is in set 4):

index:   0   1   2   3   4   5   6   7
set[]:  -1  -1  -1  -1  -1  -1  -1  -1

All sets 0 to 7 exist and have a size of 1.
public DisjointSets(int numItems)
{
    this.numItems = numItems;
    set = new int[numItems];

    // Each set has an initial size of 1
    // Assumption: item i belongs to set i
    for (int i = 0; i < numItems; i++)
        set[i] = -1;
}
Time complexity
› To initialize the array to -1 takes O(n) time
Union
› Basic strategy
– Check that both sets i and j exist i.e., set[i] < 0 and set[j] < 0
– Merge the set with the lesser size into the set with the greater size i.e.,
union-by-size
– If the two sets have the same size, merge either one into the other
Union(0,1)
Both sets exist and have a size of 1 => Merge set 1 into set 0

Before:  index:   0   1   2   3   4   5   6   7
         set[]:  -1  -1  -1  -1  -1  -1  -1  -1

After:   index:   0   1   2   3   4   5   6   7
         set[]:  -2  +0  -1  -1  -1  -1  -1  -1
Union(2,3)
Both sets exist and have a size of 1 => Merge set 3 into set 2

Before:  index:   0   1   2   3   4   5   6   7
         set[]:  -2  +0  -1  -1  -1  -1  -1  -1

After:   index:   0   1   2   3   4   5   6   7
         set[]:  -2  +0  -2  +2  -1  -1  -1  -1
Union(5,4)
Both sets exist and have a size of 1 => Merge set 4 into set 5

Before:  index:   0   1   2   3   4   5   6   7
         set[]:  -2  +0  -2  +2  -1  -1  -1  -1

After:   index:   0   1   2   3   4   5   6   7
         set[]:  -2  +0  -2  +2  +5  -2  -1  -1
Union(2,1)
Set 1 does NOT exist (set[1] ≥ 0); return false

index:   0   1   2   3   4   5   6   7
set[]:  -2  +0  -2  +2  +5  -2  -1  -1   (unchanged)
Union(0,6)
Both sets exist. Set 0 has greater size => Merge set 6 into set 0

Before:  index:   0   1   2   3   4   5   6   7
         set[]:  -2  +0  -2  +2  +5  -2  -1  -1

After:   index:   0   1   2   3   4   5   6   7
         set[]:  -3  +0  -2  +2  +5  -2  +0  -1
Union(5,7)
Both sets exist. Set 5 has greater size => Merge set 7 into set 5

Before:  index:   0   1   2   3   4   5   6   7
         set[]:  -3  +0  -2  +2  +5  -2  +0  -1

After:   index:   0   1   2   3   4   5   6   7
         set[]:  -3  +0  -2  +2  +5  -3  +0  +5
Union(5,0)
Both sets exist and have a size of 3 => Merge set 0 into set 5

Before:  index:   0   1   2   3   4   5   6   7
         set[]:  -3  +0  -2  +2  +5  -3  +0  +5

After:   index:   0   1   2   3   4   5   6   7
         set[]:  +5  +0  -2  +2  +5  -6  +0  +5
Union(2,5)
Both sets exist. Set 5 has greater size => Merge set 2 into set 5

Before:  index:   0   1   2   3   4   5   6   7
         set[]:  +5  +0  -2  +2  +5  -6  +0  +5

After:   index:   0   1   2   3   4   5   6   7
         set[]:  +5  +0  +5  +2  +5  -8  +0  +5
Final Set

index:   0   1   2   3   4   5   6   7
set[]:  +5  +0  +5  +2  +5  -8  +0  +5

Visually more appealing ☺ — all items belong to set 5:

          5
       /  |  \  \
      0   2   4   7
     / \  |
    1   6 3
public bool Union(int s1, int s2)
{
    // Union by size
    if (set[s1] < 0 && set[s2] < 0)     // Both sets exist
    {
        if (set[s2] < set[s1])          // Size of set s2 > size of set s1
        {
            set[s2] += set[s1];         // Increase the size of s2 by the size of s1
            set[s1] = s2;               // Have set s1 point to set s2 (i.e. s2 = s1 U s2)
        }
        else                            // Size of set s1 >= size of set s2
        {
            set[s1] += set[s2];         // Increase the size of s1 by the size of s2
            set[s2] = s1;               // Have set s2 point to set s1 (i.e. s1 = s1 U s2)
        }
        return true;
    }
    else
        return false;
}
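As a cross-check of the worked trace of Unions, here is a quick Python port of the same union-by-size logic (an illustrative sketch, not the course's C# code). Running the lecture's sequence of Unions reproduces the final array:

```python
def make_sets(n):
    return [-1] * n                      # every set exists with size 1

def union(s, s1, s2):
    """Union by size on the array representation; True if both sets exist."""
    if s[s1] < 0 and s[s2] < 0:          # both sets exist (roots)
        if s[s2] < s[s1]:                # set s2 is strictly larger
            s[s2] += s[s1]               # grow s2 by the size of s1
            s[s1] = s2                   # s1 now points to s2
        else:                            # size of s1 >= size of s2
            s[s1] += s[s2]
            s[s2] = s1
        return True
    return False

s = make_sets(8)
ops = [(0, 1), (2, 3), (5, 4), (2, 1), (0, 6), (5, 7), (5, 0), (2, 5)]
results = [union(s, a, b) for a, b in ops]
print(results)  # Union(2,1) fails because set 1 no longer exists
print(s)        # [5, 0, 5, 2, 5, -8, 0, 5], matching the final slide
```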
Time complexity
› The maximum number of Unions for n disjoint sets is n-1.
Why?
› Each Union takes O(1) or constant time
› Therefore, n-1 Unions take O(n) time
Theorem 1
Using union-by-size, for every set i, nᵢ ≥ 2^hᵢ

Notation:
Let nᵢ be the size of set i
Let hᵢ be the height of set i
Let nᵢ* be the size of the resultant set i after a union-by-size
Let hᵢ* be the height of the resultant set i after a union-by-size

Proof by induction on the number of links k

Basis of induction (k=0): With no links, every set contains one item
and has a height of 0, so the theorem holds since 1 ≥ 2^0.

Induction hypothesis: Assume the theorem is true after k links.

Inductive step (k+1): Prove the theorem holds for an additional link.
Case 1 (hᵢ < hⱼ)
Let set i be the smaller set and set j be the larger set.

(diagram: the root of the shorter set i is linked under the root of the
taller set j)

Since hᵢ < hⱼ, the height of the resultant set j does not change:

nⱼ* = nᵢ + nⱼ ≥ nⱼ ≥ 2^hⱼ = 2^hⱼ*
                 (induction hypothesis)
Case 2 (hᵢ ≥ hⱼ)
Let set i be the smaller set and set j be the larger set.

(diagram: the root of set i is linked under the root of set j)

Since hᵢ ≥ hⱼ, the height of the resultant set j is hᵢ + 1, and
since set i is the smaller set, nⱼ ≥ nᵢ:

nⱼ* = nᵢ + nⱼ ≥ 2nᵢ ≥ 2·2^hᵢ = 2^(hᵢ+1) = 2^hⱼ*
               (induction hypothesis)
Theorem 2
Using union-by-size, the height of the final resultant set is O(log n).

Proof:
The resultant set has n items. From Theorem 1, n ≥ 2^h, which implies
h ≤ log₂ n. Therefore, h is O(log n).
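Theorem 2's bound can be checked numerically. The sketch below (illustrative Python, assuming the same array representation as the C# class; no path compression) performs seven size-aware Unions on 8 singleton sets and measures the depth of the deepest item, which matches log2(8) = 3:

```python
import math

def union(s, s1, s2):
    # Union by size on the array representation
    if s[s1] < 0 and s[s2] < 0:
        if s[s2] < s[s1]:
            s[s2] += s[s1]; s[s1] = s2
        else:
            s[s1] += s[s2]; s[s2] = s1
        return True
    return False

def depth(s, i):
    d = 0
    while s[i] >= 0:     # follow parent links up to the root
        i = s[i]
        d += 1
    return d

s = [-1] * 8
for a, b in [(0, 1), (2, 3), (4, 5), (6, 7), (0, 2), (4, 6), (0, 4)]:
    union(s, a, b)

h = max(depth(s, i) for i in range(8))
print(h, "<=", int(math.log2(8)))   # height 3, matching the O(log n) bound
```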
Exercises
› Show that if union-by-size is not used for n-1 unions, the
final resultant set could have a height of n-1
› Carry out a series of seven Unions on 8 disjoint sets that
yields a single set with a height of 3
› Redraw using the 2-D tree representation the resultant
sets after each of the Union operations
Find
› Basic strategy
– Recurse to the root of the set in which item belongs and return
the root set
– On the way back from the recursive calls, update each item
along the path to point directly to the root set i.e., path
compression
Find(6)

index:   0   1   2   3   4   5   6   7
set[]:  +5  +0  +5  +2  +5  -8  +0  +5

Recurse from item 6: set[6] = +0, so follow the link to item 0;
set[0] = +5, so follow the link to item 5; set[5] = -8 < 0, and the
negative value indicates that set 5 exists. Return 5.
› If the tree structure doesn’t change, each Find operation
takes O(log n) time i.e., the height of the tree.
› But we also know that every node (item) along the path to
the root belongs to set 5! Therefore, set each node
along the path (and only nodes along the path) to point
directly to 5.
› Next time, Find(6) will only take one step.
Find(6) with path compression:

Before:  index:   0   1   2   3   4   5   6   7
         set[]:  +5  +0  +5  +2  +5  -8  +0  +5

After:   index:   0   1   2   3   4   5   6   7
         set[]:  +5  +0  +5  +2  +5  -8  +5  +5

Item 6 now points directly to the root set 5 (item 0 on the path
already did).
› But how do we backtrack along the same path from the
root?
› The beauty of recursion
Luc Devroye
McGill University
public int Find(int item)
{
    if (item < 0 || item >= numItems)  // Item is not in the range 0..n-1
        return -1;
    else
        if (set[item] < 0)             // Terminating condition: set[item] exists
            return item;
        else
            // Path Compression:
            // Recurse to the root set, and then update all sets
            // along the path to point to the root
            return set[item] = Find(set[item]);
}
Without the range check:

public int Find(int item)
{
    if (set[item] < 0)                 // Terminating condition
        return item;
    else
        return set[item] = Find(set[item]);
}
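The recursive Find with path compression translates directly to Python. In this illustrative sketch (not the course's C# code), starting from the final array of the Union trace, Find(6) returns 5 and rewrites set[6] to point directly at the root:

```python
def find(s, item):
    if s[item] < 0:                  # terminating condition: a root set
        return item
    # Path compression: recurse to the root, then point this item
    # (and, via the recursion, every item on the path) directly
    # at the root on the way back.
    s[item] = find(s, s[item])
    return s[item]

s = [5, 0, 5, 2, 5, -8, 0, 5]   # the final array from the Union trace
root = find(s, 6)                # path: 6 -> 0 -> 5
print(root)                      # 5
print(s[6])                      # now 5: one step on the next Find(6)
```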
Find(9)

Question: Was union-by-size used to construct this representation of
disjoint sets?

(diagram: items 0..12 in one set rooted at 0, with item 9 several links
below the root; after Find(9), every item on the path from 9 to the
root points directly to 0)
Time complexity
Using union-by-size plus path compression, any sequence
of m≥n Find and n-1 Union operations is:
O(m α(m,n) )
where α(m,n) is the functional inverse of Ackermann’s function.
But α(m,n) grows sooooo slowly that the average time to
complete any sequence of m operations is virtually O(m) which
averages out to O(1) per operation.
Exercises
› Review the reference on the next page and learn about
union-by-rank.
– The rank is a proxy for what measurement?
– When does the rank of a set increase?
› Describe a set representation where one Find operation
reduces the height of the set from n-1 to 1. Note that
union-by-size would not have been used.
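For the union-by-rank exercise above, here is a quick illustrative Python sketch of the variant (an assumption-laden sketch, not the course's code): the rank is a proxy for height, and it only increases when two roots of equal rank are linked.

```python
def make(n):
    parent = list(range(n))
    rank = [0] * n            # rank: an upper bound on the tree height
    return parent, rank

def find(parent, x):
    while parent[x] != x:     # follow parent links to the root
        x = parent[x]
    return x

def union_by_rank(parent, rank, a, b):
    ra, rb = find(parent, a), find(parent, b)
    if ra == rb:
        return False
    if rank[ra] < rank[rb]:   # link the shorter tree under the taller:
        parent[ra] = rb       # the resulting rank is unchanged
    elif rank[ra] > rank[rb]:
        parent[rb] = ra
    else:                     # equal ranks: either root works,
        parent[rb] = ra       # and the surviving root's rank grows
        rank[ra] += 1
    return True

parent, rank = make(8)
for a, b in [(0, 1), (2, 3), (0, 2), (4, 5)]:
    union_by_rank(parent, rank, a, b)
print(rank[0])   # 2: the rank increased only on the two equal-rank links
```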
Useful reference
Kevin Wayne, Princeton University
https://www.cs.princeton.edu/courses/archive/spring13/cos423/lectures/UnionFind.pdf
COIS 3020: Data Structures &
Algorithms II
Alaadin Addas
Final Examination Information
Week 12 – Lecture 1
Overview
• Exam Information
• General Information & Guidelines
• Format
• Content
• Questions
• Exam Review
• Assignment 2 Implementation (will not be posted)
Final Examination – General Information
• Thursday, April 14th at 11 AM (check the exam schedule for
any changes)
• Location: Online, hosted and submitted on Blackboard.
• Duration: 3 hours (unless you have an SAS accommodation
letter).
Final Examination – Guidelines – 1
• The final examination is open book; however, you are not
allowed help from any 3rd party in real time.
• The deadline for the examination is 2 PM on the 14th of
April; every minute you are late after the deadline will lead
to a 1 point penalty.
• Grace period is 5 minutes.
• I will be available via MS Teams; message me in a private
chat or in the general chat.
• If you are messaging me in the general chat, please tag
me (@), or I will not get a notification.
• Emails won’t be replied to, for your own sake!
Final Examination – Guidelines – 2
• Backtracking is not allowed, and the questions are
randomized.
• I am messing around with the settings to see if I can
allow backtracking for the programming questions on
their own, but as of now it seems like I have to
disallow backtracking for the entire exam.
Final Examination – Format
• The final exam has three components (it’s the same
assessment on Blackboard though):
1. 20-22 true/false questions (1 mark each).
2. 7-9 multiple choice questions (3 marks each).
3. 2-3 programming questions (each question is worth
between 6-10 marks).
Final Examination – Content
• The topics to focus on for the final examination are:
• R-Way Tries
• Ternary Trees
• Binomial Heap
• B-Tree
• Disjoint Sets
• Quad Trees are NOT on the final examination.
Final Examination – Recommended Time Management
• 35 minutes on the true/false questions
• 45 minutes on the multiple choice questions
• 85 minutes on the programming questions
Questions?