COIS 3020: Data Structures & Algorithms II
Alaadin Addas
Week 1 Lecture 1

Overview
• Introducing Myself
• Course Introduction
• Introducing Graphs
• Adjacency Lists Demonstration
• Directed Graphs
• Weighted Graphs
• Summary

Introducing Myself
My name is Alaadin Addas and I will be your instructor for COIS 3020 this semester.
• I hold a B.Sc. in Computer Science with a specialization in Software Engineering, and an M.Sc. in Computer Science with a focus on Cyber Security.
• I also hold the SSCP and CISSP certifications from (ISC)².
• Over the past 6 years I have been working as a software engineer while teaching at several institutions, including Trent University and UOIT (Ontario Tech).
• I worked for a logistics company for 5 years; last year I transitioned to a Vancouver-based startup SaaS company that makes employee recognition software.

Course Introduction
• Syllabus!
• MS Teams join code: 0to4ndq

Graphs
• Many items in the world are connected, such as computers on a network connected by wires (or wirelessly), cities connected by roads, or people connected by friendship.

Example Graph

Graphs
• A graph is a data structure for representing connections among items and consists of vertices connected by edges.
• A vertex (or node) represents an item in a graph.
• An edge represents a connection between two vertices in a graph.

Graphs
• How many vertices does this graph have?
• How many edges?

Graphs – Adjacency & Paths
• Two vertices are adjacent if they are connected by an edge.
• A path is a sequence of edges leading from a source (starting) vertex to a destination (ending) vertex.
• The path length is the number of edges in the path.
• The distance between two vertices is the number of edges on the shortest path between those vertices.

Graphs – Adjacency & Paths – 1
• Two vertices are adjacent if connected by an edge.
• PC1 and Server1 are adjacent.
• PC1 and Server3 are not adjacent.
Graphs – Adjacency & Paths – 2
• A path is a sequence of edges from a source vertex to a destination vertex.

Graphs – Adjacency & Paths – 3
• A path's length is the path's number of edges.
• Vertex distance is the length of the shortest path:
• Distance from PC1 to PC2 is 2.

Graph Applications
• So why do we care? Where can this even be used?
• Graphs are often used to represent a geographic map, which can contain information about places and travel routes.
• For example: vertices in a graph can represent airports, with edges representing available flights.
• Edge weights in such graphs often represent the length of a travel route, either in total distance or in the expected time taken to navigate the route.
• For example: a map service with access to real-time traffic information can assign travel times to road segments.

Graph Applications
• A graph can be used to represent relationships between products.
• Vertices in the graph corresponding to a customer's purchased products have adjacent vertices representing products that can be recommended to the customer.

Product Recommendation – Graph Example
• An online shopping service can represent relationships between the products being sold using a graph.
• Relationships may be based on the products alone. A game console requires a TV and games, so the game console vertex connects to the TV and game products.
• Vertices representing common kitchen products, such as a blender, dishwashing soap, kitchen towels, and oven mitts, are also connected. Why? What is the connection? This makes no sense.
• Connections might also represent common purchases that occur by chance. Maybe several customers who purchase a Blu-ray player also purchase a blender.
• A customer's purchases can be linked to the product vertices in the graph.
Product Recommendation – Graph Example
• Adjacent vertices can then be used to provide a list of recommendations to the customer.
• Does this sound familiar?
• You have probably used an online shopping service that recommends things to you using this method.
• Other applications include recommending friends or people to follow on social media, and the list goes on.

Graph Representations
• Everything we saw looks great when visualized, but how do we represent graphs in code as a data structure?
• Adjacency Lists
• Adjacency Matrices

Adjacency Lists
• Remember, two vertices are adjacent if they are connected by an edge.
• In an adjacency list graph representation, each vertex has a list of adjacent vertices, with each list item representing an edge.

Adjacency Lists – Visualized – 1
• Each vertex has a list of adjacent vertices for its edges.
• The edge connecting A and B appears in A's list and in B's list (unless the graph is directed, which we will not discuss now).

Adjacency Lists – Visualized – 2
• Each edge appears in the lists of the edge's two vertices.

Advantages of Adjacency Lists
• A key advantage of an adjacency list graph representation is a size of O(V + E), because each vertex appears once and each edge appears twice.
• V refers to the number of vertices, E to the number of edges.
• However, a disadvantage is that determining whether two vertices are adjacent is O(V), because one vertex's adjacency list must be traversed looking for the other vertex, and that list could have V items.
• In most applications, though, a vertex is only adjacent to a small fraction of the other vertices, yielding a sparse graph.

Sparse Graphs
• A sparse graph has far fewer edges than the maximum possible. Many graphs are sparse, like those representing a computer network, flights between cities, or friendships among people (every person isn't friends with every other person).
• Thus, the adjacency list graph representation is very common.
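The adjacency list representation described above can be sketched in a few lines. This is a minimal illustration in Python (the course demos use C#); the `Graph` class and the vertex labels are made up for the example:

```python
# Undirected graph as an adjacency list: each vertex maps to a list of
# adjacent vertices, so every edge appears in two lists (size O(V + E)).
class Graph:
    def __init__(self):
        self.adj = {}

    def add_vertex(self, v):
        self.adj.setdefault(v, [])

    def add_edge(self, u, v):
        self.add_vertex(u)
        self.add_vertex(v)
        self.adj[u].append(v)   # edge appears in u's list...
        self.adj[v].append(u)   # ...and in v's list

    def is_adjacent(self, u, v):
        # O(V): must scan one vertex's adjacency list.
        return v in self.adj.get(u, [])

g = Graph()
g.add_edge("PC1", "Server1")
g.add_edge("Server1", "PC2")
print(g.is_adjacent("PC1", "Server1"))  # True
print(g.is_adjacent("PC1", "PC2"))      # False
```

Note the trade-off the slides describe: the structure is compact for sparse graphs, but the adjacency test scans a list rather than indexing a matrix.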
Adjacency Lists Demonstration
• I am using C# 10, hence the differences.
• After viewing the demo: how can you add a weight to an edge? Try it out!

Directed Graphs
• A directed graph, or digraph, consists of vertices connected by directed edges.
• A directed edge is a connection between a starting vertex and a terminating vertex.
• In a directed graph, a vertex Y is adjacent to a vertex X if there is an edge from X to Y.
• Many graphs are directed, like those representing links between web pages, maps for navigation, or college course prerequisites.

Paths & Cycles
• In a directed graph:
• A path is a sequence of directed edges leading from a source (starting) vertex to a destination (ending) vertex.
• A cycle is a path that starts and ends at the same vertex. A directed graph is cyclic if the graph contains a cycle, and acyclic if the graph does not contain a cycle.

Paths & Cycles – Visualized – 1
• A path is a sequence of directed edges leading from a source (starting) vertex to a destination (ending) vertex.

Paths & Cycles – Visualized – 2
• A cycle is a path that starts and ends at the same vertex.
• A graph can have more than one cycle.

Paths & Cycles – Visualized – 3
• A cyclic graph contains a cycle.
• An acyclic graph contains no cycles. Acyclic directed graphs are sometimes referred to as DAGs.

Weighted Graphs
• A weighted graph associates a weight with each edge.
• A graph edge's weight, or cost, represents some numerical value between vertex items, such as flight cost between airports, connection speed between computers, or travel time between cities.
• A weighted graph may be directed or undirected.

Path Length – Weighted Graphs
• In a weighted graph, the path length is the sum of the edge weights in the path.

Negative Edge Weight Cycles
• The cycle length is the sum of the edge weights in a cycle.
• A negative edge weight cycle has a cycle length less than 0.
• A shortest path does not exist in a graph with a negative edge weight cycle, because each loop around the cycle further decreases the path length, so no minimum exists.
• Where might negative edge weights be applied?

Summary
• Graphs have many applications:
• Social Networks
• Online Shopping
• Flight Networks
• Computer Networks
• The simplest implementation of a graph involves an array of linked lists.
• Try adding weights and finding the shortest path between two vertices by altering the uploaded demo code!

COIS 3020: Data Structures & Algorithms II
Alaadin Addas
Week 2 Lecture 1 (Originally Week 1 – Lecture 2)

Overview
• Recap
• Directed Graphs
• Weighted Graphs
• Path Length
• Adjacency Matrices
• Summary

Recap
• Graphs have many applications: social networks, online shopping, flight networks, computer networks.
• The simplest implementation of a graph involves an array of linked lists.
• Last lecture we left off with the demonstration, and I asked you to try to alter the AddEdge method so that you could also add a weight.

Directed Graphs
• A directed graph, or digraph, consists of vertices connected by directed edges.
• A directed edge is a connection between a starting vertex and a terminating vertex.
• In a directed graph, a vertex Y is adjacent to a vertex X if there is an edge from X to Y.
• Many graphs are directed, like those representing links between web pages, maps for navigation, or college course prerequisites.

Paths & Cycles
• In a directed graph:
• A path is a sequence of directed edges leading from a source (starting) vertex to a destination (ending) vertex.
• A cycle is a path that starts and ends at the same vertex. A directed graph is cyclic if the graph contains a cycle, and acyclic if the graph does not contain a cycle.

Paths & Cycles – Visualized – 1
• A path is a sequence of directed edges leading from a source (starting) vertex to a destination (ending) vertex.
Paths & Cycles – Visualized – 2
• A cycle is a path that starts and ends at the same vertex.
• A graph can have more than one cycle.

Paths & Cycles – Visualized – 3
• A cyclic graph contains a cycle.
• An acyclic graph contains no cycles. Acyclic directed graphs are sometimes referred to as DAGs.

Weighted Graphs
• A weighted graph associates a weight with each edge.
• A graph edge's weight, or cost, represents some numerical value between vertex items, such as flight cost between airports, connection speed between computers, or travel time between cities.
• A weighted graph may be directed or undirected.

Path Length – Weighted Graphs
• In a weighted graph, the path length is the sum of the edge weights in the path.

Negative Edge Weight Cycles
• The cycle length is the sum of the edge weights in a cycle.
• A negative edge weight cycle has a cycle length less than 0.
• A shortest path does not exist in a graph with a negative edge weight cycle, because each loop around the cycle further decreases the path length, so no minimum exists.
• Where might negative edge weights be applied?

Adjacency Matrices
• Adjacency lists are one way to represent a graph as a data structure.
• Another approach is to use an adjacency matrix.
• Remember, two vertices are adjacent if connected by an edge.
• In an adjacency matrix graph representation, each vertex is assigned to a matrix row and column, and a matrix element is 1 if the corresponding two vertices have an edge, or 0 otherwise.

Adjacency Matrix – Representation – 1
• Each vertex is assigned to a row and column.

Adjacency Matrix – Representation – 2
• An edge connecting A and B is a 1 in A's row and B's column.

Adjacency Matrix – Representation – 3
• Similarly, that same edge is a 1 in B's row and A's column (so the matrix is symmetric).

Adjacency Matrix – Representation – 4
• Each edge similarly has two 1's in the matrix. Other matrix elements are 0's (not shown).
• We don't need to insert a zero; the absence of a one indicates that an edge does not exist.

Analysis of Adjacency Matrices
• Assuming the common implementation as a two-dimensional array whose elements are accessible in O(1), an adjacency matrix's key benefit is O(1) determination of whether two vertices are adjacent:
• The corresponding element is just checked for a 0 or a 1.
• A key drawback is O(V²) size.
• For example: a graph with 1000 vertices would require a 1000 × 1000 matrix, meaning 1,000,000 elements.
• An adjacency matrix's large size is inefficient for a sparse graph, in which most elements would be 0's.
• An adjacency matrix only represents edges among vertices; if each vertex has data, like a person's name and address, then a separate list of vertices is needed.

Sparse Graph – Reminder
• Last lecture I mentioned that a sparse graph is one that has far fewer than the maximum possible number of edges.
• Most graphs are sparse; that is why adjacency lists are preferred.
• Recall the Facebook users example:
• Say the average user has 500 friends (this is a fake number that I made up; it is probably less).
• There are over 1 billion Facebook users.
• 1 billion × 1 billion = 1 quintillion matrix elements.
• Most of that matrix would be populated with zeros.

Demonstration
• An adjacency matrix is usually implemented as a 2D array, so it is not a very interesting data structure to examine.
• We will be looking at a more advanced variation of the adjacency list. This time we will include weights!

Summary
• Graphs can be cyclic (contain a cycle) or acyclic (do not contain a cycle).
• Edges on a graph may be weighted, which can indicate several different things depending on the application the graph is being used for (e.g., cost of a plane ticket, connection speed between nodes, etc.).
• Adjacency lists are much better suited as a data structure when graphs are sparse.
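For contrast with the adjacency list, here is a minimal adjacency-matrix sketch in Python (the course demos use C#; the `MatrixGraph` class and labels are invented for illustration). It shows the O(1) adjacency test and the O(V²) storage cost directly:

```python
# Adjacency matrix for labeled vertices: O(1) adjacency test, O(V^2) space.
class MatrixGraph:
    def __init__(self, labels):
        # Each vertex label is assigned a row/column index.
        self.index = {v: i for i, v in enumerate(labels)}
        n = len(labels)
        self.m = [[0] * n for _ in range(n)]  # all 0s: no edges yet

    def add_edge(self, u, v):
        # Undirected: the edge appears as two symmetric 1s.
        i, j = self.index[u], self.index[v]
        self.m[i][j] = self.m[j][i] = 1

    def is_adjacent(self, u, v):
        # Just check the one element -- O(1).
        return self.m[self.index[u]][self.index[v]] == 1

mg = MatrixGraph(["A", "B", "C"])
mg.add_edge("A", "B")
print(mg.is_adjacent("B", "A"))  # True (symmetric)
print(mg.is_adjacent("A", "C"))  # False
```

For a sparse graph most of `self.m` stays 0, which is exactly the waste the slides warn about.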
COIS 3020: Data Structures & Algorithms II
Alaadin Addas
Week 2 Lecture 2

Overview
• Recap
• Breadth First Search on Graphs
• Depth First Search on Graphs
• Summary

Recap
• Graphs can be cyclic (contain a cycle) or acyclic (do not contain a cycle).
• Edges on a graph may be weighted, which can indicate several different things depending on the application the graph is being used for (e.g., cost of a plane ticket, connection speed between nodes, etc.).
• Adjacency lists are much better suited as a data structure when graphs are sparse.

Graph Traversal
• An algorithm commonly must visit every vertex in a graph in some order, known as a graph traversal.

Breadth First Search
• A breadth-first search (BFS) is a traversal that visits a starting vertex, then all vertices of distance 1 from that vertex, then of distance 2, and so on, without revisiting a vertex.

Breadth First Search – Visualized – 1
• Breadth-first search starting at A visits vertices based on distance from A.
• B and D are distance 1.

Breadth First Search – Visualized – 2
• E and F are distance 2 from A.
• Note: a path of length 3 also exists to F, but the distance metric uses the shortest path.

Breadth First Search – Visualized – 3
• C is distance 3 from A.

Breadth First Search – Visualized – 4
• Breadth-first search from A visits A, then vertices of distance 1, then 2, then 3.
• Note: the visiting order of same-distance vertices doesn't matter (we can do D or B first, or F first then E).

Breadth First Search – Exercise
• We are going to perform a breadth-first search of the graph below.
• Assume the starting vertex is E.
• Which vertex is visited first? Second? Third?
• What is C's distance? What is D's distance?
• Is the BFS traversal of a graph unique (i.e., can we traverse using a different path)?
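The distance-ordered traversal above can be sketched with a queue. This is a Python illustration (the course demos use C#), and the adjacency dictionary is one plausible reading of the A–F diagram, with C reachable only through F:

```python
from collections import deque

def bfs(adj, start):
    """Visit vertices in order of distance from start; return the visit order."""
    discovered = {start}          # vertices seen so far (discoveredSet)
    frontier = deque([start])     # discovered but not yet visited (frontierQueue)
    order = []
    while frontier:
        v = frontier.popleft()    # dequeue and "visit"
        order.append(v)
        for w in adj[v]:
            if w not in discovered:   # never enqueue a vertex twice
                discovered.add(w)
                frontier.append(w)
    return order

# Assumed example graph: B, D at distance 1 from A; E, F at distance 2; C at 3.
adj = {"A": ["B", "D"], "B": ["A", "E", "F"], "D": ["A", "E"],
       "E": ["B", "D", "F"], "F": ["B", "E", "C"], "C": ["F"]}
print(bfs(adj, "A"))  # ['A', 'B', 'D', 'E', 'F', 'C']
```

As the slides note, same-distance vertices may come out in a different order if the adjacency lists are ordered differently; any such order is a valid BFS.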
Application: Social Network Recommenders
• Social networking sites like Facebook and LinkedIn use graphs to represent connections among people.
• For a particular user, a site may wish to recommend new connections.
• One approach does a breadth-first search starting from the user, recommending new connections starting at distance 2.
• Distance 1 users are already connected with the user.

Application: P2P
• Have you ever downloaded a torrent?
• In a peer-to-peer network, computers are connected via a network and may seek and download file copies (such as songs or movies) via intermediary computers or routers.
• For example, one computer may seek the movie "The Wizard of Oz", which may exist on 10 computers in a network consisting of 100,000 computers.
• Finding the closest computer (having the fewest intermediaries) yields a faster download speed.
• A BFS traversal of the network graph can find the closest computer with that movie.

Breadth First Search – Algorithm
• An algorithm for breadth-first search enqueues the starting vertex in a queue. (Does everyone remember what a queue is?)
• While the queue is not empty, the algorithm dequeues a vertex from the queue, visits the dequeued vertex, enqueues that vertex's adjacent vertices (if not already discovered), and repeats.

Queue & Stack Reminder
• Image credits: Dev community. I lost the link, let it go. Just let it go.

Breadth First Search – Algorithm
• BFS enqueues the start vertex (in this case A) in frontierQueue, and adds A to discoveredSet.
• Vertex A is dequeued from frontierQueue and visited.
• Undiscovered vertices adjacent to A are enqueued in frontierQueue and added to discoveredSet.
• Vertex A's visit is complete. The process continues on to the next vertices in frontierQueue, D and B.
• E is visited.
• Vertices B and F are adjacent to E, but are already in discoveredSet, so B and F are not added to frontierQueue or discoveredSet again.
• The process continues until frontierQueue is empty. discoveredSet shows the visit order. Note that each vertex's distance from the start is shown on the graph.

Breadth First Search – Algorithm
• When the BFS algorithm first encounters a vertex, that vertex is said to have been discovered.
• In the BFS algorithm, the vertices in the queue are called the frontier: vertices discovered thus far but not yet visited.
• Because each vertex is visited at most once, an already-discovered vertex is not enqueued again.
• A "visit" may mean to print the vertex, append the vertex to a list, compare vertex data to a value and return the vertex if found, etc.

Depth First Search
• Another way to traverse a graph is depth first search (DFS).
• A depth-first search is a traversal that visits a starting vertex, then visits every vertex along each path starting from that vertex to the path's end before backtracking.

Depth First Search – Visualization – 1
• Depth-first search starting at A descends along a path to the path's end before backtracking.

Depth First Search – Visualization – 2
• Reach the path's end: backtrack to F, visit F's other adjacent vertex (E).
• B already visited; backtrack again.

Depth First Search – Visualization – 3
• Backtracked all the way to A. Visit A's other adjacent vertex (D).
• No other adjacent vertices: done.

Depth First Search – Exercise
• Perform a depth-first search of the graph below. Assume the starting vertex is E.
• Which vertex is visited first?
• Assume DFS traverses the following vertices: E, A, B. Which vertex is visited next?
• Assume DFS traverses the following vertices: E, A, B, C. Which vertex is visited next?
• Is the following a valid DFS traversal?
• E, D, F, A, B, C

Depth First Search Algorithm
• A recursive DFS can be implemented using the program stack instead of an explicit stack.
• The recursive DFS algorithm is first called with the starting vertex.
• If the vertex has not already been visited, the recursive algorithm visits the vertex and performs a recursive DFS call for each adjacent vertex.

RecursiveDFS(currentV) {
   if ( currentV is not in visitedSet ) {
      Add currentV to visitedSet
      "Visit" currentV
      for each vertex adjV adjacent to currentV {
         RecursiveDFS(adjV)
      }
   }
}

Summary
• What is the main difference between BFS and DFS?
• Which is more efficient, and how can we tell?
• What are some applications of BFS and DFS?

COIS 3020: Data Structures & Algorithms II
Alaadin Addas
Week 3 Lecture 1

Overview
• Announcements
• Recap
• Depth First Search
• Demonstration
• Assignment 1
• Summary

Announcements
• As of next week, we will begin in-person learning.
• Please refer to the academic timetable for the location.
• We have to be very flexible (watch out for announcements).
• To my knowledge there will be no audio/video recording (I am not even sure the room we are in is equipped).
• I pushed back the due date on Assignment 1 slightly.
• Graduate Teaching Assistant: Mohsen Hosseinibaghdadabadi
• mhosseinibaghdadabad@trentu.ca
• Thursday 5PM office hours for the rest of the semester on MS Teams.

Recap
• Last week we discussed the breadth first search algorithm in some detail.
• What are the basic tenets of a BFS?
• Why are breadth first searches used?
• Recap figure: vertex A is dequeued from frontierQueue and visited.

Depth First Search
• Another way to traverse a graph is depth first search (DFS).
• A depth-first search is a traversal that visits a starting vertex, then visits every vertex along each path starting from that vertex to the path's end before backtracking.
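The RecursiveDFS pseudocode above translates almost line for line. A Python sketch (the course demos use C#), where the "visit" appends to a list; the example adjacency dictionary is assumed, matching the BFS example graph:

```python
def recursive_dfs(adj, current, visited=None, order=None):
    """Recursive DFS mirroring the RecursiveDFS pseudocode."""
    if visited is None:
        visited, order = set(), []
    if current not in visited:     # if currentV is not in visitedSet
        visited.add(current)       # add currentV to visitedSet
        order.append(current)      # the "visit"
        for neighbor in adj[current]:
            recursive_dfs(adj, neighbor, visited, order)
    return order

adj = {"A": ["B", "D"], "B": ["A", "E", "F"], "D": ["A", "E"],
       "E": ["B", "D", "F"], "F": ["B", "E", "C"], "C": ["F"]}
print(recursive_dfs(adj, "A"))  # ['A', 'B', 'E', 'D', 'F', 'C']
```

The program stack does the backtracking: when a vertex has no unvisited neighbours, the recursive call simply returns to its caller.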
Depth First Search – Visualization – 1
• Depth-first search starting at A descends along a path to the path's end before backtracking.

Depth First Search – Visualization – 2
• Reach the path's end: backtrack to F, visit F's other adjacent vertex (E).
• B already visited; backtrack again.

Depth First Search – Visualization – 3
• Backtracked all the way to A. Visit A's other adjacent vertex (D).
• No other adjacent vertices: done.

Depth First Search – Exercise
• Perform a depth-first search of the graph below. Assume the starting vertex is E.
• Which vertex is visited first?
• Assume DFS traverses the following vertices: E, A, B. Which vertex is visited next?
• Assume DFS traverses the following vertices: E, A, B, C. Which vertex is visited next?
• Is the following a valid DFS traversal? E, D, F, A, B, C

Depth First Search Algorithm
• A recursive DFS can be implemented using the program stack instead of an explicit stack.
• The recursive DFS algorithm is first called with the starting vertex.
• If the vertex has not already been visited, the recursive algorithm visits the vertex and performs a recursive DFS call for each adjacent vertex.

RecursiveDFS(currentV) {
   if ( currentV is not in visitedSet ) {
      Add currentV to visitedSet
      "Visit" currentV
      for each vertex adjV adjacent to currentV {
         RecursiveDFS(adjV)
      }
   }
}

Demonstration

Assignment 1

Summary
• What is the main difference between BFS and DFS?
• Which is more efficient, and how can we tell?
• What are some applications of BFS and DFS?

Treap
Cecilia Aragon and Raimund Seidel, 1989

Background
› A treap is a randomly built binary search tree.
› The expected time complexity to insert and remove an item from the treap is O(log n).
› Inspired as a combination of a binary search tree and a binary heap (hence, treap).

Review
› Property of a binary search tree T: for each node p in T, all values in the
  1. left subtree of p are less than the value at p;
  2. right subtree of p are greater than the value at p.
› Property of a binary heap T: for each node p in T (except the root), the priority at p is less than or equal to the priority of its parent.

Data structure
› Like a binary search tree, except each node also contains a random priority.

public class Node<T> where T : IComparable
{
    private static Random R = new Random();

    public T       Item     { get; set; }
    public int     Priority { get; set; }
    public Node<T> Left     { get; set; }
    public Node<T> Right    { get; set; }
    ...
}

class Treap<T> : ISearchable<T> where T : IComparable
{
    private Node<T> Root;   // Reference to the root of the Treap
    ...
}

Key methods
› void Add (T item) – inserts item into the treap (duplicate items are not permitted).
› bool Contains (T item) – returns true if item is found; false otherwise. Same as the binary search tree.
› void Remove (T item) – removes (if possible) item from the treap.

Support methods
› Node<T> LeftRotate (Node<T> root) – the right child Y of root X becomes the new subtree root; X becomes Y's left child, and Y's old left subtree becomes X's right subtree.
› Node<T> RightRotate (Node<T> root) – the mirror image: the left child becomes the new subtree root.

Add
› Basic strategy:
  – Create a node that contains an item and a random priority.
  – Add the node to the treap as you would a binary search tree, using its item to determine ordering.
  – Now, using its random priority, rotate the node (either left or right) up the treap until the parent node has a higher priority or the root is reached.

Consider the following treap (Item / Priority)
› 50/0.83 at the root, with children 20/0.47 and 60/0.25; 20's children are 10/0.34 and 40/0.42; 40's left child is 30/0.14.
› The treap satisfies the property of a
  1. binary search tree using Item
  2. binary heap using Priority

Add 15 with priority 0.68
› 15 is inserted as a BST leaf (right child of 10), then rotated up past 10 and 20 until its parent, the root 50 (priority 0.83), has the higher priority. Stop!
› The treap still satisfies the property of a
  1. binary search tree using Item
  2. binary heap using Priority

Add 25 with priority 0.99
› 25 is inserted as a BST leaf (left child of 30), then rotated up past 30, 40, 20, 15, and 50 until it becomes the root.
› Final treap: 25/0.99 at the root, with children 15/0.68 and 50/0.83. Yes, the treap still satisfies the property of a
  1. binary search tree using Item
  2. binary heap using Priority

Exercises
› Give a sequence of n items/priorities that yields a treap with a worst-case depth of n-1.
› Can any sequence of n items yield a treap of depth n-1 for some sequence of random priorities? Argue why or why not.
› Add (35, 0.75) to the final treap.

Remove
› Basic strategy:
  – Find the node that contains the item to remove.
  – Rotate the node with its child with the higher priority. If the left child has the higher priority, rotate right; if the right child has the higher priority, rotate left. Assume that an empty child has an arbitrarily small priority.
  – Continue to rotate the node downward in this manner until it is a leaf node. Then simply remove the node (and item).

Remove 50
› 50/0.83 is rotated downward with its higher-priority child until it becomes a leaf below 60/0.25, then the leaf is "snipped off".

Remove 25
› The root 25/0.99 is repeatedly rotated with its higher-priority child (rotate right when the left child's priority is higher, left otherwise) until it is a leaf, then "snipped off".
› Final treap: 15/0.68 at the root, and the treap still satisfies the property of a
  1. binary search tree using Item
  2. binary heap using Priority
  Hooray!

Exercises
› Remove 15 from the final treap.
› Could you use the traditional way of removing a value from a binary search tree and apply it to a treap? Show how.
› Re-implement the Remove method without recursion.
› * Given a treap of height h, provide an example that requires 2h rotations to bring the root of the treap to a leaf position.

Expected time complexity
› The time complexity to Add or Remove an item is O(h), where h is the height of the treap.
  – The Add method rotates the inserted item (node) up one level at a time, until it reaches the root in the worst case.
  – The Remove method may carry out 2h rotations in the worst case to bring the root item (node) to a leaf position (see Exercise).
› Because the treap builds a random binary search tree, the expected height of the tree is O(log n) and therefore the expected time complexity of Add and Remove is O(log n).
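The Add strategy (BST insert, then rotate up by priority) can be condensed into a short sketch. This is an illustrative Python version, not the course's C# implementation; priorities are pinned here to reproduce the slide example, whereas a real treap would draw them at random:

```python
import random

class TreapNode:
    def __init__(self, item, priority=None):
        self.item = item
        # Random priority; each parent must outrank its children (max-heap).
        self.priority = random.random() if priority is None else priority
        self.left = self.right = None

def rotate_right(root):
    # Left child rises to become the new subtree root.
    temp, root.left = root.left, root.left.right
    temp.right = root
    return temp

def rotate_left(root):
    # Right child rises to become the new subtree root.
    temp, root.right = root.right, root.right.left
    temp.left = root
    return temp

def add(root, item, priority=None):
    """BST insert, then rotate the new node up while it outranks its parent."""
    if root is None:
        return TreapNode(item, priority)
    if item < root.item:
        root.left = add(root.left, item, priority)
        if root.left.priority > root.priority:
            root = rotate_right(root)
    elif item > root.item:
        root.right = add(root.right, item, priority)
        if root.right.priority > root.priority:
            root = rotate_left(root)
    return root   # duplicates are ignored

# Rebuild the slide example, then add 15 with priority 0.68.
root = None
for item, pri in [(50, 0.83), (20, 0.47), (60, 0.25),
                  (10, 0.34), (40, 0.42), (30, 0.14), (15, 0.68)]:
    root = add(root, item, pri)
print(root.item, root.left.item)  # 50 15 -- matching the slides
```

After the insert, 15 sits directly under the root with 10 and 20 as its children, exactly as in the Add-15 walkthrough.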
Skip List William Pugh, 1989 Background › A skip list is a randomly built binary search tree › The expected time complexity to insert and remove an item from the skip list is O(log n) › Inspired as a generalization of the singly-linked list Consider the following linked list head Item 10 20 30 40 50 60 Height 1 1 1 1 1 1 Next ● Node In the worst case, it takes O(n) time to: 1) insert an item, 2) find an item, and 3) remove an item What if we added a level to node 40? head Item 10 20 30 40 50 60 Height 1 1 1 2 1 1 Next 0 1 ● ● What if we added two levels to node 20? head Item 10 20 30 40 50 60 Height 1 3 1 2 1 1 Next 0 1 2 ● ● head Item 10 20 30 40 50 60 Height 1 3 1 2 1 1 Next 0 ● 1 2 ● ● What if we added two levels to node 60? head Item 10 20 30 40 50 60 Height 1 3 1 2 1 3 Next 0 ● 1 2 ● ● head Item 10 20 30 40 50 60 Height 1 3 1 2 1 3 Next 0 ● 1 ● 2 ● What if we added three levels to node 50? head Item 10 20 30 40 50 60 Height 1 3 1 2 4 3 Next 0 ● 1 ● 2 ● 3 head Item 10 20 30 40 50 60 Height 1 3 1 2 4 3 Next 0 ● 1 ● 2 ● 3 ● Can you see the binary search tree? head Item 10 20 30 40 50 60 Height 1 3 1 2 4 3 Next 0 ● 1 ● 2 ● 3 ● Can you see the binary search tree? 
head Item 10 20 30 40 50 60 Height 1 3 1 2 4 3 Next 0 ● 1 ● 2 ● 3 ● Root Methods › void Insert (T item) – inserts item into the skip list (duplicate items are permitted) › bool Contains (T item) – Returns true if item is found; false otherwise › void Remove (T item) – Removes one occurrence of item (if possible) Starting from scratch head 4 ● 3 ● 2 ● 1 ● 0 ● Node Next[ ] Node class -- int Height -- T Value maxHeight = 0 // current maximum height of the skip list The height of a node is randomly set according to the following distribution Insert 30 head Height Probability 1 1/2 2 1/4 3 1/8 4 ● 3 4 1/16 ● 2 h 1/2h ● 1 ● 0 ● but is capped at one more than the current maxHeight of the skip list 0 ● -- 1 -- 30 maxHeight = 0 Insert 30 head 4 ● 3 ● 2 ● 1 ● 0 ● 0 ● -- 1 -- 30 maxHeight = 1 Insert 30 cur head 4 ● 3 ● 2 ● 1 ● 0 ● newNode.Next[i] = cur.Next[i]; cur.Next[i] = newNode; 0 ● -- 1 -- 30 maxHeight = 1 Insert 30 4 ● 3 ● 2 ● 1 ● newNode.Next[i] = cur.Next[i]; cur.Next[i] = newNode; 0 head 0 ● -- 1 -- 30 maxHeight = 1 Insert 40 4 ● 3 ● 2 ● 1 ● 0 head 0 ● 1 ● 0 ● -- 1 2 -- 30 40 maxHeight = 2 Insert 40 cur 4 ● 3 ● 2 ● 1 ● 0 head 0 ● 1 ● 0 ● -- 1 2 -- 30 40 maxHeight = 2 Insert 40 cur 4 ● 3 ● 2 ● 1 0 head 0 ● 1 ● 0 ● -- 1 2 -- 30 40 maxHeight = 2 Insert 40 cur 4 ● 3 ● 2 ● while (cur.Next[i] != null && cur.Next[i].Item.CompareTo(item) < 0) cur = cur.Next[i]; 1 0 head 0 ● 1 ● 0 ● -- 1 2 -- 30 40 maxHeight = 2 Insert 40 4 ● 3 ● 2 ● cur 1 0 head 0 ● 1 ● 0 ● -- 1 2 -- 30 40 maxHeight = 2 Insert 40 4 ● 3 ● 2 ● cur 1 0 head 0 1 ● 0 ● -- 1 2 -- 30 40 maxHeight = 2 Insert 60 cur 4 ● 3 ● 2 ● for (int i = maxHeight - 1; i >= 0; i--) 1 0 head 0 2 ● 1 ● 1 ● 0 ● 0 ● -- 1 2 1 -- 30 40 60 maxHeight = 3 Insert 60 cur 4 ● 3 ● 2 1 0 head 0 2 ● 1 ● 1 ● 0 ● 0 ● -- 1 2 1 -- 30 40 60 maxHeight = 3 Insert 60 cur 4 ● 3 ● 2 1 0 head 0 2 ● 1 ● 1 ● 0 ● 0 ● -- 1 2 1 -- 30 40 60 maxHeight = 3 Insert 60 4 ● 3 ● cur 2 1 0 head 0 2 ● 1 ● 1 ● 0 ● 0 ● -- 1 2 1 -- 30 40 60 maxHeight = 3 Insert 60 4 ● 3 ● 
Skip List – Insert walkthrough
[Slide-by-slide skip-list diagrams omitted; only the step annotations are reproduced.]

Insert 60 (height 3; maxHeight = 3)

Insert 10 (height 1)
› 10 < 60: move down
› 10 < 40: move down
› 10 < 30 and height within range: adjust references
  newNode.Next[i] = cur.Next[i];
  cur.Next[i] = newNode;

Insert 50 (height 4; maxHeight becomes 4)
› cur.Next[i] == null and height within range: adjust references
› 50 < 60 and height within range: adjust references
› 50 > 40: move across
› 50 < 60 and height within range: adjust references

Insert 20 (height 3)

Final Skip List
› Items 10, 20, 30, 40, 50, 60 with node heights 1, 3, 1, 2, 4, 3; maxHeight = 4

Phew!

Exercises (using the Final Skip List)
› Insert 35 with a height of 4.
› Insert 45 with a height of 1.
› Insert 40 with a height of 3 (a duplicated item).
› What is the maximum height of the next insertion?
› What is the probability that the height of the next node is 5? Be careful.
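The insertion steps above can be condensed into code. The following is a minimal illustrative skip list, not the implementation posted on Blackboard: node heights are supplied by the caller so the fixed heights used in the walkthrough (60 with height 3, 10 with height 1, 50 with height 4, 20 with height 3) can be reproduced exactly.

```csharp
using System;

// Minimal skip list sketch (illustrative only; integer items).
class SkipList
{
    private class Node
    {
        public int Item;
        public Node[] Next;                  // Next[i] = successor at level i
        public Node(int item, int height) { Item = item; Next = new Node[height]; }
    }

    private Node head = new Node(int.MinValue, 32);  // header node, max 32 levels
    private int maxHeight = 1;

    // Height is passed in by the caller (normally it is chosen at random).
    public void Insert(int item, int height)
    {
        if (height > maxHeight) maxHeight = height;
        Node cur = head;
        Node newNode = new Node(item, height);
        for (int i = maxHeight - 1; i >= 0; i--)     // move down level by level
        {
            while (cur.Next[i] != null && cur.Next[i].Item < item)
                cur = cur.Next[i];                   // move across
            if (i < height)                          // height within range:
            {
                newNode.Next[i] = cur.Next[i];       // adjust references
                cur.Next[i] = newNode;
            }
        }
    }

    public bool Contains(int item)
    {
        Node cur = head;
        for (int i = maxHeight - 1; i >= 0; i--)     // down
        {
            while (cur.Next[i] != null && cur.Next[i].Item < item)
                cur = cur.Next[i];                   // across
            if (cur.Next[i] != null && cur.Next[i].Item == item)
                return true;
        }
        return false;
    }
}
```

Inserting 30, 40, 60, 10, 50, 20 with the heights from the slides reproduces the Final Skip List above.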
Contains 30 (using the Final Skip List)
› 30 < 50: move down
› 30 > 20: move across
› 30 < 50: move down
› 30 < 40: move down
› 30 is reached. Found!

Exercises (using the Final Skip List)
› Show the sequence of references that are followed to find:
– 60
– 30
– 10
– 70
– 45
– 35
› Draw two instances of the skip list where the Contains method takes linear time (i.e., O(n) steps).

Rules of thumb
› Drop down a level when the item you wish to insert, find, or remove is less than cur.Next[i].Item
– In the case of Insert, references are adjusted first when the height of the new node is within range
› Move across a level when the item you wish to insert, find, or remove is greater than cur.Next[i].Item
› When the item is equal to cur.Next[i].Item:
– Insert: references are adjusted first when the height of the new node is within range (i.e., duplicate items)
– Contains: return true
– Remove: cur.Next[i] = cur.Next[i].Next[i]

Remove 20 (starting with our final skip list)
› 20 < 50: move down
› 20 found: adjust references and move down
› 20 found: adjust references and move down
› 20 > 10: move across
› 20 found: adjust references and move down
› Result: items 10, 30, 40, 50, 60 with heights 1, 1, 2, 4, 3; maxHeight = 4

Remove 50
› Note: maxHeight decreases by 1 whenever head.Next[i] is reset to null
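The rules of thumb translate almost directly into code. A minimal sketch follows, using the same hypothetical integer skip list as before (caller-supplied heights, not the Blackboard implementation); Remove shows the cur.Next[i] = cur.Next[i].Next[i] splice and the maxHeight decrease when a top level empties.

```csharp
using System;

// Illustrative skip list with Insert, Contains, and Remove (sketch only).
class SkipList
{
    private class Node
    {
        public int Item;
        public Node[] Next;
        public Node(int item, int h) { Item = item; Next = new Node[h]; }
    }

    private Node head = new Node(int.MinValue, 32);
    private int maxHeight = 1;

    public void Insert(int item, int height)
    {
        if (height > maxHeight) maxHeight = height;
        Node cur = head, newNode = new Node(item, height);
        for (int i = maxHeight - 1; i >= 0; i--)
        {
            while (cur.Next[i] != null && cur.Next[i].Item < item)
                cur = cur.Next[i];                    // move across
            if (i < height)                           // height within range
            { newNode.Next[i] = cur.Next[i]; cur.Next[i] = newNode; }
        }
    }

    public void Remove(int item)
    {
        Node cur = head;
        for (int i = maxHeight - 1; i >= 0; i--)      // move down
        {
            while (cur.Next[i] != null && cur.Next[i].Item < item)
                cur = cur.Next[i];                    // move across
            if (cur.Next[i] != null && cur.Next[i].Item == item)
                cur.Next[i] = cur.Next[i].Next[i];    // splice out at level i
            // maxHeight decreases whenever the top level becomes empty
            if (i == maxHeight - 1 && head.Next[i] == null && maxHeight > 1)
                maxHeight--;
        }
    }

    public bool Contains(int item)
    {
        Node cur = head;
        for (int i = maxHeight - 1; i >= 0; i--)
        {
            while (cur.Next[i] != null && cur.Next[i].Item < item)
                cur = cur.Next[i];
            if (cur.Next[i] != null && cur.Next[i].Item == item)
                return true;
        }
        return false;
    }
}
```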
Remove 50 (continued)
› Follow 50 down and across; at each level where 50 is found, adjust references (cur.Next[i] = cur.Next[i].Next[i])
› Since the node for 50 has height 4, head.Next[3] is reset to null and maxHeight decreases to 3
› Result: items 10, 30, 40, 60 with heights 1, 1, 2, 3; maxHeight = 3

Removing a duplicated item
› To remove one instance of a duplicate item from the skip list, some interesting and subtle issues arise
› Consider the following skip list where the item 20 is duplicated 4 times:
  items 10, 20, 20, 20, 20, 60 with heights 1, 3, 1, 2, 4, 3; maxHeight = 4

Remove one instance of 20
› At each level where an instance of 20 is reached, decrease the height of that node by one
› Decrease maxHeight by one since cur.Next[3] == null afterwards
› Continue down the levels, decreasing the height of the node by one (again) at each match
Skip list after removing one instance of 20
› Items 10, 20, 20, 20, 60 with heights 1, 1, 2, 3, 3; maxHeight = 3

Exercises (using the Final Skip List)
› Remove 60, 10, and 45.
› If maxHeight = 1 then show that Remove behaves exactly as the Remove method of a singly-linked list.
› If an item is duplicated in nodes of
– increasing height
– decreasing height
show how one instance of the item is removed.

How to determine the random height?
› Strategy:
– Generate a positive random 32-bit integer R
– Let the height be the number of consecutive lower-order bits that are equal to one
– Set the height of the node equal to min(height + 1, maxHeight + 1)

Example
› Suppose the lower-order bits of R are ...0111:
– R & 1 == 1: height = 1; R >>= 1 (right shift one place)
– R & 1 == 1: height = 2; R >>= 1
– R & 1 == 1: height = 3; R >>= 1
– R & 1 == 0: the loop ends
› The height of the new node is set at 4.

Implementation

  int height = 0;
  int R = rand.Next();  // R is a random 32-bit positive integer
  while ((height < maxHeight) && ((R & 1) == 1))
  {
      height++;   // Equal to the number of consecutive lower-order 1-bits
      R >>= 1;    // Right shift one bit
  }
  if (height == maxHeight)
      maxHeight++;
  Node newNode = new Node(item, height + 1);

Expected time complexity
› Determined by the following loop structure:

  for (int i = maxHeight - 1; i >= 0; i--)      // Down
  {
      while (cur.Next[i] != null && cur.Next[i].Item.CompareTo(item) < 0)
          cur = cur.Next[i];                    // Across
      ...
  }

› The expected height of the skip list is O(log n)
– The for loop is expected to iterate O(log n) times
› The number of nodes at level i+1 is expected to be half the number of nodes at level i. Therefore, for each link at level i+1, there are an expected 2 links at level i.
– The while loop is expected to iterate O(1) times
› The overall expected time complexity is O(log n) for Insert, Contains, and Remove, since all three methods use the loop structure above.

Time Complexity
• How do we calculate time complexity? I have seen many errors throughout the years when it comes to calculating time complexity, so let's take some time to clarify a few concepts.
• Why does it matter? Put simply, the time complexity of an operation on a data structure determines whether that structure is usable for a particular use case.
• Searching often? Maybe don't pick a structure that has a bad average case for search.
• You should probably know the time complexities of common operations on frequently used data structures by heart. I usually look up the more obscure ones.

Time Complexity – Great to Very Bad
• O(1): Constant Time
• O(log n): Logarithmic Time
• O(n): Linear Time
• O(n^2): Quadratic Time
• O(n^3): Cubic Time

Time Complexity – Side Note
• When you have n = 3, it will not matter what the time complexity is when timing the code.
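The height-generation loop can be exercised on its own. Below is a sketch of the same bit-counting strategy, wrapped in a hypothetical helper so its distribution can be checked; each additional level should be reached about half as often as the one below it.

```csharp
using System;

class RandomHeight
{
    static Random rand = new Random();

    // Count consecutive lower-order 1-bits of a random integer, capped by
    // maxHeight (same strategy as the lecture implementation); the node
    // height is 1-based, so height + 1 is returned.
    public static int Next(int maxHeight)
    {
        int height = 0;
        int R = rand.Next();                 // random non-negative 32-bit integer
        while ((height < maxHeight) && ((R & 1) == 1))
        {
            height++;                        // one more consecutive 1-bit
            R >>= 1;                         // right shift one bit
        }
        return height + 1;
    }
}
```

Since each lower-order bit is 1 with probability 1/2, the probability that a node reaches height h is about 2^-(h-1), which is what gives the skip list its expected O(log n) height.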
The difference between O(1) and O(n^3) will be a matter of milliseconds. When you have n = 10000000000, it will matter!

Polynomial Function Example
• Let us take an example polynomial function and figure out its time complexity.
• Two golden rules:
• Drop the constant multipliers (we don't care about them for time complexity purposes).
• Drop all the lower-order terms.
• F(n) = 2n^2 + 3n + 1 — what is the time complexity?

Simple Loop Example

  for (i = 1; i <= n; i++) {
      x = y + b;  // constant time
  }

• What is the time complexity? Remember to neglect the constant!

Nested Loop Example

  for (i = 1; i <= n; i++) {
      for (j = 1; j <= n; j++) {
          x = y + whatever;  // constant time
      }
  }

• What is the time complexity?

Sequential Statements
• What if we had sequential statements? The same rules apply!

If-Else Statements

  if (some condition) {
      loop;         // O(n)
  } else {
      nested loop;  // O(n^2)
  }

• What is the time complexity?

Space Complexity
• Space complexity is also something that you should look into.
• Some algorithms take less time but more space; in other cases you have the time but not the space (e.g., limited-resource environments).

Augmented Treap

Background
An augmented data structure (like an augmented treap or an interval tree) is built upon an underlying data structure. New information (data) is added which:
– is easily maintained and modified by the existing operations of the underlying data structure, and
– is used to efficiently support new operations.

Process of augmenting a data structure*
1. Choose an underlying data structure
2. Determine the additional information (data) to maintain in the underlying data structure
3. Verify that the additional information can be easily maintained by the existing operations (like Insert and Remove)
4. Develop new operations

* Section 14.2 of CLRS

Rationale for an augmented treap
› Suppose we wish to add two methods to the treap: one that returns the item of a given rank and the other that returns the rank of a given item

  public T Rank(int i) { ... }
  public int Rank(T item) { ... }

› The rank of an item is defined as its position in the linear order of items

Using the original treap
› To find the item of a given rank or to find the rank of a given item in the original treap requires us to traverse the underlying binary search tree in order. Why?
› The expected time complexity is O(n) since half the items are visited on average

Using the augmented treap
› By adding one data member to the original implementation of the treap, the expected time complexity for the two Rank methods can be reduced to O(log n)
› But what data member is added, where is it added, and can it be easily maintained?

New data member: NumItems
› The Node class is augmented with the data member NumItems, which stores the size of the treap rooted at each instance of Node
› Calculated as:

  p.NumItems = 1;
  if (p.Left != null)
      p.NumItems += p.Left.NumItems;
  if (p.Right != null)
      p.NumItems += p.Right.NumItems;

Augmented data structure

  public class Node<T> where T : IComparable
  {
      private static Random R = new Random();
      public T Item { get; set; }
      public int Priority { get; set; }   // Randomly generated
      public int NumItems { get; set; }   // Augmented information (data)
      public Node<T> Left { get; set; }
      public Node<T> Right { get; set; }
      ...
  }

  class Treap<T> : ISearchable<T> where T : IComparable
  {
      private Node<T> Root;  // Reference to the root of the Treap
      ...
  }

Can NumItems be easily maintained?
› Yes
› When an item is added to or removed from an augmented treap, NumItems is updated in O(1) time for each node along the path from the point of insertion or removal to the root of the treap.
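Step 3 of the augmentation process — checking that NumItems is easily maintained — can be illustrated with a small sketch. The Node shape and UpdateNumItems helper below are hypothetical (not the Blackboard implementation); LeftRotate shows the extra O(1) updates performed after a rotation in Add.

```csharp
using System;

// Sketch: a bare-bones node carrying NumItems (illustrative only).
class Node
{
    public int Item;
    public int NumItems = 1;
    public Node Left, Right;
    public Node(int item) { Item = item; }

    // Recompute NumItems from the children: an O(1) update.
    public void UpdateNumItems()
    {
        NumItems = 1;
        if (Left != null) NumItems += Left.NumItems;
        if (Right != null) NumItems += Right.NumItems;
    }
}

class Rotations
{
    // LeftRotate around root; the old root becomes the left child of its
    // former right child. NumItems of the demoted node is recomputed
    // first, then that of the new root.
    public static Node LeftRotate(Node root)
    {
        Node temp = root.Right;
        root.Right = temp.Left;
        temp.Left = root;
        root.UpdateNumItems();   // update the child before the parent
        temp.UpdateNumItems();
        return temp;
    }
}
```

Calling UpdateNumItems for each node on the path back to the root (and for the two rotated nodes) is exactly the O(1)-per-node maintenance described above.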
› Note that both an insertion and a removal take place at a leaf node.
› An additional update is also needed whenever a rotation is performed on the way up the treap for the Add method.

LeftRotate / RightRotate
› NumItems is updated on the way back up; after a LeftRotate or RightRotate (Add only), NumItems is also updated for the rotated nodes.
[Rotation diagrams omitted: X with right child Y and subtrees T1, T2, T3 becomes Y with left child X, and symmetrically for RightRotate.]

Exercises
› Explain why NumItems is updated for the left (right) child of the new root after a LeftRotate (RightRotate) is performed in the Add method.
› Explain why NumItems need not be updated after a LeftRotate or RightRotate is performed in the Remove method.

How can NumItems be used to support Rank?
› Two versions of Rank:
– Return the item of a given rank (call this Rank I)
– Return the rank of a given item (call this Rank II)

Rank I: Return the item of a given rank
Consider the following treap (Priority is omitted; each node shows Item and NumItems):
– 50 (6) at the root, with children 20 (4) and 60 (1); 20 has children 10 (1) and 40 (2); 40 has left child 30 (1)

Find the item of rank 3
› At 50: 3 < 5 (the size of the left subtree plus the root), go left
› At 20: 3 > 2, go right; the rank is reduced by the size of the left subtree of p plus 1 (for p itself), leaving rank 1
› At 40: 1 < 2, go left
› At 30: 1 = 1, return 30

Find the item of rank 6
› At 50: 6 > 5, go right (rank reduced to 1)
› At 60: 1 = 1, return 60

Find the item of rank 1
› At 50: 1 < 5, go left
› At 20: 1 < 2, go left
› At 10: 1 = 1, return 10

Exercises
› Add two children to each leaf node (10, 30 and 60) and adjust NumItems for each Node in the treap.
› Find the items of rank 4, 8 and 11 in your treap above.
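The Rank I walkthrough above can be written as a short loop over NumItems. This is an illustrative sketch with a bare-bones node (the course treap also stores Priority); ranks are 1-based as in the slides.

```csharp
using System;

// Minimal node for the sketch (illustrative, not the Blackboard class).
class Node
{
    public int Item, NumItems;
    public Node Left, Right;
}

class RankDemo
{
    // Rank I: return the item of rank i (1-based).
    public static int Rank(Node p, int i)
    {
        while (p != null)
        {
            int leftSize = (p.Left == null) ? 0 : p.Left.NumItems;
            if (i <= leftSize)
                p = p.Left;                    // rank lies in the left subtree
            else if (i == leftSize + 1)
                return p.Item;                 // size of left subtree plus the root
            else
            {
                i -= leftSize + 1;             // reduce by left subtree plus 1
                p = p.Right;
            }
        }
        throw new ArgumentException("rank out of range");
    }
}
```

On the example treap, Rank(root, 3) follows exactly the slide's path (left to 20, right to 40, left to 30) and returns 30.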
Rank II: Return the rank of a given item
Consider the same treap. Initially, r (the calculated rank) is initialized to 0.

Find the rank of 40
› At 50: 40 < 50, go left; going left, r remains the same (r = 0)
› At 20: 40 > 20, go right; going right, r is increased by the size of the left subtree plus 1 (r = 2)
› At 40: found; when found, r is also increased by the size of the left subtree plus 1; return r = 4

Find the rank of 60
› At 50: 60 > 50, go right (r = 5)
› At 60: found; return r = 6

Find the rank of 10
› At 50: 10 < 50, go left (r = 0)
› At 20: 10 < 20, go left (r = 0)
› At 10: found; return r = 1

Exercises
› Using the implementation of the augmented treap given on Blackboard, initialize an augmented treap with 15 items.
› Select 4 items in the treap and find their ranks.

Time complexities of Rank I and Rank II
› In both cases, the maximum number of comparisons is h+1, where h is the height of the treap
› Since the expected height of the treap is O(log n), the expected time complexities of Rank I and Rank II are also O(log n)
› The expected time complexity of O(log n) is far superior to the average-case time complexity of O(n) without NumItems

Rope (Assignment 2)

Definition
› A rope is an augmented binary tree that is used to represent very long strings
› Each leaf node in the rope stores a small part of the string
› Each node p is augmented with the length of the string represented by the rope rooted at p

Rope Node*

  class Node
  {
      private string s;       // only stored in a leaf node
      private int length;     // augmented data
      private Node left, right;
      ...
  }

  class Rope
  {
      private Node root;
      ...
  }

* Properties can be defined instead

Methods

Constructor: Rope(string S)
› Let's start with the string "This_is_an_easy_assignment."
› Create a balanced rope (binary tree)
› Assume that the maximum length of a leaf substring is 4 (on the assignment it is 10)
› Strategy:
– Create a root node
– Recursively build the rope from the top down, using the two halves of the given string to create the left and right ropes of the root
› For the example string (length 27), the root splits into halves of length 13 and 14, and recursion continues down to the leaves "This", "_is", "_an", "_ea", "sy_a", "ssi", "gnme", "nt."

Rope Concatenate(Rope R1, Rope R2)
› Strategy:
– Create a new rope R and set the left and right subtrees of its root to the roots of R1 and R2 respectively
– In the example, concatenating the subropes of lengths 13 and 14 gives a root of length 27

void Split(int i, Rope R1, Rope R2)
› Strategy (assuming indices start at 0):
– Find the node p containing index i of the string
– If the index splits the substring at p into two, then insert the two parts of the string between (new) left and right subtrees of p
– Break the rope into two parts, duplicating nodes whose right children cross the split
› Split(8, R1, R2) on the example rope splits the leaf "_an" into "_a" and "n"; R1 then represents "This_is_a" (length 9) and R2 the remaining 18 characters
› Optimizations then clean up the duplicated nodes along the split boundary
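The constructor strategy can be sketched as a short recursion. The names below (RopeNode, RopeBuilder, MaxLeaf) are illustrative, and the rounding of the midpoint may differ from the slide diagrams.

```csharp
using System;

// Hypothetical rope node: S is non-null only at a leaf.
class RopeNode
{
    public string S;            // only stored in a leaf node
    public int Length;          // augmented data
    public RopeNode Left, Right;
}

class RopeBuilder
{
    const int MaxLeaf = 4;      // 4 as in the example; 10 on the assignment

    // Build a balanced rope top-down from the two halves of the string.
    public static RopeNode Build(string s)
    {
        RopeNode p = new RopeNode { Length = s.Length };
        if (s.Length <= MaxLeaf) { p.S = s; return p; }     // leaf
        int mid = s.Length / 2;
        p.Left = Build(s.Substring(0, mid));
        p.Right = Build(s.Substring(mid));
        return p;
    }

    // Reassemble the string by an inorder traversal of the leaves
    // (this is the ToString strategy described later).
    public static string ToString(RopeNode p)
    {
        if (p == null) return "";
        if (p.S != null) return p.S;
        return ToString(p.Left) + ToString(p.Right);
    }
}
```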
void Insert(string S, int i)
› Strategy (one split and two concatenations):
– Create a rope R1 using S
– Split the current rope at index i into ropes R2 and R3
– Concatenate R1, R2 and R3

void Delete(int i, int j)
› Strategy (two splits and one concatenation):
– Split the current rope at indices i - 1 and j to give ropes R1, R2 and R3
– Concatenate R1 and R3

string Substring(int i, int j)
› Strategy:
– Carry out a "pruned" traversal of the rope, using the indices i, j and the augmented data for length to curb unnecessary explorations of the rope

char CharAt(int i)
› Strategy:
– Using the augmented data for length, proceed either left or right at each node of the rope to find the character at the given index (cf. the Rank method of the Augmented Treap)

int IndexOf(char c)
› Strategy:
– Carry out a traversal of the rope, keeping a running count for the index, until the first occurrence of character c is found

void Reverse()
› Strategy:
– Modify the rope so that it represents the string in reverse
– Carry out a preorder traversal of the rope and swap the left and right children at each node
– Reverse the substrings at each leaf node

int Length()
› Strategy:
– Easy ... where is the length of the full string stored?

string ToString()
› Strategy:
– Carry out a traversal of the rope and concatenate (using the string method) the substrings at each leaf node
– Return the concatenated string

void PrintRope()
› Strategy:
– Print out the nodes of the rope in order (cf. Print of the Treap)

More Optimizations

Combining siblings with short strings
› Strategy:
– Concatenate two (leaf) sibling strings to create a longer substring and place it at the parent node
– For example, sibling leaves "Ti" (length 2) and "s" (length 1) are combined into "Tis" (length 3) at their parent
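The CharAt strategy above mirrors the Rank method of the augmented treap: descend left when the index falls within the left subtree's length, otherwise subtract that length and descend right. A minimal sketch with a hypothetical node shape:

```csharp
using System;

// Hypothetical rope node: S is non-null only at a leaf.
class RopeNode
{
    public string S;
    public int Length;          // augmented data: length of the (sub)string
    public RopeNode Left, Right;
}

class RopeOps
{
    // Return the character at 0-based index i, guided by Length.
    public static char CharAt(RopeNode p, int i)
    {
        while (p.S == null)                    // descend until a leaf
        {
            int leftLen = (p.Left == null) ? 0 : p.Left.Length;
            if (i < leftLen)
                p = p.Left;                    // index lies in the left rope
            else
            {
                i -= leftLen;                  // skip over the left rope
                p = p.Right;
            }
        }
        return p.S[i];                         // index within the leaf substring
    }
}
```

Since the augmented lengths steer the descent, CharAt visits one node per level rather than traversing the whole rope.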
Rebalancing
› (Simple) Strategy:
– Concatenate the string represented by the rope and rebuild the rope (as done in the constructor)

Rope II: Suggestions

Definition (recap)
› A rope is an augmented binary tree that is used to represent very long strings
› Each leaf node in the rope stores a small part of the string
› Each node p is augmented with the length of the string represented by the rope rooted at p

Suggestion
Maintain a full binary tree where each non-leaf node has two non-empty children

Suggestion
Familiarize yourself with the string methods of the library class

Suggestion
Several public methods require a private, recursive counterpart, including the Constructor

Suggestion
Tackle the three optimizations last

Suggestion
The Split method may be implemented as

  Node Split(Node p, int i)

which returns the root of the Rope that represents the right side of the split; p then becomes the root of the Rope that represents the left side of the split. For example, Split(p, 8) on the rope for "This_is_an_easy_assignment." (indices starting at 0) leaves p representing "This_is_a" (length 9) and returns the root of a rope of length 18.

Observation
Splits only occur at right children

Suggestion
The Concatenate method may be implemented as

  Node Concatenate(Node p, Node q)

which returns a Node whose left and right subtrees are p and q. Note: p or q could be null. In the example, concatenating the subropes of lengths 13 and 14 returns a root of length 27.
Suggestion
Do not concatenate with an empty Rope (null); the binary tree would not be full otherwise

Suggestion
The Substring method can be implemented using two Splits (cf. Delete) and two Concatenates to reconstruct the Rope, or by repeatedly using the CharAt method (but fewer grades will be awarded)

Suggestion
To insert a string at the beginning of a rope, pass an index of -1 and treat it as a special case

Suggestion
Draw up your test cases even before you program; they help to guide your implementation

Suggestion
Test a method at a time

Suggestion
Document methods before you implement them; this also helps to guide your implementation

Suggestion
This is a challenging assignment. Start early. Expect to spend days programming.

Midterm (Test) — Week 7 Topics

Graphs
1. Adjacency Matrix
2. Adjacency List (Assignment 1)
3. Depth-First and Breadth-First Search
[Review diagrams omitted: an adjacency matrix and an adjacency list for a directed, weighted graph with vertices yul, yyz, lax, lhr, yfb.]

Randomly-Built Binary Search Trees
1. Skip List
2. Treap
› The treap satisfies the property of a binary search tree using Item and of a binary heap using Priority
[Review diagrams omitted: the final skip list, a treap with priorities, and the rotation diagrams.]

Augmenting Data Structures
1. Augmented Treap (Rank I and II)
2. Rope (Assignment 2)
[Review diagrams omitted: the augmented treap with NumItems (Priority omitted) and the rope with (sub)string lengths.]
Part A
› Available from 7:00–8:30 PM EST on Tuesday, March 1, 2022
› Hosted on Blackboard
› Once you start, you have the full time to complete Part A

Types of Questions (tentative)
› True/False (15–20 questions)
› Multiple Choice/Multiple Answer (5–10 questions)
› Short Answer (3–5 questions)
› 50–60 marks

In Part A...
› Questions are presented in random order
› Backtracking is not permitted

Part B
› Available from 3:00–4:30 PM EST on Thursday, March 3, 2022
› Under Assessments on Blackboard
› Once you start, you have the full time to complete Part B

Coding in C# (tentative)
› 3–4 coding problems
› Implement and analyze a new method for a given class
› 30–40 marks

In Part B...
› Questions are available up front

Guidelines
1. All questions are based on the implementations seen in class or based on (partial) definitions provided on the test
2. Answers that do not relate to these implementations will be graded harshly
3. Penalties may apply for incorrect True/False, Multiple Choice, and Multiple Answer questions
4. Work through the Exercises provided on the PowerPoints
5. Appreciate the time complexity of different methods
6. Understand the behaviour of standard and augmented methods for each class (e.g., Add, Remove, Overlap)
7. Review the two assignments

Availability
› I will be available during the times for Part A and Part B
› You may ask questions via MS Teams for any clarifications (email will not be responded to)

Tries
Briandais, 1959; Fredkin, 1960

Tries
• A trie (AKA prefix tree) is a tree representing a set of strings.
• Each node represents a single character and has at most one child per distinct alphabet character.
• A terminal node is a node that represents a terminating character, which is the end of a string in the trie.

Trie – Visualized – 1
• The following trie represents a set of two strings. Each string can be determined by traversing the path from the root to a leaf.

Trie – Visualized – 2
• Suppose "cats" is added to the trie.
Adding another child for 'c' would violate the trie requirements.

Trie – Visualized – 3
• Instead, the existing child for 'c' is reused.

Trie – Visualized – 4
• Similarly, nodes for 'a' and 't' are reused. Node 't' has a new child added for 's'.

Trie – Visualized – 5
• Exactly one terminal node exists for each string. Other nodes may be shared by multiple strings.

Trie – Insertion
• Given a string, a trie insert operation creates a path from the root to a terminal node that visits all the string's characters in sequence.
• A current node pointer initially points to the root. A loop then iterates through the string's characters. For each character C:
1. A new child node is added only if the current node does not have a child for C.
2. The current node pointer is assigned the current node's child for C.
• After all characters are processed, a terminal node is added and returned.

Trie – Insertion – Algorithm
[Algorithm pseudocode omitted.]

Trie – Insertion – Visualization – 1
• The trie starts with an empty root node. Adding "APPLE" first adds a new child for 'A' to the root node.

Trie – Insertion – Visualization – 2
• New nodes are built for each remaining character in "APPLE".

Trie – Insertion – Visualization – 3
• The terminal node is added, completing the insertion of "APPLE".

Trie – Insertion – Visualization – 4
• When adding "APPLY", the first 4 character nodes are reused.

Trie – Insertion – Visualization – 5
• Node 'L' has no child for 'Y', so a new node is added. The terminal node is also added.

Trie – Insertion – Visualization – 6
• When adding "APP", only 1 new node is needed: the terminal node.

Trie – Search
• Given a string, a trie search operation returns the terminal node corresponding to that string, or null if the string is not in the trie.

Trie – Search – Algorithm
[Algorithm pseudocode omitted.]

Trie – Search – Visualization – 1
• The search for "PAPAYA" starts at the root and iterates through one node per character. The terminal node is returned, indicating that the string was found.
Trie – Search – Visualization – 2
• The search for "GRAPE" ends quickly, since the root has no child for 'G'.

Trie – Search – Visualization – 3
• The search for "PEA" gets to the node for 'A'. No terminal child node exists after 'A', so null is returned.

Trie – Remove
• Given a string, a trie remove operation removes the string's corresponding terminal node and all non-root ancestors with 0 children.

Trie – Remove – Algorithm and Visualizations 1–7
[Algorithm pseudocode and removal diagrams omitted.]

Time Complexities
• You are probably more familiar with time complexities that are tied to the number of elements in the structure.
• A trie is one of the data structures where the number of elements in the structure has no bearing on the time complexity.
• If a lookup table is used to find child nodes, then that operation takes O(1).
• Consequently, because finding a child node takes O(1), an insertion, removal, or search takes O(M), where M is the length of the string you are trying to search for, insert, or remove.

Trie – Activity (No Pun Intended)
• Which of the following tries are valid?
[Candidate trie diagrams omitted.]

R-Way Trie
de la Briandais (1959) and Fredkin (1960)

Introduction
› An r-way trie (from "retrieval") is an alternate way to implement a hash table. Items are inserted, removed, and retrieved from the trie based on <key, value> pairs.
› Each successive character of the key, such as:
– a digit where the key is a number
– a character where the key is a string
determines which of r ways to proceed down the tree.
› Values are stored at nodes that are associated with a key.

Data structure

  class Trie<T> : ITrie<T>
  {
      private Node root;          // Root node of the Trie

      private class Node
      {
          public T value;         // Value associated with a key; otherwise, default
          public int numValues;   // Number of descendent values of a Node
          public Node[] child;    // Branch for each letter 'a' .. 'z'
          ...
      }
      ...
  }

Key methods (pun intended)

  bool Insert(string key, T value)
  // Insert a <key, value> pair and return true if successful; false otherwise.
  // Duplicate keys are not permitted.

  bool Remove(string key)
  // Remove the <key, value> pair and return true if successful; false otherwise.

  T Value(string key)
  // Return the value associated with the given key.

Constructor, where <key, value> is <string, int>
› Assume keys are made up of the letters 'a', 'b' and 'c' only
› The root node starts with value -1 (the default), numValues 0, and child[0..2] all null
› Note: the root node is never removed (cf. the header node in a linked list)

Insert
› Basic strategy:
– Follow the path laid out by the key, "breaking new ground" if need be (i.e., creating new nodes) until the full key has been explored
– The value is then inserted at the final destination (node) unless the key has already been used (i.e., a value already exists for that key)

Insert <abba, 10>
› Follow a, b, b, a from the root, "breaking new ground" (creating a new node) at each step
› Since: 1) the key has been fully explored, and 2) the final node has a default value (-1), the value 10 is inserted and numValues is increased by one along the path back to the root
› Note that the key itself is NOT stored; it defines the path to a value

Insert <ab, 20>
› Follow a, b along existing nodes; the key has been fully explored and the node has a default value, so 20 is inserted and numValues is adjusted along the path

Insert <c, 30>
-1 1 10 1 Insert <c, 30> root -1 2 -1 2 20 2 -1 1 10 1 -1 0 Insert <c, 30> root -1 3 -1 2 20 2 -1 1 10 1 30 1 Insert <cbc, 30> root -1 3 -1 2 20 2 -1 1 10 1 30 1 Insert <cbc, 30> root -1 3 -1 2 20 2 -1 1 10 1 30 1 Insert <cbc, 30> root -1 3 -1 2 20 2 -1 1 10 1 30 1 Insert <cbc, 30> root -1 3 -1 2 30 1 20 2 -1 0 -1 1 10 1 Insert <cbc, 30> root -1 3 -1 2 30 1 20 2 -1 0 -1 1 10 1 Insert <cbc, 30> root -1 3 -1 2 30 1 20 2 -1 0 -1 1 10 1 -1 0 Insert <cbc, 30> root -1 4 -1 2 30 2 20 2 -1 1 -1 1 10 1 Values need not be unique, only keys 30 1 Insert <bba, 40> Insert <bc, 50> root -1 4 -1 2 30 2 20 2 -1 1 -1 1 10 1 30 1 Insert <bba, 40> Insert <bc, 50> root -1 6 -1 2 -1 2 20 2 -1 1 -1 1 10 1 40 1 30 2 50 1 -1 1 30 1 Insert<c, 60> root -1 6 -1 2 -1 2 20 2 -1 1 -1 1 10 1 40 1 30 2 50 1 -1 1 30 1 Insert<c, 60> root -1 6 -1 2 -1 2 20 2 -1 1 -1 1 10 1 40 1 30 2 50 1 -1 1 30 1 Insert<c, 60> root Node does not have a default value => Key is not unique -1 6 -1 2 -1 2 20 2 -1 1 -1 1 10 1 40 1 30 2 50 1 -1 1 30 1 Final Trie root -1 6 -1 2 -1 2 20 2 -1 1 -1 1 10 1 40 1 30 2 50 1 -1 1 30 1 Time complexity › The path to insert a value is equal to the length of the key › Therefore, the time complexity of Insert is O(L) where L is the length of the key Exercises › What happens if the key is the empty string? Can the Insert method still work? Show how or why not. › Is it necessary to examine an entire key before a value is inserted? Show how or why not. › Insert the following <key,value> pairs into the final trie – <bbc, 70> – <b, 80> – <caa, 90> Value › Basic strategy – Follow the given key from the root until: › A null pointer is reached, and a default value is returned How can a null pointer be reached? › The key is fully examined, and the value at the final node is returned Can a default value be returned? 
Value <bba, ?>
[slide diagrams: the path b-b-a is followed and the value 40 is returned]

Value <ab, ?>
[slide diagrams: the path a-b is followed and the value 20 is returned]

Value <a, ?>
[slide diagrams: the node for 'a' holds no value, so the default value is returned]

Value <bbc, ?>
[slide diagrams: a null pointer is reached at 'c', so the default value is returned]

Time complexity
› Let M be the maximum length of a key in the trie.
› The time complexity of Value is O(min(L, M)) where L is the length of the given key.

Exercises
› Give a key that will cause the Value method to execute in O(M) time on the final trie.
› According to the Value method posted online, what happens if the key is not made up of the letters 'a' .. 'z'? How would you solve this problem (if there is one)?

Remove
› Basic strategy
  – Follow the key to the node whose value is to be deleted.
  – If the node is null or contains a default value then false is returned; otherwise, the value at the node is set to default, true is returned, and numValues is reduced by one for each node along the path back to the root.
  – For each child node whose numValues is reduced to 0 on the way back, the link to the child node is set to null (cf. Rope compression).
Remove <ab>
[slide diagrams: the value at the 'b' node is set to default, numValues is reduced along the path back to the root, and true is returned]

Remove <abba>
[slide diagrams: the value at the final 'a' node is set to default; numValues falls to 0 at each node along the a-b-b-a path, so the links are cut on the way back to the root, and true is returned]

Remove <cc>
[slide diagram: a null pointer is reached, so false is returned]

Remove <bb>
[slide diagram: the final node holds a default value, so false is returned]

Final Trie
[slide diagram: trie containing <bba,40>, <bc,50>, <c,30>, <cbc,30>]

Time complexity
› Like the method Value, the time complexity of Remove is O(min(L, M)) where L is the length of the given key and M is the maximum length of a key in the trie.

Exercises
› Justify the time complexity of the method Remove.
› Continue to remove the remaining <key,value> pairs from the final trie until the empty trie remains.
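The three operations above condense into a short sketch. This is illustrative Python (the course implementation is C#), using None rather than -1 as the default value and r = 3 as in the slides:

```python
# Minimal r-way trie sketch (r = 3, keys over 'a'..'c'), mirroring the slides.
class TrieNode:
    def __init__(self):
        self.value = None          # default value (the slides use -1)
        self.num_values = 0        # number of descendant values
        self.child = [None] * 3    # one branch per letter 'a'..'c'

class Trie:
    def __init__(self):
        self.root = TrieNode()     # the root node is never removed

    def insert(self, key, value):
        # Follow the key, creating nodes as needed ("breaking new ground").
        p = self.root
        for ch in key:
            i = ord(ch) - ord('a')
            if p.child[i] is None:
                p.child[i] = TrieNode()
            p = p.child[i]
        if p.value is not None:    # duplicate key
            return False
        p.value = value
        # Increase num_values by one along the path back from the root.
        p = self.root
        p.num_values += 1
        for ch in key:
            p = p.child[ord(ch) - ord('a')]
            p.num_values += 1
        return True

    def value(self, key):
        # Follow the key until a null link (default returned) or the final node.
        p = self.root
        for ch in key:
            p = p.child[ord(ch) - ord('a')]
            if p is None:
                return None
        return p.value

    def remove(self, key):
        # Follow the key; fail on a null link or a default value.
        path = [self.root]
        for ch in key:
            p = path[-1].child[ord(ch) - ord('a')]
            if p is None:
                return False
            path.append(p)
        if path[-1].value is None:
            return False
        path[-1].value = None
        for p in path:                       # adjust counts back to the root
            p.num_values -= 1
        for parent, ch in zip(path, key):    # cut the first child left empty
            i = ord(ch) - ord('a')
            if parent.child[i].num_values == 0:
                parent.child[i] = None
                break
        return True
```

Replaying the slide sequence (insert abba, ab, c, cbc, bba, bc; then remove ab and abba) leaves exactly the final trie shown above, with numValues = 4 at the root.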
Print
› Unlike the closed hash table, the r-way trie can easily print out its keys in order.

Implementation

private void Print(Node p, string key)    // Keys are constructed along the way
{
    int i;
    if (p != null)
    {
        if (!p.value.Equals(default(T)))
            Console.WriteLine(key + " " + p.value + " " + p.numValues);
        for (i = 0; i < 26; i++)
            Print(p.child[i], key + (char)(i + 'a'));
    }
}

public void Print()
{
    Print(root, "");
}

Final Trie After Insertions
[slide diagram]
Result <key, value, numValues>
ab 20 2
abba 10 1
bba 40 1
bc 50 1
c 30 2
cbc 30 1

Final Trie After Removals
[slide diagram]
Result <key, value, numValues>
bba 40 1
bc 50 1
c 30 2
cbc 30 1

Next up ... the ternary tree

Ternary Tree
Bentley and Sedgewick (1998)

Introduction
› A ternary tree is a second version of a trie which can also be used to implement a hash table, among other applications.
› As a hash table, items are inserted, removed, and retrieved from the ternary tree based on <key,value> pairs.
› The ternary tree has a branching factor of three and therefore can save some space over an r-way trie.
› Like an r-way trie, keys are examined one character at a time, but there is not a one-to-one correspondence between a character and a branch.
› Then how does one progress through the ternary tree?

General strategy to traverse the ternary tree
* Compare the current character k of the key with the character c at the current node. Then:
  › move left if k < c
  › move right if k > c
  › move to the middle and on to the next character if k = c
* Similar to traversing a binary search tree.

Data structure

class Trie<T> : ITrie<T>
{
    private Node root;                  // Root node of the Trie
    private int size;                   // Number of values in the Trie

    class Node
    {
        public char ch;                 // Character of the key
        public T value;                 // Value at Node; otherwise, default
        public Node low, middle, high;  // Left, middle, and right subtrees
        ...
    }
    ...
}

Key methods
bool Insert (string key, T value)
  Insert a <key,value> pair and return true if successful; false otherwise.
  Duplicate keys are not permitted.
bool Remove (string key)
  Remove the <key,value> pair and return true if successful; false otherwise.
T Value (string key)
  Return the value associated with the given key.

Constructor
› The initial ternary tree is set to empty, i.e. the root is set to null.

Insert
› Basic strategy (similar to the r-way trie)
  – Follow the path laid out by the key (described earlier), "breaking new ground and building a spine" if needed until the full key has been explored.
  – The value is then inserted at the final destination (node) unless the key has already been used (i.e. a value already exists for that key).

Insert <abba, 10>
[slide diagrams]
Because the root is null, create a node to store 'a'. Because the key character and the character in the node are the same, proceed down the middle path with the next character in the key.
Note: Once a node is created for a character, all subsequent nodes will be added along the middle branch, creating a spine.
Because the middle reference is null, create a node to store 'b', and again proceed down the middle path with the next character in the key.
[slide diagrams: the spine a-b-b-a is completed]
Now that all characters of the key have been examined, the value 10 is added to the last node and true is returned.

Insert <ab, 20>
[slide diagrams]
Since the first character of the key is equal to 'a', proceed down the middle branch.
Since 1) the second character of the key is equal to 'b', 2) all characters of the key have been examined, and 3) the node contains the default value, the value 20 is added here and true is returned.
Note: If a value had already been found, then false would have been returned.

Insert <c, 30>
[slide diagrams]
Since 'c' is greater than 'a', proceed to the right. Because the right reference is null, create a node to store 'c' and add the value 30.

Insert <cbc, 30>
[slide diagrams: 'c' is greater than 'a' => proceed right; 'c' is equal to 'c' => proceed middle; new nodes are created for 'b' and 'c', and the value 30 is added at the last node]

Insert <bc, 50>
[slide diagrams: 'b' is greater than 'a' => proceed right; 'b' is less than 'c' => proceed left, creating a node for 'b' and then a middle node for 'c', where the value 50 is added]

Exercise
Insert <bba, 40>
[slide diagrams: the key follows the new 'b' node, whose middle spine gains 'b' and 'a'; the value 40 is added at the last node]

Key observation
Only once you proceed down a middle path do you move on to the next character in the key.

Exercises
› Show how the worst-case time complexity of Insert is O(n + L) where n is the number of nodes in the ternary tree and L is the length of the given key.
› Choose six random four-letter words as keys and insert the six <key,value> pairs into an initially empty ternary tree. What could be the maximum possible depth of the resultant tree?

Value
› Basic strategy
  – Beginning at the root and the first character in the key, traverse the ternary tree (as described earlier) until all characters of the key have been examined or a null pointer is reached.
  – Return the value if one is found; the default value otherwise.

Value <bba, ?>
[slide diagrams: the path to the final 'a' node is followed and the value 40 is returned]

Value <ac, ?>
[slide diagrams: a null pointer is reached, so the default value is returned]

Value <cb, ?>
[slide diagrams: the key is fully examined but the final node holds no value, so the default value is returned]

Time complexity
› The time complexity of Value is O(d) where d is the depth of the ternary tree.
› If the tree is balanced then d is O(log n); otherwise, the depth is O(n) in the worst case, where n is the number of nodes.

Exercise
› Implement a method that returns all keys that match a given pattern. A pattern is defined as a string of letters and asterisks where each asterisk represents a wildcard character. Therefore, the pattern *a*d would return all four-letter keys whose second and fourth characters are 'a' and 'd', such as "card" and "band".
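The traversal rule above (left if k < c, right if k > c, middle and next character if k = c) drives both Insert and Value. A minimal Python sketch, assuming non-empty keys (the empty-string case is one of the exercises); the course implementation is in C#, and the names here are illustrative:

```python
# Ternary search tree sketch (Bentley & Sedgewick 1998).
class TSTNode:
    def __init__(self, ch):
        self.ch = ch                            # character of the key
        self.value = None                       # default until a key ends here
        self.low = self.middle = self.high = None

class TST:
    def __init__(self):
        self.root = None                        # initially empty

    def insert(self, key, value):
        self.root, ok = self._insert(self.root, key, 0, value)
        return ok

    def _insert(self, p, key, i, value):
        ch = key[i]
        if p is None:
            p = TSTNode(ch)                     # "break new ground"
        if ch < p.ch:
            p.low, ok = self._insert(p.low, key, i, value)
        elif ch > p.ch:
            p.high, ok = self._insert(p.high, key, i, value)
        elif i < len(key) - 1:                  # equal: move to the middle,
            p.middle, ok = self._insert(p.middle, key, i + 1, value)
        elif p.value is None:                   # key fully explored
            p.value, ok = value, True
        else:
            ok = False                          # duplicate key
        return p, ok

    def value(self, key):
        p, i = self.root, 0
        while p is not None:
            if key[i] < p.ch:
                p = p.low
            elif key[i] > p.ch:
                p = p.high
            elif i < len(key) - 1:
                p, i = p.middle, i + 1          # only a middle move consumes a character
            else:
                return p.value
        return None                             # null pointer: default value
```

Replaying the slide keys reproduces the walkthroughs: value("bba") finds 40, value("ac") hits a null pointer, and value("cb") ends at a node with no value, so both return the default.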
Print
› Like the r-way trie, the keys can be output in order.
› Also like the r-way trie, the keys are constructed as the inorder traversal proceeds through the ternary tree.

Algorithm

public void Print()
{
    Print(root, "");
}

private void Print(Node p, string key)    // Key is being constructed along the way
{
    if (p != null)
    {
        Print(p.low, key);
        if (!p.value.Equals(default(T)))
            Console.WriteLine(key + p.ch + " " + p.value);
        Print(p.middle, key + p.ch);
        Print(p.high, key);
    }
}

Exercise
› Write a method that returns all keys that begin with a given prefix. For example, if the prefix is bag, then the method returns all keys that begin with bag. Do you see the similarity with the last exercise?

Very useful reference from Robert Sedgewick and Kevin Wayne of Princeton University:
https://www.cs.princeton.edu/courses/archive/spring21/cos226/lectures/52Tries.pdf

Binomial Heap
Vuillemin (1978)

Quick Review
› A binary heap, implemented as a linear array, represents a binary tree that satisfies two properties:
  – The binary tree is nearly complete.
  – The priority of each node (except the root) is less than or equal to the priority of its parent.
› The operations to insert an item and to remove the item with the highest priority take O(log n) time.
[slide diagram: example heap with items 94, 62, 93, 42, 39, 81, 20, 27, 10, 55, where higher-valued items have higher priorities]

Motivation
› The binomial heap behaves in the same way as a binary heap with one additional capability.
› A binomial heap is an example of a mergeable data structure whereby two binomial heaps can be efficiently merged into one.
› To merge two binary heaps takes O(n) time, but to merge two binomial heaps takes only O(log n) time.

Binomial Tree
› A binomial tree Bk is defined recursively in the following way:
  – The binomial tree B0 consists of a single node.
  – The binomial tree Bk consists of two binomial trees Bk-1, where one is the leftmost child of the root of the other.
[slide diagrams: B0 .. B4; the node counts per level of B4 are the binomial coefficients 1 4 6 4 1]

For any binomial tree Bk
› There are 2^k nodes.
› The height of the tree is k.
› There are exactly C(k, i) nodes at depth i, for i = 0, 1, 2, ..., k (the binomial coefficients, hence the name).
› The root has degree k.

How does a set of binomial trees represent n items?
The trees used correspond to the 1-bits in the binary representation of n:

  n = 31    B4 B3 B2 B1 B0 = 1 1 1 1 1
  n = 14    B4 B3 B2 B1 B0 = 0 1 1 1 0
  n = 17    B4 B3 B2 B1 B0 = 1 0 0 0 1
  n = 25    B4 B3 B2 B1 B0 = 1 1 0 0 1
  n = 23    B4 B3 B2 B1 B0 = 1 0 1 1 1

How many binomial trees are needed to represent n items?
At most ⌊log2 n⌋ + 1 (useful to know for time complexity analysis):

  n       Number of binomial trees
  3       2
  19      3
  32      1
  184     4
  2525    8

Data structure
› Leftmost-child, right-sibling representation (cf. file systems).
[slide diagram: B4 drawn with leftmost-child and right-sibling links]

public class BinomialNode<T>
{
    public T Item { get; set; }
    public int Degree { get; set; }
    public BinomialNode<T> LeftMostChild { get; set; }
    public BinomialNode<T> RightSibling { get; set; }
    ...
}

Exercises
› Explain why the merger of two binary heaps requires O(n) time.
› List the binomial trees Bk needed to represent n = 3, 19, 32, 184 and 2525 items.

Binomial Heap
› A binomial heap is a set of binomial trees that satisfies two properties:
  – There is at most one binomial tree Bk for each k.
  – Each binomial tree is heap-ordered, that is, the priority of the item at each node (except the root) is less than or equal to the priority of its parent (same as the binary heap).

Example of a binomial heap for n = 29
[slide diagram: root list with a header node and trees B0, B2, B3, B4]
Notes:
1) The rightmost child of each root (except the last) refers to the next binomial tree.
2) The binomial trees are ordered by increasing k.

Example of a heap-ordered binomial tree (higher-valued items have higher priorities)
[slide diagram: 59 at the root; descendants include 36, 42, 50, 12, 45, 30, 52]
Note: The priority of a parent must be greater than or equal to the priority of its leftmost child and its right siblings.

Data structure

public class BinomialHeap<T> : IBinomialHeap<T> where T : IComparable
{
    private BinomialNode<T> head;   // Head of the root list
    private int size;               // Size of the binomial heap
    ...
}

Primary methods (the first three are the same as a binary heap)
void Add(T item)                      // Adds an item to a binomial heap
void Remove()                         // Removes the item with the highest priority
T Front()                             // Returns the item with the highest priority
void Merge(BinomialHeap<T> H)         // Merges H with the current binomial heap (NEW)

Supporting methods
private BinomialNode<T> FindHighest() // Returns the reference to the root preceding the highest-priority item
private void Union(BinomialHeap<T> H) // Takes the union (without consolidation) of the current and given binomial heap H
private void Consolidate()            // Consolidates (combines) binomial trees of the same degree
public void Merge(BinomialHeap<T> H)  // Merges the given binomial heap H into the current heap (used by Add and Remove)

Let's start with the private methods.

FindHighest
› Basic strategy
  – Starting at the header node, traverse the root list and keep track of the item with the highest priority.
    Note: The item with the highest priority in the binomial heap will be found at the root of one of the binomial trees (Why?)
  – Return the reference to the root node that precedes the one with the highest priority.

Union
› Basic strategy
  – Given two binomial heaps, merge the root lists into one, maintaining the order of the binomial trees by increasing degree.
    Note: The resultant list will have at most two binomial trees with root degree k (Why?)
[slide diagram: the root lists for 1101 and 0111 are merged into one]

Consolidate
› Basic strategy
  – Run through the root list and combine two binomial trees that have the same root degree k into one binomial tree of degree k+1.
    Note: There may be (at least temporarily) three binomial trees of the same degree (Why?)
[slide diagrams: prev, curr and next walk along the root list; Case 4 combines the trees at curr and next, temporarily producing three B2 trees]
[slide diagrams: Cases 2, 3 and 4 as prev, curr and next advance along the root list]

Final Result
[slide diagram: 1101 + 0111 = 10100, i.e. the root list B2, B4]

What about Case 1?
The roots at curr and next do not have the same degree, so prev, curr and next simply advance (just like Case 2).

Exercises
› Work through the Union and Consolidate methods for the following pairs of binomial heaps. Only show the structure of the resultant binomial heap.
  – First heap: B0, B1, B2, B3    Second heap: B1, B3
  – First heap: B0, B1, B2        Second heap: B0, B1, B2
› Add items to each node of the binomial trees above and work through the Union and Consolidate methods.
› Suppose a binomial heap before consolidation has a maximum depth of k. Argue that the maximum depth of a binomial heap after consolidation is never more than k+1.

Now, on to the public methods.

Merge
› Basic strategy
  – Take the union of the given binomial heap H with the current binomial heap and consolidate:
      Union(H);
      Consolidate();

Insert
› Basic strategy
  – Create a binomial heap H with the given item (only).
  – Merge H with the current binomial heap and increase size by 1.

Remove
› Basic strategy
  – Call the method FindHighest and remove the next binomial tree T from the current root list.
  – Insert the children of T (in reverse order) into a new binomial heap H, effectively removing the item at the root.
  – Merge H (Union + Consolidate) with the current binomial heap and reduce size by 1.
[slide diagrams: the tree holding the highest-priority item is unlinked, its children form heap H, and the two heaps are merged]

Front
› Basic strategy
  – Call the method FindHighest and return the item at the root of the next binomial tree.

Exercises
› Work through the Insert method for n = 15 where the item to be inserted has the lowest priority. Where is the item in the final binomial heap?
› Work through the Remove method for n = 3, 19 and 31.
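The Union step merges two degree-sorted root lists the way binary addition lines up bits, and the trees present in a heap on n items follow the 1-bits of n. A Python sketch of both ideas (the course code is C#; `BinomialNode`, `union` and `binomial_trees` are illustrative names):

```python
def binomial_trees(n):
    # Degrees k of the trees B_k representing n items:
    # exactly the 1-bit positions in the binary representation of n.
    ks, k = [], 0
    while n > 0:
        if n & 1:
            ks.append(k)
        n >>= 1
        k += 1
    return ks

class BinomialNode:
    def __init__(self, item, degree=0):
        self.item = item
        self.degree = degree
        self.leftmost_child = None
        self.right_sibling = None

def union(h1, h2):
    # Merge two root lists ordered by increasing degree into one,
    # without consolidation: the merge step of merge sort.
    dummy = tail = BinomialNode(None, -1)
    while h1 is not None and h2 is not None:
        if h1.degree <= h2.degree:
            tail.right_sibling, h1 = h1, h1.right_sibling
        else:
            tail.right_sibling, h2 = h2, h2.right_sibling
        tail = tail.right_sibling
    tail.right_sibling = h1 if h1 is not None else h2
    return dummy.right_sibling
```

Consolidate would then sweep the merged list, linking any two roots of equal degree k into one tree of degree k+1 by making the lower-priority root the leftmost child of the other, which is why at most two (briefly three) trees of any degree ever coexist.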
  When assigning items to the binomial heaps, place the item with the highest priority at the root of the last binomial tree.

Time complexities *

  Method        Binary Heap   Binomial Heap
  Insert        O(log n)      O(log n)
  Remove        O(log n)      O(log n)
  Front         O(1)          O(log n)
  Merge         O(n)          O(log n)
  FindHighest   -             O(log n)
  Union         -             O(log n)
  Consolidate   -             O(log n)

* All time complexities depend on the observation that the maximum length of the root list is O(log n).

B-Trees (Chapter 18)
Bayer and McCreight (1972)
What does the "B" stand for?

Background
› A B-tree is a balanced search tree which is used to store the keys of large file and database systems.
› Because:
  – only a part of a B-tree can be stored in primary (main) memory at a time, and
  – secondary storage is much, much slower than primary memory,
  it is advantageous to read/write as much information as possible from/to secondary storage and thereby minimize disk accesses.
› Each node of a B-tree therefore is usually as large as a disk page and has a branching factor between 50 and 2000, depending on the size of the key relative to the size of a page.
› Hence, a B-tree is wide and shallow.

Five properties of a B-tree

Property 1
* Each internal node has a minimum degree of t (except for the root) and a maximum degree of 2t.
* The notation B-tree (t=k) is used to denote a B-tree with a minimum degree of k.

Property 2
Each node has a minimum of t-1 keys (except the root) and a maximum of 2t-1 keys. If an internal node has degree d (t ≤ d ≤ 2t), then it stores d-1 keys.
[slide diagram: for a B-tree (t=3), a minimum-degree node K1 K2 and a maximum-degree node K1 K2 K3 K4 K5]

Property 3
The keys in each node are stored in ascending order. Therefore, K1 < K2 < ... < Kd-1 for a node with degree d.

Property 4
Key values in the ith subtree of an internal node with degree d are:
  less than K1           for i = 1,
  between Ki-1 and Ki    for 2 ≤ i ≤ d-1, and
  greater than Kd-1      for i = d.
Properties 3 and 4 for B-tree (t=3)
[slide diagram: a node K1 .. K5 with subtrees holding keys k where k < K1, K1 < k < K2, K2 < k < K3, K3 < k < K4, K4 < k < K5, and k > K5]

Property 5
All leaf nodes have the same depth.

Maximum height h of a B-tree on n keys
Theorem (p. 489)
  The maximum height of a B-tree on n keys with minimum degree t is
    h ≤ log_t((n+1)/2)
  which is O(log n).

Exercises
› Look up and review the proof on the previous slide.
› Calculate the minimum height of a B-tree on n keys with minimum degree t.
› Show all the legal B-trees (t=2) that represent the keys {1, 2, 3, 4, 5}.
› For what values of t is the example tree on the next slide a legal B-tree?

Example tree *
[slide diagram: root M; internal nodes D H and Q T X; leaves B C, F G, J K L, N P, R S, V W, Y Z]
* From CLRS (3rd Edition)

Data structure

class Node<T>
{
    private int n;           // number of keys
    private bool leaf;       // true if a leaf node; false otherwise
    private T[] key;         // array of keys
    private Node<T>[] c;     // array of child references

    public Node (int t)
    {
        n = 0;
        leaf = true;
        key = new T[2*t - 1];
        c = new Node<T>[2*t];
    }
}

class BTree<T> where T : IComparable
{
    private Node<T> root;    // root node
    private int t;           // minimum degree

    public BTree (int t)
    {
        this.t = t;
        root = new Node<T>(t);
    }
    ...
}

Primary methods
› public bool Search (T k)
  – returns true if the key k is found in the B-tree; false otherwise
› public void Insert (T k)
  – inserts key k into the B-tree
  – duplicate keys are not permitted
› public void Delete (T k)
  – deletes key k from the B-tree

Support method
› private void Split (Node<T> x, int i)
  – splits the ith (full) child of x into 2 nodes

"One-pass" algorithms
› The primary methods Search, Insert, and Delete are implemented as "one-pass" algorithms where each algorithm proceeds down the B-tree from the root without having to back up to a node.*
› This minimizes the number of read and write operations from secondary memory (disk).
* with one exception for deletion

Search
› Basic strategy
  – Set p to the root node.
  – If the given key k is found at p then return true.
  – If the key k is not found and p is a leaf node then return false; else move to the subtree which would otherwise contain k.

Search examples on a node with K1=10, K2=20, K3=30, K4=40, K5=50:
  – k = 35: descend into the subtree with K3 < k < K4
  – k = 7: descend into the subtree with k < K1
  – k = 80: descend into the subtree with k > K5
  – k = 20: found at this node

Time complexity
› The length of the path from the root to a leaf node is O(log n).
› At each node along the path, it takes O(t) time to determine which path to follow.
› Total time complexity is O(t log n).

Insert
› Basic strategy
  – Descend along the path based on the given key k from the root to a leaf node. No node along the path, however, may be full, i.e. have 2t-1 keys. Therefore, before moving down to a node p (including the root), check first if p is full.
    If p is full then split p into two, where the last t-1 keys of p are placed in a new node q and the median (middle) key of p is moved up to the parent node.
    Note: The parent node is guaranteed to have enough space to store the extra key (Why?)
  – Insert the given key k at the leaf node. If the key is found along the path from the root to the leaf node, then no insertion takes place.

Insert V into a B-tree (t=4)
[slide diagrams: the full node P Q R S T U W X on the path is split; the median key S moves up to the (not full) parent, the last t-1 keys U W X form a new node q, and V is then inserted]

Insert V into a B-tree (t=4) when the root is full
[slide diagrams: the full root P Q R S T U W X is split around the median S, which becomes the new root; the height increases by 1]

In general
[slide diagram: a full node of 2t-1 keys is split into two nodes of t-1 keys each, with the median key moving up to the not-full parent]

Initial B-tree (t=3) => Do not descend to nodes with 5 keys
[slide diagrams, working from the B-tree with root G M P X and leaves A C D E, J K, N O, R S T U V, Y Z:]
  – Insert B: B joins the first leaf to give A B C D E.
  – Insert Q: the full leaf R S T U V splits around T, which moves up into the root; Q then joins R S.
  – Insert L: the full root G M P T X splits around P, which becomes the new root; the height increases by 1; L then joins J K.
  – Insert F: the full leaf A B C D E splits around C, which moves up to its parent; F then joins D E.

Observations
› Only descend to a node if it is not full.
› The B-tree only grows in height when a key is inserted into a B-tree with a full root, i.e. when the root node is split.
› When a node is split into two, each of the two resultant nodes is assigned the minimum number of keys, i.e. t-1.

Time complexity
› The length of the path from the root to a leaf node is O(log n).
› At each node along the path, it takes O(t) time:
  – to determine which path to follow, and
  – to split a node (if required).
› Total time complexity is O(t log n).

Exercises
› Randomly insert a permutation of the alphabet into initially empty B-trees (t=2, t=3). Call the resultant trees B2 and B3.
› Calculate the minimum and maximum heights of B-trees (t=2, t=3) for 26 keys.
› Explain how to find the predecessor of a given key in a B-tree.

Delete
› Basic strategy
  – Descend along the path based on the given key k from the root until the key is found. Each node along the path, however, must have at least t keys. Therefore, before moving down to a node p (excluding the root), check if p has at least t keys. If not, node p either:
    › borrows a key from a sibling (if possible), or
    › merges with an adjacent node, where the t-1 keys of each node plus one from the parent node yield a single node with 2t-1 keys.
      Note: The parent node is guaranteed to have an extra key (Why?)
› Basic strategy (cont'd)
  – If the given key k is found at a leaf node, delete it. If the given key k is found at an internal node p, then:
    › if the child node q that precedes k has t keys, then recursively delete the predecessor k' of k in the subtree rooted at q and replace k with k';
    › if the child node r that succeeds k has t keys, then recursively delete the successor k' of k in the subtree rooted at r and replace k with k';
    › otherwise, merge q and r with k from the parent to yield a single node s with 2t-1 keys, and recursively delete k from the subtree rooted at s.
  – If key k is not found, no deletion takes place.
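The one-pass Search described earlier and the height bound from the theorem can both be sketched briefly. Python for compactness (the course code is C#; `BTreeNode`, `search` and `max_height` are illustrative stand-ins for `Node<T>` and the class methods):

```python
import math

class BTreeNode:
    def __init__(self, keys, children=None):
        self.keys = keys                    # keys in ascending order (Property 3)
        self.children = children or []      # len(keys) + 1 subtrees when internal

def search(p, k):
    # One pass down from the root: scan for the first key >= k, then either
    # report a match, stop at a leaf, or descend into the single subtree
    # that could contain k (Property 4).
    while p is not None:
        i = 0
        while i < len(p.keys) and k > p.keys[i]:
            i += 1
        if i < len(p.keys) and p.keys[i] == k:
            return True
        if not p.children:                  # leaf node: k is absent
            return False
        p = p.children[i]
    return False

def max_height(n, t):
    # Theorem (CLRS p. 489): h <= log_t((n + 1) / 2)
    return math.floor(math.log((n + 1) / 2, t))
```

For example, with t = 1001 a B-tree can hold a billion keys within height 2, so a search touches at most three disk pages; this is the "wide and shallow" point from the Background slide.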
Initial B-tree (t=3) => Do not descend to nodes with 2 keys or fewer
[slide diagrams, working from the B-tree with root P, internal nodes C G M and T X, and leaves A B, D E F, J K L, N O, Q R S, U V, Y Z:]
  – Delete F: F is removed directly from the leaf D E F.
  – Delete M: M is at an internal node; its predecessor L is recursively deleted from the preceding child and replaces M.
  – Delete G: G is at an internal node whose adjacent children lack t keys, so they are merged with G before G is deleted.
  – Delete D: the root is left with a single key and two children of t-1 keys each, which merge into a new root; the height decreases by 1.
  – Delete B: B's node borrows/merges on the way down before B is removed from its leaf.

Observations
› The height of a B-tree only decreases if:
  – the root node has a single key, and
  – its two child nodes have t-1 keys each.
  In this case, the two child nodes along with the key at the root are merged into one node with 2t-1 keys. This node now serves as the new root of the B-tree.

Time complexity
› Although the Delete method is quite complicated, its time complexity is the same as Search and Insert: O(t log n).

Exercises
› Successively delete, in the same order, the keys that were inserted to construct B2 and B3 (from the last exercise).
› Show why a node may need to be revisited (reread from disk) in the Delete method.
› Show that the number of read and write disk accesses is O(h) for Search, Insert, and Delete, where h is the height of the B-tree.

Disjoint Sets (Chapter 21.4)
Tarjan (1975)

Background
› Disjoint sets is a collection of sets where each set is initialized with a single unique item across all sets.
› Because the items are unique, the sets have no intersection among themselves and are therefore disjoint.

Data structure

public class DisjointSets : IDisjointSets
{
    // If set[i] = j < 0 then set i has size |j| (i.e. set i exists with the given size)
    // If set[i] = j > 0 then set i points to set j (i.e. set i no longer exists)
    private int[] set;
    private int numItems;   // Number of items
    ...
}

Primary methods
› public bool Union (int S1, int S2)
  – takes the union of sets S1 and S2
  – uses union-by-size, which determines the resultant set
› public int Find (int x)
  – returns the set that x belongs to
  – uses path compression

Constructor
› Basic strategy
  – Set each array position to -1.
  – The negative sign indicates that the set exists, and the value indicates the size of the set.
  – Item i is assumed to be in set i.

Initially
[slide diagram: set[0..7] = -1; all sets 0 to 7 exist with a size of 1; set i contains item i, e.g. item 4 is in set 4]

public DisjointSets(int numItems)
{
    int i;
    this.numItems = numItems;
    set = new int[numItems];
    for (i = 0; i < numItems; i++)
        set[i] = -1;    // Each set has an initial size of 1
                        // Assumption: item i belongs to set i
}

Time complexity
› To initialize the array to -1 takes O(n) time.

Union
› Basic strategy
  – Check that both sets i and j exist, i.e. set[i] < 0 and set[j] < 0.
  – Merge the set with the lesser size into the set with the greater size, i.e. union-by-size.
  – If the two sets have the same size, merge either one into the other.

[slide diagrams, starting from set[0..7] = -1:]
  – Union(0,1): both sets exist and have a size of 1 => set[0] = -2, set[1] = 0
  – Union(2,3): both sets exist and have a size of 1 => set[2] = -2, set[3] = 2
  – Union(5,4): both sets exist and have a size of 1 => set[5] = -2, set[4] = 5
  – Union(2,1): set 1 does NOT exist; return false
  – Union(0,6): both sets exist; set 0 has greater size => merge set 6 into set 0: set[0] = -3, set[6] = 0
  – Union(5,7): both sets exist.
    Set 5 has greater size => merge set 7 into set 5: set[5] = -3, set[7] = 5.
  – Union(5,0): both sets exist and have a size of 3 => set[5] = -6, set[0] = 5
  – Union(2,5): both sets exist; set 5 has greater size => merge set 2 into set 5: set[5] = -8, set[2] = 5

Final set
  set[0..7] = +5 +0 +5 +2 +5 -8 +0 +5
[slide diagram, visually more appealing: all items belong to set 5]

public bool Union(int s1, int s2)      // Union by size
{
    if (set[s1] < 0 && set[s2] < 0)    // Both sets exist
    {
        if (set[s2] < set[s1])         // If size of set s2 > size of set s1
        {
            set[s2] += set[s1];        // Increase the size of s2 by the size of s1
            set[s1] = s2;              // Have set s1 point to set s2 (i.e. s2 = s1 U s2)
        }
        else                           // If size of set s1 >= size of set s2
        {
            set[s1] += set[s2];        // Increase the size of s1 by the size of s2
            set[s2] = s1;              // Have set s2 point to set s1 (i.e. s1 = s1 U s2)
        }
        return true;
    }
    else
        return false;
}

Time complexity
› The maximum number of Unions for n disjoint sets is n-1. Why?
› Each Union takes O(1) or constant time.
› Therefore, n-1 Unions take O(n) time.

Theorem 1
Using union-by-size, for every set i, ni ≥ 2^hi.

Notation:
  Let ni be the size of set i.
  Let hi be the height of set i.
  Let ni* be the size of the resultant set i after a union-by-size.
  Let hi* be the height of the resultant set i after a union-by-size.

Proof by induction on the number of links k

Basis of induction (k=0): With no links, every set contains one item and has a height of 0. The theorem holds since 1 ≥ 2^0.

Induction hypothesis: Assume the theorem is true after k links.

Inductive step (k+1): Prove the theorem holds for an additional link.

Case 1 (hi < hj): Let set i be the smaller set and set j be the larger set.
(diagram: set i is attached under the root of the taller set j; the overall height is unchanged)

Case 1 (h_i < h_j): since h_i < h_j, the height of the resultant set j does not change.

n_j* = n_i + n_j ≥ n_j ≥ 2^h_j = 2^h_j*
              (induction hypothesis)

Case 2 (h_i ≥ h_j): let set i be the smaller set and set j be the larger set.

(diagram: set i, at least as tall as set j, is attached under the root of set j, so the height grows by one)

Case 2 (h_i ≥ h_j): since h_i ≥ h_j, the height of the resultant set j is h_i + 1.

n_j* = n_i + n_j ≥ 2n_i ≥ 2 × 2^h_i = 2^(h_i + 1) = 2^h_j*
        (n_i ≤ n_j)  (induction hypothesis)

Theorem 2
Using union-by-size, the height of the final resultant set is O(log n).

Proof: the resultant set has n items. From Theorem 1, n ≥ 2^h, which implies h ≤ log₂ n. Therefore, h is O(log n).

Exercises
› Show that if union-by-rank is not used for n-1 unions, the final resultant set could have a height of n-1.
› Carry out a series of seven Unions on 8 disjoint sets that yields a single set with a height of 3.
› Redraw, using the 2-D tree representation, the resultant sets after each of the Union operations.

Find
› Basic strategy
  – Recurse to the root of the set in which the item belongs and return the root set
  – On the way back from the recursive calls, update each item along the path to point directly to the root set, i.e. path compression

Find(6)

index:   0   1   2   3   4   5   6   7
set[]:   5   0   5   2   5  -8   0   5

Follow the parent pointers from item 6: set[6] = 0, set[0] = 5, and set[5] = -8. The negative value indicates that set 5 exists, so return 5.

› If the tree structure doesn't change, each Find operation takes O(log n) time, i.e. the height of the tree.
› But we also know that every node (item) along the path to the root belongs to set 5! Therefore, set each node along the path (and only nodes along the path) to point directly to 5.
› Next time, Find(6) will only take one step.
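The payoff of path compression can be sketched in a few lines. This is a hypothetical Python port of the slides' C# Find (Python is used here only to keep the sketch short); the array is the final set from the Union examples:

```python
# Final set from the Union examples: all items belong to set 5.
# index:  0  1  2  3  4  5   6  7
s = [5, 0, 5, 2, 5, -8, 0, 5]

def find(item):
    """Recurse to the root, then point every item on the path at the root."""
    if s[item] < 0:             # negative value: item is a root (set exists)
        return item
    s[item] = find(s[item])     # path compression on the way back
    return s[item]

print(find(6))   # follows 6 -> 0 -> 5, prints 5
print(s[6])      # now 5: the next find(6) takes a single step
```

After the first call, item 6 points directly at the root, so a second Find(6) terminates in one step.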
Find(6) — before path compression:

index:   0   1   2   3   4   5   6   7
set[]:   5   0   5   2   5  -8   0   5

Find(6) — after path compression (item 6 now points directly to 5):

index:   0   1   2   3   4   5   6   7
set[]:   5   0   5   2   5  -8   5   5

› But how do we backtrack along the same path from the root?
› The beauty of recursion (photo: Luc Devroye, McGill University)

public int Find(int item)
{
    if (item < 0 || item >= numItems)  // Item is not in the range 0..n-1
        return -1;
    else if (set[item] < 0)            // set[item] exists (terminating condition)
        return item;
    else
        // Path compression:
        // recurse to the root set, and then update all sets
        // along the path to point to the root
        return set[item] = Find(set[item]);
}

Stripped of the range check, the core of the method is:

public int Find(int item)
{
    if (set[item] < 0)
        return item;                          // Terminating condition
    else
        return set[item] = Find(set[item]);   // Path compression
}

Find(9)
(diagram: a tree on items 0..12 rooted at 0; after Find(9), every node on the path from 9 to the root points directly to 0)

Question: was union-by-size used to construct this representation of disjoint sets?

Time complexity
Using union-by-size plus path compression, any sequence of m ≥ n Find and n-1 Union operations takes O(m α(m,n)) time, where α(m,n) is the functional inverse of Ackermann's function. But α(m,n) grows so slowly that the time to complete any sequence of m operations is virtually O(m), which averages out to O(1) per operation.

Exercises
› Review the reference on the next page and learn about union-by-rank.
  – The rank is a proxy for what measurement?
  – When does the rank of a set increase?
› Describe a set representation where one Find operation reduces the height of the set from n-1 to 1. Note that union-by-size would not have been used.
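As a sanity check on the walkthrough above, here is a hypothetical Python sketch of the slides' C# Union (union-by-size; Python is used only to keep the sketch compact), replaying the full sequence of eight Union calls on 8 disjoint sets:

```python
n = 8
s = [-1] * n                       # every set exists with an initial size of 1

def union(s1, s2):
    """Union by size; returns False if either set no longer exists."""
    if s[s1] < 0 and s[s2] < 0:    # both sets exist (both are roots)
        if s[s2] < s[s1]:          # size of set s2 > size of set s1
            s[s2] += s[s1]         # grow s2 by the size of s1
            s[s1] = s2             # s1 now points to s2
        else:                      # size of set s1 >= size of set s2
            s[s1] += s[s2]
            s[s2] = s1
        return True
    return False

# The slides' sequence, including the failing Union(2,1)
calls = [(0, 1), (2, 3), (5, 4), (2, 1), (0, 6), (5, 7), (5, 0), (2, 5)]
results = [union(a, b) for a, b in calls]

print(s)         # [5, 0, 5, 2, 5, -8, 0, 5] -- all items in set 5, size 8
print(results)   # Union(2,1) is False: set 1 no longer exists
```

The resulting array matches the final set shown earlier, and the Union(2,1) call fails because item 1 already belongs to set 0.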
Useful reference
Kevin Wayne, Princeton University
https://www.cs.princeton.edu/courses/archive/spring13/cos423/lectures/UnionFind.pdf

COIS 3020: Data Structures & Algorithms II Alaadin Addas Final Examination Information Week 12 – Lecture 1

Overview
• Exam Information
• General Information & Guidelines
• Format
• Content
• Questions
• Exam Review
• Assignment 2 Implementation (will not be posted)

Final Examination – General Information
• Thursday, April 14th at 11 AM (check the exam schedule for any changes)
• Location: online, hosted and submitted on Blackboard
• Duration: 3 hours (unless you have an SAS accommodation letter)

Final Examination – Guidelines – 1
• The final examination is open book; however, you are not allowed help from any 3rd party in real time.
• The deadline for the examination is 2 PM on the 14th of April; every minute you are late after the deadline will lead to a 1-point penalty.
• The grace period is 5 minutes.
• I will be available via MS Teams; message me in a private chat or in the general chat.
• If you are messaging me in the general chat, please tag me (@), or I will not get a notification.
• Emails won't be replied to, for your own sake!

Final Examination – Guidelines – 2
• Backtracking is not allowed, and the questions are randomized.
• I am experimenting with the settings to see if I can allow backtracking for the programming questions on their own, but as of now it seems I have to disallow backtracking for the entire exam.

Final Examination – Format
• The final exam has three components (it's the same assessment on Blackboard, though):
1. 20-22 true/false questions (1 mark each)
2. 7-9 multiple choice questions (3 marks each)
3. 2-3 programming questions (each worth between 6 and 10 marks)

Final Examination – Content
• The topics to focus on for the final examination are:
• R-Way Tries
• Ternary Trees
• Binomial Heaps
• B-Trees
• Disjoint Sets
• Quad Trees are NOT on the final examination.
Final Examination – Recommended Time Management • 35 minutes on the true/false questions • 45 minutes on the multiple choice questions • 85 minutes on the programming questions Questions?