AVL TREES: Balanced BST

- Self-balancing tree: restores balance with a constant number of rotations per insertion
- For every node, the heights of the left and right subtrees differ by at most 1
- Each node must know the heights of its children
- Update heights on insert/remove

Checking for balance:
- Only nodes on the path to the insertion/removal can be out of balance
- A node can only be out of balance by 2

AVL insertion cases:
- Outside (LL/RR): single rotation
- Akimbo (LR/RL): double rotation

SINGLE ROTATION:
- Pick up the heavy child
- The node falls to the left or right
- The heavy child's existing left or right subtree is reattached to the node

DOUBLE ROTATION:
- LR: left-rotate the left child, then right-rotate the node
- RL: right-rotate the right child, then left-rotate the node
- Moving the heavy child to the other side first turns the akimbo case into one a single rotation can fix (see the sketch below)

https://www.cs.usfca.edu/~galles/visualization/AVLtree.html
https://www.sanfoundry.com/cpp-program-implement-avl-trees/
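A rough C++ sketch of the four rotation cases above, assuming a node struct that caches subtree heights (the struct and function names are mine, not from the course code):

    #include <algorithm>

    struct Node {
        int key;
        int height = 1;          // height of the subtree rooted here
        Node* left = nullptr;
        Node* right = nullptr;
    };

    int height(Node* n) { return n ? n->height : 0; }

    void updateHeight(Node* n) {
        n->height = 1 + std::max(height(n->left), height(n->right));
    }

    // Single rotation for the LL (outside) case: the heavy left child is
    // picked up, the node falls to the right, and the heavy child's old
    // right subtree is reattached as the node's new left subtree.
    Node* rotateRight(Node* node) {
        Node* heavy = node->left;
        node->left = heavy->right;   // reattach the displaced subtree
        heavy->right = node;         // node falls to the right
        updateHeight(node);
        updateHeight(heavy);
        return heavy;                // heavy child is the new subtree root
    }

    Node* rotateLeft(Node* node) {   // mirror image, for the RR case
        Node* heavy = node->right;
        node->right = heavy->left;
        heavy->left = node;
        updateHeight(node);
        updateHeight(heavy);
        return heavy;
    }

    // Double rotation for the LR (akimbo) case: move the heavy child to
    // the outside first, then do the single rotation.
    Node* rotateLeftRight(Node* node) {
        node->left = rotateLeft(node->left);
        return rotateRight(node);
    }

    Node* rotateRightLeft(Node* node) {  // RL case
        node->right = rotateRight(node->right);
        return rotateLeft(node);
    }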
SORTING: https://www.cs.usfca.edu/~galles/visualization/ComparisonSort.html

Issues:
- Time complexity: the best comparison-based sorts achieve O(nlogn)
- Space complexity: the best are in-place sorts, which use constant extra space
- Handling duplicates: stable sorts preserve the original order of equal elements, unstable sorts don't

Selection Sort:
- Divide the array into two parts, sorted and unsorted; the sorted part starts empty
- Swap the smallest unsorted element with the first unsorted element
- The sorted subarray grows, the unsorted subarray shrinks

Pseudocode:
- for i in [0 ... size-1]:
  o imin = index of the minimum element among a[i] ... a[size-1]
  o swap a[i] and a[imin] (the swap can be skipped when i == imin)

Complexity:
- Time complexity: O(n^2)
- Space complexity: in-place sort: O(n) for the array itself, O(1) extra for swapping
- Unstable

Insertion Sort:
- Not in-place: start with an empty result list and insert each item in sorted order
- In-place: divide the array into a sorted and an unsorted part; the first element starts out "sorted"
  o for each unsorted element, slide it left into its sorted position

Complexity:
- Time complexity:
  o O(n) best case: already sorted
  o O(n^2) worst case: reverse sorted
  o O(n^2) average case
- Space complexity: in-place sort: O(n) for the array, O(1) extra for swapping
- Stable

Bubble Sort:
- Swap adjacent out-of-order elements
- After each pass, the largest element "bubbles" into its final position

Complexity:
- Time complexity:
  o O(n) best case: already sorted
  o O(n^2) worst case: reverse sorted
- Space complexity: in-place sort: O(n) for the array, O(1) extra for swapping
- Stable

Merge Sort / Quick Sort: divide and conquer
- If the list has more than one element:
  o Divide the list into 2 sublists
  o Recursively sort the sublists
  o Combine the sorted sublists into one sorted list

Merge Sort:
- Not in-place
- Merge algorithm:
  o If L1 is empty, return L2
  o If L2 is empty, return L1
  o Otherwise, repeatedly move the smaller of the two front elements to the output

Complexity:
- Time complexity: O(nlogn)
  o O(n) work at each level
  o O(logn) levels
- Space complexity: in-place merging is difficult; the not-in-place version uses O(2n) space

Quick Sort:
- Pick a pivot element from the list
- Re-order the list so elements less than the pivot are to its left and elements greater are to its right
- Recursively sort, then concatenate the lists

Complexity:
- Time complexity:
  o Worst case O(n^2): bad pivot
  o Best case O(nlogn): good pivot
- Space complexity: in-place is difficult; the not-in-place version uses O(2n) space

Non-comparison sorts:
The best sorts so far (heap sort, merge sort, quick sort) are O(nlogn).
This is the best possible for comparison-based sorting.

Bucket sort:
- If you know the range of values, create a bucket for each possible value
- Assign each item to its bucket in constant time
- Read the items out of the buckets from least to greatest

Complexity:
- Time complexity: O(n + range_size)
  o O(n) to put items into buckets
  o O(range_size + n) to read the result out
- Space complexity: O(range_size) extra space, O(range_size + n) total
- Can be stable or not

Radix sort:
- Sort digit by digit, starting from the rightmost (least significant) digit

Complexity:
- Time complexity: O(dn) (d digits)
- Space complexity: O(BASE + n): need BASE buckets, plus storage for the n numbers
- Can be stable or not

Algorithm:
- The input is interpreted numerically as in the range [0, BASE^num_digits) (e.g. BASE = 10 for decimal)
- We have an auxiliary array, bins, of BASE elements
- pass = 0
- while pass < num_digits:
  o for each element in the array:
      location = ((value of element) / BASE^pass) mod BASE
      insert the element into the list at bins[location]
  o concatenate all the lists in bins
  o increment pass
[Note: don't actually calculate BASE^pass! It's BASE * (the previous value) each time through the loop.]

SORT TYPE   BEST CASE                 WORST CASE         SPACE              STABILITY    WHEN TO USE
Selection   O(n^2)                    O(n^2)             constant           unstable     list is small, space is limited
Insertion   O(n)                      O(n^2)             constant           stable       list is nearly sorted
Bubble      O(n)                      O(n^2)             constant           stable       list is nearly sorted or small; stops once the list is sorted
Heap        O(nlogn)                  O(nlogn)           constant           unstable     fast without much extra space
Merge       O(nlogn)                  O(nlogn)           O(2n)              stable       fast for large data sets
Quick       O(nlogn) (fastest)        O(n^2)             constant or O(2n)  unstable     fastest in practice; bad with a bad pivot, e.g. a perfectly sorted array with a naive pivot
Bucket      O(n + range_size)         O(n + range_size)  O(range_size + n)  can be both  use when you know the range
Radix       O(n) if digits are bound  O(dn)              O(BASE + n)        stable       use when you know the range but it is very large
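A C++ sketch of the selection sort pseudocode above (the function name is mine):

    #include <vector>
    #include <algorithm>  // std::swap

    // Selection sort: grow the sorted prefix by one element per pass.
    void selectionSort(std::vector<int>& a) {
        for (size_t i = 0; i + 1 < a.size(); ++i) {
            size_t imin = i;                          // index of min in a[i..size-1]
            for (size_t j = i + 1; j < a.size(); ++j)
                if (a[j] < a[imin]) imin = j;
            if (imin != i) std::swap(a[i], a[imin]);  // skip the swap when i == imin
        }
    }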
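A sketch of the in-place insertion sort variant; on already-sorted input the while loop never runs, which is the O(n) best case:

    #include <vector>

    // In-place insertion sort: a[0..i-1] is sorted; slide a[i] left into place.
    void insertionSort(std::vector<int>& a) {
        for (size_t i = 1; i < a.size(); ++i) {
            int item = a[i];
            size_t j = i;
            while (j > 0 && a[j - 1] > item) {  // shift larger elements right
                a[j] = a[j - 1];
                --j;
            }
            a[j] = item;                        // stable: equal keys never pass each other
        }
    }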
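A sketch of bubble sort with the early exit that gives the O(n) best case:

    #include <vector>
    #include <algorithm>  // std::swap

    // Bubble sort: each pass bubbles the largest remaining element into place.
    void bubbleSort(std::vector<int>& a) {
        if (a.empty()) return;
        bool swapped = true;
        for (size_t pass = a.size() - 1; swapped && pass > 0; --pass) {
            swapped = false;
            for (size_t i = 0; i < pass; ++i) {
                if (a[i] > a[i + 1]) {           // adjacent out-of-order pair
                    std::swap(a[i], a[i + 1]);
                    swapped = true;              // no swaps in a pass => sorted, stop
                }
            }
        }
    }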
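A not-in-place merge sort sketch; the merge loop spells out the step summarized in the notes above, repeatedly taking the smaller front element:

    #include <vector>

    // Merge two sorted runs into one sorted result (not in-place: O(n) extra).
    static std::vector<int> merge(const std::vector<int>& l1, const std::vector<int>& l2) {
        std::vector<int> out;
        out.reserve(l1.size() + l2.size());
        size_t i = 0, j = 0;
        while (i < l1.size() && j < l2.size()) {
            // <= keeps the sort stable: ties are taken from l1 first
            if (l1[i] <= l2[j]) out.push_back(l1[i++]);
            else                out.push_back(l2[j++]);
        }
        while (i < l1.size()) out.push_back(l1[i++]);  // "if L2 is empty, return L1"
        while (j < l2.size()) out.push_back(l2[j++]);  // "if L1 is empty, return L2"
        return out;
    }

    std::vector<int> mergeSort(const std::vector<int>& a) {
        if (a.size() <= 1) return a;                   // a list of size 0/1 is sorted
        size_t mid = a.size() / 2;
        std::vector<int> left(a.begin(), a.begin() + mid);
        std::vector<int> right(a.begin() + mid, a.end());
        return merge(mergeSort(left), mergeSort(right));
    }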
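A quick sort sketch using a Lomuto-style partition; taking the last element as the pivot is a deliberately naive pivot choice, which is exactly why sorted input is a worst case:

    #include <vector>
    #include <algorithm>  // std::swap

    // Quick sort, in place: partition around the pivot, then recurse on each side.
    void quickSort(std::vector<int>& a, int lo, int hi) {
        if (lo >= hi) return;
        int pivot = a[hi];                  // naive pivot: last element
        int split = lo;                     // a[lo..split-1] holds elements < pivot
        for (int i = lo; i < hi; ++i)
            if (a[i] < pivot) std::swap(a[i], a[split++]);
        std::swap(a[split], a[hi]);         // place the pivot between the two halves
        quickSort(a, lo, split - 1);        // sort the "less than" side
        quickSort(a, split + 1, hi);        // sort the "greater than" side
    }

    void quickSort(std::vector<int>& a) {
        quickSort(a, 0, static_cast<int>(a.size()) - 1);
    }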
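A bucket sort sketch for integers in a known range [lo, hi]. This is the counting variant, where each bucket is just a tally; with a list per bucket instead (as the notes describe), the sort can be made stable for full records:

    #include <vector>

    // One bucket per possible value; O(n + range_size) time, O(range_size) extra.
    std::vector<int> bucketSort(const std::vector<int>& a, int lo, int hi) {
        std::vector<int> count(hi - lo + 1, 0);   // O(range_size) extra space
        for (int v : a) ++count[v - lo];          // O(n) to fill the buckets
        std::vector<int> out;
        out.reserve(a.size());
        for (int v = lo; v <= hi; ++v)            // O(range_size + n) to read out
            for (int c = 0; c < count[v - lo]; ++c)
                out.push_back(v);
        return out;
    }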
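A C++ sketch following the radix pseudocode above, including the note's trick of maintaining BASE^pass incrementally:

    #include <vector>

    // Radix sort: BASE bins, one pass per digit, least significant digit first.
    // Stable because each bin preserves the order in which elements arrive.
    void radixSort(std::vector<unsigned>& a, unsigned numDigits, unsigned BASE = 10) {
        std::vector<std::vector<unsigned>> bins(BASE);
        unsigned divisor = 1;                  // BASE^pass, maintained incrementally
        for (unsigned pass = 0; pass < numDigits; ++pass) {
            for (unsigned value : a) {
                unsigned location = (value / divisor) % BASE;
                bins[location].push_back(value);
            }
            a.clear();                         // concatenate all the bins, in order
            for (auto& bin : bins) {
                for (unsigned value : bin) a.push_back(value);
                bin.clear();
            }
            divisor *= BASE;                   // don't compute BASE^pass from scratch
        }
    }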
Priority Queues:
- Like a queue, but items are removed by priority rather than arrival order
- Keys and values:
  o Keys are not necessarily unique
  o Keys can be complicated
  o Keys must be immutable
- Need a comparison function to compare keys

Heaps:
- Min-heap or max-heap
- Operations:
  o insert(key, value)
  o min/maxElement()
  o removeMin/Max()
  o isEmpty()
  o size()
  o min/maxKey()

Binary heap invariants:
- Shape property:
  o Each level is full, except possibly the last
  o The lowest level fills from left to right
- Heap property:
  o Parents are more important than their children
  o The order between children doesn't matter

A heap can be stored in an array list, giving O(1) access to elements:
- Put the root at position 1 instead of 0
- Keep a data member for the current size
- For an element at position i:
  o The left child is at 2i
  o The right child is at 2i+1
  o The parent is at i/2

https://www.cs.usfca.edu/~galles/visualization/Heap.html

Algorithm after inserting/removing:
- Re-establish the shape property first
- Re-establish the heap property without disrupting the shape
- Update the heap size

Insert: worst case O(logn), average O(1)
- Insert the item at element [heapSize + 1]
- Increment the heap size
- Up-heap until you reach the root:
  o Compare the added element with its parent: if they are in the correct order, stop
  o If not, swap and up-heap from the parent

RemoveMin:
- Remove the root (save it for returning)
- Replace the root with the last element in the array (to maintain shape)
- Down-heap the new root:
  o Compare the root with its children; if they are in the correct order, stop
  o If not, swap with the smaller child and down-heap from that child
  o NOTE: check that children exist; if the right child exists, the left must too

Build heap: O(n)
- Create an array with all the elements
- Starting from the last node that has children (position n/2) and working back to the root, down-heap each element
(a min-heap sketch appears at the end of these notes)

Graphs:
- The ultimate linked data structure
- Vertices/nodes connected by arcs/edges
- Represent arbitrary connections
- Edges can have labels or weights/costs
- Can be directed or undirected

Uses:
- Find a route from A to B
- Find the shortest route
- Find the cheapest route

Representation:
- Adjacency matrix:
  o O(1) time to find an edge from A to B
  o O(n) time to list all vertices adjacent to a vertex
  o O(n^2) space
- Adjacency list:
  o O(E+V) space
  o O(V) time to find an edge from A to B
- Other options: hash tables, sets

Graph Traversals: in unweighted graphs
Complexity of both: O(V+E)
- BFS:
  o queue<Artist> q
  o enqueue the starting artist
  o while the queue isn't empty:
      artist curr = q.dequeue()
      mark_vertex(curr)
      get the neighbors of curr
      for each neighbor: if not marked, mark curr as its predecessor and enqueue it
  NOTE: this works because mark_predecessor won't overwrite a predecessor on an artist that already has one
- DFS:
  o mark_vertex(curr)
  o get the neighbors of curr
  o for each neighbor: if not marked, mark curr as its predecessor and recurse
(a BFS sketch appears at the end of these notes)

BFS on a weighted graph: Dijkstra's algorithm (a greedy algorithm)
- for each Vertex v:
  o v.dist = INFINITY
  o v.known = false
  o v.prev = NONE
- s.dist = 0
- while there is an unknown vertex:
  o Vertex v = the unknown vertex with the smallest distance
  o v.known = true
  o for each Vertex w adjacent to v:
      if (not w.known):
        c = cost of the edge from v to w
        if (v.dist + c) < w.dist:
          w.dist = v.dist + c
          w.prev = v

Limitations: doesn't work with negative edge weights
Time complexity: O(V^2 + E) = O(V^2)

Revised algorithm: put the unprocessed vertices into a min-priority queue keyed on the current best distance
- dist[source] <- 0
- create vertex priority queue Q
- for each vertex v in Graph:
  o if v != source:
      dist[v] <- INFINITY
      prev[v] <- UNDEFINED
  o Q.add_with_priority(v, dist[v])
- while Q is not empty:
  o u <- Q.extract_min()
  o for each neighbor v of u:
      alt <- dist[u] + length(u, v)
      if alt < dist[v]:
        dist[v] <- alt
        prev[v] <- u
        Q.decrease_priority(v, alt)
- return dist, prev

Time complexity: O((E + V) log(V))
(a Dijkstra sketch appears at the end of these notes)

Hashing:
- Implements key-value stores with constant-time access (backed by an array list)
- Hash function: deterministically maps a key (whatever that is in your application) to an integer, i.e., it always returns the same answer for a given key (it can't just be random)
- Compression function: reduces the hash function's return value into the table's index range
- A good hash function is:
  o Deterministic
  o Fast
  o Spreads keys out well
  o Assigns two keys the same hash value with low probability

Collisions:
- Avoid them by using good hash functions and picking table sizes that minimize collisions
- Handling:
  o Chaining: put a list in each slot of the array
  o Linear probing: the table must always have extra space; insert into the next unfilled slot. Each slot needs a bool to tell whether it is filled

Load factor:
- (number of keys stored) / (number of buckets in the table)
- A measure of how full the table is
- Open addressing cannot support load factors greater than 1
- A low load factor means probably O(1) operations
- A high load factor means O(n)
- Most systems keep it under 0.7
- If the load factor gets too high, grow the array

Growing the array:
- Can't just copy the elements over: the array is bigger, so the compression function would map keys to different slots
- Have to rehash all the values
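A minimal min-heap sketch matching the array layout above (root at index 1; keys only, for brevity). The extra constructor does the O(n) build described under "Build heap"; the class and member names are mine:

    #include <vector>
    #include <stdexcept>
    #include <utility>  // std::swap

    class MinHeap {
        std::vector<int> data{0};              // index 0 unused; heapSize = data.size() - 1

        void upHeap(size_t i) {                // bubble the element at i toward the root
            while (i > 1 && data[i] < data[i / 2]) {
                std::swap(data[i], data[i / 2]);
                i /= 2;                        // parent is at i/2
            }
        }
        void downHeap(size_t i) {
            size_t n = data.size() - 1;
            while (2 * i <= n) {               // while a left child exists
                size_t child = 2 * i;          // left child at 2i
                if (child + 1 <= n && data[child + 1] < data[child])
                    ++child;                   // right child at 2i+1: pick the smaller
                if (data[i] <= data[child]) break;  // heap property restored
                std::swap(data[i], data[child]);
                i = child;
            }
        }
    public:
        MinHeap() = default;

        // O(n) build: down-heap from the last node with children back to the root.
        explicit MinHeap(const std::vector<int>& items) {
            data.insert(data.end(), items.begin(), items.end());
            for (size_t i = (data.size() - 1) / 2; i >= 1; --i) downHeap(i);
        }

        bool isEmpty() const { return data.size() == 1; }

        void insert(int key) {                 // worst case O(log n)
            data.push_back(key);               // put at [heapSize + 1]: keeps the shape
            upHeap(data.size() - 1);
        }
        int removeMin() {
            if (isEmpty()) throw std::runtime_error("empty heap");
            int min = data[1];
            data[1] = data.back();             // last element to the root: keeps the shape
            data.pop_back();
            if (!isEmpty()) downHeap(1);
            return min;
        }
    };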
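A BFS sketch over an adjacency list; vertices are plain ints rather than artists, and pred[] plays the role of both the mark and the predecessor record, mirroring the note that a predecessor is only ever set once:

    #include <queue>
    #include <vector>

    // Returns pred[v] for every vertex (-1 = never reached); follow pred[]
    // back from a destination to recover the path to start.
    std::vector<int> bfs(const std::vector<std::vector<int>>& adj, int start) {
        std::vector<int> pred(adj.size(), -1);
        std::queue<int> q;
        pred[start] = start;                   // mark the starting vertex
        q.push(start);
        while (!q.empty()) {
            int curr = q.front();
            q.pop();
            for (int nbr : adj[curr]) {        // get the neighbors of curr
                if (pred[nbr] == -1) {         // if not marked
                    pred[nbr] = curr;          // mark curr as its predecessor
                    q.push(nbr);
                }
            }
        }
        return pred;
    }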
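A sketch of the revised Dijkstra using std::priority_queue. The standard library has no decrease_priority, so this uses the common lazy-deletion workaround instead: re-push on improvement and skip stale entries when they are popped, which keeps the O((E + V) log V) bound:

    #include <queue>
    #include <vector>
    #include <limits>
    #include <utility>
    #include <functional>

    struct Edge { int to; int cost; };

    std::vector<long long> dijkstra(const std::vector<std::vector<Edge>>& adj, int source) {
        const long long INF = std::numeric_limits<long long>::max();
        std::vector<long long> dist(adj.size(), INF);
        using Entry = std::pair<long long, int>;   // (distance, vertex)
        std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> pq;
        dist[source] = 0;
        pq.push({0, source});
        while (!pq.empty()) {
            auto [d, u] = pq.top();
            pq.pop();
            if (d > dist[u]) continue;             // stale entry: skip it
            for (const Edge& e : adj[u]) {
                long long alt = dist[u] + e.cost;  // alt = dist[u] + length(u, v)
                if (alt < dist[e.to]) {
                    dist[e.to] = alt;              // relax the edge
                    pq.push({alt, e.to});          // stand-in for decrease_priority
                }
            }
        }
        return dist;                               // prev[] omitted for brevity
    }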
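A chained hash table sketch tying the hashing pieces together: std::hash as the hash function, modulo as the compression function, and a rehash when the load factor passes 0.7 (the class and member names are mine):

    #include <vector>
    #include <list>
    #include <string>
    #include <utility>
    #include <functional>

    class StringIntMap {
        std::vector<std::list<std::pair<std::string, int>>> table{8};
        size_t numKeys = 0;

        size_t bucketFor(const std::string& key, size_t buckets) const {
            return std::hash<std::string>{}(key) % buckets;  // hash, then compress
        }
        void grow() {
            // Can't just copy the chains over: the compression depends on the
            // table size, so every key must be rehashed into the new table.
            std::vector<std::list<std::pair<std::string, int>>> bigger(table.size() * 2);
            for (auto& chain : table)
                for (auto& kv : chain)
                    bigger[bucketFor(kv.first, bigger.size())].push_back(std::move(kv));
            table = std::move(bigger);
        }
    public:
        void insert(const std::string& key, int value) {
            auto& chain = table[bucketFor(key, table.size())];
            for (auto& kv : chain)
                if (kv.first == key) { kv.second = value; return; }  // update in place
            chain.push_back({key, value});
            ++numKeys;
            if (double(numKeys) / table.size() > 0.7) grow();  // keep the load factor low
        }
        const int* find(const std::string& key) const {
            for (const auto& kv : table[bucketFor(key, table.size())])
                if (kv.first == key) return &kv.second;
            return nullptr;                                    // key absent
        }
    };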