Joe Meehean
 Data structures are great at…
• storing items
• providing a simple access interface
 Algorithms…
• operate over data structures
• instructions to do more complicated work
• functionality we may or may not want embedded in the data structure
 Found in the <algorithm> header
 From the simple
• for_each
• find
• swap
 To the complex
• sort
• set_intersection
• random_shuffle
• next_permutation
 Straightforward approach
 Usually based directly on
• the problem statement
• definitions of concepts
 e.g., a^n
• a^n = a * a * … * a (n times)
 Advantages
• applicable to a wide variety of problems
• no limitation on problem size for some important problems
• simple to design
 designing a better algorithm is not always worth it
 if the problem is small or the algorithm will run infrequently
• provides a comparison for more complex algorithms
 Disadvantages
• slow
• may be so slow as to be impossible to complete in a human lifetime
 Problem
• arrange comparable items in a list into sorted order
 Most sorting algorithms involve comparing item values
 We assume items define
• the < operator
• the > operator
• the == operator
 Brute force
 Find the smallest value in vector A and put it in A[0]
 Find the 2nd smallest value and put it in A[1]
 Etc.
 Use a nested loop
 Outer loop index k
• indicates the position to fill
 Inner loop index j
• runs from k+1 to A.length – 1
• indicates the next value to compare to the minimum
 Swap A[k] with A[min]
• A[min] is the minimum value in the range k to A.length – 1
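The loop structure above can be sketched in C++ (a sketch: the function name and the use of std::vector are illustrative, not from the slides):

```cpp
#include <utility>  // std::swap
#include <vector>

// Selection sort: for each position k, find the smallest value in the
// unsorted suffix A[k..A.size()-1] and swap it into A[k].
void selectionSort(std::vector<int>& A){
    for(std::size_t k = 0; k + 1 < A.size(); k++){      // position to fill
        std::size_t min = k;                            // smallest seen so far
        for(std::size_t j = k + 1; j < A.size(); j++){  // next value to compare
            if( A[j] < A[min] ) min = j;
        }
        std::swap(A[k], A[min]);   // A[min] is the min in k..A.size()-1
    }
}
```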
[Figure: selection sort trace on the array 8 5 2 6 9 3 1 4 0 7; min tracks the smallest value found by j in the unsorted suffix, which is then swapped into position k. The final chart plots item value by position in the array.]
 After i outer loop iterations
• A[0] through A[i-1] contain their final values
 Outer loop executes N times
 Inner loop executes a different number of times depending on the outer loop
• 1st outer iteration = N – 1 inner iterations
• 2nd outer = N – 2 inner
• …
• Nth outer = 0 inner
• (N-1) + (N-2) + … + 2 + 1 + 0 = O(N^2)
 Always O(N^2)
 Combinatorial problems
• problems where the answer is a combination of items from a set
 Exhaustive search
• brute force approach to combinatorial problems
• generate all possible combinations
• check each combination to see if it is a possible solution
• then select the best solution
 Have a set of items
• each has a weight: w_i
• each has a monetary value: v_i
 Have a knapsack
• it can only hold a total weight of W
 Fill the knapsack to maximize its value
 Exhaustive search
• try every combination of items
• throw out combinations that are too heavy
• select the combination with the largest value
 Complexity
• dominated by generating all combinations
• each item may be in the knapsack or not
• for N items: 2^N possible combinations
• O(2^N): very, very bad
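The exhaustive search can be sketched by treating each of the 2^N subsets as a bitmask (a sketch; the function name and example data are illustrative):

```cpp
#include <vector>

// Exhaustive 0/1 knapsack: generate every subset of the N items,
// throw out subsets over the weight limit W, and keep the best value.
// O(2^N) subsets, each costing O(N) to evaluate.
int knapsackBruteForce(const std::vector<int>& w,
                       const std::vector<int>& v, int W){
    int n = static_cast<int>(w.size());
    int best = 0;
    for(unsigned mask = 0; mask < (1u << n); mask++){   // one bit per item
        int weight = 0, value = 0;
        for(int i = 0; i < n; i++){
            if( mask & (1u << i) ){ weight += w[i]; value += v[i]; }
        }
        if( weight <= W && value > best ) best = value; // feasible and better
    }
    return best;
}
```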
 Exploit the relationship between the solution to a problem and the solution to a smaller instance
 Reduce the problem to a smaller problem
 Solve the smaller problem
 Use the smaller problem's solution to solve the original problem
 Top-down
• reduce the larger problem into successively smaller problems
• recursive approach
 Bottom-up
• solve the smallest version of the problem
• use it to solve the next larger problem
• incremental approach
3
variations
 Decrease-by-constant
• compute an for positive integer n
• an = an-1 * a
• f(n) = an
• f(n) = f(n-1) * a, if n >0
• f(n) = 1, if n == 0
• recursive definition
33
3
variations
 Decrease-by-constant-factor
• an = (an/2)2
• n/2 is not an integer if n is odd, so…
• an = (an/2)2 , if n is even
• an = (a(n-1)/2)2 * a, if n is odd
• an = 1, if n = 0
• O(logN) number of multiplications
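The decrease-by-constant-factor recurrence translates directly into code (a sketch; the function name is illustrative):

```cpp
// Decrease-by-constant-factor exponentiation:
//   a^n = (a^(n/2))^2         if n is even
//   a^n = (a^((n-1)/2))^2 * a if n is odd
//   a^0 = 1
// Only O(log n) multiplications, vs. O(n) for decrease-by-constant.
long long power(long long a, int n){
    if( n == 0 ) return 1;
    long long half = power(a, n / 2);   // integer division covers both cases
    return (n % 2 == 0) ? half * half : half * half * a;
}
```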
3
variations
 Variable-size-decrease
• lookup in BST
• BST is unbalanced
• at each node going left or right removes a
variable number of nodes
35
 Incremental approach
 Reduce the list size until it is trivial to sort, and solve
 Then increase the list size
 Put the 1st 2 items in the correct order
 Insert the 3rd item in the correct place relative to the first 2
 Insert the 4th item in the correct place relative to the first 3
 etc.
 Nested loop
 Outer loop
• index k from 1 to A.length – 1
• the item to put into its correct place
 Inner loop
• index j from k – 1 down to 0
• items to compare to A[k] to find its correct place
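The nested loop above can be sketched in C++ (a sketch; the function name is illustrative):

```cpp
#include <vector>

// Insertion sort: A[0..k-1] is already in order relative to itself;
// save A[k] in temp, shift larger items one slot right, drop temp in.
void insertionSort(std::vector<int>& A){
    for(std::size_t k = 1; k < A.size(); k++){  // item to place
        int temp = A[k];
        std::size_t j = k;
        while( j > 0 && A[j - 1] > temp ){      // scan from k-1 down to 0
            A[j] = A[j - 1];                    // shift right
            j--;
        }
        A[j] = temp;                            // correct place found
    }
}
```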
[Figure: insertion sort trace on the array 8 5 2 6 9 3 1 4 0 7; temp holds A[k] while j scans left, shifting larger items right until temp's place is found. The final chart plots item value by position in the array.]
 After the i-th iteration of the outer loop
• A[0] through A[i – 1] are in order relative to each other only
 To insert an item, we need to shift some items to the right
 Outer loop executes N times
 Worst case
• occurs when A is in reverse sorted order
• inner loop executes 1 to N – 1 times
• O(N^2)
 Best case
• occurs when A is already sorted
• inner loop never executes
• O(N)
 Divide an instance of a problem into two or more smaller problems
 Solve the smaller problems, if easy
 If not, divide again
 “Top-down” approach
• solve the “top” problem by stopping and going down to solve the smaller problems
 Classic recursion
 Strategy
1. divide a problem into smaller instances
2. conquer the smaller instances
3. combine the smaller solutions
 Do not use when an instance of size n:
• divides into two or more instances of nearly size n
• divides into n instances of size n/c, where c is a constant
 We do not get very much out of dividing the problem
 Sometimes this is unavoidable
• the larger problem is too difficult to solve without dividing
 Fundamental idea
• an array of size one is sorted
• divide an array repeatedly until it is a bunch of sorted arrays of size one
• combining two sorted arrays into a single sorted array can be done in O(N)
 Algorithm
• divide the array into 2 halves
• merge sort each half
 divide, sort, merge
 a list of size one is already sorted
• merge the halves
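The algorithm can be sketched in C++ (a sketch; the function names mergeAux and mergeSort are illustrative, though a mergeAux helper is mentioned later in the slides):

```cpp
#include <vector>

// Merge the sorted runs A[lo..mid) and A[mid..hi) through buffer C,
// then copy the merged run back.  O(N) work per merge.
void merge(std::vector<int>& A, std::vector<int>& C,
           std::size_t lo, std::size_t mid, std::size_t hi){
    std::size_t aidx = lo, bidx = mid, cidx = lo;
    while( aidx < mid && bidx < hi )            // take the smaller front item
        C[cidx++] = (A[aidx] <= A[bidx]) ? A[aidx++] : A[bidx++];
    while( aidx < mid ) C[cidx++] = A[aidx++];  // drain leftovers
    while( bidx < hi )  C[cidx++] = A[bidx++];
    for(std::size_t i = lo; i < hi; i++) A[i] = C[i];  // copy back
}

// Divide, merge sort each half, merge; a run of size one is already sorted.
void mergeAux(std::vector<int>& A, std::vector<int>& C,
              std::size_t lo, std::size_t hi){
    if( hi - lo <= 1 ) return;
    std::size_t mid = lo + (hi - lo) / 2;
    mergeAux(A, C, lo, mid);
    mergeAux(A, C, mid, hi);
    merge(A, C, lo, mid, hi);
}

void mergeSort(std::vector<int>& A){
    std::vector<int> C(A.size());  // temporary buffer: merge sort is not in-place
    mergeAux(A, C, 0, A.size());
}
```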
[Figure: merging A = 0 1 3 4 7 and B = 2 5 6 8 9 into C; aidx, bidx, and cidx advance as the smaller front element is copied into C at each step.]
 The merge works exactly the same if the input lists are just different parts of the same list
 The sorted data must be copied back to the original list after it is merged
[Figure: merge sort of the array 5 2 6 3 1 4 0 7; the array is split in half repeatedly down to single elements, then the sorted halves are merged back together into 0 1 2 3 4 5 6 7. The final chart plots item value by position in the array.]
 Merge sort is not an in-place algorithm
• requires a temporary buffer to hold the partially merged array
 Merge sort is an example of a divide-and-conquer algorithm
• divides the problem into progressively smaller parts
• solves the easy smaller problems
• combines the results
 Calls to mergeAux form a binary tree
• the number in each node is the array size, given N = 8
[Figure: tree of call sizes; 8 at the root, then 4 and 4, then four 2s, then eight 1s; the height is log2 N.]
 The height of the tree is O(logN)
 Work done at each level
• all values are merged at each level (not at each node)
• O(N)
 Total time is O(NlogN)
• always; never any faster or slower
 Several solutions
 Each solution has a value
 We want to find an optimal solution
• one with the best (min/max) value
• there may be several optimal solutions
 Motivation
 Works well for optimization problems
 Difficulty
• for some recursive problems
• recursion may be inefficient
• (rule 4: the compound interest rule)
 Fibonacci
• fib(1) = 1, fib(2) = 1
• fib(N) = fib(N-1) + fib(N-2), for N > 2
• fib(3) = fib(2) + fib(1) = 1 + 1 = 2
int fib(int n){
    if( n <= 2 ) return 1;
    return fib(n - 2) + fib(n - 1);
}
[Figure: call tree for fib(4); fib(4) = fib(3) + fib(2) = 3, where fib(3) = fib(2) + fib(1) = 2]
 Note: fib(2) is called twice
 Redoing work
 Solution: Dynamic Programming
• solve the sub-problems 1st
• store the results in a table
• a “bottom-up” approach vs. divide and conquer’s “top-down”
 Fibonacci with Dynamic Programming
int fib(int n){
    if( n <= 2 ) return 1;
    std::vector<int> fibs(n + 1);  // table of sub-problem results
                                   // (a std::vector; int fibs[n+1] is a
                                   // variable-length array, not standard C++)
    fibs[1] = 1;
    fibs[2] = 1;
    for(int i = 3; i <= n; i++){
        fibs[i] = fibs[i - 2] + fibs[i - 1];
    }
    return fibs[n];
}
 Recognizing
• a recursive problem
• the same sub-problems solved independently
 Developing the algorithm
• establish a recursive property
 allows division into sub-problems
• solve the sub-problems bottom up
 store the results in a table
 Levenshtein distance
• an edit distance
• how much two strings differ
• the number of point mutations needed to change string s1 into string s2
 Point mutation
• change a letter
• insert a letter
• delete a letter
 Levenshtein distance: recursive property
• d(s1, s2) for strings s1 & s2
• d(“”, “”) = 0 // empty strings are the same
• d(s1, “”) = d(“”, s1) = s1.length()
• d(s1+ch1, s2+ch2) = minimum of…
 if( ch1 == ch2 ), d(s1, s2) // characters are the same
 if( ch1 != ch2 ), d(s1, s2) + 1 // change a letter
 d(s1 + ch1, s2) + 1 // delete the last letter from s2
 d(s1, s2 + ch2) + 1 // delete the last letter from s1
 Levenshtein distance
• could easily be written recursively
• base case
 one string is empty
• 3 recursive cases
 remove the last character from both words
 remove the last character from string s1
 remove the last character from string s2
 Levenshtein distance
• could easily be written recursively
• but it is inefficient to do so
• lots of repeated sub-problems
 remove the last character from s1, then recurse
 then remove the last character from s2, and recurse
 the same sub-problem as removing from both, then recursing
 Dynamic programming approach
 Create a table
• stores the edit distance sub-problems
• 2D array m
• s1.length() + 1 rows
• s2.length() + 1 columns
• m[i][j] = d(s1[0..i-1], s2[0..j-1])
 the distance of the first i characters of s1 to the first j characters of s2
 Dynamic programming approach
 Important note
• i & j are “count”-based indices for m, counting the # of characters compared
• i & j are 0-based indices for s1 & s2
• s1[4] is the 5th character of s1
• m[4][j] compares the 1st four chars of s1 against the 1st j chars of s2
• so s1[3] is the last char compared for m[4][j]
 Dynamic programming approach
 Fill the table
 Base cases
• m[0][0] = 0
• m[i][0] = i
• m[0][j] = j
 Dynamic programming approach
 Fill the table
 Recursive cases: m[i][j] = min(
• if( s1[i-1] == s2[j-1] ): m[i – 1][j – 1]
• else: m[i – 1][j – 1] + 1,
• m[i – 1][j] + 1,
• m[i][j – 1] + 1
)
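The base and recursive cases above can be sketched in C++ (a sketch; the function name is illustrative):

```cpp
#include <algorithm>  // std::min
#include <string>
#include <vector>

// Levenshtein distance, filled bottom-up.  m[i][j] holds the distance
// between the first i characters of s1 and the first j characters of s2.
int editDistance(const std::string& s1, const std::string& s2){
    std::size_t n = s1.size(), k = s2.size();
    std::vector<std::vector<int>> m(n + 1, std::vector<int>(k + 1));
    for(std::size_t i = 0; i <= n; i++) m[i][0] = static_cast<int>(i);  // base cases
    for(std::size_t j = 0; j <= k; j++) m[0][j] = static_cast<int>(j);
    for(std::size_t i = 1; i <= n; i++){       // fill left to right, top to bottom
        for(std::size_t j = 1; j <= k; j++){
            // change a letter (free if the characters match)
            int change = m[i-1][j-1] + (s1[i-1] == s2[j-1] ? 0 : 1);
            // min of change, delete from s1, delete from s2
            m[i][j] = std::min({ change, m[i-1][j] + 1, m[i][j-1] + 1 });
        }
    }
    return m[n][k];
}
```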
 Dynamic programming approach
 Fill the table
• recursive table entries rely only on the previous row and column
• fill from left to right, top to bottom
 Complexity
• fill an entry: O(1)
• # of entries: s1.length() * s2.length() = n * m
• n*m * O(1) = O(n*m)
 Also used to solve optimization problems
 Make a series of choices
• irreversible
 Make the “best” choice at the time
• ignoring previous choices and future choices
 Intuitive and simple to create
 Difficult to prove optimal
 Make change using US currency
• use the least number of coins
• 25c, 10c, 5c, 1c
• give quarters until infeasible
• then give dimes, then nickels, then pennies
• 83c = 3 quarters, 1 nickel, 3 pennies
 Not optimal for all denominations
• 25c, 10c, 1c
• 30c = 3x 10c, not 25c + 5x 1c
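The greedy change-maker can be sketched in a few lines (a sketch; the function name is illustrative):

```cpp
#include <vector>

// Greedy change-making: always give the largest coin that still fits.
// Coins must be listed largest first.  Returns the number of coins used;
// optimal for US denominations but not for every coin set.
int makeChange(int cents, const std::vector<int>& coins){
    int count = 0;
    for(int c : coins){        // greedy, irreversible choice at each step
        count += cents / c;    // give as many of this coin as feasible
        cents %= c;
    }
    return count;
}
```

With US coins, 83c comes out as 7 coins (3 quarters, 1 nickel, 3 pennies); with the denominations {25, 10, 1}, 30c greedily becomes 25c + 5 pennies (6 coins) even though 3 dimes would do.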
 Outline
1. start with an empty set
2. add items in sequence
 sub-outline on next slide
3. repeat (2) until the set represents a solution
 Selection procedure
• chooses the next item based on the greedy criterion
 Feasibility check
• does the new set violate the rules?
 Solution check
• is the new set the answer?
 Symbols are encoded into binary
• e.g., letters, colors, …
 Convert a fixed-length encoding
• ASCII
• pixel color data (RGB)
 to a variable-length encoding
• the most frequent symbols get the shortest encodings
• reduces the size of the encoded data
 Variable-length encoding
• how do we separate symbols?
• e.g., 0000111 => 000 01 11 => BAD
• use prefix-free codes
• no codeword is a prefix of any other
 How do we create a set of prefix-free codewords?
 Binary tree
• symbols as leaves
• left branch is 0, right branch is 1
• the path from root to leaf defines the prefix-free code
[Figure: example code tree with leaves A, B, C, D, E]
 How do we create a tree that maximizes compression?
 Huffman’s algorithm
1. create a tree for each symbol, with weight based on the frequency of the symbol
2. combine the two smallest-weight trees
 make them the left and right children of a new tree
 the weight of the new tree is the sum of its children’s weights
3. repeat step 2 until only 1 tree remains
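The three steps map naturally onto a min-priority queue of trees (a sketch; the struct and function names are illustrative):

```cpp
#include <map>
#include <queue>
#include <string>
#include <vector>

// One tree in the forest: a leaf carries a symbol, an internal node
// carries the combined weight of its two children.
struct HuffNode {
    int weight;
    char symbol;                      // meaningful only at leaves
    HuffNode* left = nullptr;
    HuffNode* right = nullptr;
};
struct ByWeight {                     // min-heap ordering on weight
    bool operator()(const HuffNode* a, const HuffNode* b) const {
        return a->weight > b->weight;
    }
};

// Huffman's algorithm: one tree per symbol, then repeatedly combine
// the two smallest-weight trees until a single tree remains.
HuffNode* buildHuffman(const std::map<char, int>& freq){
    std::priority_queue<HuffNode*, std::vector<HuffNode*>, ByWeight> pq;
    for(const auto& p : freq) pq.push(new HuffNode{p.second, p.first});
    while( pq.size() > 1 ){
        HuffNode* a = pq.top(); pq.pop();
        HuffNode* b = pq.top(); pq.pop();
        pq.push(new HuffNode{a->weight + b->weight, '\0', a, b});
    }
    return pq.top();
}

// Left branch appends '0', right branch appends '1'; the path from the
// root to a leaf is that symbol's prefix-free codeword.
void collectCodes(const HuffNode* t, const std::string& prefix,
                  std::map<char, std::string>& codes){
    if( !t->left && !t->right ){ codes[t->symbol] = prefix; return; }
    collectCodes(t->left, prefix + "0", codes);
    collectCodes(t->right, prefix + "1", codes);
}
```

For the example frequencies A=35, B=10, C=20, D=20, E=15 this yields 2-bit codes for A, C, and D and 3-bit codes for B and E, for an average length of 2.25 bits.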
[Figure: Huffman tree construction for symbols A, B, C, D, E with frequencies 35, 10, 20, 20, 15; first B and E combine (weight 25), then C and D (weight 40), then A joins the B/E tree (weight 60), and finally the two remaining trees form the root.]
Resulting codes:
Symbol  Code
A       01
B       000
C       10
D       11
E       001
 Average code length
• the sum of (codeword length * probability of the symbol) over all symbols
• for our example = 2.25
 Fixed-length would require 3 bits
• 2^3 = 8 >= 5
• 2^2 = 4 < 5
 Compression ratio
• (fixed length – average code length) / fixed length
• (3 – 2.25) / 3 = 25% reduction
 Problem
• choose a sequence of items
• from a set
• the sequence must satisfy some criterion
 State space tree
• root of the tree: no items selected
• children: all possible selections from the set
• leaf: the sequence is complete
• each path represents a possible sequence
 Backtracking
• goal: find a solution
• create & prune the tree
• a preorder, depth-first search of the state space tree
 Backtracking pruning
• non-promising node: a node whose children cannot lead to a solution
• the pruned state space tree should include only promising nodes
 Backtracking algorithm
void checkNode(Node v){
    if( promising(v) ){
        if( v is a solution ){
            stop the search and return the solution
        }
        else{
            for( each child u of v ){
                checkNode(u)
            }
        }
    }
}
 Backtracking algorithm
• the state space tree is not explicitly created and traversed
• the tree is created implicitly using recursion
• pruning happens while traversing
 E.g., n-Queens
 Place n queens on an n x n chessboard
 Queens must not threaten each other
 No two queens can be in the same row, column, or diagonal
 n-Queens state space tree
• queens cannot share a row
• for each queen we only need to choose a column
• each level in the tree represents choosing a column for the queen in the next row
• each node stores 2 numbers: row, column
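The generic backtracking skeleton specializes to n-Queens like this (a sketch; the function names and the col-vector representation are illustrative):

```cpp
#include <cstdlib>  // std::abs
#include <vector>

// col[r] is the column of the queen in row r.  A node at depth `row` is
// promising if that queen shares no column or diagonal with the rows
// above it (rows are distinct by construction).
bool promising(const std::vector<int>& col, int row){
    for(int prev = 0; prev < row; prev++){
        if( col[prev] == col[row] ||
            std::abs(col[prev] - col[row]) == row - prev )  // same diagonal
            return false;
    }
    return true;
}

// Depth-first search of the state space tree: one level per row, one
// child per column choice; non-promising nodes are pruned immediately.
bool placeQueens(std::vector<int>& col, int row, int n){
    if( row == n ) return true;      // all n queens placed: a solution
    for(int c = 0; c < n; c++){
        col[row] = c;
        if( promising(col, row) && placeQueens(col, row + 1, n) )
            return true;             // stop the search, keep this solution
    }
    return false;                    // backtrack
}
```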
[Figure: a portion of the state space tree rooted at S, with nodes labeled (row, column)]
[Figure: the pruned left side of the state space tree]
[Figure: the pruned right side of the state space tree]
 A random number is used to make choices
 Run time depends on the random numbers & the input
 Worst-case runtime may be similar to the non-random algorithm
 Why bother then?
• if inputs are not evenly distributed
• may avoid the worst case more often
• runs differently even for the same input
 Two types:
 Monte Carlo
• always runs fast
• may produce incorrect answers with small probability
 Las Vegas
• always produces the correct answer
• runs quickly with high probability
 Skip list
• Las Vegas
• a sorted linked list
• average lookup and insert of O(logN)
 Preliminary skip list
• every 2nd node has a link to the node two ahead
• every 4th node has a link to the node four ahead
• the 2^i-th node has a link to the node 2^i ahead
[Figure: skip list over the keys A, F, G, H, M, R, S, U, X]
 Lookup
• traverse the “highest” link until the next item is too large
• drop to the next highest link, continue
• once dropped to the lowest link, it should be pointing to the item
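A minimal sketch of the preliminary structure and its lookup (the node layout and builder are illustrative; a real skip list would assign levels randomly on insert, as described below):

```cpp
#include <vector>

// forward[i] is the level-i link; a node at level i points roughly 2^i ahead.
struct SkipNode {
    char key;
    std::vector<SkipNode*> forward;
};

// Build the deterministic "preliminary" skip list from sorted keys:
// the node at (1-based) position p gets one extra level per factor of 2 in p.
SkipNode* buildSkipList(const std::vector<char>& keys){
    std::size_t maxLvl = 1;
    while( (std::size_t(1) << maxLvl) <= keys.size() ) maxLvl++;
    SkipNode* head = new SkipNode{'\0', std::vector<SkipNode*>(maxLvl, nullptr)};
    std::vector<SkipNode*> last(maxLvl, head);  // last node seen at each level
    for(std::size_t p = 1; p <= keys.size(); p++){
        std::size_t lvl = 1, q = p;
        while( q % 2 == 0 && lvl < maxLvl ){ lvl++; q /= 2; }
        SkipNode* n = new SkipNode{keys[p - 1], std::vector<SkipNode*>(lvl, nullptr)};
        for(std::size_t i = 0; i < lvl; i++){ last[i]->forward[i] = n; last[i] = n; }
    }
    return head;
}

// Lookup: follow the highest link while the next key is still too small,
// drop a level when it is too large; at the lowest level, the next node
// either holds the key or the key is absent.
bool contains(const SkipNode* head, char key){
    const SkipNode* cur = head;
    for(int lvl = static_cast<int>(head->forward.size()) - 1; lvl >= 0; lvl--){
        while( cur->forward[lvl] && cur->forward[lvl]->key < key )
            cur = cur->forward[lvl];
    }
    const SkipNode* next = cur->forward[0];
    return next && next->key == key;
}
```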
[Figure: lookup of S; start at the highest level from the head, follow links while the next key is less than S, drop a level when it is not, and end at S on the lowest level.]
[Figure: lookup of T; the same search ends on the lowest level between S and U, so T is not in the list.]
 Insert
• could cause nodes to shift right
• each shifted node would need to change its “level”
 “level”: the number of pointers
 the nodes it points to and the nodes that point to it
[Figure: preliminary skip list over the keys A, F, G, H, M, R, S, U, X]
 Insert
• add randomization
• randomly choose the “level” of the new node
• preliminary: 1/2^i of the nodes are at least level i
• randomized: a new node gets level i with probability 1/2^i
[Figure: a randomized skip list over the same keys; levels are now randomly assigned]
 Insert
• during lookup, keep track of the “drops”
• drop pointers at levels <= the new node’s “level”
 point them to the new node
[Figure: the randomized skip list before inserting U]
[Figure: inserting U; the lookup records where the search dropped a level, and pointers at levels <= U’s chosen level are redirected to point to U, with U taking over their old targets.]
 Skip list complexity
 Preliminary lookup
• cuts out half the remaining nodes at each step
• O(logN)
• why? 1/2^i of the nodes are at level i
 Final lookup
• still has 1/2^i of the nodes at level i
• randomly spread through the list
• on average: O(logN)
 Skip list complexity
 Final insert
• lookup: average O(logN)
• pointer manipulations: at most i
• on average: O(logN + i) = O(logN)