CS 235102 Data Structures (資料結構)
Chapter 1: Basic Concepts
Spring 2012

What is an algorithm?
An "algorithm" is a set of instructions that solves a well-defined computational problem.
An algorithm must satisfy the following criteria:
  Input. Zero or more quantities are externally supplied.
  Output. At least one quantity is produced.
  Definiteness. Each instruction is clear and unambiguous.
  Finiteness. The algorithm terminates after a finite number of steps.
  Effectiveness. Each instruction is basic enough to be carried out feasibly.

Describing an algorithm
Many ways to describe an algorithm:
  Use English sentences
  Graphic representations (called flowcharts)
    • Work well only if algorithms are small and simple
  Use C language (mixed with English sentences)
    • The practice in this course

Binary Search
Assume we have n ≥ 1 distinct integers sorted in array A[0 … n-1].
Determine whether an integer x is in the array. If x = A[j], return index j; otherwise return -1.

  A[0]  A[1]  A[2]  A[3]  A[4]  A[5]  A[6]  A[7]
     1     3     5     8     9    17    32    50

E.g. for x = 9, return index 4; for x = 10, return -1.

Binary Search
Step 1. Find element A[mid] at the middle position.
Step 2. If x = A[mid], then return mid;
        if x < A[mid], then search the left half;
        if x > A[mid], then search the right half.

Algorithm of Binary Search

    // n = number of elements in A[]
    int binsearch(int A[], int n, int x)
    {
        int left=0, right=n-1, mid;       // search x in A[]
        while (left <= right) {           // more integers to check
            // let A[mid] be the middle element
            mid = (left+right)/2;
            if (x == A[mid]) return mid;
            if (x < A[mid]) right = mid-1;
            if (x > A[mid]) left = mid+1;
        }
        return -1;
    }

Recursive Algorithm
Direct recursion
  • A procedure calls itself directly.
  • E.g. procA → procA
Indirect recursion
  • A procedure calls other procedures that in turn invoke the calling procedure.
  • E.g. procA → procB → procA
Some problems are themselves defined recursively.

Binomial Coefficients
The binomial coefficient
    C(n, m) = n! / (m! (n-m)!)
can be computed by the recursive formula
    C(n, m) = C(n-1, m) + C(n-1, m-1),
where C(n, 0) = C(n, n) = 1.
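As a worked illustration of the recurrence (not part of the original slides), C(4, 2) can be expanded by hand, taking C(n, 0) = C(n, n) = 1 as the base cases:

```latex
\begin{aligned}
C(4,2) &= C(3,2) + C(3,1) \\
C(3,2) &= C(2,2) + C(2,1) = 1 + C(2,1) \\
C(3,1) &= C(2,1) + C(2,0) = C(2,1) + 1 \\
C(2,1) &= C(1,1) + C(1,0) = 1 + 1 = 2 \\
\Rightarrow\; C(4,2) &= (1 + 2) + (2 + 1) = 6 = \frac{4!}{2!\,2!}
\end{aligned}
```

Note that C(2, 1) is needed twice in this expansion, which hints at why the plain recursive computation repeats work.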
Computing Binomial Coefficients

    // Compute binomial coefficient C(n,m) recursively (assumes 0 <= m <= n).
    int bin_coeff(int n, int m)
    {
        // termination conditions
        if (m == n) return 1;
        else if (m == 0) return 1;
        // recursive step
        else return bin_coeff(n-1, m) + bin_coeff(n-1, m-1);
    }

Hints for Recursive Algorithms
To write a recursive algorithm, one must make sure to have:
  Termination conditions;
  Parameter values that decrease, so that each call brings us one step closer to a solution.

Recursive Binary Search

    int binsearch(int A[], int x, int left, int right)
    {
        int mid;
        if (left <= right) {              // more integers to check
            // let A[mid] be the middle element
            mid = (left+right)/2;
            if (x == A[mid]) return mid;
            if (x < A[mid]) return binsearch(A, x, left, mid-1);
            if (x > A[mid]) return binsearch(A, x, mid+1, right);
        }
        return -1;
    }

A Running Example
Search for x = 9 in array A[0…7]:

  A[0]  A[1]  A[2]  A[3]  A[4]  A[5]  A[6]  A[7]
     1     3     5     8     9    17    32    50
                     1st   3rd   2nd

1st call: binsearch(A, 9, 0, 7)    (probes A[3])
2nd call: binsearch(A, 9, 4, 7)    (probes A[5])
3rd call: binsearch(A, 9, 4, 4)    (probes A[4]; returns index 4)

Criteria for Good Programs
Many criteria to judge a program:
  Does it meet the original task specification?
  Does it work correctly?
  Is there documentation for using the program?
  Modularity
  Code readability
Although these criteria are vitally important, they are difficult to achieve. Achieving them takes a great deal of real experience and practice.

Performance Analysis
Two criteria for performance analysis/evaluation:
  How much memory space is needed?
  How much running time is needed?
Performance analysis vs. performance measurement
Performance analysis:
  • machine independent
  • a priori estimate
  • heart of "complexity theory"
Performance measurement:
  • machine dependent
  • a posteriori testing

Performance Analysis
Performance analysis contains two parts: space complexity and time complexity.
Space complexity in turn has two parts:
1st part -- a fixed part:
  • independent of the number and size of the inputs and outputs.
  • includes instruction space, space for simple variables, fixed-size structured variables, and constants.
2nd part -- a variable part:
  • depends on the particular problem instance I being solved.
  • includes recursion stack space and structured variables whose size depends on the particular instance I.
Thus space complexity S(P) = 1st part + 2nd part = C + S_P(I), where C is a constant.
We concentrate on evaluating the term S_P(I).

Instance Characteristic (I)
Commonly used characteristics (I) include the number, size, and values of the inputs and outputs.
E.g. sorting(A[], n): then I = number of integers = n.
E.g. summing 1 to n, i.e., 1+2+3+…+n: then I = value of n = n.

Space Complexity
E.g. (See Program 1.10 in the textbook.)

    float abc(float a, float b, float c)
    {
        return a+b+b*c+(a+b-c)/(a+b)+4.00;
    }

C = space for the program + space for variables a, b, c, abc = constant
S_P(I) = 0
Thus S(P) = C + S_P(I) = constant.

Iterative Summing (Program 1.11)

    // Compute the sum of n numbers in A[] iteratively.
    float sum(float A[], int n)
    {
        float tempsum = 0;
        int i;
        for (i=0; i<n; i++)
            tempsum += A[i];
        return tempsum;
    }

In C:
  -- An array is passed by reference only (the function receives the address of its first element).
  -- Other parameters are passed by value.
In PASCAL:
  -- All parameters can be passed by reference or by value.

Iterative Summing (Program 1.11)
Instance characteristic (I) = n (= size of array A[])
If A[] is passed by reference (e.g. in C), S_sum(I) = S_sum(n) = 0
If A[] is passed by value, S_sum(I) = S_sum(n) = n

Recursive Summing (Program 1.12)

    // Compute the sum of n numbers in A[] recursively.
    float rsum(float A[], int n)
    {
        if (n) return rsum(A, n-1) + A[n-1];
        return 0;    // empty prefix sums to 0 (returning A[0] here would count A[0] twice)
    }

Instance characteristic (I) = n
Each call requires 4 ∙ (1 + 1 + 1) = 12 bytes (the address of A[], the value of n, and the return address, assuming 4 bytes each).
How many calls (recursion depth): rsum(A, n) → rsum(A, n-1) → … → rsum(A, 0) ==> n+1 calls
S_rsum(I) = S_rsum(n) = 12 ∙ (n+1)

Time Complexity
Time taken by program P: T(P) = compile time + running time = T_C + T_P(I), where T_C is a constant.
How to evaluate T_P(I)?
  Add, subtract, multiply, … take different amounts of running time.
  Use "program steps" to estimate running time.
    • "program step" = a program segment whose execution time is independent of the instance characteristic I.
E.g. abc = a+b+b*c;   -- one program step
     a = 2;           -- one program step

Iterative Summing (Program 1.11)

    // Compute the sum of n numbers in A[] iteratively.
    float sum(float A[], int n)
    {
        float tempsum = 0;        // 1 step
        int i;
        for (i=0; i<n; i++)       // n+1 steps
            tempsum += A[i];      // n steps
        return tempsum;           // 1 step
    }

Instance characteristic (I) = n (= size of array A[])
T_sum(I) = T_sum(n) = 1 + (n+1) + n + 1 = 2n + 3

Recursive Summing (Program 1.12)

    // Compute the sum of n numbers in A[] recursively.
    float rsum(float A[], int n)
    {
        if (n)                                // 1 step
            return rsum(A, n-1) + A[n-1];     // 1 step
        return 0;                             // 1 step
    }

Instance characteristic (I) = n
Recurrence relation for T_rsum(n):
    T_rsum(0) = 2
    T_rsum(n) = 2 + T_rsum(n-1)
              = 2 + (2 + T_rsum(n-2))
              = …
              = 2n + T_rsum(0)
              = 2n + 2

Matrix Addition (Program 1.16)

    // Add two rows x cols matrices: c = a + b.
    void add(int a[][MAX_SIZE], int b[][MAX_SIZE], int c[][MAX_SIZE],
             int rows, int cols)
    {
        int i, j;
        for (i=0; i<rows; i++) {               // rows+1 steps
            for (j=0; j<cols; j++)             // rows(cols+1) steps
                c[i][j] = a[i][j]+b[i][j];     // rows(cols) steps
        }
    }

Instance characteristics (I) = rows and cols
T_P(rows, cols) = (rows+1) + rows(cols+1) + rows(cols) = 2∙rows∙cols + 2∙rows + 1

Observation on Step Counts
In the previous examples:
    T_sum(n) = 2n + 3 steps
    T_rsum(n) = 2n + 2 steps
Can we say that rsum is faster than sum?
No.
The execution time differs from step to step, so the step counts alone do not tell us which program is faster.
The step count is useful for answering "How does the running time change as the instance characteristic changes?"

Growth Rate of Time Complexity
E.g. for the sum program, T_sum(n) = 2n + 3 "means":
    when n increases 10-fold ==> T_sum(n) increases about 10-fold,
so the sum program runs in linear time.
We only want to know the growth rate (called the asymptotic time).
E.g. T_sum(n) = 2n + 3 vs. T_rsum(n) = 2n + 2:
clearly T_sum(n) and T_rsum(n) have the same growth rate.

Asymptotic Notation (Big-O)
Purposes:
  To compare the time complexity of two programs that compute the same function.
  To predict the growth rate in run time as the instance characteristic increases.
E.g. two programs with time complexity:
    P1: c1 n^2 + c2 n
    P2: c3 n
Let c1 = 1, c2 = 2, and c3 = 100. Then
  • P1 is faster: n^2 + 2n ≤ 100n for n ≤ 98
  • P2 is faster: n^2 + 2n > 100n for n > 98

Asymptotic Notation (Big-O)
E.g. the same two programs, now with c1 = 1, c2 = 2, and c3 = 1000. Then
  • P1 is faster: n^2 + 2n ≤ 1000n for n ≤ 998
  • P2 is faster: n^2 + 2n > 1000n for n > 998
No matter what (positive) values c1, c2, and c3 take, there will be an n beyond which c1 n^2 + c2 n > c3 n.

Definition of Big-O
Definition of O-notation:
    f(n) = O(g(n)) iff there exist c, n_0 > 0 such that f(n) ≤ c g(n) for all n ≥ n_0.
E.g. 3n + 2 = O(n) since 3n + 2 ≤ 4n for all n ≥ 2
E.g. 100n + 6 = O(n) since 100n + 6 ≤ 101n for all n ≥ 10
E.g. 10n^2 + 4n + 2 = O(n^2) since 10n^2 + 4n + 2 ≤ 11n^2 for all n ≥ 5

Observation of Big-O
Leading constants and lower-order terms do not matter.
We can always find a constant large enough to make the highest-order term swamp the other terms.
Thm 1.2: If f(n) = a_m n^m + … + a_1 n + a_0, then f(n) = O(n^m).
E.g. 10n^2 + 4n + 2 = O(n^2)

Property of Big-O
Pf. of Thm 1.2: for n ≥ 1,
    f(n) ≤ |a_m| n^m + … + |a_1| n + |a_0|
         ≤ n^m (|a_m| + … + |a_1| + |a_0|)
         = c n^m, where c = |a_m| + … + |a_1| + |a_0|.
Hence f(n) = O(n^m).
More Examples
    0.1 n^2 - 10n - 6 = O(n^2)      (the constant inside O() is always 1; we don't write O(2n^2))
    n + log n = O(n)
    n + n log n = O(n log n)
    n^2 + log n = O(n^2)
    2^n + n^10000 = O(2^n)
    n^4 + 1000 n^3 + n^2 = O(n^4)
    n^4 + 1000 n^3 + n^2 = O(n^5)
    n^4 + 1000 n^3 + n^2 ≠ O(n^3)

Naming Common Functions
    O(1)      -- constant time
    O(log n)  -- logarithmic time
    O(n)      -- linear time
    O(n^2)    -- quadratic time
    O(n^3)    -- cubic time
    O(n^100)  -- polynomial time
    O(2^n)    -- exponential time
When n is large enough, the functions later in this list take more time than the earlier ones.

More Notes on Big-O
f(n) = O(g(n)) "means" g(n) is an upper bound of f(n), up to a constant factor.
    n = O(n)
    n = O(n^2)
    n = O(n^3)
We want g(n) as small as possible!
Big-O is usually used to express the worst-case running time of a program.
f(n) = O(g(n)) is correct; but O(g(n)) = f(n) is wrong.

Compute Running Time in Big-O
How to compute the time complexity of a program in big-O? Two approaches:
  Compute the total step count, then take big-O.
  Take big-O of each step, then sum up the big-O of all steps.

Rule of Sum
Thm: If f1(n) = O(g1(n)) and f2(n) = O(g2(n)), then f1(n) + f2(n) = O(max(g1(n), g2(n))).
E.g. f1(n) = O(n), f2(n) = O(n^2). Then f1(n) + f2(n) = O(n^2).
E.g. f1(n) = O(n), f2(n) = O(n). Then f1(n) + f2(n) = O(n).
Used to compute the running time of two program segments P1 and P2 when P1 is followed by P2.

Rule of Product
Thm: If f1(n) = O(g1(n)) and f2(n) = O(g2(n)), then f1(n) ∙ f2(n) = O(g1(n) ∙ g2(n)).
E.g. f1(n) = O(n), f2(n) = O(n). Then f1(n) ∙ f2(n) = O(n^2).
Used in the time analysis of nested loops.

Rule of Product

    for (i=0; i<n; i++) {         // O(n)
        for (j=0; j<n; j++)       // O(n)
            sum = sum + 1;        // O(1)
    }

By the rule of product, the running time of this program is f(n) = O(n ∙ n ∙ 1) = O(n^2).
Complexity of Binary Search

    // n = number of elements in A[]
    int binsearch(int A[], int n, int x)
    {
        int left=0, right=n-1, mid;            // search x in A[]
        while (left <= right) {                // O(log2 n) iterations
            // let A[mid] be the middle element
            mid = (left+right)/2;              // O(1)
            if (x == A[mid]) return mid;       // O(1)
            if (x < A[mid]) right = mid-1;     // O(1)
            if (x > A[mid]) left = mid+1;      // O(1)
        }
        return -1;                             // O(1)
    }

Complexity of Binary Search
Analysis of the while loop:
    Iteration 1: n values to be searched
    Iteration 2: n/2 left for searching
    Iteration 3: n/4 left for searching
    …
    Iteration k+1: n/2^k left for searching
When n/2^k = 1, the search must finish. That is, n = 2^k ==> k = log2 n.
Hence the worst-case running time of binary search is O(log2 n).

Definition of Big-Ω
Definition of Ω-notation:
    f(n) = Ω(g(n)) iff there exist c, n_0 > 0 such that f(n) ≥ c g(n) for all n ≥ n_0.
E.g. 3n + 2 = Ω(n) since 3n + 2 ≥ 3n for all n ≥ 1
E.g. 100n + 6 = Ω(n) since 100n + 6 ≥ 100n for all n ≥ 1
E.g. 10n^2 + 4n + 2 = Ω(n^2) since 10n^2 + 4n + 2 ≥ n^2 for all n ≥ 1

Definition of Big-Θ
Definition of Θ-notation:
    f(n) = Θ(g(n)) iff f(n) = O(g(n)) and f(n) = Ω(g(n)).
E.g. 3n + 2 = Θ(n)
E.g. 100n + 6 = Θ(n)
E.g. 10n^2 + 4n + 2 = Θ(n^2)

[Figure: Plot of Common Function Values]

[Figure: Running Times On Computer]

Performance Measurement
Obtain the actual space and time requirements when running a program.
How to do time measurement in C?
  Method 1: Use clock(), measured in clock ticks
  Method 2: Use time(), measured in seconds
To time a short event, it is necessary to repeat it many times and then take the average.

Performance Measurement
Method 1: Use clock(), measured in clock ticks

    #include <time.h>

    int main(void)
    {
        clock_t start = clock();
        // main body of program comes here!
        clock_t stop = clock();
        // processor time used, converted from clock ticks to seconds
        double duration = ((double) (stop-start)) / CLOCKS_PER_SEC;
        return 0;
    }

Performance Measurement
Method 2: Use time(), measured in seconds

    #include <time.h>

    int main(void)
    {
        time_t start = time(NULL);
        // main body of program comes here!
        time_t stop = time(NULL);
        // elapsed calendar time in seconds (difftime already returns double)
        double duration = difftime(stop, start);
        return 0;
    }