Data Structures and Algorithms Lecture 2 – 18th August 2004

There are three major machine models, distinguished by how they represent data in memory:
- Pointer machines – these machines represent data using pointers only; no arrays are defined in these machines.
- RAM model – these machines represent data as arrays. Throughout these lectures, we assume that we are working with this model.
- PRAM model – this model is used in parallel computing, and is similar to the RAM model.

Asymptotic Complexity

Big-Oh
Definition: O(f(n)) is the class of functions g: N → R such that g(n) ≤ c*f(n) for all n ≥ n₀, for some constants c > 0 and n₀.
Big-Oh defines an upper bound on the growth of a function.

Example 1: 5*n + 10 = O(n)
Proof: 5*n + 10 ≤ c*n for all n ≥ n₀, where c = 6 and n₀ = 10 (with c = 6 the inequality reduces to 10 ≤ n).
Hence, 5*n + 10 = O(n).

Example 2: 2*n² + 10*n + 7 = O(n²)
Proof: 2*n² + 10*n + 7 ≤ c*n² for all n ≥ n₀, where c = 3 and n₀ = 17 (with c = 3 the inequality reduces to n² ≥ 10*n + 7, which holds for every n ≥ 11, and so for every n ≥ 17).
Hence, 2*n² + 10*n + 7 = O(n²).

Big-Omega
Definition: Ω(f(n)) is the set of functions g: N → R such that g(n) ≥ c*f(n) for all n ≥ n₀, for some constants c > 0 and n₀.
Big-Omega defines a lower bound on the growth of a function.

Example 1: 5*n - 1000√n = Ω(n)
Proof: Let c = 1 (c can be any positive value less than 5).
5*n - 1000√n ≥ c*n
5*n - 1000√n ≥ n
4*n ≥ 1000√n
n ≥ 250√n
√n ≥ 250
n ≥ 62500
Hence, 5*n - 1000√n ≥ c*n for all n ≥ n₀, with c = 1 and n₀ = 62500.

Theta
Definition: If f(n) = O(g(n)) and f(n) = Ω(g(n)), then f(n) = Θ(g(n)).

Sorting Algorithms

Insertion Sort
Example: the input data is [9, 7, 2, 3]
Output data: []             // initially the output array is empty
Output data: [9]            // we insert the first input
Output data: [7, 9]         // the 2nd input (7) is smaller than the 1st input (9), so the 1st
                            // input is moved one place right and the 2nd input is inserted in its place
Output data: [2, 7, 9]
Output data: [2, 3, 7, 9]   // the completely sorted array

Pseudo Code:
procedure INSERTION_SORT(A, n)              // A is the input array with n elements
                                            // B is the output array with n elements
1. B = []                                   // we initialize the array to be empty
2. for i = 1 to n                           // put A[i] in B[]
3.     j = 1
4.     while (j ≤ i-1) and (B[j] < A[i]) do
5.         j = j + 1
6.     for k = i-1 down to j do
7.         B[k+1] = B[k]
8.     B[j] = A[i]
9. return B

Insertion Sort analysis
From the pseudo code, lines 1 and 9 are executed once, while lines 2, 3 and 8 are executed n times. Each of the remaining lines (4, 5, 6 and 7) runs at most i times in iteration i of the outer loop, so at most 1 + 2 + 3 + … + n = n*(n+1)/2 times in total.
The total time for this pseudo code (lines 1 to 9, in order) is therefore at most:
1 + n + n + n*(n+1)/2 + n*(n+1)/2 + n*(n+1)/2 + n*(n+1)/2 + n + 1
= 2 + 3*n + 2*n*(n+1)
= 2 + 3*n + 2*n² + 2*n
= 2*n² + 5*n + 2
Hence, total time ≤ 2*n² + 5*n + 2 = O(n²).

One might expect data in descending order to be the worst case and data already sorted in ascending order to be the best case. In this version, however, the scan in lines 4 and 5 stops at position j and the shift in lines 6 and 7 moves the elements from position j to i-1, so together they touch all i-1 elements already in B on every iteration. Ascending input spends this time scanning (nothing is shifted), and descending input spends it shifting (the scan stops at once), so both cost about the same. Hence, even in the best case this algorithm takes Ω(n²), and we can conclude that this Insertion Sort runs in Θ(n²) time.

Merge Sort
Example:
Input array: [1, 5, 3, 6, 4, 2, 7, 8]
Split the array into two equal parts: [1, 5, 3, 6] and [4, 2, 7, 8]
Sort the parts: [1, 3, 5, 6] and [2, 4, 7, 8]
Merge them together to form the output array: [1, 2, 3, 4, 5, 6, 7, 8]

Pseudo code:
MERGE_SORT(A, n)                    // input array A with n elements
1. if n = 1 return A                // if only one element, then the array is sorted
2. L ← A[1, 2, …, n/2]              // divide the array into left and right halves,
3. R ← A[1+n/2, 2+n/2, …, n]        // each with the same number of elements
4. L’ ← MERGE_SORT(L, n/2)
5. R’ ← MERGE_SORT(R, n/2)
6. A’ ← MERGE(L’, R’)               // MERGE is a routine defined elsewhere
7. return A’                        // A’ contains the sorted array
(Here n is assumed to be a power of 2, so that the two halves are always equal.)

Merge Sort analysis using the recursion tree
Recursion tree (time taken by MERGE at each node):
Level 0:              n
Level 1:         n/2      n/2
Level 2:      n/4   n/4   n/4   n/4
   ...
Number of levels of recursion ≈ log₂(n), i.e., O(log(n))

Adding up the MERGE times level by level: the root takes n, the two n/2 nodes take 2*(n/2) = n, the four n/4 nodes take 4*(n/4) = n, and so on, so every level costs O(n) in total.
The total time is the number of recursion levels multiplied by the merge time per level = O(log(n)) * O(n) = O(n*log(n)).
Hence, the total time taken by MERGE_SORT = O(n*log(n)).

Merge Sort analysis by using Mathematical Induction
Let T(n) be the running time of Merge Sort on an input of size n. Then
T(1) = O(1) and T(n) ≤ c’*n + 2*T(n/2)      // c’ is some constant; c’*n is the time taken by MERGE
We need to prove that T(n) ≤ c*n*log₂(n) for some constant c.
Let us assume that T(n/2) ≤ c*(n/2)*log₂(n/2) is true. Then
T(n) ≤ c’*n + 2*T(n/2)
     ≤ c’*n + 2*c*(n/2)*log₂(n/2)
     = c’*n + c*n*(log₂(n) - 1)             // log(a/b) = log a - log b; log₂2 = 1
     = c’*n + c*n*log₂(n) - c*n
     ≤ c*n*log₂(n)   if c ≥ c’
Base case: since c*n*log₂(n) = 0 at n = 1, the induction starts at n = 2, where T(2) ≤ 2*c holds once c ≥ T(2)/2. Choosing c ≥ max(c’, T(2)/2) makes both the base case and the inductive step work.
Hence, by induction, T(n) ≤ c*n*log₂(n) for every power of 2 with n ≥ 2, and therefore T(n) = O(n*log(n)).
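The recursion-tree argument can be checked with a small Python sketch of MERGE_SORT. This is a sketch under stated assumptions, not the lecture's own code: the lecture leaves MERGE "defined elsewhere", so the standard two-pointer merge used here is an assumption; the names merge and merge_sort are hypothetical; the split uses len(a) // 2 so that lengths other than powers of 2 also work; and the base case is "at most one element", which also covers an empty input. Besides the sorted list, merge_sort returns the total number of elements written by merge across all calls. For n a power of 2 every recursion level writes all n elements and there are log₂(n) levels, so this count should come out to exactly n*log₂(n).

```python
def merge(left, right):
    """Merge two sorted lists into one sorted list (two-pointer merge).

    Assumed implementation of the MERGE routine the lecture leaves undefined.
    """
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        # Take the smaller front element; "<=" keeps equal keys in input order.
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    # One list is exhausted; the remainder of the other is already sorted.
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def merge_sort(a):
    """Return (sorted copy of a, total number of elements written by merge)."""
    if len(a) <= 1:  # zero or one element: already sorted, no merge work
        return list(a), 0
    mid = len(a) // 2  # equal halves whenever len(a) is a power of 2
    left, work_left = merge_sort(a[:mid])
    right, work_right = merge_sort(a[mid:])
    merged = merge(left, right)
    # This call's merge writes len(merged) elements: the "n time" at this tree node.
    return merged, work_left + work_right + len(merged)


if __name__ == "__main__":
    result, work = merge_sort([1, 5, 3, 6, 4, 2, 7, 8])
    print(result)  # [1, 2, 3, 4, 5, 6, 7, 8]
    print(work)    # 24, i.e. 8 * log2(8)
```

For the input from the Merge Sort example, the count is 24 = 8*log₂(8): three levels of merging, each writing all 8 elements, which matches the level-by-level sum in the recursion tree.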