Data & File Structures (GTU) 1-1 Introduction to Data Structures

1 Introduction to Data Structures

Syllabus : Data Management concepts, Data types - primitive and non-primitive, Performance Analysis and Measurement (Time and space analysis of algorithms - average, best and worst case analysis), Types of Data Structures - Linear and Non-Linear Data Structures.

1.1 Data Management Concepts

1.1.1 Definition :

Data : Data is a collection of numbers, alphabets and symbols combined to represent information. A computer takes raw data as input and, after processing it, produces refined data as output. We might say that computer science is the study of data.

Atomic data is a non-decomposable entity. For example, an integer value 523 or a character value 'A' cannot be further divided. If we divide the value 523 into the three digits '5', '2' and '3', the meaning may be lost.

Composite data is a composition of several atomic data and hence can be further divided into atomic data.

Example : Fig. 1.1 shows an example of atomic and composite data. Here a date of birth (say 20/08/2006) can be separated into three atomic values : the first gives the day of the month, the second gives the month, and the last gives the year.

Fig. 1.1 : Example of atomic and composite data

1.1.2 Data Object :

A data object is a set of elements; the set may be finite or infinite. For example, the set of employees working in a bank is a finite set, whereas the set of natural numbers is an infinite set.

1.1.3 Data Structure :

A data structure is an instance of an Abstract Data Type (ADT); it is the way in which the data is organized. Formally, a data structure is defined as a triplet (D, F, A), where "D" stands for a set of domains, "F" denotes the set of operations, and "A" represents the axioms defining the functions in "F". Arrays and Strings are data structures available in Java.
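As a small illustration (the class name and fields below are assumed for this sketch, not taken from the text), the date-of-birth composite of Fig. 1.1 can be modelled in Java as three atomic values grouped together:

```java
// Sketch: a composite data item built from three atomic values.
public class DateOfBirth {
    final int day, month, year;   // the three atomic components

    DateOfBirth(int day, int month, int year) {
        this.day = day;
        this.month = month;
        this.year = year;
    }

    public static void main(String[] args) {
        DateOfBirth dob = new DateOfBirth(20, 8, 2006);
        // The composite can be decomposed back into its atomic parts:
        System.out.println(dob.day + "/" + dob.month + "/" + dob.year);
    }
}
```

Each field here is atomic (dividing the year 2006 further would lose its meaning), while the object as a whole is composite.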
The study of data structures is about how the basic data structures are used to build new data structures (linked lists, stacks, queues and binary trees) and what operations are applied to these data structures. These new data structures are created with the help of built-in data structures. We will study them in the next few chapters.

Types of Data Structure :

1. Primitive and Non-Primitive Data Structures :

Primitive : Integers, reals, logical data, character data, pointers and references are primitive data structures. These data types are available in most programming languages as built-in types. Data objects of primitive data types can be operated upon by machine-level instructions.

Non-Primitive : These data structures are derived from primitive data structures. A set of homogeneous or heterogeneous data elements is stored together. Examples of non-primitive data structures : array, structure, union, linked list, stack, queue, tree, graph.

2. Linear and Non-Linear Data Structures :

Linear : Elements are arranged in a linear fashion (one dimension). All one-to-one relations can be handled through linear data structures. Lists, stacks and queues are examples of linear data structures. Fig. 1.2 shows the representation of linear data structures.

(a) Representation of linear data structures in an array
(b) Representation of linear data structures through a linked structure
Fig. 1.2 : Representation of linear data structures

Non-Linear : One-to-many, many-to-one and many-to-many relations are handled through non-linear data structures. Every data element can have a number of predecessors as well as successors. Trees, graphs and tables are examples of non-linear data structures (refer Fig. 1.3). The representation of a binary tree in linked and array form is given in Fig. 1.4.

Fig. 1.3 : Non-linear data structure

(a) Representation of the binary tree through a linked structure
(b) Representation of the binary tree through an array
Fig. 1.4 : Representation of the tree of Fig. 1.3(a)

3. Static and Dynamic Data Structures :

Static : In a static data structure, memory for objects is allocated at the time of loading of the program. The amount of memory required is determined by the compiler during compilation.

Example : int A[50]; // C++ statement

Remember the following points :
- A static data structure may cause under-utilization of memory.
- A static data structure may cause overflow.
- There is no re-usability of allocated memory.
- It is difficult to guess the exact size of the data at the time of writing the program.

Dynamic : In a dynamic data structure, the memory space required by variables is calculated and allocated during execution.

Remember the following points :
- A linear data structure can be implemented through either a static or a dynamic data structure; the static form is often preferred.
- All linked structures are preferably implemented through dynamic data structures.
- Dynamic data structures provide flexibility in adding, deleting or rearranging data objects at run time.
- Additional space can be allocated at run time, and unwanted space can be released at run time.
- This gives re-usability of memory space.

4. Persistent and Ephemeral Data Structures :

Persistent : A data structure is said to be persistent if it can be accessed but cannot be modified in place. Any operation on such a data structure creates two versions : the previous version is preserved, and all changes are made in the new version. Functional data structures are persistent.

Ephemeral : If we are able to create data cells and modify their contents, we can create ephemeral data structures. These are data structures that change over time.
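To make the static/dynamic distinction concrete, here is a minimal Java sketch (the variable names are illustrative) : a fixed-size array whose capacity is decided when it is created, versus an ArrayList that grows at run time:

```java
import java.util.ArrayList;
import java.util.List;

public class StaticVsDynamic {
    public static void main(String[] args) {
        // Static: capacity fixed at 50 no matter how many elements are used,
        // so unused slots waste memory, and a 51st element cannot be stored.
        int[] fixed = new int[50];
        fixed[0] = 10;

        // Dynamic: storage grows as elements are added at run time.
        List<Integer> growable = new ArrayList<>();
        growable.add(10);
        growable.add(20);

        System.out.println(fixed.length);     // capacity, not elements in use
        System.out.println(growable.size());  // exactly the elements stored
    }
}
```

The array illustrates the static drawbacks listed above (possible under-utilization or overflow), while the list illustrates run-time allocation and release of space.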
1.2 Algorithm

1.2.1 Definition :

The word Algorithm comes from the name of the Persian author Abu Ja'far Mohammed ibn Musa al-Khowarizmi (c. 825 A.D.), who wrote a textbook on mathematics (see Fig. 1.5). The word has taken on special significance in computer science, where Algorithm refers to a method that can be used by a computer for the solution of a problem. This is what distinguishes algorithm from words such as process, technique or method.

Fig. 1.5 : Abu Ja'far Mohammed ibn Musa al-Khowarizmi (c. 825 A.D.)

An Algorithm is a finite set of instructions that, if followed, accomplishes a particular task in a finite amount of time. It is good software engineering practice to design algorithms before we write a program. In addition, all algorithms must satisfy the following criteria (refer Fig. 1.6).

Fig. 1.6

Input : All algorithms should have some input. The logic of the algorithm works on this input to give the desired result.

Output : At least one output should be produced by the algorithm based on the input given.
Example : If we write a program to check whether a given number is prime, we should get the output 'number is prime' or 'number is not prime'.

Definiteness : Every step of the algorithm should be clear and free of ambiguity.

Finiteness : Every algorithm should have a proper end; it cannot lead to an infinite condition.
Example : If we write an algorithm to find the number of primes between 10 and 100, the output should give that count and execution should end. The computation should not get into an infinite loop.

Effectiveness : Every step of the algorithm should be easy to understand and should be implementable in any programming language.

1.2.2 How to Write an Algorithm :

The algorithm is described as a series of steps of basic operations. These steps must be performed in a sequence, and each step of the algorithm is labelled.
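As an illustration of such labelled steps (this example is ours, not from the text), an algorithm to find the largest element of a list can be written as numbered steps and then translated directly into code:

```java
public class LargestElement {
    // Labelled steps of the algorithm:
    // Step 1 : Let MAX be the first element.
    // Step 2 : Compare MAX with the next element; if that element is
    //          larger, replace MAX with it.
    // Step 3 : Repeat Step 2 until all elements have been examined.
    // Step 4 : Report MAX.
    static int largest(int[] a) {
        int max = a[0];                       // Step 1
        for (int i = 1; i < a.length; i++) {  // Step 3 (repetition)
            if (a[i] > max) max = a[i];       // Step 2
        }
        return max;                           // Step 4
    }

    public static void main(String[] args) {
        System.out.println(largest(new int[]{3, 9, 4})); // prints 9
    }
}
```

Each labelled step maps to one clear, unambiguous action, satisfying the definiteness property discussed above.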
An algorithm is the backbone of any implementation in a programming language. One can implement an algorithm effectively if a systematic method is followed in writing it; for such a systematic approach we use algorithmic notation. Let us see one example of an algorithm (refer Fig. 1.7) : finding the GCD of two positive integers.

Fig. 1.7

Step 1 : Start.
Step 2 : Read two positive integers and store them in X and Y.
Step 3 : Divide X by Y. Let the remainder be R and the quotient be Q.
Step 4 : If R is zero, go to Step 8.
Step 5 : Assign Y to X.
Step 6 : Assign R to Y.
Step 7 : Go to Step 3.
Step 8 : Print Y (the required GCD).
Step 9 : Stop.

The steps mentioned in the above algorithm are simple and unambiguous; anybody carrying them out will clearly know what to do in each step. Hence the above algorithm satisfies the definiteness property of an algorithm.

1.2.3 Algorithmic Strategies :

An algorithmic strategy is a general approach by which many problems can be solved algorithmically. Various algorithmic strategies are :

1. Divide and conquer : The problem is divided into smaller subproblems, and these subproblems are solved to obtain the solution to the main problem.
2. Dynamic programming : The problem is divided into smaller instances, and the results of smaller recurring instances are reused to solve the problem.
3. Greedy technique : From a set of feasible choices, the best possible choice is made at each step to obtain the final solution.
4. Backtracking : A trial-and-error method is followed in order to obtain the solution.

Using any of these suitable methods, the problem can be solved.

1.2.4 Purpose of Analysis of Algorithms :

When a programmer builds an algorithm during the design phase of the software development life cycle, he/she might not be able to implement it immediately.
This is because programming comes in a later part of the software development life cycle, so there is a need to analyze the algorithm at the design stage. Analysis helps in forecasting the execution time and the amount of primary memory the algorithm might occupy when it is implemented.

1.2.5 What is Analysis of Algorithm ?

Analysis of an algorithm means developing a formula or prediction of how fast the algorithm works based on the problem size. The problem size could be :

- The number of inputs/outputs of the algorithm.
Example : For a sorting algorithm, the number of inputs is the total number of elements to be arranged in a specific order; the number of outputs is the total number of sorted elements.

- The number of operations involved in the algorithm.
Example : For a searching algorithm, the number of operations is the total number of comparisons made with the search element.

If we are searching for an element in an array having 'n' elements, the problem size is the number of elements in the array to be searched. Here the problem size and the input size are the same and equal to 'n'. If we are sorting elements in an array, there might be some copy operations (swaps) involved; the problem size could be the number of elements in the array or the number of copies performed during sorting. If two arrays of size n and m are merged, the problem size is the sum of the two array sizes (= n + m). If the nth factorial is being computed, the problem size is n.

1.2.5.1 Time Complexity :

The Time Complexity of an algorithm is the amount of computer time it needs to execute the program and produce the intended result.

1.2.5.2 Space Complexity :

Analysis of algorithms is also defined as the process of determining a formula for prediction of the memory requirements (primary memory) based on the problem size. This is called the Space Complexity of the algorithm.
That is, Space Complexity is the amount of memory required for running an algorithm. In summary, analysis of an algorithm is aimed at determining the Time Complexity and the Space Complexity of the algorithm.

1.2.6 Order of Magnitude of an Algorithm :

The Order of Magnitude of an algorithm is the total number of statement executions it contains. Consider the following code snippets to understand the order of magnitude of an algorithm.

Example 1 :

for (i = 0; i < n; i++)
{
    /* assume there are 'c' statements inside the loop */
    ...
}

For the algorithm containing the above piece of code it is assumed that :
- there are 'c' statements inside the for loop, and
- each of the 'c' statements takes one unit of time.

Note : These assumptions are made because information about the target machine is not known when the algorithm is designed, so it is not possible to say how much time each statement takes to execute; the exact time depends on the machine on which the algorithm is run.

Based on the above assumptions :
Total time of execution for one iteration of the loop = c * 1 = c
Since the loop is executed 'n' times,
Total time of execution = n * c

Thus 'n * c' is defined as the Order of Magnitude of the algorithm. Since 'c' is a constant (while analyzing the algorithm, the exact number of steps is not known before programming; the number of steps is assumed constant and only the number of iterations changes), the Order of Magnitude is approximated to 'n'. This approximation is valid because for higher values of 'n' the effect of the constant 'c' is not significant, so the constant can be ignored.

Example 2 :

for (i = 0; i < n; i++)
{
    for (j = 0; j < m; j++)
    {
        /* assume there are 'c' statements inside the loop */
        ...
    }
}

In the above example, based on the earlier assumptions :
The total time for one execution of the inner loop body = c * 1 = c
Since the inner loop is executed 'm' times,
The total time of execution of the inner loop = m * c
As per the nested-loop concept, for every iteration of the outer loop the inner loop is executed 'm' times. Since the outer loop is executed 'n' times, we have :
The total time of execution of the algorithm = n * (m * c)
Since 'c' is constant, the order of magnitude of the above algorithm is approximately 'n * m'.

The calculation of order of magnitude in the examples discussed above is the a priori analysis of the algorithm.

1.2.7 Worst Case, Average Case and Best Case Running Time of an Algorithm :

While analyzing an algorithm based on a priori analysis principles, three different cases are identified. It is important to note that this classification is purely based on the nature of the problem for which we are developing the algorithm.

Worst case : Refers to the maximum number of instructions / operations that could be executed by the algorithm in order to give the desired output.

Average case : Refers to the number of instructions / operations that could be executed by the algorithm on an average in order to give the desired output.

Best case : Refers to the minimum number of instructions / operations that could be executed by the algorithm in order to give the desired output.

Example : Assume there are 100 students in a class. The worst case of the problem is none of the students clearing the exams. The best case is all 100 students clearing the exams. The average case is around 50 students clearing the exams.

1.2.8 Mathematical Notation for Determination of the Running Time of an Algorithm :

For practical problems it is difficult to count the number of operations / instructions for the worst / average / best case running time.
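As a concrete illustration (a sketch we add here, not part of the original text), consider linear search : the number of comparisons it performs ranges from 1 in the best case (key in the first position) up to n in the worst case (key in the last position, or absent):

```java
public class LinearSearch {
    // Returns the number of comparisons made while searching for key.
    static int comparisons(int[] a, int key) {
        int count = 0;
        for (int x : a) {
            count++;            // one comparison per element examined
            if (x == key) break;
        }
        return count;
    }

    public static void main(String[] args) {
        int[] a = {7, 3, 9, 1, 5};
        System.out.println(comparisons(a, 7)); // best case: 1 comparison
        System.out.println(comparisons(a, 5)); // worst case: n = 5 comparisons
    }
}
```

The exact count depends on where the key happens to lie, which is precisely why summarizing behaviour requires the notations introduced next.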
Hence there is a need for mathematical notations to determine worst case, average case and best case running times.

Asymptotic Notations (O, Ω, Θ) :

In order to analyse which data structure or algorithm is best suited for the job, it is necessary to understand how a data structure or algorithm behaves over time and space. An important tool in this analysis is asymptotic notation.

Big-Oh notation - 'O' notation : This can be used to represent the worst case, average case or best case behaviour of an algorithm. Big 'O' refers to the set of all functions whose growth rate is at least as high as the algorithm's growth rate.

Omega notation - 'Ω' notation : It represents the lower bound on the time of execution of an algorithm, and so characterizes the most optimistic behaviour for the given problem. Omega refers to the set of all functions whose growth rate is no higher than the algorithm's growth rate.

Theta notation - 'Θ' notation : Θ-notation is used to denote both upper and lower bounds.

1.2.8.1 Big-Oh Notation - 'O' Notation :

O( ) is read as Big Oh. Its meaning is 'grows as' or 'behaves as'. It is defined as the rate at which the time of execution grows in terms of the problem size. Refer Fig. 1.8.

Fig. 1.8

More formally : for non-negative functions T(n) and f(n), T(n) = O(f(n)) if there are positive constants c and n0 such that T(n) ≤ c * f(n) for all n ≥ n0. This is known as Big-Oh notation. If graphed, f(n) serves as an upper bound to the curve you are analysing, T(n). It describes the worst that can happen for a given data size.

Here T(n) is the time of execution of the algorithm for problem size n, and f(n) is the mathematical representation of the algorithm as a function of the problem size n. This should be read as : 'T of n' is 'Big Oh' of 'f of n'. While relating T(n) with f(n), the relation '≤' is used; this relation signifies an upper bound.
The time of computation T(n) can be equal to or less than c * f(n); it cannot go beyond that value.

The threshold problem size n0 is the minimum problem size beyond which we can predict the performance behaviour of the algorithm. A threshold is used while arriving at the worst-case complexity equation T(n) because algorithms may have initialization steps, etc., which are ignored in a priori analysis. The time taken to execute assignments and data moves is also ignored, since during a priori analysis the machine on which the program will run is not known.

1.2.8.2 Omega Notation - 'Ω' Notation :

The best case running time of an algorithm is determined when the algorithm has the best possible condition for execution. It is represented by a mathematical equation as follows :

For non-negative functions T(n) and f(n), T(n) = Ω(f(n)) if there are positive constants c and n0 such that T(n) ≥ c * f(n) for all n ≥ n0,

where T(n) is the time of execution of the algorithm for problem size n. This should be read as : 'T of n' is 'Omega' of 'f of n'.

Here the function f(n) is the lower bound for T(n). This means that for any n ≥ n0, the time of computation T(n) always lies above the graph of c * f(n), so f(n) serves as the lower bound for T(n). It describes the best that can happen for a given data size.

1.2.8.3 Theta Notation - 'Θ' Notation :

The average case running time of an algorithm is determined when the algorithm has an average condition for execution. To understand the average condition, consider the search example : if the number to be searched is in the middle position of the search array, we call this the average condition.
The average case is again represented by a mathematical equation as follows :

For non-negative functions T(n) and g(n), T(n) = Θ(g(n)) if there exist positive constants c1, c2 and n0 such that c1 * g(n) ≤ T(n) ≤ c2 * g(n) for all n ≥ n0,

where T(n) is the time of execution of the algorithm for problem size n, and g(n) is the mathematical representation of the algorithm as a function of problem size n. This should be read as : 'T of n' is 'Theta' of 'g of n'.

The Theta notation is more precise than both the Big-Oh and Omega notations : T(n) = Θ(g(n)) if and only if g(n) is both an upper and a lower bound on T(n).

Some Common Time Complexity Examples :
1. O(1) - constant computing time
2. O(n) - linear
3. O(n^2) - quadratic
4. O(n^3) - cubic
5. O(2^n) - exponential

If an algorithm takes time O(log n), it is faster, for sufficiently large n, than if it had taken O(n). Similarly, O(n log n) is better than O(n^2) but not as good as O(n). The seven computing times O(1), O(log n), O(n), O(n log n), O(n^2), O(n^3) and O(2^n) are the ones we see most often in this book. To get a feel for how the various functions grow with n, you are advised to study Table 1.1 and Fig. 1.9 very closely.

Table 1.1 : Function values

log n | n  | n log n | n^2  | n^3   | 2^n
------|----|---------|------|-------|------------
0     | 1  | 0       | 1    | 1     | 2
1     | 2  | 2       | 4    | 8     | 4
2     | 4  | 8       | 16   | 64    | 16
3     | 8  | 24      | 64   | 512   | 256
4     | 16 | 64      | 256  | 4096  | 65536
5     | 32 | 160     | 1024 | 32768 | 4294967296

Fig. 1.9 : Plot of function values

1.2.9 Assumptions while Finding the Time Complexity :

The leading constant of the highest power of 'n' and all lower powers of 'n' are ignored in f(n).

Example : Consider the following equation for the time complexity of an algorithm :

T(n) = O(100n^3 + 29n^2 + 19n)          ...(1)

Here the leading constant of the highest power of 'n' is 100; it can be ignored.
Based on this assumption, Equation (1) can be rewritten as :

T(n) = O(n^3 + 29n^2 + 19n)          ...(2)

All the lower powers of n (i.e. 29n^2 + 19n) can also be ignored completely, so the equation is finally rewritten as :

T(n) = O(n^3)

1.2.10 Relationship between Data Structure and Algorithm :

There is a close connection between the structuring of data and the synthesis of algorithms; a data structure and its algorithms should be thought of as a unit. To understand this, consider the example of a list of n pairs of names and phone numbers :

(N1, P1), (N2, P2), (N3, P3), ...

Suppose we have to write a program to print the phone number of a person whose name is entered as input.

(i) If the list is sorted on name, a binary search algorithm will be able to find the name quickly.
(ii) If the list is not sorted, binary search will not work and the only choice left is linear search.

In computer science we study different kinds of data objects. For each object there are a number of operations, and objects should be represented in a way that makes efficient implementation of those operations possible.

1.3 Solved Questions :

Q. 1 Flowcharting and pseudocode are two different design tools for an algorithm. How do they differ and how are they similar ?

Ans. : Both flowcharting and pseudocode are used to design individual parts of a program. But flowcharting gives a pictorial representation of the logical flow of an algorithm, in contrast to the other design tool, pseudocode, which provides a textual (part English, part structured code) design solution.

Q. 2 What are the factors which contribute to the running time of a program ?

Ans. : The running time of a program depends on several factors :
(i) The input to the program.
(ii) The size of the program.
(iii) The machine-language instruction set.
(iv) The machine we are executing on.
(v) The quality of code generated by the compiler used to create the object code.
(vi) The nature and speed of the instructions on the machine used to execute the program.
(vii) The time complexity of the algorithm underlying the program.

Q. 3 The input to the program contributes to the running time of a program - explain.

Ans. : This indicates that the running time of a program should be defined as a function of the input. In most cases, the running time depends not on the exact input (i.e. what kind of input it is) but only on the size of the input. For example, sorting five numbers (using some algorithm) takes less time than sorting ten numbers using the same algorithm.

Q. 4 What are the desirable characteristics of a program ?

Ans. : The desirable characteristics of a program are :
(i) Integrity : refers to the accuracy of the program.
(ii) Clarity : refers to the overall readability of the program, with emphasis on its underlying logic.
(iii) Simplicity : the clarity and accuracy of a program are usually enhanced by keeping things as simple as possible, consistent with the overall program objectives.
(iv) Efficiency : concerned with execution speed and efficient memory utilization.
(v) Modularity : many programs can be decomposed into a series of identifiable subtasks.
(vi) Generality : a program should be as general as possible (e.g. rather than keeping fixed values for variables, it is better to read them in).

Q. 5 In what way is the asymmetry between Big-Oh notation and Big-Omega notation helpful ?

Ans. : There are many situations where an algorithm runs faster on some inputs than on others. For example, assume we have an algorithm to determine whether its input is of prime length; this algorithm runs very fast whenever the input length is even.
Hence, we cannot find a good lower bound on the running time that is true for all n.

Q. 6 What are the basic components which contribute to the space complexity ?

Ans. : Space complexity is the amount of memory the program needs for its execution. There are basically two components which need to be considered while determining the space complexity :

(i) A fixed part, which is independent of the characteristics (viz. number, size) of the inputs and outputs. This part typically includes the instruction space (i.e. space for the code), space for simple variables and fixed-size component variables (also called aggregates), space for constants, and so on.
(ii) A variable part, which consists of the space needed by component variables whose size depends on the particular problem instance being solved, the space needed by reference variables (this depends on instance characteristics), and the recursion stack space.

The space requirement S(P) of a program P is therefore :

S(P) = c + S_P(instance characteristics),

where c is a constant. Hence we usually concentrate on determining S_P, the instance-dependent part, as a measure of space complexity.

Q. 7 How do we analyse and measure the time complexity of an algorithm ?

Ans. : The complexity of an algorithm depends on the number of statements executed. Though the execution time of each statement differs, we can roughly count how many times each statement is executed. Whenever we execute conditional statements, some statements are skipped and some may be repeated; hence the total number of statement executions depends on the conditional statements. At this stage we can roughly estimate the complexity as the number of times the condition statement is executed.

Example :

for (i = 0; i < n; i++)
    System.out.println(i);

The output is the numbers from 0 to n-1. We can easily say that the println has been executed n times. The condition checked in this case is i < n. The condition is checked n+1 times,
The condition is checked n+1 times, Data & File Structures (GTU) 1-18 Introduction to Data Structures n times when it was true and once when it becomes false. Hence first statement i.e. condition is executed n + 1 times and cout n times. The total number of statement executed will be 2n + 1. Actually i = 0 has been executed once and i++, n time. The Total number of statement which are executed are : 1 + (n+1) +(n) +(n) = 3n +2. If we ignore the constant, we can say that the complexity is of the order of n. The notation used is Big O i.e. O(n). Q. 8 Reorder the following complexity from smallest to largest : (a) n log2 (n) (b) n + n2 + n3 (c) 24 (d) n Ans. : Complexities in ascending order : Q. 9 (a) 24 (b) (c) n log2 (n) (d) n n + n2 + n3 Show the step table (indicating number of steps per execution, frequency of each of the statement) for the following segment : Line no. 1 int sum ( int a[ ], int n ) { 2 int s, i ; 3 s=0; 4 for (i = 1; i < = n ; i ++) 5 s=s+a[i]; 6 return (s) ; 7 } Ans. : s/e steps per execution. Line 1 2 3 4 5 6 7 s/e 0 0 1 1 1 1 0 Frequency Total steps 0 0 0 0 1 1 n+1 n+1 n n 1 1 0 0 Total number of steps = 2 n + 3 Data & File Structures (GTU) 1-19 Introduction to Data Structures Q. 10 Compute the frequency count for : for i:=1 to n for j : = i + 1 to n for k : = j + 1 to n for l : = k + 1 to n x = x + 1; Ans. : Here i loop gets executed for n times, j loop for (n 1) times, k loop for (n 2) times and l loop for (n 3) times. Frequency count of x = x + 1 = n · (n 1) (n 2) · (n 3) = n4 6 n3 + 11 n2 6n 432 = n4 6 n3 + 11 n2 6n 24 4 In general, the total complexity of the code is O (n ). Q. 11 Find the time complexity for : for i : = 1 to n do for j : = i + 1 to n do for k : = j + 1 to n do z = z + 1; Ans. : In this example, the ‘i’ loop executes for ‘n’ times, ‘j’ loop gets executed for n 1 times (since j loop starts with i + 1) and k loop gets executed for n 2 times. 
The total count of loops = n(n-1)(n-2) = n^3 - 3n^2 + 2n. The frequency count of z := z + 1 is obtained by dividing this count by the number of orderings of the three loop indices (3 · 2 = 6) :

(n^3 - 3n^2 + 2n) / 6

In general, the total complexity of the code is O(n^3).

Q. 12 Find the frequency count for the following piece of code :

x = 5; y = 5;
for (i = 2; i <= x; i++)
    for (j = y; j >= 0; j--)
    {
        if (i == j)
            System.out.println("xxx");
        else
            break;
    }

Ans. : Let us number the statements :

(i)    x = 5;
(ii)   y = 5;
(iii)  for (i = 2; i <= x; i++)
(iv)   for (j = y; j >= 0; j--)
(v)    {
(vi)   if (i == j)
(vii)  System.out.println("xxx");
(viii) else break; }

Note that the break statement causes the termination of the inner (j) loop.

i value | j value | Remarks
2       | 5       | Since i ≠ j, control goes to the break statement
3       | 5       | Since i ≠ j, control goes to the break statement
4       | 5       | Since i ≠ j, control goes to the break statement
5       | 5       | Since i = j, the println statement is executed

Therefore the count can be tabulated as :

Line number | 1 | 2 | 3 | 4 | 5 | 6 | 7
Frequency   | 1 | 4 | 4 | 4 | 1 | 3 | 3

Q. 13 Find the frequency count for the following :

int a = 10; b = 10;
for (i = 7; i < 9; i++)
    for (j = 10; j <= b; j++)
    {
        if (i < j)
            System.out.println("DSF");
        else
            break;
    }

Ans. : Note that i ranges from 7 to 8 (since i = 7 and i < 9), and the corresponding j value can only be 10 (since j <= b and b = 10). So the count can be tabulated as :

Line number | Frequency
1           | 1
2           | 2
3           | 2
4           | 2
5           | 2
6           | 0
7           | 0

Q. 14 Find the frequency count of :

for (i = 1; i <= n; i++)
    for (j = 1; j <= n; j++)
        a = a + 2;

Ans. : Here the i loop is executed n times and the j loop n times, so the total loop count is :

Σ (i=1 to n) Σ (j=1 to n) 1 = Σ (i=1 to n) n = n^2

Q.
15 Find the frequency count of :

(I)
for (i = 1; i <= n; i++)
    for (j = 1; j <= i; j++)
        x = x + 1;

(II)
i = 1;
while (i <= n)
{
    x = x + 1;
    i = i + 1;
}

Ans. :

(I) Let us number the statements :
1. for (i = 1; i <= n; i++)
2. for (j = 1; j <= i; j++)
3. x = x + 1;

The j loop runs from 1 to i as i goes from 1 to n, so the total loop count is :

1 + 2 + ... + n = n(n + 1)/2 = (n^2 + n)/2

Hence the frequency count of x = x + 1 is (n^2 + n)/2.

(II) Let us number the statements :
1. i = 1;
2. while (i <= n)
3. {
4. x = x + 1;
5. i = i + 1;
6. }

The frequency count is :

Statement | 1 | 2   | 3 | 4 | 5 | 6
Count     | 1 | n+1 | 0 | n | n | 0

Q. 16 If the algorithm doIt has the complexity 5n, calculate the run-time complexity of the following program segment :

i = 1
loop (i <= n)
    doIt (...)
    i = i + 1

Ans. : doIt has complexity 5n and is called n times, hence n * (5n) = 5n^2 = O(n^2).

Q. 17 Suppose the complexity of an algorithm is O(n^3). If a step in this algorithm takes 1 nanosecond (10^-9 sec), how long does it take the algorithm to process an input of size 1000 ?

Ans. : With input size n = 1000, time = 1000^3 * 10^-9 sec = 1 sec. The algorithm will take 1 second to process an input of size 1000.

Q. 18 An algorithm runs a given input of size n. If n is 4096, the run time is 512 milliseconds. If n is 16384, the run time is 1024 milliseconds. What is the complexity ? What is the big-'O' notation ?

Ans. : n1 = 4096, f(n1) = 512 ; n2 = 16384, f(n2) = 1024. Here n2 = 4 n1 while f(n2) = 2 f(n1). Since n increases by a factor of four while f(n) increases by only a factor of two, the complexity is n^(1/2). The big-'O' notation is O(n^(1/2)).

Q. 19 An algorithm runs a given input of size n. If n is 4096, the run time is 512 milliseconds. If n is 16384, the run time is 2048 milliseconds. What is the complexity ? What is the big-oh notation ?

Ans.
Ans. : n₁ = 4096,  f(n₁) = 512
       n₂ = 16384, f(n₂) = 2048

so n₂ = 4 n₁ and f(n₂) = 4 f(n₁). Since f(n) increases in the same proportion as n, the relationship is linear. The complexity is therefore linear and the big-O notation is O(n).

Q. 20 Calculate the run-time complexity of the following program segment :

i = 1
loop (i <= n)
    print (i)
    i = i + 1

Ans. : The loop iterates while i <= n, so f(n) = n and the complexity is O(f(n)) = O(n).

Q. 21 An algorithm takes 0.5 ms for input size 100. How long will it take for input size 500 if the running time is the following ? (Assume low-order terms are negligible.)

(i) Linear   (ii) O(N log N)   (iii) Quadratic   (iv) Cubic

Ans. :

(i) Linear : For n = 100, 100 steps are required; for n = 500, 500 steps are required. The time required for 100 steps is 0.5 ms, so the time required for 500 steps is 2.5 ms.

(ii) O(N log N) : For N = 100, N log N = 100 log 100 = 200 steps are required (log to base 10); for N = 500, 500 log 500 ≈ 1349 steps are required. The time required for 200 steps is 0.5 ms, so the time required for 1349 steps is about 3.37 ms.

(iii) Quadratic : For n = 100, 100² = 10,000 steps are required; for n = 500, 500² = 250,000 steps are required. The time required for 10,000 steps is 0.5 ms, so the time required for 250,000 steps is 12.5 ms.

(iv) Cubic : For n = 100, 100³ steps are required; for n = 500, 500³ steps are required. The time required for 100³ steps is 0.5 ms, so the time required for 500³ steps is 62.5 ms.

Review Questions

Q. 1 What is an algorithm ?

Q. 2 Explain the various characteristics of an algorithm.

Q. 3 Explain the need for algorithm analysis.

Q. 4 Compare the two functions n² and 2ⁿ/4 for various values of n. Determine when the second becomes larger than the first.

Q. 5 Determine the frequency counts for all statements in the following two algorithm segments :

(a) 1. for i := 1 to n do
    2.     for j := 1 to i do
    3.         for k := 1 to j do
    4.             x := x + 1;

(b) 1. i := 1;
    2. while (i <= n) do
    3. {
    4.     x := x + 1;
    5.     i := i + 1;
    6. }

Q. 6 Find two functions f(N) and g(N) such that neither f(N) = O(g(N)) nor g(N) = O(f(N)).

Q. 7 For each of the following six program fragments :

(a) Give an analysis of the running time (Big-Oh will do).
(b) Implement the code in the language of your choice, and give the running time for several values of N.
(c) Compare your analysis with the actual running times.

(1) Sum = 0;
    for (i = 0; i < N; i++)
        Sum++;

(2) Sum = 0;
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            Sum++;

(3) Sum = 0;
    for (i = 0; i < N; i++)
        for (j = 0; j < N*N; j++)
            Sum++;

(4) Sum = 0;
    for (i = 0; i < N; i++)
        for (j = 0; j < i; j++)
            Sum++;

(5) Sum = 0;
    for (i = 0; i < N; i++)
        for (j = 0; j < i * i; j++)
            for (k = 0; k < j; k++)
                Sum++;

(6) Sum = 0;
    for (i = 0; i < N; i++)
        for (j = 1; j < i * i; j++)
            if (j % i == 0)
                for (k = 0; k < j; k++)
                    Sum++;

Q. 8 Consider the following algorithm (known as Horner's rule) to evaluate F(X) = Σ (i = 0 to N) Ai Xⁱ :

Poly = 0;
for (i = N; i >= 0; i--)
    Poly = X * Poly + A[i];

(a) Show how the steps are performed by this algorithm for X = 3, F(X) = 4X⁴ + 8X³ + X + 2.
(b) Explain why this algorithm works.
(c) What is the running time of this algorithm ?

Q. 9 Give an efficient algorithm to determine if there exists an integer i such that Ai = i in an array of integers A1 < A2 < A3 < … < AN. What is the running time of your algorithm ?

Q. 10 Give efficient algorithms (along with running-time analyses) to :

(a) Find the minimum subsequence sum.
(b) Find the minimum positive subsequence sum.
(c) Find the maximum subsequence product.

Q. 11 What is big-O notation ? Arrange the following functions by growth rate : N, √N, N², N log₂ N, N log₂² N, 2/N, 2ᴺ, N³.

Q. 12 Show that the following statements are true :

1. 20 is O(1)
2. n (n − 1) / 2 is O(n²)
3. max(n³, 10 n²) is O(n³)
4. Σ (i = 1 to n) iᵏ is O(n^(k+1))
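Horner's rule from review Q. 8 can be sketched in Java as follows. The coefficient array holds A0 … AN as in the question; the class and method names are ours, and the test polynomial is the one from part (a), F(X) = 4X⁴ + 8X³ + X + 2 :

```java
public class HornerDemo {
    // Evaluates F(x) = a[N]*x^N + ... + a[1]*x + a[0] using one
    // multiplication and one addition per coefficient, i.e. O(N) time.
    static int horner(int[] a, int x) {
        int poly = 0;
        for (int i = a.length - 1; i >= 0; i--) {
            poly = x * poly + a[i];
        }
        return poly;
    }

    public static void main(String[] args) {
        // F(X) = 4X^4 + 8X^3 + 0X^2 + 1X + 2, stored as A[0..4]
        int[] a = {2, 1, 0, 8, 4};
        System.out.println(horner(a, 3));   // prints 545
    }
}
```

For part (a) with X = 3, the successive values of Poly are 4, 20, 60, 181 and finally 545.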
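As a hint for review Q. 9 : because the integers are strictly increasing, Ai − i never decreases, so a binary search suffices. The sketch below is one possible O(log N) solution (array contents, class and method names are our own illustration) :

```java
public class FixedPoint {
    // Returns a 1-based index i with A[i] = i, or -1 if none exists.
    // Since A1 < A2 < ... < AN are integers, A[i] - i is non-decreasing,
    // which justifies the binary search: O(log N) time.
    static int find(int[] a) {
        int lo = 1, hi = a.length;
        while (lo <= hi) {
            int mid = (lo + hi) / 2;
            int v = a[mid - 1];            // array is 0-based internally
            if (v == mid) return mid;      // found A[mid] = mid
            else if (v < mid) lo = mid + 1;
            else hi = mid - 1;
        }
        return -1;                         // no such index
    }

    public static void main(String[] args) {
        int[] a = {-3, 0, 3, 5, 9};        // here A3 = 3
        System.out.println(find(a));       // prints 3
    }
}
```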