MODULE 1
Introduction to Data Structures

Module Description

A data structure is a way of collecting and organising data so that we can perform operations on the data in an effective way. Data structures are about rendering data elements in terms of some relationship, for better organisation and storage. Suppose we have a player's name, "Shane", and age, 26. Here "Shane" is of string data type and 26 is of integer data type. We can organise this data as a Player record. We can then collect and store players' records in a file or database as a data structure. For example: "Mitchelle" 30, "Steve" 31, "David" 33.

In simple language, data structures are structures programmed to store ordered data, so that various operations can be performed on it easily.

Chapter 1.1 Basics of Data Structure
Chapter 1.2 Pointers and Recursion

Chapter Table of Contents

Chapter 1.1 Basics of Data Structure
Aim
Instructional Objectives
Learning Outcomes
1.1.1 Introduction to Data Structures
    Self-assessment Questions
1.1.2 Classification of data structures
    (i) Primitive data structure
    (ii) Non-primitive data structure
    Self-assessment Questions
1.1.3 Elementary Data Organization
1.1.4 Time and Space Complexity
    (i) Asymptotic Notation
    Self-assessment Questions
1.1.5 String Processing
    Self-assessment Questions
1.1.6 Memory Allocation
    (i) Static memory allocation
    (ii) Dynamic memory allocation
    Self-assessment Questions
1.1.7 Accessing the address of a variable: Address of (&) operator
    Self-assessment Questions
Summary
Terminal Questions
Answer Keys
Activity
Case Study: Exponentiation
Bibliography
e-References
External Resources
Video Links
Aim

To equip the students with the basic skills of using data structures in programs.

Instructional Objectives

After completing this chapter, you should be able to:
• Describe data structures and their types
• Explain the items included in elementary data organisation
• Summarize the role of algorithms in programming
• Explain the procedure to calculate time and space complexities
• Explain string processing and its functions
• Demonstrate memory allocation and accessing the address of a variable

Learning Outcomes

At the end of this chapter, you are expected to:
• Outline the different types of data structures
• Elaborate asymptotic notations with examples
• Calculate the time and space complexities of any sorting algorithm
• List the string processing functions
• Summarize the contents of elementary data organisation
• Differentiate static and dynamic memory allocation

1.1.1 Introduction to Data Structures

Computers can store, retrieve and process vast amounts of data within their storage media. In order to work with a large amount of data, it is very important to organize the data properly. If the data is not organized properly, it becomes difficult to access; if it is organized efficiently, any operation can be performed on it easily and quickly, which provides a faster response to the user. This organization of data can be done with the help of data structures.

Data structures enable a programmer to properly structure large amounts of data into conceptually manageable relationships. If we use a data structure to store the data, it becomes very easy to retrieve and process it. A data structure can be defined as a particular method of organizing a large amount of data so that it can be used efficiently. A data structure can be used to store data in the form of a stack, queue, etc. Any data structure will follow these four rules:

1.
It should be an agreement on how to store data in memory. For example, data can be stored in an array, queue, linked list, etc.
2. It should specify the operations we can perform on that data. For example, we can specify add, delete and search operations on any data structure.
3. It should specify the algorithms for those operations. For example, efficient algorithms for searching for an element in an array.
4. The algorithms used must be time and space efficient.

We have many primitive data types, like integer, character and string, each of which stores a specific kind of data. Data structures allow us to perform operations on groups of data, such as adding an item to a list or searching for a particular element in the list. When a data structure provides operations, we can call the data structure an abstract data type (abbreviated as ADT). A data structure is a form of abstract data type having its own set of data elements along with functions to perform operations on that data. Data structures allow us to manage large amounts of data efficiently so that it can be stored in large databases. These data structures can also be used to design efficient sorting algorithms.

Every data structure has advantages and disadvantages, and each suits a specific problem domain depending upon the type of operations and the data arrangement. For example, an array data structure is suitable for read operations: we can use an array to store n elements in contiguous memory locations and read/add elements as and when required. Other data structures include linked lists, queues, stacks, etc.

Self-assessment Questions
1) __________________ is NOT a component of data structure.
a) Operations b) Storage Structures c) Algorithms d) None of the above
2) Which of the following are true about the characteristics of abstract data types?
a) It exports a type
b) It exports a set of operations
c) It exports a set of elements
d) It exports a set of arrays
3) Each array declaration need not give, implicitly or explicitly, the information about:
a) The name of the array
b) The data type of the array
c) The first data from the set to be stored
d) The index set of the array

1.1.2 Classification of data structures

A data structure is the portion of memory allotted for a model, in which the required data can be arranged in a proper fashion. There are certain linear data structures (e.g., stacks and queues) that permit insertion and deletion operations only at the beginning or at the end of the list, not in the middle. Such data structures have significant importance in system processes such as compilation and program control.

Types of data structures: A data structure can be broadly classified into:
1. Primitive data structure
2. Non-primitive data structure

(i) Primitive data structure

The data structures that are directly operated upon by machine-level instructions, i.e., the fundamental data types such as int, float and double in the case of C, are known as primitive data structures. Primitive data types are used to represent single values:
• Integer: Used to represent a number without a decimal point. For example, 22, 80
• Float and Double: Used to represent a number with a decimal point. For example, 54.1, 57.8
• Character: Used to represent a single character. For example, 'L', 'g'
• String: Used to represent a group of characters. For example, "Hospital Management"
• Boolean: Used to represent logical values, either true or false.

(ii) Non-primitive data structure

The data types that are derived from primary data types, such as arrays, structures and classes, are known as non-primitive data types. These data types are used to store groups of values.
The non-primitive data types are:
• Arrays
• Structures
• Unions
• Linked lists
• Stacks
• Queues, etc.

Non-primitive data types are not defined by the programming language, but are instead created by the programmer. They are sometimes called "reference variables" or "object references", since they reference a memory location which stores the data. The non-primitive data types are used to store groups of values. There are two types of non-primitive data structures:
1. Linear data structures
2. Non-linear data structures

1. Linear Data Structure: A list which shows the relationship of adjacency between elements is said to be a linear data structure. The simplest linear data structure is a 1-D array but, because of its deficiencies, a list is frequently used for different kinds of data. A list is an ordered collection of data items connected by means of links or pointers; this type of list is also called a linked list. A linked list may be a singly linked list or a doubly linked list.
• Singly linked list: A singly linked list can be traversed among the nodes in one direction only.
• Doubly linked list: A doubly linked list can be traversed among the nodes in both directions.

A linked list is normally used to represent data in word-processing applications, and is also applied in different DBMS packages. A list has two important subsets:
• Stack: Also called a last-in-first-out (LIFO) system, it is a linear list in which insertion and deletion take place only at one end. It is used to evaluate different expressions.
• Queue: Also called a first-in-first-out (FIFO) system, it is a linear list in which insertion takes place at one end and deletion takes place at the other end. It is generally used to schedule jobs in operating systems and networks.

2. Non-Linear Data Structure: A list which does not show the relationship of adjacency between elements is said to be a non-linear data structure.
The frequently used non-linear data structures are:
• Trees: A tree maintains a hierarchical relationship between various elements.
• Graphs: A graph maintains a random, or point-to-point, relationship between various elements.

Figure 1.1.1 shows the classification of data structures.

Figure 1.1.1: Types of data structures

Self-assessment Questions
4) Which of the following data structures is of linear type?
a) Graph b) Trees c) Binary tree d) Stack
5) Which of the following data structures is of non-linear type?
a) Strings b) Lists c) Stacks d) Graph
6) Which of the following data structures can't store non-homogeneous data elements?
a) Arrays b) Records c) Pointers d) Stacks

1.1.3 Elementary Data Organization

There are some basic terminologies related to data structures. They are detailed below:

• Data: The term data means a value or set of values. These values may represent an observation, like the roll number of a student, the marks of a student, the name of an employee, the address of a person, a phone number, etc. In programming languages we generally express data in the form of variables, typed as per the kind of data: integer, floating point, character, etc. For example, figures obtained during exit polls, roll number of a student, marks of a student, name of an employee, address of a person, phone number, etc.

• Data item: A data item is a set of characters used to represent a specific data element. It refers to a single unit of values. It is also called a field. For example, the name of a student in a class is represented by a data item, say std_name. Data items can be classified into two types depending on usage:
1. Elementary data type: These data items can't be further subdivided. For example, roll number.
2. Group data type: These data items can be further subdivided into elementary data items.
For example, a date can be divided into days, months and years.

• Entity: An entity is something that has a distinct, separate existence, though it need not be a material existence. An entity has certain attributes, or properties, which may be assigned values. Values assigned may be either numeric or non-numeric. For example, Student is an entity. The possible attributes for a student can be roll number, name, date of birth, gender and class. Possible values for these attributes can be 32, Alex, 24/09/2000, M, 11. In the C language we usually use structures to represent an entity.

• Entity Set: An entity set is a group or set of similar entities. For example, consider a situation where we have multiple entities with the same attributes, say, students in B.Tech second year. Here each student will have their respective roll number, name, marks obtained, etc. All these students can represent an entity set, like an array of students.

• Information: When data is processed by applying certain rules, the newly processed data is called information. Data by itself is not useful for decision making, whereas information is. For example, the fact that a student has scored the maximum marks in a subject becomes information, as it is processed and conveys a meaning.

• Record: A record is a collection of field values of a given entity. A set of multiple records with the same fields forms a file, and a row in this file can be termed a record. Every record has one or more fields associated with it, and generally each record has at least one unique identifier field. For example, the roll number, name, address, etc. of a particular student.

• File: A file is a collection of records of the entities in a given entity set. For example, a file containing records of the students of a particular class. Figure 1.1.2 shows the student file structure.
Figure 1.1.2: Structure of a File

• Key: A key is one or more field(s) in a record that take(s) unique values and can be used to distinguish one record from the others. For example, in Figure 1.1.2, ID is the key that identifies a particular record.

1.1.4 Time and Space Complexity

The running time of an algorithm depends on how long it takes a computer to run the lines of code of the algorithm. It depends on:
i. The speed of the computer
ii. The programming language
iii. The compiler that translates the programming language into code which runs directly on the computer

The complexity of an algorithm is a function describing the efficiency of the algorithm in terms of the amount of data the algorithm must process. Usually there are natural units for the domain and range of this function. There are two main complexity measures of the efficiency of an algorithm:

• Time complexity is a function describing the amount of time an algorithm takes in terms of the amount of input to the algorithm. "Time" can mean:
i. The number of memory accesses performed
ii. The number of comparisons between integers
iii. The number of times some inner loop is executed
iv. Some other natural unit related to the amount of real time the algorithm will take

This idea of time is always kept separate from "wall clock" time, since many factors unrelated to the algorithm itself can affect the real time, like:
i. The language used
ii. The type of computing hardware
iii. The proficiency of the programmer
iv. Optimisation in the compiler

It turns out that, if the units are chosen wisely, all the other things do not matter, and thus an independent measure of the efficiency of the algorithm can be obtained.

• Space complexity is a function describing the amount of memory (space) an algorithm takes in terms of the amount of input to the algorithm. The requirement for "extra" memory is often determined by not counting the memory needed to store the input itself.
We use natural, but fixed-length, units to measure space complexity. We can use bytes, but it's easier to use units like the number of integers, the number of fixed-sized structures, etc. In the end, the function we come up with will be independent of the actual number of bytes needed to represent the unit. Space complexity is sometimes ignored because the space used is minimal and/or obvious, but sometimes it becomes as important an issue as time. For example, "this algorithm takes n² time", where n is the number of items in the input, or "this algorithm takes constant extra space", because the amount of extra memory needed does not vary with the number of items processed.

Suppose an array of n floating point numbers is to be put into ascending numerical order. This task is called sorting. One simple algorithm for sorting is selection sort: let an index i go from 0 to n-2, exchanging the i-th element of the array with the minimum element from positions i through n-1. Here are the iterations of selection sort carried out on the sequence {4 3 9 6 1 7 0}:

Index:  0 1 2 3 4 5 6   comments
        4 3 9 6 1 7 0   initial
i=0     0 3 9 6 1 7 4   swap 0, 4
i=1     0 1 9 6 3 7 4   swap 1, 3
i=2     0 1 3 6 9 7 4   swap 3, 9
i=3     0 1 3 4 9 7 6   swap 6, 4
i=4     0 1 3 4 6 7 9   swap 9, 6
i=5     0 1 3 4 6 7 9   (done)

Here is a simple implementation in C:

int find_min_index(float [], int, int);
void swap(float [], int, int);

/* selection sort on array v of n floats */
void selection_sort(float v[], int n)
{
    int i;
    /* for i from 0 to n-2, swap v[i] with the minimum
     * of the i-th to the (n-1)-th array elements */
    for (i = 0; i < n-1; i++)
        swap(v, i, find_min_index(v, i, n));
}

/* find the index of the minimum element of float array v
 * from index start up to (but not including) end */
int find_min_index(float v[], int start, int end)
{
    int i, mini;
    mini = start;
    for (i = start+1; i < end; i++)
        if (v[i] < v[mini])
            mini = i;
    return mini;
}

/* swap the i-th and j-th elements of float array v */
void swap(float v[], int i, int j)
{
    float t;
    t = v[i];
    v[i] = v[j];
    v[j] = t;
}

Now the performance of the algorithm can be quantified: the amount of time and space taken in terms of n. It is interesting to note how the time and space requirements change as n grows large; sorting 10 items is trivial for almost any reasonable algorithm one can think of, but what about 1,000, 10,000, 1,000,000 or more items?

In this example, the amount of space needed is clearly dominated by the memory consumed by the array. If the array can be stored, it can be sorted; that is, the algorithm takes constant extra space. The main interest is in the amount of time the algorithm takes. One approach is to count the number of array accesses made during the execution of the algorithm; since each array access takes a certain (small) amount of time related to the hardware, this count is proportional to the time the algorithm takes. We thus end up with a function, in terms of n, that gives the number of array accesses for the algorithm. This function is called T(n), for Time: T(n) is the total number of accesses made from the beginning of selection_sort until the end.

selection_sort itself simply calls swap and find_min_index as i goes from 0 to n-2, so

T(n) = Σ (i = 0 to n-2) [time for swap + time for find_min_index(v, i, n)]

(the sum runs up to n-2 because the for loop goes from 0 up to, but not including, n-1). For those not familiar with sigma notation, the formula above just means "the sum, as we let i go from 0 to n-2, of the time for swap plus the time for find_min_index(v, i, n)".

The swap function makes four accesses to the array, so the function is now

T(n) = Σ (i = 0 to n-2) [4 + time for find_min_index(v, i, n)]

As for find_min_index, it makes two array accesses for each iteration of its for loop, and the loop runs n - i - 1 times:

T(n) = Σ (i = 0 to n-2) [4 + 2(n - i - 1)]

With some mathematical manipulation, this can be broken up into:

T(n) = 4(n-1) + 2n(n-1) - 2(n-1) - 2·Σ (i = 0 to n-2) i

(the constant terms are each multiplied by n-1 because i goes from 0 to n-2, i.e., the sum has n-1 terms).
Remembering that the sum of i, as i goes from 0 to k, is k(k+1)/2, substituting k = n-2 and cancelling out the 2's gives:

T(n) = 4(n-1) + 2n(n-1) - 2(n-1) - (n-2)(n-1)

and, to make a long story short,

T(n) = n² + 3n - 4

So this function gives us the number of array accesses selection sort makes for a given array size, and thus an idea of the amount of time it takes. There are other factors affecting the performance, for instance the loop overhead, other processes running on the system, and the fact that access time to memory is not really a constant. But this kind of analysis gives a good idea of the amount of time one will spend waiting, and allows comparing this algorithm to other algorithms that have been analysed in a similar way.

(i) Asymptotic Notation

The function T(n) = n² + 3n - 4 (refer to the earlier section) describes precisely the number of array accesses made in the algorithm. In a sense, it is a little too precise; all we really need to say is n², since the lower-order terms contribute almost nothing to the sum when n is large. We would like a way to justify ignoring those lower-order terms and to make comparisons between algorithms easy. For this, asymptotic notation is used.

The worst-case complexity of the algorithm is the function defined by the maximum number of steps taken on any instance of size n. It represents the curve passing through the highest point of each column. The best-case complexity of the algorithm is the function defined by the minimum number of steps taken on any instance of size n. It represents the curve passing through the lowest point of each column. Finally, the average-case complexity of the algorithm is the function defined by the average number of steps taken on any instance of size n.

• Lower Bound: Let a non-empty set A and its subset B be given, with relation ≤. An element a is called a lower bound of B if a ≤ x ∀x ∈ B (read as: a is less than or equal to x for all x belonging to set B).
For example, a non-empty set A and its subset B are given as A = {1,2,3,4,5,6} and B = {2,3}. The lower bounds of B are 1 and 2, as 1 and 2 in the set A are less than or equal to every element of B.

• Upper Bound: An element a is called an upper bound of B if x ≤ a ∀x ∈ B. For example, with A = {1,2,3,4,5,6} and B = {2,3}, the upper bounds of B are 3, 4, 5 and 6, as each of them in the set A is greater than or equal to every element of B.

• Tight Bound: A bound (upper bound or lower bound) is said to be a tight bound if the inequality is less than or equal to (≤).

Theta (Θ) Notation

It provides both upper and lower bounds for a given function. Θ (Theta) notation means 'order exactly': a function is bounded both above and below. This notation provides both the minimum and maximum values a function can attain for any input, and so tells us the minimum and maximum time an algorithm will take.

Let g(n) be a given function. Θ(g(n)) is the set of functions defined as:

Θ(g(n)) = {f(n): there exist positive constants c1, c2 and n0 such that 0 ≤ c1·g(n) ≤ f(n) ≤ c2·g(n) for all n ≥ n0}

It can be written as f(n) = Θ(g(n)) or f(n) ∈ Θ(g(n)); here f(n) is bounded both above and below by some positive constant multiples of g(n) for all large values of n. Figure 1.1.3 shows the graphical representation of Theta (Θ) notation.

Figure 1.1.3: Theta (Θ) Notation Graph

In the above figure, the function f(n) is bounded below by a constant c1 times g(n) and above by a constant c2 times g(n).
We can explain this with the following example. To show that 3n + 3 = Θ(n), or 3n + 3 ∈ Θ(n), we verify whether f(n) ∈ Θ(g(n)) with the help of the definition, i.e.,

Θ(g(n)) = {f(n): there exist positive constants c1, c2 and n0 such that 0 ≤ c1·g(n) ≤ f(n) ≤ c2·g(n) for all n ≥ n0}

In the given problem, f(n) = 3n + 3 and g(n) = n. To prove f(n) ∈ Θ(g(n)) we have to find c1, c2 and n0 such that 0 ≤ c1·g(n) ≤ f(n) ≤ c2·g(n) for all n ≥ n0.

To verify f(n) ≤ c2·g(n), we write f(n) in terms of g(n) so that the inequality holds:

f(n) = 3n + 3 ≤ 3n + 3n = 6n for all n ≥ 1

so c2 = 6 with n0 = 1.

To verify 0 ≤ c1·g(n) ≤ f(n), similarly:

f(n) = 3n + 3 ≥ 3n for all n

so c1 = 3 with n0 = 1.

Therefore 3n ≤ 3n + 3 ≤ 6n for all n ≥ n0, n0 = 1; i.e., we are able to find c1 = 3, c2 = 6 and n0 = 1 such that 0 ≤ c1·g(n) ≤ f(n) ≤ c2·g(n) for all n ≥ n0. So f(n) = Θ(g(n)) for all n ≥ 1.

Big O Notation

This notation provides an upper bound for a given function. O (Big Oh) notation means 'order at most', i.e., bounded above; it gives the maximum time required to run the algorithm. For a function having only an asymptotic upper bound, the Big Oh 'O' notation is used.

For a given function g(n), O(g(n)) is the set of functions f(n) defined as:

O(g(n)) = {f(n): there exist positive constants c and n0 such that 0 ≤ f(n) ≤ c·g(n) for all n ≥ n0}

We write f(n) = O(g(n)) or f(n) ∈ O(g(n)); f(n) is bounded above by some positive constant multiple of g(n) for all large values of n. The definition is illustrated in Figure 1.1.4.

Figure 1.1.4: Big O Notation Graph

In this figure, the function f(n) is bounded above by a constant c times g(n).
We can explain this with the following example. To show 3n² + 4n + 6 = O(n²), we verify whether f(n) ∈ O(g(n)) with the help of the definition, i.e.,

O(g(n)) = {f(n): there exist positive constants c and n0 such that 0 ≤ f(n) ≤ c·g(n) for all n ≥ n0}

In the given problem, f(n) = 3n² + 4n + 6 and g(n) = n². To show 0 ≤ f(n) ≤ c·g(n) for all n ≥ n0:

f(n) = 3n² + 4n + 6 ≤ 3n² + n² = 4n² for all n ≥ 6

(since 4n + 6 ≤ n² whenever n ≥ 6), so c = 4 with n0 = 6; i.e., we can identify c = 4 and n0 = 6 such that f(n) = O(n²) for n ≥ 6.

Properties of Big O

The definition of Big O is difficult to work with all the time, rather like the "limit" definition of a derivative in calculus. Here are some helpful theorems that can be used to simplify Big O calculations:
• Any k-th degree polynomial is O(n^k).
• a·n^k = O(n^k) for any a > 0.
• Big O is transitive. That is, if f(n) = O(g(n)) and g(n) is O(h(n)), then f(n) = O(h(n)).

Big-Ω (Big-Omega) Notation

Sometimes we want to say that an algorithm takes at least a certain amount of time, without providing an upper bound. For this we use Big-Ω notation; that's the Greek letter "omega". If a running time is Ω(f(n)), then for large enough n the running time is at least k·f(n) for some constant k. Figure 1.1.5 shows how to think of a running time that is Ω(f(n)).

Figure 1.1.5: Big-Ω (Big-Omega) Notation Graph

We say that the running time is "big-Ω of f(n)". We use Big-Ω notation for asymptotic lower bounds, since it bounds the growth of the running time from below for large enough input sizes. Just as Θ(f(n)) automatically implies O(f(n)), it also automatically implies Ω(f(n)). So it can be said that the worst-case running time of binary search is Ω(lg n). One can also make correct, but imprecise, statements using Big-Ω notation.
For example, just as, if you really do have a million dollars in your pocket, you can truthfully say "I have an amount of money in my pocket, and it's at least 10 dollars", you can also say that the worst-case running time of binary search is Ω(1), because it takes at least constant time.

Self-assessment Questions
7) When determining the efficiency of an algorithm, the space factor is measured by:
a) Counting the maximum memory needed by the algorithm
b) Counting the minimum memory needed by the algorithm
c) Counting the average memory needed by the algorithm
d) Counting the maximum disk space needed by the algorithm
8) The complexity of the Bubble sort algorithm is:
a) O(n) b) O(log n) c) O(n²) d) O(n log n)
9) The average case occurs in the linear search algorithm:
a) When the item is somewhere in the middle of the array
b) When the item is not in the array at all
c) When the item is the last element in the array
d) When the item is the last element in the array or is not there at all

1.1.5 String Processing

In C, textual data is represented using arrays of characters called strings. The end of the string is marked with a special character, the null character, which is simply the character with the value 0. The null or string-terminating character is represented by a character escape sequence, \0. In fact, C's only truly built-in string handling is that it allows us to use string constants (also called string literals) in our code. Whenever we write a string enclosed in double quotes, C automatically creates an array of characters for us, containing that string, terminated by the \0 character. For example, we can declare and define an array of characters, and initialize it with a string constant:

char string[] = "Hello, world!";

In this case, we can leave out the dimension of the array, since the compiler can compute it for us based on the size of the initializer (14, including the terminating \0).
This is the only case where the compiler sizes a string array for us, however; in other cases, it will be necessary for us to decide how big the arrays and other data structures we use to hold strings are. We must call functions to perform operations on strings, like copying and comparing them, breaking strings up into parts, joining them, etc. We will look at some of the basic string functions here.

1. strlen: Returns the length of the string (i.e., the number of characters in it), not including the \0 character.

char string7[] = "abc";
int len = strlen(string7);
printf("%d\n", len);

The output is 3.

2. strcpy: This function copies one string to another.

char string1[] = "Hello, world!";
char string2[20];
strcpy(string2, string1);

Here the value of string1, i.e., "Hello, world!", will be copied into string2.

3. strcat: This function concatenates two strings: it appends one string onto the end of another.

char string5[20] = "Hello, ";
char string6[] = "world!";
printf("%s\n", string5);
strcat(string5, string6);
printf("%s\n", string5);

The first call to printf prints "Hello, ", and the second one prints "Hello, world!", indicating that the contents of string6 have been tacked on to the end of string5.

4. strcmp: The standard library's strcmp function compares two strings, and returns 0 if they are identical, a negative number if the first string is alphabetically "less than" the second string, or a positive number if the first string is "greater".

char string3[] = "this is";
char string4[] = "a test";
if (strcmp(string3, string4) == 0)
    printf("strings are equal\n");
else
    printf("strings are different\n");

This code fragment will print "strings are different". Notice that strcmp does not return a Boolean, true/false, zero/nonzero answer.

Table 1.1.1 below lists some more commonly used string functions.
Table 1.1.1: Commonly used String Functions

Function     Description
strcmpi()    Compares two strings with case insensitivity
strrev()     Reverses a string
strlwr()     Converts uppercase string letters to lowercase
strupr()     Converts lowercase string letters to uppercase
strchr()     Finds the first occurrence of a given character in a string
strrchr()    Finds the last occurrence of a given character in a string
strset()     Sets all characters in a string to a given character
strnset()    Sets the specified number of characters in a string to a given character
strdup()     Used for duplicating a string

Self-assessment Questions

10) Which of the following functions compares two strings case-insensitively?
a) strcmp(s, t)
b) strcmpcase(s, t)
c) strcasecmp(s, t)
d) strchr(s, t)

11) How will you print \n on the screen?
a) printf("\n");
b) printf('\n');
c) echo "\\n";
d) printf("\\n");

12) The strcat function adds a null character:
a) Only if there is space
b) Always
c) Depends on the standard
d) Depends on the compiler

1.1.6 Memory Allocation

Memory allocation is primarily a computer hardware operation but is managed through the operating system and software applications. The memory allocation process is quite similar in physical and virtual memory management. Programs and services are assigned specific memory as per their requirements when they are executed. Once a program has finished its operation or is idle, the memory is released and allocated to another program or merged within the primary memory. Memory allocation has two core types:

• Static Memory Allocation: The program is allocated memory at compile time.
• Dynamic Memory Allocation: Memory is allocated as required at run time.

(i) Static memory allocation

The compiler allocates the required memory space for a declared variable. By using the address-of operator, the reserved address is obtained, and this address may be assigned to a pointer variable.
Since most of the declared variables have static memory, this way of assigning a pointer value to a pointer variable is known as static memory allocation. For example, a variable in a function exists only until the function finishes.

void func()
{
    int i; /* `i` only exists during `func` */
}

(ii) Dynamic memory allocation

Dynamic memory allocation is when an executing program requests that the operating system give it a block of main memory. Dynamic allocation is a distinctive feature of C amongst high-level languages. It enables us to create data types and structures of any size and length to suit our program's needs within the program. The program then uses this memory for some purpose; usually the purpose is to add a node to a data structure. In object-oriented languages, dynamic memory allocation is used to get the memory for a new object. The memory comes from above the static part of the data segment. Programs may request memory and may also return previously dynamically allocated memory. Memory may be returned whenever it is no longer needed, and it can be returned in any order, without any relation to the order in which it was allocated.

A new dynamic request for memory might return a range of addresses out of one of the holes. It might not use up the whole hole, so further dynamic requests might be satisfied out of the remainder of the original hole. If too many small holes develop, memory is wasted: the total memory occupied by the holes may be large, yet the holes cannot be used to satisfy dynamic requests. This situation is called memory fragmentation. Keeping track of allocated and deallocated memory is complicated; the run-time allocator, with help from the operating system, does this tracking.

int* func()
{
    int* mem = malloc(1024);
    return mem;
}

int* mem = func(); /* still accessible */

In the above example, the allocated memory is still valid and accessible, even though the function has terminated.
When you are done with the memory, you have to free it:

free(mem);

Self-assessment Questions

13) In static memory allocation, the compiler allocates the required memory using the dereference operator.
a) True
b) False

14) Memory is dynamically allocated once the program is compiled.
a) True
b) False

15) Memory fragmentation occurs when small holes of memory are formed which cannot be used to fulfil dynamic requests.
a) True
b) False

1.1.7 Accessing the address of a variable: Address-of (&) operator

The address of a variable can be obtained by preceding the name of the variable with an ampersand sign (&), known as the address-of operator. For example,

foo = &myvar;

This would assign the address of variable myvar to foo; by preceding the name of the variable myvar with the address-of operator (&), we are no longer assigning the content of the variable itself to foo, but its address. The actual address of a variable in memory cannot be known before runtime, but let's assume, in order to help clarify some concepts, that myvar is placed during runtime at the memory address 1776. In this case, consider the following code fragment:

1 myvar = 25;
2 foo = &myvar;
3 bar = myvar;

The values contained in each variable after the execution of this are shown in the following diagram:

Figure 1.1.6: Contents of myvar, foo and bar after execution

First, we have assigned the value 25 to myvar (a variable whose address in memory we assumed to be 1776). The second statement assigns to the variable foo the address of myvar, which we have assumed to be 1776. Finally, the third statement assigns the value contained in myvar to bar; this is a standard assignment operation. The main difference between the second and third statements is the appearance of the address-of operator (&). The variable that stores the address of another variable (like foo in the earlier example) is what in C is called a pointer. Pointers are a very powerful feature of the language that has many uses in lower-level programming.

Did you Know?
In order to use these string functions you must include the string.h file in your C program using #include <string.h>.

Self-assessment Questions

16) Prior to using a pointer variable:
a) It should be declared
b) It should be initialized
c) It should be declared and initialized
d) It should be neither declared nor initialized

17) The address operator & cannot act on ________________ and ____________.

18) The operators > and < are meaningful when used with pointers if:
a) The pointers point to data of similar type
b) The pointers point to structures of similar data type
c) The pointers point to elements of the same array
d) The pointers point to elements of another array

Summary

o Data Structure is a way of collecting and organising data in such a way that we can perform operations on these data in an effective way.
o The address of a variable can be obtained by preceding the name of a variable with the address-of operator (&).
o Data structures are categorized into two types: linear and nonlinear. Linear data structures are the ones in which elements are arranged in a sequence; nonlinear data structures are the ones in which elements are not arranged sequentially.
o The complexity of an algorithm is a function describing the efficiency of the algorithm in terms of the amount of data the algorithm must process.
o The basic terminologies in the concept of data structures are data, data item, entity, entity set, information, record, file, key, etc.
o String functions like strcmp, strcat and strlen are used for string processing in C.
o Static and dynamic memory allocation are the core types of memory allocation.

Terminal Questions

1. How is a primitive data structure different from a non-primitive data structure?
2. With an example, explain upper bound, lower bound and tight bound.
3. Explain the concept of string processing in C with some basic string functions.
4.
Draw a comparison between static memory allocation and dynamic memory allocation.

Answer Keys

Self-assessment Questions

Question No.  Answer
1             d
2             a and c
3             c
4             d
5             d
6             a
7             a
8             c
9             a
10            c
11            d
12            b
13            b
14            b
15            a
16            c
17            r-values and arithmetic expressions
18            c

Activity

1. Activity Type: Online    Duration: 15 Minutes
Description:
a. Divide the students into two groups.
b. Give the selection sort algorithm and the insertion sort algorithm to each of the groups.
c. Students should calculate the time complexities for both these algorithms.

Case Study: Exponentiation

Let us look at the implementation of exponentiation using recursion and iteration. Illustrated below is a very basic algorithm for raising a base b to a non-negative integer exponent e, first iteratively, then recursively.

# iterative version
def exp(b, e):
    result = 1
    for i in range(e):
        result = result * b
    return result

# recursive version
def exp(b, e):
    if e == 0:
        return 1
    else:
        return b * exp(b, e-1)

They both have a time complexity of Θ(e). It does not really matter whether it is implemented using recursion or iteration; in either case, b is multiplied by itself e times. To construct a more efficient algorithm, we can apply the principle of divide and conquer: a problem of size n can be divided into two problems of size n/2, and then those solutions can be combined to solve the problem of interest. If the time to solve a problem of size n/2 is less than half of the time to solve a problem of size n, then this is a better way to go.

Here is an exponentiation procedure that wins by doing divide-and-conquer (the helpers odd and square are defined first; note the integer division e // 2):

def odd(e):
    return e % 2 == 1

def square(x):
    return x * x

def fastExp(b, e):
    if e == 0:
        return 1
    elif odd(e):
        return b * fastExp(b, e-1)
    else:
        return square(fastExp(b, e // 2))

It has to handle two cases slightly differently: if e is odd, it does a single multiply of b with the result of a recursive call with exponent e-1; otherwise, it computes the result for exponent e/2 and squares that result, so less time is taken. What is the time complexity of this algorithm? Let's start by considering the case when e is a power of 2.
Then we'll always hit the last case of the function until we get down to e = 1, which takes just log₂ e recursive calls. (Note that e here is our variable, not the base of the natural logarithm.) Further, if we start with a number whose binary representation looks like 1111111, then we'll alternately hit the odd case and the even case, and it will take 2 log₂ e recursive calls. Each recursive call costs a constant amount. In the end, the algorithm has a time complexity of Θ(log e).

Bibliography

e-References
• cs.utexas.edu, (2016). Complexity Analysis. Retrieved on 19 April 2016, from https://www.cs.utexas.edu/users/djimenez/utsa/cs1723/lecture2.html
• compsci.hunter.cuny.edu, (2016). C Strings and Pointers. Retrieved on 19 April 2016, from http://www.compsci.hunter.cuny.edu/~sweiss/resources/cstrings.pdf

External Resources
• Kruse, R. (2006). Data Structures and Program Designing Using 'C' (2nd ed.). Pearson Education.
• Srivastava, S. K., & Srivastava, D. (2004). Data Structures Through C in Depth (2nd ed.). BPB Publications.
• Weiss, M. A. (2001). Data Structures and Algorithm Analysis in C (2nd ed.). Pearson Education.

Video Links
Data Structures Introduction: https://www.youtube.com/watch?v=92S4zgXN17o
Types of Data Structures: https://www.youtube.com/watch?v=VeEneWqC5a4
Asymptotic Notation: https://www.youtube.com/watch?v=6Ol2JbwoJp0
Memory Allocation: https://www.youtube.com/watch?v=Dml54J3Kwm4
Variables and Addresses: https://www.youtube.com/watch?v=2RsAt8RQ194

Chapter Table of Contents

Chapter 1.2 Pointers and Recursion
Aim ....................................................................................................................................................... 35
Instructional Objectives.....................................................................................................................
35 Learning Outcomes ............................................................................................................................ 35 1.2.1 Introduction to Pointers and Recursive functions............................................................... 36 1.2.2 Declaring and Initializing Pointers ........................................................................................ 36 Self-assessment Questions ....................................................................................................... 39 1.2.3 Accessing a variable through its pointer ............................................................................... 40 Self-assessment Questions ....................................................................................................... 49 1.2.4 Memory allocation functions ................................................................................................. 50 (i) malloc()................................................................................................................................. 50 (ii) calloc() ................................................................................................................................. 51 (iii) free() ................................................................................................................................... 51 (iv) Realloc().............................................................................................................................. 51 Self-assessment Questions ....................................................................................................... 54 1.2.5 Recursion................................................................................................................................... 55 (i) Definition ............................................................................................................................. 
55 (ii) Advantages .......................................................................................................................... 57 (iii) Recursive programs .......................................................................................................... 58 Self-assessment Questions ....................................................................................................... 64 Summary ............................................................................................................................................. 66 Terminal Questions............................................................................................................................ 66 Answer Keys........................................................................................................................................ 67 Activity................................................................................................................................................. 68 Bibliography ........................................................................................................................................ 69 e-References ........................................................................................................................................ 69 External Resources ............................................................................................................................. 69 Video Links ......................................................................................................................................... 
69

Aim

To provide the students with the knowledge of Pointers and Recursion

Instructional Objectives
After completing this chapter, you should be able to:
• Demonstrate the role of pointers in data structures
• Explain memory allocation functions
• Explain recursion and its advantages
• Describe how variables are accessed through pointers

Learning Outcomes
At the end of this chapter, you are expected to:
• Outline the steps to declare and initialise pointers
• List the advantages of recursion
• Differentiate malloc(), calloc() and realloc()
• Write programs for binomial coefficient and Fibonacci using recursion

1.2.1 Introduction to Pointers and Recursive functions

In any programming language, creating and accessing variables is very important. So far we have seen how to access variables using their variable names. In this chapter we introduce the concept of indirect access of objects using their addresses, and describe the use of pointers in accessing variables through their memory addresses. As memory is a resource of a computer, it should be allocated and deallocated properly, so this chapter also describes the different memory allocation techniques. Further, we introduce the concept of recursion, where a function calls itself repeatedly to complete a task. We will study some recursive functions and their program implementations, and look at the application of recursion to various problems like factorial, GCD, the Fibonacci series, etc.

1.2.2 Declaring and Initializing Pointers

Introduction to pointers and the need for them in programming

In all the previous programs, we referred to a variable by its variable name; the program did not care about the physical addresses of those variables. So, whenever we need to use a variable, we access it using the identifier which describes that variable. A computer's memory is divided into cells or locations, each of which has a unique address.
Every variable that we declare in our program has an address associated with it; thus a variable can also be accessed using its address. This is achieved by using pointers. Pointers can be defined as special variables which have the capability to store the address of any variable. They are used in C programs to access memory and manipulate data using addresses. Pointers are a very important feature of C programming, as they allow us to access data using memory addresses rather than only variable names. Pointers do not have much significance when simple primitive data types like integer, character or float are used, but as data structures become more complex, pointers play a very vital role in accessing the data.

For example, consider an integer variable "a". This variable will have three things associated with it: first its name, second its value and third its memory address. Assume that variable "a" has the value "5" and the address "1000". We can access this variable "a" by using a pointer variable which stores the address of variable a; thus we can manipulate the value of any variable using a pointer variable. Pointers are used for dynamic memory allocation so as to handle a huge amount of data. Without pointers, it would have been very difficult to allocate memory globally or through functions.

Declaration and Initialization of Pointer Variables

Like any other variable in C, pointer variables should also be declared before they are used for storing addresses. In this chapter we are going to study two operators, known as pointer operators:

1. & (address-of) operator: This operator gives the address of any variable. For example, if "max" is an integer variable, then &max will give the memory address of variable max.

2. * (dereference) operator: This operator returns the value at any memory address. Thus, the argument to this operator must be a pointer.
It is called the dereference operator because it works in the opposite manner to the & operator. For example, if "ptr" is a pointer variable which stores the address of variable "a", then *ptr will return the value located at the memory address pointed to by "ptr".

A pointer variable can be declared as follows:

Syntax: data_type *pointer_variable_name;

We need to specify the data type, followed by the * symbol and finally the name of the pointer variable, terminated by a semicolon. For example,

int *ptr;

This declaration tells the compiler that "ptr" is a pointer variable of type integer.

A pointer variable can be initialized as given below:

Syntax: pointer_variable_name = &variable_name;

For example,

int a;      //variable a is declared as an integer variable
int *ptr;   //declare ptr as a pointer variable
ptr = &a;   //ptr is a pointer variable which stores the address of variable a

We can combine declaration and initialization in one step also:

Syntax: data_type *pointer_variable_name = &variable_name;

For example,

int *ptr = &a;

It means ptr is a pointer variable storing the address of variable a.

Note: a pointer variable should always store the address of a variable of the same data type. For example,

char a;
int *ptr;
ptr = &a;   //Invalid, as a is of char type and ptr is an integer pointer

Dereferencing a pointer

As already discussed, we can use the * (dereference) operator to access the value stored at the address pointed to by the pointer variable. This * operator is also called the "value at" operator or "indirection" operator. For example,

/*pointer variable declaration and initialization*/
#include <stdio.h>

int main()
{
    int a = 5;   //a is an integer variable
    int *ptr;    //ptr is a pointer variable
    ptr = &a;    //pointer ptr stores the address of variable a
    printf("Address of variable a is: %d\n", &a);    //prints the address of variable a
    printf("Address of variable a is %d\n", ptr);    //prints the address of a, as it is stored in ptr
    printf("Value of variable a is: %d\n", *ptr);    //prints the value of variable a
    printf("Address of pointer ptr is: %d\n", &ptr); //prints the address of pointer variable ptr
    return 0;
}

Output:

Self-assessment Questions

1) A pointer is a special kind of variable which is used to store __________ of a variable.
a) Data Variable
b) Variable Name
c) Value
d) Address

2) A pointer variable is declared with a preceding _________ sign.
a) *
b) %
c) &
d) ^

3) Consider a 32-bit compiler. We need to store the address of an integer variable in an integer pointer. What will be the size of the integer pointer?
a) 2 Bytes
b) 6 Bytes
c) 10 Bytes
d) 4 Bytes

1.2.3 Accessing a variable through its pointer

Pointers are special kinds of variables which can hold the address of another variable. Once a pointer has been assigned the address of a variable, we can use the value of that variable and manipulate it as per the requirement. We know that pointers can store the address of any variable using the & (address-of) operator. Once the address is stored in a pointer, we can use the * (dereferencing/indirection) operator followed by the pointer name to access the value of that variable. For example,

int a, b;
a = 60;
int *ptr;
ptr = &a;
b = *ptr;

Considering the above section of the program, we declare two integer variables a and b, and an integer pointer variable "ptr". In the next statement we assign the address of variable a to the pointer "ptr". The statement b = *ptr; will assign the value at the address pointed to by "ptr" to the variable "b". Thus variable "b" becomes the same as variable "a".
This is equivalent to the statement b = a;

Figure 1.2.1 below shows how a variable can be accessed using a pointer.

Figure 1.2.1: Accessing a variable through its pointer

Program:

/*Accessing a variable through pointer*/
#include <stdio.h>

int main()
{
    int a, b;
    a = 60;
    b = 0;
    int *ptr;
    ptr = &a;
    printf("Value of variable a=%d\n", a);
    printf("Value of variable b=%d\n", b);
    b = *ptr;   //assign value at address pointed by ptr to the variable b
    printf("Value of pointer variable ptr=%d\n", ptr);
    printf("Value of variable b=%d\n", b);
    return 0;
}

Output:

Variable "a" is assigned the value 60. Pointer variable "ptr" will store the address of variable "a", say 1005. When we say "b = *ptr", b is assigned the value 60 pointed to by the pointer "ptr". Thus after execution of the above program, a = 60 and b will also have the value 60.

Different types of pointer variables and their use

In the following program, we have created four pointers: an integer pointer "iptr", a float pointer "fptr", a double pointer "cptr" and a character pointer "chptr". Each type of pointer variable stores the address of the respective type of variable. For example, the character pointer variable will store the address of the variable "ch", which is of type character. The indirection operator, i.e., *, accesses an object of a specified data type at an address. Accessing a variable by its memory address is called indirect access. In the example given below, *iptr indirectly accesses the variable that iptr points to, i.e., variable a. Similarly, the pointer variable *fptr indirectly accesses the variable that fptr points to, i.e., variable b.
Program:

/*different types of pointer variables */
#include <stdio.h>

int main()
{
    int a, *iptr;
    float b, *fptr;
    double c, *cptr;
    char ch, *chptr;

    iptr = &a;    //iptr stores the address of integer variable a
    fptr = &b;    //fptr stores the address of float variable b
    cptr = &c;    //cptr stores the address of double variable c
    chptr = &ch;  //chptr stores the address of character variable ch

    a = 10;
    b = 2.5;
    c = 12.36;
    ch = 'C';

    printf("Address of variable a is %u \n", iptr);
    printf("Address of variable b is %u \n", fptr);
    printf("Address of variable c is %u \n", cptr);
    printf("Address of variable ch is %u \n\n", chptr);

    printf("Value of variable a is %d \n", *iptr);
    printf("Value of variable b is %f \n", *fptr);
    printf("Value of variable c is %f \n", *cptr);
    printf("Value of variable ch is %c \n", *chptr);

    return 0;
}

Output:

When the following pointer variable declarations are encountered, memory spaces are allocated for these variables at some addresses.

int a, *iptr;
float b, *fptr;
double c, *cptr;
char ch, *chptr;

The memory layout during the declaration phase is shown in Figure 1.2.2.

Figure 1.2.2: Declaration of Pointer variables

But when we assign the addresses of the variables to the respective pointer variables, the memory layout will look as shown in Figure 1.2.3 below.

Figure 1.2.3: Effect of indirect Access and Assignments of Pointers

These initialized pointers may now be used to indirectly access the variables they point to.

Pointer arithmetic

Just as we perform arithmetic operations on regular integer variables, we can also perform arithmetic operations on pointer variables. Only addition and subtraction can be performed on pointer types, and their behaviour is slightly different: the operations behave according to the data type the pointer points to. The sizes of basic data types like integer, char, float, etc. are already defined.
Suppose we define three pointer variables as given below:

char *cptr;
short *sptr;
long *lptr;

Let us assume that they point to memory locations 4000, 5000 and 6000 respectively. If we write the increment statement given below, it will increment the address contained in cptr, whose value will become 4001:

++cptr;

This is because cptr is a character pointer and a character occupies 1 byte; thus, incrementing a character pointer adds 1 to the memory address. Similarly, the statement

++sptr;

will increment the address contained in sptr by 2 bytes, as a short is 2 bytes in size, and

++lptr;

will increment the address contained in lptr by 4 bytes, as a long is 4 bytes in size. Thus, when we increment a pointer, the pointer is made to point to the following element of the same type: the size in bytes of the type it points to is added to the pointer.

The same rules apply to the addition and subtraction operations. The statements given below produce the same result as the increment operator:

cptr = cptr + 1;
sptr = sptr + 1;
lptr = lptr + 1;

The increment (++) and decrement (--) operators can be used as either prefix or postfix operators in any expression, and they can be used with pointers in a similar way, with a slight difference. In the case of a prefix operator, the value is incremented first and then the expression is evaluated. In the case of a postfix operator, the expression is evaluated first and then the value is incremented. The same rules hold for incrementing and decrementing pointers. As per the operator precedence rules, postfix operators, such as increment and decrement, have higher precedence than prefix operators, such as the dereference operator (*). Thus, the expression

*ptr++;

is the same as *(ptr++). As the ++ operator is used as a postfix, the expression evaluates to the value originally pointed to by the pointer, and the pointer itself is incremented afterwards.
There are four possible combinations of the dereference operator with the prefix and postfix increment operators:

1. *ptr++   //equivalent to *(ptr++); increment pointer ptr, dereference the unincremented address
2. *++ptr   //equivalent to *(++ptr); increment pointer ptr, dereference the incremented address
3. ++*ptr   //equivalent to ++(*ptr); dereference the pointer and pre-increment the value stored there
4. (*ptr)++ //dereference the pointer and post-increment the value stored there

If we consider the statement

*ptr++ = *qtr++;

both increment operators are postfix, so it is the pointers ptr and qtr that are incremented, not the values they point to: the value at the address originally pointed to by qtr is assigned to the location originally pointed to by ptr, and then both pointers are advanced.

Program:

/*Pointer Arithmetic*/
#include <stdio.h>

int main()
{
    int ivar = 5, *iptr;
    char cvar = 'C', *cptr;
    float fvar = 4.45, *fptr;

    iptr = &ivar;
    cptr = &cvar;
    fptr = &fvar;

    printf("Address of integer variable ivar = %u\n", iptr);
    printf("Address of character variable cvar = %u\n", cptr);
    printf("Address of floating point variable fvar = %u\n\n", fptr);

    /* Increment */
    iptr++;
    cptr++;
    fptr++;
    printf("After increment address in iptr = %u\n", iptr);
    printf("After increment address in cptr = %u\n", cptr);
    printf("After increment address in fptr = %u\n\n", fptr);

    /* Increment by 2 */
    iptr = iptr + 2;
    cptr = cptr + 2;
    fptr = fptr + 2;
    printf("After +2 address in iptr = %u\n", iptr);
    printf("After +2 address in cptr = %u\n", cptr);
    printf("After +2 address in fptr = %u\n\n", fptr);

    /* Decrement */
    iptr--;
    cptr--;
    fptr--;
    printf("After decrement address in iptr = %u\n", iptr);
    printf("After decrement address in cptr = %u\n", cptr);
    printf("After decrement address in fptr = %u\n\n", fptr);

    return 0;
}

Output:

Self-assessment Questions

4) Comment on the following pointer declaration: int *ptr, p;
a) ptr is a pointer to integer, p is not
b) ptr and p are both pointers to integer
c) ptr is a pointer to integer, p may or may not be
d) Neither ptr nor p is a pointer to integer

5) What will be the output?

main()
{
    char *p;
    p = "Hello";
    printf("%c\n", *&*p);
}

a) Hello
b) H
c) 1005 (memory address of variable p)
d) 1008 (memory address of character H)

6) The statement int **a;
a) Is illegal
b) Is legal but meaningless
c) Is syntactically and semantically correct
d) Stacks

7) Comment on the following: const int *ptr;
a) We cannot change the value pointed to by ptr
b) We cannot change the pointer ptr itself
c) Is illegal
d) We can change the pointer as well as the value pointed to by it

1.2.4 Memory allocation functions

Memory is a resource of the computer system and it needs to be allocated properly for any kind of data structure used in programs. Dynamic memory allocation is the process of allocating memory to data during program execution. Normally, when we are dealing with simple arrays or strings, we allocate the required amount of memory at compile time itself; we cannot extend that allocation during runtime, so we need to allocate a sufficient amount of memory at compile time. But with compile-time memory management, sometimes the allocated memory is never used, wasting memory space. Thus, we can make use of the dynamic memory allocation technique to allocate and deallocate memory at runtime. Dynamic memory allocation helps us to increase or decrease the memory while the program is under execution. The following are the dynamic memory allocation functions in C:

1. malloc(): Allocates the requested number of bytes and returns a pointer to the first byte of the allocated space.
2. calloc(): Allocates space for an array of elements, initializes the bytes to zero and then returns a pointer to the memory.
3. realloc(): Changes the size of previously allocated space.
4. free(): Deallocates previously allocated space.
(i) malloc()

malloc, as the name indicates, stands for memory allocation. This function reserves a block of memory of the specified size and returns a pointer of type void. Syntax of malloc():

ptr = (cast-type*) malloc(byte-size);

Here, ptr is a pointer of cast-type. The malloc() function returns a pointer to an area of memory of size byte-size. If the space is insufficient, allocation fails and a NULL pointer is returned.

ptr = (int*) malloc(100 * sizeof(int));

This statement will allocate either 200 or 400 bytes, according to whether the size of int is 2 or 4 bytes respectively, and the pointer points to the address of the first byte of the memory.

(ii) calloc()

calloc stands for "contiguous allocation". The difference between malloc() and calloc() is that malloc() allocates a single block of memory, whereas calloc() allocates multiple blocks of memory, each of the same size, and sets all bytes to zero. Syntax of calloc():

ptr = (cast-type*) calloc(n, element-size);

This statement will allocate contiguous space in memory for an array of n elements. For example:

ptr = (float*) calloc(25, sizeof(float));

This statement allocates contiguous space in memory for an array of 25 elements, each of the size of a float, i.e., 4 bytes.

(iii) free()

This function is used to explicitly free the memory allocated by the malloc() and calloc() functions. It releases the memory block pointed to by ptr back to the system.

free(ptr);

(iv) realloc()

Sometimes a programmer requires extra memory, or the allocated memory turns out to be more than sufficient. In these cases, a programmer can change the size of previously allocated memory using realloc(). Unless ptr is NULL, it must have been returned by an earlier call to malloc(), calloc() or realloc(). Syntax of realloc():

ptr = realloc(ptr, newsize);

Here, ptr is reallocated with a size of newsize.
For example:

Program:

#include<stdio.h>
#include<stdlib.h>
int main()
{
    int *ptr, i, n1, n2;
    printf("Enter size of array: ");
    scanf("%d", &n1);
    ptr = (int*)malloc(n1 * sizeof(int));
    printf("Address of previously allocated memory: ");
    for(i = 0; i < n1; ++i)
        printf("%u\t", ptr + i);
    printf("\nEnter new size of array: ");
    scanf("%d", &n2);
    ptr = (int*)realloc(ptr, n2 * sizeof(int));   /* size in bytes, not elements */
    for(i = 0; i < n2; ++i)
        printf("%u\t", ptr + i);
    return 0;
}

Output:

Example showing use of malloc(), calloc() and free()

Program:

#include<stdio.h>
#include<stdlib.h>
int main()
{
    int n, i, *ptr, sum = 0;
    printf("Enter number of elements: ");
    scanf("%d", &n);
    ptr = (int*)calloc(n, sizeof(int));   //memory allocated using calloc
    if(ptr == NULL)
    {
        printf("Error! memory not allocated.");
        exit(0);
    }
    printf("Enter elements of array: ");
    for(i = 0; i < n; ++i)
    {
        scanf("%d", ptr + i);
        sum += *(ptr + i);
    }
    printf("Sum=%d", sum);
    free(ptr);
    return 0;
}

Output:

Self-assessment Questions

8) What function should be used to free the memory allocated by calloc()? a) dealloc(); b) malloc(variable_name, 0) c) free(); d) memalloc(variable_name, 0)

9) Which header file should be included to use functions like malloc() and calloc()? a) memory.h b) stdlib.h c) string.h d) dos.h

10) How will you free the memory allocated by the following program? #include<stdio.h> #include<stdlib.h> #define MAXROW 3 #define MAXCOL 4 int main() { int **p, i, j; p = (int **) malloc(MAXROW * sizeof(int*)); return 0; } a) The name of array b) The data type of array c) The first data from the set to be stored d) the index set of the array

11) Specify the 2 library functions to dynamically allocate memory? a) malloc() and memalloc() b) alloc() and memalloc() c) malloc() and calloc() d) memalloc() and faralloc()

1.2.5 Recursion

Recursion is considered to be one of the most powerful tools in a programming language. At the same time, recursion is also considered the trickiest and most intimidating concept by a lot of programmers. This is because the behaviour of a recursive function depends on the conditions specified by the user.
In short: something referring to itself is called a recursive definition.

(i) Definition

Recursion can be defined as defining anything in terms of itself. It can also be defined as repeating items in a self-similar way. In programming, if a function calls itself to accomplish some task, then it is said to be a recursive function. The concept of recursion is used to solve problems where the same computation must be executed repeatedly. Thus, to make any function execute repeatedly until we obtain the desired output, we can make use of recursion.

Example of Recursion: The best example in mathematics is the factorial function.

n! = 1.2.3.........(n-1).n

If n = 6, then the factorial of 6 is calculated as 6! = 6(5)(4)(3)(2)(1) = 720. Consider that we are calculating the factorial of a given number. If we have to calculate the factorial of 6, then what remains after the first step is the calculation of 5!. In general, we can say n! = n(n-1)! (i.e., 6! = 6(5!)). This means we need to execute the same factorial code again and again, which is nothing but recursion.

Thus, the recursive definition for factorial is:

f(n) = 1, if n = 0
f(n) = n * f(n-1), otherwise

The above recursive definition says that the factorial of n = 0 is 1, while the factorial of any other number n is defined to be the product of n and the factorial of one less than n. For example, consider n = 4. As n is not equal to 0, the first case does not apply. Thus, applying the second case we get 4! = 4(4-1)! = 4(3!). To find 3! we again apply the same definition: 4! = 4(3!) = 4[(3)(2!)]. Now we have to calculate 2!, which requires 1!, which requires 0!. As 0! is 1 by definition, we reach the end. Now we substitute the calculated values one by one in reverse order:

4! = 4(3!) = 4(3)(2!) = 4(3)(2)(1!) = 4(3)(2)(1)(0!) = 4(3)(2)(1)(1) = 24

Thus, 4! = 24. From the above solution it is clear that each time we need to calculate the factorial of a value one less than the original one.
Thus we reach the value 0, where we stop applying the factorial definition. Any recursive definition will have some properties:

• There are one or more base cases for which recursion is not needed.
• All chains of recursion stop at one of the base cases. We should make sure that each recursive call occurs on a smaller version of the original problem.

In C programming, a recursive factorial function will look like:

int factorial(int n)
{
    if (n == 0)                        //Base Case
        return 1;
    else
        return n * factorial(n - 1);   //Recursive Case
}

The above function calculates the factorial of any number n. When we call this factorial function, it first checks for the base case: whether the value of n equals 0. If n equals 0, then by definition it returns 1. Otherwise, the base case has not yet been satisfied, so the function returns the product of n and the factorial of n-1. Thus it calls the factorial function once again to find the factorial of n-1, forming recursive calls until the base case is met. Figure 1.2.4 shows the series of recursive calls involved in the calculation of 5!. The values of n are stored on the way down the recursive chain and then used while returning from the function calls.

Figure 1.2.4: Recursive computation of 5!

(ii) Advantages

An important advantage of recursion is that it saves the programmer's time to a large extent. Even though problems like factorial, power or Fibonacci can be solved using loops, their recursive solutions are shorter and easier to understand. And there are algorithms that are quite easy to implement recursively but much more challenging to implement using loops.

Advantages of Recursion:
• Reduces unnecessary calling of functions.
• Makes solving a problem easy when its iterative solution is very big and complex.
• Extremely useful when the same solution must be applied repeatedly to smaller instances of the problem.

(iii) Recursive programs

1.
Fibonacci series

One of the well-known problems is generating a Fibonacci series using recursion. A Fibonacci series looks like: 0, 1, 1, 2, 3, 5, 8, 13, 21 and so on.

Working: The next number is equal to the sum of the previous two numbers. The first two numbers of the Fibonacci series are always 0 and 1. The third number becomes the sum of the first two numbers, i.e., 0 + 1 = 1. Similarly, the fourth number is the sum of the 3rd and 2nd numbers, i.e., 1 + 1 = 2, and so on.

Thus, the recursive definition for Fibonacci is:

F(n) = 0, if n = 0
F(n) = 1, if n = 1
F(n) = F(n-1) + F(n-2), otherwise

In C programming, a recursive Fibonacci function will look like:

int fib(int n)
{
    if (n <= 1)
        return n;
    else
        return fib(n - 1) + fib(n - 2);
}

If n is less than or equal to 1, then return n. Otherwise return the sum of the previous two terms in the series by calling the fib function twice: once for fib(n-1) and once for fib(n-2). This combines results from two separate recursive calls, which is sometimes known as "tree" recursion. Figure 1.2.5 below demonstrates the working of the recursive algorithm for the Fibonacci series.

Figure 1.2.5: Recursive Algorithm

For example, the call to fib(4) repeats the calculation of fib(3) (see the circled regions of the tree). In general, when n increases by 1, we roughly double the work; that makes about 2^n calls. Following is the C program for the implementation of the Fibonacci series:

Program:

#include<stdio.h>
int fib(int n)
{
    if ( n == 0 )
        return 0;
    else if ( n == 1 )
        return 1;
    else
        return ( fib(n-1) + fib(n-2) );
}
int main()
{
    int n, j = 0, i;
    printf("Fibonacci series implementation\n");
    printf("How many terms in series: ");
    scanf("%d",&n);
    printf("Fibonacci series\n");
    for ( i = 1 ; i <= n ; i++ )
    {
        printf("%d\n",fib(j));
        j++;
    }
    return 0;
}

Output:

The above program uses the recursion concept to print the Fibonacci series. The program first asks for the total number of terms to be displayed as output.
Then it makes recursive calls to the function fib() and finds the next term in the series by adding the previous two values in the series.

2. Binomial Coefficient

The binomial coefficient C(n, k) counts the number of ways to form an unordered collection of k items selected from a collection of n distinct items. For example, if you wanted to make a group of two from a group of four people, the number of ways to do this is C(4, 2), where n = 4, i.e., 4 people, and k = 2, i.e., a group of 2 people. There are a total of 6 ways to group them in an unordered manner. Let us label the 4 people A, B, C and D. Then the 2-letter groups are: AB, AC, AD, BC, BD, and CD. Hence, C(n, k) = C(4, 2) = 6.

In general, binomial coefficients can be defined as follows:
• A binomial coefficient C(n, k) is the coefficient of X^k in the expansion of (1 + X)^n.
• A binomial coefficient C(n, k) also gives the number of ways, regardless of order, that k items can be chosen from among n items.

Problem: This problem of binomial coefficients can be implemented using recursion. We need to write a function that takes two parameters n and k and returns the value of the binomial coefficient C(n, k).

Recursive function: The value of C(n, k) can be calculated recursively using the following standard formula for binomial coefficients:

C(n, k) = C(n-1, k-1) + C(n-1, k)
C(n, 0) = C(n, n) = 1

The program given below implements the calculation of binomial coefficients in a recursive manner.

Program:

//Recursive implementation of Binomial Coefficient C(n, k)
#include<stdio.h>
int binomial(int n, int k)
{
    if (k==0 || k==n)   // Base Cases
        return 1;
    else
        return binomial(n-1, k-1) + binomial(n-1, k);
}
int main()
{
    int n, k;
    printf("Enter the value of n:");
    scanf("%d",&n);
    printf("\nEnter the value of k:");
    scanf("%d",&k);
    printf("\nValue of C(%d, %d) is %d ", n, k, binomial(n, k));
    return 0;
}

Output:

It should be noted that in the above program, the binomial function is called again and again until the base cases are satisfied.
Figure 1.2.6 given below is the recursive tree for n = 5 and k = 2.

Figure 1.2.6: Example of DP and Recursion

3. GCD (Greatest Common Divisor)

The greatest common divisor of two or more integers is the largest positive integer that divides the numbers without a remainder. For example, the GCD of 8 and 12 is 4.

Problem Definition: Given any nonnegative integers a and b, not both equal to 0, calculate gcd(a, b).

Recursive Definition: For a, b ≥ 0:

gcd(a, b) = a, if b = 0
gcd(a, b) = gcd(b, a mod b), otherwise

Input: Any nonnegative integers a and b, not both equal to zero.
Output: The greatest common divisor of a and b.

For example: Consider a = 54 and b = 24. We need to find GCD(54, 24). The divisors of 54 are: 1, 2, 3, 6, 9, 18, 27, and 54. Similarly, the divisors of 24 are: 1, 2, 3, 4, 6, 8, 12, and 24. Thus, 1, 2, 3 and 6 are the common divisors of both 54 and 24. The greatest of these common divisors is 6. That is, the GCD (greatest common divisor) of 54 and 24 is 6.

The following program demonstrates the computation of GCD using recursion. It follows the recursive definition above directly, so the base case b = 0 is handled explicitly:

Program:

/*GCD of Numbers using Recursion*/
#include <stdio.h>
int gcd(int a, int b)
{
    if (b == 0)                 /* Base case: gcd(a, 0) = a */
        return a;
    else
        return gcd(b, a % b);   /* Recursive case: gcd(b, a mod b) */
}
int main()
{
    int a, b, ans;
    printf("Enter the value of a and b: ");
    scanf("%d%d", &a, &b);
    ans = gcd(a, b);
    printf("GCD(Greatest common divisor) of %d and %d is %d.\n", a, b, ans);
    return 0;
}

Output:

Did you Know? One critical requirement of recursive functions is a termination point, or base case. Every recursive program must have a base case to make sure that the function will terminate. A missing base case results in unexpected behaviour.

Self-assessment Questions

12) Which data structure is used to perform recursion? a) Queue b) Stack c) Linked List d) Tree

13) What is the output of the following code?
int doSomething(int a, int b)
{
    if (b == 1)
        return a;
    else
        return a + doSomething(a, b-1);
}
doSomething(2,3);

a) 4 b) 2 c) 3 d) 6

14) Determine the output of:

int rec(int num){
    return (num) ? num%10 + rec(num/10) : 0;
}
main(){
    printf("%d", rec(4567));
}

a) 4 b) 12 c) 22 d) 21

15) What will the below code output?

int something(int number)
{
    if(number <= 0)
        return 1;
    else
        return number * something(number-1);
}
something(4);

a) 12 b) 24 c) 1 d) 0

16) void print(int n)
{
    if (n == 0)
        return;
    printf("%d", n%2);
    print(n/2);
}

What will be the output of print(12)? a) 0011 b) 1100 c) 1001 d) 1000

Summary

o A pointer is a value that designates the address (i.e., the location in memory) of some value. Pointers are variables that hold a memory location.
o '&' - the address-of operator is used to assign the address of any variable to a pointer variable.
o '*' - the indirection operator is used to access the value contained at a particular pointer.
o Pointers store the address of a variable using the & operator. We can access the value of that variable using the * operator followed by the pointer name.
o Memory allocation functions:
• malloc() - Allocates the requested number of bytes and returns a pointer to the first byte of the allocated space
• calloc() - Allocates space for an array of elements, initializes it to zero and then returns a pointer to the memory
• free() - Deallocates the previously allocated space
• realloc() - Changes the size of previously allocated space
o Recursion is the process of repeating items in a self-similar way. In programs, if a function makes a call to itself then it is called a recursive function. Recursion is more general than iteration. Choosing between recursion and looping involves considerations of efficiency and elegance.

Terminal Questions
1. Explain the role of pointers in data structures.
2. What are the memory allocation functions? Explain in detail.
3. Define recursive functions.
4. Write a note on the indirection operator.

Answer Keys
Self-assessment Questions
Question No.
Answer
1: d   2: a   3: a   4: a   5: b   6: c   7: a   8: c
9: b   10: d   11: c   12: b   13: d   14: c   15: b   16: a

Activity
1. Activity Type: Offline
Description: Ask all the students to work out the output of the below question:

#include<stdio.h>
int main(){
    int i = 3;
    int *j;
    int **k;
    j = &i;
    k = &j;
    printf("%u %u %d ", k, *k, **k);
    return 0;
}

Prepare a presentation on pointers and dynamic memory allocation.
Duration: 10 Minutes

Bibliography

e-References
• cslibrary.stanford.edu, (2016). Stanford CS Education Library. Retrieved on 19 April 2016, from http://cslibrary.stanford.edu/106/
• doc.ic.ac.uk, (2016). Recursion. Retrieved on 19 April 2016, from http://www.doc.ic.ac.uk/~wjk/c++Intro/RobMillerL8.html

External Resources
• Kruse, R. (2006). Data Structures and Program Designing Using 'C' (2nd ed.). Pearson Education.
• Srivastava, S. K., & Srivastava, D. (2004). Data Structures Through C in Depth (2nd ed.). BPB Publications.
• Weiss, M. A. (2001). Data Structures and Algorithm Analysis in C (2nd ed.). Pearson Education.

Video Links
Topic: Introduction to pointers, declaring and initializing pointers and accessing variables through pointers - https://www.youtube.com/watch?v=fAPt0Upy3ho
Topic: Memory allocation functions - https://www.youtube.com/watch?v=s4io0ir2kas
Topic: Recursion - https://www.youtube.com/watch?v=AuTjrMu-2F0

MODULE - II Searching and Sorting

MODULE 2 Searching and Sorting

Module Description

This module introduces the problem of searching a list to find a particular entry. The discussion centres on two well-known algorithms: sequential search and binary search. Most of this chapter assumes that the entire sort can be done in main memory, so that the number of elements is relatively small (less than a million). Sorts that cannot be performed in main memory and must be done on disk or tape are also quite important. This type of sorting is known as external sorting and will be discussed in the second chapter of this module.
It is assumed in our examples that the array contains only integers to simplify matters. At the same time, we have to understand that more complicated structures are possible. Chapter 2.1 Searching Techniques Chapter 2.2 Sorting Techniques Chapter Table of Contents Chapter 2.1 Searching Techniques Aim ....................................................................................................................................................... 71 Instructional Objectives..................................................................................................................... 71 Learning Outcomes ............................................................................................................................ 71 2.1.1 Introduction to Searching ....................................................................................................... 72 (i) Types of Searching .............................................................................................................. 73 Self-assessment Questions ....................................................................................................... 77 2.1.2 Basic Sequential Searching ...................................................................................................... 77 Self-assessment Questions ....................................................................................................... 82 2.1.3 Binary search ............................................................................................................................ 82 (i) Iterative implementation .................................................................................................... 83 (ii) Recursive implementation ................................................................................................ 85 Self-assessment Questions ....................................................................................................... 
87 2.1.4 Comparison between sequential and binary search ............................................................ 88 Self-assessment Questions ....................................................................................................... 89 Summary ............................................................................................................................................. 90 Terminal Questions............................................................................................................................ 91 Answer Keys........................................................................................................................................ 92 Activity................................................................................................................................................. 93 Case Study: Alphabetizing Papers .................................................................................................... 93 Bibliography ........................................................................................................................................ 95 e-References ........................................................................................................................................ 95 External Resources ............................................................................................................................. 95 Video Links ......................................................................................................................................... 
95

Aim: To educate the students in searching and sorting techniques

Instructional Objectives
After completing this chapter, you should be able to:
• Explain searching and its types with code snippets
• Describe sequential search using iterative and recursive implementations
• Explain binary search
• Compare linear search and binary search

Learning Outcomes
At the end of this chapter, you are expected to:
• Elaborate on searching techniques with examples
• Compute time complexities for the binary search and sequential search algorithms
• Outline the applications of linear and binary search
• Write code for iterative and recursive implementations of both searching techniques

2.1.1 Introduction to Searching

This chapter focuses on how searching plays an important role in data structures. Searching helps to find whether a particular element is part of a given list or not. In this chapter, we will focus on two types of searching techniques, namely sequential (or linear) search and binary search. We will also come across iterative and recursive implementations for both of these searching techniques.

Searching is a technique of determining whether a given element is present in a list of elements. We are given the names of people and are asked for an associated telephone listing. We are given an employee's name or code and are asked for the personnel records of that employee. In these examples, we are given a small piece of data or information, which we call a key, and we are asked to find a record that has other information associated with the key. We shall allow both the possibility that there is more than one record with the same key and that there is no record at all with a given key. See Figure 2.1.1.

Figure 2.1.1: Sample records of employees

If the element we are searching for is present in the list, then the searching technique should return the index at which that element is present in the list.
If the search element is not present in the list, then the searching technique should return NULL, indicating that the search element is not present in the list. As with sorting, a number of searching techniques are available in the literature, and the technique chosen varies with the purpose of the application. Searching techniques can also be classified based on the data structures used to store the list: the techniques vary between linear and non-linear data structures, and even among linear data structures such as arrays, linked lists, stacks and queues. If an array is used, we use one kind of searching technique, while searching for an element in a linked list requires a different technique. Non-linear data structures such as trees likewise have their own searching techniques. In this chapter, we will introduce different types of searching algorithms and their implementations.

(i) Types of Searching

There are many different searching techniques, and modern research is focused on advanced searching techniques using graphs, like breadth first search (BFS) and depth first search (DFS), which will be discussed in later chapters. In common practice, there are two very well-known types of searching techniques, namely linear (or sequential) search and binary search.

1. Linear Search or Sequential Search

The simplest way to do a search in a given list is to begin at one end of the list and scan down it until the desired key is found or the other end is reached. This is our first method of searching, which we call linear or sequential search. Let A = [10 15 6 23 8 96 55 44 66 11 2 30 69 96] and let the search element be e = 11. Consider a pointer i; to begin the process, initialize the pointer i = 1. Compare the value pointed to by the pointer with the search element e = 11. As A(1) = 10, which is not equal to the element e, increment the pointer: i = i + 1.
Compare the value pointed to by the pointer, i.e., A(2) = 15; it is also not equal to the element e. Continue the process until the search element is found or the pointer i reaches the end of the list. The working of linear or sequential search is shown in figure 2.1.2 below.

Figure 2.1.2: Pictorial representation of solution for sequential search

Characteristics and applications

In the case of linear search, the searching happens sequentially. Because of this, if the element is present at the end of the list, or is not present at all, we get the worst case scenario for linear search. In other words, for N elements in a list we will require N comparisons in the above-mentioned worst case scenario. This is O(N) in Big O notation: the running time of the linear search algorithm is directly proportional to the number of elements in the list. We should also note that, for linear search, the list need not be in sorted order. In some cases, we might place frequently searched items or elements at the start of the list, which results in faster retrieval, thereby increasing performance irrespective of the size of the list. Despite the worst case scenario which affects the performance of this searching technique, the linear search technique is widely used in many applications. Built-in functions in programming languages and libraries, such as the find index function of Ruby or of jQuery, depend on linear search techniques.

2. Binary search algorithm

Linear search is easy and efficient for short lists, but inefficient for long ones. Just imagine trying to find the name "Carmel Fernandes" in a large directory by reading one name at a time starting at the front of the book! To find any record in a long list, there are far more efficient methods, provided that the keys in the list are already sorted into order.
A better method for a list with keys in order is first to compare the key with the one in the centre of the list, and then restrict the search to only the first or second half of the list, depending on whether the key comes before or after the central one. With one comparison of keys we thus reduce the list to half its original size. Repeating this at each step, we reduce the length of the list to be searched by half. With only about 20 such iterations, this method locates any required key in a list containing more than a million keys. This method is called binary search. This approach requires that the keys in the list be of a scalar or other type that can be regarded as having an order, and that the list is already completely in order.

Working of binary search algorithm

Consider an array A = [11, 14, 15, 25, 32, 36, 39, 45, 52, 55, 59, 63, 77, 83, 99] and the search element e = 83. Let low be an integer containing the first index of the array, i.e., 0, and high be an integer containing the highest index of the array, which is 14 in this case. We first compute mid as the midpoint of high and low. In our case the midpoint is 7 (mid = [high (14) + low (0)] / 2). First we check whether the search element is at the midpoint. In our case the search element is 83 and the element at the midpoint is 45, so this is not true. The next step is to check whether the value at the midpoint is greater or smaller than the search element. If the value at the midpoint is greater than the search element, then our search element lies to the left of the midpoint, and the search must be restricted to the first half of the array; so we set high to mid - 1 and leave low unchanged. If the value at the midpoint is smaller than the search element, then our search element is present in the right half of the array, and the search should be restricted to the right side of the midpoint; therefore we set low to mid + 1 and leave high unchanged. We repeat this process until the search element is found. The example taken above is solved pictorially below.

Iteration 1

Initially low = 0, high = 14, midpoint = 7, and the element to be searched is e = 83. Since A[mid] is 45, which is smaller than 83, we set low = mid + 1 = 8 for our next iteration, and calculate the next midpoint using the new low and high.

Iteration 2

The new midpoint is (8 + 14) / 2 = 11. Again the element is not present at the midpoint, and the midpoint element 63 is smaller than 83; hence we set low = 12 and leave high unchanged. Our new midpoint will be (12 + 14) / 2 = 13.

Iteration 3

Now we check whether the value at the midpoint is our search element, which is true, since A[13] = 83. Hence, our search is complete. The algorithm will return the value of the midpoint, which is 13 in our case.

Characteristics and applications

In the case of the binary search technique, the given list of elements is divided into two parts and one part gets eliminated in each iteration, which is not so in the case of the linear search technique. This feature makes binary search more efficient and more powerful than linear search, whatever the number of elements present in the list. The binary search technique can be implemented either iteratively or recursively. For binary search, the elements need to be in sorted order, which is not the case for linear search; the list must already be in sorted order before we start searching. Since binary search is more suitable for larger sets of elements, its performance degrades if the list is frequently updated, because the updated list must be sorted again each time.

Self-assessment Questions

1) Searching is an important function because ________________. a) Information retrieval is the most important part of the computer system b) Data in the computer system is unorganized c) It allows validating data present in the computer's memory d) It allows the computer to validate information

2) What factor degrades the performance of the binary search technique?
a) Number of iterations b) Re-sorting c) Size of the element d) Size of the list

3) ____________ search algorithm begins at one end of the list and scans down it until the desired key is found or the other end is reached. a) Sequential b) Binary c) BFS d) DFS

2.1.2 Basic Sequential Searching

As we have already discussed the working of the sequential search algorithm, let us now focus on building a logical algorithm and implementing a program for it. There are two ways of implementing this algorithm: the first is the iterative method of implementation and the second is the recursive implementation. Figure 2.1.3 below shows a flowchart for implementing the basic sequential search algorithm.

Figure 2.1.3: Flowchart for linear search algorithm

Types of implementation: A search can be implemented by two methods:
1. Iterative implementation
2. Recursive implementation

1. Iterative Implementation

The program code below demonstrates the iterative method of implementation of the sequential search algorithm.

#include <stdio.h>
void main()
{
    int arr[20];
    int x, n, key, flag = 0;
    printf("Enter the number of elements: \n");
    scanf("%d", &n);
    printf("Enter the elements: \n");
    for (x = 0; x < n; x++)
    {
        scanf("%d", &arr[x]);
    }
    printf("Entered elements of the array are:\n");
    for (x = 0; x < n; x++)
    {
        printf("%d ", arr[x]);
    }
    printf("\nEnter the element you want to search: \n");
    scanf("%d", &key);
    /* Linear search logic */
    for (x = 0; x < n ; x++)
    {
        if (key == arr[x] )
        {
            flag = 1;
            break;
        }
    }
    if (flag == 1)
        printf("Element %d found in the array\n", key);
    else
        printf("Element %d not found in the array\n", key);
}

Output:

2. Recursive Implementation

The program code below demonstrates the recursive method of implementation.
#include<stdio.h>
int line_search(int[], int, int);
int main()
{
    int arr[100], n, x, key;
    printf("Enter the number of elements: ");
    scanf("%d", &n);
    printf("Enter %d elements: ", n);
    for(x = 0; x < n; x++)
        scanf("%d", &arr[x]);
    printf("Entered elements of the array are:\n");
    for(x = 0; x < n; x++)
        printf("%d ", arr[x]);
    printf("\nEnter the element you want to search: \n");
    scanf("%d", &key);
    x = line_search(arr, n-1, key);
    if(x != -1)
        printf("Element %d found in the array at %d location\n", key, x+1);
    else
        printf("Element %d is not found in the array\n", key);
}
int line_search(int s[100], int n, int key)
{
    if(n < 0)          /* all elements checked, including index 0 */
        return -1;
    if(s[n] == key)
        return n;
    else
        return line_search(s, n-1, key);
}

Output:

Analysis of linear search algorithm

Worst Case Analysis (Usually Done)

In worst case analysis, we calculate the upper bound on the running time of an algorithm. Here we must identify the case that causes the maximum number of operations to be executed. For sequential or linear search, the worst case occurs when the element to be searched for is not present in the array. In that case, the algorithm compares the key with all the elements of the array one by one. Therefore, the worst case time complexity of linear search is Θ(n).

Average Case Analysis (Sometimes Done)

In average case analysis, we take every possible input and calculate the computing time for each of them. All the calculated values are summed, and the sum is divided by the total number of inputs. In this way, we account for the distribution of cases. For the linear search problem, assume that all cases are uniformly distributed, including the worst case of the element not being present. So we sum the costs of all the cases and divide the sum by (n + 1); this works out to roughly (n + 1) / 2 comparisons on average, which is still Θ(n).

Best Case Analysis (Ideal)

The linear search algorithm performs best if the element is found at the first position of the list during the searching process. That is, the time required for searching will be very small.
Hence, the best or ideal case of linear search is Θ(1).

Self-assessment Questions
4) Which is the worst case for linear search algorithm?
a) The element to be searched is present at the first position in the array
b) The element to be searched is present at the last position in the array
c) The element to be searched is present at the middle position in the array
d) The element to be searched is not present in the array
5) Best or ideal case complexity for linear search algorithm is ______.
a) O(log n) b) O(n) c) O(1) d) O(n log n)
6) The average case complexity of linear search algorithm is __________.
a) O(log n) b) O(n) c) O(1) d) O(n log n)

2.1.3 Binary search

Like the linear search algorithm, the binary search algorithm can be implemented using both iterative and recursive methods. First, consider the flowchart in figure 2.1.4 below to understand the working of the algorithm. The flowchart follows the same logic as discussed in the previous section. It begins by taking the input key (element) from the user. It then finds the mid-point of the list using the values of the low and high indices. After finding the mid-point, the key value is compared with the element present at the mid-point of the list. If the key matches that element, the search is successful. If it does not match, another comparison checks whether the key value is less than the element at the mid-point. If so, the left sub-array is considered next; otherwise, the right sub-array is considered. The whole process is repeated until the key value matches an element of the list; if it never does, we conclude that the key value is not present in the list.

Figure 2.1.4: Flowchart for binary search algorithm

(i) Iterative implementation
Iterative implementation of binary search is based on the explanation in the above flowchart.
A limitation of this algorithm is seen at its termination for an unsuccessful search: when the search is not successful, low crosses over to the right of high (low > high), and this terminates the while loop. The algorithm below implements the iterative method of binary search.

#include <stdio.h>

int main()
{
    int x, low, high, mid, n, key, arr[50];

    printf("Enter the number of elements: ");
    scanf("%d", &n);
    printf("Enter the %d elements: ", n);
    for (x = 0; x < n; x++)
        scanf("%d", &arr[x]);
    printf("\nEnter the element you want to search: \n");
    scanf("%d", &key);

    low = 0;
    high = n - 1;
    mid = (low + high) / 2;
    while (low <= high) {
        if (arr[mid] < key)
            low = mid + 1;
        else if (arr[mid] == key) {
            printf("Element %d found in the array at %d location\n", key, mid + 1);
            break;
        } else
            high = mid - 1;
        mid = (low + high) / 2;
    }
    if (low > high)
        printf("Element %d is not found in the array\n", key);
    return 0;
}

Output:

(ii) Recursive implementation
The recursive implementation of binary search handles the termination condition seen in the iterative method differently: it checks the condition if (low > high) and returns -1, which plays the same role as the while condition in the previous method and terminates the recursion. Otherwise, the recursive function passes new values into the parameters of a recursive call. The algorithm below implements the recursive method for performing binary search.
#include <stdio.h>
#include <stdlib.h>

int bin_rsearch(int[], int, int, int);

int main()
{
    int n, x, key, pos;
    int low, high, arr[20];

    printf("Enter the number of elements: ");
    scanf("%d", &n);
    printf("Enter the %d elements: ", n);
    for (x = 0; x < n; x++) {
        scanf("%d", &arr[x]);
    }
    low = 0;
    high = n - 1;
    printf("\nEnter the element you want to search: \n");
    scanf("%d", &key);
    pos = bin_rsearch(arr, key, low, high);
    if (pos != -1) {
        printf("Element %d found in the list at %d location\n", key, (pos + 1));
    } else
        printf("Element %d is not found in the array\n", key);
    return (0);
}

// Binary search function
int bin_rsearch(int s[], int i, int low, int high)
{
    int mid;
    if (low > high)
        return -1;
    mid = (low + high) / 2;
    if (i == s[mid]) {
        return (mid);
    } else if (i < s[mid]) {
        return bin_rsearch(s, i, low, mid - 1);
    } else {
        return bin_rsearch(s, i, mid + 1, high);
    }
}

Output:

Analysis of Binary search algorithm

Worst Case Analysis (Usually Done)
As with linear search, the worst case of binary search is when the element to be searched is not present in the array. When the number to be searched is not present, each iteration of the binary search algorithm halves the size of the permissible array, and this halving can happen up to O(log n) times. Therefore, the worst case time complexity of binary search is O(log n), for both the recursive and the iterative implementations. (The two differ in space: the iterative version uses O(1) extra space, while the recursive version uses O(log n) stack space.)

Average Case Analysis (Sometimes done)
To calculate the average case complexity of the binary search algorithm, we take the sum, over all the elements, of the product of the number of comparisons required to find each element and the probability of searching for that element. For simplicity of analysis, assume that no item absent from the array is ever searched for, and that the probability of searching for each element is uniform. The average case time complexity of binary search then also works out to O(log n).
Best Case Analysis (Ideal)
For the binary search problem, the best case is when the element to be searched is present at the middle of the array. The number of operations in the best case is constant, i.e., it is independent of n. So the time complexity in the best case is O(1).

Did you know?
The difference between O(log(N)) and O(N) is extremely significant when the size of the array N is large: for any practical problem it is crucial that we avoid O(N) searches.

Self-assessment Questions
7) To calculate the midpoint in binary search algorithm we ___________.
a) Divide lowest index by highest index
b) First add lowest index and highest index and then divide the sum by 2
c) First subtract highest index from lowest index and divide the result by 2
d) Just subtract highest index and lowest index
8) Best case for binary search algorithm is when the element to be searched is,
a) In the beginning of the array
b) At the end of the array
c) At the middle of the array
d) At any position in the array
9) Average case complexity for binary search algorithm is O(log n),
a) True b) False

2.1.4 Comparison between sequential and binary search

As discussed in the previous sections of this chapter, binary search and sequential search differ in several ways. In this section we compare these algorithms on various parameters.

1. Implementation requirements
First and foremost, the most obvious difference between the two algorithms lies in their input requirements. Sequential search can be performed even over an unsorted array, since the comparison is done in a sequential manner. For the binary search algorithm to work, however, the input array must be sorted. This follows from the fundamental way the algorithm works, which is based on array indices.

2. Efficiency
Linear search works best for small array sizes. However, as the size of the array increases, its performance goes down.
Binary search, in contrast, works well for any array size: the array size is halved in every iteration or recursion, and so is the number of comparisons. Efficiency also depends on the location of the search element. For the linear search algorithm, an element at the starting position of the array gives the best case, and an element at the last position gives the worst case. Binary search is most efficient when the element to be searched is at the middle of the array; for any other location, the efficiency is not affected much.

3. Complexities
Linear search has an average case complexity of O(n), which makes it slow and inefficient for large array sizes. Binary search has an average case complexity of O(log n), which makes it a better search algorithm even for large array sizes.

4. Data structure
The binary search algorithm works well for arrays but not for linked lists, because of the fundamental structure of arrays: regular indexing and contiguous memory allocation, unlike linked lists. Sequential search, on the other hand, works well for both arrays and linked lists.

Self-assessment Questions
10) Linear search algorithm requires the array to be sorted before it starts searching.
a) True b) False
11) Efficiency of linear search algorithm does not depend on the position of the element to be searched.
a) True b) False
12) In general, binary search algorithm is best when the array size is big.
a) True b) False

Summary
o Searching is one of the primary functions of a computer system, information retrieval being increasingly important.
o Though there are various other algorithms for searching, two very famous and important ones are sequential search and binary search.
o Linear search is the simpler of the two: it searches elements from one end of the array until the search element is found.
As the array size grows, this algorithm proves inefficient, as it consumes a lot of time to carry out the search.
o Binary search overcomes this disadvantage of sequential search, as it reduces the number of iterations taken to find the search element. It halves the array size, and hence the number of comparisons, in each iteration.
o Unlike linear search, binary search requires the input array to be sorted, because of the fundamental working of its algorithm, which is based on the indices from the lower to the upper end of the array.
o In the general case, the time complexity of sequential search is O(n), which makes it slower and less efficient than binary search, whose time complexity is O(log n).

Terminal Questions
1. Explain in brief different types of searching algorithms.
2. Consider the following array A = {23, 26, 32, 35, 39, 42, 44, 47, 50, 55, 58, 62, 66, 88, 99} and search for element e=26 using binary search technique. (Solution needs to be demonstrated pictorially with solution for each iteration).
3. Provide an algorithm for recursive implementation of linear search.
4. Draw a neat flowchart for binary search algorithm.
5. Explain in brief the difference between linear and binary search algorithms.

Answer Keys
Self-assessment Questions
Question No. Answer
1 a
2 b
3 a
4 d
5 c
6 b
7 b
8 c
9 a
10 b
11 b
12 a

Activity
1. Activity Type: Offline  Duration: 20 Minutes
Description: Stack 10 reference books in ascending order of their titles. Ask the students to write a program for binary search to search books by their title.

Case Study: Alphabetizing Papers
Consider the example of a human alphabetizing a couple dozen papers. If we think about it for a while, it is basically a sorting algorithm. To understand the process or working of this algorithm, the following questions need to be asked.
1. How are the papers alphabetized?
2. How are the papers arranged?
Basically, all papers with names starting with A are put in a pile named 'A'; similarly, names starting with B are put in a pile named 'B', and so on. The group (pile) ranges vary based on a number of factors chosen for convenience. Once the grouping is done, each pile or group is scanned letter by letter and another algorithm is used for the remaining work. In 90 per cent of the cases, humans unknowingly use the insertion sort algorithm. It is well known that quicksort is among the best and fastest ways to sort. The question then is: why don't humans use quicksort? The human brain doesn't do all comparisons equally; it's just "easier" for our brains to quickly apply insertion sort. Splitting into letter groups makes each smaller problem more manageable. In reality, humans use an algorithm called a bucket sort. A bucket sort followed by individual insertion sorts (exactly what humans tend to do) is a linear time sorting algorithm. When we have some notion of the distribution of the items to be sorted, we can break through that boundary and do linear time sorting. The requirement of linear time sorting is that the input must follow some known distribution. This is the reason why humans instinctively break the piles into various types of groupings. If there are many papers, the group ranges need to be reduced. Furthermore, the ideal bucket setup would distribute the papers roughly evenly: the letter S might need its own bucket, while all the letters up through F could share one bucket. Humans have a great deal of experience with both the general problem and their specific problem (for example, the peculiarities of a particular class' name distribution), and so they try to optimize the algorithm given the known distribution. They are setting up the parameters of the linear time sort (number of buckets, bucket ranges, etc.) exactly as they should to optimize the sort time.
The main disadvantage of these linear sort algorithms is that they require a lot of extra memory space (versus comparison-based sorting): we need an auxiliary bookkeeping array on the order of the original problem to carry them out. This isn't a problem in real life, where we just need a large table on which to arrange papers. In a very real sense, this supposedly "naive" algorithm that humans use is among the very best possible.

Questions:
1. Explain the process followed by humans for sorting papers, as described in the above case study. What is the method called technically and what is the supporting sorting algorithm?
2. Why do you think humans cannot think of sorting using quicksort?
3. Why is it not advised to use bucket sort for implementing a computer-based sorting algorithm?
4. Do you agree with the author of the case study that the process followed by humans for sorting applications in real life is fastest?

Bibliography
e-References
• interactivepython.org, (2016). Problem solving in Data structures: The Binary Search. Retrieved on 19 April 2016, from http://interactivepython.org/runestone/static/pythonds/SortSearch/TheBinarySearch.html
• pages.cs.wisc.edu, (2016). Searching and Sorting. Retrieved on 19 April 2016, from http://pages.cs.wisc.edu/~bobh/367/SORTING.html

External Resources
• Kruse, R. (2006). Data Structures and program designing using 'C' (2nd ed.). Pearson Education.
• Srivastava, S. K., & Srivastava, D. (2004). Data Structures Through C in Depth (2nd ed.). BPB Publications.
• Weiss, M. A. (2001). Data Structures and Algorithm Analysis in C (2nd ed.).
Pearson Education.

Video Links
For Introduction to searching, and types of searching techniques: https://www.youtube.com/watch?v=mqixr2wdLqg
Implementation of linear search iterative method: https://www.youtube.com/watch?v=AqjVd6FVFbE
Implementation of binary search iterative method: https://www.youtube.com/watch?v=g9BKw_TobpI
Implementation of binary search recursive method: https://www.youtube.com/watch?v=-bQ4UzUmWe8
Comparison of linear and binary search: https://www.youtube.com/watch?v=u3v-vh2t9FE

Notes:

Chapter Table of Contents
Chapter 2.2 Sorting Techniques
Aim .......................................................... 97
Instructional Objectives ..................................... 97
Learning Outcomes ............................................ 97
2.2.1 Introduction ........................................... 98
2.2.2 Basics of Sorting ...................................... 99
Self-assessment Questions ................................... 105
2.2.3 Sorting Techniques .................................... 106
(i) The Bubble Sort ......................................... 106
(ii) Insertion Sort .........................................
110 (iii) Selection Sort ................................................................................................................... 113 (iv) Merge Sort ........................................................................................................................ 117 (v) Quick Sort.......................................................................................................................... 123 Self-assessment Questions ............................................................................................................... 132 Summary ........................................................................................................................................... 134 Terminal Questions.......................................................................................................................... 135 Answer Keys...................................................................................................................................... 135 Activity............................................................................................................................................... 136 Bibliography ...................................................................................................................................... 137 e-References ...................................................................................................................................... 137 External Resources ........................................................................................................................... 137 Video Links ....................................................................................................................................... 
Aim
To educate the students in searching and sorting techniques

Instructional Objectives
After completing this chapter, you should be able to:
• Explain the need of sorting
• Demonstrate bubble and insertion sort algorithms with example
• Discuss the time and space complexities of merge and quick sort algorithms

Learning Outcomes
At the end of this chapter, you are expected to:
• Calculate the complexities of all sorting algorithms
• Identify the efficient algorithm
• Outline the steps to sort the unsorted numbers using quick sort

2.2.1 Introduction

Sorting is a technique for arranging data in a particular order. Data can be sorted either in ascending or descending order, which can be numerical, lexicographical, or any user-defined order. The term sorting is related to searching of data. Here the data considered consists only of integers, but it may be anything, such as strings or records. In real life, we need to search for many things: a particular record in a database, students' marks in a result database, a particular person's telephone number, a student's name in a list, and so on. The sorting process arranges the data in a particular sequence, making it easier to search whenever needed. Thus data searching can be optimized to a great extent by using sorting techniques. Every record to be sorted contains one key, based on which the record is sorted. For example, suppose we have records of students; every such record will have data like roll number, name and percentage. We can sort the records in ascending or descending order based on the key, i.e., the roll number. If we wish to search for the student with roll no. 54, we don't need to search the complete set of records; we can simply search among the students with roll numbers 50 to 60, saving a lot of time. Some examples of sorting in real-life scenarios are as follows:
1. Telephone Directory: A telephone directory keeps the telephone numbers of people sorted based on their names.
This makes names very easy to search.
2. Dictionary: A dictionary contains words in alphabetical order so that searching for any word becomes easy.

Before studying any sorting algorithms, it is necessary to know about the two main operations involved:
1. Comparison: Two values need to be compared with each other, depending upon the sorting criteria.
2. Exchange or Swapping: When two values are compared with each other, they may need to be exchanged with each other if required.

A sorting algorithm helps in arranging the elements in a particular order. An efficient sorting algorithm is important to optimize the use of other algorithms (such as search and merge algorithms) which require sorted lists to work correctly. More importantly, the output must satisfy two conditions:
1. The output is in non-decreasing order (each element is not smaller than the previous element according to the desired total order). For example, consider the following set of elements to be sorted: 45, 76, 2, 56, 89, 4. As per this first condition, the output of a sorting algorithm must be in non-decreasing order, i.e., 2, 4, 45, 56, 76, 89.
2. The output is a permutation, or reordering, of the input. The output should always be a reordering of the same elements to be sorted; you cannot add, delete or replace any element of the set. As per this second condition the sorted list will be: 2, 4, 45, 56, 76, 89.

Right from the beginning, the sorting problem has attracted a great deal of research. This is perhaps due to the complexity of solving the problem efficiently despite its simple, familiar statement. Although many consider it a solved problem, new useful sorting algorithms are still being invented (for example, library sort was first published in 2004). Sorting algorithms are prevalent in introductory computer science classes, where the abundance of algorithms for the problem provides a gentle introduction to a variety of core algorithm concepts.
These include big-O notation, data structures, divide and conquer algorithms, randomized algorithms, best, worst and average case analysis, time-space trade-offs, and lower bounds. In this chapter, we will look at the basics of sorting and at sorting techniques like bubble sort, selection sort, insertion sort, merge sort and quick sort in detail.

2.2.2 Basics of Sorting

Data can be sorted either in ascending (increasing) or descending (decreasing) order. If the order is not mentioned, it is assumed to be ascending. In this chapter sorting is done in ascending order; these algorithms can be made to work for descending order as well by making simple modifications.

Sort Stability
Sort stability comes into the picture if the key on which the data is being sorted is not unique for each record, i.e., two or more records have identical keys. For example, consider a list of records where each record contains the name and age of a person. Consider name as the sort key and sort all the records according to the names, as shown in table 2.1.1.

Table 2.1.1: Unsorted List
Name Age
Vineet 25
Amit 37
Deepa 67
Shriya 45
Deepa 20
Kiran 18
Deepa 56

Table 2.1.2: Sorted, Unstable List
Name Age
Amit 37
Deepa 56
Deepa 67
Deepa 20
Kiran 18
Shriya 45
Vineet 25

Table 2.1.3: Sorted, Unstable List
Name Age
Amit 37
Deepa 20
Deepa 56
Deepa 67
Kiran 18
Shriya 45
Vineet 25

Table 2.1.4: Sorted, Stable List
Name Age
Amit 37
Deepa 67
Deepa 20
Deepa 56
Kiran 18
Shriya 45
Vineet 25

Any sorting algorithm would place (Amit, 37) in the 1st position, (Kiran, 18) in the 5th position, (Shriya, 45) in the 6th position and (Vineet, 25) in the 7th position. There are identical keys (names), namely (Deepa, 67), (Deepa, 20) and (Deepa, 56), and any sorting algorithm would place them in adjacent locations, i.e., the 2nd, 3rd and 4th locations, but not necessarily in the same relative order. A sorting algorithm is said to be stable if it maintains the relative order of the duplicate keys in the sorted output.
That is, if the keys are equal, then their relative order in the sorted output is the same. For example, suppose records Ri and Rj have equal keys; if record Ri precedes record Rj in the input data, then Ri should also precede Rj in the sorted output if the sort is stable. If the sort is not stable, then Ri and Rj may be in any order in the sorted output. So in an unstable sort the duplicate keys may occur in any order in the sorted output.

Sort efficiency
Sorting is an important and frequent operation in many applications, so the aim is not only to get sorted data but to get it in the most efficient manner. Many algorithms have therefore been developed for sorting, and to decide which one to use we need to compare them using some parameters. The choice is made using these three parameters:

1. Coding time: Coding time is the time taken to write the program for implementing a particular sorting algorithm. Coding time depends upon the algorithm you choose: a simple sorting program like bubble sort will require less coding time, whereas heap sort will consume more.

2. Space requirement: This is the space required to store the executable program, constants, variables, etc. (The space taken by the text and data segments of a compiled program can be inspected with the size command.) For example, the program below sorts a small array and also reports the CPU time consumed.

/* program to sort a small array and measure CPU time */
#include <stdio.h>
#include <time.h>

int main(int argc, char *argv[])
{
    time_t start, stop;
    clock_t ticks;
    long count;

    time(&start);
    int array[4] = {3, 67, 2, 64}, c, d, temp;
    for (c = 0; c < (4 - 1); c++) {
        for (d = 0; d < 4 - c - 1; d++) {
            if (array[d] > array[d + 1]) {
                temp = array[d];
                array[d] = array[d + 1];
                array[d + 1] = temp;
            }
        }
    }
    printf("Sorted list in ascending order:\n");
    for (c = 0; c < 4; c++)
        printf("%d\n", array[c]);
    int i = 0;
    while (i < 50000) {
        i++;
        ticks = clock();
    }
    time(&stop);
    printf("Used %0.2f seconds of CPU time. \n", (double)ticks / CLOCKS_PER_SEC);
    printf("Finished in about %.0f seconds. \n", difftime(stop, start));
    return 0;
}

Output:

3. Run time or execution time: This is the time taken to successfully execute a sorting algorithm and obtain a sorted list of elements. For example, the program below demonstrates how to calculate the total execution time of a sorting program. It uses the header file <time.h> to calculate the execution time.

/* program to find execution time */
#include <stdio.h>
#include <time.h>

int main(int argc, char *argv[])
{
    time_t start, stop;
    clock_t ticks;
    long count;

    time(&start);
    int array[4] = {2, 67, 12, 23}, i, j, temp;
    for (i = 0; i < 4; i++) {
        for (j = 0; j < 4 - i - 1; j++) {
            if (array[j] > array[j + 1]) {
                temp = array[j];
                array[j] = array[j + 1];
                array[j + 1] = temp;
            }
        }
    }
    printf("Sorted array is:\n");
    for (i = 0; i < 4; i++)
        printf("%d\n", array[i]);
    int k = 0;
    while (k < 50000) {
        k++;
        ticks = clock();
    }
    time(&stop);
    printf("Used %0.2f seconds of CPU time. \n", (double)ticks / CLOCKS_PER_SEC);
    printf("Finished in about %.0f seconds. \n", difftime(stop, start));
    return 0;
}

Output:

If the data is small in quantity and sorting is needed only on a few occasions, then any simple sorting technique can be used. In these cases, a simple or less efficient technique behaves on par with the complex techniques which were developed to minimize run time and space requirements, so it is pointless to look for and apply a complex algorithm. Running time can be defined as the total time taken by the sorting program to run to completion; hence, running time is one of the most important factors in the implementation of algorithms. If the amount of data to be sorted is large, then it is crucial to minimize run time by choosing an efficient technique. The two basic operations in sorting are comparing and moving records. The record moves or other operations are generally a constant factor of the number of comparisons.
Moreover, the record moves can often be considerably reduced, so run time is measured by considering only the comparisons. Calculating the exact number of comparisons may not always be possible, so an approximation is given by big-O notation. Thus the run time efficiency of each algorithm is expressed in O notation. The efficiency of most sorting algorithms lies between O(n log n) and O(n²).

Self-assessment Questions
1) The technique used for arranging data elements in a specific order is called ____________.
a) Arranging b) Filtering c) Sorting d) Distributing
2) The time required to complete the execution of a sorting program is called ____________.
a) Coding Time b) Average Time c) Running Time d) Total Time
3) A sorting technique is called stable if it _______.
a) Takes O(nlogn) time
b) Maintains the relative order of occurrence of non-distinct elements
c) Uses divide-and-conquer paradigm
d) Takes O(n) space

2.2.3 Sorting Techniques

The choice of a sorting technique depends on two important parameters. The first parameter is the execution time of the program, i.e., the time taken to execute the program. The second is space, i.e., the memory taken by the program. The algorithm you choose should be efficient in terms of both execution time and space usage. There are many techniques for sorting, for example bubble sort, selection sort, merge sort, etc.; the choice of sorting algorithm depends on the particular situation.

In-place sorting and not-in-place sorting
Sorting algorithms may require some extra space for comparison of elements and temporary storage of a few data elements. A sorting algorithm which does not require any extra space, where the sorting happens within the array itself, is called in-place sorting. Bubble sort is an example of in-place sorting; other in-place sorting algorithms include selection sort, insertion sort, heap sort, and Shell sort.
But some sorting algorithms require working space greater than or equal to the size of the list being sorted. Sorting which uses equal or more additional space for temporary storage is called not-in-place sorting; such algorithms sometimes require arrays to be separated and sorted. Merge sort is an example of not-in-place sorting. To understand the more complex and efficient sorting algorithms, it is important to first understand the simpler, but slower, algorithms. This topic deals with bubble sort, insertion sort, selection sort, merge sort and quick sort. Any of these sorting algorithms is good enough for most small tasks.

(i) The Bubble Sort
Bubble sort is an algorithm used to sort N elements given in memory, for example an array with N elements. Bubble sort compares the elements one by one and sorts them based on their values. The bubble sort makes multiple passes through a list. It compares adjacent items and exchanges those that are out of order. Each pass through the list places the next largest value in its proper place. In essence, each item "bubbles" up to the location where it belongs. Sorting takes place by stepping through the data items one by one in pairs, comparing adjacent data items and swapping each pair that is out of order. Figure 2.2.2 shows the first pass of a bubble sort. The shaded items are being compared to see if they are out of order. If there are n items in the list, then there are n−1 pairs of items that need to be compared on the first pass. It is important to note that once the largest value in the list is part of a pair, it will continually be moved along until the pass is complete.

Figure 2.2.2: First pass of Bubble sort

At the start of the second pass, as shown in Figure 2.2.3 below, the largest value is now in place. There are n−1 items left to sort, meaning that there will be n−2 pairs. Since each pass places the next largest value in place, the total number of passes necessary will be n−1.
After completing the n−1 passes, the smallest item must be in the correct position with no further processing required. The exchange operation is sometimes called a "swap". Typically, swapping two elements in a list requires a temporary storage location (an additional memory location). A code fragment such as:

temp = alist[i]
alist[i] = alist[j]
alist[j] = temp

will exchange the ith and jth items in the list. Without the temporary storage, one of the values would be overwritten.

Figure 2.2.3: Second pass of Bubble sort

Below is the code to implement bubble sort.

/* Implementation of Bubble sort Algorithm */
#include <stdio.h>

int main()
{
    int arr[300], n, i, j, swap;
    printf("Enter number of elements:\n");
    scanf("%d", &n);
    printf("Enter those %d elements\n", n);
    for(i = 0; i < n; i++)
        scanf("%d", &arr[i]);
    for(i = 0; i < (n - 1); i++)
    {
        for(j = 0; j < n - i - 1; j++)
        {
            if(arr[j] > arr[j+1])
            {
                swap = arr[j];
                arr[j] = arr[j+1];
                arr[j+1] = swap;
            }
        }
    }
    printf("After sorting using bubble sort, the elements are:\n");
    for(i = 0; i < n; i++)
        printf("%d\n", arr[i]);
    return 0;
}

Output:

Complexity Analysis of Bubble Sorting
Worst-case Time Complexity: Bubble sort will sort the array of n elements as given below.
1st Pass: n-1 comparisons and up to n-1 swaps
2nd Pass: n-2 comparisons and up to n-2 swaps
....
(n-1)th Pass: 1 comparison and up to 1 swap.
Altogether: c((n-1) + (n-2) + ... + 1), where c is the time required to do one comparison and one swap.
i.e., (n-1)+(n-2)+(n-3)+.....+3+2+1
Sum of the above series = n(n-1)/2
Sum = O(n²)
Hence the worst-case time complexity of bubble sort is O(n²).

Space Complexity: Bubble sort has a space complexity of O(1), because only one additional memory location is required, for the temp variable.

Best-case Time Complexity: The best-case time complexity is O(n), when the given list of elements is already sorted (provided the algorithm stops early once a pass makes no swaps; the listing above always runs all n−1 passes). Bubble sort is considered one of the most inefficient sorting methods.
Bubble sort exchanges elements before the final location of an element is known, thus using up time in these exchange operations. Moreover, the bubble sort algorithm makes passes through the entire unsorted portion of the list of elements.

(ii) Insertion Sort
Consider a contiguous list. In this case, it is necessary to move entries in the list to make room for an insertion, and to find the position where the insertion is to be made, we must search. One method for performing ordered insertion into a contiguous list is first to do a binary search to find the correct location, and then move the entries as required and insert the new entry. Since so much time is needed to move entries no matter how the search is done, it turns out in many cases to be just as fast to use sequential search as binary search. By doing sequential search from the end of the list, the search and the movement of entries can be combined in a single loop, thereby reducing the overhead required in the function.

Following are some of the important characteristics of insertion sort:
• It has one of the simplest implementations.
• It is efficient for smaller data sets, but very inefficient for larger lists.
• Insertion sort is adaptive: it reduces its total number of steps if given a partially sorted list, which increases its efficiency.
• It is better than the selection sort and bubble sort algorithms.
• Its space complexity is low. Like bubble sort, insertion sort requires only a single additional memory location.
• It is stable, as it does not change the relative order of elements with equal keys.
The behaviour of insertion sort when elements are equal is demonstrated in figure 2.2.4.

Figure 2.2.4: Insertion sort with equal elements

The working of the insertion sort algorithm with an example is depicted in figure 2.2.5.

How Insertion Sorting Works

Figure 2.2.5: Working of Insertion Sort Algorithm

Pseudocode:

void insertionSort(int arr[], int length)
{
    int i, j, tmp;
    for (i = 1; i < length; i++)
    {
        j = i;
        while (j > 0 && arr[j - 1] > arr[j])
        {
            tmp = arr[j];
            arr[j] = arr[j - 1];
            arr[j - 1] = tmp;
            j--;
        }
    }
}

Program:

/* Implementation of Insertion sort Algorithm */
#include <stdio.h>

int main()
{
    int arr[500], n, i, j, temp;
    printf("Enter the total number of elements:");
    scanf("%d", &n);
    printf("Enter those %d Elements : \n", n);
    for(i = 0; i < n; i++)
    {
        scanf("%d", &arr[i]);
    }
    for(i = 1; i < n; i++)
    {
        temp = arr[i];
        j = i - 1;
        /* Check j >= 0 first: && evaluates left to right, so this
           ordering ensures arr[j] is never read with a negative index. */
        while((j >= 0) && (temp < arr[j]))
        {
            arr[j+1] = arr[j];
            j--;
        }
        arr[j+1] = temp;
    }
    printf("After sorting using insertion sort, the elements are: \n");
    for(i = 0; i < n; i++)
    {
        printf("%d\n", arr[i]);
    }
    return 0;
}

Output:

Complexity Analysis of Insertion Sort
The analysis is the same as for bubble sort.
Worst Case Time Complexity: O(n²)
Best Case Time Complexity: O(n)
Average Time Complexity: O(n²)
Space Complexity: O(1)

(iii) Selection Sort
Insertion sort has one major disadvantage. Even after most entries have been sorted properly into the first part of the list, the insertion of a later entry may require that many of them be moved. All the moves made by insertion sort are moves of only one position at a time. Thus, to move an entry 20 positions up the list requires 20 separate moves. If the entries are small, perhaps a key alone, or if the entries are in linked storage, then the many moves may not require excessive time. But consider the case where the entries are very large, records containing hundreds of components like personnel files or student transcripts, and the records must be kept in contiguous storage.
In such cases it would be far more efficient if, when it is necessary to move an entry, it could be moved immediately to its final position. The selection sort method accomplishes this goal.

How Selection Sorting Works
Consider an array of n elements. The selection sort algorithm starts by comparing the first two elements of the array and swapping them if required. For example, if you want to sort the elements in ascending order and the first element is greater than the second, then it swaps them; but if the first element is smaller than the second, it leaves the elements as they are. Then the first element and the third element are compared and swapped if required. This process continues until the first and last elements of the array have been compared, completing the first pass of selection sort. After the first pass, the required element is already placed at its final position. During the second pass the algorithm starts from the second element of the array, and the whole procedure is repeated n-1 times in all. For sorting in ascending order, the smallest element will be first; for sorting in descending order, the largest element will be first. This process continues until all elements in the array are sorted. Figure 2.2.6 below demonstrates how the selection sort algorithm works:

Figure 2.2.6: Working of selection sort with example

In the first pass, 2 is found to be the smallest, so it is placed in the first position. In the second pass, 10 is found to be the smallest and is placed at the 2nd position, and so on until the full list is sorted.
Sorting using Selection Sort Algorithm

void selectionSort(int a[], int size)
{
    int i, j, min, temp;
    for(i = 0; i < size - 1; i++)
    {
        min = i;                /* setting min as i */
        for(j = i + 1; j < size; j++)
        {
            if(a[j] < a[min])   /* if element at j is less than element at min position */
            {
                min = j;        /* then set min as j */
            }
        }
        temp = a[i];
        a[i] = a[min];
        a[min] = temp;
    }
}

Selection sort algorithm implementation in C

/* Implementation of Selection sort Algorithm */
#include <stdio.h>

int main()
{
    int arr[200], i, j, n, t, min, pos;
    printf("Enter the total number of elements:");
    scanf("%d", &n);
    printf("Enter those %d elements:\n", n);
    for(i = 0; i < n; i++)
        scanf("%d", &arr[i]);
    for(i = 0; i < n - 1; i++)
    {
        min = arr[i];
        pos = i;
        for(j = i + 1; j < n; j++)
        {
            if(min > arr[j])    /* Compare values */
            {
                min = arr[j];
                pos = j;
            }
        }
        t = arr[i];             /* Swap the values */
        arr[i] = arr[pos];
        arr[pos] = t;
    }
    printf("\n After sorting using selection sort, the elements are:\n");
    for(i = 0; i < n; i++)
        printf("%d \n", arr[i]);
    return 0;
}

Output:

Complexity Analysis of Selection Sorting
Worst Case Time Complexity: O(n²)
Best Case Time Complexity: O(n²)
Average Time Complexity: O(n²)
Space Complexity: O(1)

Did you Know?
The worst-case time complexity of bubble sort, selection sort and insertion sort is n².

(iv) Merge Sort
Merge sort is a fine example of a recursive algorithm. The fundamental operation in this algorithm is merging two sorted lists. Because the lists are sorted, this can be done in one pass through the input if the output is put in a third list. Merge sort is a sorting technique based on the divide and conquer technique. Merge sort first divides the array into equal halves and then combines them in a sorted manner. The basic merging algorithm takes two input arrays A and B, an output array C, and three counters, aptr, bptr, and cptr, which are initially set to the beginning of their respective arrays. The smaller of A[aptr] and B[bptr] is copied to the next entry in C, and the appropriate counters are advanced.
When either input list is exhausted, the remainder of the other list is copied to C. An example of how the merge routine works is provided for the following input. If the array A contains 1, 13, 24, 26, and B contains 2, 15, 27, 38, then the algorithm proceeds as follows: First, a comparison is done between 1 and 2. 1 is added to C, and then 13 and 2 are compared. 2 is added to C, and then 13 and 15 are compared. 13 is added to C, and then 24 and 15 are compared. This proceeds until 26 and 27 are compared. 26 is added to C, and the A array is exhausted. The remainder of the B array is then copied to C.

The time to merge two sorted lists is clearly linear, because at most n - 1 comparisons are made, where n is the total number of elements. Note that every comparison adds an element to C, except the last comparison, which adds at least two.

The merge sort algorithm is therefore easy to describe. If n = 1, there is only one element to sort, and the answer is at hand. Otherwise, recursively merge sort the first half and the second half. This gives two sorted halves, which can then be merged together using the merging algorithm described above. For instance, to sort the eight-element array 24, 13, 26, 1, 2, 27, 38, 15, recursively sort the first four and last four elements, obtaining 1, 13, 24, 26, 2, 15, 27, 38. Then merge the two halves as above, obtaining the final list 1, 2, 13, 15, 24, 26, 27, 38.

This algorithm is a classic divide-and-conquer strategy. The problem is divided into smaller problems that are solved recursively, and the conquering phase consists of patching together the answers. Divide-and-conquer is a very powerful use of recursion that will be seen many times.

Algorithm
Merge sort keeps on dividing the list into equal halves until it can be divided no more. By definition, if there is only one element in the list, it is sorted. Merge sort then combines the smaller sorted lists, keeping the resulting list sorted too.
Step 1 − If there is only one element in the list, it is already sorted; return.
Step 2 − Divide the list recursively into two halves until it can be divided no more.
Step 3 − Merge the smaller lists into a new list in sorted order.

Pseudocode
We shall now see the pseudocode for the merge-sort functions. As our algorithm points out, there are two main functions − divide and merge. Merge sort works with recursion, and we shall see our implementation in the same way.

procedure mergesort( var a as array )
    if ( n == 1 ) return a

    var l1 as array = a[0] ... a[n/2]
    var l2 as array = a[n/2+1] ... a[n]

    l1 = mergesort( l1 )
    l2 = mergesort( l2 )

    return merge( l1, l2 )
end procedure

procedure merge( var a as array, var b as array )
    var c as array

    while ( a and b have elements )
        if ( a[0] > b[0] )
            add b[0] to the end of c
            remove b[0] from b
        else
            add a[0] to the end of c
            remove a[0] from a
        end if
    end while

    while ( a has elements )
        add a[0] to the end of c
        remove a[0] from a
    end while

    while ( b has elements )
        add b[0] to the end of c
        remove b[0] from b
    end while

    return c
end procedure

/* Implementation of Merge sort Algorithm */
#include <stdio.h>

int arr[20], i, n, b[20];

void merge(int arr[], int low, int m, int high)
{
    int h, i, j, k;
    h = low;
    i = low;
    j = m + 1;
    while(h <= m && j <= high)
    {
        if(arr[h] <= arr[j])
            b[i] = arr[h++];
        else
            b[i] = arr[j++];
        i++;
    }
    if(h > m)
        for(k = j; k <= high; k++)
            b[i++] = arr[k];
    else
        for(k = h; k <= m; k++)
            b[i++] = arr[k];
    for(k = low; k <= high; k++)
    {
        arr[k] = b[k];
    }
}

void mergesort(int arr[], int i, int j)
{
    int m;
    if(i < j)
    {
        m = (i + j) / 2;
        mergesort(arr, i, m);
        mergesort(arr, m + 1, j);
        merge(arr, i, m, j);
    }
}

int main()
{
    printf("\nEnter the number of elements:");
    scanf("%d", &n);
    printf("Enter those %d elements:", n);
    for(i = 0; i < n; i++)
        scanf("%d", &arr[i]);
    mergesort(arr, 0, n - 1);
    printf("\nAfter sorting using merge sort, the elements are: ");
    for(i = 0; i < n; i++)
        printf("%d\n", arr[i]);
    return 0;
}

Output:
The output of the program should be as follows:

Analysis of Merge Sort
The merge sort algorithm uses a divide and conquer strategy.
This is a recursive algorithm that continuously divides the list of elements into two parts.

Case 1: If the list is empty or has only a single item, then the list is already sorted. This is the best case.

Case 2: If the list contains N elements, the algorithm divides the list in half and performs merge sorting individually on both halves. Once both parts are sorted, a merge operation is performed to combine the already-sorted smaller parts.

Worst Case Time Complexity: In the worst case, a comparison is required at every step. This is because in every merge step, one value will remain in the opposing list, so the merge sort algorithm must keep comparing the elements in the opposing lists. The worst-case recurrence for merge sort is:

T(N) = 2T(N/2) + N - 1        Equation 1

where T(N) is the total number of comparisons between the elements in a list and N refers to the total number of elements in the list. 2T(N/2) shows that merge sort is performed on two halves of the list during the divide stage, and N - 1 represents the total comparisons in the merge stage.

This merge sort procedure is recursive, so the recurrence can be expanded by substitution. Substituting Equation 1 into itself (first for T(N/2), then for T(N/4)) gives:

T(N) = 2[2T(N/4) + N/2 - 1] + N - 1        Equation 2
T(N) = 4[2T(N/8) + N/4 - 1] + 2N - 3        Equation 3

Expanding Equation 3 gives the form reached after the 3rd recursive call:

T(N) = 8T(N/8) + N + N + N - 4 - 2 - 1        Equation 4

Let us consider a value k representing the depth of the recursion. Recursion stops when the list contains only one element. In general we get:

T(N) = 2^k T(N/2^k) + kN − (2^k − 1)        Equation 5

This procedure of dividing continues until the list contains a single element, and we know that a list with a single element is already sorted.
T(1) = 0        Equation 6
2^k = N        Equation 7
k = log2 N        Equation 8
T(N) = N log2 N − N + 1        Equation 9

Hence, the worst-case time complexity of the merge sort algorithm is O(N log N).

Best Case Time Complexity: The best case occurs when, for every merge step, the largest element of one sorted part is smaller than the first element of its opposing part. Then only one element of the opposing list is compared at a time, reducing the number of comparisons in each merge step to N/2. The best-case time complexity is still O(N log N), because the merging is always linear. The same holds for the average-case time complexity.

(v) Quick Sort
Even though the time complexity of the merge sort algorithm is O(n log n), it is not always desirable, as it consumes more space: it needs extra space to merge the array partitions. Quick sort is one of the fastest sorting algorithms. The quick sort algorithm also uses the divide and conquer rule, but sorts the elements without using additional storage.

A quick sort algorithm first selects a value, called the pivot value. The algorithm then partitions all elements based on whether they are smaller or greater than the pivot element. Thus we get two partitions: one partition having elements larger than the pivot element and another partition having elements smaller than the pivot element. The selected pivot element ends up in its final sorted position, so the elements to the right and left of the pivot element can then be sorted independently. Hence, we can again implement a recursive algorithm to sort the elements using the divide and conquer approach. All the partitioned elements remain in the same array, saving the space where they are combined together.

For sorting the elements we use a recursive function, to which we pass both partitions of the array along with the pivot element as parameters.
Our prior sorting functions, however, have no parameters, so for consistency of notation the recursion is done in a function recursive_quick_sort that is invoked by the method quick_sort, which has no parameters. Quick sort, as the name suggests, sorts any list very quickly. Quick sort is not a stable sort, but it is very fast and requires very little additional space. It is based on the rule of divide and conquer (it is also called partition-exchange sort). This algorithm divides the list into three main parts:
1. Elements less than the pivot element
2. The pivot element
3. Elements greater than the pivot element

In the list of elements mentioned in the example below, we have taken 25 as the pivot. So after the first pass, the list will be changed like this:

6 8 17 14 25 63 37 52

Hence, after the first pass, the pivot is set at its position, with all the elements smaller than it on its left and all the elements larger than it on its right. Now 6 8 17 14 and 63 37 52 are considered as two separate lists, the same logic is applied to each of them, and we keep doing this until the complete list is sorted. The working of the quick sort algorithm is shown in figure 2.2.7.

How Quick Sorting Works

Figure 2.2.7: Divide and Conquer-Quick Sort

QuickSort Pivot Algorithm
Based on our understanding of partitioning in quick sort, we can now write an algorithm for it:
Step 1 − Choose the highest index value as the pivot
Step 2 − Take two variables to point left and right of the list, excluding the pivot
Step 3 − left points to the low index
Step 4 − right points to the high index
Step 5 − While the value at left is less than the pivot, move left rightwards
Step 6 − While the value at right is greater than the pivot, move right leftwards
Step 7 − If steps 5 and 6 have both stopped (both values are out of place), swap the values at left and right
Step 8 − When left ≥ right, the point where they met is the pivot's new position

QuickSort Pivot Pseudocode
The pseudocode for the above algorithm can be derived as −

function partitionFunc(left, right, pivot)
    leftPointer = left - 1
    rightPointer = right

    while True do
        while A[++leftPointer] < pivot do
            // do nothing
        end while

        while rightPointer > 0 && A[--rightPointer] > pivot do
            // do nothing
        end while

        if leftPointer >= rightPointer
            break
        else
            swap leftPointer, rightPointer
        end if
    end while

    swap leftPointer, right
    return leftPointer
end function

QuickSort Algorithm
Using the pivot algorithm recursively, we end up with smaller and smaller partitions; each partition is then processed for quick sort. We define the recursive algorithm for quick sort as follows −

Step 1 − Make the right-most index value the pivot
Step 2 − Partition the array using the pivot value
Step 3 − Quick sort the left partition recursively
Step 4 − Quick sort the right partition recursively

QuickSort Pseudocode
To get more into it, let us see the pseudocode for the quick sort algorithm −

procedure quickSort(left, right)
    if right - left <= 0
        return
    else
        pivot = A[right]
        partition = partitionFunc(left, right, pivot)
        quickSort(left, partition - 1)
        quickSort(partition + 1, right)
    end if
end procedure

Sorting using Quick Sort Algorithm

/* a[] is the array, p is the starting index, that is 0, and r is the last index of the array.
*/
void quicksort(int a[], int p, int r)
{
    if(p < r)
    {
        int q;
        q = partition(a, p, r);
        quicksort(a, p, q);
        quicksort(a, q+1, r);
    }
}

/* Hoare-style partition. The pointers are advanced before each
   comparison; without this, two elements equal to the pivot can be
   swapped back and forth forever and the loop never terminates. */
int partition(int a[], int p, int r)
{
    int i, j, pivot, temp;
    pivot = a[p];
    i = p - 1;
    j = r + 1;
    while(1)
    {
        do { i++; } while(a[i] < pivot);
        do { j--; } while(a[j] > pivot);
        if(i < j)
        {
            temp = a[i];
            a[i] = a[j];
            a[j] = temp;
        }
        else
        {
            return j;
        }
    }
}

/* Implementation of Quick sort Algorithm */
#include <stdio.h>
#include <stdbool.h>

#define MAX 7

int intArray[MAX] = {14, 6, 23, 12, 76, 49, 57};

void printline(int count)
{
    int i;
    for(i = 0; i < count - 1; i++)
    {
        printf("=");
    }
    printf("=\n");
}

void display()
{
    int i;
    printf("[");
    // navigate through all items
    for(i = 0; i < MAX; i++)
    {
        printf("%d ", intArray[i]);
    }
    printf("]\n");
}

void swap(int num1, int num2)
{
    int temp = intArray[num1];
    intArray[num1] = intArray[num2];
    intArray[num2] = temp;
}

int partition(int left, int right, int pivot)
{
    int leftPointer = left - 1;
    int rightPointer = right;
    while(true)
    {
        while(intArray[++leftPointer] < pivot)
        {
            // do nothing
        }
        while(rightPointer > 0 && intArray[--rightPointer] > pivot)
        {
            // do nothing
        }
        if(leftPointer >= rightPointer)
        {
            break;
        }
        else
        {
            printf(" item swapped :%d,%d\n", intArray[leftPointer], intArray[rightPointer]);
            swap(leftPointer, rightPointer);
        }
    }
    printf(" pivot swapped :%d,%d\n", intArray[leftPointer], intArray[right]);
    swap(leftPointer, right);
    printf("Updated Array: ");
    display();
    return leftPointer;
}

void quickSort(int left, int right)
{
    if(right - left <= 0)
    {
        return;
    }
    else
    {
        int pivot = intArray[right];
        int partitionPoint = partition(left, right, pivot);
        quickSort(left, partitionPoint - 1);
        quickSort(partitionPoint + 1, right);
    }
}

int main()
{
    printf("\nBefore Sorting: ");
    display();
    printline(50);
    quickSort(0, MAX - 1);
    printf("\nAfter sorting using quick sort, the elements are: ");
    display();
    printline(50);
    return 0;
}

Output:

Complexity Analysis of Quick Sort
Worst Case Time Complexity: O(n²)
Best Case Time Complexity: O(n log n)
Average Time Complexity: O(n
log n)
Space Complexity: O(log n) on average, for the recursion stack

• The space required by quick sort is very little: only O(log n) additional stack space on average (O(n) in the worst case), since the partitioning itself happens in place.
• Quick sort is not a stable sorting technique, so it might change the relative order of two equal elements in the list while sorting.

Analysis of Quick sort
To analyse the running time of quick sort, we use the same approach as we did for merge sort (as is common for many recursive algorithms, unless they are completely obvious). Let T(n) represent the worst-case running time of the quick sort algorithm on an array of size n. To get a hold of T(n), we look at the algorithm line by line. The call to partition takes time Θ(n), because it runs one linear scan through the array, plus some constant time. Then we have two recursive calls to quick sort. Let

k = m − 1 − l

denote the size of the left subarray. Then the first recursive call takes time T(k), because it is a call on an array of size k. The second recursive call takes time T(n − 1 − k), because the size of the right subarray is n − 1 − k. Therefore, the total running time of quick sort satisfies the recurrence

T(n) = Θ(n) + T(k) + T(n − 1 − k),    T(1) = Θ(1).

This is quite a bit messier-looking than the recurrence for merge sort, and since we know nothing about k, solving this recurrence directly isn't feasible. We can, however, explore different possible values of k.

1. For k = n/2, the recurrence becomes much simpler: T(n) = Θ(n) + T(n/2) + T(n/2 − 1), which, as we discussed in the context of merge sort, we can simplify to T(n) = Θ(n) + 2T(n/2). That is exactly the recurrence we already solved for merge sort, and thus the running time of quick sort would be Θ(n log n).

2. At the other extreme is k = 0 (or, similarly, k = n − 1). Then we get T(n) = Θ(n) + T(0) + T(n − 1), and since T(0) = Θ(1), this recurrence becomes T(n) = Θ(n) + T(n − 1). This recurrence unrolls as T(n) = Θ(n) + Θ(n − 1) + Θ(n − 2) + . . .
+ Θ(1) = Θ(n²).

The running time for k = 0 or k = n − 1 is thus just as bad as for the simple algorithms, and in fact, for k = 0, quick sort is essentially the same as selection sort. Of course, this quadratic running time would not be a problem if the cases k = 0 and k = n − 1 did not appear in practice. But in fact they do: with the pivot choice we implemented, these cases happen whenever the array is already sorted (increasingly or decreasingly), which should actually be an easy case. They also happen if the array is nearly sorted. This is quite likely in practice, for instance because the array may have been sorted earlier and then just messed up a little with some new insertions.

Did you Know?
Quicksort (sometimes called partition-exchange sort) is an efficient sorting algorithm, serving as a systematic method for placing the elements of an array in order. Developed by Tony Hoare in 1959, with his work published in 1961, it is still a commonly used algorithm for sorting. When implemented well, it can be about two or three times faster than its main competitors, merge sort and heapsort.

Self-assessment Questions
4) Which of the following is an example of a not-in-place sorting algorithm?
a) Bubble Sort b) Merge Sort c) Selection Sort d) Heap Sort
5) A sorting algorithm that does not require any extra space for sorting is known as ________________.
a) In-Place Sorting b) Out-Place Sorting c) Not in-Place Sorting d) Not Out-Place Sorting
6) Which of the following is not a stable sorting algorithm?
a) Insertion sort b) Selection sort c) Bubble sort d) Merge sort
7) Running merge sort on an array of size n which is already sorted takes,
a) O(nlogn) b) O(n) c) O(n²) d) O(n³)
8) Merge sort uses,
a) Divide-and-conquer b) Backtracking c) Heuristic approach d) Greedy approach
9) For merging two sorted lists of size m and n into a sorted list of size m+n, we require comparisons of:
a) O(m) b) O(n) c) O(m+n) d) O(logm + logn)
10) Quick sort is also known as _____________.
a) Merge sort b) Tree sort c) Shell sort d) Partition and exchange sort

Summary
o Bubble sort is a simple sorting algorithm. It compares the first two elements, and if the first is greater than the second, it swaps them. It continues doing this for each pair of adjacent elements to the end of the data set, repeating until no swaps have occurred on the last pass.
o Selection sort is an in-place comparison sort. It has O(n²) complexity, making it inefficient on large lists, and it generally performs worse than the similar insertion sort.
o Insertion sort is a simple sorting algorithm that is relatively efficient for small lists and mostly sorted lists, and it is often used as part of more sophisticated algorithms. It works by taking elements from the list one by one and inserting them in their correct position in a new sorted list.
o Merge sort takes advantage of the ease of merging already-sorted lists into a new sorted list. It starts by comparing every two elements (i.e., 1 with 2, then 3 with 4, ...) and swapping them if the first should come after the second. It then merges each of the resulting lists.
o Quicksort is a divide and conquer algorithm which relies on a partition operation: to partition an array, an element called a pivot is selected. All elements smaller than the pivot are moved before it and all greater elements are moved after it.

Terminal Questions
1. Explain different types of sorting algorithms.
2. Write down the procedure for Bubble sort.
3.
Explain the sorting technique based on the divide and conquer policy and find its time complexity.
4. Explain the merge sort algorithm and find its time complexity.

Answer Keys
Self-assessment Questions
Question No. Answer
1 c
2 c
3 b
4 b
5 a
6 b
7 a
8 a
9 c
10 d

Activity
1. Activity Type: Offline
Description:
1. Divide the class into 5 groups.
2. Assign an algorithm and a list of numbers to each group.
3. Students should sort the list using the assigned algorithm.
Duration: 10 Minutes

Bibliography
e-Reference
• pages.cs.wisc.edu, (2016). Computer Sciences User Pages. Retrieved on 19 April 2016, from http://pages.cs.wisc.edu/~bobh/367/SORTING.html
External Resources
• Kruse, R. (2006). Data Structures and Program Designing Using 'C' (2nd ed.). Pearson Education.
• Srivastava, S. K., & Srivastava, D. (2004). Data Structures Through C in Depth (2nd ed.). BPB Publications.
• Weiss, M. A. (2001). Data Structures and Algorithm Analysis in C (2nd ed.). Pearson Education.
Video Links
Introduction to Basics of sorting techniques: https://www.youtube.com/watch?v=pkkFqlG0Hds
Selection Sort: https://www.youtube.com/watch?v=LeNbr2ftWIo
Merge Sort: https://www.youtube.com/watch?v=TzeBrDU-JaY

MODULE - III Stacks and Queues

MODULE 3 Stacks and Queues

Module Description
This module introduces two closely-related data types for manipulating large collections of objects: the stack and the queue. Each of them is basically defined by two simple operations: insert or add a new item, and remove an item. When we add a data item we have a clear intention. However, when we remove an item, we must decide which one to choose. For example, the rule used in the case of a queue is to always remove the item that has been in the queue for the longest time. This policy is known as first-in-first-out or FIFO. The rule used in the case of a stack is that we always remove the element that has been in the stack for the least amount of time. This policy is known as last-in-first-out or LIFO.
Chapter 3.1 Stacks Chapter 3.2 Queue Chapter Table of Contents Chapter 3.1 Stacks Aim ..................................................................................................................................................... 139 Instructional Objectives................................................................................................................... 139 Learning Outcomes .......................................................................................................................... 139 3.1.1 Introduction to Stack .............................................................................................................. 140 (i) Definition of a Stack........................................................................................................... 141 (ii) Array Representation of Stack ......................................................................................... 142 Self-assessment Questions ...................................................................................................... 144 3.1.2 Operations on Stack ................................................................................................................ 144 Self-assessment Questions ...................................................................................................... 149 3.1.3 Polish Notations ...................................................................................................................... 149 (i) Infix Notation ..................................................................................................................... 150 (ii) Prefix Notation .................................................................................................................. 151 (iii) Postfix Notation ............................................................................................................... 
152 Self-assessment Questions ...................................................................................................... 155 3.1.4 Conversion of Arithmetic Expression from Infix to Postfix ............................................. 155 Self-assessment Questions ...................................................................................................... 159 3.1.5 Applications of Stack .............................................................................................................. 160 (i) Balancing Symbol ............................................................................................................... 160 (ii) Recursion............................................................................................................................ 161 (iii) Evaluation of Postfix Expression.................................................................................... 163 (iv) String Reversal .................................................................................................................. 163 Self-assessment Questions ...................................................................................................... 165 Summary ........................................................................................................................................... 166 Terminal Questions.......................................................................................................................... 167 Answer Keys...................................................................................................................................... 168 Activity............................................................................................................................................... 169 Case Study ......................................................................................................................................... 
170 Bibliography ...................................................................................................................................... 171 e-References ...................................................................................................................................... 171 External Resources ........................................................................................................................... 171 Video Links ....................................................................................................................................... 171

Aim

To educate and equip the students with the skills and techniques of stacks.

Instructional Objectives

After completing this chapter, you should be able to:
• Outline the basic features of a stack
• Describe the array representation of a stack
• Explain Polish notations with examples
• Discuss the evaluation of a postfix expression using a stack
• Explain the steps to convert an infix expression to a postfix expression and vice versa
• Outline the applications of stacks

Learning Outcomes

At the end of this chapter, you are expected to:
• Explain operations on a stack
• Convert given infix expressions to prefix and postfix expressions
• Explain the string reversal and recursion applications of a stack
• Compute a given postfix expression using a stack
• Derive the prefix expression for any given infix expression

139

3.1.1 Introduction to Stack

In this chapter we introduce the stack as a limited-access data structure. A stack is used for manipulating arbitrarily large collections of data, and it maintains its objects in a particular order. This chapter also explains how to operate on a stack: it demonstrates the operations for creating a stack, adding elements to a stack, deleting an element from a stack, etc.
Some problems have solutions that require the associated data to be arranged or organized as a linear list of elements in which operations are permitted to take place at only one end of the list. The best and simplest examples are a set of books kept one on top of another, a set of playing cards, a stack of pancakes, arranged laundry, stacked plates one above another, etc. Here, we group things together by placing one thing on top of another, and then we have to remove things from top to bottom one at a time. Figure 3.1.1 below shows a set of books represented as a stack.

Figure 3.1.1: Picture Representing a Stack

It is interesting that something so simple is a critical part of nearly every program that is written. The nested function calls in a running program, conversion of an infix form of an expression to an equivalent postfix or prefix form, computing the factorial of a number, and so on can be accurately formulated using this simple technique. In all the above cases, it is clear that the item which most recently entered the list is the first one to be operated on. The solution to these types of problems is based on the principle Last-In-First-Out (LIFO) or First-In-Last-Out (FILO). A logical structure which organizes data and performs operations on the LIFO or FILO principle is termed a Stack.

140

(i) Definition of a Stack

A stack is an ordered list of similar data items in which operations such as insertion and deletion are permitted only at one end, called the top of the stack. It is a linear data structure in which operations are performed on data objects on the principle of Last-In-First-Out or First-In-Last-Out. More formally, a stack can be defined as an abstract data type with a domain of data objects and a set of functions that can be performed on those objects, guided by a list of axioms. Some of the important functions used while operating on stacks are listed below:

1. Create-Stack() - Used for allocating memory
2.
Isempty(S) - Used for checking if the stack is empty or not; it returns a Boolean
3. Isfull(S) - Used for checking if the stack is full; this also returns a Boolean
4. Push(S, e) - Used to add an element on top of the stack
5. Pop(S) - Used to remove an element from the top of the stack
6. Top(S) - Used to display the topmost element of the stack

Also, some axioms need to be known while we perform operations on stacks. Following is a list of axioms which a programmer must know:

• Isempty(Create-Stack()): Always returns true
• Isfull(Create-Stack()): Always returns false
• Isempty(Push(S, e)): Always returns false
• Isfull(Pop(S)): Always returns false
• Top(Push(S, e)): The element e is displayed
• Pop(Push(S, e)): The element e is removed from the stack

The detailed explanation and algorithms for implementing the above operations will be covered in forthcoming sections. Figure 3.1.2 demonstrates push() and pop() operations performed on a stack.

141

As shown in the figure, initially the stack contains element 1. To push element 2, the stack pointer is incremented and then element 2 is stored; the stack now contains two elements, 1 and 2. In the second step, we push element 3 onto the stack, so it is placed on top of 2, as the top of the stack points one location above 2. Similarly, elements 4, 5 and 6 are pushed onto the stack; after pushing element 6 the stack contains a total of 6 elements. In the second part of the figure, pop instructions are executed. The first element we can read out is 6, as it is on the top of the stack, and the stack pointer is decremented. The next time we execute a pop instruction, element 5 is removed, and so on until the last element 1 is removed.

Figure 3.1.2: Push() and Pop() Operations

(ii) Array Representation of Stack

As we know, a stack is a data structure designed to store a collection of data where the data can be added and removed from only one end. We can implement this stack using a simple linear array.
An array is a collection of similar kinds of elements, so we can create a stack using a one-dimensional array very easily. For example, we can declare an array named stack[] to store all the data elements of a stack. Normally, elements in a linear array can be accessed in any random order by using the array name and an index.

142

But a stack operates from only one end. Thus, when a stack is implemented as an array, we should allow insertion and deletion of elements from only one end of the array. A variable named “top” will keep track of the position of the topmost element in the stack. This variable is also called the stack pointer. Initially the value of “top” is -1, as the stack is empty. When we push an element onto the stack, we increment the stack pointer by one and then insert the element at the position where the stack pointer is pointing. For every push operation we have to check whether the stack pointer has reached the maximum size of the array stack[]. Similarly, when we perform a pop operation, the stack pointer is decremented by 1. We should also check whether the stack array is empty or not. Figure 3.1.3 demonstrates the array representation of a stack.

Figure 3.1.3: Array Representation of Stack

As shown in the above figure, an array named S with size 7 is declared, which acts as a stack; thus we can store a total of 7 elements in this stack. In part (a) of the figure, after adding elements 15, 6, 2, and 9, the stack pointer is pointing to location 4. In part (b), we have pushed two more elements, 17 and 3, making the stack pointer have value 6. Part (c) shows how a pop operation is carried out on that array, causing the last element 3 to be popped out; the stack pointer is decremented by 1 after the pop operation.
143

Self-assessment Questions

1) A stack works on the principle of ___________
a) First in first out (FIFO)
b) Last in last out (LILO)
c) First in last out (FILO)
d) Cyclical data structures

2) The difference between a linear array and a stack is that any element can be accessed randomly.
a) True
b) False

3) Top(S) returns ________________
a) Stack bottom
b) Stack top
c) Stack mid
d) Any random element

3.1.2 Operations on Stack

The operations discussed in the topics above are explained in detail in this section. Basically, stack operations include initializing a stack, using it for storing data in different applications, and de-initializing it. Apart from these basics, a stack is used for carrying out the following two operations:

1. Push() – storing a data item in the stack
2. Pop() – deleting a data item from the stack

Consider the operation of pushing data onto the stack. In order to use the stack most efficiently, we need to be aware of the status of the stack. For this purpose, the following functions are important:

1. stacktop() – This function is used for displaying the topmost element in the stack
2. isfull() – This function is used to check if the stack is already full
3. isempty() – This function is used to check if the stack is empty

144

Throughout, we must maintain a pointer to the most recently pushed data on the stack. This pointer always represents the top of the stack and hence is named top. Before we proceed to implement the push() operation, we must first learn the procedure for these support functions.
Algorithm for stacktop() function

begin procedure stacktop
   return stack[top]
end procedure

Implementation in C programming:

int stacktop() {
   return stack[top];
}

Algorithm for isfull() function

begin procedure isfull
   if top equals MAXSIZE - 1
      return true
   else
      return false
   end if
end procedure

Implementation in C programming:

bool isfull() {
   if(top == MAXSIZE - 1)
      return true;
   else
      return false;
}

Algorithm for isempty() function

begin procedure isempty
   if top less than 0
      return true
   else
      return false
   end if
end procedure

145

Implementation in C programming:

bool isempty() {
   if(top == -1)
      return true;
   else
      return false;
}

Now, to get back to the push operation, we must first understand how the push() function works. The following steps are involved:

• Step 1: Check if the stack is full.
• Step 2: If the stack is full, display an error and exit.
• Step 3: If the stack is not full, increment top to point to the next empty space.
• Step 4: Add the element to the stack at the position where top is pointing.
• Step 5: Return.

Figure 3.1.4: Push Operation on Stack

Note: If a linked list is used for the stack implementation, then memory space needs to be allocated in step 3.

Following is the algorithm for the push operation:

begin procedure push: stack, data
   if stack is full
      return null
   end if

146

   top ← top + 1
   stack[top] ← data
end procedure

And the corresponding C function is shown below:

void push(int data) {
   if(!isfull()) {
      top = top + 1;
      stack[top] = data;
   } else {
      printf("Could not insert data, Stack is full.\n");
   }
}

Now we move on to the pop operation. Accessing a data element while removing it from the stack is called the pop operation. Following are the steps involved in popping an element from the stack:

• Step 1 − Check if the stack is empty.
• Step 2 − If the stack is empty, produce an error and exit.
• Step 3 − If the stack is not empty, access the data element at which top is pointing.
• Step 4 − Decrease the value of top by 1.
• Step 5 − Return.

147

Figure 3.1.5: Pop Operation on Stack

Algorithm for the implementation of the pop operation:

begin procedure pop: stack
   if stack is empty
      return null
   end if

   data ← stack[top]
   top ← top - 1
   return data
end procedure

Corresponding C function:

int pop() {
   int data;
   if(!isempty()) {
      data = stack[top];
      top = top - 1;
      return data;
   } else {
      printf("Could not retrieve data, Stack is empty.\n");
      return -1;
   }
}

148

Self-assessment Questions

4) Match the following
1. stacktop()   A. Used for checking if the stack is empty
2. isfull()     B. Used for displaying the topmost element
3. isempty()    C. Used for checking if the stack is full

5) What does push(x) do to the stack?
a) Removes x from the stack
b) Adds x to the topmost element
c) Adds x to all the elements
d) Adds x to the top of the stack

6) What does pop() do to the stack?
a) Removes the topmost element from the stack
b) Adds x to the topmost element
c) Adds x to all the elements
d) Adds x to the top of the stack

3.1.3 Polish Notations

First we need to understand arithmetic expressions. An arithmetic expression is an expression which, when evaluated, results in a numeric value. The method of writing an arithmetic expression is known as a notation. The same arithmetic expression can be written in different ways without changing its essence or meaning. Consider the expression:

(5-6)*7

It can be written in its prefix form as “*(- 5 6) 7”. Since all the arithmetic operators here are binary, the brackets are not necessary, so the same expression can also be written as

*-567

149

Consider the expression “1+2”, which adds the values 1 and 2. In prefix notation the operator precedes the operands, so it becomes “+ 1 2”. In the expression above, the product depends upon the availability of its two operands, i.e., 5-6 and 7. Normally, the innermost expressions are evaluated first, but in prefix notation operators are written ahead of their operands.
Thus the infix notation with parentheses looks like

(5 – 6) * 7

and without parentheses it becomes

5 – 6 * 7

which changes the semantics, or meaning, of the expression because of the precedence rule: it would now be evaluated as 5 – (6 * 7). Correspondingly, the Polish notation of 5 – (6 * 7) is

– 5 * 6 7

Polish notation

Polish notation, also called Polish prefix notation or simply prefix notation, is a symbolic logic invented by the Polish mathematician Jan Lukasiewicz. It is a form of notation for logic, arithmetic, and algebra. In prefix notation, the operators are placed to the left of their operands. If each operator has a fixed arity (number of operands), the result is a syntax lacking parentheses or other brackets that can still be parsed without ambiguity. The term Polish notation also covers Polish postfix notation, or Reverse Polish notation, in which the operators are placed after the operands.

(i) Infix Notation

As already discussed in the previous section, infix notation is the most common and simplest notation, in which an operator is placed between two operands. This notation is also known as the general form of an arithmetic expression. For example, the arithmetic expression for adding two operands can be written in infix form as

A+B

In this example A and B are two operands and + is the operator.

150

Another example of an infix expression is

A + B * C + (E – G)

These expressions follow the normal arithmetic precedence rules. To evaluate the above expression, the first precedence is given to multiplication, so the product of B and C is calculated first. Second precedence is given to the parentheses, so the result of E – G is calculated next. Then A, the product of B and C, and the difference of E and G are added together.

(ii) Prefix Notation

This is also called the Polish method. In this method, the operator precedes its operands, i.e., the instruction precedes the data. Here the order of operators and operands determines the result, making parentheses unnecessary.
Taking an example, consider the infix expression 3 * (4 + 5). This could be expressed in prefix form as

*3+45

This is in contrast with the traditional algebraic methodology for performing mathematical operations, the order of operations. In the expression 3 * (4 + 5), we first work inside the parentheses to add four and five, and then multiply the result by three.

Did you know?

In the early days of the calculator, the end-user would write down the results of every step when using the algebraic order of operations. Not only did this slow things down, it provided an opportunity for the end-user to make errors and sometimes defeated the purpose of using a calculating machine. In the 1960s, engineers at Hewlett-Packard decided that it would be easier for end-users to learn Jan Lukasiewicz's logic system than to try to use the order of operations on a calculator. They modified Jan Lukasiewicz's system for a calculator keyboard by placing the instructions (operators) after the data. In homage to Jan Lukasiewicz's Polish logic system, the engineers at Hewlett-Packard called their modification Reverse Polish Notation (RPN).

151

(iii) Postfix Notation

Just opposite to prefix notation is postfix notation. Here the operands precede the operator, i.e., the operator is placed after its operands, and hence it is called postfix notation. It is also called reverse Polish notation. The infix expression A+B can be written in postfix as AB+. Below are some examples of expressions represented in all three notations:

Infix          Prefix     Postfix
A+B            +AB        AB+
A+B*C          +A*BC      ABC*+
(A+B)*(C-D)    *+AB-CD    AB+CD-*

Algorithm for evaluation of a postfix expression

Consider a string of a postfix arithmetic expression consisting of operands and operators. The steps given below should be followed to evaluate a postfix expression:

• Scan the string from left to right.
• Skip all the operands and values.
• If an operator is found, perform the operation on the preceding two operands.
• Now replace these (two operands and an operator) with one operand, i.e., the result of the operation.

152

• Continue the process until a single value remains, which is the result of the expression.

Program:

#include<stdio.h>
#include<string.h>
#include<stdlib.h>
#define MAX 50

int stack[MAX];
char post[MAX];
int top = -1;

void pushstack(int tmp);
void calculator(char c);

int main()
{
    int i;
    printf("Insert a postfix notation :: ");
    gets(post);
    for(i = 0; i < strlen(post); i++)
    {
        if(post[i] >= '0' && post[i] <= '9')
        {
            pushstack(i);
        }
        if(post[i] == '+' || post[i] == '-' || post[i] == '*' ||
           post[i] == '/' || post[i] == '^')
        {
            calculator(post[i]);
        }
    }
    printf("\n\nResult :: %d", stack[top]);
    return 0;
}

void pushstack(int tmp)
{
    top++;

153

    stack[top] = (int)(post[tmp] - 48);   /* convert digit character to its value */
}

void calculator(char c)
{
    int a, b, ans, i;
    a = stack[top];
    top--;
    b = stack[top];
    top--;
    switch(c)
    {
        case '+': ans = b + a; break;
        case '-': ans = b - a; break;
        case '*': ans = b * a; break;
        case '/': ans = b / a; break;
        case '^': ans = 1;                 /* ^ is bitwise XOR in C, so       */
                  for(i = 0; i < a; i++)   /* exponentiation is done by a loop */
                      ans = ans * b;
                  break;
        default:  ans = 0;
    }
    top++;
    stack[top] = ans;
}

Output:

154

Self-assessment Questions

7) In prefix notation the ________ precedes the operands. (fill in the blank)

8) A+B is an infix expression
a) True
b) False

9) The postfix notation of A+B is
a) +AB
b) A+B
c) AB+
d) ++A

3.1.4 Conversion of Arithmetic Expression from Infix to Postfix

Let X be an arithmetic expression in its infix form. X is an expression containing operators, operands, parentheses, etc. We have 5 basic operators in mathematics, namely addition, subtraction, multiplication, division and exponentiation. The order of precedence is:

• Exponentiation (highest precedence)
• Multiplication/division
• Addition/subtraction (lowest precedence)

Operators on the same level are performed from left to right unless indicated otherwise by parentheses. The algorithm given below transforms any infix expression X into its equivalent postfix expression Y. We use the stack data structure to store the operators and parentheses.
Algorithm:

1. Read tokens from left to right in the given infix expression X, and the postfix expression Y is generated.

155

2. The input infix expression may have the following tokens:
   a) Any alphabet from A-Z or a-z
   b) Any digit from 0-9
   c) Any operator
   d) Opening and closing braces ( , )

3. If the token read is an alphabet:
   a) Print that alphabet as output

4. If the token read is a digit:
   a) Print that digit as output

5. If the token read is an opening bracket “(”:
   a) Push the opening bracket ‘(’ onto the stack
   b) If any operator appears before ‘)’, push it onto the stack
   c) When the corresponding ‘)’ bracket appears, pop elements from the stack until ‘(’ is popped out

6. If the token read is an operator:
   a) Check if there is any operator already present on the stack
   b) If the stack is empty, push the operator onto the stack
   c) If an operator is present, check if the priority of the incoming operator is greater than the priority of the topmost stack operator
   d) If the priority of the incoming operator is greater, push the incoming operator onto the stack
   e) Else pop the operator from the stack, and repeat step 6

156

Example of converting an expression from infix to postfix

Infix expression: A*B+C

The order in which the operands appear is not reversed, but when the '+' is read, it has lower precedence than the '*', so the '*' must be printed first. We show this in a table with three columns: the first shows the symbol currently being read, the second shows what is on the stack, and the third shows the current contents of the postfix string. The stack is written from left to right with the 'bottom' of the stack to the left.

Step   Current Symbol   Stack   Postfix expression
1      A                        A
2      *                *       A
3      B                *       AB
4      +                +       AB*
5      C                +       AB*C
6      (end)                    AB*C+

Step 1: The first input token is the alphabet “A”, so it is printed as an output character of the postfix notation.

Step 2: The next token in the infix expression is the operator “*”. Since the stack is empty, it is pushed onto the top of the stack.
Step 3: The third token in the infix expression is the alphabet “B”, hence it is printed as an output character of the postfix notation.

Step 4: The fourth input token is again an operator, “+”. But the operator on the top of the stack, i.e., “*”, has higher precedence than the operator “+”. Thus the operator “*” is popped from the top of the stack and printed as output of the postfix notation. The operator “+” is then pushed onto the top of the stack.

Step 5: The next input character is the alphabet “C”, so it is printed as an output character of the postfix notation.

Step 6: Now it is the end of the infix expression, so we pop all the operators from the stack one by one and print them as postfix notation. Thus the operator “+” is printed as the last character of the postfix notation.

157

Thus the postfix expression is AB*C+

Program:

#include<stdio.h>
#include<ctype.h>

char stack[20];
int top = -1;

void push(char x)
{
    stack[++top] = x;
}

char pop()
{
    if(top == -1)
        return -1;
    else
        return stack[top--];
}

int priority(char x)
{
    if(x == '(')
        return 0;
    if(x == '+' || x == '-')
        return 1;
    if(x == '*' || x == '/')
        return 2;
    return 0;
}

int main()
{
    char exp[20];
    char *e, x;
    printf("Enter the expression :: ");
    scanf("%s", exp);
    e = exp;
    while(*e != '\0')
    {
        if(isalnum(*e))
            printf("%c", *e);
        else if(*e == '(')
            push(*e);
        else if(*e == ')')
        {
            while((x = pop()) != '(')
                printf("%c", x);
        }
        else
        {

158

            while(top != -1 && priority(stack[top]) >= priority(*e))
                printf("%c", pop());
            push(*e);
        }
        e++;
    }
    while(top != -1)
    {
        printf("%c", pop());
    }
    return 0;
}

Output:

Self-assessment Questions

10) As per the algorithm to convert an infix expression to a postfix expression, we must ignore the parentheses present in the infix expression
a) True
b) False

11) While converting an infix expression to a postfix expression, if an operator is encountered, the operators are ___________.
a) Pushed on to the stack
b) Popped out of stack
c) Left without doing anything
d) Checked for precedence level

12) When the string scanning ends, the next operation is ____________.
a) Popping out all operators from the stack and adding them to the postfix string
b) Exit and print result
c) Push all the operands on to the stack
d) Do nothing

159

3.1.5 Applications of Stack

Stacks have many useful applications in computer science. Stacks form a basis for the implementation of compilers for many programming languages, and they are also at the core of stack-oriented languages and assembly-level code. Some of the basic and most frequently used applications are described in the sections below.

(i) Balancing Symbols

We often make syntax mistakes while typing programs, and it is the compiler's duty to check programs for all syntax errors. Much of the time, we make mistakes in typing brackets, parentheses or operators. The lack of a single symbol may cause multiple errors in the program, so the real error remains unidentified. Hence a stack can be used to check whether the expressions in a program are balanced: every right bracket, parenthesis or brace must match a corresponding left counterpart. For example, the sequence [()] is correct, whereas [(]) is invalid. For now, consider a problem that just checks the balancing of parentheses, brackets, and braces and ignores all other characters. A stack can be used to balance the symbols in a program as follows:

1. Create an empty stack s[].
2. Scan the program file character by character till the end of the file.
3. Upon identifying any opening symbol (parenthesis, brace, bracket, etc.), push it onto the stack.
4. If the scanned character is a closing bracket, brace or parenthesis and the stack is empty, print an error message.
5. Else pop an element from the stack.
6. If the popped element is not the corresponding opening symbol, print an error message.
7. If the stack is not empty at the end of the file, print an error message.

This is clearly linear and in fact makes only one pass through the input. It is thus on-line and quite fast.
160

(ii) Recursion

Recursion is considered one of the most powerful tools in a programming language, yet it is also regarded as one of the trickiest concepts by many programmers, because of the uncertainty of the conditions specified by the user. In short:

Something referring to itself is called a recursive definition.

Recursion can be defined as defining anything in terms of itself, or as repeating items in a self-similar way. In programming, if a function calls itself to accomplish some task, it is said to be a recursive function. The concept of recursion is used in solving problems that involve repeated execution of the same steps. Thus, to make a function execute repeatedly until we obtain the desired output, we can make use of recursion.

Example of Recursion:

The best example in mathematics is the factorial function:

n! = 1 · 2 · 3 · ... · (n-1) · n

If n = 6, then the factorial of 6 is calculated as

6! = 6(5)(4)(3)(2)(1) = 720

Consider calculating the factorial of a given number with a simple program. If we have to calculate 6!, then what remains after the first step is the calculation of 5!. In general, we can say

n! = n · (n-1)!    (i.e., 6! = 6 · 5!)

This means we need to execute the same factorial code again and again, which is nothing but recursion.

161

Thus the recursive definition of the factorial is:

f(n) = 1            if n = 0
f(n) = n * f(n-1)   otherwise

The above recursive function says that the factorial of n = 0 is 1, and the factorial of any other number n is defined to be the product of n and the factorial of one less than n. Any recursive definition has the following properties:

1. There are one or more base cases for which recursion is not needed.
2. All chains of recursion stop at one of the base cases. We should make sure that each recursive call always occurs on a smaller version of the original problem.
In C programming, a recursive factorial function looks like this:

int factorial(int n)
{
    if (n == 0)                     /* Base case */
        return 1;
    else
        return n * factorial(n-1);  /* Recursive case */
}

The above program calculates the factorial of any number n. When we call this factorial function, it first checks for the base case: if the value of n equals 0, then by definition it returns 1. Otherwise, the base case has not yet been satisfied, so it returns the product of n and the factorial of n-1. Thus it calls the factorial function once again to find the factorial of n-1, forming recursive calls until the base case is met. Each pending call is held on the run-time stack until the calls below it return.

162

(iii) Evaluation of Postfix Expression

A stack is used to evaluate a postfix expression by pushing operands and applying each operator to the top two elements of the stack; the algorithm and a complete program were given in section 3.1.3.

(iv) String Reversal

Since the stack is a LIFO data structure, it is an obvious choice for applications that require reversing a string, or for checking whether a string is a palindrome. The simplest way to reverse a string is to scan the string from left to right and push every character onto the stack until we reach the end of the string. Then we pop elements from the stack one by one and append each popped element to a new string, repeating until the stack becomes empty.
/* Program for reversing a string using a stack */

#include<stdio.h>
#include<string.h>
#include<stdlib.h>
#define MAX 20

int top = -1;
char stack[MAX];
char pop();
void push(char);

int main()
{
    char str[20];
    unsigned int i;
    printf("Enter the string : ");
    gets(str);
    /* Push the characters of the string str onto the stack */
    for(i = 0; i < strlen(str); i++)
        push(str[i]);

163

    /* Pop characters from the stack and store them back in str */
    for(i = 0; i < strlen(str); i++)
        str[i] = pop();
    printf("Reversed string is : ");
    puts(str);
    return 0;
}/* End of main() */

void push(char item)
{
    if(top == (MAX-1))
    {
        printf("Stack Overflow\n");
        return;
    }
    stack[++top] = item;
}/* End of push() */

char pop()
{
    if(top == -1)
    {
        printf("Stack Underflow\n");
        exit(1);
    }
    return stack[top--];
}/* End of pop() */

Output:

164

Self-assessment Questions

13) Balancing symbols is useful for _____________.
a) Compiler optimization
b) Inserting symbols in program code
c) Inserting comments
d) Checking precedence of operators

14) The winding phase of recursion involves popping instructions out of the stack
a) True
b) False

15) What is the evaluation of the postfix expression 4 6 + 7 - ?
a) 7
b) 4
c) 3
d) 8

165

Summary

o Stacks are Last-In-First-Out (LIFO) data structures in which the most recent element inserted into the stack is the first one to be removed.
o A stack can be implemented using an array by creating a stack pointer variable to keep track of the top position.
o Push() and pop() are the primary operations on stacks for insertion and deletion of elements, along with some support functions.
o Polish notation, also called prefix notation, is a form of notation for logic, arithmetic, and algebra.
o Infix notation is the most common and simplest notation, in which an operator is placed between two operands.
o In prefix notation the operator precedes the operands, and in postfix notation the operands precede the operator.
o Stacks can be used for evaluating a postfix expression, and also in recursion, string reversal, etc.

166

Terminal Questions

1. Explain the stack and its basic operations.
2. Explain the algorithms for the push and pop operations.
3. Write a C program for converting an infix expression to a postfix expression.
4. Explain the applications of stacks in brief.

167

Answer Keys

Self-assessment Questions

168

Question No.   Answer
1              c
2              b
3              b
4              1-b, 2-c, 3-a
5              b
6              a
7              Operator
8              a
9              c
10             a
11             a
12             a
13             a
14             b
15             c

Activity

Activity Type: Offline        Duration: 15 Minutes

Description:

Divide the students into 4 groups. Below are 4 infix expressions; assign an expression to each group. Each group should convert the given expression to postfix and prefix expressions using a stack.

a) 3+4*5/6
b) 6 * (77 + 8 *15) + 20
c) (300+23)*(43-21)/(84+7)
d) (4+8)*(6-5)/((3-2)*(2+2))

169

Case Study

Stack based memory allocation

Stacks in computing architectures are regions of memory where data is added or removed in a last-in-first-out (LIFO) manner. In most modern computer systems, each thread has a reserved region of memory referred to as its stack. When a function executes, it may add some of its state data to the top of the stack; when the function exits, it is responsible for removing that data from the stack. At a minimum, a thread's stack is used to store the location of function calls in order to allow return statements to return to the correct location, but programmers may further choose to explicitly use the stack. If a region of memory lies on the thread's stack, that memory is said to have been allocated on the stack. Because the data is added and removed in a last-in-first-out manner, stack-based memory allocation is very simple and typically faster than heap-based memory allocation (also known as dynamic memory allocation).
Another feature is that memory on the stack is automatically, and very efficiently, reclaimed when the function exits, which can be convenient for the programmer if the data is no longer required. If, however, the data needs to be kept in some form, then it must be copied from the stack before the function exits. Therefore, stack-based allocation is suitable for temporary data, or data which is no longer required after the creating function exits. A thread's assigned stack size can be as small as only a few bytes on some small CPUs. Allocating more memory on the stack than is available can result in a crash due to stack overflow. Some processor families, such as the x86, have special instructions for manipulating the stack of the currently executing thread. Other processor families, including PowerPC and MIPS, do not have explicit stack support, but instead rely on convention and delegate stack management to the operating system's application binary interface (ABI).

Questions:

1. Explain how stack-based memory allocation works.
2. What are the advantages of stack-based memory allocation?

170

Bibliography

e-Reference

• bowdoin.edu, (2016). Computer Science 210: Data Structures. Retrieved on 19 April 2016, from http://www.bowdoin.edu/~ltoma/teaching/cs210/fall10/Slides/StacksAndQueues.pdf

External Resources

• Kruse, R. (2006). Data Structures and Program Designing Using 'C' (2nd ed.). Pearson Education.
• Srivastava, S. K., & Srivastava, D. (2004). Data Structures Through C in Depth (2nd ed.). BPB Publications.
• Weiss, M. A. (2001). Data Structures and Algorithm Analysis in C (2nd ed.). Pearson Education.
Video Links
Topic: Introduction and definition of stacks
Link: https://www.youtube.com/watch?v=FNZ5o9S9prU
Topic: Recursion
Link: https://www.youtube.com/watch?v=k0bb7UYy0pY
Topic: Evaluation of postfix expression using stack
Link: https://www.youtube.com/watch?v=_EP4gpG-4kQ
171 Notes: 172
Chapter Table of Contents
Chapter 3.2 Queues
Aim ..................................................................................................................................................... 173
Instructional Objectives................................................................................................................... 173
Learning Outcomes .......................................................................................................................... 173
3.2.1 Introduction to Queue............................................................................................................ 174
(i) Definition of a Queue ........................................................................................................ 174
(ii) Array Representation of Queue....................................................................................... 175
Self-assessment Questions ...................................................................................................... 180
3.2.2 Types of Queue ........................................................................................................................ 181
(i) Simple Queue ...................................................................................................................... 181
(ii) Circular Queue .................................................................................................................. 182
(iii) Double Ended Queue ...................................................................................................... 188
(iv) Priority Queue ..................................................................................................................
194 Self-assessment Questions ...................................................................................................... 196 3.2.3 Operations on Queue ............................................................................................................. 196 (i) Insertion .............................................................................................................................. 197 (ii) Deletion in Queue ............................................................................................................. 198 (iii) Qempty Operation ........................................................................................................... 199 (iv) Qfull Operation ................................................................................................................ 200 (v) Display Operation ............................................................................................................. 200 Self-assessment Questions ...................................................................................................... 201 3.2.4 Application of Queue ............................................................................................................. 202 Self-assessment Questions ...................................................................................................... 203 Summary ........................................................................................................................................... 204 Terminal Questions.......................................................................................................................... 205 Answer Keys...................................................................................................................................... 205 Activity............................................................................................................................................... 
206 Bibliography ...................................................................................................................................... 207
e-References ...................................................................................................................................... 207
External Resources ........................................................................................................................... 207
Video Links ....................................................................................................................................... 207
Aim
To educate the students with the basic knowledge of queues, their types and the operations on queues
Instructional Objectives
After completing this chapter, you should be able to:
• Explain queue and its operations
• Describe the array representation of a queue
• Discuss different types of queue with examples
• Illustrate the creation, insertion, deletion and search operations on various types of queue
Learning Outcomes
At the end of this chapter, you are expected to:
• Demonstrate queue with its operations
• Implement a double ended queue using a linked list
• Identify the requirement of a priority queue
173 3.2.1 Introduction to Queue
In simple language, a queue is a waiting line which keeps growing as we add elements at its end and keeps shrinking as we remove elements from its front. Compared to a stack, a queue reflects the more common real-world maxim of "first come, first served". Long waiting lines at food counters, supermarkets and banks are common examples of queues. For all computer applications, we define a queue as a list in which all additions to the list are made at one end, and all deletions from the list are made at the other end.
Applications of queues are, if anything, even more common than applications of stacks, since in performing tasks by computer, as in all parts of life, it is often necessary to wait one's turn before having access to something. Within a computer there can be queues of tasks waiting for different devices such as a printer, for access to disk storage, or even, with multitasking, for use of the CPU. Within a single program, there may be multiple requests to be kept in a queue, or one task may create other tasks, which are kept in a queue and done in turn. A queue is a data structure where elements are added at the back and removed from the front. In that way a queue is like "waiting in line": the first one to be added to the queue will be the first one to be removed from the queue. Queues are common in many applications. For example, when we read a book from a file, it is quite natural to store the read words in a queue so that once reading is complete the words are in the order in which they appear in the book. Another common example is a buffer for network communication that temporarily stores packets of data arriving on a network port. Generally speaking, they are processed in the order in which they arrive.
(i) Definition of a Queue
More formally, a queue can be defined as a list or a data structure in which data items can be added at the end (generally referred to as the rear) and deleted from the front of the queue. The data element to be deleted is the one which has spent the maximum time in the queue. Because of this property, a queue is also referred to as a first-in-first-out (FIFO) data structure. The figure 3.2.1 below shows a pictorial representation of a queue.
Figure 3.2.1: Representation of a Queue in Computer's Memory
174 Generally, a queue can also be referred to as a container of objects (in other words, a linear collection) that are added or deleted based on the First-In-First-Out (FIFO) principle.
A very good example of a queue is the line of students at the ice-cream counter of the college canteen. Newly arriving students join the line at the back of the queue, while serving (and removal) happens at the front of the queue. The queue allows only two operations: enqueue and dequeue. Enqueue inserts an item; dequeue removes an item. The difference between a stack and a queue lies only in the deletion of an item: a stack removes the most recently added item, while a queue removes the least recently added item first. In spite of its simplicity, the queue is a very important concept with many applications in the simulation of real-life events, such as lines of customers at a cash register or cars waiting at an intersection, and in programming (such as printer jobs waiting to be processed). Many Smalltalk applications use a queue, but instead of implementing it as a new class, they use an OrderedCollection because it performs all the required functions. Dequeuing, or removing an item from a queue, is only possible on non-empty queues, which requires a contract in the interface. This interface can be written without committing to an implementation of queues. This is important so that different implementations of the functions in this interface can choose different representations.
(ii) Array Representation of Queue
The array used to implement the queue needs two variables (indices), called front and rear, to point to the first and the last elements of the queue. The figure 3.2.2 shows the array implementation of a queue.
Figure 3.2.2: Array Implementation of Queue
175 Initially: q->rear = -1; q->front = -1;
For every enqueue operation we increment rear by one, and for every dequeue operation we increment front by one. Even though the enqueue and dequeue operations are simple to implement, there is a disadvantage in this setup.
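The q->front and q->rear notation above assumes the indices are kept together with the array in one record. A minimal sketch of such a structure (the field and type names here are our assumptions, not taken from the chapter's program):

```c
#define SIZE 50

/* A sketch of the queue record suggested by the q->front / q->rear
   notation: storage plus the two indices kept in one structure. */
struct queue {
    int items[SIZE];   /* storage for the elements   */
    int front;         /* index of the first element */
    int rear;          /* index of the last element  */
};

/* Initially the queue is empty: both indices are -1. */
void q_init(struct queue *q) {
    q->front = -1;
    q->rear  = -1;
}
```

Keeping the indices inside the structure lets several queues coexist, each with its own front and rear.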
The size of the array required is huge, as the number of slots keeps increasing as long as there are items to be added to the list (irrespective of how many items are deleted, since these two are independent operations).
Problems with this representation
Although there is space in the initial blocks of the following queue, we may not be able to add a new item; an attempt will cause an overflow.
Figure 3.2.3: Queue Overflow Situation
It is also possible to have an empty queue into which no new item can be inserted (when front moves up to rear and the last item is deleted).
Figure 3.2.4: Overflow Situation in an Empty Queue
176 The program below shows the implementation of a queue using an array.
Program:
/*
 * C Program to Implement a Queue using an Array
 */
#include <stdio.h>
#include <stdlib.h>
#define SIZE 50
int queue_arr[SIZE];
int rear = -1;
int front = -1;
void insert();
void delete();
void display();
int main()
{
    int ch;
    while(1)
    {
        printf("1.Insert element to the queue \n");
        printf("2.Delete element from the queue \n");
        printf("3.Display all elements of the queue \n");
        printf("4.Quit \n");
        printf("Enter your choice : ");
        scanf("%d",&ch);
        switch(ch)
        {
            case 1: insert(); break;
            case 2: delete(); break;
            case 3: display(); break;
            case 4: exit(0);
            default: printf("Invalid Input \n");
        }/*End of switch*/
    }/*End of while*/
}/*End of main()*/
void insert()
{
    int add_item;
    if(rear == SIZE-1)
        printf("Queue Overflow \n");
    else
    {
        if(front == -1) /*If queue is initially empty*/
            front = 0;
        printf("Insert the element in the queue : ");
        scanf("%d",&add_item);
        rear = rear + 1;
        queue_arr[rear] = add_item;
    }
}/*End of insert()*/
void delete()
{
    if(front == -1 || front > rear)
    {
        printf("Queue Underflow \n");
        return;
    }
    else
    {
        printf("Element deleted from the queue is : %d\n", queue_arr[front]);
        front = front + 1;
    }
}/*End of delete()*/
void display()
{
    int i;
    if(front == -1 || front > rear)
        printf("Queue is empty \n");
    else
    {
        printf("The Queue elements are : \n");
        for(i = front; i <= rear; i++)
            printf("%d ", queue_arr[i]);
        printf("\n");
    }
}/*End of display()*/
178 Output:
179 Did you know?
Though the simple queue appears to be a very simple FIFO model, it is an important model used in many applications. In their initial implementations, most operating systems, such as Linux (a generic, programmer-friendly OS) and TinyOS (an OS used in wireless sensor networks), used a simple FIFO queue for scheduling different tasks, due to its ease of implementation and the limitations of resources.
Self-assessment Questions
1) Which one of the following is an application of the Queue data structure?
a) When a resource is shared among multiple devices
b) Printer jobs waiting to be processed
c) Buffer used in network communication to store data packets
d) All of the above
2) For every enqueue operation, we __________ by one, and for every dequeue operation, we __________ by one.
a) Decrement rear, decrement front
b) Increment rear, increment front
c) Increment front, increment rear
d) Decrement front, decrement rear
3) For queue implementation, we need two pointers, namely front and rear. These pointers are initialized as:
180 a) front=1 and rear=-1
b) front=-1 and rear=-1
c) front=-1 and rear=1
d) front=1 and rear=1
3.2.2 Types of Queue
A queue represents a basket of items. Enqueue is an operation that adds an item to this basket, and dequeue is an operation that chooses an item to be removed from the queue (if the queue is not empty). As with human queues, these queues vary based on the rule used to choose the item to be removed. Giving different names to the basic operations of the queue based on what they do is usual and helps avoid confusion. However, using a general signature for the different kinds of queue will make our code more modular later, when algorithms based on the different kinds of queue are discussed.
(i) Simple Queue
Like stacks, we can implement queues using both linked lists and arrays.
Both the array and the linked list implementations have a running time of O(1) for every operation. This section covers the array implementation of queues. For every queue data structure, an array QUEUE[] is kept, and there are two positions, q_front and q_rear, which represent the beginning and the end of the queue respectively; q_size keeps track of the number of elements in the queue. All of this information is part of one structure, and except for the queue routines themselves, no functions should access these fields directly. The figure 3.2.5 shows a queue in some intermediate state. The blank cells have undefined values in them. In particular, the elements in the first two cells have spent the maximum time in the queue.
Figure 3.2.5: Basic Queue example
To enqueue an element x, we first increment q_rear and q_size, then set QUEUE[q_rear] = x. To dequeue an element, we take the return value from QUEUE[q_front], decrement q_size, and then increment q_front. Other strategies are possible (this is discussed later). Using an array, a simple queue can be declared as
#define MAX 10
int queue[MAX], rear=0, front=0;
181 (ii) Circular Queue
One big problem with the simple queue is its implementation. After adding 10 elements to the queue (considering the previously discussed declaration), the queue appears full, since q_rear has reached the last position, and the next enqueue would be at a non-existent position. However, there might be positions available at the front of the array, as many elements may already have been dequeued. Queues, like stacks, frequently stay small even in the presence of a lot of operations. The solution to this problem is that whenever q_rear or q_front reaches the end of the array, it is wrapped around to the beginning. The following figure shows the queue during some operations.
Figure 3.2.6: Circular Queue
Only minimal extra code is needed to implement the wraparound (although it slightly increases the running time).
If incrementing either q_front or q_rear would make it go past the array, the value is reset to the first position in the array. However, two things need to be taken care of when using the circular array implementation of queues. First and foremost, we must check that the queue is not empty, because a dequeue operation on an empty queue returns an undefined value.
182 Secondly, programmers sometimes represent front and rear differently for queues. For example, some programmers do not use an entry to keep track of the size of the queue; instead they rely on the convention that the queue is empty when q_rear = q_front - 1. The size is computed implicitly by comparing q_front and q_rear. This is a tricky approach, since there are some special cases, so one needs to be very careful when modifying code written this way. If the size is not part of the structure and the size of the array is A_SIZE, then the queue is full when it contains A_SIZE - 1 elements, since only A_SIZE different sizes can be distinguished, and one of these is 0. In applications where it is certain that the number of enqueues is never larger than the size of the array, the wraparound is obviously not necessary. As with stacks, dequeues are rarely performed unless the calling routines are certain that the queue is not empty; thus error checks are frequently skipped for this operation, except in critical code. This is generally not justifiable, because the time savings achieved are minimal. We can think of the array as a circle rather than a straight line in order to overcome the inefficient use of space, as depicted in the figure 3.2.7. In this way, as entries are added and removed from the queue, the head will continually chase the tail around the array, so that the snake can keep crawling indefinitely while staying in a confined circuit.
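The implicit-size bookkeeping described above can be sketched as follows. This is a sketch of ours, under the (assumed) convention that q_rear points one past the last element, so empty is q_front == q_rear and the array can hold at most A_SIZE - 1 items, exactly as argued in the text:

```c
#define A_SIZE 8   /* array size; the queue holds at most A_SIZE - 1 items */

static int q[A_SIZE];
static int q_front = 0;   /* index of the first element            */
static int q_rear  = 0;   /* index one past the last element       */

int q_empty(void) { return q_front == q_rear; }
int q_full(void)  { return (q_rear + 1) % A_SIZE == q_front; }

/* Both return 1 on success, 0 on overflow/underflow. */
int q_enqueue(int x) {
    if (q_full()) return 0;
    q[q_rear] = x;
    q_rear = (q_rear + 1) % A_SIZE;   /* wrap around modularly */
    return 1;
}

int q_dequeue(int *out) {
    if (q_empty()) return 0;
    *out = q[q_front];
    q_front = (q_front + 1) % A_SIZE;
    return 1;
}

/* The size is computed implicitly by comparing the two indices. */
int q_size(void) { return (q_rear - q_front + A_SIZE) % A_SIZE; }
```

Note how with A_SIZE = 8 only seven elements fit: the eighth would make q_rear wrap onto q_front, which is indistinguishable from the empty state.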
At different times, the queue will occupy different parts of the array, but there is no need to worry about running out of space unless the array is fully occupied, in which case there is truly an overflow.
183 Figure 3.2.7: Queue in a Circular Array
Implementation of Circular Arrays
In order to implement the circular queue as a linear array, consider the positions around the circular arrangement as numbered from zero to max-1, where max is the total number of elements in the circular array. We use the same numbered entries of a linear array to implement a circular array. The logic then becomes simple modular arithmetic: whenever an index crosses max-1, we start again from 0. This is as simple as doing arithmetic on a circular clock face, where the hours are numbered from 1 to 12; if four hours are added to ten o'clock, two o'clock is obtained.
Program:
//Program for Circular Queue implementation through an Array
#include <stdio.h>
#include <stdlib.h>
#define SIZE 5
int circleq[SIZE];
int front, rear;
int main()
{
    void insert(int, int);
    void delete(int);
    int ch=1, n;
    front = -1;
    rear = -1;
    while(1)
    {
        printf("\nMAIN MENU\n1.INSERTION\n2.DELETION\n3.EXIT");
        printf("\nENTER YOUR CHOICE : ");
        scanf("%d",&ch);
        switch(ch)
        {
            case 1:
                printf("\nEnter the element of the queue: ");
                scanf("%d",&n);
                insert(n,SIZE);
                break;
            case 2:
                delete(SIZE);
                break;
            case 3:
                exit(0);
            default:
                printf("\nInvalid input. ");
        }
    } //end of outer while
} //end of main
void insert(int item, int MAX)
{
    if(front == (rear+1)%MAX)
    {
        printf("\nCircular queue overflow\n");
    }
    else
    {
        if(front == -1)
            front = rear = 0;
        else
            rear = (rear+1)%MAX;
        circleq[rear] = item;
        printf("\nRear = %d Front = %d ", rear, front);
    }
}
void delete(int MAX)
{
    int del;
    if(front == -1)
    {
        printf("\nCircular queue underflow\n");
    }
    else
    {
        del = circleq[front];
        if(front == rear)
            front = rear = -1;
        else
            front = (front+1)%MAX;
        printf("\nDeleted element from the queue is: %d ", del);
        printf("\nRear = %d Front = %d ", rear, front);
    }
}
Output:
186 187 (iii) Double Ended Queue
A double-ended queue is also known as a dequeue (or deque, pronounced "deck"). It is an ordered collection of items similar to the queue. A dequeue has two ends, a front and a rear, and the items remain positioned in the collection. What makes the dequeue special is the unrestricted nature of adding and removing items: an item can be added at either the front or the rear, and likewise an existing item can be removed from either end. This makes the dequeue a hybrid linear structure that provides all the capabilities of stacks and queues in a single data structure. Figure 3.2.8 shows a dequeue. It is important to note that even though the dequeue can assume many of the characteristics of stacks and queues, it does not require the LIFO and FIFO orderings that those data structures enforce. The addition and removal operations must simply be used consistently.
188 Figure 3.2.8: Dequeue
A double-ended queue is an abstract data structure that generalises a queue: elements can be added to or removed from either the front (head) or the back (tail). It is also often called a head-tail linked list. The dequeue is thus a special type of data structure in which deletion and insertion can be done at either the rear end or the front end of the queue.
The operations that can be performed on dequeues are:
• Insertion of an item at the front end
• Insertion of an item at the rear end
• Deletion of an item from the front end
• Deletion of an item from the rear end
• Displaying the contents of the queue
Application of dequeue
• A nice application of the dequeue is storing a web browser's history. Recently visited URLs are added to the front of the dequeue, and the URL at the back of the dequeue is removed after some specified number of insertions at the front.
• Another common application of the dequeue is storing a software application's list of undo operations.
• One example where a dequeue can be used is the A-Steal job scheduling algorithm. This algorithm implements task scheduling for several processors. A separate dequeue of threads to be executed is maintained for each processor. To execute the next thread, the processor gets the first element from its dequeue (using the "remove first element" operation). If the current thread forks, it is put back at the front of the dequeue ("insert element at front") and a new thread is executed. When one of the processors finishes executing its own threads (i.e., its dequeue is empty), it can "steal" a thread from another processor: it gets the last element from that processor's dequeue ("remove last element") and executes it.
• A real-life scenario is a ticket-purchasing line. It behaves like a queue, but sometimes a person who has already purchased a ticket comes back to the front of the line with a further query; having already been served, they have the privilege of rejoining at the front. In this kind of scenario we need a data structure in which, according to the requirement, data can also be added at the front. In the same scenario, a person can also leave the queue from the rear.
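The browser-history application above can be sketched with a small bounded dequeue: new URLs enter at the front, and once capacity is exceeded the oldest URL falls off the back. This is a sketch of ours (the capacity of 3 and all names are illustrative assumptions; real browsers keep far more state):

```c
#include <string.h>

#define HIST_CAP 3     /* keep only the 3 most recent URLs (assumed) */
#define URL_LEN  64

static char history[HIST_CAP][URL_LEN];
static int  hist_count = 0;

/* Insert at the front; the entry at the back is discarded when full. */
void visit(const char *url) {
    int n = (hist_count < HIST_CAP) ? hist_count : HIST_CAP - 1;
    for (int i = n; i > 0; i--)          /* shift entries toward the back */
        strcpy(history[i], history[i - 1]);
    strncpy(history[0], url, URL_LEN - 1);
    history[0][URL_LEN - 1] = '\0';
    if (hist_count < HIST_CAP) hist_count++;
}

const char *most_recent(void) { return hist_count ? history[0] : ""; }
const char *oldest(void)      { return hist_count ? history[hist_count - 1] : ""; }
```

After visiting pages a, b, c and then d with capacity 3, page a has been dropped from the back: only d, c, b remain, newest first.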
Program: /*Implementation of De-queue using arrays*/ #include<stdio.h> #include<stdlib.h> #define MAX 10 typedef struct dequeue { int front,rear; int arr[MAX]; }dq; /*If flag is zero, insertion is done at beginning else if flag is one, insertion is done at end. */ void enqueue(dq *q,int x,int flag) { int i; if(q->rear==MAX-1) { printf("\nQueue overflow!"); exit(1); } if(flag==0) { for(i=q->rear;i>=q->front;i--) q->arr[i+1]=q->arr[i]; q->arr[q->front]=x; q->rear++; } else if(flag==1) 190 { q->arr[++q->rear]=x; } else { printf("\nInvalid flag value"); return; } } void dequeue(dq *q,int flag) { int i; /*front is initialized with zero, then rear=-1 indicates underflow*/ if(q->rear < q->front) { printf("\nQueue Underflow"); exit(1); } if(flag==0)/*deletion at beginning*/ { for(i=q->front;i<=q->rear;i++) q->arr[i]=q->arr[i+1]; q->arr[q->rear]=0; q->rear--; } else if(flag==1) { q->arr[q->rear--]=0; } else { printf("\nInvalid flag value"); return; } } void display(dq *q) { int i; for(i=q->front;i<=q->rear;i++) printf("%d ",q->arr[i]); } void main() { dq q; q.front=0; q.rear=-1; int ch,n; while(1) { printf("\nMenu-Double Ended Queue"); printf("\n1. Enqueue – Begin"); 191 printf("\n2. Enqueue – End"); printf("\n3. Dequeue – Begin"); printf("\n4. Dequeue – End"); printf("\n5. Display"); printf("\n6. Exit"); printf("\nEnter your choice: "); scanf("%d",&ch); switch(ch) { case 1: printf("\nEnter the number: "); scanf("%d",&n); enqueue(&q,n,0); break; case 2: printf("\nEnter the number:" ); scanf("%d",&n); enqueue(&q,n,1); break; case 3: printf("\nDeleting element from beginning"); dequeue(&q,0); break; case 4: printf("\nDeleting element from end"); dequeue(&q,1); break; case 5: display(&q); break; case 6: exit(0); default: printf("\nInvalid Choice"); } } } 192 Output: 193 (iv) Priority Queue Consider a job sent to a line printer. Although these jobs are placed in a queue which is served by the printer on FIFO basis, this may not be a best practice always. 
Sometimes one job waiting in the queue may be particularly important, so that it should be allowed to run as soon as the printer becomes available. Conversely, when the printer becomes available, there may be several single-page jobs and only one hundred-page job in the queue; it may then be reasonable to make the long job go last, even if it was not the last job submitted. (Unfortunately, most systems do not do this, which can be particularly annoying at times.) In a similar way, in a multi-tasking and multi-user environment, an operating system scheduler must decide which task or user to allocate the processor to. In general, a process is allowed to execute in time slots or time frames. A simple algorithm for processing jobs is to use a queue and process jobs on a FIFO basis. Whenever a new job arrives, it is placed at the end of the queue. The scheduler processes the jobs from the queue on a first come, first served basis until the queue is empty. This algorithm may not be appropriate, since jobs which require only a short time slot may appear to take a long time because of the wait involved. Generally, it is important that jobs requiring a short time slot finish as fast as possible; therefore, these jobs must have higher priority over jobs that have already been running. Furthermore, there may be some jobs that are not short but are still very important, and these must also be considered on priority. A priority queue is a data structure that supports at least the following two operations: insert, which does the obvious thing, and delete_min, which finds, returns and removes the minimum element in the queue. The insert operation is the equivalent of enqueue, and delete_min is the priority queue equivalent of the queue's dequeue operation. The delete_min function also alters its input.
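The insert/delete_min model just described can be sketched with an unsorted array: insert simply appends, and delete_min scans for the smallest element, returns it and removes it. This is a minimal sketch of ours (names assumed); production implementations normally use a binary heap so that both operations cost O(log n) instead of delete_min costing O(n):

```c
#include <limits.h>

#define PQ_MAX 50

static int pq[PQ_MAX];
static int pq_n = 0;

/* insert: append the element; returns 0 if the queue is full. */
int pq_insert(int x) {
    if (pq_n == PQ_MAX) return 0;
    pq[pq_n++] = x;
    return 1;
}

/* delete_min: find, remove and return the smallest element.
   Returns INT_MAX when the queue is empty. */
int pq_delete_min(void) {
    if (pq_n == 0) return INT_MAX;
    int min_i = 0;
    for (int i = 1; i < pq_n; i++)
        if (pq[i] < pq[min_i]) min_i = i;
    int min = pq[min_i];
    pq[min_i] = pq[--pq_n];   /* move the last element into the gap */
    return min;
}
```

Note that delete_min really does alter its input, as the text says: the minimum element is physically removed from the array.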
Figure 3.2.9: Basic Model of a Priority Queue
As with most data structures, it is sometimes possible to add other operations, but these extra operations are not part of the basic model depicted in figure 3.2.9. Besides operating systems, priority queues have many applications. They are used for external sorting. Priority queues are also important in the implementation of greedy algorithms, which operate by repeatedly finding a minimum.
Did you know?
The circular queue is very popular in computer networks because of its circular structure. The simplest application of circular queues for network engineers is the implementation of the round-robin algorithm used for token passing; circular queues are also used in FIFO buffering systems.
195 Self-assessment Questions
4) A circular queue is implemented using an array of size 10. The array index starts with 0, front is 6, and rear is 9. The insertion of the next element takes place at array index:
a) 0
b) 7
c) 9
d) 10
5) If MAX_SIZE is the size of the array used in the implementation of a circular queue, the array index starts with 0, front points to the first element in the queue, and rear points to the last element in the queue. Which of the following conditions specifies that the circular queue is EMPTY?
a) Front=rear=0
b) Front=rear=-1
c) Front=rear+1
d) Front=(rear+1)%MAX_SIZE
6) A normal queue, if implemented using an array of size MAX_SIZE, gets full when
a) Rear=MAX_SIZE-1
b) Front=(rear+1)mod MAX_SIZE
c) Front=rear+1
d) Rear=front
3.2.3 Operations on Queue
Similar to the operations performed on stacks, all such operations can be performed on queues too. The basic operations are insertion (enqueue) and deletion (dequeue). The supporting functions are Qfull and Qempty. Other queue operations involve initialising or defining a queue, using it, and then completely erasing it from the computer's memory. This section covers all the functions on queues in detail.
• enqueue() – Insert a data element in a queue.
• dequeue() – Remove a data element from the queue.
Additional functions are required to make the above queue operations efficient. These are −
196 • Qfull() − checks if the queue is full; returns Boolean
• Qempty() − checks if the queue is empty; returns Boolean
While performing a dequeue operation, data is accessed at the front pointer, and while performing an enqueue operation, data is added at the rear pointer.
(i) Insertion
As already discussed in the previous sections, insertion into a queue is also called the enqueue operation. The following are the steps to be followed while performing an enqueue operation −
• Step 1 − Check if the queue is full.
• Step 2 − If the queue is full, display an overflow error and exit.
• Step 3 − If the queue is not full, increment the rear pointer to point to the next empty array space.
• Step 4 − Add the data element to the queue location where rear is pointing.
• Step 5 − Return success.
Figure 3.2.10: Enqueue Operation
Following is an algorithm for the enqueue operation
197 The above algorithm implemented in C programming is shown below:
(ii) Deletion in Queue
The deletion operation on a queue is also called the dequeue operation. The following are the steps to be followed while performing a dequeue operation −
• Step 1 − Check if the queue is empty.
• Step 2 − If the queue is empty, display an underflow error and exit.
• Step 3 − If the queue is not empty, access the data where front is pointing.
• Step 4 − Increment the front pointer to point to the next available data element.
• Step 5 − Return success.
Figure 3.2.11: Dequeue Operation
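The enqueue and dequeue steps listed above can be sketched in C. This is a sketch of ours, consistent with the linear-array representation used earlier in the chapter (variable names assumed), not the book's original listing:

```c
#define MAXSIZE 50

static int queue[MAXSIZE];
static int front = -1;   /* index of the first element; -1 when empty */
static int rear  = -1;   /* index of the last element                 */

int Qfull(void)  { return rear == MAXSIZE - 1; }
int Qempty(void) { return front == -1 || front > rear; }

/* Steps 1-5 of the enqueue algorithm. Returns 1 on success. */
int enqueue(int x) {
    if (Qfull()) return 0;        /* Step 2: overflow      */
    if (front == -1) front = 0;   /* first insertion       */
    queue[++rear] = x;            /* Steps 3-4             */
    return 1;                     /* Step 5                */
}

/* Steps 1-5 of the dequeue algorithm. Returns 1 on success. */
int dequeue(int *out) {
    if (Qempty()) return 0;       /* Step 2: underflow     */
    *out = queue[front++];        /* Steps 3-4             */
    return 1;                     /* Step 5                */
}
```

When front has moved past rear, Qempty reports the underflow condition described above, and no element is deleted.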
After deleting an element from the queue, we must update the values of rear and front as per the position of the elements in the queue.
Algorithm of the Qempty() function −
199 The above algorithm implemented in C programming is shown below:
(iv) Qfull Operation
Since we are using a single-dimensional array for the implementation of the queue, the best way to know whether the queue is full is to check if the rear pointer has reached MAXSIZE-1, which means no space is available in the array for additional elements and hence the queue is full.
Algorithm of the Qfull() function −
The above algorithm implemented in C programming is shown below:
(v) Display Operation
The queue can be displayed by simply moving from the front pointer until it reaches the rear pointer. The only condition that should be considered is the Qempty condition; display a "Queue Empty" message accordingly.
200 Self-assessment Questions
7) If the elements "A", "B", "C" and "D" are placed in a queue and are deleted one at a time, in what order will they be removed?
a) ABCD
b) DCBA
c) DCAB
d) BADC
8) The deletion operation is done using __________ in a queue.
a) Front
b) Rear
c) Top
d) Bottom
9) An array of size MAX_SIZE is used to implement a circular queue. Front, Rear and count are tracked. Suppose front is 0 and rear is MAX_SIZE-1. How many elements are present in the queue?
a) Zero
b) One
c) MAX_SIZE-1
d) MAX_SIZE
201 3.2.4 Application of Queue
By its very nature, a queue can be used in all applications requiring the first come, first served property. Following are some of the common applications in computer science where the use of a queue makes things easy.
1. As already discussed in the previous chapters, the queue plays a key role in scheduling for computer resource-sharing applications. The simplest example is the printer queue, where printing jobs are added to the scheduling queue and the printer serves the requests on a FIFO basis.
2. Similarly, queues also play a very important role in CPU scheduling.
All the requests for using the processor are stored in a queue by the CPU scheduler program. The requests are then serviced on a FIFO basis.
3. Another common application of queues is routing calls in call centres. All the calls made by clients are stored in a waiting queue and allotted to the different executives who attend the calls. When all the executives are busy handling customers, the call that was made first among all the waiting calls is connected to an executive as soon as one becomes available.
4. Another important application of queues in computer systems is interrupt handling. A computer system is connected to many input and output devices, and these devices keep sending requests to the processor by raising interrupts repeatedly. These interrupts are handled by the interrupt handler program, which puts the interrupts in a queue as and when they arrive. It then services each interrupt as per the availability of the CPU.
5. The M/M/1 queue. The M/M/1 queue is a fundamental queueing model in operations research and probability theory. Tasks arrive according to a Poisson process at a certain rate λ, meaning that on average λ tasks arrive per unit time. More specifically, the inter-arrival times follow an exponential distribution with mean 1/λ, and the probability of exactly k arrivals between time 0 and time t is (λt)^k e^(-λt) / k!. Tasks are serviced in FIFO order, with service times that are exponentially distributed with rate μ. The two M's stand for Markov: the system is memoryless, in that the times between arrivals are independent of one another, and so are the service times.
Self-assessment Questions
10) Which data structure allows deleting data elements from the front and inserting at the rear?
a) Stacks b) Queues c) Dequeues d) Binary search tree
11) The push and enqueue operations are essentially the same operation; push is used for stacks and enqueue is used for queues.
a) True b) False
12) In order to input a list of values and output them in the same order, you could use a queue.
In order to input a list of values and output them in the opposite order, you could use a stack.
a) True b) False
Summary
o Similar to stacks, queues are data structures usually used to simplify certain programming operations.
o In these data structures, only one data item can be immediately accessed.
o A queue, in general, allows access to the first item that was inserted.
o The important queue operations are inserting an item at the rear of the queue and removing the item from the front of the queue.
o A queue can be implemented as a circular queue, which is based on an array in which the indices wrap around from the end of the array to the beginning.
o A priority queue allows access to the smallest (or sometimes the largest) item in the queue.
o The important priority queue operations are inserting an item in sorted order and removing the item with the smallest key.
o A few important operations performed on the queue are insertion, also called enqueue; deletion, also called dequeue; Qempty, which checks whether the queue is empty; Qfull, which checks whether the queue is full; and display, which is used to display all the elements in the queue.
o Queues find applications in implementing job scheduling algorithms, page replacement algorithms, interrupt handling mechanisms, etc., in the design of operating systems.
Terminal Questions
1. Explain the basic operations of a queue.
2. Discuss the functioning of a circular queue.
3. Mention the limitations of a linear queue with a suitable example.
4. Discuss applications of queues.
Answer Keys
Self-assessment Questions
Question No. and Answer: 1: d, 2: b, 3: b, 4: a, 5: b, 6: a, 7: a, 8: a, 9: d, 10: b, 11: a, 12: a
Activity
Activity Type: Offline
Duration: 15 Minutes
Description: Fill in the following table to give the running times of the priority queue operations for the given two implementations, using O() notation.
You should assume that the implementation is reasonably well done; for example, not performing expensive computations when a value can be stored in an instance variable and used as needed. A priority queue is a data structure that supports storing a set of values, each of which has an associated key. Each key-value pair is an entry in the priority queue. The basic operations on a priority queue are:
• insert(k, v) – insert value v with key k into the priority queue
• removeMin() – return and remove from the priority queue the entry with the smallest key
Other operations on the priority queue include size(), which returns the number of entries in the queue, and isEmpty(), which returns true if the queue is empty and false otherwise. Two simple implementations of a priority queue are an unsorted list, where new entries are added at the end of the list, and a sorted list, where entries in the list are sorted by their key values.

Operation        size(), isEmpty()    insert    removeMin()
Unsorted list    ______               ______    ______
Sorted list      ______               ______    ______

Bibliography
e-Reference
• bowdoin.edu, (2016). Computer Science 210: Data Structures. Retrieved on 19 April 2016, from http://www.bowdoin.edu/~ltoma/teaching/cs210/fall10/Slides/StacksAndQueues.pdf
External Resources
• Kruse, R. (2006). Data Structures and Program Designing Using 'C' (2nd ed.). Pearson Education.
• Srivastava, S. K., & Srivastava, D. (2004). Data Structures Through C in Depth (2nd ed.). BPB Publications.
• Weiss, M. A. (2001). Data Structures and Algorithm Analysis in C (2nd ed.). Pearson Education.
Video Links
Topic and Link
Circular queue: https://www.youtube.com/watch?v=g9su-lnW2Ks
Priority queue: https://www.youtube.com/watch?v=gJc-J7K_P_w
Double ended queue: https://www.youtube.com/watch?v=4xLh68qokxQ

MODULE - IV Linked List
Module Description
Until now, the implementation of data structures was done only with arrays, which use contiguous memory allocation.
This module covers a new way of representing and implementing data structures: the linked list. Chapter 4.1 covers the advantages and disadvantages of array implementation and introduces linked lists. The specifications of linked lists, which include the definition, components and representation, also form part of the chapter. Moreover, the chapter covers the different types of linked lists. Chapter 4.2 majorly focuses on the different operations which can be performed on linked lists and their implementation.
Chapter 4.1 Introduction to Linked List
Chapter 4.2 Operations on Linked List
Chapter Table of Contents
Chapter 4.1 Introduction to Linked List
Aim ..................................................................................................................................................... 209
Instructional Objectives................................................................................................................... 209
Learning Outcomes .......................................................................................................................... 209
Introduction to Linked Lists ........................................................................................................... 210
4.1.1 Linked List Specifications....................................................................................................... 211
(i) Definition ............................................................................................................................ 212
(ii) Components....................................................................................................................... 212
(iii) Representation.................................................................................................................. 213
(iv) Advantages and Disadvantages ......................................................................
214 Self-assessment Questions ...................................................................................................... 216 4.1.2 Types of Linked Lists .............................................................................................................. 217 (i) Singly Linked Lists ............................................................................................................. 217 (ii) Doubly Linked Lists .......................................................................................................... 220 (iii) Circular Linked Lists ....................................................................................................... 222 Self-assessment Questions ...................................................................................................... 224 Summary ........................................................................................................................................... 225 Terminal Questions.......................................................................................................................... 225 Answer Keys...................................................................................................................................... 226 Activity............................................................................................................................................... 227 Case Study: ........................................................................................................................................ 227 Bibliography ...................................................................................................................................... 229 e-References ...................................................................................................................................... 
229
External Resources ........................................................................................................................... 229
Video Links ....................................................................................................................................... 229
Aim
To provide students with the knowledge of linked lists and enable them to write programs using linked lists in C
Instructional Objectives
After completing this chapter, you should be able to:
• Explain the components and representation of linked lists
• Outline the advantages and disadvantages of linked lists
• Differentiate singly, doubly and circular linked lists
• Discuss the applications of linked lists
Learning Outcomes
At the end of this chapter, you are expected to:
• Construct a table of advantages and disadvantages of linked lists
• Discuss various types of linked lists
• Identify the components of all the different types of linked lists
Introduction to Linked Lists
In this chapter, we will introduce the concept of linked list data structures. This chapter focuses on how linked lists can be used to overcome the limitations of array data structures. The use of linked lists helps to achieve high flexibility in programming. In this chapter, we will come across the basic structure of a linked list and its various representations, the types of linked lists, the advantages and disadvantages associated with linked lists, and their applications. A linked list is a linear arrangement of interconnected nodes, each of which consists of two components, namely data and link. The data component holds the value stored at that node and the link holds the address of the next interconnected node. As already covered in previous chapters, linear data structures such as stacks and queues can be implemented using arrays, which allocate memory sequentially.
The contiguous memory allocation of arrays provides several advantages for implementing stacks and queues, as given below:
• Faster data access: Arrays operate on computed addresses. Therefore, direct access to data is possible, which reduces data access time.
• Simple to understand and use: Arrays are very simple to understand. Declaring, accessing and displaying data from arrays is very simple. This makes the implementation of stacks and queues very simple.
• Adjacency of data: In arrays, data are both physically and logically adjacent. Therefore, loss of data elements does not affect the other parts of the list.
There are also drawbacks associated with the sequential allocation of data. The disadvantages of array implementation are given below:
• Static memory allocation: While using arrays, the compiler allocates a fixed amount of memory for the program before execution begins, and this allocated memory cannot be changed during execution. It is difficult, and sometimes not possible, to predict in advance the amount of memory an application may require. If more memory than required is allocated to a program and the application does not utilize it, the result is wastage of memory space. In the same way, if less memory is allocated to a program and the application demands more, additional memory cannot be allocated during execution.
• Requirement of contiguous memory space: For the implementation of linear data structures using arrays, a sufficient amount of contiguous memory is required. Sometimes, even though memory space is available, it is not contiguous, which makes it of no use.
• Insertion and deletion operations on arrays are time-consuming and sometimes tedious tasks.
In order to overcome the drawbacks of contiguous memory allocation, linear data structures like stacks and queues can be implemented using the linked allocation technique.
The structure in the form of linked lists can be used to implement linear data structures like stacks and queues efficiently. This mechanism can be used for many different purposes and data storage applications.
4.1.1 Linked List Specifications
Linked lists are used to implement a list of data items in some order. Linked list structures use memory that grows and shrinks as per the requirement. Figure 4.1.1 below helps to better understand the linked list structure.
Figure 4.1.1: Linked List
Every component comprising "data" and "next" in figure 4.1.1 above is called a node, and the arrow represents the link to the next node. Head represents the start of the linked list. The size of the data item can vary as per requirement. Linked lists can grow as long as there is memory space available.
(i) Definition
A linked representation of a data structure, known as a linked list, is a collection of nodes. An individual node is divided into two fields named data and link. The data field contains the information to be stored by the node. The link field contains the address of the next node. Figure 4.1.2(a) below demonstrates the general structure of a node in a linked list. Here, the field that holds data is termed Info and the field that holds the address is termed Link. Figure 4.1.2(b) demonstrates an instance of a node. Here the info field contains the integer 90 and the address field contains the address 2468 of the next node in the list. Figure 4.1.3 shows an example of a linked list of integer numbers.
Figure 4.1.2: (a) Structure of Node (b) Instance of Node
Did you know?
Linked lists were developed in 1955–1956 by Allen Newell, Cliff Shaw and Herbert A. Simon at RAND Corporation as the primary data structure for their Information Processing Language. IPL was used by the authors to develop several early artificial intelligence programs, including the Logic Theory Machine, the General Problem Solver, and a computer chess program.
(ii) Components
As already discussed, a linked list is basically a collection of nodes. Each node has two components: data and link. The data component holds the actual data the node should hold, and the link contains the address of the node next to it. In this way the linked list grows like a chain. Along with this, a linked list also has a head (alternatively called start) pointer which points to the beginning of the linked list. The link part of the last node contains NULL, which represents the end of the list. Figure 4.1.3 shows a linked list containing integer data elements.
Figure 4.1.3: Linked List Containing Integer Values
(iii) Representation
One way to represent a linked list is by using two linear arrays for memory representation. Assume there are two arrays, INFO and LINK. In these two arrays, INFO[K] contains the data part and LINK[K] the pointer field of node K. Assume START is a variable storing the starting address of the list and NULL is a pointer value indicating the end of the list. The pictorial representation of the linked list as discussed above is shown in figure 4.1.4 and figure 4.1.5.
Figure 4.1.4: Representation of Linked List
Figure 4.1.5: INFO and LINK Arrays
Here,
START = 9 => INFO[9] = H is the first character
LINK[9] = 4 => INFO[4] = E is the second character
LINK[4] = 6 => INFO[6] = L is the third character
LINK[6] = 2 => INFO[2] = L is the fourth character
LINK[2] = 8 => INFO[8] = O is the fifth character
LINK[8] = 0 => NULL value, so the list ends
Generally, a linked list node is implemented in C programming using a structure and allocating dynamic memory to the structure. The syntax shown below demonstrates the implementation of the node using a structure.
struct test_struct {
    int val;
    struct test_struct *next;
};
(iv) Advantages and Disadvantages
The advantages and disadvantages of linked lists are given below:
Advantages of linked lists
• Number of elements: The first and foremost advantage of linked lists over arrays is that we need not know in advance how many elements will form part of the list. Therefore, we need not allocate memory for the linked list in advance.
• Insertion and deletion operations: While using linked lists, insertion and deletion operations can be performed without fixing the size of the memory in advance.
• Memory allocation: One of the most important advantages of linked lists over arrays is that a linked list utilizes only the exact amount of memory required to store its data, and it can be expanded to acquire more memory locations if needed.
• Non-contiguous memory: Unlike arrays, a linked list does not require contiguous memory allocation. We do not require elements to be stored in consecutive memory locations. That means even if a contiguous memory block is unavailable, we can still store data.
Disadvantages of linked lists
• Pointer memory: The use of linked lists requires extra memory space, as pointers are stored along with the data. This makes the implementation expensive in terms of memory requirement.
• No random access: Since nodes must be accessed sequentially, we cannot access an element directly at a particular position. Sequential access also makes data access time-consuming, depending on the location of the element.
• Traversal: In singly linked lists, traversal from the end of the list back to the beginning is not possible.
• Sorting: Sorting elements in a linked list is not as easy as sorting operations on arrays.
The differences between static and linked allocation are mentioned below.
Table 4.1.1: Differences between Static and Linked Allocation

Static Allocation Technique:
• Memory is allocated during compile time.
• The size of the memory allocated is fixed.
• Suitable for applications where the data size is fixed and known in advance.
• Execution is faster.
• Insertion and deletion operations are strictly not defined; they can be done conceptually, but in an inefficient way.

Linked Allocation Technique:
• Memory is allocated during execution time.
• The size of the memory allocated may vary.
• Suitable for applications where the data size is unpredictable.
• Execution is slower.
• Insertion and deletion operations are defined and can be done more efficiently.

Self-assessment Questions
1) An individual node in the linked list consists of _________ fields.
a) One b) Two c) Three d) Four
2) The info or data field of a linked list contains _______ and the link field contains _______.
a) Data to be stored; link to the previous node
b) Data to be deleted; link to the next node
c) Data to be stored; link to the next node
d) Data to be stored; link to the previous node
3) A linked list allocates memory for its data elements in _____________ order.
a) Even b) Contiguous c) Consecutive d) Non-contiguous
4) Linked lists do not require additional memory space for holding pointers.
a) True b) False
4.1.2 Types of Linked Lists
Based on the access to the list, or its traversal, the different types of linked lists are:
1. Singly linked lists
2. Doubly linked lists
3. Circular linked lists
(i) Singly Linked Lists
This is the simplest representation of the linked list. The structure of linked lists discussed till now may be called singly linked lists. The individual data elements in the list are called nodes. The elements in the list may or may not be present in consecutive memory locations. Therefore, pointers are used to maintain the order of the list.
Every individual node is divided into two parts called INFO and LINK. INFO is used to store data and LINK is used to store the address of the next node. A pointer START is used to indicate the start of the linked list and NULL is used to represent the end of the list. The following figure 4.1.6 shows the representation of a singly linked list.
Figure 4.1.6: Singly Linked List
Representation using C programming:

struct test_struct {
    int val;
    struct test_struct *next;
};

Program to create a singly linked list:

#include <stdio.h>
#include <malloc.h>
#include <stdlib.h>

void main()
{
    struct node {
        int n;
        struct node *ptr;
    };
    typedef struct node NODE;

    NODE *head, *first, *temp = 0;
    int cnt = 0;
    int ch = 1;

    first = 0;
    while (ch) {
        head = (NODE *)malloc(sizeof(NODE));
        printf("Enter the data item:\n");
        scanf("%d", &head->n);
        if (first != 0) {
            temp->ptr = head;
            temp = head;
        } else {
            first = temp = head;
        }
        fflush(stdin);
        printf("Do you wish to continue (press 0 or 1)?\n");
        scanf("%d", &ch);
    }
    temp->ptr = 0;

    /* reset temp to the beginning */
    temp = first;
    printf("\nThe linked list elements are:\n");
    while (temp != 0) {
        printf("%d=>", temp->n);
        cnt++;
        temp = temp->ptr;
    }
    printf("NULL\n");
}

Output:

Advantages of Singly Linked Lists
• The most obvious advantage of singly linked lists is their easy representation and simple implementation.
• Secondly, we only need to keep track of one pointer, i.e., the forward pointer, without having to bother about the previous node's information.
• It is a persistent data structure: a simple list of objects formed by each node carrying a reference to the next in the list. This is persistent because we can take a tail of the list, meaning the last k items for some k, and add new nodes onto the front of it. The tail will not be duplicated, instead becoming shared between both the old list and the new list. As long as the contents of the tail are immutable, this sharing will be invisible to the program.
Disadvantages of Singly Linked Lists
• The very advantage of singly linked lists, forward sequential access, becomes a drawback when we need to traverse back to previous nodes. Since the singly linked list structure does not keep any information about the previous node, traversing backward is impossible.
• If we want to delete an element in the linked list and the element to be deleted is present at the end of the list, this becomes the worst case for a singly linked list, as we have to check all the nodes from the beginning to the end. The worst case for this operation is O(n).
Did you know?
Several operating systems developed by Technical Systems Consultants (originally of West Lafayette, Indiana, and later of Chapel Hill, North Carolina) used singly linked lists as file structures. A directory entry pointed to the first sector of a file, and succeeding portions of the file were located by traversing pointers. Systems using this technique included Flex (for the Motorola 6800 CPU), mini-Flex (same CPU), and Flex9 (for the Motorola 6809 CPU). A variant developed by TSC for, and marketed by, Smoke Signal Broadcasting in California used doubly linked lists in the same manner.
(ii) Doubly Linked Lists
Doubly linked lists, also referred to as D-lists, are data structures where each individual node consists of three fields. One is the usual INFO field which holds the data element. The other two are called NEXT and PREV, which hold the addresses or links to the next and previous nodes respectively. The use of D-lists provides access to both sides of the list: we can traverse forward using the NEXT link and backward using PREV. Figure 4.1.7 below shows the pictorial representation of doubly linked lists.
Figure 4.1.7: Doubly Linked List
Representation using a C program:

struct test_struct {
    int val;
    struct test_struct *next;
    struct test_struct *prev;
};

Advantages of Doubly Linked Lists
• This structure overcomes the disadvantage of singly linked lists of traversing only forward, by having two pointers, NEXT and PREV. Using NEXT we can move forward and using PREV we can traverse backward.
• This structure also provides the flexibility to move from any node to any other node by moving to and fro throughout the list, which is not possible in a singly linked list.
• A node in a doubly linked list may be deleted with little trouble, since we have pointers to both the previous and next nodes. A node in a singly linked list cannot be removed unless we have the pointer to its predecessor.
Disadvantages of Doubly Linked Lists
• Doubly linked list implementation requires a higher amount of memory compared to a singly linked list. This is because of its structure of having two links for moving forward and backward. The extra pointer in each node increases the memory requirement.
• The deletion and insertion operations on doubly linked lists are more time-consuming because of the use of two pointers. It takes time to update these pointers after each operation, and the extra code required increases complexity.
(iii) Circular Linked Lists
The third type of linked list is the circular linked list. Here we still have START, but the link of the last node is not NULL; instead, the link part of the last node points back to START. This connects the list back to its beginning, so that we can traverse back to the start, forming a round-robin structure. This overcomes the drawback of the singly linked list where one cannot move back to the first node. Figure 4.1.8 shows the structure of a circular linked list.
Figure 4.1.8: Circular Linked List
Advantages of Circular Linked Lists
• A circular linked list enables traversal back to the start of the list by connecting the last node back to the start. This removes the drawback of a singly linked list, where one cannot access the starting elements once moved forward.
• A circular linked list also eliminates the use of two pointers, as in a doubly linked list, while still letting us traverse across the whole list. This saves the memory required for the extra pointer in each node.
• Handling pointers in a circular linked list is as easy as in a singly linked list, as we have to track only one pointer that moves forward.
Disadvantages of Circular Linked Lists
• Though it helps us traverse across the linked list, traversal to the previous node is very time-consuming. This is because there is no pointer that keeps track of the previous node, which forces us to traverse almost the entire list again to reach the node we desire to access.
• If a proper exception handling mechanism is absent, the implementation of circular lists can be dangerous, as it might lead to an infinite loop.
• Reversal of the list is difficult.
Applications of Circular Linked Lists
• Implementation of waiting and context-switch queues in operating systems. When there are multiple processes running on an operating system and there is a mechanism to provide limited time slots to each process, the waiting processes can form a circular linked list. The task at the head of the list is given CPU time; once its allocated time finishes, the task is taken out and added to the end of the list again, and this continues.
• Circular linked lists are useful in implementing queues using lists. Ordinarily we need two pointers, one to the head and one to the end of the list, because in a queue additions happen at the end and removals at the head. With a circular list, that can be done using only one pointer.
• A real-life application where the circular linked list is used is our personal computers, where multiple applications are running. All the running applications are kept in a circular linked list and the OS gives each a fixed time slot for running. The operating system keeps iterating over the linked list until all the applications are completed.
• Multiplayer games. All the players are kept in a circular linked list and the pointer keeps moving forward as each player's chance ends.
• A circular linked list can also be used to create a circular queue. In a queue we have to keep two pointers, FRONT and REAR, in memory all the time, whereas in a circular linked list only one pointer is required.
Self-assessment Questions
5) In ___________ type of linked lists we can traverse in both directions.
a) Singly linked list b) Circular linked list c) One dimensional linked list d) Doubly linked list
6) In a circular linked list, the link part of the last element in the list holds the address of _______.
a) Random node b) NULL c) START d) Previous node
7) It is possible to traverse across the list using circular lists.
a) True b) False
8) In a singly linked list it is possible to traverse back to START.
a) True b) False
Summary
o A linked representation of a data structure known as a linked list is a collection of nodes.
o An individual node is divided into two fields named data and link.
o The data field contains the information or the data to be stored by the node. The link field contains the address of the next node.
o Based on the access to the list or its traversal, there are three types of linked lists: singly linked lists, doubly linked lists and circular linked lists.
o The use of linked lists depends on the application, as some applications demand sequential allocation, where linked lists cannot be used.
Terminal Questions
1. Explain the advantages and disadvantages of static memory allocation.
2.
Explain the advantages and disadvantages of linked lists.
3. Compare singly linked lists and doubly linked lists.
4. Compare singly linked lists and circular linked lists.
Answer Keys
Self-assessment Questions
Question No. and Answer: 1: b, 2: c, 3: d, 4: b, 5: d, 6: c, 7: a, 8: b
Activity
Activity Type: Offline
Duration: 30 Minutes
Description: Students should demonstrate the operations on a linked list. A few students can act as nodes and another student can act as a pointer. Students should perform operations like insertion and deletion on these nodes. Students should act as nodes of a singly linked list, doubly linked list and circular linked list, and operations should be performed on the different types of linked lists.
Case Study: Stack Implementation through a Linked List
We can avoid the size limitation of a stack implemented with an array with the help of a linked list to hold the stack elements. As in the case of an array, we have to decide where to insert elements in the list and where to delete them so that push and pop will run fastest. Primarily, there are two operations of a stack: push() and pop(). A stack carries LIFO behaviour, i.e., last in, first out. Recall that while implementing a stack with an array, to achieve LIFO behaviour we pushed and popped elements at the end of the array rather than at the beginning, since operating at the beginning carries the overhead of shifting elements towards the right to push an element at the start and shifting elements towards the left to pop an element from the start. To avoid this overhead of shifting left and right, we decided to push and pop elements at the end of the array.
Q.1) Now, if we use a linked list to implement the stack, where will we push elements inside the list and from where will we pop them?
Hint: There are a few facts to consider before we make any decision: insertion and removal in a stack take constant time, and a singly linked list can serve the purpose.
There are two parts to the above figure.
On the left-hand side, there is the stack implemented using an array. The elements present inside this stack are 1, 7, 5 and 2. The most recent element of the stack is 1; it would be removed if pop() were called at this point. On the right side, there is the stack implemented using a linked list. This stack has four nodes, linked in such a fashion that the very first node, pointed to by the head pointer, contains the value 1. This first node points to the node with value 7, the node with value 7 points to the node with value 5, and the node with value 5 points to the last node, with value 2. To make a stack data structure using a linked list, we have inserted new nodes at the start of the linked list.
Q.2) Write pseudo-code to carry out the insertion and deletion operations of a stack with the help of a linked list.
Q.3) Will the stack implementation using a linked list be cost effective?
Bibliography
e-Reference
• cs.cmu.edu, (2016). Linked Lists. Retrieved on 19 April 2016, from https://www.cs.cmu.edu/~adamchik/15121/lectures/Linked%20Lists/linked%20lists.html
External Resources
• Kruse, R. (2006). Data Structures and Program Designing Using 'C' (2nd ed.). Pearson Education.
• Srivastava, S. K., & Srivastava, D. (2004). Data Structures Through C in Depth (2nd ed.). BPB Publications.
• Weiss, M. A. (2001). Data Structures and Algorithm Analysis in C (2nd ed.). Pearson Education.
Video Links
Topic Link
Doubly Linked List https://www.youtube.com/watch?v=k0pjD12bzP0
Linked Lists https://www.youtube.com/watch?v=LOHBGyK3Hbs
Circular Linked List https://www.youtube.com/watch?v=I4tVBFBoNSA
Chapter Table of Contents
Chapter 4.2 Operations on Linked List
Aim .....................................................................................................................................................
231
Instructional Objectives................................................................................................................... 231
Learning Outcomes .......................................................................................................................... 231
Introduction ...................................................................................................................................... 232
4.2.1 Operations on Singly Linked List .......................................................................................... 232
(i) Creating a Linked List ........................................................................................................ 233
(ii) Insertion of a Node in Linked List .................................................................................. 234
(iii) Deletion of a Node from the Linked List ...................................................................... 238
(iv) Searching and Displaying Elements from the Linked List .......................................... 242
Self-assessment Questions ............................................................................................................... 245
Summary ........................................................................................................................................... 246
Terminal Questions.......................................................................................................................... 247
Answer Keys...................................................................................................................................... 248
Activity............................................................................................................................................... 248
Bibliography ......................................................................................................................................
249
e-References ...................................................................................................................................... 249
External Resources ........................................................................................................................... 249
Video Links ....................................................................................................................................... 249
Aim
To educate the students on the importance of using the linked list data structure in computer science.
Instructional Objectives
After completing this chapter, you should be able to:
• Demonstrate traversal of the linked list
• Illustrate insertion and deletion operations at the end and beginning of a linked list
• Demonstrate the approach to search for and display a value in the linked list
Learning Outcomes
At the end of this chapter, you are expected to:
• Identify the traversal path of the linked list
• Discuss insertion and deletion operations in singly linked lists
• Determine the position of the value to be searched in the linked list
Introduction
In the previous chapter, the basic concepts and fundamentals of linked lists were covered. Linked lists are very robust and dynamic data structures: nodes can be added, deleted or updated at minuscule cost. Moreover, a linked list does not need a large contiguous block of memory at compile time; it acquires memory at runtime, based on its requirements. These properties make linked lists a primary choice of programmers for many practical applications. This chapter focuses on understanding the most important and fundamental operations on linked lists: creating a linked list, inserting a node, deleting a node from the list, searching for a node element and displaying the elements of a linked list.
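All the operations in this chapter build on one node layout. As a preview, here is a minimal, self-contained sketch of that declaration together with a length() helper; the helper name is illustrative and not from the text.

```c
#include <stddef.h>

/* The node layout used throughout this chapter:
   a data field plus a link to the next node. */
struct node
{
    int data;
    struct node *next;
};

/* Illustrative helper: count the nodes by following
   the links until the terminating NULL is reached. */
int length(struct node *start)
{
    int n = 0;
    struct node *p;
    for (p = start; p != NULL; p = p->next)
        n++;
    return n;
}
```

A three-node list built by hand reports a length of 3, while an empty list (start == NULL) reports 0, which is exactly the traversal pattern every later operation repeats.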
4.2.1 Operations on Singly Linked List
Deleting or adding an element in an array requires shifting of array elements to create or fill the holes (empty spaces), while updating an array element only requires accessing the value at that particular index and simply overwriting or replacing it. Moreover, one of the distinguishing features of arrays is random access, which enables any index to be accessed directly: given a particular index, we can easily find the value at that index.
In the case of linked lists, the entry point is the head of the linked list. The head or start of the list is not itself a node, but a reference to the first node in the linked list; that is, the head holds an address value. For an empty linked list, the value of the head is null. It is also known that a linked list always ends with a null pointer (except for a circular linked list, where the last node is connected to the start or head node).
Great care must be taken while manipulating linked lists, as any wrong link in the middle makes the rest of the list inaccessible. That is because the only way to traverse a list is by following the reference to the next node from the current node. This concept of linking wrong nodes is called a "memory leak". If a memory reference is lost, the entire list from that point becomes inaccessible.
Some of the basic operations on linked lists discussed in this chapter are as follows:
• Creation of a linked list
• Insertion of a node into the linked list
• Deletion of a node from the linked list
• Searching for an element in the list
• Displaying all linked list elements
Did you know?
Many programming languages such as Lisp and Scheme have singly linked lists built in. In many functional languages, these lists are constructed from nodes, each called a cons or cons cell. The cons has two fields: the car, a reference to the data for that node, and the cdr, a reference to the next node.
Although cons cells can be used to build other data structures, this is their primary purpose. In languages that support abstract data types or templates, linked list ADTs or templates are available for building linked lists. In other languages, linked lists are typically built using references together with records.
(i) Creating a Linked List
The simplest of the linked list operations is creating a linked list. It involves the simple step of getting a free node and copying a data item into the "data" field of the node. The next step is to update the links of the node: in the case of an entirely new list, the start and null pointers are updated, and in the case of merging lists, the references must be updated accordingly.
Below is a C programming code segment for declaring a linked list.

struct node
{
    int data;
    struct node *next;
} *start = NULL;

Below is a C programming function for creating a new node. Note that getch() comes from the non-standard <conio.h> header, and that current must be declared outside the loop so that it keeps its value between iterations.

void create()
{
    char c;
    struct node *new_node, *current = NULL;
    do
    {
        new_node = (struct node *)malloc(sizeof(struct node));
        printf("\nEnter the data : ");
        scanf("%d", &new_node->data);
        new_node->next = NULL;
        if (start == NULL)
        {
            start = new_node;
            current = new_node;
        }
        else
        {
            current->next = new_node;
            current = new_node;
        }
        printf("\nDo you want to create another node : ");
        c = getch();
    } while (c != 'n');
}

(ii) Insertion of a Node in Linked List
Insertion is an important operation while working with data structures. This operation involves adding data to the data structure. There are notably three different cases:
• Insertion at the front of the list
• Insertion at any given location within the list
• Insertion at the end of the list
The general procedure for inserting a node in the linked list is detailed below:
Step 1: Input the value of the new node and the position where it is to be inserted.
Step 2: Check if the linked list is full. If YES, then display an error message "Overflow". Else continue.
Step 3: Create a new node and insert the data value in the data field of the new node.
Step 4: Add the new node at the desired location in the linked list.
Below is a C programming function for inserting a node at the end of the linked list (it assumes typedef struct node node;).

void insert(node *ptr, int data)
{
    /* Iterate through the list till we encounter the last node. */
    while (ptr->next != NULL)
    {
        ptr = ptr->next;
    }
    /* Allocate memory for the new node and put data in it. */
    ptr->next = (node *)malloc(sizeof(node));
    ptr = ptr->next;
    ptr->data = data;
    ptr->next = NULL;
}

Insertion of node at the start of the linked list
Figure 4.2.1 below shows the case for insertion of a node at the start of the linked list.
Figure 4.2.1: Insertion of Node at the Start of the Linked List
The algorithm below shows insertion of a node at the start of the linked list.
Step 1. Create a new node and assign its address to a pointer, say PTR.
Step 2. [OVERFLOW?] IF (PTR = NULL) write: OVERFLOW and EXIT.
Step 3. ASSIGN INFO[PTR] = ITEM
Step 4. IF (START = NULL) ASSIGN NEXT[PTR] = NULL ELSE ASSIGN NEXT[PTR] = START
Step 5. ASSIGN START = PTR
Step 6. EXIT
Following is the C programming function implementing the above algorithm (the original version dereferenced an uninitialised local start structure; here the global start pointer declared earlier is used).

void insertion()
{
    struct node *new1;
    new1 = (struct node *)malloc(sizeof(struct node));
    printf("\nInput the first node value: ");
    scanf("%d", &new1->data);
    new1->next = start;   /* link the new node to the old first node */
    start = new1;         /* head now points to the new node */
}

Insertion of node at any given position in the linked list
Figure 4.2.2 below shows the case for insertion of a new node at any given location.
Figure 4.2.2: Insertion of Node at any Desired Location
The following algorithm shows insertion of a node at any given location within the list.
InsertAtloc(info, next, start, end, loc, size)
1. set nloc = loc-1, n = 1
2. create a new node and assign its address to ptr
3. [OVERFLOW?] if (ptr = NULL) write: OVERFLOW and exit
4. set info[ptr] = item
5. if (start = NULL)
       set next[ptr] = NULL
       set start = ptr
   else if (nloc <= size)
       repeat steps (a) and (b) while (n != nloc)
           (a) loc = next[loc]
           (b) n = n + 1
       [end while]
       next[ptr] = next[loc]
       next[loc] = ptr
   else
       set last = start
       repeat step (a) while (next[last] != NULL)
           (a) last = next[last]
       [end while]
       next[last] = ptr
   [end if]
6. Exit
Below is the C programming function implementing the above algorithm (with ptr->next set to NULL when the node is appended at the end, which the original omitted).

void insertAtloc(node **start, int item, int i, int k)
{
    node *ptr, *loc, *last;
    int n = 1;
    i = i - 1;
    ptr = (node *)malloc(sizeof(node));
    ptr->info = item;
    loc = *start;
    if (*start == NULL)
    {
        ptr->next = NULL;
        *start = ptr;
    }
    else if (i <= k)
    {
        while (n != i)
        {
            loc = loc->next;
            n++;
        }
        ptr->next = loc->next;
        loc->next = ptr;
    }
    else
    {
        last = *start;
        while (last->next != NULL)
        {
            last = last->next;
        }
        ptr->next = NULL;   /* the new node becomes the last node */
        last->next = ptr;
    }
}

Insertion of node at the end of the linked list
Figure 4.2.3 below shows the case for insertion of a node at the end of the list.
Figure 4.2.3: Insertion of Node at the End of the List
(iii) Deletion of a Node from the Linked List
The deletion of a node from a linked list is similar to insertion. Again, in the deletion operation there are three cases, as listed below:
• Deletion of the front node
• Deletion of any intermediate node
• Deletion of the last node
The general algorithm for deletion of any given node is given below:
Step 1: Search for the appropriate node to be deleted.
Step 2: Remove the node.
Step 3: Reconnect the linked list.
Step 4: Update all the links.
Deleting the front node
Figure 4.2.4 shows deletion of a node from the front of the linked list.
Figure 4.2.4: Deleting the Front Node
The following algorithm shows deletion of the front node from the linked list.
DELETE AT BEG(INFO, NEXT, START)
1. IF (START = NULL) write: UNDERFLOW and EXIT
2. ASSIGN PTR = START
3. ASSIGN TEMP = INFO[PTR]
4. ASSIGN START = NEXT[PTR]
5. FREE(PTR)
6. RETURN(TEMP)
Below is the C programming function for implementing the above algorithm.
void deleteatbeg(node **start)
{
    node *ptr;
    int temp;
    if (*start == NULL)              /* underflow check */
    {
        printf("\nList is empty.\n");
        return;
    }
    ptr = *start;
    temp = ptr->info;
    *start = ptr->next;
    free(ptr);
    printf("\nDeleted item is %d : \n", temp);
}

Deletion of an intermediate node
Figure 4.2.5 shows deletion of an intermediate node from the linked list.
Figure 4.2.5: Deletion of Intermediate Node from the Linked List
Following is the algorithm for deleting an intermediate node from the linked list.
Step 1: Traverse the linked list towards the node to be deleted.
Step 2: Examine each node by using the node pointers to move from node to node until the correct node is identified.
Step 3: Copy the pointer of the node being removed into temporary memory.
Step 4: Remove the node from the list and mark the memory it was using as free once more.
Step 5: Update the previous node's pointer with the address held in temporary memory.
Below is the C programming function for implementing the above algorithm. It assumes global pointers header, ptr and ptr1 and a list with a dummy header node; the trailing pointer ptr1 is initialised to header so that it is always valid.

/* Function to delete any node from the linked list. */
void delete_any()
{
    int key;
    if (header->link == NULL)
    {
        printf("\nEmpty Linked List. Deletion not possible.\n");
    }
    else
    {
        printf("\nEnter the data of the node to be deleted: ");
        scanf("%d", &key);
        ptr1 = header;            /* trailing pointer */
        ptr = header->link;       /* first data node  */
        while ((ptr->link != NULL) && (ptr->data != key))
        {
            ptr1 = ptr;
            ptr = ptr->link;
        }
        if (ptr->data == key)
        {
            ptr1->link = ptr->link;
            free(ptr);
            printf("\nNode with data %d deleted.\n", key);
        }
        else
        {
            printf("\nValue %d not found. Deletion not possible.\n", key);
        }
    }
}

Deletion of the end node from the linked list
Figure 4.2.6 below shows deletion of the end node from the linked list.
Figure 4.2.6: Deletion of Last Node from the Linked List
Below is the algorithm for deletion of the last node from the linked list.
Delete End(info, next, start)
1. if (start = NULL) Print UNDERFLOW and Exit.
2. if (next[start] = NULL)
       set ptr = start and start = NULL
       set temp = info[ptr]
   else
       set cptr = start and ptr = next[start]
       repeat steps (a) and (b) while (next[ptr] != NULL)
           (a) set cptr = ptr
           (b) set ptr = next[ptr]
       [end while]
       set next[cptr] = NULL
       set temp = info[ptr]
   [end if]
3. free(ptr)
4. return temp
5. exit
Following is the C programming function for implementing the above algorithm (with an added check for an empty list, as required by step 1).

void deleteatlast(node **start)
{
    node *ptr, *cptr;
    int temp;
    if (*start == NULL)              /* underflow check */
    {
        printf("\nList is empty.\n");
        return;
    }
    if ((*start)->next == NULL)
    {
        ptr = *start;
        *start = NULL;
        temp = ptr->info;
    }
    else
    {
        cptr = *start;
        ptr = (*start)->next;
        while (ptr->next != NULL)
        {
            cptr = ptr;
            ptr = ptr->next;
        }
        cptr->next = NULL;
        temp = ptr->info;
    }
    free(ptr);
    printf("\nDeleted item is %d : \n", temp);
}

(iv) Searching and Displaying Elements from the Linked List
Searching for and displaying elements of a linked list are very similar and simple operations, as both involve a straightforward traversal across the linked list.
Searching: For a linked list, this is a simple linear search which begins at the start pointer and continues until the NULL pointer is reached.
The algorithm for searching an element in the linked list is as below.
Step 1: Input the element to be searched, KEY.
Step 2: Initialise the current pointer with the beginning of the list.
Step 3: Compare KEY with the value in the data field of the current node.
Step 4: If they match, then quit.
Step 5: Otherwise, move the current pointer to the next node in the list and go to Step 3, until the list is exhausted.

int search(int item)
{
    int count = 1;
    struct node *nw = start;
    while (nw != NULL)
    {
        if (nw->data == item)
            return count;            /* position of the match */
        count++;
        nw = nw->next;
    }
    return 0;                        /* item not found */
}

The above code fragment explains how searching of an element takes place in a singly linked list. Searching in a singly linked list follows the linear search algorithm. Based on the search key given by the user, the pointer starts at the first node of the list and compares the search key with the data found at every node. If a match is found, the search is complete.
If no match is found, the pointer moves forward in the linked list and the process continues until the last node of the list.
Did you know?
Finding a specific element in a linked list, even if it is sorted, normally requires O(n) time. This is one of the primary disadvantages of linked lists over other data structures. One of the better ways to improve search time is the move-to-front heuristic, which simply moves an element to the beginning of the list once it is found. This scheme, handy for creating simple caches, ensures that the most recently used items are also the quickest to find again.
Another common approach is to "index" a linked list using a more efficient external data structure. For example, one can build a red-black tree or hash table whose elements are references to the linked list nodes. Multiple such indexes can be built on a single list. The disadvantage is that these indexes may need to be updated each time a node is added or removed (or at least, before that index is used again).
Displaying the linked list content
Similar to the search function, the display function traverses from the start of the list, displaying the value in the data field of each and every node until it reaches the last node. Below is the algorithm for displaying the contents of the linked list.
Step 1: Initialise the current pointer with the beginning of the list.
Step 2: Display the value in the data field of the current pointer.
Step 3: Advance the current pointer to point to the next node.
Step 4: Repeat steps 2 and 3 until the end of the list.
Following is the C programming function implementing the above algorithm.

void display()
{
    struct node *temp;
    temp = start;
    while (temp != NULL)
    {
        printf("%d ", temp->data);
        temp = temp->next;
    }
}

Self-assessment Questions
1) While inserting a node at the first position in the linked list, the link part of the node to be inserted should point to ____________.
a) Any random node in the list
b) NULL pointer
c) Start pointer
d) Midpoint of all the nodes
2) Trying to insert a node when the linked list has reached its maximum capacity is called _________
a) Underflow
b) Overflow
c) Mid flow
d) Overcover
3) The concept of linking wrong nodes is called _____________
a) Memory leaks
b) Memory holes
c) Memory pits
d) Memory wastage
4) Trying to delete a node from an empty linked list is called _______________
a) Underflow
b) Overflow
c) Mid flow
d) Overcover
Summary
o A linked list is a collection of nodes, each consisting of two fields, namely data and link. The data field contains the information or the data to be stored by the node. The link field contains the address of the next node.
o The basic operations on a linked list are creation of the linked list, insertion of a node into the linked list, deletion of a node from the linked list, searching for an element in the list and displaying all linked list elements.
o The insertion operation adds a node to a linked list, either at the start of the list, at any given position within it, or at the end of the list.
o Similarly, the deletion operation removes a node from a linked list, either at the start of the list, at any given position within it, or at the end of the list.
o Searching in a singly linked list is based on a simple linear search technique which traverses from the start to the end of the list until the desired element is found.
o The display operation is responsible for traversing the whole list and displaying the data component of each node in a linear fashion.
Terminal Questions
1. Write a C program for inserting a node in the linked list.
2. Write and explain the algorithm for deleting a node in the linked list.
3. Write a C program for searching for a node in the linked list.
4. Explain the algorithm for displaying all the nodes in the linked list.
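As a starting point for Terminal Questions 1 and 3, the insertion and searching ideas of this chapter can be combined into one self-contained sketch; the names insert_end and search here are illustrative, not from the text.

```c
#include <stdlib.h>

struct node
{
    int data;
    struct node *next;
};

/* Insert a new node carrying 'data' at the end of the list headed by *start. */
void insert_end(struct node **start, int data)
{
    struct node *new_node = (struct node *)malloc(sizeof(struct node));
    struct node *p;
    new_node->data = data;
    new_node->next = NULL;
    if (*start == NULL)           /* empty list: new node becomes the head */
    {
        *start = new_node;
        return;
    }
    p = *start;                   /* otherwise walk to the last node */
    while (p->next != NULL)
        p = p->next;
    p->next = new_node;
}

/* Linear search: return the 1-based position of 'item', or 0 if it is absent. */
int search(struct node *start, int item)
{
    int pos = 1;
    struct node *p;
    for (p = start; p != NULL; p = p->next)
    {
        if (p->data == item)
            return pos;
        pos++;
    }
    return 0;
}
```

Inserting 10, 20 and 30 in order and then searching for 20 returns position 2, while searching for an absent value returns 0.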
Answer Keys
Self-assessment Questions
Question No. Answer
1 c
2 b
3 a
4 a
Activity
Activity Type: Offline
Duration: 20 Minutes
Description: Write a function to sort an existing linked list of integers using insertion sort.
Bibliography
e-Reference
• cs.cmu.edu, (2016). Lecture 10: Linked List Operations. Concept of a Linked List Revisited. Retrieved on 19 April 2016, from http://www.cs.cmu.edu/~ab/15123S09/lectures/Lecture%2010%20%20%20Linked%20List%20Operations.pdf
External Resources
• Kruse, R. (2006). Data Structures and Program Designing Using 'C' (2nd ed.). Pearson Education.
• Srivastava, S. K., & Srivastava, D. (2004). Data Structures Through C in Depth (2nd ed.). BPB Publications.
• Weiss, M. A. (2001). Data Structures and Algorithm Analysis in C (2nd ed.). Pearson Education.
Video Links
Topic Link
Operations on singly linked list https://www.youtube.com/watch?v=McgL6JuWUpM
Singly Linked List-Deletion of Last Node https://www.youtube.com/watch?v=Hn8Hs9sVSCM
Linked Lists in 10 minutes https://www.youtube.com/watch?v=LOHBGyK3Hbs
MODULE - V Trees, Graphs and Their Applications
Module Description
Trees and graphs are typical examples of non-linear data structures, as discussed in Module 1. A non-linear data structure, unlike a linear data structure, is one wherein an element is permitted to have any number of adjacent elements. Trees are non-linear data structures which are very useful for representing hierarchical relationships among data elements. For example, to represent the relationships between the members of a family, we can use a non-linear data structure such as a tree. Organising data in hierarchical forms or structures is essential for many applications involving searching of data elements. Trees are among the most important and useful data structures in computer science, in areas such as data parsing, compiler design, expression evaluation and storage management.
Similar to trees, a graph is a powerful tool for representing a physical problem in mathematical form. One of the famous problems where graphs are used is finding the optimum shortest route for a travelling salesman moving from one city to another, so as to minimise the cost. A graph may also have unconnected nodes, and there may be more than one path between two nodes. Graphs and directed graphs are powerful tools used in computer science for many real-world applications, for example building compilers and modelling physical communication networks. A graph is an abstract notion of a set of nodes (vertices or points) and connection relations (edges or arcs) between them.
Chapter 5.1 Tree Fundamentals
Chapter 5.2 Graph Fundamentals
Chapter Table of Contents
Chapter 5.1 Tree Fundamentals
Aim ..................................................................................................................................................... 251
Instructional Objectives................................................................................................................... 251
Learning Outcomes .......................................................................................................................... 251
Introduction ...................................................................................................................................... 252
5.1.1 Definition of Tree.................................................................................................................... 252
Self-assessment Questions ..................................................................................................... 255
5.1.2 Tree Terminologies ................................................................................................................. 255
(i) Root .....................................................................................................................................
255 (ii) Node................................................................................................................................... 256 (iii) Degree of a Node or a Tree ............................................................................................ 256 (iv) Terminal Node................................................................................................................. 257 (v) Non-terminal Nodes ........................................................................................................ 257 (vi) Siblings .............................................................................................................................. 258 (vii) Level ................................................................................................................................. 258 (viii) Edge ................................................................................................................................ 259 (ix) Path ................................................................................................................................... 260 (x) Depth.................................................................................................................................. 260 (xi) Parent Node ..................................................................................................................... 261 (xii) Ancestor of a Node ........................................................................................................ 261 Self-assessment Questions ..................................................................................................... 262 5.1.3 Types of Trees .......................................................................................................................... 262 (i) Binary Trees ....................................................................................................................... 
262 (ii) Binary Search Trees ......................................................................................................... 264 (iii) Complete Binary Tree .................................................................................................... 269 Self-assessment Questions ..................................................................................................... 270 5.1.4 Heap .......................................................................................................................................... 271 (i) Heap Order Property ........................................................................................................ 272 (ii) Heap Sort........................................................................................................................... 277 Self-assessment Questions ..................................................................................................... 280 5.1.5 Binary Tree............................................................................................................................... 280 (i) Array Representation ........................................................................................................ 280 (ii) Creation of a Binary Tree ................................................................................................ 283 Self-assessment Questions ..................................................................................................... 287 5.1.6 Traversal of Binary Tree ......................................................................................................... 288 (i) Preorder Traversal............................................................................................................. 288 (ii) Inorder Traversal.............................................................................................................. 
289 (iii) Postorder Traversal......................................................................................................... 290 Self-assessment Questions ..................................................................................................... 290 Summary ........................................................................................................................................... 291 Terminal Questions.......................................................................................................................... 291 Answer Keys...................................................................................................................................... 292 Activity............................................................................................................................................... 292 Bibliography ...................................................................................................................................... 293 e-References ...................................................................................................................................... 293 External Resources ........................................................................................................................... 293 Video Links ....................................................................................................................................... 
293
Aim
To equip the students with the techniques of trees, so that they can be used in programs to search and sort the elements in a list.
Instructional Objectives
After completing this chapter, you should be able to:
• Explain trees and their different types
• Outline various tree terminologies with examples
• Illustrate the max heap and min heap techniques
• Demonstrate a binary search tree with an example
• Illustrate preorder, postorder and inorder traversal
Learning Outcomes
At the end of this chapter, you are expected to:
• Discuss trees and their various types with examples
• Write code to demonstrate max heap and min heap
• Explain the array representation of a binary tree
• Outline the steps to traverse a tree inorder, preorder and postorder
• Construct a binary tree from inorder, preorder and postorder traversals
Introduction
So far in the previous chapters, we have come across linear data structures like stacks, queues and linked lists. This chapter focuses on non-linear data structures such as trees. In this chapter, we will come across the basic structure of trees, their types and various terminologies. We will then focus on heaps and their types, followed by the concept of binary search trees and the various types of traversals.
5.1.1 Definition of Tree
A tree can be defined as a non-linear data structure consisting of a root node and other nodes present at different levels, forming a hierarchy. A tree essentially has one node called its root and one or more nodes connected adjacently below it. A tree with no nodes is called an empty tree. Therefore, a tree is a finite set of one or more nodes such that:
• There is a specially designated node called the root.
• The remaining nodes are partitioned into n >= 0 disjoint sets T1, T2, ..., Tn, where each of these sets is a tree. T1, T2, ..., Tn are called sub-trees of the root.
A node in the definition of a tree depicts an item of information, and the links between the nodes are called its branches, which represent an association between these items of information.
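The recursive definition above maps naturally onto a first-child/next-sibling representation in C. The following is a minimal sketch under that assumption; the field and function names are illustrative, not fixed by the text.

```c
#include <stddef.h>

/* General tree node in first-child/next-sibling form: 'child' points to
   the first sub-tree, 'sibling' links the sub-trees of a common parent. */
struct tnode
{
    int info;
    struct tnode *child;
    struct tnode *sibling;
};

/* Count all nodes: this node, plus every node in its first
   sub-tree, plus every node in its sibling chain. */
int count(struct tnode *t)
{
    if (t == NULL)
        return 0;
    return 1 + count(t->child) + count(t->sibling);
}
```

For a root with two children, count() returns 3, mirroring the definition: one root plus the nodes of its disjoint sub-trees.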
The figure 5.1.1 below shows a pictorial representation of a tree.

Figure 5.1.1: Pictorial Representation of a Tree

In the above figure, node 1 is the root of the tree, nodes 2, 3, 4 and 9 are called intermediate nodes and nodes 5, 6, 7, 8, 10, 11 and 12 are its leaf nodes. It is important to note that the tree emphasizes two aspects: (i) connectedness and (ii) absence of loops or cycles. Starting from the root, the tree structure allows connectivity of the root to each of the nodes in the tree. Generally, any node can be reached from any part of the tree. Moreover, with all the branches providing links between the nodes, the tree structure guarantees that there is no set of nodes forming a closed loop or cycle.

The tree data structure is widely used in the field of computer science. The following points show its use in many applications:
• Folder/directory structure in operating systems like Windows and Linux
• Network routing
• Syntax trees used in compilers, etc.

Example: The figure 5.1.2 below shows the directory structure in Windows OS.

Figure 5.1.2: Directory Structure in Windows OS

The figure 5.1.3 below shows the directory structure in Linux OS.

Figure 5.1.3: Directory Structure in Linux OS

Advantages of trees
• Trees reflect structural relationships in the data
• Trees are used to represent hierarchies
• Trees provide efficient insertion and searching
• Trees are very flexible, allowing sub-trees to be moved around with minimum effort

Did you know?
Portable Document Format (PDF) is a tree-based format. It has a root node followed by a catalog node (these are often the same) followed by a pages node which has several child page nodes. Producers/consumers often use a balanced tree implementation to store a document in memory.
Self-assessment Questions

1) A tree is a _____________ type of data structure.
a) Linear
b) Advanced
c) Non-Linear
d) Data Driven

2) A tree with no nodes is called an empty tree.
a) True
b) False

3) Nodes in a tree can be infinite.
a) True
b) False

5.1.2 Tree Terminologies

Terminologies give names to the parts of a data structure. This section therefore covers the various terminologies that are used to define and identify the components of a tree; these make it easier for us to analyze trees and to solve various complex problems with them. Some of the most important terminologies are root, node, degree of a node or a tree, terminal nodes, siblings, levels, edge, path, depth, parent node and ancestor of a node.

(i) Root

Definition: In trees, the origin node or the first node is called the root. This is the node from which all the other nodes evolve, and it is sometimes referred to as a seed node. In any given tree there is exactly one root node, and the entire structure of the tree is built on this node.

The figure 5.1.4 shows the root node in a tree data structure.

Figure 5.1.4: Root Node Representation

(ii) Node

Definition: A node is a data point which forms a unit in a tree data structure. In real-world applications, a node can be a computer system connected to a network, which is in turn connected to a computer server. Different nodes connect together in a specific structure to form a tree data structure.

(iii) Degree of a Node or a Tree

Definition: The degree of a node is the total number of child nodes connected to that node. The highest degree of a node among all the nodes is called the 'Degree of the Tree'. The degree of a node varies, as at times a node may be connected to more than two nodes. In the case of a binary tree, however, the degree of a node is at most 2.

Figure 5.1.5: Degree of a Node

(iv) Terminal Node

Definition: A node having no child node is called a terminal node or a leaf node. These nodes are also called external nodes.
These are the nodes having no further connectivity, and they form the leaves of the tree. Figure 5.1.6 below shows leaf nodes.

Figure 5.1.6: Leaf Nodes

(v) Non-terminal Nodes

Definition: All the intermediate nodes which have at least one child node are called non-terminal nodes or internal nodes. Non-terminal nodes always have a degree greater than zero.

Figure 5.1.7 below shows internal or non-terminal nodes.

Figure 5.1.7: Internal or Non-Terminal Nodes

(vi) Siblings

Definition: Nodes which belong to the same parent are called siblings. In simple words, the children of a parent are called sibling nodes.

The following Figure 5.1.8 shows sibling nodes in a tree.

Figure 5.1.8: Sibling Nodes

Here B and C are siblings; D, E and F are siblings; G and H are siblings; I and J are siblings.

(vii) Level

In a tree data structure, the root node is said to be at Level 0, the children of the root node are at Level 1, the children of the nodes which are at Level 1 are at Level 2, and so on. In simple words, in a tree each step from top to bottom is called a level, and the level count starts at 0 and is incremented by one at each step.

Figure 5.1.9 below shows levels in a tree.

Figure 5.1.9: Levels in the Tree

(viii) Edge

Definition: The connecting link between two nodes is called an edge. An edge is basically a representation of the links which help us understand which node is connected to which other node in the tree. It also helps us determine whether a node is a leaf node. A tree with N nodes has exactly N-1 edges.

Figure 5.1.10 below shows the representation of an edge.

Figure 5.1.10: Representation of an Edge

(ix) Path

In a tree data structure, the sequence of nodes and edges from one node to another node is called the path between those two nodes.
The length of a path is the total number of nodes in that path. In the example below, the path A - B - E - J has length 4.

Figure 5.1.11 shows a path in the tree.

Figure 5.1.11: Path in the Tree

(x) Depth

In a tree data structure, the total number of edges from the root node to a particular node is called the depth of that node. The total number of edges from the root node to a leaf node along the longest path is said to be the depth of the tree. In simple words, the highest depth of any leaf node in a tree is the depth of that tree. The depth of the root node is 0.

Figure 5.1.12 below shows paths in a tree.

Figure 5.1.12: Paths in a Tree

In any tree, a 'path' is a sequence of nodes and edges between two nodes. Here the path between A and J is A - B - E - J, and the path between C and K is C - G - K.

(xi) Parent Node

Definition: Parent is the converse notion of child: a node is the parent of the nodes directly connected below it. In figure 5.1.13 below, 2 is the parent node of 7 and 5.

Figure 5.1.13: General Tree Architecture

(xii) Ancestor of a Node

Definition: A node reachable by repeatedly proceeding from child to parent is called an ancestor of that node. In figure 5.1.13, 2 is an ancestor node of 7, 5, 12, 6, 9, 15, 11 and 4.

Self-assessment Questions

4) In trees, the origin node or the first node is the
a) Intermediate node
b) Root node
c) Leaf node
d) Ancestor node

5) A node having no child node is called a ___________
a) Intermediate node
b) Root node
c) Leaf node
d) Ancestor node

6) A node reachable by repeatedly proceeding from child to parent is called an
a) Intermediate node
b) Root node
c) Leaf node
d) Ancestor node

5.1.3 Types of Trees

Based on their structure, properties and applications, trees can be of various types. Some of the most important types of tree data structures are discussed in this section. The following are the three most common and important types, which are widely used in different applications of computer science:

1. Binary trees
2.
Binary search trees
3. Complete binary trees

(i) Binary Trees

A binary tree is a type of tree in which each node can have at most two child nodes. The figure 5.1.14 below shows a binary tree consisting of a root node and two child nodes or sub-trees, Tl and Tr. Either of these two sub-trees can also be empty.

Figure 5.1.14: Generic Binary Tree

One of the important properties of binary trees is that the depth of the average binary tree is considerably smaller than n. Analysis shows that the average depth is O(√n), and for a special type of binary tree called the binary search tree, the average depth is O(log n). In fact, the depth can be as large as n - 1, as the example in Figure 5.1.15 shows. Here, the depth of the tree is 4 and the height of the tree is also 4.

Figure 5.1.15: Worst-case Binary Tree

Most of the rules that apply to linked lists also apply to trees. In particular, when performing an insertion operation, a node must be created by allocating memory with the malloc function, and the memory allocated to nodes can be freed after their deletion by calling the free function. We could draw binary trees using the rectangular boxes that are customary for linked lists, but trees are generally drawn as circles connected by lines, because they are actually graphs. We also do not explicitly draw NULL pointers when referring to trees, because every binary tree with n nodes would require n + 1 NULL pointers.

Terminologies of a binary tree:
• The depth of a node is the number of edges from the root to the node.
• The height of a node is the number of edges from the node to the deepest leaf.
• The height of a tree is the height of its root.
• A full binary tree is a binary tree in which each node has exactly zero or two children.
• A complete binary tree is a binary tree which is completely filled, with the possible exception of the bottom level, which is filled from left to right.

(ii) Binary Search Trees

Binary search trees (BSTs) are binary trees which are aimed at providing an efficient way of searching, sorting and retrieving data. A binary search tree can be defined as a binary tree that is either empty or in which every node has a key (within its data entry) and satisfies the following conditions:
• The key of the root (if it exists) is greater than the key in any node in the left sub-tree of the root.
• The key of the root (if it exists) is less than the key in any node in the right sub-tree of the root.
• The left and right sub-trees of the root are again binary search trees.

In the above definition, properties 1 and 2 describe the ordering relative to the key of the root node, and property 3 extends this to all the nodes of the tree; therefore we can easily exploit the recursive structure of the binary tree. After examining the root of the tree, we move to either the left or the right sub-tree, which in turn is a binary search tree, so we can use the same method again on this smaller tree.

This algorithm is used in such a way that no two entries in the binary search tree can have equal keys, as the keys of the left sub-tree are always smaller than that of the root and those of the right sub-tree are always greater. It is possible to change the definition to allow entries with equal keys; however, doing so makes the algorithms somewhat more complicated. Therefore, we always assume: no two entries in a binary search tree may have equal keys.

Insertion in a Binary Search Tree:
• Check whether the root node is present (i.e., whether the tree exists). If the root is NULL, create the root node.
• If the element to be inserted is less than the element present in the root node, traverse the left sub-tree recursively until we reach a node whose left/right pointer is NULL, and place the new node at T->left (key in new node < key in T) or T->right (key in new node > key in T).
• If the element to be inserted is greater than the element present in the root node, traverse the right sub-tree recursively in the same way and place the new node at T->left/T->right.

Algorithm for insertion in a Binary Search Tree:

TreeNode *insert(int data, TreeNode *T)
{
    if (T == NULL) {
        /* Allocate memory for the new node and load the data into it */
        T = (TreeNode *)malloc(sizeof(TreeNode));
        T->data = data;
        T->left = NULL;
        T->right = NULL;
    }
    else if (data < T->data) {
        /* The node belongs in the left sub-tree, so recursively
           traverse it to find the place for the new node */
        T->left = insert(data, T->left);
    }
    else if (data > T->data) {
        /* The node belongs in the right sub-tree, so recursively
           traverse it to find the place for the new node */
        T->right = insert(data, T->right);
    }
    return T;
}

Example:

Insert 30 into the binary search tree. The tree is not available, so create the root node and place 30 in it.

30

Insert 35 into the given binary search tree. 35 > 30 (data in root), so 35 needs to be inserted in the right sub-tree of 30.

30
  \
   35

Insert 20 into the given binary search tree. 20 < 30 (data in root), so 20 needs to be inserted in the left sub-tree of 30.

   30
  /  \
20    35

Insert 15 into the given binary search tree.

      30
     /  \
   20    35
   /
 15

Inserting 25.

      30
     /  \
   20    35
  /  \
15    25

Inserting 27.

      30
     /  \
   20    35
  /  \
15    25
        \
         27

Inserting 32, then 40, then 38.

      30
     /  \
   20     35
  /  \   /  \
15    25 32  40
        \    /
        27  38

Deletion in a Binary Search Tree:

How do we delete a node from a binary search tree?
There are three different cases that need to be considered when deleting a node from a binary search tree:

Case 1: a node with no children (a leaf node)
Case 2: a node with one child
Case 3: a node with two children

      30
     /  \
   20     35
  /  \   /  \
15    25 32  40
        \    /
        27  38

Case 1: Delete a leaf node (a node with no children). Delete 38 from the above binary search tree.

      30
     /  \
   20     35
  /  \   /  \
15    25 32  40
        \
        27

Case 2: Delete a node with one child. Delete 25 from the above binary search tree.

      30
     /  \
   20     35
  /  \   /  \
15    27 32  40

Case 3: Delete a node with two children. The deleted node is replaced by the smallest node in its right sub-tree (here, 25 is the smallest node present in the right sub-tree of 20).

      30
     /  \
   20     35
  /  \   /  \
15    25 32  40
        \
        27

Delete 20 from the above binary search tree. Find the smallest node in the right sub-tree of 20. So, replace 20 with 25.

      30
     /  \
   25     35
  /  \   /  \
15    27 32  40

Delete 30 from the below binary search tree.

      30
     /  \
   20     35
  /  \   /  \
15    25 32  40
           \
            34

Find the smallest node in the right sub-tree of 30, which is 32. So, replace 30 with 32. Since 32 has only one child (34), the pointer currently pointing to 32 is made to point to 34. The resultant binary search tree is shown below.

      32
     /  \
   20     35
  /  \   /  \
15    25 34  40

(iii) Complete Binary Tree

If the nodes of a complete binary tree are labeled in sequence, starting with 1, then each node is exactly as many levels above the leaves as the highest power of 2 that divides its label. The following figure 5.1.16 depicts how a complete binary tree looks.

Figure 5.1.16: An Example of a Complete Binary Tree

In the above figure 5.1.16, the nodes are labeled as per the sequence of an inorder traversal: the leftmost extreme node is chosen first, and the tree is traversed as per the sequence of numbers labeled adjacent to the nodes.
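The three deletion cases above can be sketched in C. This is a minimal illustration, assuming the node layout used elsewhere in the chapter; the function names `bstInsert`, `bstDelete` and `minNode` are chosen here for clarity and are not from the text.

```c
#include <stdlib.h>

struct node {
    int data;
    struct node *left, *right;
};

/* Standard BST insertion, as described earlier in this chapter. */
struct node *bstInsert(struct node *t, int data)
{
    if (t == NULL) {
        t = malloc(sizeof *t);
        t->data = data;
        t->left = t->right = NULL;
    } else if (data < t->data) {
        t->left = bstInsert(t->left, data);
    } else if (data > t->data) {
        t->right = bstInsert(t->right, data);
    }
    return t;
}

/* Smallest key in a non-empty BST: keep going left. */
static struct node *minNode(struct node *t)
{
    while (t->left != NULL)
        t = t->left;
    return t;
}

struct node *bstDelete(struct node *t, int data)
{
    if (t == NULL)
        return NULL;
    if (data < t->data) {
        t->left = bstDelete(t->left, data);
    } else if (data > t->data) {
        t->right = bstDelete(t->right, data);
    } else if (t->left != NULL && t->right != NULL) {
        /* Case 3: two children - copy in the smallest key of the
           right sub-tree, then delete that key from the sub-tree. */
        t->data = minNode(t->right)->data;
        t->right = bstDelete(t->right, t->data);
    } else {
        /* Cases 1 and 2: zero or one child - splice the node out. */
        struct node *old = t;
        t = (t->left != NULL) ? t->left : t->right;
        free(old);
    }
    return t;
}
```

Note how case 3 reduces to case 1 or case 2: the smallest node of the right sub-tree has no left child, so deleting it never recurses into the two-children branch again.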
Self-assessment Questions

7) How many child nodes can a node in a binary tree have at most?
a) One
b) Two
c) Three
d) Four

8) The maximum depth of a binary tree with n nodes can be _________
a) n+1
b) n
c) n-1
d) 0

9) Binary search trees are aimed at providing efficient calculations.
a) True
b) False

5.1.4 Heap

In computer science, a heap is a specialized tree-based data structure that satisfies the heap property: if A is a parent node of B, then the key of node A is ordered with respect to the key of node B, with the same ordering applying across the whole heap. Like binary trees, a heap has two properties: the structure property and the heap order property.

A heap is basically a binary tree which is completely filled from left to right (with a possible exception at the bottom level). Such a tree is also referred to as a complete binary tree. Figure 5.1.17 shows an example of a complete binary tree.

Figure 5.1.17: Complete Binary Tree

In the array representation of a complete binary tree, for any element at position i, its left child is at position 2i and its right child is in the cell after the left child, at position 2i+1. Therefore, no pointers are required, and tree traversal is very simple and fast on many computers. The only limitation of this type of implementation is that the heap size must be estimated in advance, which is not a major issue.

(i) Heap Order Property

This property allows operations to be performed quickly on binary trees. Suppose the operation is to find the minimum element quickly; then it makes logical sense to have the smallest element at the root. If we require that every sub-tree should also be a heap, then any node should be smaller than its descendants. Applying this logic to a binary tree gives the heap order property: for each node X in the heap, the key in the parent of X is smaller than (or equal to) the key in X, except for the root, which has no parent.
In Figure 5.1.18, the tree on the left is a heap, but the tree on the right is not (the dashed line shows the violation of heap order). As usual, we will assume that the keys are integers, although they could be arbitrarily complex.

Figure 5.1.18: Two Complete Trees (only the Left Tree is a Heap)

On similar lines, a max heap can be declared, which can efficiently find and remove the maximum element, by reversing the heap order property. Thus, a priority queue can be used to find either a minimum or a maximum, but this needs to be decided ahead of time. By the heap order property, the minimum element can always be found at the root. Thus, we get the extra operation, find_min, in constant time.

A min-heap is a binary tree such that:
- the data contained in each node is less than (or equal to) the data in that node's children;
- the binary tree is complete.

The following figure 5.1.19 depicts the min heap property.

Figure 5.1.19: Min Heap

A max-heap is a binary tree such that:
- the data contained in each node is greater than (or equal to) the data in that node's children;
- the binary tree is complete.

The following figure 5.1.20 depicts the max heap property.

Figure 5.1.20: Max Heap

A binary heap is a heap data structure created using a binary tree. A binary heap obeys two rules:
1. A binary heap has to be a complete binary tree at all levels except the last level. This is called the shape property.
2. All nodes are either greater than or equal to (max-heap) or less than or equal to (min-heap) each of their child nodes. This is called the heap property.

Implementation:
• Use an array to store the data.
• Start storing from index 1, not 0.
• For any given node at position i:
  • its left child is at position 2*i, if available;
  • its right child is at position 2*i+1, if available;
  • its parent node is at position i/2, if available.

Min heap: here the value of the root is less than or equal to either (left or right) of its children. The following figure 5.1.21 shows a min heap example.
Figure 5.1.21: Min Heap Example

Max heap: here the value of the root is greater than or equal to either (left or right) of its children. The following figure 5.1.22 shows a max heap example.

Figure 5.1.22: Max Heap Example

Below is the Max-Heapify algorithm used for maintaining the heap property:

Max-Heapify(A, i)
// Input: A: an array where the left and right children of i root heaps (but i may not), i: an array index
// Output: A modified so that i roots a heap
// Running Time: O(log n) where n = heap-size[A] − i
l ← Left(i)
r ← Right(i)
if l ≤ heap-size[A] and A[l] > A[i]
    largest ← l
else
    largest ← i
if r ≤ heap-size[A] and A[r] > A[largest]
    largest ← r
if largest != i
    exchange A[i] and A[largest]
    Max-Heapify(A, largest)

Heap build algorithm:

BUILD-HEAP(A)
// Input: A: an (unsorted) array
// Output: A modified to represent a heap.
// Running Time: O(n) where n = length[A]
heap-size[A] ← length[A]
for i ← ⌊length[A]/2⌋ downto 1
    Max-Heapify(A, i)

The following figure 5.1.23 shows an example of building a heap.

Figure 5.1.23: Example for Building a Heap

(ii) Heap Sort

The heap sort algorithm starts by using BUILD-HEAP to build a heap on the input array A[1 . . n], where n = length[A]. Since the maximum element of the array is stored at the root A[1], it can be put into its correct final position by exchanging it with A[n]. If we now "discard" node n from the heap (by decrementing heap-size[A]), we observe that A[1 . . (n - 1)] can easily be made into a heap. The children of the root remain heaps, but the new root element may violate the heap property. All that is needed to restore the heap property, however, is one call to MAX-HEAPIFY(A, 1), which leaves a heap in A[1 . . (n - 1)]. The heap sort algorithm then repeats this process for the heap of size n - 1 down to a heap of size 2.
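As a sketch, the Max-Heapify and BUILD-HEAP pseudocode above can be written in C using the 1-based array convention from the implementation notes (the heap occupies a[1..n] and a[0] is unused); the function names here are illustrative.

```c
/* Swap two integers through pointers. */
static void swap(int *x, int *y)
{
    int t = *x;
    *x = *y;
    *y = t;
}

/* Restore the max-heap property at index i, assuming the sub-trees
   rooted at its children already satisfy it. Runs in O(log n). */
void maxHeapify(int a[], int heapSize, int i)
{
    int l = 2 * i;          /* left child  */
    int r = 2 * i + 1;      /* right child */
    int largest = i;

    if (l <= heapSize && a[l] > a[largest])
        largest = l;
    if (r <= heapSize && a[r] > a[largest])
        largest = r;
    if (largest != i) {
        swap(&a[i], &a[largest]);
        maxHeapify(a, heapSize, largest);  /* sift the value down */
    }
}

/* Turn an unsorted a[1..n] into a max heap in O(n), working from
   the last internal node (n/2) back up to the root. */
void buildMaxHeap(int a[], int n)
{
    for (int i = n / 2; i >= 1; i--)
        maxHeapify(a, n, i);
}
```

Leaves (indices above n/2) are one-element heaps already, which is why BUILD-HEAP can start at ⌊n/2⌋ and work backwards.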
Program for heap sort:

/*
 * C program to sort an array based on the heap sort algorithm (max heap)
 */
#include <stdio.h>

int main()
{
    int heap[10], n, i, j, cnt, root, temp;

    printf("\nEnter the number of elements: ");
    scanf("%d", &n);
    printf("\nEnter the elements: ");
    for (i = 0; i < n; i++)
        scanf("%d", &heap[i]);

    /* Build a max heap by sifting each new element up towards the root */
    for (i = 1; i < n; i++) {
        cnt = i;
        do {
            root = (cnt - 1) / 2;
            if (heap[root] < heap[cnt]) {   /* parent smaller: swap up */
                temp = heap[root];
                heap[root] = heap[cnt];
                heap[cnt] = temp;
            }
            cnt = root;
        } while (cnt != 0);
    }

    printf("Heap array elements are: ");
    for (i = 0; i < n; i++)
        printf("%d ", heap[i]);

    /* Repeatedly swap the maximum with the rightmost leaf element
       and rearrange the remaining elements into a max heap */
    for (j = n - 1; j >= 0; j--) {
        temp = heap[0];
        heap[0] = heap[j];
        heap[j] = temp;
        root = 0;
        do {
            cnt = 2 * root + 1;             /* left child of root */
            if (cnt < j - 1 && heap[cnt] < heap[cnt + 1])
                cnt++;                      /* pick the larger child */
            if (cnt < j && heap[root] < heap[cnt]) {
                temp = heap[root];
                heap[root] = heap[cnt];
                heap[cnt] = temp;
            }
            root = cnt;
        } while (cnt < j);
    }

    printf("\nThe sorted array elements after heap sort are: ");
    for (i = 0; i < n; i++)
        printf("%d ", heap[i]);
    return 0;
}

The HEAPSORT procedure takes time O(n lg n), since the call to BUILD-HEAP takes time O(n) and each of the n - 1 calls to MAX-HEAPIFY takes time O(lg n). The figure 5.1.24 shows heap sort.

Figure 5.1.24: Heap Sort

Did you know?
The data structure "heap" is used in various places. A heap is used for dynamic memory allocation wherever it is needed. There are two types of heap, the "ascending heap" and the "descending heap". In an ascending heap the root is the smallest element, and in a descending heap the root is the largest element of the complete or almost complete binary tree.

Self-assessment Questions

9) If A is a parent node of B, then the key of node A is ordered with respect to the key of node B, with the same ordering applying across the heap.
a) True
b) False

10) As per the min-heap order property, it makes logical sense to have the smallest element at the __________
a) Leaf or terminal nodes
b) Intermediate nodes
c) Root node
d) Non-terminal nodes

11) We can easily implement a priority queue using the heap order property.
a) True
b) False

5.1.5 Binary Tree

As discussed in section 5.1.3, a binary tree is a type of tree in which each node can have at most two child nodes; that is, the maximum degree of a node in a binary tree is 2. Let us discuss how a binary tree can be represented in the form of an array, and also how to create binary trees.

(i) Array Representation

A binary tree can be represented as a single-dimensional array as well as a two-dimensional array called an adjacency matrix.

Adjacency matrix representation: A two-dimensional array can be used to store the adjacency relations very easily and can be used to represent a binary tree. In this representation, to represent a binary tree with n vertices we use an n×n matrix.

Figure 5.1.25(a) shows a binary tree and Figure 5.1.25(b) shows its adjacency matrix representation.

Figure 5.1.25: Representation of a Binary Tree in the Form of an Adjacency Matrix

From the above representation, we can see that the storage space utilization is not efficient. Let 'n' be the number of vertices. The space allocated is an n x n matrix, i.e., we have n^2 locations allocated, but only n-1 entries in the matrix (one per edge). Therefore, the percentage of space utilization is calculated as follows:

    utilization = ((n - 1) / n^2) × 100%

The percentage of space utilized decreases as n increases, and for large 'n' it becomes negligible. Therefore, this way of representing a binary tree is not efficient in terms of memory utilization.

Single-dimensional array representation: Since the two-dimensional array is a sparse matrix, we can consider the prospect of mapping it onto a single-dimensional array for better space utilization.
In this representation, we have to note the following points:
• The left child of the ith node is placed at the (2i)th position.
• The right child of the ith node is placed at the (2i+1)th position.
• The parent of the ith node is at the (i/2)th position in the array.

If l is the depth of the binary tree, then the number of possible nodes in the binary tree is 2^(l+1) - 1. Hence it is necessary to have 2^(l+1) - 1 locations allocated to represent the binary tree. If 'n' is the number of nodes, then the percentage of utilization is

    utilization = (n / (2^(l+1) - 1)) × 100%

Figure 5.1.26 shows a binary tree and Figure 5.1.27 shows its one-dimensional array representation.

Figure 5.1.26: A Binary Tree

Figure 5.1.27: One-dimensional Array Representation

For a complete and full binary tree there is 100% utilization, and there is maximum wastage if the binary tree is right-skewed or left-skewed, where only l+1 locations are utilized out of the 2^(l+1) - 1 locations. An important observation to be made here is that the organization of the data in the binary tree decides the space utilization of the representation used.

(ii) Creation of a Binary Tree

The following is an algorithm for the creation of a binary tree from its inorder and preorder traversals:

Step 1: Pick an element from Preorder. Increment a Preorder index variable (preIndex in the code below) to pick the next element in the next recursive call.
Step 2: Create a new tree node tNode with the picked element as its data.
Step 3: Find the picked element's index in Inorder. Let the index be inIndex.
Step 4: Call buildTree for the elements before inIndex and make the built tree the left sub-tree of tNode.
Step 5: Call buildTree for the elements after inIndex and make the built tree the right sub-tree of tNode.
Step 6: Return tNode.

Below is the C program for implementing the above algorithm.
struct node *buildTree(char in[], char pre[], int inStrt, int inEnd)
{
    static int preIndex = 0;
    struct node *tNode;

    if (inStrt > inEnd)
        return NULL;

    /* Pick the current node from Preorder using preIndex (Step 1)
       and create a node with that data (Step 2) */
    tNode = newNode(pre[preIndex++]);

    /* If this node has no children, return it */
    if (inStrt == inEnd)
        return tNode;

    /* Find this node's index in the Inorder traversal (Step 3) */
    int inIndex = search(in, inStrt, inEnd, tNode->data);

    /* Using the index in the Inorder traversal, construct the left
       and right sub-trees (Steps 4 and 5) */
    tNode->left = buildTree(in, pre, inStrt, inIndex - 1);
    tNode->right = buildTree(in, pre, inIndex + 1, inEnd);

    return tNode;
}

/* UTILITY FUNCTIONS */

/* Function to find the index of value in arr[start...end].
   The function assumes that value is present in in[]. */
int search(char arr[], int strt, int end, char value)
{
    int i;
    for (i = strt; i <= end; i++) {
        if (arr[i] == value)
            return i;
    }
    return -1;  /* not found (cannot happen for valid input) */
}

/* Helper function that allocates a new node with the given data
   and NULL left and right pointers. */
struct node *newNode(char data)
{
    struct node *node = (struct node *)malloc(sizeof(struct node));
    node->data = data;
    node->left = NULL;
    node->right = NULL;
    return node;
}

Program:

#include <stdio.h>
#include <stdlib.h>
#include <conio.h>   /* Turbo C header; provides clrscr() and getch() */

typedef struct tnode
{
    int data;
    struct tnode *right, *left;
} TNODE;

TNODE *CreateBST(TNODE *, int);
void Inorder(TNODE *);
void Preorder(TNODE *);
void Postorder(TNODE *);

int main()   /* Main Program */
{
    TNODE *root = NULL;
    int opn, elem, n, i;
    do
    {
        clrscr();
        printf("\n ### Binary Search Tree Operations ### \n\n");
        printf("\n Press 1-Creation of BST");
        printf("\n       2-Traverse in Inorder");
        printf("\n       3-Traverse in Preorder");
        printf("\n       4-Traverse in Postorder");
        printf("\n       5-Exit\n");
        printf("\n Your option ? ");
        scanf("%d", &opn);
        switch (opn)
        {
        case 1:
            root = NULL;
            printf("\n\nBST for How Many Nodes ? ");
            scanf("%d", &n);
            for (i = 1; i <= n; i++)
            {
                printf("\nRead the Data for Node %d ? ", i);
                scanf("%d", &elem);
                root = CreateBST(root, elem);
            }
            printf("\nBST with %d nodes is ready to Use!!\n", n);
            break;
        case 2:
            printf("\n BST Traversal in INORDER \n");
            Inorder(root);
            break;
        case 3:
            printf("\n BST Traversal in PREORDER \n");
            Preorder(root);
            break;
        case 4:
            printf("\n BST Traversal in POSTORDER \n");
            Postorder(root);
            break;
        case 5:
            printf("\n\n Terminating \n\n");
            break;
        default:
            printf("\n\nInvalid Option !!! Try Again !! \n\n");
            break;
        }
        printf("\n\n\n\n Press a Key to Continue . . . 
");
        getch();
    } while (opn != 5);
    return 0;
}

TNODE *CreateBST(TNODE *root, int elem)
{
    if (root == NULL)
    {
        root = (TNODE *)malloc(sizeof(TNODE));
        root->left = root->right = NULL;
        root->data = elem;
        return root;
    }
    else
    {
        if (elem < root->data)
            root->left = CreateBST(root->left, elem);
        else if (elem > root->data)
            root->right = CreateBST(root->right, elem);
        else
            printf(" Duplicate Element !! Not Allowed !!!");
        return root;
    }
}

void Inorder(TNODE *root)
{
    if (root != NULL)
    {
        Inorder(root->left);
        printf(" %d ", root->data);
        Inorder(root->right);
    }
}

void Preorder(TNODE *root)
{
    if (root != NULL)
    {
        printf(" %d ", root->data);
        Preorder(root->left);
        Preorder(root->right);
    }
}

void Postorder(TNODE *root)
{
    if (root != NULL)
    {
        Postorder(root->left);
        Postorder(root->right);
        printf(" %d ", root->data);
    }
}

Self-assessment Questions

12) A tree can only be represented as a two-dimensional array.
a) True
b) False

13) The two-dimensional (adjacency matrix) representation of a binary tree is a sparse matrix.
a) True
b) False

5.1.6 Traversal of Binary Tree

A traversal of a binary tree visits its nodes in a particular, repeatable order, rendering a linear order of the nodes or of the information represented by them. There are three simple ways to traverse a tree, called preorder, inorder, and postorder. In each technique, the left sub-tree is traversed recursively, the right sub-tree is traversed recursively, and the root is visited; what distinguishes the techniques from one another is the order of those three tasks. The following sections discuss these three ways of traversing a binary tree.

(i) Preorder Traversal

In this traversal, the nodes are visited in the order root, left child and then right child:
• Process the root node first.
• Traverse the left sub-tree.
• Traverse the right sub-tree.

Repeat the same for each of the left and right sub-trees encountered. Here, the leaf nodes represent the stopping criterion.
The pre-order traversal sequence for the binary tree shown in Figure 5.1.28 is: A B D E H I C F G.

Figure 5.1.28: A Binary Tree

Consider the following example. The figure 5.1.29 below shows a pre-order traversal example.

Figure 5.1.29: Pre-order Traversal Example

The pre-order traversal for the above tree is 1->2->4->5->3.

(ii) Inorder Traversal

In this traversal, the nodes are visited in the order left child, root and then right child; i.e., the left sub-tree is traversed first, then the root is visited and then the right sub-tree is traversed. The function must perform only three tasks:
• Traverse the left sub-tree.
• Process the root node.
• Traverse the right sub-tree.

Remember that visiting a node means doing something with it: displaying it, writing it to a file, and so on.

The inorder traversal sequence for the binary tree shown in Figure 5.1.28 is: D B H E I A F C G.

Consider the following example. The figure 5.1.30 below shows an in-order traversal example.

Figure 5.1.30: In-order Traversal Example

The in-order traversal for the above tree is 4->2->5->1->3.

(iii) Postorder Traversal

In this traversal, the nodes are visited in the order left child, right child and then the root; i.e., the left sub-tree is traversed first, then the right sub-tree is traversed and finally the root is visited. The function must perform the following tasks:
• Traverse the left sub-tree.
• Traverse the right sub-tree.
• Process the root node.

The post-order traversal sequence for the binary tree shown in Figure 5.1.28 is: D H I E B F G C A.

Consider the following example. The figure 5.1.31 below shows a post-order traversal example.

Figure 5.1.31: Post-order Traversal Example

The post-order traversal for the above tree is 4->5->2->3->1.

Self-assessment Questions

14) A traversal of a binary tree is the order in which we visit the nodes of the tree.
a) True
b) False
15) In preorder traversal, nodes are visited in the order of __________
a) Left child – Root – Right child
b) Left child – Right child – Root
c) Root – Left child – Right child
d) Root – Right child – Left child

Summary
o A tree can be defined as a non-linear data structure consisting of a root node and other nodes present at different levels, forming a hierarchy.
o There are three different types of trees: binary trees, binary search trees and complete binary trees.
o A binary tree is a type of tree in which a node can have at most two child nodes.
o A binary search tree is a binary tree that is either empty or in which every node has a key, the keys in the left sub-tree are smaller than the key of the root, the keys in the right sub-tree are larger than the key of the root, and both sub-trees are themselves binary search trees.
o A complete binary tree is a binary tree in which every level of the tree is completely filled, except possibly the last level, which is filled from the left.
o A heap is a specialized tree-based data structure that satisfies the heap property, which makes it suitable for implementing priority queues.
o There are various terminologies used for identification and analysis of various types of trees, like root, nodes, degree of node/tree, terminal nodes, non-terminal nodes, siblings, level, edge, path, depth, parent node and ancestral node.
o A binary tree can be implemented in both one-dimensional and two-dimensional arrays.
o A binary tree can be traversed using three orders of traversal: inorder, preorder and postorder.

Terminal Questions
1. List and explain the advantages of a tree.
2. Explain the different types of trees.
3. Explain the various terminologies used in the context of a tree.
4. Explain the different types of tree traversals.

Answer Keys
Self-assessment Questions: 1: c, 2: a, 3: b, 4: b, 5: c, 6: d, 7: b, 8: a, 9: c, 10: a, 11: c, 12: b, 13: a, 14: a, 15: a, 16: c

Activity
Activity Type: Offline
Duration: 15 Minutes
Description: Ask all the students to solve the given problem: convert the array [10, 26, 52, 76, 13, 8, 3, 33, 60, 42] into a max heap.

Bibliography
e-References
• cs.cmu.edu, (2016). Binary Trees.
Retrieved on 19 April 2016, from, https://www.cs.cmu.edu/~adamchik/15-121/lectures/Trees/trees.html • comp.dit.ie, (2016). Heap Sort. Retrieved on 19 April 2016, from, http://www.comp.dit.ie/rlawlor/Alg_DS/sorting/heap%20sort.pdf External Resources • Kruse, R. (2006). Data Structures and program designing using ‘C’ (2nd ed.). Pearson Education. • Srivastava, S. K., & Srivastava, D. (2004). Data Structures Through C in Depth (2nd ed.). BPB Publications. • Weiss, M. A. (2001). Data Structures and Algorithm Analysis in C (2nd ed.). Pearson Education. Video Links Topic Link Tree Terminologies https://www.youtube.com/watch?v=nq7m0Gll-60 Binary Tree Traversal https://www.youtube.com/watch?v=-aIcPlIQ_MI Binary Tree Representation https://www.youtube.com/watch?v=1EsBpPmGEEE 293 Notes: 294 Chapter Table of Contents Chapter 5.2 Graph Fundamentals Aim ..................................................................................................................................................... 295 Instructional Objectives................................................................................................................... 295 Learning Outcomes .......................................................................................................................... 295 Introduction ...................................................................................................................................... 296 5.2.1 Definition of Graph ................................................................................................................ 296 Self-assessment Questions ...................................................................................................... 308 5.2.2 Types of Graphs ...................................................................................................................... 
309
(i) Directed Graph .................................................................................................................... 309
(ii) Undirected Graph .............................................................................................................. 312
Self-assessment Questions ...................................................................................................... 316
5.2.3 Graph Traversal ....................................................................................................................... 317
(i) Depth First Search (DFS) Traversal ................................................................................. 317
(ii) Breadth First Search (BFS) Traversal .............................................................................. 326
Self-assessment Questions ...................................................................................................... 334
Summary ........................................................................................................................................... 335
Terminal Questions .......................................................................................................................... 336
Answer Keys ...................................................................................................................................... 337
Activity ............................................................................................................................................... 338
Case study .......................................................................................................................................... 339
Bibliography ...................................................................................................................................... 341
e-References ......................................................................................................................................
341
External Resources ........................................................................................................................... 341
Video Links ....................................................................................................................................... 341

Aim
To educate the students about the basics of graphs and their applications

Instructional Objectives
After completing this chapter, you should be able to:
• Describe a graph and its types
• Discuss the depth first search and breadth first search algorithms
• Distinguish between different types of graphs

Learning Outcomes
At the end of this chapter, you are expected to:
• Outline the steps of depth first search and breadth first search traversal
• Write a C programme for DFS and BFS
• Compare DFS and BFS
• Outline applications of graphs

Introduction
Till now we have studied data structures like arrays, stacks, queues, linked lists, trees etc. In this chapter, we introduce an important mathematical and graphical structure called a Graph. Graphs are used in subjects like Geography, Chemistry etc. This chapter deals with the study of graphs and how to solve problems using graph theory. It also covers different types of graph traversals.

5.2.1 Definition of Graph
A Graph G is a pictorial representation of a collection of objects, consisting of vertices and edges. The interconnected objects form the vertices (or nodes), and the links that join pairs of vertices are called edges. A Graph G consists of a set V of vertices and a set E of edges, and is defined as G = (V, E), where V is a finite and non-empty set of vertices and E is a set of pairs of vertices called edges. A graph is depicted below in figure 5.2.1.

Figure 5.2.1: A Graph

Thus V(G) is the set of vertices of a graph and E(G) is the set of edges of that graph. Figure 5.2.1 shows an example of a simple graph.
In this graph:
V(G) = {V1, V2, V3, V4, V5, V6}, forming 6 vertices, and
E(G) = {E1, E2, E3, E4, E5, E6, E7}, forming 7 edges.

Graphs are one of the objects of study in discrete mathematics and are very important in the field of computer science for many real world applications, from building compilers to modelling communication networks. A graph G consists of two sets, V and E. V is a finite non-empty set of vertices. E is a set of pairs of vertices; these pairs are called edges. V(G) and E(G) represent the sets of vertices and edges of graph G. We can also write G = (V, E) to represent a graph.

Terminologies used in a Graph
1. Adjacent Vertices: A vertex v1 is said to be adjacent to another vertex v2 if there exists an edge from vertex v1 to vertex v2. Consider the graph in figure 5.2.2. In this graph, the vertices adjacent to vertex V1 are V2 and V4, whereas the vertices adjacent to V4 are V2, V1 and V5.

Figure 5.2.2: Adjacent Vertices

2. Point: A point is any position in 1-dimensional, 2-dimensional or 3-dimensional space. It is usually denoted by a letter or a dot.
3. Vertex: A vertex is defined as the node where multiple lines meet. In the above figure, V1 is a vertex, V2 is a vertex and so on.
4. Edge: An edge is a line which joins two vertices in a graph. In the above sample graph, E1 is an edge joining the two vertices V1 and V2.
5. Path: A path is a sequence of vertices in which each vertex is adjacent to the next one, starting from some vertex v. In figure 5.2.2, V1, V2, V4, V5 is a path.
6. Cycle: A cycle is a path whose first and last vertices are the same, so that it forms a loop-like structure. In the above figure, V1, V2, V4, V1 is a cycle.
7. Connected Graph: A graph is a connected graph if there is a path from any vertex to any other vertex. Consider figure 5.2.3 (a): this graph is a connected graph, as there is a path from every vertex to every other vertex.
Figure 5.2.3(b) shows an unconnected graph, as there is no edge between vertices V4 and V5. It thus forms two disconnected components of a graph.

Figure 5.2.3 (a): A Connected Graph
Figure 5.2.3 (b): An Unconnected Graph

8. Degree: The degree of a vertex is the number of edges incident on it. The degree of any vertex v is denoted as degree(v). If degree(v) is 0, there are no edges incident on vertex v; such a vertex is called an isolated vertex. Figure 5.2.4 below shows a graph and the degrees of all its vertices.

Figure 5.2.4: Degree of Vertices

9. Complete Graph: A graph is said to be a complete graph if there is an edge from every vertex to every other vertex. It is also called a fully connected graph. Figure 5.2.5 shows a complete graph.

Figure 5.2.5: A Complete or Fully Connected Graph

Graph Data Structure
Graphs can be formally defined as an abstract data type with data objects and operations on them, as follows:

Data objects: A graph G of vertices and edges. Vertices represent data or elements. Figure 5.2.6 below shows a simple graph with 5 vertices.

Figure 5.2.6: A Simple Graph

Operations
• Check-Graph-Empty (G): Check if graph G is empty; a Boolean function. The above graph has 5 vertices, so this operation returns false, as the graph is not empty.
• Insert-Vertex (G, V): Insert an isolated vertex V into a graph G. Ensure that vertex V does not exist in G before insertion. In the graph in figure 5.2.7 below, we add a new vertex named F. Until an edge is added to it, it acts as an isolated vertex.

Figure 5.2.7: Isolated Vertex

• Insert-Edge (G, u, v): Insert an edge connecting vertices u, v into a graph G. Ensure that such an edge does not exist in G before insertion. In figure 5.2.8 below, a new edge joining vertices E and F is inserted.

Figure 5.2.8: Adding a New Edge

• Delete-Vertex (G, V): Delete vertex V and all the edges incident on it from the graph G.
Ensure that such a vertex exists in the graph G before deletion. In the graph in figure 5.2.9 below, vertex D is deleted along with all the edges incident on it.

Figure 5.2.9: Vertex Deletion

• Delete-Edge (G, u, v): Delete the edge connecting the vertices u, v from the graph G. Ensure that such an edge exists before deletion. In the graph shown below (figure 5.2.10), the edge connecting vertices B and D is removed.

Figure 5.2.10: Deleting an Edge

• Store-Data (G, V, Item): Store Item in a vertex V of graph G. Once a vertex is created, its value can be added too. In the graph below (figure 5.2.11), a vertex is created and the item G is stored in that vertex.

Figure 5.2.11: Adding Data into a Vertex

• Retrieve-Data (G, V, Item): Retrieve the data of a vertex V in the graph G and return it in Item. In the example below (figure 5.2.12), we have retrieved the data from the vertex G.

Figure 5.2.12: Retrieving Data from a Vertex

• BFT (G): Perform a Breadth First Traversal of a graph. This traversal starts from the root node and explores the nodes level wise, thus exploring each node completely.
• DFT (G): Perform a Depth First Traversal of a graph. This traversal starts from a root node and visits all the adjacent nodes completely in depth before backtracking.

Representation of Graphs
Since a graph is a mathematical structure, the representation of graphs is categorised into two types, namely (i) sequential representation and (ii) linked representation. Sequential representation uses an array as its data structure, whereas linked representation uses a singly linked list as its data structure.
The sequential or matrix representations of graphs include the following methods:
• Adjacency Matrix Representation
• Incidence Matrix Representation

a) Adjacency Matrix Representation
A graph with n nodes can be represented by an n x n adjacency matrix A such that
A[i][j] = 1 if there exists an edge between nodes i and j, and
A[i][j] = 0 otherwise.

For example 1:
Figure 5.2.13 (a) Graph
Figure 5.2.13 (b) Adjacency Matrix

Explanation: The above graph contains 5 vertices: 1, 2, 3, 4 and 5.
Consider vertex 1: vertex 1 is connected to vertex 2 and vertex 5, thus A[1][2] = 1 and A[1][5] = 1. Similarly, vertex 1 is not connected to vertices 3, 4 or 1 itself, thus A[1][3] = 0, A[1][4] = 0 and A[1][1] = 0.
Consider vertex 5: vertex 5 is connected to vertex 1, vertex 2 and vertex 4, thus A[5][1] = 1, A[5][2] = 1 and A[5][4] = 1. Similarly, vertex 5 is not connected to vertex 3 or 5 itself, thus A[5][3] = 0 and A[5][5] = 0.
The same rule applies to the other vertices.

For example 2:
Figure 5.2.14 (a) Digraph
Figure 5.2.14 (b) Adjacency Matrix

Consider vertex 4: the given graph is a digraph, hence there is only a unidirectional path from 4 to 2, thus A[4][2] = 1. Since vertex 4 is not connected to any other vertex, the rest of the entries in that row are 0.

b) Incidence Matrix Representation
Let G be a graph with n vertices and e edges. Define an n x e matrix M = [mij], whose n rows correspond to the n vertices and whose e columns correspond to the e edges, as
mij = 1 if edge ej is incident upon vertex vi, and
mij = 0 otherwise.

For example:
Figure 5.2.15 (a) Undirected Graph
Figure 5.2.15 (b) Incidence Matrix

     e1  e2  e3  e4  e5  e6  e7
v1    1   0   0   0   1   0   0
v2    1   1   0   0   0   1   1
v3    0   1   1   0   0   0   0
v4    0   0   1   1   0   0   1
v5    0   0   0   1   1   1   0

Explanation: The above graph contains 5 vertices: 1, 2, 3, 4 and 5.
Consider vertex 1: 2 edges are incident on vertex 1, namely e1 and e5. Thus M[1][e1] = 1 and M[1][e5] = 1. The rest of the entries in that row are 0.
Consider vertex 4: 3 edges are incident on vertex 4, namely e3, e4 and e7. Thus M[4][e3] = 1, M[4][e4] = 1 and M[4][e7] = 1. The rest of the entries in that row are 0.
The incidence matrix contains only two elements, 0 and 1. Such a matrix is also called a binary matrix or a (0, 1)-matrix.

c) Linked Representation of Graphs
The linked representation of graphs is also referred to as the adjacency list representation, and it is comparatively efficient with regard to the adjacency matrix representation. In the linked representation, a graph is stored as a linked structure of nodes. For a graph G = (V, E), all vertices are stored in a list, and each vertex points to a singly linked list of the nodes which are adjacent to that head node.

For example 1:
Figure 5.2.16 (a) Undirected Graph
Figure 5.2.16 (b) Linked Representation of a Graph

In figure 5.2.16 (a) above, an undirected graph is shown; figure 5.2.16 (b) shows its equivalent linked list representation. The vertices adjacent to vertex 1 are vertices 2 and 5. Thus, in the linked list representation, vertex 1 is shown linked to node 2, and node 2 is linked to node 5, forming a chain-like structure. Similarly, the vertices adjacent to vertex 2 are vertices 1, 5, 3 and 4. Thus, in the linked list representation, vertex 2 is shown linked to node 1, node 1 is linked to node 5, node 5 is linked to node 3, and node 3 in turn is connected to node 4, forming a linked list of nodes.

For example 2:
Figure 5.2.17 (a) Digraph
Figure 5.2.17 (b) Linked Representation of a Graph

Did you know?
Social network graphs: to tweet or not to tweet. These are graphs that represent who knows whom, who communicates with whom, who influences whom, or other relationships in social structures. An example is the Twitter graph of who follows whom. They can be used to determine how information flows, how topics become hot, how communities develop, or even who might be a good match for who (or is that whom?).

Self-assessment Questions
1) A Graph is a mathematical representation of a set of objects consisting of ________ and ____________.
a) Vertices and indices
b) Indices and edges
c) Vertices and edges
d) Edges and links

2) In a graph, e = (u, v) means _____________.
a) u is adjacent to v but v is not adjacent to u
b) e begins at u and ends at v
c) u is a node and v is an edge
d) Both u and v are edges

3) A graph can be represented as an adjacency matrix.
a) True
b) False

4) A ________ is a particular position in a one-dimensional, two-dimensional or three-dimensional space.
a) Point
b) Node
c) Edge
d) Vertex

5.2.2 Types of Graphs
Based on the direction of their edges, graphs are categorised into two types, namely:
1. Undirected Graph
2. Directed Graph

In an undirected graph, the pair of vertices representing any edge is unordered. Thus, the pairs (v1, v2) and (v2, v1) represent the same edge. In a directed graph, each edge is represented by a directed pair <v1, v2>, where v1 is the tail and v2 is the head of the edge. Therefore <v2, v1> and <v1, v2> represent two different edges.

(i) Directed Graph
Degree of Vertex in a Directed Graph
In a directed graph, each vertex has an indegree and an outdegree. Consider the directed graph in the figure below:

Figure 5.2.18: Directed Graph

Indegree of a Vertex
The indegree of a vertex V is the number of edges which are coming into the vertex V (incoming edges). Notation: deg+(V).
In the above example directed graph, there are 5 vertices: V1, V2, V3, V4 and V5.
Consider vertex V1: V2 is connected to V1 through edge E1. The edge comes from V2 towards vertex V1. Thus the indegree of vertex V1 is 1, i.e., deg+(V1) = 1.
Consider vertex V3: V1 and V4 are connected to V3 through edges E4 and E5. Thus the indegree of vertex V3 is 2, i.e., deg+(V3) = 2.

Outdegree of a Vertex
The outdegree of a vertex V is the number of edges which are going out from the vertex V (outgoing edges). Notation: deg-(V).
In the above example directed graph, there are 5 vertices: V1, V2, V3, V4 and V5.
Consider vertex V1: V1 is connected to V3 through edge E4. The edge goes from V1 towards vertex V3.
Thus the outdegree of vertex V1 is 1, i.e., deg-(V1) = 1.

For example 1: Consider the following directed graph. Vertex 'a' has two edges, 'ad' and 'ab', which are going outwards. Hence its outdegree is 2. Similarly, there is an edge 'ga' coming towards vertex 'a'. Hence the indegree of 'a' is 1. The indegree and outdegree of the other vertices are shown in the following table.

Table 5.2.1: Indegree and Outdegree of Vertices

Vertex   Indegree   Outdegree
a        1          2
b        2          0
c        2          1
d        1          1
e        1          1
f        1          1
g        0          2

For example 2: Consider the following directed graph. Vertex 'a' has an edge 'ae' going outwards from vertex 'a'. Hence its outdegree is 1. Similarly, the graph has an edge 'ba' coming towards vertex 'a'. Hence the indegree of 'a' is 1. The indegree and outdegree of the other vertices are shown in the following table.

Table 5.2.1: Indegree and Outdegree of Vertices

Vertex   Indegree   Outdegree
a        1          1
b        0          2
c        2          0
d        1          1
e        1          1

(ii) Undirected Graph
An undirected graph is a graph in which the nodes are connected by undirected arcs. An undirected arc is an edge that has no arrow. Both ends of an undirected arc are equivalent; there is no head or tail. Therefore, we represent an edge in an undirected graph as a set rather than an ordered pair.

Definition (Undirected Graph): An undirected graph is an ordered pair G = (V, E) with the following properties:
1. The first component, V, is a finite, non-empty set. The elements of V are called the vertices of G.
2. The second component, E, is a finite set of sets. Each element of E is a set comprised of exactly two (distinct) vertices. The elements of E are called the edges of G.

For example, consider an undirected graph comprised of four vertices and four edges. The graph can be represented graphically as shown in Figure 5.2.19. The vertices are represented by appropriately labelled circles, and the edges are represented by lines that connect associated vertices.

Figure 5.2.19: An Undirected Graph

Notice that because an edge in an undirected graph is a set, {v, w} = {w, v}; and since E is also a set, it cannot contain more than one instance of a given edge. Another consequence of the definition is that there cannot be an edge from a node to itself in an undirected graph, because an edge is a set of size two and a set cannot contain duplicates.

Degree of Vertex in an Undirected Graph
In an undirected graph, each vertex has a degree: the number of edges incident on it.

For example 1: Consider the following graph. In the above undirected graph,
Figure 5.2.19: An Undefined Graph Notice that because an edge in an undefined graph is a set, , and since is also a set, it cannot contain more than one instance of a given edge. Another consequence of Definition is that there cannot be an edge from a node to itself in an undirected graph because an edge is a set of size two and a set cannot contain duplicates. Degree of Vertex in a Directed Graph In a defined graph, each vertex has an indegree and an outdegree. For example 1: Consider the following graph − In the above Undirected Graph, 313 deg(a) = 2, since there are 2 edges meeting at vertex ‘a’. deg(b) = 3, since there are 3 edges meeting at vertex ‘b’. deg(c) = 1, since there is 1 edge formed at vertex ‘c’ deg(d) = 2, since there are 2 edges meeting at vertex ‘d’. deg(e) = 0, since there are 0 edges formed at vertex ‘e’. For example 2: Consider the following graph − In the above graph, deg(a) = 2, deg(b) = 2, deg(c) = 2, deg(d) = 2, and deg(e) = 0. Tree A tree can be defined as a nonlinear data structure similar to graphs, in which all the elements are arranged in a sorted manner. A tree can be used to represent some hierarchical relations among various data elements. Trees do not contain any Cycle. Tree has a root node from where the tree structure begins. Starting from the root node the tree will have many subtrees formed by its child nodes. A node is a data element present in a tree. 314 The figure 5.2.20 given below shows a simple tree: Figure 5.2.20: A Tree Degree of any node: • Degree of a node is defined as a number of subtrees of that node. • For example, Degree of nod A is 3, whereas Degree of node H is 0 as there are no subtrees of H. Degree of a tree: • Degree of a tree is the maximum degree of any nodes in the given tree. • For example, in the above given tree, Degree of that tree is 3 as node A is having the degree 3, which is the maximum value in that tree. Did you know? Graphs are often used to represent constraints among items. 
For example the GSM network for cell phones consists of a collection of overlapping cells. Any pair of cells that overlap must operate at different frequencies. These constraints can be modeled as a graph where the cells are vertices and edges are placed between cells that overlap. 315 The weight of an edge is often referred to as the "cost" of the edge. In applications, the weight may be a measure of the length of a route, the capacity of a line, the energy required to move between locations along a route, etc. Given a weighted graph, and a designated node S, we would like to find a path of least total weight from S to each of the other vertices in the graph. The total weight of a path is the sum of the weights of its edges. Self-assessment Questions 5) An undirected graph has no directed edges a) True b) False 6) ___________ of vertex V is the number of edges which are coming into the vertex a) Degree b) Indegree c) Outdegree d) Path 7) A connected acyclic graph is called a ___________ a) Connected Graph b) Tree c) Hexagon d) Pentagon 8) ______________ of vertex V is the number of edges which are going out from the vertex 316 a) Degree b) Indegree c) Outdegree d) Path 5.2.3 Graph Traversal A Graph traversal is a method by which we visit all the nodes in any given graph. Graph traversals are required in many application areas, like searching an element in graph, finding the shortest path to any node etc. There are many methods for traversing through a graph. In this chapter we will study the following 2 methods for graph traversals. They are: 1. Depth First Search (DFS) 2. Breadth First Search (BFS) (i) Depth First Search (DFS) Traversal In this Depth First Search method, we need to start from any root node in a graph and explore all the nodes along each branch before backtracking. It means we need to explore all the unvisited graph nodes of any root node. We can use a Stack data structure to keep track of the visited nodes of a graph. Algorithm: 1. Start 2. 
Consider a Graph G=(V, E) 3. Initially mark all the nodes of graph G as unseen 4. Push Root node onto the Stack from where it begins 5. Repeat until the Stack S is empty 6. Pop the node from the Stack S 7. If this popped node is having any unseen nodes, traverse the unseen child nodes, mark them as visited and push it on stack 8. If the node is not having any unseen child nodes, Pop the node from the stack S 9. End 317 Pseudo Code: Consider a Graph G= (V, E) where V is set of vertices and E is a set of Edges. DFS(G, Root) { root: is a vertex from where the traversal begins Visited[v] is a status flag denoting that any vertex v is visited Consider Stack s used to store visited vertices For each vertex v in the graph Set Visited[v] = false; //to mark all nodes unvisited initially Push root vertex on Stack S; //Start from Root vertex While Stack is not empty { Pop element from S and put in v If (not visited[v] =true) { For every unvisited node x of v Push vertex x on Stack S } } } Example: Consider the following Graph. We will apply the above algorithm on this graph to implement Depth first Search Traversal. 318 Step 1: Initially we start from root node B. Stack S is empty. Step 2: Mark the root node B as visited and Push B on the Stack S. Step 3: Check for the nodes adjacent to node B. Only Node D is adjacent to node B. So visit that node D and Push it onto the stack S. 319 Step 4: Check for the nodes adjacent to node D. There are 3 Nodes adjacent to node D. They are B, C and E. But node B is already visited. So we need to consider either of the nodes C and E. Let us consider node C. Mark first child node C as visited and push C on stack S. Step 5: Check for the nodes adjacent to node C. There are 2 Nodes adjacent to node C. They are D and E. But node D is already visited. So select node E as the next node. Mark node E as visited and push E on stack S. 320 Step 6: Check for the nodes adjacent to node E. There are 3 Nodes adjacent to node E. They are D, C and A. 
But nodes D and C are already visited. So select node A as the next node. Mark node A as visited and push A on stack S. Step 7: Check for the nodes adjacent to node A. But there is only 1 node adjacent to A i.e. E and E is already visited. So as per the algorithm, if there are no unseen nodes then pop that node from stack. So pop out node A from stack. Hence A has no vertices left to be visited now. So we need to backtrack to node E. 321 Step 8: Check for the nodes adjacent to node E. Nodes A, C and D are adjacent. But A, C and D are already marked as visited. So as per the algorithm, if there are no unseen nodes then pop that node from stack. So pop out node E from stack. Hence E has no vertices left to be visited now. So we need to backtrack to node C. Step 9: Check for the nodes adjacent to node C. Nodes D and E are adjacent. But D and E are already marked as visited. So pop out node C from stack. So we need to backtrack to node D. 322 Step 10: Check for the nodes adjacent to node D. Nodes B, C and E are adjacent. But C, B and E are already marked as visited. So pop out node D also from stack. So we need to backtrack to node B. Step 11: Check for the nodes adjacent to node B. Node D is adjacent. But D is already marked as visited. So pop out node B(root) also from stack. Hence the stack is empty now. It means the algorithm is successfully executed. Result of DFS Traversal: B. 
D, C, E, A Program: /*C program for Depth first search graph traversal using Stack */ #include<stdio.h> char nodeid[20]; //to store nodes names char stack[50]; int temp=0; int tos=-1, nodes; //top of stack initialized to -1 char arr[20]; char dfs(int ); 323 int matrix[30][30]; void push(char val) //push vertex on stack { tos=tos+1; stack[tos]=val; } char pop() //pop vertex from stack { return stack[tos]; } void outputDFS() { printf("Depth First Traversal gives: "); for(int i=0; i<nodes; i++) printf("%c ",arr[i]); } int unVisited(char val) { for(int i=0; i<temp; i++) if(val==arr[i]) return 0; for(int i=0; i<=tos; i++) if(val==stack[tos]) return 0; return 1; } char dfs(int i) { int k; char m; if(tos==-1) { push(nodeid[i]); } m=pop(); tos=tos-1; arr[temp]=m; temp++; for(int j=0; j<nodes; j++) { if(matrix[i][j]==1) { if(unVisited(nodeid[j])) { push(nodeid[j]); } } } return stack[tos]; } 324 int main() { char v; int l=0; printf("How many nodes in graph?"); scanf("%d",&nodes); printf("Enter the names of nodes one by one: \n"); for(int i=0; i<nodes; i++) { scanf("%s",&nodeid[i]); } char root=nodeid[0]; //consider first node as root node printf("Enter the adjacency matrix. Edge present=1, else 0\n"); for(int i=0;i<nodes; i++) { for(int j=0; j<nodes; j++) { printf("matrix[%c][%c]= ", nodeid[i], nodeid[j]); scanf("%d", &v); matrix[i][j]=v; } } for(int i=0;i<nodes;i++) { l=0; while(root!=nodeid[l]) l++; root=dfs(l); } outputDFS(); } Output: 325 (ii) Breadth First Search (BFS) Traversal In this method of traversing, we select a node as a start node from where the traversal begins. It is visited and marked, and all the unvisited nodes adjacent to the next node are visited and marked in an order. Similarly, all the unvisited nodes of adjacent nodes are visited and marked until full graph nodes are covered. Algorithm: 1. Start 2. Consider a Graph G=(V, E) 3. Initially mark all the nodes of graph G as unseen 4. Add the Root node into the Queue Q from where it begins 5. 
Repeat until the Queue Q is empty 6. Remove the node from the Queue Q 326 7. If this popped node is having any unseen nodes, visit and mark all the unvisited nodes. 8. End Pseudo Code: Consider a Graph G= (V, E) where V is set of vertices and E is a set of Edges. BFS (G, Root) { Root: is a vertex from where the traversal begins Visited[v] is a status flag denoting that any vertex v is visited Consider Queue Q used to store visited vertices For each vertex v in the graph Set Visited[v] = false; //to mark all nodes unvisited initially Add Root vertex to the Queue Q; //Start from Root vertex While the Queue Q is not empty { Remove element from Q and put in v If (not visited[v] =true) { Visit and mark all the unvisited node x of v } } } Example: Consider the following Graph. We will apply the above algorithm on this graph to implement Breadth first Search Traversal. 327 Step 1: We will start from node B. Hence B is a root node. We have a queue Q for keeping track of vertices. Step 2: We will start from node B. Hence B is a root node. We have a queue Q for keeping track of vertices. Add B to the Queue, Mark B as visited. 328 Step 3: Nodes adjacent to B are A, C and D. All 3 are unvisited. But we will start from A. As B is visited, Remove B from Queue. Mark A as visited and add it to the Queue Q. Step 4: Next Node adjacent to B is C. Mark C as visited and add it to the Queue Q. Step 5: Next Node adjacent to B is D. Mark D as visited and add it to the Queue Q. 329 Step 6: Now the node B is fully explored. So, the next node to be visited is A. Nodes adjacent to node A are B, C and E. But B and C are already visited. Mark E as visited and add it to the Queue Q. But remove A from queue Q as it is fully explored. Step 7: Now the nodes adjacent to E are A, C and D. But all are marked and visited. If we see the graph, vertices E, C and D are fully visited so remove them from the queue. Hence the Queue is empty. Thus the algorithm is successfully implemented. 
Result of BFS: B, A, C, D, E

Program:

/* C program for Breadth First Search graph traversal using a queue */
#include <stdio.h>

char nodeid[20];            /* names of the nodes                   */
char queue[50];
int  temp = 0;
int  front = 0, rear = 0, nodes;  /* front and rear initialised to 0 */
char arr[20];               /* stores the traversal order           */
int  bfs(int);
int  matrix[30][30];        /* adjacency matrix                     */

void qadd(char value)       /* add a vertex at the rear of the queue */
{
    queue[rear++] = value;
}

char qremove()              /* remove the vertex at the front of the queue */
{
    return queue[front++];
}

void outputBFS()
{
    printf("Breadth First Traversal gives: ");
    for (int i = 0; i < nodes; i++)
        printf("%c ", arr[i]);
}

int unVisited(char value)   /* 1 if value has never been enqueued   */
{
    for (int i = 0; i < rear; i++)
        if (value == queue[i])
            return 0;
    return 1;
}

int bfs(int i)
{
    char r;
    int  l = 0;
    if (rear == 0)                  /* first call: enqueue the start vertex */
        qadd(nodeid[i]);
    r = qremove();                  /* dequeue the next vertex to explore   */
    arr[temp++] = r;
    while (r != nodeid[l])          /* find the matrix row of that vertex   */
        l++;
    for (int j = 0; j < nodes; j++) /* enqueue its unvisited neighbours     */
        if (matrix[l][j] == 1 && unVisited(nodeid[j]))
            qadd(nodeid[j]);
    return 0;
}

int main()
{
    int v;
    printf("How many nodes in graph? ");
    scanf("%d", &nodes);
    printf("Enter the names of nodes one by one:\n");
    for (int i = 0; i < nodes; i++)
        scanf(" %c", &nodeid[i]);
    printf("Enter the adjacency matrix. Edge present=1, else 0\n");
    for (int i = 0; i < nodes; i++)
        for (int j = 0; j < nodes; j++) {
            printf("matrix[%c][%c]= ", nodeid[i], nodeid[j]);
            scanf("%d", &v);
            matrix[i][j] = v;
        }
    for (int i = 0; i < nodes; i++)
        bfs(i);
    outputBFS();
}

Output:

Comparison: DFS versus BFS

Table 5.2.1: Comparison of DFS with BFS

Depth First Search Traversal:
• Starts from a root node and visits the adjacent nodes completely in depth before backtracking.
• May not necessarily give the shortest path in a graph.
• If a loop exists in the graph, the algorithm may go into an infinite loop; hence care should be taken to mark visited vertices.
• Applications: connectivity testing, spanning trees.

Breadth First Search Traversal:
• Starts from the root node and explores the nodes level by level, exploring each node completely before moving deeper.
• Always gives the shortest path (in number of edges) within a graph, thus giving an optimal solution.
• Visited nodes must be marked here too; marking ensures that each vertex is enqueued only once, so the search terminates even when the graph contains cycles.
• Applications: finding the shortest path, spanning trees.

Self-assessment Questions

9) Sequential representation of a binary tree uses ___________.
   a) Array with pointers
   b) Single linear array
   c) Two-dimensional arrays
   d) Three-dimensional arrays

10) In the _____________ traversal, we process all of a vertex's descendants before we move to an adjacent vertex.
   a) Depth First
   b) Breadth First
   c) Path First
   d) Root First

11) The data structure required for Breadth First Traversal on a graph is __________.
   a) Tree
   b) Stack
   c) Array
   d) Queue

12) The aim of the BFS algorithm is to traverse the graph nodes that are __________.
   a) As close as possible to the root node
   b) With high depth
   c) With large breadth
   d) With a large number of nodes

Summary

o Graphs are non-linear data structures. A graph is an important mathematical representation of a physical problem.
o Graphs and directed graphs are important to computer science for many real-world applications, from building compilers to modelling physical communication networks.
o A graph is an abstract notion of a set of nodes (vertices or points) and connection relations (edges or arcs) between them.
o The representation of graphs can be categorised as (i) sequential representation and (ii) linked representation.
o The sequential representation makes use of an array data structure, whereas the linked representation of a graph makes use of a singly linked list as its fundamental data structure.
o Depth first search (DFS) and breadth first search (BFS) are the two algorithms used for traversing and searching for a node in a graph.

Terminal Questions

1. Explain graphs as a data structure.
2. Explain two different ways of sequential representation of a graph with an example.
3.
Explain the linked representation of an undirected and a directed graph.
4. Which are the two standard ways of traversing a graph? Explain each of them with an example.
5. Consider the following specification of a graph G:
   V(G) = {4, 3, 2, 1}
   E(G) = {(2,1), (3,1), (3,3), (4,3), (1,4)}
   a) Draw the undirected graph.
   b) Draw its adjacency matrix.

Answer Keys

Self-assessment Questions

Question No.    Answer
1               c
2               d
3               a
4               a
5               a
6               b
7               b
8               c
9               b
10              a
11              d
12              a

Activity

Activity Type: Offline
Duration: 30 Minutes

Description: Ask the students to solve the given problem:
Consider the graph G with vertices V = {1, 2, 3, 4} and edges E = {(1,2), (2,3), (3,4), (4,1), (2,1), (2,4)}.
• For every vertex u, find its indegree in(u) and its outdegree out(u).
• What is the value of the following sum for this graph?

Case study

Study of different applications of Graphs

Since they are powerful abstractions, graphs can be very important in modelling data. In fact, many problems can be reduced to known graph problems. Here we outline just some of the many applications of graphs.

1. Transportation networks. In road networks, vertices are intersections and edges are the road segments between them; in public transportation networks, vertices are stops and edges are the links between them. Such networks are used by many map programs, such as Google Maps, Bing Maps and the Apple iOS 6 maps (well, perhaps without the public transport), to find the best routes between locations. They are also used for studying traffic patterns, traffic light timings, and many other aspects of transportation.

2. Utility graphs. The power grid, the Internet, and the water network are all examples of graphs where vertices represent connection points and edges the wires or pipes between them. Analysing the properties of these graphs is very important for understanding the reliability of such utilities under failure or attack, and for minimising the cost of building infrastructure that matches the required demands.

3.
Document link graphs. The best-known example is the link graph of the web, where each web page is a vertex and each hyperlink a directed edge. Link graphs are used, for example, to analyse the relevance of web pages, the best sources of information, and good link sites.

4. Protein-protein interaction graphs. Vertices represent proteins and edges represent interactions between them that carry out some biological function in the cell. These graphs can be used, for example, to study molecular pathways: chains of molecular interactions in a cellular process. Humans have over 120K proteins with millions of interactions among them.

5. Network packet traffic graphs. Vertices are IP (Internet Protocol) addresses and edges are the packets that flow between them. Such graphs are used for analysing network security, studying the spread of worms, and tracking criminal or non-criminal activity.

6. Scene graphs. In graphics and computer games, scene graphs represent the logical or spatial relationships between objects in a scene. Such graphs are very important in the computer games industry.

7. Finite element meshes. In engineering, many simulations of physical systems, such as the flow of air over a car or an airplane wing, the spread of earthquakes through the ground, or the structural vibrations of a building, involve partitioning space into discrete elements. The elements, along with the connections between adjacent elements, form a graph that is called a finite element mesh.

8. Robot planning. Vertices represent states the robot can be in and the edges the possible transitions between the states. This requires approximating continuous motion as a sequence of discrete steps. Such graph plans are used, for example, in planning paths for autonomous vehicles.

9. Neural networks. Vertices represent neurons and edges the synapses between them. Neural networks are used to understand how our brain works and how connections change when we learn.
The human brain has about 10^11 neurons and close to 10^15 synapses.

10. Graphs in quantum field theory. Vertices represent states of a quantum system and the edges the transitions between them. The graphs can be used to analyse path integrals; summing these up generates a quantum amplitude.

11. Semantic networks. Vertices represent words or concepts and edges represent the relationships among those words or concepts. These have been used in various models of how humans organise their knowledge, and of how machines might simulate such an organisation.

12. Graphs in epidemiology. Vertices represent individuals and directed edges the transfer of an infectious disease from one individual to another. Analysing such graphs has become an important component in understanding and controlling the spread of diseases.

13. Graphs in compilers. Graphs are used extensively in compilers. They can be used for type inference, for so-called data flow analysis, for register allocation and for many other purposes. They are also used in specialised compilers, such as query optimisers in database languages.

Questions:
1. List the different applications of graphs from the above study.
2. Explain in brief how graphs can be used in computer networks.
3. Can you think of any additional applications of graphs for solving real-world problems?

Bibliography

e-Reference
• courses.cs.vt.edu. (2016). Graph Traversals. Retrieved on 19 April 2016, from http://courses.cs.vt.edu/~cs3114/Fall09/wmcquain/Notes/T20.GraphTraversals.pdf

External Resources
• Kruse, R. (2006). Data Structures and Program Design in C (2nd ed.). Pearson Education.
• Srivastava, S. K., & Srivastava, D. (2004). Data Structures Through C in Depth (2nd ed.). BPB Publications.
• Weiss, M. A. (2001). Data Structures and Algorithm Analysis in C (2nd ed.). Pearson Education.
Video Links

Topic                              Link
Introduction to Graphs             https://www.youtube.com/watch?v=vfCo5A4HGKc
Graph Types and Representations    https://www.youtube.com/watch?v=VeEneWqC5a4
Graph Traversals                   https://www.youtube.com/watch?v=H4_vRy4xQpc&list=PLT2H5PXNSXgM_Mqzk7bChFvB6xyuWilIa

Notes: