Garden City College of Science & Management Studies
III SEM BCA — Data Structures Using C

Introduction to Data Structures Using C

A data structure is an arrangement of data in a computer's memory or on disk storage. Common examples of data structures are arrays, linked lists, queues, stacks, binary trees, and hash tables. Algorithms, on the other hand, are used to manipulate the data contained in these data structures, as in searching and sorting.

    Name       Position
    Aaron      Manager
    Charles    VP
    George     Employee
    Jack       Employee
    Janet      VP
    John       President
    Kim        Manager
    Larry      Manager
    Martha     Employee
    Patricia   Employee
    Rick       Secretary
    Sarah      VP

But this list shows only one view of the company. You also want the database to represent the relationships between employees and management at ABC: although the list contains both name and position, it does not tell you which managers are responsible for which workers. After thinking about the problem for a while, you decide that a tree diagram is a much better structure for showing the work relationships at the ABC company.

These two diagrams are examples of different data structures. In the list, the employees' names can be stored in alphabetical order so that an employee's record can be located very quickly; but for showing the relationships between employees this structure is not very useful, and the tree structure is much better suited to that purpose. Data structures are an important way of organizing information in a computer. There are many different data structures that programmers use to organize data, just like the diagrams illustrated above. Some data structures are similar to the tree diagram, because they are good for representing relationships between different items of data. Other structures are good for ordering the data in a particular way, like the list of employees shown above.
Each data structure has its own unique properties that make it well suited to give a certain view of the data.

Primitive and non-primitive structures

Data can be structured at the most primitive level, where they are directly operated upon by machine-level instructions. At this level, data may be character or numeric, and numeric data may consist of integers or real numbers. Non-primitive data structures can be classified as arrays, lists, and files. An array is an ordered set which contains a fixed number of objects. No deletions or insertions are performed on arrays; at best, elements may be changed. A list, by contrast, is an ordered set consisting of a variable number of elements to which insertions and deletions can be made, and on which other operations can be performed. When a list displays the relationship of adjacency between elements, it is said to be linear; otherwise it is said to be nonlinear. A file is typically a large list that is stored in the external memory of a computer. Additionally, a file may be used as a repository for list items (records) that are accessed infrequently.

The data appearing in our data structures is processed by means of certain operations. In fact, the particular data structure that one chooses for a given situation depends largely on the frequency with which specific operations are performed. The following four operations play a major role:

Traversing: accessing each record exactly once so that certain items in the record may be processed. (This accessing or processing is sometimes called "visiting" the records.)
Searching: finding the location of the record with a given key value, or finding the locations of all records which satisfy one or more conditions.
Inserting: adding new records to the structure.
Deleting: removing a record from the structure.
Sometimes two or more of these operations may be needed in a given situation; e.g., we may want to delete the record with a given key, which may mean we first need to search for the location of the record.

Memory Allocation

In this chapter, we'll meet malloc, C's dynamic memory allocation function, and we'll cover dynamic memory allocation in some detail. As we begin doing dynamic memory allocation, we'll begin to see (if we haven't seen it already) what pointers can really be good for. Many of the pointer examples in the previous chapter (those which used pointers to access arrays) didn't do much for us that we couldn't have done using arrays. However, when we begin doing dynamic memory allocation, pointers are the only way to go, because what malloc returns is a pointer to the memory it gives us. (Due to the equivalence between pointers and arrays, though, we will still be able to think of dynamically allocated regions of storage as if they were arrays, and even to use array-like subscripting notation on them.)

You have to be careful with dynamic memory allocation. malloc operates at a pretty "low level"; you will often find yourself having to do a certain amount of work to manage the memory it gives you. If you don't keep accurate track of the memory which malloc has given you, and the pointers of yours which point to it, it's all too easy to accidentally use a pointer which points "nowhere", with generally unpleasant results. The basic problem is that if you assign a value to the location pointed to by a pointer:

    *p = 0;

and the pointer p points "nowhere" — well, actually it can be construed to point somewhere, just not where you wanted it to — then that "somewhere" is where the 0 gets written. If the "somewhere" is memory which is in use by some other part of your program, or even worse, if the operating system has not protected itself from you and "somewhere" is in fact in use by the operating system, things could get ugly.
Dynamic Memory Allocation

Up until this time, we have used what is referred to as static memory. Static memory means we reserve a certain amount of memory by default inside our program to use for variables and such. While there is nothing wrong with this, it means that once we reserve this memory, no other program can use it, even if we are not using it at the time. So, if we have two programs that reserve 1000 bytes of memory each, but neither program is running, then we have 2000 bytes of memory being completely wasted.

Suppose we only have 3000 bytes of memory, but we already have our two programs that take 1000 bytes each. Now we want to load a program that needs 1500 bytes of memory. We just hit a wall, because we only have 3000 bytes of memory and 2000 bytes are already reserved. We can't load our third program even though we have 2000 bytes of memory that isn't even being used. How could we remedy this situation? If you said dynamic memory allocation, then you must have read the title of this lesson. That's right: we are going to use dynamic memory to share the memory among all three programs. Dynamic memory won't fix everything — we will always need some finite amount of static memory, but that amount is usually much smaller than the total a program may eventually use. This is why we still have static memory.

So, let us imagine a new scenario. We have changed our first two programs to use dynamic memory allocation, and now they only need to reserve 100 bytes of memory each. This means we are now only using 200 of the 3000 total bytes available. Our third program, which requires 1500 bytes of memory, can now run fine.

Static memory allocation refers to the process of allocating memory at compile time, before the associated program is executed, unlike dynamic memory allocation or automatic memory allocation, where memory is allocated as required at run time. An application of this technique involves a program module (e.g.
function) declaring static data locally, such that these data are inaccessible in other modules unless references to them are passed as parameters or returned. A single copy of static data is retained and accessible through many calls to the function in which it is declared. Static memory allocation therefore has the advantage of modularising data within a program design in situations where these data must be retained through the run time of the program. The use of static variables within a class in object-oriented programming enables a single copy of such data to be shared between all the objects of that class. Object constants known at compile time, like string literals, are usually allocated statically. In object-oriented programming, the virtual method tables of classes are usually allocated statically. A statically defined value can also be global in its scope, ensuring the same immutable value is used throughout a run for consistency.

In computer science, dynamic memory allocation (also known as heap-based memory allocation) is the allocation of memory storage for use in a computer program during the runtime of that program. It can also be seen as a way of distributing ownership of limited memory resources among many pieces of data and code. Dynamically allocated memory exists until it is released, either explicitly by the programmer or by a garbage collector. This is in contrast to static memory allocation, which has a fixed duration. It is said that an object so allocated has a dynamic lifetime.

The Basics of Dynamic Memory Allocation

Now that we have covered why we would want to use dynamic memory allocation, how do we go about doing it? We have predefined functions that let us perform these tasks. The two functions we will be employing in our dynamic memory allocation tasks are malloc() and free(): malloc() meaning memory allocation, and free(), well, that should be obvious.
Our malloc() function returns a pointer to the memory block we requested. Remember pointers from the last lesson? I told you we weren't done with them yet. Let's discuss the syntax of malloc(). It takes a single argument: an unsigned integer (of type size_t) giving the number of bytes we want to allocate from the heap. (Note: the heap is what we call all the memory we don't reserve by default.) So, for example, to allocate memory for a 40-character string, we would do:

    malloc(40);

There's more to it, obviously, but we'll cover that. To use the memory, though, we must keep a pointer to it, so we know where the block is once it has been allocated. Let's look at a small code example:

    char *str = (char *)malloc(40); /* allocate memory for a 40 character string */

This may look a little weird, because it uses a concept we haven't worked with before called "type casting". C has the ability to convert from one type of variable to another by using what are called type cast operators. The syntax is simple: put the variable type you want to convert to inside parentheses. So, because we want to treat the memory as a character pointer, which is specified by the type char *, we put char * in parentheses, and C gives us a character pointer to the memory.

The important thing to note here is what malloc() returns to us. Because malloc() doesn't care how the memory we allocate will be used, it just returns a pointer to a series of bytes. It does this by using a void pointer. Void pointers are just like character pointers, or integer pointers, or any other kind of pointer, with one special difference: we do not know the size of the object a void pointer points to. There is no way to increment or decrement a void pointer. In fact, we can't do much of anything with a void pointer except acknowledge its existence until we convert (type cast) it to another pointer type.
Characters are by definition 1 byte, and malloc always allocates in multiples of a single byte. Other types are more than 1 byte. Rather than remembering how many bytes an int variable takes on a system, we can use the sizeof operator. Let's see an example:

    char *str = (char *)malloc(40 * sizeof(char)); /* allocate memory for a 40 character string */

This looks very similar to what we had above, with one important exception: the sizeof(char) multiplier. sizeof is another very important operator in C. The sizeof() operator tells us the size of any variable or type in our program. The one thing to keep in mind when using sizeof() is that the size of a pointer is how many bytes it takes to store the pointer, not the size of the memory block the pointer is pointing at. By allocating memory like this, we say we want room for 40 characters rather than 40 raw bytes — even though for char the two happen to be the same. For 40 ints, and especially for 40 long ints, the byte counts would be quite different.

Now that we have taken a look at how the malloc() function works, let's take a look at its companion function, free(). The free() function is basically the exact opposite of malloc(): instead of assigning the returned value to a pointer, we give the function the pointer we got from malloc(). So, if we have the *str pointer we allocated above, to free that memory we just call free with the pointer as an argument:

    free(str); /* free the memory allocated by malloc() */

You may be wondering how the free() function knows how much memory to free up, since we didn't tell it how much memory we allocated. The short answer is: you don't need to worry about that; the system takes care of such minutiae for us. The long answer is, well, long and complicated, so I won't talk about it here. Most modern systems will free allocated memory at the completion of the program. The AMS is not one of these systems.
If you don't free the memory you allocate, your calculator will lose memory until it is reset.

RECURSION

Recursion in computer science is a method where the solution to a problem depends on solutions to smaller instances of the same problem.[1] The approach can be applied to many types of problems, and is one of the central ideas of computer science.[2]

"The power of recursion evidently lies in the possibility of defining an infinite set of objects by a finite statement. In the same manner, an infinite number of computations can be described by a finite recursive program, even if this program contains no explicit repetitions."[3]

Most high-level computer programming languages support recursion by allowing a function to call itself within the program text. Imperative languages define looping constructs like "while" and "for" loops that are used to perform repetitive actions. Some functional programming languages do not define any looping constructs but rely solely on recursion to repeatedly call code. Computability theory has proven that these recursive-only languages are mathematically equivalent to the imperative languages, meaning they can solve the same kinds of problems even without the typical control structures like "while" and "for".

Greatest common divisor

Another famous recursive function is the Euclidean algorithm, used to compute the greatest common divisor of two integers.

Pseudocode (recursive):
function gcd is:
    input: integer x, integer y such that x >= y and y >= 0
    1. if y is 0, return x
    2.
otherwise, return [ gcd( y, (remainder of x/y) ) ]
end gcd

Recurrence relation for the greatest common divisor, where x % y expresses the remainder of x / y:

    gcd(x, y) = gcd(y, x % y)
    gcd(x, 0) = x

Computing the recurrence relation for x = 27 and y = 9:

    gcd(27, 9) = gcd(9, 27 % 9)
               = gcd(9, 0)
               = 9

Computing the recurrence relation for x = 259 and y = 111:

    gcd(259, 111) = gcd(111, 259 % 111)
                  = gcd(111, 37)
                  = gcd(37, 0)
                  = 37

The recursive program above is tail-recursive; it is equivalent to an iterative algorithm, and the computation shown above shows the steps of evaluation that would be performed by a language that eliminates tail calls. Below is a version of the same algorithm using explicit iteration, suitable for a language that does not eliminate tail calls. By maintaining its state entirely in the variables x and y and using a looping construct, the program avoids making recursive calls and growing the call stack.

Pseudocode (iterative):
function gcd is:
    input: integer x, integer y such that x >= y and y >= 0
    1. create new variable called remainder
    2. begin loop
        1. if y is zero, exit loop
        2. set remainder to the remainder of x/y
        3. set x to y
        4. set y to remainder
        5. repeat loop
    3. return x
end gcd

The iterative algorithm requires a temporary variable, and even given knowledge of the Euclidean algorithm it is more difficult to understand the process by simple inspection, although the two algorithms are very similar in their steps.

Towers of Hanoi

For a full discussion of this problem's description, history and solution, see the main article or one of the many references. Simply put, the problem is this: given three pegs, one with a set of N disks of increasing size, determine the minimum (optimal) number of steps it takes to move all the disks from their initial position to another peg without placing a larger disk on top of a smaller one.
Function definition — recurrence relation for hanoi:

    h(n) = 2·h(n-1) + 1
    h(1) = 1

Computing the recurrence relation for n = 4:

    hanoi(4) = 2*hanoi(3) + 1
             = 2*(2*hanoi(2) + 1) + 1
             = 2*(2*(2*hanoi(1) + 1) + 1) + 1
             = 2*(2*(2*1 + 1) + 1) + 1
             = 2*(2*(3) + 1) + 1
             = 2*(7) + 1
             = 15

Example implementation, pseudocode (recursive):

function hanoi is:
    input: integer n, such that n >= 1
    1. if n is 1 then return 1
    2. return [2 * [call hanoi(n-1)] + 1]
end hanoi

Although not all recursive functions have an explicit solution, the Tower of Hanoi sequence can be reduced to an explicit formula:

    h(1) = 1   = 2^1 - 1
    h(2) = 3   = 2^2 - 1
    h(3) = 7   = 2^3 - 1
    h(4) = 15  = 2^4 - 1
    h(5) = 31  = 2^5 - 1
    h(6) = 63  = 2^6 - 1
    h(7) = 127 = 2^7 - 1

In general: h(n) = 2^n - 1, for all n >= 1.

Fibonacci

Another well-known mathematical recursive function is one that computes the Fibonacci numbers:

Pseudocode:
function fib is:
    input: integer n such that n >= 0
    1. if n is 0, return 0
    2. if n is 1, return 1
    3. otherwise, return [ fib(n-1) + fib(n-2) ]
end fib

The same idea in Java (note that this version indexes from fib(1) = fib(2) = 1):

/**
 * @param k indicates which Fibonacci number to compute.
 * @return the kth Fibonacci number.
 */
private static int fib(int k) {
    // Base case: if k <= 2 then fib(k) = 1.
    if (k <= 2) {
        return 1;
    }
    // Recursive case: if k > 2 then fib(k) = fib(k-1) + fib(k-2).
    else {
        return fib(k-1) + fib(k-2);
    }
}

Recurrence relation for Fibonacci:

    b(n) = b(n-1) + b(n-2)
    b(1) = 1, b(0) = 0

Computing the recurrence relation for n = 4:

    b(4) = b(3) + b(2)
         = b(2) + b(1) + b(1) + b(0)
         = b(1) + b(0) + 1 + 1 + 0
         = 1 + 0 + 1 + 1 + 0
         = 3

This Fibonacci algorithm is a particularly poor example of recursion, because each time the function is executed on a number greater than one, it makes two function calls to itself, leading to an exponential number of calls (and thus exponential time complexity) in total.
An alternative approach uses two accumulator variables, TwoBack and OneBack, to "remember" the previous two Fibonacci numbers constructed, and so avoids the exponential time cost. As a further small, complete example of computing a recursively defined quantity with plain loops, the following program computes a binomial coefficient using an iterative factorial function:

#include <stdio.h>

long int fact(int x); /* function prototype */

int main(void)
{
    int n, r;
    long int bino_cof;

    printf("Enter the numbers n, r: ");
    scanf("%d%d", &n, &r);
    bino_cof = fact(n) / (fact(n - r) * fact(r));
    printf("Binomial coefficient is %ld\n", bino_cof);
    return 0;
}

long int fact(int x)
{
    int i;
    long int f = 1;
    for (i = 1; i <= x; i++)
        f = f * i;
    return f;
}

Recursion versus iteration

Most programming languages in use today allow the direct specification of recursive functions and procedures. When such a function is called, the program's runtime environment keeps track of the various instances of the function (often using a call stack, although other methods may be used). Every recursive function can be transformed into an iterative function by replacing recursive calls with iterative control constructs and simulating the call stack with a stack explicitly managed by the program. Conversely, all iterative functions and procedures that can be evaluated by a computer (see Turing completeness) can be expressed in terms of recursive functions; iterative control constructs such as while loops and do loops are routinely rewritten in recursive form in functional languages. However, in practice this rewriting depends on tail call elimination, which is not a feature of all languages. C, Java, and Python are notable mainstream languages in which all function calls, including tail calls, may cause stack allocation that would not occur with the use of looping constructs; in these languages, a working iterative program rewritten in recursive form may overflow the call stack.
Performance issues In languages (such as C and Java) that favor iterative looping constructs, there is usually significant time and space cost associated with recursive programs, due to the overhead required to manage the stack and the relative slowness of function calls; in functional languages, a function call (particularly a tail call) is typically a very fast operation, and the difference is usually less noticeable. As a concrete example, the difference in performance between recursive and iterative implementations of the "factorial" example above depends highly on the language used. In languages where looping constructs are preferred, the iterative version may be as much as several orders of magnitude faster than the recursive one. In functional languages, the overall time difference of the two implementations may be negligible; in fact, the cost of multiplying the larger numbers first rather than the smaller numbers (which the iterative version given here happens to do) may overwhelm any time saved by choosing iteration. Other considerations In some programming languages, the stack space available to a thread is much less than the space available in the heap, and recursive algorithms tend to require more stack space than iterative algorithms. Consequently, these languages sometimes place a limit on the depth of recursion to avoid stack overflows. (Python is one such language.) Note the caveat below regarding the special case of tail recursion. There are some types of problems whose solutions are inherently recursive, because of prior state they need to track. One example is tree traversal; others include the Ackermann function, depth-first search, and divide-and-conquer algorithms such as Quicksort. 
All of these algorithms can be implemented iteratively with the help of an explicit stack, but the programmer effort involved in managing the stack, and the complexity of the resulting program, arguably outweigh any advantages of the iterative solution.

Bubble sort

A bubble sort is a sorting algorithm that continuously steps through a list, swapping adjacent items until they appear in the correct order. Bubble sort is a straightforward and simplistic method of sorting data that is used in computer science education. The algorithm starts at the beginning of the data set. It compares the first two elements, and if the first is greater than the second, it swaps them. It continues doing this for each pair of adjacent elements to the end of the data set. It then starts again with the first two elements, repeating until no swaps have occurred on the last pass. This algorithm is highly inefficient, and is rarely used except as a simplistic example. For example, if we have 100 elements, then the total number of comparisons is on the order of 100 × 100 = 10,000. A slightly better variant, cocktail sort, works by reversing the pass direction (and hence the ordering criterion being checked) on alternating passes. A modified bubble sort that stops one element shorter on each pass through the loop brings the total number of comparisons for 100 elements down to 99 + 98 + ... + 1 = 4950. Bubble sort's average case and worst case are both O(n²).

Insertion sort

Insertion sort is a simple sorting algorithm that is relatively efficient for small lists and mostly sorted lists, and is often used as part of more sophisticated algorithms. It works by taking elements from the list one by one and inserting them in their correct position into a new sorted list. In arrays, the new list and the remaining elements can share the array's space, but insertion is expensive, requiring all following elements to be shifted over by one. Shell sort (see below) is a variant of insertion sort that is more efficient for larger lists.

Shell sort

Shell sort was invented by Donald Shell in 1959.
It improves upon bubble sort and insertion sort by moving out-of-order elements more than one position at a time. One implementation can be described as arranging the data sequence in a two-dimensional array and then sorting the columns of the array using insertion sort.

Merge sort

Merge sort takes advantage of the ease of merging already sorted lists into a new sorted list. It starts by comparing every two elements (i.e., 1 with 2, then 3 with 4, ...) and swapping them if the first should come after the second. It then merges each of the resulting lists of two into lists of four, then merges those lists of four, and so on, until at last two lists are merged into the final sorted list. Of the algorithms described here, this is the first that scales well to very large lists, because its worst-case running time is O(n log n). Merge sort has seen a relatively recent surge in popularity for practical implementations, being used for the standard sort routine in the programming languages Perl, Python, and Java (which also uses timsort as of JDK 7), among others; merge sort has been used in Java at least since 2000, in JDK 1.3.

Heapsort

Heapsort is a much more efficient version of selection sort. It also works by determining the largest (or smallest) element of the list, placing that at the end (or beginning) of the list, then continuing with the rest of the list, but it accomplishes this task efficiently by using a data structure called a heap, a special type of binary tree. Once the data list has been made into a heap, the root node is guaranteed to be the largest (or smallest) element. When it is removed and placed at the end of the list, the heap is rearranged so the largest element remaining moves to the root. Using the heap, finding the next largest element takes O(log n) time, instead of O(n) for a linear scan as in simple selection sort. This allows heapsort to run in O(n log n) time.
Quicksort

Quicksort is a divide-and-conquer algorithm which relies on a partition operation: to partition an array, we choose an element, called a pivot, move all smaller elements before the pivot, and move all greater elements after it. This can be done efficiently in linear time and in place. We then recursively sort the lesser and greater sublists. Efficient implementations of quicksort (with in-place partitioning) are typically unstable sorts and somewhat complex, but are among the fastest sorting algorithms in practice. Together with its modest O(log n) space usage, this makes quicksort one of the most popular sorting algorithms, available in many standard libraries. The most complex issue in quicksort is choosing a good pivot element; consistently poor choices of pivots can result in drastically slower O(n²) performance, but if at each step we choose the median as the pivot, it runs in O(n log n). Finding the median, however, is an O(n) operation on unsorted lists, and therefore exacts its own penalty.

Radix sort

Radix sort is an algorithm that sorts a list of fixed-size numbers of length k in O(n · k) time by treating them as bit strings. We first sort the list by the least significant bit while preserving their relative order using a stable sort. Then we sort by the next bit, and so on from right to left, and the list ends up sorted. Most often, the counting sort algorithm is used to accomplish the bitwise sorting, since the number of values a bit can have is minimal: only '1' or '0'.

Sorting Algorithm Examples

This is a collection of programs implementing a wide variety of sorting algorithms. The code has been optimized for speed instead of readability. You will find them to perform better than the respective standard algorithms. The Combo Sort should be suitable for production usage.

Bubble Sort

Exchange two adjacent elements if they are out of order. Repeat until the array is sorted. This is a slow algorithm.
#include <stdlib.h>
#include <stdio.h>

#define uint32 unsigned int

typedef int (*CMPFUN)(int, int);

void ArraySort(int This[], CMPFUN fun_ptr, uint32 ub)
{
  /* bubble sort */
  uint32 indx;
  uint32 indx2;
  int temp;
  int temp2;
  int flipped;

  if (ub <= 1)
    return;

  indx = 1;
  do
  {
    flipped = 0;
    for (indx2 = ub - 1; indx2 >= indx; --indx2)
    {
      temp = This[indx2];
      temp2 = This[indx2 - 1];
      if ((*fun_ptr)(temp2, temp) > 0)
      {
        This[indx2 - 1] = temp;
        This[indx2] = temp2;
        flipped = 1;
      }
    }
  } while ((++indx < ub) && flipped);
}

#define ARRAY_SIZE 14

int my_array[ARRAY_SIZE];

void fill_array()
{
  int indx;
  for (indx = 0; indx < ARRAY_SIZE; ++indx)
  {
    my_array[indx] = rand();
  }
  /* my_array[ARRAY_SIZE - 1] = ARRAY_SIZE / 3; */
}

int cmpfun(int a, int b)
{
  if (a > b)
    return 1;
  else if (a < b)
    return -1;
  else
    return 0;
}

int main()
{
  int indx;
  int indx2;
  for (indx2 = 0; indx2 < 80000; ++indx2)
  {
    fill_array();
    ArraySort(my_array, cmpfun, ARRAY_SIZE);
    for (indx = 1; indx < ARRAY_SIZE; ++indx)
    {
      if (my_array[indx - 1] > my_array[indx])
      {
        printf("bad sort\n");
        return (1);
      }
    }
  }
  return (0);
}

Selection Sort

Find the largest element in the array, and put it in the proper place. Repeat until the array is sorted. This is also slow.
#include <stdlib.h>
#include <stdio.h>

#define uint32 unsigned int

typedef int (*CMPFUN)(int, int);

void ArraySort(int This[], CMPFUN fun_ptr, uint32 the_len)
{
  /* selection sort */
  uint32 indx;
  uint32 indx2;
  uint32 large_pos;
  int temp;
  int large;

  if (the_len <= 1)
    return;

  for (indx = the_len - 1; indx > 0; --indx)
  {
    /* find the largest number, then put it at the end of the array */
    large = This[0];
    large_pos = 0;
    for (indx2 = 1; indx2 <= indx; ++indx2)
    {
      temp = This[indx2];
      if ((*fun_ptr)(temp, large) > 0)
      {
        large = temp;
        large_pos = indx2;
      }
    }
    This[large_pos] = This[indx];
    This[indx] = large;
  }
}

#define ARRAY_SIZE 14

int my_array[ARRAY_SIZE];

void fill_array()
{
  int indx;
  for (indx = 0; indx < ARRAY_SIZE; ++indx)
  {
    my_array[indx] = rand();
  }
  /* my_array[ARRAY_SIZE - 1] = ARRAY_SIZE / 3; */
}

int cmpfun(int a, int b)
{
  if (a > b)
    return 1;
  else if (a < b)
    return -1;
  else
    return 0;
}

int main()
{
  int indx;
  int indx2;
  for (indx2 = 0; indx2 < 80000; ++indx2)
  {
    fill_array();
    ArraySort(my_array, cmpfun, ARRAY_SIZE);
    for (indx = 1; indx < ARRAY_SIZE; ++indx)
    {
      if (my_array[indx - 1] > my_array[indx])
      {
        printf("bad sort\n");
        return (1);
      }
    }
  }
  return (0);
}

Insertion Sort

Scan successive elements for an out-of-order item, then insert the item in the proper place. Sorts small arrays fast, big arrays very slowly.
#include <stdlib.h>
#include <stdio.h>

#define uint32 unsigned int

typedef int (*CMPFUN)(int, int);

void ArraySort(int This[], CMPFUN fun_ptr, uint32 the_len)
{
  /* insertion sort */
  uint32 indx;
  int cur_val;
  int prev_val;

  if (the_len <= 1)
    return;

  prev_val = This[0];
  for (indx = 1; indx < the_len; ++indx)
  {
    cur_val = This[indx];
    if ((*fun_ptr)(prev_val, cur_val) > 0)
    {
      /* out of order: array[indx-1] > array[indx] */
      uint32 indx2;
      This[indx] = prev_val; /* move up the larger item first */

      /* find the insertion point for the smaller item */
      for (indx2 = indx - 1; indx2 > 0;)
      {
        int temp_val = This[indx2 - 1];
        if ((*fun_ptr)(temp_val, cur_val) > 0)
        {
          This[indx2--] = temp_val; /* still out of order, move up 1 slot to make room */
        }
        else
          break;
      }
      This[indx2] = cur_val; /* insert the smaller item right here */
    }
    else
    {
      /* in order, advance to next element */
      prev_val = cur_val;
    }
  }
}

#define ARRAY_SIZE 14

int my_array[ARRAY_SIZE];

uint32 fill_array()
{
  int indx;
  uint32 checksum = 0;
  for (indx = 0; indx < ARRAY_SIZE; ++indx)
  {
    checksum += my_array[indx] = rand();
  }
  return checksum;
}

int cmpfun(int a, int b)
{
  if (a > b)
    return 1;
  else if (a < b)
    return -1;
  else
    return 0;
}

int main()
{
  int indx;
  int indx2;
  uint32 checksum1;
  uint32 checksum2;
  for (indx2 = 0; indx2 < 80000; ++indx2)
  {
    checksum1 = fill_array();
    ArraySort(my_array, cmpfun, ARRAY_SIZE);
    for (indx = 1; indx < ARRAY_SIZE; ++indx)
    {
      if (my_array[indx - 1] > my_array[indx])
      {
        printf("bad sort\n");
        return (1);
      }
    }
    checksum2 = 0;
    for (indx = 0; indx < ARRAY_SIZE; ++indx)
    {
      checksum2 += my_array[indx];
    }
    if (checksum1 != checksum2)
    {
      printf("bad checksum %d %d\n", checksum1, checksum2);
    }
  }
  return (0);
}

Quicksort

Partition the array into two segments. In the first segment, all elements are less than or equal to the pivot value. In the second segment, all elements are greater than or equal to the pivot value. Sort the two segments recursively. Quicksort is fastest on average, but sometimes unbalanced partitions can lead to very slow sorting.
#include <stdlib.h>
#include <stdio.h>

#define INSERTION_SORT_BOUND 16 /* boundary point to use insertion sort */
#define uint32 unsigned int
typedef int (*CMPFUN)(int, int);

/*
 * Description:
 *   Qsort() is an internal subroutine that implements quick sort.
 * Return Value: none
 */
void Qsort(int This[], CMPFUN fun_ptr, uint32 first, uint32 last)
{
  uint32 stack_pointer = 0;
  int first_stack[32];
  int last_stack[32];

  for (;;)
  {
    if (last - first <= INSERTION_SORT_BOUND)
    {
      /* for a small segment, use insertion sort */
      uint32 indx;
      int prev_val = This[first];
      int cur_val;

      for (indx = first + 1; indx <= last; ++indx)
      {
        cur_val = This[indx];
        if ((*fun_ptr)(prev_val, cur_val) > 0)
        {
          /* out of order: array[indx-1] > array[indx] */
          uint32 indx2;
          This[indx] = prev_val; /* move up the larger item first */

          /* find the insertion point for the smaller item */
          for (indx2 = indx - 1; indx2 > first;)
          {
            int temp_val = This[indx2 - 1];
            if ((*fun_ptr)(temp_val, cur_val) > 0)
            {
              This[indx2--] = temp_val; /* still out of order, move up 1 slot to make room */
            }
            else
              break;
          }
          This[indx2] = cur_val; /* insert the smaller item right here */
        }
        else
        {
          /* in order, advance to next element */
          prev_val = cur_val;
        }
      }
    }
    else
    {
      int pivot;

      /* try quick sort */
      {
        int temp;
        uint32 med = (first + last) >> 1;
        /* Choose pivot from first, last, and median position. */
        /* Sort the three elements. */
        temp = This[first];
        if ((*fun_ptr)(temp, This[last]) > 0)
        {
          This[first] = This[last]; This[last] = temp;
        }
        temp = This[med];
        if ((*fun_ptr)(This[first], temp) > 0)
        {
          This[med] = This[first]; This[first] = temp;
        }
        temp = This[last];
        if ((*fun_ptr)(This[med], temp) > 0)
        {
          This[last] = This[med]; This[med] = temp;
        }
        pivot = This[med];
      }
      {
        uint32 up;
        {
          uint32 down;
          /* First and last element will be loop stopper. */
          /* Split array into two partitions. */
          down = first;
          up = last;
          for (;;)
          {
            do
            {
              ++down;
            } while ((*fun_ptr)(pivot, This[down]) > 0);

            do
            {
              --up;
            } while ((*fun_ptr)(This[up], pivot) > 0);

            if (up > down)
            {
              int temp;
              /* interchange L[down] and L[up] */
              temp = This[down]; This[down] = This[up]; This[up] = temp;
            }
            else
              break;
          }
        }
        {
          uint32 len1; /* length of first segment */
          uint32 len2; /* length of second segment */
          len1 = up - first + 1;
          len2 = last - up;
          /* stack the partition that is larger */
          if (len1 >= len2)
          {
            first_stack[stack_pointer] = first;
            last_stack[stack_pointer++] = up;
            first = up + 1;
            /* tail recursion elimination of Qsort(This, fun_ptr, up + 1, last) */
          }
          else
          {
            first_stack[stack_pointer] = up + 1;
            last_stack[stack_pointer++] = last;
            last = up;
            /* tail recursion elimination of Qsort(This, fun_ptr, first, up) */
          }
          continue;
        }
      }
      /* end of quick sort */
    }
    if (stack_pointer > 0)
    {
      /* Sort segment from stack. */
      first = first_stack[--stack_pointer];
      last = last_stack[stack_pointer];
    }
    else
      break;
  } /* end for */
}

void ArraySort(int This[], CMPFUN fun_ptr, uint32 the_len)
{
  if (the_len <= 1)
    return; /* guard: nothing to sort */
  Qsort(This, fun_ptr, 0, the_len - 1);
}

#define ARRAY_SIZE 250000
int my_array[ARRAY_SIZE];

uint32 fill_array()
{
  int indx;
  uint32 checksum = 0;
  for (indx = 0; indx < ARRAY_SIZE; ++indx)
  {
    checksum += my_array[indx] = rand();
  }
  return checksum;
}

int cmpfun(int a, int b)
{
  if (a > b) return 1;
  else if (a < b) return -1;
  else return 0;
}

int main()
{
  int indx;
  uint32 checksum1;
  uint32 checksum2 = 0;

  checksum1 = fill_array();
  ArraySort(my_array, cmpfun, ARRAY_SIZE);
  for (indx = 1; indx < ARRAY_SIZE; ++indx)
  {
    if (my_array[indx - 1] > my_array[indx])
    {
      printf("bad sort\n");
      return 1;
    }
  }
  for (indx = 0; indx < ARRAY_SIZE; ++indx)
  {
    checksum2 += my_array[indx];
  }
  if (checksum1 != checksum2)
  {
    printf("bad checksum %d %d\n", checksum1, checksum2);
    return 1;
  }
  return 0;
}

Mergesort
Starting from sorted runs of length 1, merge pairs of runs into single runs of twice the length; repeat until a single sorted run is left. Mergesort needs an extra buffer of N/2 elements.
Performance is second place on average, with quite good speed on a nearly sorted array. Mergesort is stable, in that two elements that are equally ranked in the array will not have their relative positions flipped.

#include <stdlib.h>
#include <stdio.h>

#define uint32 unsigned int
typedef int (*CMPFUN)(int, int);

#define INSERTION_SORT_BOUND 8 /* boundary point to use insertion sort */

void ArraySort(int This[], CMPFUN fun_ptr, uint32 the_len)
{
  uint32 span;
  uint32 lb;
  uint32 ub;
  uint32 indx;
  uint32 indx2;

  if (the_len <= 1)
    return;

  span = INSERTION_SORT_BOUND;

  /* insertion sort the first pass */
  {
    int prev_val;
    int cur_val;
    int temp_val;

    for (lb = 0; lb < the_len; lb += span)
    {
      if ((ub = lb + span) > the_len)
        ub = the_len;

      prev_val = This[lb];
      for (indx = lb + 1; indx < ub; ++indx)
      {
        cur_val = This[indx];
        if ((*fun_ptr)(prev_val, cur_val) > 0)
        {
          /* out of order: array[indx-1] > array[indx] */
          This[indx] = prev_val; /* move up the larger item first */

          /* find the insertion point for the smaller item */
          for (indx2 = indx - 1; indx2 > lb;)
          {
            temp_val = This[indx2 - 1];
            if ((*fun_ptr)(temp_val, cur_val) > 0)
            {
              This[indx2--] = temp_val; /* still out of order, move up 1 slot to make room */
            }
            else
              break;
          }
          This[indx2] = cur_val; /* insert the smaller item right here */
        }
        else
        {
          /* in order, advance to next element */
          prev_val = cur_val;
        }
      }
    }
  }

  /* second pass merge sort */
  {
    uint32 median;
    int* aux;

    aux = (int*) malloc(sizeof(int) * the_len / 2);

    while (span < the_len)
    {
      /* median is the start of the second file */
      for (median = span; median < the_len;)
      {
        indx2 = median - 1;
        if ((*fun_ptr)(This[indx2], This[median]) > 0)
        {
          /* the two files are not yet sorted */
          if ((ub = median + span) > the_len)
          {
            ub = the_len;
          }

          /* skip over the already sorted largest elements */
          while ((*fun_ptr)(This[--ub], This[indx2]) >= 0)
          {
          }

          /* copy second file into buffer */
          for (indx = 0; indx2 < ub; ++indx)
          {
            *(aux + indx) = This[++indx2];
          }
          --indx;
          indx2 = median - 1;
          lb = median - span;

          /* merge two files into one */
          for (;;)
          {
            if ((*fun_ptr)(*(aux + indx), This[indx2]) >= 0)
            {
              This[ub--] = *(aux + indx);
              if (indx > 0)
                --indx;
              else
              {
                /* second file exhausted */
                for (;;)
                {
                  This[ub--] = This[indx2];
                  if (indx2 > lb)
                    --indx2;
                  else
                    goto mydone; /* done */
                }
              }
            }
            else
            {
              This[ub--] = This[indx2];
              if (indx2 > lb)
                --indx2;
              else
              {
                /* first file exhausted */
                for (;;)
                {
                  This[ub--] = *(aux + indx);
                  if (indx > 0)
                    --indx;
                  else
                    goto mydone; /* done */
                }
              }
            }
          }
        }
      mydone:
        median += span + span;
      }
      span += span;
    }
    free(aux);
  }
}

#define ARRAY_SIZE 250000
int my_array[ARRAY_SIZE];

uint32 fill_array()
{
  int indx;
  uint32 sum = 0;
  for (indx = 0; indx < ARRAY_SIZE; ++indx)
  {
    sum += my_array[indx] = rand();
  }
  return sum;
}

int cmpfun(int a, int b)
{
  if (a > b) return 1;
  else if (a < b) return -1;
  else return 0;
}

int main()
{
  int indx;
  uint32 checksum, checksum2;

  checksum = fill_array();
  ArraySort(my_array, cmpfun, ARRAY_SIZE);
  checksum2 = my_array[0];
  for (indx = 1; indx < ARRAY_SIZE; ++indx)
  {
    checksum2 += my_array[indx];
    if (my_array[indx - 1] > my_array[indx])
    {
      printf("bad sort\n");
      return 1;
    }
  }
  if (checksum != checksum2)
  {
    printf("bad checksum %d %d\n", checksum, checksum2);
    return 1;
  }
  return 0;
}

Heapsort
Form a heap, a tree in which each parent is larger than its children; then remove the parent (the largest element) from the tree successively. On average, heapsort is third place in speed. Heapsort does not need an extra buffer, and its performance is not sensitive to the initial distribution of the data.
#include <stdlib.h>
#include <stdio.h>

#define uint32 unsigned int
typedef int (*CMPFUN)(int, int);

void ArraySort(int This[], CMPFUN fun_ptr, uint32 the_len)
{
  /* heap sort */
  uint32 half;
  uint32 parent;

  if (the_len <= 1)
    return;

  half = the_len >> 1;
  for (parent = half; parent >= 1; --parent)
  {
    int temp;
    int level = 0;
    uint32 child;

    child = parent;
    /* bottom-up downheap */

    /* leaf-search for largest child path */
    while (child <= half)
    {
      ++level;
      child += child;
      if ((child < the_len) && ((*fun_ptr)(This[child], This[child - 1]) > 0))
        ++child;
    }
    /* bottom-up search for rotation point */
    temp = This[parent - 1];
    for (;;)
    {
      if (parent == child)
        break;
      if ((*fun_ptr)(temp, This[child - 1]) <= 0)
        break;
      child >>= 1;
      --level;
    }
    /* rotate nodes from parent to rotation point */
    for (; level > 0; --level)
    {
      This[(child >> level) - 1] = This[(child >> (level - 1)) - 1];
    }
    This[child - 1] = temp;
  }

  --the_len;
  do
  {
    int temp;
    int level = 0;
    uint32 child;

    /* move max element to back of array */
    temp = This[the_len];
    This[the_len] = This[0];
    This[0] = temp;

    child = parent = 1;
    half = the_len >> 1;

    /* bottom-up downheap */

    /* leaf-search for largest child path */
    while (child <= half)
    {
      ++level;
      child += child;
      if ((child < the_len) && ((*fun_ptr)(This[child], This[child - 1]) > 0))
        ++child;
    }
    /* bottom-up search for rotation point */
    for (;;)
    {
      if (parent == child)
        break;
      if ((*fun_ptr)(temp, This[child - 1]) <= 0)
        break;
      child >>= 1;
      --level;
    }
    /* rotate nodes from parent to rotation point */
    for (; level > 0; --level)
    {
      This[(child >> level) - 1] = This[(child >> (level - 1)) - 1];
    }
    This[child - 1] = temp;
  } while (--the_len >= 1);
}

#define ARRAY_SIZE 250000
int my_array[ARRAY_SIZE];

void fill_array()
{
  int indx;
  for (indx = 0; indx < ARRAY_SIZE; ++indx)
  {
    my_array[indx] = rand();
  }
}

int cmpfun(int a, int b)
{
  if (a > b) return 1;
  else if (a < b) return -1;
  else return 0;
}

int main()
{
  int indx;
  fill_array();
  ArraySort(my_array, cmpfun, ARRAY_SIZE);
  for (indx = 1; indx <
ARRAY_SIZE; ++indx)
  {
    if (my_array[indx - 1] > my_array[indx])
    {
      printf("bad sort\n");
      return 1;
    }
  }
  return 0;
}

Comparison of sorting algorithms

Sort                  Average       Best          Worst         Space     Stability  Remarks
Bubble sort           O(n^2)        O(n^2)        O(n^2)        Constant  Stable     Always use a modified bubble sort instead
Modified bubble sort  O(n^2)        O(n)          O(n^2)        Constant  Stable     Stops after reaching a sorted array
Selection sort        O(n^2)        O(n^2)        O(n^2)        Constant  Unstable   Even a perfectly sorted input requires scanning the entire array
Insertion sort        O(n^2)        O(n)          O(n^2)        Constant  Stable     In the best case (already sorted), every insert requires constant time
Heap sort             O(n*log(n))   O(n*log(n))   O(n*log(n))   Constant  Unstable   By using the input array as storage for the heap, it is possible to achieve constant space
Merge sort            O(n*log(n))   O(n*log(n))   O(n*log(n))   Depends   Stable     On arrays, merge sort requires O(n) space; on linked lists, it requires constant space
Quicksort             O(n*log(n))   O(n*log(n))   O(n^2)        Constant  Unstable   Randomly picking a pivot value (or shuffling the array prior to sorting) can help avoid worst-case scenarios such as a perfectly sorted array

SEARCHING: Linear search (or sequential search) is a method for finding a particular value in a list that consists of checking every one of its elements, one at a time and in sequence, until the desired one is found. Linear search is the simplest search algorithm; it is a special case of brute-force search. Its worst-case cost is proportional to the number of elements in the list, and so is its expected cost, if all list elements are equally likely to be searched for.
Therefore, if the list has more than a few elements, other methods (such as binary search or hashing) may be much more efficient.

#include <stdio.h>
#include <conio.h>
#include <stdlib.h>

void main()
{
  int arr[100], i, element, no;
  clrscr();
  printf("\nEnter the no of Elements: ");
  scanf("%d", &no);
  for (i = 0; i < no; i++)
  {
    printf("\nEnter Element %d: ", i + 1);
    scanf("%d", &arr[i]);
  }
  printf("\nEnter the element to be searched: ");
  scanf("%d", &element);
  for (i = 0; i < no; i++)
  {
    if (arr[i] == element)
    {
      printf("\nElement found at position %d", i + 1);
      getch();
      exit(0);
    }
  }
  printf("\nElement not found");
  getch();
}

Output:
Enter the no of Elements: 5
Enter Element 1: 12
Enter Element 2: 23
Enter Element 3: 52
Enter Element 4: 23
Enter Element 5: 10
Enter the element to be searched: 23
Element found at position 2

Binary search is an algorithm for locating the position of an element in a sorted list. It inspects the middle element of the sorted list: if it is equal to the sought value, then the position has been found; otherwise, the upper half or lower half is chosen for further searching, based on whether the sought value is greater than or less than the middle element. The method reduces the number of elements that need to be checked by a factor of two each time, and finds the sought value if it exists in the list, or if not, determines "not present", in logarithmic time. Binary search is a dichotomic divide-and-conquer search algorithm. Viewing the comparison as a subtraction of the sought value from the middle element, only the sign of the difference is inspected: there is no attempt at an interpolation search based on the size of the difference.
#include <stdio.h>
#include <conio.h>

void main()
{
  int array[10];
  int i, j, N, temp, keynum;
  int low, mid, high;

  clrscr();
  printf("Enter the value of N\n");
  scanf("%d", &N);
  printf("Enter the elements one by one\n");
  for (i = 0; i < N; i++)
  {
    scanf("%d", &array[i]);
  }
  printf("Input array elements\n");
  for (i = 0; i < N; i++)
  {
    printf("%d\n", array[i]);
  }

  /* Bubble sorting begins */
  for (i = 0; i < N; i++)
  {
    for (j = 0; j < (N - i - 1); j++)
    {
      if (array[j] > array[j + 1])
      {
        temp = array[j];
        array[j] = array[j + 1];
        array[j + 1] = temp;
      }
    }
  }
  printf("Sorted array is...\n");
  for (i = 0; i < N; i++)
  {
    printf("%d\n", array[i]);
  }

  printf("Enter the element to be searched\n");
  scanf("%d", &keynum);

  /* Binary searching begins */
  low = 0;
  high = N - 1;
  do
  {
    mid = (low + high) / 2;
    if (keynum < array[mid])
      high = mid - 1;
    else if (keynum > array[mid])
      low = mid + 1;
  } while (keynum != array[mid] && low <= high); /* End of do-while */

  if (keynum == array[mid])
  {
    printf("SUCCESSFUL SEARCH\n");
  }
  else
  {
    printf("Search is FAILED\n");
  }
} /* End of main */

Stacks and Queues
Two of the more common data objects found in computer algorithms are stacks and queues. Both of these objects are special cases of the more general data object, an ordered list. A stack is an ordered list in which all insertions and deletions are made at one end, called the top. A queue is an ordered list in which all insertions take place at one end, the rear, while all deletions take place at the other end, the front. Given a stack S = (a[1], a[2], ..., a[n]), we say that a[1] is the bottommost element and that a[i] is on top of a[i-1], 1 < i <= n. When viewed as a queue with a[n] as the rear element, one says that a[i+1] is behind a[i], 1 <= i < n. The restrictions on a stack imply that if the elements A, B, C, D, E are added to the stack, in that order, then the first element to be removed/deleted must be E. Equivalently, we say that the last element to be inserted into the stack will be the first to be removed. For this reason stacks are sometimes referred to as Last In First Out (LIFO) lists.
The restrictions on a queue imply that the first element which is inserted into the queue will be the first one to be removed. Thus A is the first letter to be removed, and queues are known as First In First Out (FIFO) lists. Note that the data object queue as defined here need not necessarily correspond to the mathematical concept of queue, in which the insert/delete rules may be different.

The four basic operations are:
- Adding an element into a stack.
- Deleting an element from a stack.
- Adding an element into a queue.
- Deleting an element from a queue.

Adding into a stack

procedure add(item : items);
{add item to the global stack; top is the current top of stack and n is its maximum size}
begin
  if top = n then stackfull;
  top := top + 1;
  stack[top] := item;
end; {of add}

Deletion from a stack

procedure delete(var item : items);
{remove the top element from the stack and put it in item}
begin
  if top = 0 then stackempty;
  item := stack[top];
  top := top - 1;
end; {of delete}

These two procedures are so simple that they perhaps need no more explanation. Procedure delete actually combines the functions TOP and DELETE. Stackfull and stackempty are procedures which are left unspecified, since they will depend upon the particular application. Often a stackfull condition will signal that more storage needs to be allocated and the program re-run. Stackempty is often a meaningful condition.
Addition into a queue

procedure addq(item : items);
{add item to the queue q}
begin
  if rear = n then queuefull
  else begin
    rear := rear + 1;
    q[rear] := item;
  end;
end; {of addq}

Deletion from a queue

procedure deleteq(var item : items);
{delete from the front of q and put into item}
begin
  if front = rear then queueempty
  else begin
    front := front + 1;
    item := q[front];
  end;
end; {of deleteq}

/* Program of stack using array */
#include <stdio.h>
#include <stdlib.h>

#define MAX 5

int top = -1;
int stack_arr[MAX];

void push();
void pop();
void display();

int main()
{
  int choice;
  while (1)
  {
    printf("1.Push\n");
    printf("2.Pop\n");
    printf("3.Display\n");
    printf("4.Quit\n");
    printf("Enter your choice : ");
    scanf("%d", &choice);
    switch (choice)
    {
      case 1: push(); break;
      case 2: pop(); break;
      case 3: display(); break;
      case 4: exit(0);
      default: printf("Wrong choice\n");
    } /* End of switch */
  } /* End of while */
} /* End of main() */

void push()
{
  int pushed_item;
  if (top == (MAX - 1))
    printf("Stack Overflow\n");
  else
  {
    printf("Enter the item to be pushed in stack : ");
    scanf("%d", &pushed_item);
    top = top + 1;
    stack_arr[top] = pushed_item;
  }
} /* End of push() */

void pop()
{
  if (top == -1)
    printf("Stack Underflow\n");
  else
  {
    printf("Popped element is : %d\n", stack_arr[top]);
    top = top - 1;
  }
} /* End of pop() */

void display()
{
  int i;
  if (top == -1)
    printf("Stack is empty\n");
  else
  {
    printf("Stack elements :\n");
    for (i = top; i >= 0; i--)
      printf("%d\n", stack_arr[i]);
  }
} /* End of display() */

Applications: As I already said, stacks help computers in unfolding their recursive jobs; they are used in converting an expression to its postfix form; used in graphs to find their traversals (we have seen that); they help in non-recursive traversal of binary trees (we'll see this); and so on. Here we'll see their role in converting an infix expression to a postfix one. The rest I have covered in other parts of my tutorial. So here we go....
An INFIX expression is simply a regular expression having operators in between operands; POSTFIX, obviously, is the form with operands followed by their respective operators. I'll take an example: if a/b-c*d is your infix expression, then ab/cd*- will be its postfix form. How? Let's see. The first step is to apply brackets to the expression following the BODMAS rule; then just move each operator to its respective closing bracket. The resultant is your postfix expression. Remember, this is the manual way of doing it.

((a/b)-(c*d)) ==> ab/cd*-

Now, where does the Stack figure in all this? When the computer has to do this conversion, it makes use of a stack. See how.

Algorithm:
Step-1: Check whether the current element in the expression is an operator or an operand. If it is an operand, go to Step-2; otherwise go to Step-3.
Step-2: Put the element in the postfix output stream and go to the next element in the expression, if any.
Step-3:
  a. Find the priority of that operator.
  b. If the operator's priority is greater than the priority of the stack's top, push that operator onto the stack and go to the next element; otherwise pop the stack, write that value to the postfix output stream, and go to the start of this step.
Step-4: Empty the stack and put the popped values directly into the output stream.

C implementation:
/* This is just a broad outline of the exact implementation.
   The actual implementation is downloadable. */
char *infix_to_postfix(char *expr)
{
  static char out[100]; /* fixed size chosen here for illustration */
  int i = 0;

  while (*expr != '\0')
  {
    if (*expr == OPERAND)              /* step-1: OPERAND stands for "is an operand" */
    {
      out[i] = *expr;                  /* step-2 */
      i = i + 1;
    }
    else                               /* step-3 */
    {
      while (priority(*expr) <= priority(stack[top]))
      {
        out[i] = pop();
        i = i + 1;
      }
      push(*expr);
    }
    expr++;
  }
  while (top != -1)                    /* step-4 */
  {
    out[i] = pop();
    i++;
  }
  return out;
}

A queue (pronounced /kjuː/) is a particular kind of collection in which the entities in the collection are kept in order and the principal (or only) operations on the collection are the addition of entities to the rear terminal position and the removal of entities from the front terminal position. This makes the queue a First-In-First-Out (FIFO) data structure. In a FIFO data structure, the first element added to the queue will be the first one to be removed. This is equivalent to the requirement that whenever an element is added, all elements that were added before it have to be removed before the new element can be removed. A queue is an example of a linear data structure. Queues provide services in computer science, transport and operations research, where various entities such as data, objects, persons, or events are stored and held to be processed later. In these contexts, the queue performs the function of a buffer. Queues are common in computer programs, where they are implemented as data structures coupled with access routines, as an abstract data structure, or in object-oriented languages as classes. Common implementations are circular buffers and linked lists.
/* Program of queue using array */
#include <stdio.h>
#include <stdlib.h>

#define MAX 5

int queue_arr[MAX];
int rear = -1;
int front = -1;

void insert();
void del();
void display();

int main()
{
  int choice;
  while (1)
  {
    printf("1.Insert\n");
    printf("2.Delete\n");
    printf("3.Display\n");
    printf("4.Quit\n");
    printf("Enter your choice : ");
    scanf("%d", &choice);
    switch (choice)
    {
      case 1: insert(); break;
      case 2: del(); break;
      case 3: display(); break;
      case 4: exit(0);
      default: printf("Wrong choice\n");
    } /* End of switch */
  } /* End of while */
} /* End of main() */

void insert()
{
  int added_item;
  if (rear == MAX - 1)
    printf("Queue Overflow\n");
  else
  {
    if (front == -1) /* If queue is initially empty */
      front = 0;
    printf("Input the element for adding in queue : ");
    scanf("%d", &added_item);
    rear = rear + 1;
    queue_arr[rear] = added_item;
  }
} /* End of insert() */

void del()
{
  if (front == -1 || front > rear)
  {
    printf("Queue Underflow\n");
    return;
  }
  else
  {
    printf("Element deleted from queue is : %d\n", queue_arr[front]);
    front = front + 1;
  }
} /* End of del() */

void display()
{
  int i;
  if (front == -1)
    printf("Queue is empty\n");
  else
  {
    printf("Queue is :\n");
    for (i = front; i <= rear; i++)
      printf("%d ", queue_arr[i]);
    printf("\n");
  }
} /* End of display() */

A priority queue is an abstract data
type in computer programming that supports the following three operations:
- insertWithPriority: add an element to the queue with an associated priority.
- getNext: remove the element from the queue that has the highest priority, and return it (also known as "PopElement(Off)" or "GetMinimum").
- peekAtNext (optional): look at the element with the highest priority without removing it.

A double-ended queue (often abbreviated to deque, pronounced "deck") is an abstract data structure that implements a queue for which elements can be added to or removed from either the front (head) or the back (tail). It is also often called a head-tail linked list.

A linked list is called so because each of the items in the list is part of a structure which is linked to the structure containing the next item. This type of list is called a linked list since it can be considered as a list whose order is given by links from one item to the next.

Each item in the list is a node consisting of two fields: one containing the data, and one containing the address of the next item in the list (i.e., a pointer to the next item). A linked list is therefore a collection of structures ordered by logical links that are stored as part of the data.

Consider the following example to illustrate the concept of linking. Suppose we define a structure as follows:

struct linked_list
{
  float age;
  struct linked_list *next;
};
struct linked_list node1, node2;

This statement creates space for two nodes, each containing two empty fields:

node1: node1.age | node1.next
node2: node2.age | node2.next

The next pointer of node1 can be made to point to node2 by the statement:
node1.next = &node2;

This statement stores the address of node2 into the field node1.next, and this establishes a link between node1 and node2. In the same way, further nodes can be linked. A special pointer value called NULL can be stored in the next field of the last node to mark the end of the list.

Advantages of linked lists:
- A linked list is a dynamic data structure, so it can grow or shrink in size during execution of the program.
- A linked list does not require memory to be reserved in advance, so it does not waste memory.
- It provides flexibility in rearranging items efficiently.

The limitation of a linked list is that it consumes extra space when compared to an array, since each node must also contain the address of the next item in the list; moreover, searching for a single item in a linked list is cumbersome and time consuming.

Types of linked lists: there are different kinds of linked lists:
- Linear singly linked list
- Circular singly linked list
- Two-way or doubly linked list
- Circular doubly linked list

Applications of linked lists: linked list concepts are useful to model many different abstract data types, such as queues, stacks and trees. If we restrict the process of insertions to one end of the list and deletions to the other end, we obtain a queue.

Linear and circular lists
In the last node of a list, the link field often contains a null reference, a special value that is interpreted by programs as meaning "there is no such node". A less common convention is to make it point to the first node of the list; in that case the list is said to be circular or circularly linked; otherwise it is said to be open or linear.

Singly-, doubly-, and multiply-linked lists
Singly-linked lists contain nodes which have a data field as well as a next field, which points to the next node in the linked list. In a doubly-linked list, each node contains, besides the next-node link, a second link field pointing to the previous node in the sequence.
The two links may be called forward(s) and backward(s), or next and prev(ious).

[Figure: a doubly-linked list whose nodes contain three fields: an integer value, the link forward to the next node, and the link backward to the previous node.]

The technique known as XOR-linking allows a doubly-linked list to be implemented using a single link field in each node. However, this technique requires the ability to do bit operations on addresses, and therefore may not be available in some high-level languages.

In a multiply-linked list, each node contains two or more link fields, each field being used to connect the same set of data records in a different order (e.g., by name, by department, by date of birth, etc.). (While doubly-linked lists can be seen as special cases of multiply-linked lists, the fact that the two orders are opposite to each other leads to simpler and more efficient algorithms, so they are usually treated as a separate case.)

In the case of a circular doubly linked list, the only change is that the end, or "tail", of the list is linked back to the front, or "head", of the list, and vice versa.

TREES: In computer science, a tree is a widely-used data structure that emulates a hierarchical tree structure with a set of linked nodes. Mathematically, it is a tree, more specifically an arborescence: an acyclic connected graph where each node has zero or more child nodes and at most one parent node. Furthermore, the children of each node have a specific order.

Binary Trees
The simplest form of tree is a binary tree. A binary tree consists of
a. a node (called the root node), and
b. left and right sub-trees.
Both of the sub-trees are themselves binary trees.

[Figure: a binary tree]

The nodes at the lowest levels of the tree (the ones with no sub-trees) are called leaves. In an ordered binary tree,
1. the keys of all the nodes in the left sub-tree are less than that of the root,
2. the keys of all the nodes in the right sub-tree are greater than that of the root,
3.
the left and right sub-trees are themselves ordered binary trees.

Data Structure
The data structure for the tree implementation simply adds left and right pointers in place of the next pointer of the linked list implementation.

Complete Trees
Before we look at more general cases, let's make the optimistic assumption that we've managed to fill our tree neatly, i.e. that each leaf is the same 'distance' from the root. This forms a complete tree, whose height is defined as the number of links from the root to the deepest leaf.

[Figure: a complete tree]

First, we need to work out how many nodes, n, we have in such a tree of height h. Now,

n = 1 + 2 + 2^2 + ... + 2^h

from which we have

n = 2^(h+1) - 1   and   h = floor(log2 n)

Examination of the Find method shows that in the worst case, h+1 or ceiling(log2 n) comparisons are needed to find an item. This is the same as for binary search. However, Add also requires ceiling(log2 n) comparisons to determine where to add an item. Actually adding the item takes a constant number of operations, so we say that a binary tree requires O(log n) operations for both adding and finding an item - a considerable improvement over binary search for a dynamic structure, which often requires addition of new items. Deletion is also an O(log n) operation.

General binary trees
However, in general, addition of items to an ordered tree will not produce a complete tree. The worst case occurs if we add an ordered list of items to a tree. This problem is readily overcome: we use a structure known as a heap. However, before looking at heaps, we should formalise our ideas about the complexity of algorithms by defining carefully what O(f(n)) means.

Root Node
Node at the "top" of a tree - the one from which all operations on the tree commence. The root node may not exist (a NULL tree with no nodes in it) or have 0, 1 or 2 children in a binary tree.

Leaf Node
Node at the "bottom" of a tree - farthest from the root. Leaf nodes have no children.
Complete Tree
Tree in which each leaf is at the same distance from the root. A more precise and formal definition of a complete tree is set out later.

Height
Number of links which must be traversed from the root to reach a leaf of a tree.

Binary search tree
In computer science, a binary search tree (BST) or ordered binary tree is a node-based binary tree data structure which has the following properties:
- The left subtree of a node contains only nodes with keys less than the node's key.
- The right subtree of a node contains only nodes with keys greater than or equal to the node's key.
- Both the left and right subtrees must also be binary search trees.

From the above properties it naturally follows that each node (item in the tree) has a distinct key.

Generally, the information represented by each node is a record rather than a single data element. However, for sequencing purposes, nodes are compared according to their keys rather than any part of their associated records. The major advantage of binary search trees over other data structures is that the related sorting algorithms and search algorithms, such as in-order traversal, can be very efficient. Binary search trees are a fundamental data structure used to construct more abstract data structures such as sets, multisets, and associative arrays.

Operations on a binary search tree require comparisons between nodes. These comparisons are made with calls to a comparator, which is a subroutine that computes the total order (linear order) on any two values. This comparator can be explicitly or implicitly defined, depending on the language in which the BST is implemented.

Searching
Searching a binary tree for a specific value can be a recursive or iterative process. This explanation covers a recursive method. We begin by examining the root node. If the tree is null, the value we are searching for does not exist in the tree. Otherwise, if the value equals the root, the search is successful.
If the value is less than the root, search the left subtree. Similarly, if it is greater than the root, search the right subtree. This process is repeated until the value is found or the indicated subtree is null. If the searched value is not found before a null subtree is reached, then the item must not be present in the tree.

Here is the search algorithm in the Python programming language:

# 'node' refers to the parent node in this case
def search_binary_tree(node, key):
    if node is None:
        return None  # key not found
    if key < node.key:
        return search_binary_tree(node.leftChild, key)
    elif key > node.key:
        return search_binary_tree(node.rightChild, key)
    else:  # key is equal to node key
        return node.value  # found key

... or the equivalent Haskell:

searchBinaryTree _ NullNode = Nothing
searchBinaryTree key (Node nodeKey nodeValue (leftChild, rightChild)) =
    case compare key nodeKey of
        LT -> searchBinaryTree key leftChild
        GT -> searchBinaryTree key rightChild
        EQ -> Just nodeValue

This operation requires O(log n) time in the average case, but needs O(n) time in the worst case, when the unbalanced tree resembles a linked list (degenerate tree).

Assuming that BinarySearchTree is a class with a member function search(int) and a pointer to the root node, the algorithm is also easily implemented in terms of an iterative approach. The algorithm enters a loop, and decides whether to branch left or right depending on the value of the node at each parent node.
bool BinarySearchTree::search(int val)
{
    Node* next = this->root();
    while (next != 0) {
        if (val == next->value()) {
            return true;
        } else if (val < next->value()) {
            next = next->left();
        } else if (val > next->value()) {
            next = next->right();
        }
    }
    // not found
    return false;
}

Here is the search algorithm in the Java programming language:

public boolean search(TreeNode node, int data) {
    if (node == null) {
        return false;
    }
    if (node.getData() == data) {
        return true;
    } else if (data < node.getData()) {
        // data must be in left subtree
        return search(node.getLeft(), data);
    } else {
        // data must be in right subtree
        return search(node.getRight(), data);
    }
}

Insertion
Insertion begins as a search would begin; if the root is not equal to the value, we search the left or right subtrees as before. Eventually, we will reach an external node and add the value as its right or left child, depending on the node's value. In other words, we examine the root and recursively insert the new node into the left subtree if the new value is less than the root, or into the right subtree if the new value is greater than or equal to the root.

Here's how a typical binary search tree insertion might be performed in C++:

/* Inserts the node pointed to by "newNode" into the subtree
   rooted at "treeNode" */
void InsertNode(Node*& treeNode, Node* newNode)
{
    if (treeNode == NULL)
        treeNode = newNode;
    else if (newNode->key < treeNode->key)
        InsertNode(treeNode->left, newNode);
    else
        InsertNode(treeNode->right, newNode);
}

The above "destructive" procedural variant modifies the tree in place. It uses only constant space, but the previous version of the tree is lost.
Alternatively, as in the following Python example, we can reconstruct all ancestors of the inserted node; any reference to the original tree root remains valid, making the tree a persistent data structure:

def binary_tree_insert(node, key, value):
    if node is None:
        return TreeNode(None, key, value, None)
    if key == node.key:
        return TreeNode(node.left, key, value, node.right)
    if key < node.key:
        return TreeNode(binary_tree_insert(node.left, key, value),
                        node.key, node.value, node.right)
    else:
        return TreeNode(node.left, node.key, node.value,
                        binary_tree_insert(node.right, key, value))

The part that is rebuilt uses Θ(log n) space in the average case and Ω(n) in the worst case (see big-O notation). In either version, this operation requires time proportional to the height of the tree in the worst case, which is O(log n) time in the average case over all trees, but Ω(n) time in the worst case.

Another way to explain insertion is that in order to insert a new node in the tree, its value is first compared with the value of the root. If its value is less than the root's, it is then compared with the value of the root's left child. If its value is greater, it is compared with the root's right child. This process continues until the new node is compared with a leaf node, and then it is added as this node's right or left child, depending on its value. There are other ways of inserting nodes into a binary tree, but this is the only way of inserting nodes at the leaves and at the same time preserving the BST structure.
Here is an iterative approach to inserting into a binary search tree:

public void insert(int data) {
    if (root == null) {
        root = new TreeNode(data, null, null);
    } else {
        TreeNode current = root;
        while (current != null) {
            if (data < current.getData()) {
                // insert left
                if (current.getLeft() == null) {
                    current.setLeft(new TreeNode(data, null, null));
                    return;
                } else {
                    current = current.getLeft();
                }
            } else {
                // insert right
                if (current.getRight() == null) {
                    current.setRight(new TreeNode(data, null, null));
                    return;
                } else {
                    current = current.getRight();
                }
            }
        }
    }
}

Below is a recursive approach to the insertion method. As pointers are not available in Java, we must return the new subtree root to the caller, as indicated by the final line of the method.

public TreeNode insert(TreeNode node, int data) {
    if (node == null) {
        node = new TreeNode(data, null, null);
    } else if (data < node.getData()) {
        // insert left
        node.left = insert(node.getLeft(), data);
    } else {
        // insert right
        node.right = insert(node.getRight(), data);
    }
    return node;
}

Deletion
There are three possible cases to consider:
1. Deleting a leaf (node with no children): deleting a leaf is easy, as we can simply remove it from the tree.
2. Deleting a node with one child: delete the node and replace it with its child.
3. Deleting a node with two children: call the node to be deleted "N". Do not delete N. Instead, choose either its in-order successor node or its in-order predecessor node, "R". Replace the value of N with the value of R, then delete R. (Note: R itself has at most one child.)
As with all binary trees, a node's in-order successor is the left-most child of its right subtree, and a node's in-order predecessor is the right-most child of its left subtree. In either case, this node will have zero or one children. Delete it according to one of the two simpler cases above.
Consistently using the in-order successor or the in-order predecessor for every instance of the two-child case can lead to an unbalanced tree, so good implementations vary this choice.

Running Time Analysis: Although this operation does not always traverse the tree down to a leaf, this is always a possibility; thus in the worst case it requires time proportional to the height of the tree. It does not require more even when the node has two children, since it still follows a single path and does not visit any node twice.

Here is the code in Python:

def findMin(self):
    '''Finds the smallest element that is a child of *self*.'''
    current_node = self
    while current_node.left_child:
        current_node = current_node.left_child
    return current_node

def replace_node_in_parent(self, new_value=None):
    '''Removes the reference to *self* from *self.parent*
    and replaces it with *new_value*.'''
    if self == self.parent.left_child:
        self.parent.left_child = new_value
    else:
        self.parent.right_child = new_value
    if new_value:
        new_value.parent = self.parent

def binary_tree_delete(self, key):
    if key < self.key:
        self.left_child.binary_tree_delete(key)
    elif key > self.key:
        self.right_child.binary_tree_delete(key)
    else:  # delete the key here
        if self.left_child and self.right_child:
            # both children are present:
            # get the smallest node that's bigger than *self*
            successor = self.right_child.findMin()
            self.key = successor.key
            # if *successor* has a child, replace it with that;
            # at this point it can only have a right_child
            # (if it has no children, right_child will be None)
            successor.replace_node_in_parent(successor.right_child)
        elif self.left_child or self.right_child:
            # the node has only one child
            if self.left_child:
                self.replace_node_in_parent(self.left_child)
            else:
                self.replace_node_in_parent(self.right_child)
        else:
            # this node has no children
            self.replace_node_in_parent(None)

Traversal
Once the binary search tree has been created, its elements can be retrieved in-order
by recursively traversing the left subtree of the root node, accessing the node itself, then recursively traversing the right subtree of the node, continuing this pattern with each node in the tree as it is recursively accessed. As with all binary trees, one may conduct a pre-order traversal or a post-order traversal, but neither is likely to be useful for binary search trees.

The code for in-order traversal in Python is given below. It will call callback for every node in the tree.

def traverse_binary_tree(node, callback):
    if node is None:
        return
    traverse_binary_tree(node.leftChild, callback)
    callback(node.value)
    traverse_binary_tree(node.rightChild, callback)

Traversal requires Ω(n) time, since it must visit every node. This algorithm is also O(n), so it is asymptotically optimal.

Sort
A binary search tree can be used to implement a simple but efficient sorting algorithm. Similar to heapsort, we insert all the values we wish to sort into a new ordered data structure - in this case a binary search tree - and then traverse it in order, building our result:

def build_binary_tree(values):
    tree = None
    for v in values:
        tree = binary_tree_insert(tree, v)
    return tree

def get_inorder_traversal(root):
    '''Returns a list containing all the values in the tree,
    starting at *root*. Traverses the tree in order
    (leftChild, root, rightChild).'''
    result = []
    traverse_binary_tree(root, lambda element: result.append(element))
    return result

The worst-case time of build_binary_tree is Θ(n^2) - if you feed it a sorted list of values, it chains them into a linked list with no left subtrees. For example, build_binary_tree([1, 2, 3, 4, 5]) yields the tree (1 (2 (3 (4 (5))))). There are several schemes for overcoming this flaw with simple binary trees; the most common is the self-balancing binary search tree. If this same procedure is done using such a tree, the overall worst-case time is O(n log n), which is asymptotically optimal for a comparison sort.
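Putting these pieces together, tree sort can be sketched as a single self-contained Python program (the names TreeNode, insert and tree_sort here are illustrative, not from the text); since it uses plain unbalanced insertion, it inherits the Θ(n^2) worst case discussed above:

```python
class TreeNode:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def insert(node, key):
    # Recursive BST insert: returns the (possibly new) subtree root.
    if node is None:
        return TreeNode(key)
    if key < node.key:
        node.left = insert(node.left, key)
    else:  # duplicates go to the right subtree
        node.right = insert(node.right, key)
    return node

def tree_sort(values):
    root = None
    for v in values:
        root = insert(root, v)
    result = []
    def inorder(n):  # left, root, right yields ascending order
        if n is not None:
            inorder(n.left)
            result.append(n.key)
            inorder(n.right)
    inorder(root)
    return result

print(tree_sort([5, 2, 8, 1, 9, 3]))  # [1, 2, 3, 5, 8, 9]
```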
In practice, the poor cache performance and added overhead in time and space of a tree-based sort (particularly for node allocation) make it inferior to other asymptotically optimal sorts such as heapsort for static list sorting. On the other hand, it is one of the most efficient methods of incremental sorting, adding items to a list over time while keeping the list sorted at all times.

Types
There are many types of binary search trees. AVL trees and red-black trees are both forms of self-balancing binary search trees. A splay tree is a binary search tree that automatically moves frequently accessed elements nearer to the root. In a treap ("tree heap"), each node also holds a priority, and the parent node has higher priority than its children.

Two other terms describing binary search trees are complete and degenerate. A complete tree is a tree with n levels, where for each level d <= n - 1, the number of existing nodes at level d is equal to 2^d. This means all possible nodes exist at these levels. An additional requirement for a complete binary tree is that for the nth level, while every node does not have to exist, the nodes that do exist must fill from left to right. A degenerate tree is a tree where each parent node has only one associated child node, which means that in a performance measurement the tree will essentially behave like a linked list data structure.

Performance comparisons

Optimal binary search trees
If we don't plan on modifying a search tree, and we know exactly how often each item will be accessed, we can construct an optimal binary search tree, which is a search tree where the average cost of looking up an item (the expected search cost) is minimized. Even if we only have estimates of the search costs, such a system can considerably speed up lookups on average.
For example, if you have a BST of English words used in a spell checker, you might balance the tree based on word frequency in text corpora, placing words like "the" near the root and words like "agerasia" near the leaves. Such a tree might be compared with Huffman trees, which similarly seek to place frequently-used items near the root in order to produce a dense information encoding; however, Huffman trees only store data elements in leaves, and these elements need not be ordered.

If we do not know the sequence in which the elements in the tree will be accessed in advance, we can use splay trees, which are asymptotically as good as any static search tree we can construct for any particular sequence of lookup operations.

Alphabetic trees are Huffman trees with the additional constraint on order or, equivalently, search trees with the modification that all elements are stored in the leaves. Faster algorithms exist for optimal alphabetic binary trees (OABTs).

Tree traversal
In computer science, tree traversal refers to the process of visiting (examining and/or updating) each node in a tree data structure, exactly once, in a systematic way. Such traversals are classified by the order in which the nodes are visited. The following algorithms are described for a binary tree, but they may be generalized to other trees as well.

Traversal
Compared to linear data structures like linked lists and one-dimensional arrays, which have only one logical means of traversal, tree structures can be traversed in many different ways. Starting at the root of a binary tree, there are three main steps that can be performed, and the order in which they are performed defines the traversal type. These steps (in no particular order) are: performing an action on the current node (referred to as "visiting" the node), traversing to the left child node, and traversing to the right child node. Thus the process is most easily described through recursion.
The names given to particular styles of traversal come from the position of the root element with regard to the left and right nodes. Imagine that the left and right nodes are constant in space; then the root node could be placed to the left of the left node (pre-order), between the left and right node (in-order), or to the right of the right node (post-order).

Depth-first Traversal
To traverse a non-empty binary tree in preorder, perform the following operations recursively at each node, starting with the root node:
1. Visit the root.
2. Traverse the left subtree.
3. Traverse the right subtree.

To traverse a non-empty binary tree in inorder (symmetric), perform the following operations recursively at each node:
1. Traverse the left subtree.
2. Visit the root.
3. Traverse the right subtree.

To traverse a non-empty binary tree in postorder, perform the following operations recursively at each node:
1. Traverse the left subtree.
2. Traverse the right subtree.
3. Visit the root.

Breadth-first Traversal
Finally, trees can also be traversed in level-order, where we visit every node on a level before going to a lower level. This is also called breadth-first traversal.
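Level-order traversal is naturally expressed with a FIFO queue rather than recursion. Here is a minimal Python sketch (the Node class and function names are assumptions for illustration, not from the text), built on the example tree used in the next section:

```python
from collections import deque

class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def level_order(root):
    """Breadth-first traversal: visit each level completely
    before descending, using a FIFO queue of pending nodes."""
    order = []
    queue = deque([root] if root else [])
    while queue:
        node = queue.popleft()
        order.append(node.key)
        if node.left:
            queue.append(node.left)
        if node.right:
            queue.append(node.right)
    return order

# The example tree from the text, with F at the root.
tree = Node('F',
            Node('B', Node('A'), Node('D', Node('C'), Node('E'))),
            Node('G', None, Node('I', Node('H'))))
print(level_order(tree))  # ['F', 'B', 'G', 'A', 'D', 'I', 'C', 'E', 'H']
```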
Example
In this binary search tree:
Preorder traversal sequence: F, B, A, D, C, E, G, I, H (root, left, right)
Inorder traversal sequence: A, B, C, D, E, F, G, H, I (left, root, right); note how this produces a sorted sequence
Postorder traversal sequence: A, C, E, D, B, H, I, G, F (left, right, root)
Level-order traversal sequence: F, B, G, A, D, I, C, E, H

[The original here tabulates, step by step, the push/pop operations on the stack (or queue, for level-order) during each of the four traversals; the four-column layout of the table was lost in extraction.]

Sample implementations

preorder(node)
    print node.value
    if node.left ≠ null then preorder(node.left)
    if node.right ≠ null then preorder(node.right)

inorder(node)
    if node.left ≠ null then inorder(node.left)
    print node.value
    if node.right ≠ null then inorder(node.right)

postorder(node)
    if node.left ≠ null then postorder(node.left)
    if node.right ≠ null then postorder(node.right)
    print node.value

All sample implementations will require call stack space proportional to the height of the tree. In a poorly balanced tree, this can be quite considerable. We can remove the stack requirement by maintaining parent pointers in each node, or by threading the tree. In the case of using threads, this will allow for greatly improved inorder traversal, although retrieving the parent node required for preorder and postorder traversal will be slower than a simple stack-based algorithm.
To traverse a threaded tree inorder, we could do something like this:

inorder(node)
    while hasleftchild(node) do
        node = node.left
    do
        visit(node)
        if hasrightchild(node) then
            node = node.right
            while hasleftchild(node) do
                node = node.left
        else
            while node.parent ≠ null and node = node.parent.right do
                node = node.parent
            node = node.parent
    while node ≠ null

Note that a threaded binary tree will provide a means of determining whether a pointer is a child or a thread. See threaded binary trees for more information.

Inorder traversal
It is particularly common to use an inorder traversal on a binary search tree because this will return values from the underlying set in order, according to the comparator that set up the binary search tree (hence the name). To see why this is the case, note that if n is a node in a binary search tree, then everything in n's left subtree is less than n, and everything in n's right subtree is greater than or equal to n. Thus, if we visit the left subtree in order, using a recursive call, then visit n, and then visit the right subtree in order, we have visited the entire subtree rooted at n in order. We can assume the recursive calls correctly visit the subtrees in order using the mathematical principle of structural induction. Traversing in reverse inorder similarly gives the values in decreasing order.

Preorder traversal
Traversing a tree in preorder while inserting the values into a new tree is a common way of making a complete copy of a binary search tree. One can also use preorder traversals to get a prefix expression (Polish notation) from an expression tree: traverse the expression tree in preorder. To calculate the value of such an expression, scan from right to left, placing the elements on a stack. Each time we find an operator, we replace the two top symbols of the stack with the result of applying the operator to those elements.
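The right-to-left stack evaluation just described can be sketched in Python (the name eval_prefix is illustrative, and only the four basic arithmetic operators are handled):

```python
def eval_prefix(tokens):
    """Evaluate a prefix (Polish notation) expression by scanning
    the tokens right to left, keeping operands on a stack; each
    operator replaces the two top stack entries with its result."""
    ops = {'*': lambda a, b: a * b, '+': lambda a, b: a + b,
           '-': lambda a, b: a - b, '/': lambda a, b: a / b}
    stack = []
    for token in reversed(tokens):
        if token in ops:
            a = stack.pop()  # left operand
            b = stack.pop()  # right operand
            stack.append(ops[token](a, b))
        else:
            stack.append(int(token))
    return stack.pop()

print(eval_prefix(['*', '+', '2', '3', '4']))  # 20
```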
For instance, the expression * + 2 3 4, which in infix notation is (2 + 3) * 4, would be evaluated like this:

Using prefix traversal to evaluate an expression tree
Expression (remaining)    Stack
* + 2 3 4                 <empty>
* + 2 3                   4
* + 2                     3 4
* +                       2 3 4
*                         5 4
Answer: 20

MaxHeapify

template<class T> void maxHeapify(T* vals, int n, int root)

The function maxHeapify accepts a heap, defined by a pointer to its data and a total node count. It performs the heapify operation on the given root, which may differ from the actual root of the heap, which resides in position 1 of the data array (see the note on indexing below).

A heap is a very important data structure in computing. It can be described as a nearly complete binary tree. In simpler terms, a heap is a pyramidal data structure which grows downwards from a root node. Each level in the heap has to be complete before the next level begins. Each element in the heap has zero, one or two children. The (max-)heap property is that a given node has a value which is greater than or equal to the values held by its children.

Heaps are useful because they are very efficiently represented as an array. In the array representation, the following equations give the indices of the left and right children (and the parent) of node i:

Left(i) = 2i                  (1)
Right(i) = 2i + 1             (2)
Parent(i) = floor(i / 2)      (3)

The heapify operation works by assuming that each binary subtree at Left(i) and Right(i) is a heap, but A[i] may be smaller than its children, thus violating the heap property. The value at A[i] is compared with its children and recursively propagated down the heap if necessary until the heap property is restored.

Note: the root node index starts at 1 rather than 0.

Parameters:
vals - the array of values
n - the number of items in the array
root - the root node for the heapify operation

HeapSort

template<class T> void heapSort(T* vals, int n)

Heap sort permutes an array of values to lie in ascending numerical order. The code for heap sort is fairly compact, making it a good all-round choice.
The complexity of heap sort is:

O(n log n)                    (4)

Heap sort is best applied to very large data sets, of over a million items. The algorithm sorts in place and therefore does not require the creation of auxiliary arrays (as, for example, Array/Sort/bucket_Sort does). The algorithm is also non-recursive, so it can never cause a stack overflow in the case of large data arrays. Heap sort is beaten by Array/Sort/quick_Sort in terms of speed; however, with large data sets the recursive nature of quick sort can become a limiting factor.

Heap sort works by first transforming the array of values into a max-heap using the maxHeapify function. The maximum values are then read out from the top of the heap one at a time, shrinking the heap by one each iteration. An index moves from the end of the array backwards; the maximum values are stored there, creating the ascending sorted list.

Example 1:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <codecogs/array/sort/heap_sort.h>

int main()
{
    double vals[25];
    int n = 25;
    for (int i = 0; i < n; i++)
        vals[i] = ((double) n * rand()) / RAND_MAX;
    printf("\nArray to be sorted:\n");
    for (int i = 0; i < n; i++)
        printf("%3.0f ", vals[i]);
    Array::Sort::heapSort(vals, n);
    printf("\nSorted array:\n");
    for (int i = 0; i < n; i++)
        printf("%3.0f ", vals[i]);
    printf("\n");
    return 0;
}

Output:
[The numeric output in the original is garbled by extraction; the program prints the 25 random values, then the same values in ascending order.]

Parameters:
vals - the array of values to be sorted
n - the number of items in the array
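As a rough illustration of the maxHeapify and heapSort routines described above, here is a Python sketch using the same 1-based indexing from equations (1) and (2), with slot 0 of the list left unused; the names max_heapify and heap_sort are illustrative, not the codecogs API:

```python
def max_heapify(vals, n, i):
    """Sift vals[i] down its subtree until the max-heap property
    holds, using Left(i) = 2i and Right(i) = 2i + 1 (1-based)."""
    while True:
        left, right, largest = 2 * i, 2 * i + 1, i
        if left <= n and vals[left] > vals[largest]:
            largest = left
        if right <= n and vals[right] > vals[largest]:
            largest = right
        if largest == i:
            return
        vals[i], vals[largest] = vals[largest], vals[i]
        i = largest

def heap_sort(vals, n):
    # Build a max-heap bottom-up, then repeatedly swap the maximum
    # (at index 1) to the end of the array and shrink the heap.
    for i in range(n // 2, 0, -1):
        max_heapify(vals, n, i)
    for end in range(n, 1, -1):
        vals[1], vals[end] = vals[end], vals[1]
        max_heapify(vals, end - 1, 1)

data = [None, 23, 6, 13, 11, 4, 17, 7]  # slot 0 unused
heap_sort(data, 7)
print(data[1:])  # [4, 6, 7, 11, 13, 17, 23]
```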