Additional Laboratory in Ada
TDDB57 Data Structures and Algorithms
Fall 2006
September 2006

Getting started

To get started you will need access to the assignment skeletons. These can be found in the course directory /home/TDDB57/skel/. It is convenient to copy these files to your home directory with the commands below:

    mkdir ~/dalg/
    cp -r ~TDDB57/skel/* ~/dalg

After this, all necessary files can be found in your directory. Go to the directory and list its contents by typing:

    cd ~/dalg/
    ls

This lab assignment needs some programs from the directory /home/TDDB57/bin and access to an Ada compiler. You should therefore add these paths with the following commands:

    module add ~TDDB57/modules/tddb57
    module initadd ~TDDB57/modules/tddb57

After you have passed the laboratory course you can remove the paths:

    module initrm ~TDDB57/modules/tddb57

Good luck!

Laboratory 5

Aim. The objective is to learn implementation techniques for heaps and to demonstrate how a heap can be used for sorting.

Preparation. Read sections 8.1.3, 8.3.1-8.3.3 and 8.3.5 of the course book.

Heapsort

One of the applications of the heap data structure is the sorting algorithm Heapsort. Heapsort has worst-case time complexity O(n log n), which is better than the O(n²) worst case of Quicksort.¹ Another advantage of Heapsort is that it does not require any additional memory space. More precisely, the memory used in addition to that needed for storing the input data is constant (and very small).

¹ Despite this fact, Quicksort is often faster than Heapsort in practice.

One can develop a sorting algorithm based on a priority queue implemented with a heap. The idea would be to place all elements of the sequence to be sorted in the priority queue using insert, and then repeatedly apply the removeMin operation to retrieve them in increasing order. Recall that a heap is a partially ordered tree with an implicit representation in an array. As described in Section 8.3.3 of the course book, insert places a new element as an additional leaf of a given heap. The obtained tree may not be a partially ordered tree, since the new node may be smaller than its parent. The insertion is completed by repeatedly swapping the new node with its parent, as long as it is smaller than the parent, until a partially ordered tree is obtained.

In this lab we also use a variant of the heap, but we organize the sorting in a different way. Our variant of the heap has its maximal element in the root, and the children of every node are not larger than the node itself. All input data are initially placed in an array. The array is seen as an implicit representation of a tree. We first have to make it into a heap. Then the root of the tree (i.e. the first element of the array) is the maximal element with respect to the given ordering. We swap it with the last element of the array and make the remaining part of the array into a heap. Iterating this process places all elements of the array in non-decreasing order. Thus our Heapsort algorithm has the following two phases:

1. The heapification phase starts with the input data stored in an array. Go through the array in reverse order and, for each element x, heapify the subtree whose root is at the position of x. Since we heapify in reverse order, the subtrees rooted at the children of x are already heaps; for proper placement of x we can use the procedure shiftdown (see below), which, if necessary, swaps x with the larger of its children, and subsequently with further descendants. As a result of this phase the array becomes a heap with its maximal element in the root (i.e. as the first element of the array).

2. The sorting phase starts with the heap obtained as the result of the first phase. At every step of the sorting phase the array consists of two parts: a heap placed at the beginning of the array, followed by a sorted list. Initially the sorted list is empty. Each step of the sorting phase goes as follows. Let k > 1 be the size of the current heap (the heap occupies the first k elements of the array). Perform the removeMax operation by swapping the first and the last element of the heap, followed by a shiftdown operation on the first element to restore the heap of size k − 1. This phase is completed when k = 1, in which case the array contains the initial data sorted in non-decreasing order.

The Heapsort algorithm outlined above uses the auxiliary procedure described below:

• procedure shiftdown(a: ref array t; l, u: integer)
  The array a is an implicit representation of a tree. The indices of the interval l:u thus denote specific nodes of the tree, as described on p. 338 of the course book. Before the call to shiftdown, every node in the interval, except possibly the first one (position a[l]), should be greater than or equal to each of its children. After the execution this property should hold for all nodes of the interval, including the first one. This is achieved by repeatedly swapping the node that does not satisfy this property with its largest child. Notice that the section of the array processed by this procedure need not be a heap.

Instead of shiftdown it is possible to implement shiftup (not done in this lab, though) and use it to create the heap by adding the elements one by one. In this case the added element is the last element of the considered section of the array, so the heap condition can only be violated at the end of the section, not at the beginning. Otherwise the same requirements as for shiftdown apply.

The Assignment. Write a Heapsort program and the procedure shiftdown (called Siftdown in the skeleton) as described above. Analyze the time complexity of shiftdown and of the heapification phase of Heapsort. What would be the time complexity of heapification of the input data done by one-by-one insertion using insert and shiftup?

The skeleton for this assignment, hsort.adb, is shown below. Replace the dummy code sections with your implementation of the procedures.
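The two phases and the shiftdown procedure described above can be sketched in Ada roughly as follows. This is one possible version, not the reference solution. It assumes the tree numbering in which the root is at index 1 and the children of node i are at 2i and 2i+1 (check this against p. 338 of the course book), and it reuses the skeleton's names Array_T, Siftdown and Heapsort, with a hypothetical Size of 10 and a hypothetical driver procedure Heapsort_Demo. Positions after Size/2 are leaves, which are trivially heaps, so the heapification loop starts at Size/2 instead of at the last element; starting at the last element gives the same result, only with a few calls that do nothing.

```ada
with Ada.Text_IO;         use Ada.Text_IO;
with Ada.Integer_Text_IO; use Ada.Integer_Text_IO;

procedure Heapsort_Demo is

   --  Hypothetical small size for the demo; the skeleton uses 1000.
   Size : constant := 10;
   type Array_T is array (1 .. Size) of Integer;

   --  Restore the max-heap property on positions L..U, assuming every
   --  node except possibly A(L) already satisfies it. The children of
   --  node P are at 2*P and 2*P+1, which requires A'First = 1.
   procedure Siftdown (A : in out Array_T; L, U : Integer) is
      Parent : Integer := L;
      Child  : Integer;
      Tmp    : Integer;
   begin
      loop
         Child := 2 * Parent;                 --  left child
         exit when Child > U;                 --  no child inside L..U
         --  Choose the larger child if a right child exists.
         if Child < U and then A (Child + 1) > A (Child) then
            Child := Child + 1;
         end if;
         exit when A (Parent) >= A (Child);   --  property holds here
         Tmp        := A (Parent);            --  swap with larger child
         A (Parent) := A (Child);
         A (Child)  := Tmp;
         Parent     := Child;                 --  continue one level down
      end loop;
   end Siftdown;

   procedure Heapsort (A : in out Array_T) is
      Tmp : Integer;
   begin
      --  Phase 1: heapification, from the last internal node back to
      --  the root, so both child subtrees are heaps at each call.
      for I in reverse A'First .. A'Last / 2 loop
         Siftdown (A, I, A'Last);
      end loop;
      --  Phase 2: sorting. The heap occupies A(1..K); move its maximum
      --  to position K, then restore the heap on A(1..K-1).
      for K in reverse A'First + 1 .. A'Last loop
         Tmp          := A (A'First);
         A (A'First)  := A (K);
         A (K)        := Tmp;
         Siftdown (A, A'First, K - 1);
      end loop;
   end Heapsort;

   A : Array_T := (5, 3, 9, 1, 7, 2, 8, 6, 4, 10);

begin
   Heapsort (A);
   for I in A'Range loop
      Put (A (I), 4);
   end loop;
   New_Line;
end Heapsort_Demo;
```

Run on the permutation of 1..10 above, the sketch should print 1 through 10 in increasing order. The swap-based Siftdown follows the description in the text; a common refinement moves the sifted element into place only once, at the end, instead of swapping at every level.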
Compile your program with the command

    make hsort

The resulting executable file gets the name hsort. Run it and check the output:

    > ./hsort < random | checkorder | more

Reporting. Demonstrate your program to the assistant. The report should include a printout of the code and the complexity analysis of both heapification algorithms discussed above.

File hsort.adb

    with Ada.Text_IO; use Ada.Text_IO;
    with Ada.Integer_Text_IO; use Ada.Integer_Text_IO;
    with Ada.Float_Text_IO; use Ada.Float_Text_IO;
    with Ada.Real_Time; use Ada.Real_Time;

    procedure Hsort is

       -- optimize for speed
       -- comment-out when debugging
       pragma Suppress(All_Checks);

       Size: constant := 1000;
       type Array_T is array(1..Size) of Integer;

       procedure Siftdown(A: in out Array_T; L, U: Integer) is
       begin
          null; -- dummy code
       end;
       pragma Inline(Siftdown);

       procedure Heapsort(A: in out Array_T) is
       begin
          null; -- dummy code
       end;

       procedure Read_Sequence(A: out Array_T) is
       begin
          for I in A'Range loop
             Get(A(I));
          end loop;
       end;

       procedure Write_Sequence(A: Array_T) is
       begin
          for I in A'Range loop
             Put(A(I), 7);
             if I mod 10 = 0 then
                New_Line;
             end if;
          end loop;
       end;

       Iter: constant := 200;
       A: Array_T;
       As: array(1..Iter) of Array_T;
       Start: Time;
       T: Float;

    begin
       Read_Sequence(A);
       As(1..Iter) := (others => A);
       Start := Clock;
       for I in 1..Iter loop
          Heapsort(As(I));
       end loop;
       T := Float(To_Duration(Clock - Start));
       Put("Time used in heapsort: ");
       Put(1000.0*T/Float(Iter), Exp => 0, Fore => 3, Aft => 2);
       Put_Line(" ms");
       Write_Sequence(As(1));
    end;

Appendix A
Tracing program errors

There are different ways to debug a program:

1. Looking into the code and searching for errors.
2. Adding to the code additional statements that print the values of selected variables and/or trace which parts of the program are entered during the test execution.
3. Using a debugger, such as gdb.
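Method 2 is easier to use if the print statements can be switched off without deleting them. A minimal sketch of one way to do this is shown below: the trace calls are guarded by a Boolean constant. The names Trace_Demo, Debug and Trace are hypothetical and not part of the skeleton; the demo merely reverses a small array and prints its contents after every swap.

```ada
with Ada.Text_IO;         use Ada.Text_IO;
with Ada.Integer_Text_IO; use Ada.Integer_Text_IO;

procedure Trace_Demo is

   --  Hypothetical switch: set to False to silence all trace output
   --  without removing the Trace calls from the code.
   Debug : constant Boolean := True;

   type Array_T is array (1 .. 6) of Integer;

   --  Print Label followed by the elements A(L..U) on one line.
   procedure Trace (Label : String; A : Array_T; L, U : Integer) is
   begin
      if Debug then
         Put (Label);
         for I in L .. U loop
            Put (A (I), 4);
         end loop;
         New_Line;
      end if;
   end Trace;

   A   : Array_T := (4, 8, 1, 6, 3, 5);
   J   : Integer;
   Tmp : Integer;

begin
   Trace ("start:", A, A'First, A'Last);
   --  Reverse A in place, tracing the whole array after every swap.
   for I in A'First .. A'Last / 2 loop
      J     := A'Last - (I - A'First);   --  mirror position of I
      Tmp   := A (I);
      A (I) := A (J);
      A (J) := Tmp;
      Trace ("swap: ", A, A'First, A'Last);
   end loop;
end Trace_Demo;
```

In hsort.adb the same kind of Trace call could, for instance, be placed at the start of each loop iteration in Siftdown to show the array section L..U being processed.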
Before asking the assistant what is wrong with your code, you should first try to locate the problem with methods 1 and 2, and then 3, in this order. The use of the debugger is recommended if the execution of an Ada program ends abruptly. This is caused by an exception that is not caught by any exception handler, and the debugger usually makes it easier to find the reason for such errors. The debugger can also be used for tracing the execution step by step (for more details see Stepwise execution below).

Using the gdb debugger

Gdb is primarily designed for C and C++, and its use for Ada can sometimes be more difficult. To be able to use the debugger properly, compilation has to be done with a special flag that makes the compiler store additional information in the compiled program. Use the flag -g after gnatmake. The first time -g is used, also add -f to force recompilation of all files, e.g.:

    gnatmake -f -g program

Start the debugger gdb in emacs by typing M-x gdb, then give the name of the executable program as an argument to gdb.

Start the program by typing run and any parameters, e.g.:

    (gdb) run arg1 arg2

which corresponds to executing program arg1 arg2 in a shell. Almost all commands in gdb have an abbreviation, e.g. run can be shortened to r. In the sequel all abbreviations are written in parentheses.

To halt execution when an exception occurs, a breakpoint has to be set. This is done with the command break exception (b exception) before giving the command run (r).

When the execution of a program halts, gdb prints information about the line and the function that was last executed. If gdb is run inside emacs, the window is split into two parts, where the lower one shows the program code with an arrow => pointing at the last executed line. To go to the last executed line, type: up RET RET. Use the command backtrace (bt) to look at the whole call chain.
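With run-time checks enabled (i.e. with the line pragma Suppress(All_Checks) in hsort.adb commented out, as its comment suggests), an illegal array index raises Constraint_Error at the offending statement instead of silently writing outside the array; this is the kind of exception that break exception stops on. The minimal sketch below (the program name Index_Check_Demo is hypothetical) provokes an off-by-one index on purpose. Here the exception is caught locally only so that the program can report it; in a real bug it would propagate and end the program abruptly.

```ada
with Ada.Text_IO;    use Ada.Text_IO;
with Ada.Exceptions; use Ada.Exceptions;

procedure Index_Check_Demo is
   type Array_T is array (1 .. 5) of Integer;
   A      : Array_T := (others => 0);
   I      : Integer := A'Last + 1;   --  off by one: one past the end
   Caught : Boolean := False;
begin
   begin
      --  With index checks on (the GNAT default), this assignment
      --  raises Constraint_Error instead of writing outside A.
      A (I) := 1;
   exception
      when E : Constraint_Error =>
         Caught := True;
         Put_Line ("Caught " & Exception_Name (E));
   end;
end Index_Check_Demo;
```

With pragma Suppress(All_Checks) added to this program, the same assignment would no longer be checked, and the error could instead surface later as a memory access violation.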
The most common error is a memory access violation, which is often caused by an illegal value of a pointer or array index. To check the value of a variable or an expression, use the command print (p). This is illustrated by the following example, where i is an integer variable, e.g. an index of an array:

    (gdb) p i
    $1 = 5627

The value of i is in this case 5627, and $1 is a counter generated by gdb for every print command.

To access variables of the calling functions (outer frames in the call chain), use the command up. To go back, use the command down (do). It is also possible to refer to a variable of a particular function using ::. For example, the command p main::a displays the value of the variable a in main. The values of all local variables of a function can be printed with the command info locals (i lo). Similarly, the values of all arguments of a function can be obtained with info args (i arg).

Any change in the program requires recompilation before the debugger is used again. To restart the program, type run (r) again; the arguments used last time will be reused automatically. Use the command set confirm off to remove the question "Do you really want to restart?" each time the run command is used. To apply a new set of arguments, use the set args command, e.g. set args arg1 arg2 ...

To exit the debugger, close the gdb buffer, return to the code window and remove the upper emacs window: C-x k RET C-x o C-x 1.

Stepwise execution

If examining variable values at the selected program point is not sufficient to locate the error, one has to look more carefully at the execution steps at the program points where you suspect that things can go wrong. To do this, the execution must be halted at these points by setting so-called breakpoints. Breakpoints are often set at the entrance to a function or at a certain line of the code. This is done with the command break (b), with a code line number or a function name as the argument.
For example, to halt the execution directly after the start, use:

    (gdb) b main

If there are several functions with the same name, a list will appear from which you can select one of them. Once the breakpoints are in place, use run (r) to start the execution of the program. The execution will halt at the first breakpoint reached. Now one can go through the code step by step using the commands step (s) or next (n). There is a fundamental difference between the two commands: step goes into a function whenever there is a function call, while next does not. Simply pressing the return key repeats the last command.

Another useful command is until (u), with a line number or function name as an (optional) argument. It makes the execution continue to the program point indicated by the argument, but only within the body of the current function. If no argument is given, the execution goes to the next program line; this is often used to step quickly through a loop.

The command clear (cl) is used to remove a breakpoint. All breakpoints can be listed with the command info break (i b), which also shows the numbers assigned to them. A breakpoint can also be removed with the command delete (d), where the argument is the breakpoint number from that listing.

There are many more features and commands in gdb not discussed here. For details see e.g. the manuals available on the Internet.

Program versions

This appendix concerns the use of gnat-3.12p and gnat-patched gdb-4.17.