Data Structures and Algorithm Analysis

Lecturer: Jing Liu
Email: neouma@mail.xidian.edu.cn
Homepage: http://see.xidian.edu.cn/faculty/liujing

Textbook
Mark Allen Weiss, Data Structures and Algorithm Analysis in C, China Machine Press.

Grading
Final exam: 70%
Others: 30%

What are Data Structures and Algorithms?
Data structures are methods of organizing large amounts of data.

An algorithm is a procedure consisting of a finite set of instructions. Given an input from some set of possible inputs, systematically executing the instructions produces an output if one exists for that input, and produces nothing at all if there is no output for that particular input.

[Figure: Inputs (Problems), Instructions, Computers, Outputs (Answers); Programming Languages, Data Structures, Algorithms, Software Systems]

Contents
Chapter 3 Lists, Stacks, and Queues
Chapter 4 Trees
Chapter 5 Hashing
Chapter 6 Priority Queues (Heaps)
Chapter 7 Sorting
Chapter 8 The Disjoint Set ADT
Chapter 9 Graph Algorithms
Chapter 10 Algorithm Design Techniques

Abstract Data Types (ADTs)
One of the basic rules concerning programming is to break the program down into modules. Each module is a logical unit and does a specific job, and its size is kept small by calling other modules. Modularity has several advantages:
(1) It is much easier to debug small routines than large routines;
(2) It is easier for several people to work on a modular program simultaneously;
(3) A well-written modular program places certain dependencies in only one routine, making changes easier.

An abstract data type (ADT) is a set of operations. Abstract data types are mathematical abstractions; nowhere in an ADT’s definition is there any mention of how the set of operations is implemented. Objects such as lists, sets, and graphs, along with their operations, can be viewed as abstract data types, just as integers, reals, and booleans are data types.
Integers, reals, and booleans have operations associated with them, and so do ADTs.

The basic idea is that the implementation of the operations related to an ADT is written once in the program, and any other part of the program that needs to perform an operation on the ADT can do so by calling the appropriate function. If for some reason implementation details need to be changed, it should be easy to do so by merely changing the routines that perform the ADT operations.

There is no rule telling us which operations must be supported for each ADT; this is a design decision.

The List ADT
The form of a general list is $A_1, A_2, A_3, \ldots, A_N$, and the size of this list is $N$. An empty list is a special list of size 0.
For any list except the empty list, we say that $A_{i+1}$ follows (or succeeds) $A_i$ ($i < N$) and that $A_{i-1}$ precedes $A_i$ ($i > 1$).
The first element of the list is $A_1$, and the last element is $A_N$. We will not define the predecessor of $A_1$ or the successor of $A_N$.
The position of element $A_i$ in a list is $i$.

There is a set of operations that we would like to perform on the list ADT:
PrintList and MakeEmpty;
Find: return the position of the first occurrence of a key;
Insert and Delete: insert and delete some key from some position in the list;
FindKth: return the element in some position;
Next and Previous: take a position as argument and return the position of the successor and predecessor, respectively.

Example: if the list is 34, 12, 52, 16, 13, then Find(52) might return 3; Insert(X, 3) might make the list into 34, 12, 52, X, 16, 13 (if we insert after the position given); and Delete(52) might turn that list into 34, 12, X, 16, 13. The interpretation of what is appropriate for a function is entirely up to the programmer.

Simple Array Implementation of Lists
All of these list operations (PrintList, MakeEmpty, Find, Insert, Delete, Next, Previous) can be implemented by using an array.

Disadvantages:
An estimate of the maximum size of the list is required, even if the array is dynamically allocated. Usually this requires a high overestimate, which wastes considerable space.
Insertion and deletion are expensive. For example, inserting at position 0 requires first pushing the entire array down one spot to make room, and deleting the first element requires shifting the whole list up one spot, so both take linear time in the worst case.

Because the running time for insertions and deletions is so slow and the list size must be known in advance, simple arrays are generally not used to implement lists.

Linked Lists
In order to avoid the linear cost of insertion and deletion, we need to ensure that the list is not stored contiguously, since otherwise entire parts of the list will need to be moved.

[Figure: a linked list A1 → A2 → A3 → A4 → A5]

The linked list consists of a series of structures, which are not necessarily adjacent in memory. Each structure contains the element and a pointer to a structure containing its successor. We call this the Next pointer. The last cell’s Next pointer points to NULL.

If P is declared to be a pointer to a structure, then the value stored in P is interpreted as the location, in main memory, where a structure can be found. A field of that structure can be accessed by P->FieldName, where FieldName is the name of the field we wish to examine.

Linked list with actual pointer values:
Address | Element | Next
1000 | A1 | 800
800 | A2 | 712
712 | A3 | 992
992 | A4 | 692
692 | A5 | 0 (NULL)

In order to access this list, we need to know where the first cell can be found. A pointer variable can be used for this purpose.

To execute PrintList(L) or Find(L, Key), we merely pass a pointer to the first element in the list and then traverse the list by following the Next pointers.

The Delete command can be executed in one pointer change: the Next pointer of the cell before the deleted cell is redirected to the cell after it.

[Figure: deletion from a linked list A1, A2, A3, A4, A5]

The Insert command requires obtaining a new cell from the system by using a malloc call and then executing two pointer maneuvers.
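The traversal, Insert, and Delete commands above can be sketched in C as follows. This is a minimal sketch assuming int elements and a cell type named Node; Node, find, print_list, insert_after, and delete_after are illustrative names, not routines from the slides. Both update routines take a pointer p to the cell before the affected position, so inserting before the first cell is not covered.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Illustrative cell type: an element plus a Next pointer to the successor. */
typedef struct Node {
    int element;
    struct Node *next;      /* NULL in the last cell */
} Node;

/* Follow Next pointers from the first cell; return the first cell holding
   key, or NULL if key does not occur in the list. */
Node *find(Node *first, int key) {
    Node *p = first;
    while (p != NULL && p->element != key)
        p = p->next;
    return p;
}

/* Print every element in order on one line. */
void print_list(const Node *first) {
    for (const Node *p = first; p != NULL; p = p->next)
        printf("%d ", p->element);
    printf("\n");
}

/* Insert x right after cell p: one malloc call plus two pointer changes.
   Returns the new cell, or NULL if malloc fails. */
Node *insert_after(Node *p, int x) {
    assert(p != NULL);
    Node *cell = malloc(sizeof *cell);
    if (cell == NULL)
        return NULL;
    cell->element = x;
    cell->next = p->next;   /* pointer change 1: new cell points at old successor */
    p->next = cell;         /* pointer change 2: p points at the new cell */
    return cell;
}

/* Delete the cell that follows p with a single pointer change, then free it. */
void delete_after(Node *p) {
    assert(p != NULL);
    Node *victim = p->next;
    if (victim == NULL)
        return;             /* nothing follows p */
    p->next = victim->next; /* the one pointer change */
    free(victim);
}
```

Because every update needs the predecessor cell, the caller has to keep track of it; this is exactly the difficulty that the header node and the FindPrevious routine address.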
[Figure: insertion of a new cell X into a linked list]

There are several places where you are likely to go wrong:
(1) There is no really obvious way to insert at the front of the list from the definitions given;
(2) Deleting from the front of the list is a special case, because it changes the start of the list; careless coding will lose the list;
(3) A third problem concerns deletion in general. Although the pointer moves above are simple, the deletion algorithm requires us to keep track of the cell before the one that we want to delete.

One simple change solves all three problems: we keep a sentinel node, referred to as a header or dummy node.

[Figure: linked list L with a header: Header → A1 → A2 → A3 → A4 → A5]

To avoid the problems associated with deletions, we need to write a routine FindPrevious, which will return the position of the predecessor of the cell we wish to delete. If we use a header, then if we wish to delete the first element in the list, FindPrevious will return the position of the header.

Doubly Linked Lists
Sometimes it is convenient to traverse lists backwards. The solution is simple: merely add an extra field to the data structure, containing a pointer to the previous cell. The cost of this is an extra link, which adds to the space requirement and also doubles the cost of insertions and deletions because there are more pointers to fix.

[Figure: a doubly linked list A1, A2, A3, A4, A5]

How would you implement doubly linked lists?

Circularly Linked Lists
A popular convention is to have the last cell keep a pointer back to the first. This can be done with or without a header; if the header is present, the last cell points to it. It can also be done with doubly linked lists, in which case the first cell’s previous pointer points to the last cell.

[Figure: a doubly linked circular list A1, A2, A3, A4, A5]

Example – The Polynomial ADT
$$F(X) = \sum_{i=0}^{N} A_i X^i$$
If most of the coefficients $A_i$ are nonzero, we can use a simple array to store the coefficients.
Exercise: write code to calculate $F(X)$ based on the array.
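One possible answer to the exercise, offered as a hedged sketch: it assumes the coefficients are stored in a double array coeff[0..N] with coeff[i] holding $A_i$, and the function names eval_poly_direct and eval_poly are hypothetical. The first routine follows the formula directly, keeping a running power of X. The second uses Horner's rule, $F(X) = (\cdots((A_N X + A_{N-1})X + A_{N-2})X + \cdots)X + A_0$, a standard rewriting not introduced on these slides, which needs only N multiplications and N additions.

```c
#include <assert.h>
#include <stddef.h>

/* Direct evaluation of F(x) = sum_{i=0}^{n} coeff[i] * x^i.
   coeff must hold n + 1 entries; power tracks x^i as i increases. */
double eval_poly_direct(const double coeff[], size_t n, double x) {
    assert(coeff != NULL);
    double sum = 0.0, power = 1.0;
    for (size_t i = 0; i <= n; i++) {
        sum += coeff[i] * power;
        power *= x;
    }
    return sum;
}

/* Horner's rule: start from A_N and repeatedly multiply by x and add the
   next lower coefficient, for i = n-1 down to 0. */
double eval_poly(const double coeff[], size_t n, double x) {
    assert(coeff != NULL);
    double result = coeff[n];
    for (size_t i = n; i-- > 0; )
        result = result * x + coeff[i];
    return result;
}
```

For example, with coeff = {1, 2, 3} (that is, $1 + 2X + 3X^2$) and x = 2, both routines return 17. For a sparse polynomial, either routine still loops over every coefficient, most of them zero, which motivates the linked-list representation.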
Example – The Polynomial ADT
$$P_1(X) = 10X^{1000} + 5X^{14} + 1$$
If most of the coefficients $A_i$ are zero, the array-based implementation is not efficient, since most of the time is spent multiplying zeros.

An alternative is to use a singly linked list. Each term in the polynomial is contained in one cell, and the cells are sorted in decreasing order of exponents.

[Figure: linked list representation of P1, with (coefficient, exponent) cells (10, 1000) → (5, 14) → (1, 0)]

Stack ADT
A stack is a list with the restriction that insertions and deletions can be performed in only one position, namely, the end of the list, called the top. The fundamental operations on a stack are Push, which is equivalent to an insert, and Pop, which deletes the most recently inserted element. The most recently inserted element can be examined prior to performing a Pop by use of the Top routine.

A Pop or Top on an empty stack is generally considered an error in the stack ADT. Running out of space when performing a Push is an implementation error but not an ADT error.

Stacks are sometimes known as LIFO (last in, first out) lists.

[Figure: stack model; only the top element is accessible. The stack holds 7 (Top), 9, 4, 2, 3, 6.]

Implementation of Stacks
Since a stack is a list, any list implementation will do. We will give two popular implementations: one uses pointers and the other uses an array. In either case, if we use good programming principles, the calling routines do not need to know which method is being used.

Linked List Implementation of Stacks
We perform a Push by inserting at the front of the list.
We perform a Pop by deleting the element at the front of the list.
A Top operation merely examines the element at the front of the list, returning its value.

Array Implementation of Stacks
A Stack is defined as a pointer to a structure. The structure contains the TopOfStack and Capacity fields. Once the maximum size is known, the stack array can be dynamically allocated.
Associated with each stack is TopOfStack, which is -1 for an empty stack (this is how an empty stack is initialized).

To push some element X onto the stack, we increment TopOfStack and then set Stack[TopOfStack] = X, where Stack is the array representing the actual stack. To pop, we set the return value to Stack[TopOfStack] and then decrement TopOfStack.

Example – Conversion of Numbers
We use many different number systems, such as the decimal, binary, hexadecimal, and octal systems. To convert a decimal number to a binary number, repeatedly divide by 2 and record the remainders:

Decimal number | Divisor | Quotient | Remainder
30 | 2 | 15 | 0
15 | 2 | 7 | 1
7 | 2 | 3 | 1
3 | 2 | 1 | 1
1 | 2 | 0 | 1

The remainders are produced from the least significant bit to the most significant, so reading them in reverse order (the order in which a stack returns them) gives 30 = 11110 in binary.

Another application of stacks: function calls.

The Queue ADT
Like stacks, queues are lists. With a queue, however, insertion is done at one end, whereas deletion is performed at the other end. The basic operations on a queue are Enqueue, which inserts an element at the end of the list (called the rear), and Dequeue, which deletes (and returns) the element at the start of the list (known as the front).

Array Implementation of Queues
For each queue data structure, we keep an array, Queue[], and the positions Front and Rear, which represent the ends of the queue. We also keep track of the number of elements that are actually in the queue, Size. All this information is part of one structure. The following figure shows a queue in some intermediate state; the blank cells have undefined values in them:

[Figure: a queue array holding 5, 2, 7, 1, with Front at the cell holding 5 and Rear at the cell holding 1]

To Enqueue an element X, we increment Size and Rear, then set Queue[Rear] = X. To Dequeue an element, we set the return value to Queue[Front], decrement Size, and then increment Front. Whenever Front or Rear gets to the end of the array, it is wrapped around to the beginning. This is known as a circular array implementation.
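The circular array scheme just described can be sketched in C as follows. This is a minimal sketch assuming int elements and an added Capacity field for the array length; Queue, create_queue, enqueue, dequeue, succ, and dispose_queue are illustrative names, not routines from the slides. Rear is initialized one slot before Front so that the first Enqueue's increment lands on index 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Fields follow the slide: the array plus Front, Rear, and Size. */
typedef struct {
    int *queue;
    size_t capacity;
    size_t front;   /* index of the first element */
    size_t rear;    /* index of the last element */
    size_t size;    /* number of elements actually stored */
} Queue;

/* Allocate an empty queue; returns NULL if allocation fails. */
Queue *create_queue(size_t capacity) {
    assert(capacity > 0);
    Queue *q = malloc(sizeof *q);
    if (q == NULL)
        return NULL;
    q->queue = malloc(capacity * sizeof *q->queue);
    if (q->queue == NULL) {
        free(q);
        return NULL;
    }
    q->capacity = capacity;
    q->front = 0;
    q->rear = capacity - 1;     /* first increment wraps Rear to 0 */
    q->size = 0;
    return q;
}

bool is_empty(const Queue *q) { return q->size == 0; }
bool is_full(const Queue *q) { return q->size == q->capacity; }

/* Advance an index by one, wrapping around at the end of the array. */
static size_t succ(const Queue *q, size_t index) {
    return (index + 1) % q->capacity;
}

/* Enqueue: increment Size and Rear (with wraparound), then store x.
   Returns false if the array is already full. */
bool enqueue(Queue *q, int x) {
    if (is_full(q))
        return false;
    q->size++;
    q->rear = succ(q, q->rear);
    q->queue[q->rear] = x;
    return true;
}

/* Dequeue: take Queue[Front], decrement Size, then increment Front.
   Dequeuing an empty queue is treated as a caller error. */
int dequeue(Queue *q) {
    assert(!is_empty(q));
    int x = q->queue[q->front];
    q->size--;
    q->front = succ(q, q->front);
    return x;
}

/* Release the array and the structure. */
void dispose_queue(Queue *q) {
    if (q != NULL) {
        free(q->queue);
        free(q);
    }
}
```

Keeping Size explicitly is what lets these routines tell a full queue from an empty one: with Rear marking the last element, both states leave Rear one slot behind Front.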
[Figure: a circular array queue through a sequence of operations]
Initial state: the queue holds 2 (Front) and 4 (Rear)
Enqueue(1)
Enqueue(3)
Dequeue, which returns 2
Dequeue, which returns 4
Dequeue, which returns 1
Dequeue, which returns 3 and makes the queue empty

Linked List Implementation of Queues
[Figure: a linked list with a header; Front points to the header and Rear points to the last cell, whose Next pointer is NULL]

[Figure: operations on a linked-list queue: empty queue; Enqueue x; Enqueue y; Dequeue x]

Example Applications
When jobs are submitted to a printer, they are arranged in order of arrival.
Every real-life line is a queue. For instance, lines at ticket counters are queues, because service is first-come, first-served.
A whole branch of mathematics, known as queueing theory, deals with computing, probabilistically, how long users expect to wait in a line, how long the line gets, and other such questions.

Homework of Chapter 3
Exercises: 3.2 (you don't need to analyze the running time), 3.3, 3.4, 3.5, 3.21, 3.25