Data Structure and Algorithm Analysis part 1

Data Structures and
Algorithm Analysis
Lecturer: Jing Liu
Email: neouma@mail.xidian.edu.cn
Homepage: http://see.xidian.edu.cn/faculty/liujing
Textbook

Mark Allen Weiss, Data Structures
and Algorithm Analysis in C, China
Machine Press.
Grading


Final exam: 70%
Others: 30%
What are Data Structures
and Algorithms?


Data Structures are methods of organizing
large amounts of data.
An algorithm is a procedure that consists of
a finite set of instructions which, given an input
from some set of possible inputs, enables us
to obtain an output if such an output exists, or
else obtain nothing at all if there is no output
for that particular input, through a systematic
execution of the instructions.
[Figure: Inputs (Problems), Instructions, Computers, Outputs (Answers); Programming Languages, Data Structures, Algorithms, Software Systems]
Contents
Chapter 3 Lists, Stacks, and Queues
Chapter 4 Trees
Chapter 5 Hashing
Chapter 6 Priority Queues (Heaps)
Chapter 7 Sorting
Chapter 8 The Disjoint Set ADT
Chapter 9 Graph Algorithms
Chapter 10 Algorithm Design Techniques
Abstract Data Types (ADTs)

One of the basic rules concerning programming is to
break the program down into modules.

Each module is a logical unit and does a specific job.
Its size is kept small by calling other modules.

Modularity has several advantages. (1) It is much
easier to debug small routines than large routines; (2)
It is easier for several people to work on a modular
program simultaneously; (3) A well-written modular
program places certain dependencies in only one
routine, making changes easier.
Abstract Data Types (ADTs)



An abstract data type (ADT) is a set of operations.
Abstract data types are mathematical abstractions;
nowhere in an ADT’s definition is there any mention
of how the set of operations is implemented.
Objects such as lists, sets, and graphs, along with
their operations, can be viewed as abstract data
types, just as integers, reals, and booleans are data
types. Integers, reals, and booleans have operations
associated with them, and so do ADTs.
Abstract Data Types (ADTs)



The basic idea is that the implementation of the
operations related to ADTs is written once in the
program, and any other part of the program that
needs to perform an operation on the ADT can do so
by calling the appropriate function.
If for some reason implementation details need to be
changed, it should be easy to do so by merely
changing the routines that perform the ADT
operations.
There is no rule telling us which operations must be
supported for each ADT; this is a design decision.
The List ADT






The form of a general list: A1, A2, A3, …, AN;
The size of this list is N;
An empty list is a special list of size 0;
For any list except the empty list, we say that $A_{i+1}$
follows (or succeeds) $A_i$ ($i < N$) and that $A_{i-1}$ precedes
$A_i$ ($i > 1$);
The first element of the list is A1, and the last
element is AN. We will not define the predecessor of
A1 or the successor of AN.
The position of element Ai in a list is i.
The List ADT
There is a set of operations that we would like to
perform on the list ADT:



PrintList
MakeEmpty
Find: return the position of the first occurrence of a
key



Insert and Delete: insert and delete some key from
some position in the list
FindKth: return the element in some position
Next and Previous: take a position as argument and
return the position of the successor and predecessor
The List ADT
Example: The list is 34, 12, 52, 16, 13



Find(52)
Insert(X, 3)
Delete(52)
The interpretation of what is appropriate for a function
is entirely up to the programmer.
Simple Array
Implementation of Lists

All of these list operations can be implemented
using an array.







PrintList
MakeEmpty
Find
Insert
Delete
Next
Previous
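A minimal array-backed sketch of the list operations above, assuming int elements, a fixed capacity MAX_SIZE (an assumption, not from the slides), and 1-based positions to match A1, ..., AN. Insert is interpreted here as placing X so that it becomes the element at the given position; as the earlier slide notes, that interpretation is up to the programmer.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed fixed capacity for this sketch. */
#define MAX_SIZE 100

typedef struct {
    int elements[MAX_SIZE];
    int size;
} ArrayList;

void MakeEmpty(ArrayList *L) { L->size = 0; }

void PrintList(const ArrayList *L) {
    for (int i = 0; i < L->size; i++)
        printf("%d ", L->elements[i]);
    printf("\n");
}

/* Return the 1-based position of the first occurrence of key, or 0 if absent. */
int Find(const ArrayList *L, int key) {
    for (int i = 0; i < L->size; i++)
        if (L->elements[i] == key)
            return i + 1;
    return 0;
}

/* Return the element at 1-based position k. */
int FindKth(const ArrayList *L, int k) {
    assert(k >= 1 && k <= L->size);
    return L->elements[k - 1];
}

/* Insert x so that it becomes the element at position pos (1 <= pos <= size+1).
   Every element from pos onward shifts one slot right: O(N) in the worst case. */
void Insert(ArrayList *L, int x, int pos) {
    assert(L->size < MAX_SIZE && pos >= 1 && pos <= L->size + 1);
    for (int i = L->size; i >= pos; i--)
        L->elements[i] = L->elements[i - 1];
    L->elements[pos - 1] = x;
    L->size++;
}

/* Delete the first occurrence of key, shifting later elements one slot left. */
void Delete(ArrayList *L, int key) {
    int pos = Find(L, key);
    if (pos == 0)
        return;
    for (int i = pos; i < L->size; i++)
        L->elements[i - 1] = L->elements[i];
    L->size--;
}
```

With the list 34, 12, 52, 16, 13, Find(52) returns 3; Insert(X, 3) then yields 34, 12, X, 52, 16, 13, and Delete(52) leaves 34, 12, X, 16, 13. The shifting loops in Insert and Delete are the linear cost discussed next.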
Simple Array
Implementation of Lists
Disadvantages:


An estimate of the maximum size of the list is
required, even if the array is dynamically allocated.
Usually this requires a high overestimate, which
wastes considerable space.
Insertion and deletion are expensive. For example,
inserting at position 0 requires first pushing the entire
array down one spot to make room.
Because the running time for insertions and deletions
is so slow and the list size must be known in advance,
simple arrays are generally not used to implement lists.
Linked Lists

In order to avoid the linear cost of insertion and
deletion, we need to ensure that the list is not stored
contiguously, since otherwise entire parts of the list
will need to be moved.
[Figure: A linked list: A1 → A2 → A3 → A4 → A5]



The linked list consists of a series of structures, which are not
necessarily adjacent in memory.
Each structure contains the element and a pointer to a structure
containing its successor. We call this the Next pointer.
The last cell’s Next pointer points to NULL.
Linked Lists


If P is declared to be a pointer to a structure, then
the value stored in P is interpreted as the location, in
main memory, where a structure can be found.
A field of that structure can be accessed by
P->FieldName, where FieldName is the name of the
field we wish to examine.
Linked list with actual pointer values:
Element   Address   Next
A1        1000      800
A2        800       712
A3        712       992
A4        992       692
A5        692       0 (NULL)
In order to access this list, we need to know where the first cell
can be found. A pointer variable can be used for this purpose.
Linked Lists


To execute PrintList(L) or Find(L, Key), we merely
pass a pointer to the first element in the list and then
traverse the list by following the Next pointers.
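The traversal described above can be sketched in C as follows. The cell type Node and the helper Cons (used only to build small examples) are illustrative names, not taken from the slides.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Each cell holds an element and a Next pointer; NULL ends the list. */
typedef struct Node {
    int element;
    struct Node *next;
} Node;

/* Hypothetical helper: allocate a cell in front of `next` (for building examples). */
Node *Cons(int x, Node *next) {
    Node *p = malloc(sizeof *p);
    assert(p != NULL);
    p->element = x;
    p->next = next;
    return p;
}

/* Follow Next pointers from the first cell until NULL. */
void PrintList(const Node *first) {
    for (const Node *p = first; p != NULL; p = p->next)
        printf("%d ", p->element);
    printf("\n");
}

/* Return a pointer to the first cell containing key, or NULL if absent. */
Node *Find(Node *first, int key) {
    Node *p = first;
    while (p != NULL && p->element != key)
        p = p->next;
    return p;
}
```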
The Delete command can be executed in one pointer
change.
[Figure: Deletion from the linked list A1, A2, A3, A4, A5]
The Insert command requires obtaining a new cell
from the system by using a malloc call and then
executing two pointer maneuvers.
[Figure: Insertion of X into the linked list, between A3 and A4]
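A sketch of the two pointer operations, assuming the caller already holds a pointer to the cell just before the insertion or deletion point. InsertAfter and DeleteAfter are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Node {
    int element;
    struct Node *next;
} Node;

/* Insert x after cell p: one malloc plus two pointer changes. */
Node *InsertAfter(Node *p, int x) {
    Node *tmp = malloc(sizeof *tmp);
    assert(tmp != NULL);
    tmp->element = x;
    tmp->next = p->next;  /* pointer change 1: new cell points to p's successor */
    p->next = tmp;        /* pointer change 2: p now points to the new cell */
    return tmp;
}

/* Delete the cell after prev: a single pointer change bypasses it. */
void DeleteAfter(Node *prev) {
    Node *victim = prev->next;
    assert(victim != NULL);
    prev->next = victim->next;  /* the one pointer change */
    free(victim);
}
```

DeleteAfter needs the predecessor of the cell being removed, which is the difficulty raised in point (3) on the next slide.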
Linked Lists
There are several places where you are likely to go
wrong:
(1) There is no really obvious way to insert at the front
of the list from the definitions given;
(2) Deleting from the front of the list is a special case,
because it changes the start of the list; careless coding
will lose the list;
(3) A third problem concerns deletion in general.
Although the pointer moves above are simple, the
deletion algorithm requires us to keep track of the cell
before the one that we want to delete.
Linked Lists

One simple change solves all three problems. We will
keep a sentinel node, referred to as a header or
dummy node.
[Figure: Linked list with a header: L → Header → A1 → A2 → A3 → A4 → A5]

To avoid the problems associated with deletions, we
need to write a routine FindPrevious, which will
return the position of the predecessor of the cell we
wish to delete. If we use a header, then if we wish
to delete the first element in the list, FindPrevious
will return the position of the header.
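The header idea can be sketched as follows, loosely in the textbook's style of a List that points to its header cell. MakeEmptyList and the exact signatures are assumptions for illustration. Because the header always precedes A1, FindPrevious needs no special case for the first element, and inserting after the header inserts at the front.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Node {
    int element;
    struct Node *next;
} Node;

typedef Node *List;      /* L points to the header (dummy) cell */
typedef Node *Position;

List MakeEmptyList(void) {
    List L = malloc(sizeof *L);
    assert(L != NULL);
    L->next = NULL;      /* the header's element field is unused */
    return L;
}

/* Return the cell before the first occurrence of x.
   If x is absent, the returned cell's next field is NULL. */
Position FindPrevious(int x, List L) {
    Position p = L;
    while (p->next != NULL && p->next->element != x)
        p = p->next;
    return p;
}

/* Delete the first occurrence of x; deleting A1 is not a special case. */
void Delete(int x, List L) {
    Position prev = FindPrevious(x, L);
    if (prev->next != NULL) {
        Position victim = prev->next;
        prev->next = victim->next;
        free(victim);
    }
}

/* Insert x after position p; passing the header inserts at the front. */
void Insert(int x, Position p) {
    Position tmp = malloc(sizeof *tmp);
    assert(tmp != NULL);
    tmp->element = x;
    tmp->next = p->next;
    p->next = tmp;
}
```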
Doubly Linked Lists

Sometimes it is convenient to traverse lists
backwards. The solution is simple. Merely add an
extra field to the data structure, containing a pointer
to the previous cell. The cost of this is an extra link,
which adds to the space requirement and also
doubles the cost of insertions and deletions because
there are more pointers to fix.
[Figure: A doubly linked list of A1, A2, A3, A4, A5]

How to implement doubly linked lists?
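One possible answer, sketched under the assumption that the list has a header cell whose prev field is NULL: each cell carries both prev and next pointers, so an insertion updates four pointers instead of two, and a cell can be deleted given only a pointer to itself. DNode, MakeHeader, InsertAfter, and DeleteCell are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct DNode {
    int element;
    struct DNode *prev;
    struct DNode *next;
} DNode;

/* Assumed layout: a header cell with prev == NULL. */
DNode *MakeHeader(void) {
    DNode *h = malloc(sizeof *h);
    assert(h != NULL);
    h->prev = h->next = NULL;
    return h;
}

/* Insert x after cell p: four pointer updates instead of two. */
DNode *InsertAfter(DNode *p, int x) {
    DNode *tmp = malloc(sizeof *tmp);
    assert(tmp != NULL);
    tmp->element = x;
    tmp->prev = p;
    tmp->next = p->next;
    if (p->next != NULL)
        p->next->prev = tmp;
    p->next = tmp;
    return tmp;
}

/* Delete cell p itself; the prev link removes the need for FindPrevious. */
void DeleteCell(DNode *p) {
    assert(p->prev != NULL);  /* p must not be the header */
    p->prev->next = p->next;
    if (p->next != NULL)
        p->next->prev = p->prev;
    free(p);
}
```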
Circularly Linked Lists


A popular convention is to have the last cell keep a
pointer back to the first. This can be done with or
without a header. If the header is present, the last
cell points to it.
It can also be done with doubly linked lists; in that
case the first cell’s previous pointer points to the last cell.
[Figure: A doubly linked circular list of A1, A2, A3, A4, A5]
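A sketch of a doubly linked circular list with a header, assuming an empty list is a header whose prev and next point to itself. In this layout header->prev is always the last cell, so appending is O(1), and traversal stops when it returns to the header rather than at NULL. CNode, MakeCircular, Append, and Sum are illustrative names.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct CNode {
    int element;
    struct CNode *prev;
    struct CNode *next;
} CNode;

/* Empty circular list: the header points to itself in both directions. */
CNode *MakeCircular(void) {
    CNode *h = malloc(sizeof *h);
    assert(h != NULL);
    h->prev = h->next = h;
    return h;
}

/* Append x at the end, i.e. just before the header. */
void Append(CNode *h, int x) {
    CNode *tmp = malloc(sizeof *tmp);
    assert(tmp != NULL);
    tmp->element = x;
    tmp->next = h;
    tmp->prev = h->prev;     /* old last cell */
    h->prev->next = tmp;
    h->prev = tmp;           /* new last cell */
}

/* Sum the elements; the loop ends when it wraps back to the header. */
int Sum(const CNode *h) {
    int total = 0;
    for (const CNode *p = h->next; p != h; p = p->next)
        total += p->element;
    return total;
}
```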
Example – The Polynomial
ADT
$F(X) = \sum_{i=0}^{N} A_i X^i$
If most of the coefficients Ai are nonzero, we can
use a simple array to store the coefficients.
Write code to calculate F(X) using the array.
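A sketch of array-based evaluation, assuming coeff[i] stores A_i for i = 0, ..., N. It uses Horner's rule, F(X) = A_0 + X(A_1 + X(A_2 + ... + X A_N)), which needs only N multiplications; this is one standard choice, not necessarily the one the course intends. EvalPoly is a hypothetical name.

```c
#include <assert.h>

/* coeff[i] holds A_i for i = 0..n; evaluate F(x) by Horner's rule. */
double EvalPoly(const double coeff[], int n, double x) {
    assert(n >= 0);
    double result = 0.0;
    for (int i = n; i >= 0; i--)
        result = result * x + coeff[i];  /* fold in A_i, highest degree first */
    return result;
}
```

For example, with coeff = {1, 2, 3} (that is, 1 + 2X + 3X^2), EvalPoly(coeff, 2, 2.0) returns 17.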
Example – The Polynomial
ADT
$P_1(X) = 10X^{1000} + 5X^{14} + 1$
If most of the coefficients Ai are zero, the
implementation based on array is not efficient,
since most of the time is spent in multiplying zeros.
[Figure: Linked list representation of P1: P1 → (10, 1000) → (5, 14) → (1, 0)]


An alternative is to use a singly linked list.
Each term in the polynomial is contained in one cell,
and the cells are sorted in decreasing order of
exponents.
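A sketch of the linked representation, assuming double coefficients and int exponents. Only nonzero terms are stored, so evaluating P1 touches three cells instead of 1001 array slots. The Power helper uses exponentiation by repeated squaring, a technique swapped in for this example; Term, MakeTerm, and EvalTerms are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* One cell per nonzero term, kept in decreasing order of exponent. */
typedef struct Term {
    double coeff;
    int exponent;
    struct Term *next;
} Term;

/* Hypothetical helper: allocate a term in front of `next`. */
Term *MakeTerm(double coeff, int exponent, Term *next) {
    Term *t = malloc(sizeof *t);
    assert(t != NULL);
    t->coeff = coeff;
    t->exponent = exponent;
    t->next = next;
    return t;
}

/* x^e for e >= 0 by repeated squaring: O(log e) multiplications. */
static double Power(double x, int e) {
    double result = 1.0;
    while (e > 0) {
        if (e & 1)
            result *= x;
        x *= x;
        e >>= 1;
    }
    return result;
}

/* Sum coeff * x^exponent over the stored terms only. */
double EvalTerms(const Term *p, double x) {
    double sum = 0.0;
    for (; p != NULL; p = p->next)
        sum += p->coeff * Power(x, p->exponent);
    return sum;
}
```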
Stack ADT



A stack is a list with the restriction that insertions and
deletions can be performed in only one position,
namely, the end of the list, called the top.
The fundamental operations on a stack are Push,
which is equivalent to an insert, and Pop, which
deletes the most recently inserted element.
The most recently inserted element can be examined
prior to performing a Pop by use of the Top routine.
Stack ADT

A Pop or Top on an empty stack is generally
considered an error in the stack ADT.

Running out of space when performing a Push is an
implementation error but not an ADT error.

Stacks are sometimes known as LIFO (last in, first
out) lists.
Stack ADT
[Figure: Stack model: only the top element is accessible]
Implementation of Stacks

Since a stack is a list, any list implementation will do.

We will give two popular implementations. One uses
pointers and the other uses an array.

In either case, if we use good
programming principles, the calling routines do not
need to know which method is being used.
Linked List
Implementation of Stacks

We perform a Push by inserting at the front of the
list

We perform a Pop by deleting the element at the
front of the list

A Top operation merely examines the element at
the front of the list, returning its value.
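A minimal linked-list stack sketch, assuming int elements. Unlike the textbook, which keeps a header cell, this version stores a plain top pointer in a small struct; Stack, StackNode, and the use of assert to flag errors are choices made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct StackNode {
    int element;
    struct StackNode *next;
} StackNode;

typedef struct {
    StackNode *top;  /* front of the list = top of the stack */
} Stack;

bool IsEmpty(const Stack *s) { return s->top == NULL; }

/* Push: insert a new cell at the front of the list. */
void Push(Stack *s, int x) {
    StackNode *cell = malloc(sizeof *cell);
    assert(cell != NULL);  /* out of space: an implementation error */
    cell->element = x;
    cell->next = s->top;
    s->top = cell;
}

/* Top: examine the front element without removing it. */
int Top(const Stack *s) {
    assert(!IsEmpty(s));   /* Top on an empty stack is an ADT error */
    return s->top->element;
}

/* Pop: delete the front cell. */
void Pop(Stack *s) {
    assert(!IsEmpty(s));
    StackNode *old = s->top;
    s->top = old->next;
    free(old);
}
```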
Array Implementation of
Stacks


A Stack is defined as a pointer to a structure. The
structure contains the TopOfStack and Capacity fields.
Once the maximum size is known, the stack array
can be dynamically allocated.
Associated with each stack is TopOfStack, which is -1
for an empty stack (this is how an empty stack is
initialized).
Array Implementation of
Stacks


To push some element X onto the stack, we
increment TopOfStack and then set
Stack[TopOfStack]=X, where Stack is the array
representing the actual stack.
To pop, we set the return value to Stack[TopOfStack]
and then decrement TopOfStack.
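The array version can be sketched as follows, with field names modeled on TopOfStack and Capacity from the text; the exact struct layout, EMPTY_TOS, IsFull, and DisposeStack are assumptions. Here Pop returns the popped value, matching the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define EMPTY_TOS (-1)  /* TopOfStack value for an empty stack */

typedef struct StackRecord {
    int capacity;
    int topOfStack;
    int *array;
} *Stack;

Stack CreateStack(int maxElements) {
    assert(maxElements > 0);
    Stack s = malloc(sizeof *s);
    assert(s != NULL);
    s->array = malloc(sizeof(int) * (size_t)maxElements);
    assert(s->array != NULL);
    s->capacity = maxElements;
    s->topOfStack = EMPTY_TOS;
    return s;
}

bool IsEmpty(Stack s) { return s->topOfStack == EMPTY_TOS; }
bool IsFull(Stack s) { return s->topOfStack == s->capacity - 1; }

void Push(int x, Stack s) {
    assert(!IsFull(s));
    s->array[++s->topOfStack] = x;     /* increment TopOfStack, then store X */
}

int Pop(Stack s) {
    assert(!IsEmpty(s));
    return s->array[s->topOfStack--];  /* read the top, then decrement */
}

void DisposeStack(Stack s) {
    free(s->array);
    free(s);
}
```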
Example – Conversion of
Numbers


There are many different number systems, such as the
decimal, binary, hexadecimal, and octal systems.
Convert a decimal number to a binary number:
Decimal Number   Divisor   Quotient   Remainder
30               2         15         0
15               2         7          1
7                2         3          1
3                2         1          1
1                2         0          1
Reading the remainders from the last one to the first gives 30 = 11110 in binary.
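The table's procedure maps directly onto a stack: each remainder is pushed as it is produced, and popping returns them last-first, which is the order in which the binary digits are written. This sketch uses a plain char array as the stack; DecimalToBinary is a hypothetical name and handles only unsigned inputs.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Write the binary digits of n into out as a NUL-terminated string.
   out must have room for CHAR_BIT * sizeof(unsigned) + 1 chars. */
void DecimalToBinary(unsigned n, char *out) {
    char stack[CHAR_BIT * sizeof(unsigned)];  /* one slot per possible bit */
    int top = -1;                             /* empty stack */
    assert(out != NULL);
    do {
        stack[++top] = (char)('0' + n % 2);   /* push the remainder */
        n /= 2;                               /* continue with the quotient */
    } while (n > 0);
    size_t i = 0;
    while (top >= 0)
        out[i++] = stack[top--];              /* pop: last remainder first */
    out[i] = '\0';
}
```

For n = 30 it produces "11110", matching the table.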
Function calls.
The Queue ADT



Like stacks, queues are lists.
With a queue, however, insertion is done at one
end, whereas deletion is performed at the other
end.
The basic operations on a queue are Enqueue,
which inserts an element at the end of the list
(called the rear), and Dequeue, which deletes
(and returns) the element at the start of the list
(known as the front).
Array Implementation of
Queues



For each queue data structure, we keep an array,
Queue[], and the positions Front and Rear, which
represent the ends of the queue.
We also keep track of the number of elements that
are actually in the queue, Size. All this information is
part of one structure.
The following figure shows a queue in some
intermediate state. The blank cells have
undefined values in them:
[Figure: A queue holding 5, 2, 7, 1, with Front at 5 and Rear at 1]
Array Implementation of
Queues



To Enqueue an element X, we increment Size and
Rear, then set Queue[Rear]=X.
To Dequeue an element, we set the return value to
Queue[Front], decrement Size, and then increment
Front.
Whenever Front or Rear gets to the end of the array,
it is wrapped around to the beginning. This is known
as a circular array implementation.
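A sketch of the circular array queue, modeled on the Front, Rear, and Size fields described above; the Succ helper performs the wraparound. As an assumption of this version, an empty queue starts with Front = 0 and Rear = Capacity - 1, so the first Enqueue wraps Rear to 0. QueueRecord, CreateQueue, IsEmpty, and IsFull are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct QueueRecord {
    int capacity;
    int front;
    int rear;
    int size;
    int *array;
} *Queue;

Queue CreateQueue(int maxElements) {
    assert(maxElements > 0);
    Queue q = malloc(sizeof *q);
    assert(q != NULL);
    q->array = malloc(sizeof(int) * (size_t)maxElements);
    assert(q->array != NULL);
    q->capacity = maxElements;
    q->size = 0;
    q->front = 0;
    q->rear = maxElements - 1;  /* Rear sits one slot behind Front when empty */
    return q;
}

/* Advance an index by one, wrapping from the last slot back to 0. */
static int Succ(int index, Queue q) {
    return (index + 1 == q->capacity) ? 0 : index + 1;
}

bool IsEmpty(Queue q) { return q->size == 0; }
bool IsFull(Queue q) { return q->size == q->capacity; }

void Enqueue(int x, Queue q) {
    assert(!IsFull(q));
    q->size++;
    q->rear = Succ(q->rear, q);
    q->array[q->rear] = x;
}

int Dequeue(Queue q) {
    assert(!IsEmpty(q));
    int x = q->array[q->front];
    q->size--;
    q->front = Succ(q->front, q);
    return x;
}
```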
[Figure: Example of a circular-array queue. Initial state holds 2, 4; Enqueue(1); Enqueue(3); Dequeue, which returns 2; Dequeue, which returns 4; Dequeue, which returns 1; Dequeue, which returns 3 and makes the queue empty]
Linked List
Implementation of Queues
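[Figure: A queue as a linked list with a header: Front points to the header, Rear points to the last cell, whose Next pointer is NULL]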
Linked List
Implementation of Queues
[Figure: Linked-list queue operations: Empty Queue; Enqueue x; Enqueue y; Dequeue x]
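A sketch of a linked queue with a header cell, following the Front/Header/Rear layout pictured above: Front always points to the header and Rear to the last cell, so both point to the header when the queue is empty. LinkedQueue, QNode, and InitQueue are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct QNode {
    int element;
    struct QNode *next;
} QNode;

typedef struct {
    QNode *front;  /* always the header cell */
    QNode *rear;   /* last cell; equals front when the queue is empty */
} LinkedQueue;

void InitQueue(LinkedQueue *q) {
    QNode *header = malloc(sizeof *header);
    assert(header != NULL);
    header->next = NULL;
    q->front = q->rear = header;
}

bool IsEmpty(const LinkedQueue *q) { return q->front == q->rear; }

/* Enqueue: link a new cell after the current last cell. */
void Enqueue(LinkedQueue *q, int x) {
    QNode *cell = malloc(sizeof *cell);
    assert(cell != NULL);
    cell->element = x;
    cell->next = NULL;
    q->rear->next = cell;
    q->rear = cell;
}

/* Dequeue: unlink the first real cell, just after the header. */
int Dequeue(LinkedQueue *q) {
    assert(!IsEmpty(q));
    QNode *first = q->front->next;
    int x = first->element;
    q->front->next = first->next;
    if (q->rear == first)   /* removed the only cell: Rear falls back to the header */
        q->rear = q->front;
    free(first);
    return x;
}
```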
Example Applications



When jobs are submitted to a printer, they are
arranged in order of arrival.
Every real-life line is a queue. For instance, lines at
ticket counters are queues, because service is first-come, first-served.
A whole branch of mathematics, known as
queueing theory, deals with computing,
probabilistically, how long users expect to wait on a
line, how long the line gets, and other such
questions.
Homework of Chapter 3







Exercises:
3.2 (Don’t need to analyze the running time.)
3.3
3.4
3.5
3.21
3.25