Chapter 1 Introduction

What is an Algorithm?

Algorithm
• An algorithm is a sequence of unambiguous instructions for solving a problem, i.e., for obtaining a required output for any legitimate input in a finite amount of time.
• Can be represented in various forms
• Unambiguity/clearness
• Effectiveness
• Finiteness/Termination
• Correctness

Historical Perspective
• Muhammad ibn Musa al-Khwarizmi
– 9th-century mathematician
– “father of algebra”
– al-Khwarizmi (Algorizm) (770–840 C.E.)
• Euclid’s algorithm for finding the greatest common divisor
• The word algorism originally referred only to the rules of performing arithmetic using Hindu-Arabic numerals, but it evolved via the European Latin translation of al-Khwarizmi’s name into algorithm by the 18th century. The use of the word then broadened to include all definite procedures for solving problems or performing tasks.

Notions of algorithm and problem
[Figure: problem → algorithm; input (or instance) → “computer” → output]
• An algorithmic solution (different from a conventional solution)

Example of computational problem: sorting
• Statement of problem:
– Input: A sequence of n numbers <a1, a2, …, an>
– Output: A reordering of the input sequence <a’1, a’2, …, a’n> so that a’i ≤ a’j whenever i < j
• Instance: The sequence <5, 3, 2, 8, 3>
• Algorithms:
– Selection sort
– Insertion sort
– Merge sort
– … many others

Selection Sort
• Input: array a[1], …, a[n]
• Output: array a sorted in non-decreasing order
• Algorithm:
    for i = 1 to n
        swap a[i] with smallest of a[i], …, a[n]
• Is this unambiguous? Effective?
• See also pseudocode, Section 3.1.
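The one-line description above can be sketched in Python as follows. This is a minimal illustration, not the book’s code: it uses 0-based indexing, so the outer loop stops at n − 2 (the last element is then automatically in place), and the name selection_sort is our own.

```python
def selection_sort(a: list) -> list:
    """Sort list a in place into non-decreasing order and return it.

    Pass i swaps a[i] with the smallest of a[i], ..., a[n-1]."""
    n = len(a)
    for i in range(n - 1):          # after n-1 passes the last element is in place
        smallest = i
        for j in range(i + 1, n):   # scan the unsorted tail a[i+1..n-1]
            if a[j] < a[smallest]:
                smallest = j
        a[i], a[smallest] = a[smallest], a[i]
    return a


print(selection_sort([5, 3, 2, 8, 3]))  # [2, 3, 3, 5, 8]
```

Writing the steps out forces decisions the one-line version leaves open, such as which of several equal minima to pick (here, the first one found).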

Some Well-known Computational Problems
• Sorting
• Searching
• Shortest paths in a graph
• Minimum spanning tree
• Primality testing
• Traveling salesman problem
• Knapsack problem
• Chess
• Towers of Hanoi
• Program termination
Some of these problems don’t have efficient algorithms, or algorithms at all!

Basic Issues Related to Algorithms
• How to design algorithms?
• How to express algorithms?
• Efficiency (or complexity) analysis
– Theoretical analysis
– Empirical analysis
• Does there exist a better algorithm?
– Lower bounds
– Optimality

What is an algorithm?
Recipe, process, method, technique, procedure, routine, … with the following requirements:
1. Finiteness
• terminates after a finite number of steps
2. Definiteness
• rigorously and unambiguously specified
3. Clearly specified input
• valid inputs are clearly specified
4. Clearly specified/expected output
• can be proved to produce the correct output given a valid input
5. Effectiveness
• steps are sufficiently simple and basic

What does an algorithm look like?
Pseudocode
Algorithm Sample
Input: …
Output: …
Step 1: …
Step 2: …
Step 3: …
…
Step n: …

Program-like
Type Sample ( i1, i2, …, ik )
{
    …
    return output
}

Why study algorithms?
• Theoretical importance
– the core of computer science
• Practical importance (useful & real-world)
– A practitioner’s toolkit of known algorithms
– Framework for designing and analyzing algorithms for new problems

Euclid’s Algorithm
Problem: Find gcd(m,n), the greatest common divisor of two nonnegative, not both zero integers m and n
Examples: gcd(60,24) = 12, gcd(60,0) = 60, gcd(0,0) = ? (excluded by the problem statement, since m and n are not both zero)
Euclid’s algorithm is based on repeated application of the equality
gcd(m,n) = gcd(n, m mod n)
until the second number becomes 0, which makes the problem trivial, since gcd(m,0) = m. (Assuming m ≥ n is convenient but not necessary: if m < n, the first step simply swaps the two numbers, because m mod n = m.)
Example: gcd(60,24) = gcd(24,12) = gcd(12,0) = 12

Two descriptions of Euclid’s algorithm
A. Pseudocode (step form)
Step 1: If n = 0, return m and stop; otherwise go to Step 2.
Step 2: Divide m by n and assign the value of the remainder to r.
Step 3: Assign the value of n to m and the value of r to n. Go to Step 1.
B. Program-like
while n ≠ 0 do
    r ← m mod n
    m ← n
    n ← r
return m
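Description B translates almost line for line into Python. A minimal sketch: the tuple assignment replaces the temporary r, and the input check is our own addition that enforces the problem’s “nonnegative, not both zero” condition.

```python
def gcd_euclid(m: int, n: int) -> int:
    """Euclid's algorithm for nonnegative m, n that are not both zero."""
    if m < 0 or n < 0 or (m == 0 and n == 0):
        raise ValueError("m and n must be nonnegative and not both zero")
    while n != 0:
        m, n = n, m % n    # r <- m mod n; m <- n; n <- r
    return m


print(gcd_euclid(60, 24))  # 12
print(gcd_euclid(60, 0))   # 60
```

Python’s standard library already provides math.gcd; the hand-written version only mirrors the pseudocode.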

Other methods for gcd(m,n) (1/2)
Consecutive integer checking algorithm
Step 1 Assign the value of min{m,n} to t
Step 2 Divide m by t. If the remainder is 0, go to Step 3; otherwise, go to Step 4
Step 3 Divide n by t. If the remainder is 0, return t and stop; otherwise, go to Step 4
Step 4 Decrease t by 1 and go to Step 2
Note: unlike Euclid’s algorithm, this method does not work when one of the inputs is 0 (Step 1 would set t to 0, and Step 2 would divide by 0).
Is this slower than Euclid’s algorithm? How much slower?
O(n), if n ≤ m, vs. O(log n)
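The four steps can be sketched in Python as a single countdown loop. A minimal illustration; the name gcd_consecutive is our own, and the guard rejects the zero inputs the method cannot handle.

```python
def gcd_consecutive(m: int, n: int) -> int:
    """Consecutive integer checking: try t = min(m, n), min(m, n) - 1, ..., 1."""
    if m <= 0 or n <= 0:
        # With a zero input, Step 1 would set t = 0 and Step 2 would divide by zero.
        raise ValueError("this method needs two positive integers")
    t = min(m, n)                        # Step 1
    while m % t != 0 or n % t != 0:      # Steps 2 and 3: does t divide both?
        t -= 1                           # Step 4 (stops at t = 1 at the latest)
    return t


print(gcd_consecutive(60, 24))  # 12
```

The loop can run up to min(m, n) times, which is where the O(n) (for n ≤ m) on the slide comes from.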
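
Other methods for gcd(m,n) (2/2)
Middle-school procedure
Step 1 Find the prime factorization of m
Step 2 Find the prime factorization of n
Step 3 Find all the common prime factors (a prime common to both is taken as many times as it appears in both factorizations, i.e., the smaller of its two multiplicities)
Step 4 Compute the product of all the common prime factors and return it as gcd(m,n)
Example: 60 = 2 · 2 · 3 · 5 and 24 = 2 · 2 · 2 · 3, so gcd(60,24) = 2 · 2 · 3 = 12.
Is this an algorithm?
How efficient is it? Time complexity: O( )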
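A sketch of the procedure, assuming trial division as the way to “find the prime factorization” (the slide does not say how, and that unspecified step is exactly why it is fair to ask whether this is an algorithm). The helper names are illustrative.

```python
from collections import Counter


def prime_factors(k: int) -> Counter:
    """Prime factorization of k >= 1 by trial division, as a multiset {prime: count}."""
    factors = Counter()
    d = 2
    while d * d <= k:
        while k % d == 0:        # divide out d as many times as possible
            factors[d] += 1
            k //= d
        d += 1
    if k > 1:                    # whatever is left is a prime factor
        factors[k] += 1
    return factors


def gcd_middle_school(m: int, n: int) -> int:
    """Middle-school procedure for positive integers m and n."""
    if m < 1 or n < 1:
        raise ValueError("prime factorization needs positive integers")
    common = prime_factors(m) & prime_factors(n)   # & keeps the smaller count of each prime
    result = 1
    for p, count in common.items():
        result *= p ** count
    return result


print(gcd_middle_school(60, 24))  # 12
```

Counter’s & operator takes the minimum of corresponding counts, which is exactly the “common prime factors, with repetition” of Step 3.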
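
Sieve of Eratosthenes (ca. 200 B.C.)
Input: Integer n ≥ 2
Output: List of primes less than or equal to n
for p ← 2 to n do A[p] ← p
for p ← 2 to ⌊√n⌋ do
    if A[p] ≠ 0 //p hasn’t been previously eliminated from the list
        j ← p * p
        while j ≤ n do
            A[j] ← 0 //mark element as eliminated
            j ← j + p
//copy the remaining (nonzero) elements of A to the output list L
i ← 0
for p ← 2 to n do
    if A[p] ≠ 0
        L[i] ← A[p]
        i ← i + 1
return L
Starting j at p * p is enough: every smaller multiple k * p with 2 ≤ k < p has a prime factor smaller than p and has already been eliminated. For the same reason the outer loop can stop at ⌊√n⌋.

Sieve of Eratosthenes - Example
[Animation: algorithm steps for primes below 120, including the optimization of terminating when the square of the prime exceeds the upper limit]
Example: 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20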
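The sieve pseudocode can be sketched in Python as follows; a minimal version assuming Python 3.9+ (math.isqrt computes ⌊√n⌋ exactly, and a range with step p replaces the inner while loop).

```python
import math


def sieve(n: int) -> list[int]:
    """Sieve of Eratosthenes: all primes <= n (n >= 2)."""
    A = list(range(n + 1))              # A[p] = p; entries 0 and 1 are ignored
    for p in range(2, math.isqrt(n) + 1):
        if A[p] != 0:                   # p has not been eliminated, so it is prime
            for j in range(p * p, n + 1, p):
                A[j] = 0                # mark multiples of p as eliminated
    return [A[p] for p in range(2, n + 1) if A[p] != 0]


print(sieve(20))  # [2, 3, 5, 7, 11, 13, 17, 19]
```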

Fundamentals of Algorithmic Problem Solving

Understanding the Problem
• An input to an algorithm specifies an instance of the problem the algorithm solves. It is very important to specify exactly the set of instances the algorithm needs to handle. (As an example, recall the variations in the set of instances for the three greatest common divisor algorithms discussed in the previous section.)
• If you fail to do this, your algorithm may work correctly for a majority of inputs but crash on some “boundary” value. Remember that a correct algorithm is not one that works most of the time, but one that works correctly for all legitimate inputs.

Ascertaining the Capabilities of the Computational Device
• Once you completely understand a problem, you need to ascertain the capabilities of the computational device the algorithm is intended for. The vast majority of algorithms in use today are still destined to be programmed for a computer closely resembling the von Neumann machine, a computer architecture outlined by the prominent Hungarian-American mathematician John von Neumann (1903–1957), in collaboration with A. Burks and H. Goldstine, in 1946.
• The essence of this architecture is captured by the so-called random-access machine (RAM). Its central assumption is that instructions are executed one after another, one operation at a time. Accordingly, algorithms designed to be executed on such machines are called sequential algorithms.
• What about parallel algorithms, which are designed for machines that can execute operations concurrently?
[Figures: Von Neumann architecture scheme; Pascal GP100 Full GPU with 60 SM Units (NVIDIA Tesla P100)]

Choosing between Exact and Approximate Problem Solving
• The next principal decision is to choose between solving the problem exactly or solving it approximately. In the former case, an algorithm is called an exact algorithm; in the latter case, an algorithm is called an approximation algorithm. Why would one opt for an approximation algorithm?
– First, there are important problems that simply cannot be solved exactly for most of their instances.
– Second, available algorithms for solving a problem exactly can be unacceptably slow because of the problem’s intrinsic complexity. This happens, in particular, for many problems involving a very large number of choices; you will see examples of such difficult problems in Chapters 3, 11, and 12.
– Third, an approximation algorithm can be a part of a more sophisticated algorithm that solves a problem exactly.

Designing an Algorithm and Data Structures
• Pseudocode is a mixture of a natural language and programming language-like constructs.
• Pseudocode is usually more precise than natural language, and its usage often yields more succinct algorithm descriptions.

Algorithm Design Techniques
• An algorithm design technique (or “strategy” or “paradigm”) is a general approach to solving problems algorithmically that is applicable to a variety of problems from different areas of computing
– Brute force
– Decrease and conquer
– Divide and conquer
– Transform and conquer
– Space and time tradeoffs
– Greedy approach
– Dynamic programming
– Iterative improvement
– Backtracking
– Branch and bound

Analysis of Algorithms
• How good is the algorithm?
– time efficiency
– space efficiency
– correctness ignored in this course
• Characteristics of an Algorithm
– simplicity
– generality
• Does there exist a better algorithm?
– lower bounds
– optimality

Coding an Algorithm
Algorithms + Data Structures + Programming Language = Programs
• In the academic world, the question of whether a better algorithm exists leads to an interesting but usually difficult investigation of an algorithm’s optimality. Actually, this question is not about the efficiency of an algorithm but about the complexity of the problem it solves: What is the minimum amount of effort any algorithm will need to exert to solve the problem?
• For some problems, the answer to this question is known. For example, any algorithm that sorts an array by comparing values of its elements needs about n log2 n comparisons for some arrays of size n.
• But for many seemingly easy problems such as integer multiplication, computer scientists do not yet have a final answer.
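The “about n log2 n comparisons” claim follows from a standard counting (decision-tree) argument, summarized here as a sketch rather than a full proof: a comparison-based sort must be able to produce each of the n! orderings of n distinct keys, and every key comparison has only two outcomes, so in the worst case it needs at least ⌈log2 n!⌉ comparisons. Stirling’s formula then gives the estimate below, where C_worst(n) denotes the worst-case number of key comparisons.

```latex
C_{\text{worst}}(n) \;\ge\; \lceil \log_2 n! \rceil,
\qquad
\log_2 n! = \sum_{k=1}^{n} \log_2 k \;\approx\; n\log_2 n - n\log_2 e \;\approx\; n\log_2 n - 1.44\,n .
```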

In conclusion
As a rule, a good algorithm is a result of repeated effort and rework.

Example
Question: #9 in Exercises 1.2
Consider the following algorithm for finding the distance between the two closest elements in an array of numbers.
Algorithm MinDistance(A[0..n − 1])
//Input: Array A[0..n − 1] of numbers
//Output: Minimum distance between two of its elements
dmin ← ∞
for i ← 0 to n − 1 do
    for j ← 0 to n − 1 do
        if i ≠ j and |A[i] − A[j]| < dmin
            dmin ← |A[i] − A[j]|
return dmin
Example input: 100, 77, 20, 50, 82, 33, 120, 180
Make as many improvements as you can in this algorithmic solution to the problem. (If you need to, you may change the algorithm altogether; if not, improve the implementation given.)
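To make the exercise concrete, here is a direct Python translation of MinDistance next to one possible improvement (sort first, then compare only neighbours). This is a sketch of one answer, not the book’s official solution; the function names are our own.

```python
import math


def min_distance_brute(A: list[float]) -> float:
    """Direct translation of MinDistance: examines every ordered pair (i, j), i != j."""
    dmin = math.inf
    for i in range(len(A)):
        for j in range(len(A)):
            if i != j and abs(A[i] - A[j]) < dmin:
                dmin = abs(A[i] - A[j])
    return dmin


def min_distance_sorted(A: list[float]) -> float:
    """One possible improvement: after sorting, the closest pair must be adjacent."""
    B = sorted(A)                          # O(n log n)
    dmin = math.inf
    for k in range(len(B) - 1):            # one linear scan over neighbouring elements
        dmin = min(dmin, B[k + 1] - B[k])
    return dmin


data = [100, 77, 20, 50, 82, 33, 120, 180]
print(min_distance_brute(data), min_distance_sorted(data))  # 5 5
```

Sorting brings the work down from about n² pair checks to O(n log n). A smaller, implementation-level improvement of the original would be to let j start at i + 1, so each unordered pair is examined once and the i ≠ j test disappears.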

Important Problem Types
• Sorting
• Searching
• String processing
• Graph problems
• Combinatorial problems
• Geometric problems
• Numerical problems

Sorting (I)
• Rearrange the items of a given list in ascending order.
– Input: A sequence of n numbers <a1, a2, …, an>
– Output: A reordering <a’1, a’2, …, a’n> of the input sequence such that a’1 ≤ a’2 ≤ … ≤ a’n.
• Why sorting?
– Makes searching easier
– Algorithms often use sorting as a key subroutine.
• Sorting key
– A specially chosen piece of information used to guide sorting. E.g., sort student records by names.

Sorting (II)
• Examples of sorting algorithms
– Selection sort
– Bubble sort
– Insertion sort
– Merge sort
– Heap sort …
• Evaluate sorting algorithm complexity: the number of key comparisons.
• Two properties
– Stability: A sorting algorithm is called stable if it preserves the relative order of any two equal elements in its input.
– In place: A sorting algorithm is in place if it does not require extra memory, except, possibly, for a few memory units.
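Python’s built-in sorted() takes a key function and is documented to be stable, which makes it a convenient way to see both the sorting key and stability at work. The student records below are hypothetical.

```python
# Hypothetical student records: (name, year of study).
records = [("Chen", 2), ("Ali", 1), ("Bea", 2), ("Dan", 1)]

# The sorting key is the piece of each record that guides the sort.
by_name = sorted(records, key=lambda r: r[0])
by_year = sorted(records, key=lambda r: r[1])

print(by_name)  # [('Ali', 1), ('Bea', 2), ('Chen', 2), ('Dan', 1)]
print(by_year)  # [('Ali', 1), ('Dan', 1), ('Chen', 2), ('Bea', 2)]
```

Because the sort is stable, records with equal years keep their input order: Ali stays before Dan, and Chen before Bea.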

Selection Sort
Algorithm SelectionSort(A[0..n − 1])
//The algorithm sorts a given array by selection sort
//Input: An array A[0..n − 1] of orderable elements
//Output: Array A[0..n − 1] sorted in ascending order
for i ← 0 to n − 2 do
    min ← i
    for j ← i + 1 to n − 1 do
        if A[j] < A[min]
            min ← j
    swap A[i] and A[min]

Searching
• Find a given value, called a search key, in a given set.
• Examples of searching algorithms
– Sequential search
– Binary search, see below
Input: sorted array ai < … < aj and key x;
m ← ⌊(i+j)/2⌋;
while i ≤ j and x ≠ am do
    if x < am then j ← m − 1
    else i ← m + 1;
    m ← ⌊(i+j)/2⌋; //recompute the middle of the remaining range
if i ≤ j then output am; //otherwise x is not in the array
Time: O(log n)
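Sequential and binary search can be sketched in Python as follows. Note that the middle index m must be recomputed on every iteration, and the loop must also run when i = j. This sketch returns the index of the key (or −1 if absent) rather than the element itself; the function names are our own.

```python
def sequential_search(a: list, key) -> int:
    """Return the index of key in a, or -1; checks elements one by one (O(n))."""
    for k, value in enumerate(a):
        if value == key:
            return k
    return -1


def binary_search(a: list, key) -> int:
    """Return an index of key in the sorted list a, or -1 (O(log n) iterations)."""
    i, j = 0, len(a) - 1
    while i <= j:
        m = (i + j) // 2          # middle of the current range a[i..j]
        if key == a[m]:
            return m
        elif key < a[m]:
            j = m - 1             # key can only be in a[i..m-1]
        else:
            i = m + 1             # key can only be in a[m+1..j]
    return -1


a = [3, 14, 27, 31, 39, 42, 55, 70, 74, 81, 85, 93, 98]
print(binary_search(a, 70), sequential_search(a, 70))  # 7 7
```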

String Processing
• A string is a sequence of characters from an alphabet.
• Text strings: letters, numbers, and special characters.
• String matching: searching for a given word/pattern in a text.
Examples:
(i) searching for a word or phrase on the WWW or in a Word document
(ii) searching for a short read in the reference genomic sequence

Graph Problems
• Informal definition
– A graph is a collection of points called vertices, some of which are connected by line segments called edges.
• Modeling real-life problems
– Modeling WWW
– Communication networks
– Project scheduling
– …
• Examples of graph algorithms
– Graph traversal algorithms
– Shortest-path algorithms
– Topological sorting
– Graph-coloring problems (#8 in Exercises 1.3)
– …
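Returning to string matching from the String Processing slide: the simplest method, brute-force matching, tries every possible alignment of the pattern with the text and compares characters left to right. A minimal sketch; the function name and example strings are illustrative.

```python
def brute_force_match(text: str, pattern: str) -> int:
    """Return the index of the first occurrence of pattern in text, or -1."""
    n, m = len(text), len(pattern)
    for i in range(n - m + 1):           # every alignment of pattern with text
        j = 0
        while j < m and pattern[j] == text[i + j]:
            j += 1
        if j == m:                       # all m characters matched
            return i
    return -1


print(brute_force_match("NOBODY_NOTICED_HIM", "NOT"))  # 7
print(brute_force_match("ACGTACGT", "GTA"))            # 2
```

In the worst case this makes about (n − m + 1) · m character comparisons.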
[Figures: Find an Euler circuit; Find a Hamiltonian circuit; Graph-coloring problems]

Fundamental Data Structures
• list
– array
– linked list
– string
• stack
• queue
• priority queue/heap
• graph
• tree and binary tree
• set and dictionary

Linear Data Structures
• Arrays
– A sequence of n items of the same data type that are stored contiguously in computer memory and made accessible by specifying a value of the array’s index.
• Linked List
– A sequence of zero or more nodes each containing two kinds of information: some data and one or more links called pointers to other nodes of the linked list.
– Singly linked list (next pointer)
– Doubly linked list (next + previous pointers)

Arrays
• fixed length (need preliminary reservation of memory)
• contiguous memory locations
• direct access
• Insert/delete
Linked Lists
• dynamic length
• arbitrary memory locations
• access by following links
• Insert/delete
[Figure: linked list of nodes a1, a2, …, an]
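A singly linked list as described above, sketched in Python with a small dataclass for the node. Node, push_front, insert_after, and to_list are illustrative names, not a standard API. Insertion only relinks pointers, while reaching the k-th element means following k links.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """One node of a singly linked list: some data plus a link to the next node."""
    data: int
    next: Optional[Node] = None     # None marks the end of the list


def push_front(head: Optional[Node], value: int) -> Node:
    """Insert at the front: one new node and one link, no elements shifted."""
    return Node(value, head)


def insert_after(node: Node, value: int) -> None:
    """Insert value right after node by relinking pointers."""
    node.next = Node(value, node.next)


def to_list(head: Optional[Node]) -> list[int]:
    """Visit the nodes by following links from the head."""
    out = []
    while head is not None:
        out.append(head.data)
        head = head.next
    return out


head = None
for v in (3, 2, 1):
    head = push_front(head, v)
insert_after(head, 9)
print(to_list(head))  # [1, 9, 2, 3]
```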

Stacks and Queues
• Stacks
– A stack of plates
• insertion/deletion can be done only at the top.
• LIFO
– Two operations (push and pop)
• Queues
– A queue of customers waiting for services
• Insertion/enqueue from the rear and deletion/dequeue from the front.
• FIFO
– Two operations (enqueue and dequeue)

Priority Queue and Heap
• Priority queues (implemented using heaps)
– A data structure for maintaining a set of elements, each associated with a key/priority, with the following operations
• Finding the element with the highest priority
• Deleting the element with the highest priority
• Inserting a new element
– Scheduling jobs on a shared computer
[Figure: heap with root 9, next level 6 and 8, bottom level 5, 2, 3; array representation: 9 6 8 5 2 3]
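The three abstract data types map onto Python’s standard library; a sketch assuming a list for the stack, collections.deque for the queue, and the heapq module for the priority queue (other concrete choices are possible). heapq maintains a min-heap, hence the negated keys.

```python
import heapq
from collections import deque

# Stack (LIFO): a Python list whose end serves as the top.
stack = []
for item in (1, 2, 3):
    stack.append(item)               # push
top = stack.pop()                    # pop removes the last item pushed: 3

# Queue (FIFO): a deque; enqueue at the rear, dequeue from the front.
queue = deque()
for customer in ("a", "b", "c"):
    queue.append(customer)           # enqueue
front = queue.popleft()              # dequeue removes the first item enqueued: "a"

# Priority queue: heapq keeps a binary *min*-heap inside a list, so keys are
# negated to get the "highest priority first" behaviour of the 9 6 8 5 2 3 heap.
pq = []
for key in (5, 9, 2, 6, 3, 8):
    heapq.heappush(pq, -key)         # insert a new element
highest = -pq[0]                     # find the highest priority: 9
removed = -heapq.heappop(pq)         # delete the highest priority: 9
print(top, front, highest, removed, -pq[0])  # 3 a 9 9 8
```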

Graphs
• Formal definition
– A graph G = <V, E> is defined by a pair of two sets: a finite set V of items called vertices and a set E of vertex pairs called edges.
• Undirected and directed graphs (digraphs).
• What’s the maximum number of edges in an undirected graph with |V| vertices?
• Complete, dense, and sparse graphs
– A graph with every pair of its vertices connected by an edge is called complete, K|V|

Graph Representation
• Adjacency matrix
– n x n boolean matrix if |V| is n.
– The element on the ith row and jth column is 1 if there’s an edge from the ith vertex to the jth vertex; otherwise 0.
– The adjacency matrix of an undirected graph is symmetric.
• Adjacency linked lists
– A collection of linked lists, one for each vertex, that contain all the vertices adjacent to the list’s vertex.
• Which data structure would you use if the graph is a 100-node star shape?
[Figure: star graph on vertices 1, 2, 3, 4, with vertex 1 adjacent to 2, 3, and 4]
Adjacency matrix:
0 1 1 1
1 0 0 0
1 0 0 0
1 0 0 0
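Both representations can be built from an edge list; a sketch for undirected graphs with vertices numbered 1..n, where Python lists stand in for the linked lists and the function names are our own.

```python
def adjacency_matrix(n: int, edges: list[tuple[int, int]]) -> list[list[int]]:
    """n x n 0/1 matrix for an undirected graph on vertices 1..n (stored 0-based)."""
    M = [[0] * n for _ in range(n)]
    for u, v in edges:
        M[u - 1][v - 1] = 1
        M[v - 1][u - 1] = 1          # undirected: the matrix is symmetric
    return M


def adjacency_lists(n: int, edges: list[tuple[int, int]]) -> dict[int, list[int]]:
    """One list per vertex holding its neighbours."""
    adj = {v: [] for v in range(1, n + 1)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


star = [(1, 2), (1, 3), (1, 4)]      # the 4-vertex star from the figure
for row in adjacency_matrix(4, star):
    print(*row)
print(adjacency_lists(4, star))      # {1: [2, 3, 4], 2: [1], 3: [1], 4: [1]}
```

For a 100-node star the matrix needs 100 × 100 = 10,000 entries, while the adjacency lists hold only 2 × 99 = 198 entries (each of the 99 edges appears in two lists), which is why lists suit sparse graphs like this one.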

Weighted Graphs
• Weighted graphs
– Graphs or digraphs with numbers assigned to the edges.
[Figure: example weighted graph]

Graph Properties - Paths and Connectivity
• Paths
– A path from vertex u to v of a graph G is defined as a sequence of adjacent (connected by an edge) vertices that starts with u and ends with v.
– Simple paths: All edges of a path are distinct.
– Path lengths: the number of edges, or the number of vertices – 1.
• Connected graphs
– A graph is said to be connected if for every pair of its vertices u and v there is a path from u to v.
• Connected component
– A maximal connected subgraph of a given graph, i.e., one that is not contained in any larger connected subgraph.
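Connectivity can be checked mechanically. One standard way is breadth-first search, one of the graph traversal algorithms: start a search from every vertex not yet visited, and each search collects exactly one connected component. A sketch assuming an undirected graph stored as adjacency lists in a dict; the example graph is hypothetical.

```python
from collections import deque


def connected_components(adj: dict[int, list[int]]) -> list[list[int]]:
    """Connected components of an undirected graph given as adjacency lists."""
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        comp, frontier = [], deque([start])
        while frontier:                  # breadth-first search from start
            u = frontier.popleft()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:        # every vertex reachable from start joins comp
                    seen.add(v)
                    frontier.append(v)
        components.append(comp)
    return components


g = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4], 6: []}   # hypothetical example
print(connected_components(g))  # [[1, 2, 3], [4, 5], [6]]
```

The graph is connected exactly when this returns a single component.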

Graph Properties - Acyclicity
• Cycle
– A simple path of a positive length that starts and ends at the same vertex.
• Acyclic graph
– A graph without cycles
– DAG (Directed Acyclic Graph)
[Figure: example graph on vertices 1, 2, 3, 4]

Trees
• Trees
– A tree (or free tree) is a connected acyclic graph.
– Forest: a graph that has no cycles but is not necessarily connected.
• Properties of trees
– |E| = |V| - 1
– For every two vertices in a tree there always exists exactly one simple path from one of these vertices to the other. Why?
• Rooted trees: The above property makes it possible to select an arbitrary vertex in a free tree and consider it as the root of the so-called rooted tree.
• Levels in a rooted tree.
[Figure: a free tree and the corresponding rooted tree]

Rooted Trees (I)
• Ancestors
– For any vertex v in a tree T, all the vertices on the simple path from the root to that vertex are called ancestors.
• Descendants
– All the vertices for which a vertex v is an ancestor are said to be descendants of v.
• Parent, child and siblings
– If (u, v) is the last edge of the simple path from the root to vertex v, u is said to be the parent of v and v is called a child of u.
– Vertices that have the same parent are called siblings.
[Figure: example rooted tree]

Rooted Trees (II)
• Leaves
– A vertex without children is called a leaf.
• Subtree
– A vertex v with all its descendants is called the subtree of T rooted at v.
• Depth of a vertex
– The length of the simple path from the root to the vertex.
• Height of a tree
– The length of the longest simple path from the root to a leaf.
[Figure: example rooted tree of height h = 2]
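The definitions of depth, height, and leaves translate directly into code. A sketch assuming a rooted tree stored as a dict from each vertex to its list of children; the example tree is hypothetical (root 1 with children 2 and 3, and vertex 3 with children 4 and 5).

```python
def depths(children: dict[int, list[int]], root: int) -> dict[int, int]:
    """Depth of every vertex: the length of the simple path from the root to it."""
    depth = {root: 0}
    stack = [root]
    while stack:
        u = stack.pop()
        for c in children.get(u, []):
            depth[c] = depth[u] + 1     # a child is one edge farther from the root
            stack.append(c)
    return depth


def height(children: dict[int, list[int]], root: int) -> int:
    """Height: the length of the longest root-to-leaf path, i.e., the largest depth."""
    return max(depths(children, root).values())


def leaves(children: dict[int, list[int]], root: int) -> list[int]:
    """Vertices without children."""
    return [v for v in depths(children, root) if not children.get(v)]


tree = {1: [2, 3], 3: [4, 5]}            # hypothetical example tree
print(depths(tree, 1))                   # {1: 0, 2: 1, 3: 1, 4: 2, 5: 2}
print(height(tree, 1), leaves(tree, 1))  # 2 [2, 4, 5]
```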

Ordered Trees
• Ordered trees
– An ordered tree is a rooted tree in which all the children of each vertex are ordered.
• Binary trees
– A binary tree is an ordered tree in which every vertex has no more than two children and each child is designated as either a left child or a right child of its parent.
• Binary search trees
– Each vertex is assigned a number.
– A number assigned to each parental vertex is larger than all the numbers in its left subtree and smaller than all the numbers in its right subtree.
• ⌊log2 n⌋ ≤ h ≤ n − 1, where h is the height of a binary tree with n nodes.
[Figure: a binary tree with keys 9; 6, 8; 5, 2, 3 and a binary search tree with keys 6; 3, 9; 2, 5, 8 (listed level by level)]
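A binary search tree can be sketched in Python with insertion, search, and height. The names are illustrative, keys are assumed distinct, and the keys 6, 3, 9, 2, 5, 8 are inserted in the level order suggested by the slide’s binary-search-tree figure.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class TreeNode:
    key: int
    left: Optional[TreeNode] = None    # keys smaller than self.key
    right: Optional[TreeNode] = None   # keys larger than self.key


def bst_insert(root: Optional[TreeNode], key: int) -> TreeNode:
    """Insert key, keeping left-subtree keys smaller and right-subtree keys larger."""
    if root is None:
        return TreeNode(key)
    if key < root.key:
        root.left = bst_insert(root.left, key)
    elif key > root.key:
        root.right = bst_insert(root.right, key)
    return root                        # an equal key is ignored (keys assumed distinct)


def bst_search(root: Optional[TreeNode], key: int) -> bool:
    """Follow one root-to-leaf path, visiting at most h + 1 nodes."""
    while root is not None and root.key != key:
        root = root.left if key < root.key else root.right
    return root is not None


def tree_height(root: Optional[TreeNode]) -> int:
    """Height of the tree; an empty tree has height -1 by convention."""
    if root is None:
        return -1
    return 1 + max(tree_height(root.left), tree_height(root.right))


def inorder(root: Optional[TreeNode]) -> list[int]:
    """Left subtree, root, right subtree: lists the keys in increasing order."""
    if root is None:
        return []
    return inorder(root.left) + [root.key] + inorder(root.right)


root = None
for k in (6, 3, 9, 2, 5, 8):
    root = bst_insert(root, k)
print(inorder(root), tree_height(root))  # [2, 3, 5, 6, 8, 9] 2
```

With n = 6 the height 2 meets the lower end of ⌊log2 n⌋ ≤ h ≤ n − 1; inserting the same keys in increasing order would instead produce a chain of height 5, the upper end.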

Summary (1/2)
• An algorithm is a sequence of unambiguous instructions for solving a problem in a finite amount of time. An input to an algorithm specifies an instance of the problem the algorithm solves.
• Algorithms can be specified in a natural language or pseudocode; they can also be implemented as computer programs.
• Among several ways to classify algorithms, the two principal alternatives are:
– to group algorithms according to types of problems they solve
– to group algorithms according to underlying design techniques they are based upon

Summary (2/2)
• A good algorithm is usually the result of repeated efforts and rework.
• The same problem can often be solved by several algorithms.
• Algorithms operate on data. This makes the issue of data structuring critical for efficient algorithmic problem solving.
• An abstract collection of objects with several operations that can be performed on them is called an abstract data type (ADT). Modern object-oriented languages support implementation of ADTs by means of classes.