Theory of Algorithms
2015/2016 Spring
Prof. Dr. Aurél Galántai
Óbuda University
04-05-2016
Contents

1. Introduction
2. Mathematical concepts
   2.1. Notations
   2.2. Relations and functions
   2.3. Asymptotic characterization of functions
   2.4. Graphs
   2.5. Cardinality of sets
3. Formal languages
   3.1. Languages and words
   3.2. Generative grammars
   3.3. Classification of generative languages
4. Algorithms, computable functions and decision problems
5. Analysis of algorithms
   5.1. The divide and conquer (DAC) principle
   5.2. The master theorem
   5.3. Searching, sorting and selection problems
        5.3.1. Searching problems
        5.3.2. Sorting
        5.3.3. Lower estimate on the complexity of sorting
        5.3.4. Selection problems
   5.4. Basic arithmetic algorithms
        5.4.1. Multiplication
        5.4.2. Division
   5.5. Matrix algorithms
        5.5.1. Multiplications with matrices and vectors
        5.5.2. Strassen's algorithm
        5.5.3. Remarks on fast matrix multiplications
   5.6. Fast Fourier transform
        5.6.1. The Cooley-Tukey radix-2 algorithm
6. Turing machines
   6.1. Programming Turing machines
   6.2. Generalizations of the Turing machine
   6.3. Equivalence of computational models
   6.4. Universal Turing machines
7. Algorithmic decidability and computability
   7.1. Recognition and decision of languages by Turing machines
   7.2. Undecidable problems
        7.2.1. Undecidability of the universal language
        7.2.2. The halting problem
   7.3. Further undecidable problems
   7.4. Some concepts and results of complexity theory
8. Complexity
   8.1. The NP class and NP completeness
   8.2. Non-deterministic Turing machines and the NP class
   8.3. NP completeness
Chapter 1
Introduction
What is the theory of algorithms? It contains areas such as
- the analysis of algorithms and problems,
- the study of algorithmic efficiency,
- computational models,
- the complexity theory of algorithms (and of problems).

The main topics of the course:
- mathematical concepts,
- elements of formal languages,
- analysis of algorithms,
- Turing machines,
- complexity theory.
Complexity theory started before the computer (von Neumann) era. It arose from developments in mathematical logic; more precisely, it originates from attempts to formalize concepts such as "proof" and "computable function". These results are briefly summarized next.
Kurt Gödel (1906-1978)
Gödel proved in 1930 that we can formulate a statement within a logical system so that it cannot be decided (within the logical system) whether it is true or false.

Gödel, K.: Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I ("On formally undecidable propositions of Principia Mathematica and related systems I"), Monatshefte für Mathematik und Physik, 38, 1931, 173-198.

In this work Gödel also defined the concept of (primitive) recursive function, which is fundamental in the study of "computable functions".
Alan Turing (1912-1954)

Turing introduced in 1936 the concept of the Turing machine, which later proved to be the fundamental tool of computability theory and the theory of algorithms.

Turing, A.M.: On computable numbers, with an application to the Entscheidungsproblem, Proc. London Math. Soc., ser. 2, 42, 1936-7, 230-265.

Alonzo Church (1903-1995)

Alonzo Church (1936) defined the λ-calculus and formulated his famous thesis.

A. Church: An unsolvable problem of elementary number theory, American Journal of Mathematics, 58, 1936, 345-363.

Church's thesis: every "computation" can be formalized in his λ-calculus.

The LISP language was developed from the λ-calculus.
Stephen C. Kleene (1909-1994)

S.C. Kleene (1936) introduced the μ-recursive functions.

S.C. Kleene: General recursive functions of natural numbers, Mathematische Annalen, 112, 1936, 727-742.

Many imperative programming languages (Pascal, C, etc.) can be considered implementations of the μ-recursive functions.
The main starting points of complexity theory according to S.A. Cook (ACM Turing Award, 1982):
- Turing (1936): the Turing machine and the (Church-)Turing thesis: any function which can be computed by a well-defined procedure can also be computed by a Turing machine.
- Rabin (1959, 1960): what is the meaning of the claim that $f$ is harder to compute than $g$?
- Hartmanis, Stearns (1965): measures of complexity, hierarchy theorems.
- Cobham (1965): the inner complexity of functions, machine-independent theory.
- Karp (1972): the class P (tractability or feasibility).
- Aho, Ullman, Hopcroft (1974): the RAM machine.
The basic concepts and problems of complexity theory according to M. Rabin (ACM Turing Award, 1976):

Given:
- $P$, a problem class,
- $I \in P$, an individual problem,
- $|I|$, the size of the problem $I$,
- $AL$, an algorithm solving the problems of $P$.

Solving a problem $I \in P$, the algorithm $AL$ generates a sequence $S_I$. We associate measures (costs) with the sequence $S_I$. The most important measures:
(1) the length of $S_I$ (computational time);
(2) the depth of $S_I$ (the measure of parallelization, i.e., the parallel computational time);
(3) the memory need;
(4) the number of certain special operations (selected arithmetical operations, comparisons, memory operations, etc.);
(5) the (combinatorial) complexity of the circuit necessary to implement the algorithm.
Assume now that we have a measure $\mu(S_I)$ for the sequences $S_I$. Important complexity measures:

Worst case complexity measure:
$$F_{AL}(n) = \max \left\{ \mu(S_I) \mid I \in P,\ |I| = n \right\}. \qquad (1.1)$$

Average complexity measure: given a probability distribution $p$ on the problem set $P_n = \{ I \mid I \in P,\ |I| = n \}$, the average measure is
$$M_{AL}(n) = \sum_{I \in P_n} p(I)\, \mu(S_I). \qquad (1.2)$$

Basic problem: for a given size $|I|$ and measure $\mu(S_I)$, determine the worst case and average complexity measures of the algorithm $AL$.
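As an illustration of the worst case measure (1.1) and the average measure (1.2), the following short Python sketch (the helper names are ours, not from the text) computes both measures for linear search over all instances of size $n$:

```python
from itertools import permutations

def linear_search_cost(seq, key):
    """Number of comparisons linear search makes to find `key` in `seq`."""
    for cost, x in enumerate(seq, start=1):
        if x == key:
            return cost
    return len(seq)

# Problem class P_n: searching for the key 1 in every permutation of 1..n,
# with the uniform distribution p(I) = 1/|P_n|.
n = 4
instances = list(permutations(range(1, n + 1)))
costs = [linear_search_cost(p, 1) for p in instances]

F = max(costs)               # worst case measure F_AL(n) = n
M = sum(costs) / len(costs)  # average measure M_AL(n) = (n + 1) / 2
print(F, M)                  # 4 2.5
```

Here the measure $\mu(S_I)$ is the number of comparisons, i.e., the length of the comparison sequence.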
Chapter 2
Mathematical concepts
2.1. Notations

We use the following notations:

$\mathbb{N}$ - the set of natural numbers
$\mathbb{N}_0$ - the set of non-negative integers ($\mathbb{N}_0 = \mathbb{N} \cup \{0\}$)
$\mathbb{Z}$ - the set of integer numbers
$\mathbb{Q}$ - the set of rational numbers: $\mathbb{Q} = \left\{ \frac{p}{q} \mid p, q \in \mathbb{Z},\ q \neq 0 \right\}$
$\mathbb{R}$ - the set of real numbers
$\mathbb{C}$ - the set of complex numbers: $\mathbb{C} = \{ a + bi \mid a, b \in \mathbb{R} \}$, where $i = \sqrt{-1}$
$\emptyset$ - the empty set
$\subset$ - proper subset
$\subseteq$ - subset
$|A|$ - the cardinality of the set $A$
$\wedge$ - logical "and"
$\vee$ - logical "or"
1. Definition. The power set of a set $A \neq \emptyset$ is defined as $2^A = \{ X \mid X \subseteq A \}$.

Note that $\emptyset, A \in 2^A$. (Some use the notation $\mathcal{P}(A)$ instead of $2^A$.)

2. Lemma. If $|A| = n$, then $|2^A| = 2^n$.
The number of different $k$-element subsets of $A$ is $\binom{n}{k}$, and so $\sum_{k=0}^{n} \binom{n}{k} = 2^n$.
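Lemma 2 and the binomial identity above can be illustrated with a short Python sketch (the helper `power_set` is ours, not from the text):

```python
from itertools import chain, combinations
from math import comb

def power_set(A):
    """All subsets of A, grouped by size: the k-element subsets for k = 0..|A|."""
    items = sorted(A)
    return list(chain.from_iterable(combinations(items, k)
                                    for k in range(len(items) + 1)))

A = {1, 2, 3, 4}
subsets = power_set(A)
print(len(subsets))                                  # |2^A| = 2^4 = 16
print(len(subsets) == sum(comb(4, k) for k in range(5)))
```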
3. Definition. The direct or Cartesian product of the sets $A_1, A_2, \ldots, A_n$ is defined by
$$A_1 \times A_2 \times \cdots \times A_n = \{ (a_1, \ldots, a_n) \mid a_i \in A_i,\ i = 1, \ldots, n \}.$$
Notation: $\times_{i=1}^{n} A_i$. If $A_1 = A_2 = \cdots = A_n = A$, then we use the notation $\times_{i=1}^{n} A = A^n$.
2.2. Relations and functions
4. Definition. Let $A$ and $B$ be arbitrary sets. Any subset $S \subseteq A \times B$ is called a (binary) relation. The elements $a \in A$ and $b \in B$ are in relation $S$ (notation $aSb$) if and only if $(a, b) \in S$. A shorter definition: $aSb \iff (a, b) \in S$.

5. Definition. The domain of a relation $S \subseteq A \times B$ is given by
$$D_S = \{ a \in A \mid \exists b \in B : (a, b) \in S \}.$$

6. Definition. The range of a relation $S \subseteq A \times B$ is given by
$$R_S = \{ b \in B \mid \exists a \in A : (a, b) \in S \}.$$

7. Definition. The value of a relation $S \subseteq A \times B$ at a given element $a \in A$ is given by
$$S(a) = \{ b \in B \mid (a, b) \in S \}.$$

8. Definition. A relation $S \subseteq A \times B$ is called a function if $|S(a)| = 1$ for all $a \in D_S$.

9. Definition. A function relation $S$ is a (complete) function if $D_S = A$, and a partial function if $D_S \subseteq A$ and $D_S \neq A$.

10. Example. Let $S_1 = \{ (0,0), (1,1), (2,4), (3,9), (4,16) \}$. By definition $D_{S_1} = \{0, 1, 2, 3, 4\}$, $R_{S_1} = \{0, 1, 4, 9, 16\}$ and $S_1(i) = \{i^2\}$ ($i \in D_{S_1}$). Hence the relation $S_1$ is a function.

11. Example. Let $S_2$ = {(small, short), (medium, middle), (medium, average), (large, tall)}. For $S_2$, $D_{S_2}$ = {small, medium, large} and $R_{S_2}$ = {short, middle, average, tall}. Since $S_2$(medium) = {middle, average}, the relation $S_2$ is not a function.

If a relation $S$ is a function, then $S(a)$ is either empty or a one-element set.
12. Example. The relation
$$S_1 = \left\{ (x, x^2) \mid x \in \mathbb{R} \right\} \subseteq \mathbb{R} \times \mathbb{R}$$
is a (complete) function, because $D_{S_1} = \mathbb{R}$. But the relation
$$S_2 = \left\{ (x, \sqrt{x}) \mid x \in \mathbb{R},\ x \geq 0 \right\} \subseteq \mathbb{R} \times \mathbb{R}$$
is only a partial function, because $S_2(x) = \emptyset$ for all $x < 0$.

We can think of a relation $S \subseteq A \times B$ as a point-set function $S : A \to 2^B$, because for all $a \in D_S$, $S(a) \subseteq B$, that is, $S(a) \in 2^B$.
13. Definition. A function $f : A \to B$ is a finite function if the sets $A$ and $B$ are finite sets.

14. Definition. Functions $f : \{0,1\}^n \to \{0,1\}^m$ are called binary functions.

15. Definition. Functions $f : \{0,1\}^n \to \{0,1\}$ are called Boolean functions.

The last definition means that $f(x_1, x_2, \ldots, x_n) \in \{0,1\}$ for $x_i \in \{0,1\}$, $i = 1, \ldots, n$.
The following truth tables define four basic Boolean functions:

 x  y | x ∧ y      x  y | x ∨ y      x  y | x ⊕ y      x | ¬x
 0  0 |   0        0  0 |   0        0  0 |   0        0 |  1
 0  1 |   0        0  1 |   1        0  1 |   1        1 |  0
 1  0 |   0        1  0 |   1        1  0 |   1
 1  1 |   1        1  1 |   1        1  1 |   0

If 0 means false and 1 means true, then these tables define the logical functions and (AND, $x \wedge y$), or (OR, $x \vee y$), exclusive or (XOR, $x \oplus y$) and negation (NOT, $\bar{x}$). The negation function is also denoted by $\neg x$.
Let $x, y \in \{0,1\}$ be two logical variables (or statements). Then
$$\bar{x} = 1 - x, \qquad
x \wedge y = \begin{cases} 1, & \text{if } x = y = 1 \\ 0, & \text{otherwise,} \end{cases} \qquad
x \vee y = \begin{cases} 0, & \text{if } x = y = 0 \\ 1, & \text{otherwise,} \end{cases} \qquad
x \oplus y = \begin{cases} 1, & \text{if } x + y = 1 \\ 0, & \text{otherwise.} \end{cases}$$
The XOR function can also be written in the form
$$x \oplus y \equiv x + y \pmod{2}.$$
16. Definition. Expressions of logical variables using only the operations $\neg$, $\wedge$, $\vee$ are called Boolean polynomials.

17. Theorem. Every Boolean function can be represented by a Boolean polynomial.
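The standard proof of Theorem 17 builds a disjunctive normal form: one AND-term per input vector on which the function equals 1, joined by OR. A minimal Python sketch of this construction (the helper `to_dnf` is an illustration, not from the text):

```python
from itertools import product

def to_dnf(f, n):
    """Represent a Boolean function f: {0,1}^n -> {0,1} as a Boolean
    polynomial in disjunctive normal form, using only NOT (1-x),
    AND (all) and OR (any)."""
    terms = [xs for xs in product((0, 1), repeat=n) if f(*xs)]
    def poly(*xs):
        # OR over the terms; each term ANDs x_i (bit = 1) or its negation.
        return int(any(all((x if bit else 1 - x) for x, bit in zip(xs, t))
                       for t in terms))
    return poly

xor = lambda x, y: x ^ y
p = to_dnf(xor, 2)
print(all(p(x, y) == xor(x, y) for x, y in product((0, 1), repeat=2)))  # True
```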
2.3. Asymptotic characterization of functions
18. Definition. $f(n) = O(g(n))$ (i.e., $f(n) \in O(g(n))$) if there exist constants $c, n_0 > 0$ such that $|f(n)| \leq c\,|g(n)|$ holds for every $n \geq n_0$.

[Figure: the $f(x) = O(g(x))$ relation, with $f(x)$ eventually bounded by $c \cdot g(x)$.]
19. Example. $\log n = O(n)$. We prove by induction that $\log n \leq n$. For $n = 1$, $\log 1 = 0 \leq 1$. Assume that for $n \geq 1$ the claim is true: $\log n \leq n$. Then
$$\log(n+1) \leq \log(2n) = \log 2 + \log n \leq 1 + n.$$
20. Example. $2^{n+1} = O(3^n/n)$. We prove by induction that $n \geq 7 \Rightarrow 2^{n+1} \leq 3^n/n$. For $n = 7$: $2^8 = 256 \leq 3^7/7 \approx 312.428$. Assume that $n \geq 7$ and $2^{n+1} \leq 3^n/n$. Then
$$2^{n+2} = 2 \cdot 2^{n+1} \leq 2\,\frac{3^n}{n} = \frac{2(n+1)}{3n} \cdot \frac{3^{n+1}}{n+1} \leq \frac{3^{n+1}}{n+1},$$
because $\frac{2(n+1)}{3n} < 1$.
We can perform the following operations with the $O$ notation.

21. Lemma. If $f_1(n) \in O(g_1(n))$ and $f_2(n) \in O(g_2(n))$, then
$$f_1(n) + f_2(n) = O(|g_1(n)| + |g_2(n)|),$$
$$f_1(n) + f_2(n) = O(\max\{|g_1(n)|, |g_2(n)|\}).$$

Proof: assume that for $n \geq n_0$, $|f_1(n)| \leq c_1 |g_1(n)|$ and $|f_2(n)| \leq c_2 |g_2(n)|$. Then
$$|f_1(n) + f_2(n)| \leq |f_1(n)| + |f_2(n)| \leq (c_1 + c_2) \max\{|g_1(n)|, |g_2(n)|\}.$$
22. Lemma. If $f_1(n) \in O(g_1(n))$ and $f_2(n) \in O(g_2(n))$, then $f_1(n) f_2(n) = O(g_1(n) g_2(n))$.

23. Lemma. If $f(n) \in O(g(n))$, then $cf(n) \in O(g(n))$.

Examples:
(a) $f(x) = x^4 - 3x^3 + 5x - 1973 = O(x^4)$;
(b) $(n+1)^2 = n^2 + O(n)$;
(c) $f(n) = 4 \log n - 3 (\log n)^2 + n^2 = O(n^2)$.

The notation $f(n) = O(1)$ means that $f(n)$ is bounded from above.
24. Definition. $f(n) = \Omega(g(n))$ (i.e., $f(n) \in \Omega(g(n))$) if there exist constants $c, n_0 > 0$ such that $|f(n)| \geq c\,|g(n)|$ holds for all $n \geq n_0$.

[Figure: the $f(x) = \Omega(g(x))$ relation, with $f(x)$ eventually above $c \cdot g(x)$.]
25. Example. $(1/2) n^2 - 5n = \Omega(n^2)$, because
$$\left( \frac{1}{2} n^2 - 5n \right) / n^2 = \frac{1}{2} - \frac{5}{n} \geq \frac{1}{4} \qquad (n \geq 20).$$
[Figure: the sets $O(g)$ (functions growing as $g$ or slower), $\Omega(g)$ (functions growing as $g$ or faster) and $\Theta(g)$ (functions growing as $g$).]
26. Definition. $f(n) = o(g(n))$ (i.e., $f(n) \in o(g(n))$) if $g(n)$ is zero only at finitely many points and
$$\lim_{n \to \infty} \frac{f(n)}{g(n)} = 0.$$

[Figure: the $f(x) = o(g(x))$ asymptotics, showing $\sqrt{x}$, $\log x$ and $\log x / \sqrt{x} \to 0$ on a logarithmic $x$-axis.]
27. Example. $\log n = o(n)$ and $n \log n = o(n^2)$, but also $n \log n = O(n^2)$ and $n \log n = O(n^3)$. Which estimate is better?

28. Example. $2n^2 = O(n^2)$, but $2n^2 \neq o(n^2)$.

If $f = o(g)$, then $f = O(g)$ also holds.
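The difference between the $O$ and $o$ estimates in Examples 27-28 can be illustrated numerically; a small sketch:

```python
import math

# Ratios f(n)/g(n) for growing n: n*log(n)/n^2 tends to 0 (n log n = o(n^2)),
# while 2n^2/n^2 stays constant at 2 (2n^2 = O(n^2), but 2n^2 != o(n^2)).
ratios = [(n, n * math.log(n) / n**2, 2 * n**2 / n**2)
          for n in (10, 1_000, 100_000)]
for n, r1, r2 in ratios:
    print(n, r1, r2)
```

The first ratio shrinks toward 0 while the second never does, matching the limit in Definition 26.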
29. Definition. $f(n) \sim g(n)$ if
$$\lim_{n \to \infty} \frac{f(n)}{g(n)} = 1.$$

30. Example. $\sqrt{n} + \log n \sim \sqrt{n}$.
2.4. Graphs

31. Definition. A graph $G$ consists of a set $V$ of elements called nodes (or vertices or points) and a set $E$ of edges such that each edge $e \in E$ is identified with a unique (unordered) pair $[u, v]$ of nodes in $V$, denoted by $e = [u, v]$.

The graph $G$ is given by the pair $G = (V, E)$. Suppose that $e = [u, v] \in E$. The nodes $u$ and $v$ are called the endpoints of $e$, and $u$ and $v$ are said to be adjacent nodes or neighbors.

An edge $[u, u] \in E$ is called a loop. Distinct edges $e, e' \in E$ are called multiple edges if they connect the same endpoints, that is, $e = [u, v]$ and $e' = [u, v]$.

Graphs free of loops and multiple edges are called simple graphs; otherwise they are called multigraphs.
32. Definition. The degree $\delta(u)$ of a node $u \in V$ is the number of edges containing it. If $\delta(u) = 0$, then $u$ is called an isolated node.

33. Definition. The graph $G$ is empty if $E = \emptyset$. The graph is complete if every node $u$ in $G$ is adjacent to every other node $v$ in $G$. A complete graph with $n$ nodes has $\frac{n(n-1)}{2}$ edges.
34. Definition. A path of length $n$ from a node $u$ to a node $v$ is defined as a sequence of connected edges $\{[v_{i-1}, v_i]\}_{i=1}^{n}$ such that $v_0 = u$ and $v_n = v$. The path is closed if $v_0 = v_n$. The path is simple if all the nodes are distinct, except for the possibility $v_0 = v_n$. The closed path is called a cycle.

35. Definition. A graph $G$ is connected if and only if there is a simple path between any two nodes in $G$.

36. Corollary. If a graph is not connected, then it has at least one node with no path to some other nodes.

37. Definition. The nodes reachable by paths from a given node, together with the edges, form a connected component of the graph.

38. Definition. A connected graph without any cycles is called a tree. If a tree has $n$ nodes, then it has $n - 1$ edges.
39. Definition. A graph $G$ is labeled if its edges are assigned data. In particular, $G$ is said to be weighted if each edge $e$ in $G$ is assigned a nonnegative numerical value $w(e)$ called the weight or length of $e$.

40. Definition. A graph $G$ is finite if $V$ and $E$ are finite sets.

41. Definition. The graph $G_s = (V_s, E_s)$ is a subgraph of $G = (V, E)$ if $V_s \subseteq V$ and $E_s \subseteq E$.
[Figure: examples of undirected graphs, including a weighted graph.]
42. Definition. A graph $G = (V, E)$ is called directed (or a digraph) if all its edges are directed. Then $E$ is a set of ordered pairs, and an edge $e = [u, v] \in E$ is called an arc: node $u$ is its starting point and node $v$ is its endpoint. The indegree $\delta_{in}(u)$ of a node $u \in V$ is the number of edges ending in $u$. The outdegree $\delta_{out}(u)$ of a node $u \in V$ is the number of edges beginning at $u$.

A node $u \in V$ is called a source if $\delta_{out}(u) > 0$ but $\delta_{in}(u) = 0$. A node $u \in V$ is called a sink if $\delta_{out}(u) = 0$ but $\delta_{in}(u) > 0$.

The notions of directed path, simple path and cycle are similar to those of undirected graphs, with the exception that the direction of each edge in the path must agree with the direction of the path (cycle).

The node $v$ is reachable from node $u$ if there exists a (directed) path from $u$ to $v$.

43. Definition. A directed graph $G = (V, E)$ is said to be strongly connected if for each pair $u, v$ of nodes in $G$ there is a path from $u$ to $v$ and from $v$ to $u$.

44. Definition. A directed graph is acyclic if it contains no directed cycles.
[Figure: a directed graph.]
Graphs and relations are related to each other:

1. Let $G = (V, E)$ be a directed graph. This corresponds to the relation $R \subseteq V \times V$ defined by
$$R = \{ (u, v) \mid e = [u, v] \in E \}.$$

2. Let $R \subseteq A \times B$ be a relation. This corresponds to the graph $(V, E)$ with
$$V = A \cup B, \qquad E = \{ e = [u, v] \mid (u, v) \in R \}.$$
Logical circuits are acyclic directed graphs. The next figure shows two such examples.

[Figure: two logical circuits drawn as acyclic directed graphs with gate nodes $v_1, \ldots, v_8$.]
2.5. Cardinality of sets

By the cardinality of a set $A$ we mean the "number" of its elements, denoted by $|A|$. If the number of $A$'s elements is finite, then $|A|$ is the actual number of elements. If the number of $A$'s elements is infinite, we define the meaning of $|A|$ by a classification.

45. Definition. Two sets $A$ and $B$ are of equal cardinality ($|A| = |B|$) if there exists a bijective mapping of $A$ onto $B$, that is,
(i) a point $b \in B$ is assigned to each $a \in A$,
(ii) the elements of $B$ assigned to different elements of $A$ are different,
(iii) every point of $B$ is assigned to some point of $A$.

Equal cardinality $|A| = |B|$ is an equivalence relation:
$$|A| = |A|; \qquad |A| = |B| \Rightarrow |B| = |A|; \qquad |A| = |B| \wedge |B| = |C| \Rightarrow |A| = |C|;$$
which induces a natural classification of sets.

Sets with identical cardinality belong to the same class. It is clear that the numbers of elements of the sets of a given class are the same, and the numbers of elements of sets belonging to different classes are different. So the cardinality of a set is the "name" of the class it belongs to.

This concept is in agreement with the definition of cardinality for finite sets. The equal cardinality relation puts the finite sets into the classes of sets with $n$ elements, where $n = 0, 1, 2, \ldots$. We can identify these classes with the finite numbers of their elements.
46. Definition. $|A| \leq |B|$ if there is a subset $C \subseteq B$ for which $|A| = |C|$.

If $A \subseteq B$ then $|A| \leq |B|$. The following relations hold:
a) $|A| \leq |B| \wedge |B| \leq |C| \Rightarrow |A| \leq |C|$;
b) $|A| \leq |B| \wedge |B| \leq |A| \Rightarrow |A| = |B|$.

Georg Cantor proved that the cardinalities of any two sets can be ordered. The smallest infinite cardinality is the cardinality of the set of natural numbers $\mathbb{N}$, denoted by $\aleph_0$ ($\aleph$ = aleph).

47. Definition. A set $A$ has countably infinite cardinality if its cardinality is equal to the cardinality of the set $\mathbb{N}$ of natural numbers.

The union of two countably infinite sets is also countably infinite. Any infinite subset of a countably infinite set is also countably infinite. Thus every infinite subset of $\mathbb{N}$ also has the cardinality $\aleph_0$.
48. Example. The set of even numbers can be written in the form
$$B = \{ n = 2k \mid k = 1, 2, \ldots \},$$
where the mapping $k \to 2k$ is a bijection between $\mathbb{N}$ and $B$.

It is easy to prove that for the sets $\mathbb{N} \subset \mathbb{N}_0 \subset \mathbb{Z} \subset \mathbb{Q}$ we have
$$|\mathbb{N}| = |\mathbb{N}_0| = |\mathbb{Z}| = |\mathbb{Q}| = \aleph_0.$$
The cardinality of the interval $(-1, 1) \subseteq \mathbb{R}$ is equal to the cardinality of $\mathbb{R}$. The map
$$f : x \to \frac{x}{1 - |x|}$$
is a bijection between $(-1, 1)$ and $\mathbb{R}$. This function is strictly monotone increasing,

[Figure: the graph of $f(x) = x/(1 - |x|)$ on $(-1, 1)$.]

so that for any $-1 < x_1 < x_2 < 1$,
$$\frac{x_1}{1 - |x_1|} < \frac{x_2}{1 - |x_2|}.$$
Note that for $0 \leq x < 1$, $f(x) \geq 0$, and for $-1 < x < 0$, $f(x) < 0$. Assume that $y \in \mathbb{R}$. If $y \geq 0$, then $f(x) = \frac{x}{1-x}$ and the equation $y = \frac{x}{1-x}$ gives the unique solution $x = \frac{y}{1+y}$. If $y < 0$, then $f(x) = \frac{x}{1+x}$ and $y = \frac{x}{1+x}$ gives $x = \frac{y}{1-y}$.

The cardinality of a proper subset $B \subset A$ ($B \neq A$) of an infinite set $A$ can be equal to the cardinality of the set $A$. This property may hold only for infinite sets.
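A numerical check (not a proof) of the bijection $f(x) = x/(1 - |x|)$ and its piecewise inverse:

```python
# f maps (-1, 1) onto R; f_inv is its piecewise inverse derived above.
def f(x):
    assert -1 < x < 1
    return x / (1 - abs(x))

def f_inv(y):
    # x = y/(1+y) for y >= 0, and x = y/(1-y) for y < 0.
    return y / (1 + y) if y >= 0 else y / (1 - y)

ys = [-1000.0, -2.5, 0.0, 0.5, 1000.0]
print(all(abs(f(f_inv(y)) - y) < 1e-9 for y in ys))  # True
```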
49. Definition. $|A| < |B|$ if $|A| \leq |B|$ and $|A| \neq |B|$.

The cardinality of the set $\mathbb{R}$ of real numbers is called the cardinality of the continuum. It is true that $|\mathbb{R}| > \aleph_0 = |\mathbb{N}|$ (Georg Cantor).

50. Theorem (Cantor). $|X| < |2^X|$.

51. Corollary. $|2^{\mathbb{N}}| > |\mathbb{N}|$.

The cardinality of the power set $2^{\mathbb{N}}$ is denoted by $\aleph_1$.

Continuum hypothesis (Cantor): there is no set whose cardinality is strictly between that of the integers and the real numbers, that is, between $\aleph_0$ and $|\mathbb{R}|$. The hypothesis can also be given in the form $|\mathbb{R}| = |2^{\mathbb{N}}|$.

Gödel (1938): the continuum hypothesis cannot be disproved in the Zermelo-Fraenkel (ZF) axiomatic system. Paul Cohen (1963): the continuum hypothesis cannot be proved in the Zermelo-Fraenkel (ZF) axiomatic system. Consequently the problem is undecidable in the ZF axiomatic system.

An enumerable, or countable, set is one whose members can be enumerated: arranged in a list with a first entry, a second entry, and so on, so that every member of the set appears sooner or later on the list.

52. Example. The set of positive integers is enumerable. A possible list is
$$1, 2, 3, \ldots, n, n+1, \ldots$$

The set $\emptyset$ is considered enumerable. A list that enumerates a set may be finite or unending. An infinite set that is enumerable is said to be enumerably infinite (countably infinite).
The positive integers can be arranged in a single infinite list, but the following is not acceptable as a list of the positive integers:
$$1, 3, 5, 7, \ldots, 2, 4, 6, \ldots$$
Here all the odd positive integers are listed first, then all the even numbers. This will not do: in an acceptable list, each item must appear sooner or later as the $n$-th entry, for some finite $n$.

53. Example. The set $\mathbb{N}^2 = \mathbb{N} \times \mathbb{N}$ is enumerable (countably infinite). The set consists of the pairs $(i, j)$ ($i, j \in \mathbb{N}$). One possible enumeration:
$$(1,1), (1,2), (2,1), (1,3), (2,2), (3,1), (1,4), (2,3), (3,2), (4,1), \ldots$$
(1,1)  (1,2)  (1,3)  (1,4)  (1,5)  ...
(2,1)  (2,2)  (2,3)  (2,4)  (2,5)  ...
(3,1)  (3,2)  (3,3)  (3,4)  (3,5)  ...
(4,1)  (4,2)  (4,3)  (4,4)  (4,5)  ...
(5,1)  (5,2)  (5,3)  (5,4)  (5,5)  ...
 ...    ...    ...    ...    ...

Here the listing principle is that we order the pairs $(i, j)$ into an infinite matrix ($i$ = row index, $j$ = column index), and then we make the list along the skew diagonals of this matrix.
Observe that the sum $i + j$ of the pairs $(i, j)$ in a skew diagonal is constant: it is 2 in the first diagonal, 3 in the second diagonal, 4 in the third diagonal, and so on. It is clear that any pair $(m, n)$ will appear in the list as entry $j(m, n)$.

The sum of the pairs $(i, j)$ in the $i$-th skew diagonal is $i + 1$, and the number of such pairs is $i$. So the sum of the pair $(m, n)$ is $m + n$, which puts the element in the skew diagonal $m + n - 1$.
The number of elements in the first $m + n - 2$ skew diagonals is
$$1 + 2 + \cdots + (m + n - 2) = \frac{(m + n - 2)(m + n - 1)}{2}.$$
The pair $(m, n)$ will be the $m$-th element in its diagonal. Hence the index of the pair $(m, n)$ in the above enumeration is
$$j(m, n) = \frac{(m + n - 2)(m + n - 1)}{2} + m = \frac{m^2 + 2mn + n^2 - m - 3n + 2}{2}.$$
54. Lemma. If the sets $A$ and $B$ are enumerable (countably infinite), then $A \times B$ is also enumerable (countably infinite).

Using the above example, we first list the elements of $A$ and $B$ separately:
$$a_1, a_2, \ldots, a_m, \ldots \qquad \text{and} \qquad b_1, b_2, \ldots, b_n, \ldots$$
Then we list the pairs $(a_i, b_j)$ according to the indices $(i, j)$.

Similarly, we can also prove that $\mathbb{N}^k$ is enumerable (countably infinite).
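The diagonal enumeration and the index formula $j(m, n)$ can be checked with a short sketch (the function names are ours, not from the text):

```python
from itertools import islice

def diagonal_pairs():
    """Enumerate N x N along the skew diagonals: (1,1), (1,2), (2,1), ..."""
    s = 2                        # i + j is constant on each skew diagonal
    while True:
        for i in range(1, s):    # the diagonal with sum s holds s - 1 pairs
            yield (i, s - i)
        s += 1

def j_index(m, n):
    """1-based index of the pair (m, n) in the enumeration."""
    return (m + n - 2) * (m + n - 1) // 2 + m

listing = list(islice(diagonal_pairs(), 20))
print(listing[:6])  # [(1, 1), (1, 2), (2, 1), (1, 3), (2, 2), (3, 1)]
print(all(listing[j_index(m, n) - 1] == (m, n) for (m, n) in listing))
```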
Chapter 3
Formal languages
3.1. Languages and words
55. Definition. An arbitrary finite set $\Sigma \neq \emptyset$ is called an alphabet. The elements of $\Sigma$ are called letters (symbols).
Examples:
- $\Sigma_{bool} = \{0, 1\}$, the Boolean alphabet;
- $\Sigma_{lat} = \{a, b, c, \ldots, z\}$, the English alphabet (26 letters) [the Latin alphabet has 23 letters];
- $\Sigma_{keyboard} = \Sigma_{lat} \cup \{A, B, \ldots, Z, \sqcup, >, <, (, ), \ldots, !\}$, the keyboard alphabet ($\sqcup$ is the space symbol);
- $\Sigma_m = \{0, 1, 2, \ldots, m-1\}$ ($m \geq 1$), the number system with base $m$;
- $\Sigma_{logic} = \{0, 1, x, (, ), \wedge, \vee, \neg\}$, the alphabet of Boolean formulae.
56. Definition. A finite sequence $w = x_1 x_2 \ldots x_n$ of letters of the alphabet $\Sigma$ is called a word. The length $|w|$ of $w$ is the number of letters in $w$.

The word $w = x_1 x_2 \ldots x_n$ can be interpreted as an element $(x_1, x_2, \ldots, x_n)$ of the set $\Sigma^n$ from which we deleted the brackets and the commas. The length of the word $w = x_1 x_2 \ldots x_n$ is $|w| = n$.
Using "words" we can represent various objects: numbers, formulae, graphs and programs.
57. Example. The word
$$x = x_1 x_2 \ldots x_n, \qquad x_i \in \Sigma_{bool} \ (i = 1, 2, \ldots, n)$$
can be considered as the representation of the non-negative binary number $N(x) = \sum_{i=1}^{n} 2^{n-i} x_i$.
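The value $N(x)$ of Example 57 can be computed directly; a small sketch:

```python
# N(x) = sum_{i=1}^{n} 2^(n-i) * x_i: the usual binary value of the word x.
def N(x):
    n = len(x)
    return sum(2 ** (n - i) * int(x[i - 1]) for i in range(1, n + 1))

print(N("1011"))                      # 11
print(N("1011") == int("1011", 2))    # True: agrees with base-2 conversion
```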
58. Definition. Let $G = (V, E)$ be a finite directed graph, in which $V$ is the set of vertices and $E \subseteq \{ (u, v) \mid u, v \in V,\ u \neq v \}$ is the set of edges. Let $|V| = n$ be the number of vertices. The adjacency matrix $M_G = [a_{ij}]_{i,j=1}^{n}$ is defined by
$$a_{ij} = \begin{cases} 1, & \text{if } (v_i, v_j) \in E \\ 0, & \text{if } (v_i, v_j) \notin E. \end{cases}$$

Consider the following graph:

[Figure: a directed graph on the vertices $v_1, v_2, v_3, v_4$.]

The adjacency matrix of the graph is
$$M_G = \begin{bmatrix} 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}.$$
Using the alphabet $\{0, 1, \#\}$ we can represent the graph by the word
$$0011\#0011\#0101\#0000\#.$$
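The encoding of an adjacency matrix as a word over $\{0, 1, \#\}$ is easy to sketch (the helper `graph_to_word` is ours, not from the text):

```python
# Encode a directed graph's adjacency matrix as a word over {0, 1, #},
# row by row, each row terminated by '#', as in the example above.
def graph_to_word(M):
    return "".join("".join(str(a) for a in row) + "#" for row in M)

M = [[0, 0, 1, 1],
     [0, 0, 1, 1],
     [0, 1, 0, 1],
     [0, 0, 0, 0]]
print(graph_to_word(M))  # 0011#0011#0101#0000#
```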
59. Definition. The concatenation of the words $x = x_1 x_2 \ldots x_k \in \Sigma^k$ and $y = y_1 y_2 \ldots y_l \in \Sigma^l$ is given by the word
$$xy = x_1 x_2 \ldots x_k y_1 y_2 \ldots y_l \in \Sigma^{k+l}.$$

60. Example. For $\Sigma = \{0, 1, a, b\}$, $x = 0aa1bb$ and $y = 111b$: $xy = 0aa1bb111b$.

The concatenation operation is associative:
$$x(yz) = (xy)z.$$
The empty word is denoted by $\varepsilon$. Its length is $|\varepsilon| = 0$, and for any word $x$,
$$x\varepsilon = \varepsilon x = x. \qquad (3.1)$$
61. Definition. The set of all words over $\Sigma$ is denoted by $\Sigma^*$. The empty word $\varepsilon$ is included.

62. Definition. Let $\Sigma$ be an alphabet. For every $x \in \Sigma^*$ and integer $i \geq 1$, let $x^0 = \varepsilon$ and $x^i = x x^{i-1}$.

63. Example. $aabbaaaaaa = a^2 b^2 a^6 = a^2 b^2 (aa)^3$.
64. Definition. Let $\Sigma = \{s_1, s_2, \ldots, s_n\}$ be an alphabet, $n \geq 1$, and suppose that the elements of $\Sigma$ are (linearly) ordered, that is, $s_1 < s_2 < \cdots < s_n$. The lexicographic (or canonical) ordering $<$ of $\Sigma^*$ is defined as follows:
$$u < v \iff (|u| < |v|) \vee (|u| = |v| \wedge u = x s_i u' \wedge v = x s_j v' \wedge i < j),$$
where $x, u', v' \in \Sigma^*$.

Using the lexicographic ordering we can show that $\Sigma^*$ is enumerable (countably infinite). The list is given by
$$s_1, s_2, \ldots, s_n, s_1 s_1, s_1 s_2, \ldots, s_1 s_n, \ldots, s_n s_1, \ldots, s_n s_n, \ldots$$
So we first take the words of length 1 (the letters of $\Sigma$) in their ordering, then the words of length 2 according to the lexicographic ordering, then the words of length 3, and so on.

The number of all words of length $k$ is $n^k$. Hence a word $x \in \Sigma^*$ of length $\ell$ must be in the list between the indices $\sum_{k=1}^{\ell-1} n^k + 1$ and $\sum_{k=1}^{\ell} n^k$. Hence $\Sigma^*$ is enumerable (countably infinite).
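The canonical listing of $\Sigma^*$ (without the empty word) can be generated as follows; a sketch assuming the letters are given in their linear order:

```python
from itertools import count, islice, product

def canonical_words(sigma):
    """Enumerate sigma* in the canonical ordering: shorter words first,
    words of equal length in lexicographic (dictionary) order."""
    for k in count(1):
        for letters in product(sigma, repeat=k):
            yield "".join(letters)

sigma = ("a", "b")
words = list(islice(canonical_words(sigma), 14))
print(words)  # ['a', 'b', 'aa', 'ab', 'ba', 'bb', 'aaa', ...]
```

For $n = 2$ the first $2 + 4 + 8 = 14$ entries are exactly the words of length at most 3.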
65. Definition. Any subset $L \subseteq \Sigma^*$ is called a language. The set $L^C = \Sigma^* \setminus L$ is the complement language of the language $L$. $L_\emptyset = \emptyset$ is the empty language. $L_\varepsilon = \{\varepsilon\}$ is a language consisting only of the empty word.

66. Definition. Let $L_1, L_2 \subseteq \Sigma^*$ be two languages. The concatenation of the two languages is
$$L_1 L_2 = \{ vw \mid v \in L_1 \text{ and } w \in L_2 \}.$$

67. Definition. Let $L \subseteq \Sigma^*$ be a language, $L^0 = L_\varepsilon$, $L^{i+1} = L^i L$ ($i \geq 0$), and
$$L^* = \cup_{i \in \mathbb{N}_0} L^i, \qquad L^+ = \cup_{i \in \mathbb{N}} L^i.$$
The set $L^*$ is said to be the Kleene closure of $L$.

Observe that $\Sigma^*$ is the Kleene closure of the alphabet $\Sigma$.
3.2. Generative grammars

68. Definition. A grammar $G$ is defined as a quadruple
$$G = (N, T, s, R)$$
where
1) $N \neq \emptyset$ is a finite set of objects called variables or non-terminal symbols;
2) $T \neq \emptyset$ is a finite set of objects called terminal symbols;
3) $N \cap T = \emptyset$;
4) $s \in N$ is a special symbol called the start variable (symbol);
5) $R$ is a finite set of substitution (rewriting) rules of the form $u \to v$ ($u, v \in (N \cup T)^*$), where $u$ contains at least one symbol of $N$.

We can consider the set $R$ as a finite subset of $\Sigma^+ \times \Sigma^*$, where $\Sigma = N \cup T$ is the complete alphabet. Hence the rewriting rule $u \to v$ is identified with the pair $(u, v) \in R$.
69. Example. Suppose that $N = \{s\}$, where $s$ is the start symbol, $T = \{a, b\}$ is the set of terminal symbols, and $R$ is the set of rewriting rules $s \to asb$, $s \to ab$. Starting from $s$ and using the two rewriting rules we can produce words of the form $a^k b^k$ ($k \geq 1$):
$$s \to asb \to a(asb)b \to \cdots \to a^k s b^k \to a^k (ab) b^k = a^{k+1} b^{k+1}.$$
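The derivations of Example 69 can be simulated mechanically; a sketch (the helper `derive` is ours, not from the text):

```python
# Simulate the grammar with rules s -> asb and s -> ab: applying the first
# rule k times and then the second yields the word a^(k+1) b^(k+1).
def derive(k):
    w = "s"
    for _ in range(k):
        w = w.replace("s", "asb", 1)  # rule s -> asb
    return w.replace("s", "ab", 1)    # terminating rule s -> ab

print([derive(k) for k in range(3)])  # ['ab', 'aabb', 'aaabbb']
```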
70. Definition. Let $G = (N, T, s, R)$ be a grammar. If $w_1 \in (N \cup T)^+$, $w_1 = p_1 u p_2$ and $R$ contains the rewriting rule $u \to v$, then $w_1$ can be substituted with the word $w_2 = p_1 v p_2$. In such a case $w_2$ can be directly derived from $w_1$ in grammar $G$ (notation: $w_1 \Rightarrow_G w_2$). The word $v \in (N \cup T)^*$ can be derived from the word $u \in (N \cup T)^+$ (notation: $u \Rightarrow_G^* v$) if there is an integer $k$ and words $u_0, u_1, \ldots, u_k \in (N \cup T)^*$ such that
$$u = u_0 \Rightarrow_G u_1 \Rightarrow_G \cdots \Rightarrow_G u_k = v.$$
This chain is called a derivation of length $k$.

71. Example.
$$s \Rightarrow A \Rightarrow aAb \Rightarrow aaAbb \Rightarrow aaabbb$$
is a derivation of length 4.

72. Definition. Let $G = (N, T, s, R)$ be a grammar. The language generated by the grammar $G$ is defined by
$$L(G) = \{ w \in T^* \mid s \Rightarrow_G^* w \}. \qquad (3.2)$$
$L(G)$ consists of all words of $T^*$ that can be obtained from $s$ with a derivation in $G$.

73. Example. For $G = (\{s\}, \{a, b\}, s, \{s \to asb,\ s \to ab\})$, $L(G) = \{ a^n b^n \mid n \geq 1 \}$.
74. Example. Set $G_1 = (N_1, T_1, s, R_1)$ with $N_1 = \{s, B, C\}$, $T_1 = \{a, b, c\}$ and $R_1$ defined by the rewriting rules

a) $s \to asBC$    d) $aB \to ab$    g) $cC \to cc$
b) $s \to aBC$     e) $bB \to bb$
c) $CB \to BC$     f) $bC \to bc$

Consider the following chains of derivation:
$$s \Rightarrow a(s)BC \Rightarrow aaB(CB)C \Rightarrow a(aB)BCC \Rightarrow aa(bB)CC \Rightarrow aab(bC)C \Rightarrow aabb(cC) \Rightarrow aabbcc,$$
and
$$s \Rightarrow a(s)BC \Rightarrow aa(s)BCBC \Rightarrow aaaB(CB)CBC \Rightarrow aaaBBC(CB)C \Rightarrow aaaBB(CB)CC \Rightarrow aa(aB)BBCCC \Rightarrow aaa(bB)BCCC \Rightarrow aaab(bB)CCC \Rightarrow aaabb(bC)CC \Rightarrow aaabbb(cC)C \Rightarrow aaabbbc(cC) \Rightarrow aaabbbccc.$$

Prove that $L(G_1) = \{ a^n b^n c^n \mid n \geq 1 \}$.
75. Definition. Given a generative grammar $G = (N, T, s, R)$ and a function $f : \Sigma^* \to \Sigma^*$ ($\Sigma = N \cup T$). The grammar $G$ "computes" $f$ if for all $w, v \in \Sigma^*$,
$$sws \Rightarrow_G^* v \iff v = f(w)$$
holds. A function $f : \Sigma^* \to \Sigma^*$ can be computed grammatically if there exists a generative grammar $G$ which computes $f$.
3.3. Classi…cation of generative languages
This classi…cation was introduced by N. Chomsky and it is called the Chomsky-hierarchy.
76. De…nition. A grammar G = (N ; T ; s; R) is type-0 (unrestricted), if there is no restriction
on the rewriting rules R.
The class of languages of type-0 is denoted by L0 .
77. Definition. A grammar $G = (N, T, s, R)$ is of type 1 (context-sensitive) if for every rewriting rule $(u, v) \in R$, $|u| \leq |v|$ holds.

The class of languages of type 1 is denoted by $\mathcal{L}_1$; these are called context-sensitive languages.

Prove that $L(G_1)$ is context-sensitive.
78. Definition. A grammar $G = (N, T, s, R)$ is of type 2 (context-free) if for every rewriting rule $(u, v) \in R$, $u \in N$ holds.

The class of languages of type 2 is denoted by $\mathcal{L}_2$; these are called context-free languages.

79. Example. $G = (\{s, A\}, \{a, b\}, s, \{s \to asb,\ s \to \varepsilon\})$ is context-free and generates the language $L(G) = \{ a^n b^n \mid n \geq 0 \}$.

80. Example. Consider the grammar $G_3$ defined by the following terminal and nonterminal symbols and rewriting rules:

a) $s \to cMcNc$    d) $N \to bNb$
b) $M \to aMa$      e) $N \to c$
c) $M \to c$

Prove that $G_3$ is context-free and $L(G_3) = \{ c a^n c a^n c b^m c b^m c \mid n, m \geq 0 \}$.
The context-free languages capture important properties of several programming languages.

81. Definition. A grammar $G = (N, T, s, R)$ is of type 3 (regular) if every rewriting rule $(u, v) \in R$ has the form $a \to w$ or $a \to wb$, where $a, b \in N$ and $w \in T^*$.

The class of languages of type 3 is denoted by $\mathcal{L}_3$; these are called regular languages.
82. Example. Let G4 = (N4, T4, s, R4), where N4 = {s, A, B}, T4 = {0, 1} and R4 is given by
a) s → 0A
b) s → 0
c) A → 1B
d) B → 0A
e) B → 0
This grammar is regular. Using the rewriting rules we can obtain the following derivations: s ⇒ 0, s ⇒ 0A ⇒ 01B ⇒ 010, s ⇒ 0A ⇒ 01B ⇒ 010A ⇒ 0101B ⇒ 01010, etc. The language consists of the words (01)^k 0 (k ≥ 0).
Prove that L(G4) = {(01)^k 0 | k ≥ 0}.
Let G = ({s, a, b}, {1}, s, R), where R = {s → 1a, a → 1b, b → 1a, a → 1, s → ε}. Prove that G is regular and L(G) = {1^(2n) | n ≥ 0}!
Let G = ({s}, {a, b}, s, R), where R is given by s → asb, s → ss, s → ε. What is L(G)?
Identify a with the left parenthesis and b with the right parenthesis!
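Reading a as "(" and b as ")", the language of this grammar is the set of balanced parenthesis words, which can be recognized with a single counter. The following Python sketch is an added illustration (not part of the exercise):

```python
def in_language(word: str) -> bool:
    """Decide membership in L(G) for G: s -> asb | ss | epsilon,
    reading 'a' as '(' and 'b' as ')': the balanced-parenthesis words."""
    depth = 0
    for ch in word:
        if ch == 'a':
            depth += 1
        elif ch == 'b':
            depth -= 1
            if depth < 0:      # a 'b' with no matching 'a' before it
                return False
        else:
            return False       # not a word over {a, b}
    return depth == 0          # every 'a' must be closed
```

For example, aabb and abab belong to L(G), while ba does not.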
The regular languages play an important role in lexical analysis.
The Chomsky hierarchy means that
L0 ⊃ L1 ⊃ L2 ⊃ L3
holds. Graphically:
[Figure: the Chomsky hierarchy as nested classes — regular languages inside context-free languages inside context-sensitive languages inside unrestricted languages]
Chapter 4
Algorithms, computable functions and
decision problems
Algorithm, computability and decidability are basic concepts strongly related to each other.
We deal with these concepts with the aid of formal languages, relations and Turing machines.
The word algorithm comes from the name of Abu Ja'far Muhammad ibn Musa al-Khwarizmi, who was born about 780 (in Horezm) and died about 850. Horezm was a large state around the river Amu-Darja (the Greek Oxus), occupying the territory of today's Turkmenistan and Uzbekistan.
He worked at the "House of Wisdom" (Dar al-Hikma) University. He wrote his main work "Al-jabr wa'l-muqabala" on elementary algebra around 825 (the word "algebra" comes from this work). He also wrote another work on the Hindu numerals 1, 2, 3, 4, 5, 6, 7, 8, 9 and 0 and their use for calculations.
The original Arabic manuscript was lost, but its Latin translation "Algoritmi de numero Indorum" (Al-Khwarizmi on the Hindu computational art) survived and introduced the Hindu-Arabic numerals in Europe.
In the book the reader can find rules of calculation, which are named after him (Algoritmi dicit ... / Al-Khwarizmi tells this ...).
We assume now that the algorithm is a map.
83. Definition. By an algorithm (program) we mean a map
A : Σ1* → Σ2*
with the following properties:
(i) The inputs are words in Σ1*,
(ii) The outputs are words in Σ2*,
(iii) For each input A orders exactly one output.
If x ∈ Σ1* is the input, the output is A(x) ∈ Σ2*.
84. Definition. Two algorithms (programs) A, B : Σ1* → Σ2* are equivalent, if A(x) = B(x) for all x ∈ Σ1*.
85. Definition. Let Σ1 and Σ2 be two alphabets. Algorithm A computes function f : Σ1* → Σ2*, if for all x ∈ Σ1*, A(x) = f(x).
86. Definition. Given an alphabet Σ and a language L ⊆ Σ*. By the decision problem (Σ, L) we mean that for every x ∈ Σ* we have to decide if x ∈ L or x ∉ L. Algorithm A solves the decision problem (Σ, L), if for every x ∈ Σ*,
A(x) = 1, if x ∈ L;  A(x) = 0, if x ∉ L.
It is also said that A recognizes L.
In decision problems the algorithm computes the characteristic function of set L.
87. Example. Primality test:
(Σbool, {x ∈ (Σbool)* | N(x) is a prime number}),
where N(x) is the binary number ordered to word x.
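For illustration (this sketch is an addition to the notes), the characteristic function of the primality language can be computed directly from the binary word by trial division:

```python
def N(x: str) -> int:
    """The binary number ordered to the word x (0 for the empty word)."""
    return int(x, 2) if x else 0

def A(x: str) -> int:
    """Characteristic function of the language {x | N(x) is prime}."""
    n = N(x)
    if n < 2:
        return 0
    d = 2
    while d * d <= n:          # trial division up to sqrt(n)
        if n % d == 0:
            return 0
        d += 1
    return 1
```

Here A(x) = 1 exactly when x belongs to the language, so A solves the decision problem.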
A Hamiltonian cycle of a graph is a cycle that visits each vertex exactly once.
88. Example. Hamiltonian cycle problem (Σ, HC), where Σ = {0, 1, #} and
HC = {x ∈ Σ* | x contains a Hamiltonian cycle}.
A Boolean formula, for example
(u1 ∨ ¬u3 ∨ ¬u4) ∧ (¬u1 ∨ u2 ∨ ¬u4),
is satisfiable, if it is not identically 0 for all possible inputs.
89. Definition. Satisfiability problem (Σlogic, SAT):
SAT = {x ∈ (Σlogic)* | x is a satisfiable Boolean formula}.
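Since a formula over n variables has only 2^n possible assignments, satisfiability can be decided by brute force. The sketch below is an added illustration; the clause encoding (positive integer i for u_i, negative for its negation) is a convention chosen here, not taken from the notes:

```python
from itertools import product

def satisfiable(clauses, n_vars):
    """Brute-force SAT test for a formula in conjunctive normal form:
    each clause is a list of nonzero ints, +i for u_i, -i for not u_i."""
    for bits in product([False, True], repeat=n_vars):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in clause)
               for clause in clauses):
            return True          # found a satisfying assignment
    return False

# The formula (u1 v ~u3 v ~u4) ^ (~u1 v u2 v ~u4) from the text:
formula = [[1, -3, -4], [-1, 2, -4]]
```

The exhaustive loop makes the exponential cost of the naive approach explicit.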
90. De…nition. If for a language L, there exists an algorithm that recognizes L, then L is
called recursive.
One could think that the computation of functions is the most general problem. However, this is not the case.
91. Definition. Let Σ and Δ be two alphabets and let R ⊆ Σ* × Δ* be a relation. Algorithm A computes relation R, if for every x ∈ Σ*, (x, A(x)) ∈ R.
92. Example. Input: x ∈ (Σbool)*. Output: y ∈ (Σbool)*, where y = 1, if N(x) is prime, and y is a proper divisor of N(x), if N(x) is a composite number.
The corresponding relation is
R = {(x, y) | y = 1, if N(x) is prime; y > 1 ∧ (y | N(x)), otherwise}.
Using relations we can describe non-deterministic computational problems.
Condition (x, A(x)) ∈ R means that A(x) ∈ R(x). If R is not a function, then |R(x)| > 1 holds for at least one element x ∈ DR. For such an x, the result of the algorithm can be an arbitrary element A(x) ∈ R(x).
It means that for the same input x we can obtain different outputs when applying the algorithm.
In the deterministic case this is not possible, because for the same input we always get the same output.
Chapter 5
Analysis of algorithms
We study algorithmically solvable problems and the following questions:
- Characterization of the complexity (or hardness) of the problem.
- Selection of the best method if we have more than one algorithm solving a problem.
The basic tools to characterize the complexity of problems:
- computational time,
- memory need.
The complexity of a given problem is often characterized by an upper or lower estimate of
the cost of the solution algorithm.
The upper bounds inspire faster algorithms, while the lower bounds indicate the hardness
of the problem.
The complexity of problems and the efficiency of solution algorithms are related in general.
If we have more than one algorithm for solving a particular problem, the most efficient can be selected by an analysis (and comparison) of the algorithms.
The more efficient the algorithm, the more efficient the solution.
5.1. The divide and conquer (DAC) principle
This is a simple and efficient technique for solving problems.
The basic idea:
- Divide the problem into smaller subproblems.
- Solve the smaller subproblems.
- Combine the solution from the solutions of the subproblems.
The general algorithm is the following.
DivideAndConquer(data, N, solution)
  data — a set of input values
  N — the number of values in the set
  solution — the solution to this problem
  if (N ≤ SizeLimit) then
    DirectSolution(data, N, solution)
  else
    DivideInput(data, N, smallerSets, smallerSizes, numberSmaller)
    for i = 1 to numberSmaller do
      DivideAndConquer(smallerSets[i], smallerSizes[i], smallSolution[i])
    end for
    CombineSolutions(smallSolution, numberSmaller, solution)
  end if
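The general scheme can be instantiated for a concrete problem. The following Python sketch mirrors the pseudocode above; the choice of summation as the example problem is an illustration added here:

```python
SIZE_LIMIT = 2

def divide_and_conquer(data):
    """Sum a list with the generic DAC scheme: direct solution below
    SIZE_LIMIT, otherwise divide, recurse, and combine."""
    n = len(data)
    if n <= SIZE_LIMIT:
        return sum(data)                        # DirectSolution
    mid = n // 2
    smaller_sets = [data[:mid], data[mid:]]     # DivideInput
    small_solutions = [divide_and_conquer(s) for s in smaller_sets]
    return sum(small_solutions)                 # CombineSolutions
```

Here DivideInput halves the data, and CombineSolutions adds the two partial sums.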
We can characterize the complexity (or cost) of algorithm DAC as follows:
DAC(N) = DIR(N), if N ≤ SizeLimit;
DAC(N) = DIV(N) + Σ_{i=1}^{numberSmaller} DAC(smallerSizes[i]) + COM(N), if N > SizeLimit,
where
DAC is the complexity of algorithm DivideAndConquer,
DIR is the complexity of DirectSolution,
DIV is the complexity of algorithm DivideInput,
COM is the complexity of CombineSolutions.
In order to solve the recursion we need the master theorem.
5.2. The master theorem
93. Theorem. If n is a power of number b, a, b, c, d are constants, a ≥ 1, b > 1, then the solution of the recursion
T(n) = d, if n = 1;  T(n) = aT(n/b) + cn, if n > 1   (5.1)
is
T(n) = O(n), if a < b;  T(n) = O(n log n), if a = b;  T(n) = O(n^(log_b a)), if a > b.   (5.2)
94. Example. If T (n) = 2T (n=3) + cn, then a = 2 < b = 3 implies T (n) = O (n).
95. Example. If T (n) = 2T (n=2) + cn, then a = b = 2 implies T (n) = O (n log n).
96. Example. If T(n) = 4T(n/2) + cn, then a = 4 > b = 2 implies T(n) = O(n^(log2 4)) = O(n²).
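The recursion (5.1) can also be evaluated exactly and compared with these asymptotic solutions. A small Python sketch (an added illustration):

```python
def T(n, a, b, c, d=1):
    """Exact value of the recursion T(1) = d, T(n) = a*T(n/b) + c*n,
    for n a power of b (the hypothesis of the master theorem)."""
    return d if n == 1 else a * T(n // b, a, b, c, d) + c * n
```

For a = b = 2 and c = d = 1 the exact solution is T(n) = n·log2(n) + n, in line with the O(n log n) case of the theorem; e.g. T(1024) = 10·1024 + 1024.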
There are more general results than the master theorem.
Which one of the following problems can be solved with the master theorem? If a problem
can be solved, then give the solution!
a) T (n) = 2T (n=2) + n2 ;
b) T (n) = 4T (n=2) + n2 ;
c) T (n) = T (n=2) + 2n ;
d) T (n) = 2n T (n=2) + nn ;
e) T (n) = 16T (n=4) + n;
f) T (n) = 2T (n=2) + n log n;
g) T (n) = 2T (n=2) + n= log n;
h) T(n) = 2T(n/4) + n^0.51;
i) T(n) = 0.5T(n/2) + 1/n.
5.3. Searching, sorting and selection problems
5.3.1. Searching problems
Searching problem: …nd a given target in a list (or array).
The list (array) can be unordered (unsorted) or ordered (sorted).
We analyze the sequential and binary search algorithms, respectively.
For an unsorted list we use sequential search.
For a sorted list we use binary search.
SequentialSearch(list, target, N)
  list — the elements to be searched
  target — the value being searched for
  N — the number of elements in the list
  for i = 1 to N do
    if (target = list[i])
      return i
    end if
  end for
  return 0
The best case: the target is the 1st element in the list.
The worst case: the target is the last element in the list (N ).
Hence the number of comparisons is between 1 and N .
Assumption 1: the target can be in any position of the list with uniform probability 1/N.
The average cost of sequential search is the following. If the target is found in the ith position, then the cost of the search is i comparisons. Hence the average cost is
A(N) = Σ_{i=1}^N i/N = (1/N) Σ_{i=1}^N i = N(N+1)/(2N) = (N+1)/2.
Assumption 2: It may happen that the target is not found in the list. Then we have N + 1 possible cases. If each case is equally probable, then their common probability is 1/(N+1) and the average (expected) cost is
A(N) = (Σ_{i=1}^N i/(N+1)) + N/(N+1) = (N(N+1)/2 + N)/(N+1) = N/2 + N/(N+1) = (N+1)/2 + (N−1)/(2(N+1)).
The second assumption modified (increased) the average cost. If N → ∞, then (N−1)/(2(N+1)) → 1/2, and so
A(N) ≈ (N+1)/2 + 1/2.
For large N, this difference is not significant.
Assume now the list (array) is ordered increasingly.
BinarySearch(list, target, N)
  list — the elements to be searched
  target — the value being searched for
  N — the number of elements in the list
  start = 1
  end = N
  while start ≤ end do
    middle = ⌈(start + end)/2⌉
    if list[middle] < target then start = middle + 1
    if list[middle] = target then return middle
    if list[middle] > target then end = middle − 1
  end while
  return 0
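A direct Python translation of the pseudocode, switched to 0-based indexing and returning −1 instead of 0 on failure (an added illustrative sketch):

```python
def binary_search(lst, target):
    """Binary search on a sorted list; returns an index of target,
    or -1 if target is not present."""
    start, end = 0, len(lst) - 1
    while start <= end:
        middle = (start + end) // 2
        if lst[middle] < target:
            start = middle + 1      # continue in the right half
        elif lst[middle] > target:
            end = middle - 1        # continue in the left half
        else:
            return middle
    return -1
```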
The decision tree of the algorithm for a list of 7 elements:
[Decision tree: root list[4]; second level list[2], list[6]; third level list[1], list[3], list[5], list[7]]
The algorithm compares the nodes of the decision tree to the value of the target during the iterations. If the target value is less than the node value, then the computation continues on the left subtree; otherwise it continues on the right subtree.
We investigate the worst case complexity. For simplicity assume that N = 2^k − 1 for some integer k ≥ 1.
In the first iteration the middle element has index (1 + (2^k − 1))/2 = 2^(k−1). Hence the length of each resulting sublist is 2^(k−1) − 1.
Similarly, the length of the sublist after the second iteration is 2^(k−2) − 1, and so on. Iteration j produces a sublist of length 2^(k−j) − 1.
The iteration will be finished at iteration k at the latest. This means k comparisons with k = log2(N + 1) ≈ log2 N.
We study the average complexity for N = 2^k − 1 elements under the assumptions that the target is in the list and it can appear anywhere with probability 1/N.
The ith level of the decision tree has 2^(i−1) elements, and the number of levels is k.
If the target is at level i, then we have to make i comparisons.
Hence the average cost (complexity) is
A(N) = (1/N) Σ_{i=1}^k i (Σ_{j=1}^{2^(i−1)} 1) = (1/N) Σ_{i=1}^k i·2^(i−1).
97. Lemma. For x ≠ 1,
1 + 2x + ⋯ + kx^(k−1) = (kx^(k+1) − (k+1)x^k + 1) / (x − 1)².
We substitute x = 2 and obtain
Σ_{i=1}^k i·2^(i−1) = k·2^(k+1) − (k+1)·2^k + 1 = (k−1)·2^k + 1.
Hence the average cost is
A(N) = (1/N)((k−1)·2^k + 1) = ((log2(N+1) − 1)(N+1) + 1)/N = log2(N+1) − 1 + log2(N+1)/N.
Since log2(N+1)/N → 0, we obtain that
A(N) ≈ log2(N+1) − 1,
which is the worst case cost minus 1.
What is the average cost, if the target is not in the list, but all cases have the same probability 1/(N+1)?
5.3.2. Sorting
We analyze two important sorting (ordering) algorithms, both of which are typical applications of the DAC principle.
Merge sorting
The basis of the algorithm is the merge operation. Assume that we are given two (increasingly) ordered arrays (lists) A[1 : k] and B[1 : l], and we want to unite and store them in the array C[1 : k + l] in sorted form.
The merge algorithm:
1. Write into C [1] the smaller of A [1] and B [1],
2. Assume that subarrays A[1 : i] and B[1 : j] are written into subarray C[1 : i + j] in the required ordered form. The next element of array C is defined by
C[i + j + 1] = min{A[i + 1], B[j + 1]}.
The maximal cost of merging is k + l − 1 comparisons and k + l data moves. The minimum cost is min{k, l} comparisons (Why?).
Assume that array A [1 : n] is unsorted.
The basic principle of merge sorting is that we sort the first and second halves of the array separately, then merge the results and use recursion!
function msort(x, y)
  comment sorts A[x..y]
  if y − x < 1 then
    return(A)
  else
    return(merge(msort(x, ⌊(x+y)/2⌋), msort(⌊(x+y)/2⌋ + 1, y)))
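A runnable Python version of merge sorting (an illustrative sketch of the scheme above, working on list copies rather than in place):

```python
def merge(a, b):
    """Merge two sorted lists with at most len(a)+len(b)-1 comparisons."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    return out + a[i:] + b[j:]      # append the leftover tail

def msort(a):
    """Merge sort: sort the two halves recursively, then merge them."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    return merge(msort(a[:mid]), msort(a[mid:]))
```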
Let T (n) be the cost (complexity) of the merge sorting for unsorted arrays of length n and
assume that n = 2k . We consider only the number of comparisons.
The merging of two ordered arrays of length n/2 costs at most n − 1 comparisons.
Hence
T(n) = 1, if n ≤ 1;  T(n) = 2T(n/2) + n − 1, otherwise.
The master theorem gives us
T(n) = O(n log n)
comparisons for the cost of merge sorting.
A more precise result:
T(n) = 2T(n/2) + n − 1
     = 2² T(n/2²) + 2(n/2 − 1) + n − 1
     = 2³ T(n/2³) + 2²(n/2² − 1) + 2(n/2 − 1) + n − 1
     ⋮
     = 2^i T(n/2^i) + Σ_{j=1}^i 2^(j−1)(n/2^(j−1) − 1)
     = 2^k T(1) + Σ_{j=1}^k 2^(j−1)(n/2^(j−1) − 1)
     = 2n + kn − 1 − Σ_{j=1}^k 2^(j−1) = 2n + kn − 1 − (2^k − 1)
     = n log2 n + n.   (5.3)
Hence the worst case complexity of merge sorting is O(n log n) comparisons for arrays of length n.
Quicksort
The quicksort algorithm was developed by C.A.R. Hoare in 1962 and it is one of the ten most important algorithms of the 20th century (see, e.g., B. A. Cipra: The Best of the 20th Century: Editors Name Top 10 Algorithms, SIAM News, Volume 33, Number 4).
The quicksort algorithm has three elements/phases:
- Divide,
- Conquer,
- Combine.
Divide: the subarray
{A(p), …, A(r)}   (p ≤ r)
is partitioned into two (possibly empty) subarrays
{A(p), …, A(q−1)} and {A(q+1), …, A(r)}
such that each element of {A(p), …, A(q−1)} is less than or equal to A(q), which is less than or equal to each element of {A(q+1), …, A(r)}.
Compute the index q as part of this partitioning procedure.
Note that the elements of the two subarrays are unsorted.
Conquer: Sort the subarrays {A(p), …, A(q−1)} and {A(q+1), …, A(r)} by recursive calls to quicksort.
Combine: Because the recursive calls stop at one-element arrays and the subarrays are sorted in place, no work is necessary to combine them. On exit the whole array {A(p), …, A(r)} is sorted.
Quicksort(A, p, r)
  if p < r then
    q = Partition(A, p, r)
    Quicksort(A, p, q − 1)
    Quicksort(A, q + 1, r)
To sort an entire array, the initial call is Quicksort(A; 1; length (A)).
The key of quicksort is the Partition procedure, which rearranges the subarray {A(p), …, A(r)} in place.
Partition(A, p, r)
  x = A(r)
  i = p − 1
  for j = p to r − 1
    if A(j) ≤ x then
      i = i + 1
      exchange A(i) with A(j)
  exchange A(i + 1) with A(r)
  return i + 1
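The two procedures translate to Python as follows (a 0-based illustrative sketch added to the notes):

```python
def partition(A, p, r):
    """Partition A[p..r] around the pivot x = A[r]; returns the pivot's
    final index q, with A[p:q] <= A[q] <= all of A[q+1:r+1]."""
    x = A[r]
    i = p - 1
    for j in range(p, r):
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]
    return i + 1

def quicksort(A, p=0, r=None):
    """In-place quicksort of A[p..r]; returns A for convenience."""
    if r is None:
        r = len(A) - 1
    if p < r:
        q = partition(A, p, r)
        quicksort(A, p, q - 1)
        quicksort(A, q + 1, r)
    return A
```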
The Partition procedure divides the array into four (possibly empty) regions according to the following figure:
[Figure: the four regions of Partition on A(p..r) — A(p..i) ≤ x, A(i+1..j−1) > x, A(j..r−1) unknown, A(r) = x (pivot); A(i+1) and A(j) are exchanged if A(j) ≤ x]
Note that for indices between j and r − 1 the above conditions do not hold, because these elements are not (yet) related to the pivot.
1. Analysis of the average cost
Let T(n) be the average (expected value) of comparisons for an array of n pairwise distinct elements. Clearly, T(0) = T(1) = 0.
Assume that pivot x is the ith smallest element. Then algorithm Partition divides the array A(1 : n) into a subarray of i − 1 and a subarray of n − i elements. The cost of the recursive calls is T(i − 1) and T(n − i), respectively.
Assume that i can take an arbitrary integer between 1 and n with equal probability 1/n.
The Partition algorithm requires n − 1 comparisons. For a given i, the cost is T(i − 1) + T(n − i) + n − 1 comparisons. The average (expected value) is
T(n) = (1/n) Σ_{i=1}^n [T(i−1) + T(n−i)] + n − 1.
Since
Σ_{i=1}^n [T(i−1) + T(n−i)] = 2 Σ_{i=2}^{n−1} T(i),
we can write
T(n) = (2/n) Σ_{i=2}^{n−1} T(i) + n − 1.
Make the following manipulations:
nT(n) = 2 Σ_{i=2}^{n−1} T(i) + n² − n,
(n−1)T(n−1) = 2 Σ_{i=2}^{n−2} T(i) + n² − 3n + 2,
nT(n) − (n−1)T(n−1) = 2T(n−1) + 2(n−1).
This implies
nT(n) = (n+1)T(n−1) + 2(n−1)
and
T(n)/(n+1) = T(n−1)/n + 2(n−1)/(n(n+1)).
Define S(n) = T(n)/(n+1). Then by definition S(0) = S(1) = 0, S(2) = T(2)/3 = 1/3 and
S(n) = S(n−1) + 2(n−1)/(n(n+1)).
For n ≥ 3,
2(n−1)/(n(n+1)) ≤ 2/n.
Hence we have
S(n) ≤ 0, if n ≤ 1;  S(n) ≤ S(n−1) + 2/n, if n ≥ 2.
The solution is
S(n) ≤ S(n−1) + 2/n
     ≤ S(n−2) + 2/(n−1) + 2/n
     ⋮
     ≤ S(n−i) + 2 Σ_{j=n−i+1}^n 1/j
     ⋮
     ≤ S(1) + 2 Σ_{j=2}^n 1/j = 2 Σ_{j=2}^n 1/j ≤ 2 ∫_1^n (1/x) dx = 2 ln n.
Thus we obtain that
T(n) = (n+1) S(n) < 2(n+1) ln n = 2(n+1) log n / log e ≈ 1.386 (n+1) log n = 1.386 n log n + O(n).
98. Theorem. The average cost of quicksort is ≈ 1.4 n log n + O(n).
2. The worst case complexity/partition
Let x be the smallest element of the array, that is i = 1, and assume that this situation is repeated during the recursive calls. Then
T(n) = T(n−1) + n − 1.
Summing up for all n, we obtain
Σ_{i=1}^n T(i) = Σ_{i=1}^n T(i−1) + Σ_{i=1}^n (i−1)
and
T(n) = Σ_{i=1}^n (i−1) = (n−1)n/2 = O(n²).
3. The best case complexity/partition
Let x be the n/2-th element of the array. The Partition procedure generates two arrays of n/2 elements, and this is repeated recursively. Then the cost is
T(n) = 2T(n/2) + n − 1,
whose solution is T(n) = O(n log n) by the master theorem. Using the argument seen at the merge sorting, we have the more precise result T(n) = n log2 n + n.
The Hoare algorithm has several variants. The selection of pivot may be done randomly.
5.3.3. Lower estimate on the complexity of sorting
99. Theorem. An arbitrary sorting algorithm that uses pairwise comparisons and exchanges of the elements needs at least Ω(n log n) comparisons in the worst case.
Assume that the sorting algorithm correctly sorts all n! permutations of the elements
a1 < a2 < ⋯ < an
using at most k decisions (pairwise comparisons).
For each input sequence v1, v2, …, vn, we order a codeword of length at most k that contains the results of the decisions (a yes/no sequence) according to the execution of the algorithm. Each codeword is related to a unique sequence of permutations (exchanges) P1, P2, …, Ps. This is due to the fact that the program executes a true function. The number of different codewords is at most 2^k.
(continuation) If 2^k < n!, then the program executes the same permutation sequence for at least two different input sequences. Consequently, at least one output will be faulty, which is a contradiction.
So we need to satisfy the inequality 2^k ≥ n!, that is, the inequality k ≥ ⌈log2 n!⌉. Stirling's asymptotic formula n! ∼ √(2πn) (n/e)^n gives the required result
⌈log2 n!⌉ > (n log2 n − n)/2 = Ω(n log n).
100. Corollary. Merge sorting is nearly optimal (apart from a constant multiple). The worst case cost of merge sorting is T(n) = n log2 n + n.
5.3.4. Selection problems
Selection problems also belong to searching. Here we look for an element of an array (list) that
has certain properties, unlike earlier when we searched for a target value.
We study two selection problems.
Selection of the maximal and minimal element
We look for the maximal and minimal element of an array S of length n.
Our first solution is a simple linear search applied twice:
Smax = S[1]
for i = 2, …, n
  if S[i] > Smax then Smax = S[i]
end
Smin = S[1]
for i = 2, …, n
  if S[i] < Smin then Smin = S[i]
end
The cost of this algorithm is 2n − 2 (= O(n)) comparisons.
The second algorithm is based on DAC. We halve the array and compute the maximal and
minimal elements in both half arrays using recursion.
function maxmin(x, y)
  comment return max and min in S[x..y]
  if y − x ≤ 1 then
    return(max(S[x], S[y]), min(S[x], S[y]))
  else
    (max1, min1) := maxmin(x, ⌊(x+y)/2⌋)
    (max2, min2) := maxmin(⌊(x+y)/2⌋ + 1, y)
    return(max(max1, max2), min(min1, min2))
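A Python version that also counts comparisons makes the cost visible; for arrays whose length is a power of two the count comes out to 1.5n − 2 (an added illustrative sketch):

```python
def maxmin(S, x, y, count):
    """Return (max, min) of S[x..y] inclusive, adding the number of
    comparisons to count[0]; assumes the length y-x+1 is a power of two."""
    if y - x <= 1:
        count[0] += 1                    # one comparison settles a pair
        return (max(S[x], S[y]), min(S[x], S[y]))
    mid = (x + y) // 2
    max1, min1 = maxmin(S, x, mid, count)
    max2, min2 = maxmin(S, mid + 1, y, count)
    count[0] += 2                        # compare the two maxima and minima
    return (max(max1, max2), min(min1, min2))
```

For n = 8 the counter ends at 10 = 1.5·8 − 2.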
Let T(n) be the number of comparisons performed by the routine maxmin(x, y), if n = y − x + 1. Assume that n = 2^k for some integer k ≥ 1. T(n) = 1, if n = 2. Assume that n > 2!
The sizes of the subarrays are
1. ⌊(x+y)/2⌋ − x + 1 = ⌊(n + 2x − 1)/2⌋ − x + 1 = (n/2 + x − 1) − x + 1 = n/2,
2. y − ⌊(x+y)/2⌋ = y − (n/2 + x − 1) = n − n/2 = n/2.
The cost of the two subproblems is 2T(n/2) + 2 comparisons. Hence
T(n) = 1, if n = 2;  T(n) = 2T(n/2) + 2, if n > 2.
The exact solution of this recursion is the following:
T(n) = 2T(n/2) + 2
     = 2(2T(n/4) + 2) + 2
     ⋮
     = 2^i T(n/2^i) + Σ_{j=1}^i 2^j
     = 2^(log n − 1) T(2) + Σ_{j=1}^{log n − 1} 2^j
     = n/2 + 2^(log n) − 2 = 1.5n − 2.
We can establish that the second solution requires only 75% of the cost of the …rst algorithm.
Selection of the kth element
Here we have to find an element of an unsorted array A[1 : n] whose value is the kth in order of (increasing) magnitude, that is, the element that would be in position k if the array were ordered increasingly.
Trivial solution: We sort the array into the array B[1 : n] (increasing order) with a cost of O(n log n) comparisons, where B(k) will satisfy
B(1), B(2), …, B(k−1) ≤ B(k) ≤ B(k+1), …, B(n).
The first direct algorithm is due to C.A.R. Hoare (1971).
For a modified Hoare algorithm, Blum, Floyd, Tarjan, Pratt and Rivest proved that O(n) comparisons are enough to solve the problem.
We make the following simplifying assumptions: the elements of the array are pairwise distinct and n = 2^k.
Consider first the case k = n − 1, that is, look for the second greatest element of the array!
A possible solution is the following.
(1) Pair the elements and select the bigger element (winner) from each pair (n/2 comparisons).
(2) Pair the n/2 winners and select again the winners (n/4 comparisons).
(3) Continue until we get the absolute winner (the greatest element of the array).
This procedure requires k = ⌈log n⌉ iterations and n/2 + n/4 + ⋯ + n/2^k = n − 1 comparisons. The second greatest element is an element that lost against the greatest element. There are at most ⌈log n⌉ such elements. We can select the greatest of these with ⌈log n⌉ − 1 comparisons.
Hence the second greatest element can be obtained with n + ⌈log n⌉ − 2 comparisons, which is much better than O(n log n) for sufficiently large n.
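The tournament idea can be sketched in Python (an added illustration; the index bookkeeping and the bye given to an odd leftover player are implementation choices of this sketch):

```python
def second_largest(values):
    """Tournament selection of the second-largest element: play knockout
    rounds, remember who lost to whom, then take the best of the elements
    that lost directly to the overall winner."""
    players = list(range(len(values)))
    lost_to = {i: [] for i in players}      # losers recorded per winner
    while len(players) > 1:
        nxt = []
        for a, b in zip(players[::2], players[1::2]):
            w, l = (a, b) if values[a] >= values[b] else (b, a)
            lost_to[w].append(l)
            nxt.append(w)
        if len(players) % 2:                # odd player advances with a bye
            nxt.append(players[-1])
        players = nxt
    champion = players[0]
    return max(values[i] for i in lost_to[champion])
```

Only the elements that met the champion are candidates, which is what keeps the extra work down to about ⌈log n⌉ − 1 comparisons.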
Consider now the case of general k! Let T(n) be the maximal cost (number of comparisons) of finding the kth element in the array A of n elements. Furthermore let x be an arbitrary element of array A.
Compare x with all the other elements (n − 1 comparisons) and divide the elements into the following disjoint sets:
S1 = {A[i] | A[i] < x},  S2 = {A[i] | A[i] > x}.   (5.4)
If x was the tth element, then |S1| = t − 1 and |S2| = n − t.
The following cases are possible:
1. If t = k, then stop.
2. If k > t, then continue comparisons in S2 with cost T (n t).
3. If k < t, then continue comparisons in S1 with cost T (t 1).
For the total cost we have
T(n) ≤ n − 1 + max{T(t−1), T(n−t)}.
If t ≈ 1 or t ≈ n, then its solution is about
T(n) = n − 1 + T(n−1) = Θ(n²),
which is a bad case.
In principle, the good selection is t ≈ n/2, because then
T(n) = n − 1 + T(n/2) = O(n).
So the key to success is the choice of pivot x. With a clever selection of x the bound Θ(n²) can be improved significantly.
101. Definition. The median is the middle value in a series of values arranged from smallest to largest.
Selection algorithm of the pivot:
(i) Divide the elements into groups of five elements (n/5 groups).
(ii) Compute the median element of each group.
(iii) Select the median of the median elements with recursive calls (T(n/5) comparisons). This will be the pivot x!
102. Theorem. If c denotes the cost of computing the median of five elements, then for the cost of finding the kth element
T(n) ≤ γn = O(n)   (5.5)
holds, where γ is a constant depending on c (γ ≈ 10 + 2c).
If c = 6 (best value), then γ = 22. If we sort the 5 elements with the selection sort at the cost c = 10, then γ = 30.
It is surprising that the result does not depend on k.
103. Theorem. (Blum, Floyd, Pratt, Tarjan, Rivest). For the cost of finding the kth element
T(n) < 5.73n   (5.6)
holds and
T(n) ≥ n + min{k, n−k+1} + ⌈log2(n)⌉ − 4   (2 ≤ k ≤ n − 1).   (5.7)
The lower estimate for the case of the median (the ⌈n/2⌉th element) was improved to 2n by Bent and John.
The algorithm is a close relative of quicksort. Hoare's original algorithm does not include the median technique. Its average cost is, however, ≈ 3.4n comparisons for random inputs of length n.
What is the cost of the following algorithm that finds the kth element? Compare the result with those of Blum, Floyd, Pratt, Tarjan and Rivest!
FindKthLargest(list, N, K)
  list — the values to look through
  N — the size of the list
  K — the element to select
  for i = 1 to K do
    largest = list[1]
    largestLocation = 1
    for j = 2 to N − (i − 1) do
      if list[j] > largest then
        largest = list[j]
        largestLocation = j
      end if
    end for
    Swap(list[N − (i − 1)], list[largestLocation])
  end for
  return largest
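For experimenting with this exercise, a direct Python port of FindKthLargest (0-based indexing, working on a copy; an added illustrative sketch):

```python
def find_kth_largest(lst, k):
    """K rounds of selection: each round finds the current largest in the
    active prefix and swaps it to the end of that prefix."""
    a = list(lst)                       # work on a copy
    n = len(a)
    for i in range(1, k + 1):
        largest_loc = 0
        for j in range(1, n - (i - 1)):
            if a[j] > a[largest_loc]:
                largest_loc = j
        end = n - (i - 1) - 1
        a[end], a[largest_loc] = a[largest_loc], a[end]
        largest = a[end]
    return largest
```

Round i scans N − i + 1 elements, so the total is roughly Σ_{i=1}^K (N − i) = O(KN) comparisons — linear in n only for constant K, unlike the O(n) bound above.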
5.4. Basic arithmetic algorithms
We study the complexity of basic arithmetic algorithms such as
- addition,
- multiplication,
- division,
- square root,
- evaluation of arithmetic expressions,
- number theoretical algorithms (greatest common divisor, factorization, primality test,
etc.),
- conversion between di¤erent number systems.
The most important computer arithmetics are:
- integer arithmetic,
- floating point arithmetic,
- modular arithmetic,
- etc.
Let β > 1 be the basis of the number system. The integer number
A = a_{n−1} β^(n−1) + ⋯ + a_1 β + a_0 > 0
is characterized (or given) by the length n and the "digits" 0 ≤ a_i ≤ β − 1. Occasionally it is assumed that a_{n−1} ≠ 0. If the basis β is fixed, then we need to store only n and the digits {a_i}_{i=0}^{n−1}. For A = 0 (zero), n := 0.
Further notations:
r := a mod b — remainder of the integer division (0 ≤ r < b),
q := a div b — quotient of the integer division (0 ≤ a − qb < b).
5.4.1. Multiplication
We study two multiplication algorithms from the several existing ones.
Basic multiplication
For integer numbers of length m and n, the algorithm requires O(mn) operations.
Algorithm Basic Multiplication.
Input: A = Σ_{i=0}^{m−1} a_i β^i, B = Σ_{j=0}^{n−1} b_j β^j
Output: C = AB := Σ_{k=0}^{m+n−1} c_k β^k
C ← A·b_0
for j from 1 to n − 1 do
  C ← C + β^j (A·b_j)
Return C
The main operation is A·b_j in the loop body. Multiplication by β^j shifts the result by j positions, and it accumulates in register C.
The Karacuba-Ofman algorithm
It was published in 1962 and it is an example of DAC. Let n0 ≥ 2 be the threshold under which we use the basic algorithm, that is, if n < n0, we use the basic multiplication.
The basic idea of the algorithm is the following. Assume that both A and B are of length 2k and we can write
A = A1 β^k + A0,  B = B1 β^k + B0
and
AB = A1 B1 β^(2k) + (A0 B1 + A1 B0) β^k + A0 B0.
So instead of multiplying numbers A and B of length 2k, we multiply numbers of length k. We also use recursion.
Algorithm Karacuba Multiplication.
Input: A = Σ_{i=0}^{n−1} a_i β^i, B = Σ_{j=0}^{n−1} b_j β^j
Output: C = AB := Σ_{k=0}^{2n−1} c_k β^k
if n < n0 then return Basic Multiplication(A, B)
k ← ⌈n/2⌉
(A0, B0) := (A, B) mod β^k,  (A1, B1) := (A, B) div β^k
sA ← sign(A0 − A1), sB ← sign(B0 − B1)
C0 ← Karacuba Multiplication(A0, B0)
C1 ← Karacuba Multiplication(A1, B1)
C2 ← Karacuba Multiplication(|A0 − A1|, |B0 − B1|)
return C := C0 + (C0 + C1 − sA sB C2) β^k + C1 β^(2k)
The algorithm has an additive variant that uses A0 + A1 and B0 + B1 instead of |A0 − A1| and |B0 − B1|.
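Working in base β = 2, the algorithm can be sketched in Python, with shifts and masks playing the role of div and mod by β^k (an added illustration; the threshold n0 = 8 is an arbitrary choice of this sketch):

```python
THRESHOLD = 8   # n0: below this bit length, multiply directly

def karatsuba(a: int, b: int) -> int:
    """Karacuba-Ofman multiplication of nonnegative integers in base 2:
    three half-size products (c0, c1, c2) instead of four."""
    n = max(a.bit_length(), b.bit_length())
    if n < THRESHOLD:
        return a * b                        # basic multiplication
    k = (n + 1) // 2                        # split position ~ n/2
    a1, a0 = a >> k, a & ((1 << k) - 1)     # a = a1*2^k + a0
    b1, b0 = b >> k, b & ((1 << k) - 1)
    c0 = karatsuba(a0, b0)
    c1 = karatsuba(a1, b1)
    c2 = karatsuba(abs(a0 - a1), abs(b0 - b1))
    s = 1 if (a0 >= a1) == (b0 >= b1) else -1   # sign of (a0-a1)(b0-b1)
    # c0 + c1 - s*c2 = a0*b1 + a1*b0, the middle coefficient
    return c0 + ((c0 + c1 - s * c2) << k) + (c1 << (2 * k))
```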
104. Theorem. The Karacuba multiplication computes AB in T(n) = O(n^α) operations with α = log 3 ≈ 1.585.
A = A1 β^k + A0, B = B1 β^k + B0 and
AB = A1 B1 β^(2k) + (A0 B1 + A1 B0) β^k + A0 B0.
Since sA |A0 − A1| = A0 − A1, sB |B0 − B1| = B0 − B1 and
sA sB |A0 − A1| |B0 − B1| = (A0 − A1)(B0 − B1),
we have
C = A0 B0 + (A0 B1 + A1 B0) β^k + A1 B1 β^(2k) = AB.
(continuation) The lengths of numbers A0 and B0 are at most ⌈n/2⌉, and the numbers |A0 − A1|, |B0 − B1|, A1 and B1 have length ⌊n/2⌋. For the number T(n) of multiplications we have
T(n) = n², if n < n0;  T(n) = 2T(⌈n/2⌉) + T(⌊n/2⌋), if n ≥ n0.
Assume that 2^(l−1) n0 ≤ n ≤ 2^l n0 holds for some integer l ≥ 1 (l ≈ log n). Then n/2 ≤ 2^(l−1) n0 and
T(n) ≤ 3T(2^(l−1) n0) ≤ 3² T(2^(l−2) n0) ≤ ⋯ ≤ 3^l T(n0) ≤ 2^(log 3 · log n) T(n0) = n^(log 3) T(n0).
The above scheme is using approximately 4n additive operations, which can be reduced to
approximately (7=2) n additive operations.
The improved cost O(n^1.585) of integer multiplication is significant, but it is far from the theoretically possible bound.
There are other multiplication algorithms. The best algorithm is based on the FFT algorithm and it is due to Schönhage and Strassen. It has the complexity O(n log n log log n).
Fast multiplication algorithms are used in Cryptography. Many of them are implemented
in hardware form, as well.
For the minimum number of operations, Winograd gave a lower estimate under very strict
restrictions.
The multiplication is unbalanced if A has m digits, B has n digits and, say, m > n.
One technique to handle this is the so called padding (we append zeros to the shorter
number).
Another technique is to cut A into dm=ne pieces of length n. If the cost of the multiplication
of numbers of length n is T (n), then multiplication AB can be executed in dm=ne T (n) +
O (m + n) operations.
5.4.2. Division
Assume that = 2, A and B are two numbers of length at most n, B = bl 1 2l
2 and bl 1 > 0 (B normalized).
Compute q and r such that A = qB + r, where 0
In fact, q = bA=Bc and r = A mod B = A qB.
There are several algorithms:
1
+
+b1 2+b0 >
r < B.
- naive,
- Svoboda,
- divide and conquer,
- Newton,
- combined methods,
- etc.
The Newton based algorithm is due to Anderson, Earle, Goldschmidt, Powers and Cook. It is based on the relation
A/B = A·(1/B).
If both A and B are of length at most n bits, then it is enough to determine the number 1/B approximately with n bit precision, and using an appropriate correction of A we can determine the quotient q := A div B. The n bit precision fraction 1/B is determined by the Newton method.
The Newton (or Newton-Raphson) method solves equations of the form
f(x) = 0  (f : R → R)
by iteration. Starting from a given initial point x0 ∈ R it has the form
x_{j+1} = x_j − f(x_j)/f'(x_j),  j = 0, 1, 2, ….   (5.8)
If α is a zero of f (that is f(α) = 0), |x0 − α| is small enough, f(x) is smooth and f'(α) ≠ 0, then there is a constant K > 0 such that
|x_{j+1} − α| ≤ K |x_j − α|².
For example, in double precision arithmetic, after achieving the first two correct digits of the zero, the Newton iteration gets the best available precision in at most three iterations.
1/B is the unique solution of the equation
f(x) = B − 1/x = 0.
Then f'(x) = 1/x² and the Newton iteration has the special form
x_{j+1} = x_j − (B − 1/x_j)/(1/x_j²) = 2x_j − B x_j².   (5.9)
So we can define the following algorithm:
Algorithm NewtonDiv.
set x0 = 2^(−m)  (where 1/(2B) < 2^(−m) ≤ 1/B)
for i = 1 : ⌈log n⌉
  x_i = 2x_{i−1} − B x_{i−1}²
end
set q = ⌈A x_{⌈log n⌉}⌉ or q = ⌊A x_{⌈log n⌉}⌋
set r = A − qB
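The algorithm can be tried out in Python; this sketch (an added illustration) uses exact rational arithmetic in place of the truncated 2^i-digit arithmetic of the real algorithm, which keeps the code short at the price of the true complexity bound:

```python
from fractions import Fraction
from math import floor

def newton_div(A: int, B: int):
    """Compute q, r with A = q*B + r, 0 <= r < B, following NewtonDiv:
    iterate x = 2x - B*x^2 towards 1/B, then correct the rounded quotient."""
    n = max(A.bit_length(), B.bit_length())
    m = B.bit_length()                     # then 1/(2B) <= 2^(-m) <= 1/B
    x = Fraction(1, 2 ** m)                # x0 = 2^(-m)
    for _ in range(max(1, (n - 1).bit_length())):   # ceil(log2 n) steps
        x = 2 * x - B * x * x              # x_i = 2*x_{i-1} - B*x_{i-1}^2
    q = floor(A * x)                       # A*x_k undershoots A/B by < 1,
    r = A - q * B
    if r >= B:                             # so at most one upward correction
        q, r = q + 1, r - B
    return q, r
```

For the worked example below, newton_div(12345, 7) returns q = 1763, r = 4.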
For k = ⌈log n⌉ ≥ log n we have
0 ≤ A/B − A x_k < 1/B.
Since A/B = q + r/B, we can write
0 ≤ q + r/B − A x_k < 1/B,
which implies
q − 1/B ≤ q + (r−1)/B < A x_k ≤ q + r/B ≤ q + (B−1)/B < q + 1.
[Number line: A x_k lies between q − 1/B and q + (B−1)/B, i.e., strictly between q − 1 and q + 1.]
Hence either q = ⌈A x_k⌉ or q = ⌊A x_k⌋.
The algorithm requires 2 multiplications and 1 addition per iteration. The number of iterations is ≈ log n. Hence the total cost of the algorithm is ≈ 2T(n) log n, where T(n) is the computational cost of the multiplication of two n bit integers.
It can be shown, however, that it is enough to work in iteration i with 2^i digit precision. It can be proved (by padding) that T(l) + T(j) ≤ T(l + j). Hence the total cost of the multiplications is

T(1) + T(2) + ⋯ + T(2^k) ≤ T(1 + 2 + 2² + ⋯ + 2^k) = T(2^{k+1} − 1) ≤ T(4n).

This means that the complexity of integer division is the same as the complexity of the used multiplication.
If, for example, we use the Karatsuba-Ofman algorithm, then the cost is

O((4n)^{log 3}) = O(4^{log 3} n^{log 3}) = O(n^{log 3}).
If we use the FFT based multiplication of Schönhage and Strassen, then the cost is

O(n log n log log n).
105. Example. Divide 12345 by 7! Then n = 5, log n ≈ 2.322 and 1/14 < 1/8 ≤ 1/7. Hence k = 3 and x_0 = 1/8. The first three Newton iterates (with 10 decimal digit precision) are

  i    x_i
  0    0.125
  1    0.140625
  2    0.1428222656
  3    0.1428571428

The ten-digit precision value of 1/7 is 0.1428571428. We can also see the doubling of correct digits during the iterations. Finally we check ⌊12345 · 0.1428571428⌋ = 1763 and since 12345 − 1763 · 7 = 4, we obtain q = 1763 and r = 4.
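The example is easy to replay in Python. The sketch below follows Algorithm NewtonDiv but, for simplicity, computes the reciprocal in double-precision floating point instead of n-bit fixed-point arithmetic; `newton_div` is an illustrative name, and the extra iterations and final corrections guard against rounding:

```python
import math

def newton_div(A, B):
    """Quotient and remainder of A div B via the Newton iteration
    x_{i+1} = 2*x_i - B*x_i^2, which converges to 1/B.

    Sketch only: doubles stand in for n-bit fixed-point numbers, so the
    code is reliable only for moderate operand sizes.
    """
    assert A >= 0 and B > 0
    m = B.bit_length()            # 2^(m-1) <= B < 2^m, so 1/(2B) <= 2^-m <= 1/B
    x = 2.0 ** (-m)               # starting approximation of 1/B
    n = max(A.bit_length(), 1)
    for _ in range(math.ceil(math.log2(n)) + 4):  # a few extra steps for safety
        x = 2 * x - B * x * x     # the error roughly squares at every step
    q = round(A * x)              # A*x lies within 1/B of A/B
    r = A - q * B
    if r < 0:                     # correct a possible round-up
        q, r = q - 1, r + B
    if r >= B:                    # correct a possible round-down
        q, r = q + 1, r - B
    return q, r
```

For the example above, `newton_div(12345, 7)` returns `(1763, 4)`.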
The Newton method also has several variants in this context.
There are several combined division algorithms.
For example, Burnikel and Ziegler (1998) gave a recursive algorithm whose cost is 2K(n) + O(n log n), if we divide an integer of length 2n by an integer of length n. The function K(n) is the cost of the Karatsuba-Ofman multiplication algorithm for integers of length n.
If we do not need the remainder, the cost is (3/2) K(n) + O(n log n).
5.5. Matrix algorithms
Matrices and matrix operations play a key role in many areas. Computing with large matrices requires large memory space and computation time.
The m × n type matrices are given in the form

A = [ a_11  a_12  ...  a_1j  ...  a_1n ]
    [  .     .          .          .   ]
    [ a_i1  a_i2  ...  a_ij  ...  a_in ]
    [  .     .          .          .   ]
    [ a_m1  a_m2  ...  a_mj  ...  a_mn ]
  = [a_ij]_{i,j=1}^{m,n}   (a_ij ∈ R).

a_ij is the element (entry) of matrix A in row i and column j.
Compact notation: A ∈ R^{m×n}.
The n-dimensional column vectors are given in the form

x = [x_1, x_2, ..., x_n]^T ∈ R^n   (x_i ∈ R),

while the n-dimensional row vectors have the form

x = [x_1, x_2, ..., x_n]   (x_i ∈ R).
5.5.1. Multiplications with matrices and vectors
106. Definition. The inner (or dot) product of two vectors x, y ∈ R^n is defined by

x^T y = [x_1, x_2, ..., x_n] [y_1, y_2, ..., y_n]^T = x_1 y_1 + x_2 y_2 + ⋯ + x_n y_n = Σ_{i=1}^{n} x_i y_i.
The standard sequential algorithm for computing the dot product is

function z = dot(x, y)
  z = x(1) * y(1)
  for i = 2 : n
    z = z + x(i) * y(i)
  end

The cost of this algorithm is n multiplications and n − 1 additions.
107. Definition. The product of two matrices A ∈ R^{m×k}, B ∈ R^{k×n} is C = AB ∈ R^{m×n}, where

c_ij = Σ_{t=1}^{k} a_it b_tj   (i = 1, ..., m; j = 1, ..., n).
Note that c_ij is the dot product of the ith row of A and the jth column of B, that is

c_ij = [a_i1, ..., a_ik] [b_1j, ..., b_kj]^T.

Hence the algorithm for computing AB, if A ∈ R^{m×k} and B ∈ R^{k×n}:
for i = 1 to m do
  for j = 1 to n do
    c_ij = a_i1 * b_1j
    for t = 2 to k
      c_ij = c_ij + a_it * b_tj
    end for t
  end for j
end for i

The cost of this (ijk) algorithm is mnk multiplications and mn(k − 1) additions.
For square matrices, when m = n = k, this cost is n³ multiplications and n³ − n² additions.
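The (ijk) algorithm above translates line by line into Python (a plain sketch for matrices stored as lists of rows; `matmul_ijk` is an illustrative name):

```python
def matmul_ijk(A, B):
    """Standard (ijk) matrix product: C[i][j] = sum_t A[i][t] * B[t][j]."""
    m, k = len(A), len(A[0])
    assert len(B) == k, "inner dimensions must agree"
    n = len(B[0])
    C = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            s = A[i][0] * B[0][j]
            for t in range(1, k):      # k - 1 additions per entry
                s += A[i][t] * B[t][j]
            C[i][j] = s                # mnk multiplications in total
    return C
```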
5.5.2. Strassen’s algorithm
Strassen's algorithm (1969) significantly decreased the O(n³) operation cost of matrix multiplication. His algorithm is based on a clever multiplication of 2 × 2 matrices and the divide and conquer principle.
108. Definition. A matrix A ∈ R^{m×n} is partitioned into 2 × 2 block form, if it is written as

A = [ A11  A12 ] ∈ R^{m×n},
    [ A21  A22 ]

where A_ij ∈ R^{m_i × n_j} (i = 1, 2, j = 1, 2).
The standard matrix operations with partitioned matrices can be executed similarly to the
normal case, if the partitions are operation compatible.
For example, the addition of two partitioned matrices can be done blockwise if they are
identically partitioned.
However, for the multiplication of two partitioned matrices

A = [ A11  A12 ] ∈ R^{m×n},   B = [ B11  B12 ] ∈ R^{n×p},   (5.10)
    [ A21  A22 ]                  [ B21  B22 ]

we have to satisfy the condition

A11 ∈ R^{r×s},   B11 ∈ R^{s×t}.   (5.11)

If so, then

C = [ A11  A12 ] [ B11  B12 ] = [ A11 B11 + A12 B21   A11 B12 + A12 B22 ]
    [ A21  A22 ] [ B21  B22 ]   [ A21 B11 + A22 B21   A21 B12 + A22 B22 ].
Consider the product of two 2 × 2 matrices

A = [ a11  a12 ] ∈ R^{2×2},   B = [ b11  b12 ] ∈ R^{2×2},
    [ a21  a22 ]                  [ b21  b22 ]

which is by definition

C = [ c11  c12 ] = [ a11 b11 + a12 b21   a11 b12 + a12 b22 ]
    [ c21  c22 ]   [ a21 b11 + a22 b21   a21 b12 + a22 b22 ].

This requires 8 multiplications and 4 additions.
Strassen observed that the multiplication of the two 2 × 2 matrices can also be performed in the following way:

m1 = (a12 − a22)(b21 + b22),      c11 = m1 + m2 − m4 + m6,
m2 = (a11 + a22)(b11 + b22),      c12 = m4 + m5,
m3 = (a11 − a21)(b11 + b12),      c21 = m6 + m7,
m4 = (a11 + a12) b22,             c22 = m2 − m3 + m5 − m7.   (5.12)
m5 = a11 (b12 − b22),
m6 = a22 (b21 − b11),
m7 = (a21 + a22) b11,

This requires 7 multiplications and 18 additions. Note that the number of additions significantly increased, while the number of multiplications decreased by one.
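Formulas (5.12) are easy to verify numerically; a small Python check (the names m1, ..., m7 follow the text):

```python
def strassen_2x2(a, b):
    """Multiply 2x2 matrices with 7 multiplications using formulas (5.12)."""
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b
    m1 = (a12 - a22) * (b21 + b22)
    m2 = (a11 + a22) * (b11 + b22)
    m3 = (a11 - a21) * (b11 + b12)
    m4 = (a11 + a12) * b22
    m5 = a11 * (b12 - b22)
    m6 = a22 * (b21 - b11)
    m7 = (a21 + a22) * b11
    return [[m1 + m2 - m4 + m6, m4 + m5],
            [m6 + m7, m2 - m3 + m5 - m7]]
```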
The Strassen multiplication algorithm of large matrices is based on the following observation
(DAC principle).
Assume that n = 2^k and A, B ∈ R^{n×n}. Partition A and B such that

A = [ A11  A12 ],   B = [ B11  B12 ],   A_ij, B_ij ∈ R^{(n/2)×(n/2)}.
    [ A21  A22 ]        [ B21  B22 ]

Using Strassen's formula (5.12), we can compute the product of A and B in the form

M1 = (A12 − A22)(B21 + B22),      C11 = M1 + M2 − M4 + M6,
M2 = (A11 + A22)(B11 + B22),      C12 = M4 + M5,
M3 = (A11 − A21)(B11 + B12),      C21 = M6 + M7,
M4 = (A11 + A12) B22,             C22 = M2 − M3 + M5 − M7.   (5.13)
M5 = A11 (B12 − B22),
M6 = A22 (B21 − B11),
M7 = (A21 + A22) B11,

For the result we need the multiplication of 7 matrices of type (n/2) × (n/2) and the addition of 18 matrices of type (n/2) × (n/2).
Similarly, we reduce the multiplication of the smaller (n/2) × (n/2) type matrices to the multiplication of (n/4) × (n/4) type matrices, and so on, until we obtain 2 × 2 matrices that we multiply with Strassen's formula (5.12).
In other words, we use recursive calls.
If TA(n) and TM(n) denote the number of additive and multiplicative operations in the above recursive algorithm, then

TM(n) = 7 TM(n/2),   (5.14)

where TM(2) = 7. This implies that

TM(n) = 7^k = n^{log2 7} ≈ n^{2.808}.

For counting the additive operations, observe that the multiplications of the (n/2) × (n/2) type matrices also contain additions, beyond the additions of the 18 matrices of type (n/2) × (n/2). Hence

TA(n) = 7 TA(n/2) + 18 (n/2)²,   (5.15)

where TA(2) = 18. This recursion has the exact solution

TA(n) = 6 (7^k − 4^k),

which implies

TA(n) ≤ 6 · 7^k = 6 n^{log2 7} < 6 n^{2.808}.
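The recursive scheme (5.13) can be sketched in Python for n = 2^k (an illustration of the structure, not a performance-tuned implementation; the block helpers and the 1 × 1 base case are choices of this sketch):

```python
def strassen(A, B):
    """Recursive Strassen multiplication for n x n matrices, n a power of 2."""
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]
    h = n // 2
    def blk(M, r, c):  # extract an h x h block starting at row r, column c
        return [row[c:c + h] for row in M[r:r + h]]
    def add(X, Y): return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]
    def sub(X, Y): return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]
    A11, A12, A21, A22 = blk(A, 0, 0), blk(A, 0, h), blk(A, h, 0), blk(A, h, h)
    B11, B12, B21, B22 = blk(B, 0, 0), blk(B, 0, h), blk(B, h, 0), blk(B, h, h)
    # the 7 recursive block products of formula (5.13)
    M1 = strassen(sub(A12, A22), add(B21, B22))
    M2 = strassen(add(A11, A22), add(B11, B22))
    M3 = strassen(sub(A11, A21), add(B11, B12))
    M4 = strassen(add(A11, A12), B22)
    M5 = strassen(A11, sub(B12, B22))
    M6 = strassen(A22, sub(B21, B11))
    M7 = strassen(add(A21, A22), B11)
    C11 = add(sub(add(M1, M2), M4), M6)
    C12 = add(M4, M5)
    C21 = add(M6, M7)
    C22 = sub(add(M2, M5), add(M3, M7))   # = M2 - M3 + M5 - M7
    return [C11[i] + C12[i] for i in range(h)] + [C21[i] + C22[i] for i in range(h)]
```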
A similar matrix multiplication algorithm was later given by Winograd using somewhat fewer additive operations. His basic multiplication formula for 2 × 2 matrices is the following:

S1 = a21 + a22,     S2 = S1 − a11,      S3 = a11 − a21,
S4 = a12 − S2,      S5 = b12 − b11,     S6 = b22 − S5,
S7 = b22 − b12,     S8 = S6 − b21,
P1 = S2 S6,         P2 = a11 b11,       P3 = a12 b21,
P4 = S3 S7,         P5 = S1 S5,         P6 = S4 b22,
P7 = a22 S8,
S9 = P2 + P3,       S10 = P1 + P2,      S11 = S10 + P4,
S12 = S10 + P5,     S13 = S12 + P6,     S14 = S11 − P7,
S15 = S11 + P5,

C = [ S9   S13 ]
    [ S14  S15 ].

This formula requires 7 multiplications and 15 additions.
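Winograd's formulas can be verified the same way (the S and P names follow the text):

```python
def winograd_2x2(a, b):
    """Multiply 2x2 matrices with 7 multiplications and 15 additions
    (Winograd's variant of Strassen's idea)."""
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b
    S1 = a21 + a22; S2 = S1 - a11; S3 = a11 - a21; S4 = a12 - S2
    S5 = b12 - b11; S6 = b22 - S5; S7 = b22 - b12; S8 = S6 - b21
    P1 = S2 * S6; P2 = a11 * b11; P3 = a12 * b21
    P4 = S3 * S7; P5 = S1 * S5; P6 = S4 * b22; P7 = a22 * S8
    S9 = P2 + P3; S10 = P1 + P2; S11 = S10 + P4
    S12 = S10 + P5; S13 = S12 + P6; S14 = S11 - P7; S15 = S11 + P5
    return [[S9, S13], [S14, S15]]
```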
Prove that TA(n) = 6(7^k − 4^k) is the solution of recursion (5.15), if n = 2^k!
5.5.3. Remarks on fast matrix multiplications
1. The applicability of such algorithms is questionable because of their suspected numerical instability and their recursive character. Attila Gáti proved that both the Strassen and the Winograd recursive algorithms are numerically unstable, but Strassen's algorithm is somewhat more stable. The Strassen algorithm has a non-recursive variant that, according to several authors, has proved useful in many applications, even in parallel environments.
Present-day programming languages do not support efficient recursive programming.
The following figure compares the CPU times of the traditional Cayley (or ijk) multiplication, the recursive Strassen algorithm and the built-in optimized BLAS multiplication in Matlab (Intel I7 + Nvidia GeForce GTX 760M).
[Figure: "Matlab vs ijk product vs Strassen" — Matlab CPU time (seconds) against matrix size n (0 to 600) for the built-in Matlab product, the ijk product and the Strassen algorithm.]
We can see that the CPU time of the recursive Strassen algorithm (written as a script file) is huge, while the CPU times of the other two almost disappear. If we compare only the ijk multiplication (written as a script file) and the built-in BLAS multiplication, we obtain the following figure.
[Figure: "Matlab versus ijk product" — Matlab CPU time (seconds) against matrix size n (0 to 600) for the built-in Matlab product and the ijk product.]
Here the difference between the traditional ijk product and the built-in optimized procedure is already visible. The above comparisons are not fair. They show, however,
- that effective recursive programming is not supported today (in Matlab and elsewhere),
- that for certain basic problems we have very efficient programs that are based on software and hardware techniques (even in Matlab).
2. Winograd (1971) proved that the multiplication of 2 × 2 type matrices cannot be done with fewer than 7 multiplications. Thus the Strassen and Winograd multiplications cannot be improved in the above scheme.
3. There is a big "competition" for improving the bound O(n^{log 7}) (log 7 ≈ 2.808) using various techniques (bilinear and trilinear forms, group theoretical methods, etc.). The best theoretical result is due to Coppersmith and Winograd (1987): O(n^{2.376}). The algorithm cannot be used in practice.
4. The big question: what is the best possible result? There are several lower bounds of Ω(n²) size. We can show such a lower bound quite simply. Let

A = [ p1  0  ...  0 ],   B = [ q1  q2  ...  qn ]
    [ p2  0  ...  0 ]        [ 0   0   ...  0  ]
    [ ...           ]        [ ...             ]
    [ pn  0  ...  0 ]        [ 0   0   ...  0  ]

be two n × n type matrices, where p1, ..., pn, q1, ..., qn are pairwise distinct prime numbers (2n different prime numbers). The form of the product AB is

AB = [p_i q_j]_{i,j=1}^{n}.

Since the prime numbers are pairwise distinct, the products p_i q_j are also pairwise distinct. This means n² different numbers and n² multiplications.
5.6. Fast Fourier transform
The fast Fourier transform (FFT) is one of the 10 most important algorithms of the 20th century.
The fast Fourier transform is a fast way of computing the discrete Fourier transform, and it is essentially a fast matrix-vector product, where the matrix has very special properties.
Given a function f(t), its Fourier transform is defined by the improper integral

F(ν) = ∫_{−∞}^{∞} f(t) e^{−i2πνt} dt.   (5.16)
Let Δt be the sampling interval, and let y_k = f(kΔt) (k = 0, 1, ..., n−1) be the sample of f(t).
Assume that f(t) is periodic with the period T = nΔt, that is, y_0 = y_n.
The Fourier transform F(ν) is approximated on [0, T] (one period) by a numerical integral

F~(ν) = ∫_0^{nΔt} f(t) e^{−i2πνt} dt ≈ Δt Σ_{j=0}^{n−1} y_j e^{−i2πνjΔt}.

Let Δν = 1/(nΔt) and ν_k = kΔν. Then

F~(ν_k) ≈ Δt Σ_{j=0}^{n−1} y_j e^{−i2πkj/n}.

The discrete Fourier transform (DFT) of the sequence y_0, y_1, ..., y_{n−1} is the sequence

Y_k = Σ_{j=0}^{n−1} y_j e^{−i2πkj/n}   (k = 0, 1, ..., n−1).
Let y = [y_0, ..., y_{n−1}]^T, Y = [Y_0, ..., Y_{n−1}]^T ∈ C^n and

ω = ω_n = cos(2π/n) − i sin(2π/n) = e^{−2πi/n},

which is an nth complex root of unity (n ≥ 1; an nth zero of the equation z^n − 1 = 0).
[Figure: the 8th roots of unity on the complex unit circle for n = 8.]
We can write the discrete Fourier transform of vector y in the forms

Y_k = Σ_{j=0}^{n−1} ω^{jk} y_j,   k = 0, 1, ..., n−1,   (5.17)

and

Y = F_n y,   (5.18)

where the matrix

F_n = [ 1  1         1           ...  1           ]
      [ 1  ω         ω²          ...  ω^{n−1}     ]
      [ 1  ω²        ω⁴          ...  ω^{2(n−1)}  ]
      [ ...                                       ]
      [ 1  ω^{n−1}   ω^{2(n−1)}  ...  ω^{(n−1)²}  ]
    = [ω^{jk}]_{j,k=0}^{n−1}

is the nth order Fourier matrix.
Note that the indexing of elements is in the range 0 to (n−1).
Observe that

ω^{ℓn+k} = ω^k   (0 ≤ k < n).

So the cost of the nth order discrete Fourier transform is the cost of the multiplication F_n y, provided that the elements of F_n are given.
Using traditional matrix-vector multiplication the cost is n² multiplications and n² − n additions.
109. Example. n = 2, ω_2 = cos π − i sin π = −1,

F_2 = [ 1   1 ]
      [ 1  −1 ].

110. Example. n = 4, ω_4 = cos(2π/4) − i sin(2π/4) = −i,

F_4 = [ 1   1   1   1 ]
      [ 1  −i  −1   i ]
      [ 1  −1   1  −1 ]
      [ 1   i  −1  −i ].
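The definitions can be checked with a direct O(n²) computation in Python (the names `fourier_matrix` and `dft` are illustrative):

```python
import cmath

def fourier_matrix(n):
    """The n-th order Fourier matrix F_n = [w^(jk)], w = e^(-2*pi*i/n)."""
    w = cmath.exp(-2j * cmath.pi / n)
    return [[w ** (j * k) for k in range(n)] for j in range(n)]

def dft(y):
    """Discrete Fourier transform Y = F_n y by direct matrix-vector product."""
    n = len(y)
    F = fourier_matrix(n)
    return [sum(F[k][j] * y[j] for j in range(n)) for k in range(n)]
```

For a constant sample of length 4, only Y_0 = 4 is nonzero, in accordance with (5.17).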
The basis of the fast Fourier transform is the property

ω_{2n}² = ω_n.

Cooley and Tukey (1965) observed first that for 0 ≤ k ≤ 2n − 1, the 2n-order discrete Fourier transform can be decomposed as follows:

[F_{2n} y]_k = Σ_{j=0}^{2n−1} ω_{2n}^{jk} y_j
            = Σ_{j=0}^{n−1} ω_{2n}^{2jk} y_{2j} + Σ_{j=0}^{n−1} ω_{2n}^{(2j+1)k} y_{2j+1}
            = Σ_{j=0}^{n−1} ω_{2n}^{2jk} y_{2j} + ω_{2n}^k Σ_{j=0}^{n−1} ω_{2n}^{2jk} y_{2j+1}
            = Σ_{j=0}^{n−1} ω_n^{jk} y_{2j} + ω_{2n}^k Σ_{j=0}^{n−1} ω_n^{jk} y_{2j+1}
            =: B_k + ω_{2n}^k C_k.   (5.19)
Thus we can reduce the 2n-order Fourier transform to two n-order Fourier transforms (divide and conquer principle).
In addition, note that

B_{k+n} = B_k,   C_{k+n} = C_k,   ω_{2n}^{k+n} = −ω_{2n}^k   (k = 0, ..., n−1).
Assume that n = 2^k and apply the above decomposition recursively!

Algorithm FFT.
Input: y
Output: Y = F_n y
  if n = 1 then return Y = y
  ω ← cos(2π/n) − i sin(2π/n)
  b ← [y_0, y_2, ..., y_{n−2}]^T
  c ← [y_1, y_3, ..., y_{n−1}]^T
  B ← FFT(b)
  C ← FFT(c)
  for j from 0 to n/2 − 1 do
    Y_j ← B_j + ω^j C_j
    Y_{j+n/2} ← B_j − ω^j C_j
The multiplicative cost of the nth order fast Fourier transform is T(n) = O(n log n). This is much better than the traditional algorithm with its cost of O(n²) multiplications.
If n ≠ 2^k, then we have various possibilities to handle the case. The simplest (and most advised) is padding, that is, adding zero observations until the next power of 2.
The FFT algorithm has extraordinary importance in applications. Therefore it has several variants, even in circuit form. In practice, the recursive versions are not used. Instead, equivalent non-recursive forms are used. The original algorithm of Cooley and Tukey is iterative and not recursive.
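Algorithm FFT transcribes directly into Python (a sketch; compare with the Matlab script given at the end of this section):

```python
import cmath

def fft_rec(y):
    """Recursive radix-2 FFT following Algorithm FFT; len(y) must be a power of 2."""
    n = len(y)
    if n == 1:
        return list(y)
    w = cmath.exp(-2j * cmath.pi / n)
    B = fft_rec(y[0::2])       # even-indexed samples
    C = fft_rec(y[1::2])       # odd-indexed samples
    Y = [0] * n
    for j in range(n // 2):    # butterfly: Y_j = B_j + w^j C_j, Y_{j+n/2} = B_j - w^j C_j
        t = (w ** j) * C[j]
        Y[j] = B[j] + t
        Y[j + n // 2] = B[j] - t
    return Y
```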
5.6.1. The Cooley-Tukey radix-2 algorithm
Consider decomposition (5.19) in the case n = 8 and its two further iterations! The first member of the first decomposition contains the data with even indices, while the second member contains the data with odd indices. The next two iterations of the decomposition result in the permutations of indices shown by the tableau below (butterfly scheme).
original   after 1 split   after 2 splits   final order
y0         y0              y0               y0
y1         y2              y4               y4
y2         y4              y2               y2
y3         y6              y6               y6
y4         y1              y1               y1
y5         y3              y5               y5
y6         y5              y3               y3
y7         y7              y7               y7
If the data is arranged in the above form, then we can compute the FFT using the binary tree below (and decomposition (5.19)).

[Figure: a binary tree over the leaves y0, y4, y2, y6, y1, y5, y3, y7, combining pairs level by level.]
For the iterative version we have to solve one last problem: namely, the arrangement of the data according to the above pattern. This can be achieved with bit inversion. This means that the binary form of the index k is reversed. The new binary value is the permuted index.
For the example, this is shown in the next tableau:
index (dec)   index (bin)   inv. ind. (bin)   inv. ind. (dec)
0             000           000               0
1             001           100               4
2             010           010               2
3             011           110               6
4             100           001               1
5             101           101               5
6             110           011               3
7             111           111               7
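The bit-reversal permutation of the tableau is easy to generate (a small sketch; `bit_reverse` and `bit_reversal_order` are illustrative names):

```python
def bit_reverse(k, bits):
    """Reverse the lowest `bits` bits of the index k."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (k & 1)   # shift the collected bits left, append the next bit of k
        k >>= 1
    return r

def bit_reversal_order(n):
    """Input order for the iterative radix-2 FFT on n = 2^b points."""
    b = n.bit_length() - 1
    return [bit_reverse(k, b) for k in range(n)]
```

For n = 8 this reproduces the order 0, 4, 2, 6, 1, 5, 3, 7 of the tableau.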
The best FFT implementation is due to Frigo and Johnson. For the FFTW3 (The Fastest Fourier Transform in the West) software they received the J. H. Wilkinson Prize for Numerical Software in 1999 (the 3rd prize; the prize has been awarded once in a four-year period since 1991). FFTW3 is built into the Matlab software.
The FFTW algorithm is a C program that adapts the standard Cooley-Tukey algorithm to the specific hardware. It was implemented in various parallel environments as well. For details see the web page
www.fftw.org
The next figure shows the CPU times of the recursive algorithm, the matrix-vector multiplication and the built-in fft program (FFTW) in the Matlab R2011b (Windows 7, 64 bit) environment on an Intel I7 processor with random data.
[Figure: "Recursive FFT versus FFTW3" — Matlab CPU time (seconds) against data size n (0 to 18000) for FFTW3 and the recursive FFT.]
The Matlab program of the recursive process:

function y=fft_rec(x)
n=length(x);
w=exp(-2*pi*j/n);
if n==1
  y=x;
else
  m=n/2;
  B=fft_rec(x(1:2:n-1));
  C=fft_rec(x(2:2:n));
  for k=1:m
    y(k,1)=B(k)+w^(k-1)*C(k);
    y(k+m,1)=B(k)-w^(k-1)*C(k);
  end
end
Chapter 6
Turing machines
For defining the concept of algorithm there are several mathematical models. The most important are the following.
- recursive functions (Gödel, 1934)
- Turing machines (Turing, 1936 and partially Post, 1936)
- partially recursive functions (Kleene, 1936)
- -calculus (Church, 1936)
- generalized Turing machines (Turing, 1939, Leeuw, Moore, Shannon, Shapiro, 1956,
Chandra, Stockmeyer, 1976, Kozen, 1976, Burgin, 1992)
- neural networks (McCulloch, Pitts, 1943)
- von Neumann automata (cellular automata) (John von Neumann, 1949)
- Kolmogorov algorithms (1953)
- finite automata (McCulloch, Pitts, 1943, Mealy, 1953, Kleene, 1956, Moore, 1956,
Rabin, Scott, 1959, etc.)
- Minsky machines (Minsky, 1967)
- RAM (Shepherdson, Sturgis, 1963), RASP (Elgot-Robinson, 1964), PRAM
- Petri nets (Petri, 1962)
- vector machines (Pratt, Rabin, Stockmeyer, 1974)
- Post products (Post, 1943)
- normal Markov algorithms (Markov, 1954)
- formal languages (Chomsky, 1956, Backus, 1959, Naur, 1960)
- Boolean circuits (Shannon)
- etc.
We investigate only the Turing machines.
The Turing machine (T) is a machine with a finite number of states that has two main components:
- an infinite machine tape that consists of cells, with a read/write head moving in both directions (L = left, R = right, N = stays),
- a control unit with a finite number of different states.
The in…nite machine tape corresponds to a kind of "unlimited memory".
An elementary operation of the Turing machine depends on the state of the control unit and the content of the cell where the read/write head actually stays.
The components of the elementary operation:
- T changes state,
- writes into the actual cell,
- moves the head by one cell to the right (R) or to the left (L), or leaves it in the actual position (N).
We can describe the Turing machine with the following figure:

[Figure: an infinite tape with cells 0, 1, 2, ..., i, ..., n, ... containing s, x1, x2, ..., xn, #, #, ..., a read/write head over cell i, and a control unit (program) attached to the head.]
Symbol s is the left end symbol of the tape (in fact, starting sign), the head cannot move
left beyond s.
Symbol # denotes the empty space.
It is common to define Turing machines with tapes infinite in both directions.
111. Definition. A Turing machine T is a 7-tuple T = (Q, Σ, #, s, δ, q0, qA), where
(i) Q ≠ ∅ is the finite set of states,
(ii) Σ is the input alphabet,
(iii) # ∉ Σ is the empty sign,
(iv) s ∉ Σ is the left end symbol of the tape,
(v) δ : Q × (Σ ∪ {#, s}) → Q × (Σ ∪ {#, s}) × {L, R, N} is the transition function, which satisfies

δ(p, s) ∈ Q × {s} × {R, N}   (∀p ∈ Q \ {qA}),   (6.1)

(vi) q0 ∈ Q is the start state,
(vii) qA ∈ Q is the accept state.
If T is in state p ∈ Q and reads the symbol Y ∈ Σ, then the elementary operation of T is

δ(p, Y) = (q, X, Z)   (q ∈ Q, X ∈ Σ, Z ∈ {L, R, N}),   (6.2)

where machine T
- changes to state q,
- overwrites symbol Y by the symbol X,
- moves the read/write head in the direction Z.
Condition (6.1) guarantees that machine T cannot overwrite the symbol s and cannot move left beyond s. In this case only δ(p, s) = (q, s, R) or δ(p, s) = (q, s, N) is possible, where q ∈ Q.
We can define more than one accept state qA.
We can define the Turing machines so that the read/write head always moves to the right or to the left.
We can define Turing machines with a tape infinite in both directions. In such a case there is a starting (or reference) cell denoted by 0.
None of the above variants represents an essential difference.
The machine of E. L. Post (1947) is a variant of Turing's machine in which writing and moving the head together in the same elementary operation is not allowed.
Machine T starts the processing of input x ∈ Σ* at the first cell in the start state q0.
If during the process T gets into the accept state qA, then it stops and accepts the input independently of the position of the read/write head.
If T has a state q ≠ qA such that T stops (cannot leave state q), then T rejects the input.
It may happen that T never stops for the given input.
112. Definition. The Turing machine T accepts input (word) x, if T stops in the accept state qA in a finite number of steps. T rejects input x, if T does not stop or stops in a state q ≠ qA.

113. Definition. The language L(T) accepted (recognized) by a Turing machine T is the set of those words that are accepted (recognized) by T, that is

L(T) = {w ∈ Σ* | T accepts w}.

114. Definition. A Turing machine T computes a function F : Σ* → Σ*, if given a string x ∈ Σ* as input, T starts with its start configuration and eventually halts with just F(x) on the tape.
115. Example. (Parity check 1): Construct a Turing machine with just two states that reads a sequence of 1's and stops in state q0 if the number of 1's is even, and stops in state q1 if the number of 1's is odd.
For this machine Σ = {1} and Q = {q0, q1}.
On start the number of 1's is 0 (0 is an even number) and the initial state is q0.
If the machine reads a character 1 in state q0, then the (even) number of 1's changes to an odd number and the machine state changes to q1.
If in state q1 the machine reads a character 1, the (odd) number of 1's changes to even, and the machine state changes to q0.
The state (transition) table of the sought Turing machine:

state   1           #
q0      (q1, 1, R)  (q0, #, N)
q1      (q0, 1, R)  (q1, #, N)
Turing machines can be represented by state diagrams. One elementary operation δ(p, Y) = (q, X, Z) is represented by a directed edge from p to q labeled (Y, X, Z).

[Figure: the state diagram of the previous Turing machine: states q0 and q1, edges labeled (1,1,R) from q0 to q1 and back, and self-loops labeled (#,#,N) on both states.]
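The parity machine can be simulated directly from its state table. The sketch below is a simplified toy: the left end symbol s is omitted, the tape grows only to the right, and halting is detected when a step changes nothing, which is how the (q, #) → (q, #, N) entries above encode stopping; `run_tm` is an illustrative helper name:

```python
def run_tm(delta, q0, tape):
    """Simulate a one-tape Turing machine; stop at a fixed-point transition."""
    tape = list(tape) + ['#']
    q, pos = q0, 0
    while True:
        q2, sym, move = delta[(q, tape[pos])]
        if (q2, sym, move) == (q, tape[pos], 'N'):
            return q, ''.join(tape).rstrip('#')   # nothing changes: machine stops
        tape[pos] = sym
        q = q2
        if move == 'R':
            pos += 1
            if pos == len(tape):
                tape.append('#')                  # the tape is unbounded to the right
        elif move == 'L':
            pos -= 1

# State table of Example 115 (parity check 1):
parity = {
    ('q0', '1'): ('q1', '1', 'R'),
    ('q1', '1'): ('q0', '1', 'R'),
    ('q0', '#'): ('q0', '#', 'N'),
    ('q1', '#'): ('q1', '#', 'N'),
}
```

Running `run_tm(parity, 'q0', '1111')` ends in state q0 (an even number of 1's), while `'111'` ends in q1.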
116. Example. (Parity check 2): Construct a Turing machine that reads 0,1 sequences, writes E if the number of 1's is even, and writes D if the number of 1's is odd.
Set Σ = {0, 1}, Q = {q0, q1, q2}. The construction is based on the following observation.
On start, the number of 1's is even (0 is even).
When reading the symbols 0 and 1, machine T moves to state q1 whenever the number of 1's read so far is even, and to state q2 whenever it is odd. The state table of T is the following:

state   s           0           1           #
q0      (q0, s, R)  (q1, 0, R)  (q2, 1, R)
q1                  (q1, 0, R)  (q2, 1, R)  (q1, E, N)
q2                  (q2, 0, R)  (q1, 1, R)  (q2, D, N)
The state diagram of T is the following.

[Figure: the state diagram of T, with states q0, q1, q2 and edges according to the state table above, including (#,E,N) on q1 and (#,D,N) on q2.]
117. Definition. The time function of a Turing machine T is the function time_T(n) which gives the maximum number of steps over all inputs of length n.

118. Definition. The space function of a Turing machine T is the function space_T(n) which gives the maximum number of cells written on the tape over all inputs of length n.

It is customary to assume that time_T(n) ≥ n and space_T(n) ≥ 1.
6.1. Programming Turing machines
The programming of a Turing machine is simply the design of tape alphabet and the control
unit (state table).
119. Example. Design a Turing machine over the alphabet {0, 1} that moves the read/write head to the right until it finds an empty symbol # and then stops. The state table of the machine:

state   s           0           1           #
q0      (q0, s, R)  (q0, 0, R)  (q0, 1, R)  (q0, #, N)

[Figure: the state diagram: state q0 with self-loops (0,0,R), (1,1,R) and (#,#,N).]
120. Example. Modify the previous example such that if T finds the symbol #, then it moves the read/write head left by one cell, overwrites the (last non-empty) cell with the empty symbol # and then stops.
The problem has a meaning if the length of the input word is at least 1.
We specify this fact by a separate state. We also use a third state to overwrite the last symbol and to stop.
The state table and state diagram are the following:

state   s           0           1           #
q0      (q0, s, R)  (q1, 0, R)  (q1, 1, R)  (q0, #, N)
q1                  (q1, 0, R)  (q1, 1, R)  (q2, #, L)
q2                  (q2, #, N)  (q2, #, N)
[Figure: the state diagram: q0 with self-loop (#,#,N) and edges (0,0,R), (1,1,R) to q1; q1 with self-loops (0,0,R), (1,1,R) and edge (#,#,L) to q2; q2 with self-loops (0,#,N) and (1,#,N).]
The Turing machines of both examples compute functions of type F : {0,1}* → {0,1}*.
In the first case F(x) = x.
In the second case F(x) = y, if x = yα, where α ∈ {0,1}. For y, we have |y| ≥ 0.
Solve the previous two problems under the condition that the Turing machine can move only to the right or to the left, except for a special halt state. Compare the corresponding state diagrams as well!
121. Example. Assume that Σ = {a, b} and assume that the length of the input words is at least 1. Design a Turing machine T that computes the output #x for every input word x ∈ {a,b}* (|x| ≥ 1).
The Turing machine computes the (partial) function F(x) = #x (|x| ≥ 1).
We use four states: q0, q1, q2, q3.
In state q0 machine T reads the first character of input x.
If this character is the symbol a, then T writes the symbol # into the first cell, moves right and changes to state q1.
If the first character is b, then T writes # into the first cell, moves right and changes to state q2.
In the sequel, state q1 denotes the fact that the last character before the actual cell was a. Similarly, state q2 denotes the fact that the last character before the actual cell was b.
If T reads symbol a in state q1, then it writes a (the symbol of the previous cell), moves right and stays in state q1.
If T reads symbol b in state q1, then it writes a (the symbol of the previous cell), moves right, and changes to state q2 (the last cell contained b).
If in state q2 T reads a, then T writes b (the symbol of the previous cell), moves right and changes to state q1.
If in state q2 T reads b, then T writes b (the symbol of the previous cell), moves right, and stays in state q2.
If T reads the symbol # in any of the states q1 and q2, then T writes a or b according to its state, moves one cell right and changes to state q3.
[Figure: two tape snapshots illustrating how the content is shifted right by one cell while the machine alternates between states q1 and q2.]
The state diagram of the Turing machine is the following.

[Figure: state diagram with states q0, q1, q2, q3; edges (a,#,R) and (b,#,R) from q0 to q1 and q2, self-loops (a,a,R) on q1 and (b,b,R) on q2, edges (b,a,R) from q1 to q2 and (a,b,R) from q2 to q1, and edges (#,a,R) from q1 and (#,b,R) from q2 to q3.]
The examples show that it is not easy to program (design) a Turing machine for a given problem.
There are several schemes that help the programming of Turing machines.
Such schemes can be found in the book by Aho, Motwani and Ullman.
Schönhage and others developed the language TPAL for programming and implementing Turing machines on computers.
For small Turing machines (of various kinds) demo programs also exist.
Set Q = {q0, q1, q2}, Σ = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, qA = q0. The Turing machine is defined by the state diagram.
What does it do? What does it do with the words 2718 and 271828?
Design a Turing machine that adds two positive integers represented in unary code and writes the result in unary form!
6.2. Generalizations of the Turing machine
- k-tape (k ≥ 1) Turing machines (Hartmanis, Stearns),
- Turing machines with an infinite tape in both directions,
- Turing machines with 2D tapes,
- Turing machines with several read/write heads on each tape (parallel T machines),
- non-deterministic Turing machines,
- random Turing machines,
- oracle Turing machines,
- etc.
The scheme of k-tape Turing machines:

[Figure: a program (control unit) connected to k tapes, tape 1 to tape k, each of the form s, t, t, ...]
The control unit handles the tapes simultaneously.
The oracle Turing machine (Turing, 1939) is a multitape Turing machine that has a special oracle tape and an oracle function h : B* → B*, which gives the answers of the oracle.

[Figure: a control unit connected to an input tape, work tapes, an oracle tape and an output tape.]

If we write a word z on the oracle tape, the Turing machine sends a signal to the oracle and the oracle substitutes z with the word h(z) (the answer of the oracle).
During the processing we can ask the oracle as many times as we like.
122. Definition. The Turing machines A and B are equivalent if for all inputs x ∈ Σ*, the following properties hold:
(i) A accepts x if and only if B accepts x,
(ii) A rejects x if and only if B rejects x,
(iii) A does not stop on x if and only if B does not stop on x.
123. Theorem. The multitape and one-tape Turing machines are equivalent.
The equivalence also holds for the other Turing machine variants as well.
6.3. Equivalence of computational models
The mentioned models are equivalent with the one-tape Turing machines and consequently
they are equivalent with each other.
We quote some sharper theorems on the equivalence of one-tape and multitape Turing
machines.
124. Theorem. Let T be a k-tape Turing machine. There exists a one-tape Turing machine T' such that L(T) = L(T') and time_{T'}(n) ≤ 2 time_T(n)², space_{T'}(n) ≤ space_T(n) + n.

If we use two tapes instead of one tape, the simulation will be faster.

125. Theorem. For any k-tape Turing machine T we can construct a 2-tape Turing machine T' such that L(T) = L(T') and time_{T'}(n) = O(time_T(n) log time_T(n)), space_{T'}(n) ≤ space_T(n) + n.
6.4. Universal Turing machines
So far we saw that for different algorithms (problems) we had to design different Turing machines. Turing introduced the concept of the universal Turing machine that simulates all Turing machines.
The input "T, x" of the universal Turing machine consists of two components:
- the description (code) of T,
- the input x ∈ Σ* of T.
The universal Turing machine - using the description of T - simulates T step by step on
input x.
Turing proved the existence of universal Turing machines. It follows that any problem solvable by Turing machines can be solved by a universal Turing machine as well.
We first give a description (code) of a Turing machine T.
Let T = (Q, Σ, #, s, δ, q0, qA) be an arbitrary Turing machine, where

Q = {q0, q1, ..., qm, qA},   Σ ∪ {#, s} = {A1, A2, ..., Ar}.

We use binary code. Hence the codes of the symbols are the following:

k(qi) = 1 0^{i+1} 1   (i = 0, 1, ..., m),
k(qA) = 1 0^{m+2} 1,
k(Aj) = 11 0^j 11   (j = 1, ..., r),
k(N) = 111 0 111,
k(R) = 111 0² 111,
k(L) = 111 0³ 111.

Note that 0^j is a sequence 00...0 of length j!
The code of the state transition δ(p, A_l) = (q, A_j, Z) is given by

k(δ(p, A_l)) = # k(p) k(A_l) k(q) k(A_j) k(Z) #

for all possible states p ∈ Q, indices l, j ∈ {1, ..., r} and moves Z ∈ {L, R, N}.
The code of machine T is defined as follows:
We first give the number |Q| of states, then the number |Σ| of symbols (alphabet), and then list all state transitions. Thus the code of T is

k(T) = # 0^{m+2} # 0^r ## k(state transition 1) # k(state transition 2) # ....

Hence each Turing machine T is described by a word w. For any word w, there can be at most one Turing machine T_w whose code is w. Having the code w, we can obtain the characteristics of T_w by an algorithm.
126. Theorem. There exists a 3-tape universal Turing machine U such that if w and x are words and T_w exists, then
(i) the universal machine U accepts the input w#x if and only if T_w accepts the input x;
(ii) the universal machine U rejects the input w#x if and only if T_w rejects the input x;
(iii) the universal machine U does not stop on input w#x if and only if T_w does not stop on input x.
Denote the state transition δ(p, a) = (q, b, Z) by the 5-tuple (p, a, q, b, Z). Machine T_w is in fact a sequence of 5-tuples (p, a, q, b, Z).
The 3-tape universal Turing machine (interpreter) U is defined as follows:
- The first tape (S1) of U contains the input x of T_w.
- The second tape (S2) of U contains the description of T_w, that is, the sequence of all 5-tuples (p, a, q, b, Z).
- The third tape (S3) of U contains the pair (p, a), where p is the actual state of T_w and a is the symbol read from the first tape in state p.
[Figure: the simulated machine T_w with its tape S1 containing x, and the universal machine U with tapes S1 (containing x), S2 (containing w) and S3 (containing the pair (p,a)).]
On start, the third tape contains only the pair (q0, s).
The main loop of the "program" of the universal Turing machine U is the following:

while p ≠ qA do
begin
  1. Find the 5-tuple (p, a, q, b, Z) that corresponds to (p, a) on tape S2!
  2. Write b to the first tape S1, execute move Z on the read/write head of tape S1, and read the next symbol a' of tape S1!
  3. Write the pair (q, a') to tape S3!
end
The halting condition depends on the de…nition of Tw . The universal Turing machine makes
only copying and sample …tting operations.
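The main loop above can be sketched in Python. Here the dictionary `transitions` plays the role of tape S2 (the list of 5-tuples) and the pair (state, symbol) that of tape S3; all names are illustrative, and the sketch uses a single one-way-infinite tape.

```python
def universal_tm(transitions, x, q0='q0', qA='qA', blank='#'):
    """Interpret a machine given as a dict (p, a) -> (q, b, Z)."""
    tape = list(x) if x else [blank]
    pos, state = 0, q0
    while state != qA:
        a = tape[pos] if pos < len(tape) else blank
        q, b, Z = transitions[(state, a)]    # 1. find the matching 5-tuple
        if pos < len(tape):                  # 2. write b and move the head
            tape[pos] = b
        else:
            tape.append(b)
        pos += {'L': -1, 'R': 1, 'N': 0}[Z]
        state = q                            # 3. record the new state
    return ''.join(tape)

# A toy machine that rewrites every a to b and accepts at the blank:
rules = {('q0', 'a'): ('q0', 'b', 'R'), ('q0', '#'): ('qA', '#', 'N')}
```

A missing (state, symbol) key corresponds to a rejecting computation; a real interpreter would also have to detect non-halting runs, which, as shown later, cannot be done in general.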
Turing proved the existence of a one-tape universal Turing machine.
127. Theorem. (Shannon, 1956): There exists a universal Turing machine that has only 2 states and 2 symbols ({0, 1}).
Shannon also proved that there is no universal Turing machine using only one state.
Minsky (1962) constructed a universal Turing machine with 7 states and 4 symbols.
Since then several "small" universal Turing machines have been constructed (see, e.g., Wikipedia, WolframMathWorld, etc.).
The Post machine is a "special case" of the Turing machine (writing and moving are not possible at the same time).
Aanderaa and Fischer proved that there is no universal Post machine with 2 states.
The result indicates that there is a dependence between the computational model and the universality property.
Chapter 7
Algorithmic decidability and computability
The Turing machine is considered the most general computational model (Church–Turing hypothesis). As of now, we know of no computational model that can be simulated on today's computers and would be able to perform computations that are not possible on Turing machines.
This does not mean that other computational models cannot be faster or better in some sense than the Turing model.
The solvability of a problem by a Turing machine is, in general, a characterization of the problem. So if a problem can be solved by a Turing machine, then we consider it solvable. If this is not the case, we consider the problem unsolvable.
There are several otherwise well-defined problems that cannot be solved by a Turing machine.
7.1. Recognition and decision of languages by Turing machines
Given a language and a Turing machine, we can raise two questions:
Does the Turing machine recognize the language (is the language recursively enumerable)?
Does the Turing machine decide the language (is the language recursive)?
The latter kind of Turing machine is considered the model of an algorithm.
We note again that using languages we can describe the usual computational problems. Therefore the Turing model has a kind of generality.
128. Definition. The Turing machine T accepts the input (word) x, if T stops in the accept state qA in a finite number of steps. T rejects the input x, if T does not stop or stops in a state q ≠ qA.
129. Definition. The language L(T) accepted (recognized) by a Turing machine T is the set of those words that are accepted (recognized) by T, that is,
L(T) = {w ∈ Σ* | T accepts w}.
130. Definition. A language L is recursively enumerable (Turing recognizable), if there exists a Turing machine T such that L = L(T). The set
L_RE = {L(T) | T is a Turing machine}
is said to be the class of recursively enumerable languages.
By a decision problem (Σ, L) we mean that we have to decide for any word x ∈ Σ* whether x ∈ L or x ∉ L holds, where L ⊆ Σ* is a language.
The Turing machine T recognizing a language L ∈ L_RE is only a half algorithm for deciding L, because it may happen that T does not stop for an input w ∈ L^C.
"Yes" if xεL
T recognizes L
x
131. Definition. A language L ⊆ Σ* (decision problem (Σ, L)) is said to be recursive (Turing decidable), if there exists a Turing machine T for which L = L(T) and T stops in a finite number of steps for any input x ∈ Σ* such that
(i) if x ∈ L, then T stops in state qA,
(ii) if x ∉ L, then T stops in a state q ≠ qA.
If both conditions (i) and (ii) hold, we say that T stops on all inputs (T always stops). Such a Turing machine is the formal model of an "algorithm".
"Yes" if xεL
T decides L
x
"No" if xεLC
Deciding a language is more stringent than recognizing it.
132. Definition. The set
L_R = {L(T) | T always stops}
is the class of recursive (algorithmically decidable) languages.
By the de…nition we clearly have the following results.
133. Lemma. If a language L is recursive (decidable), then it is also recursively enumerable, that is,
L_R ⊆ L_RE. (7.1)
134. Lemma. If a language L is recursive (decidable), then L^C is also recursive.
135. Definition. A Turing machine T computes a function F : Σ* → Σ*, if given a string x ∈ Σ* as input, T starts in its start configuration and eventually halts with just F(x) on the tape. The function F : Σ* → Σ* is computable (recursive) if there exists a Turing machine which computes the function F.
In case of decision problems (Σ, L), the algorithm computes the characteristic function of the language L.
The difference between Turing recognizability and Turing decidability is shown by the following simple example.
136. Example. Let L = {a^{2n} | n ≥ 0} be the language of words aa...a whose length is even. Design a two-state Turing machine that recognizes the language L.
The machine, starting from s, moves to the right until it finds the first empty symbol #. During the processing it switches between the states q0 (even) and q1 (odd). The only accept state is q0. The corresponding state table and state diagram are
state \ symbol | s          | a          | #
q0             | (q0, s, R) | (q1, a, R) | (q0, #, N)
q1             |            | (q0, a, R) | (q1, #, N)

[State diagram: q0 loops on (s,s,R); q0 and q1 swap on (a,a,R); both q0 and q1 halt on (#,#,N).]
This machine always stops.
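A quick way to check the behaviour of this machine is a direct simulation. The sketch below replays the state table; the start symbol s leaves the state unchanged, so it is omitted.

```python
def accepts_even_as(word):
    """Simulate the two-state machine: toggle q0/q1 on each a, halt on #."""
    state = 'q0'
    for c in word + '#':        # the tape ends with the blank symbol #
        if c == 'a':
            state = 'q1' if state == 'q0' else 'q0'
        else:                   # reading # halts the machine
            break
    return state == 'q0'        # q0 is the only accept state
```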
Modify this Turing machine T as follows:
- T stops on the symbol # if the number of a's is even,
- if the number of a's is odd, then it continues moving to the right.
The new state table and state diagram:

state \ symbol | s          | a          | #
q0             | (q0, s, R) | (q1, a, R) | (q0, #, N)
q1             |            | (q0, a, R) | (q1, #, R)

[State diagram: as before, but q1 now moves right on (#,#,R) instead of halting.]
Modify the last Turing machine as follows:

state \ symbol | s          | a          | #
q0             | (q0, s, R) | (q1, a, R) | (q0, #, N)
q1             |            | (q0, a, R) | (q1, #, L)

What does this machine do? Does it recognize the language L = {a^{2n} | n ≥ 0}? And does it decide it?
Modify the last Turing machine as follows:
state \ symbol | s          | a          | #
q0             | (q0, s, R) | (q1, a, R) | (q0, #, N)
q1             |            | (q0, a, R) | (q0, #, L)

What does this machine do? Does it recognize the language L = {a^{2n} | n ≥ 0}? And does it decide it?
The classes of recursively enumerable and recursive languages give a classification of formal languages that is different from that of Chomsky.
They also characterize Turing's concept of an algorithm.
A language is recursively enumerable if there exists a (Turing) machine (algorithm) that recognizes the words of the language in a finite number of steps.
A language is recursive if there exists a (Turing) machine that not only recognizes the words of the language but can decide, in a finite number of steps, whether any given word is in the language.
There exist languages that are not recursively enumerable.
The recursive languages are a proper subset of the recursively enumerable languages.
137. Definition. A single-tape Turing machine is called a linear bounded automaton, if the machine is never allowed to use cells outside the region of cells where the input is first placed.
138. Theorem. If L is a context-sensitive language, then there exists a linear bounded automaton that accepts L.
139. Theorem. If the language L is accepted by a linear bounded automaton, then L is context-sensitive.
The two results imply that L is context-sensitive if and only if it is accepted by a linear bounded automaton.
It also follows that context-sensitive languages are recursively enumerable.
140. Theorem. Every recursively enumerable language is an unrestricted language.
141. Theorem. Every unrestricted language is recursively enumerable.
The unrestricted languages are also called grammatically generated languages.
142. Theorem. A function f : Σ* → Σ* is computable (recursive) if and only if it is grammatically computable.
The unrestricted languages are identical with the recursively enumerable languages.
Hence the Chomsky classes are all recursively enumerable, and these classes can be recognized by Turing machines.
[Diagram: the Chomsky hierarchy and the corresponding automata — unrestricted languages (L0): Turing machine; context-sensitive languages (L1): linear bounded automaton; context-free languages (L2): pushdown automaton; regular languages (L3): finite deterministic automaton.]
The following result shows that the recursively enumerable languages form a proper subset of all languages.
143. Theorem. There exists a language L1 ⊆ Σ* which is not recursively enumerable.
A Turing machine T can be given by a finite word of symbols. Hence the cardinality of the set of all Turing machines is countably infinite (enumerably infinite), and we can list all Turing machines as follows:
M1, M2, ....
Denote by L(Mi) the recursively enumerable language recognized by Mi. Hence we have the corresponding list of recognized languages:
L(M1), L(M2), ....
These languages are also countably many. Σ* is countably infinite, but its power set 2^{Σ*} = {X | X ⊆ Σ*} is not. This alone shows the existence of languages that are not recursively enumerable. However, we now construct one such language. Since Σ* is enumerably infinite, we enlist the words of Σ* in the lexicographic ordering:
w1, w2, ....
Consider the following infinite table (matrix), in which we record whether the word wi is an element of the language L(Mj) or not (the particular ∈ / ∉ entries shown are illustrative):

      | L(M1) | L(M2) | ... | L(Mi) | L(Mi+1) | ...
w1    |   ∉   |   ∈   |     |   ∉   |    ∈    |
w2    |   ∈   |   ∉   |     |   ∈   |    ∉    |
...   |       |       |     |       |         |
wi    |   ∉   |   ∈   |     |   ∉   |    ∈    |
wi+1  |   ∈   |   ∉   |     |   ∈   |    ∈    |
...   |       |       |     |       |         |
Language L1 is defined as follows:
wi ∈ L1 ⟺ wi ∉ L(Mi)   (i = 1, 2, ...).
Language L1 cannot be recursively enumerable, because all recursively enumerable languages are listed above, and for any word wi exactly one of wi ∈ L(Mi) and wi ∈ L1 holds; hence L1 differs from every L(Mi).
We can classify the generative languages over Σ as follows:
[Diagram: all languages over Σ, split into the recursively enumerable languages (= unrestricted languages, L0) and the languages that are not recursively enumerable.]
Next we investigate the recursive (decidable) languages.
144. Theorem. A language L is recursive (decidable) if and only if both L and L^C are recursively enumerable.
Proof (sketch): If L is recursive, then L^C is also recursive. Conversely, assume that L and L^C are recursively enumerable. Then there exist Turing machines T1 and T2 such that L = L(T1) and L^C = L(T2). Let T3 be a Turing machine that runs T1 and T2 alternately and observes which of them accepts the word x. Since x belongs to exactly one of L and L^C, one of the two machines eventually accepts, and then we know which language contains x. From the three machines we build one Turing machine that decides L.
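The combined machine T3 can be sketched as a dovetailing loop: give the two semi-deciders an ever-growing step budget until one of them accepts. The two `accepts_*_within(x, budget)` predicates below are illustrative stand-ins for step-bounded simulations of T1 and T2.

```python
def decide(x, accepts_L_within, accepts_Lc_within):
    """Decide L by dovetailing recognizers of L and its complement.

    Each argument is a predicate (x, budget) -> bool simulating at most
    `budget` steps of a Turing machine and reporting acceptance.  Since x
    lies in exactly one of L and L^C, one recognizer accepts after finitely
    many steps, so the loop always terminates.
    """
    budget = 1
    while True:
        if accepts_L_within(x, budget):
            return True           # x in L
        if accepts_Lc_within(x, budget):
            return False          # x in L^C
        budget *= 2

# Toy recognizers for L = words of even length (accept after |x| steps):
in_L = lambda x, b: len(x) % 2 == 0 and b >= len(x)
in_Lc = lambda x, b: len(x) % 2 == 1 and b >= len(x)
```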
Earlier we defined the language L1, which is not recursively enumerable. Define its complementary language:
L2 = {wi | wi ∈ L(Mi)} = L1^C. (7.2)
145. Theorem. Language L2 is recursively enumerable, but not recursive (decidable).
There is a (universal) Turing machine that accepts L2, hence L2 is recursively enumerable. But its complementary language L1 is not recursively enumerable, hence L2 cannot be recursive.
146. Theorem. Every context-free language is recursive (Turing-decidable).
There are recursive languages that are not context-free. Hence we have the following classification of languages.
[Diagram: all languages over Σ — the recursively enumerable languages contain the recursive (decidable) languages, which in turn contain the context-free languages and, inside them, the regular languages; outside lie the languages that are not recursively enumerable.]
7.2. Undecidable problems
We consider decision problems that cannot be decided by Turing machines, which means that the language defining the problem is not recursive.
If a decision problem cannot be decided, then the characteristic function of the related language is not computable (recursive).
7.2.1. Undecidability of the universal language
Consider the following decision problem:
Given a Turing machine T and a word x ∈ Σ*. Decide whether T accepts x!
The most obvious idea is to run T on x.
This works if T accepts x within a reasonable time limit, that is, if x ∈ L(T).
If this is not the case, then we do not know whether the execution merely takes long or T never stops.
To solve the problem we would need a "super algorithm". But there is no such algorithm.
Define the universal language:
Lu = {w#x ∈ Σ* | Tw is a Turing machine and x ∈ L(Tw)}. (7.3)
Language Lu is the language of universal Turing machines: its words are exactly those accepted by a universal Turing machine.
For any universal Turing machine U , Lu = L (U ). This implies that Lu is recursively
enumerable, that is Lu 2 LRE .
147. Theorem. (Turing, 1936): Lu is recursively enumerable, but not recursive (decidable).
7.2.2. The halting problem
The problem is related to the finiteness of computations.
Assume that we have a "wonder" program (written in some language), that has the following
properties:
(i) The input of the wonder program is any program P (written in the same language) and
the input X of P ;
(ii) The wonder program always decides (correctly) if P stops on X (Yes) or P does not stop
on X (No).
Call the wonder program halts(P, X) and define the following program:
diagonal(X)
a: if halts(X, X) then goto a else halt
Program diagonal(X) works in the following way:
If halts decides that program X with input X stops, then program diagonal enters an infinite loop. Otherwise it stops.
Our next question: does diagonal(diagonal) stop?
Well, it stops if and only if the call halts(diagonal, diagonal) results in the answer "No".
This means that it stops if and only if it does not stop.
This is a contradiction, and consequently no such wonder program can exist.
So there cannot be a (wonder) program that decides whether an arbitrary program stops or enters an infinite loop.
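The argument can be replayed in Python with any candidate implementation of the hypothetical halts(P, X): whatever boolean answer it gives for the pair (diagonal, diagonal), that answer is wrong.

```python
def make_diagonal(halts):
    """Build the 'diagonal' program from a candidate halts(P, X)."""
    def diagonal(X):
        if halts(X, X):
            while True:          # claimed to halt -> loop forever
                pass
        return 'halted'          # claimed to loop -> halt immediately
    return diagonal

# A candidate that always answers "does not halt" is refuted at once:
never = lambda P, X: False
d = make_diagonal(never)
# never(d, d) says d(d) does not halt, yet d(d) returns immediately.
```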
The halting problem is de…ned as follows.
Given a Turing machine T and a word x ∈ Σ*. Decide whether T stops on x!
Here we require more than in the case of the universal language problem, where in case x ∉ L(Tw) the Turing machine need not stop.
The language of the halting problem:
Lh = {w#x ∈ Σ* | Tw is a Turing machine and Tw stops on x}. (7.4)
148. Theorem. Language Lh is recursively enumerable, but not recursive (decidable).
Hence
Lh ∈ L_RE \ L_R. (7.5)
It follows that there is no algorithm (always stopping Turing machine) that can determine
if an arbitrary Turing machine stops on an arbitrary input.
We can now classify the languages as follows:
[Diagram: all languages over Σ — the recursively enumerable languages, containing Lu, Lh and L2, extend beyond the recursive (algorithmically decidable) languages; the remaining languages are undecidable.]
Aanderaa and Fischer (1967) proved that the halting problem can be solved for two-state Post machines. Hence the halting problem and universality are interrelated.
There are several Turing-undecidable problems. We cite the following ones.
149. Theorem. The problem of determining whether a given Turing machine’s language is
empty is undecidable.
A property of languages is called trivial if either every language has this property or none
of them has it. A non-trivial property means that some languages have this property while
others do not.
150. Theorem. (Rice) Let P be any nontrivial property of the languages. The problem of
determining whether a given Turing machine’s language has property P is undecidable.
7.3. Further undecidable problems
We show some problems that indicate the big variety of such problems.
Post correspondence problem (PCP(n) problem):
Given a finite alphabet Σ and a sequence of paired words
(u1, v1), (u2, v2), ..., (un, vn) ∈ Σ* × Σ*.
The question is whether there exists a sequence of indices {ij} (1 ≤ ij ≤ n) such that
u_{i1} ... u_{ik} = v_{i1} ... v_{ik}?
The problem is decidable for n = 1, 2, but it is undecidable for large n.
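Because PCP is undecidable, the best a program can do in general is search: enumerate index sequences up to some cutoff. The bounded brute-force sketch below is only a semi-decision procedure; the cutoff `max_len` and the sample instance are illustrative choices.

```python
from itertools import product

def pcp_solution(pairs, max_len=6):
    """Search for indices i1..ik with u_i1...u_ik == v_i1...v_ik."""
    n = len(pairs)
    for k in range(1, max_len + 1):
        for seq in product(range(n), repeat=k):
            top = ''.join(pairs[i][0] for i in seq)
            bottom = ''.join(pairs[i][1] for i in seq)
            if top == bottom:
                return list(seq)
    return None                   # no solution found up to the cutoff

# A solvable instance (a solution of length 5 exists):
instance = [('a', 'ab'), ('b', 'ca'), ('ca', 'a'), ('abc', 'c')]
```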
Post k-tag systems (1921):
Given a finite alphabet Σ, an integer number k ≥ 1, a map P : Σ → Σ* and an initial word ω0 ∈ Σ*. The 4-tuple T = (Σ, k, P, ω0) is called a Post k-tag system (machine).
The machine starts with the initial word ω0 and generates a sequence of words {ωi} as follows:
ωi+1 = u P(σ), if ωi = σvu, where σ ∈ Σ, v ∈ Σ^{k−1}.
Here Σ^{k−1} denotes the set of words of length k − 1.
The sequence is computed until the length of the obtained word is less than k, or its first character (letter) equals a special halting symbol.
If neither of these happens, then the sequence either becomes periodic or the length of the words increases to infinity.
151. Example. Let T0 = ({0, 1}, 3, {0 → 00, 1 → 1101}, ω0) and ω0 = 001101. Then the generated sequence is
001101
10100
001101
Thus we obtained the initial word again. Hence the sequence is cyclic.
152. Example. Let T0 = ({0, 1}, 3, {0 → 00, 1 → 1101}, ω0) and ω0 = 000000. Then the generated sequence is
000000
00000
0000
000
00
This sequence stops.
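Both examples can be reproduced with a few lines of Python; the cycle detection and the step cap below are illustrative safeguards.

```python
def run_tag(k, P, w, max_steps=100):
    """Generate the word sequence of the k-tag system (Sigma, k, P, w)."""
    seq = [w]
    for _ in range(max_steps):
        if len(w) < k:           # halting condition: word shorter than k
            break
        w = w[k:] + P[w[0]]      # drop k letters, append P(first letter)
        seq.append(w)
        if w == seq[0]:          # returned to the initial word: cyclic
            break
    return seq
```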
Cocke and Minsky proved that 2-tag Post machines can simulate all Turing machines.
The halting problem of Post k-tag systems is to determine whether a given k-tag system T stops for a given initial word ω0.
Wang proved that this halting problem is decidable for k = 1, and that there are 2-tag Post systems for which the halting problem is undecidable.
The 3x + 1 problem of Collatz (around 1930): Let x ∈ N and
f(x) = x/2, if x is even; 3x + 1, if x is odd.
[Plot: the Collatz function f(x) for x = 0, ..., 20.]
(Collatz) Consider the iteration
x_{n+1} = f(x_n)   (n = 0, 1, 2, ...).
For every x0 ∈ N, does there exist an i ∈ N such that xi = 1?
Note that if xi = 1, then xi+1 = 4, xi+2 = 2 and xi+3 = 1, and we obtain a cycle.
An equivalent formulation of the problem is the following.
Does the following program stop for every integer m ∈ N?
n := m
while n > 1 do
if (n is even) then
n := n=2
else
n := 3n + 1
endif
endwhile
It is conjectured that the answer is yes, but it is not yet proved.
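The loop is straightforward to run for any particular m, which is exactly why the conjecture is so tantalizing: every tested starting value reaches 1, but no proof covers all of them.

```python
def collatz_steps(m):
    """Iterate n -> n/2 (even) or 3n+1 (odd) until n = 1; count steps."""
    n, steps = m, 0
    while n > 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps
```

For example, m = 6 gives the trajectory 6, 3, 10, 5, 16, 8, 4, 2, 1 in eight steps.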
Hilbert's 10th problem (Diophantine equations):
Given an n-variable polynomial p(x1, x2, ..., xn) with integer coefficients (n ≥ 2). Hilbert (1900) asked whether there exists a "general method" to determine the integer solutions of the equation
p(x1, x2, ..., xn) = 0.
In other words, decide in a finite number of steps (operations) whether the equation p(x1, x2, ..., xn) = 0 has an integer solution.
153. Example. The equation
x1^2 + x2^2 + 1 = 0
has no integer solution.
154. Example. The equation
x1 x2 + x1 − x2 − 1 = 0
has infinitely many integer solutions. For example, if x1 = 1 then x2 ∈ Z is arbitrary.
The problem for univariate polynomials
p(z) = a0 + a1 z + a2 z^2 + ... + a_{n−1} z^{n−1} + a_n z^n = 0
with integer coefficients is decidable.
By Cauchy's theorem, all zeros of p(z) are included in the complex disk
{ z ∈ C : |z| ≤ 1 + max_{0 ≤ i ≤ n−1} |ai| / |an| }.
The integer solutions, if they exist, must be in the real interval
[ −(1 + max_{0 ≤ i ≤ n−1} |ai| / |an|),  1 + max_{0 ≤ i ≤ n−1} |ai| / |an| ].
Substituting all integer numbers of this interval into the polynomial, we can decide the question with a finite number of arithmetic operations.
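This decision procedure for the univariate case is easy to implement. The sketch below takes the coefficient list a0, ..., an and tests every integer inside the Cauchy bound.

```python
def integer_roots(coeffs):
    """Return all integer roots of a0 + a1*z + ... + an*z**n (an != 0)."""
    an = abs(coeffs[-1])
    bound = 1 + max(abs(c) for c in coeffs[:-1]) / an   # Cauchy bound
    B = int(bound)
    return [z for z in range(-B, B + 1)
            if sum(c * z**i for i, c in enumerate(coeffs)) == 0]
```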
However, for n ≥ 2 variables this is not the case.
Matiyasevich (1970, at age 23) proved that the problem cannot be decided algorithmically.
7.4. Some concepts and results of complexity theory
We study Turing solvable (decidable) problems and the following questions:
- characterization of the complexity of a given problem,
- selection of the better (best) method.
We consider the following measures:
- computational time,
- memory need.
The complexity of a given problem is characterized by lower and upper bounds on the cost of the algorithms that solve the problem.
The complexity of a problem and the efficiency of its solvers are usually interrelated.
There are several surprising results.
1. There are problems for which optimal algorithms do not exist.
If the problem is to recognize a language L efficiently in time, then we seek an algorithm (Turing machine) T for which the time function time_T(n) grows slowly with n and is perhaps the best possible. Surprisingly, this is not always the case, as shown by the following results.
155. Theorem. (Acceleration theorem) There exists a language L such that
(i) L is recognizable with a Turing machine T such that time_T(n) is finite for every n;
(ii) for any Turing machine T that recognizes L, there is a Turing machine T' such that L = L(T') and time_{T'}(n) = O(log time_T(n)).
156. Theorem. If L = L(T) and time_T(n) ≤ cn (c is a constant), then for every ε > 0 there exists a Turing machine T' such that L = L(T') and time_{T'}(n) ≤ n(1 + ε) (n ≥ n0(ε)).
Consequently there is no best algorithm for this problem.
2. There exist solvable problems with arbitrarily high complexity.
Consider the following (Ackermann-type) function:
φ(1) = 2,   φ(n + 1) = 2^{φ(n)}.
It is clear that φ(n) is a tower of exponents,
φ(n) = 2^{2^{...^{2}}}   (a tower of n twos).
This grows faster than 2^n (the exponential function), n! or n^n. Matlab computations give the following values:

n    | 1 | 2 | 3  | 4     | 5
φ(n) | 2 | 4 | 16 | 65536 | Inf
2^n  | 2 | 4 | 8  | 16    | 32
n!   | 1 | 2 | 6  | 24    | 120
n^n  | 1 | 4 | 27 | 256   | 3125
[Plot: logarithmic-scale comparison of φ(n), n^n, n! and 2^n for n = 1, ..., 5.]
If we write φ(n) and n in the unary ("1") number system, then for any Turing machine T that computes φ, the time function time_T(n) is at least φ(n).
The situation remains the same if we write φ in a different number system.
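The explosive growth of φ is easy to verify numerically. Since Python's integers are unbounded, φ(5) is actually computable here, unlike in the Matlab table above where it overflows to Inf; it has 19729 decimal digits.

```python
def phi(n):
    """phi(1) = 2, phi(n+1) = 2**phi(n): a tower of n twos."""
    v = 2
    for _ in range(n - 1):
        v = 2 ** v
    return v
```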
The following general results are true.
157. Theorem. (Cejtin) For any computable function f : N → N (Σ = {0, 1}), there is a property Φ of words in Σ* such that
a) there is a Turing machine that decides for any word p ∈ Σ* whether p has the property Φ or not;
b) for any Turing machine T that recognizes the property Φ,
time_T(|p|) > f(|p|)
holds for infinitely many words p.
158. Theorem. (Rabin) For any computable function f : N → N (Σ = {0, 1}), there is a property Φ of words in Σ* such that
a) there is a Turing machine that decides for any word p ∈ Σ* whether p has the property Φ or not;
b) for any Turing machine T that recognizes the property Φ,
time_T(|p|) > f(|p|)
holds for all p ∈ Σ* except for a finite number of words.
159. Definition. The time complexity of a language L ⊆ Σ* is at most f(n), if there is a Turing machine that decides L and for every word w ∈ Σ* of length n (|w| = n) it stops in at most f(n) steps. The class of such languages is denoted by DTIME(f(n)).
160. Definition. The space complexity of a language L ⊆ Σ* is at most f(n), if there is a Turing machine that decides L and for every word w ∈ Σ* of length n (|w| = n) it requires at most f(n) cells on the tape. The class of such languages is denoted by DSPACE(f(n)).
161. Definition. An algorithm (machine) T has polynomial execution time, if its time complexity is at most p(n), where p is a polynomial.
An alternative version of the definition: instead of p(n) we use a function O(n^k).
162. Definition. An algorithm (machine) T has exponential execution time, if its time complexity is O(2^{n^c}) for some constant c > 0.
163. Definition. The class of languages that can be decided by a polynomial time Turing machine is denoted by PTIME or P.
Clearly
P = ∪_{k ≥ 0} DTIME(n^k). (7.6)
164. Definition. The class of languages that can be decided by a Turing machine of polynomial space complexity is denoted by PSPACE.
Clearly
PSPACE = ∪_{k ≥ 0} DSPACE(n^k). (7.7)
165. Definition. The class of languages that can be decided by an exponential time Turing machine is denoted by EXPTIME.
Clearly
EXPTIME = ∪_{k ≥ 0} DTIME(2^{n^k}). (7.8)
The polynomial (time) algorithms are the "good" or "tractable" algorithms; the corresponding problems are the "easy" problems.
The exponential algorithms are the "bad" algorithms; the corresponding problems are the "intractable" ones.
Examples of polynomial algorithms:
- combinatorics: shortest path, Hungarian method;
- arithmetic algorithms: basic arithmetic operations, greatest common divisor by the Euclidean algorithm;
- linear algebra algorithms: computation of the determinant, the inverse, and solving linear systems of equations by the Gauss method.
Examples of exponential algorithms:
- Ackermann-type or superexponential functions;
- computing all permutations;
- computing the determinant by the classic definition;
- combinatorics: variants of the travelling salesman problem, etc.;
- the simplex method of linear programming.
The above classification does not necessarily reflect practice.
For example, the simplex algorithm is an efficient algorithm used in practice, while there are polynomial time algorithms that cannot be used for solving large problems.
The common classi…cation of problems:
[Diagram: all languages over Σ — the recursive (algorithmically decidable) languages split into easy and hard problems; the remaining languages are undecidable.]
The following program computes the binomial coefficients C(n, k) (the numbers "n choose k") iteratively:
for i := 0 to m do { C[i, 0] := 1; C[i, i] := 1 };
for n := 2 to m do
  for k := 1 to n − 1 do
    C[n, k] := C[n − 1, k] + C[n − 1, k − 1];
Analyze the program! What is the cost of the algorithm in general? Give a lower bound for the cost of C(n, n/2), if n is even!
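For the analysis, a direct Python transcription that counts the additions is helpful; the inner double loop performs 1 + 2 + ... + (m − 1) = m(m − 1)/2 additions.

```python
def binomial_table(m):
    """Fill C[n][k] by Pascal's rule; return the table and the addition count."""
    C = [[0] * (m + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        C[i][0] = C[i][i] = 1
    adds = 0
    for n in range(2, m + 1):
        for k in range(1, n):
            C[n][k] = C[n - 1][k] + C[n - 1][k - 1]
            adds += 1
    return C, adds
```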
Chapter 8
Complexity
We investigate the NP complexity class.
The NP class consists of problems for which polynomial time algorithms are not known (only exponential time algorithms), but a proposed solution can be checked in polynomial time.
The importance of the class lies in the great number of such problems.
Some typical NP problems:
1. The bin packing problem:
Given boxes of capacity 1 and N different objects with sizes s1, s2, ..., sN such that 0 ≤ si ≤ 1 (i = 1, 2, ..., N).
The corresponding optimization problem: what is the smallest number of boxes in which we can put all the N objects?
The corresponding decision problem: is it possible or not to put all objects in B or fewer boxes?
2. The backpacking (knapsack) problem:
Given N objects with sizes s1, s2, ..., sN and values w1, w2, ..., wN. Furthermore we have a backpack whose size is K.
The corresponding optimization problem: determine those objects that can be put into the backpack so that their total value is the greatest.
The corresponding decision problem: is there a subset of the objects that can be put into the backpack and whose total value is at least W?
The problem is related to capital investment problems, where the sizes of the objects are the expenses of the investments, the values of the objects correspond to the profits of the investments, and the size of the backpack is the total amount available for investment.
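The NP flavour of the decision problem shows in how cheap verification is: given a claimed witness (a subset of object indices), the check is a single linear pass. A sketch:

```python
def check_knapsack_witness(sizes, values, K, W, subset):
    """Verify in linear time that `subset` fits in size K with value >= W."""
    total_size = sum(sizes[i] for i in subset)
    total_value = sum(values[i] for i in subset)
    return total_size <= K and total_value >= W
```

Finding such a subset, by contrast, is not known to be possible in polynomial time.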
8.1. The NP class and NP completeness
The main tool for investigating the N P class is the non-deterministic Turing machine.
8.2. Non-deterministic Turing machines and the NP class
The non-deterministic Turing machine is considered the formal definition of non-deterministic algorithms.
The state transition function of a non-deterministic Turing machine is a relation instead of a function.
Before giving the formal definition, consider the flow diagrams of a deterministic and a non-deterministic computation.
[Diagram: a deterministic computation is a single path from the start to accept or reject; a non-deterministic computation is a tree of branching paths, some rejecting and some accepting.]
In a deterministic computation the continuation at each node is uniquely determined.
In the non-deterministic case more than one continuation branch is possible.
Denote by s the left end symbol of the tape, which is infinite on the right.
166. Definition. A non-deterministic Turing machine (NT) is a 7-tuple NT = (Q, Σ, Γ, δ, q0, qA, qR), where
(i) Q ≠ ∅ is the finite set of states,
(ii) Σ is the input alphabet,
(iii) Γ = Σ ∪ {s, #},
(iv) δ : (Q \ {qA, qR}) × Γ → 2^{Q × Γ × {L, R, N}} is the transition relation, which satisfies
δ(q, s) ⊆ Q × {s} × {R, N}   (∀ q ∈ Q \ {qA, qR}), (8.1)
(v) q0 ∈ Q is the start state,
(vi) qA ∈ Q is the accept state,
(vii) qR ∈ Q \ {qA} is the reject state.
If NT is in state p ∈ Q and the read/write head reads the symbol Y ∈ Γ, the transition relation of NT defines a finite subset
δ(p, Y) ⊆ Q × Γ × {L, R, N}.
The elementary operation of NT can be an arbitrary element (q, X, Z) ∈ δ(p, Y), where
- q is the new state,
- X is the symbol that overwrites Y on the tape,
- Z is the direction of the read/write head.
Property (8.1) prohibits the overwriting of s and the move of the read/write head to the left of s.
For a given input x ∈ Σ*, the non-deterministic machine NT starts at the first cell of the tape in the start state q0. NT either makes infinitely many steps or stops in a state q ∈ {qA, qR}.
For a given input, NT may perform several different computations.
167. Definition. The non-deterministic Turing machine NT accepts the input x in time t [with memory (space) s], if there is a computational sequence that uses at most t steps [memory space s] and NT stops in state qA.
It is possible that another computational sequence on the same input lasts longer, never stops, or stops in the reject state qR.
168. Definition. The non-deterministic Turing machine NT rejects the input x, if all possible computations are such that NT does not stop, or stops in state qR.
169. Example. Analyze the following non-deterministic Turing machine M = (Q, Σ, Γ, δ, q0, qA, qR), where Q = {q0, q1, q2}, Σ = {a, b}, Γ = {a, b, s, #}, qA = q2, qR = q1, and the state transitions are described by the following state diagram:
[State diagram: q0 loops on (a,a,R) and (b,b,R); q0 goes to q1 on (b,b,R); q1 goes to q2 on (b,b,R).]
For the input w = ababb, machine M may stop in any of the states q0 , q1 , q2 . There is a
computational sequence that takes the machine to state q2 . Hence M accepts the input w. The
language recognized by M consists of those words that end in ’bb’.
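Exploring all branches of the example machine makes the acceptance condition concrete: the input is accepted when some branch reaches q2. The transition table below is as reconstructed from the diagram (an assumption), and the exhaustive search is the price of determinizing.

```python
def nd_accepts(word):
    """Explore every computation branch of the machine of Example 169."""
    delta = {                     # (state, symbol) -> list of choices
        ('q0', 'a'): [('q0', 'R')],
        ('q0', 'b'): [('q0', 'R'), ('q1', 'R')],   # nondeterministic!
        ('q1', 'b'): [('q2', 'R')],
    }
    stack, seen = [('q0', 0)], {('q0', 0)}
    while stack:
        state, pos = stack.pop()
        if state == 'q2' and pos == len(word):
            return True           # some branch reaches the accept state
        if pos < len(word):
            for nstate, _ in delta.get((state, word[pos]), []):
                if (nstate, pos + 1) not in seen:
                    seen.add((nstate, pos + 1))
                    stack.append((nstate, pos + 1))
    return False
```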
170. Definition. The non-deterministic Turing machine NT recognizes the language L ⊆ Σ*, if L consists of those words that are accepted by NT in finite time. If in addition NT accepts every word x ∈ L (|x| = n) in time [space] f(|x|), then we say that NT recognizes the language L in time [space] f(n).
The class of languages that can be recognized by a non-deterministic Turing machine in time [space] f(n) is denoted by NTIME(f(n)) [NSPACE(f(n))].
171. Definition. The class of languages that can be decided by polynomial time non-deterministic Turing machines is denoted by NP.
Clearly
NP = ∪_{k ≥ 0} NTIME(n^k). (8.2)
172. Definition. The class of languages that can be decided by polynomial space non-deterministic Turing machines is denoted by NPSPACE (here written NSPACE).
Clearly
NSPACE = ∪_{k ≥ 0} NSPACE(n^k). (8.3)
Remarks:
1. Deterministic Turing machines are special non-deterministic Turing machines. Hence P ⊆ NP.
2. Non-deterministic Turing machines NT are not practical computational models, but theoretical tools to handle certain complexity questions.
173. Theorem. Every non-deterministic Turing machine is equivalent to a deterministic Turing machine.
Idea of proof: Consider all possible computational sequences and simulate them with a
deterministic machine.
174. Corollary. The languages recognized by NT machines are exactly the recursively enumerable languages.
175. Theorem. If L ∈ NP, then there exists a polynomial p such that L is recognizable with a deterministic algorithm of time complexity at most O(2^{p(n)}).
Interestingly enough the space complexity does not increase exponentially.
176. Theorem. (Savitch) P SP ACE = N SP ACE.
One important characterization of NP problems is that if we have information on (a guess of) the solution (a witness), then checking the solution requires only polynomial computational time.
We do not give the particular details, but we show some problems of the NP class and their witnesses. Here we suppose simple undirected graphs with n vertices.
Connectivity of the graph. Witness
100
n
2
paths, one path for each pair of vertices.
COMPLEXITY
Disconnectivity of the graph. Witness: a real subset of vertices that has no paths to some
other points.
The existence of Hamilton cycle in the graph. Witness: a Hamilton cycle.
Divisibility of integers. Witness: a true divisor.
From the examples we can see that NP problems are easy if we know a witness, but finding a witness is hard.
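To illustrate the witness idea, checking a claimed Hamiltonian cycle takes only polynomial time. The following is a minimal Python sketch; the graph representation (edge list on vertices 0..n−1) and the function name are choices of this illustration, not part of the lecture notes.

```python
def is_hamiltonian_cycle(n, edges, cycle):
    """Check in polynomial time whether `cycle` is a Hamiltonian
    cycle of the simple undirected graph on vertices 0..n-1."""
    edge_set = {frozenset(e) for e in edges}
    # The witness must visit every vertex exactly once.
    if sorted(cycle) != list(range(n)):
        return False
    # Consecutive vertices (wrapping around) must be joined by edges.
    return all(frozenset((cycle[i], cycle[(i + 1) % n])) in edge_set
               for i in range(n))

# A 4-cycle: 0-1-2-3-0
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(is_hamiltonian_cycle(4, edges, [0, 1, 2, 3]))  # True
print(is_hamiltonian_cycle(4, edges, [0, 2, 1, 3]))  # False: 0-2 is not an edge
```

The check costs O(n + |E|) time, while no polynomial algorithm is known for producing such a cycle.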
The big question is whether P = NP. It is conjectured that
P ≠ NP. (8.4)
The problem is still open.
In the sequel we give a more refined characterization of the NP class.
8.3. NP completeness
We introduce the concept of (Karp) reduction to measure the hardness of NP problems.
177. Definition. Language L1 is polynomially reducible to language L2 (notation: L1 ≤p L2) if there is a function f computable in polynomial time such that for every word x ∈ Σ*,
x ∈ L1 ⇔ f(x) ∈ L2 (8.5)
holds.
178. Theorem. If L1 ≤p L2 and L2 ≤p L3, then L1 ≤p L3.
179. Theorem. If L1 ≤p L2 and L2 ∈ P, then L1 ∈ P.
A schematic proof of the last claim is the following: on input x, first run the polynomial-time reduction algorithm to compute f(x), then run the polynomial-time algorithm deciding L2 on f(x) and return its Yes/No answer. This composition is a polynomial-time algorithm deciding L1.
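The composition of a reduction with a decider can be sketched in a few lines of Python; `reduce_to_L2` and `decide_L2` are hypothetical placeholders standing in for the two polynomial-time algorithms, and the toy instance below is invented for illustration.

```python
def decide_L1(x, reduce_to_L2, decide_L2):
    """Decide L1 by composing a polynomial-time reduction with a
    polynomial-time decider for L2; the composition still runs in
    polynomial time."""
    return decide_L2(reduce_to_L2(x))

# Toy instance: L1 = binary strings of even length,
# L2 = even natural numbers, reduction f(x) = len(x).
decide_even = lambda n: n % 2 == 0
print(decide_L1("1010", len, decide_even))  # True
print(decide_L1("101", len, decide_even))   # False
```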
180. Theorem. If a language is in the class NP, then any language that can be reduced to it in polynomial time is also in the class NP.
181. Definition. A language L ∈ NP is said to be NP-complete if every language of NP can be reduced to L in polynomial time.
182. Theorem. If languages L1 and L2 are in NP, L1 is NP-complete and reducible to L2 in polynomial time, then L2 is also NP-complete.
Consider the halting problem, whose language is
Lh = {w#x ∈ Σ* | Tw is a Turing machine and Tw stops on x}.
Language Lh is recursively enumerable but not recursive.
183. Theorem. Every recursively enumerable language is reducible to Lh.
Let A be a recursively enumerable language and TA a Turing machine that accepts A. Let f(x) = k(TA)#x. This f is a recursive function. For every x, x ∈ A ⇔ TA stops on x ⇔ f(x) ∈ Lh.
This shows that Lh is complete for the recursively enumerable languages, that is, Lh is the hardest problem among them.
Consider now a bounded variant of the halting problem. Assume that M is a non-deterministic Turing machine, x is a binary word, and 0^n is the number n in unary notation. The question is whether M accepts x within n steps. The language of the problem is
BH = {k(M)#x#0^n | M accepts x in n steps}.
184. Theorem. Language BH is NP-complete.
BH ∈ NP because |k(M)#x#0^n| = Θ(|k(M)| + |x| + n). We reduce every NP problem polynomially to BH. Let A ∈ NP. There is a non-deterministic Turing machine TA which accepts A in p time, where p is a fixed polynomial.
(continuation) Let
f(x) = k(TA)#x#0^{p(|x|)}. (8.6)
Then for every x, x ∈ A ⇔ TA accepts x in p(|x|) time ⇔ k(TA)#x#0^{p(|x|)} ∈ BH ⇔ f(x) ∈ BH. Since TA is fixed and p is a polynomial, the computational time of f(x) is polynomial in |x|. Hence f is computable in polynomial time, and so A ≤p BH.
The estimated number of NP-complete problems is well over one thousand. The usual process of proving NP-completeness is the following:
- Pick a known NP-complete problem.
- Reduce it to the given problem in polynomial time.
Next we list seven important NP-complete problems.
185. Definition. A Boolean polynomial is satisfiable if it is not identically zero, that is, if some assignment of its variables makes it true.
The satisfiability problem (SAT) is to test whether a Boolean polynomial f is satisfiable.
186. Example. The formula
(u1 ∨ ¬u3 ∨ ¬u4) ∧ (¬u1 ∨ u2 ∨ ¬u4)
is satisfiable (take, e.g., u1 = true, u2 = true, u3 = true, u4 = true), while the formula
(¬u1 ∨ ¬u1 ∨ ¬u1) ∧ (u1 ∨ ¬u2 ∨ ¬u2) ∧ (u1 ∨ u2 ∨ ¬u3) ∧ (u1 ∨ u2 ∨ u3)
is not satisfiable.
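The obvious way to decide SAT is to try all 2^n assignments, which illustrates why only exponential-time algorithms are known. A minimal brute-force sketch in Python; the literal encoding (signed integers, +i for u_i and −i for its negation) is a convention of this example, not of the lecture notes.

```python
from itertools import product

def satisfiable(clauses, n):
    """Brute-force SAT test: try all 2^n assignments of n Boolean
    variables.  A clause is a list of literals; literal +i means u_i,
    literal -i means the negation of u_i (variables numbered from 1)."""
    for assignment in product([False, True], repeat=n):
        value = lambda lit: assignment[abs(lit) - 1] == (lit > 0)
        if all(any(value(lit) for lit in clause) for clause in clauses):
            return True
    return False

# (u1 v ~u3 v ~u4) & (~u1 v u2 v ~u4)  -- satisfiable
print(satisfiable([[1, -3, -4], [-1, 2, -4]], 4))  # True
# (~u1) & (u1 v ~u2) & (u1 v u2 v ~u3) & (u1 v u2 v u3)  -- unsatisfiable
print(satisfiable([[-1, -1, -1], [1, -2, -2],
                   [1, 2, -3], [1, 2, 3]], 3))     # False
```

A satisfying assignment is exactly a witness in the earlier sense: verifying it takes polynomial time, while the search above takes exponential time.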
The restricted satisfiability problem (3SAT) is to test whether a Boolean polynomial of the form
(a_{1,1} ∨ a_{1,2} ∨ a_{1,3}) ∧ ⋯ ∧ (a_{n,1} ∨ a_{n,2} ∨ a_{n,3}) (8.7)
is satisfiable. Here each a_{i,j} is a Boolean variable or a negated Boolean variable.
Given a graph G = (V, E), a vertex cover of G is a subset of vertices such that every edge of G is incident to at least one vertex of the subset.
Vertex cover problem (VC): Given a graph G and an integer k, decide whether there exists a vertex cover of k vertices.
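The contrast between checking and searching shows up here too: verifying a claimed k-vertex cover is trivial, while the naive search tries all C(n, k) subsets. A small Python sketch (graph representation and function name are choices of this illustration):

```python
from itertools import combinations

def has_vertex_cover(n, edges, k):
    """Brute-force VC decision: does the graph on vertices 0..n-1
    have a vertex cover of k vertices?  Checks all C(n, k) subsets,
    so the running time is exponential in general."""
    for subset in combinations(range(n), k):
        chosen = set(subset)
        # The inner test is the polynomial-time witness check.
        if all(u in chosen or v in chosen for u, v in edges):
            return True
    return False

# Path 0-1-2-3: {1, 2} covers every edge, but no single vertex does.
edges = [(0, 1), (1, 2), (2, 3)]
print(has_vertex_cover(4, edges, 2))  # True
print(has_vertex_cover(4, edges, 1))  # False
```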
Hamiltonian cycle problem (HAM): Given a graph G = (V, E), decide whether there is a Hamiltonian cycle in G.
Travelling salesman problem (TSP): Given a weighted graph G = (V, E) and a number K, decide whether there exists a tour (a cycle that visits each node exactly once) of length at most K in G.
Subset sum problem (SUBSET-SUM): Given a collection of numbers s1, ..., sn and a target number t (all integers), is there a subset of {s1, ..., sn} that adds up to t?
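SUBSET-SUM admits a dynamic-programming algorithm whose O(nt) running time looks polynomial but is not: t is exponential in the length of its binary encoding, so the algorithm is only pseudo-polynomial. A minimal sketch (restricted, as an assumption of this example, to non-negative integers):

```python
def subset_sum(numbers, t):
    """Dynamic-programming test for SUBSET-SUM with non-negative
    integers.  Runs in O(n*t) time -- pseudo-polynomial, since t is
    exponential in the length of its binary encoding."""
    reachable = {0}                      # sums achievable so far
    for s in numbers:
        reachable |= {r + s for r in reachable if r + s <= t}
    return t in reachable

print(subset_sum([3, 34, 4, 12, 5, 2], 9))   # True (4 + 5)
print(subset_sum([3, 34, 4, 12, 5, 2], 30))  # False
```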
Job scheduling problem (JS): Given n jobs whose execution times are t1, t2, ..., tn with deadlines d1, d2, ..., dn. The penalty for missing deadline di is pi (i = 1, 2, ..., n). Given a number P ≥ 1, is there an order of the jobs whose resulting total penalty is less than P?
The proofs of NP-completeness can be carried out in the following steps. First we need the following fundamental result.
187. Theorem. (Cook, 1971) The SAT problem is NP-complete.
Then, using reductions, we have to prove the NP-completeness of the other problems in the following order:
SAT, 3SAT, VC, SUBSET-SUM, HAM, JS, TSP.
The complexity classes and their (assumed) relationship: P ⊆ NP ⊆ EXPTIME ⊆ decidable problems ⊆ all problems, with the NP-complete problems forming a subclass of NP.
There are many further complexity classes. Their number is 496 at the moment (23-03-2015), as can be seen on the web page
https://complexityzoo.uwaterloo.ca/Complexity_Zoo
The fact that we know only an exponential-time algorithm for a given problem does not necessarily mean that there is no better algorithm.
A famous example is the linear programming problem:
c^T x → max
Ax ≤ b, x ≥ 0,
where A ∈ R^{m×n}, b ∈ R^m, c, x ∈ R^n.
The best practical method is the simplex method, whose worst-case cost is exponential. Later it was proved that its average complexity is polynomial.
Eventually several polynomial-time algorithms were developed (Khachyan, Dikin, Evtushenko, Karmarkar, etc.).
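To make the linear programming problem concrete, the following toy solver exploits the fact that a bounded feasible LP attains its optimum at a vertex of the feasible region. It is a sketch for two variables only, using exact rational arithmetic, and enumerating vertices is exponential in general; it illustrates the problem, not any of the practical methods named above.

```python
from itertools import combinations
from fractions import Fraction as F

def solve_2d_lp(c, A, b):
    """Toy exact solver for  max c^T x  s.t.  A x <= b, x >= 0  with two
    variables: intersect every pair of constraint boundaries, keep the
    feasible intersection points, and return the best vertex as
    (optimal value, (x1, x2))."""
    # Treat x1 >= 0 and x2 >= 0 as extra constraints -x_i <= 0.
    rows = [([F(r[0]), F(r[1])], F(rhs)) for r, rhs in zip(A, b)]
    rows += [([F(-1), F(0)], F(0)), ([F(0), F(-1)], F(0))]
    best = None
    for (a1, b1), (a2, b2) in combinations(rows, 2):
        det = a1[0] * a2[1] - a1[1] * a2[0]
        if det == 0:
            continue                      # parallel boundaries
        x = (b1 * a2[1] - a1[1] * b2) / det   # Cramer's rule
        y = (a1[0] * b2 - b1 * a2[0]) / det
        if all(r[0] * x + r[1] * y <= rhs for r, rhs in rows):
            value = c[0] * x + c[1] * y
            if best is None or value > best[0]:
                best = (value, (x, y))
    return best

# max 3*x1 + 2*x2  s.t.  x1 + x2 <= 4,  x1 + 3*x2 <= 6,  x >= 0
print(solve_2d_lp([3, 2], [[1, 1], [1, 3]], [4, 6]))  # optimum 12 at (4, 0)
```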