Combinatorial Optimization notes UCL

Combinatorial Optimization Lecture 1

1 Motivating examples
An optimization problem is a problem of the following form:
Problem 1. Given a function f : S → R where S is a set, find an element
s ∈ S with f (s) maximum (or minimum).
When S = R or S = Rn we have powerful tools from calculus for solving
problems like this.
A combinatorial optimization problem is the special case of the above,
when the set S is finite. Here techniques from calculus usually don’t carry
over very well, and we need to come up with a different set of techniques to
approach problems like this. On the other hand, in real world optimization
problems we actually often have that S is a finite set:
Example 2 (Euclidean travelling salesman). A helicopter has to visit 27
sites. The locations and distances are known, the target is to find in which
order it should visit the sites in order to minimize the total distance flown.
This gives rise to a more general task called the Euclidean Travelling Salesman
Problem, ETSP for short. In this problem sites (or points) v1 , . . . , vn in the plane are
given, and for a permutation π of {1, . . . , n} we let
L(π) = |vπ(1) − vπ(2) | + |vπ(2) − vπ(3) | + · · · + |vπ(n) − vπ(n+1) |,
where |a − b| is the distance between points a, b ∈ R2 , and π(n + 1) = π(1) by convention.
This can be mathematically formalized as:
TASK: ETSP
INPUT: points v1 , . . . , vn ∈ R2
OUTPUT: a permutation π of {1, . . . , n} minimizing L(π)
There are a couple of natural approaches that one can take for solving
this:
ˆ Brute force approach: There are only finitely many (namely n!) permutations so the minimum can be found by checking all of them. But
this is impossible in practice: checking for instance 27! permutations
would take an enormous amount of time.
ˆ Nearest neighbour approach: Fix σ(1) = 1. Then pick σ(2) to
minimize ||vσ(1) − vσ(2) ||. Then pick σ(3) to minimize ||vσ(2) − vσ(3) ||...
This is a fast algorithm, but it doesn’t always give an optimal solution.
Remark. ETSP belongs to a large class of notoriously hard problems for
which no fast or effective algorithm is known. It is generally assumed (but
no proof is in sight) that in fact, there is no fast algorithm for ETSP.
Example 3 (Minimum cost path problem). Given a road map with two
specified locations u, v, and specified costs for travelling along each road, find
the cheapest path from u to v.
This is essentially the problem that your phone solves when it charts the
quickest route between two places using Google Maps.
Again, the set of paths from u to v is finite, so the optimal solution can
be found by exhaustive search (which as before is too slow to do in practice).
But, as we will see later in the module, unlike ETSP, there is a fast algorithm
solving the minimum cost path problem.
Decision problems
A second class of problems we’ll look at in the module are decision problems.
These are defined as follows:
Problem 4. Consider two sets S, T with S ⊆ T . Given t ∈ T , decide if
t ∈ S or not.
Here are two related examples of decision problems:
Example: Decide whether a given Diophantine equation has a root in
integers or not.
TASK: Diophantine equations
INPUT: a polynomial p(x1 , · · · , xn ) with integer coefficients
OUTPUT: YES, if the equation p = 0 has a solution with x1 , . . . , xn ∈ Z
(the set of integers); NO, if p = 0 has no such solution.
By a famous theorem proved by Davis, Putnam, Robinson and Matiyasevich,
this problem is “undecidable”. This means that there is no general
algorithm (say, a computer program) that could decide for every Diophantine
equation whether or not it has a solution in integers. Later in the module
we’ll see how to mathematically define what “undecidable” means.
Example:
TASK: Equations with real solutions
INPUT: a polynomial p(x1 , . . . , xn ) with integer coefficients
OUTPUT: YES, if the equation p = 0 has a solution with x1 , . . . , xn ∈ R
(the set of real numbers); NO, if p = 0 has no such solution.
By a theorem proved by A. Tarski, there is an algorithm that solves this
problem. Tarski explicitly described such an algorithm.
2 Definition of algorithms, and running times
The above examples suggest that to study combinatorial optimization problems, we should study algorithms i.e. procedures one can follow to solve the
problem. Informally, we will think of an algorithm as a “procedure that takes
an input and produces an output solving some task”. Much of the time we
will stick with this informal notion of an algorithm and study concrete examples of algorithms for particular problems. Towards the end of the module
we will study Turing Machines which are a mathematically precise way of
defining algorithms (which allows us to prove theorems about them).
In this lecture we will study the Nearest Neighbour algorithm for ETSP,
which was described earlier. Here is a more formal description of it:
Algorithm 5 (Nearest neighbour algorithm).
Input: points (x1 , y1 ), . . . , (xn , yn ) ∈ R2 .
Output: a permutation σ of 1, . . . , n.
Procedure:
Set σ(1) = 1.
For i = 2, . . . , n, repeat the following:
ˆ Set min = ∞.
ˆ For j = 1, . . . , n, repeat the following:
– Calculate d = (xσ(i−1) − xj )2 + (yσ(i−1) − yj )2 .
– If d < min and j ≠ σ(1), . . . , σ(i − 1), then set min = d and set
σ(i) = j.
Output σ.
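For concreteness, here is a short Python sketch of the nearest neighbour procedure above. The function name and the use of squared distances (which suffice for comparing candidates, just as in the algorithm) are my own choices rather than part of the notes.

```python
def nearest_neighbour(points):
    """Greedy tour for ETSP: points is a list of (x, y) pairs.
    Returns a 0-indexed visiting order (the permutation sigma)."""
    n = len(points)
    sigma = [0]                      # sigma(1) = 1 (0-indexed here)
    visited = {0}
    for _ in range(2, n + 1):        # choose sigma(2), ..., sigma(n)
        last = sigma[-1]
        best_j, best_d = None, float("inf")
        for j in range(n):
            if j in visited:
                continue
            dx = points[last][0] - points[j][0]
            dy = points[last][1] - points[j][1]
            d = dx * dx + dy * dy    # squared distance, as in the algorithm
            if d < best_d:
                best_d, best_j = d, j
        sigma.append(best_j)
        visited.add(best_j)
    return sigma

# Example usage on four points in the plane
print(nearest_neighbour([(0, 0), (1, 0), (0, 1), (1, 1)]))
```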
Definition 6 (Running time). Given a task, the running time of an algorithm
A on an input I of this task is
T (A, I) = number of steps taken by A on I.
When it is clear what algorithm we are looking at, we abbreviate this to T (I).
This definition is currently not complete because it doesn’t define what
a “step” is. Informally, a step or “operation” is one elementary operation
performed by the algorithm. Throughout the module, in different contexts,
it will be convenient to count steps in slightly different ways. We will refer to the particular way we count steps in a particular algorithm as the
“model of computation” that the algorithm uses. The most frequent model
of computation is the arithmetic model:
Definition 7 (Arithmetic model). An algorithm A takes place in the arithmetic model if the input to the algorithm is a finite sequence of real numbers,
and every step of the algorithm is one of the following:
(1) Add/subtract two real numbers.
(2) Multiply/divide two real numbers.
(3) Compare two real numbers x, y (to check if x < y, x = y, or x > y).
(4) Change the value of some variable.
The running time T (A, I) is the total number of times the above operations
(1)–(4) are performed when the algorithm is run on I.
For example we can analyse the nearest neighbour algorithm using this
definition:
Example 8 (Running time of nearest neighbour algorithm).
Set σ(1) = 1. [This is 1 step of type (4)]
For i = 2, . . . , n [This is n − 1 steps of type (4)], repeat the following:
ˆ Set min = ∞ [This is 1 step of type (4)].
ˆ For j = 1, . . . , n [This is n steps of type (4)], repeat the following:
– Calculate d = (xσ(i−1) − xj )2 + (yσ(i−1) − yj )2 [This is 2 subtractions, 2 multiplications, 1 addition, and 1 write. So 6 steps in total].
– If d < min [This is 1 step of type (3)] and j ≠ σ(1), . . . , σ(i − 1) [This is i − 1 ≤ n − 1 steps of type (3)], then set min = d [This is 1 step of type (4)] and set σ(i) = j [This is 1 step of type (4)].
Output σ.
Adding up all the above operations (taking into account that the ones
inside the “for” loops are repeated multiple times), we get the upper bound:
T ((x1 , y1 ), . . . , (xn , yn )) ≤ 1 + (n − 1) + (n − 1)(1 + n + n(6 + 1 + (n − 1) + 2)) ≤
n3 + 9n2 − 8n − 1.
The running time of an algorithm usually depends on what input it gets.
Generally, we want to analyse how fast or slow an algorithm is. To do this,
we analyse the worst case running time of the algorithm.
Definition 9 (Worst case running time). For an algorithm A, we define the
worst case running time as:
T (n) = T (n, A) = max{T (A, I) : I is an input of length n}
This definition is again not yet complete because it depends on us defining
what the length of an input is. The definition of length again depends on
what model of computation we use. In the arithmetic model it is defined as
follows:
Definition 10 (Length of input in arithmetic model). The length of an input
I in the arithmetic model is the number of real numbers given in I.
Returning to the Nearest Neighbour algorithm: the input is given as “I =
points (x1 , y1 ), . . . , (xn , yn ) ∈ R2 ”. This is a sequence of 2n real numbers.
Thus length(I) = 2n. Using this we get the following upper bound on the
worst case running time of the nearest neighbour algorithm:
T (2n) ≤ n3 + 9n2 − 8n − 1.    (1)
One can also analyse algorithms by their average case behaviour, and we
may occasionally meet such analyses in the course.
3 Asymptotic analysis of algorithms
The performance of an algorithm depends on what machine or computer is
used. We want to get rid of this dependence and wish to compare algorithms
independently of the machine. The idea is to work with asymptotic analysis.
That is, we look at how T (n) grows as n → ∞ and ignore constant terms and
smaller order terms that may depend on the particular computer or the skills
of the programmer. For instance, if T (n) = 6n3 + 7n2 − 11n, then T (n) is
asymptotically n3 , we have simply dropped lower order terms and constants.
More formally:
Definition 11. Assume f, g : N → R are functions. We say that f (n) =
O(g(n)) if there are C > 0 and n0 ∈ N such that for all n ≥ n0 , 0 < f (n) <
Cg(n).
Definition 12. Assume f, g : N → R are functions. We say that f (n) =
Ω(g(n)) if there are c > 0 and n0 ∈ N such that for all n ≥ n0 , f (n) >
cg(n) > 0.
Definition 13. Assume f, g : N → R are functions. We say that f (n) =
Θ(g(n)) if both f (n) = Ω(g(n)) and f (n) = O(g(n)) hold.
The last definition can be shown to be equivalent to there being constants
c, C > 0 and n0 ∈ N such that for n ≥ n0 we have 0 < cg(n) < f (n) < Cg(n).
Using these definitions, and our earlier bound on T (2n) for the nearest
neighbour algorithm, we can establish the following:
T (n) = O(n3 ) for the nearest neighbour algorithm.
Proof. To get this, first notice that (1) is equivalent to T (n) ≤ (n/2)3 +
9(n/2)2 − 8(n/2) − 1 = n3 /8 + 9n2 /4 − 4n − 1. Using that n3 ≥ n2 for
n ∈ N, we get that T (n) ≤ n3 /8 + 9n3 /4 = 19n3 /8. Thus, the definition of
T (n) = O(n3 ) is satisfied with n0 = 1 and C = 19/8.
4 Models of computation
There are four models of computation which will come up in various parts
of this module:
ˆ The arithmetic model.
ˆ The decimal model.
ˆ Comparison sorting algorithms.
ˆ Turing machines.
We’ve already seen the arithmetic model of computation (and this is by far
the most common one we will use). We will now introduce the decimal and
comparison ones. Turing machines are a mathematically rigorous definition
of algorithms which will be defined towards the end of the module.
Definition 14 (Decimal model). An algorithm A takes place in the decimal
model if the input to the algorithm is a finite sequence of one-digit integers
∈ {0, 1, . . . , 9} (and the length of the input I is the number of such one-digit
integers in I), and every step of the algorithm is one of the following:
(1) Add/subtract two one-digit integers.
(2) Multiply/divide two one-digit integers (“divide an integer x by an integer
y” means to determine integers q and r < y so that x = yq + r).
(3) Compare two one-digit integers x, y (to check if x < y, x = y, or x > y).
(4) Change the value of some variable.
The decimal model is a bit closer to how real computers work than the
arithmetic model (since manipulating arbitrary real numbers isn’t realistic).
We’ll use it when studying number-theoretic algorithms for tasks like “add
two integers”. Here is one example of such an algorithm:
Example 15 (School addition algorithm).
ˆ Input: two positive integers a = aA aA−1 . . . a1 , b = bB bB−1 . . . b1 written
in decimal, with a ≥ b (and with the convention that bi = 0 for i > B).
ˆ Output: a + b written in decimal.
ˆ Procedure:
– Set c1 , . . . , cA+1 = 0.
– For i = 1, . . . , A, repeat the following:
* Let ci+1 ci = ai + bi + ci (adding three one-digit numbers produces a number with ≤ 2 digits).
– Output cA+1 cA . . . c1
This is exactly 6A+1 elementary operations, because there are A+1 write
operations before the for loop, there are A iterations of the “for” loop and
each iteration contains precisely 3 “write” operations (meaning an operation
of type (4)) and 2 “addition” operations (we are adding ai + bi + ci . Adding
ai +bi is one operation and produces a number x, with ≤ 2 digits say x = x2 x1 .
Since ai , bi ≤ 9, we additionally have that x ≤ 18. Now, note that ci ≤ 1
always and so working out (ai + bi ) + ci = x + ci is just one additional
operation — work out y = x1 + ci , which is a one-digit number, and then
write x2 y which is exactly the number ai + bi + ci ). Since the size of the input
is n = A + B, we get that the running time is T (n) = O(n).
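A small Python sketch of this digit-by-digit addition may help; it is my own illustration of the procedure (digits are stored least significant first, and b is padded with zeros, matching the convention above).

```python
def school_add(a_digits, b_digits):
    """Add two non-negative integers given as lists of decimal digits,
    least significant digit first (so 123 is [3, 2, 1])."""
    A = len(a_digits)
    # pad b with zeros so it also has A digits (the notes assume a >= b)
    b = b_digits + [0] * (A - len(b_digits))
    c = [0] * (A + 1)                    # result digits / carries c_1, ..., c_{A+1}
    for i in range(A):
        s = a_digits[i] + b[i] + c[i]    # at most 9 + 9 + 1 = 19
        c[i] = s % 10                    # low digit stays in position i
        c[i + 1] = s // 10               # carry (0 or 1) moves to position i + 1
    return c

print(school_add([3, 2, 1], [9, 9]))     # 123 + 99 = 222 -> [2, 2, 2, 0]
```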
It is possible to similarly analyse the algorithms for subtraction, multiplication, division, and comparison of integers that you learned in school. These produce
the following theorem, which you can use without proof:
Theorem 16. There are algorithms in the decimal model for the following:
(1) addition in O(n) steps,
(2) subtraction in O(n) steps,
(3) comparison in O(n) steps,
(4) multiplication in O(n2 ) steps,
(5) division (up to the integer part of the ratio) in O(n2 ) steps.
Next we define comparison algorithms. They are almost exactly the same
as the arithmetic model — except that all the steps are just comparisons.
Definition 17 (Comparison-based algorithm). An algorithm A is a comparison-based algorithm if the input to the algorithm is a finite sequence of real numbers (and the length of an input I is the number of such numbers in I), and every
step of the algorithm is one of the following:
(1) Compare two real numbers x, y (to check if x < y, x = y, or x > y).
(2) Change the value of some variable.
The running time T (A, I) of A on I is the number of comparisons that happen, i.e. we allow the algorithm to perform as many
operations of type (2) as it wants without counting them in the running time.
The motivation for this model of algorithms is that it is a bit simpler to
count operations than in the arithmetic model (since there is only one type
of operation which contributes to the running time). We shall use this next
week to give a formal proof of a lower bound on the running times of certain
algorithms. This is in contrast to most places in the course where only upper
bounds are shown on running times.
We will mainly consider comparison algorithms for the following task:
TASK: Sorting
INPUT: a1 , a2 , . . . , an ∈ N (or in R)
OUTPUT: b1 , b2 , . . . , bn the ai s in increasing order.
Here’s an example of such an algorithm:
Example 18 (Insertion sort).
Input: a1 , a2 , . . . , an ∈ N (or in R)
Output: b1 , b2 , . . . , bn the ai s in increasing order.
Procedure:
For i = 1, . . . , n, repeat the following:
ˆ Set b0 = −∞ and bi+1 = +∞.
ˆ For j = 1, . . . , i, repeat the following:
– If bj−1 ≤ ai ≤ bj , then insert ai between bj−1 and bj (by redefining
b′j = ai and b′j+1 = bj , . . . , b′i+1 = bi ).
Output b1 , . . . , bn .
This algorithm has running time T (I) ≤ 2 + 4 + · · · + 2n = n(n + 1) (since
every round of the inner “for” loop has precisely 2 comparisons). Thus we
have T (n) = O(n2 ).
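A Python sketch of insertion sort in this style (the helper list b and the function name are mine; the notes only give the pseudocode above):

```python
def insertion_sort(a):
    """Sort the list a into increasing order by repeatedly inserting
    a[i] into the already-sorted prefix b, as in Example 18."""
    b = []
    for x in a:
        # find the first position j with b[j] >= x and insert x there
        j = 0
        while j < len(b) and b[j] < x:
            j += 1
        b.insert(j, x)
    return b

print(insertion_sort([5, 1, 4, 2, 3]))   # [1, 2, 3, 4, 5]
```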
Combinatorial Optimization Lecture 2

1.1 Merge sort
Last week we saw “Insertion Sort”, which was an algorithm for sorting numbers into increasing order in O(n2 ) steps. Now we’ll introduce a new algorithm called “Merge Sort” which does the same thing in only O(n log n)
steps. It runs as follows:
Algorithm 1.
ˆ Input: real numbers x1 , . . . , xn .
ˆ Output: y1 , . . . , yn , which are x1 , . . . , xn sorted into increasing order.
ˆ Procedure:
– If n = 1, then set y1 = x1 , and we are done.
– Otherwise, split {x1 , . . . , xn } into two sets Sodd = {x1 , x3 , x5 , . . . } and
Seven = {x2 , x4 , . . . }.
– Use recursion to sort Sodd into increasing order: this gives numbers
a1 , . . . , a⌈n/2⌉ which are x1 , x3 , x5 , . . . , but in increasing order.
– Use recursion to sort Seven into increasing order: this gives numbers b1 , . . . , b⌊n/2⌋ which are x2 , x4 , x6 , . . . , but in increasing order.
– Merge the two lists (a1 , . . . , a⌈n/2⌉ ), (b1 , . . . , b⌊n/2⌋ ) into a single
one. This is done as follows:
* For i = 1, . . . , n − 1, repeat the following:
(+) Let yi = min(a1 , b1 ). Delete this element min(a1 , b1 ) from
the corresponding list (i.e. if we’re deleting a1 , then redefine a′1 = a2 , a′2 = a3 , . . . . If we’re deleting b1 , then
redefine b′1 = b2 , b′2 = b3 , . . . ).
* When there’s only one element left, let yn equal this element.
– Output y1 , . . . , yn .
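Here is a Python sketch of Algorithm 1. It splits the input into odd- and even-indexed elements and merges by repeatedly taking the smaller front element, as in step (+); the function names are my own.

```python
def merge(a, b):
    """Merge two increasing lists into one increasing list."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    return out + a[i:] + b[j:]          # append whatever is left over

def merge_sort(x):
    """Sort x into increasing order by recursing on odd/even positions."""
    if len(x) <= 1:
        return list(x)
    s_odd = merge_sort(x[0::2])         # x1, x3, x5, ...
    s_even = merge_sort(x[1::2])        # x2, x4, x6, ...
    return merge(s_odd, s_even)

print(merge_sort([4, 1, 5, 3, 2]))       # [1, 2, 3, 4, 5]
```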
We’re treating this as a comparison sorting algorithm, so to compute the
running time we only count comparisons. The only comparison explicitly
written is the comparison at the start (to check if n = 1 or not), and the
comparisons in step (+) (which occurs n − 1 times). However some comparisons happen during the recursion steps as well. Thus the total number of
comparisons is given by the following recursive equation:
ˆ T (1) = 1.
ˆ T (n) = T (⌈n/2⌉) + T (⌊n/2⌋) + n.
Let’s solve this recurrence. To make things a bit neater, we’ll only do
this when n is a power of 2 (and so ⌈n/2⌉ and ⌊n/2⌋ will get replaced by just
“n/2”).
Theorem 2. Consider the following recurrence:

T (1) = 1    (1)
T (n) = 2T (n/2) + n    (2)

For n a power of 2, the solution to this is T (n) = n(log2 n + 1) = Θ(n log n).
Proof. Let n = 2k , so that log2 n = k. We’ll use what is called the recurrence
tree method. This works as follows: Start with a single node labelled T (n).
Next change the label of this node to “n”, and give it two children labelled
by T (n/2):
Next replace a node labelled T (n/2) by a node labelled n/2 and give it
two children labelled T (n/4). Continue like this for as long as possible. See
Figure 1 for the final tree we get. Formally:
ˆ If there is a node labelled T (m) for m > 1, change its label to m
and give it two children labelled T (m/2).
ˆ If there is a node labelled T (1), change its label to 1.
ˆ Otherwise, stop.
The key fact is the following: “throughout the process, the sum of the labels over all
the nodes doesn’t change”. This is because at each step, we used (1) or (2)
to change one node into a combination of nodes with the same sum. Thus
to work out T (n), we can instead work out the sum of the values of all the
nodes in the final tree.
Label the levels of the recursion tree as level 0, level 1, . . . , level k (see Figure 1). Since every node in levels 0, . . . , k − 1 has exactly 2 children, we have
the following:

the number of nodes in level i = 2i .    (3)
Figure 1: The recurrence tree
Let Si be the sum of the values of all the nodes on level i. Note that on
all levels i ≠ k, we have

Si = (the number of nodes in level i) · n/2i = n.    (4)

(On level k we also have Sk = 2k · T (1) = n, since each of the 2k leaves has value 1.)
Thus we obtain the theorem:

T (n) = the sum over all the nodes = S0 + S1 + · · · + Sk = (k + 1)n = n(log2 n + 1).
We still want to understand the case when n is not a power of 2. One
way to deal with this is to first note that T (n) is increasing with n (this
can be proved by induction). Also note that there is a unique p ∈ N with
2p−1 < n < 2p . Set n′ = 2p , noting that n′ ≤ 2n. This shows us that
T (n) ≤ T (n′ ) = n′ (log2 n′ + 1) ≤ 2n(log2 n + 2) = O(n log n).
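As a quick sanity check (my own, not part of the notes), the recurrence and the closed form can be compared numerically for powers of 2:

```python
import math

def T(n):
    """Comparisons from the recurrence T(1) = 1, T(n) = T(ceil(n/2)) + T(floor(n/2)) + n."""
    if n == 1:
        return 1
    return T(math.ceil(n / 2)) + T(math.floor(n / 2)) + n

for k in range(1, 6):
    n = 2 ** k
    # the two printed values agree when n is a power of 2
    print(n, T(n), n * (math.log2(n) + 1))
```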
1.2 Lower bounds
In this section we prove the following theorem that gives a lower bound on
the running time of all comparison-based sorting algorithms.
Theorem 3. Every comparison-based sorting algorithm A has worst case
running time T (A, n) = Ω(n log n).
One remark about using asymptotic notation with logarithms: using the
change of base formula loga n = (logb n)/(logb a), it is easy to show that for a, b > 1, we
have loga n = Θ(logb n). Because of this, we often omit the logarithm base
when writing equations like “T (A, n) = Ω(n log n)” — this is because the expressions T (A, n) = Ω(n ln n), T (A, n) = Ω(n log2 n), T (A, n) = Ω(n log3 n)
are all equivalent to each other via the change of base formula above.
In order to prove the above theorem, we need to study binary trees —
structures a bit like the ones that came up in the recursion tree method
above.
Definition 4. A binary tree T is a directed graph T with vertex set V =
(v1 , . . . , vn ) in which every vertex vi ≠ v1 has one backwards edge (i.e. an
edge vj vi with j < i), and in which every vertex vi has either 0 or 2 forwards
edges (i.e. edges vi vj with i < j).
The vertex v1 is called the root of T . The leaves of T are the vertices
vi which have no forward edges coming out of them. The height of T is the
maximum distance from the root to a leaf in T .
A complete binary tree of height h is one in which all leaves are at distance
h from the root.
Figure 2: A binary tree
We’ll use the notation “xi : xj ” as shorthand for the operation “compare
xi to xj ”. The connection between binary trees and sorting algorithms is
the following:
Definition 5. A decision tree for sorting x1 , . . . , xn is a binary tree in which
every vertex and edge receives one of the following labels:
ˆ Every non-leaf vertex is labelled by “xi : xj ” for some i, j, and its two
forward edges are labelled by “xi < xj ” and “xi ≥ xj ”.
ˆ Every leaf vertex is labelled by some output (i.e. by a permutation of
x1 , . . . , xn ).
Every comparison-based sorting algorithm A gives rise to a decision tree.
To get this, first label the root by the first comparison xi : xj that the
algorithm makes. Then label the children of the root by the next comparisons that the algorithm makes, and so on. The leaves are created when the
algorithm says “output y1 , . . . , yn ”, in which case we label the leaf by the
permutation of x1 , . . . , xn that y1 , . . . , yn are. This is best illustrated as an
example:
Figure 3: A decision tree for sorting 3 numbers x1 , x2 , x3 .
Every comparison-based sorting algorithm gives rise to a decision tree.
In fact decision trees could be thought of as a way of formally defining what
a comparison based sorting algorithm is. Given some input x1 , . . . , xn we
can always reach an output by starting from the root and performing all the
comparisons “xi : xj ” that the tree tells you to do (and following the edge
labelled by “xi < xj ” or “xi ≥ xj ”, depending on which of these is true).
The running time of the algorithm is then exactly the number of edges you
move through i.e. the distance between the root and the leaf.
To prove Theorem 3, we need to understand binary trees. This is done
in the following lemmas:
Lemma 6. A binary tree of height h has at most 2h leaves.
Proof. Let T be a binary tree of height h that has as many leaves as possible.
We need to show that T has ≤ 2h leaves. Let the vertices of T be V =
(v1 , . . . , vn ).
Suppose that T is not a complete binary tree. Then there is some leaf
vi at distance d ≤ h − 1 from the root v1 . Build a new tree T 0 by adding
two vertices vn+1 , vn+2 and edges vi vn+1 , vi vn+2 . Note that in T 0 , the vertices
vn+1 , vn+2 are at distance d + 1 ≤ h from the root i.e. T 0 still has height
h. But T 0 has more leaves than T , contradicting the “has as many leaves as
possible” part of the definition of T .
Suppose that T is a complete binary tree. Let mi be the number of
vertices at distance i from v1 . We have that m0 = 1 and mi+1 = 2mi for all
i. Therefore mh = 2h . But since T is a complete binary tree, mh is exactly
the number of leaves, giving us what we want.
We’ll also need another simple lemma.
Lemma 7. We have
ln n! ∼ n ln n,
meaning that the ratio ln n!/(n ln n) tends to one as n → ∞.
In particular, for sufficiently large n, we have 2n ln n > ln n! > n ln n/2.
Proof. The Taylor expansion of ex is 1 + x/1 + x2 /2! + . . . + xn /n! + . . . , which
with x = n gives en > nn /n!. This shows that n! > (n/e)n for every n.
Then, taking logarithms of (n/e)n < n! < nn we obtain n(ln n − 1) <
ln n! < n ln n. Thus

1 − 1/ ln n < (ln n!)/(n ln n) < 1

for every n ≥ 2. This implies (ln n!)/(n ln n) → 1 as n → ∞; that is, ln n! ∼
n ln n.
Now we show that every comparison sorting algorithm A has T (A, n) =
Ω(n log n).
Proof of Theorem 3. Consider the decision tree T corresponding to the algorithm A run on the input x1 , . . . , xn . Note the following:
ˆ T has height T (n). This is because T (n) is defined to be the maximum
number of comparisons made on an input I of the form (x1 , . . . , xn ).
But the number of comparisons on an input I exactly equals the length
of the path from the root to the leaf corresponding to I. Since the maximum length of such a path is the height of T we have that height(T ) =
T (n).
ˆ T has ≥ n! leaves. Otherwise there would be some permutation σ of
x1 , . . . , xn which is not the label of any leaf of T . Then the algorithm
could not possibly work correctly on all inputs, since it is possible that
the numbers in the input I are in the order given by σ.
Combining Lemma 6 with the first bullet point, we get that the number of
leaves of T is ≤ 2T (n) . Combining this with the second bullet point tells us
that 2T (n) ≥ n!. Taking logarithms gives T (n) ≥ log2 (n!) = ln(n!)/ ln(2).
By Lemma 7, we know that for sufficiently large n, ln(n!) ≥ n ln n/2. Thus
for sufficiently large n we have that T (n) ≥ (n ln n)/(2 ln 2), which shows that T (n) =
Ω(n log n).
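Concretely, the bound T (n) ≥ log2 (n!) can be evaluated for small n. The short Python sketch below (my own illustration) prints this lower bound on the worst-case number of comparisons:

```python
import math

# Any comparison-based sorting algorithm needs at least ceil(log2(n!)) comparisons
# in the worst case, since its decision tree must have at least n! leaves.
for n in [4, 8, 16, 32]:
    lower = math.ceil(math.log2(math.factorial(n)))
    print(n, lower)
```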
1.3 Counting sort
Counting Sort is another algorithm for sorting which works in time O(n).
At first this seems like it should contradict Theorem 3 (the Ω(n log n) lower bound) — but the reason
there will not be a contradiction is that Counting Sort will take place in the
arithmetic model (rather than be comparison based). Additionally, counting
sort will make an extra assumption on the numbers x1 , . . . , xn — we will
assume that they are all contained in the set [k] := {1, . . . , k} for some
integer k = O(n).
Algorithm 8.
ˆ Input: x1 , . . . , xn ∈ [k]
ˆ Output: y1 , . . . , yn , which are x1 , . . . , xn sorted into increasing order.
ˆ Procedure:
1. Set c1 , . . . , ck = 0.
2. For i = 1, . . . , n, set cxi = cxi + 1.
3. Set t = 1
4. For i = 1, . . . , k, repeat the following:
– For j = 1, . . . , ci , repeat the following:
* Set yt = i.
* Set t = t + 1.
5. Output y1 , . . . , yn .
The basic idea of this algorithm is: in step 2 we count the number of
times each j ∈ [k] comes up in the list x1 , . . . , xn (and let cj be the number
of times that j appears). Afterwards we write out c1 copies of “1”, c2 copies
of 2, . . . , ck copies of “k” — which will be the sorted list.
To work out the running time, we count the number of operations in each
line:
1. There are k operations here.
2. There are 3n operations here.
3. There is 1 operation here.
4. There are a total of k + 4c1 + 4c2 + · · · + 4ck operations here. Noting that c1 + · · · + ck =
n (since in step (2), the ci s were increased exactly n times in total),
we get that there are k + 4n operations at this step.
Thus in total, there are k + 3n + 1 + (k + 4n) = 2k + 7n + 1 operations. If we additionally
know that k = O(n), this gives us that T (n) = O(n).
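A direct Python transcription of Algorithm 8 (this rendering, with Python lists and 1-based values, is mine):

```python
def counting_sort(x, k):
    """Sort the list x, whose entries all lie in {1, ..., k}, in O(n + k) time."""
    c = [0] * (k + 1)            # c[j] counts how often j appears (index 0 unused)
    for v in x:
        c[v] += 1
    y = []
    for j in range(1, k + 1):    # write out c[j] copies of j, in increasing order
        y.extend([j] * c[j])
    return y

print(counting_sort([3, 1, 2, 3, 1], k=3))   # [1, 1, 2, 3, 3]
```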
1.4 Recurrences
Determining running time often leads to recurrence equations (like the one
that came up when analysing merge sort). The recurrence tree method
can be used to solve these. Here’s a fairly general theorem which covers a
wide range of recurrence equations:
Theorem 9. Let a, b ∈ N with b ≥ 2, and let f (n) > 0 for all n. Set D = logb a.
Suppose that we have positive numbers T (1), T (b), T (b2 ), . . . defined by the
following recurrence:

T (n) = aT (n/b) + f (n).

(1) If f (n) = O(nD−ε ) for some ε > 0, then T (n) = Θ(nD ).
(2) If f (n) = Θ(nD ), then T (n) = Θ(nD log n).
(3) If f (n) = Ω(nD ) and af (n/b) < cf (n) for some c < 1, then T (n) =
Θ(f (n)).
In the theorem, f (n) is compared with nD . In cases 1 and 3, there is a
polynomial gap n±ε between f (n) and nD . So the theorem does not cover all
possible cases.
Next we give a few examples of how the general theorem is applied.
Example 1. The recurrence equation T (n) = 9T (n/3) + n is Case 1 of
the general theorem as here a = 9, b = 3, so D = log3 9 = 2 and f (n) =
n = O(nD−ε ). Thus T (n) = Θ(n2 ). The result would be the same even with
f (n) = n1.5 or n1.95 .
Example 2. The recurrence equation T (n) = 9T (n/3) + n2 is in Case
2: a = 9, b = 3, so D = log3 9 = 2 and f (n) = n2 = Θ(nD ). Consequently
T (n) = Θ(n2 log n).
Example 3. The recurrence equation T (n) = T (n/3) + 1 is in Case 2:
a = 1, b = 3, so D = log3 1 = 0 and f (n) = 1 = Θ(n0 ). Consequently
T (n) = Θ(log n).
Example 4. In the recurrence equation T (n) = 3T (n/4) + n log n we have
a = 3, b = 4, D = log4 3 ≈ 0.793 and f (n) = n log n = Ω(nD+ε ). This is
going to be Case 3 but we still have to check that the condition af (n/b) < cf (n)
holds with some c ∈ (0, 1). This is quite simple: 3(n/4) log(n/4) < cn log n indeed
holds with c = 3/4, say. So we have T (n) = Θ(n log n).
Example 5. The recurrence equation T (n) = 2T (n/2) + n log n gives
a = 2, b = 2, D = 1, f (n) = n log n = Ω(nD ). Though this seems to be
Case 3, the condition af (n/b) < cf (n) fails: 2(n/2) log(n/2) = n log(n/2) <
cn log n does not hold for any positive c < 1. This case is not covered by the
above general theorem.
Proof. We are given that n = bh for some h ∈ N. We use the recurrence tree
method, this time not with a binary tree (where every internal node has 2
children), but with an a-ary tree meaning that every internal node (including
the root) has a children. Here is the recursion tree:
The height of the tree is h. Let’s label the levels from top to bottom as
level 0, . . . , level h. There is 1 node at level 0 with value f (n), a nodes on
level 1 each with value f (n/b), a2 nodes on level 2 each with value f (n/b2 ),
etc. In general, on level i < h, there are ai nodes each with value f (n/bi ).
On the last level (i.e. level h), there are ah nodes, all of which are leaves,
and all of which have label T (1) = T (n/bh ). Note that
ah = alogb n = nlogb a = nD and bD = a.
The sum of the values on level 1 is af (n/b), the sum of the values on level
2 is a2 f (n/b2 ), etc., the sum of the values on level h − 1 is ah−1 f (n/bh−1 ),
and the sum of the values on the leaves is ah T (1) = nD T (1). Define

g(n) = f (n) + af (n/b) + a2 f (n/b2 ) + · · · + ah−1 f (n/bh−1 ).

Then T (n) = g(n) + nD T (1), which implies that T (n) = Ω(g(n)) and T (n) = Ω(nD ). We evaluate g(n) in the three cases.
Case 1. Since f (n) = O(nD−ε ), we have a constant C for which f (n) ≤
CnD−ε for all n (strictly speaking the definition of f (n) = O(nD−ε ) only gives
a constant C for which f (n) ≤ CnD−ε for all n ≥ n0 for some n0 . However,
if we let C ′ = max(C, f (1), f (2), . . . , f (n0 )) we get that f (n) ≤ C ′ nD−ε for
all n). Then f (n/bi ) ≤ C(n/bi )D−ε , and so, using a = bD ,

g(n) = f (n) + af (n/b) + · · · + ah−1 f (n/bh−1 )
≤ CnD−ε (1 + a/bD−ε + (a/bD−ε )2 + · · · + (a/bD−ε )h−1 )
= CnD−ε (1 + bε + b2ε + · · · + b(h−1)ε )
= CnD−ε (bhε − 1)/(bε − 1) = CnD−ε (nε − 1)/(bε − 1) = CnD (1 − n−ε )/(bε − 1) = O(nD ).

Returning to T (n), we have T (n) = g(n) + nD T (1) = O(nD ) and hence
T (n) = Θ(nD ).
Case 2. As in the previous case, using that f (n) = Θ(nD ), we get
constants c, C such that for all n we have cnD < f (n) < CnD . We now have

g(n) = f (n) + af (n/b) + · · · + ah−1 f (n/bh−1 ) < CnD + aC(n/b)D + · · · + ah−1 C(n/bh−1 )D = CnD · h = CnD logb n = O(nD log n),

and also

g(n) > cnD + ac(n/b)D + · · · + ah−1 c(n/bh−1 )D = cnD · h = cnD logb n = Ω(nD log n).

Returning to T (n), we have T (n) = g(n) + nD T (1) ≥ g(n) ≥ cnD logb n = Ω(nD log n),
and also T (n) = g(n) + nD T (1) ≤ CnD logb n + nD T (1) = O(nD log n).
Thus we’ve established both T (n) = O(nD log n) and T (n) = Ω(nD log n), proving T (n) = Θ(nD log n).
Case 3. As in the previous cases, using that f (n) = Ω(nD ), we get a constant C such that f (n) > CnD . The condition af (n/b) < cf (n) implies that
f (n/b) < (c/a)f (n). Repeating this argument shows that f (n/bi ) < (c/a)i f (n),
or in other words ai f (n/bi ) < ci f (n). We use this last inequality when estimating g(n):

g(n) = f (n) + af (n/b) + · · · + ah−1 f (n/bh−1 ) < f (n)(1 + c + c2 + · · · + ch−1 )
≤ f (n)(1 + c + c2 + · · · ) = f (n) · 1/(1 − c) = O(f (n)).

We also have that g(n) ≥ f (n) for all n (since the node at level 0 always has
value f (n)), and hence g(n) = Θ(f (n)).
Returning to T (n), we have that T (n) = g(n) + nD T (1) ≥ g(n) = Ω(f (n)).
We also have that T (n) = g(n) + nD T (1) ≤ g(n) + T (1)f (n)/C = O(f (n)).
Thus we’ve established both T (n) = O(f (n)) and T (n) = Ω(f (n)), proving
T (n) = Θ(f (n)).
Lecture 3
Graph theory, basic definitions
In this section we introduce objects called graphs, which are how mathematicians study networks of objects. If you took MATH0029, then there will be
a large overlap between this section and that module.
First, a motivational example:
Example 1 (Minimum cost spanning tree problem). Suppose that you are
designing an electrical network in a city. There are a number of buildings
which you need to connect into the network by stretching wires between them.
The costs of each potential wire are given by the diagram below. What is the
cheapest way of connecting everything together? The optimal way is what’s
known as a minimum cost spanning tree. In the above example, the following
is the optimum:
Next week, we will introduce two algorithms, Jarnik’s Algorithm and
Kruskal’s Algorithm, for solving the above problem. Today, we’ll set up mathematical notation for describing the problem precisely. First we define a
graph — informally a graph is a “network”.
Definition 2.
ˆ A undirected graph G is a pair G = (V, E) where V
is a finite set and E is a set of unordered pairs of elements from V .
Elements of V are called vertices (or nodes), and elements of E are
called edges.
ˆ A directed graph D is a pair D = (V, E) where V is a finite set and E
is a set of ordered pairs of elements from V .
For example, we could have an undirected graph G = (V, E) with V =
{a, b, c, d} and E = {{a, b}, {b, c}, {a, c}, {a, d}}. And we can have a directed
graph D = (V, E) with V = {x, y, z, w} and E = {(x, y), (y, z), (z, w), (x, w)}.
Writing brackets for edges can get a bit cluttered, so it’s a convention to omit
them when talking about graphs i.e. we can describe the edge set of the graph
as E = {ab, bc, ac, ad}. When two vertices x, y are contained in an edge xy
we say “x and y are adjacent”, “x and y are incident to each other”, “x and y
are connected by an edge”, “x is a neighbour of y” — these are all synonyms
for the same thing. While graphs are defined in terms of sets, most of the
time we draw them (and think about them) as collections of points joined
by lines i.e. a picture like the following one:
Most of the time we do not allow a graph to contain an edge joining a
vertex to itself (i.e. there are no edges of the form {x, x}), and also we allow
at most one edge between two vertices. Graphs which satisfy these are called
simple graphs. Graphs which have multiple edges between the same pair of
vertices are called multigraphs.
Given a graph G = (V, E), we write V (G) to denote the set of vertices of G
(i.e. V (G) = V ), and E(G) to denote the set of edges of G (i.e. E(G) = E).
The order of G is the number of vertices it has, denoted v(G) := |V (G)|,
while the size of G is the number of edges it has, denoted e(G) := |E(G)|.
For a vertex v, the neighbourhood of v in G, denoted NG (v) is the set of
vertices connected to v by an edge i.e. NG (v) := {u ∈ V (G) : vu ∈ E(G)}.
The degree of vertex v ∈ V in G, is the number of edges G has containing v
— in a simple graph this works out as dG (v) = |NG (v)|.
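For concreteness, a simple graph can be stored in Python as a dictionary mapping each vertex to its neighbourhood, from which degrees are read off directly. This representation is just an illustration, not something fixed by the notes; the example graph is the one with V = {a, b, c, d} from above.

```python
# The example graph with V = {a, b, c, d} and E = {ab, bc, ac, ad}
G = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}

def degree(G, v):
    """In a simple graph, d_G(v) = |N_G(v)|."""
    return len(G[v])

print(degree(G, "a"), degree(G, "d"))   # 3 1
```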
The complete graph on n vertices, Kn , is when |V (Kn )| = n and E(Kn )
consists of all pairs {u, v} with distinct u, v ∈ V . The empty graph on n
vertices is En where |V (En )| = n and there are no edges.
Figure 1: Examples of complete graphs
An important class is that of the bipartite graphs. G(V, E) is bipartite if
there is a partition V = X ∪ Y with X, Y ≠ ∅ (and X ∩ Y = ∅) such that
every edge in E has one endpoint in X and one in Y . That is, there are no
edges with both endpoints in X or in Y , edges only go between X and Y .
Figure 2: An example of a bipartite graph
Two more kinds of graphs which come up are the path and the cycle. A
path on n vertices, denoted Pn , is defined to be the graph with V (Pn ) =
{v1 , . . . , vn } and E(Pn ) = {v1 v2 , v2 v3 , . . . , vn−1 vn }. A cycle on n vertices,
denoted Cn , is defined to be the graph with V (Cn ) = {v1 , . . . , vn } and
E(Cn ) = {v1 v2 , v2 v3 , . . . , vn−1 vn , vn v1 }.
Figure 3: Examples of paths and cycles
Sometimes a graph H is a subgraph of another graph G. This happens
exactly when V (H) ⊂ V (G) and E(H) ⊂ E(G). We say that H is a spanning
subgraph of G if it is a subgraph of G and V (H) = V (G).
0.1 Paths, walks, cycles, and circuits
In the graph G(V, E) a walk P is an ordered sequence of vertices v0 , v1 , . . . , vk
where vi ∈ V and vi−1 vi ∈ E (for all i = 1, . . . , k). The length of a walk is
defined as k, which equals the number of edges it goes through (repetitions
counted). A trail is a walk which doesn’t repeat edges, and a path is a walk
which doesn’t repeat vertices or edges (i.e. a sequence v0 , v1 , . . . , vk of distinct
vertices with vi−1 vi ∈ E for all i).
A closed walk is a walk v0 , v1 , . . . , vk with v0 = vk . A circuit is a trail
v0 , v1 , . . . , vk with v0 = vk and k ≥ 3. A cycle is a sequence v0 , v1 , . . . , vk , v0
of vertices with v0 , . . . , vk distinct, k ≥ 2, and vi−1 vi ∈ E for all i = 1, . . . , k,
and also vk v0 an edge.
Notice that a graph G containing a path of length n is exactly the same
as G containing the graph Pn+1 as a subgraph (as defined in the previous
section). Similarly, G containing a cycle of length n is exactly the same as
G containing the graph Cn as a subgraph.
Definition 3. Vertices u, v ∈ V (G) are connected in G if there is a walk
P = u, v1 , . . . , vk , v in G. Notation: u ∼ v or u ∼G v. We say that the walk
P connects u and v, or goes between u and v.
Lemma 4. The relation u ∼ v is an equivalence relation, that is, it satisfies
the following conditions.
(1) u ∼ u for every u (reflexive),
(2) if u ∼ v then v ∼ u (symmetric),
(3) if u ∼ v and v ∼ w, then u ∼ w (transitive).
Proof. (1) W = u gives a walk from u to u, showing u ∼ u.
(2) If W = u, v1 , v2 , . . . , vk , v is a walk from u to v, then W ′ = v, vk , . . . , v2 , v1 , u
is a walk from v to u.
(3) If W = u, v1 , . . . , vk , v is a walk from u to v and W ′ = v, x1 , . . . , xt , w is a
walk from v to w, then W ′′ = u, v1 , . . . , vk , v, x1 , . . . , xt , w is a walk from
u to w.
Equivalence relations are important in mathematics. An equivalence relation on a ground set V gives rise to a partition V = V1 ∪ · · · ∪ Vk with the property
that u ∼ v iff u, v are contained in the same Vi ; the sets Vi are called equivalence classes. In the case of a graph G and the relation u ∼ v, the equivalence classes are
subsets of vertices V1 , . . . , Vk such that x, y ∈ V are connected by a walk iff
x, y are in the same Vi . The subgraphs with vertex set Vi and edges inherited
from G are called the connected components of G.
Proposition 5. If u, v ∈ V are connected by a walk in G, then they are
connected by a path.
Proof. Consider a walk W from u to v given by u = v1 , v2 , . . . , vk = v, and
suppose that its length is as small as possible (i.e. that we choose W to
have k minimal out of all possible walks from u to v). If v1 , . . . , vk are all
distinct, then W is a path. So suppose otherwise, that vi = vj for some
i < j. Then W ′ = v1 , . . . , vi , vj+1 , . . . , vk is a walk: to check this we need to
check that consecutive vertices on W ′ are connected by edges. These pairs
are v1 v2 , v2 v3 , . . . , vi−1 vi , vi vj+1 , vj+1 vj+2 , . . . , vk−1 vk . Since vi = vj , all of
these are of the form vt vt+1 for some t, and for all t, vt vt+1 is an edge by the
definition of W being a walk. But W ′ is shorter than W , contradicting the
minimality of W .
Given an edge e = uv of a graph G, we define the subgraph G − e of G
via V (G − e) = V (G) and E(G − e) = E(G) \ e. This subgraph is called G
minus e.
Proposition 6. An edge e = uv ∈ E lies on a circuit of G if and only if u
and v are connected in G − e.
Proof. If uv lies on the circuit u, v1 , . . . , vk , v, u, then u and v remain connected in G − e by the path u, v1 , . . . , vk , v. Conversely, if u and v are connected in G − e, then there is a path of the form u, v1 , . . . , vk , v, and e = uv
lies on the circuit u, v1 , . . . , vk , v, u.
Using this we can get an analogue of Proposition 5 for cycles/circuits.
Proposition 7. If an edge uv ∈ E(G) is contained in a circuit, then it is
also contained in a cycle.
Proof. Consider some circuit u, v, v1 , . . . , vk , u through uv. By Proposition 6,
we have that u and v are connected in G − uv (by a walk). Using Proposition 5, u and v are connected in G − uv by a path. Let P = u, x1 , . . . , xt , v
be such a path. Then C = u, x1 , . . . , xt , v, u is a cycle through uv in G.
Definition 8. A graph is connected if every pair of its vertices are connected by a walk (note that this is equivalent to “every pair of its vertices are
connected by a path”).
Proposition 9. Assume G is connected and e = uv ∈ E(G) lies on a circuit
in G. Then G − e is connected.
Proof. By Proposition 6, u and v are connected in G − e by a walk P =
u, v1 , . . . , vk , v. By Proposition 5, we may assume that P is in fact a path.
We have to show that every pair x, y ∈ V is connected by a walk in G − e.
They are connected by a path Q in G. If Q does not contain e then Q connects
x and y in G − e as well. If Q contains e = uv, say Q = x, . . . , u, v . . . , y,
then the walk x, . . . , u, v1 , . . . , vk , v . . . , y connects x and y in G − e.
Definition 10. A graph containing no circuit is called a forest. A connected
forest is a tree. In other words, a tree is a connected graph with no circuit.
Using Proposition 7, an equivalent definition is that a tree is a connected
graph with no cycles.
We have seen trees, namely binary trees before. Although a binary tree
is directed and has a root, it is easy to see that it is in fact a tree in the
above sense (if we disregard orientation). A vertex v of a tree T is called a
leaf if deg v = 1. This is again the same meaning as before.
1 The minimum spanning tree
In a graph G = (V, E) a spanning tree is a subgraph T which is a tree with
V (T ) = V . Of course, if such a tree exists, then the graph is connected.
Suppose that a cost function c : E → R is given. The cost of the tree T is
defined as c(T ) = Σe∈E(T ) c(e), i.e. the sum of the costs of the edges of T . The minimum spanning tree problem asks
us to find a spanning tree with minimal cost. Formally:
TASK: Minimum spanning tree (MST),
INPUT: a connected graph G and a cost function c : E → R,
OUTPUT: a minimum cost spanning tree T .
See the figures in Example 1 for an example of an input to this problem,
as well as an optimal solution. Next week, we will solve MST with a fast
and effective algorithm. The computational model will be the arithmetic
model. The input consists of graph G with n vertices and m edges, plus a
real number c(e) for each edge. The size of the input is n + m + m = n + 2m.
Note that m ≤ n2 as a graph on n vertices contains at most n2 edges.
This week we will set up some lemmas that will be used when analysing the
algorithms for the MST problem.
Proposition 11. Every tree T with |V (T )| ≥ 2 has at least two leaves.
Proof. Consider a longest path P = v0 , v1 , . . . , vk in T . Its endpoints will
be leaves of T : indeed, if v0 vi is an edge for some i > 1, then v0 , . . . , vi would
form a cycle. On the other hand, if v0 x is an edge for some x ∉ P , then
x, v0 , v1 , . . . , vk would be a longer path. Thus v0 has no neighbours other
than v1 . Similarly vk has no neighbours other than vk−1 .
Note that there are trees with only two leaves: a path is always a tree
and has exactly two leaves.
Proposition 12. In every tree T , |E(T )| + 1 = |V (T )|.
Proof. Induction on n = |V (T )|. The cases n = 1 and n = 2 are clear. Let
us go from n − 1 → n. Let v be a leaf of T and let e be the unique edge
in T incident with v. The subgraph T − v is defined, quite naturally, by
V (T − v) = V (T ) \ v and E(T − v) = E(T ) \ e. We claim that T − v is a tree.
Indeed, it contains no circuit and it is connected: the edge e was used only
to connect v to the other vertices. Since |V (T − v)| = n − 1, the induction
hypothesis says that |E(T − v)| + 1 = |V (T − v)|. Putting back v and e we
get back T , so indeed, |E(T )| + 1 = |V (T )|.
Next we prove the following simple but important result.
Lemma 13. Assume T is a connected spanning subgraph of a graph G. Then
T is a tree iff it has exactly |V (G)| − 1 edges.
Proof. If T is a spanning tree of G, then |V (G)| = |V (T )| and |V (T )| =
|E(T )| + 1 by Proposition 12, so indeed |E(T )| = |V (G)| − 1.
For the other direction, assume that T has exactly |V (G)| − 1 edges. Delete
edges one-by-one from circuits of T as long as you can. The resulting graph T ∗ is
a tree because (1) it contains no circuit and (2) it remained connected
(in view of Proposition 9). Thus by Proposition 12, |E(T ∗ )| = |V (T ∗ )| − 1.
Further, T ∗ is a spanning subgraph of G as vertices have not been deleted.
So |V (T ∗ )| = |V (G)|. Consequently |E(T ∗ )| = |V (G)| − 1. Originally we
had |E(T )| = |V (G)| − 1, so no edge was ever deleted. Hence T does not contain a
circuit, and T is a tree.
Lecture 4
Minimum cost spanning trees
Lemma 4.1 (Exchange lemma). Assume G = (V, E) is a graph and T =
(V, F ) is a spanning tree in G, e = uv ∈ E \ F , and f ∈ F is on the (unique)
path P connecting u and v in T . Then T ∗ = (V, F ∪ e \ f ) is a spanning tree
again.
Proof. First T ∗ is a spanning subgraph as V (T ∗ ) = V . Since T is a tree with
n := |V (G)| vertices, we have |E(T )| = n − 1 (by a proposition from last
week). The path P together with edge e is a cycle. As T is connected, T + e
is connected as well. By another proposition from last week, T + e remains
connected if f is deleted from the cycle, so T ∗ is connected. Additionally we
have |E(T ∗ )| = |E(T )| = n − 1 (since one edge was deleted and one edge was
added). We’ve shown that T ∗ has n vertices, n − 1 edges, and is connected
— hence it is a tree.
We need one more definition. Given a graph G(V, E) and a set A ⊂ V ,
the cut of A is
δ(A) = {e ∈ E : one endpoint of e is in A, the other one in V \ A}.
So the cut of A, δ(A) ⊂ E, is the set of edges in G that go between A and
its complement. D ⊂ E is a cut if there is a proper A ⊂ V with δ(A) = D
(here “proper A ⊆ V ” means that A ≠ V and A ≠ ∅).
The following proposition tells us about how paths and cuts interact.
Proposition 4.2. Let G = (V, E) be a graph, A ⊆ V , and P a path which
starts in A and ends outside A. Then P contains an edge of the cut δ(A).
Proof. Let P = v1 , v2 , . . . , vk . So we have v1 ∈ A and vk ∉ A. Let vi be the
last vertex in the path with vi ∈ A (i.e. pick i = max{j : vj ∈ A}, noting
that the maximum exists because this is a finite, nonempty set). We have
vi ≠ vk (since vk ∉ A), and so i ≤ k − 1. Since P is a path we have an
edge vi vi+1 . By maximality of i, we have vi+1 ∉ A. Now we’ve established
that vi ∈ A and vi+1 ∉ A, so, by the definition of “cut”, we get that the edge
vi vi+1 ∈ δ(A).
Using this, we can prove an alternative characterization of connectedness.
Proposition 4.3. G is connected iff there is no proper A ⊂ V with δ(A) = ∅.
Proof. We prove the statement in the following form. G is disconnected
⇐⇒ there is a proper A ⊂ V with δ(A) = ∅.
“⇐” direction: if δ(A) = ∅ for some proper A ⊂ V , then there is u ∈ A
and v ∈ V \ A. Suppose, for contradiction, that we have some path P from
u to v in G. By Proposition 4.2, we get an edge xy of the path with xy ∈ δ(A).
But δ(A) = ∅, which gives a contradiction.
“⇒” direction: suppose that G is disconnected. We have to show that
there is a proper A ⊂ V with δ(A) = ∅. As G is disconnected, there are
u, v ∈ V that are not connected by a path. Define now

A = {x ∈ V : u and x are connected in G}.

The set A is proper since u ∈ A and v ∉ A. We claim that δ(A) = ∅, which
will finish the proof. Assume that pq ∈ δ(A); then p ∈ A and q ∉ A. Let
P be a walk connecting u to p. Then the walk P q connects u and q, so
q ∈ A. A contradiction.
Given a graph G(V, E) we say that B ⊂ E extends to a minimum spanning tree if there is a minimum spanning tree whose edge set contains B. The
following theorem is the basic tool that makes our algorithms for minimum
cost spanning trees run correctly.
Theorem 4.4 (Extension theorem). Let G = (V, E) be a graph and c : E →
R a cost function. Assume B ⊂ E extends to a minimum spanning tree,
D ⊂ E is a cut disjoint from B, and e ∈ D is an edge with minimal cost in
D. Then B ∪ e also extends to a minimum spanning tree.
Proof. Let T = (V, F ) be the minimum spanning tree with B ⊂ F (which
exists by the assumption on B). If e ∈ F , then we are done. So assume
e ∉ F . Since the tree T is connected, there is a path P in T that connects
the endpoints of e, and so by Proposition 4.2 there is an edge f ∈ D ∩ P
with c(f ) ≥ c(e) as e is the cheapest edge in D.
By the Exchange lemma, T ∗ = (V, F ∪ e \ f ) is a spanning tree again. Its
cost is
c(T ∗ ) = c(T ) + c(e) − c(f ) ≤ c(T ).
So equality holds here and T ∗ is another minimum spanning tree. Since D is
disjoint from B we have f ∉ B, so B ∪ e ⊆ F ∪ e \ f , and hence B ∪ e
extends to T ∗ .
5 Jarnik’s algorithm for minimum spanning tree
Jarnik’s algorithm grows a tree T by adding a new vertex and edge at each
iteration. Here is how it works: Choose a vertex r ∈ V , called the root. Start
with V (T ) = {r} and E(T ) = ∅. On each iteration, add to T a least cost
edge e ∉ E(T ) so that T + e remains a tree. Stop when no more edge can be
added.
ˆ Input: A connected graph G = (V, E) and a cost function c : E → R.
ˆ Output: A minimum cost spanning tree T .
ˆ Procedure:
– Pick some arbitrary r ∈ V , and set V (T ) = {r}, E(T ) = ∅.
– Repeat the following:
* Find the least cost edge xy ∈ E with x ∈ V (T ), y ∉ V (T ) (so
xy ∈ δ(V (T ))). If no such edge exists, output T .
* Update V (T ) = V (T ) ∪ {y}, E(T ) = E(T ) ∪ {xy}.
Theorem 5.1. Jarnik’s algorithm always outputs a minimum cost spanning
tree.
Proof. We show that at the start of iteration i of the loop the following hold:
(i) |V (T )| = i, |E(T )| = i − 1.
(ii) Edges of E(T ) are contained in V (T ).
(iii) T extends to a minimum cost spanning tree.
This is proved by induction on i. The initial case is i = 1, when
|V (T )| = 1, |E(T )| = 0. It is clear that this T extends to a minimum cost
spanning tree (since G is connected, it contains some minimum cost spanning
tree T ′ = (V, E ′ ), and we have E(T ) = ∅ ⊆ E ′ ).
For the induction step, suppose (i), (ii), (iii) are true at iteration i ≤ n − 1.
Let xy be the edge found by the algorithm at iteration i, and let T ′ = (V ′ , E ′ )
with V ′ = V (T ) ∪ {y}, E ′ = E(T ) ∪ {xy} be the graph the algorithm has
at iteration i + 1. We need to show that T ′ satisfies (i), (ii), (iii). Note that
since y ∉ V (T ), we have xy ⊄ V (T ), and so (ii) tells us that xy ∉ E(T ).
Since |V (T ′ )| = |V (T ) ∪ {y}| = |V (T )| + 1 = i + 1 and |E(T ′ )| = |E(T ) ∪ {xy}| = |E(T )| + 1 = (i + 1) − 1, we
get that (i) holds at the start of iteration i + 1. Property (ii) holds because
xy ⊆ V (T ) ∪ {y} = V (T ′ ) (since x ∈ V (T )). Property (iii) holds for T ′
by the extension theorem — because we have that T extends to a minimum
cost spanning tree (by (iii)), we have E(T ) disjoint from the cut δ(V (T )) (by
(ii)), and xy is the minimum cost edge in the cut δ(V (T )).
At the start of iteration n of the loop, by (i) we have that |V (T )| = n, |E(T )| =
n − 1, which tells us that V (T ) = V (G). This shows us that δ(V (T )) = ∅,
and hence the algorithm terminates outputting this T . By (iii), T extends to
some minimum cost spanning tree T ′ , which tells us that V (T ) ⊆ V (T ′ ) and
E(T ) ⊆ E(T ′ ). Since T ′ is a spanning tree of G, it has |V (T ′ )| = |V (G)| =
n = |V (T )| and |E(T ′ )| = |V (G)| − 1 = n − 1 = |E(T )|. Hence, we have that
T = T ′ , i.e. T is a minimum cost spanning tree.
Running time for Jarnik’s algorithm
We will give the following upper bound on the running time of Jarnik’s
Algorithm.
Proposition 5.2. Jarnik’s Algorithm can be run in time O(nm) on a graph
with n vertices and m edges.
Note that previously we only used “O(n)” notation for functions of one variable. Here there are two variables n and m, but the
meaning is the same. We use f (n, m) = O(g(n, m)) to mean “there is a
constant C such that for sufficiently large n, m we have f (n, m) ≤ Cg(n, m)”.
Thus the above proposition can be rephrased as “there is a constant C such
that for sufficiently large n and m, the running time of Jarnik’s Algorithm
is ≤ Cnm”.
Proof. The running time of any algorithm depends a bit on the implementation — things like how the input/output is recorded can greatly affect the
running time of the algorithm.
For the current proposition, we will use the most natural way of encoding the input — the graph G will be given as a list of vertices V (G) =
{1, 2, . . . , n}, and a list of edges E(G) = {x1 y1 , . . . , xm ym }. The costs of the
edges are given by a list of numbers {c1 , . . . , cm } with c(xi yi ) = ci .
As the algorithm runs, we will keep track of which vertices and edges are
in T . The vertices will be kept track of as follows. Let’s say that we use
vertex 1 as a root at the start. We will keep a binary list T = (T1 , . . . , Tn )
of length n, with Ti = 1 if vertex i is in T and Ti = 0 if vertex i is not in T .
Thus at the start of the algorithm we have T = (1, 0, . . . , 0), at the end we
have T = (1, 1, . . . , 1), with one “0” being turned into a “1” at intermediate
iterations. The edges of T will just be kept as a list of edges (so at the ith
iteration it will be a list of length i).
ˆ Input: A connected graph G = (V, E) and a cost function c : E → R.
We input these as three lists V = {1, . . . , n}, E = {x1 y1 , . . . , xm ym },
and c = (c1 , . . . , cm ).
ˆ Output: A minimum cost spanning tree T , whose edges are given by
E(T ) = {a1 b1 , . . . , an−1 bn−1 }.
ˆ Procedure:
1. Set T1 = 1, T2 = 0, . . . , Tn = 0.
2. Set i = 1
3. Repeat the following:
– Set min = +∞.
– For j = 1, . . . , m, repeat the following:
* If Txj ≠ Tyj and cj < min, then update min = cj and
x = xj , y = yj .
– If min = +∞, then output E(T ) = {a1 b1 , . . . , an−1 bn−1 }.
– Otherwise, set Tx = 1, Ty = 1, ai = x, bi = y, i = i + 1.
At (1), there are n operations. At (2), there is 1 operation. The loop
at (3) is repeated n − 1 times (since in total, exactly n − 1 edges are added
to get a spanning tree). Inside the “for” loop, there are ≤ 5 operations, so
the “for” loop takes ≤ 5m operations in total. Outside the for loop, there
are 8 operations. Thus in total, we have ≤ n + 1 + (n − 1)(5m + 8) =
5mn + 9n − 5m − 7 = O(mn) operations.
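The same procedure can be written as a Python sketch (mine, not from the notes); the variable names mirror the lists V , E, c and the binary membership list used above.

```python
import math

def jarnik(n, edges, costs):
    """Minimum cost spanning tree of a connected graph on vertices 1..n.
    edges is a list of pairs (x, y); costs[i] is the cost of edges[i].
    Returns the list of chosen tree edges. Runs in O(nm) time."""
    in_tree = [False] * (n + 1)      # the binary list T_1, ..., T_n
    in_tree[1] = True                # vertex 1 is the root
    tree_edges = []
    for _ in range(n - 1):           # a spanning tree has n - 1 edges
        best, best_edge = math.inf, None
        for (x, y), c in zip(edges, costs):
            # an edge is usable iff exactly one endpoint is already in T
            if in_tree[x] != in_tree[y] and c < best:
                best, best_edge = c, (x, y)
        x, y = best_edge
        in_tree[x] = in_tree[y] = True
        tree_edges.append((x, y))
    return tree_edges

# Example: a 4-cycle with one diagonal
print(jarnik(4, [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)], [1, 2, 3, 4, 1]))
```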
Lecture 5
This week we will look at the minimum cost path problem, whose input will be a directed graph
D = (V, E), together with a cost function c : E → R. Recall that a directed graph is one in which
edges are ordered pairs of vertices. Most definitions we’ve introduced for undirected graphs extend
naturally to directed graphs. We briefly go over the main ones for concreteness:
Definition 5.1 (Directed graphs). A directed graph (sometimes called “digraph”) is a pair D =
(V, E) such that V is a finite set, and E is a set of ordered pairs of distinct elements of V .
Edges of directed graphs are sometimes called “arcs”. In a directed graph, we don’t allow a
vertex to be joined to itself (i.e. don’t allow edges uu), and we don’t allow two copies of the same
edge. However we do allow for there to be two edges between two vertices as long as they go in
opposite directions (i.e. we can have two edges uv and vu).
Definition 5.2 (Walks, trails, and paths in directed graphs). In the directed graph D = (V, E)
a walk P is an ordered sequence of vertices v0 , v1 , . . . , vk where vi ∈ V and vi−1 vi ∈ E (for all
i = 1, . . . , k). The length of a walk is defined as k, which equals the number of edges it
goes through (repetitions counted). A trail is a walk which doesn’t repeat edges, and a path is a
walk which doesn’t repeat vertices or edges (i.e. a sequence v0 , v1 , . . . , vk of distinct vertices with
vi−1 vi ∈ E for all i).
Note that these three definitions are exactly the same as they were in the undirected case (though
now it is very important what order the vertices come in each edge vi−1 vi — the edges are directed
from the start of the path/walk/trail to the end). We’ll use the notation x ⇝ y if there is a
walk from x to y in D. As in the undirected case this is equivalent to “there is a path from x to
y in D”.
Definition 5.3 (Circuits and cycles in directed graphs). In a directed graph D: A closed walk
is a walk v1 , . . . , vk with v1 = vk . A circuit is a trail v1 , . . . , vk , v1 with k ≥ 2. A cycle is a walk
v1 , . . . , vk , v1 with v1 , . . . , vk distinct and k ≥ 2.
These are again almost identical to the undirected case. Note however a key difference in the
definition of circuits/cycles — that we only insist on k ≥ 2 (rather than k ≥ 3 as we did in the
undirected case). This is because in directed graphs, we think of uv and vu as two distinct edges
— therefore in the directed case, the closed walk uvu doesn’t repeat edges and is a circuit (whereas
in the undirected case, the closed walk uvu would be going through the edge uv = vu twice and
hence not be a circuit).
Minimum cost path problem
The setup for this problem is that we have a directed graph with a cost function c. The target
is to find the minimum cost directed path in a digraph G(V, E) between two specified vertices.
Quite often in applications, the vertices represent points in space and the cost represents the
distance between two points. With this interpretation the minimum cost path will just be the
shortest path between two points — for this reason, this problem is often called the “shortest path
problem”. We set up the problem more generally and want to find the minimum cost paths from
a fixed vertex, r, called the root, to all other vertices of the graph. Note that together with G
and r, a cost function c : E → R is given. The price of the directed path P = v0 , v1 , . . . , vk is
c(P ) = Σ_{i=1}^{k} c(vi−1 vi ). Recall the definition that P is a directed path iff vi−1 vi ∈ E(G) for all i.
TASK: Minimum cost path
INPUT: Digraph G = (V, E), root r ∈ V , cost function c : E → R
OUTPUT: a directed path P from r to every vertex v ∈ V of minimal cost.
The following lemma is useful for understanding walks in digraphs with a cost function:
Lemma 5.4. Let W be a walk from x to y in a directed graph G with cost function c, then
c(W ) = c(P ) + c(C1 ) + · · · + c(Ct ) for some path P from x to y, integer t, and cycles C1 , . . . , Ct .
Proof. This is by induction on the number of repeated vertices in W = xv1 . . . vt y. If W has no
repeated vertices, then it is a path. Otherwise vi = vj for some i < j. Without loss of generality
pick such vi , vj as close together as possible i.e. such that the vertices vi , vi+1 , . . . , vj−1 are all
distinct. Then C = vi vi+1 . . . vj−1 vi is a cycle, while W ′ = xv1 . . . vi vj+1 . . . vt y is a walk from x to
y with fewer repeated vertices than W . By induction c(W ′ ) = c(P )+c(C1 )+· · ·+c(Ct ) for some path
P from x to y and cycles C1 , . . . , Ct . Now c(W ) = c(W ′ ) + c(C) = c(P ) + c(C1 ) + · · · + c(Ct ) + c(C)
as required.
One difficulty in solving the minimum cost path problem is the presence of negative circuits in
the graph i.e. circuits the sum of whose weight is negative. The above lemma shows that this is
the same as negative cycles/closed walks:
Lemma 5.5. The following are equivalent in a directed graph G with cost function c:
(i) G has a negative cost closed walk.
(ii) G has a negative cost circuit.
(iii) G has a negative cost cycle.
Proof. Since cycles are circuits and circuits are closed walks, we clearly have (iii) =⇒ (ii) =⇒
(i). It remains to prove (i) =⇒ (iii). Let W be a closed walk with c(W ) < 0. Let x be the first (and
so also last) vertex of W . Apply Lemma 5.4 to get c(W ) = c(P ) + c(C1 ) + · · · + c(Ct ) for some
path P from x to x, integer t, and cycles C1 , . . . , Ct . Since P is a path from x to x, it must just
be P = x, giving c(P ) = 0. Thus 0 > c(W ) = c(C1 ) + · · · + c(Ct ), which shows that c(Ci ) < 0 for
some i.
The following lemma gives a characterization of graphs without negative circuits in terms of
min cost paths/walks.
Lemma 5.6. Suppose that we have a connected directed graph G and a cost function c : E(G) → R.
The following are equivalent:
(i) G has a minimum cost walk from x to y for every x, y with x ⇝ y. Moreover there exists
such a minimum cost walk which is a path.
(ii) G has no negative circuits.
Proof. For (i) =⇒ (ii): suppose that G has a minimum cost walk between every pair of vertices.
Consider a circuit C = x1 x2 . . . xk x1 . Define Pt to be the walk from x1 to x1 defined by Pt =
x1 x2 . . . xk x1 x2 . . . xk . . . x1 x2 . . . xk x1 , where the sequence repeats t times. We have the cost
c(Pt ) = t(c(x1 x2 ) + c(x2 x3 ) + · · · + c(xk x1 )) = tc(C). By assumption G has a minimum cost
walk W from x1 to x1 . Since W is minimum cost we have c(W ) ≤ c(Pt ) = tc(C) for all t. This
can only happen if c(C) ≥ 0 (since otherwise the sequence {tc(C)}_{t=0}^{∞} would tend to −∞).
For (ii) =⇒ (i): suppose that G has no negative circuits. Let x, y be vertices. Let P be a
minimum cost path from x to y (it exists because there are finitely many paths from x to y). If
P is not a minimum cost walk, then there is some walk W with c(W ) < c(P ). Use Lemma 5.4
to get a path P 0 , integer t, and cycles C1 , . . . , Ct for which c(W ) = c(P 0 ) + c(C1 ) + · · · + c(Ct ).
Since there are no negative circuits, we have c(Ci ) ≥ 0 for all i, giving c(W ) ≥ c(P 0 ). But then
c(P ) > c(W ) ≥ c(P 0 ), contradicting P being a minimum cost path.
Because of the above lemma we generally only solve the minimum cost path problem on a
graph which doesn’t have negative circuits.
Our basic goal this week is to find an algorithm that will do the following:
ˆ Input: a directed graph G, a cost function c : E(G) → R with no negative circuits, and a
vertex r for which r ⇝ y for all y ∈ V (G).
ˆ Output:
– For all y, give a path Py going from r to y of minimum cost.
Potentials and predecessor maps
Our basic approach to this rests on the following observation: assume that we have an r to v
directed path of cost fv , and an r to w dipath of cost fw . If
fw > fv + c(vw),
then there is another r to w path which is cheaper than fw , namely the r to v path of cost fv
appended with the arc vw. This path is of cost fv + c(vw).
Definition 5.7. A feasible potential is a function f : V → R (or, equivalently, an assignment of
a number fv to every v ∈ V ) such that fr = 0 and
fw ≤ fv + c(vw) for every vw ∈ E.
(∗)
Lemma 5.8. Assume f is a feasible potential and P is a directed path from r to v. Then c(P ) ≥ fv .
Proof. As P = v0 , v1 , . . . , vk with v0 = r and vk = v, we have
c(P ) = Σ_{i=0}^{k−1} c(vi vi+1 ) ≥ Σ_{i=0}^{k−1} (fvi+1 − fvi ) = fvk − fv0 = fv ,
where the inequality uses (∗) for each edge vi vi+1 .
Corollary 5.9. If f is a feasible potential and c(P ) ≤ fv for some r − v path P , then P is a
minimum cost r − v path (and in fact we have c(P ) = fv ).
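As a small illustration of how Corollary 5.9 can be used as a certificate of optimality, the following Python sketch checks that f is a feasible potential and that a given r − v path has cost at most fv (all names and the input format are illustrative choices, not fixed by the notes).

def is_feasible_potential(f, edges, cost, r):
    """f: dict vertex -> number; edges: list of pairs (v, w); cost: dict edge -> number."""
    if f[r] != 0:
        return False
    return all(f[w] <= f[v] + cost[(v, w)] for (v, w) in edges)

def certifies_minimum(f, path, cost):
    """path: list of vertices r = v0, ..., vk = v."""
    c_P = sum(cost[(path[i], path[i + 1])] for i in range(len(path) - 1))
    return c_P <= f[path[-1]]   # by Corollary 5.9 this forces c(P) = f_v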
Thus to solve the minimum cost path problem it is sufficient to find a feasible potential f and
a collection of paths {Pv : v ∈ V (D)}, with the property that Pv goes from the root r to v and
satisfies c(Pv ) = fv . Our algorithms will present their outputs in a more efficient way — they will
produce a feasible potential, and something called a predecessor function.
Definition 5.10. A predecessor map on a directed graph D = (V, E) with root r, is a function
p : V \ {r} → V , such that for each v ∈ V \ {r}, we have p(v)v ∈ E and the set of such edges
{p(v)v : v ∈ V \ {r}} contains no (directed) cycles.
Given a vertex v, one can define a sequence of predecessors v, p(v), p(p(v)) . . . . This sequence
cannot be infinite (since otherwise it would repeat some vertices and contain a cycle), and so must
terminate. The only way it can terminate is if some p(p(. . . p(v) . . . )) equals the root r (since the
root is the only vertex with no predecessor). Thus any predecessor function defines a collection of
paths from r to all the vertices of D.
Our algorithms for the minimum cost path problem will output a feasible potential f and a
predecessor function satisfying fv = fp(v) + c(p(v)v) for all v ≠ r.
Lemma 5.11. Let D = (V, E) be a directed graph, c : E → R a cost function, and r ∈ V
a root. Suppose that we have a feasible potential f and a predecessor function satisfying fv =
fp(v) + c(p(v)v) for all v ≠ r. Then for each vertex v, the sequence v, p(v), p(p(v)) . . . gives a
minimum cost path from r to v.
Proof. As we saw above, the sequence v, p(v), p(p(v)) . . . written in reverse gives some path Pv :
r = v1 , . . . , vk = v (i.e. with p(vi ) = vi−1 for all i). We have c(Pv ) = c(v1 v2 ) + c(v2 v3 ) · · · +
c(vk−1 vk ) = c(p(v2 )v2 ) + c(p(v3 )v3 ) + · · · + c(p(vk )vk ). Using the equation fv = fp(v) + c(p(v)v)
repeatedly gives
c(p(v2 )v2 ) = fv2 − fp(v2 ) = fv2 − fr = fv2
c(p(v3 )v3 ) = fv3 − fp(v3 ) = fv3 − fv2
c(p(v4 )v4 ) = fv4 − fp(v4 ) = fv4 − fv3
...
c(p(vk )vk ) = fvk − fp(vk ) = fvk − fvk−1
Adding up all of these gives c(Pv ) = fvk = fv . By Corollary 5.9, Pv gives a minimum cost path
from r to v.
Ford’s algorithm
Next we describe Ford’s algorithm for the minimum cost path problem. The target is to find a
feasible potential and the corresponding predecessor map.
ˆ Input: A directed graph D = (V, E), a cost function c : E → R, and a root r ∈ V .
ˆ Output: If G has no negative circuits and r ⇝ y for every y ∈ V , then we output a feasible
potential f and a predecessor map p satisfying Lemma 5.11.
ˆ Procedure:
– Set fr = 0, and fv = +∞ for all other vertices.
– Set p(v) =“undefined” for all v.
– Repeat the following:
* Check if there is an edge xy with fy > fx + c(xy).
* If there is such an edge, then update fy = fx + c(xy) and p(y) = x.
* If there is no such edge, then output f, p.
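Here is a minimal Python sketch of the procedure above, together with the recovery of an r − v path from the predecessor map as in Lemma 5.11. The names, and the way a violated edge is found (a simple pass over all edges), are illustrative choices; as discussed next, the loop need not terminate if there are negative circuits.

def ford(vertices, edges, cost, r):
    """edges: list of pairs (x, y); cost: dict (x, y) -> number; returns (f, p)."""
    f = {v: float("inf") for v in vertices}
    f[r] = 0
    p = {}                                   # predecessor map; p[v] initially undefined
    while True:
        updated = False
        for (x, y) in edges:
            if f[y] > f[x] + cost[(x, y)]:   # a violated edge: relax it
                f[y] = f[x] + cost[(x, y)]
                p[y] = x
                updated = True
                break
        if not updated:                      # no violated edge: f is a feasible potential
            return f, p

def path_to(v, p, r):
    """Recover the r-v path from the predecessor map, as in Lemma 5.11
    (assumes f[v] is finite, so the predecessors lead back to r)."""
    path = [v]
    while path[-1] != r:
        path.append(p[path[-1]])
    return list(reversed(path))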
One big difference between Ford’s Algorithm and previous algorithms that we’ve looked at
is that Ford’s Algorithm is not guaranteed to terminate. In fact, when the graph has negative
circuits, then usually Ford’s Algorithm will not terminate. What we’ll aim to prove is that when
there are no negative circuits, then Ford’s Algorithm does terminate, and the potential at each
vertex gives the cost of the minimum cost path from the root to that vertex. First notice the
following observation.
Observation 5.12. For each vertex y as Ford’s algorithm runs, the potential fy only decreases,
never increases.
This is true simply because the algorithm has no mechanism for increasing the potential of a
vertex.
The following is also immediate:
Observation 5.13. If Ford’s algorithm terminates then for every edge xy we have fy ≤ fx +c(xy).
Observation 5.14. For y 6= r, at any point of the algorithm, we have fy finite ⇐⇒ p(y) is
defined.
This is true just because fy and p(y) are only ever updated together — if we change one of
these from their initial value, then we change the other too.
The following lemma gives us important information about what happens in Ford’s algorithm
while it runs.
Lemma 5.15 (Running of Ford’s Algorithm). At step k of Ford’s algorithm, the following are
true for all vertices y.
(i) fy ≥ fp(y) + c(p(y)y) for all y ≠ r with fy < ∞.
(ii) If fy < ∞, then fy equals the cost of some walk from r to y.
(iii) If there are no negative circuits, then there are no cycles v1 v2 . . . vk v1 with vi = p(vi+1 )
for i = 1, . . . , k − 1 and vk = p(v1 ).
Proof. (i) If fy = ∞, then there is nothing to check, so suppose that fy is finite. Then the
last update of fy was along the edge p(y)y (since this is the only way p(y) could
have been set to its value). At that step, fy was set to fp(y) + c(p(y)y), i.e. at that moment we
had fy = fp(y) + c(p(y)y). Since then, fy hasn't changed, while fp(y) can only have decreased (by
Observation 5.12), i.e. we have fy ≥ fp(y) + c(p(y)y).
(ii) This is proved by induction on k. In the initial case k = 0, the only vertex with fx < ∞ is
the root r which has fr = 0. Notice that the walk W = r is a cost 0 walk from r to r —
showing that the initial case is true. Suppose that at step k − 1, there is a walk of cost fv
from r to v for all v with fv < ∞. Let xy be the edge that was checked at step k. If we had
fy ≤ fx + c(xy), then nothing changes in the graph (and so the claim is true by induction),
so suppose that fy > fx + c(xy). Then at this step we set fy = fx + c(xy). By induction,
there is a walk W from r to x of cost c(W ) = fx . Adding the vertex y at the end of this walk
gives a new walk whose cost is c(W ) + c(xy) = fx + c(xy) = fy . This proves the induction
step.
(iii) Let H(t) be the directed graph consisting of the edges p(x)x at time t. Suppose for contradiction
that some H(t) contains a cycle. Let t be the first timestep when this happens, i.e.
suppose that H(t − 1) does not contain any cycles. Let xy be the edge that was checked at
time t. Since the graph changed from H(t−1) to H(t), we must have fy (t−1) > fx (t−1)+c(xy)
and fy (t) = fx (t − 1) + c(xy) = fx (t) + c(xy). Also H(t) contains a cycle C, which wasn't
present in H(t − 1), and so C must contain the edge xy.
Let C = v1 v2 . . . vk v1 where vk = x, v1 = y. All the other edges vi vi+1 of C were already
present in H(t − 1), so by part (i) we have fvi+1 (t − 1) ≥ fvi (t − 1) + c(vi vi+1 ) for
i = 1, . . . , k − 1. We also have fv1 (t − 1) > fvk (t − 1) + c(vk v1 ) (this is exactly fy (t − 1) >
fx (t − 1) + c(xy)). Note that all the potentials appearing here are finite: fvi (t − 1) < ∞ for i ≥ 2
by Observation 5.14 since p(vi ) is defined, and then, since fv2 (t − 1) is finite, the inequality for the
edge v1 v2 forces fv1 (t − 1) < ∞ as well. Adding up all these inequalities gives
Σ_{i=1}^{k} fvi (t − 1) > Σ_{i=1}^{k} fvi (t − 1) + c(v1 v2 ) + · · · + c(vk−1 vk ) + c(vk v1 ), i.e. c(C) < 0.
An immediate corollary is the following:
Corollary 5.16. If there are no negative circuits, then the potential of the root r, fr never changes
(i.e. we have fr = 0 throughout the algorithm).
Proof. Suppose that fr ≠ 0 at some point. Since potentials only ever decrease, this means that
fr < 0 at some point. By Lemma 5.15, this gives us a walk W from r to r with c(W ) < 0. But
this is a negative cost closed walk, so by Lemma 5.5, there would be negative circuits too.
Using the above, we can prove that when Ford’s algorithm terminates, then it gives the correct
answer.
Lemma 5.17 (Correctness of Ford's Algorithm). Let G be a graph with no negative circuits and
r ⇝ y for all y ∈ V . If Ford's algorithm terminates on G, then the following are true for all
vertices y ≠ r.
(i) fy < ∞
(ii) fy = fp(y) + c(p(y)y).
(iii) f is a feasible potential.
(iv) The sequence y, p(y), p(p(y)) . . . defines a path P from r to y. This path is a minimum cost
path from r to y and has c(P ) = fy .
Proof. (i) Suppose that fy = ∞, and consider some path P : r = v1 , . . . , vk = y. Let vi be
the first vertex on this path with fvi = ∞. Since fr is finite, we have that i ≥ 2, and so
fvi−1 < ∞. But then ∞ = fvi > fvi−1 + c(vi−1 vi ), contradicting Observation 5.13.
(ii) For the algorithm to terminate we must have fy ≤ fx + c(xy) for all edges xy. Using
this with x = p(y) we have fy ≤ fp(y) + c(p(y)y). But from Lemma 5.15 (i), we also have
fy ≥ fp(y) + c(p(y)y), which gives fy = fp(y) + c(p(y)y).
(iii) This is just a combination of Observation 5.13 and Corollary 5.16.
(iv) Parts (i) – (iii) together with part (iii) of Lemma 5.15 allow us to apply Lemma 5.11, which
tells us that y, p(y), p(p(y)) . . . defines a minimum cost path P from r to y.
Next we go on to show that the algorithm does indeed terminate as long as there are no
negative circuits. This is easy to show when the costs of edges are all integers.
Lemma 5.18 (Termination of Ford’s Algorithm, when costs are integers). Let G be a graph with
no negative circuits such that all the costs are integers. Then for any r, running Ford’s algorithm
with root r will terminate after finitely many steps.
Proof. Suppose that the algorithm never terminates. Then there is some vertex y whose potential
keeps decreasing i.e. there is an infinite sequence of steps t1 < t2 < . . . , such that fy (t1 ) >
fy (t2 ) > fy (t3 ) . . . . But by Lemma 5.15 (ii), fy (ti ) is the cost of some walk in G and so an integer.
So fy (t1 ) > fy (t2 ) > fy (t3 ) . . . is a sequence of decreasing integers and so tends to −∞. But fy (ti ),
being the cost of some walk from r to y, is bounded below by the cost of a minimum cost walk from
r to y (which exists by Lemma 5.6, since there are no negative circuits), and so cannot
tend to −∞, a contradiction.
For the full result, we need to use Lemma 5.4.
Lemma 5.19 (Termination of Ford’s Algorithm). Let G be a graph with no negative circuits.
Then for any r, running Ford’s algorithm with root r will terminate after finitely many steps.
Proof. Suppose that the algorithm never terminates. Then there is some vertex y whose potential
keeps decreasing i.e. there is an infinite sequence of steps t1 < t2 < . . . , such that fy (t1 ) >
fy (t2 ) > fy (t3 ) . . . . By Lemma 5.15 (ii), we get an infinite sequence of walks W1 , W2 , W3 , . . . from
r to y with c(Wi ) = fy (ti ). By Lemma 5.4, c(Wi ) = c(Pi ) + c(C^i_1 ) + · · · + c(C^i_{si} ) for some path
and cycles. Without loss of generality, we can suppose that none of these cycles have zero cost
(otherwise just remove them from the sum), and so c(C^i_j ) > 0 always. There are two cases:
ˆ The sequence {si }_{i=1}^{∞} is bounded above, by s say. Since there are only finitely many collections of at most s paths/cycles
in the graph, by the Pigeonhole Principle, for some i ≠ j we must have Pi = Pj and C^i_t = C^j_t
for all t. But then c(Wi ) = c(Wj ), contradicting c(Wj ) = fy (tj ) < fy (ti ) = c(Wi ).
ˆ The sequence {si }_{i=1}^{∞} is not bounded above. Let m be the minimum non-zero cost of a cycle
in G, and let cmin be the minimum cost of a path from r to y. We have c(Wi ) ≥ cmin + m·si .
Since si is unbounded, for some i, cmin + m·si > fy (t1 ). But then this
contradicts fy (t1 ) > c(Wi ).
Lecture 6
Maximum flows
Consider the following informal problem: you have a graph/network and
want to transport something from point r to point s (e.g. it could be a road
network and you want to transfer goods between two cities. Or it could be a
network of pipes and you want to pump oil between two locations). What is
the most efficient way to route the flow through the network? Here “efficient”
could mean “the way that allows you to transfer the most material from r
to s per hour”. This is called the maximum flow problem, and is what we’ll
look at this week.
Let’s try to model this mathematically. The input will look as follows:
ˆ Input: Directed graph G = (V, E), capacity function c : E → R+ , and
two vertices r (the source) and s (the sink).
[Figure: an example network with source r, sink s, intermediate vertices a, b, c, d, e, and capacities marked on the edges.]
Here the capacity function encodes how much material can pass through
an edge at any time. The output that we will search for is a flow — a function
x : E → R, which tells us how much material we should put through every
edge. The flow should satisfy two properties:
ˆ Conservation law: at every vertex v ≠ r, s the inflow equals the outflow,
i.e. Σ_{z∈N − (v)} x(zv) = Σ_{y∈N + (v)} x(vy).
ˆ Feasibility: for every edge uv, 0 ≤ x(uv) ≤ c(uv).
If a function x : E → R satisfies both of the above, we call it a “feasible
flow”. Here N − (v) denotes the in-neighbourhood of v i.e. N − (v) := {x ∈
V : xv ∈ E}, while N + (v) denotes the out-neighbourhood of v i.e. N + (v) :=
{y ∈ V : vy ∈ E}.
An example of a feasible flow for the above diagram would be to let
x(rb) = 1, x(be) = 1, x(ra) = 1, x(ae) = 1, x(es) = 2.
The conservation law can be stated more concisely as "the net flow at
every vertex other than the source and the sink equals zero". Here the net flow
at a vertex is defined as fx (v) := Σ_{z∈N − (v)} x(zv) − Σ_{y∈N + (v)} x(vy). We also
define the inflow at a vertex v as fx− (v) = Σ_{z∈N − (v)} x(zv) and the outflow at
a vertex v as Σ_{y∈N + (v)} x(vy).
The function we will try to maximize is the total flow — this is defined
as the net flow at the sink: fx := fx (s) = Σ_{z∈N − (s)} x(zs) − Σ_{y∈N + (s)} x(sy).
Now we can formally state the maximum flow problem:
Task 4.1 (Maximum flow).
ˆ Input: Directed graph G = (V, E), capacity function c : E → R+ , and two vertices r (the source) and s (the
sink).
ˆ Output: A flow f : E → R with fx as large as possible.
This task is called Maximum Flow or MaxFlow and is often encountered
in practice, for instance when one wants to push through a pipe system as
much oil as possible. Another typical example is a road system where the
capacity of each road is known and cars want to travel on the system and
the question is how many cars can be used on the system. Or in an electric
network the electrons run through the wires, entering the network at point r
and leaving it at point s. The capacity of each wire is known and one wants
to know the maximum number of electrons that one can push through the
network in one unit of time.
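Before turning to cuts, here is a small Python sketch of the two defining properties of a feasible flow (feasibility on each edge and the conservation law) and of the total flow fx. The names and the representation of a flow as a dictionary keyed by edges are illustrative choices.

def is_feasible_flow(vertices, capacity, x, r, s):
    """capacity, x: dicts mapping an edge (u, v) to a number."""
    if not all(0 <= x[e] <= capacity[e] for e in capacity):
        return False                                   # feasibility on every edge
    for v in vertices:
        if v in (r, s):
            continue
        inflow = sum(x[(u, w)] for (u, w) in x if w == v)
        outflow = sum(x[(u, w)] for (u, w) in x if u == v)
        if inflow != outflow:                          # conservation law at v
            return False
    return True

def total_flow(x, s):
    # net flow at the sink: inflow minus outflow at s
    return (sum(x[(u, w)] for (u, w) in x if w == s)
            - sum(x[(u, w)] for (u, w) in x if u == s))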
Cuts in directed graphs
It turns out that the MaxFlow problem is closely related to something called
the MinCut problem — the problem of finding the smallest cut in a directed
graph. We next explain this problem. First, we introduce a convenient
notation: for a set of edges F we write x(F ) = Σ_{e∈F} x(e) and also c(F ) = Σ_{e∈F} c(e).
Definition 4.2. Given R ⊂ V the set
δ(R) = {vw ∈ E : v ∈ R, w ∉ R}
is called a cut, or the cut of R. We say that δ(R) is an r-s cut if r ∈ R and
s ∉ R.
This is the directed analogue of a cut in an undirected graph. We
write R̄ = V \ R for the complement of R. Note that for such an r-s cut
δ(R), and a flow x satisfying the conservation law, Σ_{v∈R} fx (v) = fx (r) and
Σ_{v∈R̄} fx (v) = fx (s) (since by definition of "satisfying the conservation law",
every term in these sums equals zero except fx (r) and fx (s)).
Lemma 4.3. For every feasible r − s flow x, and for every r − s cut δ(R),
fx = x(δ(R)) − x(δ(R̄)).
Proof.
fx = fx (s) = Σ_{v∈R̄} fx (v) = Σ_{v∈R̄, wv∈E} x(wv) − Σ_{v∈R̄, vu∈E} x(vu) = x(δ(R)) − x(δ(R̄)),
where the last equality holds because every edge with both endpoints in R̄ appears once in each of the two sums and cancels.
We get the following corollary:
Corollary 4.4. For every feasible r − s flow x, and for every r − s cut δ(R),
fx ≤ c(δ(R)).
Proof. x(δ(R)) ≤ c(δ(R)) by the capacity constraint, and x(δ(R̄)) is always
non-negative. So by the previous lemma, we get fx = x(δ(R)) − x(δ(R̄)) ≤
x(δ(R)) ≤ c(δ(R)).
This has the following important implication. If you find a feasible
r − s flow and an r − s cut δ(R) with fx = c(δ(R)), then x is a maximum
flow. The MaxFlow task is solved without any further work.
In 1956 Ford and Fulkerson proved the so-called Maxflow-Mincut theorem
showing that, quite generally, the maximum flow equals the capacity of the
minimum cut.
Theorem 4.5 (Max flow-min cut). If there is a maximum flow, then
max{fx : x is a feasible flow} = min{c(δ(R)) : δ(R) is an r − s cut}.
Here is the basic idea of the proof. If there is an r − s dipath P with
x(e) < c(e) for all e on P , then one can increase the flow value by a positive
amount, namely, by min{c(e) − x(e) : e ∈ P }. But this simple idea does not
quite work, we need a modification. If there is an undirected r − s path P
such that x(e) < c(e) on forward arcs and x(e) > 0 on backward arcs, then
one can increase the flow value again.
Definition 4.6. Let G be a digraph, c a capacity function, r, s ∈ V (G) a
source and sink, and x be a feasible flow. Let P : r = v1 , v2 , . . . , vk be a
sequence of distinct vertices. We say that P is an x-incrementing path if
for each i = 1, . . . , k − 1, we have one of:
(1) vi vi+1 ∈ E and c(vi vi+1 ) > x(vi vi+1 ). These are called “forwards edges”
(2) vi+1 vi ∈ E and x(vi+1 vi ) > 0. These are called “backwards edges”.
If vk = s, then we call P an x-augmenting path.
Lemma 4.7. If there is an x-augmenting path, then the flow x is not maximum.
Proof. Let P be an x-augmenting path P : r = v1 , v2 , . . . , vk = s. Let ε be the
minimum of the minimum over forwards edges of (c(vi vi+1 ) − x(vi vi+1 )) and the minimum over backwards edges of (x(vi+1 vi )).
Note that ε exists and is > 0 (since it is the minimum of a finite
set of numbers all of which are positive). Now define a new flow x′ with
x′ (vi vi+1 ) = x(vi vi+1 ) + ε for forwards edges and x′ (vi+1 vi ) = x(vi+1 vi ) − ε for backwards edges. The following claim will show that x′ has greater total
flow than x (and hence show that x was not maximum, proving the lemma).
Claim 4.8. x′ is a feasible flow of total flow fx′ = fx + ε.
Proof. To see the conservation law for x′ consider some vertex v ≠ r, s.
Since x satisfied the conservation law, we have fx (v) = Σ_{u∈N − (v)} x(uv) − Σ_{u∈N + (v)} x(vu) = 0. If v ∉ P , then x′ (e) = x(e) for all edges through v, so
fx′ (v) = fx (v) = 0. So suppose that v = vi for some i ≠ 1, k. Then there are
two edges of the path through vi (namely one with vertices {vi−1 , vi }, and
one with vertices {vi , vi+1 }). There are several cases depending on whether
they are forwards/backwards edges.
ˆ If vi−1 vi and vi vi+1 are forwards edges, then x′ (vi−1 vi ) = x(vi−1 vi ) + ε and
x′ (vi vi+1 ) = x(vi vi+1 ) + ε, giving fx′ (vi ) = fx (vi ) − x(vi−1 vi ) + x′ (vi−1 vi ) +
x(vi vi+1 ) − x′ (vi vi+1 ) = fx (vi ) + ε − ε = fx (vi ) = 0.
ˆ If vi vi−1 and vi+1 vi are backwards edges, then x′ (vi vi−1 ) = x(vi vi−1 ) − ε and
x′ (vi+1 vi ) = x(vi+1 vi ) − ε, giving fx′ (vi ) = fx (vi ) + x(vi vi−1 ) − x′ (vi vi−1 ) −
x(vi+1 vi ) + x′ (vi+1 vi ) = fx (vi ) + ε − ε = fx (vi ) = 0.
ˆ If vi−1 vi is a forwards edge and vi+1 vi is a backwards edge, then x′ (vi−1 vi ) =
x(vi−1 vi ) + ε and x′ (vi+1 vi ) = x(vi+1 vi ) − ε, giving fx′ (vi ) = fx (vi ) −
x(vi−1 vi ) + x′ (vi−1 vi ) − x(vi+1 vi ) + x′ (vi+1 vi ) = fx (vi ) + ε − ε = fx (vi ) = 0.
ˆ If vi vi−1 is a backwards edge and vi vi+1 is a forwards edge, then x′ (vi vi−1 ) =
x(vi vi−1 ) − ε and x′ (vi vi+1 ) = x(vi vi+1 ) + ε, giving fx′ (vi ) = fx (vi ) +
x(vi vi−1 ) − x′ (vi vi−1 ) + x(vi vi+1 ) − x′ (vi vi+1 ) = fx (vi ) + ε − ε = fx (vi ) = 0.
To see that x′ (uv) ≥ 0 for all edges uv, note that the only edges whose
flow decreased are backwards edges, where it went from x(uv) ≥ ε to x′ (uv) =
x(uv) − ε ≥ 0 (and so is still non-negative). Similarly, to see that
x′ (uv) ≤ c(uv) for all edges uv, note that the only edges whose flow
increased are forwards edges, where it went from x(uv) ≤ c(uv) − ε to x′ (uv) =
x(uv) + ε ≤ c(uv).
Finally, to work out the total flow: consider the sink s. It is contained in
precisely one edge e of P (the one with vertices {vk , vk−1 }). If e is a forward
edge, then the inflow at s increases by ε. If e is a backwards edge, then the
outflow at s decreases by ε. In either case, the net flow at s increases by ε,
giving the result.
Proof of Max-Flow/Min-Cut Theorem. From Corollary 4.4, we have
max{fx : x is a feasible flow} ≤ min{c(δ(R)) : δ(R) is an r − s cut}.
It remains to prove the "≥" direction. Consider a maximum flow x i.e. a
feasible flow x with fx = max{fx : x is a feasible flow}. Let R = {v ∈ V :
there is an r to v x-incrementing path}. We have that r ∈ R (since P = r
satisfies the definition of "x-incrementing"). We also have s ∉ R (since there
is no x-augmenting path, as otherwise we have a contradiction to x being a
maximum flow from Lemma 4.7).
Consider some edge uv ∈ δ(R). Then u ∈ R and v ∉ R. By definition of
R, we have an x-incrementing path P : r = v1 , . . . , vk , u. Also by definition of
R we must have v1 , . . . , vk ∈ R (since a subpath of an incrementing path is an
incrementing path). If x(uv) < c(uv), then the path P ′ : r = v1 , . . . , vk , u, v
is also an incrementing path. This would give v ∈ R, contradicting "v ∉ R".
So in fact we know that x(uv) = c(uv) for all uv ∈ δ(R).
Consider some edge uv ∈ δ(R̄). Then u ∉ R and v ∈ R. By definition of
R, we have an x-incrementing path P : r = v1 , . . . , vk , v. Also by definition
of R we must have v1 , . . . , vk ∈ R (since a subpath of an incrementing path
is an incrementing path). If x(uv) > 0, then the path P ′ : r = v1 , . . . , vk , v, u
is also an incrementing path. This would give u ∈ R, contradicting "u ∉ R".
So in fact we know that x(uv) = 0 for all uv ∈ δ(R̄).
Summarizing the last two paragraphs — we've shown that x(δ(R)) =
c(δ(R)) and x(δ(R̄)) = 0. By Lemma 4.3, we get fx = x(δ(R)) − x(δ(R̄)) =
c(δ(R)), i.e. we have found a cut δ(R) with c(δ(R)) = fx = max{fx :
x is a feasible flow}. This shows that
max{fx : x is a feasible flow} ≥ min{c(δ(R)) : δ(R) is an r − s cut}
as required.
Examining the above proof, we see that it also yields the following two
statements (which are each essentially equivalent to the MaxFlow-MinCut
Theorem).
Corollary 4.9. A feasible flow is maximal iff there is no augmenting path.
Corollary 4.10. Suppose x is a feasible r − s flow and δ(R) is an r − s cut.
Then x is a maximal flow and δ(R) is a minimal cut if and only if x(e) = c(e)
for every e ∈ δ(R) and x(e) = 0 for every e ∈ δ(R̄).
Ford-Fulkerson Algorithm
Using the corollaries we can check whether in the example the flow x of
value 3 (the sum of the three path-flows) is maximal or not. Is there an x-augmenting path with this flow x? Yes, there is, namely the path r, c, b, a, d, s
where ba is a backward arc and all other arcs are forward. This suggests the
following algorithm (called the "Ford-Fulkerson Algorithm") for solving the
maximum flow problem:
ˆ Input: Directed graph G = (V, E), capacity function c : E → R+ , and
two vertices r (the source) and s (the sink).
ˆ Output: A maximum flow f : E → R.
ˆ Procedure:
– Start with x(uv) = 0 for all uv.
– Repeat the following:
(∗) Find an x-augmenting path P : r = v1 , v2 , . . . , vk = s (if there
is one).
- If there is no x-augmenting path, output x.
- Otherwise, let ε be the minimum of the minimum over forwards edges of (c(vi vi+1 ) −
x(vi vi+1 )) and the minimum over backwards edges of (x(vi+1 vi )).
- For all forwards edges, update x(vi vi+1 ) = x(vi vi+1 ) + ε.
- For all backwards edges, update x(vi+1 vi ) = x(vi+1 vi ) − ε.
This algorithm is somewhat informally stated (in particular it is unclear how
to perform step (∗)) — however, we shall analyse it as stated to keep things
simple.
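One concrete way to carry out step (∗) — an implementation choice, not something fixed by the notes — is to search for an x-augmenting path by breadth-first search, treating forward edges with spare capacity and backward edges with positive flow as usable. With BFS this is essentially the Edmonds–Karp variant of Ford–Fulkerson. A Python sketch with illustrative names:

from collections import deque

def max_flow(vertices, capacity, r, s):
    """capacity: dict (u, v) -> number. Returns the flow x as a dict."""
    x = {e: 0 for e in capacity}
    while True:
        # BFS for an x-augmenting path, remembering how each vertex was reached
        prev = {r: None}
        queue = deque([r])
        while queue and s not in prev:
            u = queue.popleft()
            for (a, b) in capacity:
                if a == u and b not in prev and x[(a, b)] < capacity[(a, b)]:
                    prev[b] = ("forward", (a, b)); queue.append(b)
                if b == u and a not in prev and x[(a, b)] > 0:
                    prev[a] = ("backward", (a, b)); queue.append(a)
        if s not in prev:            # no x-augmenting path: x is maximum (Cor. 4.9)
            return x
        # walk back along the path to find epsilon, then update the flow
        path, v = [], s
        while v != r:
            path.append(prev[v])
            kind, (a, b) = prev[v]
            v = a if kind == "forward" else b
        eps = min(capacity[e] - x[e] if kind == "forward" else x[e]
                  for kind, e in path)
        for kind, e in path:
            x[e] += eps if kind == "forward" else -eps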
From the previous section, it is easy to see that if the algorithm terminates, then its output x is a maximum flow: indeed when it terminates we
know that there are no x-augmenting paths, and so Corollary 4.9 tells us
that x is maximum.
It is less clear that the algorithm terminates at all though. We’ll prove
that it does under the additional assumption that all capacities c(uv) are
integers.
Theorem 4.11. Suppose that all capacities are integers. Then
the Ford-Fulkerson algorithm terminates and outputs a flow x with x(uv) an integer for
all edges uv.
Proof. We’ll first establish:
(+) Throughout the Ford-Fulkerson algorithm x(uv) is an integer for all
edges uv.
Let xt be the flow at iteration t of (∗). We'll show that xt (uv) is always
an integer by induction on t. For t = 0, we have x0 (uv) = 0 ∈ Z.
Suppose that xt (uv) ∈ Z for some t. Consider the number ε we define at this
iteration. We defined ε to be "the minimum of the minimum over forwards edges of (c(vi vi+1 ) −
xt (vi vi+1 )) and the minimum over backwards edges of (xt (vi+1 vi ))". Note that all numbers involved here (i.e. c(vi vi+1 ), xt (vi vi+1 ), xt (vi+1 vi )) are integers.
Thus ε is an arithmetic combination of integers — and hence an integer itself.
At step t + 1, we have xt+1 (uv) ∈ {xt (uv), xt (uv) + ε, xt (uv) − ε}, and
so we get that xt+1 (uv) ∈ Z, completing the proof of (+).
Next note that the sequence of total flows fxt is strictly increasing (since
each is constructed by increasing/decreasing values along an augmenting path
as in the proof of Lemma 4.7). But fxt is bounded above (e.g. by the sum of
the capacities of all the edges in G). Thus we have that {fxt } is a bounded
sequence of strictly increasing integers — which means it is a finite sequence
i.e. the algorithm terminated at some point.
The above proof has the important corollary that if all capacities are
integers, then there exists a maximum flow in which the flow along every
edge is an integer. In real world optimization problems we often have the
requirement that all variables of the output be whole numbers
(e.g. if we want a maximum flow along a road network, we don't want to send
half a car along some road). Modelling problems as flow problems is a general
method for guaranteeing the solution doesn't involve any fractions.
Lecture 7
Matchings and covers in graphs
Consider the following informal problem: given a collection of employers
and job applicants, find an “optimal” pairing between the employers and
the applicants. "Optimal" can mean different things in different contexts
— it could mean simply allocating jobs to as many people as possible. If the
applicants give a ranking of which jobs they prefer the most, "optimal" could
also mean taking their rankings into account in some way. We'll first consider
the case when applicants simply give a list of which jobs they would be
happy with and which they wouldn't — and our goal is to allocate as many
applicants to jobs as possible.
How do we model this mathematically? The setting will be a bipartite
graph G — this is defined as a graph whose vertex set is V = A ∪ B for
disjoint sets A, B and where all edges are of the form ab with a ∈ A and
b ∈ B. The sets A, B are called the “parts” or “bipartition classes” of the
bipartite graph. In our application, we would think of A as representing the
applicants and B as the jobs — with edges ab representing when applicant
a applies to job b. The structure that we look for in bipartite graphs is a
matching:
Definition 6.1. Let G = (V, E) be an (undirected) graph. M ⊂ E is a
matching if distinct e, f ∈ M have e ∩ f = ∅, i.e. a matching is a collection of
pairwise disjoint edges.
Our goal is to understand the following problem
Task 6.2 (Maximum matching).
INPUT: graph G(V, E)
OUTPUT: a matching M ⊂ E with as many edges as possible.
This problem is closely connected to another one — the minimum cover
problem.
Definition 6.3. Let G = (V, E) be an (undirected) graph. C ⊂ V is called a
cover if C ∩ e 6= ∅ for every e ∈ E. That is, a cover meets every edge of G.
We’ll use e(M ) to denote the number of edges in a matching and v(C) to
denote the number of vertices in a cover. A cover contains at least one vertex
from every edge. This implies the following basic relationship between the
above two definitions.
Fact 6.4. If M is a matching and C is a cover, then e(M ) ≤ v(C).
Proof. Since C is a cover, each edge of M contains (at least) one vertex of
C. Since the edges of M are disjoint, we get at least e(M ) vertices in C.
We have seen with the MaxFlow-MinCut theorem how beneficial it is
when the maximum of one task equals the minimum of another. It is not
true that the optima of the above tasks coincide in general. But in the case
of bipartite graphs they do coincide.
Theorem 6.5 (König’s Theorem). In a bipartite graph G
max{|M | : M is a matching } = min{|C| : C is a cover }.
This again has the important implication that if M is a matching and
C is a cover in a bipartite graph and |M | = |C|, then M is a maximum
matching and C is a minimal cover. There are several proofs. We give one
that uses the MaxFlow-MinCut theorem.
Proof of König’s Theorem. Assume V = P ∪ Q is the bipartition of G. We
define a directed graph G∗ as follows: V (G∗ ) = V (G) ∪ {r, s} and
E(G∗ ) = {rp : p ∈ P } ∪ {qs : q ∈ Q} ∪ {pq ∈ E : p ∈ P, q ∈ Q}.
We define capacities as well: u(rp) = u(qs) = 1 for every p ∈ P and q ∈ Q,
and u(pq) = ∞ for every edge pq ∈ E(G∗ ) with p ∈ P , q ∈ Q. This is a network. Since the capacities
are integers, we know that the maximum flow is an integral flow. We remark
that if x is an integral and feasible flow in this network, then x(e) = 0 or 1
(∀e ∈ E(G∗ )).
Claim 6.6. Let x be a maximum flow in G∗ and M a maximum matching
in G then fx = |M |.
Proof. Let x be a maximum feasible flow, and recall that from the Ford-Fulkerson algorithm, we may assume that x is an integral flow. Define a
subset of edges M ⊂ E by pq ∈ M if x(pq) = 1 (and pq ∉ M if x(pq) = 0).
Then M is a matching in G: there cannot be two edges pq′ , pq′′ ∈ M from
p ∈ P , since by flow conservation p would need to have inflow 2 in G∗ , which
is impossible (since vertices in P only have one edge of capacity 1 entering
them). By the same argument, there cannot be two edges p′ q, p′′ q ∈ M into
q ∈ Q. Thus M is a matching. For each edge pq ∈ M , the edge qs must have
flow 1 (by flow conservation). This gives that the total flow fx = |M |.
Conversely, assume that M is a matching on G. We define a flow x by
the following rules
• x(pq) = 1 if pq ∈ M ,
• x(rp) = 1 if p is contained in an edge in M ,
• x(qs) = 1 if q is contained in an edge in M ,
• x(e) = 0 in all other cases.
Then x is an integral and feasible flow with fx = |M |.
Claim 6.7. Let δ(R) be a minimum r-s cut in G∗ and C a minimum vertex
cover of G. Then c(δ(R)) = |C|
Proof. Let δ(R) be a minimum r-s cut. Then R = {r} ∪ X for some X ⊂
V (G). There is no arc from X ∩ P to Q \ X, as such an arc has infinite
capacity and our c(δ(R)) is finite. Then there is no edge between X ∩ P and
Q \ X in G. Define C ′ = (P \ X) ∪ (Q ∩ X). It follows that C ′ is a cover in
G. Moreover, δ(R) consists of the edges rp with p ∈ P \ X and the edges qs
with q ∈ Q ∩ X, each of capacity 1, so c(δ(R)) = |P \ X| + |Q ∩ X| = |C ′ |.
Since C is a minimum cover, we have proved |C| ≤ |C ′ | = c(δ(R)).
For the other direction, consider a minimum vertex cover C of G. Let
R′ = {r} ∪ (P \ C) ∪ (Q ∩ C) and consider the cut δ(R′ ). Since C is a
vertex cover, there are no edges pq with p ∈ P \ C and q ∈ Q \ C. Thus
c(δ(R′ )) = Σ_{p∈P ∩C} c(rp) + Σ_{q∈Q∩C} c(qs) = (|P | − |P \ C|) + (|Q ∩ C|) = |C|.
Since δ(R) is a minimum cut, this gives c(δ(R)) ≤ |C|, and combining with the first paragraph, c(δ(R)) = |C|.
By the MaxFlow-MinCut theorem, we have that the size of a maximum
flow in G∗ equals the capacity of a minimum cut in G∗ . By the first claim the
size of a maximum flow in G∗ equals the size of a maximum matching in G,
while by the second claim, the capacity of a minimum cut in G∗ equals the
size of a minimum vertex cover in G. Thus the size of a maximum matching
in G equals the size of a minimum vertex cover, as required.
We can now effectively solve the two tasks Max matching and Min cover
in bipartite graphs. To do this, set up the network G∗ as in the proof of
König’s theorem, and solve it by the Ford-Fulkerson algorithm.
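Here is a Python sketch of that reduction: build the network G∗ with unit capacities on the edges leaving r and entering s, run an integral maximum-flow routine (for instance the Ford–Fulkerson sketch from Lecture 6 — an assumption about what is available; any integral max-flow routine with this calling convention works), and read the matching off the middle edges. The infinite capacities are replaced by a finite number larger than any possible flow value.

def max_bipartite_matching(P, Q, edges, max_flow):
    """P, Q: the two vertex classes; edges: list of pairs (p, q) with p in P, q in Q;
    max_flow: any routine with signature (vertices, capacity, r, s) -> flow dict."""
    r, s = "source", "sink"
    capacity = {}
    for p in P:
        capacity[(r, p)] = 1            # unit capacity from the source to each p
    for q in Q:
        capacity[(q, s)] = 1            # unit capacity from each q to the sink
    for (p, q) in edges:
        capacity[(p, q)] = len(P) + len(Q)   # effectively "infinite"
    x = max_flow(set(P) | set(Q) | {r, s}, capacity, r, s)
    return [(p, q) for (p, q) in edges if x[(p, q)] >= 1]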
We’ll now look at another theorem for finding matchings in graphs.
Definition 6.8. Suppose G = (V, E) is a graph and A ⊂ V . The set of
neighbours of A, N (A) is defined as
N (A) = {u ∈ V : there is v ∈ A such that uv ∈ E}.
The following theorem gives another characterization of matchings in bipartite graphs.
Theorem 6.9 (Hall Theorem). In a bipartite graph G = (V, E) with bipartition classes A and B there is a matching of size |A| if and only if
(∗) |N (S)| ≥ |S| for every S ⊂ A.
Proof. For the "only if" direction: If there is a matching M of size |A|, then
condition (∗) holds. Indeed, if S ⊂ A, then M contains, for every a ∈ S, an
edge ab with a unique b ∈ B, and for distinct a's the b's are also distinct. So
N (S) contains at least one distinct vertex b for every a ∈ S, giving |N (S)| ≥ |S|.
For the other direction we note that A is always a cover in G. Another
form of König's theorem says that there is a matching of size |A| iff there
is no cover of size < |A|. Assume now, contrary to the statement of the
theorem, that
• there is a cover C ⊂ A ∪ B of size < |A|, and
• condition (∗) holds.
Observe that there is no edge between A \ C and B \ C. We have |A| =
|C ∩ A| + |A \ C| > |C| = |C ∩ A| + |C ∩ B|, and so |A \ C| > |C ∩ B|. Yet
N (A \ C) ⊂ C ∩ B and then
|N (A \ C)| ≤ |C ∩ B| < |A \ C|
contradicting (*) when S = A \ C.
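Hall's condition can also be checked directly from the definition by enumerating all subsets S ⊆ A, as in the following Python sketch. This is exponential in |A|, so it is only meant as an illustration of condition (∗), not as an efficient algorithm; all names are illustrative.

from itertools import combinations

def hall_condition_holds(A, B, edges):
    """edges: set of pairs (a, b) with a in A, b in B."""
    for k in range(1, len(A) + 1):
        for S in combinations(A, k):
            neighbours = {b for (a, b) in edges if a in S}
            if len(neighbours) < len(S):
                return False        # condition (*) fails for this S
    return True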
7 Stable matchings
Suppose that we have a set of n job applicants A and a set of n employers B.
We want to find the “best” pairing between the applicants and the employers.
What does “best” mean here? Each applicant gives their ranking of the
employers in order of preference and each employer gives a ranking of the
applicants in order of preference. Mathematically a “preference ordering” of
A/B just means an ordering <σ of A/B with the earlier applicants/employers
in the ordering being the more desirable ones (e.g a preference ordering of B
gives a labeling B = {b1 , . . . , bn } so that b1 < b2 < · · · < bn ).
Definition 7.1. A preference profile consists of two sets A, B of the same
size, and also two sets of preference orderings {<a : a ∈ A} and {<b : b ∈ B}.
We think of the preference ordering <a as giving an applicant a’s preferences between the employers in B, and we think of <b as giving an employer’s
preference between the applicants in A.
Definition 7.2. Suppose we have a preference profile with sets A, B of size
n, and orderings {<a : a ∈ A} and {<b : b ∈ B}.
A stable matching between A and B is a pairing (a1 , b1 ), . . . , (an , bn ) of A
to B such that there is no i, j with bj <ai bi and ai <bj aj (in words — we
don’t have that ai prefers bj to their current partner and bj prefers ai to their
current partner).
So we have the following
Task 7.3 (Stable matching).
INPUT: sets A and B and their preferences
OUTPUT: a stable matching (a1 , b1 ), (a2 , b2 ), . . . , (an , bn ).
There are n! matchings altogether. Is there one among them that is
stable? If so, how to find one? These questions are answered by the following
theorem which is due to Gale and Shapley.
Theorem 7.4. There is always a stable matching in a preference profile.
Proof. The proof goes by the so called propose and reject algorithm that is
described as follows. The algorithm gradually builds a matching M . During
this algorithm everybody is either unmatched or matched. Initially everybody is unmatched. When (ai , bj ) ∈ M , we say that “bj is the partner of ai ”
and “ai is the partner of bj ”. Throughout the algorithm at various points ai
might “propose” to a bj — at this point bj might accept (making ai and bj
matched), or reject (keeping ai and bj unmatched). Formally the algorithm
runs as follows:
ˆ Initially, set M = ∅.
ˆ Repeat the following:
– If all ai are matched, then terminate outputting M .
– Otherwise arbitrarily pick some unmatched ai .
– If ai has already proposed to all b ∈ B, then terminate, outputting
M.
– Otherwise let bj ∈ B be an element to which ai hasn’t proposed
yet, who ai prefers the most (i.e. with bj as low as possible in
<ai ).
– ai proposes to bj .
(1) If bj is unmatched, then bj accepts the proposal. Add (ai , bj )
to M .
(2) If bj is matched to ak with ai <bj ak , then bj accepts the
proposal. Add (ai , bj ) to M and remove (ak , bj ) from M (so
ak is now unmatched).
(3) If bj is matched to ak with ak <bj ai , then bj rejects the
proposal (so nothing changes).
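Here is a minimal Python sketch of the propose-and-reject algorithm above. Preferences are passed in as ranked lists (most preferred first); this input format and the function names are illustrative choices.

def gale_shapley(pref_A, pref_B):
    """pref_A[a] = list of B in a's order; pref_B[b] = list of A in b's order."""
    rank_B = {b: {a: i for i, a in enumerate(lst)} for b, lst in pref_B.items()}
    next_proposal = {a: 0 for a in pref_A}     # index of the next b that a will try
    partner_of_B = {}                          # current matching, keyed by b
    unmatched = list(pref_A)
    while unmatched:
        a = unmatched.pop()
        # with complete preference lists this index never runs out (Claim 7.8)
        b = pref_A[a][next_proposal[a]]        # a's favourite not yet proposed to
        next_proposal[a] += 1
        if b not in partner_of_B:              # (1) b is unmatched: accept
            partner_of_B[b] = a
        elif rank_B[b][a] < rank_B[b][partner_of_B[b]]:
            unmatched.append(partner_of_B[b])  # (2) b prefers a: swap partners
            partner_of_B[b] = a
        else:
            unmatched.append(a)                # (3) b rejects a
    return [(a, b) for b, a in partner_of_B.items()]

# Example with n = 2
print(gale_shapley({"a1": ["b1", "b2"], "a2": ["b1", "b2"]},
                   {"b1": ["a2", "a1"], "b2": ["a1", "a2"]}))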
Now we show that the Gale-Shapley algorithm always produces a stable
matching. We prove a series of claims:
Claim 7.5. Once b ∈ B becomes matched, they can never become unmatched.
Proof. This is true since there is simply no mechanism in the algorithm to
unmatch b ∈ B.
Claim 7.6. After b ∈ B becomes matched, the partners of b will keep getting
better and better (formally if (ai , b) ∈ M at some point in the algorithm and
(aj , b) ∈ M at a later point in the algorithm, then aj <b ai ).
Proof. The partner of b can only change in step (2), when it changes from
ak to ai with ai <bj ak .
Claim 7.7. The algorithm terminates after at most n2 proposals.
Proof. It’s impossible for some ai ∈ A to propose to the same bj ∈ B twice
(just because the algorithm always picks bj to be someone who ai hasn’t
proposed to yet). Thus the most proposals that can happen is |A × B| =
n2 .
Claim 7.8. At the end of the algorithm, everyone is matched.
Proof. We prove this by contradiction. So assume a ∈ A has no partner at
termination. Then there is an unmatched b ∈ B as well (as |A| = |B|).
Since a is unmatched, the algorithm can only have terminated because a had already proposed to all of B; in particular,
at some point a proposed to b. By Claim 7.5, b must have been unmatched
at this point (since b is unmatched at the end). But then b must have accepted the proposal, which contradicts
b being unmatched at the end of the algorithm (by Claim 7.5 again).
Claim 7.9. At the end of the algorithm, M is a stable matching.
Proof. We prove this by contradiction again. Assume that in the final matching the pairs are labelled M = {(a1 , b1 ), . . . , (an , bn )}. If M is unstable, then
for some i, j, we have that bj <ai bi and ai <bj aj .
Note that ai must have proposed to bj at some point (since bj <ai bi , we
have that ai proposes to bj before ai proposes to bi ).
It's impossible that bj rejected ai : indeed this could only happen if bj
was paired at that moment with some ak with ak <bj ai . But then, by Claim 7.6, we have that
aj ≤bj ak <bj ai , contradicting "ai <bj aj ".
So bj accepted ai 's proposal. Then, by Claim 7.6, we have that aj ≤bj ai ,
which again contradicts "ai <bj aj ".
Lecture 8
Turing Machines
In the remainder of the module we will focus on formalizing the idea of
an algorithm in order to give mathematically precise versions of statements
like “a problem can be solved in polynomial time using an algorithm” and
“a problem cannot be generally solved by an algorithm”. We will do this
by introducing objects called “Turing machines” — these are mathematical
objects which model an algorithm, and will be more rigorously defined than
the decimal/arithmetic models we’ve looked at so far. Turing machines were
invented by Alan Turing in 1936, even before computers existed. We first
give an informal description.
• There is a finite alphabet Σ = {1, 2, 3, . . . , a, b, . . . , +, − . . . , ∗} including
the blank symbol ∗.
• There is a processor which is always in some state m ∈ M where M is
a finite set (of all possible states). M contains a starting state and a halting
state. These states are the “computer program” which determines how the
Turing machine acts.
• There is a 2-way infinite tape, with cells containing letters from the
alphabet Σ, like in the figure below:
. . . ∗ ∗ 7 2 c a t 9 0 ∗ ∗ . . .
initially this tape contains the input. At any moment, all but finitely many
cells are blank (meaning that their value is “∗”). Formally, the tape is a
function Z → Σ with all except finitely many values equal to ∗.
• There is a read and write head, called r/w head for short, that can read,
erase, and write symbols from Σ on a given cell of the tape.
At any moment of the computation the Turing machine is in some position
(m, t) where m ∈ M is the state in which the processor is, and t is the
symbol on the tape that the r/w head currently sees. The next move
of the Turing machine is a triple (m′ , t′ , ±1), completely determined by (m, t): the
symbol t on the present cell is replaced by t′ , m′ ∈ M is the next state
of the processor, and the r/w head moves to the next cell on the right or left
depending on whether the value is +1 or −1. (One could write left or right
instead of ±1.)
Mathematically, these concepts are formally described as follows:
ˆ Alphabet: an alphabet is a finite set Σ containing a special symbol
“blank” ∗ ∈ Σ.
ˆ Infinite tape: this is a function Z → Σ with all, except finitely many
symbols equal to the blank symbol “∗”.
ˆ Turing Machine: A Turing Machine is a finite collection of states M .
– State: a state m ∈ M is a function m : Σ → M × Σ × {+1, −1}.
The state with m(t) = (m′ , t′ , σ) is interpreted as "if the machine
sees t while in state m, then write the symbol t′ , switch to state
m′ , and then move right/left along the tape, depending on the
value of σ".
– Starting state: There is a special state mstart ∈ M called the
“starting state” representing the state in which the Turing machine starts the computation.
– Halting states: There is a halting state mhalt ∈ M representing
when the Turing machine stops the computation (sometimes we
allow for several halting states in M ).
ˆ Inputs: an input is some initial configuration of the tape. Usually we
will work with inputs which are strings. A string is an input s : Z → Σ
where, for some k, positions s(1), . . . , s(k) are non-blank, and all other
positions are blank.
To run a Turing machine on some input s : Z → Σ, do the following:
ˆ Put the r/w head on position 0. Set the state of the machine to state
mstart . Perform the following repeatedly:
ˆ If the machine is in a non-halting state m, and is in position i on the
tape, which contains symbol t, then we have m(t) = (m′ , t′ , σ) for some
state m′ ∈ M , symbol t′ ∈ Σ, and σ ∈ {+1, −1}. Do the following:
– Write t′ in position i of the tape.
– Switch the machine to state m′ .
– Move to position i + σ on the tape.
ˆ If the machine is in a halting state mhalt , then stop the computation.
The output of the computation is the final configuration of the tape.
One important remark is that the definition of a Turing machine does
not enforce that the machine ever actually halts. It is entirely possible to
design machines which run forever without entering a halting state. Let's
look at some examples.
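Before the examples, here is a small Python sketch of how a Turing machine as defined above can be simulated: the tape is a dictionary from integers to symbols (missing cells count as the blank ∗), and a machine is a dictionary mapping (state, symbol) pairs to (new state, new symbol, ±1) triples. Since a machine need not halt, the sketch caps the number of steps; all names are illustrative. The machines in the examples below can be written as such dictionaries.

def run_turing_machine(machine, tape_string, start, halting, max_steps=10000):
    """machine: dict (state, symbol) -> (new state, new symbol, +1 or -1)."""
    tape = {i + 1: ch for i, ch in enumerate(tape_string)}   # input in cells 1..k
    state, pos = start, 0                                    # r/w head starts at position 0
    for _ in range(max_steps):          # a machine need not halt, so cap the steps
        if state in halting:
            return state, tape
        symbol = tape.get(pos, "*")
        state, new_symbol, move = machine[(state, symbol)]
        tape[pos] = new_symbol
        pos += move
    raise RuntimeError("did not halt within max_steps")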
8.0.1 Example: erasing a string
The following Turing machine simply erases the input and replaces everything
with the blank symbol. It will work with the alphabet Σ = {a, b, . . . , z, ∗}.
The input s is given as s = s1 s2 . . . sk . It is written on the tape with s1 in
position 1, . . . , sk in position k, and all other entries blank ∗.
The states of the Turing machine are the following:
M = {mstart , merase , mhalt }
mstart (t) = (merase , ∗, +1) for all t
merase (t) = (merase , ∗, +1) for all t ≠ ∗
merase (∗) = (mhalt , ∗, +1)
Here is an example of running this machine on a sample input (the position
of the r/w head is indicated at each step):
. . . ∗ ∗ d o g ∗ ∗ . . .   state: mstart   (head on position 0)
. . . ∗ ∗ d o g ∗ ∗ . . .   state: merase   (head on "d")
. . . ∗ ∗ ∗ o g ∗ ∗ . . .   state: merase   (head on "o")
. . . ∗ ∗ ∗ ∗ g ∗ ∗ . . .   state: merase   (head on "g")
. . . ∗ ∗ ∗ ∗ ∗ ∗ ∗ . . .   state: merase   (head on position 4)
. . . ∗ ∗ ∗ ∗ ∗ ∗ ∗ . . .   state: mhalt    (head on position 5)
8.0.2 Example: deciding if a number is even/odd
The following Turing machine decides if n ∈ N is even or odd. The
alphabet is Σ = {0, 1, . . . , 9, ∗}. The input n is given as n = n1 n2 . . . nk . It is
written on the tape with n1 in position 1, . . . , nk in position k, and all other
entries blank ∗.
The states of the Turing machine are the following:
M = {mstart , mmoveright , mread , m^EVEN_halt , m^ODD_halt }
mstart (t) = (mmoveright , t, +1) for all t
mmoveright (t) = (mmoveright , t, +1) for all t ≠ ∗
mmoveright (∗) = (mread , ∗, −1)
mread (t) = (m^EVEN_halt , ∗, −1) for t = 0, 2, 4, 6, 8, ∗
mread (t) = (m^ODD_halt , ∗, −1) for t = 1, 3, 5, 7, 9
Here is what the TM does under these rules. First it moves right and
switches to the state mmoveright . Then it keeps moving to the right until
it sees a blank ∗, at which point it switches to the state mread and moves
one digit to the left. Now the machine is reading the very last digit of the
number, so depending on whether it is odd or even, it halts, outputting "odd"
or "even".
Here we give an example of this Turing machine running on the input
"n = 234". The position of the r/w head is indicated at each step.
. . . ∗ ∗ 2 3 4 ∗ ∗ . . .   state: mstart       (head on position 0)
. . . ∗ ∗ 2 3 4 ∗ ∗ . . .   state: mmoveright   (head on "2")
. . . ∗ ∗ 2 3 4 ∗ ∗ . . .   state: mmoveright   (head on "3")
. . . ∗ ∗ 2 3 4 ∗ ∗ . . .   state: mmoveright   (head on "4")
. . . ∗ ∗ 2 3 4 ∗ ∗ . . .   state: mmoveright   (head on position 4)
. . . ∗ ∗ 2 3 4 ∗ ∗ . . .   state: mread        (head on "4")
. . . ∗ ∗ 2 3 4 ∗ ∗ . . .   state: m^EVEN_halt
Remark: the way we defined Turing machines, for every non-halting state
and every t ∈ Σ we must specify a command of the form (m′ , t′ , σ). Some
of these commands may end up being unnecessary from the point of view of
the problem which the Turing machine is solving. For example in the above
example, state mread moving the r/w head to the left didn’t do anything
useful. Additionally specifying a behaviour for mread (∗) is unnecessary since
the only way that the machine could be in state mread with ∗ written on the
tape is if initially the tape was entirely blank. However to properly specify
a Turing machine you have to say what every state does for every possible
symbol of the alphabet.
Church’s thesis
At first glance Turing machines seem like a very awkward way of defining
computations. And they are — actually designing a Turing machine for solving any non-trivial task is highly impractical. However the point of Turing
machines is not convenience, but rather being able to define algorithms in
a 100% mathematically precise way. This allows mathematicians to study
questions like “what sorts of problems can one design an algorithm to solve”
and “how fast can one design an algorithm for solving a particular task”.
It is natural to ask whether some more advanced model of computation
(like the arithmetic model or decimal model) is “better” in the sense that
there are tasks that can be performed by those models and not by Turing
machines. Church postulated that the answer is “no”:
Church’s Thesis: Any reasonable notion of computation is equivalent
to what is computable by a Turing machine.
This is not a formal mathematical statement, in the sense that it doesn't
define what a "model of computation" is. Special cases of it can be formalized
(e.g. it is possible to prove that for any task solved by an algorithm in the
decimal model, one can build a Turing machine to solve the same task).
There have been many other notions of computation studied. Some of these
are modifications of Turing machines (like ones where there are several tapes
instead of one, or other movements of the head than just one step right or left,
etc.), while others are more conceptually different, like the arithmetic and
decimal models. But for all the ones considered so far, it has been possible
to prove that they are essentially equivalent in power to Turing machines.
Thus, despite the apparent arbitrary nature of their definition, the idea that
“algorithm = Turing machine” turns out to be reasonable.
9 Decision problems
A decision problem asks whether a given mathematical object has a certain
property or not. Examples: decide, whether a given
ˆ n ∈ N is a power of 2 or not;
ˆ n ∈ N is prime or not;
ˆ a given bipartite graph has a matching of size ≥ k or not;
ˆ a given digraph G with r, s ∈ V has a directed path from r to s or not;
ˆ given k ∈ N and a digraph G with r, s ∈ V has a directed r − s path
of length ≤ k or not;
ˆ a given polynomial p(x1 , . . . , xk ) ∈ Z[x1 , . . . , xk ] has a root (x1 , . . . , xk ) ∈ Z^k or not;
ˆ a given polynomial p(x1 , . . . , xk ) ∈ Z[x1 , . . . , xk ] has a root (x1 , . . . , xk ) ∈ R^k or not.
We want an algorithm (that is, a Turing machine) that decides, for every
given n or bipartite graph G and k ∈ N, etc. whether it has the property
in question or not. In general we need an encoding scheme for the problem
that encodes the objects in question (n ∈ N, digraphs, pairs (G, k) where G
is a bipartite graph and k ∈ N, polynomials etc.) such that their codes can
be inputs of a suitable Turing machine.
To define decision problems formally: recall that every Turing machine
works with an alphabet Σ. The set of all strings over the alphabet Σ0 is denoted
by Σ0^+ . A language L is defined to be any set of strings, i.e. any subset of
Σ0^+ . A decision problem is then defined as:
Definition 9.1.
ˆ A decision problem is a pair of languages D^YES ⊆ D.
ˆ An algorithm T solves the decision problem D if for any I ∈ D, running T on I halts in state m^YES_halt if I ∈ D^YES , and halts in state m^NO_halt
otherwise.
Note that we do not precisely say what “algorithm” means here. It could
be a Turing machine, but could also be an algorithm in the arithmetic or
decimal model.
Here are some examples
DECISION PROBLEM: Even
ALPHABET: Σ = {0, 1, 2, 3 . . . , 9, ∗}
INPUT: a string s representing a natural number, i.e. s = s1 . . . sk with s1 ≠ 0
OUTPUT: YES if, and only if, s is even.
Here D = {s1 . . . sk : si ∈ Σ \ {∗} and s1 ≠ 0} and D^YES = {s1 . . . sk ∈ D :
s1 . . . sk is even}. We've seen a Turing machine which solves this decision
problem in Example 8.0.2.
DECISION PROBLEM: Prime
ALPHABET: Σ = {0, 1, 2, 3 . . . , 9, ∗}
INPUT: a string s representing a natural number, i.e. s = s1 . . . sk with s1 ≠ 0
OUTPUT: YES if, and only if, s is a prime.
Here D = {s1 . . . sk : si ∈ Σ \ {∗} and s1 ≠ 0} and D^YES = {s1 . . . sk ∈ D :
s1 . . . sk is prime}.
DECISION PROBLEM: Connected graph
ALPHABET: Σ = {0, 1, 2, 3 . . . , 9, V, E, =, (, ), {, }, ∗}
INPUT: graph G(V, E)
OUTPUT: YES iff G is connected.
To represent this as a decision problem we need to encode the input using
the alphabet Σ. There are many reasonable conventions we can come up
with for doing this. One is to simply write out the vertices and edges of the
graph as you would in text. So D consists of all strings of the form “V, =
, {, v, 1, . . . , v, n, }, E, =, {, . . . , },". Here the symbols between the commas
are what's written on the tape. So first we write a sequence of vertices and
then we write a sequence of edges. For example "V, =, {, v, 1, v, 2, v, 3, }, E, =
, {, (, v, 1, v, 2, ), (, v, 2, v, 3, ), }," is how we would write the input "G = (V, E)
with V = {v1 , v2 , v3 }, E = {v1 v2 , v2 v3 }". Under this encoding D^YES consists
of strings of the above form which correspond to a connected graph.
DECISION PROBLEM: Empty graph
ALPHABET: Σ = {0, 1, 2, 3 . . . , 9, V, E, =, (, ), {, }, ∗}
INPUT: graph G(V, E) i.e. a string of the form “V, =, {, v, 1, . . . , v, n, }, E, =
, {, . . . , },”
OUTPUT: YES iff G has no edges.
Here we can actually write down a Turing machine which solves this decision problem. In this example D^YES consists of strings of the form "V, =
, {, v, 1, . . . , v, n, }, E, =, {, . . . , }," where there is nothing entered between the
braces at the end, e.g. "V, =, {, v, 1, v, 2, v, 3, }, E, =, {, },". Equivalently,
inputs of this form are ones where there is no "v" following the "E".
To test for this we can construct a Turing machine which reads the input
until it sees "E", then checks whether there is a "v" following that. The following
works:
M = {mstart , m1 , m^YES_halt , m^NO_halt }
mstart (t) = (mstart , t, +1) if t ≠ E
mstart (E) = (m1 , t, +1)
m1 (t) = (m1 , t, +1) if t ≠ v, }
m1 (v) = (m^NO_halt , t, +1)
m1 (}) = (m^YES_halt , t, +1)
Note that in all the above examples the exact way that we encode the
input is rather arbitrary — one could write down countless equivalent ways of
writing the same input. The point is that the formalism of Turing machines
and decision problems gives us some mathematically precise way of encoding
problems. Once we can do this we can meaningfully ask questions like “can
all decision problems be solved by a Turing machine”, “how many steps might
a Turing machine need to solve a decision problem with an input of size n”
etc.
Satisfiability
Next we describe an important decision problem called satisfiability, SAT for
short. It involves Boolean variables x1 , x2 , . . . , xm that can take only two
possible values: True and False. A literal z associated with the Boolean
variable x is either z = x or z = ¬x, the negation of x, meaning simply that
¬x is True iff x is False. A clause C is some literals connected with ’or’, that
is C = z1 ∨ z2 ∨ · · · ∨ zk . Finally a Boolean expression
Φ = C1 ∧ C2 ∧ · · · ∧ Cd
is a conjunction, or rather a conjunctive normal form (a CNF for short), if each Ci is a clause. It is a result in elementary logic that every Boolean
expression can be written as a CNF, but we don’t need this.
A truth assignment is a function V : {x1 , . . . , xm } → {T, F } where
T =True and F =False. So V just assigns True or False value to each
Boolean variable x1 , . . . , xm . This truth assignment extends to literals as
V (zj ) = T iff either zj = xi and V (xi ) = T or zj = ¬xi and V (xi ) = F . It
extends further to clauses naturally as
V (C) = V (z1 ∨ z2 ∨ · · · ∨ zk ) = T iff V (zj ) = T for at least one zj ,
and extends further to CNFs as
V (Φ) = V (C1 ∧ C2 ∧ · · · ∧ Cd ) = T iff V (Ci ) = T for all Ci .
A CNF Φ is satisfiable if there is a truth assignment V such that V (Φ) =
T . Here comes the decision problem Satisfiability and one of its variants
called 3-SAT.
DP: SAT
INPUT: a CNF Φ
OUTPUT: YES iff Φ is satisfiable.
DP: 3-SAT
INPUT: a CNF Φ where every clause has at most 3 literals
OUTPUT: YES iff Φ is satisfiable.
The size of the input in this case is the total number of literals appearing
in the CNF, counted with multiplicities.
Here are some examples. Let C = x1 ∨ ¬x2 and define Φ = C. Then Φ is satisfiable with e.g. “x1 = T and x2 = T”. Next let C1 = x1 ∨ ¬x2 ∨ ¬x3 and C2 = x2 ∨ ¬x3 and Φ = C1 ∧ C2. Again, Φ is satisfiable with “x1 = T, x2 = T, x3 = F”. Finally consider the CNF formula (x1 ∨ x2 ∨ x3) ∧ (¬x1) ∧ (¬x2) ∧ (¬x3). This is not satisfiable since for (x1 ∨ x2 ∨ x3) to be true one of x1, x2, or x3 needs to be true, which stops (¬x1) ∧ (¬x2) ∧ (¬x3) from being true.
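As a quick illustration of these definitions, here is a small Python sketch (not part of the notes; encoding a literal as a pair (i, positive) is an assumption made only for illustration). It evaluates a CNF under a truth assignment exactly as defined above, and checks satisfiability by brute force over all 2^m assignments.

from itertools import product

# a clause is a list of literals (i, True) for xi or (i, False) for ¬xi; a CNF is a list of clauses
def eval_cnf(cnf, V):
    # V(Φ) = T iff every clause contains at least one literal which evaluates to T under V
    return all(any(V[i] == positive for (i, positive) in clause) for clause in cnf)

def is_satisfiable(cnf, variables):
    # the slow, exhaustive approach: try all 2^m truth assignments
    return any(eval_cnf(cnf, dict(zip(variables, values)))
               for values in product([True, False], repeat=len(variables)))

# Φ = (x1 ∨ ¬x2 ∨ ¬x3) ∧ (x2 ∨ ¬x3), satisfied by x1 = T, x2 = T, x3 = F
phi = [[(1, True), (2, False), (3, False)], [(2, True), (3, False)]]
print(eval_cnf(phi, {1: True, 2: True, 3: False}))   # True
print(is_satisfiable(phi, [1, 2, 3]))                # True

# (x1 ∨ x2 ∨ x3) ∧ (¬x1) ∧ (¬x2) ∧ (¬x3) is not satisfiable
psi = [[(1, True), (2, True), (3, True)], [(1, False)], [(2, False)], [(3, False)]]
print(is_satisfiable(psi, [1, 2, 3]))                # False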
Understanding whether there exists an efficient algorithm for solving the SAT decision problem is actually one of the most important open problems in mathematics and computer science. It is the essence of the “P vs NP” problem, which is one of the Millennium problems, for the solution of which the Clay Institute offers a prize of 1 million dollars. In future weeks, we’ll see what the “P vs NP” problem is and how SAT features in it.
Running time of Turing machines
It is quite easy to understand how many steps a Turing machine takes on
an input — this is just the number of times the r/w head moves. To study
running times, we also need a concept of the size of an input. Recall that the
input to a Turing machine is a string, that is, a finite sequence of symbols
from the alphabet Σ0 = Σ \ {∗}, written one after the other. A string x
is then x = w1 w2 . . . wk with each wi ∈ Σ0 , (and k ∈ N is arbitrary). The
length |x| of this string x is |x| = k.
A Turing machine takes a string x ∈ Σ_0^+ as the input. The running time of the Turing machine M on the string x ∈ Σ_0^+ is defined as:
TM (x) = number of steps M takes on x.
Here “step” means a move of the r/w head. The running time is defined to
equal infinity when M does not halt on x.
For a language L, we can also formally define the worst case performance
of a Turing machine M as
TM (L, n) = max{TM (x) : x ∈ L and |x| ≤ n}.
When the language L is just the set of all strings, we abbreviate TM(n) = TM(Σ_0^+, n). The use of max instead of sup is justified as there are only finitely many strings of size at most n.
We say that M runs in polynomial time on a language L if there is an
integer d with TM (L, n) = O(nd ). This is equivalent to there existing a
polynomial p with TM (L, n) ≤ p(n) for all n. If L is simply the set of all
strings then we just say “M runs in polynomial time”.
Definition 9.2. We say that a decision problem D_YES ⊆ D can be solved in polynomial time if there exists a Turing machine M with TM(D, n) = O(n^d) for some d and which correctly solves the decision problem (in the sense of Definition 9.1).
If a decision problem X can be solved in polynomial time, then we say
that “X is in P”, written “X ∈ P”.
A version of Church’s thesis is true for running times too. When solving
exercises you can use the following without proof:
Proposition 9.3. Let X be a decision problem. The following are equivalent.
ˆ X can be solved in polynomial time with a Turing machine
ˆ X can be solved in polynomial time in the arithmetic model.
ˆ X can be solved in polynomial time in the decimal model.
Proving a result like this is quite long and tedious — one needs to construct Turing machines for doing the elementary operations in the decimal/arithmetic models, and then use these to show that any algorithm in
the decimal/arithmetic models can be transformed into a Turing machine.
Lecture 9
Decidability and the Halting Problem
Recall the definition of decision problems.
Definition 9.1.
ˆ A decision problem is a pair of languages DY ES ⊆ D.
ˆ An algorithm T solves the decision problem D if, for any I ∈ D, running T on I halts in state m^YES_halt if I ∈ D_YES and halts in state m^NO_halt otherwise.
A very natural question is whether every decision problem can be solved
by an algorithm. This was actually Turing’s original motivation for defining
Turing machines and so formalising the concept of an algorithm. It turns
out that the answer is “no” — there are problems which cannot be solved by
an algorithm. Such problems are called “undecidable”.
Theorem 9.2 (Turing). There exists a decision problem which cannot be
solved by any Turing Machine.
This theorem basically says that there are decision problems that cannot be solved by any algorithm, i.e. no matter how fast/clever your computer program is, there are questions that it simply cannot answer in general. In order to prove this theorem we need to come up with a decision problem which we can prove is undecidable. This decision problem is easy to describe — it is called the Halting Problem. Informally, the input to the Halting Problem is (M, x), where M is a Turing Machine and x is a string, with the output being YES ⇐⇒ M halts on x. Formally this is defined as follows:
ˆ Decision problem: Halting Problem
ˆ Alphabet Σ = {1, 2, 3, 4, 5, 6, 7, 8, 9, 0, a, A, b, B, . . . , z, Z, =, (, ), {, }, +, −, ,, :, ∗}.
ˆ Input: A Turing Machine M running on Σ (e.g. M = {mstart , mone , . . . })
and an input string x ∈ Σstring (eg x = elephant02). We write this on
the tape as:
{mstart, mone, · · · : mstart(1) = (b, mone, −1), . . . , }elephant02 (1)
In other words, first write an open bracket “{”, then we write out the
states of M (where we have a convention that we only use letters for
labelling these), then we write a colon “:”, then write out what each
state does on each alphabet letter, then write a closed bracket “}”, and
finally write the string x.
There is one technicality here — how do we write out what a state does
on the blank symbol ∗? The natural thing is to write “mstart(∗) =
(∗, mone, −1)” on the tape — however we can’t do this since we want
the input to be a string (and the definition of a string doesn’t allow blank entries between non-blank entries). We remedy this by
writing “mstart(blank) = (blank, mone, −1)” on the tape instead of
“mstart(∗) = (∗, mone, −1)”.
ˆ D = { strings from Σ of the form (1) }
ˆ DY ES = { strings from Σ of the form (1) such that M halts on x }
This decision problem cannot be solved by an algorithm.
Theorem 9.3. There is no Turing machine working on alphabet Σ which
solves the Halting Problem.
Note that the above theorem doesn’t mean that we can never solve a particular instance of the Halting Problem (for some particularly simple Turing Machines it is possible to figure out exactly when they halt). Instead, what it is saying is that whatever algorithm you try to design for solving the Halting Problem, there will always be some input you could feed into the algorithm on which it won’t give the correct answer.
Before proving the above theorem we need an auxiliary lemma, which
constructs a particular Turing machine for an (apparently) unrelated task.
The following lemma builds a Turing machine which takes any string as an
input and duplicates it so that it is now written twice.
Lemma 9.4. There exists a Turing machine MDU P LICAT E which takes a
string a1 , . . . , an as input, and whose output is the string a1 , . . . , an , a1 , . . . , an
(written on positions 1, 2, . . . , 2n of the tape). Additionally, MDU P LICAT E
always halts in position 0 in the state mhalt for such an input.
Proof. Let the alphabet be Σ = {∗, t1, . . . , tk}. We’ll build a Turing machine M. The states of M are M = {m_start, m_read, m_halt, m^{t1}_copy1, m^{t1}_copy2, m^{t1}_return1, m^{t1}_return2, . . . , m^{tk}_copy1, m^{tk}_copy2, m^{tk}_return1, m^{tk}_return2}. Note that this is finitely many states (since the alphabet is finite). The actions of each state are defined as follows:
m_start(t) = (m_read, t, +1)                        for all t
m_read(ti) = (m^{ti}_copy1, ∗, +1)                  for all ti
m_read(∗) = (m_halt, ∗, +1)
m^{ti}_copy1(tj) = (m^{ti}_copy1, tj, +1)           for all ti, tj
m^{ti}_copy1(∗) = (m^{ti}_copy2, ∗, +1)             for all ti
m^{ti}_copy2(tj) = (m^{ti}_copy2, tj, +1)           for all ti, tj
m^{ti}_copy2(∗) = (m^{ti}_return1, ti, −1)          for all ti
m^{ti}_return1(tj) = (m^{ti}_return1, tj, −1)       for all ti, tj
m^{ti}_return1(∗) = (m^{ti}_return2, ∗, −1)         for all ti
m^{ti}_return2(tj) = (m^{ti}_return2, tj, −1)       for all ti, tj
m^{ti}_return2(∗) = (m_read, ti, +1)                for all ti
Running this on an input a1, . . . , an has output a1, . . . , an, ∗, a1, . . . , an. In order to produce a Turing machine for the lemma, we need to combine this with another Turing machine S whose effect is shifting a string left by one position (in order to erase the ∗). We can use the following S = {s_start, s_read, s^{t1}_write, . . . , s^{tk}_write, s_return1, s_return2, s_halt} to do this:
s_start(t) = (s_read, t, +1)                        for all t
s_read(ti) = (s^{ti}_write, ∗, −1)                  for all ti
s_read(∗) = (s_return1, ∗, +1)
s^{ti}_write(t) = (s_start, ti, +1)                 for all t, ti
s_return1(ti) = (s_return1, ti, −1)                 for all ti
s_return1(∗) = (s_return2, ∗, −1)
s_return2(x) = (s_halt, x, +1)                      for all x
Now we can produce a Turing machine MDU P LICAT E satisfying the lemma
by combining M and S: First run M . Replace the state mhalt by sread so
that once M terminates, S is run. The effect is that running MDU P LICAT E
on an input a1 , . . . , an has output a1 , . . . , an , a1 , . . . , an .
We can now prove Turing’s theorem.
Proof of Theorem 9.3. Suppose for contradiction that there is a Turing machine Msolve which can solve the Halting Problem. Then Msolve satisfies the following:
(a) For any Turing Machine M and string x, if M halts on x, then running Msolve on input (M, x) halts in state m^yes_halt.
(b) For any Turing Machine M and string x, if M doesn’t halt on x, then running Msolve on input (M, x) halts in state m^no_halt.
Now we will design a new Turing machine N which takes a string y as an
input. It will run as follows:
Phase 1: First N runs MDU P LICAT E so that yy is written on the tape.
Phase 2: Then N runs Msolve with the following modification:
ˆ The state m^yes_halt is replaced with the state m_loop which is defined by “m_loop(t) = (m_loop, t, +1) for all t”.
To define N formally, we need to specify all its states. We will do this by combining the states of Msolve and MDUPLICATE suitably. First suppose that the states of Msolve are labelled by lowercase letters, while the states of MDUPLICATE are labelled by uppercase letters (so e.g. the starting/halting states of Msolve are called m_start, m^yes_halt, m^no_halt, while the starting/halting states of MDUPLICATE are called m_START, m_HALT). In particular, this implies that the states of Msolve/MDUPLICATE all have distinct labels. Now define N as follows:
N = (MDUPLICATE \ {m_HALT}) ∪ (Msolve \ {m_start, m^yes_halt}) ∪ {m_loop, m_switchphase}
where m_loop(t) = (m_loop, t, +1) for all t and m_switchphase(t) = m_start(t) for all t.
Additionally, we do the following alterations.
ˆ Replace all mentions of m^yes_halt by m_loop everywhere.
ˆ Replace all mentions of m_start by m_switchphase everywhere.
ˆ Replace all mentions of m_HALT by m_switchphase everywhere.
The effect of this is that N runs exactly like it is described above — first
it runs MDUPLICATE, then instead of halting it switches to state m_switchphase and starts running Msolve, and afterwards instead of ever switching to state m^yes_halt, it would switch to state m_loop.
Claim 9.5. N halts in state m^no_halt on input string y ⇐⇒ Msolve halts in state m^no_halt on input yy.
Proof. By construction, when N is run on y, it reaches m_switchphase with yy written on the tape and the r/w head on position 0. After this, N exactly copies the machine Msolve — thus N reaches m^no_halt ⇐⇒ Msolve reaches m^no_halt.
Claim 9.6. Let M be a Turing Machine written as a string on the tape in
the form (1).
Then N halts in state m^no_halt on input string M ⇐⇒ M doesn’t halt on input string M.
Proof. Note that the string M M is of the form (1) where the machine is M
and the input string x is also M .
By Claim 9.5, we have that N halts in state m^no_halt on input string M ⇐⇒ Msolve halts in state m^no_halt on input MM. However, by the assumption that Msolve solves the Halting Problem, this happens exactly when M does not halt on input string M.
Note that it’s possible to run the Turing machine N on the input which
is N . There are two cases depending on whether doing this halts or not.
Case 1: Suppose that N halts on input string N. Then N must halt in state m^no_halt (since this is the only halting state in N by construction). By the “⇒” part of Claim 9.6, this tells us that N doesn’t halt on input string N, which is a contradiction.
Case 2: Suppose that N doesn’t halt on input string N. By the “⇐” part of Claim 9.6, we get that N halts in state m^no_halt on input string N, which is a contradiction.
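The same diagonal argument is often phrased in terms of programs rather than Turing machines. The following Python fragment is only a sketch of the structure of the proof (it is not from the notes, and the function halts is hypothetical: by the theorem no such function can actually be written).

def make_N(halts):
    # halts(prog, inp) is the supposed solver of the Halting Problem, assumed always correct
    def N(y):
        # Phase 1 + Phase 2: feed the "duplicated" input (y, y) to the solver;
        # instead of ever answering YES, loop forever (this plays the role of m_loop)
        if halts(y, y):
            while True:
                pass
        return "NO"        # corresponds to halting in state m^no_halt
    return N

# Running N on (a description of) N itself reproduces Cases 1 and 2:
# N halts on N  <=>  halts(N, N) is False  <=>  N does not halt on N, a contradiction.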
You may think that the decision problem in the proof of Theorem 9.2 is highly artificial. However it is possible to prove that more natural decision problems are undecidable as well. For example, given a multivariate polynomial p(x1, . . . , xk) with integer coefficients, determining whether p(x1, . . . , xk) = 0 has an integer solution is also undecidable. Proofs of results like this are based on relating such problems to the Halting Problem, i.e. proving that they can be solved by an algorithm if, and only if, the Halting Problem can be solved by an algorithm.
10
NP
We’ve already mentioned the “P vs NP” problem and stated it as “decide whether SAT ∈ P or not”. This isn’t the usual phrasing of the problem — it is normally phrased as “P ≠ N P”. In the remainder of the module we’ll define what N P is and explain why the two formulations of the problem are equivalent.
On a basic level, N P is a set of decision problems (just like P is a set of
decision problems). The definition however is substantially more complex.
Definition 10.1. Let Σ be an alphabet, and suppose we have a decision problem D_YES ⊆ D ⊆ Σstring. An algorithm M is a polynomial time certifier for the decision problem D_YES ⊆ D if there is a polynomial q so that:
(1) For all x ∈ D_YES, there exists a string y of length ≤ q(|x|) such that M outputs YES on input (x, y).
(2) For all x ∈ D \ D_YES and all strings y of length ≤ q(|x|), we have that M outputs NO on input (x, y).
(3) M runs in polynomial time (i.e. for any x ∈ D, y ∈ Σstring, TM(x, y) = O((|x| + |y|)^d) for some d ∈ N).
An algorithm M satisfying (1), (2) and (3) is also called an efficient certifier for the decision problem D_YES ⊆ D.
The big new thing in this definition is that M has two parts to its input
— x and y. The x ∈ D is just some instance of the decision problem which
we are trying to solve. On the other hand y ∈ Σstring is just an arbitrary
string — and it is a lot more mysterious what it is doing in the definition.
One way to think about y is as a “hint” towards the solution of the decision
problem. So informally we think of a decision problem (D, DY ES ) as being
in NP if there is an algorithm M which can efficiently solve (D, DY ES ) when
also receiving a hint for how to solve it.
We define a new class, N P, of decision problems.
Definition 10.2. N P is the set of decision problems for which there exists
a polynomial time certifier.
The P vs NP problem is stated as “are there any decision problems which
are in N P, but not in P”. Previously we stated this problem as “decide
whether SAT ∈ P or not”. We’ll eventually show that the two formulations
are equivalent.
Let’s look at a concrete example. Let COMPOSITE be the decision
problem whose input is an n-digit positive integer x and whose output is
YES if, and only if, x is a composite number.
Theorem 10.3. COMPOSITE ∈ N P.
Proof. We’ll build a polynomial time certifier M for COMPOSITE which
runs in the decimal model. Formally it works as follows:
(1) Input: the input for M is a pair (x, y) where x is an integer, whereas y
is an arbitrary string.
(2) Divide x by y, i.e. interpreting the string y as a positive integer, find integers m, r with x = my + r where 0 ≤ r < y (if y does not represent a positive integer, output NO). This is done using Theorem 16 from week 1’s lecture notes.
(3) If r 6= 0, y = 1, or m = 1, then output NO.
(4) If r = 0, y 6= 1, and m 6= 1, then output YES.
We’ll check the definition of “polynomial time certifier” with q(n) = n. Let
x be an input to COMPOSITE of size n i.e. x is a n-digit number. We need
to show three things:
ˆ Let x be a YES instance of COMPOSITE. Then x has a non-trivial factorization x = ab where x − 1 ≥ a, b ≥ 2. Let y = a. Since a < x, we have |y| ≤ |x| = q(|x|) always. When we run M on (x, y), we divide x by y to get x = my + r = ma + r = ab for 0 ≤ r < y = a, which implies that r = 0 and m = b ≥ 2. This shows that M outputs YES on (x, y).
ˆ Let x be a NO instance of COMPOSITE (i.e. x is a prime number), and let y be a string of size ≤ n. When we run M on (x, y), we divide x by y to get x = my + r for some 0 ≤ r < y. Since x is prime we cannot have m, y ≥ 2 and r = 0, and so the output is always NO.
ˆ M runs in polynomial time O((|x| + |y|)^2) as a consequence of Theorem 16 from week 1.
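Here is a minimal sketch of this certifier in Python (not part of the notes: it uses Python’s built-in integer division instead of the decimal model, and the handling of hints that are not numbers is an assumption added only for illustration).

def composite_certifier(x, y):
    # x is the instance (an integer >= 2); y is the hint, a candidate divisor given as a string
    try:
        d = int(y)
    except ValueError:
        return "NO"                      # a hint that is not a number certifies nothing
    if d <= 1 or d >= x:
        return "NO"                      # corresponds to the cases y = 1 or m = 1
    m, r = divmod(x, d)                  # x = m*d + r with 0 <= r < d
    return "YES" if r == 0 else "NO"

print(composite_certifier(91, "7"))      # YES: 91 = 7 * 13, so the hint 7 certifies compositeness
print(composite_certifier(97, "5"))      # NO: 97 is prime, so no hint can produce YES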
The above proof illustrates the common strategy used in essentially all
proofs of decision problems being in N P. Such decision problems can always
be phrased in the form “x ∈ D_YES ⇐⇒ there exists some object z” — e.g. in the above example x ∈ D_YES ⇐⇒ there exists an integer factor of x other than 1 and x. Other examples include “x is a YES instance of SAT ⇐⇒
there exists a T/F assignment to the variables making x true” or “G is a
YES instance of CONNECTED ⇐⇒ there exists a spanning tree of G”. To
build a polynomial time certifier for a decision problem, write an algorithm
which checks whether y is an example of the object z. Afterwards, the proof
proceeds almost exactly like the above theorem.
Here is another, more complicated example.
Theorem 10.4. SAT is in N P.
Proof. SAT can be defined as a decision problem as follows:
ˆ Alphabet Σ = {∗, x, 0, 1, . . . , 9, ¬, ∧, ∨, (, ), =, T, F }.
ˆ D = strings of the form “(x1 ∨ ¬x2) ∧ (¬x3 ∨ x1)”
ˆ DY ES = strings of the above form, for which there exists a satisfying
assignment.
We define an algorithm A, whose input is a pair (x, y) with x ∈ D, y ∈
Σstring . This time the algorithm will be in the arithmetic model:
(i) First check whether y is an assignment of T /F values to the variables
of x i.e. if y is of the form “x1 = T /F, x2 = T /F, . . . , xn = T /F ” (for
some choice of T /F in each case), and with x1, . . . , xn the variables
appearing in x. If y is not of this form, then output NO.
(ii) Next check whether x is satisfied by the T /F assignments given by y.
To do this go through the clauses of x and check that each clause has
at least one literal which is satisfied by y. If x is satisfied by y, then
output YES, otherwise, output NO.
Let x be an input to SAT of size n, i.e. x is a CNF formula of length n. Notice that the number of variables of x is ≤ n. We check the three parts of the definition of “polynomial time certifier” with the polynomial q(n) = Cn for a suitable constant C:
ˆ Let x ∈ D_YES. Then there exists some assignment of T/F to the variables of x making the whole expression true. Let y be the string as in (i) corresponding to this assignment. Since the number of variables of x is at most n, we have |y| ≤ Cn = q(n). It is immediate from the definition of A that A outputs YES on (x, y).
ˆ Let x ∈ D \ D_YES. Let y ∈ Σstring be any string. When we run A on (x, y) the output is always NO. Indeed if y is not an assignment of T/F values to the variables of x, then we output NO at step (i). If y is an assignment of T/F values to the variables of x, then we output NO at step (ii) because (by the assumption that x ∉ D_YES) there is no satisfying assignment for x.
ˆ For any input (x, y), A runs in polynomial time. To see this, notice that step (i) takes O(|x| + |y|) steps (first read y from left to right, checking that it is of the form x1 = T/F, x2 = T/F, . . . , xn = T/F — this is O(|y|) operations; then read x from left to right, checking that for each variable xi that appears in x we have i ≤ n — this is O(|x|) operations). Step (ii) takes O(|x|) steps (first replace each xi by T/F based on what y says it should be, then check if there is any clause of the form (F ∨ F ∨ · · · ∨ F ); if there is such a clause, output NO, otherwise output YES).
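The certifier A can be sketched in the same style (again only an illustration, and the data format for the formula and the hint is an assumption rather than the string encoding used in the proof):

def sat_certifier(cnf, variables, hint):
    # step (i): check that the hint assigns T/F to exactly the variables appearing in the formula
    if set(hint.keys()) != set(variables) or not all(v in (True, False) for v in hint.values()):
        return "NO"
    # step (ii): check that every clause contains at least one literal made true by the hint
    for clause in cnf:
        if not any(hint[i] == positive for (i, positive) in clause):
            return "NO"
    return "YES"

# Φ = (x1 ∨ ¬x2) ∧ (¬x1 ∨ x2) with the hint x1 = T, x2 = T
print(sat_certifier([[(1, True), (2, False)], [(1, False), (2, True)]], [1, 2], {1: True, 2: True}))  # YES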
We end with establishing a containment between P and N P.
Lemma 10.5. P ⊂ N P.
Proof. Let (DY ES , D) ∈ P. We need to show that (DY ES , D) ∈ N P also
i.e. construct a polynomial time certifier B for (DY ES , D). This is done as
follows.
Since (DY ES , D) ∈ P, there is an algorithm A which solves (DY ES , D) in
polynomial time (i.e. TA (x) = O(|x|d ) for some d). Define another algorithm
B whose input is (x, y), and which just runs A on x, while completely ignoring
y. Note that by this definition, for all x, y, the output/running-time of B on
(x, y) equals the output/running-time of A on x.
Claim 10.6. B is a polynomial time certifier for (DY ES , D).
Proof. We check the 3 parts of the definition with the polynomial q(n) = 0.
ˆ Let x ∈ D_YES. Choose y = ∅ ∈ Σstring (i.e. y is the empty string). Then B(x, y) = A(x) = YES.
ˆ Let x ∈ D \ D_YES, and let y ∈ Σstring be an arbitrary string. Then B(x, y) = A(x) = NO.
ˆ For every x, y, we have TB (x, y) = TA (x) = O(|x|d ) = O((|x| + |y|)d ).
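In code, the construction of B from A is just a wrapper that ignores the hint; a two-line sketch (not from the notes; solve_A stands for any given polynomial time algorithm solving the problem):

def make_certifier(solve_A):
    # B(x, y) simply runs the polynomial time algorithm A on x and ignores the hint y
    def B(x, y):
        return solve_A(x)
    return B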
Lecture 10
11
Polynomial time reduction
Polynomial time algorithms are considered to be “fast” whereas algorithms
which are not polynomial time are considered to be “slow”. Almost all the
algorithms we have considered in this module run in polynomial time.
It is of great theoretical and practical interest to understand which decision problems are in P and which are not. One motivation for showing that decision problems are not in P comes from cryptography. When encrypting something it is desirable to know, provably, that there is no polynomial time algorithm to decrypt your cipher.
Unfortunately proving that a decision problem is not in P is extremely
difficult. One problem that is not in P is of course the Halting Problem.
We know that there is no algorithm at all to decide the Halting Problem,
so, in particular, there is no polynomial time algorithm. Beyond the Halting
Problem, there are few decision problems we know of which are provably
outside P. Recall the P vs NP open problem.
Problem 11.1 (P vs NP problem). Show that SAT 6∈ P.
The focus on SAT here may seem rather arbitrary. The basic reason why
SAT is important is that it turns out that many other important problems are
“polynomial time reducible” to SAT. Informally this means that if there is a
polynomial time algorithm for SAT, then we would also have a polynomial
time algorithm for many other problems.
The following definition formalizes the idea of “polynomial time reducible”.
Definition 11.2.
ˆ One decision problem (AY ES , A) is polynomial-time
reducible to another decision problem (BY ES , B), if there is a polynomial time algorithm T so that:
– For any input i ∈ A we have T (i) ∈ B
– T (i) ∈ BY ES ⇐⇒ i ∈ AY ES .
We write X ≤p Y to mean “problem X is polynomial time reducible to
problem Y ”.
This means that if Y is solvable in polynomial time, then so is X. Or if
X is not solvable in polynomial time, then neither is Y .
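The reason is simply that the two algorithms can be composed, as in the following sketch (not from the notes; T and solve_Y are placeholder names for the reduction and for a polynomial time algorithm solving Y):

def solve_X(i, T, solve_Y):
    # i ∈ X_YES  <=>  T(i) ∈ Y_YES, so answering for T(i) answers for i;
    # both steps take polynomial time, hence so does their composition
    return solve_Y(T(i))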
Note that the relation ≤p is reflexive (that is, X ≤p X for every X). It is also transitive, meaning that X ≤p Y and Y ≤p Z imply X ≤p Z. Thus ≤p is a preorder (a reflexive and transitive relation) on the collection of all decision problems.
Let’s look at an example of polynomial time reduction. Consider the
following decision problem.
ˆ Decision problem: IndepSet
ˆ Input: graph G = (V, E) and integer m
ˆ Output YES iff G has a subset of m vertices containing no edges.
A set of vertices in a graph with no edges between them is called an
independent set. We can give a reduction of SAT to IndepSet.
Proposition 11.3. SAT ≤p IndepSet.
Proof. For a polynomial time reduction, for every input Φ to SAT, we have to construct an input (G, m) to IndepSet by a polynomial time (in the size of Φ) algorithm, such that the answer to Φ is YES iff the answer to (G, m) is YES. Here Φ = C1 ∧ C2 ∧ · · · ∧ Cm where each Ci is a clause with ki literals, that is Ci = z1 ∨ z2 ∨ · · · ∨ zki, and each zj here is either a Boolean variable x or its negation. The size of Φ is Θ(k1 + k2 + · · · + km).
Given such Φ, we first construct the graph G. It will have k1 + k2 + · · · + km vertices, each corresponding to one literal occurrence in one clause. Two vertices of G form an edge if they come from the same clause, or if the corresponding literals are negations of each other (i.e. one is x and the other ¬x). Thus G consists of m complete subgraphs K1, . . . , Km
(one corresponding to each clause) plus edges connecting vertices of the type
x and ¬x. From the input Φ to SAT we have constructed the input (G, m)
to IndepSet. This construction takes polynomial time in k1 + k2 + · · · + km .
The next figure shows an example.
Φ = (x1 ∨ ¬x2 ∨ ¬x3 ) ∧ (¬x1 ∨ x3 ∨ ¬x4 ) ∧ (x2 ∨ ¬x3 ∨ x4 )
If Φ is satisfiable then G has an independent set of size m: fix a satisfying assignment and choose, in each complete subgraph Kj, a vertex corresponding to a literal made true by that assignment. This set has size m and is independent: its vertices lie in distinct subgraphs Kj, and no two of them can correspond to literals x and ¬x, since both are true under the assignment. If G has an independent set U of size m, then Φ is satisfiable: U contains exactly one vertex from every complete subgraph Kj, and we set xi true if a vertex corresponding to xi is in U, and set it false if a vertex corresponding to ¬xi is in U (the remaining variables can be set arbitrarily). This is well defined, since U cannot contain vertices for both xi and ¬xi, and it makes at least one literal in every clause true. This shows that Φ is satisfiable iff G contains an independent set of size m, exactly what we wanted.
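A small Python sketch of this construction (not part of the notes; clauses are lists of literals (i, positive) as before, and all names are purely illustrative) builds the IndepSet instance (G, m) from Φ:

def sat_to_indepset(cnf):
    # one vertex (c, j) for the j-th literal of the c-th clause
    vertices = [(c, j) for c, clause in enumerate(cnf) for j in range(len(clause))]
    edges = set()
    for (c1, j1) in vertices:
        for (c2, j2) in vertices:
            if (c1, j1) >= (c2, j2):
                continue
            lit1, lit2 = cnf[c1][j1], cnf[c2][j2]
            same_clause = (c1 == c2)
            negations = (lit1[0] == lit2[0] and lit1[1] != lit2[1])
            if same_clause or negations:
                edges.add(((c1, j1), (c2, j2)))
    return vertices, edges, len(cnf)     # the instance (G, m) with m = number of clauses

# Φ = (x1 ∨ ¬x2 ∨ ¬x3) ∧ (¬x1 ∨ x3 ∨ ¬x4) ∧ (x2 ∨ ¬x3 ∨ x4)
phi = [[(1, True), (2, False), (3, False)],
       [(1, False), (3, True), (4, False)],
       [(2, True), (3, False), (4, True)]]
G_vertices, G_edges, m = sat_to_indepset(phi)
print(len(G_vertices), m)                # 9 vertices, target independent set size m = 3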
12
NP-completeness
The following definition is quite central in the study of algorithms.
Definition 12.1. A decision problem X ∈ N P is NP-complete if Y ≤p X
for every Y ∈ N P.
Informally this definition means that an NP-complete problem is a “hardest” problem in the class N P: every other problem in N P can be polynomial-time reduced to it.
A priori it is not at all obvious that NP-complete decision problems even
exist. A breakthrough theorem proved independently by Cook and Levin
shows that SAT is NP-complete.
Theorem 12.2 (Cook-Levin). SAT is NP-complete.
The famous “P vs NP” problem asks whether there are decision problems
which are in N P which aren’t in P. Using the above theorem we see this
is equivalent to the question of whether SAT is in P or not. Indeed, if SAT is in P, then using the definition of NP-complete we get a polynomial time algorithm for solving every decision problem in N P. On the other hand, if SAT is not in P, then (using the fact that SAT ∈ N P) we obtain that P ≠ N P. We now give the proof of the Cook-Levin Theorem. This proof
is non-examinable.
Proof. Let DY ES ⊆ D be a decision problem in N P. To prove the theorem,
we need to find a polynomial time reduction of this decision problem to SAT.
First, let’s recall what we know about N P. From the definition, we know that there is a Turing machine M and polynomials p, q such that M is a polynomial time certifier for D_YES ⊆ D with hints y of length ≤ q(n) and running time at most p(n) on inputs of size n.
We’ll prove the following lemma which, when applied to the Turing machine M , will imply the theorem.
Lemma 12.3. There is a function f : {Turing machines} × {length n inputs} → {CNF formulas} so that f(M, x) is satisfiable if, and only if, there exists some string y of length ≤ q(n) with M(x, y) = YES and M halting within p(n) steps on (x, y).
Additionally there is a polynomial time algorithm which finds f(M, x) for each M, x.
The theorem is immediate from the lemma. We need to give a polynomial time reduction of D_YES ⊆ D to SAT. This means a polynomial time algorithm which takes an instance of D as an input, and gives an input of SAT as an output. The algorithm is simply “given an instance x ∈ D, find f(M, x)” (using the polynomial time algorithm from Lemma 12.3). Note that M is fixed here (it is the Turing machine which comes from the definition of the decision problem D_YES ⊆ D being in N P).
From Lemma 12.3, we know that f(M, x) is satisfiable if, and only if, there exists some string y of length ≤ q(n) with M(x, y) = YES and M halting within p(n) steps on (x, y). From the definition of “M is a polynomial time certifier for D_YES ⊆ D”, we know that for each x, there exists some string y of length ≤ q(n) with M(x, y) = YES and M halting within p(n) steps on (x, y) if, and only if, x ∈ D_YES. Combining these we obtain that f(M, x) is satisfiable if, and only if, x ∈ D_YES, i.e. we have verified the definition of “polynomial time reduction of D_YES ⊆ D to SAT”.
It remains to prove the lemma.
Proof of Lemma 12.3. We have a Turing machine M and input x. We want
to construct a CNF formula f (M, x). The basic idea is to write down a bunch
of clauses which “model” the running of a Turing machine. First we need to
define the variables which the CNF formula will be built out of. Let M have
m states and the alphabet Σ have s + 1 symbols. It will be convenient to
think of the symbols of Σ as numbers i.e. Σ = {0, 1, . . . , s} with 0 being the
blank. The variables of the CNF formula f (M, x) will be:
ˆ Qi,j which will represent whether M is in state j at step i.
ˆ Si,j,k which represents whether position j of the tape at step i contains
symbol k.
ˆ Ti,j which represents whether the r/w head is on position j at step i.
We will now write down a long list of clauses which all encode some
particular aspect of the running of a Turing machine. They will be grouped
under 10 “rules”.
ˆ Rule 1: at each step i, the machine is in at least one state.
This is encoded by the clause Qi,1 ∨ Qi,2 ∨ · · · ∨ Qi,m . By joining all these
clauses using “∧” we get a CNF formula which encodes “at all steps,
the machine is in at least one state”.
ˆ Rule 2: at each step i, the machine is in at most one state.
First notice that the boolean formula ¬(Qi,j ∧ Qi,k ) encodes “at step i
the machine is not simultaneously in states j and k”. This is logically
equivalent to the OR statement ¬Qi,j ∨ ¬Qi,k . By joining all these
clauses using “∧” for all i and j 6= k we get a CNF formula which
encodes “at all steps, the machine is in at most one state”. Combining
this with Rule 1 (using ∧), we can encode “at all steps, the machine is
in exactly one state”
ˆ Rule 3: at each step i, position j on the tape contains at least
one symbol. This is encoded by the clause Si,j,0 ∨ Si,j,1 ∨ · · · ∨ Si,j,s . By joining all these clauses using “∧” we get a CNF formula which encodes “at all steps and all positions, the tape contains at least one symbol”.
ˆ Rule 4: at each step i, position j on the tape contains at most
one symbol. First notice that the boolean formula ¬(Si,j,a ∧ Si,j,b ) encodes “at step i, position j on the tape doesn’t simultaneously contain
symbols a and b”. This is logically equivalent to the OR statement
¬Si,j,a ∨ ¬Si,j,b . By joining all these clauses using “∧” for all i, j and
a 6= b we get a CNF formula which encodes “at all steps, in all positions there is at most one symbol”. Combining this with Rule 3 (using
∧), we can encode “at all steps, in all positions, there is precisely one
symbol”.
ˆ Rule 5: at each step i, the r/w head is in at least one position
This is encoded by the clause Ti,−p(n) ∨ · · · ∨ Ti,−1 ∨ Ti,0 ∨ Ti,1 ∨ · · · ∨ Ti,n+p(n) .
By joining all these clauses using “∧” we get a CNF formula which
encodes “at all steps the r/w head is in at least one position”.
ˆ Rule 6: at each step i, the r/w head is in at most one position.
First notice that the boolean formula ¬(Ti,a ∧ Ti,b ) encodes “at step i,
the r/w head is not simultaneously in positions a and b”. This is
logically equivalent to the OR statement ¬Ti,a ∨ ¬Ti,b . By joining all
these clauses using “∧” for all i and a 6= b we get a CNF formula
which encodes “at all steps, the r/w head is in at most one position”.
Combining this with Rule 5 (using ∧), we can encode “at all steps, the
r/w head is in precisely one position”.
ˆ Rule 7: at step 0, the r/w head is at position 0 and the machine
is in state 1. This is encoded by the CNF formula T0,0 ∧ Q0,1 .
ˆ Rule 8: if at step i the machine is at position a, sees symbol b, and is in state c with mc (b) = (md , e, f ), then at step i + 1 the symbol e is written in position a, the head is at position a + f , and the machine is in state md . This is the “main” rule, which encodes the fact that
the Turing machine acts like a Turing machine. To encode it we use
the fact that the logical symbol for implies “ =⇒ ” can be used to
encode if/then statements. We encode it using the boolean formula
Ti,a ∧Si,a,b ∧Qi,c =⇒ Ti+1,a+f ∧Si+1,a,e ∧Qi+1,d . Using the either/or form
of “implies”, this is logically equivalent to the CNF formula (¬Ti,a ∨
¬Si,a,b ∨ ¬Qi,c ∨ Si+1,a,e ) ∧ (¬Ti,a ∨ ¬Si,a,b ∨ ¬Qi,c ∨ Ti+1,a+f ) ∧ (¬Ti,a ∨
¬Si,a,b ∨ ¬Qi,c ∨ Qi+1,d ). By joining all these formulas using “∧” for all i, a, b, c (with d, e, f determined by mc (b) = (md , e, f )), we get a single CNF formula which encodes Rule 8 always holding.
ˆ Rule 9: At step 0, the tape has the string x written on it. Otherwise the tape is blank except possibly for the q(n) positions immediately to the right of x (where y can be written). Let x = x1 . . . xn , where each xi ∈ {1, . . . , s} is a symbol from the alphabet. Then the CNF formula S0,1,x1 ∧ S0,2,x2 ∧ · · · ∧ S0,n,xn encodes “x is written on the tape between positions 1 and n”. Similarly, the CNF formula S0,0,0 ∧ S0,−1,0 ∧ · · · ∧ S0,−p(n),0 ∧ S0,n+q(n)+1,0 ∧ S0,n+q(n)+2,0 ∧ · · · ∧ S0,n+p(n),0 encodes “all other entries are blank, except possibly the q(n) entries immediately to the right of x”. Combining these two CNF formulas using ∧ encodes Rule 9.
ˆ Rule 10: The machine halts with output YES. Let m^YES_halt be the kth state of the machine. Then Rule 10 can be encoded by Q1,k ∨ Q2,k ∨ · · · ∨ Qp(n),k .
Define the CNF formula f (M, x) as the combination of all the CNF formulae from Rules 1 – 10 using “∧”.
First, note that given M, x, f (M, x) can be calculated in polynomial time.
To see this, first observe that f (M, x) has length O(p(n)3 ) (to see this, go
through each of Rules 1 – 10 and check that O(p(n)3 ) is an upper bound on
the length of the CNF formulae defined in each rule). Thus f (M, x) can be
calculated in time O(p(n)3 ), simply by going through Rules 1 – 10 one by
one and writing out the formulae involved in each rule.
From the definition of Rules 1 – 10, f (M, x) has a satisfying assignment if, and only if, there is some string y of length ≤ q(n) which can be written after x such that running M on (x, y) halts in m^YES_halt within p(n) steps (since the rules exactly encode the running of a Turing machine). This concludes the proof of the lemma.
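To see how mechanical this construction is, here is a short Python sketch (not from the notes) generating just the clauses of Rules 1 and 2, with a variable Qi,j represented as the tuple ("Q", i, j) and a literal as a (variable, sign) pair:

def rules_1_and_2(num_steps, num_states):
    clauses = []
    for i in range(num_steps + 1):
        # Rule 1: at step i the machine is in at least one state
        clauses.append([(("Q", i, j), True) for j in range(1, num_states + 1)])
        # Rule 2: at step i the machine is not in two distinct states j < k simultaneously
        for j in range(1, num_states + 1):
            for k in range(j + 1, num_states + 1):
                clauses.append([(("Q", i, j), False), (("Q", i, k), False)])
    return clauses

# with 2 steps and 3 states: 3 time steps, each contributing 1 + 3 clauses, i.e. 12 clauses in total
print(len(rules_1_and_2(num_steps=2, num_states=3)))   # 12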