Physical systems for the solution of hard computational problems
Peter Mattsson
Master of Science
Cognitive Science and Natural Language Processing
School of Informatics
University of Edinburgh
2003
Abstract
We start from Landauer’s realization that “information is physical”, i.e. that
computation cannot be disentangled from the physical system used to perform it,
and ask what the capabilities of physical systems really are. In particular, is it
possible to design a physical system which is able to solve hard (i.e. NP-complete)
problems more efficiently than conventional computers?
Chaotic physical systems (such as the weather) are hard to predict or simulate,
but we find that they are also hard to control. The requirement of control turns
out to pin down the non-conventional options to either neural networks or quantum computers. Alternatively, we can give up the possibility of control in favour
of a system which is basically chaotic, but is able to settle at a solution if it
reaches one. However, systems of this type appear inevitably to perform a type
of stochastic local search.
Our conclusion is that quantum computers give us more computational “tools”
(though using them requires insight into the problem to be solved), but that
all other physical systems are essentially no more powerful than a conventional
computer.
Acknowledgements
First, I would like to thank my supervisor, David Barber, for help, encouragement
and long chats in Starbucks. I would also like to thank Douglas Adams, for
giving me an insouciant attitude to deadlines, as well as Wolfgang Lehrach, David
Heavens and others (you know who you are!) for getting me through the final
stretch without loss of sanity or humour.
Declaration
I declare that this thesis was composed by myself, that the work contained herein
is my own except where explicitly stated otherwise in the text, and that this work
has not been submitted for any other degree or professional qualification except
as specified.
(Peter Mattsson)
To my parents.
Table of Contents
1 Introduction
  1.1 Complexity theory
  1.2 The satisfiability problem
    1.2.1 Search space structure
  1.3 Solution methods for the satisfiability problem
    1.3.1 Local search methods
    1.3.2 Genetic/evolutionary methods
    1.3.3 Message passing methods
  1.4 Summary

2 On the complexity of physical systems
  2.1 The neon route-finder
    2.1.1 Discussion
  2.2 General properties of physical systems
  2.3 Analogue vs. digital
    2.3.1 A diversion into chaos theory
  2.4 The problem of local continuity
    2.4.1 The Zealot Model
    2.4.2 The Ising model
  2.5 Summary

3 Novel computing devices
  3.1 Mechanical computers
  3.2 Electrical analog computers
  3.3 Artificial neural networks
  3.4 Circuit-based “consistency computers”
    3.4.1 The travelling salesman problem
    3.4.2 The satisfiability problem
    3.4.3 Integer factorization
    3.4.4 Conclusions
  3.5 DNA computers
  3.6 Amorphous computers
  3.7 Summary

4 Quantum computers
  4.1 Quantum mechanics, a brief introduction
    4.1.1 Wave-particle duality
    4.1.2 The Schrödinger wave equation
    4.1.3 Quantum states
    4.1.4 Operators and Measurement
    4.1.5 Quantisation and spin
    4.1.6 Quantum parallelism, entanglement and interference
    4.1.7 Heisenberg’s uncertainty relations
    4.1.8 The problem of measurement
    4.1.9 Quantum computers
    4.1.10 Approaches to building quantum computers
    4.1.11 Error correction
  4.2 Quantum algorithms
  4.3 Grover’s database search algorithm
  4.4 Shor’s factoring algorithm
  4.5 What makes quantum computers powerful?
    4.5.1 Classical versions of Grover’s algorithm
    4.5.2 Classical entanglement
    4.5.3 Conclusions
  4.6 Summary

5 Test case
  5.1 Error correcting codes
  5.2 Complexity of the decoding problem
  5.3 Cryptography
    5.3.1 The Barber scheme
    5.3.2 Discussion
  5.4 Solving the decoding problem using physical models
    5.4.1 Ising model formulation
    5.4.2 Neural network formulation
  5.5 Summary

6 Conclusions
  6.1 Possibilities for further work

A Miscellaneous quantum questions
  A.1 Quantum “nonlocality” and the EPR experiment
  A.2 Are space and time discrete?

Bibliography

And finally...
List of Figures
1.1 Likely structure of the NP problem class, taken from [58].
1.2 For random 3-SAT problems, the time taken to find a solution (here measured by the number of algorithm calls) peaks at α ≈ 4.25, and the peak becomes increasingly marked as the number of variables rises. (Adapted from [47] by Bart Selman.)
1.3 Probability that a random 3-SAT problem will be satisfiable, as a function of α. Note the sharp drop centered around α ≈ 4.25. (Adapted from [47] by Bart Selman.)
1.4 Phase diagram for 3-SAT. The variables plotted are: Σ, defined as log(no. of satisfiable states)/N; e_0, the minimum number of violated clauses per variable; and e_th, the typical energy (per variable) of the most numerous local minima. (An energy of zero implies that all clauses are satisfied.) Taken from [54].
2.1 The shortest route between Imperial College and Admiralty Gate, as found by the neon route-finder. (Reproduced from the Imperial College Reporter.)
2.2 Example “route-finding” circuit.
2.3 A circuit with multiple branches can be considered as a circuit with only one branch, but many leaves. To do this, we need to calculate the equivalent resistance of each path through the multiply-branching circuit.
2.4 Ising “wire”. The solid lines represent +J-weight bonds and the dotted lines represent 0-weight bonds.
2.5 Ising “AND gate”. The solid lines represent +J-weight bonds and the dotted lines represent 0-weight bonds. The inputs are on the left, and the output is on the right. The middle input must be an impurity, used to bias the result.
3.1 Basic model of a neuron.
3.2 Co-operative pair of neurons.
3.3 Travelling salesman model. The lines represent wires, with the arrows representing the direction in which the potential difference across that wire encourages the current to flow. The cities – C1 to C4 – are small artificial neural networks which have the link wires as inputs and outputs.
3.4 Travelling salesman city node network.
3.5 Basic structure of the satisfaction problem network, for a seven-variable problem.
4.1 Fredkin’s “billiard ball” model of computation. The presence or absence of a ball represents a value of 1 or 0, respectively, so this implements a two-input, four-output reversible logic gate.
4.2 Physical implementation of Deutsch’s algorithm. A half-silvered mirror divides an input laser beam in two, sending one part along Path 0 and the other along Path 1. The two beams interfere again at a second half-silvered mirror, and a pair of photo-sensitive detectors record the results. Taken from [66].
5.1 “Ising model” corresponding to the example generator matrix. This model has non-local bonds, but it is still planar, so it is still easy to find the ground state exactly.
5.2 Probability of successful decoding, P(N), for a noise level of 5%. The success probability drops, apparently exponentially, as N increases.
5.3 F(p_c, p_e) for p_e = 0.1, for various matrix densities (n, m). For (2, 4), the naïve and error estimates are correlated, but as soon as the density rises above this they fail to be.
5.4 F(p_c, p_e) for n = 2, m = 4 and values of p_e between 0.1 and 0.4. This shows that noise levels of up to about 20% can be tolerated.
5.5 By arranging the spins into a grid, simple nearest-neighbour bonds give most of the spins four bonds each. A few non-local bonds round the edges can then ensure that all the spins have four bonds without sacrificing the planarity of the model.
Chapter 1
Introduction
Information is physical.
Rolf Landauer
Information surrounds us, and computation is constantly taking place; whether
it is the wind manipulating leaves and branches, or electricity manipulating bits
and switches, the principle is the same. Any physical process can be thought of
as a computation; why, then, should we restrict attention to the limited class of
semiconductor devices that are currently used in computing? This project was
motivated by the thought that maybe physics could offer us a few more tools
than we have yet taken advantage of; we wanted to find, in short, the source of
computational power.
Our very existence is proof that other computing paradigms are possible, and can
be powerful; our brains are capable of much that we have, as yet, been unable to
replicate using semiconductor-based computers. Is this a failure of ingenuity on
our part, or does the physical makeup of a brain bestow it with fundamentally
greater power? Much has been made of its massive parallelism, with 10^11 neurons,
each with tens of thousands of connections; is this the key? Alternatively, does
its adaptive, stochastic operation give it a greater ability to find approximate
answers, at the cost of precision? Does nature have any other tricks up her
sleeve? Are some calculations simply impossible? These are big questions, and
we would not claim to have all the answers, but we do aim at least to clarify the
options and rule out some of the possibilities.
The remainder of this chapter will be devoted to a discussion of what we mean by
the term “hard computational problem”, and of some of their typical properties.
The following chapter will then develop very general arguments which put upper
bounds on the computational ability of any imaginable physical system. Following that, we discuss several alternative approaches: analog, neural, DNA and
quantum computation. From an examination of their strengths and weaknesses,
we then focus on a sample problem, which has the advantage of being naturally
realizable on several different physical systems.
Please note that, as this thesis has to cover a wide range of subject-matter,
it contains an unusually large component of introductory and review material.
Those who are familiar with particular areas should feel free to skim or skip
through this material.
1.1 Complexity theory
I don’t know much about Art, but I know what I like.
Anonymous
This section is intended as a brief, general introduction to the aspects of complexity theory that will be required for the remainder of the thesis. As such, it is
neither complete nor rigorous; for that, books such as [53] provide a good starting
point.
The two fundamental classes of problems that we will consider are known as
polynomial-time (P) and nondeterministic polynomial-time (NP). Problems in
the first class can be reliably solved by algorithms whose maximum execution
time (on current computer hardware) is bounded by a polynomial in the size of
the problem: they can be solved in “polynomial time”. For a problem to be in the
NP class, by contrast, all that is required is that it be possible to verify or reject
a possible solution in polynomial time; we just need to “know it when we see it”.
The term “nondeterministic” refers back to the original definition of these classes
in terms of Turing machines. Problems in P are solvable in polynomial time on a
standard (deterministic) Turing machine; those in NP are solvable in polynomial
time on a nondeterministic Turing machine. A nondeterministic Turing machine
is able, at suitable points in a computation, to take random “guesses”; its running
time is measured for the case where all the guesses turn out to be right.
While problems in P are also in NP (i.e. P is a subset of NP), there are also
problems which are thought to be in NP but not P [1]. (Figure 1.1 shows a likely
guess as to the structure of NP.) The most efficient known algorithms for these
problems typically run in “exponential time”, i.e. their execution time is bounded
by an exponential function of the problem size. A well-known example of such
a problem is the so-called “travelling salesman problem”. In this problem, a
salesman has to visit clients in a number of different cities, and we must find
the shortest possible route that visits all these cities exactly once, starting and
finishing at the salesman’s home base. A naïve method of solving this problem
would be to simply enumerate all possible routes, then pick the shortest. However,
for N cities, this yields (N − 1)!/2 possibilities, a number which exceeds 10^100
by N = 72. A number of better algorithms have been developed, but all run
in exponential time. This is a hallmark of this type of problem: the obvious
algorithm involves checking an exponentially increasing number of possibilities
(because checking an answer is the one thing we know to be easy), and there do
not appear to be any significant shortcuts.
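To make this growth concrete, the following short Python sketch (using only the standard library) counts the distinct tours for a few values of N; it simply reproduces the figure quoted above.

    from math import factorial

    def num_tours(n_cities: int) -> int:
        """Distinct closed tours through n cities, fixing the start
        city and ignoring direction: (n - 1)!/2."""
        return factorial(n_cities - 1) // 2

    for n in (5, 10, 20, 72):
        print(n, num_tours(n))

    # num_tours(72) = 71!/2 has 102 digits, i.e. it already exceeds 10^100,
    # so brute-force enumeration is hopeless long before problems of this size.
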
A crucial property of the NP problem class is that there exist problem types
which can be reduced to any other problem type in NP (i.e. we can restate any
problem in NP as an example of one of these types); what’s more, this can be
done in polynomial time. These problems are known as NP-hard; if they also
happen to fall into the NP class, they are called NP-complete. This property
means that, if we were able to find a method for solving arbitrary problems of a
[1] Although this is widely considered to be true, it is a famous open problem, for whose solution the Clay Mathematics Institute is offering a prize of $1,000,000.
Figure 1.1: Likely structure of the NP problem class, taken from [58].
given NP-complete type in polynomial time, we could use this to solve any NP-class problem in polynomial time. For us, it means that we can focus on the task
of designing a physical system to solve a particular NP-complete problem (the
“satisfiability problem”, to be introduced shortly) and draw general conclusions
about the ability of such systems to solve NP-complete problems.
Although all known exact algorithms for NP-complete problems run in exponential time in the worst case, it turns out that, in fact, the worst case seldom
happens. The majority of examples of NP-complete problems can actually be
solved quite easily, by heuristic or stochastic algorithms. In addition, for some
classes of NP-complete problem (such as the travelling salesman problem) good
approximate solutions can almost always be found. This means that the existence of an algorithm which can exactly or approximately solve the majority of
instances of an NP-complete problem, while useful in practice, says little about
the complexity of the problem as a whole.
For example, proteins are synthesized from amino acids and subsequently relax
(“fold”) into a lower-energy configuration, in general choosing the configuration
with the lowest possible energy, or one close to it. Finding the minimum-energy
fold is, in general, an NP-complete problem [60], but approximate solutions are
usually easy to find (and the body has mechanisms for destroying proteins which
fold wrongly). In addition, the crucial biological features of proteins are centred
on a small number of binding sites; if these form due to local folding, the protein
can still be useful even if it is quite far from the minimum-energy configuration.
All this means that the routine solution of this problem by the human body
does not tell us anything useful about the computational power of the physical
processes involved; the question of finding the minimum-energy configuration may
even be irrelevant.
Hard instances of NP-complete problems share three features, which lie at the
root of their difficulty. First, they have a number of different local optima (i.e.
points in their phase space where a slight adjustment of any of the parameters
makes matters worse); this traps any solution method which works by moving
towards better solutions by considering only the local landscape. Secondly, it
is not possible to “factorize” the problem into smaller pieces and solve them
separately. For example, in the travelling salesman problem, it is not usually
possible to consider parts of the route in isolation: the cities chosen for one part
of the route affect the choices available to other parts. By contrast, a route-finding problem (where we simply want to find the shortest route from one point
to another) does not have this constraint: adjusting one part of a route has no
impact elsewhere, provided the start and end points of the section are kept fixed.
The third difficult feature of NP-complete problems is that they appear statistically random: rather like the functions used in random number generators, their
structure does not always show up in statistical tests. As a result, any solution
method which uses statistical means to identify solutions (such as belief propagation, for example), will be bound to fail. The field of cryptography provides
many good examples of this, as most methods for breaking cryptographic codes
rely either on statistical features of the original message showing up in the ciphertext, or on regularities introduced by the encoding process. For example, a
simple substitution cipher (where e.g. all ‘a’s are replaced by ‘c’s, all ‘b’s by ‘g’s
and so on) provides a huge number of possibilities, namely 26! (about 4 × 10^26),
but can be broken by hand in a matter of minutes by taking advantage of the
characteristic frequencies of the different letters in a typical message. Switching
to a homophonic cipher (where each letter is replaced by one of a set of glyphs,
the size of the set allocated to each letter being proportional to its frequency)
washes out this information, but it is still possible to break the cipher by considering characteristic letter groupings. Once we completely remove all statistical
cues, however, the code becomes essentially unbreakable.
Mertens [59] was able to show that, according to a battery of statistical tests, the
hardest NP-complete problems were indistinguishable from the task of identifying
whether a given number was in a randomly-ordered list. The most damaging consequence of this is that there can be no reliable measure of how close a particular
point in the solution space of an NP-complete problem is to the optimal solution.
Any measure we might use – such as the length of the tour in a travelling salesman problem – will give us no indication of how much of the tour matches an
optimal tour. (This is why, as discussed above, finding a good solution is a very
different thing from finding the optimal one.)
It is interesting to draw a comparison at this point with P-complete problems
(which play an analogous role for the P class, and represent the “hardest” problems in the class). These can be factorized, but only in the sense that they can
be divided into pieces and the pieces solved one at a time; it is not possible to
solve them in parallel, as the initial conditions of each piece depend on the results of solving at least one other piece. Thus both P-complete and NP-complete
problems are, in some sense, tied together; the difference is that NP-complete
problems are tied together in a more complex manner. In effect, the “pieces”
of an NP-complete problem each depend on the solution of all the other pieces,
so there is no way to start the solution process without guessing at some of the
answer.
The degree to which an example of an NP-complete problem is “tangled” accounts
for the fact that not all such problems are hard to solve. Examples with few
constraints, which are only loosely connected, usually have a number of different
solutions. In addition, it is often the case that treating the problem’s parts as
being independent is a reasonable (and helpful) approximation. At the other end
of the scale, problems with many constraints are usually so heavily connected
that it is easy to either find a proof that the problem has no solution or use only
a few guesses to extrapolate out to the full solution. It is only in the middle
ground, where there are enough constraints to reduce the number of solutions
without seriously compromising the size of the search space, that hard examples
are to be found.
1.2 The satisfiability problem
You are chief of protocol for the embassy ball. The crown prince
instructs you to either invite Peru or to exclude Qatar. The queen
asks you to invite either Qatar or Romania or both. The king, in a
spiteful mood, wants to snub Peru or Romania or both. Is there a
guest list that will satisfy the whims of the entire royal family?
A satisfiability problem, by Brian Hayes [39]
We are now ready to introduce our featured problem; this will appear, in various
guises, several times in the course of our discussion, so it is worth giving a general
introduction to it here. The satisfiability problem is arguably the archetypal
example of an NP-complete problem (it was the first, in fact, to be proved NP-complete [40]), and has been heavily studied, making it ideal for our purposes.
The problem can be stated as follows: given a Boolean formula (i.e. a formula
involving only Boolean variables), is it satisfiable? In other words, is it possible
to allocate values to the variables in the formula such that the whole formula is
true?
In analysing this question, we can take advantage of a useful property of Boolean
formulas: they can always be restated in both conjunctive normal form (CNF)
and disjunctive normal form (DNF). In CNF, the formula is written as a series
of conjoined clauses (i.e. clauses coupled together by AND operators), such that
each subclause consists of a series of disjoint – and possibly negated – variables
(i.e. variables coupled together by OR operators).
To be more precise, if we use ∧ for AND, ∨ for OR, ¬ for NOT and T and F for true and false, a formula in CNF is of the form

C_1 ∧ C_2 ∧ C_3 ∧ ··· ∧ C_n,

where the C_i are of the form

C_i = X_1 ∨ X_2 ∨ X_3 ∨ ··· ∨ X_m.

The X_j can each be either x_j, ¬x_j or F, as desired, and the x_j (j = 1, ..., m) are the Boolean variables in the formula. (Normally, we drop the trivial F terms.)
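Concretely, a CNF formula can be stored as a list of clauses, each clause a list of signed integers (a common convention in SAT tools): +j stands for x_j and −j for ¬x_j. A minimal Python sketch of this encoding, together with a satisfaction check, is:

    def literal_true(lit, assignment):
        """assignment[j - 1] holds the value of variable x_j;
        the literal +j means x_j and -j means NOT x_j."""
        value = assignment[abs(lit) - 1]
        return value if lit > 0 else not value

    def satisfies(formula, assignment):
        """True if every clause (a list of signed integers) contains
        at least one true literal."""
        return all(any(literal_true(lit, assignment) for lit in clause)
                   for clause in formula)

    # The 2-SAT example discussed below: (A OR NOT B) AND (NOT A OR C) AND (B OR NOT C)
    example = [[1, -2], [-1, 3], [2, -3]]
    print(satisfies(example, [True, True, True]))    # True
    print(satisfies(example, [True, False, False]))  # False
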
A common formulation of the satisfiability problem is in terms of a CNF formula
with exactly k non-trivial terms per subclause; this is known as k-SAT. Clearly,
1-SAT is trivial: simply work through the subclauses, assigning variable values
so as to make them true, until either we run into a contradiction or all subclauses
are satisfied. The next case, 2-SAT, can also be handled relatively easily, though
it is no longer trivial. Consider the following example:
(A ∨ ¬B) ∧ (¬A ∨ C) ∧ (B ∨ ¬C)
One way to solve this is to start by assuming that A is true; this satisfies the first
subclause. To satisfy the second subclause, we then have to assume that C is true.
Finally, to satisfy the third subclause we require that B is true. (Had we started
by assuming that A was false, we would have found that B was false – to satisfy
the first subclause – and that C was also false – to satisfy the third subclause.)
In this way, one starting assumption allows us to make a cascade of deductions,
thereby dealing with either the whole problem or at least a non-trivial subset of
it. It turns out that this allows all 2-SAT problems to be solved in polynomial
time [38].
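The cascade of forced deductions described above is easy to mechanise. The following sketch (an illustration of the cascade only, not a complete polynomial-time 2-SAT solver) fixes A and then repeatedly applies unit propagation, using the signed-integer clause encoding introduced earlier:

    def lit_value(lit, assignment):
        """True/False if the literal's variable is assigned, else None."""
        v = assignment.get(abs(lit))
        if v is None:
            return None
        return v if lit > 0 else not v

    def propagate(clauses, assignment):
        """Repeatedly force the value of any literal that is the last
        unassigned one in an otherwise-false clause.  Returns the extended
        assignment, or None if a clause is falsified (a contradiction)."""
        assignment = dict(assignment)
        changed = True
        while changed:
            changed = False
            for clause in clauses:
                values = [lit_value(l, assignment) for l in clause]
                if True in values:
                    continue                          # clause already satisfied
                unknown = [l for l, v in zip(clause, values) if v is None]
                if not unknown:
                    return None                       # every literal false
                if len(unknown) == 1:                 # unit clause: value forced
                    lit = unknown[0]
                    assignment[abs(lit)] = lit > 0
                    changed = True
        return assignment

    # (A OR NOT B) AND (NOT A OR C) AND (B OR NOT C), assuming A is true:
    print(propagate([[1, -2], [-1, 3], [2, -3]], {1: True}))
    # -> {1: True, 3: True, 2: True}, i.e. C and then B are forced, as in the text.
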
As soon as we move on to 3-SAT, things change markedly. Consider now the
example
(A ∨ ¬B ∨ C) ∧ (¬A ∨ B ∨ C) ∧ (¬A ∨ ¬C ∨ D)
Assuming that A is true still satisfies the first subclause, but it leaves us with
a 2-SAT problem to solve as far as the remaining clauses are concerned. In this
case, this is not a major problem, but it would have been if not all the subclauses
had happened to involve A: we would then have had to make a list of all the
solutions to the 2-SAT problem, trying them one-by-one as partial solutions to
the remainder of the subclauses. If we happen to hit a problem where the 2-SAT
subproblems have large numbers of solutions, we potentially have a lot of work
to do, thanks to this additional branching factor. As a result of this additional
complexity, 3-SAT turns out to be NP-complete.
After 3-SAT, nothing very much changes: all k-SAT problems with k > 2 are NP-complete. The easiest way to see why this should be so is to note that all other
SAT problems can be converted into examples of 3-SAT. For example, the 4-SAT
clause A ∨ B ∨ C ∨ D can be rewritten as a pair of 3-SAT clauses by introducing
a new variable, E: satisfying both A ∨ B ∨ E and C ∨ D ∨ ¬E imposes the same
restrictions on A, B, C and D as the original clause. In this process, we have
made the problem bigger, both in terms of the number of clauses and the number
of variables, but only by a polynomial factor.
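This clause-splitting construction is entirely mechanical; a sketch of one way to perform it, with the same signed-integer encoding and with fresh auxiliary variables numbered from next_aux upwards (the numbering convention is our own choice):

    def split_clause(clause, next_aux):
        """Replace one long clause by an equivalent chain of 3-literal clauses,
        introducing fresh auxiliary variables.  Returns (clauses, next unused var)."""
        if len(clause) <= 3:
            return [list(clause)], next_aux
        out = [clause[:2] + [next_aux]]              # (l1 OR l2 OR y1)
        prev, rest = next_aux, clause[2:]
        next_aux += 1
        while len(rest) > 2:
            out.append([-prev, rest[0], next_aux])   # (NOT y_i OR l OR y_{i+1})
            prev, rest = next_aux, rest[1:]
            next_aux += 1
        out.append([-prev] + rest)                   # (NOT y_last OR l_{k-1} OR l_k)
        return out, next_aux

    # The 4-SAT clause A OR B OR C OR D, with E introduced as variable 5:
    print(split_clause([1, 2, 3, 4], 5))
    # -> ([[1, 2, 5], [-5, 3, 4]], 6), i.e. (A OR B OR E) AND (NOT E OR C OR D)
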
A curious feature of this problem is that we could also have chosen to write the
formula in DNF, which is a disjunction of conjunctions rather than a conjunction
of disjunctions. (In other words, the variables are conjoined to make up the subclauses, and the subclauses are disjoined to make up the formula.) For example,
the 2-SAT problem considered above can also be written as
(A ∧ B ∧ C) ∨ (¬A ∧ ¬B ∧ ¬C)
In this form, the problem suddenly becomes very simple: the solutions are essentially written out in the statement of the formula. The hard part, then, lies in the conversion between CNF and DNF. This is a general feature of NP-complete problems: there is always a way of stating the problem such that the
answer is straightforward, even obvious, but finding it is hard. Another example
of this is integer factorization: if we happen to write the number to be factored
using a number base equal to one of the factors, it becomes trivial.
Another useful formulation of SAT problems is to use binary-valued variables in
place of the Boolean ones. Thus, for example, satisfying A ∨ B ∨ C is equivalent to solving a + b + c > 0, if we map a = 0 to A = F, etc. This shows that satisfiability
problems are essentially just optimization problems in disguise, particularly if we
take an unsatisfiable formula and ask for a variable assignment satisfying as many
clauses as possible, known as the MAX-SAT problem.
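The MAX-SAT objective is therefore just a count of satisfied clauses; a minimal sketch of the scoring function that the methods of Section 1.3 all rely on, at least implicitly (clauses encoded as before, assignments as a dictionary from variable number to truth value):

    def num_satisfied(clauses, assignment):
        """Count the clauses containing at least one true literal."""
        def lit_true(lit):
            value = assignment[abs(lit)]
            return value if lit > 0 else not value
        return sum(any(lit_true(l) for l in clause) for clause in clauses)

    # A OR B OR C, viewed as the inequality a + b + c > 0:
    print(num_satisfied([[1, 2, 3]], {1: False, 2: False, 3: True}))   # 1
    print(num_satisfied([[1, 2, 3]], {1: False, 2: False, 3: False}))  # 0
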
Several variations on the basic satisfiability problem have also been considered.
Two are particularly interesting for us: XOR-SAT (where the OR operators are
replaced by XOR) and 1-in-k-SAT (where exactly one term in each subclause must
be satisfied). Written in terms of binary-valued variables, these both yield a series
of equations, rather than inequalities. For example, satisfying A ∨ B ∨ C as part
of a 1-in-3-SAT problem requires us to solve a + b + c = 1. As part of an XOR-
SAT problem, this would also be true, but only if all arithmetic is done modulo
2. (The formula A XOR B XOR C can be satisfied either by A = B = C = T ,
or by making one variable true and the other two false; both alternatives give
a + b + c = 1 when arithmetic is done modulo two.)
Since 1-in-k-SAT and XOR-SAT can both be written as a system of linear equations, it might appear that both are simpler than k-SAT, but this is not quite
true. If there are enough clauses for the system of equations to have a unique
solution, then it is true, but matters change when the number of subclauses is
smaller than this. In this case, several of the variables are left to act as parameters in the definition of the rest. For XOR-SAT, the fact that arithmetic is
done modulo two allows these variables to be set arbitrarily – yielding an easy
problem – but the situation for 1-in-k-SAT is much more flexible. Here, there is
nothing to force a solution of the new problem to map onto a solution of the old:
a + b + c = 1 can be satisfied by a = 3, b = −1, c = −1, for example. In the worst
case, we might have to try all possible allocations of the parameter variables to
find one that gives legal values to all the others [2]. As a result, 1-in-k-SAT can still be NP-complete, though it will still be much easier to solve than a k-SAT problem of the same size.

[2] This is a special case of the problem of solving a system of diophantine equations, for which it is known that no general algorithm exists. While that does not mean that no algorithm can exist for this special case, it does indicate that we have strayed into difficult territory.
In the case of 3-SAT, it might appear that the need to resort to some variant of
brute force search is a consequence of our lack of deductive skill: the hardest cases
involve a large number of constraints, so it looks like we should be able to apply
logical deduction to solve the problem. We can indeed do so, but uncovering the
correct sequence of steps amounts to conversion of the problem to DNF, another
hard problem. For 1-in-3-SAT, by contrast, the approach outlined above seems
to exhaust the constraints and leave us with a purely random problem.
Finally, the obvious measure of how close we are to a solution is the number of clauses that are satisfied. However, this can be a useless measure, as the following toy example indicates.
(A ∨ B) ∧ (A ∨ ¬B) ∧ (A ∨ ¬C) ∧ (¬A ∨ B) ∧ (B ∨ ¬C)
This example has two solutions: A = B = C = T and A = B = T , C = F .
Imagine, though, that we start from the guess that A = B = F , C = T , which
only satisfies two of the five clauses. If we flip the value of one of the variables,
we find two options which satisfy three clauses – A = F , B = C = T and
A = C = T , B = F – and one which satisfies four clauses – A = B = C = F .
The first two options would lead us to within one flip of a solution, but the last
satisfies more clauses. Accordingly, “hill climbing” using the number of satisfied
clauses as a guide would lead us in the wrong direction. In fact, A = B = C = F
is a local minimum; if we only accept flips that reduce the number of unsatisfied
clauses, there is no way out. (If we only require that the number of unsatisfied
clauses does not go up, however, we can reach a solution by flipping A, then B.)
The only reliable guide of how close we are to a solution is the minimum number
of flips required to reach a solution, but just finding this value for one variable
allocation is as hard as solving the whole problem. On the other hand, this kind
of hill-climbing can lead us rapidly to an approximate solution of a MAX-SAT
problem; we just cannot expect to be anywhere near the exact solution.
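The claims made about this toy example are easily verified by exhaustive enumeration; the following sketch scores every assignment and every single-flip neighbour of the starting guess:

    from itertools import product

    # (A OR B) AND (A OR NOT B) AND (A OR NOT C) AND (NOT A OR B) AND (B OR NOT C)
    clauses = [[1, 2], [1, -2], [1, -3], [-1, 2], [2, -3]]

    def score(bits):
        """Number of satisfied clauses; bits = (A, B, C)."""
        def lit_true(lit):
            value = bits[abs(lit) - 1]
            return value if lit > 0 else not value
        return sum(any(lit_true(l) for l in c) for c in clauses)

    for bits in product([False, True], repeat=3):
        if score(bits) == len(clauses):
            print("solution:", bits)          # (T, T, F) and (T, T, T)

    start = (False, False, True)              # A = B = F, C = T: scores 2
    print("start scores", score(start))
    for i, name in enumerate("ABC"):
        flipped = tuple(not v if j == i else v for j, v in enumerate(start))
        print("flip", name, "->", score(flipped))
    # Flipping A or B scores 3; flipping C scores 4 but lands on the
    # local minimum A = B = C = F discussed above.
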
1.2.1 Search space structure
Given that the number of satisfied clauses is an unreliable measure of how close
we are to a solution, it is important to gain some degree of understanding about
the structure of the search space of a typical satisfiability problem.
The most natural approach to brute-force search for this type of problem is to
allocate values to some of the variables (typically the most-constrained first),
make as many deductions as possible from them, allocate values to more variables,
and repeat until either a contradiction is derived or values have been found for all variables. (This
is known as the Davis-Putnam procedure; see [57] for further details.) An obvious
measure of how useful this method will be is the number of clauses in the problem
relative to the number of variables, and indeed the average difficulty of a SAT
instance can be directly related [55] to the ratio between the number of clauses
and the number of variables, which we shall call α. (For a more recent review,
see [56].)
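For concreteness, here is a much-simplified sketch of a search-and-backtrack solver in the spirit of the procedure just described (assign a variable, propagate the forced deductions, backtrack on contradiction); the variable-ordering heuristics that practical implementations depend on are omitted, and the clause encoding is the signed-integer one used earlier:

    def lit_value(lit, a):
        v = a.get(abs(lit))
        return None if v is None else (v if lit > 0 else not v)

    def dp_solve(clauses, a=None):
        """Return a satisfying assignment (dict: variable -> bool) or None."""
        a = dict(a or {})
        while True:                                  # unit propagation
            unit = None
            for clause in clauses:
                values = [lit_value(l, a) for l in clause]
                if True in values:
                    continue
                unknown = [l for l, v in zip(clause, values) if v is None]
                if not unknown:
                    return None                      # clause falsified: backtrack
                if len(unknown) == 1:
                    unit = unknown[0]
                    break
            if unit is None:
                break
            a[abs(unit)] = unit > 0
        free = {abs(l) for c in clauses for l in c} - set(a)
        if not free:
            return a
        var = min(free)                              # naive choice of branching variable
        for guess in (True, False):
            result = dp_solve(clauses, {**a, var: guess})
            if result is not None:
                return result
        return None

    # The 3-SAT example from Section 1.2:
    print(dp_solve([[1, -2, 3], [-1, 2, 3], [-1, -3, 4]]))
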
This result comes from an analysis of the problem in terms of statistical mechanics (more usually used to study systems of many identical interacting physical
particles, such as magnetic materials or gases), which shows that there is a “phase
transition” at this point. In a physical system, this would correspond to a change
from e.g. liquid to solid, or paramagnetic to ferromagnetic; here, it corresponds
to a change from satisfiable (on the average) to unsatisfiable. Figure 1.2 shows
the change in average computation time as α increases, using the Davis-Putnam
procedure. Not only is there a sharp peak at the phase transition, but this peak
becomes sharper as the problem size increases. This peak, in fact, is exponentially large, reflecting the fact that such methods cannot do very much better
than brute-force search of all 2^N possible variable allocations.
Phase transitions are characterised by a sharp peak in a parameter associated
with the system, known as the order parameter; here, this parameter is the average time required to find a solution. Typically, the transition happens because
interactions between the particles give rise to correlations; as the transition point
is approached, these become stronger, winning out over random noise (such as
thermal fluctuations) and leading to ever-larger correlated regions. Eventually,
the correlations become the most important feature of the system and the regions
coalesce into one. The order parameter is usually a measure of the typical correlation length; at the transition point, this increases to encompass the whole system.

Figure 1.2: For random 3-SAT problems, the time taken to find a solution (here measured by the number of algorithm calls) peaks at α ≈ 4.25, and the peak becomes increasingly marked as the number of variables rises. (Adapted from [47] by Bart Selman.)
On the other side of the transition point, the correlations become so strong that
the system divides up into separate regions. Each region is so strongly correlated that it is impossible for neighbouring regions to force them to match their
correlation. As the order parameter increases, these regions become smaller and
smaller.
The crucial ratio αc , as we can see from Figure 1.2, is about 4.25 for 3-SAT,
corresponding to each variable appearing about 13 times. Above this point,
it becomes likely that small clusters of clauses will be mutually inconsistent,
allowing the problem’s unsatisfiability to be easily shown; below it, the problem
Figure 1.3: Probability that a random 3-SAT problem will be satisfiable, as a function
of α. Note the sharp drop centered around α ≈ 4.25. (Adapted from [47] by Bart
Selman.)
is only loosely connected, and can be easily factorized. This is confirmed by
Figure 1.3, which shows a sharp transition at αc . From this point of view, the
reason why 2-SAT is easier is that even the hardest examples never have enough
“elbow room” to allow the search space to increase exponentially.
The structure of the set of solutions to a typical problem goes through several
distinct phases as α increases. Figure 1.4 shows that, below α ≈ 3.8, the most
numerous local minima are actually complete solutions, but that this becomes
increasingly less likely as α increases. Between α ≈ 3.8 and αc , the problem is
satisfiable, but finding a satisfiable minimum is harder. In [52], the phases are
given as:
• α < αd ≈ 3.86, where the set of all solutions is connected, i.e. there is a path of mutually adjacent solutions that joins any two solutions of the set;
• αd < α < αb ≈ 3.92, where the set of all solutions breaks up into a number
of disconnected clusters;
• αb < α < αc where, for each cluster, there are literals which take the same value for all solutions in the cluster; these are known as the backbone of the cluster.

Figure 1.4: Phase diagram for 3-SAT. The variables plotted are: Σ, defined as log(no. of satisfiable states)/N; e_0, the minimum number of violated clauses per variable; and e_th, the typical energy (per variable) of the most numerous local minima. (An energy of zero implies that all clauses are satisfied.) Taken from [54].
As we approach αc , not only do the solution clusters get smaller and fewer in
number (making them harder to find), but this results in an increase in the
number of pseudo-solution clusters (i.e. clusters associated to a local, rather than
a global, minimum). Essentially, the allocation space divides up into an increasing
number of clusters, but fewer and fewer of them correspond to global minima.
In the worst case (at αc ) the total number of clusters is exponential in the size
of the problem, but satisfiable problems typically only have one or two solutions;
we wind up looking for a needle in a haystack.
1.3 Solution methods for the satisfiability problem
Apart from deterministic search-and-backtrack algorithms such as Davis-Putnam
(which are impractical for problems of any useful size), the practical search methods that have been found for the satisfiability problem can be divided into three
broad classes: local search, genetic/evolutionary methods and message passing
methods.
1.3.1 Local search methods
Despite our comments above, the most popular local search solution methods
for the satisfiability problem invariably use the number of satisfied clauses as a
guide to the solution process. The problem is that no other guides are available,
and methods which avoid its use are reduced to essentially bouncing around in
the search space until they hit a solution; not a very efficient approach for large
problems.
As mentioned above, solution methods can use the structure of the problem to
avoid having to search the entire space of allocations, but the best methods we
know of (see [41], [42]) still have a worst-case bound of about 1.5^N for a problem
of size N . The method of [42] is particularly interesting: it involves picking a
set of points in the allocation space and exhaustively searching the area within a
given number of flips for a solution. The set of start points are chosen to form a
covering code, i.e. such that all points in the allocation space should fall within
the search area of one of the start points. In this way, the algorithm is guaranteed
to find the solution, if one exists. By relaxing the constraint that the code must
be a perfect covering code, Schöning [43] also found a very reliable – though not
perfect – algorithm with a worst-case bound of (4/3)^N. Unfortunately, however, even this bound means that a problem with 400 variables has a search space of about 10^50, which is clearly impractical.
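Schöning’s algorithm itself is strikingly simple to state: pick a random assignment, then repeatedly choose an unsatisfied clause and flip one of its variables at random, restarting after about 3N flips. A sketch under those assumptions (with an arbitrary cap on the number of restarts, and the same clause encoding as before):

    import random

    def schoening(clauses, n_vars, max_restarts=1000):
        """Random-walk search in the style of Schöning's algorithm."""
        def unsatisfied(a):
            return [c for c in clauses
                    if not any((a[abs(l)] if l > 0 else not a[abs(l)]) for l in c)]
        for _ in range(max_restarts):
            a = {v: random.random() < 0.5 for v in range(1, n_vars + 1)}
            for _ in range(3 * n_vars):
                bad = unsatisfied(a)
                if not bad:
                    return a                          # satisfying assignment found
                lit = random.choice(random.choice(bad))
                a[abs(lit)] = not a[abs(lit)]         # flip a variable from a broken clause
        return None

    # The 3-SAT example from Section 1.2:
    print(schoening([[1, -2, 3], [-1, 2, 3], [-1, -3, 4]], 4))
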
By contrast, probabilistic local search methods can work very well, provided that
they take advantage of the number of satisfied clauses as a measure of success.
Without doing this, the only ways to pick the next variable to flip are either to
select it randomly, or to pick an unsatisfied clause at random and then select one
of the variables involved in it. The first choice is clearly hopeless; the second turns
out to work surprisingly well for 2-SAT [44] – finding a satisfying assignment in
O(N^2) – but does not work well for 3-SAT above αd.
The next-simplest method, known as GSAT, always flips so as to reduce the number of unsatisfied clauses as far as possible. Clearly, this will almost invariably get
stuck in a local minimum, but, by restarting the search every time this happens,
the algorithm still outperforms deterministic search-and-backtrack methods [46].
Allowing the algorithm to make “sideways” (i.e. non-improving) moves if nothing
better is available improves its performance substantially [46].
An alternative means of preventing this type of method getting stuck is provided
by our first physically-motivated method: simulated annealing. In this method,
we select a variable at random, and calculate the net change in the number of unsatisfied clauses, δ. We then accept the flip with a probability proportional to e^(−δ/T), where T is a notional “temperature”. This is motivated by the behaviour
of magnetic materials, of which we will see more later. Some variants of the
method [48] modify this by accepting the flip with certainty if δ < 0, but this is
not physically motivated, and makes relatively little difference.
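A sketch of the annealing move just described, with δ taken as the change in the number of unsatisfied clauses and the usual Metropolis acceptance rule; the temperature is held fixed here, whereas a real schedule would lower T as the search proceeds:

    import math
    import random

    def unsat_count(clauses, a):
        return sum(not any((a[abs(l)] if l > 0 else not a[abs(l)]) for l in c)
                   for c in clauses)

    def anneal_step(clauses, a, T):
        """Propose flipping one random variable; accept with probability
        min(1, exp(-delta / T)), where delta is the change in the number
        of unsatisfied clauses."""
        var = random.choice(list(a))
        before = unsat_count(clauses, a)
        a[var] = not a[var]
        delta = unsat_count(clauses, a) - before
        if delta > 0 and random.random() >= math.exp(-delta / T):
            a[var] = not a[var]                       # reject: undo the flip
        return a

    clauses = [[1, -2, 3], [-1, 2, 3], [-1, -3, 4]]
    a = {1: True, 2: False, 3: True, 4: False}        # one clause unsatisfied initially
    for _ in range(200):
        a = anneal_step(clauses, a, T=0.5)
    print(a, unsat_count(clauses, a))
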
Interestingly, simulated annealing is outperformed by a method called WalkSAT [45]. WalkSAT improves on GSAT by applying GSAT with probability
p, and picking a random variable in a random unsatisfied clause with probability
1 − p. (The best choice for p turned out to be about 0.5–0.6.) The difference is
quite significant: in [45], simulated annealing was “only” able to solve problems
with up to about 600 variables, whereas WalkSAT was able to manage 2000 variables. Simulated annealing was, however, better than the basic GSAT method,
or one which applied GSAT with probability p and picked a random variable with
probability 1 − p. This shows that deliberately selecting the direction of steepest
descent is no better – in fact, slightly worse – than simply trying for some degree
of descent. The success of WalkSAT can be attributed to a judicious combination
of these two types of descent. Recent surveys (see e.g. [49]) show that WalkSAT,
while not the best possible algorithm, still performs well.
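A compact sketch of the WalkSAT mixing rule as described above: with probability p make the GSAT-style greedy flip, otherwise flip a random variable from a random unsatisfied clause. The refinements of the published solver (break counts, noise tuning, restarts) are left out.

    import random

    def walksat(clauses, n_vars, p=0.55, max_flips=100000):
        """Local search mixing greedy (GSAT-style) and random-walk flips."""
        def unsat(a):
            return [c for c in clauses
                    if not any((a[abs(l)] if l > 0 else not a[abs(l)]) for l in c)]
        a = {v: random.random() < 0.5 for v in range(1, n_vars + 1)}
        for _ in range(max_flips):
            bad = unsat(a)
            if not bad:
                return a                              # all clauses satisfied
            if random.random() < p:
                # greedy: flip whichever variable leaves fewest unsatisfied clauses
                def cost_after_flip(v):
                    a[v] = not a[v]
                    cost = len(unsat(a))
                    a[v] = not a[v]
                    return cost
                var = min(range(1, n_vars + 1), key=cost_after_flip)
            else:
                var = abs(random.choice(random.choice(bad)))
            a[var] = not a[var]
        return None

    print(walksat([[1, -2, 3], [-1, 2, 3], [-1, -3, 4]], 4))
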
1.3.2 Genetic/evolutionary methods
The idea behind genetic algorithms is to mimic the natural selection process,
whereby “fit” individuals survive and propagate, while unfit individuals die off.
This “selection pressure” makes the species as a whole evolve to better fit its
environment. In genetic algorithms, the “individuals” are candidate solutions to
a problem, and fitness is determined according to how well they solve the problem.
In terms of the satisfiability problem, the individuals can be straightforwardly
represented by a binary string, whose ith digit represents the value of the ith
variable. Offspring can be formed by crossover (forming a new individual from the
initial part of one “parent” and the latter part of another, with the crossover point
determined randomly), and individuals can be mutated by randomly flipping bits
with some probability. The problem, of course, is that the only way of measuring
fitness is by reference to the number of satisfied clauses.
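A sketch of the genetic operators on bit-string individuals, with fitness given, as noted, by the number of satisfied clauses; selection, population management and termination are left out:

    import random

    def fitness(bits, clauses):
        """Number of satisfied clauses; bits[i] is the value of variable i + 1."""
        def lit_true(lit):
            value = bits[abs(lit) - 1]
            return value if lit > 0 else not value
        return sum(any(lit_true(l) for l in c) for c in clauses)

    def crossover(mum, dad):
        """Single-point crossover at a random position."""
        point = random.randrange(1, len(mum))
        return mum[:point] + dad[point:]

    def mutate(bits, rate=0.05):
        """Flip each bit independently with the given probability."""
        return [not b if random.random() < rate else b for b in bits]

    clauses = [[1, -2, 3], [-1, 2, 3], [-1, -3, 4]]
    population = [[random.random() < 0.5 for _ in range(4)] for _ in range(6)]
    child = mutate(crossover(population[0], population[1]))
    print(child, fitness(child, clauses))
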
Genetic algorithms often work best when combined with local search, and indeed
Folino et al. [50] found a combination with WalkSAT to be beneficial. Rather than
mutating randomly, the individuals were mutated to the local maximum allocation as determined by WalkSAT. Applied to a population of 320, this procedure
halved the number of iterations required as compared to 320 parallel WalkSAT
instances.
1.3.3 Message passing methods
Message passing methods view satisfiability problems as graphical models. These
models have two sets of nodes: one set represents the variables, and the other
represents the constraints. Edges connect the variable nodes to the constraint
nodes in which they are featured. Finally, we associate conditional probabilities
with each of the nodes: probabilities that the clauses are true, given the possible
allocations of their variables, and probabilities that the variables are true, given
the possible states – satisfied or unsatisfied – of the clauses in which they feature.
Clearly, accurate knowledge of this last set of probabilities can be used to extract
satisfying allocations by asking for the variable probabilities associated with all
the clauses being true.
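The graphical model itself is just a bipartite adjacency structure between variable nodes and clause nodes; a sketch of its construction (the belief propagation update equations themselves are not attempted here):

    from collections import defaultdict

    def build_graph(clauses):
        """Bipartite graph: clause index -> variables it mentions, and
        variable -> indices of the clauses in which it appears."""
        clause_to_vars = {i: sorted({abs(l) for l in c}) for i, c in enumerate(clauses)}
        var_to_clauses = defaultdict(list)
        for i, variables in clause_to_vars.items():
            for v in variables:
                var_to_clauses[v].append(i)
        return clause_to_vars, dict(var_to_clauses)

    # The 3-SAT example from Section 1.2:
    print(build_graph([[1, -2, 3], [-1, 2, 3], [-1, -3, 4]]))
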
It is known that, for graphs that are tree-like (i.e. have no cycles), a procedure
called belief propagation can be used to repeatedly update the values of the probabilities for a node from the values associated with all its neighbouring nodes in
such a way that the probabilities converge on their correct values (see e.g. [77]
for further details). Convergence occurs regardless of the initial estimates used.
Unfortunately, the graphs that naturally occur in satisfiability problems turn out
not to be tree-like, so belief propagation cannot strictly be applied. However,
provided that the graph is approximately tree-like (i.e. has only long cycles), it
can still work effectively.
In the form we have described, belief propagation works well in both the heavilyconstrained and lightly-constrained regions, but fails badly as the problem instances grow harder [51] and as the size of the problem grows. This shows that
the assumption that the loops are long is invalid in this region, as we would
expect; the more “tangled” the problem is, the more chance there is that short
loops will be formed.
These results do not spell the death knell for message passing, however; they
simply mean that a more cunning means needs to be found to convert the problem
into a graphical model. An approach that is under active investigation is variously
called the cavity method and the survey method; see [54] and [52] for further
details. This is intimately related to a formulation of the satisfiability problem
as an Ising model, but regrettably we are unable to go into it further here.
1.4 Summary
So far, we have seen just how tricky NP-complete problems can be: they are characterised by a large, complicated-looking solution space with no reliable compass.
We have also seen, however, that there is only a narrow range in which these problems are genuinely hard; outwith this range, even pretty simple and naïve methods
work well. The hardest NP-complete problems, by contrast, have exponentially
many local minima in which to trap the unwary and are too tightly connected for
simple belief propagation to be effective. This, combined with their apparently
random statistical properties, explains why, for all their apparent structure, it
has been so hard to beat simple brute-force search for these problems.
For us, this is pretty strong intuitive evidence for assuming that NP-complete
problems are indeed more complicated than problems in P. Potentially, a brilliant
insight could show that a polynomial-time algorithm is possible, but we will follow
our common sense and assume that it is not. That does not mean, however, that
we cannot hope to find more effective tools from which to build algorithms; before
this thesis is finished, we will have shown that, at least for some problems, such
tools do exist.
Chapter 2
On the complexity of physical systems
Physical systems can give rise to complex and unpredictable behaviour, as weather
forecasters are painfully aware; the question is whether all this complexity can
be put to use, rather than just stopping play. In particular, is it possible to
design a physical system so that, started off from the right initial state, it solves
a hard problem simply by virtue of its evolution? Even conventional computers
are based on physical systems which naturally do something useful, albeit just
computing the logical AND of two binary inputs. Before we discuss this question
in earnest, however, it is perhaps useful to consider an example of where this line
of thought can lead.
2.1 The neon route-finder
The inspiration for this project came from a beautifully simple device, devised
by Manz et al. [21], which uses the fact that electricity follows the path of least
resistance to solve route-finding problems. A streetmap of London was etched into
a small sliver of glass, and then a second piece was fixed over the top to create
a network of tiny tubes. These tubes were then filled with neon gas and sealed.
When a voltage is applied between any two points on the map, the shortest route
between those points lights up like a tiny neon strip light; the route it found
between Imperial College and Admiralty Gate is shown in Figure 2.1.
Figure 2.1: The shortest route between Imperial College and Admiralty Gate, as found
by the neon route-finder. (Reproduced from the Imperial College Reporter.)
Our feeling, in commencing this project, was that this almost magical ability to
solve minimization problems could be put to use in solving harder problems (or at
least provide very rapid solutions for easy problems). However, we came to realise
that, while Manz’s device will always find a plausible route, it will not necessarily
find the optimal route. To see why, consider the circuit shown in Figure 2.2. As
with Manz’s neon tubes, this circuit uses resistance as a measure of distance; as
such, its behaviour should be analogous to that of the neon tubes. The “shortest
route” through this circuit is to take the leftmost branch at each junction; the
problem is that this is not the path that turns out to carry the greatest current.
The net resistance of a set of resistors connected in parallel is given by Kirchhoff’s
law, namely
1/R_net = 1/R_1 + 1/R_2 + 1/R_3

for resistances R_1, R_2 and R_3. Using this, the net resistance of the leftmost of the three main branches of the circuit is

2 + 1/(1/3 + 1/30 + 1/30) = 4½ Ω,
Figure 2.2: Example “route-finding” circuit
as compared to the rightmost branch’s total of
3 + 1/(1/4 + 1/4 + 1/4) = 4⅓ Ω.
This means that, if we test the current flowing through the branches at the upper
branching point, we will find more flowing through the rightmost branch than
the leftmost. On the other hand, we will still find the greatest current flowing
out of the shorter path at the bottom of the tree. Even this, however, will not
always be the case. If we imagine “disentangling” a multiply-branching circuit
into a circuit with only one branch, but many leaves (one for each path through
the original circuit), we can ask what the equivalent resistance of each path is.
For the branch shown in Figure 2.3, a little calculation shows that
R = R_1 + R_11 + R_11 R_1 (1/R_12 + 1/R_13).
Without the final term on the RHS, this equation would have given us what we
wanted: the greatest current leaving a tree-like circuit would flow through the
exit which corresponded to the path with the lowest total resistance. However,
the final term allows this to be skewed by the rest of the circuit, to an extent
that will get progressively worse as the number of branching points increases.
For example, always taking the rightmost branch gives a path with an effective
resistance of 13 Ω, for a “length” of 7 Ω. Taking the centre branch and then the
leftmost branch, by contrast, gives an effective resistance of 11.15 Ω for a length
of 8 Ω. Thus the second path, though longer, will carry more current.
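The arithmetic above is easy to check numerically; the following sketch reproduces the two branch resistances and encodes the path formula quoted above (only the resistor values used in the worked example appear; the full circuit of Figure 2.2 is not reconstructed here):

    def parallel(*resistances):
        """Equivalent resistance of resistors connected in parallel."""
        return 1.0 / sum(1.0 / r for r in resistances)

    # Net resistance of the leftmost and rightmost main branches of Figure 2.2:
    print(2 + parallel(3, 30, 30))   # 4.5 ohms
    print(3 + parallel(4, 4, 4))     # 4.333... ohms

    def path_resistance(r1, r11, r12, r13):
        """Effective resistance of the path through r11 in the branch of
        Figure 2.3: R = r1 + r11 + r11*r1*(1/r12 + 1/r13)."""
        return r1 + r11 + r11 * r1 * (1.0 / r12 + 1.0 / r13)
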
Figure 2.3: A circuit with multiple branches can be considered as a circuit with
only one branch, but many leaves. To do this, we need to calculate the equivalent
resistance of each path through the multiply-branching circuit
In the interests of fairness, it must be said that the odds of circuits such as these
finding a reasonable path are still pretty high: we had to work quite hard in
Figure 2.2 to create a situation in which it would fail, and it nevertheless found
the second-shortest route. The dominant terms in calculating the resistance do
come from the shortest paths, so we would only expect things to go badly wrong
under quite unusual circumstances, and devices such as Professor Manz’s are still
likely to be practically useful. (In fact, they will have the useful property of erring
in favour of paths with several reasonable routes nearby – as in our example –
giving drivers some alternatives in case of mishaps.) That said, however, they do
not provide a “proper” solution, in that they cannot cope with all eventualities.
An interesting side-note is that the effectiveness of this type of mechanism would
have been greatly aided if Kirchhoff’s law had given greater weight to small
resistances; for example, a law such as
1/R_net^n = 1/R_1^n + 1/R_2^n + 1/R_3^n
with n > 1 would have made the circuit select the lowest-resistance path more
aggressively. Since Kirchhoff’s law is just a consequence of Ohm’s law (V = IR
for current I and voltage V ) and current conservation, such a circuit could be
fashioned out of materials offering non-Ohmic conductivity properties. (For the
above, we would be looking for substances which obeyed V = IR^n.) Unfortunately, we are not currently aware of any such substances.
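A quick numerical illustration of why a larger exponent would help: if each element obeyed the hypothetical law V = IR^n, the current through a parallel path would scale as 1/R^n, so increasing n concentrates the flow onto the lowest-resistance path.

    def shares(resistances, n):
        """Fraction of the flow carried by each parallel path when the
        current through a path scales as 1/R^n (n = 1 is the Ohmic case)."""
        weights = [1.0 / r ** n for r in resistances]
        total = sum(weights)
        return [w / total for w in weights]

    for n in (1, 3, 10, 30):
        print(n, [round(s, 3) for s in shares([4.5, 5.0, 6.0], n)])
    # As n grows, the share taken by the 4.5-ohm path climbs towards 1.
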
2.1.1 Discussion
The neon route finder shows that clichés like “electricity always follows the path
of least resistance” need to be treated with caution. On the other hand, it also
shows that the behaviour of apparently complex systems can often be described
quite simply.
At the lowest level, the route-finder and its electrical analogue both involve the flow
of a large number of electrons, but we did not need to worry about their individual
behaviour because their relevant bulk properties were adequately described by the
concept of a “flow” of current meeting a certain amount of resistance. In fact,
the route finder could equally well have been built using pipes with water flowing
through them, the whole device being tilted to add a gravitational potential,
rather than an electrical one.
The trouble with this type of description is that it relies on there always being
enough particles at any given point to approximate a continuous flow. For more
complicated problems, this can potentially lead us into trouble: an exponential
number of possibilities could require an exponential number of particles.
Imagine, for a moment, that we have built a version of the route finder using pipes
and ball bearings. At each junction, we have arranged for there to be an equal
probability of the ball bouncing into each of the exit pipes, making the whole
device a bit like a pinball machine. If we sent balls through the device one at a
time, the time they took to come out the other side would be a measure of how
far they had travelled. By sending enough balls through, we would eventually
have a pretty good estimate of the shortest path, but this would require O(b^n)
balls if the path involved n junctions with an average of b exits each. In other
words, we would have an exponential algorithm.
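A toy simulation makes the point; the branching factor, depth and pipe lengths below are invented purely for illustration:

import random

def send_ball(depth, lengths):
    """One ball: take a uniformly random exit at each of `depth` junctions."""
    return sum(random.choice(lengths[level]) for level in range(depth))

random.seed(0)
depth, exits = 6, 3                 # n junctions with b exits each (invented)
lengths = [[random.uniform(1, 10) for _ in range(exits)] for _ in range(depth)]
true_shortest = sum(min(level) for level in lengths)

for n_balls in (10, 100, 10_000):
    best = min(send_ball(depth, lengths) for _ in range(n_balls))
    print(f"{n_balls:6d} balls: best = {best:.2f} (true shortest = {true_shortest:.2f})")
# On average, of the order of exits**depth balls are needed to hit the true
# shortest path, which is exactly the exponential cost described above.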
If we decided, instead, to send a continuous series of balls through the device,
then something quite interesting would happen: the longer paths would start to
get clogged with balls, making fresh balls more likely to take the unclogged paths.
In this way, as with the electrical and water-based versions, we would start to
find the greatest flow along the shorter paths. Putting many balls through at
once would give the device an elementary “memory” of what the best paths were
(no balls mean good paths, if you like). However, this can only start to happen
when the rate that the balls are sent through the device is great enough for it
to start clogging up; in short, there is power in numbers. This fact has recently
been realised by designers of so-called “ant colony” algorithms, where a large
number of virtual ants explore a representation of the problem, laying down a
virtual pheromone as they go. By probabilistically choosing the paths with the
most pheromone, they come to follow the shortest path. This is, in a sense, the
inverse of what happens in the route finder: the level of pheromone is a measure
of how good a path is, whereas the level of clogging reflects how bad it is. The
end result, though, is fairly similar.
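A minimal sketch (ours, with invented parameters, and not any particular published ant colony algorithm) shows the positive feedback at work on a choice between two paths:

import random

random.seed(1)
lengths = {"short": 1.0, "long": 3.0}          # invented path lengths
pheromone = {"short": 1.0, "long": 1.0}
evaporation = 0.1

for step in range(200):
    total = sum(pheromone.values())
    # Each ant picks a path with probability proportional to its pheromone level.
    path = "short" if random.random() * total < pheromone["short"] else "long"
    for p in pheromone:                         # evaporation
        pheromone[p] *= (1 - evaporation)
    pheromone[path] += 1.0 / lengths[path]      # deposit more on shorter paths

print(pheromone)   # the short path typically ends up with nearly all the pheromone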
2.2 General properties of physical systems
Siegelmann [34] identified three main properties that distinguish analog from
digital models:
1. Analog computational models are defined on a continuous phase space (e.g.
where the variables x may assume analog values), while the phase space of
a digital model is inherently discrete.
2. Physical dynamics are characterized by the existence of real constants that
influence the macroscopic behaviour of the system. In contrast, in digital
computation all constants are in principle accessible to the programmer.
3. The motion generated by a physical system is “locally continuous” in the
dynamics. That is, unlike the flow in digital computation, statements of
the following forms are not allowed in the analog setup: “tests for 0” or
“if x > 0 then compute one thing and if x < 0 then continue in another
computation path.”
We are particularly interested in the first and last of Siegelmann’s properties here;
we find it hard to see how her real constants (such as the gravitational constant
or the charge on an electron) can do more than act as uniform scaling factors.
The effects of continuity and continuous dynamics are much more important,
however, as we shall see in the next two sections.
2.3 Analogue vs. digital
From the analysis of the neon route-finder, it is easy to imagine that continuity is
a good thing, and that systems which can at least approximately be thought of as
continuous flows are what we want to use as a basis for a computer. The physical
world seems analogue, not digital, so the choice to build digital computers seems
increasingly perverse; this, however, is not the whole story.
Continuous signals, manipulated using continuously varying devices, might at
first appear to have almost unlimited computational power. In the same way
that any number, however large, can be represented as a real number between
zero and one (by encoding it as a decimal expansion), an analogue signal can, in
principle, carry limitless amounts of information. In addition, by adding several
different signals together to form one composite signal (much in the way that
several telephone conversations can be carried over a single wire by using different
carrier frequencies), calculations can potentially be carried out in parallel with
no increase in effort. (This last feature is now being rediscovered in the context
of quantum computing, of which more later.)
The problem with this scenario is that it only works effectively if there is no noise
in the signals, and the device components perform their functions perfectly; neither requirement is likely to be true in practice. It was for this reason that Claude
Shannon, from an investigation of the first general-purpose analogue computer,
proposed the shift to digital computation. If a signal can only take a discrete
range of values, then it becomes possible to correct a certain amount of noise:
simply adjust it to the nearest appropriate value. Similarly, imperfections in
the operation of the machine components can be corrected by performing a discrete series of steps, rather than a continuous shift, and correcting the signal as
the computation proceeds. In this fashion, it becomes possible to perform an
arbitrarily long computation on an arbitrary number of bits without error.
The important question, for us, is whether this reduction from analogue to digital
is really necessary, or whether anything more can be salvaged; is there anything
that an analogue computer could reliably do that a digital computer cannot?
Siegelmann’s properties imply that the natural description of an analog computer
is as a dynamical system, and that its capabilities depend on the structure inherent
in that system. In assessing the computational power that is available, the best
guide is the Church-Turing principle [35] [36], which conjectures that no physical
system can perform a computation more than polynomially faster than a universal
Turing machine (or a digital computer). Simulations of the weather or other
naturally chaotic systems break this rule, because errors propagate exponentially
with time, requiring exponentially increasing accuracy as the simulation time
increases: the butterfly effect. On a digital computer, this means doing the
simulation on a finer and finer mesh, with smaller timesteps, and handling the
variables of the problem with greater precision; this all adds up to exponential
consumption of the computer’s resources. Clearly, then, if we could use a chaotic
system as a computer it would have greater power than a conventional computer.
The problem, of course, is that the same features that make a chaotic system
hard to simulate also make it hard to control: we need to set its initial conditions
with exponential precision, and keep it noise-free to an exponential degree. In
fact, as we shall see in Chapter 4, Heisenberg’s uncertainty principle implies that
even if we were able to have zero noise and arbitrary precision, there would still
be a small amount of uncertainty involved in the parameters of the system. Over
time, this would propagate exponentially, meaning that there would still come
a time when the system’s state was completely random. In effect, this means
that any property of a physical system which can be reliably predicted can also
be simulated in polynomial time, though the power of the polynomial would be
unfeasibly large.
As an aside, we can view this in terms of the “washing out” of information from
the system; it implies that there is a finite limit to the amount of information
that any physical system can store and manipulate reliably¹. This is reminiscent
of Mandelbrot’s famous argument that the coastline of Britain is fractal: at most
scales, this appears to be true, but it does not remain true indefinitely.
The flip side of this argument is that any physical system which is easy to control
(i.e. is robust to noise) is also easy to simulate. Simulation errors are equivalent to
noise in the system, so simulations of such systems are self-correcting. Lloyd [20]
has suggested that this fact could be put to practical use in quantum computers:
by using a quantum computer to simulate a physical system which would robustly
perform a given computation, much less attention need be given to making the
quantum computer itself robust.
These properties effectively make conventional computation by a chaotic system
impractical, but they do not rule out stochastic computation of the type embodied
by the neon route-finder. In fact, the chaotic nature of the system now becomes
an advantage, as it helps the system to explore the search space of the problem.
Provided that solutions to the problem correspond to stable attractors of the
dynamical system, there is at least a chance that the system will spontaneously
settle into the global minimum.
There are thus two approaches to noise in a computational system: one is Shannon’s, where it is rigorously corrected before it builds up to a point where it
can do any damage; the other is to use it to help kick the system out of local
minima in a stochastic search². This is clearly related to the simulated annealing
approach: by slowly reducing the noise level (by e.g. cooling the system down)
we can, with luck, “freeze” it at the global equilibrium. This relies on transitions
to lower-energy basins being more likely than transitions to higher-energy ones; given time, the system's evolution should take it to states with lower and lower energy. As with simulated annealing, however, this can only be relied on to find a good approximate solution, especially for NP-complete problems.

¹ If we make the transition from classical to quantum systems, this changes somewhat; we will come back to this point in Chapter 4.
² Strictly speaking, there is also a third: encode the problem in such a way that the noise has no effect. Topology is a good candidate for this (a noisy torus is still definitely a torus, and not a sphere), but the only practical applications which have so far been found are in the field of quantum computing [84].
In some systems, like the route finder, this idea works well, because they are
inherently stable: if they are perturbed, they will tend to relax back to where
they were. The difficulty is that this is a reflection of the problems that they
are designed to solve: the global minimum lies in a basin, and they will return
to it unless they are kicked all the way out of the basin by noise. The more
complicated the search space of the problem is, and the smaller and shallower
the basins are, the greater the odds are that the system will either never settle,
or reach only a local minimum. In other words, this approach is a real physical
implementation of a simulated annealing-type algorithm, and will have the same
benefits and difficulties.
A second requirement for a dynamical system to be computationally useful is
that its evolution should be computationally irreducible, i.e. that it should be
impossible to say anything of value about future states of the system without
explicitly simulating each step of its evolution [9]. In other words, there should
be no “shortcuts” to the answer; otherwise, we can potentially build a simple
simulation on an ordinary computer and beat it to the answer. This rules out
systems like the route finder: as discussed earlier, a very crude description sufficed
to describe the relevant details of its evolution.
These arguments add up to a somewhat depressing picture, in that physical systems appear, at best, to be capable of implementing only pretty crude types of
algorithm. However, their strong suit is that they can potentially do what they do
very quickly; the success of ant colony algorithms shows that careful application
of simple-minded methods can be very effective.
As was discussed in the previous chapter, NP-complete problems apparently have
an exponential nature, but that can manifest itself either in terms of time or
space requirements. Here we see another manifestation: exponential accuracy
requirements. By encoding and manipulating information in a continuous way, a
perfect analogue computer could potentially perform an NP-complete calculation
in polynomial time, but obliging it to work within real-world constraints would
almost certainly throw an exponential spanner in the works. This insight will be a
guiding light throughout the rest of this work: it is easy to be seduced by systems
which could work in theory, but do they still work when real-world constraints
are imposed?
2.3.1 A diversion into chaos theory
An interesting perspective on the issue of “washing out” information comes from
the problem, first posed by Penrose [37], of determining the boundary of the
Mandelbrot set. This set is constructed from the iteration
z_{i+1} = z_i^2 + c,
where c and the z_i are complex numbers; c is a member of the set if (and only if) the iteration remains bounded. The boundary of this set is fractal (and arguably
beautiful), but it has been shown that determining whether or not a given point
lies in the set is undecidable. By considering a computer which operated on real-valued quantities, but using discrete timesteps, Blum, Shub and Smale (BSS) [85]
defined analogues of the usual complexity classes, and showed that Penrose’s
problem was equivalent to the halting problem. They found that there was no
limit to the number of iterations that could be required to see whether or not the
iteration would remain bounded.
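The difficulty can be seen with the standard escape-time calculation; the test points below (sitting just outside the cusp of the set at c = 1/4) are our own choices:

def escape_time(c, max_iter=100_000):
    """Iterate z -> z*z + c and count the steps before |z| exceeds 2."""
    z = 0j
    for i in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:           # once |z| > 2 the iteration is certain to diverge
            return i
    return None                     # still undecided within the iteration budget

for eps in (1e-1, 1e-3, 1e-5, 1e-7):
    c = complex(0.25 + eps, 0.0)    # points just outside the cusp at c = 1/4
    print(f"c = 0.25 + {eps:g}: escapes after {escape_time(c)} iterations")
# The closer c sits to the boundary, the longer we must wait for a verdict.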
If we imagine trying to solve this problem with a physical system, we find a
different problem: the inherent uncertainty implied by a real system means that
we are doomed to failure. Once we come close enough to the set’s boundary, we
have no way of ensuring that we have not accidentally calculated the answer for a
point on the other side of the boundary. We can get round this problem to some
extent by scaling the quantities we use (so that the uncertainty occurs on a scale
too small to be relevant), but the scaling must become exponentially large as we
come exponentially close to the boundary.
2.4 The problem of local continuity
The Total Perspective Vortex derives its picture of the whole Universe
on the principle of extrapolated matter analyses.
To explain – since every piece of matter in the Universe is in some
way affected by every other piece of matter in the Universe, it is in
theory possible to extrapolate the whole of creation – every sun, every
planet, their orbits, their composition and their economic and social
history – from, say, one small piece of fairy cake.
Douglas Adams, The Restaurant at the End of the Universe
Siegelmann’s property of “local continuity” implies that logical branching operations (like if . . . then . . . ) cannot be rigorously implemented. However, as we shall
see in Chapter 3, it is possible to provide an acceptable approximation. For us, a
more serious difficulty implied by local continuity is that changes in the system
can only be felt locally, and take time to propagate to distant regions. Locality
is an important and essential feature of physical systems³ – within a relativistic
framework, it ensures that cause precedes effect – but it has to be artificially imposed on simulations of physical systems. Simulations are not bound by the laws
of physics unless we choose to make them so. From the introduction, it should be
clear that the structure of NP-complete problems is inherently complex, making
it likely that any mapping onto a physical system would benefit from non-local
connections.
Digital computers allow for non-locality at the cost of discrete time evolution:
the computation is halted to give the signals time to get where they need to go.
Allowing evolution in continuous time potentially makes the process unstable,
as the system will inevitably end up acting on information that is “out of date”
to some extent. As with systems of iterated equations, the choice whether to use the most recent information or information from the previous iteration has a profound effect, especially if the system is unstable.

³ It comes as no surprise that almost all physical laws are couched in terms of differential equations.
On a more theoretical level, it is known that problems defined on planar graphs
tend to be easier to handle; often, requiring that its underlying graph be planar
reduces a problem from NP-complete to P. Clearly, it is much easier to map a planar graph
onto a physical system, as many more of the edges can be mapped onto local
connections; from this, it is tempting to conclude that this makes the system’s
evolution less complicated.
To see what would happen if we were to remove the restriction of locality, we need
only look at classical physics. In the Newtonian universe, bodies feel their mutual
gravitational attraction instantly; there is no conception that the influence of one
body on another needs time to propagate across the gap. As a result, time is
essentially just a parameter: any calculation can be done in half the time simply
by doubling the speed of the particles involved. Ian Stewart [17] was able to
use this to design what he called the “rapidly accelerating computer” (RAC).
By means of exponentially accelerating particles, the internal clock of the RAC
doubles in speed with each cycle: the first cycle takes 1/2 s, the next 1/4 s, and
so on. As a result, literally any calculation can be carried out in a second or
less. Even the halting problem (deciding whether or not a given algorithm will
eventually terminate) would be decidable using such a computer. In saying this,
of course, we have to leave out practical considerations, such as the limitless
amounts of energy it would take to power such a computer, but the principle
remains. With only local interactions, not only would the different parts of the
computer eventually fall out of sync (when the clock rate became so fast that
the signals could not cross the gap in time) but the clock-speed doubling could
only be carried out a finite number of times. In other words, we would have to
truncate the series that gave the exponential behaviour at a finite point, thereby
yielding polynomial behaviour.
In some cases, locality constrains the behaviour of a physical system so far that
it is indeed possible to extrapolate from the proverbial piece of fairy cake to
the state of the whole system. In others, however, this does not follow: if we
imagine the space of states encompassing all physically possible configurations of
the system, specifying the properties of only a small region of the system merely
cuts down the size of the space; it does not reduce it to a single point. As we
saw with the satisfiability problem, more freedom means more complexity; this
is equally true of physical systems, as the next two subsections show.
2.4.1 The Zealot Model
As an interesting illustration of the degree of freedom inherent in different physical
systems, we now consider Mobilia’s “zealot model” [10]. This is based on the Ising
model, which we will discuss in more detail in the next subsection. The essential
idea is to consider a lattice of “voters”, most of whom have no firm convictions
and are therefore inclined to listen to their neighbours. One of these voters,
however, has an opinion that is fixed to the point of fanaticism, and will hold
that belief no matter what: the “zealot”. The central question is how far the
zealot’s opinions will propagate through the rest of the population.
To make this more concrete, the voters can favour one of two points of view, coded
as ±1. At each time step, an individual is chosen at random and assigned the
opinion of one of its (again randomly chosen) nearest neighbours in the lattice,
and the evolution of the system is followed until it reaches a steady state.
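The dynamics is simple enough to state directly in code; the following one-dimensional sketch uses an arbitrary lattice size and number of updates of our own choosing:

import random

random.seed(0)
N = 50
spins = [random.choice([-1, +1]) for _ in range(N)]
zealot = N // 2
spins[zealot] = +1                     # the zealot's opinion is fixed at +1

for step in range(500_000):
    i = random.randrange(N)
    if i == zealot:
        continue                       # the zealot never changes its mind
    neighbour = random.choice([i - 1, i + 1])
    spins[i] = spins[neighbour % N]    # copy a randomly chosen nearest neighbour

agree = sum(1 for s in spins if s == +1)
print(f"{agree} of {N} voters agree with the zealot after the run")
# Run for long enough and, in one dimension, the whole line ends up agreeing.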
The result is that, for one- and two-dimensional lattices (i.e. lines or flat grids),
the only steady state is unanimous agreement with the zealot. However, in three
dimensions the result is that the average vote is inversely proportional to the
voter’s distance from the zealot. (In more than three dimensions, the average
vote is proportional to r^(2−d), where r is distance from the zealot and d is the
number of dimensions.)
From this, we can see that the one- and two-dimensional models are examples of
systems where the state of the system as a whole is (at least in the long term)
determined by its local state. For three or more dimensions, we still get some
global information, but the system is clearly far less constrained. In addition,
the only results that can be extracted are statistical in nature, telling us nothing
about the opinions of any one voter (except for those voters which happen to be
very near the zealot).
Another interesting result for this model is that the deviation from unanimity at a given point decays as 1/√t in one dimension, but as 1/ln(t) in two dimensions. This means that requiring an
increased degree of accuracy entails a polynomial wait in one dimension, but an
exponential one in two dimensions.
From this analysis, we gain some useful intuitions. First, we see that giving a
system more degrees of freedom makes it more likely that it will have a number
of different minima (either all equal, as here, or some more equal than others). In
essence, it becomes possible to satisfy the majority of the constraints at any given
point without satisfying all of them, and in such a way that we can’t (locally)
do better. Second, we see that there are at least three régimes: easy, hard (but
still with a unique solution), and impossible (due to multiple solutions). In the
second régime, we can guarantee to get a solution eventually (though possibly
with an exponential wait), but in the third we can get irretrievably stuck, with
no option but to start again.
In computational terms, the first régime corresponds to easy problems, the second
to apparently hard (but computationally reducible) problems, and the third to
hard, computationally irreducible problems. This last class corresponds naturally
to NP-hard problems: given a possible steady state of the zealot model, we can
easily check it, but finding one with a particular set of additional properties would
appear to involve evolving the model for an exponential amount of time.
2.4.2 The Ising model
Following on from our discussion of the zealot model, we now – as promised –
consider the Ising model. This is superficially very similar to the zealot model,
in that we again have a d-dimensional grid, populated by entities which can take
two values, though this time, in deference to the model’s origins in solid-state
physics, we call them “spins” rather than “voters”. The spins are “coupled” to
their neighbours such that the total energy of the system, H, can be written as
H = −Σ_{ij} J_ij σ_i σ_j,
where the σi are the spins and the matrix Jij represents the couplings. The evolution of the Ising model is essentially a physical version of simulated annealing: if flipping a spin would increase the overall energy of the system by an amount δ, the flip occurs with a probability proportional to exp(−δ/kT), where T is the temperature of the system and k is Boltzmann's constant. This is a good, though simple,
model of the thermal fluctuations in a real magnetic material.
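As a concrete illustration, a bare-bones version of this dynamics can be written down directly; the lattice size, couplings and cooling schedule below are our own arbitrary choices (with J = 1 and Boltzmann's constant set to 1):

import math
import random

random.seed(0)
L, J = 10, 1.0
spins = [[random.choice([-1, 1]) for _ in range(L)] for _ in range(L)]

def delta_energy(x, y):
    """Energy change if the spin at (x, y) is flipped (periodic boundaries)."""
    s = spins[x][y]
    neighbours = (spins[(x + 1) % L][y] + spins[(x - 1) % L][y] +
                  spins[x][(y + 1) % L] + spins[x][(y - 1) % L])
    return 2.0 * J * s * neighbours

T = 3.0
for sweep in range(2000):
    T = max(0.05, T * 0.999)           # slowly "cool" the system
    for _ in range(L * L):
        x, y = random.randrange(L), random.randrange(L)
        dE = delta_energy(x, y)
        if dE <= 0 or random.random() < math.exp(-dE / T):
            spins[x][y] *= -1

print("magnetisation per spin:", sum(map(sum, spins)) / (L * L))
# Typically close to +1 or -1 once the ferromagnet has ordered.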
In the simplest case, the Jij are taken to be zero for spins that are not adjacent
on the lattice, and J otherwise; in other words, the only interactions are between
nearest neighbours and all are coupled equally. This is known as a ferromagnetic
Ising model. Finding the ground state of such a model is clearly trivial (simply
assign all spins to the same value), but finding the partition function is less easy.
The partition function describes the probability of finding the system in a state
of a given energy, and thus has to be aware of all possible states of the system.
In one dimension, this is still fairly easy, and in the two-dimensional case an
analytical solution is still possible (though far more difficult). In three or more
dimensions, however, the problem becomes NP-complete, and no exact algorithm
is known which is significantly faster than brute-force enumeration.
The next-simplest case, and the one which will hold our attention for a while, still
allows interaction only between nearest neighbours, but allows those couplings to be either +J or −J. The non-zero interactions are still all of the same strength, but they
may be positive or negative, and we are free to specify which on an element-by-
element basis; this is known as a (random) Ising spin glass model. Even finding
the ground state of such a model is no longer straightforward, except in one
dimension, because it can now be frustrated (i.e. it is no longer possible to satisfy
all the bonds). The two-dimensional case can still be solved in polynomial time,
but for three or more dimensions the problem is NP-complete.
The spin glass model is very heavily studied, both as an elementary model of
a magnetic substance and because it has a natural relation to problems such
as the satisfiability problem (as we shall see in Chapter 5), making it a natural
“complexity laboratory”. In fact, this second feature means the spin glass model
is largely responsible for the current application of statistical physics methods to
complexity theory [55].
It is interesting to note that the NP-completeness result follows from the fact that
finding the ground state of such a model can be reduced to solving a problem
known as minimum cut. The task here is to take a graph with weighted edges
and divide it into two distinct (unconnected) pieces by removing edges, such that
the total weight of the removed edges is as small as possible. In terms of an Ising
model, the vertices of the graph correspond to the spins, and the edges to the
bonds; having made the cut, we assign all the spins in one piece to be “spin up”,
and all the spins in the other piece to be “spin down”.
To see why this represents the ground state of the problem, imagine an arbitrary
cut has been made and the spins assigned. Denote by E the set of all bonds,
and by E+, E− and E+− the sets of edges which connect pairs of up spins, down
spins and opposite spins respectively. We can then rewrite the total energy as
H_C = −Σ_{ij∈E+} J_ij − Σ_{ij∈E−} J_ij + Σ_{ij∈E+−} J_ij          (2.1)
    = −Σ_{ij∈E} J_ij + 2 Σ_{ij∈E+−} J_ij.                         (2.2)
The first term is a constant of the problem, so minimizing the energy is equivalent
to minimizing the second term, which is just twice the weight of the cut.
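This identity is easy to check numerically; the five-node graph and random couplings below are invented purely for the check:

import itertools
import random

random.seed(0)
nodes = range(5)
edges = [(i, j) for i, j in itertools.combinations(nodes, 2)]
J = {e: random.uniform(-1, 1) for e in edges}

def energy(spins):
    return -sum(J[(i, j)] * spins[i] * spins[j] for (i, j) in edges)

def cut_weight(spins):
    return sum(J[(i, j)] for (i, j) in edges if spins[i] != spins[j])

for _ in range(3):
    spins = [random.choice([-1, 1]) for _ in nodes]
    lhs = energy(spins)
    rhs = -sum(J.values()) + 2 * cut_weight(spins)
    print(f"H = {lhs:+.4f}, -sum(J) + 2*cut = {rhs:+.4f}")
# The two numbers agree for every assignment, as equation (2.2) requires.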
Polynomial-time algorithms are known for solving the minimum cut problem on
Figure 2.4: Ising “wire”. The solid lines represent +J-weight bonds and the dotted
lines represent 0-weight bonds.
Figure 2.5: Ising “AND gate”. The solid lines represent +J-weight bonds and the
dotted lines represent 0-weight bonds. The inputs are on the left, and the output is
on the right. The middle input must be an impurity, used to bias the result.
any planar graph, but it is NP-complete for non-planar graphs unless all the
weights are non-negative.
The difference between the two- and three-dimensional cases allows us to draw
some quite revealing conclusions about the complexity of the computational problems that they can potentially solve. (The discussion that follows is broadly in
the spirit of Kaye’s proof that the game Minesweeper is NP-complete [15].) The
idea is that it is possible to recast elementary computer elements, such as wires
and logic gates, in terms of configurations of a two-dimensional Ising spin glass,
such that the result of a computation can be read off in terms of its ground-state
energy.
Consider Figure 2.4. This shows an implementation of a simple “wire”, which
is essentially just a one-dimensional model embedded in a two-dimensional one,
“insulated” by a zero-coupling layer. Minimizing the energy of the system requires
all the spins along the wire to line up, correlating the spins at either end.
Implementing a NOT gate is very similar: simply introduce a −J coupling into
the wire. If we could also implement an AND gate, then we would have all we need
to build arbitrary logic circuits. This, however, is not possible: the invariance
of the model under inversion of spins implies that the outputs of such a gate for
inputs of 1, 0 and 0, 1 would have to be different. One way to get round this
is to introduce “impurities” with fixed spin, thereby breaking the symmetry, as
shown in Figure 2.5. Alternatively, the model could be made dependent on the
absolute values of the two spins, not just their relative values. This is most easily
accomplished by adding a “magnetic field” to the model, which the spins will
tend to align with. (This simply involves adding a term −B Σ_i σ_i to the energy
function given above; this is minimized when all the spins have the same sign as
the constant B, representing the magnetic field.)
In either case, we run into problems with the reduction to minimum cut: we have
lost the spin-reversal symmetry and, as a result, rather than just being dependent
on the cut itself, the energy is generally dependent on the interiors of the two
regions. This yields a much harder problem.
The only way we can see of preserving the good behaviour of the two-dimensional
model is to introduce only one impurity (connected, via wires, to all the points at
which impurities are required). This means that we can still view the problem as
an instance of minimum cut, with the proviso that the choice as to which group
of spins should be up is no longer arbitrary.
From this, it might appear that we are able to construct arbitrary circuits, but
we are left with one major difficulty: to do so, we would need to find a way for
the “wires” to cross. In the case of Minesweeper, this was possible (completing
the proof of NP-completeness), but here it is not. On the other hand, extending
the model to three dimensions allows sufficient freedom for there to be no need
for wires to cross. Similarly, allowing non-local interactions (even just across the
diagonals of the grid squares) is enough to allow wires to cross.
The net result of this is that it is possible to embed any conceivable logic circuit in
a three-dimensional Ising model, or a two-dimensional model with non-local interactions, but only quite a restricted class of circuits in an ordinary two-dimensional
model.
It is also interesting to note that, as a consequence of this ability to simulate
arbitrary logic circuits, simple neural networks can also be built into an Ising
model, provided that we allow the spins to have arbitrary numbers of nearest
neighbours. All we need do is designate one neighbour of a given spin as the
“output” and the others as “inputs”: the spin – and the “output” – will then
align itself with the majority of the “inputs”. The only difficulty with this scheme
is that learning would require adjusting the strength of the bonds between the
“inputs” and the “neuron”, which is not a physically viable process. (For further
information about the link between Ising models and neural networks, see [78].)
Leading on from this, it might appear that, by careful adjustment of the bond
strengths in an Ising model, it is possible to arrange for the ground state of
the model to be the solution to any given problem. However, for NP-complete
problems, we cannot use the reduction to min cut as a means of finding the ground
state, and we must rely on the normal evolution of the system. The problem with
this is that, as we noted in the introduction, NP-complete problems generally have
a number of different minima, and there is no reason to assume that the system
will ever reach the global minimum.
The dimensionality of the system rears its head again when we start to think
about how we could go about ensuring that we do indeed find the ground state.
The problem is that, as the number of dimensions increases, so does the number
of nearest neighbours per spin. To see why this is a problem, consider the simplest
possible example: the one-dimensional ferromagnetic model.
Imagine that the system is already at the global minimum, with all spins aligned
(upwards, say). Flipping one spin is energetically unfavourable, because it takes
the two bonds around the spin from a total energy of −2J to +2J, an increase
of 4J. However, at finite temperature, Boltzmann statistics imply that there
is still a chance that it will happen, proportional to exp(−4J/T). Once it has happened, the neighbouring spins can flip without any change in energy and we
can potentially get a “domino effect” (though this will proceed like a random
walk, rather than a proper domino effect). What is more, as the size of the
system increases, so does the chance of the initial flip happening. In fact, it can
be shown that the behaviour of the system is ergodic, i.e. that, given enough time,
it will eventually visit all possible states.
By contrast, if we consider the analogous situation in a two-dimensional model,
we find that the initial chance of a spin flip is much lower (due to an energy
cost of 8J). In addition, the neighbouring spins still have three out of their four
neighbours spinning up, so they will tend to stay aligned. It turns out that this
suppression of change is enough to ensure that the two-dimensional model is no
longer ergodic, and can settle into a definite state. In addition, this state need
not be the ground state; all that is required is that each spin should have three
or more nearest neighbours spinning in the same direction.
Moving to three (or more) dimensions, we find more of the same: the energy cost
of a spin flip is even higher (12J), and the neighbouring spins still have 5 out of
6 neighbours aligned. The condition for a state to be stable is also looser: 4 out
of 6 neighbours, rather than 3 out of 4.
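The arithmetic is summarised by the following snippet (with J and T set to 1 purely for illustration):

import math

J, T = 1.0, 1.0                        # illustrative values only
for d in (1, 2, 3):
    cost = 4 * d * J                   # 2d bonds each go from -J to +J
    print(f"d = {d}: flip costs {cost:.0f}J, "
          f"Boltzmann factor exp(-cost/T) = {math.exp(-cost / T):.2e}")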
This leads us to the seemingly paradoxical conclusion that systems with more
dimensions simultaneously have more freedom (in that we can impose more complex structures on them) and are more resistant to change. This is why the
three-dimensional zealot model took so long to settle, and bodes ill for the convergence of any system with similar dynamics.
2.5 Summary
In this chapter, we have seen that there is a very good reason for the desire
to force analogue systems into a digital straitjacket: the need to keep error
and noise at bay. This makes us skeptical about the prospects of any physical
system to improve on present computers in any significant way, if we keep to the
paradigm of obliging the system to perform a predefined algorithm.
On the other hand, physical systems open up the possibility of a more stochastic
mode of computing, where the only requirement is that the solution be more
stable than other points in the search space. We have even seen, thanks to the
neon route finder, that this can work well for some problems. However, if we try
to apply this approach to NP-complete problems, we find that the complicated
search space embodied by such problems also makes this approach unlikely to
converge exactly. Physical systems can, however, at least work very fast,
being composed of a large number of basic elements. Given that the prospects
for an efficient solution of such problems are remote, they may still be effective,
at least in some cases.
Chapter 3
Novel computing devices
I wish to God these calculations had been executed by steam.
Charles Babbage to William Herschel (1821)
The history of computing is littered with all manner of weird and wonderful
devices, all designed with one aim in mind: efficient calculation. After a long
struggle, digital computers eventually emerged as the most flexible and reliable
candidates; the aim of this chapter is to see why.
The story begins, back in 1823¹, with Charles Babbage and his “Difference Engine No. 1”, designed for the task of mechanically calculating mathematical tables
(such as logarithms). Babbage had become frustrated with the number of mistakes he found in existing (hand-calculated) tables, and managed to convince
the government that it could be done automatically. The task of building the
machine² was monumental – it called for 25,000 precision-machined parts – and
it was never finished, but it led Babbage to an even more ambitious design:
the “Analytical Engine”. This was never built either – it had 50,000 parts! –
but it represented the first device that we would recognise as a general purpose
computer: it would have been able to store information, and could have been
programmed using punched cards³.

¹ The story could begin with Napier's design of the slide rule in 1650, or even with Leonardo da Vinci's reputed design of the first ever calculating machine, but we are interested in programmable computers.
² Incidentally, the machine was hand-cranked, not steam-powered.
Starting with the possibilities presented by purely mechanical devices like these,
we will move on to their electrical counterparts (chiefly Shannon’s General Purpose Analog Computer), wave computers, DNA computers, and finally an “amorphous computer”. This gives a good cross-section of the possibilities, and we finish
by tying these results in with the analysis of the previous chapter.
3.1 Mechanical computers
The idea of computing using cogwheels and crankshafts may seem somewhat
quaint now, but such machines provide a simple starting point in studying analog
computers in general. For example, Vergis et al. [28] designed a purely mechanical device for solving linear constraint optimization problems, with a shaft for
each of the variables and one for the objective function (i.e. the quantity to be
optimized). Using gears (to perform multiplication) and differentials (for addition and subtraction), the positions of the shafts could be forced to obey the
constraints. The simple act of turning the objective function shaft then turned
the other shafts to values which gave the new function value. By turning the objective function shaft until it could turn no more, the machine could in principle
find the optimum values for the variables.
The biggest problem with this machine is readily apparent: it can only find local
optima. By construction, it can only follow smooth paths through the parameter
space of the problem, so there is no way out of a local optimum other than
reducing the objective function again and trying to “steer round” the optimum.
Worse, it is only possible to find the global optimum at all if there is a path from
the machine’s starting point to the optimum which never leads “uphill” at any
point; Vergis et al. call this the Downhill Principle. Clearly, no machine which
obeys this principle can be of any general use in solving NP-complete problems,
and it seems intuitively quite likely that all purely mechanical machines must obey it.
Essentially, this restricts such machines to implementing algorithms in the spirit of GSAT, with the attendant problems that brings.

³ The first device to be "programmed" in this way was actually a loom, by Jacquard in 1820.
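To see the Downhill Principle in caricature, consider the following sketch (ours, with an invented objective function): moves that lead uphill are simply never available, so the search halts at the first local optimum it meets.

import math
import random

def objective(x):
    """An invented bumpy objective with several local minima."""
    return (x - 3) ** 2 + 4 * math.sin(3 * x)

random.seed(2)
x, step = random.uniform(-5, 10), 0.1
while True:
    downhill = [c for c in (x - step, x + step) if objective(c) < objective(x)]
    if not downhill:
        break                        # every available move leads uphill: stuck
    x = min(downhill, key=objective)

print(f"stopped at x = {x:.2f}, objective = {objective(x):.2f}")
# Whether this is the global minimum depends entirely on where we started.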
The other problem with machines such as these is that they must, like Babbage’s
engines, be made to extremely high precision; otherwise, paths which are only
weakly downhill may appear to actually lie uphill. In addition, using the machine
described above on some problems can yield awkwardly high gear ratios or fine
differentials. Thus, as the size of the problem increases, the required precision
will also increase.
As we have said before, local search through the parameter space of an NP-complete problem is problematic at best, but to make it work at all we have to
add some means of “kicking” the system out of local optima; for this, electrical
computers hold out more promise.
3.2 Electrical analog computers
The most famous analog computer was Shannon's General Purpose Analog Computer (GPAC), the origins of which can be dated all the way back to Lord Kelvin
in 1876 [79]. This consisted of a small set of basic components, which could be
connected together as desired to solve the problem at hand. These were: an
adder (which outputs the sum of its inputs); an integrator (which, given inputs u
and v, calculates ∫_{t0}^{t} u dv, where t0 is given by the device setting and t is the current
time); a constant multiplier (which multiplies its input by a predefined constant);
a multiplier (which takes two inputs, u and v and outputs uv); and a constant
function (which has no inputs and continuously outputs 1). (The description
given here is taken from [29].)
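As an illustration of what such a component set can do, the following sketch (ours, not Shannon's formulation; the step size and wiring are arbitrary choices) numerically imitates two integrators in a feedback loop, the classic GPAC construction that generates sines and cosines:

# Two integrators wired so that y integrates z and z integrates -y,
# which generates cosine and sine; crude explicit time-stepping.
dt, t = 0.001, 0.0
y, z = 1.0, 0.0          # initial device settings: y(0) = 1, z(0) = 0

while t < 3.14159:        # run for roughly half a period
    dy = z * dt           # integrator 1: y <- integral of z dt
    dz = -y * dt          # constant multiplier (-1) feeding integrator 2
    y, z = y + dy, z + dz
    t += dt

print(f"y(pi) ~ {y:.3f} (exact cos(pi) = -1), z(pi) ~ {z:.3f} (exact -sin(pi) = 0)")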
The GPAC was originally designed to solve differential equations, and, despite
its name, this is essentially all that it is capable of: Shannon [30] showed that
the class of functions that it could generate was the set of solutions of a class of
systems of quasi-linear differential equations. (This was made more precise by
Pour-El [31].) In particular, it is incapable of generating functions such as Euler’s
Γ function [33], so it is less powerful than a Turing machine. One feature of the
function class which a GPAC can generate that is of immediate interest to us is
that it must have a domain of generation, i.e. there must be some (finite) region
around the initial conditions within which the solution does not change. This
effectively means that the problem must allow room for imprecision in the initial
conditions and later calculation. As such, we are again likely to be restricted
to polynomial problems, unless we are happy to live with exponential accuracy
requirements.
As an aside, let us hark back to the real-valued computers of BSS, which were able
to compress the memory requirements of algorithms for real-valued problems as
compared to their implementation on a digital computer. We have argued that
this would not work in practice, but such computers are in principle able to
provide an increase in power. It is interesting to note, for comparison, that the
GPAC does not: it is possible to simulate its operation with only a polynomial
slowdown [80].
Recent work [32] has shown that a relatively simple extension of the GPAC gives it
substantially more power. The augmented GPAC includes a box which calculates
x^k θ(x) for input x and fixed k, where θ(x) is the Heaviside step function (i.e.
θ(x) = 1 for x > 0 and θ(x) = 0 otherwise). This gives the GPAC the ability
to “sense” inequalities in a differentiable way, getting round Siegelmann’s third
requirement. Of course, the price we pay for this is that values of x which are
only slightly above zero will only be sensed very weakly, effectively imposing
a lower limit on the size of the inequalities that can be sensed. Nevertheless,
this allows the enhanced GPAC to compute most functions, provided only that
they are computable by a Turing machine in a time bounded by a primitive
recursive function of the problem size. The class of primitive recursive functions
is large, and includes the exponential function, so the extended GPAC can at
least compute the solution to any NP-complete problem. This result does not,
however, say anything about how long it will take! In addition, as with the
Mandelbrot set problem, we are likely to have to scale the quantities used, so
that inequalities can be sensed and the domain of generation is large enough.
Thus, even if the time required turned out to be acceptable, the accuracy and
energy requirements would not be.
3.3 Artificial neural networks
Artificial neural networks are simplified models of the brain, consisting of a large
number of artificial “neurons”. These are either fully connected together, or
they are formed into densely connected groups which are only sparsely linked.
(This latter option is intended to mirror the brain’s organisation of neurons into
functional groups.)
Although real neurons are biologically quite complex, their basic operation is
thought to actually be very simple. The first artificial neuron was designed by
McCulloch and Pitts, and its operation is summarised in Figure 3.1.
Figure 3.1: Basic model of a neuron, with inputs I1, ..., I4, weights w1, ..., w4 and output O. The output O is 1 if Σ_i wi Ii > T and 0 otherwise, where the wi are the weights and T is the threshold.
Other designs have thresholds which are less sharp – a sigmoid function, for
example, which is biologically better motivated – but their behaviour is similar.
A good way to think of a neuron is as a detector: if it “detects” enough input,
then it gives an output; otherwise it does nothing. Real neurons give out a series
of spikes, but most artificial models use a “rate approximation”, i.e. their inputs
and outputs are continuous-valued spiking rates. This is thought [81] to make no
fundamental difference to their properties.
Another crucial feature of neurons is that their output can either be ‘excitatory’
or ‘inhibitory’ (i.e. positive or negative), with each neuron only providing one
type of output. As we will see, it is only through having both types of neuron
that the brain can be as powerful as it is.
Viewed from a computational angle, a neuron can be thought of as a general (and
adaptable) type of logic gate. A basic, but important, result about logic gates
is that only a few different types are needed in order to perform any possible
computation. In fact, it is possible to use just one type of gate: the NAND gate.
This has two inputs, and gives an output unless both inputs are on. For any given
set of inputs and outputs, it is possible to build a network of NAND gates that,
given the inputs, produces the outputs. In other words, any possible boolean
function can be represented with NAND gates.
Following this line of thought, we rapidly hit a difficulty in thinking of brains
as flexible computers: it doesn’t look as if one neuron could possibly represent
a NAND gate. As presented, more input means more output, not less! A single
neuron with suitable weights can represent some types of gate, such as an OR
gate or an AND gate, but ‘negative’ gates like NAND are a problem.
This is where inhibition comes in: by connecting together inhibitory and excitatory neurons, we can do much more. Figure 3.2 shows a pair of neurons, one
excitatory (e) and one inhibitory (i), which, working together, can solve problems
like these.
If, for example, we arrange for neuron i to be active only when both the inputs
are (i.e. be an AND gate) and use the inhibitory output to suppress neuron e,
then we can replicate another type of logic gate, XOR (which is on only if one
or other input is on, but not both). The inhibitory neuron has given us the last
piece of the puzzle: how to get less output from more input. Replicating a NAND
gate is even easier than this: rather than connecting the inputs to neuron e, we
just have to connect e to a source which is always on, making it active unless it is suppressed by neuron i.

Figure 3.2: Co-operative pair of neurons. The inputs I1 and I2 feed both the inhibitory neuron i and the excitatory neuron e (with weights w11, w21, w12, w22); neuron i suppresses neuron e via the weight wi, and neuron e provides the output O.
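To make the construction concrete, here is a small sketch of such a pair in code; the weights and thresholds are invented, but any values with the same qualitative effect would do:

def neuron(inputs, weights, threshold):
    """Basic threshold unit: fires (1) if the weighted input exceeds threshold."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) > threshold else 0

def xor(a, b):
    i = neuron([a, b], [1, 1], 1.5)            # inhibitory neuron: active only for 1, 1
    return neuron([a, b, i], [1, 1, -2], 0.5)  # excitatory neuron, suppressed by i

def nand(a, b):
    i = neuron([a, b], [1, 1], 1.5)            # same inhibitory AND detector
    always_on = 1                              # neuron e driven by a constant source
    return neuron([always_on, i], [1, -2], 0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "-> XOR:", xor(a, b), " NAND:", nand(a, b))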
Much of the interest in artificial neural networks as practical computer systems
(rather than toy models of the brain) stems from the fact that they do not have
to be programmed in the conventional sense; they can “learn from experience”
as we do. Two main mechanisms have been proposed for this: Hebbian learning
and error-driven learning.
Hebbian learning (first proposed by Donald Hebb in 1949) works to emphasise
the correlations between neurons, by strengthening the weights between neurons
that are often active at the same time. This is a bit like making the connection
between a lightning flash and a thunder clap: because they tend to occur together,
we come to feel that there must be a connection between them. Eventually, we
start to expect the thunder when we see the lightning. Similarly, if it so happens
that a neuron is active more often than not when we see a cat, then Hebbian
learning will act to press the neuron into service as a ‘cat detector’. (This is not
as crazy as it may sound: the brain has many highly specialised neurons, some
of which act as ‘face detectors’ for people we know well.)
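In its simplest form, the Hebbian rule is just a weight increment proportional to the product of the two activities; the learning rate and firing statistics below are invented:

import random

random.seed(3)
weight, rate = 0.0, 0.05               # invented learning rate

for trial in range(1000):
    lightning = 1 if random.random() < 0.2 else 0
    thunder = lightning if random.random() < 0.9 else 0   # thunder usually follows
    weight += rate * lightning * thunder                  # "fire together, wire together"

print(f"learned weight after 1000 trials: {weight:.2f}")
# Units that are rarely active together would strengthen the connection far more slowly.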
Error-driven learning, by contrast, makes the brain learn by trial and error. When
we do something right, then the weights between the neurons that were active
in making the decision are strengthened; otherwise, they are weakened. (The
mechanism for doing this – the release of a neurotransmitter called dopamine –
is also what makes us feel good when we succeed.) In other words, we remember
good strategies and forget bad ones.
These two strategies work well together, in that error-driven learning drives us
towards good solutions to problems, while Hebbian learning drives us towards
solutions that represent the essential relationships between their parts. As a
result, our solutions to problems come to resemble a series of logical steps, making
it much easier to generalise from situations we know about to situations we’ve
never experienced before.
As a consequence of their ability to simulate Turing machines, neural networks
are at least as powerful. In fact, Siegelmann has argued [82] that the class of
problems that they can handle is actually P/poly. This is the class of problems which can be solved by circuits whose size is allowed to be a polynomial function of the problem size (equivalently, by a polynomial-time Turing machine given a polynomial amount of "advice"). The reason for this is the same as that which
gives the BSS real-valued computer more power: the weights are real-valued, so
neurons can be trained to “detect” any real number. As with the BSS machine,
this power vanishes as soon as we bring real-world considerations into play.
Their ability to learn gives neural networks an important advantage over the other
systems we have considered: they are able to stochastically learn “heuristics” and
then apply them to later problem examples. In the worst case, this cannot help –
there are no regularities to be exploited – but it is of considerable practical use.
For NP-complete problems, however, we conjecture that their performance will
reduce to stochastic search, though of a sophisticated variety; more like WalkSAT
than GSAT, perhaps.
3.4 Circuit-based "consistency computers"
The electrical circuit version of the neon route finder inspired us to try to design circuits which could solve NP-complete problems, ideally without the route
finder’s rider that the answer could not be guaranteed optimal. The overriding
concern was that the optimal solutions should be stable, and give rise to the
greatest current flow. As with all such stochastic methods, the odds of the optimal solution actually being found are not high, but the rate of search could
potentially be very fast.
These circuits borrow from artificial neural networks, in that they feature "neuron-like" elements. These are not full-blown artificial neurons, in that their purpose
is to permit current to flow unless an inhibitory input is active; they could be
implemented using transistors.
The remainder of this section considers potential implementations of this idea
for the solution of three different NP-complete problems: the travelling salesman
problem, the satisfiability problem, and integer factorization.
3.4.1 The travelling salesman problem
The travelling salesman problem requires us to deal with two, potentially conflicting, requirements. On one hand, the route found must be as short as possible
while, on the other hand, visiting as many cities as possible (subject to the constraint that each city be visited no more than once). To deal with these two
requirements, together with the constraints, we propose the model shown in Figure 3.3.
First, the roads connecting the cities are modelled by wires, and the cities themselves by small artificial neural networks. The resistance of each wire – not
surprisingly – is proportional to the length of the road it is modelling. (To make
the model more useful, a variable resistor could be added to each wire and put
under the control of an external computer. This would allow the model to be
adapted to solve any travelling salesman problem with the same number – or
fewer – of cities.)
To “persuade” the current in the model to flow through as many links as possible, each wire also has a small potential difference added to it. If the potential
difference is the same across each link, and is of a suitable size, the idea is that the current will simultaneously aim for as many links as possible and as low a resistance as possible.

Figure 3.3: Travelling salesman model. The lines represent wires, with the arrows representing the direction in which the potential difference across that wire encourages the current to flow. The cities – C1 to C4 – are small artificial neural networks which have the link wires as inputs and outputs.
The neural network at each city node simply enforces the condition that each city
can only be visited at most once by requiring that only one input and one output
can be active at any one time. A possible network is shown in Figure 3.4.
Figure 3.4: Travelling salesman city node network, with input neurons I1–I4, output neurons O1–O4 and a regulatory neuron R.
The idea behind this network is that the regulatory neuron R serves to inhibit the
activity of the network when it rises above threshold (and it is set at a suitable
level to allow just two of the other neurons to be active). Given that a current is
flowing through the network, this gives the network no choice but to activate one
input neuron and one output neuron, which is exactly what we want.
3.4.2 The satisfiability problem
The basic structure of our model for solving this problem is as shown in Figure 3.5.
Figure 3.5: Basic structure of the satisfiability problem network, for a seven-variable problem: n disjunctive sections connected in series, each containing seven parallel branches X1j, X2j, ..., X7j (one per variable), for sections j = 1, ..., n.
The idea behind this model is that each parallel section represents one disjunctive
subformula, and that, by putting these in series, we can represent the whole
formula to be satisfied. If we then apply a potential difference from top to bottom,
the aim is to persuade the current to flow only through paths that satisfy the
formula.
When a particular variable appears un-negated and can be set to true in satisfying
the formula (or appears negated and can be set to false), we represent this by
allowing current to flow through the appropriate branch. Conversely, when the
variable appears un-negated and can be set to false (or appears negated and can
be set to true), we represent this by blocking the flow of current in the appropriate
branch. To enforce these conditions, we need to add two regulatory neurons per
variable, one connected to all the paths where that variable appears negated, and
the other connected to all the paths where the variable appears un-negated. Each
regulatory neuron is then used to inhibit all the paths that it does not have as
inputs. In this way, current can either flow through the paths where a particular
variable appears negated (meaning that the variable is false in the solution) or
through the paths where it appears un-negated (meaning that the variable is true
in the solution), but not both.
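The consistency condition that the regulatory neurons are intended to enforce can be spelled out in a few lines of code; the three-variable formula below is invented, and the brute-force check stands in for the physical dynamics:

import itertools

clauses = [[(1, True), (2, False), (3, True)],     # (x1 or not x2 or x3)
           [(1, False), (2, True)],                # (not x1 or x2)
           [(2, False), (3, False)]]               # (not x2 or not x3)

def current_flows(assignment):
    """True if every clause section has at least one branch allowed to conduct."""
    return all(any(assignment[var] == sign for var, sign in clause)
               for clause in clauses)

for values in itertools.product([True, False], repeat=3):
    assignment = dict(zip([1, 2, 3], values))
    if current_flows(assignment):
        print("conducting state:", assignment)
# The "conducting states" are exactly the satisfying assignments of the formula.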
As with the travelling salesman problem, note that adding switches between the
regulatory neurons and the paths should allow this model to be flexible enough
to encode any satisfiability problem up to the size of the model.
3.4.3 Integer factorization
This section will be very short, as this idea is yet to be worked out in detail.
However, the idea behind it is motivated by an MSc project proposal by Kousha
Etessami [83]. This proposal points out that any factorization problem can be
restated as a satisfiability problem (which should then be solvable by the model
of the previous section).
In more general terms, we could think of a suitable type of multiplier circuit,
and simply connect the outputs to the inputs. By using the output bits that
are zero in the number to be factorized to inhibit the other bits (and vice versa)
and adding “incentive” potential differences across the output bits that should be
one, we can potentially persuade the circuit to settle into a state where the inputs
become factors of the desired number. However, if this is to work effectively, it
will require the properties of the multiplier circuit to be carefully considered.
3.4.4 Conclusions
The above models are all designed around the idea of consistency: connect the
output of a problem to the input in such a way that only consistent solutions are
stable. This is a very general idea, which can be applied to any problem in NP,
but in the end it is just a stochastic search process.
It is interesting to note that, with their “neural” features, at least the last two
of these examples could also be implemented as artificial neural networks. There
would be some difficulty in giving the travelling salesman system a preference for
shortest routes rather than just Hamiltonian cycles, but the other two examples
would not need to be altered. The satisfiability problem network would then
clearly implement a stochastic search-and-backtrack algorithm; as with so many
of our examples, its main virtue would be speed rather than power.
A final point that surprised us was that the initial motivation – using electricity
to “find the path of least resistance” – turned out not to be of any great help. In
particular, we naïvely hoped that this would give this type of system an advantage
over artificial neural networks. However, in the end, the fact that NP-complete
problems do not have any useful notion of “downhill” makes any system which
tries to exploit it problematic.
3.5
DNA computers
Just as DNA is able to encode all the information required to construct a living
organism (be it a bacterium, a tree or a human being), it should also be possible
to use it to encode partial or complete solutions to computational problems. If
two strands can also be combined in such a way that the information in the
combined strand represents a given logical operation applied to the information
in the original strands, then we have the basis for a computational device.
This idea was first proposed by Leonard Adleman [11] in the context of the Hamiltonian path problem. Given a directed graph, a start vertex and an end vertex,
the problem is to determine whether or not there is a path between the two
vertices which passes through all the other vertices exactly once. Adleman’s solution was to represent each directed edge by a strand of DNA, with characteristic
sequences at either end to represent the vertices.
Two strands of DNA can only combine if the subsequences which bind to each
other are complementary. The details are not relevant here, but suffice it to say
that the characteristic sequences were designed in such a way that the sequence
for a given vertex at the beginning of a strand and the sequence for the same
vertex at the end of a strand formed a complementary pair (and that no other
combinations were complementary). Thus a strand representing a → b could
combine with one representing b → c to give one representing a → b → c.
To solve the problem, Adleman prepared a solution containing large quantities
of all the “edge” strands, and then allowed them to freely combine, producing
a mixture of strands representing a large number of random paths through the
graph. He then used chemical means to filter out all the strands except those that
began and ended in the right places, had the appropriate length and contained
the characteristic sequences representing all the vertices. This left only those
strands representing solutions to the problem, which could then be sequenced
and thus read out.
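As a software caricature of this procedure, the sketch below randomly concatenates "edge strands" and then applies the same filters (correct endpoints, correct length, every vertex present). The example graph and the number of random strands are illustrative assumptions.

```python
import random

def adleman_filter(edges, start, end, n_vertices, n_strands=200000, seed=0):
    """Generate random paths by joining edges end-to-end, then filter,
    mimicking Adleman's generate-and-filter procedure in software."""
    rng = random.Random(seed)
    out_edges = {}
    for a, b in edges:
        out_edges.setdefault(a, []).append(b)

    solutions = set()
    for _ in range(n_strands):
        path = [start]
        # extend the strand at random until it reaches the end vertex or gets stuck
        while len(path) <= n_vertices and path[-1] in out_edges:
            path.append(rng.choice(out_edges[path[-1]]))
            if path[-1] == end:
                break
        # filtration: right endpoints, right length, every vertex present
        if (path[0] == start and path[-1] == end
                and len(path) == n_vertices
                and len(set(path)) == n_vertices):
            solutions.add(tuple(path))
    return solutions

edges = [(0, 1), (1, 2), (2, 3), (0, 2), (3, 1)]
print(adleman_filter(edges, start=0, end=3, n_vertices=4))  # {(0, 1, 2, 3)}
```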
Later work has shown that it may be possible to use this method to perform any
arbitrary logic operation, and hence that universal computation may, in principle,
be possible [14]. However, applying the method to combinatorial problems relies
on there being sufficiently many strands to make it likely that all possible combinations are represented in detectable numbers. In particular, if the filtration
process eliminates all the strands, we must be able to take it as evidence that
a solution does not exist, rather than simply that the strands representing the
solution did not happen to be created.
This is a serious problem, since the total number of strands must typically be
within a few orders of magnitude of Avogadro's number, 6 × 10²³. If we were to
apply this method to a travelling salesman-type problem, we reach this number
of possibilities with only 25 cities, considering only tours that visit all cities
exactly once. With this method, the initial generation phase will also generate
many tours which do not meet even this criterion; in fact, we would expect the
majority of strands to fail. As a result, applying the method to even 25 cities
might be somewhat over-optimistic.
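A quick back-of-the-envelope check makes this concrete. Counting distinct tours of n cities as undirected cycles, (n − 1)!/2 of them (an illustrative convention), the number of strands required reaches the Avogadro scale at around 25 cities:

```python
from math import factorial

AVOGADRO = 6.0e23
for n in (20, 25, 26, 30):
    tours = factorial(n - 1) // 2   # distinct undirected tours of n cities
    print(f"{n} cities: about {tours:.1e} tours "
          f"({tours / AVOGADRO:.1g} x Avogadro's number)")
```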
It should be noted that we are not saying that there is nothing of value in a DNA-based approach; it is able to provide massive parallelism, and this potentially
outweighs the cost of the rather laborious process that computation entails. What we
are saying is that it can only be of incremental benefit in solving combinatorial
problems.
3.6
Amorphous computers
Unlike the other examples in this chapter, amorphous computers are built from
ordinary computer hardware. (For further details, see [86].) The key idea behind
them is, rather than having one big computer, to have many small ones, connected
via radio-frequency links. Each computer can do very little on its own, and the
links are deliberately weak, to ensure that they can only communicate with others
in their local neighbourhood. The driving image is of ants in a colony, cells in a
body or neurons in a brain.
The benefits of such systems are that the elements are cheap to produce, and
can be added or removed easily. In fact, the algorithms they run are intended to
mimic the behaviour of their biological counterparts in being insensitive to the
number of elements or their connections; order should spontaneously emerge from
the chaos. As a result, they should be intrinsically fault-tolerant, and capable
of many of the same feats as the human brain. (In fact, amorphous computers
could easily simulate artificial neural networks.) In light of the previous analysis,
this approach holds nothing really new, though it may well be of practical use.
3.7
Summary
In the end, all the physical systems we have considered boil down either to
stochastic local search, or a more flexible implementation of ordinary computer
circuits. Potentially, this can be done very fast, and adaptability is practically
very useful, but none of them give us anything fundamentally new. This is in
sharp distinction to quantum systems, which we will discuss in the next chapter.
Chapter 4
Quantum computers
Quantum computing will not replace classical computing for similar
reasons that quantum physics does not replace classical physics: no
one ever consulted Heisenberg in order to design a house, and no one
takes their car to be mended by a quantum mechanic.
Andrew Steane, quoted from [22]
As computers grow ever smaller (and ever faster), quantum mechanical effects become increasingly important, even for conventional computers. The strange
effects this introduces (such as electrons potentially tunneling their way into different circuits) might appear to be more of a hindrance than a help, something to
be fought against. However, this is not the case, as Feynman [23] and Benioff [24]
independently realised in 1982. Feynman’s reasoning is particularly relevant here:
he realised that there are some quantum systems which could not be simulated
by a classical computer without an exponential slowdown. This is in direct contradiction with our conclusion (see Chapter 2) that physical systems can only
cause polynomial slowdowns, and throws the Church-Turing thesis into doubt.
This revelation led Deutsch [25] to the specification of a universal quantum computer, the quantum equivalent of a Turing machine. This then stimulated an
avalanche of research into the possibilities of computers based on quantum mechanical principles, culminating in the discovery that, for some problems, it was
indeed possible to gain an exponential speedup. Initially, the problems that were
tackled were quite esoteric, but in 1996 Shor [4] sent shockwaves through the
community by showing how to factor large numbers in polynomial time. Since
then, however, little progress has been made, and the repertoire of problems for
which useful quantum algorithms have been found remains small. This has led
many – such as Steane, above – to conclude that quantum computers will only
be of limited use.
In this chapter, we first introduce the basic ideas of quantum mechanics, before
going on to discuss the progress that has been made in designing powerful quantum algorithms. This will lead us to an assessment of the benefits and problems
involved in putting quantum mechanics to use in computation, and on to the seat
of its power.
There is hardly room in this thesis for us to do more than provide the most
cursory overview of the subject, highlighting the points we need. For a more
detailed introduction to quantum mechanics, see e.g. [26], or, for one with a
computing focus, see [27].
4.1
Quantum mechanics, a brief introduction
Quantum mechanics is, at its heart, mathematically quite simple. Even though
its consequences are strange and counterintuitive, the story begins with very
little more than an ordinary classical wave equation. The reason why quantum
mechanics appears strange is that this equation applies not just to waves, but
also to particles: the celebrated wave-particle duality. However, once we come to
terms with this, there is really little more about the theory that is different to
ordinary classical theory.
4.1.1
Wave-particle duality
We are all familiar with the idea that waves can be made up of particles; after
all, water waves are in fact composed of discrete water molecules. Provided that
the number of particles involved is high enough, we can define a “wave function”
in terms of the expected number of particles at any given point which will be a
good approximate representation of the system’s overall behaviour. The only real
difference is that quantum mechanics postulates that this will continue to be a
good representation even when the number of particles drops drastically. In fact,
it states that it is a good description even of a single particle. In other words,
the probability distribution that we previously viewed as a good tool has to be
accorded some sort of physical reality.
The energy of a simple harmonic oscillator (such as electromagnetic radiation)
is proportional to the square of its amplitude and inversely proportional to its
wavelength. Accordingly, the appropriate wave function to use here is one where
the square of the amplitude gives a probability (as this scales naturally into an
expected number of particles and hence an expected energy).
Conversely, in giving a particle-like interpretation to electromagnetic waves, it
is natural to assume that the energy of the resulting “wave packets” should be
inversely proportional to their wavelength. This results in the Planck hypothesis
E = hν for the energy of photons, where h is Planck’s constant.
In short, quantum mechanics describes everything in terms of particles with an
associated probability wave function; for high particle density (e.g. electromagnetic radiation), this yields a classical wave theory. As a result, we extend the
Planck hypothesis to all particles, not just photons. One further result that we
shall need, due to de Broglie, comes from reconciling this with Einstein’s special
relativity. Knowing that E = mc², and also that c = νλ for photons, we find
$$h\nu = \frac{hc}{\lambda} = mc^2 = (mc)c = pc,$$
where p is the momentum of the photon. From this, we have p = h/λ. De Broglie
hypothesised that, since we are viewing everything in the same terms, this ought
to apply to material particles as well.
4.1.2
The Schrödinger wave equation
The Schrödinger wave equation is simply a variation on a classical wave equation,
used to impose energy conservation on the waves. For example, the classical
equation for waves propagating along a string is
$$\frac{\partial^2 \Psi(x,t)}{\partial t^2} = v^2\,\frac{\partial^2 \Psi(x,t)}{\partial x^2},$$
where Ψ(x, t) represents the displacement of the string at position x and time
t, and v is the speed of the wave. This is derived as a consequence of Newton’s
F = ma, as applied to elements of the string (i.e. relating the acceleration of a
string segment to the net force on it due to the rest of the string). The most
general solutions to this equation are plane waves with amplitude A, wavelength
λ and frequency ν, usually written in the form
Ψ(x, t) = A cos(kx − ωt),
where ω = 2πν and k = 2π/λ. In terms of these waves, the wave equation can
then be seen as imposing the condition v²k² = ω² on the waves, which can be
rewritten in the more usual form v = νλ.
Seen from this perspective, the Schrödinger wave equation – the cornerstone of
quantum mechanics – simply imposes conservation of energy on a very similar
sort of wavefunction. For waves of the form given above, we find
$$-\frac{\partial^2 \Psi(x,t)}{\partial x^2} = k^2\,\Psi(x,t).$$
Using de Broglie's relation, we find k² = (2π)²/λ² = (2π)²p²/h². The kinetic
energy of a particle of mass m can be written as ½mv² = (mv)²/2m = p²/2m,
so we can write
$$-\left(\frac{h}{2\pi}\right)^2 \frac{1}{2m}\,\frac{\partial^2 \Psi(x,t)}{\partial x^2} = K\,\Psi(x,t),$$
where K is the kinetic energy of the “particle” associated with the wave. Using
V (x) to represent the potential energy of the particle at point x, we then have
Ã
h
−
2π
!2
1 ∂ 2 Ψ(x, t)
+ V (x)Ψ(x, t) = EΨ(x, t),
2m ∂x2
where E is the total energy of the particle. By treating E as being fixed, this can
be seen as imposing conservation of energy; it also happens to be Schrödinger’s
wave equation, in its time-independent form.
Up to this point, we have been cheating slightly: the wavefunctions we have been
considering have been real-valued quantities, whereas true quantum-mechanical
wavefunctions are complex-valued. While this is important, the same general
phenomena are present in both cases. (The idea of a complex-valued wave is not,
in fact, intrinsically quantum mechanical: in classical electromagnetism, electromagnetic waves are also complex.) Let us now, however, correct this solecism
and consider the complex wave
Ψ(x, t) = A cos(kx − ωt) + iA sin(kx − ωt)
= A exp(i[kx − ωt]).
The same arguments still apply, but now we also have
$$ih\,\frac{\partial \Psi(x,t)}{\partial t} = h\omega\,\Psi(x,t) = 2\pi h\nu\,\Psi(x,t).$$
Recalling the Planck hypothesis, E = hν, this allows us to rewrite the Schrödinger
equation as
$$-\left(\frac{h}{2\pi}\right)^2 \frac{1}{2m}\,\frac{\partial^2 \Psi(x,t)}{\partial x^2} + V(x)\,\Psi(x,t) = \frac{ih}{2\pi}\,\frac{\partial \Psi(x,t)}{\partial t},$$
which is its full, time-dependent form.
4.1.3
Quantum states
As with all wave theories, it is natural to think of decomposing the wavefunction,
writing it as a sum of terms in a particular basis. With a sound wave, the natural
basis set is the set of all sine waves, and this allows us to picture a general sound
wave as being made up of pure tones of different frequencies, all playing together.
In quantum mechanics, the natural basis set comprises the possible states of the
particles involved in the system. Using Dirac's notation, these are usually written
as |Ψ⟩, so we could say
$$\Psi = a|\Psi_1\rangle + b|\Psi_2\rangle$$
if the system had only two possible states, Ψ1 and Ψ2. Note that |Ψ1⟩ and |Ψ2⟩
are just shorthand for particular quantum waves, but wave-particle duality also
implies that they represent particular states of the system.
As mentioned above, we obtain a probability distribution from the square of the
wave function. Given that the wave function is complex, however, this actually
means multiplying it by its complex conjugate; for the plane wave above,
$$\Psi^*\Psi = (A\exp(i[kx-\omega t]))(A\exp(-i[kx-\omega t])) = A^2,$$
where Ψ* represents the complex conjugate of Ψ. States also have complex conjugates, denoted by ⟨Ψ|, so we could say
$$\Psi^*\Psi = (a|\Psi_1\rangle + b|\Psi_2\rangle)(a\langle\Psi_1| + b\langle\Psi_2|)
= a^2\langle\Psi_1|\Psi_1\rangle + b^2\langle\Psi_2|\Psi_2\rangle + ab\,(\langle\Psi_1|\Psi_2\rangle + \langle\Psi_2|\Psi_1\rangle).$$
Sets of basis states are almost always chosen to be orthonormal, meaning that
⟨Ψi|Ψj⟩ is 1 if i = j and zero otherwise. This reduces the above to just a² + b².
Since we are implicitly considering all of space, and the total probability of the
system being somewhere is one, we must have a² + b² = 1. (In what follows,
however, we will at times neglect such normalisation factors, in the interests of
clarity.) More generally, ⟨Ψ1|Ψ2⟩ is the probability that the system is both in the
state |Ψ1⟩ and the state |Ψ2⟩.
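As a concrete illustration (a minimal sketch using NumPy, with |Ψ1⟩ and |Ψ2⟩ taken to be the standard basis vectors), the bra-ket manipulations above are just complex dot products:

```python
import numpy as np

psi1 = np.array([1.0, 0.0], dtype=complex)   # |Psi_1>
psi2 = np.array([0.0, 1.0], dtype=complex)   # |Psi_2>

a, b = 0.6, 0.8                              # amplitudes with a^2 + b^2 = 1
psi = a * psi1 + b * psi2                    # superposition a|Psi_1> + b|Psi_2>

# <Psi|Psi>: multiply by the complex conjugate and sum
print(np.vdot(psi, psi).real)                # 1.0 (normalised)
print(np.vdot(psi1, psi2))                   # 0: orthonormal basis states
print(abs(np.vdot(psi1, psi)) ** 2)          # a^2: weight of |Psi_1> in psi
```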
The total energy of a quantum system is often called the Hamiltonian, H, by
analogy with classical mechanics, and the Schrödinger equation is written as
$$\frac{\partial}{\partial t}\Psi = -iH\Psi.$$
At first, it appears that we have thrown something away by doing this, but it
encapsulates everything that we need for thinking about quantum computers.
Once we have determined a set of basis states, we can represent the Hamiltonian
as a matrix, operating on vectors in the space of basis states, e.g.
$$H(t) = \begin{pmatrix} i(t) & j(t) \\ k(t) & l(t) \end{pmatrix},$$
and hence to write
$$\begin{pmatrix} i(t) & j(t) \\ k(t) & l(t) \end{pmatrix}\begin{pmatrix} a|\Psi_1\rangle \\ b|\Psi_2\rangle \end{pmatrix} = (a\,i(t) + b\,j(t))\,|\Psi_1\rangle + (a\,k(t) + b\,l(t))\,|\Psi_2\rangle.$$
The only fundamental requirement on the Hamiltonian is that it should be self-adjoint, i.e. that H† = (Hᵀ)* = H, where Hᵀ is the transpose of H. The reason
for this can be seen if we try to interpret ⟨Ψ2|H|Ψ1⟩. This can be described as the
probability that operating on |Ψ1⟩ with H will yield the state |Ψ2⟩. The problem
is that it can also be described as the probability that operating on ⟨Ψ2| with H
will yield the state ⟨Ψ1|. Given that ⟨Ψ2| is the complex conjugate of |Ψ2⟩ and
that we are now operating on the right rather than the left, consistency requires
that H be self-adjoint.
If we evolve our quantum system forward in time by a small amount ∆t, the
Schrödinger equation gives its state approximately as
Ψ(t + ∆t) = (I − iH∆t)Ψ(t) = U(∆t)Ψ(t),
where we have introduced the time evolution operator U. In the limit as ∆t → 0,
U is unitary, i.e. U † U = 1; this ensures that
(U(t)Ψ)∗ (U(t)Ψ) = Ψ∗ U† (t)U(t)Ψ = Ψ∗ Ψ = 1,
i.e. that the wavefunction has a consistent interpretation over time as a probability
amplitude.
If the Hamiltonian does not change with time, we can write U(t) = exp(−itH) for
macroscopic times; this is the usual situation in quantum computers.
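A small numerical sketch of this (assuming NumPy and SciPy are available; the 2×2 Hamiltonian below is an arbitrary illustrative choice) confirms that the evolution operator built from a self-adjoint H is unitary and preserves the norm of the state:

```python
import numpy as np
from scipy.linalg import expm

H = np.array([[1.0, 0.5 - 0.2j],
              [0.5 + 0.2j, -1.0]])            # self-adjoint: H == H.conj().T

t = 0.7
U = expm(-1j * H * t)                          # U(t) = exp(-itH)

print(np.allclose(U.conj().T @ U, np.eye(2)))  # True: U is unitary

psi0 = np.array([1.0, 0.0], dtype=complex)     # start in the first basis state
psi_t = U @ psi0
print(np.vdot(psi_t, psi_t).real)              # 1.0: the norm is preserved
```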
4.1.4
Operators and measurement
With each observable quantity (such as position and momentum), we associate
a quantum operator. Thinking in terms of the vector of basis states introduced
above, we can think of the operator as a matrix. The values that the quantity
can take on are given by the eigenvalues of the operator. Thus, for example, if
we have an operator A and we find that
$$A\Psi = \tfrac{1}{2}\,\Psi,$$
then we say that, for the wavefunction Ψ, the quantity associated with the operator A has the definite value ½. Eigenstates of a given operator have a definite
value for the associated quantity and are undisturbed by the act of measuring
that quantity. Otherwise, the act of measurement causes the wavefunction to be
projected down onto one of the eigenstates of the operator. This may seem odd,
as the set of eigenstates forms a basis set, in terms of which any wavefunction
may be expressed, so we would expect a sum of states to be the result; this is a
central problem in quantum mechanics, of which more later.
When measurement does disturb the wavefunction, measurement will not always
yield the same value. In this case, we can only define an average, or expectation
value for the operator, given by
$$\langle A\rangle = \langle\Psi|A|\Psi\rangle.$$
From the above, this represents the probability that the wavefunction will emerge
from the operation unchanged.
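The following sketch (NumPy assumed; the operator A is an arbitrary example) makes these rules concrete: the possible results are the eigenvalues of A, the probability of each is the squared overlap with the corresponding eigenstate, and ⟨A⟩ = ⟨Ψ|A|Ψ⟩ is their weighted average:

```python
import numpy as np

A = np.array([[0.5, 0.5],
              [0.5, 0.5]])                   # a self-adjoint operator

psi = np.array([1.0, 0.0], dtype=complex)    # not an eigenstate of A

values, vectors = np.linalg.eigh(A)          # possible measurement results
probs = np.abs(vectors.conj().T @ psi) ** 2  # Born probabilities

expectation = np.vdot(psi, A @ psi).real     # <Psi|A|Psi>
print(values, probs)                         # eigenvalues 0 and 1, each with prob 0.5
print(expectation, np.dot(probs, values))    # both 0.5: the same average
```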
Although operators all have an associated set of basis states, the sets are not
necessarily the same. When two operators, say A and B, do share a set of basis
states, this means that the effects of applying AB and BA are the same; in other
words, they commute. In consequence, the values for a set of quantities which
are associated with operators which form a commuting set can all be measured
without disturbing the state; we call this a set of observables. If we attempt to
measure quantities associated with operators which do not commute, the second
measurement will disturb the state found by the first, meaning that the first
measurement will no longer be true of the current state.
4.1.5
Quantisation and spin
Many quantities in quantum mechanical systems turn out to take only a discrete
set of values. For example, electrons can only occupy a discrete set of orbits
round an atom. This means that there are only a finite set of possible energy
“jumps” that electrons can make, so atoms can only absorb or emit radiation (i.e.
photons) with particular energies, giving rise to a characteristic set of spectral
absorption/emission lines.
The reason for this is that the electron must now be thought of as being “smeared
out” round its orbit. Since the potential energy at all points of a circular orbit
is the same, the resulting wavefunction looks like a plane wave bent round into
a circle. As a result, demanding that the wavefunction is smooth and continuous
means that the only allowable orbits are those whose length is an integer multiple
of the electron’s wavelength at that energy.
As well as orbital angular momentum, electrons also have an intrinsic angular
momentum, called spin; it is convenient – though technically wrong – to think
of this as being due to the electron spinning on its axis. Like its orbital angular
momentum – and for similar reasons – this too is quantised, taking values of
either h/4π or −h/4π, which we will refer to as "spin up" and "spin down". (We
will denote these by |↑⟩ and |↓⟩ respectively.) In fact, all particles have spin, and
it is always some multiple of h/2π; as a result, electrons are said to be "spin-½".
All common material particles are spin-½, and even the more exotic types have a
spin which is an odd multiple of ½. By contrast, all massless particles (such as
photons) have integral spin; the photon is “spin-1”.
With spin, we associate three operators, one for each orthogonal direction: Sx, Sy
and Sz. It is usual, in considering problems involving spin, to use the eigenstates
of Sz as a basis, giving us
$$S_z = \frac{h}{4\pi}\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$$
for the spin of an electron. We also find that
$$S_x = \frac{h}{4\pi}\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad S_y = \frac{ih}{4\pi}\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}.$$
(These three matrices, without their factors of h/4π, are known as the Pauli spin
matrices.) The important point to note about Sx and Sy is that, if the electron
is in an eigenstate of Sz , then both their expectation values are zero. In other
words, if the electron is definitely spin up with respect to the z-axis, the results
of a measurement along either of the other axes will be exactly even.
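A quick check of that last claim (a NumPy sketch, with units chosen so that h/4π = 1) shows that in the Sz eigenstate |↑⟩ the expectation values of Sx and Sy vanish, while the two outcomes ±1 for a measurement along x are equally likely:

```python
import numpy as np

# Pauli spin matrices (factors of h/4*pi omitted)
Sx = np.array([[0, 1], [1, 0]], dtype=complex)
Sy = np.array([[0, -1j], [1j, 0]])
Sz = np.array([[1, 0], [0, -1]], dtype=complex)

up = np.array([1.0, 0.0], dtype=complex)      # |up>: the +1 eigenstate of Sz

print(np.vdot(up, Sz @ up).real)              # 1.0
print(np.vdot(up, Sx @ up).real)              # 0.0
print(np.vdot(up, Sy @ up).real)              # 0.0

# probabilities of the +1/-1 outcomes for a measurement of Sx
vals, vecs = np.linalg.eigh(Sx)
print(np.abs(vecs.conj().T @ up) ** 2)        # [0.5, 0.5]: exactly even
```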
4.1.6
Quantum parallelism, entanglement and interference
An important feature of wave equations, classical and quantum alike, is that they
are linear, meaning that, if Ψ1 and Ψ2 are both solutions, then so is Ψ1 + Ψ2.
This means that, in principle, arbitrarily many solutions can be superposed (i.e.
added together), manipulated simultaneously, and then the results “read out”,
by Fourier decomposition or otherwise.
The problem with this is that, since the wavefunction of a system represents the
probability of finding it in a given state, the accuracy of the results depends on
how accurately the observed particle counts match the underlying wave function.
Thus, the greater the superposition, the greater the number of particles required
for this method to give an accurate result. That said, however, careful application
of this general principle is at the heart of all known powerful quantum algorithms.
There is an important difference between superpositions in quantum theory and
those in classical theories: quantum superpositions can be entangled. In a classical wave system, the waves are the most fundamental objects; in quantum
mechanics, they can represent the joint state of a (possibly large) number of
particles. This means that we can have wavefunctions such as
$$\Psi = a|\!\downarrow\downarrow\rangle + b|\!\downarrow\uparrow\rangle + c|\!\uparrow\downarrow\rangle + d|\!\uparrow\uparrow\rangle$$
for the joint state of a pair of spins. The reason why states such as this are called
entangled is because measurement of one of the particles affects the probability of
finding the other in a particular state. For example, measuring the first particle
as being spin down means that the probability that a measurement of the second
particle would also yield spin down is a²; measuring the first particle as being spin
up changes this to c². In other words, operations carried out on one particle have
an effect on all the particles. This does not apply just to measurements, but to
any operation applied to a subset of the particles. For example, a commonly-used
operation is the Walsh-Hadamard transformation
$$H = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix},$$
which transforms the state |↑⟩ into (1/√2)(|↑⟩ + |↓⟩). If this is applied to the first
particle of a pair in the state |↑↑⟩, it yields (1/√2)(|↑↑⟩ + |↓↑⟩). This means that,
for example, applying it successively to each particle in an N-particle system
yields a superposition of all 2^N possible states.
This leads us on to the other important aspect of entanglement: that it exponentially increases the size of the space of states. For N particles, we can have a
superposition of 2^N terms where, by contrast, N waves allow a superposition of
only N terms. The reason for this increase is that each particle in the quantum
system is allowed to, in a sense, be in several states simultaneously; with several
particles, this leads to a combinatorial explosion.
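The exponential growth of the state space is easy to see numerically. In the sketch below (NumPy assumed, with |↑⟩ = (1, 0) and |↓⟩ = (0, 1)), a Walsh-Hadamard transform applied to each qubit of |↑↑...↑⟩ produces an equal superposition of all 2^N basis states, even though only N single-qubit gates are used:

```python
import numpy as np
from functools import reduce

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)    # Walsh-Hadamard gate
up = np.array([1.0, 0.0])                       # |up>

N = 4
state = reduce(np.kron, [up] * N)               # |up up ... up>, a 2^N-vector
H_all = reduce(np.kron, [H] * N)                # H applied to every qubit

out = H_all @ state
print(out.shape)                                # (16,): 2^N amplitudes
print(np.allclose(out, 1 / np.sqrt(2 ** N)))    # True: equal superposition

# H on just the first qubit of |up up> gives (|up up> + |down up>)/sqrt(2)
print(np.kron(H, np.eye(2)) @ np.kron(up, up))
```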
The other important feature of quantum systems, which is exploited by a number
of quantum algorithms, is interference. As with classical waves, quantum waves
which are divided and then recombined can interfere with each other. The canonical example of this is the two-slit experiment, which can be carried out equally
well with water waves, laser beams or electron beams.
4.1.7
Heisenberg’s uncertainty relations
A further consequence of wave picture at the root of quantum mechanics is that,
rather than talking about absolute positions and momenta, we must talk about
distributions: the wavefunction we have been talking about so far can be written
as an integral – in fact, a Fourier transform – over the particle’s momentum distribution. Similarly, we could have chosen to define the wavefunction in momentum
space rather than coordinate space, in which case it would have been a Fourier
transform over the particle’s space probability distribution.
A consequence of these two distributions being Fourier transforms of each other
is that compressing one distribution causes the other to spread out: the more
precisely the particle is confined in space, the greater the uncertainty in its momentum, and vice versa. As a result, there is a minimum limit to the combined
uncertainty in position and momentum, given by ∆x∆p ≥ h/2π – one of Heisenberg's uncertainty relations. Another way of looking at this result is as an implication of the fact that the operators for position and momentum do not commute;
measuring one disturbs a measurement of the other. In fact, Heisenberg’s relations apply to any pair of “complementary” variables (i.e. variables which can
be defined as Fourier transforms of each other), and we can write down similar
relations for energy and time, for example.
It is perhaps worth reiterating here that this is, again, a general feature of wave
theories; a sharp kink in a string also leads to waves with a wide variety of different
wavelengths, and the only way to restrict the system to waves of one wavelength
is to start from a displacement distribution spread over the whole string. The
only reason that this seems strange is because what we measure are particles, not
distributions.
4.1.8
The problem of measurement
As is hopefully becoming clear, the strangeness of quantum mechanics has little to
do with its mathematical formalism – which is that of an ordinary wave theory,
though admittedly applied to complex-valued wavefunctions – and everything
to do with the fact that what we actually observe are particles, with a definite
position and momentum. This is difficult to reconcile with the rest of the theory,
and has been the topic of much debate over the years; there is still no definitive
answer.
In the early days of quantum theory, the dominant interpretation was that of Niels
Bohr, now called the Copenhagen interpretation. This stated that, on measuring,
say, the position of a particle, the distribution "collapses" to reflect a definite position, with a probability given by the square of the wavefunction at that point.
(In the process, this also introduces a massive uncertainty in the particle's momentum.) Mathematically, we would expect that the operator representing the
measurement should yield a superposition of states; the Copenhagen interpretation states that all of the terms but one simply vanish. This leads to the correct numerical
results, but suffers from being unmotivated by anything else in the theory.
The Copenhagen interpretation has also led many to assume that there is something special about conscious observers, and to postulate various theories about
what constitutes a conscious observer, and whether other conscious observers can
be said to be in a superposition relative to ourselves before we observe them. To
be fair to Bohr, this is not what he meant: any process by which information
about a particle in superposition “leaks out” into the rest of the world counts as
an observation. For example, air molecules that come near enough to be affected
by it count as observers, because we could then – in principle at least – make
measurements on them to determine the state of the particle, without having to
do anything to the particle itself. Consciousness has nothing to do with it: we
are simply one means by which information can be recorded and transmitted.
The other commonly-held view is the “many worlds” interpretation, first proposed
by Everett. In this view, the wave function does not “collapse”; measurement does
indeed yield a superposition, and simply entwines the observer with the observed.
Thus, observing a particle in superposition results in another superposition, each
term representing the particle being observed to be in a different position. The
terms are almost certain to evolve separately from then on, meaning that each
term can be thought of as a separate “world”, independent of all the others.
The many worlds interpretation leads to essentially the same consequences in
terms of what we observe: all the other worlds are independent, so there is no
way to tell whether they are there, or whether the wave function collapsed. The
consequences are not exactly the same, because there is a very short period after
an observation when the terms in the superposition are still sufficiently similar
to each other for interference to be possible. In practice, however, this period is
so short (about 10⁻²⁰ s) that no experiments have yet been able to detect it.
This interpretation resolves the problems of the Copenhagen interpretation, in
that it adds essentially nothing to the theory, but it does paint a very strange
picture. On a personal note, we feel that mere strangeness is no reason to dismiss
a theory, and tend to prefer it as being the more natural of the two in other ways.
However, it is largely a question of taste at this stage: the lack of any practical
differences in their predictions makes them equally valid.
4.1.9
Quantum computers
The quantum analogue of the bit is the qubit. In ordinary computers, bits are
represented by physical systems which can be in one of two states; the first computers used vacuum tubes, modern ones use tiny capacitors. Quantum computers
typically use electron spin (representing 1 by “spin up” and 0 by “spin down”,
for example), or the polarization of a photon (left circularly polarised for 1, right
circularly polarised for 0, for example). Just as with bits, these can take on two
possible values; now, though, they can do both simultaneously.
Designing quantum logic gates is made more difficult by the fact that quantum
operators must be unitary, and hence must be reversible. Ordinary logic gates,
such as AND, throw away information – knowing that the AND of two inputs
is 0 does not tell you what the inputs were – so they cannot be implemented
in a reversible manner. Thankfully, it turns out that universal computation is
Figure 4.1: Fredkin's "billiard ball" model of computation. The presence or absence
of a ball represents a value of 1 or 0, respectively, so this implements a two-input,
four-output reversible logic gate. (The inputs are labelled A and B; the outputs are
labelled A ∧ B, ¬A ∧ B, A ∧ ¬B and A ∧ B.)
still possible; we just need to use different gates. An entertaining demonstration
of this was discovered by Fredkin [61], which takes advantage of the fact that
a classical collision between two hard spheres diverts each one from the path it
would otherwise have taken; this is known as the “billiard ball” model, and is
illustrated in Figure 4.1. Among the outputs of the "gate" is the AND of its inputs;
the point is that it also has other outputs which, between them, allow the inputs
to be uniquely determined. It is easy to see, for example, how the addition of a
pair of “mirrors” would allow the original trajectories to be recovered. Fredkin
was able to show that, with enough balls and mirrors, any logical operation could
be accomplished; the only problem is that the system would generate a lot of
useless output as well.
Fredkin also designed the first reversible quantum gate, which now bears his
name. This has three inputs and three outputs. The first input is a “control”
and is output unchanged; the other two inputs are swapped if the control is 1
and output unchanged if it is zero; the truth table is shown in Table 4.1.
The crucial aspect of this gate is that the set of outputs is a permutation of the
set of inputs; in fact, a second application of the gate would undo the effect of
the first one. The idea of having one or more “control” bits is common to many
quantum gates. For example, Toffoli devised a gate with two controls, and a third
bit which is flipped only if both of the controls are 1; the truth table is shown in
Table 4.2.
    Inputs       Outputs
  i1  i2  i3    o1  o2  o3
   0   0   0     0   0   0
   0   0   1     0   0   1
   0   1   0     0   1   0
   0   1   1     0   1   1
   1   0   0     1   0   0
   1   0   1     1   1   0
   1   1   0     1   0   1
   1   1   1     1   1   1

Table 4.1: Truth table for the Fredkin gate. Input i1 acts as the control.
    Inputs       Outputs
  i1  i2  i3    o1  o2  o3
   0   0   0     0   0   0
   0   0   1     0   0   1
   0   1   0     0   1   0
   0   1   1     0   1   1
   1   0   0     1   0   0
   1   0   1     1   0   1
   1   1   0     1   1   1
   1   1   1     1   1   0

Table 4.2: Truth table for the Toffoli gate, also known as the "controlled-controlled-NOT" (CCNOT) gate. Inputs i1 and i2 act as the controls.
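Both gates are easy to express in software, and doing so makes their reversibility obvious: applying either gate twice returns the original inputs. The sketch below (plain Python, with bits as 0/1 integers) reproduces Tables 4.1 and 4.2:

```python
from itertools import product

def fredkin(c, a, b):
    """Swap a and b when the control c is 1."""
    return (c, b, a) if c == 1 else (c, a, b)

def toffoli(c1, c2, t):
    """Flip the target t when both controls are 1 (CCNOT)."""
    return (c1, c2, t ^ (c1 & c2))

for bits in product((0, 1), repeat=3):
    assert fredkin(*fredkin(*bits)) == bits      # self-inverse
    assert toffoli(*toffoli(*bits)) == bits      # self-inverse
    print(bits, fredkin(*bits), toffoli(*bits))  # rows of Tables 4.1 and 4.2
```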
The most commonly considered gates apart from these two are the "controlled-NOT" (CNOT)¹ and a single qubit phase rotation.
Deutsch [25] showed that it was possible to choose a Hamiltonian for a system
(i.e. set the potential field appropriately) so as to make its evolution perform any
unitary transformation². In particular, he showed that any unitary transformation could be effected by a finite set of "quantum gates". A suitable "universal
set” is given by the CNOT gate and a single qubit phase rotation, for example,
making this result analogous to the situation for ordinary computers. Given this,
we are free to consider algorithms specified in terms of arbitrary unitary transformations, and to take it for granted that they can be implemented efficiently.
4.1.10
Approaches to building quantum computers
Although Feynman and Benioff both came up with the concept of a quantum
computer, their designs are very different. If we imagine a basis of states that
corresponds naturally to the possible states of the problem to be handled, the
simplest option is to apply a unitary transformation which transforms each state
into its successor in the computation. Applying this (time independent) Hamiltonian for a suitable length of time will then drive the initial state of the computer
to the solution state. The problem is that, to remain unitary, this transformation must be a permutation, i.e. every state must have a successor, including the
solution state. Thus, unlike an ordinary algorithm, where the solution state is a
“halt state”, we have to define a successor for it.
Benioff’s solution to this was essentially to give the computer a “clock”, applying
a series of three operators in a cycle. This way, each operator was not required
to be a permutation; like the Walsh-Hadamard transformation, it could push the
state of the computer to an equilibrium. The overall transformation accomplished
by a cycle was still a permutation, but applying it in this way allowed its speed
to be controlled, and the calculation to be stopped at the solution state.

¹ As the name implies, this has one control, giving it a total of two inputs and two outputs.
² Due to the unitarity of the time evolution operator, all quantum transformations must be unitary.
Feynman’s solution was rather different. He noted that, for an arbitrary matrix
T, T + T† was automatically self-adjoint. Thus, he was allowed to define the
successor of the solution state to be the state itself in constructing T. The problem is that this is analogous to the situation for spin waves in a one-dimensional
crystal, which can propagate in either direction depending on their initial momentum. Here, the computation can proceed in either direction, obliging us to
give the system enough momentum to keep it going in the right direction. The
problem is that, for long calculations, thermal fluctuations or other effects can
contrive to reverse the computer’s direction and make it undo what it has done.
Thus, the computer’s evolution will have an element of Brownian motion, and
there can be no guarantee that it will complete the calculation.
4.1.11
Error correction
Quis custodiet ipsos custodes?
(Who watches the watchmen?)
The greatest practical difficulty involved in actually building a quantum computer
is error correction. So far, we have been treating quantum systems as being
completely isolated from the outside world, but clearly this cannot really be true.
The effect of the computer interacting with its environment is equivalent to the
application of random operators; for example, for one qubit we could have
$$E\bigl(a|e_i 0\rangle + b|e_i 1\rangle\bigr) = a\bigl(c_{00}|e_{00} 0\rangle + c_{01}|e_{01} 1\rangle\bigr) + b\bigl(c_{10}|e_{10} 1\rangle + c_{11}|e_{11} 0\rangle\bigr),$$
where E is an operator applied by the environment and ei summarizes the state
of the environment. As well as simply mixing up the state of the computer,
these interactions can allow information about the state of the computer to “leak
out” into the outside world, which counts as a measurement. This can cause
the delicate superpositional state of the computer to collapse; a process called
decoherence.
It was initially thought that these effects would make quantum computers of any
useful size impractical, but Calderbank and Shor showed that quantum versions of
error-correcting codes can be used to good effect [63]. As we shall see in Chapter 5,
classical error-correcting codes encode the same information in several different
places, allowing a certain number of errors to be detected and even corrected. The
trick is to encode the possible states of the computer in a larger number of qubits,
and to construct a set of operators which include at least the states corresponding
to correct states of the computer in their set of basis states. This means that,
if there has been no error, measuring any of these operators will not disturb
the computer. If there has been an error, measuring one of these operators will
collapse the state of the computer onto one of the basis states of the operator,
converting a continuous error into a discrete error. Classical error-correcting
codes have a parity-check matrix, which is used to identify errors: codewords
give the null vector when multiplied by the parity-check matrix, and any other
result can be used to diagnose the error. In the quantum case, this is replaced
by combinations of operators from our set; the resulting set of measurements can
be used to diagnose the (now discrete) errors.
A potential problem with these error-correcting schemes is that they must inevitably be implemented on quantum hardware, which is itself subject to error.
Thankfully, however, Shor and Kitaev independently showed [62] that it was possible for such systems to correct more errors than they introduced. Estimates of
the requirements for such fault-tolerant quantum computing [64] are, however, severe: they can only tolerate an error probability of 10⁻⁵ per qubit per gate. This
compares very favourably with the estimate of 10⁻¹³ for quantum computation
without error correction, but it still represents a formidable challenge, particularly when we remember that the additional hardware required could well make
the computer ten times larger.
Another difficulty with error-correcting codes is that almost all the codes that
have so far been constructed do not have a practical decoding algorithm (i.e. one
that is polynomial in the length of the code block), and so would lead to exponential growth in the length of the computation. Recently, however, MacKay [65] has
constructed the only known example of a practical error correcting code, based
on a classical sparse graph code.
The fact that error-correction is possible at all may seem very strange, in light
of the discussion in Chapter 2, where we argued that any computationally useful physical system could be simulated in polynomial time on a conventional
computer, due to the impossibility of packing more than a finite amount of information into a finite space without error. By contrast, we seem to be able to
pack a superposition of 2^N terms into a space that increases only linearly with
N, leading to an exponential increase in the information required for simulation³.
The trick here is that we can think of the terms in a superposition as effectively or
actually being in separate universes, each of which is individually subject to the
information bound. By packing all of these terms on top of each other, we have
effectively circumvented the bound. On the other hand, this makes the effect of
noise more significant, which is why we have to be aggressive in correcting errors
before we see a benefit.
4.2
Quantum algorithms
David Deutsch [25] is credited with the discovery of the first quantum algorithm
which is demonstrably more efficient than any algorithm which could be executed
on a classical computer. Since the algorithm is so simple, it provides us with a
useful “toy model” with which to unpick the origins of this power.
Deutsch considered the simplest possible boolean function f , which maps a single
boolean value to another boolean value. There are only four such functions: two
constant functions (f (0) = f (1) = 0 and f (0) = f (1) = 1) and two “balanced”
functions (f (0) = 1, f (1) = 0 and f (0) = 0, f (1) = 1). Classically, there is no
way of discovering whether a particular (unknown) function is constant or balanced without evaluating both f(0) and f(1); Deutsch discovered that, quantum
mechanically, the determination could be done in one step.

³ This is the crux of Feynman's [23] argument that conventional computers were incapable of simulating a quantum system without an exponential slowdown.
The algorithm uses two types of quantum gate: a √NOT gate, and one which
evaluates the function. The √NOT gate can be represented by the (unitary) matrix
$$N = \frac{1}{\sqrt{2}}\begin{pmatrix} i & 1 \\ 1 & i \end{pmatrix};$$
the name comes from the fact that
$$N^2 = i\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},$$
meaning that N² acts as an ordinary NOT gate (up to an overall phase factor).
The function evaluation gate acts as
$$F = \begin{pmatrix} (-1)^{f(0)} & 0 \\ 0 & (-1)^{f(1)} \end{pmatrix},$$
i.e. it selectively multiplies the terms in the superposition by a phase factor. Using
these gates, the algorithm simply computes NFN, which is
$$NFN = \frac{1}{2}\begin{pmatrix} (-1)^{f(1)} - (-1)^{f(0)} & i\left((-1)^{f(0)} + (-1)^{f(1)}\right) \\ i\left((-1)^{f(1)} + (-1)^{f(0)}\right) & (-1)^{f(0)} - (-1)^{f(1)} \end{pmatrix}.$$
From this, we can see that the result (ignoring phase factors) is the identity if
the function is balanced, and a NOT gate if the function is constant. Thus, if the
computer starts in the pure state |0⟩, say, it will finish up in the state |0⟩ if the
function is balanced and |1⟩ if it is constant.
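The whole calculation fits in a few lines of NumPy. The sketch below (the dictionary of the four one-bit functions is just an illustrative enumeration) builds N and F for each function and confirms that NFN leaves |0⟩ in |0⟩, up to a phase, exactly when the function is balanced:

```python
import numpy as np

N = np.array([[1j, 1], [1, 1j]]) / np.sqrt(2)        # the "square root of NOT"

def F(f):
    """Phase gate encoding a one-bit boolean function f."""
    return np.diag([(-1.0) ** f(0), (-1.0) ** f(1)])

functions = {
    "constant 0": lambda x: 0,
    "constant 1": lambda x: 1,
    "balanced id": lambda x: x,
    "balanced not": lambda x: 1 - x,
}

zero = np.array([1.0, 0.0], dtype=complex)           # start in |0>
for name, f in functions.items():
    final = N @ F(f) @ N @ zero
    prob_zero = abs(final[0]) ** 2                    # one measurement decides
    print(f"{name}: P(measure |0>) = {prob_zero:.1f}")
```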
Deutsch has pointed out [66] that this algorithm has a very simple physical implementation, as shown in Figure 4.2; this gives us an interesting insight into
what it is doing. The half-silvered mirrors serve to divide the beam in exactly
the same way as the √NOT gate, acting on the inputs, so we can think of the two
paths as a qubit: Path 0 is |0⟩ and Path 1 is |1⟩. The fact that the input beam
follows both paths simultaneously shows that we have a superposition.
If the devices placed along the paths are simply mirrors, the effect on the beam
is equivalent to two √NOT gates, so a beam presented to Input 0 would reappear
Figure 4.2: Physical implementation of Deutsch’s algorithm. A half-silvered mirror
divides an input laser beam in two, sending one part along Path 0 and the other along
Path 1. The two beams interfere again at a second half-silvered mirror, and a pair of
photo-sensitive detectors record the results. Taken from [66].
at Output 1. However, if we replace these mirrors by phase shifters, we can also
implement the function evaluation gate.
Clearly, what this algorithm does is to put the computer into a superposition of
all possible states, simultaneously evaluate the function on all of them, and then
to recombine the results using interference. In a way, it is cheating: collapsing
a series of function evaluations into one simultaneous evaluation. This is an
invariable motif in quantum algorithms: their power comes from their ability to
perform operations in parallel without requiring extra resources. (In consequence,
any calculation which cannot be parallelized will run no faster on a quantum
computer than a classical computer.)
The trick with algorithms such as this really lies in the recombination phase: we
must arrange for the phases to fall in such a way that any measurement corresponds to a definite answer to our question. In Deutsch’s algorithm, the two
answers are equally probable, so it was possible to arrange for some terms to
cancel out completely; when this is not the case, the “signal” becomes weaker.
For example, Deutsch and Jozsa [2] have considered an n-qubit version of this
algorithm, which asks about an n-bit boolean function f. This time, the question
can only be approximate: f is said to be balanced if the fraction of inputs for
which f is 1 is approximately 1/2, and constant if one output value predominates.
Rather than using the √NOT gate, this uses a series of Walsh-Hadamard
transformations, but the principle is broadly the same. This algorithm is not so
easy to implement physically (we'd need to split the input into 2^n beams) but it
again trades on the fact that the fraction of phase shifts is either about 1, 1/2 or
0. If we were to try it with a function that was neither constant nor balanced,
and to ask which was the better approximation, we would find that the algorithm
started to make errors. We would find that there was some probability of making
either measurement on any given trial of the experiment, and that what we were
looking for was encoded in the relative probabilities. In this case, we would either
have to keep repeating the experiment until we had enough data to give a reliable
estimate, or make more function evaluations.
4.3
Grover’s database search algorithm
Imagine, for a moment, an extreme version of Deutsch’s problem, where we know
that a function is approximately balanced but want to know whether it is exactly
balanced. In other words, is f (x) = 0 for all x, or is there some x for which
f (x) = 1? This is equivalent to asking whether a particular entry is in an unsorted
database, and, classically, may require us to search through the entire database.
Grover’s algorithm, remarkably, is able to search an n-entry database in a time
proportional, not to n, but to √n.
The algorithm is, fundamentally, very similar to Deutsch’s. If we were to try the
n-qubit version of the algorithm on this problem, we would find that a measurement would almost certainly come down in favour of the function being constant,
but that there was a very small probability of it telling us that the function was
balanced. Say, for example, that the effect of the device is described by the
operator
$$\begin{pmatrix} a & \sqrt{1-a^2} \\ \sqrt{1-a^2} & -a \end{pmatrix}.$$
In the limit as a → 0, this is equal to the “constant” result of the basic Deutsch
device, and the other limit is equal to the “balanced” result, so this is at least a
plausible approximation. In this case, a is either very small or zero, so the probability of obtaining a result of “balanced” is very small in either case. Grover’s trick
amounts to combining this with an “inversion about the average” gate (which
adds a phase factor of −1 to all states except |0⟩) and then sending the output
back round to be the input to a second application of the device. The combined
effect of two tours round the device and two "inversions about the average" is
$$\left[\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}\begin{pmatrix} a & \sqrt{1-a^2} \\ \sqrt{1-a^2} & -a \end{pmatrix}\right]^2 = \begin{pmatrix} 2a^2-1 & 2a\sqrt{1-a^2} \\ -2a\sqrt{1-a^2} & 2a^2-1 \end{pmatrix}.$$
The probability of finding a result of "constant" is now 4a²(1 − a²), which is
larger than a² so long as 0 < a < √3/2. Thus, repeated applications of this
process will boost the probability of measuring “constant” until it reaches three
quarters unless a = 0, in which case the probability of measuring “constant” will
always be zero. In addition, the process has its greatest effect when a is very
small, where it essentially doubles the probability with each iteration. While this
effect of successively doubling the probability does not continue indefinitely, it
does explain why the algorithm is able to achieve a performance gain.
If we denote by cn and bn the amplitudes for measuring the function to be constant
and balanced respectively after n iterations of the device, we have that c1 = b,
b1 = a and
$$c_n = \sqrt{1-a^2}\,b_{n-1} + a\,c_{n-1}, \qquad b_n = a\,b_{n-1} - \sqrt{1-a^2}\,c_{n-1},$$
meaning that the probability of measuring the function to be constant is
$$c_n^2 = (1-a^2)\,b_{n-1}^2 + a^2\,c_{n-1}^2 + 2a\sqrt{1-a^2}\,b_{n-1}c_{n-1}.$$
Here we can clearly see the interference process at work; if the device had been
behaving as a classical probabilistic process, we would not have had the last
term on the RHS. This is what is speeding up the algorithm: the interference is
consistently boosting one probability and diminishing the other.
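The boosting effect is simple to reproduce numerically. The sketch below (NumPy assumed; a = 0.05 is an arbitrary small overlap) repeatedly applies the 2×2 device operator followed by the phase flip described above, and tracks how the initially tiny probability grows before eventually falling again:

```python
import numpy as np

a = 0.05                                     # small overlap, e.g. 1/sqrt(N)
s = np.sqrt(1 - a ** 2)

device = np.array([[a, s],
                   [s, -a]])                 # the 2x2 "device" operator
flip = np.diag([1.0, -1.0])                  # phase of -1 on all states but the first
step = flip @ device                         # one tour + one inversion

state = np.array([0.0, 1.0])                 # almost no "rare" amplitude to begin with
for n in range(1, 40):
    state = step @ (step @ state)            # two tours + two inversions per iteration
    if n in (1, 2, 5, 10, 15, 20, 30):
        print(n, round(state[0] ** 2, 3))    # probability of the rare outcome
```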
Grover’s algorithm [5] is similar to this in spirit, but slightly different in practice,
in that it is designed to identify the database item as well as simply proving
its presence. To see how the algorithm works, it is easier to consider a related
algorithm, by Farhi and Gutmann [67]. (An entertaining analysis of Grover’s
original algorithm can be found in [68].) Here, we consider a quantum system
with N basis states and a Hamiltonian proportional to the projector onto one of these, say |w⟩; the
task is to discover w. From before, we have the Schrödinger equation,
$$i\,\frac{\partial}{\partial t}|\psi\rangle = H|\psi\rangle,$$
where, for this problem,
$$H = |w\rangle\langle w|.$$
To solve the problem, Farhi and Gutmann add an additional term to the Hamiltonian, namely
$$\frac{1}{N}\,\bigl(|1\rangle + \cdots + |N\rangle\bigr)\bigl(\langle 1| + \cdots + \langle N|\bigr).$$
To see why this helps, consider
the subspace spanned by |w⟩ and |s⟩ = (1/√N) Σ_{k≠w} |k⟩; in terms of this, the additional term is just ((1/√N)|w⟩ + |s⟩)((1/√N)⟨w| + ⟨s|). This means that we can write the
new Hamiltonian, H′, as
$$H' = \left(1 + \frac{1}{N}\right)|w\rangle\langle w| + |s\rangle\langle s| + \frac{1}{\sqrt{N}}\bigl(|w\rangle\langle s| + |s\rangle\langle w|\bigr).$$
The point about this Hamiltonian is that (to order 1/N) the state |s⟩ is as likely
to evolve into |w⟩ as |w⟩ is to evolve into |s⟩; slightly more so, in fact. If we start
off from the state (1/√N)(|1⟩ + · · · + |N⟩) = (1/√N)|w⟩ + |s⟩, we therefore find that
the amplitude for state |w⟩ steadily increases. Once enough time has elapsed, the
probability that a measurement of the system will yield |w⟩ is of order one, i.e.
almost certain. Initially, the 1/√N terms ensure that the probability of observing
|w⟩ grows in proportion to 1/√N. This is closely analogous to the version based
on Deutsch's algorithm, where the probabilities of a state flipping from constant
to balanced or vice versa are about equal, but one probability is initially much
higher than the other⁴. The difference here is that a measurement yields the
anomalous state itself, not just an indication that one exists.
Farhi and Gutmann’s analysis also yields a closed-form result for the probability
of measuring the system to be in the state |w⟩ at time t, P(t):
$$P(t) = \sin^2(Ext) + x^2\cos^2(Ext),$$
where x = 1/√N is the overlap between |w⟩ and the starting state⁵. This shows that P(t) is exactly one at a time
of (π/2)√N, but it also shows that, if we leave measuring the system for too
long then the probability will actually go down. This is a general feature of this
style of algorithm, and is essentially a resonance effect: Grover showed that a
version of the algorithm could also be implemented using coupled pendulums [75].
Unfortunately, this feature becomes particularly awkward if we try to apply it to
problems with more than one solution. If we try to apply it to a problem with
t solutions [7], the time required is of order √(N/t) to find a superposition of the
solutions. This is acceptable if we know how many solutions there are, but the
more approximate our estimate of their number is, the greater the probability is
that the algorithm will fail.

⁴ Note, however, that in Deutsch's algorithm the flip probability is the dominant one.
⁵ We have taken the liberty of normalising the energy of the system so that Farhi and Gutmann's E is one.
For example, imagine that we want to solve a satisfiability problem by using
successive applications of Grover’s algorithm to remove all the terms in the superposition that violate each of the clauses in turn. In other words, at step n we
want to find a superposition of all the allocations which satisfy the first n clauses.
It is, in principle, possible to do this [69] by introducing a family of gates Fi which
act as the identity on states satisfying the first i clauses and annihilate all other
states. The gate combination Fi−1 GFi−1 + (I − Fi−1 ) – where G is an iteration
of the Grover algorithm and I is the identity – achieves the appropriate result at
the ith step in a unitary fashion. Applying this combination applies the Grover
algorithm only to those states which have passed all the previous tests, and leaves
the amplitudes of all other states as they are. If the Grover algorithm could perfectly weed out the satisfying allocations for the current clause from those it is
given, then the end result of this process would be a superposition of exactly the
states that satisfy the given problem. What's more, each step potentially only
has to weed out one eighth of the states (for a 3-SAT problem), meaning that
t ≈ 7N/8, and that the time required is √(8/7) ≈ 1. In short, this algorithm
appears to be able to solve the problem in a time linear in the number of clauses.
The problem, of course, is that this requires us to have a very precise knowledge
of the fraction of the remaining states which will satisfy the current clause; otherwise, the probability of failure will grow as the algorithm proceeds. Even if the
success probability at each step is 99%, the probability of making it all the way
through 2000 clauses is about 10⁻⁷%, i.e. negligible. On a more positive note, if
the fraction of solutions is exactly a quarter, Grover’s algorithm is guaranteed to
find the answer in one step [70].
In the general case, where we have no special knowledge to exploit, it has been
shown that Grover’s algorithm is optimal [8] in terms of the number of database
queries⁶. (The proof depends on the fact that any search algorithm must be
capable of redistributing the probabilities widely enough that the answer can be
found, irrespective of what it turns out to be. With only unitary transformations
available, there is a limit to how fast this redistribution can happen.) Thus, as
we might have expected, the success of algorithms like these is intimately related
to our knowledge of the search space.

⁶ Improvements can, however, be made in terms of the total number of non-query operations; see [74].
4.4
Shor’s factoring algorithm
Grover’s algorithm is impressive, but ultimately it can only provide a polynomial
speedup. The first – and, so far, only – algorithm to demonstrate a truly
exponential speedup was Shor's algorithm for factoring integers [4]. The best
classical algorithm known for factoring large numbers is the “number field sieve”
(see e.g. [71] for details), which has a worst-case running time of
$$O(\exp[c(\log n)^{1/3}(\log\log n)^{2/3}])$$
for factoring n, where c ≈ 1.9. This makes it significantly better than e.g. brute-
force trial-division, in that it is not truly exponential in the number of bits in
the number. However, it is not polynomial either; it is sub-exponential. Shor's
algorithm, by contrast, is genuinely polynomial.
The algorithm is based on Fermat’s Little Theorem, which states that, for any
n, np−1 modp = 1 for any prime p which does not divide into n. This can be
extended to state that xr modn = 1 for any numbers x and n and some r. If r is
even, then (xr/2 − 1)(xr/2 + 1) = xr − 1 = 0modn. Thus, xr/2 + 1 and xr/2 − 1
both divide n. If r is as small as possible, this means that either xr/2 − 1 must
divide one of the factors of n or xr/2 + 1 must do. In the first case, we have
xr/2 = 1modp, where p is a prime factor of n. The original version of Fermat’s
Theorem now applies, so we have that r/2 = p − 1, i.e. we have found one of the
prime factors. Shor showed that the probability of this approach paying off for a
number with k prime factors is at least 1 − 1/2k−1 , so there is always at least a
50% chance of success.
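The classical half of this reduction is easy to sketch. In the following Python fragment the order r is found by brute force, which is exactly the step that the quantum part of Shor's algorithm replaces; the function names and the choice n = 15 are illustrative only:

    from math import gcd
    import random

    def order(x, n):
        # Smallest r with x^r = 1 mod n, found by brute force.  This is the
        # step that Shor's algorithm replaces with a quantum subroutine.
        r, y = 1, x % n
        while y != 1:
            y = (y * x) % n
            r += 1
        return r

    def find_factor(n, attempts=20):
        # Classical reduction from factoring to order finding, as described above.
        for _ in range(attempts):
            x = random.randrange(2, n)
            g = gcd(x, n)
            if g > 1:                      # lucky guess: x already shares a factor
                return g
            r = order(x, n)
            if r % 2:                      # need r even
                continue
            y = pow(x, r // 2, n)
            if y == n - 1:                 # x^(r/2) = -1 mod n: a failure case
                continue
            g = gcd(y - 1, n)
            if 1 < g < n:
                return g
        return None

    print(find_factor(15))                 # prints 3 or 5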
The hard part is computing r, and this is where the uniquely quantum nature
of the algorithm emerges. Classically, it is possible to compute $x^a \bmod n$ for some
given a quite efficiently: repeatedly square $x \bmod n$ to get $x^{2^i} \bmod n$ for all i such
that $2^i < r$, then multiply these together in appropriate combinations. This takes
broadly of order $O((\log_2 n)^2)$ steps; the problem is that we need to do this for all a < r.
With a quantum computer, however, we can do better: by starting off from a
superposition of states $\sum_a |a; 0\rangle$, we can apply the classical algorithm to all the
states simultaneously, to form $\sum_a |a; x^a \bmod n\rangle$. (The notation $|a; b\rangle$ is intended
to summarise a state where the first half of the qubits represent a and the second
half represent b.) The difficulty now, of course, is picking out the correct answer,
i.e. the state $|r; 1\rangle$. Shor's algorithm does this by means of a quantum Fourier
transform.
All functions can be rewritten (or decomposed) in terms of a suitable set of basis
functions. For example, the set $\{x^n \,|\, n \geq 0, n \in \mathbb{Z}\}$ – i.e. the monomials – forms
one such set, implying that any function can be represented as a polynomial with
suitable coefficients. The set of sine functions – i.e. $\{\sin(2\pi s x) \,|\, s > 0, s \in \mathbb{R}\}$ –
forms another basis set, and the Fourier transform F(s) of a function f(x) gives
their coefficients. The transform is given by
$$F(s) = \int_{-\infty}^{\infty} f(x) \exp(-i 2\pi x s)\, dx.$$
Clearly, except in the cases where this integral can be calculated analytically,
determining the transform exactly is not feasible, and we have to fall back on the
discrete Fourier transform (DFT), which is
$$F(2\pi r/T_r) = \sum_{k=0}^{N_0 - 1} f(x_k) \exp(-i 2\pi r k / N_0),$$
where $T_r$ is the period of the rth component of the transform and $N_0$ is the number
of samples of the function f in one period.
As written, this transform would require somewhere in the region of $N_0^2$ computational steps; in fact, the best known algorithm for computing the DFT – the
fast Fourier transform (FFT) – requires only $O(N_0 \log N_0)$ steps. The important point, however, is
that the running time of any classical algorithm is polynomial in $N_0$.
By contrast, the quantum Fourier transform which lies at the heart of Shor’s
algorithm takes advantage of quantum parallelism to give a running time which is
polynomial in the number of bits of N0 – an exponential speedup. This transform
operates on quantum states $|a\rangle$ rather than integers, and transforms each such
state into
$$\frac{1}{q^{1/2}} \sum_{c=0}^{q-1} |c\rangle \exp(2\pi i a c/q),$$
where q and a are integers and 0 ≤ a ≤ q; q is playing the rôle of $N_0$ above. At
first sight, this would appear to require essentially the same number of steps as
the classical version: we still have to perform a sum over q terms. The crucial
difference is that we are now able to apply operations simultaneously to a large
number of states; Shor gives an example algorithm which uses a sequence of
Walsh-Hadamard transformations and single-qubit rotations to perform an exact
DFT in polynomial time. As Shor notes, however, this procedure is not very
practical: it involves phase rotations of the order of $\pi/2^{\log q}$, which can be very
small indeed when q is large. On the other hand, the small size of these rotations
indicates that, with care, it may be possible to neglect them and still perform
a useful approximate transform. Coppersmith [16] has provided just such an
algorithm, and it leads to an error of at most $2\pi L 2^{-m}$ in each phase angle, where
L is the number of bits of q and m is a measure of the desired accuracy. This
yields a transform which requires no phase rotations smaller than $\pi/2^{m-1}$. As
Coppersmith notes, values of L = 500 and m = 20 yield a maximum error
of 3/1000. In addition, neglecting small rotations means that Coppersmith’s
algorithm requires fewer steps, and it also turns out to be efficiently parallelizable.
Coming back to the factoring algorithm, the effect of this transform is to perform
a “frequency analysis”, in much the same way as the classical version would when
applied to a sound wave. Since the data is periodic with period r, this gives a
series of sharp peaks, spaced q/r apart. This means that any measurement will
almost certainly give the value of one of the peaks, say qd/r. We know q, and
can estimate d and r by finding the fraction d/r that most accurately matches the measured value divided by q. Even
in the worst case, it only takes a few attempts to find r.
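We can mimic this classically with a discrete Fourier transform. The following sketch uses numpy's FFT in place of the quantum transform; the values x = 7, n = 15 and q = 2048 (for which r = 4) are chosen purely for illustration:

    import numpy as np
    from fractions import Fraction

    q, x, n = 2048, 7, 15          # register size q, and the function a -> x^a mod n
    f = np.array([pow(x, a, n) for a in range(q)])

    # The data is periodic (here with period r = 4), so its Fourier transform
    # consists of sharp peaks spaced q/r apart.
    spectrum = np.abs(np.fft.fft(f))
    print(np.sort(np.argsort(spectrum)[-4:]))        # [0 512 1024 1536]

    # A measured peak, qd/r, lets us recover r by rational approximation of peak/q.
    peak = 3 * q // 4                                # the peak with d = 3
    print(Fraction(peak, q).limit_denominator(n))    # 3/4, so r = 4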
None of the computational elements of Shor’s algorithm are directly responsible
for the speedup it achieves; although they apply quantum parallelism, we still
have the problem that the results are encoded in a probability distribution, of
which we can only measure one point. The genius of the algorithm is that it
creates a distribution which is very sharply peaked, with a definite period. As a
result, even one point is able to give us useful information. Without this great
regularity, we would not have gained much.
This illustrates one of the central features of powerful quantum mechanical algorithms: it is easy to apparently do a lot of work and encode a large amount of
information in a quantum state, but hard to read out anything useful without a
lot of effort. It is probably fair to say that the only time we can hope to obtain a
quantum speedup is for problems where classical algorithms have to compute a
large, complex distribution, but then only ask about some of its grossest features.
This is where the gain comes from: quantum mechanics seems to allow us to put
off the moment when, to reduce the workload, we have to make a wild approximation; classically, the workload could only be reduced by computing a very much
more approximate transform, and this would wash out even the gross features.
A useful picture to keep in mind when thinking about this question is that of a
multiple-slit experiment. Classically, we can use water waves; quantum mechanically, we can use laser light or electrons. The classical version has to be much
larger, and uses a large volume of water, but it automatically shows you the whole
interference pattern. The quantum mechanical version also requires an intense
laser beam in order to show the whole pattern, but it can also say something
about the pattern with a single photon. Somehow, by according reality to the
wavefunction even in this low-intensity limit, we have been able to make it do
work for us. As always, though, less work/energy must lead to less information
out: the trick is to make it the right information.
4.5 What makes quantum computers powerful?
As mentioned in the introduction, quantum mechanics is essentially just an ordinary wave theory (admittedly over a complex-valued field); the only thing that
is peculiar about it is that we are using it to describe particles, not waves. Given
this, it is instructive to enquire as to how far we can go in simulating quantum
systems using purely classical waves. The answer, it turns out, is quite far: it is
possible to build simulations which have all the properties required to implement
both Grover’s and Shor’s algorithms, albeit with some loss of efficiency. In the
following sections, we will use these to highlight the reasons for the apparently
greater power of their quantum counterparts.
4.5.1 Classical versions of Grover's algorithm
From the description of Grover’s algorithm given above, it is clear that its effect
is due to a combination of entanglement and interference. Interference is also
a property of classical waves, but entanglement is not; interestingly, however,
it is possible to design a version of the algorithm without entanglement. We
can see this by considering the physical version of Deutsch’s algorithm shown
in Figure 4.2. This works either as a fully quantum system, with only a single
photon as input, or as a classical wave system. (Clearly, it will work just as well
with an intense laser beam as with a single photon, in which case we would not
expect to have to consider quantum effects.)
There are two important differences between the quantum and classical versions
of Deutsch’s algorithm: we have to split the input wave into one beam for every
possible function input, and the input wave has to be intense enough for all
the beams to have a reasonable amplitude. This makes the classical machine
exponentially larger, and require exponentially more power, but it does not take
significantly longer to execute the algorithm. (The necessary beam splitting has
to be accomplished by using a set of N splitters, but each beam will emerge
having only encountered log N of them.)
In the same spirit, a neat demonstration of Grover’s algorithm is given by Lloyd [18],
where he considers an array of coin slots, one of which flips the coin over on its
way through. Lloyd actually considers a quantum version of the algorithm which
uses different translational modes of a single neutron, thus precluding entanglement, but we can equally well use classical waves. If we imagine that the slots
invert the polarisation of a laser beam, we find that we can extend Deutsch’s
algorithm, much as before: split the beam so that it goes through all the slots
simultaneously, recombine it, and then “invert about the average”. If we send in
an intense initial pulse, then, over time, the beam will focus on the slot which does
the inversion. Again, the time required to execute the algorithm is no worse than
in the quantum case, though the other requirements are exponentially larger.
Interestingly, once we submit to the exponentially higher energy requirements of
these classical algorithms, it is actually possible to find a faster algorithm. If
we simply want to ascertain the presence or absence of a database entry, then
one application of the n-qubit Deutsch algorithm is all that is required. The
small probability of measuring “balanced” after one pass through the apparatus
has now become a small amplitude, which we can measure straight away. A
device based on this principle has even been built, by Walmsley [19]. This uses a
device called an acousto-optic modulator, which can be induced to have different
densities at different points, thereby storing a database of information. If a broad
spectrum of light is shone on the modulator, then regions of different density will
have a different effect on the waves passing through it, causing them to bend to
a greater or lesser degree. By splitting the light beam in two, sending one beam
through the modulator, and then causing them to interfere, these differences
become measurable.
4.5.2 Classical entanglement
Entanglement is an intrinsically quantum property, but it turns out that we can
nonetheless simulate it quite effectively [72]. Provided that the classical system
used to do the simulation has as many degrees of freedom as the quantum one,
careful use of beam-splitters, mirrors and other devices can allow the system to
be manipulated as if it were an entangled quantum system.
If we consider a laser beam from the perspective of classical electrodynamics,
it offers us two degrees of freedom, e.g. the horizontal and vertical polarization
components. The state of the beam can be described by the two (complex)
amplitudes of these components, which are independent. In consequence, we can
think of the beam as a classical qubit; what Spreeuw [72] calls a cebit. Problems
arise when we try to consider analogues of multiple-qubit entangled states. For
example, a three-qubit state has $2^3 = 8$ degrees of freedom:
$$|\Psi\rangle = c_{000}|000\rangle + c_{001}|001\rangle + c_{010}|010\rangle + c_{011}|011\rangle + c_{100}|100\rangle + c_{101}|101\rangle + c_{110}|110\rangle + c_{111}|111\rangle.$$
To achieve the same degree of freedom in a classical system, we need four laser
beams. One way to achieve this is to number the laser beams in binary and
associate the amplitude $c_{ij0}$ with the horizontal polarization amplitude of beam
number ij (and associate $c_{ij1}$ with the vertical polarization amplitude of the same
beam). This allows any entangled state to be represented, but operating on the
state is more tricky. However, we can, for example, achieve a rotation of the
third qubit by sending all four beams simultaneously through a block of material
which rotates their polarization. Similarly, careful splitting and recombination
can be used to simultaneously apply the operations required to simulate rotations
on other qubits.
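As a minimal sketch of this encoding (the array layout and function name are our own, purely illustrative choices), the four beams can be represented as four rows of two complex amplitudes, and the rotation of the third qubit is then the same 2 × 2 rotation applied to every row:

    import numpy as np

    # Hypothetical encoding of a three-qubit state as four "beams" (rows, indexed by
    # the first two qubits) each carrying two polarization amplitudes (the third qubit).
    state = np.zeros((4, 2), dtype=complex)
    state[0, 0] = 1.0                      # start in |000>

    def rotate_third_qubit(state, angle):
        # Sending all four beams through the same polarization rotator applies one
        # 2x2 rotation to every (horizontal, vertical) pair, i.e. a single-qubit
        # rotation of the third qubit.
        R = np.array([[np.cos(angle), -np.sin(angle)],
                      [np.sin(angle),  np.cos(angle)]])
        return state @ R.T

    print(rotate_third_qubit(state, np.pi / 4))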
Spreeuw’s conclusion is that all the basic operations required for quantum computing – single-qubit rotations and two-qubit gates – could be carried out efficiently in this way. However, because the qubits are not localized (their description being spread across the laser beams), it is not possible to separate them.
Real qubits in an entangled state can be separated, which leads to the rather
strange conclusion that measuring the state of one qubit can affect the state of
others, irrespective of how widely they are separated; these “non-local” effects
cannot be emulated. (The most famous example of this is the EPR experiment,
which is discussed further in Appendix A.)
4.5.3 Conclusions
We have just seen that everything required to perform quantum computations,
even powerful ones such as Shor’s algorithm, can be simulated classically without
any slowdown. The price we pay for this is that the energy and space requirements
of the algorithm increase exponentially. On the other hand, quantum probability
amplitudes are simulated using real, measurable amplitudes. Thus, although a
classical simulation of a quantum Fourier transform is costly, we can read off the
whole transform rather than just a single point.
In the end, the power of quantum mechanics comes down to the fact that we have
attributed reality to probability amplitudes, in the form of the wavefunction. This
has two consequences: first, it allows us to save an exponential amount of energy
by making the mere possibility of a particle do work; and secondly it allows us to
pack exponentially more degrees of freedom into the same space. The problem
with this, as Shor’s algorithm shows, is that fully realising these savings requires
us to be able to work with only a small part of the information that a classical
algorithm would automatically have given us.
If we relax the information requirement by e.g. turning up the intensity of our
laser beam, the example of Deutsch’s algorithm shows that we can achieve a more
powerful algorithm: a speedup from N steps to $\sqrt{N}$ steps becomes a speedup to
a single step. As Černý realized [73], this makes solving NP-complete problems
simple: form a superposition, try all possibilities simultaneously, and pick out
the correct answer.
On the other hand, living with the information requirement is very difficult – as
demonstrated by the fact that Shor’s algorithm is still the only one known to offer
an exponential speedup – and seems to require a deep insight into the problem.
In fact, it has been shown [1] that no algorithm that treats the problem like a
“black box”, to be queried by the algorithm but not understood, can achieve
more than a polynomial speedup.
4.6 Summary
We have seen that quantum computers do indeed appear to be more powerful than
their classical counterparts, but that their abilities are more limited than they
might at first appear. The speedup obtained by Grover’s algorithm is essentially
a consequence of the wave nature of quantum mechanics, compressed in space and
energy by the ability to take advantage of probability amplitudes. Calculating
using probability amplitudes makes obtaining the right information very difficult,
and requires insight into the problem.
In the end, quantum computers expand the available “toolbox” for solving problems, but only become powerful when we have enough insight to use the new
tools it offers, as in the case of Shor’s algorithm.
Chapter 5
Test case
As a means of evaluating the performance of different physical systems in practice,
we now consider the example problem of decoding error correcting codes. This
problem has the advantage that it is well-studied, both from the point of view of
complexity theory, and in terms of the practical issues involved in decoding. In
particular, the examples we will discuss have a natural formulation both as an
Ising model problem and as a neural network problem. First, however, we must
introduce error correcting codes, and their theoretical properties. As before, we
can only give the briefest overview here; for further details, see [76].
5.1 Error correcting codes
To protect messages against transmission errors, error correcting codes are often
used. These codes introduce redundancy into the message (i.e. encode each bit
more than once); this way, small amounts of error can be detected and corrected,
allowing perfect transmission even through an imperfect medium.
The most common type of error correcting code is the linear code, where the
message is encoded by multiplying it by the generator matrix of the code. We
shall write this as
r = Gs + η,    (5.1)
where the vectors s, r and η represent the message, encoded message and transmission noise, and the matrix G represents the generator matrix of the code. To
introduce redundancy, G has M rows and N columns, with M > N . The degree
of redundancy introduced by a code is described by its rate, which is the ratio
N/M . Codes with higher rates allow the transmission of more data over a given
channel, but are typically less able to correct errors.
A desirable practical property of a code is that it be sparse, i.e. have relatively
few non-zero entries. This reduces the encoding time, and allows the matrix
to be compressed for distribution to the code’s users. From now on, we will
focus specifically on sparse binary codes, as these are the most commonly used
in practice. In fact, we will be even more specific than that: we will focus on
generator matrices with exactly two non-zero entries per row and four per column,
such as the following example.

    1 1 0 0 0 0
    1 0 1 0 0 0
    1 0 0 1 0 0
    1 0 0 0 1 0
    0 1 0 1 0 0
    0 1 0 0 1 0
    0 1 0 0 0 1
    0 0 1 1 0 0
    0 0 1 0 1 0
    0 0 1 0 0 1
    0 0 0 1 0 1
    0 0 0 0 1 1
This code allows transmission errors to be both detected and corrected, because
each bit of the message is encoded in four places. For example, say that the
third bit of the received message was corrupted. If we assumed that this bit was
correct, we would have to make a wrong assumption about the value of either the
first or the fourth bits of the message. This wrong assumption, however, would
result in wrong values for three bits of the received message. Thus, provided that
the noise level is not too high, we can decode the message simply by accepting
the value which leads to the lowest level of implied noise in the transmission.
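For the small example above, this minimum-implied-noise rule can be spelled out directly. The following Python sketch (arithmetic mod 2; the exhaustive search is only feasible because there are just six message bits) is one illustrative way of doing so:

    import numpy as np
    from itertools import product

    G = np.array([[1,1,0,0,0,0], [1,0,1,0,0,0], [1,0,0,1,0,0], [1,0,0,0,1,0],
                  [0,1,0,1,0,0], [0,1,0,0,1,0], [0,1,0,0,0,1], [0,0,1,1,0,0],
                  [0,0,1,0,1,0], [0,0,1,0,0,1], [0,0,0,1,0,1], [0,0,0,0,1,1]])

    def decode(r):
        # Minimum implied-noise decoding: try all 2^6 messages and keep the one
        # whose encoding differs from r in the fewest places.
        return min((np.array(s) for s in product([0, 1], repeat=G.shape[1])),
                   key=lambda s: int(np.sum((G @ s) % 2 != r)))

    rng = np.random.default_rng(0)
    s = rng.integers(0, 2, 6)
    r = (G @ s + (rng.random(12) < 0.05)) % 2      # encode and add 5% noise
    print(s, decode(r))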
It might appear from this that the level of correctable noise is very low: for
example, errors in the first two bits of the transmission would leave the value of the
first bit of the message undecided (as either choice would imply two transmission
errors). However, such noise combinations are rare; if we are willing to accept
the occasional wrong decoding, then we can tolerate much more noise.
To see exactly how much noise these types of code can tolerate, let us consider the
example of a code with exactly two non-zero entries per row and four per column
(like the example above). This has a very convenient representation in terms of
an Ising spin glass model (something we shall discuss in more detail later). If
we map the elements of the message vector s onto the spins of the model, then
we can map the rows of the generator matrix onto bonds between the spins with
non-zero entries in that row. For example, the top row of the above matrix can
be taken to represent a bond between the first two spins. If the corresponding
element of the encoded message vector r is 1, we define the coupling to be −J
(as the “spins” must have been opposite in value to produce this answer); if it is
0, we define the coupling to be J. The Ising model corresponding to the above
matrix is shown in Figure 5.1.
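This mapping is mechanical enough to write down explicitly; the following sketch (the function name and toy example are our own, purely illustrative) builds the couplings from a generator matrix G and received vector r exactly as just described:

    import numpy as np

    def code_to_ising(G, r, J=1.0):
        # Each row of G (two non-zero entries) becomes a bond between those two
        # spins, with coupling -J if the received bit is 1 and +J if it is 0.
        bonds = {}
        for row, bit in zip(G, r):
            i, j = np.nonzero(row)[0]
            bonds[(int(i), int(j))] = -J if bit else J
        return bonds

    G_toy = np.array([[1, 1, 0], [0, 1, 1]])
    print(code_to_ising(G_toy, np.array([1, 0])))   # {(0, 1): -1.0, (1, 2): 1.0}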
With this representation, it is easy to see that the model with no noise is completely unfrustrated. Noise flips the signs of some of the bonds, leaving a model
with a frustrated ground state. However, the directions of the spins will be unaffected provided no spin has more than one noisy bond. The probability of noise
leaving the decoded message unaffected is thus given by the probability that no
two noisy bonds are attached to the same spin.
Figure 5.1: "Ising model" corresponding to the example generator matrix. This model
has non-local bonds, but it is still planar, so it is still easy to find the ground state
exactly.

We can approximately calculate this as follows. The number of ways that n noisy
bonds can affect 2n different spins (out of a total of N) is given by the number
of distinct ways of picking n pairs of different numbers from a total of N, i.e.
$$\frac{N!}{(N - 2n)!\, n!\, (2!)^n}.$$
The probability that all the number pairs will actually correspond to bonds in the
model is $(4/(N - 1))^n$: of the N − 1 spins that the first spin of a pair could have
been paired with, 4 of them correspond to bonds in the model. Thus the number
of ways of adding n noisy bonds without affecting the result is approximately
$$\frac{N!}{(N - 2n)!\, n!\, (2!)^n} \left(\frac{4}{N - 1}\right)^n.$$
The total number of ways of selecting n noisy bonds from the 2N in the model
is just $(2N)!/((2N - n)!\, n!)$, so we get a total probability that the noise will not
affect the result of
$$P(N) = \frac{N!\, (2N - n)!}{(N - 2n)!\, (2N)!} \left(\frac{2}{N - 1}\right)^n.$$
Unfortunately, this probability drops rapidly as N increases, as shown in Figure 5.2. This should not come as a surprise: for a fixed noise probability p,
the probability that the four entries in the encoded message vector associated
with a given message bit will be affected by no more than one bit of noise is
$(1 - p)^4 + 4p(1 - p)^3$. The probability that none of the sets of entries associated
with any of the message bits will be affected is then $((1 - p)^4 + 4p(1 - p)^3)^N$,
which broadly matches the results shown in Figure 5.2.
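For concreteness, the following sketch evaluates both the expression for P(N) derived above and this per-bit estimate, taking n to be the expected number of noisy bonds, 2Np; the particular values of N are arbitrary:

    from math import factorial

    def P_exact(N, n):
        # The expression for P(N) derived above, for n noisy bonds.
        return (factorial(N) * factorial(2*N - n)
                / (factorial(N - 2*n) * factorial(2*N))) * (2 / (N - 1)) ** n

    def P_independent(N, p):
        # The per-message-bit estimate ((1-p)^4 + 4p(1-p)^3)^N.
        return ((1 - p) ** 4 + 4 * p * (1 - p) ** 3) ** N

    p = 0.05
    for N in (10, 20, 40, 60):
        n = round(2 * N * p)               # expected number of noisy bonds
        print(N, round(P_exact(N, n), 3), round(P_independent(N, p), 3))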
Figure 5.2: Probability of successful decoding, P(N), for a noise level of 5%. The
success probability drops, apparently exponentially, as N increases.
To get round this, we introduce another sparse matrix, B, of size M × M and
use $G' = B^{-1}G$ as the generator matrix. The advantage of doing this becomes
more apparent when we multiply Equation 5.1 through by B, to obtain
Br = Gs + Bη.
In the same way that G is used to encode each entry of s in several places, the
introduction of the matrix B has the effect of encoding each noise bit in several
places. This makes it possible to cross-check noise estimates as well as message
estimates. Belief Propagation methods are currently the best known decoding
methods, because they can take advantage of this fact.
5.2 Complexity of the decoding problem
The decoding problem in the absence of noise essentially involves solving a set
of linear simultaneous equations. If we recall that SAT problems can be restated
as systems of linear equations, then we can restate it as what initially seems
like a somewhat peculiar version of XOR-SAT, with no variables negated but
the requirement that some of the clauses not be satisfied. For example, the toy
problem

    1 1 0       1
    0 1 1   s = 0
    1 0 1       1
is equivalent to asking for boolean variables A, B and C that satisfy A XOR B
and A XOR C but dissatisfy B XOR C. However, dissatisfying B XOR C is
equivalent to satisfying either ¬B XOR C or B XOR ¬C. From this, we see
that the decoding problem is actually equivalent to XOR-SAT.
Since XOR-SAT is in P, it follows that the decoding problem in the absence of
noise is also in P, as we would expect (given that Gaussian elimination is in
P). As soon as we add noise, however, we have to solve MAX XOR-SAT, which
is NP-complete. Feasible decoding is therefore restricted to problems which lie
below the phase transition.
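As an illustration of this, the following sketch performs noiseless decoding of the example code of Section 5.1 by Gaussian elimination over GF(2). (For that particular matrix the all-ones vector lies in the null space, so the message is only recovered up to complementation; the sketch therefore checks the re-encoding rather than the message itself.)

    import numpy as np

    # The example (2,4) generator matrix from Section 5.1, built from its bond pairs.
    pairs = [(0,1),(0,2),(0,3),(0,4),(1,3),(1,4),(1,5),(2,3),(2,4),(2,5),(3,5),(4,5)]
    G = np.zeros((12, 6), dtype=int)
    for row, (i, j) in enumerate(pairs):
        G[row, [i, j]] = 1

    def solve_gf2(G, r):
        # Gaussian elimination over GF(2): noiseless decoding in polynomial time.
        A = np.hstack([G % 2, (r % 2).reshape(-1, 1)])
        M, N = G.shape
        row = 0
        for col in range(N):
            hits = np.nonzero(A[row:, col])[0]
            if len(hits) == 0:
                continue                                 # free column
            A[[row, row + hits[0]]] = A[[row + hits[0], row]]
            for i in range(M):
                if i != row and A[i, col]:
                    A[i] ^= A[row]
            row += 1
        s = np.zeros(N, dtype=int)
        for i in range(row):
            lead = np.nonzero(A[i, :N])[0]
            s[lead[0]] = A[i, N]
        return s

    s = np.random.default_rng(1).integers(0, 2, 6)
    s_hat = solve_gf2(G, G @ s % 2)
    print(np.array_equal(G @ s_hat % 2, G @ s % 2))      # True: the system is solved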
To get an idea of where the phase transition lies, we can ask about the region for
which the naı̈ve measure of success – the number of satisfied clauses – bears some
relation to the real measure – the number of wrongly-set message bits. From
the discussion in Chapter 1, we can guess that this is the region in which simple
methods such as local search should work. To estimate the relative closeness of
these two measures, we shall look at the probability that flipping a message bit
from wrong to right will increase the number of clauses satisfied relative to the
probability that a flip from right to wrong will achieve this.
Consider a problem whose generator matrix has n non-zero entries per row and m
per column, and where the noise probability is $p_e$. Imagine, also, that the entries
in our current estimate of the message vector are correct with a probability $p_c^1$.
If we look at the set of rows of the generator matrix which have an entry for our
chosen message bit, the first thing we need to know is the probability that the
contribution from the other message bits represented in each row is correct. The
probability $p_c^{n-1}$ that an arbitrary sequence of n − 1 results will turn out to give
the correct answer is given by
$$2p_c^n - 1 = (2p_c^1 - 1)^n,$$
and hence the probability that a correct answer for the chosen message bit in a
given row will give a correct answer for the associated encoded message bit is
$$p_{ce}^n = p_c^n(1 - p_e) + (1 - p_c^n)\,p_e.$$
Thus, the probability that the m rows will give c correct answers and m − c
incorrect answers is
$$\binom{m}{c}\,(p_{ce}^n)^c\,(1 - p_{ce}^n)^{m-c},$$
where the $\binom{m}{c}$ are binomial coefficients. If c > m/2, then flipping the message
bit to be correct will satisfy more clauses than flipping it to be wrong; if c < m/2,
the reverse will be true.
Figure 5.3: $F(p_c, p_e)$ for $p_e = 0.1$, for various matrix densities (n, m). For (2, 4), the
naïve and error estimates are correlated, but as soon as the density rises above this
they fail to be.
The region where the naïve and real estimates match more often than not is thus
given by
$$F(p_c, p_e) = \frac{\sum_{c=0}^{m} (1 - p_c^1)\,\Theta(c - m/2)\,\binom{m}{c}\,(p_{ce}^n)^c (1 - p_{ce}^n)^{m-c}}{\sum_{c=0}^{m} p_c^1\,\Theta(m/2 - c)\,\binom{m}{c}\,(p_{ce}^n)^c (1 - p_{ce}^n)^{m-c}} > 1,$$
where Θ(x) = 1 for x > 0 and 0 otherwise.
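Evaluating this ratio numerically, using the expressions exactly as written above, gives a feel for the curves in Figure 5.3; the parameter values in the following sketch are chosen only for illustration:

    from math import comb

    def F(pc1, pe, n, m):
        # Evaluate F(p_c, p_e) from the expressions above.
        pcn = (1 + (2 * pc1 - 1) ** n) / 2          # from 2 p_c^n - 1 = (2 p_c^1 - 1)^n
        pce = pcn * (1 - pe) + (1 - pcn) * pe       # p_ce^n
        def binom_sum(cs):
            return sum(comb(m, c) * pce ** c * (1 - pce) ** (m - c) for c in cs)
        num = (1 - pc1) * binom_sum(range(m // 2 + 1, m + 1))     # c > m/2
        den = pc1 * binom_sum(range(0, (m + 1) // 2))             # c < m/2
        return num / den

    for n, m in [(2, 4), (3, 6), (4, 8)]:
        print(n, m, [round(F(pc, 0.1, n, m), 2) for pc in (0.6, 0.8, 0.95)])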
Figure 5.3 shows that unless n = 2, m = 4 we need to be quite close to a
solution before the two error estimates are correlated. Thus, we would expect
that this is the only value for which simple local search algorithms should work.
Concentrating on these values, Figure 5.4 shows that noise values of up to about
20% can just about be tolerated.

Figure 5.4: $F(p_c, p_e)$ for n = 2, m = 4 and values of $p_e$ between 0.1 and 0.4. This
shows that noise levels of up to about 20% can be tolerated.
These results also show why simple local search with a sparse G would not work:
the correlation between the two accuracy estimates falls away drastically as we
reach the answer. This means that we would expect such methods to approach
the answer and then get stuck. However, since we have not involved the matrix
B in the game, this is not surprising: the previous analysis shows that we cannot
recover the answer just by finding the minimum error configuration.
The addition of the matrix B is particularly useful in taking us from close to the
answer to finding it exactly. When we have a relatively accurate estimate of the
message, the corresponding estimate of the error vector is also fairly accurate.
The additional cross-checks introduced by B then make it easy to identify the
remaining inaccuracies in the error estimate, and push the solution process on
towards a solution.
The preceding analysis has been very simple, but it shows many of the problem’s
basic features, which are borne out by more accurate analysis.
5.3 Cryptography
As well as protecting messages against noise in transmission and storage, error-correcting codes can be used as the basis of a public-key cryptosystem; the two
best-known schemes are due to McEliece [87] and Kabashima, Murayama and
Saad (KMS) [88]. (It is notable that, unlike the RSA system, no way has yet
been found to adapt these schemes into a digital signature scheme [90].)
Both schemes involve hiding the generator matrix of an error correcting code
within another matrix, which is then given out as the public key. Users wishing to encode a message multiply it by the public key (effectively using it as a
generator matrix), and then artificially add noise. The aim is that decoding the
message should be impossible in practice without knowledge of the (hidden) error
correcting code, and that reconstructing the underlying code from the public key
should also be infeasible.
The idea behind these systems is in many ways similar to that behind other
public-key systems, such as RSA. In RSA, the quantities required for decoding –
the factors of the integer used as the public key – are also hidden in the public
key, and the security of the system depends on essentially the same requirements.
In fact, it is hard to see how a public-key system could be designed in any other
way.
The original KMS scheme depended on a sparse error correcting code with generator matrix A, of dimensions M × N , and a sparse invertible matrix B of
dimensions M × M. Using these, the public key matrix G is defined as
$$G = B^{-1}A.$$
The message sender uses G to construct an encoded message vector r from their
message vector s as in Equation 5.1; the vector η is chosen to be random, but
sparse.
The legitimate recipient of such a message makes the product Br, in the knowledge that
Br = As + Bη.
Provided that the matrices A and B are sufficiently sparse, this can be solved
by belief propagation. Any unauthorized recipient, by contrast, is faced with the
decoding problem given by Equation 5.1, for which belief propagation does not
converge.
The problem with this scheme is that extracting the matrices B and A from G
turns out to be relatively straightforward; all that is required is to solve BG = A,
in the knowledge that A and B are extremely sparse, which can effectively be
done by enumeration. For example, we could enumerate all possibilities for a
row of B, keeping those that would result in a suitably sparse row for A. The
fact that this procedure only recovers B and A up to a permutation (and may
conceivably recover completely different matrices) does not matter: the resulting
matrices still have all the properties required for successful decoding.
To remedy this problem, an additional “scrambling” matrix R was added to the
scheme, so that the public key became
$$G = B^{-1}AR.$$
The legitimate decoding process is very little different; by substituting ŝ = Rs,
we obtain
Br = Aŝ + Bη,
which can be solved just as easily as before. (The vector s can then be recovered
from s = R−1 ŝ.) The key feature of R in terms of the security of the scheme
is that any invertible N × N matrix can be used; in other words, we are free to
choose R to be dense.
The introduction of R frustrates the attack outlined above, as factoring G now
requires the solution of BG = AR. The only known attack against the improved
scheme that cannot be prevented by a carefully designed protocol is to randomly
select N entries in r, together with the corresponding rows of G, and hope that
the chosen entries were not affected by noise. If this turns out to be the case,
then this reduces the problem to the form $r' = G's$, which is solvable as a system
of linear simultaneous equations. (The decoder can also check the accuracy of
the decoding by attempting to reconstruct r: if they find a noise level within the
tolerance specified for the scheme, the right answer has been obtained; otherwise,
some of the chosen entries must have been affected by noise.) This attack can
clearly be frustrated simply by choosing M and N to be sufficiently large.
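The success probability of this attack is easy to estimate: the chance that N randomly chosen entries avoid all w noisy positions is C(M − w, N)/C(M, N), so the expected number of attempts grows like its inverse. The sizes in the following sketch are purely illustrative:

    def p_clean_pick(M, N, w):
        # Probability that N randomly chosen entries of r avoid all w noisy
        # positions: C(M-w, N)/C(M, N), computed as a running product.
        p = 1.0
        for i in range(w):
            p *= (M - N - i) / (M - i)
        return p

    for M, N, w in [(1000, 500, 20), (10000, 5000, 200)]:
        p = p_clean_pick(M, N, w)
        print(M, N, w, p, 1 / p)          # expected number of attempts ~ 1/p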
Schemes based on error correcting codes have an advantage over RSA, in that
they allow faster encoding and decoding; however, distributing the public key (the
matrix G) requires the transmission of much more data than for RSA. (Even the
G of the original KMS scheme is quite dense: although B may be sparse, B−1
is not.) For this reason, it would be preferable to find a scheme which resulted
in a relatively sparse G. Since the sparsity requirements of belief propagation
decoding are quite extreme, such a G could still prevent direct decoding by belief
propagation; the problem is to find a choice that can be decomposed. In the next
subsection, we discuss a scheme that has been proposed to get round this problem,
and the weaknesses that the sparsity requirement introduces.
5.3.1 The Barber scheme
This scheme uses matrices A and B of the same form as those for the KMS
scheme, but hides them in two public-key matrices, D = CAR and E = CBQ,
where C, R and Q are all sparse. The encryption process then requires the solution
of
Et = Ds
for the vector t, to which noise is then added to form the ciphertext r = t + η.
To decode the message, we note that
Er = Et + Eη    (5.2)
   = Ds + Eη    (5.3)
BQr = ARs + BQη    (5.4)
    = Aŝ + Bη̂,    (5.5)
which is the same problem that we had before, provided that η̂ is no more dense
than η; this is ensured by choosing η to be the sum of a suitable number of
(randomly chosen) columns of Q−1 . (The details of how to do this while not
divulging information about Q need not concern us here.)
The advantage of this scheme is that the matrices D and E are both relatively
sparse; the disadvantage is that this renders them vulnerable to attack. For
example, we can attack D by attempting to solve
$$DR^{-1} = CA,$$
knowing that, because C and A are both sparse, CA is likely to be sparse as
well. By analogy with the above, we attempt to find all possible solutions to
Dx = c,
where (hopefully) x is a column of R−1 and c is a column of CA. Due to the
sparsity of c, we can consider it to be a noisy version of the null vector and
attempt a version of the “pick N rows and hope” attack outlined above; the
“noise level” now implies that we will be quite likely to succeed.
If we repeat this process, always picking rows to try to ensure that we don’t
find the same solution too many times (by e.g. picking one of the rows that was
previously found to be non-zero on our next attempt), there is a reasonable chance
that we will be able to extract a large number of solutions for x, and hence make
a guess at R−1 and CA. (As before, the fact that we can only hope to find it up
to a permutation makes little difference to the usefulness of our solution.)
From our guess at CA, we can attempt to use the same sparse matrix factorisation
techniques as before to extract C and A. A knowledge of C would then allow us
to find BQ, and hence B and Q.
There are two ways in which this attack could fail: either we could fail to find
enough solutions to build a guess at R−1 , or we could find too many (in which
case the task of finding a combination which provided an acceptable factorization
for CA could prove infeasible). This last possibility we consider unlikely, given
that D is a (disguised) error correcting code; the redundancy it introduces makes
it most unlikely that anything but the correct R−1 would produce a sparse result.
5.3.2 Discussion
Public key cryptosystems inevitably rely on the fact that the public key has
hidden structure which is not apparent to anyone but the legitimate recipient of
the message. With schemes such as RSA, which rely on the difficulty of factoring
large numbers, the public key has exactly two (large) factors; its security relies
on the fact that this does not make the factoring problem significantly easier. In
the case of the schemes discussed here, the security comes from the hope that
knowing that the public generator matrix has a factorized structure does not
make the scheme significantly easier to decode than one based on a completely
random generator matrix. This is very reminiscent of the situation with hard NP
problems: they too have a structure which seems to be impossible to exploit.
The structure of the KMS scheme does allow a small amount of information to
leak out; just not enough to compromise its security. For example, the structure
of its matrix A ensures that multiplying it by a vector entirely composed of 1s
yields the zero vector. Thus, if Gt = 0, then we know that $Rt = (1\ 1\ 1\ \ldots)^T$.
This is not much, but it does show that we have made the decoding problem at
least slightly easier by imposing a structure on G.
Figure 5.5: By arranging the spins into a grid, simple nearest-neighbour bonds give
most of the spins four bonds each. A few non-local bonds round the edges can
then ensure that all the spins have four bonds without sacrificing the planarity of the
model.
5.4 Solving the decoding problem using physical models
We now consider two physical implementations of the decoding problem: as an
Ising model and as a neural network.
5.4.1 Ising model formulation
We have already seen how some decoding problems – those with n = 2 and m = 4 –
can be implemented in terms of “spin glass” Ising models. The preceding analysis
has also indicated that these are precisely the problems that we can reasonably
expect to decode. However, there are two problems with the idea of using an
Ising model as a practical decoder. If we look at the model shown in Figure 5.1,
we see that the model has non-local bonds, which are not physically viable, and
that it is planar, making the problem of finding the ground state straightforward.
In fact, it is possible to design a (2, 4) code of any size which can be mapped onto
a planar (but not local) model, as Figure 5.5 shows.
Potentially, wrapping the Ising model onto a sphere could solve the non-locality
problem (at the cost of losing planarity), so a physical implementation might be
possible for some codes. However, we are still faced with the fact that the Ising
model’s evolution implements only a fairly basic local search method, making it
ineffective for large codes.
An interesting aside that comes out of this discussion is the point that we can
design a (2, 4) code so that minimum cut is guaranteed to find the minimum-error
solution. However, since we are unable to extend this idea to coding schemes
which include the additional matrix B, this is not very useful.
5.4.2 Neural network formulation
It has been shown that neural networks are, in principle, capable of learning
in a manner equivalent to an asynchronous version of the belief propagation
algorithm [89]. This makes them much better suited to the task at hand, in that
they are flexible enough to take advantage of the additional information on offer
with the addition of the matrix B. This means that a carefully designed neural
network can act as an efficient decoder.
The reason for this improved performance over the Ising model approach is that
the neural network is much more highly organized, and is capable of highly
nonlinear behaviour, due to the threshold nature of the neurons.
5.5 Summary
In this chapter, we have seen that the decoding problem for error-correcting codes
is very delicate: unless the generating matrix is very sparse, it is impossible,
and even then it requires an additional matrix to provide cross-checks on the
errors. Given this, it is perhaps no surprise that the Ising-like decoding scheme
is hopeless, and must defer to the neural network-based one.
For the cryptographic problem, we have also seen how a straightforward decoding
problem can be effectively hidden in what appears to be a much harder one, at
least to someone who does not know the secret. Unfortunately, we have also seen
that, although the codes are based on sparse matrices, it is difficult to formulate a
public key which is sparse. It seems that we need at least one dense "scrambling"
component to keep the system secure.
Chapter 6
Conclusions
Judge a man by his questions rather than by his answers.
Voltaire
Over the course of this thesis, we have seen how hard NP-complete problems
can be: they can have complicated, intimately connected and exponentially large
search spaces which have no reliable compass that we can realistically find. We
have also seen, however, that the majority of even NP-complete problems are
actually quite easy, or at least easy to approximate; it is only in quite a narrow
region of their parameter space that they really show their teeth.
All the physical systems we have considered, from the purely mechanical to neural
networks (artificial or real), can be thought of as implementing some type of
algorithm. At one end of the scale, mechanical systems perform a local search
which must always lead downwards, making it very likely that they will get stuck
in a local minimum. The addition of a stochastic component – in systems such as
our “consistency computers” or Ising models – can help to some extent, but the
complicated structure of the search space of the hardest NP-complete problems
still means that they are very unlikely to find a solution.
At the top of the scale lie neural networks and other systems – such as amorphous computers – with a similar degree of organisation. These are capable of
integrating the information available in a problem in a principled way, analogous
to Belief Propagation, and are therefore “smarter” than systems which are only
capable of myopic local search. In addition, they are capable of “learning” by discovering heuristics which may work on at least a subclass of problems. In the end
though, sadly, the hardest NP-complete problems appear so random that there
are simply no cues for algorithm or physical system to use to gain a foothold. We
are reluctantly compelled to conclude that, short of a miraculous insight, there
is simply no way of solving these problems efficiently.
At first sight, quantum systems appear to have the capacity to turn all this on its
head by performing an unlimited number of calculations in parallel. The problem
is that the amount of information which can be read out at the end is no more
than for a classical system, unless we want to throw an exponential amount of
energy at the task.
All is not lost, however: one thing that novel computing models, notably quantum
ones, can do is to give us more tools with which to attack problems. The only
case in which an exponential speedup proved to be possible – Shor’s algorithm –
shows us how new tools can be combined with insight into the problem to great
effect. It is likely that such tools will always be rare, and of limited use, but we
have almost certainly not found them all yet. After all, it is only when a tool
is needed that we come to realise that it is even there; for now, we have no way
of knowing what other features of quantum, and perhaps even classical, systems
may ultimately turn out to be of use. For that matter, we have no reason to
suppose that our knowledge of physics is even close to complete: new discoveries
may bring yet more possibilities.
6.1 Possibilities for further work
Given the rather negative conclusions of this study, we would recommend diverting attention away from the exact solution of hard problems in favour of stochastic
or approximate solutions. Although we have said above that most physical
systems do not implement very powerful algorithms, they do potentially work very
fast. There is therefore scope in determining exactly how fast they can work,
and how their performance depends on the type of problem. While they will not
necessarily work well for all problems, or even over the full parameter space of
one problem, there may well be niches for which their performance makes them
a useful alternative to conventional computers.
The physical computers considered here have generally had to be built specifically
for the purpose of solving one problem. In the case of the route-finder, the range
of problems it could solve would arguably justify its construction, but most of the
other devices would be hard to justify from a commercial viewpoint. By adding
adjustable components – such as variable resistors and junctions in the case of
the electrical circuit machines – their utility could be extended, especially if the
adjustments could be put under the control of a conventional computer. In other
words, it could be worthwhile to investigate the utility of novel computing devices
as special-purpose “co-processors”.
Appendix A
Miscellaneous quantum questions
A.1 Quantum "nonlocality" and the EPR experiment
Quantum theory is, like all physical theories, local. The Schrödinger equation,
after all, is a straightforward partial differential equation. This was frequently
questioned by Einstein, who was deeply unhappy with the idea of wavefunction
collapse. Together with Podolsky and Rosen, he developed the famous EPR
thought experiment, in an attempt to show the absurdity of the theory.
The experiment begins with a stationary atom in an excited state, which has
spin zero. The atom’s decay process induces it to emit two electrons and finish
in its ground state, still stationary and with spin zero. Conservation of momentum and spin then imply that the electrons must have been emitted in opposite
directions and with opposite spins; quantum mechanics implies that, until they
are measured, the electrons are in the joint “entangled” state | ↑↓i + | ↓↑i. In
other words, they are in a superposition of two states: either the first electron
is spinning up and the second is spinning down, or the first electron is spinning
down and the second is spinning up.
On measurement, we have to find one or other of these states, but this implies
that measuring the spin of the first electron causes the second to also collapse
into a definite state, regardless of how far apart the two electrons are. This looks
suspiciously like action at a distance, though of course it is not, because there is
no way of exploiting this to allow instantaneous transmission of information. In
most important respects, this situation is exactly the same as one where the electrons did have definite spins, but these were unknown until they were measured;
two versions of the universe are overlaid by the superposition, and measurement
merely picks one.
The reason we raise this well-known experiment is that an apparently small
change can throw the whole question back into doubt again. Quantum mechanics
is strictly linear, in the sense that solutions to the wave equation can be freely
superposed, but non-linear extenstions have been considered by some. An interesting result that comes of this is that such an extension could allow a Grover-style
algorithm to converge much more quickly, and hence allow the solution of NPcomplete problems in polynomial time. Another consequence, however, is that
action at a distance does become possible, either in the sense described above or
via communication between different “worlds”.
The reason why this is possible, despite no apparent non-locality being directly
introduced, is because the so-called no cloning theorem is violated. This theorem
– true for ordinary quantum mechanics – states that no quantum state can be
precisely copied. In non-linear theories, copying is possible, and this allows the
participants in an EPR experiment to communicate. For example, the person who
measures the first electron could choose to make the measurement either in the
vertical or the horizontal plane, giving the second electron a definite spin value in
one of these planes (but a random value if it is measured in the other plane). The
person who measures the second electron can then determine which plane was
used by making a number of copies of their electron and measuring half in each
plane. The plane for which all the measurements match is the plane used by the
first participant. In this way, we have achieved instantaneous communication,
though admittedly just of a single bit of information.
For us, the important feature of this story is that the modification that allowed
the efficient solution of NP-complete problems also introduced non-locality into
the theory, implying that non-local theories can have greater computational power
than local ones. It is particularly easy to see how communication by “Everett
telephone” could speed the solution of a problem: simply create enough “worlds”
to allow all possibilities to be tried, then ask the person who finds the answer to
communicate it to two of the other worlds, and ask the recipients to communicate it to two others etc. In this way, a carefully designed “chain letter” could
communicate the answer to all the worlds in polynomial time.
A.2 Are space and time discrete?
As we divide matter up into smaller and smaller pieces, quantum mechanics
becomes a more and more appropriate tool with which to examine them. One
of the distinctive features of quantum mechanics (the one, in fact, from which
it derives its name) is that many things turn out to take only discrete values:
electrons can spin either up or down, light comes in discrete packets with precise
energies, and so on. Everywhere you look, quantities turn out to be multiples of
a definite smallest unit. Given this, it is not too farfetched to ask whether space
and time have smallest units as well; indeed, this is considered by many to be
required for general relativity to be unified with quantum mechanics to produce
a sensible theory of quantum gravity.
If space, time, and all the entities defined on them turn out to take only discrete
values, it is a straightforward matter to choose units such that the simulation
discussed above can be made precise. The smallest units of space and time are
tiny – $10^{-35}$ m and $10^{-43}$ s – but the important point is that they are not zero;
the slowdown involved in such a simulation would be prohibitive, but it would be
polynomial.
This means that a discrete spacetime would end up playing the rôle that we have
otherwise attributed to Heisenberg’s uncertainty relations: the minimum unit of
space quoted above is also the minimum distance which can be observed (i.e. the
minimum distance which can be identified as being different from zero); below
this, the intrinsic uncertainty in the measurements swamps the measurement
itself. As a result, even if space and time are not discrete, the requirements of
reliable computation are likely to oblige us to treat them as if they were.
Bibliography
[1] Beals R, Buhrman H, Cleve R, et al., Quantum lower bounds by polynomials,
J ACM 48 778–797 (2001) [quant-ph/9802049]
[2] Deutsch D and Jozsa R, Rapid Solution of Problems by Quantum Computation, Proc. Roy. Soc. Lond. A 439 553–558 (1992)
[3] Simon D, On the power of quantum computation, SIAM Journal on Computing 26 (5) 1474–1483 (1997)
[4] Shor, P W, Polynomial-time algorithms for prime factorization and discrete
logarithms on a quantum computer , SIAM Journal on Computing 26 (5)
1484–1509 (1997) [quant-ph/9508027]
[5] Grover L K, A fast quantum mechanical algorithm for database search, Proceedings of 30th ACM STOC 212–219 (1996) [quant-ph/9605043]
[6] Bennett C H, Bernstein E, Brassard G, et al., Strengths and weaknesses of
quantum computing, SIAM Journal on Computing 26 (5) 1510–1523 (1997)
[quant-ph/9701001]
[7] Boyer M, Brassard G, Høyer P, et al., Tight bounds on quantum searching,
Fortschritte der Physik 46 (4–5) 493–505 (1998) [quant-ph/9605034]
[8] Zalka C, Grover’s quantum searching algorithm is optimal , Phys. Rev. A 60
2746–2751 (1999) [quant-ph/9711070]
[9] Wolfram S, Undecidability and Intractability in Theoretical Physics, Phys.
Rev. Lett. 54 (8) 735–738 (1985) http://www.stephenwolfram.com/publications/articles/physics/85-undecidability/index.html
[10] Mobilia M, Does a Single Zealot Affect an Infinite Group of Voters? , Phys.
Rev. Lett. 91 028701 (2003) [cond-mat/0304670]
[11] Adleman LM, Molecular Computation of Solutions to Combinatorial Problems, Science 266 (5187) 1021–1024 (1994)
[12] Saul LK and Kardar M, Exact integer algorithm for the 2d ±J Ising spin
glass, Phys. Rev. E 48 3221–3224 (1993) The 2d ±J Ising spin glass: exact
partition functions in polynomial time, Nucl. Phys. B 432 641–667 (1994)
[13] Braich RS, Chelyapov N, Johnson C, Rothemund PWK and Adleman LM,
Solution of a 20-variable 3-SAT Problem on a DNA Computer , Science 296
499–502 (2002)
[14] Kobayashi S, Horn clause computation with DNA molecules, J. Combin.
Opt. 3 (2–3) 277–299 (1999)
[15] Kaye R, Minesweeper is NP-complete, Math. Intell. 22 (2) 9–15 (2000)
[16] Coppersmith D, An Approximate Fourier Transform Useful in Quantum Factoring, IBM Research Report RC 19642 (1994) [quant-ph/0201067]
[17] Stewart I, The Dynamics of Impossible Devices, Nonlinear Science Today
1:4 8–9 (1991)
[18] Lloyd S, Quantum search without entanglement, Phys. Rev. A 61 (1)
010301 (2000)
[19] Walmsley I et al., http://www.rochester.edu/pr/News/NewsReleases/scitech/walmsleyquantum.html
[20] Lloyd S, Rahn B and Ahn C, Robust quantum computation by simulation,
[quant-ph/9912040]
[21] Manz A et al., Imperial College Reporter 119 (2002) http://www.imperial.ac.uk/P3476.htm
[22] Steane A, Quantum computing, Rept. Prog. Phys. 61 117–173 (1998) [quant-ph/9708022]
[23] Feynman RP, Simulating physics with computers, Int. J. Theor. Phys. 21
467–488 (1982) Quantum mechanical computers, Found. Phys. 16 507–531
(1986)
[24] Benioff P, Quantum-mechanical Hamiltonian models of discrete processes
that erase their own histories – application to Turing-machines, Int. J. Theor.
Phys. 21 (3–4) 177–201 (1982) Quantum-mechanical models of Turingmachines that dissipate no energy, Phys. Rev. Lett. 48 (23) 1581–1585 (1982)
Quantum-mechanical Hamiltonian models of Turing-machines, J. Stat. Phys.
29 (3) 515–546 (1982)
[25] Deutsch D, Quantum theory, the Church-Turing principle and the universal
quantum computer , Proc. R. Soc. Lond. A 400 97–117 (1985)
[26] Bransden BH and Joachain CJ, Introduction to Quantum Mechanics, Longman, Harlow, Essex (1989)
[27] Preskill J, Caltech Quantum Computation course, lecture notes, http://www.theory.caltech.edu/people/preskill/ph229/#lecture
[28] Vergis A, Steiglitz K and Dickinson B, The complexity of analog computation,
Math. Comp. Sim. 28 91–113 (1986)
[29] da Silva Graça D, The General Purpose Analog Computer and Recursive
Functions over the Reals, Universidade Técnica de Lisboa, MSc Thesis (2002)
[30] Shannon C, Mathematical theory of the differential analyser , J. Math. Phys.
MIT 20 337–354 (1941)
[31] Pour-El MB, Abstract computability and its relation to the general purpose
analog computer, Trans. Amer. Math. Soc. 199 1–28 (1974)
[32] Campagnolo ML, Computational complexity of real valued recursive functions
and analog circuits, Universidade Técnica de Lisboa, PhD Thesis (2001) Iteration, Inequalities, and Differentiability in Analog Computers, J. Complex.
16 642–660 (2000)
[33] Rubel LA, Some mathematical limitations of the general-purpose analog computer, Adv. Appl. Math. 9 22–34 (1988)
[34] Siegelmann HT and Fishman S, Analog Computation with Dynamical Systems, Physica D 120 214–235 (1998)
[35] Church A, Am. J. Math. 58 345 (1936)
[36] Turing AM, Proc. Lond. Math. Soc. Ser. 2 42 230 (1936)
[37] Penrose R, The Emperor’s New Mind, Oxford University Press, Oxford (1989)
[38] Aspvall B and Shiloach Y, A Polynomial Time Algorithm for Solving Systems
of Linear Inequalities with Two Variables per Inequality, SIAM J. Comput.
9 (4) 827–845 (1980) Aspvall B, Plass MF and Tarjan RE, A Linear-Time
Algorithm for Testing the Truth of Certain Quantified Boolean Formulas, Info.
Proc. Lett. 8 (3) 121–123 (1979)
[39] Hayes B, Can’t get no satisfaction, Am. Sci. 85 (2) 108–112 (1997)
[40] Cook S, The complexity of theorem-proving procedures, Conference Record
of the Third Annual ACM Symposium on the Theory of Computing 151–158
(1971)
[41] Kullmann O, New methods for 3-SAT decision and worst-case analysis,
Theor. Comp. Sci. 223 1–72 (1999)
[42] Dantsin E, Goerdt A et al., A deterministic (2 − 2/(k + 1))^n algorithm for
k-SAT based on local search, Theor. Comp. Sci. 289 (1) 69–83 (2002)
[43] Schöning U, A probabilistic algorithm for k-SAT based on limited local search
and restart, Algorithmica 32 (4) 615–623 (2002)
[44] Papadimitriou CH, On selecting a satisfying truth assignment, Proc. of the
Conference on the Foundations of Computer Science 163–169 (1991)
[45] Selman B, Kautz H and Cohen B, Local Search Strategies for Satisfiability
Testing, DIMACS Series in Discrete Mathematics and Theoretical Computer
Science 26 (1996) http://www.cs.washington.edu/homes/kautz/papers/
dimacs93.ps
[46] Selman B, Levesque HJ and Mitchell DG, A new method for solving hard
satisfiability problems, Proc. AAAI-92, San Jose, CA 440–446 (1992)
[47] Selman B, Mitchell DG and Levesque HJ, Generating hard satisfiability problems, Artificial Intelligence 81 (1–2) 17–29 (1996)
[48] Johnson DS, Aragon CR, McGeoch LA and Schevon C, Optimization by simulated annealing: an experimental evaluation; part II, graph coloring and number partitioning, Operations Research 39 (3) 378–406 (1991)
[49] Hoos HH and Stützle T, Local Search Algorithms for SAT: An Empirical
Evaluation, J. of Aut. Reas. 24 (4) 421–481 (2000)
[50] Folino G, Pizzuti C and Spezzano G, Parallel Hybrid Method for SAT That
Couples Genetic Algorithms and Local Search, IEEE Trans. on Evol. Comp.
5 (4) 323–334 (2001)
[51] Pumphrey SJ, Solving the satisfiability problem using message-passing techniques, Part II Physics Project Report, Cambridge University (2001)
http://www.inference.phy.cam.ac.uk/is/papers/pumphreySAT.pdf
[52] Parisi G, A backtracking survey propagation algorithm for K-satisfiability,
[cond-mat/0308510]
[53] Garey MR and Johnson DS, Computers and Intractability: A Guide to the
Theory of NP-Completeness, Freeman, New York (1979) Papadimitriou CH,
Computational Complexity, Prentice-Hall, Englewood Cliffs, NJ (1982)
[54] Mézard M, Parisi G and Zecchina R, Analytic and Algorithmic Solution of
Random Satisfiability Problems, Science 297 812–815 (2002)
[55] Mézard M, Parisi G and Virasoro MA, Spin Glass Theory and Beyond, World
Scientific, Singapore (1987)
[56] Martin OC, Monasson R and Zecchina R, Statistical methods and phase
transitions in optimization problems, Theor. Comp. Sci. 265 3–67 (2001)
[57] Davis M, Logemann G and Loveland D, A machine program for theorem-proving, Commun. ACM 5 394–397 (1962)
[58] Mertens S, Computational complexity for physicists, Computing in Science
and Engineering 4 (3) 31–47 (2002)
[59] Mertens S, Random Costs in Combinatorial Optimization, Phys. Rev. Lett.
84 (6) 1347–1350 (2000)
[60] Berger B and Leighton T, Protein folding in the hydrophobic-hydrophilic
(HP) model is NP-complete, J. Comput. Biol. 5 (1) 27–40 (1998) Crescenzi
P, Goldman D, Papadimitriou C, Piccolboni A and Yannakakis M, On the
complexity of protein folding, J. Comput. Biol. 5 (3) 423–465 (1998)
[61] Fredkin E and Toffoli T, Conservative Logic, Int. J. Theor. Phys. 21 219–253
(1982)
[62] Shor PW, Fault-tolerant error correction with efficient quantum codes, Phys.
Rev. Lett. 77 (15) 3260–3263 (1996) Kitaev AY, Fault-tolerant quantum computation by anyons, Ann. Phys. 303 (1) 2–30 (2003) [quant-ph/9707021]
[63] Calderbank AR and Shor PW, Good quantum error-correcting codes exist,
Phys. Rev. A 54 (2) 1098–1105 (1996)
[64] Steane AM, Space, time, parallelism and noise requirements for reliable quantum computing, Fortschr. Phys. 46 (4–5) 443–457 (1998) [quant-ph/9708021]
Preskill J, Reliable quantum computers, Proc. Roy. Soc. Lond. A 454 469–486
(1998) [quant-ph/9705031]
[65] MacKay D, Mitchison G and McFadden P, Sparse Graph Codes for Quantum
Error-Correction, [quant-ph/0304161]
[66] Deutsch D, Ekert A and Lupacchini R, Machines, Logic and Quantum
Physics, [math.HO/9911150]
[67] Farhi E and Gutmann S, An Analog Analogue of a Digital Quantum Computation, [quant-ph/9612026]
[68] Grover LK, From Schrödinger’s Equation to the Quantum Search Algorithm,
Am. J. Phys. 69 (7) 769–777 (2001) [quant-ph/0109116]
[69] Kastella K and Freeling R, Structured quantum search in NP-complete problems using the cumulative density of states, [quant-ph/0109087]
[70] Brassard G, Høyer P, Mosca M and Tapp A, Quantum amplitude amplification and estimation, [quant-ph/0005055] Høyer P, On arbitrary phases in
quantum amplitude amplification, Phys. Rev. A 62 052304 (2001) [quant-ph/0006031]
[71] Lenstra AK and Lenstra HW Jr, Algorithms in Number Theory, in Handbook of Theoretical Computer Science, Volume A: Algorithms and Complexity,
Elsevier, New York ()
[72] Spreeuw RJC, Classical wave-optics analogy of quantum-information processing, Phys. Rev. A 63 062302 (2001) A classical analogy of entanglement,
Found. Phys. 28 361–374 (1998) Cerf NJ, Adami C and Kwiat PG, Optical
simulation of quantum logic, Phys. Rev. A 57 R1477–R1480 (1998) Kwiat PG,
Mitchell J, Schwindt P and White A, Grover’s search algorithm: an optical
approach, J. Mod. Opt. 47 257–266 (2000)
[73] Černý V, Quantum computers and intractable (NP-complete) computing
problems, Phys. Rev. A 48 (1) 116–119 (1993)
[74] Grover LK, Tradeoffs in the Quantum Search Algorithm, [quant-ph/0201152]
[75] Grover LK and Sengupta A, From coupled pendulums to quantum search,
[quant-ph/0109123]
[76] Vanstone SA and van Oorschot PC, An Introduction to Error Correcting
Codes with Applications, Kluwer Academic Publishers, Boston (1989)
[77] MacKay DJC, Information Theory, Inference, and Learning Algorithms,
http://www.inference.phy.cam.ac.uk/mackay/Book.html
[78] Amit DJ, Modelling Brain Function, Cambridge University Press, Cambridge (1989)
[79] Bush V, The differential analyzer: A new machine for solving differential
equations, J. Franklin Institute 212 447–488 (1931)
[80] Rubel LA, Digital Simulation of Analog Computation and Church’s Thesis,
J. Symb. Logic 54 (3) 1011–1017 (1989)
[81] Maass W, Networks of Spiking Neurons: The Third Generation of Neural Network Models, Electronic Colloquium on Computational Complexity
(ECCC) 3 (031)
[82] Siegelmann HT, Computation Beyond the Turing Limit, Science 268 (5210)
545–548 (1995)
[83] Etessami K, Encoding and solving factoring with a SAT-solver, 2003 Informatics MSc project proposal
[84] Berry MV, Quantal Phase Factors Accompanying Adiabatic Changes, Proc.
Roy. Soc. Lond. A 392 (1802) 45–57 (1984) Quantum Chaology, Proc. Roy.
Soc. Lond. A 413 (1844) 183–198 (1987) The Geometric Phase for Chaotic
Systems, Proceedings: Mathematical and Physical Sciences 436 (1898) 631–
661 (1992)
[85] Blum L, Shub M and Smale S, On a Theory of Computation and Complexity
Over the Real Numbers – NP-completeness, Recursive Functions and Universal Machines, B. Am. Math. Soc. 21 (1) 1–46 (1989) Blum L, Cucker F,
Shub M and Smale S, Complexity and Real Computation: A manifesto, Int.
J. Bifurcat. Chaos 6 (1) 3–26 (1996)
[86] See http://www.swiss.ai.mit.edu/projects/amorphous/paperlisting.html and references therein.
[87] McEliece RJ, A public-key cryptosystem based on algebraic coding theory,
JPL DSN Progress Report 42–44 114–116 (1978)
[88] Kabashima Y, Murayama T and Saad D, Cryptographical Properties of Ising
Spin Systems, Phys. Rev. Lett. 84 9 (2000)
[89] MacKay DJC, Bayesian Neural Networks and Density Networks, Nucl. Instrum. Meth. A 354 (1) 73–80 (1995) Probable Networks and Plausible Predictions – A Review of Practical Bayesian Methods for Supervised Neural
Networks, Network-Comp. Neural 6 (3) 469–505 (1995)
[90] Xu S, Doumen J and van Tilborg H, On the Security of Digital Signatures
Schemes Based on Error-Correcting Codes, Designs Codes Cryptogr. 28 (2)
187–199 (2003)
And finally. . .